Slicing

Variables, data arrays, and datasets in scipp can be sliced in several ways. The general way is positional indexing using indices, as in NumPy. A second approach is label-based indexing, which uses actual coordinate values for selection. Positional and label-based indexing return a view into the indexed object and can be used to modify an object in-place.

In addition, advanced indexing, which comprises integer array indexing and boolean variable indexing, can be used for more complex selections. Unlike the aforementioned basic positional and label-based indexing, indexing with integer arrays or boolean variables returns a copy of the indexed object.
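
The view-versus-copy distinction mirrors NumPy, which this page uses as its reference point. The following is a minimal NumPy-only sketch (an analogy, not scipp code) of the same behaviour: basic slicing shares memory with the original array, while integer-array and boolean indexing allocate new data. The names `base`, `view`, `picked`, and `masked` are purely illustrative.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

# Basic slicing returns a view: writing through it changes `base`.
view = base[:, 1:3]
view += 100

# Integer-array and boolean indexing return copies: `base` is unaffected.
picked = base[[0, 2]]
picked += 1000
masked = base[np.array([True, False, True])]
masked += 1000

print(np.shares_memory(base, view))    # the view aliases the original buffer
print(np.shares_memory(base, picked))  # the copy owns its own buffer
print(base)
```

Only the in-place addition through `view` is visible in `base`; the additions applied to `picked` and `masked` are lost once those temporaries go away.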

Positional indexing

Overview

Data in a variable, data array, or dataset can be indexed in a similar manner to NumPy and xarray. The dimension to be sliced is specified using a dimension label. In contrast to NumPy, positional dimension lookup is not available, unless the object being sliced is one-dimensional. Positional indexing with an integer or an integer range is made via __getitem__ and __setitem__ with a dimension label as first argument. This is available for variables, data arrays, and datasets. In all cases a view is returned, i.e., just like when slicing a numpy.ndarray no copy is performed.

Variables

Consider the following variable:

[1]:
import numpy as np
import scipp as sc

var = sc.array(
    dims=['z', 'y', 'x'],
    values=np.random.rand(2, 3, 4),
    variances=np.random.rand(2, 3, 4))
sc.show(var)
[sc.show diagram: dims=['z', 'y', 'x'], shape=[2, 3, 4], unit=dimensionless, variances=True]

As when slicing a numpy.ndarray, the dimension 'x' is removed since no range is specified:

[2]:
s = var['x', 1]
sc.show(s)
print(s.dims, s.shape)
[sc.show diagram: dims=['z', 'y'], shape=[2, 3], unit=dimensionless, variances=True]
['z', 'y'] [2, 3]

When a range is specified, the dimension is kept, even if it has extent 1:

[3]:
s = var['x', 1:3]
sc.show(s)
print(s.dims, s.shape)

s = var['x', 1:2]
sc.show(s)
print(s.dims, s.shape)
[sc.show diagram: dims=['z', 'y', 'x'], shape=[2, 3, 2], unit=dimensionless, variances=True]
['z', 'y', 'x'] [2, 3, 2]
[sc.show diagram: dims=['z', 'y', 'x'], shape=[2, 3, 1], unit=dimensionless, variances=True]
['z', 'y', 'x'] [2, 3, 1]

Slicing can be chained arbitrarily:

[4]:
s = var['x', 1:4]['y', 2]['x', 1]
sc.show(s)
print(s.dims, s.shape)
[sc.show diagram: dims=['z'], shape=[2], unit=dimensionless, variances=True]
['z'] [2]

The copy() method turns a view obtained from a slice into an independent object:

[5]:
s = var['x', 1:2].copy()
s += 1000
var
[5]:
scipp.Variable (384 Bytes)
    • (z: 2, y: 3, x: 4)
      float64
      𝟙
      0.720, 0.181, ..., 0.687, 0.988
      σ = 0.899, 0.699, ..., 0.904, 0.610
      Values:
      array([[[0.7197341 , 0.18067049, 0.66820569, 0.54271058], [0.47814241, 0.58353295, 0.82518448, 0.63937649], [0.14597885, 0.4249462 , 0.21622178, 0.74995095]], [[0.65945631, 0.32610762, 0.51346945, 0.17579602], [0.09967696, 0.24618239, 0.40101757, 0.31719137], [0.38610718, 0.24119835, 0.68732417, 0.98757746]]])

      Variances (σ²):
      array([[[0.80740181, 0.48903164, 0.94335996, 0.87748022], [0.21262572, 0.47774104, 0.84622518, 0.48340431], [0.10336407, 0.35764896, 0.14090342, 0.91097062]], [[0.0045115 , 0.29568494, 0.36082308, 0.7380746 ], [0.86615464, 0.51982108, 0.84614991, 0.89257784], [0.63695428, 0.2399802 , 0.81668978, 0.3719499 ]]])

To avoid subtle and hard-to-spot bugs, positional indexing without a dimension label is in general not supported:

[6]:
try:
    var[1]
except sc.DimensionError as e:
    print(e)
Slicing with implicit dimension label is only possible for 1-D objects. Got Sizes[z:2, y:3, x:4, ] with ndim=3. Provide an explicit dimension label, e.g., var['z', 0] instead of var[0].

Scipp makes an exception from this rule in the unambiguous case of 1-D objects:

[7]:
var1d = sc.linspace(dim='x', start=0.1, stop=0.2, num=5)
var1d[1]
[7]:
scipp.Variable (8 Bytes out of 40 Bytes)
    • ()
      float64
      𝟙
      0.125
      Values:
      array(0.125)
[8]:
var1d[2:4]
[8]:
scipp.Variable (16 Bytes out of 40 Bytes)
    • (x: 2)
      float64
      𝟙
      0.150, 0.175
      Values:
      array([0.15 , 0.175])

Positional indexing also supports an optional stride (step):

[9]:
var['x', 1:4:2]
[9]:
scipp.Variable (192 Bytes out of 384 Bytes)
    • (z: 2, y: 3, x: 2)
      float64
      𝟙
      0.181, 0.543, ..., 0.241, 0.988
      σ = 0.699, 0.937, ..., 0.490, 0.610
      Values:
      array([[[0.18067049, 0.54271058], [0.58353295, 0.63937649], [0.4249462 , 0.74995095]], [[0.32610762, 0.17579602], [0.24618239, 0.31719137], [0.24119835, 0.98757746]]])

      Variances (σ²):
      array([[[0.48903164, 0.87748022], [0.47774104, 0.48340431], [0.35764896, 0.91097062]], [[0.29568494, 0.7380746 ], [0.51982108, 0.89257784], [0.2399802 , 0.3719499 ]]])

Negative step sizes are currently not supported.

Data arrays

Slicing for data arrays works in the same way, but some additional rules apply. Consider:

[10]:
a = sc.DataArray(
    data=sc.array(dims=['y', 'x'], values=np.random.rand(2, 3)),
    coords={
        'x': sc.array(dims=['x'], values=np.arange(3.0), unit=sc.units.m),
        'y': sc.array(dims=['y'], values=np.arange(2.0), unit=sc.units.m)},
    masks={
        'mask': sc.array(dims=['x'], values=[True, False, False])},
    attrs={
        'aux_x': sc.array(dims=['x'], values=np.arange(3.0), unit=sc.units.m),
        'aux_y': sc.array(dims=['y'], values=np.arange(2.0), unit=sc.units.m)})
sc.show(a)
a
[sc.show diagram of a: data (dims=['y', 'x'], shape=[2, 3], unit=dimensionless); coords 'x' (dims=['x'], shape=[3], unit=m) and 'y' (dims=['y'], shape=[2], unit=m); mask 'mask' (dims=['x'], shape=[3]); attrs 'aux_x' (dims=['x'], shape=[3], unit=m) and 'aux_y' (dims=['y'], shape=[2], unit=m)]
[10]:
scipp.DataArray (131 Bytes)
    • y: 2
    • x: 3
    • x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y, x)
      float64
      𝟙
      0.440, 0.326, ..., 0.067, 0.126
      Values:
      array([[0.44026803, 0.32551925, 0.13312726], [0.74997156, 0.06724533, 0.12561651]])
    • mask
      (x)
      bool
      True, False, False
      Values:
      array([ True, False, False])
    • aux_x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • aux_y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])

As when slicing a variable, the sliced dimension is removed when slicing without a range, and kept when slicing with a range.

When slicing a data array the following additional rules apply:

  • Meta data (coords, masks, attrs) that do not depend on the slice dimension are marked as read-only.

  • Slicing without a range:

    • The coordinates for the sliced dimension are removed and inserted as attributes instead.

  • Slicing with a range:

    • The coordinates for the sliced dimension are kept.

The rationale behind this mechanism is as follows. Meta data is often of a lower dimensionality than data, such as in this example where coords, masks, and attrs are 1-D whereas data is 2-D. Elements of meta data entries are thus shared by many data elements, and we must be careful to not apply operations to subsets of data while unintentionally modifying meta data for other unrelated data elements:

[11]:
a['x', 0:1].coords['x'] *= 2  # ok, modifies only coord value "private" to this x-slice
try:
    a['x', 0:1].coords['y'] *= 2  # not ok, would modify coord value "shared" by all x-slices
except sc.VariableError as e:
    print(f'\'y\' is shared with other \'x\'-slices and should not be modified by the slice, so we get an error:\n{e}')
'y' is shared with other 'x'-slices and should not be modified by the slice, so we get an error:
Read-only flag is set, cannot mutate data.

In practice, a much more dangerous issue this mechanism protects against is unintentional changes to masks. Consider:

[12]:
val = a['x', 1]['y', 1].copy()
val
[12]:
scipp.DataArray (41 Bytes)
    • ()
      float64
      𝟙
      0.0672453259408392
      Values:
      array(0.06724533)
    • mask
      ()
      bool
      False
      Values:
      array(False)
    • aux_x
      ()
      float64
      m
      1.0
      Values:
      array(1.)
    • aux_y
      ()
      float64
      m
      1.0
      Values:
      array(1.)
    • x
      ()
      float64
      m
      1.0
      Values:
      array(1.)
    • y
      ()
      float64
      m
      1.0
      Values:
      array(1.)

If we now assign this scalar val to the slice at y=0 using =, the mask would need to be updated as well. However, the mask in this example depends only on x, so it also applies to the slice at y=1. If updating the mask were allowed, the following would unmask data for all y:

[13]:
try:
    a['y', 0] = val
except sc.DimensionError as e:
    print(e)
Cannot update meta data 'mask' via slice since it is implicitly broadcast along the slice dimension 'y'.

Since we cannot update the mask in a consistent manner the entire operation fails. Data is not modified. The same mechanism is applied for binary arithmetic operations such as += where the masks would be updated using a logical OR operation.
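
The reasoning behind the read-only flags has a close parallel in NumPy: a lower-dimensional array that is implicitly broadcast along another dimension is shared by all slices along that dimension. The sketch below uses only NumPy and is an analogy, not scipp's implementation; `np.broadcast_to` returns a read-only view for much the same reason. The names `mask_x` and `mask_2d` are illustrative.

```python
import numpy as np

# A 1-D mask that depends only on 'x', as in the data array `a` above.
mask_x = np.array([True, False, False])

# Broadcasting along 'y' does not duplicate the mask: both rows alias
# the same three elements.
mask_2d = np.broadcast_to(mask_x, (2, 3))
print(np.shares_memory(mask_x, mask_2d))
print(mask_2d.flags.writeable)

# Writing to the row for y=0 would silently change the row for y=1 too,
# so NumPy refuses, much like scipp rejects the update via a slice.
try:
    mask_2d[0, 0] = False
except ValueError as e:
    print(e)
```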

The purpose of turning coords into attrs when slicing without a range is to support useful operations such as:

[14]:
a - a['x', 1]  # compute difference compared to data at x=1
[14]:
scipp.DataArray (107 Bytes)
    • y: 2
    • x: 3
    • x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y, x)
      float64
      𝟙
      0.115, 0.0, ..., 0.0, 0.058
      Values:
      array([[ 0.11474878, 0. , -0.19239198], [ 0.68272623, 0. , 0.05837119]])
    • mask
      (x)
      bool
      True, False, False
      Values:
      array([ True, False, False])
    • aux_y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])

If a['x', 1] had an x coordinate this would fail due to a coord mismatch. If coord checking is required, use a range-slice such as a['x', 1:2]. Compare the two cases shown below and inspect the dims and shape of all variables (data and coordinates) of the resulting slice views (the tooltip shown when moving the mouse over a name also contains this information):

[15]:
sc.show(a['y', 1:2])  # Range of length 1
a['y', 1:2]
[sc.show diagram of a['y', 1:2]: data (dims=['y', 'x'], shape=[1, 3], unit=dimensionless); coords 'x' (dims=['x'], shape=[3], unit=m) and 'y' (dims=['y'], shape=[1], unit=m); mask 'mask' (dims=['x'], shape=[3]); attrs 'aux_x' (dims=['x'], shape=[3], unit=m) and 'aux_y' (dims=['y'], shape=[1], unit=m)]
[15]:
scipp.DataArray (91 Bytes out of 131 Bytes)
    • y: 1
    • x: 3
    • x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • y
      (y)
      float64
      m
      1.0
      Values:
      array([1.])
    • (y, x)
      float64
      𝟙
      0.750, 0.067, 0.126
      Values:
      array([[0.74997156, 0.06724533, 0.12561651]])
    • mask
      (x)
      bool
      True, False, False
      Values:
      array([ True, False, False])
    • aux_x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • aux_y
      (y)
      float64
      m
      1.0
      Values:
      array([1.])
[16]:
sc.show(a['y', 1])  # No range
a['y', 1]
[sc.show diagram of a['y', 1]: data (dims=['x'], shape=[3], unit=dimensionless); 'y' and 'aux_y' are scalars (dims=[], shape=[], unit=m); 'x', 'mask', and 'aux_x' keep dims=['x'], shape=[3]]
[16]:
scipp.DataArray (91 Bytes out of 131 Bytes)
    • x: 3
    • x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • (x)
      float64
      𝟙
      0.750, 0.067, 0.126
      Values:
      array([0.74997156, 0.06724533, 0.12561651])
    • mask
      (x)
      bool
      True, False, False
      Values:
      array([ True, False, False])
    • aux_x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • aux_y
      ()
      float64
      m
      1.0
      Values:
      array(1.)
    • y
      ()
      float64
      m
      1.0
      Values:
      array(1.)

Datasets

Slicing for datasets works just like for data arrays. In addition to changing certain coords into attrs and marking certain meta data entries as read-only, slicing a dataset also marks lower-dimensional data entries as read-only. Consider a dataset d:

[17]:
d = sc.Dataset(
    data={
        'a': sc.array(dims=['y', 'x'], values=np.random.rand(2, 3)),
        'b': sc.array(dims=['x', 'y'], values=np.random.rand(3, 2)),
        'c': sc.array(dims=['y'], values=np.random.rand(2)),
        '0d-data': sc.scalar(1.0)},
    coords={
        'x': sc.array(dims=['x'], values=np.arange(3.0), unit=sc.units.m),
        'y': sc.array(dims=['y'], values=np.arange(2.0), unit=sc.units.m)})
sc.show(d)
[sc.show diagram of d: 'a' (dims=['y', 'x'], shape=[2, 3]); 'b' (dims=['x', 'y'], shape=[3, 2]); 'c' (dims=['y'], shape=[2]); '0d-data' (scalar); coords 'x' (dims=['x'], shape=[3], unit=m) and 'y' (dims=['y'], shape=[2], unit=m)]

and a slice of d:

[18]:
sc.show(d['y', 0])
[sc.show diagram of d['y', 0]: 'a' (dims=['x'], shape=[3]); 'b' (dims=['x'], shape=[3]); 'c' (scalar); '0d-data' (scalar); coord 'x' (dims=['x'], shape=[3], unit=m)]

By marking lower-dimensional entries in the slice as read-only we prevent unintentional multiple modifications of the same scalar:

[19]:
try:
    d['y', 0] += 1  # would add 1 to `0d-data`
    d['y', 1] += 2  # would add 2 to `0d-data`
except sc.VariableError as e:
    print(e)
Read-only flag is set, cannot mutate data.

This is an important aspect and it is worthwhile to take some time and think through the mechanism.

Slicing a data item of a dataset should not bring any surprises. Essentially this behaves like slicing a data array:

[20]:
sc.show(d['a']['x', 1:2])
[sc.show diagram of d['a']['x', 1:2]: data (dims=['y', 'x'], shape=[2, 1], unit=dimensionless); coords 'y' (dims=['y'], shape=[2], unit=m) and 'x' (dims=['x'], shape=[1], unit=m)]

Slicing and item access can be done in arbitrary order with identical results:

[21]:
d['x', 1:2]['a'] == d['a']['x', 1:2]
d['x', 1:2]['a'].coords['x'] == d.coords['x']['x', 1:2]
[21]:
scipp.Variable (1 Bytes)
    • (x: 1)
      bool
      True
      Values:
      array([ True])

Label-based indexing

Overview

Data in a dataset or data array can be selected by coordinate value. This is similar to pandas.DataFrame.loc in pandas. Scipp leverages its ubiquitous support for physical units to provide label-based indexing in an intuitive manner, using the same syntax as positional indexing. For example:

  • array['x', 0:3] selects positionally, i.e., returns the first three elements along 'x'.

  • array['x', 1.2*sc.units.m:1.3*sc.units.m] selects by label, i.e., returns the elements along 'x' falling between 1.2 m and 1.3 m.

That is, label-based indexing is made via __getitem__ and __setitem__ with a dimension label as first argument and, as second argument, either a scalar variable or a Python slice (as created by the colon operator :) of two scalar variables. In all cases a view is returned, i.e., just like when slicing a numpy.ndarray no copy is performed.

Consider:

[22]:
da = sc.DataArray(
    data=sc.array(dims=['year','x'], values=np.random.random((3, 7))),
    coords={
        'x': sc.array(dims=['x'], values=np.linspace(0.1, 0.9, num=7), unit=sc.units.m),
        'year': sc.array(dims=['year'], values=[2020,2023,2027])})
sc.show(da)
da
[sc.show diagram of da: data (dims=['year', 'x'], shape=[3, 7], unit=dimensionless); coords 'year' (dims=['year'], shape=[3], unit=dimensionless) and 'x' (dims=['x'], shape=[7], unit=m)]
[22]:
scipp.DataArray (248 Bytes)
    • year: 3
    • x: 7
    • x
      (x)
      float64
      m
      0.1, 0.233, ..., 0.767, 0.9
      Values:
      array([0.1 , 0.23333333, 0.36666667, 0.5 , 0.63333333, 0.76666667, 0.9 ])
    • year
      (year)
      int64
      𝟙
      2020, 2023, 2027
      Values:
      array([2020, 2023, 2027])
    • (year, x)
      float64
      𝟙
      0.167, 0.622, ..., 0.199, 0.750
      Values:
      array([[0.1671047 , 0.6222413 , 0.5590205 , 0.48905804, 0.35423582, 0.50849897, 0.62686244], [0.47029184, 0.25629033, 0.42196477, 0.7233372 , 0.06477951, 0.92316231, 0.88674644], [0.12850545, 0.31899502, 0.61892534, 0.24962953, 0.01610898, 0.19949215, 0.74974637]])

We can select a slice of da based on the 'year' labels:

[23]:
year = sc.scalar(2023)
da['year', year]
[23]:
scipp.DataArray (120 Bytes out of 248 Bytes)
    • x: 7
    • x
      (x)
      float64
      m
      0.1, 0.233, ..., 0.767, 0.9
      Values:
      array([0.1 , 0.23333333, 0.36666667, 0.5 , 0.63333333, 0.76666667, 0.9 ])
    • (x)
      float64
      𝟙
      0.470, 0.256, ..., 0.923, 0.887
      Values:
      array([0.47029184, 0.25629033, 0.42196477, 0.7233372 , 0.06477951, 0.92316231, 0.88674644])
    • year
      ()
      int64
      𝟙
      2023
      Values:
      array(2023)

In this case 2023 is the second element of the coordinate, so this is equivalent to positionally slicing da['year', 1], and the usual rules regarding dropping dimensions and converting dimension coordinates to attributes apply:

[24]:
assert sc.identical(da['year', year], da['year', 1])

Warning

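It is essential not to mix up integers and scalar scipp variables containing an integer. An integer is interpreted as a position, whereas a scalar variable is interpreted as a coordinate value, so the two generally select different slices: in the example above, position 1 corresponds to the label 2023.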

Note

Here, we created year using sc.scalar. Alternatively, we could use year = 2023 * sc.units.dimensionless. Multiplying a number by a unit is particularly convenient for coordinates with a physical unit, such as 'x' in this example; see below.

For floating-point-valued coordinates, selecting a single point would require an exact match, which is typically not feasible in practice. Scipp does not do fuzzy matching in this case; instead, an IndexError is raised:

[25]:
x = 0.23 * sc.units.m # No x coordinate value at this point. Equivalent of sc.scalar(0.23, unit=sc.units.m)
try:
    da['x', x]
except IndexError as e:
    print(str(e))
Coord x does not contain unique point with value <scipp.Variable> ()    float64              [m]  [0.23]

For such coordinates we may thus use an interval to select a range of values using the : operator:

[26]:
x_left = 0.1 * sc.units.m
x_right = 0.4 * sc.units.m
da['x', x_left:x_right]
[26]:
scipp.DataArray (120 Bytes out of 248 Bytes)
    • year: 3
    • x: 3
    • x
      (x)
      float64
      m
      0.1, 0.233, 0.367
      Values:
      array([0.1 , 0.23333333, 0.36666667])
    • year
      (year)
      int64
      𝟙
      2020, 2023, 2027
      Values:
      array([2020, 2023, 2027])
    • (year, x)
      float64
      𝟙
      0.167, 0.622, ..., 0.319, 0.619
      Values:
      array([[0.1671047 , 0.6222413 , 0.5590205 ], [0.47029184, 0.25629033, 0.42196477], [0.12850545, 0.31899502, 0.61892534]])

The selection includes the bounds on the “left” but excludes the bounds on the “right”, i.e., we select the half-open interval \(x \in [x_{\text{left}},x_{\text{right}})\), closed on the left and open on the right.
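
To make the half-open convention concrete, the following sketch emulates range selection on an ascending coordinate with plain NumPy. It only illustrates the interval semantics described above and is not scipp's actual implementation; the helper `select_range` is hypothetical and ignores units.

```python
import numpy as np


def select_range(coord, left, right):
    """Return the positional slice covering left <= coord < right.

    Assumes `coord` is sorted in ascending order.
    """
    start = np.searchsorted(coord, left, side='left')   # first coord >= left
    stop = np.searchsorted(coord, right, side='left')   # first coord >= right
    return slice(int(start), int(stop))


coord = np.linspace(0.1, 0.9, num=7)  # same values as the 'x' coord above

first = select_range(coord, 0.1, 0.2)
second = select_range(coord, 0.2, 0.4)
print(coord[first])   # points in [0.1, 0.2)
print(coord[second])  # points in [0.2, 0.4)
```

Because the right bound is excluded, the two slices are adjacent and share no element.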

The half-open interval implies that we can select consecutive intervals without including any data point in both intervals:

[27]:
x_mid = 0.2 * sc.units.m
sc.to_html(da['x', x_left:x_mid])
sc.to_html(da['x', x_mid:x_right])
scipp.DataArray (56 Bytes out of 248 Bytes)
    • year: 3
    • x: 1
    • x
      (x)
      float64
      m
      0.1
      Values:
      array([0.1])
    • year
      (year)
      int64
      𝟙
      2020, 2023, 2027
      Values:
      array([2020, 2023, 2027])
    • (year, x)
      float64
      𝟙
      0.167, 0.470, 0.129
      Values:
      array([[0.1671047 ], [0.47029184], [0.12850545]])
scipp.DataArray (88 Bytes out of 248 Bytes)
    • year: 3
    • x: 2
    • x
      (x)
      float64
      m
      0.233, 0.367
      Values:
      array([0.23333333, 0.36666667])
    • year
      (year)
      int64
      𝟙
      2020, 2023, 2027
      Values:
      array([2020, 2023, 2027])
    • (year, x)
      float64
      𝟙
      0.622, 0.559, ..., 0.319, 0.619
      Values:
      array([[0.6222413 , 0.5590205 ], [0.25629033, 0.42196477], [0.31899502, 0.61892534]])

Just like when slicing positionally one of the bounds can be omitted, to include either everything from the start, or everything until the end:

[28]:
da['x', :x_right]
[28]:
scipp.DataArray (120 Bytes out of 248 Bytes)
    • year: 3
    • x: 3
    • x
      (x)
      float64
      m
      0.1, 0.233, 0.367
      Values:
      array([0.1 , 0.23333333, 0.36666667])
    • year
      (year)
      int64
      𝟙
      2020, 2023, 2027
      Values:
      array([2020, 2023, 2027])
    • (year, x)
      float64
      𝟙
      0.167, 0.622, ..., 0.319, 0.619
      Values:
      array([[0.1671047 , 0.6222413 , 0.5590205 ], [0.47029184, 0.25629033, 0.42196477], [0.12850545, 0.31899502, 0.61892534]])

Coordinates used for label-based indexing must be monotonically ordered. While it is natural to think of slicing in terms of ascending coordinates, the slicing mechanism also works for descending coordinates.

Bin-edge coordinates

Bin-edge coordinates are handled slightly differently from standard coordinates in label-based indexing. Consider:

[29]:
da = sc.DataArray(
    data = sc.array(dims=['x'], values=np.random.random(7)),
    coords={
        'x': sc.array(dims=['x'], values=np.linspace(1.0, 2.0, num=8), unit=sc.units.m)})
da
[29]:
scipp.DataArray (120 Bytes)
    • x: 7
    • x
      (x [bin-edge])
      float64
      m
      1.0, 1.143, ..., 1.857, 2.0
      Values:
      array([1. , 1.14285714, 1.28571429, 1.42857143, 1.57142857, 1.71428571, 1.85714286, 2. ])
    • (x)
      float64
      𝟙
      0.497, 0.003, ..., 0.708, 0.305
      Values:
      array([0.4973011 , 0.00274724, 0.93056575, 0.53321967, 0.07771822, 0.70756957, 0.30490565])

Here 'x' is a bin-edge coordinate, i.e., its length exceeds the length of the array dimension by one. Label-based slicing with a single coord value finds and returns the bin that contains the given coord value:

[30]:
x = 1.5 * sc.units.m
da['x', x]
[30]:
scipp.DataArray (24 Bytes out of 120 Bytes)
    • ()
      float64
      𝟙
      0.5332196701385937
      Values:
      array(0.53321967)
    • x
      (x [bin-edge])
      float64
      m
      1.429, 1.571
      Values:
      array([1.42857143, 1.57142857])

If an interval is provided when slicing with a bin-edge coordinate, the range of bins that contain values falling into the right-open interval is selected:

[31]:
x_left = 1.3 * sc.units.m
x_right = 1.7 * sc.units.m
da['x', x_left:x_right]
[31]:
scipp.DataArray (56 Bytes out of 120 Bytes)
    • x: 3
    • x
      (x [bin-edge])
      float64
      m
      1.286, 1.429, 1.571, 1.714
      Values:
      array([1.28571429, 1.42857143, 1.57142857, 1.71428571])
    • (x)
      float64
      𝟙
      0.931, 0.533, 0.078
      Values:
      array([0.93056575, 0.53321967, 0.07771822])
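
The two bin-edge lookups above can be mimicked with NumPy to see which bins are selected. This is a sketch under assumptions: the helpers `find_bin` and `find_bins` are hypothetical, assume ascending edges, and are only checked against the two examples above, so behaviour at exact edge values may differ from scipp.

```python
import numpy as np

edges = np.linspace(1.0, 2.0, num=8)  # 8 edges delimit 7 bins


def find_bin(edges, value):
    """Index of the bin with edges[i] <= value < edges[i + 1]."""
    return int(np.searchsorted(edges, value, side='right')) - 1


def find_bins(edges, left, right):
    """Slice of all bins that overlap the half-open interval [left, right)."""
    start = find_bin(edges, left)
    stop = int(np.searchsorted(edges, right, side='left'))
    return slice(start, stop)


i = find_bin(edges, 1.5)
print(i, edges[i:i + 2])  # bin containing 1.5 and its two edges

bins = find_bins(edges, 1.3, 1.7)
print(bins, edges[bins.start:bins.stop + 1])  # n bins have n + 1 edges
```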

Limitations

Label-based indexing is not supported for:

  • Multi-dimensional coordinates.

  • Non-monotonic coordinates.

The first is a fundamental limitation since a slice cannot be defined in such a case. The latter will likely be supported in the future to some extent.

Advanced indexing

Integer array indexing

Indexing a variable, data array, or dataset with an integer array (instead of an integer or slice) returns a new variable, data array, or dataset containing the elements at the selected positions.

This is identical to positional indexing except that multiple “positions” (but not ranges) can be specified at once, and a single object is returned.

Consider:

[32]:
import scipp as sc

var = sc.arange('dummy', 12).fold(dim='dummy', sizes={'x':6, 'y':2})
var
[32]:
scipp.Variable (96 Bytes)
    • (x: 6, y: 2)
      int64
      𝟙
      0, 1, ..., 10, 11
      Values:
      array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11]])

Indexing with a list [1,2,5] extracts the corresponding slices and returns a single output object containing them:

[33]:
var['x', [1,2,5]]
[33]:
scipp.Variable (48 Bytes)
    • (x: 3, y: 2)
      int64
      𝟙
      2, 3, ..., 10, 11
      Values:
      array([[ 2, 3], [ 4, 5], [10, 11]])

Note

By necessity — since the returned elements are generally non-contiguous — indexing with an integer array returns a copy of the input data. This is in contrast to positional indexing and label-based indexing which return a view.

Note that indexing with an index array always returns a copy, even if the selected elements form a contiguous range.

For 1-D objects the dimension label can be omitted:

[34]:
var1d = sc.linspace(dim='x', start=0.1, stop=0.2, num=5)
var1d[[3,1,3]]
[34]:
scipp.Variable (24 Bytes)
    • (x: 3)
      float64
      𝟙
      0.175, 0.125, 0.175
      Values:
      array([0.175, 0.125, 0.175])

Boolean variable indexing

Indexing a variable, data array, or dataset with a condition variable of dtype=bool returns a new variable, data array, or dataset including the elements where the condition is True.

The condition variable must be 1-D and must be compatible with the shape of the indexed object. Consider a 2-D variable:

[35]:
import scipp as sc

var = sc.arange('dummy', 12).fold(dim='dummy', sizes={'x':6, 'y':2})
var
[35]:
scipp.Variable (96 Bytes)
    • (x: 6, y: 2)
      int64
      𝟙
      0, 1, ..., 10, 11
      Values:
      array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11]])

Indexing with a boolean variable corresponds to extracting rows or columns:

[36]:
condition = sc.array(dims=['x'], values=[True, False, False, True, False, False])
var[condition]
[36]:
scipp.Variable (32 Bytes)
    • (x: 2, y: 2)
      int64
      𝟙
      0, 1, 6, 7
      Values:
      array([[0, 1], [6, 7]])
[37]:
condition = sc.array(dims=['y'], values=[False, True])
var[condition]
[37]:
scipp.Variable (48 Bytes)
    • (x: 6, y: 1)
      int64
      𝟙
      1, 3, ..., 9, 11
      Values:
      array([[ 1], [ 3], [ 5], [ 7], [ 9], [11]])

Note

By necessity — since the returned elements are generally non-contiguous — indexing with a condition variable returns a copy of the input data. This is in contrast to positional indexing and label-based indexing which return a view.

Note that indexing with a condition variable always returns a copy, even if the selected elements form a contiguous range.

Given a multi-dimensional condition variable such as

[38]:
condition = var < 5
condition
[38]:
scipp.Variable (12 Bytes)
    • (x: 6, y: 2)
      bool
      True, True, ..., False, False
      Values:
      array([[ True, True], [ True, True], [ True, False], [False, False], [False, False], [False, False]])

an indexing attempt will raise an error:

[39]:
var[condition]
---------------------------------------------------------------------------
DimensionError                            Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 var[condition]

DimensionError: Expected 1 dimensions, got 2

Unlike NumPy’s boolean array indexing, scipp does not support this, since it would require automatic flattening of the output, which is incompatible with scipp’s philosophy of enforcing labeled dimensions. If such indexing is required, flatten by hand instead:

[40]:
var.flatten(to='elem')[condition.flatten(to='elem')]
[40]:
scipp.Variable (40 Bytes)
    • (elem: 5)
      int64
      𝟙
      0, 1, 2, 3, 4
      Values:
      array([0, 1, 2, 3, 4])
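
For comparison, this is the NumPy behaviour that scipp deliberately avoids: a multi-dimensional boolean index silently collapses the result to a single unnamed dimension. The sketch uses only NumPy; the explicit `reshape(-1)` variant plays the role of the manual flattening shown above, and the names `arr`, `cond`, `implicit`, and `explicit` are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(6, 2)  # same values as `var`, without dim labels
cond = arr < 5

# NumPy flattens implicitly: a 2-D input yields a 1-D result.
implicit = arr[cond]
print(implicit.shape)

# Flattening both operands by hand gives the same elements, but makes the
# change of dimensionality visible in the code.
explicit = arr.reshape(-1)[cond.reshape(-1)]
print(np.array_equal(implicit, explicit))
```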

When the indexed object has a bin-edge coordinate along the dimension being indexed, this coordinate is dropped from the output, since edges from non-adjacent bins are generally not compatible, i.e., we cannot define a sensible output coordinate:

[41]:
da = sc.DataArray(var)
da.coords['x'] = sc.arange('x', 7)
da.coords['x2'] = sc.arange('x', 6)
da
[41]:
scipp.DataArray (200 Bytes)
    • x: 6
    • y: 2
    • x
      (x [bin-edge])
      int64
      𝟙
      0, 1, ..., 5, 6
      Values:
      array([0, 1, 2, 3, 4, 5, 6])
    • x2
      (x)
      int64
      𝟙
      0, 1, ..., 4, 5
      Values:
      array([0, 1, 2, 3, 4, 5])
    • (x, y)
      int64
      𝟙
      0, 1, ..., 10, 11
      Values:
      array([[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11]])
[42]:
condition = sc.array(dims=['x'], values=[True, False, False, True, False, False])
da[condition]
[42]:
scipp.DataArray (48 Bytes)
    • x: 2
    • y: 2
    • x2
      (x)
      int64
      𝟙
      0, 3
      Values:
      array([0, 3])
    • (x, y)
      int64
      𝟙
      0, 1, 6, 7
      Values:
      array([[0, 1], [6, 7]])