Masking

[1]:
import numpy as np
import scipp as sc

Creating and manipulating masks

Masks are simply variables with dtype=bool:

[2]:
mask = sc.Variable(dims=['x'], values=[False, False, True])
sc.table(mask)

Boolean operators can be used to manipulate such variables:

[3]:
print(~mask)
print(mask ^ mask)
print(mask & ~mask)
print(mask | ~mask)
<scipp.Variable> (x: 3)       bool  [dimensionless]  [True, True, False]
<scipp.Variable> (x: 3)       bool  [dimensionless]  [False, False, False]
<scipp.Variable> (x: 3)       bool  [dimensionless]  [False, False, False]
<scipp.Variable> (x: 3)       bool  [dimensionless]  [True, True, True]

Comparison operators such as ==, !=, <, or >= (see also the list of comparison functions) are a common method of defining masks:

[4]:
var = sc.Variable(dims=['x'], values=np.random.random(5), unit=sc.units.m)
mask2 = var < 0.5 * sc.units.m
mask2
[4]:
scipp.Variable (5 Bytes)
    • (x: 5)
      bool
      True, True, True, True, True
      Values:
      array([ True, True, True, True, True])
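
Because the values above are random, the resulting mask differs from run to run. The following is a minimal sketch in plain NumPy (not scipp, and without units) that uses fixed values so the outcome is reproducible. It also shows how two comparisons can be combined with a boolean operator. The names `values`, `below`, and `outside` are chosen here for illustration only.

```python
import numpy as np

# Fixed values (think of them as metres) so the mask is reproducible.
values = np.array([0.1, 0.4, 0.5, 0.7, 0.9])

# Mask everything strictly below 0.5, analogous to `var < 0.5 * sc.units.m`.
below = values < 0.5

# Comparisons can be combined with boolean operators, here to mask
# everything outside the closed interval [0.4, 0.7].
outside = (values < 0.4) | (values > 0.7)

print(below)
print(outside)
```

Note that the boundary values 0.4 and 0.7 are not masked by `outside`, since both comparisons are strict.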

Masks in data arrays and dataset items

Data arrays and, equivalently, dataset items can store arbitrary masks. Datasets themselves do not support masks. Masks are set with the masks keyword argument and accessed through the masks property, which behaves in the same way as coords:

[5]:
a = sc.DataArray(
    data = sc.Variable(dims=['y', 'x'], values=np.arange(1.0, 7.0).reshape((2, 3))),
    coords={
        'y': sc.Variable(dims=['y'], values=np.arange(2.0), unit=sc.units.m),
        'x': sc.Variable(dims=['x'], values=np.arange(3.0), unit=sc.units.m)},
    masks={
        'x': sc.Variable(dims=['x'], values=[False, False, True])}
    )
sc.show(a)
[Figure: output of sc.show(a). Data with dims=['y', 'x'], shape=[2, 3], dimensionless; coord 'y' with dims=['y'], shape=[2], unit=m; coord 'x' with dims=['x'], shape=[3], unit=m; mask 'x' with dims=['x'], shape=[3], dimensionless.]
[6]:
b = a.copy()
b.masks['x'].values[1] = True
b.masks['y'] = sc.Variable(dims=['y'], values=[False, True])

Note that setting a mask does not affect the data.

Masks of dataset items are accessed using the masks property of the item:

[7]:
ds = sc.Dataset(data={'a':a})
ds['a'].masks['x']
[7]:
scipp.Variable (3 Bytes)
    • (x: 3)
      bool
      False, False, True
      Values:
      array([False, False, True])

Operations with masked objects

Element-wise binary operations

The result of an operation between data arrays or datasets with masks contains the masks of both inputs. If both inputs contain a mask with the same name, the output mask is the logical OR of the two input masks:

[8]:
a + b
[8]:
scipp.DataArray (93 Bytes)
    • y: 2
    • x: 3
    • x
      (x)
      float64
      m
      0.0, 1.0, 2.0
      Values:
      array([0., 1., 2.])
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y, x)
      float64
      2.0, 4.0, ..., 10.0, 12.0
      Values:
      array([[ 2., 4., 6.], [ 8., 10., 12.]])
    • x
      (x)
      bool
      False, True, True
      Values:
      array([False, True, True])
    • y
      (y)
      bool
      False, True
      Values:
      array([False, True])
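
The combination rule can be sketched in plain NumPy. This is only an illustration of the semantics described above, not scipp's implementation. The helper `combine_masks` and the dictionaries `masks_a` and `masks_b` are hypothetical names, with each dictionary mapping a mask name to a boolean array in the same way as the masks of a and b.

```python
import numpy as np

def combine_masks(masks_a, masks_b):
    """Hypothetical helper: merge two {name: boolean array} dictionaries.

    Masks present in only one input are copied to the output; masks
    with the same name in both inputs are combined with a logical OR.
    """
    combined = {name: mask.copy() for name, mask in masks_a.items()}
    for name, mask in masks_b.items():
        if name in combined:
            combined[name] = combined[name] | mask
        else:
            combined[name] = mask.copy()
    return combined

# Same mask values as `a` and `b` in the cells above.
masks_a = {'x': np.array([False, False, True])}
masks_b = {'x': np.array([False, True, True]),
           'y': np.array([False, True])}

result = combine_masks(masks_a, masks_b)
print(result['x'])
print(result['y'])
```

The 'x' entry of the result is the OR of both 'x' masks, while 'y' is carried over unchanged because only one input has it.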

Reduction operations

Operations like sum and mean over a particular dimension cannot preserve masks that depend on this dimension. If this is the case, the mask is applied during the operation and is not present in the output:

[9]:
sc.sum(a, 'x')
[9]:
scipp.DataArray (32 Bytes)
    • y: 2
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y)
      float64
      3.0, 9.0
      Values:
      array([3., 9.])
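
To see where these numbers come from, the masked sum can be reproduced with plain NumPy. This is a sketch of the semantics only, under the assumption that applying a mask during a sum is equivalent to replacing masked elements with zero; it does not show how scipp implements the operation. The names `data`, `mask_x`, and `masked_sum` are illustrative.

```python
import numpy as np

# Same values as the data of `a`, with dimensions ordered (y, x).
data = np.arange(1.0, 7.0).reshape((2, 3))

# Same values as the 'x' mask of `a`; it depends on x only.
mask_x = np.array([False, False, True])

# The 1-D mask broadcasts along y. Masked elements contribute 0 to the
# sum over x, which is axis 1 of the array.
masked_sum = np.where(mask_x, 0.0, data).sum(axis=1)
print(masked_sum)
```

The last column (3.0 and 6.0) is excluded, leaving 1 + 2 and 4 + 5.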

The mean operation takes into account that masking reduces the number of points contributing to the mean, i.e., masked elements are excluded from the count (in contrast to, e.g., treating them as 0):

[10]:
sc.mean(a, 'x')
[10]:
scipp.DataArray (32 Bytes)
    • y: 2
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y)
      float64
      1.5, 4.5
      Values:
      array([1.5, 4.5])
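
The difference between excluding masked elements and treating them as 0 can be made concrete with a small NumPy sketch. It illustrates the arithmetic only and is not scipp's implementation; the names `total`, `count`, `masked_mean`, and `zero_filled_mean` are chosen here for illustration.

```python
import numpy as np

# Same values as the data and 'x' mask of `a`, dimensions (y, x).
data = np.arange(1.0, 7.0).reshape((2, 3))
mask_x = np.array([False, False, True])

# Sum over x with masked elements contributing nothing.
total = np.where(mask_x, 0.0, data).sum(axis=1)

# Number of unmasked elements along x.
count = np.count_nonzero(~mask_x)

# Masked mean: divide by the number of unmasked elements.
masked_mean = total / count

# For contrast: treating masked elements as 0 divides by the full length.
zero_filled_mean = total / data.shape[1]

print(masked_mean)
print(zero_filled_mean)
```

With two unmasked elements per row, the masked mean is 3 / 2 and 9 / 2, whereas dividing by the full length of 3 would give 1 and 3.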

If a mask does not depend on the dimension used for the sum or mean operation, it is preserved. Here b has two masks, one that is applied and one that is preserved:

[11]:
sc.sum(b, 'x')
[11]:
scipp.DataArray (34 Bytes)
    • y: 2
    • y
      (y)
      float64
      m
      0.0, 1.0
      Values:
      array([0., 1.])
    • (y)
      float64
      1.0, 4.0
      Values:
      array([1., 4.])
    • y
      (y)
      bool
      False, True
      Values:
      array([False, True])
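
The rule for which masks are applied and which are preserved can be sketched as follows. This is a hedged NumPy illustration under simplifying assumptions: every mask is one-dimensional, and `sum_with_masks` is a hypothetical helper that is not part of scipp. Each mask is stored as a pair of its dimension labels and its boolean values.

```python
import numpy as np

def sum_with_masks(data, dims, masks, dim):
    """Hypothetical helper: sum `data` over `dim`, honouring 1-D masks.

    masks maps a name to (mask_dims, boolean array). Masks that depend
    on `dim` are applied and dropped; all other masks are preserved.
    """
    axis = dims.index(dim)
    applied = np.zeros(data.shape, dtype=bool)
    preserved = {}
    for name, (mask_dims, mask) in masks.items():
        if dim in mask_dims:
            # Reshape the 1-D mask so it broadcasts against the data.
            shape = [1] * data.ndim
            shape[dims.index(mask_dims[0])] = mask.size
            applied |= mask.reshape(shape)
        else:
            preserved[name] = (mask_dims, mask)
    summed = np.where(applied, 0.0, data).sum(axis=axis)
    return summed, preserved

# Same data and mask values as `b` in the cells above.
data = np.arange(1.0, 7.0).reshape((2, 3))
masks = {'x': (['x'], np.array([False, True, True])),
         'y': (['y'], np.array([False, True]))}

summed, preserved = sum_with_masks(data, ['y', 'x'], masks, 'x')
print(summed)
print(list(preserved))
```

Only the first column survives the 'x' mask, so the sums are 1.0 and 4.0, and the 'y' mask is returned untouched because it does not depend on x.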