Concepts
DataArray and Dataset metadata handling
This section describes how the coords (and masks) of datasets and data arrays behave when slicing, combining, or inserting.
[1]:
import numpy as np
import scipp as sc
x = sc.Variable(dims=['x'], values=[1, 2, 3, 4])
da = sc.DataArray(data=x, coords={'x': x}, masks={'x': sc.less(x, 2 * sc.units.one)})
ds = sc.Dataset(data={'a': da})
Consider a data array da and a dataset ds with an aligned coord and a mask. The following conditions must hold:
[2]:
assert da['x', 0:1].coords['x'].aligned # range slice preserves coord and alignment
assert 'x' in da['x', 0:1].masks # range slice preserves mask
assert not da['x', 0].coords['x'].aligned # point slice makes coord unaligned
assert 'x' in da['x', 0].masks # point slice preserves masks
[3]:
assert sc.identical(ds['a']['x', 0:1], ds['x', 0:1]['a'])
assert sc.identical(ds['a']['x', 0], ds['x', 0]['a'])
[4]:
assert ds['a'].coords['x'].aligned
assert ds['x', 0:1].coords['x'].aligned
assert not ds['x', 0].coords['x'].aligned
assert 'x' in ds['a'].masks
assert 'x' in ds['x', 0:1]['a'].masks
assert 'x' in ds['a']['x', 0].masks
assert 'x' in ds['x', 0]['a'].masks
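The slicing rule behind these assertions can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not scipp's implementation: ToyCoord and toy_slice are hypothetical names, and the coord is restricted to one dimension. A range slice keeps the sliced dimension, so the coord keeps its aligned flag; a point slice removes the dimension, so the remaining 0-D coord becomes unaligned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCoord:
    """Hypothetical stand-in for a 1-D coord: dims, values, and aligned flag."""
    dims: tuple
    values: tuple  # a 0-D coord stores its single value as a 1-tuple
    aligned: bool = True


def toy_slice(coord, dim, key):
    """Slice a toy coord along dim with an int (point) or a slice (range)."""
    if dim not in coord.dims:
        # Coords that do not depend on the sliced dim are left untouched.
        return coord
    if isinstance(key, int):
        # Point slice: the dim disappears, so the result is 0-D and unaligned.
        return ToyCoord(dims=(), values=(coord.values[key],), aligned=False)
    # Range slice: the dim survives and the aligned flag is preserved.
    return ToyCoord(dims=coord.dims, values=coord.values[key], aligned=coord.aligned)


x = ToyCoord(dims=("x",), values=(1, 2, 3, 4))
print(toy_slice(x, "x", slice(0, 1)))  # aligned=True, like da['x', 0:1]
print(toy_slice(x, "x", 0))            # aligned=False, like da['x', 0]
```

Masks are deliberately absent from the toy: as the assertions show, both kinds of slice keep them.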
In binary operations, aligned coords are compared, and a mismatch raises an error:
[5]:
try:
    # The aligned x-coords of the two slices hold different values.
    da['x', 0:1] + da['x', 1:2]
except RuntimeError:
    ok = False
else:
    ok = True
assert not ok
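A minimal sketch of this comparison, assuming the aligned coords of each operand are plain tuples keyed by name. merge_aligned is a hypothetical helper, and keeping a coord that is present on only one side is an assumption of this sketch, not something the cell above tests.

```python
def merge_aligned(lhs, rhs):
    """Hypothetical toy: combine the aligned coords of two operands.

    Shared names must hold identical values; otherwise a ValueError is raised.
    Assumption: a coord present on only one side is kept in the result.
    """
    out = dict(lhs)
    for name, values in rhs.items():
        if name in out and out[name] != values:
            raise ValueError(f"Mismatch in aligned coord {name!r}: {out[name]} != {values}")
        out[name] = values
    return out


# Mirrors da['x', 0:1] + da['x', 1:2]: the aligned x-coords hold (1,) and (2,).
try:
    merge_aligned({"x": (1,)}, {"x": (2,)})
except ValueError as err:
    print(err)
```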
Mismatching unaligned coords are dropped:
[6]:
assert sc.identical(da + da['x', 1], da + da['x', 1].data)
Masks are combined with a logical OR; there is no concept of “unaligned masks”:
[7]:
assert not sc.identical(da + da['x', 0], da + da['x', 0].data)
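Why the two results differ can be reproduced with plain NumPy arrays standing in for the masks (an illustration, not scipp code). The slice da['x', 0] keeps a 0-D mask with value True, and OR-ing it into the 1-D mask broadcasts over 'x'.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
mask = x < 2          # like da.masks['x']: [True, False, False, False]
point_mask = mask[0]  # like da['x', 0].masks['x']: a 0-D mask, True

# Masks are combined with a logical OR; the 0-D mask broadcasts over 'x'.
combined = mask | point_mask
print(combined)  # every element is now True
```

By contrast, da + da['x', 0].data has no mask on the right-hand operand, so the result keeps the original mask [True, False, False, False].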
A missing unaligned coord is treated as a mismatch, which keeps addition associative:
[8]:
a = da['x', 0].copy()
b = da['x', 1].copy()
c = da['x', 2].copy()
assert sc.identical(a + (b + c), (a + b) + c)
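The reason for treating a missing coord as a mismatch can be checked with a toy model of the unaligned-coord rule. merge_strict follows the two rules above (mismatching unaligned coords are dropped, missing counts as mismatch); merge_lenient is a hypothetical alternative that ignores a missing coord. Both helpers and the MISSING sentinel are illustrative names, not scipp API. The values 1, 2, 3 match the x-coords of the point slices a, b, c in the cell above.

```python
MISSING = None  # hypothetical sentinel: the coord is absent from an operand


def merge_strict(a, b):
    """Keep an unaligned coord only if both sides hold equal values."""
    if a is MISSING or b is MISSING or a != b:
        return MISSING  # mismatch, or missing treated as mismatch: dropped
    return a


def merge_lenient(a, b):
    """Hypothetical alternative: a missing coord is ignored, not a mismatch."""
    if a is MISSING:
        return b
    if b is MISSING:
        return a
    return a if a == b else MISSING


a, b, c = 1, 2, 3
print(merge_strict(a, merge_strict(b, c)), merge_strict(merge_strict(a, b), c))      # None None
print(merge_lenient(a, merge_lenient(b, c)), merge_lenient(merge_lenient(a, b), c))  # 1 3

# The strict rule is associative for every combination of small values.
vals = [MISSING, 1, 2, 3]
strict_is_associative = all(
    merge_strict(p, merge_strict(q, r)) == merge_strict(merge_strict(p, q), r)
    for p in vals for q in vals for r in vals
)
print(strict_is_associative)  # True
```

With the lenient rule, the coord that survives depends on the order of evaluation, which is exactly what the assertion in the cell above rules out.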
Aligned coords take precedence over unaligned coords:
[9]:
a = da['x', 0].copy()
a.coords.set_aligned('x', True)
b = da['x', 1].copy()
assert sc.identical((a + b).coords['x'], a.coords['x'])
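The rules so far can be combined into a toy decision table for a single coord name. Entry and merge_entry are hypothetical names, and treating "aligned versus missing" like "aligned versus unaligned" is an assumption of this sketch; the cell above only demonstrates the aligned-versus-unaligned case.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical toy: one coord's values and its aligned flag."""
    values: tuple
    aligned: bool


def merge_entry(lhs: Optional[Entry], rhs: Optional[Entry]) -> Optional[Entry]:
    """Toy version of the coord rules in this section for one coord name."""
    aligned = [e for e in (lhs, rhs) if e is not None and e.aligned]
    if len(aligned) == 2:
        # Aligned coords are compared and must match.
        if lhs.values != rhs.values:
            raise ValueError("aligned coords do not match")
        return lhs
    if len(aligned) == 1:
        # Aligned takes precedence over unaligned (assumed: also over missing).
        return aligned[0]
    # Only unaligned or missing: keep if both sides agree, otherwise drop.
    if lhs is not None and rhs is not None and lhs == rhs:
        return lhs
    return None


# Mirrors the cell above: aligned x=1 on the left, unaligned x=2 on the right.
print(merge_entry(Entry((1,), True), Entry((2,), False)))  # Entry(values=(1,), aligned=True)
```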
Masks of dataset items are independent:
[10]:
masked1 = da.copy()
masked1.masks['x'] = sc.less(x, 1 * sc.units.one)
masked2 = da.copy()
masked2.masks['x'] = sc.less(x, 2 * sc.units.one)
assert not sc.identical(masked1, masked2)
ds = sc.Dataset({'a': masked1, 'b': masked2})
assert not sc.identical(ds['a'].masks['x'], ds['b'].masks['x'])
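One practical consequence of per-item masks is that reductions differ between items even when data and coords are shared. The NumPy sketch below mimics masked1 and masked2 from the cell above; excluding masked elements from the sum is how this sketch models a masked reduction, not a call into scipp.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
data = x.astype(float)  # both items hold the same data here

masks = {
    "a": x < 1,  # like masked1.masks['x']: nothing masked
    "b": x < 2,  # like masked2.masks['x']: first element masked
}

# Each item owns its mask, so the masked sum differs per item.
sums = {name: float(data[~m].sum()) for name, m in masks.items()}
print(sums)  # {'a': 10.0, 'b': 9.0}
```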
[11]:
edges = sc.Variable(dims=['x'], values=[1, 2, 3, 4, 5])
da.coords['x'] = edges
assert sc.identical(sc.concat([da['x', :2], da['x', 2:]], 'x'), da)
assert sc.identical(sc.concat([da['x', 0], da['x', 1]], 'x'), da['x', 0:2])
assert sc.identical(sc.concat([da['x', :-1], da['x', -1]], 'x'), da)
da_yx = sc.concat([da['x', :2], da['x', 2:]], 'y') # create 2-D coord
assert sc.identical(
da_yx.coords['x'],
sc.concat([da.coords['x']['x', :3], da.coords['x']['x', 2:]], 'y'),
)
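The concatenation of bin-edge coords above relies on adjacent slices sharing one boundary edge. A toy sketch of that joining step on plain lists follows; concat_edges is a hypothetical helper, not scipp's implementation.

```python
def concat_edges(left, right):
    """Hypothetical toy: join two bin-edge sequences along their shared dim.

    Adjacent slices of a bin-edge coord share one boundary edge, which must
    appear only once in the result.
    """
    if left[-1] != right[0]:
        raise ValueError(f"edges do not join: {left[-1]} != {right[0]}")
    return list(left) + list(right[1:])


edges = [1, 2, 3, 4, 5]              # as in the cell above
left, right = edges[:3], edges[2:]   # edges of da['x', :2] and da['x', 2:]
print(concat_edges(left, right))     # [1, 2, 3, 4, 5]
print(concat_edges([1, 2], [2, 3]))  # [1, 2, 3]: point slices da['x', 0] and da['x', 1]
```

Concatenating along a new dimension 'y' instead has no shared dimension to join along, which is why da_yx ends up with a 2-D coord stacking [1, 2, 3] and [3, 4, 5].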
A 2-D coord for a dimension prevents operations between slices taken along any other dimension, because those slices keep the coord aligned:
[12]:
da_2d = sc.DataArray(
    data=sc.zeros(dims=['y', 'x'], shape=[2, 2]),
    coords={
        'x': sc.Variable(dims=['y', 'x'], values=np.array([[1, 2], [3, 4]])),
        'y': sc.Variable(dims=['y'], values=[3, 4]),
    },
)
# Same as with a 1-D coord: the x-coords differ, but are unaligned due to the point slice.
da_2d['x', 0] + da_2d['x', 1]
try:
    # Slicing 'y' keeps the 'x' coord aligned, and its values differ between the two slices.
    da_2d['y', 0] + da_2d['y', 1]
except RuntimeError:
    ok = False
else:
    ok = True
assert not ok
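The same situation with a NumPy array standing in for the 2-D x-coord (an illustration only; toy_binary_op_ok is a hypothetical helper that captures just the "aligned coords must match" rule):

```python
import numpy as np

x2d = np.array([[1, 2], [3, 4]])  # dims (y, x), like da_2d.coords['x']

# Slicing along 'y' keeps dim 'x': each slice is still an aligned x-coord.
row0, row1 = x2d[0, :], x2d[1, :]
# Slicing along 'x' removes dim 'x': each slice becomes an unaligned coord.
col0, col1 = x2d[:, 0], x2d[:, 1]


def toy_binary_op_ok(c0, c1, aligned):
    """Hypothetical check: only aligned coords must match for an operation."""
    return (not aligned) or np.array_equal(c0, c1)


print(toy_binary_op_ok(col0, col1, aligned=False))  # True: mismatching unaligned coord is dropped
print(toy_binary_op_ok(row0, row1, aligned=True))   # False: [1 2] vs [3 4]
```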
Coords cannot be added to or removed from a dataset through its items, because getting a dataset item creates a new coord dict; such attempts raise an error instead:
[13]:
try:
    ds['a'].coords['fail'] = 1.0 * sc.units.m
except sc.DataArrayError:
    ok = False
else:
    ok = True
assert not ok
assert 'fail' not in ds.coords
[14]:
ds.coords['xx'] = 1.0 * sc.units.m
assert 'xx' in ds['a'].coords
try:
    del ds['a'].coords['xx']
except sc.DataArrayError:
    ok = False
else:
    ok = True
assert not ok
assert 'xx' in ds.coords
The same mechanism applies to the coords, masks, and attrs of slices:
[15]:
try:
    da['x', 0].coords['fail'] = 1.0 * sc.units.m
except sc.DataArrayError:
    ok = False
else:
    ok = True
assert not ok
assert 'fail' not in da.coords
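Why such writes are rejected rather than silently ignored can be sketched with a toy dataset in plain Python. ToyDataset is hypothetical and is not how scipp implements this; it raises TypeError where scipp raises sc.DataArrayError. Because the coord dict of an item is assembled fresh on every access, a write to it could never reach the dataset, so exposing it read-only turns a silent no-op into an error. Dataset items and slices both fall under this reasoning.

```python
from types import MappingProxyType


class ToyDataset:
    """Hypothetical toy: coords owned by the dataset, data stored per item."""

    def __init__(self, coords, items):
        self.coords = dict(coords)  # the dataset's own, writable coord dict
        self._items = dict(items)   # item name -> data

    def __getitem__(self, name):
        # A fresh coord dict is built on every access. Writing to it could
        # never reach self.coords, so it is exposed read-only: a write raises
        # instead of silently doing nothing.
        return self._items[name], MappingProxyType(dict(self.coords))


ds = ToyDataset(coords={"x": (1, 2, 3, 4)}, items={"a": (0.0, 0.0, 0.0, 0.0)})
_, item_coords = ds["a"]
try:
    item_coords["fail"] = 1.0
except TypeError as err:
    print("rejected:", err)
print("fail" in ds.coords)  # False

ds.coords["xx"] = 1.0       # adding via the dataset itself works
print("xx" in ds["a"][1])   # True: visible in every freshly built item view
```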