# Concepts¶

## DataArray and Dataset meta data handling¶

This section describes details about how coords (and masks) of datasets and data arrays behave when slicing, combining, or inserting.

[1]:

import numpy as np
import scipp as sc

x = sc.Variable(dims=['x'], values=[1,2,3,4])
da = sc.DataArray(data=x,
coords={'x':x},
ds = sc.Dataset(data={'a':da})


Consider a data array da and a dataset ds with an aligned coord and an aligned mask. The following conditions must hold:

[2]:

assert 'x' in da['x', 0:1].coords # range slice preserves coord
assert 'x' in da['x', 0:1].masks # range slice preserves coord
assert 'x' in da['x', 0].attrs # point slice converts coord to attr
assert 'x' not in da['x', 0].coords
assert 'x' in da['x', 0].attrs
assert 'x' in da['x', 0].masks # point slice preserves masks as aligned

[3]:

assert sc.identical(ds['a']['x', 0:1], ds['x', 0:1]['a'])
assert sc.identical(ds['a']['x', 0], ds['x', 0]['a'])

[4]:

assert 'x' in ds['a'].coords
assert 'x' in ds['x', 0:1].coords
assert 'x' not in ds['x', 0].coords # cannot have attr (unaligned coord) in dataset
assert 'x' in ds['x', 0:1]['a'].coords
assert 'x' in ds['a']['x', 0].attrs
assert 'x' in ds['x', 0]['a'].attrs



In operations, coords are compared:

[5]:

try:
ok = da['x', 0:1] + da['x', 1:2]
except:
ok = False
assert not ok


Mismatching attrs (“unaligned coords”) are dropped:

[6]:

assert sc.identical(da + da['x', 1], da + da['x', 1].data)


[7]:

assert not sc.identical(da + da['x', 0], da + da['x', 0].data)


A missing attr is interpreted as mismatch to ensure that:

[8]:

a = da['x', 0]
b = da['x', 1]
c = da['x', 2]
assert sc.identical(a + (b + c), (a + b) + c)


Insertion order does not matter for attrs:

[9]:

a = da.copy()
a.attrs['attr'] = 1.0 * sc.units.m
b = da.copy()
b.attrs['attr'] = 2.0 * sc.units.m
ds1 = sc.Dataset()
ds2 = sc.Dataset()
ds1['a'] = a
ds1['b'] = b
ds2['b'] = b
ds2['a'] = a
assert sc.identical(ds1, ds2)


Insert into dataset with mismatching attrs drops attr:

[10]:

ds = sc.Dataset()
ds.coords['x'] = x['x', 0]
ds['a'] = da['x', 1] # Drops 'x' from 'a'
assert sc.identical(ds.coords['x'], ds['a'].coords['x']) # shadowing is NOT supported


Masks of dataset items are independent:

[11]:

ds = sc.Dataset()


If there is no coord it is preserved for all items. Adding a coord later makes the meta property invalid because of ambiguous name shadowing:

[12]:

ds = sc.Dataset()
ds['a'] = da['x', 0]
ds['b'] = da['x', 1]
assert 'x' not in ds.coords
assert 'x' in ds['a'].attrs
assert 'x' in ds['b'].attrs
ds.coords['x'] = x['x', 0] # introduce shadowing
try:
ds['a'].meta # raises because of shadowing
except:
ok = True
else:
ok = False
assert ok
del ds.coords['x']

[13]:

edges = sc.Variable(dims=['x'], values=[1,2,3,4,5])
da.coords['x'] = edges
assert sc.identical(sc.concatenate(da['x', :2], da['x', 2:], 'x'), da)
assert sc.identical(sc.concatenate(da['x', 0], da['x', 1], 'x'), da['x', 0:2])
assert sc.identical(sc.concatenate(da['x', :-1], da['x', -1], 'x'), da)
da_yx = sc.concatenate(da['x', :2], da['x', 2:], 'y') # create 2-D coord
assert sc.identical(da_yx.coords['x'], sc.concatenate(da.coords['x']['x', :3], da.coords['x']['x', 2:], 'y'))


2-D coords for a dimension prevent operations between slices that are not along that dimension:

[14]:

da_2d = sc.DataArray(
data=sc.zeros(dims=['y', 'x'], shape=[2, 2]),
coords={
'x':sc.Variable(dims=['y', 'x'], values=np.array([[1, 2], [3, 4]])),
'y':sc.Variable(dims=['y'], values=[3, 4])})

da_2d['x', 0] + da_2d['x', 1] # Same as with 1-D coord: x-coord differs but not aligned due to slice.
try:
# 'y' sliced, so 'x' coord is aligned and yields different values from slices of 2-D coord.
da_2d['y', 0] + da_2d['y', 1]
except RuntimeError:
ok = False
else:
ok = True
assert not ok


coords always refers to (aligned) coords in dataset, cannot add or erase via item since a new coord dict is created when getting a dataset item:

[15]:

try:
ds['a'].coords['fail'] = 1.0 * sc.units.m
except sc.DataArrayError:
ok = False
else:
ok = True
assert not ok
assert 'fail' not in ds.coords

[16]:

ds.coords['xx'] = 1.0 * sc.units.m
assert 'xx' in ds['a'].coords
try:
del ds['a'].coords['xx']
except sc.DataArrayError:
ok = False
else:
ok = True
assert not ok
assert 'xx' in ds.coords


The same mechanism applies for coords, masks, and attrs of slices:

[17]:

try:
da['x', 0].coords['fail'] = 1.0 * sc.units.m
except sc.DataArrayError:
ok = False
else:
ok = True
assert not ok
assert 'fail' not in da.coords


meta contains dataset coordinates as well as item attributes, cannot add or erase, since ambiguous:

[18]:

try:
ds['a'].meta['fail'] = 1.0 * sc.units.m
except sc.DataArrayError:
ok = False
else:
ok = True
assert not ok
assert 'fail' not in ds['a'].meta

[19]:

ds['a'].attrs['attr'] = 1.0 * sc.units.m
assert 'attr' in ds['a'].meta
try:
del ds['a'].meta['attr']
except sc.DataArrayError:
ok = False
else:
ok = True
assert not ok
assert 'attr' in ds['a'].meta


Attributes are independent for each item, and show up in meta of the items:

[20]:

ds['a'].attrs['attr'] = 1.0 * sc.units.m
ds['b'].attrs['attr'] = 2.0 * sc.units.m
assert 'attr' in ds['a'].meta
assert 'attr' in ds['b'].meta
assert 'attr' not in ds.meta
assert not sc.identical(ds['a'].attrs['attr'], ds['b'].attrs['attr'])
del ds['a'].attrs['attr']
del ds['b'].attrs['attr']