Ownership mechanism and readonly flags

The Scipp data structures (variables, data arrays, and datasets) behave mostly like nested Python objects, i.e., sub-objects are shared by default. Some of the consequences are illustrated in the following sections.
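The sharing semantics mirror those of plain Python containers. The following minimal sketch is a pure-Python analogy and does not use Scipp. A dict stores references to its values, so mutating a value through one key is visible through every other reference to it. It also shows the difference between the standard library's `copy.copy` (shallow) and `copy.deepcopy` (deep). The names `inner`, `outer`, `shallow`, and `deep` are illustrative only.

```python
import copy

inner = [1, 2, 3]
# Both keys reference the *same* list object, like data and a coord sharing a buffer
outer = {'data': inner, 'coord': inner}

outer['data'] += [4]  # list += extends the shared list in place
print(outer['coord'])  # [1, 2, 3, 4]

# Shallow copy: a new dict whose values are the same list objects
shallow = copy.copy(outer)
print(shallow is outer, shallow['data'] is inner)  # False True

# Deep copy: the nested list is duplicated as well
deep = copy.deepcopy(outer)
print(deep['data'] is inner)  # False
```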

Shared ownership

Variables

Slices or other views of variables are also of type Variable and all views share ownership of the underlying data.

If a variable refers to only a section of the underlying data buffer, this is indicated in the title line of the HTML view as part of the size (“x Bytes out of y Bytes”). This makes it possible to identify “small” variables that keep potentially large buffers alive:

[1]:
import scipp as sc

var = sc.arange(dim='x', unit='m', start=0, stop=12)
var['x', 4:6]
[1]:
scipp.Variable (272 Bytes out of 352 Bytes)
    • (x: 2)
      int64
      m
      4, 5
      Values:
      array([4, 5])

To create a variable with sole ownership of a buffer, use the copy() method:

[2]:
var['x', 4:6].copy()
[2]:
scipp.Variable (272 Bytes)
    • (x: 2)
      int64
      m
      4, 5
      Values:
      array([4, 5])

By default, copy() returns a deep copy. Shallow copies can be made by specifying deep=False, which preserves shared ownership of underlying buffers:

[3]:
shallow_copy = var['x', 4:6].copy(deep=False)
shallow_copy
[3]:
scipp.Variable (272 Bytes out of 352 Bytes)
    • (x: 2)
      int64
      m
      4, 5
      Values:
      array([4, 5])

Data arrays

As a result of the sharing mechanism, extra care must be taken in some cases, just as when working with any other Python library. Consider the following example, which uses the same variable as data and as a coordinate:

[4]:
da = sc.DataArray(data=var, coords={'x': var})
da += 666 * sc.units.m
da
[4]:
scipp.DataArray (1.19 KB)
    • x: 12
    • x
      (x)
      int64
      m
      666, 667, ..., 676, 677
      Values:
      array([666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677])
    • (x)
      int64
      m
      666, 667, ..., 676, 677
      Values:
      array([666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677])

The modification unintentionally also affected the coordinate, since data and coordinate share the same underlying buffer. However, if we think of data arrays and coordinate dicts as Python-like objects, this behavior should not be surprising.

Note that the original var is also affected:

[5]:
var
[5]:
scipp.Variable (352 Bytes)
    • (x: 12)
      int64
      m
      666, 667, ..., 676, 677
      Values:
      array([666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677])

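To avoid this, use copy(), e.g., as in the cell below. Note that var still holds the values modified above, which is why the coordinate starts at 666 rather than 0: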

[6]:
da = sc.DataArray(data=var.copy(), coords={'x': var.copy()})
da += 666 * sc.units.m
da
[6]:
scipp.DataArray (1.19 KB)
    • x: 12
    • x
      (x)
      int64
      m
      666, 667, ..., 676, 677
      Values:
      array([666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677])
    • (x)
      int64
      m
      1332, 1333, ..., 1342, 1343
      Values:
      array([1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343])

Apart from matching standard Python behavior, this sharing has a practical advantage: creating data arrays from variables is cheap, since potentially large buffers are not copied.

Datasets

Just as creating data arrays from variables is cheap (no deep copies), inserting items into datasets does not trigger potentially expensive deep copies:

[7]:
ds = sc.Dataset({'a': da})  # shallow copy

Note that while the buffers are shared, the meta-data dicts coords and masks are not. Compare:

[8]:
ds['a'].masks['m'] = da.coords['x'] < 670 * sc.Unit('m')
'm' in da.masks  # the masks *dict* is copied
[8]:
False

with

[9]:
da.coords['x'] *= -1
# the coords *dict* is copied,
# but the 'x' coordinate references the same buffer
ds.coords['x']
[9]:
scipp.Variable (352 Bytes)
    • (x: 12)
      int64
      m
      -666, -667, ..., -676, -677
      Values:
      array([-666, -667, -668, -669, -670, -671, -672, -673, -674, -675, -676, -677])

Read-only flags

Consider the following attempt to modify the data via a slice:

[10]:
try:
    da['x', 0].data = var['x', 2]
except sc.DataArrayError as e:
    print(e)
Read-only flag is set, cannot set new data.

Since da['x', 0] is itself a data array, assigning to its data property would repoint the data of that slice to whatever is given on the right-hand side. However, this would not affect da: the temporary da['x', 0] is discarded immediately, so the assignment would silently have no effect. The read-only flag protects us from this.

To actually modify the slice, use __setitem__ instead:

[11]:
da['x', 0] = var['x', 2]

Variables, meta-data dicts (coords, masks, and attrs properties), data arrays, and datasets also have read-only flags. The flags solve a number of conceptual issues and serve as a safeguard against hidden bugs.

One example is a broadcast of a variable:

[12]:
var = sc.broadcast(sc.scalar(1.0), dims=['x'], shape=[10])
try:
    var += 7
except sc.VariableError as e:
    print(e)
Read-only flag is set, cannot mutate data.

Since broadcast returns a view, all 10 elements of var refer to the same underlying value. The read-only flag is set to avoid applying the addition to that single element multiple times.