Data Structures

To keep this documentation generic, we typically use the dimension labels x or y. This should not be seen as a recommendation to use these labels for anything other than actual positions or offsets in space.

Variable

Basics

scipp.Variable is a labeled multi-dimensional array. A variable has the following key properties:

values: a multi-dimensional array of values, similar to a numpy.ndarray
variances: an optional multi-dimensional array of variances of the values
dims: the dimension labels (strings), one for each axis of the array
unit: an optional physical unit of the values in the array

Note that, unlike DataArray and its namesake xarray.DataArray, variables do not have a dict of coordinates.

```python
import numpy as np
import scipp as sc
```

Variables should generally be created using one of the available creation functions. For example, we can create a variable from a NumPy array:

```python
var = sc.array(dims=['x', 'y'], values=np.random.rand(2, 4), unit='s')
```

Using a unit is optional but highly encouraged if the variable represents a physical quantity. See Creating Arrays and Datasets for an overview of the different methods for creating variables.

Note:

Internally, Scipp does not use NumPy, so the above copies the NumPy array of values into an internal buffer.

We can inspect the created variable as follows:

```python
sc.show(var)
```

```python
var
```

```
scipp.Variable (320 Bytes)
(x: 2, y: 4)    float64    0.085, 0.792, ..., 0.577, 0.254

Values:
array([[0.08508553, 0.79172351, 0.75280512, 0.45086292],
       [0.60671371, 0.803682  , 0.57709797, 0.25405065]])
```

WARNING:

The above makes use of IPython’s rich output representation, but relying on this feature has a common pitfall:

IPython (and thus Jupyter) has an output caching system. By default, it keeps the last 1000 cell outputs. In the above case, the cached output is var (not the displayed HTML, but the object itself). If such cell outputs are large, this output cache can consume enormous amounts of memory.

Note that del var will not release the memory, since the IPython output cache still holds a reference to the same object. See this FAQ entry for clearing or disabling this caching.

```python
var.unit
```

```python
var.values
```

```
array([[0.08508553, 0.79172351, 0.75280512, 0.45086292],
       [0.60671371, 0.803682  , 0.57709797, 0.25405065]])
```

0-D variables (scalars)

A 0-dimensional variable contains a single value (and an optional variance).

```python
scalar = sc.scalar(1.2, unit='s')
sc.show(scalar)
scalar
```

```
scipp.Variable (264 Bytes)
()    float64    s    1.2

Values:
array(1.2)
```

Singular versions of the values and variances properties, value and variance, are provided:

```python
print(scalar.value)
print(scalar.variance)
```

```
1.2
None
```

Accessing the value or variance property raises an exception if the variable is not 0-dimensional.

Note:

Scalar variables are distinct from arrays that contain a single value. For example, sc.scalar(1) is equivalent to sc.array(dims=[], values=1). However, the following are all distinct:

```python
sc.array(dims=[], values=1)
sc.array(dims=['x'], values=[1])
sc.array(dims=['x', 'y'], values=[[1]])
```

In particular, the first is a scalar while the other two are not; they are arrays with an extent of one along each of their dimensions. Accessing the value property of either of the latter two variables would raise an exception, because this property requires a 0-dimensional variable.

DataGroup

scipp.DataGroup is a dict-like container for arbitrary Scipp or Python objects. Unlike Dataset, DataGroup does not have coords and does not enforce compatible dimensions of its items. A DataGroup can contain other DataGroup objects and thus allows for representing tree-like data. It can be created like a Python dict:

```python
import numpy as np

import scipp as sc

dg = sc.DataGroup(
    a=sc.arange('x', 4),
    b=sc.arange('x', 6),
    c=sc.arange('y', 2),
    d=np.ones((2, 3)),
    e='a string',
)
dg
```

```
scipp.DataGroup
(x: None, y: 2)

a    scipp.Variable    (x: 4)    int64    𝟙    0, 1, 2, 3
b    scipp.Variable    (x: 6)    int64    𝟙    0, 1, ..., 4, 5
c    scipp.Variable    (y: 2)    int64    𝟙    0, 1
d    numpy.ndarray     ()        shape=(2, 3), dtype=float64, values=1.0, ... , 1.0
e    str               ()        a string
```

Just like DataArray, DataGroup provides properties such as dims, shape, and sizes:

```python
dg.dims
```

```
('x', 'y')
```

```python
dg.shape
```

```
(None, 2)
```

```python
dg.sizes
```

```
{'x': None, 'y': 2}
```

These properties return the union over all items in the data group. Non-Scipp objects are considered to have dims=() and shape=(). If items have inconsistent sizes along a dimension, shape and sizes report None for that dimension.

DataGroup supports positional indexing if the size along the indexed dimension is unique, i.e., consistent across all items that have this dimension. Label-based indexing is supported if all items have a corresponding coordinate, even if the sizes are not consistent.

Most Scipp operations also work for DataGroup, provided that the operation works for all items in the group. That is, operations will generally fail if the data group contains non-Scipp objects such as NumPy arrays or other Python objects such as integers or strings.
