Creating Arrays and Datasets#

There are several ways to create data structures in Scipp. scipp.Variable in particular offers a wide range of creation functions.

The examples on this page only show some representative creation functions and only the most commonly used arguments. See the linked reference pages for complete lists of functions and arguments.

Variable#

Variables can be created using any of the dedicated creation functions. These fall into several categories as described by the following subsections.

From Python Sequences or NumPy Arrays#

Arrays: N-D Variables#

Variables can be constructed directly from NumPy arrays or from any Python object that can be used to create a NumPy array. See NumPy array creation for details. Given such an object, an array variable can be created using scipp.array (not to be confused with data arrays!):

[1]:
import scipp as sc

v1d = sc.array(dims=['x'], values=[1, 2, 3, 4])
v2d = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v3d = sc.array(dims=['x', 'y', 'z'], values=[[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

Alternatively, passing a NumPy array:

[2]:
import numpy as np

a = np.array([[1, 2], [3, 4]])
v = sc.array(dims=['x', 'y'], values=a)

Note that both NumPy arrays and Python lists are copied into the Scipp variable, which incurs some additional time and memory costs. See Filling with a Value for ways of creating variables without this overhead.

The dtype of the variable is deduced automatically in the above cases. The unit is set to scipp.units.dimensionless if the dtype is a numeric type (e.g. integer, floating point) or None otherwise (e.g. strings). This applies to all creation functions, not just scipp.array.

[3]:
v
[3]:
scipp.Variable (288 Bytes)
    • (x: 2, y: 2)
      int64
      𝟙
      1, 2, 3, 4
      Values:
      array([[1, 2], [3, 4]])

The dtype can be overridden with the dtype argument (see Data types):

[4]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], dtype='float64')
[4]:
scipp.Variable (288 Bytes)
    • (x: 2, y: 2)
      float64
      𝟙
      1.0, 2.0, 3.0, 4.0
      Values:
      array([[1., 2.], [3., 4.]])

The unit can and almost always should be set manually (see unit):

[5]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')
[5]:
scipp.Variable (288 Bytes)
    • (x: 2, y: 2)
      int64
      m
      1, 2, 3, 4
      Values:
      array([[1, 2], [3, 4]])

Variances can be added using the variances keyword:

[6]:
sc.array(
    dims=['x', 'y'], values=[[1.0, 2.0], [3.0, 4.0]], variances=[[0.1, 0.2], [0.3, 0.4]]
)
[6]:
scipp.Variable (320 Bytes)
    • (x: 2, y: 2)
      float64
      𝟙
      1.0, 2.0, 3.0, 4.0
      σ = 0.316, 0.447, 0.548, 0.632
      Values:
      array([[1., 2.], [3., 4.]])

      Variances (σ²):
      array([[0.1, 0.2], [0.3, 0.4]])

Note:

scipp.array takes variances as an input. If your input stores standard deviations, make sure to square them: sc.array(..., variances=stddevs**2).

Note further that the output in Jupyter notebooks shows standard deviations (σ), whereas converting a variable to a plain string (str(var) or print(var)) shows variances.

All of this also applies to all other creation functions.

Scalars: 0-D Variables#

Scalars are variables with no dimensions. See 0-D variables (scalars) for a more detailed definition. They can be constructed using, among other functions, scipp.scalar:

[7]:
sc.scalar(3.41)
[7]:
scipp.Variable (264 Bytes)
    • ()
      float64
      𝟙
      3.41
      Values:
      array(3.41)

scipp.scalar will always produce a scalar variable, even when passed a sequence like a list:

[8]:
sc.scalar([3.41])
[8]:
scipp.Variable (264 Bytes)
    • ()
      PyObject
      [3.41]
      Values:
      [3.41]

In this case, it stores the Python list as-is in a Scipp variable.

Multiplying or dividing a value by a unit also produces a scalar variable:

[9]:
4.2 * sc.Unit('m')
[9]:
scipp.Variable (264 Bytes)
    • ()
      float64
      m
      4.2
      Values:
      array(4.2)
[10]:
4.2 / sc.Unit('m')
[10]:
scipp.Variable (264 Bytes)
    • ()
      float64
      1/m
      4.2
      Values:
      array(4.2)

Generating Values#

Range-Like Variables#

1D ranges and similar sequences can be created directly in Scipp. scipp.linspace creates arrays with regularly spaced values with a given number of elements. For example (click the stacked disks icon to see all values):

[11]:
sc.linspace('x', start=-2, stop=5, num=6, unit='s')
[11]:
scipp.Variable (304 Bytes)
    • (x: 6)
      float64
      s
      -2.0, -0.600, ..., 3.600, 5.0
      Values:
      array([-2. , -0.6, 0.8, 2.2, 3.6, 5. ])

scipp.arange similarly creates arrays but with a given step size:

[12]:
sc.arange('x', start=-2, stop=5, step=1.2, unit='K')
[12]:
scipp.Variable (304 Bytes)
    • (x: 6)
      float64
      K
      -2.0, -0.8, ..., 2.8, 4.0
      Values:
      array([-2. , -0.8, 0.4, 1.6, 2.8, 4. ])

arange does not include the stop value but linspace does by default. Please note that the caveats described in NumPy’s documentation apply to Scipp as well.

All range-like functions currently use NumPy to generate values and thus incur the same costs from copying as scipp.array.

Filling with a Value#

There are a number of functions to create N-D arrays with a fixed value, e.g. scipp.zeros and scipp.full. scipp.zeros creates a variable of any number of dimensions filled with zeros:

[13]:
sc.zeros(dims=['x', 'y'], shape=[3, 4])
[13]:
scipp.Variable (352 Bytes)
    • (x: 3, y: 4)
      float64
      𝟙
      0.0, 0.0, ..., 0.0, 0.0
      Values:
      array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])

And scipp.full creates a variable filled with a given value:

[14]:
sc.full(dims=['x', 'y'], shape=[3, 4], value=1.23)
[14]:
scipp.Variable (352 Bytes)
    • (x: 3, y: 4)
      float64
      𝟙
      1.23, 1.23, ..., 1.23, 1.23
      Values:
      array([[1.23, 1.23, 1.23, 1.23], [1.23, 1.23, 1.23, 1.23], [1.23, 1.23, 1.23, 1.23]])

Every filling function has a corresponding function with a _like suffix, for instance scipp.zeros_like. These create a new variable (or data array) with the same dims, shape, unit, and dtype as another variable (or data array):

[15]:
v = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')
sc.zeros_like(v)
[15]:
scipp.Variable (288 Bytes)
    • (x: 2, y: 2)
      int64
      m
      0, 0, 0, 0
      Values:
      array([[0, 0], [0, 0]])

Special DTypes#

Scipp has a number of dtypes that require some conversion when creating variables. Notable examples are scipp.datetimes and scipp.vectors, their scalar counterparts scipp.datetime and scipp.vector, and the types for spatial transformations in scipp.spatial. While variables of all of these dtypes can be constructed using scipp.array and scipp.scalar, the specialized functions are more convenient and document their intent better.

scipp.datetimes constructs an array of date-time-points. It can be called either with strings in ISO 8601 format:

[16]:
sc.datetimes(dims=['time'], values=['2021-01-10T01:23:45', '2021-01-11T23:45:01'])
[16]:
scipp.Variable (272 Bytes)
    • (time: 2)
      datetime64
      s
      2021-01-10T01:23:45, 2021-01-11T23:45:01
      Values:
      array(['2021-01-10T01:23:45', '2021-01-11T23:45:01'], dtype='datetime64[s]')

Or with integers that encode the number of time units elapsed since the Unix epoch. See also scipp.epoch.

[17]:
sc.datetimes(dims=['time'], values=[0, 1610288175], unit='s')
[17]:
scipp.Variable (272 Bytes)
    • (time: 2)
      datetime64
      s
      1970-01-01T00:00:00, 2021-01-10T14:16:15
      Values:
      array(['1970-01-01T00:00:00', '2021-01-10T14:16:15'], dtype='datetime64[s]')

Note that the unit is mandatory in the second case.

It is also possible to create a range of datetimes:

[18]:
sc.arange(
    'time',
    '2022-08-04T14:00:00',
    '2022-08-04T14:04:00',
    step=30 * sc.Unit('s'),
    dtype='datetime64',
)
[18]:
scipp.Variable (320 Bytes)
    • (time: 8)
      datetime64
      s
      2022-08-04T14:00:00, 2022-08-04T14:00:30, ..., 2022-08-04T14:03:00, 2022-08-04T14:03:30
      Values:
      array(['2022-08-04T14:00:00', '2022-08-04T14:00:30', '2022-08-04T14:01:00', '2022-08-04T14:01:30', '2022-08-04T14:02:00', '2022-08-04T14:02:30', '2022-08-04T14:03:00', '2022-08-04T14:03:30'], dtype='datetime64[s]')

The current time according to the system clock can be accessed using:

[19]:
sc.datetime('now', unit='ms')
[19]:
scipp.Variable (264 Bytes)
    • ()
      datetime64
      ms
      2024-09-12T08:22:37.000
      Values:
      array('2024-09-12T08:22:37.000', dtype='datetime64[ms]')

Scipp stores date times relative to the Unix epoch, which is available via a shorthand:

[20]:
sc.epoch(unit='s')
[20]:
scipp.Variable (264 Bytes)
    • ()
      datetime64
      s
      1970-01-01T00:00:00
      Values:
      array('1970-01-01T00:00:00', dtype='datetime64[s]')

scipp.vectors creates an array of 3-vectors. It does so by converting a sequence or array with a length of 3 in its inner dimension:

[21]:
sc.vectors(dims=['position'], values=[[1, 2, 3], [4, 5, 6]], unit='m')
[21]:
scipp.Variable (360 Bytes)
    • (position: 2)
      vector3
      m
      [1. 2. 3.], [4. 5. 6.]
      Values:
      array([[1., 2., 3.], [4., 5., 6.]])

Data Arrays#

A DataArray is constructed from a Variable holding its “data” and, optionally, a dict-like mapping whose keys are coordinate names and whose values are coordinate variables. Typically, coordinates have a name matching their dimension, and frequently each dimension of the data variable has a corresponding coordinate of the same name:

[22]:
import scipp as sc

x = sc.linspace('x', start=1.5, stop=3.0, num=2, unit='m')
x_square = x * x
time = sc.linspace('time', start=1.0, stop=5.0, num=4, unit='s')
data = sc.array(dims=['x', 'time'], values=[[1, 2, 3, 4], [6, 7, 8, 9]], unit='K')
da = sc.DataArray(data, coords={'x': x, 'time': time, 'x_square': x_square})
da
[22]:
scipp.DataArray (1.63 KB)
    • x: 2
    • time: 4
    • time
      (time)
      float64
      s
      1.0, 2.333, 3.667, 5.0
      Values:
      array([1. , 2.33333333, 3.66666667, 5. ])
    • x
      (x)
      float64
      m
      1.5, 3.0
      Values:
      array([1.5, 3. ])
    • x_square
      (x)
      float64
      m^2
      2.25, 9.0
      Values:
      array([2.25, 9. ])
    • (x, time)
      int64
      K
      1, 2, ..., 8, 9
      Values:
      array([[1, 2, 3, 4], [6, 7, 8, 9]])

Although they are used less frequently, masks can be passed to a data array in the same way as coords:

[23]:
x = sc.linspace('x', start=1.5, stop=3.0, num=4, unit='m')
m = sc.array(dims=['x'], values=[True, False, True, False])
data = x**2
sc.DataArray(data, coords={'x': x}, masks={'m': m})
[23]:
scipp.DataArray (1.32 KB)
    • x: 4
    • x
      (x)
      float64
      m
      1.5, 2.0, 2.5, 3.0
      Values:
      array([1.5, 2. , 2.5, 3. ])
    • (x)
      float64
      m^2
      2.25, 4.0, 6.25, 9.0
      Values:
      array([2.25, 4. , 6.25, 9. ])
    • m
      (x)
      bool
      True, False, True, False
      Values:
      array([ True, False, True, False])

coords and masks are optional but the data must always be given. Note how the creation functions for scipp.Variable can be used to make the individual pieces of a data array. These variables are not copied into the data array. Rather, the data array contains a reference to the same underlying piece of memory. See Ownership mechanism and readonly flags for details.

Dataset#

Datasets are constructed by combining multiple data arrays or variables. For instance, using the previously defined variables:

[24]:
sc.Dataset({'data1': data, 'data2': -data}, coords={'x': x})
[24]:
scipp.Dataset (2.04 KB)
    • x: 4
    • x
      (x)
      float64
      m
      1.5, 2.0, 2.5, 3.0
      Values:
      array([1.5, 2. , 2.5, 3. ])
    • data1
      (x)
      float64
      m^2
      2.25, 4.0, 6.25, 9.0
      Values:
      array([2.25, 4. , 6.25, 9. ])
    • data2
      (x)
      float64
      m^2
      -2.25, -4.0, -6.25, -9.0
      Values:
      array([-2.25, -4. , -6.25, -9. ])

Or from data arrays:

[25]:
da1 = sc.DataArray(data, coords={'x': x}, masks={'m': m})
da2 = sc.DataArray(-data, coords={'x': x})
sc.Dataset({'data1': da1, 'data2': da2})
[25]:
scipp.Dataset (2.30 KB)
    • x: 4
    • x
      (x)
      float64
      m
      1.5, 2.0, 2.5, 3.0
      Values:
      array([1.5, 2. , 2.5, 3. ])
    • data1
      (x)
      float64
      m^2
      2.25, 4.0, 6.25, 9.0
      Values:
      array([2.25, 4. , 6.25, 9. ])
    • data2
      (x)
      float64
      m^2
      -2.25, -4.0, -6.25, -9.0
      Values:
      array([-2.25, -4. , -6.25, -9. ])

Any Data Structure#

Any of scipp.Variable, scipp.DataArray, and scipp.Dataset can be created using the methods described in the following subsections.

From Scipp HDF5 Files#

Scipp has a custom file format based on HDF5 which can store data structures. See Reading and Writing Files for details. In short, scipp.io.load_hdf5 loads whatever Scipp object is stored in a given file. For demonstration purposes, we use a BytesIO object here, but the same code works when a file name is passed as a string to save_hdf5 and load_hdf5.

[26]:
from io import BytesIO

buffer = BytesIO()
v = sc.arange('x', start=1.0, stop=5.0, step=1.0, unit='s')
v.save_hdf5(buffer)
sc.io.load_hdf5(buffer)
[26]:
scipp.Variable (288 Bytes)
    • (x: 4)
      float64
      s
      1.0, 2.0, 3.0, 4.0
      Values:
      array([1., 2., 3., 4.])

From Other Libraries#

Scipp’s data structures can be converted to and from certain other structures using the functions in the compat module. For example, scipp.compat.from_pandas can convert a Pandas dataframe into a Scipp dataset:

[27]:
import pandas as pd

df = pd.DataFrame({'x': 10 * np.arange(5), 'y': np.linspace(0.1, 0.5, 5)})
df
[27]:
x y
0 0 0.1
1 10 0.2
2 20 0.3
3 30 0.4
4 40 0.5
[28]:
sc.compat.from_pandas(df)
[28]:
scipp.Dataset (1.78 KB)
    • row: 5
    • x
      (row)
      int64
      𝟙
      0, 10, 20, 30, 40
      Values:
      array([ 0, 10, 20, 30, 40])
    • y
      (row)
      float64
      𝟙
      0.1, 0.2, 0.300, 0.4, 0.5
      Values:
      array([0.1, 0.2, 0.3, 0.4, 0.5])

From NeXus Files#

NeXus is an HDF5-based file format for neutron, x-ray, and muon science. Scippnexus is a package that allows loading NeXus files into Scipp objects with a simple h5py-like interface. For example, the following code loads the first detector bank into a data array:

import scippnexus as snx
with snx.File(filename) as f:
    da = f['entry/bank0'][...]

See the documentation of Scippnexus linked above for more information.