Creating Arrays and Datasets#
There are several ways to create data structures in Scipp. scipp.Variable in particular offers a wide range of creation functions.
The examples on this page only show some representative creation functions and only the most commonly used arguments. See the linked reference pages for complete lists of functions and arguments.
Variable#
Variables can be created using any of the dedicated creation functions. These fall into several categories as described by the following subsections.
From Python Sequences or NumPy Arrays#
Arrays: N-D Variables#
Variables can be constructed from NumPy arrays directly or from any Python object that can be used to create a NumPy array. See NumPy array creation for details. Given such an object, an array variable can be created using scipp.array (not to be confused with data arrays!):
[1]:
import scipp as sc
v1d = sc.array(dims=['x'], values=[1, 2, 3, 4])
v2d = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v3d = sc.array(dims=['x', 'y', 'z'], values=[[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
Alternatively, passing a NumPy array:
[2]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
v = sc.array(dims=['x', 'y'], values=a)
Note that both the NumPy array and Python lists are copied into the Scipp variable which leads to some additional time and memory costs. See Filling with a Value for ways of creating variables without this overhead.
The dtype of the variable is deduced automatically in the above cases. The unit is set to scipp.units.dimensionless if the dtype is a numeric type (e.g. integer, floating point) or None otherwise (e.g. strings). This applies to all creation functions, not just scipp.array.
[3]:
v
[3]:
- (x: 2, y: 2)int64𝟙1, 2, 3, 4
Values:
array([[1, 2], [3, 4]])
The dtype can be overridden with the dtype argument (see Data types):
[4]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], dtype='float64')
[4]:
- (x: 2, y: 2)float64𝟙1.0, 2.0, 3.0, 4.0
Values:
array([[1., 2.], [3., 4.]])
The unit can and almost always should be set manually (see unit):
[5]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')
[5]:
- (x: 2, y: 2)int64m1, 2, 3, 4
Values:
array([[1, 2], [3, 4]])
Variances can be added using the variances keyword:
[6]:
sc.array(
    dims=['x', 'y'], values=[[1.0, 2.0], [3.0, 4.0]], variances=[[0.1, 0.2], [0.3, 0.4]]
)
[6]:
- (x: 2, y: 2)float64𝟙1.0, 2.0, 3.0, 4.0σ = 0.316, 0.447, 0.548, 0.632
Values:
array([[1., 2.], [3., 4.]])
Variances (σ²):
array([[0.1, 0.2], [0.3, 0.4]])
Note:
scipp.array takes variances as an input. If your input stores standard deviations, make sure to square them: sc.array(..., variances=stddevs**2).
Note further that the output in Jupyter notebooks shows standard deviations (σ), whereas converting a variable to a plain string (str(var) or print(var)) shows variances.
All of this also applies to all other creation functions.
Scalars: 0-D Variables#
Scalars are variables with no dimensions. See 0-D variables (scalars) for a more detailed definition. They can be constructed using, among other functions, scipp.scalar:
[7]:
sc.scalar(3.41)
[7]:
- ()float64𝟙3.41
Values:
array(3.41)
scipp.scalar will always produce a scalar variable, even when passed a sequence like a list:
[8]:
sc.scalar([3.41])
[8]:
- ()PyObject[3.41]
Values:
[3.41]
In this case, it stores the Python list as-is in a Scipp variable.
Multiplying or dividing a value by a unit also produces a scalar variable:
[9]:
4.2 * sc.Unit('m')
[9]:
- ()float64m4.2
Values:
array(4.2)
[10]:
4.2 / sc.Unit('m')
[10]:
- ()float641/m4.2
Values:
array(4.2)
Generating Values#
Range-Like Variables#
1D ranges and similar sequences can be created directly in Scipp. scipp.linspace creates arrays with regularly spaced values with a given number of elements. For example (click the stacked disks icon to see all values):
[11]:
sc.linspace('x', start=-2, stop=5, num=6, unit='s')
[11]:
- (x: 6)float64s-2.0, -0.600, ..., 3.600, 5.0
Values:
array([-2. , -0.6, 0.8, 2.2, 3.6, 5. ])
scipp.arange similarly creates arrays but with a given step size:
[12]:
sc.arange('x', start=-2, stop=5, step=1.2, unit='K')
[12]:
- (x: 6)float64K-2.0, -0.8, ..., 2.8, 4.0
Values:
array([-2. , -0.8, 0.4, 1.6, 2.8, 4. ])
arange does not include the stop value but linspace does by default. Please note that the caveats described in NumPy’s documentation apply to Scipp as well.
All range-like functions currently use NumPy to generate values and thus incur the same costs from copying as scipp.array.
Filling with a Value#
There are a number of functions to create N-D arrays with a fixed value, e.g. scipp.zeros and scipp.full. scipp.zeros creates a variable of any number of dimensions filled with zeros:
[13]:
sc.zeros(dims=['x', 'y'], shape=[3, 4])
[13]:
- (x: 3, y: 4)float64𝟙0.0, 0.0, ..., 0.0, 0.0
Values:
array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])
And scipp.full creates a variable filled with a given value:
[14]:
sc.full(dims=['x', 'y'], shape=[3, 4], value=1.23)
[14]:
- (x: 3, y: 4)float64𝟙1.23, 1.23, ..., 1.23, 1.23
Values:
array([[1.23, 1.23, 1.23, 1.23], [1.23, 1.23, 1.23, 1.23], [1.23, 1.23, 1.23, 1.23]])
Every filling function has a corresponding function with a _like postfix, for instance scipp.zeros_like. These create a new variable (or data array) based on another variable (or data array):
[15]:
v = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')
sc.zeros_like(v)
[15]:
- (x: 2, y: 2)int64m0, 0, 0, 0
Values:
array([[0, 0], [0, 0]])
Special DTypes#
Scipp has a number of dtypes that require some conversion when creating variables. Dedicated creation functions exist for these, notably scipp.datetimes, scipp.vectors, and their scalar counterparts scipp.datetime and scipp.vector, as well as the types for spatial transformations in scipp.spatial.
While variables of all of these dtypes can be constructed using scipp.array and scipp.scalar, the specialized functions offer more convenience and document their intent better.
scipp.datetimes constructs an array of date-time-points. It can be called either with strings in ISO 8601 format:
[16]:
sc.datetimes(dims=['time'], values=['2021-01-10T01:23:45', '2021-01-11T23:45:01'])
[16]:
- (time: 2)datetime64s2021-01-10T01:23:45, 2021-01-11T23:45:01
Values:
array(['2021-01-10T01:23:45', '2021-01-11T23:45:01'], dtype='datetime64[s]')
Or with integers which encode the number of time units elapsed since the Unix epoch (see also scipp.epoch):
[17]:
sc.datetimes(dims=['time'], values=[0, 1610288175], unit='s')
[17]:
- (time: 2)datetime64s1970-01-01T00:00:00, 2021-01-10T14:16:15
Values:
array(['1970-01-01T00:00:00', '2021-01-10T14:16:15'], dtype='datetime64[s]')
Note that the unit is mandatory in the second case.
It is also possible to create a range of datetimes:
[18]:
sc.arange(
    'time',
    '2022-08-04T14:00:00',
    '2022-08-04T14:04:00',
    step=30 * sc.Unit('s'),
    dtype='datetime64',
)
[18]:
- (time: 8)datetime64s2022-08-04T14:00:00, 2022-08-04T14:00:30, ..., 2022-08-04T14:03:00, 2022-08-04T14:03:30
Values:
array(['2022-08-04T14:00:00', '2022-08-04T14:00:30', '2022-08-04T14:01:00', '2022-08-04T14:01:30', '2022-08-04T14:02:00', '2022-08-04T14:02:30', '2022-08-04T14:03:00', '2022-08-04T14:03:30'], dtype='datetime64[s]')
The current time according to the system clock can be accessed using:
[19]:
sc.datetime('now', unit='ms')
[19]:
- ()datetime64ms2024-09-12T08:22:37.000
Values:
array('2024-09-12T08:22:37.000', dtype='datetime64[ms]')
Scipp stores datetimes relative to the Unix epoch, which is available via a shorthand:
[20]:
sc.epoch(unit='s')
[20]:
- ()datetime64s1970-01-01T00:00:00
Values:
array('1970-01-01T00:00:00', dtype='datetime64[s]')
scipp.vectors creates an array of 3-vectors. It does so by converting a sequence or array with a length of 3 in its inner dimension:
[21]:
sc.vectors(dims=['position'], values=[[1, 2, 3], [4, 5, 6]], unit='m')
[21]:
- (position: 2)vector3m[1. 2. 3.], [4. 5. 6.]
Values:
array([[1., 2., 3.], [4., 5., 6.]])
Data Arrays#
A DataArray is created from a Variable that holds its “data” and, optionally, a dict-like mapping of coordinates, where the keys are the coordinate names and the values are the coordinate variables. Typically, coordinates have a name matching their dimension, and frequently each dimension of the data variable has a corresponding coordinate of the same name:
[22]:
import scipp as sc
x = sc.linspace('x', start=1.5, stop=3.0, num=2, unit='m')
x_square = x * x
time = sc.linspace('time', start=1.0, stop=5.0, num=4, unit='s')
data = sc.array(dims=['x', 'time'], values=[[1, 2, 3, 4], [6, 7, 8, 9]], unit='K')
da = sc.DataArray(data, coords={'x': x, 'time': time, 'x_square': x_square})
da
[22]:
- x: 2
- time: 4
- time(time)float64s1.0, 2.333, 3.667, 5.0
Values:
array([1. , 2.33333333, 3.66666667, 5. ])
- x(x)float64m1.5, 3.0
Values:
array([1.5, 3. ])
- x_square(x)float64m^22.25, 9.0
Values:
array([2.25, 9. ])
- (x, time)int64K1, 2, ..., 8, 9
Values:
array([[1, 2, 3, 4], [6, 7, 8, 9]])
Less frequently, a data array is created with masks, which are passed in the same way as coords:
[23]:
x = sc.linspace('x', start=1.5, stop=3.0, num=4, unit='m')
m = sc.array(dims=['x'], values=[True, False, True, False])
data = x**2
sc.DataArray(data, coords={'x': x}, masks={'m': m})
[23]:
- x: 4
- x(x)float64m1.5, 2.0, 2.5, 3.0
Values:
array([1.5, 2. , 2.5, 3. ])
- (x)float64m^22.25, 4.0, 6.25, 9.0
Values:
array([2.25, 4. , 6.25, 9. ])
- m(x)boolTrue, False, True, False
Values:
array([ True, False, True, False])
coords and masks are optional but the data must always be given. Note how the creation functions for scipp.Variable can be used to make the individual pieces of a data array. These variables are not copied into the data array. Rather, the data array contains a reference to the same underlying piece of memory. See Ownership mechanism and readonly flags for details.
Dataset#
Datasets are constructed by combining multiple data arrays or variables. For instance, using the previously defined variables:
[24]:
sc.Dataset({'data1': data, 'data2': -data}, coords={'x': x})
[24]:
- x: 4
- x(x)float64m1.5, 2.0, 2.5, 3.0
Values:
array([1.5, 2. , 2.5, 3. ])
- data1(x)float64m^22.25, 4.0, 6.25, 9.0
Values:
array([2.25, 4. , 6.25, 9. ])
- data2(x)float64m^2-2.25, -4.0, -6.25, -9.0
Values:
array([-2.25, -4. , -6.25, -9. ])
Or from data arrays:
[25]:
da1 = sc.DataArray(data, coords={'x': x}, masks={'m': m})
da2 = sc.DataArray(-data, coords={'x': x})
sc.Dataset({'data1': da1, 'data2': da2})
[25]:
- x: 4
- x(x)float64m1.5, 2.0, 2.5, 3.0
Values:
array([1.5, 2. , 2.5, 3. ])
- data1(x)float64m^22.25, 4.0, 6.25, 9.0
Values:
array([2.25, 4. , 6.25, 9. ])
- data2(x)float64m^2-2.25, -4.0, -6.25, -9.0
Values:
array([-2.25, -4. , -6.25, -9. ])
Any Data Structure#
Any of scipp.Variable, scipp.DataArray, and scipp.Dataset can be created using the methods described in the following subsections.
From Scipp HDF5 Files#
Scipp has a custom file format based on HDF5 which can store data structures. See Reading and Writing Files for details. In short, scipp.io.load_hdf5 loads whatever Scipp object is stored in a given file. For demonstration purposes, we use a BytesIO object here, but the same code can be used by passing a string as a file name to save_hdf5 and load_hdf5.
[26]:
from io import BytesIO
buffer = BytesIO()
v = sc.arange('x', start=1.0, stop=5.0, step=1.0, unit='s')
v.save_hdf5(buffer)
sc.io.load_hdf5(buffer)
[26]:
- (x: 4)float64s1.0, 2.0, 3.0, 4.0
Values:
array([1., 2., 3., 4.])
From Other Libraries#
Scipp’s data structures can be converted to and from certain other structures using the functions in the compat module. For example, scipp.compat.from_pandas can convert a Pandas dataframe into a Scipp dataset:
[27]:
import pandas as pd
df = pd.DataFrame({'x': 10 * np.arange(5), 'y': np.linspace(0.1, 0.5, 5)})
df
[27]:
 | x | y |
---|---|---|
0 | 0 | 0.1 |
1 | 10 | 0.2 |
2 | 20 | 0.3 |
3 | 30 | 0.4 |
4 | 40 | 0.5 |
[28]:
sc.compat.from_pandas(df)
[28]:
- row: 5
- x(row)int64𝟙0, 10, 20, 30, 40
Values:
array([ 0, 10, 20, 30, 40])
- y(row)float64𝟙0.1, 0.2, 0.300, 0.4, 0.5
Values:
array([0.1, 0.2, 0.3, 0.4, 0.5])
From NeXus Files#
NeXus is an HDF5-based file format for neutron, x-ray, and muon science. Scippnexus is a package that allows loading NeXus files into Scipp objects with a simple h5py-like interface. For example, the following code loads the first detector bank into a data array:
import scippnexus as snx
with snx.File(filename) as f:
    da = f['entry/bank0'][...]
See the documentation of Scippnexus linked above for more information.