scipp.Dataset
- class scipp.Dataset
Dict of data arrays with aligned dimensions.
A Dataset groups multiple DataArrays that share common coordinates. Operations on a Dataset apply to all contained DataArrays, and slicing preserves shared coordinates.
Examples
Create a Dataset with shared coordinates:
>>> import scipp as sc
>>> ds = sc.Dataset(
...     data={
...         'temperature': sc.array(dims=['x'], values=[20.0, 21.0, 22.0], unit='K'),
...         'pressure': sc.array(dims=['x'], values=[1.0, 1.1, 1.2], unit='bar'),
...     },
...     coords={'x': sc.array(dims=['x'], values=[0.0, 1.0, 2.0], unit='m')},
... )
Slice a Dataset by dimension (applies to all data arrays):
>>> ds['x', 0]
<scipp.Dataset>
Dimensions: Sizes[]
Coordinates:
  x                         float64              [m]  ()  0
Data:
  pressure                  float64            [bar]  ()  1
  temperature               float64              [K]  ()  20

>>> ds['x', 1:3]
<scipp.Dataset>
Dimensions: Sizes[x:2, ]
Coordinates:
* x                         float64              [m]  (x)  [1, 2]
Data:
  pressure                  float64            [bar]  (x)  [1.1, 1.2]
  temperature               float64              [K]  (x)  [21, 22]
Broadcasting operations across all data arrays:
>>> result = ds + ds  # adds corresponding arrays
>>> result['temperature'].values
array([40., 42., 44.])
- __init__(self, data: Mapping[str, Variable | DataArray] | Iterable[tuple[str, Variable | DataArray]] = {}, coords: Mapping[str, Variable] | Iterable[tuple[str, Variable]] = {}) -> None
Dataset initializer.
- Parameters:
data – Dictionary of name and data pairs.
coords – Dictionary of name and coord pairs.
Examples
Create a Dataset with two data arrays and a shared coordinate:
>>> import scipp as sc
>>> ds = sc.Dataset(
...     data={
...         'a': sc.array(dims=['x'], values=[1, 2, 3]),
...         'b': sc.array(dims=['x'], values=[4, 5, 6]),
...     },
...     coords={'x': sc.array(dims=['x'], values=[0.1, 0.2, 0.3], unit='m')},
... )
>>> 'a' in ds
True
>>> ds['a'].dims
('x',)
>>> ds.coords['x'].unit == sc.Unit('m')
True
Methods
__init__(self[, data, coords]) – Dataset initializer.
all([dim]) – Logical AND over input values.
any([dim]) – Logical OR over input values.
assign_coords([coords]) – Return new object with updated or inserted coordinate.
clear(self) – Removes all data, preserving coordinates.
copy(self[, deep]) – Return a (by default deep) copy.
drop_coords(*args, **kwargs) – Overloaded function.
get() – Get the value associated with the provided key or the default value.
groupby(group, *[, bins]) – Group dataset or data array based on values of specified labels.
hist([arg_dict, dim]) – Compute a histogram.
items(self) – View of the Dataset's (name, data array) pairs.
keys(self) – View of the Dataset's data array names.
max([dim]) – Maximum of elements in the input.
mean([dim]) – Arithmetic mean of elements in the input.
median([dim]) – Compute the median of the input values.
min([dim]) – Minimum of elements in the input.
nanmax([dim]) – Maximum of elements in the input ignoring NaNs.
nanmean([dim]) – Arithmetic mean of elements in the input ignoring NaNs.
nanmedian([dim]) – Compute the median of the input values ignoring NaNs.
nanmin([dim]) – Minimum of elements in the input ignoring NaNs.
nanstd([dim]) – Compute the standard deviation of the input values ignoring NaNs.
nansum([dim]) – Sum of elements in the input ignoring NaNs.
nanvar([dim]) – Compute the variance of the input values ignoring NaNs.
plot(**kwargs) – Wrapper function to plot data.
pop() – Remove and return an element.
rebin([arg_dict]) – Rebin a data array or dataset.
rename([dims_dict]) – Rename the dimensions, coordinates and attributes of all the items.
rename_dims([dims_dict]) – Rename dimensions.
save_hdf5(filename) – Write an object out to file in HDF5 format.
squeeze([dim]) – Remove dimensions of length 1.
std([dim]) – Compute the standard deviation of the input values.
sum([dim]) – Sum of elements in the input.
transform_coords([targets, graph, ...]) – Compute new coords based on transformations of input coords.
underlying_size(self) – Return the size of the object in bytes.
update(self[, other]) – Update items from dict-like or iterable.
values(self) – View of the Dataset's data arrays.
var([dim]) – Compute the variance of the input values.
Attributes
bins – Returns helper scipp.Bins for bin-wise operations.
coords – Dict of coordinates.
dim – The only dimension label for 1-dimensional data, raising an exception if the data is not 1-dimensional.
dims – Dimension labels of the data (read-only).
is_binned – Return True if the object is binned.
ndim – Number of dimensions of the data (read-only).
shape – Shape of the data (read-only).
sizes – dict mapping dimension labels to dimension sizes (read-only).
- __getitem__(*args, **kwargs)
Overloaded function.
__getitem__(self: scipp._scipp.core.Dataset, name: str) -> scipp._scipp.core.DataArray
Access a data item by name.
- Parameters:
name – Name of the data item to access.
- Returns:
The DataArray with the given name.
Examples
Access a data item in the dataset:
>>> import scipp as sc
>>> ds = sc.Dataset({
...     'a': sc.array(dims=['x'], values=[1, 2, 3]),
...     'b': sc.array(dims=['x'], values=[4.0, 5.0, 6.0], unit='m')
... })
>>> ds['a']
<scipp.DataArray>
Dimensions: Sizes[x:3, ]
Data:
  a                           int64  [dimensionless]  (x)  [1, 2, 3]

>>> ds['b'].unit
Unit(m)
__getitem__(self: scipp._scipp.core.Dataset, arg0: typing.SupportsInt) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: slice) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: scipp._scipp.core.Variable) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: tuple[str, scipp._scipp.core.Variable]) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: tuple[str, typing.SupportsInt]) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: tuple[str, slice]) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: ellipsis) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: collections.abc.Sequence[typing.SupportsInt]) -> scipp._scipp.core.Dataset
__getitem__(self: scipp._scipp.core.Dataset, arg0: tuple[str, collections.abc.Sequence[typing.SupportsInt]]) -> scipp._scipp.core.Dataset
- all(dim=None)
Logical AND over input values.
- any(dim=None)
Logical OR over input values.
- assign_coords(coords=None, /, **coords_kwargs)
Return new object with updated or inserted coordinate.
- Parameters:
- Returns:
TypeVar(_T, Dataset, DataArray) – scipp.DataArray or scipp.Dataset with updated coordinates.
Examples
Add a new coordinate using keyword arguments:
>>> import scipp as sc
>>> da = sc.DataArray(
...     sc.array(dims=['x'], values=[1.0, 2.0, 3.0]),
...     coords={'x': sc.arange('x', 3)}
... )
>>> da.assign_coords(y=sc.array(dims=['x'], values=[10, 20, 30]))
<scipp.DataArray>
Dimensions: Sizes[x:3, ]
Coordinates:
* x                           int64  [dimensionless]  (x)  [0, 1, 2]
* y                           int64  [dimensionless]  (x)  [10, 20, 30]
Data:
                            float64  [dimensionless]  (x)  [1, 2, 3]
Update an existing coordinate using a dict:
>>> da.assign_coords({'x': sc.arange('x', 3.0, unit='m')})
<scipp.DataArray>
Dimensions: Sizes[x:3, ]
Coordinates:
* x                         float64              [m]  (x)  [0, 1, 2]
Data:
                            float64  [dimensionless]  (x)  [1, 2, 3]
- property bins: Bins[_O]
Returns helper scipp.Bins for bin-wise operations.
Deprecated since version 25.11.0: bins currently returns None if the object is not binned. In the future, this will change and bins will raise a scipp.BinnedDataError instead. Use x.is_binned instead of x.bins is not None to check if x contains binned data.
- clear(self: scipp._scipp.core.Dataset) -> None
Removes all data, preserving coordinates.
- property coords
Dict of coordinates.
- copy(self: scipp._scipp.core.Dataset, deep: bool = True) -> scipp._scipp.core.Dataset
Return a (by default deep) copy.
If deep=True (the default), a deep copy is made. Otherwise, a shallow copy is made, and the data and metadata values of the returned object are new views of the data and metadata values of this object.
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x'], values=[1.0, 2.0, 3.0], unit='m')
>>> var_copy = var.copy()
>>> var_copy.values[0] = 999.0
>>> var  # Original unchanged
<scipp.Variable> (x: 3)    float64              [m]  [1, 2, 3]
- property dim
The only dimension label for 1-dimensional data, raising an exception if the data is not 1-dimensional.
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x'], values=[1, 2, 3], unit='m')
>>> var.dim
'x'

>>> da = sc.DataArray(sc.array(dims=['time'], values=[1.0, 2.0, 3.0], unit='K'))
>>> da.dim
'time'
- property dims
Dimension labels of the data (read-only).
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
>>> var.dims
('x', 'y')

>>> da = sc.DataArray(
...     sc.array(dims=['x', 'y'], values=[[1.0, 2.0], [3.0, 4.0]]),
...     coords={'x': sc.array(dims=['x'], values=[0.0, 1.0], unit='m')}
... )
>>> da.dims
('x', 'y')
- drop_coords(*args, **kwargs)
Overloaded function.
drop_coords(self: scipp._scipp.core.Dataset, arg0: str) -> scipp._scipp.core.Dataset
drop_coords(self: scipp._scipp.core.Dataset, arg0: collections.abc.Sequence[str]) -> scipp._scipp.core.Dataset
- get()
Get the value associated with the provided key or the default value.
Examples
Access a coordinate with a default value:
>>> import scipp as sc
>>> da = sc.DataArray(sc.array(dims=['x'], values=[1, 2, 3]))
>>> da.coords.get('x')  # returns None if 'x' does not exist
>>> da.coords['x'] = sc.arange('x', 3)
>>> da.coords.get('x')
<scipp.Variable> (x: 3)      int64  [dimensionless]  [0, 1, 2]
Access a Dataset item with a default value:
>>> ds = sc.Dataset({'a': sc.array(dims=['x'], values=[1, 2, 3])})
>>> ds.get('b', sc.DataArray(sc.zeros(dims=['x'], shape=[3])))
<scipp.DataArray>
Dimensions: Sizes[x:3, ]
Data:
                            float64  [dimensionless]  (x)  [0, 0, 0]
- groupby(group, *, bins=None)
Group dataset or data array based on values of specified labels.
- Seealso:
Details in scipp.groupby()
- Return type:
- hist(arg_dict=None, /, *, dim=None, **kwargs)
Compute a histogram.
- items(self: scipp._scipp.core.Dataset) -> scipp._scipp.core.Dataset_items_view
View of the Dataset’s (name, data array) pairs.
Examples
>>> import scipp as sc
>>> ds = sc.Dataset({'a': sc.array(dims=['x'], values=[1, 2]),
...                  'b': sc.array(dims=['x'], values=[3, 4])})
>>> for name, da in ds.items():
...     print(f'{name}: {da.dims}')
a: ('x',)
b: ('x',)
- keys(self: scipp._scipp.core.Dataset) -> scipp._scipp.core.Dataset_keys_view
View of the Dataset’s data array names.
Examples
>>> import scipp as sc
>>> ds = sc.Dataset({'a': sc.array(dims=['x'], values=[1, 2]),
...                  'b': sc.array(dims=['x'], values=[3, 4])})
>>> list(ds.keys())
['a', 'b']
- max(dim=None)
Maximum of elements in the input.
- mean(dim=None)
Arithmetic mean of elements in the input.
- median(dim=None)
Compute the median of the input values.
- min(dim=None)
Minimum of elements in the input.
- nanmax(dim=None)
Maximum of elements in the input ignoring NaNs.
- nanmean(dim=None)
Arithmetic mean of elements in the input ignoring NaNs.
- nanmedian(dim=None)
Compute the median of the input values ignoring NaNs.
- nanmin(dim=None)
Minimum of elements in the input ignoring NaNs.
- nanstd(dim=None, *, ddof)
Compute the standard deviation of the input values ignoring NaNs.
- nansum(dim=None)
Sum of elements in the input ignoring NaNs.
- nanvar(dim=None, *, ddof)
Compute the variance of the input values ignoring NaNs.
- property ndim
Number of dimensions of the data (read-only).
Examples
>>> import scipp as sc
>>> sc.scalar(1.0).ndim
0

>>> sc.array(dims=['x'], values=[1, 2, 3]).ndim
1

>>> sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]]).ndim
2
- plot(**kwargs)
Wrapper function to plot data. See https://scipp.github.io/plopp/ for details.
- Return type:
- pop()
Remove and return an element.
If key is not found, default is returned if given, otherwise KeyError is raised.
Examples
Remove a coordinate from a DataArray:
>>> import scipp as sc
>>> da = sc.DataArray(
...     sc.array(dims=['x'], values=[1.0, 2.0, 3.0]),
...     coords={'x': sc.arange('x', 3), 'y': sc.arange('x', 3) * 10}
... )
>>> da.coords.pop('y')
<scipp.Variable> (x: 3)      int64  [dimensionless]  [0, 10, 20]
>>> 'y' in da.coords
False
Pop with default value for missing key:
>>> da.coords.pop('z', sc.scalar(0))
<scipp.Variable> ()      int64  [dimensionless]  0
Remove an item from a Dataset:
>>> ds = sc.Dataset({'a': sc.array(dims=['x'], values=[1, 2, 3]),
...                  'b': sc.array(dims=['x'], values=[4, 5, 6])})
>>> ds.pop('b')
<scipp.DataArray>
...
>>> list(ds.keys())
['a']
- rebin(arg_dict=None, /, **kwargs)
Rebin a data array or dataset.
- rename(dims_dict=None, /, **names)
Rename the dimensions, coordinates and attributes of all the items.
The renaming can be defined:
- using a dict mapping the old to new names, e.g. rename({'x': 'a', 'y': 'b'})
- using keyword arguments, e.g. rename(x='a', y='b')
In both cases, x is renamed to a and y to b.
Names not specified in either input are unchanged.
- Parameters:
- Returns:
Dataset – A new dataset with renamed dimensions, coordinates, and attributes. Buffers are shared with the input.
See also
scipp.Dataset.rename_dims – Only rename dimensions, not coordinates and attributes.
- rename_dims(dims_dict=None, /, **names)
Rename dimensions.
The renaming can be defined:
- using a dict mapping the old to new names, e.g. rename_dims({'x': 'a', 'y': 'b'})
- using keyword arguments, e.g. rename_dims(x='a', y='b')
In both cases, x is renamed to a and y to b.
Dimensions not specified in either input are unchanged.
This function only renames dimensions. See the rename method to also rename coordinates and attributes.
- Parameters:
- Returns:
TypeVar(_T, Variable, DataArray, Dataset) – A new object with renamed dimensions.
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
>>> var.rename_dims({'x': 'row', 'y': 'col'}).sizes
{'row': 2, 'col': 2}
Using keyword arguments:
>>> var.rename_dims(x='a', y='b').dims
('a', 'b')
Only specified dimensions are renamed:
>>> var.rename_dims(x='i').dims
('i', 'y')
- save_hdf5(filename)
Write an object out to file in HDF5 format.
Supported types include Variable, DataArray, Dataset, and DataGroup. Nested structures are supported.
- Parameters:
- Return type:
See also
scipp.io.load_hdf5 – Load data from HDF5 files.
Examples
Save and load a Variable:
>>> import scipp as sc
>>> import tempfile
>>> var = sc.array(dims=['x'], values=[1.0, 2.0, 3.0], unit='m')
>>> with tempfile.NamedTemporaryFile(suffix='.h5') as f:
...     sc.io.save_hdf5(var, f.name)
...     loaded = sc.io.load_hdf5(f.name)
>>> loaded
<scipp.Variable> (x: 3)    float64              [m]  [1, 2, 3]
Save and load a DataArray with coordinates:
>>> da = sc.DataArray(
...     sc.array(dims=['x'], values=[10, 20, 30], unit='counts'),
...     coords={'x': sc.array(dims=['x'], values=[0.1, 0.2, 0.3], unit='m')}
... )
>>> with tempfile.NamedTemporaryFile(suffix='.h5') as f:
...     sc.io.save_hdf5(da, f.name)
...     loaded = sc.io.load_hdf5(f.name)
>>> loaded
<scipp.DataArray>
Dimensions: Sizes[x:3, ]
Coordinates:
* x                         float64              [m]  (x)  [0.1, 0.2, 0.3]
Data:
                              int64         [counts]  (x)  [10, 20, 30]
- property shape
Shape of the data (read-only).
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x', 'y'], values=[[1, 2, 3], [4, 5, 6]])
>>> var.shape
(2, 3)

>>> sc.scalar(1.0).shape
()
- property sizes
dict mapping dimension labels to dimension sizes (read-only).
Examples
>>> import scipp as sc
>>> var = sc.array(dims=['x', 'y'], values=[[1, 2, 3], [4, 5, 6]])
>>> var.sizes
{'x': 2, 'y': 3}

>>> da = sc.DataArray(
...     sc.array(dims=['time', 'channel'], values=[[1, 2], [3, 4], [5, 6]])
... )
>>> da.sizes
{'time': 3, 'channel': 2}
- squeeze(dim=None)
Remove dimensions of length 1.
- std(dim=None, *, ddof)
Compute the standard deviation of the input values.
- sum(dim=None)
Sum of elements in the input.
- transform_coords(targets=None, /, graph=None, *, rename_dims=True, keep_aliases=True, keep_intermediate=True, keep_inputs=True, quiet=False, **kwargs)
Compute new coords based on transformations of input coords.
- Seealso:
Details in scipp.transform_coords()
- Return type:
- underlying_size(self: scipp._scipp.core.Dataset) -> int
Return the size of the object in bytes.
The size includes the object itself and all arrays contained in it. However, arrays may be counted multiple times if components share buffers, e.g., multiple coordinates referencing the same memory. Conversely, the size may be underestimated, especially (but not only) with dtype=PyObject.
This function includes all memory of the underlying buffers. Use __sizeof__ to get the size of the current slice only.
- update(self: scipp._scipp.core.Dataset, other: object = None, /, **kwargs) -> None
Update items from dict-like or iterable.
If other has a .keys() method, then update does: for k in other.keys(): self[k] = other[k].
If other is given but does not have a .keys() method, then update does: for k, v in other: self[k] = v.
In either case, this is followed by: for k in kwargs: self[k] = kwargs[k].
- values(self: scipp._scipp.core.Dataset) -> scipp._scipp.core.Dataset_values_view
View of the Dataset’s data arrays.
Examples
>>> import scipp as sc
>>> ds = sc.Dataset({'a': sc.array(dims=['x'], values=[1, 2]),
...                  'b': sc.array(dims=['x'], values=[3, 4])})
>>> for da in ds.values():
...     print(da.dims)
('x',)
('x',)