scipp.group

scipp.group(x, /, *args)

Create binned data by grouping input by one or more coordinates.

Grouping can be specified in two ways: (1) When a string is provided, the unique values of the coordinate with that name are used as groups. (2) When a scipp variable is provided, the variable’s values are used as groups.

Note that option (1) may be very slow if the input is very large, since the unique values of the coordinate must be determined first.

When grouping along a dimension that already has a dimension-coord, the binning for that dimension is modified, i.e., the input and the output will have the same dimension labels.

When grouping by non-dimension-coords, the output will have new dimensions given by the names of these coordinates. These new dimensions replace the dimensions the input coordinates depend on.

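Parameters

x (Union[DataArray, Dataset]) – Input data.

*args (Union[str, Variable]) – Coordinate names or grouping variables.

Returns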

Union[DataArray, Dataset] – Binned data.

See also

scipp.bin

Creating binned data by binning based on edges, instead of grouping.

scipp.binning.make_binned

Lower level function that can bin and group, and does not automatically replace/erase dimensions.

Examples

Group a table by one of its coord columns, specifying (1) a coord name or (2) an actual grouping:

>>> import scipp as sc
>>> from numpy.random import default_rng
>>> rng = default_rng(seed=1234)
>>> x = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> y = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> data = sc.ones(dims=['row'], unit='K', shape=[100])
>>> table = sc.DataArray(data=data, coords={'x': x, 'y': y})
>>> table.coords['label'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.group('label').sizes
{'label': 10}
>>> groups = sc.array(dims=['label'], values=[1, 3, 5], unit='m')
>>> table.group(groups).sizes
{'label': 3}

Group a table by two of its coord columns:

>>> table.coords['a'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.coords['b'] = (table.coords['y'] * 10).to(dtype='int64')
>>> table.group('a', 'b').sizes
{'a': 10, 'b': 10}
>>> groups = sc.array(dims=['a'], values=[1, 3, 5], unit='m')
>>> table.group(groups, 'b').sizes
{'a': 3, 'b': 10}

Group binned data along an additional dimension:

>>> table.coords['a'] = (table.coords['y'] * 10).to(dtype='int64')
>>> binned = table.bin(x=10)
>>> binned.group('a').sizes
{'x': 10, 'a': 10}