scipp.group

scipp.group(x: DataArray, /, *args: str | Variable) → DataArray
scipp.group(x: DataGroup, /, *args: str | Variable) → DataGroup

Create binned data by grouping input by one or more coordinates.

Grouping can be specified in two ways: (1) When a string is provided the unique values of the corresponding coordinate are used as groups. (2) When a Scipp variable is provided then the variable’s values are used as groups.

Note that option (1) may be very slow if the input is very large.

When grouping a dimension with an existing dimension-coord, the binning for the dimension is modified, i.e., the input and the output will have the same dimension labels.

When grouping by non-dimension-coords, the output will have new dimensions given by the names of these coordinates. These new dimensions replace the dimensions the input coordinates depend on.

Warning

When there is existing binning or grouping, the algorithm assumes that coordinates of the binned data are correct, i.e., compatible with the corresponding coordinate values in the individual bins. If this is not the case then the behavior is UNSPECIFIED. That is, the algorithm may or may not ignore the existing coordinates. If you encounter such a case, remove the conflicting coordinate, e.g., using scipp.DataArray.drop_coords().

Parameters:
  • x – Input data.

  • *args – Dimension labels or grouping variables.

Returns:

Binned data.

See also

scipp.bin

Creating binned data by binning based on edges, instead of grouping.

scipp.binning.make_binned

Lower-level function that can bin and group, and does not automatically replace/erase dimensions.

Examples

Group a table by one of its coord columns, specifying (1) a coord name or (2) an actual grouping:

>>> import scipp as sc
>>> from numpy.random import default_rng
>>> rng = default_rng(seed=1234)
>>> x = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> y = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> data = sc.ones(dims=['row'], unit='K', shape=[100])
>>> table = sc.DataArray(data=data, coords={'x': x, 'y': y})
>>> table.coords['label'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.group('label').sizes
{'label': 10}
>>> groups = sc.array(dims=['label'], values=[1, 3, 5], unit='m')
>>> table.group(groups).sizes
{'label': 3}

Group a table by two of its coord columns:

>>> table.coords['a'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.coords['b'] = (table.coords['y'] * 10).to(dtype='int64')
>>> table.group('a', 'b').sizes
{'a': 10, 'b': 10}
>>> groups = sc.array(dims=['a'], values=[1, 3, 5], unit='m')
>>> table.group(groups, 'b').sizes
{'a': 3, 'b': 10}

Group binned data along an additional dimension:

>>> table.coords['a'] = (table.coords['y'] * 10).to(dtype='int64')
>>> binned = table.bin(x=10)
>>> binned.group('a').sizes
{'x': 10, 'a': 10}