scipp.group

scipp.group(x, /, *args, dim=None)

Create binned data by grouping input by one or more coordinates.

Grouping can be specified in two ways: (1) When a string is provided, the unique values of the coordinate with that name are used as groups. (2) When a Scipp variable is provided, the variable’s values are used as groups.

Note that option (1) may be very slow if the input is very large.

The dim argument controls which dimensions are concatenated and which are preserved. The default dim=None means that the dimensions of the coordinate used for grouping are concatenated. If the input is binned data, there may be no such coordinate, in which case dim=None is equivalent to dim=(), resulting in a new dimension in the output. In many cases this default yields the desired behavior, but there are two classes of exceptions where specifying dim explicitly can be useful:

  1. Given input data with an N-D coordinate, where N>1, we can use dim to restrict the grouping to a subset of M dimensions, resulting in an (N-M)-D array of bins. This can be of particular importance when the input is binned data: Frequently we may want to group to add an additional dimension, but if there is a dense coordinate present the default dim=None would result in removal of the coordinate’s dimensions. This can be prevented by setting dim=(), which will always add a new dimension.

  2. Given M-D input data with an N-D coordinate, where N<M, we can specify dim to concatenate, e.g., the remaining M-N dimensions while grouping. This is often equivalent to not specifying dim and a call to da.bins.concat() after grouping but is more memory efficient.

If the dimensions of the input coordinate are not known, using an explicit dim argument can be useful to obtain predictable behavior in generic code.

Warning

When there is existing binning or grouping, the algorithm assumes that coordinates of the binned data are correct, i.e., compatible with the corresponding coordinate values in the individual bins. If this is not the case then the behavior is UNSPECIFIED. That is, the algorithm may or may not ignore the existing coordinates. If you encounter such a case, remove the conflicting coordinate, e.g., using scipp.DataArray.drop_coords().

Parameters:
  • x (DataArray | DataGroup[Any]) – Input data.

  • *args (str | Variable) – Dimension labels or grouping variables.

  • dim (Union[str, tuple[str, ...], None], default: None) – Dimension(s) to concatenate into a single bin. If None (the default), the dimensions of the coordinate used for grouping are concatenated.

Returns:

DataArray | DataGroup[Any] – Binned data.

See also

scipp.bin

Creating binned data by binning based on edges, instead of grouping.

scipp.binning.make_binned

Lower level function that can bin and group.

Examples

Group a table by one of its coord columns, specifying (1) a coord name or (2) an actual grouping:

>>> import scipp as sc
>>> from numpy.random import default_rng
>>> rng = default_rng(seed=1234)
>>> x = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> y = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> data = sc.ones(dims=['row'], unit='K', shape=[100])
>>> table = sc.DataArray(data=data, coords={'x': x, 'y': y})
>>> table.coords['label'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.group('label').sizes
{'label': 10}
>>> groups = sc.array(dims=['label'], values=[1, 3, 5], unit='m')
>>> table.group(groups).sizes
{'label': 3}

Group a table by two of its coord columns:

>>> table.coords['a'] = (table.coords['x'] * 10).to(dtype='int64')
>>> table.coords['b'] = (table.coords['y'] * 10).to(dtype='int64')
>>> table.group('a', 'b').sizes
{'a': 10, 'b': 10}
>>> groups = sc.array(dims=['a'], values=[1, 3, 5], unit='m')
>>> table.group(groups, 'b').sizes
{'a': 3, 'b': 10}

Group binned data along an additional dimension:

>>> table.coords['a'] = (table.coords['y'] * 10).to(dtype='int64')
>>> binned = table.bin(x=10)
>>> binned.group('a').sizes
{'x': 10, 'a': 10}

The dim argument controls which dimensions are concatenated and which are preserved. Given 3-D data with a 2-D coordinate, the default dim=None results in:

>>> xyz = sc.data.table_xyz(100).bin(x=4, y=5, z=6)
>>> times = rng.integers(low=1, high=3, size=(4, 5))
>>> xyz.coords['t'] = sc.array(dims=['x', 'y'], unit='s', values=times)
>>> xyz.group('t').sizes
{'z': 6, 't': 2}

Specifying dim=('x', 'y', 'z') or, equivalently, dim=xyz.dims will additionally concatenate along the z-dimension, resulting in a 1-D array of bins:

>>> xyz.group('t', dim=('x', 'y', 'z')).sizes
{'t': 2}

To preserve a dimension of the input’s t-coordinate, we can drop this dimension from the tuple of dimensions to concatenate:

>>> xyz.group('t', dim='y').sizes
{'x': 4, 'z': 6, 't': 2}

Finally, we can add a new dimension without touching the existing dimensions:

>>> xyz.group('t', dim=()).sizes
{'x': 4, 'y': 5, 'z': 6, 't': 2}

Note that this is generally only useful if the input is binned data with a binned t-coordinate.