scipp.bin
- scipp.bin(x, arg_dict=None, /, *, dim=None, **kwargs)
Create binned data by binning input along all dimensions given by edges.
Bin edges can be specified in three ways:
When an integer is provided, a ‘linspace’ with this requested number of bins is created, based on the min and max of the corresponding coordinate.
A scalar Scipp variable (a value with a unit) is interpreted as a target bin width, and an ‘arange’ covering the min and max of the corresponding coordinate is created.
A custom coordinate, given as a Scipp variable with compatible unit. Typically, this should have a single dimension matching the target dimension.
The dim argument controls which dimensions are concatenated and which are preserved. The default dim=None means that the dimensions of the coordinate used for binning are concatenated. If the input is binned data there may be no such coordinate, in which case dim=None is equivalent to dim=(), resulting in a new dimension in the output. In many cases this default yields the desired behavior, but there are two classes of exceptions where specifying dim explicitly can be useful:
Given input data with an N-D coordinate, where N>1, we can use dim to restrict the binning to a subset of M dimensions, resulting in an (N-M)-D “array” of bins. This can be of particular importance when the input is binned data: Frequently we may want to bin to add an additional dimension, but if there is a dense coordinate present the default dim=None would result in removal of the coordinate’s dimensions. This can be prevented by setting dim=(), which will always add a new dimension.
Given M-D input data with an N-D coordinate, where N<M, we can specify dim to concatenate, e.g., the remaining M-N dimensions while binning. This is often equivalent to omitting dim and calling da.bins.concat() after binning, but it is more memory efficient.
If the dimensions of the input coordinate are not known, using an explicit dim argument can be useful to obtain predictable behavior in generic code.
Warning
When there is existing binning or grouping, the algorithm assumes that coordinates of the binned data are correct, i.e., compatible with the corresponding coordinate values in the individual bins. If this is not the case then the behavior is UNSPECIFIED. That is, the algorithm may or may not ignore the existing coordinates. If you encounter such a case, remove the conflicting coordinate, e.g., using scipp.DataArray.drop_coords().
- Parameters:
arg_dict (Optional[dict[str, SupportsIndex | Variable]], default: None) – Dictionary mapping dimension labels to binning parameters.
dim (Union[str, tuple[str, ...], None], default: None) – Dimension(s) to concatenate into a single bin. If None (the default), the dimensions of the coordinate used for binning are concatenated.
**kwargs (SupportsIndex | Variable) – Mapping of dimension label to corresponding binning parameters.
- Returns:
See also
scipp.hist
For histogramming data.
scipp.group
Creating binned data by grouping, instead of binning based on edges.
scipp.binning.make_binned
Lower level function that can bin and group.
Examples
Bin a table by one of its coord columns, specifying (1) number of bins, (2) bin width, or (3) actual binning:
>>> import scipp as sc
>>> from numpy.random import default_rng
>>> rng = default_rng(seed=1234)
>>> x = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> y = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> data = sc.ones(dims=['row'], unit='K', shape=[100])
>>> table = sc.DataArray(data=data, coords={'x': x, 'y': y})
>>> table.bin(x=2).sizes
{'x': 2}
>>> table.bin(x=sc.scalar(0.2, unit='m')).sizes
{'x': 5}
>>> table.bin(x=sc.linspace('x', 0.2, 0.8, num=10, unit='m')).sizes
{'x': 9}
Bin a table by two of its coord columns:
>>> table.bin(x=4, y=6).sizes
{'x': 4, 'y': 6}
Bin binned data, using new bins along existing dimension:
>>> binned = table.bin(x=10)
>>> binned.bin(x=20).sizes
{'x': 20}
Bin binned data along an additional dimension:
>>> binned = table.bin(x=10)
>>> binned.bin(y=5).sizes
{'x': 10, 'y': 5}
The dim argument controls which dimensions are concatenated and which are preserved. Given 3-D data with a 2-D coordinate, the default dim=None results in:
>>> xyz = sc.data.table_xyz(100).bin(x=4, y=5, z=6)
>>> values = rng.random((4, 5))
>>> xyz.coords['t'] = sc.array(dims=['x', 'y'], unit='s', values=values)
>>> xyz.bin(t=3).sizes
{'z': 6, 't': 3}
Specifying dim=('x', 'y', 'z') or equivalently dim=xyz.dims will additionally concatenate along the z-dimension, resulting in a 1-D array of bins:
>>> xyz.bin(t=3, dim=('x', 'y', 'z')).sizes
{'t': 3}
To preserve a dimension of the input’s t-coordinate, we can drop this dimension from the tuple of dimensions to concatenate:
>>> xyz.bin(t=4, dim='y').sizes
{'x': 4, 'z': 6, 't': 4}
Finally, we can add a new dimension without touching the existing dimensions:
>>> xyz.bin(t=4, dim=()).sizes
{'x': 4, 'y': 5, 'z': 6, 't': 4}
Note that this is generally only useful if the input is binned data with a binned t-coordinate.