scipp.hist

scipp.hist(x, arg_dict=None, /, *, dim=None, **kwargs)

Compute a histogram.

Bin edges can be specified in three ways:

  1. When an integer is provided, a ‘linspace’ with this requested number of bins is created, based on the min and max of the corresponding coordinate.

  2. A scalar Scipp variable (a value with a unit) is interpreted as a target bin width, and an ‘arange’ covering the min and max of the corresponding coordinate is created.

  3. A custom coordinate, given as a Scipp variable with a compatible unit. Typically this should have a single dimension matching the target dimension.
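
As a rough illustration of the first two options, the following pure-Python sketch builds edges the way the list describes. The helper names edges_from_count and edges_from_width are hypothetical and this is not Scipp's implementation; in particular, how Scipp treats values that fall exactly on the last edge is not modeled here.

```python
import math


def edges_from_count(values, nbins):
    """Equally spaced edges (a 'linspace') spanning min..max of the values."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / nbins
    # nbins bins need nbins + 1 edges; the last edge is set to the maximum.
    return [lo + i * step for i in range(nbins)] + [hi]


def edges_from_width(values, width):
    """Edges spaced by `width` (an 'arange'), starting at min and covering max."""
    lo, hi = min(values), max(values)
    nbins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(nbins + 1)]


coord = [0.1, 0.4, 0.45, 0.9, 1.3]  # made-up coordinate values
count_edges = edges_from_count(coord, 3)
width_edges = edges_from_width(coord, 0.5)
print(count_edges)
print(width_edges)
```

With a bin count the edges end at the maximum of the coordinate, whereas with a bin width the last edge may lie beyond it.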

The dim argument controls which dimensions are summed over and which are preserved. The default dim=None means that the dimensions of the coordinate used for histogramming are summed over. If the input is binned data there may be no such coordinate, in which case dim=None is equivalent to dim=(), resulting in a new dimension in the output. In many cases this default yields the desired behavior, but there are two classes of exceptions where specifying dim explicitly can be useful:

  1. Given input data with an N-D coordinate, where N>1, we can use dim to restrict the sum to a subset of M dimensions, resulting in an (N-M)-D “array” of histograms. This can be of particular importance when the input is binned data: frequently we may want to histogram in order to add an additional dimension, but if a dense coordinate is present the default dim=None would remove the coordinate’s dimensions. This can be prevented by setting dim=(), which will always add a new dimension.

  2. Given M-D input data with an N-D coordinate, where N<M, we can specify dim to sum over, e.g., the remaining M-N dimensions while histogramming. This is often equivalent to leaving dim unspecified and calling sum after histogramming, but it is more memory efficient.

If the dimensions of the input coordinate are not known, using an explicit dim argument can be useful to obtain predictable behavior in generic code.

Parameters:
  • x – Input data.

  • arg_dict (default: None) – Dictionary mapping dimension labels to binning parameters.

  • dim (default: None) – Dimension(s) to sum over when histogramming. If None (the default), the dimensions of the coordinate used for histogramming are summed over.

  • **kwargs – Mapping of dimension label to corresponding binning parameters.

Returns:

Histogrammed data.

See also

scipp.nanhist

Like scipp.hist, but NaN values are skipped.

scipp.bin

Creating binned data by binning instead of summing all contributions.

scipp.binning.make_histogrammed

Lower level function for histogramming.

Examples

Histogram a table by one of its coord columns, specifying (1) number of bins, (2) bin width, or (3) actual binning:

>>> import scipp as sc
>>> from numpy.random import default_rng
>>> rng = default_rng(seed=1234)
>>> x = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> y = sc.array(dims=['row'], unit='m', values=rng.random(100))
>>> data = sc.ones(dims=['row'], unit='K', shape=[100])
>>> table = sc.DataArray(data=data, coords={'x': x, 'y': y})
>>> table.hist(x=2)
<scipp.DataArray>
Dimensions: Sizes[x:2, ]
Coordinates:
* x                         float64              [m]  (x [bin-edge])  [0.00313229, 0.497696, 0.992259]
Data:
                            float64              [K]  (x)  [53, 47]
>>> table.hist(x=sc.scalar(0.2, unit='m')).sizes
{'x': 5}
>>> table.hist(x=sc.linspace('x', 0.2, 0.8, num=10, unit='m')).sizes
{'x': 9}

Histogram a table by two of its coord columns:

>>> table.hist(x=4, y=6).sizes
{'x': 4, 'y': 6}

Histogram binned data, using existing bins:

>>> binned = table.bin(x=10)
>>> binned.hist().sizes
{'x': 10}

Histogram binned data, using new bins along existing dimension:

>>> binned = table.bin(x=10)
>>> binned.hist(x=20).sizes
{'x': 20}

Histogram binned data along an additional dimension:

>>> binned = table.bin(x=10)
>>> binned.hist(y=5).sizes
{'x': 10, 'y': 5}

The dim argument controls which dimensions are summed over and which are preserved. Given 3-D data with a 2-D coordinate, the default dim=None results in:

>>> xyz = sc.data.table_xyz(100).bin(x=4, y=5, z=6)
>>> xyz.coords['t'] = sc.array(dims=['x', 'y'], unit='s', values=rng.random((4, 5)))
>>> xyz.hist(t=3).sizes
{'z': 6, 't': 3}

Specifying dim=('x', 'y', 'z') or equivalently dim=xyz.dims will additionally sum over the z-dimension, resulting in a 1-D histogram:

>>> xyz.hist(t=3, dim=('x', 'y', 'z')).sizes
{'t': 3}

To preserve a dimension of the input’s t-coordinate, we can drop this dimension from the tuple of dimensions to sum over:

>>> xyz.hist(t=4, dim='y').sizes
{'x': 4, 'z': 6, 't': 4}