Computation with Binned Data

As described in Binned Data scipp can handle certain types of sparse or scattered data such as event data, i.e., data that cannot directly be represented as a multi-dimensional array. This could, e.g., be used to store data from an array of sensors/detectors that are read out independently, with potentially widely varying frequency.

Scipp supports two classes of operations with binned data.

  1. Bin-centric arithmetic treats every bin as an element to, e.g., apply a different scale factor to every bin.

  2. Event-centric arithmetic considers the individual events within bins. This allows for operation without the precision loss that would ensue from simply histogramming data.

Overview and Quick Reference

Before going into a detailed explanation below we provide a quick reference:

  • Unary operations such as sin, cos, or sqrt work as normal.

  • Comparison operations such as less (<) are not supported.

  • Binary operations such as + work in principle, but usually not if both operands represent event data. In that case, see table below.

Given two data arrays a and b, equivalent operations are:

Dense data operation

Binned data equivalent

Comment

a + b

a.bins.concatenate(b)

if both a and b are event data

a - b

a.bins.concatenate(-b)

if both a and b are event data

a += b

a.bins.concatenate(b, out=a)

if both a and b are event data

a -= b

a.bins.concatenate(-b, out=a)

if both a and b are event data

sc.sum(a, 'dim')

a.bins.concatenate('dim')

sc.mean(a, 'dim')

not available

min, max, and other similar reductions are also not available

sc.rebin(a, dim, 'edges')

sc.bin(a, edges=[edges])

groupby(...).sum('dim')

groupby(...).bins.concatenate('dim')

mean, max, and other similar reductions are also available

Concepts

Before assigning events to bins, we can initialize them as a single long list or table. In the simplest case this table has just a single column, i.e., it is a scipp variable:

[1]:
import numpy as np
import scipp as sc

table = sc.array(dims=['event'], values=[0,1,3,1,1,1,42,1,1,1,1,1], dtype='float64')
sc.table(table)

The events in the table can then be mapped into bins:

[2]:
begin = sc.array(dims=['x'], values=[0,6,6,8])
end = sc.array(dims=['x'], values=[6,6,8,12])
var = sc.bins(begin=begin, end=end, dim='event', data=table)
sc.show(var)
var
dims=['x'], shape=[4], unit=dimensionless, variances=Falsevalues x dims=['event'], shape=[12], unit=dimensionless, variances=Falsevalues event
[2]:
Show/Hide data repr Show/Hide attributes
scipp.Variable (160 Bytes)
    • (x: 4)
      VariableView
      binned data [len=6, len=0, len=2, len=4]
      Values:
      [<scipp.Variable> (event: 6) float64 [dimensionless] [0, 1, ..., 1, 1], <scipp.Variable> (event: 0) float64 [dimensionless] [], <scipp.Variable> (event: 2) float64 [dimensionless] [42, 1], <scipp.Variable> (event: 4) float64 [dimensionless] [1, 1, 1, 1]]

Each element of the resulting “bin variable” references a section of the underlying table:

[3]:
sc.table(var['x', 0].value)
sc.table(var['x', 1].value)
sc.table(var['x', 2].value)

Bin-centric arithmetic

Elements of binned variables are views of slices of a variable or data array. An operation such as multiplication of a binned variable with a dense array thus computes the product of the bin (a variable view or data array view) with a scalar element of the dense array. In other words, operations between variables or data arrays broadcast dense data to the lists held in the bins:

[4]:
scale = sc.Variable(dims=['x'], values=np.arange(2.0, 6))
var *= scale
var['x', 0].values
[4]:
Show/Hide data repr Show/Hide attributes
scipp.Variable (48 Bytes out of 96 Bytes)
    • (event: 6)
      float64
      0.0, 2.0, ..., 2.0, 2.0
      Values:
      array([0., 2., 6., 2., 2., 2.])
[5]:
var['x', 1].values
[5]:
Show/Hide data repr Show/Hide attributes
scipp.Variable (0 Bytes out of 96 Bytes)
    • (event: 0)
      float64
      Values:
      array([], dtype=float64)
[6]:
var['x', 2].values
[6]:
Show/Hide data repr Show/Hide attributes
scipp.Variable (16 Bytes out of 96 Bytes)
    • (event: 2)
      float64
      168.0, 4.0
      Values:
      array([168., 4.])

In practice scattered data requires more than one “column” of information. Typically we need at least one coordinate such as an event time stamp in addition to weights. If each scattered data point (event) corresponds to, e.g., a single detected neutron then the weight is 1. As above, we start by creating a single table containing all events:

[7]:
times = sc.array(dims=['event'],
                 unit='us', # micro second
                 values=[0,1,3,1,1,1,4,1,1,2,1,1],
                 dtype='float64')
weights = sc.ones(dims=['event'], unit='counts', shape=[12], with_variances=True)

table = sc.DataArray(data=weights, coords={'time':times})
sc.table(table)
table
[7]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (288 Bytes)
    • event: 12
    • time
      (event)
      float64
      µs
      0.0, 1.0, ..., 1.0, 1.0
      Values:
      array([0., 1., 3., 1., 1., 1., 4., 1., 1., 2., 1., 1.])
    • (event)
      float64
      counts
      1.0, 1.0, ..., 1.0, 1.0
      σ = 1.0, 1.0, ..., 1.0, 1.0
      Values:
      array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

      Variances (σ²):
      array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

This table is then mapped into bins. The resulting “bin variable” can, e.g., be used as the data in a data array, and can be combined with coordinates as usual:

[8]:
var = sc.bins(begin=begin, end=end, dim='event', data=table)
a = sc.DataArray(data=var, coords={'x':sc.Variable(dims=['x'], values=np.arange(4.0))})
a
[8]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (384 Bytes)
    • x: 4
    • x
      (x)
      float64
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • (x)
      DataArrayView
      binned data [len=6, len=0, len=2, len=4]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:6, ] Coordinates: time float64 [µs] (event) [0, 1, ..., 1, 1] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:0, ] Coordinates: time float64 [µs] (event) [] Data: float64 [counts] (event) [] [] , <scipp.DataArray> Dimensions: Sizes[event:2, ] Coordinates: time float64 [µs] (event) [4, 1] Data: float64 [counts] (event) [1, 1] [1, 1] , <scipp.DataArray> Dimensions: Sizes[event:4, ] Coordinates: time float64 [µs] (event) [1, 2, 1, 1] Data: float64 [counts] (event) [1, 1, 1, 1] [1, 1, 1, 1] ]

In practice we rarely use sc.bins (which requires begin and end indices for each bin and an appropriately sorted table) to create binned data. For creating binned variables sc.bins is the only option, but for binning data arrays we typically bin based on the meta data of the individual data points, i.e., the coord columns in the table of scattered data. Use sc.bin (not sc.bins) as described in Binned Data.

In the graphical representation of the data array we can see the dense coordinate (green), and the bins (yellow):

[9]:
sc.show(a)
(dims=['x'], shape=[4], unit=dimensionless, variances=False)values x (dims=['event'], shape=[12], unit=counts, variances=True)variances eventvalues event timetime(dims=['event'], shape=[12], unit=µs, variances=False)values event xx(dims=['x'], shape=[4], unit=dimensionless, variances=False)values x

As before, each bin references a section of the underlying table:

[10]:
sc.table(a['x', 0].value)
sc.table(a['x', 1].value)
sc.table(a['x', 2].value)

Event-centric arithmetic

Complementary to the bin-centric arithmetic operations, which treat every bin as an element, we may need more fine-grained operations that treat events within bins individually. Scipp provides a number of such operations, defined in a manner to produce a result equivalent to that of a corresponding operation for histogrammed data, but preserving events and thus keeping the options of, e.g., binning to a higher resolution or filtering by meta data later. For example, addition of histogrammed data would correspond to concatenating event lists of individual bins.

The following operations are supported:

  • “Addition” of data arrays containing event data in bins. This is achieved by concatenating the underlying event lists.

  • “Subtraction” of data arrays containing event data in bins. This is performed by concatenating with a negative weight for the subtrahend.

  • “Multiplication” of a data array containing event data in bins with a data array with dense, histogrammed data. The weight of each event is scaled by the value of the corresponding bin in the histogram. Note that in contrast to the bin-centric operations the histogram may have a different (higher) resolution than the bin size of the binned event data. This operation does not match based on the coords of the bin but instead considers the more precise coord values of the individual events, i.e., the operation is performances based on the meta data column in the underlying table.

  • “Division” of a data array containing event data in bins by a data array with dense, histogrammed data. This is performed by scaling with the inverse of the denominator.

WARNING:

It is important to note that these operations, in particular multiplication and division, are only interchangeable with histogramming if the variances of the “histogram” operand are negligible. If these variances are not negligible the operation on the event data introduces correlations in the error bars of the individual events. Scipp has no way of tracking such correlations and a subsequent histogram step propagates uncertainties under the assumption of uncorrelated error bars.

Addition

[11]:
sc.show(a['x',2].value)
a.bins.concatenate(a, out=a)
sc.show(a['x',2].value)
a.bins.sum().plot()
(dims=['event'], shape=[2], unit=counts, variances=True)variances eventvalues event timetime(dims=['event'], shape=[2], unit=µs, variances=False)values event
(dims=['event'], shape=[4], unit=counts, variances=True)variances eventvalues event timetime(dims=['event'], shape=[4], unit=µs, variances=False)values event

Subtraction

[12]:
zero = a.copy()
sc.show(zero['x',2].value)
zero.bins.concatenate(-zero, out=zero)
sc.show(zero['x',2].value)
zero.bins.sum().plot()
(dims=['event'], shape=[4], unit=counts, variances=True)variances eventvalues event timetime(dims=['event'], shape=[4], unit=µs, variances=False)values event
(dims=['event'], shape=[8], unit=counts, variances=True)variances eventvalues event timetime(dims=['event'], shape=[8], unit=µs, variances=False)values event
/usr/share/miniconda/envs/test/conda-bld/scipp_1642083266382/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.7/site-packages/scipp/plotting/figure1d.py:344: UserWarning: Attempting to set identical bottom == top == 0.0 results in singular transformations; automatically expanding.
  self.ax.set_ylim(vmin, vmax)

Multiplication and division

We prepare a histogram that will be used to scale the event weights:

[13]:
time_bins = sc.array(dims=['time'], unit=sc.units.us, values=[0.0, 3.0, 6.0])
weight = sc.array(dims=['time'], values=[10.0, 3.0], variances=[10.0, 3.0])
hist = sc.DataArray(data=weight, coords={'time':time_bins})
hist
[13]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (56 Bytes)
    • time: 2
    • time
      (time [bin-edge])
      float64
      µs
      0.0, 3.0, 6.0
      Values:
      array([0., 3., 6.])
    • (time)
      float64
      10.0, 3.0
      σ = 3.162, 1.732
      Values:
      array([10., 3.])

      Variances (σ²):
      array([10., 3.])

The helper sc.lookup provides a wrapper for a discrete “function”, given by a data array depending on a specified dimension. The binary arithmetic operators on the bins property support this function-like object:

[14]:
a.bins /= sc.lookup(func=hist, dim='time')
sc.histogram(a, bins=time_bins).plot()