{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Computation with Binned Data\n", "\n", "As described in [Binned Data](binned-data.ipynb), scipp can handle certain types of sparse or scattered data such as *event data*, i.e., data that cannot directly be represented as a multi-dimensional array.\n", "This can, e.g., be used to store data from an array of sensors/detectors that are read out independently, potentially at widely varying rates.\n", "\n", "Scipp supports two classes of operations with binned data:\n", "\n", "1. [Bin-centric arithmetic](#Bin-centric-arithmetic) treats every bin as a single element, e.g., to apply a different scale factor to every bin.\n", "2. [Event-centric arithmetic](#Event-centric-arithmetic) considers the individual events within bins.\n", " This allows for operations without the loss of precision that would result from histogramming the data first." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview and Quick Reference\n", "\n", "Before going into a detailed explanation, we provide a quick reference:\n", "\n", "- Unary operations such as `sin`, `cos`, or `sqrt` work as normal.\n", "- Comparison operations such as `less` (`<`) are not supported.\n", "- Binary operations such as `+` work in principle, but usually not if both operands represent event data.\n", " In that case, see the table below.\n", "\n", "Given two data arrays `a` and `b`, equivalent operations are:\n", "\n", "Dense data operation | Binned data equivalent | Comment\n", ":--- |:--- |:---\n", "`a + b` | `a.bins.concatenate(b)` | if both `a` and `b` are event data\n", "`a - b` | `a.bins.concatenate(-b)` | if both `a` and `b` are event data\n", "`a += b` | `a.bins.concatenate(b, out=a)` | if both `a` and `b` are event data\n", "`a -= b` | `a.bins.concatenate(-b, out=a)` | if both `a` and `b` are event data\n", "`sc.sum(a, 'dim')` | `a.bins.concatenate('dim')` | \n", "`sc.mean(a, 'dim')` | not available | `min`, `max`, and other similar reductions are also not available\n", "`sc.rebin(a, 'dim', edges)` | `sc.bin(a, edges=[edges])` | \n", "`groupby(...).sum('dim')` | `groupby(...).bins.concatenate('dim')` | `mean`, `max`, and other similar reductions are also available" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concepts\n", "\n", "Before assigning events to bins, we can initialize them as a single long list or table.\n", "In the simplest case, this table has just a single column, i.e., it is a scipp variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipp as sc\n", "\n", "table = sc.array(dims=['event'], values=[0,1,3,1,1,1,42,1,1,1,1,1], dtype='float64')\n", "sc.table(table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The events in the table can then be mapped into bins:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "begin = sc.array(dims=['x'], values=[0,6,6,8])\n", "end = sc.array(dims=['x'], values=[6,6,8,12])\n", "var = sc.bins(begin=begin, end=end, dim='event', data=table)\n", "sc.show(var)\n", "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each element of the resulting \"bin variable\" references a section of the underlying table:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sc.table(var['x', 0].value)\n", "sc.table(var['x', 1].value)\n", "sc.table(var['x', 2].value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bin-centric arithmetic\n", "\n", "Elements of binned variables are views of slices of a variable or data array.\n", "An operation such as multiplication of a binned variable with a dense array thus computes the product of the bin (a variable view or data array view) with a scalar element of the dense array.\n", "In other words, operations between variables or data arrays broadcast dense data to the lists held in the bins:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scale = sc.Variable(dims=['x'], values=np.arange(2.0, 6))\n", "var *= scale\n", "var['x', 0].values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var['x', 1].values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var['x', 2].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, scattered data requires more than one \"column\" of information.\n", "Typically, we need at least one coordinate, such as an event time stamp, in addition to the weights.\n", "If each scattered data point (event) corresponds to, e.g., a single detected neutron, then the weight is 1.\n", "As above, we start by creating a single table containing *all* events:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times = sc.array(dims=['event'],\n", " unit='us', # microseconds\n", " values=[0,1,3,1,1,1,4,1,1,2,1,1],\n", " dtype='float64')\n", "weights = sc.ones(dims=['event'], unit='counts', shape=[12], with_variances=True)\n", "\n", "table = sc.DataArray(data=weights, coords={'time':times})\n", "sc.table(table)\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This table is then mapped into bins.\n", "The resulting \"bin variable\" can, e.g., be used as the data in a data array, and can be combined with coordinates as usual:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = sc.bins(begin=begin, end=end, dim='event', data=table)\n", "a = sc.DataArray(data=var, coords={'x':sc.Variable(dims=['x'], values=np.arange(4.0))})\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "