What is scipp

Containers providing multi-dimensional arrays with associated dicts of coordinates, masks, and attributes

  • A Mantid evolution born out of an attempt to rethink data structures

  • Heavily influenced by the Python xarray project

  • C++ core with Python bindings. Python is a first-class element.

  • Development gathered pace in 2020

Feature Summary

  • Very flexible containers with good optimisation potential

  • Supports key features: Variances, Histograms, Masking, Events, Units, Bin-edges, Slicing, Sample-Environment

  • Can provide a good scientific representation of data and does not force users to work in Detector-Space

  • Emphasises use of built-in generic functions

  • Bundles its own plotting library

  • Dataset and DataArray are the main data containers

Feature Exhibit

There are many demos and tutorials in the scipp online documentation

N-d data

We take the example of a 2D numpy array with values between 1 and 100

[1]:
import numpy as np
data = np.arange(1.0, 101.0).reshape(10,10)
data
[1]:
array([[  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.],
       [ 11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.],
       [ 21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.],
       [ 31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.,  40.],
       [ 41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,  50.],
       [ 51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.,  60.],
       [ 61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,  70.],
       [ 71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,  80.],
       [ 81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.],
       [ 91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99., 100.]])

In scipp we attach labels to the dimensions. This additional information helps with numerous things as we will see below.

[2]:
import scipp as sc
image_data = sc.array(dims=['y', 'x'], values=data)
[3]:
sc.plot(image_data)

Coordinates and Units

[4]:
# Let's give our image data the correct units
image_data.unit = sc.units.counts

x = sc.array(dims=['x'], values=np.arange(10), unit=sc.units.mm)
y = sc.array(dims=['y'], values=np.arange(10), unit=sc.units.mm)
image = sc.DataArray(data=image_data, coords={'x':x, 'y':y})
sc.plot(image, aspect='equal')
[5]:
sc.show(image)
(dims=['y', 'x'], shape=[10, 10], unit=counts, variances=False)
y: (dims=['y'], shape=[10], unit=mm, variances=False)
x: (dims=['x'], shape=[10], unit=mm, variances=False)

Unit mismatch

Coordinates and units are not just about pretty labels: they provide safety against preventable and costly mistakes. Let's see.

[6]:
reference = image.copy()
normalized = image / reference
try:
    image + normalized # Caught!
except RuntimeError as e:
    print(e)
Cannot add counts and dimensionless.
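
The principle behind this check can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how scipp implements units; the `Quantity` class and its fields are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical value-with-unit pair, for illustration only."""
    value: float
    unit: str

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition only makes sense when both operands share a unit.
        if self.unit != other.unit:
            raise ValueError(f"Cannot add {self.unit} and {other.unit}.")
        return Quantity(self.value + other.value, self.unit)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Dividing like units cancels them; otherwise record the ratio.
        if self.unit == other.unit:
            unit = "dimensionless"
        else:
            unit = f"{self.unit}/{other.unit}"
        return Quantity(self.value / other.value, unit)


counts = Quantity(10.0, "counts")
normalized = counts / Quantity(5.0, "counts")

try:
    counts + normalized
    message = ""
except ValueError as e:
    message = str(e)
print(message)
```

Carrying the unit alongside the value is what lets the mistake surface at the point of the operation rather than in the final result.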

Coordinate mismatch

[7]:
background_corrected = reference - image
sc.plot(background_corrected)
[8]:
reference.coords['x'] += 4 * sc.units.mm # Detector shifted along x
sc.plot(reference)
[9]:
try:
    reference - image
except RuntimeError as e:
    print(e)
Mismatch in coordinate 'x', expected
(x: 10)      int64             [mm]  [4, 5, ..., 12, 13], got
(x: 10)      int64             [mm]  [0, 1, ..., 8, 9]
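
For comparison, plain numpy arrays carry no coordinates, so a shifted detector would go unnoticed. A minimal sketch of the kind of alignment check involved, assuming coordinates are kept in separate arrays; `subtract_aligned` is a hypothetical helper, not part of scipp or numpy.

```python
import numpy as np


def subtract_aligned(a_values, a_coord, b_values, b_coord):
    """Subtract b from a, refusing to proceed if the coordinates differ.

    Hypothetical helper for illustration only.
    """
    if not np.array_equal(a_coord, b_coord):
        raise ValueError("Mismatch in coordinate: operands are not aligned")
    return a_values - b_values


x = np.arange(10)               # pixel positions in mm
signal = np.arange(10.0, 20.0)  # one row of counts
shifted_x = x + 4               # detector shifted along x

try:
    subtract_aligned(signal, shifted_x, signal, x)
    caught = False
except ValueError:
    caught = True
print(caught)
```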

Masking

[10]:
image2 = image.copy()
image.masks['lhs'] = image.coords['x'] < 5.0 * sc.units.mm
sc.plot(image)

Let's make more masks…

[11]:
image.masks['bad-pixel'] = image.data >= 99 * sc.units.counts
sc.plot(image)
[12]:
image2.masks['bad-row'] = image.coords['y'] == 6 * sc.units.mm
sc.plot(image2)

Masks are combined with a logical OR. The underlying data is not zeroed until an operation forces the mask to be dropped.

[13]:
image += image2
sc.plot(image)
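
The OR combination of masks can be illustrated with plain numpy. This is a sketch under the assumption that each mask is a boolean array broadcastable to the data; it mirrors the three masks defined above but is not scipp's implementation.

```python
import numpy as np

values = np.arange(1.0, 101.0).reshape(10, 10)  # dims: (y, x)
x = np.arange(10)
y = np.arange(10)

# Named masks, reshaped so they broadcast against the (y, x) data.
masks = {
    "lhs": (x < 5)[np.newaxis, :],       # depends on x only
    "bad-row": (y == 6)[:, np.newaxis],  # depends on y only
    "bad-pixel": values >= 99,           # depends on y and x
}

# A pixel is excluded if ANY mask flags it: logical OR across all masks.
combined = np.zeros(values.shape, dtype=bool)
for mask in masks.values():
    combined = combined | mask

# The values stay untouched; masked pixels are only skipped when reducing.
total = values[~combined].sum()
print(combined.sum(), total)
```

Keeping the masks separate and named, rather than overwriting the data, means a mask can be removed or inspected later without reloading anything.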
[14]:
sc.to_html(image)
sc.show(image)
scipp.DataArray (1.05 KB)
    • y: 10
    • x: 10
    • x
      (x)
      int64
      mm
      0, 1, ..., 8, 9
      Values:
      array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    • y
      (y)
      int64
      mm
      0, 1, ..., 8, 9
      Values:
      array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    • (y, x)
      float64
      counts
      2.0, 4.0, ..., 198.0, 200.0
      Values:
      array([[ 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.], [ 22., 24., 26., 28., 30., 32., 34., 36., 38., 40.], [ 42., 44., 46., 48., 50., 52., 54., 56., 58., 60.], [ 62., 64., 66., 68., 70., 72., 74., 76., 78., 80.], [ 82., 84., 86., 88., 90., 92., 94., 96., 98., 100.], [102., 104., 106., 108., 110., 112., 114., 116., 118., 120.], [122., 124., 126., 128., 130., 132., 134., 136., 138., 140.], [142., 144., 146., 148., 150., 152., 154., 156., 158., 160.], [162., 164., 166., 168., 170., 172., 174., 176., 178., 180.], [182., 184., 186., 188., 190., 192., 194., 196., 198., 200.]])
    • bad-pixel
      (y, x)
      bool
      False, False, ..., True, True
      Values:
      array([[False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, True, True]])
    • bad-row
      (y)
      bool
      False, False, ..., False, False
      Values:
      array([False, False, False, False, False, False, True, False, False, False])
    • lhs
      (x)
      bool
      True, True, ..., False, False
      Values:
      array([ True, True, True, True, True, False, False, False, False, False])
(dims=['y', 'x'], shape=[10, 10], unit=counts, variances=False)
bad-row: (dims=['y'], shape=[10], unit=dimensionless, variances=False)
y: (dims=['y'], shape=[10], unit=mm, variances=False)
x: (dims=['x'], shape=[10], unit=mm, variances=False)
bad-pixel: (dims=['y', 'x'], shape=[10, 10], unit=dimensionless, variances=False)
lhs: (dims=['x'], shape=[10], unit=dimensionless, variances=False)

Slicing

In numpy you are required to know your dimension order

[15]:
data[4:,:]
[15]:
array([[ 41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,  50.],
       [ 51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.,  60.],
       [ 61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,  70.],
       [ 71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,  80.],
       [ 81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.],
       [ 91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99., 100.]])
[16]:
data[:, 4:] # Or was it the other way round?
[16]:
array([[  5.,   6.,   7.,   8.,   9.,  10.],
       [ 15.,  16.,  17.,  18.,  19.,  20.],
       [ 25.,  26.,  27.,  28.,  29.,  30.],
       [ 35.,  36.,  37.,  38.,  39.,  40.],
       [ 45.,  46.,  47.,  48.,  49.,  50.],
       [ 55.,  56.,  57.,  58.,  59.,  60.],
       [ 65.,  66.,  67.,  68.,  69.,  70.],
       [ 75.,  76.,  77.,  78.,  79.,  80.],
       [ 85.,  86.,  87.,  88.,  89.,  90.],
       [ 95.,  96.,  97.,  98.,  99., 100.]])

But with scipp you can “crop” any dimension using the dimension label as a key.

[17]:
sc.plot(image['x', 4:], aspect='equal')

You can also chain the slicing operations.

[18]:
sc.plot(image['y', 1:]['x', 4:], aspect='equal')
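
What label-based slicing saves you from can be seen by writing the lookup by hand in numpy. A minimal sketch, assuming the dimension names are tracked in a separate list; `slice_by_label` is a hypothetical helper, not a scipp or numpy function.

```python
import numpy as np


def slice_by_label(values, dims, dim, index):
    """Slice `values` along the axis named `dim`.

    Hypothetical helper for illustration; `dims` lists the axis names in
    the same order as the axes of `values`.
    """
    axis = dims.index(dim)             # look the axis up by name
    key = [slice(None)] * values.ndim  # select everything by default
    key[axis] = index
    return values[tuple(key)]


data = np.arange(1.0, 101.0).reshape(10, 10)
dims = ["y", "x"]

cropped = slice_by_label(data, dims, "x", slice(4, None))
# Chaining works because a range slice keeps the dimension order intact.
chained = slice_by_label(cropped, dims, "y", slice(1, None))
print(cropped.shape, chained.shape)
```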

Dynamic type control

[19]:
image_data
[19]:
scipp.Variable (800 Bytes)
    • (y: 10, x: 10)
      float64
      counts
      2.0, 4.0, ..., 198.0, 200.0
      Values:
      array([[ 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.], [ 22., 24., 26., 28., 30., 32., 34., 36., 38., 40.], [ 42., 44., 46., 48., 50., 52., 54., 56., 58., 60.], [ 62., 64., 66., 68., 70., 72., 74., 76., 78., 80.], [ 82., 84., 86., 88., 90., 92., 94., 96., 98., 100.], [102., 104., 106., 108., 110., 112., 114., 116., 118., 120.], [122., 124., 126., 128., 130., 132., 134., 136., 138., 140.], [142., 144., 146., 148., 150., 152., 154., 156., 158., 160.], [162., 164., 166., 168., 170., 172., 174., 176., 178., 180.], [182., 184., 186., 188., 190., 192., 194., 196., 198., 200.]])
[20]:
image_data.astype('float32')
[20]:
scipp.Variable (400 Bytes)
    • (y: 10, x: 10)
      float32
      counts
      2.0, 4.0, ..., 198.0, 200.0
      Values:
      array([[ 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.], [ 22., 24., 26., 28., 30., 32., 34., 36., 38., 40.], [ 42., 44., 46., 48., 50., 52., 54., 56., 58., 60.], [ 62., 64., 66., 68., 70., 72., 74., 76., 78., 80.], [ 82., 84., 86., 88., 90., 92., 94., 96., 98., 100.], [102., 104., 106., 108., 110., 112., 114., 116., 118., 120.], [122., 124., 126., 128., 130., 132., 134., 136., 138., 140.], [142., 144., 146., 148., 150., 152., 154., 156., 158., 160.], [162., 164., 166., 168., 170., 172., 174., 176., 178., 180.], [182., 184., 186., 188., 190., 192., 194., 196., 198., 200.]], dtype=float32)
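
The halving of the reported size (800 bytes to 400 bytes) follows directly from the element width. The same effect can be reproduced with plain numpy, shown here as a sketch on an equivalent 10x10 array:

```python
import numpy as np

data = np.arange(2.0, 201.0, 2.0).reshape(10, 10)  # float64 by default
single = data.astype("float32")                    # returns a converted copy

print(data.dtype, data.nbytes)      # float64 800
print(single.dtype, single.nbytes)  # float32 400
```

The conversion returns a new array and leaves the original untouched, so it is safe to try a narrower type and compare results before committing to it.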

Compatibility

Mantid

  • scipp data structures are not API compatible with Mantid’s

  • scipp and Mantid data structures (workspaces) are convertible. As one-liners in some cases:

ds = sc.neutron.from_mantid(a_mantid)

  • scipp can load and use NeXus files like Mantid

ds = sc.neutron.load("experiment.nxs")

More on this topic in the docs

Numpy

scipp objects can expose their underlying arrays in a numpy compatible form. This makes it possible to use numpy operations directly on scipp variables.

[21]:
x = sc.array(dims=['x'], values=np.linspace(-np.pi, np.pi, 20))
y = x.copy() # copy of x, reused as the output buffer
np.sin(x.values, out=y.values)
sc.plot(y)
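
The cell above relies on `.values` giving access to the underlying memory, so that numpy's `out=` argument writes straight into the scipp variable. The mechanism itself can be sketched with numpy alone, where a basic slice stands in for the exposed view (an assumption made for illustration):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20)
buffer = x.copy()  # pre-allocated output with the same shape and dtype
view = buffer[:]   # a view onto the same memory, standing in for `.values`

# `out=` makes the ufunc write into existing memory rather than allocate.
result = np.sin(x, out=view)

print(result is view)                  # the returned array is the output view
print(np.shares_memory(view, buffer))  # view and owner share one buffer
print(np.allclose(buffer, np.sin(x)))  # so the owner sees the new values
```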

Packages

Conda packages for Linux, OSX, and Windows are available on Anaconda Cloud

Installation

Simply run:

 conda install -c conda-forge -c scipp scipp

Interoperability with Mantid is achieved by installing the mantid-framework package, which is an optional dependency. It can be installed through the same channels.

 conda install -c conda-forge -c scipp mantid-framework

Full installation notes here
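
After installing, one way to confirm that the packages are visible to the current Python environment is to look up their import specs without importing them. A minimal sketch using only the standard library; the module names `scipp` and `mantid` are assumptions about what the conda packages provide.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found without importing it."""
    return find_spec(module_name) is not None


# Module names are assumptions; adjust them if your packages differ.
for name in ("scipp", "mantid"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```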

Lots more in scipp

  • IO

  • label-based slicing

  • events/binning

  • grouping and filtering operations

Future Plans

  • Across technique areas, issues and priorities are already being driven by Instrument Data Scientists, including Søren Schmidt.

  • Data-driven development using reduction workflows

  • Priority is to support getting Day One instruments ready for Hot Commissioning

  • Aligned to the above, scipp is being supplemented by ess- and neutron-specific modules that provide bespoke tools.

  • scipp-widgets library also under development for building-block GUI additions. See docs

  • Technical short-term roadmap already available

Further Reading

  1. Heybrock, Simon et al. "Scipp: Scientific Data Handling with Labeled Multi-dimensional Arrays for C++ and Python". 1 Jan. 2020: 169-181.

  2. Source Code

[22]:
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.max_open_warning': 0})