GroupBy

“Group by” operations refer to an implementation of the “split-apply-combine” approach known from pandas and xarray. Currently, only a limited number of operations can be applied to the groups.
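
As a minimal sketch of the split-apply-combine idea itself, the three steps can be written out in plain Python. This is not how scipp implements it; the `records` list and its labels are hypothetical toy data chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy records: (label, value) pairs standing in for data with a coordinate.
records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]

# Split: collect the values that share a label.
groups = defaultdict(list)
for label, value in records:
    groups[label].append(value)

# Apply: reduce each group independently (here: a sum).
sums = {label: sum(values) for label, values in groups.items()}

# Combine: assemble the per-group results into one output, ordered by label.
result = sorted(sums.items())
print(result)  # [('a', 4.0), ('b', 7.0), ('c', 4.0)]
```

The examples in this notebook follow the same pattern, with bins of a coordinate taking the role of the labels.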

Grouping with bins

Note that this notebook requires Mantid.

[1]:
import numpy as np
import scipp as sc
import scippneutron as scn
import scippneutron.data
[2]:
# Load event data. Here, we use `get_path` to find a data file that comes bundled
# with scippneutron. Normally, we would simply pass a file path to `scn.load_with_mantid`.
data = scn.load_with_mantid(
    scn.data.get_path('PG3_4844_event.nxs'), load_pulse_times=False
)
data
FrameworkManager-[Notice] Welcome to Mantid 6.11.0
FrameworkManager-[Notice] Please cite: http://dx.doi.org/10.1016/j.nima.2014.07.029 and this release: http://dx.doi.org/10.5286/Software/Mantid6.11
DownloadInstrument-[Notice] All instrument definitions up to date
Load-[Notice] Load started
Load-[Notice] Load successful, Duration 1.36 seconds
DeleteWorkspace-[Notice] DeleteWorkspace started
DeleteWorkspace-[Notice] DeleteWorkspace successful, Duration 0.00 seconds
[2]:
  • data: scipp DataArray (spectrum: 24794, tof: 1) DataArrayView binned data [len=0, len=0, ..., len=0, len=0]
  • sample: scipp Variable () PyObject <mantid.api._api.Sample object at 0x7f73b614c270>
  • instrument_name: str () POWGEN
  • start_time: scipp Variable () string [𝟙] 2011-08-12T15:50:17
  • end_time: scipp Variable () string [𝟙] 2011-08-12T17:22:05
  • ChopperStatus1: scipp DataArray (time: 2) float64 [𝟙] 4.0, 4.0
  • ChopperStatus2: scipp DataArray (time: 2) float64 [𝟙] 4.0, 4.0
  • ChopperStatus3: scipp DataArray (time: 2) float64 [𝟙] 4.0, 4.0
  • CurrentSP: scipp DataArray (time: 2) float64 [𝟙] 300.0, 300.0
  • EnergyRequest: scipp DataArray (time: 2) float64 [meV] 287.955, 287.955
  • LKSRampRate: scipp DataArray (time: 2) float64 [𝟙] 0.0, 0.0
  • LambdaRequest: scipp DataArray (time: 2) float64 [Å] 0.533, 0.533
  • Phase1: scipp DataArray (time: 1794) float64 [µs] 8166.720, 8165.158, ..., 8163.853, 8163.853
  • Phase2: scipp DataArray (time: 1793) float64 [µs] 8335.626, 8334.088, ..., 8332.859, 8332.859
  • Phase3: scipp DataArray (time: 1777) float64 [µs] 1.400e+04, 1.400e+04, ..., 1.400e+04, 1.400e+04
  • PhaseRequest1: scipp DataArray (time: 2) float64 [µs] 8164.075, 8164.075
  • PhaseRequest2: scipp DataArray (time: 2) float64 [µs] 8332.893, 8332.893
  • PhaseRequest3: scipp DataArray (time: 2) float64 [µs] 1.400e+04, 1.400e+04
  • SampleTemp: scipp DataArray (time: 467) float64 [𝟙] 299.352, 299.446, ..., 300.0, 300.0
  • Speed1: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • Speed2: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • Speed3: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • SpeedRequest1: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • SpeedRequest2: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • SpeedRequest3: scipp DataArray (time: 2) float64 [Hz] 60.0, 60.0
  • TolRequest: scipp DataArray (time: 2) float64 [𝟙] 20.0, 20.0
  • currentsample: scipp DataArray (time: 2) float64 [𝟙] 4.0, 4.0
  • fernsstatus: scipp DataArray (time: 2) float64 [𝟙] 3.0, 3.0
  • frequency: scipp DataArray (time: 330473) float64 [Hz] 0.0, 60.024, ..., 60.002, 59.999
  • proton_charge: scipp DataArray (time: 330473) float64 [pC] 1.484e+07, 1.484e+07, ..., 1.487e+07, 1.484e+07
  • samplerequest: scipp DataArray (time: 2) float64 [𝟙] 4.0, 4.0
  • S1HCenter: scipp DataArray (time: 2) float64 [mm] 0.0, 0.0
  • S1HCenterOffset: scipp DataArray (time: 2) float64 [𝟙] 0.0, 0.0
  • S1HWidth: scipp DataArray (time: 2) float64 [mm] 10.0, 10.0
  • S1VCenter: scipp DataArray (time: 2) float64 [mm] 5.0, 5.0
  • S1VCenterOffset: scipp DataArray (time: 2) float64 [𝟙] 0.0, 0.0
  • S1VHeight: scipp DataArray (time: 2) float64 [mm] 30.0, 30.0
  • commErrs: scipp DataArray (time: 2) float64 [𝟙] 0.0, 0.0
  • guide: scipp DataArray (time: 2) float64 [mm] -55.463, -55.463
  • s1b: scipp DataArray (time: 2) float64 [mm] 20.0, 20.0
  • s1l: scipp DataArray (time: 2) float64 [mm] 5.0, 5.0
  • s1r: scipp DataArray (time: 2) float64 [mm] -5.0, -5.0
  • s1t: scipp DataArray (time: 2) float64 [mm] -10.0, -10.0
  • vGuide: scipp DataArray (time: 2) float64 [𝟙] 2.0, 2.0
  • veto_pulse_time: scipp DataArray (time: 1) float64 [𝟙] 0.0
  • gd_prtn_chrg: scipp Variable () float64 [µAh] 1171.953902925
  • run_start: scipp Variable () string [𝟙] 2011-08-12T15:50:17
  • run_title: scipp Variable () string [𝟙] diamond cw0.533 4.22e12 60Hz [10x30]
  • file_notes: scipp Variable () string [𝟙] NONE
  • run_number: scipp Variable () string [𝟙] 4844
  • experiment_identifier: scipp Variable () string [𝟙] IPTS-2767
  • duration: scipp Variable () float64 [s] 5508.0
  • running: scipp DataArray (time: 1) bool [𝟙] True
  • Filename: scipp Variable () string [𝟙] /home/runner/.cache/scippneutron/5/PG3_4844_event.nxs
[3]:
events = data['data']
events
[3]:
scipp.DataArray (274.59 MB)
    • spectrum: 24794
    • tof: 1
    • position
      (spectrum)
      vector3
      m
      [ 1.17451004 -1.01106149 -2.03796699], [ 1.18147634 -0.95946649 -2.05334117], ..., [1.81428985 0.09565841 3.84338287], [1.81375055 0.1499371 3.84269584]
      Values:
      array([[ 1.17451004, -1.01106149, -2.03796699], [ 1.18147634, -0.95946649, -2.05334117], [ 1.18844265, -0.90787149, -2.06871534], ..., [ 1.81482915, 0.04137972, 3.8440699 ], [ 1.81428985, 0.09565841, 3.84338287], [ 1.81375055, 0.1499371 , 3.84269584]])
    • sample_position
      ()
      vector3
      m
      [0. 0. 0.]
      Values:
      array([0., 0., 0.])
    • source_position
      ()
      vector3
      m
      [ 0. 0. -60.]
      Values:
      array([ 0., 0., -60.])
    • spectrum
      (spectrum)
      int32
      1, 2, ..., 24793, 24794
      Values:
      array([ 1, 2, 3, ..., 24792, 24793, 24794], dtype=int32)
    • tof
      (tof [bin-edge])
      float64
      µs
      20.0, 1.669e+04
      Values:
      array([ 20. , 16694.30078125])
    • (spectrum, tof)
      DataArrayView
      binned data [len=0, len=0, ..., len=0, len=0]
      dim='event',
      content=DataArray(
                dims=(event: 17926980),
                data=float32[counts],
                coords={'tof':float64[µs]})

Example 1 (dense data): split-sum-combine

We histogram the event data:

[4]:
pos_hist = events.hist(tof=400)

A plot reveals the shortcoming of this data representation. The “spectrum” dimension has no physical meaning, which makes the plot hard to interpret:

[5]:
pos_hist.hist(spectrum=500).transpose().plot()
[5]:
../_images/user-guide_groupby_7_0.svg

To improve the plot, we first store the scattering angle as labels in the data array. Then we create a variable containing the desired target binning, here 501 bin edges (500 bins) between 0 and π rad:

[6]:
pos_hist.coords['two_theta'] = scn.two_theta(pos_hist)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)
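
The scattering angle is the angle between the incident beam (source to sample) and the scattered beam (sample to detector pixel). As a rough illustration, assuming `scn.two_theta` follows this usual definition, the angle for the first pixel can be computed by hand from the positions shown in the output above. The helper `scattering_angle` is hypothetical and not part of scippneutron.

```python
import math


def scattering_angle(source, sample, detector):
    """Angle in radians between the incident and the scattered beam."""
    incident = [s - o for s, o in zip(sample, source)]
    scattered = [d - s for d, s in zip(detector, sample)]
    dot = sum(i * s for i, s in zip(incident, scattered))
    norm = math.sqrt(sum(i * i for i in incident)) * math.sqrt(
        sum(s * s for s in scattered)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Positions in metres, copied from the coordinates printed above.
source = (0.0, 0.0, -60.0)
sample = (0.0, 0.0, 0.0)
detector = (1.17451004, -1.01106149, -2.03796699)  # first spectrum

angle = scattering_angle(source, sample, detector)
print(f"two_theta of first pixel: {angle:.3f} rad")
```

This pixel sits upstream of the sample (negative z), so its angle lies between π/2 and π, inside the range covered by the target binning.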

We use scipp.groupby with the desired bins and sum over the spectrum dimension:

[7]:
theta_hist = pos_hist.groupby('two_theta', bins=two_theta).sum('spectrum')
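
To make explicit what grouping with bins followed by a sum computes, here is a small plain-Python sketch on hypothetical toy data. It is not scipp's implementation. The function name `group_sum_by_bins`, the half-open bin convention, and the dropping of out-of-range rows are assumptions of this sketch.

```python
from bisect import bisect_right


def group_sum_by_bins(labels, rows, edges):
    """Sum all rows whose label falls into the same bin [edges[i], edges[i+1])."""
    n_bins = len(edges) - 1
    n_cols = len(rows[0])
    out = [[0.0] * n_cols for _ in range(n_bins)]
    for label, row in zip(labels, rows):
        i = bisect_right(edges, label) - 1
        if 0 <= i < n_bins:  # assumption: rows outside the bin range are dropped
            for j, value in enumerate(row):
                out[i][j] += value
    return out


# Hypothetical toy input: 4 spectra with 3 tof bins each.
two_theta = [0.2, 0.9, 0.3, 2.5]  # one label per spectrum, in rad
counts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 1.0, 1.0]]
edges = [0.0, 1.0, 2.0, 3.0]  # 3 two_theta bins

grouped = group_sum_by_bins(two_theta, counts, edges)
print(grouped)  # [[12.0, 15.0, 18.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```

The first axis of the output is indexed by the angle bin instead of the spectrum number, while the tof axis is left untouched.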

In the result, spectrum is replaced by the physically meaningful two_theta dimension, and the resulting plot is easy to interpret:

[8]:
theta_hist.plot()
[8]:
../_images/user-guide_groupby_13_0.svg

Example 2 (event data): split-flatten-combine

This is essentially the same as example 1 but avoids histogramming data too early. A plot of the original data is hard to interpret:

[9]:
events.hist(spectrum=500, tof=400).plot()
[9]:
../_images/user-guide_groupby_15_0.svg

Again, we improve the plot by first storing the scattering angle as labels in the data array with the events. Then we create a variable containing the desired target binning:

[10]:
events.coords['two_theta'] = scn.two_theta(events)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)

We use scipp.groupby with the desired bins and apply a concatenation operation on dimension spectrum. This is the event-data equivalent of summing histograms:

[11]:
theta_events = events.groupby('two_theta', bins=two_theta).concat('spectrum')
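
The following plain-Python sketch illustrates why concatenating event lists plays the same role as summing histograms. It uses hypothetical toy data, and the helpers `group_concat_by_bins` and `histogram` are illustrative only, not scipp functions.

```python
from bisect import bisect_right


def group_concat_by_bins(labels, event_lists, edges):
    """Concatenate the event lists whose label falls into the same bin."""
    n_bins = len(edges) - 1
    out = [[] for _ in range(n_bins)]
    for label, events in zip(labels, event_lists):
        i = bisect_right(edges, label) - 1
        if 0 <= i < n_bins:  # assumption: out-of-range spectra are dropped
            out[i].extend(events)
    return out


def histogram(values, edges):
    """Count the values falling into each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Hypothetical toy input: 3 spectra, each with a list of event tof values in µs.
two_theta = [0.2, 0.9, 2.5]
tof_events = [[100.0, 250.0], [120.0], [90.0, 300.0, 310.0]]
theta_edges = [0.0, 1.0, 2.0, 3.0]
tof_edges = [0.0, 200.0, 400.0]

grouped = group_concat_by_bins(two_theta, tof_events, theta_edges)

# Histogramming after concatenation ...
late = histogram(grouped[0], tof_edges)
# ... matches summing the per-spectrum histograms of the same angle bin.
early = [
    a + b
    for a, b in zip(histogram(tof_events[0], tof_edges),
                    histogram(tof_events[1], tof_edges))
]
print(late, early)  # [2, 1] [2, 1]
```

Keeping the events until the end leaves the choice of tof binning open, which is the motivation for not histogramming too early.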

In the result, the spectrum dimension is replaced by the physically meaningful two_theta. Histogramming it yields the same plot as before with histogrammed data:

[12]:
theta_events.hist(tof=400).plot()
[12]:
../_images/user-guide_groupby_21_0.svg