GroupBy#
“Group by” operations refers to an implementation of the “split-apply-combine” approach known from pandas and xarray. We currently support only a limited number of operations that can be applied.
Grouping with bins#
Note that this notebook requires Mantid.
[1]:
import numpy as np
import scipp as sc
import scippneutron as scn
import scippneutron.data
[2]:
# Load event data. Here, we use `get_path` to find a data file that comes bundled
# with scippneutron. Normally, we would simply pass a file path to `scn.load`.
data = scn.load_with_mantid(
scn.data.get_path('PG3_4844_event.nxs'), load_pulse_times=False
)
data
FrameworkManager-[Notice] Welcome to Mantid 6.11.0
FrameworkManager-[Notice] Please cite: http://dx.doi.org/10.1016/j.nima.2014.07.029 and this release: http://dx.doi.org/10.5286/Software/Mantid6.11
DownloadInstrument-[Notice] All instrument definitions up to date
Load-[Notice] Load started
Load-[Notice] Load successful, Duration 1.36 seconds
DeleteWorkspace-[Notice] DeleteWorkspace started
DeleteWorkspace-[Notice] DeleteWorkspace successful, Duration 0.00 seconds
[2]:
- datascippDataArray(spectrum: 24794, tof: 1)DataArrayViewbinned data [len=0, len=0, ..., len=0, len=0]
- samplescippVariable()PyObject<mantid.api._api.Sample object at 0x7f73b614c270>
- instrument_namestr()POWGEN
- start_timescippVariable()string𝟙2011-08-12T15:50:17
- end_timescippVariable()string𝟙2011-08-12T17:22:05
- ChopperStatus1scippDataArray(time: 2)float64𝟙4.0, 4.0
- ChopperStatus2scippDataArray(time: 2)float64𝟙4.0, 4.0
- ChopperStatus3scippDataArray(time: 2)float64𝟙4.0, 4.0
- CurrentSPscippDataArray(time: 2)float64𝟙300.0, 300.0
- EnergyRequestscippDataArray(time: 2)float64meV287.955, 287.955
- LKSRampRatescippDataArray(time: 2)float64𝟙0.0, 0.0
- LambdaRequestscippDataArray(time: 2)float64Å0.533, 0.533
- Phase1scippDataArray(time: 1794)float64µs8166.720, 8165.158, ..., 8163.853, 8163.853
- Phase2scippDataArray(time: 1793)float64µs8335.626, 8334.088, ..., 8332.859, 8332.859
- Phase3scippDataArray(time: 1777)float64µs1.400e+04, 1.400e+04, ..., 1.400e+04, 1.400e+04
- PhaseRequest1scippDataArray(time: 2)float64µs8164.075, 8164.075
- PhaseRequest2scippDataArray(time: 2)float64µs8332.893, 8332.893
- PhaseRequest3scippDataArray(time: 2)float64µs1.400e+04, 1.400e+04
- SampleTempscippDataArray(time: 467)float64𝟙299.352, 299.446, ..., 300.0, 300.0
- Speed1scippDataArray(time: 2)float64Hz60.0, 60.0
- Speed2scippDataArray(time: 2)float64Hz60.0, 60.0
- Speed3scippDataArray(time: 2)float64Hz60.0, 60.0
- SpeedRequest1scippDataArray(time: 2)float64Hz60.0, 60.0
- SpeedRequest2scippDataArray(time: 2)float64Hz60.0, 60.0
- SpeedRequest3scippDataArray(time: 2)float64Hz60.0, 60.0
- TolRequestscippDataArray(time: 2)float64𝟙20.0, 20.0
- currentsamplescippDataArray(time: 2)float64𝟙4.0, 4.0
- fernsstatusscippDataArray(time: 2)float64𝟙3.0, 3.0
- frequencyscippDataArray(time: 330473)float64Hz0.0, 60.024, ..., 60.002, 59.999
- proton_chargescippDataArray(time: 330473)float64pC1.484e+07, 1.484e+07, ..., 1.487e+07, 1.484e+07
- samplerequestscippDataArray(time: 2)float64𝟙4.0, 4.0
- S1HCenterscippDataArray(time: 2)float64mm0.0, 0.0
- S1HCenterOffsetscippDataArray(time: 2)float64𝟙0.0, 0.0
- S1HWidthscippDataArray(time: 2)float64mm10.0, 10.0
- S1VCenterscippDataArray(time: 2)float64mm5.0, 5.0
- S1VCenterOffsetscippDataArray(time: 2)float64𝟙0.0, 0.0
- S1VHeightscippDataArray(time: 2)float64mm30.0, 30.0
- commErrsscippDataArray(time: 2)float64𝟙0.0, 0.0
- guidescippDataArray(time: 2)float64mm-55.463, -55.463
- s1bscippDataArray(time: 2)float64mm20.0, 20.0
- s1lscippDataArray(time: 2)float64mm5.0, 5.0
- s1rscippDataArray(time: 2)float64mm-5.0, -5.0
- s1tscippDataArray(time: 2)float64mm-10.0, -10.0
- vGuidescippDataArray(time: 2)float64𝟙2.0, 2.0
- veto_pulse_timescippDataArray(time: 1)float64𝟙0.0
- gd_prtn_chrgscippVariable()float64µAh1171.953902925
- run_startscippVariable()string𝟙2011-08-12T15:50:17
- run_titlescippVariable()string𝟙diamond cw0.533 4.22e12 60Hz [10x30]
- file_notesscippVariable()string𝟙NONE
- run_numberscippVariable()string𝟙4844
- experiment_identifierscippVariable()string𝟙IPTS-2767
- durationscippVariable()float64s5508.0
- runningscippDataArray(time: 1)bool𝟙True
- FilenamescippVariable()string𝟙/home/runner/.cache/scippneutron/5/PG3_4844_event.nxs
[3]:
events = data['data']
events
[3]:
- spectrum: 24794
- tof: 1
- position(spectrum)vector3m[ 1.17451004 -1.01106149 -2.03796699], [ 1.18147634 -0.95946649 -2.05334117], ..., [1.81428985 0.09565841 3.84338287], [1.81375055 0.1499371 3.84269584]
Values:
array([[ 1.17451004, -1.01106149, -2.03796699], [ 1.18147634, -0.95946649, -2.05334117], [ 1.18844265, -0.90787149, -2.06871534], ..., [ 1.81482915, 0.04137972, 3.8440699 ], [ 1.81428985, 0.09565841, 3.84338287], [ 1.81375055, 0.1499371 , 3.84269584]]) - sample_position()vector3m[0. 0. 0.]
Values:
array([0., 0., 0.]) - source_position()vector3m[ 0. 0. -60.]
Values:
array([ 0., 0., -60.]) - spectrum(spectrum)int321, 2, ..., 24793, 24794
Values:
array([ 1, 2, 3, ..., 24792, 24793, 24794], dtype=int32) - tof(tof [bin-edge])float64µs20.0, 1.669e+04
Values:
array([ 20. , 16694.30078125])
- (spectrum, tof)DataArrayViewbinned data [len=0, len=0, ..., len=0, len=0]
dim='event', content=DataArray( dims=(event: 17926980), data=float32[counts], coords={'tof':float64[µs]})
Example 1 (dense data): split-sum-combine#
We histogram the event data:
[4]:
pos_hist = events.hist(tof=400)
A plot shows the shortcoming of the data representation. There is no physical meaning attached to the “spectrum” dimension and the plot is hard to interpret:
[5]:
pos_hist.hist(spectrum=500).transpose().plot()
[5]:
To improve the plot, we first store the scattering angle as labels in the data array. Then we create a variable containing the desired target binning:
[6]:
pos_hist.coords['two_theta'] = scn.two_theta(pos_hist)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)
We use scipp.groupby
with the desired bins and apply a sum
over dimension spectrum
:
[7]:
theta_hist = pos_hist.groupby('two_theta', bins=two_theta).sum('spectrum')
The result has spectrum
replaced by the physically meaningful two_theta
dimension and the resulting plot is easily interpretable:
[8]:
theta_hist.plot()
[8]:
Example 2 (event data): split-flatten-combine#
This is essentially the same as example 1 but avoids histogramming data too early. A plot of the original data is hard to interpret:
[9]:
events.hist(spectrum=500, tof=400).plot()
[9]:
Again, we improve the plot by first storing the scattering angle as labels in the data array with the events. Then we create a variable containing the desired target binning:
[10]:
events.coords['two_theta'] = scn.two_theta(events)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)
We use scipp.groupby
with the desired bins and apply a concatenation operation on dimension spectrum
. This is the event-data equivalent to summing histograms:
[11]:
theta_events = events.groupby('two_theta', bins=two_theta).concat('spectrum')
The result has dimension spectrum
replaced by the physically meaningful two_theta
and results in the same plot as before with histogrammed data.
[12]:
theta_events.hist(tof=400).plot()
[12]: