# GroupBy

"Group by" operations refer to an implementation of the "split-apply-combine" approach known from [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) and [xarray](http://xarray.pydata.org/en/stable/groupby.html).
Currently, only a limited number of operations can be applied to the groups.
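
To make the "split-apply-combine" idea concrete, the following is a minimal sketch in plain NumPy rather than scipp. The `labels` and `values` arrays are hypothetical toy data, and the sketch only illustrates the concept; it is not how scipp implements grouping.

```python
import numpy as np

# Hypothetical toy data: one label and one value per row.
labels = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Split: find the unique groups and the group index of every row.
groups, inverse = np.unique(labels, return_inverse=True)

# Apply + combine: sum the values of each group into one output slot.
sums = np.zeros(len(groups))
np.add.at(sums, inverse, values)

# One summed value per group label.
result = dict(zip(groups.tolist(), sums.tolist()))
print(result)
```

Each row contributes to exactly one group, so the total of `values` is preserved in `sums`.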

## Grouping with bins

Note that this notebook requires [Mantid](https://www.mantidproject.org/Main_Page).


In [None]:
import numpy as np
import scipp as sc
import scippneutron as scn
import scippneutron.data

In [None]:
# Load event data. Here, we use `get_path` to find a data file that comes bundled
# with scippneutron. Normally, we would simply pass a file path to `scn.load_with_mantid`.
data = scn.load_with_mantid(
    scn.data.get_path('PG3_4844_event.nxs'), load_pulse_times=False
)
data

In [None]:
events = data['data']
events

### Example 1 (dense data): split-sum-combine

We histogram the event data:

In [None]:
pos_hist = events.hist(tof=400)

A plot shows the shortcoming of the data representation.
There is no physical meaning attached to the "spectrum" dimension and the plot is hard to interpret:

In [None]:
pos_hist.hist(spectrum=500).transpose().plot()

To improve the plot, we first store the scattering angle as labels in the data array.
Then we create a variable containing the desired target binning:

In [None]:
pos_hist.coords['two_theta'] = scn.two_theta(pos_hist)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)

We use `scipp.groupby` with the desired bins and apply a `sum` over dimension `spectrum`:

In [None]:
theta_hist = pos_hist.groupby('two_theta', bins=two_theta).sum('spectrum')
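
Conceptually, grouping with bins assigns every spectrum to an angle bin and then sums all spectra that share a bin. The sketch below mimics this with plain NumPy. The arrays `counts` and `angle` are hypothetical toy data, and the sketch assumes that every angle lies inside the bin range; it illustrates the idea only and does not reflect scipp's actual implementation.

```python
import numpy as np

# Hypothetical toy data: 6 spectra with 4 tof bins each.
counts = np.arange(24, dtype=float).reshape(6, 4)
# Hypothetical scattering angle of each spectrum, in rad.
angle = np.array([0.1, 0.2, 0.9, 1.4, 1.6, 3.0])

# Bin edges: `num` edges define `num - 1` bins.
edges = np.linspace(0.0, np.pi, num=4)
n_bins = len(edges) - 1

# Split: assign each spectrum to an angle bin
# (assumes all angles fall within [edges[0], edges[-1])).
bin_index = np.digitize(angle, edges) - 1

# Apply + combine: sum the rows (spectra) that fall into the same bin.
binned = np.zeros((n_bins, counts.shape[1]))
np.add.at(binned, bin_index, counts)

print(binned.shape)  # (angle bins, tof bins)
```

The same relation between edges and bins applies to the notebook's `two_theta` variable: `num=501` edges yield 500 bins.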

The result has `spectrum` replaced by the physically meaningful `two_theta` dimension, and the resulting plot is easy to interpret:

In [None]:
theta_hist.plot()

### Example 2 (event data): split-flatten-combine

This is essentially the same as Example 1 but avoids histogramming the data too early.
A plot of the original data is hard to interpret:

In [None]:
events.hist(spectrum=500, tof=400).plot()

Again, we improve the plot by first storing the scattering angle as labels in the data array with the events.
Then we create a variable containing the desired target binning:

In [None]:
events.coords['two_theta'] = scn.two_theta(events)
two_theta = sc.linspace(dim='two_theta', unit='rad', start=0.0, stop=np.pi, num=501)

We use `scipp.groupby` with the desired bins and apply a concatenation operation on dimension `spectrum`.
This is the event-data equivalent to summing histograms:

In [None]:
theta_events = events.groupby('two_theta', bins=two_theta).concat('spectrum')
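
The claim that concatenating events is equivalent to summing histograms can be checked with a small NumPy sketch. The event lists and `tof_edges` below are hypothetical toy data standing in for the spectra that fall into a single `two_theta` bin; this illustrates the equivalence and is not scipp code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical tof event lists for three spectra in the same angle bin.
event_lists = [rng.uniform(0.0, 100.0, size=n) for n in (50, 80, 20)]
tof_edges = np.linspace(0.0, 100.0, num=11)

# Dense route: histogram each spectrum, then sum the histograms.
summed = sum(np.histogram(ev, bins=tof_edges)[0] for ev in event_lists)

# Event route: concatenate the event lists, then histogram once.
concatenated = np.concatenate(event_lists)
from_events = np.histogram(concatenated, bins=tof_edges)[0]

# Both routes give identical counts in every tof bin.
print(np.array_equal(summed, from_events))
```

The event route keeps the individual events, so the choice of `tof` binning can be postponed until the final histogramming step.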

The result has dimension `spectrum` replaced by the physically meaningful `two_theta`. Histogramming it gives the same plot as before with the histogrammed data:

In [None]:
theta_events.hist(tof=400).plot()