Filtering#

Event filtering refers to the process of removing or extracting a subset of events based on some criterion such as the temperature of the measured sample at the time an event was detected. Scipp’s binned data can be used for this purpose.

Below, we describe two cases. In the simple case the data contains the required coordinate and sc.bin can be used directly. In the more complex case metadata requires preprocessing, and generally there are three steps to take:

  1. Preprocess the metadata used for filtering. For example, a noisy time series of temperature values needs to converted into a series of time intervals with a fixed temperature value within the interval. This process might involve defining thresholds and tolerances or interpolation methods between measured temperature values.

  2. Map event timestamps to temperature values.

  3. Filter data based on temperature values.

Preparation#

We create some fake data for illustration purposes.

Note

In practice data to be filtered would be based on a loaded file. Details of this subsection can safely by skipped, as long as all cells are executed.

[1]:
import numpy as np
import scipp as sc
[2]:
np.random.seed(1) # Fixed for reproducibility
end_time = 100000
tof_max = 10000
width = tof_max/20
sizes = 4*np.array([7000, 3333, 3000, 5000])
size = np.sum(sizes)
data = sc.ones(dims=['event'], unit='counts', shape=[size], with_variances=True)
time = sc.zeros(dims=['event'], unit='s', dtype='datetime64', shape=[size])
# time-of-flight in a neutron-scattering experiment
tof = sc.zeros(dims=['event'], unit='us', dtype='float64', shape=[size])
table = sc.DataArray(data=data, coords={'time':time, 'tof':tof})
table
[2]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (2.24 MB)
    • event: 73332
    • time
      (event)
      datetime64
      s
      1970-01-01T00:00:00, 1970-01-01T00:00:00, ..., 1970-01-01T00:00:00, 1970-01-01T00:00:00
      Values:
      array(['1970-01-01T00:00:00', '1970-01-01T00:00:00', '1970-01-01T00:00:00', ..., '1970-01-01T00:00:00', '1970-01-01T00:00:00', '1970-01-01T00:00:00'], dtype='datetime64[s]')
    • tof
      (event)
      float64
      µs
      0.0, 0.0, ..., 0.0, 0.0
      Values:
      array([0., 0., 0., ..., 0., 0., 0.])
    • (event)
      float64
      counts
      1.0, 1.0, ..., 1.0, 1.0
      σ = 1.0, 1.0, ..., 1.0, 1.0
      Values:
      array([1., 1., 1., ..., 1., 1., 1.])

      Variances (σ²):
      array([1., 1., 1., ..., 1., 1., 1.])
[3]:
ntemp = 100
sample_temperature = sc.DataArray(
    data=sc.array(dims=['time'], unit='K',
                  values=5*np.random.rand(100)+np.linspace(100, 120, num=ntemp)),
    coords={'time':sc.Variable(dims=['time'], unit='s',
                               values=np.linspace(0, end_time, num=ntemp).astype('datetime64[s]'))})
x = sc.linspace(dim='x', unit='m', start=0, stop=1, num=4)

end = sc.array(dims=['x'], values=np.cumsum(sizes), unit=None)
begin = end.copy()
begin.values -= sizes
events = sc.DataArray(
    data=sc.bins(begin=begin, end=end, dim='event', data=table),
    coords={'x': x},
    attrs={'sample_temperature': sc.scalar(value=sample_temperature)})
for size, bucket in zip(sizes, events.values):
    bucket.coords['time'].values = np.linspace(0, end_time, num=size).astype('datetime64[s]')
    bucket.coords['tof'].values = np.concatenate(
        (np.concatenate(
            (7*width + width*np.random.randn(size//4),
             13*width + width*np.random.randn(size//4))),
         10*width + width*np.random.randn(size//2)))
events
[3]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (2.24 MB)
    • x: 4
    • x
      (x)
      float64
      m
      0.0, 0.333, 0.667, 1.0
      Values:
      array([0. , 0.33333333, 0.66666667, 1. ])
    • (x)
      DataArrayView
      binned data [len=28000, len=13332, len=12000, len=20000]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:28000, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:00, 1970-01-01T00:00:03, ..., 1970-01-02T03:46:36, 1970-01-02T03:46:40] tof float64 [µs] (event) [3657.82, 2488.9, ..., 4940.78, 5328.82] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:13332, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:00, 1970-01-01T00:00:07, ..., 1970-01-02T03:46:32, 1970-01-02T03:46:40] tof float64 [µs] (event) [4095.09, 3596.57, ..., 5866.26, 5605.49] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:12000, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:00, 1970-01-01T00:00:08, ..., 1970-01-02T03:46:31, 1970-01-02T03:46:40] tof float64 [µs] (event) [4099.62, 2460.33, ..., 4803.6, 5116.26] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:20000, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:00, 1970-01-01T00:00:05, ..., 1970-01-02T03:46:34, 1970-01-02T03:46:40] tof float64 [µs] (event) [3640.65, 3293.74, ..., 5004.57, 5290.82] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] ]
    • sample_temperature
      ()
      DataArray
      {dims=[time: 100], unit=K, coords=[time]}
      Values:
      <scipp.DataArray> Dimensions: Sizes[time:100, ] Coordinates: time datetime64 [s] (time) [1970-01-01T00:00:00, 1970-01-01T00:16:50, ..., 1970-01-02T03:29:49, 1970-01-02T03:46:40] Data: float64 [K] (time) [102.085, 103.804, ..., 119.812, 123.086]

Filtering based on existing coords#

Extracting data based on an interval#

We can use sc.bin with the desired bounds to extract all data points (events) that have coord values falling within an interval:

[4]:
tof_interval = sc.array(dims=['tof'], values=[2000.0, 3000.0], unit='us')
filtered = sc.bin(events, edges=[tof_interval])
filtered
[4]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (90.34 KB)
    • x: 4
    • tof: 1
    • tof
      (tof [bin-edge])
      float64
      µs
      2000.000, 3000.000
      Values:
      array([2000., 3000.])
    • x
      (x)
      float64
      m
      0.0, 0.333, 0.667, 1.0
      Values:
      array([0. , 0.33333333, 0.66666667, 1. ])
    • (x, tof)
      DataArrayView
      binned data [len=1094, len=564, len=440, len=787]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:1094, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:03, 1970-01-01T00:01:47, ..., 1970-01-01T06:56:23, 1970-01-02T01:55:31] tof float64 [µs] (event) [2488.9, 2952.54, ..., 2926.64, 2883.42] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:564, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:30, 1970-01-01T00:01:30, ..., 1970-01-01T06:56:26, 1970-01-02T03:08:54] tof float64 [µs] (event) [2597.44, 2882.81, ..., 2926.81, 2955.76] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:440, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:08, 1970-01-01T00:00:16, ..., 1970-01-01T06:54:12, 1970-01-01T17:00:21] tof float64 [µs] (event) [2460.33, 2193.95, ..., 2232.85, 2979.16] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:787, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:30, 1970-01-01T00:00:45, ..., 1970-01-01T06:56:26, 1970-01-02T02:21:44] tof float64 [µs] (event) [2713.52, 2782.41, ..., 2962.54, 2836.56] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] ]
    • sample_temperature
      ()
      DataArray
      {dims=[time: 100], unit=K, coords=[time]}
      Values:
      <scipp.DataArray> Dimensions: Sizes[time:100, ] Coordinates: time datetime64 [s] (time) [1970-01-01T00:00:00, 1970-01-01T00:16:50, ..., 1970-01-02T03:29:49, 1970-01-02T03:46:40] Data: float64 [K] (time) [102.085, 103.804, ..., 119.812, 123.086]

Extracting/splitting data based on multiple intervals#

In the same manner, we can extract data with a list of (adjacent) intervals:

[5]:
tof_intervals = sc.linspace(dim='tof', start=2000, stop=3000, num=4, unit='us')
filtered = sc.bin(events, edges=[tof_intervals])
filtered
[5]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (90.48 KB)
    • x: 4
    • tof: 3
    • tof
      (tof [bin-edge])
      float64
      µs
      2000.000, 2333.333, 2666.667, 3000.000
      Values:
      array([2000. , 2333.33333333, 2666.66666667, 3000. ])
    • x
      (x)
      float64
      m
      0.0, 0.333, 0.667, 1.0
      Values:
      array([0. , 0.33333333, 0.66666667, 1. ])
    • (x, tof)
      DataArrayView
      binned data [len=56, len=265, ..., len=182, len=558]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:56, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:04:35, 1970-01-01T00:10:35, ..., 1970-01-01T06:47:23, 1970-01-01T06:54:35] tof float64 [µs] (event) [2282.58, 2103.46, ..., 2252.32, 2179.65] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:265, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:03, 1970-01-01T00:02:58, ..., 1970-01-01T06:54:10, 1970-01-01T06:56:15] tof float64 [µs] (event) [2488.9, 2571.01, ..., 2415.41, 2615.17] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , ..., <scipp.DataArray> Dimensions: Sizes[event:182, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:55, 1970-01-01T00:07:15, ..., 1970-01-01T06:49:06, 1970-01-01T06:52:46] tof float64 [µs] (event) [2665.11, 2638.1, ..., 2492.92, 2554.22] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:558, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:30, 1970-01-01T00:00:45, ..., 1970-01-01T06:56:26, 1970-01-02T02:21:44] tof float64 [µs] (event) [2713.52, 2782.41, ..., 2962.54, 2836.56] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] ]
    • sample_temperature
      ()
      DataArray
      {dims=[time: 100], unit=K, coords=[time]}
      Values:
      <scipp.DataArray> Dimensions: Sizes[time:100, ] Coordinates: time datetime64 [s] (time) [1970-01-01T00:00:00, 1970-01-01T00:16:50, ..., 1970-01-02T03:29:49, 1970-01-02T03:46:40] Data: float64 [K] (time) [102.085, 103.804, ..., 119.812, 123.086]

Events in each of the subintervals can then be accessed using the usual slicing syntax:

[6]:
filtered['tof',2]
[6]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (64.03 KB out of 90.48 KB)
    • x: 4
    • x
      (x)
      float64
      m
      0.0, 0.333, 0.667, 1.0
      Values:
      array([0. , 0.33333333, 0.66666667, 1. ])
    • (x)
      DataArrayView
      binned data [len=773, len=404, len=308, len=558]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:773, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:01:47, 1970-01-01T00:02:08, ..., 1970-01-01T06:56:23, 1970-01-02T01:55:31] tof float64 [µs] (event) [2952.54, 2813.44, ..., 2926.64, 2883.42] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:404, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:01:30, 1970-01-01T00:02:30, ..., 1970-01-01T06:56:26, 1970-01-02T03:08:54] tof float64 [µs] (event) [2882.81, 2701.77, ..., 2926.81, 2955.76] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:308, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:50, 1970-01-01T00:01:40, ..., 1970-01-01T06:51:58, 1970-01-01T17:00:21] tof float64 [µs] (event) [2770.82, 2771.94, ..., 2784.8, 2979.16] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] , <scipp.DataArray> Dimensions: Sizes[event:558, ] Coordinates: time datetime64 [s] (event) [1970-01-01T00:00:30, 1970-01-01T00:00:45, ..., 1970-01-01T06:56:26, 1970-01-02T02:21:44] tof float64 [µs] (event) [2713.52, 2782.41, ..., 2962.54, 2836.56] Data: float64 [counts] (event) [1, 1, ..., 1, 1] [1, 1, ..., 1, 1] ]
    • sample_temperature
      ()
      DataArray
      {dims=[time: 100], unit=K, coords=[time]}
      Values:
      <scipp.DataArray> Dimensions: Sizes[time:100, ] Coordinates: time datetime64 [s] (time) [1970-01-01T00:00:00, 1970-01-01T00:16:50, ..., 1970-01-02T03:29:49, 1970-01-02T03:46:40] Data: float64 [K] (time) [102.085, 103.804, ..., 119.812, 123.086]
    • tof
      (tof)
      float64
      µs
      2666.667, 3000.000
      Values:
      array([2666.66666667, 3000. ])

Filtering based on arbitrary metadata#

Step 1: Preprocess metadata#

Our data contains a coordinate with metadata related to the temperature of the measured sample:

[7]:
timeseries = events.attrs['sample_temperature'].value
timeseries.plot()

This is a timeseries with noisy measurements, as could be obtained, e.g., from a temperature sensor. For event filtering we require intervals with a fixed temperature. This can be obtained in many ways. In this example we do so by taking the mean over subintervals:

[8]:
average=4
temperature = sc.fold(timeseries, dim='time', sizes={'time': ntemp//average, 'dummy': average})
time_coord = temperature.coords['time']['dummy', 0]
temperature.coords['time'] = sc.concat([time_coord, time_coord['time', -1] + 1*sc.units.s], 'time')
temperature = temperature.mean('dummy')
temperature.plot()

Step 2: Map time stamps#

The temperature data array computed above can be seen as a discretized functional dependence of temperature on time. This “function” can now be used to map the time of each event to the temperature of each event:

[9]:
event_temp = sc.lookup(temperature, 'time')[events.bins.coords['time']]
events.bins.coords['temperature'] = event_temp

The event lists with temperature values created by scipp.map have been added as a new coordinate:

[10]:
events.values[0]
[10]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (1.07 MB out of 2.80 MB)
    • event: 28000
    • temperature
      (event)
      float64
      K
      102.103, 102.103, ..., 0.0, 0.0
      Values:
      array([102.10277211, 102.10277211, 102.10277211, ..., 0. , 0. , 0. ])
    • time
      (event)
      datetime64
      s
      1970-01-01T00:00:00, 1970-01-01T00:00:03, ..., 1970-01-02T03:46:36, 1970-01-02T03:46:40
      Values:
      array(['1970-01-01T00:00:00', '1970-01-01T00:00:03', '1970-01-01T00:00:07', ..., '1970-01-02T03:46:32', '1970-01-02T03:46:36', '1970-01-02T03:46:40'], dtype='datetime64[s]')
    • tof
      (event)
      float64
      µs
      3657.817, 2488.899, ..., 4940.781, 5328.825
      Values:
      array([3657.81747362, 2488.89939209, 3346.89799369, ..., 5505.65421418, 4940.7805401 , 5328.8249963 ])
    • (event)
      float64
      counts
      1.0, 1.0, ..., 1.0, 1.0
      σ = 1.0, 1.0, ..., 1.0, 1.0
      Values:
      array([1., 1., 1., ..., 1., 1., 1.])

      Variances (σ²):
      array([1., 1., 1., ..., 1., 1., 1.])

Step 3: Filter with scipp.bin#

The temperature coordinate created in the previous step can now be used for the actual filtering step. With a temperature coordinate stored as part of events it is possible to use scipp.bin with temperature bins:

[11]:
tof_bins = sc.Variable(dims=['tof'], unit=sc.units.us, values=np.linspace(0,tof_max,num=100))
temp_bins = sc.Variable(dims=['temperature'], unit=sc.units.K, values=np.linspace(100.0, 130.0, num=6))
binned_events = sc.bin(events, edges=[temp_bins, tof_bins])
binned_events
[11]:
Show/Hide data repr Show/Hide attributes
scipp.DataArray (2.74 MB)
    • x: 4
    • temperature: 5
    • tof: 99
    • temperature
      (temperature [bin-edge])
      float64
      K
      100.0, 106.0, ..., 124.0, 130.0
      Values:
      array([100., 106., 112., 118., 124., 130.])
    • tof
      (tof [bin-edge])
      float64
      µs
      0.0, 101.010, ..., 9898.990, 1.000e+04
      Values:
      array([ 0. , 101.01010101, 202.02020202, 303.03030303, 404.04040404, 505.05050505, 606.06060606, 707.07070707, 808.08080808, 909.09090909, 1010.1010101 , 1111.11111111, 1212.12121212, 1313.13131313, 1414.14141414, 1515.15151515, 1616.16161616, 1717.17171717, 1818.18181818, 1919.19191919, 2020.2020202 , 2121.21212121, 2222.22222222, 2323.23232323, 2424.24242424, 2525.25252525, 2626.26262626, 2727.27272727, 2828.28282828, 2929.29292929, 3030.3030303 , 3131.31313131, 3232.32323232, 3333.33333333, 3434.34343434, 3535.35353535, 3636.36363636, 3737.37373737, 3838.38383838, 3939.39393939, 4040.4040404 , 4141.41414141, 4242.42424242, 4343.43434343, 4444.44444444, 4545.45454545, 4646.46464646, 4747.47474747, 4848.48484848, 4949.49494949, 5050.50505051, 5151.51515152, 5252.52525253, 5353.53535354, 5454.54545455, 5555.55555556, 5656.56565657, 5757.57575758, 5858.58585859, 5959.5959596 , 6060.60606061, 6161.61616162, 6262.62626263, 6363.63636364, 6464.64646465, 6565.65656566, 6666.66666667, 6767.67676768, 6868.68686869, 6969.6969697 , 7070.70707071, 7171.71717172, 7272.72727273, 7373.73737374, 7474.74747475, 7575.75757576, 7676.76767677, 7777.77777778, 7878.78787879, 7979.7979798 , 8080.80808081, 8181.81818182, 8282.82828283, 8383.83838384, 8484.84848485, 8585.85858586, 8686.86868687, 8787.87878788, 8888.88888889, 8989.8989899 , 9090.90909091, 9191.91919192, 9292.92929293, 9393.93939394, 9494.94949495, 9595.95959596, 9696.96969697, 9797.97979798, 9898.98989899, 10000. ])
    • x
      (x)
      float64
      m
      0.0, 0.333, 0.667, 1.0
      Values:
      array([0. , 0.33333333, 0.66666667, 1. ])
    • (x, temperature, tof)
      DataArrayView
      binned data [len=0, len=0, ..., len=0, len=0]
      Values:
      [<scipp.DataArray> Dimensions: Sizes[event:0, ] Coordinates: temperature float64 [K] (event) [] time datetime64 [s] (event) [] tof float64 [µs] (event) [] Data: float64 [counts] (event) [] [] , <scipp.DataArray> Dimensions: Sizes[event:0, ] Coordinates: temperature float64 [K] (event) [] time datetime64 [s] (event) [] tof float64 [µs] (event) [] Data: float64 [counts] (event) [] [] , ..., <scipp.DataArray> Dimensions: Sizes[event:0, ] Coordinates: temperature float64 [K] (event) [] time datetime64 [s] (event) [] tof float64 [µs] (event) [] Data: float64 [counts] (event) [] [] , <scipp.DataArray> Dimensions: Sizes[event:0, ] Coordinates: temperature float64 [K] (event) [] time datetime64 [s] (event) [] tof float64 [µs] (event) [] Data: float64 [counts] (event) [] [] ]
    • sample_temperature
      ()
      DataArray
      {dims=[time: 100], unit=K, coords=[time]}
      Values:
      <scipp.DataArray> Dimensions: Sizes[time:100, ] Coordinates: time datetime64 [s] (time) [1970-01-01T00:00:00, 1970-01-01T00:16:50, ..., 1970-01-02T03:29:49, 1970-01-02T03:46:40] Data: float64 [K] (time) [102.085, 103.804, ..., 119.812, 123.086]

Filtering is then performed by slicing and, if desired, copying:

[12]:
filtered_view = binned_events['temperature', 0:3] # view containing only relevant events
filtered = binned_events['temperature', 0:3].copy() # extract only relevant events by copying

Slicing combined with histogramming also performs a filter operation since all events outside the histogram bounds are dropped:

[13]:
binned_events['temperature', 1].bins.sum().plot()
[14]:
binned_events['temperature', 3].bins.sum().plot()

Results from filter operations can also be inserted into a dataset for convenient handling of further operations such as histogramming, summing, or plotting:

[15]:
d = sc.Dataset()
d['below_T_c'] = binned_events['temperature', 1]
d['above_T_c'] = binned_events['temperature', 3]
sc.sum(d.bins.sum(), 'x').plot()

We can also bin without the time-of-flight coordinate to obtain the temperature dependence of the total event count, e.g., for normalization purposes:

[16]:
binned_events = sc.bin(events, edges=[temp_bins])
binned_events.plot(resolution={'x':50, 'y':10})