{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Filtering\n",
    "\n",
    "Event filtering refers to the process of removing or extracting a subset of events based on some criterion such as the temperature of the measured sample at the time an event was detected.\n",
    "Scipp's binned data can be used for this purpose.\n",
    "\n",
    "Below, we describe two cases.\n",
    "In the simple case the data contains the required coordinate and `sc.bin` can be used directly.\n",
    "In the more complex case metadata requires preprocessing, and generally there are three steps to take:\n",
    "\n",
    "1. Preprocess the metadata used for filtering.\n",
    "   For example, a noisy time series of temperature values needs to converted into a series of time intervals with a fixed temperature value within the interval.\n",
    "   This process might involve defining thresholds and tolerances or interpolation methods between measured temperature values.\n",
    "2. Map event timestamps to temperature values.\n",
    "3. Filter data based on temperature values.\n",
    "\n",
    "## Preparation\n",
    "\n",
    "We create some fake data for illustration purposes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "\n",
    "In practice data to be filtered would be based on a loaded file. Details of this subsection can safely by skipped, as long as all cells are executed.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipp as sc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(1) # Fixed for reproducibility\n",
    "end_time = 100000\n",
    "tof_max = 10000\n",
    "width = tof_max/20\n",
    "sizes = 4*np.array([7000, 3333, 3000, 5000])\n",
    "size = np.sum(sizes)\n",
    "data = sc.ones(dims=['event'], unit='counts', shape=[size], with_variances=True)\n",
    "time = sc.zeros(dims=['event'], unit='s', dtype='datetime64', shape=[size])\n",
    "# time-of-flight in a neutron-scattering experiment\n",
    "tof = sc.zeros(dims=['event'], unit='us', dtype='float64', shape=[size])\n",
    "table = sc.DataArray(data=data, coords={'time':time, 'tof':tof})\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ntemp = 100\n",
    "sample_temperature = sc.DataArray(\n",
    "    data=sc.array(dims=['time'], unit='K',\n",
    "                  values=5*np.random.rand(100)+np.linspace(100, 120, num=ntemp)),\n",
    "    coords={'time':sc.Variable(dims=['time'], unit='s',\n",
    "                               values=np.linspace(0, end_time, num=ntemp).astype('datetime64[s]'))})\n",
    "x = sc.linspace(dim='x', unit='m', start=0, stop=1, num=4)\n",
    "\n",
    "end = sc.array(dims=['x'], values=np.cumsum(sizes))\n",
    "begin = end.copy()\n",
    "begin.values -= sizes\n",
    "events = sc.DataArray(\n",
    "    data=sc.bins(begin=begin, end=end, dim='event', data=table),\n",
    "    coords={'x': x},\n",
    "    attrs={'sample_temperature': sc.scalar(value=sample_temperature)})\n",
    "for size, bucket in zip(sizes, events.values):\n",
    "    bucket.coords['time'].values = np.linspace(0, end_time, num=size).astype('datetime64[s]')\n",
    "    bucket.coords['tof'].values = np.concatenate(\n",
    "        (np.concatenate(\n",
    "            (7*width + width*np.random.randn(size//4),\n",
    "             13*width + width*np.random.randn(size//4))),\n",
    "         10*width + width*np.random.randn(size//2)))\n",
    "events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering based on existing coords\n",
    "\n",
    "### Extracting data based on an interval\n",
    "\n",
    "We can use `sc.bin` with the desired bounds to extract all data points (events) that have coord values falling within an interval:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tof_interval = sc.array(dims=['tof'], values=[2000.0, 3000.0], unit='us')\n",
    "filtered = sc.bin(events, edges=[tof_interval])\n",
    "filtered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extracting/splitting data based on multiple intervals\n",
    "\n",
    "In the same manner, we can extract data with a list of (adjacent) intervals:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tof_intervals = sc.linspace(dim='tof', start=2000, stop=3000, num=4, unit='us')\n",
    "filtered = sc.bin(events, edges=[tof_intervals])\n",
    "filtered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Events in each of the subintervals can then be accessed using the usual slicing syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filtered['tof',2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering based on arbitrary metadata\n",
    "### Step 1: Preprocess metadata\n",
    "\n",
    "Our data contains a coordinate with metadata related to the temperature of the measured sample:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "timeseries = events.attrs['sample_temperature'].value\n",
    "timeseries.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a timeseries with noisy measurements, as could be obtained, e.g., from a temperature sensor.\n",
    "For event filtering we require intervals with a fixed temperature.\n",
    "This can be obtained in many ways.\n",
    "In this example we do so by taking the mean over subintervals:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "average=4\n",
    "temperature = sc.fold(timeseries, dim='time', sizes={'time': ntemp//average, 'dummy': average})\n",
    "time_coord = temperature.coords['time']['dummy', 0]\n",
    "temperature.coords['time'] = sc.concatenate(time_coord, time_coord['time', -1] + 1*sc.units.s, 'time')\n",
    "temperature = temperature.mean('dummy')\n",
    "temperature.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Map time stamps\n",
    "\n",
    "The `temperature` data array computed above can be seen as a discretized functional dependence of temperature on time.\n",
    "This \"function\" can now be used to map the `time` of each event to the `temperature` of each event:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "event_temp = sc.lookup(temperature, 'time')[events.bins.coords['time']]\n",
    "events.bins.coords['temperature'] = event_temp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The event lists with temperature values created by `scipp.map` have been added as a new coordinate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "events.values[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Filter with `scipp.bin`\n",
    "\n",
    "The temperature coordinate created in the previous step can now be used for the actual filtering step.\n",
    "With a `temperature` coordinate stored as part of `events` it is possible to use `scipp.bin` with temperature bins:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tof_bins = sc.Variable(dims=['tof'], unit=sc.units.us, values=np.linspace(0,tof_max,num=100))\n",
    "temp_bins = sc.Variable(dims=['temperature'], unit=sc.units.K, values=np.linspace(100.0, 130.0, num=6))\n",
    "binned_events = sc.bin(events, edges=[temp_bins, tof_bins])\n",
    "binned_events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Filtering is then performed by slicing and, if desired, copying:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filtered_view = binned_events['temperature', 0:3] # view containing only relevant events\n",
    "filtered = binned_events['temperature', 0:3].copy() # extract only relevant events by copying"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing combined with histogramming also performs a filter operation since all events outside the histogram bounds are dropped:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binned_events['temperature', 1].bins.sum().plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binned_events['temperature', 3].bins.sum().plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Results from filter operations can also be inserted into a dataset for convenient handling of further operations such as histogramming, summing, or plotting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = sc.Dataset()\n",
    "d['below_T_c'] = binned_events['temperature', 1]\n",
    "d['above_T_c'] = binned_events['temperature', 3]\n",
    "sc.sum(d.bins.sum(), 'x').plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also bin without the time-of-flight coordinate to obtain the temperature dependence of the total event count, e.g., for normalization purposes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binned_events = sc.bin(events, edges=[temp_bins])\n",
    "binned_events.plot(resolution={'x':50, 'y':10})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}