Audience: Advanced beginner / intermediate (requires basic knowledge of scipp.DataArray)

Objectives: Constructing binned data from an event list, modifying binning in order to extract different quantities, and basic masking and filtering of event and binned data.

RHESSI Solar Flares#

This tutorial covers the basics of binned data in Scipp by analyzing the list of solar flares recorded by NASA’s RHESSI small explorer [Lin et al].

The input data has been constructed from the official flare list. It is available as a HDF5 file in Scipp’s own format and can be downloaded and accessed directly via scipp.data as shown below.

Attention

The tutorial data has been filtered and modified. It should not be used for any actual scientific analyses!

See docs/getting-started/tutorials/prepare_data_rhessi.py in the scipp source for details.

2 Spatial Distribution#

2.1 Bin by x and y#

Plot the spatial distribution of flares.

Plotting the event list flares would yield a scatter plot which is not particularly useful. A better approach is computing and plotting the density as a function of x and y. This is commonly done by histogramming the events. But Scipp offers an alternative: binning.

Scipp’s ‘binned data’ is similar to a histogram, except that the individual events are preserved. They are simply collected into bins as defined by bin-edge coordinates.

Define bin-edges for x and y, use scipp.bin to create binned data from flares, and plot the result.

Tip:

Use scipp.arange or scipp.linspace to construct the edges. Make sure to use the correct unit!
Use sc.bin(flares, edges=[<edge_x>, <edge_y>]).
Turn your binned data into a histogram before plotting using <binned>.bins.sum().
See the Plopp documentation for the relevant parts of the plotting API.

[21]:

# YOUR CODE

Solution#

Bin spatially.

[22]:

spatial = flares.bin(
    y=sc.linspace('y', -1200, 1200, 100, unit='asec'),
    x=sc.linspace('x', -1200, 1200, 100, unit='asec'),
)

Histogram and plot. The arguments to plot are optional but improve the result here.

[23]:

p = spatial.hist().plot(aspect='equal', norm='log', cmap='inferno')
p.canvas.xlabel = 'x [asec]'
p.canvas.ylabel = 'y [asec]'
p

[23]:

../../_images/getting-started_tutorials_solar_flares_58_0.svg

2.2 Remove Outliers#

The plot shows a lot of outliers. They are not the non-solar events from before because those are not visible in the plot as they were masked out. Instead, the outliers are caused by RHESSI’s electronics or analysis software glitching out and assigning bad positions to the flares. This includes the circular shape even though it looks deceptively Sun-like.

The flare list does not include enough information to exclude all such bad positions. But the instrument can only detect x-rays for \(x \in [-1000~\text{asec}, 1000~\text{asec}]\) and \(y \in [-600~\text{asec}, 600~\text{asec}]\). So to first order, all events outside that range should be removed.

Instead of using a mask as before, use binning this time. Create new bin-edges for x and y with the proper limits and bin the data with them, this will remove all events outside the valid range. Plot the result. (You can either bin the original flares data array or apply new bins to the previously binned array; both via scipp.bin).

[24]:

# YOUR CODE

Solution#

Bin into a more narrow range. Bin sizes are chosen such that bins are square.

[25]:

spatial = flares.bin(
    y=sc.linspace('y', -600, 600, 90, unit='asec'),
    x=sc.linspace('x', -1000, 1000, 150, unit='asec'),
)

And plot like before.

[26]:

p = spatial.hist().plot(aspect='equal', norm='log', cmap='inferno')
p.canvas.xlabel = 'x [asec]'
p.canvas.ylabel = 'y [asec]'
p

[26]:

../../_images/getting-started_tutorials_solar_flares_66_0.svg

2.3 Correct for Detector Efficiency#

In this tutorial, we assume that the instrument consists of a 3x3 grid of detectors which each record x-rays from distinct directions. (The reality is more complicated, of course.) Furthermore, the tutorial data has been manipulated to simulate different efficiencies of the individual detectors.

The efficiency is available in the input data group:

[27]:

efficiency = flare_datagroup['detector_efficiency']
efficiency

[27]:

scipp.DataArray (1.25 KB)

Dimensions:
- x: 3
- y: 3
Coordinates: (2)
- x
  (x [bin-edge])
  float64
  4.84813681109535984e-06rad
  -1000.000, -300.0, 300.0, 1000.000
```
Values:
array([-1000.,  -300.,   300.,  1000.])
```
- y
  (y [bin-edge])
  float64
  4.84813681109535984e-06rad
  -600.0, -200.0, 200.0, 600.0
```
Values:
array([-600., -200.,  200.,  600.])
```

Data: (1)

(x, y)

float64

𝟙

0.9, 0.95, ..., 0.94, 0.91

Values:
array([[0.9 , 0.95, 0.77],
       [0.98, 0.89, 0.82],
       [0.93, 0.94, 0.91]])

Normalize the data by dividing by efficiency. You first need to bin into the correct bins as defined by the coordinates of efficiency.

[28]:

# YOUR CODE

Solution#

[29]:

coarse_spatial = spatial.bin(x=efficiency.coords['x'], y=efficiency.coords['y'])
corrected = coarse_spatial / efficiency

You can plot the corrected data as before. But it makes sense to return to smaller bins in order to resolve the distribution properly. This can be done using scipp.bin with finer edges because the data was only binned and not histogrammed.

[30]:

# YOUR CODE

Solution#

Bin in the same way as before but using corrected as input

[31]:

spatial = corrected.bin(
    y=sc.linspace('y', -600, 600, 90, unit='asec'),
    x=sc.linspace('x', -1000, 1000, 150, unit='asec'),
)

And plot like before.

[32]:

p = spatial.hist().plot(aspect='equal', norm='log', cmap='inferno')
p.canvas.xlabel = 'x [asec]'
p.canvas.ylabel = 'y [asec]'
p

[32]:

../../_images/getting-started_tutorials_solar_flares_82_0.svg

3 Time Series#

Now, we want to look at the distribution of flares over time. The temporal distribution can be obtained like the spatial one using binning.

3.1 Bin on time coordinate#

Select one of the time coordinates (e.g. ‘peak_time’) and bin with an appropriate bin size. Plot the result.

Important: Use the pre-binned data from the previous task in order to include the detector normalization.

Tip:

The time coordinates are event-coordinates, that is, they are not defined at the top level of the binned data (i.e. per bin) but inside of the bins (i.e. per event). You can access them using <binned>.bins.coords['peak_time'].
You can add binning by time using <binned>.bin(peak_time=<n_bins>) and produce a three-dimensional array. But here, we are more interested in a one-dimensional distribution. So we need to erase the binning in x and y which can be achieved, e.g., using <binned>.bins.concat('x').

Alternatively, the lower level function scipp.binning.make_binned can be used to bin in time and erase spatial binning at the same time.

[33]:

# YOUR CODE

Solution#

Here we keep the existing binning and plot time slices of the distribution of the events using the plopp.slicer tool. The slicer returns two widgets, a figure and a slider, to change properties of the figure we access the widget at index 0. Notice that the color scale is kept fixed by setting vmin and vmax keyword arguments in the call to plot.

[34]:

%matplotlib widget
import plopp as pp

temporal_and_spatial = spatial.bin(peak_time=10)
p = pp.slicer(
    temporal_and_spatial.hist(),
    keep=['y', 'x'],
    vmin=sc.scalar(1),
    autoscale='fixed',
    aspect='equal',
    cmap='magma',
    norm='log',
)
p.canvas.xlabel = 'x [asec]'
p.canvas.ylabel = 'y [asec]'
p

[34]:

Or you can remove the existing binning and bin in time to yield a 1D-variable representing the number of flares in each time bin.

[35]:

temporal = spatial.bins.concat('x').bins.concat('y').bin(peak_time=200)
temporal.hist().plot()

[35]:

Note that it is important to first remove the binning in x and y by using concat. Binning in time first and then removing other binning would use too much memory. It is possible to do everything in one step using a lower-level API:

[36]:

from scipp.binning import make_binned

time = spatial.bins.coords['peak_time']
min_time = time.min().value
max_time = time.max().value
step = (max_time - min_time) / 200
time_edges = sc.arange('peak_time', min_time, max_time, step, unit=time.bins.unit)
temporal = make_binned(spatial, edges=[time_edges], erase=('x', 'y'))
temporal.hist().plot()

[36]:

3.2 Flare Durations#

Another interesting quantity to look at is the duration of flares. The duration was computed earlier and should already be stored as an event coordinate.

Plot the duration as a function of time.

Tip:

Construct a new data array from the previous result: duration = <binned_by_time>.copy(). And assign new data using duration.bins.data = <duration_data>.
Previously, we used .hist() (which is an alias for .bins.sum()) to make histograms because the data was given as counts. Now the data is seconds which should be averaged instead of summed. So use .bins.mean() instead.

[37]:

# YOUR CODE

Solution#

Create a new data array containing the duration as its data.

[38]:

duration = temporal.copy()
duration.name = 'duration'
duration.bins.data = duration.bins.coords.pop('duration')

Plot duration vs time.

[39]:

duration.bins.mean().plot()

[39]:

4 Energy Bands#

The flares were recorded in several non-overlapping energy bands. Those are identified by the ‘min_energy’ and ‘max_energy’ coordinates in flares. Since the bands do not overlap, it is sufficient to label them with ‘min_energy’ for simplicity.

4.1 Group by energy content#

In this section, we want to split the temporal distribution from above into the different energy bands. Group the temporal distribution by ‘min_energy’ to obtain two-dimensional data. Use scipp.group instead of scipp.bin this time because every event has exactly one of a set of possible energies instead of a value in a range.

[40]:

# YOUR CODE

Solution#

Skipping the lowest energy band here because those events are not confirmed to be solar flares. This is optional.

[41]:

grouped_by_energy = temporal.group('min_energy')['min_energy', 1:]

Plot the result. A 2D plot is not very useful here, so split the data by ‘min_energy’ and either plot each energy in a separate plot or combine them into a dictionary and plot that:

lines = {'<name>': <data_array>}
sc.plot(lines)

[42]:

# YOUR CODE

Solution#

[43]:

lines = {
    f"min_energy={energy.value} {energy.unit}": grouped_by_energy[
        'min_energy', energy
    ].bins.sum()
    for energy in grouped_by_energy.coords['min_energy']
}
sc.plot(lines)

[43]:

References#

Lin, R., Dennis, B., Hurford, G. et al. The Reuven Ramaty High-Energy Solar Spectroscopic Imager (RHESSI). Sol Phys 210, 3–32 (2002). doi:10.1023/A:1022428818870

This Page

RHESSI Solar Flares#

1 Loading and preprocessing data#

1.1 Load Flare List#

1.2 Inspect the Loaded Data#

Solution#

1.3 Compute Duration#

Solution#

Solution#

1.4 Create Masks#

Non-solar mask#

Solution#

Unknown position mask#

Solution#

Solution#

2 Spatial Distribution#

2.1 Bin by x and y#

Solution#

2.2 Remove Outliers#

Solution#

2.3 Correct for Detector Efficiency#

Solution#

Solution#

3 Time Series#

3.1 Bin on time coordinate#

Solution#

3.2 Flare Durations#

Solution#

4 Energy Bands#

4.1 Group by energy content#

Solution#

Solution#

References#