Quick Start Guide

Overview

The NeXus Data Format is typically used to structure HDF5 files. An HDF5 file is a container for datasets and groups. Groups are folder-like and work like Python dictionaries. Datasets work like NumPy arrays. In addition, groups and datasets have a dictionary of attributes.

NeXus extends this with the following:

  • Definitions for attributes of datasets, in particular a units attribute. In NeXus, datasets are referred to as fields.

  • Definitions for attributes and structure of groups. This includes:

    • An NX_class attribute, identifying a group as an instance of a particular NeXus class such as NXdata or NXlog.

    • Attributes that identify which fields contained in the group hold signal values, and which hold axis labels.
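The conventions listed above can be sketched with a plain-Python model. This is only an illustration under stated assumptions: it uses nested dictionaries instead of a real HDF5 file, the group contents and counts are made up, and `describe` is a hypothetical helper, not part of ScippNexus or h5py. The attribute names `NX_class`, `signal`, `axes`, and `units` are the ones NeXus defines.

```python
# Plain-Python stand-in for a NeXus group: no file I/O and no ScippNexus.
# Every node has an "attrs" dict; groups have "children", fields have "values".
# All values below are made up for illustration.
bank = {
    "attrs": {
        "NX_class": "NXdata",                          # NeXus class of the group
        "signal": "data",                              # field holding signal values
        "axes": ["x_pixel_offset", "y_pixel_offset"],  # fields holding axis labels
    },
    "children": {
        "data": {"attrs": {"units": "counts"}, "values": [[0, 1, 0], [2, 0, 1]]},
        "x_pixel_offset": {"attrs": {"units": "m"}, "values": [-0.0025, 0.0025]},
        "y_pixel_offset": {"attrs": {"units": "m"}, "values": [-0.0543, 0.0, 0.0543]},
    },
}


def describe(group):
    """Return the NeXus class, the signal field name, and the axis units."""
    attrs = group["attrs"]
    fields = group["children"]
    # Look up the units attribute of each field named in the axes attribute.
    axes = {name: fields[name]["attrs"]["units"] for name in attrs["axes"]}
    return attrs["NX_class"], attrs["signal"], axes


nx_class, signal, axes = describe(bank)
print(nx_class, signal, axes)
```

A reader such as ScippNexus performs this kind of lookup against the real file, so the user does not have to inspect the attributes by hand.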

In the following, we use a file from the POWGEN instrument at SNS. It is bundled with ScippNexus and is downloaded automatically using pooch if it is not already cached:

[1]:
from scippnexus import data

filename = data.get_path('PG3_4844_event.nxs')

Opening files

Given such a NeXus file, we first need to open it. Wherever possible this should be done using a context manager as follows:

[2]:
# To use the legacy interface, use:
# import scippnexus.v1 as snx
import scippnexus as snx

with snx.File(filename) as f:
    print(list(f.keys()))
['entry']
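The reason for preferring a context manager can be illustrated with Python's built-in file objects. This is an analogy rather than ScippNexus itself, and the temporary file exists only for the demonstration. A handle opened in a `with` block is closed even when an exception interrupts the block, whereas a handle opened directly stays open until it is closed explicitly.

```python
import os
import tempfile

# Create an empty temporary file to open (demonstration only).
fd, path = tempfile.mkstemp()
os.close(fd)

handle = None
try:
    with open(path, "rb") as fh:
        handle = fh
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

# The with-block closed the file despite the exception.
closed_after_with = handle.closed

# Opened directly, the file stays open until close() is called.
direct = open(path, "rb")
open_until_closed = not direct.closed
direct.close()

os.remove(path)
print(closed_after_with, open_until_closed)
```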

Unfortunately, working with a context manager in a Jupyter notebook is cumbersome, so in the following we open the file directly instead:

[3]:
f = snx.File(filename)

Loading groups and datasets

This proton_charge group we “navigated” to above is an NXlog, which typically contains 1-D data with a time axis. Since ScippNexus knows about NXlog, it knows how to identify its shape:

[7]:
proton_charge.shape
[7]:
(330473,)

Note:

This is in contrast to plain HDF5 where groups do not have a shape. Note that not all NeXus classes have a defined shape.

We read the NXlog from the file using slicing notation. To read the entire group, use an ellipsis (or an empty tuple):

[8]:
proton_charge[...]
[8]:
scipp.DataArray (5.04 MB)
    • time: 330473
    • average_value
      ()
      float64
      pC
      12766652.799260454
      σ = 5061670.635363746
      Values:
      array(12766652.79926045)

      Variances (σ²):
      array(2.56205096e+13)
    • description
      ()
      string
      description
      Values:
      'description'
    • duration
      ()
      float32
      s
      5508.0
      Values:
      array(5508., dtype=float32)
    • maximum_value
      ()
      float64
      pC
      15146700.0
      Values:
      array(15146700.)
    • minimum_value
      ()
      float64
      pC
      0.0
      Values:
      array(0.)
    • time
      (time)
      datetime64
      ns
      2011-08-12T15:50:17.000000000, 2011-08-12T15:50:17.016659999, ..., 2011-08-12T17:22:05.085449218, 2011-08-12T17:22:05.102050781
      Values:
      array(['2011-08-12T15:50:17.000000000', '2011-08-12T15:50:17.016659999', '2011-08-12T15:50:17.033321000', ..., '2011-08-12T17:22:05.068359375', '2011-08-12T17:22:05.085449218', '2011-08-12T17:22:05.102050781'], dtype='datetime64[ns]')
    • (time)
      float64
      pC
      1.484e+07, 1.484e+07, ..., 1.487e+07, 1.484e+07
      Values:
      array([14843360., 14843360., 14787820., ..., 14809180., 14873260., 14839090.])

Above, ScippNexus automatically:

  • Loaded the data field (the signal value dataset and its 'units' attribute).

  • Identified the dimension labels (here: 'time').

  • Loaded the other fields in the group as coordinates, including:

    • Units of the fields.

    • Uncertainties of the fields (here for 'average_value').

This structure is compatible with a scipp.DataArray and is returned as such.
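The repr above lists both a standard deviation (σ) and a variance (σ²) for 'average_value'. A quick check, using only the two numbers printed there, suggests that these are the same uncertainty expressed in two ways, with the variance being the square of σ. The tolerance is an assumption chosen to allow for the rounding of the printed variance.

```python
import math

sigma = 5061670.635363746        # σ of average_value, as printed in the repr
variance_shown = 2.56205096e13   # σ² as printed in the repr (rounded)

# Compare σ² with the printed variance, allowing for rounding in the printout.
matches = math.isclose(sigma**2, variance_shown, rel_tol=1e-7)
print(sigma**2, matches)
```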

We may also load an individual field instead of an entire group. A field is loaded as a scipp.Variable, which is similar to the NumPy array that h5py returns for a dataset but carries a unit and dimension labels (if applicable). For example, we may load only the 'value' dataset:

[9]:
proton_charge['value'][...]
[9]:
scipp.Variable (2.52 MB)
    • (time: 330473)
      float64
      pC
      1.484e+07, 1.484e+07, ..., 1.487e+07, 1.484e+07
      Values:
      array([14843360., 14843360., 14787820., ..., 14809180., 14873260., 14839090.])

Attributes of datasets or groups are accessed just like in h5py:

[10]:
proton_charge['value'].attrs['units']
[10]:
'picoCoulombs'

A subset of the group (and its datasets) can be loaded by selecting only a slice. We can also plot this directly using the plot method of scipp.DataArray:

[11]:
proton_charge['time', 193000:197000].plot()
[11]:
[Figure: plot of proton_charge for the selected time range]
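The idea behind selecting a slice by dimension label can be sketched in plain Python. This is a conceptual model only, not how ScippNexus is implemented: `slice_fields` is a hypothetical helper, the values are made up, and only 1-D fields are handled. Each field is stored with the dimension labels it depends on. Fields that depend on the sliced dimension are cut to the requested range, and the remaining fields pass through unchanged.

```python
def slice_fields(fields, dim, start, stop):
    """Slice every field that depends on `dim`; keep the others as they are."""
    out = {}
    for name, (dims, values) in fields.items():
        if dim in dims:
            out[name] = (dims, values[start:stop])
        else:
            out[name] = (dims, values)
    return out


# Each entry maps a field name to (dimension labels, values).
log = {
    "value": (("time",), [10.0, 11.0, 12.0, 13.0, 14.0]),
    "time": (("time",), [0, 1, 2, 3, 4]),
    "duration": ((), 5508.0),  # scalar field, no dependence on 'time'
}

subset = slice_fields(log, "time", 1, 4)
print(subset)
```

Slicing the signal and its time coordinate together keeps them aligned, which is what makes the result directly usable for plotting.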

As another example, consider the following NXdata group:

[12]:
bank = f['entry/bank103']
print(bank.shape, bank.dims)
(154, 7) ('x_pixel_offset', 'y_pixel_offset')

This can be loaded and plotted as above. In this case the resulting data array is 2-D:

[13]:
da = bank[...]
da
[13]:
scipp.DataArray (6.09 KB)
    • x_pixel_offset: 154
    • y_pixel_offset: 7
    • x_pixel_offset
      (x_pixel_offset)
      float32
      m
      -0.3825, -0.3775, ..., 0.3775, 0.3825
      Values:
      array([-0.3825, -0.3775, -0.3725, -0.3675, -0.3625, -0.3575, -0.3525, -0.3475, -0.3425, -0.3375, -0.3325, -0.3275, -0.3225, -0.3175, -0.3125, -0.3075, -0.3025, -0.2975, -0.2925, -0.2875, -0.2825, -0.2775, -0.2725, -0.2675, -0.2625, -0.2575, -0.2525, -0.2475, -0.2425, -0.2375, -0.2325, -0.2275, -0.2225, -0.2175, -0.2125, -0.2075, -0.2025, -0.1975, -0.1925, -0.1875, -0.1825, -0.1775, -0.1725, -0.1675, -0.1625, -0.1575, -0.1525, -0.1475, -0.1425, -0.1375, -0.1325, -0.1275, -0.1225, -0.1175, -0.1125, -0.1075, -0.1025, -0.0975, -0.0925, -0.0875, -0.0825, -0.0775, -0.0725, -0.0675, -0.0625, -0.0575, -0.0525, -0.0475, -0.0425, -0.0375, -0.0325, -0.0275, -0.0225, -0.0175, -0.0125, -0.0075, -0.0025, 0.0025, 0.0075, 0.0125, 0.0175, 0.0225, 0.0275, 0.0325, 0.0375, 0.0425, 0.0475, 0.0525, 0.0575, 0.0625, 0.0675, 0.0725, 0.0775, 0.0825, 0.0875, 0.0925, 0.0975, 0.1025, 0.1075, 0.1125, 0.1175, 0.1225, 0.1275, 0.1325, 0.1375, 0.1425, 0.1475, 0.1525, 0.1575, 0.1625, 0.1675, 0.1725, 0.1775, 0.1825, 0.1875, 0.1925, 0.1975, 0.2025, 0.2075, 0.2125, 0.2175, 0.2225, 0.2275, 0.2325, 0.2375, 0.2425, 0.2475, 0.2525, 0.2575, 0.2625, 0.2675, 0.2725, 0.2775, 0.2825, 0.2875, 0.2925, 0.2975, 0.3025, 0.3075, 0.3125, 0.3175, 0.3225, 0.3275, 0.3325, 0.3375, 0.3425, 0.3475, 0.3525, 0.3575, 0.3625, 0.3675, 0.3725, 0.3775, 0.3825], dtype=float32)
    • y_pixel_offset
      (y_pixel_offset)
      float32
      m
      -0.1629, -0.1086, ..., 0.1086, 0.1629
      Values:
      array([-0.1629, -0.1086, -0.0543, 0. , 0.0543, 0.1086, 0.1629], dtype=float32)
    • (x_pixel_offset, y_pixel_offset)
      int32
      0, 0, ..., 0, 0
      Values:
      array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=int32)
[14]:
da.plot()
[14]:
[Figure: 2-D plot of the bank103 data array over x_pixel_offset and y_pixel_offset]