Application Definitions

Overview

NeXus Application Definitions define mandatory and optional class contents for specific applications. ScippNexus’ approach to application definitions is to consider them as a guide, without performing full validation. This is to avoid getting in the way of the library user, e.g., when working with incomplete or partially broken files. For example, ScippNexus will generally not validate that the tree structure conforms to a given application definition.

Warning:

ScippNexus’ support for application definitions is currently experimental and the API is still subject to changes.

Definitions provide customization points, e.g., for how ScippNexus can find required information in the HDF5 group, and how contents are mapped to aspects of the returned data (typically a scipp.DataArray or scipp.DataGroup).

Definitions in ScippNexus are subclasses of NXobject. A definitions mapping passed to snx.File serves as a repository of definitions that snx.Group will use when opening a group in a file. The NX_class attribute of the HDF5 group is used as the key into this mapping. By default, snx.base_definitions() is used, which provides subclasses such as NXlog, NXdata, and NXdetector.

Users can implement their own application definition (or any other definition) by subclassing NXobject or one of the existing base-class definitions.

Writing files

Skip ahead to Reading files if you simply want to customize how data is read from existing files.

ScippNexus provides a customization point for writing content to NeXus files via __setitem__, i.e., by assigning a value to a key of a file or group. The requirements are that the value

  1. provides an nx_class attribute that returns a valid NeXus class name such as 'NXdata' or scippnexus.NXdata and

  2. defines the __write_to_nexus_group__ method that takes an h5py.Group, i.e., an open HDF5 group, as its single argument.

__write_to_nexus_group__ may then write its content to this group. It can (and should) use snx.create_field, which writes NeXus fields (HDF5 datasets) from a scipp.Variable and handles details such as automatically writing the units attribute or writing datetime64 data. Consider the following example:

[1]:
import h5py
import scipp as sc
import scippnexus as snx


class MyData:
    nx_class = snx.NXdata  # required

    def __init__(self, data: sc.DataArray):
        self._data = data

    def __write_to_nexus_group__(self, group: h5py.Group):  # required
        group.attrs['axes'] = self._data.dims  # NeXus way of defining dim labels
        snx.create_field(group, 'mysignal', self._data.data)

Note that above we use a custom “signal” name and do not set the “signal” attribute on the group, and as such deviate from the NeXus specification. We can then write our data using:

[2]:
mydata = sc.DataArray(sc.arange('x', 5, unit='s'))

with snx.File('test.nxs', 'w') as f:
    f['data'] = MyData(mydata)

You can also manually write NeXus classes to an HDF5 file or group with snx.create_class:

[3]:
with h5py.File('test2.nxs', mode='w') as f:
    nxdata = snx.create_class(f, 'data', nx_class=snx.NXdata)
    nxdata.attrs['axes'] = mydata.dims
    snx.create_field(nxdata, 'mysignal', mydata.data)
[4]:
%%bash
# The files created above are identical
cmp -s test.nxs test2.nxs

Reading files

Overview

For some application definitions — or classes within application definitions — the default ScippNexus mechanisms for reading are sufficient. This is the case when the application definition follows the NeXus standard and, e.g., introduces no new attributes.

In other cases we require customization of how ScippNexus reads class contents. This is handled using definitions that can be passed to snx.File or snx.Group.

As an example, consider the following simple definition for loading data with a custom signal name, which the file failed to specify. In this particular case we subclass snx.NXdata, and pass a custom argument to its __init__. In general this is rarely sufficient, and in practice a definition may need to implement other parts of the snx.NXobject interface:

[5]:
class MyDataDefinition(snx.NXdata):
    def __init__(self, attrs, children):
        super().__init__(
            attrs=attrs, children=children, fallback_signal_name='mysignal'
        )


my_definitions = snx.base_definitions()
my_definitions['NXdata'] = MyDataDefinition

We can then load our file (created above in Writing files) by passing our custom definitions to snx.File:

[6]:
with snx.File('test.nxs', 'r', definitions=my_definitions) as f:
    loaded = f['data'][...]
loaded
[6]:
scipp.DataArray (808 Bytes)
    • x: 5
    • (x)
      int64
      s
      0, 1, 2, 3, 4
      Values:
      array([0, 1, 2, 3, 4])

ScippNexus does not currently ship with a library of application definitions. Custom definitions can be provided by the user as outlined above.

Using definitions for filtering

The application-definition mechanism can be used for filtering or selecting which children from a group should be loaded. For example, we may wish to exclude certain NeXus classes from loading. We define a custom definition as follows:

[7]:
import scippnexus as snx


def skip(name, obj):
    skip_classes = (snx.NXevent_data, snx.NXinstrument)
    return isinstance(obj, snx.Group) and (
        (obj.nx_class in skip_classes) or (name == 'DASlogs')
    )


class FilteredEntry(snx.NXobject):
    def __init__(self, attrs, children):
        children = {
            name: child for name, child in children.items() if not skip(name, child)
        }
        super().__init__(attrs=attrs, children=children)


my_definitions = snx.base_definitions()
my_definitions['NXentry'] = FilteredEntry

We can use these definitions as follows:

[8]:
from scippnexus import data

filename = data.get_path('PG3_4844_event.nxs')
# Use a context manager so that the file is closed after loading
with snx.File(filename, definitions=my_definitions) as f:
    entry = f['entry'][...]
entry
Downloading file 'PG3_4844_event.nxs' from 'https://public.esss.dk/groups/scipp/scippnexus/1/PG3_4844_event.nxs' to '/home/runner/.cache/scippnexus/1'.
[8]:
  • scipp
    DataGroup
    ()
      • SNSbanking_file_name
        str
        ()
        PG3_bank_2011_02_25.xml
      • SNSmapping_file_name
        str
        ()
        PG3_TS_2009_04_17.dat
      • author
        str
        ()
        HistoTool
      • command1
        str
        ()
        event2nxl --mapping PG3_TS_2009_04_17.dat --banking PG3_bank_2011_02_25.xml --in...
      • command2
        str
        ()
        monitorappend --time_offset 0.0 --max_time_bin 200001.0 -l 1.0 --input PG3_4844_...
      • date
        str
        ()
        2011-08-12
      • description
        str
        ()
        List of commands run within the HistoTool package
      • version
        str
        ()
        3.4.5
  • bank102
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank103
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank104
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank105
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank106
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank123
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank124
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank143
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank144
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank164
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank184
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank22
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank23
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank24
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank42
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank43
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank44
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank62
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank63
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank64
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank82
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank83
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • bank84
    scipp
    DataArray
    (x_pixel_offset: 154,
     y_pixel_offset: 7)
    int32
    0, 0, ..., 0, 0
  • collection_identifier
    str
    ()
    0
  • collection_title
    str
    ()
    No title entered
  • definition
    str
    ()
    EVENTRAW
  • duration
    scipp
    Variable
    ()
    float32
    s
    5508.0
  • end_time
    str
    ()
    2011-08-12T13:22:05-04:00
  • entry_identifier
    str
    ()
    4844
  • experiment_identifier
    str
    ()
    IPTS-2767
  • scipp
    DataGroup
    (time_of_flight: 200001)
      • mode
        str
        ()
        monitor
      • data
        scipp
        DataArray
        (time_of_flight: 200001)
        int32
        25, 10, ..., 0, 0
  • notes
    str
    ()
    NONE
  • proton_charge
    scipp
    Variable
    ()
    float64
    pC
    4219034050530.0
  • raw_frames
    int32
    ()
    330473
  • run_number
    str
    ()
    4844
  • scipp
    DataGroup
    ()
      • changer_position
        str
        ()
        NONE
      • holder
        str
        ()
        NONE
      • identifier
        str
        ()
        NONE
      • name
        str
        ()
        LaB6
      • nature
        str
        ()
        NONE
  • start_time
    str
    ()
    2011-08-12T11:50:17-04:00
  • title
    str
    ()
    diamond cw0.533 4.22e12 60Hz [10x30]
  • total_counts
    int32
    ()
    17926980
  • total_uncounted_counts
    int32
    ()
    0
  • scipp
    DataGroup
    ()
      • facility_user_id
        str
        ()
        HPJ
      • name
        str
        ()
        HPJ
      • role
        str
        ()
        E
  • scipp
    DataGroup
    ()
      • facility_user_id
        str
        ()
        3AH
      • name
        str
        ()
        3AH
      • role
        str
        ()
        P
  • scipp
    DataGroup
    ()
      • facility_user_id
        str
        ()
        OG6
      • name
        str
        ()
        OG6
      • role
        str
        ()
        E
  • scipp
    DataGroup
    ()
      • facility_user_id
        str
        ()
        2IH
      • name
        str
        ()
        2IH
      • role
        str
        ()
        E