Application Definitions#
Overview#
NeXus Application Definitions define mandatory and optional class contents for specific applications. ScippNexus’ approach to application definitions is to consider them as a guide, without performing full validation. This is to avoid getting in the way of the library user, e.g., when working with incomplete or partially broken files. For example, ScippNexus will generally not validate that the tree structure conforms to a given application definition.
Warning:
ScippNexus’ support for application definitions is currently experimental and the API is still subject to changes.
Definitions provide customization points, e.g., for how ScippNexus can find required information in the HDF5 group, and how contents are mapped to aspects of the returned data (typically a scipp.DataArray or scipp.DataGroup).
Definitions in ScippNexus are subclasses of NXobject. A definitions mapping passed to snx.File serves as a repository of definitions that snx.Group will use when opening a group in a file. snx.base_definitions() is used by default. The NX_class attribute of the HDF5 group is used as a key into the definitions mapping. The default mapping provides subclasses such as NXlog, NXdata, and NXdetector.
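The lookup described above can be pictured with a small stand-alone model. This is only a sketch in plain Python and not the ScippNexus implementation: the classes GenericDefinition and DataDefinition, the helper select_definition, and the fallback to a generic definition for unknown or missing classes are all assumptions made for illustration.

```python
# Toy model of a definitions mapping. All names here are hypothetical
# stand-ins and are not part of the ScippNexus API.


class GenericDefinition:
    """Stand-in for a generic base definition."""

    def __init__(self, attrs, children):
        self.attrs = attrs
        self.children = children


class DataDefinition(GenericDefinition):
    """Stand-in for a more specific definition."""


def toy_base_definitions():
    # Return a fresh dict so that callers can modify their own copy.
    return {'NXdata': DataDefinition}


def select_definition(attrs, definitions):
    # The group's NX_class attribute is the key into the mapping.
    # Falling back to the generic definition is an assumption of this sketch.
    return definitions.get(attrs.get('NX_class'), GenericDefinition)


definitions = toy_base_definitions()
print(select_definition({'NX_class': 'NXdata'}, definitions).__name__)
print(select_definition({'NX_class': 'NXcustom'}, definitions).__name__)
```

Replacing an entry in such a mapping changes which class handles every group with that NX_class, which is the idea behind the customization shown in Reading files.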
Users can implement their application definition (or any definition) by subclassing NXobject, or one of the existing base-class definitions.
Writing files#
Skip ahead to Reading files if you simply want to customize how data is read from existing files.
ScippNexus provides a customization point for writing content to NeXus files with __setitem__. The requirements are that the value provides an nx_class attribute that returns a valid NeXus class name such as 'NXdata' or scippnexus.NXdata, and defines the __write_to_nexus_group__ method that takes an h5py.Group, i.e., an open HDF5 group, as its single argument.
__write_to_nexus_group__ may then write its content to this group. This can (and should) make use of ScippNexus features for writing NeXus fields (HDF5 datasets) from a scipp.Variable via snx.create_field, such as automatic writing of the units attribute, or writing datetime64 data. Consider the following example:
[1]:
import h5py
import scipp as sc
import scippnexus as snx


class MyData:
    nx_class = snx.NXdata  # required

    def __init__(self, data: sc.DataArray):
        self._data = data

    def __write_to_nexus_group__(self, group: h5py.Group):  # required
        group.attrs['axes'] = self._data.dims  # NeXus way of defining dim labels
        snx.create_field(group, 'mysignal', self._data.data)
Note that above we use a custom “signal” name and do not set the “signal” attribute on the group, and as such deviate from the NeXus specification. We can then write our data using:
[2]:
mydata = sc.DataArray(sc.arange('x', 5, unit='s'))
with snx.File('test.nxs', 'w') as f:
    f['data'] = MyData(mydata)
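The two requirements on a written value (an nx_class attribute and a __write_to_nexus_group__ method) can be checked up front with plain duck typing. The helper is_nexus_writable below is hypothetical and not part of ScippNexus; it is only a sketch of how such a check might look, with a dict standing in for an open HDF5 group.

```python
def is_nexus_writable(value) -> bool:
    """Return True if value looks like something that can be assigned to a group.

    Hypothetical helper: checks only the two requirements described above.
    """
    has_class = hasattr(value, 'nx_class')
    has_writer = callable(getattr(value, '__write_to_nexus_group__', None))
    return has_class and has_writer


class Complete:
    nx_class = 'NXdata'

    def __write_to_nexus_group__(self, group):
        group['written'] = True  # a dict stands in for an open HDF5 group


class MissingClass:
    def __write_to_nexus_group__(self, group):
        pass


print(is_nexus_writable(Complete()))
print(is_nexus_writable(MissingClass()))
```

Such a check only confirms that the hooks exist; it does not verify that nx_class names a valid NeXus class.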
You can also manually create a NeXus class as a group in an HDF5 file with snx.create_class:
[3]:
with h5py.File('test2.nxs', mode='w') as f:
    nxdata = snx.create_class(f, 'data', nx_class=snx.NXdata)
    nxdata.attrs['axes'] = mydata.dims
    snx.create_field(nxdata, 'mysignal', mydata.data)
[4]:
%%bash
# The files created above are identical
cmp -s test.nxs test2.nxs
Reading files#
Overview#
For some application definitions — or classes within application definitions — the default ScippNexus mechanisms for reading are sufficient. This is the case when the application definition follows the NeXus standard and, e.g., introduces no new attributes.
In other cases we require customization of how ScippNexus reads class contents. This is handled using definitions that can be passed to snx.File or snx.Group.
As an example, consider the following simple definition for loading data with a custom signal name, which the file failed to specify. In this particular case we subclass snx.NXdata, and pass a custom argument to its __init__. In general this is rarely sufficient, and in practice a definition may need to implement other parts of the snx.NXobject interface:
[5]:
class MyDataDefinition(snx.NXdata):
    def __init__(self, attrs, children):
        super().__init__(
            attrs=attrs, children=children, fallback_signal_name='mysignal'
        )


my_definitions = snx.base_definitions()
my_definitions['NXdata'] = MyDataDefinition
We can then load our file (created above in Writing files) by passing our custom definitions to snx.File:
[6]:
with snx.File('test.nxs', 'r', definitions=my_definitions) as f:
    loaded = f['data'][...]
loaded
[6]:
Dimensions: x: 5
Data: (x) int64 [s] 0, 1, 2, 3, 4
Values: array([0, 1, 2, 3, 4])
ScippNexus does not currently ship with a library of application definitions. Custom definitions can be provided by a user as outlined above.
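To make the role of fallback_signal_name more concrete, here is a toy model of how a signal name might be resolved. It assumes that a “signal” attribute in the file takes precedence and that the fallback is used only when the attribute is absent. The function resolve_signal_name is hypothetical and does not reproduce the actual snx.NXdata logic.

```python
def resolve_signal_name(attrs, children, fallback_signal_name=None):
    """Pick the name of the signal field for a toy NXdata-like group."""
    # Prefer what the file declares; use the fallback only if it is absent.
    name = attrs.get('signal', fallback_signal_name)
    if name is None:
        raise KeyError('no signal attribute and no fallback given')
    if name not in children:
        raise KeyError(f'signal field {name!r} not found in group')
    return name


# Mimics the file written above: 'axes' is set, 'signal' is not.
attrs = {'axes': ['x']}
children = {'mysignal': [0, 1, 2, 3, 4]}
print(resolve_signal_name(attrs, children, fallback_signal_name='mysignal'))
```

Under these assumptions a file that does set the “signal” attribute is unaffected by the fallback, so the same definition can be used for both conforming and non-conforming files.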
Using definitions for filtering#
The application-definition mechanism can be used for filtering or selecting which children from a group should be loaded. For example, we may wish to exclude certain NeXus classes from loading. We define a custom definition as follows:
[7]:
import scippnexus as snx


def skip(name, obj):
    skip_classes = (snx.NXevent_data, snx.NXinstrument)
    return isinstance(obj, snx.Group) and (
        (obj.nx_class in skip_classes) or (name == 'DASlogs')
    )


class FilteredEntry(snx.NXobject):
    def __init__(self, attrs, children):
        children = {
            name: child for name, child in children.items() if not skip(name, child)
        }
        super().__init__(attrs=attrs, children=children)


my_definitions = snx.base_definitions()
my_definitions['NXentry'] = FilteredEntry
We can use these definitions as follows:
[8]:
from scippnexus import data
filename = data.get_path('PG3_4844_event.nxs')
f = snx.File(filename, definitions=my_definitions)
f['entry'][...]
Downloading file 'PG3_4844_event.nxs' from 'https://public.esss.dk/groups/scipp/scippnexus/1/PG3_4844_event.nxs' to '/home/runner/.cache/scippnexus/1'.
[8]:
- DataGroup()
  - SNSbanking_file_name: str() PG3_bank_2011_02_25.xml
  - SNSmapping_file_name: str() PG3_TS_2009_04_17.dat
  - author: str() HistoTool
  - command1: str() event2nxl --mapping PG3_TS_2009_04_17.dat --banking PG3_bank_2011_02_25.xml --in...
  - command2: str() monitorappend --time_offset 0.0 --max_time_bin 200001.0 -l 1.0 --input PG3_4844_...
  - date: str() 2011-08-12
  - description: str() List of commands run within the HistoTool package
  - version: str() 3.4.5
- bank102: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank103: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank104: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank105: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank106: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank123: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank124: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank143: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank144: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank164: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank184: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank22: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank23: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank24: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank42: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank43: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank44: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank62: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank63: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank64: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank82: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank83: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- bank84: scipp.DataArray(x_pixel_offset: 154, y_pixel_offset: 7) int32 0, 0, ..., 0, 0
- collection_identifier: str() 0
- collection_title: str() No title entered
- definition: str() EVENTRAW
- duration: scipp.Variable() float32 [s] 5508.0
- end_time: str() 2011-08-12T13:22:05-04:00
- entry_identifier: str() 4844
- experiment_identifier: str() IPTS-2767
- DataGroup(time_of_flight: 200001)
  - mode: str() monitor
  - data: scipp.DataArray(time_of_flight: 200001) int32 25, 10, ..., 0, 0
- notes: str() NONE
- proton_charge: scipp.Variable() float64 [pC] 4219034050530.0
- raw_frames: int32() 330473
- run_number: str() 4844
- DataGroup()
  - changer_position: str() NONE
  - holder: str() NONE
  - identifier: str() NONE
  - name: str() LaB6
  - nature: str() NONE
- start_time: str() 2011-08-12T11:50:17-04:00
- title: str() diamond cw0.533 4.22e12 60Hz [10x30]
- total_counts: int32() 17926980
- total_uncounted_counts: int32() 0
- DataGroup()
  - facility_user_id: str() HPJ
  - name: str() HPJ
  - role: str() E
- DataGroup()
  - facility_user_id: str() 3AH
  - name: str() 3AH
  - role: str() P
- DataGroup()
  - facility_user_id: str() OG6
  - name: str() OG6
  - role: str() E
- DataGroup()
  - facility_user_id: str() 2IH
  - name: str() 2IH
  - role: str() E
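The filtering step itself is an ordinary dictionary comprehension over the children, so it can be tried out without a file. The sketch below uses a hypothetical FakeGroup stand-in and made-up child names and classes instead of real snx.Group objects. The function filter_children is likewise an illustration and not part of ScippNexus.

```python
from dataclasses import dataclass


@dataclass
class FakeGroup:
    """Hypothetical stand-in for a group; only carries an NX class name."""

    nx_class: str


def filter_children(children, skip_classes=(), skip_names=()):
    """Drop groups whose class or name is listed; keep everything else."""
    return {
        name: child
        for name, child in children.items()
        if not (
            isinstance(child, FakeGroup)
            and (child.nx_class in skip_classes or name in skip_names)
        )
    }


# Made-up children for illustration only.
children = {
    'bank102_events': FakeGroup('NXevent_data'),
    'bank102': FakeGroup('NXdata'),
    'DASlogs': FakeGroup('NXcollection'),
    'title': 'diamond',
}
kept = filter_children(
    children,
    skip_classes={'NXevent_data', 'NXinstrument'},
    skip_names={'DASlogs'},
)
print(sorted(kept))
```

Non-group children such as plain fields always pass through, mirroring the isinstance check in the skip function above.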