# Quick Start Guide

## Overview

The [NeXus Data Format](https://www.nexusformat.org/) is typically used to structure HDF5 files.
An HDF5 file is a container for *datasets* and *groups*.
Groups are folder-like and work like Python dictionaries.
Datasets work like NumPy arrays.
In addition, groups and datasets have a dictionary of *attributes*.

NeXus extends this with the following:

- Definitions for attributes for datasets, in particular a `units` attribute.
  In NeXus, datasets are referred to as *fields*.
- Definitions for attributes and structure of groups.
  This includes:
  - An `NX_class` attribute, identifying a group as an instance of a particular NeXus class such as [NXdata](https://manual.nexusformat.org/classes/base_classes/NXdata.html) or [NXlog](https://manual.nexusformat.org/classes/base_classes/NXlog.html).
  - Attributes that identify which fields contained in the group hold signal values, and which hold axis labels.
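
To make the structure described above concrete, the following sketch models a NeXus group with plain Python dictionaries. It does not touch an HDF5 file or use ScippNexus. The field names `counts` and `x` and the helper `describe` are hypothetical, chosen for illustration; `NX_class`, `signal`, `axes`, and `units` are the attribute names used by the NeXus standard.

```python
# Plain-dict model of an NXdata group (illustrative only, no HDF5 involved).
nxdata_group = {
    "attrs": {"NX_class": "NXdata", "signal": "counts", "axes": ["x"]},
    "children": {
        # Each field carries its own attributes, in particular `units`.
        "counts": {"attrs": {"units": "counts"}, "values": [3, 5, 2]},
        "x": {"attrs": {"units": "m"}, "values": [0.0, 0.5, 1.0]},
    },
}


def describe(group):
    """Summarize class, signal field, and axis fields of a modeled group."""
    attrs = group["attrs"]
    children = group["children"]
    signal_name = attrs["signal"]
    return {
        "class": attrs["NX_class"],
        "signal": (signal_name, children[signal_name]["attrs"]["units"]),
        "axes": {name: children[name]["attrs"]["units"] for name in attrs["axes"]},
    }


summary = describe(nxdata_group)
print(summary)
```

The group-level attributes tell a reader which child holds the signal and which children label its axes, while the per-field `units` attribute gives each array a physical meaning.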
  
In the following we use a file from the [POWGEN](https://neutrons.ornl.gov/powgen) instrument at SNS.
It is bundled with ScippNexus and will be downloaded automatically using [pooch](https://pypi.org/project/pooch/) if it is not cached already:

In [None]:
from scippnexus import data

filename = data.get_path('PG3_4844_event.nxs')

## Loading files

Given such a NeXus file, we can load the entire file using [snx.load](../generated/functions/scippnexus.load.rst):

In [None]:
import scippnexus as snx

data = snx.load(filename)
data

[snx.load](../generated/functions/scippnexus.load.rst) supports selecting part of a file to load:

In [None]:
bank102 = snx.load(filename, root='entry/bank102')
bank102

This is a simpler and less powerful version of the interface described below.

## Opening files

It is often useful to load only part of a file or to inspect the file structure without loading any data.
ScippNexus provides an interface that is similar to [h5py](https://docs.h5py.org/en/stable/) for this purpose.

We first need to open the file using [snx.File](../generated/classes/scippnexus.File.rst).
Wherever possible this should be done using a context manager as follows:

In [None]:
import scippnexus as snx

with snx.File(filename) as f:
    print(list(f.keys()))
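
The reason for preferring a context manager is that the file handle is released when the block exits, even if an exception is raised inside it. The sketch below illustrates this behavior with a hypothetical stand-in class, `FakeFile`, so that it runs without a data file; it is not part of ScippNexus and does not reproduce how `snx.File` is implemented.

```python
class FakeFile:
    """Hypothetical stand-in for a file handle; not part of ScippNexus."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when an exception propagates out of the block.
        self.closed = True
        return False  # do not swallow exceptions

    def keys(self):
        if self.closed:
            raise ValueError("file is closed")
        return ["entry"]


# Normal use: the handle is closed as soon as the block ends.
with FakeFile("example.nxs") as fake:
    top_level = list(fake.keys())

# Failure inside the block: the handle is still closed.
try:
    with FakeFile("example.nxs") as failing:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

print(top_level, fake.closed, failing.closed)
```

In both cases the handle ends up closed, which is the guarantee that manual open and close calls do not give when an error occurs in between.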

Unfortunately, working with a context manager in a Jupyter Notebook is cumbersome, so in the following we open the file directly instead:

In [None]:
f = snx.File(filename)

## Navigating files

### Name-based access

Children of a group are accessed by name, just like items of a Python dictionary.
Above we saw that the file contains a single key, `'entry'`. The name could be anything; it just happens to match the class name here.
When we access it, we can see that it belongs to the class [NXentry](https://manual.nexusformat.org/classes/base_classes/NXentry.html), which is found at the top level of any NeXus file:

In [None]:
entry = f['entry']
entry

We could continue inspecting keys until we find a group we are interested in.
For this example we use the `'proton_charge'` log found within `'DASlogs'`:

In [None]:
proton_charge = entry['DASlogs']['proton_charge']
proton_charge

### Getting all children of a specific `NX_class`

The `__getitem__` method can be used with a class imported from `scippnexus` to obtain a dict of all children with a matching `NX_class` attribute.
For example, we can get all detectors within the `NXinstrument` using:

In [None]:
f['entry/instrument'][snx.NXdetector]
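
Conceptually, class-based selection filters the children of a group by their `NX_class` attribute. The following sketch shows that idea on a plain-dict model of a group. The helper `children_of_class` and the layout of `instrument` (including the `moderator` child) are hypothetical and are not taken from the POWGEN file or from the ScippNexus implementation.

```python
def children_of_class(group, nx_class):
    """Return a dict of the children whose NX_class attribute matches."""
    return {
        name: child
        for name, child in group["children"].items()
        if child.get("attrs", {}).get("NX_class") == nx_class
    }


# Hypothetical instrument group with two detectors and one other component.
instrument = {
    "attrs": {"NX_class": "NXinstrument"},
    "children": {
        "bank102": {"attrs": {"NX_class": "NXdetector"}, "children": {}},
        "bank103": {"attrs": {"NX_class": "NXdetector"}, "children": {}},
        "moderator": {"attrs": {"NX_class": "NXmoderator"}, "children": {}},
    },
}

detectors = children_of_class(instrument, "NXdetector")
print(sorted(detectors))
```

The result is keyed by child name, so the selected groups can be used afterwards in the same way as with name-based access.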

## Loading groups and datasets

The `proton_charge` group we navigated to above is an [NXlog](https://manual.nexusformat.org/classes/base_classes/NXlog.html), which typically contains 1-D data with a time axis.
Since ScippNexus knows about NXlog, it knows how to identify its shape:

In [None]:
proton_charge.shape

<div class="alert alert-info">
    <b>Note:</b>

This is in contrast to plain HDF5 where groups do *not* have a shape.
Note that not all NeXus classes have a defined shape.

</div>

We read the NXlog from the file using slicing notation.
To read the entire group, use an ellipsis (`...`) or an empty tuple (`()`):

In [None]:
proton_charge[...]

Above, ScippNexus automatically dealt with:

- Loading the data field (signal value dataset and its `'units'` attribute).
- Identifying the dimension labels (here: `'time'`).
- Loading other fields in the group as coordinates, including:
  - Units of the fields.
  - Uncertainties of the fields (here for `'average_value'`).
  
This structure is compatible with a `scipp.DataArray` and is returned as such.

We may also load an individual field instead of an entire group.
A field corresponds to a `scipp.Variable`. This is similar to how h5py represents datasets as NumPy arrays, but with a unit and dimension labels added (if applicable).
For example, we may load only the `'value'` dataset:

In [None]:
proton_charge['value'][...]

Attributes of datasets or groups are accessed just like in h5py:

In [None]:
proton_charge['value'].attrs['units']

A subset of the group (and its datasets) can be loaded by selecting only a slice.
We can also plot this directly using the `plot` method of `scipp.DataArray`:

In [None]:
proton_charge['time', 193000:197000].plot()
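
The indexing expressions used in this section are ordinary Python syntax. The sketch below shows which objects Python hands to `__getitem__` for an ellipsis, an empty tuple, and a dimension name combined with a range. `ShowIndex` is a hypothetical helper written for this illustration and is not part of ScippNexus or Scipp.

```python
class ShowIndex:
    """Hypothetical helper that returns whatever index Python passes in."""

    def __getitem__(self, index):
        return index


show = ShowIndex()

whole_via_ellipsis = show[...]  # the built-in Ellipsis object
whole_via_tuple = show[()]  # an empty tuple
dim_and_range = show['time', 193000:197000]  # a (name, slice) tuple

print(whole_via_ellipsis)
print(whole_via_tuple)
print(dim_and_range)
```

A dimension name followed by a range therefore reaches the object as a single tuple holding the name and a `slice`, which is what allows a library to restrict the read to that range along the named dimension.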

As another example, consider the following [NXdata](https://manual.nexusformat.org/classes/base_classes/NXdata.html) group:

In [None]:
bank = f['entry/bank103']
print(bank.shape, bank.dims)

This can be loaded and plotted as above.
In this case the resulting data array is 2-D:

In [None]:
da = bank[...]
da

In [None]:
da.plot()

## Writing to files

See the [application definitions](application-definitions.ipynb#Writing-files) section for documentation on how to write NeXus files.