Reading and Writing Files

HDF5

Scipp supports writing variables, data arrays, and datasets to HDF5 files. Reading HDF5 is supported only for these Scipp-specific files; other HDF5-based formats are not supported at this point. For reading HDF5-based NeXus files, see scippneutron.

Warning

We do not recommend using Scipp HDF5 files for archiving or as the sole means of storing valuable data. The current Scipp HDF5 schema is not a standard and will likely change because Scipp is still at an early stage of development. Future versions of Scipp may not be able to read older files.

That being said, the file format is quite simple and based on the HDF5 standard, so it would still be possible to recover data from files that a future Scipp version can no longer read. Note that the Scipp version is stored as an HDF5 attribute of the saved objects.

[1]:
import numpy as np
import scipp as sc

x = sc.Variable(dims=['x'], values=np.arange(10))
var = sc.Variable(dims=['x', 'y'], values=np.random.rand(9, 3))
a = sc.DataArray(data=var, coords={'x': x})

a.save_hdf5(filename='test.hdf5')
[2]:
b = sc.io.load_hdf5(filename='test.hdf5')
[3]:
b
[3]:
scipp.DataArray (1.29 KB)
    • x: 9
    • y: 3
    • x
      (x [bin-edge])
      int64
      𝟙
      0, 1, ..., 8, 9
      Values:
      array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    • (x, y)
      float64
      𝟙
      0.432, 0.662, ..., 0.968, 0.942
      Values:
      array([[4.32304899e-01, 6.62054081e-01, 8.72234821e-01], [4.02572792e-01, 2.15642644e-01, 6.91839122e-01], [4.19118356e-01, 9.41061418e-01, 5.31531106e-01], [8.91241171e-01, 9.01343284e-01, 3.64561204e-01], [8.08260224e-02, 6.63459937e-01, 2.12303912e-04], [7.36856917e-01, 3.63033382e-01, 5.45717390e-01], [9.19763987e-01, 8.79328914e-01, 1.44303275e-01], [2.63300078e-01, 3.72453241e-01, 1.94160654e-02], [4.03925654e-01, 9.67619177e-01, 9.42220336e-01]])

CSV

Note

CSV support requires pandas which must be installed separately.

CSV files can be read into datasets with scipp.io.load_csv. For example, the following CSV-encoded data can be read into a dataset as shown:

[4]:
csv_content = '''a [m],b [s],c
1,5,9
2,6,10
3,7,11
4,8,12'''
[5]:
from io import StringIO

ds = sc.io.load_csv(StringIO(csv_content), header_parser='bracket')
ds
[5]:
scipp.Dataset (2.54 KB)
    • row: 4
    • a
      (row)
      int64
      m
      1, 2, 3, 4
      Values:
      array([1, 2, 3, 4])
    • b
      (row)
      int64
      s
      5, 6, 7, 8
      Values:
      array([5, 6, 7, 8])
    • c
      (row)
      int64
      9, 10, 11, 12
      Values:
      array([ 9, 10, 11, 12])

This example uses StringIO to load the data directly from a string, but load_csv can also load from a local file or from a remote server: pass the path or URL of the file as the first argument. See also pandas.read_csv.

See scipp.io.load_csv for more options to customize how the data is structured in the dataset.

Using pandas

The CSV reader shown above is a wrapper around pandas.read_csv and provides commonly used functionality. pandas itself supports many more file formats, including Excel, JSON, and XML. See pandas IO tools for a complete list.

You can load such files with pandas directly and then convert the result to a Scipp dataset using from_pandas. For example, JSON can be read as follows:

[6]:
json = '''{"A [m]": {"0": 1, "1": 3, "2": 5},
"B [m/s]": {"0": 2, "1": 4, "2": 6}}'''
[7]:
from io import StringIO

import pandas as pd

# Wrap the string in StringIO; passing literal JSON to read_json is deprecated.
df = pd.read_json(StringIO(json))
df
[7]:
   A [m]  B [m/s]
0      1        2
1      3        4
2      5        6
[8]:
sc.compat.from_pandas(df, header_parser='bracket')
[8]:
scipp.Dataset (2.02 KB)
    • row: 3
    • row
      (row)
      int64
      𝟙
      0, 1, 2
      Values:
      array([0, 1, 2])
    • A
      (row)
      int64
      m
      1, 3, 5
      Values:
      array([1, 3, 5])
    • B
      (row)
      int64
      m/s
      2, 4, 6
      Values:
      array([2, 4, 6])

NeXus

Scipp has no built-in support for loading NeXus files. However, the scippneutron package can internally use Mantid to load such files, as well as any other Mantid-supported file type. See scippneutron and in particular scippneutron.load_with_mantid.