# Creating Arrays and Datasets

There are several ways to create data structures in Scipp.
`scipp.Variable` is particularly diverse.

The examples on this page only show some representative creation functions and only the most commonly used arguments.
See the linked reference pages for complete lists of functions and arguments.

## Variable

[Variables](../../generated/classes/scipp.Variable.rst) can be created using any of the dedicated [creation functions](../../reference/creation-functions.rst#creation-functions).
These fall into several categories as described by the following subsections.

### From Python Sequences or NumPy Arrays

#### Arrays: N-D Variables

Variables can be constructed from NumPy arrays directly or from any Python object that can be used to create a NumPy array.
See [NumPy array creation](https://numpy.org/doc/stable/user/basics.creation.html) for details.
Given such an object, an array variable can be created using [scipp.array](../../generated/functions/scipp.array.rst) (not to be confused with [data arrays](../../generated/classes/scipp.DataArray.rst)!):

In [None]:
import scipp as sc
v1d = sc.array(dims=['x'], values=[1, 2, 3, 4])
v2d = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v3d = sc.array(dims=['x', 'y', 'z'], values=[[[1, 2], [3, 4]],
                                             [[5, 6], [7, 8]]])

Alternatively, passing a NumPy array:

In [None]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
v = sc.array(dims=['x', 'y'], values=a)

Note that *both* the NumPy array and Python lists are copied into the Scipp variable, which leads to some additional time and memory costs.
See [Filling with a Value](#Filling-with-a-Value) for ways of creating variables without this overhead.

The `dtype` of the variable is deduced automatically in the above cases.
The unit is set to `scipp.units.dimensionless` if the `dtype` is a numeric type (e.g. integer, floating point) or `None` otherwise (e.g. strings).
This applies to all creation functions, not just `scipp.array`.

In [None]:
v

The `dtype` can be overridden with the `dtype` argument (see [Data types](../../reference/dtype.rst)):

In [None]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], dtype='float64')

The unit can and almost always should be set manually (see [unit](../../reference/units.rst)):

In [None]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')

Variances can be added using the `variances` keyword:

In [None]:
sc.array(dims=['x', 'y'], values=[[1.0, 2.0], [3.0, 4.0]],
         variances=[[0.1, 0.2], [0.3, 0.4]])

<div class="alert alert-info">

**Note:**

`scipp.array` takes *variances* as an input.
If your input stores standard deviations, make sure to square them: `sc.array(..., variances=stddevs**2)`.

Note further that the output in Jupyter notebooks shows standard deviations (`σ`), whereas converting variables to a plain string (`str(var)` or `print(var)`) shows variances.

All of this also applies to all other creation functions.

</div>

#### Scalars: 0-D Variables

Scalars are variables with no dimensions.
See [0-D variables (scalars)](data-structures.ipynb#0-D-variables-(scalars)) for a more detailed definition.
They can be constructed using, among other functions, [scipp.scalar](../../generated/functions/scipp.scalar.rst):

In [None]:
sc.scalar(3.41)

`scipp.scalar` will always produce a scalar variable, even when passed a sequence like a list:

In [None]:
sc.scalar([3.41])

In this case, it stores the Python list as-is in a Scipp variable.

Multiplying or dividing a value by a unit also produces a scalar variable:

In [None]:
4.2 * sc.Unit('m')

In [None]:
4.2 / sc.Unit('m')

### Generating Values

#### Range-Like Variables

1-D ranges and similar sequences can be created directly in Scipp.
[scipp.linspace](../../generated/functions/scipp.linspace.rst) creates arrays with regularly spaced values with a given number of elements.
For example (click the stacked disks icon to see all values):

In [None]:
sc.linspace('x', start=-2, stop=5, num=6, unit='s')

[scipp.arange](../../generated/functions/scipp.arange.rst) similarly creates arrays but with a given step size:

In [None]:
sc.arange('x', start=-2, stop=5, step=1.2, unit='K')

`arange` does not include the stop value but `linspace` does by default.
Please note that the caveats described in [NumPy's documentation](https://numpy.org/doc/stable/user/basics.creation.html#d-array-creation-functions) apply to Scipp as well.

All range-like functions currently use NumPy to generate values and thus incur the same costs from copying as `scipp.array`.

#### Filling with a Value

There are a number of functions to create N-D arrays with a fixed value, e.g. [scipp.zeros](../../generated/functions/scipp.zeros.rst) and [scipp.full](../../generated/functions/scipp.full.rst).
`scipp.zeros` creates a variable of any number of dimensions filled with zeros:

In [None]:
sc.zeros(dims=['x', 'y'], shape=[3, 4])

And `scipp.full` creates a variable filled with a given value:

In [None]:
sc.full(dims=['x', 'y'], shape=[3, 4], value=1.23)

Every filling function has a corresponding function with a `_like` postfix, for instance [scipp.zeros_like](../../generated/functions/scipp.zeros_like.rst).
These create a new variable (or data array) based on another variable (or data array):

In [None]:
v = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], unit='m')
sc.zeros_like(v)


### Special DTypes

Scipp has a number of `dtypes` that require some conversion when creating variables.
Notable examples are date-times and vectors, which are created with [scipp.datetimes](../../generated/functions/scipp.datetimes.rst) and [scipp.vectors](../../generated/functions/scipp.vectors.rst) or their scalar counterparts [scipp.datetime](../../generated/functions/scipp.datetime.rst) and [scipp.vector](../../generated/functions/scipp.vector.rst), as well as the types for spatial transformations in [scipp.spatial](../../generated/modules/scipp.spatial.rst).
While variables of all of these dtypes can be constructed using `scipp.array` and `scipp.scalar`, the specialized functions offer more convenience and document their intent better.

`scipp.datetimes` constructs an array of date-time-points.
It can be called either with strings in ISO 8601 format:

In [None]:
sc.datetimes(dims=['time'], values=['2021-01-10T01:23:45',
                                    '2021-01-11T23:45:01'])

Or with integers which encode the number of time units elapsed since the Unix epoch.
See also [scipp.epoch](../../generated/functions/scipp.epoch.rst).

In [None]:
sc.datetimes(dims=['time'], values=[0, 1610288175], unit='s')

Note that the unit is mandatory in the second case.

It is also possible to create a range of datetimes:

In [None]:
sc.arange('time', '2022-08-04T14:00:00', '2022-08-04T14:04:00',
          step=30 * sc.Unit('s'), dtype='datetime64')

The current time according to the system clock can be accessed using:

In [None]:
sc.datetime('now', unit='ms')

Scipp stores datetimes relative to the Unix epoch, which is available via a shorthand:

In [None]:
sc.epoch(unit='s')

`scipp.vectors` creates an array of 3-vectors.
It does so by converting a sequence or array with a length of 3 in its inner dimension:

In [None]:
sc.vectors(dims=['position'], values=[[1, 2, 3], [4, 5, 6]], unit='m')

## Data Arrays

There is essentially only one way to construct [data arrays](../../generated/classes/scipp.DataArray.rst), namely the `DataArray` initializer:

In [None]:
x = sc.linspace('x', start=1.5, stop=3.0, num=4, unit='m')
a = sc.scalar('an attribute')
m = sc.array(dims=['x'], values=[True, False, True, False])
data = x ** 2
sc.DataArray(data, coords={'x': x}, attrs={'a': a}, masks={'m': m})

`coords`, `attrs`, and `masks` are optional but the `data` must always be given.
Note how the creation functions for `scipp.Variable` can be used to make the individual pieces of a data array.
These variables are not copied into the data array.
Rather, the data array contains a reference to the same underlying piece of memory.
See [Ownership mechanism and readonly flags](../../reference/ownership-mechanism-and-readonly-flags.rst) for details.

## Dataset

[Datasets](../../generated/classes/scipp.Dataset.rst) are constructed by combining multiple data arrays or variables.
For instance, using the previously defined variables:

In [None]:
sc.Dataset({'data1': data, 'data2': -data}, coords={'x': x})

Or from data arrays:

In [None]:
da1 = sc.DataArray(data, coords={'x': x}, attrs={'a': a}, masks={'m': m})
da2 = sc.DataArray(-data, coords={'x': x})
sc.Dataset({'data1': da1, 'data2': da2})

## Any Data Structure

Any of `scipp.Variable`, `scipp.DataArray`, and `scipp.Dataset` can be created using the methods described in the following subsections.

### From Scipp HDF5 Files

Scipp has a custom file format based on HDF5 which can store data structures.
See [Reading and Writing Files](../reading-and-writing-files.rst) for details.
In short, `scipp.io.open_hdf5` loads whatever Scipp object is stored in a given file.
For demonstration purposes, we use a `BytesIO` object here. But the same code can be used by passing a string as a file name to `to_hdf5` and `open_hdf5`.

In [None]:
from io import BytesIO
buffer = BytesIO()
v = sc.arange('x', start=1.0, stop=5.0, step=1.0, unit='s')
v.to_hdf5(buffer)
sc.io.open_hdf5(buffer)

### From Other Libraries

Scipp's data structures can be converted to and from certain other structures using the functions listed under [Compatibility](../../reference/free-functions.rst#compatibility).
For example, `scipp.compat.from_pandas` can convert a Pandas dataframe into a Scipp dataset:


In [None]:
import pandas as pd
df = pd.DataFrame({'x': 10*np.arange(5), 'y': np.linspace(0.1, 0.5, 5)})
df

In [None]:
sc.compat.from_pandas(df)

### From NeXus Files

[NeXus](https://www.nexusformat.org/) is an HDF5-based file format for neutron, X-ray, and muon science.
[Scippnexus](https://scipp.github.io/scippnexus/) is a package that allows loading NeXus files into Scipp objects with a simple h5py-like interface.
For example, the following code loads the first detector bank into a data array:
```python
import scippnexus as snx
with snx.File(filename) as f:
    da = f['entry/bank0'][...]
```
See the documentation of Scippnexus linked above for more information.