Data types
In most cases, the data type (dtype) of a Variable is derived from the data. For instance, when passing a numpy array to scipp, scipp will use the dtype provided by numpy:
[1]:
import numpy as np
import scipp as sc
var = sc.Variable(dims=['x'], values=np.arange(4.0))
var.dtype
[1]:
float64
[2]:
var = sc.Variable(dims=['x'], values=np.arange(4))
var.dtype
[2]:
int64
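The dtype that scipp reports in the two cells above is the one numpy inferred for the input array. As a minimal sketch of that inference, using only numpy and no scipp calls: note that numpy's default integer width depends on the platform and numpy version, so the int64 above is the common result on 64-bit systems, not a guarantee.

```python
import numpy as np

float_data = np.arange(4.0)  # float argument, so numpy infers a floating-point dtype
int_data = np.arange(4)      # integer argument, so numpy uses its default integer dtype

print(float_data.dtype)  # float64
print(int_data.dtype)    # platform-dependent default integer, commonly int64
```

If a specific width matters, it is safer to request it explicitly than to rely on the inferred default.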
The dtype may also be specified using a keyword argument to sc.Variable and most creation functions. It is possible to use scipp’s own scipp.dtype, numpy.dtype, or (where a numpy equivalent exists) a string:
[3]:
var = sc.zeros(dims=['x'], shape=[2], dtype=sc.dtype.float32)
var.dtype
[3]:
float32
[4]:
var = sc.zeros(dims=['x'], shape=[2], dtype=np.dtype(np.float32))
var.dtype
[4]:
float32
[5]:
var = sc.zeros(dims=['x'], shape=[2], dtype='float32')
var.dtype
[5]:
float32
Scipp supports common dtypes such as float32, float64, int32, int64, bool, string, and datetime64.
It is also possible to nest Variables, DataArrays, or Datasets inside Variables. This is useful for storing attributes in DataArrays and Datasets, but interoperability with numpy is limited in those cases.
[6]:
var = sc.scalar(sc.zeros(dims=['x'], shape=[2], dtype='float64'))
var
[6]:
() Variable <scipp.Variable> (x: 2) float64 [dimensionless] [0.000000, 0.000000]
Values:
<scipp.Variable> (x: 2) float64 [dimensionless] [0.000000, 0.000000]
You can get a full list of the available dtypes using
[s for s in dir(sc.dtype) if not s.startswith('__')]
but note that many of those dtypes are only meant for internal use.
Dates and Times
Scipp has a special dtype for time-points, sc.dtype.datetime64. Variables can be constructed from integers which encode the time since the Unix epoch:
[7]:
sc.scalar(value=0, unit=sc.units.s, dtype=sc.dtype.datetime64)
[7]:
() datetime64 [s] 1970-01-01T00:00:00
Values:
array('1970-01-01T00:00:00', dtype='datetime64[s]')
[8]:
sc.scalar(value=681794055, unit=sc.units.s, dtype=sc.dtype.datetime64)
[8]:
() datetime64 [s] 1991-08-10T03:14:15
Values:
array('1991-08-10T03:14:15', dtype='datetime64[s]')
Datetime variables always need a temporal unit, and that unit determines how the integer that is passed to value= is interpreted:
[9]:
var = sc.scalar(value=681794055, unit=sc.units.ns, dtype=sc.dtype.datetime64)
var
[9]:
() datetime64 [ns] 1970-01-01T00:00:00.681794055
Values:
array('1970-01-01T00:00:00.681794055', dtype='datetime64[ns]')
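The effect of the unit on the same integer can be checked independently of scipp. The following sketch uses numpy.datetime64 directly; it mirrors the integer used above and is only meant to illustrate the interpretation, not how scipp implements it.

```python
import numpy as np

ticks = 681794055  # same integer as in the cells above

as_seconds = np.datetime64(ticks, 's')       # counted in seconds since the Unix epoch
as_nanoseconds = np.datetime64(ticks, 'ns')  # counted in nanoseconds since the Unix epoch

print(as_seconds)      # 1991-08-10T03:14:15
print(as_nanoseconds)  # 1970-01-01T00:00:00.681794055
```

The integer is unchanged in both cases; only the unit decides whether it lands in 1991 or less than a second after the epoch.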
Datetime elements are automatically converted to and from numpy.datetime64 objects:
[10]:
var.value
[10]:
numpy.datetime64('1970-01-01T00:00:00.681794055')
[11]:
now = sc.scalar(value=np.datetime64('now'))
now
[11]:
() datetime64 [s] 2021-11-10T14:27:37
Values:
array('2021-11-10T14:27:37', dtype='datetime64[s]')
Note that now has unit s even though we did not specify it. The unit was deduced from the numpy.datetime64 object, which encodes a unit of its own.
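The unit carried by a numpy.datetime64 object can be inspected with numpy.datetime_data. This sketch uses only numpy and shows how the unit depends on how the object was created; the variable names are illustrative.

```python
import numpy as np

day = np.datetime64('2021-03-14')              # date-only string, so the unit is days
second = np.datetime64('2021-03-14T00:00:00')  # seconds are given, so the unit is seconds
milli = np.datetime64('2021-03-14', 'ms')      # unit requested explicitly

for stamp in (day, second, milli):
    unit, step = np.datetime_data(stamp.dtype)  # returns (unit, step size)
    print(stamp, unit, step)
```

Passing the unit explicitly, as in the last case, avoids surprises when the precision of the input string varies.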
Operations
Variables containing datetimes only support a limited set of operations as it makes no sense to, for instance, add two time points. In contrast to numpy, scipp does not have a separate type for time differences. Those are simply encoded by integer Variables with a temporal unit.
[12]:
a = sc.scalar(value=np.datetime64('2021-03-14', 'ms'))
b = sc.scalar(value=np.datetime64('2000-01-01', 'ms'))
a - b
[12]:
() int64 [ms] 668995200000
Values:
array(668995200000)
[13]:
try:
    a + b
except sc.DTypeError as err:
    print(err)
'add' does not support dtypes datetime64 datetime64
[14]:
a + sc.scalar(value=123, unit='ms')
[14]:
() datetime64 [ms] 2021-03-14T00:00:00.123
Values:
array('2021-03-14T00:00:00.123', dtype='datetime64[ms]')
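For comparison, the same three operations can be written in plain numpy, which does have a separate timedelta64 type for differences. This is a sketch for contrast only and makes no scipp calls; the flag add_failed is an illustrative name.

```python
import numpy as np

a = np.datetime64('2021-03-14', 'ms')
b = np.datetime64('2000-01-01', 'ms')

delta = a - b  # numpy returns a dedicated timedelta64 value here
print(delta.dtype)                 # timedelta64[ms]
print(int(delta.astype('int64')))  # 668995200000

add_failed = False
try:
    a + b  # adding two time points is rejected by numpy as well
except TypeError as err:
    add_failed = True
    print(type(err).__name__)

shifted = a + np.timedelta64(123, 'ms')  # time point plus a difference is allowed
print(shifted)  # 2021-03-14T00:00:00.123
```

The integer behind the numpy difference matches the value scipp stores as an int64 with unit ms.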
Time zones
Scipp does not support manual handling of time zones. All datetime objects are assumed to be in UTC. Scipp does not look at your local time zone, so the following always produces 12:30 on 2021-09-03 UTC, no matter where you run this code:
[15]:
sc.scalar(value=np.datetime64('2021-09-03T12:30:00'))
[15]:
() datetime64 [s] 2021-09-03T12:30:00
Values:
array('2021-09-03T12:30:00', dtype='datetime64[s]')
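Since everything is treated as UTC, a timestamp recorded in local time has to be converted before it is stored. One possible approach, sketched here with Python's standard datetime module, is to convert to UTC and strip the offset first. The UTC+02:00 reading is a made-up example value.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical wall-clock reading taken in a UTC+02:00 zone.
local = datetime(2021, 9, 3, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# Convert to UTC, then drop the tzinfo so the ISO string carries no offset.
utc_naive = local.astimezone(timezone.utc).replace(tzinfo=None)
utc_string = utc_naive.isoformat()

print(utc_string)  # 2021-09-03T12:30:00
```

The resulting string has the same form as the one passed to np.datetime64 in the cell above.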