# What's new in scipp

This page highlights feature additions and discusses major changes from recent releases.
For a full list of changes see the [Release Notes](https://scipp.github.io/about/release-notes.html).

In [None]:
import numpy as np
import scipp as sc

## General

### String formatting

<div class="alert alert-info">

**New in 0.15**

Added support for compact formatting of 0-D variables.

</div>

Example:

In [None]:
var = sc.scalar(12.5, variance=4.0, unit='mm')
print(f'{var}')
print(f'{var:c}')

### Implicit conversion to boolean

<div class="alert alert-info">

**New in 0.15**

Added support for implicit conversion of 0-D variables to bool.

</div>

Examples:

In [None]:
if sc.scalar(1, unit='m') < sc.scalar(2, unit='m'):
    print('ok')

var = sc.array(values=[1, 2, 3, 4, 5], dims=['x'], unit='m')
if sc.any(var == sc.scalar(3, unit='m')):
    print('ok')

### Keyword-argument syntax for `rename_dims`

<div class="alert alert-info">

**New in 0.15**

Added support for keyword arguments in `rename_dims` for specifying the old and new dimension labels, as already supported by `rename`.

</div>

Example:

In [None]:
var = sc.ones(dims=['x', 'y'], shape=(4, 3))
var.rename_dims(x='xnew', y='ynew')

### Unique dimensions and slicing of 1-D objects

<div class="alert alert-info">

**New in 0.9**

The new `dim` property checks whether an object is 1-D and, if so, returns its only dimension label.
An exception is raised if the object is not 1-D.
</div>

Example:

In [None]:
x = sc.linspace(dim='x', start=0, stop=1, num=4)
x.dim

<div class="alert alert-info">

**New in 0.11**

1-D objects can now be sliced without specifying a dimension.
</div>

Example:

In [None]:
x[-1]

If an object is not 1-D then `DimensionError` is raised:

In [None]:
var2d = sc.concat([x, x], 'y')
var2d[0]

### Slicing with stride

<div class="alert alert-info">

**New in 0.12**

Positional slicing (slicing with integer indices, as opposed to slicing with a label matching a coordinate value) now supports strides.

Negative strides are currently not supported.

</div>

Examples:

In [None]:
y = sc.arange('y', 10)
y[::2]

In [None]:
x = sc.linspace('x', 0.0, 1.0, num=5)
da = sc.DataArray(
    sc.ones(dims=['x', 'y'], shape=[4, 10], unit='K'), coords={'x': x, 'y': y}
)
da['y', 1::2]

Slicing with a stride along a dimension that has a bin-edge coordinate is ill-defined and therefore not supported:

In [None]:
da['x', ::2]
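
To see why, treat the bin edges of `x` above as a plain NumPy array. This sketch (an illustration, not scipp's implementation) selects every second bin and checks whether the selected bins still share edges, which a single edge array would require:

```python
import numpy as np

# Bin edges like the 'x' coordinate above: 5 edges describe 4 bins.
edges = np.linspace(0.0, 1.0, num=5)

# Bin i spans [edges[i], edges[i + 1]].
bins = np.stack([edges[:-1], edges[1:]], axis=1)

# Take every second bin, as da['x', ::2] would.
selected = bins[::2]

# One edge array can only describe bins that start where the previous one ends.
contiguous = bool(np.all(selected[1:, 0] == selected[:-1, 1]))
print(selected)    # bins [0.0, 0.25] and [0.5, 0.75]
print(contiguous)  # False: there is a gap between 0.25 and 0.5
```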

### Slicing: Advanced indexing support with integer array or boolean variable

<div class="alert alert-info">

**New in 0.13**

- Added support for indexing with an integer array.
- Added support for indexing with a boolean variable.
    
The [Slicing](https://scipp.github.io/user-guide/slicing.html) documentation provides details and examples.

</div>

### Units

#### Unified conversion of unit and dtype

<div class="alert alert-info">

**New in 0.11**

Variables and data arrays have a new method, `to`, for conversion of dtype, unit, or both.
This can be used to replace uses of `to_unit` and `astype`.

</div>

Example:

In [None]:
var = sc.arange(dim='x', start=0, stop=4, unit='m')
var

Use the `unit` keyword argument to convert to a different unit:

In [None]:
var.to(unit='mm')

Use the `dtype` keyword argument to convert to a different dtype:

In [None]:
var.to(dtype='float64')

If both `unit` and `dtype` are provided, the implementation attempts to apply the two conversions in optimal order to reduce or avoid the effect of rounding/truncation errors:

In [None]:
var.to(dtype='float64', unit='km')
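
As an illustration of why the order matters, here is a plain-Python sketch (not scipp's implementation) using the integer values of `var` above. If the unit conversion were applied while the values are still integers, the result would have to be truncated; converting the dtype first keeps the fractional kilometres:

```python
values_m = [0, 1, 2, 3]  # integer lengths in m, like `var` above
factor = 0.001  # conversion factor m -> km

# Unit first while still integer (truncating), then dtype: everything becomes 0.
unit_then_dtype = [float(int(v * factor)) for v in values_m]

# Dtype first, then unit: the fractional values are kept.
dtype_then_unit = [float(v) * factor for v in values_m]

print(unit_then_dtype)  # [0.0, 0.0, 0.0, 0.0]
print(dtype_then_unit)
```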

#### Support for `unit=None`

<div class="alert alert-info">

**New in 0.12**

Previously scipp used `unit=sc.units.dimensionless` (or the alias `unit=sc.units.one`) for anything that does not have a unit, such as strings, booleans, or bins.
To distinguish physically dimensionless quantities from these cases, scipp now supports variables (and, by extension, data arrays) whose unit is `None`.
    
This change is accompanied by a number of related changes:

- Creation functions use a default unit if none is given explicitly.
  The default for *numbers* (floating point or integer) is `sc.units.dimensionless`.
  The default for everything else, including `bool`, is `None`.
- Comparison operations, which return variables with `dtype=bool`, have `unit=None`.
- A new function `index` was added to create 0-D variables with `unit=None`.
  This complements `scalar`, which uses the default unit (depending on the `dtype`).

</div>

Examples:

In [None]:
print(sc.array(dims=['x'], values=[1.1, 2.2, 3.3]))
print(sc.array(dims=['x'], values=[1, 2, 3]))
print(sc.array(dims=['x'], values=[False, True, False]))
print(sc.array(dims=['x'], values=['a', 'b', 'c']))

In [None]:
a = sc.array(dims=['x'], values=[1, 2, 3])
b = sc.array(dims=['x'], values=[1, 3, 3])
print(a == b)
print(a < b)

In [None]:
(a == b).unit is None

For some purposes we may use a coordinate with unique integer-valued identifiers.
Since the identifiers do not have a physical meaning, we use `unit=None`.
Note that this has to be given explicitly since otherwise integers are treated as numbers, i.e., the unit would be dimensionless:

In [None]:
da = sc.DataArray(
    a, coords={'id': sc.array(dims=['x'], unit=None, values=[34, 21, 14])}
)
da

The `index` function can now be used to conveniently look up data by its identifier:

In [None]:
da['id', sc.index(21)]

#### Reduced effect of rounding errors when converting units

<div class="alert alert-info">

**New in 0.14**

`sc.to_unit` (and therefore also the `to()` method) now avoids rounding errors when converting from a large unit to a small unit, if the conversion factor is integral.

</div>

Example:

In [None]:
sc.scalar(1.0, unit='m').to(unit='nm')
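
The kind of error involved can be seen in plain Python. One way it arises is when the conversion factor is computed from a small number such as `1e-9`, which is not exactly representable in binary floating point. This sketch (an illustration, not scipp's implementation) compares that with multiplying by the exact integral factor `10**9`:

```python
value_m = 1.0

# Dividing by an inexact small factor may introduce a rounding error.
via_division = value_m / 1e-9

# Multiplying by the exact integral factor is exact here.
via_integer_factor = value_m * 10**9

print(via_division)
print(via_integer_factor)  # 1000000000.0
```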

### Checking if coordinates are bin-edges

<div class="alert alert-info">

**New in 0.13**

The `coords` property (and also the `attrs`, `meta`, and `masks` properties) now provides the `is_edges` method to check whether an entry is a bin-edge coordinate.

</div>

Example:

In [None]:
import scipp as sc

x = sc.arange('x', 3)
da = sc.DataArray(x, coords={'x1': x, 'x2': sc.arange('x', 4)})
print(f"{da.coords.is_edges('x1') = }")
print(f"{da.coords.is_edges('x2') = }")
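
The check relies on the shape rule for bin edges: along a given dimension, an edge coordinate has one more element than the data. A minimal sketch of that rule using plain size dicts (the helper `looks_like_edges` is hypothetical and not part of scipp):

```python
def looks_like_edges(data_sizes: dict, coord_sizes: dict, dim: str) -> bool:
    """Return True if the coordinate has exactly one extra element along `dim`."""
    return coord_sizes[dim] == data_sizes[dim] + 1


data_sizes = {'x': 3}
print(looks_like_edges(data_sizes, {'x': 3}, 'x'))  # False, like 'x1' above
print(looks_like_edges(data_sizes, {'x': 4}, 'x'))  # True, like 'x2' above
```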

### Coordinate transformations

<div class="alert alert-info">

**New in 0.15**
    
Several improvements for `transform_coords`:
    
- Support for a keyword syntax for defining single-step transformations.
- Now works with `lookup` (see below).
- Now works with callables other than functions, such as objects returned by `partial` (with arguments bound positionally, not by keyword) or instances of classes defining `__call__`.

</div>

Examples:

In [None]:
da = sc.data.table_xyz(nrow=10)
da.transform_coords(xy=lambda x, y: x * y)

In [None]:
from functools import partial


def linear(a, b, x):
    return a * x + b


func = partial(linear, 0.5, sc.scalar(10.0, unit='m'))
da.transform_coords(fx=func)
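
`transform_coords` takes the names of the coordinates to use from the parameter names of the callable. Python's `inspect.signature` shows which parameter names remain for a `partial` object or a callable class instance. The sketch below only inspects signatures; the class `Scale` is a hypothetical example, and exactly how scipp inspects callables internally is an assumption here:

```python
import inspect
from functools import partial


def linear(a, b, x):
    return a * x + b


class Scale:
    """Hypothetical callable class that multiplies its input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return self.factor * x


# Arguments bound positionally by partial disappear from the signature,
# leaving only 'x' to be matched against a coordinate.
print(list(inspect.signature(partial(linear, 0.5, 10.0)).parameters))  # ['x']

# For an instance, the signature of __call__ is used, without 'self'.
print(list(inspect.signature(Scale(2.0)).parameters))  # ['x']
```

Per the release note above, an instance like this could then be passed as, e.g., `da.transform_coords(x2=Scale(2.0))`.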

### Operations

#### Creation functions

<div class="alert alert-info">

**New in 0.11**
    
Creation functions for datetimes were added:

- Added `epoch`, `datetime` and `datetimes`.

</div>

In [None]:
sc.datetime('now', unit='ms')

In [None]:
times = sc.datetimes(
    dims=['time'], values=['2022-01-11T10:24:03', '2022-01-11T10:24:03']
)
times

The new `epoch` function is useful for obtaining the time since epoch, i.e., a time difference (`dtype='int64'`) instead of a time point (`dtype='datetime64'`):

In [None]:
times - sc.epoch(unit=times.unit)

<div class="alert alert-info">

**New in 0.12**
    
`zeros_like`, `ones_like`, `empty_like`, and `full_like` can now be used with data arrays.

</div>

Example:

In [None]:
x = sc.linspace('x', 0.0, 1.0, num=5)
da = sc.DataArray(sc.ones(dims=['x', 'y'], shape=[4, 6], unit='K'), coords={'x': x})
sc.zeros_like(da)

#### Utility methods and functions

<div class="alert alert-info">

**New in 0.12**
    
- Added `squeeze` method to remove length-1 dimensions from objects.
- Added `rename` method to rename dimensions and associated dimension-coordinates (or attributes).
  This complements `rename_dims`, which only changes dimension labels but does not rename coordinates.
- Added `midpoints` to compute bin-centers.

</div>

Example:

In [None]:
x = sc.linspace('x', 0.0, 1.0, num=5)
da = sc.DataArray(sc.ones(dims=['x', 'y'], shape=[4, 6], unit='K'), coords={'x': x})

A length-1 x-dimension...

In [None]:
da['x', 0:1]

... can be removed with `squeeze`:

In [None]:
da['x', 0:1].squeeze()

`squeeze` returns a new object and leaves the original unchanged.

Renaming is most convenient using keyword arguments:

In [None]:
da.rename(x='xnew')

`rename` returns a new object and leaves the original unchanged.

`midpoints` can be used to replace a bin-edge coordinate by bin centers:

In [None]:
da.coords['x'] = sc.midpoints(da.coords['x'])
da
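
For a 1-D coordinate, the bin centers are the averages of adjacent edges. A NumPy sketch of that computation on the same edge values as above (scipp's `midpoints` operates on scipp variables with units; this sketch uses a plain array):

```python
import numpy as np

edges = np.linspace(0.0, 1.0, num=5)  # same values as the 'x' edges above

# Average each pair of neighbouring edges to get one center per bin.
centers = 0.5 * (edges[:-1] + edges[1:])
print(centers)  # 0.125, 0.375, 0.625, 0.875
```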

### Binning and histogramming operations

#### Reworked API for better user experience

<div class="alert alert-info">

**New in 0.15**

Simpler interface for binning and histogramming operations:
    
- The previous `sc.bin` moved to `sc.binning.make_binned`.
  - Most users should use the new `sc.bin` or `sc.group` instead (see below for the new interface).
- `sc.histogram` moved to `sc.binning.make_histogrammed`.
  - Most users should use `sc.hist` instead (see below for the new interface).
- `bin`, `group`, `hist`, and `rebin` are now available as methods (in addition to free functions).
- `bin` and `hist` can be provided with one of:
  - Bin count.
  - Bin size.
  - Bin edges.
    
</div>

Examples, given a table:

In [None]:
table = sc.data.table_xyz(nrow=100)
table.coords['label'] = (table.coords['x'] * 10).to(dtype='int32')
table

Bin into 10 bins along each of x and y:

In [None]:
table.bin(x=10, y=10)

Bin based on bin size:

In [None]:
table.bin(x=1 * sc.Unit('mm'))

Group by label and bin by y:

In [None]:
table.group('label').bin(y=20)

For more examples see the documentation of the functions.

#### Multi-dimensional histogramming

<div class="alert alert-info">

**New in 0.15**
    
Added support for multi-dimensional histogramming with `hist`.
This is partially based on `bin`, i.e., performance may be sub-optimal.
    
</div>

Example:

In [None]:
table.hist(x=10, y=20)

#### `nanhist`

<div class="alert alert-info">

**New in 0.15**
    
Added `nanhist`, to skip NaN values when computing a histogram.
This is based on `bin`, i.e., performance may be sub-optimal.
    
</div>
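
The difference from `hist` shows up when data values are NaN: a plain sum over a bin propagates the NaN, while a NaN-skipping sum ignores it. A NumPy sketch of a weighted histogram illustrating the idea (not scipp's implementation; it assumes all positions lie inside the edges):

```python
import numpy as np

x = np.array([0.1, 0.2, 0.6, 0.7])  # event positions
weights = np.array([1.0, np.nan, 2.0, 3.0])  # event values, one of them NaN
edges = np.array([0.0, 0.5, 1.0])

# Bin index of each event: 0 for [0.0, 0.5), 1 for [0.5, 1.0).
idx = np.digitize(x, edges) - 1

# Like hist: the NaN propagates into the sum of its bin.
with_nan = np.bincount(idx, weights=weights, minlength=len(edges) - 1)

# Like nanhist: drop NaN values before summing.
valid = ~np.isnan(weights)
skipping_nan = np.bincount(idx[valid], weights=weights[valid], minlength=len(edges) - 1)

print(with_nan)      # NaN in the first bin, 5.0 in the second
print(skipping_nan)  # 1.0 in the first bin, 5.0 in the second
```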

### Binned data

#### Interpolation using `lookup`

<div class="alert alert-info">

**New in 0.15**

`lookup` has been extended and improved to facilitate "event filtering" operations:

- Support for non-histogram data arrays as input functions.
  In this case, two lookup modes, `previous` and `nearest`, are provided.
  This makes `lookup` similar to `scipy.interpolate.interp1d`.
- Custom fill values are now supported.
  This is used for out-of-range as well as for masked values.
- Works with `transform_coords`.

</div>

Example:

Given a function `func` and a data array:

In [None]:
x = sc.linspace('x', 0, 1, num=51, unit='m')
func = sc.DataArray(x * x, coords={'x': x})  # approximating f(x) = x**2
table = sc.data.table_xyz(nrow=100)
da = table.bin(y=2, x=10)  # note x=10, unlike in func above

We can compute a new coordinate `x2`, both for the bins (from the bin coordinate `x`) and for the events (from the event coordinate `x`):

In [None]:
da = da.transform_coords(x2=sc.lookup(func, mode='nearest'))
da

In [None]:
sc.show(da)
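
The two modes for non-histogram functions can be sketched with NumPy for a sorted 1-D function: `previous` takes the value of the last sample at or before each point, `nearest` takes the value of the closest sample. The helpers below are hypothetical, and treating every point outside the sample range as out-of-range (replaced by a fill value) is a simplification of this sketch; scipp's handling of masks and edge cases may differ:

```python
import numpy as np


def lookup_previous(xs, ys, points, fill=np.nan):
    """Return ys at the last sample position <= each point (hypothetical helper)."""
    idx = np.searchsorted(xs, points, side='right') - 1
    out = ys[np.clip(idx, 0, len(xs) - 1)].astype(float)
    out[(points < xs[0]) | (points > xs[-1])] = fill  # out of range in this sketch
    return out


def lookup_nearest(xs, ys, points, fill=np.nan):
    """Return ys at the sample position closest to each point (hypothetical helper)."""
    right = np.clip(np.searchsorted(xs, points), 1, len(xs) - 1)
    left = right - 1
    # Choose the left neighbour when it is at least as close as the right one.
    idx = np.where(points - xs[left] <= xs[right] - points, left, right)
    out = ys[idx].astype(float)
    out[(points < xs[0]) | (points > xs[-1])] = fill
    return out


xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([10.0, 20.0, 30.0, 40.0])
points = np.array([0.4, 0.6, 2.9, 5.0])
print(lookup_previous(xs, ys, points))  # 10, 10, 30, then the fill value
print(lookup_nearest(xs, ys, points))   # 10, 20, 40, then the fill value
```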

### Reduction operations

#### More operations supported by data arrays and datasets

<div class="alert alert-info">

**New in 0.14**

- `DataArray` and `Dataset` now support more reduction operations, including `sum`, `nansum`, `mean`, `nanmean`, `max`, `min`, `nanmax`, `nanmin`, `all`, and `any`.
- All of the above are now also supported for the `bins` property.
- `groupby` now also supports all of these operations.
  Exception: `nanmean`.
- Event-based masks are now supported in all reduction operations.
</div>

Example:

In [None]:
da = sc.data.binned_x(nevent=100, nbin=3)
da

The maximum value in each bin:

In [None]:
da.bins.max()

The maximum value in each bin of a binned variable, here a coordinate:

In [None]:
da.bins.coords['x'].bins.max()

### Shape operations

#### `fold` supports size -1

<div class="alert alert-info">

**New in 0.12**

`fold` now accepts up to one size (or shape) entry with value `-1`.
This indicates that the size should be computed automatically based on the input size and other provided sizes.

</div>

Example:

In [None]:
var = sc.arange('xyz', 2448)
var.fold('xyz', sizes={'x': 4, 'y': 4, 'z': -1})
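
The missing size is the total size divided by the product of the given sizes, which must divide evenly (here 2448 / (4 * 4) = 153). A minimal sketch of that inference (the helper `infer_sizes` is hypothetical and not part of scipp):

```python
import math


def infer_sizes(total: int, sizes: dict) -> dict:
    """Replace at most one -1 entry in `sizes` by the size implied by `total`."""
    unknown = [dim for dim, size in sizes.items() if size == -1]
    if len(unknown) > 1:
        raise ValueError('at most one size may be -1')
    known = math.prod(size for size in sizes.values() if size != -1)
    if not unknown:
        if known != total:
            raise ValueError('sizes do not match the input size')
        return dict(sizes)
    if known == 0 or total % known != 0:
        raise ValueError(f'{total} cannot be split into blocks of {known}')
    return {dim: total // known if size == -1 else size for dim, size in sizes.items()}


print(infer_sizes(2448, {'x': 4, 'y': 4, 'z': -1}))  # {'x': 4, 'y': 4, 'z': 153}
```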

#### `broadcast` supports `DataArray`

<div class="alert alert-info">

**New in 0.13**

`broadcast` now also supports data arrays.

</div>

#### `flatten` drops mismatching bin edges

<div class="alert alert-info">

**New in 0.15**

`flatten` now drops mismatching bin edges instead of raising an exception.

</div>

Example:

In [None]:
hist = sc.data.table_xyz(nrow=100).hist(y=2, x=4)
hist.flatten(to='yx')

Above, the `x` edges of consecutive `y` rows cannot be joined together, so the `x` coordinate is dropped from the result.
Note the similar behavior of integer-array indexing, for the same reason:

In [None]:
hist['x', [0, 2, 3]]  # drops x edges
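
Keeping bin edges requires that each row of edges starts where the previous row ends, so that the rows join into a single edge array. A sketch of that check on plain lists (the helper `join_edges` is hypothetical and not scipp's implementation):

```python
def join_edges(rows):
    """Concatenate rows of bin edges, or return None if they do not join up."""
    joined = list(rows[0])
    for row in rows[1:]:
        if row[0] != joined[-1]:
            return None  # gap or overlap: no single edge array exists
        joined.extend(row[1:])  # the shared edge appears only once
    return joined


# Like the x edges of `hist` above: every y row repeats the same x edges.
print(join_edges([[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]))  # None
# Rows that continue each other can be joined.
print(join_edges([[0.0, 0.5, 1.0], [1.0, 1.5, 2.0]]))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```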

### Vectors and matrices

#### General

<div class="alert alert-info">

**New in 0.11**
    
`scipp.spatial` has been restructured and extended:

- New data types for spatial transforms were added:
  - `vector3` (renamed from `vector3_float64`)
  - `rotation3` (3-D rotation defined using quaternion coefficients)
  - `translation3` (translation in 3-D)
  - `linear_transform3` (previously `matrix_3_float64`, 3-D linear transform with, e.g., rotation and scaling)
  - `affine_transform3` (affine transform in 3-D, combination of a linear transform and a translation, defined using 4x4 matrix)
- The [scipp.spatial](https://scipp.github.io/generated/modules/scipp.spatial.html) submodule was extended with a number of new creation functions, in particular for the new dtypes.
- `matrix` and `matrices` for creating "matrices" have been deprecated. Use `scipp.spatial.linear_transform` and `scipp.spatial.linear_transforms` instead.

</div>

Note that the `scipp.spatial` subpackage must be imported explicitly:

In [None]:
from scipp import spatial

linear = spatial.linear_transform(value=[[1, 0, 0], [0, 2, 0], [0, 0, 3]])
linear

In [None]:
trans = spatial.translation(value=[1, 2, 3], unit='m')
trans

Multiplication can be used to combine the various transforms:

In [None]:
linear * trans

Note that in the case of `affine_transform3` the unit refers to the translation part.
A unit for the linear part is currently not supported.
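
Combining a linear transform with a translation corresponds to multiplying their 4x4 homogeneous matrices. The NumPy sketch below assumes the common convention that in `linear * trans` the right operand (the translation) is applied first; under that assumption, the translation part of the product is the linear matrix applied to the translation vector. Units are omitted, and the helper `homogeneous` is hypothetical:

```python
import numpy as np


def homogeneous(linear=None, translation=None):
    """Build a 4x4 affine matrix from a 3x3 linear part and a translation vector."""
    m = np.eye(4)
    if linear is not None:
        m[:3, :3] = linear
    if translation is not None:
        m[:3, 3] = translation
    return m


lin = homogeneous(linear=np.diag([1.0, 2.0, 3.0]))
trans = homogeneous(translation=np.array([1.0, 2.0, 3.0]))

combined = lin @ trans  # apply trans first, then lin
print(combined[:3, 3])  # translation part: diag(1, 2, 3) @ (1, 2, 3) = (1, 4, 9)
```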

## SciPy compatibility layer

<div class="alert alert-info">

**New in 0.11**
    
Several subpackages providing wrappers for a *subset* of the functions in the corresponding SciPy packages were added:
    
- [scipp.integrate](../generated/modules/scipp.integrate.rst) providing `simpson` and `trapezoid`.
- [scipp.interpolate](../generated/modules/scipp.interpolate.rst) providing `interp1d`.
- [scipp.optimize](../generated/modules/scipp.optimize.rst) providing `curve_fit`.
- [scipp.signal](../generated/modules/scipp.signal.rst) providing `butter` and `sosfiltfilt`.

</div>

Please refer to the function documentation for working examples.

<div class="alert alert-info">

**New in 0.14**
    
- [scipp.ndimage](../generated/modules/scipp.ndimage.rst) providing `gaussian_filter`, `median_filter`, and more.

</div>

## Python ecosystem compatibility

<div class="alert alert-info">

**New in 0.15**
    
Added `scipp.compat.to_xarray` for converting scipp data arrays to xarray.

</div>

Example:

In [None]:
da = sc.data.data_xy()
sc.compat.to_xarray(da)

## Performance

<div class="alert alert-info">

**New in 0.12**

- `sc.bin()` is now faster when binning or grouping into thousands of bins or more.

</div>

<div class="alert alert-info">

**New in 0.14**

Fixed slow import times of `scipp`.

</div>