In [None]:
import numpy as np
import scipp as sc

# Tips, tricks, and anti-patterns
## Choose dimensions wisely

A good choice of dimensions for representing data goes a long way in making working with scipp efficient.
Consider, e.g., data gathered from detector pixels at certain time intervals.
We could represent it as

In [None]:
npix = 100
ntime = 10
data = sc.zeros(dims=['pixel','time'], shape=[npix, ntime])
data

For irregularly spaced detectors this may well be the correct or only choice.
If, however, the pixels form a regular 2-D image sensor, we should probably prefer

In [None]:
nx = 10
ny = npix // nx
data = sc.zeros(dims=['y', 'x', 'time'], shape=[ny, nx, ntime])
data

With this layout we can naturally perform slices, access neighboring pixel rows or columns, or sum over rows or columns.

## Choose dimension order wisely

In principle, the order of dimensions in scipp can be arbitrary since operations transpose automatically based on dimension labels.
In practice, however, a bad choice of dimension order can lead to performance bottlenecks.
This is most obvious when slicing multi-dimensional variables or arrays, where slicing any but the outer dimension yields a slice with gaps between data values, i.e., a very inefficient memory layout.
If an application requires slicing (directly or indirectly, e.g., in `groupby` operations) predominantly for a certain dimension, this dimension should be made the *outermost* dimension.
For example, for a stack of images the best choice would typically be

In [None]:
nimage = 13
images = sc.zeros(dims=['image', 'y', 'x'], shape=[nimage, ny, nx,])
images

Slices such as

In [None]:
images['image', 3]

will then have data for all pixels in a contiguous chunk of memory.
Note that in scipp the first listed dimension in `dims` is always the *outermost* dimension (numpy's default).
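The effect of dimension order on memory layout can be illustrated with a plain numpy array of the same shape. This is a sketch using numpy as a stand-in, on the assumption that the scipp variable uses the same row-major layout; it does not inspect scipp's own buffers.

```python
import numpy as np

# Same shape as the image stack above: (image, y, x), row-major.
stack = np.zeros((13, 10, 10))

# Slicing the outermost dimension gives one contiguous block.
outer = stack[3]
outer_contiguous = bool(outer.flags['C_CONTIGUOUS'])

# Slicing the innermost dimension gives a view with gaps between values.
inner = stack[:, :, 3]
inner_contiguous = bool(inner.flags['C_CONTIGUOUS'])

print(outer_contiguous, inner_contiguous)
print(stack.strides, inner.strides)
```

The strides show the gaps: consecutive values of `inner` are 80 bytes apart rather than 8.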

## Avoid loops

With scipp, just like with numpy or Matlab, loops such as `for`-loops should be avoided.
Loops typically produce many small slices or many small array objects, and the overhead of handling each of them quickly makes code inefficient.
If we encounter the need for a loop in a workflow using scipp, we should take a step back and try to understand how it can be avoided.
Some tips to do this include:

### Use slicing with "shifts"

When access to neighbor slices is required, replace

In [None]:
for i in range(len(images.values)-1):
    images['image', i] -= images['image', i+1]

with

In [None]:
images['image', :-1] -= images['image', 1:]

Note that at this point numpy provides more powerful functions such as [numpy.roll](https://numpy.org/doc/stable/reference/generated/numpy.roll.html).
Scipp's toolset for such purposes is not fully developed yet.

### Seek advice from numpy

There is a huge amount of information available for numpy, e.g., on [stackoverflow](https://stackoverflow.com/questions/tagged/numpy?tab=Votes).
We can profit in two ways from this.
In some cases, the same techniques can be applied to scipp variables or data arrays, since mechanisms such as slicing and basic operations are very similar.
In other cases, e.g., when functionality is not available in scipp yet, we can resort to processing the raw array accessible through the `values` property:

In [None]:
var = sc.Variable(dims=['x'], values=np.arange(10.0))
var.values = np.roll(var.values, 2)
var

The `values` property can also be used as the `out` argument that many numpy functions support:

In [None]:
np.exp(var.values, out=var.values)
var

<div class="alert alert-warning">
    <b>WARNING</b>

When applying numpy functions to the `values` directly we lose handling of units and variances, so this should be used with care.
</div>

### Use helper dimensions or reshaped data

Some operations may be difficult to implement without a loop in a certain data layout.
If this layout cannot be changed globally, we can still change it temporarily for a certain operation.
Even if this requires a copy it may still be faster and more concise than implementing the operation with a loop.
For example, we can sum neighboring elements by temporarily reshaping with a helper dimension using `fold`:

In [None]:
var = sc.Variable(dims=['x'], values=np.arange(10.0))
sc.sum(sc.fold(var, dim='x', sizes={'x': 5, 'neighbors': 2}), 'neighbors')

In the case of only two neighbors, the same could be achieved using slicing with strides. However, scipp does not support strides yet.

Note that `fold` returns a view, i.e., the operation is performed without making a copy of the underlying data buffers.
The companion operation of `fold` is `flatten`, which provides the reverse operation.

## Use in-place operations

Allocating memory or copying data is an expensive process and may even be the dominant factor for overall application performance, apart from loading large amounts of data from disk.
Therefore, it pays off to avoid copies where possible.

Scipp provides two mechanisms for this: in-place arithmetic operators such as `+=`, and `out`-arguments similar to what numpy provides.
Examples:

In [None]:
var = var * 2.0 # makes a copy
var *= 2.0 # in-place (faster)

In [None]:
var = sc.sqrt(var) # makes a copy
var = sc.sqrt(var, out=var) # in-place (faster)

Note that in-place operations cannot be used if a broadcast is required or a dtype change happens, since in-place operations may only change the data contained in a variable.
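Numpy has the same two restrictions, so they can be sketched with plain arrays. This is an analogy only: scipp raises its own exception types, which are not shown here.

```python
import numpy as np

# Broadcast required: the result would have shape (2, 3),
# which does not fit into the existing buffer of shape (3,).
target = np.zeros(3)
try:
    target += np.ones((2, 3))
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# dtype change required: the float result cannot be stored
# in an integer array.
counts = np.arange(3)
try:
    counts += 0.5
    cast_failed = False
except TypeError:
    cast_failed = True

print(broadcast_failed, cast_failed)
```

In both cases the out-of-place form, e.g. `counts = counts + 0.5`, works because it allocates a new array with the required shape and dtype.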