Migrating to Scipp 23.01.0#

Overview#

Starting with Scipp 23.01.0, a VariancesError is raised when an operand with variances (statistical uncertainties) is broadcast, implicitly or explicitly. This is a necessary bugfix: Scipp cannot track correlations and would otherwise silently underestimate uncertainties.

What is affected?#

Any operation that broadcasts an operand with variances is affected. This can happen for dense data as well as for binned data. We give two examples:

[1]:
import scipp as sc

var = sc.ones(dims=['x', 'y'], shape=(4, 3))
norm = sc.array(
    dims=['y'],
    values=[0.1, 0.2, 0.3],
    variances=[0.1, 0.2, 0.3],
)

An operation between var and norm will raise VariancesError since norm is broadcast into dimension 'x':

[2]:
var / norm
---------------------------------------------------------------------------
VariancesError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 var / norm

VariancesError: Cannot broadcast object with variances as this would introduce unhandled correlations. Input dimensions were:
(x: 4, y: 3) variances=False
(y: 3) variances=True

See https://doi.org/10.3233/JNR-220049 for more background.
[3]:
binned = sc.data.table_xyz(100).bin(y=3)

An operation between binned and norm will raise VariancesError since norm is broadcast to all bin elements:

[4]:
binned / norm
---------------------------------------------------------------------------
VariancesError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 binned / norm

VariancesError: Cannot broadcast object with variances as this would introduce unhandled correlations. Input dimensions were:
(y: 3) variances=False
(y: 3) variances=True

See https://doi.org/10.3233/JNR-220049 for more background.

Working around the new behavior#

Scipp cannot support an automatic approach for handling these cases correctly, since this would be either very costly or very problem-specific. An easy but incorrect workaround (equivalent to the old behavior in almost all cases) is to set the variances of the broadcast operand to None:

[5]:
import logging

stream_handler = logging.StreamHandler()
logger = sc.get_logger()
logger.addHandler(stream_handler)
[6]:
logger.warning(
    """ATTENTION:
    Statistical uncertainties of the normalization term were IGNORED.
    This is probably incorrect."""
)
norm.variances = None
var / norm
binned / norm
ATTENTION:
    Statistical uncertainties of the normalization term were IGNORED.
    This is probably incorrect.
[6]:
scipp.DataArray (5.82 KB)
Coordinates:
    y [bin-edge] (y)  float64  [m]  [0.007, 0.335, 0.663, 0.991]
Data:
    (y)  DataArrayView  binned data [len=41, len=25, len=34]
         bin content: DataArray(dims=(row: 100), data=float64[K],
                                coords={'x': float64[m], 'y': float64[m], 'z': float64[m]})

We repeat that this is incorrect. We strongly recommend issuing an explicit warning to the user, as shown in the example.
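If you prefer not to modify norm in place, sc.values (which returns the input without its variances) can be used to drop the variances for a single operation only. This is a minimal sketch of that alternative; it is just as statistically incorrect as the approach above, and the same warning applies:

# Same effect as setting norm.variances = None, but norm itself keeps its variances:
result = var / sc.values(norm)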

Fixing the problem#

For many applications there is no simple solution, as a correct treatment may involve an intractably large correlation matrix. See our publication (preprint submitted to the Journal of Neutron Research) for solutions in certain cases.

We recommend deriving the actual expression for the uncertainties of the entire calculation, using a first-order truncated Taylor expansion. In a number of our own applications it turned out that the final expression can be computed rather cheaply, without dealing with the intractably large intermediate correlation matrices.
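As a minimal sketch of this approach, consider normalizing summed counts by a single term with a variance. The numbers below are made up for illustration and are not part of the example above. Dividing after the reduction avoids any broadcast of the normalization term, so its uncertainty enters the first-order expression only once:

import scipp as sc

# Hypothetical data: counts along 'x', normalized by a single scalar with a variance.
data = sc.array(
    dims=['x'], values=[10.0, 20.0, 30.0], variances=[10.0, 20.0, 30.0], unit='counts'
)
norm = sc.scalar(2.0, variance=0.04)

# First-order propagation for sum(data) / norm: reduce first, divide after.
# No broadcast of `norm` is needed, so no hidden correlation is introduced.
result = data.sum('x') / norm
print(result.value, result.variance)

# Broadcasting `norm` over 'x' and summing afterwards would treat its uncertainty as
# independent for every element and underestimate the variance of the result. This is
# exactly the situation the new VariancesError guards against.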

If this is not possible, approaches such as bootstrapping may be used.
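As an illustration of such an approach, the following sketch uses a parametric Monte-Carlo resampling of the normalization term (a close relative of bootstrapping). It re-creates arrays with the same structure as the example above, but the normalization uncertainties, the sample count, and the use of NumPy's random generator are our own choices for this sketch:

import numpy as np
import scipp as sc

rng = np.random.default_rng(seed=0)
n_samples = 1000

var = sc.ones(dims=['x', 'y'], shape=(4, 3))
# Made-up normalization term with a few per cent relative uncertainty.
norm = sc.array(dims=['y'], values=[0.1, 0.2, 0.3], variances=[1e-4, 4e-4, 9e-4])

# Draw normalization terms from a normal distribution defined by the stored values
# and variances, repeat the full calculation for every draw, and estimate the
# uncertainty of the final result from the spread of the outcomes.
results = []
for _ in range(n_samples):
    sample = sc.array(
        dims=norm.dims,
        values=rng.normal(norm.values, np.sqrt(norm.variances)),
        unit=norm.unit,
    )
    results.append((var / sample).sum().value)

print(np.mean(results), np.var(results))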