Migrating to Scipp 23.01.0#
Overview#
Starting with Scipp 23.01.0, a VariancesError is raised when an operand with variances (statistical uncertainties) is implicitly or explicitly broadcast. This is a necessary bugfix: since Scipp cannot track correlations, it would otherwise silently underestimate uncertainties.
What is affected?#
Any operation that broadcasts an operand with variances is affected. This can happen for dense data as well as for binned data. We give two examples:
[1]:
import scipp as sc
var = sc.ones(dims=['x', 'y'], shape=(4, 3))
norm = sc.array(
dims=['y'],
values=[0.1, 0.2, 0.3],
variances=[0.1, 0.2, 0.3],
)
An operation between var and norm will raise VariancesError since norm is broadcast into dimension 'x':
[2]:
var / norm
---------------------------------------------------------------------------
VariancesError Traceback (most recent call last)
Cell In[2], line 1
----> 1 var / norm
VariancesError: Cannot broadcast object with variances as this would introduce unhandled correlations. Input dimensions were:
(x: 4, y: 3) variances=False
(y: 3) variances=True
See https://doi.org/10.3233/JNR-220049 for more background.
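The correlation that Scipp refuses to ignore silently can be illustrated with a short Monte-Carlo sketch (plain NumPy, illustrative numbers only, not Scipp API): two values divided by the same uncertain norm become correlated, even if the inputs are independent.

```python
import numpy as np

# Monte-Carlo illustration: dividing two independent values by the
# SAME noisy norm (i.e. broadcasting the norm) correlates the results.
rng = np.random.default_rng(seed=1)
n = 100_000
a1 = rng.normal(1.0, 0.1, size=n)
a2 = rng.normal(1.0, 0.1, size=n)
norm = rng.normal(0.5, 0.1, size=n)  # broadcast term with variance

y1 = a1 / norm
y2 = a2 / norm
corr = np.corrcoef(y1, y2)[0, 1]  # clearly nonzero
```

Ignoring this correlation, e.g. when later summing y1 and y2, underestimates the variance of the sum; this is exactly the silent error the new behavior prevents.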
[3]:
binned = sc.data.table_xyz(100).bin(y=3)
An operation between binned and norm will raise VariancesError since norm is broadcast to all bin elements:
[4]:
binned / norm
---------------------------------------------------------------------------
VariancesError Traceback (most recent call last)
Cell In[4], line 1
----> 1 binned / norm
VariancesError: Cannot broadcast object with variances as this would introduce unhandled correlations. Input dimensions were:
(y: 3) variances=False
(y: 3) variances=True
See https://doi.org/10.3233/JNR-220049 for more background.
Working around the new behavior#
Scipp will not be able to support an automatic approach for handling these cases correctly, since this would be either very costly or very problem-specific. An easy but incorrect workaround (equivalent to the old behavior in almost all cases) is to set the variances of the broadcast term to None:
[5]:
import logging
stream_handler = logging.StreamHandler()
logger = sc.get_logger()
logger.addHandler(stream_handler)
[6]:
logger.warning(
"""ATTENTION:
Statistical uncertainties of the normalization term were IGNORED.
This is probably incorrect."""
)
norm.variances = None
var / norm
binned / norm
ATTENTION:
Statistical uncertainties of the normalization term were IGNORED.
This is probably incorrect.
[6]:
(Output: the result of binned / norm, a binned data array with dims (y: 3), bin-edge coordinate y ≈ [0.007, 0.335, 0.663, 0.991] m, and bins of 41, 25, and 34 events.)
We repeat that this is incorrect. We strongly recommend issuing an explicit warning to the user, as shown in the example.
Fixing the problem#
For many applications there is no simple solution, as it may involve an intractably large correlation matrix. See our publication (preprint submitted to the Journal of Neutron Research) for solutions in certain cases.
We recommend deriving the actual expression for the uncertainties of the entire calculation, using a first-order truncated Taylor expansion. In a number of our own applications the final expression turned out to be computable rather cheaply, without dealing with the intractably large intermediate correlation matrices.
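As a minimal sketch of such a derivation (plain NumPy with hypothetical arrays a and b, not Scipp API), the first-order Taylor expansion of f = a / b with independent inputs gives var(f) ≈ (∂f/∂a)² var(a) + (∂f/∂b)² var(b), which can be evaluated directly:

```python
import numpy as np

# Hypothetical inputs with independent uncertainties.
a = np.array([1.0, 2.0, 3.0])
var_a = np.array([0.1, 0.1, 0.1])
b = np.array([0.1, 0.2, 0.3])
var_b = np.array([0.1, 0.2, 0.3])

# First-order (truncated Taylor) propagation for f = a / b:
#   var(f) ≈ (1/b)**2 * var(a) + (a/b**2)**2 * var(b)
f = a / b
var_f = (1.0 / b) ** 2 * var_a + (a / b**2) ** 2 * var_b
```

For a full workflow the same procedure applies: write down the end-to-end expression, differentiate with respect to each uncertain input, and collect the variance terms, including cross terms for any inputs that are shared (broadcast) between outputs.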
If this is not possible, approaches such as bootstrapping may be used.
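A minimal bootstrap sketch (plain NumPy, illustrative numbers only): resample the raw observations with replacement, repeat the full calculation for each resample, and take the spread of the resulting estimates as the uncertainty.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=10.0, scale=2.0, size=1000)  # raw observations
norm = 0.5  # normalization term (uncertainty handled by resampling)

# Repeat the full calculation (here: mean divided by norm) on each
# bootstrap resample of the raw data.
n_boot = 500
estimates = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)
    estimates[i] = resample.mean() / norm

result = data.mean() / norm
uncertainty = estimates.std()  # bootstrap estimate of the standard error
```

Because each bootstrap replicate reruns the entire calculation, correlations introduced by broadcasting are captured automatically, at the cost of repeating the computation many times.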