scipp.curve_fit

scipp.curve_fit(coords, f, da, *, p0=None, bounds=None, reduce_dims=(), unsafe_numpy_f=False, **kwargs)

Use non-linear least squares to fit a function, f, to data. The function interface is similar to that of xarray.DataArray.curvefit().

New in version 23.12.0.

This is a wrapper around scipy.optimize.curve_fit(). See there for in-depth documentation and keyword arguments. The differences are:

  • Instead of separate xdata, ydata, and sigma arguments, the input data array defines these: xdata is given by the coords of the data array, ydata by da.data, and sigma by the square roots of the variances (i.e., the standard deviations), if present.

  • The fit function f must work with scipp objects. This provides additional safety over the underlying scipy function by ensuring units are consistent.

  • The initial guess in p0 must be provided as a dict, mapping from fit-function parameter names to initial guesses.

  • The parameter bounds must also be provided as a dict, like p0.

  • The fit parameters may be scalar scipp variables. In that case, an initial guess with the correct unit must be provided in p0.

  • The returned optimal parameter values popt and the covariance matrix pcov will have units, provided that the initial parameters have units. popt is a DataGroup and pcov a DataGroup of DataGroups, both indexed by the fit-parameter names, as sketched below. The variances of the parameter values in popt are set to the corresponding diagonal values of the covariance matrix.

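For example, both returned objects can be indexed by parameter name. A minimal sketch (the names model, x, and da are illustrative; fitted values are omitted since they depend on the data):

>>> import scipp as sc
>>> def model(x, a, b):
...     return a * sc.exp(-b * x)
>>> x = sc.linspace(dim='x', start=0.0, stop=1.0, num=20, unit='s')
>>> da = sc.DataArray(model(x, a=2.0, b=3.0 / sc.Unit('s')), coords={'x': x})
>>> popt, pcov = sc.curve_fit(['x'], model, da, p0={'b': 1.0 / sc.Unit('s')})
>>> a = popt['a']  # DataArray; its variance is the diagonal entry pcov['a']['a']
>>> cov_ab = pcov['a']['b']  # covariance of the parameters a and b
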
Parameters:
  • coords (Union[Sequence[str], Mapping[str, Union[str, Variable]]]) – The coords that act as predictor variables in the fit. If a mapping, the keys signify names of arguments to f and the values signify coordinate names in da.coords. If a sequence, the names of the arguments to f and the names of the coords are taken to be the same. To use a fit coordinate not present in da.coords, pass it as a Variable. See the sketch of the mapping form at the end of the Examples section.

  • f (Callable) – The model function, f(x, y..., a, b...). It must take all coordinates listed in coords as arguments; otherwise a ValueError is raised. All other arguments are treated as parameters of the fit.

  • da (DataArray) – The values of the data array provide the dependent data. If the data array stores variances then the standard deviations (square root of the variances) are taken into account when fitting.

  • p0 (Optional[Dict[str, Union[Variable, Real]]], default: None) – An optional dict of initial guesses for the parameters. If None, the initial values are all 1 (dimensionless). If the fit function cannot handle initial values of 1, in particular for parameters that are not dimensionless, then typically a scipp.UnitError is raised, but the details depend on the function.

  • bounds (Optional[Dict[str, Union[Tuple[Variable, Variable], Tuple[Real, Real]]]], default: None) – Lower and upper bounds on parameters. Defaults to no bounds. Bounds are given as a dict of 2-tuples (lower, upper), one per parameter, where lower and upper are either both Variables or both plain numbers. Parameters omitted from the bounds dict are unbounded. See the sketch at the end of the Examples section.

  • reduce_dims (Sequence[str], default: ()) – Additional dimensions to aggregate while fitting. If a dimension is neither in reduce_dims nor in the dimensions of the coords used in the fit, then the values of the optimal parameters will depend on that dimension: one fit is performed for every slice along it, and the data arrays in the output will have that dimension in their dims. If a dimension is passed in reduce_dims, all data along that dimension is instead aggregated into a single fit and the dimension will not be present in the output.

  • unsafe_numpy_f (bool, default: False) – By default the provided fit function f is assumed to take scipp Variables as input and to use scipp operations to produce a scipp Variable as output. This has the safety advantage of unit checking. However, in some cases it may be advantageous to implement f using NumPy operations for performance reasons, in particular when many small fits involving relatively few data points are performed, where the pybind11 overhead of scipp operations can be considerable. If unsafe_numpy_f is set to True, the arguments passed to f will be NumPy arrays instead of scipp Variables and the output of f is expected to be a NumPy array. See the sketch at the end of the Examples section.

Returns:

  • popt – Optimal values for the parameters.

  • pcov – The estimated covariance of popt.

See also

scipp.scipy.optimize.curve_fit

Similar functionality for 1D fits.

Examples

A 1D example

>>> def round(a, d):
...     'Helper for the doctests'
...     return sc.round(10**d * a) / 10**d
>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> rng = np.random.default_rng(1234)
>>> x = sc.linspace(dim='x', start=0.0, stop=0.4, num=50, unit='m')
>>> y = func(x, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=50)
>>> da = sc.DataArray(y, coords={'x': x})
>>> from scipp import curve_fit
>>> popt, _ = curve_fit(['x'], func, da, p0 = {'b': 1.0 / sc.Unit('m')})
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 4)
(<scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  4.999
 ,
 <scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  0.0077
 )

A 2D example where two coordinates participate in the fit

>>> def func(x, z, a, b):
...     return a * z * sc.exp(-b * x)
>>> x = sc.linspace(dim='x', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='z', start=0.0, stop=1, num=10)
>>> y = func(x, z, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(['x', 'z'], func, da, p0 = {'b': 1.0 / sc.Unit('m')})
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 3)
(<scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  5.004
 ,
 <scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  0.004
 )

A 2D example where only one coordinate participates in the fit and we map over the dimension of the other coordinate. Note that the value of one of the parameters is z-dependent and that the output has a z-dimension.

>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> x = sc.linspace(dim='xx', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='zz', start=0.0, stop=1, num=10)
>>> # Note that parameter a is z-dependent.
>>> y = func(x, a=z, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(
...    ['x'], func, da,
...     p0 = {'b': 1.0 / sc.Unit('m')})
>>> # Note that approximately a = z
>>> round(sc.values(popt['a']), 2),
(<scipp.DataArray>
 Dimensions: Sizes[zz:10, ]
 Coordinates:
 * z      float64  [dimensionless]  (zz)  [0, 0.111111, ..., 0.888889, 1]
 Data:
          float64  [dimensionless]  (zz)  [-0.01, 0.11, ..., 0.89, 1.01]
 ,)

Lastly, a 2D example where only one coordinate participates in the fit and the other coordinate is reduced.

>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> x = sc.linspace(dim='xx', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='zz', start=0.0, stop=1, num=10)
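>>> # Broadcast to 2D by adding zeros along z; the data do not actually depend on z.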
>>> y = z * 0 + func(x, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(
...    ['x'], func, da,
...    p0 = {'b': 1.0 / sc.Unit('m')}, reduce_dims=['zz'])
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 4)
(<scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  5
 ,
 <scipp.DataArray>
 Dimensions: Sizes[]
 Data:
                       float64  [dimensionless]  ()  0.0021
 )

Note that the variance is about 10x lower in this example than in the first 1D example. That is because this fit uses 50x10 points while the first example used only 50: the variance of a fitted parameter scales roughly inversely with the number of points, so 10x more points reduce the standard deviation by about sqrt(10) ≈ 3 (0.0021 vs. 0.0077).
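
As noted under the coords parameter, coords may also be given as a mapping when the argument names of the fit function differ from the coordinate names in da.coords. A minimal sketch with illustrative names (t as the function argument, time as the coordinate):

>>> def func(t, a, b):
...     return a * sc.exp(-b * t)
>>> time = sc.linspace(dim='time', start=0.0, stop=0.4, num=50, unit='s')
>>> y = func(time, a=5, b=17/sc.Unit('s'))
>>> y.values += 0.01 * rng.normal(size=50)
>>> da = sc.DataArray(y, coords={'time': time})
>>> popt, _ = curve_fit({'t': 'time'}, func, da, p0 = {'b': 1.0 / sc.Unit('s')})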
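
Bounds, described under the bounds parameter, are passed as a dict of (lower, upper) tuples; parameters omitted from the dict remain unbounded. Continuing from the previous sketch (the chosen limits are illustrative):

>>> popt, _ = curve_fit(
...     {'t': 'time'}, func, da,
...     p0 = {'b': 1.0 / sc.Unit('s')},
...     bounds={'b': (0.0 / sc.Unit('s'), 50.0 / sc.Unit('s'))})  # illustrative bounds on b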
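
Lastly, a sketch of unsafe_numpy_f, described under the parameter of the same name: the fit function then receives and returns plain NumPy arrays, so no unit checking takes place inside it. The dimensionless setup and the initial guesses here are illustrative:

>>> def func_np(t, a, b):
...     # Works on plain NumPy arrays; units are not checked here.
...     return a * np.exp(-b * t)
>>> t = sc.linspace(dim='t', start=0.0, stop=0.4, num=50)
>>> y = sc.array(dims=['t'], values=func_np(t.values, 5.0, 17.0))
>>> y.values += 0.01 * rng.normal(size=50)
>>> da = sc.DataArray(y, coords={'t': t})
>>> popt, _ = curve_fit(
...     ['t'], func_np, da,
...     p0 = {'a': 4.0, 'b': 10.0},
...     unsafe_numpy_f=True)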