scipp.curve_fit#
- scipp.curve_fit(coords, f, da, *, p0=None, bounds=None, reduce_dims=(), unsafe_numpy_f=False, **kwargs)#
Use non-linear least squares to fit a function, f, to data. The function interface is similar to that of xarray.DataArray.curvefit().

Added in version 23.12.0.
This is a wrapper around scipy.optimize.curve_fit(). See there for in-depth documentation and keyword arguments. The differences are:
- Instead of separate xdata, ydata, and sigma arguments, the input data array defines these: xdata is given by the coords on the data array, ydata by da.data, and sigma is defined as the square root of the variances, if present, i.e., the standard deviations.
- The fit function f must work with scipp objects. This provides additional safety over the underlying scipy function by ensuring units are consistent.
- The initial guess in p0 must be provided as a dict, mapping from fit-function parameter names to initial guesses.
- The parameter bounds must also be provided as a dict, like p0.
- If the fit parameters are not dimensionless, the initial guess must be a scipp Variable with the correct unit.
- If the fit parameters are not dimensionless, the bounds must be Variables with the correct unit (see the sketch after this list).
- The bounds and initial guesses may be scalars or arrays, to allow the initial guesses or bounds to vary in different regions. If they are arrays, they will be broadcast to the shape of the output.
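As a sketch of how this might look for a model with a dimensionless amplitude a and a decay rate b carrying unit 1/m (the parameter names and values are illustrative only, and func and da are assumed to be defined as in the 1D example below):

>>> import scipp as sc
>>> from scipp import curve_fit
>>> # Dimensionless parameters may be plain numbers; parameters with a unit
>>> # must be scipp Variables carrying that unit.
>>> p0 = {'a': 4.0, 'b': sc.scalar(10.0, unit='1/m')}
>>> # Bounds are (lower, upper) 2-tuples; parameters omitted here stay unbounded.
>>> bounds = {'b': (sc.scalar(0.0, unit='1/m'), sc.scalar(100.0, unit='1/m'))}
>>> popt, pcov = curve_fit(['x'], func, da, p0=p0, bounds=bounds)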
The returned optimal parameter values popt and the covariance matrix pcov will have units if the initial guess has units. popt and pcov are a DataGroup and a DataGroup of DataGroup, respectively, indexed by the fit parameter names. The variances of the parameter values in popt are set to the corresponding diagonal values of the covariance matrix.
- Parameters:
  - coords (Sequence[str] | Mapping[str, str | Variable]) – The coords that act as predictor variables in the fit. If a mapping, the keys signify names of arguments to f and the values signify coordinate names in da.coords. If a sequence, the names of the arguments to f and the names of the coords are taken to be the same. To use a fit coordinate not present in da.coords, pass it as a Variable.
  - f (Callable) – The model function, f(x, y..., a, b...). It must take all coordinates listed in coords as arguments, otherwise a ValueError will be raised; all other arguments are treated as parameters of the fit.
  - da (DataArray) – The values of the data array provide the dependent data. If the data array stores variances, the standard deviations (square root of the variances) are taken into account when fitting.
  - p0 (Optional[dict[str, Variable | Real]], default: None) – An optional dict of initial guesses for the parameters. If None, the initial values will all be dimensionless 1. If the fit function cannot handle initial values of 1, in particular for parameters that are not dimensionless, then typically a scipp.UnitError is raised, but the details depend on the function.
  - bounds (Optional[dict[str, tuple[Variable, Variable] | tuple[Real, Real]]], default: None) – Lower and upper bounds on parameters. Defaults to no bounds. Bounds are given as a dict of 2-tuples of (lower, upper) for each parameter, where lower and upper are either both Variables or both plain numbers. Parameters omitted from the bounds dict are unbounded.
  - reduce_dims (Sequence[str], default: ()) – Additional dimensions to aggregate while fitting. If a dimension is neither in reduce_dims nor in the dimensions of the coords used in the fit, then the values of the optimal parameters will depend on that dimension: one fit is performed for every slice, and the data arrays in the output will have the dimension in their dims. If a dimension is passed to reduce_dims, all data in that dimension is instead aggregated in a single fit and the dimension will not be present in the output.
  - unsafe_numpy_f (bool, default: False) – By default the provided fit function f is assumed to take scipp Variables as input and use scipp operations to produce a scipp Variable as output. This has the safety advantage of unit checking. However, in some cases it might be advantageous to implement f using NumPy operations for performance reasons. This is particularly the case if the curve fit will make many small fits involving relatively few data points each; in that case the pybind overhead of scipp operations might be considerable. If unsafe_numpy_f is set to True, the arguments passed to f will be NumPy arrays instead of scipp Variables, and the output of f is expected to be a NumPy array (see the sketch after this list).
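As an illustration of unsafe_numpy_f, here is a sketch (not a definitive recipe) reusing the exponential model and the data array da from the 1D example below, with output omitted:

>>> import numpy as np
>>> def func_np(x, a, b):
...     # With unsafe_numpy_f=True, x, a and b arrive as plain NumPy arrays/scalars,
...     # so no unit checking takes place inside the model function.
...     return a * np.exp(-b * x)
>>> popt, _ = curve_fit(['x'], func_np, da, p0={'b': 1.0 / sc.Unit('m')},
...                     unsafe_numpy_f=True)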
- Returns:
  - popt – Optimal values for the parameters.
  - pcov – The estimated covariance of popt.
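Both are indexed by the fit parameter names, so individual results can be picked out as in this sketch (parameter names taken from the examples below, output omitted):

>>> popt, pcov = curve_fit(['x'], func, da, p0={'b': 1.0 / sc.Unit('m')})
>>> popt['a']        # optimal value of 'a'; its variance is the diagonal entry of pcov
>>> pcov['a']['b']   # estimated covariance between parameters 'a' and 'b'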
See also
scipp.scipy.optimize.curve_fit
Similar functionality for 1D fits.
Examples
A 1D example
>>> def round(a, d):
...     'Helper for the doctests'
...     return sc.round(10**d * a) / 10**d
>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> rng = np.random.default_rng(1234)
>>> x = sc.linspace(dim='x', start=0.0, stop=0.4, num=50, unit='m')
>>> y = func(x, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=50)
>>> da = sc.DataArray(y, coords={'x': x})
>>> from scipp import curve_fit
>>> popt, _ = curve_fit(['x'], func, da, p0 = {'b': 1.0 / sc.Unit('m')})
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 4)
(<scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  4.999
, <scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  0.0077
)
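The fit coordinates can also be given as a mapping when the argument names of the fit function differ from the coordinate names of the data array, as described under coords above. A sketch (output omitted), reusing func from the example above but with a time coordinate in seconds:

>>> t = sc.linspace(dim='t', start=0.0, stop=0.4, num=50, unit='s')
>>> da_t = sc.DataArray(func(t, a=5, b=17 / sc.Unit('s')), coords={'time': t})
>>> # Keys are argument names of func, values are coordinate names in da_t.coords.
>>> popt, _ = curve_fit({'x': 'time'}, func, da_t, p0={'b': 1.0 / sc.Unit('s')})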
A 2D example where two coordinates participate in the fit
>>> def func(x, z, a, b):
...     return a * z * sc.exp(-b * x)
>>> x = sc.linspace(dim='x', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='z', start=0.0, stop=1, num=10)
>>> y = func(x, z, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(['x', 'z'], func, da, p0 = {'b': 1.0 / sc.Unit('m')})
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 3)
(<scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  5.004
, <scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  0.004
)
A 2D example where only one coordinate participates in the fit and we map over the dimension of the other coordinate. Note that the value of one of the parameters is z-dependent and that the output has a z-dimension.
>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> x = sc.linspace(dim='xx', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='zz', start=0.0, stop=1, num=10)
>>> # Note that parameter a is z-dependent.
>>> y = func(x, a=z, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(
...     ['x'], func, da,
...     p0 = {'b': 1.0 / sc.Unit('m')})
>>> # Note that approximately a = z
>>> round(sc.values(popt['a']), 2),
(<scipp.DataArray>
Dimensions: Sizes[zz:10, ]
Coordinates:
* z    float64  [dimensionless]  (zz)  [0, 0.111111, ..., 0.888889, 1]
Data:
  float64  [dimensionless]  (zz)  [-0.01, 0.11, ..., 0.89, 1.01]
,)
Lastly, a 2D example where only one coordinate participates in the fit and the other coordinate is reduced.
>>> def func(x, a, b):
...     return a * sc.exp(-b * x)
>>> x = sc.linspace(dim='xx', start=0.0, stop=0.4, num=50, unit='m')
>>> z = sc.linspace(dim='zz', start=0.0, stop=1, num=10)
>>> y = z * 0 + func(x, a=5, b=17/sc.Unit('m'))
>>> y.values += 0.01 * rng.normal(size=500).reshape(10, 50)
>>> da = sc.DataArray(y, coords={'x': x, 'z': z})
>>> popt, _ = curve_fit(
...     ['x'], func, da,
...     p0 = {'b': 1.0 / sc.Unit('m')}, reduce_dims=['zz'])
>>> round(sc.values(popt['a']), 3), round(sc.stddevs(popt['a']), 4)
(<scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  5
, <scipp.DataArray>
Dimensions: Sizes[]
Data:
  float64  [dimensionless]  ()  0.0021
)
Note that the variance is about 10x lower in this example than in the first 1D example. That is because this fit uses 50x10 points, while the first example used only 50.