Fitting models to numerical data is a common task for scientists and engineers. This tutorial will show you how you can get this done with the SciPy scipy.optimize.curve_fit method.
Data Fitting With The SciPy scipy.optimize.curve_fit Method
scipy.optimize.curve_fit()
The SciPy library comes with a function for curve fitting with nonlinear least squares: curve_fit(). Under the hood, it uses scipy.optimize.leastsq() for the ‘lm’ method and scipy.optimize.least_squares() for the other methods to find the optimal parameters for your model function.
Syntax:
scipy.optimize.curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, *, full_output, **kwargs)
Parameters:
- f: the model function used for the curve fitting. Its first argument must be the independent variable, while the remaining arguments are the parameters the curve_fit() method has to fit.
- xdata: an object that stores the data of the independent variable.
- ydata: an array-like object that stores the dependent data.
- p0: an optional array-like object that contains the initial guesses for the parameters. If it is set to None, the initial values of all parameters will be 1.
- sigma: an optional M-length sequence or MxM array that specifies the uncertainty in ydata. The default value is None, which is equivalent to a 1D sigma filled with ones.
- absolute_sigma: an optional boolean value that determines whether the sigma parameter above should be used in an absolute sense. The default value is False, which means the method only takes into account the relative magnitudes of the sigma values.
- check_finite: an optional boolean value that determines whether the method should check for infs and nans in the input arrays. If True, which is the default, the check is performed and a ValueError is raised if the arrays contain infs or nans. You may set it to False to skip the check, but doing so may produce nonsensical results when your input arrays contain nans.
- bounds: an optional 2-tuple (lower, upper) that sets the lower and upper bounds of the parameters. By default, there are no bounds. If an element of the tuple is a scalar, that bound applies to all parameters. To set bounds for each individual parameter, use a tuple of two arrays whose lengths equal the number of parameters.
- method: an optional string parameter that selects the optimization algorithm: ‘lm’, ‘trf’, or ‘dogbox’. The default is ‘trf’ if you provide the method with bounds and ‘lm’ for problems with no bounds.
- full_output: an optional boolean parameter that controls whether the method should return additional information. If True, it also returns infodict, mesg, and ier after the two usual arrays.
By default, the curve_fit() method returns two arrays, popt and pcov:
- popt: this array stores the optimal values of all parameters, i.e. the values that minimize the sum of the squared residuals f(xdata, *popt) - ydata.
- pcov: this two-dimensional array contains the estimated covariance of popt.
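As a rough illustration of the optional parameters described above, here is a minimal sketch that passes p0, bounds, sigma, and absolute_sigma in one call. The decay() model, the synthetic data, and every numeric value are assumptions made up for this example; they do not come from the data files used later in this tutorial.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model for illustration: exponential decay plus an offset.
def decay(x, amplitude, rate, offset):
    return amplitude * np.exp(-rate * x) + offset

# Synthetic data (assumed values) with known Gaussian noise.
rng = np.random.default_rng(0)
xdata = np.linspace(0, 4, 50)
noise_sd = 0.05
ydata = decay(xdata, 2.5, 1.3, 0.5) + rng.normal(0, noise_sd, xdata.size)

popt, pcov = curve_fit(
    decay, xdata, ydata,
    p0=[2.0, 1.0, 0.0],                            # initial guesses
    bounds=([0.0, 0.0, -1.0], [10.0, 5.0, 1.0]),   # (lower, upper) per parameter
    sigma=np.full(xdata.size, noise_sd),           # uncertainty of each y value
    absolute_sigma=True,                           # treat sigma as absolute errors
)
print(popt)
print(pcov.shape)
```

Because bounds are supplied and no method is given, the call falls back to the ‘trf’ default. Per-parameter bounds are passed as two sequences, lower values first.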
Examples
Suppose you have two text files containing data of x and y values of a linear function: y = a*x + b.
First, you will have to load data from those files into two NumPy arrays:
import numpy as np

x = np.loadtxt('x.out')
y = np.loadtxt('y.out')
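If you do not have such files at hand, the sketch below shows one way to generate comparable test data and read it back. The slope of 3, the intercept of 4, the noise level, and the use of a temporary directory are all assumptions for illustration, not the data behind the results shown in this tutorial.

```python
import os
import tempfile

import numpy as np

# Assumed "true" line for the synthetic data: y = 3*x + 4 plus Gaussian noise.
rng = np.random.default_rng(42)
x_true = np.linspace(0, 10, 50)
y_true = 3.0 * x_true + 4.0 + rng.normal(0.0, 1.0, x_true.size)

# Write to a temporary directory so that no existing files are overwritten.
with tempfile.TemporaryDirectory() as tmpdir:
    x_path = os.path.join(tmpdir, 'x.out')
    y_path = os.path.join(tmpdir, 'y.out')
    np.savetxt(x_path, x_true)   # one value per line, plain text
    np.savetxt(y_path, y_true)
    x = np.loadtxt(x_path)       # read the values back into NumPy arrays
    y = np.loadtxt(y_path)

print(x.shape, y.shape)
```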
You can check the data by drawing a matplotlib plot:
import matplotlib.pyplot as plt

plt.plot(x, y, '.')
plt.show()
Note: to learn how to make more complicated plots like histograms, check out this guide.
Your job is now to find the slope and intercept of the line that best fits the x and y data. The following code can help you with it.
You will need to define a function that represents your model first, then pass it to the curve_fit() method to find the best-fit parameters.
from scipy.optimize import curve_fit

def ourFunc(x, a, b):
    # Linear model: a is the slope, b is the intercept
    return a * x + b

popt, pcov = curve_fit(ourFunc, x, y)
The best-fit parameters of your model function are stored in the first returned array – popt:
>>> popt
array([2.93296399, 4.06325395])
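Meanwhile, the second array (pcov) is the estimated covariance matrix of the parameters. Its diagonal holds the variance of each parameter, so the square roots of the diagonal give one-standard-deviation uncertainties. The off-diagonal elements describe how the parameter estimates vary together.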
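To make the role of pcov more concrete, the following sketch derives parameter uncertainties and a correlation matrix from it. It reuses the linear model from above, but the data is synthetic (an assumed slope of 3, an assumed intercept of 4, and unit noise), and the names perr and corr are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def ourFunc(x, a, b):
    return a * x + b

# Synthetic data (assumed values), standing in for the tutorial's files.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = ourFunc(x, 3.0, 4.0) + rng.normal(0.0, 1.0, x.size)

popt, pcov = curve_fit(ourFunc, x, y)

# One-standard-deviation uncertainty of each parameter.
perr = np.sqrt(np.diag(pcov))

# Normalizing the covariance by the uncertainties gives correlations.
corr = pcov / np.outer(perr, perr)

print("parameters:   ", popt)
print("uncertainties:", perr)
print("correlation between a and b:", corr[0, 1])
```

For a straight line fitted over positive x values, the slope and intercept estimates are negatively correlated: a steeper line needs a lower intercept to pass through the same cloud of points.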
You can use the popt array to draw a plot of the best-fit function:
a, b = popt
plt.scatter(x, y)
x_line = np.arange(min(x), max(x), 1)
y_line = ourFunc(x_line, a, b)
plt.plot(x_line, y_line, '--', color='red')
plt.show()
The next example fits a Gaussian function of the form y = A*exp(-B*x^2) in the same way.
def ourFunc(x, A, B):
    y = A*np.exp(-1*B*x**2)
    return y
popt, pcov = curve_fit(ourFunc, x, y)
a, b = popt
plt.scatter(x, y)
x_line = np.arange(min(x), max(x), 1)
y_line = ourFunc(x_line, a, b)
plt.plot(x_line, y_line, '--', color='red')
plt.show()
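The Gaussian example above relies on x and y arrays that are not shown. A self-contained variant might look like the sketch below, where the data is synthetic (A = 5 and B = 0.5 are assumed values) and the function is named gaussian here for clarity. Two choices differ from the code above and are suggestions rather than requirements: an explicit p0 taken from the peak of the data, and np.linspace for a denser, smoother fitted curve.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, B):
    return A * np.exp(-B * x**2)

# Synthetic data (assumed values): A = 5, B = 0.5, plus a little noise.
rng = np.random.default_rng(2)
x = np.linspace(-4, 4, 80)
y = gaussian(x, 5.0, 0.5) + rng.normal(0.0, 0.1, x.size)

# A rough starting point: the peak height for A, and 1 for B.
popt, pcov = curve_fit(gaussian, x, y, p0=[y.max(), 1.0])

# Evaluate the fitted curve on a dense grid for plotting.
x_line = np.linspace(x.min(), x.max(), 200)
y_line = gaussian(x_line, *popt)

print(popt)
```

The x_line and y_line arrays can then be passed to plt.plot() exactly as in the earlier examples.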
Summary
With the SciPy scipy.optimize.curve_fit method, you can find the best-fit parameters of a model function for your existing data. It works well when a least squares analysis suits your problem.