A handful of families of functions form the basic toolkit we use to model data. Curve fitting is the process of constructing a curve, or mathematical function, that best fits a series of data points.

The simplest way to decide which type of function, if any, fits a dataset is to examine a scatterplot of the data. If the points follow a general pattern that resembles one of the functions we would like to model with, we can try fitting that function to the data. Let’s look at how to fit models to data in Python.

Python offers several ways to fit functions to data. The theory and practice are well explained in this outstanding video by Brant Carlson.
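As a rough illustration of two of those ways, the sketch below fits the same straight line with `np.polyfit` and with `scipy.optimize.curve_fit`. The data, the seed, and the names `line`, `slope_pf`, and `slope_cf` are made up for this illustration and are not part of the example that follows.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data: the line y = 2x + 1 plus seeded Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 9, 10)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)

# Route 1: least-squares polynomial fit of degree 1.
# np.polyfit returns coefficients from the highest power down.
slope_pf, intercept_pf = np.polyfit(x, y, 1)

# Route 2: least squares with a model function we write ourselves.
def line(x, m, b):
    return m * x + b

(slope_cf, intercept_cf), _ = curve_fit(line, x, y)

print(f"polyfit:   m = {slope_pf:.3f}, b = {intercept_pf:.3f}")
print(f"curve_fit: m = {slope_cf:.3f}, b = {intercept_cf:.3f}")
```

For a straight line both routes minimize the same sum of squared residuals, so they should agree closely. `curve_fit` becomes the more natural choice once the model is not a polynomial.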

Let’s reproduce an example from Emily Grace Ripka’s blog, with some adaptations from SciPy’s documentation.

import scipy.optimize
import numpy as np
import matplotlib.pyplot as plt

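# Synthetic data: a straight line from 5 to 200 plus uniform noise in [0, 60)
x_array = np.linspace(1, 10, 10)
y_array = np.linspace(5, 200, 10)
y_noise = 60 * np.random.random(10)
y_array += y_noise
# Scatterplot of the raw data, to judge which kind of function might fit
plt.scatter(x_array, y_array)
plt.show()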

By inspecting the graph, we believe the relationship to be linear, so let’s create a linear model \(f(x) = mx + b\).

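def linear(x, m, b):
    return x * m + b

# curve_fit returns the best-fit parameters and their covariance matrix
params, covariance = scipy.optimize.curve_fit(linear, x_array, y_array)
# Example value of params (it varies from run to run because the noise is random):
## array([22.59559759,  4.98133243])
plt.scatter(x_array, y_array, label="data points")
plt.plot(x_array, linear(x_array, *params), "--r", label=f"fit parameters: m = {round(params[0], 2)}, b = {round(params[1], 2)}")
plt.legend()  # without this call the labels above are never displayed
plt.show()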
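The covariance matrix returned by `curve_fit` is not used above. A minimal sketch of one common use follows: the square roots of its diagonal give one-standard-deviation uncertainties for the fitted parameters, and the residuals give a rough goodness-of-fit measure. The seeded data generation and the names `m_err`, `b_err`, and `r_squared` are assumptions added here so the block runs on its own.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, m, b):
    return x * m + b

# Assumption: seeded synthetic data built like the example above,
# so the numbers are reproducible from run to run.
rng = np.random.default_rng(42)
x_array = np.linspace(1, 10, 10)
y_array = np.linspace(5, 200, 10) + 60 * rng.random(10)

params, covariance = curve_fit(linear, x_array, y_array)

# One-standard-deviation uncertainties: square roots of the covariance diagonal.
m_err, b_err = np.sqrt(np.diag(covariance))

# Residuals and coefficient of determination as a rough check of the fit.
residuals = y_array - linear(x_array, *params)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y_array - y_array.mean())**2)
r_squared = 1 - ss_res / ss_tot

print(f"m = {params[0]:.2f} +/- {m_err:.2f}")
print(f"b = {params[1]:.2f} +/- {b_err:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

With the default `absolute_sigma=False`, these uncertainties are scaled by the scatter of the residuals, so they are best read as estimates rather than exact error bars.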