make_smoothing_spline(x, y, w=None, lam=None)
A smoothing spline is found as a solution to the regularized weighted linear regression problem:
\sum\limits_{i=1}^n w_i\lvert y_i - f(x_i) \rvert^2 + \lambda\int\limits_{x_1}^{x_n} (f^{(2)}(u))^2 d u
where f is a spline function, w is a vector of weights and \lambda is a regularization parameter.
If lam is None, we use the generalized cross-validation (GCV) criterion to find an optimal regularization parameter; otherwise we solve the regularized weighted linear regression problem with the given parameter. The parameter controls the tradeoff between smoothness and closeness to the data: the larger lam becomes, the smoother the function gets.
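The tradeoff can be checked numerically. The following is a minimal sketch, not part of the original documentation: the noisy sine data, the two lam values, and the helper names rss and roughness are assumptions chosen for illustration, and the roughness integral is only approximated on a uniform grid.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

def rss(spl):
    """Unweighted residual sum of squares at the data points."""
    return float(np.sum((y - spl(x)) ** 2))

def roughness(spl, m=2001):
    """Grid approximation of the integral of f''(u)^2 over [x[0], x[-1]]."""
    grid = np.linspace(x[0], x[-1], m)
    d2 = spl.derivative(2)(grid)
    return float(np.mean(d2 ** 2) * (x[-1] - x[0]))

spl_small = make_smoothing_spline(x, y, lam=1e-3)  # stays close to the data
spl_large = make_smoothing_spline(x, y, lam=1e2)   # much smoother curve

for name, spl in [("lam=1e-3", spl_small), ("lam=1e2", spl_large)]:
    print(f"{name}: rss={rss(spl):.3f}, roughness={roughness(spl):.3f}")
```

With the larger lam the roughness term should shrink while the residual sum of squares grows, which is the tradeoff described above.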
This algorithm is a clean-room reimplementation of the algorithm introduced by Woltring in FORTRAN [2]. The original version cannot be used in SciPy source code because of licensing issues. The details of the reimplementation are discussed in [4] (available only in Russian).
If the vector of weights w is None, we assume that all points have equal weight, and the vector of weights is a vector of ones.
Note that in the weighted residual sum of squares the weights are not squared: \sum\limits_{i=1}^n w_i\lvert y_i - f(x_i) \rvert^2, whereas in splrep the sum is built from the squared weights.
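Because the weights enter the objective linearly, multiplying every weight by a constant is equivalent to dividing lam by the same constant. The sketch below illustrates this consequence of the formula; the data and the particular values of w and lam are assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Synthetic data (an assumption for illustration): noisy exponential decay.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = np.exp(-x) + rng.normal(scale=0.05, size=x.size)

w = np.ones_like(x)

# sum(2 * r_i^2) + 1.0 * J  ==  2 * (sum(r_i^2) + 0.5 * J),
# so both calls minimize the same objective up to a constant factor.
spl_a = make_smoothing_spline(x, y, w=2.0 * w, lam=1.0)
spl_b = make_smoothing_spline(x, y, w=w, lam=0.5)

grid = np.linspace(x[0], x[-1], 200)
max_diff = float(np.max(np.abs(spl_a(grid) - spl_b(grid))))
print(max_diff)  # expected to be at the level of rounding error
```

If the weights were squared internally, doubling them would instead correspond to dividing lam by four, so the two splines above would differ.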
If the initial problem is ill-posed (for example, the product X^T W X, where X is the design matrix, is not a positive definite matrix), a ValueError is raised.
x : Abscissas.
y : Ordinates.
w : Vector of weights. Default is np.ones_like(x).
lam : Regularization parameter. If lam is None, then it is found from the GCV criterion. Default is None.
A callable representing a spline in the B-spline basis that solves the smoothing spline problem, using the GCV criterion [1] if lam is None and the given parameter lam otherwise.
Compute the (coefficients of the) smoothing cubic spline function, using lam to control the tradeoff between the smoothness of the curve and its proximity to the data. If lam is None, the GCV criterion [1] is used to find it.
import numpy as np
np.random.seed(1234)
n = 200
def func(x):
    return x**3 + x**2 * np.sin(4 * x)
x = np.sort(np.random.random_sample(n) * 4 - 2)
y = func(x) + np.random.normal(scale=1.5, size=n)
from scipy.interpolate import make_smoothing_spline
spl = make_smoothing_spline(x, y)
import matplotlib.pyplot as plt
grid = np.linspace(x[0], x[-1], 400)
plt.plot(grid, spl(grid), label='Spline')
plt.plot(grid, func(grid), label='Original function')
plt.scatter(x, y, marker='.')
plt.legend(loc='best')
plt.show()
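The returned callable can also be inspected and post-processed. The following is a small sketch under the assumption that the result is a scipy.interpolate.BSpline instance of degree 3, as the description of a cubic spline in the B-spline basis suggests; the data here are made up for the example.

```python
import numpy as np
from scipy.interpolate import BSpline, make_smoothing_spline

# Synthetic data (an assumption for illustration): a noisy cubic.
rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 50)
y = x**3 + rng.normal(scale=0.5, size=x.size)

spl = make_smoothing_spline(x, y)  # lam is None, so it is chosen by GCV

# Knots, coefficients, and degree of the B-spline representation.
print(type(spl).__name__, spl.k)
print(spl.t.shape, spl.c.shape)

# Standard BSpline operations are available on the result.
dspl = spl.derivative()            # first derivative, itself a spline
area = spl.integrate(x[0], x[-1])  # definite integral over the data range
print(float(dspl(0.0)), float(area))
```

Working with the derivative of the smoothed curve is often more stable than differencing the noisy data directly.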