- Least Squares Approach
- Error function approximation
- Steepest descent method
- Gauss-Newton method
- Levenberg-Marquardt method
- SVD method
Let us introduce a weighted error function
In the ideal case of precise measurements and exact fitting function we have the following set of equations:
and the error function is equal to zero.
The number of equations K should be greater than or equal to N for the unique determination of the fitting parameters:
The least square approach defines the best fit as the set of parameters minimizing the error function:
If the error function has derivatives, then the minimum is characterized by the following condition:
In the vicinity of the point near the minimum, the error function can be presented as
|where||is the gradient|
|and||is the Hessian|
The above error function approximation (7) can be rewritten in the following vector-matrix form:
The gradient vector gives the direction of the steepest ascent on the error surface. The fastest way to reach the minimum is to go in the opposite direction:
This approach is called the steepest descent method. This method certainly converges to the minimum if the step is sufficiently small. Although the convergence is slow.
For the second order approximation (8) the minimum condition (6) becomes
and the solution is
Here and below the * superscript at g and H is omitted.
Eq. (11) solves the optimization problem in one step if the error function is the second order surface. Otherwise the procedure (11) should be applied iteratively.
If the Hessian is (almost) degenerate then the procedure (11) becomes unstable. The regularization techniques use a modified operator instead of the original one (the Hessian in this case). Often a scalar matrix is used for the Hessian correction:
Instead of Eq. (10) we have
At a sufficiently large the solution of Eq.(13) is stable because it reduces to Eq. (9) with the small step= -1.
Let us apply the Gauss-Newton method to the above error function (2). We need the expressions for the gradient and the Hessian
The gradient is
and the Hessian is
When the error function approaches the minimum, the residuals tend to zero:
The first term in (15) becomes small and can be ignored:
Eq. (17) can be rewritten in the following matrix form
and Eq. (10) becomes
Eq. (19) can be solved iteratively. The use of regularization (12) guarantees the convergence.
On the other hand, Eq. (19) is the least squares problem for the following system of linear equations:
The system is over-defined because of condition (4) and (18c) is a rectangular K x N matrix. The matrix can always be presented as a product of three matrices:
Here U is K x K, is K x N and V is N x N matrices. has only diagonal non-zero elements and the number of non-zero elements is M
Keeping only non-zero elements in , the dimensions of matrices in (21) can be reduced to K x M, M x M and N x M , respectively.
Eq. (21) is called Singular Value Decomposition (SVD), the column vectors of U and V are called Singular Vectors and the diagonal elements of are called Singular Values. It is convenient to order the singular values as follows:
The singular vectors are orthogonal
The least squares solution of Eq. (20) is
The regularized solution is
The advantage of this approach is that change of the regularization factor does not require recalculation of SVD.
Truncation of the sum in Eq. (25) also stabilize (regularize) the solution. This regularization is called truncated SVD.
Please e-mail me at firstname.lastname@example.org
©Nikolai Shokhirev, 2001-2009