Least Squares Approach



Let us introduce a weighted error function


 In the ideal case of precise measurements and exact fitting function we have the following set of equations:


and the error function is equal to zero.

The number of equations K should be greater than or equal to N for the unique determination of the fitting parameters:


The least square approach defines the best fit as the set of parameters minimizing the error function:


If the error function has derivatives, then the minimum is characterized by the following condition:


Error function approximation

In the vicinity of the point near the minimum, the error function can be presented as



where   is the gradient 
and   is the Hessian

The above error function approximation (7) can be rewritten in the following vector-matrix form:


Steepest descent method

The gradient vector gives the direction of the steepest ascent on the error surface. The fastest way to reach the minimum is to go in the opposite direction:


This approach is called the steepest descent method. This method certainly converges to the minimum if the step  is sufficiently small. Although the convergence is slow.

Gauss-Newton method

For the second order approximation (8) the minimum condition (6) becomes 


and the solution is


Here and below the * superscript at g and H is omitted.

Eq. (11) solves the optimization problem in one step if the error function is the second order surface. Otherwise the procedure (11) should be applied iteratively. 


If the Hessian is (almost) degenerate then the procedure (11) becomes unstable. The regularization techniques use a modified operator instead of the original one (the Hessian in this case). Often a scalar matrix is used for the Hessian correction: 


Instead of Eq. (10) we have


At a sufficiently large the solution of Eq.(13) is stable because it reduces to Eq. (9) with the small step= -1

Levenberg-Marquardt method

Let us apply the Gauss-Newton method to the above error function (2). We need the expressions for the gradient and the Hessian

The gradient is


and the Hessian is 


When the error function approaches the minimum, the residuals tend to zero: 


The first term in (15) becomes small and can be ignored: 


Eq. (17) can be rewritten in the following matrix form 




and Eq. (10) becomes 



Eq. (19) can be solved iteratively. The use of regularization (12) guarantees the convergence. 

SVD method

On the other hand, Eq. (19) is the least squares problem for the following system of linear equations:


The system is over-defined because of condition (4) and  (18c) is a rectangular K x N matrix. The matrix can always be presented as a product of three matrices:


Here U is K x K,  is K x N  and V is N x N matrices. has only diagonal non-zero elements and the number of non-zero elements is M


Keeping only non-zero elements in , the dimensions of matrices in (21) can be reduced to K x M, M x M  and  N x M , respectively.

Eq. (21) is called Singular Value Decomposition (SVD), the column vectors of U and V are called Singular Vectors and the diagonal elements of are called Singular Values. It is convenient to order the singular values as follows: 


The singular vectors are orthogonal 


The least squares solution of Eq. (20) is 


The regularized solution is 


The advantage of this approach is that change of  the regularization factor does not require recalculation of SVD. 

Truncation of the sum in Eq. (25) also stabilize (regularize) the solution. This regularization is called truncated SVD. 


  1. George E. Forsythe, Mike A. Malcolm, and Cleve B. Moler, Computer Methods for Mathematical Computations, Englewood Cliffs, N.J.: Prentice Hall, 1977.
  2. Levenberg-Marquardt Method, Wolfram Research, 
  3. S. M. Tan and Colin Fox. Introduction to Inverse Problem, 


ABC Tutorials | Data Processing | Indirect Measurements | NMR Tutorials

Home | Resumé | |  Computing |  Links |  Publications

Please e-mail me at

©Nikolai Shokhirev, 2001-2009