Finding the $n$ parameters $a$ which minimize $f(a)$ when $n>1$ is a much
harder problem. The n=2 case is like trying to find the point at the
bottom of the lowest valley in a range of mountains. Higher dimensional
cases are hard to visualize at all. This author usually just pretends they
are 2-D problems and tries not to worry too much.
When presented with a new problem it is usually fruitful to get a feel for
the shape of the function surface by plotting $f$ against $a_i$
for each $i$ in turn, in each case keeping the other parameters
fixed.
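Such one-dimensional scans can be sketched as follows (the two-parameter quadratic used here is a made-up illustration, not a function from the text):

```python
import numpy as np

# Hypothetical two-parameter function to explore: an elongated
# quadratic bowl with its minimum at a = (1.0, -2.0).
def f(a1, a2):
    return (a1 - 1.0) ** 2 + 10.0 * (a2 + 2.0) ** 2

# Scan f against each parameter in turn, holding the other fixed
# at the current guess, to get a feel for the surface.
a_current = (0.0, 0.0)
grid = np.linspace(-5.0, 5.0, 1001)

scan_a1 = f(grid, a_current[1])   # vary a1, keep a2 fixed
scan_a2 = f(a_current[0], grid)   # vary a2, keep a1 fixed

# The minima of the 1-D scans hint at where each parameter wants to be.
print(grid[np.argmin(scan_a1)])   # near 1.0
print(grid[np.argmin(scan_a2)])   # near -2.0
```

In practice one would plot `scan_a1` and `scan_a2` rather than just locate their minima, but even the scan values reveal the scale and rough location of the valley.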
As mentioned above, the choice of local minimizer will depend upon whether it is possible to generate first or second derivatives efficiently, and on how noisy the function is. All the local minimizers below assume that the function is locally relatively smooth: any noise will confuse the derivative estimates and may confound the algorithm. For noisy functions the Simplex Method is recommended (Section 2.7.1).
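The robustness of the Simplex (Nelder-Mead) method to noise can be illustrated with `scipy.optimize` (the noisy bowl here is a made-up example; the high-frequency ripple stands in for evaluation noise):

```python
import numpy as np
from scipy.optimize import minimize

# A smooth bowl with minimum at (1, -2), plus a small high-frequency
# ripple standing in for noise in the function evaluations. A
# finite-difference derivative would be dominated by the ripple.
def noisy_f(a):
    smooth = (a[0] - 1.0) ** 2 + (a[1] + 2.0) ** 2
    ripple = 1e-3 * np.sin(1000.0 * a[0]) * np.cos(1000.0 * a[1])
    return smooth + ripple

# Nelder-Mead uses only function values, never derivative estimates,
# so the ripple does not derail it.
result = minimize(noisy_f, x0=[0.5, 0.5], method="Nelder-Mead")
print(result.x)  # should land near (1, -2) despite the noise
```

A gradient-based method fed finite-difference derivatives of `noisy_f` would see slopes dominated by the ripple and could wander or stall.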
Most of the local minimizers below rely on approximating the function about the current point, $a$, using a Taylor series expansion
$$ f(a') \approx f(a) + g^T (a' - a) + \tfrac{1}{2} (a' - a)^T H (a' - a) , $$
where $g$ is the gradient vector of first derivatives $\partial f / \partial a_i$ evaluated at $a$.
The matrix $H$ of second derivatives $\partial^2 f / \partial a_i \, \partial a_j$ is called the Hessian matrix of the function at $a$.
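The quadratic model can be checked numerically; the following sketch estimates the gradient and Hessian by central finite differences (the test function, step sizes, and names are illustrative assumptions, not part of the text):

```python
import numpy as np

# Illustrative smooth function of two parameters (an assumption here).
def f(a):
    return np.exp(0.1 * a[0]) + a[0] * a[1] + 2.0 * a[1] ** 2

def gradient(f, a, h=1e-5):
    """Central-difference estimate of the gradient of f at a."""
    a = np.asarray(a, dtype=float)
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        g[i] = (f(a + e) - f(a - e)) / (2 * h)
    return g

def hessian(f, a, h=1e-4):
    """Central-difference estimate of the Hessian matrix of f at a."""
    a = np.asarray(a, dtype=float)
    n = a.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(a + ei + ej) - f(a + ei - ej)
                       - f(a - ei + ej) + f(a - ei - ej)) / (4 * h * h)
    return H

# Quadratic (Taylor) model about the current point a0:
a0 = np.array([0.5, -0.5])
g, H = gradient(f, a0), hessian(f, a0)
step = np.array([0.01, 0.02])
model = f(a0) + g @ step + 0.5 * step @ H @ step
print(abs(model - f(a0 + step)))  # tiny: the model is locally accurate
```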
At the minimum the gradient is zero (if it wasn't we could move further
downhill), so $g(a_{\min}) = 0$ and
$$ f(a) \approx f(a_{\min}) + \tfrac{1}{2} (a - a_{\min})^T A \, (a - a_{\min}) $$
(where $A$ is the Hessian evaluated at the minimum point $a_{\min}$).
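One consequence of this quadratic picture: if the function really were quadratic, solving $A\,(a_{\min} - a) = -g$ from any point $a$ would jump straight to the minimum in one step. A sketch (the quadratic function here is a made-up example):

```python
import numpy as np

# Exactly quadratic function with minimum at a_min = (3, -1):
A_true = np.array([[4.0, 1.0],
                   [1.0, 2.0]])        # Hessian (positive definite)
a_min_true = np.array([3.0, -1.0])

def f(a):
    d = a - a_min_true
    return 0.5 * d @ A_true @ d

def grad_f(a):
    return A_true @ (a - a_min_true)

# From any starting point, one step a - A^{-1} g reaches the minimum,
# because the Taylor expansion of a quadratic is exact.
a = np.array([10.0, 10.0])
a_next = a - np.linalg.solve(A_true, grad_f(a))
print(a_next)     # [ 3. -1.]
print(f(a_next))  # 0.0 -- the minimum value
```

For a general (non-quadratic) function this step only approximates the minimum, which is why the local minimizers iterate.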
See also Covariance Estimation (section 1.9).