Finish reviewing Newton's method

Vectornaut 2025-10-21 23:43:10 +00:00
parent a1a1393056
commit a89e219b0b

@@ -30,17 +30,34 @@ Uniform regularization can be seen as an interpolation between Newton's method
#### Review
Let's say we're trying to minimize a smooth function $f$ on an affine search space with underlying vector space $V$. We can use the affine structure to take a second-order approximation of $f$ near any point $p$:
```math
f(p + v) \in f^{(0)}_p + f^{(1)}_p(v) + \tfrac{1}{2} f^{(2)}_p(v, v) + \mathfrak{m}^2
```
Here, $v$ is a $V$-valued variable, each $f^{(k)}_p$ is a symmetric $k$-linear form on $V$, and $\mathfrak{m}$ is the ideal of smooth functions on $V$ that vanish to first order at the origin. The form $f^{(k)}_p$ is called the *$k$th derivative* of $f$ at $p$, and it turns out that $f^{(2)}_p(v, \_\!\_)$ is the derivative of $f^{(1)}_{p}(v)$ with respect to $p$. Most people writing about Newton's method use the term *Hessian* to refer to either the second derivative or an operator that represents it.
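As a quick sanity check on this expansion, here is a minimal numerical sketch in Python with NumPy; the test function, the point $p$, and the direction of $v$ are made up for illustration and aren't taken from the project. The gap between $f(p + v)$ and the second-order approximation should shrink faster than $|v|^2$ as $v$ shrinks.
```python
import numpy as np

# Made-up test function f(x, y) = x^4 + x*y + (1 + y)^2, with its first and
# second derivatives written out by hand.
def f(p):
    x, y = p
    return x**4 + x*y + (1 + y)**2

def f1(p):  # first derivative of f at p, as a gradient vector
    x, y = p
    return np.array([4*x**3 + y, x + 2*(1 + y)])

def f2(p):  # second derivative of f at p, as a symmetric matrix
    x, _ = p
    return np.array([[12*x**2, 1.0],
                     [1.0,     2.0]])

# The error of the second-order approximation should scale like |v|^3 here,
# dropping by roughly a factor of 1000 each time we shrink v by a factor of 10.
p = np.array([0.7, -1.2])
for t in [1e-1, 1e-2, 1e-3]:
    v = t * np.array([1.0, 2.0])
    approx = f(p) + f1(p) @ v + 0.5 * v @ f2(p) @ v
    print(t, abs(f(p + v) - approx))
```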
When the second derivative is positive-definite, the second-order approximation has a unique minimum. Newton's method is based on the hope that this minimum is near a local minimum of $f$. It works by repeatedly stepping to the minimum of the second-order approximation and then taking a new second-order approximation at that point. The minimum of the second-order approximation is nicely characterized as the place where the derivative of the second-order approximation vanishes. The *Newton step*—the vector $v \in V$ that takes us to the minimum of the second-order approximation—is therefore the solution of the following equation:
```math
f^{(1)}_p(\_\!\_) + f^{(2)}_p(v, \_\!\_) = 0.
```
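To illustrate the loop described above, here is a minimal sketch of the iteration in Python with NumPy. It assumes we already have callables returning the first and second derivatives in coordinates (the representation worked out in the next paragraph), and it reuses the made-up objective from the previous sketch; none of this is the project's actual code.
```python
import numpy as np

def newton_minimize(grad, hess, p, steps=20, tol=1e-12):
    """Repeatedly step to the minimum of the local second-order approximation.

    At each point p, solve hess(p) @ v = -grad(p) for the Newton step v and
    move to p + v, stopping when the first derivative is nearly zero. This
    plain version assumes hess(p) stays invertible along the way.
    """
    for _ in range(steps):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            break
        v = np.linalg.solve(hess(p), -g)
        p = p + v
    return p

# Made-up example objective: f(x, y) = x^4 + x*y + (1 + y)^2.
grad = lambda p: np.array([4*p[0]**3 + p[1], p[0] + 2*(1 + p[1])])
hess = lambda p: np.array([[12*p[0]**2, 1.0], [1.0, 2.0]])
print(newton_minimize(grad, hess, np.array([0.7, -1.2])))
```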
For computations, we'll choose a basis $\mathbb{R}^n \to V$ and express $f^{(1)}_p$ and $f^{(2)}_p$ in terms of the standard inner product $\langle \_\!\_, \_\!\_ \rangle$ on $\mathbb{R}^n$, writing
```math
\begin{align*}
f^{(1)}_p(w) & = \langle F^{(1)}_p, w \rangle \\
f^{(2)}_p(v, w) & = \langle F^{(2)}_p v, w \rangle
\end{align*}
```
with a vector $F^{(1)}_p \in \mathbb{R}^n$ and a symmetric operator $F^{(2)}_p \in \operatorname{End}(\mathbb{R}^n)$. The equation that defines the Newton step can then be written
```math
\begin{align*}
\langle F^{(1)}_p, \_\!\_ \rangle + \langle F^{(2)}_p v, \_\!\_ \rangle & = 0 \\
\langle F^{(1)}_p + F^{(2)}_p v, \_\!\_ \rangle & = 0 \\
F^{(1)}_p + F^{(2)}_p v & = 0
\end{align*}
```
using the non-degeneracy of the inner product. When the bilinear form $f^{(2)}_p$ is positive-definite, the operator $F^{(2)}_p$ is positive-definite too, so we can solve this equation by taking the Cholesky decomposition of $F^{(2)}_p$.
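For example, here is a minimal sketch of that solve in Python, using SciPy's Cholesky routines and made-up values of $F^{(1)}_p$ and $F^{(2)}_p$; in a real problem these would of course come from the derivatives of the objective $f$.
```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Made-up stand-ins for F^(1)_p and F^(2)_p at some point p; the matrix must
# be symmetric positive-definite for the Cholesky factorization to exist.
F1 = np.array([0.3, -1.4, 0.8])
F2 = np.array([[4.0, 1.0, 0.0],
               [1.0, 3.0, 0.5],
               [0.0, 0.5, 2.0]])

# Solve F^(2)_p v = -F^(1)_p by factoring F2 and back-substituting.
factor = cho_factor(F2)
v = cho_solve(factor, -F1)

print(v)
print(np.allclose(F2 @ v, -F1))  # confirm that v satisfies the Newton step equation
```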
If $f$ is convex, its second derivative is positive-definite everywhere, so the Newton step is always well-defined. However, non-convex loss functions show up in many interesting problems, including ours. For these problems, we need to decide how to step at a point where the second derivative is indefinite. One approach is to *regularize* the equation that defines the Newton step by making some modification of the second derivative that turns it into a positive-definite bilinear form. We'll discuss some regularization methods below.
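In code, indefiniteness typically shows up as the Cholesky factorization failing. Here is a minimal sketch of that failure mode in Python with NumPy, using a made-up indefinite matrix as a stand-in for $F^{(2)}_p$:
```python
import numpy as np

# Made-up indefinite stand-in for F^(2)_p, as we might see at a saddle point.
F2 = np.array([[2.0,  0.0],
               [0.0, -1.0]])

try:
    np.linalg.cholesky(F2)  # succeeds only for positive-definite matrices
except np.linalg.LinAlgError:
    # The quadratic model has no minimum here, so the plain Newton step is not
    # defined as a step to a minimizer; this is where regularization of the
    # second derivative comes in.
    print("F2 is not positive-definite; regularize before solving")
```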
#### Uniform regularization