Finish reviewing Newton's method

parent a1a1393056 · commit a89e219b0b

1 changed file with 26 additions and 9 deletions

#### Review

Let’s say we’re trying to minimize a smooth function $f$ on an affine search space with underlying vector space $V$. We can use the affine structure to take a second-order approximation of $f$ near any point $p$:

```math
f(p + v) \in f^{(0)}_p + f^{(1)}_p(v) + \tfrac{1}{2} f^{(2)}_p(v, v) + \mathfrak{m}^3
```

Here, $v$ is a $V$-valued variable, each $f^{(k)}_p$ is a symmetric $k$-linear form on $V$, and $\mathfrak{m}$ is the ideal of smooth functions on $V$ that vanish to first order at the origin. The form $f^{(k)}_p$ is called the *$k$th derivative* of $f$ at $p$, and it turns out that $f^{(2)}_p(v, \_\!\_)$ is the derivative of $f^{(1)}_{p}(v)$ with respect to $p$. Most people writing about Newton’s method use the term *Hessian* to refer to either the second derivative or an operator that represents it.
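
Since the remainder lies in $\mathfrak{m}^3$, the error of the quadratic model should shrink roughly like $t^3$ when we scale the displacement by $t$. Here’s a minimal numpy sketch of that check; the function $f$, the point $p$, and the direction $v$ are arbitrary choices for illustration.

```python
import numpy as np

# An arbitrary smooth function on R^2, with its first derivative represented
# as a gradient vector and its second derivative as a symmetric matrix.
def f(x):
    return np.sin(x[0]) * np.exp(x[1])

def grad(x):
    return np.array([np.cos(x[0]) * np.exp(x[1]),
                     np.sin(x[0]) * np.exp(x[1])])

def hess(x):
    return np.array([[-np.sin(x[0]) * np.exp(x[1]), np.cos(x[0]) * np.exp(x[1])],
                     [ np.cos(x[0]) * np.exp(x[1]),  np.sin(x[0]) * np.exp(x[1])]])

p = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])

for t in [1e-1, 1e-2, 1e-3]:
    tv = t * v
    model = f(p) + grad(p) @ tv + 0.5 * tv @ hess(p) @ tv
    print(t, abs(f(p + tv) - model))  # error drops ~1000x per step: third order
```
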
When the second derivative is positive-definite, the second-order approximation has a unique minimum. Newton’s method is based on the hope that this minimum is near a local minimum of $f$. It works by repeatedly stepping to the minimum of the second-order approximation and then taking a new second-order approximation at that point. The minimum of the second-order approximation is nicely characterized as the place where the derivative of the second-order approximation vanishes. The *Newton step*—the vector $v \in V$ that takes us to the minimum of the second-order approximation—is therefore the solution of the following equation:

```math
f^{(1)}_p(\_\!\_) + f^{(2)}_p(v, \_\!\_) = 0.
```

For computations, we’ll choose a basis $\mathbb{R}^n \to V$ and express $f^{(1)}_p$ and $f^{(2)}_p$ in terms of the standard inner product $\langle \_\!\_, \_\!\_ \rangle$ on $\mathbb{R}^n$, writing

```math
\begin{align*}
f^{(1)}_p(w) & = \langle F^{(1)}_p, w \rangle \\
f^{(2)}_p(v, w) & = \langle F^{(2)}_p v, w \rangle
\end{align*}
```

with a vector $F^{(1)}_p \in \mathbb{R}^n$ and a symmetric operator $F^{(2)}_p \in \operatorname{End}(\mathbb{R}^n)$. The equation that defines the Newton step can then be written

```math
\begin{align*}
\langle F^{(1)}_p, \_\!\_ \rangle + \langle F^{(2)}_p v, \_\!\_ \rangle & = 0 \\
\langle F^{(1)}_p + F^{(2)}_p v, \_\!\_ \rangle & = 0 \\
F^{(1)}_p + F^{(2)}_p v & = 0
\end{align*}
```

using the non-degeneracy of the inner product. When the bilinear form $f^{(2)}_p$ is positive-definite, the operator $F^{(2)}_p$ is positive-definite too, so we can solve this equation by taking the Cholesky decomposition of $F^{(2)}_p$.
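
As a sketch of how a single step looks in code (the helper `newton_step` and the quadratic test function are illustrative choices, not from the text), scipy’s Cholesky routines solve the Newton-step equation directly:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_step(F1, F2):
    # Solve F1 + F2 @ v = 0 for v; cho_factor fails if F2 is not positive-definite.
    return cho_solve(cho_factor(F2), -F1)

# A convex quadratic f with constant second derivative F2 has first derivative
# F2 @ p at the point p, so a single Newton step lands exactly on the minimum.
F2 = np.array([[2.0, 1.0],
               [1.0, 4.0]])
p = np.array([3.0, -1.0])
F1 = F2 @ p
print(p + newton_step(F1, F2))  # ~ [0. 0.]
```
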
If $f$ is convex, its second derivative is positive-definite everywhere, so the Newton step is always well-defined. However, non-convex loss functions show up in many interesting problems, including ours. For these problems, we need to decide how to step at a point where the second derivative is indefinite. One approach is to *regularize* the equation that defines the Newton step by making some modification of the second derivative that turns it into a positive-definite bilinear form. We’ll discuss some regularization methods below.
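
To see the failure mode concretely: at a point where the second derivative is indefinite, the Cholesky factorization simply fails, so the unregularized Newton step can’t even be computed. A minimal sketch, with an arbitrary indefinite matrix (eigenvalues $3$ and $-1$):

```python
import numpy as np
from scipy.linalg import cho_factor, LinAlgError

F2_indef = np.array([[1.0, 2.0],
                     [2.0, 1.0]])  # indefinite: such a point is a saddle

try:
    cho_factor(F2_indef)
except LinAlgError:
    print("second derivative is indefinite: the Newton step is undefined here")
```
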
#### Uniform regularization