diff --git a/Numerical-optimization.md b/Numerical-optimization.md
index f5dde98..8dfb86e 100644
--- a/Numerical-optimization.md
+++ b/Numerical-optimization.md
@@ -63,11 +63,54 @@ If $f$ is convex, its second derivative is positive-definite everywhere, so the
 
 #### Uniform regularization
 
-Given an inner product $(\_\!\_, \_\!\_)$ on $V$, we can make the modified second derivative $f^{(2)}_p(v, \_\!\_) + \lambda (\_\!\_, \_\!\_)$ positive-definite by choosing a large enough coefficient $\lambda$. We can say more precisely what it means for $\lambda$ to be large enough by expressing $f^{(2)}_p$ as $(\_\!\_, \tilde{F}^{(2)}_p\_\!\_)$ and taking the lowest eigenvalue $\lambda_{\text{min}}$ of $\tilde{F}^{(2)}_p$. The modified second derivative is positive-definite when $\lambda > -\max\{\lambda_\text{min}, 0\}$.
+Given an inner product $(\_\!\_, \_\!\_)$ on $V$, we can make the modified second derivative $f^{(2)}_p + \lambda (\_\!\_, \_\!\_)$ positive-definite by choosing a large enough coefficient $\lambda$. We can say precisely what it means for $\lambda$ to be large enough by expressing $f^{(2)}_p$ as $(\_\!\_, \tilde{F}^{(2)}_p\_\!\_)$ and taking the lowest eigenvalue $\lambda_{\text{min}}$ of $\tilde{F}^{(2)}_p$. The modified second derivative is positive-definite when $\delta := \lambda_\text{min} + \lambda$ is positive. We typically make a “minimal modification,” choosing $\lambda$ just a little larger than $\max\{-\lambda_\text{min}, 0\}$. This makes $\delta$ small when $\lambda_\text{min}$ is negative and $\lambda$ small when $\lambda_\text{min}$ is positive.
 
-Uniform regularization can be seen as interpolating between Newton’s method and gradient descent. To see why, consider the regularized equation that defines the Newton step:
+Uniform regularization can be seen as interpolating between Newton’s method, in regions where the second derivative is solidly positive-definite, and gradient descent, in regions where it is far from positive-definite. To see why, consider the regularized Newton step $v$ defined by the equation
 ```math
-f^{(1)}_p(\_\!\_) + f^{(2)}_p(v, \_\!\_) + \lambda (v, \_\!\_) = 0.
+f^{(1)}_p(\_\!\_) + f^{(2)}_p(v, \_\!\_) + \lambda (v, \_\!\_) = 0,
+```
+the standard Newton step $w$ defined by the equation
+```math
+f^{(1)}_p(\_\!\_) + f^{(2)}_p(w, \_\!\_) = 0,
+```
+and the gradient descent step $u$ defined by the equation
+```math
+f^{(1)}_p(\_\!\_) + (u, \_\!\_) = 0.
 ```
 
-_To be continued_
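+To make the minimal modification concrete, here is a small numerical sketch, assuming $V = \mathbb{R}^n$ with the standard dot product, so that $\tilde{F}^{(2)}_p$ is just a symmetric matrix $H$; the helper name and the numbers are illustrative, not part of the text above:
+```python
+import numpy as np
+
+def regularized_newton_step(g, H, margin=1e-6):
+    """Solve (H + lam*I) v = -g, with lam just above max(-lambda_min, 0)."""
+    lambda_min = np.linalg.eigvalsh(H)[0]    # eigenvalues come back in ascending order
+    lam = max(-lambda_min, 0.0) + margin     # minimal modification: delta = lambda_min + lam > 0
+    return np.linalg.solve(H + lam * np.eye(H.shape[0]), -g)
+
+# Example: an indefinite second derivative, with lambda_min = -2.
+g = np.array([2.0, -2.0])                    # represents f^(1)_p
+H = np.array([[2.0, 0.0], [0.0, -2.0]])      # represents f^(2)_p
+v = regularized_newton_step(g, H)
+```
+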
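+The interpolation can also be checked numerically: as $\lambda \to 0$ the regularized step $v$ approaches the Newton step $w$, while as $\lambda \to \infty$ the rescaled step $\lambda v$ approaches the gradient descent step $u$. A sketch under the same assumptions as above, with $H$ positive-definite so that $w$ is well-defined:
+```python
+import numpy as np
+
+g = np.array([1.0, -3.0])                    # represents f^(1)_p
+H = np.array([[4.0, 1.0], [1.0, 2.0]])       # represents f^(2)_p, positive-definite
+I = np.eye(2)
+
+w = np.linalg.solve(H, -g)                   # standard Newton step
+u = -g                                       # gradient descent step: (u, x) = -f^(1)_p(x) for all x
+
+v_small = np.linalg.solve(H + 1e-8 * I, -g)  # lambda -> 0 recovers the Newton step
+assert np.allclose(v_small, w)
+
+lam = 1e8                                    # lambda -> infinity recovers gradient descent
+v_large = np.linalg.solve(H + lam * I, -g)   # ... after rescaling by lambda
+assert np.allclose(lam * v_large, u)
+```
+
+_To be continued_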