diff --git a/Numerical-optimization.md b/Numerical-optimization.md
index 31ad4e4..6665b34 100644
--- a/Numerical-optimization.md
+++ b/Numerical-optimization.md
@@ -23,6 +23,14 @@ We saw above that the vanishingly rare gradient descent paths that lead to saddl
 Uniform regularization can be seen as an interpolation between Newton’s method and gradient descent, which kicks in when the lowest eigenvalue of the Hessian drops below zero and brings the search direction closer to the gradient descent direction as the lowest eigenvalue gets more negative. Since the Hessian is indefinite near a saddle point, Newton’s method with uniform regularization should act at least sort of like gradient descent near a saddle point. This suggests that it could get bogged down near saddle points in the same way.
 
+### Flightiness
+
+_To be added_
+
+### Gratuitous symmetry-breaking
+
+_To be added_
+
 ## Methods
 
 ### Newton’s method
 
@@ -87,6 +95,102 @@ _To be added_
 
 #### Modified Cholesky decomposition
 
-Recall from [above](#Review) that once we express the first and second derivatives of $f$ as a vector $F^{(1)}_p \in \mathbb{R}^n$ and a matrix $F^{(2)}_p \in \operatorname{End}(\mathbb{R}^n)$ with respect to a computational basis $\mathbb{R}^n \to V$, we can find the Newton step $v$ at $p$ by solving the equation $F^{(1)}_p + F^{(2)}_p v = 0$. More abstractly, if we express the first and second derivatives of $f$ as a vector $\tilde{F}^{(1)}_p \in V$ and an operator $\tilde{F}^{(2)}_p \in \operatorname{End}(V)$ with respect to a chosen inner product on $V$, as discussed [above](#Uniform_regularization), we can find the Newton step by solving the equation $\tilde{F}^{(1)}_p + \tilde{F}^{(2)}_p v = 0$. When $f^{(2)}_p$ is positive-definite, taking the Cholesky decomposition of $\tilde{F}^{(2)}_p$ with respect to the standard inner product on $\mathbb{R}^n$ provides an efficient and numerically stable solution method. Since the Newton step doesn’t depend on the choice of inner product, we typically use the inner product given by the computational basis.
+Recall from [above](#review) that once we express the first and second derivatives of $f$ as a vector $F^{(1)}_p \in \mathbb{R}^n$ and a matrix $F^{(2)}_p \in \operatorname{End}(\mathbb{R}^n)$ with respect to a computational basis $\mathbb{R}^n \to V$, we can find the Newton step $v$ at $p$ by solving the equation $F^{(1)}_p + F^{(2)}_p v = 0$. More abstractly, if we express the first and second derivatives of $f$ as a vector $\tilde{F}^{(1)}_p \in V$ and an operator $\tilde{F}^{(2)}_p \in \operatorname{End}(V)$ with respect to a chosen inner product $(\_\!\_, \_\!\_)$ on $V$, as discussed [above](#uniform-regularization), we can find the Newton step by solving the equation $\tilde{F}^{(1)}_p + \tilde{F}^{(2)}_p v = 0$. When $f^{(2)}_p$ is positive-definite, taking the Cholesky decomposition of $\tilde{F}^{(2)}_p$ with respect to the chosen inner product provides an efficient and numerically stable solution method. Since the Newton step doesn’t depend on the choice of inner product, we typically use the inner product given by the computational basis.
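+
+As a concrete illustration of the positive-definite case, here is the whole computation in miniature: a minimal, hand-rolled sketch over plain `Vec` data rather than a tuned linear algebra library. The names `cholesky` and `newton_step` and the toy objective are ours, purely for illustration.
+
+```rust
+/// Cholesky factorization A = L L^T of a symmetric positive-definite matrix,
+/// returning the lower-triangular factor L, or `None` if a pivot is nonpositive.
+fn cholesky(a: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
+    let n = a.len();
+    let mut l = vec![vec![0.0; n]; n];
+    for j in 0..n {
+        let d = a[j][j] - (0..j).map(|k| l[j][k] * l[j][k]).sum::<f64>();
+        if d <= 0.0 {
+            return None; // not positive-definite
+        }
+        l[j][j] = d.sqrt();
+        for i in (j + 1)..n {
+            let s = a[i][j] - (0..j).map(|k| l[i][k] * l[j][k]).sum::<f64>();
+            l[i][j] = s / l[j][j];
+        }
+    }
+    Some(l)
+}
+
+/// Solve L L^T v = -g for the Newton step v by forward and back substitution.
+fn newton_step(l: &[Vec<f64>], g: &[f64]) -> Vec<f64> {
+    let n = g.len();
+    let mut y = vec![0.0; n];
+    for i in 0..n {
+        let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
+        y[i] = (-g[i] - s) / l[i][i];
+    }
+    let mut v = vec![0.0; n];
+    for i in (0..n).rev() {
+        let s: f64 = ((i + 1)..n).map(|k| l[k][i] * v[k]).sum();
+        v[i] = (y[i] - s) / l[i][i];
+    }
+    v
+}
+
+fn main() {
+    // Toy objective f(x, y) = x^2 + xy + y^2 - 3x at p = (0, 0), so that
+    // F1 = (-3, 0) and F2 = [[2, 1], [1, 2]], which is positive-definite.
+    let f2 = vec![vec![2.0, 1.0], vec![1.0, 2.0]];
+    let f1 = [-3.0, 0.0];
+    let l = cholesky(&f2).expect("F2 should be positive-definite");
+    // Since f is quadratic, the single Newton step lands on the minimizer (2, -1).
+    println!("Newton step: {:?}", newton_step(&l, &f1));
+}
+```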
 
-_To be continued, citing [NW, §3.4]_
\ No newline at end of file
+When $f^{(2)}_p$ isn’t guaranteed to be positive-definite, we can use a modified Cholesky decomposition to solve an implicitly regularized version of the Newton step equation
+```math
+\tilde{F}^{(1)}_p + \big[\tilde{F}^{(2)}_p + E\big] v = 0,
+```
+where the modification $E \in \operatorname{End}(V)$ is symmetric with respect to $(\_\!\_, \_\!\_)$ and balances some notion of smallness against the requirement that $\tilde{F}^{(2)}_p + E$ be safely positive-definite [CH, §1] [NW, §3.4]. The hallmark of a modified Cholesky decomposition is that it directly produces a Cholesky decomposition of a pivoted version of $\tilde{F}^{(2)}_p + E$, with no need to produce $E$ as an intermediate step, and its speed is comparable to that of the ordinary Cholesky decomposition [CH, §1].
+
+The [`modcholesky`](https://argmin-rs.github.io/modcholesky/modcholesky/) crate implements the modified Cholesky decompositions from [GMW], [SE90], and [SE99]. In our application, they tend to exhibit [flightiness](#flightiness) and [gratuitous symmetry-breaking](#gratuitous-symmetry-breaking). The latter might be caused by pivoting, which distinguishes between the elements of an arbitrary orthonormal basis in a way that isn’t directly related to the problem we’re trying to solve. A toy illustration of how the modification happens in flight appears after the references below.
+
+- **[CH]** Sheung Hun Cheng and Nicholas J. Higham. [“A Modified Cholesky Algorithm Based on a Symmetric Indefinite Factorization.”](https://doi.org/10.1137/S0895479896302898) _SIAM Journal on Matrix Analysis and Applications_ 19(4), 1998.
+- **[GMW]** Philip E. Gill, Walter Murray, and Margaret H. Wright. [_Practical Optimization._](https://doi.org/10.1137/1.9781611975604) Academic Press, 1981.
+- **[NW]** Jorge Nocedal and Stephen J. Wright. [_Numerical Optimization._](https://doi.org/10.1007/978-0-387-40065-5) Springer, 2006.
+- **[SE90]** Robert B. Schnabel and Elizabeth Eskow. [“A New Modified Cholesky Factorization.”](https://doi.org/10.1137/0911064) _SIAM Journal on Scientific and Statistical Computing_ 11(6), 1990.
+- **[SE99]** Robert B. Schnabel and Elizabeth Eskow. “A Revised Modified Cholesky Factorization Algorithm.” _SIAM Journal on Optimization_ 9(4), 1999.
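+
+To make the “no need to produce $E$” point concrete, here is a deliberately naive sketch of a modified $LDL^T$ factorization that floors each pivot at a threshold as it goes. It is not one of the [GMW], [SE90], or [SE99] algorithms, which add pivoting and careful bounds on element growth; the function name and the threshold value are ours, purely for illustration.
+
+```rust
+/// Naive modified LDL^T factorization: computes a unit lower-triangular L and a
+/// diagonal D with L D L^T = A + E, where E is the nonnegative diagonal matrix
+/// recording how far each pivot had to be raised. E itself is never formed.
+fn modified_ldlt(a: &[Vec<f64>], delta: f64) -> (Vec<Vec<f64>>, Vec<f64>) {
+    let n = a.len();
+    let mut l = vec![vec![0.0; n]; n];
+    let mut d = vec![0.0; n];
+    for j in 0..n {
+        // The pivot before modification; flooring it at delta implicitly adds
+        // E_jj = max(0, delta - c_jj) while leaving every off-diagonal of A exact.
+        let c_jj = a[j][j] - (0..j).map(|k| l[j][k] * l[j][k] * d[k]).sum::<f64>();
+        d[j] = c_jj.max(delta);
+        l[j][j] = 1.0;
+        for i in (j + 1)..n {
+            let c_ij = a[i][j] - (0..j).map(|k| l[i][k] * l[j][k] * d[k]).sum::<f64>();
+            l[i][j] = c_ij / d[j];
+        }
+    }
+    (l, d)
+}
+
+fn main() {
+    // An indefinite matrix (eigenvalues 3 and -1), so plain Cholesky would fail.
+    let f2 = vec![vec![1.0, 2.0], vec![2.0, 1.0]];
+    let (l, d) = modified_ldlt(&f2, 1e-3);
+    // Here L = [[1, 0], [2, 1]] and D = [1, 0.001], so L D L^T = F2 + E with
+    // E = diag(0, 3.001): positive-definite, but with a large modification that
+    // the published algorithms work hard to keep small.
+    println!("L = {:?}, D = {:?}", l, d);
+}
+```
\ No newline at end of file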