Finish discussing modified Cholesky decompositions

Vectornaut 2025-11-05 21:01:27 +00:00
parent 85668021c5
commit 1e34d4f0d6

@ -23,6 +23,14 @@ We saw above that the vanishingly rare gradient descent paths that lead to saddl
Uniform regularization can be seen as an interpolation between Newton's method and gradient descent: it kicks in when the lowest eigenvalue of the Hessian drops below zero, and it brings the search direction closer to the gradient descent direction as the lowest eigenvalue gets more negative. Since the Hessian is indefinite near a saddle point, Newton's method with uniform regularization should act at least somewhat like gradient descent near a saddle point. This suggests that it could get bogged down near saddle points in the same way.
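Concretely, if we write the first and second derivatives of $f$ at $p$ as a vector $\tilde{F}^{(1)}_p \in V$ and an operator $\tilde{F}^{(2)}_p \in \operatorname{End}(V)$, as in the Methods section below, uniform regularization with strength $\lambda \ge 0$ solves

```math
\tilde{F}^{(1)}_p + \big[\tilde{F}^{(2)}_p + \lambda\,\mathrm{id}_V\big] v = 0.
```

At $\lambda = 0$ this is the Newton step equation, while for $\lambda$ much larger than the spectrum of $\tilde{F}^{(2)}_p$ we get $v \approx -\tfrac{1}{\lambda} \tilde{F}^{(1)}_p$, a short step in the gradient descent direction.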
### Flightiness
_To be added_
### Gratuitous symmetry-breaking
_To be added_
## Methods
### Newton's method
@ -87,6 +95,17 @@ _To be added_
#### Modified Cholesky decomposition
Recall from [above](#review) that once we express the first and second derivatives of $f$ as a vector $F^{(1)}_p \in \mathbb{R}^n$ and a matrix $F^{(2)}_p \in \operatorname{End}(\mathbb{R}^n)$ with respect to a computational basis $\mathbb{R}^n \to V$, we can find the Newton step $v$ at $p$ by solving the equation $F^{(1)}_p + F^{(2)}_p v = 0$. More abstractly, if we express the first and second derivatives of $f$ as a vector $\tilde{F}^{(1)}_p \in V$ and an operator $\tilde{F}^{(2)}_p \in \operatorname{End}(V)$ with respect to a chosen inner product $(\_\!\_, \_\!\_)$ on $V$, as discussed [above](#uniform-regularization), we can find the Newton step by solving the equation $\tilde{F}^{(1)}_p + \tilde{F}^{(2)}_p v = 0$. When $f^{(2)}_p$ is positive-definite, taking the Cholesky decomposition of $\tilde{F}^{(2)}_p$ with respect to the standard inner product on $\mathbb{R}^n$ provides an efficient and numerically stable solution method. Since the Newton step doesn't depend on the choice of inner product, we typically use the inner product given by the computational basis.
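As a concrete illustration of this solution method (a dependency-free sketch, not the crate discussed below), we can factor the Hessian matrix as $L L^\top$ by hand and then solve $F^{(1)}_p + F^{(2)}_p v = 0$ with one forward and one back substitution. The example Hessian and gradient values are made up for illustration.

```rust
// Solve the Newton step equation F1 + F2 v = 0 for a small
// positive-definite Hessian F2 via a hand-rolled Cholesky
// factorization F2 = L L^T.

fn cholesky(a: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = a.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = a[i][i] - s;
                if d <= 0.0 {
                    return None; // not (numerically) positive-definite
                }
                l[i][j] = d.sqrt();
            } else {
                l[i][j] = (a[i][j] - s) / l[j][j];
            }
        }
    }
    Some(l)
}

fn solve_newton_step(f2: &[Vec<f64>], f1: &[f64]) -> Option<Vec<f64>> {
    let l = cholesky(f2)?;
    let n = f1.len();
    // Forward substitution: solve L y = -F1.
    let mut y = vec![0.0; n];
    for i in 0..n {
        let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
        y[i] = (-f1[i] - s) / l[i][i];
    }
    // Back substitution: solve L^T v = y.
    let mut v = vec![0.0; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| l[k][i] * v[k]).sum();
        v[i] = (y[i] - s) / l[i][i];
    }
    Some(v)
}

fn main() {
    // Made-up Hessian [[4, 2], [2, 3]] and gradient [2, -1].
    let f2 = vec![vec![4.0, 2.0], vec![2.0, 3.0]];
    let f1 = vec![2.0, -1.0];
    let v = solve_newton_step(&f2, &f1).expect("Hessian not positive-definite");
    // Check the residual F1 + F2 v ≈ 0.
    for i in 0..2 {
        let r = f1[i] + f2[i][0] * v[0] + f2[i][1] * v[1];
        assert!(r.abs() < 1e-9);
    }
    println!("Newton step: {:?}", v);
}
```

In practice one would use a linear algebra library's Cholesky routine rather than this sketch, but the structure — factor once, then two triangular solves — is the same.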
When $f^{(2)}_p$ isn't guaranteed to be positive-definite, we can use a modified Cholesky decomposition to solve an implicitly regularized Newton step equation
```math
\tilde{F}^{(1)}_p + \big[\tilde{F}^{(2)}_p + E\big] v = 0,
```
where the modification $E \in \operatorname{End}(V)$ is symmetric with respect to $(\_\!\_, \_\!\_)$ and balances some notion of smallness against the requirement that $\tilde{F}^{(2)}_p + E$ be safely positive-definite [CH, §1; NW, §3.4]. The hallmark of a modified Cholesky decomposition is that it directly produces a Cholesky decomposition of a pivoted version of $\tilde{F}^{(2)}_p + E$, with no need to compute $E$ as an intermediate step, and its speed is comparable to that of the ordinary Cholesky decomposition [CH, §1].
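Schematically, writing everything as matrices with respect to the computational basis, such a routine returns a factorization of the pivoted, modified Hessian directly:

```math
P \big[\tilde{F}^{(2)}_p + E\big] P^{\top} = L L^{\top},
```

where $P$ is a permutation matrix and $L$ is lower-triangular. The exact form varies by algorithm; some variants produce an $L D L^\top$ factorization with $D$ diagonal or block-diagonal instead [NW, §3.4].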
The [`modcholesky`](https://argmin-rs.github.io/modcholesky/modcholesky/) crate implements the modified Cholesky decompositions from [GMW], [SE90], and [SE99]. In our application, they tend to exhibit [flightiness](#flightiness) and [gratuitous symmetry-breaking](#gratuitous-symmetry-breaking). The latter might be caused by pivoting, which differentiates between the elements of an arbitrary orthonormal basis in a way that isn't directly related to the problem we're trying to solve.
- **[CH]** Sheung Hun Cheng and Nicholas J. Higham. [“A Modified Cholesky Algorithm Based on a Symmetric Indefinite Factorization.”](https://doi.org/10.1137/S0895479896302898) _SIAM Journal on Matrix Analysis and Applications_ 19(4), 1998.
- **[GMW]** Philip E. Gill, Walter Murray and Margaret H. Wright. [_Practical Optimization._](https://doi.org/10.1137/1.9781611975604) Academic Press, 1981.
- **[NW]** Jorge Nocedal and Stephen J. Wright. [_Numerical Optimization._](https://doi.org/10.1007/978-0-387-40065-5) Second edition. Springer, 2006.
- **[SE90]** Robert B. Schnabel and Elizabeth Eskow. [“A New Modified Cholesky Factorization.”](https://doi.org/10.1137/0911064) _SIAM Journal on Scientific and Statistical Computing_ 11(6), 1990.
- **[SE99]** Robert B. Schnabel and Elizabeth Eskow. [“A Revised Modified Cholesky Factorization Algorithm.”](https://doi.org/10.1137/S105262349833266X) _SIAM Journal on Optimization_ 9(4), 1999.