Bring up subtleties of regularization and frozen entries

Vectornaut 2025-01-27 07:29:36 +00:00
parent bdd7926ce2
commit 00d7d1b369

@@ -119,11 +119,14 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto-
2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
* Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
3. If the Hessian isn't positive-definite, make it positive-definite by adding $-c \lambda_\text{min}$ times the identity, where $\lambda_\text{min}$ is its lowest eigenvalue and $c > 1$ is a parameter of the minimization routine. In other words, find the regularized Hessian
$$H_\text{reg}(f) := H(f) + \begin{cases}0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} & \lambda_\text{min} \le 0 \end{cases}\,I.$$
* The parameter $c$ is passed to `realize_gram` as the argument `reg_scale`.
* When $\lambda_\text{min}$ is exactly zero, our regularization has no effect, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem, but we don't bother. (The first sketch after this list illustrates our regularization.)
4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
* For this write-up, we'll write the projection as $\mathcal{Q}$.
5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.
* When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties. (The second sketch after this list shows one way to build such an operator.)
6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate. (See the last sketch after this list.)
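
To make step 3 concrete, here's a minimal sketch of the eigenvalue-shift regularization, written with `nalgebra` matrix types since the engine works with those. The function name `regularize_hessian` is made up for illustration; only `reg_scale` comes from the write-up above.

```rust
use nalgebra::DMatrix;

/// Minimal sketch of the eigenvalue-shift regularization in step 3.
/// `hess` plays the role of H(f), expressed as a matrix in the standard
/// basis for End(R^n); `reg_scale` is the parameter c > 1.
fn regularize_hessian(hess: DMatrix<f64>, reg_scale: f64) -> DMatrix<f64> {
    let dim = hess.nrows();
    // the Hessian of the loss function is symmetric, so its eigenvalues
    // are real and a symmetric eigendecomposition applies
    let min_eigval = hess.clone().symmetric_eigen().eigenvalues.min();
    if min_eigval > 0.0 {
        // already positive-definite: no regularization needed
        hess
    } else {
        // shift the spectrum by -c * lambda_min, raising the smallest
        // eigenvalue to (1 - c) * lambda_min >= 0. when lambda_min is
        // exactly zero, this shift does nothing, as the bullet point
        // under step 3 warns
        hess + DMatrix::<f64>::identity(dim, dim) * (-reg_scale * min_eigval)
    }
}
```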
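
Here's one way steps 4 and 5 might look in code. It's a sketch under an assumption that goes beyond the write-up: the frozen subspace is spanned by standard basis vectors of $\operatorname{End}(\mathbb{R}^n)$ (one per frozen entry), so that $\mathcal{Q}$ just zeroes out the frozen coordinates. The name `base_step` and the `frozen` index list are illustrative, not the actual `engine.rs` API.

```rust
use nalgebra::{DMatrix, DVector};

/// Illustrative sketch of steps 4-5. `frozen` lists the indices of the
/// frozen coordinates, with everything flattened into the standard
/// basis for End(R^n).
fn base_step(
    hess_reg: &DMatrix<f64>,
    neg_grad: &DVector<f64>,
    frozen: &[usize],
) -> Option<DVector<f64>> {
    // project the negative gradient: zero out the frozen coordinates
    let mut rhs = neg_grad.clone();
    for &k in frozen {
        rhs[k] = 0.0;
    }
    // "project" the regularized Hessian: replace each frozen row and
    // column with the corresponding row and column of the identity.
    // the resulting operator lets one linear solve express both
    // properties of the base step: the unfrozen rows impose the Newton
    // equation on the unfrozen coordinates, and the frozen rows impose
    // u_k = 0, which is orthogonality to the frozen subspace
    let mut hess_proj = hess_reg.clone();
    for &k in frozen {
        hess_proj.row_mut(k).fill(0.0);
        hess_proj.column_mut(k).fill(0.0);
        hess_proj[(k, k)] = 1.0;
    }
    // solve for the base step u
    hess_proj.lu().solve(&rhs)
}
```

Overwriting the frozen rows and columns with identity entries keeps the system square and symmetric, so a single factorization handles both conditions at once.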
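
Finally, a rough sketch of the backtracking in step 6, phrased as a standard sufficient-decrease test: a step is accepted once the actual drop in the loss is at least `min_efficiency` times the drop predicted by the first-order model. The parameters `min_efficiency`, `backoff`, and `max_tries` are assumptions of this sketch, not taken from the write-up.

```rust
use nalgebra::DVector;

/// Rough sketch of the backtracking in step 6. `loss` evaluates the
/// loss at the current configuration displaced by a given step.
fn backtrack(
    loss: impl Fn(&DVector<f64>) -> f64,
    base_loss: f64,           // the loss at the current configuration
    neg_grad: &DVector<f64>,  // -grad(f), projected as in step 4
    base_step: &DVector<f64>, // the base step u from step 5
    min_efficiency: f64,      // accept this fraction of the ideal rate
    backoff: f64,             // step-size reduction factor, in (0, 1)
    max_tries: usize,
) -> Option<DVector<f64>> {
    // the first-order model predicts the loss drops at this rate per
    // unit of step scale
    let ideal_rate = neg_grad.dot(base_step);
    let mut scale = 1.0;
    for _ in 0..max_tries {
        let step = base_step * scale;
        // accept the step once the actual decrease is a good fraction
        // of the decrease predicted by the first-order model
        if base_loss - loss(&step) >= min_efficiency * scale * ideal_rate {
            return Some(step);
        }
        scale *= backoff;
    }
    None // no acceptable step found within `max_tries` attempts
}
```

When the regularized Hessian is positive-definite, `ideal_rate` is positive, so the test above is a genuine decrease condition.
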
### Reconstructing a rigid subassembly