Bring up subtleties of regularization and frozen entries
parent bdd7926ce2
commit 00d7d1b369
1 changed file with 7 additions and 4 deletions
@ -119,11 +119,14 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-

2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."

   * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$ (see the flattening sketch after this list).

3. If the Hessian isn't positive definite, make it positive definite by adding $-c \lambda_\text{min}$ times the identity, where $\lambda_\text{min}$ is its lowest eigenvalue and $c > 1$ is a parameter of the minimization routine. In other words, find the regularized Hessian

   $$H_\text{reg}(f) := H(f) + \begin{cases}0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} I & \lambda_\text{min} \le 0 \end{cases}.$$

   * The parameter $c$ is passed to `realize_gram` as the argument `reg_scale` (see the regularization sketch after this list).

   * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.

4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.

   * For this write-up, we'll write the projection as $\mathcal{Q}$.

5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.

   * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties (see the projection sketch after this list).

6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate (see the backtracking sketch after this list).
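
Here's the flattening sketch referenced in step 2: a minimal illustration of how an endomorphism of $\mathbb{R}^n$ can be written as a coordinate vector in the standard basis of $\operatorname{End}(\mathbb{R}^n)$, so that $H(f)$ can act on it as an ordinary $n^2 \times n^2$ matrix. It uses nalgebra types for concreteness, and the column-major ordering of the basis is an assumption of this sketch rather than something fixed by the write-up.

```rust
use nalgebra::{DMatrix, DVector};

// Flatten an endomorphism of R^n (stored as an n-by-n matrix) into its
// coordinate vector in the standard basis of End(R^n). nalgebra stores
// matrices in column-major order, so the entries are listed column by
// column; any fixed ordering works, as long as the Hessian matrix is
// written in the same one.
fn vectorize(endo: &DMatrix<f64>) -> DVector<f64> {
    DVector::from_column_slice(endo.as_slice())
}
```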
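
Here's the regularization sketch for step 3, again using nalgebra types for concreteness. Only the argument name `reg_scale` comes from the write-up; the function name and the use of a dense symmetric eigendecomposition are illustrative rather than a description of what `realize_gram` actually does.

```rust
use nalgebra::DMatrix;

// Shift the spectrum of the symmetric Hessian toward positive definiteness:
// when the lowest eigenvalue lambda_min is not positive, add
// -c * lambda_min times the identity, where c = reg_scale > 1.
fn regularize_hessian(hessian: &DMatrix<f64>, reg_scale: f64) -> DMatrix<f64> {
    let lambda_min = hessian.clone().symmetric_eigen().eigenvalues.min();
    if lambda_min > 0.0 {
        hessian.clone()
    } else {
        let n = hessian.nrows();
        hessian + DMatrix::identity(n, n).scale(-reg_scale * lambda_min)
    }
}
```

Note that when $\lambda_\text{min}$ is exactly zero the shift is zero and the Hessian comes back unchanged, which is the corner case the last bullet of step 3 points out.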
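
Here's the projection sketch for steps 4 and 5, covering the case where the frozen subspace is spanned by standard basis elements of $\operatorname{End}(\mathbb{R}^n)$, in other words by individual frozen entries. The index list `frozen` and the function names are illustrative; this is the standard trick of zeroing frozen rows and columns and putting ones on the frozen diagonal, offered as a sketch rather than a description of the actual code in `engine.rs`.

```rust
use nalgebra::{DMatrix, DVector};

// Apply the projection Q: zero out the coordinates that span the frozen
// subspace, whose indices (in the flattened standard basis) are in `frozen`.
fn project(vector: &DVector<f64>, frozen: &[usize]) -> DVector<f64> {
    let mut projected = vector.clone();
    for &index in frozen {
        projected[index] = 0.0;
    }
    projected
}

// Turn the regularized Hessian into an operator that expresses both
// properties of the base step. Concretely this builds Q H_reg Q + (1 - Q)
// by zeroing the frozen rows and columns and putting 1 on the frozen
// diagonal entries, so that solving
//
//     projected * u = -(Q grad(f))
//
// forces u to vanish in the frozen coordinates while satisfying the
// projected Newton equation in the remaining ones.
fn project_hessian(hess_reg: &DMatrix<f64>, frozen: &[usize]) -> DMatrix<f64> {
    let mut projected = hess_reg.clone();
    for &index in frozen {
        projected.row_mut(index).fill(0.0);
        projected.column_mut(index).fill(0.0);
        projected[(index, index)] = 1.0;
    }
    projected
}
```

With these pieces, step 5 reduces to a single linear solve, for example `project_hessian(&hess_reg, &frozen).cholesky().map(|c| c.solve(&project(&neg_grad, &frozen)))`, since the projected operator is positive definite whenever $H_\text{reg}(f)$ is.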
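
Finally, here's the backtracking sketch for step 6: shrink the step until the loss actually drops by at least a set fraction of the decrease predicted by the directional derivative. The parameter names `min_efficiency`, `backoff`, and `max_backoff_steps` are chosen for illustration and needn't match the actual arguments of `realize_gram`.

```rust
use nalgebra::DVector;

// Armijo-style backtracking: accept the first step size whose actual loss
// reduction is at least `min_efficiency` times the reduction predicted by
// the slope of the loss along the base step.
fn backtrack<F>(
    loss: F,
    point: &DVector<f64>,
    grad: &DVector<f64>,
    base_step: &DVector<f64>,
    min_efficiency: f64,
    backoff: f64,
    max_backoff_steps: usize,
) -> Option<DVector<f64>>
where
    F: Fn(&DVector<f64>) -> f64,
{
    let loss_here = loss(point);
    // The base step is a descent direction, so this slope should be negative.
    let slope = grad.dot(base_step);
    let mut rate = 1.0;
    for _ in 0..max_backoff_steps {
        let candidate = point + base_step.scale(rate);
        if loss_here - loss(&candidate) >= min_efficiency * rate * (-slope) {
            return Some(candidate);
        }
        rate *= backoff;
    }
    None
}
```
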
### Reconstructing a rigid subassembly