From 00d7d1b36986748eddb4a6667cf984571e22a2c4 Mon Sep 17 00:00:00 2001
From: Vectornaut
Date: Mon, 27 Jan 2025 07:29:36 +0000
Subject: [PATCH] Bring up subtleties of regularization and frozen entries

---
 Gram-matrix-parameterization.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Gram-matrix-parameterization.md b/Gram-matrix-parameterization.md
index 61d1fe0..31ee5db 100644
--- a/Gram-matrix-parameterization.md
+++ b/Gram-matrix-parameterization.md
@@ -119,11 +119,14 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-
  2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
     * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
  3. If the Hessian isn't positive-definite, make it positive definite by adding $-c \lambda_\text{min}$, where $\lambda_\text{min}$ is its lowest eigenvalue and $c > 1$ is a parameter of the minimization routine. In other words, find the regularized Hessian
-    $$H_\text{reg}(f) := H(f) + \begin{cases}0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} & \text{otherwise} \end{cases}.$$
+    $$H_\text{reg}(f) := H(f) + \begin{cases}0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} & \lambda_\text{min} \le 0 \end{cases}.$$
     * The parameter $c$ is passed to `realize_gram` as the argument `reg_scale`.
-    * Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, but we don't bother.
- 4. Find the base step $u$, which is defined by the property that $-\operatorname{grad}(f) = H(f)\,u$.
- 5. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate.
+    * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.
+ 4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
+    * For this write-up, we'll write the projection as $\mathcal{Q}$.
+ 5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.
+    * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
+ 6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate.
 
 ### Reconstructing a rigid subassembly
 
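For concreteness, here is a rough Rust sketch of steps 3–5 as described in the revised list. This is not the actual `realize_gram` code in `engine.rs`; it assumes nalgebra-style `DMatrix`/`DVector` types, hypothetical function names (`regularize_hessian`, `base_step`), and a `frozen` slice listing the flattened indices of the frozen entries.

```rust
// Rough sketch of steps 3-5, not the actual `realize_gram` code in `engine.rs`.
// Assumes nalgebra types; the gradient and base step are the flattened
// coordinates of elements of End(R^n), and `frozen` lists the flattened
// indices of the frozen entries.
use nalgebra::{DMatrix, DVector};

/// Step 3: if the lowest eigenvalue λ_min of the Hessian isn't positive, add
/// -c λ_min times the identity, where c is the `reg_scale` parameter. (As
/// noted above, this leaves the Hessian unchanged when λ_min is exactly zero.)
fn regularize_hessian(hess: &DMatrix<f64>, reg_scale: f64) -> DMatrix<f64> {
    let min_eigval = hess
        .clone()
        .symmetric_eigen()
        .eigenvalues
        .iter()
        .cloned()
        .fold(f64::INFINITY, f64::min);
    if min_eigval > 0.0 {
        hess.clone()
    } else {
        let n = hess.nrows();
        hess + DMatrix::<f64>::identity(n, n) * (-reg_scale * min_eigval)
    }
}

/// Steps 4-5: express both defining properties of the base step as one linear
/// system by zeroing the frozen coordinates of the negative gradient and
/// replacing the frozen rows and columns of the regularized Hessian with rows
/// and columns of the identity. The solution vanishes on the frozen
/// coordinates and solves the projected Newton system on the unfrozen ones.
fn base_step(
    hess_reg: &DMatrix<f64>,
    neg_grad: &DVector<f64>,
    frozen: &[usize],
) -> Option<DVector<f64>> {
    let mut h = hess_reg.clone();
    let mut g = neg_grad.clone();
    for &k in frozen {
        h.row_mut(k).fill(0.0);
        h.column_mut(k).fill(0.0);
        h[(k, k)] = 1.0;
        g[k] = 0.0;
    }
    h.lu().solve(&g)
}
```

Replacing the frozen rows and columns with identity rows and columns, and zeroing the corresponding gradient entries, is one standard way to fold "orthogonal to the frozen subspace" into a single invertible linear system; it matches the spirit of the note about the "projected" Hessian acting as an operator that expresses both properties.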
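Step 6 can likewise be read as an Armijo-style backtracking line search. A minimal sketch, with illustrative parameter names (`min_efficiency`, `backoff`, `max_backoffs`) rather than the actual `realize_gram` arguments:

```rust
use nalgebra::DVector;

/// Step 6: backtracking line search. Starting from the base step, scale the
/// step back until the loss decreases by at least `min_efficiency` times the
/// decrease predicted by the directional derivative (an Armijo-style test).
fn backtrack(
    loss: impl Fn(&DVector<f64>) -> f64,
    config: &DVector<f64>,
    base_step: &DVector<f64>,
    neg_grad: &DVector<f64>,
    min_efficiency: f64,
    backoff: f64,
    max_backoffs: usize,
) -> Option<DVector<f64>> {
    let loss_here = loss(config);
    // Maximum possible (first-order) rate of loss decrease along the step.
    let rate = neg_grad.dot(base_step);
    let mut t = 1.0;
    for _ in 0..max_backoffs {
        let candidate = config + base_step * t;
        if loss(&candidate) <= loss_here - min_efficiency * t * rate {
            return Some(candidate);
        }
        t *= backoff;
    }
    None
}
```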