Describe the backtracking routine

Vectornaut 2025-01-27 07:55:08 +00:00
parent 00d7d1b369
commit aa08be4cbf

@@ -114,7 +114,7 @@ We minimize the loss function using a cheap imitation of Ueda and Yamashita's re
The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs). (In the old Julia prototype of the engine, it's in [`Engine.jl`](../src/branch/main/engine-proto/gram-test/Engine.jl).) It works like this.
1. Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps.
   1. Find $-\operatorname{grad}(f)$, as described in "The first derivative of the loss function."
   2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
      * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
@@ -124,9 +124,15 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-
      * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.
   4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
      * For this write-up, we'll write the projection as $\mathcal{Q}$.
   5. Find the base step $s_\text{base} \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$ and being orthogonal to the frozen subspace. (A sketch of one way to compute $s_\text{base}$ appears after this list.)
      * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
   6. Backtrack by reducing the step size, as described below, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps. (A sketch of this loop also appears after the list.)
      1. Find the change in loss that we would get from the step $s$ under consideration. At the beginning of the loop, $s$ is set to $s_\text{base}$.
      2. The definition of the derivative tells us that by making $s$ small enough, we can bring the resulting drop in loss as close as we want, relative to the size of the step, to $\langle -\operatorname{grad}(f), s \rangle$.
      3. If the loss drops by at least $\alpha \langle -\operatorname{grad}(f), s \rangle$, where $\alpha \in (0, 1)$ is a parameter of the minimization routine, we're done: take the step $s$.
         * The parameter $\alpha$ is passed to `realize_gram` as the argument `min_efficiency`.
      4. Otherwise, multiply the step by the back-off parameter $\beta \in (0, 1)$ and go back to the start of the loop.
         * This parameter is passed to `realize_gram` as the argument `backoff`.
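
For concreteness, here is a minimal sketch of how the regularization, projection, and base-step solve could look, written with the `nalgebra` crate and treating each configuration as a flattened vector so that $H(f)$ becomes an ordinary square matrix. The function name, the `frozen` index list, and the `reg_scale` parameter are illustrative assumptions, not the actual `realize_gram` internals.

```rust
use nalgebra::{DMatrix, DVector};

/// Sketch: regularize the Hessian and solve for the base step. Here the
/// configuration is flattened into a vector, so the Hessian is an ordinary
/// square matrix and the frozen subspace is spanned by the coordinate
/// directions listed in `frozen`.
fn base_step(
    neg_grad: &DVector<f64>, // -grad(f), already computed
    hess: &DMatrix<f64>,     // H(f) in the standard basis
    frozen: &[usize],        // indices of the frozen coordinates
    reg_scale: f64,          // how much of |lambda_min| to shift the spectrum by
) -> Option<DVector<f64>> {
    // regularize: if the smallest eigenvalue is negative, shift the spectrum
    // up in proportion to its size. as noted above, this does nothing when
    // the smallest eigenvalue is exactly zero
    let min_eigval = hess.clone().symmetric_eigen().eigenvalues.min();
    let mut hess_reg = hess.clone();
    if min_eigval < 0.0 {
        let shift = reg_scale * (-min_eigval);
        for i in 0..hess_reg.nrows() {
            hess_reg[(i, i)] += shift;
        }
    }

    // "project" the system: zero out the frozen rows and columns of the
    // regularized Hessian, put 1 on the corresponding diagonal entries, and
    // zero the frozen entries of the right-hand side. the solution of the
    // resulting system satisfies -Q grad(f) = H_reg(f) s_base and is
    // orthogonal to the frozen subspace
    let mut rhs = neg_grad.clone();
    for &k in frozen {
        hess_reg.row_mut(k).fill(0.0);
        hess_reg.column_mut(k).fill(0.0);
        hess_reg[(k, k)] = 1.0;
        rhs[k] = 0.0;
    }

    // solve the linear system for the base step
    hess_reg.lu().solve(&rhs)
}
```

Replacing the frozen rows and columns with identity rows is just one way to encode both defining properties of $s_\text{base}$ in a single solvable linear system, in the spirit of the remark above about "projecting" the regularized Hessian.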
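
Here is a similar sketch of the backtracking loop, again treating configurations as flattened vectors. The parameter names `min_efficiency` and `backoff` match the `realize_gram` arguments mentioned above; the `loss` closure, the `max_backoff_steps` cap, and the function name are stand-ins for illustration.

```rust
use nalgebra::DVector;

/// Sketch: backtracking line search. Starting from the base step, keep
/// shrinking the step until the loss drops by at least `min_efficiency`
/// times the first-order prediction <-grad(f), s>.
fn backtrack(
    loss: impl Fn(&DVector<f64>) -> f64, // loss as a function of the configuration
    config: &DVector<f64>,               // current configuration, flattened
    neg_grad: &DVector<f64>,             // -grad(f) at `config`
    base_step: &DVector<f64>,            // s_base from the Newton system
    min_efficiency: f64,                 // alpha in (0, 1)
    backoff: f64,                        // beta in (0, 1)
    max_backoff_steps: usize,
) -> Option<DVector<f64>> {
    let loss_here = loss(config);
    let mut step = base_step.clone();
    for _ in 0..max_backoff_steps {
        // predicted drop in loss, to first order, for the current step
        let predicted_drop = neg_grad.dot(&step);
        // actual drop in loss if we take the step
        let actual_drop = loss_here - loss(&(config + &step));
        // accept the step if it realizes a good enough fraction of the
        // predicted drop
        if actual_drop >= min_efficiency * predicted_drop {
            return Some(step);
        }
        // otherwise, back off and try a smaller step
        step *= backoff;
    }
    None // fail out: too many backtracking steps
}
```

Note that when the step is scaled by `backoff`, the predicted drop $\langle -\operatorname{grad}(f), s \rangle$ scales with it, so each pass compares the actual drop against the same fraction of the first-order prediction for the step currently under consideration.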
### Reconstructing a rigid subassembly