From aa08be4cbf874745f358cefcc53a781aa2853e6b Mon Sep 17 00:00:00 2001
From: Vectornaut <vectornaut@nobody@nowhere.net>
Date: Mon, 27 Jan 2025 07:55:08 +0000
Subject: [PATCH] Describe the backtracking routine

---
 Gram-matrix-parameterization.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Gram-matrix-parameterization.md b/Gram-matrix-parameterization.md
index 31ee5db..3e6407e 100644
--- a/Gram-matrix-parameterization.md
+++ b/Gram-matrix-parameterization.md
@@ -114,7 +114,7 @@ We minimize the loss function using a cheap imitation of Ueda and Yamashita's re
 
 The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs). (In the old Julia prototype of the engine, it's in [`Engine.jl`](../src/branch/main/engine-proto/gram-test/Engine.jl).) It works like this.
 
-1. Do Newton steps, as described below, until either the loss gets tolerably close to zero or we reach the maximum allowed number of steps.
+1. Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps.
    1. Find $-\operatorname{grad}(f)$, as described in "The first derivative of the loss function."
    2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
       * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
@@ -124,9 +124,15 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-
       * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.
    4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
       * For this write-up, we'll write the projection as $\mathcal{Q}$.
-   5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.
+   5. Find the base step $s_\text{base} \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$ and being orthogonal to the frozen subspace.
       * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
-   6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate.
+   6. Backtrack by reducing the step size, as described below and sketched in code after this list, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps.
+      1. Find the change in loss that we would get from the step $s$ under consideration. At the beginning of the loop, $s$ is set to $s_\text{base}$.
+      2. The definition of the derivative tells us that by making $s$ small enough, we should be able to bring the change in loss as close as we want to $-\langle -\operatorname{grad}(f), s \rangle$. This target is negative, since $\langle -\operatorname{grad}(f), s \rangle$ is a positive multiple of $\langle -\mathcal{Q}\operatorname{grad}(f), s_\text{base} \rangle = \langle H_\text{reg}(f)\,s_\text{base}, s_\text{base} \rangle > 0$, so a small enough step is guaranteed to reduce the loss.
+      3. If the change in loss is at least as negative as $-\alpha \langle -\operatorname{grad}(f), s \rangle$, where $\alpha \in (0, 1)$ is a parameter of the minimization routine, we're done: take the step $s$. In other words, we accept $s$ once the loss drops by at least $\alpha$ times the drop predicted by the linear approximation.
+         * The parameter $\alpha$ is passed to `realize_gram` as the argument `min_efficiency`.
+      4. Otherwise, multiply the step by the back-off parameter $\beta \in (0, 1)$ and return to the start of the loop.
+         * This parameter is passed to `realize_gram` as the argument `backoff`.
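+
+To make the acceptance test concrete, here is a minimal sketch of the backtracking loop in Rust. It is not the code in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs); the function name `backtrack`, the `loss_at` evaluator, and the use of `DMatrix<f64>` for configurations and steps are assumptions made for illustration, with `min_efficiency` and `backoff` playing the roles of $\alpha$ and $\beta$ as above.
+
+```rust
+use nalgebra::DMatrix;
+
+// Backtracking line search with a sufficient-decrease test: shrink the base
+// step by `backoff` until the change in loss is at least as negative as
+// `min_efficiency` times the change predicted by the linear approximation.
+fn backtrack(
+    loss_at: &dyn Fn(&DMatrix<f64>) -> f64, // hypothetical loss evaluator
+    config: &DMatrix<f64>,                  // current configuration
+    neg_grad: &DMatrix<f64>,                // -grad(f), already computed
+    base_step: &DMatrix<f64>,               // s_base, from the regularized Newton system
+    min_efficiency: f64,                    // the parameter alpha in (0, 1)
+    backoff: f64,                           // the parameter beta in (0, 1)
+    max_backoff_steps: usize,
+) -> Option<DMatrix<f64>> {
+    let loss = loss_at(config);
+    let base_rate = neg_grad.dot(base_step); // <-grad(f), s_base>, positive for a descent direction
+    let mut scale = 1.0_f64;
+    for _ in 0..max_backoff_steps {
+        let step = base_step * scale;                   // s = scale * s_base
+        let change = loss_at(&(config + &step)) - loss; // change in loss from taking s
+        if change <= -min_efficiency * scale * base_rate {
+            return Some(step); // the loss drops efficiently enough: take this step
+        }
+        scale *= backoff; // otherwise, back off and try a shorter step
+    }
+    None // fail out: too many backtracking steps
+}
+```
+
+Scaling the predicted drop by `scale` keeps the test meaningful as the step shrinks: a shorter step is only expected to reduce the loss proportionally less.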
 
 ### Reconstructing a rigid subassembly