Describe the backtracking routine

Vectornaut 2025-01-27 07:55:08 +00:00
parent 00d7d1b369
commit aa08be4cbf

@@ -114,7 +114,7 @@ We minimize the loss function using a cheap imitation of Ueda and Yamashita's re
The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs). (In the old Julia prototype of the engine, it's in [`Engine.jl`](../src/branch/main/engine-proto/gram-test/Engine.jl).) It works like this.
1. Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps.
   1. Find $-\operatorname{grad}(f)$, as described in "The first derivative of the loss function."
   2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
      * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
@@ -124,9 +124,15 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-
      * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.
   4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
      * For this write-up, we'll write the projection as $\mathcal{Q}$.
   5. Find the base step $s_\text{base} \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$ and being orthogonal to the frozen subspace. (A sketch of one way to compute $s_\text{base}$ appears after this list.)
      * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
   6. Backtrack by reducing the step size, as described below, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps. (A sketch of this loop also appears after the list.)
      1. Find the change in loss that we would get from the step $s$ under consideration. At the beginning of the loop, $s$ is set to $s_\text{base}$.
      2. The definition of the derivative tells us that by making $s$ small enough, we can bring the resulting drop in loss as close as we want, relative to the size of the step, to $\langle -\operatorname{grad}(f), s \rangle$.
      3. If the loss drops by at least $\alpha \langle -\operatorname{grad}(f), s \rangle$, where $\alpha \in (0, 1)$ is a parameter of the minimization routine, we're done: take the step $s$.
         * The parameter $\alpha$ is passed to `realize_gram` as the argument `min_efficiency`.
      4. Otherwise, multiply the step by the back-off parameter $\beta \in (0, 1)$ and go back to the start of the loop.
         * This parameter is passed to `realize_gram` as the argument `backoff`.
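
For concreteness, here is a minimal sketch of how the regularization, projection, and base-step solve could look, written with the `nalgebra` crate and treating each configuration as a flattened vector so that $H(f)$ becomes an ordinary square matrix. The function name, the `frozen` index list, and the `reg_scale` parameter are illustrative assumptions, not the actual `realize_gram` internals.

```rust
use nalgebra::{DMatrix, DVector};

/// Sketch: regularize the Hessian and solve for the base step. Here the
/// configuration is flattened into a vector, so the Hessian is an ordinary
/// square matrix and the frozen subspace is spanned by the coordinate
/// directions listed in `frozen`.
fn base_step(
    neg_grad: &DVector<f64>, // -grad(f), already computed
    hess: &DMatrix<f64>,     // H(f) in the standard basis
    frozen: &[usize],        // indices of the frozen coordinates
    reg_scale: f64,          // how much of |lambda_min| to shift the spectrum by
) -> Option<DVector<f64>> {
    // regularize: if the smallest eigenvalue is negative, shift the spectrum
    // up in proportion to its size. as noted above, this does nothing when
    // the smallest eigenvalue is exactly zero
    let min_eigval = hess.clone().symmetric_eigen().eigenvalues.min();
    let mut hess_reg = hess.clone();
    if min_eigval < 0.0 {
        let shift = reg_scale * (-min_eigval);
        for i in 0..hess_reg.nrows() {
            hess_reg[(i, i)] += shift;
        }
    }

    // "project" the system: zero out the frozen rows and columns of the
    // regularized Hessian, put 1 on the corresponding diagonal entries, and
    // zero the frozen entries of the right-hand side. the solution of the
    // resulting system satisfies -Q grad(f) = H_reg(f) s_base and is
    // orthogonal to the frozen subspace
    let mut rhs = neg_grad.clone();
    for &k in frozen {
        hess_reg.row_mut(k).fill(0.0);
        hess_reg.column_mut(k).fill(0.0);
        hess_reg[(k, k)] = 1.0;
        rhs[k] = 0.0;
    }

    // solve the linear system for the base step
    hess_reg.lu().solve(&rhs)
}
```

Replacing the frozen rows and columns with identity rows is just one way to encode both defining properties of $s_\text{base}$ in a single solvable linear system, in the spirit of the remark above about "projecting" the regularized Hessian.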
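
Here is a similar sketch of the backtracking loop, again treating configurations as flattened vectors. The parameter names `min_efficiency` and `backoff` match the `realize_gram` arguments mentioned above; the `loss` closure, the `max_backoff_steps` cap, and the function name are stand-ins for illustration.

```rust
use nalgebra::DVector;

/// Sketch: backtracking line search. Starting from the base step, keep
/// shrinking the step until the loss drops by at least `min_efficiency`
/// times the first-order prediction <-grad(f), s>.
fn backtrack(
    loss: impl Fn(&DVector<f64>) -> f64, // loss as a function of the configuration
    config: &DVector<f64>,               // current configuration, flattened
    neg_grad: &DVector<f64>,             // -grad(f) at `config`
    base_step: &DVector<f64>,            // s_base from the Newton system
    min_efficiency: f64,                 // alpha in (0, 1)
    backoff: f64,                        // beta in (0, 1)
    max_backoff_steps: usize,
) -> Option<DVector<f64>> {
    let loss_here = loss(config);
    let mut step = base_step.clone();
    for _ in 0..max_backoff_steps {
        // predicted drop in loss, to first order, for the current step
        let predicted_drop = neg_grad.dot(&step);
        // actual drop in loss if we take the step
        let actual_drop = loss_here - loss(&(config + &step));
        // accept the step if it realizes a good enough fraction of the
        // predicted drop
        if actual_drop >= min_efficiency * predicted_drop {
            return Some(step);
        }
        // otherwise, back off and try a smaller step
        step *= backoff;
    }
    None // fail out: too many backtracking steps
}
```

Note that when the step is scaled by `backoff`, the predicted drop $\langle -\operatorname{grad}(f), s \rangle$ scales with it, so each pass compares the actual drop against the same fraction of the first-order prediction for the step currently under consideration.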
### Reconstructing a rigid subassembly