Describe the backtracking routine
parent 00d7d1b369
commit aa08be4cbf
1 changed file with 9 additions and 3 deletions

@@ -114,7 +114,7 @@ We minimize the loss function using a cheap imitation of Ueda and Yamashita's re

The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs). (In the old Julia prototype of the engine, it's in [`Engine.jl`](../src/branch/main/engine-proto/gram-test/Engine.jl).) It works like this.

1. Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps. (The outer loop is sketched after this list.)
    1. Find $-\operatorname{grad}(f)$, as described in "The first derivative of the loss function."
    2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
        * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$. (This convention is spelled out after this list.)

@@ -124,9 +124,15 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-

        * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem. (The shape of such a term is sketched after this list.)
    4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
        * For this write-up, we'll write the projection as $\mathcal{Q}$.
    5. Find the base step $s_\text{base} \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$ and being orthogonal to the frozen subspace.
        * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties. (This construction is sketched after this list.)
    6. Backtrack by reducing the step size, as described below, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps. (The backtracking loop is sketched after this list.)
        1. Find the decrease in loss that we would get from the step $s$ under consideration. At the beginning of the loop, $s$ is set to $s_\text{base}$.
        2. The definition of the derivative tells us that by making $s$ small enough, we can bring the decrease in loss as close as we want to $\langle -\operatorname{grad}(f), s \rangle$.
        3. If the decrease in loss is at least $\alpha \langle -\operatorname{grad}(f), s \rangle$, where $\alpha \in (0, 1)$ is a parameter of the minimization routine, we're done: take the step $s$.
            * The parameter $\alpha$ is passed to `realize_gram` as the argument `min_efficiency`.
        4. Otherwise, multiply the step by the back-off parameter $\beta \in (0, 1)$.
            * This parameter is passed to `realize_gram` as the argument `backoff`.

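To fix ideas, here is a minimal sketch of the outer descent loop from step 1. The names (`descend`, `newton_step`, `tol`, `max_descent_steps`) and the toy usage are illustrative placeholders, not the interface of `realize_gram` in `engine.rs`.

```rust
/// Illustrative sketch of the outer descent loop (step 1): keep taking Newton
/// steps until the loss is tolerably close to zero, and fail out if we reach
/// the maximum allowed number of descent steps. Placeholder names throughout.
fn descend<Config>(
    mut config: Config,
    loss: impl Fn(&Config) -> f64,
    newton_step: impl Fn(Config) -> Config,
    tol: f64,
    max_descent_steps: usize,
) -> Result<Config, &'static str> {
    for _ in 0..max_descent_steps {
        // Stop as soon as the loss is tolerably close to zero.
        if loss(&config) < tol {
            return Ok(config);
        }
        // One Newton step: negative gradient, Hessian, regularization,
        // projection, base step, and backtracking, as in sub-steps 1-6 above.
        config = newton_step(config);
    }
    // Fail out if we reach the maximum allowed number of descent steps.
    Err("reached the maximum allowed number of descent steps")
}

fn main() {
    // Toy usage: minimize f(x) = x^2, with a damped "Newton step" x -> x / 2.
    let result = descend(1.0_f64, |x| x * x, |x| x / 2.0, 1e-12, 100);
    assert!(result.is_ok());
}
```
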
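For the bullet about the standard basis: this is just the usual way of writing a linear map on matrix space as a matrix, recalled here without any claim about the exact flattening order used in `engine.rs`. The standard basis for $\operatorname{End}(\mathbb{R}^n)$ consists of the matrix units

$$E_{ij} := e_i e_j^\top, \qquad 1 \le i, j \le n,$$

so an element of $\operatorname{End}(\mathbb{R}^n)$ has coordinates given by its entries, and a linear operator on $\operatorname{End}(\mathbb{R}^n)$, like $H(f) = d\operatorname{grad}(f)$, is stored as an $n^2 \times n^2$ matrix whose columns are the images $H(f)\,E_{ij}$ written out in those same coordinates.
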
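For orientation, regularizations in the spirit of Ueda and Yamashita have roughly the shape below. The constants $c_1, c_2$ and the exponent $\delta$ are illustrative assumptions, and this is not claimed to be the exact formula used by the cheap imitation in `engine.rs`:

$$H_\text{reg}(f) := H(f) + \Bigl(c_1 \max(0, -\lambda_\text{min}) + c_2\,\|\operatorname{grad}(f)\|^\delta\Bigr) I, \qquad c_1 > 1,\ c_2 > 0,\ \delta > 0.$$

The first term shifts the spectrum upward when $\lambda_\text{min} < 0$; the second is the extra term proportional to a power of $\|\operatorname{grad}(f)\|$, which keeps $H_\text{reg}(f)$ positive-definite even when $\lambda_\text{min} = 0$, as long as we're not already at a critical point.
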
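To make steps 4 and 5 concrete, here is one standard way to build the kind of "projected" operator described above, working on flattened coordinate vectors. This sketch assumes nalgebra-style types and an explicit list of frozen coordinate indices; it is not the actual helper in `engine.rs`.

```rust
use nalgebra::{DMatrix, DVector};

/// Illustrative sketch of steps 4-5: project the negative gradient, rework the
/// regularized Hessian so that one linear solve also enforces "orthogonal to
/// the frozen subspace," and solve for the base step. Not the `engine.rs` code.
fn base_step(
    neg_grad: &DVector<f64>,  // -grad(f), written in flattened coordinates
    hess_reg: &DMatrix<f64>,  // H_reg(f) in the same coordinates
    frozen: &[usize],         // indices of the frozen coordinates
) -> Option<DVector<f64>> {
    let n = neg_grad.len();
    let mut rhs = neg_grad.clone();
    let mut op = hess_reg.clone();
    for &k in frozen {
        // Project the right-hand side: -Q grad(f) has zero frozen coordinates.
        rhs[k] = 0.0;
        // Rewrite row k and column k so that equation k reads s_k = 0 and the
        // frozen coordinate drops out of every other equation.
        for j in 0..n {
            op[(k, j)] = 0.0;
            op[(j, k)] = 0.0;
        }
        op[(k, k)] = 1.0;
    }
    // Solve H_reg(f) * s_base = -Q grad(f), restricted to unfrozen coordinates.
    op.lu().solve(&rhs)
}
```

Rewriting row $k$ so that the $k$-th equation reads $s_k = 0$ is what lets a single linear solve express both properties at once: the solution is orthogonal to the frozen subspace, and on the remaining coordinates it satisfies $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$.
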
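Finally, here is a minimal sketch of the backtracking loop from step 6, with $\alpha$ and $\beta$ appearing as `min_efficiency` and `backoff` to echo the argument names mentioned above. The signature and helper closures are illustrative, not the interface of `realize_gram`.

```rust
use nalgebra::DVector;

/// Illustrative sketch of the backtracking loop (step 6). Shrink the step by
/// the factor `backoff` until the loss decreases at a good fraction of the
/// rate predicted by the first derivative; fail out after too many tries.
fn backtrack(
    config: &DVector<f64>,
    loss: impl Fn(&DVector<f64>) -> f64,
    neg_grad: &DVector<f64>,   // -grad(f) at `config`
    base_step: &DVector<f64>,  // s_base from step 5
    min_efficiency: f64,       // alpha, in (0, 1)
    backoff: f64,              // beta, in (0, 1)
    max_backoff_steps: usize,
) -> Option<DVector<f64>> {
    let loss_here = loss(config);
    let mut step = base_step.clone();
    for _ in 0..max_backoff_steps {
        // Decrease in loss we would get from the step under consideration.
        let decrease = loss_here - loss(&(config + &step));
        // Accept the step once the decrease is at least alpha * <-grad(f), s>.
        if decrease >= min_efficiency * neg_grad.dot(&step) {
            return Some(config + &step);
        }
        // Otherwise, multiply the step by the back-off parameter beta.
        step *= backoff;
    }
    // Fail out: maximum allowed number of backtracking steps reached.
    None
}
```

Because each failed pass multiplies the step by $\beta$, the candidate steps are $s_\text{base}, \beta s_\text{base}, \beta^2 s_\text{base}, \dots$ until one of them reduces the loss efficiently enough.
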
### Reconstructing a rigid subassembly