diff --git a/Gram-matrix-parameterization.md b/Gram-matrix-parameterization.md
index 31ee5db..3e6407e 100644
--- a/Gram-matrix-parameterization.md
+++ b/Gram-matrix-parameterization.md
@@ -114,7 +114,7 @@ We minimize the loss function using a cheap imitation of Ueda and Yamashita's re
 
 The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto/src/engine.rs). (In the old Julia prototype of the engine, it's in [`Engine.jl`](../src/branch/main/engine-proto/gram-test/Engine.jl).) It works like this.
 
-1. Do Newton steps, as described below, until either the loss gets tolerably close to zero or we reach the maximum allowed number of steps.
+1. Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps.
    1. Find $-\operatorname{grad}(f)$, as described in "The first derivative of the loss function."
    2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
       * Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
@@ -124,9 +124,15 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-
       * When $\lambda_\text{min}$ is exactly zero, our regularization doesn't do anything, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem.
    4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
       * For this write-up, we'll write the projection as $\mathcal{Q}$.
-   5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.
+   5. Find the base step $s_\text{base} \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base}$ and being orthogonal to the frozen subspace.
       * When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
-   6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate.
+   6. Backtrack by reducing the step size, as described below, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps.
+      1. Find the change in loss that we would get from the step $s$ under consideration. At the beginning of the loop, $s$ is set to $s_\text{base}$.
+      2. The definition of the derivative tells us that by making $s$ small enough, we should be able to bring the change in loss as close as we want to $\langle \operatorname{grad}(f), s \rangle = -\langle -\operatorname{grad}(f), s \rangle$.
+      3. If the change in loss is more negative than $\alpha \langle \operatorname{grad}(f), s \rangle$, where $\alpha \in (0, 1)$ is a parameter of the minimization routine, we're done: take the step $s$. In other words, take the step once the loss drops by at least the fraction $\alpha$ of the drop predicted by the derivative.
+         * The parameter $\alpha$ is passed to `realize_gram` as the argument `min_efficiency`.
+      4. Otherwise, multiply the step by the back-off parameter $\beta \in (0, 1)$.
+         * This parameter is passed to `realize_gram` as the argument `backoff`.
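For orientation, here is a minimal, self-contained sketch of the descent loop that the added steps describe: a regularized Newton step followed by backtracking until the loss drops by at least the fraction $\alpha$ of the drop predicted by the derivative, backing off by $\beta$ each time the test fails. This is not taken from `engine.rs`: the toy two-variable loss, the parameter values, the regularization shift, and all helper names are invented for illustration, and the projection onto the complement of the frozen subspace is omitted. Only the acceptance test, the back-off rule, and the argument names `min_efficiency` and `backoff` come from the text above.

```rust
// Toy sketch of regularized Newton descent with backtracking (not the engine's code).

type Vec2 = [f64; 2];
type Mat2 = [[f64; 2]; 2]; // row-major symmetric 2x2 matrix

// Toy loss with minimum value zero: f(x, y) = (x^2 + y^2 - 1)^2 + (x - y)^2 / 2.
fn loss(v: Vec2) -> f64 {
    let r = v[0] * v[0] + v[1] * v[1] - 1.0;
    r * r + 0.5 * (v[0] - v[1]) * (v[0] - v[1])
}

// Negative gradient -grad(f).
fn neg_grad(v: Vec2) -> Vec2 {
    let r = v[0] * v[0] + v[1] * v[1] - 1.0;
    [
        -(4.0 * r * v[0] + (v[0] - v[1])),
        -(4.0 * r * v[1] - (v[0] - v[1])),
    ]
}

// Hessian H(f).
fn hessian(v: Vec2) -> Mat2 {
    let r = v[0] * v[0] + v[1] * v[1] - 1.0;
    [
        [4.0 * r + 8.0 * v[0] * v[0] + 1.0, 8.0 * v[0] * v[1] - 1.0],
        [8.0 * v[0] * v[1] - 1.0, 4.0 * r + 8.0 * v[1] * v[1] + 1.0],
    ]
}

// One simple regularization (not necessarily the engine's scheme): if the smallest
// eigenvalue lambda_min is below eps, shift by (eps - lambda_min) I so the result
// is positive-definite.
fn regularize(h: Mat2, eps: f64) -> Mat2 {
    let mean = 0.5 * (h[0][0] + h[1][1]);
    let det = h[0][0] * h[1][1] - h[0][1] * h[1][0];
    let lambda_min = mean - (mean * mean - det).sqrt();
    if lambda_min >= eps {
        h
    } else {
        let shift = eps - lambda_min;
        [[h[0][0] + shift, h[0][1]], [h[1][0], h[1][1] + shift]]
    }
}

// Solve H_reg * s = -grad(f) for the base step (2x2 case, Cramer's rule).
fn base_step(h: Mat2, ng: Vec2) -> Vec2 {
    let det = h[0][0] * h[1][1] - h[0][1] * h[1][0];
    [
        (ng[0] * h[1][1] - ng[1] * h[0][1]) / det,
        (ng[1] * h[0][0] - ng[0] * h[1][0]) / det,
    ]
}

fn dot(a: Vec2, b: Vec2) -> f64 {
    a[0] * b[0] + a[1] * b[1]
}

fn main() {
    let (min_efficiency, backoff) = (0.5, 0.9); // alpha and beta
    let (tol, max_descent_steps, max_backoff_steps) = (1e-12, 100, 50);
    let mut v: Vec2 = [2.0, 0.0];

    'descent: for _ in 0..max_descent_steps {
        let f = loss(v);
        if f < tol {
            break; // the loss is tolerably close to zero
        }
        let ng = neg_grad(v);
        let h_reg = regularize(hessian(v), 1e-9);
        let s_base = base_step(h_reg, ng);
        let predicted = dot(ng, s_base); // <-grad(f), s_base>, positive for a descent direction

        // Backtrack: accept the step once the actual drop in loss is at least
        // min_efficiency times the drop predicted by the derivative for that step.
        let mut rate = 1.0;
        for _ in 0..max_backoff_steps {
            let trial = [v[0] + rate * s_base[0], v[1] + rate * s_base[1]];
            if f - loss(trial) >= min_efficiency * rate * predicted {
                v = trial;
                continue 'descent;
            }
            rate *= backoff; // multiply the step by the back-off parameter
        }
        panic!("backtracking failed to find an acceptable step"); // fail out
    }
    println!("reached ({:.6}, {:.6}) with loss {:.3e}", v[0], v[1], loss(v));
}
```

One consequence of backing off by a constant factor, rather than re-solving the Newton system, is that each backtracking iteration costs only a single loss evaluation.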
### Reconstructing a rigid subassembly