Bring up subtleties of regularization and frozen entries

Vectornaut 2025-01-27 07:29:36 +00:00
parent bdd7926ce2
commit 00d7d1b369

@@ -119,11 +119,14 @@ The minimization routine is implemented in [`engine.rs`](../src/branch/main/app-proto-
2. Find the Hessian $H(f) := d\operatorname{grad}(f)$, as described in "The second derivative of the loss function."
* Recall that we express $H(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$.
3. If the Hessian isn't positive-definite, make it positive-definite by adding $-c \lambda_\text{min}$ times the identity, where $\lambda_\text{min}$ is its lowest eigenvalue and $c > 1$ is a parameter of the minimization routine. In other words, find the regularized Hessian
$$H_\text{reg}(f) := H(f) + \begin{cases}0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} & \lambda_\text{min} \le 0 \end{cases}\,I.$$
* The parameter $c$ is passed to `realize_gram` as the argument `reg_scale`.
* When $\lambda_\text{min}$ is exactly zero, our regularization has no effect, so $H_\text{reg}(f)$ isn't actually positive-definite. Ueda and Yamashita add an extra regularization term proportional to a power of $\|\operatorname{grad}(f)\|$, which takes care of this problem, but we don't bother. (The first sketch after this list illustrates our regularization.)
4. Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of $\operatorname{End}(\mathbb{R}^n)$.
* For this write-up, we'll write the projection as $\mathcal{Q}$.
5. Find the base step $u \in \operatorname{End}(\mathbb{R}^n)$, which is defined by two properties: satisfying the equation $-\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,u$ and being orthogonal to the frozen subspace.
* When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties. (The second sketch after this list shows one way to build such an operator.)
6. Backtrack by reducing the step size until we find a step that reduces the loss at a good fraction of the maximum possible rate. (See the last sketch after this list.)
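
To make step 3 concrete, here's a minimal sketch of the eigenvalue-shift regularization, written with `nalgebra` matrix types since the engine works with those. The function name `regularize_hessian` is made up for illustration; only `reg_scale` comes from the write-up above.

```rust
use nalgebra::DMatrix;

/// Minimal sketch of the eigenvalue-shift regularization in step 3.
/// `hess` plays the role of H(f), expressed as a matrix in the standard
/// basis for End(R^n); `reg_scale` is the parameter c > 1.
fn regularize_hessian(hess: DMatrix<f64>, reg_scale: f64) -> DMatrix<f64> {
    let dim = hess.nrows();
    // the Hessian of the loss function is symmetric, so its eigenvalues
    // are real and a symmetric eigendecomposition applies
    let min_eigval = hess.clone().symmetric_eigen().eigenvalues.min();
    if min_eigval > 0.0 {
        // already positive-definite: no regularization needed
        hess
    } else {
        // shift the spectrum by -c * lambda_min, raising the smallest
        // eigenvalue to (1 - c) * lambda_min >= 0. when lambda_min is
        // exactly zero, this shift does nothing, as the bullet point
        // under step 3 warns
        hess + DMatrix::<f64>::identity(dim, dim) * (-reg_scale * min_eigval)
    }
}
```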
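
Here's one way steps 4 and 5 might look in code. It's a sketch under an assumption that goes beyond the write-up: the frozen subspace is spanned by standard basis vectors of $\operatorname{End}(\mathbb{R}^n)$ (one per frozen entry), so that $\mathcal{Q}$ just zeroes out the frozen coordinates. The name `base_step` and the `frozen` index list are illustrative, not the actual `engine.rs` API.

```rust
use nalgebra::{DMatrix, DVector};

/// Illustrative sketch of steps 4-5. `frozen` lists the indices of the
/// frozen coordinates, with everything flattened into the standard
/// basis for End(R^n).
fn base_step(
    hess_reg: &DMatrix<f64>,
    neg_grad: &DVector<f64>,
    frozen: &[usize],
) -> Option<DVector<f64>> {
    // project the negative gradient: zero out the frozen coordinates
    let mut rhs = neg_grad.clone();
    for &k in frozen {
        rhs[k] = 0.0;
    }
    // "project" the regularized Hessian: replace each frozen row and
    // column with the corresponding row and column of the identity.
    // the resulting operator lets one linear solve express both
    // properties of the base step: the unfrozen rows impose the Newton
    // equation on the unfrozen coordinates, and the frozen rows impose
    // u_k = 0, which is orthogonality to the frozen subspace
    let mut hess_proj = hess_reg.clone();
    for &k in frozen {
        hess_proj.row_mut(k).fill(0.0);
        hess_proj.column_mut(k).fill(0.0);
        hess_proj[(k, k)] = 1.0;
    }
    // solve for the base step u
    hess_proj.lu().solve(&rhs)
}
```

Overwriting the frozen rows and columns with identity entries keeps the system square and symmetric, so a single factorization handles both conditions at once.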
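
Finally, a rough sketch of the backtracking in step 6, phrased as a standard sufficient-decrease test: a step is accepted once the actual drop in the loss is at least `min_efficiency` times the drop predicted by the first-order model. The parameters `min_efficiency`, `backoff`, and `max_tries` are assumptions of this sketch, not taken from the write-up.

```rust
use nalgebra::DVector;

/// Rough sketch of the backtracking in step 6. `loss` evaluates the
/// loss at the current configuration displaced by a given step.
fn backtrack(
    loss: impl Fn(&DVector<f64>) -> f64,
    base_loss: f64,           // the loss at the current configuration
    neg_grad: &DVector<f64>,  // -grad(f), projected as in step 4
    base_step: &DVector<f64>, // the base step u from step 5
    min_efficiency: f64,      // accept this fraction of the ideal rate
    backoff: f64,             // step-size reduction factor, in (0, 1)
    max_tries: usize,
) -> Option<DVector<f64>> {
    // the first-order model predicts the loss drops at this rate per
    // unit of step scale
    let ideal_rate = neg_grad.dot(base_step);
    let mut scale = 1.0;
    for _ in 0..max_tries {
        let step = base_step * scale;
        // accept the step once the actual decrease is a good fraction
        // of the decrease predicted by the first-order model
        if base_loss - loss(&step) >= min_efficiency * scale * ideal_rate {
            return Some(step);
        }
        scale *= backoff;
    }
    None // no acceptable step found within `max_tries` attempts
}
```

When the regularized Hessian is positive-definite, `ideal_rate` is positive, so the test above is a genuine decrease condition.
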
### Reconstructing a rigid subassembly