Table of Contents
- Linear representation of elements and constraints
- Reconstructing an assembly from its Gram matrix
- Reconstruction as low-rank factorization
- Overview
- The Frobenius product
- The loss function
- The first derivative of the loss function
- The second derivative of the loss function
- Finding minima
- Other minimization methods
- Reconstructing a rigid subassembly
- Consistency constraints on the Gram matrix
- Choosing dependent variables
Linear representation of elements and constraints
Elements as vectors
Take a 5d vector space V with a bilinear form (\_\!\_, \_\!\_) of signature ++++-, which we'll call the Lorentz form. In inversive coordinates, points and generalized spheres are represented, respectively, by timelike and spacelike vectors in V. If we normalize these vectors to pseudo-length \pm 1, and choose a vector on the lightlike 1d subspace representing the point at infinity, a lot of the constraints we care about can be expressed by fixing the Lorentz products between vectors.
Constraints as Gram matrix entries
The vectors a_1, \ldots, a_n \in V representing the elements of our assembly can be encoded in the linear map A \colon \mathbb{R}^n \to V with Ae_j = a_j for each of the standard basis vectors e_1, \ldots, e_n. We can then express constraints by fixing elements of the Gram matrix G = A^\dagger A, where \dagger is the adjoint with respect to the inner product \langle\_\!\_, \_\!\_\rangle on \mathbb{R}^n and the Lorentz form. Since the inner product and the Lorentz form are both non-degenerate, the rank of G matches the dimension of the span of a_1, \ldots, a_n.
The symmetric bilinear form \langle\_\!\_, G\_\!\_\rangle is the pullback of the Lorentz form along A:
\begin{align*} \langle\_\!\_, G\_\!\_\rangle & = \langle\_\!\_, A^\dagger A\_\!\_\rangle \\ & = (A\_\!\_, A\_\!\_). \end{align*}
Coordinate computations
When we choose a basis for V, represented as an identification V \cong \mathbb{R}^5, the Lorentz form gets encoded in the matrix Q \colon \mathbb{R}^5 \to \mathbb{R}^5 defined by the relation (\_\!\_, \_\!\_) = \langle\_\!\_, Q\_\!\_\rangle. Using the fact that A^\dagger = A^\top Q, where ^\top is the adjoint with respect to the inner product \langle\_\!\_, \_\!\_\rangle on both \mathbb{R}^n and V \cong \mathbb{R}^5, we can express the Gram matrix G = A^\top Q A in terms of positive-signature matrix operations. This expression is convenient for programming, because linear algebra packages tend to be written for positive signature.
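For concreteness, here's a minimal Julia sketch of this computation (standalone, with made-up placeholder data; it's not code from the realization routine):

```julia
using LinearAlgebra

# Lorentz form of signature ++++- in the chosen basis for V ≅ R^5
Q = Diagonal([1.0, 1.0, 1.0, 1.0, -1.0])

# the columns of A are the vectors a_1, …, a_n representing the elements;
# here we fill n = 3 columns with placeholder data
A = randn(5, 3)

# Gram matrix G = A^⊤ Q A, computed with positive-signature operations only
G = A' * Q * A
```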
Reconstructing an assembly from its Gram matrix
Reconstruction as low-rank factorization
Overview
We saw above that when we represent assemblies as linear maps A \colon \mathbb{R}^n \to V, many constraints can be expressed by fixing entries of the Gram matrix A^\dagger A, which can be written more conveniently as A^\top Q A when we choose an identification V \cong \mathbb{R}^5. Reconstructing A from the known entries of its Gram matrix is an example of a low-rank factorization problem. In this section, we'll express the problem in a way that's useful for numerical computations.
We'll find a non-negative function f on the space of linear maps \mathbb{R}^n \to V which vanishes at the matrices representing assemblies that satisfy the constraints. Finding a global minimum of the loss function f will be equivalent to finding an assembly that satisfies the constraints. The loss function will be a quadratic polynomial in matrix entries, and its first and second derivatives will be straightforward to compute using a typical linear algebra package. This makes it easy to search for minima using Newton's method with backtracking.
The Frobenius product
For any dimensions n and m, the inner products on \mathbb{R}^n and \mathbb{R}^m induce an inner product \langle\!\langle X, Y \rangle\!\rangle = \operatorname{tr}(X^\top Y) on the space of matrices \mathbb{R}^n \to \mathbb{R}^m. We'll call this inner product the Frobenius product, because the associated norm, defined by \|X\|^2 = \langle\!\langle X, X \rangle\!\rangle, is called the Frobenius norm. The entry-wise dot product of X and Y gives \langle\!\langle X, Y \rangle\!\rangle, so the sum of the squares of the entries of X gives \|X\|^2. More generally, in any orthonormal basis v_1, \ldots, v_n for \mathbb{R}^n we have
\langle\!\langle X, Y \rangle\!\rangle = \sum_{k \in [n]} \langle X v_k, Y v_k \rangle.
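As a quick numerical sanity check (a throwaway sketch, not part of the realization routine), all three descriptions of the Frobenius product agree:

```julia
using LinearAlgebra

X = randn(4, 3)
Y = randn(4, 3)

trace_form = tr(X' * Y)                               # ⟨⟨X, Y⟩⟩ as a trace
entry_form = sum(X .* Y)                              # entry-wise dot product
basis_form = sum(dot(X[:, k], Y[:, k]) for k in 1:3)  # sum over the standard basis of R^3
```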
The loss function
Consider the matrices \mathbb{R}^n \to \mathbb{R}^n whose entries vanish at the indices where the Gram matrix is unconstrained. They form a subspace C of the matrix space \operatorname{End}(\mathbb{R}^n). Let \mathcal{P} \colon \operatorname{End}(\mathbb{R}^n) \to C be the orthogonal projection with respect to the Frobenius product. The constrained entries of the Gram matrix can be expressed uniquely as a matrix G \in C, and the linear maps A \colon \mathbb{R}^n \to V that satisfy the constraints form the zero set of the non-negative function
f = \|G - \mathcal{P}(A^\top Q A)\|^2
on \operatorname{Hom}(\mathbb{R}^n, V). Finding a global minimum of the loss function f is thus equivalent to finding an assembly that satisfies the constraints.
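Here's a sketch of how the loss function might be computed in Julia. The helper name proj_to_constrained and the representation of the constraints as a list of index pairs are assumptions made for this illustration, not the conventions used in engine.rs or Engine.jl:

```julia
using LinearAlgebra

# orthogonal projection onto the subspace C: keep the constrained entries, zero the rest
function proj_to_constrained(X, constrained_indices)
    P = zero(X)
    for (i, j) in constrained_indices
        P[i, j] = X[i, j]
    end
    P
end

# loss f = ‖G - P(A^⊤ Q A)‖², where G holds the constrained Gram matrix entries
function loss(A, Q, G, constrained_indices)
    Δ = G - proj_to_constrained(A' * Q * A, constrained_indices)
    sum(abs2, Δ)    # squared Frobenius norm
end
```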
The first derivative of the loss function
Write the loss function as
\begin{align*}
f & = \|\Delta\|^2 \\
& = \operatorname{tr}(\Delta^\top \Delta),
\end{align*}
where \Delta = G - \mathcal{P}(A^\top Q A). Differentiate both sides and simplify the result using the transpose-invariance of the trace:
\begin{align*}
df & = \operatorname{tr}(d\Delta^\top \Delta) + \operatorname{tr}(\Delta^\top d\Delta) \\
& = 2\operatorname{tr}(\Delta^\top d\Delta).
\end{align*}
To compute d\Delta, it will be helpful to write the projection operator \mathcal{P} more explicitly. Let \mathcal{C} be the set of indices where the Gram matrix is constrained. We can express C as the span of the standard basis matrices \{E_{ij}\}_{(i, j) \in \mathcal{C}}. Observing that E_{ij} X^\top E_{ij} = X_{ij} E_{ij} for any matrix X, we can do orthogonal projection onto C using standard basis matrices:
\mathcal{P}(X) = \sum_{(i, j) \in \mathcal{C}} E_{ij} X^\top E_{ij}.
It follows that
\begin{align*}
d\mathcal{P}(X) & = \sum_{(i, j) \in \mathcal{C}} E_{ij}\,dX^\top E_{ij} \\
& = \mathcal{P}(dX).
\end{align*}
Since the subspace C is transpose-invariant, we also have
\mathcal{P}(X^\top) = \mathcal{P}(X)^\top.
We can now see that
\begin{align*}
d\Delta & = -\mathcal{P}(dA^\top Q A + A^\top Q\,dA) \\
& = -\big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big].
\end{align*}
Plugging this into our formula for df, and recalling that \Delta is symmetric, we get
\begin{align*}
df & = 2\operatorname{tr}(-\Delta^\top \big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big]) \\
& = -2\operatorname{tr}\left(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top + \Delta^\top \mathcal{P}(A^\top Q\,dA)\right) \\
& = -2\left[\operatorname{tr}\big(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top\big)
+ \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big)\right] \\
& = -4 \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big),
\end{align*}
using the transpose-invariance and cyclic property of the trace in the final step. Writing the projection in terms of standard basis matrices, we learn that
\begin{align*}
df & = -4 \operatorname{tr}\left(\Delta^\top \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} (A^\top Q\,dA)^\top E_{ij} \right] \right) \\
& = -4 \operatorname{tr}\left(\sum_{(i, j) \in \mathcal{C}} \Delta^\top E_{ij}\,dA^\top Q A E_{ij}\right) \\
& = -4 \operatorname{tr}\left(dA^\top Q A \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} \Delta^\top E_{ij} \right]\right) \\
& = -4 \operatorname{tr}\big(dA^\top Q A\,\mathcal{P}(\Delta)\big) \\
& = \langle\!\langle dA,\,-4 Q A\,\mathcal{P}(\Delta) \rangle\!\rangle.
\end{align*}
From here, we get a nice matrix expression for the negative gradient of the loss function:
-\operatorname{grad}(f) = 4 Q A\,\mathcal{P}(\Delta).
This matrix is stored as neg_grad in the Rust and Julia implementations of the realization routine.
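In the same illustrative setup as the loss-function sketch above (with the hypothetical proj_to_constrained helper), the negative gradient is a one-liner:

```julia
# -grad(f) = 4 Q A P(Δ), with Δ = G - P(A^⊤ Q A)
function neg_grad(A, Q, G, constrained_indices)
    Δ = G - proj_to_constrained(A' * Q * A, constrained_indices)
    4 * Q * A * proj_to_constrained(Δ, constrained_indices)
end
```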
The second derivative of the loss function
Recalling that
-d\Delta = \mathcal{P}(dA^\top Q A + A^\top Q\,dA),
we can express the derivative of \operatorname{grad}(f) as
\begin{align*}
d\operatorname{grad}(f) & = -4 Q\,dA\,\mathcal{P}(\Delta) - 4 Q A\,\mathcal{P}(d\Delta) \\
& = 4 Q\big[{-dA}\,\mathcal{P}(\Delta) + A\,\mathcal{P}(-d\Delta)\big].
\end{align*}
In the Rust and Julia implementations of the realization routine, we express d\operatorname{grad}(f) as a matrix in the standard basis for
\operatorname{End}(\mathbb{R}^n). We apply the differential d\operatorname{grad}(f) to each standard basis matrix E_{ij} by setting the value of the matrix-valued 1-form dA to E_{ij}.
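Here's a sketch of that assembly process in Julia, building on the hypothetical helpers above. It indexes the Hessian by the standard basis of the space of A-matrices, vectorized column by column; the actual indexing conventions in engine.rs and Engine.jl may differ:

```julia
# Assemble H(f) = d grad(f) as a matrix by applying the differential to each
# standard basis matrix E_ij, i.e. by setting dA = E_ij and reading off d grad(f).
function hessian(A, Q, G, constrained_indices)
    dim_V, n = size(A)
    Δ = G - proj_to_constrained(A' * Q * A, constrained_indices)
    P_Δ = proj_to_constrained(Δ, constrained_indices)
    H = zeros(dim_V * n, dim_V * n)
    for col in 1:n, row in 1:dim_V
        dA = zeros(dim_V, n)
        dA[row, col] = 1.0    # the standard basis matrix E_(row, col)
        neg_dΔ = proj_to_constrained(dA' * Q * A + A' * Q * dA, constrained_indices)
        d_grad = 4 * Q * (-dA * P_Δ + A * neg_dΔ)    # d grad(f) evaluated at dA = E_(row, col)
        H[:, dim_V * (col - 1) + row] = vec(d_grad)
    end
    H
end
```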
Finding minima
We minimize the loss function using a cheap imitation of Ueda and Yamashita's uniformly regularized Newton's method with backtracking [UY]. The minimization routine is implemented in engine.rs. (In the old Julia prototype of the engine, it's in Engine.jl.) It works like this.
- Do Newton steps, as described below, until the loss gets tolerably close to zero. Fail out if we reach the maximum allowed number of descent steps. (A bare-bones sketch of the whole loop appears after this list.)
  - Find -\operatorname{grad}(f), as described in "The first derivative of the loss function."
  - Find the Hessian H(f) := d\operatorname{grad}(f), as described in "The second derivative of the loss function."
    - Recall that we express H(f) as a matrix in the standard basis for \operatorname{End}(\mathbb{R}^n).
  - If the Hessian isn't positive-definite, make it positive-definite by adding -c \lambda_\text{min} times the identity, where \lambda_\text{min} is its lowest eigenvalue and c > 1 is a parameter of the minimization routine. In other words, find the regularized Hessian H_\text{reg}(f) := H(f) + \begin{cases} 0 & \lambda_\text{min} > 0 \\ -c \lambda_\text{min} & \lambda_\text{min} \le 0 \end{cases}.
    - The parameter c is passed to realize_gram as the argument reg_scale.
    - When \lambda_\text{min} is exactly zero, our regularization doesn't do anything, so H_\text{reg}(f) isn't actually positive-definite. Ueda and Yamashita add an extra regularization term that's proportional to a power of \|\operatorname{grad}(f)\|, which takes care of this problem.
  - Project the negative gradient and the regularized Hessian onto the orthogonal complement of the frozen subspace of \operatorname{End}(\mathbb{R}^n).
    - For this write-up, we'll write the projection as \mathcal{Q}.
  - Find the base step s_\text{base} \in \operatorname{End}(\mathbb{R}^n), which is defined by two properties: satisfying the equation -\mathcal{Q} \operatorname{grad}(f) = H_\text{reg}(f)\,s_\text{base} and being orthogonal to the frozen subspace.
    - When we say in the code that we're "projecting" the regularized Hessian, we're really turning it into an operator that can be used to express both properties.
  - Backtrack by reducing the step size, as described below, until we find a step that reduces the loss at a good fraction of the maximum possible rate. Fail out if we reach the maximum allowed number of backtracking steps.
    - Find the decrease in loss that we would get from the step s under consideration. At the beginning of the loop, s is set to s_\text{base}.
    - The definition of the derivative tells us that by making s small enough, we can bring the decrease in loss as close as we want to \langle -\operatorname{grad}(f), s \rangle.
    - If the decrease in loss is at least \alpha \langle -\operatorname{grad}(f), s \rangle, where \alpha \in (0, 1) is a parameter of the minimization routine, we're done: take the step s.
      - The parameter \alpha is passed to realize_gram as the argument min_efficiency.
    - Otherwise, multiply the step by the back-off parameter \beta \in (0, 1) and try again.
      - This parameter is passed to realize_gram as the argument backoff.
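Here's a bare-bones Julia sketch of the loop just described. It leaves out the frozen-subspace projection \mathcal{Q}, uses the hypothetical loss, neg_grad, and hessian helpers from the earlier sketches, and fills in placeholder values for the tolerances and for reg_scale, min_efficiency, and backoff; the real routine is realize_gram in engine.rs (or Engine.jl):

```julia
using LinearAlgebra

function descend(A, Q, G, constrained_indices;
                 tol = 1e-12, max_descent = 100, max_backoff = 50,
                 reg_scale = 1.1, min_efficiency = 0.5, backoff = 0.9)
    for _ in 1:max_descent
        f = loss(A, Q, G, constrained_indices)
        f < tol && return A                                 # tolerably close to zero
        g = vec(neg_grad(A, Q, G, constrained_indices))
        H = Symmetric(hessian(A, Q, G, constrained_indices))
        λ_min = eigmin(H)
        H_reg = λ_min > 0 ? H : H - reg_scale * λ_min * I   # regularized Hessian
        s = reshape(H_reg \ g, size(A))                     # base step
        stepped = false
        for _ in 1:max_backoff
            decrease = f - loss(A + s, Q, G, constrained_indices)
            if decrease >= min_efficiency * dot(g, vec(s))  # good fraction of the max rate
                A, stepped = A + s, true
                break
            end
            s *= backoff                                    # back off the step size
        end
        stepped || return nothing                           # too many backtracking steps
    end
    nothing                                                 # too many descent steps
end
```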
Other minimization methods
Ge, Chou, and Gao have written about using optimization methods to solve 2d geometric constraint problems [GCG]. Instead of Newton's method, they use the BFGS method, which they say is less sensitive to the initial guess and more likely to arrive at a solution close to the initial guess.
Julia's Optim package uses an unconventional variant of Newton's method whose rationale is sketched in this issue discussion. It's based on the Cholesky-like factorization implemented in the PositiveFactorizations package.
- [GCG] Jian-Xin Ge, Shang-Ching Chou, and Xiao-Shan Gao. "Geometric Constraint Satisfaction Using Optimization Methods," 1999.
Reconstructing a rigid subassembly
Suppose we can find a set of vectors \{a_k\}_{k \in K} whose Lorentz products are all known. Restricting the Gram matrix to \mathbb{R}^K and projecting its output orthogonally onto \mathbb{R}^K gives a submatrix G_K \colon \mathbb{R}^K \to \mathbb{R}^K whose entries are all known. Suppose in addition that the set \{a_k\}_{k \in K} spans V, implying that G_K has rank five.
Write G_K as L_K U_K, where the matrix U_K is a row echelon form of G_K, and the matrix L_K encodes the sequence of row operations that restores G_K. Since G_K has rank five, the first five rows of U_K are non-zero, and the rest are zero. Thus, the first five columns of L_K are non-zero, and the rest are zero. Since G_K is symmetric, we should have L_K = U_K^\top D_K, where D_K is a diagonal matrix that has one negative entry and four positive entries at the pivot positions on the diagonal.
Let A_K \colon \mathbb{R}^K \to V be the restriction of A. The first five rows of U_K form the matrix of A_K in some orthogonal basis for V. The matrix D_K describes the Lorentz form in this basis. By looking at D_K, it should be possible to rewrite U_K in our chosen basis for V.
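To illustrate, here's a sketch of one way to carry out this reconstruction numerically. Instead of the row-echelon factorization described above, it uses an eigendecomposition of G_K, which plays the same role; the function name and conventions are made up for this example:

```julia
using LinearAlgebra

# Recover a representative A_K from a known, rank-five submatrix G_K of the Gram
# matrix, assuming G_K has four positive eigenvalues, one negative eigenvalue,
# and the rest (numerically) zero.
function reconstruct_subassembly(G_K)
    vals, vecs = eigen(Symmetric(G_K))
    idx = [sortperm(vals, rev = true)[1:4]; argmin(vals)]   # four positive, then the negative
    λ, W = vals[idx], vecs[:, idx]
    # rows of A_K in a basis for V where the Lorentz form is Q = diag(1, 1, 1, 1, -1),
    # so that A_K' * Q * A_K ≈ G_K
    Diagonal(sqrt.(abs.(λ))) * W'
end
```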
Consistency constraints on the Gram matrix
Since G has rank five, every six-by-six minor is zero. Some of the entries of G are known, because they're specified by constraints. Let's treat some of the unknown entries as independent variables, and the rest as dependent variables. Whenever we find a six-by-six submatrix where one entry is dependent and the rest are known or independent, setting the corresponding minor to zero gives us a quadratic equation for the dependent variable. That equation determines the dependent variable, up to a two-fold choice, in terms of known and independent entries, so we'll call the variable "1-knowable." Treating 1-knowable variables as known may reveal further six-by-six submatrices where one entry is dependent and the rest are known, independent, or 1-knowable. The resulting quadratic equation makes that dependent variable "2-knowable," and the process continues.
For G to be realizable as a Gram matrix, it's necessary for each dependent variable to be real. Whenever we have a quadratic equation for a dependent variable, the requirement that its discriminant be non-negative gives us a consistency constraint inequality. The consistency constraint for an n-knowable variable gets more complicated as n increases.
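As a sketch of how one of these constraints could be computed numerically (the helper name is hypothetical, and we assume the dependent entry sits off the diagonal, so that the minor really is quadratic in it):

```julia
using LinearAlgebra

# Given a six-by-six submatrix M of the Gram matrix in which every entry is filled
# in except the symmetric pair at positions (i, j) and (j, i), return the
# coefficients (a, b, c) of the quadratic a*t^2 + b*t + c that the missing entry t
# must satisfy for the minor to vanish. The consistency constraint is then the
# requirement that the discriminant b^2 - 4ac be non-negative.
function minor_quadratic(M, i, j)
    p = map((-1.0, 0.0, 1.0)) do t
        N = Matrix{Float64}(M)    # working copy with the trial value filled in
        N[i, j] = t
        N[j, i] = t
        det(N)
    end
    # recover the quadratic's coefficients from its values at t = -1, 0, 1
    a = (p[3] + p[1]) / 2 - p[2]
    b = (p[3] - p[1]) / 2
    c = p[2]
    (a, b, c)
end
```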