11 Gram matrix parameterization
Vectornaut edited this page 2024-11-17 05:35:24 +00:00
This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Linear representation of constructions and constraints

Construction elements as vectors

Take a 5d vector space V with a bilinear form (\_\!\_, \_\!\_) of signature ++++-, which we'll call the Lorentz form. In inversive coordinates, points and generalized spheres are represented, respectively, by timelike and spacelike vectors in V. If we normalize these vectors to pseudo-length \pm 1, and choose a vector on the lightlike 1d subspace representing the point at infinity, a lot of the constraints we care about can be expressed by fixing the Lorentz products between vectors.

Constraints as Gram matrix entries

The vectors a_1, \ldots, a_n \in V representing the elements of our construction can be encoded in the linear map A \colon \mathbb{R}^n \to V with Ae_j = a_j for each of the standard basis vectors e_1, \ldots, e_n. We can then express constraints by fixing elements of the Gram matrix G = A^\dagger A, where \dagger is the adjoint with respect to the inner product \langle\_\!\_, \_\!\_\rangle on \mathbb{R}^n and the Lorentz form. Since the inner product and the Lorentz form are both non-degenerate, the rank of G matches the dimension of the span of a_1, \ldots, a_n.

The symmetric bilinear form \langle\_\!\_, G\_\!\_\rangle is the pullback of the Lorentz form along A:

\begin{align*} \langle\_\!\_, G\_\!\_\rangle & = \langle\_\!\_, A^\dagger A\_\!\_\rangle \\ & = (A\_\!\_, A\_\!\_). \end{align*}

Coordinate computations

When we choose a basis for V, represented as an identification V \cong \mathbb{R}^5, the Lorentz form gets encoded in the matrix Q \colon \mathbb{R}^5 \to \mathbb{R}^5 defined by the relation (\_\!\_, \_\!\_) = \langle\_\!\_, Q\_\!\_\rangle. Using the fact that A^\dagger = A^\top Q, where ^\top is the adjoint with respect to the inner product \langle\_\!\_, \_\!\_\rangle on both \mathbb{R}^n and V \cong \mathbb{R}^5, we can express the Gram matrix G = A^\top Q A in terms of positive-signature matrix operations. This expression is convenient for programming, because linear algebra packages tend to be written for positive signature.

Reconstructing a construction from its Gram matrix

Reconstruction as low-rank factorization

Overview

We saw above that when we represent constructions as linear maps A \colon \R^n \to V, many constraints can be expressed by fixing entries of the Gram matrix A^\dagger A, which can be written more conveniently as A^\top Q A when we choose an identification V \cong \R^5. This is an example of a low-rank factorization problem. In this section, we'll express the problem in a way that's useful for numerical computations.

We'll find a non-negative function f on the space of linear maps \mathbb{R}^n \to V which vanishes at the matrices representing constructions that satisfy the constraints. Finding a global minimum of the loss function f will be equivalent to finding a construction that satisfies the constraints. The loss function will be a quadratic polynomial in matrix entries, and its first and second derivatives will be straightforward to compute using a typical linear algebra package. This makes it easy to search for minima using Newton's method with backtracking.

The Frobenius product

For any dimensions n and m, the inner products on \mathbb{R}^n and \mathbb{R}^m induce an inner product \langle\!\langle X, Y \rangle\!\rangle = \operatorname{tr}(X^\top Y) on the space of matrices \mathbb{R}^n \to \mathbb{R}^m. We'll call this inner product the Frobenius product, because the associated norm, defined by \|X\|^2 = \langle\!\langle X, X \rangle\!\rangle, is called the Frobenius norm. The entry-wise dot product of X and Y gives \langle\!\langle X, Y \rangle\!\rangle, so the sum of the squares of the entries of X gives \|X\|^2. More generally, in any orthonormal basis v_1, \ldots, v_n for \mathbb{R}^n we have

 \langle\!\langle X, Y \rangle\!\rangle = \sum_{k \in [n]} \langle X v_k, Y v_k \rangle. 

The loss function

Consider the matrices \mathbb{R}^n \to \mathbb{R}^n whose entries vanish at the indices where the Gram matrix is unconstrained. They form a subspace C of the matrix space \operatorname{End}(\mathbb{R}^n). Let \mathcal{P} \colon \operatorname{End}(\mathbb{R}^n) \to C be the orthogonal projection with respect to the Frobenius product. The constrained entries of the Gram matrix can be expressed uniquely as a matrix G \in C, and the linear maps A \colon \mathbb{R}^n \to V that satisfy the constraints form the zero set of the non-negative function

 f = \|G - \mathcal{P}(A^\top Q A)\|^2

on \operatorname{Hom}(\mathbb{R}^n, V). Finding a global minimum of the loss function f is thus equivalent to finding a construction that satisfies the constraints.

The first derivative of the loss function

Write the loss function as


\begin{align*}
f & = \|\Delta\|^2 \\
& = \operatorname{tr}(\Delta^\top \Delta),
\end{align*}

where \Delta = G - \mathcal{P}(A^\top Q A). Differentiate both sides and simplify the result using the transpose-invariance of the trace:


\begin{align*}
df & = \operatorname{tr}(d\Delta^\top \Delta) + \operatorname{tr}(\Delta^\top d\Delta) \\
& = 2\operatorname{tr}(\Delta^\top d\Delta).
\end{align*}

To compute d\Delta, it will be helpful to write the projection operator \mathcal{P} more explicitly. Let \mathcal{C} be the set of indices where the Gram matrix is unconstrained. We can express C as the span of the elementary matrices \{E_{ij}\}_{(i, j) \in \mathcal{C}}. Observing that E_{ij} X^\top E_{ij} = X_{ij} E_{ij} for any matrix X, we can do orthogonal projection onto C using elementary matrices:

 \mathcal{P}(X) = \sum_{(i, j) \in \mathcal{C}} E_{ij} X^\top E_{ij}. 

It follows that


\begin{align*}
d\mathcal{P}(X) & = \sum_{(i, j) \in \mathcal{C}} E_{ij}\,dX^\top E_{ij} \\
& = \mathcal{P}(dX).
\end{align*}

Since the subspace C is transpose-invariant, we also have

 \mathcal{P}(X^\top) = \mathcal{P}(X)^\top. 

We can now see that


\begin{align*}
d\Delta & = -\mathcal{P}(dA^\top Q A + A^\top Q\,dA) \\
& = -\big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big].
\end{align*}

Plugging this into our formula for df, and recalling that \Delta is symmetric, we get


\begin{align*}
df & = 2\operatorname{tr}(-\Delta^\top \big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big]) \\
& = -2\operatorname{tr}\left(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top + \Delta^\top \mathcal{P}(A^\top Q\,dA)\big]\right) \\
& = -2\left[\operatorname{tr}\big(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top\big) + \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big)\right] \\
& = -4 \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big),
\end{align*}

using the transpose-invariance and cyclic property of the trace in the final step. Writing the projection in terms of elementary matrices, we learn that


\begin{align*}
df & = -4 \operatorname{tr}\left(\Delta^\top \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} (A^\top Q\,dA)^\top E_{ij} \right] \right) \\
& = -4 \operatorname{tr}\left(\sum_{(i, j) \in \mathcal{C}} \Delta^\top E_{ij}\,dA^\top Q A E_{ij}\right) \\
& = -4 \operatorname{tr}\left(dA^\top Q A \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} \Delta^\top E_{ij} \right]\right) \\
& = -4 \operatorname{tr}\big(dA^\top Q A\,\mathcal{P}(\Delta)\big) \\
& = \langle\!\langle dA,\,-4 Q A\,\mathcal{P}(\Delta) \rangle\!\rangle.
\end{align*}

From here, we get a nice matrix expression for the negative gradient of the loss function:

 -\operatorname{grad}(f) = 4 Q A\,\mathcal{P}(\Delta). 

This matrix is stored as neg_grad in the Rust and Julia implementations of the realization routine.

The second derivative of the loss function

Writeup in progress. Implemented in app-proto/src/engine.rs and engine-proto/gram-test/Engine.jl.

Finding minima

Writeup in progress. Implemented in app-proto/src/engine.rs and engine-proto/gram-test/Engine.jl.

Reconstructing a rigid subassembly

Suppose we can find a set of vectors \{a_k\}_{k \in K} whose Lorentz products are all known. Restricting the Gram matrix to \mathbb{R}^K and projecting its output orthogonally onto \mathbb{R}^K gives a submatrix G_K \colon \mathbb{R}^K \to \mathbb{R}^K whose entries are all known. Suppose in addition that the set \{a_k\}_{k \in K} spans V, implying that G_K has rank five.

Write G_K as L_K U_K, where the matrix U_K is a row echelon form of G_K, and the the matrix L_K encodes the sequence of row operations that restores G_K. Since G_K has rank five, the first five rows of U_K are non-zero, and the rest are zero. Thus, the first five columns of L_K are non-zero, and the rest are zero. Since G_K is symmetric, we should have L_K = U_K^\top D_K, where D_K is a diagonal matrix that has one negative entry and four positive entries at the pivot positions on the diagonal.

Let A_K \colon \R^K \to V be the restriction of A. The first five rows of U_K form the matrix of A_K in some orthogonal basis for V. The matrix D_K describes the Lorentz form in this basis. By looking at D_K, it should be possible to rewrite U_K in our chosen basis for V.

Consistency constraints on the Gram matrix

Since G has rank five, every six-by-six minor is zero. Some of the entries of G are known, because they're specified by constraints. Let's treat some of the unknown entries as independent variables, and the rest as dependent variables. Whenever we find a six-by-six submatrix where one entry is dependent and the rest are known or independent, knowing that the corresponding minor is zero gives us a quadratic equation for the dependent variable. Treating that dependent variable as "1-knowable" may provide further six-by-six submatrices where one entry is dependent and the rest are known, independent, or 1-knowable. The resulting quadratic equation for the dependent variable makes that variable "2-knowable", and the process continues.

For G to be realizable as a Gram matrix, it's necessary for each dependent variable to be real. Whenever we have a quadratic equation for a dependent variable, the discriminant of the equation gives us a consistency constraint inequality. The complexity of the consistency constraint for an $n$-knowable variable gets more complicated as n increases.

Choosing dependent variables