# Linear representation of constructions and constraints

## Construction elements as vectors

Take a 5d vector space $V$ with a bilinear form $(\_\!\_, \_\!\_)$ of signature $++++-$, which we'll call the Lorentz form. In inversive coordinates, points and generalized spheres are represented, respectively, by timelike and spacelike vectors in $V$. If we normalize these vectors to pseudo-length $\pm 1$, and choose a vector on the lightlike 1d subspace representing the point at infinity, a lot of the constraints we care about can be expressed by fixing the Lorentz products between vectors.
## Constraints as Gram matrix entries

The vectors $a_1, \ldots, a_n \in V$ representing the elements of our construction can be encoded in the linear map $A \colon \mathbb{R}^n \to V$ with $Ae_j = a_j$ for each of the standard basis vectors $e_1, \ldots, e_n$. We can then express constraints by fixing entries of the Gram matrix $G = A^\dagger A$, where $\dagger$ is the adjoint with respect to the inner product $\langle\_\!\_, \_\!\_\rangle$ on $\mathbb{R}^n$ and the Lorentz form on $V$. Since the inner product and the Lorentz form are both non-degenerate, the rank of $G$ matches the dimension of the span of $a_1, \ldots, a_n$.

The symmetric bilinear form $\langle\_\!\_, G\_\!\_\rangle$ is the pullback of the Lorentz form along $A$:

\begin{align*}
\langle\_\!\_, G\_\!\_\rangle & = \langle\_\!\_, A^\dagger A\_\!\_\rangle \\
& = (A\_\!\_, A\_\!\_).
\end{align*}
## Coordinate computations

When we choose a basis for $V$, represented as an identification $V \cong \mathbb{R}^5$, the Lorentz form gets encoded in the matrix $Q \colon \mathbb{R}^5 \to \mathbb{R}^5$ defined by the relation $(\_\!\_, \_\!\_) = \langle\_\!\_, Q\_\!\_\rangle$. Using the fact that $A^\dagger = A^\top Q$, where $^\top$ is the adjoint with respect to the inner product $\langle\_\!\_, \_\!\_\rangle$ on both $\mathbb{R}^n$ and $V \cong \mathbb{R}^5$, we can express the Gram matrix as $G = A^\top Q A$ in terms of positive-signature matrix operations. This expression is convenient for programming, because linear algebra packages tend to be written for positive signature.
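For instance, the Gram matrix computation might look like this NumPy sketch (illustrative only; the names and example data are hypothetical, not taken from the Rust or Julia implementations):

```python
import numpy as np

# The Lorentz form in a basis where it is diagonal: signature ++++-.
Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

# Columns of A are the vectors a_1, ..., a_n representing the
# construction's elements (here: random stand-ins, with n = 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Gram matrix of Lorentz products: G[i, j] = (a_i, a_j).
G = A.T @ Q @ A

# G is symmetric, and its rank matches the dimension of the span.
print(np.allclose(G, G.T))        # symmetric
print(np.linalg.matrix_rank(G))   # 3 for generically chosen A
```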
# Reconstructing a construction from its Gram matrix

## Reconstruction as low-rank factorization

### Overview

We saw above that when we represent constructions as linear maps $A \colon \mathbb{R}^n \to V$, many constraints can be expressed by fixing entries of the Gram matrix $A^\dagger A$, which can be written more conveniently as $A^\top Q A$ when we choose an identification $V \cong \mathbb{R}^5$. Reconstructing $A$ from those entries is an example of a low-rank factorization problem. In this section, we'll express the problem in a way that's useful for numerical computations.

We'll find a non-negative function $f$ on the space of linear maps $\mathbb{R}^n \to V$ which vanishes at the matrices representing constructions that satisfy the constraints. Finding a global minimum of the loss function $f$ will be equivalent to finding a construction that satisfies the constraints. The loss function will be a quartic polynomial in the entries of $A$, and its first and second derivatives will be straightforward to compute using a typical linear algebra package. This makes it easy to search for minima using Newton's method with backtracking.
### The Frobenius product

For any dimensions $n$ and $m$, the inner products on $\mathbb{R}^n$ and $\mathbb{R}^m$ induce an inner product $\langle\!\langle X, Y \rangle\!\rangle = \operatorname{tr}(X^\top Y)$ on the space of matrices $\mathbb{R}^n \to \mathbb{R}^m$. We'll call this inner product the Frobenius product, because the associated norm, defined by $\|X\|^2 = \langle\!\langle X, X \rangle\!\rangle$, is called the Frobenius norm. The entry-wise dot product of $X$ and $Y$ gives $\langle\!\langle X, Y \rangle\!\rangle$, so the sum of the squares of the entries of $X$ gives $\|X\|^2$. More generally, in any orthonormal basis $v_1, \ldots, v_n$ for $\mathbb{R}^n$, we have

\langle\!\langle X, Y \rangle\!\rangle = \sum_{k \in [n]} \langle X v_k, Y v_k \rangle.
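These three expressions for the Frobenius product can be checked numerically. In this hypothetical NumPy sketch, the orthonormal basis comes from the orthogonal factor of a QR decomposition of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
Y = rng.standard_normal((4, 3))

# The trace form and the entry-wise dot product.
trace_form = np.trace(X.T @ Y)
entrywise = np.sum(X * Y)

# The sum over any orthonormal basis v_1, ..., v_n of R^n; here we
# take the columns of the Q factor of a QR decomposition.
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
basis_form = sum((X @ V[:, k]) @ (Y @ V[:, k]) for k in range(3))

print(np.isclose(trace_form, entrywise))
print(np.isclose(trace_form, basis_form))
```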
### The loss function

Consider the matrices $\mathbb{R}^n \to \mathbb{R}^n$ whose entries vanish at the indices where the Gram matrix is unconstrained. They form a subspace $C$ of the matrix space $\operatorname{End}(\mathbb{R}^n)$. Let $\mathcal{P} \colon \operatorname{End}(\mathbb{R}^n) \to C$ be the orthogonal projection with respect to the Frobenius product. The constrained entries of the Gram matrix can be expressed uniquely as a matrix $G \in C$, and the linear maps $A \colon \mathbb{R}^n \to V$ that satisfy the constraints form the zero set of the non-negative function

f = \|G - \mathcal{P}(A^\top Q A)\|^2

on $\operatorname{Hom}(\mathbb{R}^n, V)$. Finding a global minimum of the loss function $f$ is thus equivalent to finding a construction that satisfies the constraints.
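In coordinates, the projection $\mathcal{P}$ is just entry-wise masking, which makes the loss a one-liner. Here's a hypothetical NumPy sketch (names and data are illustrative, not the project's code):

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

# mask[i, j] = 1 where the Gram entry is constrained, 0 elsewhere;
# multiplying entry-wise by mask is the orthogonal projection P onto C.
mask = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=float)

def loss(A, G, mask):
    # f = || G - P(A^T Q A) ||^2, summed over the constrained entries.
    Delta = G - mask * (A.T @ Q @ A)
    return np.sum(Delta**2)

# A construction that realizes G exactly has zero loss.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
G = mask * (A.T @ Q @ A)    # hypothetical constraint values, lying in C
print(loss(A, G, mask))     # 0.0
```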
### The first derivative of the loss function

Write the loss function as
\begin{align*}
f & = \|\Delta\|^2 \\
& = \operatorname{tr}(\Delta^\top \Delta),
\end{align*}
where $\Delta = G - \mathcal{P}(A^\top Q A)$. Differentiate both sides and simplify the result using the transpose-invariance of the trace:
\begin{align*}
df & = \operatorname{tr}(d\Delta^\top \Delta) + \operatorname{tr}(\Delta^\top d\Delta) \\
& = 2\operatorname{tr}(\Delta^\top d\Delta).
\end{align*}
To compute $d\Delta$, it will be helpful to write the projection operator $\mathcal{P}$ more explicitly. Let $\mathcal{C}$ be the set of indices where the Gram matrix is constrained, so that $C$ is spanned by the standard basis matrices $\{E_{ij}\}_{(i, j) \in \mathcal{C}}$. Observing that $E_{ij} X^\top E_{ij} = X_{ij} E_{ij}$ for any matrix $X$, we can do orthogonal projection onto $C$ using standard basis matrices:
\mathcal{P}(X) = \sum_{(i, j) \in \mathcal{C}} E_{ij} X^\top E_{ij}.
It follows that
\begin{align*}
d\mathcal{P}(X) & = \sum_{(i, j) \in \mathcal{C}} E_{ij}\,dX^\top E_{ij} \\
& = \mathcal{P}(dX).
\end{align*}
Since the subspace $C$ is transpose-invariant, we also have
\mathcal{P}(X^\top) = \mathcal{P}(X)^\top.
We can now see that
\begin{align*}
d\Delta & = -\mathcal{P}(dA^\top Q A + A^\top Q\,dA) \\
& = -\big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big].
\end{align*}
Plugging this into our formula for $df$, and recalling that $\Delta$ is symmetric, we get
\begin{align*}
df & = 2\operatorname{tr}(-\Delta^\top \big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big]) \\
& = -2\operatorname{tr}\left(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top + \Delta^\top \mathcal{P}(A^\top Q\,dA)\right) \\
& = -2\left[\operatorname{tr}\big(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top\big) + \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big)\right] \\
& = -4 \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big),
\end{align*}
using the transpose-invariance and cyclic property of the trace in the final step. Writing the projection in terms of standard basis matrices, we learn that
\begin{align*}
df & = -4 \operatorname{tr}\left(\Delta^\top \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} (A^\top Q\,dA)^\top E_{ij} \right] \right) \\
& = -4 \operatorname{tr}\left(\sum_{(i, j) \in \mathcal{C}} \Delta^\top E_{ij}\,dA^\top Q A E_{ij}\right) \\
& = -4 \operatorname{tr}\left(dA^\top Q A \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} \Delta^\top E_{ij} \right]\right) \\
& = -4 \operatorname{tr}\big(dA^\top Q A\,\mathcal{P}(\Delta)\big) \\
& = \langle\!\langle dA,\,-4 Q A\,\mathcal{P}(\Delta) \rangle\!\rangle.
\end{align*}
From here, we get a nice matrix expression for the negative gradient of the loss function:

-\operatorname{grad}(f) = 4 Q A\,\mathcal{P}(\Delta).

This matrix is stored as `neg_grad` in the Rust and Julia implementations of the realization routine.
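As a sanity check on this formula, we can compare the analytic gradient against a central finite difference of the loss. This is a hypothetical NumPy sketch, separate from the Rust and Julia code; for simplicity, every Gram entry is constrained, so $\mathcal{P}$ is the identity:

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])
rng = np.random.default_rng(0)
n = 4
mask = np.ones((n, n))                     # constrain every Gram entry
A0 = rng.standard_normal((5, n))
G = mask * (A0.T @ Q @ A0)                 # target Gram entries
A = rng.standard_normal((5, n))            # current guess

def loss(A):
    return np.sum((mask * (G - A.T @ Q @ A))**2)

def neg_grad(A):
    # -grad(f) = 4 Q A P(Delta), with P implemented as entry-wise masking.
    return 4.0 * Q @ A @ (mask * (G - A.T @ Q @ A))

# The directional derivative along a random direction dA, two ways.
dA = rng.standard_normal((5, n))
h = 1e-6
fd = (loss(A + h*dA) - loss(A - h*dA)) / (2*h)
analytic = -np.sum(dA * neg_grad(A))       # <<dA, grad(f)>>
print(np.isclose(fd, analytic, rtol=1e-4))
```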
### The second derivative of the loss function

Recalling that

-d\Delta = \mathcal{P}(dA^\top Q A + A^\top Q\,dA),

we can express the derivative of $\operatorname{grad}(f)$ as
\begin{align*}
d\operatorname{grad}(f) & = -4 Q\,dA\,\mathcal{P}(\Delta) - 4 Q A\,\mathcal{P}(d\Delta) \\
& = 4 Q\big[{-dA}\,\mathcal{P}(\Delta) + A\,\mathcal{P}(-d\Delta)\big].
\end{align*}
In the Rust and Julia implementations of the realization routine, we express $d\operatorname{grad}(f)$ as a matrix in the standard basis for $\operatorname{Hom}(\mathbb{R}^n, V)$. We find the column corresponding to each standard basis matrix $E_{ij}$ by setting the value of the matrix-valued 1-form $dA$ to $E_{ij}$.
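Concretely, the assembly might look like the following NumPy sketch (hypothetical code, not the actual Rust or Julia; here every Gram entry is constrained, so $\mathcal{P}$ is the identity):

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])
rng = np.random.default_rng(0)
n = 4
A0 = rng.standard_normal((5, n))
G = A0.T @ Q @ A0
A = rng.standard_normal((5, n))

def dgrad(A, dA):
    # d grad(f) = 4 Q [ -dA P(Delta) + A P(-dDelta) ]
    Delta = G - A.T @ Q @ A
    minus_dDelta = dA.T @ Q @ A + A.T @ Q @ dA
    return 4.0 * Q @ (-dA @ Delta + A @ minus_dDelta)

# Assemble the Hessian by applying d grad(f) to each standard basis
# matrix E_ij of Hom(R^n, V), flattened to a column.
H = np.empty((5*n, 5*n))
for col in range(5*n):
    E = np.zeros(5*n)
    E[col] = 1.0
    H[:, col] = dgrad(A, E.reshape(5, n)).ravel()

print(np.allclose(H, H.T))  # the Hessian of a scalar function is symmetric
```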
### Finding minima

Writeup in progress. Implemented in `app-proto/src/engine.rs` and `engine-proto/gram-test/Engine.jl`.
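In the meantime, here's a hedged NumPy sketch of the kind of routine described above — Newton's method with backtracking, using the gradient and second-derivative formulas from the previous sections. All names and example data are hypothetical; the authoritative versions are the Rust and Julia sources cited above.

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

def realize(G, mask, A, steps=100, tol=1e-12):
    """Minimize f(A) = ||mask*(G - A^T Q A)||^2 by Newton's method with
    backtracking. The projection P is entry-wise masking by `mask`."""
    n = G.shape[0]

    def loss(A):
        return np.sum((mask * (G - A.T @ Q @ A))**2)

    def neg_grad(A):
        return 4.0 * Q @ A @ (mask * (G - A.T @ Q @ A))

    def hessian(A):
        Delta = mask * (G - A.T @ Q @ A)
        H = np.empty((5*n, 5*n))
        for col in range(5*n):
            dA = np.zeros(5*n)
            dA[col] = 1.0
            dA = dA.reshape(5, n)
            mdD = mask * (dA.T @ Q @ A + A.T @ Q @ dA)
            # Column of d grad(f) = 4 Q [ -dA P(Delta) + A P(-dDelta) ].
            H[:, col] = (4.0 * Q @ (-dA @ Delta + A @ mdD)).ravel()
        return H

    for _ in range(steps):
        f0 = loss(A)
        if f0 < tol:
            break
        # Newton direction; lstsq tolerates the singular directions
        # coming from the Lorentz symmetry of the solution set.
        s, *_ = np.linalg.lstsq(hessian(A), neg_grad(A).ravel(), rcond=None)
        s = s.reshape(5, n)
        t = 1.0
        while loss(A + t*s) >= f0 and t > 1e-10:  # backtracking
            t /= 2.0
        A = A + t*s
    return A

# Demo on a synthetic problem with a known solution.
rng = np.random.default_rng(2)
n = 4
mask = np.ones((n, n))
mask[0, 3] = mask[3, 0] = 0.0          # leave one product unconstrained
A_true = rng.standard_normal((5, n))
G = mask * (A_true.T @ Q @ A_true)
A_hat = realize(G, mask, A_true + 0.1 * rng.standard_normal((5, n)))
print(np.sum((mask * (G - A_hat.T @ Q @ A_hat))**2))  # near zero
```

Starting near a known solution keeps the Hessian positive semidefinite along the search path; a production routine would need safeguards for indefinite Hessians far from a minimum.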
## Reconstructing a rigid subassembly

Suppose we can find a set of vectors $\{a_k\}_{k \in K}$ whose Lorentz products are all known. Restricting the Gram matrix to $\mathbb{R}^K$ and projecting its output orthogonally onto $\mathbb{R}^K$ gives a submatrix $G_K \colon \mathbb{R}^K \to \mathbb{R}^K$ whose entries are all known. Suppose in addition that the set $\{a_k\}_{k \in K}$ spans $V$, implying that $G_K$ has rank five.
Write $G_K$ as $L_K U_K$, where the matrix $U_K$ is a row echelon form of $G_K$, and the matrix $L_K$ encodes the sequence of row operations that restores $G_K$. Since $G_K$ has rank five, the first five rows of $U_K$ are non-zero, and the rest are zero. Thus, only the first five columns of $L_K$ contribute, and the rest can be taken to be zero. Since $G_K$ is symmetric, we should have $L_K = U_K^\top D_K$, where $D_K$ is a diagonal matrix with one negative entry and four positive entries at the pivot positions on the diagonal.
Let $A_K \colon \mathbb{R}^K \to V$ be the restriction of $A$. The first five rows of $U_K$ form the matrix of $A_K$ in some orthogonal basis for $V$. The matrix $D_K$ describes the Lorentz form in this basis. By looking at $D_K$, it should be possible to rewrite $U_K$ in our chosen basis for $V$.
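The writeup above factors $G_K$ through row reduction. As an illustration of the same idea (hypothetical NumPy code, not the project's actual routine), we can instead use an eigendecomposition, which likewise exhibits $G_K = A_K^\top D_K A_K$ with $D_K$ diagonal — and by sorting the eigenvalues we can take $D_K = Q$ directly:

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])

def reconstruct(G_K):
    # Eigendecompose the symmetric matrix G_K = sum_i lam_i v_i v_i^T.
    lam, vecs = np.linalg.eigh(G_K)
    # Keep the nonzero eigenvalues; for the Gram matrix of a spanning
    # set, there are four positive ones and one negative one.
    keep = np.abs(lam) > 1e-9 * np.abs(lam).max()
    lam, vecs = lam[keep], vecs[:, keep]
    # Order the kept directions so the negative one comes last,
    # matching Q = diag(1, 1, 1, 1, -1).
    order = np.argsort(lam < 0)
    lam, vecs = lam[order], vecs[:, order]
    # Rows of A_K are sqrt(|lam_i|) v_i^T, so A_K^T Q A_K = G_K.
    return np.sqrt(np.abs(lam))[:, None] * vecs.T

# Round-trip check on a synthetic rigid subassembly with |K| = 7.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))
G_K = A.T @ Q @ A
A_K = reconstruct(G_K)
print(np.allclose(A_K.T @ Q @ A_K, G_K))
```

The recovered $A_K$ agrees with the original only up to a Lorentz transformation, which is exactly the rigidity we expect from fixing all the products.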
## Consistency constraints on the Gram matrix

Since $G$ has rank five, every six-by-six minor is zero. Some of the entries of $G$ are known, because they're specified by constraints. Let's treat some of the unknown entries as independent variables, and the rest as dependent variables. Whenever we find a six-by-six submatrix where one entry is dependent and the rest are known or independent, the vanishing of the corresponding minor gives us a quadratic equation for the dependent variable. Treating that dependent variable as "1-knowable" may provide further six-by-six submatrices where one entry is dependent and the rest are known, independent, or 1-knowable. The resulting quadratic equation for the dependent variable makes that variable "2-knowable", and the process continues.
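To make this concrete, here's a hypothetical NumPy sketch (not part of the project code): we build a rank-five Gram matrix, treat one entry as dependent, and recover it as a root of the quadratic that a six-by-six minor gives us. A symmetric submatrix contains the dependent entry twice, which is why the minor is quadratic rather than linear in it.

```python
import numpy as np

Q = np.diag([1.0, 1.0, 1.0, 1.0, -1.0])
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 8))
G = A.T @ Q @ A   # rank 5, so every six-by-six minor vanishes

# Treat G[0, 1] (= G[1, 0]) as the dependent variable and use the
# principal 6x6 submatrix on indices 0..5, which contains it twice.
idx = np.arange(6)
def minor(x):
    M = G[np.ix_(idx, idx)].copy()
    M[0, 1] = M[1, 0] = x
    return np.linalg.det(M)

# The minor is a quadratic polynomial in x; sample it at three points
# to recover the coefficients, then solve for the roots.
xs = np.array([0.0, 1.0, 2.0])
coeffs = np.polyfit(xs, [minor(x) for x in xs], 2)
disc = coeffs[1]**2 - 4*coeffs[0]*coeffs[2]
roots = np.roots(coeffs)
print(disc >= 0)                        # the consistency constraint holds
print(np.min(np.abs(roots - G[0, 1])))  # one root recovers the entry
```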
For $G$ to be realizable as a Gram matrix, each dependent variable must be real. Whenever we have a quadratic equation for a dependent variable, the discriminant of the equation gives us a consistency constraint inequality. The consistency constraint for an $n$-knowable variable gets more complicated as $n$ increases.