#### The loss function
Consider the matrices $\mathbb{R}^n \to \mathbb{R}^n$ whose entries vanish at the indices where the Gram matrix is unconstrained. They form a subspace $C$ of the matrix space $\operatorname{End}(\mathbb{R}^n)$. Let $\mathcal{P} \colon \operatorname{End}(\mathbb{R}^n) \to C$ be the orthogonal projection with respect to the Frobenius product. The constrained entries of the Gram matrix can be expressed uniquely as a matrix $G \in C$, and the linear maps $A \colon \mathbb{R}^n \to V$ that satisfy the constraints form the zero set of the non-negative function
\[ f = \|G - \mathcal{P}(A^\top Q A)\|^2\]
on $\operatorname{Hom}(\mathbb{R}^n, V)$. Finding a global minimum of the *loss function* $f$ is thus equivalent to finding a construction that satisfies the constraints.
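For concreteness, here is a minimal sketch of how $f$ could be evaluated using the `nalgebra` crate. The helper names and the representation of the constrained index set as a slice of index pairs are illustrative, not the actual interface of the Rust or Julia implementation.

```rust
use nalgebra::DMatrix;

// Illustrative sketch only; not the interface of app-proto/src/engine.rs.

// Orthogonal projection onto the subspace C with respect to the Frobenius
// product: keep the entries at constrained indices and zero out the rest.
fn proj(x: &DMatrix<f64>, constrained: &[(usize, usize)]) -> DMatrix<f64> {
    let mut p = DMatrix::zeros(x.nrows(), x.ncols());
    for &(i, j) in constrained {
        p[(i, j)] = x[(i, j)];
    }
    p
}

// The loss f = ‖G − P(AᵀQA)‖², with ‖·‖ the Frobenius norm.
fn loss(
    g: &DMatrix<f64>,
    q: &DMatrix<f64>,
    a: &DMatrix<f64>,
    constrained: &[(usize, usize)],
) -> f64 {
    (g - proj(&(a.transpose() * q * a), constrained)).norm_squared()
}
```

The entrywise description of $\mathcal{P}$ used in `proj` is justified below, where the projection is written out in terms of elementary matrices.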
#### The first derivative of the loss function
*Writeup in progress. Implemented in `app-proto/src/engine.rs` and `engine-proto/gram-test/Engine.jl`.*
Write the loss function as
\[
\begin{align*}
f & = \|\Delta\|^2 \\
& = \operatorname{tr}(\Delta^\top \Delta),
\end{align*}
\]
where $\Delta = G - \mathcal{P}(A^\top Q A)$. Differentiate both sides and simplify the result using the transpose-invariance of the trace:
\[
\begin{align*}
df & = \operatorname{tr}(d\Delta^\top \Delta) + \operatorname{tr}(\Delta^\top d\Delta) \\
& = 2\operatorname{tr}(\Delta^\top d\Delta).
\end{align*}
\]
To compute $d\Delta$, it will be helpful to write the projection operator $\mathcal{P}$ more explicitly. Let $\mathcal{C}$ be the set of indices where the Gram matrix is constrained. We can express $C$ as the span of the elementary matrices $\{E_{ij}\}_{(i, j) \in \mathcal{C}}$. Observing that $E_{ij} X^\top E_{ij} = X_{ij} E_{ij}$ for any matrix $X$, we can express the orthogonal projection onto $C$ using elementary matrices:
\[ \mathcal{P}(X) = \sum_{(i, j) \in \mathcal{C}} E_{ij} X^\top E_{ij}. \]
It follows that
\[
\begin{align*}
d\mathcal{P}(X) & = \sum_{(i, j) \in \mathcal{C}} E_{ij}\,dX^\top E_{ij} \\
& = \mathcal{P}(dX).
\end{align*}
\]
The Gram matrix is symmetric, so its constrained indices come in transposed pairs, making the index set $\mathcal{C}$, and with it the subspace $C$, transpose-invariant. We therefore also have
\[ \mathcal{P}(X^\top) = \mathcal{P}(X)^\top. \]
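As a sanity check on these identities, here is a sketch in the same illustrative style as the snippet above: it builds $\mathcal{P}$ literally as a sum over elementary matrices and confirms that it copies the constrained entries, zeroes the rest, and commutes with transposition when the index set is symmetric.

```rust
use nalgebra::DMatrix;

// Elementary matrix E_ij: 1 at (i, j), 0 elsewhere.
fn elementary(n: usize, i: usize, j: usize) -> DMatrix<f64> {
    let mut e = DMatrix::zeros(n, n);
    e[(i, j)] = 1.0;
    e
}

// P(X) = Σ_{(i,j) ∈ C} E_ij Xᵀ E_ij, built term by term. Since
// E_ij Xᵀ E_ij = X_ij E_ij, this agrees with the mask-based `proj` above.
fn proj_elem(x: &DMatrix<f64>, constrained: &[(usize, usize)]) -> DMatrix<f64> {
    let n = x.nrows();
    let mut p = DMatrix::zeros(n, n);
    for &(i, j) in constrained {
        let e = elementary(n, i, j);
        p += &e * x.transpose() * &e;
    }
    p
}

fn main() {
    let x = DMatrix::from_row_slice(2, 2, &[1.0, 2.0, 3.0, 4.0]);
    // A transpose-invariant index set: (0, 1) and (1, 0) both appear.
    let constrained = [(0, 0), (0, 1), (1, 0)];
    // P keeps exactly the constrained entries...
    assert_eq!(
        proj_elem(&x, &constrained),
        DMatrix::from_row_slice(2, 2, &[1.0, 2.0, 3.0, 0.0]),
    );
    // ...and commutes with transposition when C is transpose-invariant.
    assert_eq!(
        proj_elem(&x.transpose(), &constrained),
        proj_elem(&x, &constrained).transpose(),
    );
}
```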
Recalling that $Q$ is symmetric, we can now see that
\[
\begin{align*}
d\Delta & = -\mathcal{P}(dA^\top Q A + A^\top Q\,dA) \\
& = -\big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big].
\end{align*}
\]
Plugging this into our formula for $df$, and recalling that $\Delta$ is symmetric, we get
\[
\begin{align*}
df & = 2\operatorname{tr}(-\Delta^\top \big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big]) \\
& = -2\operatorname{tr}\left(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top + \Delta^\top \mathcal{P}(A^\top Q\,dA)\right) \\
& = -2\left[\operatorname{tr}\big(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top\big) + \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big)\right] \\
& = -4 \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big),
\end{align*}
\]
using the transpose-invariance and cyclic property of the trace in the final step. Writing the projection in terms of elementary matrices, we learn that
\[
\begin{align*}
df & = -4 \operatorname{tr}\left(\Delta^\top \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} (A^\top Q\,dA)^\top E_{ij} \right] \right) \\
& = -4 \operatorname{tr}\left(\sum_{(i, j) \in \mathcal{C}} \Delta^\top E_{ij}\,dA^\top Q A E_{ij}\right) \\
& = -4 \operatorname{tr}\left(dA^\top Q A \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} \Delta^\top E_{ij} \right]\right) \\
& = -4 \operatorname{tr}\big(dA^\top Q A\,\mathcal{P}(\Delta)\big) \\
& = \langle\!\langle dA,\,-4 Q A\,\mathcal{P}(\Delta) \rangle\!\rangle.
\end{align*}
\]
From here, we get a nice matrix expression for the negative gradient of the loss function:
\[ -\operatorname{grad}(f) = 4 Q A\,\mathcal{P}(\Delta). \]
This matrix is stored as `neg_grad` in the Rust and Julia implementations of the realization routine.
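To make the formula concrete, here is a sketch in the same illustrative style as the earlier snippets, with arbitrary toy data: it computes the negative gradient as $4 Q A\,\mathcal{P}(\Delta)$ and checks the derivative formula against a finite-difference approximation of the directional derivative. It is only a numerical check of the formula, not the realization routine itself.

```rust
use nalgebra::DMatrix;

// Projection onto C and the loss, as in the earlier sketches.
fn proj(x: &DMatrix<f64>, constrained: &[(usize, usize)]) -> DMatrix<f64> {
    let mut p = DMatrix::zeros(x.nrows(), x.ncols());
    for &(i, j) in constrained {
        p[(i, j)] = x[(i, j)];
    }
    p
}

fn loss(g: &DMatrix<f64>, q: &DMatrix<f64>, a: &DMatrix<f64>,
        constrained: &[(usize, usize)]) -> f64 {
    (g - proj(&(a.transpose() * q * a), constrained)).norm_squared()
}

// −grad f = 4 Q A P(Δ), with Δ = G − P(AᵀQA).
fn neg_grad(g: &DMatrix<f64>, q: &DMatrix<f64>, a: &DMatrix<f64>,
            constrained: &[(usize, usize)]) -> DMatrix<f64> {
    let delta = g - proj(&(a.transpose() * q * a), constrained);
    q * a * proj(&delta, constrained) * 4.0
}

fn main() {
    // Arbitrary toy data: Q is the identity, all four entries constrained.
    let g = DMatrix::from_row_slice(2, 2, &[1.0, 0.5, 0.5, 1.0]);
    let q = DMatrix::identity(2, 2);
    let a = DMatrix::from_row_slice(2, 2, &[0.8, 0.1, 0.2, 0.9]);
    let constrained = [(0, 0), (0, 1), (1, 0), (1, 1)];

    // Check df = ⟨⟨dA, grad f⟩⟩ = tr(dAᵀ grad f) against a finite difference.
    let da = DMatrix::from_row_slice(2, 2, &[0.3, -0.2, 0.1, 0.4]);
    let h = 1.0e-6;
    let fd = (loss(&g, &q, &(&a + &da * h), &constrained)
        - loss(&g, &q, &a, &constrained)) / h;
    let analytic = -(da.transpose() * neg_grad(&g, &q, &a, &constrained)).trace();
    assert!((fd - analytic).abs() < 1.0e-4);
}
```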
#### The second derivative of the loss function