switch long display formulas to more readable triple-backtick block syntax

parent 75d8ca5fb7
commit ae05a0f03a

1 changed file with 38 additions and 6 deletions

@@ -46,11 +46,21 @@ Write the loss function as
\end{align*}
```
where $\Delta = G - \mathcal{P}(A^\top Q A)$. Differentiate both sides and simplify the result using the transpose-invariance of the trace:
```math
\begin{align*}
df & = \operatorname{tr}(d\Delta^\top \Delta) + \operatorname{tr}(\Delta^\top d\Delta) \\
& = 2\operatorname{tr}(\Delta^\top d\Delta).
\end{align*}
```
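
As a quick numerical illustration of the trace identity doing the work here (a sketch, not part of the realization routine):

```julia
using LinearAlgebra

# Transpose-invariance of the trace: tr(Xᵀ Y) = tr((Xᵀ Y)ᵀ) = tr(Yᵀ X),
# which is what lets the two terms of df collapse into one.
X, Y = rand(4, 4), rand(4, 4)
@assert tr(X' * Y) ≈ tr(Y' * X)
```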
To compute $d\Delta$, it will be helpful to write the projection operator $\mathcal{P}$ more explicitly. Let $\mathcal{C}$ be the set of indices where the Gram matrix is unconstrained. We can express $C$ as the span of the standard basis matrices $\{E_{ij}\}_{(i, j) \in \mathcal{C}}$. Observing that $E_{ij} X^\top E_{ij} = X_{ij} E_{ij}$ for any matrix $X$, we can do orthogonal projection onto $C$ using standard basis matrices:
\[ \mathcal{P}(X) = \sum_{(i, j) \in \mathcal{C}} E_{ij} X^\top E_{ij}. \]
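
Since $E_{ij} X^\top E_{ij} = X_{ij} E_{ij}$, this sum simply zeroes out the entries of $X$ outside $\mathcal{C}$, so the projection can be implemented as an entrywise mask. Here is a minimal Julia sketch of that reading; the `proj` helper and the Boolean `mask` encoding $\mathcal{C}$ are illustrative, not the identifiers used in the actual implementations:

```julia
using LinearAlgebra

# Hypothetical mask-based projection: mask[i, j] is true when (i, j) is in C.
proj(X, mask) = X .* mask

# Check against the basis-matrix formula P(X) = Σ E_ij Xᵀ E_ij,
# using the identity E_ij Xᵀ E_ij = X[i, j] E_ij.
n = 4
mask = Symmetric(rand(n, n) .< 0.7)   # a transpose-invariant index set
X = rand(n, n)
E(i, j) = (M = zeros(n, n); M[i, j] = 1.0; M)
@assert proj(X, mask) ≈ sum(E(i, j) * X' * E(i, j) for i in 1:n, j in 1:n if mask[i, j])

# Because the mask is symmetric, projection commutes with transposition:
# this is the transpose-invariance property used below.
@assert proj(X', mask) ≈ proj(X, mask)'
```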
It follows that
```math
\begin{align*}
d\mathcal{P}(X) & = \sum_{(i, j) \in \mathcal{C}} E_{ij}\,dX^\top E_{ij} \\
& = \mathcal{P}(dX).
\end{align*}
```
Since the subspace $C$ is transpose-invariant, we also have
\[ \mathcal{P}(X^\top) = \mathcal{P}(X)^\top. \]
We can now see that

@@ -61,9 +71,25 @@ d\Delta & = -\mathcal{P}(dA^\top Q A + A^\top Q\,dA) \\
\end{align*}
\]
Plugging this into our formula for $df$, and recalling that $\Delta$ is symmetric, we get
```math
\begin{align*}
df & = 2\operatorname{tr}(-\Delta^\top \big[\mathcal{P}(A^\top Q\,dA)^\top + \mathcal{P}(A^\top Q\,dA)\big]) \\
& = -2\operatorname{tr}\left(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top + \Delta^\top \mathcal{P}(A^\top Q\,dA)\right) \\
& = -2\left[\operatorname{tr}\big(\Delta\,\mathcal{P}(A^\top Q\,dA)^\top\big)
     + \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big)\right] \\
& = -4 \operatorname{tr}\big(\Delta^\top \mathcal{P}(A^\top Q\,dA)\big),
\end{align*}
```
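
The collapse from two trace terms to one can also be spot-checked numerically; in this sketch, $P$ stands for an arbitrary matrix playing the role of $\mathcal{P}(A^\top Q\,dA)$:

```julia
using LinearAlgebra

# With Δ symmetric, tr(Δ Pᵀ) = tr(Δᵀ P), so the two terms agree and
# df collapses to a single trace with a factor of -4.
Δ = Symmetric(rand(4, 4))
P = rand(4, 4)
@assert 2tr(-Δ' * (P' + P)) ≈ -4tr(Δ' * P)
```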
using the transpose-invariance and cyclic property of the trace in the final step. Writing the projection in terms of standard basis matrices, we learn that
```math
\begin{align*}
df & = -4 \operatorname{tr}\left(\Delta^\top \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} (A^\top Q\,dA)^\top E_{ij} \right] \right) \\
& = -4 \operatorname{tr}\left(\sum_{(i, j) \in \mathcal{C}} \Delta^\top E_{ij}\,dA^\top Q A E_{ij}\right) \\
& = -4 \operatorname{tr}\left(dA^\top Q A \left[ \sum_{(i, j) \in \mathcal{C}} E_{ij} \Delta^\top E_{ij} \right]\right) \\
& = -4 \operatorname{tr}\big(dA^\top Q A\,\mathcal{P}(\Delta)\big) \\
& = \langle\!\langle dA,\,-4 Q A\,\mathcal{P}(\Delta) \rangle\!\rangle.
\end{align*}
```
From here, we get a nice matrix expression for the negative gradient of the loss function:
\[ -\operatorname{grad}(f) = 4 Q A\,\mathcal{P}(\Delta). \]
This matrix is stored as `neg_grad` in the Rust and Julia implementations of the realization routine.
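
As a sanity check on this formula, here is a small finite-difference sketch. It assumes, consistently with the differentials above, that the loss is $f = \operatorname{tr}(\Delta^\top \Delta)$ with $\Delta = G - \mathcal{P}(A^\top Q A)$, a symmetric $Q$, and the mask-based `proj` from the earlier sketch; all names are illustrative, not the identifiers used in the Rust and Julia code:

```julia
using LinearAlgebra

proj(X, mask) = X .* mask   # hypothetical mask-based projection onto C

n = 4
Q = Symmetric(rand(n, n))               # the derivation assumes Qᵀ = Q
mask = Symmetric(rand(n, n) .< 0.7)     # transpose-invariant index set
A = rand(n, n)
G = proj(Symmetric(rand(n, n)), mask)   # target Gram entries, supported on C

Δ(A) = G - proj(A' * Q * A, mask)
f(A) = sum(abs2, Δ(A))                  # f = tr(Δᵀ Δ)
neg_grad = 4 * Q * A * proj(Δ(A), mask) # -grad(f) = 4 Q A P(Δ)

# The directional derivative along a random dA should match
# df = ⟨⟨dA, grad(f)⟩⟩ = -tr(dAᵀ · neg_grad).
dA = rand(n, n)
h = 1e-6
fd = (f(A + h * dA) - f(A - h * dA)) / (2h)
@assert isapprox(fd, -tr(dA' * neg_grad); rtol = 1e-4)
```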

@@ -73,8 +99,14 @@ This matrix is stored as `neg_grad` in the Rust and Julia implementations of the
Recalling that
\[ -d\Delta = \mathcal{P}(dA^\top Q A + A^\top Q\,dA), \]
we can express the derivative of $\operatorname{grad}(f)$ as
```math
\begin{align*}
d\operatorname{grad}(f) & = -4 Q\,dA\,\mathcal{P}(\Delta) - 4 Q A\,\mathcal{P}(d\Delta) \\
& = 4 Q\big[{-dA}\,\mathcal{P}(\Delta) + A\,\mathcal{P}(-d\Delta)\big].
\end{align*}
```
In the Rust and Julia implementations of the realization routine, we express $d\operatorname{grad}(f)$ as a matrix in the standard basis for $\operatorname{End}(\mathbb{R}^n)$. We apply the cotangent vector $d\operatorname{grad}(f)$ to each standard basis matrix $E_{ij}$ by setting the value of the matrix-valued 1-form $dA$ to $E_{ij}$.
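
As a rough illustration of that procedure, here is one way the matrix of $d\operatorname{grad}(f)$ could be assembled, reusing the hypothetical `proj` helper and symmetric `mask` from the sketches above. The column ordering $(i, j) \mapsto i + (j - 1)n$ matches Julia's column-major `vec`; it is an illustrative choice, not necessarily the one the implementations use:

```julia
# Assemble d(grad f) as an n² × n² matrix over the standard basis of
# End(ℝⁿ): evaluate the matrix-valued 1-form at dA = E_ij, column by column.
function dgrad_matrix(A, Q, G, mask)
    n = size(A, 1)
    PΔ = proj(G - proj(A' * Q * A, mask), mask)          # P(Δ)
    H = zeros(n^2, n^2)
    for j in 1:n, i in 1:n
        dA = zeros(n, n)
        dA[i, j] = 1.0                                   # dA = E_ij
        neg_dΔ = proj(dA' * Q * A + A' * Q * dA, mask)   # -dΔ = P(dAᵀQA + AᵀQ dA)
        dgrad = 4 * Q * (-dA * PΔ + A * neg_dΔ)          # 4Q[-dA P(Δ) + A P(-dΔ)]
        H[:, i + (j - 1) * n] = vec(dgrad)
    end
    return H
end
```

Since $\mathcal{P}$ is idempotent, `neg_dΔ` is already in the image of the projection, so multiplying it by $A$ directly gives exactly the $A\,\mathcal{P}(-d\Delta)$ term.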

#### Finding minima