This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Background
Among the languages we considered, Rust and Scala were the only two that we'd be enthusiastic about using. Each one has a key disadvantage:
- Rust doesn't have quiet syntax, so we'd have to invest in writing and maintaining a preprocessor.
- Scala doesn't currently have a WebAssembly target (although there are plans for one), so we'll be limited to the performance of the JavaScript target.
The decision between these two languages basically comes down to comparing the maintenance cost of a Rust preprocessor to the perormance cost of JavaScript.
Benchmark computation
To evaluate the performance cost, Aaron wrote a benchmark program in Rust and JavaScript. It does the following computation, given an even dimensions N
and an integer period R
:
- Initialize a random matrix
A \colon \mathbb{R}^N \to \mathbb{R}^N
whose entries are roughly independent and uniformly distributed in[-1, 1]
. (The independence and uniformity probably aren't very good: we used a very simple hash function to make it easy to get the same matrix in both versions.) - Initialize an orthogonal matrix
T \colon \colon \mathbb{R}^N \to \mathbb{R}^N
that splits into a direct sum of rotations with periods in\big\{R, \tfrac{R}{2}, \tfrac{R}{3}, \tfrac{R}{4}\big\}
. - Compute
A,\;TA,\;T^2A,\;\ldots,\;T^{R-1}A
usingR
left-multiplications byT
. - Find the eigenvalues of
A,\;\ldots\;T^{R-1}A
.
To validate the computation, the benchmark program displays the eigenvalues of T^r A
, with r \in \{0, \ldots, R\}
controlled by a slider. Displaying the eigenvalues isn't part of the benchmark computation, so it isn't timed.
The language comparison benchmark uses 64-bit floating point matrices of size N = 60
. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.
Running the benchmark
Rust web
- To build and run, call
trunk serve --release
from therust-benchmark
folder and go to the URL that Trunk is serving.- The
--release
flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems prevent the benchmark computation from showing up in the Firefox profiler.
- The
Rust native
- To build and run, call
cargo run --release
from therust-benchmark-native
folder.- As with the web benchmark, the
--release
flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.
- As with the web benchmark, the
Scala web
- To build, call
sbt
from thescala-benchmark
folder, and then callfullLinkJS
from thesbt
prompt.- The benchmark page points to the JavaScript file produced by
fullLinkJS
. CallingfastLinkJS
won't update the code the benchmark page uses, even if compilation succeeds. - Using
fullLinkJS
instead offastLinkJS
is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
- The benchmark page points to the JavaScript file produced by
- To run, launch a web server for the
scala-benchmark
folder and go to the URL that it's serving.
Program details
Rust
To make the Rust computation more similar to the Scala computation, we do the successive left-multiplications using the code
rand_mat = &rot_step * rand_mat;
which might allocate new memory to store the result in every time it runs. We could avoid the allocation by doing something like
rot_step.mul_to(&rand_mat, &rand_mat_next);
rand_mat.copy_from(&rand_mat_next);
where rand_mat_next
is pre-allocated outside the loop.
Browser details
- Firefox 128.0.3 (64-bit)
- Ungoogled Chromium 127.0.6533.88
Both running under Ubuntu 22.04 (64-bit) on an AMD Ryzen 7 7840U processor.
Results
Rust vs. Scala on the web
Firefox
The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
- Rust run time: 110–120 ms
- Scala run time: 700–1200 ms
Chromium
The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
- Rust run time: 80–90 ms
- Scala run time: 520–590 ms
Web vs. native in Rust
The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.
- Rust native run time: 77 ms
For this benchmark, WebAssembly achieved its aim of executing at near-native speed.
When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron
Rust benchmark variants
Low-precision variant
- For matrices of size
N = 50
, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, forN \ge 54
, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?
Statically sized variant
- For matrices of size
N = 60
, using statically sized matrices instead of dynamically sized ones made the computation about 10% slower (125–130 ms instead of 110–120 ms). - For matrices of size
N = 50
, using statically sized matrices made the computation about 15% slower (80 ms instead of 70 ms). - For matrices of size
N = 20
, statically and dynamically sized matrices gave comparable run times (12–15 ms).