From d06e8551ad009243f854a560c7e970b7ab287ee3 Mon Sep 17 00:00:00 2001
From: Vectornaut
Date: Mon, 12 Aug 2024 22:26:58 +0000
Subject: [PATCH] Write up Rust and Scala benchmarks

---
 Language-benchmarks.md | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 Language-benchmarks.md

diff --git a/Language-benchmarks.md b/Language-benchmarks.md
new file mode 100644
index 0000000..9cee1b3
--- /dev/null
+++ b/Language-benchmarks.md
@@ -0,0 +1,51 @@
+## Background
+Among the [languages we considered](Coding-environment), Rust and Scala were the only two that we'd be enthusiastic about using. Each one has a key disadvantage:
+
+- Rust doesn't have quiet syntax, so we'd have to invest in writing and maintaining a preprocessor.
+- Scala doesn't currently have a WebAssembly target (although there are plans for one), so we'd be limited to the performance of the JavaScript target.
+
+The decision between these two languages basically comes down to comparing the maintenance cost of a Rust preprocessor to the performance cost of JavaScript.
+## Benchmark computation
+To evaluate the performance cost, Aaron wrote the same benchmark program in Rust and in Scala (compiled to JavaScript). It does the following computation, given an even dimension $N$ and an integer period $R$:
+
+- Initialize a random matrix $A \colon \mathbb{R}^N \to \mathbb{R}^N$ whose entries are roughly independent and uniformly distributed in $[-1, 1]$. (The independence and uniformity probably aren't very good: we used a very simple hash function to make it easy to get the same matrix in both versions.)
+- Initialize an orthogonal matrix $T \colon \mathbb{R}^N \to \mathbb{R}^N$ that splits into a direct sum of rotations with periods in $\big\{R, \tfrac{R}{2}, \tfrac{R}{3}, \tfrac{R}{4}\big\}$.
+- Compute $A,\;TA,\;T^2A,\;\ldots,\;T^{R-1}A$ using $R$ left-multiplications by $T$.
+- Find the eigenvalues of $A,\;\ldots,\;T^{R-1}A$.
+
+To validate the computation, the benchmark program displays the eigenvalues of $T^r A$, with $r \in \{0, \ldots, R\}$ controlled by a slider. Displaying the eigenvalues isn't part of the benchmark computation, so it isn't timed.
+## Running the benchmark
+### Rust
+- To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
+  - The `--release` flag is crucial. By turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
+### Scala
+- To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
+  - The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
+  - Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
+- To run, launch a web server for the `scala-benchmark` folder and go to the URL that it's serving.
+## Program details
+### Rust
+To make the Rust computation more similar to the Scala computation, we do the successive left-multiplications using the code
+```rust
+rand_mat = &rot_step * rand_mat;
+```
+which might allocate new memory for the result every time it runs. We could avoid the allocation by doing something like
+```rust
+rot_step.mul_to(&rand_mat, &mut rand_mat_next);
+rand_mat.copy_from(&rand_mat_next);
+```
+where `rand_mat_next` is pre-allocated outside the loop.
+## Browser details
+- Firefox 128.0.3 (64-bit)
+- Ungoogled Chromium 127.0.6533.88
+
+Both browsers were running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
+## Results
+### Firefox
+The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
+- Rust run time: 110–120 ms
+- Scala run time: 700–1200 ms
+### Chromium
+The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
+- Rust run time: 80–90 ms
+- Scala run time: 520–590 ms
\ No newline at end of file
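
The speed ratios quoted in the results can be checked against the run-time ranges with a little arithmetic. The sketch below is only an illustration: `speedup_range` is a hypothetical helper that is not part of the benchmark code, and it assumes the extremes of the two ranges can be paired freely, so it gives the widest ratios the reported numbers allow rather than the "typical" figures.

```rust
/// Bounds on how many times faster the quicker program is, given the
/// observed (min, max) run times in milliseconds of each program.
/// Hypothetical helper for illustration; not taken from the benchmark code.
fn speedup_range(fast_ms: (f64, f64), slow_ms: (f64, f64)) -> (f64, f64) {
    // Smallest ratio: the slow program's best run over the fast program's worst run.
    let low = slow_ms.0 / fast_ms.1;
    // Largest ratio: the slow program's worst run over the fast program's best run.
    let high = slow_ms.1 / fast_ms.0;
    (low, high)
}
```

Calling `speedup_range((110.0, 120.0), (700.0, 1200.0))` for the Firefox numbers gives roughly 5.8 to 10.9, and `speedup_range((80.0, 90.0), (520.0, 590.0))` for the Chromium numbers gives roughly 5.8 to 7.4.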