Write up Rust and Scala benchmarks
parent
d9cf029e6e
commit
d06e8551ad
Language-benchmarks.md
## Background
Among the [languages we considered](Coding-environment), Rust and Scala were the only two that we'd be enthusiastic about using. Each one has a key disadvantage:

- Rust doesn't have quiet syntax, so we'd have to invest in writing and maintaining a preprocessor.
- Scala doesn't currently have a WebAssembly target (although there are plans for one), so we'll be limited to the performance of the JavaScript target.

The decision between these two languages basically comes down to comparing the maintenance cost of a Rust preprocessor to the performance cost of JavaScript.
## Benchmark computation
To evaluate the performance cost, Aaron wrote the same benchmark program in Rust and in Scala (which compiles to JavaScript). It does the following computation, given an even dimension $N$ and an integer period $R$:

- Initialize a random matrix $A \colon \mathbb{R}^N \to \mathbb{R}^N$ whose entries are roughly independent and uniformly distributed in $[-1, 1]$. (The independence and uniformity probably aren't very good: we used a very simple hash function to make it easy to get the same matrix in both versions.)
- Initialize an orthogonal matrix $T \colon \mathbb{R}^N \to \mathbb{R}^N$ that splits into a direct sum of rotations with periods in $\big\{R, \tfrac{R}{2}, \tfrac{R}{3}, \tfrac{R}{4}\big\}$.
- Compute $A,\;TA,\;T^2A,\;\ldots,\;T^{R-1}A$ using $R$ left-multiplications by $T$.
- Find the eigenvalues of each of $A,\;\ldots,\;T^{R-1}A$.

To validate the computation, the benchmark program displays the eigenvalues of $T^r A$, with $r \in \{0, \ldots, R\}$ controlled by a slider. Displaying the eigenvalues isn't part of the benchmark computation, so it isn't timed.
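
The first three steps can be sketched with only the Rust standard library. This is a minimal illustration, not the benchmark's actual code: the hash in `hash_entry`, the way rotation angles are assigned to blocks, and all function names are assumptions. The eigenvalue step is left out because it needs a linear algebra library.

```rust
use std::f64::consts::TAU;

type Matrix = Vec<Vec<f64>>;

/// Hypothetical stand-in for the "very simple hash function": maps an entry
/// index to a reproducible value in [-1, 1).
fn hash_entry(i: usize, j: usize, n: usize) -> f64 {
    let mut x = (i * n + j) as u64;
    x = x.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    x ^= x >> 29;
    x = x.wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x ^= x >> 32;
    // Keep the top 53 bits so the conversion to f64 is exact.
    (x >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0
}

/// Step 1: an n-by-n matrix with reproducible pseudo-random entries.
fn random_matrix(n: usize) -> Matrix {
    (0..n)
        .map(|i| (0..n).map(|j| hash_entry(i, j, n)).collect())
        .collect()
}

/// Step 2: a direct sum of 2x2 rotations. Block k rotates by 2*pi*m/period,
/// with m cycling through 1, 2, 3, 4 (an assumed assignment), so the block
/// has period `period / m` whenever m divides `period`.
fn rotation_step(n: usize, period: usize) -> Matrix {
    assert!(n % 2 == 0, "the dimension must be even");
    let mut t = vec![vec![0.0; n]; n];
    for k in 0..n / 2 {
        let m = (k % 4 + 1) as f64;
        let (s, c) = (TAU * m / period as f64).sin_cos();
        let i = 2 * k;
        t[i][i] = c;
        t[i][i + 1] = -s;
        t[i + 1][i] = s;
        t[i + 1][i + 1] = c;
    }
    t
}

/// Plain triple-loop product of two square matrices.
fn mat_mul(a: &Matrix, b: &Matrix) -> Matrix {
    let n = a.len();
    let mut out = vec![vec![0.0; n]; n];
    for i in 0..n {
        for k in 0..n {
            let a_ik = a[i][k];
            for j in 0..n {
                out[i][j] += a_ik * b[k][j];
            }
        }
    }
    out
}

/// Step 3: returns [A, TA, ..., T^(R-1) A] together with T^R A, using
/// R left-multiplications by T.
fn orbit(a: Matrix, t: &Matrix, period: usize) -> (Vec<Matrix>, Matrix) {
    let mut history = Vec::with_capacity(period);
    let mut current = a;
    for _ in 0..period {
        history.push(current.clone());
        current = mat_mul(t, &current);
    }
    (history, current)
}
```

For example, `orbit(random_matrix(8), &rotation_step(8, 12), 12)` returns the twelve matrices $A, \ldots, T^{11}A$ along with $T^{12}A$. Because every block's angle is a multiple of $2\pi/R$, that last matrix should agree with $A$ up to rounding, which gives a cheap sanity check on the rotation matrix.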
## Running the benchmark
### Rust
- To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
- The `--release` flag is crucial. By turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
### Scala
- To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
- The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
- Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
- To run, launch a web server for the `scala-benchmark` folder and go to the URL that it's serving.
## Program details
### Rust
To make the Rust computation more similar to the Scala computation, we do the successive left-multiplications using the code
```rust
rand_mat = &rot_step * rand_mat;
```
which might allocate new memory for the result every time it runs. We could avoid the allocation by doing something like
```rust
// `rand_mat_next` is borrowed mutably because `mul_to` writes the product into it
rot_step.mul_to(&rand_mat, &mut rand_mat_next);
rand_mat.copy_from(&rand_mat_next);
```
where `rand_mat_next` is pre-allocated outside the loop.
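
The same buffer-reuse idea can be sketched without any matrix library. This is an illustration under assumptions, not the benchmark's code: matrices are flat row-major slices, and `mul_into` and `apply_repeatedly` are hypothetical names. It also swaps the two buffers with `std::mem::swap` instead of copying the result back, which is a different choice from the `copy_from` call above.

```rust
/// Writes the product `t * a` into `out` without allocating.
/// All three arguments are n-by-n matrices stored row-major in flat slices.
fn mul_into(t: &[f64], a: &[f64], out: &mut [f64], n: usize) {
    assert!(t.len() == n * n && a.len() == n * n && out.len() == n * n);
    out.fill(0.0);
    for i in 0..n {
        for k in 0..n {
            let t_ik = t[i * n + k];
            for j in 0..n {
                out[i * n + j] += t_ik * a[k * n + j];
            }
        }
    }
}

/// Left-multiplies `a` by `t` a total of `steps` times, reusing one scratch
/// buffer for every product.
fn apply_repeatedly(t: &[f64], a: &mut Vec<f64>, steps: usize, n: usize) {
    // Allocated once, outside the loop.
    let mut next = vec![0.0; n * n];
    for _ in 0..steps {
        mul_into(t, a.as_slice(), next.as_mut_slice(), n);
        // Exchanges the two Vec handles; no matrix entries are copied.
        std::mem::swap(a, &mut next);
    }
}
```

Swapping avoids one pass over the matrix per step compared with copying the result back. Whether that matters next to the cost of the multiplication itself would have to be measured.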
## Browser details
- Firefox 128.0.3 (64-bit)
- Ungoogled Chromium 127.0.6533.88

Both were running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
## Results
### Firefox
The Rust version typically ran 6–11 times as fast as the Scala version, and its run time was much more consistent.
- Rust run time: 110–120 ms
- Scala run time: 700–1200 ms
### Chromium
The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
- Rust run time: 80–90 ms
- Scala run time: 520–590 ms
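
The quoted speed ratios can be checked against the timing ranges. The sketch below assumes the ratios were obtained by dividing the extremes of the two ranges; `speedup_bounds` is a hypothetical helper, not part of either benchmark.

```rust
/// Bounds on "how many times as fast" implied by two run-time ranges in ms.
/// Each range is given as (fastest, slowest).
fn speedup_bounds(rust_ms: (f64, f64), scala_ms: (f64, f64)) -> (f64, f64) {
    // Smallest ratio: fastest Scala run against slowest Rust run.
    let low = scala_ms.0 / rust_ms.1;
    // Largest ratio: slowest Scala run against fastest Rust run.
    let high = scala_ms.1 / rust_ms.0;
    (low, high)
}
```

With the Firefox timings, `speedup_bounds((110.0, 120.0), (700.0, 1200.0))` gives roughly 5.8 to 10.9. With the Chromium timings, `speedup_bounds((80.0, 90.0), (520.0, 590.0))` gives roughly 5.8 to 7.4. Both are in line with the rounded figures above.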