Write up Rust native benchmark

2024-08-19 20:06:16 +00:00 · 2024-08-19 20:06:16 +00:00 · e97dee9a72
commit e97dee9a72
parent 34aa1eb66b
1 changed file with 18 additions and 7 deletions
--- a/Language-benchmarks.md
+++ b/Language-benchmarks.md
@@ -17,10 +17,13 @@ To validate the computation, the benchmark program displays the eigenvalues of $

 The language comparison benchmark uses 64-bit floating point matrices of size $N = 60$. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.
 ## Running the benchmark
-### Rust
+### Rust web
 - To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
-  - The `--release` flag is crucial. By turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems prevent the benchmark computation from showing up in the Firefox profiler.
-### Scala
+  - The [`--release`](https://doc.rust-lang.org/cargo/reference/profiles.html#release) flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
+### Rust native
+- To build and run, call `cargo run --release` from the `rust-benchmark-native` folder.
+  - As with the web benchmark, the `--release` flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.
+### Scala web
 - To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
  - The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
  - Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
@@ -43,14 +46,22 @@ where `rand_mat_next` is pre-allocated outside the loop.

 Both running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
 ## Results
-### Firefox
+### Rust vs. Scala on the web
+#### Firefox
 The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
 - Rust run time: 110–120 ms
 - Scala run time: 700–1200 ms
-### Chromium
+#### Chromium
 The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
- Rust 80–90 ms
- Scala: 520–590 ms
+- Rust run time: 80–90 ms
+- Scala run time: 520–590 ms
+### Web vs. native in Rust
+The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.
+- Rust native run time: 77 ms
+
+For this benchmark, WebAssembly achieved its aim of executing at near-native speed.
+
+*When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron*
 ## Rust benchmark variants
 ### Low-precision variant
 - For matrices of size $N = 50$, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, for $N \ge 54$, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?
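The guess in the low-precision note above (a convergence target that does not loosen for 32-bit floats) can be illustrated with a small stand-in. This sketch is not the benchmark's eigenvalue routine: it runs Newton's iteration for $\sqrt{2}$, and the function names, the tolerance `1e-12`, and the iteration cap are all hypothetical choices made for illustration. It only shows that a tolerance reachable in 64-bit arithmetic can be unreachable in 32-bit arithmetic, so a loop without an iteration cap would never exit.

```rust
/// Newton's iteration for sqrt(2) in 32-bit floats.
/// Returns Some(steps) once |x*x - 2| < tol, or None if `max_iter` is reached.
/// (Hypothetical stand-in; not the benchmark's actual algorithm.)
fn newton_sqrt2_f32(tol: f32, max_iter: u32) -> Option<u32> {
    let mut x = 1.0_f32;
    for step in 0..max_iter {
        if (x * x - 2.0).abs() < tol {
            return Some(step);
        }
        x = 0.5 * (x + 2.0 / x);
    }
    None
}

/// The same iteration in 64-bit floats.
fn newton_sqrt2_f64(tol: f64, max_iter: u32) -> Option<u32> {
    let mut x = 1.0_f64;
    for step in 0..max_iter {
        if (x * x - 2.0).abs() < tol {
            return Some(step);
        }
        x = 0.5 * (x + 2.0 / x);
    }
    None
}

fn main() {
    // Machine epsilon bounds the relative accuracy each type can represent.
    println!("f32 epsilon: {:e}", f32::EPSILON);
    println!("f64 epsilon: {:e}", f64::EPSILON);

    // A tolerance tuned for f64 (assumed value) is met in f64...
    println!("f64, tol 1e-12: {:?}", newton_sqrt2_f64(1e-12, 1000));
    // ...but lies below what f32 can resolve, so only the cap stops the loop.
    println!("f32, tol 1e-12: {:?}", newton_sqrt2_f32(1e-12, 1000));
    // A tolerance scaled to f32 precision is met again.
    println!("f32, tol 1e-6:  {:?}", newton_sqrt2_f32(1e-6, 1000));
}
```

If something like this is what happens in the benchmark, scaling the tolerance with the data type's machine epsilon, or capping the iteration count, would be two ways to avoid the hang. Whether either applies depends on the eigenvalue code actually used, which this page does not show.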