Write up Rust native benchmark

Vectornaut 2024-08-19 20:06:16 +00:00
parent 34aa1eb66b
commit e97dee9a72

@@ -17,10 +17,13 @@ To validate the computation, the benchmark program displays the eigenvalues of $
 The language comparison benchmark uses 64-bit floating point matrices of size $N = 60$. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.
 ## Running the benchmark
-### Rust
+### Rust web
 - To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
-- The `--release` flag is crucial. By turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
+- The [`--release`](https://doc.rust-lang.org/cargo/reference/profiles.html#release) flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
-### Scala
+### Rust native
+- To build and run, call `cargo run --release` from the `rust-benchmark-native` folder.
+- As with the web benchmark, the `--release` flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.
+### Scala web
 - To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
 - The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
 - Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
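Since both Rust bullets above depend on passing `--release`, a benchmark binary could report which kind of build it is before timing anything. The following is only a minimal sketch: `build_kind` is a hypothetical helper, not something taken from the benchmark code. It relies on the `debug_assertions` configuration option, which Cargo enables in the default `dev` profile and disables in the default `release` profile.

```rust
/// Hypothetical helper (not part of the benchmark): reports whether this
/// binary was compiled with debug assertions enabled. With Cargo's default
/// profiles, that means a `dev` build rather than a `--release` build.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    // Warn before running any timed code, so that slow debug-build
    // timings are not mistaken for real benchmark results.
    if build_kind() == "debug" {
        eprintln!("warning: debug build; timings will not be representative");
    }
    println!("build kind: {}", build_kind());
}
```

A custom profile can change `debug-assertions` independently of the optimization level, so this check is a heuristic rather than a guarantee.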
@@ -43,14 +46,22 @@ where `rand_mat_next` is pre-allocated outside the loop.
 Both running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
 ## Results
-### Firefox
+### Rust vs. Scala on the web
+#### Firefox
 The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
 - Rust run time: 110–120 ms
 - Scala run time: 700–1200 ms
-### Chromium
+#### Chromium
 The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
-- Rust 80–90 ms
+- Rust run time: 80–90 ms
-- Scala: 520–590 ms
+- Scala run time: 520–590 ms
+### Web vs. native in Rust
+The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.
+- Rust native run time: 77 ms
+For this benchmark, WebAssembly achieved its aim of executing at near-native speed.
+*When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron*
 ## Rust benchmark variants
 ### Low-precision variant
 - For matrices of size $N = 50$, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, for $N \ge 54$, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?
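The conjecture in the bullet above can be pictured with a much simpler iteration than an eigenvalue solver. The sketch below is an analogy only: it does not use the benchmark's code or its linear algebra library, and the tolerance `1e-12` and the helper names are assumptions chosen for illustration. It runs Newton's method for $\sqrt{2}$ with the same stopping tolerance in both precisions. Near 2, adjacent 32-bit floats are about $1.2 \times 10^{-7}$ apart, so a residual test against `1e-12` cannot succeed in `f32` unless the residual is exactly zero, and a loop without a step budget would spin forever.

```rust
/// Newton's method for sqrt(2) in 64-bit floats, stopping once
/// |x*x - 2| < tol. Returns the number of steps taken, or `None`
/// if the step budget runs out first.
fn newton_steps_f64(tol: f64, max_steps: u32) -> Option<u32> {
    let mut x = 1.0_f64;
    for step in 0..max_steps {
        if (x * x - 2.0).abs() < tol {
            return Some(step);
        }
        x = 0.5 * (x + 2.0 / x);
    }
    None
}

/// The same iteration in 32-bit floats. The tolerance is NOT adjusted
/// for the lower precision, mirroring the conjecture in the text.
fn newton_steps_f32(tol: f32, max_steps: u32) -> Option<u32> {
    let mut x = 1.0_f32;
    for step in 0..max_steps {
        if (x * x - 2.0).abs() < tol {
            return Some(step);
        }
        x = 0.5 * (x + 2.0 / x);
    }
    None
}

fn main() {
    // Hypothetical tolerance, tuned for f64 and reused unchanged for f32.
    println!("f64: {:?}", newton_steps_f64(1e-12, 1_000));
    println!("f32: {:?}", newton_steps_f32(1e-12, 1_000));
    // A tolerance scaled to f32 precision lets the f32 loop terminate.
    println!("f32, looser tolerance: {:?}", newton_steps_f32(1e-5, 1_000));
}
```

If something similar is happening in the benchmark, scaling the tolerance to the data type (for example, as a multiple of `f32::EPSILON`) or capping the iteration count would be natural things to try. Whether that explains the hang for $N \ge 54$ is not established here.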