Write up Rust native benchmark
parent 34aa1eb66b
commit e97dee9a72
@@ -17,10 +17,13 @@ To validate the computation, the benchmark program displays the eigenvalues of $

The language comparison benchmark uses 64-bit floating point matrices of size $N = 60$. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.
## Running the benchmark
### Rust web
- To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
- The [`--release`](https://doc.rust-lang.org/cargo/reference/profiles.html#release) flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
### Rust native
- To build and run, call `cargo run --release` from the `rust-benchmark-native` folder.
- As with the web benchmark, the `--release` flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.
### Scala web
- To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
- The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
- Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
@@ -43,14 +46,22 @@ where `rand_mat_next` is pre-allocated outside the loop.

Both running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
## Results
### Rust vs. Scala on the web
#### Firefox
The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
- Rust run time: 110–120 ms
- Scala run time: 700–1200 ms
#### Chromium
The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
- Rust run time: 80–90 ms
- Scala run time: 520–590 ms
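The speedup factors quoted for both browsers can be sanity-checked against the listed run times. The sketch below is only an illustration: `speedup_range` is a hypothetical helper, not part of the benchmark code. It compares the endpoints of each range, so it gives worst-case and best-case ratios, which can be somewhat wider than the "typical" factors stated in the text.

```rust
/// Hypothetical helper (not part of the benchmark code): given the (min, max)
/// run times of a faster and a slower program, in milliseconds, return the
/// smallest and largest possible speedup factors.
fn speedup_range(fast_ms: (f64, f64), slow_ms: (f64, f64)) -> (f64, f64) {
    // Smallest factor: the slow program's best time against the fast program's worst.
    let lowest = slow_ms.0 / fast_ms.1;
    // Largest factor: the slow program's worst time against the fast program's best.
    let highest = slow_ms.1 / fast_ms.0;
    (lowest, highest)
}
```

For example, `speedup_range((110.0, 120.0), (700.0, 1200.0))` gives roughly (5.8, 10.9) for Firefox, and `speedup_range((80.0, 90.0), (520.0, 590.0))` gives roughly (5.8, 7.4) for Chromium.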
### Web vs. native in Rust
The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.
- Rust native run time: 77 ms

For this benchmark, WebAssembly achieved its aim of executing at near-native speed.
*When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron*
## Rust benchmark variants
### Low-precision variant
- For matrices of size $N = 50$, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, for $N \ge 54$, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?
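One way to picture the guess above: if an iterative routine stops only when its residual drops below a tolerance chosen for 64-bit precision, a 32-bit computation may never reach it. The sketch below is not the benchmark's eigenvalue code and does not claim to explain the actual hang. It uses Newton's iteration for a square root purely as a stand-in, and `newton_sqrt_f32` and its parameters are hypothetical names introduced for illustration.

```rust
/// Hypothetical stand-in for an iterative solver (not the benchmark's code):
/// Newton's iteration for sqrt(a), carried out entirely in f32, for a > 0.
/// Returns the number of steps taken before the residual |x*x - a| dropped
/// below `tol`, or None if that never happened within `max_steps`.
fn newton_sqrt_f32(a: f32, tol: f32, max_steps: usize) -> Option<usize> {
    let mut x = a; // crude starting guess
    for step in 0..max_steps {
        // Convergence test: the residual is itself computed in f32, so it
        // cannot be resolved much more finely than f32::EPSILON * a.
        if (x * x - a).abs() < tol {
            return Some(step);
        }
        // Newton update for f(x) = x^2 - a.
        x = 0.5 * (x + a / x);
    }
    // The step cap stands in for what would otherwise be an endless loop.
    None
}
```

With `a = 2.0`, a tolerance of `1e-12` is never met, because no `f32` value squares to within `1e-12` of 2, so the function returns `None` once `max_steps` is exhausted. A tolerance scaled to the data type, such as `4.0 * f32::EPSILON * a`, is reached within a few steps. Without the step cap, the first case would loop forever.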