diff --git a/Language-benchmarks.md b/Language-benchmarks.md
index 6d549c3..ea6ba43 100644
--- a/Language-benchmarks.md
+++ b/Language-benchmarks.md
@@ -17,10 +17,13 @@ To validate the computation, the benchmark program displays the eigenvalues of $
 The language comparison benchmark uses 64-bit floating point matrices of size $N = 60$. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.
 ## Running the benchmark
-### Rust
+### Rust web
 - To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
-  - The `--release` flag is crucial. By turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems prevent the benchmark computation from showing up in the Firefox profiler.
-### Scala
+  - The [`--release`](https://doc.rust-lang.org/cargo/reference/profiles.html#release) flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems prevent the benchmark computation from showing up in the Firefox profiler.
+### Rust native
+- To build and run, call `cargo run --release` from the `rust-benchmark-native` folder.
+  - As with the web benchmark, the `--release` flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.
+### Scala web
 - To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
   - The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
   - Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
@@ -43,14 +46,22 @@ where `rand_mat_next` is pre-allocated outside the loop.
 Both running under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
 ## Results
-### Firefox
+### Rust vs. Scala on the web
+#### Firefox
 The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
 - Rust run time: 110–120 ms
 - Scala run time: 700–1200 ms
-### Chromium
+#### Chromium
 The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
-- Rust 80–90 ms
-- Scala: 520–590 ms
+- Rust run time: 80–90 ms
+- Scala run time: 520–590 ms
+### Web vs. native in Rust
+The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.
+- Rust native run time: 77 ms
+
+For this benchmark, WebAssembly achieved its aim of executing at near-native speed.
+
+*When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron*
 ## Rust benchmark variants
 ### Low-precision variant
 - For matrices of size $N = 50$, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, for $N \ge 54$, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?
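
Reviewer note on the low-precision hypothesis: the sketch below is a toy, not the benchmark's eigenvalue code, and it makes no claim about how that code actually decides to stop. It only illustrates one mechanism that would fit the guess: if an iterative routine keeps a convergence tolerance chosen for 64-bit floats, a 32-bit run may never satisfy it, because rounding error alone stays above the threshold. The function names, the Babylonian square-root iteration, and the tolerance values are all assumptions made for illustration.

```rust
// Hypothetical helper: generates a Babylonian square-root iteration for one
// float type. It returns the number of iterations needed to bring the
// residual |x*x - a| below `tol`, or None if `max_iter` is exhausted first.
macro_rules! babylonian_sqrt {
    ($name:ident, $t:ty) => {
        fn $name(a: $t, tol: $t, max_iter: u32) -> Option<u32> {
            let mut x = a;
            for i in 0..max_iter {
                if (x * x - a).abs() < tol {
                    return Some(i);
                }
                x = 0.5 * (x + a / x);
            }
            None
        }
    };
}

babylonian_sqrt!(sqrt_iters_f32, f32);
babylonian_sqrt!(sqrt_iters_f64, f64);

fn main() {
    // A tolerance that is reasonable for f64...
    println!("f64, tol 1e-12: {:?}", sqrt_iters_f64(2.0, 1e-12, 1000));
    // ...is unreachable in f32, where the residual cannot drop below
    // roughly f32::EPSILON, so the loop only ends at the iteration cap.
    println!("f32, tol 1e-12: {:?}", sqrt_iters_f32(2.0, 1e-12, 1000));
    // Scaling the tolerance to the type's machine epsilon restores convergence.
    println!(
        "f32, tol 8*EPSILON: {:?}",
        sqrt_iters_f32(2.0, 8.0 * f32::EPSILON, 1000)
    );
}
```

Without the `max_iter` cap, the second call would loop forever, which is the kind of hang described above. Note that the observed hang only appears for $N \ge 54$, so a fixed tolerance may be only part of the story, if it is involved at all.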