Language benchmarks

glen/dyna3

Fork 1

Table of Contents

Background
Benchmark computation
Running the benchmark

Rust web
Rust native
Scala web

Program details

Rust

Browser details
Results

Rust vs. Scala on the web

Firefox
Chromium

Web vs. native in Rust

Rust benchmark variants

Low-precision variant
Statically sized variant

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Background

Among the languages we considered, Rust and Scala were the only two that we'd be enthusiastic about using. Each one has a key disadvantage:

Rust doesn't have quiet syntax, so we'd have to invest in writing and maintaining a preprocessor.
Scala doesn't currently have a WebAssembly target (although there are plans for one), so we'll be limited to the performance of the JavaScript target.

The decision between these two languages basically comes down to comparing the maintenance cost of a Rust preprocessor to the perormance cost of JavaScript.

Benchmark computation

To evaluate the performance cost, Aaron wrote a benchmark program in Rust and JavaScript. It does the following computation, given an even dimensions N and an integer period R:

Initialize a random matrix A \colon \mathbb{R}^N \to \mathbb{R}^N whose entries are roughly independent and uniformly distributed in [-1, 1]. (The independence and uniformity probably aren't very good: we used a very simple hash function to make it easy to get the same matrix in both versions.)
Initialize an orthogonal matrix T \colon \colon \mathbb{R}^N \to \mathbb{R}^N that splits into a direct sum of rotations with periods in \big\{R, \tfrac{R}{2}, \tfrac{R}{3}, \tfrac{R}{4}\big\}.
Compute A,\;TA,\;T^2A,\;\ldots,\;T^{R-1}A using R left-multiplications by T.
Find the eigenvalues of A,\;\ldots\;T^{R-1}A.

To validate the computation, the benchmark program displays the eigenvalues of T^r A, with r \in \{0, \ldots, R\} controlled by a slider. Displaying the eigenvalues isn't part of the benchmark computation, so it isn't timed.

The language comparison benchmark uses 64-bit floating point matrices of size N = 60. Other variants of the benchmark, used to compare different design decisions within Rust, are described at the end.

Running the benchmark

Rust web

To build and run, call trunk serve --release from the rust-benchmark folder and go to the URL that Trunk is serving.
- The --release flag is crucial. By turning on compiler optimizations, turning off overflow checks, and changing other build settings, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems prevent the benchmark computation from showing up in the Firefox profiler.

Rust native

To build and run, call cargo run --release from the rust-benchmark-native folder.
- As with the web benchmark, the --release flag is crucial: it makes the compiled code about a hundred times faster on Aaron's machine.

Scala web

To build, call sbt from the scala-benchmark folder, and then call fullLinkJS from the sbt prompt.
- The benchmark page points to the JavaScript file produced by fullLinkJS. Calling fastLinkJS won't update the code the benchmark page uses, even if compilation succeeds.
- Using fullLinkJS instead of fastLinkJS is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
To run, launch a web server for the scala-benchmark folder and go to the URL that it's serving.

Program details

Rust

To make the Rust computation more similar to the Scala computation, we do the successive left-multiplications using the code

rand_mat = &rot_step * rand_mat;

which might allocate new memory to store the result in every time it runs. We could avoid the allocation by doing something like

rot_step.mul_to(&rand_mat, &rand_mat_next);
rand_mat.copy_from(&rand_mat_next);

where rand_mat_next is pre-allocated outside the loop.

Browser details

Firefox 128.0.3 (64-bit)
Ungoogled Chromium 127.0.6533.88

Both running under Ubuntu 22.04 (64-bit) on an AMD Ryzen 7 7840U processor.

Results

Rust vs. Scala on the web

Firefox

The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.

Rust run time: 110–120 ms
Scala run time: 700–1200 ms

Chromium

The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.

Rust run time: 80–90 ms
Scala run time: 520–590 ms

Web vs. native in Rust

The native version typically ran 1.00–1.15 times as fast as the web version on Chromium, and 1.4–1.5 times as fast as the web version on Firefox, with slightly more consistency.

Rust native run time: 77 ms

For this benchmark, WebAssembly achieved its aim of executing at near-native speed.

When I first added the GTK interface to the Rust native benchmark, the run time became unusually long for a while, hovering around 260 ms. I never figured out what was causing that. The run time eventually returned to typical after some combination of rebuilding, waiting, and shutting my laptop down for the night. —Aaron

Rust benchmark variants

Low-precision variant

For matrices of size N = 50, using 32-bit floating point instead of 64-bit made the computation about 15% faster (60 ms instead of 70 ms). However, for N \ge 54, the 32-bit floating point variant would hang indefinitely! Maybe the target precision doesn't change to accommodate the lower-precision data type?

Statically sized variant

For matrices of size N = 60, using statically sized matrices instead of dynamically sized ones made the computation about 10% slower (125–130 ms instead of 110–120 ms).
For matrices of size N = 50, using statically sized matrices made the computation about 15% slower (80 ms instead of 70 ms).
For matrices of size N = 20, statically and dynamically sized matrices gave comparable run times (12–15 ms).