Write up Rust and Scala benchmarks
commit d06e8551ad (parent d9cf029e6e)
1 changed file with 51 additions and 0 deletions

Language-benchmarks.md
@ -0,0 +1,51 @
					## Background
					Among the [languages we considered](Coding-environment), Rust and Scala were the only two that we'd be enthusiastic about using. Each one has a key disadvantage:
					
					- Rust doesn't have quiet syntax, so we'd have to invest in writing and maintaining a preprocessor.
- Scala doesn't currently have a WebAssembly target (although there are plans for one), so we'd be limited to the performance of the JavaScript target.
					
The decision between these two languages basically comes down to comparing the maintenance cost of a Rust preprocessor to the performance cost of JavaScript.
					## Benchmark computation
To evaluate the performance cost, Aaron wrote a benchmark program in both Rust and Scala (the Scala version is compiled to JavaScript). It does the following computation, given an even dimension $N$ and an integer period $R$:
					
					- Initialize a random matrix $A \colon \mathbb{R}^N \to \mathbb{R}^N$ whose entries are roughly independent and uniformly distributed in $[-1, 1]$. (The independence and uniformity probably aren't very good: we used a very simple hash function to make it easy to get the same matrix in both versions.)
- Initialize an orthogonal matrix $T \colon \mathbb{R}^N \to \mathbb{R}^N$ that splits into a direct sum of rotations with periods in $\big\{R, \tfrac{R}{2}, \tfrac{R}{3}, \tfrac{R}{4}\big\}$.
					- Compute $A,\;TA,\;T^2A,\;\ldots,\;T^{R-1}A$ using $R$ left-multiplications by $T$.
- Find the eigenvalues of $A,\;\ldots,\;T^{R-1}A$.
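
The setup and multiplication steps above can be sketched with the standard library alone. This is a minimal illustration, not the benchmark's actual code: the hash in `hashed_entry`, the order in which periods are assigned to the rotation blocks, and every function name here are assumptions, and the eigenvalue step is left out because it needs a linear algebra library. Block $k$ rotates by $2\pi k / R$, which has period $R/k$ when $k$ divides $R$.

```rust
use std::f64::consts::TAU;

/// Dense square matrix stored as rows; a stand-in for a library matrix type.
type Matrix = Vec<Vec<f64>>;

/// Hypothetical hash: maps an entry position to a reproducible value in [-1, 1].
/// It only needs to be easy to port between languages, not statistically strong.
fn hashed_entry(row: usize, col: usize, n: usize) -> f64 {
    let mut h = (row * n + col) as u32;
    h = h.wrapping_mul(2_654_435_761).wrapping_add(0x9e37_79b9);
    h ^= h >> 15;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    2.0 * (h as f64 / u32::MAX as f64) - 1.0
}

/// The "random" matrix A: the same entries on every run and in every port.
fn random_matrix(n: usize) -> Matrix {
    (0..n)
        .map(|i| (0..n).map(|j| hashed_entry(i, j, n)).collect())
        .collect()
}

/// The matrix T: a direct sum of 2x2 rotations. Block b rotates by
/// 2*pi*k/period, with k cycling through 1, 2, 3, 4 (an assumed ordering).
fn rotation_sum(n: usize, period: u32) -> Matrix {
    assert!(n % 2 == 0, "the dimension must be even");
    let mut t = vec![vec![0.0; n]; n];
    for block in 0..n / 2 {
        let k = (block % 4 + 1) as f64;
        let (sin, cos) = (TAU * k / f64::from(period)).sin_cos();
        let i = 2 * block;
        t[i][i] = cos;
        t[i][i + 1] = -sin;
        t[i + 1][i] = sin;
        t[i + 1][i + 1] = cos;
    }
    t
}

/// Plain triple-loop product a * b of two square matrices of the same size.
fn mat_mul(a: &Matrix, b: &Matrix) -> Matrix {
    let n = a.len();
    let mut out = vec![vec![0.0; n]; n];
    for i in 0..n {
        for k in 0..n {
            let a_ik = a[i][k];
            for j in 0..n {
                out[i][j] += a_ik * b[k][j];
            }
        }
    }
    out
}

/// Returns [A, TA, ..., T^(R-1) A]. The loop multiplies R times; the last
/// product, T^R A, is computed but not stored.
fn orbit(a: &Matrix, t: &Matrix, period: u32) -> Vec<Matrix> {
    let mut steps = Vec::with_capacity(period as usize);
    let mut current = a.clone();
    for _ in 0..period {
        steps.push(current.clone());
        current = mat_mul(t, &current);
    }
    steps
}
```

Because every block's period divides $R$ in this sketch (for example $N = 8$, $R = 12$), one more multiplication after the last stored step should return to $A$ up to rounding error, which gives a cheap sanity check on $T$.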
					
					To validate the computation, the benchmark program displays the eigenvalues of $T^r A$, with $r \in \{0, \ldots, R\}$ controlled by a slider. Displaying the eigenvalues isn't part of the benchmark computation, so it isn't timed.
					## Running the benchmark
					### Rust
					- To build and run, call `trunk serve --release` from the `rust-benchmark` folder and go to the URL that Trunk is serving.
  - The `--release` flag is crucial. By enabling compiler optimizations and turning off development features like debug symbols, it makes the compiled code literally a hundred times faster on Aaron's machine. However, it also seems to prevent the benchmark computation from showing up in the Firefox profiler.
					### Scala
					- To build, call `sbt` from the `scala-benchmark` folder, and then call `fullLinkJS` from the `sbt` prompt.
					  - The benchmark page points to the JavaScript file produced by `fullLinkJS`. Calling `fastLinkJS` won't update the code the benchmark page uses, even if compilation succeeds.
					  - Using `fullLinkJS` instead of `fastLinkJS` is important. Doing a full build rather than a quick build provides more opportunities for optimization, making the transpiled code nearly twice as fast on Aaron's machine.
					- To run, launch a web server for the `scala-benchmark` folder and go to the URL that it's serving.
					## Program details
					### Rust
					To make the Rust computation more similar to the Scala computation, we do the successive left-multiplications using the code
```rust
rand_mat = &rot_step * rand_mat;
```
which might allocate new memory for the result every time it runs. We could avoid the allocation by doing something like
```rust
rot_step.mul_to(&rand_mat, &mut rand_mat_next);
rand_mat.copy_from(&rand_mat_next);
```
					where `rand_mat_next` is pre-allocated outside the loop.
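
To show what pre-allocating the output buffer buys, here is a minimal sketch using only the standard library. `SquareMatrix`, `mul_into`, and `advance` are hypothetical stand-ins, not the benchmark's types or any library's API. The sketch also swaps the two buffers instead of copying one into the other, which is a variation on the approach described above rather than what the benchmark does.

```rust
/// Row-major n-by-n matrix in a flat buffer (a stand-in for a library type).
struct SquareMatrix {
    n: usize,
    data: Vec<f64>,
}

impl SquareMatrix {
    fn zeros(n: usize) -> Self {
        Self { n, data: vec![0.0; n * n] }
    }

    /// Writes `self * rhs` into `out`, overwriting it without allocating.
    fn mul_into(&self, rhs: &SquareMatrix, out: &mut SquareMatrix) {
        let n = self.n;
        assert!(rhs.n == n && out.n == n, "dimension mismatch");
        out.data.fill(0.0);
        for i in 0..n {
            for k in 0..n {
                let a_ik = self.data[i * n + k];
                for j in 0..n {
                    out.data[i * n + j] += a_ik * rhs.data[k * n + j];
                }
            }
        }
    }
}

/// Left-multiplies `rand_mat` by `rot_step` `steps` times with one scratch buffer.
fn advance(rot_step: &SquareMatrix, rand_mat: &mut SquareMatrix, steps: usize) {
    // The only allocation: made once, outside the loop.
    let mut rand_mat_next = SquareMatrix::zeros(rand_mat.n);
    for _ in 0..steps {
        rot_step.mul_into(&*rand_mat, &mut rand_mat_next);
        // Swapping exchanges the buffers' pointers, so no entries are copied.
        std::mem::swap(&mut *rand_mat, &mut rand_mat_next);
    }
}
```

Whether this would change the measured times is untested here; it only illustrates that the loop body can run without touching the allocator.
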
					## Browser details
					- Firefox 128.0.3 (64-bit)
					- Ungoogled Chromium 127.0.6533.88
					
Both were run under Ubuntu 22.04 (64-bit) on an [AMD Ryzen 7 7840U](https://www.amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html) processor.
					## Results
					### Firefox
					The Rust version typically ran 6–11 times as fast as the Scala version, and its speed was much more consistent.
					- Rust run time: 110–120 ms
					- Scala run time: 700–1200 ms
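
As a rough check on the quoted speed ratio, assuming the fastest Scala runs are compared with the slowest Rust runs and vice versa, the endpoints of the two Firefox ranges give

$$\frac{700\ \text{ms}}{120\ \text{ms}} \approx 5.8, \qquad \frac{1200\ \text{ms}}{110\ \text{ms}} \approx 10.9,$$

which rounds to the 6–11 range stated above.
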
					### Chromium
					The Rust version typically ran 5–7 times as fast as the Scala version, with comparable consistency.
- Rust run time: 80–90 ms
- Scala run time: 520–590 ms
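
As a rough check on the quoted speed ratio, assuming the fastest Scala runs are compared with the slowest Rust runs and vice versa, the endpoints of the two Chromium ranges give

$$\frac{520\ \text{ms}}{90\ \text{ms}} \approx 5.8, \qquad \frac{590\ \text{ms}}{80\ \text{ms}} \approx 7.4,$$

so the endpoint ratios sit at the upper end of the 5–7 range stated above.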