4 Serialization
Glen Whitney edited this page 2025-02-25 14:49:19 +00:00

Goals

Content

Project files should include:

  • State snapshot
  • Command history

A full description of an item should contain enough information to create it, so the item description format we use in the state snapshot might be useful for the command history too.
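Here is a minimal sketch of the idea that one item description format can serve both the state snapshot and the command history. All type and field names (`ItemDescription`, `kind`, `params`, `Command`, `Project`, `replay`) are hypothetical and used only for illustration. A `Create` command carries the same description the snapshot stores. Replaying the history from an empty project should then reproduce the snapshot.

```rust
/// Hypothetical item description: enough information to recreate one item.
#[derive(Debug, Clone, PartialEq)]
struct ItemDescription {
    id: u64,
    kind: String,     // hypothetical, e.g. "sphere"
    params: Vec<f64>, // hypothetical numeric parameters
}

/// Hypothetical commands; `Create` reuses the snapshot's item description.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Create(ItemDescription),
    Delete { id: u64 },
}

#[derive(Debug, Default)]
struct Project {
    snapshot: Vec<ItemDescription>, // current state
    history: Vec<Command>,          // every command applied so far
}

impl Project {
    /// Apply a command to the snapshot and record it in the history.
    fn apply(&mut self, cmd: Command) {
        match &cmd {
            Command::Create(desc) => self.snapshot.push(desc.clone()),
            Command::Delete { id } => self.snapshot.retain(|d| d.id != *id),
        }
        self.history.push(cmd);
    }
}

/// Rebuild the state from the history alone, starting from an empty project.
fn replay(history: &[Command]) -> Vec<ItemDescription> {
    let mut project = Project::default();
    for cmd in history {
        project.apply(cmd.clone());
    }
    project.snapshot
}

fn main() {
    let mut project = Project::default();
    project.apply(Command::Create(ItemDescription { id: 1, kind: "sphere".into(), params: vec![0.5] }));
    project.apply(Command::Create(ItemDescription { id: 2, kind: "plane".into(), params: vec![0.0, 0.0, 1.0] }));
    project.apply(Command::Delete { id: 1 });
    // The snapshot and a replay of the history should agree.
    assert_eq!(replay(&project.history), project.snapshot);
    println!("{:?}", project.snapshot);
}
```

Storing both parts in a project file is somewhat redundant, but the snapshot lets a project load without replaying every command. The shared description type means only one format has to be designed and kept stable.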

Format

We want a format which is:

  • Established
  • Text-based
  • Human-readable
  • Lightweight
    • For example, XML's syntax is too heavy
  • Low line noise
  • Represents floating-point values faithfully and readably
    • A decimal format would be best

Framework and format

Rust frameworks

Serde is a very popular serialization framework for software written in Rust. It supports many formats out of the box. Here are the human-readable ones that are designed for hierarchical data and are supported for both serialization and deserialization:

  • JSON
  • YAML
  • TOML
  • RON
  • JSON5
  • S-expressions

State of the art

Package manifests in Julia, Python, and Rust are all written in TOML.

Serious candidates

  • TOML, though the cautions below suggest that it looks good at first and then breaks down as the data gets more complicated
  • StrictYAML: given all its drawbacks, YAML itself does not seem to be a candidate. But StrictYAML, a restricted subset of YAML, might be palatable, and perhaps Serde already produces StrictYAML or could easily be adapted to do so. StrictYAML also optionally uses a schema for reading, which might make sense in our use case.

Comparison

Table

Wikipedia has a pretty extensive comparison of data-serialization formats. (This table does not include RON or TOML, however.)

Downsides

These are articles I came across while trying to understand YAML's bad reputation. I made no attempt at an unbiased comparison between formats.

EDN

JSON

YAML

TOML

Floating-point representation

We probably don't need to know much about how to represent floating-point numbers as human-readable strings, since whichever serialization framework we choose will likely handle it for us. These articles are still useful background.

Ryan Juckett has written a detailed history of faithful string formatting algorithms for floating-point numbers.

Ryan Juckett. "Printing Floating-Point Numbers" (2014).

A few years after this article was published, Ulf Adams introduced a new algorithm called Ryū, which seems to be the current state of the art.

Ulf Adams. "Ryū: fast float-to-string conversion" (2018).
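As a small illustration of the faithful, decimal output we want, Rust's standard library already formats `f64` this way. With `{}` and no explicit precision, it prints the shortest decimal string that parses back to exactly the same value. It uses its own algorithm for this, not necessarily Ryū. The helper names `to_decimal` and `round_trips` are hypothetical.

```rust
/// Format an f64 with Rust's default `Display`. With no precision given,
/// this is the shortest decimal string that parses back to the same value.
fn to_decimal(x: f64) -> String {
    format!("{}", x)
}

/// True if the decimal string parses back to a bit-identical f64.
fn round_trips(x: f64) -> bool {
    match to_decimal(x).parse::<f64>() {
        Ok(y) => y.to_bits() == x.to_bits(),
        Err(_) => false,
    }
}

fn main() {
    for &x in &[0.1, 0.1 + 0.2, 1.0 / 3.0, 1.0] {
        println!("{} (round-trips: {})", to_decimal(x), round_trips(x));
    }
}
```

For example, `0.1` prints as `0.1`, while `0.1 + 0.2` prints as `0.30000000000000004`, because the sum is a different double from the one nearest to 0.3. Note that `Display` writes `1.0` as `1`; the `{:?}` (Debug) format keeps the `.0`, which may matter if the reader needs to tell integers from floats.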