Goals
Content
Project files should include:
- State snapshot
- Command history
A full description of an item should contain enough information to create it, so the item description format we use in the state snapshot might be useful for the command history too.
Format
We want a format which is:
- Established
- Text-based
- Human-readable
- Lightweight
  - For example, XML's syntax is too heavy
- Low line noise
- Represents floating-point values faithfully and readably
  - A decimal format would be best
Behavior
We want write-read and write-read-write round trips to be faithful.
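As a minimal sketch of what these two round-trip checks could look like, the snippet below uses a made-up stand-in for a project file: one floating-point coordinate per line, written with Rust's standard formatting. The names `write_coords`, `read_coords`, and the two check functions are hypothetical and only illustrate the property; a real project file would go through whichever serialization framework we choose.

```rust
/// Hypothetical stand-in for a project file: one coordinate per line.
fn write_coords(coords: &[f64]) -> String {
    coords
        .iter()
        .map(|x| x.to_string())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Parse the text produced by `write_coords`.
fn read_coords(text: &str) -> Result<Vec<f64>, std::num::ParseFloatError> {
    text.lines().map(|line| line.trim().parse::<f64>()).collect()
}

/// Write-read: the values that come back have the same bits as the ones written.
fn write_read_is_faithful(coords: &[f64]) -> bool {
    match read_coords(&write_coords(coords)) {
        Ok(back) => {
            back.len() == coords.len()
                && back.iter().zip(coords).all(|(a, b)| a.to_bits() == b.to_bits())
        }
        Err(_) => false,
    }
}

/// Write-read-write: the second write reproduces the first text exactly.
fn write_read_write_is_faithful(coords: &[f64]) -> bool {
    let first = write_coords(coords);
    match read_coords(&first) {
        Ok(back) => write_coords(&back) == first,
        Err(_) => false,
    }
}
```

The write-read check compares bit patterns rather than using `==`, so it would also catch a sign change on zero. The write-read-write check compares text, which is the property that keeps project files stable under version control.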
Framework and format
CAD formats
STEP
This format is described by the ISO 10303 standard. The standard includes an "Integrated Resource," described by ISO 10303-108, for "Parameterization and constraints for explicit geometric product models." As of 2008, there was no STEP Application Protocol that incorporated this integrated resource, but the authors of the paper below experimented with a hypothetical application protocol.
- Junhwan Kim, Michael J. Pratt, Raj G. Iyer, and Ram D. Sriram. "Standardized data exchange of CAD models with design intent". Computer-Aided Design 40 (7), 2008.
There's an experimental Rust crate for reading and writing STEP files, called ruststep.
This blog post about writing a STEP parser includes some concrete examples of STEP files.
IGES
This page claims that IGES "lacked the ability to store higher-level information like feature history or assembly constraints."
SolveSpace
It might be useful to look at the SolveSpace file format (.slvs), with the caveat that there's an ongoing effort to replace it.
None of the SolveSpace export options talk about preserving constraint information.
Rust frameworks
Serde is a very popular serialization framework for software written in Rust. It supports many formats out of the box. Here are the human-readable ones that are designed for hierarchical data and are supported for both serialization and deserialization:
- JSON
- YAML
- TOML
- RON
- JSON5
- S-expressions
State of the art
Package manifests in Julia, Python, and Rust are all written in TOML.
Serious candidates
- TOML: it looks good at first, but the cautions below suggest that it breaks down as the data gets more complicated.
- StrictYAML: with all its drawbacks, full YAML does not seem to be a candidate. StrictYAML, however, is a subset that might be palatable, and perhaps Serde already produces StrictYAML or can easily be adapted to do so. StrictYAML also optionally uses a schema for reading, which might be sensible for our use case.
Comparison
Table
Wikipedia has a pretty extensive comparison of data-serialization formats. (This table does not include RON or TOML, however.)
Downsides
These are articles I came across while trying to understand YAML's bad reputation. I made no attempt at an unbiased comparison between formats.
EDN
JSON
YAML
TOML
Floating-point representation
We probably don't need to know much about how to represent floating-point numbers as human-readable strings, since the serialization framework we choose will most likely do it for us. These articles are useful background, though.
Ryan Juckett has written a detailed history of faithful string formatting algorithms for floating-point numbers.
Ryan Juckett. "Printing Floating-Point Numbers" (2014).
A few years after this article was published, Ulf Adams introduced a new algorithm called Ryū, which seems to be the current state of the art.
Ulf Adams. "Ryū: fast float-to-string conversion" (2018).
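To make "faithful" concrete, here is a small sketch using only Rust's standard library. It relies on the observed behavior that the default `Display` formatting of an `f64` emits the shortest decimal string that parses back to the same value; the standard library's algorithm is not necessarily Ryū itself. The helper names are hypothetical.

```rust
/// Format with Rust's default `Display`, which emits the shortest decimal
/// string that parses back to the same `f64`.
fn shortest_decimal(x: f64) -> String {
    format!("{}", x)
}

/// Format with a fixed number of digits after the decimal point.
fn fixed_decimal(x: f64, digits: usize) -> String {
    format!("{:.*}", digits, x)
}

/// True when `text` parses back to exactly the bits of `x`.
fn parses_back_to(text: &str, x: f64) -> bool {
    text.parse::<f64>().map_or(false, |y| y.to_bits() == x.to_bits())
}
```

For example, `shortest_decimal(0.1 + 0.2)` gives `"0.30000000000000004"`, which parses back to the original value, while `fixed_decimal(0.1 + 0.2, 3)` gives `"0.300"`, which is easier to read but parses to a different number. A shortest-round-trip formatter is what lets a decimal format be both faithful and reasonably readable.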
Alternatively, we're open to using an exact hexadecimal representation.
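One way an exact hexadecimal representation could work is sketched below, under the assumption that writing the raw IEEE 754 bit pattern is acceptable. This is not the C99-style hexadecimal float notation (such as `0x1.8p1`), which Rust's standard library does not format as far as we know. The function names are hypothetical.

```rust
/// Write an `f64` as the 16 hex digits of its IEEE 754 bit pattern.
fn to_hex_bits(x: f64) -> String {
    format!("{:016x}", x.to_bits())
}

/// Read back a value written by `to_hex_bits`.
fn from_hex_bits(text: &str) -> Result<f64, std::num::ParseIntError> {
    u64::from_str_radix(text, 16).map(f64::from_bits)
}
```

This round-trips every value exactly, including negative zero, but a string like `3fb999999999999a` (the value nearest to 0.1) is much harder to read than its decimal form, so it trades away the readability goal stated earlier.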