WIP: proof-of-concept of allowing expressions in pcSystems #52

Draft
glen wants to merge 12 commits from pc_notation into parser_combinator
Owner

One objection to parser-combinator was that its functional notation for rules is less clear and feels more verbose than the domain-specific languages of packages like peggy, Nearley, or Ohm. This PR demonstrates how easily parser-combinators can be used to define its own mini-DSL for specifying rules, providing more typical/comfortable syntax for a the mathjs grammar. The first commit only handles tokens, nonterminals, and ordered choice (any() in parser combinators, / in peggy, | in Ohm and Nearley), but already it makes the grammar more readable. This approach could easily be extended to allow * and + for repetition, whitespace-separation for sequencing, @ for value selection per peggy, etc, thereby allowing the mathjs grammar to be expressed almost entirely (or perhaps entirely) in a DSL defined using parser-combinators itself.

One objection to parser-combinator was that its functional notation for rules is less clear and feels more verbose than the domain-specific languages of packages like peggy, Nearley, or Ohm. This PR demonstrates how easily parser-combinators can be used to define its own mini-DSL for specifying rules, providing more typical/comfortable syntax for a the mathjs grammar. The first commit only handles tokens, nonterminals, and ordered choice (`any()` in parser combinators, `/` in peggy, `|` in Ohm and Nearley), but already it makes the grammar more readable. This approach could easily be extended to allow `*` and `+` for repetition, whitespace-separation for sequencing, `@` for value selection per peggy, etc, thereby allowing the mathjs grammar to be expressed almost entirely (or perhaps entirely) in a DSL defined using parser-combinators itself.
feat: proof-of-concept of allowing expressions in pcSystems
All checks were successful
/ test (pull_request) Successful in 17s
e73557a45d
refactor: more straightforward way to specify identifier handling
All checks were successful
/ test (pull_request) Successful in 17s
f2f4e07767
feat: add sequences, optionally w/ result entries picked, to notation
All checks were successful
/ test (pull_request) Successful in 19s
cd863490ac
Author
Owner

To continue/extend the proof of concept, I just added sequencing to the peggy-like notation, optionally with @ symbols to mark the items that should be "plucked" into the result of parsing the sequence. This latter feature borrowed from peggy makes it easy, for example, to ignore syntax-only punctuation in the result returned from parsing a sequence (such as the commas in a list of arguments).

At this point, already 19 out of 46 lines in the mathjs grammar specification can be expressed in the peggy-like notation, even though the notation parser only has 7 rules. Several more of the grammar lines would come just with "optional," "zero or more," and/or "one or more notations" (?, *, and/or +)

To continue/extend the proof of concept, I just added sequencing to the peggy-like notation, optionally with `@` symbols to mark the items that should be "plucked" into the result of parsing the sequence. This latter feature borrowed from peggy makes it easy, for example, to ignore syntax-only punctuation in the result returned from parsing a sequence (such as the commas in a list of arguments). At this point, already 19 out of 46 lines in the mathjs grammar specification can be expressed in the peggy-like notation, even though the notation parser only has 7 rules. Several more of the grammar lines would come just with "optional," "zero or more," and/or "one or more notations" (`?`, `*`, and/or `+`)
refactor: rename 'pick' to 'pluck' to match peggy trminology
All checks were successful
/ test (pull_request) Successful in 18s
ff0e699f1a
feat: add quantifiers ? and +/* optionally with separators
All checks were successful
/ test (pull_request) Successful in 18s
19aecf04a4
Author
Owner

Indeed, I have now added the quantifiers ?, *, and + to the peggy-like notation, plus an extension to that notation suggested by the parser-combinators zeroOrMany and oneOrMany combinators: if there is an expression in square brackets after the quantifier, that represents a "separator" that must match between occurrences of the quantified item, but the matches of the separator are dropped. For example, Assignment+[%comma] represents a comma-separated list of one or more Assignment matches, and returns an array of containing just the results of the Assignment matches.

With this added, just over half of the rules in mathjs grammar are defined using the peggy-like notation. The next bit to add that would allow more rules to be expressed this way is parenthesization to indicate subexpressions.

Indeed, I have now added the quantifiers `?`, `*`, and `+` to the peggy-like notation, plus an extension to that notation suggested by the parser-combinators `zeroOrMany` and `oneOrMany` combinators: if there is an expression in square brackets after the quantifier, that represents a "separator" that must match between occurrences of the quantified item, but the matches of the separator are dropped. For example, `Assignment+[%comma]` represents a comma-separated list of one or more Assignment matches, and returns an array of containing just the results of the Assignment matches. With this added, just over half of the rules in mathjs grammar are defined using the peggy-like notation. The next bit to add that would allow more rules to be expressed this way is parenthesization to indicate subexpressions.
refactor: Allow an optional marker ? in addition to a quantifier +
All checks were successful
/ test (pull_request) Successful in 17s
4ad9a81a44
Actually for implementation convenience `X*?` is allowed too, although
  it is not useful because `X*` always matches, so the optionality never
  has any effect. But `X+?` has subtly different behavior even though it
  accepts exactly the same expressions that `X*` does: it produces a result
  of `null` when there is no X at this point in the parse, whereas `X*`
  produces an empty array. The distinction can occasionally be important,
  for example when plucking results, the `null` will not be plucked.
feat: add parenthesized subexpressions to the pc notation
All checks were successful
/ test (pull_request) Successful in 17s
a3e23cd079
Author
Owner

OK, I added parenthesized subexpressions as well as allowed X+? which is very similar to X* except that it produces a null result when there are no occurrences of X (as opposed to X* that produces the empty list). This distinction is important when plucking, because nulls are never plucked.

With these changes, just 18 of 46 rules in the mathjs grammar can't be expressed in the peggyjs-like notation.

The next feature that would get a number of the remaining rules are positive and negative assertions, likely with prefix operators & for positive and ! for negative.

OK, I added parenthesized subexpressions as well as allowed `X+?` which is very similar to `X*` except that it produces a null result when there are no occurrences of X (as opposed to `X*` that produces the empty list). This distinction is important when plucking, because nulls are never plucked. With these changes, just 18 of 46 rules in the mathjs grammar _can't_ be expressed in the peggyjs-like notation. The next feature that would get a number of the remaining rules are positive and negative assertions, likely with prefix operators `&` for positive and `!` for negative.
feat: add positive and negative lookahead and re-examine last assertions
All checks were successful
/ test (pull_request) Successful in 18s
0d011a65f0
Author
Owner

OK, the positive and negative assertions seem to have worked ok. Besides a number of assoc(...) calls, which to me don't seem worth converting to a string notation because it seems to me that such a function call is the moral equivalent of a parameterized rule, there are just two rules left that are not presented in the peggy-like string notation (namely the top-level "Block" and the ever-frustrating "ImplicitMultiplication") . I believe that if we introduce a notation for "throw an error now" then we will be able to write these two in the string notation; all of the other ingredients seem to be there.

I was thinking that a notation like ^Missing operand^ would be reasonable, with the upward-pointing carets suggesting that you are escaping up and out of the parse (by throwing an error "up the call chain".

OK, the positive and negative assertions seem to have worked ok. Besides a number of `assoc(...)` calls, which to me don't seem worth converting to a string notation because it seems to me that such a function call is the moral equivalent of a parameterized rule, there are just two rules left that are not presented in the peggy-like string notation (namely the top-level "Block" and the ever-frustrating "ImplicitMultiplication") . I believe that if we introduce a notation for "throw an error now" then we will be able to write these two in the string notation; all of the other ingredients seem to be there. I was thinking that a notation like `^Missing operand^` would be reasonable, with the upward-pointing carets suggesting that you are escaping up and out of the parse (by throwing an error "up the call chain".
feat: add immediate error-throwing construct
All checks were successful
/ test (pull_request) Successful in 17s
bc33807eb9
Author
Owner

OK, indeed, the whole parser is now working with parser-combinators but all of the grammar rules expressed in a notation essentially identical to peggy, except for the basic associative operators, which I think is completely fine: as I mentioned above, these are just defined with a custom assoc(Term, operator) function, which is acting just like a parametrized rule. Peggy doesn't have parameterized rules, but it's an extension that has been asked for, and using a new combinator composed of ones from parse-combinators feels just like a parametrized rule. The notation I set up with a very compact auxiliary parser just has a small handful of enhancements over peggy: %type for matching a token of a given type, &< and !< positive and negative assertions that start one token earlier, and ^Forbidden operator combination^ immediate error throw. Otherwise, I think the grammar could pretty much be fed right into peggy.

The fact that I was so easily able to get the greater flexibility of parser-combinators working with (a mild extension) of peggy notation is to me an argument in favor of using parser-combinators. Will be interested to hear what you think.

OK, indeed, the whole parser is now working with parser-combinators but all of the grammar rules expressed in a notation essentially identical to peggy, except for the basic associative operators, which I think is completely fine: as I mentioned above, these are just defined with a custom `assoc(Term, operator)` function, which is acting just like a parametrized rule. Peggy doesn't have parameterized rules, but it's an extension that has been asked for, and using a new combinator composed of ones from parse-combinators feels just like a parametrized rule. The notation I set up with a very compact auxiliary parser just has a small handful of enhancements over peggy: `%type` for matching a token of a given type, `&<` and `!<` positive and negative assertions that start one token earlier, and `^Forbidden operator combination^` immediate error throw. Otherwise, I think the grammar could pretty much be fed right into peggy. The fact that I was so easily able to get the greater flexibility of parser-combinators working with (a mild extension) of peggy notation is to me an argument in favor of using parser-combinators. Will be interested to hear what you think.
feat: allow separator to be plucked in separated quantifier expressions
All checks were successful
/ test (pull_request) Successful in 17s
5db0a1b691
Author
Owner

Oh, actually, just by allowing the separator to be plucked in a separated quantifier expression like 'Shift+[@%relation], so that it means one or more Shift expressions separated by relation tokens, but keep those relation tokens (as opposed to discarding them by default if plucking is not specified), we get assoc. That is to say, TERM+[@sep] is exactly what assoc(Term, sep) meant. So I added that slight extension, and now the entire mathjs grammar fits quite neatly into the resulting slightly-extended peggy-like notation. I think this is a win, personally.

Oh, actually, just by allowing the separator to be plucked in a separated quantifier expression like `'Shift+[@%relation]`, so that it means one or more Shift expressions separated by relation tokens, but keep those relation tokens (as opposed to discarding them by default if plucking is not specified), we get assoc. That is to say, `TERM+[@sep]` is exactly what `assoc(Term, sep)` meant. So I added that slight extension, and now the entire mathjs grammar fits quite neatly into the resulting slightly-extended peggy-like notation. I think this is a win, personally.
glen force-pushed pc_notation from 5db0a1b691
All checks were successful
/ test (pull_request) Successful in 17s
to bafe4dbf69
All checks were successful
/ test (pull_request) Successful in 17s
2026-03-07 14:34:26 +00:00
Compare
All checks were successful
/ test (pull_request) Successful in 17s
This pull request is marked as a work in progress.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin pc_notation:pc_notation
git switch pc_notation

Merge

Merge the changes and update on Forgejo.

Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.

git switch parser_combinator
git merge --no-ff pc_notation
git switch pc_notation
git rebase parser_combinator
git switch parser_combinator
git merge --ff-only pc_notation
git switch pc_notation
git rebase parser_combinator
git switch parser_combinator
git merge --no-ff pc_notation
git switch parser_combinator
git merge --squash pc_notation
git switch parser_combinator
git merge --ff-only pc_notation
git switch parser_combinator
git merge pc_notation
git push origin parser_combinator
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
StudioInfinity/nanomath!52
No description provided.