WIP: proof-of-concept of allowing expressions in pcSystems

glen commented

2026-02-28 09:20:15 +00:00

Owner

One objection to parser-combinators was that its functional notation for rules is less clear and feels more verbose than the domain-specific languages of packages like peggy, Nearley, or Ohm. This PR demonstrates how easily parser-combinators can be used to define its own mini-DSL for specifying rules, providing a more typical and comfortable syntax for the mathjs grammar. The first commit only handles tokens, nonterminals, and ordered choice (`any()` in parser-combinators, `/` in peggy, `|` in Ohm and Nearley), but it already makes the grammar more readable. This approach could easily be extended to allow `*` and `+` for repetition, whitespace separation for sequencing, `@` for value selection as in peggy, etc., thereby allowing the mathjs grammar to be expressed almost entirely (or perhaps entirely) in a DSL defined using parser-combinators itself.
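As a rough illustration of the idea described above (not the code in this PR), the following sketch compiles a notation string containing token references, nonterminals, and ordered choice into a parser. Everything here is an assumption made for the example: the parser shape `(tokens, pos) => { value, pos } | null`, the hand-written `token` and `any` helpers (which stand in for the library's combinators rather than reproducing its API), and the `compile` function.

```javascript
// Hypothetical sketch: a parser is a function
//   (tokens, pos) => { value, pos }   on success, or null on failure.

// Match a single token of the given type (the `%type` notation).
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// Ordered choice: the first alternative that matches wins.
const any = (...parsers) => (tokens, pos) => {
  for (const parser of parsers) {
    const result = parser(tokens, pos)
    if (result) return result
  }
  return null
}

// Compile a notation string such as '%number / Name' into a parser.
// `%x` matches a token of type x; a bare word refers to another rule,
// looked up lazily so that rules can be defined in any order.
function compile (notation, rules) {
  const alternatives = notation.split('/').map(piece => {
    const item = piece.trim()
    return item.startsWith('%')
      ? token(item.slice(1))
      : (tokens, pos) => rules[item](tokens, pos)
  })
  return any(...alternatives)
}

const rules = {}
rules.Atom = compile('%number / Name', rules)
rules.Name = compile('%identifier', rules)

const atomResult = rules.Atom([{ type: 'identifier', text: 'x' }], 0)
const atomFailure = rules.Atom([{ type: 'comma', text: ',' }], 0)
```

Here `atomResult` comes from the second alternative (`Name`), and `atomFailure` is `null` because neither alternative matches a comma token.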

glen added 1 commit

2026-02-28 09:20:16 +00:00

feat: proof-of-concept of allowing expressions in pcSystems

/ test (pull_request) Successful in 17s

e73557a45d

glen referenced this pull request

2026-02-28 09:54:57 +00:00

WIP:parser_combinator #51

glen added 1 commit

2026-02-28 09:59:47 +00:00

refactor: more straightforward way to specify identifier handling

/ test (pull_request) Successful in 17s

f2f4e07767

glen added 1 commit

2026-03-01 22:41:30 +00:00

feat: add sequences, optionally w/ result entries picked, to notation

/ test (pull_request) Successful in 19s

cd863490ac

glen commented

2026-03-01 22:56:45 +00:00

Author

Owner

To continue and extend the proof of concept, I just added sequencing to the peggy-like notation, optionally with `@` symbols to mark the items that should be "plucked" into the result of parsing the sequence. This latter feature, borrowed from peggy, makes it easy, for example, to ignore syntax-only punctuation in the result returned from parsing a sequence (such as the commas in a list of arguments).

At this point, 19 of the 46 lines in the mathjs grammar specification can already be expressed in the peggy-like notation, even though the notation parser has only 7 rules. Several more of the grammar lines would follow just from adding "optional," "zero or more," and/or "one or more" notations (`?`, `*`, and/or `+`).
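To make the plucking behavior concrete, here is a minimal sketch under stated assumptions. The parser shape `(tokens, pos) => { value, pos } | null` and the `token`, `seq`, and `compileSeq` helpers are hypothetical and are not taken from this PR or from the parser-combinators API. The sketch also assumes that a single plucked item is returned bare while several plucked items are returned as an array, and it handles only `%type` items.

```javascript
// Hypothetical sketch: parser = (tokens, pos) => { value, pos } | null

// Match a single token of the given type (the `%type` notation).
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// Sequence with optional plucking. Each item is { parser, pluck }.
// With no plucked items the result is the array of all values;
// otherwise only the plucked values are kept.
function seq (items) {
  const anyPlucked = items.some(item => item.pluck)
  return (tokens, pos) => {
    const all = []
    const plucked = []
    for (const { parser, pluck } of items) {
      const result = parser(tokens, pos)
      if (!result) return null // the whole sequence fails
      all.push(result.value)
      if (pluck) plucked.push(result.value)
      pos = result.pos
    }
    if (!anyPlucked) return { value: all, pos }
    return { value: plucked.length === 1 ? plucked[0] : plucked, pos }
  }
}

// Compile a whitespace-separated sequence of `%type` items, each
// optionally prefixed by `@` to mark it as plucked.
function compileSeq (notation) {
  return seq(notation.trim().split(/\s+/).map(word => {
    const pluck = word.startsWith('@')
    const name = pluck ? word.slice(1) : word
    return { parser: token(name.slice(1)), pluck }
  }))
}

const input = [
  { type: 'lparen', text: '(' },
  { type: 'number', text: '42' },
  { type: 'rparen', text: ')' }
]
const plucked = compileSeq('%lparen @%number %rparen')(input, 0)
const unplucked = compileSeq('%lparen %number %rparen')(input, 0)
```

With the `@` marker, the parentheses are matched but dropped from the result; without it, all three token texts are returned.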

glen added 1 commit

2026-03-01 22:59:25 +00:00

refactor: rename 'pick' to 'pluck' to match peggy terminology

/ test (pull_request) Successful in 18s

ff0e699f1a

glen added 1 commit

2026-03-02 08:56:23 +00:00

feat: add quantifiers ? and +/* optionally with separators

/ test (pull_request) Successful in 18s

19aecf04a4

glen commented

2026-03-02 09:03:17 +00:00

Author

Owner

Indeed, I have now added the quantifiers `?`, `*`, and `+` to the peggy-like notation, plus an extension to that notation suggested by the parser-combinators `zeroOrMany` and `oneOrMany` combinators: if there is an expression in square brackets after the quantifier, it represents a "separator" that must match between occurrences of the quantified item, but the matches of the separator are dropped. For example, `Assignment+[%comma]` represents a comma-separated list of one or more Assignment matches, and returns an array containing just the results of the Assignment matches.

With this added, just over half of the rules in the mathjs grammar are defined using the peggy-like notation. The next feature to add that would allow more rules to be expressed this way is parenthesization to indicate subexpressions.
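The separated quantifier can be sketched as follows. This is an illustration only: `oneOrMore` is a hand-written stand-in, not the library's `oneOrMany`, and the parser shape `(tokens, pos) => { value, pos } | null` is an assumption. Identifier tokens stand in for `Assignment` to keep the example short, and the choice to leave a trailing separator unconsumed is this sketch's, not necessarily the PR's.

```javascript
// Hypothetical sketch: parser = (tokens, pos) => { value, pos } | null

// Match a single token of the given type (the `%type` notation).
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// `item+[separator]`: one or more items with a separator between each
// pair. Separator matches are required but dropped from the result.
function oneOrMore (item, separator = null) {
  return (tokens, pos) => {
    const first = item(tokens, pos)
    if (!first) return null
    const values = [first.value]
    pos = first.pos
    for (;;) {
      let next = pos
      if (separator) {
        const sep = separator(tokens, next)
        if (!sep) break
        next = sep.pos
      }
      const more = item(tokens, next)
      // Stop on failure, or when nothing was consumed (avoids looping forever).
      if (!more || more.pos === pos) break
      values.push(more.value)
      pos = more.pos // only now is the separator considered consumed
    }
    return { value: values, pos }
  }
}

// Roughly `%identifier+[%comma]`, applied to the tokens of "a, b,"
const list = oneOrMore(token('identifier'), token('comma'))
const listResult = list([
  { type: 'identifier', text: 'a' },
  { type: 'comma', text: ',' },
  { type: 'identifier', text: 'b' },
  { type: 'comma', text: ',' }
], 0)
```

In this run `listResult.value` holds only the two identifier texts, and the final position stops before the trailing comma because no item follows it.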

glen added 1 commit

2026-03-02 22:22:39 +00:00

refactor: Allow an optional marker ? in addition to a quantifier +

/ test (pull_request) Successful in 17s

4ad9a81a44

Actually for implementation convenience `X*?` is allowed too, although
  it is not useful because `X*` always matches, so the optionality never
  has any effect. But `X+?` has subtly different behavior even though it
  accepts exactly the same expressions that `X*` does: it produces a result
  of `null` when there is no X at this point in the parse, whereas `X*`
  produces an empty array. The distinction can occasionally be important,
  for example when plucking results, the `null` will not be plucked.
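The difference between `X*` and `X+?` described in this commit message can be sketched as below. The helpers `oneOrMore`, `zeroOrMore`, `optional`, and `pluckNonNull` are hypothetical names written for this example, and the parser shape `(tokens, pos) => { value, pos } | null` is an assumption rather than the PR's implementation.

```javascript
// Hypothetical sketch: parser = (tokens, pos) => { value, pos } | null

// Match a single token of the given type.
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// X+ : fails (returns null) when there is no X at all.
const oneOrMore = p => (tokens, pos) => {
  const values = []
  for (let r = p(tokens, pos); r && r.pos > pos; r = p(tokens, pos)) {
    values.push(r.value)
    pos = r.pos
  }
  return values.length > 0 ? { value: values, pos } : null
}

// X* : always succeeds; the value is an empty array when there is no X.
const zeroOrMore = p => (tokens, pos) =>
  oneOrMore(p)(tokens, pos) ?? { value: [], pos }

// X? : always succeeds; the value is null when X does not match.
const optional = p => (tokens, pos) =>
  p(tokens, pos) ?? { value: null, pos }

// Input with no `x` token at the current position.
const noX = [{ type: 'y', text: 'y' }]
const star = zeroOrMore(token('x'))(noX, 0) // X*
const plusOptional = optional(oneOrMore(token('x')))(noX, 0) // X+?

// Assumed plucking rule from the message above: nulls are never plucked.
const pluckNonNull = values => values.filter(v => v !== null)
const keptFromStar = pluckNonNull([star.value])
const keptFromPlusOptional = pluckNonNull([plusOptional.value])
```

Both parsers succeed without consuming anything, but `keptFromStar` still contains the empty array while `keptFromPlusOptional` is empty, which is the distinction that matters when plucking.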

glen added 1 commit

2026-03-02 23:21:59 +00:00

feat: add parenthesized subexpressions to the pc notation

/ test (pull_request) Successful in 17s

a3e23cd079

glen commented

2026-03-02 23:26:56 +00:00

Author

Owner

OK, I added parenthesized subexpressions and also allowed `X+?`, which is very similar to `X*` except that it produces a null result when there are no occurrences of X (as opposed to `X*`, which produces the empty list). This distinction is important when plucking, because nulls are never plucked.

With these changes, just 18 of 46 rules in the mathjs grammar _can't_ be expressed in the peggyjs-like notation.

The next feature that would cover a number of the remaining rules is positive and negative assertions, likely with prefix operators `&` for positive and `!` for negative.

glen added 1 commit

2026-03-03 19:45:30 +00:00

feat: add positive and negative lookahead and re-examine last assertions

/ test (pull_request) Successful in 18s

0d011a65f0

glen commented

2026-03-03 20:08:36 +00:00

Author

Owner

OK, the positive and negative assertions seem to have worked. Besides a number of `assoc(...)` calls, which don't seem worth converting to a string notation because such a function call is the moral equivalent of a parameterized rule, there are just two rules left that are not expressed in the peggy-like string notation (namely the top-level "Block" and the ever-frustrating "ImplicitMultiplication"). I believe that if we introduce a notation for "throw an error now," then we will be able to write these two in the string notation; all of the other ingredients seem to be there.

I was thinking that a notation like `^Missing operand^` would be reasonable, with the upward-pointing carets suggesting that you are escaping up and out of the parse (by throwing an error "up the call chain").
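For readers unfamiliar with these constructs, here is a small sketch of how `&X`, `!X`, and a `^message^` error throw might behave. The helper names (`and`, `not`, `fail`) and the parser shape `(tokens, pos) => { value, pos } | null` are assumptions for illustration and do not reflect the PR's actual code; the use of `SyntaxError` is likewise just a choice made here.

```javascript
// Hypothetical sketch: parser = (tokens, pos) => { value, pos } | null

// Match a single token of the given type.
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// Ordered choice: the first alternative that matches wins.
const any = (...parsers) => (tokens, pos) => {
  for (const parser of parsers) {
    const result = parser(tokens, pos)
    if (result) return result
  }
  return null
}

// &X : succeeds only if X would match here, but consumes nothing.
const and = p => (tokens, pos) =>
  p(tokens, pos) ? { value: undefined, pos } : null

// !X : succeeds only if X would NOT match here, and consumes nothing.
const not = p => (tokens, pos) =>
  p(tokens, pos) ? null : { value: undefined, pos }

// ^message^ : abandons the parse immediately by throwing, instead of
// returning null and letting other alternatives be tried.
const fail = message => (tokens, pos) => {
  throw new SyntaxError(`${message} (at token ${pos})`)
}

// Roughly `%number / ^Missing operand^`
const operand = any(token('number'), fail('Missing operand'))

const good = operand([{ type: 'number', text: '7' }], 0)

let thrownMessage = null
try {
  operand([{ type: 'plus', text: '+' }], 0)
} catch (error) {
  thrownMessage = error.message
}
```

Because `fail` is the last alternative of the ordered choice, it only fires after `%number` has been tried and has not matched, so well-formed input is unaffected.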

glen added 1 commit

2026-03-03 21:34:15 +00:00

feat: add immediate error-throwing construct

/ test (pull_request) Successful in 17s

bc33807eb9

glen commented

2026-03-03 21:56:06 +00:00

Author

Owner

OK, indeed, the whole parser is now working with parser-combinators, with all of the grammar rules expressed in a notation essentially identical to peggy's, except for the basic associative operators, which I think is completely fine: as I mentioned above, these are just defined with a custom `assoc(Term, operator)` function, which acts just like a parameterized rule. Peggy doesn't have parameterized rules, but it's an extension that has been asked for, and using a new combinator composed of ones from parser-combinators feels just like a parameterized rule. The notation, which I set up with a very compact auxiliary parser, has just a small handful of enhancements over peggy: `%type` for matching a token of a given type, `&<` and `!<` for positive and negative assertions that start one token earlier, and `^Forbidden operator combination^` for an immediate error throw. Otherwise, I think the grammar could pretty much be fed right into peggy.

The fact that I was so easily able to get the greater flexibility of parser-combinators working with (a mild extension of) peggy notation is to me an argument in favor of using parser-combinators. I will be interested to hear what you think.

glen added 1 commit

2026-03-03 22:47:12 +00:00

feat: allow separator to be plucked in separated quantifier expressions

/ test (pull_request) Successful in 17s

5db0a1b691

glen commented

2026-03-03 22:52:07 +00:00

Author

Owner

Oh, actually, we get assoc just by allowing the separator to be plucked in a separated quantifier expression like `Shift+[@%relation]`, so that it means one or more Shift expressions separated by relation tokens, but with those relation tokens kept (as opposed to being discarded, which is the default when plucking is not specified). That is to say, `Term+[@sep]` is exactly what `assoc(Term, sep)` meant. So I added that slight extension, and now the entire mathjs grammar fits quite neatly into the resulting slightly extended peggy-like notation. I think this is a win, personally.
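A sketch of the equivalence, under loudly stated assumptions: the thread does not show the result shape of `assoc`, so this example assumes that kept separators are interleaved with the items in one flat array. `oneOrMoreSeparated` and this `assoc` are hypothetical helpers written for the example, and number tokens stand in for `Shift`.

```javascript
// Hypothetical sketch: parser = (tokens, pos) => { value, pos } | null

// Match a single token of the given type.
const token = type => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null

// `item+[separator]` when keepSeparator is false,
// `item+[@separator]` when keepSeparator is true.
function oneOrMoreSeparated (item, separator, keepSeparator) {
  return (tokens, pos) => {
    const first = item(tokens, pos)
    if (!first) return null
    const values = [first.value]
    pos = first.pos
    for (;;) {
      const sep = separator(tokens, pos)
      if (!sep) break
      const next = item(tokens, sep.pos)
      // Stop on failure, or when nothing was consumed (avoids looping forever).
      if (!next || next.pos === pos) break
      if (keepSeparator) values.push(sep.value) // assumed interleaved shape
      values.push(next.value)
      pos = next.pos
    }
    return { value: values, pos }
  }
}

// Under the assumption above, assoc is just the plucked-separator form.
const assoc = (term, operator) => oneOrMoreSeparated(term, operator, true)

// Tokens of "1 < 2 <= 3", with numbers standing in for Shift
const comparison = [
  { type: 'number', text: '1' },
  { type: 'relation', text: '<' },
  { type: 'number', text: '2' },
  { type: 'relation', text: '<=' },
  { type: 'number', text: '3' }
]
const kept = assoc(token('number'), token('relation'))(comparison, 0)
const dropped = oneOrMoreSeparated(
  token('number'), token('relation'), false)(comparison, 0)
```

The only difference between `kept` and `dropped` is whether the relation tokens survive in the result, which is the information a later step would need to build a chain of comparison nodes.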

glen referenced this pull request

2026-03-03 22:56:33 +00:00

WIP:parser_combinator #51

glen force-pushed pc_notation from 5db0a1b691 to bafe4dbf69

2026-03-07 14:34:26 +00:00

/ test (pull_request) Successful in 17s (both before and after the force-push)

glen referenced this pull request

2026-03-07 20:56:41 +00:00

feat: define parse() function that takes a string to a Node tree #54

/ test (pull_request) Successful in 17s

This pull request is marked as a work in progress.

WIP: proof-of-concept of allowing expressions in pcSystems #52

Checkout

Merge