WIP: proof-of-concept of allowing expressions in pcSystems #52
One objection to parser-combinators was that its functional notation for rules is less clear and feels more verbose than the domain-specific languages of packages like peggy, Nearley, or Ohm. This PR demonstrates how easily parser-combinators can be used to define its own mini-DSL for specifying rules, providing a more typical and comfortable syntax for the mathjs grammar. The first commit only handles tokens, nonterminals, and ordered choice (`any()` in parser-combinators, `/` in peggy, `|` in Ohm and Nearley), but it already makes the grammar more readable. This approach could easily be extended to allow `*` and `+` for repetition, whitespace separation for sequencing, `@` for value selection as in peggy, etc., thereby allowing the mathjs grammar to be expressed almost entirely (or perhaps entirely) in a DSL defined using parser-combinators itself.

To continue and extend the proof of concept, I just added sequencing to the peggy-like notation, optionally with `@` symbols to mark the items that should be "plucked" into the result of parsing the sequence. This latter feature, borrowed from peggy, makes it easy, for example, to ignore syntax-only punctuation in the result returned from parsing a sequence (such as the commas in a list of arguments).

At this point, 19 of the 46 lines in the mathjs grammar specification can already be expressed in the peggy-like notation, even though the notation parser has only 7 rules. Several more of the grammar lines would come just with "optional," "zero or more," and/or "one or more" notations (`?`, `*`, and/or `+`).

Indeed, I have now added the quantifiers `?`, `*`, and `+` to the peggy-like notation, plus an extension to that notation suggested by the parser-combinators `zeroOrMany` and `oneOrMany` combinators: if there is an expression in square brackets after the quantifier, it represents a "separator" that must match between occurrences of the quantified item, but the matches of the separator are dropped. For example, `Assignment+[%comma]` represents a comma-separated list of one or more Assignment matches, and returns an array containing just the results of the Assignment matches.

With this added, just over half of the rules in the mathjs grammar are defined using the peggy-like notation. The next bit to add that would allow more rules to be expressed this way is parenthesization to indicate subexpressions.
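To make the plucking and separator semantics described above concrete, here is a minimal, self-contained sketch. It does not use the parser-combinators API or the code in this PR: the helpers `token`, `at`, `sequence`, and `oneOrManySeparated` are hypothetical stand-ins, and returning the bare value when exactly one item is plucked is an assumption borrowed from peggy's documented behavior.

```javascript
// A parser is (tokens, pos) => { value, pos } on success, or null on failure.
// Every helper below is hypothetical and written only for this illustration.

// `%type` in the notation: match one token of the given type.
const token = (type) => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null;

// `@X` in the notation: mark X so that its result is plucked from a sequence.
const at = (parser) =>
  Object.assign((tokens, pos) => parser(tokens, pos), { plucked: true });

// Whitespace-separated items in the notation: match each parser in order.
const sequence = (...parsers) => (tokens, pos) => {
  const all = [];
  const plucked = [];
  let end = pos;
  for (const p of parsers) {
    const r = p(tokens, end);
    if (r === null) return null;
    all.push(r.value);
    if (p.plucked) plucked.push(r.value);
    end = r.pos;
  }
  if (plucked.length === 0) return { value: all, pos: end };
  // Assumption: a single plucked item is returned bare, as peggy does.
  return { value: plucked.length === 1 ? plucked[0] : plucked, pos: end };
};

// `Item+[sep]`: one or more items with a separator between occurrences.
// Separator matches are consumed but left out of the result.
const oneOrManySeparated = (item, sep) => (tokens, pos) => {
  const first = item(tokens, pos);
  if (first === null) return null;
  const values = [first.value];
  let end = first.pos;
  for (;;) {
    const s = sep(tokens, end);
    if (s === null) break;
    const next = item(tokens, s.pos);
    if (next === null) break; // trailing separator: stop before it
    values.push(next.value);
    end = next.pos;
  }
  return { value: values, pos: end };
};

// Roughly: ArgList = %lparen @(%symbol+[%comma]) %rparen
const ArgList = sequence(
  token('lparen'),
  at(oneOrManySeparated(token('symbol'), token('comma'))),
  token('rparen')
);

const argTokens = [
  { type: 'lparen', text: '(' },
  { type: 'symbol', text: 'a' },
  { type: 'comma', text: ',' },
  { type: 'symbol', text: 'b' },
  { type: 'rparen', text: ')' },
];
const argResult = ArgList(argTokens, 0);
console.log(argResult); // { value: [ 'a', 'b' ], pos: 5 }
```

The parentheses and commas are consumed but never reach the result, which is the point of combining `@` with a bracketed separator.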
OK, I added parenthesized subexpressions and also allowed `X+?`, which is very similar to `X*` except that it produces a null result when there are no occurrences of X (as opposed to `X*`, which produces the empty list). This distinction is important when plucking, because nulls are never plucked.

With these changes, just 18 of 46 rules in the mathjs grammar can't be expressed in the peggy-like notation.
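The difference between `X*` and `X+?` under plucking can be sketched as follows. This is an illustration only, not the PR's code: the helper names are hypothetical, and `pluckAll` assumes that "nulls are never plucked" means a null result is simply omitted from the plucked array.

```javascript
// A parser is (tokens, pos) => { value, pos } on success, or null on failure.
// Every helper below is hypothetical and written only for this illustration.
const token = (type) => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null;

// `X*`: always succeeds, possibly with an empty list.
const zeroOrMany = (item) => (tokens, pos) => {
  const values = [];
  let end = pos;
  for (let r = item(tokens, end); r !== null; r = item(tokens, end)) {
    values.push(r.value);
    end = r.pos;
  }
  return { value: values, pos: end };
};

// `X+`: like `X*`, but fails when there are no occurrences.
const oneOrMany = (item) => (tokens, pos) => {
  const r = zeroOrMany(item)(tokens, pos);
  return r.value.length > 0 ? r : null;
};

// `X?`: succeeds with a null value when X does not match.
const optional = (parser) => (tokens, pos) =>
  parser(tokens, pos) ?? { value: null, pos };

// A sequence in which every item is marked with `@`.
// Assumption: null results are omitted from the plucked array.
const pluckAll = (...parsers) => (tokens, pos) => {
  const values = [];
  let end = pos;
  for (const p of parsers) {
    const r = p(tokens, end);
    if (r === null) return null;
    if (r.value !== null) values.push(r.value);
    end = r.pos;
  }
  return { value: values, pos: end };
};

const input = [{ type: 'symbol', text: 'f' }];
const sym = token('symbol');
const num = token('number');

// @%symbol @%number*  keeps an empty list in the result.
const withStar = pluckAll(sym, zeroOrMany(num))(input, 0).value;
// @%symbol @%number+? drops the missing part entirely.
const withPlusOptional = pluckAll(sym, optional(oneOrMany(num)))(input, 0).value;

console.log(withStar); // [ 'f', [] ]
console.log(withPlusOptional); // [ 'f' ]
```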
The next feature that would cover a number of the remaining rules is positive and negative assertions, likely with prefix operators `&` for positive and `!` for negative.

OK, the positive and negative assertions seem to have worked. Besides a number of `assoc(...)` calls, which don't seem worth converting to a string notation because such a function call is the moral equivalent of a parameterized rule, there are just two rules left that are not presented in the peggy-like string notation (namely the top-level "Block" and the ever-frustrating "ImplicitMultiplication"). I believe that if we introduce a notation for "throw an error now," we will be able to write these two in the string notation; all of the other ingredients seem to be there.

I was thinking that a notation like `^Missing operand^` would be reasonable, with the upward-pointing carets suggesting that you are escaping up and out of the parse (by throwing an error "up the call chain").

OK, indeed, the whole parser is now working with parser-combinators, with all of the grammar rules expressed in a notation essentially identical to peggy's, except for the basic associative operators, which I think is completely fine: as I mentioned above, these are just defined with a custom `assoc(Term, operator)` function, which acts just like a parameterized rule. Peggy doesn't have parameterized rules, but it's an extension that has been asked for, and using a new combinator composed of ones from parser-combinators feels just like a parameterized rule. The notation, which I set up with a very compact auxiliary parser, has just a small handful of enhancements over peggy: `%type` for matching a token of a given type, `&<` and `!<` for positive and negative assertions that start one token earlier, and `^Forbidden operator combination^` for an immediate error throw. Otherwise, I think the grammar could pretty much be fed right into peggy.

The fact that I was so easily able to get the greater flexibility of parser-combinators working with (a mild extension of) peggy notation is to me an argument in favor of using parser-combinators. I will be interested to hear what you think.
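For readers less familiar with assertions and the proposed error notation, the following is a rough sketch of how `&X`, `!X`, and `^message^` could behave over a token array. The helper names (`and`, `not`, `raise`, and the rest) and the `Operand` and `Sum` rules are hypothetical and are not taken from the PR or from the parser-combinators API. The `&<` and `!<` variants that start one token earlier are not modeled.

```javascript
// A parser is (tokens, pos) => { value, pos } on success, or null on failure.
// Every helper below is hypothetical and written only for this illustration.
const token = (type) => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null;

const sequence = (...parsers) => (tokens, pos) => {
  const values = [];
  let end = pos;
  for (const p of parsers) {
    const r = p(tokens, end);
    if (r === null) return null;
    values.push(r.value);
    end = r.pos;
  }
  return { value: values, pos: end };
};

// Ordered choice: the first alternative that matches wins.
const any = (...parsers) => (tokens, pos) => {
  for (const p of parsers) {
    const r = p(tokens, pos);
    if (r !== null) return r;
  }
  return null;
};

// `&X`: succeed without consuming input if X matches at this position.
const and = (p) => (tokens, pos) =>
  p(tokens, pos) !== null ? { value: null, pos } : null;

// `!X`: succeed without consuming input if X does NOT match at this position.
const not = (p) => (tokens, pos) =>
  p(tokens, pos) === null ? { value: null, pos } : null;

// `^message^`: abandon the whole parse by throwing, instead of backtracking.
const raise = (message) => (tokens, pos) => {
  throw new SyntaxError(`${message} (at token ${pos})`);
};

// Roughly: Operand = %number / &%operator ^Missing operand^
const Operand = any(
  token('number'),
  sequence(and(token('operator')), raise('Missing operand'))
);
// Roughly: Sum = Operand %operator Operand
const Sum = sequence(Operand, token('operator'), Operand);

const num = (text) => ({ type: 'number', text });
const op = (text) => ({ type: 'operator', text });

// "1 +" simply fails to match: no operator follows, so nothing is thrown.
const truncated = Sum([num('1'), op('+')], 0);

// "1 + *" hits the assertion and throws immediately.
let thrown = null;
try {
  Sum([num('1'), op('+'), op('*')], 0);
} catch (err) {
  thrown = err.message;
}

// A negative assertion consumes nothing when it succeeds.
const noOperatorHere = not(token('operator'))([num('1')], 0);

console.log(truncated, thrown, noOperatorHere);
```

The design point is that an ordinary failure returns null and lets an enclosing choice try another alternative, whereas the thrown error escapes every enclosing rule at once.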
Oh, actually, we get `assoc` just by allowing the separator to be plucked in a separated quantifier expression like `Shift+[@%relation]`, so that it means one or more Shift expressions separated by relation tokens, keeping those relation tokens (as opposed to discarding them, which is the default when plucking is not specified). That is to say, `Term+[@sep]` is exactly what `assoc(Term, sep)` meant. So I added that slight extension, and now the entire mathjs grammar fits quite neatly into the resulting slightly extended peggy-like notation. I think this is a win, personally.
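The effect of plucking the separator can be sketched like this. It is a hypothetical stand-in rather than the PR's implementation: the `keepSeparators` flag plays the role of `@` inside the brackets, and the flat alternating array it returns is an assumption, since the thread does not say what shape `assoc` produces.

```javascript
// A parser is (tokens, pos) => { value, pos } on success, or null on failure.
// Every helper below is hypothetical and written only for this illustration.
const token = (type) => (tokens, pos) =>
  pos < tokens.length && tokens[pos].type === type
    ? { value: tokens[pos].text, pos: pos + 1 }
    : null;

// `Item+[sep]` when keepSeparators is false, `Item+[@sep]` when it is true.
const oneOrManySeparated = (item, sep, keepSeparators = false) => (tokens, pos) => {
  const first = item(tokens, pos);
  if (first === null) return null;
  const values = [first.value];
  let end = first.pos;
  for (;;) {
    const s = sep(tokens, end);
    if (s === null) break;
    const next = item(tokens, s.pos);
    if (next === null) break; // trailing separator: stop before it
    if (keepSeparators) values.push(s.value);
    values.push(next.value);
    end = next.pos;
  }
  return { value: values, pos: end };
};

// Tokens for "1 < 2 <= 3"; a number token stands in for a Shift expression.
const relTokens = [
  { type: 'number', text: '1' },
  { type: 'relation', text: '<' },
  { type: 'number', text: '2' },
  { type: 'relation', text: '<=' },
  { type: 'number', text: '3' },
];

const dropped = oneOrManySeparated(token('number'), token('relation'))(relTokens, 0);
const kept = oneOrManySeparated(token('number'), token('relation'), true)(relTokens, 0);

console.log(dropped.value); // [ '1', '2', '3' ]
console.log(kept.value); // [ '1', '<', '2', '<=', '3' ]
```

With the operators retained, a later step has what it needs to fold the operands into whatever tree shape the grammar wants.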