WIP:parser_combinator #51
The final entry in the series #47, #49, #50 exploring parsing packages for the expression language. This one still uses the Tokenizr lexer, then parses the resulting token stream with parser-combinator. For the initial fragment implemented so far, parser-combinator feels very clean and simple, so I remain hopeful that this may be the best option yet.
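For flavor, here is a minimal hand-rolled sketch of the combinator style; this is not the parser-combinator package's actual API, and the combinator names and token shapes are made up for illustration:

```javascript
// Hand-rolled combinator sketch (NOT the parser-combinator npm API).
// A parser is a function: (tokens, pos) => {ok, value, pos} or {ok: false}.
const tok = (type) => (ts, pos) =>
  ts[pos] && ts[pos].type === type
    ? { ok: true, value: ts[pos].value, pos: pos + 1 }
    : { ok: false };

// Run parsers in sequence, collecting their values.
const seq = (...ps) => (ts, pos) => {
  const values = [];
  for (const p of ps) {
    const r = p(ts, pos);
    if (!r.ok) return { ok: false };
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// Try alternatives in order, returning the first success.
const alt = (...ps) => (ts, pos) => {
  for (const p of ps) {
    const r = p(ts, pos);
    if (r.ok) return r;
  }
  return { ok: false };
};

// Transform a successful parse result.
const map = (p, f) => (ts, pos) => {
  const r = p(ts, pos);
  return r.ok ? { ok: true, value: f(r.value), pos: r.pos } : r;
};

// A tiny expression fragment: addition of two numbers, or a bare number.
const number = map(tok('number'), (v) => Number(v));
const addition = map(seq(number, tok('plus'), number), ([a, , b]) => a + b);
const expr = alt(addition, number);

const tokens = [
  { type: 'number', value: '2' },
  { type: 'plus', value: '+' },
  { type: 'number', value: '3' },
];
console.log(expr(tokens, 0).value); // 5
```

The appeal is that the grammar reads almost like BNF while remaining plain functions.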
OK, the parser created via parser-combinators seems to be complete; I will be migrating the parser tests from ohm/peggy to this branch soon.
OK, the Tokenizr lexer and parser-combinator parser are now passing all of the tests from the prior series of parsing packages, including ones that did not pass in the previous rounds. While there are still more tests from mainline mathjs to transfer here, this milestone means this parser is mature enough that its code weight is not going to increase appreciably from where it is now. Therefore, the next step is to test how much this particular lexer/parser combination increases the mathjs bundle.
Changed title from parser_combinator to WIP:parser_combinator

All right, @josdejong , I totally don't understand it: even though the parser code file is only 20K (less than half the size of the current mathjs parser), the entire parser-combinators library I am using is 20K, the tokenizer library is 20K, and the unraw string-escape interpreter package is 10K, which only add up to 70K, the mathjs bundle increases by a whopping 150K (23%) when I just install it in mainline mathjs and call its parse function from parseStart. (I checked: there are no other new dependencies; these are all of the new files.)

How is that even possible? I would have thought that 70K would have been the maximum possible increase, since that's the sum of the sizes of all of the new files...
Anyhow, the zipped bundle increases by a lot less in this case, only 20K (11.4%), for a total of 195K.
I'm assuming that's still more than you're willing to devote to a new parser...
I'm really discouraged and frustrated. I was quite happy with this latest parser; it seems very clean and simple. And I assumed that since the packages it uses are very small, it would be the smallest size increase yet. Alas.
So here are some things I could try:
(A) Go ahead and get the Nearley parser working even though Nearley is not maintained. I was pretty close to done when I realized Nearley is unmaintained, so it wouldn't be a lot of work. That way we could see how much it adds with a more substantial grammar (so far I only measured it with a trivial grammar, and that was very lightweight, so there's clearly not a lot of overhead from the library itself). Then, if a full Nearley parser is still good on resource usage, we could consider forking/cloning it and maintaining it just enough to use in mathjs. Of the four, Nearley is probably my favorite to use, with the current parser-combinators one a close second.
(B) The parser-combinators library feels so lightweight that I could just suck a streamlined version of it fully into one source file in our repo (i.e., not use the npm package at all). Again, that would be a pretty simple experiment, to just have our own little combinator library based on parser-combinators, but slimmer, in our source tree, and see if that reduces the bundle size increase at all.
(C) At this point, having been through the entire parser four times now, I could just go ahead and hand-write the most streamlined custom parser I can think of, and see what that does to the bundle size by way of comparison.
Which of (A), (B), (C) do you think I should try next? As before, my efforts on this are on hold until I get some feedback about this. Thanks!
Oh, good news. I decided I would break it down by the three new libraries, just to see what was going on. Cutting out parser-combinator didn't have a big effect, but putting in a dummy (non-working!) lexer and uninstalling the tokenizr library made a huge difference. Without that package, the bundle size only grows by 20K and the zipped bundle size only grows by 6K, which seem totally acceptable to me (especially with the savings we will eventually get from removing the existing parser).
So actually, the next step is clear: substitute a different or handwritten tokenizer (I really don't mind a handbuilt tokenizer, as they are much simpler pieces of software) and then see how things are. So I am back on the project (not tonight) and will keep you posted. I really wonder what it is about the tokenizr package that led to the blowup: it doesn't at all seem like a heavyweight package. As I said, the entire distribution is only 20K. How could it expand the bundle so much?
So this is very strange. I decided to try writing the tokenizer with parser-combinators as well: why not, we have the parser-combinators package for the parser anyway, so just use the same package to do tokenizing too. It seemed that should be lightweight.
But sadly, that code adds 225K to the bundle and 18K to the zip (!). And what's totally weird is that if I cut out the lexer -- which uses the same package as the parser -- but leave in the parser, the bundle size only increases by a completely acceptable 18K and the zip by only 5K. So I totally don't understand what is going on, since the lexer source file is only 13K, and it only imports packages that the parser already imports. There is clearly something I don't comprehend about bundling.
In any case, this new parser-combinators lexer is no good either, so it seems the only alternative is to try a new hand-written tokenizer, which as I said I don't really mind. It's just sort of a pain since even for the tokenizer, I find one of the tokenizer/grammar formalisms clearer and easier to read/write than handwritten code. But it's clearly the next thing to try -- I assume that I can write a very lean tokenizer...
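For what it's worth, a lean hand-written tokenizer really can be tiny. A hypothetical sketch (the token types and rules here are my own illustration, not the actual mathjs token set):

```javascript
// Hypothetical lean hand-written tokenizer sketch; the token set and
// rule names are illustrative, not the actual mathjs grammar.
const RULES = [
  ['whitespace', /^\s+/],
  ['number', /^\d+(?:\.\d+)?/],
  ['identifier', /^[A-Za-z_]\w*/],
  ['operator', /^[+\-*\/^()=,]/],
];

function tokenize(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, re] of RULES) {
      const m = re.exec(rest);
      if (m) {
        // Skip whitespace; emit everything else as a token.
        if (type !== 'whitespace') tokens.push({ type, value: m[0] });
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}

console.log(tokenize('2 + sqrt(x)'));
// [ {type:'number',value:'2'}, {type:'operator',value:'+'},
//   {type:'identifier',value:'sqrt'}, {type:'operator',value:'('},
//   {type:'identifier',value:'x'}, {type:'operator',value:')'} ]
```

The whole thing is one loop over an ordered rule table, so the code weight stays close to the size of the rule list itself.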