Possible next direction: Parsing and type-enhanced Node trees #45
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In terms of the questions about what direction to go next to move nanomath toward a new mathjs engine, here's one option:
Work on the expressions category, and a streamlined set of Node types, and a new more modular parser.
In this plan, the current mathjs parser work would conclude with the more systematic tokenizer. Once that is in, the tokenizer would be moved to nanomath, with a new configurable/extensible parser on top of it.
In particular, the new parser would assign types to every node, denoting the type of the value produced at that node. By default, it would assume Unknown type at Symbol nodes. However, prior to the compile step, there would be a method to assign types to symbols and have the recalculated types propagate up the whole parse tree. Then if this has been done, the compile step would resolve and wire together specific implementations for a much more performant compiled version, that would work as long as the supplied values for the symbols matched the types supplied in the typing step.
This direction would serve at least three goals:
One highly plausible candidate for a new parsing framework is Ohm which has the advantage that an ohm grammar can be derived from and extended, which means that if we make the default ohm grammar and/or its source publicly exposed in the library, and hav a config setting for the grammar, then it should be relatively straightforward for clients to extend the grammar.
Note that being from the "PEG" family of grammars, Ohm has no separate tokenizer, so the current tokenizer would have to be converted into grammar rules (probably lexical ones) and the parser into another collection of rules (presumably syntactic ones).
More on how option (3) above (which is option (C) in #40) can alleviate the redundancy problem: The idea is to allow mathjs expressions as the implementation for an operation in the TypeDispatcher. Say just for argument, to use the example that Jos mentioned:
For convenience here we are saying that this implementation would only be used for those three types, and there might be
a different implementation for Complex, say (which is not so farfetched, maybe it's better to triple the arg and cube the abs and reassemble that into a Complex number).
How would this turn into actual implementations?
First, of course, it would make the parse tree of the expression , roughly
multiply(multiply(x, x), x)in schematic form. And it would note there is one free variable x, so to compute the cube for these types it has to set x to the argument and then compute the expression. (For multi-argument functions, it might be necessary also to say which variable corresponds to which argument.)So now suppose the resolution for this operation for a Fraction argument is requested. What happens is that the TypeDispatcher sets of a context in which the symbol x has type Fraction, and it then asks the parse tree to type itself on that basis (it keeps the untyped parse tree around so that it can type it differently for other resolution requests). The parse tree types itself bottom-up; it labels the "x" symbol nodes with type Fraction, then it looks up the result of multiply(Fraction, Fraction) and gets Fraction (it should do this with type strategy "full" except at the topmost node so that it doesn't get bigint|Fraction in intermediate nodes). So then it labels that node with Fraction, and moves up the tree. Obviously with more complicated parse trees there could be different types at different nodes, but the point is with the information we have in nanomath about return types, it is possible to label every node in the tree with a fully-instantiated type (i.e. Complex(NumberT) rather than just Complex()), once types are specified for all the symbols.
So now the TypeDispatcher compiles the typed parse tree, and we set up so that when a typed parse tree is compiled, type resolution occurs at every function node, selecting the proper implementation at each node. Then those implementations are compiled together in just the same way that mathjs 15 does now with the typed-functions at each node, but you end up with a JavaScript function that does no type resolution internally since we already resolved every function in the tree before assembling. So you should end up with something essentially equivalent to
just perhaps a tiny bit slower because
compilecreates an intermediate function at every node, rather than compiling the body all at once. But with current runtimes I think that penalty is quite small.And now the typeDispatcher immediately uses that compiled function whenever it gets a Fraction argument for cube, and when it encounters the cube of a number, say, it retypes the parse tree and compiles it to construct a new implementation function, much the way that the hand-implemented one would re-execute the factory with
Tset toNumberT.So that's the concept.
Jos also points out that to use this widely, the mathjs expression language would have to be enhanced with fuller programming features. I agree. But this effort is a good motivation for doing so, whereas before I didn't see one. And I think we should d it incrementally and conservatively. For example, you mention local variables. The mathjs expression language already has one context of symbol values that it can add to. They are thus all essentially global. I think that will suffice for 90% or more of the implementations we might want. When we hit one where it's really useful to have true local variables, I think we should initially do it in a functional style, e.g.
let(temp, 17, CODE_USING_temp) + CODE_CANNOT_SEE_tempthat will fit in to the existing language most naturally.Jos also mentions looping. Again, mathjs already has some means of iteration, e.g.
sum((1:10).forEach(_(i) = i*i))sums the squares of the first ten positive integers. Or you can even dototal = 0; (1:10).forEach(_(i) = (total = total + i*i)); totalalthough currently that returns a ResultSet rather than a number. I think the existing iterations will cover a third to a half of what one might one, and that as needed we can add other sorts of iteration, ideally trying to stay within the functional style that mathjs has already developed.If you want to see what a possible grammar specification for the mathjs expression language looks like in Ohm, a packrat-based left-recursive parsing package for JavaScript, check out #47.