Possible next direction: Parsing and type-enhanced Node trees

glen commented

2025-12-17 06:28:16 +00:00

Owner

In terms of the [questions](https://github.com/josdejong/mathjs/discussions/2741#discussioncomment-15266712) about what direction to go next to move nanomath toward a new mathjs engine, here's one option:

Work on the expressions category, a streamlined set of Node types, and a new, more modular parser.

In this plan, the current mathjs parser work would conclude with the more systematic tokenizer. Once that is in, the tokenizer would be moved to nanomath, with a new configurable/extensible parser on top of it.

In particular, the new parser would assign a type to every node, denoting the type of the value produced at that node. By default, it would assume Unknown type at Symbol nodes. However, prior to the compile step, there would be a method to assign types to symbols and have the recalculated types propagate up the whole parse tree. If this has been done, the compile step would resolve and wire together specific implementations into a much more performant compiled version, which would work as long as the values supplied for the symbols match the types supplied in the typing step.
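To make the typing step concrete, here is a minimal sketch of bottom-up type propagation. Everything in it is an assumption for illustration: the node shapes, the `assignTypes` helper, and the tiny `RETURN_TYPES` table stand in for whatever the new Node types and nanomath's return-type information would actually provide.

```javascript
// Hypothetical return-type table: operation plus argument types -> result type.
// Real nanomath return-type information would be far richer than this.
const RETURN_TYPES = {
  'multiply(number,number)': 'number',
  'multiply(Fraction,Fraction)': 'Fraction',
  'add(number,number)': 'number',
  'add(Fraction,Fraction)': 'Fraction'
}

const UNKNOWN = 'Unknown'

// Minimal node constructors (illustrative, not the mathjs Node classes).
const sym = name => ({ kind: 'Symbol', name })
const call = (fn, ...args) => ({ kind: 'Function', fn, args })

// Return a typed copy of the tree. The untyped tree is left untouched so it
// can be typed again later with different symbol types.
function assignTypes (node, symbolTypes = {}) {
  if (node.kind === 'Symbol') {
    return { ...node, type: symbolTypes[node.name] ?? UNKNOWN }
  }
  const args = node.args.map(arg => assignTypes(arg, symbolTypes))
  const argTypes = args.map(arg => arg.type)
  // If any input is Unknown, the result is Unknown as well.
  const type = argTypes.includes(UNKNOWN)
    ? UNKNOWN
    : RETURN_TYPES[`${node.fn}(${argTypes.join(',')})`] ?? UNKNOWN
  return { ...node, args, type }
}

// Hand-built tree for x*x*x, since no parser is included in this sketch.
const tree = call('multiply', call('multiply', sym('x'), sym('x')), sym('x'))
const untyped = assignTypes(tree)                  // every node is Unknown
const typed = assignTypes(tree, { x: 'Fraction' }) // every node is Fraction
```

Returning a copy rather than mutating the tree is just one possible design; it keeps the original parse tree available for typing against other symbol types.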

This direction would serve at least three goals:

1) Move nanomath closer to readiness to be a new mathjs engine.
2) Streamline and modularize the parser.
3) Alleviate a significant chunk of the redundancy burden noted in #40, by making it practical to implement many library methods in the mathjs expression language. For some type pattern(s), one would give an expression for the function. To resolve that function for specific arguments, the symbols in the expression for its inputs would be typed, the expression would be compiled with those typings, and the resulting compiled form would be evaluated on the inputs. When a function is defined in this format, the implementor would not need to do any resolution of methods for the implementation: they would be selected in the typed compile process.


glen referenced this issue

2025-12-17 15:19:31 +00:00

Nanomath resolution process is all WET... #40

glen referenced this issue

2025-12-17 15:29:20 +00:00

Nanomath resolution process is all WET... #40

glen commented

2025-12-18 00:05:02 +00:00

Author

Owner

One highly plausible candidate for a new parsing framework is [Ohm](https://ohmjs.org/). Its advantage is that an Ohm grammar can be derived from and extended. So if we publicly expose the default Ohm grammar and/or its source in the library, and have a config setting for the grammar, then it should be relatively straightforward for clients to extend the grammar.

Note that being from the "PEG" family of grammars, Ohm has no separate tokenizer, so the current tokenizer would have to be converted into grammar rules (probably lexical ones) and the parser into another collection of rules (presumably syntactic ones).


glen referenced this issue

2025-12-19 15:11:23 +00:00

Nanomath resolution process is all WET... #40

josdejong referenced this issue

2025-12-19 15:39:01 +00:00

Nanomath resolution process is all WET... #40

glen commented

2025-12-19 16:06:28 +00:00

Author

Owner

More on how option (3) above (which is option (C) in #40) can alleviate the redundancy problem: The idea is to allow mathjs expressions as the implementation for an operation in the TypeDispatcher. Say, just for the sake of argument, that we use the example Jos mentioned:

export const cube = match(OneOf(number, bigint, Fraction), 'x*x*x')

For convenience, here we are saying that this implementation would only be used for those three types, and there might be a different implementation for Complex, say (which is not so farfetched: maybe it's better to triple the arg and cube the abs and reassemble that into a Complex number).

How would this turn into actual implementations?
First, of course, it would make the parse tree of the expression, roughly multiply(multiply(x, x), x) in schematic form. And it would note that there is one free variable x, so to compute the cube for these types it has to set x to the argument and then compute the expression. (For multi-argument functions, it might also be necessary to say which variable corresponds to which argument.)

So now suppose the resolution for this operation for a Fraction argument is requested. What happens is that the TypeDispatcher sets up a context in which the symbol x has type Fraction, and it then asks the parse tree to type itself on that basis (it keeps the untyped parse tree around so that it can type it differently for other resolution requests). The parse tree types itself bottom-up: it labels the "x" symbol nodes with type Fraction, then it looks up the result of multiply(Fraction, Fraction) and gets Fraction (it should do this with type strategy "full" except at the topmost node, so that it doesn't get bigint|Fraction in intermediate nodes). So then it labels that node with Fraction, and moves up the tree. Obviously with more complicated parse trees there could be different types at different nodes, but the point is that with the information we have in nanomath about return types, it is possible to label every node in the tree with a fully-instantiated type (i.e. Complex(NumberT) rather than just Complex()), once types are specified for all the symbols.

So now the TypeDispatcher compiles the typed parse tree, and we set things up so that when a typed parse tree is compiled, type resolution occurs at every function node, selecting the proper implementation at each node. Then those implementations are compiled together in just the same way that mathjs 15 does now with the typed-functions at each node, but you end up with a JavaScript function that does no type resolution internally, since we already resolved every function in the tree before assembling. So you should end up with something essentially equivalent to

export const cube = match(OneOf(number, bigint, Fraction), (math, T) => {
   const mult = math.multiply.resolve([T,T])
   return x => mult(mult(x, x), x)
})

just perhaps a tiny bit slower because compile creates an intermediate function at every node, rather than compiling the body all at once. But with current runtimes I think that penalty is quite small.

And now the TypeDispatcher immediately uses that compiled function whenever it gets a Fraction argument for cube, and when it encounters the cube of a number, say, it retypes the parse tree and compiles it to construct a new implementation function, much the way that the hand-implemented one would re-execute the factory with T set to NumberT.
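The flow just described (type the tree for a given argument type, resolve an implementation at every function node, assemble closures, and reuse the result) can be sketched roughly as follows. This is only an illustration under assumptions: `IMPLEMENTATIONS`, `compileTyped`, and `expressionOperation` are hypothetical names, Fractions are modeled as plain `{n, d}` objects, and typing and compiling are merged into a single pass for brevity rather than kept as two separate steps.

```javascript
// Hypothetical implementation table keyed by operation and argument types.
// Fractions are modeled as {n, d} objects purely for illustration.
const IMPLEMENTATIONS = {
  multiply: {
    'number,number': { returns: 'number', fn: (a, b) => a * b },
    'Fraction,Fraction': {
      returns: 'Fraction',
      fn: (a, b) => ({ n: a.n * b.n, d: a.d * b.d })
    }
  }
}

const sym = name => ({ kind: 'Symbol', name })
const call = (fn, ...args) => ({ kind: 'Function', fn, args })

// Each node yields its result type plus a closure over an implementation that
// was resolved at compile time, so no type dispatch happens at run time.
function compileTyped (node, symbolTypes) {
  if (node.kind === 'Symbol') {
    const type = symbolTypes[node.name]
    if (type === undefined) throw new TypeError(`No type given for ${node.name}`)
    return { type, evaluate: scope => scope[node.name] }
  }
  const args = node.args.map(arg => compileTyped(arg, symbolTypes))
  const signature = args.map(arg => arg.type).join(',')
  const impl = IMPLEMENTATIONS[node.fn]?.[signature]
  if (!impl) throw new TypeError(`No implementation of ${node.fn}(${signature})`)
  const evaluators = args.map(arg => arg.evaluate)
  return {
    type: impl.returns,
    evaluate: scope => impl.fn(...evaluators.map(ev => ev(scope)))
  }
}

// Stand-in for an expression-defined operation: keeps the untyped tree and
// compiles, then caches, one implementation per argument type.
function expressionOperation (tree, paramName) {
  const cache = new Map()
  return function resolve (type) {
    if (!cache.has(type)) {
      const { evaluate } = compileTyped(tree, { [paramName]: type })
      cache.set(type, x => evaluate({ [paramName]: x }))
    }
    return cache.get(type)
  }
}

// Tree for 'x*x*x', written out by hand since no parser is included here.
const cubeTree = call('multiply', call('multiply', sym('x'), sym('x')), sym('x'))
const cube = expressionOperation(cubeTree, 'x')

const cubeNumber = cube('number')
const cubeFraction = cube('Fraction')
const eight = cubeNumber(2)                 // 8
const eighth = cubeFraction({ n: 1, d: 2 }) // { n: 1, d: 8 }
```

As in the prose, the compiled function is built from one small closure per node, so it carries a little call overhead compared with a hand-written body, but it performs no type resolution when invoked.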

So that's the concept.

Jos also points out that to use this widely, the mathjs expression language would have to be enhanced with fuller programming features. I agree. But this effort is a good motivation for doing so, whereas before I didn't see one. And I think we should do it incrementally and conservatively. For example, you mention local variables. The mathjs expression language already has one context of symbol values that it can add to. They are thus all essentially global. I think that will suffice for 90% or more of the implementations we might want. When we hit one where it's really useful to have true local variables, I think we should initially do it in a functional style, e.g. let(temp, 17, CODE_USING_temp) + CODE_CANNOT_SEE_temp, which will fit into the existing language most naturally.
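As a rough illustration of the scoping a functional-style let would imply, here is a small evaluator over hand-built trees. The `Let` node, the `letIn` constructor, and the node shapes are all hypothetical; no such form exists in the mathjs expression language today, and this is only one way it could behave.

```javascript
// Evaluate a tiny hand-built expression tree with a scope chain.
// Node shapes are hypothetical, not the mathjs Node classes.
function evaluate (node, scope) {
  switch (node.kind) {
    case 'Constant':
      return node.value
    case 'Symbol':
      if (!(node.name in scope)) {
        throw new ReferenceError(`Undefined symbol ${node.name}`)
      }
      return scope[node.name]
    case 'Add':
      return evaluate(node.left, scope) + evaluate(node.right, scope)
    case 'Let': {
      // The binding lives in a child scope that only the body can see;
      // the enclosing scope is never modified.
      const inner = Object.create(scope)
      inner[node.name] = evaluate(node.value, scope)
      return evaluate(node.body, inner)
    }
    default:
      throw new Error(`Unknown node kind ${node.kind}`)
  }
}

const num = value => ({ kind: 'Constant', value })
const sym = name => ({ kind: 'Symbol', name })
const add = (left, right) => ({ kind: 'Add', left, right })
const letIn = (name, value, body) => ({ kind: 'Let', name, value, body })

// A null-prototype object avoids inherited names such as 'toString'.
const globalScope = Object.create(null)
globalScope.y = 1

// Schematically: let(temp, 17, temp + y)
const inside = evaluate(
  letIn('temp', num(17), add(sym('temp'), sym('y'))),
  globalScope
)
```

Evaluating the equivalent of let(temp, 17, temp + y) + temp in this sketch throws a ReferenceError for the trailing temp, because the binding is not visible outside the body.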

Jos also mentions looping. Again, mathjs already has some means of iteration, e.g. sum((1:10).forEach(_(i) = i*i)) sums the squares of the first ten positive integers. Or you can even do total = 0; (1:10).forEach(_(i) = (total = total + i*i)); total although currently that returns a ResultSet rather than a number. I think the existing iterations will cover a third to a half of what one might want, and that as needed we can add other sorts of iteration, ideally trying to stay within the functional style that mathjs has already developed.


glen referenced this issue

2025-12-19 16:06:57 +00:00

Nanomath resolution process is all WET... #40

glen commented

2025-12-19 21:54:15 +00:00

Author

Owner

If you want to see what a possible grammar specification for the mathjs expression language looks like in Ohm, a packrat-based left-recursive parsing package for JavaScript, check out #47.

glen commented

2026-02-04 10:10:13 +00:00

Author

Owner

Jos and I evaluated the Ohm parser. Compared to mathjs, it has a very large footprint: on the order of 15% of the entire previously existing mathjs bundle. Therefore, the next step would be to try to reimplement the parser in [peggy](https://peggyjs.org/documentation.html), which will hopefully generate a much lighter-weight parser (but still be powerful enough to parse the mathjs expression language...)


👍 1

glen commented

2026-02-05 18:15:35 +00:00

Author

Owner

@josdejong is this where you were going to post the notes from our consideration of possible block syntax for the mathjs language? I couldn't find a new posting of it; if you put it somewhere please post a link here. In any case, one other option did occur to me: there are a couple of ASCII symbols we are not (much) using yet, chiefly @ and \. So we could make an opening digraph for blocks using one of these and have it match a single-character closing (or use {...} for blocks and say \{ ... } for objects). So here are some logical possibilities for blocks:

f(x) = \{ y = x^2; x+y }
f(x) = @{ y = x^2; x+y }
f(x) = @( y = x^2; x+y )
f(x) = @ y = x^2; x+y }

All of these are trivially parseable and have zero effect on any existing syntax, by virtue of using a new symbol. I don't see any reason we would choose @( over @{ for blocks, since curly braces are generally more associated with blocks -- I am just trying to be logically complete. And I think the last one, even though it is shortest, is just about unreadable since my eye just doesn't want to match a @ with a }. But the first two seem entirely plausible to me. Of them, I think I prefer the first, because it still leaves @ for future use, and somehow the idea of escaping the opening curly bracket ties in with an idea that you are sort of escaping back to the initial Block environment. Or conversely, if you would like to consider {...} for blocks and \{a: 7, b:2} for objects since likely blocks will be more used in the mathjs language than objects once they are available, that would be fine, too, since the idea that you have to escape the opening brace to get it to be an object since otherwise it would be a block also makes sense to me.

Anyhow, hoping that maybe one of these is better than the options we discussed yesterday.


josdejong commented

2026-02-05 18:36:13 +00:00

Collaborator

Yes! I didn't find the time today to work out the notes, I will do that tomorrow.

josdejong commented

2026-02-06 11:11:10 +00:00

Collaborator

Since this topic is separate from nanomath and a new core for mathjs, I thought it would be best to open a separate discussion for it:

https://github.com/josdejong/mathjs/discussions/3642

I quite like your idea of using \{...} for blocks! So far both {{...}} and \{...} feel like a good match.



Possible next direction: Parsing and type-enhanced Node trees #45