Nanomath resolution process is all WET... #40
... where WET = Write Everything Twice. You can see this very acutely in the recent pull request #39, in the code for `cbrt` in complex/arithmetic. The final function that is returned is quite reasonable and compact, but just above it there is a slew of `.resolve` calls that essentially reiterates the entire computation, just at the level of types rather than values. Essentially every `.resolve` call mirrors a step in the computation, just with the types of the arguments that will be used, and then the resulting type is used as the argument type for the next step in the computation.

At this point, we are so close that I will go ahead and implement polynomialRoot per #30 and benchmark this iteration directly against Pocomath (the most recent design iteration that made it all the way to a benchmark) and against mathjs 15. I think it is likely we will see a speedup, conceivably even slightly better than Pocomath's.
But I don't think the nanomath scheme is scalable. The resolution section prior to each implementation amounts to a heavy load of boilerplate that makes writing the implementations feel very much like a chore. To extend this basic approach to the size of mathjs 15, I think we need to automate this resolution process somehow -- as I write the type selection code, it feels very automatable.
On the other hand, I don't see an obvious way in JavaScript to select out and resolve just all of the functions that the eventual implementation will need. Four concepts come to mind, but all have their drawbacks/uncertainties.
A) Something like automatic differentiation. In other words, the actual implementation has a "math" object injected into it, and at resolution time it's called with a special version of this object that serves just to record dependencies and select resolutions for each of the calls. Then when the implementation is actually used, it's passed its own tailored "math" object in which every method that's actually needed just "magically" has the right definition, with no type detection and dispatch needed. The big drawback here is that for a function with multiple code paths (because of conditionals, etc.), it's not at all clear how that resolution-time call could possibly get info on function calls in paths not taken.
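To illustrate the recording idea on a straight-line implementation, here is a minimal sketch; the recorder, the `typeOf` helper, and the signature format are all illustrative assumptions, not nanomath code:

```javascript
// A Proxy whose every method just records its name and resolved signature,
// then returns a dummy value to keep the trace going.
const typeOf = (v) => typeof v; // toy stand-in for real type detection
function makeRecorder(log) {
  return new Proxy({}, {
    get: (_target, name) => (...args) => {
      log.push(`${String(name)}(${args.map(typeOf).join(',')})`);
      return args[0]; // dummy result; only works for straight-line code
    },
  });
}

// An implementation with an injected "math" object, dry-run once at resolution time:
const impl = (math) => (x) => math.add(math.multiply(x, x), 1);
const log = [];
impl(makeRecorder(log))(3);
console.log(log); // ['multiply(number,number)', 'add(number,number)']
```

Note that the dry run only sees the one path actually taken, which is exactly the drawback described above.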
B) A variation on A in which, again, each implementation has a "math" object passed to it which it uses to get dependencies -- but in fact, it also has its own private little instance of the math object that lasts just as long as it does. At first there's nothing in this mini-math object, but each time it's used for resolution, it asks the "real" math object to do the type detection and resolution, and the mini-math object remembers the instantiated function and where the call occurred, so that next time it doesn't need to do the type detection/dispatch; it just directly calls the one in its mini-math. The main unclear bit here is how to distinguish different calls of the same function with different actual argument types, so that the mini-math can serve up a different implementation of the dependency. I also think that all these computed calls will be slower than the nanomath scheme, where all the selection is done at once so that the functions can be compiled into the implementation as JavaScript creates the function object. So this would likely be an intermediate-speed solution between nanomath and mathjs 15.
C) Shift as much of the implementations as possible to the mathjs expression language, where we have a full syntax tree and can do this sort of static analysis directly. Certainly cbrt can be defined from more primitive operations via mathjs expressions. For the primitive ones, or others that need JavaScript, just grin and bear the nanomath resolution scheme -- it's not bad at all to deal with for the simpler functions, which is why I didn't really become concerned with this issue until now. The biggest concern with this approach is that it may invite many extensions and complications to the mathjs expression language in order to implement the large mathjs library, which could get unwieldy.
D) Specify the implementation in JavaScript, but as a string; parse it to an AST; do static analysis on that AST; and inject all of the dependencies needed into the context in which the string is finally converted into an actual JavaScript function. This approach should in theory work and yield good performance, but it faces the hurdle that full static analysis of JavaScript code is a tall order -- we would basically be writing our own JavaScript compiler in JavaScript.
I am wide open to any other options, or to thoughts on which of A-D we might best be able to actually pull off.
I have thought a bit more about B, which can basically be thought of as caching the type-detection/function-resolution results for the dependencies of any implementation, so that they are done only once. Other than a cache lookup plus a call being slower than pre-computing the resolved function and compiling it into place, as nanomath does now, it would work perfectly for dependencies that are only ever called with one actual-argument signature. But often that's not the case, and then the individual call sites must be distinguished. So B splits into two options for how to do that:
B.1) In the guts of the implementation, the implementor must decorate different instances of any dependency that is called in different places where it has different signatures, something like:
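A minimal sketch of what such annotations might look like; the `site` option syntax and the caching plumbing are assumptions for illustration, not nanomath's actual API:

```javascript
// Sketch only: a mini-math that caches one resolved implementation per "site" label.
const fullResolve = (name) => {
  // stand-in for the real math object's full type detection + dispatch
  if (name === 'calledOnce') return (a, b) => a + b;
  return (a, b) => a * b; // repeatedCall
};

function makeMiniMath() {
  const cache = new Map(); // "name@site" -> resolved implementation
  const mini = {};
  for (const name of ['calledOnce', 'repeatedCall']) {
    mini[name] = (opts, ...args) => {
      const key = `${name}@${opts.site ?? 'default'}`;
      if (!cache.has(key)) cache.set(key, fullResolve(name)); // resolve once per site
      return cache.get(key)(...args);
    };
  }
  return mini;
}

const math = makeMiniMath();
const f = (x, y) =>
  math.calledOnce({}, // only one signature ever occurs here: no label needed
    math.repeatedCall({ site: 'a' }, x, x),
    math.repeatedCall({ site: 'b' }, y, y));

console.log(f(2, 3)); // 13
```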
where "calledOnce" and "repeatedCall" are just stand-ins for math methods, like "multiply" and "negate," say. There would be various possibilities for the exact location and syntax of the site annotations, and the "site" values could be arbitrary labels, and there could and perhaps should be an annotation that says "don't even try to cache this call, it's going to be called with different argument types at this exact call site". Note that if there are two calls that the implementor knows will always be called with the same signature, the implementor can label them as the same "site" -- so maybe "flavor" would be a better term than "site".
B.2) We use Error.captureStackTrace() to autodetect the different call sites. It is non-standard, but available in all major browsers and JavaScript runtimes except deno, including bun. It can't tell that two different sites are sure to have the same signature, but that's pretty minor; it will just cache the same thing twice. There may still be a need for a decoration that says "don't cache this site" if it is going to be called at that point with different types (but generally that doesn't happen much). The biggest concern here is that including a call to captureStackTrace will slow down the dependency calls even more, so it's not clear that this would even be faster than the current mathjs 15 always-re-dispatch strategy.

So we will have to decide whether it is worth experimenting with either (B.1) or (B.2)...
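For concreteness, a sketch of the auto-detection: Error.captureStackTrace is the real (V8-specific) API, but the caching scheme wrapped around it is an assumption:

```javascript
// Sketch: auto-detect call sites via the non-standard V8 API Error.captureStackTrace.
const siteCache = new Map(); // call-site stack string -> resolved implementation
let resolutions = 0;

function multiply(a, b) {
  const holder = {};
  Error.captureStackTrace(holder, multiply); // stack starts at multiply's caller
  const site = holder.stack; // distinct per textual call site (file:line:column)
  if (!siteCache.has(site)) {
    resolutions++; // full type detection + dispatch would happen here, once per site
    siteCache.set(site, (x, y) => x * y); // stand-in for the resolved implementation
  }
  return siteCache.get(site)(a, b);
}

for (let i = 0; i < 1000; i++) multiply(i, i); // one call site: resolved only once
multiply(2, 3); // a second textual call site: one more resolution
console.log(resolutions); // 2
```

The per-call cost of building the stack string is exactly the slowdown worried about above.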
As to the other alternatives, I don't currently see any way to actually make (A) work, and I have essentially no enthusiasm for (D). On the other hand, (C) seems doable, and Jos has been interested for some time in making the mathjs language a more full-fledged programming language with loops and flow-control constructs. The biggest worry is that the compiled versions of complicated mathjs-expression-language functions may still be noticeably slower than the same function written directly in JavaScript. So again, we would have to test carefully to see what kind of performance would come out the other end.
So to summarize: at the moment I don't see a way around the laborious pre-resolution in nanomath that seems both workable and likely to deliver the same speedup. Three options, (B.1), (B.2), and (C), seem as though they could be worth experimenting with, to see the tradeoff between ease of implementation and resulting performance. I just don't want the quest for a mathjs core replacement to be never-ending. One point in their favor is that any of these three options can be used in a hybrid way with nanomath -- in other words, if we implement them, they could be used for bigger/more compute-intensive functions where the resolution time may not be the major factor, and then any function that becomes a performance bottleneck could always be re-implemented in the current nanomath style to squeeze out the wasted time there.
So a potential strategy is to proceed with nanomath for now, and when we reach the point where the pain of writing all the resolution calls before the real body of a method becomes too great, we can layer on (C) (use the mathjs expression language internally) and/or one of (B.1)/(B.2) (resolution-caching schemes) without disrupting any existing implementations.
So I guess that's my plan for now, but definitely would love other thoughts/guidance.
I see what you mean. I'm glad that these pain points pop up now in the prototyping phase, and not in a later stage! The `cbrt` function in #39 indeed requires a lot of logic for resolving the dependencies.

Ideally, we would only have to bother about the actual logic and not about resolving the correct signatures of all the dependencies. The reason we want that is to improve performance by moving type resolution to the loading phase instead of the execution phase.
Just to be sure: if I remember well from one of the first prototypes, this idea of moving type resolution from the execution phase to the loading phase improves the performance drastically (like 9 or 10 times?), so it is worth the effort, right?
Right now option (B.1) sounds most promising to me. So this is sort of a proxy with a cache, or a memoization solution. That can work if the way to retrieve an item from the "cache" is faster than the way to retrieve it from the full function via type checking. This can only work if it is possible for the nanomath core to know for sure if some call will always have the same data types, in which case type checking can be skipped after the first time. But how can we be sure of that? That requires knowing whether each of the arguments has a known, fixed type. I'm not sure how to detect that in some smart, easy way.
Your ideas (C) and (D) are basically that we develop our own compiler (or use the TS compiler or so). It sounds like a lot of work.
I'll give these ideas some thought in the coming days.
I think it would be best to first do some experimentation with the difference in performance between (a) the "slowest" way of just looking up the signature via the full tree, (b) try out some caching/memoization mechanisms to see if we can speed that up, and (c) not having to do type checking if the types are static and known.
I can assure you that (a) doing type resolution on every call is significantly slower in nanomath than mathjs with typed-function, which is not too surprising since the typed-function approach was entirely based on (a) and it is highly optimized for that operation.
But nevertheless, looking up at every call is 5-10 times slower than pre-resolving and assembling an implementation that is type-tailored. The fact is that when the input types are known, the types at any point in the computation are generally fixed. The entire point of nanomath is to take advantage of this phenomenon.
I agree we should not go down the path of static analysis of JavaScript code, option (D).
On the other hand, (C) appears to me to be a path of low resistance, because although you worry that implementing a compiler is a lot of work, we have already implemented a compiler for the mathjs expression language. Once you have the Node expression tree (AST) for a mathjs expression, with the type information in nanomath it is easy to type the entire tree based on the types of the symbols, and then it is trivial for `compile` to do pre-resolution and compile in the type-selected implementation, rather than the full method that does resolution on every call.
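To make that concrete, here is a toy sketch of typing a tree and compiling in the selected implementations; the tree shape, the `impls` table, and `compileTyped` are illustrative assumptions, not mathjs's or nanomath's actual APIs:

```javascript
// Toy expression tree for cube(x) = x * (x * x), plus an implementation table.
const impls = {
  'multiply(number,number)': { fn: (a, b) => a * b, returns: 'number' },
};

const tree = {
  op: 'multiply',
  args: [{ sym: 'x' }, { op: 'multiply', args: [{ sym: 'x' }, { sym: 'x' }] }],
};

// Type the tree from the symbol types, selecting each implementation once,
// at compile time; the returned evaluator never does dispatch.
function compileTyped(node, types) {
  if (node.sym) return { type: types[node.sym], eval: (scope) => scope[node.sym] };
  const args = node.args.map((a) => compileTyped(a, types));
  const impl = impls[`${node.op}(${args.map((a) => a.type).join(',')})`];
  return {
    type: impl.returns,
    eval: (scope) => impl.fn(...args.map((a) => a.eval(scope))),
  };
}

const cube = compileTyped(tree, { x: 'number' });
console.log(cube.eval({ x: 3 })); // 27
```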
Hence my recommendation to take advantage of the work we have already done to create the mathjs expression compiler and adopt a hybrid approach: by-hand pre-resolution as in the current nanomath for the simple and fundamental cases, and actually implement functions in the mathjs expression language for the higher-level functions that can be defined in terms of the simpler ones. Please see #45 and let me know what you think. Thanks!
You also suggest (b), which is basically to experiment with the caching-based possibilities (B.1) and/or (B.2). We can definitely do that, but I will admit, after 3-4 years of fiddling with this off and on through many different prototypes and possible designs, to some level of fatigue with iterating on designs, in contrast with getting something sufficiently solid and moving on with life (especially as I worry about the lifetime of JavaScript as the main in-browser language).
So I will be happy to hear your reaction to #45, and yes, of course, I will instead/in addition prototype a resolution-caching scheme to experiment with -- but only if you think it is necessary in order to move forward, i.e., if the combination of (C) with hand pre-resolution (where that's not onerous) would not be deemed sufficient for adoption without trying out some flavor of (B). Give it some thought and let me know, and I will proceed accordingly.
I would expect doing type resolution on every call in nanomath would be at least as fast as mathjs+typed-function but you may be right.
Ok that is really important information. This is indeed the core idea of nanomath, if this works out it's rocket fast 🚀.
I'm still trying to think through how (C) would work out concretely. I guess we can define, for example, `cube(x) = x*x*x`, then look at the parsed node tree containing the input of the function (which can be any type), executing two multiplications. Then, we can somehow make a mapping which knows that if `x` is a number, it has to take the number implementation of `multiply`, or so, and the same for other data types? My feeling is that such a mapping can very easily grow very large, and if the function grows larger, there is a huge number of paths to explore.

Besides that, if we would like to write many of the functions of mathjs in the expression parser instead of in JS, it would also require extending the expression parser to support loops, local variables, etc. (turning it more into a feature-complete programming language). I like that, but I thought that is not your cup of tea?
Can you explain what you have in mind with option (C)? How would that work?
@josdejong wrote in #40 (comment):
I have benchmarked exactly this. It is not as fast in nanomath as in typed-function. The approaches are very different. Nanomath types every argument, and then uses a 3rd-party "ArrayKeyedMap" to look up the proper implementation for that array of types. I thought that would be faster than running all the different test functions over and over, which is what typed-function does. But typed-function has lots of clever optimizations (hats off to you), like ordering the types in most-likely-to-be-used order, so for functions on numbers it just runs the number test on each argument. It may be that the table lookup is faster than typed-function when you have 30 types and you are looking up the 27th-ranked type, but for a prototype the size of nanomath, running functions on numbers, typed-function wins hands-down. But the direct lookup of the array of types was quicker to get the prototype working in terms of my time, and it's not worse than twice the time, so I think we can afford to do some profiling and optimization of that later. Maybe a hybrid approach (try numbers and some other common types first, then fall back to table lookup or something) would be best. I don't think it's make-or-break for this prototype.
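The array-of-types lookup described above can be sketched like this; a plain Map with a joined string key stands in for the actual ArrayKeyedMap dependency, and the table entries are toys:

```javascript
// Toy sketch of nanomath-style dispatch: type every argument, then look the
// signature up in a table (one lookup per call, no per-type test functions).
const typeOf = (v) =>
  typeof v === 'number' ? 'number' :
  typeof v === 'bigint' ? 'bigint' : 'unknown';

const table = new Map([
  ['multiply:number,number', (a, b) => a * b],
  ['multiply:bigint,bigint', (a, b) => a * b],
]);

function dispatch(name, ...args) {
  const key = `${name}:${args.map(typeOf).join(',')}`;
  return table.get(key)(...args);
}

console.log(dispatch('multiply', 6, 7)); // 42
```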
I will put more details on (C) in #45, which I did start working on ;-)
Thanks, I get it. But it may be possible (over time) to add clever optimizations in nanomath too ;)

I'll wait to see what you're cooking up :)
OK, there's now a long-ish explanation of (C) in #45.