module statics imports signatures/fostr-sig /** md Title: Adding Program Analysis with Statix ## Development of fostr static analysis This section is more documentation of Spoofax in general and Statix in particular than of fostr itself, but is being maintained here in case it could be either helpful to someone getting started with Statix or helpful in understanding how the static characteristics of fostr were designed. As mentioned in the [Overview](../README.md), I don't like to program and a corollary of that is never to use a facility unless/until there's a need for it. So the first few rudimentary passes at fostr simply declared every program to be "OK" from the point of view of Statix: ```statix {! "\git docs/statix_start:trans/statics.stx" extract: start: programOk stop: (.*TopLevel.*) !} ``` Then I reached the point at which the grammar was basically just ```SDF3 // Start.TopLevel = // Seq = // Seq.Sequence = sq:Ex+ {layout(align-list sq)} // Ex.Terminated = <;> {! "\git docs/statix_start:syntax/fostr.sdf3" extract: start: TermEx.Terminate stop: (.*bracket.*) !} ``` (The first four clauses are in comments because they approximate fostr's grammar; it actually uses a few more sorts for sequences of expressions, to achieve fostr's exact layout rules.) This was the first point at which there were two different types that might need to be written to standard output (Int and String), and although of course the dynamically-typed Python and Javascript code generated dealt with both fine, the Haskell code needed to differ depending on the type of the item written (and I hadn't even started OCaml code generation at that point since I knew it would be hopeless without statically typing fostr programs). So it was time to bite the bullet and add type checking via Statix to fostr. The first step is to replace the simple assertion that any TopLevel is OK with a constraint that its Seq must type properly, and an assignment of that type to the top level node: ```statix programOk(tl@TopLevel(seq)) :- {T} type_Seq(seq) == T, @tl.type := T. ``` Of course, for this to even parse, we must have a definition of `type_Seq`: ```statix **/ /** md */ signature sorts TYPE // semantic type constructors INT : TYPE STRING : TYPE STREAM : TYPE /* **/ // see docs/implementation.md for detail on how to switch to multi-file analysis rules // single-file entry point programOk : Start /** md rules type_Seq : Seq -> TYPE ``` **/ type_LineSeq : LineSeq -> TYPE programOk(tl@TopLevel(seq)) :- {T} type_LineSeq(seq) == T, @tl.type := T. /** md Now to type a Seq, we look to the syntax, and see that there are two possibilities for what it might be: just an Ex, or a Sequence(_) of a list of 'Ex's. For the first, Statix does not allow one sort to simply "become" another, but the Spoofax infrastructure automatically inserts "injection" constructors for us, in this case one named Ex2Seq. So the first rule for `type_Seq` is straightforward: ```statix type_Seq(s@Ex2Seq(e)) = T : - type_Ex(e) == T, @s.type := T. ``` where of course type_Ex needs its own declaration analogous to the above. **/ type_Line : Line -> TYPE type_LineSeq(ls@Line2LineSeq(l)) = T :- type_Line(l) == T, @ls.type := T. type_OptTermEx : OptTermEx -> TYPE type_Line(l@OptTermEx2Line(ote)) = T :- type_OptTermEx(ote) == T, @l.type := T. type_Ex : Ex -> TYPE type_OptTermEx(ote@Ex2OptTermEx(e)) = T :- type_Ex(e) == T, @ote.type :=T. /** md This brings us to the syntax rules for the basic expressions themselves, which comprise almost all of the remaining fostr language constructs. But first a mechanism suggested by Ivo Wilms to avoid repeating the node type annotation in every rule: ```statix **/ /** md */ ty_Ex : Ex -> TYPE type_Ex(e) = ty@ty_Ex(e) :- @e.type := ty. /* **/ /** md ``` At this stage in fostr's development, there was no difference between a terminated and unterminated expression, so the typing rule for that constructor was trivial: ```statix ty_Ex(Terminated(e)) = ty_Ex(e). ``` **/ type_TermEx: TermEx -> TYPE type_TermEx(te@Terminate(e)) = T :- type_Ex(e) == T, @te.type := T. /** md Now typing literals is straightforward: ```statix **/ /** md */ ty_Ex(Int(_)) = INT(). ty_Ex(LitString(_)) = STRING(). ty_Ex(e@Stream()) = STREAM(). /* **/ /** md ``` Finally we get to the binary operators, and here we use the pattern found in recent versions of the "[chicago](https://github.com/MetaBorgCube/statix-sandbox/tree/master/chicago)" example language and in the Fall 2020 TU-Delft class lecture on [Name Binding and Name Resolution](https://tudelft-cs4200-2020.github.io/lectures/2020/09/24/lecture5/). This pattern lets us specify error messages. ```statix **/ /** md */ ty_Ex(Sum(e1, e2)) = INT() :- type_Ex(e1) == INT() | error $[Expression [e1] not an Int in sum.]@e1, type_Ex(e2) == INT() | error $[Expression [e2] not an Int in sum.]@e2. ty_Ex(Gets(e1, e2)) = STREAM() :- {T} type_Ex(e1) == STREAM() | error $[Only Streams may receive items.]@e1, type_Ex(e2) == T. ty_Ex(To(e1, e2)) = T :- type_Ex(e1) == T, type_Ex(e2) == STREAM() | error $[Items may only be sent to Streams.]@e2. /* **/ /** md ``` ### Using type annotations in transformation _Probably want to include stuff from analysis.str/ haskell.str here_ **/ rules // multi-file entry point projectOk : scope projectOk(s). fileOk : scope * Start fileOk(s, TopLevel(_)).