Add literal string constants (#19)

In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: #19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net>
2021-02-18 04:17:05 +00:00 · 2021-02-18 04:17:05 +00:00 · 2772fd0c5c
commit 2772fd0c5c
parent c516ed6d7f
16 changed files with 386 additions and 34 deletions
--- a/trans/statics.stx
+++ b/trans/statics.stx
@ -1,14 +1,257 @@
 module statics

 imports signatures/fostr-sig
+imports signature/TYPE
+imports statics/util

-// see docs/implementation.md for details on how to switch to multi-file analysis
+/** md
+Title: Adding Program Analysis with Statix
+
+## Development of fostr static analysis
+
+This section is more documentation of Spoofax in general and Statix
+in particular than of fostr itself, but is being maintained here in case
+it could be either helpful to someone getting started with Statix or
+helpful in understanding how the static characteristics of fostr were designed.
+
+As mentioned in the [Overview](../README.md), I don't like to program and a
+corollary of that is never to use a facility unless/until there's a need for
+it. So the first few rudimentary passes at fostr simply declared every program
+to be "OK" from the point of view of Statix:
+```statix
+{! "\git docs/statix_start:trans/statics.stx" extract:
+  start: programOk
+  stop: (.*TopLevel.*)
+!}
+```
+
+Then I reached the point at which the grammar was basically just
+```SDF3
+// Start.TopLevel = <Seq>
+// Seq            = <Ex>
+// Seq.Sequence   = sq:Ex+ {layout(align-list sq)}
+// Ex.Terminated  = <<Ex>;>
+{! "\git docs/statix_start:syntax/fostr.sdf3" extract:
+  start: TermEx.Terminate
+  stop: (.*bracket.*)
+!}
+```
+(The first four clauses are in comments because they approximate fostr's
+grammar; it actually uses a few more sorts for sequences of
+expressions, to achieve fostr's exact layout rules. Also note that the parsing
+of literal strings later evolved to include the surrounding single quotes,
+because the rule above implicitly allows layout between the quotes and the
+string contents, creating ambiguity.)
+
+This was the first point at which there were two different types that might
+need to be written to standard output (Int and String), and although of course
+the dynamically-typed Python and Javascript code generated dealt with both fine,
+the Haskell code needed to differ depending on the
+type of the item written (and I hadn't even started OCaml code generation at
+that point since I knew it would be hopeless without statically typing fostr
+programs).
+
+So it was time to bite the bullet and add type checking via Statix to fostr.
+The first step was to replace the simple assertion that any TopLevel
+is OK with a constraint that its Seq must type properly, and an assignment of
+that type to the top level node:
+```statix
+programOk(tl@TopLevel(seq)) :- {T}
+  type_Seq(seq) == T,
+  @tl.type := T.
+```
+Of course, for this to even parse, we must have a definition of `type_Seq`:
+```statix
+{! ../signature/TYPE.stx extract: {start: module, stop: rules} !}
+**/
+
+// see docs/implementation.md for detail on how to switch to multi-file analysis

 rules // single-file entry point

  programOk : Start

-  programOk(TopLevel(_)).
+  /** md
+rules
+  type_Seq : Seq -> TYPE
+```
+  **/
+
+  type_LineSeq : LineSeq -> TYPE
+
+  programOk(tl@TopLevel(seq)) :- {T}
+    type_LineSeq(seq) == T,
+    @tl.type := T.
+
+  /** md
+Now to type a Seq, we look to the syntax, and see that there are two
+possibilities for what it might be: just an Ex, or a Sequence(_) of a
+list of 'Ex's. For the first, Statix does not allow one sort to simply
+"become" another, but the Spoofax infrastructure automatically inserts
+"injection" constructors for us, in this case one named Ex2Seq. So the
+first rule for `type_Seq` is straightforward:
+
+```statix
+  type_Seq(s@Ex2Seq(e)) = T : -
+    type_Ex(e) == T,
+    @s.type := T.
+```
+where of course type_Ex needs its own declaration analogous to the above.
+  **/
+
+  type_Line : Line -> TYPE
+
+  type_LineSeq(ls@Line2LineSeq(l)) = T :-
+    type_Line(l) == T,
+    @ls.type := T.
+
+  /** md
+
+The other (and in fact more typical) rule for `type_Seq`, when it actually
+consists of a sequence of expressions, is a bit more involved. Fortunately
+Statix provides a primitive for mapping over a list, so we can proceed as
+follows:
+```statix
+  types_Exs maps type_Ex(list(*)) = list(*)
+  type_Seq(s@Sequence(l)) = T :- {lt}
+    types_Exs(l) == lt,
+    lastTYPE(lt) == T,
+    @s.type := T.
+```
+Here `lastTYPE` is a function that extracts the last TYPE from a list.
+Unless/until Statix develops some sort of standard library, it must be
+hand-defined, as done in "statics/util.stx" like so:
+```statix
+{! ../statics/util.stx extract: {start: lastTYPE} !}
+```
+  **/
+
+  types_Lines maps type_Line(list(*)) = list(*)
+
+  type_LineSeq(ls@Sequence(l)) = T :- {lt}
+    types_Lines(l) == lt,
+    lastTYPE(lt) == T,
+    @ls.type := T.
+
+  type_OptTermEx : OptTermEx -> TYPE
+
+  type_Line(l@OptTermEx2Line(ote)) = T :-
+    type_OptTermEx(ote) == T,
+    @l.type := T.
+
+  type_Ex     :     Ex -> TYPE
+  type_TermEx : TermEx -> TYPE
+
+  type_OptTermEx(ote@Ex2OptTermEx(e)) = T :-
+    type_Ex(e) == T,
+    @ote.type := T.
+
+  type_OptTermEx(ote@TermEx2OptTermEx(te)) = T :-
+    type_TermEx(te) == T,
+    @ote.type := T.
+
+  /** md
+
+This brings us to the syntax rules for the basic expressions themselves,
+which comprise almost all of the remaining fostr language constructs.
+But first a mechanism suggested by Ivo Wilms to avoid repeating the node
+type annotation in every rule:
+```statix
+  **/
+
+  /** md */
+  ty_Ex : Ex -> TYPE
+
+  type_Ex(e) = ty@ty_Ex(e) :-
+    @e.type := ty.
+  /* **/
+
+  /** md
+```
+At this stage in fostr's development, there was no difference between a
+terminated and unterminated expression, so the typing rule for that
+constructor was trivial:
+```statix
+  ty_Ex(Terminated(e)) = ty_Ex(e).
+```
+  **/
+
+  type_TermEx(te@Terminate(e)) = T :-
+    type_Ex(e) == T,
+    @te.type := T.
+
+  /** md
+
+Now typing literals is straightforward:
+```statix
+  **/
+
+  /** md */
+  ty_Ex(Int(_))       = INT().
+  ty_Ex(LitString(_)) = STRING().
+  ty_Ex(e@Stream())   = STREAM().
+  /* **/
+
+  /** md
+```
+
+Finally we get to the binary operators, and here we use the pattern found in
+recent versions of the
+"[chicago](https://github.com/MetaBorgCube/statix-sandbox/tree/master/chicago)"
+example language and in the Fall 2020 TU-Delft class lecture on
+[Name Binding and Name Resolution](https://tudelft-cs4200-2020.github.io/lectures/2020/09/24/lecture5/).
+This pattern lets us specify error messages.
+
+```statix
+  **/
+
+  /** md */
+  ty_Ex(Sum(e1, e2)) = INT() :-
+    type_Ex(e1) == INT() | error $[Expression [e1] not an Int in sum.]@e1,
+    type_Ex(e2) == INT() | error $[Expression [e2] not an Int in sum.]@e2.
+
+  ty_Ex(Gets(e1, e2)) = STREAM() :- {T}
+    type_Ex(e1) == STREAM() | error $[Only Streams may receive items.]@e1,
+    type_Ex(e2) == T.
+
+  ty_Ex(To(e1, e2)) = T :-
+    type_Ex(e1) == T,
+    type_Ex(e2) == STREAM() | error $[Items may only be sent to Streams.]@e2.
+  /* **/
+
+  /** md
+```
+
+### Using type annotations in transformation
+
+At this point, Statix properly types all of the valid programs of the very
+rudimentary language defined by the grammar above. But the proximate purpose
+for implementing this typing was to aid Haskell code generation. So how
+do we actually use the assigned types in a Stratego transformation?
+
+Statix provides a Stratego api that includes, among other items, strategies
+`stx-get-ast-analysis` and `stx-get-ast-type(|analysis)` that provide access
+to the assigned types. However, it's easiest to use the information via
+a wrapper like this, essentially lifted from the "chicago" language project:
+```stratego
+{! analysis.str extract:
+  start: Extract.the.type
+terminate: Prints.the.analyzed.type
+!}
+```
+
+Now `get_type` run on a node of the analyzed AST produces the assigned `TYPE`
+(as an ATerm in the constructors of sort TYPE in Statix).
+
+Thus, you can select on the assigned type, as in the strategy to select
+the correct Haskell operator to use to send an item to standard output:
+```stratego
+{! haskell.str extract:
+  start: '(.*hs_getOp.=.*)'
+  stop: \s
+!}
+```
+  **/

 rules // multi-file entry point