fostr/trans/statics.stx

module statics

imports signatures/fostr-sig
imports signature/TYPE
imports statics/util

/** md
Title: Adding Program Analysis with Statix

## Development of fostr static analysis

This section is more documentation of Spoofax in general and Statix
in particular than of fostr itself, but is being maintained here in case
it could be either helpful to someone getting started with Statix or
helpful in understanding how the static characteristics of fostr were designed.

As mentioned in the [Overview](../README.md), I don't like to program and a
corollary of that is never to use a facility unless/until there's a need for
it. So the first few rudimentary passes at fostr simply declared every program
to be "OK" from the point of view of Statix:
```statix
{! "\git docs/statix_start:trans/statics.stx" extract:
  start: programOk
  stop: (.*TopLevel.*)
!}
```

Then I reached the point at which the grammar was basically just
```SDF3
// Start.TopLevel = <Seq>
// Seq            = <Ex>
// Seq.Sequence   = sq:Ex+ {layout(align-list sq)}
// Ex.Terminated  = <<Ex>;>
{! "\git docs/statix_start:syntax/fostr.sdf3" extract:
  start: TermEx.Terminate
  stop: (.*bracket.*)
!}
```
(The first four clauses are in comments because they approximate fostr's
grammar; it actually uses a few more sorts for sequences of
expressions, to achieve fostr's exact layout rules. Also note that the parsing
of literal strings later evolved to include the surrounding single quotes,
because the rule above implicitly allows layout between the quotes and the
string contents, creating ambiguity.)

This was the first point at which there were two different types that might
need to be written to standard output (Int and String), and although of course
the dynamically-typed Python and Javascript code generated dealt with both fine,
the Haskell code needed to differ depending on the
type of the item written (and I hadn't even started OCaml code generation at
that point since I knew it would be hopeless without statically typing fostr
programs).

So it was time to bite the bullet and add type checking via Statix to fostr.
The first step was to replace the simple assertion that any TopLevel
is OK with a constraint that its Seq must type properly, and an assignment of
that type to the top level node:
```statix
programOk(tl@TopLevel(seq)) :- {T}
  type_Seq(seq) == T,
  @tl.type := T.
```
Of course, for this to even parse, we must have a definition of `type_Seq`:
```statix
{! ../signature/TYPE.stx extract: {start: module, stop: rules} !}
**/

// see docs/implementation.md for detail on how to switch to multi-file analysis

rules // single-file entry point

  programOk : Start

  /** md
rules
  type_Seq : Seq -> TYPE
```
  **/

  type_LineSeq : LineSeq -> TYPE

  programOk(tl@TopLevel(seq)) :- {T}
    type_LineSeq(seq) == T,
    @tl.type := T.

  /** md
Now to type a Seq, we look to the syntax, and see that there are two
possibilities for what it might be: just an Ex, or a Sequence(_) of a
list of 'Ex's. For the first, Statix does not allow one sort to simply
"become" another, but the Spoofax infrastructure automatically inserts
"injection" constructors for us, in this case one named Ex2Seq. So the
first rule for `type_Seq` is straightforward:

```statix
  type_Seq(s@Ex2Seq(e)) = T : -
    type_Ex(e) == T,
    @s.type := T.
```
where of course type_Ex needs its own declaration analogous to the above.
  **/

  type_Line : Line -> TYPE

  type_LineSeq(ls@Line2LineSeq(l)) = T :-
    type_Line(l) == T,
    @ls.type := T.

  /** md

The other (and in fact more typical) rule for `type_Seq`, when it actually
consists of a sequence of expressions, is a bit more involved. Fortunately
Statix provides a primitive for mapping over a list, so we can proceed as
follows:
```statix
  types_Exs maps type_Ex(list(*)) = list(*)
  type_Seq(s@Sequence(l)) = T :- {lt}
    types_Exs(l) == lt,
    lastTYPE(lt) == T,
    @s.type := T.
```
Here `lastTYPE` is a function that extracts the last TYPE from a list.
Unless/until Statix develops some sort of standard library, it must be
hand-defined, as done in "statics/util.stx" like so:
```statix
{! ../statics/util.stx extract: {start: lastTYPE} !}
```
  **/

  types_Lines maps type_Line(list(*)) = list(*)

  type_LineSeq(ls@Sequence(l)) = T :- {lt}
    types_Lines(l) == lt,
    lastTYPE(lt) == T,
    @ls.type := T.

  type_OptTermEx : OptTermEx -> TYPE

  type_Line(l@OptTermEx2Line(ote)) = T :-
    type_OptTermEx(ote) == T,
    @l.type := T.

  type_Ex     :     Ex -> TYPE
  type_TermEx : TermEx -> TYPE

  type_OptTermEx(ote@Ex2OptTermEx(e)) = T :-
    type_Ex(e) == T,
    @ote.type := T.

  type_OptTermEx(ote@TermEx2OptTermEx(te)) = T :-
    type_TermEx(te) == T,
    @ote.type := T.

  /** md

This brings us to the syntax rules for the basic expressions themselves,
which comprise almost all of the remaining fostr language constructs.
But first a mechanism suggested by Ivo Wilms to avoid repeating the node
type annotation in every rule:
```statix
  **/

  /** md */
  ty_Ex : Ex -> TYPE

  type_Ex(e) = ty@ty_Ex(e) :-
    @e.type := ty.
  /* **/

  /** md
```
At this stage in fostr's development, there was no difference between a
terminated and unterminated expression, so the typing rule for that
constructor was trivial:
```statix
  ty_Ex(Terminated(e)) = ty_Ex(e).
```
  **/

  type_TermEx(te@Terminate(e)) = T :-
    type_Ex(e) == T,
    @te.type := T.

  /** md

Now typing literals is straightforward:
```statix
{! "\git docs/statix_works:trans/statics.stx" extract:
  start: '(.*ty_Ex.Int.*\s*)'
  stop: '/. ../'
!}
```
  **/

  ty_Ex(Int(_))       = INT().
  ty_Ex(LitString(_)) = STRING().
  ty_Ex(EscString(_)) = STRING().
  ty_Ex(e@Stream())   = STREAM().

  /** md

Finally we get to the binary operators, and here we use the pattern found in
recent versions of the
"[chicago](https://github.com/MetaBorgCube/statix-sandbox/tree/master/chicago)"
example language and in the Fall 2020 TU-Delft class lecture on
[Name Binding and Name Resolution](https://tudelft-cs4200-2020.github.io/lectures/2020/09/24/lecture5/).
This pattern lets us specify error messages.

```statix
  **/

  /** md */
  ty_Ex(Sum(e1, e2)) = INT() :-
    type_Ex(e1) == INT() | error $[Expression [e1] not an Int in sum.]@e1,
    type_Ex(e2) == INT() | error $[Expression [e2] not an Int in sum.]@e2.

  ty_Ex(Gets(e1, e2)) = STREAM() :- {T}
    type_Ex(e1) == STREAM() | error $[Only Streams may receive items.]@e1,
    type_Ex(e2) == T.

  ty_Ex(To(e1, e2)) = T :-
    type_Ex(e1) == T,
    type_Ex(e2) == STREAM() | error $[Items may only be sent to Streams.]@e2.
  /* **/

  /** md
```

### Using type annotations in transformation

At this point, Statix properly types all of the valid programs of the very
rudimentary language defined by the grammar above. But the proximate purpose
for implementing this typing was to aid Haskell code generation. So how
do we actually use the assigned types in a Stratego transformation?

Statix provides a Stratego api that includes, among other items, strategies
`stx-get-ast-analysis` and `stx-get-ast-type(|analysis)` that provide access
to the assigned types. However, it's easiest to use the information via
a wrapper like this, essentially lifted from the "chicago" language project:
```stratego
{! analysis.str extract:
  start: Extract.the.type
terminate: Prints.the.analyzed.type
!}
```

Now `get_type` run on a node of the analyzed AST produces the assigned `TYPE`
(as an ATerm in the constructors of sort TYPE in Statix).

Thus, you can select on the assigned type, as in the strategy to select
the correct Haskell operator to use to send an item to standard output:
```stratego
{! haskell.str extract:
  start: '(.*hs_getOp.=.*)'
  stop: \s
!}
```
  **/

rules // multi-file entry point

  projectOk : scope

  projectOk(s).

  fileOk : scope * Start

  fileOk(s, TopLevel(_)).
init: SDF3/Statix project as initialized by Spoofax 2021-01-30 20:27:10 +00:00			`module statics`

chore: Switch to this repository from predecessor 2021-01-30 23:37:53 +00:00			`imports signatures/fostr-sig`
Add literal string constants (#19) In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: https://code.studioinfinity.org/glen/fostr/pulls/19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net> 2021-02-18 04:17:05 +00:00			`imports signature/TYPE`
			`imports statics/util`
chore: Switch to this repository from predecessor 2021-01-30 23:37:53 +00:00
Add literal string constants (#19) In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: https://code.studioinfinity.org/glen/fostr/pulls/19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net> 2021-02-18 04:17:05 +00:00			`/** md`
			`Title: Adding Program Analysis with Statix`

			`## Development of fostr static analysis`

			`This section is more documentation of Spoofax in general and Statix`
			`in particular than of fostr itself, but is being maintained here in case`
			`it could be either helpful to someone getting started with Statix or`
			`helpful in understanding how the static characteristics of fostr were designed.`

			`As mentioned in the [Overview](../README.md), I don't like to program and a`
			`corollary of that is never to use a facility unless/until there's a need for`
			`it. So the first few rudimentary passes at fostr simply declared every program`
			`to be "OK" from the point of view of Statix:`
			```statix
			`{! "\git docs/statix_start:trans/statics.stx" extract:`
			`start: programOk`
			`stop: (.TopLevel.)`
			`!}`
			```

			`Then I reached the point at which the grammar was basically just`
			```SDF3
			`// Start.TopLevel = <Seq>`
			`// Seq = <Ex>`
			`// Seq.Sequence = sq:Ex+ {layout(align-list sq)}`
			`// Ex.Terminated = <<Ex>;>`
			`{! "\git docs/statix_start:syntax/fostr.sdf3" extract:`
			`start: TermEx.Terminate`
			`stop: (.bracket.)`
			`!}`
			```
			`(The first four clauses are in comments because they approximate fostr's`
			`grammar; it actually uses a few more sorts for sequences of`
			`expressions, to achieve fostr's exact layout rules. Also note that the parsing`
			`of literal strings later evolved to include the surrounding single quotes,`
			`because the rule above implicitly allows layout between the quotes and the`
			`string contents, creating ambiguity.)`

			`This was the first point at which there were two different types that might`
			`need to be written to standard output (Int and String), and although of course`
			`the dynamically-typed Python and Javascript code generated dealt with both fine,`
			`the Haskell code needed to differ depending on the`
			`type of the item written (and I hadn't even started OCaml code generation at`
			`that point since I knew it would be hopeless without statically typing fostr`
			`programs).`

			`So it was time to bite the bullet and add type checking via Statix to fostr.`
			`The first step was to replace the simple assertion that any TopLevel`
			`is OK with a constraint that its Seq must type properly, and an assignment of`
			`that type to the top level node:`
			```statix
			`programOk(tl@TopLevel(seq)) :- {T}`
			`type_Seq(seq) == T,`
			`@tl.type := T.`
			```
			Of course, for this to even parse, we must have a definition of `type_Seq`:
			```statix
			`{! ../signature/TYPE.stx extract: {start: module, stop: rules} !}`
			`**/`

			`// see docs/implementation.md for detail on how to switch to multi-file analysis`
init: SDF3/Statix project as initialized by Spoofax 2021-01-30 20:27:10 +00:00
			`rules // single-file entry point`

feat: Allow indented continuation lines Resolves #2. 2021-02-01 08:29:00 +00:00			`programOk : Start`
init: SDF3/Statix project as initialized by Spoofax 2021-01-30 20:27:10 +00:00
Add literal string constants (#19) In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: https://code.studioinfinity.org/glen/fostr/pulls/19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net> 2021-02-18 04:17:05 +00:00			`/** md`
			`rules`
			`type_Seq : Seq -> TYPE`
			```
			`**/`

			`type_LineSeq : LineSeq -> TYPE`

			`programOk(tl@TopLevel(seq)) :- {T}`
			`type_LineSeq(seq) == T,`
			`@tl.type := T.`

			`/** md`
			`Now to type a Seq, we look to the syntax, and see that there are two`
			`possibilities for what it might be: just an Ex, or a Sequence(_) of a`
			`list of 'Ex's. For the first, Statix does not allow one sort to simply`
			`"become" another, but the Spoofax infrastructure automatically inserts`
			`"injection" constructors for us, in this case one named Ex2Seq. So the`
			first rule for `type_Seq` is straightforward:

			```statix
			`type_Seq(s@Ex2Seq(e)) = T : -`
			`type_Ex(e) == T,`
			`@s.type := T.`
			```
			`where of course type_Ex needs its own declaration analogous to the above.`
			`**/`

			`type_Line : Line -> TYPE`

			`type_LineSeq(ls@Line2LineSeq(l)) = T :-`
			`type_Line(l) == T,`
			`@ls.type := T.`

			`/** md`

			The other (and in fact more typical) rule for `type_Seq`, when it actually
			`consists of a sequence of expressions, is a bit more involved. Fortunately`
			`Statix provides a primitive for mapping over a list, so we can proceed as`
			`follows:`
			```statix
			`types_Exs maps type_Ex(list()) = list()`
			`type_Seq(s@Sequence(l)) = T :- {lt}`
			`types_Exs(l) == lt,`
			`lastTYPE(lt) == T,`
			`@s.type := T.`
			```
			Here `lastTYPE` is a function that extracts the last TYPE from a list.
			`Unless/until Statix develops some sort of standard library, it must be`
			`hand-defined, as done in "statics/util.stx" like so:`
			```statix
			`{! ../statics/util.stx extract: {start: lastTYPE} !}`
			```
			`**/`

			`types_Lines maps type_Line(list()) = list()`

			`type_LineSeq(ls@Sequence(l)) = T :- {lt}`
			`types_Lines(l) == lt,`
			`lastTYPE(lt) == T,`
			`@ls.type := T.`

			`type_OptTermEx : OptTermEx -> TYPE`

			`type_Line(l@OptTermEx2Line(ote)) = T :-`
			`type_OptTermEx(ote) == T,`
			`@l.type := T.`

			`type_Ex : Ex -> TYPE`
			`type_TermEx : TermEx -> TYPE`

			`type_OptTermEx(ote@Ex2OptTermEx(e)) = T :-`
			`type_Ex(e) == T,`
			`@ote.type := T.`

			`type_OptTermEx(ote@TermEx2OptTermEx(te)) = T :-`
			`type_TermEx(te) == T,`
			`@ote.type := T.`

			`/** md`

			`This brings us to the syntax rules for the basic expressions themselves,`
			`which comprise almost all of the remaining fostr language constructs.`
			`But first a mechanism suggested by Ivo Wilms to avoid repeating the node`
			`type annotation in every rule:`
			```statix
			`**/`

			`/** md */`
			`ty_Ex : Ex -> TYPE`

			`type_Ex(e) = ty@ty_Ex(e) :-`
			`@e.type := ty.`
			`/* **/`

			`/** md`
			```
			`At this stage in fostr's development, there was no difference between a`
			`terminated and unterminated expression, so the typing rule for that`
			`constructor was trivial:`
			```statix
			`ty_Ex(Terminated(e)) = ty_Ex(e).`
			```
			`**/`

			`type_TermEx(te@Terminate(e)) = T :-`
			`type_Ex(e) == T,`
			`@te.type := T.`

			`/** md`

			`Now typing literals is straightforward:`
			```statix
feat: Add double-quoted string constants with escapes Resolves #20. 2021-02-19 03:41:54 +00:00			`{! "\git docs/statix_works:trans/statics.stx" extract:`
			`start: '(.ty_Ex.Int.\s*)'`
			`stop: '/. ../'`
			`!}`
			```
Add literal string constants (#19) In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: https://code.studioinfinity.org/glen/fostr/pulls/19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net> 2021-02-18 04:17:05 +00:00			`**/`

			`ty_Ex(Int(_)) = INT().`
			`ty_Ex(LitString(_)) = STRING().`
feat: Add double-quoted string constants with escapes Resolves #20. 2021-02-19 03:41:54 +00:00			`ty_Ex(EscString(_)) = STRING().`
Add literal string constants (#19) In addition, for the sake of Haskell code generation, this PR adds static typing with Statix. Resolves #5. Co-authored-by: Glen Whitney <glen@studioinfinity.org> Reviewed-on: https://code.studioinfinity.org/glen/fostr/pulls/19 Co-Authored-By: Glen Whitney <glen@nobody@nowhere.net> Co-Committed-By: Glen Whitney <glen@nobody@nowhere.net> 2021-02-18 04:17:05 +00:00			`ty_Ex(e@Stream()) = STREAM().`

			`/** md`

			`Finally we get to the binary operators, and here we use the pattern found in`
			`recent versions of the`
			`"[chicago](https://github.com/MetaBorgCube/statix-sandbox/tree/master/chicago)"`
			`example language and in the Fall 2020 TU-Delft class lecture on`
			`[Name Binding and Name Resolution](https://tudelft-cs4200-2020.github.io/lectures/2020/09/24/lecture5/).`
			`This pattern lets us specify error messages.`

			```statix
			`**/`

			`/** md */`
			`ty_Ex(Sum(e1, e2)) = INT() :-`
			`type_Ex(e1) == INT() \| error $[Expression [e1] not an Int in sum.]@e1,`
			`type_Ex(e2) == INT() \| error $[Expression [e2] not an Int in sum.]@e2.`

			`ty_Ex(Gets(e1, e2)) = STREAM() :- {T}`
			`type_Ex(e1) == STREAM() \| error $[Only Streams may receive items.]@e1,`
			`type_Ex(e2) == T.`

			`ty_Ex(To(e1, e2)) = T :-`
			`type_Ex(e1) == T,`
			`type_Ex(e2) == STREAM() \| error $[Items may only be sent to Streams.]@e2.`
			`/* **/`

			`/** md`
			```

			`### Using type annotations in transformation`

			`At this point, Statix properly types all of the valid programs of the very`
			`rudimentary language defined by the grammar above. But the proximate purpose`
			`for implementing this typing was to aid Haskell code generation. So how`
			`do we actually use the assigned types in a Stratego transformation?`

			`Statix provides a Stratego api that includes, among other items, strategies`
			`stx-get-ast-analysis` and `stx-get-ast-type(\|analysis)` that provide access
			`to the assigned types. However, it's easiest to use the information via`
			`a wrapper like this, essentially lifted from the "chicago" language project:`
			```stratego
			`{! analysis.str extract:`
			`start: Extract.the.type`
			`terminate: Prints.the.analyzed.type`
			`!}`
			```

			Now `get_type` run on a node of the analyzed AST produces the assigned `TYPE`
			`(as an ATerm in the constructors of sort TYPE in Statix).`

			`Thus, you can select on the assigned type, as in the strategy to select`
			`the correct Haskell operator to use to send an item to standard output:`
			```stratego
			`{! haskell.str extract:`
			`start: '(.hs_getOp.=.)'`
			`stop: \s`
			`!}`
			```
			`**/`
init: SDF3/Statix project as initialized by Spoofax 2021-01-30 20:27:10 +00:00
			`rules // multi-file entry point`

			`projectOk : scope`

			`projectOk(s).`

feat: Allow indented continuation lines Resolves #2. 2021-02-01 08:29:00 +00:00			`fileOk : scope * Start`
init: SDF3/Statix project as initialized by Spoofax 2021-01-30 20:27:10 +00:00
feat: Allow indented continuation lines Resolves #2. 2021-02-01 08:29:00 +00:00			`fileOk(s, TopLevel(_)).`