2021-01-30 20:27:10 +00:00
|
|
|
module statics
|
|
|
|
|
2021-01-30 23:37:53 +00:00
|
|
|
imports signatures/fostr-sig
|
2021-02-18 04:17:05 +00:00
|
|
|
imports signature/TYPE
|
|
|
|
imports statics/util
|
2021-01-30 23:37:53 +00:00
|
|
|
|
2021-02-18 04:17:05 +00:00
|
|
|
/** md
|
|
|
|
Title: Adding Program Analysis with Statix
|
|
|
|
|
|
|
|
## Development of fostr static analysis
|
|
|
|
|
|
|
|
This section is more documentation of Spoofax in general and Statix
|
|
|
|
in particular than of fostr itself, but is being maintained here in case
|
|
|
|
it could be either helpful to someone getting started with Statix or
|
|
|
|
helpful in understanding how the static characteristics of fostr were designed.
|
|
|
|
|
|
|
|
As mentioned in the [Overview](../README.md), I don't like to program and a
|
|
|
|
corollary of that is never to use a facility unless/until there's a need for
|
|
|
|
it. So the first few rudimentary passes at fostr simply declared every program
|
|
|
|
to be "OK" from the point of view of Statix:
|
|
|
|
```statix
|
|
|
|
{! "\git docs/statix_start:trans/statics.stx" extract:
|
|
|
|
start: programOk
|
|
|
|
stop: (.*TopLevel.*)
|
|
|
|
!}
|
|
|
|
```
|
|
|
|
|
|
|
|
Then I reached the point at which the grammar was basically just
|
|
|
|
```SDF3
|
|
|
|
// Start.TopLevel = <Seq>
|
|
|
|
// Seq = <Ex>
|
|
|
|
// Seq.Sequence = sq:Ex+ {layout(align-list sq)}
|
|
|
|
// Ex.Terminated = <<Ex>;>
|
|
|
|
{! "\git docs/statix_start:syntax/fostr.sdf3" extract:
|
|
|
|
start: TermEx.Terminate
|
|
|
|
stop: (.*bracket.*)
|
|
|
|
!}
|
|
|
|
```
|
|
|
|
(The first four clauses are in comments because they approximate fostr's
|
|
|
|
grammar; it actually uses a few more sorts for sequences of
|
|
|
|
expressions, to achieve fostr's exact layout rules. Also note that the parsing
|
|
|
|
of literal strings later evolved to include the surrounding single quotes,
|
|
|
|
because the rule above implicitly allows layout between the quotes and the
|
|
|
|
string contents, creating ambiguity.)
|
|
|
|
|
|
|
|
This was the first point at which there were two different types that might
|
|
|
|
need to be written to standard output (Int and String), and although of course
|
|
|
|
the dynamically-typed Python and Javascript code generated dealt with both fine,
|
|
|
|
the Haskell code needed to differ depending on the
|
|
|
|
type of the item written (and I hadn't even started OCaml code generation at
|
|
|
|
that point since I knew it would be hopeless without statically typing fostr
|
|
|
|
programs).
|
|
|
|
|
|
|
|
So it was time to bite the bullet and add type checking via Statix to fostr.
|
|
|
|
The first step was to replace the simple assertion that any TopLevel
|
|
|
|
is OK with a constraint that its Seq must type properly, and an assignment of
|
|
|
|
that type to the top level node:
|
|
|
|
```statix
|
|
|
|
programOk(tl@TopLevel(seq)) :- {T}
|
|
|
|
type_Seq(seq) == T,
|
|
|
|
@tl.type := T.
|
|
|
|
```
|
|
|
|
Of course, for this to even parse, we must have a definition of `type_Seq`:
|
|
|
|
```statix
|
|
|
|
{! ../signature/TYPE.stx extract: {start: module, stop: rules} !}
|
|
|
|
**/
|
|
|
|
|
|
|
|
// see docs/implementation.md for detail on how to switch to multi-file analysis
|
2021-01-30 20:27:10 +00:00
|
|
|
|
|
|
|
rules // single-file entry point
|
|
|
|
|
2021-02-01 08:29:00 +00:00
|
|
|
programOk : Start
|
2021-01-30 20:27:10 +00:00
|
|
|
|
2021-02-18 04:17:05 +00:00
|
|
|
/** md
|
|
|
|
rules
|
|
|
|
type_Seq : Seq -> TYPE
|
|
|
|
```
|
|
|
|
**/
|
|
|
|
|
|
|
|
type_LineSeq : LineSeq -> TYPE
|
|
|
|
|
|
|
|
programOk(tl@TopLevel(seq)) :- {T}
|
|
|
|
type_LineSeq(seq) == T,
|
|
|
|
@tl.type := T.
|
|
|
|
|
|
|
|
/** md
|
|
|
|
Now to type a Seq, we look to the syntax, and see that there are two
|
|
|
|
possibilities for what it might be: just an Ex, or a Sequence(_) of a
|
|
|
|
list of 'Ex's. For the first, Statix does not allow one sort to simply
|
|
|
|
"become" another, but the Spoofax infrastructure automatically inserts
|
|
|
|
"injection" constructors for us, in this case one named Ex2Seq. So the
|
|
|
|
first rule for `type_Seq` is straightforward:
|
|
|
|
|
|
|
|
```statix
|
|
|
|
type_Seq(s@Ex2Seq(e)) = T : -
|
|
|
|
type_Ex(e) == T,
|
|
|
|
@s.type := T.
|
|
|
|
```
|
|
|
|
where of course type_Ex needs its own declaration analogous to the above.
|
|
|
|
**/
|
|
|
|
|
|
|
|
type_Line : Line -> TYPE
|
|
|
|
|
|
|
|
type_LineSeq(ls@Line2LineSeq(l)) = T :-
|
|
|
|
type_Line(l) == T,
|
|
|
|
@ls.type := T.
|
|
|
|
|
|
|
|
/** md
|
|
|
|
|
|
|
|
The other (and in fact more typical) rule for `type_Seq`, when it actually
|
|
|
|
consists of a sequence of expressions, is a bit more involved. Fortunately
|
|
|
|
Statix provides a primitive for mapping over a list, so we can proceed as
|
|
|
|
follows:
|
|
|
|
```statix
|
|
|
|
types_Exs maps type_Ex(list(*)) = list(*)
|
|
|
|
type_Seq(s@Sequence(l)) = T :- {lt}
|
|
|
|
types_Exs(l) == lt,
|
|
|
|
lastTYPE(lt) == T,
|
|
|
|
@s.type := T.
|
|
|
|
```
|
|
|
|
Here `lastTYPE` is a function that extracts the last TYPE from a list.
|
|
|
|
Unless/until Statix develops some sort of standard library, it must be
|
|
|
|
hand-defined, as done in "statics/util.stx" like so:
|
|
|
|
```statix
|
|
|
|
{! ../statics/util.stx extract: {start: lastTYPE} !}
|
|
|
|
```
|
|
|
|
**/
|
|
|
|
|
|
|
|
types_Lines maps type_Line(list(*)) = list(*)
|
|
|
|
|
|
|
|
type_LineSeq(ls@Sequence(l)) = T :- {lt}
|
|
|
|
types_Lines(l) == lt,
|
|
|
|
lastTYPE(lt) == T,
|
|
|
|
@ls.type := T.
|
|
|
|
|
|
|
|
type_OptTermEx : OptTermEx -> TYPE
|
|
|
|
|
|
|
|
type_Line(l@OptTermEx2Line(ote)) = T :-
|
|
|
|
type_OptTermEx(ote) == T,
|
|
|
|
@l.type := T.
|
|
|
|
|
|
|
|
type_Ex : Ex -> TYPE
|
|
|
|
type_TermEx : TermEx -> TYPE
|
|
|
|
|
|
|
|
type_OptTermEx(ote@Ex2OptTermEx(e)) = T :-
|
|
|
|
type_Ex(e) == T,
|
|
|
|
@ote.type := T.
|
|
|
|
|
|
|
|
type_OptTermEx(ote@TermEx2OptTermEx(te)) = T :-
|
|
|
|
type_TermEx(te) == T,
|
|
|
|
@ote.type := T.
|
|
|
|
|
|
|
|
/** md
|
|
|
|
|
|
|
|
This brings us to the syntax rules for the basic expressions themselves,
|
|
|
|
which comprise almost all of the remaining fostr language constructs.
|
|
|
|
But first a mechanism suggested by Ivo Wilms to avoid repeating the node
|
|
|
|
type annotation in every rule:
|
|
|
|
```statix
|
|
|
|
**/
|
|
|
|
|
|
|
|
/** md */
|
|
|
|
ty_Ex : Ex -> TYPE
|
|
|
|
|
|
|
|
type_Ex(e) = ty@ty_Ex(e) :-
|
|
|
|
@e.type := ty.
|
|
|
|
/* **/
|
|
|
|
|
|
|
|
/** md
|
|
|
|
```
|
|
|
|
At this stage in fostr's development, there was no difference between a
|
|
|
|
terminated and unterminated expression, so the typing rule for that
|
|
|
|
constructor was trivial:
|
|
|
|
```statix
|
|
|
|
ty_Ex(Terminated(e)) = ty_Ex(e).
|
|
|
|
```
|
|
|
|
**/
|
|
|
|
|
|
|
|
type_TermEx(te@Terminate(e)) = T :-
|
|
|
|
type_Ex(e) == T,
|
|
|
|
@te.type := T.
|
|
|
|
|
|
|
|
/** md
|
|
|
|
|
|
|
|
Now typing literals is straightforward:
|
|
|
|
```statix
|
2021-02-19 03:41:54 +00:00
|
|
|
{! "\git docs/statix_works:trans/statics.stx" extract:
|
|
|
|
start: '(.*ty_Ex.Int.*\s*)'
|
|
|
|
stop: '/. ../'
|
|
|
|
!}
|
|
|
|
```
|
2021-02-18 04:17:05 +00:00
|
|
|
**/
|
|
|
|
|
|
|
|
ty_Ex(Int(_)) = INT().
|
|
|
|
ty_Ex(LitString(_)) = STRING().
|
2021-02-19 03:41:54 +00:00
|
|
|
ty_Ex(EscString(_)) = STRING().
|
2021-02-18 04:17:05 +00:00
|
|
|
ty_Ex(e@Stream()) = STREAM().
|
|
|
|
|
|
|
|
/** md
|
|
|
|
|
|
|
|
Finally we get to the binary operators, and here we use the pattern found in
|
|
|
|
recent versions of the
|
|
|
|
"[chicago](https://github.com/MetaBorgCube/statix-sandbox/tree/master/chicago)"
|
|
|
|
example language and in the Fall 2020 TU-Delft class lecture on
|
|
|
|
[Name Binding and Name Resolution](https://tudelft-cs4200-2020.github.io/lectures/2020/09/24/lecture5/).
|
|
|
|
This pattern lets us specify error messages.
|
|
|
|
|
|
|
|
```statix
|
|
|
|
**/
|
|
|
|
|
|
|
|
/** md */
|
|
|
|
ty_Ex(Sum(e1, e2)) = INT() :-
|
|
|
|
type_Ex(e1) == INT() | error $[Expression [e1] not an Int in sum.]@e1,
|
|
|
|
type_Ex(e2) == INT() | error $[Expression [e2] not an Int in sum.]@e2.
|
|
|
|
|
|
|
|
ty_Ex(Gets(e1, e2)) = STREAM() :- {T}
|
|
|
|
type_Ex(e1) == STREAM() | error $[Only Streams may receive items.]@e1,
|
|
|
|
type_Ex(e2) == T.
|
|
|
|
|
|
|
|
ty_Ex(To(e1, e2)) = T :-
|
|
|
|
type_Ex(e1) == T,
|
|
|
|
type_Ex(e2) == STREAM() | error $[Items may only be sent to Streams.]@e2.
|
|
|
|
/* **/
|
|
|
|
|
|
|
|
/** md
|
|
|
|
```
|
|
|
|
|
|
|
|
### Using type annotations in transformation
|
|
|
|
|
|
|
|
At this point, Statix properly types all of the valid programs of the very
|
|
|
|
rudimentary language defined by the grammar above. But the proximate purpose
|
|
|
|
for implementing this typing was to aid Haskell code generation. So how
|
|
|
|
do we actually use the assigned types in a Stratego transformation?
|
|
|
|
|
|
|
|
Statix provides a Stratego api that includes, among other items, strategies
|
|
|
|
`stx-get-ast-analysis` and `stx-get-ast-type(|analysis)` that provide access
|
|
|
|
to the assigned types. However, it's easiest to use the information via
|
|
|
|
a wrapper like this, essentially lifted from the "chicago" language project:
|
|
|
|
```stratego
|
|
|
|
{! analysis.str extract:
|
|
|
|
start: Extract.the.type
|
|
|
|
terminate: Prints.the.analyzed.type
|
|
|
|
!}
|
|
|
|
```
|
|
|
|
|
|
|
|
Now `get_type` run on a node of the analyzed AST produces the assigned `TYPE`
|
|
|
|
(as an ATerm in the constructors of sort TYPE in Statix).
|
|
|
|
|
|
|
|
Thus, you can select on the assigned type, as in the strategy to select
|
|
|
|
the correct Haskell operator to use to send an item to standard output:
|
|
|
|
```stratego
|
|
|
|
{! haskell.str extract:
|
|
|
|
start: '(.*hs_getOp.=.*)'
|
|
|
|
stop: \s
|
|
|
|
!}
|
|
|
|
```
|
|
|
|
**/
|
2021-01-30 20:27:10 +00:00
|
|
|
|
|
|
|
rules // multi-file entry point
|
|
|
|
|
|
|
|
projectOk : scope
|
|
|
|
|
|
|
|
projectOk(s).
|
|
|
|
|
2021-02-01 08:29:00 +00:00
|
|
|
fileOk : scope * Start
|
2021-01-30 20:27:10 +00:00
|
|
|
|
2021-02-01 08:29:00 +00:00
|
|
|
fileOk(s, TopLevel(_)).
|