README.md

   1 Tiger.ml
   2 ========
   3 A Tiger-compiler implementation in (OCa)ML
   4
   5 Status
   6 ------
   7
   8 ![screenshot-tests-head](screenshots/tests-head.jpg)
   9 ...
  10 ![screenshot-tests-tail](screenshots/tests-tail.jpg)
  11
  12 ### Features
  13 #### Done
  14 - [x] ch 1: Warm-up AST
  15 - [x] ch 2: Lexer
  16 - [x] ch 3: Parser
  17 - [x] ch 4: AST
  18 - [x] ch 5: Semantic Analysis (type checking)
  19 - [x] ch 6: Activation Records
  20 #### In-progress
  21 - [ ] ch 7: Translation to Intermediate Code
  22 #### TODO (short-term)
  23 - [ ] ch 08: Basic Blocks and Traces
  24 - [ ] ch 09: Instruction Selection
  25 - [ ] ch 10: Liveness Analysis
  26 - [ ] ch 11: Register Allocation
  27 - [ ] ch 12: Putting It All Together
  28 #### TODO (long-term)
  29 - [ ] ch 13: Garbage Collection
  30 - [ ] ch 15: Functional Programming Languages
  31 - [ ] ch 16: Polymorphic Types
  32 - [ ] ch 17: Dataflow Analysis
  33 - [ ] ch 18: Loop Optimizations
  34 - [ ] ch 19: Static Single-Assignment Form
  35 - [ ] ch 20: Pipelining and Scheduling
  36 - [ ] ch 21: The Memory Hierarchy
  37 #### Maybe
  38 - [ ] ch 14: Object-Oriented Languages
  39
  40 ### Technical issues
  41 - [-] testing framework
  42   - [x] run arbitrary code snippets
  43   - [x] check non-failures
  44   - [x] check expected output
  45   - [-] check expected exceptions
  46     - [x] semant stage
  47     - [ ] generalized expect `Output ('a option) | Exception of (exn -> bool)`
  48   - [x] run all book test case files
  49   - [-] grid view (cols: lex, pars, semant, etc.; rows: test cases.)
  50     - [x] implementation
  51     - [ ] refactoring
  52   - [ ] test time-outs (motive: cycle non-detection caused an infinite loop)
  53     - [ ] parallel test execution
  54 - [ ] Travis CI
  55
  56 Implementation Notes
  57 --------------------
  58
  59 ### Parser
  60
  61 #### shift/reduce conflicts
  62 ##### grouping consecutive declarations
  63 In order to support mutual recursion, we need to group consecutive
  64 type and function declarations (see Tiger-book pages 97-99).
  65
  66 Initially, I defined the rules to do so as:
  67
  68     decs:
  69       | dec      { $1 :: [] }
  70       | dec decs { $1 :: $2 }
  71       ;
  72     dec:
  73       | var_dec  { $1 }
  74       | typ_decs { Ast.TypeDecs $1 }
  75       | fun_decs { Ast.FunDecs $1 }
  76       ;
  77
  78 which, while straightforward (and working, because `ocamlyacc` defaults to
  79 shift in case of a conflict), nonetheless caused a shift/reduce conflict in
  80 each of: `typ_decs` and `fun_decs`; where the parser did not know whether to
  81 shift and stay in `(typ|fun_)_dec` state or to reduce and get back to `dec`
  82 state.
  83
  84 Sadly, tagging the rules with a lower precedence (to explicitly favor
  85 shifting) - does not help :(
  86
  87     %nonassoc LOWEST
  88     ...
  89     dec:
  90       | var_dec                { $1 }
  91       | typ_decs  %prec LOWEST { Ast.TypeDecs $1 }
  92       | fun_decs  %prec LOWEST { Ast.FunDecs $1 }
  93       ;
  94
  95 The difficulty seems to be in the lack of a separator token which would be
  96 able to definitively mark the end of each sequence of consecutive
  97 `(typ_|fun_)` declarations.
  98
  99 Keeping this in mind, another alternative is to manually capture the possible
 100 interspersion patterns in the rules like:
 101
 102     (N * foo) followed-by (N * not-foo)
 103
 104 for the exception of `var_dec`, which, since we do not need to group its
 105 consecutive sequences, can be reduced upon first sighting.
 106
 107 The final rules I ended-up with are:
 108
 109     decs:
 110       | var_dec   decs_any          { $1 :: $2 }
 111       | fun_decs  decs_any_but_fun  { (Ast.FunDecs  $1) :: $2 }
 112       | typ_decs  decs_any_but_typ  { (Ast.TypeDecs $1) :: $2 }
 113       ;
 114
 115     decs_any:
 116       |                             { [] }
 117       | var_dec   decs_any          { $1 :: $2 }
 118       | fun_decs  decs_any_but_fun  { (Ast.FunDecs  $1) :: $2 }
 119       | typ_decs  decs_any_but_typ  { (Ast.TypeDecs $1) :: $2 }
 120       ;
 121
 122     decs_any_but_fun:
 123       |                             { [] }
 124       | var_dec   decs_any          { $1 :: $2 }
 125       | typ_decs  decs_any_but_typ  { (Ast.TypeDecs $1) :: $2 }
 126       ;
 127
 128     decs_any_but_typ:
 129       |                             { [] }
 130       | var_dec   decs_any          { $1 :: $2 }
 131       | fun_decs  decs_any_but_fun  { (Ast.FunDecs $1) :: $2 }
 132       ;
 133
 134 ##### lval
 135
 136 ### AST
 137
 138 #### print as M-exp
 139
 140 I chose to pretty-print AST as an (indented)
 141 [M-expression](https://en.wikipedia.org/wiki/M-expression) - an underrated
 142 format, used in Mathematica and was intended for Lisp by McCarthy himself; it
 143 is nearly as flexible as S-expressions, but significantly more readable (IMO).
 144
 145 As an example, here is what `test28.tig` looks like after parsing and
 146 pretty-printing:
 147
 148     LetExp[
 149         [
 150         TypeDecs[
 151             TypeDec[
 152                 arrtype1,
 153                 ArrayTy[
 154                 int]],
 155             TypeDec[
 156                 arrtype2,
 157                 ArrayTy[
 158                 int]]],
 159         VarDec[
 160             arr1,
 161             arrtype1,
 162             ArrayExp[
 163                 arrtype2,
 164                 IntExp[
 165                     10],
 166                 IntExp[
 167                     0]]]],
 168         SeqExp[
 169             VarExp[
 170                 SimpleVar[
 171                     arr1]]]]
 172
 173 ### Machine
 174 Will most-likely compile to RISC and execute using SPIM (as favored by Appel)