README.md

   1 Tiger.ml
   2 ========
   3 A Tiger-compiler implementation in (OCa)ML
   4
   5 Status
   6 ------
   7
   8 ![screenshot-tests-head](screenshots/tests-head.jpg)
   9 ...
  10 ![screenshot-tests-tail](screenshots/tests-tail.jpg)
  11
  12 ### Features
  13 #### Done
  14 - [x] ch 1: Warm-up AST
  15 - [x] ch 2: Lexer
  16 - [x] ch 3: Parser
  17 - [x] ch 4: AST
  18 - [x] ch 5: Semantic Analysis (type checking)
  19 #### In-progress
  20 - [ ] ch 6: Activation Records
  21 #### TODO (short-term)
  22 - [ ] ch 7: Translation to Intermediate Code
  23 - [ ] ch 08: Basic Blocks and Traces
  24 - [ ] ch 09: Instruction Selection
  25 - [ ] ch 10: Liveness Analysis
  26 - [ ] ch 11: Register Allocation
  27 - [ ] ch 12: Putting It All Together
  28 #### TODO (long-term)
  29 - [ ] ch 13: Garbage Collection
  30 - [ ] ch 15: Functional Programming Languages
  31 - [ ] ch 16: Polymorphic Types
  32 - [ ] ch 17: Dataflow Analysis
  33 - [ ] ch 18: Loop Optimizations
  34 - [ ] ch 19: Static Single-Assignment Form
  35 - [ ] ch 20: Pipelining and Scheduling
  36 - [ ] ch 21: The Memory Hierarchy
  37 #### Maybe
  38 - [ ] ch 14: Object-Oriented Languages
  39
  40 ### Technical issues
  41 - [-] testing framework
  42   - [x] run arbitrary code snippets
  43   - [x] check non-failures
  44   - [x] check expected output
  45   - [-] check expected exceptions
  46     - [x] semant stage
  47     - [ ] generalized expect `Output ('a option) | Exception of (exn -> bool)`
  48   - [x] run all book test case files
  49   - [-] grid view (cols: lex, pars, semant, etc.; rows: test cases.)
  50     - [x] implementation
  51     - [ ] refactoring
  52   - [ ] test time-outs (motive: cycle non-detection caused an infinite loop)
  53     - [ ] parallel test execution
  54 - [ ] Travis CI
  55
  56 Implementation Notes
  57 --------------------
  58
  59 ### Parser
  60
  61 #### shift/reduce conflicts
  62 ##### grouping consecutive declarations
  63 In order to support mutual recursion, we need to group consecutive
  64 type and function declarations (see Tiger-book pages 97-99).
  65
  66 Initially, I defined the rules to do so as:
  67
  68     decs:
  69       | dec      { $1 :: [] }
  70       | dec decs { $1 :: $2 }
  71       ;
  72     dec:
  73       | var_dec  { $1 }
  74       | typ_decs { Ast.TypeDecs $1 }
  75       | fun_decs { Ast.FunDecs $1 }
  76       ;
  77
  78 which, while straightforward (and working, because `ocamlyacc` defaults to
  79 shift in case of a conflict), nonetheless caused a shift/reduce conflict in
  80 each of: `typ_decs` and `fun_decs`; where the parser did not know whether to
  81 shift and stay in `(typ|fun_)_dec` state or to reduce and get back to `dec`
  82 state.
  83
  84 Sadly, tagging the rules with a lower precedence (to explicitly favor
  85 shifting) - does not help :(
  86
  87     %nonassoc LOWEST
  88     ...
  89     dec:
  90       | var_dec                { $1 }
  91       | typ_decs  %prec LOWEST { Ast.TypeDecs $1 }
  92       | fun_decs  %prec LOWEST { Ast.FunDecs $1 }
  93       ;
  94
  95 The difficulty seems to be in the lack of a separator token which would be
  96 able to definitively mark the end of each sequence of consecutive
  97 `(typ_|fun_)` declarations.
  98
  99 Keeping this in mind, another alternative is to manually capture the possible
 100 interspersion patterns in the rules like:
 101
 102     (N * foo) followed-by (N * not-foo)
 103
 104 for the exception of `var_dec`, which, since we do not need to group its
 105 consecutive sequences, can be reduced upon first sighting.
 106
 107 ##### lval
 108
 109 ### AST
 110
 111 #### print as M-exp
 112
 113 I chose to pretty-print AST as an (indented)
 114 [M-expression](https://en.wikipedia.org/wiki/M-expression) - an underrated
 115 format, used in Mathematica and was intended for Lisp by McCarthy himself; it
 116 is nearly as flexible as S-expressions, but significantly more readable (IMO).
 117
 118 As an example, here is what `test28.tig` looks like after parsing and
 119 pretty-printing:
 120
 121     LetExp[
 122         [
 123         TypeDecs[
 124             TypeDec[
 125                 arrtype1,
 126                 ArrayTy[
 127                 int]],
 128             TypeDec[
 129                 arrtype2,
 130                 ArrayTy[
 131                 int]]],
 132         VarDec[
 133             arr1,
 134             arrtype1,
 135             ArrayExp[
 136                 arrtype2,
 137                 IntExp[
 138                     10],
 139                 IntExp[
 140                     0]]]],
 141         SeqExp[
 142             VarExp[
 143                 SimpleVar[
 144                     arr1]]]]
 145
 146 ### Machine
 147 Will most-likely compile to RISC and execute using SPIM (as favored by Appel)