| | 1 | Tiger.ml |
| | 2 | ======== |
| | 3 | A Tiger-compiler implementation in (OCa)ML |
| | 4 | |
| | 5 | Status |
| | 6 | ------ |
| | 7 | |
| | 8 |  |
| | 9 | ... |
| | 10 |  |
| | 11 | |
| | 12 | ### Features |
| | 13 | #### Done |
| | 14 | - [x] ch 1: Warm-up AST |
| | 15 | - [x] ch 2: Lexer |
| | 16 | - [x] ch 3: Parser |
| | 17 | - [x] ch 4: AST |
| | 18 | - [x] ch 5: Semantic Analysis (type checking) |
| | 19 | - [x] ch 6: Activation Records |
| | 20 | #### In-progress |
| | 21 | - [ ] ch 7: Translation to Intermediate Code |
| | 22 | #### TODO (short-term) |
| | 23 | - [ ] ch 08: Basic Blocks and Traces |
| | 24 | - [ ] ch 09: Instruction Selection |
| | 25 | - [ ] ch 10: Liveness Analysis |
| | 26 | - [ ] ch 11: Register Allocation |
| | 27 | - [ ] ch 12: Putting It All Together |
| | 28 | #### TODO (long-term) |
| | 29 | - [ ] ch 13: Garbage Collection |
| | 30 | - [ ] ch 15: Functional Programming Languages |
| | 31 | - [ ] ch 16: Polymorphic Types |
| | 32 | - [ ] ch 17: Dataflow Analysis |
| | 33 | - [ ] ch 18: Loop Optimizations |
| | 34 | - [ ] ch 19: Static Single-Assignment Form |
| | 35 | - [ ] ch 20: Pipelining and Scheduling |
| | 36 | - [ ] ch 21: The Memory Hierarchy |
| | 37 | #### Maybe |
| | 38 | - [ ] ch 14: Object-Oriented Languages |
| | 39 | |
| | 40 | ### Technical issues |
| | 41 | - [-] testing framework |
| | 42 | - [x] run arbitrary code snippets |
| | 43 | - [x] check non-failures |
| | 44 | - [x] check expected output |
| | 45 | - [-] check expected exceptions |
| | 46 | - [x] semant stage |
| | 47 | - [ ] generalized expect `Output ('a option) | Exception of (exn -> bool)` |
| | 48 | - [x] run all book test case files |
| | 49 | - [-] grid view (cols: lex, pars, semant, etc.; rows: test cases.) |
| | 50 | - [x] implementation |
| | 51 | - [ ] refactoring |
| | 52 | - [ ] test time-outs (motive: cycle non-detection caused an infinite loop) |
| | 53 | - [ ] parallel test execution |
| | 54 | - [ ] Travis CI |
| | 55 | |
| | 56 | Implementation Notes |
| | 57 | -------------------- |
| | 58 | |
| | 59 | ### Parser |
| | 60 | |
| | 61 | #### shift/reduce conflicts |
| | 62 | ##### grouping consecutive declarations |
| | 63 | In order to support mutual recursion, we need to group consecutive |
| | 64 | type and function declarations (see Tiger-book pages 97-99). |
| | 65 | |
| | 66 | Initially, I defined the rules to do so as: |
| | 67 | |
| | 68 | decs: |
| | 69 | | dec { $1 :: [] } |
| | 70 | | dec decs { $1 :: $2 } |
| | 71 | ; |
| | 72 | dec: |
| | 73 | | var_dec { $1 } |
| | 74 | | typ_decs { Ast.TypeDecs $1 } |
| | 75 | | fun_decs { Ast.FunDecs $1 } |
| | 76 | ; |
| | 77 | |
| | 78 | which, while straightforward (and working, because `ocamlyacc` defaults to |
| | 79 | shift in case of a conflict), nonetheless caused a shift/reduce conflict in |
| | 80 | each of: `typ_decs` and `fun_decs`; where the parser did not know whether to |
| | 81 | shift and stay in `(typ|fun_)_dec` state or to reduce and get back to `dec` |
| | 82 | state. |
| | 83 | |
| | 84 | Sadly, tagging the rules with a lower precedence (to explicitly favor |
| | 85 | shifting) - does not help :( |
| | 86 | |
| | 87 | %nonassoc LOWEST |
| | 88 | ... |
| | 89 | dec: |
| | 90 | | var_dec { $1 } |
| | 91 | | typ_decs %prec LOWEST { Ast.TypeDecs $1 } |
| | 92 | | fun_decs %prec LOWEST { Ast.FunDecs $1 } |
| | 93 | ; |
| | 94 | |
| | 95 | The difficulty seems to be in the lack of a separator token which would be |
| | 96 | able to definitively mark the end of each sequence of consecutive |
| | 97 | `(typ_|fun_)` declarations. |
| | 98 | |
| | 99 | Keeping this in mind, another alternative is to manually capture the possible |
| | 100 | interspersion patterns in the rules like: |
| | 101 | |
| | 102 | (N * foo) followed-by (N * not-foo) |
| | 103 | |
| | 104 | for the exception of `var_dec`, which, since we do not need to group its |
| | 105 | consecutive sequences, can be reduced upon first sighting. |
| | 106 | |
| | 107 | The final rules I ended-up with are: |
| | 108 | |
| | 109 | decs: |
| | 110 | | var_dec decs_any { $1 :: $2 } |
| | 111 | | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } |
| | 112 | | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } |
| | 113 | ; |
| | 114 | |
| | 115 | decs_any: |
| | 116 | | { [] } |
| | 117 | | var_dec decs_any { $1 :: $2 } |
| | 118 | | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } |
| | 119 | | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } |
| | 120 | ; |
| | 121 | |
| | 122 | decs_any_but_fun: |
| | 123 | | { [] } |
| | 124 | | var_dec decs_any { $1 :: $2 } |
| | 125 | | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } |
| | 126 | ; |
| | 127 | |
| | 128 | decs_any_but_typ: |
| | 129 | | { [] } |
| | 130 | | var_dec decs_any { $1 :: $2 } |
| | 131 | | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } |
| | 132 | ; |
| | 133 | |
| | 134 | ##### lval |
| | 135 | |
| | 136 | ### AST |
| | 137 | |
| | 138 | #### print as M-exp |
| | 139 | |
| | 140 | I chose to pretty-print AST as an (indented) |
| | 141 | [M-expression](https://en.wikipedia.org/wiki/M-expression) - an underrated |
| | 142 | format, used in Mathematica and was intended for Lisp by McCarthy himself; it |
| | 143 | is nearly as flexible as S-expressions, but significantly more readable (IMO). |
| | 144 | |
| | 145 | As an example, here is what `test28.tig` looks like after parsing and |
| | 146 | pretty-printing: |
| | 147 | |
| | 148 | LetExp[ |
| | 149 | [ |
| | 150 | TypeDecs[ |
| | 151 | TypeDec[ |
| | 152 | arrtype1, |
| | 153 | ArrayTy[ |
| | 154 | int]], |
| | 155 | TypeDec[ |
| | 156 | arrtype2, |
| | 157 | ArrayTy[ |
| | 158 | int]]], |
| | 159 | VarDec[ |
| | 160 | arr1, |
| | 161 | arrtype1, |
| | 162 | ArrayExp[ |
| | 163 | arrtype2, |
| | 164 | IntExp[ |
| | 165 | 10], |
| | 166 | IntExp[ |
| | 167 | 0]]]], |
| | 168 | SeqExp[ |
| | 169 | VarExp[ |
| | 170 | SimpleVar[ |
| | 171 | arr1]]]] |
| | 172 | |
| | 173 | ### Machine |
| | 174 | Will most-likely compile to RISC and execute using SPIM (as favored by Appel) |