X-Git-Url: https://git.xandkar.net/?p=tiger.ml.git;a=blobdiff_plain;f=README.md;h=6127cf991010a65848ce5f8173d98db7a4512445;hp=f711a2ead1b38b22f730a3b08639099274c2d8db;hb=HEAD;hpb=ba1409d14bb13a8a0d63246fa0323e0dc4a801aa diff --git a/README.md b/README.md index f711a2e..6127cf9 100644 --- a/README.md +++ b/README.md @@ -1,83 +1,174 @@ Tiger.ml ======== +A Tiger-compiler implementation in (OCa)ML -Project Plan ------------- - -| status | id | title | pages | estimate | actual | start | finish | -|--------|------------|------------------------------------------|-------|----------|--------|------------|------------| -| [-] | 0 | Modern Compiler Implementation in ML | 512 | 28-84 | -- | 2018-04-16 | xxxx-xx-xx | -| ====== | ========== | ======================================== | ===== | ======== | ====== | ========== | ========== | -| [-] | 1 | - Fundamentals of Compilation | 264 | 14 | -- | 2018-04-16 | ---------- | -| [-] | 1.01 | -- Introduction | 011 | 01 | -- | 2018-04-16 | ---------- | -| [x] | 1.01.1 | --- Modules and interfaces | 001 | -- | -- | 2018-04-16 | ---------- | -| [x] | 1.01.2 | --- Tools and Software | 002 | -- | -- | 2018-04-16 | ---------- | -| [x] | 1.01.3 | --- Data structures for tree languages | 003 | -- | -- | 2018-04-16 | ---------- | -| [x] | 1.01.p | --- Program | 002 | -- | -- | 2018-04-16 | ---------- | -| [x] | 1.01.p.1 | ---- interpreter: maxargs | --- | -- | -- | 2018-04-17 | 2018-04-17 | -| [x] | 1.01.p.2 | ---- interpreter: interp | --- | -- | -- | 2018-04-17 | 2018-04-17 | -| [ ] | 1.01.e | --- Exercises | 002 | -- | -- | ---------- | ---------- | -| [ ] | 1.01.e.1.a | ---- tree member | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.01.e.1.b | ---- tree key/val | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.01.e.1.c | ---- demo unbalanced behaviour | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.01.e.1.d | ---- find functional balanced tree | --- | -- | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.02 | -- Lexical Analysis | 024 | 01 | -- | ---------- | ---------- | -| [ ] | 1.02.1 | --- Lexical tokens | 001 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.2 | --- Regular expressions | 003 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.3 | --- Finite automata | 003 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.4 | --- Nondeterministic finite automata | 006 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.5 | --- ML-Lex: a lexical analyzer generator | 003 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.p | --- Program | 002 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.p.1 | ---- Tiger lexer | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e | --- Exercises | 004 | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.01 | ---- regexes | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.02 | ---- why no regexes | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.03 | ---- explain automata | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.04 | ---- regex to nondeterministic automata | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.05 | ---- NFA to DFA | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.06 | ---- merge equivalent automata states | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.07 | ---- DFA to regex | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.08 | ---- analyze lexer based on given DFA | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.09 | ---- generate lexer tables from spec | --- | -- | -- | ---------- | ---------- | -| [ ] | 1.02.e.10 | ---- design better lookahead than Aho | --- | -- | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.03 | -- Parsing | 049 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.04 | -- Abstract Syntax | 016 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.05 | -- Semantic Analysis | 021 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.06 | -- Activation Records | 024 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.07 | -- Translation to Intermediate Code | 025 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.08 | -- Basic Blocks and Traces | 013 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.09 | -- Instruction Selection | 025 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.10 | -- Liveness Analysis | 017 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.11 | -- Register Allocation | 030 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 1.12 | -- Putting It | 009 | 01 | -- | ---------- | ---------- | -| ====== | ========== | ======================================== | ===== | ======== | ====== | ---------- | ---------- | -| [ ] | 2 | - Advanced Topics | 245 | 14 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.13 | -- Garbage Collection | 026 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.14 | -- Object-Oriented Languages | 016 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.15 | -- Functional Programming Languages | 035 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.16 | -- Polymorphic Types | 033 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.17 | -- Dataflow Analysis | 027 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.18 | -- Loop Optimizations | 023 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.19 | -- Static Single-Assignment Form | 041 | 02 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.20 | -- Pipelining and Scheduling | 024 | 01 | -- | ---------- | ---------- | -| ------ | ---------- | ---------------------------------------- | ----- | -------- | ------ | ---------- | ---------- | -| [ ] | 2.21 | -- The Memory Hierarchy | 020 | 01 | -- | ---------- | ---------- | +Status +------ + +![screenshot-tests-head](screenshots/tests-head.jpg) +... +![screenshot-tests-tail](screenshots/tests-tail.jpg) + +### Features +#### Done +- [x] ch 1: Warm-up AST +- [x] ch 2: Lexer +- [x] ch 3: Parser +- [x] ch 4: AST +- [x] ch 5: Semantic Analysis (type checking) +- [x] ch 6: Activation Records +#### In-progress +- [ ] ch 7: Translation to Intermediate Code +#### TODO (short-term) +- [ ] ch 08: Basic Blocks and Traces +- [ ] ch 09: Instruction Selection +- [ ] ch 10: Liveness Analysis +- [ ] ch 11: Register Allocation +- [ ] ch 12: Putting It All Together +#### TODO (long-term) +- [ ] ch 13: Garbage Collection +- [ ] ch 15: Functional Programming Languages +- [ ] ch 16: Polymorphic Types +- [ ] ch 17: Dataflow Analysis +- [ ] ch 18: Loop Optimizations +- [ ] ch 19: Static Single-Assignment Form +- [ ] ch 20: Pipelining and Scheduling +- [ ] ch 21: The Memory Hierarchy +#### Maybe +- [ ] ch 14: Object-Oriented Languages + +### Technical issues +- [-] testing framework + - [x] run arbitrary code snippets + - [x] check non-failures + - [x] check expected output + - [-] check expected exceptions + - [x] semant stage + - [ ] generalized expect `Output ('a option) | Exception of (exn -> bool)` + - [x] run all book test case files + - [-] grid view (cols: lex, pars, semant, etc.; rows: test cases.) + - [x] implementation + - [ ] refactoring + - [ ] test time-outs (motive: cycle non-detection caused an infinite loop) + - [ ] parallel test execution +- [ ] Travis CI + +Implementation Notes +-------------------- + +### Parser + +#### shift/reduce conflicts +##### grouping consecutive declarations +In order to support mutual recursion, we need to group consecutive +type and function declarations (see Tiger-book pages 97-99). + +Initially, I defined the rules to do so as: + + decs: + | dec { $1 :: [] } + | dec decs { $1 :: $2 } + ; + dec: + | var_dec { $1 } + | typ_decs { Ast.TypeDecs $1 } + | fun_decs { Ast.FunDecs $1 } + ; + +which, while straightforward (and working, because `ocamlyacc` defaults to +shift in case of a conflict), nonetheless caused a shift/reduce conflict in +each of: `typ_decs` and `fun_decs`; where the parser did not know whether to +shift and stay in `(typ|fun_)_dec` state or to reduce and get back to `dec` +state. + +Sadly, tagging the rules with a lower precedence (to explicitly favor +shifting) - does not help :( + + %nonassoc LOWEST + ... + dec: + | var_dec { $1 } + | typ_decs %prec LOWEST { Ast.TypeDecs $1 } + | fun_decs %prec LOWEST { Ast.FunDecs $1 } + ; + +The difficulty seems to be in the lack of a separator token which would be +able to definitively mark the end of each sequence of consecutive +`(typ_|fun_)` declarations. + +Keeping this in mind, another alternative is to manually capture the possible +interspersion patterns in the rules like: + + (N * foo) followed-by (N * not-foo) + +for the exception of `var_dec`, which, since we do not need to group its +consecutive sequences, can be reduced upon first sighting. + +The final rules I ended-up with are: + + decs: + | var_dec decs_any { $1 :: $2 } + | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } + | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } + ; + + decs_any: + | { [] } + | var_dec decs_any { $1 :: $2 } + | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } + | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } + ; + + decs_any_but_fun: + | { [] } + | var_dec decs_any { $1 :: $2 } + | typ_decs decs_any_but_typ { (Ast.TypeDecs $1) :: $2 } + ; + + decs_any_but_typ: + | { [] } + | var_dec decs_any { $1 :: $2 } + | fun_decs decs_any_but_fun { (Ast.FunDecs $1) :: $2 } + ; + +##### lval + +### AST + +#### print as M-exp + +I chose to pretty-print AST as an (indented) +[M-expression](https://en.wikipedia.org/wiki/M-expression) - an underrated +format, used in Mathematica and was intended for Lisp by McCarthy himself; it +is nearly as flexible as S-expressions, but significantly more readable (IMO). + +As an example, here is what `test28.tig` looks like after parsing and +pretty-printing: + + LetExp[ + [ + TypeDecs[ + TypeDec[ + arrtype1, + ArrayTy[ + int]], + TypeDec[ + arrtype2, + ArrayTy[ + int]]], + VarDec[ + arr1, + arrtype1, + ArrayExp[ + arrtype2, + IntExp[ + 10], + IntExp[ + 0]]]], + SeqExp[ + VarExp[ + SimpleVar[ + arr1]]]] + +### Machine +Will most-likely compile to RISC and execute using SPIM (as favored by Appel)