| 1 | # vim:sw=2:sts=2: |
| 2 | TODO |
| 3 | ==== |
| 4 | |
| 5 | Legend: |
| 6 | - [ ] not started |
| 7 | - [-] in-progress |
| 8 | - [x] done |
| 9 | - [~] cancelled |
| 10 | |
| 11 | In-progress |
| 12 | ----------- |
| 13 | - [-] timeline limits |
| 14 | - [x] by time range |
| 15 | - [ ] by msg count |
| 16 | - [ ] per peer |
| 17 | - [ ] total |
| 18 | Not necessary for short format, because we have Unix head/tail, |
| 19 | but may be convenient for long format (because a msg spans multiple lines). |
| 20 | - [-] Convert to Typed Racket |
| 21 | - [x] build executable (otherwise too slow) |
| 22 | - [-] add signatures |
| 23 | - [x] top-level |
| 24 | - [ ] inner |
| 25 | - [ ] imports |
| 26 | - [-] commands: |
| 27 | - [x] c | crawl |
| 28 | Discover new peers mentioned by known peers. |
| 29 | - [x] r | read |
| 30 | - see timeline ops above |
| 31 | - [ ] w | write |
| 32 | - arg or stdin |
| 33 | - expand nick to URI |
| 34 | - Watch FIFO for lines, then read, timestamp and append [+ upload]. |
| 35 | Can be part of a "live" mode, along with background polling and |
| 36 | incremental printing. Sort of an ii-like IRC experience. |
| 37 | - [ ] q | query |
| 38 | - see timeline ops above |
| 39 | - see hashtags and channels in Backlog |
| 40 | - [x] d | download |
| 41 | - [ ] options: |
| 42 | - [ ] all - use all known peers |
| 43 | - [ ] fast - all except peers known to be slow or unavailable |
| 44 | REQUIRES: stats |
| 45 | - [x] u | upload |
| 46 | - calls user-configured command to upload user's own timeline file to their server |
| 47 | Looks like a better CLI parser than "racket/cmdline": https://docs.racket-lang.org/natural-cli/ |
| 48 | But it is no longer necessary now that I've figured out how to chain (command-line ..) calls. |
| 49 | - [-] Output formats: |
| 50 | - [x] text long |
| 51 | - [x] text short |
| 52 | - [ ] HTML |
| 53 | - [ ] JSON |
| 54 | - [-] Peer discovery |
| 55 | - [-] parse peer refs from peer timelines |
| 56 | - [x] mentions from timeline messages |
| 57 | - [x] @<source.nick source.url> |
| 58 | - [x] @<source.url> |
| 59 | - [ ] "following" from timeline comments: # following = <nick> <uri> |
| 60 | 1. split file lines in 2 groups: comments and messages |
| 61 | 2. dispatch messages parsing as usual |
| 62 | 3. dispatch comments parsing for: |
| 63 | - # following = <nick> <uri> |
| 64 | - what else? |
| 65 | - [ ] Parse User-Agent headers from web access logs. |
| 66 | - [-] Update peer ref file(s) |
| 67 | - [x] peers-all |
| 68 | - [x] peers-mentioned |
| 69 | - [ ] peers-followed (by others, parsed from comments) |
| 70 | - [ ] peers-down (net errors) |
| 71 | - [ ] redirects? |
| 72 | Rough sketch from late 2019: |
| 73 | let read file = |
| 74 | ... |
| 75 | let write file peers = |
| 76 | ... |
| 77 | let fetch peer = |
| 78 | (* Fetch could mean either or both of: |
| 79 | * - fetch peer's we-are-twtxt.txt |
| 80 | * - fetch peer's twtxt.txt and extract mentioned peer URIs |
| 81 | * *) |
| 82 | ... |
| 83 | let test peers = |
| 84 | ... |
| 85 | let rec discover peers_old = |
| 86 | let peers_all = |
| 87 | Set.fold peers_old ~init:peers_old ~f:(fun peers p -> |
| 88 | match fetch p with |
| 89 | | Error _ -> |
| 90 | (* TODO: Should p be moved to down set here? *) |
| 91 | log_warning ...; |
| 92 | peers |
| 93 | | Ok peers_fetched -> |
| 94 | Set.union peers peers_fetched |
| 95 | ) |
| 96 | in |
| 97 | if Set.is_empty (Set.diff peers_all peers_old) then |
| 98 | peers_all |
| 99 | else |
| 100 | discover peers_all |
| 101 | let rec loop interval peers_old = |
| 102 | let peers_all = discover peers_old in |
| 103 | let (peers_up, peers_down) = test peers_all in |
| 104 | write "peers-all.txt" peers_all; |
| 105 | write "peers-up.txt" peers_up; |
| 106 | write "peers-down.txt" peers_down; |
| 107 | sleep interval; |
| 108 | loop interval peers_all |
| 109 | let () = |
| 110 | loop (float_of_string Sys.argv.(1)) (read "peers-all.txt") |
| 111 | |
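Possible sketch for the "# following = <nick> <uri>" comment parsing planned under Peer discovery (steps 1-3 there). This is only an illustration: split-lines, parse-following and timeline-following are made-up names, not existing tt code, and it assumes a comment is any line whose first non-blank character is "#".

```racket
#lang racket/base
;; Sketch only: all function names here are hypothetical, not part of tt.
(require racket/list
         racket/string)

;; Step 1: split file lines into 2 groups: comments and messages.
;; Assumption: a comment is a line whose first non-blank char is "#".
(define (split-lines lines)
  (partition (λ (l) (string-prefix? (string-trim l #:right? #f) "#"))
             lines))

;; Step 3: "# following = <nick> <uri>" --> (cons nick uri), otherwise #f.
(define (parse-following comment)
  (define m
    (regexp-match #px"^\\s*#\\s*following\\s*=\\s*(\\S+)\\s+(\\S+)\\s*$"
                  comment))
  (and m (cons (second m) (third m))))

;; Collect all followed peers mentioned in a timeline's comments.
;; Step 2 (message parsing as usual) is left out; messages are ignored here.
(define (timeline-following lines)
  (define-values (comments _messages) (split-lines lines))
  (filter-map parse-following comments))
```

Usage: (timeline-following (file->lines path)), with file->lines from racket/file, gives a list of (nick . uri) pairs, the same shape the access-log parse snippet in Backlog produces. Comments that are not "following" lines (e.g. "# nick = ...") are skipped.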
| 112 | Backlog |
| 113 | ------- |
| 114 | - [ ] Support date without time in timestamps |
| 115 | - [ ] Associate cached object with nick. |
| 116 | - [ ] Crawl downloaded web access logs |
| 117 | - [ ] download-command hook to grab the access logs |
| 118 | |
| 119 | (define (parse log-line) |
| 120 | (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line) |
| 121 | [(list _ client version uri nick) (cons nick uri)] |
| 122 | [_ #f])) |
| 123 | |
| 124 | (list->set (filter-map parse (file->lines "logs/combined-access.log"))) |
| 125 | |
| 126 | (filter (λ (p) (equal? 'file (file-or-directory-type p))) (directory-list logs-dir #:build? #t)) |
| 127 | |
| 128 | - [ ] user-agent file as CLI option - need to run at least the crawler as another user |
| 129 | - [ ] Support fetching rsync URIs |
| 130 | - [ ] Check for peer duplicates: |
| 131 | - [ ] same nick for N>1 URIs |
| 132 | - [ ] same URI for N>1 nicks |
| 133 | - [ ] Background polling and incremental timeline updates. |
| 134 | We can mark which messages have already been printed and print new ones as |
| 135 | they come in. |
| 136 | REQUIRES: polling |
| 137 | - [ ] Polling mode/command, where tt periodically polls peer timelines |
| 138 | - [ ] nick tiebreaker(s) |
| 139 | - [ ] some sort of a hash of URI? |
| 140 | - [ ] angry-purple-tiger kind of thingie? |
| 141 | - [ ] P2P nick registration? |
| 142 | - [ ] Peers vote by claiming to have seen a nick->uri mapping? |
| 143 | The inherent race condition would be a feature, since all user name |
| 144 | registrations are races. |
| 145 | REQUIRES: blockchain |
| 146 | - [ ] stats |
| 147 | - [ ] download times per peer |
| 148 | - [ ] Support redirects |
| 149 | - should permanent redirects update the peer ref somehow? |
| 150 | - [ ] optional text wrap |
| 151 | - [ ] write |
| 152 | - [ ] peer refs set operations (perhaps better done externally?) |
| 153 | - [ ] timeline as a result of a query (peer ref set op + filter expressions) |
| 154 | - [ ] config files |
| 155 | - [ ] highlight mentions |
| 156 | - [ ] filter on mentions |
| 157 | - [ ] highlight hashtags |
| 158 | - [ ] filter on hashtags |
| 159 | - [ ] hashtags as channels? initial hashtag special? |
| 160 | - [ ] query language |
| 161 | - [ ] console logger colors by level ('error) |
| 162 | - [ ] file logger ('debug) |
| 163 | - [ ] Support immutable timelines |
| 164 | - store individual messages |
| 165 | - where? |
| 166 | - something like DBM or SQLite - faster |
| 167 | - filesystem - transparent, easily published - probably best |
| 168 | - [ ] block(chain/tree) of twtxts |
| 169 | - distributed twtxt.db |
| 170 | - each twtxt.txt is a ledger |
| 171 | - peers can verify states of ledgers |
| 172 | - peers can publish known nick->url mappings |
| 173 | - peers can vote on nick->url mappings |
| 174 | - we could break time periods into blocks |
| 175 | - how to handle the fact that many (most?) twtxts are unseen by peers |
| 176 | - longest X wins? |
| 177 | |
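Possible sketch for the "Check for peer duplicates" item above. It assumes peers are (nick . uri) pairs, as in the access-log parse snippet. The names duplicates-by, nick-conflicts and uri-conflicts are made up for illustration, not existing tt code.

```racket
#lang racket/base
;; Sketch only: all function names here are hypothetical, not part of tt.
(require racket/list)

;; Group peers by key-of and keep only the groups that carry more than
;; one distinct val-of value. Each result is (cons key distinct-values).
(define (duplicates-by key-of val-of peers)
  (for*/list ([group (in-list (group-by key-of peers))]
              [vals (in-value (remove-duplicates (map val-of group)))]
              #:when (> (length vals) 1))
    (cons (key-of (first group)) vals)))

;; same nick for N>1 URIs
(define (nick-conflicts peers) (duplicates-by car cdr peers))

;; same URI for N>1 nicks
(define (uri-conflicts peers) (duplicates-by cdr car peers))
```

Usage: for peers '(("bob" . "u1") ("bob" . "u2") ("amy" . "u3")), nick-conflicts reports one entry for "bob" listing both URIs, and uri-conflicts reports nothing. Exact repeats of the same pair do not count as conflicts, since values are deduplicated first.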
| 178 | Done |
| 179 | ---- |
| 180 | - [x] Crawl all cache/objects/*, not given peers. |
| 181 | - [x] Support time ranges (i.e. reading the timeline between given time points) |
| 182 | - [x] Dedup read-in peers before using them. |
| 183 | - [x] Prevent redundant downloads |
| 184 | - [x] Check ETag |
| 185 | - [x] Check Last-Modified if no ETag was provided |
| 186 | - [x] Parse rfc2822 timestamps |
| 187 | - [x] caching (use cache by default, unless explicitly asked for update) |
| 188 | - [x] value --> cache |
| 189 | - [x] value <-- cache |
| 190 | REQUIRES: d command |
| 191 | - [x] Logger sync before exit. |
| 192 | - [x] Implement rfc3339->epoch |
| 193 | - [x] Remove dependency on rfc3339-old |
| 194 | - [x] remove dependency on http-client |
| 195 | - [x] Build executable |
| 196 | Implies fixing the "collection not found" error when executing the built executable |
| 197 | outside the source directory: |
| 198 | |
| 199 | collection-path: collection not found |
| 200 | collection: "tt" |
| 201 | in collection directories: |
| 202 | context...: |
| 203 | /usr/share/racket/collects/racket/private/collect.rkt:11:53: fail |
| 204 | /usr/share/racket/collects/setup/getinfo.rkt:17:0: get-info |
| 205 | /usr/share/racket/collects/racket/contract/private/arrow-val-first.rkt:555:3 |
| 206 | /usr/share/racket/collects/racket/cmdline.rkt:191:51 |
| 207 | '|#%mzc:p |
| 208 | |
| 209 | |
| 210 | Cancelled |
| 211 | --------- |
| 212 | - [~] named timelines/peer-sets |
| 213 | REASON: That is basically files of peers, which we already support. |