# vim:sw=2:sts=2:
TODO
====

Legend:
- [ ] not started
- [-] in-progress
- [x] done
- [~] cancelled

In-progress
-----------
- [-] timeline limits
  - [x] by time range
  - [ ] by msg count
    - [ ] per peer
    - [ ] total
  Not necessary for short format, because we have Unix head/tail,
  but may be convenient for long format (because a msg spans multiple lines).
- [-] Convert to Typed Racket
  - [x] build executable (otherwise too slow)
  - [-] add signatures
    - [x] top-level
    - [ ] inner
    - [ ] imports
- [-] commands:
  - [x] c | crawl
    Discover new peers mentioned by known peers.
  - [x] r | read
    - see timeline ops above
  - [ ] w | write
    - arg or stdin
    - nick expand to URI
    - Watch FIFO for lines, then read, timestamp and append [+ upload].
      Can be part of a "live" mode, along with background polling and
      incremental printing. Sort of an ii-like IRC experience.
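The FIFO-watching idea could be sketched roughly as below. `watch-fifo`, `timestamp-now`, and the tab-separated output format are assumptions for illustration, not decided design; `timestamp-now` only approximates rfc3339 (racket/date's iso-8601 format carries no UTC offset).

```racket
#lang racket
;; Sketch: block on a FIFO (or any file), timestamping and appending each line.
(require racket/date)

(define (timestamp-now)
  ;; "YYYY-MM-DDTHH:MM:SS"; a real implementation would append the UTC offset.
  (parameterize ([date-display-format 'iso-8601])
    (date->string (current-date) #t)))

(define (watch-fifo fifo-path timeline-path)
  (call-with-input-file fifo-path
    (λ (in)
      (for ([line (in-lines in)])
        (call-with-output-file timeline-path #:exists 'append
          (λ (out) (fprintf out "~a\t~a~n" (timestamp-now) line)))))))
```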
  - [ ] q | query
    - see timeline ops above
    - see hashtag and channels above
  - [x] d | download
    - [ ] options:
      - [ ] all - use all known peers
      - [ ] fast - all except peers known to be slow or unavailable
        REQUIRES: stats
  - [x] u | upload
    - calls user-configured command to upload user's own timeline file to their server
  Looks like a better CLI parser than "racket/cmdline": https://docs.racket-lang.org/natural-cli/
  But it is no longer necessary now that I've figured out how to chain (command-line ..) calls.
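For reference, the chaining pattern could look like the sketch below: the outer (command-line ..) consumes only the sub-command name and hands the remaining args onward. The cmd-* stubs are hypothetical; the real handlers would each run their own (command-line ..) over `rest`.

```racket
#lang racket
(require racket/cmdline racket/match racket/port)

(define (cmd-crawl args) (printf "crawl ~a~n" args))  ; hypothetical stub
(define (cmd-read  args) (printf "read ~a~n" args))   ; hypothetical stub

(define (dispatch argv)
  (command-line
   #:program "tt"
   #:argv argv
   #:args (cmd . rest)          ; cmd = sub-command name, rest = its args
   (match cmd
     ["c" (cmd-crawl rest)]
     ["r" (cmd-read  rest)]
     [_   (error 'tt "unknown command: ~a" cmd)])))

(dispatch (vector "r" "--long"))
```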
- [-] Output formats:
  - [x] text long
  - [x] text short
  - [ ] HTML
  - [ ] JSON
- [-] Peer discovery
  - [-] parse peer refs from peer timelines
    - [x] mentions from timeline messages
      - [x] @<source.nick source.url>
      - [x] @<source.url>
    - [ ] "following" from timeline comments: # following = <nick> <uri>
      1. split file lines into 2 groups: comments and messages
      2. dispatch message parsing as usual
      3. dispatch comment parsing for:
         - # following = <nick> <uri>
         - what else?
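A minimal sketch of step 3, assuming the comment lines are already split out in step 1 (parse-following is a hypothetical name):

```racket
#lang racket
;; Parse "# following = <nick> <uri>" comment lines into (nick . uri) pairs;
;; any other comment yields #f, so this composes with filter-map.
(require racket/match)

(define (parse-following comment-line)
  (match (regexp-match #px"^#\\s*following\\s*=\\s*(\\S+)\\s+(\\S+)\\s*$"
                       comment-line)
    [(list _ nick uri) (cons nick uri)]
    [_ #f]))
```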
    - [ ] Parse User-Agent web access logs.
  - [-] Update peer ref file(s)
    - [x] peers-all
    - [x] peers-mentioned
    - [ ] peers-followed (by others, parsed from comments)
    - [ ] peers-up (no net errors)
    - [ ] peers-down (net errors)
    - [ ] peers-valid (up and parsed at least 1 message)
    - [ ] redirects?
  Rough sketch from late 2019:
    let read file =
      ...
    let write file peers =
      ...
    let fetch peer =
      (* Fetch could mean either or both of:
       * - fetch peer's we-are-twtxt.txt
       * - fetch peer's twtxt.txt and extract mentioned peer URIs
       *)
      ...
    let test peers =
      ...
    let rec discover peers_old =
      let peers_all =
        Set.fold peers_old ~init:peers_old ~f:(fun peers p ->
          match fetch p with
          | Error _ ->
              (* TODO: Should p be moved to down set here? *)
              log_warning ...;
              peers
          | Ok peers_fetched ->
              Set.union peers peers_fetched)
      in
      (* Done when no new peers were discovered this pass: *)
      if Set.is_empty (Set.diff peers_all peers_old) then
        peers_all
      else
        discover peers_all
    let rec loop interval peers_old =
      let peers_all = discover peers_old in
      let (peers_up, peers_down) = test peers_all in
      write "peers-all.txt" peers_all;
      write "peers-up.txt" peers_up;
      write "peers-down.txt" peers_down;
      sleep interval;
      loop interval peers_all
    let () =
      loop (int_of_string Sys.argv.(1)) (read "peers-all.txt")

Backlog
-------
- [ ] Batch download jobs by domain:
  - at most 1 worker per domain
  - more than 1 domain per worker is OK
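The two constraints above amount to partitioning the download queue by host: a batch is never split across workers, while one worker may take several batches. A sketch (batch-by-domain is a hypothetical name):

```racket
#lang racket
;; Group peer URIs by host; each resulting sub-list is one worker's batch.
(require racket/list net/url)

(define (batch-by-domain uris)
  (group-by (λ (u) (url-host (string->url u))) uris))
```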
- [ ] Remove mention link noise in read view.
  in short view: just abbreviate @<nick uri> to @nick
  in long view: abbreviate like above AND list the full versions after the text
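The short-view abbreviation could be a single regexp rewrite (a sketch; it assumes the mention shape @<nick uri> with a space separator):

```racket
#lang racket
;; Collapse @<nick uri> mentions to @nick for the short view.
(define (abbreviate-mentions text)
  (regexp-replace* #px"@<([^ >]+) +[^>]+>" text "@\\1"))
```

For the long view, the same pattern could first be used to collect the full mentions before abbreviating, so they can be listed after the text.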
- [ ] Crawl only valid objects
  REQUIRES: peers-valid ref file update
- [ ] Reduce log noise
- [ ] Parallelize crawling by file
- [ ] Parallelize reading by file
- [ ] Support date without time in timestamps
- [ ] Associate cached object with nick.
- [ ] Crawl downloaded web access logs
  - [ ] download-command hook to grab the access logs

    ;; Extract (nick . uri) pairs from User-Agent strings of the shape
    ;; the regexp expects, i.e. "client/version (+uri; @nick)":
    (define (parse log-line)
      (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
        [(list _ client version uri nick) (cons nick uri)]
        [_ #f]))

    (list->set (filter-map parse (file->lines "logs/combined-access.log")))

    ;; Only regular files in the logs directory:
    (filter (λ (p) (equal? 'file (file-or-directory-type p))) (directory-list logs-dir))
- [ ] user-agent file as CLI option - need to run at least the crawler as another user
- [ ] Support fetching rsync URIs
- [ ] Check for peer duplicates:
  - [ ] same nick for N>1 URIs
  - [ ] same URI for N>1 nicks
- [ ] Background polling and incremental timeline updates.
  We can mark which messages have already been printed and print new ones as
  they come in.
  REQUIRES: polling
- [ ] Polling mode/command, where tt periodically polls peer timelines
- [ ] nick tiebreaker(s)
  - [ ] some sort of a hash of URI?
  - [ ] angry-purple-tiger kind of thingie?
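The hash-of-URI option might look like the sketch below; the nick~hash display format and the 8-character truncation are assumptions, not decided design.

```racket
#lang racket
;; Disambiguate colliding nicks by suffixing a short SHA-1 of the URI,
;; so two peers both named "alice" render as distinct identities.
(require file/sha1 racket/string)

(define (tiebreak nick uri)
  (string-append nick "~" (substring (sha1 (open-input-string uri)) 0 8)))
```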
- [ ] P2P nick registration?
  - [ ] Peers vote by claiming to have seen a nick->uri mapping?
    The inherent race condition would be a feature, since all user name
    registrations are races.
    REQUIRES: blockchain
- [ ] stats
  - [ ] download times per peer
- [ ] Support redirects
  - should permanent redirects update the peer ref somehow?
- [ ] optional text wrap
- [ ] write
- [ ] peer refs set operations (perhaps better done externally?)
- [ ] timeline as a result of a query (peer ref set op + filter expressions)
- [ ] config files
- [ ] highlight mentions
- [ ] filter on mentions
- [ ] highlight hashtags
- [ ] filter on hashtags
- [ ] hashtags as channels? initial hashtag special?
- [ ] query language
- [ ] console logger colors by level ('error)
- [ ] file logger ('debug)
- [ ] Support immutable timelines
  - store individual messages
    - where?
      - something like DBM or SQLite - faster
      - filesystem - transparent, easily published - probably best
- [ ] block(chain/tree) of twtxts
  - distributed twtxt.db
    - each twtxt.txt is a ledger
    - peers can verify states of ledgers
    - peers can publish known nick->url mappings
    - peers can vote on nick->url mappings
  - we could break time periods into blocks
  - how to handle the fact that many (most?) twtxt files are unseen by peers
    - longest X wins?
190 | ||
191 | Done | |
192 | ---- | |
d3ac9e11 | 193 | - [x] Crawl all cache/objects/*, not given peers. |
a993cb85 | 194 | - [x] Support time ranges (i.e. reading the timeline between given time points) |
38c9ecd5 | 195 | - [x] Dedup read-in peers before using them. |
9c5e4499 SK |
196 | - [x] Prevent redundant downloads |
197 | - [x] Check ETag | |
198 | - [x] Check Last-Modified if no ETag was provided | |
199 | - [x] Parse rfc2822 timestamps | |
e678174b SK |
200 | - [x] caching (use cache by default, unless explicitly asked for update) |
201 | - [x] value --> cache | |
202 | - [x] value <-- cache | |
203 | REQUIRES: d command | |
204 | - [x] Logger sync before exit. | |
205 | - [x] Implement rfc3339->epoch | |
206 | - [x] Remove dependency on rfc3339-old | |
207 | - [x] remove dependency on http-client | |
208 | - [x] Build executable | |
209 | Implies fix of "collection not found" when executing the built executable | |
210 | outside the source directory: | |
211 | ||
212 | collection-path: collection not found | |
213 | collection: "tt" | |
214 | in collection directories: | |
215 | context...: | |
216 | /usr/share/racket/collects/racket/private/collect.rkt:11:53: fail | |
217 | /usr/share/racket/collects/setup/getinfo.rkt:17:0: get-info | |
218 | /usr/share/racket/collects/racket/contract/private/arrow-val-first.rkt:555:3 | |
219 | /usr/share/racket/collects/racket/cmdline.rkt:191:51 | |
220 | '|#%mzc:p | |
221 | ||
222 | ||
223 | Cancelled | |
224 | --------- | |
225 | - [~] named timelines/peer-sets | |
226 | REASON: That is basically files of peers, which we already support. |