X-Git-Url: https://git.xandkar.net/?a=blobdiff_plain;f=TODO;h=f89176cb77e88fa8becfde5501e36ce782b2794f;hb=refs%2Fheads%2Ftmp;hp=06c36749777ca93dc413c65d9063586367e85b7b;hpb=8cd862edbefa9ae27d78e3b03eb7a6256acfdcf6;p=tt.git
diff --git a/TODO b/TODO
index 06c3674..f89176c 100644
--- a/TODO
+++ b/TODO
@@ -10,7 +10,13 @@ Legend:

 In-progress
 -----------
-
+- [-] timeline limits
+  - [x] by time range
+  - [ ] by msg count
+    - [ ] per peer
+    - [ ] total
+    Not necessary for short format, because we have Unix head/tail,
+    but may be convinient for long format (because msg spans multiple lines).
 - [-] Convert to Typed Racket
   - [x] build executable (otherwise too-slow)
   - [-] add signatures
@@ -18,11 +24,16 @@ In-progress
     - [ ] inner
     - [ ] imports
 - [-] commands:
+  - [x] c | crawl
+    Discover new peers mentioned by known peers.
   - [x] r | read
     - see timeline ops above
   - [ ] w | write
     - arg or stdin
     - nick expand to URI
+    - Watch FIFO for lines, then read, timestamp and append [+ upload].
+      Can be part of a "live" mode, along with background polling and
+      incremental printing. Sort of an ii-like IRC experience.
   - [ ] q | query
     - see timeline ops above
     - see hashtag and channels above
@@ -45,11 +56,22 @@ In-progress
   - [x] mentions from timeline messages
     - [x] @
     - [x] @
-  - [x] "following" from timeline comments: # following =
+  - [ ] "following" from timeline comments: # following =
+    1. split file lines in 2 groups: comments and messages
+    2. dispatch messages parsing as usual
+    3. dispatch comments parsing for:
+      - # following =
+      - what else?
   - [ ] Parse User-Agent web access logs.
-
-  Rough sketch from late 2019:
-
+  - [-] Update peer ref file(s)
+    - [x] peers-all
+    - [x] peers-mentioned
+    - [ ] peers-followed (by others, parsed from comments)
+    - [ ] peers-up (no net errors)
+    - [ ] peers-down (net errors)
+    - [ ] peers-valid (up and parsed at least 1 message)
+    - [ ] redirects?
+  Rough sketch from late 2019:
     let read file = ...

     let write file peers =
@@ -91,6 +113,41 @@ In-progress

 Backlog
 -------
+- [ ] Batch download jobs by domain:
+  - at most 1 worker per domain
+  - more than 1 domain per worker is OK
+- [ ] Remove mention link noise in read view.
+  in short view: just abbreviate @ to @nick
+  in long view: abbreviate like above AND list the full versions after the text
+- [ ] Crawl only valid objects
+  REQUIRES: peers-valid ref file update
+- [ ] Reduce log noise
+- [ ] Parallelize crawling by file
+- [ ] Parallelize reading by file
+- [ ] Support date without time in timestamps
+- [ ] Associate cached object with nick.
+- [ ] Crawl downloaded web access logs
+- [ ] download-command hook to grab the access logs
+
+    (define (parse log-line)
+      (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
+        [(list _ client version uri nick) (cons nick uri)]
+        [_ #f]))
+
+    (list->set (filter-map parse (file->lines "logs/combined-access.log")))
+
+    (filter (λ (p) (equal? 'file (file-or-directory-type p))) (directory-list logs-dir))
+
+- [ ] user-agent file as CLI option - need to run at least the crawler as another user
+- [ ] Support fetching rsync URIs
+- [ ] Check for peer duplicates:
+  - [ ] same nick for N>1 URIs
+  - [ ] same URI for N>1 nicks
+- [ ] Background polling and incremental timeline updates.
+  We can mark which messages have already been printed and print new ones as
+  they come in.
+  REQUIRES: polling
+- [ ] Polling mode/command, where tt periodically polls peer timelines
 - [ ] nick tiebreaker(s)
   - [ ] some sort of a hash of URI?
   - [ ] angry-purple-tiger kind if thingie?
@@ -103,10 +160,8 @@ Backlog
   - [ ] download times per peer
 - [ ] Support redirects
   - should permanent redirects update the peer ref somehow?
-- [ ] Support time ranges (i.e. reading the timeline between given time points)
 - [ ] optional text wrap
 - [ ] write
-- [ ] timeline limits
 - [ ] peer refs set operations (perhaps better done externally?)
 - [ ] timeline as a result of a query (peer ref set op + filter expressions)
 - [ ] config files
@@ -135,6 +190,13 @@ Backlog

 Done
 ----
+- [x] Crawl all cache/objects/*, not given peers.
+- [x] Support time ranges (i.e. reading the timeline between given time points)
+- [x] Dedup read-in peers before using them.
+- [x] Prevent redundant downloads
+  - [x] Check ETag
+  - [x] Check Last-Modified if no ETag was provided
+  - [x] Parse rfc2822 timestamps
 - [x] caching (use cache by default, unless explicitly asked for update)
   - [x] value --> cache
   - [x] value <-- cache