X-Git-Url: https://git.xandkar.net/?a=blobdiff_plain;f=TODO;h=b48bfd808bf8d745037f826e699b008ead6f67cd;hb=7d9f2ab580eb3356275434ca0b70f7fef69a9513;hp=83aa0164d44690202f02e5ca7ee0ba1a4f36fb83;hpb=9c34c974c9c5d324cd432499ae55b3e2b3f1b059;p=tt.git

diff --git a/TODO b/TODO
index 83aa016..b48bfd8 100644
--- a/TODO
+++ b/TODO
@@ -10,7 +10,6 @@ Legend:
 
 In-progress
 -----------
-
 - [-] Convert to Typed Racket
   - [x] build executable (otherwise too-slow)
   - [-] add signatures
@@ -18,6 +17,8 @@ In-progress
     - [ ] inner
     - [ ] imports
 - [-] commands:
+  - [x] c | crawl
+    Discover new peers mentioned by known peers.
   - [x] r | read
     - see timeline ops above
   - [ ] w | write
@@ -48,9 +49,14 @@ In-progress
     - [x] mentions from timeline messages
       - [x] @
       - [x] @
-    - [x] "following" from timeline comments: # following = 
+    - [ ] "following" from timeline comments: # following = 
   - [ ] Parse User-Agent web access logs.
-  - [ ] Update peer ref file(s)
+  - [-] Update peer ref file(s)
+    - [x] peers-all
+    - [x] peers-mentioned
+    - [ ] peers-followed (by others, parsed from comments)
+    - [ ] peers-down (net errors)
+    - [ ] redirects?
     Rough sketch from late 2019:
         let read file =
             ...
@@ -93,6 +99,22 @@ In-progress
 
 Backlog
 -------
+- [ ] Crawl all cache/objects/*, not given peers.
+      BUT, in order to build A-mentioned-B graph, we need to know the nick
+      associated with the URI whos object we're examining. How to do that?
+- [ ] Crawl downloaded web access logs
+- [ ] download-command hook to grab the access logs
+
+      (define (parse log-line)
+        (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
+          [(list _ client version uri nick) (cons nick uri)]
+          [_ #f]))
+
+      (list->set (filter-map parse (file->lines "logs/combined-access.log")))
+
+      (filter (λ (p) (equal? 'file (file-or-directory-type p))) (directory-list logs-dir))
+
+- [ ] user-agent file as CLI option - need to run at least the crawler as another user
 - [ ] Support fetching rsync URIs
 - [ ] Check for peer duplicates:
   - [ ] same nick for N>1 URIs
@@ -146,6 +168,7 @@ Backlog
 
 Done
 ----
+- [x] Dedup read-in peers before using them.
 - [x] Prevent redundant downloads
   - [x] Check ETag
   - [x] Check Last-Modified if no ETag was provided
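
The parse sketch from the access-log backlog item can be tried in isolation. What follows is a minimal Racket sketch, assuming peers' clients send a twtxt-style User-Agent of the form client/version (+URI; @nick). The two log lines and the names sample-lines and peers are hypothetical, made up for illustration. parse is the function from the note, unchanged, and the file and directory plumbing is left out.

```racket
#lang racket/base
;; Sketch only: exercises the backlog's `parse` on hypothetical access-log lines.
;; Assumes a twtxt-style User-Agent: "client/version (+URI; @nick)".
(require racket/match  ; match
         racket/list   ; filter-map
         racket/set)   ; list->set, set

;; Copied from the TODO note: extract (nick . uri) from a log line, or #f.
(define (parse log-line)
  (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
    [(list _ client version uri nick) (cons nick uri)]
    [_ #f]))

;; Hypothetical combined-format log lines (not real data).
(define sample-lines
  (list
   ;; a twtxt client announcing itself: yields ("alice" . "https://example.com/twtxt.txt")
   "1.2.3.4 - - [10/Oct/2020:13:55:36 +0000] \"GET /twtxt.txt HTTP/1.1\" 200 512 \"-\" \"tt/0.1 (+https://example.com/twtxt.txt; @alice)\""
   ;; an ordinary browser: no "(+URI; @nick)" part, so parse returns #f
   "5.6.7.8 - - [10/Oct/2020:13:56:01 +0000] \"GET / HTTP/1.1\" 200 1024 \"-\" \"Mozilla/5.0\""))

;; Same shape as the note's (list->set (filter-map parse (file->lines ...))),
;; minus the file I/O: unmatched lines are dropped, duplicate pairs collapse.
(define peers (list->set (filter-map parse sample-lines)))
;; peers contains one pair: ("alice" . "https://example.com/twtxt.txt")
```

The pattern is not anchored, so on the first sample line the client group captures everything from the request's HTTP version up to the client name (1.1" 200 512 "-" "tt). Only nick and URI are kept, so the result is unaffected, but client and version would need a tighter pattern before being relied on.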