X-Git-Url: https://git.xandkar.net/?a=blobdiff_plain;f=TODO;h=f89176cb77e88fa8becfde5501e36ce782b2794f;hb=eade817510cd03ba31e90099238f06d6c30872aa;hp=b48bfd808bf8d745037f826e699b008ead6f67cd;hpb=7d9f2ab580eb3356275434ca0b70f7fef69a9513;p=tt.git

diff --git a/TODO b/TODO
index b48bfd8..f89176c 100644
--- a/TODO
+++ b/TODO
@@ -10,6 +10,13 @@ Legend:
 
 In-progress
 -----------
+- [-] timeline limits
+  - [x] by time range
+  - [ ] by msg count
+    - [ ] per peer
+    - [ ] total
+    Not necessary for short format, because we have Unix head/tail,
+    but may be convenient for long format (because msg spans multiple lines).
 - [-] Convert to Typed Racket
   - [x] build executable (otherwise too-slow)
   - [-] add signatures
@@ -50,12 +57,19 @@ In-progress
     - [x] @
     - [x] @
     - [ ] "following" from timeline comments: # following =
+      1. split file lines in 2 groups: comments and messages
+      2. dispatch messages parsing as usual
+      3. dispatch comments parsing for:
+         - # following =
+         - what else?
   - [ ] Parse User-Agent web access logs.
   - [-] Update peer ref file(s)
     - [x] peers-all
     - [x] peers-mentioned
     - [ ] peers-followed (by others, parsed from comments)
+    - [ ] peers-up (no net errors)
     - [ ] peers-down (net errors)
+    - [ ] peers-valid (up and parsed at least 1 message)
     - [ ] redirects?
   Rough sketch from late 2019:
     let read file =
@@ -99,9 +113,19 @@ In-progress
 
 Backlog
 -------
-- [ ] Crawl all cache/objects/*, not given peers.
-  BUT, in order to build A-mentioned-B graph, we need to know the nick
-  associated with the URI whos object we're examining. How to do that?
+- [ ] Batch download jobs by domain:
+  - at most 1 worker per domain
+  - more than 1 domain per worker is OK
+- [ ] Remove mention link noise in read view.
+  in short view: just abbreviate @ to @nick
+  in long view: abbreviate like above AND list the full versions after the text
+- [ ] Crawl only valid objects
+  REQUIRES: peers-valid ref file update
+- [ ] Reduce log noise
+- [ ] Parallelize crawling by file
+- [ ] Parallelize reading by file
+- [ ] Support date without time in timestamps
+- [ ] Associate cached object with nick.
 - [ ] Crawl downloaded web access logs
   - [ ] download-command hook to grab the access logs
 
@@ -136,10 +160,8 @@ Backlog
   - [ ] download times per peer
 - [ ] Support redirects
   - should permanent redirects update the peer ref somehow?
-- [ ] Support time ranges (i.e. reading the timeline between given time points)
 - [ ] optional text wrap
 - [ ] write
-- [ ] timeline limits
 - [ ] peer refs set operations (perhaps better done externally?)
 - [ ] timeline as a result of a query (peer ref set op + filter expressions)
 - [ ] config files
@@ -168,6 +190,8 @@ Backlog
 
 Done
 ----
+- [x] Crawl all cache/objects/*, not given peers.
+- [x] Support time ranges (i.e. reading the timeline between given time points)
 - [x] Dedup read-in peers before using them.
 - [x] Prevent redundant downloads
   - [x] Check ETag
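The "timeline limits: by msg count" item (per peer, then total) could work roughly as follows. This is a minimal Python sketch, not tt's Racket code; the `Msg` type, its numeric `ts` field (assumed to be seconds since the Unix epoch), and `limit_timeline` are hypothetical names chosen for illustration. Like `tail`, it keeps the newest messages.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Msg(NamedTuple):
    peer: str   # hypothetical: nick or URI of the peer that posted it
    ts: float   # hypothetical: timestamp as seconds since the Unix epoch
    text: str


def limit_timeline(
    msgs: List[Msg],
    per_peer: Optional[int] = None,
    total: Optional[int] = None,
) -> List[Msg]:
    """Keep the newest `per_peer` messages of each peer, then the newest
    `total` messages overall. The result is ordered oldest first."""
    ordered = sorted(msgs, key=lambda m: m.ts)
    if per_peer is not None:
        counts: Dict[str, int] = defaultdict(int)
        kept: List[Msg] = []
        for m in reversed(ordered):  # walk newest first
            if counts[m.peer] < per_peer:
                counts[m.peer] += 1
                kept.append(m)
        ordered = kept[::-1]  # back to oldest first
    if total is not None:
        # Guard: ordered[-0:] would return the whole list, not an empty one.
        ordered = ordered[-total:] if total > 0 else []
    return ordered
```

Applying the per-peer limit before the total one means a single chatty peer cannot crowd everyone else out of the final window. Sorting by `ts` assumes timestamps were already normalized to one time base.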
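The three-step plan for reading "following" from timeline comments can be sketched like this. It assumes twtxt-style files, where a message line is a timestamp, a TAB, and the text, and it assumes the value of `# following =` is a nick and a URI separated by whitespace; the TODO does not spell that format out. `split_lines`, `parse_message`, `parse_following`, and `Following` are hypothetical names, and Python stands in for tt's Racket.

```python
from typing import List, NamedTuple, Optional, Tuple


class Following(NamedTuple):
    nick: str
    uri: str


def split_lines(text: str) -> Tuple[List[str], List[str]]:
    """Step 1: split a file's lines into comments and messages."""
    comments: List[str] = []
    messages: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines belong to neither group
        if stripped.startswith("#"):
            comments.append(stripped)
        else:
            messages.append(line)
    return comments, messages


def parse_message(line: str) -> Optional[Tuple[str, str]]:
    """Step 2 stand-in: a twtxt message is timestamp, TAB, text."""
    ts, sep, text = line.partition("\t")
    return (ts, text) if sep else None


def parse_following(comment: str) -> Optional[Following]:
    """Step 3: recognize '# following = <nick> <uri>' (assumed format).
    Any other comment yields None, leaving room for "what else?"."""
    # partition() splits at the first '=', so a '=' inside the URI is kept.
    key, sep, value = comment.lstrip("#").partition("=")
    if not sep or key.strip() != "following":
        return None
    parts = value.split()
    if len(parts) != 2:
        return None
    return Following(nick=parts[0], uri=parts[1])
```

Keeping comment dispatch separate from message parsing means new comment keys can be added without touching the message path.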
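One way to satisfy "Batch download jobs by domain: at most 1 worker per domain, more than 1 domain per worker is OK" is to group URLs by host and hand each host's whole group to a single worker. The greedy least-loaded assignment, the thread pool, and the `fetch` callback (a hypothetical stand-in for the real downloader) are assumptions of this Python sketch, not tt's design.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit


def group_by_domain(urls: List[str]) -> Dict[str, List[str]]:
    """Bucket URLs by host name, preserving input order within a bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname or ""].append(url)
    return dict(groups)


def assign_batches(groups: Dict[str, List[str]], n_workers: int) -> List[List[str]]:
    """Put each domain's whole URL list into exactly one batch (so at most one
    worker per domain); a batch may hold several domains. Largest domains go
    first, each onto the currently smallest batch. Empty batches are dropped."""
    batches: List[List[str]] = [[] for _ in range(n_workers)]
    for domain in sorted(groups, key=lambda d: len(groups[d]), reverse=True):
        smallest = min(range(n_workers), key=lambda i: len(batches[i]))
        batches[smallest].extend(groups[domain])
    return [batch for batch in batches if batch]


def download_all(
    urls: List[str], fetch: Callable[[str], str], n_workers: int = 4
) -> Dict[str, str]:
    """Run one thread per batch; URLs inside a batch are fetched sequentially."""
    batches = assign_batches(group_by_domain(urls), n_workers)

    def run(batch: List[str]) -> List[Tuple[str, str]]:
        return [(url, fetch(url)) for url in batch]

    results: Dict[str, str] = {}
    # max(1, ...) because ThreadPoolExecutor rejects max_workers=0.
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        for pairs in pool.map(run, batches):
            results.update(pairs)
    return results
```

Because a domain's URLs never span two batches, requests to any one host stay sequential while different hosts are fetched in parallel.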