+- [ ] Batch download jobs by domain:
+ - at most 1 worker per domain
+ - more than 1 domain per worker is OK
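+
+  A minimal sketch of the grouping, assuming jobs are plain URI strings
+  and net/url's url-host names the domain (jobs-by-domain is a
+  hypothetical name):
+
+  (require net/url)
+
+  ;; One queue of jobs per domain: a worker owns whole queues, so no
+  ;; domain is ever hit by two workers, but one worker may drain
+  ;; several domains' queues.
+  (define (jobs-by-domain uris)
+    (for/fold ([groups (hash)]) ([u (in-list uris)])
+      (define domain (url-host (string->url u)))
+      (hash-update groups domain (λ (js) (cons u js)) '())))
+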
+- [ ] Remove mention link noise in read view.
+ in short view: just abbreviate @<nick uri> to @nick
+  in long view: abbreviate as above AND list the full versions after the text
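+
+  A rough sketch of the abbreviation, assuming mentions have the
+  @<nick uri> form (mention-px and abbreviate-mentions are made-up names):
+
+  ;; Rewrite "@<nick uri>" to "@nick"; also return the full matches so
+  ;; the long view can list them after the text.
+  (define mention-px #px"@<([^ >]+) +([^>]+)>")
+  (define (abbreviate-mentions text)
+    (values (regexp-replace* mention-px text "@\\1")
+            (regexp-match* mention-px text)))
+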
+- [ ] Crawl only valid objects
+ REQUIRES: peers-valid ref file update
+- [ ] Reduce log noise
+- [ ] Parallelize crawling by file
+- [ ] Parallelize reading by file
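+
+  One way to sketch both, with plain Racket threads (process-file stands
+  in for the actual crawl or read step; the work is IO-bound, so green
+  threads should be enough):
+
+  ;; Run process-file on every path in its own thread, then wait for all.
+  (define (for-each-file-parallel process-file paths)
+    (for-each thread-wait
+              (for/list ([p (in-list paths)])
+                (thread (λ () (process-file p))))))
+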
+- [ ] Support date without time in timestamps
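+
+  A possible approach: normalize date-only strings to midnight UTC and
+  leave the existing RFC 3339 parser unchanged (normalize-timestamp is a
+  hypothetical name):
+
+  ;; "2021-01-25" has no time part; append one so the usual parser
+  ;; accepts it.
+  (define (normalize-timestamp s)
+    (if (regexp-match? #px"^[0-9]{4}-[0-9]{2}-[0-9]{2}$" s)
+        (string-append s "T00:00:00Z")
+        s))
+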
+- [ ] Associate cached object with nick.
+- [ ] Crawl downloaded web access logs
+- [ ] download-command hook to grab the access logs
+
+  ;; Twtxt clients advertise themselves in the User-Agent header, e.g.
+  ;; "twtxt/1.2.3 (+https://example.net/twtxt.txt; @somebody)"; pull the
+  ;; (nick . uri) pair out of such lines, #f for everything else.
+  (define (parse log-line)
+    (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
+      [(list _ client version uri nick) (cons nick uri)]
+      [_ #f]))
+
+  ;; Unique (nick . uri) pairs seen in one access log:
+  (list->set (filter-map parse (file->lines "logs/combined-access.log")))
+
+  ;; Plain files under logs-dir; #:build? #t so file-or-directory-type
+  ;; gets full paths rather than names relative to logs-dir:
+  (filter (λ (p) (equal? 'file (file-or-directory-type p)))
+          (directory-list logs-dir #:build? #t))
+
+- [ ] Accept the user-agent file as a CLI option - at least the crawler needs to run as another user
+- [ ] Support fetching rsync URIs
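+
+  Could shell out to the system rsync rather than reimplementing the
+  protocol; a sketch (rsync-fetch and the flag choice are assumptions):
+
+  (require racket/system)
+
+  ;; Copy one remote file to dst-path; system* returns #t on success.
+  (define (rsync-fetch uri dst-path)
+    (system* (find-executable-path "rsync") "--timeout=30" uri dst-path))
+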
+- [ ] Check for peer duplicates:
+ - [ ] same nick for N>1 URIs
+ - [ ] same URI for N>1 nicks
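+
+  Both checks are the same inversion; a sketch assuming peers is a list
+  of (nick . uri) pairs (duplicates and peers are hypothetical names):
+
+  ;; Map each key to the set of values seen with it; keep only keys
+  ;; bound to more than one value.
+  (define (duplicates pairs)
+    (define seen
+      (for/fold ([h (hash)]) ([p (in-list pairs)])
+        (hash-update h (car p) (λ (s) (set-add s (cdr p))) (set))))
+    (for/hash ([(k vs) (in-hash seen)] #:when (> (set-count vs) 1))
+      (values k vs)))
+
+  (duplicates peers)                                        ; nick -> URIs
+  (duplicates (map (λ (p) (cons (cdr p) (car p))) peers))   ; URI -> nicks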