Tools for bulk indexing of WARC/ARC files on Hadoop/EMR https://github.com/ikreymer/webarchive-indexing … - still new goodness for #webarchiving from @webrecorder_io
Replying to @webrecorder_io
@webrecorder_io config.archive_paths could be pointed to an S3 bucket, right?
Replying to @atomotic
@atomotic Yes, though only via an HTTP URL, and only if the bucket is public (for now). Added a more detailed reference: https://github.com/ikreymer/pywb/wiki/Additional-Archive-Loading-Options …
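Because public buckets are reachable over plain HTTPS, one way to fill in config.archive_paths is to translate an s3:// location into the bucket's public HTTPS endpoint. The sketch below makes a few assumptions. It uses the standard virtual-hosted-style S3 URL form (https://<bucket>.s3.amazonaws.com/<key>). It also assumes archive_paths is used as a directory-like prefix, so it adds a trailing slash. The function name and bucket names are hypothetical, and the wiki page above remains the reference for what pywb actually accepts.

```python
from urllib.parse import urlparse


def s3_to_public_http(s3_uri: str) -> str:
    """Hypothetical helper: map s3://bucket/prefix to a public HTTPS prefix.

    Assumes the virtual-hosted-style S3 endpoint and that the bucket is
    publicly readable (plain HTTP access only works for public buckets).
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")

    # Treat the path as a directory-like prefix: WARC/ARC filenames are
    # assumed to be appended to it, so make sure it ends with "/".
    key_prefix = parsed.path.lstrip("/")
    if key_prefix and not key_prefix.endswith("/"):
        key_prefix += "/"

    return f"https://{parsed.netloc}.s3.amazonaws.com/{key_prefix}"


if __name__ == "__main__":
    # Hypothetical bucket; the printed value is what would go into archive_paths.
    print(s3_to_public_http("s3://my-warc-bucket/collections/warcs"))
```

For s3://my-warc-bucket/collections/warcs this yields https://my-warc-bucket.s3.amazonaws.com/collections/warcs/. A private bucket would still need the non-HTTP loading options described on the wiki page.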