Tools for bulk indexing of WARC/ARC files on Hadoop/EMR https://github.com/ikreymer/webarchive-indexing … - still new goodness for #webarchiving from @webrecorder_io
@atomotic Thanks! Still very much a work in progress, but updating docs to make it easier to use. EMR/S3 specific, plan to generalize
@webrecorder_io config.archive_paths could be pointed to an S3 bucket, right?
@atomotic Yes, using HTTP url though only if bucket is public (for now). Added more detailed reference: https://github.com/ikreymer/pywb/wiki/Additional-Archive-Loading-Options …
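The reply above says archive_paths can take an HTTP URL as long as the bucket is public. A minimal sketch of building such a URL prefix follows. The bucket name, the prefix, and the helper function are hypothetical, and whether pywb wants a trailing slash or this exact URL form is an assumption; the linked wiki page is the reference for the actual config.

```python
from urllib.parse import quote


def public_s3_http_prefix(bucket: str, prefix: str = "") -> str:
    """Build a virtual-hosted-style HTTP URL prefix for a public S3 bucket.

    Hypothetical helper for illustration; it is not part of pywb.
    """
    cleaned = prefix.strip("/")
    base = f"http://{bucket}.s3.amazonaws.com/"
    # Percent-encode the key prefix, keeping "/" as a path separator.
    return base + (quote(cleaned) + "/" if cleaned else "")


# Hypothetical bucket and key prefix; replace with a real public bucket.
archive_paths = public_s3_http_prefix("example-warc-bucket", "warcs")
print(archive_paths)
```

The resulting string is what would be tried as the archive_paths value; a private bucket would not work this way, per the reply.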
@webrecorder_io just about s3, i was thinking to run pywb on aws elasticbeanstalk or ecs container service (docker)