Headless Chromium looks very promising for properly archiving web pages: https://chromium.googlesource.com/chromium/src/+/master/headless/README.md … https://docs.google.com/document/d/11zIkKkLBocofGgoTeeyibB2TZ_k7nR78v7kNelCatUE/edit# … https://bugs.chromium.org/p/chromium/issues/detail?id=546953 …
Replying to @FiloSottile
do you know about https://github.com/internetarchive/brozzler … ?
Replying to @atomotic
sleek, I didn't. It looks like they could now use headless mode to save memory. However, it does not save the computed HTML, does it?
Replying to @FiloSottile
no, it just extracts links. Then you need Heritrix (or another WARC tool) to save the pages. Here's an old tutorial: https://sbforge.org/display/NAS/Experience+With+IA+Umbra …
Replying to @atomotic
what I want is to run the JS, then freeze the HTML and save that, so that I get the content loaded by AJAX, archive.is style
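A minimal sketch of the "run JS, then freeze the HTML" idea using headless Chromium from Python. It assumes a Chromium build whose headless mode accepts `--dump-dom` (print the serialized DOM to stdout) and `--virtual-time-budget` (milliseconds of virtual time to let scripts and AJAX settle); the binary name `chromium` and the helper names are illustrative, not part of any tool mentioned in the thread.

```python
import subprocess
from typing import List


def build_snapshot_command(chrome_binary: str, url: str, budget_ms: int = 5000) -> List[str]:
    """Build a headless Chromium command line that prints the DOM after scripts run.

    Assumption: the binary supports --headless, --dump-dom and
    --virtual-time-budget (virtual milliseconds to wait for JS/AJAX).
    """
    return [
        chrome_binary,
        "--headless",
        "--disable-gpu",
        f"--virtual-time-budget={budget_ms}",
        "--dump-dom",
        url,
    ]


def snapshot_dom(chrome_binary: str, url: str, budget_ms: int = 5000,
                 timeout_s: float = 60.0) -> str:
    """Run headless Chromium and return the frozen DOM as an HTML string."""
    result = subprocess.run(
        build_snapshot_command(chrome_binary, url, budget_ms),
        capture_output=True,  # collect stdout, where --dump-dom writes the HTML
        text=True,
        timeout=timeout_s,
        check=True,  # raise if Chromium exits with an error
    )
    return result.stdout
```

Usage would look like `html = snapshot_dom("chromium", "https://example.com/")`, then writing `html` to a file. This only freezes the top-level document's markup; iframe contents and subresources such as images are probably not captured, which is where WARC-based tools come in.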
Replying to @FiloSottile
the right solution for this problem is @webrecorder_io, even if it's not automatic: http://labs.rhizome.org/iipc2016/workshop.html#/9 …
Replying to @atomotic @FiloSottile
yep, you can interact with the page for a dynamic archive; there's also a 'static snapshot' option to save the DOM at any point.
The DOM snapshot needs some iframe-handling improvements; working on a new release now. Automation eventually, but not yet.