Only 100 per second? I was expecting quite a bit more; I sometimes see people using http://archive.is and WebCite, but most archived pages I see people sharing are through the Wayback Machine.
-
We have robots collecting at thousands per second, including robots dispatched by the 500 institutions subscribing to Archive-It ( https://archive-it.org ). The 100 per second are external users and robots. Lots o' robots. :)
-
It's a truly astonishing rate, and it's a real achievement that the archive's technical architecture supports it. It would be super interesting to try to characterize (not de-anonymize) the different types of actors & types of material that are being collected this way.
-
100 req/sec for on-demand archiving is an impressive rate. I think many tools are responsible for this, including:
* #ArchiveNow by @maturban1 @WebSciDL
* #SaveMyNews & #SavePageNow by @palewire
* #MassArchive by @josephfcox
An analysis would be great! http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html pic.twitter.com/vqGim2TrbH
-
Yes the tools generating the traffic are one thing, and then the resources being archived are another, right?
-
That's right, tools affect what is being requested via Save Page Now and how often. E.g., #Mink by @machawk1 & #SaveMyNews users will probably request the pages they visit in their browser, while many of the libraries listed above might be used in a more robotic way on a list of URLs.
-
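For illustration, the "robotic" pattern described above — looping over a list of URLs and submitting each one — can be sketched against the Wayback Machine's public Save Page Now endpoint (https://web.archive.org/save/&lt;url&gt;). This is a hedged sketch, not the implementation of any of the tools named; the one-second delay is a self-imposed politeness assumption, not a documented quota:

```python
# Minimal sketch of robotic Save Page Now submission over a URL list.
# Assumptions: the https://web.archive.org/save/<url> endpoint form,
# and a 1-second self-imposed delay between requests (not an official limit).
import time
import urllib.request

SPN_PREFIX = "https://web.archive.org/save/"

def spn_request_url(target: str) -> str:
    """Build the Save Page Now request URL for a target page."""
    return SPN_PREFIX + target

def archive_list(urls, delay=1.0):
    """Submit each URL for archiving, pausing between requests."""
    for target in urls:
        # Opening the /save/ URL asks the Wayback Machine to capture the page.
        with urllib.request.urlopen(spn_request_url(target)) as resp:
            print(target, "->", resp.status)
        time.sleep(delay)  # be polite: self-imposed rate limit

# Example (performs real network requests):
# archive_list(["https://example.com/"])
```

A browser-driven tool like #Mink would instead trigger one such request per page the user visits, which is why the traffic patterns differ.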
That sounds like a good hypothesis.
End of conversation
New conversation -
-
Is mass, machine-driven submission of URLs formally accepted? Are there any frequency or volume limits on doing this?
-
Congrats! We love and admire your work!
@netpreserve #archivosweb
-