A Google security employee once replied to my musings with “I’ll look into that; I have a copy of the Internet at the office”. The threat intel possibilities of an exabyte-scale data warehouse shouldn’t be underestimated. https://twitter.com/jeremiahg/status/1167439062128443392
Replying to @alexstamos
While I don’t know for certain, last time I asked, internally they didn’t have the capability to search the HTML source for strings.
1 reply 1 retweet 3 likes -
Replying to @jeremiahg @alexstamos
Huh? Of course we do. Tell them to ask me for pointers to codelabs
1 reply 0 retweets 59 likes -
Replying to @taviso @alexstamos
Fantastic. Thanks for clarifying. Any chance this functionality is publicly available?
2 replies 0 retweets 3 likes -
Replying to @jeremiahg @alexstamos
No, but I suppose you could download a public corpus like @commoncrawl.
2 replies 1 retweet 16 likes -
Several hundred TB later. Gonna need a bigger boat!
1 reply 0 retweets 5 likes -
Right, probably not going to work on a laptop, might need some GCE time
2 replies 0 retweets 13 likes -
Hey, Jeremiah asked
If there's a way to grep petabyte scale corpora for free, I don't know what it is!
I know. I fell smack into that one.
1 reply 0 retweets 1 like -
You get the steak knives, @taviso. @jeremiahg, third place. You're fired.
1 reply 0 retweets 1 like -