I'm trying to find an automatic way to list all the websites that are inaccessible to EU users due to #GDPR. Let's do that together! 1/n
I downloaded the top 1000 sites from Alexa: curl -s -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip ; unzip -q -o top-1m.csv.zip top-1m.csv ; head -1000 top-1m.csv | cut -d, -f2 | cut -d/ -f1 > topsites.txt 2/n
It's not really working. The only output is: http://google.com 301 http://youtube.com 301 http://facebook.com 301 http://wikipedia.org 301 http://reddit.com 301 http://yahoo.com 301 Any suggestions, guys? 4/n
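The command that produced this output (3/n) is not shown in the thread, so the following is only a guess: a 301 for every site is what you would typically see if the request goes to the plain `http://` bare domain and redirects are not followed. A minimal sketch that follows redirects and prints the final status and URL instead; `check_site` and `describe_status` are hypothetical helper names, not from the thread.

```shell
#!/bin/sh
# Sketch only: assumes curl is installed and that redirects were not being followed.

# check_site DOMAIN
# Follows redirects (-L) and prints "<final status> <final URL>".
check_site() {
    curl -s -o /dev/null -L --max-time 10 \
         -w '%{http_code} %{url_effective}\n' "http://$1"
}

# describe_status CODE
# Maps a final status code to a rough label.
describe_status() {
    case "$1" in
        2??) echo "ok" ;;
        3??) echo "redirect" ;;
        451) echo "unavailable-for-legal-reasons" ;;
        000) echo "no-response" ;;   # curl prints 000 when it got no HTTP reply
        *)   echo "other" ;;
    esac
}

# Usage (needs network access and the topsites.txt file from 2/n):
#   while read -r site; do check_site "$site"; done < topsites.txt
```

HTTP 451 is the status code defined for "Unavailable For Legal Reasons", so it is worth flagging separately, although nothing in this thread confirms that the sites in question actually use it.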
Replying to @fs0c131y
Websites may not use HTTP status codes to signal their availability. For example, the Spotify website redirects to https://www.spotify.com/int/why-not-available/ … in countries where it is not available and returns a status code of 200! Maybe check the loaded page content for 'GDPR'?
Yep you are right, looking at the status code is stupid :)
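The content check suggested in the reply could be sketched roughly as below. `looks_blocked` is a hypothetical helper, and both the URL patterns and the keyword list are assumptions chosen for illustration (the Spotify-style "why-not-available" path comes from the reply; the rest is guesswork).

```shell
#!/bin/sh
# Sketch only: the patterns below are assumptions, not a vetted detection rule.

# looks_blocked FINAL_URL BODY_FILE
# Returns 0 (true) if the final URL or the page body hints at a regional block.
looks_blocked() {
    final_url=$1
    body_file=$2
    # A final URL like ".../why-not-available/" suggests a block page.
    case "$final_url" in
        *not-available*|*unavailable*) return 0 ;;
    esac
    # Otherwise search the downloaded body, case-insensitively.
    grep -q -i -E 'gdpr|not available in your (country|region)' "$body_file"
}

# Usage (needs network access; run from an EU IP address):
#   tmp=$(mktemp)
#   url=$(curl -s -L --max-time 10 -o "$tmp" -w '%{url_effective}' "http://example.com")
#   looks_blocked "$url" "$tmp" && echo "example.com may be blocked"
#   rm -f "$tmp"
```

Expect false positives: many sites that are fully accessible mention GDPR in a cookie banner or privacy notice, so a keyword match is a hint to review by hand rather than proof of a block.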