Like, there's no good reason why I shouldn't be able to download all the publicly posted reviews of a particular movie on a major website. I can do it for the dominant gaming website, but not for the dominant movie website.
And that means I can't collect any actual data on questions about the differences between gaming and movie audiences (stuff like "Why is The Last of Us Part II successful and Atomic Blonde wasn't? Why was The Last of Us Part I successful, and Hanna (2011) wasn't?")
Or rather, I have to overcome arbitrary obstacles to do this. The '10s were a golden age for automated data research, and I just missed them. The dominance of CMSs like WordPress, plus Twitter's old open API (which has since been completely removed), made data harvesting so feasible.
Again, if you're uncomfortable with the idea of researchers scraping data, consider that we have ethical rules, and the whole point is that we're doing it in huge batches. We're not microtargeting, like the companies that actually run these sites are doing.
I'm not actually certain that the increased difficulty of web scraping is a result of intentional anti-scraping moves, any more than WordPress being really easy to scrape was intentional. It's about how webpages are designed and served.
Replying to @BootlegGirl
I mean, Twitter now blocks the Internet Archive’s automated scraper, which is a level of adversarial that feels like it can’t not be intentional at this point
Replying to @chrysopoetics
I have mixed feelings about saving tweets and about the Internet Archive in general, as you know, but it's hard to argue that Twitter's motivation is out of genuine concern for privacy
Replying to @BootlegGirl @chrysopoetics
Twitter is one of the most aggressively "you must use our software like a black box closed source program" sites out there, more so than even Facebook, which is probably why Twitter's open source web frameworks aren't nearly as popular as Facebook's
Replying to @BootlegGirl @chrysopoetics
I mean they unfixably broke millions of dollars' worth of third-party app APIs, without even a compromise solution like requiring them to serve ads
Replying to @BootlegGirl @chrysopoetics
Also, Twitter is only able to keep itself unscrapable and un-app-integratable by making their first-party experience terrible too. This is why the pages load so fcked up. Bc the actual HTML structure IS obvious, so they only serve it in tiny doses and change it often
It used to be relatively trivial to just "go back in time" and read someone's whole Twitter feed from a given day in history and now it's extremely finicky and difficult