Conversation

Replying to
Great work! Very interested. Are there docs I can read to understand how the data were collected? Streaming or retrospective scraping? I'm particularly interested in whether the dataset includes deleted posts.
Replying to
Some of the posts were collected historically, but we are now ingesting in real time. I'll be releasing the first "corpus" soon, which will have a better explanation, but in the meantime you can compare the created time with the retrieved time to get a sense of how quickly each post was captured.
Replying to
Awesome. This was something I've been meaning to look at, but I was put off because I had no API access, unlike with Twitter. Having issues though: when parsing and flattening, does ndjson expand in memory the way .jsonl files do?
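For what it's worth, ndjson and .jsonl are both line-delimited JSON, so memory growth usually comes from loading and flattening every record at once rather than from the format itself. Below is a minimal Python sketch of the streaming alternative, assuming one JSON object per line. The file name `sample.ndjson`, the field names (`created`, `retrieved`, `author`), and the helpers `flatten` and `iter_flat_records` are hypothetical and are not taken from the dataset.

```python
import json
import os
import tempfile


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def iter_flat_records(path):
    """Yield one flattened record per line, so the file is never fully loaded."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield flatten(json.loads(line))


# Hypothetical sample data standing in for the real file.
sample = [
    {"id": 1, "author": {"name": "a"}, "created": 100, "retrieved": 105},
    {"id": 2, "author": {"name": "b"}, "created": 200, "retrieved": 201},
]
path = os.path.join(tempfile.mkdtemp(), "sample.ndjson")
with open(path, "w", encoding="utf-8") as handle:
    for rec in sample:
        handle.write(json.dumps(rec) + "\n")

# Capture lag per post (retrieved minus created), computed one record at a time.
lags = [rec["retrieved"] - rec["created"] for rec in iter_flat_records(path)]
```

Because `iter_flat_records` is a generator, each record can be written out or aggregated as it arrives, so memory use is bounded by the largest single line rather than by the size of the file.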
Replying to
Thanks! Shame that I am a Python noob, so I'd rather stay in R unless it's absolutely necessary to jump ship. Master, can I read this large ndjson file using a combination of `readLines` and `ndjson::stream_in`, exporting each entry as I go rather than trying to read the whole file at once?