I am finalizing the code for streaming Reddit data into #bigquery. I hope to have it completed by this evening. You will be able to query BQ for Reddit comments, submissions and subreddit data once I have completed the code. The stream is near real-time and will end up in ...
Google's BigQuery within a couple of seconds of being posted to Reddit. The tables will use BQ's time partitioning, which means that a query specifying a time range is only charged for the data in the partitions it scans. has been a ..
huge help during this process and I really appreciate his assistance in making this possible for data lovers who enjoy using BigQuery. Remember, you get a terabyte of data processing free each month -- so have at it once I announce its completion!
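To make the partition point concrete, here is a rough sketch of a query that filters on created_utc so that only the matching day partitions are scanned. The table name `my-project.reddit.comments` and the `subreddit` column are placeholders I made up for illustration; the real names will depend on how the tables are published.

```python
from datetime import datetime, timezone

# Hypothetical table name; the thread does not say where the data will live.
TABLE = "my-project.reddit.comments"


def ts_literal(dt: datetime) -> str:
    """Format a timezone-aware datetime as a BigQuery TIMESTAMP literal in UTC."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime(
        "TIMESTAMP '%Y-%m-%d %H:%M:%S+00:00'"
    )


def build_query(start: datetime, end: datetime) -> str:
    """Build SQL that counts comments per subreddit in the range [start, end).

    Filtering directly on created_utc (the partition field) with constant
    bounds is what lets BigQuery skip partitions outside the range.
    """
    if start >= end:
        raise ValueError("start must be before end")
    return (
        "SELECT subreddit, COUNT(*) AS n_comments\n"  # `subreddit` is an assumed column
        f"FROM `{TABLE}`\n"
        f"WHERE created_utc >= {ts_literal(start)}\n"
        f"  AND created_utc < {ts_literal(end)}\n"
        "GROUP BY subreddit\n"
        "ORDER BY n_comments DESC"
    )


if __name__ == "__main__":
    # One day of comments: only that day's partition should be scanned.
    print(build_query(
        datetime(2019, 1, 1, tzinfo=timezone.utc),
        datetime(2019, 1, 2, tzinfo=timezone.utc),
    ))
```

The printed SQL can be pasted into the BigQuery console. A query without the created_utc filter would scan every partition and use up far more of the free monthly terabyte.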
Replying to
I am creating a time (day) partition on the created_utc field for both comments and submissions, so yes, they should be partitioned correctly on the field that Reddit uses for its timestamp. The field that I add (retrieved_on) will also be a timestamp field, but I am not partitioning on it. Also, when I add the subreddit table, there will be a partition on created_utc for subreddit objects as well.
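For anyone curious what that setup might look like, here is a minimal sketch that generates BigQuery DDL for a table partitioned by day on created_utc. This is not the actual schema: the dataset name `my-project.reddit` is a placeholder, and only the two timestamp fields mentioned above are shown where the real tables would have many more columns.

```python
def partitioned_table_ddl(table: str, partition_field: str = "created_utc") -> str:
    """Return DDL for a table partitioned by day on `partition_field`.

    Only the two timestamp columns named in the thread are included;
    every other column is omitted from this sketch.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        f"  {partition_field} TIMESTAMP,  -- Reddit's own timestamp, used for partitioning\n"
        "  retrieved_on TIMESTAMP  -- added at ingest time, not partitioned\n"
        ")\n"
        f"PARTITION BY DATE({partition_field})"
    )


if __name__ == "__main__":
    # Hypothetical dataset; one table per object type mentioned in the thread.
    for name in ("comments", "submissions", "subreddits"):
        print(partitioned_table_ddl(f"my-project.reddit.{name}"))
        print()
```

Partitioning on created_utc rather than retrieved_on means time-range filters line up with when something was posted to Reddit, not when it was ingested.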

