Just to give you an idea of how complicated this table schema was to generate, take a look at the JSON representation for the #bigquery table schema for #reddit submission objects: jsonblob.com/8704528b-5e72-
Here is the text representation of all the fields that are possible for Reddit submissions. I actually had to write a new parser to improve how #BigQuery discovers fields during schema autodetection.
pastebin.com/HFW0tkKn
Apparently BQ (when set to autodetect the schema) only samples the first X rows of data and generates a schema from those. The problem is that if a record 50,000 rows in contains a field BQ has never encountered before, the load throws an error, because BQ assumes it has already discovered the complete schema. The trick is to generate a single row containing every possible nested key found across the entire data set, and then let #bigquery use that row as a reference when it autodetects the schema.
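As a rough sketch of that trick (not the parser mentioned above, just an illustration): scan every NDJSON record, recursively merge each record's keys into one "superset" template row, and prepend that row to the file you load. The field names and sample records below are made up for the example.

```python
import json

def merge_keys(template, record):
    """Recursively merge record's keys into template, keeping one
    representative value per nested key seen anywhere in the data."""
    for key, value in record.items():
        if isinstance(value, dict):
            template.setdefault(key, {})
            if isinstance(template[key], dict):
                merge_keys(template[key], value)
        else:
            # Keep the first non-dict value encountered for this key.
            template.setdefault(key, value)
    return template

def build_reference_row(ndjson_lines):
    """Build a single row containing every nested key in the data set."""
    template = {}
    for line in ndjson_lines:
        line = line.strip()
        if line:
            merge_keys(template, json.loads(line))
    return template

# Hypothetical records, each missing fields the others have.
rows = [
    '{"id": "a1", "media": {"type": "image"}}',
    '{"id": "b2", "score": 42}',
    '{"id": "c3", "media": {"oembed": {"url": "http://x"}}}',
]
ref = build_reference_row(rows)
# ref now contains id, score, media.type, and media.oembed.url together.
```

Writing `json.dumps(ref)` as the first line of the load file puts every field inside the rows BQ samples, so a load with `bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON` should see the full schema up front.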

