Just to give you an idea of how complicated this table schema was to generate, take a look at the JSON representation for the #bigquery table schema for #reddit submission objects: jsonblob.com/8704528b-5e72-
Here is the text representation of all the possible fields for Reddit submissions. I actually had to write a new parser for #BigQuery to improve how it discovers fields for its autodetection.
pastebin.com/HFW0tkKn
Apparently BQ (when set to autodetect the schema) will only sample the first X rows of data and then generate a schema from those. The problem is that if a record 50,000 rows in contains a new field that BQ never encountered before, it will throw an error, because it thinks it has already discovered the correct schema. The trick is to generate a single row containing every possible nested key from the entire data set and then let #bigquery use that row as a reference when it autodetects the schema.
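That reference-row trick can be sketched roughly like this (a minimal illustration, not the actual parser from the thread; it assumes newline-delimited JSON and handles nested objects but not arrays of objects, and the function names are made up for the example):

```python
import json


def merge_keys(template, record):
    """Recursively fold a record's keys into the template dict,
    keeping one example value per key so BigQuery's autodetect
    sees every field at least once."""
    for key, value in record.items():
        if isinstance(value, dict):
            existing = template.get(key)
            template[key] = merge_keys(
                existing if isinstance(existing, dict) else {}, value
            )
        elif key not in template or template[key] is None:
            # Prefer a non-null example value so the type is detectable.
            template[key] = value
    return template


def build_reference_row(ndjson_lines):
    """Scan the entire data set and build one superset row
    containing every nested key that appears anywhere."""
    template = {}
    for line in ndjson_lines:
        merge_keys(template, json.loads(line))
    return template
```

Prepending `json.dumps(build_reference_row(lines))` as the first line of the NDJSON file means the superset row falls inside whatever sample window autodetect uses, so no later row can surprise it with an unseen field.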
