I love how I take the time to organize data and provide it free for people to use, and yet there is always someone who will complain about the file format.
If you're processing 500 megs of data and complaining about 300k of extra delimiter characters, you need to re-evaluate your priorities.
Exactly. Everyone has their preference, but JSON is usually very adaptable for all kinds of nested data structures. But if someone is going to complain about 300k of newlines used as record separators in 500 megs of data, I don't know what else to say. :)
I may be slightly biased, but BigQuery is of course the best for this kind of conversion :) You can load NDJSON (for free), run a query to transform the data if you want, and then extract to CSV (for free).
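If you'd rather not round-trip through BigQuery for a small file, the same NDJSON-to-CSV conversion is easy locally. A minimal Python sketch (the record shape and filenames are made up; it assumes flat records that all share the same keys, since nested structures would need flattening first):

```python
import csv
import json


def ndjson_to_csv(ndjson_lines, out_file):
    """Convert an iterable of NDJSON lines to CSV written to out_file.

    Assumes every record is a flat JSON object with the same keys.
    """
    writer = None
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if writer is None:
            # Use the first record's keys as the CSV header row.
            writer = csv.DictWriter(out_file, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)


if __name__ == "__main__":
    import io

    data = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
    out = io.StringIO()
    ndjson_to_csv(data.splitlines(), out)
    print(out.getvalue())
```

Because NDJSON is one record per line, this streams in constant memory; you never have to hold the whole 500 megs at once.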
Yes, exactly. BigQuery already has support for this type of file, so that's one of the advantages of releasing data in NDJSON format. Yes, the keys are repeated in every record, but that is something any decent compression library can easily handle.
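The repeated-keys point is easy to check empirically. A quick sketch with Python's gzip module (the record shape is invented for illustration):

```python
import gzip
import json

# Build an NDJSON payload where every record repeats the same keys.
records = [
    {"id": i, "name": f"user{i}", "score": i * 3, "active": i % 2 == 0}
    for i in range(1000)
]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

compressed = gzip.compress(raw)

# The repeated key names are exactly the kind of redundancy that
# gzip's dictionary coding removes, so the ratio is large.
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

In other words, the per-record key overhead that makes NDJSON look verbose mostly disappears the moment the file is compressed for distribution.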


