So, what's a good speed for CSV parsing? And what would people expect to get? I'm surveying the landscape and seeing a big range of functionality, from all-singing, all-dancing field recognition (date vs. floating point vs. int vs. string) to super-basic and/or wrong ...
I think handling CR-LF/CR/LF seamlessly has made me blind to whether they are used consistently. My instinct says "yes," but I'm not certain. As for quoting/escaping configs, yes, I don't think I've ever seen a request to support multiple such configs in a single CSV file.
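For anyone who wants to check the consistency question on their own files, here is a rough Python sketch. It assumes double-quote quoting with doubled quotes as the escape, and `count_line_endings` is a hypothetical helper name, not something from an existing library. It counts record terminators outside quoted fields, so a newline embedded in a quoted field is not counted.

```python
from collections import Counter


def count_line_endings(data: bytes, quote: bytes = b'"') -> Counter:
    """Count CRLF, CR, and LF record terminators found outside quoted fields."""
    counts = Counter()
    in_quotes = False
    quote_byte = quote[0]
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == quote_byte:
            # A doubled quote inside a quoted field toggles twice,
            # so the quoted state is unchanged afterwards.
            in_quotes = not in_quotes
        elif not in_quotes:
            if b == 0x0D:  # CR
                if i + 1 < n and data[i + 1] == 0x0A:
                    counts["CRLF"] += 1
                    i += 1  # skip the LF half of the pair
                else:
                    counts["CR"] += 1
            elif b == 0x0A:  # LF
                counts["LF"] += 1
        i += 1
    return counts


# Mixed endings, plus an LF inside a quoted field that must not be counted.
sample = b'a,b\r\n"x\ny",2\n3,4\r5,6\r\n'
print(count_line_endings(sample))
```

More than one nonzero key in the result means the file mixes line endings. Files that use backslash escaping or unbalanced quotes would need different handling.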
-
And yeah, Python was my primary motivation. Just imagine how many broken CSV files are parsed by random Python scripts in the wild. Go's CSV parser was a good example of what NOT to do. (It is super slow and rigidly follows the RFC, which means it barfs quite a bit.)
-
I imagine the RFC leaves a lot to be desired; it obviously predates inclusion of UTF-8 (unless there is an addendum somewhere?).
@lemire and I were pretty strict in simdjson as there's a lot of conformant JSON out there, but CSV has quite a rep (from what I understand Excel ...