Hey data engineering twitter, what's the current state of the art in #pandas-to-postgres or Arrow-to-postgres? Redshift supports COPY directly on parquet files but regular old postgres doesn't. Is there a way to avoid the CSV format ser/de?
-
Also, don’t underestimate how inefficient pandas’ CSV read/write is when string columns are involved. It’s a huge bottleneck, and pandas significantly underperforms most other CSV read/write tools here.
-
Welp, we basically did every trick in the book and now it looks like you were right - to_csv is now the bottleneck like you said. Might try a parallel insert first, but this might be next.
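For anyone wondering what a "parallel insert" might look like here, a rough sketch, not what was actually run in this thread: split the rows into chunks, serialize each chunk to an in-memory CSV buffer with the standard-library `csv` module (so `to_csv` is not involved), and hand each buffer to a loader on its own worker thread. The names `rows_to_csv_buffer`, `chunked`, `parallel_copy`, and the `copy_chunk` callback are all hypothetical. In real use `copy_chunk` would presumably open its own connection and call something like psycopg2's `cursor.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv)", buf)`; the table name is an assumption too.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def rows_to_csv_buffer(rows):
    """Serialize a sequence of row tuples into an in-memory CSV buffer."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


def chunked(rows, chunk_size):
    """Yield successive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def parallel_copy(rows, copy_chunk, chunk_size=50_000, workers=4):
    """Serialize and load each chunk on a worker thread.

    copy_chunk is a hypothetical callback: it receives a file-like CSV
    buffer and returns the number of rows it loaded. Each call should
    use its own database connection, since connections are generally
    not safe to share across concurrent COPY operations.
    """
    def load(chunk):
        return copy_chunk(rows_to_csv_buffer(chunk))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load, chunked(rows, chunk_size)))
```

Caveat: the CSV serialization itself still runs under the GIL, so threads mostly help by overlapping the database round trips. If serialization stays the bottleneck, this alone probably won't fix it.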