I can't believe my dumbass has stuck with Spark for so long. Dask solves 99.9% of my use case.
-
-
Esp. because I have this semi-packaged / partially reimplemented idea called vaquero for data cleaning that is *way* *way* more intuitive if it's just Python-native rather than PySpark-based. Basically, this: https://github.com/jbn/modpipe
-
Plus a smart error wrapper for interactive, iterative, and self-documenting data cleaning. My guess is that if your flows don't do it, it would fit well.
New conversation -
-
-
Nice. If you ever want to do anything large scale with OmniSci, let me know. We've got hardware :)
-
Next year, yes. My last few months have been frustratingly analysis-less. Been relearning scalable web-tech — I last did that style dev before Rails/merb merged. But soon!