As someone who actually kinda likes data munging, I agree. Cleaning is part of EDA. What I suspect most people don't like about data cleaning is repetition (which can be scripted away) and opaque errors in your tooling (which is not a problem with data cleaning in and of itself). https://twitter.com/Randy_Au/status/1304121716831064065
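Scripting away the repetition can be as simple as collecting the cleaning rules you keep reapplying into one function and running it over every record. A minimal stdlib-Python sketch, assuming records arrive as a list of dicts of raw strings; the null tokens and parsing rules below are hypothetical examples, not anything the thread specifies:

```python
# Hypothetical set of tokens treated as missing; adjust to your own data.
NULL_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def clean_value(raw):
    """Normalize one raw string field: trim, map null-ish tokens to None, parse numbers."""
    value = raw.strip()
    if value.lower() in NULL_TOKENS:
        return None
    # Simplifying assumption: commas are thousands separators in numeric fields.
    candidate = value.replace(",", "")
    try:
        return int(candidate)
    except ValueError:
        pass
    try:
        return float(candidate)
    except ValueError:
        # Not numeric: keep the trimmed original text (commas intact).
        return value


def clean_records(records):
    """Apply clean_value to every field of every record (a list of dicts)."""
    return [{key: clean_value(val) for key, val in rec.items()} for rec in records]


rows = [
    {"name": " Ada ", "score": "1,024", "weight": "n/a"},
    {"name": "Bob", "score": "3.5", "weight": ""},
]
print(clean_records(rows))
```

Once the rules live in one function, the next messy file costs one call instead of another round of hand edits.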
Replying to @Mike_Kaminsky
Pandas actually has a bunch of really nice utilities built into it; they're just impossible to find. IMO it suffers from fragmentation, but it is an open source project and incredible for what it is.
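A few built-in pandas utilities that fit this description, picked here for illustration (the tweet doesn't name specific ones): `DataFrame.pipe` for chaining your own cleaning functions, `DataFrame.assign`, the `.str` accessor, and `pd.to_numeric(errors="coerce")`. The column names and the helpers `normalize_columns` and `coerce_numeric` are hypothetical, a sketch rather than a recommended pipeline:

```python
import pandas as pd


def normalize_columns(df):
    # Hypothetical helper: trimmed, lowercase, snake_case column names.
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def coerce_numeric(df, cols):
    # Hypothetical helper: errors="coerce" turns unparseable entries into NaN instead of raising.
    return df.assign(**{c: pd.to_numeric(df[c], errors="coerce") for c in cols})


raw = pd.DataFrame({" Name ": [" Ada ", "Bob", "Cy"], "Score": ["10", "oops", "7.5"]})

clean = (
    raw.pipe(normalize_columns)                        # pipe passes the frame as the first argument
    .assign(name=lambda d: d["name"].str.strip())      # .str accessor for vectorized string methods
    .pipe(coerce_numeric, cols=["score"])              # extra kwargs are forwarded by pipe
)
print(clean)
```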
Replying to @imightbemary @Mike_Kaminsky
But to your point, it doesn't change the fact that software engineering is not the area of expertise of most DS. That's not what they're hired to do, not what they want to do, and not how they're best positioned to add value.
Replying to @Mike_Kaminsky
Always good to dream! I'm more fluent in Python than R, but whenever I switch over to R world, I always marvel at how R is stats and analysis first. When it comes to the world of DS, Python certainly lives up to its reputation as the second-best language for everything.
For sure, and you see a lot of data analysis tools in Python trying to copy it. pandas is basically an attempt at porting data.frames, statsmodels takes R-style formulas, seaborn is ggplot...
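For a sense of what "R-style formulas" means in practice, here is a sketch using statsmodels' formula interface (`statsmodels.formula.api`), which fits roughly what R's `lm(y ~ x, data = df)` would. The synthetic data (true intercept 2, slope 3, small Gaussian noise with a fixed seed) is made up for the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: y = 2 + 3x plus small noise, seeded for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0.0, 10.0, 50)})
df["y"] = 2.0 + 3.0 * df["x"] + rng.normal(0.0, 0.1, size=len(df))

# The formula string plays the same role as in R: response ~ predictors.
model = smf.ols("y ~ x", data=df).fit()
print(model.params)  # pandas Series indexed by "Intercept" and "x"
```

The formula replaces building the design matrix by hand, and the intercept term is added automatically, as in R.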

