I'm going to try crowdsourcing a data management problem: I've recently been acquiring lots of data in Excel format with tons of sub sheets, spread across a large number of workbooks. What's a reasonable strategy for converting this to something usable for analysis?
Replying to @cfcoverdale @jordan_mclay
I'm not really looking for an analytics platform (I can write my own code in R or Python for that). More looking for a sensible way to convert messy Excel files and store them.
Are you looking to merge the subsheets or just separate with more clarity/organization? And by “large number” do you mean too many for an RA to help organize?
Basically, it's monthly data with 200+ sub sheets and large file sizes. Separation might be ok (there is a primary key) but then lots of files! It feels like data that belongs in a proper database (read: SQL or some noSQL alternative) but I'm not sure about the right workflow.
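One possible shape for the "proper database" route, as a minimal sketch using only Python's standard-library `sqlite3`: each cell becomes one row in a long table keyed by the record's primary key, the month, and the sheet. The table name `observations`, the key column `record_id`, and the function `load_rows` are all hypothetical; the thread does not say what the real key or columns are.

```python
import sqlite3


def load_rows(conn, rows, month, sheet, key="record_id"):
    """Store one sheet's rows in a long-format table.

    `rows` is a list of dicts (one per spreadsheet row). `key` names the
    primary-key column; "record_id" is an assumed placeholder.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        " record_id TEXT, month TEXT, sheet TEXT, field TEXT, value TEXT,"
        " PRIMARY KEY (record_id, month, sheet, field))"
    )
    params = [
        (str(row[key]), month, sheet, field,
         None if value is None else str(value))
        for row in rows
        for field, value in row.items()
        if field != key
    ]
    # The connection as a context manager commits on success and
    # rolls back on error. INSERT OR REPLACE makes reloads idempotent.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?, ?)",
            params,
        )
    return len(params)
```

Storing every value as text sidesteps schema differences between sub sheets at the cost of casting at query time; a wide table per sheet type would be the alternative if the 200+ sheets share a layout.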
@dataandme @generativist any ideas?
Hrm. Depends on what you mean by "large." Even with lovely tools such as @docker, I try to avoid database management systems for as long as possible to make reproducibility easier. Like, *most* of my data reside in JSON-LINES files.
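A rough sketch of what the JSON Lines route could look like in Python. It assumes `pandas` plus an Excel engine such as `openpyxl` are installed for the reading step; the function names `read_workbook` and `sheets_to_jsonl` and the provenance fields `workbook` and `sheet` are illustrative, not something the thread specifies.

```python
import json


def read_workbook(path):
    """Return {sheet_name: [row_dict, ...]} for every sheet in a workbook.

    Assumes pandas (and an Excel engine such as openpyxl) is installed.
    """
    import pandas as pd  # imported lazily so the writer below needs only stdlib

    frames = pd.read_excel(path, sheet_name=None)  # None loads all sheets
    # Round-tripping through to_json turns NaN into null and dates into ISO strings.
    return {
        name: json.loads(df.to_json(orient="records", date_format="iso"))
        for name, df in frames.items()
    }


def sheets_to_jsonl(sheets, workbook_name, out_path):
    """Append one JSON object per row, tagged with its workbook and sheet."""
    count = 0
    with open(out_path, "a", encoding="utf-8") as fh:
        for sheet_name, rows in sheets.items():
            for row in rows:
                record = {"workbook": workbook_name, "sheet": sheet_name, **row}
                fh.write(json.dumps(record) + "\n")
                count += 1
    return count


# Hypothetical usage for one monthly file:
# sheets_to_jsonl(read_workbook("2019-01.xlsx"), "2019-01.xlsx", "all.jsonl")
```

Because every record carries its own workbook and sheet name, the output can stay in one append-only file and be filtered later, which keeps the pipeline reproducible without a database server.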