In general, there is very little research done on best practices for data curation / cleaning / annotation, even though these steps have more impact on applications than incremental architecture improvements. Preparing the data is an exercise left to the reader.
Broke: spend all research budget on annotations Woke: use Keras to build active learning pipeline Bespoke: continuous training!
Choke: Use too many losses to train the rig for just 2% improvement Stroke: Keep training till the system hangs from heat.
We at the Lab are working on complete automation of data preprocessing and labelling: the user must provide a few examples of trusted, unbiased, and fair labelled data samples, and the rest is done by the system, with an unbiased and fair solution guaranteed.
Hmmm.... Would love to hear more about this. Anything released in public?
Very very true. I tried hard to balance between “trying to be smart”, vs “just brute force collect more data (judiciously).”
There should be some third-party service that does all of this, bottom up, and we only choose the pipeline architecture. Name drops are appreciated.
Consulting. A significant portion of time is spent talking with clients' domain experts, figuring out systems and data involved in the outcome. Then often writing code for data acquisition from disparate sources. Then "data wrestling". Then "ML". Then all to make it "real-world".
Do you have any recommended books or courses I could take to gain more understanding about how to properly prepare, collect, and clean the data?

