@agramfort: Reproducible ML: Software challenges, anecdotes and some engineering solutions #icml2018 #reproducibleML
@agramfort: Reproducibility can be very brittle: FreeSurfer outputs depend on the OS (even when run from the same Docker image), and ICA results depend on the numerical library and even on OMP_NUM_THREADS. #icml2018 #reproducibleML
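One plausible reason a thread count can change numerical results is that floating-point addition is not associative, so splitting a reduction into a different number of partial sums can change the answer. The sketch below is only an illustration of that effect in pure Python; `naive_sum` and `chunked_sum` are hypothetical helpers, and nothing here reproduces what FreeSurfer, ICA or any particular numerical library actually does.

```python
def naive_sum(xs):
    """Accumulate left to right in double precision."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, n_chunks):
    """Sum each chunk separately, then combine the partial sums.

    Hypothetical stand-in for a reduction split across n_chunks workers.
    """
    size = -(-len(xs) // n_chunks)  # ceiling division
    partials = [naive_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return naive_sum(partials)


# The exact sum of these values is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

one_worker = chunked_sum(values, 1)   # 1.0
two_workers = chunked_sum(values, 2)  # 0.0
print(one_worker, two_workers, one_worker == two_workers)
```

Same data, same code, different chunking, different result. Neither answer is exact.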
@agramfort: Even on the same machine, with the same software, tiny internal numeric changes can make unit tests fail, e.g. https://github.com/scikit-learn/scikit-learn/issues/5545 … #icml2018 #reproducibleML
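A common way to make numerical tests less brittle is to compare against a tolerance rather than test for exact equality. The snippet below is a minimal sketch using only the standard library. The values and the tolerance are assumptions chosen for illustration and are not taken from the linked issue.

```python
import math

computed = 0.1 + 0.2  # stands in for the output of a numerical routine
expected = 0.3

# Exact equality fails because of rounding in the last bit.
exact_match = computed == expected

# A relative tolerance absorbs rounding-level differences.
close_match = math.isclose(computed, expected, rel_tol=1e-9)

print(exact_match, close_match)  # False True
```

The tolerance is a judgment call: too tight and the test stays brittle, too loose and it stops catching real regressions.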
@agramfort: Some software engineering solutions for reproducible ML. First, don't reinvent the wheel. A huge user base (like scikit-learn's, at 500,000 users a month) means more issues get discovered. #icml2018 #reproducibleML
@agramfort: How can you make sure code is correct and consistent? Tests! We use docstring tests to check that numerics are reproducible and unit tests to make sure our math is right. #icml2018 #reproducibleML
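As a rough sketch of what a docstring test looks like, the example below embeds an expected output in a function's docstring and runs it with the standard-library `doctest` module. The `standardize` function is hypothetical and is not scikit-learn code. Rounding the output is one possible way to keep the printed numbers stable, assumed here for illustration.

```python
import doctest


def standardize(values):
    """Center values to zero mean and scale them to unit variance.

    >>> standardize([1.0, 2.0, 3.0])
    [-1.2247, 0.0, 1.2247]
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5
    # Round so the printed result does not depend on the last few bits.
    return [round((v - mean) / std, 4) for v in values]


# Parse the docstring into a doctest and run it explicitly.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    standardize.__doc__, {"standardize": standardize}, "standardize", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

If the code is saved in a file, running `python -m doctest` on that file executes the same docstring examples from the command line.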
@agramfort: Tests do take a lot of investment. Right now scikit-learn is >500,000 lines of code... and also 45,000 lines of tests. #icml2018 #reproducibleML
@agramfort: We use Travis for continuous integration and to quantify test coverage, which adds a gamification element to keeping coverage up. #icml2018 #reproducibleML
@agramfort: For documentation we use sphinx-gallery, which generates examples from scripts. https://sphinx-gallery.readthedocs.io/en/latest/ #icml2018 #reproducibleML
@agramfort: How do we improve reproducibility even further? Sharing data! Lots of good examples of this: Mozilla Common Voice, YouTube-8M, ImageNet, etc. #icml2018 #reproducibleML
@agramfort: Another example is UK Biobank: the goal is to have brain data from a million people in the UK (!), and they currently have 15,000 images. #icml2018 #reproducibleML
@uk_biobank in case people are curious