First, analytic variability is a killer, e.g. in "standard" analysis for brain mapping (https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.24603), in machine learning for brain imaging (https://www.sciencedirect.com/science/article/abs/pii/S1053811917305311), or more generally in "hypothesis-driven" statistical testing (https://go.gale.com/ps/anonymous?id=GALE|A389260653&linkaccess=abs&issn=00030996). 2/8
We need weakly parametric models that can fit data as raw as possible, without relying on non-testable assumptions. Machine learning provides these, and tree-based models require little data transformation. 3/8
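A minimal sketch of "little data transformation" in practice (my illustration, not from the thread; the synthetic data and the choice of RandomForestRegressor are assumptions): the trees are fit directly on raw, heavy-tailed features, with no scaling or log-transform step.

```python
# Illustrative sketch: a tree-based model fit on raw, untransformed features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.lognormal(size=(2000, 10))          # heavy-tailed, raw-scale features
y = 2 * X[:, 0] + np.log(X[:, 1]) + rng.normal(size=2000)

# Trees split on thresholds, so any monotone rescaling of a feature
# leaves the fitted model unchanged; no preprocessing pipeline is needed.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
```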
We need non-parametric model selection and testing that do not break if the model is wrong. Cross-validation and permutation importance provide these, once we have chosen input (exogenous) and output (endogenous) variables. 4/8
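A hedged sketch of those two tools with scikit-learn (the synthetic data and the choice of estimator are my assumptions): cross-validation scores the model on held-out data, and permutation importance measures how much the held-out score drops when one input column is shuffled.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=2000)

model = HistGradientBoostingRegressor(random_state=0)

# Model selection and testing via cross-validation: no distributional
# assumptions, only held-out predictive performance.
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated R^2:", scores.mean())

# Permutation importance on held-out data: the score drop when
# one input column is shuffled, repeated for stability.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)
```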
If there are fewer than a thousand data points, all but the simplest statistical questions can and will be gamed (sometimes unconsciously), partly for lack of model selection. An example in neuroimaging: https://www.biorxiv.org/content/10.1101/843193v1. I no longer trust such endeavors, including mine. 5/8
For thousands of data points and moderate dimensionality (99% of cases), gradient-boosted trees provide the necessary regression model (https://scikit-learn.org/stable/modules/ensemble.html#histogram-based-gradient-boosting). They are robust to the data distribution and support missing values, even outside MAR settings (https://arxiv.org/abs/1902.06931). 6/8
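A sketch of the estimator linked above, scikit-learn's HistGradientBoostingRegressor, on data with missing entries (the synthetic data and the missingness pattern are made up for illustration). No imputation step is added; NaNs are passed straight to the model.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.normal(size=(5000, 20))
y = X[:, 0] - X[:, 1] + rng.normal(size=5000)

# About 20% of the entries go missing; the estimator handles NaNs
# natively by routing them at each split, so no imputer is needed.
X[rng.uniform(size=X.shape) < 0.2] = np.nan

model = HistGradientBoostingRegressor(random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())
```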
For thousands of data points and large dimensionality, linear models (ridge) are needed. But applying them without thousands of data points (as I tried for many years) is hazardous. Get more data, or change the question (e.g. analyze across cohorts). 7/8
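A sketch of the high-dimensional linear case (the sizes and synthetic data are illustrative assumptions): ridge regression with the penalty chosen internally by RidgeCV, wrapped with standard scaling since, unlike trees, a penalized linear model is sensitive to feature scales.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples, n_features = 2000, 5000          # thousands of points, many features
X = rng.normal(size=(n_samples, n_features))
coef = np.zeros(n_features)
coef[:50] = rng.normal(size=50)              # sparse ground truth, for illustration
y = X @ coef + rng.normal(size=n_samples)

# Scaling matters for penalized linear models; RidgeCV then selects
# the regularization strength by internal cross-validation.
model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 7)))
print(cross_val_score(model, X, y, cv=5).mean())
```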
Most questions are not about "prediction". But machine learning is about estimating functions that approximate conditional expectations / probabilities. We need to get better at integrating it into our scientific inference pipelines. For more, push me to write a paper on this. 8/8
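One standard identity behind that claim (my addition, not part of the thread): under squared loss, the best possible predictor is exactly the conditional expectation,

```latex
f^{*} = \arg\min_{f}\; \mathbb{E}\big[(Y - f(X))^{2}\big],
\qquad
f^{*}(x) = \mathbb{E}[Y \mid X = x]
```

and a classifier trained with log-loss similarly targets P(Y = k | X = x), so a cross-validated prediction score estimates how well these conditional quantities are captured.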
There's something that many people in science and technology don't seem to get: you can either work on better methods, testing them on well-understood problems, or use well-understood methods for new problems. Innovating in multiple layers at the same time just produces noise.
I've been calling for developing standards for more than 2 years now, and not many have been supportive of this: https://twitter.com/raamana_/status/1027989248614514693?s=20