This is a more nuanced elaboration on Carles's controversial thread from a few weeks ago: https://twitter.com/carlesgelada/status/1208618401729568768
-
We've updated this blog post in response to community discussion & feedback: https://twitter.com/jacobmbuckman/status/1220012977727987712
-
One easy thing to dispute: using a particular optimizer that can't find certain optima because of their shape is *tantamount* to assigning negligible prior mass to the regions of parameter space where those optima live. 1/2
-
So if some region of parameter space is intrinsically unable to yield good generalization, and this manifests in the shape of the objective function as sharp (hence brittle) optima even when the fit is good, then the use of SGD does in fact save us. 2/2
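A minimal sketch of this intuition, assuming a hand-built 1-D loss (a toy of my own, not from the thread) with one flat basin and one sharper but deeper well: with a step size the sharp well cannot tolerate, noisy gradient descent almost never settles there, which has the same effect as putting negligible prior mass on that region of parameter space.

```python
import numpy as np

def loss(x):
    # flat basin centred at x = -2, plus a sharp, deeper well centred at x = +2
    return 0.05 * (x + 2) ** 2 - 1.5 * np.exp(-50.0 * (x - 2) ** 2)

def grad(x):
    # derivative of loss(x)
    return 0.1 * (x + 2) + 150.0 * (x - 2) * np.exp(-50.0 * (x - 2) ** 2)

rng = np.random.default_rng(0)
lr, steps, runs = 0.05, 2000, 500     # lr exceeds 2 / curvature of the sharp well
x = rng.uniform(-4, 4, size=runs)     # random initialisations across parameter space
for _ in range(steps):
    # noisy (SGD-like) gradient step; the sharp well is unstable at this step size
    x = x - lr * (grad(x) + rng.normal(scale=0.5, size=runs))

print("loss at flat minimum (-2) vs sharp minimum (+2):", loss(-2.0), loss(2.0))
print("fraction of runs ending near the flat minimum:", np.mean(np.abs(x + 2) < 0.5))
print("fraction of runs ending near the sharp minimum:", np.mean(np.abs(x - 2) < 0.2))
```

Even though the sharp minimum is the lower one, essentially every run ends in the flat basin: the optimizer's dynamics, not an explicit prior, decide which regions of parameter space are reachable.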
- 15 more replies
New conversation -
-
I feel people are having the exact same arguments that have been going on in classical statistics for the past 40 years.
-
Certainly Bayesians and frequentists have been arguing for decades, but our specific point is pretty DNN-centric, so I'd be surprised if someone had made the exact same argument 40 years ago.
- 1 more reply
New conversation -
-
The claim that a network that fits C perfectly would do badly on any test set is actually not correct. It would only be bad if C >> D and the labels in C\D are all random (i.e., no structure can be learned from the extra examples and they dominate the dataset).
-
In this case, even a normal NN will fail to learn. This is not a problem specific to BNNs, so I'm not sure why it counts as a criticism of BNNs.
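A rough sketch of this point, assuming a synthetic binary-classification task with a small sklearn MLP standing in for the network (the dataset, sizes, and names here are my own choices, not the blog's setup): C is the clean dataset D plus extra points whose labels are coin flips, and test accuracy only collapses once the randomly labelled portion dominates, at which point ordinary (non-Bayesian) training degrades as well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# One pool of inputs from the same distribution; part of it gets its labels corrupted.
X, y = make_classification(n_samples=40000, n_features=20, n_informative=10,
                           random_state=0)
X_D, y_D = X[:1500], y[:1500]                # the clean dataset D
X_test, y_test = X[1500:3000], y[1500:3000]  # held-out test set
X_pool = X[3000:]                            # candidate points for C \ D

def test_accuracy(n_extra):
    # C = D plus n_extra points whose labels are replaced with coin flips
    X_extra = X_pool[:n_extra]
    y_extra = rng.integers(0, 2, size=n_extra)
    X_C = np.vstack([X_D, X_extra])
    y_C = np.concatenate([y_D, y_extra])
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
    clf.fit(X_C, y_C)                        # ordinary (non-Bayesian) training on C
    return clf.score(X_test, y_test)

for n_extra in [0, 1500, 30000]:             # C \ D absent, comparable to D, dominant
    print(f"|C \\ D| = {n_extra:5d}   test accuracy = {test_accuracy(n_extra):.3f}")
```

The failure mode appears only in the last setting, where random labels dominate, and it hits the plain network too; it is not something a Bayesian treatment introduces.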
- 9 more replies
New conversation -
-
"Generalization-Agnostic Priors" is a feature not a bug, and a misnomer? It's okay for the bad function and good function to have similar posteriors on D. This gives us uncertainty when integrating over the function distribution to get our prediction distribution.
-
You might just need more data for things to separate out. Usually you also have structural biases that distinguish solutions, like smoothness, which will prefer one over another, so the prior probabilities are unlikely to be equal even if you add lots of corrupted data to C.
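A tiny sketch of the "feature, not a bug" point, assuming just two hand-picked candidate functions (a toy of my own, not the blog's construction) that both fit D exactly: averaging over them with their equal posterior weights yields a predictive distribution that is genuinely uncertain where they disagree, rather than confidently wrong.

```python
import numpy as np

# a tiny dataset D on which both candidate functions agree
x_D = np.array([0.0, 1.0, 2.0])
y_D = np.array([0.0, 1.0, 2.0])

def good(x):
    return x                                  # generalizes the pattern in D

def bad(x):
    return np.where(x <= 2.0, x, 0.0)         # also fits D exactly, but differs off D

def log_likelihood(f, noise=0.1):
    # Gaussian observation model around f(x)
    return -0.5 * np.sum((f(x_D) - y_D) ** 2) / noise ** 2

# equal priors; both functions fit D exactly, so their posteriors are equal too
log_post = np.array([log_likelihood(good), log_likelihood(bad)])
post = np.exp(log_post - log_post.max())
post /= post.sum()

# posterior predictive at a new input where the two functions disagree
x_new = 5.0
preds = np.array([float(good(x_new)), float(bad(x_new))])
mean = float(post @ preds)
std = float(np.sqrt(post @ (preds - mean) ** 2))
print("posterior weights:", post)             # roughly [0.5, 0.5]
print(f"predictive mean at x = 5: {mean:.2f}, predictive std: {std:.2f}")
```

The large predictive standard deviation at x = 5 is exactly the uncertainty the reply describes: the posterior does not pretend to know which function is right, it just reports that D cannot tell them apart.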
- 10 more replies