(2) A model being able to fit noise is unremarkable and is different from having an inductive bias that favours noisy solutions. A standard GP-RBF prior over functions easily supports noise, but it favours more structured solutions.
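A quick sketch of that GP point (my own toy example, not from the thread): the same GP-RBF-plus-noise model can fit either a structured signal or a shuffled, noise-like version of it, but its log marginal likelihood is far higher for the structured targets, i.e. the prior supports noise while favouring structure.

```python
# Compare how much prior probability a zero-mean GP with an RBF kernel (plus
# Gaussian noise) assigns to structured targets versus the same values shuffled
# into noise. Both can be fit; the marginal likelihood favours the structured one.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.5, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-d inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_log_marginal_likelihood(x, y, noise=0.1):
    """log p(y | x) under a zero-mean GP-RBF prior with observation noise."""
    n = len(x)
    K = rbf_kernel(x, x) + noise ** 2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))          # -0.5 log|K|
            - 0.5 * n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y_structured = np.sin(2 * np.pi * x)          # smooth, structured signal
y_noise = rng.permutation(y_structured)       # same values, structure destroyed

print("structured:", gp_log_marginal_likelihood(x, y_structured))
print("shuffled  :", gp_log_marginal_likelihood(x, y_noise))
# The structured targets receive a far higher log marginal likelihood: the prior
# supports the noisy labelling but strongly favours the structured one.
```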
(3) The volume of good solutions exceeds the volume of bad solutions for typical problems NNs are applied to. Neural nets were constructed to have these inductive biases to help generalization. It is wild to suggest that a NN function is "generalization agnostic".
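One hedged way to probe that "volume" claim (my illustration; the architecture size and unit-Gaussian weight prior are assumptions): sample random weights for a tiny MLP on 3-bit inputs and count which Boolean functions the network realizes. The induced distribution over the 256 possible functions is strongly non-uniform, with simple functions taking up far more prior volume than a uniform-over-functions prior would give them.

```python
# Estimate the prior "volume" assigned to each Boolean function on 3 binary inputs
# by a small MLP with i.i.d. Gaussian weights, simply by sampling weights and
# recording the induced function.
import numpy as np
from collections import Counter
from itertools import product

rng = np.random.default_rng(0)
X = np.array(list(product([0.0, 1.0], repeat=3)))   # all 8 binary inputs

def random_mlp_function(hidden=16):
    """Threshold the output of an MLP whose weights are drawn from N(0, 1)."""
    W1 = rng.normal(size=(3, hidden)); b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, 1)); b2 = rng.normal(size=1)
    h = np.tanh(X @ W1 + b1)
    out = (h @ W2 + b2).ravel() > 0
    return tuple(out.astype(int))                    # an 8-bit Boolean function

counts = Counter(random_mlp_function() for _ in range(20000))
print("distinct functions seen:", len(counts), "out of 256 possible")
for f, c in counts.most_common(5):
    print(f, f"{c / 20000:.3f}")
# The most frequent functions (typically very simple ones, such as the constants)
# appear far more often than the 1/256 that a uniform prior over functions would
# assign, even though nothing in the weight prior mentions simplicity explicitly.
```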
(4) In fact, you could easily create various "generalization agnostic" priors in function space. They would behave very differently from BNNs. They would have trivial structure and indeed would not generalize.
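For contrast, here is one such "generalization agnostic" prior made explicit (my construction, not a BNN): an independent fair coin flip on the label of every input. It fits any training set perfectly, yet its posterior predictive at unseen inputs never moves from 0.5.

```python
# A deliberately structure-free prior in function space: i.i.d. fair-coin labels on
# each of 8 possible 3-bit inputs, i.e. a uniform prior over all 256 labelings.
# Conditioning on training data leaves every unseen input at exactly p = 0.5.
import numpy as np
from itertools import product

functions = list(product([0, 1], repeat=8))            # all 256 labelings of inputs 0..7
prior = np.full(len(functions), 1 / len(functions))    # i.i.d.-labels prior = uniform here

train_idx, train_labels = [0, 1, 2, 3], [0, 1, 1, 0]   # observe half the inputs
consistent = np.array([all(f[i] == y for i, y in zip(train_idx, train_labels))
                       for f in functions])
posterior = prior * consistent
posterior /= posterior.sum()

for test in range(4, 8):
    p1 = sum(p for p, f in zip(posterior, functions) if f[test] == 1)
    print(f"p(label=1 at unseen input {test}) = {p1:.2f}")
# Every unseen input stays at 0.50: the posterior fits the data perfectly but is
# agnostic about generalization. Structured priors (GPs, BNNs) behave very differently.
```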
(5) Lack of desirable posterior collapse can happen when (i) the hypothesis space does not contain a good solution, or (ii) the prior is too confident about a bad solution (e.g. equal label probability for any x). But NNs are expressive, and (ii) is the opposite of a vague prior on weights!
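A minimal sketch of the desirable case (a toy problem of my choosing): an expressive hypothesis class with a vague, uniform prior, where the posterior visibly contracts onto good solutions as data arrive.

```python
# Bayesian updating over a class of threshold classifiers on [0, 1] with a uniform
# (vague) prior. The hypothesis space contains the true solution, so the posterior
# contracts steadily as data accumulate.
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 201)          # hypothesis space: predict 1 iff x > t
log_post = np.zeros_like(thresholds)         # log of a uniform prior
true_t, label_noise = 0.62, 0.1

for n in range(1, 201):
    x = rng.uniform()
    y = int(x > true_t) if rng.uniform() > label_noise else int(x <= true_t)
    pred = (x > thresholds).astype(float)
    lik = np.where(pred == y, 1 - label_noise, label_noise)
    log_post += np.log(lik)
    if n in (1, 10, 50, 200):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        mean = np.sum(post * thresholds)
        sd = np.sqrt(np.sum(post * (thresholds - mean) ** 2))
        print(f"n={n:3d}  posterior mean t={mean:.3f}  posterior sd={sd:.3f}")
# The posterior standard deviation shrinks and the mean settles near the true
# threshold 0.62: a vague prior over an expressive class does collapse onto good
# solutions given data.
```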
Separately from the technical discussion, I suggest tending towards asking questions and learning, and being open-minded about understanding BDL. Perhaps your prior is too strong! :-)
I will also note that my original response was actually not intended to be specific to Carles, but rather to address general remarks and confusions I had recently seen about BDL.
New conversation
(1) may be true in theory, but in practice, NNs are so overparameterized (relative to data) that we never see a meaningful reduction in the set of bad functions. And recent work on double descent (https://openai.com/blog/deep-double-descent/) indicates that this might even be the ideal regime.
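For a rough, self-contained illustration of the double-descent shape (my sketch, not the OpenAI experiment; the random-ReLU-feature setup, noise level, and sizes are assumptions): minimum-norm least squares with a growing number of random features tends to show a test-error peak near the interpolation threshold and a second descent beyond it.

```python
# Double descent in minimum-norm least squares with random ReLU features: the test
# error typically rises toward features ~ n_train and falls again when heavily
# overparameterized. Exact numbers depend on the noise level and random seed.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 5
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

for n_features in (5, 10, 20, 40, 80, 200, 1000):
    errs = []
    for _ in range(20):                                    # average over feature draws
        V = rng.normal(size=(d, n_features))
        Ftr, Fte = np.maximum(Xtr @ V, 0), np.maximum(Xte @ V, 0)
        beta, *_ = np.linalg.lstsq(Ftr, ytr, rcond=None)   # minimum-norm solution
        errs.append(np.mean((Fte @ beta - yte) ** 2))
    print(f"features={n_features:5d}  test MSE={np.mean(errs):.2f}")
# The error usually peaks around features ~ n_train (= 40) and then descends again
# in the overparameterized regime.
```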
The addition of data certainly affects the shape of the likelihood and the posterior, otherwise standard training wouldn’t work. I agree the posterior is often diffuse in practice, but that’s actually a great argument for marginalization, which was a key point in my note.
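A small worked example of that marginalization point (a standard conjugate Gaussian model, my choice of toy): when the posterior is diffuse, the marginalized predictive is noticeably wider than a plug-in predictive built from a point estimate, and the two converge only as the posterior sharpens.

```python
# Gaussian likelihood with known noise and a vague Gaussian prior on the mean.
# Compare the plug-in predictive sd (point estimate) with the marginalized
# predictive sd, which integrates over posterior uncertainty in the mean.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                              # known observation noise
prior_mu, prior_sd = 0.0, 10.0           # vague prior on the unknown mean

for n in (2, 10, 100):
    y = rng.normal(loc=1.5, scale=sigma, size=n)
    post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)          # conjugate update
    post_mu = post_var * (prior_mu / prior_sd**2 + y.sum() / sigma**2)
    plug_in_sd = sigma                            # predictive sd from a point estimate
    marginal_sd = np.sqrt(sigma**2 + post_var)    # predictive sd after marginalizing
    print(f"n={n:3d}  posterior sd={np.sqrt(post_var):.3f}  "
          f"plug-in predictive sd={plug_in_sd:.3f}  marginal predictive sd={marginal_sd:.3f}")
# With little data the marginalized predictive is clearly wider than the plug-in one;
# as the posterior sharpens the gap closes. A diffuse posterior is exactly when
# marginalization matters most.
```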
New conversation
I am not intimately familiar with BNNs but here’s my 2cts as someone who has worked with « classical » Bayesian statistics. It does not make sense to reason about priors on parameters in isolation. Priors can only be understood in the context of the whole architecture.
What you can say is « do these priors and this architecture allow me to cover the range of acceptable predictions? ». To take the example of classification, if you have no reason to think the dataset is imbalanced
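One way to make that concrete is a prior predictive check (my sketch; the architecture, inputs, and weight scales are assumptions): sample weights from the prior, push them through the network, and see whether the induced class probabilities cover a sensible range. The same Gaussian prior on weights means very different things through different architectures or scales.

```python
# Prior predictive check for a small softmax classifier: draw weights from
# N(0, scale^2), push representative inputs through the network, and summarize the
# induced class probabilities with the average maximum class probability.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # some representative inputs
n_classes, hidden = 3, 32

def prior_predictive_max_prob(scale, n_draws=200):
    """Average max class probability under prior-sampled weights."""
    maxes = []
    for _ in range(n_draws):
        W1 = scale * rng.normal(size=(10, hidden))
        W2 = scale * rng.normal(size=(hidden, n_classes))
        logits = np.tanh(X @ W1) @ W2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        maxes.append(p.max(axis=1).mean())
    return np.mean(maxes)

for scale in (0.1, 1.0, 5.0):
    print(f"weight scale {scale}: mean max class prob = {prior_predictive_max_prob(scale):.2f}")
# Small scales give class probabilities close to uniform; large scales make the prior
# confidently predict a single class per input. Whether either is acceptable can only
# be judged by looking at the prior *through* the architecture, not at the weights alone.
```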