Isn't this the "double descent" phenomenon studied in https://arxiv.org/abs/1812.11118 and subsequent works?
That's the first citation in the blog post. Note also that the first author, Mikhail Belkin, provided helpful discussions and feedback throughout this work, as mentioned at the bottom.
Isn't the abstract a bit misleading in claiming that *you* show 'double descent' happens in DNN training, when there were already many works, especially from Mikhail Belkin et al. (who also happened to coin the term), that showed this in detail?
Hi, we build on these works and are grateful to Misha Belkin for many helpful discussions. See thread: https://twitter.com/PreetumNakkiran/status/1202643899652747266 …
For those interested in a discussion of this paper that goes beyond Twitter's character limit, please see the OpenReview discussion at https://openreview.net/forum?id=B1g5sA4twr …, which includes some praise but also some constructive criticism of this work.
This intuitively makes perfect sense to me; it's reminiscent of Vygotsky's Zone of Proximal Development. In other words, more data only serves to confuse a learner until they've grasped the underlying concepts and patterns.
The x-axis is not data. It's plotting model size.
@jeremyphoward isn't this similar to what you've said in class about the bias-variance trade-off?
Kinda, although I was relying on anecdotal results when I talked about it, rather than rigorous research :)