This paper is so insightful. Surprised how much mileage they could get out of such a simple setting. https://www.pnas.org/content/116/23/11537
Isn't that intuitive though? If they had identical gradients, wouldn't that imply that they train to optimal performance in the same number of steps? Obviously, that isn't the case. Perhaps I misunderstood.
I'd never heard of 'deep linear' networks before, and assumed that because they are still just linear networks, they would be identical to shallow ones.
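For anyone with the same assumption, here is a rough NumPy sketch of where it breaks down. It is not taken from the paper: the sizes, learning rate, squared-error loss, and all variable names are assumptions made up for illustration. A two-layer linear network `W2 @ W1` and a shallow network `W` start out as the same input-output map, but one gradient step on the factors changes the product differently from one gradient step on `W` directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical sizes, chosen only for illustration).
X = rng.normal(size=(3, 20))   # 3 input features, 20 samples
Y = rng.normal(size=(2, 20))   # 2 output features, 20 samples

# Two-layer linear network: y = W2 @ W1 @ x (no nonlinearity anywhere).
W1 = 0.1 * rng.normal(size=(4, 3))
W2 = 0.1 * rng.normal(size=(2, 4))

# Shallow network initialised to exactly the same input-output map.
W = W2 @ W1


def grad_wrt_map(M):
    """Gradient of 0.5 * ||M @ X - Y||^2 with respect to the map M."""
    return (M @ X - Y) @ X.T


lr = 0.01
G = grad_wrt_map(W)  # identical for both networks, since the maps are equal

# One gradient step for the shallow network.
W_shallow_next = W - lr * G

# One gradient step for the deep network (chain rule through the product).
W1_next = W1 - lr * (W2.T @ G)
W2_next = W2 - lr * (G @ W1.T)
W_deep_next = W2_next @ W1_next

# Same function before the step, different functions after it.
same_function_before = np.allclose(W2 @ W1, W)
same_function_after = np.allclose(W_deep_next, W_shallow_next)
print(same_function_before, same_function_after)
```

So the set of functions is the same as for a shallow linear network, but under these assumptions the learning trajectory is not, because the update to the product depends on the current values of `W1` and `W2`.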
Ah yes, I see. Thanks.