Some of our findings are relevant in the context of the head ablation studies (@gneubig, @lena_voita), the easy-dataset problem (@tallinzen, @ssgrn, @VeredShwartz, @yoavgo, @robinomial, and many others), and the Great Attention Debate (@byron_c_wallace, @nlpnoah, @yuvalpi).
-
Not sure if I'd say "most" :) But yeah, at least on a few!
-
Yeah—this is a really cool/interesting/surprising post overall, but "random + fine-tuned BERT is doing disturbingly well on all tasks except textual similarity" doesn't really seem fair.
New conversation -
I liked the experiment comparing pretrained, random+finetuned, and pretrained+finetuned, because it is hard to find experiments in the literature that measure the contribution of MLM pretraining to downstream tasks. But I am with
@sleepinyourhat on the conclusion from that experiment! -
Do you by chance also have results on the effect of pretraining on learning efficiency in downstream tasks? In other words, does pretrained+finetuned achieve the same performance as random+finetuned with fewer training samples or epochs?
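[Not part of the thread, but for concreteness: a minimal sketch of how such a comparison could be set up, assuming the Hugging Face transformers library; the model name, label count, and subsample fractions are illustrative assumptions, not the authors' actual setup.]

from transformers import BertConfig, BertForSequenceClassification

def build_models(num_labels=2):
    # Pretrained weights, as in the "pretrained+finetuned" condition.
    pretrained = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels
    )
    # Same architecture with randomly initialized weights,
    # as in the "random+finetuned" condition.
    config = BertConfig.from_pretrained("bert-base-uncased", num_labels=num_labels)
    random_init = BertForSequenceClassification(config)
    return pretrained, random_init

# To probe sample efficiency, fine-tune both models on growing subsets of the
# training data (e.g. 1%, 10%, 100%) and compare dev-set accuracy at each size:
# if pretraining improves learning efficiency, the pretrained model should reach
# a given accuracy with fewer examples or epochs than the random-init model.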
End of conversation
New conversation -
This was a really enjoyable read. Thank you for sharing