I've also observed this to be the case with few-shot image classification! I think this might be telling us more about the benchmarks than the algorithms. I look forward to seeing how these methods perform as we start to evaluate them on much harder problems.
-
-
-
Definitely think that there are lots more open questions on how things work as we vary datasets and tasks. Probably lots of interesting connections to be made with domain adaptation and transfer learning also!
End of conversation
New conversation -
-
-
Nice work! We also had similar observations in our recent work "Few-shot Text Classification with Distributional Signatures". Directly applying MAML to text may perform worse than NN, as it only activates on patterns that are important during training. http://arxiv.org/abs/1908.06039
-
In CV, low-level patterns (e.g. edges) and their representations are sharable across tasks. The situation is different for NLP, where most tasks operate at the lexical level. Words that are informative for one task may not be relevant for others. MAML may fail in this case.
End of conversation
New conversation -
-
-
I think the following work may be relevant? https://arxiv.org/pdf/1810.03642.pdf They similarly show that very few parameters need to be adapted in the inner loop, and suggest this may be because "the amount of adaptation needed in some cases is small."
-
-
-
In MAML++ the learned per-layer, per-step learning rates tend to be massive in the last couple of layers, so perhaps how much learning is being done depends on which components of the network are learnable in the outer loop? https://www.bayeswatch.com/2018/11/30/HTYM/
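To make that mechanism concrete, here is a minimal PyTorch-style sketch (my own illustration, not the MAML++ code) of per-layer, per-step inner-loop learning rates that are themselves trainable in the outer loop; the model, layer sizes, and step count are placeholders.

```python
# Minimal sketch of MAML++-style per-layer, per-step learned learning rates.
# The model and dimensions are hypothetical; only the inner loop is shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_inner_steps = 5
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 5))

# One learnable (outer-loop) learning rate per parameter tensor per inner step.
inner_lrs = nn.ParameterDict({
    f"{name.replace('.', '_')}_step{t}": nn.Parameter(torch.tensor(0.01))
    for name, _ in model.named_parameters()
    for t in range(n_inner_steps)
})

def inner_adapt(x_support, y_support):
    # Functional inner loop: parameters are carried in a dict so the outer loop
    # can differentiate through the adaptation (and through the learning rates).
    params = dict(model.named_parameters())
    for t in range(n_inner_steps):
        logits = torch.func.functional_call(model, params, (x_support,))
        loss = F.cross_entropy(logits, y_support)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {
            name: p - inner_lrs[f"{name.replace('.', '_')}_step{t}"] * g
            for (name, p), g in zip(params.items(), grads)
        }
    return params
```

The tweet's observation is then that, after meta-training, the entries of `inner_lrs` belonging to the final layers end up much larger than the rest.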
-
-
-
[1/3] Really nice paper! In my recent work [1] with Martha White (to appear at NeurIPS 2019), we observed something similar: learning features by updating only the last layers in the inner loop is effective for meta-learning. [1] https://arxiv.org/pdf/1905.12588.pdf
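For concreteness, a rough sketch (my own, not the paper's or ANIL's code) of an inner loop that adapts only the last layer while reusing the meta-learned body; the architecture and hyperparameters below are hypothetical.

```python
# Sketch of head-only inner-loop adaptation: the body is meta-learned in the
# outer loop but frozen during per-task adaptation on the support set.
import torch
import torch.nn as nn
import torch.nn.functional as F

body = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 5)

def inner_adapt(x_support, y_support, n_steps=5, lr=0.01):
    # Features are computed once, since the body is reused rather than adapted.
    features = body(x_support)
    head_params = dict(head.named_parameters())
    for _ in range(n_steps):
        logits = torch.func.functional_call(head, head_params, (features,))
        loss = F.cross_entropy(logits, y_support)
        grads = torch.autograd.grad(loss, list(head_params.values()),
                                    create_graph=True)
        head_params = {name: p - lr * g
                       for (name, p), g in zip(head_params.items(), grads)}
    return head_params
```

Because `create_graph=True`, an outer loop can still backpropagate the query-set loss into both the body and the head's initialization.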
-
-
-
[3/3] However, even though I knew MAML and ANIL were comparable when meta-testing involved few gradient updates, I did not know the reason. I also did not expect that MAML was mostly just reusing features. This work clarifies that in a very convincing way.
-
-
-
It seems to me this is just indicating we've been using benchmarks for which transfer learning alone already works well. Perhaps the challenge resides in finding a truly diverse set of tasks for which a last layer swap is not enough to adapt.
New conversation -