Bursty data + transformer -> in-context (few-shot) learning, not weight-based learning
Bursty data + LSTM -> weight-based learning but no few-shot learning
Zipfian (long-tailed) bursty data + transformer -> both in-context learning AND weight-based learning
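The three regimes above differ only in the training data's statistics: class frequencies are long-tailed (Zipfian), and each training sequence is bursty, repeating one class several times. A minimal sketch of such a sampler, assuming a toy setup (the function name and parameters are illustrative, not from the paper):

```python
import random
from collections import Counter

def zipfian_bursty_sequences(num_classes=100, seq_len=8, num_seqs=1000,
                             burst_size=3, alpha=1.0, seed=0):
    """Sample sequences whose class marginal is Zipfian (p(k) ~ 1/k^alpha)
    and whose within-sequence structure is bursty: one class recurs
    burst_size times per sequence."""
    rng = random.Random(seed)
    # Zipfian weights over classes 1..num_classes.
    weights = [1.0 / (k + 1) ** alpha for k in range(num_classes)]
    seqs = []
    for _ in range(num_seqs):
        # The "bursty" class for this sequence, drawn from the Zipfian marginal.
        burst_class = rng.choices(range(num_classes), weights=weights)[0]
        seq = [burst_class] * burst_size
        # Fill the rest with other Zipfian-sampled classes as context.
        while len(seq) < seq_len:
            c = rng.choices(range(num_classes), weights=weights)[0]
            if c != burst_class:
                seq.append(c)
        rng.shuffle(seq)
        seqs.append(seq)
    return seqs

seqs = zipfian_bursty_sequences()
# Burstiness: every sequence repeats its burst class at least burst_size times.
assert all(max(Counter(s).values()) >= 3 for s in seqs)
```

Removing either property, flattening the marginal or spreading the burst class across sequences, recovers the other two rows of the table.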
Fascinating! Is it a happy accident natural language has these properties? A necessary condition for language? Just an artifact of transformers?
I'd say language has them *because* the world has them (when viewed from the 1st person, at least)
Yes. Or objects, or entities, or people
See Linda Smith's analysis of child sensory input for quantification of burstiness there.




