Bursty data + transformer -> in-context (few-shot) learning, not weight-based learning
Bursty data + LSTM -> weight-based learning, but no few-shot learning
Zipfian (long-tailed) bursty data + transformer -> both in-context learning AND weight-based learning
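
For anyone unsure what "bursty" and "Zipfian" mean here, a rough sketch of one way to sample such a sequence. This is an illustration under my own assumptions, not the setup from the experiments: the names `zipf_weights` and `sample_bursty_sequence` and all parameters are hypothetical. "Bursty" is modeled as each drawn class showing up several times in the same sequence, and "Zipfian" as class frequency falling off as 1 / rank^exponent.

```python
import random


def zipf_weights(num_classes, exponent):
    """Unnormalized Zipfian weights: the class at rank k gets 1 / k**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_classes + 1)]


def sample_bursty_sequence(num_classes, seq_len, burst_size, exponent, rng):
    """Sample class ids where every drawn class contributes `burst_size` copies.

    Hypothetical helper for illustration only. exponent=0 gives uniform
    class frequencies; larger exponents give a longer tail of rare classes.
    """
    if seq_len % burst_size != 0:
        raise ValueError("seq_len must be a multiple of burst_size")
    weights = zipf_weights(num_classes, exponent)
    num_bursts = seq_len // burst_size
    # Draw the classes that "burst" in this sequence (with replacement,
    # so the same class can occasionally be drawn more than once).
    burst_classes = rng.choices(range(num_classes), weights=weights, k=num_bursts)
    # Repeat each drawn class burst_size times, then shuffle positions.
    sequence = [c for c in burst_classes for _ in range(burst_size)]
    rng.shuffle(sequence)
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_bursty_sequence(num_classes=1000, seq_len=8, burst_size=4,
                                 exponent=1.0, rng=rng))
```

Setting `exponent=0` would correspond to the plain bursty case, and `exponent` around 1 to the Zipfian bursty case.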