Oops, haven't tweeted much recently; I'm mostly watching with interest the open source LLM ecosystem experiencing early signs of a Cambrian explosion. Roughly speaking, the story as of now:
1. Pretraining LLM base models remains very expensive. Think: supercomputer + months.
2. But finetuning LLMs is turning out to be very cheap and effective due to recent PEFT (parameter-efficient finetuning) techniques that work surprisingly well, e.g. LoRA / LLaMA-Adapter, and other awesome work, e.g. low precision as in the bitsandbytes library. Think: a few GPUs + a day, even for very large models.
3. Therefore, the Cambrian explosion, which requires wide reach and a lot of experimentation, is quite tractable thanks to (2), but only conditioned on (1) being available.
4. The de facto OG release of (1) was Facebook's (sorry, Meta's) LLaMA release - a very well executed, high-quality series of models from 7B all the way to 65B, trained nice and long, correctly ignoring the "Chinchilla trap". But the LLaMA weights are research-only: they've been locked down behind forms, but have also awkwardly leaked all over the place... it's a bit messy.
5. In the absence of an available and permissive (1), (2) cannot fully proceed. So there are a number of efforts on (1) under the banner "LLaMA but actually open", with e.g. current models from , ~matching the performance of the smallest (7B) LLaMA model, and , nearby.
For now, things are moving along (e.g. see the 10 chat-finetuned models released in the last ~week, and projects like llama.cpp and friends), but a bit awkwardly, due to the LLaMA weights being open, but not really, but still. And most interestingly, a lot of questions of intuition remain to be resolved, especially around how well finetuned models work in practice, even at smaller scales.
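Point (2)'s "few GPUs + a day" claim comes down to how few parameters PEFT methods actually train. A back-of-envelope sketch of LoRA's trainable-parameter count (the layer count, projection count, and 4096-dim projections below are illustrative assumptions roughly in the shape of a 7B model, not exact LLaMA configs):

```python
# Back-of-envelope: trainable parameters for full finetuning vs. LoRA.

def lora_params(d_in, d_out, rank):
    """LoRA freezes the original d_in x d_out weight and learns a
    low-rank update B @ A, with A (d_in x r) and B (r x d_out)."""
    return d_in * rank + rank * d_out

# Assumed shape: 4096 x 4096 attention projections, 4 per layer
# (q, k, v, o), 32 layers -- illustrative, not an exact config.
d, layers, projs, rank = 4096, 32, 4, 8

full = 7_000_000_000                              # full finetune: update everything
lora = layers * projs * lora_params(d, d, rank)   # LoRA: adapters only

print(f"full finetune: {full:,} trainable params")
print(f"LoRA (r={rank}): {lora:,} trainable params")
print(f"ratio: ~{full / lora:,.0f}x fewer")
```

Orders of magnitude fewer trainable parameters means orders of magnitude less optimizer state and gradient memory, which is why the finetuning step fits on commodity hardware.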
I didn't agree with that post. I cheer for open source, but the conclusion over-reaches. I'm still trying to formalize a coherent mental picture of the dynamics; meanwhile, the nearest neighbor of my own hot take atm is along the lines of:
Quote Tweet
This leaked google memo is a great overview of what open source AI has achieved... but the conclusion is wrong. twitter.com/simonw/status/…
This is a good but a bit technical post on the topic: harmdevries.com/post/model-siz
People have been training way too big LLMs for way too short. First because the original scaling laws were measured not quite right due to details of learning rate decay schedules (which Chinchilla…
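The "too big, too short" point can be made with rough arithmetic. The ~20 tokens/parameter figure below is the approximate compute-optimal rule of thumb from the Chinchilla paper, and ~1T is the approximate LLaMA 7B training-token count; both are ballpark values, not exact:

```python
# Chinchilla-style back-of-envelope: compute-optimal training tokens
# scale roughly linearly with parameter count (~20 tokens/param).

def chinchilla_optimal_tokens(params, tokens_per_param=20):
    return params * tokens_per_param

llama_7b_params = 7e9
llama_7b_tokens = 1e12          # LLaMA 7B saw roughly 1T tokens

optimal = chinchilla_optimal_tokens(llama_7b_params)
ratio = llama_7b_tokens / optimal

print(f"compute-optimal tokens for 7B: {optimal:.1e}")
print(f"LLaMA 7B trained on ~{ratio:.1f}x that")
```

Training far past the compute-optimal point is "wasteful" per unit of training compute, but buys a smaller model that is much cheaper to serve and finetune, which is exactly why small-but-long-trained models matter for the open ecosystem.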
Super excited to push this even further:
- Next week: bitsandbytes 4-bit closed beta that allows you to finetune 30B/65B LLaMA models on a single 24/48 GB GPU (no degradation vs full fine-tuning in 16-bit)
- Two weeks: Full release of code, paper, and a collection of 65B models
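Why 4-bit is the number that makes a single 24/48 GB GPU plausible: raw weight memory scales linearly with bit width. A sketch of the weights-only arithmetic (activations, optimizer state, and quantization overhead are deliberately ignored here, so real usage is somewhat higher):

```python
# Weights-only memory for a 65B-parameter model at various precisions.

def weight_gb(params, bits):
    """Raw weight storage in decimal GB (params * bits / 8 bytes)."""
    return params * bits / 8 / 1e9

params = 65e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gb(params, bits):.1f} GB")
```

At 16-bit a 65B model's weights alone exceed any single consumer GPU; at 4-bit they drop to ~32.5 GB, which is how a 48 GB card comes into range.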
It's a great question. I roughly think of finetuning as analogous to expertise in people:
- Describe a task in words ~= zero-shot prompting
- Give examples of solving task ~= few-shot prompting
- Allow person to practice task ~= finetuning
With this analogy in mind, it's…
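The three regimes in the analogy above can be made concrete as literal prompt strings (the sentiment task and wording below are made up for illustration):

```python
# Zero-shot vs. few-shot vs. finetuning, as data. Toy task for illustration.

task = "Classify the sentiment of the review as positive or negative."

# 1) zero-shot: describe the task in words only
zero_shot = f"{task}\nReview: 'Great battery life.'\nSentiment:"

# 2) few-shot: additionally show worked examples inside the prompt
examples = [
    ("Terrible screen.", "negative"),
    ("Love the keyboard!", "positive"),
]
shots = "\n".join(f"Review: '{r}'\nSentiment: {s}" for r, s in examples)
few_shot = f"{task}\n{shots}\nReview: 'Great battery life.'\nSentiment:"

# 3) finetuning: the same (input, label) pairs become gradient updates
#    to the weights instead of tokens in the prompt -- the "practice"
#    step in the analogy; no in-prompt examples are needed afterward.
finetune_dataset = [
    {"prompt": f"Review: '{r}'\nSentiment:", "completion": f" {s}"}
    for r, s in examples
]

print(zero_shot)
print(few_shot)
print(finetune_dataset[0])
```

The key practical difference: (1) and (2) spend context tokens at every inference call, while (3) pays a one-time training cost and then runs on the bare prompt.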
Amusing: my eval is "wait and observe general sentiment on 4chan, Reddit, Twitter, and Discord"