I have claimed that auto-regressive LLMs are exponentially diverging diffusion processes. Here is the argument: let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1 - e)^n.
Errors accumulate. The probability of correctness decreases exponentially. One can mitigate the problem by making e smaller (through training), but one simply cannot eliminate it entirely. A solution would require making LLMs non-auto-regressive while preserving their fluency.
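A minimal numeric sketch of the claim (not from the thread), under its independence assumption, showing how (1 - e)^n collapses as the answer grows:

```python
# Probability that an n-token answer stays inside the tree of
# "correct" answers, assuming each token independently exits it
# with probability e.
def p_correct(e: float, n: int) -> float:
    return (1.0 - e) ** n

for e in (0.01, 0.001):
    for n in (100, 1000):
        print(f"e={e}, n={n}: p_correct={p_correct(e, n):.5f}")
```

Even with a per-token error rate of 1%, a 100-token answer is correct only about a third of the time under this model, and a 1000-token answer almost never.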
I should add that things like RLHF may reduce e but do not change the fact that token production is auto-regressive and subject to exponential divergence.
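A small illustration (my own, not from the thread) of why reducing e only postpones the decay: the answer length at which correctness drops below 1/2 is n = ln(1/2) / ln(1 - e), which stretches as e shrinks but never becomes infinite for e > 0.

```python
import math

# Answer length at which (1 - e)^n falls below 1/2:
# shrinking e stretches this horizon but never removes it.
def half_life_tokens(e: float) -> float:
    return math.log(0.5) / math.log(1.0 - e)

for e in (0.01, 0.001, 0.0001):
    print(f"e={e}: ~{half_life_tokens(e):.0f} tokens")
```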
While I don't disagree with you about LLMs, aren't you assuming that each "e" is independent? That's clearly not the case here.
Yes, I'm implicitly making an independence assumption. Not clear to what extent it's valid or invalid.
Come on, are we saying now that the tokens generated are independent? That's clearly not true, or everyone missed something very important here.
No. We are assuming the *errors* are independent, so their probabilities can be multiplied.
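A quick Monte Carlo check of that multiplication step (a sketch I'm adding, assuming independent per-token error events): the empirical survival rate over n tokens should match (1 - e)^n.

```python
import random

# Simulate n independent per-token error events per trial and
# count the fraction of answers that survive all n of them.
def survival_rate(e: float, n: int, trials: int = 20_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    survived = sum(
        all(rng.random() >= e for _ in range(n)) for _ in range(trials)
    )
    return survived / trials

e, n = 0.01, 100
print(survival_rate(e, n), (1 - e) ** n)  # both near 0.37
```

If the error events were correlated (as the repliers suggest), the product formula would over- or under-state the true survival probability.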
Does this take into account the fact that, even as the search space grows at each tree level, all the (likely correct) priors would make the answer converge into the "correct" space?