Language models (LMs) are already deployed in many real-world applications and used to interact with users 👩‍🦰, but these models are primarily evaluated non-interactively.
How can we evaluate LMs interactively and why is it important? (1/8)






















