fact: progress on benchmarks has NOT thus far translated into progress on dialog & story understanding. NO current AI can read a story/article or follow a dialog & build a mental model of what has taken place.
questions we should be asking:
- why not?
- what else should we do?
Because we need different kinds of benchmarks, beyond static train/validate/test sets. https://twitter.com/douwekiela/status/1190083616874934272?s=20 … goes in the right direction.