Last week: Deep learning is on the verge of understanding natural language. The problem of tracking world change over time was solved 3 years ago. This week: Nobody ever said statistical language models understand anything. https://twitter.com/tdietterich/status/1191756945226522624
Replying to @GaryMarcus
Now now. I didn't say they understand nothing; you know very well I wouldn't claim that. The "extend this prompt" demos show that the new models extend the span of "local coherence" far beyond what n-gram models would capture. But they also reveal major semantic failures...
Replying to @tdietterich
The real question is whether deep learning alone can maintain representations of the world as it unfolds in a narrative, article, etc. Aside from the highly limited worlds of the FB bAbI tasks, I have not seen any evidence that it can, and I have given principled reasons to think otherwise.
Replying to @GaryMarcus @tdietterich
Even on the limited-world bAbI tasks, what happens when two or more tasks are combined? I would say the first sign of understanding will come when a system can pass any of the bAbI tasks, individually or collectively, with the same accuracy.
That said, the bAbI tasks test only very limited aspects of a few senses. Take task 14, time manipulation: it only tests the ability to understand before/after relations or time spans like morning or evening. Understanding time involves far more complex aspects than that.
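For context, the kind of before/after question that bAbI task 14 poses can be sketched as a tiny program. This is an illustrative sketch only: the story and the `where_before` helper are hypothetical, and it assumes the events have already been sorted into temporal order, whereas the real dataset states them out of order with markers like "yesterday" or "this evening", which is most of the difficulty.

```python
# Hypothetical sketch of the before/after reasoning tested by bAbI task 14.
# Events are (person, location) pairs assumed to be in temporal order.

def where_before(events, person, place):
    """Return the location `person` was in immediately before `place`."""
    prev = None
    for p, loc in events:
        if p != person:
            continue
        if loc == place:
            return prev
        prev = loc
    return None

story = [
    ("Julie", "school"),   # Yesterday Julie was at school.
    ("Julie", "park"),     # This morning Julie went to the park.
    ("Julie", "cinema"),   # This evening Julie went to the cinema.
]

print(where_before(story, "Julie", "cinema"))  # prints "park"
```

The point of the thread stands: a system that scores well on questions like this has only shown it can track an explicit ordering, not that it understands time.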
Replying to @alok_damle @GaryMarcus
I think bAbI is a very limited (i.e., nearly useless) set of tasks. But I would reword your statement to say "There are many more aspects of time to understand than just that."
Replying to @tdietterich @GaryMarcus
Exactly my point: the performance of state-of-the-art AI would collapse if one combined just two of these "nearly useless" tasks. And even when statistics-based AI passes this time-manipulation task, it doesn't understand even "just that".
The problem is that the focus is always on the benchmarks rather than on the approach the system takes to pass them. If the approach is human-level, then the system can pass the bAbI tasks as well as the one @GaryMarcus is proposing.
Replying to @alok_damle @GaryMarcus
Agreed. "Benchmarking disease" is particularly bad when there is only one benchmark (e.g., ImageNet's 1000 categories). We need many diverse tasks to prove that AI/ML techniques are general.
agreed; cc @roydanroy re our last interchange
Replying to @GaryMarcus @tdietterich
Agreed with both of you, Tom and Gary!