completely agree: remarkable ≠ semantically coherent. system has no idea that a unicorn has one horn, blithely asserts it has four, etc. simultaneously close and not close to Searle's room, not at all close to genuine understanding. https://twitter.com/Grady_Booch/status/1096841495519232000
Replying to @GaryMarcus
Seems like you're moving the yardstick here. It's coherent wrt entities often over whole paragraphs. Certainly that falls into the camp of things "deep learning shouldn't do". Obviously it's not solving the common sense problem, but it's hard to argue that it's not a step forward
Replying to @Zergylord
if i did same with n-grams where n > 50, yielding similar local coherence, would you insist that I count that as progress? we may not have had the data or compute in 1960s, but we have long known that local quasi-coherence can be captured that way - without real understanding
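The n-gram comparison Marcus is making can be sketched in a few lines: a model that only memorizes which tokens follow each fixed-length context, then samples continuations. This is a minimal illustration (the toy corpus and function names are my own, not from the thread); it shows how such a model stitches together locally fluent fragments of its training data without any notion of what the words mean.

```python
import random
from collections import defaultdict

def build_ngram_model(tokens, n):
    """Map each (n-1)-token context to the list of tokens observed after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model

def generate(model, seed, length, rng_seed=0):
    """Extend the seed by repeatedly sampling a continuation for the current context."""
    rng = random.Random(rng_seed)
    out = list(seed)
    context_len = len(seed)
    for _ in range(length):
        context = tuple(out[-context_len:])
        choices = model.get(context)
        if not choices:  # unseen context: a pure lookup model has nothing to say
            break
        out.append(rng.choice(choices))
    return out

corpus = "the unicorn has one horn and the unicorn lives in the forest".split()
model = build_ngram_model(corpus, n=3)
print(" ".join(generate(model, seed=["the", "unicorn"], length=8)))
```

Every emitted word is copied from a context seen verbatim in training, so the output is locally grammatical by construction; the `break` on an unseen context is exactly the novelty failure Marcus raises later in the thread.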
Replying to @GaryMarcus
N-gram models of the size needed to capture this range of structure would overfit to an absurd degree. At what point does "local coherence" make the jump to global?
Replying to @Zergylord
What makes you think there isn’t overfitting to a high degree? The pastiche-y nature of the output seems like that to me. Some of the coherence violations are actually local (unicorns can’t have four horns), and I don’t see this approach as on a track to genuine coherence.
Replying to @GaryMarcus
They report the n-gram overlap with the training set, so it's at least not overfitting in the trivial sense. To be clear, I'm happy with just syntactic coherence, and agree it won't learn "real" semantics. But I view that as a problem with next step prediction rather than DL.
Replying to @Zergylord
but you can get syntactic coherence with n-grams, if you are indifferent to semantics.
Replying to @GaryMarcus
As a non-linguist, my jargon was probably wrong. I meant that these examples show tracking of novel entities over long time horizons, but are bad at inferring their real-world properties. N-grams can't even do the former, and many argued that neither could neural networks.
Replying to @Zergylord
large enough ngrams could do same, and GPT-2 still wanders off. not saying it isn’t better than ngrams, but i think the spirit and profile of what can and can’t be done is similar.
Replying to @GaryMarcus
A "large enough" lookup table could perfectly capture the whole of cognition (over a finite lifetime). The problem is combinatorics -- at some point differences in scale become differences in kind.
Replying to @Zergylord
no. you will never have enough data to deal with novelty with a pure lookup table.
Replying to @GaryMarcus
1) Finite possible sensory percepts 2) Finite possible muscle actuations. The correct map from the history of 1) to 2) should be sufficient. (It's just absurdly large)
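Back-of-envelope arithmetic makes the "absurdly large" concession concrete. The percept size and time horizon below are illustrative assumptions of mine, not figures from the thread; even these modest numbers put the number of distinct sensory histories far beyond the number of atoms in the observable universe, which is the combinatorial point Marcus is pressing.

```python
# Assumed, illustrative numbers: a coarse binary percept of 1,000 bits
# per timestep, and a history of only 100 timesteps.
bits_per_percept = 1_000
timesteps = 100

# A lookup table over histories needs one entry per possible history.
distinct_histories = 2 ** (bits_per_percept * timesteps)

# Commonly cited rough estimate for atoms in the observable universe.
atoms_in_observable_universe = 10 ** 80

print(distinct_histories > atoms_in_observable_universe)  # True
```

2^100000 is roughly 10^30103, so no physically realizable table can enumerate the map, let alone fill it with training data covering every novel history.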