completely agree: remarkable ≠ semantically coherent. The system has no idea that a unicorn has one horn, blithely asserts it has four, etc. Simultaneously close and not close to Searle's room; not at all close to genuine understanding. https://twitter.com/Grady_Booch/status/1096841495519232000
Ben Sprecher, replying to @GaryMarcus:
Then why is it getting decent Winograd scores (a test you originally made me aware of when pointing out BERT's shortcomings)? I asked if one could just throw 10x more data at it and you indicated that wasn't feasible. Are we there yet? https://twitter.com/bensprecher/status/1071651722181849088?s=19
Ben Sprecher added (quoting his earlier tweet):
Ben Sprecher @bensprecher, replying to @GaryMarcus: BERT is achieving state-of-the-art performance on a broad range of tests, but not (you are correct) the GLUE WNLI, due, at least as implied in the paper, to an issue with the construction of the data set. Taking a step back, I think you are looking for logical symbolic 1/
Gary Marcus, replying to @bensprecher:
Decent, not great, because of some modest correlations? Hard for me to say much without seeing any of the data and without access to the model. Would be interesting to see if it asymptotes or improves with, e.g., 10x or more data.
Ben Sprecher, replying to @GaryMarcus:
100% agree on the asymptote question. They give an encouraging indication w/r/t model size (see below), not data. But the good news on the data side is that since it's unsupervised, you just need to feed it more text. Going from 40 GB to 40 TB is just more web scraping/curating. pic.twitter.com/X0MjRlBsTt
I see a hint of diminishing returns there, actually (a smaller percentage-point increase for the last doubling in model size). Will be interesting to see, certainly.
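The diminishing-returns observation above can be made concrete: compute the percentage-point gain from each successive doubling of model size and check whether the gains shrink. A minimal sketch follows; the parameter counts are GPT-2's published model sizes, but the accuracy numbers are invented placeholders, not figures from the paper or the chart referenced in the thread.

```python
# Hypothetical sketch of the "diminishing returns" check. The parameter
# counts are GPT-2's published model sizes; the accuracy values are
# invented placeholders, NOT real figures from the paper or chart.
sizes_m = [117, 345, 762, 1542]        # model size, millions of parameters
accuracy = [63.0, 67.0, 69.5, 70.7]    # placeholder Winograd-style accuracy (%)

# Percentage-point gain from each (roughly) doubling step.
gains = [b - a for a, b in zip(accuracy, accuracy[1:])]

# Diminishing returns: each successive gain is smaller than the last.
diminishing = all(later < earlier for earlier, later in zip(gains, gains[1:]))
print(gains, diminishing)
```

With these placeholder numbers each doubling buys less than the one before, which is the pattern being pointed at; whether the real curve asymptotes or keeps improving with 10x the data is exactly the open question in the thread.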