I suggest making the benchmark airtight, so that the goalposts cannot be moved. If someone submits a blob of code that does really well, but their solution is not one that people in the community approve of, they cannot be accused of "cheating" or breaking guidelines.
Easier said than done, as @nasrinmmm could tell you.
New conversation
We have been developing a test set for NLU. Some examples from various sample categories can be found at https://www.rulasys.com/rula-test . We look forward to a similar benchmark that tests "true NL understanding" ability rather than "phrase finding" ability.
This is close to what I have in mind. Has the community tried it? Please DM me to discuss.
New conversation
I think a good test would be framed in terms of narrative, which is something we don't have a good formal understanding of yet, but which we understand at an intuitive level. Existing tests (e.g. bAbI) are about monitoring factual states of the world. Narrative, on the other hand, would represent the changes in state, rather than snapshots. It could best be described as arcs, plots, and journeys, and it incorporates clear cause-effect relationships. Moreover, it would be centered on agents, including their mental models and how they change over time.
New conversation
Gary, I can contribute to benchmark development. Will connect offline.
@GaryMarcus We will have our own benchmark for real NLU, unlike today's benchmarks, which test whether a machine learning system, once trained on a specific data set, can highlight pieces of text.
New conversation
Make the AI watch half a movie and ask it to explain what might happen in the second half. Ask it to listen to a few lectures on website design and then build a website about cats. If you set the bar higher, Gary, people might start to realise how ill-equipped ML is to deal with GI.