I think you'd need some restrictions on the algorithm. If it knows it's a word math problem, it can overfit to that particular kind of "understanding" (e.g. use hand-crafted features and a linear model). Would that count as successful to you?
-
-
-
Indeed, that’s why I said in another tweet a moment ago: a general model, not tailored to this task.
End of conversation
New conversation -
-
-
tell me more (via contact form at garymarcus dot com)
End of conversation
New conversation -
-
-
This will be a pretty cool benchmark for math problem solving. Maybe a natural upgrade for https://github.com/deepmind/mathematics_dataset …
-
-
-
We have a few datasets relevant to math word problem solving. Please check out MathQA: https://arxiv.org/abs/1905.13319 and MAWPS: https://homes.cs.washington.edu/~hannaneh/papers/mawps-data.pdf …
-
-
-
Can we make a GLUE-like dataset out of this? Seriously, this could have tremendous value.
-
-
-
AI2's DROP dataset is related to this. But the community would enjoy a bigger one.
-
-
-
Do you want something along the lines of GLUE? https://super.gluebenchmark.com/
-