Isn’t the problem with this the same as kernel machines: how do you choose a meaningful measure of distance? Ideally your radius-extended blue set should resemble the test data distribution for this to be a measure of generalisation. What am I missing?
Replying to @fhuszar
The point is to test on data that does not, in fact, resemble the training distribution. You are right that picking an appropriate definition for distance is important. This is domain-specific, there will not be one universal definition. But there can be one for images, or games
Replying to @fchollet
So you’re testing a special case of domain adaptation. Presumably, if you know the definition of the distance you could just train with corresponding data augmentation? Say, for Euclidean distance use spherically symmetric additive noise.
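The augmentation idea mentioned here can be sketched in a few lines: for a Euclidean notion of distance, isotropic (spherically symmetric) Gaussian noise spreads training coverage into a ball around each point. This is a minimal illustration assuming NumPy; the function name and radius-as-scale convention are hypothetical, not from either participant.

```python
import numpy as np

def augment_euclidean(x, radius, n_copies, seed=None):
    """Augment a batch of points with isotropic Gaussian noise,
    so training coverage roughly extends a Euclidean ball of the
    given radius around each original point."""
    rng = np.random.default_rng(seed)
    # Spherically symmetric noise: same scale in every direction.
    noise = rng.normal(scale=radius, size=(n_copies,) + x.shape)
    return x[None, ...] + noise  # shape: (n_copies, *x.shape)

X = np.zeros((4, 3))  # toy "training set": 4 points in R^3
X_aug = augment_euclidean(X, radius=0.1, n_copies=5, seed=0)
print(X_aug.shape)    # prints (5, 4, 3)
```

For images or games, as the thread notes, the relevant distance is domain-specific, so the corresponding augmentation would be too (e.g. crops and color jitter rather than raw additive noise).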
Replying to @fchollet
It was a pleasure engaging in a discussion about your idea. I’m sorry you had to give up on me.
Replying to @fhuszar
In case you were actually expecting a reply: the purpose of this property is that it is useful in the real world. You might want, for example, to train a robot to operate in a couple of kitchens, then have it operate in an unknown kitchen, which humans can do.
A DL model would have to be trained on a dense sampling of all possible kitchens. In the real world it's easy to come up with novel test data points, but impossible to have a formal definition of generalization distance, let alone a way to use data augmentation to cover the full space.
Naturally you know this and you weren't expecting me to point it out. You frequently send me these bad-faith adversarial/ironic replies where you play dumb to make a point. Too used to it to "engage", sorry. It was my mistake to even reply in the first place.
I think the solution will rely mostly on: 1) working with programs that are richer than neural networks (i.e. which can express complex and general information processes in a compact form), e.g. going from ML to program synthesis
2) ability to abstract learned patterns (learning a program in one setting and turning it into something that can be applied more generally, on analogous-but-different inputs) 3) lifelong learning and heavy program reuse across tasks/environments to build strong priors
I see your point. I think the original tweet can be misunderstood as suggesting a new way of measuring generalization. But when you think about it, having a separate test set is our best bet (which everyone is already doing).