About 10,000 deep learning papers have been written showing that hard-coding priors about a specific task into a NN architecture works better than having no prior at all -- but they're typically passed off as "architecture XYZ offers superior performance for [overly generic task category]"
-
You can always "buy" performance by training on more data or better data, or by injecting task information into the architecture or the preprocessing. However, this isn't informative about the generalization power of the techniques used (which is the only thing that matters)
-
Basically, a lot of papers can be rephrased as "we achieved better performance on this specific task by going to great lengths to inject more information about the task into our training setup"
-
An extreme case of this: working with a synthetically generated dataset where samples follow a "template" (e.g. bAbI), and manually hard-coding that template into your NN architecture
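A minimal sketch of what that hard-coding can look like, with hypothetical shapes and vocabulary size: if you know every bAbI-style sample is "a fixed set of fact sentences plus one question", you can wire that template directly into the model graph.

```python
# A sketch (hypothetical shapes and vocab size, not the bAbI spec) of baking
# a dataset's generative template into the architecture: every sample is
# assumed to be exactly "a fixed set of facts plus one question".
from tensorflow import keras
from tensorflow.keras import layers

facts_in = keras.Input(shape=(10, 32))   # assumed: 10 fact sentences, 32-dim embeddings
question_in = keras.Input(shape=(32,))   # assumed: one embedded question

# The template lives in the wiring itself: facts get pooled together,
# then combined with the question to predict a one-word answer.
facts_summary = layers.GlobalAveragePooling1D()(facts_in)
merged = layers.concatenate([facts_summary, question_in])
answer = layers.Dense(20, activation="softmax")(merged)  # assumed 20-word answer vocab

model = keras.Model([facts_in, question_in], answer)
```

A model wired this way can only ever consume data that follows the template, which is exactly why its score on that dataset says little about the technique's generality.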
-
Fitting parametric models via gradient descent, unsurprisingly, works best when what you are fitting is already a template of the solution. Of course, convnets are an instance of this (but in a good way, since their assumptions generalize to all visual data).
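To make the convnet point concrete, here is a minimal sketch (Keras, with assumed 28x28 grayscale inputs and 10 classes) contrasting a prior-free dense model with a convnet whose weight sharing hard-codes the visual priors:

```python
# A minimal sketch contrasting a prior-free dense model with a convnet.
# The convnet's weight sharing hard-codes the assumptions that visual
# features are local and that their meaning is translation-invariant.
from tensorflow import keras
from tensorflow.keras import layers

# Prior-free baseline: every pixel-to-unit connection gets its own weight,
# so nothing about spatial structure is assumed.
dense_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Convnet: the same 3x3 kernel is applied at every spatial location,
# encoding the prior that local patterns matter regardless of position.
conv_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
```

The prior drastically cuts the parameter count, and because it holds for essentially all visual data, it buys performance without sacrificing generality -- which is what makes it a good prior rather than task-specific hard-coding.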
-
Replying to @fchollet
So methodologically, papers making a claim of architectural generality need to demonstrate that generality across a range of tasks and datasets. This is what the ML community used to do before the arrival of these large, expensive single-dataset benchmarks
-
Basically yes, researchers should focus on measuring generalization rather than performance on a single task. The point of an architectural innovation is to be able to reuse it across tasks / datasets -- so you should understand what priors it contains and what its specialization is
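One way to operationalize this, sketched below with a hypothetical build_model factory and placeholder task data: fix the architecture, vary the task, and report the whole profile of scores rather than a single number.

```python
# A sketch of measuring generalization: one fixed architecture, many tasks.
# `build_model` is a hypothetical factory returning a compiled Keras model
# (with an accuracy metric) for a given input shape and class count; `tasks`
# maps task names to (x_train, y_train, x_test, y_test) numpy arrays.
def evaluate_across_tasks(build_model, tasks, epochs=5):
    scores = {}
    for name, (x_train, y_train, x_test, y_test) in tasks.items():
        model = build_model(x_train.shape[1:], int(y_train.max()) + 1)
        model.fit(x_train, y_train, epochs=epochs, verbose=0)
        _, accuracy = model.evaluate(x_test, y_test, verbose=0)
        scores[name] = accuracy
    # The architecture's value is the whole profile, not any single entry.
    return scores
```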
-
Replying to @fchollet @tdietterich
Abel.TM @Abel_TorresM
Intelligence should be tested on never-before-seen tasks, decomposed into subtasks until they can be expressed in terms of existing knowledge or lead to new knowledge. It is not about learning patterns or the associations between them, but about modeling the world based on them