It's only with a sufficiently dense sampling of the latent manifold that it becomes possible to make sense of new inputs by interpolating between past training inputs without having to leverage additional priors. pic.twitter.com/SmRvEN2NXS
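A minimal NumPy sketch of what "interpolating between past training inputs" could look like, assuming a stand-in encoder (the fixed random projection below is purely hypothetical, not any particular model): a new input is made sense of by blending its nearest training points in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned encoder: in practice this would be a trained model
# mapping inputs onto the latent manifold. Here it's just a fixed projection.
W = rng.normal(size=(64, 8))

def encode(x):
    return x @ W

# "Training" inputs that densely sample the input space.
train_x = rng.normal(size=(1000, 64))
train_z = encode(train_x)

def interpolate(new_x, k=5):
    """Make sense of a new input by blending its k nearest training points
    in latent space, weighted by proximity."""
    z = encode(new_x)
    dists = np.linalg.norm(train_z - z, axis=1)
    idx = np.argsort(dists)[:k]
    weights = 1.0 / (dists[idx] + 1e-8)
    weights /= weights.sum()
    return (weights[:, None] * train_z[idx]).sum(axis=0)

new_input = rng.normal(size=(64,))
approx = interpolate(new_input)
print(approx.shape)  # (8,) -- a point on the sampled latent manifold
```

The denser the training sampling, the closer a new input's neighbors sit, and the less the blend has to stretch; with sparse sampling there is nothing nearby to interpolate between.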
The practical implication is that the best way to improve a deep learning model is to get more data or better data (overly noisy or inaccurate data will hurt generalization). A denser coverage of the latent manifold leads to a model that generalizes better.
This is why *data augmentation techniques*, like exposing a model to variations in image brightness or rotation angle, are an extremely effective way to improve test-time performance. Data augmentation is all about densifying your latent-space coverage (by leveraging visual priors).
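For the image case, a minimal Keras sketch (the specific factors are arbitrary, and `RandomBrightness` assumes a recent Keras/TF version): random perturbations applied at training time generate nearby points on the manifold around each real sample.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Augmentation pipeline: each epoch, every image is seen with a slightly
# different brightness, rotation, and flip, densifying coverage around
# the real training samples.
data_augmentation = keras.Sequential([
    layers.RandomBrightness(factor=0.2),  # vary brightness by up to ~20%
    layers.RandomRotation(factor=0.05),   # rotate by up to ~5% of a full turn
    layers.RandomFlip("horizontal"),      # a cheap visual prior for many datasets
])

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    data_augmentation,                    # only active in training mode
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The augmentation layers are inactive at inference time, so the test-time model is unchanged; only the density of the training sampling improves.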
In conclusion: the only thing you'll find in a DL model is what you put into it: the priors encoded in its architecture and the data it was trained on. DL models are not magic. They're big curves that fit their training samples, with some constraints on their structure.
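To make "big curves that fit their training samples" concrete, a toy sketch: a small MLP fitted to noisy samples of a 1-D function is literally just a flexible parametric curve adjusted to pass near the data (the sine target and layer sizes below are arbitrary choices for illustration).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Noisy samples of an underlying 1-D "manifold" (here, a sine curve).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(256, 1))
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)

# A deep learning model, reduced to its essence: a parametric curve whose
# shape is adjusted (via gradient descent) to fit the training samples.
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)

# Between training points the curve interpolates well; far outside their
# range it has nothing to anchor it and generalization degrades.
print(model.predict(np.array([[0.5], [10.0]]), verbose=0))
```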
I saw you were discussing this with Yoshua Bengio at the AGI conference, to which he replied that the missing piece is getting rid of the independence assumption in the latent space by assuming some additional 'modularity' prior. Do you have any comments on this?
Yoshua is right! The more priors you inject, the less data you need to obtain a curve that approximates the latent manifold. Strong & accurate priors enable you to "see" further from the stepping stones (data points) you have.
Entire subfields of DL, like architecture design and data augmentation, are about leveraging new or better priors in this way. And such priors are often about modularity! This is why we use "layers" or "convolutions" in DL instead of an amorphous soup of parameters.
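A small illustration of how an architectural prior like convolution constrains the curve (the 32x32 sizes are arbitrary; the parameter counts follow directly from the layer definitions): locality and weight sharing leave far fewer free parameters than an unstructured dense mapping.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same 32x32 RGB input, two ways of producing a 32x32x16 feature map.

# No prior: every output unit connects to every input pixel.
dense = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(32 * 32 * 16),
    layers.Reshape((32, 32, 16)),
])

# Convolutional prior: local connectivity + translation-invariant
# weight sharing ("modularity" baked into the architecture).
conv = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, kernel_size=3, padding="same"),
])

print(dense.count_params())  # 50,348,032: 3072 * 16384 weights + 16384 biases
print(conv.count_params())   # 448: 3*3*3*16 weights + 16 biases
```

The convolutional model can only express functions consistent with its prior, which is exactly why it needs far less data to approximate image manifolds.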
Makes sense. But I suppose the manifold hypothesis persists regardless of the priors we use? Then end-to-end DL will never truly get us to the 'system 2' type of capabilities. I guess the uncertainty is in whether we can find good priors to get enough OOD (out-of-distribution) generalization?
It's complicated -- it's basically a fundamental question about the structure of information in the universe. DL only works with spaces where the manifold hypothesis applies (regardless of priors). The question is how far it really extends.
I really believe that there are two poles within the space of all problems: discrete and continuous, with a swampy spectrum in-between. Perception (e.g. natural images) is very continuous. Prime numbers, very discrete. Most problems are somewhere along the spectrum.
While DL might be able to attack some mostly-discrete problems with a continuous embedding space (e.g. language), it seems fundamentally ill-suited for the task. We also need to leverage discrete search to get full coverage of the problem spectrum. This was the topic of my talk.