This just means that for any two samples, you can slowly morph one sample into the other without "stepping out" of the latent space. pic.twitter.com/c4bg4q0O4W
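A minimal sketch of what this interpolation looks like, assuming you already have latent codes for two samples (here replaced by random stand-in vectors) and a trained decoder (left as a comment, since it isn't defined here):

```python
import numpy as np

# Stand-ins for the latent codes of two real samples; in practice these
# would come from an encoder applied to actual data points.
z_a = np.random.randn(64)
z_b = np.random.randn(64)

# Morph z_a into z_b in small steps. Every intermediate point is itself a
# valid latent code -- we never "step out" of the latent space.
for alpha in np.linspace(0.0, 1.0, num=9):
    z_interp = (1.0 - alpha) * z_a + alpha * z_b
    # decoder(z_interp) would map each intermediate code back to a
    # plausible-looking sample (hypothetical decoder, not defined here).
```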
The ideal model literally just encodes the latent space -- it would be able to perfectly generalize to *any* new sample. An imperfect model will partially deviate from the latent space, leading to possible errors.
Being able to fit a curve that approximates the latent space relies critically on two factors: 1. The structure of the latent space itself! (a property of the data, not of your model) 2. The availability of a "sufficiently dense" sampling of the latent manifold, i.e. enough data
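To illustrate the second factor, here is a toy sketch (model size and numbers are arbitrary): data living on a simple 1-D manifold (a sine curve), sampled densely enough that a small MLP can make sense of inputs it has never seen by interpolating between training samples.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy 1-D "latent manifold": points on a sine curve, sampled densely.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(2000, 1))
y_train = np.sin(x_train).ravel()

# A small MLP is just a parametric curve fitted to these samples.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(x_train, y_train)

# Inputs the model never saw, but which lie between training samples:
x_new = np.array([[0.123], [1.987], [-2.5]])
print(model.predict(x_new))  # should land close to np.sin(x_new)
```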
You *cannot* generalize in this way to a problem where the manifold hypothesis does not apply (i.e. a truly discrete problem, such as finding prime numbers).
In this case, there is no latent manifold to fit to, which means that your curve (i.e. your deep learning model) will simply memorize the data -- interpolated points on the curve will be meaningless. Your model will be a very inefficient hash table that embeds your discrete space.
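Here is a hedged sketch of such an experiment (the setup below is illustrative, not a definitive benchmark): an off-the-shelf MLP classifier trained to predict primality from the binary digits of an integer. Nearby integers carry no smooth structure relating them to primality, so there is nothing for the model to interpolate over.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def encode(n, width=14):
    # Represent the integer by its binary digits.
    return [(n >> i) & 1 for i in range(width)]

nums = np.arange(2, 10000)
X = np.array([encode(int(n)) for n in nums])
y = np.array([is_prime(int(n)) for n in nums])

# Hold out every 5th integer: these sit "between" training integers,
# but that proximity tells you nothing about primality.
held_out = np.arange(len(nums)) % 5 == 0
clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0)
clf.fit(X[~held_out], y[~held_out])
print("held-out accuracy:", clf.score(X[held_out], y[held_out]))
# Whatever the model learns here amounts to memorizing its training set,
# rather than the kind of generalization you get on manifold-structured data.
```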
The second point -- training data density -- is equally important. You will naturally only be able to train on a very sparse sampling *of the encoding space*, but you need to *densely cover the latent space*.
It's only with a sufficiently dense sampling of the latent manifold that it becomes possible to make sense of new inputs by interpolating between past training inputs without having to leverage additional priors. pic.twitter.com/SmRvEN2NXS
The practical implication is that the best way to improve a deep learning model is to get more data or better data (overly noisy / inaccurate data will hurt generalization). A denser coverage of the latent manifold leads to a model that generalizes better.
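A quick way to see the effect of sampling density, continuing the toy sine-manifold sketch from above (again with arbitrary numbers): fit the same model to progressively denser samplings and compare held-out error. Typically, the denser the coverage, the lower the error on unseen points.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_test = rng.uniform(-3.0, 3.0, size=(500, 1))
y_test = np.sin(x_test).ravel()

for n_samples in (20, 200, 2000):
    x_train = rng.uniform(-3.0, 3.0, size=(n_samples, 1))
    y_train = np.sin(x_train).ravel()
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
    model.fit(x_train, y_train)
    mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"{n_samples:5d} training samples -> test MSE {mse:.4f}")
```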
This is why *data augmentation techniques*, like exposing a model to variations in image brightness or rotation angle, are an extremely effective way to improve test-time performance. Data augmentation is all about densifying your latent space coverage (by leveraging visual priors).
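For instance, a minimal Keras-style sketch (assuming a recent Keras / TensorFlow version where these image preprocessing layers are available; the model around them is a throwaway example):

```python
from tensorflow import keras

# Augmentation layers: each training image is shown with random variations
# in orientation and brightness. The transforms don't change what the image
# depicts, so they densify coverage of the latent manifold using visual priors.
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),    # up to +/- 10% of a full turn
    keras.layers.RandomBrightness(0.2),  # needs a recent Keras release
])

inputs = keras.Input(shape=(32, 32, 3))
x = data_augmentation(inputs)            # only active during training
x = keras.layers.Rescaling(1.0 / 255)(x)
x = keras.layers.Conv2D(32, 3, activation="relu")(x)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```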
In conclusion: the only thing you'll find in a DL model is what you put into it: the priors encoded in its architecture and the data it was trained on. DL models are not magic. They're big curves that fit their training samples, with some constraints on their structure.