Now, how do you learn such a curve? That's where deep learning comes in.
-
But by this point this thread is LONG and the Keras team sync starts in 30s, so I refer you to DLwP, chapter 5 for how DL models and gradient descent are an awesome way to achieve generalization via interpolation on the latent manifold. https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff
-
I'm back, just wanted to add one important note to conclude the thread: deep learning models are basically big curves fitted via gradient descent that approximate the latent manifold of a dataset. The *quality of this approximation* determines how well the model will generalize.
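To make that concrete, here's a minimal sketch (my own illustration, not code from the thread or from DLwP) of "a big curve fitted via gradient descent": a small Keras MLP fitted to noisy samples of a sine curve, standing in for a toy 1D latent manifold. The data, layer sizes, and epoch count are arbitrary choices for the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Data lying near a toy 1D latent manifold (a sine curve).
x = np.linspace(-3.0, 3.0, 256).reshape(-1, 1)
y = np.sin(x) + np.random.normal(scale=0.05, size=x.shape)

# A small MLP is just a parametric curve; fit() runs gradient descent
# to bend that curve toward the sampled manifold.
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, batch_size=32, verbose=0)

# New points are read off the fitted curve: generalization here is
# interpolation between training samples on the approximated manifold.
print(model.predict(np.array([[0.5], [1.7]]), verbose=0))
```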
-
The ideal model literally just encodes the latent space -- it would be able to perfectly generalize to *any* new sample. An imperfect model will partially deviate from the latent space, leading to possible errors.
-
Being able to fit a curve that approximates the latent space relies critically on two factors:
1. The structure of the latent space itself! (a property of the data, not of your model)
2. The availability of a "sufficiently dense" sampling of the latent manifold, i.e. enough data
-
You *cannot* generalize in this way to a problem where the manifold hypothesis does not apply (i.e. a truly discrete problem, like finding prime numbers).
-
In this case, there is no latent manifold to fit to, which means that your curve (i.e. deep learning model) will simply memorize the data -- interpolated points on the curve will be meaningless. Your model will be a very inefficient hashtable that embeds your discrete space.
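A quick way to see this, sketched below (again my own toy example, not from the thread): train a small Keras MLP to classify integers as prime from a binary encoding. The encoding width, model size, and integer range are arbitrary assumptions; the expected outcome is near-perfect training accuracy but little better than the trivial "never prime" baseline on held-out integers.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def is_prime(n):
    # Trial division; returns 1 for primes, 0 otherwise.
    if n < 2:
        return 0
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return 0
    return 1

def encode(n):
    # 16-bit binary encoding of an integer.
    return [(n >> i) & 1 for i in range(16)]

nums = np.arange(2, 5000)
np.random.shuffle(nums)
X = np.array([encode(int(n)) for n in nums], dtype="float32")
y = np.array([is_prime(int(n)) for n in nums], dtype="float32")
X_train, y_train, X_test, y_test = X[:4000], y[:4000], X[4000:], y[4000:]

model = keras.Sequential([
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=100, verbose=0)

# Expect training accuracy near 100% (memorization) while test accuracy
# stays close to the trivial "never prime" baseline: the curve behaves
# like a lookup table, and points between training samples mean nothing.
print("train:", model.evaluate(X_train, y_train, verbose=0))
print("test: ", model.evaluate(X_test, y_test, verbose=0))
```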
-
The second point -- training data density -- is equally important. You will naturally only be able to train on a very sparse sampling *of the encoding space*, but you need to *densely cover the latent space*.
-
It's only with a sufficiently dense sampling of the latent manifold that it becomes possible to make sense of new inputs by interpolating between past training inputs without having to leverage additional priors.
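Here is a rough way to see the density effect (a toy sketch of mine, not the figure attached to the tweet): fit the same small model to a sparse vs. a dense sampling of the same sine manifold, then measure error on a fine grid of unseen points in between. The sample counts and model size are arbitrary.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def fit_and_test(n_samples):
    # Train on n_samples points drawn from the manifold y = sin(x)...
    x_train = np.random.uniform(-3.0, 3.0, size=(n_samples, 1))
    y_train = np.sin(x_train)
    model = keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x_train, y_train, epochs=500, verbose=0)
    # ...then measure error on a fine grid of unseen points in between.
    x_test = np.linspace(-3.0, 3.0, 1000).reshape(-1, 1)
    return model.evaluate(x_test, np.sin(x_test), verbose=0)

# With too few samples the manifold is under-determined between them;
# with a dense sampling, interpolation recovers it well.
print("sparse (8 samples): ", fit_and_test(8))
print("dense (512 samples):", fit_and_test(512))
```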
-
Replying to @fchollet
Is there a preferred mathematical approach for defining "sufficiently dense sampling" on the latent manifold?
-
It depends on the priors encoded by your model architecture. More priors enable you to "explore" further around each data point, reducing the required data density.
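One way to illustrate this (a sketch under my own assumptions, not from the reply): a convnet hard-codes locality and translation invariance as architectural priors, so on a very small slice of MNIST it typically gets much further than a plain MLP trained on the same 500 examples. The subset size, architectures, and epoch count are arbitrary choices.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Keep only 500 training examples: a very sparse sampling of the manifold.
x_small, y_small = x_train[:500], y_train[:500]

def mlp():
    # No spatial priors: every pixel is an independent input dimension.
    return keras.Sequential([
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])

def convnet():
    # Locality + translation invariance baked into the architecture.
    return keras.Sequential([
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

for name, build in [("mlp", mlp), ("convnet", convnet)]:
    model = build()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_small, y_small, epochs=20, verbose=0)
    # Stronger priors usually translate into better test accuracy
    # from the same tiny training set.
    print(name, model.evaluate(x_test, y_test, verbose=0))
```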