François Chollet (@fchollet)

Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'. Opinions are my own.

United States · fchollet.com · Joined August 2009

Tweets
1. François Chollet @fchollet · Oct 19

   A common beginner mistake is to misunderstand the meaning of the term "interpolation" in machine learning. Let's take a look ⬇️⬇️⬇️

2. François Chollet @fchollet · Oct 19

   Some people think it always refers to linear interpolation -- for instance, interpolating images in pixel space. In reality, it means interpolating *on a learned approximation of the latent manifold of the data*. That's a very different beast!
3. François Chollet @fchollet · Oct 19

   Let's start with a very basic example. Consider MNIST digits. Linearly interpolating between two MNIST samples does not produce an MNIST sample, only blurry images: pixel space is not linearly interpolative for digits! pic.twitter.com/JTjGGqwHjr
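   A minimal sketch of this pixel-space blending (not from the thread; the digit choices and loading code are illustrative):

   ```python
   # Pixel-space linear interpolation between two MNIST digits.
   import numpy as np
   from tensorflow import keras

   (x_train, y_train), _ = keras.datasets.mnist.load_data()
   a = x_train[y_train == 3][0].astype("float32") / 255.0  # a "3"
   b = x_train[y_train == 8][0].astype("float32") / 255.0  # an "8"

   # A weighted average of raw pixels. The midpoint (t = 0.5) is a ghostly
   # overlay of both digits -- not a plausible new digit.
   steps = [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, 8)]
   ```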
4. François Chollet @fchollet · Oct 19

   However, if you interpolate between two digits *on the latent manifold of the MNIST data*, the midpoint between two digits still lies on the manifold of the data, i.e. it's still a plausible digit.
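   To see what interpolating *on the manifold* looks like in code, here is a hedged sketch that uses a tiny autoencoder as the learned manifold approximator; the architecture and training settings are assumptions for illustration, not the thread's setup:

   ```python
   # Interpolating on a *learned* latent manifold, with a small autoencoder
   # as the manifold approximator (sizes are illustrative).
   import numpy as np
   from tensorflow import keras
   from tensorflow.keras import layers

   (x_train, _), _ = keras.datasets.mnist.load_data()
   x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

   encoder = keras.Sequential([layers.Dense(64, activation="relu"),
                               layers.Dense(16)])               # latent codes
   decoder = keras.Sequential([layers.Dense(64, activation="relu"),
                               layers.Dense(784, activation="sigmoid")])
   autoencoder = keras.Sequential([encoder, decoder])
   autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
   autoencoder.fit(x_train, x_train, epochs=5, batch_size=128)

   # Encode two digits, interpolate between their *latent codes*, then decode.
   # The decoded midpoints stay near the learned manifold: they look like digits.
   za, zb = encoder.predict(x_train[:2])
   codes = np.array([(1 - t) * za + t * zb for t in np.linspace(0, 1, 8)])
   decoded = decoder.predict(codes).reshape(-1, 28, 28)
   ```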
5. François Chollet @fchollet · 11:12 AM - 19 Oct 2021

   Here's a very simple way to visualize what's going on, in the trivial case of a 2D encoding space and a 1D latent manifold. For typical ML problems, the encoding space has millions of dimensions and the latent manifold has anywhere from 2 to 1000 dimensions (it could be anything, really). pic.twitter.com/nxTU6BJCbA
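   That trivial 2D-encoding / 1D-manifold picture can be reproduced in a few lines; the particular curve below is an assumed stand-in for the figure in the tweet:

   ```python
   # A 1D latent manifold (a curve parameterized by one number t)
   # embedded in a 2D encoding space.
   import numpy as np
   import matplotlib.pyplot as plt

   t = np.linspace(0, 2 * np.pi, 200)       # the single latent coordinate
   x, y = np.cos(t), np.sin(2 * t) / 2      # its embedding: a curve in 2D

   plt.plot(x, y, label="1D latent manifold")
   plt.scatter([1.5], [0.8], marker="x", label="random point in encoding space")
   plt.legend()
   plt.title("2D encoding space, 1D latent manifold")
   plt.show()
   ```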
6. François Chollet @fchollet · Oct 19

   But wait, what does that really mean? What's a "manifold"? What does "latent" mean? How do you learn to interpolate on a latent manifold?

7. François Chollet @fchollet · Oct 19

   Let's dive deeper. But first: if you want to understand these ideas in depth, in a better format than a Twitter thread, grab your copy of Deep Learning with Python, 2nd edition, and read chapter 5 ("Fundamentals of ML"). It covers all of this in detail. https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff
8. François Chollet @fchollet · Oct 19

   Ok, so, consider MNIST digits: 28x28 black & white images. You could say the "encoding space" of MNIST has 28 * 28 = 784 dimensions. But does that mean MNIST digits represent "high-dimensional" data?

9. François Chollet @fchollet · Oct 19

   Not quite! The dimensionality of the encoding space is an entirely artificial measure that only reflects how you choose to encode the data -- it has nothing to do with the intrinsic complexity of the data.
10. François Chollet @fchollet · Oct 19

   For instance, if you take a set of different numbers between 0 and 1, print them on sheets of paper, then take 2000x2000 RGB pictures of those papers, you end up with a dataset with 12M dimensions. But in reality your data is scalar, i.e. 1-dimensional.
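   A toy version of this thought experiment, with a trivial render function standing in for the printing-and-photographing step:

   ```python
   import numpy as np

   def render(value):
       """Toy stand-in for photographing a printed number:
       paint one gray level across a 2000x2000 RGB image."""
       return np.full((2000, 2000, 3), value, dtype="float32")

   image = render(0.42)
   print(image.size)   # 12,000,000 encoding dimensions...
   # ...but the sample is fully determined by one number (0.42):
   # its intrinsic dimensionality is 1.
   ```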
11. François Chollet @fchollet · Oct 19

   You can always add (and usually remove) encoding dimensions without changing the data, simply by picking a different encoding scheme. That is, until you hit the "intrinsic dimensionality" of the data (past which you would be destroying information).
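   A small sketch of re-encoding the same data in more (redundant) dimensions; the encoding schemes are illustrative:

   ```python
   import numpy as np

   x = 0.42                           # the data: intrinsically 1-dimensional
   enc_1d = np.array([x])             # 1 encoding dimension
   enc_2d = np.array([x, 1.0 - x])    # 2 dimensions, one redundant
   enc_8d = np.repeat(x, 8)           # 8 dimensions, seven redundant

   # Every added dimension is recoverable from x, so dimensions can be added
   # or removed freely -- down to the intrinsic dimensionality (1), past
   # which information would be destroyed.
   assert np.isclose(enc_2d[0], x) and np.isclose(enc_8d.mean(), x)
   ```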
12. François Chollet @fchollet · Oct 19

   For MNIST, for instance, very few random grids of 28x28 pixels form a valid digit. Valid digits occupy a tiny, microscopic *subspace* within the encoding space. Like a grain of sand in a stadium. We call it the "latent space". This is broadly true for all forms of data.

13. François Chollet @fchollet · Oct 19

   Further, the valid digits aren't sprinkled at random within the encoding space. The latent space is *highly structured*. So structured, in fact, that for many problems it is *continuous*.

14. François Chollet @fchollet · Oct 19

   This just means that for any two samples, you can slowly morph one sample into the other without "stepping out" of the latent space. pic.twitter.com/c4bg4q0O4W
15. François Chollet @fchollet · Oct 19

   For instance, this is true for digits. This is also true for human faces. For tree leaves. For cats. For dogs. For the sounds of the human voice. It's even true for sufficiently structured discrete symbolic spaces, like human language! (Face grid image by the awesome @dribnet.) pic.twitter.com/kVgu6CDlHM

16. François Chollet @fchollet · Oct 19

   The fact that this property (latent space = very small subspace + continuous & structured) applies to so many problems is called the *manifold hypothesis*. This concept is central to understanding the nature of generalization in ML.

17. François Chollet @fchollet · Oct 19

   The manifold hypothesis posits that for many problems, your data samples lie on a low-dimensional manifold embedded in the original encoding space.
18. François Chollet @fchollet · Oct 19

   A "manifold" is simply a lower-dimensional subspace of some parent space that is locally similar to a linear (Euclidean) space, i.e. it is continuous and smooth. Like a curved line within a 3D space.
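   As a concrete instance of the definition, here is an illustrative sketch (not from the thread) of a 1D manifold, a helix, embedded in 3D, checking that it is locally close to its tangent line:

   ```python
   import numpy as np

   # A 1D manifold embedded in 3D: the helix h(t) = (cos t, sin t, t / 4).
   def h(t):
       return np.array([np.cos(t), np.sin(t), t / 4.0])

   # Local linearity: near t0, the manifold is well approximated by its
   # tangent line h(t0) + (t - t0) * h'(t0).
   t0, dt = 2.0, 0.01
   tangent = np.array([-np.sin(t0), np.cos(t0), 0.25])  # h'(t0)
   deviation = np.linalg.norm(h(t0 + dt) - (h(t0) + dt * tangent))
   print(deviation)  # tiny (second order in dt): locally, it looks like a line
   ```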
19. François Chollet @fchollet · Oct 19

   When you're dealing with data that lies on a manifold, you can use *interpolation* to generalize to samples you've never seen before. You do this by using a *small subset* of the latent space to fit a *curve* that approximately matches the latent space. pic.twitter.com/qkcmPHOaWX
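   A hedged one-dimensional sketch of generalization-as-interpolation, using a polynomial as the fitted "curve" (the manifold, sample count, and degree are assumptions for illustration):

   ```python
   import numpy as np

   true_manifold = np.sin                        # stand-in latent manifold
   t_train = np.linspace(0, 2 * np.pi, 20)       # a *small subset* of samples
   x_train = true_manifold(t_train)

   coeffs = np.polyfit(t_train, x_train, deg=7)  # the fitted "curve"
   t_new = np.linspace(0, 2 * np.pi, 100)        # inputs never seen in fitting
   x_pred = np.polyval(coeffs, t_new)            # interpolated predictions
   print(np.abs(x_pred - true_manifold(t_new)).max())  # small: it generalizes
   ```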
20. François Chollet @fchollet · Oct 19

   Once you have such a curve, you can walk along it to make sense of *samples you've never seen before* (that are interpolated from samples you have seen). This is how a GAN can generate faces that weren't in the training data, or how an MNIST classifier can recognize new digits.
21. François Chollet @fchollet · Oct 19

   If you're in a high-dimensional encoding space, this curve is, of course, a high-dimensional curve. But that's because it needs to deal with the encoding space, not because the problem is intrinsically high-dimensional (as mentioned earlier).

22. François Chollet @fchollet · Oct 19

   Now, how do you learn such a curve? That's where deep learning comes in.
23. François Chollet @fchollet · Oct 19

   But by this point this thread is LONG and the Keras team sync starts in 30s, so I refer you to DLwP, chapter 5, for how DL models and gradient descent are an awesome way to achieve generalization via interpolation on the latent manifold. https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff
24. François Chollet @fchollet · Oct 19

   I'm back; just wanted to add one important note to conclude the thread: deep learning models are basically big curves, fitted via gradient descent, that approximate the latent manifold of a dataset. The *quality of this approximation* determines how well the model will generalize.
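   A minimal sketch of "a curve fitted via gradient descent", shrunk to a single parameter so the mechanics are visible (the data and learning rate are illustrative):

   ```python
   import numpy as np

   t = np.linspace(0, 1, 100)
   targets = 3.0 * t                  # data generated by a line of slope 3
   w = 0.0                            # the "curve": f(t) = w * t

   for _ in range(200):
       grad = 2 * np.mean((w * t - targets) * t)  # d(MSE)/dw
       w -= 0.5 * grad                            # gradient descent step

   print(w)  # converges toward 3.0: the curve now matches the data manifold
   ```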
25. François Chollet @fchollet · Oct 19

   The ideal model literally just encodes the latent space -- it would be able to perfectly generalize to *any* new sample. An imperfect model will partially deviate from the latent space, leading to possible errors.

26. François Chollet @fchollet · Oct 19

   Being able to fit a curve that approximates the latent space relies critically on two factors: 1. the structure of the latent space itself (a property of the data, not of your model); 2. the availability of a "sufficiently dense" sampling of the latent manifold, i.e. enough data.

27. François Chollet @fchollet · Oct 19

   You *cannot* generalize in this way to a problem where the manifold hypothesis does not apply (i.e. a truly discrete problem, like finding prime numbers).
28. François Chollet @fchollet · Oct 19

   In this case, there is no latent manifold to fit, which means that your curve (i.e. deep learning model) will simply memorize the data -- interpolated points on the curve will be meaningless. Your model will be a very inefficient hash table that embeds your discrete space.
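   A hedged sketch of this failure mode, using scikit-learn and primality as the discrete problem (model size and encoding are illustrative assumptions, not a benchmark):

   ```python
   import numpy as np
   from sklearn.neural_network import MLPClassifier

   def is_prime(n):
       return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

   def bits(n, width=12):
       return [(n >> i) & 1 for i in range(width)]  # binary encoding of n

   train_n, test_n = range(2, 2000), range(2000, 3000)
   model = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500,
                         random_state=0)
   model.fit([bits(n) for n in train_n], [is_prime(n) for n in train_n])

   # Typically near-perfect on the integers it memorized...
   print(model.score([bits(n) for n in train_n],
                     [is_prime(n) for n in train_n]))
   # ...but on unseen integers it falls back toward the trivial "mostly
   # composite" baseline: there is no continuous manifold to interpolate on.
   print(model.score([bits(n) for n in test_n],
                     [is_prime(n) for n in test_n]))
   ```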
29. François Chollet @fchollet · Oct 19

   The second point -- training data density -- is equally important. You will naturally only be able to train on a very sparse sampling *of the encoding space*, but you need to *densely cover the latent space*.
30. François Chollet @fchollet · Oct 19

   It's only with a sufficiently dense sampling of the latent manifold that it becomes possible to make sense of new inputs by interpolating between past training inputs without having to leverage additional priors. pic.twitter.com/SmRvEN2NXS
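   Density matters even in the simplest setting; here is a sketch comparing interpolation error on the same manifold from sparse vs. dense samplings (the sine curve is an illustrative stand-in):

   ```python
   import numpy as np

   t_query = np.linspace(0, 2 * np.pi, 1000)   # new inputs to make sense of
   for n_samples in (5, 50, 500):              # increasingly dense sampling
       t_train = np.linspace(0, 2 * np.pi, n_samples)
       pred = np.interp(t_query, t_train, np.sin(t_train))
       print(n_samples, np.abs(pred - np.sin(t_query)).max())
   # The worst-case interpolation error shrinks as the sampling of the
   # manifold gets denser -- no extra priors needed.
   ```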
31. François Chollet @fchollet · Oct 19

   The practical implication is that the best way to improve a deep learning model is to get more data or better data (overly noisy or inaccurate data will hurt generalization). A denser coverage of the latent manifold leads to a model that generalizes better.
32. François Chollet @fchollet · Oct 19

   This is why *data augmentation techniques*, like exposing a model to variations in image brightness or rotation angle, are an extremely effective way to improve test-time performance. Data augmentation is all about densifying your latent-space coverage (by leveraging visual priors).
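   In Keras this is typically done with preprocessing layers; a minimal sketch, assuming a recent Keras release where RandomRotation and RandomBrightness are available:

   ```python
   from tensorflow import keras
   from tensorflow.keras import layers

   # Random perturbations generate new training points *near* each sample,
   # densifying coverage of the latent manifold using known visual priors
   # (a slightly rotated or brighter "3" is still a "3").
   data_augmentation = keras.Sequential([
       layers.RandomRotation(0.05),    # small random rotations
       layers.RandomBrightness(0.2),   # random brightness shifts
   ])

   inputs = keras.Input(shape=(28, 28, 1))
   x = data_augmentation(inputs)       # active during training only
   x = layers.Conv2D(32, 3, activation="relu")(x)
   x = layers.Flatten()(x)
   outputs = layers.Dense(10, activation="softmax")(x)
   model = keras.Model(inputs, outputs)
   ```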
33. François Chollet @fchollet · Oct 19

   In conclusion: the only things you'll find in a DL model are what you put into it: the priors encoded in its architecture, and the data it was trained on. DL models are not magic. They're big curves that fit their training samples, with some constraints on their structure.
