Michael Nielsen
@michael_nielsen

Searching for the numinous. Individually & collectively optimistic. 🇦🇺 🇨🇦, now home in 🇺🇸. @AsteraInstitute. Thinking out loud at: http://mnielsen.github.io

San Francisco, CA
michaelnielsen.org
Joined July 2008

Michael Nielsen @michael_nielsen 19 Jul 2020

Spent an enjoyable few hours digging into GPT-3, trying to better understand how it works, what the limits are, and how it may be improved. The paper is here: https://arxiv.org/pdf/2005.14165.pdf

12:44 PM - 19 Jul 2020
13 replies 166 retweets 858 likes

  1. Michael Nielsen @michael_nielsen 19 Jul 2020

     A few thoughts follow. This isn't the result of any deep reflection or understanding on my part, just a few first impressions based on a quick read of the paper, some of the background papers, and other materials. Caveat emptor.

     1 reply 0 retweets 28 likes
  2. Michael Nielsen @michael_nielsen 19 Jul 2020

     First, GPT-3 is just plain fun. I mean, look at this proposal for a new religion, based on a prompt by @flantz: https://twitter.com/flantz/status/1284322274313752576 pic.twitter.com/W4Fbz02j8b

     3 replies 23 retweets 128 likes
  3. Michael Nielsen @michael_nielsen 19 Jul 2020

     Michael Nielsen Retweeted Nick Cammarata

     Or consider that simply telling it to improve on its first attempt at an essay can plausibly result in better outcomes (the author @nicklovescode works at @openAI): https://twitter.com/nicklovescode/status/1284685741759492096

     Michael Nielsen added,

     Nick Cammarata @nickcammarata
     Replying to @garybasin @sterlingcrispin
     I think improve will work. Keep in mind gpt3 is an autocompleter. It’s not trying to write a great essay, just the essay it thinks the internet would write. When you ask it to improve it, now it’s trying to write a great essay

     1 reply 0 retweets 42 likes
  4. Michael Nielsen @michael_nielsen 19 Jul 2020

     One thing that stands out: GPT-3 does lots of things remarkably well, and lots of things poorly. Here's a nice example of the latter, from @gwern (https://www.gwern.net/GPT-3#parity): it fails to learn how to determine the parity of a bit string. pic.twitter.com/wRgSOKtTNc

     3 replies 1 retweet 58 likes
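
     For concreteness, here is a minimal sketch of how such a few-shot parity probe can be posed. The exact prompt format @gwern used may differ; the "bits -> label" arrow format below is an assumption:

     ```python
     import random

     def parity(bits: str) -> str:
         """Ground truth: 'even' if the count of 1s is even, else 'odd'."""
         return "even" if bits.count("1") % 2 == 0 else "odd"

     def make_parity_prompt(n_examples: int = 10, length: int = 8, seed: int = 0):
         """Build a few-shot prompt asking a language model for bit-string parity."""
         rng = random.Random(seed)
         lines = []
         for _ in range(n_examples):
             bits = "".join(rng.choice("01") for _ in range(length))
             lines.append(f"{bits} -> {parity(bits)}")
         query = "".join(rng.choice("01") for _ in range(length))
         lines.append(f"{query} ->")  # the model is asked to fill in this blank
         return "\n".join(lines), query

     prompt, query = make_parity_prompt()
     print(prompt)
     # Feed `prompt` to the model and compare its completion to parity(query);
     # a two-line program solves the task GPT-3 fails to pick up in-context.
     ```
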
  5. Michael Nielsen @michael_nielsen 19 Jul 2020

     Another example: after being shown 10,000 examples of reversed words (e.g. "gradient -> tneidarg") and then asked to reverse some words, it gets almost everything wrong.

     7 replies 3 retweets 43 likes
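
     A sketch of the task format and of exact-match scoring; the arrow format follows the tweet above, and the scoring scheme is an assumption:

     ```python
     def make_reversal_examples(words):
         """Demonstration pairs like 'gradient -> tneidarg'."""
         return [f"{w} -> {w[::-1]}" for w in words]

     def reversal_accuracy(predictions):
         """Fraction of words whose predicted reversal matches exactly.

         `predictions` maps each word to the model's completion for it.
         """
         correct = sum(pred == w[::-1] for w, pred in predictions.items())
         return correct / len(predictions)

     print(make_reversal_examples(["gradient", "descent"]))
     # Hypothetical model outputs, just to illustrate the scoring:
     print(reversal_accuracy({"gradient": "tneidarg", "descent": "tnecsde"}))  # 0.5
     ```
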
  6. Michael Nielsen @michael_nielsen 19 Jul 2020

     It does a bit better on what you might call "easy anagrams" -- anagrams where the first and last letters are held constant. Still, it gets only about 15% correct.

     1 reply 1 retweet 17 likes
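
     A small sketch of how such "easy anagram" instances can be generated; the construction is inferred from the description above:

     ```python
     import random

     def easy_anagram(word: str, rng: random.Random) -> str:
         """Permute only the interior letters, keeping the first and last fixed."""
         if len(word) <= 3:
             return word  # nothing to scramble
         middle = list(word[1:-1])
         rng.shuffle(middle)
         return word[0] + "".join(middle) + word[-1]

     rng = random.Random(0)
     for w in ["gradient", "language", "parity"]:
         print(easy_anagram(w, rng), "->", w)  # scrambled form and its answer
     ```
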
  7. Michael Nielsen @michael_nielsen 19 Jul 2020

     So: it can do what appear to be some very complicated tasks remarkably well, and fails miserably at certain other tasks that seem _a priori_ much easier.

     3 replies 2 retweets 46 likes
  8. Michael Nielsen @michael_nielsen 19 Jul 2020

     What's more, it's not always obvious in advance which tasks it will do well on and which poorly. Nor, for that matter, is it always obvious after the fact! I'm not sure how I'd write a checker for "Does this sound like a plausible religion or not?"

     1 reply 2 retweets 38 likes
  9. Michael Nielsen @michael_nielsen 19 Jul 2020

     It's a kind of brittleness that shows up often in AI systems. For instance, a few years back people got really good at building systems that are fantastic at solving certain image recognition tasks. In certain (narrow) ways even better than humans: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/ pic.twitter.com/zAl1iKxjSg

     1 reply 1 retweet 42 likes
  10. Michael Nielsen @michael_nielsen 19 Jul 2020

     That's an amazing breakthrough. But: many early systems turned out to be extremely brittle. You could build adversarial examples, changing just a few pixels to cause the system to incorrectly classify a screwdriver as a St. Bernard dog (etc). https://arxiv.org/abs/1802.08195 pic.twitter.com/LgL1U1Cm8m

     1 reply 2 retweets 49 likes
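
     The simplest recipe in this family is the fast gradient sign method (FGSM); the paper linked above uses more elaborate attacks, so this PyTorch sketch is illustrative rather than a reproduction:

     ```python
     import torch
     import torch.nn.functional as F

     def fgsm_attack(model, x, label, epsilon=0.01):
         """Nudge each pixel by +/-epsilon in the direction that
         increases the classification loss (FGSM, Goodfellow et al.)."""
         x = x.clone().detach().requires_grad_(True)
         loss = F.cross_entropy(model(x), label)
         loss.backward()
         x_adv = x + epsilon * x.grad.sign()
         return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range

     # Usage, with any image classifier `model`, input batch `x`, true `label`:
     #   x_adv = fgsm_attack(model, x, label)
     # x_adv is visually near-identical to x, yet is often misclassified.
     ```
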
  11. Michael Nielsen @michael_nielsen 19 Jul 2020

     (Aside: a more recent paper on adversarial examples that fool both computers and time-limited humans: https://arxiv.org/abs/1802.08195)

     1 reply 2 retweets 34 likes
  12. Michael Nielsen @michael_nielsen 19 Jul 2020

     The discovery of adversarial examples was great news: it acted as a prompt to go much deeper into the systems, understand better what was going on, how they were brittle, etc. I don't know the current state of the art, but such problems are a great stimulus to build better systems.

     1 reply 0 retweets 35 likes
  13. Michael Nielsen @michael_nielsen 19 Jul 2020

     Similarly, trying to understand _why_ GPT-3 fails at parity or at word reversing seems like a good challenge. And, of course, human beings _also_ have trouble computing parity for bit strings ("hang on, where was I?"), or get taken in by visual illusions.

     2 replies 0 retweets 36 likes
  14. Michael Nielsen @michael_nielsen 19 Jul 2020

     Almost certainly: using a richer internal representation would help for these particular problems. GPT-3 doesn't use a character-level representation, but rather byte-pair encodings (BPEs) that chunk characters together: https://www.gwern.net/GPT-3#bpes (@gwern). pic.twitter.com/YXsu2Ilgip

     2 replies 3 retweets 41 likes
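
     You can see the chunking directly. GPT-3 reportedly reuses GPT-2's BPE vocabulary, so the GPT-2 tokenizer is a reasonable proxy; treat that equivalence as an assumption:

     ```python
     # Requires: pip install transformers
     from transformers import GPT2TokenizerFast

     tok = GPT2TokenizerFast.from_pretrained("gpt2")

     for word in ["gradient", "tneidarg"]:
         print(word, "->", tok.tokenize(word))
     # A common word typically maps to one or two chunks, while its reversal
     # shatters into several rare chunks. The model never directly sees
     # individual characters, which helps explain why reversal and parity
     # are hard for it.
     ```
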
  15. Michael Nielsen @michael_nielsen 19 Jul 2020

     It seems plausible that extending it to use both character-level and BPE-level representations could help with problems like parity, anagrams, and word reversal.

     3 replies 2 retweets 25 likes
  16. Michael Nielsen @michael_nielsen 19 Jul 2020

     One of the most striking things about GPT-3 is that it's not so different from GPT-2. It's much, much bigger, though, and consumed far more resources. They did tweak it in a number of ways: pic.twitter.com/c16wJb5lhg

     2 replies 2 retweets 31 likes
  17. Michael Nielsen @michael_nielsen 19 Jul 2020

     Also interesting to see the scale of computation used during training, and at runtime: pic.twitter.com/7oniPdVHHQ

     1 reply 0 retweets 26 likes
  18. Michael Nielsen @michael_nielsen 19 Jul 2020

     Going back to the striking patterns of strengths and weaknesses, I'm reminded of Minsky's "Society of Mind", the notion that there's no magic trick, but rather just a large collection of tricks that need to be put together in a patchwork: pic.twitter.com/vtGpkf5vcd

     1 reply 3 retweets 78 likes
  19. Michael Nielsen @michael_nielsen 19 Jul 2020

     In that sense, it seems like GPT-3 and similar models may be prototypes for a powerful extra piece of the needed patchwork. We need to understand them much better, but something interesting seems to be going on in them!

     2 replies 0 retweets 52 likes
  20. Michael Nielsen @michael_nielsen 19 Jul 2020

     It's illuminating to contrast GPT-3 with much older systems like SHRDLU (from 1970): https://en.wikipedia.org/wiki/SHRDLU. If you look closely at the SHRDLU transcript here, it seems much more impressive, in some ways, than GPT-3: it's really doing some sophisticated reasoning. pic.twitter.com/QsI12kQeun

     3 replies 6 retweets 49 likes
  21. Michael Nielsen @michael_nielsen 19 Jul 2020

     The reason is that SHRDLU genuinely had sophisticated internal models of the world it described (the blocks world). Its understanding was narrow -- by contrast, GPT-3 is astoundingly responsive across domains -- but quite deep within the blocks world, perhaps deeper in some ways than a human's.

     2 replies 1 retweet 20 likes
  22. Michael Nielsen @michael_nielsen 19 Jul 2020

     How deep is the understanding underlying GPT-3? One fun thing about reading the paper is realizing: I don't know! In fact, I'm not sure anyone does? Maybe @nottombrown? @AlecRad? One of the other authors?

     2 replies 0 retweets 18 likes
  23. Michael Nielsen @michael_nielsen 19 Jul 2020

     The reason is that neural nets can be quite opaque, even if you understand their architecture and how they're trained. That is, they can discover all kinds of structure during training, without the programmer being aware of it.

     2 replies 1 retweet 23 likes
  24. Michael Nielsen @michael_nielsen 19 Jul 2020

     This happens, for instance, in image classifiers. There are now some great papers which crack open the black box of image classifiers, and start to understand how they work, what representations they've discovered in training.

     1 reply 1 retweet 19 likes
  25. Michael Nielsen @michael_nielsen 19 Jul 2020

     It turns out that many of the classifiers discover really interesting things about how to see, as they're trained, without the programmers being conscious of those things. @distillpub has quite a few fun papers about this, e.g.: https://distill.pub/2019/activation-atlas/

     1 reply 6 retweets 35 likes
  26. Michael Nielsen @michael_nielsen 19 Jul 2020

     Returning to language models, I suspect a way to make progress is: (a) break open the black box of language models like GPT-3, trying to understand what internal representations they've discovered; (b) understand the deficiencies of those representations; and (c) fix them in a new architecture.

     3 replies 2 retweets 37 likes
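
     One standard instrument for step (a) is a probing classifier: train a simple linear readout on a network's hidden activations and see what information they expose. A toy sketch, using GPT-2 as a stand-in since GPT-3's weights aren't public, and with a made-up four-sentence "dataset":

     ```python
     # Requires: pip install torch transformers scikit-learn
     import torch
     from transformers import GPT2Model, GPT2TokenizerFast
     from sklearn.linear_model import LogisticRegression

     tok = GPT2TokenizerFast.from_pretrained("gpt2")
     model = GPT2Model.from_pretrained("gpt2").eval()

     def features(text: str):
         """Mean-pooled final-layer activations for a piece of text."""
         inputs = tok(text, return_tensors="pt")
         with torch.no_grad():
             out = model(**inputs)
         return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

     # Toy probe: can a linear readout separate prose from arithmetic?
     # (Real probing work uses proper datasets and held-out evaluation.)
     texts = ["the cat sat on the mat", "dogs chase cats",
              "2 + 2 = 4", "7 * 8 = 56"]
     labels = [0, 0, 1, 1]  # 0 = everyday prose, 1 = arithmetic
     X = [features(t) for t in texts]
     probe = LogisticRegression(max_iter=1000).fit(X, labels)
     print(probe.score(X, labels))  # training accuracy of the probe
     ```
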
  27. Michael Nielsen @michael_nielsen 19 Jul 2020

     All speculation on my part, vaporthought. Still, the goal would be to support the automatic discovery of deeper representations during unsupervised training, perhaps even models as deep as SHRDLU or deeper.

     3 replies 0 retweets 22 likes
  28. Michael Nielsen @michael_nielsen 19 Jul 2020

     Michael Nielsen Retweeted Tom Brown

     Small amendment on the training for anagrams etc. (100 examples in the training set, 10,000 in the test set): https://twitter.com/nottombrown/status/1284965736494981120

     Michael Nielsen added,

     Tom Brown @nottombrown
     Replying to @michael_nielsen
     Slight correction, the model was shown only 100 examples of the task, not 10,000. 10,000 was the number of examples we used in the test set for calculating accuracy. This doesn't change your argument though. The model was indeed bad at reversing words! pic.twitter.com/BQ5uA64zRB

     1 reply 0 retweets 10 likes
  29. Michael Nielsen @michael_nielsen 19 Jul 2020

     A comment on my thread above: a few people have been kind enough to say this thread is a "useful take" (or, sometimes, a "not-so-useful take" 😀) on GPT-3. I appreciate the intentions and the generous words.

     1 reply 0 retweets 7 likes
  30. Michael Nielsen @michael_nielsen 19 Jul 2020

     But with the caveat that with any complex generative system you can't really understand much at all in a few hours of playing around. Imagine you picked up a guitar for 3 hours & tried to learn to play. At the end you might well put it down and go "lame, the music was no good".

     1 reply 1 retweet 22 likes