Matthew Honnibal

@honnibal

Computational linguist from Sydney and Berlin. 💫 Author of the @spacy_io NLP tools. 💥 Founder @explosion_ai

Berlin, Germany
spacy.io
Joined May 2008

    Matthew Honnibal‏ @honnibal 28 Nov 2018

    If you try out the new spacy-nightly (v2.1.0a3), you might be surprised to see it's single-threaded. This actually took a tonne of work! I've spent many hours getting the Blis linear algebra routines into a stand-alone, wheel-installable package. Why? A thread on threads 🧵1/10

    3:17 PM - 28 Nov 2018
    6 replies 40 retweets 223 likes
      1. New conversation
      2. Matthew Honnibal‏ @honnibal 28 Nov 2018

        In spaCy 2 we switched over to neural network models, so the bottleneck in spaCy comes down to matrix multiplication. Most Python libraries delegate CPU matrix multiplication to numpy, which then delegates it to a low-level library. Which library? Well, that depends. 2/10

        1 reply 0 retweets 12 likes
      3. Matthew Honnibal‏ @honnibal 28 Nov 2018

        There are three main libraries default numpy might delegate to. All have different problems.
        * Intel MKL: May not perform well on non-Intel CPUs.
        * OpenBLAS: Often misdetects my CPU, leading to poor performance.
        * Accelerate (for OSX): Crashes if executed from a subprocess.
        3/10

        2 replies 2 retweets 11 likes
      4. Matthew Honnibal‏ @honnibal 28 Nov 2018

        Aside from the variation in problems, all of these matrix multiplication libraries will eagerly launch a tonne of threads. Most people see this as a good thing. It makes people happy to see their CPU working hard. But all these threads probably aren't helping you. 4/10

        1 reply 0 retweets 10 likes
      5. Matthew Honnibal‏ @honnibal 28 Nov 2018

        When we used OpenBLAS for matrix multiplication, people kept reporting terrible performance, even though they were running on a 96 core machine and 96 child threads were being launched. The solution, OMP_NUM_THREADS=2, was not exactly obvious. 5/10

        1 reply 1 retweet 14 likes
      6. Matthew Honnibal‏ @honnibal 28 Nov 2018

        The problem was that OpenBLAS -- like most other matrix multiplication libraries -- launched far too many threads for our relatively small workloads. This just caused a bunch of contention and switching costs, killing performance. 6/10

        1 reply 0 retweets 14 likes
      7. Matthew Honnibal‏ @honnibal 28 Nov 2018

        Piping lots of data through a statistical model is an embarrassingly parallel workload. It's completely backwards to launch lots of threads for the *matrix multiplications*. That's the *lowest* level of computation! You want to parallelise at the *highest* level! 7/10

        1 reply 10 retweets 58 likes
      8. Matthew Honnibal‏ @honnibal 28 Nov 2018

        Let's say you want to pipe 1 billion documents through spaCy. Great. Spin up 1,000 worker CPUs, give them 1 million documents each, and you'll be done in a few minutes. There's zero advantage to having an individual worker launching threads. You shouldn't want that. 8/10

        2 replies 1 retweet 17 likes
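
The idea of parallelising at the highest level can be sketched roughly as follows: split the documents into chunks and hand each chunk to a separate worker process that runs single-threaded. This is only a minimal illustration, not spaCy's own mechanism. `split_into_chunks`, `process_chunk` and `run_parallel` are hypothetical names, and `process_chunk` merely counts whitespace-separated tokens as a stand-in for running a real model.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(docs, n_chunks):
    """Split docs into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(docs) // n_chunks))  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def process_chunk(chunk):
    """Hypothetical stand-in for a single-threaded model: count tokens per text."""
    return [len(text.split()) for text in chunk]


def run_parallel(docs, n_workers=4):
    """Give each worker process one chunk and collect results in input order."""
    chunks = split_into_chunks(docs, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_chunk, chunks)
        return [count for chunk_result in results for count in chunk_result]


if __name__ == "__main__":
    docs = ["a short document", "another one", "and a third document here"] * 4
    print(run_parallel(docs, n_workers=2))
```

On a cluster the same pattern would presumably apply one level up, with whole machines in place of worker processes.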
      9. Matthew Honnibal‏ @honnibal 28 Nov 2018

        The place where the single-threading sucks at the moment is training. I hope a multi-processing solution won't be too hard to implement. I've also got some ideas for a software transactional memory strategy I've been meaning to try. 9/10

        1 reply 0 retweets 12 likes
      10. Matthew Honnibal‏ @honnibal 28 Nov 2018

        With the new models, spaCy is running at around 8000 words per second on an n1-standard-1 machine on Google Compute Engine. This is a bit short of our target of 10k words per second, but still works out to more than 28m words parsed per $0.01, which ain't bad. 10/10

        2 replies 2 retweets 54 likes
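
The cost figure can be checked with a little arithmetic. The thread does not state the machine's price, so the sketch below assumes roughly $0.01 per machine-hour; that price is an assumption made here so the numbers can be reproduced, not something taken from the source.

```python
WORDS_PER_SECOND = 8_000        # throughput quoted in the thread
ASSUMED_PRICE_PER_HOUR = 0.01   # assumption: USD per machine-hour, not stated in the thread

# Words processed by one machine in one hour.
words_per_hour = WORDS_PER_SECOND * 3600

# Words processed for each $0.01 spent at the assumed hourly price.
words_per_cent = words_per_hour * (0.01 / ASSUMED_PRICE_PER_HOUR)

print(f"{words_per_hour:,} words per hour")
print(f"{words_per_cent:,.0f} words per $0.01")
```

Under that assumption, 8,000 words per second comes to 28.8 million words per hour, which is consistent with the "more than 28m words parsed per $0.01" figure.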
      11. End of conversation
      1. New conversation
      2. Ilias Chalkidis‏ @KiddoThe2B 29 Nov 2018
        Replying to @honnibal

        I was really looking forward to this fix 🥳 I was trying to POS-tag approx. 750K documents in order to pre-train POS tag embeddings. I had the anticipated idea to run this preprocessing step in parallel, but the multithread configuration was a major bottleneck for the CPU threads.

        2 replies 0 retweets 0 likes
      3. Matthew Honnibal‏ @honnibal 29 Nov 2018
        Replying to @KiddoThe2B

        If you need to run the previous version for some reason, the multi-threading can be disabled by setting the environment variables OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1. If you're using a Jupyter notebook, you can edit os.environ before importing numpy.

        0 replies 0 retweets 2 likes
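
In a notebook or script, the advice above might look something like this. The helper name `limit_blas_threads` is hypothetical; the two environment variables are the ones named in the reply. The call has to come before numpy (or anything that imports it) is loaded, since these settings are typically read when the underlying library is first initialised.

```python
import os


def limit_blas_threads(n_threads=1):
    """Set the thread-count variables named in the reply. Call before importing numpy."""
    value = str(n_threads)
    os.environ["OMP_NUM_THREADS"] = value
    os.environ["MKL_NUM_THREADS"] = value


limit_blas_threads(1)

# Only import numerical libraries after the environment is configured, e.g.:
# import numpy as np
```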
      4. End of conversation
      1. New conversation
      2. Andy Kitchen  🤖‏ @auastro 29 Nov 2018
        Replying to @honnibal

        Getting OMP_NUM_THREADS=1 tattooed on my neck. Lesser academics will join my gang at conferences for protection.

        1 reply 0 retweets 5 likes
      3. Andy Kitchen  🤖‏ @auastro 29 Nov 2018
        Replying to @auastro @honnibal

        Thanks for your interesting question! Let's take this offline... then I'm taking you offline, punk.

        0 replies 0 retweets 0 likes
      4. End of conversation
      1. simone gabbriellini‏ @digitaldust_it 29 Nov 2018
        Replying to @honnibal

        @Safranf check this out!

        0 replies 0 retweets 1 like
      1. New conversation
      2. Ines Montani  〰️‏ @_inesmontani 29 Nov 2018
        Replying to @honnibal

        @threadreaderapp unroll please! ✨

        1 reply 0 retweets 0 likes
      3. End of conversation
