Casey Muratori
@cmuratori

I'm worried that the baby thinks people can't change.

Seattle
caseymuratori.com
Joined March 2009

© 2021 Twitter

    1. Casey Muratori @cmuratori Aug 3

      [15/*] This directive could do some things, like limit reordering, try to translate intrinsics as directly as possible, etc., with an emphasis on _predictability_ rather than speed.

      1 reply 1 retweet 21 likes
    2. Casey Muratori @cmuratori Aug 3

      [16/*] This would allow programmers to have some confidence that when they've worked out the best way to do something, they can lock that in and know it won't get undone by crazy optimizer steps.

      1 reply 0 retweets 21 likes
    3. Casey Muratori @cmuratori Aug 3

      [17/*] The only alternative to something like this would be to write these parts in inline assembler, which, honestly, would be fine with me, except that the syntax for inline assembler is so horrid that it makes it very unpleasant to use.

      2 replies 0 retweets 12 likes
    4. Casey Muratori @cmuratori Aug 3

      [18/*] So another option would be to get serious about inline assembler, and make a pleasant-to-use, well-specified ASM syntax that can be placed in C. I'd be fine with either.

      3 replies 0 retweets 18 likes
    5. Casey Muratori @cmuratori Aug 3

      [19/*] What I'm increasingly less fine with is constantly having to struggle with CLANG to get it to _stop_ doing crazy things to routines that were already basically optimal if it just translated them directly. It's very frustrating, and sometimes literally can't be fixed :(

      1 reply 0 retweets 20 likes
    6. Casey Muratori @cmuratori Aug 3

      [20/*] That's all. I figured this has happened enough times now that I should post something about it.

      8 replies 0 retweets 21 likes
    7. Wouter Bijlsma @Wouter_Bijlsma Aug 5
      Replying to @cmuratori

      Does the ‘bad’ version also benchmark appreciably worse than the straightforward version? I remember reading that AMD internally only has 128-bit ALUs and for some things can be very slow using 256-bit operations directly; maybe Clang is trying to avoid that?

      1 reply 0 retweets 0 likes
    8. Casey Muratori @cmuratori Aug 5
      Replying to @Wouter_Bijlsma

      Discussing the performance of this is a bit difficult on Twitter, but the point here is not to talk about the performance of it. Regardless of which version you think is better, you either think CLANG 4 through 10 were producing a bad version, or you think Clang before 4 and after 10 were.

      1 reply 0 retweets 0 likes
    9. Casey Muratori @cmuratori Aug 5
      Replying to @cmuratori @Wouter_Bijlsma

      So if you think CLANG was doing something _right_ in 4 through 10, then you think it has a codegen bug _now_ :)

      1 reply 0 retweets 0 likes
    10. Casey Muratori @cmuratori Aug 5
      Replying to @cmuratori @Wouter_Bijlsma

      But to answer your original question, splitting 256's up into 128's would only be a win on slow-256 CPUs _if you actually did that_. The CLANG code here in 4-10 doesn't. It still uses the ymm registers, so it doesn't actually avoid the 256-bit path. It's just a plain old bug.

      1 reply 0 retweets 0 likes
      Casey Muratori @cmuratori Aug 5
      Replying to @cmuratori @Wouter_Bijlsma

      (As an aside, I also doubt this would be something you'd do for AMD CPUs. It would more be something to do for the tranche of Intel CPUs that drastically downclocked when ymm registers were used. But regardless, CLANG didn't accomplish that in the buggy codegen.)

      11:33 AM - 5 Aug 2021
      1 reply 0 retweets 0 likes
        1. Wouter Bijlsma @Wouter_Bijlsma Aug 5
          Replying to @cmuratori

          You are right, this does not appear to be related to any optimisation trade-off to prevent slow-256 code paths, but rather is just an optimizer bug. Personally, I would cut the Clang people some slack though; it’s a massively complex thing that works extremely well most of the time ;-)

          0 replies 0 retweets 0 likes
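Tweets [17/*] and [18/*] above object to the syntax of inline assembler. For readers who have not used it, here is a minimal sketch of GCC-style extended asm on x86, one dialect that Clang accepts. The function name `add_u32` is hypothetical, the example only illustrates the syntax (it is not code from the thread), and on other compilers or architectures it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b (mod 2^32) using x86 extended asm
   where available, to show what GCC/Clang inline assembler looks like. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t result;
    /* AT&T operand order: "addl src, dst" computes dst += src.
       %0 is the output register; the "0" constraint ties input a to it,
       so it starts out holding a. %2 is b in any general register ("r").
       "cc" tells the compiler the flags register is clobbered. */
    __asm__("addl %2, %0"
            : "=r"(result)
            : "0"(a), "r"(b)
            : "cc");
    return result;
#else
    /* Portable fallback for compilers/targets without x86 extended asm. */
    return a + b;
#endif
}
```

The constraint strings and clobber list are what the compiler reasons about; the optimizer treats the instruction template itself as opaque, which is part of why this notation is so hard to read and write.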
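Casey's reply about splitting 256-bit operations into 128-bit ones turns on which registers the generated code actually uses. The sketch below assumes an x86 target and the GCC/Clang predefined macros `__AVX__` and `__SSE__`. It shows what genuinely splitting looks like at the source level: one version asks for a single 256-bit (ymm) add, the other for two 128-bit (xmm) adds. The helper names are hypothetical, and this is not the code discussed in the thread, which is not reproduced here.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>   /* 256-bit __m256 intrinsics (ymm registers) */
#endif
#if defined(__SSE__)
#include <xmmintrin.h>   /* 128-bit __m128 intrinsics (xmm registers) */
#endif

/* Hypothetical helpers: dst[i] = a[i] + b[i] for eight floats. */

/* Version 1: a single 256-bit add, which needs ymm registers. */
void add8_ymm(float *dst, const float *a, const float *b)
{
#if defined(__AVX__)
    __m256 sum = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
    _mm256_storeu_ps(dst, sum);
#else
    for (size_t i = 0; i < 8; i++)  /* scalar fallback without AVX */
        dst[i] = a[i] + b[i];
#endif
}

/* Version 2: the same work as two 128-bit halves, so the source only asks
   for xmm-width operations. Nothing in C prevents an optimizer from
   merging the halves back into one ymm operation, which is the kind of
   unpredictability the thread complains about. */
void add8_xmm(float *dst, const float *a, const float *b)
{
#if defined(__SSE__)
    __m128 lo = _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b));
    __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));
    _mm_storeu_ps(dst,     lo);
    _mm_storeu_ps(dst + 4, hi);
#else
    for (size_t i = 0; i < 8; i++)  /* scalar fallback without SSE */
        dst[i] = a[i] + b[i];
#endif
}
```

Only the second version avoids the 256-bit path, and only if the emitted code really stays on xmm registers; as the thread points out, code that still uses ymm registers gains nothing on CPUs with slow 256-bit execution.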