Geoff Langdale

@geofflangdale

Joined October 2015


  • © 2018 Twitter
  • About
  • Help Center
  • Terms
  • Privacy policy
  • Cookies
  • Ads info
    1. Computer Science‏ @CompSciFact 28 Nov 2017

      Bit hacking versus memoization: a Stream VByte example https://lemire.me/blog/2017/11/28/bit-hacking-versus-memoization-a-stream-vbyte-example/ … by @lemire

      4 replies 12 retweets 35 likes
    2. Geoff Langdale‏ @geofflangdale 29 Nov 2017
      Replying to @CompSciFact @lemire

      One more thought. You could do this in SIMD using either batch or by using PSHUFB to do a lookup of the sums of each nibble, then adding those values together. Of course, this assumes you have SIMD-sized workloads to begin with.

      1 reply 0 retweets 1 like
    3. Daniel Lemire‏ @lemire 29 Nov 2017
      Replying to @geofflangdale @CompSciFact

      In our case, we do have SIMD-size workloads, and vectorized table lookups are definitely an option.

      1 reply 0 retweets 0 likes
      Geoff Langdale‏ @geofflangdale 30 Nov 2017
      Replying to @lemire @CompSciFact

      OK so this is a winner: you can AND by 0x0f and PSHUFB out a result for the low 4 bits, then shift right by 4 bits (there is no byte shift on IA, so you'll bring in useless bits, including into the 'magic bit 7' for PSHUFB, that must be AND'd off) and PSHUFB again. Then add the results of the two PSHUFBs and reduce.

      2:52 AM - 30 Nov 2017
      • 1 Like
      • Kendall Willets
      2 replies 0 retweets 1 like
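      The nibble-lookup idea in the tweet above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it assumes the Stream VByte convention from the linked post (each 2-bit code c in a control byte stands for c + 1 data bytes), and the names vbyte_lengths and vbyte_length_scalar are made up for the example. The target attribute is a GCC/Clang extension, and a scalar loop stands in on non-x86 machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_X86 1
#endif

/* Scalar reference (hypothetical helper): a control byte holds four 2-bit
   codes, and code c is assumed to mean c + 1 data bytes. */
static uint8_t vbyte_length_scalar(uint8_t c) {
    return (uint8_t)(4 + (c & 3) + ((c >> 2) & 3) + ((c >> 4) & 3) + (c >> 6));
}

/* Writes the data length of each control byte to out; n must be a multiple of 16. */
#ifdef SKETCH_X86
__attribute__((target("ssse3")))
#endif
static void vbyte_lengths(const uint8_t *ctrl, uint8_t *out, size_t n) {
    assert(n % 16 == 0);
#ifdef SKETCH_X86
    /* nibble_len[x] = (x & 3) + (x >> 2) + 2: bytes described by one nibble. */
    const __m128i nibble_len = _mm_setr_epi8(2, 3, 4, 5, 3, 4, 5, 6,
                                             4, 5, 6, 7, 5, 6, 7, 8);
    const __m128i low_mask = _mm_set1_epi8(0x0f);
    for (size_t i = 0; i < n; i += 16) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(ctrl + i));
        __m128i lo = _mm_and_si128(v, low_mask);
        /* There is no per-byte shift, so shift 16-bit lanes and mask off the
           bits that leak in from the neighbouring byte; left alone they could
           set bit 7 and make PSHUFB return zero for that byte. */
        __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);
        __m128i len = _mm_add_epi8(_mm_shuffle_epi8(nibble_len, lo),
                                   _mm_shuffle_epi8(nibble_len, hi));
        _mm_storeu_si128((__m128i *)(out + i), len);
    }
#else
    for (size_t i = 0; i < n; i++) out[i] = vbyte_length_scalar(ctrl[i]);
#endif
}
```

      Calling vbyte_lengths(ctrl, out, 16) fills out with one length per control byte; the final "reduce" step mentioned in the tweet is left out of this sketch.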
        1. New conversation
        2. Daniel Lemire‏ @lemire 30 Nov 2017
          Replying to @geofflangdale @CompSciFact

          I have updated my post with a link to your tweet.

          1 reply 0 retweets 0 likes
        3. Geoff Langdale‏ @geofflangdale 30 Nov 2017
          Replying to @lemire @CompSciFact

          My team suggests that I have only 1 idea in my whole career; specifically to use PSHUFB for everything. This is unfair; I also like PDEP/PEXT. The combination of these instructions is lethal for columnar database operations (i.e. fast lookup of packed 6 bit fields against a set).

          2 replies 0 retweets 2 likes
        4. Daniel Lemire‏ @lemire 30 Nov 2017
          Replying to @geofflangdale

          We have used pshufb for many papers: https://arxiv.org/abs/1503.03465  https://arxiv.org/abs/1503.07387  https://arxiv.org/abs/1611.07612  https://arxiv.org/abs/1704.00605 

          1 reply 0 retweets 0 likes
        5. Geoff Langdale‏ @geofflangdale 30 Nov 2017
          Replying to @lemire

          I've seen some of this work but not all. Thanks! It's a great instruction. Even the magic bit 7 can occasionally be useful. I was sad when we lost the 2 independent 128-bit PSHUFBs to get a single 256-bit one. VBMI will generalize the shuffle architecture to bytes (eventually).

          1 reply 0 retweets 0 likes
        6. Daniel Lemire‏ @lemire 30 Nov 2017
          Replying to @geofflangdale

          How did we lose the two-lane byte shuffle? What do you mean by lose? Where did it go?

          1 reply 0 retweets 0 likes
        7. Geoff Langdale‏ @geofflangdale 30 Nov 2017
          Replying to @lemire

          In the transition from IVB->HSW we lost the ability to do 2 128-bit PSHUFBs (ports 1 and 5) a cycle, gaining instead the ability to do 1 256-bit VPSHUFB (port 5 only).

          1 reply 0 retweets 0 likes
        8. Daniel Lemire‏ @lemire 1 Dec 2017
          Replying to @geofflangdale

          Makes sense. So we still get the two 128-bit byte shuffles per cycle, but need to use 256-bit registers.

          0 replies 0 retweets 0 likes
        9. End of conversation
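        As one possible reading of the "packed 6 bit fields against a set" remark in this conversation, the sketch below spreads eight packed 6-bit fields into bytes (a single PDEP when compiled with BMI2 enabled, e.g. -mbmi2, otherwise a plain loop) and then tests each field against a 64-entry set with two PSHUFB lookups. The names spread6, members8 and members8_scalar, and the choice of a 64-bit bitmap for the set, are assumptions made for illustration; nothing here is taken from the thread itself.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_X86 1
#endif

/* Spread eight packed 6-bit fields (48 bits) so each lands in its own byte. */
static uint64_t spread6(uint64_t packed) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pdep_u64(packed, 0x3f3f3f3f3f3f3f3fULL); /* one PDEP does it all */
#else
    uint64_t out = 0;
    for (int i = 0; i < 8; i++)
        out |= ((packed >> (6 * i)) & 0x3f) << (8 * i);
    return out;
#endif
}

/* Scalar reference: bit i of the result is set iff field i is in the set,
   where bit k of `set` says whether value k is a member. */
static unsigned members8_scalar(uint64_t packed, uint64_t set) {
    unsigned m = 0;
    for (int i = 0; i < 8; i++)
        m |= (unsigned)((set >> ((packed >> (6 * i)) & 0x3f)) & 1) << i;
    return m;
}

#ifdef SKETCH_X86
__attribute__((target("ssse3")))
#endif
static unsigned members8(uint64_t packed, uint64_t set) {
    assert((packed >> 48) == 0); /* only eight 6-bit fields are used */
    uint64_t spread = spread6(packed);
#ifdef SKETCH_X86
    const __m128i seven = _mm_set1_epi8(7);
    /* Byte j of `rows` holds membership bits for values 8*j .. 8*j + 7. */
    const __m128i rows = _mm_set_epi64x(0, (long long)set);
    const __m128i bits = _mm_setr_epi8(1, 2, 4, 8, 16, 32, 64, (char)-128,
                                       0, 0, 0, 0, 0, 0, 0, 0);
    __m128i v   = _mm_set_epi64x(0, (long long)spread);
    /* High 3 bits of each field pick a row; low 3 bits pick a bit in it. */
    __m128i row = _mm_shuffle_epi8(rows, _mm_and_si128(_mm_srli_epi16(v, 3), seven));
    __m128i bit = _mm_shuffle_epi8(bits, _mm_and_si128(v, seven));
    __m128i miss = _mm_cmpeq_epi8(_mm_and_si128(row, bit), _mm_setzero_si128());
    return ~(unsigned)_mm_movemask_epi8(miss) & 0xffu;
#else
    unsigned m = 0;
    for (int i = 0; i < 8; i++)
        m |= (unsigned)((set >> ((spread >> (8 * i)) & 0xff)) & 1) << i;
    return m;
#endif
}
```

        PEXT would be the mirror image, packing per-byte results back into a dense bit field; it is not shown here.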
        1. New conversation
        2. Daniel Lemire‏ @lemire 30 Nov 2017
          Replying to @geofflangdale @CompSciFact

          Agreed. This is good and will be fast.

          1 reply 0 retweets 0 likes
        3. Daniel Lemire‏ @lemire 30 Nov 2017
          Replying to @lemire @geofflangdale @CompSciFact

          But! Don’t forget that you need to get the result back to scalar registers... which has non-trivial latency.

          1 reply 0 retweets 0 likes
        4. Geoff Langdale‏ @geofflangdale 30 Nov 2017
          Replying to @lemire @CompSciFact

          In the 'large' you just leave the sums in a SIMD register and just reduce once; reduction is the most painful part IMO. Generally you have to shift/add, shift/add, etc. However a sneaky person would use PSADBW against zero which will get you byte sums over 64 bits.

          1 reply 0 retweets 0 likes
        5. Geoff Langdale‏ @geofflangdale 30 Nov 2017
          Replying to @geofflangdale @lemire @CompSciFact

          Transfer to GPR from SIMD is not that bad on IA by the way. Maybe 1-2 cycles and cheap. The close binding of SIMD registers to GPR (and control flow) is a huge strength of Intel Arch and I was saying this before I was an Intel employee :-)

          0 replies 0 retweets 0 likes
        6. End of conversation
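        The reduction described in this conversation (keep the sums in a SIMD register, use PSADBW against zero, and move to a general-purpose register only once) might look something like the sketch below. The function names are hypothetical, the input length is assumed to be a multiple of 16, and the total is assumed to fit in 32 bits; a scalar loop stands in on non-x86 machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_X86 1
#endif

/* Scalar reference used for checking. */
static uint32_t sum_bytes_scalar(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

/* Sums n bytes; n must be a multiple of 16 and the total must fit in 32 bits. */
#ifdef SKETCH_X86
__attribute__((target("sse2")))
#endif
static uint32_t sum_bytes(const uint8_t *p, size_t n) {
    assert(n % 16 == 0);
#ifdef SKETCH_X86
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero; /* two 64-bit partial sums, kept in the SIMD register */
    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        /* PSADBW against zero: sum of absolute differences |b - 0| over each
           group of 8 bytes, i.e. a byte sum per 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    /* Reduce once: fold the high half onto the low half, then MOVD to a GPR. */
    acc = _mm_add_epi64(acc, _mm_unpackhi_epi64(acc, acc));
    return (uint32_t)_mm_cvtsi128_si32(acc);
#else
    return sum_bytes_scalar(p, n);
#endif
}
```

        The only SIMD-to-scalar transfer is the final _mm_cvtsi128_si32, which is the single move to a GPR that the last tweet refers to.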
