Kamil Rocki

@rockikamil

Supercomputing + reinforcement learning + architecture and memory models at Cerebras Systems. After hours I build turbo engines.

Joined July 2012

Tweets

  1. Jan 26

    People are usually paid more when they ask for more. By the time they have 15+ years of experience, they also have around 40+ years of life experience - it is perfectly reasonable to assume that *some* are just not motivated by accumulating money at a certain point.

  2. Retweeted
    Jan 25

    “Pigeons classify breast cancer as well as humans”

  3. Jan 25

    TIL: 'K' opens the man page for the word under the cursor in vi. Made my day. 2020 is going to be way more productive.

  4. Jan 21

    It's public now

  5. Jan 21

    One of the most important debugging techniques of my life - the numerical gradient check. This is a short intro and a working piece of code for two models sharing the same grad-check scaffolding: a classifier and an autoencoder. Pure Numpy! (A minimal sketch of the technique appears after the timeline below.)

  6. Jan 20

    No BLAS? No problem. This is the first post of a series implementing various algorithms in ANSI C with 0 dependencies, for the DIY people. Stay tuned. Here's #1: a 2-layer MLP beating OpenBLAS' performance and getting over 95% in under 5 seconds.

  7. Dec 31, 2019

    Hope to see some ML-based solutions to this contest: can you determine a good policy for resource allocation on a 633 x 633 grid of processors?

  8. Retweeted
    Dec 26, 2019

    If we are in a simulation, then it's worth not wasting compute and not having a boring life. Make sure that you are worth being computed.🤔😜

  9. Retweeted
    Nov 26, 2019

    Introducing the SHA-RNN :)
    - Read alternative history as a research genre
    - Learn of the terrifying tokenization attack that leaves language models perplexed
    - Get near-SotA results on enwik8 in hours on a lone GPU
    No Sesame Street or Transformers allowed.

    From the attached paper figures:
    The SHA-RNN is composed of an RNN, pointer based attention, and a “Boom” feed-forward with a sprinkling of layer normalization. The persistent state is the RNN’s hidden state h as well as the memory M concatenated from previous memories. Bake at 200°F for 16 to 20 hours in a desktop sized oven.
    The attention mechanism within the SHA-RNN is highly computationally efficient. The only matrix multiplication acts on the query. The A block represents scaled dot product attention, a vector-vector operation. The operators {qs, ks, vs} are vector-vector multiplications and thus have minimal overhead. We use a sigmoid to produce {qs, ks}. For vs see Section 6.4.
    Bits Per Character (BPC) on enwik8. The single attention SHA-LSTM has an attention head on the second last layer and had batch size 16 due to lower memory use. Directly comparing the head count for LSTM models and Transformer models obviously doesn't make sense, but neither does comparing zero-headed LSTMs against bajillion-headed models and then declaring an entire species dead.
    (A generic single-head attention sketch in this spirit, not the paper's exact mechanism, appears after the timeline below.)
  10. Retweeted
    Nov 20, 2019

    Goodbye Denver. Parting trillion transistor selfies. Wafer-Scale Engine. 400,000 cores x 48 KB SRAM. All that and still only 16nm process. The best is yet to come.

  11. Nov 20, 2019

    This is extremely motivating: CS-1 is being used at ANL to find better cancer drugs -

  12. Nov 20, 2019

    I'm at SC'19. If you would like to see the CS-1 or learn more about programming its 400k cores, please stop by the Cerebras booth, #689

  13. Sep 18, 2019

    Deep Dive Into The Wafer Scale Engine AI Chip With Andrew Feldman

  14. Retweeted
    Sep 17, 2019

    The U.S. national laboratories plan to do big things with the world's biggest chip. For one, it'll help the labs' supercomputers ramp up their AI smarts at blazing speeds.

  15. Sep 11, 2019
  16. Sep 9, 2019

    Enabling AI’s potential through wafer-scale integration

  17. Sep 9, 2019

    “The reason to think about wafer-scale, and making it work, is that it is a way around the chief barrier to higher computer performance: off-chip communication.”

  18. Aug 29, 2019
  19. Aug 19, 2019

    400k cores, 1.2T transistors - the most insane machine I have ever used

  20. Mar 14, 2019
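The working grad-check code mentioned in tweet #5 was linked from the tweet and is not reproduced here. Below is only a minimal NumPy sketch of the technique: perturb each parameter by ±eps, recompute the loss with central differences, and compare against the gradients from backprop. All names, shapes, and the tiny 2-layer classifier are illustrative, not the author's code.

```python
import numpy as np

# Minimal sketch of a numerical gradient check (central differences) against
# analytic gradients of a tiny 2-layer softmax classifier. All names, shapes,
# and the model itself are illustrative -- this is not the code from the tweet.

def forward(params, X, y):
    """Cross-entropy loss of a 2-layer MLP with a tanh hidden layer."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                        # hidden activations
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)     # for numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()
    return loss, (h, p)

def backward(params, X, y):
    """Analytic gradients via manual backprop."""
    W1, b1, W2, b2 = params
    _, (h, p) = forward(params, X, y)
    dlogits = p.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dpre = dh * (1.0 - h ** 2)                      # derivative of tanh
    dW1 = X.T @ dpre
    db1 = dpre.sum(axis=0)
    return [dW1, db1, dW2, db2]

def numerical_grad(params, X, y, i, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params[i])."""
    g = np.zeros_like(params[i])
    for idx in np.ndindex(params[i].shape):
        orig = params[i][idx]
        params[i][idx] = orig + eps
        loss_plus, _ = forward(params, X, y)
        params[i][idx] = orig - eps
        loss_minus, _ = forward(params, X, y)
        params[i][idx] = orig                       # restore the parameter
        g[idx] = (loss_plus - loss_minus) / (2.0 * eps)
    return g

# Tiny random problem: 8 samples, 4 features, 5 hidden units, 3 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
y = rng.integers(0, 3, size=8)
params = [0.1 * rng.standard_normal((4, 5)), np.zeros(5),
          0.1 * rng.standard_normal((5, 3)), np.zeros(3)]

analytic = backward(params, X, y)
for i, name in enumerate(['W1', 'b1', 'W2', 'b2']):
    numeric = numerical_grad(params, X, y, i)
    denom = np.maximum(np.abs(numeric), np.abs(analytic[i])).max() + 1e-12
    rel_err = np.abs(numeric - analytic[i]).max() / denom
    print(f'{name}: max relative error {rel_err:.2e}')  # tiny (~1e-8) if backprop is right
```

The autoencoder mentioned in the tweet would plug into the same scaffolding: only forward() and backward() change, while numerical_grad() and the comparison loop stay identical.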
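The SHA-RNN captions in tweet #9 describe a single attention head where the only matrix multiplication acts on the query, keys and values come from the memory M, and sigmoid-produced gates stand in front of the query and keys. The sketch below shows only generic single-query scaled dot-product attention in that spirit; the paper's exact {qs, ks, vs} construction, pointer-based details, and Boom layer are not reproduced, and every name and shape here is made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def single_head_attention(h, M, Wq, gq, gk):
    """Generic single-query scaled dot-product attention over a memory M.

    h  : (d,)   current hidden state
    M  : (T, d) memory of past states, used directly as keys and values,
                so the only matrix multiply is the query projection
    Wq : (d, d) query projection
    gq, gk : (d,) gate parameters squashed with a sigmoid (a stand-in for
                the paper's qs/ks vectors; the real construction differs)
    """
    q = sigmoid(gq) * (Wq @ h)            # gated query, the lone matmul
    K = sigmoid(gk) * M                   # element-wise gated keys (broadcast over T)
    scores = K @ q / np.sqrt(len(q))      # scaled dot products, shape (T,)
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # softmax over memory positions
    return a @ M                          # attention-weighted mix of values, shape (d,)

# Toy usage with made-up sizes.
rng = np.random.default_rng(1)
d, T = 16, 10
out = single_head_attention(rng.standard_normal(d),
                            rng.standard_normal((T, d)),
                            0.1 * rng.standard_normal((d, d)),
                            rng.standard_normal(d),
                            rng.standard_normal(d))
print(out.shape)  # (16,)
```

Because keys and values are read straight out of M, the per-step cost is dominated by the single d x d query projection, which is the efficiency point the caption makes.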
