Smerity

@Smerity

gcc startup.c -o ./startup. Focused on machine learning & society. Previously Research via . '14, '11. 🇦🇺 in SF.

San Francisco, CA
Joined July 2008

Tweets


  1. Pinned Tweet
    Nov 26, 2019

    Introducing the SHA-RNN :) - Read alternative history as a research genre - Learn of the terrifying tokenization attack that leaves language models perplexed - Get near SotA results on enwik8 in hours on a lone GPU. No Sesame Street or Transformers allowed.

    The SHA-RNN is composed of an RNN, pointer based attention, and a “Boom” feed-forward with a sprinkling of layer normalization. The persistent state is the RNN’s hidden state h as well as the memory M concatenated from previous memories. Bake at 200°F for 16 to 20 hours in a desktop sized oven.
    The attention mechanism within the SHA-RNN is highly computationally efficient. The only matrix multiplication acts on the query. The A block represents scaled dot product attention, a vector-vector operation. The operators {qs, ks, vs} are vector-vector multiplications and thus have minimal overhead. We use a sigmoid to produce {qs, ks}. For vs see Section 6.4.
    Bits Per Character (BPC) on enwik8. The single attention SHA-LSTM has an attention head on the second last layer and had batch size 16 due to lower memory use. Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense but neither does comparing zero-headed LSTMs against bajillion headed models and then declaring an entire species dead.
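    A rough PyTorch sketch of the attention block as it is described above (not the released code; the gate names qs/ks and the handling of the values are assumptions, since the description defers values to Section 6.4):

        import math
        import torch
        import torch.nn as nn

        class SingleHeadAttention(nn.Module):
            """Sketch of the single-headed attention described above: the only
            matrix multiplication acts on the query, while the keys come straight
            from memory scaled by a learned sigmoid gate (vector-vector ops)."""
            def __init__(self, d_model):
                super().__init__()
                self.q_proj = nn.Linear(d_model, d_model)      # the lone matmul
                self.qs = nn.Parameter(torch.zeros(d_model))   # sigmoid gate on the query
                self.ks = nn.Parameter(torch.zeros(d_model))   # sigmoid gate on the keys

            def forward(self, h, memory):
                # h: (batch, seq, d) RNN hidden states; memory: (batch, mem, d) past states
                q = self.q_proj(h) * torch.sigmoid(self.qs)
                k = memory * torch.sigmoid(self.ks)
                v = memory  # values left untouched in this sketch; the paper treats them separately
                scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
                return torch.softmax(scores, dim=-1) @ v  # causal masking over memory omitted for brevity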
  2. Retweeted
    Feb 2

    Thread 👇 We’ve also had a difficult time with CuDNN as soon as you want to go beyond a basic LSTM and grab cell states, do -style dropout, etc. Maybe now that has an team, things might get better?
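    As one concrete example of the pain point (a sketch only, and not necessarily the dropout variant the tweet refers to): applying DropConnect-style weight drop to the recurrent weights, as AWD-LSTM does, means poking at nn.LSTM's parameters by hand, which is exactly the sort of surgery the fused cuDNN path doesn't accommodate:

        import torch
        import torch.nn as nn

        def weight_drop(lstm: nn.LSTM, p: float = 0.5) -> None:
            # Mask the hidden-to-hidden weights in place (DropConnect style).
            # Illustrative only: a real implementation re-samples the mask every forward pass.
            for name, param in lstm.named_parameters():
                if "weight_hh" in name:
                    mask = torch.bernoulli(torch.full_like(param, 1 - p)) / (1 - p)
                    param.data.mul_(mask)

        lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=2, batch_first=True)
        weight_drop(lstm, p=0.5)
        out, (h_n, c_n) = lstm(torch.randn(8, 100, 256))  # only the final (h_n, c_n) are exposed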

  3. Feb 2

    Having said that: - I'm all for giving Taika Waititi hold over the Star Wars universe - The Mandalorian is a refreshing take on the Samurai Western elements - Rogue One gave me the same awe upon seeing the Death Star tower as I'm certain Episode 4 audiences had A New Hope/s ;)

  4. Feb 2

    Knives Out was perfect, go see it. The new Jumanjis are a fun addition to the series. In comparison, I watched the original Star Wars trilogy nearly daily as a child yet read the leaked synopsis for Ep 9 as it'd make watching the disaster unfold in cinema less frustrating :(

  5. Feb 2

    At my local cinema: - Knives Out (released: Nov 27th, 2019) has 4 showtimes - Jumanji The Next Level (Dec 13th 2019) has 4 showtimes - Star Wars Rise of Skywalker (Dec 20th 2019) has 3 showtimes Star Wars Ep 4 started in ~40 theatres, expanded to 1700+, screened for 44 weeks.

  6. Jan 31

    For a link to the PyTorch JIT example, which shows brilliant work but I just wasn't able to convert it into a working solution after trying to work through the code (i.e. not really working result + slower than cuDNN - but maybe all my fault!):

  7. Jan 31

    Even in self-attention you need fancy tricks to be able to perform certain optimizations, such as reversible layers, in a way that's supported and gets maximal advantage from the framework. I'm glad that JAX exists for such cases!
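    A tiny sketch of the reversible-layer trick mentioned here (the RevNet/Reformer coupling); F and G below are stand-ins for attention and feed-forward blocks, not anything from the thread:

        import torch

        def rev_forward(x1, x2, F, G):
            # Coupled residuals: y1 = x1 + F(x2), y2 = x2 + G(y1)
            y1 = x1 + F(x2)
            y2 = x2 + G(y1)
            return y1, y2

        def rev_inverse(y1, y2, F, G):
            # Inputs can be recomputed from outputs, so activations need not be
            # stored for backprop; exploiting that is where framework support matters.
            x2 = y2 - G(y1)
            x1 = y1 - F(x2)
            return x1, x2

        F = G = torch.nn.Linear(16, 16)
        x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
        y1, y2 = rev_forward(x1, x2, F, G)
        r1, r2 = rev_inverse(y1, y2, F, G)
        assert torch.allclose(x1, r1, atol=1e-5) and torch.allclose(x2, r2, atol=1e-5)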

  8. Jan 31

    The blackbox style of existing RNN implementations means that even simple modifications, such as recovering the hidden state (rather than the output) at each timestep, are difficult. One of the biggest advantages of non-recurrent self-attention is that the tooling is already sufficient.
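    To illustrate (a sketch, not the fused implementation): nn.LSTM only returns the last layer's outputs plus the final (h, c), so collecting every intermediate state means unrolling nn.LSTMCell by hand and giving up the fast cuDNN kernel:

        import torch
        import torch.nn as nn

        def unrolled_lstm(x, cell: nn.LSTMCell):
            # Manual unroll: slow, but the hidden AND cell state are visible at every step.
            batch, seq_len, _ = x.shape
            h = x.new_zeros(batch, cell.hidden_size)
            c = x.new_zeros(batch, cell.hidden_size)
            hs, cs = [], []
            for t in range(seq_len):
                h, c = cell(x[:, t], (h, c))
                hs.append(h)
                cs.append(c)
            return torch.stack(hs, dim=1), torch.stack(cs, dim=1)

        cell = nn.LSTMCell(input_size=128, hidden_size=128)
        hs, cs = unrolled_lstm(torch.randn(4, 50, 128), cell)  # each (4, 50, 128)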

  9. Jan 31

    What are the best solutions for high-level to low-level performant creation of RNN/LSTM layers? - Reason: seem unlikely to release new cuDNN RNNs (i.e. a LayerNorm LSTM) - The JIT looked promising but the JIT LSTM had many problems for me - JAX? TF? Back to CUDA?
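    For concreteness, the kind of layer being asked about (a minimal LayerNorm LSTM cell written so torch.jit.script will accept it; an assumed example, not code from the thread):

        from typing import Tuple
        import torch
        import torch.nn as nn

        class LayerNormLSTMCell(nn.Module):
            # A LayerNorm LSTM cell: the sort of variant cuDNN doesn't ship and
            # that the TorchScript JIT is meant to make reasonably fast.
            def __init__(self, input_size: int, hidden_size: int):
                super().__init__()
                self.ih = nn.Linear(input_size, 4 * hidden_size, bias=False)
                self.hh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
                self.ln_gates = nn.LayerNorm(4 * hidden_size)
                self.ln_cell = nn.LayerNorm(hidden_size)

            def forward(self, x: torch.Tensor,
                        state: Tuple[torch.Tensor, torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
                h, c = state
                gates = self.ln_gates(self.ih(x) + self.hh(h))
                i, f, g, o = gates.chunk(4, 1)
                c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
                h = torch.sigmoid(o) * torch.tanh(self.ln_cell(c))
                return h, c

        cell = torch.jit.script(LayerNormLSTMCell(128, 128))  # pointwise ops left to the JIT to fuse
        h, c = cell(torch.randn(8, 128), (torch.zeros(8, 128), torch.zeros(8, 128)))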

  10. Jan 29

    To note: this isn't necessarily tragedy for open innovation. Both VCs and BigCo make the majority of returns from a short tail of success. VCs can fund from {private capital, IPOs, BigCo returns via acquisitions} to give "externalized R&D" an opportunity and a fate of its own.

  11. Jan 29

    Related curiosity: how much of the ~$200 billion in venture capital raised in the past few years, converted by start-ups to innovation, will end up in {Facebook, Google, Amazon, ...} hands? In some ways VC is like externalized R&D subsidized by a gambling community.

  12. Retweeted
    Jan 28

    Fascinating to compare the half-life of content across platforms (time it takes for a piece of content to reach 50% of its total lifetime engagement) 🧐 Twitter: 20 mins Facebook: 5 hrs Instagram: 20 hrs LinkedIn: 24 hrs YouTube: 20 days Pinterest: 4 mos Blog post: 2 yrs

  13. Retweeted
    Replying to

    We tie 6 strongly simplified RNN layers in a CPU-optimized transformer hybrid with very little quality loss. Been doing that for two years now in production, wrote about it only now :)

  14. Jan 28

    The QRNN was used in "Multi-task self-supervised learning for Robust Speech Recognition" =]

  15. Retweeted
    Jan 28

    Interesting to hear of folks getting better results from ULMFiT / AWD-LSTM and fastai2 vs BERT/transformer based models. Anyone else tried comparing these? cc

  16. Retweeted
    Jan 27

    Amazing: a termite track (top) and an ant track (bottom) • each travelling insect is protected by its own column of soldiers, no fights necessary | 📹 via Mehdi Moussaid

  17. Jan 27

    I have a secret smile that turns on when I hit the button to ride an elevator up or down 15 floors in mere seconds. I get to do this every day. What everyday "unseen" engineering or societal miracles make you stop and smile? What magical future should make you smile but doesn't?

  18. Retweeted

    So Apple's market cap is about to eclipse that of the entire Australian stock market. 😂

  19. Retweeted
    Jan 19

    I finished my side project today! I built a neural network that predicts the categories of a scientific paper given its title and abstract. The model is a smaller version of 's SHA-RNN, trained on all of arXiv. Try it out:
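    For flavour, the general shape of such a classifier (an illustrative sketch only; the actual project's architecture, vocabulary, and category count are not given in the tweet): encode the concatenated title and abstract with a recurrent backbone, pool, and emit one logit per arXiv category.

        import torch
        import torch.nn as nn

        class PaperClassifier(nn.Module):
            # Illustrative only: an RNN encoder over "title [SEP] abstract" token ids,
            # mean-pooled and mapped to arXiv category logits.
            def __init__(self, vocab_size=30000, d_model=512, n_categories=150):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, d_model)
                self.encoder = nn.LSTM(d_model, d_model, num_layers=2, batch_first=True)
                self.head = nn.Linear(d_model, n_categories)  # category count is a placeholder

            def forward(self, tokens):                 # tokens: (batch, seq)
                h, _ = self.encoder(self.embed(tokens))
                return self.head(h.mean(dim=1))        # multi-label logits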

  20. Retweeted
    Jan 13

    Ramanujan’s first letter to G.H. Hardy reminds me of the difficulties faced by independent, often self-taught researchers working their way into the scientific research community.

  21. Retweeted
    Jan 12

    This reminds me of a fun fact I recently learnt about leaf cutter ants - by moving the body + legs as a compass to cut the leaf in an arc, the ant ends up with a leaf proportional to its size. So, bigger ants carry bigger leaves. :) geometry at work in nature.

