Tweets by @pragmaticml

  1. Pinned Tweet
    9 Sep 2019

    Finetune 0.8.3 is out! It's a minor release only in name. Featuring: 📚 Much more complete documentation! ⚗️ The small but mighty DistilBERT from the team! 🐳 GPT-2 774M from ! GitHub: Documentation:

  2. 4 Feb

    This is a first -- received a *handwritten* note and some beautifully designed stickers in the mail from the wonderful folks ! They've clearly put as much attention to detail into their mail as they've put into their product :)

  3. Retweeted
    31 Jan

    An Opinionated Guide to ML Research: “To make breakthroughs with idea-driven research, you need to develop an exceptionally deep understanding of your subject, and a perspective that diverges from the rest of the community—some can do it, but it’s hard.”

  4. Retweeted
    28 Jan

    Introducing the new Thinc, a refreshing functional take on deep learning! 🔮 Static type checking 🔥 Mix , & ⛓️ Integrated config system 🧮 Extensible backends incl. JAX (experimental) 🧬 Variable-length sequences & more

  5. Retweeted
    28 Jan

    Super interesting article about the "Dark Secrets of BERT". They analyzed what happens in fine-tuned BERT, in particular on the GLUE tasks. Check it out! I enjoyed reading it! 📝 Article:

  6. Retweeted
    27 Jan

    [1/2] Excited to present SMART: Semi-Autoregressive Training for Conditional Masked Language Models. SMART closes the performance gap between semi- and fully-autoregressive MT models, while retaining the benefits of fast parallel decoding. With

  7. Retweeted
    25 Jan

    Watch my awesome mentor talk about why it’s important to include people in the ML lifecycle. Her views on human-centered approaches and evaluation of interpretability are exactly how this field should move forward!

  8. Retweeted
    24 Jan

    Scaling Laws for Neural Language Models. The OpenAI team found that LM loss scales as a power law with model size, dataset size, and the amount of compute used for training, across up to seven orders of magnitude. (See the power-law sketch after this list.)

  9. Retweeted
    22 Jan

    The quiet semisupervised revolution continues

  10. 20 Jan

    It's exciting to see alternatives to attention that avoid the O(seq_length^2) complexity. I think the Reformer (or perhaps a variant like 's k-means based version) is here to stay. (See the bucketed-attention sketch after this list.)

  11. Retweeted
    19 Jan
  12. Retweeted
    16 Jan

    Introducing Reformer, an efficiency-optimized architecture based on the Transformer model for language understanding that can handle context windows of up to 1 million words, all on a single accelerator with only 16GB of memory. Read all about it ↓

  13. Retweeted
    6 Jan

    10 ML & NLP Research Highlights of 2019: a new blog post on ten ML and NLP research directions that I found exciting and impactful in 2019.

  14. Retweeted
    12 Jan

    When Adam Gaier first tried to test weight agnostic neural networks using random weights, we didn't get any good results. He then introduced a bug in the code that accidentally forced all the weights to be shared, and suddenly we got much better results, and we based our paper on the bug.

  15. Retweeted
    28 Dec 2019

    Reformer: The Efficient Transformer. They present techniques to reduce the time and memory complexity of the Transformer, allowing batches of very long sequences (64K) to fit on one GPU. This should pave the way for the Transformer to be really impactful beyond the NLP domain.

  16. Retweeted
    7 Jan

    I wrote a retrospective on human-in-the-loop companies, a specific subset I'm calling worker-in-the-loop: . This draws from my time building Clara for five years, and knowing many other companies in the space.

  17. Retweeted
    8 Jan

    Ok, people! Are you looking for something to read at the intersection of machine learning and HCI? Three new papers posted online today that you should check out, all with my amazing colleague/BFF ! Ready? I'm gonna try a thread!

  18. Retweeted
    25 Dec 2019

    Merry Christmas to those who celebrate, and happy holidays from Koala and me! I got some needle felting supplies for Christmas and I can't wait to try them out! I know the holidays can be hard on some people and I hope everyone is able to enjoy them as much as possible 💜

  19. Retweeted
    20 Dec 2019

    Finally I can reveal our paper on ELECTRA: much more efficient than existing pretraining methods, with state-of-the-art results and, more importantly, trainable with one GPU! The key idea is to have losses on all tokens. Joint work with , , . (See the replaced-token-detection sketch after this list.)

  20. Retweeted
    2 Dec 2019

    I've been working with for a long time to create something that combines the best of Notebooks with the best of traditional software development approaches. It's called nbdev. We're releasing it today as open source. 1/

  21. Retweeted
    26 Nov 2019

    Introducing the SHA-RNN :) - Read alternative history as a research genre - Learn of the terrifying tokenization attack that leaves language models perplexed - Get near SotA results on enwik8 in hours on a lone GPU. No Sesame Street or Transformers allowed.

    The SHA-RNN is composed of an RNN, pointer-based attention, and a "Boom" feed-forward, with a sprinkling of layer normalization. The persistent state is the RNN's hidden state h as well as the memory M concatenated from previous memories. Bake at 200°F for 16 to 20 hours in a desktop-sized oven.
    The attention mechanism within the SHA-RNN is highly computationally efficient. The only matrix multiplication acts on the query. The A block represents scaled dot product attention, a vector-vector operation. The operators {qs, ks, vs} are vector-vector multiplications and thus have minimal overhead. We use a sigmoid to produce {qs, ks}. For vs, see Section 6.4.
    Bits Per Character (BPC) on enwik8. The single-attention SHA-LSTM has an attention head on the second-to-last layer and had batch size 16 due to lower memory use. Directly comparing the head count for LSTM models and Transformer models obviously doesn't make sense, but neither does comparing zero-headed LSTMs against bajillion-headed models and then declaring an entire species dead.
    (See the single-headed attention sketch after this list.)
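
The power-law finding mentioned in the Scaling Laws retweet (item 8) has a compact functional form. A minimal sketch of that form, with the scale constants and exponents written as generic fitted parameters rather than values taken from the tweet:

```latex
% Loss as a power law in model size N, dataset size D, and training compute C,
% each in the regime where the other two resources are not the bottleneck.
% N_c, D_c, C_c and \alpha_N, \alpha_D, \alpha_C are fitted constants
% (illustrative symbols; no numeric values are claimed here).
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```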
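
For the attention-complexity retweets (items 10 and 12), here is a minimal sketch of the bucketing idea behind LSH-style attention: positions are hashed with a random projection and attention is computed only inside each bucket, so the cost grows with n times the bucket size rather than n^2. This is an illustration of the general technique under simplifying assumptions, not the Reformer implementation (which also shares the query and key projections, sorts by bucket, and uses reversible layers); the function and parameter names are hypothetical.

```python
import numpy as np

def bucketed_attention(q, k, v, n_buckets=8, seed=0):
    """Toy LSH-style attention: attend only within hash buckets."""
    n, d = q.shape
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(d, n_buckets))         # random hyperplanes for hashing
    buckets = np.argmax(q @ proj, axis=-1)         # assign each position to a bucket
    out = np.zeros_like(v, dtype=float)
    for b in range(n_buckets):
        idx = np.where(buckets == b)[0]
        if idx.size == 0:
            continue
        scores = (q[idx] @ k[idx].T) / np.sqrt(d)  # full attention, but only inside the bucket
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out
```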
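
The "losses on all tokens" idea in the ELECTRA retweet (item 19) is replaced-token detection: a small generator fills in masked positions with its own samples, and a discriminator classifies every token as original or replaced, so every position contributes to the loss rather than only the masked ones. A minimal sketch of that discriminator loss, assuming PyTorch and hypothetical tensor names (not the authors' code):

```python
import torch.nn.functional as F

def replaced_token_detection_loss(disc_logits, original_ids, corrupted_ids):
    """disc_logits: (batch, seq_len) per-token scores from the discriminator.
    original_ids / corrupted_ids: token ids before and after the generator
    swapped in its samples at the masked positions."""
    # Label 1 where the token was replaced, 0 where it is original;
    # the binary loss is taken over every position in the sequence.
    labels = (corrupted_ids != original_ids).float()
    return F.binary_cross_entropy_with_logits(disc_logits, labels)
```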
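
Finally, a minimal sketch of the single-headed attention described by the SHA-RNN figure captions in item 21: the only matrix multiplication produces the query, while the keys and values come from the concatenated memory, modulated by sigmoid-produced element-wise vectors qs and ks. This is one reading of the captions with hypothetical module and parameter names, not the paper's code; the vs operator is omitted here (the caption defers it to Section 6.4).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)     # the only matrix multiplication
        self.qs = nn.Parameter(torch.zeros(d_model))  # sigmoid gate for the query
        self.ks = nn.Parameter(torch.zeros(d_model))  # sigmoid gate for the keys

    def forward(self, h, memory):
        # h: (batch, seq, d) current hidden states.
        # memory: (batch, mem, d) hidden states concatenated from previous steps.
        q = torch.sigmoid(self.qs) * self.q_proj(h)               # vector-vector gate
        k = torch.sigmoid(self.ks) * memory                       # no key projection
        scores = q @ k.transpose(-2, -1) / (h.size(-1) ** 0.5)    # scaled dot product
        return F.softmax(scores, dim=-1) @ memory                 # values are the memory itself
```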
