Lei Li

@_TobiasLee

Senior student at Xidian University, Homepage:

Joined: August 2015

Tweets

  1. Retweeted
    Jan 11

    “Meet AdaMod: a new deep learning optimizer with memory” by Less Wright

  2. Retweeted
    Jan 6

    10 ML & NLP Research Highlights of 2019. New blog post on ten ML and NLP research directions that I found exciting and impactful in 2019.

  3. Dec 23, 2019
  4. Retweeted
    Dec 23, 2019

    New NLP News: 2020 NLP wish lists, HuggingFace + fastai, NeurIPS 2019, GPT-2 things, Machine Learning Interviews via

  5. Dec 23, 2019

    Still a lot to explore about GNN models

  6. Retweeted
    Dec 8, 2019

    Our recent work to be presented at NeurIPS 2019: Understanding and Improving Layer Normalization. We find that the normalization effects on derivatives are much more important than forward normalization (a minimal layer-normalization sketch follows this list). Paper:

  7. Retweeted
    Dec 5, 2019

    A simple module consistently outperforms self-attention and the Transformer model on major NMT datasets with SoTA performance. arxiv: code:

  8. Retweeted
    Nov 26, 2019

    Introducing the SHA-RNN :) - Read alternative history as a research genre - Learn of the terrifying tokenization attack that leaves language models perplexed - Get near-SotA results on enwik8 in hours on a lone GPU. No Sesame Street or Transformers allowed.

    The SHA-RNN is composed of an RNN, pointer-based attention, and a “Boom” feed-forward with a sprinkling of layer normalization. The persistent state is the RNN’s hidden state h as well as the memory M concatenated from previous memories. Bake at 200°F for 16 to 20 hours in a desktop-sized oven.
    The attention mechanism within the SHA-RNN is highly computationally efficient. The only matrix multiplication acts on the query. The A block represents scaled dot-product attention, a vector-vector operation. The operators {qs, ks, vs} are vector-vector multiplications and thus have minimal overhead. We use a sigmoid to produce {qs, ks}; for vs, see Section 6.4 of the paper. (A rough code sketch of this mechanism follows this list.)
    Bits Per Character (BPC) on enwik8. The single-attention SHA-LSTM has an attention head on the second-to-last layer and had batch size 16 due to lower memory use. Directly comparing the head count for LSTM models and Transformer models obviously doesn’t make sense, but neither does comparing zero-headed LSTMs against bajillion-headed models and then declaring an entire species dead.
  9. Retweeted
    Nov 24, 2019

    A simple module consistently outperforms self-attention and the Transformer model on major NMT datasets with SoTA performance.

  10. Retweeted
    Nov 8, 2019
  11. Retweeted
    Nov 5, 2019

    This slide from an (excellent) @emnlp2019 talk speaks to me on a spiritual level. From: Show Your Work: Improved Reporting of Experimental Results, by Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A. Smith. Paper link:

  12. Retweeted
    Mar 26, 2019

    New blog post (...and new blog) on how to build a deep learning framework from scratch in 100 LOC. Let me know what you think!

  13. Mar 27, 2019
  14. Dec 22, 2018
  15. Retweeted
    Nov 27, 2018

    + are the fastest platform for ML. On the popular (and now quite silly) ResNet-50 ImageNet benchmark, one TPUv3 Pod finishes in 2.2 minutes at full accuracy (76.3% top-1)! Approx 2x faster than extremely large GPU clusters. Details:

  16. Retweeted

    "Pattern Recognition and Machine Learning" by is now available as a free download. Download your copy today for an introduction to the fields of pattern recognition & machine learning:

  17. Retweeted
    Oct 11, 2018

    A new era of NLP began just a few days ago: large pretrained models (Transformer, 24 layers, 1024 dim, 16 heads) + massive compute is all you need. BERT: SOTA results on everything. Results on SQuAD are just mind-blowing. Fun times ahead!

  18. Jul 4, 2018
  19. Oct 31, 2017

    Long time no see! I'm online again just for a Twitter dataset QAQ

  20. Aug 1, 2015

    Why fear that truth is boundless? Every inch gained is an inch of joy. (怕什么真理无穷,进一寸有一寸的欢喜。) Hello, Twitter!

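The retweet in item 6 concerns the NeurIPS 2019 work on understanding and improving layer normalization. As a reference for what "forward normalization" means there, here is a minimal sketch of standard layer normalization, assuming PyTorch; the function name and signature are illustrative, not taken from the paper's code.

```python
import torch

def layer_norm(x: torch.Tensor, gain: torch.Tensor, bias: torch.Tensor,
               eps: float = 1e-5) -> torch.Tensor:
    # "Forward normalization": re-center and re-scale the last dimension,
    # then apply the learned element-wise gain and bias.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gain * (x - mean) / torch.sqrt(var + eps) + bias
```

The retweeted finding is that the benefit comes mainly from how this operation reshapes gradients during backpropagation, not from the forward re-centering and re-scaling itself.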

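Item 8 describes the SHA-RNN's single-headed attention: the only matrix multiplication acts on the query, scaled dot-product attention does the mixing, and {qs, ks, vs} are cheap vector-vector operations, with sigmoids producing qs and ks. The sketch below illustrates that idea, assuming PyTorch; the class name, tensor shapes, and the handling of vs are assumptions for illustration, not the SHA-RNN reference implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttention(nn.Module):
    """Illustrative single-headed attention in the spirit of the SHA-RNN description."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)          # the only matrix multiplication
        self.qs = nn.Parameter(torch.zeros(dim))  # element-wise gate logits (sigmoid-activated)
        self.ks = nn.Parameter(torch.zeros(dim))
        self.vs = nn.Parameter(torch.zeros(dim))  # the paper handles vs differently (its Section 6.4)
        self.scale = 1.0 / math.sqrt(dim)

    def forward(self, h: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, dim) current hidden states;
        # memory: (batch, mem, dim) concatenation of previously stored states.
        q = torch.sigmoid(self.qs) * self.query(h)  # matrix multiply, then cheap gating
        k = torch.sigmoid(self.ks) * memory         # vector-vector gating only
        v = torch.sigmoid(self.vs) * memory
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                             # scaled dot-product attention
```

Because the key and value paths only gate the stored memory element-wise, the per-step cost is dominated by the single query projection, which is what keeps this head cheap relative to multi-headed Transformer attention.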