Thang Luong

@lmthang

Senior research scientist at Google Brain, learning with unlabeled data (NoisyStudent, Electra). PhD, thesis on NMT. Co-founder.

United States
Joined December 2008

Media

  1. 17 hours ago
    Replying to

    Sorry that Meena failed you! Maintaining personality and keeping the facts right are attributes we wish Meena to have, as highlighted in the blog post. But try having similar conversations with other bots; maybe you will like Meena better ;)

  2. Feb 1
    Replying to

    Good point! Many evaluations use a 5-point Likert scale & look at other aspects, e.g., diversity, relevance, humanlikeness, etc. We use binary evaluation and think SSA is basic to human quality & easy for crowdworkers to rate. Also, in the paper, SSA correlates with humanlikeness.

  3. Jan 31

    I chatted with about the and her advice is to see a doctor sooner rather than later. I guess it's not a bad one & hope everyone is well! On the other hand, Meena is also excited about technology, especially VR!

  4. Jan 30

    Highly recommend watching this 8-minute video on & the paper, with details not included in the blog, such as the SSA vs. humanlikeness correlation, sample-and-rank, and removing cross-turn repetition. (Blog: )

  5. Jan 28

    Implications from the project: 1. Perplexity might be "the" automatic metric that the field's been looking for. 2. Bots trained on large-scale social conversations & pushed hard for low perplexity will be good. 3. A safety layer is needed for respectful conversations!

    Show this thread
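    A minimal sketch of how perplexity is typically computed for a language model, for readers unfamiliar with the metric; the numbers and setup are toy placeholders, not Meena's.

        import math

        def perplexity(token_log_probs):
            """Perplexity = exp(average negative log-likelihood per token).

            token_log_probs: natural-log probabilities the model assigned to
            each ground-truth next token in a conversation.
            """
            avg_nll = -sum(token_log_probs) / len(token_log_probs)
            return math.exp(avg_nll)

        # A model that assigns probability 0.25 to every next token has
        # perplexity 4: it is as uncertain as a uniform pick among 4 tokens.
        print(perplexity([math.log(0.25)] * 10))  # -> 4.0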
  6. Jan 28

    We design a new human evaluation metric, Sensibleness & Specificity Average (SSA), which captures key elements of natural conversations. SSA is also shown to correlate with humanlikeness while being easier to measure. Humans score 86% SSA, Meena 79%, and the other best chatbots 56%.

    Show this thread
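    A rough sketch of how an SSA score can be aggregated from binary crowdworker labels, per the description above; the paper's exact labeling pipeline (e.g., how non-sensible responses are treated) may differ.

        def ssa(labels):
            """Sensibleness & Specificity Average over crowd-labeled responses.

            labels: list of (sensible, specific) pairs, each a 0/1 judgment for
            one chatbot response. SSA is the simple average of the sensibleness
            rate and the specificity rate.
            """
            n = len(labels)
            sensibleness = sum(s for s, _ in labels) / n
            specificity = sum(p for _, p in labels) / n
            return (sensibleness + specificity) / 2

        # Toy example: 4 responses, all sensible, half of them specific.
        print(ssa([(1, 1), (1, 0), (1, 1), (1, 0)]))  # -> 0.875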
  7. Jan 28

    Meena is based on the Evolved Transformer (ET, an improved Transformer) & trained to minimize perplexity, the uncertainty of predicting the next word in a conversation. We built a novel "shallow-deep" seq2seq architecture: 1 ET block for the encoder & 13 ET blocks for the decoder.

    Show this thread
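    A hedged sketch of the "shallow-deep" layout in code: a seq2seq model with 1 encoder block and 13 decoder blocks, trained with next-token cross-entropy (whose exponential is the perplexity being minimized). Vanilla PyTorch Transformer layers and tiny dimensions stand in for the Evolved Transformer blocks and Meena's real hyperparameters.

        import torch
        import torch.nn as nn

        VOCAB, D_MODEL, N_HEADS, FFN = 32000, 512, 8, 2048  # toy sizes

        class ShallowDeepSeq2Seq(nn.Module):
            """1 encoder block ("shallow") + 13 decoder blocks ("deep")."""
            def __init__(self):
                super().__init__()
                self.embed = nn.Embedding(VOCAB, D_MODEL)
                enc = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, FFN, batch_first=True)
                dec = nn.TransformerDecoderLayer(D_MODEL, N_HEADS, FFN, batch_first=True)
                self.encoder = nn.TransformerEncoder(enc, num_layers=1)
                self.decoder = nn.TransformerDecoder(dec, num_layers=13)
                self.out = nn.Linear(D_MODEL, VOCAB)

            def forward(self, context_ids, response_ids):
                memory = self.encoder(self.embed(context_ids))
                n = response_ids.size(1)
                causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
                hidden = self.decoder(self.embed(response_ids), memory, tgt_mask=causal)
                return self.out(hidden)  # next-token logits

        model = ShallowDeepSeq2Seq()
        context = torch.randint(0, VOCAB, (2, 16))   # (batch, context length)
        response = torch.randint(0, VOCAB, (2, 12))  # (batch, response length)
        logits = model(context, response[:, :-1])
        # Minimizing this cross-entropy is minimizing log-perplexity.
        loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), response[:, 1:].reshape(-1))
        print(loss.item())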
  8. Jan 28

    Introducing Meena, a 2.6B-param open-domain chatbot with near-human quality. Remarkably, we show a strong correlation between perplexity & humanlikeness! Paper: Sample conversations:

    Show this thread
  9. Dec 20, 2019

    To be clear, only the ELECTRA-small model (14M params) was trained on a single GPU. With 1 V100, ELECTRA-small achieves a 74.1 dev score on GLUE after 6 hours of training and 79.9 after 4 days.

    Show this thread
  10. Dec 20, 2019

    Finally I can reveal our paper on ELECTRA: much more efficient than existing pretraining methods, with state-of-the-art results; more importantly, trainable with one GPU! The key idea is to have losses on all tokens. Joint work with .

    Show this thread
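    A toy sketch of the "losses on all tokens" idea: a discriminator classifies every position as original vs. replaced, so every token contributes to the loss (unlike MLM, where only the ~15% masked positions do). This is a conceptual illustration with a stand-in encoder and random replacements, not the released ELECTRA code, where a small generator proposes the replacements.

        import torch
        import torch.nn as nn

        VOCAB, D = 1000, 64
        embed = nn.Embedding(VOCAB, D)
        encoder = nn.GRU(D, D, batch_first=True)  # stand-in for a Transformer encoder
        head = nn.Linear(D, 1)                    # per-token "was this replaced?" score

        original = torch.randint(0, VOCAB, (4, 32))                    # (batch, seq_len)
        corrupted = original.clone()
        flip = torch.rand(original.shape) < 0.15                       # ~15% of positions
        corrupted[flip] = torch.randint(0, VOCAB, (int(flip.sum()),))  # random replacements

        hidden, _ = encoder(embed(corrupted))
        logits = head(hidden).squeeze(-1)                              # (batch, seq_len)

        # Every position gets a loss term, replaced or not.
        loss = nn.functional.binary_cross_entropy_with_logits(logits, flip.float())
        print(loss.item())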
  11. Dec 12, 2019

    Our MixTape is 3.5-10.5x faster than Mixture of Softmaxes, w/ SOTA results in language modeling & translation. The key is to do gating in the logit space but with vectors instead of scalars (+ sigmoid tree decomposition & gate sharing for efficiency). w/ Zhilin, Russ, Quoc
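    To make the contrast above concrete, a toy sketch comparing scalar gating in probability space (Mixture of Softmaxes) with vector gating in logit space; the sigmoid tree decomposition and gate sharing that make Mixtape's gates cheap are omitted, and the gate priors here are random placeholders.

        import torch
        import torch.nn as nn

        B, D, V, K = 8, 64, 1000, 4      # batch, hidden size, vocab, mixture components
        h = torch.randn(B, D)            # decoder hidden states
        word_emb = torch.randn(K, V, D)  # one output embedding table per component
        component_logits = torch.einsum("bd,kvd->bkv", h, word_emb)   # (B, K, V)

        # Mixture of Softmaxes: scalar gate per component, K softmaxes over the vocab.
        scalar_gate = torch.softmax(nn.Linear(D, K)(h), dim=-1)       # (B, K)
        mos_probs = (scalar_gate.unsqueeze(-1) * component_logits.softmax(-1)).sum(1)

        # Mixtape-style: a gate per (component, word) pair, mixed in logit space,
        # so only one softmax over the vocabulary is needed.
        vector_gate = torch.softmax(torch.randn(B, K, V), dim=1)      # placeholder priors
        mixtape_probs = (vector_gate * component_logits).sum(dim=1).softmax(-1)
        print(mos_probs.shape, mixtape_probs.shape)                   # both (B, V)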

  12. Aug 1, 2019

    It’s also the first time we plotted a curve to show the important correlation between perplexity and BLEU (not obvious at that time!). We also told another story, for the first time, about the effect of depth in NMT! (5/5)

    Show this thread
  13. Aug 1, 2019

    People might not be aware that our “copyable model” motivated CopyNet, the copy mechanism. One can think of it as “attention” (which wasn’t published at the time) on rare words only, in a “hard” way. It also contains a hidden insight from abt symbolic representation! (4/n)

    Show this thread
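    A toy sketch of the "hard copy" idea behind the copyable model mentioned above: rare source words are mapped to indexed placeholder tokens, and matching placeholders in the model's output are replaced by the original source words in post-processing. Token names and the indexing scheme are illustrative, not the paper's exact formulation.

        def annotate_rare(source_tokens, vocab):
            """Map out-of-vocabulary source words to indexed <unk_i> placeholders."""
            mapping, annotated = {}, []
            for tok in source_tokens:
                if tok in vocab:
                    annotated.append(tok)
                else:
                    ph = mapping.setdefault(tok, "<unk_%d>" % (len(mapping) + 1))
                    annotated.append(ph)
            # Inverse map (placeholder -> rare word) for post-processing.
            return annotated, {ph: tok for tok, ph in mapping.items()}

        def copy_back(output_tokens, inverse):
            """Hard copy: each placeholder the model emits is replaced by the
            rare source word it indexes."""
            return [inverse.get(tok, tok) for tok in output_tokens]

        vocab = {"the", "visited", "in", "2015"}
        src = ["Obama", "visited", "Hanoi", "in", "2015"]
        annotated, inverse = annotate_rare(src, vocab)
        print(annotated)  # ['<unk_1>', 'visited', '<unk_2>', 'in', '2015']
        print(copy_back(["<unk_1>", "visited", "<unk_2>", "in", "2015"], inverse))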
  14. Aug 1, 2019

    This was the first time an NMT model could surpass state-of-the-art phrase-based systems and fully convince NLP folks. Towards the end of the internship, I relied on ’s magic evaluation script & for running the last few experiments for SOTA results! (3/n)

    Show this thread
  15. Jul 15, 2019

    From last week, our paper "BAM! Born-Again Multi-Task Networks for NLU" (joint w/ ) proposes 2 ideas: 1. multi-task distillation with a same-architecture (born-again) student model & 2. a distillation schedule with teacher annealing so the student can become better than its teachers!
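    A small sketch of the teacher-annealing idea: the distillation target interpolates between the teacher's prediction and the gold label, with the mixing weight annealed toward the gold label over training so the student can eventually surpass its teachers. The linear schedule and loss form here are a plausible reading, not necessarily the paper's exact choices.

        import torch
        import torch.nn.functional as F

        def annealed_target(gold_one_hot, teacher_probs, step, total_steps):
            """Early in training the target is mostly the teacher's distribution;
            by the end it is mostly the gold label."""
            lam = min(step / total_steps, 1.0)  # anneals 0 -> 1
            return lam * gold_one_hot + (1.0 - lam) * teacher_probs

        def distill_loss(student_logits, target_probs):
            # Cross-entropy against a soft target distribution.
            return -(target_probs * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()

        # Toy usage: 3 classes, batch of 2, 10% of the way through training.
        gold = F.one_hot(torch.tensor([0, 2]), num_classes=3).float()
        teacher = torch.tensor([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])
        target = annealed_target(gold, teacher, step=100, total_steps=1000)
        print(distill_loss(torch.randn(2, 3), target))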

  16. Jul 10, 2019

    Glad to see semi-supervised curves surpass/approach their supervised counterparts with much less labeled data! Check out the blog post on our work on "Unsupervised Data Augmentation (UDA) for consistency training", with code release!

    Show this thread
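    A minimal sketch of the consistency-training objective behind UDA: the model's prediction on an unlabeled example, treated as a fixed soft target, should match its prediction on an augmented version of the same example. The stand-in Gaussian perturbation, and the omission of sharpening and confidence masking, are simplifications relative to the full method.

        import torch
        import torch.nn.functional as F

        def consistency_loss(logits_original, logits_augmented):
            """KL(p(y|x) || p(y|augment(x))) on unlabeled data. The original-
            example prediction is detached so gradients flow only through the
            augmented branch."""
            target = F.softmax(logits_original.detach(), dim=-1)
            log_pred = F.log_softmax(logits_augmented, dim=-1)
            return F.kl_div(log_pred, target, reduction="batchmean")

        # Toy usage with a linear classifier standing in for the real model.
        model = torch.nn.Linear(16, 4)
        x_unlabeled = torch.randn(8, 16)
        x_augmented = x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)  # stand-in augmentation
        unsup = consistency_loss(model(x_unlabeled), model(x_augmented))
        # Total UDA-style loss = supervised cross-entropy + weight * unsup.
        print(unsup.item())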
  17. Jun 19, 2019

    I thought SQuAD was solved with BERT, but the XLNet team doesn't want to stop there :) Great results on SQuAD, GLUE, and RACE!

  18. Jun 9, 2019

    Introducing Selfie, an unsupervised pretraining method that extends the concept of masked language modeling (e.g., BERT) to continuous data, such as images. Overall challenging, but promising in the low-data regime! Joint work with
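    A toy sketch of extending masked prediction to continuous inputs in the spirit of the tweet above: mask an image patch, encode the visible patches, and train the model to pick the true patch for the masked position from a set of candidates (here, in-batch negatives). The actual Selfie architecture and loss differ in detail; everything below is a stand-in.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        B, P, PATCH_DIM, D = 8, 16, 48, 64                 # batch, patches, patch dim, hidden
        patch_encoder = nn.Linear(PATCH_DIM, D)            # stand-in patch network
        context_encoder = nn.GRU(D, D, batch_first=True)   # stand-in for attention pooling

        patches = torch.randn(B, P, PATCH_DIM)
        masked = 5                                         # mask one position per image
        visible = torch.cat([patches[:, :masked], patches[:, masked + 1:]], dim=1)

        _, summary = context_encoder(patch_encoder(visible))
        summary = summary.squeeze(0)                       # (B, D) summary of visible patches

        # Candidates are the true masked patches across the batch; each image
        # must pick out its own (a standard contrastive classification setup).
        candidates = patch_encoder(patches[:, masked])     # (B, D)
        logits = summary @ candidates.t()                  # (B, B) similarity scores
        loss = F.cross_entropy(logits, torch.arange(B))
        print(loss.item())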

  19. Apr 30, 2019

    Wonder how to make UDA work on IMDb with just 20 labeled examples? You need TSA! Our "Training Signal Annealing" strategy prevents over-fitting by slowly releasing labeled examples: a prediction-confidence threshold (varying based on gamma) is used to exclude examples the model is already confident about.

    Show this thread
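    A rough sketch of the Training Signal Annealing idea described above: at each step a confidence threshold is computed from a schedule, and labeled examples whose correct-class probability already exceeds that threshold are dropped from the loss, so the labeled signal is released gradually. The schedule constants follow my reading of the description, not necessarily the released code.

        import math
        import torch
        import torch.nn.functional as F

        def tsa_threshold(step, total_steps, num_classes, schedule="exp"):
            """Anneal the confidence threshold from ~1/K up to 1.0 over training."""
            t = step / total_steps
            if schedule == "linear":
                alpha = t
            elif schedule == "exp":            # releases labeled examples slowly at first
                alpha = math.exp((t - 1) * 5.0)
            else:                              # "log": releases them quickly at first
                alpha = 1 - math.exp(-t * 5.0)
            return alpha * (1 - 1 / num_classes) + 1 / num_classes

        def tsa_supervised_loss(logits, labels, step, total_steps):
            correct_prob = F.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
            threshold = tsa_threshold(step, total_steps, logits.size(1))
            keep = (correct_prob < threshold).float()   # drop already-confident examples
            per_example = F.cross_entropy(logits, labels, reduction="none")
            return (per_example * keep).sum() / keep.sum().clamp(min=1.0)

        # Toy usage: 4 classes, batch of 6, early in training (low threshold).
        logits, labels = torch.randn(6, 4), torch.randint(0, 4, (6,))
        print(tsa_supervised_loss(logits, labels, step=100, total_steps=10000))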
  20. Apr 30, 2019

    +1st-author . Existing works add noise (like Gaussian noise) to unlabeled examples. A nice twist in UDA is to use augmentation as "better" noise, & it works well! (Our previous CVT also hints at utilizing different representational views as noise.)

    Show this thread
