Tweets

  1. Pinned tweet
    Feb 4

    Our new work on measuring phonotactic complexity and its trade-offs has just been published in TACL 😁 This is joint work with and . Wanna know more about it? Follow the thread:

  2. Feb 4

    Finally, we tried including phoneme features explicitly in the LM embedding representations, but this yielded little benefit except in an extremely low-resource condition.

  3. Feb 4

    This correlation persists when controlling for characteristics of long words. Our proposed measure also correlates with word length both across and within language families. Standard measures of phonotactic complexity do not show such a correlation within language families.

  4. Feb 4

    Why should I care? Only weak trade-offs have been found between other measures of phonetic complexity (e.g. vowel inventory size) and word length. Entropy (bits per phoneme) shows a strong negative correlation with average word length (-.74).

  5. Feb 4

    (Pros, continued): - it requires relatively modest annotation to be computed; - it permits a straightforward cross-linguistic comparison; - it uses a very cool LSTM (actually it uses a standard one, but they are cool).

  6. Feb 4

    Pros of our approach: - it considers the frequency distribution of different phonemes in a language; - it covers corner cases (e.g. borrowed words and phonemes) nicely at the rate they appear; - it captures long-distance dependencies in words (e.g. vowel harmony);

  7. Feb 4

    We do an extensive review of relevant literature: - what is phonotactics and how has it been modeled? - what is complexity and how might one measure it? - the role of information theory in linguistic complexity.

  8. Feb 4

    What is phonotactics? The study of which sequences of segments constitute natural-sounding words. The classic example (due to Chomsky and Halle) is: - brick: actual English word; - blick: not a word, but judged to be grammatical by speakers; - bnick: not a possible word.

  9. Feb 4

    In this paper, we propose using bits per phoneme as a measure of phonotactic complexity. We train a phoneme-level LSTM language model and estimate its cross-entropy on held-out data. This gives us an upper bound on the actual "phonotactic entropy" of a language.

  10. Retweeted
    Jan 24

    Please join SIGTYP, ACL's newest SIG! We are dedicated to the computational study of linguistic typology and multilingual NLP. Sign up on our website . We are hosting a workshop (SIGTYP 2020) at . Consider submitting ()!

  11. Retweeted
    Jul 29, 2019

    Tiago Pimentel presenting best paper nominated paper on work with and others

  12. Jul 25, 2019

    Our work is a candidate for best paper in :) Read a short description in the thread! The whole paper in arXiv: ! And come to our presentation on Monday afternoon! Ps: Also read the other candidates! List is here:

  13. Jun 17, 2019

    Drawbacks: - We used English word2vec as meaning vectors for all languages in NorthEuraLex. -> definitely not - Using word2vec vectors might not fully capture meaning, so we might be underestimating systematicity. - Our mutual information measurements are estimates.

  14. Jun 17, 2019

    Advantages over previous work: - Easy to control for trivial factors of systematicity. - Can capture non-linear relationships between meaning and form. - Bits per phone is a more tangible measurement unit than Pearson correlation results.

  15. Jun 17, 2019

    Results 3: Finding fantastic phonesthemes. We can use our technique to get a list of phonesthemes in English, German and Dutch.

  16. Jun 17, 2019

    Results 2: Using NorthEuraLex, we find systematicity in 87 (out of 106) analysed languages. After controlling for POS tags, though, we only find it for 17.

  17. Jun 17, 2019

    Results 1: Using CELEX, we find systematicity (statistically significant nonzero mutual information I(V; W)) in English, German and Dutch. We also find systematicity after controlling for POS tags.

  18. Jun 17, 2019

    How do we represent forms? We use words' phone string representation in the International Phonetic Alphabet (IPA). How do we represent meaning? We use Word2Vec distributed representations.

  19. Jun 17, 2019

    By framing systematicity as mutual information, we can straightforwardly control for known (maybe trivial) factors of systematicity (e.g. grammatical class in Portuguese / Spanish). I(W;V|G) = H(W|G) - H(W|V,G)

  20. Jun 17, 2019

    We estimate mutual information as the difference in entropy of two phone-level (form; W) LSTM language models, one of which is conditioned on the semantic representation (meaning; V). I(W;V) = H(W) - H(W|V)

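
The two estimates described in the threads above (bits per phoneme as held-out cross-entropy, and mutual information as a difference of two entropies) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the threads use phoneme-level LSTM language models, whereas this sketch swaps in an add-one-smoothed bigram model so that it stays self-contained. The names `train_bigram`, `bits_per_phoneme` and `mutual_information_estimate`, the `<w>` and `</w>` boundary symbols, and the choice to count the end-of-word symbol in the per-phoneme average are all hypothetical and not taken from the papers.

```python
import math
from collections import Counter

BOW = "<w>"   # hypothetical begin-of-word symbol (assumption)
EOW = "</w>"  # hypothetical end-of-word symbol (assumption)


def train_bigram(words, vocab):
    """Fit an add-one-smoothed bigram model over phoneme sequences.

    NOTE: a stand-in for the LSTM language model mentioned in the thread.
    `words` is an iterable of phoneme sequences (strings or lists),
    `vocab` is the set of all phonemes the model may have to predict.
    Returns a function prob(prev, nxt) giving P(nxt | prev).
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for word in words:
        symbols = [BOW] + list(word) + [EOW]
        for prev, nxt in zip(symbols, symbols[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    n_outcomes = len(vocab) + 1  # every phoneme, plus end-of-word

    def prob(prev, nxt):
        return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + n_outcomes)

    return prob


def bits_per_phoneme(prob, heldout_words):
    """Cross-entropy of the model on held-out words, in bits per symbol.

    Because the model is only an approximation of the true distribution,
    this value upper-bounds the true entropy in expectation.
    """
    total_bits = 0.0
    total_symbols = 0
    for word in heldout_words:
        symbols = [BOW] + list(word) + [EOW]
        for prev, nxt in zip(symbols, symbols[1:]):
            total_bits -= math.log2(prob(prev, nxt))
            total_symbols += 1
    return total_bits / total_symbols


def mutual_information_estimate(h_w, h_w_given_v):
    """I(W;V) = H(W) - H(W|V), both entropies given in bits per phone.

    The conditional variant I(W;V|G) = H(W|G) - H(W|V,G) is the same
    subtraction, with both models additionally conditioned on G.
    """
    return h_w - h_w_given_v
```

In use, one would fit the model on training words, call `bits_per_phoneme` on held-out words, and pass the cross-entropies of an unconditioned and a meaning-conditioned model to `mutual_information_estimate`. Since both inputs are estimates, the resulting difference is an estimate as well, as the thread's own list of drawbacks notes.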
