Aurelie Herbelot

@ah__cl

PI at the Centre for Mind/Brain Sciences, University of Trento. My group is CALM (Computational Approaches to Language and Meaning).

Planet Earth
Joined October 2017

Tweets


  1. Retweeted
    21 Jan

    Universals of word order reflect optimization of grammars for efficient communication: a thread. Joint work with and
  2. Retweeted
    18 Jan
    Replying to

    Why a mismatch between form and meaning? To keep semanticists in business. :)
  3. 18 Jan

    And another puzzle: why should there be a mismatch between form and meaning anyway? We can be Chomskyan about this: performance is degraded competence and produces the mismatch. Another explanation is that language did it on purpose. But that's a story for another day... /32
  4. 18 Jan

    Much more experimentation is needed to get clear on the role of data vs 'innate' features in a learning algorithm. At present, we only know that certain hard-encoded mechanisms help, but without controlling for the exact contributions of the environment and the learner. /31
  5. 18 Jan

    Contrast is an early feature of language acquisition (Clark, 1988). The fact that it might be encoded in a very simple structure in language is perhaps no coincidence. If this result holds, it would mean that this particular phenomenon *is* in the data in its rawest form. /30
  6. 18 Jan

    Johann found that word vectors based on bigram probabilities encoded only moderate amounts of lexical semantics. *But* they were extremely good at auto-encoding bags of words, implying that they captured 'contrast': the notion that words that differ in form differ in meaning. /29
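To make the idea concrete, here is a toy sketch (my own illustration, not the setup from Seltmann et al., 2019): each word's vector is simply the probability distribution over which words follow it in a tiny corpus. Words with different distributional behaviour end up with different vectors, which is the 'contrast' intuition in its rawest form.

```python
from collections import Counter, defaultdict

# A hypothetical mini-corpus, just for illustration.
corpus = "the cat sat on the mat the dog sat on the rug".split()

vocab = sorted(set(corpus))
follow = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follow[w1][w2] += 1

def bigram_vector(word):
    """P(next word | word), as a vector over the vocabulary."""
    counts = follow[word]
    total = sum(counts.values()) or 1  # guard: last token has no successor
    return [counts[v] / total for v in vocab]

vectors = {w: bigram_vector(w) for w in vocab}
# Words with different distributional behaviour get different vectors:
# contrast in form shows up as contrast in the representation.
```

Nothing here is tuned or learned; the representation falls straight out of raw bigram counts.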
  7. 18 Jan

    Last year, student Johann Seltmann did a short project in our group to start elucidating "How much competence is there in performance?" (Seltmann et al, 2019) His intention was to understand what exactly was in one of the most basic structures of language: the word bigram. /28
  8. 18 Jan

    State-of-the-art distributional models of meaning have to apply great amounts of distortion to the input data to perform well. This distortion is a *hard-encoded* part of the model which needs to be there to do anything useful with the input. So what is in the raw data itself? /27
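One standard example of such hard-encoded distortion is the positive PMI transform applied to raw co-occurrence counts in count-based distributional models. The sketch below (toy counts, my own choice of numbers) shows the effect: informative contexts are boosted, while frequent but uninformative contexts are clipped to zero.

```python
import math

def ppmi(counts):
    """Positive PMI over a dict of (word, context) -> raw co-occurrence count."""
    total = sum(counts.values())
    w_tot, c_tot = {}, {}
    for (w, c), n in counts.items():
        w_tot[w] = w_tot.get(w, 0) + n
        c_tot[c] = c_tot.get(c, 0) + n
    out = {}
    for (w, c), n in counts.items():
        pmi = math.log((n * total) / (w_tot[w] * c_tot[c]))
        out[(w, c)] = max(pmi, 0.0)  # the hard-coded "distortion": clip negatives
    return out

# Hypothetical counts: "the" co-occurs with everything, "purr"/"bark" are selective.
raw = {("cat", "purr"): 8, ("cat", "the"): 20,
       ("dog", "bark"): 8, ("dog", "the"): 20}
weights = ppmi(raw)
# "purr" now outweighs "the" as a context for "cat", although its raw count is lower.
```

The raw counts alone would rank "the" as the most important context for every word; the transform is what makes the vectors semantically useful.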
  9. 18 Jan

    So if (language) data does not perfectly equal meaning, then meaning minus form corresponds to some non-empty element which, not being data, must be something else. Perhaps non-linguistic stuff, the contextual 'schemas' in Fillmore. And/or perhaps that mysterious innateness. /26
  10. 18 Jan

    "... However, this is not the same thing as saying that the distributional structure of language (phonology, morphology, and at most a small amount of discourse structure) conforms in some one-to-one way with some independently discoverable structure of meaning." /25
  11. 18 Jan

    Well, meaning is in the data, right? Harris (1954)! Distributional Hypothesis! But wait, what did Harris actually say? "To the extent that formal (distributional) structure can be discovered in discourse, it correlates in some way with the substance of what is being said... /24
  12. 18 Jan

    PART 3 - THE DATA. Working backwards. If linguistic competence is in the data, ready to be captured by some generic pattern-finding algorithm, it doesn't need to be innate. If competence is not in the data, it needs to be somewhere else. So what is in the data? /23
  13. 18 Jan

    Whichever way we look at it, we need something to learn anything, and that something must be somewhere. Should we be connectionists and need dedicated architectures? Should we be Bayesian and require some prior event space? What if the answer was in the data? /22
  14. 18 Jan

    How about Bayesian approaches? We must now explain where priors come from, as well as the event space itself. Computing the probability of X requires having X in the first place, a space of possible concepts over which to distribute probability mass. Fodor is looming. /21
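A minimal sketch makes the point tangible (my own toy example, in the spirit of Bayesian word-learning models with a size-principle likelihood): the learner updates beautifully over the candidate concepts, but the candidate concepts themselves are handed to it up front, not extracted from the data.

```python
from fractions import Fraction

# The hypothesis space is *given*, not learned -- this is the Fodorian worry.
hypotheses = {
    "DOG":    {"fido", "rex"},
    "ANIMAL": {"fido", "rex", "tom"},
    "THING":  {"fido", "rex", "tom", "ball"},
}
prior = {h: Fraction(1, 3) for h in hypotheses}

def posterior(prior, observations):
    """Bayes update with a size-principle likelihood P(x | h) = 1/|h|."""
    post = {}
    for h, ext in hypotheses.items():
        like = Fraction(1, 1)
        for x in observations:
            like *= Fraction(1, len(ext)) if x in ext else Fraction(0)
        post[h] = prior[h] * like
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

p = posterior(prior, ["fido", "rex", "fido"])
# The smallest consistent concept wins: DOG > ANIMAL > THING.
```

The update mechanism is generic; all the conceptual work is hidden in the pre-given dictionary of hypotheses and their extensions.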
  15. 18 Jan

    That's right, <<e,t>,<<e,t>,t>>. A determiner with a restrictor and a scope, giving truth values. Moral of the story: we must assume some innate structure for this particular network. When that structure takes the shape of a particular formal semantics type, we learn better. /20
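The type <<e,t>,<<e,t>,t>> can be written out directly as code (a standard generalized-quantifier construction, not the network from the paper): a quantifier takes a restrictor predicate of type <e,t>, then a scope predicate of type <e,t>, and returns a truth value.

```python
# A toy model domain of entities (type e).
DOMAIN = {"a", "b", "c", "d"}

def every(restrictor):            # <e,t> -> (<e,t> -> t)
    def scope_taker(scope):
        return all(scope(x) for x in DOMAIN if restrictor(x))
    return scope_taker

def some(restrictor):             # same type, existential force
    def scope_taker(scope):
        return any(scope(x) for x in DOMAIN if restrictor(x))
    return scope_taker

cats = {"a", "b"}
sleepers = {"a", "b", "c"}

# "Every cat sleeps": restrictor picks out cats, scope tests sleeping.
every_cat_sleeps = every(lambda x: x in cats)(lambda x: x in sleepers)  # True
some_cat_sleeps = some(lambda x: x in cats)(lambda x: x in sleepers)    # True
```

Curried functions over predicates are exactly what the formal type denotes, which is what makes the architecture's shape so recognisable to a linguist.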
  16. 18 Jan

    Another striking example relates to learning the meaning of quantifiers in a neural model (, , yours truly and others, 2018). Below is the architecture that worked best in our experiments. Linguists, what does it look like to you? /19
  17. 18 Jan

    In fact, and colleagues have shown that BERT works rather well even without pre-training, implying that structure might be playing a huge role in the results (Kovaleva et al 2019). /18
  18. 18 Jan

    Take a trendy model like BERT. It makes huge assumptions in terms of architecture and underlying theories, favouring linguistic principles such as extreme contextualism and cognitive principles such as attention (vs incrementality). It also has a very particular shape. /17
  19. 18 Jan

    The point is that the model we choose commits us to some theoretical priors. Just as need innate behavioural responses (see Chomsky & Katz argument), NNs require a lot of hard-coding: architecture, shape of input, hyperparameters, etc. /16
  20. 18 Jan

    But perhaps we'd like to be truth-theoretic. Combining script-like regularities à la Fillmore, and formal semantics à la Montague, Rational Speech Act theory lets you sample worlds with Bayesian models (Goodman & Frank, 2016), solving the question: "where do worlds come from?" /15
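A minimal Rational Speech Act chain can be sketched in a few lines (a standard textbook-style reduction of Goodman & Frank's framework; the glasses/hat scenario is the usual illustrative example, with my own variable names): a literal listener reads truth conditions off the lexicon, a speaker soft-maximises informativity against that listener, and a pragmatic listener inverts the speaker with Bayes.

```python
import math

worlds = ["glasses-only", "glasses+hat"]
utterances = ["glasses", "hat"]
truth = {  # the formal-semantics part: which utterance is true in which world
    ("glasses", "glasses-only"): 1, ("glasses", "glasses+hat"): 1,
    ("hat", "glasses-only"): 0,     ("hat", "glasses+hat"): 1,
}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def L0(u):
    """Literal listener: uniform over the worlds where u is true."""
    return normalize({w: truth[(u, w)] for w in worlds})

def S1(w, alpha=1.0):
    """Speaker: soft-max of informativity, alpha * log L0(w | u)."""
    scores = {u: math.exp(alpha * math.log(L0(u)[w])) if L0(u)[w] > 0 else 0.0
              for u in utterances}
    return normalize(scores)

def L1(u):
    """Pragmatic listener: Bayesian inversion of the speaker."""
    return normalize({w: S1(w)[u] for w in worlds})

# Hearing "glasses", the pragmatic listener leans towards "glasses-only":
# a speaker who meant "glasses+hat" would have said "hat".
```

The `truth` table is the Montagovian ingredient; the probabilistic chain around it is the Bayesian sampling of worlds the tweet alludes to.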

