Aurelie Herbelot Retweeted
Universals of word order reflect optimization of grammars for efficient communication: a thread. Joint work with @mhahn29 and @jurafsky https://www.pnas.org/content/pnas/early/2020/01/16/1910923117.full.pdf pic.twitter.com/xDeqeg5efk
Aurelie Herbelot Retweeted
Why a mismatch between form and meaning? To keep semanticists in business. :)
And another puzzle: why should there be a mismatch between form and meaning anyway? We can be Chomskian about this: performance is degraded competence and produces the mismatch. Another explanation is that language did it on purpose. But that's a story for another day... /32
There is much more experimentation needed to get clear on the role of data vs 'innate' features in a learning algorithm. At present, we only know that certain hard-encoded mechanisms help, but without controlling for the exact contributions of the environment and the learner. /31
Contrast is an early feature of language acquisition (Clark, 1988). The fact that it might be encoded in a very simple structure in language is perhaps no coincidence. If this result holds, it would mean that this particular phenomenon *is* in the data in its rawest form. /30
Johann found that word vectors based on bigram probabilities encoded only moderate amounts of lexical semantics. *But* they were extremely good at auto-encoding bags of words, implying that they captured 'contrast': the notion that words that differ in form differ in meaning. /29
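To make the "contrast" point concrete, here is a rough sketch of the kind of representation at issue (my reconstruction, not Seltmann et al.'s actual code): each word is represented by the probabilities of the words that follow it in a toy corpus, and distinct word forms end up with distinguishable vectors even though those vectors say little about lexical meaning.

```python
from collections import Counter
import numpy as np

# Toy bigram-probability word vectors (illustrative reconstruction only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count word bigrams and turn each word's row into P(next word | word).
bigrams = Counter(zip(corpus, corpus[1:]))
vectors = np.zeros((len(vocab), len(vocab)))
for (w1, w2), c in bigrams.items():
    vectors[idx[w1], idx[w2]] = c
vectors = vectors / np.maximum(vectors.sum(axis=1, keepdims=True), 1e-9)

# "Contrast": different word forms get different vectors, whatever they mean.
for w in vocab:
    print(w, np.round(vectors[idx[w]], 2))
```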
Last year, student Johann Seltmann did a short project in our group to start elucidating "How much competence is there in performance?" (Seltmann et al, 2019) His intention was to understand what exactly was in one of the most basic structures of language: the word bigram. /28
State-of-the-art distributional models of meaning have to apply great amounts of distortion to the input data to perform well. This distortion is a *hard-encoded* part of the model which needs to be there to do anything useful with the input. So what is in the raw data itself? /27
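One well-known instance of such hard-encoded distortion (my example; the thread does not name a specific one) is PPMI reweighting, which replaces raw co-occurrence counts with clipped pointwise mutual information before the counts are used as word vectors:

```python
import numpy as np

# PPMI reweighting of a toy word-context co-occurrence matrix.
counts = np.array([[10., 0., 2.],
                   [ 8., 1., 1.],
                   [ 0., 5., 7.]])          # rows: words, columns: contexts

total = counts.sum()
p_wc = counts / total                        # joint probabilities
p_w = p_wc.sum(axis=1, keepdims=True)        # word marginals
p_c = p_wc.sum(axis=0, keepdims=True)        # context marginals

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))         # pointwise mutual information
ppmi = np.maximum(pmi, 0.0)                  # clip negatives: the "distortion"
print(ppmi)
```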
So if (language) data does not perfectly equal meaning, then meaning minus form corresponds to some non-empty element which, not being data, must be something else. Perhaps non-linguistic stuff, the contextual 'schemas' in Fillmore. And/or perhaps that mysterious innateness. /26
"... However, this is not the same thing as saying that the distributional structure of language (phonology, morphology, and at most a small amount of discourse structure) conforms in some one-to-one way with some independently discoverable structure of meaning." /25
Well, meaning is in the data, right? Harris (1954)! Distributional Hypothesis! But wait, what did Harris actually say? "To the extent that formal (distributional) structure can be discovered in discourse, it correlates in some way with the substance of what is being said... /24
PART 3 - THE DATA. Working backwards. If linguistic competence is in the data, ready to be captured by some generic pattern-finding algorithm, it doesn't need to be innate. If competence is not in the data, it needs to be somewhere else. So what is in the data? /23
Whichever way we look at it, we need something to learn anything, and that something must be somewhere. Should we be connectionists and need dedicated architectures? Should we be Bayesian and require some prior event space? What if the answer was in the data? /22
How about Bayesian approaches? We must now explain where priors come from, as well as the event space itself. Computing the probability of X requires having X in the first place, a space of possible concepts over which to distribute probability mass. Fodor is looming. /21
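A minimal worked example of the point, under my own toy assumptions: a Bayesian word learner can only weigh hypotheses that have already been enumerated, so the concept space itself has to come from somewhere before inference can start.

```python
# Toy Bayesian word learner (hypothetical example).
hypotheses = {               # the event space Fodor worries about
    "DOG":    {"fido", "rex"},
    "ANIMAL": {"fido", "rex", "tweety"},
    "THING":  {"fido", "rex", "tweety", "ball"},
}
prior = {h: 1 / len(hypotheses) for h in hypotheses}
observations = ["fido", "rex"]           # objects a speaker labelled "dog"

def likelihood(h, obs):
    # Size principle: smaller consistent concepts explain the data better.
    ext = hypotheses[h]
    if any(o not in ext for o in obs):
        return 0.0
    return (1 / len(ext)) ** len(obs)

unnorm = {h: prior[h] * likelihood(h, observations) for h in hypotheses}
total = sum(unnorm.values())
posterior = {h: p / total for h, p in unnorm.items()}
print(posterior)   # "DOG" wins, but only because DOG was in the space at all
```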
That's right, <<e,t>,<<e,t>,t>>. A determiner with a restrictor and a scope, giving truth values. Moral of the story: we must assume some innate structure for this particular network. When that structure takes the shape of a particular formal semantics type, we learn better. /20
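For readers outside formal semantics, the type <<e,t>,<<e,t>,t>> can be rendered as a higher-order function: a determiner takes a restrictor property and a scope property and returns a truth value. The sketch below is purely illustrative and is not the network described in the paper.

```python
from typing import Callable, Set

Entity = str
Et = Callable[[Entity], bool]        # type <e,t>: a property of entities

def make_every(domain: Set[Entity]) -> Callable[[Et], Callable[[Et], bool]]:
    """'every' as a generalized quantifier of type <<e,t>,<<e,t>,t>>."""
    def every(restrictor: Et) -> Callable[[Et], bool]:
        def scope_fn(scope: Et) -> bool:
            # True iff every entity satisfying the restrictor satisfies the scope.
            return all(scope(x) for x in domain if restrictor(x))
        return scope_fn
    return every

domain = {"a", "b", "c"}
dog = lambda x: x in {"a", "b"}
barks = lambda x: x in {"a", "b", "c"}
every = make_every(domain)
print(every(dog)(barks))   # True: every dog barks in this toy model
```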
Another striking example relates to learning the meaning of quantifiers in a neural model (@IonutSorodoc, @sandropezzelle, yours truly and others, 2018). Below is the architecture that worked best in our experiments. Linguists, what does it look like to you? /19 pic.twitter.com/sIOPns1h6Q
In fact, @annargrs and colleagues have shown that BERT works rather well even without pre-training, implying that structure might be playing a huge role in the results (Kovaleva et al 2019). /18 https://twitter.com/annargrs/status/1217887159530573824?s=20
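The comparison can be reproduced in spirit (this is my own minimal setup, not Kovaleva et al.'s experiments) by instantiating the same BERT architecture once with pretrained weights and once with random weights, using the HuggingFace transformers library:

```python
import torch
from transformers import BertConfig, BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
pretrained = BertModel.from_pretrained("bert-base-uncased")
random_init = BertModel(BertConfig())        # identical shape, no pretraining

inputs = tokenizer("Colourless green ideas sleep furiously.",
                   return_tensors="pt")
with torch.no_grad():
    h_pre = pretrained(**inputs).last_hidden_state
    h_rnd = random_init(**inputs).last_hidden_state
print(h_pre.shape, h_rnd.shape)   # same architecture; only the weights differ
```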
Take a trendy model like BERT. It makes huge assumptions in terms of architecture and underlying theories, favouring linguistic principles such as extreme contextualism and cognitive principles such as attention (vs incrementality). It also has a very particular shape. /17
The point is that the model we choose commits us to some theoretical priors. Just as #empiricists need innate behavioural responses (see Chomsky & Katz argument), NNs require a lot of hard-coding: architecture, shape of input, hyperparameters, etc. /16
But perhaps we'd like to be truth-theoretic. Combining script-like regularities à la Fillmore, and formal semantics à la Montague, Rational Speech Act theory lets you sample worlds with Bayesian models (Goodman & Frank, 2016), solving the question: "where do worlds come from?" /15
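For the curious, the RSA recursion is compact enough to sketch in a few lines (a toy scalar-implicature example of my own, not Goodman & Frank's implementation): a literal listener defined by the lexicon, a speaker who soft-maximises informativity, and a pragmatic listener who inverts the speaker.

```python
import numpy as np

# Toy Rational Speech Act model: the lexicon supplies the truth-conditional
# (Montague-style) part, the prior over worlds the script-like part.
worlds = ["some-but-not-all", "all"]
utterances = ["some", "all"]
lexicon = np.array([[1.0, 1.0],     # "some" is literally true in both worlds
                    [0.0, 1.0]])    # "all" is true only in the all-world
world_prior = np.array([0.5, 0.5])
alpha = 1.0                         # speaker rationality parameter

def normalize(m):
    return m / m.sum(axis=1, keepdims=True)

L0 = normalize(lexicon * world_prior)          # literal listener   P(w | u)
S1 = normalize(np.power(L0, alpha).T)          # pragmatic speaker  P(u | w)
L1 = normalize(S1.T * world_prior)             # pragmatic listener P(w | u)

print(dict(zip(worlds, L1[0])))  # hearing "some" now favours "some-but-not-all"
```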