given that there exist words with multiple distinct meanings, should “semantic distance” between words even satisfy the triangle inequality? e.g. intuitively “car” is close to “bus” (noun) and “bus” is close to “wait” (verb), but “car” isn’t close to “wait”
Quote Tweet
the word2vec manifold must be folded in on itself in weird ways by words with multiple meanings, i was i guess naively expecting it to be a lot "flatter"
afaict the notion of semantic distance word2vec is trying to capture must satisfy the triangle inequality. so it looks to me like there are words that are “artificially too close” because of these paths through ambiguous words. does that sound right?
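a minimal sketch of the "artificially too close" effect, with some loud assumptions: the 3-d vectors are invented for illustration (not real word2vec output), "bus" is modelled as one vector blending a hypothetical vehicle sense and a hypothetical restaurant sense, and distance is taken to be the angle between unit vectors, which does satisfy the triangle inequality. the names `unit`, `angular_distance`, `vehicle_sense`, and `restaurant_sense` are made up for this example.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def angular_distance(a, b):
    """Angle in radians between two unit vectors (a true metric on the sphere)."""
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

# Hypothetical sense directions, invented for illustration only.
vehicle_sense = unit([1.0, 0.0, 0.0])
restaurant_sense = unit([0.0, 1.0, 0.0])

# Toy word vectors (assumptions, not trained embeddings).
car = unit([0.9, 0.0, 0.3])
wait = unit([0.0, 0.9, 0.3])
# One vector for the ambiguous word: a blend of both senses.
bus = unit(vehicle_sense + restaurant_sense)

d_car_bus = angular_distance(car, bus)
d_bus_wait = angular_distance(bus, wait)
d_car_wait = angular_distance(car, wait)

# The triangle inequality caps how far apart "car" and "wait" can be.
bound = d_car_bus + d_bus_wait
print(f"d(car, bus)  = {d_car_bus:.3f}")
print(f"d(bus, wait) = {d_bus_wait:.3f}")
print(f"d(car, wait) = {d_car_wait:.3f} <= bound {bound:.3f}")
```

in this toy setup "car" and "wait" can never be further apart than the path through "bus" allows, even though nothing relates them directly. that's the pull the thread is describing.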
nice, here it is In The Literature: “semantically unrelated words that are similar to different senses of a word are pulled towards each other in the semantic space”
this circle of ideas around exploring the structure of "wordspace" is interesting to me among other things b/c everyone already has their own intuitive sense of "wordspace" and of semantic distance between words, etc. that's why you can play games like french toast w/ anyone
so any kind of technical attempt to machine-learn something about the structure of wordspace, like word2vec, can be compared and contrasted to people's existing conceptions of wordspace. that's good shit right there
I would think that words with distinct semantic meanings should be treated as effectively distinct words, for such a metric. Anything else is conflating syntactic and semantic levels.
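A rough sketch of what that could look like, assuming each sense gets its own token and its own vector. The sense tags (`bus|noun`, `bus|verb`), the toy 3-d vectors, and the helper names are all hypothetical, chosen only to illustrate the idea; real sense-tagged embeddings would need a sense-annotated corpus or a disambiguation step that is not shown here.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def angular_distance(a, b):
    """Angle in radians between two unit vectors."""
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

# Hypothetical sense-tagged vocabulary: one entry per sense, not per spelling.
vectors = {
    "car": unit([0.9, 0.0, 0.3]),
    "wait": unit([0.0, 0.9, 0.3]),
    "bus|noun": unit([1.0, 0.1, 0.0]),  # vehicle sense
    "bus|verb": unit([0.1, 1.0, 0.0]),  # clearing-tables sense
}

def dist(a, b):
    """Angular distance between two sense-tagged tokens."""
    return angular_distance(vectors[a], vectors[b])

# Each sense sits near its own neighbour and far from the other's.
for a, b in [("car", "bus|noun"), ("car", "bus|verb"),
             ("wait", "bus|verb"), ("wait", "bus|noun"),
             ("bus|noun", "bus|verb")]:
    print(f"d({a}, {b}) = {dist(a, b):.3f}")
```

With the senses split, the triangle inequality still holds for every triple, but there is no longer a single "bus" point that is close to both "car" and "wait", so the short path between them disappears.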