One of my worries about current trends in AI research is the reinforcement of the long-term cultural & linguistic dominance of the Anglosphere. It's not just that English is the exclusive language of research (which seems inevitable). NLP research and models focus on English.
-
-
Super interesting! I also hear from researchers who study emotional decision-making that the dubious "emotions-on-your-face" cues researchers are trying to teach algorithms to recognize are very culture-dependent.
-
I agree! And it’s not just the active “emotions on the face”; it’s also that words that describe emotions have a different meaning to every single person. Reductionistic oversimplification of behaviours (linguistic & actual) is the opposite of actual lived human behaviours.
End of conversation
New conversation -
-
-
In research I agree, but I guess the knowledge gained from training with the English language allows faster development of applications for other languages as well, or am I wrong?
-
Not really: English is a bit of an outlier in terms of linguistic structure in that it has impoverished morphology. We don't even have a proper future tense.
New conversation -
-
-
I wonder if AI models could help save dying languages?
-
Unfortunately, no. Current AI methods require translations of the same content in both languages to train a machine translation model. Most dying languages don't even have a script. How do we get millions of words & sentences in the dying languages?
New conversation -
-
-
Pre-trained models in languages other than English are indeed lacking; at Hugging Face we're rooting for multi-lingual and that's why we currently have 11 multi-lingual models out of 33: https://huggingface.co/transformers/multilingual.html …
-
I have friends doing research with pt-br, and the multi-lingual BERT yields worse results than ULMFiT or ELMo trained only on a pt-br Wikipedia dump. So far, we haven't seen the benefits of using BERT. We would need an expensive machine to pre-train it.
End of conversation
New conversation -
-
-
I worked on NLP/NLU in the Persian language; the lack of resources and datasets (even unlabeled) makes it so hard that we spent over 60% of our time creating tools to mine / gather data. Nowadays I see Mozilla is gathering some open datasets, which might make things better, but /
-
Without a commercial interest, it's hopeless.
End of conversation