I am Mi'kmaq, and as a non-native speaker, I have always wanted to learn the language fully. When I was young, my aunties always spoke in Mi'kmaq around me. I wish I had paid more attention. Like many Indigenous languages, its use is declining as older generations pass. 2/22
In 2016, the total Mi'kmaq population was only 168,000 people, and just 4% identified as native speakers. In an effort to preserve the Mi'kmaq language, I begin by using the most current AI tools available. In this study, I use OpenAI to explore word embeddings. 3/22
The phrase 'Newtigisg'g gisitu'a'ti'tis, newtigisg'g tu'a'ti'tis', which translates to 'If they could play ball all day, they would play ball all day', is converted from word form to numeric form and embedded into a vector. I use this phrase because it has intent, an object, and an action. 6/22
These vectors are massive; viewing one is like looking into the Matrix. Here is a tiny section of the full numeric field within the vector assigned to that word. It would take a few dozen scrolls to reach the end of the vector for the word in the screenshot. 7/22
I used the phrase 'I want to play ball'. I didn't want to complicate the search process, but I wanted to know whether unknown English words would reveal similarities. At this point, the AI only knows Mi'kmaq. 9/22
Mi'kmaq words are converted to numeric form and embedded into a vector space. Using cosine similarity, the search word or term is compared to each of the embedded Mi'kmaq words. 10/22
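For readers who want to see the comparison step, here is a minimal sketch of cosine similarity in plain Python. The function name and the three-number vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and this is not the exact code used in the study.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors, invented for illustration only.
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]   # points the same way, different length
orthogonal = [0.0, 1.0, 0.0]       # shares no direction with the query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

The score depends only on direction, not length, so two vectors pointing the same way score as highly similar even when their magnitudes differ.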
The results showed that, given just the words alone, the AI could associate other Mi'kmaq words by meaning: getmete'gl (win all/break all/destroy all), mila'sualatl (plays with/toy with), papit (amuse self), mila'suaqan (toy), mila'suatg (plays with/toy with). 11/22
It's interesting to point out that the word nuja'q (swimmer) is the furthest from the word getmete'gl (win all/break/destroy all) in this search. 12/22
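A search like this one can be sketched as scoring every embedded word against the query and sorting the scores. The word labels and numbers below are placeholders, not real Mi'kmaq embeddings, and `rank_by_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, embeddings):
    """Return (word, score) pairs sorted from most to least similar."""
    scored = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder words with invented 3-dimensional vectors (assumption:
# a real run would load the stored embedding for each Mi'kmaq word).
toy_embeddings = {
    "word_a": [0.9, 0.1, 0.0],
    "word_b": [0.5, 0.5, 0.0],
    "word_c": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(toy_query, toy_embeddings)
nearest = ranking[0]     # the closest match to the query
furthest = ranking[-1]   # the least related word in this search
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

The first entry in the sorted list is the nearest word and the last entry is the furthest, which is how a word like nuja'q can be identified as the most distant in a given search.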
Using the vector IDs for awanmila'sit (plays poorly) and tu'at (play baseball/play ball), I combine the two words to establish an updated vector association that adds new context to the search. An action_vector and an object_vector are defined for the math. 13/22
Using awanmila'sit + tu'at, we establish updated, higher similarity values among the words sharing the vector space. The shapes that could be imagined from the embedded numeric data would be amazing to view. 14/22
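One way to read the "+" here is element-wise addition of the two word vectors, with the sum used as the new search query. That is an assumption; the thread does not say whether the vectors were summed, averaged, or combined some other way. The numbers below are invented stand-ins, not the real embeddings for awanmila'sit or tu'at.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]

# Invented stand-ins: action_vector plays the role of awanmila'sit,
# object_vector plays the role of tu'at.
action_vector = [1.0, 0.0, 0.0]
object_vector = [0.0, 1.0, 0.0]
combined_vector = add_vectors(action_vector, object_vector)

# A hypothetical candidate word that carries both the action and the object.
candidate = [1.0, 1.0, 0.0]

action_only_score = cosine_similarity(action_vector, candidate)
combined_score = cosine_similarity(combined_vector, candidate)
print(f"action only: {action_only_score:.3f}")
print(f"combined:    {combined_score:.3f}")
```

In this toy case the combined query scores higher against the candidate than the action vector alone, which is the kind of increase in similarity that adding context to a search can produce.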
The English words are converted to numeric form and embedded in a vector space. We enter the phrase 'I want to play ball', and the phrase is then embedded. 16/22
Using that phrase, this is the ranking of the most closely associated words in the vector space. It bears a striking resemblance to the Mi'kmaq word list. 17/22