This study explores how similar words in Mi'kmaq compare to each other when converted to a numeric form. Through the use of OpenAI and Word Embedding, I explored the word 'Play' and the twenty-one Mi'kmaq words that define it in the Mi'kmaq language.
#OpenAI #Indigenous
1/22
I am Mi'kmaq, and as a non-native speaker, I have always wanted to learn it fully. When I was young, my aunties always spoke in Mi'kmaq around me. I wish I had paid more attention.
Like many Indigenous languages, its use is declining as older generations pass.
2/22
In 2016, the total Mi'kmaq population was 168,000 people, and only 4% identified as native speakers.
In an effort to preserve the Mi'kmaq language, I begin by using the most current AI tools available. In this study, I use OpenAI to explore word embedding.
3/22
OpenAI, the same company behind ChatGPT and DALL·E 2, is the solution of choice for this study. Maths at the bottom.
4/22
beta.openai.com/docs/guides/em
The twenty-one Mi'kmaq words for the word 'Play' are read from a CSV file.
5/22
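A minimal sketch of this step, assuming a one-column CSV with a header row. The column name `word`, the in-memory sample, and the helper `read_words` are illustrative assumptions, not the layout of the actual file.

```python
import csv
import io

# Stand-in for the real CSV file; the column name "word" is an assumption.
SAMPLE_CSV = """word
getmete'gl
mila'sualatl
papit
mila'suaqan
mila'suatg
"""

def read_words(handle):
    """Return the non-empty entries of the 'word' column from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [row["word"].strip() for row in reader if row["word"].strip()]

# Only five of the twenty-one words are shown here, to keep the sample short.
words = read_words(io.StringIO(SAMPLE_CSV))
print(len(words), "words loaded")
```

With a file on disk, the same helper could be given `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.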
The phrase 'Newtigisg'g gisitu'a'ti'tis, newtigisg'g tu'a'ti'tis', which translates to 'If they could play ball all day, they would play ball all day', is converted from word form to numeric form and embedded into a vector.
I use this phrase because it has intent, an object, and an action.
6/22
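A rough sketch of how a phrase could be turned into a vector, assuming the current `openai` Python package (v1 or later) and its `client.embeddings.create` call. The helper name `embed_texts` and the default model name are assumptions; the thread does not say which model or library version was used.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Embed each text and return one list of floats per input, in input order.

    `client` is assumed to be an `openai.OpenAI()` instance (openai-python v1+).
    The default model name is an assumption; the thread does not name one.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of the input it belongs to,
    # so sorting by it keeps vectors aligned with the input texts.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With a real client this would be called as `embed_texts(OpenAI(), [phrase])` after `from openai import OpenAI`, with the API key supplied through the `OPENAI_API_KEY` environment variable.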
These vectors are massive; viewing one is like looking into the Matrix. Here is a tiny section of the entire numeric field within the vector assigned to that word.
It would take a few dozen scrolls to get to the bottom of the word in the screenshot.
7/22
All of the Mi'kmaq words in the list are embedded and printed with the vector header information.
8/22
I used the phrase 'I want to play ball'. I didn't want to complicate the search process, but I wanted to know if unknown English words would reveal similarities. At this point, the AI only knows Mi'kmaq.
9/22
Mi'kmaq words are converted to a numerical form and embedded into a vector space.
Using Cosine Similarity, the word or term is compared to the embedded Mi'kmaq words.
10/22
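The comparison described here can be sketched as follows. Cosine similarity is the dot product of two vectors divided by the product of their lengths. The vectors and the labels `word_a`, `word_b`, `word_c` are made-up three-dimensional placeholders, not real embeddings, and `rank_by_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, embedded):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in embedded.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up 3-dimensional vectors standing in for the embedded word list.
toy_embeddings = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [0.6, 0.8, 0.0],
    "word_c": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.1, 0.0]  # stand-in for the embedded search phrase

ranking = rank_by_similarity(query_vector, toy_embeddings)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

The first entry of `ranking` is the closest word and the last entry is the furthest.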
The results showed that, given just the words alone, the AI could associate other Mi'kmaq words by meaning.
getmete'gl (win all/break all/destroy all)
mila'sualatl (plays with/toy with)
papit (amuse self)
mila'suaqan (toy)
mila'suatg (plays with/toy with)
11/22
It's interesting to point out that the word nuja'q (swimmer) is the furthest from the word getmete'gl (win all/break all/destroy all) in this search.
12/22
Using the vector ID for awanmila'sit (plays poorly) and tu'at (play baseball/play ball), I use the two words to establish an updated vector association when adding new context to the search.
An action_vector and object_vector are defined for maths.
13/22
Using awanmila'sit + tu'at, we establish an updated and greater similarity value in the shared vector space.
The shapes that could be imagined using the embedded number data would be amazing to view.
14/22
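One way to read the awanmila'sit + tu'at step is element-wise addition of the two vectors, followed by the same cosine comparison. That reading is an assumption, since the thread does not show the exact arithmetic. The numbers below are placeholders rather than real embeddings, and `candidate_vector` is a hypothetical entry from the word list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

# Placeholder values only; real embeddings have far more dimensions.
action_vector = [1.0, 0.0, 0.0]   # stands in for awanmila'sit (plays poorly)
object_vector = [0.0, 1.0, 0.0]   # stands in for tu'at (play baseball/play ball)
combined_vector = add_vectors(action_vector, object_vector)

candidate_vector = [0.5, 0.5, 0.1]  # hypothetical word that shares both senses

score_action_only = cosine_similarity(action_vector, candidate_vector)
score_combined = cosine_similarity(combined_vector, candidate_vector)
print(f"action only: {score_action_only:.3f}, combined: {score_combined:.3f}")
```

In this toy case the combined vector scores higher against the candidate than the action vector alone, which is the kind of increase described above.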
The English words associated with the meaning of each of the twenty-one Mi'kmaq words are now introduced.
15/22
The English words are converted to a numeric form and given a vector space.
We enter the phrase 'I want to play ball', and the phrase is then embedded.
16/22
Using that phrase, this is the order of closest association within the vector space. It bears a striking resemblance to the Mi'kmaq word list.
17/22
The sorted list of English words and their association order within the vector space.
19/22
Thank you for getting to the end. I have only just begun. This is just from one word. I'm excited to see what else exists behind the veil of language.
I've uploaded the Jupyter and Python files to my GitHub.
github.com/AdieLaine/Mikm
22/22
