Large language models (LLMs) can be defined as "deep learning algorithms that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive datasets." 2/22
Some are celebrating these models as great productivity hacks that will revolutionize how we work and almost "automate" the creation of texts, code, documents, marketing campaigns, and whatever else the imagination comes up with. 3/22
Others are not so enthusiastic. Some feel that these tools are actually endangering the value or the existence of their own human work. Others point out that it will be more and more difficult to distinguish good quality work from persuasive nonsense spit out by AI tools. 4/22
Privacy professionals are wary and can already foresee risks. To begin with, these tools are trained using publicly available data, which contain massive amounts of personal data. 5/22
If you have ever posted anything in English on the internet, it is possible that your data was used to train these models. 6/22
A team of researchers demonstrated in their paper that "an adversary can perform a training data extraction attack to recover individual training examples by querying the language model." 7/22
Even if the information was "publicly available," this is a breach of contextual integrity, a concept central to privacy. When you post something personal online, it is necessarily connected to the context in which you chose to post it. 9/22
You would not want your inferred address, or information about a life milestone that you posted on a social network, to be shared with others by a chatbot, e.g., when it is prompted to create a marketing campaign or to offer customer support. 10/22
"told me Mat has a wife and two young daughters (correct, apart from the names), and lives in San Francisco (correct). It also told me it wasn’t sure if Mat has a dog: '[From] what we can see on social media, it doesn't appear that Mat Honan has any pets. 13/22
He has tweeted about his love of dogs in the past, but he doesn't seem to have any of his own.' (Incorrect.)" In this case, some of the information was correct, but some was pure fantasy, probably some form of association. 14/22
In a similar sense, if someone constantly writes about crime or murder news, that person's name might become associated with criminal offenses, causing reputational harm. 15/22
If people are going to resort to these tools to obtain information, is it acceptable that they output fake information, including fantasized personal details, causing privacy and reputational harm? 16/22
GDPR's Article 5 sets out principles relating to the processing of personal data. Among them are purpose limitation, data minimization, and accuracy. 18/22
I do not even need to go further and analyze specific articles to point out that it is unclear to me how LLM-based tools can be GDPR compliant. 19/22
I never gave my data to LLM-based chatbots; I never consented to my data being scraped from the internet, condensed in a context-less way, sometimes fantasized or distorted, and then potentially output by a chatbot whenever anyone around the world gives it a prompt. 20/22
These models are constantly being trained, and the more they are used, the more "advanced" their next versions will be - including in terms of potential privacy harms - if we do not do anything about it. 21/22