Also, the majority of the data used to train Copilot (which is built on GPT-3) is not code. The biggest part of it was the GPT-3 portion that was trained on public internet content (which could also be licensed). So you also can't state that "everything" (+)
Replying to @_odelucca @NoraDotCodes and others
that GitHub used to train Copilot was potentially licensed, since the majority of the training was done in the transformer (GPT-3). So the "code" part is just the tip of the iceberg, used only to specialize the GPT-3 model to write code (+)
Replying to @_odelucca @NoraDotCodes and others
Anyway, we need to have a LOT of debate on this subject, not only for Copilot but for ANY licensed content. For example: if we train an AI to write songs, what should we do, since we used licensed music to train it? (+)
Replying to @_odelucca @NoraDotCodes and others
If we train an AI to write movie scripts, what should we do, since we used licensed movie scripts to train it? If we train an AI to take pictures, what should we do, since we used professional pictures to train it? (+)
Replying to @_odelucca @NoraDotCodes and others
This is a WHOLE new area. We can't simplify it as "OMG! COPILOT IS WRITING THE SAME LINE AS WE CAN SEE HERE IN THIS SOFTWARE". Come on, it is way more complicated than that!
Actually, we can and we should simplify it, because no matter how complex the AI system is, it boils down to the same end result. It is designed to provide useful recommendations, not to avoid copyright infringement; that's the problem.
You're mixing concepts, maybe because (as you've said) you're not an expert in machine learning. Recommendations are different from predictions. What Copilot (and also GPT-3) does is predict your expectations based on previously stated context. (+)
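To make the "prediction" point concrete, here is a deliberately oversimplified sketch: a toy bigram model that predicts the most likely next token from the previous one. Copilot and GPT-3 are transformer networks trained on vastly more data, so nothing here reflects their actual implementation; the sketch only illustrates what "predicting based on a previously stated context" means.

```python
from collections import Counter, defaultdict

# Illustrative sketch only: a toy bigram "language model".
# Real code models are transformers; this just shows the idea of
# next-token prediction from context.

def train(tokens):
    """Count, for each token, which tokens follow it in the training data."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    """Return the most frequently observed next token, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

# Tiny "training corpus": one tokenized line of code.
corpus = "def add ( a , b ) : return a + b".split()
model = train(corpus)
print(predict(model, "return"))  # predicts "a", because "a" followed "return" in training
```

A model like this is not "recommending" anything it evaluated for quality or licensing; it is mechanically continuing the statistical patterns of whatever it was trained on.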
Replying to @_odelucca @awakecoding and others
As I've said, if we go down this road, we can cancel any machine learning initiative, since every ML initiative uses publicly available data without considering whether it is licensed or not.
I don't mind experiments and research; this type of technology is bound to eventually become the norm. The thing is that OSS in general is going to become the first victim of this shift, since you can't easily get access to private source code. I disagree with forcing it onto people.
But why does using publicly available source code to train an AI make it a victim? You're not copying it. The code is already publicly available. It is basically the same thing as a developer reading the code to learn from it.
1. Publicly available does not mean free to copy.
2. Yes, they are copying it, just as much as if they put it in a zip file; as @mitsuhiko has demonstrated, it can parrot pieces of code verbatim.
3. Apply the same logic to Disney movies and see if you'd get sued. You would.
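The verbatim-parroting point can be illustrated in a toy setting: a model that has seen a sequence only once will, under greedy decoding, reproduce it token for token. This is only an analogy; real models are transformers, and the snippet below is a made-up stand-in, but the memorization effect @mitsuhiko demonstrated is of this general character, since rare training sequences leave the model only one likely continuation at each step.

```python
from collections import Counter, defaultdict

# Toy demonstration of memorization: a bigram model trained on a single
# snippet regurgitates that snippet verbatim under greedy decoding.
# The snippet is an invented stand-in, not actual Copilot output.

def train(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_len):
    """Greedy decoding: always pick the most frequent next token."""
    out = [start]
    for _ in range(max_len):
        options = counts.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

snippet = "float q = number * 0.5f ;"  # hypothetical "memorized" line
model = train(snippet.split())
print(generate(model, "float", 10))  # emits the training snippet verbatim
```

Whether the output counts as a "copy" in the legal sense is exactly what the thread is debating; the sketch only shows that verbatim reproduction is a natural failure mode of prediction, not an exotic one.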
Replying to @NoraDotCodes @awakecoding and others
So, what is the difference between a human reading licensed public OSS code to improve their skills and writing functions that are basically the same as those they saw in a given library, and a machine using that data to learn from it and spitting out the same? (honest question)
Replying to @_odelucca @NoraDotCodes and others
IMHO, that's the way it has always happened. Humans learn from other humans. This is not copying.