Conversation

Really getting tired of the “but they trained it on our code” chat. GitHub has hosted your code for free, forever. Enjoying GitHub Actions? How about GHCR? When your license your code as MIT, and other permissive licenses, you free it up for anybody to make money from it.
23
142
Replying to
I'm honestly curious about the backlash. "Our code", so much FOSS, is used to make money for other people. That's why OSS is commercially viable. If this is a problem, then write a new license which prohibits commercial AI consumption of the code licensed by it and adopt it. 🤷🏼‍♀️
1
2
Replying to and
Open source licenses almost all require attribution. In the case of copyleft licenses they require not adding additional restrictions. Commercial use is allowed, but the licenses still have to be respected. If you use CoPilot you're creating work derived from other projects.
1
16
Replying to and
This is interesting, the attribution requirement must have an effective trigger. Is there a difference between human and machine self teaching from GitHub? How many similar lines of code from a studied project in original work obligates the author to the original license?
2
Replying to and
The machine learning model itself is a derivative work. It's not a person, isn't actually an intelligence and hasn't actually learned how to write code. It's a language model deriving directly from what it has seen before. I don't see how it'd bypass licensing of the code.
1
7
It doesn't necessarily matter if it copies one project in particular. In many cases there is 1 company with ownership over a vast amount of code written by their employees (such as Google). Individuals could assign copyrights to an org or take collective action even without it.
1
1
I don't think it needs to be explicitly stated that a machine learning model created from code is a derivative work but licenses could start explicitly stating it going forward if the licenses aren't being respected with how they previously existed. It's a simple thing to state.
1
1
And worth noting that open source licenses do permit this, but it would be very inconvenient to respect the permissive licenses with all the attributions and the model would need to leave out copyleft code unless you're writing code that's meant to be under that license.
1
Instead they're just hoping it's would be considered fair use on a case-by-case basis by the courts and are depending on it mostly not coming up in practice because most people don't care enough to file a lawsuit especially when it's diluted so much. Hard to prove / get anything.
1
2
Violating the licenses is unethical even if they don't think it's going to be a pragmatic legal issue for them in the courts though. They COULD have only used non-copyleft code (which is likely the majority of code on GitHub) and used compression for efficient attribution.
1