The issue with doing this using code under permissive MIT, BSD, etc. licenses is that you would have to include a ridiculous number of license notices to properly respect attribution. For GPL and other copyleft licenses, you can't distribute code with additional restrictions.
GitHub is playing fast and loose with the rules. Open source licenses are as enforceable as proprietary software licenses. There's no reason that it would be legal to do this with open source licensed code without respecting the licenses but not code under proprietary licenses.
Their choice of only including code under what they consider valid open source licenses is a strong indication that they know the licenses still apply. They just don't consider it likely to massively blow back on them, especially since it's mostly not their software infringing.
Direct copying is in no way required for it to be infringement. If you feed 10 GPL code repositories into a machine learning model to generate code, the code produced by that is clearly under the GPL. There's no question at that small scale.
Whether it's a derivative work is what matters, not direct copying. What Microsoft / GitHub is counting on is that CoPilot is diluting the origin of the code enough that it usually won't be feasible to prove that there is infringement on code owned by a specific entity.
Also, CoPilot is just a tool. GitHub's infringement would be distributing the machine learning model, etc. The infringement in terms of shipping the code generated from it would be by the people using CoPilot. That's their legal problem, not GitHub's legal problem.
I think dilution of the sources is essentially going to be the legal argument for it. Maybe that can be worked around with a class action suit with a ton of open source developers participating, where it's far harder to deny that the code was substantially based on their work as a whole.
There's no way I could personally succeed in filing a lawsuit against someone for using CoPilot, for example, because the amount that their code is based on mine is minuscule. Direct copying would just be an easier way to prove it, but it's not what is meant by it being derivative.
Look at en.wikipedia.org/wiki/Clean_roo for the approach many companies and open source projects have taken to implementing compatible replacements for software, hardware, etc. You can't just use GPLv2 code as a reference to write proprietary code, etc. even without copying any directly.
