Conversation

Direct copying is not required for it to be copyright infringement. GitHub knows they aren't respecting the licenses and are counting on courts deciding that it's fair use to train their machine learning models based on the code. It's also really their users infringing, not them.
Open source licenses permit doing this, but the code produced by the tools needs to respect the licenses. That means the tool needs to be able to generate attribution information based on everything the output was derived from, and copyleft licenses will constrain the licensing used.
The issue with doing this using code under permissive licenses (MIT, BSD, etc.) is that you would have to include a ridiculous number of license notices to properly respect attribution. For GPL and other copyleft licenses, you can't distribute code with additional restrictions.
GitHub is playing fast and loose with the rules. Open source licenses are as enforceable as proprietary software licenses. There's no reason it would be legal to do this with open source licensed code without respecting the licenses but not with code under proprietary licenses.
Their choice of only including code under what they consider valid open source licenses is a strong indication that they know the licenses still apply. They just don't consider it likely to massively blow back on them, especially since it's mostly not their software infringing.
Direct copying is in no way required for it to be infringement. If you feed 10 GPL code repositories into a machine learning model to generate code, the code produced by that is clearly licensed under the GPL. There's no question at that small scale.
Whether it's a derivative work is what matters, not direct copying. What Microsoft / GitHub is counting on is that Copilot dilutes the origin of the code enough that it usually won't be feasible to prove infringement of code owned by a specific entity.
I think dilution of the sources is essentially going to be the legal argument for it. Maybe that can be worked around with a class action suit with a ton of open source developers participating, where it's far harder to deny that the generated code was substantially based on their work as a whole.