Conversation

Really getting tired of the “but they trained it on our code” chat. GitHub has hosted your code for free, forever. Enjoying GitHub Actions? How about GHCR? When you license your code under MIT or another permissive license, you free it up for anybody to make money from it.
MIT still requires attribution: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
Direct copying is not required for it to be copyright infringement. GitHub knows they aren't respecting the licenses and are counting on courts deciding that it's fair use to train their machine learning models based on the code. It's also really their users infringing, not them.
Open source licenses permit doing this, but the code produced by the tools needs to respect the licenses. That means the tool needs to be able to generate attribution information based on everything its output was derived from, and copyleft licenses will constrain the licensing used.
The issue with doing this using code under permissive MIT, BSD, etc. licenses is that you would have to include a ridiculous number of license notices to properly respect attribution. For GPL and other copyleft licenses, you can't distribute code with additional restrictions.
GitHub is playing fast and loose with the rules. Open source licenses are as enforceable as proprietary software licenses. There's no reason that it would be legal to do this with open source licensed code without respecting the licenses but not code under proprietary licenses.
Their choice of only including code under what they consider valid open source licenses is a strong indication they know the licenses still apply. They just don't consider it something likely to massively blow back on them, especially since it's mostly not their software infringing.
Whether it's a derivative work is what matters, not direct copying. What Microsoft / GitHub is counting on is that Copilot dilutes the origin of the code enough that it usually won't be feasible to prove infringement on code owned by a specific entity.
Also, Copilot is just a tool. GitHub's infringement would be distributing the machine learning model, etc. The infringement in terms of shipping the code generated from it would be by the people using Copilot. That's their legal problem, not GitHub's legal problem.