We spent the last year working closely with OpenAI to build GitHub Copilot. We've been using it internally for months, and can't wait for you to try it out; it's like a piece of the future teleported back to 2021.
I'm also curious about the licensing implications, both for the data corpus used and the resulting code proposals.
I did, but that's quite rudimentary. The code it produces is ... derived ... from publicly available code? Possibly suggesting patterns from code it has seen that came with implied patent/IP grants?
I like it a lot, and clearly IANAL, but I'd like to get a bit deeper into this.
Additionally, how does it handle licenses that don’t allow derivative use without credit but are still open source?
There definitely needs to be an FAQ answer specifically about licensing; the FOSS-Legal community is already asking...
I find it somewhat depressing that the first reaction of senior engineers to new and exciting tech these days is "Oh, but how about legal?" It shows we all have PTSD ;-)
In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler.
We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!
Thanks for the clarification. Can’t wait to use it in that case.
Definitely an interesting policy discussion! Here's a post from folks earlier in the year.
In hindsight, I should have done that masters in international intellectual property law.
Thanks Nat! There's clearly more to be said though. For example, GitHub is requesting information on solutions that appear copied from a training source, so perhaps it has concerns over the extent to which training leads to copyright exhaustion.
One thing we are working on is reducing accidental recitation of training data – already extremely rare today – and we have written a paper about what we are doing to prevent that entirely, linked here:
I guess until people stop making it public...including on GitHub
Might be worth thinking about a robots.txt equivalent for this.
Mmmmm I am skeptical about (1). Copilot uses the entirety of the copyrighted works, and for a commercial purpose. I'm not aware of any case law that has tested this position either.
My take: MIT licensed works unambiguously qualify. APL could, but you have unmet obligations.
The use of the blanket term "public data" only increases my concern that GitHub isn't treating intellectual property with the care it should.