0:00
GitHub Copilot: your AI pair programmer, or is it?
0:03
Copilot is more than just an autocomplete plugin. GitHub Copilot is powered by Codex, an AI system from OpenAI.
0:09
It's a way to use AI to generate code snippets, tests, and other boilerplate.
0:14
And if you don't like its suggestions, don't worry, because it generates a whole list of alternatives. Basically, you just install an extension
0:21
in your favorite editor and you're good to go. It works across a ton of different
0:24
languages and frameworks, and it works especially well with the more popular languages
0:29
like JavaScript, Java, or Python. Easy peasy. And who needs programmers anyway? I'm just curious: have you used Copilot before? I'd love to hear about your experience,
0:37
so drop it in the comments. I do read them. So how does GitHub Copilot work? Well,
0:41
you drop a comment describing what you want to happen, and it offers you suggestions based
0:45
on billions of lines of public code that were used to train its AI. It's so easy that in a
0:51
matter of minutes, it created an application that I was able to deploy to the App Store, and it's
0:55
already making me a ton of money. Looks like I'll be out of a job soon, but at least I'm on my way
0:59
to becoming a millionaire. You probably shouldn't even think of becoming a programmer because you're
1:03
just too late. Okay, so I am totally kidding there. It did not build me an application like
1:08
that, and I'm not really worried about artificial intelligence destroying the future of software
1:13
development. But I know a lot of people out there are, because I get asked about it a lot. What does concern
1:17
me is a massive wave of lawsuits. But before we get into that, we have to talk about two important
1:22
concepts that you must understand before everything else makes sense. First, there are a
1:27
ton of different ways companies protect trade secrets and code. It can be as simple as keeping
1:31
all source code private, but valuable information is usually protected through trade secrets,
1:37
copyrights, and patents. I'm not an attorney, so this is not meant to be legal advice
1:41
or a comprehensive description of how this stuff works. I'm going to simplify it for the sake of
1:47
discussion. Basically, copyright protects the actual text that you write, the images, the content that
1:52
you create, but it does not protect the underlying ideas or concepts. And it's fairly easy to get
1:57
copyright protection in the United States. Patents are much more involved: they can take years to
2:02
secure and usually cost thousands of dollars. But patents protect the new and unique ideas:
2:07
the systems, processes, and stuff like that. Copyright, on the other hand, only protects
2:12
the actual source code you write as it is on the page. The second thing that you need to understand
2:16
is licensing. If you want to use software that has been created by someone else, then you're
2:20
going to have to ask permission, and that permission is called a license. For
2:24
example, if you pay for software, you don't actually own the software; you're just paying
2:28
for a license to use it. And that license is going to say things like what you can and can't do with
2:33
it, for example, how many users can access it and how many computers it can be installed on. Now some
2:38
companies tightly control their code and others decide that they're just going to share it as
2:42
open source. But not all open source projects are the same. Some licenses will let you do pretty
2:47
much whatever you want. You can copy it, build off of it, even make money with it,
2:53
no strings attached. But other license types will let you use their code inside your code for free
2:57
as long as your software product is free. If you end up making software for profit,
3:02
then you're going to have to pay them for a commercial license. If this is helpful,
3:06
smack that like button. But there are some licenses that will really bite you in the butt if you
3:10
aren't careful. For example, some open source licenses may say that if you use their code
3:14
in your project, you're then going to have to make your software open source. And companies
3:19
don't want to do this. They don't want to include this kind of code, because they can spend millions
3:23
of dollars developing software, and they don't want some competitor to walk in and be able
3:28
to use their software for free because of some open source requirement. This last type of license
3:33
is really important to what we're about to discuss. You see, GitHub Copilot was trained on
3:38
billions of lines of public code stored in public repositories. If you aren't familiar with the term
3:44
repository, that's just another name for the place where all of the code for a software project is stored.
3:49
Anyway, each of these projects likely has a license file where the owner can specify
3:53
the terms for using the code. On GitHub, you'll find a ton of different licenses,
3:58
including the ones that can bite you. This is what really worries me about GitHub Copilot.
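To make "the licenses that can bite you" a bit more concrete, here's a simplified sketch in Python. It's my own illustration, not anything GitHub or Copilot provides. It assumes each project's license is identified by an SPDX identifier such as `MIT` or `GPL-3.0-only`, and it uses a small, hand-picked, deliberately incomplete table (the names `LICENSE_CATEGORIES`, `classify`, and `needs_review` are hypothetical) that sorts a few common licenses into permissive (use it pretty freely), weak copyleft (narrower obligations, roughly limited to the licensed files or library), and strong copyleft (the kind that can require you to release your own source under the same terms when you distribute it). Anything not in the table gets flagged for a human to read.

```python
"""Simplified license triage sketch (illustrative only)."""

# Hypothetical, hand-picked, deliberately incomplete table mapping a few
# common SPDX license identifiers to a rough category.
LICENSE_CATEGORIES: dict[str, str] = {
    "MIT": "permissive",
    "Apache-2.0": "permissive",
    "BSD-2-Clause": "permissive",
    "BSD-3-Clause": "permissive",
    "MPL-2.0": "weak copyleft",
    "LGPL-2.1-only": "weak copyleft",
    "LGPL-3.0-only": "weak copyleft",
    "GPL-2.0-only": "strong copyleft",
    "GPL-3.0-only": "strong copyleft",
    "AGPL-3.0-only": "strong copyleft",
}


def classify(spdx_id: str) -> str:
    """Return the rough category for an SPDX identifier, or 'unknown'."""
    return LICENSE_CATEGORIES.get(spdx_id, "unknown")


def needs_review(spdx_ids: list[str]) -> list[str]:
    """Return every identifier that is not plainly permissive, in input order."""
    return [spdx_id for spdx_id in spdx_ids if classify(spdx_id) != "permissive"]


if __name__ == "__main__":
    # Hypothetical licenses found in the projects a codebase draws from.
    found = ["MIT", "GPL-3.0-only", "Apache-2.0", "SSPL-1.0"]
    for spdx_id in found:
        print(f"{spdx_id:>14}: {classify(spdx_id)}")
    print("Needs review:", needs_review(found))
```

The catch is that a Copilot suggestion doesn't come with a list of the projects it was derived from, so there's nothing to feed into a check like this in the first place.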
4:03
And I did a quick search to get a vibe of how other devs feel about this, and the concern is
4:09
not just mine. You see, because Copilot is trained on data from a ton of different sources,
4:14
you could pretty easily argue that anything Copilot creates is a derivative of those
4:20
projects, in which case you must follow the terms and licensing for those projects. But how do you
4:25
know where the code came from? You just don't. It offers you code suggestions, and you don't
4:31
really have a reliable way to trace the origins. I mean, you could take that snippet and you could
4:36
drop it into Google and see if anything similar shows up. But let's see what GitHub has to say
4:40
about this. On their website, there's a section titled "Does GitHub Copilot recite code from the
4:46
training set?" You could pause and read the whole paragraph because it's very interesting, but the
4:51
short answer is no. It claims the code is uniquely generated, except for about 0.1% of the time,
4:59
when the snippet is verbatim. Now, that might seem like a low risk. But let's be real here:
5:04
depending on where that 0.1% comes from, you could be talking about a multimillion-dollar lawsuit.
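A quick back-of-the-envelope sketch shows why 0.1% isn't as small as it sounds over a whole codebase. It rests on a simplifying assumption of mine, not a claim GitHub makes: that each accepted suggestion independently has a 0.1% chance of containing a verbatim snippet. Under that assumption, the chance that at least one of `n` accepted suggestions is verbatim is `1 - (1 - p)**n` (the complement of "none of them are"). The function name `prob_any_verbatim` is hypothetical.

```python
def prob_any_verbatim(p: float, n: int) -> float:
    """Probability that at least one of n suggestions is verbatim.

    Assumes each suggestion is verbatim independently with probability p,
    which is a simplifying assumption for illustration only.
    """
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    # Complement rule: P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    rate = 0.001  # the ~0.1% figure, treated here as a per-suggestion rate
    for n in (10, 100, 1_000, 5_000):
        print(f"{n:>5} accepted suggestions: {prob_any_verbatim(rate, n):.1%}")
```

Under that assumption, a team that accepts a thousand suggestions is already at roughly a 63% chance of at least one verbatim snippet (1 - 0.999^1000 ≈ 0.632). Real suggestions aren't independent coin flips, so treat this as intuition rather than a measurement.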
5:10
Say some lawyer shows up at your desk asking you why you stole code from a competitor. What's your
5:15
answer? "It wasn't me, it was Copilot"? Well, you can't prove that, but your name is on
5:21
the git blame for that line of code. So you're in deep doo-doo, unless you subscribe, in which case I'll
5:26
forgive you no matter how badly you write code. Or what about this other section that talks about
5:31
personal data? Now, personal data should be pretty easy to spot in a suggestion. But as a GitHub
5:36
repository owner, I would be even more concerned about what is being stored in my repo. Not that
5:42
you should be storing any kind of personal information there anyway; that's just bad
5:46
practice. Apparently, some companies are concerned enough by Copilot that they're pulling their
5:51
projects from GitHub. But is that the right answer? For starters, I would personally encourage you to
5:56
keep any important or sensitive repos private anyway, instead of making them public. But I
6:02
think this problem is bigger than GitHub and even Copilot. Now, I'm a realist, and like it or not,
6:08
artificial intelligence is going to play a big part in the world we live in. And I think that AI
6:12
is going to be a major disruptor, challenging the way our society handles patents
6:17
and licensing. I mean, it could completely change the way that we protect trade secrets, how we
6:22
write and interact with our code, and how we handle attribution. Sure, it's probably fun to
6:26
experiment with Copilot for personal projects that you don't plan to commercialize. But until these
6:31
intellectual property laws change and catch up with this use of artificial intelligence, there's just no way in
6:36
heck that I would use it for writing code for my employer. It's just not worth the liability. If I
6:43
write code, I know what I've written, and which libraries I have leveraged, and I will be able to
6:49
explain that. I will know the exact origin of that code. And if you think that this is bad,
6:54
then you should watch this video on how one company is using AI to destroy the job interview
6:59
process as we know it. Laters.