0:00
So in replet we have a configuration
0:02
file called.relet and we don't want the
0:04
AI to edit it because it can easily
0:06
break the system. So initially prompting
0:09
it and telling it would like not work
0:11
because at some point it gets convinced
0:13
that this is the only way to solve
0:14
problems. It would like ignore any of
0:16
your prompts and go edit it anyways. I
0:18
was like okay we're just going to make
0:20
it so that we're going to throw an error
0:21
when you try to edit the file. We did
0:24
that. we throw an error and in there we
0:25
tell it like don't edit it and it still
0:29
at some point hits a point where it's
0:31
like I really need to edit it. It's the
0:33
only way that I'm going to solve this
0:34
problem. So I'm going to write a script
0:36
and then run that script to edit it and
0:39
it works because I think it spins up a
0:41
different like Linux user and that that
0:43
had permissions. It's like oh [ __ ] like
0:46
it's getting around our protection. And
0:49
so then we created a real sandbox where
0:51
you really can't edit that file and it
0:54
so it hit all these issues and then it's
0:58
like hm I'm going to like social
1:00
engineer the user into editing this file
1:04
it it goes back to the user like hey um
1:09
you should like here's a piece of code
1:15
Um, so yeah, there there are some like
1:18
early signs of that sort of behavior.
1:20
And so when I look at these instances, I
1:28
orientedness and some
1:32
around getting to that goal in a sort of
1:35
like a dumb like a sort
1:39
savant dumb way and like could this be
1:43
dangerous? Yeah, I think in some cases
1:45
it could like destroy data. It could
1:47
like harm the users. In some cases, you
1:50
really want to care about
1:52
this. Could this create a catastrophe? I
1:56
just don't see it yet. Are we preparing
1:59
for it? We are prepared for it in that
2:01
we have human actors that are trying to
2:03
hack Replet all the time for their own
2:04
needs. We've had people do crypto
2:06
mining. We had people like trying to
2:07
attack other websites. And unfortunately
2:09
like replet was like a lot more open and
2:12
had less limits but like the amount of
2:14
abuse that humans throw our way have
2:17
made it so that we had to close some
2:18
systems down and add like a lot more
2:20
protections and limit a lot of things.
2:23
So I don't see AI be any different than
2:26
us like battling bad human actors.
2:30
Look, I'm always willing to update my
2:32
view and as we're watching this and as
2:34
we're using these systems is if I felt
2:37
the their ability to scheme and to
2:42
misunderstand objectives and goals and
2:44
and get to a point where they're
2:46
actually potentially doing really
2:47
destructive and harmful things. I I
2:50
to invest more in security and safety.