0:00
Hi everyone and welcome to a new episode of AI Dev Tools. I'm your host Simon and we are back once again. In this episode we are going to learn about preventing AI model drift: continuous retraining strategies for production. And for this episode we have Amit Aurora, who is a software engineering manager.
0:23
[Music] Hi Amit and welcome to the show.
0:26
Hello everyone, and thanks for watching and joining this one. As Simon said, I'm Amit Aurora, software engineering manager. I'll be talking about a topic that's increasingly important as AI systems move from research labs to real-world deployment: how to prevent model drift in image classification using continuous retraining strategies. The presentation is based on my recently published article in the Journal of Computer Science and Technology Studies. I'll try to share real-world evidence, practical architectures, and recommendations to keep models performant and trustworthy in production environments.
1:03
We're going to talk about how we prevent model drift in image classification using continuous retraining strategies. Moving to the actual content, let's first recap how far we have come as far as the accuracy of image classification is concerned. In 2011, top-tier accuracy on standard benchmarks was around 50%. Then AlexNet brought it to 63% in 2012, and by 2023 architectures like EfficientNetV2 pushed it over 90%. Despite these advances, production models deployed in real environments often see performance decay due to evolving inputs and context, and that's where drift becomes critical.
1:53
Moving on to the next slide: if you don't do proactive maintenance, model accuracy erodes by 1.5 to 3% every month and can collapse by 15% within a quarter. Enterprises are now allocating around 30-35% of their ML budgets just to model maintenance. So the takeaway here is very clear: deploying a high-accuracy model isn't the finish line. It's the starting point; that's where you start once you have deployed your high-accuracy model.
2:40
This chart is very interesting. It shows how static models degrade over time: you can see false positives increase by over 7% per month, and false negatives are up by a similar amount, almost 6.8% every month. And these aren't just numbers showing that models are degrading; they represent lost revenue from the degraded model, manual overhead for the cases the model is no longer able to catch, and of course a poor user experience, depending on where the model is deployed.
3:24
The longer we delay retraining, the more the problem keeps compounding. You can imagine whatever problem you have in the first month; within a couple of months it just keeps compounding and becomes more and more intense.
3:42
Talking about the real-world consequences: we've covered the theoretical percentage numbers, but what are the real-world consequences? I'm bringing a few examples from different industries to help you relate to the real-world consequences of model drift. If you pick the manufacturing industry, image classification can lose up to 25-27% accuracy within six months. That translates to roughly $3.2 million in annual waste and potentially a bunch of product recalls.
4:20
In healthcare, which is a critical industry, even if a model starts with 94% sensitivity, which is pretty high, it can drop to 78%, effectively missing critical diagnoses, which is not acceptable.
4:41
Financially, every 1% of degradation leads to a cascade: 1% degradation results in more false positives, lost sales, and, as you can imagine, rising operational costs. If just 1% can have such a huge financial impact, imagine losing 25-30% accuracy in these models.
5:15
Now that we have talked about the financial implications and some theoretical numbers on model drift, let's talk about the benefits of continuous retraining of your models. One is lifespan: you can actually increase model lifespan by 42%. The second is compute savings: you can save 60-62% by doing regular retraining versus full retraining of the model. The third is data efficiency: you need only about 53% of the training volume compared to a full retraining sample. And the other benefit is 43% fewer false positives, which is a huge reduction; it directly improves trust in the AI system. So effectively the solution is not just technical, it's economic and strategic, and that's the strategic case for retraining we should be focused on.
6:40
Now that we have talked about the benefits, what is the high-level blueprint I'm recommending? First, continuously monitor model performance; if we don't measure it, we won't know. A lot of the time the model gets deployed, we started with 95% accuracy, but after that we have no idea what the model's accuracy is. Second, use smart sampling to collect new and informative data for continuous retraining. Third, use incremental training techniques to update the model, and it's very important that we preserve the existing knowledge of that model. And the final step is thorough validation before deployment: once you retrain the model, make sure it is still behaving the way it should. This keeps the model current without actually fully retraining it. These are the four high-level steps you need to follow.
8:01
Now let's go into some of the details in each of those steps. The first question is: how do I know when to retrain the model? We could rely on a fixed calendar, retraining every month, but my proposal is to focus more on monitoring signals from your model; that's why we are monitoring model performance. You can watch for drops in confidence scores, shifts in prediction distributions, drift detected using techniques like KL divergence or Wasserstein distance, and canary test failures. These adaptive triggers ensure that retraining happens before performance collapses, not after. It gives you a real signal for when to retrain your model instead of guessing; it takes the guesswork out of deciding when to retrain.
9:18
Now, moving to the next portion: we said we need data for retraining, so what are the efficient data sampling techniques we can use to retrain our models? The main thing, as we discussed, is that we want to minimize the amount of data we need, so we can focus on these four techniques. One is uncertainty sampling, which targets the inputs the model struggles with. The second is diversity sampling: instead of having a lot of repetition, we avoid repeated examples and try to cover the entire feature space. The third is adversarial sampling: add edge cases to strengthen the robustness of the models. And last but not least is active learning, which selectively involves human annotators. All of these techniques can help us reduce labeling costs and retraining cycles significantly, and that's one of the advantages of retraining on a regular basis.
10:30
So we have talked about different facets of retraining: the triggers and efficient sampling. One major risk with continuous learning is catastrophic forgetting; what I mean by that is the model forgets what it used to know. To prevent this, I'm suggesting a couple of things, because it's very important that the model learns new things without forgetting what it already knows. One is elastic weight consolidation, which helps preserve important weights. The second is knowledge distillation, to make sure we transfer the wisdom from the old model to the new one. The third is replay buffers for historical data, together with gradient regularization to avoid harmful updates. The whole goal of this exercise is to help the model learn something new without forgetting what it has already mastered, which is very critical: we want to focus retraining on new things while making sure previous knowledge is preserved.
11:56
Okay, so a couple of key takeaways. To sum it up, here are four. First, use drift detection methods, such as KL divergence and variance metrics, to see whether your model is drifting. It goes back to measurement: measuring is a major focus, so be proactive in figuring out whether any drift is happening in your model.
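A small sketch of KL-based drift detection on a single feature follows; the binning scheme and the synthetic data are illustrative assumptions, since the talk only names KL divergence as one available method:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (eps avoids log(0))."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def drift_score(reference, live, bins=10):
    """Histogram both samples on the reference range, then compare
    the live distribution against the training-time reference."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(live, bins=edges)
    return kl_divergence(q, p)

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)  # feature values at training time
stable = rng.normal(0.0, 1.0, 10_000)     # production traffic, no drift
shifted = rng.normal(1.5, 1.0, 10_000)    # production traffic, drifted
```

In practice such a score would be computed per feature on a schedule and alarmed when it crosses a threshold calibrated on known-stable periods; a simple `live.var()` versus `reference.var()` check covers the variance-metric side the talk mentions.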
12:31
Second, within your ML lifecycle, build adaptive retraining workflows with sampling and incremental updates. You want efficient sampling techniques so that the sampling volume stays low, and incremental updates so that you can update your model quickly.
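As a sketch of "efficient sampling plus incremental updates", here is an uncertainty-based sampler paired with a toy online logistic model. The class and function names and the NumPy-only model are illustrative assumptions, standing in for whatever training stack is actually in use:

```python
import numpy as np

def uncertainty_sample(probs, budget):
    """Pick the examples the model is least sure about (prob nearest 0.5),
    so only a small, informative slice needs labeling and retraining."""
    probs = np.asarray(probs, dtype=float)
    uncertainty = -np.abs(probs - 0.5)   # higher = less confident
    return np.argsort(uncertainty)[-budget:]

class OnlineLogReg:
    """Tiny logistic model supporting incremental (mini-batch) updates."""
    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def partial_fit(self, X, y):
        # One gradient step on the logistic loss for this batch only,
        # instead of retraining from scratch on all data.
        p = self.predict_proba(X)
        self.w -= self.lr * (X.T @ (p - y)) / len(y)
        self.b -= self.lr * float(np.mean(p - y))
```

A drift alarm would trigger `uncertainty_sample` over recent traffic, then feed just that slice through `partial_fit`, keeping both sampling volume and update latency low.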
12:57
Third, validate that the new model is learning new things while the old knowledge is preserved. You want to be very rigorous here: the model must not lose its old knowledge while it learns the new material. Validation is super important and serves a dual purpose: you verify that the model learned the new things, and that it did not lose its old knowledge.
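That dual-purpose validation could be sketched as a simple promotion gate; the score-dictionary shape and the 1% regression tolerance are assumptions for illustration:

```python
def promote_model(old_scores, new_scores, max_regression=0.01):
    """Promote the retrained model only if
    (1) it improves on the new-data slice, and
    (2) it does not regress on the historical slice beyond the tolerance.
    Scores are dicts like {"historical": 0.93, "new": 0.88}, higher = better.
    """
    learned_new = new_scores["new"] > old_scores["new"]
    kept_old = (new_scores["historical"]
                >= old_scores["historical"] - max_regression)
    return learned_new and kept_old
```

Wired into the pipeline, a candidate that fails either check would be rejected, which enforces both halves of the validation the talk describes.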
13:28
The fourth thing, which is very important and gives you a sense of how critical regular retraining is, is tracking the total cost of ownership. Once you employ retraining, calculate your return: how many fewer errors or false positives you see with regular retraining, whether the lifespan of the model is increasing, and whether your compute is becoming more efficient. You will see cost and compute coming down as you retrain on a regular basis. Tracking this is very important, especially with the existing systems and ML lifecycles you already have, so start tracking right away, before you employ the retraining strategies.
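A minimal way to start recording these total-cost-of-ownership signals, per the advice to begin tracking before adopting retraining, might look like this (the field names and report shape are assumptions):

```python
from dataclasses import dataclass

@dataclass
class TCOSnapshot:
    """One reporting period's cost-of-ownership signals for a model."""
    period: str
    error_rate: float        # fraction of wrong predictions
    false_positives: int
    compute_cost_usd: float  # retraining + serving compute this period
    days_in_service: int     # proxy for model lifespan

def tco_deltas(baseline, current):
    """Compare a period against the pre-retraining baseline;
    negative deltas mean regular retraining is paying off."""
    return {
        "error_rate": current.error_rate - baseline.error_rate,
        "false_positives": current.false_positives - baseline.false_positives,
        "compute_cost_usd": current.compute_cost_usd - baseline.compute_cost_usd,
    }
```

Collecting one snapshot per period from the existing lifecycle gives the before/after comparison the talk asks for once retraining is switched on.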
14:32
Thank you for your time today. I hope this session offered you practical insights into building resilient and sustainable AI systems for real-world use cases. This project is really close to me, and I'm excited to see more organizations, individuals, and institutions embrace continuous learning as part of their production AI strategy. Thank you.