0:00
Hello, hello everyone and welcome to the Cloud Show
0:05
Today I'm going to talk to a good friend of mine, his name is Joe
0:08
and we are going to talk about a very important topic because everybody asks about how to cut costs in the cloud
0:16
How do you spend only, you know, the cloud has this promise
0:21
You only pay for what you use. So how do you achieve that
0:26
How do you only pay for what you use and not over-provision,
0:29
or overpay for more than what you are actually going to be using and your users need
0:37
That is a great topic for today's episode of The Cloud Show
0:53
Hello, my friend. How are you, Joe? Good. How are you doing, Magnus
0:57
I'm doing very well. Thank you. Thank you for asking. So you're over there at home now. You were in Mauritius just recently
1:05
Yeah, I was in Mauritius for about two weeks. Gorgeous, gorgeous place
1:10
Wife didn't want to leave. Our host kept talking to us, oh, you can move down here now
1:15
You know, we can help you find a place. Wow. That's dangerous. You'd have to, like, hold your wife's ears
1:20
It's like, no, no, no, no, no, no, no, no, we're not. So where is home otherwise
1:27
I'm in Phoenix, Arizona, in the western part of the United States
1:32
Yep, yep. West-Southwest, sort of. Yep, in the southwest part. Yeah, perfect
1:38
We've been averaging around 115 degrees Fahrenheit or 47-ish, 48 Celsius
1:49
Yeah, I was just going to say, what's that in normal degrees? It sounds like when you do it in the hundreds
1:58
Yeah. Right. Um, and professionally you are a team leader for various technical teams and cloud
2:07
stuff, it seems, and you have been for many years. Yes. Okay, tell us. As of Friday, I'll be
2:17
unemployed, so open for new opportunities, folks. But for the last 10 years, basically, I've been
2:27
leading teams in our migration, kind of moving stuff from on-prem into the cloud where it seemed fit
2:38
Part of that migration has been modernization of the apps, taking older .NET apps, or .NET
2:44
Framework apps, moving them onto .NET Core, and then seeing if it makes sense for them to be in the cloud
2:51
or just stay on-prem. Some of our high-availability or high-risk apps, meaning if they're down, the business is down, a lot of those we moved
3:03
into the cloud just for, you know, safety, scalability, reliability, etc. Yeah. All right, that's all
3:12
really, really cool work, because I would know, I do much of the same, and I love doing that
3:19
And again, to go into the topic of this conversation today, we were talking about
3:26
What should we talk about? What do cloud leaders need to know about
3:31
And I think they need to know, you know, pick your brain, about how to pay only for what you use
3:40
That is one of the promises of the cloud, right? You only pay per use
3:45
But how do you achieve that? How do you get to a point where you are spending exactly the amount that you need for your application to run and your users to be happy
3:56
And you're only paying for that much. How do you do it? It's a bit challenging, obviously
4:05
You have to know what you're using, and that's one of the keys
4:09
figuring out what you do. When you're a software engineer, it's very easy to pick the largest instance or the largest
4:18
you know, storage space or the largest thing because you're not paying for it, you know, in theory, right
4:25
So what a lot of us do is just pick the highest of everything. And then, you know, when we get that magical bill at the end for, you know, thousands and thousands of dollars, everyone goes around and says, oh, no, what do I do
4:40
Yeah. So there are a couple of different tricks that, you know, we tend to do. One is we start off small and then add as needed. But it's the as-needed part that's always hard. Like, how do you know when you need to add something or not? And that's where kind of the OpenTelemetry comes in. You know, if you're on Azure, it's App Insights; if you're on AWS, it's X-Ray
5:08
These provide tools to kind of look into your application and see what you're using. Yeah
5:15
Because it's important to know if you have a server, let's say I have a big honking beast that, you know
5:23
has 32 processors and, you know, 128 gigs of memory. Using OpenTelemetry, you can look at it and see
5:31
well, that machine is using 2% CPU the entire time. Maybe I don't need that big of a machine
5:39
I can start scaling it back. So that's where telemetry really comes in and lets you see
5:43
into your app, how it's being used, who's using it, where it's being used from
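As a rough illustration of that right-sizing check, here is a minimal Python sketch. It assumes you have already exported CPU utilization percentages from whatever telemetry tool you use; the function name `recommend_vcpus`, the 95th-percentile rule, and the 60% target are illustrative assumptions, not something App Insights or X-Ray prescribes.

```python
import math
from statistics import quantiles


def recommend_vcpus(cpu_percent_samples, current_vcpus, target_utilization=60.0):
    """Suggest a vCPU count so that 95th-percentile load sits near the target.

    cpu_percent_samples: CPU utilization readings (0-100) for the whole machine.
    current_vcpus: how many vCPUs the machine has today.
    target_utilization: hypothetical comfort level, in percent.
    """
    if not cpu_percent_samples:
        raise ValueError("need at least one CPU sample")
    if len(cpu_percent_samples) == 1:
        p95 = cpu_percent_samples[0]
    else:
        # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
        p95 = quantiles(cpu_percent_samples, n=20)[-1]
    # How many vCPUs are actually busy at the 95th percentile.
    busy_vcpus = current_vcpus * p95 / 100.0
    needed = math.ceil(busy_vcpus / (target_utilization / 100.0))
    # This sketch only scales down: never below 1, never above the current size.
    return max(1, min(current_vcpus, needed))
```

For the 32-processor machine idling at 2% CPU, `recommend_vcpus([2.0] * 100, 32)` suggests 2 vCPUs. A real decision would also look at memory, disk, and network, which this sketch ignores.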
5:49
You can also save cost. Let's say you have a CDN and you're set up, you know
5:53
across six different continents. What if you only have users in two of those continents
6:00
Why bother with those other, you know, extra CDN resources? So CDN is a content delivery network where you put your files closer to the user
6:11
So when they download a file, it's instant. Like, for example, if I share my favorite cat video on Facebook, all my friends around the
6:19
world, you can watch it in Arizona and people can watch it everywhere. and for everyone it appears to be close to them
6:25
It appears to start straight away. But if everyone were downloading it from like here in Malmo, Sweden
6:31
wherever I posted it from, it would take forever or it would lag
6:36
So with a CDN, you just put everything everywhere for everyone to be close to the data
6:41
which means you have to duplicate the data everywhere. And that adds cost
6:45
Good point. That adds cost as well as, you know, how resilient or how quick you want it
6:52
Yeah, yeah, definitely. That's always a good trade-off there. But, you know, coming back to that point, you were saying about how the 2% CPU thing
7:02
that was always the case, right, before the cloud, that you had to calculate like a worst-case scenario
7:08
how much would you need in capacity at the very worst case
7:12
And then you kind of had to buy and provision hardware for the worst case, because otherwise you'd never be able to accommodate it
7:19
Which means that across the world, all the servers are running at like 10% or less or, you know, that's, you know, 10% is probably a good number
7:29
Yeah, and that's where a lot of, you know, virtualization techniques came in place
7:34
you know, things like VMware and stuff that help you maximize that
7:38
So they handle the load balancing. But, you know, in the cloud world, we have a lot of that, a lot of people want to control
7:45
their servers by adding their own VMs or scaling up their own version of SQL Server or their
7:51
own version of, you know, Oracle or what have you, even web servers. But, you know, Microsoft and the other
7:59
cloud providers have realized that it's not necessary. There's lots of waste when you scale that up
8:05
So they created, you know, different services like App Service that let you share the resources
8:10
And if you want, you can get a much, you know, more scalable one. But for basic sites, you know
8:18
they have smaller instances. And by using those, you're saving on a lot of those, the cost of
8:25
getting your own custom VM. Yeah. Yeah. It really does. And that's about, I suppose
8:31
the term of right-sizing, right? You have to know which size is right, because there are many
8:37
many sizes of things. You can pay for a large, large size. But if you're not using it, you're
8:42
spending money. And the stockholders in Microsoft or in the US, right, they
8:48
They salute you, but it's not good for you. It's not good for your business. You're spending on the wrong thing
8:53
It's like buying a bus at home when you only have two people in your house and could get away with a small car
8:59
That's right. That's a very good example of a similar situation. So right-sizing is important, but also the fact that no application, no matter which, is always used at the same requirement, at the same capacity, around the clock
9:17
around the week, around the weekend or whatever, right? Every application has always highs and lows, right, in terms of usage
9:27
So what do you do for managing that? I know you can scale to meet the load requirements
9:36
What's that about? So the easiest thing to do is, there are, in Application Insights or Azure Monitor
9:46
depending on what version of documentation you're looking at. They have lots of tools with alerting, as well as custom reports that you can do. So you can kind of watch the trends and see. They even have trend reports saying, Fridays at 5 p.m., or 1700
10:07
You'll see that, you know, your app spikes, but outside of that, it doesn't
10:11
So you can run those, that custom telemetry to kind of watch for the trends
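To make the "Fridays at 5 p.m." idea concrete, here is a small Python sketch of a trend report. It is not how Application Insights or Azure Monitor build theirs; it simply assumes you have exported (timestamp, request count) pairs, and `busiest_weekly_slots` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def busiest_weekly_slots(samples, top=3):
    """Return the busiest (weekday, hour) slots by average request count.

    samples: iterable of (datetime, request_count) pairs, assumed to come
    from an export of your own telemetry.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of counts, number of samples]
    for timestamp, count in samples:
        slot = (DAYS[timestamp.weekday()], timestamp.hour)
        totals[slot][0] += count
        totals[slot][1] += 1
    averages = {slot: total / n for slot, (total, n) in totals.items()}
    # Highest average load first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top]


samples = [
    (datetime(2024, 1, 5, 17, 0), 900),   # a Friday, 17:00
    (datetime(2024, 1, 12, 17, 0), 1100),  # the following Friday, 17:00
    (datetime(2024, 1, 8, 9, 0), 100),     # a Monday, 09:00
]
peak = busiest_weekly_slots(samples, top=1)
```

With the made-up sample data above, `peak` is `[(("Friday", 17), 1000.0)]`, which is the kind of slot you would scale up for and scale back down after.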
10:16
Also looking at the storage because, you know, sometimes, you know, sometimes that's overutilized
10:24
too. Or sometimes you turn on things and you forget about them
10:29
I have a not so funny story. At one point, I was building a side app for one of my open source projects and I deployed it and I forgot
10:42
I didn't forget about it, but work came up and I stopped working on it
10:46
And about a month later, I get a notification from Azure that my account has been turned
10:54
off because I hit my spend limit. And I'm like, how do I hit my spend limit? I've been using
11:00
this forever. It turned out I made a change to the app. And when I deployed it to prod,
11:06
I didn't turn off one of the telemetry settings. So I was logging everything to it. So I ended up
11:15
sending one too many logs. And then, too, I filled up my storage space that I was
11:24
using. I was averaging like, I don't know, a meg or two a month, and it turned out to be like 30 gigs
11:33
Okay, okay. That's nuts. And then things happen, right? So you need to, in order to break this down
11:42
you need to know what you need to measure. You need to figure that out. What do we need to measure
11:49
to understand the workload that we are having. So it comes down to using the right tooling to put into your application to measure the right things
12:01
Yeah. Right? Yeah. And the good thing is that out of the box with Azure Monitor App Insights
12:09
there are a lot of good reports for the common things. You can see like where you're throwing the most errors
12:17
where you're using the CPU, you can run even cost reports. There's reports that show you how you're generating costs
12:27
or how you're generating expenses and which one of those resources are generating the expenses
12:34
Yeah. And you can start to narrow it down from there to see
12:38
If you're looking at a SQL Server and an App Service with a website
12:47
and an API, you can see that, well, my SQL Server is generating the most
12:51
So maybe I should look at that and see if I'm using the right size SQL server for it
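The narrowing-down step can be sketched in a few lines of Python. This assumes you have exported billing line items as (resource name, amount) pairs; the resource names and amounts are invented for illustration, and `cost_by_resource` is a hypothetical helper, not an Azure API.

```python
from collections import Counter


def cost_by_resource(line_items):
    """Total the spend per resource and list the most expensive first.

    line_items: iterable of (resource_name, amount) pairs; amounts here are
    whole currency units to keep the arithmetic exact.
    """
    totals = Counter()
    for resource, amount in line_items:
        totals[resource] += amount
    return totals.most_common()


# Invented example data: one SQL Server, one App Service, one storage account.
items = [
    ("sql-server", 410),
    ("app-service", 55),
    ("sql-server", 90),
    ("storage", 12),
]
ranked = cost_by_resource(items)
```

Here `ranked` starts with `("sql-server", 500)`, so the SQL Server is the first thing to check for right-sizing.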
12:58
Okay, it sounds a little bit like a detective job, right? Yes
13:03
Yeah, you're actively debugging. Like, if you're a software engineer going in and figuring out the code here
13:09
you're figuring out, like, what's up. There's no real easy way to kind of tell where things
13:17
are. You kind of have to figure out the use. Yeah. You there? And that makes, yeah, that makes a lot of
13:25
sense. It really does. Um, can you hear me still? Now I can. Now you can? All right, good, good, good. Um
13:36
That's the beauty of being totally blank. Oops. All right. Um, well, you're back, or you're still here. We
13:45
saw you the whole time, you just looked like a question. Oh, okay. Um, yeah
13:49
I'm a big turk off, sorry about that. Uh, so where was that? Well, we were saying that
13:57
it's kind of like a detective job, and you have to focus on, like, you're trying to find things, right
14:04
Yeah, the good thing is that there's a lot more clues available to you to see that. It's just, you
14:12
do have to know where to look, and you do have to know what to look for, which is key
14:18
And your architects or system admins or site reliability engineers should understand that, because they're the ones, in most cases, provisioning your equipment
14:29
They should be able to know what to look for. The same thing, like if you had an on-site DBA to manage your SQL instances or your database
14:42
they know the technology, they know what signs to look for. You need people to be able to watch that and understand what it is
14:52
Yeah, yeah, definitely. And just to, like, throw almost like a curveball into that: once you've got it all perfect, once you've got it all set right. You may even have been able to configure auto-scaling or something like that, which is brilliant if and when it works, right
15:12
It's maybe a little bit tricky to get it right, but now you have it all set
15:17
Your site is auto-scaling, it's all beautiful. And then all of a sudden, the biggest problem that could ever hit any application
15:23
ever occurs. And that is that you become successful. Because obviously your architecture was never built for that amount of success
15:34
And that's a whole other almost ballgame, right? When you maybe have made some assumptions which were real when you started
15:44
But then much later, oh my gosh, the same thing is not true anymore
15:48
And now you're in a bind. Yeah, so, and that's kind of where the, you know, app monitoring comes in
15:58
You're constantly watching and looking for that. And, you know, sometimes you learn that the cloud, the way it's configured, might not work
16:07
Like we started talking about in the beginning, you start off with a virtual machine that has everything and it's too much
16:13
So you scale down to a, you know, software as a service, like App Service or the SQL
16:21
service. You might realize that now you've grown and it's time to, you know, go back to those
16:26
but that's where that constant monitoring is. Eventually, you know, most applications can outgrow
16:36
You would have the same problem if you're on-prem, right? Like there are apps I've worked on where
16:41
between 10 o'clock in the morning and three or four in the afternoon, the apps were basically
16:48
useless because there were way too many people on there. So we had to figure out ways, you know
16:53
to bring those up to scale or just pay for, you know, dead weight in the hours
17:02
Right, right, exactly. And that's, that really is a challenge because you didn't know before
17:08
but at a certain level of usage, at a certain capacity requirement, you will find your
17:15
bottlenecks. You didn't know you had them until you hit a certain level
17:18
of usage and then all of a sudden something chokes like okay so many users are hitting the website
17:25
and now we have an issue with the fact that you know it can be weird things like all of a sudden
17:31
a SQL server report is kicking off right at the peak and that never happened before but it
17:37
happened now and now nobody can use anything and that's when you really need to really
17:44
need to measure the right things and you need to be on top of that game. Yeah. It's so common how often that happens too
17:51
It's unfortunately the thing. Yeah. At my previous job, we moved all the transactional kind of things to like two in the morning to prevent that
18:03
That meant we had a window from like 10 p.m. to 2 a.m. to get everything else done because
18:10
when those jobs kicked off, everything was basically useless
18:14
Yeah, yeah, exactly. And that's what I love about replicas, which you can have
18:19
which you basically can just configure. For example, SQL Server now that we're talking about it
18:23
you can just configure, I want a replica of that database. So let that database continue to service the users
18:30
That's fine. But on my replica, the whole other capacity right over there
18:34
away from the users, I will run a heavy report on that. And if I don't want the replica, once I'm done with my report
18:41
I can trash it again. It's gone. And that's a beautiful way of approaching those things or getting around them, right
18:51
Yeah. I love that. Okay, cool. So to sum it up, we need to know what we're measuring and we need to put the right measurements in place
19:02
And that's sort of the way to understand how to pay for what you use only
19:07
But one additional thing to that: once you've got it figured out
19:11
your measuring and monitoring is always changing. It's not configure these alerts and I'm done
19:19
No, no. As you start to learn more about your applications and your systems
19:24
you may need to tweak that or change different things. As you start fine-tuning pieces
19:30
you'll find other potential areas to fine-tune. Yeah, brilliant. Those are wise words
19:36
I'm going to let you have the last say in this episode. Brilliant to have you with us on the Cloud Show, Joe
19:44
I hope you come back sometime in the fall. And thanks everyone for watching the Cloud Show today
19:51
Hi. Thanks for having me. Bye