Building an Empire - Implementing Power BI Step by Step || Power Platform Virtual Conference
Nov 6, 2023
Trying to implement a BI project can be extremely tedious, take unreasonable amounts of time, and in the end turn out to be something that the users didn't want.
This session goes through how to use Power BI and dataflows to iteratively build a BI solution - starting from the absolute beginning with just an Excel file, moving on to combining data from several data sources, and finally explaining how to use Power BI dataflows and shared datasets to deliver maximum analytical power to the right users with a minimum of wasted time. You will walk away with new ideas on how to start and grow your BI projects, as well as insight into Power BI dataflows and their uses.
About Speaker:
Alexander is a principal solutions architect at Solution Xperts in Linköping, Sweden. Nobody really knows what it means to be a principal solutions architect, but he spends his days teaching courses or helping clients of all shapes and sizes to take better care of their data.
He has spent the last 22 years toying with data, databases and related infrastructure services such as storage, networking and virtualization, occasionally emerging from the technical darkness to attend a Star Wars convention somewhere in the world.
He is a Data Platform MVP, frequent international speaker, podcaster, Pluralsight author, blogger and a Microsoft Certified Trainer, focusing on SQL Server, Azure and Power BI.
Conference Website: https://www.2020twenty.net/power-platform-virtual-conference/
C# Corner - Community of Software and Data Developers
https://www.c-sharpcorner.com
#conference #powerplatform #powerbi #csharpcorner
0:00
Thank you so much to everybody attending
0:04
And one thing that really struck a chord is the charity that was chosen in this case. I
0:12
used to be a paramedic many, many, many years ago, and I have the utmost respect for anyone
0:18
who collects money for a charity that is connected to first responders
0:24
All right. Welcome to Building an Empire: Implementing Power BI Step by Step
0:30
So every story has a beginning. And as most stories do, this one started with a phone call
0:44
I was asked to come out to a customer that had jumped in both feet first into Power BI
0:50
And they had embraced the whole self-service approach, i.e. anything goes. And now they wanted to discuss the state of the project
1:01
I thought I detected an undertone of desperation, so I prepared for the worst
1:08
It wasn't enough. So, as I said, I used to be a paramedic
1:13
and the first thing you do at a scene of an accident
1:17
is what's known as triage. And this is to quickly get a grasp of the work ahead
1:22
and to start prioritizing patients. And that was pretty much what I had to do, because the issues in this case were absolutely legion
1:30
The data sources were slow. The reports were hammering the data sources
1:35
The quality was absolute crap. The security was, well, no security at all
1:42
The work was endlessly repeated, and no one was certifying any data anywhere
1:48
Something needed to be done, stat, because the patient was very, very rapidly expiring
1:54
So how do you build an empire? Well, you build an empire on control. You build an empire on order. You build an empire on structure. But you also build your empire on efficiency, on empowerment, and simplicity
2:10
What I saw coming into my customer was none of that. It was the ugly aftermath, if you will, of rampant self-service BI where control was nowhere to be found, structure was completely unheard of, and empowerment meant that anyone could do exactly as they pleased
2:28
That is not an empire. That is chaos. And that's why today we'll go through how to use Power BI to empower users to achieve more
2:42
hopefully without the chaos part, because that kind of sucks. We're going to look at the why
2:47
the how, and the what, or the why, what, and how of Power BI data flows and data sets. What are
2:53
they, and how do they enter into this huge equation? Hopefully we're going to see a few tips along the
3:02
way as well, because you might recognize some of the issues that my customer faced. You might even
3:09
face some of them right now. My goal is to give you food for thought on
3:16
how you might take back control and increase order, efficiency, and empowerment
3:22
As Simon said, I'm Alexander. I have 23 years in IT, pretty much all of it spent with data
3:29
databases, and infrastructure in some way, shape, or form. These days I work for Solution Xperts in Sweden
3:34
as a principal solutions architect, which is probably the best title I've ever had
3:39
Nobody has a clue of what I do. Well, that's kind of easy to explain, really
3:45
because I make data matter because only data that matters can inspire any change
3:51
And I do this, or I should say I used to do this by going all over the world
3:55
speaking at conferences, teaching courses, and doing consulting. But as everybody else, since the world ended back in March 2020
4:03
I've been stuck inside my home office. I know exactly where my four walls end
4:09
I'll tell you that much. I'm one of seven data platform MVPs in Sweden
4:13
I co-host a podcast called Knee-Deep in Tech, and yes, for those of you who are my age or older, that is a
4:20
nod to the Doom game logo. And just a small aside: I'm a bit of a Star Wars fan, like a very
4:28
big Star Wars fan. So, yeah, there will be precious little Star Wars in this specific
4:35
session. I have another session that has a lot more Star Wars. So in order to find ourselves on
4:43
common ground, let's backtrack a bit and make sure that everybody is on the same page because
4:50
my customer had this issue. And let's pretend just for a minute that we are the San Francisco
4:58
Police Department (just plucking that out of the air) and we want a dashboard that shows the
5:05
crime rate per police district, and we also want it broken down by category and time. So whenever I
5:12
click around on my Power BI report, well, it's going to behave like a normal Power BI report, but
5:19
I can dig into the data and I can visualize whatever I'm seeing. But this didn't happen just
5:26
automatically, magically, suddenly one Tuesday morning. I have this beautiful report, and no, it
5:31
is not beautiful, never mind that. So where did this come from? We started with the data source
5:38
Most data starts somewhere, and we brought the data from this data source into Power BI. Unfortunately
5:48
it is never just plug and play, because some data needs to be tweaked, some data types maybe need
5:53
to change, some data should be removed, and, well, it might not be relevant, and, well
6:00
you never know what you need to do. And this is done through Power Query, or with the language M, and this is generally the
6:08
domain of the data engineer. All right. When we've done the, as Patrick LeBlanc of Guy in a Cube calls it, massaging of data
6:18
we are ready to punch it into Power BI, because when the data is in a format
6:24
that we can basically make sense of, it is time to put it into the hands
6:28
of those who know the data best, the business. And armed with a good data set
6:34
it's up to them to create the nice visuals, basically supporting the analytical efforts
6:41
because it doesn't matter if you have the best data ever. If you can't visualize it, it's useless
6:48
So this generally requires a number of competencies, the primary two being visual communication and the ability to write Data Analysis Expressions, DAX code
7:02
So what happens when we do a refresh in Power BI? I want to see updated data, right
7:09
So I click the refresh button. What's going to happen is it is going to reach out all the way to the data source
7:14
and it is going to grab the data in the data source and pull it
7:18
screaming and kicking in some cases, through the data wrangling step, the Power Query step. So all the steps that we have defined in Power Query, well, they get applied, and then we have the data. So you can view the Power Query steps as a recipe, if you will
7:36
A number of steps, a number of things you'll do to the data before it becomes cooked for the consumer
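A minimal sketch of such a recipe in M, assuming a hypothetical crime CSV (the file name and column names here are made up, not the demo's actual source):

```m
let
    // Step 1: read the raw CSV file
    Source = Csv.Document(File.Contents("C:\temp\sf_crime_2016.csv"), [Delimiter = ","]),
    // Step 2: use the first row as column headers
    Promoted = Table.PromoteHeaders(Source),
    // Step 3: fix the data types
    Typed = Table.TransformColumnTypes(Promoted, {{"Date", type date}, {"Category", type text}}),
    // Step 4: keep only the columns that are relevant
    Chosen = Table.SelectColumns(Typed, {"Date", "Category", "PdDistrict"})
in
    Chosen
```

On every refresh, Power BI replays each step of the recipe against the source before the data lands in the model.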
7:44
So we can instantly see the value of Power BI in this context because we don't need to bug IT
7:51
We don't need to learn complex ETL or extract transform load systems, and we do not need to wait for results
8:00
If only all the data we ever needed was neatly packaged in one file
8:05
How amazing would that be? Yeah, no. So let's look at a slightly more advanced example
8:14
Let's see if the weather plays a role in the crime statistics from San Francisco
8:19
because it turns out that we can get access to historical weather from the American NOAA
8:26
the National Oceanic and Atmospheric Administration. And again, it is in CSV format
8:32
So let's look at bringing that into the report from earlier. Now, the file is easy enough, but there are a lot of columns here that are not relevant
8:43
or even empty. And the main issue is that TAVG, or the average temperature,
8:53
doesn't contain anything either. So we'll need to calculate that. And choosing the columns means that we only see what we need to see
9:00
And in this case, it's precipitation. We need to change the type
9:05
And something that is always fun if you're not an American is that you need to do the
9:09
change with locale. Otherwise, you're going to learn everything you ever needed to know about decimal commas and decimal points
9:17
Now it is numeric. And as we know, this is precipitation in inches
9:21
So I'll change the name of the column to Precipitation Inches. Now I'm going to add a column, because I want to see precipitation in millimeters as well
9:33
And that means that we need to do some math, and that's basically just multiplying the number by 25.4, as there are 25.4 millimeters to one inch
9:44
We change the type again. This one we can do with decimal number because I am already in a Swedish
9:50
setup and I move things around and give it a name. And we have the T max and the T min
9:58
That's the maximum and minimum temperature in Fahrenheit. And again, we need to have an average
10:06
So that's basically just T max plus T min, divided by two
10:15
That's it. And then we need to convert this to a number as well
10:22
And then we need a custom column to convert this to Celsius
10:26
And I'm not going to bore you to tears with the calculations, and that's why I already have it
10:31
ready for copying and pasting here. And there we go: boom, we now have a column that shows us the
10:40
numbers in Celsius, because that's what we do over here. We're going to call this the weather
10:45
data SF 2016, and voila, we now have it available for us to use in Power BI.
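The whole weather wrangling boils down to a handful of Power Query steps. A minimal sketch, assuming a local NOAA CSV with the PRCP, TMAX, and TMIN columns described above (the file path is hypothetical):

```m
let
    Source = Csv.Document(File.Contents("C:\temp\noaa_sf_2016.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    // Only see what we need to see
    Chosen = Table.SelectColumns(Promoted, {"DATE", "PRCP", "TMAX", "TMIN"}),
    // Change type with locale: the file uses US decimal points
    Typed = Table.TransformColumnTypes(Chosen,
        {{"DATE", type date}, {"PRCP", type number}, {"TMAX", type number}, {"TMIN", type number}}, "en-US"),
    Renamed = Table.RenameColumns(Typed, {{"PRCP", "Precipitation Inches"}}),
    // 25.4 millimeters to one inch
    WithMm = Table.AddColumn(Renamed, "Precipitation mm", each [Precipitation Inches] * 25.4, type number),
    // TAVG is empty in the file, so calculate it: (TMAX + TMIN) / 2
    WithTAvgF = Table.AddColumn(WithMm, "TAVG F", each ([TMAX] + [TMIN]) / 2, type number),
    // The Fahrenheit-to-Celsius conversion that gets pasted in
    WithTAvgC = Table.AddColumn(WithTAvgF, "TAVG C", each ([TAVG F] - 32) * 5 / 9, type number)
in
    WithTAvgC
```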
11:00
So let's add this to the report, and we're going to be putting this on the lower
11:08
right visual, because that's already prepared as a combo chart. So we are bringing in the new data
11:16
and we are taking the T average. The thing is, it did not put it in the right place, because this is
11:23
not a combo chart yet. But I can change that, so now I have a column chart and a line chart in the
11:30
same visual. I just need to move the number of crimes down to the line. But wait, wait a second
11:36
Uh, everything is the same. Yes. Everything is the same. Okay
11:46
That's interesting. Why is everything the same? Because we don't have any connections between these two tables
11:54
And in order to fix that, we need to drag the date column from the date
12:00
table to the date column in the weather data table to enable them to talk to each other
12:06
We need to create a relation in this case, because when we do that, suddenly, well, everything
12:12
works just the way we want it to work. And then we can change the title, number of crimes per month
12:19
Well, we are going to add and weather, average temperature, and then we can see if this data
12:27
makes any sense. So does the average temperature have any effect on the crime rate
12:34
No, not very much. Okay. So what we've done thus far is basically take the report and make it more complex by adding
12:45
a second data set. A normal report can have many, many data sets
12:50
So this is nothing strange. But every data source is different and it'll need to be wrangled and connected in different ways
12:58
That's just the way things work. But the problem here is starting to compound because when we do a refresh, remember what happens
13:06
I click refresh in Power BI Desktop or in the Power BI service
13:10
What's going to happen is it is going to reach out all the way back to both data sources
13:17
Yes, both data sources at the same time, and it is going to hammer them and grab them and drag the whole thing
13:22
screaming and kicking through the wrangling steps and putting it into Power BI
13:27
Okay. This means that we have already started to increase the load on the data sources, right
13:37
But what if we don't have only one report, but three reports? And we can just as easily have
13:47
50 users looking at the same report, all clicking refresh at the same time
13:56
Yeah, this is going to get ugly because what's going to happen is all these reports are going
14:02
to be going to the same data source and hammering the data source relentlessly
14:08
This is a great way of killing a database. You're going to have a set of very, very angry DBAs, and yeah, no, not a good thing. You should
14:20
take care of your DBAs. So how do we fix this? Well, what's even worse is
14:29
we might have slight differences on all these reports. What's to say that someone does not
14:36
just slightly tweak one of the wrangling steps. Suddenly you have deviating data
14:41
And especially in financial reporting, that is a bad thing. So how do we sort that
14:49
Well, I'm very happy you asked because it is time to introduce data flows
14:55
So a data flow is taking a data source, adding the Power Query magic, and packaging it into a new data source, basically
15:08
So we have our steps. We have our data sources, be it Excel or databases or whatever
15:15
It doesn't really matter. Then we are doing the Power Query steps, the M code that we're applying to it
15:21
We're dumping the whole thing into something called a Power BI data flow, which happens to be
15:28
underneath the covers, CDM (Common Data Model, or Dataverse) data inside of a data lake, but
15:36
don't worry about that for the moment. And then we put the whole thing into the Power BI
15:42
report, because this means that I can now build multiple reports off the same data flow, meaning
15:50
that I do all the heavy lifting once. And that is kind of cool
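Consuming a dataflow later is the easy part: in Power BI Desktop you connect through the Power Platform dataflows connector. A sketch of the kind of navigation code the connector generates (the GUIDs and the entity name here are placeholders):

```m
let
    // Connect to the dataflows in my tenant
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    // Navigate down to the curated entity; IDs and names are hypothetical
    Workspace = Workspaces{[workspaceId = "00000000-0000-0000-0000-000000000001"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000002"]}[Data],
    Entity = Dataflow{[entity = "Weather", version = ""]}[Data]
in
    Entity
```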
15:57
Take, for instance, I usually demo some data from the Swedish statistical agency
16:03
Statistics Sweden. The problem is they have a great API to give you all the data you ever need
16:08
And more than that, the issue is that you're going to get it
16:12
in a somewhat weird JSON format. And in order to get it into a format
16:18
that Power BI can read in an easy manner, I need to do a few steps
16:23
And that is, well, it's not hard per se, but it's a lot of work. So if I just do that once
16:29
and save the whole shebang as a data flow, and then I tell my users
16:34
here you have a curated and ready to go data flow. You can connect to that
16:38
and you don't need to do all the heavy lifting stuff. I already did that and packaged it as a data flow
16:44
People go from meh to yay very, very quickly. And that is a good thing.
16:52
But there are a few gotchas here. One is, let's go back to the whole idea of refresh
17:00
If I click refresh in my Power BI desktop or in my service, what's going to happen, you think
17:06
Well, we're not going to go back to the original data source because we're going to reach into the data lake
17:12
And we're going to be pulling the data from the data lake to the Power BI report
17:18
and while this even works for all the different reports, you need to do a Power BI data flow refresh
17:29
and it's only the data flow refresh that in turn goes to the data sources
17:35
but we're going to come back to that in a bit. So why are we doing this in the first place
17:44
Because we're going to have several effects. It's all about reusing data
17:50
First of all, we take away the risk for deviating data. And second, we're actually increasing the performance because we're not pulling data
17:59
from the source system every time we're doing a refresh. And third, it's easier for users to find usable data sets
18:06
Instead of having to scour the internet or hoping to find the data source you're looking
18:11
for, well, you can use the already curated data sets. And actually, fourth: with incremental refresh we can also make sure never to pull in more data than
18:22
has changed. If you're wondering about incremental refresh, it requires Premium, but that's not such a
18:30
bad thing anymore. More on that in a bit. All right, we're going to dive deeper into the data flows
18:38
here. So what about creating a data flow? How hard is that? Well, not very. There are a couple
18:48
of ways of doing it. And again, to paraphrase Patrick from Guy in a Cube: I'm not lazy. I am
18:54
efficient. And I love being efficient because it means that I can spend more time on the things
18:58
that I want to do, like play with Star Wars stuff or go flying because I fly gliders as well
19:04
So how do you create them? Again, you can either do all the fun work graphically and do all the steps in the Power
19:15
Query as we did in Power BI Desktop. Ah, nah, that's boring
19:20
I don't want to do all the steps again because that's already been done for me because this
19:27
is the M code or the Power Query code. It's the same thing
19:32
This is taken from the desktop. So you can just go Advanced Editor and voila, you have all the M code you ever need
19:39
Now, you can use this to make the foundation for your Dataflow, because again, Dataflow
19:46
is just Power Query online. So what are we looking at? Well, for starters, this line here, this is where we point out the source file
20:02
And here we're going to run into an interesting problem, because the source file is on my laptop
20:10
and that's fine as long as I'm running Power BI Desktop on my machine
20:17
What happens when I put my data, my reporting, in the cloud? Well, it's not going to be able to reach
20:26
down to my laptop, and I think we're all in agreement that that is a good thing
20:30
unless I have a way of doing that. So the normal way of doing this
20:40
or the most obvious way of doing this is to have the data source on-prem
20:45
It might be a CSV file. It might be a database. It doesn't really matter, but it's on-prem
20:51
In order for Power BI to be able to reach this data source
20:55
it needs to go through what's known as the Power BI gateway. Now, the Power BI gateway has a tendency to scare people half to death, and I don't see why, because this is not the huge hole in the firewall that a lot of admins think it is
21:09
This is basically just a connection to two parts of Azure. So Power BI pokes the Power BI gateway and says, would you be so kind as to ship me the data
21:21
It does not reach through it and grab the data and pull it back in
21:25
No, no, no, it is very polite and kindly asks the gateway for the data. So it is a push only
21:32
it's not a pull. And okay, we have installed the Power BI gateway on a machine on-prem. It can talk
21:41
to my data source, it can talk to the service, and basically what's happening is the Power BI data
21:47
flows, which are part of the service, are going to ask the Power BI gateway nicely: can I please have
21:53
the data? And the Power BI gateway is going to go, well, yeah, sure, and dump the whole shebang into the Power BI data flows
22:00
But there is a way to get around this hassle. And some of you might know it
22:05
some of you might never have seen this or thought about it, because we can use OneDrive here
22:13
Let's say that the data source in this case is indeed an Excel file or a CSV file
22:19
That's a file. And if I put this on my local OneDrive
22:25
it will automatically sync to the cloud. And suddenly I can read the file from the sync destination, because it is also in the cloud
22:37
Azure, Office 365, Power BI, hello, we are all living in the cloud
22:44
Suddenly we don't need to go to the gateway to grab the data. However, we are going to need to change
22:49
some of the code in the beginning that deals with how to get to the file
22:54
because we're going to be needing to use what's known as a web API connector instead of a normal
23:00
CSV data source. And there we have it. That's how we're going to do this. Okay, how do we change
23:08
this? Well, we're going to be modifying it in just a second. Let me show you a demo of how to do this
23:18
We're going to start by creating ourselves a new data flow, so we're going to go create dataflow.
23:23
This is in the service. We're going to start with defining new entities, and here we have all
23:30
the different data sources that we can use, and we get more and more every day. We're going
23:36
to go for the web API. I am going to just input the URL to my OneDrive destination. I've also
23:45
already signed in. So I don't need to do that. And then I click next and suddenly, boom, Power Query
23:53
can read the file that I stored in my OneDrive. And it looks exactly the same as the one that I have
24:02
on-prem. So what I need to do is change the source. Or, let's do this instead. Let's
24:09
just steal this whole thing and paste in the M code that we already had. So I'm using the starter
24:17
part of the graphically created data flow, and then I'm pasting in the M code from the previous
24:25
steps that I did in Power BI Desktop. Then I can just call it, again, the something-something
24:33
weather, and I store the data flow. I'm going to save the data flow as Weather Data SF 2016
24:44
But danger, Will Robinson, danger, because this is where you get the refresh now
24:51
or set a refresh schedule. What is a data flow? Well, it is essentially a
25:01
set of Power Query statements, right? It does not contain the data yet
25:10
I need to refresh at least once in order for the data to be pulled in
25:15
and be stored inside the data lake. And either I can do a refresh once
25:20
that means that the data will never change, or I can set a schedule that gives us a scheduling operation
25:28
like once every day. And depending on whether you're running Power BI Free, Power BI Premium, or Power BI Pro, you're going to have a different max number of refreshes per day
25:41
All right. So remember what I told you, I pasted in the whole shebang
25:47
And essentially what I need to change is this: instead of going to C:\temp or wherever I have my file, it is now going to be pointed at the OneDrive destination, and OneDrive is actually SharePoint underneath
26:09
so I have a SharePoint URL. Kind of neat, right? And again, I reuse all the work that I did in
26:19
Power BI Desktop in order to apply that to my Power BI data flow
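In code, the change is only in the very first step; every step after it stays exactly the same. A sketch, with a made-up SharePoint URL standing in for the real OneDrive sync destination:

```m
// Before, in Power BI Desktop: a local file
// Source = Csv.Document(File.Contents("C:\temp\noaa_sf_2016.csv"), [Delimiter = ","]),

// After, in the dataflow: the synced copy, fetched through the web API connector
Source = Csv.Document(
    Web.Contents("https://contoso-my.sharepoint.com/personal/alexander/Documents/noaa_sf_2016.csv"),
    [Delimiter = ","]
)
```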
26:28
So again, the whole refresh thing: when you set up a refresh schedule, or when you refresh the data flow
26:37
What's going to happen is the data flow is going to reach out to the data source and pull the whole thing through and store the result in the Power BI desktop
26:49
No, no, no, no, no. In the Power BI data flow. I mean, Power BI something, and then you can insert so many different words
26:58
Yeah. Okay. But there are a few limitations, and some things that you might be used to being able to do in Power BI Desktop and the
27:14
Power Query workbench. There is what's known as advanced data flows, because if you want to do
27:25
more complex things, such as referencing other entities, then we'll add what's known as
27:32
the enhanced compute engine, or the enhanced data flows engine. This is part of Premium
27:42
And remember what I told you about Premium: it used to be outside of the grasp of most
27:49
people, because Premium, well, a P1 costs roughly $5,000 a month. And one of the interesting things with working with predominantly European customers is that
28:01
most of them, they just scream bloody murder when they see the price tag
28:05
Yeah, no, we're not going to touch that with a 10-foot pole. Now, you can always argue that what you get if you buy Premium
28:12
is so many fantastic things. And I totally agree. But $5,000 a month is a huge sum
28:21
So what just came out and what will be publicly available on April 2nd
28:28
(my birthday; that's a heck of a birthday present) is Premium Per User. Yes, that means that you can
28:37
license the Premium functionality, or roughly 90 percent of it, as a single-user thing. So instead
28:46
of buying the whole shebang for five grand a month, you're going to be paying twenty dollars
28:51
per head, and that is a steal. Holy cow, you're going to be able to do so much fun stuff with
29:00
Power BI Premium Per User. But I get ahead of myself; back to the whole computed entity
29:06
and linked entity thing. A computed entity is when you reference things in Power Query. Maybe you want to
29:12
do an aggregated version of the same query; that's when you reference another query in Power Query
29:19
That's a computed entity. You can actually do the heavy lifting inside of the Power BI enhanced compute engine
29:26
and not tax the data source. That's kind of cool. And the linked entity
29:31
well, that means that you're referencing another data flow. So you can kind of make a multi-step data flow, if you will
29:40
One data flow that does something, and then you have another data flow that references that, and then you reference that, and so on and so forth
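A computed entity, sketched in M (the Crimes entity name is hypothetical): the second query starts from the first entity instead of from the original data source, so the aggregation can run in the enhanced compute engine rather than taxing the source.

```m
// A computed entity that references the "Crimes" entity in the same dataflow
let
    Source = Crimes,  // reference to another entity, not to a data source
    Aggregated = Table.Group(
        Source,
        {"PdDistrict"},
        {{"Crime Count", each Table.RowCount(_), Int64.Type}}
    )
in
    Aggregated
```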
29:47
And also, if you buy Premium Per User or you buy Premium
29:52
you get incremental refresh for Power BI data flows. And that is huge
29:58
Incremental refresh basically means you have a high watermark. So if I load, the first load is going to be all the data
30:05
say 2 billion rows. And then you're going to set up a schedule that says every week
30:11
or every month or every day, we're just bringing in the new rows
30:15
That's incremental refresh. And, oh, yeah, that is a lot of fun
30:20
And it is super, super fast as opposed to bringing in all the rows of the data set.
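Under the covers, incremental refresh boils down to a filter over two datetime parameters, RangeStart and RangeEnd, that the service slides along the watermark. A minimal sketch (the Date column name is an assumption):

```m
// Only the rows inside the moving window get pulled from the source;
// the service substitutes new values for RangeStart and RangeEnd on each refresh.
Filtered = Table.SelectRows(Typed, each
    DateTime.From([Date]) >= RangeStart and DateTime.From([Date]) < RangeEnd)
```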
30:26
So let's recap what we've seen so far
30:32
It turns out that this is almost the same Power Query code
30:38
You can get away with stealing roughly 90% of the Power Query code that you've done in Power BI Desktop
30:48
You can just take that and dump it into a Power BI data flow
30:52
That's kind of neat. Again, I'm not lazy, I am efficient. Some features do require Premium, and the first time I did this session I had a lot of people go
31:05
oh, this is so cool, and then I said it needs Premium, and then we went, nah, it's not so cool
31:10
anymore. But again, that is no longer a hurdle, because $20 per head, wow. And always consider
31:19
the refresh, because if you do not have a refresh schedule set on your data flows
31:28
your users can push refresh to their heart's content. The data is not going to change up in
31:36
the reports. So just be aware of this. So is this the fix-all? No, no, no, no, no. This is one of the
31:49
tools you're going to be needing to build your empire. If we go beyond the data flows, we're
31:54
going to find the next step. And here is where I find that most people conceptually kind of trip
32:01
over themselves. So let's talk about what's not in the data flow. Remember the phenomenon when we
32:09
brought in the weather data, right? This was due to not having any relationships between the new
32:14
weather table and the already existing date table. This is easily solved because I just put in a link
32:24
I have my data model and I just pull in and do the connections
32:31
This is something that people can forget or do wrong. And despite using a good data flow, they might end up with a seriously funky report
32:43
There are more things that are not part of data flows, such as
32:48
measures. The entire report that I've shown you has one measure, and that is rare; normal reports
33:01
have a lot of measures, but I just need one measure to do the demo, and it looks like this:
33:07
it is just a count of the number of crimes, but depending on the filter context of the report,
33:12
it is pretty much the key part of the entire report. And these are part of the data model in the data set
33:19
but they are not part of the data flow. Make sense? Because the data flow is the data source
33:27
and now we're starting to cook something on top of the data source
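As a sketch in DAX (the table name is an assumption), the demo's one measure is essentially:

```dax
Number of Crimes = COUNTROWS ( Crimes )
```

Simple as it is, the filter context of each visual slices it per district, per category, and per month.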
33:33
But the funny thing is, you can share the data set instead of the data source
33:40
then all the good stuff we've already put into the data model and the data can be shared
33:49
So basically what we're doing here is we're having one data set based off one data source
33:57
and we're doing multiple reports based off that data set. So the whole thing looks like this, where we have the data source all the way out to your left
34:12
and we're doing the data wrangling inside of Power BI data flows. We're dumping the result
34:17
into a Power BI data flow (it's going to be in a data lake), and then we're building
34:23
a single data set on top of that data flow or that data source
34:30
And then we're producing multiple reports based off that data set. So in essence, what you're going to end up with is a PBIX file with just the connection
34:43
to the data source and the different tables, such as date tables, extra dimensions, measures, whatever you want to have in it. That's it. There are no visuals in that file. And then you're creating a PBIX file with no data in it,
35:02
but you just connect that to the data set. Now, the very, very cool thing came out very, very recently, and that is the ability to
35:13
take data sets and add your own data to them. Because previously, this meant that the reports that you made
35:22
they were not able to bring in any other data than was already inside of the data set
35:28
And yeah, it was nice, but it kind of sucked because you could not use a data set that somebody else had done
35:36
and add your own data. Now you can. It's in preview and it has some pretty funky behavior
35:43
but it's heralding where we're going, and that will be absolutely fantastic
35:50
I can't wait for this feature to get into GA, but definitely do look at it already
35:57
because it is a total game changer when it comes to bringing your own data
36:02
and any kind of self-service BI. That kind of brings us back
36:09
to the self-service part of things because how do you find the right data set
36:17
How do you find the right data source? There's a saying here that you're not seeing the forest
36:22
because of all the trees. And that's kind of where you might find yourself
36:27
with a lot of data sources and a lot of data sets. You're literally going to drown in them
36:33
So there is a way to add an endorsement to either a data flow or a data set
36:38
And I highly recommend you do this. It basically means that I can promote a data flow or a data set, and it is going to be at the top of the list of data flows or data sets
36:57
Or I can certify it. So what does it mean to certify a data source or a data flow or data set
37:04
Well, it means absolutely squat. Nothing. It is a nice tag, and that's all there is to it. So you need to add your own methodology
37:17
behind the certification. You can specify who is able to add a certified label to a data flow or a
37:25
data set, that's easy, but you need to implement a methodology to certify the data before you add
37:33
the tag. That's not part of the tool itself. Okay. So to wrap up the whole data flows discussion
37:45
we're going to find basically this. So we have the data flows, because the data flows are going
37:52
to be the data sources, the original data sources, CSV files or databases or whatever it is
37:58
and we are bringing this into a data lake through Power Query
38:05
We're going to do a number of steps, and then we're going to take the code for these Power Query steps, the cogs
38:12
and then we're going to take the data and dump the whole shebang inside of the data lake
38:18
Okay. Then we have the data sets, which are based off the data flow
38:26
So we have reused a lot of the code for the heavy lifting for the data sources, and we put that in the data flow
38:36
Then we're taking the data sets and reusing date hierarchies and date tables and all the kind of good stuff that we want to have inside of a data model
38:49
Here we also create all the measures, and hopefully we're going to hide all the irrelevant data that is not really usable for the next step
38:59
And then we have the cooked and ready data set, ready to be handed over to the analyst that is going to do the report
39:09
And the thing is, as I told you in the beginning, we have different kinds of people doing this
39:19
Back in the old days, there were a lot of different kinds of people doing this
39:25
For instance, if we go back here to the data flows, that's the ETL stuff
39:28
That's the domain of the data engineer. That's a specific skill set, right
39:38
Here we have the data analysts for the most part, or the data modelers. All right. And then we have, most of the time, the business people. The thing is, you're not going to find different people doing this these days. It's probably
39:54
going to be the same people. And especially if you're doing this from a self-service BI perspective
39:59
well, people are going to be doing all these steps by themselves. Okay. So how do we move forward
40:09
from this? Because we are now looking at an embryo of a self-service,
40:17
a working self-service environment, where we do not endlessly just code. We're going to be
40:26
reusing everything, we're going to have good order, and all that stuff that we asked for in the
40:30
beginning. Well, we can look at one pretty interesting thing, and that is the data lake
40:37
Because it turns out that all the data flows we've used so far, they're using an internal data lake
40:46
inside of Power BI. You can't touch that, it's just there, and you don't have to worry about it. It's
40:53
fine as is. And I'm going to go ahead and say that 85% of people are more than happy to just
41:01
leave it in the Power BI data lake. But for the last 15%, the people that realize, wait a
41:10
second, I can use this to my advantage. Remember what I told you very, very quickly about CDM and
41:16
Dataverse, right? So CDM and Dataverse, it's the same thing. It changed name from Common Data
41:22
Model to Dataverse a while back. Dataverse is a specified nomenclature for storing data entities
41:31
It's basically a file system inside of a data lake where you know exactly how it works
41:38
It has some data. It has some JSON as a definition for that specific folder
41:45
Instead of using the Power BI inbuilt data lake, you can move all your data flows into
41:55
your own data lake. Why would you want to do that? Well, you can read and you can write
42:06
to the data flows from a third party. You could, if you were so inclined, write your own data flows
42:15
and dump them inside of the data lake. No Power Query or Power BI Desktop ever involved
42:25
You can do this programmatically and then have it accessible to the next step
42:33
And I've seen some pretty nifty ways of doing things because, again, you can programmatically do this
42:40
So in one case, as soon as they added a new data source in their environment
42:47
they also programmatically created data flows for the different entities, automatically. All was done through DevOps
42:57
But it also means that I can read the data from the data flow in a third-party tool
43:06
Say, for instance, that I want to use an Azure function that goes to the data lake
43:09
and just picks up the results, the result set, I should say, from a Power BI data flow
43:17
Well, I can do that because it is stored in a data lake, my data lake
43:22
I used to give a word of warning: don't try this
43:26
unless you really want to go for your own data lake, because you could not disconnect your own data lake
43:31
You were kind of screwed. This has since been fixed. So now you can disconnect your data lake
43:37
That's a good thing. Okay, let's recap the final part here, because what more can we do, or I should say recap
43:49
We can put things into Dataflow workspaces. This is also a highly recommended best practice to put all the Dataflows in their own workspaces
44:02
And this is a great way of separating stuff from each other because you can put reports and data sources and data flows in the same workspace
44:12
Just try to avoid that. Have a specific workspace for data flows
44:17
And then you reference that from your other workspaces and thus from your other apps
44:23
That is definitely the way to go. You should probably implement a data catalog
44:31
A data catalog is this mythical place where all the metadata of your data is. Why would you want to have that? Well, if you have one or two or 15 data sources it really doesn't matter, because it's going to be
44:48
in the head of whomever is working with that. If you come to that person and say, I need the sales
44:55
data for last year, where do I find that? You're going to get a file or a link and just go on your
45:00
way. But if you have thousands of data sources, that's a whole different kettle of fish. And you
45:07
might have raw data sources that have just been put in from the source system. You might have
45:13
different levels of cooked data sources where you look at the changes made from a previous step
45:24
How do you figure out where all your data is? That's where a data catalog comes into play, and
45:29
that's where you might want to look at Purview. Purview is in preview, but a word of warning with Purview: look at the pricing
45:41
I've had a few friends get burned on Purview. It's a fantastic tool, but be aware that it is not free or cheap
45:55
And then you want to establish a data culture. That is four words that are very, very simple to say and so hard to do. A data culture is all about
46:09
getting people involved, and if you have a proper data culture, then a lot of the issues that my
46:16
customer from the beginning was facing, they're going to go away. But a culture is something
46:22
that gets implemented over time. And I highly recommend the blog of Matthew Roche; he's on
46:29
the Power BI CAT team at Microsoft. He talks a lot about data culture and the necessity for
46:36
creating a data culture, and also about why you need one, because that's going
46:43
to be a whole session in and of itself. So, to summarize this entire session:
46:52
Order is absolutely necessary. If you just turn people loose on Power BI and go
47:01
yes, we now have a self-service BI solution. It is the equivalent of pulling the pin in the grenade
47:07
And as the saying goes, when you pull the pin, Mr. Grenade is no longer your friend
47:12
And you're going to end up in a ton of issues unless you implement strict order and do so from the beginning
47:22
Reuse will increase efficiency. And again, I'm not lazy. I'm efficient. I want to reuse
47:31
Not only because I can keep my code in one place, but also I'm prone to making mistakes
47:38
And if I reuse code, hopefully I'm reusing debugged code, so I'm not introducing any more mistakes
47:45
Yes, that means that if you reuse code that is full of bugs, well, then you have more issues
47:50
But that's a whole different story. And empowerment will ensure collaboration. People want to do the right thing. That's just the way people work. But the more limitations you set and the more difficulties you put in their way, the more creative they're going to be to find a way to get to their perceived end goal
48:14
But if you empower them and give them the tools and the training and the guardrails, then people want to really do good stuff
48:26
So in essence, stop inviting chaos and just hope that things will work out for the best
48:35
Because take it from my friend, my customer: they won't. Instead, use the tools and the people available to work smarter. Make sure that the business is
48:49
on board, and always have a very, very clear why as your guiding star. Take everything you've learned
48:56
in these 50 minutes, and let's go build an empire with Power BI. I thank you so much for your time
49:04
My name is Alexander, and if there's ever anything you need, don't hesitate to hit me up on
49:10
Twitter or send me an email or come grab me at a conference when we can meet in person
49:16
I'm so much looking forward to that. That's it. Thank you so much