0:00
Welcome to my session. Welcome from Austria, Europe, and welcome to 45 minutes of Power BI and Synapse Analytics. As Simon already said, well, my name is Wolfgang. I'm a data platform MVP. I work as a data consultant in Austria, and I like to work with data
0:24
I started in the data warehousing field and quickly, well, moved into the reporting, data modeling, everything like that stuff
0:35
So in the last five years, I was very, very focused on Power BI and everything around the data platform
0:43
And now with the Synapse Analytics, the new stuff that I want to show you in the next minutes
0:50
well, Power BI gets some real power behind the scenes to get data from almost everywhere
0:58
to prepare the data. And that's the topic of today. And what I also like, yeah, it was already in the video
1:07
I like to speak at conferences. I like to organize conferences. And I'm one of the authors of the Power BI for Dummies book. It's in German only, but if you want to learn German
1:19
and Power BI, that would be a great idea. So why is there a requirement for yet another data
1:28
preparation tool? And well, almost every business, no, let's leave out the almost: all businesses are
1:37
data businesses, because data is essential for companies, for everything to be successful
1:45
And almost 15 years ago, there was one of those sentences which started to be famous
1:54
and it was the "data is the new oil" sentence. And that one was said in 2006. And there is just
2:02
one thing to add. Data itself, it's not the new oil. It's only valuable when it's refined
2:13
But if it's unrefined, you cannot really use it. And so it's also important to have a look at
2:20
data like water. Water needs to be accessible. Data needs to be accessible. And it needs to be
2:27
clean and it is needed to survive. And if you compare water and data with data for the business
2:37
well, if you have clean data, if you have combined data, you are ready to go and survive on the
2:44
market. So that is one of those things to remember when talking about every reporting or data
2:51
platform because data, well, data is good. But if you have clean data, if you have data that is well organized, that can be accessed
3:02
in a good and usable way, it's a good start. And there are two big streams when talking and dealing with data
3:16
It's the good old data warehousing approach on the left side. So for all those business analysts out there, that's the approach where data is structured
3:26
Data is extracted from source systems. Data is transformed into a way that you can build reports, that you can build a data model based on the data structure
3:37
And data warehouses, well, they are really a defined structure because you are preparing the data
3:45
you are preparing your dimensions, like customer lists, product lists, and everything like that
3:51
And on the other side, you have your measures, like the numbers you want to count, the numbers
3:56
you want to sum up, like your revenue, your costs, and everything like that. And that is predictable
4:01
because you know what are the reports that are required. So you can build your data warehouse
4:07
in a structure that is, in most of the cases, known upfront. On the other side, well, on the
4:14
right side, we've got the data science approach and the data lake and data scientists
4:21
that are working with the data. So what is the difference between data warehousing, the structured
4:27
reporting approach on the one side and data lake and data scientists on the other side? Well
4:34
data scientists, they work, usually work with huge amounts of data. Imagine a machine
4:42
in your production hall, which is equipped with 60, 100, thousands of sensors, and they produce
4:50
measurements every second, every 10 milliseconds or whatever. So you get huge amounts of data
4:57
and that data is required to build some sort of machine learning models on it. That data is maybe
5:04
a foundation for predictive maintenance. And in order to build machine learning models
5:10
data scientists need to work and need to explore data in the data lake
5:17
And the data lake is a concept for those of you that don't know it. Well, it's a concept of somehow cheap storage to put and store your data there
5:26
but a powerful storage combined with the right technologies to query the data
5:33
to explore the data in there and to build or analyze the data to build machine learning models based on that
5:40
So two different worlds, the predictable, well-structured data warehousing side. And on the other side, we have the data scientists that do data exploration, that have huge amounts of data and also changes in the structure of the data
5:55
For the data warehouse, where we have a customer table from SAP, for example, we have revenue information, we have orders coming from SAP and so on and so forth
6:07
But for the data lake, we have sensors. We have sensor values
6:12
And over time, machines get better equipped. They get more sensors. There are changes in the measurements
6:18
So the structure changes on the side of the data lake storage
6:23
And also, there are three big parts. We've got the data part, in the upper right
6:32
We've got the skills approach. And we've got the technology cluster over there. So there are many options for every one of those three pieces. Like the data: we've got
6:43
streaming data, like sensor values that are streamed to your data platform. There is structured
6:51
data coming from SAP, unstructured or semi-structured data. There's big data, there's small data
6:57
there's technologies that work with data in the cloud or on-premises. There is Internet of Things
7:05
So completely different sets of technologies and data. And the technologies that are used to analyze the data, well, they
7:12
range from data warehousing software like SQL Server, Oracle, SAP Business Warehouse, something like this, to data lake
7:20
software to analyze the data, like Apache Spark, open source software. And for the skills: people are essential to work with your data, and the skill set of your colleagues, well, it can range from a SQL developer
7:36
As I started my career, I worked with SQL Server, and I prefer to work with SQL Server
7:43
and people working with Python or R or Scala or whatever language
7:49
or specialists in data modeling, other specialists doing data cleansing, data engineering
7:55
So we've got many different approaches, many different directions people are coming from and data is produced and data is coming
8:05
in every color and shape, as you can imagine, and also the technologies
8:10
And those different technologies and combinations, well, they can lead to data silos, too
8:19
So they can, well, somehow lead to a solution where a colleague says, no, I'm not able to work with SQL
8:28
I'm only working with Python. The other colleague says, I'm not into Python
8:33
I'm only working with SQL. And the other one says, I'm working with the data lake, although I'm working on a data warehouse
8:40
So it's not that easy to combine all those different streams of interests
8:48
And in the good old times, well, I'm in the field and working in the data consultants field for some years now
8:57
And the good old times, well, it was the data warehouse. The one I've shown you on the slide, which is the structured data storage
9:07
You have your dimensions, you have your measurements, you have your well-defined reports and everything like that
9:13
And now with the change of requirements in the engineering field, well, is that concept of data warehousing still relevant
9:22
And the question, or the answer, is: yes, it is relevant. But I'm a consultant, so there's a "but" that I have to add here
9:35
And whenever we are dealing with those customer requirements, with those workshops, well, it's like you have a wall with, I don't know, 100 different sheets of paper
9:46
And you have different opinions. You have different source systems, formats, like data generation intervals and everything like that
9:55
So it's not that easy to find that one solution that fits it all
10:00
So maybe there's some sort of directions we could go. Either we go to the left side or either we go to the right side
10:09
Or is there another direction we could go, like thinking about working on premises, like in your data center, or using the cloud to solve your requirements
10:21
And what we've seen is the pace of technology. It's not possible to solve everything on-premises
10:31
Because with the cloud, especially with the Azure cloud, you get new technologies, new services, updates, new features in a frequent way
10:41
That's not possible for any of those administrators to install on your local machine
10:47
So let's have a look at a proposed architecture of a data warehousing solution in the cloud
10:56
And that is the data warehousing architecture before we got Azure Synapse Analytics
11:02
So what does the data need before it can be shown in a report like Power BI report on that side
11:12
Well, we've got data sources and data can be produced everywhere. It can be produced on premises like your SAP system, your ERP system, whatever vendor you are using
11:27
It can be on premises. It can also live and be generated in the cloud
11:31
It can come from devices like your production machine. It can be generated by a temperature sensor or whatever
11:40
And there can be software-as-a-service data providers that are needed to get your overview about your enterprise data
11:52
And that is the idea behind the data warehouse to integrate data from different sources and to get the overview about your data, your company's data
12:03
When we've got the data sources identified, where is my cursor? It's here. We need to ingest the data somewhere
12:15
We need to fetch the data from our sources and store it somewhere
12:21
We've got two pieces. We've got the storage and we've got one technology that does
12:27
the data transportation and transformation. Before Synapse, well, we had the Azure Data Factory that did and does a good job to transfer the data from source into some sort of storage
12:42
The storage in that data warehousing architecture, well, it's an Azure Data Lake storage that holds the data
12:50
Now, we've got those two big streams I already talked about, the big data data science approach and the data warehousing approach
12:58
And for those two streams, well, Microsoft had two answers. It depends if you want to work with big data, have a look at Azure Databricks to read the data from the storage, prepare your data and write the results, for example, from machine learning models and so on and so forth back to the storage
13:20
For the data warehousing approach, there was and is the Azure SQL data warehouse, which is now part of Synapse Analytics
13:30
So different ways of data preparation are needed and different technologies were needed
13:38
So there was the data factory to prepare the data, big data approach using Azure Databricks and SQL data warehouse
13:47
And on the analytics side plus the reporting side, we had some sort of systems like Azure Machine Learning, Power BI, and so on and so forth to work with the data that is stored in the data warehouse
14:03
And now, well, those systems, they work great. They do a great job, but those are separate systems
14:11
So we have Data Factory, Databricks, and SQL Data Warehouse, and they have, for example, different security approaches
14:18
And Microsoft said, well, it's a good approach, and those tools are great
14:23
but maybe we could put a new thing, a new label and use some sort of umbrella to solve the data ingestion, the big data and
14:36
the data warehousing side. And what they introduced was Azure Synapse Analytics
14:42
And what they did, they did not invent or reinvent the wheel
14:46
So they didn't write a new data transformation engine. Well, they took the source of Azure Data Factory and they integrated it into Synapse Analytics. They took technology that Databricks uses and they integrated it into Azure Synapse Analytics
15:07
And I already mentioned it, the SQL data warehousing part, well, it's integrated in Synapse Analytics too
15:15
So just to get a little bit closer and into the Synapse Analytics environment
15:21
Well, we've got the sources and we've got the data storage on the bottom, the gray part
15:29
It's Azure Data Lake storage, cheap storage, powerful storage, very powerful storage
15:35
And that is used to store our data. That's the brain of our system
15:41
And then we've got work to do. We have to do some ytics
15:46
And there are two different flavors. I already talked about the skills of the people
15:51
So we've got the SQL skill, so a SQL-related analytic approach. And for the big data approach, we've got Apache Spark based analytic runtimes
16:05
And what is really important are those light blue boxes. It's the integration
16:14
Well, that's the thing to get data into the system. It's the overall management
16:20
So one central management to manage the data lake storage, to manage the data integration
16:27
to manage the SQL and the Spark runtimes. And also if you have implemented something
16:32
well, you need to monitor it. And one of my favorites, security is needed
16:38
And you've got one security model in Azure Synapse Analytics and no different security mechanisms for Databricks
16:47
for SQL Data Warehouse, for Azure Data Factory. And for the developer, yeah, we've got Azure Synapse Studio
16:55
That is the thing where you start your development, where you monitor your development, where you configure your security and do the management
17:03
So it's the single point of work to do or to start
17:07
And also on the right side, those Power BI and Azure Machine Learning boxes are still there
17:15
And that's one thing. Microsoft said we are integrating the main parts into the Synapse umbrella
17:22
but we have the concept of connected or linked services. So we can integrate like, for example, Power BI or Azure Machine Learning
17:31
or other storage approaches like another data lake storage account, Azure SQL databases, and so on and so forth
17:39
And they work very well with the Synapse. And now, well, many technologies, one umbrella, it should work together
17:49
How does it really look like? And that is the thing I would like to show you in demos
17:55
So we will see the SQL approach, the Synapse SQL approach in the demo
18:02
We will see the Apache Spark approach, so the big data analytics approach
18:08
We will see how to integrate data into the Synapse. And, well, Synapse Studio is the tool of our choice where we can work with the data, where we can work with the artifacts in the Synapse
18:22
And now it's a little bit of different view. It's almost the same I already talked about
18:30
There are two different flavors of the analytic runtimes, and I will go into detail in the demo for that
18:37
And there's one thing on that slide, it's the languages part. I talked about the SQL and the Spark-based data analytic runtimes, but there is also the choice of your language
18:56
So if you want to work with Python, .NET, Java, Scala, R, whatever, or with a SQL, well, it's your choice
19:07
which language you would like to use to work with Spark. That is a really great approach I've seen in our projects
19:16
I've said it before, I'm coming from the SQL side. I'm a SQL developer
19:22
I started development with databases in SQL Server. In one of our first projects
19:30
I started with the ytic data platform and the cloud and so on and so forth
19:36
Well, I had some hard times to solve problems with SQL, 25, 50, 100 lines of SQL
19:44
And then my colleague, my data scientist colleague, they came and said, hey, well, it's two lines of code in Scala
19:54
It's one line of code in Python. You could do it that way. And then I started to look around the corner and tried some other languages
20:02
And there's a great approach of notebooks built into Synapse that allows you to switch languages during your development
20:12
So that is a very, very nice approach if you know Python, for example
20:16
And if you know a little bit of SQL, you can combine those two different approaches in one of your development approaches
20:23
and the overall concept of Synapse Analytics and all those services and data storages and
20:31
technologies and outside or linked services to Synapse, well, and also the name of the system
20:39
when we think about our brain, we've got brain cells and we've got some synapses in there
20:47
And we learn, we as a human, we learn when there are additional connections between our brain cells
20:54
And that is the thing where I think Microsoft had some sort of idea: hey, if we combine data living, well
21:05
yeah, living on different islands of data, and if we combine them, we can learn and we can get better
21:13
So Synapse Analytics, well, it has some sort of meaning in the background
21:17
And with that, I'm going to start my demo, and we're going to use Synapse Studio
21:25
So just one question for Simon. Do we have some questions so far
21:33
Not yet, not yet, Wolfgang, but everyone can see your screen. You are good
21:38
Okay, perfect. So Azure Synapse Analytics, it's an Azure service. And if you instantiate one, well, if you go here and we want to create a new Azure Synapse workspace, well, where is it
21:58
What we create is the umbrella. And that umbrella needs to be filled with the different pieces
22:07
So what is required in Azure Synapse Analytics? Well, it's called a workspace
22:14
It's a SQL or sorry, it's a data lake storage account because Synapse needs some place to store the metadata
22:23
So for a Synapse workspace, there is the need of a primary data lake storage account. And if we want to analyze data, we have to instantiate and configure additional pools, runtimes, analytic runtimes
22:43
And if we have a look here on the overview screen, what we see here is we've got in our demo environment, the SQL pools here and the Apache Spark pools over there
22:56
And I've got two Apache Spark pools in my demo environment. So let's see here
23:03
And those Apache Spark pools, they can have different sizes. And Apache Spark pools, it's always the question of money and costs
23:14
Apache Spark pools are billed by consumption. So you can configure a Spark pool and you only start and use it whenever you need it
23:25
And when the work is done, you can shut it down. And it's almost the same for one kind of SQL pools over here
23:33
We've got two different flavors of SQL pools. We've got the dedicated SQL pools
23:40
and those are the SQL data warehousing, well, yeah, database systems. And they are compared or can be compared
23:51
to the Spark-based billing approach because you start your SQL dedicated pool and it's billed
23:59
If you pause it, you don't have to pay for the compute
24:04
You only have to pay for the storage. And there's another concept of SQL pool, which is called serverless pool
24:12
And that is a very, very nice approach. You will see it in the demo that you can use it to analyze data in the data lake
24:20
And that is just there. And it scales whenever it needs more performance
24:25
and it just shuts down or scales down whenever it's not needed anymore
24:30
And that one is billed by the amount of data you are processing
24:35
So those are the things we put together and we put into our Synapse workspace
24:42
Now, Synapse Studio, it's the development environment. And I've got one workspace over here, the Kubito Synapse Playground
24:52
And what you see here, those are the four, well, main buttons, main menu items over here
25:02
What we could start with is, we could start with ingesting data, exploring and analyzing data, and/or visualizing data
25:12
But if you are new to Synapse, I would recommend to start with the fourth button over here, which is the learn button
25:21
And within that learn button, or it's also here in the menu, the knowledge center
25:29
If you start in the knowledge center, you get a huge list of samples
25:34
You get a huge list of data sets, public data sets that you can use and start your data journey in Synapse
25:41
So you can use or create samples directly. So if you want to learn SQL, the SQL serverless approach in Synapse, you can create a sample
25:54
You can create sample data and sample notebooks for work with Spark
26:00
Or we can browse the public data set gallery, like COVID data, taxi data from New York, and so on and so forth
26:09
And you can just start over here if you want to analyze the New York taxi data
26:18
And you've got the description, an overview, a preview of data, what is in there
26:22
So I would recommend to have a look at the Knowledge Center if you are new to Synapse
26:28
But now let's dive a little bit deeper. What you see on the left side is the, where is it
26:35
Where's my mouse? It's here. They are called hubs. Well, it's those menu items
26:41
And there are the Data, Develop, and Integrate menu items. And those are the three I would like to start with
26:49
The data hub is there to browse your data artifacts in your Synapse environment
26:58
So we've got one tab over here listing the databases. And when we have a little bit of a closer look, we can see that we have different icons over here
27:12
So we've got three different icons. And those are the three different SQL or not, sorry
27:19
Those are the three different analytic pools. So the green ones, those are the dedicated pools, the data warehousing ones
27:26
And they are powerful because it's not a normal SQL server. It's a cluster
27:32
Imagine you have 60 compute nodes, you have 60 different databases in the background
27:39
Data is distributed, data is replicated, and so on and so forth
27:43
So that is the powerful engine to solve your data problems when dealing with the data warehousing approach
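As a rough sketch of that clustered approach (the table name and columns here are hypothetical, not from the demo), a table in a dedicated SQL pool declares how its rows are spread across the compute nodes:

```sql
-- Sketch: a hash-distributed fact table in a dedicated SQL pool.
-- Rows are spread across the pool's distributions by hashing CustomerKey.
CREATE TABLE dbo.FactSales
(
    CustomerKey INT NOT NULL,
    OrderDate   DATE NOT NULL,
    Revenue     DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);

-- Small dimension tables are often replicated to every compute node
-- instead: DISTRIBUTION = REPLICATE
```

Picking a good distribution column matters because joins on that column can then run without moving data between nodes.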
27:50
The red ones, those are databases created for the use of the SQL Serverless approach
27:57
So for data exploration, for data preparation in a little bit. And the last, but not least, the blue ones, those are Spark databases
28:08
So coming from the Spark analytic runtime to, well, do big data analytics, to store the data not only back to the data lake, but to store it in Spark databases
28:22
So that is the first thing. So we can have a look here. if you know SQL Server Management Studio
28:28
it's similar to that one. So we can have a look at the view
28:34
we can start that one, we can query the view, we can query the data
28:39
and we can have a look at all those structures like the data warehousing approach
28:45
where we have tables, where we have some sort of tables in here
28:51
and you can have a look at the columns and so on and so forth. So nothing really special over here
28:57
On the other side, we've got the linked services. So those are the data storages that
29:03
are attached and connected to Synapse. We've got our, or one, data lake
29:08
storage attached to our Synapse workspace, and we've got the containers in there
29:16
What we can do is we can browse those containers. We can have a look at some files in the data lake
29:23
like that one over here, and we can do a right click and some sort of preview
29:28
So what is in that file in the data lake? Imagine we are now, well, working as a data scientist
29:36
We want to know what is stored in the data lake, what can be achieved in the data lake
29:41
what is the structure of our files? So you can just do a right click and do a preview
29:48
What we can also do is we can start data preparation, by using or creating a new notebook
29:56
So let's try that one. And what is generated is a notebook
30:02
If you don't know the concept of notebooks, well, have a look at notebooks because they are great
30:09
and they are popping up in different kind of tools. They are in Azure Data Studio
30:15
in Visual Studio Code, in Spark over here. And what's the idea behind a notebook
30:23
It's the combination of source code and documentation and, well, data preparation results
30:30
So you can mix cells, they are called cells. You can have a code cell and you can have markdown cells for your documentation, like over here
30:40
And you can create some sort of markdown documentation over here and write your documentation in there
30:50
You can just edit that and so on and so forth. What you can also do is you can specify the default language within your notebook
30:59
So that is the thing, Power BI developers, well, you can select the language of choice
31:07
Unfortunately, we don't have Power Query M here, but we will have some sort of Power Query in the data integration, data ingestion part
31:18
So you can specify the language of choice and you can override and change the language within every cell
31:27
So you can start with PySpark, you can change it to SQL in the next, and you can change it to Python, for example
31:35
And all those results generated by one cell can be reused in the next cell
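As a hedged sketch of that language switching (the storage path, file pattern, and view name are hypothetical), two notebook cells chained via cell magics could look roughly like this:

```
%%pyspark
# Cell 1 (PySpark): read CSV files from the data lake and register a temp view
df = spark.read.load(
    'abfss://data@mydatalake.dfs.core.windows.net/population/*.csv',
    format='csv', header=True)
df.createOrReplaceTempView('population')
```

```
%%sql
-- Cell 2 (Spark SQL): query the temp view registered by the previous cell
SELECT town, SUM(population) AS total_population
FROM population
GROUP BY town
```

Because both cells run on the same Spark session, the temp view created in the PySpark cell is visible to the SQL cell.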
31:41
So very, very powerful approach. What you can also do is you can start yzing and exploring the data lake using SQL
31:51
And that one is using the built-in, the serverless approach. So there's a generated script and that script, well, it reads the content of the file and there is some sort of information we need to add here
32:10
The field terminator, like we need to specify the field terminator over here, and we also skip the first row
32:28
So we only need the data. And what we can also do is we can add
32:34
like some sort of structure to the result. We can have a look at the data
32:43
So that one is population data from the part of Austria where I live
32:48
So I can do some sort of selection for the town I live in
32:55
and we can have a look at the population. And what you see here is, well, it's the most recent entry
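The query assembled in the demo looks roughly like this serverless SQL sketch (the storage account, file path, column names, and town filter are placeholders, not the exact demo values):

```sql
-- Sketch: explore a CSV file in the data lake with the serverless SQL pool.
SELECT *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/population2019.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';',   -- the field terminator we had to specify
        FIRSTROW = 2             -- skip the header row, we only need the data
    )
    WITH (                       -- add some structure to the result
        town NVARCHAR(100),
        [year] INT,
        population INT
    ) AS rows
WHERE town = 'MyHometown';       -- selection for one town

-- Changing the file name to a wildcard, e.g. BULK '.../population*.csv',
-- reads every matching file, so a newly added 2020 file is picked up too.
```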
33:03
The most recent entry, it's coming from 2019. And there is one thing we could do and change in that query: we could change that specific file path to some sort of wildcard
33:16
Because in the data lake, we've got a new and additional file, which has the population of 2020 in it
33:26
If we specify and change it and add that wildcard in here
33:30
we get the content of multiple files in the data lake. What we can do now as a next step is
33:40
well, we can store the data in a view. No, we can't store the data in a view
33:50
we can create a view. So we can create, create a view like population data
33:58
and we can create a new database. So I'm gonna create the new database
34:03
which is C sharp corner and create that one. Change to the new database
34:15
So we are now working in the new database. And I create a view. Back to the workspace browser
34:23
refresh over here. So that is here and we've got the view in here
34:30
and we are directly querying the data lake using a view. So what is there for the Power BI developer
34:40
You would say, well, that can be used as a logical layer above the data lake. So you can use the serverless pool to serve as a logical data warehousing layer, for example, for your Power BI reports
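Sketched in SQL, those steps, creating the database and wrapping the data lake query in a view, look roughly like this (the database name comes from the demo; the storage path and column names are hypothetical):

```sql
-- Sketch: a serverless database plus a view over the data lake,
-- usable as a logical layer for Power BI.
CREATE DATABASE CSharpCorner;
GO
USE CSharpCorner;
GO
CREATE VIEW dbo.PopulationData AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/population*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';',
        FIRSTROW = 2
    )
    WITH (town NVARCHAR(100), [year] INT, population INT) AS rows
WHERE town = 'MyHometown';
```

The view stores no data; every query against it reads the files in the lake at that moment.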
34:57
But before we get into that, well, we need to have a look at the data integration thing
35:04
because data needs to come from somewhere into our Synapse environment. And what we have in here is the Synapse pipelines
35:14
And Synapse pipelines, if you have seen Azure Data Factory, they look familiar and they look similar because it's almost 100% of source code that is coming from the Data Factory and is seen in Synapse pipelines
35:32
And there's one thing. So we are almost at 100% because there's one thing missing in Azure Data Factory
35:39
We've got the power query transformation. We've got wrangling data flows like Power Query
35:45
you know from Power BI Desktop or from Power BI or Power Apps data flows
35:54
So that is an approach that will come to Synapse. So skill set of Power Query transformations can be used in the future
36:05
It's not there today in Synapse pipelines to get data into that environment
36:11
In the development section, we've got a way of, well, querying data
36:18
We can have SQL scripts and we can query data. So we can query a table like having 83 million rows
36:31
or we can query, like, data or, well, tables having more than 660 million rows, and it performs joins, group bys, and everything
36:42
like that. In the background, it's powered by the SQL pools. So we've got the powerful
36:49
the clustered approach in the background here. It's the dedicated pools that are working there
36:56
And well, we've got data prepared and we want to yze it. So we've got Power BI on that side
37:05
And what is there? It was on one of the slides. It's the concept of linked services
37:12
And within your Synapse workspace, we can have a look at the linked services
37:19
And as you can see, I've got some linked services already defined, pointing to SQL, pointing to blob storage, to data lake storage
37:27
And there's one linked services that is pointing to a Power BI workspace
37:33
So what you can do is you can create a linked service coming from the Synapse workspace and mapping to an existing workspace in your Power BI environment like that one
37:47
So the Synapse integration demo workspace is mapped into Power BI or in Synapse
37:52
Sorry. And let's go over here. It's found here. So we've got our data workspace over here, including the Power BI datasets already there, and also having some reports
38:09
So we can have a look at the reports in your Power BI
38:14
Oh, sorry. I wanted to show you a demo. Let's see if that one works
38:20
And you can have a look at the reports in Power BI
38:24
But what we can also do is we start to create a new Power BI dataset
38:31
And the Power BI dataset, it can be started here. So we have to select the Power BI or the analytic runtime, the SQL runtime
38:43
and we select the newly generated database and download the PBIDS file, the data source file
38:51
start that one and the data model itself, it needs to be created in Power BI Desktop
38:58
There is no way of creating a Power BI data model in the service as of today
39:05
So Power BI Desktop opens and what we see on the next part over here
39:12
it creates by the use of the pbids file a connection to our new population data view
39:21
and we can load that one into our dataset. Next step, next question, import or direct query
39:30
I'm going to select the import, not today. I'm going to select the direct query option
39:36
What we get now is a simple data model containing one table which is mapped to a view in your SQL serverless approach, which is querying data in your data lake
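For reference, a PBIDS file is a small JSON document that tells Power BI Desktop where to connect. A sketch for a serverless SQL endpoint could look roughly like this (the server and database names are hypothetical):

```json
{
  "version": "0.1",
  "connections": [
    {
      "details": {
        "protocol": "tsql",
        "address": {
          "server": "myworkspace-ondemand.sql.azuresynapse.net",
          "database": "CSharpCorner"
        }
      },
      "mode": "DirectQuery"
    }
  ]
}
```

With DirectQuery, each report interaction is translated into a query against the serverless view, so no data is imported into the model.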
39:52
So we can have a look at the population in my hometown over the years, over the last years
40:04
And it's only my hometown. So it's just my hometown in that data set
40:10
So nothing else is in there. If we go back to synapse, if we go back to the definition of our view, we have it somewhere here
40:25
It should be here. And if I change the alter, I changed the view definition
40:33
I just remove the filter for the city and I've changed the definition
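The change just described amounts to an ALTER VIEW without the city filter, sketched here with hypothetical names matching the earlier view sketch:

```sql
-- Sketch: widen the view by dropping the city filter.
ALTER VIEW dbo.PopulationData AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/population*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';',
        FIRSTROW = 2
    )
    WITH (town NVARCHAR(100), [year] INT, population INT) AS rows;
-- The WHERE town = '...' filter is simply gone, so the DirectQuery
-- report now sees every city in the files after a refresh.
```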
40:41
So the view definition is updated. And if I refresh that one over here
40:49
let's see if we get more data. Yeah, we get more data
40:53
So we can have a look at the, What's there? Oh my God, what is wrong here
41:06
Ah, we need to change it again. Not only the first hundred rows
41:13
change it over here, refresh and it's Data, data, data, wait for the data
41:26
So it should update now. Working on it. Yeah. We've got data from different cities in my home region
41:38
Next step is publish the data set. So C sharp population. and we publish that one to the mapped workspace
41:51
So it's the Synapse integration workspace. Select that one. And there's one technical step we need to do
41:59
We need to go to the Power BI workspace over here and it's here
42:06
And there's one thing we need to do. It's the data source credentials
42:11
It's only here for setting two options, selecting the right user
42:21
And now we are done with the configuration. So that is not possible today in Synapse, but back to Synapse development
42:30
Come over here, refresh, refresh that one. Let's see if the C-sharp population data set is here
42:43
We can create a new report. We can add some filters. Or for the other users, you can just open the already existing report
42:56
And you can have a look and analyze your data without leaving the tool
43:00
So Power BI is integrated here. And there are some parts where Power BI is integrated already
43:08
So unfortunately, you can map only one Power BI workspace to your Synapse environment
43:16
I hope that there will be an option of mapping multiple Power BI workspaces
43:20
And there's one thing within the integration hub that is already there in the data factory and will be there in Synapse pipelines
43:30
It's the Power Query transformation. They are called wrangling dataflows. And that is a very, very powerful way to take your knowledge of Power Query transformations and use it in a concept that is, well, a little bit broader, powered by clusters in the background: Spark clusters, data lake storage, dedicated SQL pools
43:55
And Synapse Studio, you've seen it in the demo. It's the one tool for development in Synapse: to create ingestion, to transform your data, to analyze your data, and even to create reports based on your data models in Power BI
44:15
SQL pools, I talked about that topic. We've got the data warehousing approach and the data exploration, the SQL serverless approach
44:23
And we've got the Apache Spark, choose-your-own-language, notebook approach. It's a very powerful approach where you can work, have a look at the results, and store the results
44:41
and share it with your colleagues. And you can even automate those notebooks
44:46
For the data ingestion part, mapping data flows is one part of the story
44:51
and those wrangling data flows marked with a star, well, they are available in the data factory
44:58
and it's Power Query. You can use it to read data and write it to your Synapse environment
45:06
Power BI and Synapse, they are integrated, as you've seen, create a new data set
45:13
That needs to be done in Power BI Desktop, but afterwards you can use and reuse the data set
45:18
and create, open, view, and work with your Power BI reports directly and integrated in Synapse Analytics
45:26
And for the data integration part, for those of you that know the Power Platform Dataflows approach, like depicted here with the Power BI Dataflows, they are storing the data in CDM folders, common data model folders in a data lake
45:45
And that is one of those stories that can be extended, like generating data using data flows and reusing that one in, for example, Azure SQL Data Warehouse or Synapse Pipelines or Azure Data Factory to read the data, to write it back over here
46:04
and maybe, well, do some data preparation here and write the data to a CDM folder
46:10
that is afterwards used in a Power BI dataset. So huge connections within the data integration
46:20
and data preparation part between Synapse, Azure Technology, and Power BI and Power Platform
46:29
Sorry for that. And with that, I'm nearing the end of my presentation
46:34
Just a short recap. We've got the Azure Synapse Analytics workspace in the middle
46:39
We've got two big approaches of analytic runtimes, the SQL and the Spark runtimes
46:45
central storage using data lake storage, and the huge part of integrated and linked services
46:52
Power BI, other storage systems, and so on and so forth. And is that everything
46:59
No, there's another thing that is coming that is planned for August 2021
47:05
It's the performance accelerator for Azure Synapse Analytics. Well, what's that? Imagine a Power BI report that is used heavily
47:14
that is connecting to a SQL dedicated pool in the background, and it's using almost the same query over and over
47:23
And what the performance accelerator does, it generates so-called materialized views in the background
47:31
And that is a very powerful thing because with a materialized view, you define a select statement
47:40
And the results of that select statement, well, they are cached
47:45
They are stored. They are persisted. And whenever there's a change in one of those source tables, those materialized views are updated
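To make the mechanism concrete, a materialized view in a dedicated SQL pool is defined roughly like this (table and column names are assumptions for illustration; note that materialized views in Synapse dedicated pools require an aggregation with COUNT_BIG(*)):

```sql
-- Hypothetical materialized view in a dedicated SQL pool.
-- The aggregated result set is persisted and kept up to date automatically
-- whenever the underlying source table changes.
CREATE MATERIALIZED VIEW dbo.mvPopulationByCity
WITH (DISTRIBUTION = HASH(City))
AS
SELECT City,
       SUM(Population) AS TotalPopulation,
       COUNT_BIG(*)    AS RowCnt  -- required in Synapse materialized views
FROM dbo.FactPopulation
GROUP BY City;
```

The query optimizer can then answer matching Power BI queries from the persisted view instead of re-aggregating the base table.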
47:57
And now imagine those statements are used in Power BI and the materialized views are generated automatically
48:07
So it's a very, very powerful way of improving the reporting performance and the speed of your results in the Power BI reports
48:18
And to go a little bit further and above, like 10,000 meters above, it's the overall data landscape in your environment
48:27
It's not only Synapse that connects and collects data from somewhere. It's the data that is stored somewhere in your system
48:37
And there you need some sort of data catalog. Data governance is one part of the story
48:42
And there's a new service, which is called Azure Purview, which does the scanning, which does the classification of data, which does the data lineage
48:53
So where is your data generated? What is done during the data journey
48:59
And what is the result? So Azure Synapse is part of the whole story
49:05
Power BI is part of the whole story. And they are working very, very well together
49:11
And with that, I'm at the end of my session