This video shows how to write an inner join SQL query in a Microsoft Fabric Notebook using Spark.
0:00
In this video, I'm going to show you how to write an inner join SQL query using PySpark
0:12
in a Fabric Notebook. So let's get started. Now, PySpark is one of the Python libraries
0:19
that provide an interface for programming Apache Spark jobs. So let's see how we can
0:24
use that in the Fabric Notebook. Now, before we get started, let me just explain what we
0:29
have on the screen. In this case, I've got this data in the lakehouse. You
0:34
can see the dimension tables, and I have the fact table. So we have several dimension
0:39
tables. In this cell, we define a df variable, and we initialize an instance
0:45
of the Spark SQL module, passing in select * from
0:53
wh_data_lakehouse.dbo_f_transaction. Now, this is going to be the fact table. And
0:58
of course, we have the dimension tables. And we limit the rows to just 10
1:03
rows, and display the content of the data frame.
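For reference, here is a minimal sketch of what that first cell might look like. The spellings wh_data_lakehouse and dbo_f_transaction are assumed from how the names are read out in the video; adjust them to match your own lakehouse:

    # The built-in `spark` session is available by default in a Fabric notebook.
    # Table name spelling is assumed from the narration.
    df = spark.sql("SELECT * FROM wh_data_lakehouse.dbo_f_transaction LIMIT 10")

    # display() renders the DataFrame as an interactive grid in the notebook output.
    display(df)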
1:10
Now I'm going to scroll down, and let's see what we intend to do. We want to write a SQL query using an inner join
1:15
to calculate the total sales by product and store. So I'm going to click on this add
1:23
cell button. And then we can begin to write. The first thing I want to do is to define a variable
1:28
to store the SQL code in. I'm going to call it total_sales_by_product_and_store.
1:38
Okay, and then put in the equals sign. Now, to write SQL in Spark, I'm going to
1:45
open three double quotes. And of course, don't forget the closing three double quotes.
1:50
And I'm going to go to line two, and then I can begin to write the SQL query. So I'll
1:56
select the product column and then the store column. And if I scroll up, in the fact table
2:04
we do have the sales amount column; that's the one we'll aggregate. So let's scroll down. Now I'm going
2:12
to use the sum function, open the brackets, and pass in the sales amount column. Of course,
2:19
we want to alias it as total_sales. Okay, and then press Enter. Now, we'll select from the
2:28
dbo_dim_product. So, the dbo_dim_product dimension table. And of course,
2:37
I'm going to alias that table as P. And I'm going to come back to this product
2:42
column and just type p. in front of it, to follow a proper naming convention. And just press Enter.
2:49
Now I can use the inner join. For the inner join, I want to reference the f_transaction fact table.
2:55
So f_transaction, oh, excuse me, dbo_f_transaction. And I'm going to alias this as F.
3:04
And I'm going to come back to the sales amount and qualify it as f.sales_amount. And I want
3:11
to use the ON clause. Now, first and foremost, we want to access the product key, which is the
3:17
primary key in the dim product table. So I'm going to say p.product_key, which is the primary
3:25
key in the dim product table, equal to f.product_key, the foreign key in the fact table.
3:33
Okay, and then press Enter. For the second part, we want to do the same thing. So, inner join,
3:39
and I'm going to reference the dbo_dim_store dimension table. Now, that's exactly what we have:
3:46
dbo_dim_store. And of course, this one here is the dbo_dim_product. So I'm going to
3:52
alias this as S. And I'm going to come back to the store column and qualify it as s.store. And let's continue.
4:00
So I'm going to use the ON clause again. Now, in this case, I want to focus on the store key in the
4:07
f_transaction fact table, which is the foreign key, and of course the primary key in the
4:13
dim store table. So I can say f.store_key, which is the foreign key, equal to s.store_key,
4:25
which is the primary key. And then let's press Enter. And then we can group by the two
4:32
columns: the p.product column, comma, and the s.store column. And we can even order by.
4:41
Okay, press Enter. I'm going to order by the sum of the
4:51
f.sales_amount column, in descending order.
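Putting it all together, the query cell built up in this walkthrough would look roughly like the sketch below. The table and column spellings (dbo_dim_product, dbo_dim_store, dbo_f_transaction, product, store, product_key, store_key, sales_amount) are assumed from the narration:

    # Multi-line Spark SQL stored in a Python variable using triple double quotes.
    # Identifier spellings are assumed from the narration; adjust to your tables.
    total_sales_by_product_and_store = """
    SELECT
        p.product,
        s.store,
        SUM(f.sales_amount) AS total_sales
    FROM dbo_dim_product AS p
    INNER JOIN dbo_f_transaction AS f
        ON p.product_key = f.product_key   -- dim product PK = fact table FK
    INNER JOIN dbo_dim_store AS s
        ON f.store_key = s.store_key       -- fact table FK = dim store PK
    GROUP BY p.product, s.store
    ORDER BY SUM(f.sales_amount) DESC
    """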
5:00
All right, let's go through the code to be sure everything is fine. First, we selected two columns, the product and
5:06
the store. And we aggregate the sales amount using the sum function,
5:11
and we alias it as total_sales. And then we select from the dbo_dim_product
5:19
dimension table, and we alias that particular table as P. And then we use the
5:25
inner join. And we specify the dbo_f_transaction fact table, and of
5:30
course, we alias it as F. And we use the ON clause. And we specify the product key
5:39
equal to the product key in the fact table, which is, in this case,
5:46
the foreign key, while this one is the primary key. And then for the other part, the store, we also
5:52
call the inner join on dbo_dim_store, and then we alias it as S. And we use the ON keyword
5:59
again: f.store_key, the foreign key, equal to s.store_key, the primary key.
6:08
And then we group by the two columns that we specified here, p.product and s.store.
6:15
And then we just order by the sum of sales amount. And that's going to be the code. Now,
6:22
the next thing to do is to call the Spark SQL module again. So I'm going to just type
6:27
df, equals, spark.sql, and then open the brackets. Now, inside the brackets,
6:35
I'm just going to pass in the name of this variable, total_sales_by_product_and_store.
6:40
I'm going to just paste it here. And, oops, Ctrl+Z, okay, Ctrl+V. Okay, let me just get rid of
6:47
these. Okay, and then let's get it done. And finally, we can display the content of the df data frame.
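As a sketch, that final cell would be:

    # Execute the SQL stored in the variable and get back a DataFrame.
    df = spark.sql(total_sales_by_product_and_store)

    # Show the result grid (with its chart view toggle) in the notebook output.
    display(df)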
6:57
And let's click on Run. Let's scroll down. Absolutely cool. There we go. So we
7:08
can see the output, we can see the results. So we have the total sales by product and store. And we can
7:16
easily tell that the largest total sales was achieved where the product equals
7:22
furnitures and the store equals EB, which returned 1,956,607. And the amazing part is that
7:32
I can click on this particular object, the chart view, and see the charts. There we go. So we can see
7:38
the clustered bar chart, which is quite amazing. I love this functionality. And of course, we can see
7:44
the list of the products, and we can see the volumes. And we can even click on the
7:49
customize chart option, and I can switch from the bar chart to a clustered column chart. And I'm going
7:56
to scroll down and click on Apply. And there we go. So we have the clustered column chart. Quite
8:02
amazing. I can even switch to the line chart: just scroll down, click on Apply. And there we go. This
8:11
is super cool. So this is basically how we can use Spark SQL to write a SQL query in the
8:18
Fabric Notebook. I trust you enjoyed this video. If you did, like, share, and comment. Thank you, and bye for now. Cheers!


