In this session we will do a deep dive into the data engineering workload in Microsoft Fabric. We will cover the data engineering capabilities, including the Lakehouse and Spark notebooks, and I'm glad you are joining the session, also to learn about some updates about the product.
I'm a business intelligence engineer. I have a newsletter that mixes humor with the things that challenge me during my daily routine, and I try to help entities, and not only entities but anyone seeking help, to make better decisions in different business processes, based of course on data, which is the fuel. My LinkedIn newsletter is called The BI Chronicle, and my AKA is the Data Wit.
wit so let's start by um we're going to meet
1:21
so let's start by um we're going to meet
1:21
so let's start by um we're going to meet Alex starting with his uh life routine
1:25
Alex starting with his uh life routine
1:25
Alex starting with his uh life routine who is in data engineer at retail Hub
1:28
who is in data engineer at retail Hub
1:28
who is in data engineer at retail Hub it's growing e-commerce company which is
1:32
it's growing e-commerce company which is
1:32
it's growing e-commerce company which is facing let's say significant challenges
1:34
facing let's say significant challenges
1:34
facing let's say significant challenges with the their data so data is spread
1:38
with the their data so data is spread
1:38
with the their data so data is spread across various systems POS online
1:41
across various systems POS online
1:41
across various systems POS online transaction inventories customer
1:43
transaction inventories customer
1:43
transaction inventories customer services so Alex daily routine is
1:47
services so Alex daily routine is
1:47
services so Alex daily routine is involved like juggling let's say
1:50
involved like juggling let's say
1:50
involved like juggling let's say multiple tools and platforms each with
1:53
multiple tools and platforms each with
1:53
multiple tools and platforms each with its own uh set of complexities
1:57
its own uh set of complexities
1:57
its own uh set of complexities so we can s like we can mention the
2:01
so we can s like we can mention the
2:01
so we can s like we can mention the manual integration of data from
2:02
manual integration of data from
2:03
manual integration of data from different sources which is considered as
2:06
different sources which is considered as
2:06
different sources which is considered as time consuming and prone to errors so a
2:10
time consuming and prone to errors so a
2:10
time consuming and prone to errors so a different departments at retail Hub use
2:14
different departments at retail Hub use
2:14
different departments at retail Hub use their own systems so we are talking
2:16
their own systems so we are talking
2:16
their own systems so we are talking about scenario leading to data silos
2:20
about scenario leading to data silos
2:20
about scenario leading to data silos that made it difficult to get like um a
2:24
that made it difficult to get like um a
2:24
that made it difficult to get like um a holistic view of the business and as
2:28
holistic view of the business and as
2:28
holistic view of the business and as data volumes like through the existing
2:32
data volumes like through the existing
2:32
data volumes like through the existing infrastructure um from the point of view
2:35
infrastructure um from the point of view
2:35
infrastructure um from the point of view of
2:36
of Alex um struggl it like to keep up so he
2:41
Alex um struggl it like to keep up so he
2:41
Alex um struggl it like to keep up so he spent significant time troubleshooting
2:43
spent significant time troubleshooting
2:43
spent significant time troubleshooting performance issues trying to optimiz
2:46
performance issues trying to optimiz
2:46
performance issues trying to optimiz optimize like slow running queries and
2:50
optimize like slow running queries and
2:50
optimize like slow running queries and which at the final like at the end
2:52
which at the final like at the end
2:52
which at the final like at the end detracted from more strategic
2:56
detracted from more strategic
2:56
detracted from more strategic word so
During a technology conference, Alex attended a session on Microsoft Fabric. It's an end-to-end analytics platform that promises to solve many of the challenges he faced. So Alex is you, it's me, it's everyone trying to understand what Microsoft Fabric is about. Data engineering in Fabric enables users, and we are going to focus here on the data engineering part, to design, build and maintain infrastructures and systems that enable organizations to collect, store, process and analyze large volumes of data. With Microsoft Fabric you can create and manage your data using a lakehouse, design pipelines to copy data into your lakehouse, use Spark job definitions to submit, for example, batch or streaming jobs to Spark clusters, and even use notebooks to write code for data ingestion, preparation and transformation.
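To illustrate that last point, here is a minimal sketch of what an ingestion cell in a Fabric notebook could look like. The file path and table name are hypothetical, and it assumes the notebook is attached to a lakehouse so that the built-in spark session and the relative Files/ path resolve against it.

```python
# Minimal sketch of an ingestion cell (hypothetical path and table name).
# "spark" is the session that Fabric notebooks provide out of the box.
raw_orders = (
    spark.read
    .option("header", "true")        # first row holds the column names
    .option("inferSchema", "true")   # let Spark guess the column types
    .csv("Files/raw/orders.csv")     # a file landed in the lakehouse Files area
)

# Light preparation: drop rows without an order id
prepared = raw_orders.dropna(subset=["order_id"])

# Write the result as a Delta table so it appears under Tables/ in the lakehouse
prepared.write.mode("overwrite").format("delta").saveAsTable("orders")
```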
So the journey of Alex with the data engineering part of Microsoft Fabric starts with building the lakehouse architecture for RetailHub: how to maximize the data potential, in this case with a data lakehouse. That's the first part when entering the data engineering world within Microsoft Fabric. The Microsoft Fabric Lakehouse is a data architecture platform for storing, managing and analyzing structured and unstructured data in a single location. It's flexible and scalable, and it allows organizations to handle large volumes of data using various frameworks to process or analyze that data. It's also possible to integrate it with other data management and analytics tools to provide, let's say, a more comprehensive solution for data engineering and analytics. As you can see in the screenshot, it's something you get used to, maybe, when working with Power BI online or Power BI Desktop.
Moving to the Lakehouse SQL analytics endpoint: the Lakehouse creates a serving layer by automatically generating a SQL analytics endpoint and a default semantic model during creation. In my opinion, this see-through functionality allows users to work directly on top of the Delta tables in the Lakehouse and provides a frictionless and performant experience all the way from data ingestion to reporting. It's important to note that the SQL analytics endpoint is a read-only experience and doesn't support the full T-SQL surface area of a transactional data warehouse. To explain it better: only the tables in Delta format are available in the SQL analytics endpoint. Parquet, CSV and other formats cannot be queried using the SQL analytics endpoint, so if you don't see your table, you will need to convert it to Delta format.
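As a rough sketch of that conversion, the cell below reads a Parquet file from the lakehouse Files area and rewrites it as a Delta table; the path and table name are hypothetical.

```python
# Hypothetical example: convert a Parquet file under Files/ into a Delta table
# so it becomes visible to the SQL analytics endpoint.
parquet_df = spark.read.parquet("Files/raw/customers.parquet")

(
    parquet_df.write
    .mode("overwrite")
    .format("delta")           # only Delta tables surface in the SQL endpoint
    .saveAsTable("customers")  # registers the table under Tables/ in the lakehouse
)
```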
converted to Delta format of course you can set uh object
7:20
format of course you can set uh object
7:20
format of course you can set uh object level security to access data using uh
7:23
level security to access data using uh
7:23
level security to access data using uh SQL analytics endpoints um security
7:26
SQL analytics endpoints um security
7:26
SQL analytics endpoints um security rules will apply for accessing data um
7:30
rules will apply for accessing data um
7:30
rules will apply for accessing data um via the the the SQL analytics points uh
7:34
via the the the SQL analytics points uh
7:34
via the the the SQL analytics points uh to ensure of course data is not
7:37
to ensure of course data is not
7:37
to ensure of course data is not accessible in other ways um it's
7:40
accessible in other ways um it's
7:40
accessible in other ways um it's recommended to use workspace roles and
7:44
recommended to use workspace roles and
7:44
recommended to use workspace roles and permissions uh another Point uh related
7:48
permissions uh another Point uh related
7:48
permissions uh another Point uh related to SQL analytic end points is like the
7:51
to SQL analytic end points is like the
7:52
to SQL analytic end points is like the automatic Discovery um table Discovery
7:54
automatic Discovery um table Discovery
7:54
automatic Discovery um table Discovery and
7:55
and registration uh the automatic this
7:58
registration uh the automatic this
7:58
registration uh the automatic this feature um
8:00
feature um provides a fully managed file to to
8:04
provides a fully managed file to to
8:04
provides a fully managed file to to table experience for data engineers and
8:06
table experience for data engineers and
8:06
table experience for data engineers and not only Engineers data science also uh
8:10
not only Engineers data science also uh
8:10
not only Engineers data science also uh you can drop just a file into the
8:12
you can drop just a file into the
8:12
you can drop just a file into the managed area of the lake house and um it
8:17
managed area of the lake house and um it
8:17
managed area of the lake house and um it like registered into the meta store with
8:20
like registered into the meta store with
8:20
like registered into the meta store with the necessary meta data such as for
8:23
the necessary meta data such as for
8:23
the necessary meta data such as for example uh column names compression
8:27
example uh column names compression
8:27
example uh column names compression formats Etc
8:29
formats Etc uh currently the only supported uh
8:32
uh currently the only supported uh
8:32
uh currently the only supported uh format is Delta table so um let's hope
8:36
format is Delta table so um let's hope
8:36
format is Delta table so um let's hope in the future we have more uh features
8:40
in the future we have more uh features
8:40
in the future we have more uh features uh an improvement for this uh
8:54
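A minimal sketch of what that looks like from code, under the assumption that the notebook is attached to the lakehouse and with a hypothetical table name: writing Delta files under the managed Tables/ area should be enough for automatic discovery to register the table.

```python
# Hypothetical sketch: land Delta files directly under the managed Tables/ area.
# Automatic discovery should register the table in the metastore without an
# explicit CREATE TABLE, as long as the data is in Delta Lake format.
sales_df = spark.read.option("header", "true").csv("Files/raw/sales.csv")

sales_df.write.mode("overwrite").format("delta").save("Tables/sales")
```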
As I was talking about the automatic table discovery, there is another aspect: a Fabric lakehouse can automatically discover all the datasets already present in your data lake and expose them as tables in the lakehouse, and of course in warehouses. There is always a single condition, as I always say: the tables must be stored in the Delta Lake format.
the Delta Lake format uh about multitasking with lake
9:27
format uh about multitasking with lake
9:27
format uh about multitasking with lake house uh this experience like provides a
9:31
house uh this experience like provides a
9:31
house uh this experience like provides a browser tab design that can allow you to
9:36
browser tab design that can allow you to
9:36
browser tab design that can allow you to open and switch between like multiple
9:38
open and switch between like multiple
9:38
open and switch between like multiple items seamlessly allowing to you you to
9:42
items seamlessly allowing to you you to
9:42
items seamlessly allowing to you you to manage uh for example your data data
9:46
manage uh for example your data data
9:46
manage uh for example your data data Lakehouse and in or more efficiently
9:50
Lakehouse and in or more efficiently
9:50
Lakehouse and in or more efficiently than ever so you don't need to juggle
9:53
than ever so you don't need to juggle
9:53
than ever so you don't need to juggle between different Windows to make your
9:55
between different Windows to make your
9:55
between different Windows to make your data tasks so um that's um the way that
10:01
data tasks so um that's um the way that
10:01
data tasks so um that's um the way that it is more efficient and user friendly
10:04
it is more efficient and user friendly
10:04
it is more efficient and user friendly uh as
10:07
uh as possible the smooth
10:10
possible the smooth interaction you can upload or run data
10:13
interaction you can upload or run data
10:13
interaction you can upload or run data load operation in one Tab and check on
10:17
load operation in one Tab and check on
10:18
load operation in one Tab and check on another task in different tabs so with
10:20
another task in different tabs so with
10:20
another task in different tabs so with enhanced
10:22
enhanced multitask the running operations are not
10:24
multitask the running operations are not
10:24
multitask the running operations are not cancelled when you navigate between tabs
10:28
cancelled when you navigate between tabs
10:28
cancelled when you navigate between tabs so you can focus on your work without
10:31
so you can focus on your work without
10:31
so you can focus on your work without interruptions of course selected objects
10:34
interruptions of course selected objects
10:34
interruptions of course selected objects data tables or files remain open and
10:36
data tables or files remain open and
10:36
data tables or files remain open and readly um available when you switch
10:39
readly um available when you switch
10:39
readly um available when you switch between tabs the context of your data
10:41
between tabs the context of your data
10:41
between tabs the context of your data lake house is always at your
10:45
lake house is always at your
10:45
lake house is always at your fingertips um there's also behind a
10:49
fingertips um there's also behind a
10:49
fingertips um there's also behind a non-blocking uh reload mechanism of your
10:52
non-blocking uh reload mechanism of your
10:52
non-blocking uh reload mechanism of your files and table list so you can keep
10:55
files and table list so you can keep
10:55
files and table list so you can keep working while the list refreshes in the
10:58
working while the list refreshes in the
10:58
working while the list refreshes in the background so you have always the latest
11:01
background so you have always the latest
11:01
background so you have always the latest data while you provide while providing
11:04
data while you provide while providing
11:04
data while you provide while providing you like with a smooth and
11:06
you like with a smooth and
11:06
you like with a smooth and uninterrupted
11:09
uninterrupted experience moving
Moving to how to get data into the Lakehouse: the Lakehouse currently supports more than 100 data sources. You can connect to a wide range of data sources; this includes, of course, traditional databases, cloud storage solutions, SaaS applications and more. This extensive connectivity makes sure that you can gather data from virtually any source your organization uses and provides you with a unified data platform.
We also have the direct upload of files from the computer. The process is very simple: you take files from your computer and upload them directly to Fabric. It's useful for quickly ingesting CSVs, Excel spreadsheets, JSON and other common data formats without needing to set up a complex data pipeline. You can simply drag and drop the file into the platform and that's it.
file into the platform and that's it uh applying Transformations using
12:43
it uh applying Transformations using
12:43
it uh applying Transformations using data flow so once your data is there in
12:46
data flow so once your data is there in
12:46
data flow so once your data is there in the lake
12:47
the lake house with Microsoft fabric you can have
12:51
house with Microsoft fabric you can have
12:51
house with Microsoft fabric you can have tools to transform and prepare uh your
12:55
tools to transform and prepare uh your
12:55
tools to transform and prepare uh your data which uh which is like the data
12:58
data which uh which is like the data
12:58
data which uh which is like the data flows you define like data
13:00
flows you define like data
13:00
flows you define like data transformation steps is something many
13:03
transformation steps is something many
13:03
transformation steps is something many like similar to people who worked
13:06
like similar to people who worked
13:06
like similar to people who worked already or have um knowledge
13:09
already or have um knowledge
13:09
already or have um knowledge about active um aure data Factory so you
13:14
about active um aure data Factory so you
13:14
about active um aure data Factory so you can do like uh um data transformation
13:17
can do like uh um data transformation
13:17
can do like uh um data transformation like filtering joining
13:20
like filtering joining
13:20
like filtering joining aggregating uh or even further complex
13:24
aggregating uh or even further complex
13:24
aggregating uh or even further complex processing uh
13:25
processing uh operations and the data flows support
13:29
operations and the data flows support
13:29
operations and the data flows support support like both no Code and low code
13:33
support like both no Code and low code
13:33
support like both no Code and low code options so making it like um accessible
13:36
options so making it like um accessible
13:36
options so making it like um accessible for both Technical and non-technical
13:40
for both Technical and non-technical
13:40
for both Technical and non-technical users we have also the copy
13:43
users we have also the copy
13:43
users we have also the copy activity uh especially when dealing with
13:47
activity uh especially when dealing with
13:47
activity uh especially when dealing with large scale data uh Microsoft fabric
13:51
large scale data uh Microsoft fabric
13:51
large scale data uh Microsoft fabric supports the copying of entire data Ls
13:55
supports the copying of entire data Ls
13:55
supports the copying of entire data Ls up to we can say petabyte scales so uh
14:00
up to we can say petabyte scales so uh
14:00
up to we can say petabyte scales so uh it's very important for organization
14:03
it's very important for organization
14:03
it's very important for organization that need to replicate uh move or backup
14:08
that need to replicate uh move or backup
14:08
that need to replicate uh move or backup like very or big amounts of data across
14:12
like very or big amounts of data across
14:12
like very or big amounts of data across different environments or Cloud regions
14:16
different environments or Cloud regions
14:16
different environments or Cloud regions so um the copy activity is designed to
14:22
so um the copy activity is designed to
14:22
so um the copy activity is designed to be highly efficient and
14:24
be highly efficient and
14:24
be highly efficient and scalable uh especially with um very
14:27
scalable uh especially with um very
14:27
scalable uh especially with um very large data sets can which need to be
14:31
large data sets can which need to be
14:31
large data sets can which need to be transferred without performance like
14:35
transferred without performance like
14:35
transferred without performance like bottlenecks uh for spark lovers of
14:38
bottlenecks uh for spark lovers of
14:38
bottlenecks uh for spark lovers of course for more Advanced Data
14:41
course for more Advanced Data
14:41
course for more Advanced Data manipulation and
14:43
manipulation and processing today Microsoft fabric um
14:46
processing today Microsoft fabric um
14:46
processing today Microsoft fabric um supports the use of fabric code uh these
14:49
supports the use of fabric code uh these
14:49
supports the use of fabric code uh these allow data engineers and scientists to
14:52
allow data engineers and scientists to
14:52
allow data engineers and scientists to write custom code in languages like
14:55
write custom code in languages like
14:55
write custom code in languages like python or
14:56
python or Scala and through spark users can
14:59
Scala and through spark users can
14:59
Scala and through spark users can connect to various data sources uh
15:02
connect to various data sources uh
15:02
connect to various data sources uh perform complex transformations of
15:04
perform complex transformations of
15:04
perform complex transformations of course and execute let's say distributed
15:07
course and execute let's say distributed
15:07
course and execute let's say distributed like computation ac across like large
15:09
like computation ac across like large
15:09
like computation ac across like large data
15:12
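To make that concrete, here is a small sketch of the kind of custom Spark code this refers to; the table and column names are hypothetical and it assumes they already exist as Delta tables in the attached lakehouse.

```python
# Hypothetical sketch of custom Spark transformations over lakehouse tables.
# Assumes "orders" and "customers" already exist as Delta tables.
from pyspark.sql import functions as F

orders = spark.read.table("orders")
customers = spark.read.table("customers")

# Filter, join and aggregate; Spark distributes the work across the pool
revenue_by_country = (
    orders.filter(F.col("status") == "completed")
    .join(customers, on="customer_id", how="inner")
    .groupBy("country")
    .agg(F.sum("amount").alias("total_revenue"))
)

# Persist the result back to the lakehouse as a Delta table for reporting
revenue_by_country.write.mode("overwrite").format("delta").saveAsTable("revenue_by_country")
```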
data sets uh another thing is like the short
15:15
sets uh another thing is like the short
15:15
sets uh another thing is like the short cuts the shortcuts uh in Microsoft one L
15:18
cuts the shortcuts uh in Microsoft one L
15:18
cuts the shortcuts uh in Microsoft one L allow you to unify your data across like
15:21
allow you to unify your data across like
15:21
allow you to unify your data across like domains clouds by uh and accounts by
15:25
domains clouds by uh and accounts by
15:25
domains clouds by uh and accounts by just creating single virtual data L for
15:29
just creating single virtual data L for
15:29
just creating single virtual data L for your entire Enterprise so all fabric um
15:33
your entire Enterprise so all fabric um
15:34
your entire Enterprise so all fabric um let's say experiences and analytical
15:37
let's say experiences and analytical
15:37
let's say experiences and analytical engines can directly connect to your
15:40
engines can directly connect to your
15:40
engines can directly connect to your existing data source um for example Asia
15:44
existing data source um for example Asia
15:44
existing data source um for example Asia AWS one Lake uh through a unified name
15:49
AWS one Lake uh through a unified name
15:49
AWS one Lake uh through a unified name space so one Lake manages all
15:53
space so one Lake manages all
15:53
space so one Lake manages all permissions and
15:54
permissions and credentials uh and you don't need to
15:57
credentials uh and you don't need to
15:57
credentials uh and you don't need to separately configure each fabric
16:00
separately configure each fabric
16:00
separately configure each fabric workload to connect to each data source
16:03
workload to connect to each data source
16:03
workload to connect to each data source and you can use of course shortcuts to
16:05
and you can use of course shortcuts to
16:05
and you can use of course shortcuts to El eliminate Edge copies of data and
16:10
El eliminate Edge copies of data and
16:10
El eliminate Edge copies of data and reduce um process latency associated
16:13
reduce um process latency associated
16:14
reduce um process latency associated with the data copies and
16:26
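As a hedged illustration: once a shortcut has been created, for example to an ADLS or S3 location, it shows up as a folder in the lakehouse and can be read like any local path. The shortcut name below is hypothetical.

```python
# Hypothetical sketch: a shortcut named "external_sales" under Files/ points at
# data living in another cloud account; Spark reads it in place, without
# copying it into OneLake first.
external_sales = spark.read.parquet("Files/external_sales")
external_sales.show(5)
```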
Of course, as I mentioned about the Delta Lake tables, today the lakehouse provides a feature to load common file types into an optimized Delta table. Today the supported files are Parquet and CSV, and the file extension case doesn't matter. With the single-file load, users can load a single file of their choice in one of the supported formats by selecting Load to Delta table directly in the context menu action. We also have the folder-level load, so you can load all files under a folder and its subfolders at once by selecting Load to Delta table: this feature automatically traverses all the files and loads them into a Delta table. We just need to make sure that only files of the same type are loaded into a table at the same time. We can also choose to load files and folders into new tables or an existing table, and of course, for CSV files, users are allowed to specify whether their source file includes headers to be used as column names. You can see that these details are somehow very similar to Power Query, if you import, for example, a CSV file into Power BI Desktop. And of course, tables are always loaded using the Delta Lake table format with V-Order optimization enabled.
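The same outcome can be approximated from a notebook; this is a sketch of a code analogue, not the Load to Delta table feature itself, and the folder and table names are hypothetical.

```python
# Hypothetical code analogue of the folder-level load: read every CSV under a
# folder (including subfolders) and land the result as one Delta table.
# All files are assumed to share the same schema, mirroring the UI constraint.
events = (
    spark.read
    .option("header", "true")               # use the first row as column names
    .option("recursiveFileLookup", "true")  # include files in subfolders
    .csv("Files/raw/events/")
)

events.write.mode("overwrite").format("delta").saveAsTable("events")
```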
So what is V-Order, and before that, what is the Delta Lake optimization? Let's start by defining V-Order: it's a write-time optimization to the Parquet file format that enables lightning-fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark and others. The Power BI and SQL engines make use of the Microsoft VertiScan technology and the V-Ordered Parquet files to achieve in-memory-like data access times; that is the case for the Power BI and SQL engines. Spark and the other, let's say, non-VertiScan compute engines also benefit from the V-Ordered files, with, I think, an average of 10% faster read times, and in some scenarios up to, I think, 50%.
with some scenarios up to I think 50% V orders Works um by applying like
19:45
50% V orders Works um by applying like
19:45
50% V orders Works um by applying like special sorting Ro group distribution
19:48
special sorting Ro group distribution
19:48
special sorting Ro group distribution dictionary uh
19:49
dictionary uh encoding and compression on parket files
19:53
encoding and compression on parket files
19:53
encoding and compression on parket files this requires less Network disk CPU
19:57
this requires less Network disk CPU
19:57
this requires less Network disk CPU resources of course in the compute
20:00
resources of course in the compute
20:00
resources of course in the compute engines so to read and of course it
20:05
engines so to read and of course it
20:05
engines so to read and of course it provides cost efficiency and
20:08
provides cost efficiency and
20:08
provides cost efficiency and performance
20:10
V-Order sorting has about a 15% impact on average write times, but it provides up to 50% more compression. It's 100% open-source Parquet format compliant: all Parquet engines can read it as a regular Parquet file, and that's what makes the Delta tables more efficient than ever. V-Order is applied at the Parquet file level. Delta tables and their features, such as Z-Order, compaction, vacuum, time travel, etc., are, let's say, orthogonal to V-Order; they are compatible and can be used together for extra benefits.
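For completeness, here is a heavily hedged sketch of how V-Order can be toggled at write time from a notebook. The exact property names have varied across Fabric Spark runtime versions, so treat both keys below as assumptions to verify against the current documentation; the table names are hypothetical.

```python
# Hedged sketch: controlling V-Order when writing (property names are assumptions
# and have changed across Fabric runtime versions; check the docs).
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")  # session-level toggle (assumed key)

df = spark.read.table("sales")  # hypothetical source table

(
    df.write
    .mode("overwrite")
    .format("delta")
    .option("parquet.vorder.enabled", "true")  # writer-level toggle (assumed key)
    .saveAsTable("sales_vordered")
)
```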
For the new features, as I do each month, I try to check what's new. Today we have, in preview, the lakehouse schema feature, which introduces data pipeline support for reading the schema from the lakehouse. We also have the support for Git integration and deployment pipelines, and the Microsoft 365 connector now supports ingesting data into the lakehouse. The second and the third are still in preview, so be careful, guys, and feel free to escalate any bugs on the platform.
Moving to Apache, for Apache Spark lovers in Microsoft Fabric: if we are going to start with Apache Spark, we need to talk about the Apache Spark runtime within Microsoft Fabric, which is an Azure-integrated platform based on Apache Spark that enables the execution and management of data engineering and also data science experiences. It combines components from both internal and open-source sources to provide a comprehensive solution. Among the major components of the Fabric runtime we have Apache Spark, an open-source distributed computing library that enables large-scale data processing; we have Delta Lake, which is also an open-source storage layer that brings ACID transactions and other data reliability features to Apache Spark; and we have the default-level packages for Java, Scala, Python and R. These packages support diverse programming languages and are automatically installed and configured, allowing developers to apply their preferred programming language. By default, all new workspaces use the latest runtime version, which is currently 1.2, and this runtime is built on a robust open-source operating system to ensure compatibility with various hardware configurations and, of course, system requirements.
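A quick, hedged way to see which of these components a given session is running is to print their versions from a notebook cell; the version noted in the comment is what Runtime 1.2 is documented to ship, so verify it for your workspace.

```python
# Hedged sketch: inspect the runtime components from a notebook cell.
import sys

print("Apache Spark version:", spark.version)    # Runtime 1.2 is documented to ship Spark 3.4
print("Python version:", sys.version.split()[0])
```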
Moving to the next component, which is the Apache Spark compute: as I said, this platform is designed to deliver unparalleled speed and efficiency. With, let's say, the starter pools, you can expect rapid Spark session initialization, typically between five and ten seconds, and you don't need any manual setup. You also get the flexibility to customize the Apache Spark pools according to your specific requirements, so the platform somehow enables you to optimize the analytics experience.
Moving to another component which is important in the Apache Spark runtime: the Spark pool, which is a way to tell Spark what kind of resources you need for your data analysis. You can give it a name and choose how many and how large the nodes are; when I say nodes, I mean the machines that do the work. You can also tell Spark how to adjust the number of nodes depending on how much work you have. Creating the Spark pool is free: you only pay when you run a Spark job on the pool, and then Spark sets up the nodes for you. If you don't use the Spark pool for two minutes after your session expires, your Spark pool will be deallocated. This default session expiration time period is set to 20 minutes, and you can of course change it if you want. If you are a workspace admin, you can also create custom Spark pools for your workspace and make them the default option for the other users, so in this way you can always save time and avoid setting up new Spark pools every time you run a notebook or a Spark job. One other detail to mention is that custom Spark pools take about three minutes to start, because Spark must get the nodes from Azure.
uh moving to the nodes so as I said an
27:06
uh moving to the nodes so as I said an
27:06
uh moving to the nodes so as I said an apach spark pool in instance consists of
27:09
apach spark pool in instance consists of
27:09
apach spark pool in instance consists of one head node and worker no so um could
27:13
one head node and worker no so um could
27:13
one head node and worker no so um could start a minimum of one node in spark
27:16
start a minimum of one node in spark
27:16
start a minimum of one node in spark instance the head node all always runs
27:19
instance the head node all always runs
27:19
instance the head node all always runs extra Management Services such as um Ley
27:23
extra Management Services such as um Ley
27:23
extra Management Services such as um Ley yarn service yarn resource manager
27:26
yarn service yarn resource manager
27:26
yarn service yarn resource manager zookeeper
27:28
zookeeper and apach Spark driver so all run all
27:32
and apach Spark driver so all run all
27:32
and apach Spark driver so all run all the nodes run services such as node
27:37
the nodes run services such as node
27:37
the nodes run services such as node agent y node manager and all worker
27:40
agent y node manager and all worker
27:40
agent y node manager and all worker nodes run the apach spark executor
27:43
nodes run the apach spark executor
27:43
nodes run the apach spark executor Services the node sizes as you can see
27:47
Services the node sizes as you can see
27:47
Services the node sizes as you can see um can be defined of course the spark
27:50
um can be defined of course the spark
27:50
um can be defined of course the spark pool can be defined with node node sizes
27:54
pool can be defined with node node sizes
27:54
pool can be defined with node node sizes that range from a small compute with 4 V
27:57
that range from a small compute with 4 V
27:57
that range from a small compute with 4 V core and
27:58
core and 28 gigabyte of memory uh to double um
28:04
28 gigabyte of memory uh to double um
28:04
28 gigabyte of memory uh to double um extra large compute nodes as you can see
28:08
extra large compute nodes as you can see
28:08
extra large compute nodes as you can see and they can be altered after pool
28:10
and they can be altered after pool
28:10
and they can be altered after pool creation although uh active session
28:14
creation although uh active session
28:14
creation although uh active session should be restarted we can talk about uh
28:17
should be restarted we can talk about uh
28:17
should be restarted we can talk about uh also the auto
28:19
also the auto scale since um the the pools allow
28:24
scale since um the the pools allow
28:24
scale since um the the pools allow Automatic Auto um um Auto scale or this
28:29
Automatic Auto um um Auto scale or this
28:29
Automatic Auto um um Auto scale or this automatic scale up and down of course of
28:31
automatic scale up and down of course of
28:31
automatic scale up and down of course of compute resources based on the amount of
28:34
compute resources based on the amount of
28:34
compute resources based on the amount of the activity so when you enable the
28:38
the activity so when you enable the
28:38
the activity so when you enable the Autos scale uh you set like the minimum
28:41
Autos scale uh you set like the minimum
28:41
Autos scale uh you set like the minimum and the maximum number of nodes to scale
28:44
and the maximum number of nodes to scale
28:44
and the maximum number of nodes to scale uh and of course there is the dynamic
28:47
uh and of course there is the dynamic
28:47
uh and of course there is the dynamic allocation principle which allow The
28:50
allocation principle which allow The
28:50
allocation principle which allow The aach Spar application to request more
28:53
aach Spar application to request more
28:53
aach Spar application to request more executors if the task exceeds let's say
28:56
executors if the task exceeds let's say
28:56
executors if the task exceeds let's say the load that current executor
28:59
the load that current executor
28:59
the load that current executor uh can be so and when you enable the
29:03
uh can be so and when you enable the
29:03
uh can be so and when you enable the dynamic allocation option for
29:06
dynamic allocation option for
29:06
dynamic allocation option for um for like the applic every Spar
29:10
um for like the applic every Spar
29:10
um for like the applic every Spar application submitted the system
29:13
application submitted the system
29:13
application submitted the system reserves executors during the job
29:16
reserves executors during the job
29:16
reserves executors during the job submission step on the minimum notes you
29:19
submission step on the minimum notes you
29:19
submission step on the minimum notes you can specify maximum notes to support uh
29:22
can specify maximum notes to support uh
29:22
can specify maximum notes to support uh of course successful uh automatic
29:24
of course successful uh automatic
29:24
of course successful uh automatic scenarios
29:28
Moving to the Apache Spark job definition, which is a Microsoft Fabric code item that allows us to submit batch or streaming jobs to Spark clusters. By uploading the binary files from the compilation output, for example JAR files from Java, you can apply different transformation logic to the data hosted on the Lakehouse. To run a Spark job definition, you must have at least one Lakehouse associated with it.
30:10
There are a few ways you can get started with the creation process: you can go to the Data Engineering homepage directly and easily create a Spark job definition through the Spark Job Definition card under the New section; you can create it through the Create page under Data Engineering on the left side; or, of course, through the workspace in Data Engineering. So you can always go back to Data Engineering if you get lost.
30:58
Another component of Apache Spark that you can find in Microsoft Fabric is libraries. We have two types of libraries, built-in and public — actually three types: built-in, public, and custom. For the built-in libraries, each Fabric Spark runtime provides a rich set of popular pre-installed libraries; you can find them all in the documentation. We also have the public libraries, which are sourced from repositories like Conda, which is currently supported. And we have custom libraries, which refer to code that you or your organization built. We just need to make sure they are in the .jar, .tar.gz, or .whl format, because Fabric supports .tar.gz only for the R language; for Python, for example, it uses the .whl format.
32:23
As best practices: first, have default libraries for the workspace — the admin, of course, is responsible for that — by creating a new environment, installing all the required libraries, and attaching this environment to the workspace. The second scenario is to persist the library specification: if you want that to persist, you need to install the libraries in an environment and attach it to the code items. In my opinion, one benefit of this approach is that it saves the effort of running the code that installs common libraries all the time; you have the libraries available in all Spark sessions as long as the environment is attached.
33:21
And the third scenario is inline installation: if you're interested in one-time use, within an interactive notebook, of a library that isn't installed, you may need to use inline installation with the inline commands. The inline commands give you the library in the current notebook session only, and the library doesn't persist across different sessions.
34:01
As I already mentioned, environments: Microsoft Fabric environments are a consolidated item for all your hardware and software settings. In an environment you can select different Spark runtimes, configure your compute resources, and install libraries from public repositories. There are multiple entry points for creating new environments — the standard way, or you can create one during selection. There are three major components in an environment: the Spark compute (which includes the runtime), the libraries, and the resources. And of course the environment can be attached to your data engineering or data science workspaces, or to your notebooks and Spark job definitions.
35:03
As for the new features from the last period: there is the runtime — I said that 1.2 is the default — but Fabric Runtime 1.3 is still in preview. It includes the incorporation of Delta Lake 3.1 and compatibility with Python 3.11. It also supports starter pools and integration with environments and other library-management capabilities. And of course we have the native execution engine for Apache Spark, which is now in preview, but only for Fabric Runtime 1.2.
35:55
Moving to the notebooks — for notebook lovers, of course, it's the same principle: the notebook is the primary code item to develop your Apache Spark jobs. With a Fabric notebook you can get started with zero setup effort, easily explore and process data with a low-code experience, analyze data in raw formats — say CSV, text, JSON, and so on — and even processed file formats like Parquet and Delta Lake, and of course be more and more productive with enhanced authoring capabilities and built-in data visualization.
36:47
You can of course export notebooks to other standard formats: to the standard notebook file (.ipynb) used by Jupyter-like notebooks, to HTML that can be opened directly in a browser, to a Python file, and even to a LaTeX file.
37:13
A gentle reminder that it's time to maybe wrap up. — Yeah, how much time do I have? — We've only gone two minutes over, but you can have another couple of minutes. — Okay, thank you, thank you for your understanding.
37:29
So you can explore the Lakehouse data with an existing notebook, or you can create a new one — it's easy — and in the same workspace and the current Lakehouse you can have all the notebooks.
37:52
Of course you can source-control the notebooks, using Git integration with Azure DevOps. When you commit the notebook item to the Git repo, the notebook code is converted to a source-code format instead of what we call the standard .ipynb file.
38:26
What's new for the notebooks: first, deployment pipelines, to deploy your notebook code across different environments like test, dev, and production. They are still in preview, but they will enable you to streamline your development process; as you can see, it's similar to the Power BI deployment pipelines. We also have the public APIs, mainly for management — to automate the pipelines and establish CI/CD. Those APIs make it easy to manage and manipulate the Fabric notebook items and to integrate notebooks with other tools and systems. We also have the schedule public API, which is more oriented towards scheduling the running of items with specific requirements.
39:38
Finally, moving to the data pipelines in Microsoft Fabric — that's the last part of our data engineering journey. Data pipelines — you've already heard about these — are a series of steps to collect and transform data from its raw format so you can use it for analysis and decision making. The first component is activities, where you define actions to perform on your data. For example, you can use a copy activity to copy data from SQL Server to Azure Blob storage, and then you can use a dataflow activity — or a notebook activity if you want to apply complex transformations. We have three types of activities: data movement activities, data transformation activities, and control activities.
40:40
That's it — the journey, our journey with data engineering in Microsoft Fabric. That was the journey of Alex. Remember also that Alex is you and me and everyone, sorry, anyone interested in Microsoft Fabric and of course the