SQL Server 2019 Intelligent Query Processing by Dean Savović || SQL Server Virtual Conference
Oct 30, 2023
Dean Savović talks about new intelligent query processing features in SQL Server 2019. The features that will be described are adaptive joins, interleaved execution, memory grant feedback, table variable deferred compilation, batch mode on rowstore, T-SQL scalar UDF inlining, and approximate count distinct.
About Speaker:
Dean is a Senior DBA who has been working with SQL Server since version 7.0; his main areas of interest are performance tuning and troubleshooting SQL Server databases. His primary focus is on OLTP databases, with occasional adventures in Business Intelligence. He graduated from the Faculty of Electrical Engineering and Computing and holds an MBA from the Faculty of Economics in Zagreb. His premise is that it is very useful to combine technical knowledge with knowledge of management processes, which helps in our everyday work. He works at SPAN, which specializes in data management on Microsoft SQL Server.
Conference Website: https://www.2020twenty.net/sql-server-virtual-conference/
C# Corner - Community of Software and Data Developers
https://www.c-sharpcorner.com
#SQL #Bigdata #conference #sqlserver #t-sql #BI #IQP #query
0:00
So
0:29
So hi to everyone. As Simon already said, I will talk about SQL Server Intelligent Query Processing
0:37
on SQL Server 2019 and 2017. Intelligent Query Processing is just a marketing name
0:47
for seven features that are available on SQL 2019, and three of them were available already on SQL 2017
0:57
under the marketing name Adaptive Query Processing. So let's start. A little bit about me
1:05
I'm a senior database developer. I love performance tuning on databases and SQL Server
1:11
I'm a Microsoft Certified Trainer, co-organizer of SQL Saturday Croatia, and in the past of the SQL Server User Group Croatia meetup
1:20
And here are my contact details. You can find me on LinkedIn and Twitter under Dean Savović
1:27
So if you have some questions about SQL Server or about this session or about anything related to SQL Server, please feel free to reach out
1:38
So the agenda for today is the features from the Intelligent Query Processing stack.
1:48
We will go through all of them, talk a little bit about the theory behind them, and
1:55
show a demo of each of them, and at the end we will have some room, I hope, for questions and answers.
2:05
So, as I already said, part of the features under the Intelligent Query Processing stack
2:13
are already available from SQL 2017 and are called Adaptive Query Processing.
2:18
Those are adaptive joins, interleaved execution, memory grant feedback, and so on.
2:25
Some of them on this slide are marked with the two letters EE, meaning that they are only
2:34
available on Enterprise Edition of SQL Server or Developer Edition, but you cannot use Developer Edition in production
2:44
So in production, you can use only Enterprise Edition. So adaptive joins, memory grant feedback, and batch mode on rowstore
2:52
are only available in Enterprise Edition of SQL Server. All others are available in Standard.
2:59
So three of these features, as I already said, are Enterprise Edition only features,
3:08
and the thing I want you to remember is that six out of these seven features can affect your existing
3:16
code positively or negatively, so you can have query regressions or you can have improvements
3:26
in your queries because of these features when you migrate to SQL Server 2017 or 2019.
3:35
An important thing to remember is also that if you migrated your databases to SQL Server
3:42
2019 and adjusted the database compatibility level to 150, which is the compatibility level of SQL
3:49
Server 2019, all features are activated by default. And on SQL 2019, OLTP workloads interact
4:02
significantly with Intelligent Query Processing features; on 2017 they interact by using
4:11
columnstore indexes. So basically I think that you would use columnstore indexes
4:21
more on data warehouse solutions. So on 2017 the story is different.
4:32
Let's talk about execution plans and how intelligent query processing affects execution plans of the queries. Very interesting topic
4:42
Queries that are affected by intelligent query processing are queries that use table variables
4:51
queries that use user-defined functions, queries that use multi-statement table-valued functions,
5:00
complex queries, and queries that use tables with skewed data distribution. They all can benefit from intelligent query processing, but there can also be some regressions in the behavior of the queries
5:22
Issues that are resolved by Intelligent Query Processing are inappropriate operator choice,
5:30
meaning nested loops versus hash match join, or vice versa, and memory grant under- or overestimation.
5:40
The big difference between SQL Server 2016 and prior, and 2017 and 2019,
5:50
is that on SQL 2016 and prior all plan decisions are made at compile time.
5:59
So after the query is compiled, nothing can be changed in the execution plan
6:08
during the execution of the query. And once the plan is created,
6:14
only a recompile, or memory pressure where the plan is forced out of the plan cache, can change it.
6:22
So nothing other than this can change the execution plan that is in the execution plan cache
6:33
On SQL Server 2017, we have three improvements. The first is interleaved execution for multi-statement table-valued functions,
6:45
which means that SQL Server will execute the multi-statement table-valued function at compile time
6:53
and get the exact number of rows that that function will return in the query
6:59
and have the exact cardinality of that function.
7:05
This is a big change comparing to 2016 and prior SQL servers
7:12
because prior to that, SQL Server had a hard-coded guess of 1 up until SQL 2014 and 100 from SQL 2014 on. Next we have adaptive join and memory grant feedback, which runs in batch mode.
7:37
Adaptive Join also runs only in batch mode. And these are the ways that SQL Server can
7:44
after compiling the execution plan, change the execution plan that is in the cache
7:50
So this is really a big thing. and on SQL Server 2019
7:56
there are additional adaptive improvements, like deferred compilation of table variables, which behaves the same as
8:05
multi-statement table-valued functions, so SQL executes them and gets the correct cardinality,
8:12
and memory grant feedback in SQL 2019 now works in row mode also.
8:20
So on 2017 it worked only in batch mode, so using columnstore indexes. And we have T-SQL scalar UDF inlining, which is a great feature
8:32
also. So now it's time to go through each of those features in detail. The first of them is
8:43
adaptive joins, which works in 2017 and 2019 but only in batch mode, meaning that we need to have
8:54
a columnstore index on a table that is touched by the query. And the adaptive join is a
9:02
new operator you will see in Management Studio, which can at runtime decide whether it will execute
9:11
as a hash or nested loop join. So the adaptive join operator is a logical operator.
9:16
It's not a physical operator. At runtime, SQL Server decides, based on the threshold
9:22
that is calculated at compile time while building the execution plan, whether the hash or nested loop join will be used.
9:34
As I said before, this feature is on by default. You can turn it off by altering the scoped
9:46
configuration of the database and setting batch mode adaptive joins to OFF, or by using query hints.
9:53
This is a good feature that handles oscillations in the number of rows that are coming into
10:01
the adaptive join, meaning that SQL Server at runtime can, if the number of rows is less than the threshold,
10:11
choose to use nested loop join, and if the number of rows coming
10:16
into the first input is greater than the threshold, the SQL Server query optimizer will use a hash join.
10:29
So this is how SQL Server handles oscillations with adaptive join operator
10:36
And this applies only to select queries. There is obviously some overhead because SQL Server at runtime has to decide whether to use nested loop or hash join
10:49
But this overhead is compensated by SQL Server choosing the right operator
10:58
to do the job. Let's now switch to Management Studio and show this in practice.
11:10
I will zoom in so you can see what I'm talking about in this demo.
11:20
We will use the WideWorldImportersDW database, make sure it's at the compatibility level of SQL Server 2019,
11:29
and we will free the procedure cache. Okay. Let's go into this database. With this statement, I'm just making sure that
11:44
a columnstore index exists on the Fact.Order table, because I need it for the adaptive join to kick in.
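A sketch of that setup, assuming the standard WideWorldImportersDW names and an invented index name (the exact statement is not visible on screen):

USE WideWorldImportersDW;
GO
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150;  -- SQL Server 2019
GO
DBCC FREEPROCCACHE;  -- clear the plan cache (demo machine only)
GO
-- Adaptive join needs batch mode, so make sure Fact.[Order] has a nonclustered columnstore index
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID('Fact.[Order]') AND type = 6)  -- 6 = nonclustered columnstore
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Fact_Order
        ON Fact.[Order] (Quantity, [Stock Item Key], [Order Key]);
GO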
11:57
Okay. And now let's execute these two queries. These two queries are exactly the same
12:08
The only difference is that the predicate for the quantity is 360 in the first query and 361 in the second query
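The pair of queries might look roughly like this; the joined dimension table and the column list are assumptions, only the Quantity predicate is taken from the recording:

-- Identical queries; only the Quantity predicate differs
SELECT fo.[Order Key], fo.Quantity
FROM Fact.[Order] AS fo
JOIN Dimension.[Stock Item] AS si ON fo.[Stock Item Key] = si.[Stock Item Key]
WHERE fo.Quantity = 360;   -- returns a few hundred rows, above the adaptive threshold

SELECT fo.[Order Key], fo.Quantity
FROM Fact.[Order] AS fo
JOIN Dimension.[Stock Item] AS si ON fo.[Stock Item Key] = si.[Stock Item Key]
WHERE fo.Quantity = 361;   -- returns 0 rows, below the adaptive threshold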
12:19
So let's run them together. Let's include actual execution plan. Okay, it's already included
12:29
And run this. First query returns 206 rows, and the second query returns 0 rows
12:39
Let's see the execution plans. You can see here the adaptive join operator,
12:47
and here also an adaptive join operator. So for the first query, let me show you when I hover over the adaptive join.
13:02
You can see here that adaptive threshold I talked about is 67.89
13:10
So if the first input coming into the adaptive join
13:19
has more rows than 67.89 at runtime, remember that, then the operator will
13:29
transform to a hash match, as it is here. And in the second query,
13:46
we can see that the adaptive threshold is 6.5. Why is it different than in the first query?
13:56
Because it's a different query, because of the different predicate, and my database is in the simple parameterization mode.
14:07
Forced parameterization is not enabled. So now SQL Server, because zero rows are coming into the adaptive join operator,
14:17
is choosing nested loops. So at runtime, SQL Server chooses nested loops or a hash join.
14:27
if you for some reason identify that after migrating to SQL Server 2017
14:35
or 2019, you experience some regressions in your queries, you can
14:41
easily disable the batch mode adaptive joins by using this hint at the query level,
14:51
or you can disable it at the database level. But I highly recommend that if you determine that some query is slower than before because of batch mode adaptive joins, you use this hint to disable the adaptive joins.
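For reference, the two opt-outs mentioned here use the documented hint and database scoped configuration names; the query shape repeats the sketch from above:

SELECT fo.[Order Key], fo.Quantity
FROM Fact.[Order] AS fo
JOIN Dimension.[Stock Item] AS si ON fo.[Stock Item Key] = si.[Stock Item Key]
WHERE fo.Quantity = 360
OPTION (USE HINT('DISALLOW_BATCH_MODE_ADAPTIVE_JOINS'));   -- per query

ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ADAPTIVE_JOINS = OFF;   -- whole database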
15:11
And now you get the old way of compiling and running the execution plan by SQL Server,
15:21
and you will not see the adaptive join operator, but the SQL Server query optimizer will choose a nested loop
15:28
or hash join at compile time and use it at runtime. Okay
15:43
Next feature is memory grant feedback. It works on 2017 in batch mode only
15:50
and in 2019 in row and batch mode. What is memory grant feedback?
15:57
Well, this is a feature where SQL Server at runtime can modify the memory granted to the query
16:07
So if SQL Server determines that too much memory was granted to the query, in the second execution of the same query
16:18
SQL Server will decrease the amount of granted memory, and vice versa if the granted memory was too little
16:28
and because of that you had some troubles with the query, like spills to tempdb or something else,
16:37
SQL Server, at the next execution of the same query, which is fetched from the execution plan cache,
16:43
will give this query more memory than it was initially given
16:53
So this is, again, modification of the execution plan in cache. You can also disable it or enable it
17:06
and if SQL Server encounters a situation where memory increase, memory decrease
17:14
fluctuation is going in the loop, you can imagine that this can happen
17:20
After about 30 occurrences of this increase and decrease fluctuation, SQL Server will disable memory grant feedback
17:31
for this statement. Let's see this in the demo
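Before the demo, as a reference, turning the feature off (or back on) uses the documented database scoped configuration names and USE HINT names; the example query below is just an arbitrary placeholder:

ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_MEMORY_GRANT_FEEDBACK = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;   -- SQL Server 2019 only

-- Per-query opt-out
SELECT COUNT(*) AS OrderCount
FROM Fact.[Order]
OPTION (USE HINT('DISABLE_BATCH_MODE_MEMORY_GRANT_FEEDBACK'),
        USE HINT('DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK'));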
17:44
Let's include actual execution plan. The first part is the same in all the scripts
17:52
Make sure that it's at the 2019 compatibility level. Now we will execute this query
18:04
and take a look at the properties. So let's run this. Okay, it's taking more time than... okay. So in the execution plan we see the exclamation mark
18:31
with the select operator. Let's click on it, hover it, and you can see that the query had to wait 13 seconds for memory grant during execution. Okay
18:46
this is why we waited for the query to execute, but the more important thing that I'm talking about
18:51
here is the query memory grant detected excessive grant, which may impact the reliability. Grant
18:59
size initial 23 megabytes, final the same, and used only 120 kilobytes
19:08
So this is something that memory grant feedback should correct. Let's take a look at the properties window of the select operator, and under memory grant info
19:22
you can see here that is memory grant feedback adjusted? No, this is the
19:31
first execution, so there is nothing to adjust because the query is not yet in the execution
19:37
plan cache. Let's run this query again. Okay. Again, we have exclamation mark
19:49
Let's see what it says now. It says the same message. Let's take a look at the memory grant info.
20:01
It's still saying first execution; there is something wrong in my plan cache. Let's run it again.
20:12
Okay, now we don't have the exclamation mark on the SELECT statement, and memory grant info says
20:25
is memory grant feedback adjusted: yes, adjusting. Now if I executed it a few more times,
20:36
this message would change to yes stable. Okay. Let's go to the next feature
20:51
And next feature is interleaved execution. Interleaved execution is a feature that applies
20:56
to multi-statement table-valued functions, which had a fixed cardinality of 100 from SQL Server 2014
21:04
and before that, the fixed cardinality was 1. And this could lead to numerous problems
21:12
when you had multi-statement table-valued functions that return hundreds of thousands or millions of rows.
21:21
SQL Server 2017 and 2019, when it encounters a multi-statement table-valued function,
21:31
pauses the optimization at compile time, executes the function, captures the accurate cardinality
21:40
and then resumes the optimization. And at runtime, because the function was executed
21:46
at compile time, it doesn't have to execute it again. So it only executes it once,
21:59
and this is during the compilation of the execution plan. This only applies to select queries
22:09
and as always, you can turn it on and off. Let's see this in a demo
22:22
So this is the demo. Let's clear the cache. Okay, this is our function. It's a very simple function that
22:37
takes start and end dates as parameters and returns a table, and we insert some rows into
22:50
that table from the Fact.Order table where the order date key is between start and end.
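Based on that description, the function might look roughly like this; the function name, column list, and date range are assumptions:

CREATE OR ALTER FUNCTION dbo.ufn_OrdersByDate (@StartDate date, @EndDate date)
RETURNS @Orders TABLE
(
    [Order Key]      bigint,
    [Order Date Key] date,
    Quantity         int
)
AS
BEGIN
    -- The INSERT into the return table is what makes this a multi-statement TVF
    INSERT INTO @Orders ([Order Key], [Order Date Key], Quantity)
    SELECT [Order Key], [Order Date Key], Quantity
    FROM Fact.[Order]
    WHERE [Order Date Key] BETWEEN @StartDate AND @EndDate;

    RETURN;
END;
GO

-- Used in a query, for example:
SELECT COUNT(*) AS RowsReturned
FROM dbo.ufn_OrdersByDate('2016-01-01', '2016-01-07') AS o;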
23:03
So a very simple but multi-statement table-valued function. Okay, so this is not an inline function. And now let's take a look at how this behaves under
23:18
compatibility level 150, so this is SQL 2019. I forgot to include the actual execution plan, sorry about that. And in the execution plan
23:35
we have an estimated number of rows of 5,281,
23:45
and the actual number of rows is exactly the same. This is what we expected because of what I told
23:52
you about this feature. And we can disable this with a hint or by setting the compatibility level
24:04
of the database to a compatibility level prior to SQL Server 2017. I'm putting this
24:11
database into compatibility level of SQL 2016. And now if I run this
24:20
we get an estimate of 100. And actual is the same, of course, because it is the same function
24:37
and we are using the same set of parameters as you can see here
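The hint-based opt-out mentioned here, and the compatibility level change, could look like this; the hint name is the documented one, and the function name reuses the sketch from above:

SELECT COUNT(*) AS RowsReturned
FROM dbo.ufn_OrdersByDate('2016-01-01', '2016-01-07') AS o
OPTION (USE HINT('DISABLE_INTERLEAVED_EXECUTION_TVF'));   -- keep the old fixed-cardinality guess

ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 130;   -- SQL Server 2016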
24:46
Okay, when could you have problems with these features? Well, you could have problems when you
24:52
rely on the old cardinality estimation. So if you rely on the fact that for your multi-statement table-valued function
25:01
you will get 100 even if the function is returning more rows than that, then you could have
25:10
regressions after migration to SQL 2017 or 2019. Those three features were the features that were already available on SQL 2017.
25:28
Now we will go through the features that are available on 2019 only.
25:37
So the first one is table variable deferred compilation. The story is the same as with MSTVFs: SQL Server pauses the optimization of
25:54
the execution plan, executes the table variable, and gets the actual row count.
26:01
Before that, it was a one-row guess, so on SQL 2017 and prior.
26:07
This doesn't mean that table variables get column statistics, and this doesn't increase recompilation frequency
26:17
And what happens with variations in table variable cardinality? Well, this is the same as with any other predicate: you are prone to parameter sniffing,
26:29
which could sometimes be a problem and sometimes not. Let's see this in a demo. Okay,
26:47
let's include the actual execution plan. We have a table variable named @foo,
26:55
and order key is the only field in that table. We are filling
27:02
the table variable with rows from the Fact.Order table with the same parameters we used for the multi-statement table-valued function.
27:13
And then we are selecting from Fact.Order and joining this @foo table variable by order key.
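A sketch of that batch, reusing the same assumed date range as before (the actual column list is not visible in the recording):

DECLARE @foo TABLE ([Order Key] bigint);

INSERT INTO @foo ([Order Key])
SELECT [Order Key]
FROM Fact.[Order]
WHERE [Order Date Key] BETWEEN '2016-01-01' AND '2016-01-07';

SELECT fo.[Order Key], fo.Quantity
FROM Fact.[Order] AS fo
JOIN @foo AS f ON f.[Order Key] = fo.[Order Key];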
27:25
Let's see how this behaves on SQL 2019. So we get the execution plan, and in the execution plan you can see that the estimate
27:43
is spot on, 5,281, and the actual is 5,281. Sorry, this one.
27:53
So this is because SQL executed the table variable and got the exact number of rows from that table variable
28:04
If we lower the compatibility level to SQL Server 2017, we will get different results
28:17
So now I'm executing the same set of statements I executed before on SQL 2019;
28:26
the only difference is the compatibility level, which is now 140, so SQL 2017.
28:33
And let's take a look at the estimates: now it's a fixed guess of one,
28:46
and the actual number is of course 5,281. The next feature, a really good one, is scalar UDF inlining, available on SQL 2019.
29:09
So what is Scalar UDF inlining? So it applies obviously to Scalar user-defined functions
29:20
And SQL Server Query Optimizer is now able to transform these user-defined functions into relational expressions
29:29
This is set-oriented and it is able to run in parallel, which was not the case in the SQL 2017 and prior
29:43
Whenever you had the UDF in the query, this query couldn't run in parallel
29:52
It was only able to run in serial mode. And in Management Studio, when you took a look at this query, it lacked costing.
30:06
This was a black box to SQL Server and to us when looking to the execution plan in Management Studio
30:16
And let's see this in action. Let's see a demo. Okay, let's do the same thing as we always do.
30:36
Okay, and now we have a scalar function named total price, which takes the customer key as an
30:45
input parameter and, from Fact.Order for this customer, calculates the total including tax.
30:55
It calculates the sum of that, puts that in the variable total price, and returns total price.
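From that description, the function might be something like this; the exact name, data types, and column names are assumptions:

CREATE OR ALTER FUNCTION dbo.ufn_TotalPrice (@CustomerKey int)
RETURNS decimal(18, 2)
AS
BEGIN
    DECLARE @TotalPrice decimal(18, 2);

    -- Sum the total including tax for one customer
    SELECT @TotalPrice = SUM([Total Including Tax])
    FROM Fact.[Order]
    WHERE [Customer Key] = @CustomerKey;

    RETURN @TotalPrice;
END;
GO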
31:04
So let's create this function. Okay, and now let's run the first query, which is on SQL Server 2019,
31:20
and let's set statistics time to on. So this query runs relatively fast. It
31:34
executed in two seconds, and in the execution plan we can see that SQL
31:43
Server unwrapped that scalar function. It is here. It's getting data from the order table.
31:56
The aggregation we can see here, and so on. And when we disable the scalar UDF inlining, so we are still on SQL 2019
32:08
but we have disabled UDF inlining. Let's run this query. And this query just..
32:20
It will not even execute in a satisfactory number of seconds.
32:32
I will not wait for the query to execute because it will run very, very slow
32:39
Let's take a look at the estimated execution plan. And you can see here that the UDF is in a different part of the execution plan. You don't even have it in the plan.
33:01
It's a black box to SQL Server. If we display the estimated execution plan of both queries
33:09
you will see that the query that doesn't allow scalar UDF inlining is cheaper according to Management Studio,
33:24
but this is only the estimate, of course, and this one is more expensive, which is a total lie.
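Disabling inlining, as done in this demo, could look like this; the hint and database scoped configuration names are the documented ones, while the query shape and function name come from the sketch above:

SELECT TOP (100) c.[Customer Key],
       dbo.ufn_TotalPrice(c.[Customer Key]) AS TotalPrice
FROM Dimension.Customer AS c
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));   -- per query

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;   -- whole database
-- A function can also be created or altered WITH INLINE = OFF to opt it out permanently.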
33:31
Let's go to the next feature. The next feature is batch mode on rowstore.
33:42
So this works only on SQL 2019 and on Enterprise Edition of SQL Server. It enables you to
33:52
run batch mode queries without columnstore indexes. So what does batch mode even mean?
34:04
SQL server can execute queries in row mode or batch mode. Row mode is the old way, available from forever
34:17
And it means that the first operator, for example, select, asks for one row from the next operator, and this operator asks the next operator for one
34:30
row, and it goes to the index seek or index scan or table scan operator, which then retrieves
34:37
one row and gives it back to the operator that asked for that row
34:43
In batch mode, the procedure is similar, but the operator is not asking for one row, but for a batch of rows.
34:57
How many rows will be in the one batch depends on the size of the row. so
35:09
what are the scenarios which can benefit from this? These are analytical queries on
35:19
OLTP databases, join and aggregate scenarios, queries which are CPU bound,
35:25
and if you cannot afford to create a column store index on your
35:31
OLTP database because it is too much overhead for you and for your database
35:38
Operators that can be run in batch mode are hash join, hash aggregate, sort, window aggregate, filter, concatenation, and compute scalar. And there are some limitations,
35:51
like only disk-based tables and indexes can be used. You cannot use in-memory tables
36:00
and expect them to be run in batch mode, and you cannot use LOBs, sparse columns, column sets, or XML.
36:10
Let's see this in a demo. Let's put the database into compatibility level SQL Server 2019
36:28
Let's go into WideWorldImportersDW. Let's drop the columnstore index on the Fact.Order table
36:40
to show that batch mode works without column store indexes on 2019
36:47
And we are running the same query. These two are completely the same
36:56
The first one has an ALLOW_BATCH_MODE hint, which is not needed,
36:59
but let's have it here. And the second one has a DISALLOW_BATCH_MODE hint, which disallows batch mode for the second query.
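Roughly, the demo pair might look like this; the GROUP BY columns are assumptions, the index name reuses the earlier sketch, and the USE HINT names are the documented ones:

-- Drop the columnstore index so Fact.[Order] has only rowstore indexes
DROP INDEX IF EXISTS NCCI_Fact_Order ON Fact.[Order];
GO

SELECT TOP (100) [Customer Key], SUM(Quantity) AS TotalQuantity
FROM Fact.[Order]
GROUP BY [Customer Key]
ORDER BY TotalQuantity DESC
OPTION (USE HINT('ALLOW_BATCH_MODE'));       -- batch mode on rowstore, the default under level 150

SELECT TOP (100) [Customer Key], SUM(Quantity) AS TotalQuantity
FROM Fact.[Order]
GROUP BY [Customer Key]
ORDER BY TotalQuantity DESC
OPTION (USE HINT('DISALLOW_BATCH_MODE'));    -- forces row mode for comparison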
37:10
Let's execute this with actual execution plan. And let's take a look. So first query returns 100 rows and second returns 100 rows also because
37:25
they are the same. First query that used batch mode ran in 163 milliseconds and the second one
37:35
in 348, so the first one was quicker. And if you take a look at the operators,
37:45
sorry, you can see that the estimated execution mode and actual execution mode is batch, and on the second query
38:00
it is row. The difference in the execution time
38:12
between these two queries is not significant, but when you're talking about hundreds of thousands
38:21
or millions of rows that have to be processed by the query
38:27
then the difference will be much higher. And then we have the approximate query processing feature, which introduces in SQL Server 2019
38:44
an APPROX_COUNT_DISTINCT function that allows you to return to the client an approximate number of
38:56
distinct rows and it guarantees up to a 2% error rate within a 97% probability
39:08
The most important difference between COUNT(DISTINCT) and APPROX_COUNT_DISTINCT is that APPROX_COUNT_DISTINCT
39:16
will use much less memory to do the same job. And it will run faster, of course
39:27
but you will not get the exact number of rows. So let's see this in a demo.
39:40
Okay, let's use WideWorldImportersDW, and we are counting order keys
39:52
on Fact.Order. You can see that the result is different.
40:03
This is an approximate result, and this is the exact number of rows. And I didn't include the actual execution plan.
40:13
Sorry for that, I will do this again. In the execution plan you can see that the SELECT statement had a memory grant of almost zero,
40:36
sorry, almost zero, and this one had 23 megabytes, this one.
40:53
So it's much, much higher memory usage when you use COUNT(DISTINCT) in contrast to using APPROX_COUNT_DISTINCT.
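The comparison shown here is essentially this, a sketch against Fact.[Order]:

-- Exact distinct count: needs a large memory grant for the distinct aggregation
SELECT COUNT(DISTINCT [Order Key]) AS ExactDistinct
FROM Fact.[Order];

-- Approximate distinct count: tiny memory grant, result within about 2% with 97% probability
SELECT APPROX_COUNT_DISTINCT([Order Key]) AS ApproxDistinct
FROM Fact.[Order];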
41:18
Okay, this is all I wanted to say


