High Performance Techniques For Microsoft SQL Server PDF
Editor-in-chief:
Aaron Bertrand
Authors:
Aaron Bertrand
Erin Stellato
Glenn Berry
Jason Hall
Joe Sack
Jonathan Kehayias
Kevin Kline
Paul Randal
Paul White
eBook Lead:
Eric Smith
Project Lead:
Kevin Kline
Copyright © 2013 SQL Sentry Inc
All Rights Reserved
Foreword
It is with great pleasure that I present to you our first eBook, a collection of blog posts from
SQLPerformance.com. In the pages ahead you will find several useful, hand-picked articles that will help
give you insight into some of your most vexing performance problems. These articles were written by
several of the SQL Server industry's leading experts, including Paul Randal, Jonathan Kehayias, and Paul
White.
I want to thank SQL Sentry for making me Editor-in-Chief of the site, my esteemed colleague Kevin Kline
for helping assemble this eBook, our technical editor, Eric Smith, and all of our authors who have helped
make our content top-notch, and of course our readers who keep us motivated to keep producing
quality material. Thank you.
Aaron Bertrand
Table of Contents
Best Approach for Running Totals
Split Strings the Right Way
Split Strings: Now with less T-SQL
My Perspective: The Top 5 Most Common SQL Server Performance Problems
Performance impact of different error handling techniques
Using named instances? Test your DAC connection!
What is the fastest way to calculate the median?
T-SQL Tuesday #33 : Trick Shots : Schema Switch-A-Roo
Conditional Order By
Splitting Strings : A Follow-Up
When the DRY principle doesn’t apply
Hit-Highlighting in Full-Text Search
What impact can different cursor options have?
How much impact can a data type choice have?
What is the most efficient way to trim time from datetime?
Beware misleading data from SET STATISTICS IO
Trimming time from datetime – a follow-up
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
The Zombie PerfMon Counters That Never Die!
Is the sp_ prefix still a no-no?
Configuring a Dedicated Network for Availability Group Communication
Checking if a non-LOB column needs to be updated
Minimizing the impact of DBCC CHECKDB : DOs and DON’Ts
Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing
The Benefits of Indexing Foreign Keys
Quick Tip – Speed Up a Slow Restore from the Transaction Log
Ten Common Threats to Execution Plan Quality
Bad cardinality estimates coming from SSMS execution plans
An important change to Extended Events in SQL Server 2012
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Trimming the Transaction Log Fat
TRANSACTION_MUTEX and Multi-Session Transaction Access
Trimming More Transaction Log Fat
Generate a set or sequence without loops – part 1
Generate a set or sequence without loops – part 2
Generate a set or sequence without loops – part 3
Potential enhancements to ASPState
Selecting a Processor for SQL Server 2012
A Look At DBCC CHECKCONSTRAINTS and I/O
Transaction Log Configuration Issues
The Halloween Problem – Part 1
The Halloween Problem – Part 2
The Halloween Problem – Part 3
The Halloween Problem – Part 4
Break large delete operations into chunks
The Problem with Windows Functions and Views
Index
Best Approach for Running Totals
By Aaron Bertrand
I see a lot of advice out there that says something along the lines of, “Change your cursor to a set-based
operation; that will make it faster.” While that can often be the case, it’s not always true. One use case I
see where a cursor repeatedly outperforms the typical set-based approach is the calculation of running
totals. This is because the set-based approach usually has to look at some portion of the underlying data
more than one time, a cost that grows quadratically as the data gets larger; whereas a cursor –
as painful as it might sound – can step through each row/value exactly once.
These are our basic options in most common versions of SQL Server. In SQL Server 2012, however, there
have been several enhancements made to windowing functions and the OVER clause, mostly stemming
from several great suggestions submitted by fellow MVP Itzik Ben-Gan (here is one of his suggestions).
In fact Itzik has a new MS-Press book that covers all of these enhancements in much greater detail,
entitled, “Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions.”
So naturally, I was curious; would the new windowing functionality make the cursor and self-join
techniques obsolete? Would they be easier to code? Would they be faster in any (never mind all) cases?
What other approaches might be valid?
The Setup
USE [master];
GO
IF DB_ID('RunningTotals') IS NOT NULL
BEGIN
ALTER DATABASE RunningTotals SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE RunningTotals;
END
GO
CREATE DATABASE RunningTotals;
GO
USE RunningTotals;
GO
SET NOCOUNT ON;
GO
And then fill a table with 10,000 rows that we can use to perform some running totals against. Nothing
too complicated, just a summary table with a row for each date and a number representing how many
speeding tickets were issued. I haven’t had a speeding ticket in a couple of years, so I don’t know why
this was my subconscious choice for a simplistic data model, but there it is.
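The table definition itself isn't shown in this excerpt; a minimal version consistent with the INSERT below (column types, and the constraint name pk referenced by a later index hint, are assumptions) would be:
CREATE TABLE dbo.SpeedingTickets
(
    [Date]      DATE NOT NULL CONSTRAINT pk PRIMARY KEY,
    TicketCount INT  NOT NULL
);
GO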
;WITH x(d,h) AS
(
SELECT TOP (250)
ROW_NUMBER() OVER (ORDER BY [object_id]),
CONVERT(INT, RIGHT([object_id], 2))
FROM sys.all_objects
ORDER BY [object_id]
)
INSERT dbo.SpeedingTickets([Date], TicketCount)
SELECT TOP (10000)
d = DATEADD(DAY, x2.d + ((x.d-1)*250), '19831231'),
x2.h
FROM x CROSS JOIN x AS x2
ORDER BY d;
GO
So again, 10,000 rows of pretty simple data – small INT values and a series of dates from 1984 through May
of 2011.
The Approaches
Now my assignment is relatively simple and typical of many applications: return a resultset that has all 10,000
dates, along with the cumulative total of all speeding tickets up to and including that date. Most people
would first try something like this (we’ll call this the “inner join” method):
SELECT
st1.[Date],
st1.TicketCount,
RunningTotal = SUM(st2.TicketCount)
FROM
dbo.SpeedingTickets AS st1
INNER JOIN
dbo.SpeedingTickets AS st2
ON st2.[Date] <= st1.[Date]
GROUP BY st1.[Date], st1.TicketCount
ORDER BY st1.[Date];
…and be shocked to discover that it takes nearly 10 seconds to run. Let’s quickly examine why by viewing the
graphical execution plan, using SQL Sentry Plan Explorer:
The big fat arrows should give an immediate indication of what is going on: the nested loop reads one
row for the first aggregation, two rows for the second, three rows for the third, and on and on through
the entire set of 10,000 rows. This means we should see roughly ((10000 * (10000 + 1)) / 2) rows
processed once the entire set is traversed, and that seems to match with the number of rows shown in
the plan.
Note that running the query without parallelism (using the OPTION (MAXDOP 1) query hint) makes the
plan shape a little simpler, but does not help at all in either execution time or I/O; as shown in the plan,
duration actually almost doubles, and reads only decrease by a very small percentage. Comparing to the
previous plan:
There are plenty of other approaches that people have tried to get efficient running totals. One example
is the “subquery method” which just uses a correlated subquery in much the same way as the inner join
method described above:
SELECT
[Date],
TicketCount,
RunningTotal = TicketCount + COALESCE(
(
SELECT SUM(TicketCount)
FROM dbo.SpeedingTickets AS s
WHERE s.[Date] < o.[Date]), 0
)
FROM dbo.SpeedingTickets AS o
ORDER BY [Date];
Comparing those two plans:
So while the subquery method appears to have a more efficient overall plan, it is worse where it
matters: duration and I/O. We can see what contributes to this by digging into the plans a little deeper.
By moving to the Top Operations tab, we can see that in the inner join method, the clustered index seek
is executed 10,000 times, and all other operations are only executed a few times. However, several
operations are executed 9,999 or 10,000 times in the subquery method:
So, the subquery approach seems to be worse, not better. The next method we’ll try, I’ll call the “quirky
update” method. This is not exactly guaranteed to work, and I would never recommend it for production
code, but I’m including it for completeness. Basically the quirky update takes advantage of the fact that
during an update you can redirect assignment and math so that the variable increments behind the scenes as
each row is updated.
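The @st table variable and the running-total variable aren't declared in this excerpt; a setup consistent with the statements below (column types are assumptions) would look roughly like this:
DECLARE @RunningTotal INT = 0;

DECLARE @st TABLE
(
    [Date]       DATE PRIMARY KEY,
    TicketCount  INT,
    RunningTotal INT
);

INSERT @st([Date], TicketCount, RunningTotal)
    SELECT [Date], TicketCount, 0
    FROM dbo.SpeedingTickets
    ORDER BY [Date];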
UPDATE @st
SET @RunningTotal = RunningTotal = @RunningTotal + TicketCount
FROM @st;
A variation adds an OPTION (FORCE ORDER) hint in an attempt to exert some influence over the order in which rows are processed:
UPDATE @st
SET @RunningTotal = RunningTotal = @RunningTotal + TicketCount
FROM @st
OPTION (FORCE ORDER);
For a little more confidence at a slightly higher I/O cost, you can bring the original table back into play, and
ensure that the PK on the base table is used:
UPDATE st
SET @RunningTotal = st.RunningTotal = @RunningTotal + t.TicketCount
FROM dbo.SpeedingTickets AS t WITH (INDEX = pk)
INNER JOIN @st AS st
ON t.[Date] = st.[Date]
OPTION (FORCE ORDER);
Personally I don’t think it’s that much more guaranteed, since the SET part of the operation could potentially
influence the optimizer independent of the rest of the query. Again, I’m not recommending this approach,
I’m just including the comparison for completeness. Here is the plan from this query:
Based on the number of executions we see in the Top Operations tab (I’ll spare you the screen shot; it’s
1 for every operation), it is clear that even if we perform a join in order to feel better about ordering,
the quirky update allows the running totals to be calculated in a single pass of the data. Comparing it to
the previous queries, it is much more efficient, even though it first dumps data into a table variable and
is separated out into multiple operations:
This brings us to a “recursive CTE” method. This method uses the date value, and relies on the assumption
that there are no gaps. Since we populated this data above, we know that it is a fully contiguous series, but in
a lot of scenarios you can’t make that assumption. So, while I’ve included it for completeness, this approach
isn’t always going to be valid. In any case, this uses a recursive CTE with the first (known) date in the table as
the anchor, and the recursive portion determined by adding one day (adding the MAXRECURSION option
since we know exactly how many rows we have):
;WITH x AS
(
SELECT [Date], TicketCount, RunningTotal = TicketCount
FROM dbo.SpeedingTickets
WHERE [Date] = '19840101'
UNION ALL
SELECT y.[Date], y.TicketCount, x.RunningTotal + y.TicketCount
FROM x INNER JOIN dbo.SpeedingTickets AS y
ON y.[Date] = DATEADD(DAY, 1, x.[Date])
)
SELECT [Date], TicketCount, RunningTotal
FROM x
ORDER BY [Date]
OPTION (MAXRECURSION 10000);
This query works about as efficiently as the quirky update method. We can compare it against the
subquery and inner join methods:
Like the quirky update method, I would not recommend this CTE approach in production unless you can
absolutely guarantee that your key column has no gaps. If you may have gaps in your data, you can
construct something similar using ROW_NUMBER(), but it is not going to be any more efficient than the
self-join method above.
Next up is the cursor solution described at the beginning of the post:
DECLARE
@Date DATE,
@TicketCount INT,
@RunningTotal INT = 0;
DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT [Date], TicketCount
FROM dbo.SpeedingTickets
ORDER BY [Date];
OPEN c;
FETCH NEXT FROM c INTO @Date, @TicketCount;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @TicketCount;
    -- (in the full script, the date and running total are stored off here)
    FETCH NEXT FROM c INTO @Date, @TicketCount;
END

CLOSE c;
DEALLOCATE c;
We can see that over 20,000 statements have been collected, but if we sort by Estimated or Actual Rows
descending, we find that there are only two operations that handle more than one row – a far cry from
several of the methods above, which rack up enormous read counts by reading the same previous rows
over and over again for each new row.
Now, let’s take a look at the new windowing enhancements in SQL Server 2012. In particular, we can
now calculate SUM OVER() and specify a set of rows relative to the current row. So, for example:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
These two queries happen to give the same answer, with correct running totals. But do they work
exactly the same? The plans suggest that they don’t. The version with ROWS has an additional operator,
a 10,000-row sequence project:
And that’s about the extent of the difference in the graphical plan. But if you look a little closer at actual
runtime metrics, you see minor differences in duration and CPU, and a huge difference in reads. Why is
this? Well, this is because RANGE uses an on-disk spool, while ROWS uses an in-memory spool. With
small sets the difference is probably negligible, but the cost of the on-disk spool can certainly become
more apparent as sets get larger. I don’t want to spoil the ending, but you might suspect that one of
these solutions will perform better than the other in a more thorough test.
As an aside, the following version of the query yields the same results, but works like the slower RANGE
version above:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date])
FROM dbo.SpeedingTickets
ORDER BY [Date];
So as you’re playing with the new windowing functions, you’ll want to keep little tidbits like this in mind:
the abbreviated version of a query, or the one that you happen to have written first, is not necessarily
the one you want to push to production.
In order to conduct fair tests, I created a stored procedure for each approach, and measured the results
by capturing statements on a server where I was already monitoring with SQL Sentry Performance
Advisor (if you are not using our tool, you can collect SQL:BatchCompleted events in a similar way using
SQL Server Profiler).
By “fair tests” I mean that, for example, the quirky update method requires an actual update to static
data, which means changing the underlying schema or using a temp table / table variable. So I
structured the stored procedures to each create their own table variable, and either store the results
there, or store the raw data there and then update the result. The other issue I wanted to eliminate was
returning the data to the client – so the procedures each have a debug parameter specifying whether to
return no results (the default), top/bottom 5, or all. In the performance tests I set it to return no results,
but of course validated each to ensure that they were returning the right results.
The stored procedures are all modeled this way (I’ve attached a script that creates the database and the
stored procedures, so I’m just including a template here for brevity):
I still found that one of the commands above was not getting caught by the Top SQL threshold; it was
the Windowed_Rows version. So I added the following to that batch only:
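The exact statement isn't preserved here, but based on the conclusion below (which mentions ignoring "the second I added with WAITFOR"), it was presumably a one-second delay so that the batch would register above the Top SQL threshold:
WAITFOR DELAY '00:00:01';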
Conclusion
1. Windowed aggregates in SQL Server 2012 make running totals computations (and many other
next row(s) / previous row(s) problems) alarmingly more
efficient. When I saw the low number of reads I thought for sure there was some kind of
mistake, that I must have forgotten to actually perform any work. But no, you get the same
number of reads if your stored procedure just performs an ordinary SELECT from the
SpeedingTickets table. (Feel free to test this yourself with STATISTICS IO.)
2. The issues I pointed out earlier about RANGE vs. ROWS yield slightly different runtimes
(duration difference of about 6x – remember to ignore the second I added with WAITFOR), but
read differences are astronomical due to the on-disk spool. If your windowed aggregate can be
solved using ROWS, avoid RANGE, but you should test that both give the same result (or at least
that ROWS gives the right answer). You should also note that if you are using a similar query and
you don't specify either RANGE or ROWS, the plan will operate as if you had specified RANGE.
3. The subquery and inner join methods are relatively abysmal. 35 seconds to a minute to generate
these running totals? And this was on a single, skinny table without returning results to the
client. These comparisons can be used to show people why a purely set-based solution is not
always the best answer.
4. Of the faster approaches, assuming you are not yet ready for SQL Server 2012, and assuming
you discard both the quirky update method (unsupported) and the CTE date method (can’t
guarantee a contiguous sequence), only the cursor performs acceptably. It has the highest
duration of the “faster” solutions, but the least amount of reads.
I hope these tests help give a better appreciation for the windowing enhancements that Microsoft has
added to SQL Server 2012. Please be sure to thank Itzik if you see him online or in person, since he was
the driving force behind these changes. In addition, I hope this helps open some minds out there that a
cursor may not always be the evil and dreaded solution it is often depicted to be.
(As an addendum, I did test the CLR function offered by Pavel Pawlowski, and the performance
characteristics were nearly identical to the SQL Server 2012 solution using ROWS. Reads were identical,
CPU was 78 vs. 47, and overall duration was 73 instead of 40. So if you won’t be moving to SQL Server
2012 in the near future, you may want to add Pavel’s solution to your tests.)
Split Strings the Right Way
By Aaron Bertrand
I know many people are bored of the string splitting problem, but it still seems to come up almost daily
on forums and Q & A sites like Stack Overflow. This is the problem where people want to pass in a string
like this:
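The original example didn't survive the excerpt, but the pattern is the familiar one: the application concatenates a delimited list into a single string parameter and expects the procedure to tear it apart (the procedure and parameter names below are taken from the TVP example later in the article; the values are made up):
EXEC dbo.UpdateProfile
    @UserID    = 1,
    @TeamNames = N'Engineering,Operations,Support,Sales';
…and then use the individual values inside the procedure, typically in an IN() predicate or a JOIN.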
The answer seems to be, almost invariably, that you should use CLR. If you can’t use CLR – and I know there
are many of you out there who can’t, due to corporate policy, the pointy-haired boss, or stubbornness – then
you use one of the many workarounds that exist. And many workarounds exist.
I’m going to compare the performance of a few solutions – and focus on the question everyone always asks:
“Which is fastest?” I’m not going to belabor the discussion around *all* of the potential methods, because
several have already been eliminated due to the fact that they simply don’t scale. And I may re-visit this in
the future to examine the impact on other metrics, but for now I’m just going to focus on duration. Here are
the contenders I am going to compare (using SQL Server 2012, 11.00.2316, on a Windows 7 VM with 4 CPUs
and 8 GB of RAM):
CLR
If you wish to use CLR, you should definitely borrow code from fellow MVP Adam Machanic before thinking
about writing your own (I’ve blogged before about re-inventing the wheel, and it also applies to free code
snippets like this). He spent a lot of time fine-tuning this CLR function to efficiently parse a string. If you are
currently using a CLR function and this is not it, I strongly recommend you deploy it and compare – I tested it
against a much simpler, VB-based CLR routine that was functionally equivalent, but performed about three
times worse.
So I took Adam’s function, compiled the code to a DLL (using csc), and deployed just that file to the server.
Then I added the following assembly and function to my database:
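The deployment statements aren't reproduced here; they follow the standard CREATE ASSEMBLY / CREATE FUNCTION pattern. The assembly, class, method, and path names below are assumptions – only the T-SQL function name dbo.SplitStrings_CLR is referenced later in the article:
CREATE ASSEMBLY CLRUtilities
FROM 'C:\path\to\CLRUtilities.dll'   -- path is a placeholder
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.SplitStrings_CLR
(
    @List      NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE (Item NVARCHAR(4000))
EXTERNAL NAME CLRUtilities.UserDefinedFunctions.SplitString_Multi;
GO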
XML
This is the typical function I use for one-off scenarios where I know the input is “safe,” but is not one I
recommend for production environments (more on that below).
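The function itself isn't preserved in this excerpt; the usual XML-based splitter looks something like this (the name follows the SplitStrings_<method> convention used later in the article, and the output size is an assumption):
CREATE FUNCTION dbo.SplitStrings_XML
(
    @List      NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN
    (
        SELECT Item = y.i.value(N'(./text())[1]', N'nvarchar(4000)')
        FROM
        (
            SELECT x = CONVERT(XML, N'<i>'
                + REPLACE(@List, @Delimiter, N'</i><i>')
                + N'</i>').query(N'.')
        ) AS a
        CROSS APPLY a.x.nodes(N'i') AS y(i)
    );
GO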
A very strong caveat has to ride along with the XML approach: it can only be used if you can guarantee that
your input string does not contain any illegal XML characters. One name with <, > or & and the function will
blow up. So regardless of the performance, if you’re going to use this approach, be aware of the limitations –
it should not be considered a viable option for a generic string splitter. I’m including it in this round-up
because you may have a case where you can trust the input – for example, comma-separated lists of
integers or GUIDs.
Numbers table
This solution uses a Numbers table, which you must build and populate yourself. (We’ve been requesting a
built-in version for ages.) The Numbers table should contain enough rows to exceed the length of the longest
string you’ll be splitting. In this case we’ll use 1,000,000 rows:
DECLARE @UpperLimit INT = 1000000;

WITH n AS
(
SELECT
x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
CROSS JOIN sys.all_objects AS s3
)
SELECT Number = x
INTO dbo.Numbers
FROM n
WHERE x BETWEEN 1 AND @UpperLimit;
GO
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(Number)
WITH (DATA_COMPRESSION = PAGE);
GO
(Using data compression will drastically reduce the number of pages required, but obviously you should only
use this option if you are running Enterprise Edition. In this case the compressed data requires 1,360 pages,
versus 2,102 pages without compression – about a 35% savings.)
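The split function built on dbo.Numbers isn't reproduced here; a common formulation (the name and output size are assumptions) is:
CREATE FUNCTION dbo.SplitStrings_Numbers
(
    @List      NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN
    (
        SELECT Item = SUBSTRING(@List, Number,
            CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
        FROM dbo.Numbers
        WHERE Number <= CONVERT(INT, LEN(@List))
          AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
    );
GO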
Recursive CTE
This solution uses a recursive CTE to extract each part of the string from the "remainder" of the previous
part. Since it relies on a recursive CTE and local variables, you'll note that this had to be a multi-statement
table-valued function, unlike the others, which are all inline.
CREATE FUNCTION dbo.SplitStrings_CTE
(
    @List       NVARCHAR(MAX),
    @Delimiter  NVARCHAR(255)
)
RETURNS @Items TABLE (Item NVARCHAR(4000))
WITH SCHEMABINDING
AS
BEGIN
    -- NOTE: the header above is reconstructed so the excerpt is self-contained;
    -- the function name and column size are assumptions based on the rest of the article.
    DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);

    WITH a AS
    (
        SELECT
            [start] = 1,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, @ld), 0), @ll),
            [value] = SUBSTRING(@List, 1,
                      COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, @ld), 0), @ll) - 1)
        UNION ALL
        SELECT
            [start] = CONVERT(INT, [end]) + @ld,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + @ld), 0), @ll),
            [value] = SUBSTRING(@List, [end] + @ld,
                      COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + @ld), 0), @ll) - [end] - @ld)
        FROM a
        WHERE [end] < @ll
    )
    INSERT @Items SELECT [value]
    FROM a
    WHERE LEN([value]) > 0
    OPTION (MAXRECURSION 0);

    RETURN;
END
GO
Jeff Moden’s splitter
Over on SQLServerCentral, Jeff Moden presented a splitter function that rivaled the performance of CLR, so I
thought it only fair to include it in this round-up. I had to make a few minor changes to his function in order
to handle our longest string (500,000 characters), and also made the naming conventions similar. (His
function is not reproduced here; see his article on SQLServerCentral for the full source.)
As an aside, for those using Jeff Moden's solution, you may consider using a Numbers table as above, and
experimenting with a slight variation on Jeff's function that reads from the permanent Numbers table
instead of generating the tally on the fly.
(This will trade slightly higher reads for slightly lower CPU, so may be better depending on whether your
system is already CPU- or I/O-bound.)
Sanity checking
Just to be sure we’re on the right track, we can verify that all five functions return the expected results:
And in fact, these are the results we see in all five cases…
Now that we know the functions behave as expected, we can get to the fun part: testing performance against
various numbers of strings that vary in length. But first we need a table. I created the following simple object:
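The table definition didn't survive the excerpt; something along these lines is consistent with the columns referenced later (string_type and string_value), with types as assumptions:
CREATE TABLE dbo.strings
(
    string_type  INT,
    string_value NVARCHAR(MAX)
);
GO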
I populated this table with a set of strings of varying lengths, making sure that roughly the same set of data
would be used for each test – first 10,000 rows where the string is 50 characters long, then 1,000 rows where
the string is 500 characters long, 100 rows where the string is 5,000 characters long, 10 rows where the string
is 50,000 characters long, and so on up to 1 row of 500,000 characters. I did this both to compare the same
amount of overall data being processed by the functions, as well as to try to keep my testing times somewhat
predictable.
I use a #temp table so that I can simply use GO <constant> to execute each batch a specific number of times:
Creating and populating this table took about 20 seconds on my machine, and the table represents about 6
MB worth of data (about 500,000 characters times 2 bytes, or 1 MB per string_type, plus row and index
overhead). Not a huge table, but it should be large enough to highlight any differences in performance
between the functions.
The Tests
With the functions in place, and the table properly stuffed with big strings to chew on, we can finally run
some actual tests to see how the different functions perform against real data. In order to measure
performance without factoring in network overhead, I used SQL Sentry Plan Explorer, running each set of
tests 10 times, collecting the duration metrics, and averaging.
The first test simply pulled the items from each string as a set:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
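The query itself isn't preserved; for each of the five functions it presumably followed this shape (the <method> placeholder matches the convention used later in the article):
SELECT t.Item
FROM dbo.strings AS s
CROSS APPLY dbo.SplitStrings_<method>(s.string_value, N',') AS t
WHERE s.string_type = 1; -- 1 through 5, one batch per string size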
The results show that as the strings get larger, the advantage of CLR really shines. At the lower end, the
results were mixed, but again the XML method should have an asterisk next to it, since its use depends on
relying on XML-safe input. For this specific use case, the Numbers table consistently performed the worst:
After the hyperbolic 40-second performance for the numbers table against 10 rows of 50,000
characters, I dropped it from the running for the last test. To better show the relative performance of
the four best methods in this test, I’ve dropped the Numbers results from the graph altogether:
Next, let’s compare when we perform a search against the comma-separated value (e.g. return the rows
where one of the strings is ‘foo’). Again we’ll use the five functions above, but we’ll also compare the
result against a search performed at runtime using LIKE instead of bothering with splitting.
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
SELECT s.string_type
FROM dbo.strings
WHERE string_type = @i
AND ',' + string_value + ',' LIKE '%,' + @search + ',%';
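The split-based version of the search isn't shown above; for each function it would look roughly like this (again using the article's <method> placeholder):
SELECT s.string_type
FROM dbo.strings AS s
WHERE s.string_type = @i
  AND EXISTS
  (
      SELECT 1
      FROM dbo.SplitStrings_<method>(s.string_value, N',')
      WHERE Item = @search
  );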
These results show that, for small strings, CLR was actually the slowest, and that the best solution is
going to be performing a scan using LIKE, without bothering to split the data up at all. Again I dropped
the Numbers table solution from the 5th approach, when it was clear that its duration would increase
exponentially as the size of the string went up:
And to better demonstrate the patterns for the top 4 results, I’ve eliminated the Numbers and XML
solutions from the graph:
Next, let’s look at replicating the use case from the beginning of this post, where we’re trying to find all
the rows in one table that exist in the list being passed in. As with the data in the table we created
above, we’re going to create strings varying in length from 50 to 500,000 characters, store them in a
variable, and then check a common catalog view for existing in the list.
DECLARE
@i INT = <num>, -- value 1-5, yielding strings 50 - 500,000 characters
@x NVARCHAR(MAX) = N'a,id,xyz,abcd,abcde,sa,foo,bar,mort,splunge,bacon,';
SELECT c.[object_id]
FROM sys.all_columns AS c
WHERE EXISTS
(
SELECT 1 FROM dbo.SplitStrings_<method>(@x, N',') AS x
WHERE Item = c.name
)
ORDER BY c.[object_id];
SELECT [object_id]
FROM sys.all_columns
WHERE N',' + @x + ',' LIKE N'%,' + name + ',%'
ORDER BY [object_id];
These results show that, for this pattern, several methods see their duration increase exponentially as
the size of the string goes up. At the lower end, XML keeps good pace with CLR, but this quickly
deteriorates as well. CLR is consistently the clear winner here:
And again without the methods that explode upward in terms of duration:
Finally, let’s compare the cost of retrieving the data from a single variable of varying length, ignoring the
cost of reading data from a table. Again we’ll generate strings of varying length, from 50 – 500,000
characters, and then just return the values as a set:
DECLARE
@i INT = <num>, -- value 1-5, yielding strings 50 - 500,000 characters
@x NVARCHAR(MAX) = N'a,id,xyz,abcd,abcde,sa,foo,bar,mort,splunge,bacon,';
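The statement that presumably expanded @x to the target length based on @i isn't preserved, and neither is the final SELECT; the shape of the test itself was simply:
SELECT Item
FROM dbo.SplitStrings_<method>(@x, N',');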
In almost all cases, the CLR solution clearly out-performs the other approaches – in some cases it’s a
landslide victory, especially as string sizes increase; in a few others, it’s a photo finish that could fall
either way. In the first test we saw that XML and CTE out-performed CLR at the low end, so if this is a
typical use case *and* you are sure that your strings are in the 1 – 10,000 character range, one of those
approaches might be a better option. If your string sizes are less predictable than that, CLR is probably
still your best bet overall – you lose a few milliseconds at the low end, but you gain a whole lot at the
high end. Here are the choices I would make, depending on the task, with second place highlighted for
cases where CLR is not an option. Note that XML is my preferred method only if I know the input is XML-
safe; these may not necessarily be your best alternatives if you have less faith in your input.
The only real exception where CLR is not my choice across the board is the case where you’re actually
storing comma-separated lists in a table, and then finding rows where a defined entity is in that list. In
that specific case, I would probably first recommend redesigning and properly normalizing the schema,
so that those values are stored separately, rather than using it as an excuse to not use CLR for splitting.
If you can’t use CLR for other reasons, there isn’t a clear-cut “second place” revealed by these tests; my
answers above were based on overall scale and not at any specific string size. Every solution here was
runner up in at least one scenario – so while CLR is clearly the choice when you can use it, what you
should use when you cannot is more of an “it depends” answer – you’ll need to judge based on your use
case(s) and the tests above (or by constructing your own tests) which alternative is better for you.
The above approaches require no changes to your existing application(s), assuming they are already
assembling a comma-separated string and throwing it at the database to deal with. One option you
should consider, if either CLR is not an option and/or you can modify the application(s), is using Table-
Valued Parameters (TVPs). Here is a quick example of how to utilize a TVP in the above context. First,
create a table type with a single string column:
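The type definition isn't preserved here; based on the parameter name used in the C# code below, it was presumably along these lines (the type name and column size are assumptions), with a matching procedure that accepts it as a READONLY parameter:
CREATE TYPE dbo.TeamNamesTVP AS TABLE (Name NVARCHAR(255));
GO

CREATE PROCEDURE dbo.UpdateProfile
    @UserID    INT,
    @TeamNames dbo.TeamNamesTVP READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- ... join @TeamNames to the relevant tables instead of splitting a string ...
END
GO
On the application side, you build a DataTable with a single string column and bind it to the @TeamNames parameter as a structured parameter: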
using (connectionObject)
{
SqlCommand cmd = new SqlCommand("dbo.UpdateProfile", connectionObject);
cmd.CommandType = CommandType.StoredProcedure;
SqlParameter tvparam = cmd.Parameters.AddWithValue("@TeamNames", tvp);
tvparam.SqlDbType = SqlDbType.Structured;
// other parameters, e.g. userId
cmd.ExecuteNonQuery();
}
You might consider this to be a prequel to a follow-up post.
Of course this doesn’t play well with JSON and other APIs – quite often the reason a comma-separated
string is being passed to SQL Server in the first place.
Split Strings: Now with less T-SQL
By Aaron Bertrand
Some interesting discussions always evolve around the topic of splitting strings. In two previous blog
posts, “Split strings the right way – or the next best way” and “Splitting Strings : A Follow-Up,” I hope I
have demonstrated that chasing the “best-performing” T-SQL split function is fruitless. When splitting is
actually necessary, CLR always wins, and the next best option can vary depending on the actual task at
hand. But in those posts I hinted that splitting on the database side may not be necessary in the first
place.
SQL Server 2008 introduced table-valued parameters, a way to pass a “table” from an application to a
stored procedure without having to build and parse a string, serialize to XML, or deal with any of this
splitting methodology. So I thought I would check how this method compares to the winner of our
previous tests – since it may be a viable option, whether you can use CLR or not. (For the ultimate bible
on TVPs, please see fellow SQL Server MVP Erland Sommarskog’s comprehensive article.)
The Tests
For this test I’m going to pretend we are dealing with a set of version strings. Imagine a C# application
that passes in a set of these strings (say, that have been collected from a set of users) and we need to
match the versions against a table (say, which indicates the service releases that are applicable to a
specific set of versions). Obviously a real application would have more columns than this, but just to
create some volume and still keep the table skinny (I also use NVARCHAR throughout because that is
what the CLR split function takes and I want to eliminate any ambiguity due to implicit conversion):
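The table itself isn't defined in this excerpt; a definition consistent with the INSERT below and the NVARCHAR note above (sizes and the index name are assumptions) would be:
CREATE TABLE dbo.VersionStrings
(
    left_post  NVARCHAR(5) NOT NULL,
    right_post NVARCHAR(5) NOT NULL
);

CREATE CLUSTERED INDEX x ON dbo.VersionStrings(left_post, right_post);
GO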
;WITH x AS
(
SELECT lp = CONVERT(DECIMAL(4,3), RIGHT(RTRIM(s1.[object_id]), 3)/1000.0)
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
)
INSERT dbo.VersionStrings
(
left_post, right_post
)
SELECT
lp - CASE WHEN lp >= 0.9 THEN 0.1 ELSE 0 END,
lp + (0.1 * CASE WHEN lp >= 0.9 THEN -1 ELSE 1 END)
FROM x;
Now that the data is in place, the next thing we need to do is create a user-defined table type that can
hold a set of strings. The initial table type to hold this string is pretty simple:
CREATE TYPE dbo.VersionStringsTVP AS TABLE (VersionString NVARCHAR(5));
Then we need a couple of stored procedures to accept the lists from C#. For simplicity, again, we’ll just
take a count so that we can be sure to perform a complete scan, and we’ll ignore the count in the
application:
-- NOTE: the procedure headers below are reconstructed for readability; the names are placeholders.
CREATE PROCEDURE dbo.VersionList_CLR
    @list NVARCHAR(MAX)
AS
BEGIN
    SELECT c = COUNT(*)
    FROM dbo.VersionStrings AS v
    INNER JOIN dbo.SplitStrings_CLR(@list, N',') AS s
    ON s.Item BETWEEN v.left_post AND v.right_post;
END
GO
CREATE PROCEDURE dbo.VersionList_TVP
    @list dbo.VersionStringsTVP READONLY
AS
BEGIN
    SELECT c = COUNT(*)
    FROM dbo.VersionStrings AS v
    INNER JOIN @list AS l
    ON l.VersionString BETWEEN v.left_post AND v.right_post;
END
GO
Note that a TVP passed into a stored procedure must be marked as READONLY – there is currently no
way to perform DML on the data like you would for a table variable or temp table. However, Erland has
submitted a very popular request that Microsoft make these parameters more flexible, and he provides
plenty of deeper insight behind his argument as well.
The beauty here is that SQL Server no longer has to deal with splitting a string at all – neither in T-SQL
nor in handing it off to CLR – as it’s already in a set structure where it excels.
The C# test application does the following:
Accepts a number as an argument to indicate how many string elements should be defined
Builds a CSV string of those elements, using StringBuilder, to pass to the CLR stored procedure
Builds a DataTable with the same elements to pass to the TVP stored procedure
Also tests the overhead of converting a CSV string to a DataTable and vice-versa before calling
the appropriate stored procedures
The code for the C# app is found at the end of the article. I can spell C#, but I am by no means a guru; I
am sure there are inefficiencies you can spot there that may make the code perform a bit better. But
any such changes should affect the entire set of tests in a similar way.
I ran the application 10 times using 100, 1,000, 2,500 and 5,000 elements. The results were as follows
(this is showing average duration, in seconds, across the 10 tests):
Performance Aside…
In addition to the clear performance difference, TVPs have another advantage – table types are much
simpler to deploy than CLR assemblies, especially in environments where CLR has been forbidden for
other reasons. I am hoping that barriers to CLR are gradually disappearing, and new tools are making
deployment and maintenance less painful, but I doubt initial deployment of CLR will ever be
easier than native approaches.
On the other hand, on top of the read-only limitation, table types are like alias types in that they are
difficult to modify after the fact. If you want to change the size of a column or add a column, there is no
ALTER TYPE command, and in order to DROP the type and re-create it, you must first remove references
to the type from all procedures that are using it. So for example in the above case if we needed to
increase the VersionString column to NVARCHAR(32), we’d have to create a dummy type and alter the
stored procedure (and any other procedure that is using it):
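The workaround script isn't preserved in this excerpt; the general dance, using the placeholder procedure name from above, looks roughly like this:
-- 1. create a dummy type with the new definition
CREATE TYPE dbo.VersionStringsTVPCopy AS TABLE (VersionString NVARCHAR(32));
GO
-- 2. point the procedure(s) at the dummy type
ALTER PROCEDURE dbo.VersionList_TVP
    @list dbo.VersionStringsTVPCopy READONLY
AS
BEGIN
    SELECT c = COUNT(*)
    FROM dbo.VersionStrings AS v
    INNER JOIN @list AS l
    ON l.VersionString BETWEEN v.left_post AND v.right_post;
END
GO
-- 3. drop and re-create the original type with the new column size
DROP TYPE dbo.VersionStringsTVP;
CREATE TYPE dbo.VersionStringsTVP AS TABLE (VersionString NVARCHAR(32));
GO
-- 4. point the procedure(s) back at the original type, then drop the dummy
ALTER PROCEDURE dbo.VersionList_TVP
    @list dbo.VersionStringsTVP READONLY
AS
BEGIN
    SELECT c = COUNT(*)
    FROM dbo.VersionStrings AS v
    INNER JOIN @list AS l
    ON l.VersionString BETWEEN v.left_post AND v.right_post;
END
GO
DROP TYPE dbo.VersionStringsTVPCopy;
GO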
Conclusion
The TVP method consistently outperformed the CLR splitting method, and by a greater percentage as
the number of elements increased. Even adding in the overhead of converting an existing CSV string to a
DataTable yielded much better end-to-end performance. So I hope that, if I hadn’t already convinced
you to abandon your T-SQL string splitting techniques in favor of CLR, I have urged you to give table-
valued parameters a shot. It should be easy to test out even if you’re not currently using a DataTable (or
some equivalent).
The C# Code Used For These Tests
As I said, I’m no C# guru, so there are probably plenty of naïve things I am doing here, but the
methodology should be quite clear.
using System;
using System.IO;
using System.Data;
using System.Data.SqlClient;
using System.Text;
using System.Collections;
namespace SplitTester
{
class SplitTester
{
static void Main(string[] args)
{
DataTable dt_pure = new DataTable();
dt_pure.Columns.Add("Item", typeof(string));
using
(
SqlConnection conn = new SqlConnection(@"Data Source=.;
Trusted_Connection=yes;Initial Catalog=Splitter")
)
{
conn.Open();
// four cases:
// (1) pass CSV string directly to CLR split procedure
// (2) pass DataTable directly to TVP procedure
// (3) serialize CSV string from DataTable and pass CSV to CLR procedure
// (4) populate DataTable from CSV string and pass DataTable to TVP procedure
write("Finished (1)");
write("Finished (2)");
write("Finished (3)");
// ********** (4) ********** //
write("Finished (4)");
}
}

// Minimal stand-in for the author's console logging helper (its original
// implementation is not included in this excerpt):
static void write(string message)
{
Console.WriteLine("{0:HH:mm:ss.fff}  {1}", DateTime.Now, message);
}
}
}
My Perspective: The Top 5 Most Common SQL Server Performance Problems
By Jason Hall
As lots of you know already, Kevin Kline (blog|@KEKline) has joined us here at SQL Sentry. I've followed
Kevin myself for a good while, so I’m really excited to have him here with us.
One of the first things Kevin asked of me was to list the top 5 technical issues we see while providing
customer service for SQL Sentry. For our Client Services team, technical support is far more than just
supporting our own software products or managing licenses. Everyone on the team is deeply trained in
SQL Server, Windows and Microsoft .NET. In fact, in general, any member trained on our Client Services
team should be able to walk into the role of a Windows sysadmin, SQL Server DBA or .NET Application
Developer with very little adjustment effort.
When presented with a technical issue or question, we try to see past the surface, and read between
the lines. Many technical issues or questions are not rooted in what seems apparent on the surface, and
the Client Services team is expected to work to resolve the root problem rather than simply addressing
the immediate or obvious symptom.
As a result of this expectation, we do come across a variety of technical issues with both Windows and
SQL Server, and I thought that filling Kevin's request for a "Top 5" would make for an interesting blog
post. I’m starting with 5 SQL Server Performance issues, and I’ll have another post later on 5 general
technical issues that may or may not have anything to do with SQL Server.
I’ll make this a count down from 5 to 1, with 1 being the most common technical problem we see.
#5: Tempdb Contention
Jonathan Kehayias (blog|@SQLPoolBoy) covers this topic very well in a great article on Optimizing
tempdb configuration with SQL Server 2012 Extended Events, so I’m not going to attempt to go into it
very deeply here, but I will talk about my experience with it.
I’m starting to see this more and more. It is usually with a system making heavy use of tempdb for some
type of ETL process. This is especially common if it is an ongoing “real-time” style ETL process. I’m not
saying data loads are the cause of this by the way; I’m just relaying my observations.
The symptoms of this can vary, but some things are always the same. High PAGELATCH waits in tempdb
and poor performance recorded for processes using tempdb. I’ll typically follow the waits to Top SQL in
Performance Advisor, and see lots of queries that use temp tables listed in Top SQL. These queries
usually run in milliseconds, and should never be counted among the “Top SQL” for the server. This can
have people feeling like these queries are a large part of the problem, but that is not necessarily the
case at all. The queries are the victims of the real problem.
Once I’m suspecting this to be the case, I will usually jump to the Disk Activity tab in Performance
Advisor to see how tempdb is configured. Most times I actually see the same thing: A busy tempdb with
a single data file defined. From here I’ll usually recommend reconfiguring tempdb, and direct them to a
resource like Jonathan’s article for more information.
#4: Outdated Statistics
The problem here is that the thresholds for triggering auto statistics updates end up being the same in
most cases, even for a very large table. Without going into a very deep explanation, the threshold is
~20% of the rows in the table. So on a really big table it takes a lot of data change to trigger an update.
Kevin Kline has a nice, easy to follow explanation of this here as well.
The reason this makes the list is that DBAs seem really surprised to find out that the auto update isn't
taking care of things the way the name implies. Many DBAs also believe statistics should be handled by
their maintenance job, but when we look at that maintenance, it is doing index reorganizations
most of the time, and a reorg won't update the statistics either (though a rebuild will). I also want to note
here that if you are using the Fragmentation Manager feature in SQL Sentry 7.0 and higher, you can
have a running history of when your indexes were reorganized rather than rebuilt. This can help you
decide whether the problem you’re seeing could be related to auto update not happening.
The lesson here is really to keep an eye on statistics, and make sure they’re updated regularly, especially
on large tables, which are becoming more and more common as time goes on. Another option here can
be to use trace flag 2371 to actually change the formula used to trigger the update. The nicest
explanation of this option I have found is at this blog post by Juergen Thomas.
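One way to see for yourself how stale a table's statistics actually are – available on SQL Server 2008 R2 SP2 / 2012 SP1 and later – is to check sys.dm_db_stats_properties (the table name below is just a placeholder):
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'dbo.BigTable');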
#3: CXPACKET Waits
I would say that this is hands down the single most common wait type I see on larger SQL Server systems
when someone asks me to look into query performance with them.
There is a lot of information out there on how to deal with this, but sadly I still see a lot of people make
the initial assumption that the problem should be solved by having either the query or the entire server
set MAXDOP to 1. More often than not the problem can be handled by proper indexing or statistics
maintenance. It could also be that the plan cached for this query is just not optimal, and you can mark it
for recompile using sp_recompile, set recompile at the query level, or just evict the plan using DBCC
FREEPROCCACHE with a plan handle. It is best to exhaust these options before deciding to change
MAXDOP to 1 because you could be throwing away a lot of processing power without realizing it.
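If the waits are being driven by one badly behaved plan, a couple of the lighter-touch alternatives mentioned above look like this (object and column names are placeholders):
-- ask for a fresh plan for a single query, without touching server-level MAXDOP
SELECT SomeColumn
FROM dbo.SomeTable
WHERE OtherColumn = 42
OPTION (RECOMPILE);

-- or invalidate all cached plans that reference a specific object
EXEC sys.sp_recompile @objname = N'dbo.SomeProc';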
Paul Randal (blog|@PaulRandal) has a great survey on his blog here that seems to support what I’m
used to seeing as well. In fact, he’s the one who first taught me that MAXDOP 1 is not necessarily the
answer to this.
I’m going to focus on operation timeouts, since they are the most common. Operation timeout errors
from various software tools might be the most misunderstood situation I come across. The cause of
these really boils down to one simple thing though: The client executing the command has set a
maximum amount of time that it will wait for the command to complete. If this maximum is reached
prior to completion the command is aborted. An error is raised from the client.
Many times the timeout error will induce a panic mode, because the error can look a bit intimidating.
The reality is, though, that this is not much different than hitting the stop button in SSMS because the
query was taking too long. In fact, it will show up exactly the same in a profiler trace with Error = 2
(Aborted).
So, what does a timeout like this really tell us? It tells us that queries are taking longer than expected.
We should go into “performance tuning” mode rather than “something is broken” mode. The error
information from the client is really just some good information on where you might start to focus your
tuning efforts.
If you receive timeout errors from the SQL Sentry monitoring service, and one of the servers you are
monitoring is the source, this is not telling you that SQL Sentry is broken. This is SQL Sentry telling you
that this server is experiencing performance issues. Again, it is time for “performance tuning” mode.
These errors could be easily consumed internally, and retried later, but this would be doing our
customers a huge disservice. We believe that you should know about *any* potential problem on your
monitored server, even if it is SQL Sentry encountering the problem.
Incidentally, this is true for SQL Sentry just as it is for any other system that uses an RDBMS for a
repository: your SQL Sentry database needs some TLC now and again. Without it you may indeed
experience some timeouts from your SQL Sentry client. We spend a lot of time tuning our queries for
performance before they ever make it out the door, but proper maintenance will ensure they continue
to run as intended.
#1: Memory Pressure
This is the big one. As soon as Kevin mentioned wanting this list it's the first thing that popped into my
head. Not only because I see it so often, but also because it is so often mistaken for poor disk
performance.
There are lots of caches in SQL Server, but the most well-known is the data cache (aka buffer pool). The
easiest way to describe the data cache is that it is the data stored in memory, rather than persisted to
disk. Being able to store lots of data in memory long term is desirable because working with data in
memory is generally much quicker than having to perform physical IOs.
I could turn this post into a very long discussion on memory pressure in SQL Server at this point, but I
promise I will try to avoid that. There is already a ton of information available on this subject, and that is
not really the intent of this post. What I will say is that, usually, memory pressure manifests as a few
different symptoms. When viewed individually, some of these symptoms can lead you to incorrect, and
sometimes costly, conclusions.
The two misleading symptoms are that you may start to see higher than normal latency across the disk
subsystem, and you may start to see abnormally high waits related to disk activity. If you look at nothing
but these two symptoms, you may come to the conclusion that you need to work on your disk system.
This is why being presented with all relevant metrics on one dashboard is so important. You have to look
at the bigger picture, and having the memory-related data available along with the disk activity and
waits helps to paint a clearer picture of what is really going on.
Typically what I’ll see (along with the disk waits and disk latency) is a PLE (Page Lifetime Expectancy) that
is fairly low for this server. I describe it this way because what is good or bad for this value really
“depends”. The larger your buffer cache is, the higher your “critical” threshold will be for PLE. The more
data there is to churn in and out of the buffer, the worse off you will be when the “churn” actually
happens. Another consideration is NUMA. The way the PLE counter is calculated can cause this value
alone to be very misleading when multiple NUMA nodes are involved, as described by Paul Randal in his
blog post "Page Life Expectancy isn't what you think…". Luckily in SQL Sentry 7.0 and higher, you can
actually see where PLE is for the individual NUMA nodes in history mode, which makes this a bit less of a
problem.
I’ll usually also see consistently higher lazy writer activity, and SQL Server page faults (SQL Server going
to disk). Sometimes I’ll see what I call buffer tearing. It’s basically when the data buffer is up and down
frequently creating a jagged (or torn) edge on the history chart in Performance Advisor. Finally, I may
also see an abnormally large plan cache reducing available memory for the data cache.
All of these things together spell memory pressure, and there are various ways to deal with them, but
the important thing to note is that this is not a disk issue. It’s not saying that your disk system is
necessarily wonderful either, but I am saying I wouldn’t call up your SAN guy and order a bunch of new
hardware based on this situation. Once you get the memory pressure situation under control, SQL
Server will not need to go to disk as much, and the few symptoms related to disk may disappear
entirely!
The moral here is really to always consider the full picture of performance, because looking at one thing
out of context could severely limit your options for a solution.
One more honorable mention before wrapping up: SQL Server Agent job history retention.
In SQL Server Agent Properties, under History, you can adjust retention settings.
For some reason, I’ve seen quite a few people set this to unlimited by unchecking both checkboxes. If
you do this, and you use Agent jobs frequently, eventually you’re going to run into problems with job
history in MSDB, because these tables aren’t really indexed very well. The settings I’m using above are
generally fine for most cases, and if you’re using SQL Sentry Event Manager, you’re keeping this
information in the SQL Sentry database anyway, so retaining it here is just redundant.
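If the history tables in msdb have already been allowed to grow unchecked, they can be trimmed manually before capping the retention settings (the cutoff date is a placeholder):
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = '20130101';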
Conclusion
So there are my (current) top 5 most common SQL Server performance issues/topics. For #4 and #5, I
actually had to run some numbers to find out what they were, but for the top three, I knew without
having to consider it much at all. Thanks for reading!
Performance impact of different error handling techniques
By Aaron Bertrand
People wonder whether they should do their best to prevent exceptions, or just let the system handle
them. I’ve seen several discussions where folks debate whether they should do whatever they can to
prevent an exception, because error handling is “expensive.” There is no doubt that error handling isn’t
free, but I would predict that a constraint violation is at least as efficient as checking for a potential
violation first. This may be different for a key violation than a static constraint violation, for example, but
in this post I’m going to focus on the former.
There are a couple of obvious ways to handle the violation when it happens:
Just let the engine handle it, and bubble any exception back to the caller.
Use TRY/CATCH with ROLLBACK in the CATCH block (SQL Server 2005+).
And many take the approach that they should check if they’re going to incur the violation first, since it
seems cleaner to handle the duplicate yourself than to force the engine to do it. My theory is that you
should trust but verify; for example, consider this approach (mostly pseudo-code):
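The pseudo-code itself wasn't preserved; the "trust but verify" pattern being described is essentially this (a sketch, not the author's exact code – @Name stands for the incoming value):
IF NOT EXISTS (SELECT 1 FROM dbo.[Objects] WHERE name = @Name)
BEGIN
    INSERT dbo.[Objects](name) SELECT @Name;
END
ELSE
BEGIN
    -- handle the duplicate however the application prefers
    PRINT 'Duplicate – not inserting.';
END
For the tests, the dbo.[Objects] table (presumably with a unique constraint or index on name) is first seeded with a large batch of unique names: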
INSERT dbo.[Objects](name)
SELECT name FROM dbo.GenerateRows(100000, N'')
ORDER BY name;
GO
Now, since we are going to be inserting new unique values into the table, I created a procedure to
perform some cleanup at the beginning and end of each test – in addition to deleting any new rows
we’ve added, it will also clean up the cache and buffers. Not something you want to code into a
procedure on your production system, of course, but quite fine for local performance testing.
For these tests I picked 40,000 rows as my total number of insert attempts, and in the procedure I perform a
union of 20,000 unique or non-unique rows with 20,000 other unique or non-unique rows. You can see that I
hard-coded the cutoff strings in the procedure; please note that on your system these cutoffs will almost
certainly occur in a different place.
DECLARE
@CutoffString1 NVARCHAR(255),
@CutoffString2 NVARCHAR(255),
@Name NVARCHAR(255),
@Continue BIT = 1,
@LogID INT;
-- generate a new log entry
INSERT dbo.RunTimeLog(Spid, InsertType, ErrorHandlingMethod)
SELECT @@SPID, @InsertType, @ErrorHandlingMethod;
IF @InsertType = 'AllSuccess'
SELECT @CutoffString1 = N'database_audit_specifications_1000',
@CutoffString2 = N'dm_clr_properties_1398';
IF @InsertType = 'AllFail'
SELECT @CutoffString1 = N'', @CutoffString2 = N'';
IF @InsertType = 'HalfSuccess'
SELECT @CutoffString1 = N'database_audit_specifications_1000',
@CutoffString2 = N'';
DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT name FROM dbo.GenerateRows(@RowSplit, @CutoffString1)
UNION ALL
SELECT name FROM dbo.GenerateRows(@RowSplit, @CutoffString2);
OPEN c;
FETCH NEXT FROM c INTO @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Continue = 1;
    -- (the "check first" logic that can set @Continue = 0 when a potential
    --  violation is detected is omitted from this excerpt)
    IF @Continue = 1
    BEGIN
        -- just let the engine catch
        IF @ErrorHandlingMethod LIKE '%Insert'
        BEGIN
            INSERT dbo.[Objects](name) SELECT @Name;
        END
        -- (the TRY/CATCH and rollback branches are omitted from this excerpt)
    END

    FETCH NEXT FROM c INTO @Name;
END

CLOSE c;
DEALLOCATE c;
-- update the log entry
UPDATE dbo.RunTimeLog SET EndDate = SYSUTCDATETIME()
WHERE LogID = @LogID;
Having said that, the individual graphs for each scenario (high % of success, high % of failure, and 50-50)
really drive home the impact of each method.
Where all the inserts succeed
In this case we see that the overhead of checking for the violation first is negligible, with an average
difference of 0.7 seconds across the batch (or 125 microseconds per insert attempt):
Where only half the inserts succeed
When half the inserts fail, we see a big jump in the duration for the insert / rollback methods. The
scenario where we start a transaction and roll it back is about 6x slower across the batch when
compared to checking first (1.625 milliseconds per attempt vs. 0.275 milliseconds per attempt). Even the
TRY/CATCH method is 11% faster when we check first:
Where all the inserts fail
As you might expect, this shows the most pronounced impact of error handling, and the most obvious
benefits of checking first. The rollback method is nearly 70x slower in this case when we don’t check
compared to when we do (3.59 milliseconds per attempt vs. 0.065 milliseconds per attempt):
What does this tell us? If we think we are going to have a high rate of failure, or have no idea what our
potential failure rate will be, then checking first to avoid violations in the engine is going to be
tremendously worth our while. Even in the case where we have a successful insert every time, the cost
of checking first is marginal and easily justified by the potential cost of handling errors later (unless your
anticipated failure rate is exactly 0%).
So for now I think I will stick to my theory that, in simple cases, it makes sense to check for a potential
violation before telling SQL Server to go ahead and insert anyway. In a future post, I will look at the
performance impact of various isolation levels, concurrency, and maybe even a few other error handling
techniques.
[As an aside, I wrote a condensed version of this post as a tip for mssqltips.com back in February.]
Using named instances? Test your DAC connection!
By Aaron Bertrand
Playing around today, I discovered that I couldn’t connect to my local named instance using
the dedicated administrator connection (otherwise known as the DAC, but not that DAC):
It turns out that this symptom only affects *named* instances. I was talking about this with Jonathan
Kehayias, who had a default instance, and could connect fine. However he couldn’t connect if he
explicitly specified the port number, which led him to discover that TCP/IP was disabled.
While this affects named instances of Developer Edition specifically because the TCP/IP protocol is
disabled by default, there are other scenarios where this can hurt you if you have named instances and…
To resolve this, make sure that TCP/IP is enabled via the SQL Server Configuration Manager > Network
Protocols for <named instance> and make sure that the SQL Server Browser Service is running. You will
need to restart SQL Server.
Now, when you are able to connect via the DAC, if you try to connect within Management Studio, you
will get this error message:
This error message is benign (and I believe comes from the background IntelliSense connection). You can
see from your status bar that you are connected, and you can verify your connection is the DAC
connection by dismissing this error message and running a query.
In any case, confirming that you are able to connect via the DAC is an absolutely essential step in your
disaster recovery plan. If you can’t connect to the DAC, you should plan for one or both of the following
actions during your next maintenance window (or earlier, if you can afford a service restart):
enable TCP/IP
In either case, ensure the SQL Server Browser Service is running. Also be sure the server setting to
enable remote connections is enabled, since you never know where you might be when you need to
access an unresponsive server.
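A couple of concrete commands that belong in that plan: enabling the remote DAC ahead of time, and keeping the sqlcmd syntax handy (the -A switch requests the dedicated administrator connection; server and instance names are placeholders):
EXEC sys.sp_configure N'remote admin connections', 1;
RECONFIGURE;

-- from a command prompt:
-- sqlcmd -S ServerName\InstanceName -E -A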
Kendra Little wrote a great blog post about the DAC last year. It’s fun to root around and see what you
can do with the DAC, and it’s really nice to know it’s there, but it’s also important to know how it might
not be able to help you in the event of actual server hardship.
What is the fastest way to calculate the median?
By Aaron Bertrand
SQL Server has traditionally shied away from providing native solutions to some of the more common
statistical questions, such as calculating a median. According to WikiPedia, “median is described as the
numerical value separating the higher half of a sample, a population, or a probability distribution, from
the lower half. The median of a finite list of numbers can be found by arranging all the observations
from lowest value to highest value and picking the middle one. If there is an even number of
observations, then there is no single middle value; the median is then usually defined to be the mean of
the two middle values.”
In terms of a SQL Server query, the key thing you’ll take away from that is that you need to “arrange”
(sort) all of the values. Sorting in SQL Server is typically a pretty expensive operation if there isn’t a
supporting index, and adding an index to support an operation which probably isn’t requested that
often may not be worthwhile.
Let’s examine how we have typically solved this problem in previous versions of SQL Server. First let’s
create a very simple table so that we can eyeball that our logic is correct and deriving an accurate
median. We can test the following two tables, one with an even number of rows, and the other with an
odd number of rows:
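The table definitions themselves didn’t make it into this excerpt; a minimal sketch consistent with the inserts below (the column types are assumptions) would be:
CREATE TABLE dbo.EvenRows(id INT PRIMARY KEY, val INT);
CREATE TABLE dbo.OddRows (id INT PRIMARY KEY, val INT);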
INSERT dbo.EvenRows(id,val)
SELECT 1, 6
UNION ALL SELECT 2, 11
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 4
UNION ALL SELECT 5, 15
UNION ALL SELECT 6, 14
UNION ALL SELECT 7, 4
UNION ALL SELECT 8, 9;
INSERT dbo.OddRows(id,val)
SELECT 1, 6
UNION ALL SELECT 2, 11
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 4
UNION ALL SELECT 5, 15
UNION ALL SELECT 6, 14
UNION ALL SELECT 7, 4;
In SQL Server 2000, we were constrained to a very limited T-SQL dialect. I’m investigating these options
for comparison because some people out there are still running SQL Server 2000, and others may have
upgraded but, since their median calculations were written “back in the day,” the code might still look
like this today.
2000_A – min/max of the two halves
This approach takes the highest value from the first 50 percent and the lowest value from the last 50 percent, then averages them. This works for even or odd row counts because, in the even case, the two values are the two middle rows, and in the odd case, the two values come from the same row.
SELECT @Median = (
(SELECT MAX(val) FROM
(SELECT TOP 50 PERCENT val
FROM dbo.EvenRows ORDER BY val, id) AS t)
+ (SELECT MIN(val) FROM
(SELECT TOP 50 PERCENT val
FROM dbo.EvenRows ORDER BY val DESC, id DESC) AS b)
) / 2.0;
2000_B – #temp table
This example first creates a #temp table, and using the same type of math as above, determines the two
“middle” rows with assistance from a contiguous IDENTITY column ordered by the val column. (The
order of assignment of IDENTITY values can only be relied upon because of the ORDER BY and the MAXDOP 1 hint.)
CREATE TABLE #x
(
i INT IDENTITY(1,1),
val DECIMAL(12, 2)
);
INSERT #x(val)
SELECT val
FROM dbo.EvenRows
ORDER BY val OPTION (MAXDOP 1);
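The final step of this approach is not shown above; a sketch of how the two “middle” rows would then be averaged (variable names are assumptions):
DECLARE @c BIGINT = (SELECT COUNT(*) FROM #x);

SELECT @Median = AVG(val)
FROM #x
WHERE i IN ((@c + 1) / 2, (@c + 2) / 2);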
SQL Server 2005 introduced some interesting new window functions, such as ROW_NUMBER(), which
can help solve statistical problems like median a little easier than we could in SQL Server 2000. These
approaches all work in SQL Server 2005 and above:
This example uses ROW_NUMBER() to walk up and down the values once in each direction, then finds
the “middle” one or two rows based on that calculation. This is quite similar to the first example above,
with easier syntax:
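The query itself is missing from this excerpt; a sketch of the usual formulation (row numbers ascending and descending, averaging the row or rows where the two meet):
SELECT @Median = AVG(1.0 * val)
FROM
(
    SELECT val,
        ra = ROW_NUMBER() OVER (ORDER BY val, id),
        rd = ROW_NUMBER() OVER (ORDER BY val DESC, id DESC)
    FROM dbo.EvenRows
) AS x
WHERE ra BETWEEN rd - 1 AND rd + 1;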
This one is quite similar to the above, using a single calculation of ROW_NUMBER() and then using the
total COUNT() to find the “middle” one or two rows:
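Again, the code didn’t survive into this excerpt; a sketch using a single ROW_NUMBER() plus a windowed COUNT():
SELECT @Median = AVG(1.0 * val)
FROM
(
    SELECT val,
        rn = ROW_NUMBER() OVER (ORDER BY val),
        c  = COUNT(*) OVER ()
    FROM dbo.EvenRows
) AS x
WHERE rn IN ((c + 1) / 2, (c + 2) / 2);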
Fellow MVP Itzik Ben-Gan showed me this method, which achieves the same answer as the above two
methods, but in a very slightly different way:
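The exact formulation isn’t included here; one slight variation on the previous approach, getting the count from a cross join instead of a window aggregate, looks like this:
SELECT @Median = AVG(1.0 * val)
FROM
(
    SELECT o.val, rn = ROW_NUMBER() OVER (ORDER BY o.val), c.c
    FROM dbo.EvenRows AS o
    CROSS JOIN (SELECT c = COUNT(*) FROM dbo.EvenRows) AS c
) AS x
WHERE rn IN ((c + 1) / 2, (c + 2) / 2);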
In SQL Server 2012, we have new windowing capabilities in T-SQL that allow statistical calculations like
median to be expressed more directly. To calculate the median for a set of values, we can
use PERCENTILE_CONT(). We can also use the new “paging” extension to the ORDER BY clause (OFFSET /
FETCH).
This solution uses a very straightforward calculation using distribution (if you don’t want the average
between the two middle values in the case of an even number of rows).
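The query shown in the original isn’t in this excerpt; the basic pattern is below (PERCENTILE_DISC() can be substituted if you want an actual value from the set rather than the average of the two middle values):
SELECT DISTINCT Median = PERCENTILE_CONT(0.5)
    WITHIN GROUP (ORDER BY val) OVER ()
FROM dbo.EvenRows;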
This example implements a clever use of OFFSET / FETCH (and not exactly one for which it was intended)
– we simply move to the row that is one before half the count, then take the next one or two rows
depending on whether the count was odd or even. Thanks to Itzik Ben-Gan for pointing out this
approach.
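The code isn’t shown in this excerpt; a sketch of the OFFSET / FETCH trick as described (skip to just before the midpoint, then fetch one or two rows depending on whether the count is odd or even):
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);

SELECT Median = AVG(1.0 * val)
FROM
(
    SELECT val
    FROM dbo.EvenRows
    ORDER BY val
    OFFSET (@c - 1) / 2 ROWS
    FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;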
We’ve verified that the above methods all produce the expected results on our little table, and we know
that the SQL Server 2012 version has the cleanest and most logical syntax. But which one should you be
using in your busy production environment? We can build a much bigger table from system metadata,
making sure we have plenty of duplicate values. This script will produce a table with 10,000,000 non-
unique integers:
USE tempdb;
GO
CREATE TABLE dbo.obj(id INT IDENTITY(1,1), val INT);
INSERT dbo.obj(val)
SELECT TOP (10000000) o.[object_id]
FROM sys.all_columns AS c
CROSS JOIN sys.all_objects AS o
CROSS JOIN sys.all_objects AS o2
WHERE o.[object_id] > 0
ORDER BY c.[object_id];
On my system the median for this table should be 146,099,561. I can calculate this pretty quickly without a
manual spot check of 10,000,000 rows by using the following query:
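That query isn’t in this excerpt; something along these lines returns the three rows straddling the midpoint shown below:
SELECT val, rn
FROM
(
    SELECT val, rn = ROW_NUMBER() OVER (ORDER BY val)
    FROM dbo.obj
) AS x
WHERE rn BETWEEN 4999999 AND 5000001;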
val        rn
---------  -------
146099561  4999999
146099561  5000000
146099561  5000001
So now we can create a stored procedure for each method, verify that each one produces the correct
output, and then measure performance metrics such as duration, CPU and reads. We’ll perform all of
these steps with the existing table, and also with a copy of the table that does not benefit from the
clustered index (we’ll drop it and re-create the table as a heap).
I’ve created seven procedures implementing the query methods above. For brevity I won’t list them
here, but each one is named dbo.Median_<version>, e.g. dbo.Median_2000_A,dbo.Median_2000_B,
etc. corresponding to the approaches described above. If we run these seven procedures using the
free SQL Sentry Plan Explorer, here is what we observe in terms of duration, CPU and reads (note that
we run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS in between executions):
And these metrics don’t change much at all if we operate against a heap instead. The biggest percentage
change was the method that still ended up being the fastest: the paging trick using OFFSET / FETCH:
Here is a graphical representation of the results. To make it more clear, I highlighted the slowest
performer in red and the fastest approach in green.
I was surprised to see that, in both cases, PERCENTILE_CONT() – which was designed for this type of
calculation – is actually worse than all of the other earlier solutions. I guess it just goes to show that
while sometimes newer syntax might make our coding easier, it doesn’t always guarantee that
performance will improve. I was also surprised to see OFFSET / FETCH prove so useful in a scenario well outside its intended purpose of pagination.
In any case, I hope I have demonstrated which approach you should use, depending on your version of
SQL Server (and that the choice should be the same whether or not you have a supporting index for the
calculation).
T-SQL Tuesday #33 : Trick Shots : Schema Switch-A-Roo
By Aaron Bertrand
This month’s T-SQL Tuesday is being hosted by Mike Fal (blog | twitter), and the topic is Trick Shots,
where we’re invited to tell the community about some solution we used in SQL Server that felt, at least
to us, as a sort of “trick shot” – something similar to using massé, “English” or complicated bank shots in
billiards or snooker. After working with SQL Server for some 15 years, I’ve had the occasion to come up
with tricks to solve some pretty interesting problems, but one that seems to be quite reusable, easily
adapts to many situations, and is simple to implement, is something I call “schema switch-a-roo.”
Let’s say you have a scenario where you have a large lookup table that needs to get refreshed
periodically. This lookup table is needed across many servers and can contain data that gets populated
from an external or 3rd party source, e.g. IP or domain data, or can represent data from within your own
environment.
The first couple of scenarios where I needed a solution for this were making metadata and denormalized
data available to read-only “data caches” – really just SQL Server MSDE (and later Express) instances
installed on various web servers, so the web servers pulled this cached data locally instead of bothering
the primary OLTP system. This may seem redundant, but off-loading read activity away from the primary
OLTP system, and being able to take the network connection out of the equation completely, led to a
real bump in all-around performance, most notably for end users.
These servers did not need up-to-the minute copies of the data; in fact, a lot of the cache tables were
only updated daily. But since the systems were 24×7, and some of these updates could take several
minutes, they often got in the way of real customers doing real things on the system.
At the very beginning, the code was rather simplistic: we deleted rows that had been removed from the
source, updated all the rows that we could tell had changed, and inserted all the new rows. It looked
something like this (error handling etc. removed for brevity):
BEGIN TRANSACTION;
DELETE dbo.Lookup
WHERE [key] NOT IN
(SELECT [key] FROM [source]);
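-- (The UPDATE step described above is missing from this excerpt; a rough sketch,
--  reusing the [key] / [cols] / [source] placeholders from the surrounding statements:)
UPDATE d
  SET d.[cols] = s.[cols]
FROM dbo.Lookup AS d
INNER JOIN [source] AS s
ON d.[key] = s.[key];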
INSERT dbo.Lookup([cols])
SELECT [cols] FROM [source]
WHERE [key] NOT IN
(SELECT [key] FROM dbo.Lookup);
COMMIT TRANSACTION;
Needless to say this transaction could cause some real performance issues when the system was in use.
Surely there were other ways to do this, but every method we tried was equally slow and expensive.
How slow and expensive? “Let me count the scans…”
Since this pre-dated MERGE, and we had already discarded “external” approaches like DTS, through
some testing we determined that it would be more efficient to just wipe the table and re-populate it,
rather than to try and sync to the source:
BEGIN TRANSACTION;
INSERT dbo.Lookup([cols])
SELECT [cols] FROM [source];
COMMIT TRANSACTION;
Now, as I explained, this query from [source] could take a couple of minutes, especially if all of the web
servers were being updated in parallel (we tried to stagger where we could). And if a customer was on
the site and trying to run a query involving the lookup table, they had to wait for that transaction to
finish. In most cases, if they’re running this query at midnight, it wouldn’t really matter if they got
yesterday’s copy of the lookup data or today’s; so, making them wait for the refresh seemed silly, and
actually did lead to a number of support calls.
My initial solution, back when SQL Server 2000 was cool, was to create a “shadow” table:
INSERT dbo.Lookup_Shadow([cols])
SELECT [cols] FROM [source];
BEGIN TRANSACTION;
-- (the first two renames were lost from this excerpt; reconstructed from the
--  "three-way rename" description below)
EXEC sp_rename N'dbo.Lookup', N'Lookup_Fake';
EXEC sp_rename N'dbo.Lookup_Shadow', N'Lookup';
COMMIT TRANSACTION;
-- if successful, swap the old copy into the shadow role for next time:
EXEC sp_rename N'dbo.Lookup_Fake', N'Lookup_Shadow';
The downside to this initial approach was that sp_rename has a non-suppressible output message
warning you about the dangers of renaming objects. In our case we performed this task through SQL
Server Agent jobs, and we handled a lot of metadata and other cache tables, so the job history was
flooded with all these useless messages and actually caused real errors to be truncated from the history
details. (I complained about this in 2007, but my suggestion was ultimately dismissed and closed as
“Won’t Fix.”)
Once we upgraded to SQL Server 2005, I discovered this fantastic command called CREATE SCHEMA. It
was trivial to implement the same type of solution using schemas instead of renaming tables, and now
the Agent history wouldn’t be polluted with all of these unhelpful messages. Basically I created two new
schemas:
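The statements themselves aren’t in this excerpt; the shadow schema name is taken from the code below, while “fake” (used as the intermediate in the transfer) is an assumption:
CREATE SCHEMA fake AUTHORIZATION dbo;
GO
CREATE SCHEMA shadow AUTHORIZATION dbo;
GO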
With those two schemas in place, and a copy of the Lookup table in the shadow schema, my three-way
rename became a three-way schema transfer:
INSERT shadow.Lookup([cols])
SELECT [cols] FROM [source];
BEGIN TRANSACTION;
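-- (The transfer statements are missing from this excerpt; a sketch of the three-way
--  transfer, using the fake schema as the intermediate holding area:)
ALTER SCHEMA fake   TRANSFER dbo.Lookup;
ALTER SCHEMA dbo    TRANSFER shadow.Lookup;
ALTER SCHEMA shadow TRANSFER fake.Lookup;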
COMMIT TRANSACTION;
Some Caveats
Foreign Keys
This won’t work out of the box if the lookup table is referenced by foreign keys. In our case we
didn’t point any constraints at these cache tables, but if you do, you may have to stick with
intrusive methods such as MERGE. Or use append-only methods and disable or drop the foreign
keys before performing any data modifications (then re-create or re-enable them afterward). If
you stick with MERGE / UPSERT techniques and you’re doing this between servers or, worse yet,
from a remote system, I highly recommend getting the raw data locally rather than trying to use
these methods between servers.
Statistics
Switching the tables (using rename or schema transfer) will lead to statistics flipping back and
forth between the two copies of the table, and this can obviously be an issue for plans. So you
may consider adding explicit statistics updates as part of this process.
Other Approaches
There are of course other ways to do this that I simply haven’t had the occasion to try. Partition
switching and using a view + synonym are two approaches I may investigate in the future for a
more thorough treatment of the topic. I’d be interested to hear your experiences and how
you’ve solved this problem in your environment. And yes, I realize that this problem is largely
solved by Availability Groups and readable secondaries in SQL Server 2012, but I consider it a
“trick shot” if you can solve the problem without throwing high-end licenses at the problem, or
replicating an entire database to make a few tables redundant. :-)
Conclusion
If you can live with the limitations here, this approach may well be a better performer than a scenario
where you essentially take a table offline using SSIS or your own MERGE / UPSERT routine, but please be
sure to test both techniques. The most significant point is that the end user accessing the table should
have the exact same experience, any time of the day, even if they hit the table in the middle of your
periodic update.
Conditional Order By
By Aaron Bertrand
A common scenario in many client-server applications is allowing the end user to dictate the sort order of
results. Some people want to see the lowest priced items first, some want to see the newest items first, and
some want to see them alphabetically. This is a complex thing to achieve in Transact-SQL because you can’t
just plug a variable into the ORDER BY clause:
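The failing attempts themselves aren’t in this excerpt; roughly, they look like the following (the variable and column names match the examples later in the post), and this form fails with the “expression referencing a column name” error discussed next. Appending @SortDirection in some fashion fares no better, as described below.
DECLARE @SortColumn SYSNAME = N'name', @SortDirection VARCHAR(4) = 'ASC';

SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY @SortColumn;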
(And when the error message says, “an expression referencing a column name,” you might find it ambiguous,
and I agree. But I can assure you that this does not mean a variable is a suitable expression.)
If you try to append @SortDirection, the error message is a little more opaque:
There are a few ways around this, and your first instinct might be to use dynamic SQL, or to introduce the
CASE expression. But as with most things, there are complications that can force you down one path or
another. So which one should you use? Let’s explore how these solutions might work, and compare the
impacts on performance for a few different approaches.
Sample Data
Using a catalog view we all probably understand quite well, sys.all_objects, I created the following table
based on a cross join, limiting the table to 100,000 rows (I wanted data that filled many pages but that didn’t
take significant time to query and test):
CREATE DATABASE OrderBy;
GO
USE OrderBy;
GO
Then I created a typical clustered / non-clustered index pair that might exist on such a table, prior to
optimization (I can’t use object_id for the key, because the cross join creates duplicates):
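Neither the table-creation statement nor the index definitions survived into this excerpt; a sketch consistent with the columns referenced later (index names and exact shapes are assumptions):
SELECT TOP (100000)
    key_col = ROW_NUMBER() OVER (ORDER BY s1.[object_id]),
    s1.[object_id], s1.name, s1.type_desc, s1.modify_date
INTO dbo.sys_objects
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2;

CREATE UNIQUE CLUSTERED INDEX ix_key_col ON dbo.sys_objects(key_col);
CREATE NONCLUSTERED INDEX ix_name ON dbo.sys_objects(name);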
As mentioned above, users may want to see this data ordered in a variety of ways, so let’s set out some
typical use cases we want to support (and by support, I mean demonstrate):
We’ll leave the key_col ordering as the default because it should be the most efficient if the user doesn’t
have a preference; since the key_col is an arbitrary surrogate that should mean nothing to the user (and may
not even be exposed to them), there is no reason to allow reverse sorting on that column.
In order to use CASE with ORDER BY, there must be a distinct expression for each combination of compatible
types and directions. In this case we would have to use something like this:
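That code is missing from this excerpt; a sketch of what the fully expanded version looks like, with one CASE expression per combination of sort direction and compatible data types so that SQL Server never has to reconcile incompatible types within a single expression:
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY
    CASE WHEN @SortDirection = 'ASC' THEN
        CASE @SortColumn WHEN 'key_col' THEN key_col WHEN 'object_id' THEN [object_id] END
    END,
    CASE WHEN @SortDirection = 'ASC' THEN
        CASE @SortColumn WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc END
    END,
    CASE WHEN @SortDirection = 'ASC' THEN
        CASE @SortColumn WHEN 'modify_date' THEN modify_date END
    END,
    CASE WHEN @SortDirection = 'DESC' THEN
        CASE @SortColumn WHEN 'key_col' THEN key_col WHEN 'object_id' THEN [object_id] END
    END DESC,
    CASE WHEN @SortDirection = 'DESC' THEN
        CASE @SortColumn WHEN 'name' THEN name WHEN 'type_desc' THEN type_desc END
    END DESC,
    CASE WHEN @SortDirection = 'DESC' THEN
        CASE @SortColumn WHEN 'modify_date' THEN modify_date END
    END DESC;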
You can collapse this logic a little bit further by converting all the non-string types into strings that will sort
correctly, e.g.
I discovered this neat trick from AndriyM, though it is most useful in cases where all of the potential ordering
columns are of compatible types, otherwise the expression used for ROW_NUMBER() is equally complex. The
most clever part is that in order to switch between ascending and descending order, we simply multiply the
ROW_NUMBER() by 1 or -1. We can apply it in this situation as follows:
;WITH x AS
(
SELECT key_col, [object_id], name, type_desc, modify_date,
rn = ROW_NUMBER() OVER (
ORDER BY CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
) * CASE @SortDirection WHEN 'ASC' THEN 1 ELSE -1 END
FROM dbo.sys_objects
)
SELECT key_col, [object_id], name, type_desc, modify_date
FROM x
ORDER BY rn;
END
GO
Again, OPTION RECOMPILE can help here. Also, you might notice in some of these cases that ties are handled
differently by the various plans – when ordering by name, for example, you will usually see key_col come
through in ascending order within each set of duplicate names, but you may also see the values mixed up. To
provide more predictable behavior in the event of ties, you can always add an additional ORDER BY clause.
Note that if you were to add key_col to the first example, you’ll need to make it an expression so that key_col
is not listed in the ORDER BY twice (you can do this using key_col + 0, for example).
Dynamic SQL
A lot of people have reservations about dynamic SQL – it’s impossible to read, it’s a breeding ground for SQL
injection, it leads to plan cache bloat, it defeats the purpose of using stored procedures… Some of these are
simply untrue, and some of them are easy to mitigate. I’ve added some validation here that could just as
easily be added to any of the above procedures:
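The procedure body didn’t make it into this excerpt; a sketch consistent with the procedure name used in the test script below (the validation simply falls back to safe defaults, and a whitelist plus QUOTENAME() keeps injection out):
CREATE PROCEDURE dbo.Test_Sort_DynamicSQL
    @SortColumn    SYSNAME    = N'key_col',
    @SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
    SET NOCOUNT ON;

    -- whitelist the column and direction before building the statement
    IF @SortColumn NOT IN (N'key_col', N'object_id', N'name', N'type_desc', N'modify_date')
        SET @SortColumn = N'key_col';
    IF UPPER(@SortDirection) NOT IN ('ASC', 'DESC')
        SET @SortDirection = 'ASC';

    DECLARE @sql NVARCHAR(MAX) = N'SELECT key_col, [object_id], name, type_desc, modify_date
        FROM dbo.sys_objects
        ORDER BY ' + QUOTENAME(@SortColumn) + N' ' + @SortDirection + N';';

    EXEC sys.sp_executesql @sql;
END
GO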
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
EXEC dbo.Test_Sort_CaseExpanded;
--EXEC dbo.Test_Sort_CaseCollapsed;
--EXEC dbo.Test_Sort_RowNumber;
--EXEC dbo.Test_Sort_DynamicSQL;
GO 10
I also tested the first three cases with OPTION RECOMPILE (doesn’t make much sense for the dynamic SQL
case, since we know it will be a new plan each time), and all four cases with MAXDOP 1 to eliminate
parallelism interference. Here are the results:
Conclusion
For outright performance, dynamic SQL wins every time (though only by a small margin on this data set). The
ROW_NUMBER() approach, while clever, was the loser in each test (sorry AndriyM).
It gets even more fun when you want to introduce a WHERE clause, never mind paging. These three are like
the perfect storm for introducing complexity to what starts out as a simple search query. The more
permutations your query has, the more likely you’ll want to throw readability out the window and use
dynamic SQL in combination with the “optimize for ad hoc workloads” setting to minimize the impact of
single-use plans in your plan cache.
Splitting Strings : A Follow-Up
By Aaron Bertrand
There were a lot of comments following my post last week about string splitting. I think the point of the
article was not as obvious as it could have been: that spending a lot of time and effort trying to “perfect”
an inherently slow splitting function based on T-SQL would not be beneficial. I have since collected the
most recent version of Jeff Moden’s string splitting function, and put it up against the others:
I had to make a couple of adjustments to my tests to fairly represent Jeff’s function. Most importantly: I
had to discard all samples that involved any strings > 4,000 characters. So I changed the 5,000-character
strings in the dbo.strings table to be 4,000 characters instead, and focused only on the first three non-
MAX scenarios (keeping the previous results for the first two, and running the third tests again for the
new 4,000-character string lengths). I also dropped the Numbers table from all but one of the tests,
because it was clear that the performance there was always worse by a factor of at least 10. The
following chart shows the performance of the functions in each of the four tests, again averaged over 10
runs and always with a cold cache and clean buffers.
So here are my slightly revised preferred methods, for each type of task:
You’ll notice that CLR has remained my method of choice, except in the one case where splitting doesn’t
make sense. And in cases where CLR is not an option, the XML and CTE methods are generally more
efficient, except in the case of single variable splitting, where Jeff’s function may very well be the best
option. But given that I might need to support more than 4,000 characters, the Numbers table solution
just might make it back onto my list in specific situations where I’m not allowed to use CLR.
I promise that my next post involving lists will not talk about splitting at all, via T-SQL or CLR, and will
demonstrate how to simplify this problem regardless of data type.
As an aside, I noticed this comment in one of the versions of Jeff’s functions that was posted in the
comments:
I also thank whoever wrote the first article I ever saw on “numbers tables” which is located at the
following URL and to Adam Machanic for leading me to it many years ago.
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-
table.html
That article was written by me in 2004. So whoever added the comment to the function, you’re
welcome. :-)
When the DRY principle doesn’t apply
By Aaron Bertrand
The “Don’t Repeat Yourself” principle suggests that you should reduce repetition. This week I came
across a case where DRY should be thrown out the window. There are other cases as well (for example,
scalar functions), but this one was an interesting case involving bitwise logic. Imagine a car ordering system where wheel and tire upgrades are packed into a single WheelFlag column using these bit values:
0 = stock wheels
1 = 17" wheels
2 = 18" wheels
4 = upgraded tires
So possible combinations are:
0 = no upgrade
1 = upgrade to 17" wheels only
2 = upgrade to 18" wheels only
4 = upgrade tires only
5 = 1 + 4 = upgrade to 17" wheels and better tires
6 = 2 + 4 = upgrade to 18" wheels and better tires
Let’s set aside arguments, at least for now, about whether this should be packed into a single TINYINT in
the first place, or stored as separate columns, or use an EAV model… fixing the design is a separate issue.
This is about working with what you have.
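The table definition isn’t included in this excerpt; a minimal sketch that matches the insert and the queries that follow (the index on WheelFlag is implied by the seek/filter discussion further down, and the OrderDate type is an assumption):
CREATE TABLE dbo.CarOrders
(
    OrderID   INT PRIMARY KEY,
    WheelFlag TINYINT NOT NULL,
    OrderDate DATE NOT NULL
);

CREATE INDEX IX_CarOrders_WheelFlag ON dbo.CarOrders(WheelFlag);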
To make the examples useful, let’s fill this table up with a bunch of random data. (And we’ll assume, for
simplicity, that this table contains only orders that haven’t yet shipped.) This will insert 50,000 rows of
roughly equal distribution between the six option combinations:
;WITH n AS
(
SELECT n,Flag FROM (VALUES(1,0),(2,1),(3,2),(4,4),(5,5),(6,6)) AS n(n,Flag)
)
INSERT dbo.CarOrders
(
OrderID,
WheelFlag,
OrderDate
)
SELECT x.rn, n.Flag, DATEADD(DAY, x.rn/100, '20100101')
FROM n
INNER JOIN
(
SELECT TOP (50000)
n = (ABS(s1.[object_id]) % 6) + 1,
rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS x
ON n.n = x.n;
If we look at the breakdown, we can see this distribution. Note that your results may differ slightly from mine depending on the objects in your system:
WheelFlag  Count
---------  -----
0          7654
1          8061
2          8757
4          8682
5          8305
6          8541
Now let’s say it’s Tuesday, and we just got a shipment of 18″ wheels, which were previously out of stock. This
means we are able to satisfy all of the orders that require 18″ wheels – both those that upgraded tires (6),
and those that did not (2). So we *could* write a query like the following:
SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag IN (2,6);
In real life, of course, you can’t really do that; what if more options are added later, like wheel locks, lifetime
wheel warranty, or multiple tire options? You don’t want to have to write a series of IN() values for every
possible combination. Instead we can write a BITWISE AND operation, to find all the rows where the 2nd bit
is set, such as:
DECLARE @Flag TINYINT = 2; -- the 18" wheel bit, per the flag definitions above

SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag & @Flag = @Flag;
This gets me the same results as the IN() query, but if I compare them using SQL Sentry Plan Explorer,
the performance is quite different:
It’s easy to see why. The first uses an index seek to isolate the rows that satisfy the query, with a filter
on the WheelFlag column:
The second uses a scan, coupled with an implicit convert, and terribly inaccurate statistics. All due to the
BITWISE AND operator:
So what does this mean? At the heart of it, this tells us that the BITWISE AND operation is not sargable.
If we ignore the DRY principle for a moment, we can write a slightly more efficient query by being a bit
redundant in order to take advantage of the index on the WheelFlag column. Assuming that we’re after
any WheelFlag option above 0 (no upgrade at all), we can re-write the query this way, telling SQL Server that the WheelFlag value must be at least the same value as the flag (which eliminates 0 and 1), and then adding the supplemental information that it must also contain that flag (thus eliminating 4 and 5).
SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag >= @Flag
AND WheelFlag & @Flag = @Flag;
The >= portion of this clause is obviously covered by the BITWISE portion, so this is where we violate
DRY. But because this clause we’ve added is sargable, relegating the BITWISE AND operation to a
secondary search condition still yields the same result, and the overall query yields better performance.
We see a similar index seek to the hard-coded version of the query above, and while the estimates are
even further off (something that may be addressed as a separate issue), reads are still lower than with
the BITWISE AND operation alone:
We can also see that a filter is used against the index, which we didn’t see when using the BITWISE AND
operation alone:
Conclusion
Don’t be afraid to repeat yourself. There are times when this information can help the optimizer; even
though it may not be entirely intuitive to *add* criteria in order to improve performance, it’s important
to understand when additional clauses help whittle the data down for the end result rather than making
it “easy” for the optimizer to find the exact rows on its own.
Hit-Highlighting in Full-Text Search
By Aaron Bertrand
Hit-highlighting is a feature that many people wish SQL Server’s Full-Text Search would support natively.
This is where you can return the entire document (or an excerpt) and point out the words or phrases
that helped match that document to the search. Doing so in an efficient and accurate manner is no easy
task, as I found out first hand.
As an example of hit-highlighting: when you perform a search in Google or Bing, you get the key words
bolded in both the title and the excerpt (click either image to enlarge):
[As an aside, I find two things amusing here: (1) that Bing favors Microsoft properties a lot more than
Google does, and (2) that Bing bothers returning 2.2 million results, many of which are likely irrelevant.]
These excerpts are commonly called “snippets” or “query-biased summarizations.” We’ve been asking
for this functionality in SQL Server for some time, but have yet to hear any good news from Microsoft:
Connect #722324 : Would be nice if SQL Full Text Search provided snippet / highlighting support
Will Sql Server 2012 FTS have native support for hit highlighting?
There are some partial solutions. This script from Mike Kramar, for example, will produce a hit-
highlighted extract, but does not apply the same logic (such as language-specific word breakers) to the
document itself. It also uses an absolute character count, so the excerpt can begin and end with partial
words (as I will demonstrate shortly). The latter is pretty easy to fix, but another issue is that it loads the
entire document into memory, rather than performing any kind of streaming. I suspect that in full-text
indexes with large document sizes, this will be a noticeable performance hit. For now I’ll focus on a
relatively small average document size (35 KB).
A simple example
So let’s say we have a very simple table, with a full-text index defined:
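The table and full-text index definitions are missing from this excerpt; a minimal sketch consistent with the columns referenced in the queries below (the catalog name and data types are assumptions):
CREATE FULLTEXT CATALOG ft AS DEFAULT;
GO
CREATE TABLE dbo.[Document]
(
    ID        INT IDENTITY(1,1) CONSTRAINT PK_Document PRIMARY KEY,
    Title     NVARCHAR(255) NOT NULL,
    [Content] NVARCHAR(MAX) NOT NULL
);
GO
CREATE FULLTEXT INDEX ON dbo.[Document]([Content]) KEY INDEX PK_Document;
GO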
Now, using a UDF like Mike Kramar’s:
SELECT d.Title,
Excerpt = dbo.HighLightSearch(d.[Content], N'states', 'font-weight:bold', 80)
FROM dbo.[Document] AS d
INNER JOIN CONTAINSTABLE(dbo.[Document], *, N'states') AS t
ON d.ID = t.[KEY]
ORDER BY [RANK] DESC;
The results show how the excerpt works: a <SPAN> tag is injected at the first keyword, and the excerpt is
carved out based on an offset from that position (with no consideration for using complete words):
ThinkHighlight
Eran Meyuchas of Interactive Thoughts has developed a component that solves many of these
issues. ThinkHighlight is implemented as a CLR Assembly with two CLR scalar-valued functions:
Now, without getting into all of the details about installing and activating the assembly on your system,
here is how the above query would be represented with ThinkHighlight:
SELECT d.Title,
Excerpt = dbo.HitHighlight(dbo.HitHighlightContext('Document', 'Content',
N'states', -1),
'top-fragment', 100, d.ID)
FROM dbo.[Document] AS d
INNER JOIN CONTAINSTABLE(dbo.[Document], *, N'states') AS t
ON d.ID = t.[KEY]
ORDER BY t.[RANK] DESC;
The results show how the most relevant keywords are highlighted, and an excerpt is derived from that based
on full words and an offset from the term being highlighted:
Some additional advantages that I haven’t demonstrated here include the ability to choose different
summarization strategies, controlling the presentation of each keyword (rather than all) using unique
CSS, as well as support for multiple languages and even documents in binary format (most IFilters are
supported).
Performance results
Initially I tested the runtime metrics for the three queries using SQL Sentry Plan Explorer, against the 7-
row table. The results were:
Next I wanted to see how they would compare on a much larger data size. I inserted the table into itself until
I was at 4,000 rows, then ran the following query:
Conclusion
It should not come as a surprise that hit-highlighting is an expensive operation, and based on the
complexity of what has to be supported (think multiple languages), that very few solutions exist out
there. I think Mike Kramar has done an excellent job producing a baseline UDF that gets you a good way
toward solving the problem, but I was pleasantly surprised to find a more robust commercial offering –
and found it to be very stable, even in beta form. I do plan to perform more thorough tests using a wider
range of document sizes and types. In the meantime, if hit-highlighting is a part of your application
requirements, you should try out Mike Kramar’s UDF and consider taking ThinkHighlight for a test drive.
What impact can different cursor options have?
By Aaron Bertrand
I’ve written several times about using cursors and how, in most cases, it is more efficient to re-write
your cursors using set-based logic.
I know that there are cases where cursors are “required” – you need to call another stored procedure or
send an e-mail for every row, you are doing maintenance tasks against each database, or you are
running a one-off task that simply isn’t worth investing the time to convert to set-based.
Regardless of the reason you are still using cursors, you should at the very least be careful not to use the
quite expensive default options. Most folks start their cursors off like this:
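The snippet isn’t included in this excerpt, but the typical pattern is just a bare declaration with no options at all (the query here is a placeholder):
DECLARE c CURSOR FOR
    SELECT name FROM sys.objects;
-- no LOCAL / STATIC / READ_ONLY / FAST_FORWARD options specified, so you get the expensive defaults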
Other ways to do it
I wanted to run some tests using the defaults and compare them to different cursor options such
as LOCAL, STATIC, READ_ONLY and FAST_FORWARD. (There are a ton of options, but these are the ones
most commonly used as they are applicable to the most common types of cursor operations that people
use.) Not only did I want to test the raw speed of a few different combinations, but also the impact to
tempdb and memory, both after a cold service restart and with a warm cache.
The query I decided to feed to the cursor is a very simple query against sys.objects, in
the AdventureWorks2012 sample database. This returns 318,500 rows on my system (a very humble 2-
core system with 4GB RAM):
SELECT c1.[object_id]
FROM sys.objects AS c1
CROSS JOIN (SELECT TOP 500 name FROM sys.objects) AS c2;
Then I wrapped this query in a cursor with various options (including the defaults) and ran some tests,
measuring Total Server Memory, pages allocated to tempdb (according to sys.dm_db_task_space_usage and/or sys.dm_db_session_space_usage), and total duration. I also
tried to observe tempdb contention using scripts from Glenn Berry and Robert Davis, but on my paltry
system I could not detect any contention whatsoever. Of course I’m also on SSD and absolutely nothing
else is running on the system, so these may be things you want to add to your own tests if tempdb is
more likely to be a bottleneck.
So in the end the queries looked something like this, with diagnostic queries peppered in at appropriate
points:
DECLARE @i INT = 1;
DECLARE c CURSOR
-- LOCAL
-- LOCAL STATIC
-- LOCAL FAST_FORWARD
-- LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR
SELECT c1.[object_id]
FROM sys.objects AS c1
CROSS JOIN (SELECT TOP 500 name FROM sys.objects) AS c2
ORDER BY c1.[object_id];
OPEN c;
FETCH c INTO @i;
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @i += 1; -- meaningless operation
FETCH c INTO @i;
END
CLOSE c;
DEALLOCATE c;
Results
Duration
Quite arguably the most important and common measure is, “how long did it take?” Well, it took almost
five times as long to run a cursor with the default options (or with only LOCAL specified), compared to
specifying either STATIC or FAST_FORWARD:
Memory
I also wanted to measure the additional memory that SQL Server would request when fulfilling each
cursor type. So I simply restarted before each cold cache test, measuring the performance counter Total
Server Memory (KB) before and after each test. The best combination here was LOCAL FAST_FORWARD:
tempdb usage
This result was surprising to me. Since the definition of a static cursor means that it copies the entire
result to tempdb, and it is actually expressed in sys.dm_exec_cursors as SNAPSHOT, I expected the hit
on tempdb pages to be higher with all static variants of the cursor. This was not the case; again we see a
roughly 5X hit on tempdb usage with the default cursor and the one with only LOCAL specified:
Conclusion
For years I have been stressing that the following option should always be specified for your cursors:
LOCAL FAST_FORWARD
(As an aside, I also ran tests omitting the LOCAL option, and the differences were negligible.)
That said, this is not necessarily true for *all* cursors. In this case, I am talking solely about cursors
where you’re only reading data from the cursor, in a forward direction only, and you aren’t updating the
underlying data (either by the key or using WHERE CURRENT OF). Those are tests for another day.
How much impact can a data type choice have?
By Aaron Bertrand
I’ve long been a proponent of choosing the correct data type. I’ve talked about some examples in a
previous “Bad Habits” blog post, but this weekend at SQL Saturday #162 (Cambridge, UK), the topic of
using DATETIME by default came up. In a conversation after my T-SQL : Bad Habits and Best Practices
presentation, a user stated that they just use DATETIME even if they only need granularity to the minute
or day, this way the date/time columns across their enterprise are always the same data type. I
suggested that this might be wasteful, and that the consistency might not be worth it, but today I
decided to set out to prove my theory.
TL;DR version
My testing below reveals that there are certainly scenarios where you may want to consider using a
skinnier data type instead of sticking with DATETIME everywhere. But it is important to see where my
tests for this pointed the other way, and it is also important to test these scenarios against your schema,
in your environment, with hardware and data that is as true to production as possible. Your results may,
and almost certainly will, vary.
Let’s consider the case where granularity is only important to the day (we don’t care about hours,
minutes, seconds). For this we could choose DATETIME (like the user proposed), or SMALLDATETIME,
or DATE on SQL Server 2008+. There are also two different types of data that I wanted to consider:
Data that would be inserted roughly sequentially in real-time (e.g. events that are happening right now);
Data that would be inserted in random order (e.g. events that happened in the past and are entered after the fact).
I started with 2 tables like the following, then created 4 more (2 for SMALLDATETIME, 2 for DATE):
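The table definitions aren’t in this excerpt; a sketch of the shape being described (table and index names are placeholders), with the same pattern repeated for the SMALLDATETIME and DATE variants:
CREATE TABLE dbo.datetime_sequential
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    dt DATETIME NOT NULL
);
CREATE NONCLUSTERED INDEX ix_dt ON dbo.datetime_sequential(dt);

CREATE TABLE dbo.datetime_random
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    dt DATETIME NOT NULL
);
CREATE NONCLUSTERED INDEX ix_dt ON dbo.datetime_random(dt);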
Sample Data
To generate some sample data, I used one of my handy techniques for generating something meaningful
from something that is not: the catalog views. On my system this returned 971 distinct date/time values
(1,000,000 rows altogether) in about 12 seconds:
;WITH y AS
(
SELECT TOP (1000000) d = DATEADD(SECOND, x, DATEADD(DAY, DATEDIFF(DAY, x, 0),
'20120101'))
FROM
(
SELECT s1.[object_id] % 1000
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS x(x) ORDER BY NEWID()
)
SELECT DISTINCT d FROM y;
I put these million rows into a table so I could simulate sequential/random inserts using different access
methods for the exact same data from three different session windows:
;WITH Staging_Data AS
(
SELECT TOP (1000000) dt = DATEADD(SECOND, x, DATEADD(DAY, DATEDIFF(DAY, x, 0),
'20110101'))
FROM
(
SELECT s1.[object_id] % 1000
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS sd(x) ORDER BY NEWID()
)
INSERT dbo.Staging(source_date)
SELECT dt
FROM Staging_Data
ORDER BY dt;
This process took a little bit longer to complete (20 seconds). Then I created a second table to store the same
data but distributed randomly (so that I could repeat the same distribution across all inserts).
INSERT dbo.Staging_Random(source_date)
SELECT source_date
FROM dbo.Staging
ORDER BY NEWID();
Queries to Populate the Tables
Next, I wrote a set of queries to populate the other tables with this data, using three query windows to
simulate at least a little bit of concurrency:
(In a future test I will also try this with real batches coming in from log files with relatively mixed data,
and using chunks of the source table in loops – I think those would be interesting experiments as well.
And of course adding compression into the mix.)
The results:
These results were not all that surprising to me – inserting in random order led to longer runtimes than
inserting sequentially, something we can all take back to our roots of understanding how indexes in SQL
Server work and how more “bad” page splits can happen in this scenario (I didn’t monitor specifically for
page splits in this exercise, but it is something I will consider in future tests).
I noticed that, on the random side, the implicit conversions on the incoming data might have had an
impact on timings, since they seemed a little bit higher than the native DATETIME -> DATETIME inserts.
So I decided to build two new tables containing source data: one using DATE and one
using SMALLDATETIME. This would simulate, to some degree, converting your data type properly before
passing it to the insert statement, such that an implicit conversion is not required during the insert. Here
are the new tables and how they were populated:
CREATE TABLE dbo.Staging_Random_SmallDatetime
(
ID INT IDENTITY(1,1) PRIMARY KEY,
source_date SMALLDATETIME NOT NULL
);
INSERT dbo.Staging_Random_SmallDatetime(source_date)
SELECT CONVERT(SMALLDATETIME, source_date)
FROM dbo.Staging_Random ORDER BY ID;
CREATE TABLE dbo.Staging_Random_Date
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  source_date DATE NOT NULL
);
-- (table definition reconstructed to mirror the SMALLDATETIME version above)
INSERT dbo.Staging_Random_Date(source_date)
SELECT CONVERT(DATE, source_date)
FROM dbo.Staging_Random ORDER BY ID;
This did not have the effect I was hoping for – timings were similar in all cases. So that was a wild goose
chase.
I ran the following query to determine how many pages were reserved for each table:
SELECT
name = 'dbo.' + OBJECT_NAME([object_id]),
pages = SUM(reserved_page_count)
FROM sys.dm_db_partition_stats
GROUP BY OBJECT_NAME([object_id])
ORDER BY pages;
The results:
No rocket science here: use a smaller data type and you use fewer pages. Switching from DATETIME to DATE consistently yielded a 25% reduction in the number of pages used, while SMALLDATETIME reduced the requirement by 13-20%.
Now for fragmentation and page density on the non-clustered indexes (there was very little difference for the
clustered indexes):
SELECT '{table_name}',
  index_id,
  avg_page_space_used_in_percent,
  avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(
DB_ID(), OBJECT_ID('{table_name}'),
NULL, NULL, 'DETAILED'
)
WHERE index_level = 0 AND index_id = 2;
Results:
I was quite surprised to see the ordered data become almost completely fragmented, while the data that was
inserted randomly actually ended up with slightly better page usage. I’ve made a note that this warrants
further investigation outside the scope of these specific tests, but it may be something you’ll want to check
on if you have non-clustered indexes that are relying on largely sequential inserts.
[An online rebuild of the non-clustered indexes on all 6 tables ran in 7 seconds, putting page density back up
to the 99.5% range, and bringing fragmentation down to under 1%. But I didn't run that until performing the
query tests below...]
Essentially we see slightly higher duration and reads for the DATETIME versions, but very little difference in
CPU. And the differences between SMALLDATETIME and DATE are negligible in comparison. All of the queries
had simplistic query plans like this:
Conclusion
While admittedly these tests are quite fabricated and could have benefited from more permutations,
they do show roughly what I expected to see: the biggest impacts on this specific choice are on space
occupied by the non-clustered index (where choosing a skinnier data type will certainly benefit), and on
the time required to perform inserts in arbitrary, rather than sequential, order (where DATETIME only
has a marginal edge).
I’d love to hear your ideas on how to put data type choices like these through more thorough and
punishing tests. I do plan to go into more details in future posts.
What is the most efficient way to trim time from datetime?
By Aaron Bertrand
The most common need for stripping time from a datetime value is to get all the rows that represent
orders (or visits, or accidents) that occurred on a given day. However, not all of the techniques that are
used to do so are efficient or even safe.
TL;DR version
If you want a safe range query that performs well, use an open-ended range or, for single-day queries on
SQL Server 2008 and above, use CONVERT(DATE):
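The statements themselves didn’t survive into this excerpt; for illustration (the table and column names are placeholders), the two patterns are:
DECLARE @day DATE = '20120601';

SELECT COUNT(*)
FROM dbo.SomeLog
WHERE OrderDate >= @day
  AND OrderDate <  DATEADD(DAY, 1, @day);

-- or, on SQL Server 2008+:
SELECT COUNT(*)
FROM dbo.SomeLog
WHERE CONVERT(DATE, OrderDate) = @day;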
But I wanted to compare the performance of some of the more common approaches I see out there.
I’ve always used open-ended ranges, and since SQL Server 2008 we’ve been able to use CONVERT(DATE) and still utilize an index on that column, which is quite powerful.
To perform a very simple initial performance test, I did the following for each of the above statements,
setting a variable to the output of the calculation 100,000 times:
SELECT SYSDATETIME();
GO
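-- (The loop between the two timestamps isn't in this excerpt; the pattern was
--  roughly the following, repeated for each of the methods being compared.)
DECLARE @d DATETIME, @i INT = 0;
WHILE @i < 100000
BEGIN
    SET @d = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0); -- one of the methods under test
    SET @i += 1;
END
GO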
SELECT SYSDATETIME();
GO
I did this three times for each method, and they all ran in the range of 34-38 seconds. So strictly
speaking, there are very negligible differences in these methods when performing the operations in
memory:
I also wanted to compare these methods with different data types (DATETIME, SMALLDATETIME, and DATETIME2), against both a clustered index and a heap, and with and without data compression. So first I created a simple database. Through experimentation I determined that the optimal size to handle 120 million rows and all of the log activity that might be incurred (and to prevent auto-grow events from
interfering with the testing) was a 20GB data file and a 3GB log:
Next, I inserted 10,000,000 rows into each table. I did this by creating a view that would generate the
same 10,000,000 dates each time:
Here are the timings for each insert (as captured with Plan Explorer):
Next I set out to test two different query patterns for performance:
Counting the rows for a specific day, using the above seven approaches, as well as the open-
ended date range
Converting all 10,000,000 rows using the above seven approaches, as well as just returning the
raw data (since formatting on the client side may be better)
[With the exception of the FLOAT methods and the DATETIME2 column, since this conversion is not
legal.]
For the first question, the queries look like this (repeated for each table type):
And here are the second set of queries (again, repeating for each table type):
(For this set of queries, the heap showed very similar results – practically indistinguishable.)
Conclusion
In case you wanted to skip to the punchline, these results show that conversions in memory are not
important, but if you are converting data on the way out of a table (or as part of a search predicate), the
method you choose can have a dramatic impact on performance. Converting to a DATE (for a single day)
or using an open-ended date range in any case will yield the best performance, while the most popular
method out there – converting to a string – is absolutely abysmal.
We also see that compression can have a decent effect on storage space, with very minor impact on
query performance. The effect on insert performance seems to be as dependent on whether or not the
table has a clustered index rather than whether or not compression is enabled. However, with a
clustered index in place, there was a noticeable bump in the duration it took to insert 10 million rows.
Something to keep in mind and to balance with disk space savings.
Clearly there could be a lot more testing involved, with more substantial and varied workloads, which I
may explore further in a future post.
Beware misleading data from SET STATISTICS IO
By Aaron Bertrand
My co-worker Steve Wright (blog | @SQL_Steve) prodded me with a question recently on a strange
result he was seeing. In order to test some functionality in our latest tool, SQL Sentry Plan Explorer PRO,
he had manufactured a wide and large table, and was running a variety of queries against it. In one case
he was returning a lot of data, but STATISTICS IO was showing that very few reads were taking place. I
pinged some people on #sqlhelp and, since it seemed nobody had seen this issue, I thought I would blog
about it.
TL;DR Version
In short, be very aware that there are some scenarios where you can’t rely on STATISTICS IO to tell you
the truth. In some cases (this one involving TOP and parallelism), it will vastly under-report logical reads.
This can lead you to believe you have a very I/O-friendly query when you don’t. There are other more
obvious cases – such as when you have a bunch of I/O hidden away by the use of scalar user-defined
functions. We think Plan Explorer makes those cases more obvious; this one, however, is a little trickier.
The table has 37 million rows, up to 250 bytes per row, about 1 million pages, and very low
fragmentation (0.42% on level 0, 15% on level 1, and 0 beyond that). There are no computed columns,
no UDFs in play, and no indexes except a clustered primary key on the leading INT column. A simple
query returning 500,000 rows, all columns, using TOP and SELECT *:
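The query itself isn’t reproduced in this excerpt; in shape it was essentially the following (the table name is a stand-in for the wide 37-million-row test table):
SET STATISTICS IO ON;

SELECT TOP (500000) *
FROM dbo.BigWideTable;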
Results:
We’re returning 500,000 rows, and it takes about 10 seconds. I immediately know that something is
wrong with the logical reads number. Even if I didn’t already know about the underlying data, I can tell
from the grid results in Management Studio that this is pulling more than 23 pages of data, whether they are coming from memory or disk, and this should be reflected somewhere in STATISTICS IO. Looking at
the plan…
…we see parallelism is in there, and that we’ve scanned the entire table. So how is it possible that there are
only 23 logical reads?
One of my first questions back to Steve was: “What happens if you eliminate parallelism?” So I tried it
out. I took the original subquery version and added MAXDOP 1:
It’s not hard to see that one of these queries is not telling the whole truth. While STATISTICS IO might
not tell us the whole story, maybe trace will. If we retrieve runtime metrics by generating an actual
execution plan in Plan Explorer, we see that the magical low-read query is, in fact, pulling the data from
memory or disk, and not from a cloud of magic pixie dust. In fact it has *more* reads than the other
version:
So it is clear that reads are happening; they’re just not appearing correctly in the STATISTICS IO output. Why is this happening? Well, I’ll be quite honest: I don’t know, other than the fact that parallelism is definitely playing a role,
and it seems to be some kind of race condition. STATISTICS IO (and, since that’s where we get the data,
our Table I/O tab) shows a very misleading number of reads. It’s clear that the query returns all of the
data we’re looking for, and it’s clear from the trace results that it uses reads and not osmosis to do so. I
asked Paul White (blog | @SQL_Kiwi) about it and he suggested that only some of the per-thread I/O
counts are being included in the total (and agrees that this is a bug).
If you want to try this out at home, all you need is AdventureWorks (this should repro against 2008,
2008 R2 and 2012 versions), and the following query:
So it seems that we can easily reproduce this at will with a TOP operator and a low enough DOP. I’ve
filed a bug:
And Paul has filed two other somewhat-related bugs involving parallelism, the first as a result of our
conversation:
Cardinality Estimation Error With Pushed Predicate on a Lookup [ related blog post ]
(For the nostalgic, here are six other parallelism bugs I pointed out a few years ago.)
Be careful about trusting a single source. If you look solely at STATISTICS IO after changing a query like
this, you may be tempted to focus on the miraculous drop in reads instead of the increase in duration.
At which point you may pat yourself on the back, leave work early and enjoy your weekend, thinking you
have just made a tremendous performance impact on your query. When of course nothing could be
further from the truth.
Trimming time from datetime – a follow-up
By Aaron Bertrand
Following up on my previous post about trimming time from datetime, I was spurred by co-worker
Brooke Philpott (@Macromullet), and by re-visiting this blog post by Adam Machanic, to demonstrate
more clearly the performance characteristics of various methods without involving data access. In the
original post, I quickly compared seven different methods of converting a datetime value to a date
independently, showed that the differences were negligible, then moved straight into analyzing the use
of those methods in actual queries that return data.
In this post I wanted to show several different ways to trim time from datetime (18 different ways in
fact!), without introducing any actual data, to see if we could proclaim a “fastest” way to perform this
task.
The Methods
Here are the 18 methods I would be testing, some suggested by Brooke, and some taken from the blog
post Madhivanan pointed out after my previous post:
The Test
I created a loop where I would run each conversion 1,000,000 times, and then repeat the process for all
18 conversion methods 10 times. This would provide metrics for 10,000,000 conversions for each
method, eliminating any significant statistical skew.
-- (Variable declarations and the timing inserts into the #s table aren't shown
--  in this excerpt; the declarations below are reconstructed.)
DECLARE @d DATETIME, @ds DATETIME = SYSDATETIME(),
        @x INT, @i INT = 1000000, @j INT = 1, @t DATETIME2;

WHILE @j <= 18
BEGIN
SELECT @x = 1, @t = SYSDATETIME();
WHILE @x <= @i
BEGIN
IF @j = 1
SET @d = DATEDIFF(DAY, 0, @ds);
IF @j = 2
SET @d = CAST(@ds AS INT);
IF @j = 3
SET @d = CAST(CONVERT(CHAR(8), @ds, 112) AS DATETIME);
IF @j = 4
SET @d = DATEADD(DAY, DATEDIFF(DAY, 0, @ds), 0);
IF @j = 5
SET @d = CAST(CAST(SUBSTRING(CAST(@ds AS BINARY(8)), 1, 4)
AS BINARY(8)) AS DATETIME);
IF @j = 6
SET @d = CONVERT(CHAR(8), @ds, 112);
IF @J = 7
SET @d = CAST(CAST(@ds AS VARCHAR(11)) AS DATETIME);
IF @J = 8
SET @d = @ds - CONVERT(CHAR(10), @ds, 108);
IF @J = 9
SET @d = @ds - CAST(CAST(@ds AS TIME) AS DATETIME);
IF @J = 10
SET @d = CAST(FLOOR(CAST(@ds AS FLOAT)) AS DATETIME);
IF @J = 11
SET @d = CAST(CAST(CAST(CAST(@ds AS BINARY(8)) AS BINARY(4))
AS BINARY(8)) AS DATETIME);
IF @J = 12
SET @d = @ds - CAST(@ds AS BINARY(4));
IF @J = 13
SET @d = DATEADD(DAY, CONVERT(INT, @ds - 0.5), 0);
IF @J = 14
SET @d = CONVERT(DATETIME, FORMAT(@ds, N'yyyy-MM-dd'));
IF @J = 15
SET @d = CONVERT(DATETIME,CONVERT(INT,CONVERT(FLOAT,@ds)));
IF @J = 16
SET @d = CAST(CAST(CAST(CAST(@ds AS BINARY(8)) AS BIGINT) &
0XFFFFFFFF00000000 AS BINARY(8)) AS DATETIME);
IF @J = 17
SET @d = CONVERT(DATE, @ds);
IF @j = 18
SET @d = CAST(@ds AS DATE);
SET @x += 1;
END
SET @j += 1;
END
GO 10
SELECT
j, method = CASE ... END,
MIN(ms), MAX(ms), AVG(ms)
FROM #s
GROUP BY j ORDER BY j;
The Results
I ran this on a Windows 8 VM, with 8 GB RAM and 4 vCPUs, running SQL Server 2012 (11.0.2376). Here
are tabular results, sorted by average duration, fastest first:
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
By Jonathan Kehayias
SQL Server offers two methods of collecting diagnostic and troubleshooting data about the workload
executed against the server: SQL Trace and Extended Events. Starting in SQL Server 2012, the Extended
Events implementation provides comparable data collection capabilities to SQL Trace and can be used
for comparisons of the overhead incurred by these two features. In this article we’ll take a look at
comparing the “observer overhead” that occurs when using SQL Trace and Extended Events in various
configurations in order to determine the performance impact that data collection may have on our
workload through the use of a replay workload capture and Distributed Replay.
The test environment is comprised of six virtual machines, one domain controller, one SQL Server 2012
Enterprise edition server, and four client servers with the Distributed Replay client service installed on
them. Three different host configurations were tested for this article, and all of them produced similar results in terms of relative impact. The SQL Server Enterprise
edition server is configured with 4 vCPUs and 4GB of RAM. The remaining five servers are configured
with 1 vCPU and 1GB RAM. The Distributed Replay controller service was run on the SQL Server 2012
Enterprise edition server because it requires an Enterprise license to use more than one client for replay.
Test workload
The test workload used for the replay capture is the AdventureWorks Books Online workload that I
created last year for generating mock workloads against SQL Server. This workload uses the example
queries from the Books Online against the AdventureWorks family of databases and is driven by
PowerShell. The workload was set up on each of the four replay clients and run with four total
connections to the SQL Server from each of the client servers to generate a 1GB replay trace
capture. The replay trace was created using the TSQL_Replay template from SQL Server Profiler,
exported to a script and configured as a server side trace to a file. Once the replay trace file was
captured it was preprocessed for use with Distributed Replay and then the replay data was used as the
replay workload for all of the tests.
Replay configuration
The replay operation was configured to use stress mode configuration to drive the maximum amount of
load against the test SQL Server instance. Additionally, the configuration uses a reduced think and connect time scale, which adjusts the ratio between the time an event originally occurred (relative to the start of the replay trace) and the time at which it is replayed, so that events can be replayed at the maximum possible rate. The stress scale for the replay is also configured per spid. The details of the
configuration file for the replay operation were as follows:
These counters will be used to measure the overall server load, and the throughput characteristics of
each of the tests for comparison.
Test configurations
Baseline
Server-side Trace
Profiler on server
Profiler remotely
Extended Events (event_file target)
Extended Events (ring_buffer target)
Extended Events (streaming provider)
Each test was repeated three times to ensure that the results were consistent across different tests and
to provide an average set of results for comparison. For the initial baseline tests, no additional data
collection was configured for the SQL Server instance, but the default data collections that ship with SQL
Server 2012 were left enabled: the default trace and the system_health event session. This reflects the
general configuration of most SQL Servers, since it is not generally recommended that the default trace
or system_health session be disabled due to the benefits they provide to database administrators. This
test was used to determine the overall baseline for comparison with the tests where additional data
collection was being performed. The remaining tests are based on the TSQL_SPs template that ships
with SQL Server Profiler and collects the following events:
Sessions\ExistingConnection
Stored Procedures\RPC:Starting
Stored Procedures\SP:Completed
Stored Procedures\SP:Starting
Stored Procedures\SP:StmtStarting
TSQL\SQL:BatchStarting
This template was selected based on the workload used for the tests, which is primarily SQL batches
that are captured by the SQL:BatchStarting event, and then a number of queries that use the various methods of hierarchyid, which are captured by the SP:Starting, SP:StmtStarting, and SP:Completed events. A server-side trace script was generated from the template using the export
functionality in SQL Server Profiler, and the only changes made to the script were to set
the maxfilesize parameter to 500MB, enable trace file rollover, and provide a filename to which the
trace was written.
The third and fourth tests used SQL Server Profiler to collect the same events as the server-side trace to
measure the performance overhead of tracing using the Profiler application. These tests were run using
SQL Profiler locally on the SQL Server and remotely from a separate client to ascertain whether there
was a difference in overhead by having Profiler running locally or remotely.
The final tests used Extended Events to collect the same events and columns, based on an
event session created using my Trace to Extended Events conversion script for SQL Server 2012. The
tests included evaluating the event_file, ring_buffer, and new streaming provider in SQL Server 2012
separately to determine the overhead that each target might impose on the performance of the
server. Additionally, the event session was configured with the default memory buffer options, but was
changed to specify NO_EVENT_LOSS for the EVENT_RETENTION_MODE option for the event_file and
ring_buffer tests to match the behavior of server-side Trace to a file, which also guarantees no event
loss.
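For illustration, an event_file session roughly equivalent to the trace definition might look like the following (a sketch only; the session name, file path, and the omission of the additional columns are simplifications – the actual sessions were generated by the conversion script):
CREATE EVENT SESSION [TraceOverhead_EventFile] ON SERVER
ADD EVENT sqlserver.existing_connection,
ADD EVENT sqlserver.rpc_starting,
ADD EVENT sqlserver.module_start,
ADD EVENT sqlserver.module_end,
ADD EVENT sqlserver.sp_statement_starting,
ADD EVENT sqlserver.sql_batch_starting
ADD TARGET package0.event_file
    (SET filename = N'C:\XEvents\TraceOverhead_EventFile.xel', max_file_size = (500))
WITH (EVENT_RETENTION_MODE = NO_EVENT_LOSS);
GO
ALTER EVENT SESSION [TraceOverhead_EventFile] ON SERVER STATE = START;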
Results
With one exception, the results of the tests were not surprising. The baseline test was able to perform
the replay workload in thirteen minutes and thirty-five seconds, and averaged 2345 batch requests per
second during the tests. With the server-side Trace running, the replay operation completed in 16
minutes and 40 seconds, which is an 18.1% degradation in performance. The Profiler traces were the worst performers overall, requiring 149 minutes when Profiler was run locally on the server, and 123
minutes and 20 seconds when Profiler was run remotely, yielding 90.8% and 87.6% degradation in
performance respectively. The Extended Events tests were the best performers, taking 15 minutes and
15 seconds for the event_file and 15 minutes and 40 seconds for the ring_buffer target, resulting in a
10.4% and 11.6% degradation in performance. The average results for all tests are displayed in Table 1
and charted in Figure 2:
The Extended Events streaming test is not quite a fair result in the context of the tests that were run and
requires a bit more explanation to understand the result. From the table results we can see that the
streaming tests for Extended Events completed in sixteen minutes and thirty-five seconds, equating to
34.1% degradation in performance. However, if we zoom into the chart and change its scale, as shown in
Figure 3, we’ll see that the streaming had a much greater impact to the performance initially and then
began to perform in a manner similar to the other Extended Events tests. When the live stream falls too far behind the workload, the Database Engine disconnects it to protect the server's performance, and errors similar to the following are raised:
An exception occurred during event enumeration. Examine the inner exception for more
information.
(Microsoft.SqlServer.XEvent.Linq)
Error 25726, severity 17, state 0 was raised, but no message with that error number
was found in sys.messages. If error is larger than 50000, make sure the user-defined
message is added using sp_addmessage.
(Microsoft SQL Server, Error: 18054)
Conclusions
All of the methods of collecting diagnostics data from SQL Server have “observer overhead” associated
with them and can impact the performance of a workload under heavy load. For systems running on SQL
Server 2012, Extended Events provide the least amount of overhead and provide similar capabilities for
events and columns as SQL Trace (some events in SQL Trace are rolled up into other events in Extended
Events). Should SQL Trace be necessary for capturing event data – which may be the case until third-
party tools are recoded to leverage Extended Events data – a server-side Trace to a file will yield the
least amount of performance overhead. SQL Server Profiler is a tool to be avoided on busy production
servers, as shown by the tenfold increase in duration and significant reduction in throughput for the
replay.
While the results would seem to favor running SQL Server Profiler remotely when Profiler must be used,
this conclusion cannot be definitively drawn based on the specific tests that were run in this scenario.
Additional testing and data collection would have to be performed to determine if the remote Profiler
results were the result of lower context switching on the SQL Server instance, or if networking between
VMs played a factor in the lower performance impact to the remote collection. The point in these tests
was to show the significant overhead that Profiler incurs, regardless of where Profiler was being run.
Finally, the live event stream in Extended Events also has a high overhead when it is actually connected
and collecting data, but as shown in the tests, the Database Engine will disconnect a live stream if it falls
behind on the events to prevent severely impacting the performance of the server.
The Zombie PerfMon Counters That Never Die!
By Kevin Kline
One of the things that’s simultaneously great and horrible about the Internet is that, once something
gets posted out in the ether, it basically never goes away. (Some day, politicians will realize this. We can
easily fact check their consistency.) Because of longevity of content posted to the Internet, a lot of
performance tuning topics become “zombies.” We shoot ‘em dead, but they keep coming back!
In other words, those old recommendations were a suggested best practice long ago, for a specific
version of SQL Server, but are now inappropriate for the newer version. It’s not uncommon for me,
when speaking at a conference, to encounter someone who’s still clinging to settings and techniques
which haven’t been good practice since the days of SQL Server 2000. The SQL Server 2000 Operations
Guide on Capacity/Storage contains many “best practice” recommendations that were very version-
specific and no longer apply today.
So here’s an example. The % Disk Time and Disk Queue Length PerfMon counters were heavily
recommended as key performance indicators for I/O performance. SQL Server throws a lot of I/O at the
disks using scatter/gather to maximize the utilization of the disk-based I/O subsystem. This approach
leads to short bursts of long queue depths during checkpoints and read-aheads for an instance of SQL
Server. Sometimes the server workload is such that your disk can’t keep up with the I/O shoved at it
and, when that happens, you’ll see long queue lengths too. The short burst scenario isn’t a
problem. The lengthening queue length scenario usually is a problem. So is that a good practice?
In a word, not-so-much.
Those counters can still be of some use on an instance of SQL Server which only has one hard disk
(though that’s exceedingly rare these days). Why?
The PerfMon counter % Disk Time is a bogus performance metric for several reasons. It does not take
into account asynchronous I/O requests. It can’t tell what the real performance profile for an
underlying RAID set may be, since they contain multiple disk drives. The PerfMon counter Disk Queue
Length is also mostly useless, except on SQL Servers with a single physical disk, because the hard disk
controller cache obfuscates how many I/O operations are actually pending on the queue or not. In fact,
some hard disks even have tiny write caches as well, which further muddies the water as to whether
the I/O is truly queued, in a cache somewhere between the operating system and the disk, or has finally
made it all the way to the CMOS on the disk.
Instead of using those PerfMon counters, use the Avg Disk Reads/sec, Avg Disk Writes/sec, and Avg
Disk Transfers/sec to track the performance of disk subsystems. These counters track the average
number of read I/Os, write I/Os, and combined read and write I/Os that occurred in the last second. Occasionally, I like to track the same metrics by volume of data rather than the rate of I/O
operations. So, to get that data, you may wish to give these volume-specific PerfMon counters a try: Avg
Disk Transfer Bytes/sec, Avg Disk Read Bytes/sec, and Avg Disk Write Bytes/sec.
For SQL Server I/O Performance, Use Dynamic Management Views (DMV)
And unless you’ve been living in a cave, you should make sure to use SQL Server’s Dynamic Management
Views (DMVs) to check on I/O performance for recent versions of SQL Server. Some of my favorite DMVs
for I/O include:
sys.dm_os_wait_stats
sys.dm_os_waiting_tasks
sys.dm_os_performance_counters
sys.dm_io_virtual_file_stats
sys.dm_io_pending_io_requests
sys.dm_db_index_operational_stats
sys.dm_db_index_usage_stats
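For example, a query along these lines against sys.dm_io_virtual_file_stats surfaces cumulative read and write counts and stalls per database file (a sketch; adjust and baseline it for your own environment):
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads, vfs.num_of_writes,
       vfs.io_stall_read_ms, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
        ON vfs.database_id = mf.database_id
       AND vfs.file_id = mf.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;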
So how are you tracking I/O performance metrics? Which ones are you using?
I look forward to hearing back from you!
Is the sp_ prefix still a no-no?
By Aaron Bertrand
In the SQL Server world, there are two types of people: those who like all of their objects to be prefixed,
and those who don’t. The former group is further divided into two categories: those who prefix stored
procedures with sp_, and those who choose other prefixes (such as usp_ or proc_). A long-standing
recommendation has been to avoid the sp_ prefix, both for performance reasons, and to avoid
ambiguity or collisions if you happen to choose a name that already exists in master. Collisions are
certainly still an issue, but assuming you’ve vetted your object name, is it still a performance issue?
The sp_ prefix is still a no-no. But in this post I will explain why, how SQL Server 2012 might lead you to
believe that this cautionary advice no longer applies, and some other potential side effects of choosing
this naming convention.
The sp_ prefix does not mean what you think it does: most people think sp stands for “stored
procedure” when in fact it means “special.” Stored procedures (as well as tables and views) stored in
master with an sp_ prefix are accessible from any database without a proper reference (assuming a local
version does not exist). If the procedure is marked as a system object using sp_MS_marksystemobject (an undocumented and unsupported system procedure that sets is_ms_shipped to 1), then the procedure in master will execute in the context of the calling
database. Let’s look at a simple example:
(0 row(s) affected)
sp_test foo
(1 row(s) affected)
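The kind of example that produces output like the above looks something like this (a hypothetical sketch; the object names and procedure body are assumptions, not the original code):
USE [master];
GO
CREATE PROCEDURE dbo.sp_test
AS
BEGIN
  -- returns the user tables visible in the current database context
  SELECT name FROM sys.tables;
END
GO
-- at this point, calling EXEC dbo.sp_test from a user database that contains a table
-- named dbo.foo still resolves sys.tables against master (no rows returned)
EXEC sys.sp_MS_marksystemobject N'sp_test';
GO
-- now the same call from the user database runs in the caller's context and returns foo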
The performance issue comes from the fact that master might be checked for an equivalent stored
procedure, depending on whether there is a local version of the procedure, and whether there is in fact
an equivalent object in master. This can lead to extra metadata overhead as well as an
additional SP:CacheMiss event. The question is whether this overhead is tangible.
If we run a quick test from our test database, we see that executing these stored procedures will never
actually invoke the versions from master, regardless of whether we properly database- or schema-
qualify the procedure (a common misconception) or if we mark the master version as a system object:
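For reference, the test setup looked something like this (a sketch reconstructed from the results shown below; the procedure bodies are assumptions), followed by the actual test calls:
CREATE DATABASE sp_prefix;
GO
USE sp_prefix;
GO
-- the local version of the procedure in the user database
CREATE PROCEDURE dbo.sp_something
AS
BEGIN
  SELECT 'sp_prefix', DB_NAME();
END
GO
-- an "equivalent" procedure in master (created/dropped and optionally marked as a
-- system object, depending on the test permutation)
USE [master];
GO
CREATE PROCEDURE dbo.sp_something
AS
BEGIN
  SELECT 'master', DB_NAME();
END
GO
-- EXEC sys.sp_MS_marksystemobject N'sp_something';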
USE sp_prefix;
GO
EXEC sp_prefix.dbo.sp_something;
GO
EXEC dbo.sp_something;
GO
EXEC sp_something;
Results:
sp_prefix sp_prefix
sp_prefix sp_prefix
sp_prefix sp_prefix
Let’s also run a Quick Trace® using SQL Sentry Performance Advisor to observe whether there are
any SP:CacheMiss events:
We see CacheMiss events for the ad hoc batch that calls the stored procedure (since SQL Server
generally won’t bother caching a batch that consists primarily of procedure calls), but not for the stored
procedure itself. Both with and without the sp_something procedure existing in master (and when it
exists, both with and without it being marked as a system object), the calls to sp_something in the user
database never “accidentally” call the procedure in master, and never generate any CacheMiss events
for the procedure.
This was on SQL Server 2012. I repeated the same tests above on SQL Server 2008 R2, and found slightly
different results:
So on SQL Server 2008 R2 we see an additional CacheMiss event that does not occur in SQL Server 2012.
This occurs in all scenarios (no equivalent object in master, an object in master marked as a system object,
and an object in master not marked as a system object). Immediately I was curious whether this
additional event would have any noticeable impact on performance.
I made an additional procedure without the sp_ prefix to compare raw performance, CacheMiss aside:
USE sp_prefix;
GO
CREATE PROCEDURE dbo.proc_something
AS
BEGIN
SELECT 'sp_prefix', DB_NAME();
END
GO
So the name prefix is the only difference between sp_something and proc_something. I then created wrapper procedures
to execute them 1000 times each, using EXEC sp_prefix.dbo.<procname>, EXEC
dbo.<procname> and EXEC <procname> syntax, with equivalent stored procedures living in master and
marked as a system object, living in master but not marked as a system object, and not living in master
at all.
USE sp_prefix;
GO
CREATE PROCEDURE dbo.wrap_sp_3part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC sp_prefix.dbo.sp_something;
SET @i += 1;
END
END
GO
CREATE PROCEDURE dbo.wrap_sp_2part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC dbo.sp_something;
SET @i += 1;
END
END
GO
CREATE PROCEDURE dbo.wrap_sp_1part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC sp_something;
SET @i += 1;
END
END
GO
-- repeat for proc_something
Measuring runtime duration of each wrapper procedure with SQL Sentry Plan Explorer, the results show
that using the sp_ prefix has a significant impact on duration in almost all cases (and certainly on average):
We also see that the performance of SQL Server 2012 trends much better than the performance on SQL
Server 2008 R2 – no other variables are different. Both instances are on the same host, and neither is
under memory or other pressure of any kind. This could be a combination of the
additional CacheMiss event and those transparent improvements you get from enhancements made to
the database engine between versions.
If you create a stored procedure that references an object you created, say dbo.sp_helptext, and you
didn’t realize (or didn’t care) that this name collides with a system procedure name, then there is
potential ambiguity when someone is reviewing your stored procedure. They will most likely assume
you meant the system procedure, not a different procedure you created that happens to share its name.
Another interesting thing happens when you create a stored procedure that references a stored
procedure prefixed with sp_ that just happens to also exist in master. Let’s pick an existing procedure
that you might not be immediately familiar with (and therefore might be a more likely representative of
the scenario I’m describing): sp_resyncuniquetable.
One more source of ambiguity can occur in this scenario. The following sequence of events is entirely
plausible:
2. The deployer accidentally creates a version in master (and maybe notices, or maybe doesn’t, but
in either case doesn’t clean up).
3. The deployer (or someone else) creates the procedure, this time in the right database.
5. You replace sp_foo with sp_superfoo, and delete sp_foo from the user database.
6. When updating the application(s) to reference the new stored procedure, you might miss a
replacement or two for various reasons.
So in this scenario, the application is still calling sp_foo, and it’s not failing – even though you’ve deleted
the local copy – since it finds what it thinks is an equivalent in master. Not only is this stored procedure
in master not equivalent to sp_superfoo, it’s not even equivalent to the latest version of sp_foo.
“Procedure not found” is a much easier problem to troubleshoot than “Procedure doesn’t exist – but
code calling it works, and doesn’t quite return the expected results.”
Conclusion
I still think that, even though the behavior has changed slightly in SQL Server 2012, you shouldn’t be
using the sp_ prefix at any time, unless your intention is to create a stored procedure in master *and*
mark it as a system object. Otherwise you are exposed to these performance issues as well as potential
ambiguity on multiple fronts.
And personally, I don’t think stored procedures need to have any prefix at all – but I have less tangible
evidence to convince you of that, other than asking you what other type of object could it possibly be?
You can’t execute a view, or a function, or a table…
As I suggest often, I don’t really care what your naming convention is, as long as you’re consistent. But I
think you should avoid potentially harmful prefixes like sp_.
Configuring a Dedicated Network for Availability Group Communication
By Joe Sack
SQL Server 2012 AlwaysOn Availability Groups require a database mirroring endpoint for each SQL
Server instance that will be hosting an availability group replica and/or database mirroring session. This
SQL Server instance endpoint is then shared by one or more availability group replicas and/or database
mirroring sessions and is the mechanism for communication between the primary replica and the
associated secondary replicas.
Depending on the data modification workloads on the primary replica, the availability group messaging
throughput requirements can be non-trivial. This activity is also sensitive to traffic from concurrent non-
availability group activity. If throughput is suffering due to degraded bandwidth and concurrent traffic,
you may consider isolating the availability group traffic to its own dedicated network adapter for each
SQL Server instance hosting an availability replica. This post will describe this process and also briefly
describe what you might expect to see in a degraded throughput scenario.
For this article, I’m using a five node virtual guest Windows Server Failover Cluster (WSFC). Each node in
the WSFC has its own stand-alone SQL Server instance using non-shared local storage. Each node also
has a separate virtual network adapter for public communication, a virtual network adapter for WSFC
communication, and a virtual network adapter that we’ll dedicate to availability group communication.
For the purposes of this post, we’ll focus on the information needed for the availability group dedicated
network adapters on each node:
SQL2K12-SVR2 192.168.20.32
SQL2K12-SVR3 192.168.20.33
SQL2K12-SVR4 192.168.20.34
SQL2K12-SVR5 192.168.20.35
Setting up an availability group using a dedicated NIC is almost identical to a shared NIC process, only in
order to “bind” the availability group to a specific NIC, I first have to designate the LISTENER_IP argument
in the CREATE ENDPOINT command, using the aforementioned IP addresses for my dedicated NICs.
Below shows the creation of each endpoint across the five WSFC nodes:
:CONNECT SQL2K12-SVR1
USE [master];
GO
USE [master];
GO
:CONNECT SQL2K12-SVR2
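Each endpoint definition follows this general pattern (a sketch for SQL2K12-SVR2, using its dedicated IP from the list above; the endpoint name, port, and security options are assumptions):
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.32))
    FOR DATABASE_MIRRORING (AUTHENTICATION = WINDOWS NEGOTIATE,
        ENCRYPTION = REQUIRED ALGORITHM AES, ROLE = ALL);
GO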
After creating my availability group, if I start driving data modification load against the primary replica
availability databases, I can quickly see that the availability group communication traffic is flowing on
the dedicated NIC using Task Manager on the networking tab (the first section is the throughput for the
dedicated availability group NIC):
And I can also track the stats using various performance counters. In the below image, the Intel[R]
PRO_1000 MT Network Connection _2 is my dedicated availability group NIC and has the majority of NIC
traffic compared to the two other NICs:
Now having a dedicated NIC for availability group traffic can be a way to isolate activity and theoretically
improve performance, but if your dedicated NIC has insufficient bandwidth, as you might expect
performance will suffer and the health of the availability group topology will degrade.
For example, I changed the dedicated availability group NIC on the primary replica to a 28.8 Kbps
outgoing transfer bandwidth to see what would happen. Needless to say, it wasn’t good. The availability
group NIC throughput dropped significantly:
Within a few seconds, the health of the various replicas degraded, with a couple of the replicas moving
to a “not synchronizing” state:
I increased the dedicated NIC on the primary replica to 64 Kbps and after a few seconds there was an
initial catch-up spike as well:
While things improved, I did witness periodic disconnects and health warnings at this lower NIC
throughput setting:
What about the associated wait statistics on the primary replica?
When there was plenty of bandwidth on the dedicated NIC and all availability replicas were in a healthy
state, I saw the following distribution during my data loads over a 2 minute period:
HADR_WORK_QUEUE represents an expected background worker thread waiting for new
work. HADR_LOGCAPTURE_WAIT represents another expected wait for new log records to become
available and according to Books Online, is expected if the log scan is caught up or is reading from disk.
When I reduced the throughput of the NIC enough in order to get the availability group to an unhealthy
state, the wait type distribution was as follows:
We now see a new top wait type, HADR_NOTIFICATION_DEQUEUE. This is one of those “internal use
only” wait types as defined by Books Online, representing a background task that processes WSFC
notifications. What’s interesting is that this wait type doesn’t point directly to an issue, and yet the tests
show this wait type rise to the top in association with degraded availability group messaging throughput.
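For reference, a snapshot of the HADR-related waits can be pulled with a query along these lines (a sketch; capture it before and after a test interval and diff the values to get a true per-interval distribution):
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'HADR%'
ORDER BY wait_time_ms DESC;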
So the bottom line is that isolating your availability group activity to a dedicated NIC can be beneficial, provided that the dedicated network has sufficient bandwidth. However, if you can’t guarantee good bandwidth even using a dedicated network, the health of your availability group topology will suffer.
Checking if a non-LOB column needs to be updated
By Aaron Bertrand
Occasionally I see people try to “optimize” their update statements to avoid writing the same value to a
particular column. My understanding has always been that if you’re going to update a row, assuming all
of the values are in-row, the costs of locking the row are much higher than the incremental cost of
updating one, two or all columns in that row.
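To set the stage, assume a simple six-column table along these lines (a sketch; the original DDL isn't reproduced here, so the key column and data types are inferred from the variables used later), which the batch below then populates with 50,000 rows:
CREATE TABLE dbo.whatever
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  v1 NVARCHAR(50), v2 NVARCHAR(50), v3 NVARCHAR(50),
  v4 NVARCHAR(50), v5 NVARCHAR(50), v6 NVARCHAR(50)
);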
;WITH x(d) AS
(
SELECT d FROM
(
VALUES (N'abc'),(N'def'),(N'ghi'),
(N'jkl'),(N'mno'),(N'pqr')
) AS y(d)
)
INSERT dbo.whatever(v1, v2, v3, v4, v5, v6)
SELECT TOP (50000) x1.d, x2.d, x3.d, x4.d, x5.d, x6.d
FROM x AS x1, x AS x2, x AS x3, x AS x4,
x AS x5, x AS x6, x AS x7;
END
GO
Then I wrote update statements formulated in two ways that you could “avoid” writing to a specific
column, given this variable assignment:
DECLARE
@v1 NVARCHAR(50) = N'abc',
@v2 NVARCHAR(50) = N'def',
@v3 NVARCHAR(50) = N'ghi',
@v4 NVARCHAR(50) = N'jkl',
@v5 NVARCHAR(50) = N'mno',
@v6 NVARCHAR(50) = N'pqr';
First by using a CASE expression to check if the value in the column is the same as the value in the
variable:
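That formulation looks something like the following (a sketch; in the actual tests the statement was wrapped in procedures and paired with the WHERE-clause variations described in the comments below):
UPDATE dbo.whatever
  SET v1 = CASE WHEN v1 = @v1 THEN v1 ELSE @v1 END,
      v2 = CASE WHEN v2 = @v2 THEN v2 ELSE @v2 END,
      v3 = CASE WHEN v3 = @v3 THEN v3 ELSE @v3 END,
      v4 = CASE WHEN v4 = @v4 THEN v4 ELSE @v4 END,
      v5 = CASE WHEN v5 = @v5 THEN v5 ELSE @v5 END,
      v6 = CASE WHEN v6 = @v6 THEN v6 ELSE @v6 END;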
-- to show when any value might change in a row, the procedure uses the full cross join:
-- to show when particular values will change on many rows, we can hard-code constants:
-- and to show when all values will change, a different variable assignment would take place:
DECLARE
@v1 NVARCHAR(50) = N'zzz',
@v2 NVARCHAR(50) = N'zzz',
@v3 NVARCHAR(50) = N'zzz',
@v4 NVARCHAR(50) = N'zzz',
@v5 NVARCHAR(50) = N'zzz',
@v6 NVARCHAR(50) = N'zzz';
Results
After running these tests, the “blind update” won in every single scenario. Now, you’re thinking, what’s
a couple hundred milliseconds? Extrapolate. If you’re performing a lot of updates in your system, this
can really start to take a toll.
Detailed results in Plan Explorer: Any change | 2 values exempt | 4 values exempt | All values
exempt| All change
Based on feedback from Roji, I decided to test this with a few indexes as well:
Conclusion
From this test, it seems to me that it is usually not worth checking if a value should be updated. If your
UPDATE statement affects multiple columns, it is almost always cheaper for you to scan all of the
columns where any value might have changed rather than check each column individually. In a future
post, I will investigate whether this scenario is paralleled for LOB columns.
Minimizing the impact of DBCC CHECKDB : DOs and DON’Ts
By Aaron Bertrand
Your responsibilities as a DBA (or <insert role here>) probably include things like performance tuning,
capacity planning and disaster recovery. What many people tend to forget or defer is ensuring the integrity of the structure of their databases (both logical and physical); the most important step being DBCC CHECKDB. You can get partway there by creating a simple maintenance plan with a “Check
Database Integrity Task” – however, in my mind, this is just checking a checkbox.
If you look closer, there is very little you can do to control how the task operates. Even the quite
expansive Properties panel exposes a whole lot of settings for the maintenance subplan, but virtually nothing about the DBCC commands it will run. Personally I think you should take a much more proactive and controlled approach to how you perform your CHECKDB operations in production environments, by
creating your own jobs and manually hand-crafting your DBCC commands. You might tailor your
schedule or the commands themselves to different databases – for example the ASP.NET membership
database is probably not as crucial as your sales database, and could tolerate less frequent and/or less
thorough checks.
But for your crucial databases, I thought I would put together a post to detail some of the things I would
investigate in order to minimize the disruption DBCC commands may cause – and what myths and
marketing hoopla you should be wary of. And I want to thank Paul “Mr. DBCC” Randal
(blog |@PaulRandal) for providing valuable input – not only to this specific post, but also his endless
advice on his blog, #sqlhelp and in SQLskills Immersion training.
Please take all of these ideas with a grain of salt, and do your best to perform adequate testing in your
environment – not all of these suggestions will yield better performance in all environments. But you
owe it to yourself, your users and your stakeholders to at least consider the impact that
your CHECKDB operations might have, and take steps to mitigate those effects where feasible – without
introducing unnecessary risk by not checking the right things.
No matter where you are running CHECKDB, always use the WITH NO_INFOMSGS option. This simply
suppresses all the irrelevant output that just tells you how many rows are in each table; if you’re
interested in that information, you can get it from simple queries against DMVs and not while DBCC is
running. Suppressing the output makes it far less likely that you’ll miss a critical message buried in all
that happy output.
Similarly, you should always use the WITH ALL_ERRORMSGS option, but especially if you are running SQL
Server 2008 RTM or SQL Server 2005 (in those cases, you may see the list of per-object errors truncated
to 200). For any CHECKDB operations other than quick ad-hoc checks, you should consider directing
output to a file. Management Studio is limited to 1000 lines of output from DBCC CHECKDB, so you
might miss out on some errors if you exceed this figure.
While not strictly a performance issue, using these options will prevent you from having to run the
process again. This is particularly critical if you’re in the middle of disaster recovery.
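Putting those two options together, a basic check looks like this (the database name is a placeholder):
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;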
In most cases, CHECKDB spends the majority of its time performing logical checks of the data. If you
have the ability to perform these checks on a true copy of the data, you can focus your efforts on the
physical structure of your production systems, and use the secondary server to handle all of the logical
checks and alleviate that load from the primary. By secondary server, I mean only the following:
The place where you test your full restores – because you test your restores, right?
Other folks (most notably the behemoth marketing force that is Microsoft) might have convinced you
that other forms of secondary servers are suitable for DBCC checks. For example:
SAN mirroring;
or other variations…
Unfortunately, this is not the case, and none of these secondaries are valid, reliable places to perform
your checks as an alternative to the primary. Only a one-for-one backup can serve as a true copy;
anything else that relies on things like the application of log backups to get to a consistent state is not
going to reliably reflect integrity problems on the primary.
So rather than try to offload your logical checks to a secondary and never perform them on the primary,
here is what I suggest:
1. Make sure you are frequently testing the restores of your full backups. And no, this does not
include COPY_ONLY backups from an AG secondary, for the same reasons as above – that
would only be valid in the case where you have just initiated the secondary with a full restore.
2. Run DBCC CHECKDB often against the full restore, before doing anything else. Again, replaying
log records at this point will invalidate this database as a true copy of the source.
3. Run DBCC CHECKDB against your primary, perhaps broken up in ways that Paul Randal suggests,
and/or on a less frequent schedule, and/or using PHYSICAL_ONLY more often than not. This can
depend on how often and reliably you are performing (2).
4. Never assume that checks against the secondary are enough. Even with an exact replica of your
primary database, there are still physical issues that can occur on the I/O subsystem of your
primary that will never propagate to the secondary.
5. Always analyze DBCC output. Just running it and ignoring it, to check it off some list, is as helpful
as running backups and claiming success without ever testing that you can actually restore that
backup when needed.
I’ve done some thorough testing of two trace flags (2549 and 2562) and have found that they can yield
substantial performance improvements. These two trace flags are described in a lot more detail in KB
#2634571, but basically:
Trace flag 2549:
o This optimizes the checkdb process by treating each individual database file as residing on a unique underlying disk. This is okay to use if your database has a single data file, or if you know that each database file is, in fact, on a separate drive. If your database has multiple files and they share a single, direct-attached spindle, you should be wary of this trace flag, as it may do more harm than good.
Trace flag 2562:
o This flag treats the entire checkdb process as a single batch, at the cost of higher tempdb utilization (up to 5% of the database size).
o Uses a better algorithm to determine how to read pages from the database, reducing latch contention (specifically for DBCC_MULTIOBJECT_SCANNER). Note that this specific improvement is in the SQL Server 2012 code path, so you will benefit from it even without the trace flag. This can avoid errors such as:
The above two trace flags are available in the following versions:
o If you are still using SQL Server 2005, this trace flag, introduced in 2005 SP2 CU#9
(9.00.3282) (though not documented in that Cumulative Update’s Knowledge Base
article, KB #953752), attempts to correct poor performance of DATA_PURITY checks on
x64-based systems. You can see more details in KB #945770. This trace flag should not
be necessary in more modern versions of SQL Server, as the problem in the query
processor has been fixed.
If you’re going to use any of these trace flags, I highly recommend setting them at the session level
using DBCC TRACEON rather than as a startup trace flag. Not only does it enable you to turn them off
without having to cycle SQL Server, but it also allows you to implement them only when performing
certain CHECKDB commands, as opposed to operations using any type of repair.
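For example, a hand-crafted job step might wrap the check like this (a sketch; the database name is a placeholder):
DBCC TRACEON (2549, 2562);
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
DBCC TRACEOFF (2549, 2562);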
DBCC CHECKDB can make heavy use of tempdb, so make sure you plan for resource utilization there.
This is usually a good thing to do in any case. For CHECKDB you’ll want to properly allocate space to
tempdb; the last thing you want is for CHECKDB progress (and any other concurrent operations) to have
to wait for an autogrow. You can get an idea for requirements using WITH ESTIMATEONLY, as
Paul explains here. Just be aware that the estimate can be quite low due to a bug in SQL Server 2008 R2.
Also if you are using trace flag 2562 be sure to accommodate for the additional space requirements.
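The estimate itself is a one-liner (again, the database name is a placeholder):
DBCC CHECKDB (N'YourDatabase') WITH ESTIMATEONLY;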
And of course, all of the typical advice for optimizing tempdb on just about any system is appropriate
here as well: make sure tempdb is on its own set of fast spindles, make sure it is sized to accommodate
all other concurrent activity without having to grow, make sure you are using an optimal number of data
files, etc. A few other resources you might consider:
A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor
core
In order to run CHECKDB, modern versions of SQL Server will attempt to create a hidden snapshot of
your database on the same drive (or on all of the drives if your data files span multiple drives). You can’t
control this mechanism, but if you want to control where CHECKDB operates, create your own snapshot
first (Enterprise Edition required) on whatever drive you like, and run the DBCC command against the
snapshot. In either case, you’ll want to run this operation during a relative downtime, to minimize the
copy-on-write activity that will go through the snapshot. And you won’t want this schedule to conflict
with any heavy write operations, like index maintenance or ETL.
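A sketch of that approach (the snapshot name, file path, and logical data file name are assumptions that must be adjusted to match your database):
CREATE DATABASE YourDatabase_DBCC_Snap
ON (NAME = YourDatabase_Data, FILENAME = N'E:\Snapshots\YourDatabase_Data.ss')
AS SNAPSHOT OF YourDatabase;
GO
DBCC CHECKDB (N'YourDatabase_DBCC_Snap') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
DROP DATABASE YourDatabase_DBCC_Snap;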
You may have seen suggestions to force CHECKDB to run in offline mode using the WITH
TABLOCK option. I strongly recommend against this approach. If your database is actively being used,
choosing this option will just make users frustrated. And if the database is not actively being used,
you’re not saving any disk space by avoiding a snapshot, since there will be no copy-on-write activity to
store.
The operating system returned error 665(The requested operation could not be
completed due to a file system limitation) to SQL Server during a write at offset
0x[...] in file '[file]'
There are some tips here for reducing the risk of these errors during CHECKDB operations, and reducing
their impact in general – with several fixes available, depending on your operating system and SQL
Server version:
http://blogs.msdn.com/b/psssql/archive/2009/03/04/…workarounds.aspx
http://blogs.msdn.com/b/psssql/archive/2008/07/10/…retries.aspx
1. Use Resource Governor on 2008 and above, as long as you are running Enterprise Edition. To
target just DBCC commands for a particular resource pool or workload group, you’ll have to
write a classifier function that can identify the sessions that will be performing this work (e.g. a
specific login or a job_id).
2. Use Trace flag 2528 to turn off parallelism for DBCC CHECKDB (as well
as CHECKFILEGROUP and CHECKTABLE). Trace flag 2528 is described here. Of course this is only
valid in Enterprise Edition, because in spite of what Books Online currently says, the truth is
that CHECKDB does not go parallel in Standard Edition.
3. While the DBCC command itself does not support OPTION (MAXDOP n), it does respect the
global setting max degree of parallelism. Probably not something I would do in production
unless I had no other options, but this is one overarching way to control
certain DBCC commands if you can’t target them more explicitly.
We’ve been asking for better control over the number of CPUs that DBCC CHECKDB uses, but they’ve
been repeatedly denied. For example, Ola Hallengren asked for the ability to add MAXDOP to the
command to limit the number of CPUs used on a multi-core system: Connect #468694 : MAXDOP option
in DBCC CHECKDB. And Chirag Roy made a similar request (or radically different, depending on your
point of view) to enable CHECKDB to override the server-level setting and use *more* CPUs: Connect
#538754 : Introduce Setting to Force DBCC CHECKDB to run Multi Threaded when MAXDOP = 1.
My Findings
In these two scenarios, I have learned the following (again, keeping in mind that your mileage may vary,
and that you will need to perform your own tests to draw any meaningful conclusions):
1. When I have to perform logical checks:
o At small database sizes, the NO_INFOMSGS option can cut processing time significantly
when the checks are run in SSMS. On larger databases, however, this benefit diminishes,
as the time and work spent relaying the information becomes such an insignificant
portion of the overall duration. 21 seconds out of 2 minutes is substantial; 88 seconds
out of 35 minutes, not so much.
o The two trace flags I tested had a significant impact on performance – representing a
runtime reduction of 40-60% when both were used together.
2. When I can push logical checks to a secondary server (again, assuming that I am performing
logical checks elsewhere against a true copy):
o In my scenario, the trace flags had very little impact on duration when
performing PHYSICAL_ONLY checks.
Of course, and I can’t stress this enough, these are relatively small databases and only used so that I
could perform repeated, measured tests in a reasonable amount of time. This was also a fairly beefy
server (80 logical CPUs, 128 GB RAM) and I was the only user. Duration and interaction with other
workloads on the system may skew these results quite a bit. Here is a quick glimpse of typical CPU
usage, using SQL Sentry Performance Advisor, during one of the CHECKDB operations (and none of the
options really changed the overall impact on CPU, just duration):
And here is another view, showing similar CPU profiles for three different sample CHECKDB operations
in historical mode (I’ve overlaid a description of the three tests sampled in this range):
On even larger databases, hosted on busier servers, you may see different effects, and your mileage is
quite likely to vary. So please perform your due diligence and test out these options and trace flags
during a typical concurrent workload before deciding how you want to approach CHECKDB.
Conclusion
DBCC CHECKDB is a very important but often undervalued part of your responsibility as a DBA or
architect, and crucial to the protection of your company’s data. Do not take this responsibility lightly,
and do your best to ensure that you do not sacrifice anything in the interest of reducing impact on your
production instances. Most importantly: look beyond the marketing data sheets to be sure you fully
understand how valid those promises are and whether you are willing to bet your company’s data on
them. Skimping out on some checks or offloading them to invalid secondary locations could be a disaster
waiting to happen.
Finally, if you have an unresolved question about DBCC CHECKDB, post it to the #sqlhelp hash tag on
twitter. Paul checks that tag often and, since his picture should appear in the main Books Online article,
it’s likely that if anyone can answer it, he can. If it’s too complex for 140 characters, you can ask here
(and I will make sure Paul sees it at some point), or post to a forum site such as
answers.sqlsentry.net or dba.stackexchange.com.
Performance Problems with SQL Server 2012 Enterprise Edition Under
CAL Licensing
By Jonathan Kehayias
Numerous licensing changes were introduced in SQL Server 2012; the most significant was the move
from socket-based licensing to core-based licensing for Enterprise Edition. One of the challenges that
Microsoft faced with this change was providing a migration path for customers that previously used
Server+CAL based licensing for Enterprise Edition prior to SQL Server 2012. Customers under Software
Assurance can upgrade to SQL Server 2012 Enterprise Edition and still use Server+CAL licensing (also
known as “grandfathering”) but with a limitation to 20 logical processors, as documented in the SQL
Server 2012 Licensing Guide. This licensing also extends to VMs with a limit of 4 VMs being covered by
the Enterprise Server+CAL license, but still with the same 20 logical processor limitation as documented
in the SQL Server 2012 Virtualization Licensing Guide.
A lot of people have been caught off guard by the 20 logical processor limitation, even though it is
documented in the licensing guides.
An entry is made in the ERRORLOG file when the instance starts up, specifying the number of logical
processors and that the 20 processor limitation is being enforced:
With the default configuration that SQL Server applies under the 20 logical processor limitation using
Server+CAL, the first 20 schedulers are VISIBLE ONLINE and any remaining schedulers are VISIBLE
OFFLINE. As a result, performance problems can occur for the instance, due to NUMA node scheduler
imbalances. To demonstrate this I created a VM on our Dell R720 test server which has two sockets and
Intel Xeon E5-2670 processors installed, each with 8 cores and Hyperthreading enabled, providing a
total of 32 logical processors available under Windows Server 2012 Datacenter Edition. The VM was
configured to have 32 virtual CPUs with 16 virtual processors allocated in two vNUMA nodes.
In SQL Server under the Enterprise Server+CAL licensing model, this results in a scheduler configuration
that is similar to the following:
SELECT
parent_node_id,
[status],
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';
As you can see, all 16 of the logical processors in the first NUMA node and only four of the logical
processors in the second NUMA node are used by the instance. This results in a significant imbalance of
schedulers between the two NUMA nodes that can lead to significant performance problems under
load. To demonstrate this, I spun up 300 connections running the AdventureWorks Books Online
workload against the instance and then captured the scheduler information for the VISIBLE ONLINE
schedulers in the instance using the following query:
SELECT
parent_node_id,
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';
An example output of this query under load is shown in Figure 3 below.
You can also see this symptom visually in monitoring tools such as SQL Sentry Performance Advisor:
This information shows a significant imbalance and performance is going to be affected as a result. This
is clearly evident in the runnable tasks counts for the four schedulers in the second NUMA node, which
are three to four times the size of those for the schedulers in the first NUMA node. So what exactly is
the problem and why does this occur?
At first glance you might think that this is a bug in SQL Server, but it isn’t. This is something that occurs
by design, though I doubt that this scenario was expected when the 20 logical processor limitation was
originally implemented. On NUMA-based systems, new connections are assigned to the NUMA nodes in
a round-robin fashion, and then inside of the NUMA node the connection is assigned to a scheduler
based on load. If we change the way that we are looking at this data and aggregate the data based on
parent_node_id we’ll see that the tasks are actually being balanced across the NUMA nodes. To do this
we’ll use the following query, the output of which is shown in Figure 5.
SELECT
parent_node_id,
SUM(current_tasks_count) AS current_tasks_count,
SUM(runnable_tasks_count) AS runnable_tasks_count,
SUM(active_workers_count) AS active_workers_count,
AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;
In other words, the 20-logical processor limit actually results in 40 schedulers ONLINE if (a) it is not a
virtual machine, (b) the processors are Intel, and (c) hyper-threading is enabled.
And the same query as above results in all 32 processors being VISIBLE ONLINE:
SELECT
parent_node_id,
[status],
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';
In this case, since there are only 32 logical processors, the 20 (well, 40) core limit does not impact us at
all, and work is distributed evenly across all of the cores.
In scenarios where the 20 processor limitation affects the NUMA balance of schedulers it is possible to
manually change the server configuration to balance the number of VISIBLE ONLINE schedulers in each
of the NUMA nodes through the use of ALTER SERVER CONFIGURATION. In the VM example provided,
the following command will configure the first 10 logical processors in each NUMA node to VISIBLE
ONLINE.
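For the VM example above, and assuming that CPU IDs 0 through 15 map to the first NUMA node and 16 through 31 to the second, the command would look something like this (a sketch; verify the CPU-to-node mapping in sys.dm_os_schedulers before running it):
ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 9, 16 TO 25;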
And again using SQL Sentry Performance Advisor we can see the CPU load distributed more evenly
across both NUMA nodes:
This problem is not strictly limited to VMs and the way that virtual CPUs are presented to the OS. It is
also possible to run into this problem with physical hardware. For example, an older Dell R910 with four
sockets and eight cores per socket, or even an AMD Opteron 6200 Interlagos based server with two
sockets and 16 cores per socket, which presents itself as four NUMA nodes with eight cores each. Under
either of these configurations, the process imbalance can also result in one of the NUMA nodes being
set offline entirely. Consequently, other side effects such as memory from that node being distributed
across the online nodes leading to foreign memory access issues can also degrade performance.
Summary
The default configuration of SQL Server 2012 using the Enterprise Edition licensing for Server+CAL is not
ideal under all NUMA configurations that might exist for SQL Server. Whenever Enterprise Server+CAL
licensing is being used, the NUMA configuration and scheduler statuses per NUMA node need to be
reviewed to ensure that SQL Server is configured for optimum performance. This problem does not
occur under core-based licensing since all of the schedulers are licensed and VISIBLE ONLINE.
The Benefits of Indexing Foreign Keys
By Erin Stellato
Primary and foreign keys are fundamental characteristics of relational databases, as originally noted
in E.F. Codd’s paper, “A Relational Model of Data for Large Shared Data Banks”, published in 1970. The
quote often repeated is, “The key, the whole key, and nothing but the key, so help me Codd.”
A primary key is a constraint in SQL Server, which acts to uniquely identify each row in a table. The key
can be defined as a single non-NULL column, or a combination of non-NULL columns which generates a
unique value, and is used to enforce entity integrity for a table. A table can only have one primary key,
and when a primary key constraint is defined for a table, a unique index is created. That index will be a
clustered index by default, unless specified as a nonclustered index when the primary key constraint is
defined.
Consider the Sales.SalesOrderHeader table in the AdventureWorks2012 database. This table holds basic
information about a sales order, including order date and customer ID, and each sale is uniquely
identified by a SalesOrderID, which is the primary key for the table. Every time a new row is added to
the table, the primary key constraint (named PK_SalesOrderHeader_SalesOrderID) is checked to ensure
that no row already exists with the same value for SalesOrderID.
Foreign Keys
Separate from primary keys, but very much related, are foreign keys. A foreign key is a column or
combination of columns that is the same as the primary key, but in a different table. Foreign keys are
used to define a relationship and enforce integrity between two tables.
To continue using the aforementioned example, the SalesOrderID column exists as a foreign key in
the Sales.SalesOrderDetail table, where additional information about the sale is stored, such as product
ID and price. When a new sale is added to the SalesOrderHeader table, it is not required to add a row for
that sale to the SalesOrderDetail table. However, when adding a row to the SalesOrderDetail table, a
corresponding row for the SalesOrderID must exist in the SalesOrderHeader table.
Conversely, when deleting data, a row for a specific SalesOrderID can be deleted at any time from
the SalesOrderDetail table, but in order for a row to be deleted from the SalesOrderHeader table,
associated rows from SalesOrderDetail will need to be deleted first.
Unlike primary key constraints, when a foreign key constraint is defined for a table, an index is not
created by default by SQL Server. However, it’s not uncommon for developers and database
administrators to add them manually. The foreign key may be part of a composite primary key for the
table, in which case a clustered index would exist with the foreign key as part of the clustering key.
Alternatively, queries may require an index that includes the foreign key and one or more additional
columns in the table, so a nonclustered index would be created to support those queries. Further,
indexes on foreign keys can provide performance benefits for table joins involving the primary and
foreign key, and they can impact performance when the primary key value is updated, or if the row is
deleted.
An Example
We can see this in the following example, which uses copies of the aforementioned tables from
the AdventureWorks2012 database that have been expanded using a script which can be found here.
The script was developed by Jonathan Kehayias (blog | @SQLPoolBoy) and creates
a SalesOrderHeaderEnlarged table with 1,258,600 rows, and a SalesOrderDetailEnlarged table with
4,852,680 rows. After the script was run, the foreign key constraint was added using the statements
below. Note that the constraint is created with the ON DELETE CASCADE option. With this option, when
an update or delete is issued against the SalesOrderHeaderEnlarged table, rows in the corresponding
table(s) – in this case just SalesOrderDetailEnlarged – are updated or deleted.
In addition, the default clustered index for SalesOrderDetailEnlarged was dropped and recreated to
just have SalesOrderDetailID as the primary key, as it represents a typical design.
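The constraint definition would be along these lines (a sketch; the constraint and schema names are assumptions):
ALTER TABLE Sales.SalesOrderDetailEnlarged WITH CHECK
  ADD CONSTRAINT FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged
  FOREIGN KEY (SalesOrderID)
  REFERENCES Sales.SalesOrderHeaderEnlarged (SalesOrderID)
  ON DELETE CASCADE;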
USE [AdventureWorks2012];
GO
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
USE [AdventureWorks2012];
GO
Using SQL Sentry Plan Explorer, the execution plan shows a clustered index scan
against SalesOrderDetailEnlarged, as there is no index on SalesOrderID:
The nonclustered index to support SalesOrderDetailEnlarged was then created using the following
statement:
USE [AdventureWorks2012];
GO
GO
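That statement would be something like the following (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_SalesOrderDetailEnlarged_SalesOrderID
  ON Sales.SalesOrderDetailEnlarged (SalesOrderID);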
Another delete was executed for a SalesOrderID that affected one row in SalesOrderHeaderEnlarged and
72 rows in SalesOrderDetailEnlarged:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
USE [AdventureWorks2012];
GO
And the query plan showed an index seek of the nonclustered index on SalesOrderID, as expected:
The query execution time dropped from 1898 ms to 27 ms – a 98.58% reduction, and reads for
the SalesOrderDetailEnlarged table decreased from 50,647 to 48 – a 99.9% improvement. Percentages
aside, consider the I/O alone generated by the delete. The SalesOrderDetailEnlarged table is only 500
MB in this example, and for a system with 256 GB of available memory, a table taking up 500 MB in the
buffer cache doesn’t seem like a terrible situation. But a table of 5 million rows is relatively small; most
large OLTP systems have tables with hundreds of millions of rows. In addition, it is not uncommon for
multiple foreign key references to exist for a primary key, where a delete of the primary key requires
deletes from multiple related tables. In that case, it is possible to see extended durations for deletes
which is not only a performance issue, but a blocking issue as well, depending on isolation level.
Conclusion
It is generally recommended to create an index which leads on the foreign key column(s), to support not
only joins between the primary and foreign keys, but also updates and deletes. Note that this is a
general recommendation, as there are edge case scenarios where the additional index on the foreign
key was not used due to extremely small table size, and the additional index updates actually negatively
impacted performance. As with any schema modifications, index additions should be tested and
monitored after implementation. It is important to ensure that the additional indexes produce the
desired effects and do not negatively impact solution performance. It is also worth noting how much
additional space is required by the indexes for the foreign keys. This is essential to consider before
creating the indexes, and if they do provide a benefit, must be considered for capacity planning going
forward.
Quick Tip – Speed Up a Slow Restore from the Transaction Log
By Kevin Kline
During some restore operations in SQL Server, the transaction log redo step might be taking an
unusually long time. Depending somewhat on the version and edition of SQL Server you’ve installed, you
may be able to increase performance by tinkering with the readahead performance for the redo
operations. To do this, you should use the MAXTRANSFERSIZE parameter of the RESTORE statement. For
example, if you set MAXTRANSFERSIZE = 1048576, it’ll use 1MB buffers.
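A restore sequence using this option would look something like the following (a sketch; the database name and backup path are placeholders):
RESTORE LOG [YourDatabase]
  FROM DISK = N'E:\Backups\YourDatabase_Log_1.trn'
  WITH NORECOVERY, MAXTRANSFERSIZE = 1048576;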
If you change the MAXTRANSFERSIZE, keep an eye on the PerfMon objects for Buffer Manager and
Readahead I/O. You may also wish to keep an eye on LOGBUFFER wait stats.
I’d love to hear your feedback. Have you tried this technique? Did it work as advertised? Did it require
some changes to work on a specific version or edition?
Ten Common Threats to Execution Plan Quality
By Joe Sack
The quality of an execution plan is highly dependent on the accuracy of the estimated number of rows
output by each plan operator. If the estimated number of rows is significantly skewed from the actual
number of rows, this can have a significant impact on the quality of a query’s execution plan. Poor plan
quality can be responsible for excessive I/O, inflated CPU, memory pressure, decreased throughput and
reduced overall concurrency.
By “plan quality” – I’m talking about having SQL Server generate an execution plan that results in
physical operator choices that reflect the current state of the data. By making such decisions based on
accurate data, there is a better chance that the query will perform properly. The cardinality estimate
values are used as input for operator costing, and when the values are too far off from reality, the
negative impact to the execution plan can be pronounced. These estimates are fed to the various cost
models associated with the query itself, and bad row estimates can impact a variety of decisions, including
index selection, seek vs. scan operations, parallel versus serial execution, join algorithm selection, inner
vs. outer physical join selection (e.g. build vs. probe), spool generation, bookmark lookups vs. full
clustered or heap table access, stream or hash aggregate selection, and whether or not a data
modification uses a wide or narrow plan.
As an example, let’s say you have the following SELECT query (using the Credit database):
SELECT
m.member_no,
m.lastname,
p.payment_no,
p.payment_dt,
p.payment_amt
FROM dbo.member AS m
INNER JOIN dbo.payment AS p
ON m.member_no = p.member_no;
Based on the query logic, is the following plan shape what you would expect to see?
And what about this alternative plan, where instead of a nested loop we have a hash match?
The “correct” answer is dependent on a few other factors – but one major factor is the number of rows
in each of the tables. In some cases, one physical join algorithm is more appropriate than the other –
and if the initial cardinality estimate assumptions aren’t correct, your query may be using a non-optimal
approach.
Identifying cardinality estimate issues is relatively straightforward. If you have an actual execution plan,
you can compare the estimated versus actual row count values for operators and look for skews. SQL
Sentry Plan Explorer simplifies this task by allowing you to see actual versus estimated rows for all
operators in a single plan tree tab versus having to hover over the individual operators in the graphical
plan:
Now, skews don’t always result in poor quality plans, but if you are having performance issues with a
query and you see such skews in the plan, this is one area that is then worthy of further investigation.
Identification of cardinality estimate issues is relatively straightforward, but the resolution often isn’t.
There are a number of root causes as to why cardinality estimate issues can occur, and I’ll cover ten of
the more common reasons in this post.
Missing or Out-of-Date Statistics
Of all the reasons for cardinality estimate issues, this is the one that you hope to see, as it is often easiest to address. In this scenario, your statistics are either missing or out-of-date. You may have database options for automatic statistics creation and updates disabled, “no recompute” enabled for specific statistics, or have large enough tables that your automatic statistics updates simply aren’t happening frequently enough.
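As a quick sanity check (a sketch, using the dbo.member table from the Credit database referenced below), you can see when each statistics object was last updated and refresh it manually:
SELECT s.name, last_updated = STATS_DATE(s.[object_id], s.stats_id)
FROM sys.stats AS s
WHERE s.[object_id] = OBJECT_ID(N'dbo.member');

UPDATE STATISTICS dbo.member WITH FULLSCAN;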
Sampling Issues
It may be that the precision of the statistics histogram is inadequate – for example, if you have a very
large table with significant and/or frequent data skews. You may need to change your sampling from the
default or if even that doesn’t help – investigate using separate tables, filtered statistics or filtered
indexes.
Column Correlation
The query optimizer assumes that columns within the same table are independent. For example, if you
have a city and state column, we may intuitively know that these two columns are correlated, but SQL
Server does not understand this unless we help it out with an associated multi-column index, or with
manually-created multi-column statistics. Without helping the optimizer with correlation, the selectivity
of your predicates may be exaggerated.
SELECT
lastname,
firstname
FROM dbo.member
WHERE city = 'Minneapolis'
AND state_prov = 'MN';
I happen to know that 10% of our 10,000 row member table qualify for this combination, but the query
optimizer is guessing that it is 1% of the 10,000 rows:
Now contrast this with the appropriate estimate that I see after adding multi-column statistics:
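Creating the multi-column statistics is straightforward – a minimal sketch, using the columns from the query above (the statistics name is arbitrary):
CREATE STATISTICS [stats_member_city_state]
ON dbo.member (city, state_prov)
WITH FULLSCAN;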
Intra-Table Column Comparison
Cardinality estimation issues can occur when comparing columns within the same table. This is a known
issue. If you have to do so, you can improve the cardinality estimates of the column comparisons by
using computed columns instead or by re-writing the query to use self-joins or common table
expressions.
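As an illustration of the computed column approach (a sketch – the due_dt column here is hypothetical), you can persist the comparison and let SQL Server build statistics on it:
ALTER TABLE dbo.payment
  ADD payment_after_due AS CASE WHEN payment_dt > due_dt THEN 1 ELSE 0 END; -- due_dt is hypothetical

CREATE STATISTICS [stats_payment_after_due] ON dbo.payment (payment_after_due);
-- queries can then filter on the computed column (e.g. WHERE payment_after_due = 1)
-- instead of comparing the two columns directly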
Table Variables
Using table variables much? Table variables show a cardinality estimate of “1” – which for just a small number of rows may not be an issue, but for large or volatile result sets can significantly impact query plan quality. Below is a screenshot of an operator’s estimate of 1 row versus the actual 1,600,000 rows from the @charge table variable:
If this is your root cause, you would be well-advised to explore alternatives like temporary tables and/or permanent staging tables where possible.
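A minimal sketch of that swap (column names from the Credit database are assumed):
-- instead of DECLARE @charge TABLE (...), materialize the rows in a temp table,
-- which gets statistics and a realistic row count estimate
SELECT charge_no, charge_dt
INTO #charge
FROM dbo.charge;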
Multi-Statement Table-Valued Functions
Multi-statement table-valued functions suffer from a similar fixed-estimate problem. Below is an estimated versus actual plan when using a multi-statement table-valued function:
Data Type Issues
Implicit data type conversions in conjunction with search and join conditions can cause cardinality estimate issues. They can also surreptitiously eat away at server-level resources (CPU, I/O, memory), so it’s important to address them whenever possible.
Complex Predicates
You’ve probably seen this pattern before – a query with a WHERE clause that has each table column
reference wrapped in various functions, concatenation operations, mathematical operations and more.
And while not all function wrapping precludes proper cardinality estimates (such as
for LOWER, UPPER and GETDATE) there are plenty of ways to bury your predicate to the point that the
query optimizer can no longer make accurate estimates.
Query Complexity
Similar to buried predicates, are your queries extraordinarily complex? I realize “complex” is a subjective
term, and your assessment may vary, but most can agree that nesting views within views within views
that reference overlapping tables is likely to be non-optimal – especially when coupled with 10+ table
joins, function references and buried predicates. While the query optimizer does an admirable job, it
isn’t magic, and if you have significant skews, query complexity (Swiss-army knife queries) can certainly
make it next to impossible to derive accurate row estimates for operators.
Distributed Queries
Are you using distributed queries with linked servers and you see significant cardinality estimate issues?
If so, be sure to check the permissions associated with the linked server principal being used to access
the data. Without the minimum db_ddladmin fixed database role for the linked server account, this lack
of visibility to remote statistics due to insufficient permissions may be the source for your cardinality
estimation issues.
There are other reasons why cardinality estimates can be skewed, but I believe I’ve covered the most
common ones. The key point is to pay attention to the skews in association with known, poorly
performing queries. Don’t assume that the plan was generated based on accurate row count conditions.
If these numbers are skewed, you need to try to troubleshoot this first.
Bad cardinality estimates coming from SSMS execution plans
By Aaron Bertrand
I have a few people to thank for a recent update to Plan Explorer. Brooke Philpott (@Macromullet) and
Greg Gonzalez (blog | @SQLsensei), of course, for R & D and for digging into the code and sorting it out.
But also to Paul White (blog | @SQL_kiwi) for being persistent in helping us validate the fixes.
The problem Paul discovered is that SQL Server 2008+ messes up cardinality estimates on certain
queries when key or RID lookups are involved. I’ll leave the deeper explanation to Paul’s blog
post and the bug he filed on Connect, but long story short, we were taking these misrepresented
estimates, believing them, and extrapolating them to show you “better” information. Unfortunately, as
Paul explains, we were duped.
SELECT
th.ProductID,
th.TransactionID,
th.TransactionDate
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 1
AND th.TransactionDate BETWEEN '20030901' AND '20031231';
Management Studio yields the following plan, just as Paul described it:
In Plan Explorer, we tried to be helpful by multiplying the estimated number of rows (rounded to 17) by
the number of executions (45), and came up with 765:
For most operators, this approach yields the right data, but due to this bug in SQL Server, it is not
correct for key/RID lookups. We’ve adjusted for that, and released the appropriate fix in 7.2.42.0
(download it now!). The graphical plan now properly shows correct row counts for both estimated:
And actual:
I’ll repeat Paul’s warning: Watch out for poor cardinality estimates when a predicate is applied as part
of a lookup.
There were some more complex problems caused by these misleading estimates, which we have also
addressed. I will blog about a few of those in a follow-up post – for this post I just wanted to
demonstrate that we quickly resolved the specific issue Paul highlighted in his post.
An important change to Extended Events in SQL Server 2012
By Aaron Bertrand
As you have most certainly heard elsewhere, SQL Server 2012 finally offers a version of Extended Events
that is a viable alternative to SQL Trace, in terms of both better performance and event parity. There are
other enhancements such as a usable UI in Management Studio – previously your only hope for this was
Jonathan Kehayias’ Extended Events Manager. There is also a great change related to permissions: in
SQL Server 2012 you only need ALTER ANY EVENT SESSION to create and manage Extended Event
sessions (previously you needed CONTROL SERVER).
I came across a more subtle behavior change recently that made it look like my event session was
dropping events. The change itself is not a secret, and in fact even after reading or hearing about this
change multiple times (Jonathan reminded me that he told me about this change too), I still missed it in
my initial troubleshooting since, at the time, it wasn’t a change that I thought would affect me. Lo and
behold…
TL;DR Version
In SQL Server 2012, your event session will only capture 1,000 events by default if it uses
the ring_buffer target (and 10,000 for pair_matching). This is a change from 2008 / 2008 R2, where it
was limited only by memory. (The change is mentioned almost in a footnote here, back in July 2011.) To
override the default, you can use the MAX_EVENTS_LIMIT setting – but note that this setting will not be
recognized by SQL Server 2008 / 2008 R2, so if you have code that needs to work against multiple
versions, you’ll need to use a conditional.
More Details
The scenario I was working through was more complex than this, but to demonstrate this issue, let’s
assume a very simple use case for Extended Events: tracking who is modifying objects. There is a handy
facility for this: object_altered. We can see the description for this event from the following query:
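-- (one way to do this; the exact query may vary)
SELECT [name], [description]
FROM sys.dm_xe_objects
WHERE [name] = N'object_altered'
AND [object_type] = N'event';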
So, if an object is modified, say, 20 times, I would expect to pull 40 events. And this is exactly what
happens in SQL Server 2008, 2008 R2 and 2012. The challenge comes when more than 500 modifications
happen (leading to more than 1,000 events). In SQL Server 2008 and 2008 R2, we still capture all events.
But SQL Server 2012 will drop some due to a change in the ring_buffer target. To demonstrate, let’s
build a quick, sample event session that trades performance for prevention of losing events (note that
this is not the set of options I would prescribe for any production system):
USE master;
GO
CREATE EVENT SESSION [XE_Alter] ON SERVER
ADD EVENT sqlserver.object_altered
(
ACTION (sqlserver.server_principal_name)
WHERE (sqlserver.session_id = 78) -- change 78 to your current spid
)
ADD TARGET package0.ring_buffer (SET MAX_MEMORY = 4096)
WITH (EVENT_RETENTION_MODE = NO_EVENT_LOSS, MAX_DISPATCH_LATENCY = 5 SECONDS);
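To drive the numbers below, the session is started and two stored procedures, foo_x and foo_y, are each altered 275 times – roughly like this (a sketch; the procedure bodies are arbitrary, and the ALTERs must run from the session_id referenced in the predicate above):
ALTER EVENT SESSION [XE_Alter] ON SERVER STATE = START;
GO
CREATE PROCEDURE dbo.foo_x AS SELECT 1;
GO
CREATE PROCEDURE dbo.foo_y AS SELECT 1;
GO
ALTER PROCEDURE dbo.foo_x AS SELECT 2;
GO 275
ALTER PROCEDURE dbo.foo_y AS SELECT 2;
GO 275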
;WITH raw_data(t) AS
(
SELECT CONVERT(XML, target_data)
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS st
ON s.[address] = st.event_session_address
WHERE s.name = 'XE_Alter'
AND st.target_name = 'ring_buffer'
),
xml_data (ed) AS
(
SELECT e.query('.')
FROM raw_data
CROSS APPLY t.nodes('RingBufferTarget/event') AS x(e)
)
SELECT [object_name] = obj, event_count = COUNT(*)
FROM
(
SELECT
--[login] = ed.value('(event/action[@name="server_principal_name"]/value)[1]', 'nvarchar(128)'),
obj = ed.value('(event/data[@name="object_name"]/value)[1]',
'nvarchar(128)'),
phase = ed.value('(event/data[@name="ddl_phase"]/text)[1]', 'nvarchar(128)')
FROM xml_data
) AS x
WHERE phase = 'Commit'
GROUP BY obj;
GO
object_name event_count
=========== ===========
foo_x 225
foo_y 275
This shows that 50 commit events (100 events total) were dropped for foo_x, and exactly 1,000 total events have been collected ((225 + 275) * 2). SQL Server seems to arbitrarily decide which events to drop – in theory, if it were collecting 1,000 events and then stopping, I should have 275 events for foo_x, and 225 for foo_y, since I altered foo_x first, and I shouldn’t have hit the cap until after that loop was
completed. But obviously there are some other mechanics at play here in how XEvents decides which
events to keep and which events to throw away.
In any case, you can get around this by specifying a different value for MAX_EVENTS_LIMIT in the ADD
TARGET portion of the code:
-- ...
ADD TARGET package0.ring_buffer (SET MAX_MEMORY = 4096, MAX_EVENTS_LIMIT = 0)
------------------------------------------------------^^^^^^^^^^^^^^^^^^^^^^
-- ...
Note that 0 = unlimited, but you can specify any integer value. When we run our test above with the
new setting, we see more accurate results, since no events were dropped:
object_name event_count
=========== ===========
foo_x 275
foo_y 275
As mentioned above, if you attempt to use this property when creating an event session against SQL
Server 2008 / 2008 R2, you will get this error:
Msg 25629, Level 16, State 1, Line 1
For target, “package0.ring_buffer”, the customizable attribute, “MAX_EVENTS_LIMIT”,
does not exist.
So if you are doing any kind of code generation and want consistent behavior across versions, you’ll have to
check the version first, and only include the attribute for 2012 and above.
Conclusion
If you are upgrading from SQL Server 2008 / 2008 R2 to 2012, or have written Extended Events code that
targets multiple versions, you should be aware of this behavior change and code accordingly. Otherwise
you risk dropping events, even in situations where you would assume – and where previous behavior
would imply – that dropped events were not possible. This isn’t something tools like the Upgrade
Advisor or Best Practices Analyzer are going to point out for you.
The underlying mechanics surrounding this problem are described in detail in this bug report and this
blog post.
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT
EXISTS?
By Aaron Bertrand
A pattern I see quite a bit, and wish that I didn’t, is NOT IN. Let’s say you want to find all the patients
who have never had a flu shot. Or, in AdventureWorks2012, a similar question might be, “show me all of
the customers who have never placed an order.” Expressed using NOT IN, a pattern I see all too often,
that would look something like this (note that I’m using the enlarged header and detail tables from this
script by Jonathan Kehayias (blog | @SQLPoolBoy)):
SELECT CustomerID
FROM Sales.Customer
WHERE CustomerID NOT IN
(
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged
);
When I see this pattern, I cringe. But not for performance reasons – after all, it creates a decent enough
plan in this case:
The main problem is that the results can be surprising if the target column is NULLable (SQL Server
processes this as a left anti semi join, but can’t reliably tell you if a NULL on the right side is equal to – or
not equal to – the reference on the left side). Also, optimization can behave differently if the column is
NULLable, even if it doesn’t actually contain any NULL values (Gail Shaw talked about this back in 2010).
In this case, the target column is not nullable, but I wanted to mention those potential issues with NOT
IN – I may investigate these issues more thoroughly in a future post.
TL;DR version
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival
it in terms of performance, when all other variables are the same, but all of the other methods
introduce either performance problems or other challenges.
Alternatives
OUTER APPLY
One way we can express this result is using a correlated OUTER APPLY.
SELECT c.CustomerID
FROM Sales.Customer AS c
OUTER APPLY
(
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged
WHERE CustomerID = c.CustomerID
) AS h
WHERE h.CustomerID IS NULL;
Logically, this is also a left anti semi join, but the resulting plan is missing the left anti semi join operator,
and seems to be quite a bit more expensive than the NOT IN equivalent. This is because it is no longer a
left anti semi join; it is actually processed in a different way: an outer join brings in all matching and non-
matching rows, and *then* a filter is applied to eliminate the matches:
A more typical alternative is LEFT OUTER JOIN where the right side is NULL. In this case the query would
be:
SELECT c.CustomerID
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS h
ON c.CustomerID = h.CustomerID
WHERE h.CustomerID IS NULL;
This returns the same results; however, like OUTER APPLY, it uses the same technique of joining all the
rows, and only then eliminating the matches:
You need to be careful, though, about what column you check for NULL. In this case CustomerID is the
logical choice because it is the joining column; it also happens to be indexed. I could have
picked SalesOrderID, which is the clustering key, so it is also in the index on CustomerID. But I could
have picked another column that is not in (or that later gets removed from) the index used for the join,
leading to a different plan. Or even a NULLable column, leading to incorrect (or at least unexpected)
results, since there is no way to differentiate between a row that doesn’t exist and a row that does exist
but where that column is NULL. And it may not be obvious to the reader / developer / troubleshooter
that this is the case. So I will also test these three WHERE clauses:
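For example (a sketch – Comment here simply stands in for some NULLable column in the enlarged table):
SELECT c.CustomerID
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS h
ON c.CustomerID = h.CustomerID
WHERE h.CustomerID IS NULL;      -- the joining column
-- WHERE h.SalesOrderID IS NULL; -- the clustering key, also present in the index
-- WHERE h.Comment IS NULL;      -- a NULLable column, which can return incorrect results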
In addition to needing to be careful about column selection in the filter, the other problem I have with
the LEFT OUTER JOIN form is that it is not self-documenting, in the same way that an inner join in the
“old-style” form of FROM dbo.table_a, dbo.table_b WHERE ... is not self-documenting. By that I mean it
is easy to forget the join criteria when it is pushed to the WHERE clause, or for it to get mixed in with
other filter criteria. I realize this is quite subjective, but there it is.
EXCEPT
If all we are interested in is the join column (which by definition is in both tables), we can use EXCEPT –
an alternative that doesn’t seem to come up much in these conversations (probably because – usually –
you need to extend the query in order to include columns you’re not comparing):
SELECT CustomerID
FROM Sales.Customer AS c
EXCEPT
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged;
This comes up with the exact same plan as the NOT IN variation above:
One thing to keep in mind is that EXCEPT includes an implicit DISTINCT – so if you have cases where you
want multiple rows having the same value in the “left” table, this form will eliminate those duplicates.
Not an issue in this specific case, just something to keep in mind – just like UNION versus UNION ALL.
NOT EXISTS
SELECT CustomerID
FROM Sales.Customer AS c
WHERE NOT EXISTS
(
SELECT 1
FROM Sales.SalesOrderHeaderEnlarged
WHERE CustomerID = c.CustomerID
);
(And yes, I use SELECT 1 instead of SELECT * … not for performance reasons, but simply to clarify intent:
this subquery does not return any data.)
Its performance is similar to NOT IN and EXCEPT, and it produces an identical plan, but is not prone to
the potential issues caused by NULLs or duplicates:
Performance Tests
I ran a multitude of tests, with both a cold and warm cache, to validate that my long-standing
perception about NOT EXISTS being the right choice remained true. The typical output looked like this:
I’ll take the incorrect result out of the mix when showing the average performance of 20 runs on a graph
(I only included it to demonstrate how wrong the results are), and I did execute the queries in different
order across tests to make sure that one query was not consistently benefitting from the work of a
previous query. Focusing on duration, here are the results:
If we look at duration and ignore reads, NOT EXISTS is your winner, but not by much. EXCEPT and NOT IN
aren’t far behind, but again you need to look at more than performance to determine whether these
options are valid, and test in your scenario.
Performance Results
We can immediately see how useful the index is when we look at these new results. In all but one case
(the left outer join that goes outside the index anyway), the results are clearly worse when we’ve
dropped the index:
So we can see that, while there is less noticeable impact, NOT EXISTS is still your marginal winner in
terms of duration. And in situations where the other approaches are susceptible to schema volatility, it
is your safest choice, too.
Conclusion
This was just a really long-winded way of telling you that, for the pattern of finding all rows in table A
where some condition does not exist in table B, NOT EXISTS is typically going to be your best choice. But,
as always, you need to test these patterns in your own environment, using your schema, data and
hardware, and mixed in with your own workloads.
Trimming the Transaction Log Fat
By Paul Randal
In many SQL Server workloads, especially OLTP, the database’s transaction log can be a bottleneck that
adds to the time it takes a transaction to complete. Most people assume that the I/O subsystem is the
real bottleneck, with it not being able to keep up with the amount of transaction log being generated by
the workload.
The latency of write operations to the transaction log can be monitored using
the sys.dm_io_virtual_file_stats DMV and correlated with the WRITELOG waits that are occurring on the
system. I recorded a demo video of analyzing transaction log I/O back in 2011 so I won’t repeat all of
that in this post. You can get the video here and the demo code here (suitable for running in production
right away).
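If you just want the headline number, a sketch along these lines shows the average write latency for each log file:
SELECT [database_name] = DB_NAME(vfs.database_id),
       avg_write_latency_ms = vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.[file_id] = mf.[file_id]
WHERE mf.type_desc = N'LOG';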
If the write latency is higher than you’d expect for your I/O subsystem then the I/O subsystem cannot
keep up, as is the general supposition. Does that mean that the I/O subsystem needs to be improved
though? Not necessarily.
On many client systems I’ve found that a significant proportion of log records being generated are
unnecessary, and if you can reduce the number of log records being generated, you reduce the amount
of transaction log being written to disk. This should translate into a reduction in write latency, thus
reducing transaction completion time.
There are two main causes of extraneous log records being generated: unused nonclustered indexes,
and indexes becoming fragmented.
Unused Nonclustered Indexes
Whenever a record is inserted into a table, a record must be inserted into each nonclustered index
defined on the table (with the exception of filtered indexes with appropriate filters, which I’ll ignore
from this point). This means that extra log records are generated, at least one per nonclustered index,
for each table insert. The same thing applies to deleting a record in a table – the matching records must
be deleted from all the nonclustered indexes. For an update to a table record, nonclustered index
records are only updated if the nonclustered index key column(s) or included column(s) were part of the
update.
These operations are necessary, of course, to keep each nonclustered index correct with respect to the
table, but if the nonclustered index is unused by the workload, then the operations and the log records
produced by them are unnecessary overhead. Furthermore, if these unused indexes become
fragmented (which I’ll discuss later in this post), then the regular index maintenance tasks will also
operate on them, generating even more log records (from the
index REBUILD or REORGANIZE operations) completely unnecessarily.
Unused indexes come from a variety of sources such as someone mistakenly creating an index per table
column, someone creating every index suggested by the missing index DMVs, or someone creating all
indexes suggested by the Database Tuning Advisor. It could also be that the workload characteristics
have changed and so what used to be useful indexes are no longer being used.
Wherever they came from, unused indexes should be removed to reduce their overhead. You can
determine which indexes are unused using the sys.dm_db_index_usage_stats DMV, and I recommend
you read posts by my colleagues Kimberly L. Tripp (here), and Joe Sack (here and here), as they explain
how to use the DMV correctly.
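As a starting point (a sketch only – the posts above cover the caveats), you can compare how often each index is read versus written:
SELECT [table_name] = OBJECT_NAME(i.[object_id]),
       index_name = i.name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.indexes AS i
LEFT OUTER JOIN sys.dm_db_index_usage_stats AS us
  ON us.[object_id] = i.[object_id]
 AND us.index_id = i.index_id
 AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
ORDER BY us.user_updates DESC;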
Index Fragmentation
Most people think of index fragmentation as a problem that affects queries that have to read large
amounts of data. While this is one of the problems that fragmentation can cause, fragmentation is also a
problem because of how it occurs.
Fragmentation is caused by an operation called a page split. The simplest cause of a page split is when
an index record must be inserted on a particular page (because of its key value) and the page does not
have enough free space. In this scenario, the following operations will take place:
A new index page is allocated and formatted
Some of the records from the full page are moved to the new page, thus creating free space in the required page
The new page is linked into the index structure
The new record is inserted on the required page
All of these operations generate log records, and as you might imagine, this can be significantly more
than is required to insert a new record on a page that does not require a page split. Back in 2009
I blogged an analysis of page split cost in terms of the transaction log and found some cases where a
page split generated over 40 times more transaction log than a regular insert!
The first step in reducing the extra cost is to remove unused indexes, as I described above, so that
they’re not generating page splits. The second step is to identify remaining indexes that are becoming
fragmented (and so must be suffering page splits) using the sys.dm_db_index_physical_stats DMV (or
SQL Sentry’s new Fragmentation Manager) and proactively creating free space in them using an index
fillfactor. A fillfactor instructs SQL Server to leave empty space on index pages when the index is built,
rebuilt, or reorganized so that there is space to allow new records to be inserted without requiring a
page split, hence cutting down on the extra log records generated.
Of course nothing comes for free – the trade-off when using fillfactors is that you are proactively
provisioning extra space in the indexes to prevent more log records being generated – but that’s usually
a good trade-off to make. Choosing a fillfactor is relatively easy and I blogged about that here.
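Applying a fill factor is a one-liner (the index, table, and value here are just examples):
ALTER INDEX [IX_SomeIndex] ON dbo.SomeTable
REBUILD WITH (FILLFACTOR = 90);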
Summary
Reducing the write latency of a transaction log file does not always mean moving to a faster I/O
subsystem, or segregating the file into its own portion of the I/O subsystem. With some simple analysis
of the indexes in your database, you may be able to significantly reduce the amount of transaction log
records being generated, leading to a commensurate reduction in write latency.
There are other, more subtle issues that can affect transaction log performance, and I’ll explore those in
a future post.
TRANSACTION_MUTEX and Multi-Session Transaction Access
By Joe Sack
I recently encountered high TRANSACTION_MUTEX accumulated wait time on a client system. I couldn’t
recall a case where I saw this wait type as near the top of the “high waits” list and I was curious about
what factors could push this type of overall wait time.
The Books Online definition of TRANSACTION_MUTEX is that it “occurs during synchronization of access
to a transaction by multiple batches.” Not many areas within the SQL Server engine expose this type of
functionality, so my investigation was narrowed down to the following technologies:
The deprecated sp_getbindtoken and sp_bindsession system stored procedures used to handle
bound connections
Distributed transactions
My goal was to test each technology and see if it influenced the TRANSACTION_MUTEX wait type.
The first test I performed used the deprecated sp_getbindtoken and sp_bindsession stored procedures.
The sp_getbindtoken returns a transaction identifier which can then be used by sp_bindsession to bind
multiple sessions together on the same transaction.
Before each test scenario, I made sure to clear my test SQL Server instance’s wait statistics:
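DBCC SQLPERF(N'sys.dm_os_wait_stats', CLEAR);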
sp_getbindtoken / sp_bindsession
In the first session window of SQL Server Management Studio, I executed the following code to begin a
transaction and output the bind token for enlistment by the other planned sessions:
USE Credit;
GO
DECLARE @token varchar(255);
BEGIN TRANSACTION;
EXEC sp_getbindtoken @token OUTPUT;
SELECT [bind_token] = @token;
In each of the other sessions, I enlisted in the same transaction with sp_bindsession (passing in the token returned above) and then ran an update loop:
USE Credit;
GO
EXEC sp_bindsession '<token from the first session>'; -- paste the actual token here
WHILE 1 = 1
BEGIN
UPDATE dbo.charge
SET charge_dt = SYSDATETIME();
END
After a few seconds, I cancelled each executing query. Of the three sessions, only one was able to
actually perform updates – even though the other two sessions were actively joined to the same
transaction. And when I looked at the TRANSACTION_MUTEX wait type, I could see that it did indeed increment:
SELECT [wait_type],
[waiting_tasks_count],
[wait_time_ms],
[max_wait_time_ms],
[signal_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE wait_type = 'TRANSACTION_MUTEX';
The results for this particular test were as follows:
So I see that there were two waiting tasks (the two sessions that were simultaneously trying to update
the same table via the loop). Since I hadn’t executed SET NOCOUNT ON, I was able to see that only the
first executed UPDATE loop got changes in. I tried this technique using a few different variations (for example – four overall sessions, with three waiting) – and the TRANSACTION_MUTEX incrementing showed similar behavior. I also saw the TRANSACTION_MUTEX accumulation when simultaneously updating a different table for each session – so modifications against the same object in a loop weren’t required in order to reproduce the TRANSACTION_MUTEX wait time accumulation.
Distributed transactions
My next test involved seeing if TRANSACTION_MUTEX wait time was incremented for distributed
transactions. For this test, I used two SQL Server instances and a linked server connected between the
two of them. MS DTC was running and properly configured, and I executed the following code that
performed a local DELETE and a remote DELETE via the linked server and then rolled back the changes:
USE Credit;
GO
BEGIN DISTRIBUTED TRANSACTION;
-- a local DELETE and a DELETE against the linked server were issued here
ROLLBACK TRANSACTION;
The TRANSACTION_MUTEX showed no activity on the local server:
TRANSACTION_MUTEX 0 0 0 0
However the waiting tasks count was incremented on the remote server:
TRANSACTION_MUTEX 1 0 0 0
So my expectation was confirmed – given that we had one distributed transaction with more than one session involved in some way with the same transaction.
What about the use of Multiple Active Result Sets (MARS)? Would we also expect to
see TRANSACTION_MUTEX accumulate when associated with MARS usage?
For this, I used the following C# console application code tested from Microsoft Visual Studio against my
SQL Server 2012 instance and the Credit database. The logic of what I’m actually doing isn’t very useful
(returns one row from each table), but the data readers are on the same connection and the connection
attribute MultipleActiveResultSets is set to true, so it was enough to verify if indeed MARS could
drive TRANSACTION_MUTEX accumulation as well:
string ConnString =
    @"Server=.;Database=Credit;Trusted_Connection=True;MultipleActiveResultSets=true;";
SqlConnection MARSCon = new SqlConnection(ConnString);
MARSCon.Open();
// two readers sharing the same connection (the query text here is assumed;
// each just returns one row from a table in the Credit database)
SqlCommand MARSCmd1 = new SqlCommand("SELECT TOP (1) * FROM dbo.charge;", MARSCon);
SqlCommand MARSCmd2 = new SqlCommand("SELECT TOP (1) * FROM dbo.member;", MARSCon);
SqlDataReader MARSReader1 = MARSCmd1.ExecuteReader();
SqlDataReader MARSReader2 = MARSCmd2.ExecuteReader();
MARSReader1.Read();
MARSReader2.Read();
Console.WriteLine("\t{0}", MARSReader1[0]);
Console.WriteLine("\t{0}", MARSReader2[0]);
After executing this code, I saw the following accumulation for TRANSACTION_MUTEX:
TRANSACTION_MUTEX 8 2 0 0
So as you can see, the MARS activity (however trivially implemented) caused an uptick in
the TRANSACTION_MUTEX wait type accumulation. And the connection string attribute itself doesn’t
drive this, the actual implementation does. For example, I removed the second reader implementation
and just maintained a single reader with MultipleActiveResultSets=true, and as expected, there was
no TRANSACTION_MUTEX wait time accumulation.
Conclusion
If you are seeing high TRANSACTION_MUTEX waits in your environment, I hope this post gives you some
insight into three avenues to explore - to determine both where these waits are coming from, and
whether or not they are necessary.
Trimming More Transaction Log Fat
By Paul Randal
In my previous post on streamlining the operations of the transaction log, I discussed two of the most
common causes of extra log records being generated: dead weight from unused nonclustered indexes
and page split operations (that cause index fragmentation). Assuming you’ve read that post, I
mentioned that there are more subtle problems that can be detrimental to transaction log performance,
and I’m going to cover these here.
Every so often SQL Server will flush a portion of the transaction log to disk. A log flush occurs whenever:
A transaction commit log record is generated at the end of a transaction.
A transaction abort log record is generated at the end of a transaction roll back.
60KB of log records have been generated since the previous log flush.
The smallest log flush possible is a single 512-byte log block. If all transactions in a workload are very
small (e.g. inserting a single, small table row) then there will be lots of minimally-sized log flushes
occurring. Log flushes are performed asynchronously, to allow decent transaction log throughput, but
there is a fixed limit of 32 concurrent log-flush I/Os at any one time.
There are two possible effects this may have:
1. On a slow-performing I/O subsystem, the volume of tiny transaction log writes could overwhelm
the I/O subsystem leading to high-latency writes and subsequent transaction log throughput
degradation. This situation can be identified by high-write latencies for the transaction log file in
the output of sys.dm_io_virtual_file_stats (see the demo links at the top of the previous post)
2. On a high-performing I/O subsystem, the writes may complete extremely quickly, but the limit
of 32 concurrent log-flush I/Os creates a bottleneck when trying to make the log records durable
on disk. This situation can be identified by low write latencies and a near-constant number of
outstanding transaction log writes near to 32 in the aggregated output of
sys.dm_io_pending_io_requests (see the same demo links).
In both cases, making transactions longer (which is very counter-intuitive!) can reduce the frequency of
transaction log flushes and increase performance. Additionally, in case #1, moving to a higher-
performing I/O subsystem may help – but may lead to case #2. With case #2, if the transactions cannot
be made longer, the only alternative is to split the workload over multiple databases to get around the
fixed limit of 32 concurrent log-flush I/Os.
Whenever new space is added to the transaction log it must be zero-initialized (writing out zeroes to
overwrite the previous use of that portion of the disk), no matter whether the instant file initialization
feature is enabled or not. This applies to creation, manual growth, and auto-growth of the transaction
log. While the zero initialization is taking place, log records cannot be flushed to the log, so auto-
growth during a workload that is changing data can lead to a noticeable drop in throughput, especially if
the auto-growth size is set to be large (say gigabytes, or left at the default 10%).
Auto-growth should be avoided, then, if at all possible, by allowing the transaction log to clear so there is
always free space that can be reused for new log records. Transaction log clearing (also known as
transaction log truncation, not to be confused with transaction log shrinking) is performed by
transaction log backups when using the Full or Bulk-Logged recovery modes, and by checkpoints when
using the Simple recovery mode.
Log clearing can only occur if nothing requires the log records in the section of transaction log being
cleared. One common problem that prevents log clearing is having long-running transactions. Until a
transaction commits or rolls back, all the log records from the beginning of the transaction onwards are
required in case the transaction rolls back – including all the log records from other transactions that are
interspersed with those from the long-running transaction. Long-running transactions could be because
of poor design, code that is waiting for human input, or improper use of nested transactions, for
example. All of these can be avoided once they are identified as a problem.
High-Availability Features
Database mirroring and availability groups when running asynchronously can build up a queue
of log records that have not yet been sent to the redundant database copy. These log records
must be kept around until they’re sent, delaying transaction log clearing.
Transactional replication (and also Change Data Capture) relies on a Log Reader Agent job to
periodically scan the transaction log for transactions that modify a table contained in a
replication publication. If the Log Reader Agent falls behind for any reason, or is purposefully
made to run infrequently, all the log records that have not been scanned by the job must be
kept around, delaying transaction log clearing.
When running in synchronous mode, database mirroring and availability groups can cause other
problems with the logging mechanism. Using synchronous database mirroring as an example, any
transaction that commits on the principal cannot actually return to the user or application until all log
records it generated have successfully been sent to the mirror server, adding a commit delay on the
principal. If the average transaction is long, and the delay is short, this may not be noticeable, but if
the average transaction is very short, and the delay is quite long, this can have a noticeable effect on the
workload throughput. In that case, either the performance goals of the workload need to be changed,
the high-availability technology changed to asynchronous mode, or the network bandwidth and speed
between the principal and redundant databases must be increased.
Incidentally, the same kind of issue can occur if the I/O subsystem itself is synchronously mirrored – with
a potential delay for all writes that SQL Server performs.
Summary
As you can see, transaction log performance is not just about extra transaction log records being
generated – there are many environmental factors that can have a profound effect too.
The bottom line is that transaction log health and performance are of paramount importance for
maintaining overall workload performance. In these two posts I’ve detailed the major causes of
transaction log performance problems so hopefully you’ll be able to identify and remediate any that you
have.
If you want to learn a whole lot more about transaction log operations and performance tuning, I
recommend that you check out my 7.5 hour online training course on logging, recovery, and the
transaction log, available through Pluralsight.
Generate a set or sequence without loops – part 1
By Aaron Bertrand
There are many use cases for generating a sequence of values in SQL Server. I’m not talking about a
persisted IDENTITY column (or the new SEQUENCE in SQL Server 2012), but rather a transient set to be
used only for the lifetime of a query. Or even the simplest cases – such as just appending a row number
to each row in a resultset – which might involve adding a ROW_NUMBER() function to the query (or,
better yet, in the presentation tier, which has to loop through the results row-by-row anyway).
I’m talking about slightly more complicated cases. For example, you may have a report that shows sales
by date. A typical query might be:
SELECT
OrderDate = CONVERT(DATE, OrderDate),
OrderCount = COUNT(*)
FROM dbo.Orders
GROUP BY CONVERT(DATE, OrderDate)
ORDER BY OrderDate;
The problem with this query is that, if there are no orders on a certain day, there will be no row for that
day. This can lead to confusion, misleading data, or even incorrect calculations (think daily averages) for
the downstream consumers of the data.
So there is a need to fill those gaps with the dates that are not present in the data. And sometimes
people will stuff their data into a #temp table and use a WHILE loop or a cursor to fill in the missing
dates one-by-one. I won’t show that code here because I don’t want to advocate its use, but I’ve seen it
all over the place.
Before we get too deep into dates, though, let’s first talk about numbers, since you can always use a
sequence of numbers to derive a sequence of dates.
Numbers table
I’ve long been an advocate of storing an auxiliary “numbers table” on disk (and, for that matter, a
calendar table as well).
Here is one way to generate a simple numbers table with 1,000,000 values:
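-- (a sketch; there are many ways to do this)
SELECT TOP (1000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO dbo.Numbers
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(n);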
However, many people are opposed to the auxiliary table approach. Their argument: why store all that
data on disk (and in memory) when they can generate the data on-the-fly? My counter is to be realistic
and think about what you’re optimizing; computation can be expensive, and are you sure that
calculating a range of numbers on the fly is always going to be cheaper? As far as space, the Numbers
table only takes up about 11 MB compressed, and 17 MB uncompressed. And if the table is referenced
frequently enough, it should always be in memory, making access fast.
Let’s take a look at a few examples, and some of the more common approaches used to satisfy them. I
hope we can all agree that, even at 1,000 values, we don’t want to solve these problems using a loop or
a cursor.
Numbers table
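Pulling the first 1,000 values from the table we created above is trivial (a sketch):
SELECT TOP (1000) n FROM dbo.Numbers ORDER BY n;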
spt_values
This is a table that is used by internal stored procedures for various purposes. Its use online seems to be
quite prevalent, even though it is undocumented and unsupported, it may disappear one day, and it only contains a finite, non-unique, and non-contiguous set of values. There are 2,164 unique and 2,508
total values in SQL Server 2008 R2; in 2012 there are 2,167 unique and 2,515 total. This includes
duplicates, negative values, and even if using DISTINCT, plenty of gaps once you get beyond the number
2,048. So the workaround is to use ROW_NUMBER() to generate a contiguous sequence, starting at 1,
based on the values in the table.
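That workaround looks something like this (a sketch):
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY number)
FROM master..spt_values
ORDER BY n;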
That said, for only 1,000 values, you could write a slightly simpler query to generate the same sequence:
SELECT DISTINCT n = number FROM master..[spt_values] WHERE number BETWEEN 1 AND 1000;
This leads to a simpler plan, of course, but breaks down pretty quickly (once your sequence has to be
more than 2,048 rows):
In any case, I do not recommend the use of this table; I’m including it for comparison purposes, only
because I know how much of this is out there, and how tempting it might be to just re-use code you
come across.
sys.all_objects
Another approach that has been one of my favorites over the years is to use sys.all_objects.
Like spt_values, there is no reliable way to generate a contiguous sequence directly, and we have the
same issues dealing with a finite set (just under 2,000 rows in SQL Server 2008 R2, and just over 2,000
rows in SQL Server 2012), but for 1,000 rows we can use the same ROW_NUMBER() trick. The reason I
like this approach is that (a) there is less concern that this view will disappear anytime soon, (b) the view
itself is documented and supported, and (c) it will run on any database on any version since SQL Server
2005 without having to cross database boundaries (including contained databases).
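For 1,000 rows, the query is essentially the same shape (a sketch):
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
ORDER BY n;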
Plan:
Stacked CTEs
I believe Itzik Ben-Gan deserves the ultimate credit for this approach; basically you construct a CTE with
a small set of values, then you create the Cartesian product against itself in order to generate the
number of rows you need. And again, instead of trying to generate a contiguous set as part of the
underlying query, we can just apply ROW_NUMBER() to the final result.
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e3 ORDER BY n;
Plan:
Recursive CTE
Finally, we have a recursive CTE, which uses 1 as the anchor, and adds 1 until we hit the maximum. For
safety I specify the maximum in both the WHERE clause of the recursive portion, and in
the MAXRECURSION setting. Depending on how many numbers you need, you may have to
set MAXRECURSION to 0.
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 1000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 1000);
Plan:
Performance
Of course with 1,000 values the difference in performance is negligible, but it can be useful to see how
these different options perform:
To Be Continued…
If this is your typical use case, and you won’t venture far beyond 1,000 rows, then I hope I have shown
the fastest ways to generate those numbers. If your use case is a larger number, or if you are looking for
solutions to generate sequences of dates, stay tuned. Later in this series, I will explore generating
sequences of 50,000 and 1,000,000 numbers, and of date ranges ranging from a week to a year.
Generate a set or sequence without loops – part 2
By Aaron Bertrand
In my previous post, I talked about ways to generate a sequence of contiguous numbers from 1 to 1,000.
Now I’d like to talk about the next levels of scale: generating sets of 50,000 and 1,000,000 numbers.
When starting this series, I was genuinely curious how the different approaches would scale to larger
sets of numbers. At the low end I was a little dismayed to find that my favorite approach –
using sys.all_objects – was not the most efficient method. But how would these different techniques
scale to 50,000 rows?
Numbers table
Since we have already created a Numbers table with 1,000,000 rows, this query remains virtually
identical:
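-- (a sketch)
SELECT TOP (50000) n FROM dbo.Numbers ORDER BY n;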
Plan:
spt_values
Since there are only ~2,500 rows in spt_values, we need to be a little more creative if we want to use it
as the source of our set generator. One way to simulate a larger table is to CROSS JOIN it against itself. If
we did that raw we’d end up with ~2,500 rows squared (over 6 million). Needing only 50,000 rows, we
need about 224 rows squared. So we can do this:
;WITH x AS
(
SELECT TOP (224) number FROM [master]..spt_values
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM x CROSS JOIN x AS y
ORDER BY n;
Note that this is equivalent to, but more concise than, this variation:
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM
(SELECT TOP (224) number FROM [master]..spt_values) AS x
CROSS JOIN
(SELECT TOP (224) number FROM [master]..spt_values) AS y
ORDER BY n;
sys.all_objects
Like spt_values, sys.all_objects does not quite satisfy our 50,000 row requirement on its own, so we will
need to perform a similar CROSS JOIN.
;WITH x AS
(
SELECT TOP (224) [object_id] FROM sys.all_objects
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.[object_id])
FROM x CROSS JOIN x AS y
ORDER BY n;
Plan:
Stacked CTEs
We only need to make a minor adjustment to our stacked CTEs in order to get exactly 50,000 rows:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 AS b), -- 100*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN (SELECT TOP 5 n FROM e1) AS b) -- 5*10000
SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e4 ORDER BY n;
Plan:
Recursive CTEs
An even less substantial change is required to get 50,000 rows out of our recursive CTE: change
the WHERE clause to 50,000 and change the MAXRECURSION option to zero.
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 50000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 0);
Plan:
Performance
As with the last set of tests, we’ll compare each technique, including the Numbers table with both a cold
and warm cache, and both compressed and uncompressed:
To get a better visual, let’s remove the recursive CTE, which was a total dog in this test and which skews
the results:
Runtime, in milliseconds, to generate 50,000 contiguous numbers (excluding recursive CTE)
At 1,000 rows, the difference between compressed and uncompressed was marginal, since the query
only needed to read 8 and 9 pages respectively. At 50,000 rows, the gap widens a bit: 74 pages vs. 113.
However, the overall cost of decompressing the data seems to outweigh the savings in I/O. So, at 50,000
rows, an uncompressed numbers table seems to be the most efficient method of deriving a contiguous
set – though, admittedly, the advantage is marginal.
While I can’t envision very many use cases where you’d need a contiguous set of numbers this large, I
wanted to include it for completeness, and because I did make some interesting observations at this
scale.
Numbers table
The TOP isn’t strictly necessary, but that’s only because we know that our Numbers table and our
desired output have the same number of rows. The plan is still quite similar to previous tests:
spt_values
To get a CROSS JOIN that yields 1,000,000 rows, we need to take 1,000 rows squared:
;WITH x AS
(
SELECT TOP (1000) number FROM [master]..spt_values
)
SELECT n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM x CROSS JOIN x AS y ORDER BY n;
Plan:
sys.all_objects
;WITH x AS
(
SELECT TOP (1000) [object_id] FROM sys.all_objects
)
SELECT n = ROW_NUMBER() OVER (ORDER BY x.[object_id])
FROM x CROSS JOIN x AS y ORDER BY n;
Plan:
Stacked CTEs
For the stacked CTE, we just need a slightly different combination of CROSS JOINs to get to 1,000,000
rows:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2 AS b), -- 10*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN e3 AS b) -- 1000*1000
SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e4 ORDER BY n;
Plan:
At this row size, you can see that the stacked CTE solution goes parallel. So I also ran a version
with MAXDOP 1 to get a similar plan shape as before, and to see if parallelism really helps:
Recursive CTE
The recursive CTE again has just a minor change; only the WHERE clause needs to change:
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 1000000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 0);
Plan:
Performance
While again we see the uncompressed Numbers table (at least with a warm cache) as the winner, the
difference even at this scale is not all that remarkable.
To be continued…
Now that we’ve thoroughly explored a handful of approaches to generating a sequence of numbers,
we’ll move on to dates. In the final post of this series, we’ll walk through the construction of a date
range as a set, including the use of a calendar table, and a few use cases where this can be handy.
Generate a set or sequence without loops – part 3
By Aaron Bertrand
Earlier in this series (Part 1 | Part 2) we talked about generating a series of numbers using various
techniques. While interesting, and useful in some scenarios, a more practical application is to generate a
series of contiguous dates; for example, a report that requires showing all the days of a month, even if
some days had no transactions.
In a previous post I mentioned that it is easy to derive a series of days from a series of numbers. Since
we’ve already established multiple ways to derive a series of numbers, let’s look at how the next step
looks. Let’s start very simple, and pretend we want to run a report for three days, from January 1st
through January 3rd, and include a row for every day. The old-fashioned way would be to create a
#temp table, create a loop, have a variable that holds the current day, within the loop insert a row into
the #temp table until the end of the range, and then use the #temp table to outer join to our source
data. That’s more code than I even want to present here, never mind put in production and maintain.
Starting simple
With an established sequence of numbers (regardless of the method you choose), this task becomes
much easier. For this example I can replace complex sequence generators with a very simple union,
since I only need three days. I’m going to make this set contain four rows, so that it is also easy to
demonstrate how to cut off to exactly the series you need.
First, we have a couple of variables to hold the start and end of the range we’re interested in:
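-- (values assumed from the January 1st - 3rd example above)
DECLARE @s date = '20120101', @e date = '20120103';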
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT n FROM n ORDER BY n;
-- result:
n
----
1
2
3
4
To convert that into a series of dates, we can simply apply DATEADD() from the start date:
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT DATEADD(DAY, n, @s) FROM n ORDER BY n;
-- result:
----
2012-01-02
2012-01-03
2012-01-04
2012-01-05
This still isn’t quite right, since our range starts on the 2nd instead of the 1st. So in order to use our start
date as the base, we need to convert our set from 1-based to 0-based. We can do that by subtracting 1:
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT DATEADD(DAY, n-1, @s) FROM n ORDER BY n;
-- result:
----
2012-01-01
2012-01-02
2012-01-03
2012-01-04
Almost there! We just need to limit the result from our larger series source, which we can do by feeding
the DATEDIFF, in days, between the start and end of the range, to a TOP operator – and then adding 1
(since DATEDIFF essentially reports an open-ended range).
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s) FROM n ORDER BY n;
-- result:
----
2012-01-01
2012-01-02
2012-01-03
Adding real data
Now to see how we would join against another table to derive a report, we can just take our new query and outer join it against the source data.
;WITH n(n) AS
(
SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4
),
d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM n ORDER BY n
)
SELECT
d.OrderDate,
OrderCount = COUNT(o.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeader AS o
ON o.OrderDate >= d.OrderDate
AND o.OrderDate < DATEADD(DAY, 1, d.OrderDate)
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
(Note that we can no longer say COUNT(*), since this will count the left side, which will always be 1.) The number-generating union can also be folded into a derived table inside the date CTE, which produces the same results:
;WITH d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM
(
SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4
) AS n(n) ORDER BY n
)
SELECT
d.OrderDate,
OrderCount = COUNT(o.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeader AS o
ON o.OrderDate >= d.OrderDate
AND o.OrderDate < DATEADD(DAY, 1, d.OrderDate)
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
This should make it easier to envision how you would replace the leading CTE with the generation of a
date sequence from any source you choose. We’ll go through those (with the exception of the recursive
CTE approach, which only served to skew graphs), using AdventureWorks2012, but we’ll use
the SalesOrderHeaderEnlarged table I created from this script by Jonathan Kehayias. I added an index to
help with this specific query:
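-- (a sketch; the index name is arbitrary)
CREATE INDEX IX_SalesOrderHeaderEnlarged_OrderDate
ON Sales.SalesOrderHeaderEnlarged (OrderDate);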
Also note that I’m choosing arbitrary date ranges that I know exist in the table.
Numbers table
;WITH d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM dbo.Numbers ORDER BY n
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
spt_values
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @s)
FROM (SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
ROW_NUMBER() OVER (ORDER BY Number) FROM master..spt_values) AS x(n)
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
sys.all_objects
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @s)
FROM (SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Stacked CTEs
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b),
d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
d = DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY n)-1, @s)
FROM e2
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND d.OrderDate = CONVERT(DATE, s.OrderDate)
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Now, for a year-long range, this won’t cut it, since it only produces 100 rows. For a year we’d need to
cover 366 rows (to account for potential leap years), so it would look like this:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b),
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN (SELECT TOP (37) n FROM e2) AS b),
d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
d = DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY N)-1, @s)
FROM e3
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND d.OrderDate = CONVERT(DATE, s.OrderDate)
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Calendar table
This is a new one that we didn’t talk much about in the previous two posts. If you are using date series
for a lot of queries then you should consider having both a Numbers table and a Calendar table. The
same argument holds about how much space is really required and how fast access will be when the
table is queried frequently. For example, to store 30 years of dates, it requires less than 11,000 rows
(exact number depends on how many leap years you span), and takes up a mere 200 KB. Yes, you read
that right: 200 kilobytes. (And compressed, it’s only 136 KB.)
To generate a Calendar table with 30 years of data, assuming you’ve already been convinced that having
a Numbers table is a good thing, we can do this:
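Something like the following works, assuming a dbo.Numbers table with enough rows; the start date and the index name below are arbitrary choices:
DECLARE @start date = '20000101'; -- start date assumed

SELECT TOP (DATEDIFF(DAY, @start, DATEADD(YEAR, 30, @start)))
  d = CONVERT(date, DATEADD(DAY, n - 1, @start))
INTO dbo.Calendar
FROM dbo.Numbers
ORDER BY n;

CREATE UNIQUE CLUSTERED INDEX PK_Calendar ON dbo.Calendar(d); -- index name assumed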
Now to use that Calendar table in our sales report query, we can write a much simpler query:
SELECT
OrderDate = c.d,
OrderCount = COUNT(s.SalesOrderID)
FROM dbo.Calendar AS c
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND c.d = CONVERT(DATE, s.OrderDate)
WHERE c.d >= @s AND c.d <= @e
GROUP BY c.d
ORDER BY c.d;
Performance
I created both compressed and uncompressed copies of the Numbers and Calendar tables, and tested a
one week range, a one month range, and a one year range. I also ran queries with cold cache and warm
cache, but that turned out to be largely inconsequential.
Addendum
Paul White (blog | @SQL_Kiwi) pointed out that you can coerce the Numbers table to produce a much
more efficient plan using the following query:
SELECT
OrderDate = DATEADD(DAY, n, 0),
OrderCount = COUNT(s.SalesOrderID)
FROM dbo.Numbers AS n
LEFT OUTER JOIN Sales.SalesOrderHeader AS s
ON s.OrderDate >= CONVERT(DATETIME, @s)
AND s.OrderDate < DATEADD(DAY, 1, CONVERT(DATETIME, @e))
AND DATEDIFF(DAY, 0, OrderDate) = n
WHERE
n.n >= DATEDIFF(DAY, 0, @s)
AND n.n <= DATEDIFF(DAY, 0, @e)
GROUP BY n
ORDER BY n;
At this point I’m not going to re-run all of the performance tests (exercise for the reader!), but I will
assume that it will generate better or similar timings. Still, I think a Calendar table is a useful thing to
have even if it isn’t strictly necessary.
Conclusion
The results speak for themselves. For generating a series of numbers, the Numbers table approach wins
out, but only marginally – even at 1,000,000 rows. And for a series of dates, at the lower end, you will
not see much difference between the various techniques. However, it is quite clear that as your date
range gets larger, particularly when you’re dealing with a large source table, the Calendar table really
demonstrates its worth – especially given its low memory footprint. Even with Canada’s wacky metric
system, 60 milliseconds is way better than about 10 *seconds* when it only incurred 200 KB on disk.
I hope you’ve enjoyed this little series; it’s a topic I’ve been meaning to revisit for ages.
Potential enhancements to ASPState
By Aaron Bertrand
Many people have implemented ASPState in their environment. Some people use the in-memory option
(InProc), but usually I see the database option being used. There are some potential inefficiencies here
that you might not notice on low volume sites, but that will start to affect performance as your web
volume ramps up.
Recovery Model
Make sure ASPState is set to simple recovery – this will dramatically reduce the impact to the log that
can be caused by the high volume of (transient and largely disposable) writes that are likely to go here:
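Assuming the database kept its default name, that’s simply:
ALTER DATABASE ASPState SET RECOVERY SIMPLE;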
When setting up ASPState initially, you can use the -sstype c and -d arguments to store session state in a
custom database that is already on a different drive (just like you would with tempdb). Or, if your
tempdb database is already optimized, you can use the -sstype t argument. These are explained in detail
in the Session-State Modes and ASP.NET SQL Server Registration Tool documents on MSDN.
If you’ve already installed ASPState, and you’ve determined that you would benefit from moving it to its
own (or at least a different) volume, then you can schedule or wait for a brief maintenance period and
follow these steps:
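Something along these lines (the logical file names are typical defaults – confirm yours in sys.master_files – and <new path> is whatever destination you have chosen):
USE [master];
GO
ALTER DATABASE ASPState SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
-- logical file names assumed; check sys.master_files for the actual names
ALTER DATABASE ASPState MODIFY FILE (NAME = ASPState,     FILENAME = N'<new path>\ASPState.mdf');
ALTER DATABASE ASPState MODIFY FILE (NAME = ASPState_log, FILENAME = N'<new path>\ASPState_log.ldf');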
At this point you will need to manually move the files to <new path>, and then you can bring the
database back online:
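(Again assuming the default database name:)
ALTER DATABASE ASPState SET ONLINE;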
Isolate applications
It is possible to point more than one application at the same session state database. I recommend
against this. You may want to point applications at different databases, perhaps even on different
instances, to better isolate resource usage and provide utmost flexibility for all of your web properties.
If you already have multiple applications using the same database, that’s okay, but you’ll want to keep
track of the impact each application might be having. Microsoft’s Rex Tang published a useful query to
see space consumed by each session; here is a modification that will summarize the number of sessions and
total/avg session size per application:
SELECT
a.AppName,
SessionCount = COUNT(s.SessionId),
TotalSessionSize = SUM(DATALENGTH(s.SessionItemLong)),
AvgSessionSize = AVG(DATALENGTH(s.SessionItemLong))
FROM
dbo.ASPStateTempSessions AS s
LEFT OUTER JOIN
dbo.ASPStateTempApplications AS a
ON SUBSTRING(s.SessionId, 25, 8) =
SUBSTRING(sys.fn_varbintohexstr(CONVERT(VARBINARY(8), a.AppId)), 3, 8)
GROUP BY a.AppName
ORDER BY TotalSessionSize DESC;
If you find that you have a lopsided distribution here, you can set up another ASPState database
elsewhere, and point one or more applications at that database instead.
The current code for dbo.DeleteExpiredSessions uses a cursor, replacing the single DELETE found in earlier
implementations (a change that, I think, was based largely on this post by Greg Low). The older version was essentially just:
DELETE ASPState..ASPStateTempSessions
WHERE Expires < @now
RETURN 0
GO
(And it may still be, depending on where you downloaded the source, or how long ago you installed
ASPState. There are many outdated scripts out there for creating the database, though you really should
be using aspnet_regsql.exe.)
Currently (as of .NET 4.5), the code looks like this (anyone know when Microsoft will start using semicolons?):
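The gist of it is below – a paraphrase of the general shape rather than the verbatim shipped code, and the details may differ slightly depending on your version:
DECLARE @now datetime;
SET @now = GETUTCDATE();

CREATE TABLE #tblExpiredSessions
(
    SessionId nvarchar(88) NOT NULL PRIMARY KEY
);

INSERT #tblExpiredSessions (SessionId)
    SELECT SessionId
    FROM dbo.ASPStateTempSessions WITH (READUNCOMMITTED)
    WHERE Expires < @now;

IF @@ROWCOUNT <> 0
BEGIN
    DECLARE ExpiredSessionCursor CURSOR LOCAL FORWARD_ONLY READ_ONLY
        FOR SELECT SessionId FROM #tblExpiredSessions;

    DECLARE @SessionId nvarchar(88);

    OPEN ExpiredSessionCursor;
    FETCH NEXT FROM ExpiredSessionCursor INTO @SessionId;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- one single-row delete per expired session
        DELETE dbo.ASPStateTempSessions
        WHERE SessionId = @SessionId AND Expires < @now;

        FETCH NEXT FROM ExpiredSessionCursor INTO @SessionId;
    END

    CLOSE ExpiredSessionCursor;
    DEALLOCATE ExpiredSessionCursor;
END

DROP TABLE #tblExpiredSessions;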
My idea is to have a happy medium here – don’t try to delete ALL rows in one fell swoop, but don’t play
one-by-one whack-a-mole, either. Instead, delete n rows at a time in separate transactions – reducing
the length of blocking and also minimizing the impact to the log:
ALTER PROCEDURE dbo.DeleteExpiredSessions
    @top int = 1000  -- batch size; the default here is an assumption, tune to taste
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @now datetime, @c int;
    SELECT @now = GETUTCDATE(), @c = 1;

    BEGIN TRANSACTION;

    WHILE @c <> 0
    BEGIN
        ;WITH x AS
        (
            SELECT TOP (@top) SessionId
            FROM dbo.ASPStateTempSessions
            WHERE Expires < @now
            ORDER BY SessionId
        )
        DELETE x;

        SET @c = @@ROWCOUNT;

        IF @@TRANCOUNT = 1
        BEGIN
            COMMIT TRANSACTION;
            BEGIN TRANSACTION;
        END
    END

    IF @@TRANCOUNT = 1
    BEGIN
        COMMIT TRANSACTION;
    END
END
GO
You will want to experiment with TOP depending on how busy your server is and what impact it has on
duration and locking. You may also want to consider implementing snapshot isolation – this will force
some impact to tempdb but may reduce or eliminate blocking seen from the app.
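If you do go that route, it’s one or both of these, depending on which flavor you want (switching on RCSI needs exclusive access to the database, hence the ROLLBACK IMMEDIATE):
ALTER DATABASE ASPState SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE ASPState SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;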
Also, by default, the job ASPState_Job_DeleteExpiredSessions runs every minute. Consider dialing that
back a bit – reduce the schedule to maybe every 5 minutes (and again, a lot of this will come down to
how busy your applications are and testing the impact of the change). And on the flip side, make sure it
is enabled – otherwise your sessions table will grow and grow unchecked.
Every time a page is loaded (and, if the web app hasn’t been created correctly, possibly multiple times
per page load), the stored procedure dbo.TempResetTimeout is called, ensuring that the timeout for
that particular session is extended as long as they continue to generate activity. On a busy web site, this
can cause a very high volume of update activity against the table dbo.ASPStateTempSessions. Here is the
current code for dbo.TempResetTimeout:
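Stripped to its essentials, it is just a single-row update along these lines (a paraphrase, not the verbatim shipped code):
ALTER PROCEDURE dbo.TempResetTimeout
    @id nvarchar(88) -- the shipped script declares this using an alias type
AS
    UPDATE dbo.ASPStateTempSessions
        SET Expires = DATEADD(MINUTE, [Timeout], GETUTCDATE())
        WHERE SessionId = @id;
RETURN 0;
GO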
Now, imagine you have a web site with 500 or 5,000 users, and they are all madly clicking from page to
page. This is probably one of the most frequently called operations in any ASPState implementation, and
while the table is keyed on SessionId – so the impact of any individual statement should be minimal – in
aggregate this can be substantially wasteful, including on the log. If your session timeout is 30 minutes
and you update the timeout for a session every 10 seconds because of the nature of the web app, what
is the point of doing it again 10 seconds later? As long as that session is asynchronously updated at some
point before the 30 minutes are up, there is no net difference to the user or the application. So I thought
that you could implement a more scalable way to “touch” sessions to update their timeout values.
One idea I had was to implement a Service Broker queue so that the application does not have to wait
on the actual write to happen – it calls the dbo.TempResetTimeout stored procedure, and then the
activation procedure takes over asynchronously. But this still leads to a lot more updates (and log
activity) than is truly necessary.
A better idea, IMHO, is to implement a queue table that you only insert to, plus a background process that
runs on a schedule (such that it completes a full cycle in some time shorter than the timeout) and updates
the timeout only once for any session it sees, no matter how many times that session *tried* to update its
timeout within that span. So a simple table might look like this:
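Something like this (the names and the default are mine):
CREATE TABLE dbo.SessionStack
(
    SessionId nvarchar(88)  NOT NULL,
    EventTime smalldatetime NOT NULL DEFAULT GETUTCDATE()
);

CREATE CLUSTERED INDEX IX_SessionStack_EventTime
    ON dbo.SessionStack (EventTime);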
And then we would change the stock procedure to push session activity onto this stack instead of
touching the sessions table directly:
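One possible shape (again with a simplified parameter type):
ALTER PROCEDURE dbo.TempResetTimeout
    @id nvarchar(88)
AS
BEGIN
    SET NOCOUNT ON;

    -- just record that the session was touched; the background process
    -- will roll these up into dbo.ASPStateTempSessions later
    INSERT dbo.SessionStack (SessionId) VALUES (@id);
END
GO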
The clustered index is on the smalldatetime column to prevent page splits (at the potential cost of a hot
page), since the event time for a session touch will always be monotonically increasing.
Then we’ll need a background process to periodically summarize new rows in dbo.SessionStack and
update dbo.ASPStateTempSessions accordingly.
-- assumes #s has already been populated with (and the rows removed from)
-- dbo.SessionStack, e.g. via DELETE dbo.SessionStack OUTPUT deleted.* INTO #s;
BEGIN TRY
    ;WITH summary(SessionId, Expires) AS
    (
        SELECT SessionId, MAX(EventTime)
        FROM #s GROUP BY SessionId
    )
    UPDATE src
        SET Expires = DATEADD(MINUTE, src.[Timeout], summary.Expires)
    FROM dbo.ASPStateTempSessions AS src
    INNER JOIN summary
        ON src.SessionId = summary.SessionId;
END TRY
BEGIN CATCH
    -- surface the error; add retry/logging logic as you see fit
    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
    RAISERROR(@msg, 11, 1);
END CATCH
You may want to add more transactional control and error handling around this – I’m just presenting an
off-the-cuff idea, and you can get as crazy around this as you want. :-)
You might think you would want to add a non-clustered index on dbo.SessionStack(SessionId, EventTime
DESC) to facilitate the background process, but I think it is better to focus even the most minuscule
performance gains on the process users wait for (every page load) as opposed to one that they don’t
wait for (the background process). So I’d rather pay the cost of a potential scan during the background
process than pay for additional index maintenance during every single insert. As with the clustered index
on the #temp table, there is a lot of “it depends” here, so you may want to play with these options to
see where your tolerance works best.
Unless the frequency of the two operations needs to be drastically different, I would schedule this as part
of the ASPState_Job_DeleteExpiredSessions job (and consider renaming that job if so) so that these two
processes don’t trample on each other.
One final idea here, if you find you need to scale out even more, is to create
multiple SessionStack tables, where each one is responsible for a subset of sessions (say, hashed on the
first character of the SessionId). Then you can process each table in turn, and keep those transactions
that much smaller. In fact you could do something similar for the delete job as well. If done correctly,
you should be able to put these in individual jobs and run them concurrently, since – in theory – the
DML should be affecting completely different sets of pages.
Conclusion
Those are my ideas so far. I’d love to hear about your experiences with ASPState: What kind of scale
have you achieved? What kind of bottlenecks have you observed? What have you done to mitigate
them?
Selecting a Processor for SQL Server 2012
By Glenn Berry
Since Microsoft revamped the licensing model for SQL Server 2012, it is especially important to do some
thoughtful analysis before you decide exactly which processor to use for a database server that will be
running SQL Server 2012. The move to core-based licensing for SQL Server 2012 Enterprise Edition
means that a careless decision about precisely what processor you will be running on can cost both a
great deal of money and a great deal of performance and scalability. The same issue exists (to a lesser
extent) with SQL Server 2012 Standard Edition.
The difference in SQL Server 2012 licensing costs between a good processor choice and a bad processor
choice can more than pay for your hardware and your storage subsystem in many cases. Given this, how
do you go about making an optimal processor choice for SQL Server 2012 Enterprise Edition? Normally,
the first step would be to decide whether you wanted to use an Intel Xeon processor or an AMD
Opteron processor (SQL Server 2012 does not support the Intel Itanium processor family).
Unfortunately, it is very difficult to make a good technical or business case to select an AMD Opteron
processor for use with SQL Server 2012 Enterprise Edition. Since the introduction of the Intel Nehalem
architecture (Intel Xeon 3500, 5500, 6500 and 7500 series) in 2008-2010, AMD has simply not been able
to compete with Intel when it comes to single-threaded processor performance. AMD does not have the
financial or technical resources to compete with Intel in terms of performance or power efficiency. With
each new processor family release from Intel, AMD has been falling further behind. This is not a good
thing for the I.T. industry, since a lack of viable competition from AMD will inevitably slow down the
pace of innovation from Intel.
Since an Intel Xeon processor seems to be the only viable choice for SQL Server 2012, the next step is
deciding which one of the many available Xeon families and models would be the best choice for your
intended SQL Server 2012 workload. Intel has different Xeon product families for different socket count
servers. For single-socket workstations and servers they have the Intel Xeon E3 family. For two-socket
workstations and servers they have the Intel Xeon E5 family. Finally, for two-socket, four-socket, and
eight-socket (or more) servers, they have the Intel Xeon E7 family.
Since this article is discussing SQL Server 2012 Enterprise Edition, we will ignore the single-socket Intel
Xeon E3 family, since Intel Xeon E3 processors are limited to using 32GB of DDR3 RAM. There are some
niche scenarios where it might make very good sense to use an Intel Xeon E3-1290V2 processor in a
single-socket server with 32GB of RAM in combination with SQL Server 2012 Enterprise Edition. Perhaps
you have a relatively small database where you need the absolute fastest single-threaded performance
and you also need specific Enterprise Edition features such as SQL Server AlwaysOn Availability Groups.
The more common choice is between a two-socket server and a four-socket (or more) server. You are
going to have to assess your workload size and intensity and decide whether it can run on a smaller, but
faster two-socket server, or whether you will have to make the jump to a larger, slower four-socket or
larger server. This decision is directly affected by your required total CPU capacity, total physical RAM
capacity, and your total required I/O capacity (which is related to the number and type of PCI-E
expansion slots in the server).
One common misconception is that bigger Intel-based servers (in terms of socket counts) are faster
servers. This is simply not true, for a number of reasons. The sales volume and market share of two-
socket servers is much higher than it is for four-socket and larger servers. There is also less engineering
and validation work required for two-socket capable Intel processors compared to four-socket capable
Intel processors. Because of these factors, Intel releases new processor architectures more frequently
and earlier for lower socket count servers. Currently, Intel’s single-socket E3 family is using the 22nm Ivy
Bridge and the two-socket E5 family is using the 32nm Sandy Bridge-EP, while the Intel E7 family is using the
older 32nm Westmere-EX microarchitecture.
Another reason is that you do not get linear scaling as you increase your socket count, even with Non-
uniform memory access (NUMA) architecture processors, which scale much better than the older
symmetrical multiprocessing (SMP) architecture. This means that a four-socket server will not have
twice the processor performance or capacity as a two-socket server with the same model processor.
This can be confirmed by comparing the TPC-E OLTP benchmark results of two-socket systems with Intel
Xeon E7-2870 processors to four-socket systems with Intel Xeon E7-4870 processors to eight-socket
systems with Intel Xeon E7-8870 processors. Even though these are essentially the same processor with
the same individual performance characteristics, the TPC-E benchmark score does not double as you
double the socket count, as you can see in Table 1.

Table 1: TPC-E results for the Intel Xeon E7 family

Processor      Socket Count   TPC-E Score   Total Core Count   TPC-E Score/Core
Xeon E7-2870   2              1560.70       20                 78.04
When I think about comparing single-socket to two-socket, to four and eight-socket processors, I like to
use a car and truck analogy. A single-socket server is like a Formula-1 race car, being extremely fast but
having very little cargo capacity. A two-socket server is like a Tesla Model S, being very fast and having
pretty decent cargo capacity. A four-socket server is like a large SUV, being slower but having more
cargo capacity than a Tesla Model S. Finally, an eight-socket server is like a Mack truck, able to haul a
huge load at a much slower rate than an SUV.
Table 2: TPC-E results for the Intel Xeon E5 family

Processor      Socket Count   TPC-E Score   Total Core Count   TPC-E Score/Core
Xeon E5-2690   2              1881.76       16                 117.61
Xeon E5-4650   4              2651.27       32                 82.85
Comparing Table 1 to Table 2, we can see that the Intel Xeon E5 family does quite a bit better on TPC-E
than the Intel Xeon E7 family does, which is no surprise, since we are comparing the newer Sandy
Bridge-EP to the older Westmere-EX microarchitecture. From a performance perspective, the two-
socket Xeon E5-2690 does much better than the two-socket Xeon E7-2870. In my opinion, you really
should not be using the two-socket Xeon E7-2870 for SQL Server 2012 because of its lower single-
threaded performance and higher physical core counts (which means a higher SQL Server 2012 licensing
cost).
Currently, my favorite Intel server processor is the Intel Xeon E5-2690. It will give you excellent single-
threaded performance and relatively affordable SQL Server 2012 licensing costs. If you need to step up
to a four-socket server, then I would choose an Intel Xeon E5-4650 processor instead of using an Intel
Xeon E7-4870 processor, since you will get better single-threaded performance and lower SQL Server
2012 license costs. Using TPC-E benchmark scores is an excellent way to compare the performance and
SQL Server 2012 license efficiency of different processor families.
A Look At DBCC CHECKCONSTRAINTS and I/O
By Erin Stellato
A common element used in database design is the constraint. Constraints come in a variety of flavors
(e.g. default, unique) and enforce the integrity of the column(s) on which they exist. When implemented
well, constraints are a powerful component in the design of a database because they prevent data that
doesn’t meet set criteria from getting into a database. However, constraints can be violated using
options such as WITH NOCHECK and IGNORE_CONSTRAINTS. In addition, when using
the REPAIR_ALLOW_DATA_LOSS option with any DBCC CHECK command to repair database corruption,
constraints are not considered.
Consequently, it is possible to have invalid data in the database – either data that doesn’t adhere to a
constraint, or data that no longer maintains the expected primary-foreign key relationship. SQL Server
includes the DBCC CHECKCONSTRAINTS statement to find data that violates constraints. After any repair
option executes, run DBCC CHECKCONSTRAINTS for the entire database to ensure there are no issues,
and there may be times when it’s appropriate to run CHECKCONSTRAINTS for a select constraint or a
table. Maintaining data integrity is critical, and while it’s not typical to run DBCC CHECKCONSTRAINTS on
a scheduled basis to find invalid data, when you do need to run it, it’s a good idea to understand the
performance impact it may create.
DBCC CHECKCONSTRAINTS can execute for a single constraint, a table, or the entire database. Like other
check commands, it can take substantial time to complete and will consume system resources,
particularly for larger databases. Unlike other check commands, CHECKCONSTRAINTS does not use a
database snapshot.
With Extended Events we can examine resource usage when we execute DBCC CHECKCONSTRAINTS for
the table. To better show the impact, I ran the Create Enlarged AdventureWorks Tables.sql script from
Jonathan Kehayias (blog | @SQLPoolBoy) to create larger tables. Jonathan’s script only creates the
indexes for the tables, so the statements below are necessary to add a few selected constraints:
USE [AdventureWorks2012];
GO
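The statements would look something like this, based on the constraint names and definitions discussed below:
ALTER TABLE [Sales].[SalesOrderDetailEnlarged]
    ADD CONSTRAINT [CK_SalesOrderDetailEnlarged_OrderQty]
    CHECK ([OrderQty] > 0);
GO

ALTER TABLE [Sales].[SalesOrderDetailEnlarged]
    ADD CONSTRAINT [FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID]
    FOREIGN KEY ([SalesOrderID])
    REFERENCES [Sales].[SalesOrderHeaderEnlarged] ([SalesOrderID]);
GO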
Once the constraints exist, we can compare the resource usage for DBCC CHECKCONSTRAINTS for a
single constraint, a table, and the entire database using Extended Events. First we’ll create a session
that simply captures sp_statement_completed events, includes the sql_text action, and sends the output
to the ring_buffer:
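A sketch of such a session (the session name is arbitrary):
CREATE EVENT SESSION [XE_CheckConstraints] ON SERVER
ADD EVENT sqlserver.sp_statement_completed
(
    ACTION (sqlserver.sql_text)
)
ADD TARGET package0.ring_buffer;
GO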
Next we’ll start the session and run each of the DBCC CHECKCONSTRAINTS commands, then output the
ring buffer to a temp table to manipulate. Note that DBCC DROPCLEANBUFFERS executes before each
check so that each starts from cold cache, keeping a level playing field.
USE [AdventureWorks2012];
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS ('[Sales].[CK_SalesOrderDetailEnlarged_OrderQty]') WITH
NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS
('[Sales].[FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID]
') WITH NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS ('[Sales].[SalesOrderDetailEnlarged]') WITH NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS, NO_INFOMSGS;
GO
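The step that pulls the ring_buffer contents into an XML variable is along these lines (using whatever session name you chose above):
DECLARE @target_data xml;

SELECT @target_data = CONVERT(xml, t.target_data)
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.[address]
WHERE s.name = N'XE_CheckConstraints' -- substitute your session name
    AND t.target_name = N'ring_buffer';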
SELECT
n.value('(@name)[1]', 'varchar(50)') AS event_name,
DATEADD(HOUR ,DATEDIFF(HOUR, SYSUTCDATETIME(),
SYSDATETIME()),n.value('(@timestamp)[1]', 'datetime2')) AS [timestamp],
n.value('(data[@name="duration"]/value)[1]', 'bigint') AS duration,
n.value('(data[@name="physical_reads"]/value)[1]', 'bigint') AS physical_reads,
n.value('(data[@name="logical_reads"]/value)[1]', 'bigint') AS logical_reads,
n.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') AS sql_text,
n.value('(data[@name="statement"]/value)[1]', 'varchar(max)') AS [statement]
INTO #EventData
FROM
@target_data.nodes('RingBufferTarget/event[@name=''sp_statement_completed'']')
AS q(n);
GO
Parsing the ring_buffer into a temp table may take some additional time (about 20 seconds on my
machine), but repeated querying of the data is faster from a temp table than via the ring_buffer. If we
look at the output we see there are several statements executed for each DBCC CHECKCONSTRAINTS:
SELECT *
FROM #EventData
WHERE [sql_text] LIKE 'DBCC%';
Using Extended Events to dig into the inner workings of CHECKCONSTRAINTS is an interesting task, but
what we’re really interested in here is resource consumption – specifically I/O. We can aggregate
the physical_reads for each check command to compare the I/O:
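That aggregation is a simple GROUP BY over the temp table, for example:
SELECT
    [sql_text],
    total_physical_reads = SUM(physical_reads),
    total_logical_reads  = SUM(logical_reads),
    total_duration       = SUM(duration)
FROM #EventData
WHERE [sql_text] LIKE 'DBCC CHECKCONSTRAINTS%'
GROUP BY [sql_text]
ORDER BY total_physical_reads DESC;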
In order to check a constraint, SQL Server has to read through the data to find any rows that might
violate the constraint. The definition of the CK_SalesOrderDetailEnlarged_OrderQty constraint
is [OrderQty] > 0. The foreign key
constraint, FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID, establishes a
relationship on SalesOrderID between
the [Sales].[SalesOrderHeaderEnlarged] and [Sales].[SalesOrderDetailEnlarged] tables. Intuitively it might
seem as though checking the foreign key constraint would require more I/O, as SQL Server must read
data from two tables. However, [SalesOrderID] exists in the leaf level of
the IX_SalesOrderHeaderEnlarged_SalesPersonID nonclustered index on
the [Sales].[SalesOrderHeaderEnlarged] table, and in the IX_SalesOrderDetailEnlarged_ProductID index
on the [Sales].[SalesOrderDetailEnlarged] table. As such, SQL Server scans those two indexes to
compare the [SalesOrderID] values between the two tables. This requires just over 19,000 reads. In the
case of the CK_SalesOrderDetailEnlarged_OrderQty constraint, the [OrderQty] column is not included in
any index, so a full scan of the clustered index occurs, which requires over 72,000 reads.
When all the constraints for a table are checked, the I/O requirements are higher than if a single
constraint is checked, and they increase again when the entire database is checked. In the example
above, the [Sales].[SalesOrderHeaderEnlarged] and [Sales].[SalesOrderDetailEnlarged] tables are
disproportionally larger than other tables in the database. This is not uncommon in real-world
scenarios; very often databases have several large tables which comprise a large portion of the
database. When running CHECKCONSTRAINTS for these tables, be aware of the potential resource
consumption required for the check. Run checks during off hours when possible to minimize user
impact. In the event that checks must run during normal business hours, understanding what
constraints exist, and what indexes exist to support validation, can help gauge the effect of the
check. You can execute checks in a test or development environment first to understand the
performance impact, but variations may then exist based on hardware, comparable data, etc. And
finally, remember that any time you run a check command that includes
the REPAIR_ALLOW_DATA_LOSS option, follow the repair with DBCC CHECKCONSTRAINTS. Database
repair does not take any constraints into account as corruption is fixed, so in addition to potentially
losing data, you may end up with data that violates one or more constraints in your database.
Transaction Log Configuration Issues
By Paul Randal
In my last two posts I discussed ways to reduce the amount of transaction log being
generated and how to ensure the transaction log can always clear properly. In this post I want to
continue with the transaction log performance theme and discuss some transaction log configuration
issues that can cause problems.
The transaction log is split up into chunks called virtual log files (VLFs) so the log management system
can easily keep track of which portions of the transaction log are available for reuse. There is a formula
for how many VLFs you get when you create your transaction log, manually grow it, or it auto-grows:
Less than 64MB: 4 VLFs, each roughly 1/4 of the total size
64MB up to 1GB: 8 VLFs, each roughly 1/8 of the total size
More than 1GB: 16 VLFs, each roughly 1/16 of the total size
For example, if you create a transaction log to be 8GB you’ll get 16 VLFs where each is roughly 512MB. If
you then grow the log by another 4GB, you’ll get an additional 16 VLFs with each being roughly 256MB,
for a total of 32 VLFs.
A general best practice is to set the log auto-growth to something other than the default 10%, so that
you can control the pause that’s required when zero-initializing new transaction log space. Let’s say you
create a 256MB transaction log and set the auto-growth to 32MB, and then the log grows to a steady-
state size of 16GB. Given the formula above, this will result in your transaction log having more than
2,000 VLFs.
This many VLFs will likely result in some performance issues for operations that process the transaction
log (e.g. crash recovery, log clearing, log backups, transactional replication, database restores). This
situation is called having VLF fragmentation. Generally any number of VLFs more than a thousand or so
is going to be problematic and needs to be addressed (the most I’ve ever heard of is 1.54 million VLFs in
a transaction log that was more than 1TB in size!).
The way to tell how many VLFs you have is to use the undocumented (and completely safe) DBCC
LOGINFO command. The number of rows of output is the number of VLFs in your transaction log. If you
think you have too many, the way to reduce them is:
1. Wait for (or force) the transaction log to clear, so the active portion is at the start of the file
2. Shrink the log file to as small a size as possible with DBCC SHRINKFILE
3. Repeat steps 1 and 2 until the log won't shrink any further
4. Manually grow the log to the size it should be, in up to 8GB steps so each VLF is no larger than
about 0.5GB (the shrink and regrow steps are sketched below)
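A minimal sketch of the shrink-and-regrow cycle (the database and logical file names are placeholders; grow in multiple steps if the target size is over 8GB):
USE [YourDatabase];
GO
-- after the log has cleared (e.g. following a log backup in FULL recovery)
DBCC SHRINKFILE (YourDatabase_log, TRUNCATEONLY);
GO
-- grow back in steps of 8GB or less so each VLF is roughly 0.5GB
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = YourDatabase_log, SIZE = 8192MB);
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = YourDatabase_log, SIZE = 16384MB);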
You can read more about VLF fragmentation issues and the process to fix them at:
Tempdb
Tempdb needs to have its transaction log configured just like any other database, and it may grow just
like any other database. But it also has some insidious behavior that can cause you problems.
When a SQL Server instance restarts for any reason, tempdb’s data and log files will revert to the size
they were most recently set to. This is different from all other databases, which remain at their current
size after an instance restart.
This behavior means that if the tempdb transaction log has grown to accommodate the normal
workload, you must perform an ALTER DATABASE to set the log file size; otherwise its size will drop after
an instance restart and it will have to grow again. Every time a log file grows or auto-grows, the new
space must be zero-initialized and logging activity pauses while that is done. So if you do not manage
your tempdb log file size correctly, you’ll pay a performance penalty as it grows after each instance
restart.
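For example, to have a 4GB tempdb log persist across restarts (the size is just an illustration; templog is the default logical file name):
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 4096MB);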
Quite often I hear people saying how they usually shrink a database’s transaction log after it grows from
a regular operation (e.g. a weekly data import). This is not a good thing to do.
Just as I explained above, whenever the transaction log grows or auto-grows, there’s a pause while the
new portion of the log file is zero-initialized. If you’re regularly shrinking the transaction log because it
grows to size X, that means you’re regularly suffering performance problems as the transaction log auto-
grows back to size X again.
If your transaction log keeps growing to size X, leave it alone! Proactively set it to size X, managing your
VLFs as I explained above, and accept size X as the size that’s required for your normal workload. A
larger transaction log is not a problem.
There is no performance gain from creating multiple log files for a database. Adding a second log file
may be necessary, however, if the existing log file runs out of space and you’re unwilling to force the
transaction log to clear by switching to the simple recovery model and performing a checkpoint (as this
breaks the log backup chain).
I’m often asked whether there is any pressing reason to remove the second log file or whether it’s ok to
leave it in place. The answer is that you should remove it as soon as you can.
Although the second log file doesn’t cause performance problems for your workload, it does affect
disaster recovery. If your database is destroyed for some reason, you’ll need to restore it from scratch.
The first phase of any restore sequence is to create the data and log files if they don’t exist.
You can make the data file creation almost instantaneous by enabling instant file initialization which
skips the zero-initialization but that doesn’t apply to log files. This means that the restore has to create
all log files that existed when the full backup was taken (or are created during the period of time
covered by a transaction log backup) and zero-initialize them. If you created a second log file and forgot to
drop it again, zero-initializing it during a disaster recovery operation is going to add to the total
downtime. This isn’t a workload performance problem, but it affects the availability of the server as a
whole.
The final issue in my list is actually a bug in SQL Server. If you use a database snapshot as a way to
quickly recover back to a known point in time without having to restore backups (known as reverting
from the snapshot) then you can save a lot of time. However, there is a big downside.
When the database reverts from the database snapshot, the transaction log is recreated with two
0.25MB VLFs. This means you will have to grow your transaction log back to its optimal size and number
of VLFs (or it will auto-grow itself), with all the zero-initialization and workload pauses I’ve discussed
previously. Clearly not the desired behavior.
Summary
As you can see from this post and my previous two posts, there are many things that can lead to poor
transaction log performance, which then has a knock-on effect on the performance of your overall
workload.
If you can take care of all these things, you’ll have healthy transaction logs. But it doesn’t end there as
you need to make sure you’re monitoring your transaction logs so you’re alerted for things like auto-
growth and excessive read and write I/O latencies. I’ll cover how to do that in a future post.
The Halloween Problem – Part 1
By Paul White
Much has been written over the years about understanding and optimizing SELECT queries, but rather
less about data modification. This series of posts looks at an issue that is specific
to INSERT, UPDATE, DELETE and MERGE queries – the Halloween Problem.
The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was
supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that
the query kept giving 10% raises until everyone earned at least $25,000. We will see later on in this
series that the underlying issue also applies to INSERT, DELETE and MERGE queries, but for this first
entry, it will be helpful to examine the UPDATE problem in a bit of detail.
Background
The SQL language provides a way for users to specify database changes using an UPDATE statement, but
the syntax says nothing about how the database engine should perform the changes. On the other hand,
the SQL standard does specify that the result of an UPDATE must be the same as if it had been executed
in three separate and non-overlapping phases:
1. A read-only search determines the records to be changed and the new column values
2. The changes are applied to the affected records, using the information collected in the first phase to locate each one
3. The database integrity constraints are checked against the final state of the table
Implementing these three phases literally in a database engine would produce correct results, but
performance might not be very good. The intermediate results at each stage will require system
memory, reducing the number of queries the system can execute concurrently. The memory required
might also exceed that which is available, requiring at least part of the update set to be written out to
disk storage and read back again later on. Last but not least, each row in the table needs to be touched
multiple times under this execution model.
An alternative strategy is to process the UPDATE a row at a time. This has the advantage of only
touching each row once, and generally does not require memory for storage (though some operations,
like a full sort, must process the full input set before producing the first row of output). This iterative
model is the one used by the SQL Server query execution engine.
The challenge for the query optimizer is to find an iterative (row by row) execution plan that satisfies
the UPDATE semantics required by the SQL standard, while retaining the performance and concurrency
benefits of pipelined execution.
CREATE TABLE dbo.Employees
(
Name nvarchar(50) NOT NULL,
Salary money NOT NULL
);
INSERT dbo.Employees
(Name, Salary)
VALUES
('Brown', $22000),
('Smith', $21000),
('Jones', $25000);
UPDATE e
SET Salary = Salary * $1.1
FROM dbo.Employees AS e
WHERE Salary < $25000;
The read-only first phase finds all the records that meet the WHERE clause predicate, and saves enough
information for the second phase to do its work. In practice, this means recording a unique identifier for
each qualifying row (the clustered index keys or heap row identifier) and the new salary value. Once
phase one is complete, the whole set of update information is passed to the second phase, which
locates each record to be updated using the unique identifier, and changes the salary to the new value.
The third phase then checks that no database integrity constraints are violated by the final state of the
table.
Iterative strategy
This approach reads one row at a time from the source table. If the row satisfies the WHERE clause
predicate, the salary increase is applied. This process repeats until all rows have been processed from
the source. A sample execution plan using this model is shown below:
As is usual for SQL Server’s demand-driven pipeline, execution starts at the leftmost operator –
the UPDATE in this case. It requests a row from the Table Update, which asks for a row from the
Compute Scalar, and down the chain to the Table Scan:
The Table Scan operator reads rows one at a time from the storage engine, until it finds one that
satisfies the Salary predicate. The output list in the graphic above shows the Table Scan operator
returning a row identifier and the current value of the Salary column for this row. A single row
containing references to these two pieces of information is passed up to the Compute Scalar:
The Compute Scalar defines an expression that applies the salary raise to the current row. It returns a
row containing references to the row identifier and the modified salary to the Table Update, which
invokes the storage engine to perform the data modification. This iterative process continues until the
Table Scan runs out of rows. The same basic process is followed if the table has a clustered index:
The main difference is that the clustered index key(s) and uniquifier (if present) are used as the row
identifier instead of a heap RID.
The Problem
Changing from the logical three-phase operation defined in the SQL standard to the physical iterative
execution model has introduced a number of subtle changes, only one of which we are going to look at
today. A problem can occur in our running example if there is a nonclustered index on the Salary
column, which the query optimizer decides to use to find rows that qualify (Salary < $25,000):
The row-by-row execution model can now produce incorrect results, or even get into an infinite loop.
Consider an (imaginary) iterative execution plan that seeks the Salary index, returning a row at a time to
the Compute Scalar, and ultimately on to the Update operator:
There are a couple of extra Compute Scalars in this plan due to an optimization that skips nonclustered
index maintenance if the Salary value has not changed (only possible for a zero salary in this case).
Ignoring that, the important feature of this plan is that we now have an ordered partial index scan
passing a row at a time to an operator that modifies the same index (the green highlight in the Plan
Explorer graphic above makes it clear the Clustered Index Update operator maintains both the base
table and the nonclustered index).
Anyway, the problem is that by processing one row at a time, the Update can move the current row
ahead of the scan position used by the Index Seek to locate rows to change. Working through the
example should make that statement a bit clearer:
The nonclustered index is keyed, and sorted ascending, on the salary value. The index also contains a
pointer to the parent row in the base table (either a heap RID or the clustered index keys plus uniquifier
if necessary). To make the example easier to follow, assume the base table now has a unique clustered
index on the Name column, so the nonclustered index contents at the start of update processing are:
The first row returned by the Index Seek is the $21,000 salary for Smith. This value is updated to
$23,100 in the base table and the nonclustered index by the Clustered Index Update operator. The
nonclustered index now contains:
The next row returned by the Index Seek will be the $22,000 entry for Brown which is updated to
$24,200:
Now the Index Seek finds the $23,100 value for Smith, which is updated again, to $25,410. This process
continues until all employees have a salary of at least $25,000 – which is not a correct result for the
given UPDATE query. The same effect in other circumstances can lead to a runaway update which only
terminates when the server runs out of log space or an overflow error occurs (it could occur in this case
if someone had a zero salary). This is the Halloween Problem as it applies to updates.
Eagle-eyed readers will have noticed that the estimated cost percentages in the imaginary Index Seek
plan did not add up to 100%. This is not a problem with Plan Explorer – I deliberately removed a key
operator from the plan:
The query optimizer recognizes that this pipelined update plan is vulnerable to the Halloween Problem,
and introduces an Eager Table Spool to prevent it from occurring. There is no hint or trace flag to
prevent inclusion of the spool in this execution plan because it is required for correctness.
As its name suggests, the spool eagerly consumes all rows from its child operator (the Index Seek)
before returning a row to its parent Compute Scalar. The effect of this is to introduce complete phase
separation – all qualifying rows are read and saved into temporary storage before any updates are
performed.
This brings us closer to the three-phase logical semantic of the SQL standard, though please note plan
execution is still fundamentally iterative, with operators to the right of the spool forming the read
cursor, and operators to the left forming the write cursor. The contents of the spool are still read and
processed row by row (it is not passed en masse as the comparison with the SQL standard might
otherwise lead you to believe).
The drawbacks of the phase separation are the same as mentioned earlier. The Table Spool
consumes tempdb space (pages in the buffer pool) and may require physical reads and writes to disk
under memory pressure. The query optimizer assigns an estimated cost to the spool (subject to all the
usual caveats about estimations) and will choose between plans that require protection against the
Halloween Problem versus those that don’t on the basis of estimated cost as normal. Naturally, the
optimizer may incorrectly choose between the options for any of the normal reasons.
In this case, the trade-off is between the efficiency increase by seeking directly to qualifying records
(those with a salary < $25,000) versus the estimated cost of the spool required to avoid the Halloween
Problem. An alternative plan (in this specific case) is a full scan of the clustered index (or heap). This
strategy does not require the same Halloween Protection because the keys of the clustered index are
not modified:
Because the index keys are stable, rows cannot move position in the index between iterations, avoiding
the Halloween Problem in the present case. Depending on the runtime cost of the Clustered Index Scan
compared with the Index Seek plus Eager Table Spool combination seen previously, one plan may
execute faster than the other. Another consideration is that the plan with Halloween Protection will
acquire more locks than the fully pipelined plan, and the locks will be held for longer.
Final Thoughts
Understanding the Halloween Problem and the effects it can have on data modification query plans will
help you analyse data-changing execution plans, and can offer opportunities to avoid the costs and side-
effects of unnecessary protection where an alternative is available.
There are several forms of the Halloween Problem, not all of which are caused by reading and writing to
the keys of a common index. The Halloween Problem is also not limited to UPDATE queries. The query
optimizer has more tricks up its sleeve to avoid the Halloween Problem aside from brute-force phase
separation using an Eager Table Spool. These points (and more) will be explored in the next instalments
of this series.
The Halloween Problem – Part 2
By Paul White
In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap
briefly, the problem was that an index used to locate records to update had its keys modified by the
update operation itself (another good reason to use included columns in an index rather than extending
the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and
writing sides of the execution plan to avoid the problem. In this post, we will see how the same
underlying issue can affect INSERT and DELETE statements.
Insert Statements
Now we know a bit about the conditions that require Halloween Protection, it is quite easy to create
an INSERT example that involves reading from and writing to the keys of the same index structure. The
simplest example is duplicating rows in a table (where adding new rows inevitably modifies the keys of
the clustered index):
CREATE TABLE dbo.Demo
(
SomeKey integer NOT NULL, -- column type assumed
CONSTRAINT PK_Demo PRIMARY KEY (SomeKey)
);
INSERT dbo.Demo
SELECT SomeKey FROM dbo.Demo;
The problem is that newly inserted rows might be encountered by the reading side of the execution
plan, potentially resulting in a loop that adds rows forever (or at least until some resource limit is
reached). The query optimizer recognizes this risk, and adds an Eager Table Spool to provide the
necessary phase separation:
You probably don’t often write queries to duplicate every row in a table, but you likely do write queries
where the target table for an INSERT also appears somewhere in the SELECT clause. One example is
adding rows from a staging table that do not already exist in the destination:
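A minimal Staging table sufficient to run the example (the column definition is assumed):
CREATE TABLE dbo.Staging
(
SomeKey integer NOT NULL
);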
-- Sample data
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(1234);
-- Test query
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Demo AS d
WHERE d.SomeKey = s.SomeKey
);
The execution plan is:
The problem in this case is subtly different, though still an example of the same core issue. There is no
value ‘1234’ in the target Demo table, but the Staging table contains two such entries. Without phase
separation, the first ‘1234’ value encountered would be inserted successfully, but the second check
would find that the value ‘1234’ now exists and would not attempt to insert it again. The statement as a
whole would complete successfully.
This might produce a desirable outcome in this particular case (and might even seem intuitively correct)
but it is not a correct implementation. The SQL standard requires that data modification queries execute
as if the three phases of reading, writing and checking constraints occur completely separately (see part
one).
Searching for all rows to insert as a single operation, we should select both ‘1234’ rows from the Staging
table, since this value does not exist in the target yet. The execution plan should therefore try to
insert both ‘1234’ rows from the Staging table, resulting in a primary key violation:
The phase separation provided by the Table Spool ensures that all checks for existence are completed
before any changes are made to the target table. If you run the query in SQL Server with the sample
data above, you will receive the (correct) error message.
Halloween Protection is required for INSERT statements where the target table is also referenced in the
SELECT clause.
Delete Statements
We might expect the Halloween Problem not to apply to DELETE statements, since it shouldn’t really
matter if we try to delete a row multiple times. We can modify our staging table example to remove rows
from the Demo table that do not exist in Staging:
DELETE dbo.Demo
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Staging AS s
WHERE s.SomeKey = dbo.Demo.SomeKey
);
This test seems to validate our intuition because there is no Table Spool in the execution plan:
This type of DELETE does not require phase separation because each row has a unique identifier (an RID
if the table is a heap, clustered index key(s) and possibly a uniquifier otherwise). This unique row locator
is a stable key – there is no mechanism by which it can change during execution of this plan, so the
Halloween Problem does not arise.
Nevertheless, there is at least one case where a DELETE requires Halloween protection: when the plan
references a row in the table other than the one which is being deleted. This requires a self-join,
commonly found when hierarchical relationships are modelled. A simplified example is shown below:
CREATE TABLE dbo.Test
(
pk char(1) NOT NULL, ref char(1) NULL, -- column types assumed
CONSTRAINT PK_Test PRIMARY KEY (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', 'A'),
('C', 'B'),
('D', 'C');
There really ought to be a same-table foreign key reference defined here, but let’s ignore that design
failing for a moment – the structure and data are nonetheless valid (and it is sadly quite common to find
foreign keys omitted in the real world). Anyway, the task at hand is to delete any row where
the ref column points to a non-existent pk value. The natural DELETE query matching this requirement
is:
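One way to write it (a sketch):
DELETE dbo.Test
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Test AS t2
WHERE t2.pk = dbo.Test.ref
);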
If the execution engine starts with the row where pk = B, it would find no matching row (ref = A and
there is no row where pk = A). If execution then moves on to the row where pk = C, it would also be
deleted because we just removed row B pointed to by its ref column. The end result would be that
iterative processing in this order would delete all the rows from the table, which is clearly incorrect.
On the other hand, if the execution engine processed the row with pk = D first, it would find a matching
row (ref = C). Assuming execution continued in reverse pk order, the only row deleted from the table
would be the one where pk = B. This is the correct result (remember the query should execute as if the
read, write, and validation phases had occurred sequentially and without overlaps).
As an aside, we can see another example of phase separation if we add a same-table foreign key
constraint to the previous example:
CREATE TABLE dbo.Test -- recreated; drop the previous dbo.Test first, column types assumed
(
pk char(1) NOT NULL, ref char(1) NULL,
CONSTRAINT PK_Test
PRIMARY KEY (pk),
CONSTRAINT FK_ref_pk
FOREIGN KEY (ref)
REFERENCES dbo.Test (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', NULL),
('C', 'B'),
('D', 'C');
The insert itself does not require Halloween protection since the plan does not read from the same table
(the data source is an in-memory virtual table represented by the Constant Scan operator). The SQL
standard does however require that phase 3 (constraint checking) occurs after the writing phase is
complete. For this reason, a phase separation Eager Table Spool is added to the plan after the Clustered
Index Insert, and just before each row is checked to make sure the foreign key constraint remains valid.
If you are starting to think that translating a set-based declarative SQL modification query to a robust
iterative physical execution plan is a tricky business, you are beginning to see why update processing (of
which Halloween Protection is but a very small part) is the most complex part of the Query Processor.
DELETE statements require Halloween Protection where a self-join of the target table is present.
Summary
Halloween Protection can be an expensive (but necessary) feature in execution plans that change data
(where ‘change’ includes all SQL syntax that adds, changes or removes rows). Halloween Protection is
required for UPDATE plans where a common index structure’s keys are both read and modified,
for INSERT plans where the target table is referenced on the reading side of the plan, and
for DELETE plans where a self-join on the target table is performed.
The next part in this series will cover some special Halloween Problem optimizations that apply only
to MERGE statements.
The Halloween Problem – Part 3
By Paul White
The MERGE statement (introduced in SQL Server 2008) allows us to perform a mixture
of INSERT, UPDATE, and DELETE operations using a single statement. The Halloween Protection issues
for MERGE are mostly a combination of the requirements of the individual operations, but there are
some important differences and a couple of interesting optimizations that apply only to MERGE.
We start by looking again at the Demo and Staging example from part two:
CREATE TABLE dbo.Demo -- recreated from part two; column type assumed
(
SomeKey integer NOT NULL,
CONSTRAINT PK_Demo PRIMARY KEY (SomeKey)
);
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(1234);
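-- The one difference from part two, mentioned below: a nonclustered index on the
-- Staging table has been added. Something like this (the index name is arbitrary):
CREATE NONCLUSTERED INDEX IX_Staging_SomeKey
    ON dbo.Staging (SomeKey);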
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Demo AS d
WHERE d.SomeKey = s.SomeKey
);
As you may recall, this example was used to show that an INSERT requires Halloween Protection when
the insert target table is also referenced in the SELECT part of the query (the EXISTS clause in this case).
The correct behaviour for the INSERT statement above is to try to add both 1234 values, and to
consequently fail with a PRIMARY KEY violation. Without phase separation, the INSERT would incorrectly
add one value, completing without an error being thrown.
The code above has one difference from that used in part two; a nonclustered index on the Staging table
has been added. The INSERT execution plan still requires Halloween Protection though:
Now try the same logical insert expressed using MERGE syntax:
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED BY TARGET THEN
INSERT (SomeKey)
VALUES (s.SomeKey);
In case you are not familiar with the syntax, the logic there is to compare rows in the Staging and Demo
tables on the SomeKey value, and if no matching row is found in the target (Demo) table, we insert a
new row. This has exactly the same semantics as the previous INSERT...WHERE NOT EXISTS code, of
course. The execution plan is quite different however:
Notice the lack of an Eager Table Spool in this plan. Despite that, the query still produces the correct
error message. It seems SQL Server has found a way to execute the MERGE plan iteratively while
respecting the logical phase separation required by the SQL standard.
The hole-filling optimization
In the right circumstances, the SQL Server optimizer can recognize that the MERGE statement is hole-
filling, which is just another way of saying that the statement only adds rows where there is an existing
gap in the target table’s key.
For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET clause
must exactly match the ON part of the USING clause. Also, the target table must have a unique key (a
requirement satisfied by the PRIMARY KEY in the present case). Where these requirements are met,
the MERGE statement does not require protection from the Halloween Problem.
Of course, the MERGE statement is logically no more or less hole-filling than the
original INSERT...WHERE NOT EXISTS syntax. The difference is that the optimizer has complete control
over implementing the MERGE statement, whereas the INSERT syntax would require it to reason about
the wider semantics of the query. A human can easily see that the INSERT is also hole-filling, but the
optimizer does not think about things in the same way we do.
To illustrate the exact matching requirement I mentioned, consider the following query syntax, which
does not benefit from the hole-filling optimization. The result is full Halloween Protection provided by
an Eager Table Spool:
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED THEN
INSERT (SomeKey)
VALUES (s.SomeKey * 1);
The only difference there is the multiplication by one in the VALUES clause – something which does not
change the logic of the query, but which is enough to prevent the hole-filling optimization being applied.
In the previous example, the optimizer chose to join the tables using a Merge join. The hole-filling
optimization can also be applied where a Nested Loops join is chosen, but this requires an extra
uniqueness guarantee on the source table, and an index seek on the inner side of the join. To see this in
action, we can clear out the existing staging data, add uniqueness to the nonclustered index, and try
the MERGE again:
-- Remove existing duplicate rows
TRUNCATE TABLE dbo.Staging;
-- Sample data
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(5678);
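-- Add uniqueness to the nonclustered index (index name matches the earlier sketch
-- and is an assumption):
DROP INDEX IX_Staging_SomeKey ON dbo.Staging;
CREATE UNIQUE NONCLUSTERED INDEX IX_Staging_SomeKey
    ON dbo.Staging (SomeKey);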
-- Hole-filling merge
MERGE dbo.Demo AS d
USING dbo.Staging AS s ON
s.SomeKey = d.SomeKey
WHEN NOT MATCHED THEN
INSERT (SomeKey)
VALUES (s.SomeKey);
The resulting execution plan again uses the hole-filling optimization to avoid Halloween Protection,
using a nested loops join and an inner-side seek into the target table:
Where the hole-filling optimization applies, the engine may also apply a further optimization. It can
remember the current index position while reading the target table (processing one row at a time,
remember) and reuse that information when performing the insert, instead of seeking down the b-tree
to find the insert location. The reasoning is that the current read position is very likely to be on the same
page where the new row should be inserted. Checking that the row does in fact belong on this page is
very fast, since it involves checking only the lowest and highest keys currently stored there.
The combination of eliminating the Eager Table Spool and saving an index navigation per row can
provide a significant benefit in OLTP workloads, provided the execution plan is retrieved from cache. The
compilation cost for MERGE statements is rather higher than for INSERT, UPDATE and DELETE, so plan
reuse is an important consideration. It is also helpful to ensure that pages have sufficient free space to
accommodate new rows, avoiding page splits. This is typically achieved through normal index
maintenance and the assignment of a suitable FILLFACTOR.
I mention OLTP workloads, which typically feature a large number of relatively small changes, because
the MERGE optimizations may not be a good choice where a large number of rows are processed per
statement. Other optimizations like minimally-logged INSERTs cannot currently be combined with hole-
filling. As always, the performance characteristics should be benchmarked to ensure the expected
benefits are realized.
The hole-filling optimization for MERGE inserts may be combined with updates and deletes using
additional MERGE clauses; each data-changing operation is assessed separately for the Halloween
Problem.
The final optimization we will look at can be applied where the MERGE statement contains update and
delete operations as well as a hole-filling insert, and the target table has a unique clustered index. The
following example shows a common MERGE pattern where unmatched rows are inserted, and matching
rows are updated or deleted depending on an additional condition:
CREATE TABLE #T
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_T
PRIMARY KEY (col1)
);
CREATE TABLE #S
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_S
PRIMARY KEY (col1)
);
INSERT #T
(col1, col2)
VALUES
(1, 50),
(3, 90);
INSERT #S
(col1, col2)
VALUES
(1, 40),
(2, 80),
(3, 90);
The MERGE statement required to make all the required changes is remarkably compact:
MERGE #T AS t
USING #S AS s ON t.col1 = s.col1
WHEN NOT MATCHED THEN INSERT VALUES (s.col1, s.col2)
WHEN MATCHED AND t.col2 - s.col2 = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET t.col2 -= s.col2;
No Halloween Protection, no join between the source and target tables, and it’s not often you will see a
Clustered Index Insert operator followed by a Clustered Index Merge to the same table. This is another
optimization targeted at OLTP workloads with high plan reuse and suitable indexing.
The idea is to read a row from the source table and immediately try to insert it into the target. If a key
violation results, the error is suppressed, the Insert operator outputs the conflicting row it found, and
that row is then processed for an update or delete operation using the Merge plan operator as normal.
If the original insert succeeds (without a key violation) processing continues with the next row from the
source (the Merge operator only processes updates and deletes). This optimization primarily
benefits MERGE queries where most source rows result in an insert. Again, careful benchmarking is
required to ensure performance is better than using separate statements.
Summary
The MERGE statement provides several unique optimization opportunities. In the right circumstances, it
can avoid the need to add explicit Halloween Protection compared with an equivalent INSERT operation,
or perhaps even a combination of INSERT, UPDATE, and DELETE statements. Additional MERGE-specific
optimizations can avoid the index b-tree traversal that is usually needed to locate the insert position for
a new row, and may also avoid the need to join the source and target tables completely.
In the final part of this series, we will look at how the query optimizer reasons about the need for
Halloween protection, and identify some more tricks it can employ to avoid the need to add Eager Table
Spools to execution plans that change data.
The Halloween Problem – Part 4
By Paul White
The Halloween Problem can have a number of important effects on execution plans. In this final part of
the series, we look at the tricks the optimizer can employ to avoid the Halloween Problem when
compiling plans for queries that add, change or delete data.
Background
Over the years, a number of approaches have been tried to avoid the Halloween Problem. One early
technique was to simply avoid building any execution plans that involved reading from and writing to
keys of the same index. This was not very successful from a performance point of view, not least
because it often meant scanning the base table instead of using a selective nonclustered index to locate
the rows to change.
A second approach was to completely separate the reading and writing phases of an update query, by
first locating all rows that qualify for the change, storing them somewhere, and only then starting to
perform the changes. In SQL Server, this full phase separation is achieved by placing the now-familiar
Eager Table Spool on the input side of the update operator:
The spool reads all rows from its input and stores them in a hidden tempdb work table. The pages of this
work table may remain in memory, or they might require physical disk space if the set of rows is large,
or if the server is under memory pressure.
Full phase separation can be less than ideal because we generally want to run as much of the plan as
possible as a pipeline, where each row is fully processed before moving on to the next. Pipelining has
many advantages including avoiding the need for temporary storage, and only touching each row once.
SQL Server goes much further than the two techniques described so far, though it does of course include
both as options. The SQL Server query optimizer detects queries that require Halloween Protection,
determines how much protection is required, and uses cost-based analysis to find the cheapest method
of providing that protection.
The easiest way to understand this aspect of the Halloween Problem is to look at some examples. In the
following sections, the task is to add a range of numbers to an existing table – but only numbers that do
not already exist:
CREATE TABLE dbo.Test
(
    pk integer NOT NULL,
    CONSTRAINT PK_Test
    PRIMARY KEY CLUSTERED (pk)
);
5 Rows
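The INSERT for this first test adds the numbers 1 to 5 that are not already present. A sketch of that query, reconstructed from the SELECT portion shown later in this article (and assuming the dbo.Numbers helper table used there):
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
    Num.n BETWEEN 1 AND 5
    AND NOT EXISTS
    (
        SELECT 1
        FROM dbo.Test AS t
        WHERE t.pk = Num.n
    );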
Since this query reads from and writes to the keys of the same index on the Test table, the execution
plan requires Halloween Protection. In this case, the optimizer implements full phase separation with an
Eager Table Spool:
50 Rows
With five rows now in the Test table, we run the same query again, changing the WHERE clause to
process the numbers from 1 to 50 inclusive:
This plan provides correct protection against the Halloween Problem, but it does not feature an Eager
Table Spool. The optimizer recognizes that the Hash Match join operator is blocking on its build input; all
rows are read into a hash table before the operator starts the matching process using rows from the
probe input. As a consequence, this plan naturally provides phase separation (for the Test table only)
without the need for a spool.
The optimizer chose a Hash Match join plan over the Nested Loops join seen in the 5-row plan for cost-
based reasons. The 50-row Hash Match plan has a total estimated cost of 0.0347345 units. We can force
the Nested Loops plan used previously with a hint to see why the optimizer did not choose nested loops:
This plan has an estimated cost of 0.0379063 units including the spool, a bit more than the Hash Match
plan.
500 Rows
With 50 rows now in the Test table, we further increase the range of numbers to 500:
This time, the optimizer chooses a Merge Join, and again there is no Eager Table Spool. The Sort
operator provides the necessary phase separation in this plan. It fully consumes its input before
returning the first row (the sort cannot know which row sorts first until all rows have been seen). The
optimizer decided that sorting 50 rows from the Test table would be cheaper than eager-
spooling 450 rows just before the update operator.
The Sort plus Merge Join plan has an estimated cost of 0.0362708 units. The Hash Match and Nested
Loops plan alternatives come out at 0.0385677 units and 0.112433 units respectively.
If you have been running these examples for yourself, you might have noticed something odd about that
last example, particularly if you looked at the Plan Explorer tool tips for the Test table Seek and the Sort:
The Seek produces an ordered stream of pk values, so what is the point of sorting on the same column
immediately afterward? To answer that (very reasonable) question, we start by looking at just
the SELECT portion of the INSERT query:
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
Num.n BETWEEN 1 AND 500
AND NOT EXISTS
(
SELECT 1
FROM dbo.Test AS t
WHERE t.pk = Num.n
)
ORDER BY
Num.n;
This query produces the execution plan below (with or without the ORDER BY I added to address certain
technical objections you might have):
Notice the lack of a Sort operator. So why did the INSERT plan include a Sort? Simply to avoid the
Halloween Problem. The optimizer considered that performing a redundant sort (with its built-in phase
separation) was the cheapest way to execute the query and guarantee correct results. Clever.
The SQL Server optimizer has specific features that allow it to reason about the level of Halloween
Protection (HP) required at each point in the query plan, and the detailed effect each operator has.
These extra features are incorporated into the same property framework the optimizer uses to keep
track of hundreds of other important bits of information during its search activities.
Each operator has a required HP property and a delivered HP property. The required property indicates
the level of HP needed at that point in the tree for correct results. The delivered property reflects the
HP provided by the current operator and the cumulative HP effects provided by its subtree.
The optimizer contains logic to determine how each physical operator (for example, a Compute Scalar)
affects the HP level. By exploring a wide range of plan alternatives and rejecting plans where the
delivered HP is less than the required HP at the update operator, the optimizer has a flexible way to find
correct, efficient plans that do not always require an Eager Table Spool.
We saw the optimizer add a redundant sort for Halloween Protection in the previous Merge Join
example. How can we be sure this is more efficient than a simple Eager Table Spool? And how can we
know which features of an update plan are only there for Halloween Protection?
Both questions can be answered (in a test environment, naturally) using undocumented trace flag 8692,
which forces the optimizer to use an Eager Table Spool for Halloween Protection. Recall that the Merge
Join plan with the redundant sort had an estimated cost of 0.0362708 magic optimizer units. We can
compare that to the Eager Table Spool alternative by recompiling the query with trace flag 8692
enabled:
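One way to enable the flag for a single statement (in a test environment only) is the QUERYTRACEON query hint. A sketch, re-using the 500-row INSERT pattern with the assumed dbo.Numbers and dbo.Test tables:
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
    Num.n BETWEEN 1 AND 500
    AND NOT EXISTS
    (
        SELECT 1
        FROM dbo.Test AS t
        WHERE t.pk = Num.n
    )
OPTION (QUERYTRACEON 8692);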
The Eager Spool plan has an estimated cost of 0.0378719 units (up from 0.0362708 with the redundant
sort). The cost differences shown here are not very significant due to the trivial nature of the task and
the small size of the rows. Real-world update queries with complex trees and larger row counts often
produce plans that are much more efficient thanks to the SQL Server optimizer’s ability to think deeply
about Halloween Protection.
Positioning a blocking operator optimally within a plan is not the only strategy open to the optimizer to
minimize the cost of providing protection against the Halloween Problem. It can also reason about the
range of values being processed, as the following example demonstrates:
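The #Test table is assumed here to be a temporary table with a clustered primary key and a nonclustered index on some_value, populated with a few sample rows; the names and values in this minimal sketch are illustrative:
CREATE TABLE #Test
(
    pk integer IDENTITY NOT NULL PRIMARY KEY,  -- clustered by default
    some_value integer NULL
);

CREATE NONCLUSTERED INDEX nc_some_value ON #Test (some_value);

INSERT #Test (some_value)
VALUES (5), (5), (7), (10), (20);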
UPDATE #Test
SET some_value = 10
WHERE some_value = 5;
The execution plan shows no need for Halloween Protection, despite the fact we are reading from and
updating the keys of a common index:
The optimizer can see that changing ‘some_value’ from 5 to 10 could never cause an updated row to be
seen a second time by the Index Seek (which is only looking for rows where some_value is 5). This
reasoning is only possible where literal values are used in the query, or where the query
specifies OPTION (RECOMPILE), allowing the optimizer to sniff the values of the parameters for a one-off
execution plan.
Even with literal values in the query, the optimizer may be prevented from applying this logic if the
database option FORCED PARAMETERIZATION is ON. In that case, the literal values in the query are
replaced by parameters, and the optimizer can no longer be sure that Halloween Protection is not
required (or will not be required when the plan is reused with different parameter values):
In case you are wondering what happens if FORCED PARAMETERIZATION is enabled and the query
specifies OPTION (RECOMPILE), the answer is that the optimizer compiles a plan for the sniffed values,
and so can apply the optimization. As always with OPTION (RECOMPILE), the specific-value query plan is
not cached for reuse.
Top
This last example shows how the Top operator can remove the need for Halloween Protection:
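A reconstruction of the query, consistent with the description that follows (a constant TOP (1), updating some_value through the nonclustered index):
UPDATE TOP (1) #Test
SET some_value = 10
WHERE some_value = 5;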
No protection is required because we are only updating one row. The updated value cannot be
encountered by the Index Seek, because the processing pipeline stops as soon as the first row is
updated. Again, this optimization can only be applied if a constant literal value is used in the TOP, or if a
variable returning the value ‘1’ is sniffed using OPTION (RECOMPILE).
If we change the TOP (1) in the query to a TOP (2), the optimizer chooses a Clustered Index Scan instead
of the Index Seek:
We are not updating the keys of the clustered index, so this plan does not require Halloween Protection.
Forcing the use of the nonclustered index with a hint in the TOP (2) query makes the cost of the
protection apparent:
The optimizer estimated the Clustered Index Scan would be cheaper than this plan (with its extra
Halloween Protection).
There are a couple of other points I want to make about Halloween Protection that have not found a
natural place in the series before now. The first is the question of Halloween Protection when a row-
versioning isolation level is in use.
Row Versioning
SQL Server provides two isolation levels, READ COMMITTED SNAPSHOT and SNAPSHOT ISOLATION, that
use a version store in tempdb to provide a statement- or transaction-level consistent view of the
database. SQL Server could avoid Halloween Protection completely under these isolation levels, since
the version store can provide data unaffected by any changes the currently executing statement might
have made so far. This idea is currently not implemented in a released version of SQL Server, though
Microsoft has filed a patent describing how this would work, so perhaps a future version will incorporate
this technology.
If you are familiar with the internals of heap structures, you might be wondering if a particular
Halloween Problem might occur when forwarded records are generated in a heap table. In case this is
new to you, a heap record will be forwarded if an existing row is updated such that it no longer fits on
the original data page. The engine leaves behind a forwarding stub, and moves the expanded record to
another page.
A problem could occur if a plan containing a heap scan updates a record such that it is forwarded. The
heap scan might encounter the row again when the scan position reaches the page with the forwarded
record. In SQL Server, this issue is avoided because the Storage Engine guarantees to always follow
forwarding pointers immediately. If the scan encounters a record that has been forwarded, it ignores it.
With this safeguard in place, the query optimizer does not have to worry about this scenario.
There are very few occasions when using a T-SQL scalar function is a good idea, but if you must use one
you should be aware of an important effect it can have regarding Halloween Protection. Unless a scalar
function is declared with the SCHEMABINDING option, SQL Server assumes the function accesses tables.
To illustrate, consider the simple T-SQL scalar function below:
CREATE FUNCTION dbo.ReturnInput
(
@value integer
)
RETURNS integer
AS
BEGIN
RETURN @value;
END;
This function does not access any tables; in fact it does nothing except return the parameter value
passed to it. Now look at the following INSERT query:
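Assume @T here is a simple table variable, declared along these lines:
DECLARE @T table (ProductID integer NOT NULL);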
INSERT @T (ProductID)
SELECT p.ProductID
FROM AdventureWorks2012.Production.Product AS p;
The execution plan is exactly as we would expect, with no Halloween Protection needed. Now see what happens when the query calls the scalar function:
INSERT @T (ProductID)
SELECT dbo.ReturnInput(p.ProductID)
FROM AdventureWorks2012.Production.Product AS p;
The execution plan now includes an Eager Table Spool for Halloween Protection. SQL Server assumes
the function accesses data, which might include reading from the Product table again. As you may recall,
an INSERT plan that contains a reference to the target table on the reading side of the plan requires full
Halloween Protection, and as far as the optimizer knows, that might be the case here.
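Schema binding can be added by recreating the function with the WITH SCHEMABINDING option – a sketch, restating the definition shown earlier:
ALTER FUNCTION dbo.ReturnInput
(
    @value integer
)
RETURNS integer
WITH SCHEMABINDING
AS
BEGIN
    RETURN @value;
END;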
Adding the SCHEMABINDING option to the function definition means SQL Server examines the body of
the function to determine which tables it accesses. It finds no such access, and so does not add any
Halloween Protection:
INSERT @T (ProductID)
SELECT dbo.ReturnInput(p.ProductID)
FROM AdventureWorks2012.Production.Product AS p;
This issue with T-SQL scalar functions affects all update queries – INSERT, UPDATE, DELETE, and MERGE.
Knowing when you are hitting this problem is made more difficult because unnecessary Halloween
Protection will not always show up as an extra Eager Table Spool, and scalar function calls may be
hidden in views or computed column definitions, for example.
Break large delete operations into chunks
By Aaron Bertrand
Far too often I see folks complaining about how their transaction log took over their hard disk. Many
times it turns out that they were performing a massive data operation, such as deleting or archiving
data, in one large transaction.
I wanted to run some tests to show the impact, on both duration and the transaction log, of performing
the same data operation in chunks versus a single transaction. I created a database and populated it
with a largish table (SalesOrderDetailEnlarged, from this AdventureWorks enlarging script from Jonathan
Kehayias (blog | @SQLPoolBoy)). This table has 4.85 million rows and has a reserved space usage of 711
MB (478 MB in data and 233 MB in indexes).
After populating the table, I backed up the database, backed up the log, and ran a DBCC
SHRINKFILE (don’t shoot me) so that the impact on the log file could be established from a baseline
(knowing full well that these operations *will* cause the transaction log to grow).
I purposely used a mechanical disk as opposed to an SSD. While we may start seeing a more popular
trend of moving to SSD, it hasn’t happened yet on a large enough scale; in many cases it’s still too
cost-prohibitive to do so for large storage devices.
The Tests
So next I had to determine what I wanted to test for greatest impact. Since I was involved in a discussion
with a co-worker just yesterday about deleting data in chunks, I chose deletes. And since the clustered
index on this table is on SalesOrderID, I didn’t want to use that – that would be too easy (and would very
rarely match the way deletes are handled in real life). So I decided instead to go after a series
of ProductID values, which would ensure I would hit a large number of pages and require a lot of
logging. I determined which products to delete by the following query:
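A sketch of that query, reconstructed from the output below – it simply picks the three ProductID values with the most rows:
SELECT TOP (3)
    ProductID,
    ProductCount = COUNT(*)
FROM dbo.SalesOrderDetailEnlarged
GROUP BY ProductID
ORDER BY ProductCount DESC;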
ProductID ProductCount
--------- ------------
870 187520
712 135280
873 134160
This would delete 456,960 rows (about 10% of the table), spread across many orders. This isn’t a
realistic modification in this context, since it will mess with pre-calculated order totals, and you can’t
really remove a product from an order that has already shipped. But using a database we all know and
love, it is analogous to, say, deleting a user from a forum site, and also deleting all of their messages – a
real scenario I have seen in the wild.
I know this is going to require a massive scan and take a huge toll on the transaction log. That’s kind of
the point. :-)
While that was running, I put together a different script that will perform this delete in chunks: 25,000,
50,000, 75,000 and 100,000 rows at a time. Each chunk will be committed in its own transaction (so that
if you need to stop the script, you can, and all previous chunks will already be committed, instead of
having to start over), and depending on the recovery model, will be followed by either a CHECKPOINT or
a BACKUP LOG to minimize the ongoing impact on the transaction log. (I will also test without these
operations.) It will look something like this (I am not going to bother with error handling and other
niceties for this test, but you shouldn’t be as cavalier):
DECLARE @r INT;
SET @r = 1;
WHILE @r > 0
BEGIN
  BEGIN TRANSACTION;

  DELETE TOP (100000) -- chunk size varied per test: 25,000 / 50,000 / 75,000 / 100,000
    dbo.SalesOrderDetailEnlarged
    WHERE ProductID IN (712, 870, 873);

  SET @r = @@ROWCOUNT;
  COMMIT TRANSACTION;
  -- CHECKPOINT; -- if simple
  -- BACKUP LOG ... -- if full
END
Of course, after each test, I would restore the original backup of the database WITH REPLACE,
RECOVERY, set the recovery model accordingly, and run the next test.
The Results
The outcome of the first test was not very surprising at all. To perform the delete in a single statement,
it took 42 seconds in full, and 43 seconds in simple. In both cases this grew the log to 579 MB.
The next set of tests had a couple of surprises for me. One is that, while these chunking methods did
significantly reduce impact to the log file, only a couple of combinations came close in duration, and
none were actually faster. Another is that, in general, chunking in full recovery (without performing a log
backup between steps) performed better than equivalent operations in simple recovery. Here are the
results for duration and log impact:
Again, in general, while log size is significantly reduced, duration is increased. You can use this type of
scale to determine whether it’s more important to reduce the impact to disk space or to minimize the
amount of time spent. For a small hit in duration (and after all, most of these processes are run in the
background), you can have a significant savings (up to 94%, in these tests) in log space usage.
Note that I did not try any of these tests with compression enabled (possibly a future test!), and I left
the log autogrow settings at the terrible defaults (10%) – partly out of laziness and partly because many
environments out there have retained this awful setting.
Next I thought I should test this on a slightly larger database. So I made another database and created a
new, larger copy of dbo.SalesOrderDetailEnlarged. Roughly ten times larger, in fact. This time instead of
a primary key on SalesOrderID, SalesOrderDetailID, I just made it a clustered index (to allow for
duplicates), and populated it this way:
SELECT c.*
INTO dbo.SalesOrderDetailReallyReallyEnlarged
FROM AdventureWorks2012.Sales.SalesOrderDetailEnlarged AS c
CROSS JOIN
(
SELECT TOP 10 Number FROM master..spt_values
) AS x;
Due to disk space limitations, I had to move off of my laptop’s VM for this test (and chose a 40-core box,
with 128 GB of RAM, that just happened to be sitting around quasi-idle :-)), and still it was not a quick
process by any means. Population of the table and creation of the indexes took ~24 minutes.
The table has 48.5 million rows and takes up 7.9 GB on disk (4.9 GB in data, and 2.9 GB in index).
This time, my query to determine a good set of candidate ProductID values to delete (the same approach as before, run against the larger table) returned:
ProductID ProductCount
--------- ------------
870 1828320
712 1318980
873 1308060
So we are going to delete 4,455,360 rows, a little under 10% of the table. Following a similar pattern to
the above test, we’re going to delete all in one shot, then in chunks of 500,000, 250,000 and 100,000
rows.
Results:
Duration, in seconds, of various delete operations removing 4.5MM rows
Log size, in MB, after various delete operations removing 4.5MM rows
So again, we see a significant reduction in log file size (over 97% in cases with the smallest chunk size of
100K); however, at this scale, we see a few cases where we also accomplish the delete in less time, even
with all the autogrow events that must have occurred. That sounds an awful lot like win-win to me!
Now, I was curious how these different deletes would compare with a log file pre-sized to accommodate
such large operations. Sticking with our larger database, I pre-expanded the log file to 6 GB, backed it
up, then ran the tests again:
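(Pre-expanding a log file can be done with ALTER DATABASE; a sketch, with hypothetical database and log file names:)
ALTER DATABASE BiggerDemo
MODIFY FILE (NAME = N'BiggerDemo_log', SIZE = 6144MB);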
Results, comparing duration with a fixed log file to the case where the file had to autogrow
continuously:
Duration, in seconds, of various delete operations removing 4.5MM rows, comparing fixed log size and
autogrow
Again we see that the methods that chunk deletes into batches, and do *not* perform a log backup or a
checkpoint after each step, rival the equivalent single operation in terms of duration. In fact, we see that
most actually complete in less overall time, with the added bonus that other transactions will be able to
get in and out between steps. Which is a good thing unless you want this delete operation to block all
unrelated transactions.
Conclusion
It is clear that there is no single, correct answer to this problem – there are a lot of inherent “it depends”
variables. It may take some experimenting to find your magic number, as there will be a balance
between the overhead it takes to back up the log and how much work and time you save at different
chunk sizes. But if you are planning to delete or archive a large number of rows, it is quite likely that you
will be better off, overall, performing the changes in chunks, rather than in one, massive transaction –
even though the duration numbers seem to make that a less attractive operation. It’s not all about
duration – if you don’t have a sufficiently pre-allocated log file, and don’t have the space to
accommodate such a massive transaction, it is probably much better to minimize log file growth at the
cost of duration, in which case you’ll want to ignore the duration graphs above and pay attention to the
log size graphs.
If you can afford the space, you still may or may not want to pre-size your transaction log accordingly.
Depending on the scenario, sometimes using the default autogrow settings ended up slightly faster in
my tests than using a fixed log file with plenty of room. Plus, it may be tough to guess exactly how much
you’ll need to accommodate a large transaction you haven’t run yet. If you can’t test a realistic scenario,
try your best to picture your worst case scenario – then, for safety, double it. Kimberly Tripp
(blog | @KimberlyLTripp) has some great advice in this post: 8 Steps to better Transaction Log
throughput – in this context, specifically, look at point #6. Regardless of how you decide to calculate
your log space requirements, if you’re going to end up needing the space anyway, better to take it in a
controlled fashion well in advance, than to halt your business processes while they wait for an autogrow
(never mind multiple!).
Another very important facet of this that I did not measure explicitly is the impact to concurrency – a
bunch of shorter transactions will, in theory, have less impact on concurrent operations. While a single
delete took slightly less time than the longer, batched operations, it held all of its locks for that entire
duration, while the chunked operations would allow for other queued transactions to sneak in between
each transaction. In a future post I’ll try to take a closer look at this impact (and I have plans for other
deeper analysis as well).
The Problem with Windows Functions and Views
By Paul White
Introduction
Since their introduction in SQL Server 2005, window functions like ROW_NUMBER and RANK have
proven to be extremely useful in solving a wide variety of common T-SQL problems. In an attempt to
generalize such solutions, database designers often look to incorporate them into views to promote
code encapsulation and reuse. Unfortunately, a limitation in the SQL Server query optimizer often
means that views* containing window functions do not perform as well as expected. This post works
through an illustrative example of the problem, details the reasons, and provides a number of
workarounds.
* This problem can also occur in derived tables, common table expressions and in-line functions, but I see it most often with views because they are intentionally written to be more generic.
Window functions
Window functions are distinguished by the presence of an OVER() clause and come in three varieties:
o Ranking window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE
o Aggregate window functions: COUNT, COUNT_BIG, CHECKSUM_AGG
o Analytic window functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE
All of the window functions listed above are susceptible to the optimizer limitation detailed in this
article.
Example
Using the AdventureWorks sample database, the task at hand is to write a query that returns all product
#878 transactions that occurred on the most recent date available. There are all sorts of ways to express
this requirement in T-SQL, but we will choose to write a query that uses a windowing function. The first
step is to find transaction records for product #878 and rank them in date order descending:
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
rnk;
The results of the query are as expected, with six transactions occurring on the most recent date
available. The execution plan contains a warning triangle, alerting us to a missing index:
As usual for missing index suggestions, we need to remember that the recommendation is not the result
of a thorough analysis of the query – it is more of an indication that we need to think a bit about how this
query accesses the data it needs.
The suggested index would certainly be more efficient than scanning the table completely, since it
would allow an index seek to the particular product we are interested in. The index would also cover all
the columns needed, but it would not avoid the sort (by TransactionDate descending). The ideal index
for this query would allow a seek on ProductID, return the selected records in
reverse TransactionDate order, and cover the other returned columns:
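A sketch of such an index (the index name is arbitrary; TransactionID is carried by the nonclustered index automatically as the clustering key):
CREATE NONCLUSTERED INDEX IX_TransactionHistory_ProductID_TransactionDate
ON Production.TransactionHistory
    (ProductID, TransactionDate DESC)
INCLUDE
    (ReferenceOrderID, Quantity);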
With that index in place, the execution plan is much more efficient. The clustered index scan has been
replaced by a range seek, and an explicit sort is no longer necessary:
The final step for this query is to limit the results to just those rows that rank #1. We cannot filter
directly in the WHERE clause of our query because window functions may only appear in
the SELECT and ORDER BY clauses.
We can work around this restriction using a derived table, common table expression, function, or view.
On this occasion, we will use a common table expression (aka an in-line view):
WITH RankedTransactions AS
(
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
)
SELECT
TransactionID,
ReferenceOrderID,
TransactionDate,
Quantity
FROM RankedTransactions
WHERE
rnk = 1;
The execution plan is the same as before, with an extra Filter to return only rows ranked #1:
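The next step is to generalize the ranking logic to every product by placing it in a view. A sketch of dbo.MostRecentTransactionsPerProduct – reconstructed from the queries that follow, with a PARTITION BY ProductID so each product is ranked separately (the exact original definition may differ slightly):
CREATE VIEW dbo.MostRecentTransactionsPerProduct
AS
SELECT
    sq.ProductID,
    sq.TransactionID,
    sq.ReferenceOrderID,
    sq.TransactionDate,
    sq.Quantity
FROM
(
    SELECT
        th.ProductID,
        th.TransactionID,
        th.ReferenceOrderID,
        th.TransactionDate,
        th.Quantity,
        rnk = RANK() OVER (
            PARTITION BY th.ProductID
            ORDER BY th.TransactionDate DESC)
    FROM Production.TransactionHistory AS th
) AS sq
WHERE
    sq.rnk = 1;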
Selecting all the rows from the view results in the following execution plan and correct results:
We can now find the most recent transactions for product 878 with a much simpler query on the view:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878;
Our expectation is that the execution plan for this new query will be exactly the same as before we
created the view. The query optimizer should be able to push the filter specified in the WHERE clause
down into the view, resulting in an index seek.
We need to stop and think a bit at this point, however. The query optimizer can only produce execution
plans that are guaranteed to produce the same results as the logical query specification – is it safe to
push our WHERE clause into the view?
The answer is yes, so long as the column we are filtering on appears in the PARTITION BY clause of the
window function in the view. The reasoning is that eliminating complete groups (partitions) from the
window function will not affect the ranking of rows returned by the query. The question is, does the SQL
Server query optimizer know this? The answer depends on which version of SQL Server we are running.
This is the execution plan produced for the view query on SQL Server 2005:
A look at the Filter properties in this plan shows it applying two predicates:
The ProductID = 878 predicate has not been pushed down into the view, resulting in a plan that scans
our index, ranking every row in the table before filtering for product #878 and rows ranked #1.
The SQL Server 2005 query optimizer cannot push suitable predicates past a window function in a lower
query scope (view, common table expression, in-line function or derived table). This limitation applies to
all SQL Server 2005 builds.
This is the execution plan for the same query on SQL Server 2008 or later:
The ProductID predicate has been successfully pushed past the ranking operators, replacing the index
scan with the efficient index seek.
The 2008 query optimizer includes a new simplification rule SelOnSeqPrj (select on sequence project)
that is able to push safe outer-scope predicates past window functions. To produce the less efficient
plan for this query in SQL Server 2008 or later, we have to temporarily disable this query optimizer
feature:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878
OPTION (QUERYRULEOFF SelOnSeqPrj);
Unfortunately, the SelOnSeqPrj simplification rule only works when the predicate performs a
comparison with a constant. For that reason, the following query produces the sub-optimal plan on SQL
Server 2008 and later:
DECLARE @ProductID integer = 878; -- declaration assumed for illustration; the point is the variable reference below

SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
The problem can still occur even where the predicate uses a constant value. SQL Server may decide to
auto-parameterize trivial queries (one for which an obvious best plan exists). If auto-parameterization is
successful, the optimizer sees a parameter instead of a constant, and the SelOnSeqPrj rule is not
applied.
For queries where auto-parameterization is not attempted (or where it is determined to be unsafe), the
optimization may still fail, if the database option for FORCED PARAMETERIZATION is on. Our test query
(with the constant value 878) is not safe for auto-parameterization, but the forced parameterization
setting overrides this, resulting in the inefficient plan:
To allow the optimizer to ‘see’ a constant value for a query that references a local variable or parameter
we can add an OPTION (RECOMPILE) query hint:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID
OPTION (RECOMPILE);
Note: The pre-execution (‘estimated’) execution plan still shows an index scan because the value of the
variable is not actually set yet. When the query is executed, however, the execution plan shows the
desired index seek plan:
The SelOnSeqPrj rule does not exist in SQL Server 2005, so OPTION (RECOMPILE) cannot help there. In
case you are wondering, the OPTION (RECOMPILE) workaround results in a seek even if the database
option for forced parameterization is on.
In some cases, it is possible to replace the problematic view, common table expression, or derived table
with a parameterized in-line table-valued function:
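A sketch of such a function, mirroring the view definition with the filter moved inside as a parameter (the function name matches the query below; the exact original definition may differ):
CREATE FUNCTION dbo.MostRecentTransactionsForProduct
(
    @ProductID integer
)
RETURNS table
AS RETURN
SELECT
    sq.ProductID,
    sq.TransactionID,
    sq.ReferenceOrderID,
    sq.TransactionDate,
    sq.Quantity
FROM
(
    SELECT
        th.ProductID,
        th.TransactionID,
        th.ReferenceOrderID,
        th.TransactionDate,
        th.Quantity,
        rnk = RANK() OVER (
            PARTITION BY th.ProductID
            ORDER BY th.TransactionDate DESC)
    FROM Production.TransactionHistory AS th
    WHERE
        th.ProductID = @ProductID
) AS sq
WHERE
    sq.rnk = 1;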
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsForProduct(878) AS mrt;
This produces the desired index seek plan on all versions of SQL Server that support window functions.
This workaround produces a seek even where the predicate references a parameter or local variable –
OPTION (RECOMPILE) is not required.
The function body could of course be simplified to remove the now-redundant PARTITION BY clause,
and to no longer return the ProductID column. I left the definition the same as the view it replaced to
more clearly illustrate the cause of the execution plan differences.
The second workaround only applies to ranking window functions that are filtered to return rows
numbered or ranked #1 (using ROW_NUMBER, RANK, or DENSE_RANK). This is a very common usage
however, so it is worth mentioning.
An additional benefit is that this workaround can produce plans that are even more efficient than the
index seek plans seen previously. As a reminder, the previous best plan looked like this:
That execution plan ranks 1,918 rows even though it ultimately returns only 6. We can improve this
execution plan by using the window function in an ORDER BY clause instead of ranking rows and then
filtering for rank #1:
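A reconstruction of that query – it keeps the RANK, but moves it into the ORDER BY clause and uses TOP (1) WITH TIES to keep only the top-ranked rows:
SELECT TOP (1) WITH TIES
    th.TransactionID,
    th.ReferenceOrderID,
    th.TransactionDate,
    th.Quantity
FROM Production.TransactionHistory AS th
WHERE
    th.ProductID = 878
ORDER BY
    RANK() OVER (
        ORDER BY th.TransactionDate DESC);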
That query nicely illustrates the use of a window function in the ORDER BY clause, but we can do even
better, eliminating the window function completely:
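For example (again a reconstruction), ordering directly by TransactionDate and letting TOP (1) WITH TIES return every row tied for the most recent date:
SELECT TOP (1) WITH TIES
    th.TransactionID,
    th.ReferenceOrderID,
    th.TransactionDate,
    th.Quantity
FROM Production.TransactionHistory AS th
WHERE
    th.ProductID = 878
ORDER BY
    th.TransactionDate DESC;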
This plan reads only 7 rows from the table to return the same 6-row result set. Why 7 rows? The Top
operator is running in WITH TIES mode:
It continues to request one row at a time from its subtree until the TransactionDate changes. The
seventh row is required for the Top to be sure that no more tied-value rows will qualify.
We can extend the logic of the query above to replace the problematic view definition:
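A sketch of the replacement view, assuming Production.Product as the driving table for the APPLY (the exact original definition may differ):
ALTER VIEW dbo.MostRecentTransactionsPerProduct
AS
SELECT
    p.ProductID,
    Detail.TransactionID,
    Detail.ReferenceOrderID,
    Detail.TransactionDate,
    Detail.Quantity
FROM Production.Product AS p
CROSS APPLY
(
    SELECT TOP (1) WITH TIES
        th.TransactionID,
        th.ReferenceOrderID,
        th.TransactionDate,
        th.Quantity
    FROM Production.TransactionHistory AS th
    WHERE
        th.ProductID = p.ProductID
    ORDER BY
        th.TransactionDate DESC
) AS Detail;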
The view now uses a CROSS APPLY to combine the results of our optimized ORDER BY query for each
product. Our test query is unchanged:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
Both pre- and post-execution plans show an index seek without needing an OPTION (RECOMPILE) query
hint. The following is a post-execution (‘actual’) plan:
If the view had used ROW_NUMBER instead of RANK, the replacement view would simply have omitted
the WITH TIES clause on the TOP (1). The new view could also be written as a parameterized in-line
table-valued function of course.
One could argue that the original index seek plan with the rnk = 1 predicate could also be optimized to
only test 7 rows. After all, the optimizer should know that rankings are produced by the Sequence
Project operator in strict ascending order, so execution could end as soon as a row with a rank greater
than one is seen. The optimizer does not contain this logic today, however.
Final Thoughts
People are often disappointed by the performance of views that incorporate window functions. The
reason can often be traced back to the optimizer limitation described in this post (or perhaps because
the view designer did not appreciate that predicates applied to the view must appear in the PARTITION
BY clause to be safely pushed down).
I do want to emphasise that this limitation does not just apply to views, and neither is it limited
to ROW_NUMBER, RANK, and DENSE_RANK. You should be aware of this limitation when using any
function with an OVER clause in a view, common table expression, derived table, or in-line table-valued
function.
SQL Server 2005 users that encounter this issue are faced with the choice of rewriting the view as a
parameterized in-line table-valued function, or using the APPLY technique (where applicable).
SQL Server 2008 users have the extra option of using an OPTION (RECOMPILE) query hint if the issue can
be solved by allowing the optimizer to see a constant instead of a variable or parameter reference.
Remember to check post-execution plans when using this hint though: the pre-execution plan cannot
generally show the optimal plan.
Index
Berry, Glenn
Selecting a Processor for SQL Server 2012
Bertrand, Aaron
Best Approach for Running Totals
Split Strings the Right Way
Split Strings: Now with less T-SQL
Performance impact of different error handling techniques
Using named instances? Test your DAC connection!
What is the fastest way to calculate the median?
T-SQL Tuesday #33: Trick Shots : Schema Switch-A-Roo
Conditional Order By
Splitting Strings : A Follow-Up
When the DRY principle doesn’t apply
Hit-Highlighting in Full-Text Search
What impact can different cursor options have?
How much impact can a data type choice have?
What is the most efficient way to trim time from datetime?
Beware misleading data from SET STATISTICS IO
Trimming time from datetime – a follow-up
Is the sp_ prefix still a no-no?
Checking if a non-LOB column needs to be updated
Minimizing the impact of DBCC CHECKDB : DOs and DON’Ts
An important change to Extended Events in SQL Server 2012
Bad cardinality estimates coming from SSMS execution plans
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Generate a set or sequence without loops – part 1
Generate a set or sequence without loops – part 2
Generate a set or sequence without loops – part 3
Potential enhancements to ASPState
Selecting a Processor for SQL Server 2012
Break large delete operations into chunks
Error Handling
Performance impact of different error handling techniques
Extended Events
A Look At DBCC CHECKCONSTRAINTS and I/O
An important change to Extended Events in SQL Server 2012
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
Hall, Jason
My Perspective: The Top 5 Most Common SQL Server Performance Problems
Installation
Selecting a Processor for SQL Server 2012
IO Subsystem
Break large delete operations into chunks
A Look At DBCC CHECKCONSTRAINTS and I/O
Trimming More Transaction Log Fat
Potential enhancements to ASPState
Trimming the Transaction Log Fat
Minimizing the impact of DBCC CHECKDB : DOs and DON’Ts
Quick Tip – Speed Up a Slow Restore from the Transaction Log
The Zombie PerfMon Counters That Never Die!
Kehayias, Jonathan
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing
Kline, Kevin
The Zombie PerfMon Counters That Never Die!
Quick Tip – Speed Up a Slow Restore from the Transaction Log
Randal, Paul
Transaction Log Configuration Issues
Trimming the Transaction Log Fat
Trimming More Transaction Log Fat
Sack, Joe
Configuring a Dedicated Network for Availability Group Communication
Ten Common Threats to Execution Plan Quality
TRANSACTION_MUTEX and Multi-Session Transaction Access
SQL Indexes
A Look At DBCC CHECKCONSTRAINTS and I/O
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Trimming the Transaction Log Fat
The Benefits of Indexing Foreign Keys
The Zombie PerfMon Counters That Never Die!
Hit-Highlighting in Full-Text Search
How much impact can a data type choice have?
What is the most efficient way to trim time from datetime?
What is the fastest way to calculate the median?
Conditional Order By
When the DRY principle doesn’t apply
SQL Job
My Perspective: The Top 5 Most Common SQL Server Performance Problems
SQL Lock/Block/Deadlock
My Perspective: The Top 5 Most Common SQL Server Performance Problems
SQL Memory
Hit-Highlighting in Full-Text Search
What impact can different cursor options have?
Using named instances? Test your DAC connection!
SQL Optimizer
The Problem with Windows Functions and Views
The Halloween Problem – Part 4
SQL Plan
The Problem with Windows Functions and Views
The Halloween Problem – Part 1
The Halloween Problem – Part 2
The Halloween Problem – Part 3
The Halloween Problem – Part 4
Generate a set or sequence without loops – part 1
Generate a set or sequence without loops – part 2
Generate a set or sequence without loops – part 3
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
The Benefits of Indexing Foreign Keys
Ten Common Threats to Execution Plan Quality
Bad cardinality estimates coming from SSMS execution plans
Beware misleading data from SET STATISTICS IO
What is the fastest way to calculate the median?
Conditional Order By
Best Approach for Running Totals
When the DRY principle doesn’t apply
SQL Trace
Beware misleading data from SET STATISTICS IO
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
Is the sp_ prefix still a no-no?
Stellato, Erin
A Look At DBCC CHECKCONSTRAINTS and I/O
The Benefits of Indexing Foreign Keys
System Configuration
Break large delete operations into chunks
Transaction Log Configuration Issues
Potential enhancements to ASPState
Selecting a Processor for SQL Server 2012
Trimming the Transaction Log Fat
Minimizing the impact of DBCC CHECKDB : DOs and DON’Ts
Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing
Measuring “Observer Overhead” of SQL Trace vs. Extended Events
Configuring a Dedicated Network for Availability Group Communication
My Perspective: The Top 5 Most Common SQL Server Performance Problems
Transaction Log
Break large delete operations into chunks
Transaction Log Configuration Issues
Trimming More Transaction Log Fat
Potential enhancements to ASPState
Trimming the Transaction Log Fat
Waits
TRANSACTION_MUTEX and Multi-Session Transaction Access
Trimming More Transaction Log Fat
Trimming the Transaction Log Fat
White, Paul
The Halloween Problem – Part 1
The Halloween Problem – Part 2
The Halloween Problem – Part 3
The Halloween Problem – Part 4
The Problem with Windows Functions and Views