| Previous | Table of Contents | Next |
On this last day, you learn more about join performance and advanced join options. The topics for today include
Warning: If youre not an experienced SQL programmer, dont start with this chapter. Take the time to read and understand Days 7 and 15 on joins and index structures first. I assume that you understood those chapters and successfully completed the exercises before you started this lesson.
It has always been my feeling that the very best programmers and engineers are those with the best understanding of the underlying processes. If you deeply understand how the system works, you can anticipate its behavior, build effective models to test performance, and write more efficient systems.
The one aspect of SQL Server behavior that most people dont understand well is join processing. How does the server construct result sets out of many tables? What processes does it go through? What is the purpose of optimization with many tables? Why does join order matter?
Most T-SQL programmers dont know the answer to these questions, so they take join behavior for granted. But the more you know about these issues, the more easily you can write large, complex queries that work well.
A few years ago, a customer asked me to troubleshoot a large, complex stored procedure. The procedure took eight minutes to run, but it was part of an interactive system, and they were hoping to get it to under five minutes so users would be more likely to use the software.
Heres a pseudocode outline of the program:
declare cursor1 on outertable for all values
cursor1 loop
fetch row into key_var1
declare cursor2 for key_var1
cursor2 loop
fetch row into key_var2
declare cursor3 for key_var2
cursor3 loop
fetch row into key_var3
if key_var3 = x
process row
else
declare cursor4 for key_var3
cursor4 loop
fetch row into key_var4
process row
end cursor4 loop
end cursor3 loop
end cursor2 loop
end cursor1 loop
(Im not reproducing the code here because it was about 25 pages of T-SQL cursor statements.) This was a fine way to write COBOL code, but its all wrong for SQL. The procedure declared and deallocated about 5000 cursors as it ran.
Youve probably already figured out the solution. Use a single query to handle the multitable join. If you really need a cursor, declare the cursor for a join and let the server handle the mapping of tables to each other, like this:
declare cursor1 on joinresult for all values
cursor1 loop
fetch row into key_var1
if key_var1 = x
process row
else
fetch row into key_var2
declare cursor4 for key_var2
cursor4 loop
fetch row into key_var5
process row
end cursor4 loop
end cursor1 loop
Rewriting the procedure in this way was our first step, mostly so that we could better understand what it did. As it turned out, the cursor was completely unnecessary. We wrote a SELECT INTO query that created a temporary table; then we ran two SELECTs from that output, one that included a subquery, like this:
select x, y, z
into #T
from A, B, C
where A.id = B.id and B.id2 = C.id2
and
select *
from #T
where
select *
from #T
where z in
(select z from D where
)
Aside from being much easier to read, understand, and debug, the procedure ran in less than 30 seconds.
This is obviously an extreme example of a join problem caused by a lack of understanding about SQL Server and T-SQL, but getting a better understanding of how joins work is critical to writing good code.
| Previous | Table of Contents | Next |