Previous | Table of Contents | Next |
On this last day, you learn more about join performance and advanced join options. The topics for today include
Warning: If youre not an experienced SQL programmer, dont start with this chapter. Take the time to read and understand Days 7 and 15 on joins and index structures first. I assume that you understood those chapters and successfully completed the exercises before you started this lesson.
It has always been my feeling that the very best programmers and engineers are those with the best understanding of the underlying processes. If you deeply understand how the system works, you can anticipate its behavior, build effective models to test performance, and write more efficient systems.
The one aspect of SQL Server behavior that most people dont understand well is join processing. How does the server construct result sets out of many tables? What processes does it go through? What is the purpose of optimization with many tables? Why does join order matter?
Most T-SQL programmers dont know the answer to these questions, so they take join behavior for granted. But the more you know about these issues, the more easily you can write large, complex queries that work well.
A few years ago, a customer asked me to troubleshoot a large, complex stored procedure. The procedure took eight minutes to run, but it was part of an interactive system, and they were hoping to get it to under five minutes so users would be more likely to use the software.
Heres a pseudocode outline of the program:
declare cursor1 on outertable for all values cursor1 loop fetch row into key_var1 declare cursor2 for key_var1 cursor2 loop fetch row into key_var2 declare cursor3 for key_var2 cursor3 loop fetch row into key_var3 if key_var3 = x process row else declare cursor4 for key_var3 cursor4 loop fetch row into key_var4 process row end cursor4 loop end cursor3 loop end cursor2 loop end cursor1 loop
(Im not reproducing the code here because it was about 25 pages of T-SQL cursor statements.) This was a fine way to write COBOL code, but its all wrong for SQL. The procedure declared and deallocated about 5000 cursors as it ran.
Youve probably already figured out the solution. Use a single query to handle the multitable join. If you really need a cursor, declare the cursor for a join and let the server handle the mapping of tables to each other, like this:
declare cursor1 on joinresult for all values cursor1 loop fetch row into key_var1 if key_var1 = x process row else fetch row into key_var2 declare cursor4 for key_var2 cursor4 loop fetch row into key_var5 process row end cursor4 loop end cursor1 loop
Rewriting the procedure in this way was our first step, mostly so that we could better understand what it did. As it turned out, the cursor was completely unnecessary. We wrote a SELECT INTO query that created a temporary table; then we ran two SELECTs from that output, one that included a subquery, like this:
select x, y, z into #T from A, B, C where A.id = B.id and B.id2 = C.id2 and select * from #T where select * from #T where z in (select z from D where )
Aside from being much easier to read, understand, and debug, the procedure ran in less than 30 seconds.
This is obviously an extreme example of a join problem caused by a lack of understanding about SQL Server and T-SQL, but getting a better understanding of how joins work is critical to writing good code.
Previous | Table of Contents | Next |