Previous Table of Contents Next


Week 3

Day 21
Advanced Topics: Joins and Subqueries

On this last day, you learn more about join performance and advanced join options. The topics for today include

  Understanding join behavior and performance
  Outer joins
  Self-joins


Warning:  If you’re not an experienced SQL programmer, don’t start with this chapter. Take the time to read and understand Days 7 and 15 on joins and index structures first. I assume that you understood those chapters and successfully completed the exercises before you started this lesson.

Understanding Join Behavior and Performance

It has always been my feeling that the very best programmers and engineers are those with the best understanding of the underlying processes. If you deeply understand how the system works, you can anticipate its behavior, build effective models to test performance, and write more efficient systems.

The one aspect of SQL Server behavior that most people don’t understand well is join processing. How does the server construct result sets out of many tables? What processes does it go through? What is the purpose of optimization with many tables? Why does join order matter?

Most T-SQL programmers don’t know the answer to these questions, so they take join behavior for granted. But the more you know about these issues, the more easily you can write large, complex queries that work well.

An Anecdote

A few years ago, a customer asked me to troubleshoot a large, complex stored procedure. The procedure took eight minutes to run, but it was part of an interactive system, and they were hoping to get it to under five minutes so users would be more likely to use the software.

Here’s a pseudocode outline of the program:

declare cursor1 on outertable for all values
cursor1 loop
      fetch row into key_var1
       declare cursor2 for key_var1
      cursor2 loop
           fetch row into key_var2
             declare cursor3 for key_var2
           cursor3 loop
                fetch row into key_var3
              if key_var3 = x
                 process row
              else
                 declare cursor4 for key_var3
              cursor4 loop
                   fetch row into key_var4
                 process row
              end  cursor4  loop
       end  cursor3  loop
    end  cursor2  loop
end  cursor1  loop

(I’m not reproducing the code here because it was about 25 pages of T-SQL cursor statements.) This was a fine way to write COBOL code, but it’s all wrong for SQL. The procedure declared and deallocated about 5000 cursors as it ran.

You’ve probably already figured out the solution. Use a single query to handle the multitable join. If you really need a cursor, declare the cursor for a join and let the server handle the mapping of tables to each other, like this:

declare  cursor1  on  joinresult  for  all  values
cursor1   loop
      fetch row into key_var1
     if key_var1 = x
         process row
     else
          fetch row into key_var2
            declare cursor4 for key_var2
          cursor4 loop
               fetch row into key_var5
              process row
          end  cursor4  loop
end  cursor1  loop

Rewriting the procedure in this way was our first step, mostly so that we could better understand what it did. As it turned out, the cursor was completely unnecessary. We wrote a SELECT … INTO query that created a temporary table; then we ran two SELECTs from that output, one that included a subquery, like this:

select x, y, z
    into #T
    from A, B, C
   where A.id = B.id and B.id2 = C.id2
      and …

select  *
   from #T
   where …

select  *
   from #T
   where z in
       (select z from D where …)

Aside from being much easier to read, understand, and debug, the procedure ran in less than 30 seconds.

This is obviously an extreme example of a join problem caused by a lack of understanding about SQL Server and T-SQL, but getting a better understanding of how joins work is critical to writing good code.


Previous Table of Contents Next
Используются технологии uCoz