Previous Table of Contents Next


Temporary Tables in Procedures

There is one other problem commonly encountered in optimization. What if a procedure creates a fairly large temporary table and then joins that table back to another table? Here’s a sample query that requires a temporary table:

select t.title_id, pub_id, sum(qty) “CAqty”
  from titles t, sales s, stores st
 where state = “CA”
   and t.title_id *= s.title_id
   and s.stor_id = st.stor_id
 group by t.title_id, pub_id
Msg 303, Level 16, State 1
The table ‘sales’ is an inner member of an outer-join
clause.
This is not allowed if the table also participates in a regular join
clause.


Note:  Let’s remember what the outer join means. It instructs the server to show results for a row in a table, even if there are no corresponding rows in a joined table. In this case, SQL Server is to display every row in the titles table, even if there are no sales for that title to stores in California.

This is a common problem. If I have three tables that join in the format A *= B = C, SQL Server cannot perform the join in a single step. Instead, you need to break up the query into two steps, first resolving the join, B = C, and placing the results into a temporary table, #T. Then you can perform the outer join against the temporary table, A *= #T. Here’s the revised query:

select title_id, qty
  into #temp
  from sales s, stores st
 where state = “CA”
   and s.stor_id = st.stor_id
select t.title_id, pub_id, sum(qty) “CAqty”
  from titles t, #temp tp
 where t.title_id *= tp.title_id
 group by t.title_id, pub_id

The revised query is compiled properly and works. Let’s translate this query into a stored procedure that uses a parameter for the value of state:

create proc state_sales
   (@state char(2))
as
select title_id, qty
  into #temp
  from sales s, stores st
 where state = @state
   and s.stor_id = st.stor_id
select t.title_id, pub_id, sum(qty) “CAqty”
  from titles t, #temp tp
 where t.title_id *= tp.title_id
 group by t.title_id, pub_id
return

The first SELECT in the procedure is fairly standard. As long as the number of sales per state is fairly predictable, we won’t have to worry about recompiling or anything. The second query is another matter. What statistics on the size or distribution of the temporary table are available to SQL Server at optimization time? There aren’t any, so the server is forced to guess.


Tip:  When the server has to estimate the size of a temporary table, it always “guesses” that there will be 100 rows on 10 data pages (20KB). (Needless to say, this guess is usually wrong.)

If I were running the query in a batch mode, this problem would be easy to solve. I would separate the two statements with GO, allowing the second query to be separately optimized after its size was known.

Let’s follow the process the server does to determine the proper execution method for this query. Because it’s an outer join, the outer table must be processed first. SQL Server scans the outer table one time. Each time it finds a row that is not excluded by a WHERE clause, it scans the inner table one time. That inner scan is performed using an index where possible, a scan where it is not. If the server determines that there is no useful index, but it’s worth the trouble to create one, it “reformats” the inner table (builds a working index on the join column in tempdb).

So the problem arises in the second query in the procedure when the temporary table becomes large. SQL Server has no opportunity to determine the size of the temporary table at optimization time, so it simply guesses. If you expect the temporary table to be fairly large, use a subprocedure to join the outer table and the temporary table. The subprocedure optimization will occur after the temporary table is created, so it can take into account its size and consider a reformat strategy.

It turns out to be a little tricky to create the inner procedure because it expects the temporary table already to exist when it is created; however, the outer procedure does not expect this. Create the objects in this order:

1.  Drop both procedures, if they exist.
2.  Create the temporary table.
3.  Create the inner procedure.
4.  Drop the temporary table.
5.  Create the outer procedure.

Save it all as a single script so that you can create the procedures any time you need them.

-- drop both procs
if exists (select * from sysobjects where name = “state_sales_inner”)
    drop proc state_sales_inner
go
if exists (select * from sysobjects where name = “state_sales”)
    drop proc state_sales
go
-- create the temporary table
select title_id, qty
  into #temp
  from sales s, stores st
 where state =  “MA”
   and s.stor_id = st.stor_id
   and 1 = 2                     -- speeds up table creation
go
-- create the inner procedure
create proc state_sales_inner
  with recompile                 -- always recheck the size of #temp
as
select t.title_id, pub_id, sum(qty) “CAqty”
  from titles t, #temp tp
 where t.title_id *= tp.title_id
 group by t.title_id, pub_id
if @@error <> 0
   return 99
else
   return 0
go
-- drop the temporary table
drop table #temp
go
-- create the outer procedure
create proc state_sales
   (@state char(2))
as
declare @ret int
select title_id, qty
  into #temp
  from sales s, stores st
 where state = @state
   and s.stor_id = st.stor_id
if @@error <> 0
   return 98
execute @ret = state_sales_inner
drop table #temp
if @ret <> 0
   return 99
else
   return 0
go

I’ll grant you that this is a lot of work compared to the first solution, but it’s also the best way to guarantee that no problems will arise later on as you add more data to your tables. (Note that temporary tables are dropped when the server ends the connection or procedure where they were created. It’s not necessary, but it is good manners, to drop temporary tables when you are finished with them in a procedure.)


Previous Table of Contents Next
Используются технологии uCoz