Sams Teach Yourself Transact-SQL in 21:Advanced Topics: Joins and Subqueries

Outer Table Scan Cost

The outer table scan cost works just like a single-table optimization. The server asks, “Is there an index method to circumvent scanning the entire table?” If not, the cost of the scan is the number of data pages in the table times the cost of a disk read (physical read cost, or PR cost):

OUTER TABLE SCAN COST = OUTER DATA PAGES * PR

Looking at the example, if sales is the outer table, the cost for a full table scan is 4,356 pages. There’s no index for ord_date, so there’s no way to reduce the cost in that way.

If titles is the outer table, the cost of a scan is 39 pages. There are no indexes on price or type, so again there is no way to reduce the outer scan cost.

Outer Table Rows Found

The number of outer table rows found is estimated by looking at the size of the table in rows and multiplying it by the selectivity of any search arguments.

OUTER TABLE ROWS FOUND = OUTER TABLE ROWS * OUTER SELECTIVITY

Remember, even if a search argument can’t be used to identify the index for an outer scan, the search argument may still eliminate a lot of rows from the query, reducing the number of inner scans.

In the case of the sales table, the server can estimate that the date range argument will reduce the overall number of rows found. Without an index, there are no statistics to base its estimate on, so it simply guesses. With closed range queries like this one, the server estimates 25 percent selectivity.

Note: If you look at Table 21.1, you’ll notice that the selectivity of the ord_date clause (3 percent) is about one tenth the estimate (25 percent).
A little later, we’ll look at how to override the optimizer and instruct it exactly how to process a query. Generally, you do that when you know something the server doesn’t (which is seldom). This is one of those few times when you might actually want to do it.

If titles is outer, there are two search clauses that reduce the number of rows found. The selectivity of these rows can be combined to estimate a smaller number of rows. The server estimates 25 percent selectivity for the enclosed range on price, and 10 percent selectivity for the TYPE IN (…) clause. These are multiplied to come up with an estimate of 2.5 percent (versus the actual value of 8 percent).

Inner Table Scan Cost

The cost to scan the inner table depends on how selective the join is on the inner table (how many rows will be found per scan), and the method of finding those rows:

INNER TABLE SCAN COST = INNER PAGES * PR

Earlier, I mentioned that one of the considerations for an inner table is whether that table will fit in memory. In certain circumstances, the cost of the inner table scan is discounted because the server can assume all reads from the inner table are made from memory.

If titles is the inner table, there will be only one title per sales row. There’s a clustered index, so the cost per scan will be two pages—one to read the index value and one to read the page.

If sales is the inner table, there will be many sales rows per row found in titles. The ratio of rows is approximately 315 to 1. There’s a nonclustered index on title_id, so the cost of making sales the inner table is a little over 315 pages per scan (the server estimates 321 reads). (Nonclustered indexes require you to find the pointer in the index, and then use the page address to find the page you want in the table. A clustered index on title_id would be much more efficient for this query.)

Putting It Together

Let’s finish the calculations, looking at actual and estimated costs. Remember, this is the formula we’ll use:

COST = OUTER TABLE SCAN COST + OUTER TABLE ROWS FOUND * INNER TABLE SCAN
COST

We’re assuming that all work is physical reads, so we’ll just multiply the numbers together to determine how many I/Os are needed (see Table 21.2).

**Table 21.2.** Estimated and actual I/Os based on join order.

	Estimated	Actual

sales -> titles	4,356 pages +	4,356 pages +
	168,725 rows * 25% *	168,725 rows * 3% *
	2 pages per scan =	2 pages per scan =
	88,718 I/Os	14,480 I/Os
titles -> sales	39 pages +	39 pages +
	537 rows * 2.5% *	537 rows * 8% *
	321 pages per scan =	2 pages per scan =
	4,348 I/Os	13,829 I/Os

Whew. I apologize for making you look at all those numbers and equations. If you took the time to work it out as you read, you probably understood most of what happened.

So what have we learned from this exercise? We’ve seen that several factors are involved in determining the best join order. Here’s a quick rundown:

• Table sizes (smaller inner tables can be cached for faster access on repeated scans)

• Row density (tables that have more rows per page make better inner tables)

• Search argument selectivity (tables with highly selective search arguments make better outer tables)

• Index types (clustered indexes work better on inner tables that have lots of rows per join key)

Will these factors ever conflict with each other? Of course. One table may have the most selective search argument, the highest row density, and a good clustered index on the join key, and so it may be best suited for both the inner and outer table. The server tries all the pzossible approaches to determine which one works best.

Table of Contents

Используются технологии uCoz