Previous | Table of Contents | Next |
Were going to look at some practical techniques in a moment. First, lets recap the really important points youve learned so far:
Lets look a little more closely into that last point. Figure 15.7 shows the relationship between query cost (on the vertical axis) and selectivity (horizontal axis), based on different index strategies. The cost of the table scan is constant in spite of selectivity. A scan is an end-to-end proposition, so its cost is always the number of pages in the table.
Figure 15.7. The usefulness of an index decreases with its selectivity for a specific query.
The cost of using a clustered index increases proportionally to the selectivity of the query. Dont forget: A clustered index scan is essentially a partial table scan. If a clustered index is available, it will probably be used instead of a table scan, except in rare instances, because the cost of the clustered index scan does not equal the cost of a table scan unless the query finds all the rows in the table.
The cost of using a nonclustered index also increases in proportion to the selectivity of the query, but because of the need to read one page per row found, its cost quickly exceeds that of a table scan. Only when a very specific row or a small set of rows can be identified will the nonclustered index be used. The point at which the nonclustered index line crosses the table scan line is the maximum selectivity at which a nonclustered index on that column might be used. If the selectivity should be higher than that number, SQL Server will use a table scan to resolve the query.
Note: Often I hear people talk about a 20 percent threshold, where nonclustered indexes stop being used. There is no rule of thumb or basic threshold to define that crossover point. In the case of the sales table, the crossover point is around 4 percent. Other tables will cross over around 40 percent.What accounts for this variation is the relationship between the average row length and the length of the index key(s). As the rows in the table get longer, a nonclustered index may be used to resolve a 25 percent or 30 percent selective query. As the rows in the table get shorter, a nonclustered index will be less and less useful.
Im lost!
If there is a clustered and nonclustered index on the same table, why would you ever use the nonclustered index, since clustered indexes are so much more efficient?
This is a critical question. First, the clustered and nonclustered indexes are probably on different columns, so the selectivity of a query with regard to those indexes will differ. Second, the nonclustered index could cover the query, that is, contain sufficient information to answer the query without referring to the table itself. This approach is almost always more efficient than a clustered index scan.
So thats how the optimizer works. It looks at your query to find places where indexes might help. It estimates how effective the indexes might be. Then it decides on the most efficient path.
Note: Okay, I admit it, theres more to it. Weve only looked at one-table optimization, but there are also issues related to multiple-table optimization. (After optimizing each table, SQL Server considers which combination of table order and index selection provides the best performance.) But for now, lets keep it simple.
Heres where the rubber meets the road. How do you know what the optimizer decided to do, and how can you affect that choice? First, SQL Server provides you with two excellent tools for diagnosing problems with optimization, showplan and statistics io.
Previous | Table of Contents | Next |