To estimate the work required to perform a query, SQL Server checks a number of statistics about the table and its indexes. First, it determines how many pages and rows are in the table. Second, it looks at the distribution page for each index to find out approximately how many rows satisfy the condition defined by each SARG.
Each index has a distribution page allowing the server to estimate the selectivity of a WHERE clause.
Warning: This is really more of an administrative issue than a programming one, but it's critical that somebody at your organization knows this fact. The distribution page is set up only once, when the index is first created. Afterward, there is no automatic mechanism that causes SQL Server to rebuild the distribution page, resynchronizing its statistics with the actual data. This can lead to disastrous performance problems in which the server grossly miscalculates the probable behavior of a query. On a periodic basis, someone needs to execute this statement for every table in every database:
update statistics <table-name>

This could take a while, so plan to run it on large tables when nobody needs to be using them.
You can run this process on every table in your database by using a cursor, as in the sketch that follows; there's a complete example of how to do this in Day 17.
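Here is a minimal sketch of that approach. (The cursor and variable names are illustrative, not the Day 17 example itself; it assumes you can read the list of user tables from sysobjects.)

declare @tablename varchar(30)
declare tables_cur cursor for
    select name from sysobjects where type = 'U'  -- user tables only

open tables_cur
fetch next from tables_cur into @tablename
while @@fetch_status = 0
begin
    -- build and run "update statistics <table-name>" for each table
    exec ('update statistics ' + @tablename)
    fetch next from tables_cur into @tablename
end
close tables_cur
deallocate tables_cur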
For our sample query, SQL Server would look at the statistics to determine what percentage of the table each element of the WHERE clause would select. Let's look at the query again:
select count(*) 'count', sum(qty) 'total sold'
from sales
where stor_id between 'F' and 'Gz'
  and title_id like 'BI2%'
Using the distribution page for the clustered index on stor_id, SQL Server would determine that perhaps 7 percent of the table matched that criterion. The table itself consists of 168,000 rows on 4,536 pages. A clustered index scan of 7 percent of the table would find approximately 11,760 rows on approximately 320 pages. (Remember, because the data is ordered by stor_id, a clustered index scan on that column will find only values in the proper range.)
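Working through the arithmetic behind that estimate (using the fabricated numbers described in the Note that follows):

168,000 rows x 7% = 11,760 rows
4,536 pages x 7% = about 318 pages, or approximately 320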
Note: In the interest of full disclosure, all the numbers in this section and in the diagrams are completely fabricated. (The examples we'll look at in the section "Analyzing Optimization Plans" are fully factual.) They are consistent among themselves, but they do not reflect actual values stored in the bigpubs database itself. (Sometimes fiction can be more illuminating than truth.)
Using the distribution page for the nonclustered index on title_id, SQL Server would estimate that perhaps 0.5 percent of the table had a title_id matching the pattern. A nonclustered index scan finding 0.5 percent of the table would find approximately 840 rows on more than 840 pages (including the overhead of reading the index itself). (Remember again: Scanning a nonclustered index involves the additional step of reading a row from the data based on each value in the leaf level of the index. This is what makes nonclustered index scans of only marginal use when the query would return many rows.)
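The arithmetic again, with the assumption (implied by the row-by-row lookups just described) that each qualifying row costs one data-page read:

168,000 rows x 0.5% = 840 rows
840 rows x 1 data page each = 840 pages, plus the pages of the index itself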
SQL Server uses the statistics laid out in the previous section to decide on the most efficient approach. In addition to the index-based methods, the server also determines the cost to perform the query using a table scan. In this case, SQL Server assesses the three options shown in Table 15.1.
Table 15.1. Estimated cost of each execution method.

Execution Method | Estimated Cost
---|---
Table scan | 4,536 pages
Clustered index on stor_id | 320 pages
Nonclustered index on title_id | 840 pages
Although the nonclustered index approach is less expensive than the table scan, the clustered index scan is the fastest in this case.
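If you want to see which plan the optimizer actually picks, one way (assuming the SET SHOWPLAN syntax of this era of SQL Server; the section "Analyzing Optimization Plans" examines the output in detail) is:

set showplan on
go
select count(*) 'count', sum(qty) 'total sold'
from sales
where stor_id between 'F' and 'Gz'
  and title_id like 'BI2%'
go
set showplan off
go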