Sams Teach Yourself Transact-SQL in 21:How the Server Uses Indexes

Summary

Understanding how the optimizer works can help you write better code and establish realistic expectations of SQL Server. You may find that certain queries, no matter how well written, don’t perform well. In that case, you need to speak with your DBA about adding indexes or changing the structure of important tables. Those decisions fall outside the boundaries of this book.

More often than not, the optimizer will make the right choices for you. The exception arises when you know something about the query or the data that the optimizer does not know. For example, if you knew that an entire table was already in cache, scanning the table could be much more efficient than using a nonclustered index, even if the index was fairly selective. Remember that, in general, what you know about a current situation is transient and will probably be fixed or changed very shortly. In general, avoid overriding the optimizer’s index selection in programs that are part of a permanent application.

Q&A

Q Which is better, a clustered or nonclustered index?

A For most individual queries, if I had to choose between a clustered and nonclustered index on the same column, I would choose the clustered index. But that’s not the real choice. You only get one clustered index per table, so you use it to help queries that perform range selections and sorts. Other queries are aided by nonclustered indexes on other columns or sets of columns.

Q Doesn’t the overhead of keeping a clustered index slow everything down? Why keep the data sorted that way?

A Maintaining the physical sort during data modifications is only a problem when (1) you are inserting rows into a table or (2) you are changing the clustered index key of a row already in the table. During inserts, you need to find the page where the new row belongs. Occasionally, you need to split the page to make space for the new row. Only the page splits cause real performance problems. Clustered indexes actually improve insert performance by distributing inserts in various parts of the table, allowing multiple threads and multiple processors to add data to a single table concurrently.

Changing the clustered index key during an update does incur a cost, so DBAs are encouraged to choose nonvolatile columns as the basis of a clustered index.

Q How do I know if SQL Server is using an index and choosing the best plan?

A Use showplan to display the optimization plan and determine whether the server is using an index. Is the plan the best one? You will need to do some analysis. Look at the size of the table (in pages and rows) and the selectivity of the various indexes available for the query. If you think that the server isn’t choosing a good plan, try forcing a different one and see if the statistics io output proves you right. If so, consider restating the query or breaking it up into parts, using a temporary table to store an intermediate result set.

Q Does SQL Server warn you if your indexes aren’t useful?

A No. The only warning you get is when your queries take forever. (One good indication is that you’ve had to replace your stopwatch with a sundial or a calendar.)

Q Are my tables large enough to make good use of indexes?

A In the case of single-table queries, SQL Server will seldom use indexes on tables consisting of fewer than four pages. If you are testing query optimization and performance, try to build tables with at least a couple of thousand rows each before you start. Otherwise, all the optimization behavior will be skewed.

Q Can I audit index use?

A Not really. You can use the SQL Audit tool to track user queries; then you can re-execute the queries yourself with showplan to see how well those queries are being optimized.

Q How do I quickly see which tables have indexes?

A The fastest way to list indexes is to query the system table, sysindexes:

select o.name, i.indid, i.name
from sysobjects o, sysindexes I
where o.type = “U”   -- user tables only
and i.indid < 255    -- exclude text column structures
and i.id = o.id
order by o.name, i.indid

The indid column tells you the type of the index. A value of 0 means that the table has no clustered index: it is a heap table. A value of 1 designates the clustered index on the table. Note that each table will have a row in sysindexes with an indid value of 0 or 1. A value of 2 or higher designates a nonclustered index.

Q Why not always force an index?

A The optimizer will be better than your application at adapting to new circumstances, including an increase in the number of rows or a change in the data distribution. If the optimizer isn’t adjusting well, look at your query; the problem is often there.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you’ve learned. Try to understand the quiz and exercise answers before continuing on to the next day’s lesson. Answers are provided in the appendix.

Quiz

1. Which is larger, a nonclustered or clustered index?

2. What is the crucial factor for the optimizer in choosing a nonclustered index?

3. When is a nonclustered index always faster than a corresponding clustered index?

4. How many clustered indexes can I have on a table? How many can I have in a database?

Exercises

1. Write a query to output the contents of the titles table. Output the optimization plan and the I/O statistics for the query.

2. Add a WHERE clause to the query based on the title column and see if the optimization plan and statistics change.

3. Force the use of the nonclustered index on the title column. Check the query plan and statistics.

4. Remove columns from the query output to allow the server to choose a non-clustered index. Why does changing the selected columns affect the optimization?

Table of Contents

Используются технологии uCoz