Sams Teach Yourself Transact-SQL in 21:How the Server Uses Indexes

Optimization Without Indexes

When you first create a table (unless you define a primary key or unique constraint), SQL Server creates the table with no indexes, clustered or nonclustered. Every time a query references the table, SQL Server has no choice but to examine every row of the table to find the appropriate results using a table scan.

For example, if you were to submit the following query to a version of the sales table with no indexes, SQL Server would perform a table scan to determine which rows met the criteria:

select count(*)
from sales
where qty > 500
and title_id = “BI0194”

As the server reads each page into memory and retrieves each individual row, it looks at the values for qty and title_id, determining which rows belong in the final result set.

Here’s the key point: no matter whether the server finds 100 rows or 100,000 that meet the criteria, if there are no indexes available, the work required to perform the query is constant, and the amount of time required is also constant. Indexes allow the server to read fewer pages and fewer rows to return the same information more rapidly.

Clustered and Nonclustered Indexes

The server uses two types of indexes to organize data: clustered and nonclustered. A clustered index sorts the actual rows of data by the index key; nonclustered indexes create a separate, sorted list of keys with pointers to the actual rows.

A clustered index is an index structure that also defines the sort order of a table. A nonclustered index is an index structure that is independent of the table.

Let’s look at examples of each kind of index as it would relate to a real table: the sales table included in the pubs database. The actual version of sales included with the sample data is too small to show interesting index behavior, so I loaded up another version of the table with over 168,000 rows of data. I use that sample table to test complex queries and to do some benchmarking. We will use that table now to understand the indexing system used by SQL Server.

Note: The larger version of the sales table that we are working with was included on the CD-ROM with the book Microsoft SQL Server 6.5 Unleashed (Sams Publishing, ISBN: 0-672-30956-4). You can access my large version of the entire pubs database (called bigpubs) by accessing my Web site at http://www.metis.com. Refer back to Day 1 if you want more information on how to access the site.

You’ve used sp_help elsewhere to get a list of objects in a database or to learn about the columns in a table or view. In this chapter, you use it to learn about the indexes on a table.

sp_help sales

…
index_name    index_description               index_keys
------------- ------------------------------  -------------------------
UPKCL_sales   clustered, unique, primary key  stor_id, ord_num, title_id
…

I’ve only displayed the part of the output that involves indexes (and I’ve pruned that a little as well, so it would fit on the page). The table has two indexes, one clustered on stor_id, ord_num, and title_id, the other nonclustered on title_id. The clustered index in this case is also a unique index that has been designated a primary key.

Clustered Indexes

We said earlier that clustered indexes actually define the way data is sorted in a table. The first thing the server does when you create a clustered index is to sort the existing rows in the order of the clustered index. If you created a last_name index on a telephone directory, the server would immediately re-sort all the rows in the directory by last name. If you dropped that last name index and created a new clustered index on telephone_number, the server would re-sort the directory table by telephone number.

Note: There are all sorts of maintenance issues associated with indexes. How often should they be rebuilt to remove fragmentation? How long does it take to re-sort the data on a large table? Can I speed up index creation by sorting the data first before I load it? (The answer to the last question turns out to be “yes.”)
This book really is not intended to deal with those issues—they belong to the domain of system and database administrators. Microsoft SQL Server 6.5 Unleashed (Sams Publishing, ISBN: 0-672-30956-4) or Microsoft SQL Server DBA Survival Guide, Second Edition (Sams Publishing, ISBN: 0-672-30959-9) both contain useful information about this. Other issues related to performance will be addressed shortly.

The clustered index on stor_id forces SQL Server to sort the table by stor_id and to maintain that sort throughout any subsequent operations on that table.

Note: I can just hear you now, grumbling about the work required to maintain a sorted list through every update. We’ll address that question after you have a better idea of how clustered indexes work.

I’ve provided a diagram of the first level of the clustered index on sales in Figure 15.3. After sorting the rows by the index key (stor_id), the server builds the first level of the index by recording the first value for stor_id on each page, along with a pointer to that page. In the figure, the first stor_id value on the first page (page 8600) is 6380. The first value on the first page of the clustered index is also 6380. The row in the index includes a pointer to the page number in the table.

Figure 15.3. A table and its clustered index.

Note: Because the table is already sorted by stor_id, SQL Server does not need to store a pointer to each row on each page to know what values of stor_id are on the page. As a result, the number of rows in the first level of the clustered index is dramatically lower than the number of rows in the table.
As we proceed in this discussion, it’s vital to start to understand how the size of various structures changes, and what characteristics of a table or index cause it to be more or less efficient. I will continue to point out how the row counts compare between various structures, and among their various levels.

Table of Contents

Используются технологии uCoz