Previous | Table of Contents | Next |
With all of that said, I tend to use the rule-based optimizer unless I have a reason not to use it. First, some of my feelings come from the fact that the cost-based optimizer did not work very well until recent releases of the Oracle product (which is probably an unfair bias). The rule-based optimizer also has the beauty of being simpler to maintain. I do not have to gather statistics and keep them current. The rule-based optimizer is good enough for the vast majority of databases with which I have worked. There are however, a growing number of large databases where efficiency is essential (you can be inefficient on a 50-row table, but not on a 5-million-row table). These are the cases in which the cost-based optimizer can help you out. There is no harm in trying out the cost-based optimizer in your test environment to see whether it can benefit your applications. Your situation might benefit enormously from this tool. One note is that Oracle has put a lot of effort into improving the cost-based optimizer (for example, the new algorithms for star schemas that came with Oracle8) and will probably shift to the cost-based optimizer as the only algorithm some day.
Imagine having to design an execution plan manually. All you know is the SQL for an applications queries. You have no way to guess which is the smaller, look-up table and which is the monstrous main data table. You have no idea whether you are dealing with a huge or small query, whether you can perform your sorts in memory, or whether you will have to use disk sorting. How can you be expected to come up with a good execution plan?
These are the problems faced by the rule-based optimizer. It gets only the syntax and a list of objects that it does not understand. It does its best, but it winds up making assumptions that cause it to pick the best plan for the majority of queries of this type. The cost-based optimizer takes a little extra information that you have been kind enough to gather for it and uses this information to make better guesses at the optimal query path (see Figure 25.3). This can prevent, for example, scanning through a huge table to find rows to join to a little look-up table (it is much faster the other way around).
Figure 25.3. Cost-based optimization.
The statistics you gather cover several different topics that can be used to determine the most efficient query. The cost-based optimizer looks at the number of rows and how the objects are stored. Next, it performs some calculations for the impacts on the three major system resourcesCPU, memory, and input/output. It then is run through an algorithm (which factors in the goals you have set up) to determine which query will give you the best performance. I get a headache just thinking about all the calculations I would have to go through to determine which approach is best for a complex query joining multiple tables. However, that is what the software gets paid to do.
An important topic when discussing the cost-based optimizer is statistic collection. The following are a few thoughts of mine on this subject.
There is probably an entire series of papers written on the internal functioning of the cost-based optimizer. It is important that you understand the basics of what this opti-mizer does and what it can do for you. You should seriously consider using it in very large instances, where theres a mixture of small look-up tables and huge data tables. It could be a useful option in any instance where queries are taking longer than desired under the rule-based optimizer. Of course, you always want to ensure that you are properly tuned before you tell the developers to rewrite their applications, or before you change optimizers.
Lets start with the brute force approach to finding informationthe full table scan. This basic approach dates back to the days of flat files. The mind-set is that if you read every row in the table, you will find your answer (see Figure 25.4). Although it may not be the most brilliant plan possible, it has its uses. For example, if you have a specific query that is looking for a specific combination of a large number of columns, or you want to produce a detailed report showing all data stored in the database, this is exactly what you want. However, if youre looking up a single row of information using a common key value, youll waste a lot of time with this access method.
Figure 25.4. The full table scan.
The next simplest method is using an index to access only the rows of data you need. This typically cuts down on a lot of input/output to the table and also speeds access due to the more efficient storage algorithms used for indexes. Figure 25.5 illustrates the basic concepts of this approach. Obviously, if a table is really small or you are using all the columns of the table in your where clause, its not worth the effort of using an index (because you will read in all the columns of the table when reading the index and then read the table for any remaining columns).
Figure 25.5. Indexed searches.
The first two methods deal with access to a single table. These basic table search methods are often combined to obtain data from multiple tables and join it. There are two common algorithms for merging this datamerge-join and nested loop. Lets start with the merge-join algorithm (see Figure 25.6).
Figure 25.6. The merge-join approach.
Previous | Table of Contents | Next |