Previous Table of Contents Next


Clusters

Oracle clusters allow tables that are frequently accessed together (as with join operations), to be stored together. These clusters can be defined to be index clusters or hash clusters. With index clusters, the key value that the cluster is arranged around is indexed with hash clusters, the key value is hashed.

Clusters can be advantageous in certain situations and disadvantageous in others. You must be careful in determining whether a cluster can help performance in your configuration. Typically, clusters are advantageous if the clustered, related data is used primarily in joins because the data to be used in the join is retrieved together in one I/O operation.

If you have two tables with related data that are frequently accessed together, having a cluster can improve performance by preloading the related data into the SGA. Because you frequently use the data together, having that data already in the SGA greatly reduces access time.

If you do not typically use the information together, you will find no performance benefit from using a cluster. There is even a slight disadvantage because the SGA space is taken up by the additional table information.

Another disadvantage of clusters is a reduction in the performance of INSERT statements. This happens because of the additional complexity of the use of space and because there are multiple tables in the same block. The clustered table also spans more blocks than the individual table would, causing more data to be scanned.

Hash Clusters

A hash cluster is similar to a cluster but uses a hash function rather than an index to reference the cluster key. A hash cluster stores the data based on the result of a hash function (a numeric function that determines the data block in the cluster based on the value of the cluster key).

To find the data block in an index cluster, there must first be one or more I/Os to the cluster index to find the correct data block. In a hash cluster, the cluster key itself tells Oracle where the data block is. This arrangement can reduce to one the number of I/Os required to retrieve the row.

In contrast to the index cluster, which stores related data together based on the row’s cluster key value, the hash cluster stores related rows together based on their hash values.

The decision to use hash clusters is an important one. Hash clusters can be beneficial because, when they are effectively used, the requested data can be retrieved in just one I/O. Unfortunately, if a hash cluster is used on a table that is not a good candidate for hashing, performance can be severely degraded.

Although hash clusters can be used in a similar fashion to index clusters, you do not have to cluster the tables to use a hash cluster. In fact, in many cases, it is useful to create a single table as a hash cluster. By using hashing, you can retrieve your data with a single I/O rather than the multiple I/Os required to retrieve the same data using a B*-Tree index.

Because hashing uses the value of the data to calculate the data block the desired data is in, hashing is best used on tables that have unique values for the cluster key and that are queried primarily by equality queries on the cluster key. In the case of equality queries, the data is usually retrieved in one read operation. The cluster key need not be a single column; if the typical query uses an equality on a set of columns, use these columns to create a composite key.

A good candidate for hashing has the following properties:

  Unique cluster keys. Hashing performs best when the value of the cluster key is fairly unique. Because the data is laid out based on the key value, a column with a lot of duplicates is not a good candidate for the hash key.
  Equality queries. The majority of queries are equality queries on the cluster key. This type of query reaps the greatest benefit from the hash cluster.
  Static size. Hashing is optimal when the table or tables are fairly static in size. If the table stays within its initial storage allocation, you do not see any performance degradation from using a hash cluster; but if the table grows out of its initial allocation, performance can degrade. In this case, overflow blocks are required.
  Constant cluster key. Hashing can also degrade performance when the value of the cluster key changes. Because the location of the block in which the data resides is based on the cluster key value, a change in that value can cause the row to migrate to maintain the cluster.
  No table scans. Hashing can degrade the performance of table scans because the scan must read blocks that may not have much data in them. Because the table is originally created by laying out the data in the cluster based on the value of the cluster key, there may be some blocks that have few rows.

Do not use a hash cluster on a table if the application frequently modifies the cluster key or the table is constantly being modified. Because the cluster key is based on a calculation, you can incur significant overhead by constantly recalculating the key.

Anytime you have a somewhat static table with a unique column value or set of column values, consider creating a hash cluster.

Range Partitioning

A range partition allows you to divide the data in tables into separate pieces or ranges based on the value of data in a column or number of columns. By partitioning data in cases where table scans are common, such as data warehouses, you may be able to reduce the amount of data that is scanned.

You will be able to take advantage of these ranges by carefully coordinating the application and the database’s partitions. If the column that is specified in the range partition is used in the WHERE clause of a query, the optimizer will be able to convert a table scan into a range scan, where only certain partitions are scanned. This can reduce the amount of data being read dramatically. Any reduction in I/O is always a performance winner.


Previous Table of Contents Next
Используются технологии uCoz