Previous | Table of Contents | Next |
A traditional index sorts data in a column or columns using a binary tree to store the ordered sequence of each data value. For instance, if you index on a unique key, a traditional b-tree (binary tree) index will sort out the values into a search tree structure such as is shown in Figure 4.9.
Figure 4.9. A binary tree 32 levels deep can store a huge amount of index nodes.
If you assume that a pure binary tree with two branches per node, a huge amount of data can be stored this way so that a search of only a few nodes will find one row out of billions. For instance, if your tree is 32 levels deep in a pure binary tree, you can search 4.4 billion rows with 32 reads.
Of course, real-world indexes are rarely this efficient. Yet without the use of indexes, very little computer business would ever get done.
The problem with these binary indexes is that many times you need to search on a value that is not unique. Consider the simple table Army:
Army |
---|
Name |
Rank |
Serial No |
For instance, in a military database of the complete army, you have one serial number per soldier. When you search on this column you retrieve only one row. Even if you search by Name, in most cases, you will only receive a few rows. Lets say you wanted to search by rank in this database. You might have 20 ranks for one million people. A binary index like this is flat looking and needs to be stored in a huge amount of space without improving the search. Even if you know the rank, you are still sifting through many identical rows of data and a large number of rows will need to be returned.
Rows like Rank are said to have a low cardinality, because there is a lower percentage of different values occurring in the data.
Because of the need to search, summarize, and drill-down with many of todays databases, especially in data warehouse OLAP systems where one might want to view trends regarding soldiers based on rank, bit-mapped indexes approach the problem of searching a low cardinality column differently.
All rows are given a numerical value, a set of bits that represent them. With 32 bits you can represent 4.4 billion rows, as you saw earlier. You could store that many rows, using simple 4-bit variables in the C language. Obviously Oracle uses a more complex representation, but the result is that, in small amounts of space, a bit representation of a row_id that usually has many bytes can save space by a factor greater than 10. At runtime, these bits are converted to row_ids. See Figure 4.10.
Figure 4.10. A bit-mapped index saves a great deal of space, yet involves an additional conversion to row_id.
This saves space and file search time for the Oracle database. The only increase in time with bit-mapped indexes is the conversion from bit representation back to row_ids.
Creating a bit-mapped index in Oracle8 Enterprise Edition is not a big production; only the will to do so is required and using the new reserved word bitmap which appears after the create statement in SQL used to create indexes:
create bitmap index index_name on table(column);
As mentioned earlier, due to the new hardware architectures that include multiple CPUs, Oracle has felt compelled to offer the ability to split a query over many CPUs to receive a faster result.
Consider a company with employees who are listed in the employee table residing in a database. For each sale the company makes, recorded in the Sale table, the employee who made the sale is related back to the employee table by using the salesperson_id:
EMPLOYEE SALE employee_id -> salesperson_id
Select all the employees who sold over 100,000 dollars:
select employee_id from employee where 100000 < ( select sum(total_sale) from sale where salesperson_id = employee.employee_id )
In this query you can see that if you break up the inner select from the outer select and allow different CPUs to work on the two different select statements, you will gain in response time.
Like the parallel query, parallel DML refers to parallel processing of DML statements. DML stands for Data Manipulation Language. The three DML statements in SQL that are not queries, are the Update, Insert, and Delete statements.
An example of parallel DML would be an update of all salaries for any salesperson who sold over $100,000 of products. Again, you would use the employee table and the sale table. Give these smooth talkers a $1000 raise:
update employee set salary = salary + 1000 where 100000 <= ( select sum(total_sale) from sale where salesperson_id = employee.employee_id )
In this query you again can see that having a separate CPU work on the inner select statement and another CPU working on the actual update, the parallel option will save you time.
Even with a simpler update, Oracle needs not only to update the column you specify, but if the column is indexed, Oracle will need to update the index, write all this information to the redo logs, and move the old data to the rollback segments. This processing can be more readily accomplished with multiple CPUs (see Figure 4.11).
Figure 4.11. Multiple CPUs can make the tasks of updating an Oracle database easier.
Oracle8 Enterprise Edition allows multiple CPUs to scan index pages together. An index page is usually a 2KB block that contains the value you are indexing and the row_id, meaning the location, of that row on disk. By offering the ability to scan many pages at once with different CPUs, Oracle speeds up the process of browsing the nodes of an index b-tree.
A star query is a way of creating a data cube for an OLAP database. Instead of aggregating values at run-time, the data cube keeps these running totals at all times in a cube. Since a relational database deals with tables and not cubes, a star schema is simply a tabular representation of a data cube containing aggregate values.
To be able to scan the many combinations and aggregations of this data cube in parallel is another example of how Oracle intends to take the lead in OLAP processing, thus enhancing its position in the world of data warehousing.
Previous | Table of Contents | Next |