Previous Table of Contents Next


Scalability

The data warehouse design team must ensure that the data warehouse is scalable. It is easy for programmers to forget scalability issues and create a process that is destined to be rewritten. The programmers and designers must test all processes at the highest possible load level that can be expected in the data warehouse. Scalability of the data warehouse is not difficult to achieve if it is remembered. The data warehouse must be scalable in many ways.

1.  Extraction and scrubbing process. The designers need to ensure that the functions being performed on the data can be expanded easily without recoding the process. For example, if the current extraction process is pulling data from an Oracle database, it may not need to use the parallel query option. However, as the data grows in size, the parallel query option may be needed. If the extraction programmer allows a parallel clause to be determined at run time, the process may begin using the parallel query option if it is needed.
2.  Table structures. The data warehouse design team may decide to perform a join between two tables as they would in an operating environment. A more scalable solution may be to combine the tables into one nonrelational table. In other situations it may be more scalable to use object tables.
3.  Client processes. The client processes must be programmed to allow the data warehouse to scale to a large number of clients. If the client causes a large burden on the data warehouse by issuing unneeded or incorrect queries, the data warehouse will be limited to a small number of users.

Utilizing Oracle8 Features in the Data Warehouse

The data warehouse design team should be aware of the features of the database that contains the data warehouse. It is often appropriate to bring in a database guru on that particular database. Oracle8 has several features that will enhance the data warehouse environment. The three new Oracle features that I have found to be most helpful are partitioning, indexing, and hints.

Partitioning

Partitioning allows a table or index to be split into several physical structures while maintaining a single logical structure. Partitioning is often used in data warehousing to split a table into sections based on a time series. Figure 30.3 shows a Sales table partitioned by year. The 1994 and 1995 partitions currently reside in tablespaces that are offline. This typically means that they are stored on tape and are not accessible without the data being reloaded. The 1996 and 1997 partitions of the Sales table reside in read-only tablespaces. These partitions cannot be updated without the intervention of the database administrator. The 1998 partition is in a normal read/write tablespace. The users of this partition are only granted select (read-only) access. However, the extraction and scrubbing processes still have the ability to add data to those tables.


Figure 30.3.  A partitioned Sales table.

Indexing

Oracle8 has five types of indexes:

  B-tree (most common type of index)
  B-tree cluster (used for clusters)
  Hash clusters (uses a mathematical function to determine matching rows)
  Reverse key (distributes I/O access to the index)
  Bitmap indexes (used for low cardinality columns)

Each indexing scheme is valuable in certain situations and most data warehouses use all of them.


Tip:  
Bitmap indexes were designed by Oracle Corporation for data warehouse applications. They have proven to be a gold mine for the data warehouse designers who use them.

Hints

Oracle has provided some hints that cannot be passed up. The most important hint to the data warehouse is the star hint. The star hint tells the database to join the large fact table after joining all of the smaller dimension tables.

Monitoring Tools

Monitoring tools are used to oversee several aspects of the data warehouse. They monitor the speed of the extraction and scrubbing processes, as well as the queries being issued by the users. As they collect information, the data warehouse team can adjust the structure of the data warehouse to improve performance for the users. They may also detect resource that are being wasted by some users. For example, a manager in a small department may ask for the total breakdown of all sales for every product the company sells. This query may take two days to run and use 10% of the data warehouse resources. The data warehouse team may inquire into the relevance of this query and may ask that manager to stop running it.

The monitoring tool should also be able to detect when data is being queried unnecessarily. This often happens when the metadata is incorrect. The metadata may point the user to the incorrect data. The user will realize he or she has received the incorrect data and continue on to find the correct data. The data warehouse team may incorrectly optimize access to the bad table when they should be fixing the metadata that is pointing users to the wrong data. Monitoring tools can provide the ability to notice trends that may point to incorrect or misleading metadata.

The monitoring tools also record the length of time it takes to complete a query. If the query runs for an extended period of time, the monitoring tool should alert the design team that something may be wrong. The design team can then decide the proper course of action.

Develop the Extraction and Scrubbing Routines

The extraction and scrubbing processes can be one of the most time-consuming efforts of the data warehouse project. Often this step will take up the largest portion of resources allocated to the data warehouse project.

The first step in the extraction and scrubbing processes is to determine which data from the operating environment has changed since the last time the data warehouse was refreshed. This step is difficult because it may pull data from hundreds of different sources. It will be easy to determine the changed data for some sources. Others will require much more effort.

There are three basic ways to determine what data has changed.

  Auditing logs
  Full refreshes with data selection performed inside the data warehouse
  Code changes in the existing applications


Previous Table of Contents Next
Используются технологии uCoz