The extraction team must keep several things in mind while developing extraction and cleansing processes.
The extraction processes must also verify the data before it is released to the users. This verification process performs several basic functions. One of these functions can use statistical formulas to determine the probability of data validity.
For example, a company may have always had between one and two million dollars in sales per day. If new data from the operational environments shows sales of only one hundred thousand dollars per day, the data must be verified by a data analyst before it is released to the user community. This type of verification performs three functions.
First, it checks the validity of the data warehousing extraction processes. In this case, the extraction process may have performed an improper math function as it pulled the data into the warehouse.
Second, it checks the validity of the operational processes. For this anomaly, the operational processes may have lost the records of some sales.
Third, the verification process may alert management to business problems. If total sales have actually fallen by more than 90 percent, management will want that information as quickly as possible.
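The statistical check described above can be sketched in a few lines. This is only an illustration, not the method of any particular warehouse product: the function name `needs_review`, the `max_z` threshold of three standard deviations, and the sample sales history are all assumptions made for the example.

```python
from statistics import mean, stdev

def needs_review(history, new_value, max_z=3.0):
    """Return True when new_value should be held for a data analyst.

    history   -- past daily totals (hypothetical sample data below)
    new_value -- the newly extracted daily total
    max_z     -- assumed threshold: how many standard deviations from
                 the historical mean are tolerated before holding data
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every historical value is identical; any change is suspicious.
        return new_value != mu
    return abs(new_value - mu) / sigma > max_z

# Hypothetical history: daily sales between one and two million dollars.
history = [1_200_000, 1_500_000, 1_800_000, 1_100_000,
           1_950_000, 1_400_000, 1_650_000]

print(needs_review(history, 100_000))    # the anomalous day from the example
print(needs_review(history, 1_600_000))  # an ordinary day
```

A check like this only flags the value. Deciding whether the cause is the extraction process, the operational system, or the business itself still falls to the analyst.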
The verification process can be worth its weight in gold. Because almost all operational systems will tie into the data warehouse, the verification process can act as a company-wide alarm system.
Note:
A Fortune 500 company had a problem in its accounts receivable system, which had underbilled customers for a total loss of over 30 million dollars. The data warehouse verification process detected the error and reported it to a data warehouse analyst. The money that was recovered was enough to pay for the entire data warehouse.
Data marts are defined in several ways throughout the industry. Unfortunately, the term has never settled into one definition, which explains the confusion that arises whenever the topic of data marts is brought up.
Most of the definitions can be classified into two major categories (see Figure 30.5).
Figure 30.5. Two definitions of data marts.
Data marts are a fairly recent addition to the data warehousing movement. They have become increasingly useful for several reasons.
First, they can be developed in a short period of time (usually 3 to 6 months per data mart). Because of this rapid pace, they may begin paying for themselves sooner than a data warehouse.
Second, the designers learn to build better data marts as they receive feedback from existing ones. A team is usually much more sophisticated in data mart design after building its first data mart.
Another reason for the popularity of data marts is the ability to pick a small set of data for the first iteration of the data mart process. Several factors are considered.
After management has seen the benefits of a data mart, it is easier to receive funding for additional data marts or an entire data warehouse.
Another benefit of data marts is cost. A data mart usually costs several hundred thousand dollars to implement. This relatively small amount of money is easier for management to approve than a multi-million-dollar data warehouse. Because financing is often done at the departmental level, it is often easier to gain approval for a data mart: the department understands how the data mart will be used. It is much harder to find financing at the corporate level.