Previous Table of Contents Next


Chapter 30
Data Warehouses

by Matt Larson
In This Chapter
•  Introduction
•  The Six Steps to Creating a Data Warehouse
•  Document the Operational Environment
•  Choosing the Data Warehousing Technologies
•  Designing the Data Warehouse Model
•  Monitoring Tools
•  Develop the Extraction and Scrubbing Routines
•  What About Data Marts?
•  Metadata
•  Executive Information Systems
•  Data Mining (Panning for Gold)

Introduction

Businesses have often understood how important operational data is to the survival of the company. They spend millions collecting data during the course of business. Unfortunately, most businesses fail to properly utilize the data. Once the operational benefits have been gained, the data is forgotten. Due to the increasingly competitive marketplace, businesses are looking to their data to gain an edge over the competition.

What Is a Data Warehouse?

A data warehouse is a central repository of data used for analysis by decision makers. The data is pulled from the operational data stores into the warehouse where it is massaged into a format that is easy to utilize. The decision makers receive a simplified business view of the company data that can easily be used for analysis.

Why Would a Company Need a Data Warehouse?

Established 20 years ago, the fictitious Acme company has grown into a worldwide corporation that sells office supplies to over 30,000 businesses on four continents. Acme has always understood that current technology is a vital tool. In the last 10 years, Acme has amassed a computer network that consists of over 700 servers. Some of these servers are mainframes running on MVS, some are mid-range systems running on VMS and UNIX, and the newest servers are running on Windows NT. This open environment has been beneficial at the business unit level. Each business unit was allowed to purchase the best possible solution to its problem. However, it has become increasingly difficult to answer critical business questions that span more than one business unit.

The CEO recently contemplated expanding the sales force by 10%. In order to make an informed decision, the CEO asked about the relationship between the number of salespeople and profitability. There are several pieces of information needed to answer this question.

  Sales data
  Accounting data
  Employment data
  Market and economic trends

Data analysts ran into several problems while attempting to answer this simple question (see Figure 30.1).


Figure 30.1.  Operational data.

1.  Data cannot be found. Often the data needed to answer a business question is available within the corporate data stores. Unfortunately, this data may reside on any server within the enterprise. Without a central data catalog, finding the data is so difficult that the question often remains unanswered.
2.  Data is duplicated. Throughout the typical organization, many processes are duplicated. In Acme, the company records orders into a database as the orders are taken. The shipping department also has a separate database that records the orders actually shipped. Some business processes alter the order database, some alter the shipping database, and some affect both databases. The data analysts are forced to choose between two databases, which may both be partially incomplete.
3.  Data is in unusable form. Unusable data is data recorded or stored in a format that cannot easily be utilized. An example is data stored on tape. Although the data does exist, it cannot be accessed if it’s sitting on a closet shelf. Another example is data collected in a custom program format. This data cannot be retrieved without the original program that created it.

Eventually the data analysts gather the data they need. They compile a report and present it to the CEO. The report says that salespeople added to the sales force in the past year have not contributed to a higher profit. Instead, they appear to be taking sales away from existing salespeople. The CEO takes one look at it the report and asks about the profit potential of additional advertising. The data analysts, having spent two weeks on the sales force report, look at each other in amazement. There has to be a better way.

There is—data warehousing.

The Six Steps to Creating a Data Warehouse

There are six steps involved in creating a data warehouse. The steps are usually performed in sequential order, although some may be performed together.

1.  Document the operational environment.
2.  Choose the data warehousing technologies.
3.  Design the data warehouse model.
4.  Develop the extraction and scrubbing routines.
5.  Build the metadata repository.
6.  Create the multidimensional reports and cubes.

Document the Operational Environment

The data warehouse is going to pull data from all of the important systems in the company.

The first step in documenting the operational environment is to identify all of the systems. Some of them will be easy to spot.

  Mainframes that run the company’s order database.
  UNIX midrange systems that run the company’s internal payroll system.
  Lotus Notes server that holds group data.

Some of the systems will be harder to identify.

  Excel spreadsheets kept by managers to track projects.
  Old systems that are no longer used but contain needed historical data.
  Systems that are referenced only through another system. This step seems relatively easy, yet it is not uncommon for a system to go unnoticed. This type of mistake can be costly to fix.

The second step in documenting the operational environment is to determine what data is stored in each system. This must be done at the column level. In other words, the columns for every important table must be clearly understood. Depending on the environment, this step may take months to finish. For example, Oracle Applications 10.7 contains thousand of tables with valuable data. All of these tables must be fully understood and documented. This will become even more difficult with older systems. It may be difficult to find an analyst who understands how to interpret the table structures.


Previous Table of Contents Next
Используются технологии uCoz