Previous Table of Contents Next


Choosing the Data Warehousing Technologies

After documenting the current systems, it is time to choose the technologies that will be used to implement the data warehouse. Several types of software must be chosen:

  Operating system of the data warehouse
  Database for the data warehouse
  Data warehouse modeling tools
  Data warehouse development tools
  Data warehouse monitoring tools
  Extraction and scrubbing tools
  Reporting tools
  OLAP server (optional)
  Data mining tools (optional)


Tip:  
This is a list of suggestions to keep in mind when choosing a vendor.
  Choosing the “best” product is not always the proper solution. Instead, choose a quality product that has a good chance of becoming an industry standard. Often this is more dependent on the company marketing the product than on the product itself. A company such as Microsoft or Oracle will have the marketing forces to drive a smaller competitor out of the market.
  Use a smaller vendor for custom needs (this is often the case for the extraction/scrubbing process). A small, startup company is more likely to give the needed attention than a large corporation with an immense number of clients.
  Choose a product that follows industry standards closely. This will allow the data warehouse team to easily change vendors in the future if deemed appropriate.

There are a large number of vendors supplying this software. It can be difficult to evaluate and choose the best product. The technology has the potential of making or breaking the data warehouse project. Therefore, it is imperative to spend time evaluating various software packages from several vendors.

Designing the Data Warehouse Model

The data model is a blueprint of how the data will look when it resides inside the data warehouse. It is quite different from the data models in operational environments. Typically, operational data models include the following:

  Relational model
  Hierarchical model
  Network model
  Flat files model
  Object-oriented model

The relational model is currently the most popular. It is capable of allowing a large number of small transactions. These transactions are typically used to support the organization. One example is the work of the data entry clerk. The data entry clerk enters one record at a time into the database. There may be thousands of clerks performing this function at the same time. This results in several thousand small requests to the database. The data warehouse, on the other hand, usually services a small number of large requests used for decision making. This special type of system requires a completely different model. The most popular data warehouse model is the star schema.

The Star Schema

The star schema is designed for slicing and dicing data into a format that is easy for executives to understand. The star schema is made up of two types of tables.

  Fact tables
  Dimension tables

Fact Tables

The fact table contains two types of columns. The first type of column contains the information that is likely to be calculated.

  Sale amount
  Percentage of margin
  Sales commission
  Discount
  Cost to make product

These items are usually numbers, but can be text. These facts make up the grain of the table. The other type of column in the fact table is the reference column to the dimension tables. These columns store the primary key values of the dimension tables.

The fact tables are usually the largest tables in the database. The data warehouse design team must work to find a balance between the size of the table and the usability of its data.

Dimension Tables

The dimension tables contain data that will be used to selectively include or exclude data that will be returned from the fact table. The data stored in the dimension table is usually text, but sometimes it will be a number. An example of this is a zip code. Although the zip code is a number, it will not be used in mathematical functions such as the average, sum, or standard deviation. Instead, it will be used to determine the data from the fact table that will be returned. An executive might ask for the total sales from the 90210 zip code. This will use the zip code dimension field to select only the sales in the fact table that occurred in that zip code. An executive is unlikely to ask the data warehouse to determine the average zip code. The dimension tables will usually be much smaller than the fact table.

Figure 30.2 shows five tables. The fact table is the Sales table. This company sells service contracts for software support. The first four columns of this fact table are keys used to join with the four dimension tables.


Figure 30.2.  A star schema.

  service_id
  time_id
  sales_person_id
  customer_id

The other columns in the Sales table make up the grain of the table.

  amount_of_sale
  number_of_months
  cost_to_provide_service
  sales_commission

Time Dimension

The first dimension table is the time dimension. Many people question the need for the time dimension. They often wonder why a simple date column would not be sufficient. In the OLTP environment they would be correct. However, in a data warehouse, the data is being used for decision making and needs to be tied into the business environment. The time dimension can recognize business events that will be important to the users, such as the following:

  Fiscal year
  Fiscal quarter
  Day of the week
  Holiday
  Special event

The company may notice a jump in sales on Wednesday, February 14. This may seem abnormal except that February 13 is a special event. February 13 is the date of a new bonus structure for the sales team. Most salespeople wanted to close sales after the new structure was in place. The company may have lost sales as a result of procrastination by the sales force. Management may use this information to adjust the way bonus structures are implemented.

The following are some questions that may be answered with the time dimension:

  What was the average sale during the fourth quarter?
  What was the total sales commissions paid for sales made on Fridays during the last three months?
  How many sales are usually made on a holiday?
  When are we most likely to make a sale: at the beginning, middle, or end of the quarter?


Previous Table of Contents Next
Используются технологии uCoz