Oracle8 Server Unleashed: Data Warehouses

Data Mining Models

The data mining tool will typically develop a model that can be applied to determine relationships.

The first model is the if/then model.


Example:

IF    a customer requests an address change

THEN    the customer is likely to purchase household goods

With this model, a mail-order company would send a household goods catalog to all customers who change their address.
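As a minimal sketch of how such a rule might be applied, the following Python example selects the customers who should receive the catalog. The `Customer` record, its field names, and the sample customers are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    requested_address_change: bool  # hypothetical flag kept by the company


def catalog_recipients(customers):
    """Apply the rule: IF address change THEN likely to buy household goods."""
    return [c.name for c in customers if c.requested_address_change]


# Hypothetical sample data
customers = [
    Customer("Avery", True),
    Customer("Blake", False),
    Customer("Casey", True),
]

# Names of the customers who would be sent the household goods catalog
print(catalog_recipients(customers))
```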

A second model is classification. The analyst determines the groups and uses the data mining tool to place items into each group. For example, a business may have four classifications in its credit rating system:

• Great credit

• Average credit

• Bad credit

• Not enough information

The data mining tool will use these parameters to classify customers into one of the categories.
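One way a tool could place customers into the analyst's groups is a nearest-neighbor lookup against examples the analyst has already labeled. The sketch below assumes two hypothetical attributes (late payments in the last year and years of credit history) and made-up training examples; a real tool would use many more attributes and a more robust technique.

```python
# Hypothetical examples labeled by the analyst:
# (late_payments_last_year, years_of_history) -> classification
TRAINING = [
    ((0, 10), "Great credit"),
    ((1, 8), "Great credit"),
    ((3, 5), "Average credit"),
    ((4, 6), "Average credit"),
    ((9, 4), "Bad credit"),
    ((11, 7), "Bad credit"),
]


def classify(late_payments, years_of_history):
    """Return the label of the closest labeled example."""
    if late_payments is None or years_of_history is None:
        return "Not enough information"

    def squared_distance(example):
        (lp, yrs), _label = example
        return (lp - late_payments) ** 2 + (yrs - years_of_history) ** 2

    _features, label = min(TRAINING, key=squared_distance)
    return label


print(classify(0, 9))
print(classify(10, 5))
print(classify(None, 3))
```

The groups themselves never change here; the analyst fixed them in advance, and the code only decides which group each customer falls into.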

A third model is clusters. Clusters are similar to the classification model, except that the data mining tool determines the groups instead of the analyst. Using the same example of customer credit ratings, the data mining tool may cluster customers into five groups:

• Customer can be trusted with up to $5,000,000.

• Customer can be trusted with up to $2,000,000.

• Customer can be trusted with up to $500,000.

• Customer can be trusted with up to $100,000.

• Customer can be trusted with up to $100.

The data analyst must decide how to use the clusters the data mining tool has developed. In this case, the analyst may decide that all customers who are in the last category (trusted with only $100) should not be visited by any more salespeople.
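To illustrate how a tool can find groups on its own, here is a small sketch of k-means clustering on a single number per customer. The choice of k-means, the attribute (largest order paid on time), and the sample amounts are all assumptions made for illustration; the text does not say which clustering technique a given tool uses.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters (k >= 2), ordered from low to high."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced values as the initial centers.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


# Hypothetical data: largest order each customer has paid on time, in dollars
paid_orders = [90, 110, 120, 4800, 5200, 5100, 98000, 101000, 99500]
for group in kmeans_1d(paid_orders, k=3):
    print(group)
```

The tool only produces the groups. Deciding what each group means, and what to do about it, remains the analyst's job.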

The next model is sequences. A sequence is a pattern of events over time that is likely to recur. A car dealership may use sequences to sell accessories for new cars:

1. Client purchases vehicle.

2. Client uses dealership for oil changes.

3. Client purchases highly profitable accessories for vehicle.

4. Client purchases another car within three years from same dealer.

This sequence may lead the car dealer to provide oil changes for a low fee to all customers who purchase vehicles. The sequence may indicate that oil changes performed at the dealership will lead to additional profits in accessories and vehicle sales.
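A simple way to check how common a sequence is would be to count the customers whose event history contains the events in that order. The sketch below assumes hypothetical event names and customer histories; it measures only how often the pattern occurs, not whether the oil changes cause the later purchases.

```python
def contains_in_order(history, pattern):
    """True if the pattern's events occur in history in order (gaps allowed)."""
    events = iter(history)
    # Each membership test consumes the iterator up to the matching event,
    # so later steps can only match events that come afterwards.
    return all(step in events for step in pattern)


def support(histories, pattern):
    """Fraction of customers whose history contains the pattern."""
    matches = sum(contains_in_order(h, pattern) for h in histories.values())
    return matches / len(histories)


# Hypothetical event histories, oldest event first
histories = {
    "A": ["vehicle", "oil_change", "accessories", "vehicle"],
    "B": ["vehicle", "oil_change", "oil_change", "accessories"],
    "C": ["vehicle", "accessories"],
    "D": ["vehicle", "oil_change"],
}

pattern = ["vehicle", "oil_change", "accessories"]
print(support(histories, pattern))
```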

Another data mining model is market basket analysis. Market basket analysis looks at the relationship between products and determines which products are likely to be bought together. For example, a data mining tool may determine that bread and milk are likely to be bought together. Based on this information, a grocery store may place milk and bread in the same area of the store. The store hopes that a customer who comes in to buy milk will see the bread and purchase it.
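The bread-and-milk example can be sketched by counting how often pairs of products appear in the same basket and how often one product accompanies another. The baskets below are hypothetical, and the confidence calculation shown is one common measure rather than the method of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: one set of products per shopping trip
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing antecedent, the fraction with consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


print(pair_counts(baskets).most_common(1))    # the most frequent pair
print(confidence(baskets, "milk", "bread"))   # how often milk buyers buy bread
```

In this sample, three of the four baskets that contain milk also contain bread, which is the kind of relationship that might justify placing the two products near each other.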

The data mining tool uses several mathematical techniques to perform this analysis. The following is just a small list of the current techniques:

• Neural networks

• Decision trees

• Standard statistics

• Memory-based reasoning

• Genetic algorithms

• Link analysis

All of these techniques can lead to the discovery of important data relationships. Each performs different functions and discovers different pieces of information. It is therefore important not to rely on a single tool that uses only one or two of these techniques. Instead, it is beneficial to combine several tools that together cover the entire spectrum of techniques for a complete data mining tool set.

These data mining techniques typically need three sets of data to develop a model. The first set of data is the training set. This set is used to develop the initial models. The second set of data is the test data. The test data is used to test the models that were created using the training set. If a model proves to be accurate on the test data, it can be assumed to be suitable for use with real data. The third set of data is called the application data. This is the data the model will actually be used against. As time passes, the model will receive feedback on its accuracy. Each time feedback is received, it can be used to determine whether the model needs to be changed.
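The relationship between the training set and the test data can be sketched as follows. The records, the 75/25 split, and the trivial threshold "model" are assumptions chosen to keep the example short; they stand in for whatever technique a real tool would use.

```python
import random


def split_records(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, labeled_records):
    """Fraction of records for which the model predicts the known label."""
    correct = sum(model(x) == label for x, label in labeled_records)
    return correct / len(labeled_records)


# Hypothetical labeled history: (value, outcome) pairs
records = [(n, n >= 50) for n in range(100)]
training, test = split_records(records)

# "Develop" a trivial model from the training set only:
# the smallest training value that had a positive outcome.
threshold = min(x for x, label in training if label)


def model(x):
    return x >= threshold


# Check the model against data it has never seen before applying it
# to real (application) data.
print(accuracy(model, test))
```

Keeping the test data out of model development is the point of the split: a model that only performs well on the records it was built from tells the analyst little about how it will behave on application data.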

After the models have been developed and tested, they can be presented to the executive community to use for decision making.

Before the relationships can be presented to executives, the data analyst must perform several functions:

1. Eliminate erroneous relationships.


IF    Salesperson last name is longer than 10 characters

THEN    Salesperson will sell 5% more than a salesperson with fewer than 10 characters in their last name

This relationship, although statistically possible, is clearly unreasonable and thus should not be presented to an executive.

2. Eliminate unimportant relationships. An unimportant relationship is one that may provide insight into the business of the company but is of little significance. This type of relationship is not worth taking to the executive level.


IF    paper clips are purchased in bulk

THEN    10% of the purchase cost will be saved

Although this relationship is valid, it is not important enough to present to an executive.

3. Eliminate most relationships with low confidence levels. Only relationships that are highly correlated should be reported to executives.


IF    advertise in DBMS magazine

THEN    sales improve 0.2%

confidence factor 10%

This relationship shows that sales increased 0.2% in only 10% of the cases in which the company advertised in DBMS magazine. This confidence factor is not high enough to establish a true relationship.

4. Propose specific recommendations based on the relationships. For example, if the data shows a clear relationship between advertising in The Wall Street Journal and increasing sales, the data analyst should attempt to determine the proper increase in advertising in The Wall Street Journal for maximum profitability. The data analyst should present the relationship as well as the potential solution to the executive. Executives appreciate the extra step and often act on the proposal quickly.
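The first three screening steps can be sketched as a simple filter over the relationships a tool has produced. The `Relationship` record, the 0.8 confidence cutoff, and every confidence value other than the 10% quoted for the DBMS magazine example are assumptions for illustration. Whether a relationship is plausible or significant remains the analyst's judgment and is simply recorded here as a flag.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    condition: str
    outcome: str
    confidence: float   # fraction of cases in which the outcome followed
    plausible: bool     # analyst's judgment (step 1)
    significant: bool   # analyst's judgment (step 2)


def worth_presenting(rel, min_confidence=0.8):
    """Keep only plausible, significant, high-confidence relationships."""
    return rel.plausible and rel.significant and rel.confidence >= min_confidence


# Confidence values other than the DBMS magazine figure are hypothetical.
candidates = [
    Relationship("last name longer than 10 characters", "sells 5% more",
                 0.85, plausible=False, significant=True),
    Relationship("paper clips purchased in bulk", "10% of cost saved",
                 0.95, plausible=True, significant=False),
    Relationship("advertise in DBMS magazine", "sales improve 0.2%",
                 0.10, plausible=True, significant=True),
    Relationship("advertise in The Wall Street Journal", "sales increase",
                 0.90, plausible=True, significant=True),
]

for rel in candidates:
    if worth_presenting(rel):
        print(f"IF {rel.condition} THEN {rel.outcome}")
```

Only the last relationship survives the filter, and it is the one for which the analyst would go on to prepare a specific recommendation.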

There are several ways of communicating these relationships to executives:

• Email

• GroupWare

• Web page

• Meetings

• Published reports
