The data mining tool will typically develop a model that can be applied to determine relationships.
The first model is the if/then model.
Example: IF a customer requests an address change THEN the customer is likely to purchase household goods.
With this model, a mail-order company would send a household goods catalog to all customers who change their address.
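An if/then model can be thought of as a condition paired with a prediction. The Python below is a minimal sketch, not any particular tool's rule format. The `Rule` class, the `requested_address_change` field, and the customer records are hypothetical. The sketch selects the customers who would receive the household goods catalog.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical customer record; the field names are illustrative only.
Customer = Dict[str, object]


@dataclass
class Rule:
    """IF <condition> THEN <prediction>."""
    condition: Callable[[Customer], bool]
    prediction: str


# IF a customer requests an address change THEN the customer is likely
# to purchase household goods.
address_change_rule = Rule(
    condition=lambda c: bool(c.get("requested_address_change")),
    prediction="likely to purchase household goods",
)


def mailing_list(customers: List[Customer], rule: Rule) -> List[str]:
    """Names of customers for whom the rule's IF part holds."""
    return [str(c["name"]) for c in customers if rule.condition(c)]


customers = [
    {"name": "Ada", "requested_address_change": True},
    {"name": "Ben", "requested_address_change": False},
    {"name": "Cy", "requested_address_change": True},
]
print(mailing_list(customers, address_change_rule))  # ['Ada', 'Cy']
```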
A second model is classification. The analyst defines the groups, and the data mining tool places items into each group. For example, a business may divide its credit rating system into four classifications.
The data mining tool uses the criteria the analyst has defined for these classifications to place each customer into one of the categories.
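One way a tool could place customers into analyst-defined groups is a nearest-centroid classifier. The analyst labels example customers with the four categories, the tool averages each category, and each new customer is assigned to the closest average. This is a technique chosen here for brevity, not necessarily what any given product uses. The category names, the two features, and all sample values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical features: (annual income in $1,000s, late payments per year).
Features = Tuple[float, float]


def fit_centroids(examples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Average the analyst-labelled examples of each category."""
    totals: Dict[str, List[float]] = {}
    for (income, late), label in examples:
        t = totals.setdefault(label, [0.0, 0.0, 0.0])
        t[0] += income
        t[1] += late
        t[2] += 1
    return {label: (t[0] / t[2], t[1] / t[2]) for label, t in totals.items()}


def classify(customer: Features, centroids: Dict[str, Features]) -> str:
    """Place a customer in the analyst-defined category with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(customer, centroids[label]))


# The analyst chooses the four categories and supplies labelled examples.
examples = [
    ((120.0, 0.0), "excellent"), ((100.0, 0.0), "excellent"),
    ((70.0, 1.0), "good"), ((60.0, 1.0), "good"),
    ((40.0, 3.0), "fair"), ((50.0, 3.0), "fair"),
    ((20.0, 8.0), "poor"), ((25.0, 6.0), "poor"),
]
centroids = fit_centroids(examples)
print(classify((105.0, 0.0), centroids))  # excellent
print(classify((48.0, 2.0), centroids))   # fair
```

Because income and late payments are on very different scales, a real model would normally rescale the features before measuring distance.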
A third model is clustering. Clusters are similar to the classification model, except that the data mining tool, rather than the analyst, determines the groups. Using the same example of customer credit ratings, the data mining tool may cluster customers into five categories.
The data analyst must decide how to use the clusters the data mining tool has developed. In this case, the analyst may decide that customers in the last category (trusted with only $100) should receive no further visits from salespeople.
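The difference from classification is that the tool forms the groups itself. The sketch below uses one-dimensional k-means, one common clustering technique; the text does not say which algorithm a given tool uses. The input values (a hypothetical "largest credit amount repaid on time" per customer) and the choice of three clusters instead of five are assumptions made to keep the example short.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, max_iterations: int = 100) -> List[List[float]]:
    """Group values into k clusters with Lloyd's algorithm (one-dimensional k-means).

    The number of clusters k is chosen up front, but which values fall into
    each cluster is decided by the algorithm, not by the analyst.
    """
    data = sorted(values)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(values)")
    # Deterministic start: k values spread evenly through the sorted data.
    step = (len(data) - 1) / max(k - 1, 1)
    centers = [data[round(i * step)] for i in range(k)]
    clusters: List[List[float]] = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


amounts = [90.0, 100.0, 120.0, 950.0, 1000.0, 1100.0, 4800.0, 5000.0, 5200.0]
print(kmeans_1d(amounts, 3))
```

Deciding what each resulting cluster means, for example which one corresponds to customers trusted with only small amounts of credit, is still left to the analyst.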
The next model is sequences. Sequences show a pattern of events over time that is likely to recur. A car dealership, for example, may use sequences to sell accessories for new cars.
Such a sequence may lead the car dealer to offer low-cost oil changes to all customers who purchase vehicles, because the sequence may indicate that oil changes performed at the dealership lead to additional profits from accessory and vehicle sales.
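A simple way to measure a sequence such as "oil change at the dealership, later an accessory purchase" is to count, among customers who had the first event, how many later had the second. This is a minimal sketch; the event names and customer histories are hypothetical, and dedicated sequence-mining tools use more elaborate algorithms.

```python
from typing import Dict, List


def sequence_confidence(histories: Dict[str, List[str]], first: str, then: str) -> float:
    """Share of customers with event `first` whose later history contains `then`."""
    with_first = 0
    followed = 0
    for events in histories.values():
        if first in events:
            with_first += 1
            # Only events after the first occurrence of `first` count.
            if then in events[events.index(first) + 1:]:
                followed += 1
    return followed / with_first if with_first else 0.0


# Hypothetical service histories, each in chronological order.
histories = {
    "c1": ["buy vehicle", "dealer oil change", "buy accessory"],
    "c2": ["buy vehicle", "dealer oil change", "buy accessory", "buy vehicle"],
    "c3": ["buy vehicle", "buy accessory"],
    "c4": ["buy vehicle", "dealer oil change"],
}
print(sequence_confidence(histories, "dealer oil change", "buy accessory"))  # 0.6666666666666666
```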
Another data mining model is market basket analysis. Market basket analysis looks at the relationship between products and determines which products are likely to be bought together. For example, a data mining tool may determine that bread and milk are likely to be bought together. Based on this information, a grocery store may place milk and bread in the same area of the store. The store hopes that a customer who comes in to buy milk will see the bread and purchase it.
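Market basket analysis is commonly expressed with two measures: support (the share of all baskets that contain a pair of products) and confidence (of the baskets containing one product, the share that also contain the other). The sketch below computes both for product pairs; the baskets are hypothetical sample data.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def pair_counts(baskets: List[Set[str]]) -> Counter:
    """Count how many baskets contain each pair of products."""
    counts: Counter = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets: List[Set[str]], antecedent: str, consequent: str) -> float:
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
counts = pair_counts(baskets)
pair, n = counts.most_common(1)[0]
print(pair, n / len(baskets))                # ('bread', 'milk') 0.6
print(confidence(baskets, "milk", "bread"))  # 0.75
```

Here bread and milk appear together in 60% of baskets, and 75% of milk buyers also buy bread, which is the kind of result that would justify shelving the two products near each other.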
The data mining tool uses several mathematical techniques to perform this analysis. The following is just a small list of the current techniques:
These techniques can all lead to the discovery of important data relationships, but each performs different functions and uncovers different kinds of information. It is therefore unwise to rely on a single tool that uses only one or two of these techniques. A more complete data mining tool set combines several tools that together cover the full range of techniques.
These data mining techniques typically need three sets of data to develop a model. The first is the training set, which is used to develop the initial models. The second is the test set, which is used to test the models created from the training set. If a model proves accurate on the test set, it can reasonably be used with real data. The third set is the application data: the data the model will actually be used against. As time passes, the model receives feedback on its accuracy, and each time feedback is received, the model determines whether it needs to be changed.
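The three data sets and the feedback loop can be sketched with a deliberately tiny, hypothetical model: a single cut-off value that predicts an outcome whenever a feature is at or above it. The record format, the every-third-record split, and the 80% accuracy floor that triggers retraining are all assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical record: (feature value, whether the outcome actually occurred).
Record = Tuple[float, bool]


def split_training_test(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Hold out every third record as test data; the rest is the training set.

    A deterministic split keeps the sketch reproducible; random sampling is
    also common in practice.
    """
    training = [r for i, r in enumerate(records) if i % 3 != 2]
    test = [r for i, r in enumerate(records) if i % 3 == 2]
    return training, test


def accuracy(threshold: float, records: List[Record]) -> float:
    """Share of records where 'value >= threshold' matches the real outcome."""
    correct = sum((value >= threshold) == outcome for value, outcome in records)
    return correct / len(records)


def train_threshold(training: List[Record]) -> float:
    """Build the model: choose the cut-off that best fits the training set."""
    candidates = sorted({value for value, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))


def needs_retraining(threshold: float, feedback: List[Record], minimum: float = 0.8) -> bool:
    """Feedback step: flag the model if its accuracy on new results drops too low."""
    return accuracy(threshold, feedback) < minimum


history = [(float(x), x >= 50) for x in range(0, 100, 5)]
training, test = split_training_test(history)
model = train_threshold(training)
print(model, accuracy(model, test))  # 50.0 1.0

# Later, outcomes observed on the application data shift; feedback reveals it.
feedback = [(float(x), x >= 80) for x in range(0, 100, 5)]
if needs_retraining(model, feedback):
    model = train_threshold(feedback)
print(model)  # 80.0
```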
After the models have been developed and tested, they can be presented to the executive community to use for decision making.
Before the relationships can be presented to executives, the data analyst must perform several functions:
IF a salesperson's last name is more than 10 characters long THEN the salesperson will sell 5% more than a salesperson with fewer than 10 characters in their last name
IF paper clips are purchased in bulk THEN 10% of the purchase cost will be saved
IF the company advertises in DBMS magazine THEN sales improve 0.2% (confidence factor 10%)
There are several ways of communicating this data to executives.