Analytic Techniques: Simple Profiling
Simple Profiling

The simplest way to examine data is to profile individual variables from a database. This is accomplished with frequency distributions. The following bar graph shows a simple profile of the age distribution of customers:
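As a minimal sketch of how such a frequency distribution might be computed, the Python below tallies one categorical variable and prints each category's share of records as a text bar chart. The age bands and counts are hypothetical: they were chosen only to agree with the percentages quoted in the text, they are not the data behind the original graph, and the name `frequency_distribution` is an illustrative helper rather than part of any tool mentioned here.

```python
from collections import Counter

# HYPOTHETICAL counts per age band (100 customers in total).
# Chosen to agree with the percentages quoted in the text;
# not the data behind the original graph.
AGE_BAND_COUNTS = {
    "18-24": 8,
    "25-34": 15,
    "35-44": 29,
    "45-54": 30,
    "55-64": 12,
    "65+": 6,
}

# Expand the counts into one age-band label per customer record,
# which is the shape a database column would usually have.
customer_age_bands = [
    band for band, n in AGE_BAND_COUNTS.items() for _ in range(n)
]


def frequency_distribution(values):
    """Return {category: percent of all records} for a single variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


profile = frequency_distribution(customer_age_bands)

# Print a simple text bar chart, one '#' per percentage point.
for band in sorted(profile):
    pct = profile[band]
    print(f"{band:>6} {pct:5.1f}% {'#' * round(pct)}")
```

The same function could be applied to any other single variable, such as income band or family size, which is all that univariate profiling involves.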
We can see that the two largest age groups, 35-44 and 45-54, account for nearly 60% of customers, and that the single largest group, ages 45-54, accounts for 30% of all customers. And, of course, we can produce similar graphs for other variables, such as income, family size, home value, education, and so forth.

While such univariate (i.e., one variable at a time) profiles may be useful for simple tasks such as verifying the integrity of variables in a database, it is unwise to base targeting definitions or other strategic decisions on them. In this case, what we really want to know is this: How do customers differ from non-customers? The next bar graph demonstrates how much more useful this understanding of differences really is:
Now we can see that although the 45-54 age group contains nearly one-third of our customers, it contains fully 40% of the non-customers. For targeting purposes, it may make much better sense to focus on the 35-44 age group and, secondarily, on the 25-34 age group: both show much higher penetration of customers than the 45-54 age group, and together they account for 44% of all customers. This is a very different conclusion from the one suggested by the previous simple profile, which would have been misleading.

In fact, the main goal of nearly all data mining should be to identify valid and reliable patterns that are predictive of similarities within, and differences between, the populations represented in the database. Simple profiling of only one population (e.g., customers; high-value customers; bad credit risks; people who are taking a new, experimental drug; etc.) has severe limitations, and can even lead to a wrong or dangerous conclusion.

This is most apparent, of course, in situations such as clinical drug trials, where the inclusion of a control or placebo group, as well as one or more treatment groups, is standard procedure. However, it also applies to virtually any data mining situation where we are trying to find useful patterns of information, such as target marketing, manufacturing defect analysis, credit or fraud analysis, response-to-promotion analysis, etc. We almost always want to know how the target market differs from the non-target; how the loyal or high-spending customer differs from the non-loyal or low-spending customer or the non-customer; how the good credit risk differs from the poor one; or, in general, how the more successful outcome differs from the less successful outcome.
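The customer versus non-customer comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentages are hypothetical (the customer shares match the figures quoted in the text, while the non-customer shares other than the 40% for ages 45-54 are invented), and the index used here, customer share divided by non-customer share times 100, is one common way to express relative penetration, not necessarily the measure behind the original graph.

```python
# HYPOTHETICAL share of each population (in percent) by age band.
# Customer shares agree with the figures quoted in the text; the
# non-customer shares are invented, except 40% for ages 45-54.
customer_pct = {
    "18-24": 8, "25-34": 15, "35-44": 29,
    "45-54": 30, "55-64": 12, "65+": 6,
}
non_customer_pct = {
    "18-24": 10, "25-34": 10, "35-44": 18,
    "45-54": 40, "55-64": 14, "65+": 8,
}


def share_index(target_pct, comparison_pct):
    """Index each category's share in the target population against
    its share in the comparison population.

    100 means equal representation; above 100 means the category is
    over-represented in the target; below 100, under-represented.
    Categories with a zero or missing comparison share are skipped.
    """
    return {
        category: 100.0 * target_pct[category] / comparison_pct[category]
        for category in target_pct
        if comparison_pct.get(category)
    }


index = share_index(customer_pct, non_customer_pct)

# Rank age bands from most to least over-represented among customers.
ranked = sorted(index.items(), key=lambda item: item[1], reverse=True)

for band, value in ranked:
    print(
        f"{band:>6}  customers {customer_pct[band]:3d}%  "
        f"non-customers {non_customer_pct[band]:3d}%  index {value:6.1f}"
    )
```

With these illustrative numbers, the 45-54 band is the largest customer group yet indexes below 100, while 35-44 and 25-34 rank first and second, which mirrors the reversal of conclusions described in the text.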
Copyright © 1998-2008 SmartDrill. All rights reserved.