Tips and White Papers
This section offers brief tips about data mining,
as well as some white papers covering particular topics in more depth.
Check back often, as we continue to add material to this new section.
- White Paper:
- Using multiple modeling techniques on the same data set
For a more detailed discussion of this and other related topics, please
read our longer White Paper about Creative Data Mining.
Many people who buy syndicated data to overlay on their proprietary
databases try to use a variety of syndicators on a rotating basis, in
order to keep each syndicator "hungry." But is this always the
best strategy? Some syndicators' data are actually better than others'
(e.g., in terms of having less missing data), particularly for certain
product or service categories (although, in the interest of fairness,
we will not single out any one syndicator here).
And if you use the same syndicator over a period of time, you can
often get discounts on overlays. The syndicator also gets to know you
better, can be more helpful with advice, and tends to be quicker to
fix errors and go the extra distance for a loyal customer.
Also, if you do not yet have a large marketing database up and running,
but could benefit from data mining to help you with market segmentation,
targeting or list selection, there are several creative alternatives you
might want to investigate. For example, some of the same survey-based
syndicated services that supply the data used by market researchers,
media researchers and media planners will also sell you record-level
data (i.e., household- or respondent-level data records) for limited
research use.
You can buy record-level data from national surveys for just your
industry, and these data often include a great deal of detail on your
brand as well as competitors' brands. In addition to category and brand
usage data, you get the demographic data that the syndicator routinely
collects from the same respondents who provide the survey data on product usage.
This is almost as good as having your own in-house marketing database,
and at a fraction of the cost. (And, in some respects, these data can
actually be better than data in a proprietary marketing database, because
you get information on competitors' customers as well as your own.) The
data are available in formats that allow easy importing into statistical
and data mining analytic packages, for analysis by either your own staff
or outside consultants.
Other syndicators that run ongoing omnibus panel surveys, often used
by market researchers as a cost-efficient alternative to customized tracking
or market definition studies, are another option. As with the media data
syndicators, you can get detailed demographic and lifestyle data, but
you can also add some customized, proprietary questions to the mail or
telephone panel survey to suit your needs.
If you have proprietary data from a large-scale market definition
or tracking study, you can often use these data for advanced data mining.
Many times, the research suppliers who conduct, tabulate and report the
results of these customized, client-proprietary research studies base
their research report on simple banner-and-stub crosstabulations of the
data. (If you have read the Analytic Techniques section of our web site,
then you already know how limited, and even potentially misleading, such
simple bivariate analyses of data can be.)
By re-analyzing these studies, usually for a fraction of their original
cost, you can extract much richer and more actionable knowledge than you
got with the original analysis and report. SmartDrill staff have performed
many such re-analyses of data that were just collecting dust, and have
wowed clients with the new understanding gleaned from these studies.
Using Advanced Data Mining Techniques to Create a Bridge Between Proprietary Market Research Data and Large-scale Geo-demographic Targeting Analysis
Did you know that with advanced data mining techniques your existing
survey research data can often perform double duty as a bridge to larger-scale
geo-demographic analysis? For example, many times a retailer has attitude
and usage data, as well as key demographic data, from a recently conducted
market research study. The results of various advanced data mining analyses
of these data can be meaningfully projected onto units of micro-geography,
to assist management with retail site selection, promotion targeting,
etc.
You don't have to pay a geo-demographic syndicator a large fee for
geocoding the data, overlaying their proprietary clustering codes, and
analyzing the enhanced data. Instead, you can use much less expensive
census data (which many retailers already have in-house) in conjunction
with your own market research data, to achieve powerful results.
Here's a simple example. Let's say that you have conducted a survey
that includes items measuring customer loyalty, heaviness of spending,
or usage of a particular retail department. If your survey also includes
standard, key demographic classification questions, then you can use advanced
data mining techniques to build a predictive model. The dependent variable
could be any of the aforementioned measures (loyalty, spending or department
usage), and the demographics are the predictors in the model. Once you have a satisfactory model, you
can use the results of the model to score units of micro-geography, much
the same as you would use modeling results to score households (or businesses)
in a customer or prospect database.
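As a rough illustration of the modeling step, here is a minimal Python sketch. It assumes a single demographic predictor (age group) and a numeric loyalty rating; the records, field names and the function `fit_dummy_coded_model` are all hypothetical. With only one categorical predictor, a dummy-coded least-squares fit reduces to group means, which keeps the sketch dependency-free. A real model with several predictors would need a proper regression routine.

```python
from collections import defaultdict
from statistics import mean


def fit_dummy_coded_model(records, predictor, outcome, reference):
    """Least-squares fit of an outcome on one dummy-coded categorical predictor.

    With a single categorical predictor the solution is exact: the intercept
    is the reference category's mean outcome, and each coefficient is that
    category's mean outcome minus the reference mean.
    """
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec[predictor]].append(rec[outcome])
    means = {cat: mean(vals) for cat, vals in by_category.items()}
    intercept = means[reference]
    coefficients = {
        cat: m - intercept for cat, m in means.items() if cat != reference
    }
    return intercept, coefficients


# Hypothetical survey respondents: age group and a 0-10 loyalty rating.
survey = [
    {"age_group": "18-34", "loyalty": 4},
    {"age_group": "18-34", "loyalty": 6},
    {"age_group": "35-54", "loyalty": 7},
    {"age_group": "35-54", "loyalty": 9},
    {"age_group": "55+", "loyalty": 6},
    {"age_group": "55+", "loyalty": 8},
]

intercept, coefficients = fit_dummy_coded_model(
    survey, "age_group", "loyalty", reference="18-34"
)
print(intercept, coefficients)
```

In this made-up sample, the 35-54 group carries the largest coefficient, so it is the group that would pull a score upward most strongly.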
The trick is to translate the demographics from the survey respondent
level to the micro-geographic level. Again, to use a simple example, let's
say that you have discovered from the survey-based model that particular
age groups are more loyal (or heavier spenders, etc.) than other age groups.
Rather than scoring a household-level or business-level file using the
various categories of age, you can weight the model coefficients
by the proportions of a micro-geographic unit's population falling into
each age group.
For example, each age group's coefficient from a regression model can
be multiplied by the proportion of a micro-geographic unit's population
falling into that age group, and the products summed. If a particular age
group has a large positive coefficient, and/or it represents a
disproportionately large share of the unit's population, then that unit will
achieve a higher model score.
This scoring procedure proceeds similarly for the various categories
of the other demographic variables from the survey research-based model.
After all micro-geographic units of interest have been scored with all
model parameters, standard tabulation and mapping routines can be used
to perform retail siting, promotion targeting, and even plan-o-gramming.
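The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is linear, reference categories have a coefficient of zero and are simply omitted, and the intercept, coefficients, unit names and population shares are all invented for the example. The function `score_unit` is hypothetical, not part of any package.

```python
def score_unit(intercept, coefficients, unit_proportions):
    """Score one micro-geographic unit from a linear model.

    coefficients: {variable: {category: coefficient}}; reference categories
        are omitted (their coefficient is zero).
    unit_proportions: {variable: {category: share of the unit's population}}.
    """
    score = intercept
    for variable, category_coefs in coefficients.items():
        shares = unit_proportions[variable]
        for category, coef in category_coefs.items():
            # Weight each coefficient by the share of the unit's population
            # falling into that category.
            score += coef * shares.get(category, 0.0)
    return score


# Hypothetical model results (not taken from any real study).
intercept = 5.0
coefficients = {
    "age_group": {"35-54": 3.0, "55+": 2.0},
    "income": {"high": 1.0},
}

# Hypothetical census-style proportions for two micro-geographic units.
units = {
    "tract_A": {
        "age_group": {"18-34": 0.5, "35-54": 0.25, "55+": 0.25},
        "income": {"low": 0.5, "high": 0.5},
    },
    "tract_B": {
        "age_group": {"18-34": 0.25, "35-54": 0.5, "55+": 0.25},
        "income": {"low": 0.75, "high": 0.25},
    },
}

scores = {name: score_unit(intercept, coefficients, p) for name, p in units.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)   # tract_A scores 6.75, tract_B scores 7.25
print(ranked)   # highest-scoring unit first
```

Here tract_B outranks tract_A because a larger share of its population falls into the age group with the strongest coefficient, even though tract_A has more high-income households. The ranked list is what would feed the tabulation and mapping step.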
It's a nifty trick, and it works even better if you plan ahead by
designing your market research to include demographic items that have
the same category breaks that standard census variables have. And if you
already have a site license that allows you to use cluster codes from
one or more of the popular syndicators such as Claritas (PRIZM), Donnelley,
etc., then that's even better. The point is that advanced data mining
techniques can significantly improve the knowledge discovery and application process,
whether or not you have a site license for a proprietary geo-demographic
and lifestyle targeting system.
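To show why matching category breaks matter, here is a small Python sketch of the translation step. It assumes the census-style bands are finer than the survey categories and nest cleanly inside them, which is the situation you create by planning the survey's breaks in advance. The bands, shares and the function `collapse_to_survey_breaks` are hypothetical.

```python
def collapse_to_survey_breaks(census_shares, band_to_survey_category):
    """Sum finer census-style bands into the survey's broader categories.

    Raises KeyError if a band has no mapping, which flags breaks that do
    not line up between the two sources.
    """
    survey_shares = {}
    for band, share in census_shares.items():
        category = band_to_survey_category[band]
        survey_shares[category] = survey_shares.get(category, 0.0) + share
    return survey_shares


# Hypothetical population shares for one micro-geographic unit.
census_shares = {
    "18-24": 0.125, "25-34": 0.25, "35-44": 0.125,
    "45-54": 0.25, "55-64": 0.125, "65+": 0.125,
}

# Hypothetical mapping from the finer bands to the survey's age categories.
band_to_survey_category = {
    "18-24": "18-34", "25-34": "18-34",
    "35-44": "35-54", "45-54": "35-54",
    "55-64": "55+", "65+": "55+",
}

survey_shares = collapse_to_survey_breaks(census_shares, band_to_survey_category)
print(survey_shares)
```

If the breaks do not nest (say the survey used 18-29 while the other source used 18-24 and 25-34), no clean mapping exists and the proportions would have to be approximated, which is the situation that planning ahead avoids.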