Regression

Linear regression and logistic regression are examples of multivariate modeling techniques. Linear regression is useful when we use several variables to predict the values of a continuous dependent variable, such as customer value in dollars. We can include predictor variables such as age, income, family size, and education, and then use linear regression to estimate the unique influence of each predictor on customer dollar value, controlling for the influence of all the other predictors.

For example, individual cross-tabs may suggest that age and income are both quite important predictors of customer value, that education is slightly important, and that family size is not important. Linear regression, by contrast, can show us the unique, relative importance of each predictor by examining all of the predictors' simultaneous influences on the dependent variable (customer value). Regression is therefore a more powerful way of sorting out multiple influences than eyeballing the output from separate cross-tabs or other simple bivariate techniques.

Regression estimates a coefficient for each predictor, and shows us what proportion of the variability of the dependent variable (customer value) is uniquely explained by each individual predictor. This makes it possible to build a predictive model whose coefficients can be used to "score" the records in a prospect file. In other words, we can rank prospects by their likely value if they became customers, based on their demographic characteristics.

When the dependent variable we are trying to predict has only two values (e.g., responder vs. non-responder to a promotional mailing), we can use a special type of regression called logistic regression. As with linear regression, we generate coefficients for each predictor, and these coefficients can be used to score a prospect file to determine the best prospects for a promotional mailing.
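The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a production model. The customer records, the prospect IDs (`P1` to `P3`), and the helper names (`solve`, `fit_linear`, `linear_score`, `logistic_score`) are all invented for the example. The toy customer values are generated from a known formula so that the fitted coefficients can be checked, and the logistic coefficients are hypothetical values standing in for ones that would normally be estimated from past mailing responses.

```python
import math

def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def linear_score(coefs, row):
    """Predicted value: intercept plus coefficient-weighted predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def logistic_score(coefs, row):
    """Predicted response probability: logistic transform of the linear score."""
    return 1.0 / (1.0 + math.exp(-linear_score(coefs, row)))

# Toy customer file (invented): (age, income in $K) per customer.
customers = [(25, 30), (35, 60), (45, 40), (55, 90), (65, 50), (30, 80)]
# Customer value in dollars, generated from a known formula for checking.
values = [50 + 2 * age + 3 * income for age, income in customers]

coefs = fit_linear(customers, values)  # [intercept, age, income]

# Hypothetical prospect file, scored and ranked by predicted customer value.
prospects = {"P1": (40, 70), "P2": (60, 35), "P3": (28, 95)}
value_scores = {pid: linear_score(coefs, row) for pid, row in prospects.items()}
ranked_by_value = sorted(prospects, key=value_scores.get, reverse=True)

# Hypothetical logistic coefficients (assumed, not estimated here).
response_coefs = [-4.0, 0.02, 0.04]
response_scores = {pid: logistic_score(response_coefs, row) for pid, row in prospects.items()}
ranked_by_response = sorted(prospects, key=response_scores.get, reverse=True)

print("coefficients:", [round(c, 3) for c in coefs])
print("ranked by predicted value:", ranked_by_value)
print("ranked by response probability:", ranked_by_response)
```

In both cases the coefficients are simply applied to each prospect's predictors. The linear score is a predicted dollar value, while the logistic score is a probability between 0 and 1, so either can be used to sort a prospect file from best to worst.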
There are other variations of regression-type techniques, such as Curvilinear Regression and Loglinear Analysis, but we will not go into them here. Our intent is simply to point out that multivariate analysis is a much more powerful and useful data mining technique than either univariate or bivariate analysis.

Regression-type techniques also have limitations, however. For example, it is difficult to understand the influence of complex interactions among predictor variables on the dependent variable (e.g., does income influence customer value differently for older prospects than it does for younger ones?). To clarify these potentially complex interactive relationships, we would typically use other methods, either instead of regression or in addition to it. These might include techniques such as Classification and Regression Trees, Chi-square Automatic Interaction Detection (CHAID), or Neural Networks. We cannot cover all of these techniques here. Instead, we will select one of the more useful and easily understood techniques (CHAID) for additional discussion.
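To make the income-by-age question concrete, the following sketch fits two linear regressions to toy data: one with main effects only, and one with a hand-specified income-by-age-group interaction term. Everything in it is an assumption made for illustration: the data are generated from a known formula in which income is worth $2 per $K for younger customers and $5 per $K for older ones, and the helper names (`solve`, `fit_linear`, `sse`) are invented. Note that the analyst has to know in advance which interaction to add. With many predictors the number of candidate interactions grows quickly, which is one reason to turn to tree-based methods such as CHAID.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def sse(coefs, rows, y):
    """Sum of squared errors of the fitted model on the given records."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    return sum((t - p) ** 2 for t, p in zip(y, preds))

# Toy data (invented): income in $K and an indicator for the older age group.
incomes = [30, 50, 70, 90]
records = [(inc, old) for old in (0, 1) for inc in incomes]
# Known formula: income slope is 2 for younger customers and 2 + 3 = 5 for older ones.
values = [100 + 2 * inc + 10 * old + 3 * inc * old for inc, old in records]

# Model 1: main effects only (income, older).
main_rows = [(inc, old) for inc, old in records]
main = fit_linear(main_rows, values)

# Model 2: main effects plus the income x older interaction term.
inter_rows = [(inc, old, inc * old) for inc, old in records]
inter = fit_linear(inter_rows, values)

slope_younger = inter[1]             # income effect when older == 0
slope_older = inter[1] + inter[3]    # income effect when older == 1

print("main-effects income slope:", round(main[1], 3))
print("income slope, younger:", round(slope_younger, 3))
print("income slope, older:", round(slope_older, 3))
print("SSE main-effects model:", round(sse(main, main_rows, values), 3))
print("SSE interaction model:", round(sse(inter, inter_rows, values), 3))
```

The main-effects model reports a single income slope that averages over the two age groups and leaves a large unexplained error, whereas the model with the interaction term recovers a separate slope for each group.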
Copyright © 1998-2009 SmartDrill. All rights reserved.