SmartDrill

Using multiple modeling techniques on the same data set

Linear Regression Model

We first run a linear regression model to predict credit risk status. After scoring the database with the coefficients from the regression model, we examine how closely predicted group membership matches actual group membership (good credit risk vs. bad credit risk). To do this, we split the distribution of predicted scores into two groups at a probability of 0.5. Households with a predicted probability of 0.5 or less should ideally fall into the "bad credit risk" category of the dependent variable, and households with a predicted probability above 0.5 should ideally fall into the "good credit risk" category.
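The scoring and cutoff steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original analysis: the data are synthetic, the two predictors are made up, and the helper names (`fit_linear_scores`, `classify`, `classification_table`) are hypothetical. It assumes the dependent variable is coded 1 for good credit risk and 0 for bad credit risk.

```python
import numpy as np

def fit_linear_scores(X, y):
    """Fit y on X by ordinary least squares and return the fitted scores."""
    A = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def classify(scores, cutoff=0.5):
    """True = predicted good credit risk (score strictly above the cutoff)."""
    return np.asarray(scores) > cutoff

def classification_table(actual_good, predicted_good):
    """2x2 counts: rows = observed (bad, good), columns = predicted (bad, good)."""
    actual = np.asarray(actual_good, dtype=bool)
    predicted = np.asarray(predicted_good, dtype=bool)
    table = np.zeros((2, 2), dtype=int)
    for a in (0, 1):
        for p in (0, 1):
            table[a, p] = np.sum((actual == bool(a)) & (predicted == bool(p)))
    return table

# Hypothetical data: 1,000 households with two made-up predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
noise = rng.normal(scale=0.5, size=1000)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > -0.5).astype(float)  # 1 = good risk

scores = fit_linear_scores(X, y)
table = classification_table(y == 1, classify(scores))
print(table)
```

Because a linear regression is not constrained to the unit interval, some fitted scores can fall below 0 or above 1. The 0.5 cutoff still splits them into two groups in the same way.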

The following table shows the level of accuracy of the model:

Credit Risk Linear Regression Model
- Classification Table -

                      Predicted
Observed       Bad risk   Good risk   Percent correct
Bad risk          3,634       1,211            75.00%
Good risk         1,774      25,726            95.50%
Overall:                                       90.77%

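The table shows that the model correctly classified three quarters of the actual bad credit risk households (3,634 of 4,845). The reported rate for the good credit risk households is 95.50%, although the counts shown (25,726 of 27,500) work out to about 93.55%. The overall accuracy rate of 90.77% is consistent with the counts (29,360 of 32,345). This is a good model, but it is better at identifying good credit risks than bad ones.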
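As a quick arithmetic check, the percent-correct figures can be recomputed from the four cell counts in the classification table. The sketch below uses only those counts. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Cell counts from the classification table
# (outer key = observed group, inner key = predicted group).
counts = {
    "bad":  {"bad": 3634, "good": 1211},
    "good": {"bad": 1774, "good": 25726},
}

def percent_correct(observed):
    """Share of an observed group that the model placed in the same group."""
    row = counts[observed]
    return 100.0 * row[observed] / sum(row.values())

def overall_percent_correct():
    """Correctly classified households as a share of all households."""
    correct = sum(counts[group][group] for group in counts)
    total = sum(sum(row.values()) for row in counts.values())
    return 100.0 * correct / total

for group in counts:
    print(f"{group} risk: {percent_correct(group):.2f}% correct")
print(f"overall: {overall_percent_correct():.2f}% correct")
```

Each row rate is the diagonal cell divided by its row total, and the overall rate is the sum of the diagonal divided by the grand total.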

The following chart shows the cumulative probability distribution of predicted scores from the linear regression model:

[Figure: Linear regression probability plot]

We can see that the cumulative distribution of predicted scores crosses the 50% probability point in a rather smooth fashion, without any clear gap in the distribution.
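One way to examine this numerically is to build the empirical cumulative distribution of the predicted scores and measure the distance between the two scores that straddle the cutoff. The sketch below assumes that the "50% probability point" refers to the 0.5 cutoff on predicted scores. The function names are hypothetical and the scores are synthetic, so it illustrates the idea rather than reproducing the chart.

```python
import numpy as np

def ecdf(scores):
    """Return sorted scores and their cumulative proportions."""
    x = np.sort(np.asarray(scores, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

def gap_at_cutoff(scores, cutoff=0.5):
    """Distance between the closest scores on either side of the cutoff.

    Returns None if all scores fall on one side of the cutoff.
    """
    x = np.asarray(scores, dtype=float)
    below = x[x <= cutoff]
    above = x[x > cutoff]
    if below.size == 0 or above.size == 0:
        return None
    return float(above.min() - below.max())

# Hypothetical predicted scores, spread evenly so no gap is expected.
rng = np.random.default_rng(1)
smooth_scores = rng.uniform(0.0, 1.0, size=5000)

x, p = ecdf(smooth_scores)
print("share of scores at or below 0.5:", np.mean(smooth_scores <= 0.5))
print("gap at cutoff:", gap_at_cutoff(smooth_scores))
```

A gap that is small relative to the typical spacing between neighboring scores corresponds to a smooth crossing. A large gap would indicate that the scores separate into two clusters around the cutoff.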


 


Copyright © 1998-2008 SmartDrill. All rights reserved.