SmartDrill

Using multiple modeling techniques on the same data set

Comparison of CHAID vs. Regression Models

The CHAID analysis in our example cuts the modeling sample into fewer gradations (only five segments) than the regression models we examined earlier, which assign a wider array of probability scores to households. [Note, however, that CHAID models typically have many more segments (usually anywhere from 20 to 80) than we created in our small example. Thus, CHAID can actually allow us to take rather fine cuts at a file.]

Therefore, we cannot make an exact one-to-one comparison between the CHAID model and the regression models. However, we can select cut points that correspond to the break points between CHAID segments, and then see what proportion of bad-risk households the regression models capture at those cut points.
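The measurement described above ranks the file by a model's risk score, cuts it at a chosen depth, and counts the share of all bad-risk households that fall above the cut. The sketch below shows that calculation. The function name `capture_rate_at_depth` and the toy scores are hypothetical illustrations, not part of the original analysis.

```python
from typing import Sequence


def capture_rate_at_depth(scores: Sequence[float], is_bad: Sequence[bool], depth: float) -> float:
    """Share of all bad-risk households found in the top `depth` fraction
    of the file when households are ranked by descending risk score."""
    if len(scores) != len(is_bad):
        raise ValueError("scores and is_bad must have the same length")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    total_bad = sum(is_bad)
    if total_bad == 0:
        raise ValueError("no bad-risk households in the sample")
    # Rank households from highest to lowest predicted risk.
    # Ties keep their original file order because Python's sort is stable.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Number of households that fall inside the cut.
    n_top = round(depth * len(scores))
    captured = sum(is_bad[i] for i in order[:n_top])
    return captured / total_bad


# Hypothetical toy file of 10 households (4 of them bad risks).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
bad = [True, True, False, True, False, False, False, False, True, False]
print(capture_rate_at_depth(scores, bad, 0.4))  # 3 of 4 bad risks in top 40%
```

Evaluating a regression score at the depths where CHAID segment boundaries fall gives the like-for-like comparison used in the table.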

The following table shows cut points for approximately the top 8% of the sample (segment #2) and approximately the top 24% of the sample (segments 2, 3 and 1):
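The CHAID cut points come from ordering segments from riskiest to safest and accumulating their share of the file. The sketch below illustrates this. The segment sizes and bad rates are invented so that the cumulative shares land at 8% and 24% as in the text; they are not the study's actual figures, and `segment_cut_points` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def segment_cut_points(seg_households: Dict[int, int],
                       seg_bad_rate: Dict[int, float]) -> List[Tuple[List[int], float]]:
    """Order segments from riskiest to safest and return, for each possible cut,
    the segments included so far and their cumulative share of the file."""
    total = sum(seg_households.values())
    ranked = sorted(seg_households, key=lambda s: seg_bad_rate[s], reverse=True)
    cuts = []
    included: List[int] = []
    running = 0
    for seg in ranked:
        included.append(seg)
        running += seg_households[seg]
        cuts.append((list(included), running / total))
    return cuts


# Hypothetical segment sizes and bad-risk rates (illustration only).
sizes = {1: 100, 2: 80, 3: 60, 4: 300, 5: 460}
bad_rates = {1: 0.15, 2: 0.30, 3: 0.20, 4: 0.05, 5: 0.02}
for segs, share in segment_cut_points(sizes, bad_rates):
    print(segs, f"{share:.0%}")
```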

Performance Comparison of Three Modeling Techniques
(HH = households; each cell shows % of risky HH captured @ % of all HH in the cut)

Modeling technique     Cut at approx. top 24% of file    Cut at approx. top 8% of file
Linear regression      85.1% @ 24.3% of all HH           42.3% @ 8.0% of all HH
Logistic regression    85.1% @ 22.1% of all HH           42.3% @ 8.0% of all HH
CHAID                  85.1% @ 24.8% of all HH           47.6% @ 8.2% of all HH


This table shows that the logistic regression model performs slightly better deeper into the file: it captures the same percentage of risky households (85.1%) as the other two techniques while going only 22.1% of the way down the file.

However, CHAID outperforms the other two techniques when skimming the cream off the top of the file: when we go down into only 8.2% of the file, CHAID captures 47.6% of the risky households (vs. a 42.3% capture rate at 8.0% of the file for the other two models).
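Because the three models are compared at slightly different depths, one way to put them on a common footing is cumulative lift: the capture rate divided by the share of the file targeted (random selection would give a lift of 1.0). This metric is not used in the original table; the sketch below simply applies it to the table's figures.

```python
# Capture rate and file depth (both in percent) from the comparison table.
results = {
    "Linear regression":   {"24%": (85.1, 24.3), "8%": (42.3, 8.0)},
    "Logistic regression": {"24%": (85.1, 22.1), "8%": (42.3, 8.0)},
    "CHAID":               {"24%": (85.1, 24.8), "8%": (47.6, 8.2)},
}


def lift(capture_pct: float, depth_pct: float) -> float:
    """Cumulative lift: risky households captured relative to what a random
    selection of the same share of the file would be expected to capture."""
    return capture_pct / depth_pct


for model, cuts in results.items():
    for cut, (capture, depth) in cuts.items():
        print(f"{model:20s} @ ~{cut:>3s}: lift = {lift(capture, depth):.2f}")
```

On these figures CHAID has the highest lift at the ~8% cut (5.80 vs. 5.29), while logistic regression has the highest at the ~24% cut (3.85 vs. 3.50 for linear regression and 3.43 for CHAID), which matches the conclusions drawn from the table.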

This suggests that CHAID may be the most useful scoring model if we want to avoid targeting credit sales efforts at the very worst prospects. However, if we need to screen out additional poor risks beyond roughly the top 8%, it may be useful to score the remainder of the file with the logistic regression model.
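One possible way to combine the two models as suggested above: exclude every household in the riskiest CHAID segment(s), then, if more exclusions are needed, rank the remaining households by their logistic regression probability. This is a minimal sketch under that assumption; the `Household` record, `screen_out` function, `target_share` parameter and toy data are all hypothetical.

```python
from typing import List, NamedTuple, Set


class Household(NamedTuple):
    hh_id: str
    chaid_segment: int   # CHAID segment the household falls into
    logit_prob: float    # logistic regression probability of being a bad risk


def screen_out(households: List[Household], top_chaid_segments: Set[int],
               target_share: float) -> List[str]:
    """Return IDs of households to exclude from credit sales efforts.

    Step 1: exclude everyone in the riskiest CHAID segment(s).
    Step 2: if the exclusion quota (target_share of the file) is not yet met,
            exclude further households in descending order of logistic score.
    If the CHAID segments alone exceed the quota, no further households are added.
    """
    quota = round(target_share * len(households))
    excluded = [h for h in households if h.chaid_segment in top_chaid_segments]
    remainder = [h for h in households if h.chaid_segment not in top_chaid_segments]
    remainder.sort(key=lambda h: h.logit_prob, reverse=True)
    n_more = max(0, quota - len(excluded))
    excluded.extend(remainder[:n_more])
    return [h.hh_id for h in excluded]


# Hypothetical toy file of 10 households.
file_hh = [
    Household("a", 2, 0.30), Household("b", 2, 0.10), Household("c", 3, 0.50),
    Household("d", 4, 0.20), Household("e", 5, 0.05), Household("f", 1, 0.40),
    Household("g", 4, 0.01), Household("h", 5, 0.02), Household("i", 5, 0.03),
    Household("j", 4, 0.04),
]
print(screen_out(file_hh, {2}, 0.4))
```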



Copyright © 1998-2009 SmartDrill. All rights reserved.