Using multiple modeling techniques on the same data set

CHAID Segmentation Model

Finally, let's create a third model of this database, using CHAID (Chi-square Automatic Interaction Detection) segmentation modeling. CHAID is particularly useful for understanding complex relationships among predictors and for visualizing those relationships, something the regression-type techniques do not readily allow.

Here is a CHAID tree diagram of this model. (If you are not familiar with CHAID, we'd advise you to review the CHAID portion of the "Analytic Techniques" section of our web site before proceeding with the following discussion.)

[Figure: CHAID tree diagram for the credit risk analysis]

Since we are using a relatively simple example with few variables, the resulting CHAID tree shows only five final market segments. We have set up the analysis as a screening model to identify bad credit risks. Therefore, the higher-index segments have a higher penetration of bad-credit-risk households.
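CHAID chooses its splits with chi-square tests of association between a predictor and the outcome. As a rough illustration only, the sketch below computes Pearson's chi-square statistic for the five final segments against bad-risk status, using the household counts from the gains chart further down this page. The function name is our own, and this is not the full CHAID procedure: a real CHAID run also merges predictor categories and applies Bonferroni-adjusted significance tests, neither of which is shown here.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows; each row is a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if segment and outcome were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: segments 2, 3, 1, 4, 5. Columns: [bad-risk HH, all other HH].
# "Other" counts are segment size minus bad-risk count from the gains chart.
SEGMENT_TABLE = [
    [2307, 2662 - 2307],
    [894, 2136 - 894],
    [923, 3229 - 923],
    [692, 5128 - 692],
    [29, 19190 - 29],
]

statistic, dof = pearson_chi_square(SEGMENT_TABLE)
print(f"chi-square = {statistic:.1f} on {dof} degrees of freedom")
```

A statistic that is very large relative to its degrees of freedom indicates that bad-risk status differs sharply across the segments, which is what a useful segmentation should show.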

Whereas the average share of bad-risk households in the entire modeling sample is 14.98%, the riskiest segment (#2) has a whopping 86.67% penetration of bad-risk households. These households are characterized by:

  • A weekly (vs. monthly) head-of-household pay schedule
  • A clerical or skilled manual labor head-of-household occupational status

The least risky segment (#5) has a minuscule 0.15% penetration of bad-risk households. These households are characterized by:

  • A monthly (vs. weekly) pay schedule
  • Household head ages of 25 years or more

As we can see, CHAID adds an extra dimension to the modeling process that we do not get from regression-type modeling. Now we can see a picture of the market structure, showing how various segments are created by unique combinations of predictor variables. This is especially useful for client advertising managers as well as advertising agency creatives and media planners, who need to have a clear picture of their target audiences.
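Because each segment is defined by a combination of predictor values, a CHAID tree can be read as a set of simple rules. The sketch below encodes only the two segments described in the text above (#2 and #5); the remaining segments would have to be read off the tree diagram. The field names and category labels (`pay_schedule`, `occupation`, `head_age`, `"clerical"`, `"skilled manual"`) are hypothetical and chosen purely for illustration.

```python
def assign_segment(household):
    """Return the CHAID segment number for a household, or None if the
    household does not fall into one of the two segments described in the text.

    `household` is a dict with hypothetical keys:
      pay_schedule: "weekly" or "monthly"
      occupation:   occupational status of the household head
      head_age:     age of the household head in years
    """
    pay = household["pay_schedule"]

    # Segment 2 (riskiest): weekly pay AND clerical or skilled manual occupation
    if pay == "weekly" and household["occupation"] in {"clerical", "skilled manual"}:
        return 2

    # Segment 5 (least risky): monthly pay AND household head aged 25 or more
    if pay == "monthly" and household["head_age"] >= 25:
        return 5

    # Segments 1, 3 and 4 are not described in the text, so they are not coded here
    return None


example = {"pay_schedule": "weekly", "occupation": "clerical", "head_age": 40}
print(assign_segment(example))
```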

The following two gains charts from the CHAID analysis show that the CHAID model separates bad-risk households from the rest of the sample quite well. The first gains chart shows basic statistics for each segment, and the segments are arranged from most risky to least risky. The above-average-risk segments are highlighted in red, the below-average-risk segments in green.

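CHAID Credit Risk Model: Gains Chart

| Segment # | Size (number of HH) | % of sample | Number of bad-risk HH | % of all bad-risk HH in segment | Penetration of bad-risk HH (%) | Bad-risk penetration index |
|---|---|---|---|---|---|---|
| 2 | 2,662 | 8.2 | 2,307 | 47.6 | 86.67 | 579 |
| 3 | 2,136 | 6.6 | 894 | 18.5 | 41.86 | 279 |
| 1 | 3,229 | 10.0 | 923 | 19.0 | 28.58 | 191 |
| 4 | 5,128 | 15.9 | 692 | 14.3 | 13.50 | 90 |
| 5 | 19,190 | 59.3 | 29 | 0.6 | 0.15 | 1 |

Above-average-risk segments (red in the original chart): 2, 3 and 1. Below-average-risk segments (green): 4 and 5.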

We can see, for example, that the top segment (#2) represents only 8.2% of the modeling sample but has a bad-risk household penetration of 86.67%. As a result, 47.6% of all bad-risk households in our sample fall into this one segment.
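Every column in the gains chart follows from just two numbers per segment: its household count and its bad-risk household count. As a minimal sketch (the function and field names are our own illustrative choices, not part of any CHAID software), the Python below recomputes the chart. The penetration index is the segment's penetration divided by the penetration in the whole sample, times 100. The recomputed penetrations can differ from the published figures in the second decimal place, presumably because of rounding in the original chart.

```python
# (segment number, households, bad-risk households), taken from the gains chart
SEGMENTS = [
    (2, 2662, 2307),
    (3, 2136, 894),
    (1, 3229, 923),
    (4, 5128, 692),
    (5, 19190, 29),
]


def gains_chart(segments):
    """Return one dict per segment, ordered from most to least risky."""
    total_hh = sum(size for _, size, _ in segments)
    total_bad = sum(bad for _, _, bad in segments)
    overall_penetration = total_bad / total_hh  # about 14.98% for this sample

    # Rank segments by bad-risk penetration, highest first
    ranked = sorted(segments, key=lambda s: s[2] / s[1], reverse=True)

    rows = []
    for seg, size, bad in ranked:
        penetration = bad / size
        rows.append({
            "segment": seg,
            "size": size,
            "pct_of_sample": 100 * size / total_hh,
            "bad": bad,
            "pct_of_all_bad": 100 * bad / total_bad,
            "penetration_pct": 100 * penetration,
            "index": 100 * penetration / overall_penetration,
        })
    return rows


for row in gains_chart(SEGMENTS):
    print(
        f"segment {row['segment']}: "
        f"{row['pct_of_sample']:.1f}% of sample, "
        f"{row['pct_of_all_bad']:.1f}% of bad-risk HH, "
        f"penetration {row['penetration_pct']:.2f}%, "
        f"index {row['index']:.0f}"
    )
```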

The next gains chart is a cumulative version of the previous gains chart:

CHAID Credit Risk Model: Cumulative Gains Chart

| Segment # | Cumulative number of HH | Cumulative % of sample | Cum. # of bad-risk HH | Cum. % of all bad-risk HH | Cum. penetration of bad-risk HH (%) | Cumulative bad-risk penetration index |
|---|---|---|---|---|---|---|
| 2 | 2,662 | 8.2 | 2,307 | 47.6 | 86.67 | 579 |
| 3 | 4,798 | 14.8 | 3,201 | 66.1 | 66.72 | 445 |
| 1 | 8,027 | 24.8 | 4,124 | 85.1 | 51.38 | 343 |
| 4 | 13,155 | 40.7 | 4,816 | 99.4 | 36.61 | 244 |
| 5 | 32,345 | 100.0 | 4,845 | 100.0 | 14.98 | 100 |

Among other things, the above chart shows us that the three riskiest segments, shown in red, account for 24.8% of the modeling sample but 85.1% of all bad-risk households. And the top 8.2% of all households (segment #2) accounts for nearly half (47.6%) of all bad-risk households.
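The cumulative chart is simply a running total of the per-segment counts, taken in order from most to least risky. The sketch below, again using our own illustrative function and field names, rebuilds it from the segment sizes and bad-risk counts in the first gains chart.

```python
# (segment number, households, bad-risk households), from the first gains chart
SEGMENTS = [
    (2, 2662, 2307),
    (3, 2136, 894),
    (1, 3229, 923),
    (4, 5128, 692),
    (5, 19190, 29),
]


def cumulative_gains(segments):
    """Return cumulative gains rows, accumulating from most to least risky."""
    total_hh = sum(size for _, size, _ in segments)
    total_bad = sum(bad for _, _, bad in segments)
    overall_penetration = total_bad / total_hh

    ranked = sorted(segments, key=lambda s: s[2] / s[1], reverse=True)

    rows = []
    cum_hh = 0
    cum_bad = 0
    for seg, size, bad in ranked:
        cum_hh += size
        cum_bad += bad
        cum_penetration = cum_bad / cum_hh
        rows.append({
            "segment": seg,
            "cum_hh": cum_hh,
            "cum_pct_of_sample": 100 * cum_hh / total_hh,
            "cum_bad": cum_bad,
            "cum_pct_of_all_bad": 100 * cum_bad / total_bad,
            "cum_penetration_pct": 100 * cum_penetration,
            "cum_index": 100 * cum_penetration / overall_penetration,
        })
    return rows


for row in cumulative_gains(SEGMENTS):
    print(
        f"through segment {row['segment']}: "
        f"{row['cum_pct_of_sample']:.1f}% of sample captures "
        f"{row['cum_pct_of_all_bad']:.1f}% of bad-risk HH "
        f"(index {row['cum_index']:.0f})"
    )
```

Reading down the output shows the trade-off a screening model offers: each additional segment screened adds households to review, while the cumulative index falls toward 100 as the screened group approaches the whole sample.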


Copyright © 1998-2009 SmartDrill. All rights reserved.