Assignment 3

This assignment analyzes customer data from a supermarket loyalty program to determine which customers are likely to purchase a new line of organic products. The data set contains information on over 22,000 customers. Initial exploration found that about 24% of customers purchased organic products and that females outnumbered males in the sample. A decision tree model was built with organic purchase (yes/no) as the target variable. The optimal tree had 24 leaves, with age used for the first split. A second decision tree allowing three-way splits had 23 leaves and a slightly lower (better) average squared error than the two-way tree.

Uploaded by Ben Belanger
Benjamin Belanger

10/20/14

Decision Trees Assignment


Please provide all your answers in this document:
1. Initial Data Exploration
A supermarket is offering a new line of organic products. The supermarket's management wants
to determine which customers are likely to purchase these products.
The supermarket has a customer loyalty program. As an initial buyer incentive plan, the supermarket
provided coupons for the organic products to all of the loyalty program participants and collected data
that includes whether these customers purchased any of the organic products.
The ORGANICS data set contains 13 variables and over 22,000 observations. The variables in the
data set are shown below with the appropriate roles and levels:
Name             Model Role   Measurement Level   Description
ID               ID           Nominal             Customer loyalty identification number
DemAffl          Input        Interval            Affluence grade on a scale from 1 to 30
DemAge           Input        Interval            Age, in years
DemCluster       Rejected     Nominal             Type of residential neighborhood
DemClusterGroup  Input        Nominal             Neighborhood group
DemGender        Input        Nominal             M = male, F = female, U = unknown
DemRegion        Input        Nominal             Geographic region
DemTVReg         Input        Nominal             Television region
PromClass        Input        Nominal             Loyalty status: tin, silver, gold, or platinum
PromSpend        Input        Interval            Total amount spent
PromTime         Input        Interval            Time as loyalty card member
TargetBuy        Target       Binary              Organics purchased? 1 = Yes, 0 = No
TargetAmt        Rejected     Interval            Number of organic products purchased

Although two target variables are listed, these exercises concentrate on the binary variable
TargetBuy.
a. Create a new diagram named Organics.

b. Define the data set AAEM.ORGANICS as a data source for the project.
1) Set the model roles for the analysis variables as shown above.
(You can go back and modify variable roles even after you complete the wizard by right-clicking the Organics data source and selecting Edit Variables.)
The variable DemClusterGroup contains collapsed levels of the variable DemCluster.
Presume that, based on previous experience, you believe that DemClusterGroup is
sufficient for this type of modeling effort. Set the model role for DemCluster to Rejected.

Include a screen shot showing DemCluster is rejected.

Explore the Organics data source and answer the following questions. Examine the
distribution of the target variable. What is the proportion of individuals who purchased
organic products? (Hint: right-click the Organics data source and choose Explore, then
click Action, select Plot, and choose Bar Chart.)
ANSWER: 0.2433. Approximately one in four customers in the sample (about 24%) purchased
organic products.

Include a screen shot showing the bar chart.
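Outside Enterprise Miner, the same proportion is simply the mean of the 0/1 TargetBuy values. The minimal Python sketch below assumes the target column has already been loaded into a list; the function name purchase_proportion and the toy values are hypothetical illustrations, not the ORGANICS data.

```python
def purchase_proportion(target_buy):
    """Return the share of rows where the binary target equals 1."""
    if not target_buy:
        raise ValueError("target_buy must not be empty")
    # For a 0/1 variable, the mean equals the proportion of 1s.
    return sum(target_buy) / len(target_buy)


# Hypothetical toy sample (not the ORGANICS data): 1 = purchased, 0 = did not.
toy_target = [1, 0, 0, 0, 1, 0, 0, 0]
print(purchase_proportion(toy_target))  # 0.25
```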

1) Are there more males or more females in the sample?


ANSWER: There are more females than males: 3,237 customers were reported as female and 1,611 as male.

(Hint: plot DemGender using a bar chart.)

Include a screen shot showing the bar chart.
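The comparison reduces to counting the levels of DemGender. A minimal Python sketch using collections.Counter; the helper compare_levels and the toy codes are hypothetical, while the second example plugs in the female and male frequencies reported in the answer to this question.

```python
from collections import Counter


def compare_levels(counts, first, second):
    """Return whichever of two levels occurs more often, or "tie"."""
    # Counter returns 0 for levels that never appear, so missing keys are safe.
    if counts[first] > counts[second]:
        return first
    if counts[second] > counts[first]:
        return second
    return "tie"


# Counting raw DemGender codes from a hypothetical toy sample:
toy_counts = Counter(["F", "M", "F", "U", "F", "M"])
print(toy_counts.most_common())  # [('F', 3), ('M', 2), ('U', 1)]

# Plugging in the frequencies reported in the answer:
reported = Counter(F=3237, M=1611)
print(compare_levels(reported, "F", "M"))  # F
```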


2) Finish the Organics data source definition.

a. Add the AAEM.ORGANICS data source to the Organics diagram workspace.
b. Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50%
of the data for training and 50% for validation.
c. Add a Decision Tree node to the workspace and connect it to the Data Partition node.
d. Create a decision tree model autonomously (i.e., just run the Decision Tree node). Use average
squared error as the model assessment statistic.
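A 50/50 training/validation split can be sketched as a seeded random shuffle. This is a minimal Python illustration under stated assumptions: partition and its default seed are hypothetical, and plain simple random sampling is used, which may differ from the sampling method the Data Partition node actually applies (for example, stratification on the target).

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Randomly split rows into training and validation lists.

    Hypothetical helper: simple random sampling only, with no stratification.
    """
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed -> repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, valid = partition(range(10))
print(len(train), len(valid))  # 5 5
```

Fixing the seed makes the split reproducible, so both decision trees in this assignment would be trained and validated on the same rows.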

Answer the two questions below and attach the screenshot(s) in your solution document where
you found the answer.
1) How many leaves are in the optimal tree? 24
2) Which variable was used for the first split? DemAge
What were the competing splits for this first split? (Hint: Close the Results window for the
Decision Tree model. Click the ellipsis button (...) for the Interactive property in the Decision Tree
node's Properties panel. Right-click the root node and select Split Node from the menu. The
Split Node 1 window that appears contains the information that answers the two questions.)
ANSWER: DemAge (Age), DemAffl (Affluence Grade), DemGender (Gender), PromSpend (Total
Spend), PromClass (Loyalty Status)
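The Split Node window lists candidate splits along with a worth measure. Assuming that measure is the chi-square logworth, -log10(p-value), the ranking of binary splits against a binary target can be sketched in Python as below. The function names and candidate tables are hypothetical, and the sketch omits any p-value adjustments Enterprise Miner may apply.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]].

    Rows are the two branches of a candidate split; columns are the
    counts of TargetBuy = 0 and TargetBuy = 1 in each branch.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty branch or target level carries no evidence
    return n * (a * d - b * c) ** 2 / denom


def logworth(table):
    """-log10 of the chi-square p-value with 1 degree of freedom."""
    chi2 = chi_square_2x2(table)
    # Upper-tail probability of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if p_value == 0.0:
        return math.inf  # p-value underflowed; the split is overwhelmingly significant
    return -math.log10(p_value)


# Hypothetical candidate splits (not taken from the ORGANICS results).
candidates = {
    "split A": [[30, 10], [10, 30]],
    "split B": [[22, 18], [18, 22]],
}
ranked = sorted(candidates, key=lambda name: logworth(candidates[name]), reverse=True)
print(ranked)  # ['split A', 'split B']
```

The split whose branches separate buyers from non-buyers most sharply gets the smallest p-value and therefore the largest logworth.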

a. Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
1) In the Properties panel of the new Decision Tree node, change the maximum number of
branches from a node to 3 to enable three-way splits.
2) Create a decision tree model. Use average squared error as the model assessment statistic.
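Allowing up to three branches means an interval input such as DemAge can be cut at two points instead of one, which enlarges the set of candidate splits. The Python sketch below counts candidates by exhaustive enumeration purely for illustration; candidate_splits and the ages are hypothetical, and Enterprise Miner's actual search strategy may differ.

```python
from itertools import combinations


def candidate_splits(values, branches):
    """Enumerate ways to cut the sorted distinct values of an interval
    input into `branches` contiguous groups.

    Each split is a tuple of cut points; a value v goes to the first
    branch whose cut point is >= v, or to the last branch otherwise.
    """
    distinct = sorted(set(values))
    # Every distinct value except the largest can end a branch.
    possible_cuts = distinct[:-1]
    return list(combinations(possible_cuts, branches - 1))


ages = [25, 31, 31, 40, 52, 67]  # hypothetical DemAge values
print(len(candidate_splits(ages, 2)))  # 4
print(len(candidate_splits(ages, 3)))  # 6
```

For d distinct values there are d - 1 binary splits but (d - 1)(d - 2)/2 three-way splits, so the search space grows quickly with the number of distinct values.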

1) How many leaves are in the optimal tree? 23


a. Based on average squared error, which of the decision tree models appears to be better?

Decision Tree   Training ASE   Validation ASE
Two-way         0.141347       0.141347
Three-way       0.140013       0.140196

The two models produce very similar ASE values. The tree with three-way splits has a slightly lower
(better) ASE on both the training and validation data (validation: 0.140196 vs. 0.141347), but the
difference is very small.
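Assuming ASE for a binary target is the mean squared difference between the 0/1 outcome and the predicted probability of a purchase, it can be computed, and used to pick the better model, as in the Python sketch below. The function name is hypothetical; the validation ASE values are the ones reported in the comparison table.

```python
def average_squared_error(targets, probabilities):
    """Mean of (y - p)^2 for 0/1 targets y and predicted P(y = 1) = p.

    For a binary target, averaging over both target levels gives the same
    number: the squared error for level 0 is ((1 - y) - (1 - p))^2 = (y - p)^2.
    """
    if len(targets) != len(probabilities) or not targets:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((y - p) ** 2 for y, p in zip(targets, probabilities)) / len(targets)


print(average_squared_error([1, 0], [0.5, 0.5]))  # 0.25

# Validation ASE values reported in the comparison table; lower is better.
validation_ase = {"Two-way": 0.141347, "Three-way": 0.140196}
best = min(validation_ase, key=validation_ase.get)
difference = max(validation_ase.values()) - min(validation_ase.values())
print(best, round(difference, 6))  # Three-way 0.001151
```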
