Telemarketing Campaign – Target Customer Prediction
HARSHAD B AMBEKAR
PGD-DS – MARCH 2019
To acquire knowledge, one must study, but to acquire wisdom, one must observe
- MARILYN VOS SAVANT
EDA
• Data Analysis
• Identify relevant predictor variables for a response
Model Building
• Build predictive models
• Choose the best one
Conclusion
• Determine the cost of acquisition
• Conclusion
Exploratory Data Analysis
Data Analysis
Identify relevant predictor variables for a response
Identify Relevant Predictor Variables for a Response
This is consistent with the age response analysis, where we found that the youngest and oldest customers were most likely to respond positively. The analysis above reiterates it: students and retired customers have the highest response rates.
December, March, October and September appear to be the best months to contact potential customers. However, these four months also have the fewest data entries, so it is uncertain whether the response rate would hold if calls were made at high volume.
Prospective customers contacted by cellular phone are significantly more likely to opt in to the investment scheme than customers contacted by telephone.
Although the response rates of the three categories differ greatly, the volume of data entries in these categories also differs greatly, so the comparison should be read with caution.
The response rate is greatest for potential customers with more than three previous contacts. However, the number of data entries for such cases is very low, so it is difficult to draw any inferences.
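Because several of these findings rest on categories with few records, it can help to view the response rate and the record count side by side. The following is a minimal sketch, assuming a pandas DataFrame with one categorical column and a yes/no response column. The column names `month` and `response`, the helper `response_summary`, and the toy rows are made up for illustration and are not taken from the campaign data.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "month": ["may", "may", "may", "may", "dec", "dec", "mar", "may"],
    "response": ["no", "no", "yes", "no", "yes", "no", "yes", "no"],
})

def response_summary(data, column, target="response", positive="yes"):
    """Return record count and response rate for each category of `column`."""
    responded = data[target].eq(positive)
    summary = (
        data.assign(responded=responded)
        .groupby(column)["responded"]
        .agg(records="count", response_rate="mean")
        .sort_values("response_rate", ascending=False)
    )
    return summary

summary = response_summary(df, "month")
print(summary)
```

A category with a high rate but a very small `records` value is a weak basis for a decision, which is the caveat raised for the best months and for customers with many previous contacts.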
Model Building
Build predictive models
Finding Variables using Logistic Reg. and RFE
Creating model and checking the summary for significance of all variables
Choose the best one
Finding Variables using Logistic Reg. and RFE
Feature selection methods can give useful information on the relative importance or relevance of features for a given problem. We can use this information to create filtered versions of our dataset, which may increase the accuracy of our models.
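As an illustration of recursive feature elimination (RFE) with logistic regression, here is a minimal sketch using scikit-learn. It runs on synthetic data, since the campaign dataset is not reproduced here. The number of features kept (5) and the scaling step are assumptions for the example, not settings taken from the analysis.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data (assumption: 10 numeric features).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, n_redundant=2, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# RFE repeatedly fits the model and drops the weakest feature until
# n_features_to_select remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X_scaled, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("Selected feature indices:", selected)
print("Ranking (1 = selected):", selector.ranking_.tolist())
```

The retained columns can then be used to refit the model and inspect the significance of each remaining variable.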
Creating model and finding significance of all variables
Warning signs of multicollinearity among the predictors:
1. A regression coefficient is not significant even though, theoretically, that variable should be highly correlated with the response.
2. When we add or delete a factor from our model, the regression coefficients change dramatically.
3. We see a negative regression coefficient when the response variable should increase along with that factor, and vice versa.
4. Our independent variables have high pairwise correlations.
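One common way to check for the high correlations among predictors mentioned in the list above is the variance inflation factor (VIF). The sketch below is an assumption about how such a check could be done, not the method used in this analysis. It computes the VIF of each column as the matching diagonal entry of the inverse correlation matrix, on synthetic data where column `c` is deliberately built as a near copy of column `a`.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + 0.1 * rng.normal(size=1000)  # nearly a copy of `a`, so collinear with it

X = np.column_stack([a, b, c])
vif = variance_inflation_factors(X)
for name, value in zip(["a", "b", "c"], vif):
    print(f"{name}: VIF = {value:.1f}")
```

A VIF near 1 suggests little collinearity; values above roughly 5 to 10 are often treated as a signal to drop or combine variables, though the cut-off is a rule of thumb.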
Model predicting an observation as 'positive'
The ROC curve shows the trade-off between sensitivity and specificity (an increase in sensitivity is generally accompanied by a decrease in specificity).
The concepts of ROC and AUC build on the confusion matrix, specificity and sensitivity. The example here is based on logistic regression, but ROC and AUC apply to any classifier that outputs a score or probability, not just logistic regression.
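To make the trade-off concrete, the sketch below computes sensitivity and specificity at a few probability cut-offs, and the AUC as the share of (positive, negative) pairs that the scores rank correctly. The labels and scores are toy values for illustration only, and the function names are hypothetical.

```python
import numpy as np

def sensitivity_specificity(y_true, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are predicted positive."""
    y_true = np.asarray(y_true)
    pred = np.asarray(scores) >= cutoff
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Toy labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

for cutoff in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, scores, cutoff)
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
print(f"AUC = {auc(y_true, scores):.3f}")
```

Lowering the cut-off labels more observations as 'positive', which raises sensitivity and lowers specificity; each cut-off corresponds to one point on the ROC curve.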
Choosing Best One
The selected option has better sensitivity, with decent accuracy and specificity as well.
Evaluate Performance of Model
Gain and lift charts are used to evaluate the performance of a classification model. They measure how much better one can expect to do with the predictive model than without a model. They are very popular metrics in marketing analytics.
Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set.
Lift measures how much better one can expect to do with the predictive model than without a model. It is the ratio of gain % to the random expectation % at a given decile level. The random expectation at the x-th decile is 10x% (for example, 20% at the second decile).
A cumulative lift of 4.03 for the top two deciles means that, when selecting 20% of the records based on the model, one can expect 4.03 times the number of targets (events) found by randomly selecting 20% of the file without a model.
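The gain and lift definitions can be sketched as a small decile table. This is an illustrative implementation on synthetic scores, not the table from this analysis; the function name `gain_lift_table` and the simulated data are assumptions.

```python
import numpy as np

def gain_lift_table(y_true, scores, n_bins=10):
    """Cumulative gain (%) and cumulative lift per bin, best scores first."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")
    bins = np.array_split(y_true[order], n_bins)
    cum_events = np.cumsum([b.sum() for b in bins])
    cum_records = np.cumsum([len(b) for b in bins])
    gain = 100.0 * cum_events / y_true.sum()
    random_expectation = 100.0 * cum_records / len(y_true)
    lift = gain / random_expectation
    return gain, lift

# Synthetic scores: events tend to receive higher scores than non-events.
rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.1).astype(int)
scores = rng.normal(loc=y_true * 1.5, scale=1.0)

gain, lift = gain_lift_table(y_true, scores)
for decile, (g, l) in enumerate(zip(gain, lift), start=1):
    print(f"decile {decile:2d}: gain = {g:5.1f}%  cum lift = {l:.2f}")
```

By construction the last decile always shows a gain of 100% and a cumulative lift of 1, since contacting everyone captures every event with or without a model.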
Conclusion
Summarising Analysis
Conclusion
Summarising the Analysis and Conclusion
To achieve the business objective of acquiring 80% of total prospects at the minimum possible cost, we need to target 50% of the total customer base in the dataset.
The logistic model created above improves efficiency by 50%: instead of contacting the whole set of customers, we can contact only 50% of them and still achieve the business objective.
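The arithmetic behind this conclusion can be sketched as follows. The cumulative gain values, the customer base size and the cost per call are all hypothetical, chosen only so that the 80% target is first reached at the fifth decile, as stated above; they are not figures from the analysis.

```python
# Hypothetical cumulative gain (%) per decile, chosen only so that the 80% target
# is first reached at the fifth decile, as in the conclusion above.
cumulative_gain = [30, 50, 63, 73, 80, 86, 91, 95, 98, 100]
target_gain = 80          # business objective: reach 80% of prospects
total_customers = 10_000  # assumed customer base size
cost_per_call = 1.0       # assumed cost of one call, in arbitrary units

# First decile whose cumulative gain meets the target.
decile = next(i for i, g in enumerate(cumulative_gain, start=1) if g >= target_gain)
fraction_contacted = decile / len(cumulative_gain)

calls_with_model = int(total_customers * fraction_contacted)
calls_saved = total_customers - calls_with_model
cost_with_model = calls_with_model * cost_per_call

print(f"Contact top {fraction_contacted:.0%} of customers ({calls_with_model} calls)")
print(f"Calls saved versus contacting everyone: {calls_saved}")
print(f"Cost with model: {cost_with_model:.0f}")
```

With a real gain table in place of the hypothetical list, the same lookup gives the share of customers to contact for any target, and multiplying by the actual cost per call gives the cost of acquisition.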