Data Mining Assignment-Clustering Data-Ads 24x7 Summary

Uploaded by

Ganesh Manoharan

DATA Mining

Assignment-Part1
Clustering Clean Ads

Submitted by
Ganesh M
Ajai Subramaniam K
Aditya Raman
Shree Raksha S
1 - Clustering Digital Ads Data: ads24x7 is a digital marketing
company that has received seed funding of $10 million and is
expanding into marketing analytics. It has collected data from its
Marketing Intelligence team and now wants you (its newly
appointed data analyst) to segment the types of ads based on the
features provided. Use a clustering procedure to segment the ads into
homogeneous groups.
The following three features are commonly used in digital marketing:
CPM = (Total Campaign Spend / Number of Impressions) x 1,000.
Note that the Total Campaign Spend refers to the 'Spend' Column in
the dataset and the Number of Impressions refers to the
'Impressions' Column in the dataset.
CPC = Total Cost (spend) / Number of Clicks. Note that the Total Cost
(spend) refers to the 'Spend' Column in the dataset and the Number
of Clicks refers to the 'Clicks' Column in the dataset.
CTR = (Total Measured Clicks / Total Measured Ad Impressions) x
100. Note that the Total Measured Clicks refers to the 'Clicks' Column
in the dataset and the Total Measured Ad Impressions refers to the
'Impressions' Column in the dataset.
The Data Dictionary and the detailed description of the formulas for
CPM, CPC and CTR are given in the sheet 2 of the Clustering Clean
ads_data Excel File.
i. Read the data and perform basic analysis such
as printing a few rows (head and tail), info,
data summary, null values, duplicate values,
etc.
All checks were carried out; the output of info is given here for reference.

Total columns: 19
Total null items: 23066 - 18330 = 4736 (CTR, CPM & CPC)
Total duplicates: 0
Data types available:
- Float64: 6
- Int64: 7
- Object: 6
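The checks listed above can be sketched with pandas as follows. This is a minimal illustration, not the code used in the report: the `basic_checks` helper and the tiny `demo` frame are hypothetical stand-ins for the real Excel sheet.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame) -> dict:
    """Collect shape, null counts, duplicate count and dtype counts."""
    return {
        "shape": df.shape,
        "nulls_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }


# Made-up rows standing in for the ads data (assumption, not the real file).
demo = pd.DataFrame({
    "Impressions": [1000, 2000, 1500],
    "Clicks": [10, 40, 15],
    "Spend": [5.0, 12.0, 6.0],
    "CTR": [1.0, None, 1.0],
})

print(demo.head(2))      # first rows
print(demo.tail(2))      # last rows
demo.info()              # column dtypes and non-null counts
print(demo.describe())   # numeric summary
report = basic_checks(demo)
print(report)
```

On the real data the same calls would be run on the frame returned by `pd.read_excel`.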
ii. Treat missing values in CPC, CTR and CPM
using the formula given in the question using
a lambda function

The missing values were filled using the given formulae inside a
lambda function, and the results were stored in a new set of
columns, CPMA, CTRA & CPCA, which are used for the clustering.
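A minimal sketch of this step is shown below, assuming the column names 'Spend', 'Impressions', 'Clicks', 'CPM', 'CTR' and 'CPC'. The `fill_ad_metrics` helper and the sample rows are hypothetical, and rows with zero Clicks or Impressions are not handled here.

```python
import pandas as pd


def fill_ad_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Create CPMA, CTRA and CPCA, computing a value only where it is missing."""
    out = df.copy()
    # CPM = (Spend / Impressions) x 1,000
    out["CPMA"] = out.apply(
        lambda r: r["Spend"] / r["Impressions"] * 1000
        if pd.isnull(r["CPM"]) else r["CPM"], axis=1)
    # CTR = (Clicks / Impressions) x 100
    out["CTRA"] = out.apply(
        lambda r: r["Clicks"] / r["Impressions"] * 100
        if pd.isnull(r["CTR"]) else r["CTR"], axis=1)
    # CPC = Spend / Clicks
    out["CPCA"] = out.apply(
        lambda r: r["Spend"] / r["Clicks"]
        if pd.isnull(r["CPC"]) else r["CPC"], axis=1)
    return out


# Made-up rows: the first has all three metrics missing.
demo = pd.DataFrame({
    "Spend": [5.0, 12.0],
    "Impressions": [1000, 2000],
    "Clicks": [10, 40],
    "CPM": [None, 6.0],
    "CTR": [None, 2.0],
    "CPC": [None, 0.3],
})
filled = fill_ad_metrics(demo)
print(filled[["CPMA", "CTRA", "CPCA"]])
```

Existing values are kept as they are; only the missing entries are computed from Spend, Impressions and Clicks.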
Checking & Treating the Outliers:
Box plot for outliers check:
Most columns are highly right-skewed, so we decided to cap outliers at the upper whisker,
Q3 + 1.5*IQR, where IQR is the interquartile range (Q3 - Q1), for all columns except CTRA, CPMA & CPCA.

Yes, outliers must be treated before K-Means clustering. K-Means is highly sensitive to outliers
because each centroid is the mean of its cluster, so a few extreme values can pull it away from the
bulk of the points.
Looking at the box plot, the variables
Available_Impressions, Matched_Queries, Impressions, Clicks, Spend, Fee Revenue, CTRA, CPMA, CPCA
contain outliers.

Capping of outliers using the formula
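The upper-whisker treatment described above can be sketched as follows. The `cap_upper_whisker` helper and the sample values are hypothetical; only the formula Q3 + 1.5*IQR comes from the report.

```python
import pandas as pd


def cap_upper_whisker(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Set values above Q3 + 1.5*IQR equal to that upper whisker."""
    out = df.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        upper = q3 + 1.5 * (q3 - q1)   # IQR = Q3 - Q1
        out[col] = out[col].clip(upper=upper)
    return out


# Made-up right-skewed column with one extreme value.
demo = pd.DataFrame({"Spend": [1.0, 2.0, 3.0, 4.0, 100.0]})
capped = cap_upper_whisker(demo, ["Spend"])
print(capped)
```

Only the upper tail is capped here, in line with the right skew noted in the text; the lower whisker is left untouched.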

Box plot view after treatment:

After capping with the IQR method, the box plot above shows no remaining outliers.
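iii. Perform z-score scaling and discuss how it affects the speed of
the algorithm.
z-score scaling is a standardization method in which each feature is rescaled to have a
mean of 0 and a standard deviation of 1. Each value is thereby expressed as the number of
standard deviations it lies from the mean of its column: z = (x - mean) / standard deviation.
Because K-Means and hierarchical clustering are distance-based, putting all features on a
common scale keeps large-magnitude columns such as Impressions from dominating the distance
calculation, and it typically helps K-Means converge in fewer iterations.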
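As a small illustration of z-score scaling, the sketch below standardizes chosen columns with pandas. The `zscore_scale` helper and the sample values are hypothetical, and the population standard deviation (ddof=0) is an assumption; the report does not say which convention was used.

```python
import pandas as pd


def zscore_scale(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Rescale each listed column to mean 0 and standard deviation 1."""
    out = df.copy()
    for col in columns:
        mean = out[col].mean()
        std = out[col].std(ddof=0)      # population standard deviation
        out[col] = (out[col] - mean) / std
    return out


# Made-up columns on very different scales.
demo = pd.DataFrame({
    "Impressions": [1000.0, 2000.0, 3000.0, 4000.0],
    "Spend": [1.0, 2.0, 4.0, 9.0],
})
scaled = zscore_scale(demo, ["Impressions", "Spend"])
print(scaled)
```

After scaling, both columns contribute on the same footing to any Euclidean distance computed between rows.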

iv. Construction of Dendrogram

A dendrogram is a tree-like diagram that records how hierarchical clustering merges observations into
progressively larger clusters. Each leaf is an observation, each join is a merge, and the height of a join is the
distance between the two clusters being merged, so cutting the tree at a chosen height yields a set of clusters.

Truncated Dendrogram:
A truncated dendrogram shows only the top of the tree, typically the last few merges, with
the lower branches collapsed. This makes the plot readable for a large data set and focuses
attention on the top-level groups.
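A truncated dendrogram can be sketched with SciPy as below. Ward linkage, the random demo data, and the `truncated_dendrogram` helper are assumptions for illustration; the report does not state which linkage method was used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage


def truncated_dendrogram(X, p=10, method="ward"):
    """Build the linkage matrix and keep only the last p merged groups."""
    Z = linkage(X, method=method)
    # no_plot=True returns the tree layout without drawing it;
    # drop it (with matplotlib installed) to draw the figure.
    tree = dendrogram(Z, truncate_mode="lastp", p=p, no_plot=True)
    return Z, tree


# Random stand-in for the scaled ads features (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))

Z_demo, tree_demo = truncated_dendrogram(X_demo, p=10)
# Cut the tree so that at most 5 clusters remain.
labels_demo = fcluster(Z_demo, t=5, criterion="maxclust")
print(tree_demo["ivl"])          # labels of the 10 leaves that are kept
print(np.bincount(labels_demo))  # cluster sizes (index 0 is unused)
```

With `truncate_mode="lastp"`, leaf labels in parentheses give the number of original observations collapsed into that leaf.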
v. Make Elbow plot (up to n=10) and identify optimum
number of clusters for k-means algorithm.
The WSS elbow plot is a graphical tool for choosing the number of clusters in a data set.
It is produced by computing the within-cluster sum of squares (WSS) for a range of
cluster counts and plotting WSS against the count. WSS generally keeps falling as clusters
are added, so the count at which the curve bends and WSS begins to decrease much more
slowly (the elbow) is generally taken as the optimum number of clusters.
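The WSS values behind an elbow plot can be computed as in the sketch below, assuming scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squares. The `wss_by_k` helper and the three synthetic blobs are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def wss_by_k(X, max_k=10, seed=0):
    """Return the within-cluster sum of squares for k = 1..max_k."""
    wss = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        wss.append(model.inertia_)
    return wss


# Three well-separated synthetic groups standing in for the scaled data.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

wss_demo = wss_by_k(X_demo, max_k=10)
for k, value in enumerate(wss_demo, start=1):
    print(k, round(value, 2))
```

Plotting `range(1, 11)` against these values (for example with matplotlib) gives the elbow curve; on this synthetic data the drop flattens after k=3.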
vi. Print silhouette scores for up to 10 clusters and identify
optimum number of clusters.
The silhouette score measures how similar each point is to the other points in its own cluster
compared with points in the nearest other cluster. It ranges from -1 to 1; a higher score
indicates that points are close to their own cluster and well separated from the others.
The optimum number of clusters is the one that gives the highest average silhouette
score.

Based on the silhouette scores above, the optimal number of
clusters is 6, as its silhouette score is higher than that of every
other cluster count tried.
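The silhouette comparison can be sketched as follows, assuming scikit-learn's `KMeans` and `silhouette_score`. The `silhouette_by_k` helper and the synthetic data are hypothetical, so the best k printed here is 3 for this toy data and has no bearing on the value of 6 reported for the ads data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, max_k=10, seed=0):
    """Return {k: average silhouette score} for k = 2..max_k."""
    scores = {}
    for k in range(2, max_k + 1):   # the score is undefined for k = 1
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Three well-separated synthetic groups (assumption, not the ads data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

scores_demo = silhouette_by_k(X_demo, max_k=10)
best_k = max(scores_demo, key=scores_demo.get)
for k, s in scores_demo.items():
    print(k, round(s, 3))
print("best k:", best_k)
```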
vii. Profile the ads based on optimum number of clusters using silhouette score and your
domain understanding [Hint: Group the data by clusters and take sum or mean to
identify trends in Clicks, spend, revenue, CPM, CTR, & CPC based on Device Type.
Make bar plots]
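The grouping suggested in the hint can be sketched as below. The cluster label column name 'cluster', the `profile_clusters` helper and the sample rows are hypothetical; 'Device Type', 'Clicks' and 'Spend' follow the names used in the question.

```python
import pandas as pd


def profile_clusters(df, cluster_col="cluster", device_col="Device Type",
                     metrics=("Clicks", "Spend")):
    """Mean of each metric per (cluster, device type) pair."""
    return df.groupby([cluster_col, device_col])[list(metrics)].mean()


# Made-up labelled rows standing in for the clustered ads data.
demo = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1],
    "Device Type": ["Mobile", "Mobile", "Desktop", "Mobile", "Desktop"],
    "Clicks": [10, 20, 5, 100, 80],
    "Spend": [1.0, 3.0, 0.5, 20.0, 16.0],
})
profile = profile_clusters(demo)
print(profile)
```

Calling `profile["Clicks"].unstack().plot(kind="bar")` (with matplotlib installed) would give one group of bars per cluster, split by device type.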
viii. Project Summary:
The analysis used two clustering methods: Hierarchical clustering and K-Means
clustering.
Hierarchical clustering suggested 5 as the optimum number of clusters.
For K-Means, we chose n=6 as the optimum number of clusters based on the
silhouette scores.
This project aims to give ads24x7 a more efficient way to segment its ads
data using hierarchical clustering and K-Means clustering. The company can
use the resulting segments to determine which advertisement campaigns are
most successful and to target its advertising better.
