Amazon Product Data Analysis Report Final

The Amazon Product Data Analysis Report details the cleaning, exploratory data analysis, and visualizations of an Amazon product dataset. Key findings include an average discount of 56.6%, a mean product rating of 4.09, and a weak correlation between ratings and review counts. Machine learning models were built for regression and classification, with the classification model achieving an accuracy of approximately 76.4% in predicting product ratings.

Uploaded by

ashleshpai12

Amazon Product Data Analysis Report

This report presents a comprehensive data analysis of the provided Amazon
product dataset. The analysis includes data cleaning, exploratory data analysis
(EDA), and visualizations to derive key insights about product pricing, ratings,
and categories.

1. Data Cleaning and Preparation

The raw dataset contained several columns with mixed data types and special
characters, which needed to be cleaned for analysis.
• The discounted_price, actual_price, rating, and rating_count columns
were initially stored as strings.
• These columns were cleaned by removing symbols like '₹', '%', and ','
before converting them to a numeric data type (float or integer).
• Any rows with missing rating values were removed, as this is a critical
column for product analysis.
• Duplicate product entries were identified and removed to ensure data
integrity.
• A new column, discount_amount, was created to represent the absolute
discount value for each product, calculated as the difference between
actual_price and discounted_price.
• Code Snippet:
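The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration with pandas, not the report's original code: the function name `clean_amazon`, the use of `product_id` as the duplicate key, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

NUMERIC_COLS = ["discounted_price", "actual_price", "rating", "rating_count"]

def clean_amazon(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described in this section (hypothetical helper)."""
    out = df.copy()
    for col in NUMERIC_COLS:
        # Strip the currency sign, percent sign and thousands separators,
        # then turn anything that is still not a number into NaN.
        stripped = out[col].astype(str).str.replace(r"[₹%,]", "", regex=True)
        out[col] = pd.to_numeric(stripped, errors="coerce")
    out = out.dropna(subset=["rating"])             # rating is a critical column
    out = out.drop_duplicates(subset="product_id")  # assumed duplicate key
    out["discount_amount"] = out["actual_price"] - out["discounted_price"]
    return out.reset_index(drop=True)

# Hypothetical sample rows, not taken from the real dataset.
raw = pd.DataFrame({
    "product_id": ["A1", "A2", "A2", "A3"],
    "discounted_price": ["₹399", "₹1,099", "₹1,099", "₹60"],
    "actual_price": ["₹1,099", "₹2,000", "₹2,000", "₹60"],
    "rating": ["4.2", "4.0", "4.0", "|"],
    "rating_count": ["24,269", "1,500", "1,500", "10"],
})
clean = clean_amazon(raw)
print(clean)
```

In this sample the row with an unparseable rating is dropped and the repeated `A2` entry is kept only once.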

This process resulted in a clean and structured dataset ready for analysis.

2. Exploratory Data Analysis (EDA)

Here are the key descriptive statistics and insights from the cleaned data:

Metric  discounted_price (₹)  actual_price (₹)  discount_percentage (%)   rating  rating_count
Count                1435.00           1435.00                  1435.00  1435.00       1435.00
Mean                 1374.96           3217.43                    56.59     4.09      19637.38
Median                449.00           1000.00                    60.00     4.10       5064.00
Max                 77990.00         129900.00                    94.00     5.00     426973.00
Min                    60.00             60.00                     0.00     2.00          0.00

Key Findings:
• The average discount across products is approximately 56.6%, indicating
that many items are sold at significantly reduced prices.
• The mean product rating is 4.09, suggesting generally positive
customer satisfaction.
• The rating count has a large range, with a few products having an
extremely high number of reviews (max: 426,973), and its mean (19,637)
is far above its median (5,064). This indicates a long-tailed
distribution where a few popular products dominate the review landscape.
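A table like the one above can be produced with a short aggregation. The sketch below is illustrative only: the helper name `summarize` and the sample values are assumptions, and the mean-to-median ratio is just one simple way to flag a long tail (in the report's figures, the mean rating count is roughly 3.9 times the median).

```python
import pandas as pd

def summarize(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return count, mean, median, max and min per column, rounded to 2 decimals."""
    return df[cols].agg(["count", "mean", "median", "max", "min"]).round(2)

# Hypothetical sample with one very popular product.
sample = pd.DataFrame({
    "rating": [4.0, 4.1, 4.3, 3.8],
    "rating_count": [10, 20, 30, 1000],
})
stats = summarize(sample, ["rating", "rating_count"])

# A mean well above the median points to a long right tail.
tail_ratio = stats.loc["mean", "rating_count"] / stats.loc["median", "rating_count"]
print(stats)
print("mean / median for rating_count:", tail_ratio)
```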

3. Visualizations
The following plots provide a deeper look into the data, revealing trends and
relationships between different variables.
1. Distribution of Product Ratings
This histogram shows that the majority of products have a high rating,
clustering between 4.0 and 4.5. This indicates that most products on the
platform are well-received by customers.
2. Distribution of Discount Percentages
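The distribution of discounts has a prominent peak around 50% to 70%. With the
mean (56.59%) below the median (60.00%) and values ranging from 0% to 94%, the
longer tail runs toward low discounts, so the distribution is left-skewed. This
confirms the high discount strategy observed in the descriptive statistics and
suggests that products are frequently offered at a deep markdown.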
3. Relationship between Rating and Rating Count
This scatter plot, with a logarithmic scale on the x-axis, reveals a weak positive
correlation between rating and rating count. While highly rated products
(above 4.0) tend to have more reviews, there are also many products with high
ratings and very few reviews. This suggests that a high rating often accompanies
a popular product but does not by itself determine the review count.
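One way to put a number on this relationship is a Pearson correlation between the rating and the log of the rating count. The sketch below is an assumption about how that could be done, not the report's method; `np.log1p` is used so that products with zero reviews (the minimum in the dataset) do not break the logarithm, and the sample values are made up.

```python
import numpy as np
import pandas as pd

def rating_vs_count_corr(df: pd.DataFrame) -> float:
    """Pearson correlation between rating and log(1 + rating_count)."""
    log_count = np.log1p(df["rating_count"])
    return float(df["rating"].corr(log_count))

# Hypothetical sample in which better-rated products have more reviews.
sample = pd.DataFrame({
    "rating": [3.5, 4.0, 4.2, 4.5, 4.4],
    "rating_count": [0, 100, 1000, 10000, 50],
})
r = rating_vs_count_corr(sample)
print(f"correlation: {r:.2f}")
```

Values near 0 indicate a weak relationship, and values near 1 or -1 indicate a strong one.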
4. Relationship between Actual Price and Discounted Price
The scatter plot shows a clear linear relationship between the actual price and
the discounted price. This pattern suggests a consistent pricing strategy where
discounts are applied proportionally across different price points, with
higher-priced products receiving larger discounts in absolute terms.
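The linear pattern can be checked with a straight-line fit. The following is a rough sketch with NumPy using invented prices; the helper name `price_line` is hypothetical. If the intercept is close to zero, a slope of s means the discounted price is about s times the actual price, that is, a proportional discount of about 1 - s.

```python
import numpy as np

def price_line(actual, discounted):
    """Least-squares line discounted = slope * actual + intercept."""
    slope, intercept = np.polyfit(actual, discounted, 1)
    return float(slope), float(intercept)

# Hypothetical prices where every product is sold at 40% of its list price.
actual = np.array([100.0, 500.0, 1000.0, 2000.0])
discounted = 0.4 * actual
slope, intercept = price_line(actual, discounted)
print(f"slope={slope:.2f}, implied discount={(1 - slope) * 100:.0f}%")
```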
5. Top 10 Product Categories
The bar chart highlights the most prevalent product categories in the dataset.
"Electronics" is the most common, followed by "Computers & Accessories"
and "Home & Kitchen". This gives a clear overview of the types of products
included in the dataset, with a strong emphasis on technology and home goods.
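A category ranking of this kind can be obtained by counting rows per top-level category. The sketch below assumes a `category` column holding a `|`-separated hierarchy and takes the first segment as the main category; that column layout, the helper name, and the sample rows are assumptions for illustration.

```python
import pandas as pd

def top_categories(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count products per top-level category and keep the n most common."""
    main = df["category"].str.split("|").str[0]
    return main.value_counts().head(n)

# Hypothetical sample rows.
sample = pd.DataFrame({"category": [
    "Electronics|Headphones",
    "Electronics|Cables",
    "Electronics|Televisions",
    "Computers&Accessories|Mice",
    "Computers&Accessories|Keyboards",
    "Home&Kitchen|Kettles",
]})
top = top_categories(sample)
print(top)
```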

4. Machine Learning – Model Building
(a) Regression Attempt (SVR)
• Built a pipeline with preprocessing (scaling + one-hot encoding) and
Support Vector Regression.
• Evaluated model with R² score and RMSE.
• The R² score was low (about 0.1), meaning the model explained only about
10% of the variance in the target and was not very predictive.
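A pipeline of the kind described in (a) might look like the sketch below, built with scikit-learn. It is not the report's code: the feature list is borrowed from the classification step, `rating` as the regression target is an assumption, the 80/20 split and default `SVR` settings are arbitrary choices, and the data is randomly generated, so the printed scores say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

NUMERIC = ["discounted_price", "actual_price", "discount_percentage", "rating_count"]
CATEGORICAL = ["main_category"]

def fit_svr(df, target="rating", seed=42):
    """Scale numeric features, one-hot encode categories, fit SVR, report R² and RMSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[NUMERIC + CATEGORICAL], df[target], test_size=0.2, random_state=seed
    )
    pre = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    model = Pipeline([("pre", pre), ("svr", SVR())])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return model, float(r2_score(y_test, pred)), rmse

# Synthetic stand-in data (assumption): ratings are random noise here.
rng = np.random.default_rng(0)
n = 200
actual = rng.uniform(100, 5000, n)
pct = rng.uniform(0, 90, n)
demo = pd.DataFrame({
    "actual_price": actual,
    "discount_percentage": pct,
    "discounted_price": actual * (1 - pct / 100),
    "rating_count": rng.integers(0, 50000, n),
    "main_category": rng.choice(["Electronics", "Home&Kitchen"], n),
    "rating": rng.uniform(2.0, 5.0, n).round(1),
})
svr_model, r2, rmse = fit_svr(demo)
print(f"R2={r2:.3f} RMSE={rmse:.3f}")
```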
(b) Classification Approach
• Converted rating into a binary label:
o good_product = 1 if rating ≥ 4.0, else 0.
• Selected features: discounted_price, actual_price, discount_percentage,
rating_count, main_category.
• Built a pipeline with preprocessing + Random Forest Classifier.
• Split data into train/test sets.
• Trained the model and evaluated it using:
o Accuracy
o Precision, Recall, F1-score (via classification_report).
This measures how well the model classifies products as good (rating ≥ 4.0)
or bad (rating < 4.0).
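The classification steps in (b) can be sketched as follows. The label rule and feature names come from the list above; everything else (the helper name `fit_classifier`, the 80/20 stratified split, default `RandomForestClassifier` settings, passing numeric features through unscaled, and the random demo data) is an assumption made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["discounted_price", "actual_price", "discount_percentage",
            "rating_count", "main_category"]

def fit_classifier(df, seed=42):
    """Label products as good (rating >= 4.0), fit a random forest, return metrics."""
    y = (df["rating"] >= 4.0).astype(int)  # good_product label
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], y, test_size=0.2, random_state=seed, stratify=y
    )
    pre = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["main_category"])],
        remainder="passthrough",  # numeric columns are passed through unchanged
    )
    clf = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=seed))])
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    return accuracy_score(y_test, pred), classification_report(y_test, pred)

# Synthetic stand-in data (assumption), so the scores are not meaningful.
rng = np.random.default_rng(0)
n = 200
actual = rng.uniform(100, 5000, n)
pct = rng.uniform(0, 90, n)
demo = pd.DataFrame({
    "actual_price": actual,
    "discount_percentage": pct,
    "discounted_price": actual * (1 - pct / 100),
    "rating_count": rng.integers(0, 50000, n),
    "main_category": rng.choice(["Electronics", "Home&Kitchen"], n),
    "rating": rng.uniform(3.0, 5.0, n).round(1),
})
acc, report = fit_classifier(demo)
print("Accuracy:", acc)
print(report)
```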

Output:
Classification Performance:
Accuracy: 0.7642857142857142
Classification Report:
              precision    recall  f1-score   support

           0       0.62      0.31      0.41        75
           1       0.79      0.93      0.85       205

    accuracy                           0.76       280
   macro avg       0.70      0.62      0.63       280
weighted avg       0.74      0.76      0.73       280

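Results:
o Accuracy score → proportion of correct predictions (about 76.4% of the
280 test products).
o Classification report → detailed performance per class (precision,
recall, F1). Class 1 (good) reaches a recall of 0.93, while class 0 (bad)
reaches only 0.31.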
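To put the accuracy in context, it helps to compare it with a trivial baseline that always predicts the larger class. The short calculation below uses only the figures from the classification report above; the per-class counts are approximations because the reported recall values are rounded to two decimals.

```python
# Figures copied from the classification report above.
support = {0: 75, 1: 205}      # test products per class
recall = {0: 0.31, 1: 0.93}    # rounded to two decimals in the report
accuracy = 0.7642857142857142

total = sum(support.values())              # number of test products
correct = round(accuracy * total)          # correctly classified products
baseline = max(support.values()) / total   # accuracy of always predicting class 1

# Recall times support approximates the correct predictions per class.
approx_correct = {c: round(recall[c] * support[c]) for c in support}

print(total, correct, round(baseline, 3), approx_correct)
```

The baseline comes out at about 73.2%, so the model's 76.4% is only a modest improvement, and most of the misclassified products belong to class 0.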
