Amazon Items Recommendation
Dor Meir, 22.2.2024
Introduction
Kaggle dataset, collected from Amazon reviews
Content
• 500,000+ reviews
• 100,000+ users
• 100,000+ items
• 12 features about items, reviews and users
Task
• Build a recommender system: recommend each user's top 5 rating-predicted items
Data Insights
• Target (rating) is left-skewed
• 77% of items have < 5 reviews, 87% of users have < 6 reviews
• After filtering: 15k users, 25k items
• 5% of ratings were unverified and were dropped
• Assume each userName is unique
• 14% of ratings come from the non-unique names “Amazon Customer” or “Kindle Customer”
• 73k duplicate ratings were dropped
• Train set (80% of the data) has 136k rows (cleaning steps sketched below)
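A minimal sketch of the cleaning steps above; the column names (verified, userName, item_id, rating), the file name and the exact filter thresholds are assumptions, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical schema: boolean "verified", plus "userName", "item_id", "rating".
reviews = pd.read_csv("amazon_reviews.csv")

# Drop the ~5% unverified ratings and the ~73k duplicate ratings.
reviews = reviews[reviews["verified"]]
reviews = reviews.drop_duplicates(subset=["userName", "item_id", "rating"])

# Keep only sufficiently active users and items (thresholds are assumed).
user_counts = reviews["userName"].value_counts()
item_counts = reviews["item_id"].value_counts()
reviews = reviews[
    reviews["userName"].isin(user_counts[user_counts >= 6].index)
    & reviews["item_id"].isin(item_counts[item_counts >= 5].index)
]

# 80/20 train split.
train, test = train_test_split(reviews, test_size=0.2, random_state=42)
```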
Data Insights - continued
• 60% of items have < 2 brands, only 7% of ratings mention prices
• item_id defined as brand + itemName + price
• Top item: KIND’s Caramel snack, 500 reviews (0.4%)
• Median review:
• 2-word summary (“X stars”), 509-character description, no votes
• one-word text (“Good”), 5 images, 5 features, price of $13
• Brands: 8k unique, “KONG” covers 2% of reviews, 0.1% missing
• Categories: 9 unique (categories with < 1% of reviews were merged), 44% “Pet Supplies”
• Prices: 10 price groups, 14.6% missing, imputed by (sketched below):
1. 0.5% from older reviews of the same item_id (275 dates, no price change over time)
2. 12.8% from brand & category means (most prices are close to the mean), 1.3% from category means alone
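A minimal sketch of the two-step price imputation, assuming an items table with hypothetical columns item_id, brand, category and price:

```python
import pandas as pd

# Hypothetical schema: "item_id", "brand", "category", "price".
items = pd.read_csv("items.csv")

# 1. Reuse a known price from older reviews of the same item_id
#    (prices showed no change over time).
known_price = items.dropna(subset=["price"]).groupby("item_id")["price"].first()
items["price"] = items["price"].fillna(items["item_id"].map(known_price))

# 2. Fall back to the brand & category mean, then to the category mean alone.
brand_cat_mean = items.groupby(["brand", "category"])["price"].transform("mean")
cat_mean = items.groupby("category")["price"].transform("mean")
items["price"] = items["price"].fillna(brand_cat_mean).fillna(cat_mean)
```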
Data Insights - continued
• 450 features engineered (see the sketch below):
• Dummies for categories
• Dummy indicators for rows with missing values and outliers
• User-specific statistics: min, mean, median, max, std
• Item-specific statistics: min, mean, median, max, std
• Collaborative filtering: weighted averages of similar users’ ratings
• 80 features dropped due to high correlation (> 90%)
• Some features had different distributions across train, validation and test
• Features most correlated with the target (though all below 50%):
• Item-specific rating statistics: mean, median, max, std
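A rough sketch of the user/item statistics and the 90%-correlation filter, assuming a train table with hypothetical columns userName, item_id and rating plus the other engineered features:

```python
import numpy as np
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical file with the engineered rows

def add_group_stats(df, key, col="rating"):
    """Attach min/mean/median/max/std of `col` per `key` as new features."""
    stats = df.groupby(key)[col].agg(["min", "mean", "median", "max", "std"])
    stats.columns = [f"{key}_{col}_{s}" for s in stats.columns]
    return df.join(stats, on=key)

train = add_group_stats(train, "userName")   # user-specific statistics
train = add_group_stats(train, "item_id")    # item-specific statistics

# Drop one feature from every pair correlated above 0.9.
corr = train.select_dtypes("number").corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
train = train.drop(columns=to_drop)
```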
Model Selection
• Mean user rating (baseline model)
• The benchmark. Had a surprisingly low error.
• Linear Regression
• The go-to regression model. Highly interpretable.
• Linear Regression with L1 regularization (Lasso)
• LR needed feature selection; Lasso kept only one important feature.
• Random Forest Regressor (different max_depth values)
• Lasso didn’t capture complex relationships; non-linear bagging with built-in feature selection.
• Light Gradient Boosting Machine (LightGBM)
• Non-linear boosting: handles complex relationships and feature selection, grows asymmetrical (leaf-wise) trees, very fast. (All candidates are sketched below.)
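A minimal sketch of the candidate models; the per-user-mean baseline is computed directly with pandas, and the file name, dropped columns and hyperparameter values shown are assumptions:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor

train = pd.read_csv("train.csv")  # hypothetical file with the engineered features
X_train = train.drop(columns=["rating", "userName", "item_id"])
y_train = train["rating"]

# Baseline: each user's mean training rating (global mean for unseen users).
user_mean = train.groupby("userName")["rating"].mean()
global_mean = train["rating"].mean()

candidates = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),                       # alpha is an assumed value
    "rf_depth2": RandomForestRegressor(max_depth=2, random_state=42),
    "rf_full": RandomForestRegressor(max_depth=None, random_state=42),
    "lgbm": LGBMRegressor(),                          # library defaults
}
fitted = {name: model.fit(X_train, y_train) for name, model in candidates.items()}
```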
Comparison Methodology
• Metrics (computed as sketched below)
• Mean Absolute Error (MAE)
• Easy to interpret; large errors have a proportional (not amplified) impact.
• Mean Squared Error (MSE)
• Squaring gives bigger weight to bigger errors. A common loss function for linear regression.
• Mean Absolute Percentage Error (MAPE)
• Average absolute difference in %. Interpretable, but treats over- and underestimation asymmetrically.
• R² (Coefficient of Determination)
• In linear models, the % of variance explained. Mathematically it never decreases as features are added.
• MAE per target value, max error
• Distribution of error over rating bins (plus the max error); each range can reveal under- or overfitting.
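A minimal sketch of the metric report, assuming arrays y_true / y_pred of validation ratings and predictions:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
    r2_score,
    max_error,
)

def report(y_true, y_pred):
    """Print the comparison metrics, including MAE per true rating value."""
    print("MAE :", mean_absolute_error(y_true, y_pred))
    print("MSE :", mean_squared_error(y_true, y_pred))
    print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
    print("R2  :", r2_score(y_true, y_pred))
    print("Max :", max_error(y_true, y_pred))
    for rating in np.unique(y_true):                  # error per rating bin
        mask = y_true == rating
        print(f"MAE @ rating {rating}:",
              mean_absolute_error(y_true[mask], y_pred[mask]))
```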
Comparison Methodology
• RF max_depth=2
• Had 3 important features, but MAE & MSE > baseline.
• RF max_depth=None
• MAE < baseline, MSE > baseline, i.e. more extreme errors. Smaller errors on ratings <= 3. Unlimited depth: overfit?
• LGBM defaults
• More evenly spread feature importance, but MAE & MSE > baseline.
• LGBM grid-searched on max_depth, learning_rate, n_estimators (sketched below)
• Similar errors, a little better on ratings <= 3.
• RF grid search on max_depth
• Higher complexity gave better MAE & MSE.
• Didn’t outperform max_depth=None.
• Not enough time to mitigate overfitting with more n_estimators or feature selection.
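A rough sketch of the two grid searches; the parameter grids, the 3-fold CV and the MAE scoring are assumptions, not the exact setup used:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from lightgbm import LGBMRegressor

train = pd.read_csv("train.csv")  # hypothetical file with the engineered features
X_train = train.drop(columns=["rating", "userName", "item_id"])
y_train = train["rating"]

rf_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"max_depth": [2, 5, 10, 20, None]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
lgbm_search = GridSearchCV(
    LGBMRegressor(),
    param_grid={
        "max_depth": [3, 6, -1],
        "learning_rate": [0.01, 0.05, 0.1],
        "n_estimators": [100, 300, 500],
    },
    scoring="neg_mean_absolute_error",
    cv=3,
)
rf_search.fit(X_train, y_train)
lgbm_search.fit(X_train, y_train)
print(rf_search.best_params_, lgbm_search.best_params_)
```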
Model Selection - decision
• RF max_depth=None
• MAE < baseline, MSE > baseline (bigger errors), but better on the lower rating bins, so it filters unlikable items better.
• Even if we wanted to, we can’t pick the baseline, as it gives the same rating to every candidate item of a user.
• For lack of time, selected this model even though it might overfit.
• As expected, it was a little worse on the test data. Hoping the final training on the entire data will mitigate this a little.
• Top important features are similar in the model trained on the entire data (refit sketched below).
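A minimal sketch of the final refit on the entire data and the feature-importance check; the file name and dropped columns are assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

data = pd.read_csv("all_engineered_rows.csv")  # hypothetical: all engineered rows
X_all = data.drop(columns=["rating", "userName", "item_id"])
y_all = data["rating"]

final_model = RandomForestRegressor(max_depth=None, random_state=42)
final_model.fit(X_all, y_all)

# Compare the top importances with the train-only model's top features.
importances = pd.Series(final_model.feature_importances_, index=X_all.columns)
print(importances.sort_values(ascending=False).head(10))
```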
Service Deployment
• Dropped items with < 0.1% of the ratings.
• Similarity features took too long to compute for too little benefit, so they were dropped.
• Predict ratings for all pairs of users and items the user hasn’t purchased.
• Filtered the top 5 recommendations per user, exported to CSV.
• Filtered the top 5 recommendations in 5 different categories across all users, exported to CSV.
• Serving logic (flowchart; sketched below): if the user exists and has purchase history, return their 5 recommended items; otherwise return the 5 most popular items per category.
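A minimal sketch of the serving fallback, assuming hypothetical objects: the trained model, items and purchases dataframes, a build_features helper and a pre-computed popular_by_category table:

```python
def recommend(user, model, items, purchases, build_features, popular_by_category):
    """Return 5 items for `user`, falling back to the most popular per category."""
    known_users = set(purchases["userName"])
    if user not in known_users:
        # User doesn't exist or has no purchase history:
        # serve the 5 most popular items per category.
        return popular_by_category
    seen = set(purchases.loc[purchases["userName"] == user, "item_id"])
    candidates = items[~items["item_id"].isin(seen)]
    if candidates.empty:
        return popular_by_category
    preds = model.predict(build_features(user, candidates))
    return candidates.assign(pred=preds).nlargest(5, "pred")["item_id"].tolist()
```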
Run app
Questions?