Unit 4 Notes

The document outlines the fundamentals of recommendation systems, including their definition, types (content-based, collaborative filtering, and hybrid systems), and how they analyze user behavior to generate personalized suggestions. It discusses the utility matrix, the long tail phenomenon, and various applications across industries such as e-commerce, entertainment, and education. Additionally, it covers methods for populating the utility matrix and techniques for discovering features of documents for content-based recommendations.


UNIT 4 SYLLABUS: Recommendation Systems: Model for Recommendation Systems, Utility Matrix, Content-Based Recommendations, Discovering Features of Documents, Collaborative Filtering, Application of dimensionality reduction.

WHAT IS A RECOMMENDATION SYSTEM?

A recommendation system is a software tool or technique designed to predict user preferences and suggest items they might like. These systems are commonly used in various industries, such as e-commerce, streaming platforms, and social media, to personalize the user experience and improve engagement.

How Recommendation Systems Work

1. Analyze User Behavior: Track user interactions (e.g., clicks, purchases, or views) to
understand preferences.

2. Identify Patterns: Use data and algorithms to find correlations between users and
items.

3. Generate Recommendations: Suggest items that align with the user's interests or that
similar users have liked.

Real-World Applications

• E-commerce: Amazon suggests products based on browsing and purchase history.

• Streaming Platforms: Netflix recommends shows or movies based on what you’ve watched.

• Social Media: Instagram suggests accounts or posts based on your interactions.

Example: Consider a movie platform like Netflix:

• User Action: You watch and rate several romantic comedies.

• System Analysis: The system notices your preference for this genre.

• Recommendation: Netflix suggests similar movies like Crazy Rich Asians or The
Proposal.

Types of Recommendation Systems

There are two main types of recommendation systems, each with a unique approach to
making suggestions:

1. Content-Based Recommendation Systems: These systems focus on the features of items and suggest similar ones based on user preferences.

How It Works

• Each item is described using its features (e.g., genre, cast, or director for movies).

• The system analyzes what the user has interacted with and finds items with similar
features.

Example

• Scenario: You watch The Dark Knight (Action/Thriller) on Netflix.

• System Insight: Recognizes your interest in action-packed movies.

• Recommendation: Suggests movies like Inception or Avengers: Endgame, which share similar themes.

Advantages

1. Works well for users with clear preferences.

2. No need for other users' data.

Limitations

1. Struggles with the "cold start" problem for new users with no history.

2. Can lead to a narrow focus, recommending only similar items (lack of diversity).

2. Collaborative Filtering Systems: These systems focus on user relationships and use
the preferences of similar users to make recommendations.

How It Works

• Finds users with similar tastes based on their interactions.

• Recommends items that similar users have liked.

Types of Collaborative Filtering

1. User-Based Collaborative Filtering: Recommends items based on the preferences of similar users.

o Example: If you and another user both rated Inception and The Matrix highly, the
system might recommend Interstellar to you if they liked it.

2. Item-Based Collaborative Filtering: Finds relationships between items based on user ratings.

o Example: If most users who liked Harry Potter also liked Lord of the Rings, the system might recommend the latter to you.

Advantages

1. Can provide diverse recommendations beyond user preferences.

2. Requires no item feature data, so it works even for items whose characteristics are hard to describe (though brand-new items and users still face a cold start).

Limitations

1. Requires a large amount of user interaction data.

2. Suffers from the sparsity problem: If users interact with only a few items, it’s harder
to find patterns.

Hybrid Systems: Many platforms, such as Netflix, combine both approaches:

1. Use content-based methods to identify user preferences.

2. Apply collaborative filtering to diversify recommendations.

9.1. A MODEL FOR RECOMMENDATION SYSTEM

9.1.1 The Utility Matrix

Definition: A utility matrix is a data structure that represents the relationship between
two sets of entities, typically users and items, by storing their preferences. It is a
foundational concept in recommendation systems.

Structure

• Rows: Represent users.

• Columns: Represent items (e.g., movies, products, songs).

• Values: Represent user preferences for items, such as ratings (e.g., 1–5 stars) or binary
interactions (e.g., 1 for "liked," blank for "not interacted").

Key Features

1. Sparsity: Most entries in the matrix are blank since users interact with only a few items.

2. Prediction Goal: The recommendation system predicts the blank entries to suggest items.

Example: Consider a small utility matrix for Movies A, B, and C:

1. Known Ratings: User 1 gave a 4 to Movie A and a 5 to Movie B.

2. Prediction Task: Predict User 1's rating for Movie C based on patterns in the matrix.
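The toy matrix above can be sketched in a few lines. This is a minimal illustration only; the second user's ratings are invented so the matrix has more than one row:

```python
# Hypothetical utility matrix: rows are users, columns are Movies A, B, C.
# None marks a blank entry (no rating observed).
utility = {
    "User 1": {"A": 4, "B": 5, "C": None},
    "User 2": {"A": None, "B": 5, "C": 3},   # made-up row for illustration
}

# Sparsity: the fraction of entries that are blank.
entries = [r for row in utility.values() for r in row.values()]
sparsity = entries.count(None) / len(entries)
print(f"Sparsity: {sparsity:.2f}")

# The prediction task: list the blank entries the system must estimate.
missing = [(u, m) for u, row in utility.items()
           for m, r in row.items() if r is None]
print("Entries to predict:", missing)
```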

Use in Recommendation Systems: The utility matrix is used in two primary ways:

1. Content-Based Systems: Use item properties (e.g., genres, features) to predict missing
entries.

o Example: If User 1 liked two action movies, predict a high rating for Movie C if it's also
an action movie.

2. Collaborative Filtering Systems: Use similarities between users or items.

o Example: If User 2 and User 1 rated Movie B similarly, predict User 2's preference for
Movie A based on User 1's rating.

Challenges

1. Sparsity: Most users interact with only a small fraction of available items.

o Solution: Algorithms like clustering or dimensionality reduction can fill in gaps.

2. Cold Start Problem:

o For new users: No past interactions to base predictions on.

o For new items: No user data to suggest them.

o Solution: Use hybrid models that combine content-based and collaborative filtering
techniques.

9.1.2 The Long Tail Phenomenon

Definition: The long tail phenomenon refers to the ability of online platforms to cater to
both popular and niche items, unlike traditional physical stores that focus only on the
most popular items due to space limitations.

Concept

• Popular Items: Represent a small fraction of the total inventory but account for most
sales in physical stores.
• Niche Items: Represent a large fraction of the inventory and contribute collectively to
significant sales online.

Illustration: Imagine the sales distribution of books:

• Physical Bookstores: Stock only the top 1,000 bestsellers due to limited shelf space.

• Amazon: Offers millions of books, including rare or niche ones.

Advantages of the Long Tail

1. Increased Diversity: Online platforms can offer users access to items they might not
have considered.

2. Discovery of Niche Products: Tailored recommendations help users explore lesser-known items.

9.1.3 Applications of Recommendation Systems

Recommendation systems are widely used across industries to personalize user experiences and boost engagement. Here’s a detailed look at their applications:

1. E-commerce: Suggest products to enhance the user shopping experience and increase sales.

How It Works: Recommendations are based on user browsing history, past purchases, or
product searches.

Examples:

1. Amazon: Suggests "Customers who bought this also bought" items.

o If you purchase a smartphone, it might recommend a phone case or screen protector.

2. Flipkart: Recommends similar or complementary products based on your cart items.

2. Entertainment Platforms: Help users discover new content based on their viewing
or listening history.

How It Works: Systems analyze user preferences and find patterns across similar users.

Examples:

1. Netflix:

o Recommends movies or series based on your watch history.

o If you watch Stranger Things, it might suggest Dark or The Umbrella Academy.

2. Spotify:
o Suggests playlists or artists based on your listening habits.

o Its "Discover Weekly" playlist is generated using collaborative filtering.

3. News Platforms: Deliver personalized news articles to keep users engaged.

How It Works: Uses content-based and collaborative filtering techniques to recommend articles.

Examples:

1. Google News: Recommends articles aligned with topics you frequently read.

2. Flipboard: Suggests stories based on your selected categories and interaction history.

4. Education: Help learners find relevant courses or study materials.

How It Works: Systems consider previous course completions, interests, or ratings.

Examples:

1. Coursera: Recommends courses based on enrolled or completed courses.

2. Khan Academy: Suggests topics or exercises tailored to student progress.

5. Social Media: Enhance user experience by suggesting connections, content, or communities.

How It Works: Uses collaborative filtering to recommend friends or groups.

Examples:

1. Facebook: Suggests "People you may know" based on mutual friends.

2. Instagram: Recommends accounts to follow based on similar interests.

6. Retail: Improve in-store and online shopping by analyzing preferences.

How It Works: Suggests products using user profiles or purchase patterns.

Example:

• Walmart: Uses recommendation systems to suggest complementary items during checkout (e.g., adding batteries when buying a remote).

7. Healthcare: Provide recommendations for medical treatments, exercises, or health tips.

How It Works: Combines user data with medical knowledge to make suggestions.

Example:

• Fitness Apps: Recommend exercises based on activity history and goals.


9.1.4 Populating the Utility Matrix

Populating the utility matrix means obtaining the user-item interaction data that fills it. The utility matrix is central to recommendation systems, as it represents the degree of preference (e.g., ratings) users have for items. There are two primary approaches for obtaining this data.

Approaches for Populating the Utility Matrix

1. Asking Users to Rate Items

• Users are explicitly asked to provide ratings for items they have interacted with.

• Example:

o Movie ratings: Websites like Netflix ask users to rate movies on a scale of 1 to 5
stars.

o Product ratings: Amazon allows customers to rate products they purchase.

o Content platforms: YouTube and news sites often ask users to rate videos or
articles.

Limitations:

• Low response rates: Many users do not bother to provide ratings.

• Bias: Ratings are typically provided only by users who are motivated, which might
not represent the general population.

2. Inferring Ratings from User Behavior

• User behavior, such as purchases, views, or clicks, is analyzed to infer their preferences.

• Example:

o Purchase data: If a user buys a product, it is inferred that they "like" it. Such
an interaction might be recorded as a "1" in the utility matrix.

o Viewing data: On YouTube, watching a video can indicate a preference.

o Article reading: If a user reads a news article, they may be considered interested in the topic.

Features of Inferred Ratings:

• This type of matrix often contains binary ratings:

o "1" means the user interacted with or liked the item.

o A blank or "0" indicates no interaction, which may not necessarily mean dislike but rather no data.
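Inferring a binary utility matrix from behavior can be sketched as follows. The interaction log (user names and items) is invented purely for illustration:

```python
# Hypothetical interaction log: each entry means "user interacted with item".
interactions = [
    ("alice", "phone"), ("alice", "case"),
    ("bob", "phone"),
]

users = sorted({u for u, _ in interactions})
items = sorted({i for _, i in interactions})

# Build the binary utility matrix: 1 = interacted, 0 = no data (not dislike).
matrix = {u: {i: 0 for i in items} for u in users}
for user, item in interactions:
    matrix[user][item] = 1

for user in users:
    print(user, [matrix[user][i] for i in items])
```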

Challenges in Populating the Utility Matrix

1. Sparse Data:

o Utility matrices are typically sparse because users interact with only a small subset of
the available items.

o For example, an online retailer may have thousands of products, but each user
interacts with only a handful.

2. Cold Start Problem:

o New users or items lack sufficient interaction data, making it difficult to populate their
rows or columns in the utility matrix.

3. Implicit Feedback:

o Inferring preferences from actions like purchases or views can be noisy and less
reliable compared to explicit ratings.

9.2 Content-Based Recommendations

Content-based recommendation systems focus on the attributes of items and user preferences to generate personalized suggestions. The system learns what a user likes by analyzing the properties of items they have interacted with and recommends similar items based on their features.

9.2.1 Item Profiles

In a content-based system, we must construct for each item a profile, which is a record or
collection of records representing important characteristics of that item. In simple cases,
the profile consists of some characteristics of the item that are easily discovered.

For example, a movie can be described by attributes like:

• Title: 3 Idiots

• Genre: Comedy, Drama

• Director: Rajkumar Hirani


• Cast: Aamir Khan, Kareena Kapoor, R. Madhavan

• Release Year: 2009

These features form the item profile for the movie.

2. User Profiles

A user profile summarizes the features of the Bollywood movies they like. It is created
by aggregating the attributes of these movies.

• Example: User Profile:

o Likes 3 Idiots and PK:

▪ Preferred Genres: Comedy (60%), Drama (40%)

▪ Preferred Director: Rajkumar Hirani

▪ Preferred Cast: Aamir Khan, Anushka Sharma

3. Recommendation Process

The system matches the user profile with movie profiles using similarity metrics (e.g.,
cosine similarity).

Step-by-Step Example

Scenario: A streaming platform wants to recommend Bollywood movies to a user named Rahul.

Step 1: Gather Data

• Movies Rahul Likes:

1. 3 Idiots (Comedy, Drama; Aamir Khan, Kareena Kapoor; Rajkumar Hirani).

2. PK (Comedy, Drama; Aamir Khan, Anushka Sharma; Rajkumar Hirani).

• Movies in Catalog:

1. Chhichhore (Comedy, Drama; Sushant Singh Rajput, Shraddha Kapoor; Nitesh Tiwari).

2. Taare Zameen Par (Drama; Aamir Khan, Darsheel Safary; Aamir Khan).

3. Dil Chahta Hai (Comedy, Drama; Aamir Khan, Saif Ali Khan; Farhan Akhtar).

Step 2: Create Movie Profiles


Step 3: Build Rahul’s Profile

Based on the features of 3 Idiots and PK, Rahul’s profile favors the Comedy and Drama genres, the actor Aamir Khan, and the director Rajkumar Hirani.

Step 4: Measure Similarity

Compare Rahul’s profile with the profiles of other movies using similarity metrics (e.g.,
cosine similarity).

1. Similarity with Chhichhore:

o Genre: Comedy, Drama (full match, high score).

o Cast: No overlap (low score).

o Director: Nitesh Tiwari (no match, low score).

o Overall Score: Medium.

2. Similarity with Taare Zameen Par:

o Genre: Drama (partial match, medium score).

o Cast: Aamir Khan (high match, high score).

o Director: Aamir Khan (no match as director but high affinity due to actor overlap).

o Overall Score: High.

3. Similarity with Dil Chahta Hai:

o Genre: Comedy, Drama (full match, high score).

o Cast: Aamir Khan (high match, high score).


o Director: Farhan Akhtar (no match, low score).

o Overall Score: High.

Step 5: Recommend Movies

Based on similarity scores:

1. Recommend Dil Chahta Hai (high similarity).

2. Recommend Taare Zameen Par (high similarity).

3. Optionally suggest Chhichhore (medium similarity).
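The scoring in Steps 4 and 5 can be sketched as set overlap between Rahul's profile and each movie profile. The weighting below (shared cast members counted double) is an assumption chosen only so this toy score reproduces the ranking in the walkthrough; it is not a standard formula:

```python
# Rahul's profile, aggregated from 3 Idiots and PK.
rahul = {"genres": {"Comedy", "Drama"},
         "cast": {"Aamir Khan", "Kareena Kapoor", "Anushka Sharma"},
         "director": {"Rajkumar Hirani"}}

catalog = {
    "Chhichhore": {"genres": {"Comedy", "Drama"},
                   "cast": {"Sushant Singh Rajput", "Shraddha Kapoor"},
                   "director": {"Nitesh Tiwari"}},
    "Taare Zameen Par": {"genres": {"Drama"},
                         "cast": {"Aamir Khan", "Darsheel Safary"},
                         "director": {"Aamir Khan"}},
    "Dil Chahta Hai": {"genres": {"Comedy", "Drama"},
                       "cast": {"Aamir Khan", "Saif Ali Khan"},
                       "director": {"Farhan Akhtar"}},
}

def score(profile, movie):
    # Count shared features; the x2 cast weight is an illustrative assumption.
    return (len(profile["genres"] & movie["genres"])
            + 2 * len(profile["cast"] & movie["cast"])
            + len(profile["director"] & movie["director"]))

ranked = sorted(catalog, key=lambda m: score(rahul, catalog[m]), reverse=True)
print(ranked)
```

With these weights the order comes out Dil Chahta Hai, then Taare Zameen Par, then Chhichhore, matching the walkthrough above.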

Advantages

1. Personalized Suggestions:

o Tailored to Rahul’s preference for Comedy-Drama movies with Aamir Khan.

2. No Dependency on Other Users:

o Recommendations are based solely on Rahul’s preferences.

9.2.2 Discovering Features of Documents

Discovering features of documents is an essential process for recommendation systems that


rely on textual data. Unlike structured data (e.g., movie genres or product specifications),
documents often lack readily available features. This necessitates the extraction of
meaningful attributes to represent the content effectively.

Key Concepts

1. Challenge in Feature Discovery

Documents, such as news articles, blogs, or web pages, often do not have predefined
features like genre or author. Extracting representative features involves identifying
elements that summarize the main topics or themes of the document.

2. Feature Extraction Techniques

a. Stop Words Removal

• Stop words are the most common words in a language (e.g., "and," "the," "is") that do
not provide meaningful information about the content.

• Example:

o Original Text: "The cat is sitting on the mat."

o After Removal: "cat sitting mat."
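A minimal sketch of this step; the stop list here is a tiny hand-picked subset, not a full linguistic list:

```python
# Tiny illustrative stop list (a real one would be much larger).
STOP_WORDS = {"the", "is", "on", "a", "an", "and"}

def remove_stop_words(text):
    # Lowercase, drop the trailing period, split into words, filter stop words.
    words = text.lower().rstrip(".").split()
    return [w for w in words if w not in STOP_WORDS]

print(remove_stop_words("The cat is sitting on the mat."))
```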

b. TF-IDF (Term Frequency-Inverse Document Frequency)

• Definition: TF-IDF is a statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).

• Steps:

1. Term Frequency (TF): Counts how often a word appears in a document.

2. Inverse Document Frequency (IDF): Measures how unique the word is across all
documents.

o Words with high TF-IDF scores are considered important features.

• Example:

o Corpus:

▪ Doc 1: "apple banana apple."

▪ Doc 2: "banana orange."

o TF of "apple" in Doc 1: 2/3.

o IDF of "apple": log(Total Docs / Docs Containing "apple") = log(2/1).

o TF-IDF: Combines these values to highlight "apple" as a key term in Doc 1.
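The toy numbers above can be reproduced directly. Note the logarithm base is a convention the notes leave unspecified; the natural log is assumed here:

```python
import math

# The toy corpus from the example above.
docs = [["apple", "banana", "apple"], ["banana", "orange"]]

def tf(term, doc):
    # Term frequency: occurrences of the term divided by document length.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Inverse document frequency: log(total docs / docs containing the term).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

tf_apple = tf("apple", docs[0])     # 2/3
idf_apple = idf("apple", docs)      # log(2/1)
tfidf_apple = tf_apple * idf_apple
print(round(tf_apple, 3), round(idf_apple, 3), round(tfidf_apple, 3))
```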

3. Choosing Features

• Top-n Words: Select the top-n words with the highest TF-IDF scores as features.

• Threshold-Based Selection: Include words with TF-IDF scores above a specific threshold.

Document Representation

After extracting features, documents are represented as vectors or sets of words:

1. Vector Representation:

o Each document becomes a vector in a multi-dimensional space, with dimensions corresponding to features (words).

o Example:

▪ Features: [apple, banana, orange].

▪ Doc 1 Vector: [2, 1, 0] (frequency of each feature).


2. Similarity Measures:

o Jaccard Distance: Measures overlap between sets of words.

o Cosine Similarity: Compares the angle between vectors to assess their similarity.

o Example: Two articles about "sports" might have a high cosine similarity due to shared
keywords like "match," "score," and "team."
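Both similarity measures can be sketched over the toy feature space [apple, banana, orange] from the vector-representation example:

```python
import math

doc1 = [2, 1, 0]   # "apple banana apple"
doc2 = [0, 1, 1]   # "banana orange"

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    # Treat nonzero components as set membership, then 1 - |A∩B| / |A∪B|.
    sa = {i for i, x in enumerate(a) if x}
    sb = {i for i, x in enumerate(b) if x}
    return 1 - len(sa & sb) / len(sa | sb)

print(round(cosine(doc1, doc2), 3))            # 1 / sqrt(10)
print(round(jaccard_distance(doc1, doc2), 3))  # 1 - 1/3
```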

Applications

1. News Platforms: Recommend articles on similar topics.

o Example: Suggesting "Renewable Energy Initiatives" to readers of "Climate Change Policies."

2. Educational Content: Recommend related research papers or books based on shared keywords.

3. E-Commerce: Match product descriptions (documents) with user search terms.

Advantages

1. Automated Feature Discovery: Reduces manual effort in tagging or categorizing documents.

2. Scalable: Efficient for processing large document collections.

Challenges

1. Semantic Understanding:

o TF-IDF does not capture the meaning of words or phrases.

o Solution: Use advanced methods like word embeddings (e.g., Word2Vec, BERT).

2. Stop Words Context:

o Removing stop words may lose context (e.g., "not" in "not effective").

o Solution: Manually curate stop word lists for specific domains.

9.2.3 Obtaining Item Features From Tags

In the context of recommendation systems, tags are user-generated labels or keywords that
describe the content or features of an item. Tags provide a valuable way to extract item
features that might not be explicitly available from the item’s metadata (such as genre,
author, etc.). This approach allows recommendation systems to use crowdsourced
information to identify characteristics of items that may not be easily discernible from
traditional analysis methods.

Challenges in Feature Extraction for Items (Example: Images)

For items like images, traditional methods like analyzing pixel data don’t provide much
useful information. A simple image may not convey its meaning, such as whether it is a
picture of Tiananmen Square or a sunset at Malibu, through pixel analysis alone. However,
tags help to bridge this gap by allowing users to describe images in a way that the system
can interpret.

Example: A user tags an image with the word "sunset at Malibu," while another user
tags a different image with "Tiananmen Square." These tags provide important
descriptors of the items (images in this case) that are much more insightful than pixel
analysis.

Using Tags for Recommendation Systems

The use of tags can be an effective approach for discovering item features in
recommendation systems. Websites like del.icio.us (now part of Yahoo) invited users to tag
web pages with descriptive keywords. This system helped users find web pages that
matched their search terms by searching with a specific set of tags. These tags can also be
used in a recommendation system, where if a user frequently bookmarks or retrieves pages
with certain tags, the system can recommend other items that share the same tags.

Example: Del.icio.us: Users who bookmarked pages tagged with “AI,” “machine
learning,” and “data science” could be recommended other pages tagged similarly, thus
enhancing the user’s ability to discover related content.

The Problem with Tagging for Feature Discovery

While tagging can help in discovering item features, its effectiveness depends on user
participation. The success of the system relies on users being willing to tag items and
provide accurate tags. Furthermore, there must be enough tags to ensure that erroneous
or inconsistent tags don’t negatively impact the recommendation system.

Potential Issues:

• Low Participation: If not enough users tag items, there won’t be enough data to make
accurate recommendations.
• Erroneous Tags: Users might tag items incorrectly, which could mislead the
recommendation system.

Innovative Tagging Approaches

One innovative method for encouraging tagging is through game-like mechanisms. A system like Luis von Ahn's games allows players to collaborate on tagging items. In this system, two players suggest tags for the same item, and if they agree, they win. This kind of game-based approach has the potential to engage more users and generate more accurate and consistent tags.

Limitations of Tagging for Feature Discovery

1. User Effort: Tagging requires active participation from users, which might not always
happen in large quantities.
2. Quality Control: Erroneous or irrelevant tags can distort the feature extraction process.
3. Tagging Coverage: If the tags do not cover all important aspects of the items, the
recommendation system may miss out on critical features.

9.2.4 Representing Item Profiles

In content-based recommendation systems, the primary task is to create item profiles that represent important characteristics of items. These profiles can be used to measure similarities between items, which is essential for recommending items that align with user preferences.

Types of Item Profiles

1. Boolean Vector Representation

For discrete features (e.g., actors, genres, directors), we can represent an item as a vector
of 0s and 1s. Each component corresponds to a specific feature, and the vector is
populated based on whether the item contains that feature.

• Example: Consider a movie as an item:


o Features: Actor (Aamir Khan), Genre (Comedy, Drama), Director (Rajkumar Hirani).
o Movie 1: 3 Idiots
▪ Actor (Aamir Khan) → 1
▪ Genre (Comedy) → 1
▪ Genre (Drama) → 1
▪ Director (Rajkumar Hirani) → 1
o Movie 2: PK
▪ Actor (Aamir Khan) → 1
▪ Genre (Comedy) → 1
▪ Genre (Drama) → 1
▪ Director (Rajkumar Hirani) → 1

Each movie is represented as a vector with 1s for features it contains and 0s for features
it doesn't. This makes it easy to compute similarities between movies using techniques
like cosine similarity.
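The Boolean profiles above can be compared directly. A sketch, adding a hypothetical third profile (Comedy and Drama only) so the comparison is not trivially 1.0:

```python
import math

# Boolean profiles over the features [Aamir Khan, Comedy, Drama, Rajkumar Hirani].
three_idiots = [1, 1, 1, 1]
pk           = [1, 1, 1, 1]
other_movie  = [0, 1, 1, 0]   # hypothetical: shares only the two genres

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine(three_idiots, pk))                     # identical profiles -> 1.0
print(round(cosine(three_idiots, other_movie), 3))  # 2 / (2 * sqrt(2))
```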

2. Numerical Features

Some features cannot be represented as Boolean values (e.g., ratings, screen size, price).
These features should be stored as numerical values within the item profile.

• Example: For a movie, you might include the average rating as a numerical feature:
o Movie 1: 3 Idiots → Average Rating: 4.2
o Movie 2: PK → Average Rating: 4.5

These numerical features are stored as components of the item’s vector, allowing for a
more nuanced comparison between items.

3. Hybrid Feature Representation

An item profile can also contain a combination of Boolean and numerical features, as
many items have both types of features (e.g., movies with actors and ratings).

• Example:
o Movie Profile: [Actor: Aamir Khan → 1, Genre: Comedy → 1, Rating → 4.2]

Building Item Profiles for Different Types of Items

Different items require different features, and those features can be extracted using
various methods:
1. Movies: Features could include actors, directors, genres, and average ratings. These
are typically easy to extract from movie metadata like IMDB or Rotten Tomatoes.
2. Books: Author, publication year, genre, and average ratings can serve as features
for books.
3. Products (E-commerce): Features for products could include brand, category, price,
and user reviews.
4. Documents: For documents, keywords or important terms (often derived from TF-
IDF) can form the features for an item profile. For example, news articles might use
terms such as "politics," "economy," or "health" to describe the subject matter.

Example: Movie Item Profiles

• Movie 1: 3 Idiots:
o Features: Aamir Khan, Rajkumar Hirani, Comedy, Drama, 4.2 Average Rating.
o Profile: [1, 1, 1, 0, 1, 4.2]
• Movie 2: PK:
o Features: Aamir Khan, Rajkumar Hirani, Comedy, Drama, 4.5 Average Rating.
o Profile: [1, 1, 1, 0, 1, 4.5]

These profiles can be compared using similarity measures to recommend items with
similar profiles.

Computing Similarity Between Items: To recommend similar items, cosine similarity is often used. The cosine similarity measures the cosine of the angle between two vectors (item profiles). A higher cosine value indicates that the two items are more similar.

Cosine Similarity Formula: cos(A, B) = (A · B) / (||A|| × ||B||), where A · B is the dot product of the two item-profile vectors and ||A||, ||B|| are their Euclidean lengths.

Challenges in Representing Item Profiles

• Data Sparsity: Some items may lack enough features (e.g., unknown movies or
products).
• Feature Selection: Identifying the right set of features to represent each item can be
challenging, especially when dealing with large datasets or heterogeneous item types.
• Normalization: Different scales for numerical features (like ratings or prices) may
require normalization to ensure they are equally weighted in similarity calculations.

9.2.5 User Profiles

User profiles are critical in content-based recommendation systems as they help represent
a user's preferences for items, which is essential for recommending relevant items to the
user. In a content-based system, we aggregate the characteristics or features of the items
that a user has interacted with to create a personalized profile that describes the user's
tastes.

Creating User Profiles

User profiles are typically constructed using data from the utility matrix, which contains
known information about the degree to which users like certain items. This matrix is often
sparse, meaning most of the entries are unknown, but it provides a useful framework for
constructing user profiles.

Steps to Create a User Profile:

1. Utility Matrix Representation: Each user’s preferences for items are represented in
the utility matrix, where the entries can be binary (e.g., 1 for purchased or liked, 0 for
not interacted) or ratings (e.g., a 1-5 scale indicating how much a user likes an item).
o Example: User 1 likes Movie A (rating = 4) and Movie B (rating = 5), but has not rated
Movie C.
2. Aggregating Item Profiles: To form a user profile, you aggregate the profiles of the
items that the user has interacted with (rated or liked).
o Boolean Profiles: If the utility matrix has binary data (e.g., 1 for liked items), the user
profile is typically the average of the item profiles for the items the user likes.
▪ Example: If a user likes 20% of the movies with Julia Roberts, their profile will have a
0.2 in the component for Julia Roberts.
3. Non-Binary Data (Ratings): For non-binary data (ratings), the item profiles are
weighted by the user’s ratings.
o The user profile is calculated by normalizing the ratings (subtracting the user’s average
rating to emphasize deviations from their average).
▪ Example: If User 1 gives an average rating of 3 and rates three movies with Julia Roberts
as 3, 4, and 5, their profile for Julia Roberts would be the average of (3-3), (4-3), and (5-
3), which equals 1.
▪ Conversely, for another user, say User 2, if their average rating is 4 and their ratings
for Julia Roberts movies are 2, 3, and 5, their profile would be the average of (2-4), (3-
4), and (5-4), which equals -2/3.
4. Profile Representation: The user profile is often represented as a vector containing
values for each feature (e.g., actor, director, genre).
o Example:
▪ If the features are "Julia Roberts," "Action," and "Drama," and the user likes movies with
Julia Roberts in them and favors the "Action" genre, their profile might look like:
▪ [0.2 (Julia Roberts), 0.8 (Action), 0.3 (Drama)].
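The normalization arithmetic in step 3 can be checked in a few lines:

```python
# Normalized profile component: average of (rating - user's mean rating),
# so the value emphasizes deviations from each user's own average.
def profile_component(user_avg, ratings):
    return sum(r - user_avg for r in ratings) / len(ratings)

user1 = profile_component(3, [3, 4, 5])   # (0 + 1 + 2) / 3 = 1
user2 = profile_component(4, [2, 3, 5])   # (-2 - 1 + 1) / 3 = -2/3
print(user1, round(user2, 3))
```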

Similarity between User and Item Profiles: Once user profiles are created, we can
compute the similarity between a user’s profile and the item profiles using cosine
similarity or other distance measures. This helps to estimate which items the user will
like based on their preferences.

Advantages of User Profiles

1. Personalization: User profiles allow for personalized recommendations based on the specific preferences and behaviors of each individual user.
2. Content-Based: The system can recommend items based solely on their content
(features) and the user’s past interactions with similar items, without needing data from
other users.

Challenges

1. Cold Start Problem: When a user has little or no history of interacting with items, it’s
difficult to build a meaningful profile. This is known as the cold start problem.
o Solution: Asking users to fill out preference questionnaires or using hybrid
recommendation systems that combine content-based and collaborative filtering can
help.
2. Feature Selection: Identifying the right features that represent an item (e.g., genres,
actors, directors) can be difficult, especially for complex items like books or movies with
multiple attributes.
o Solution: Use automated feature extraction techniques such as natural language
processing (NLP) for text-based items.
3. Sparsity: The utility matrix often has many empty or missing entries, making it
challenging to derive accurate user profiles.
o Solution: Use dimensionality reduction techniques or collaborative filtering to fill in the
gaps and improve the accuracy of predictions.

9.2.6 Recommending Items to Users Based on Content

Content-based recommendation systems make suggestions to users by analyzing the attributes of items and comparing these with the user's preferences. The goal is to recommend items that are similar to those the user has already shown interest in, based on shared features.

Key Concepts in Content-Based Recommendations

1. Item Profiles:
o Each item is represented by a set of features. For example, for movies, the profile may
include attributes like: Actors, Director, Genre, Year of release
2. User Profiles:
o User profiles are constructed by aggregating the features of items the user has liked or
interacted with. The idea is to identify what characteristics the user prefers.
3. Cosine Similarity:
o Once the user and item profiles are built, cosine similarity is often used to measure
how similar an item is to the user's preferences. This similarity is calculated by
comparing the vectors representing both the user and the item profiles.
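A minimal cosine-similarity helper (plain Python, no external libraries) makes the computation concrete:

```python
import math

def cosine_similarity(u, v):
    # Dot product of u and v divided by the product of their lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 1, 0], [1, 1, 0]))  # identical profiles -> 1.0
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no shared features -> 0.0
```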

Process of Recommending Items

Step 1: Construct Item Profiles: An item profile consists of features that describe the
item. For example, a movie profile may contain:

▪ Actors: Aamir Khan, Kareena Kapoor
▪ Director: Rajkumar Hirani
▪ Genres: Comedy, Drama

Step 2: Create User Profiles: A user profile aggregates the preferences the user has shown.
If a user watches or rates several movies featuring Aamir Khan, their profile will reflect a
preference for movies with this actor. The user profile will contain feature values
representing their affinity for specific actors, genres, or directors.
Step 3: Calculate Similarity: To recommend an item, the system computes the similarity
between the user’s profile and the profiles of all available items.

o Cosine similarity is used to measure the closeness of vectors

Step 4: Recommend Items: Based on the cosine similarity, the system recommends items
with the highest similarity scores. These items are likely to match the user's preferences,
as they share the most features with the items the user has shown interest in.

Example: Recommending Movies Based on Content

Consider a user who has watched and rated the following movies highly:

• 3 Idiots: A comedy-drama with Aamir Khan.
• PK: A comedy-drama with Aamir Khan and Anushka Sharma.

Step 1: Build Item Profiles: Each movie will have a profile with the following features:

• Actors: Aamir Khan, Kareena Kapoor, Anushka Sharma
• Genres: Comedy, Drama
• Director: Rajkumar Hirani

Step 2: Create User Profile: The user’s profile will aggregate the features of the movies
they have rated:

• Actor Preference: Aamir Khan (strong preference based on repeated appearances in
the movies the user liked).
• Genre Preference: Comedy, Drama.
• Director Preference: Rajkumar Hirani (appears in both movies).

Step 3: Calculate Cosine Similarity: For a new movie, say Dangal (also starring Aamir
Khan and directed by Nitesh Tiwari), its profile is compared to the user’s profile:

• Movie Profile: Aamir Khan, Drama, Nitesh Tiwari.
• The cosine similarity score between the user profile and Dangal will be calculated,
considering shared features like Aamir Khan and the Drama genre.

Step 4: Make Recommendations: If Dangal has a high cosine similarity to the user's
profile, it will be recommended.
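The four steps above can be sketched end to end. The one-hot feature encoding below is a simplification invented for illustration (real profiles would typically carry weights such as normalized ratings):

```python
import math

# Hypothetical feature order:
# [Aamir Khan, Kareena Kapoor, Anushka Sharma, Comedy, Drama,
#  Rajkumar Hirani, Nitesh Tiwari]
three_idiots = [1, 0, 0, 1, 1, 1, 0]
pk           = [1, 0, 1, 1, 1, 1, 0]
dangal       = [1, 0, 0, 0, 1, 0, 1]

# Step 2: the user profile averages the profiles of the liked items.
user = [(a + b) / 2 for a, b in zip(three_idiots, pk)]

# Step 3: cosine similarity between the user profile and a candidate item.
def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Dangal shares Aamir Khan and the Drama genre with the user's history.
print(round(cosine(user, dangal), 3))
```

The score lands well above zero because of the shared actor and genre, even though the director differs; ranking all candidate items by this score yields the recommendation list.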

Advantages of Content-Based Recommendations


1. Personalization: The system recommends items specifically based on the user’s past
preferences.
2. No Need for User Collaboration: Unlike collaborative filtering, content-based methods
don’t require data from other users. They rely entirely on the properties of items and
the user’s interaction history.
3. Works Well for Niche Users: Content-based systems are particularly effective when
users have specific tastes, as the recommendations are highly personalized.

Challenges in Content-Based Recommendations

1. Cold Start Problem: For new users with no interaction history, it’s difficult to build an
accurate profile.
o Solution: Asking users to explicitly input preferences can help mitigate this problem.
2. Limited Diversity: The system might recommend items that are too similar, leading to
a lack of variety.
o Solution: Introducing some level of randomness or diversity in recommendations can
help.
3. Feature Extraction: Identifying the right features for items can be complex, especially
for unstructured data like images or texts.
o Solution: Use techniques like TF-IDF for documents or deep learning for images to
automatically extract features.
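As a sketch of automatic feature extraction, the snippet below computes TF-IDF weights by hand over three made-up one-line "documents"; terms that appear in fewer documents receive higher weights:

```python
import math
from collections import Counter

docs = [
    "aamir khan comedy drama",
    "tom cruise action thriller",
    "comedy drama romance",
]

def tf_idf(docs):
    # TF = term count / document length; IDF = log(N / document frequency).
    n = len(docs)
    tokenized = [d.split() for d in docs]
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return weights

w = tf_idf(docs)
# "aamir" appears in one document, "comedy" in two, so "aamir" scores higher.
print(w[0]["aamir"] > w[0]["comedy"])  # True
```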

Real-World Examples

1. Netflix: Recommends movies or series based on previously watched content. If a user
has watched many action movies, they are likely to get more action movie
recommendations.
2. Amazon: Suggests products based on the categories or brands a user has previously
viewed or purchased.
3. Spotify: Suggests songs or artists based on a user’s listening history, considering
factors like genre, artist, and song features.

9.2.7 Classification Algorithms

Classification Algorithms in Recommendation Systems

In the context of recommendation systems, classification algorithms are used to predict
the preferences of users based on known data. These algorithms treat the recommendation
task as a machine learning problem, where the system learns from a training set of
user-item interactions and then predicts the rating or preference for all items.

Approach Overview

Instead of using item profiles and utility matrices directly, the recommendation system can
treat the problem as a classification task. For each user, we build a classifier to predict
their ratings for all items. This method relies on training the model on historical data
(known ratings) and using machine learning techniques to make predictions for the
unknown ratings.

Classification Process

1. Training Data: The training set consists of user-item interactions. This data is often
represented in a utility matrix, where rows represent users, columns represent items,
and entries represent ratings (or preferences).
2. Classifiers: Many classifiers can be used for this task, with decision trees being a
common choice for classification in recommendation systems. A decision tree classifies
data based on certain conditions applied to the features of the items.
3. Decision Trees: A decision tree is a collection of nodes arranged in a tree-like structure.
The internal nodes represent conditions on item features (e.g., whether an item belongs
to a certain genre), and the leaves represent decisions, which in the case of
recommendations are either "likes" or "dislikes" (or in more complex systems, ratings).
o How Decision Trees Work:
▪ Start at the root node and apply the condition (predicate) to the item.
▪ Depending on whether the condition is true or false, move to the left or right child node.
▪ Repeat this process until a leaf is reached, which will provide the classification (e.g.,
whether the user likes the item).
▪ Example: A decision tree for recommending movies could start by checking if the genre
is “comedy.” If true, move to one branch; if false, move to another branch that might
check the director or actor.
4. Building the Tree:
o The process of constructing a decision tree involves selecting the best predicates
(conditions) that divide the items into positive (liked) and negative (disliked) examples.
o Various techniques, such as Gini impurity or entropy measures, can be used to
evaluate the quality of a predicate.
5. Prediction:
o Once the decision tree is built for a user, it can predict whether the user will like or
dislike an item based on the features of that item.

Challenges and Considerations

1. Large Number of Features: In real-world applications, items often have many features,
such as actors, genres, directors, etc. Selecting the most relevant features for
classification is crucial to avoid overfitting.
2. Overfitting: Decision trees can become very complex and may overfit the data, meaning
they perform well on the training data but poorly on unseen data. To mitigate this,
techniques like pruning or using ensemble methods (e.g., Random Forests) can be
employed.
3. Scalability: Constructing a separate classifier for each user may not scale well when
dealing with a large number of users. Techniques like ensemble learning or combining
decision trees can be used to improve scalability.

Example of a Decision Tree for Movie Recommendations: Suppose we have a user who
generally likes action movies but dislikes movies with certain actors. The features of the
items (movies) might include:

• Genre: Action, Comedy, Drama
• Actor: Tom Cruise, Aamir Khan, Brad Pitt
• Director: Christopher Nolan, Rajkumar Hirani

A possible decision tree could look like this:

• Root node: Is the movie “Action” genre?
o If yes, go to the left child.
o If no, go to the right child.
• Left child: Does the movie feature Tom Cruise?
o If yes, recommend the movie (positive leaf).
o If no, check for other actors or directors.
• Right child: Is the movie directed by Rajkumar Hirani?
o If yes, recommend the movie.

This decision tree classifies movies as either "liked" or "not liked" based on the user’s
preferences for genres, actors, and directors.
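The tree above can be written directly as nested conditionals. The dictionary schema (genre, actors, director keys) is an assumption made for illustration, and the "check for other actors or directors" branch is simplified to a single fallback:

```python
def recommend(movie):
    # Root: is the genre Action?
    if movie["genre"] == "Action":
        # Left child: does it feature Tom Cruise?
        if "Tom Cruise" in movie["actors"]:
            return "like"
        return "dislike"  # simplified fallback for the remaining checks
    # Right child: is it directed by Rajkumar Hirani?
    if movie["director"] == "Rajkumar Hirani":
        return "like"
    return "dislike"

print(recommend({"genre": "Action", "actors": ["Tom Cruise"], "director": "McQuarrie"}))
print(recommend({"genre": "Comedy", "actors": ["Aamir Khan"], "director": "Rajkumar Hirani"}))
```

In a real system this tree would be learned from the user's rating history rather than hand-written, but the prediction procedure (follow predicates from root to leaf) is exactly the one described above.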
Exercise 9.2.1: Computing Cosine Distances Between Vectors

Given three computers (A, B, C) with numerical features, we are tasked with calculating
the cosine similarity between pairs of computers. The features are:

(a) Compute the cosines in terms of α and β

The cosine of the angle between two vectors A and B is given by:

cos θ = (A · B) / (||A|| ||B||)
Let's break down the steps to find the angle between the vectors for three computers (A,
B, and C) with different scale factors for α (disk size) and β (main memory size). The
process involves calculating the cosine similarity and then using that to compute the
angle between the vectors.
Exercise 9.2.2 : An alternative way of scaling components of a vector is to begin by
normalizing the vectors. That is, compute the average for each component and subtract it
from that component’s value in each of the vectors.
(a) Normalize the vectors for the three computers described in Exercise 9.2.1.
Final Answer:

Normalized Ratings: A: 0.33, B: -1.67, C: 1.33

User Profile: Processor Speed: 0.42, Disk Size: 481.8, Main Memory Size: 3.28
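The normalization described in Exercise 9.2.2 (subtract each component's average from that component in every vector) can be sketched as follows; the three vectors here are made-up values, not the actual feature table from Exercise 9.2.1:

```python
def mean_center(vectors):
    # Subtract the per-component average from every vector.
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dims)]
    return [[x - m for x, m in zip(v, means)] for v in vectors]

centered = mean_center([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
print(centered)  # each component (column) now sums to zero
```

After this centering, a rating above a component's average becomes positive and one below it becomes negative, which is what makes the subsequent cosine comparisons meaningful.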
