Unit 4 Notes
1. Analyze User Behavior: Track user interactions (e.g., clicks, purchases, or views) to
understand preferences.
2. Identify Patterns: Use data and algorithms to find correlations between users and
items.
3. Generate Recommendations: Suggest items that align with the user's interests or that
similar users have liked.
Real-World Applications
• System Analysis: The system notices your preference for a genre based on your viewing
history (e.g., romantic comedies).
• Recommendation: Netflix suggests similar movies like Crazy Rich Asians or The
Proposal.
There are two main types of recommendation systems, each with a unique approach to
making suggestions:
1. Content-Based Systems: These systems recommend items based on the features of
items the user has already liked.
• Each item is described using its features (e.g., genre, cast, or director for movies).
• The system analyzes what the user has interacted with and finds items with similar
features.
Example
Advantages
Limitations
1. Struggles with the "cold start" problem for new users with no history.
2. Can lead to a narrow focus, recommending only similar items (lack of diversity).
2. Collaborative Filtering Systems: These systems focus on user relationships and use
the preferences of similar users to make recommendations.
How It Works
o Example: If you and another user both rated Inception and The Matrix highly, the
system might recommend Interstellar to you if they liked it.
o Example: If most users who liked Harry Potter also liked Lord of the Rings, the system
might recommend the latter to you.
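The user-similarity idea in the examples above can be sketched in a few lines of Python. The ratings below are made up purely for illustration; the sketch finds a similar user and recommends the movies they liked that you have not seen.

```python
import math

# Hypothetical ratings (illustrative only): user -> {movie: rating}
ratings = {
    "you":   {"Inception": 5, "The Matrix": 5},
    "other": {"Inception": 5, "The Matrix": 4, "Interstellar": 5},
}

def cosine_sim(a, b):
    """Cosine similarity over the movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    na = math.sqrt(sum(a[m] ** 2 for m in common))
    nb = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (na * nb)

sim = cosine_sim(ratings["you"], ratings["other"])

# Recommend movies the similar user liked that "you" have not seen.
recs = [m for m, r in ratings["other"].items()
        if m not in ratings["you"] and r >= 4]
print(round(sim, 3), recs)
```

With these numbers the two users are highly similar, so *Interstellar* is recommended, matching the Inception/The Matrix example above.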
Advantages
2. Handles the cold start problem for items (but not users).
Limitations
2. Suffers from the sparsity problem: If users interact with only a few items, it’s harder
to find patterns.
Definition: A utility matrix is a data structure that represents the relationship between
two sets of entities, typically users and items, by storing their preferences. It is a
foundational concept in recommendation systems.
Structure
• Rows and Columns: Rows typically represent users, and columns represent items.
• Values: Represent user preferences for items, such as ratings (e.g., 1–5 stars) or binary
interactions (e.g., 1 for "liked," blank for "not interacted").
Key Features
1. Sparsity: Most entries in the matrix are blank since users interact with only a few items.
2. Prediction Goal: The recommendation system predicts the blank entries to suggest
items.
1. Known Ratings: User 1 gave a 4 to Movie A and a 5 to Movie B.
2. Prediction Task: Predict User 1's rating for Movie C based on patterns in the matrix.
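The known-ratings/prediction setup above can be pictured with a toy utility matrix, using `None` for the blank entries the recommender must predict (the names and values mirror the example; they are illustrative only):

```python
# Toy utility matrix: None marks an unknown entry to be predicted.
utility = {
    "User 1": {"Movie A": 4, "Movie B": 5, "Movie C": None},
    "User 2": {"Movie A": None, "Movie B": 5, "Movie C": 3},
}

# The recommendation system's goal is to fill in exactly these blanks.
blanks = [(u, m) for u, row in utility.items()
          for m, r in row.items() if r is None]
print(blanks)
```

Here the prediction targets are User 1's rating for Movie C and User 2's rating for Movie A.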
Use in Recommendation Systems: The utility matrix is used in two primary ways:
1. Content-Based Systems: Use item properties (e.g., genres, features) to predict missing
entries.
o Example: If User 1 liked two action movies, predict a high rating for Movie C if it's also
an action movie.
2. Collaborative Filtering Systems: Use the ratings of similar users to predict missing
entries.
o Example: If User 2 and User 1 rated Movie B similarly, predict User 2's preference for
Movie A based on User 1's rating.
Challenges
1. Sparsity: Most users interact with only a small fraction of available items.
o Solution: Use hybrid models that combine content-based and collaborative filtering
techniques.
Definition: The long tail phenomenon refers to the ability of online platforms to cater to
both popular and niche items, unlike traditional physical stores that focus only on the
most popular items due to space limitations.
Concept
• Popular Items: Represent a small fraction of the total inventory but account for most
sales in physical stores.
• Niche Items: Represent a large fraction of the inventory and contribute collectively to
significant sales online.
• Physical Bookstores: Stock only the top 1,000 bestsellers due to limited shelf space.
1. Increased Diversity: Online platforms can offer users access to items they might not
have considered.
1. E-Commerce Platforms:
How It Works: Recommendations are based on user browsing history, past purchases, or
product searches.
Examples:
2. Entertainment Platforms: Help users discover new content based on their viewing
or listening history.
How It Works: Systems analyze user preferences and find patterns across similar users.
Examples:
1. Netflix:
o If you watch Stranger Things, it might suggest Dark or The Umbrella Academy.
2. Spotify:
o Suggests playlists or artists based on your listening habits.
3. News Platforms:
Examples:
1. Google News: Recommends articles aligned with topics you frequently read.
2. Flipboard: Suggests stories based on your selected categories and interaction history.
Example:
How It Works: Combines user data with medical knowledge to make suggestions.
Example:
The section Populating the Utility Matrix focuses on the challenge of filling the utility
matrix with user-item interactions. A utility matrix is central to recommendation systems
as it represents the degree of preference (e.g., ratings) users have for items. This section
discusses two primary approaches for obtaining the data needed to populate this matrix.
1. Explicit Ratings:
• Users are explicitly asked to provide ratings for items they have interacted with.
• Example:
o Movie ratings: Websites like Netflix ask users to rate movies on a scale of 1 to 5
stars.
o Content platforms: YouTube and news sites often ask users to rate videos or
articles.
Limitations:
• Bias: Ratings are typically provided only by users who are motivated, which might
not represent the general population.
2. Inferring Preferences from Behavior:
• Example:
o Purchase data: If a user buys a product, it is inferred that they "like" it. Such
an interaction might be recorded as a "1" in the utility matrix.
1. Sparse Data:
o Utility matrices are typically sparse because users interact with only a small subset of
the available items.
o For example, an online retailer may have thousands of products, but each user
interacts with only a handful.
2. Cold Start:
o New users or items lack sufficient interaction data, making it difficult to populate their
rows or columns in the utility matrix.
3. Implicit Feedback:
o Inferring preferences from actions like purchases or views can be noisy and less
reliable compared to explicit ratings.
1. Item Profiles
In a content-based system, we must construct for each item a profile, which is a record or
collection of records representing important characteristics of that item. In simple cases,
the profile consists of some characteristics of the item that are easily discovered.
• Title: 3 Idiots
2. User Profiles
A user profile summarizes the features of the movies a user likes (in this example,
Bollywood movies). It is created by aggregating the attributes of those movies.
3. Recommendation Process
The system matches the user profile with movie profiles using similarity metrics (e.g.,
cosine similarity).
Step-by-Step Example
• Movies in Catalog:
1. Chhichhore (Comedy, Drama; Sushant Singh Rajput, Shraddha Kapoor; Nitesh Tiwari).
2. Taare Zameen Par (Drama; Aamir Khan, Darsheel Safary; Aamir Khan).
3. Dil Chahta Hai (Comedy, Drama; Aamir Khan, Saif Ali Khan; Farhan Akhtar).
Compare Rahul’s profile with the profiles of other movies using similarity metrics (e.g.,
cosine similarity).
o Director: Aamir Khan (no match as director but high affinity due to actor overlap).
Advantages
1. Personalized Suggestions:
Key Concepts
Documents, such as news articles, blogs, or web pages, often do not have predefined
features like genre or author. Extracting representative features involves identifying
elements that summarize the main topics or themes of the document.
• Stop words are the most common words in a language (e.g., "and," "the," "is") that do
not provide meaningful information about the content.
• Example:
• Steps:
1. Term Frequency (TF): Measures how often a word appears within a document.
2. Inverse Document Frequency (IDF): Measures how unique the word is across all
documents.
• Example:
o Corpus:
3. Choosing Features
• Top-n Words: Select the top-n words with the highest TF-IDF scores as features.
Document Representation
1. Vector Representation:
o Example:
o Cosine Similarity: Compares the angle between vectors to assess their similarity.
o Example: Two articles about "sports" might have a high cosine similarity due to shared
keywords like "match," "score," and "team."
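The TF-IDF scoring described above can be sketched directly. The two-document corpus and stop-word list below are made up for illustration; the point is that a word occurring often in one document but rarely across the corpus gets a high score.

```python
import math
from collections import Counter

# A tiny two-document corpus (made up for illustration).
docs = [
    "the match score was high and the team won the match",
    "the economy grew and the market rallied today",
]
stop_words = {"the", "and", "was"}

# Tokenize and drop stop words.
tokenized = [[w for w in d.split() if w not in stop_words] for d in docs]

def tf_idf(word, doc_tokens, corpus):
    """TF = frequency within the document; IDF = log(N / docs containing word)."""
    tf = Counter(doc_tokens)[word] / len(doc_tokens)
    df = sum(1 for t in corpus if word in t)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# "match" occurs twice in doc 0 and in no other doc, so it scores highest there.
score_match = tf_idf("match", tokenized[0], tokenized)
score_score = tf_idf("score", tokenized[0], tokenized)
print(round(score_match, 3), round(score_score, 3))
```

The top-n words by this score would then be kept as the document's features, as described above.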
Applications
Advantages
Challenges
1. Semantic Understanding:
o Solution: Use advanced methods like word embeddings (e.g., Word2Vec, BERT).
o Removing stop words may lose context (e.g., "not" in "not effective").
In the context of recommendation systems, tags are user-generated labels or keywords that
describe the content or features of an item. Tags provide a valuable way to extract item
features that might not be explicitly available from the item’s metadata (such as genre,
author, etc.). This approach allows recommendation systems to use crowdsourced
information to identify characteristics of items that may not be easily discernible from
traditional analysis methods.
For items like images, traditional methods like analyzing pixel data don’t provide much
useful information. A simple image may not convey its meaning, such as whether it is a
picture of Tiananmen Square or a sunset at Malibu, through pixel analysis alone. However,
tags help to bridge this gap by allowing users to describe images in a way that the system
can interpret.
Example: A user tags an image with the word "sunset at Malibu," while another user
tags a different image with "Tiananmen Square." These tags provide important
descriptors of the items (images in this case) that are much more insightful than pixel
analysis.
The use of tags can be an effective approach for discovering item features in
recommendation systems. Websites like del.icio.us (now part of Yahoo) invited users to tag
web pages with descriptive keywords. This system helped users find web pages that
matched their search terms by searching with a specific set of tags. These tags can also be
used in a recommendation system, where if a user frequently bookmarks or retrieves pages
with certain tags, the system can recommend other items that share the same tags.
Example: Del.icio.us: Users who bookmarked pages tagged with “AI,” “machine
learning,” and “data science” could be recommended other pages tagged similarly, thus
enhancing the user’s ability to discover related content.
While tagging can help in discovering item features, its effectiveness depends on user
participation. The success of the system relies on users being willing to tag items and
provide accurate tags. Furthermore, there must be enough tags to ensure that erroneous
or inconsistent tags don’t negatively impact the recommendation system.
Potential Issues:
• Low Participation: If not enough users tag items, there won’t be enough data to make
accurate recommendations.
• Erroneous Tags: Users might tag items incorrectly, which could mislead the
recommendation system.
1. User Effort: Tagging requires active participation from users, which might not always
happen in large quantities.
2. Quality Control: Erroneous or irrelevant tags can distort the feature extraction process.
3. Tagging Coverage: If the tags do not cover all important aspects of the items, the
recommendation system may miss out on critical features.
1. Boolean Features
For discrete features (e.g., actors, genres, directors), we can represent an item as a vector
of 0s and 1s. Each component corresponds to a specific feature, and the vector is
populated based on whether the item contains that feature.
Each movie is represented as a vector with 1s for features it contains and 0s for features
it doesn't. This makes it easy to compute similarities between movies using techniques
like cosine similarity.
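A minimal sketch of this 0/1 representation, with an assumed feature order (the movies and features here are illustrative, not taken from a real catalog):

```python
import math

# Assumed feature order: [Aamir Khan, Saif Ali Khan, Comedy, Drama]
movie_a = [1, 0, 1, 1]   # a comedy-drama with Aamir Khan
movie_b = [1, 1, 1, 1]   # shares three of the four features

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

sim = cosine(movie_a, movie_b)
print(round(sim, 3))
```

Because the two vectors agree on three of four features, their cosine similarity is high (about 0.866).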
2. Numerical Features
Some features cannot be represented as Boolean values (e.g., ratings, screen size, price).
These features should be stored as numerical values within the item profile.
• Example: For a movie, you might include the average rating as a numerical feature:
o Movie 1: 3 Idiots → Average Rating: 4.2
o Movie 2: PK → Average Rating: 4.5
These numerical features are stored as components of the item’s vector, allowing for a
more nuanced comparison between items.
An item profile can also contain a combination of Boolean and numerical features, as
many items have both types of features (e.g., movies with actors and ratings).
• Example:
o Movie Profile: [Actor: Aamir Khan → 1, Genre: Comedy → 1, Rating → 4.2]
Different items require different features, and those features can be extracted using
various methods:
1. Movies: Features could include actors, directors, genres, and average ratings. These
are typically easy to extract from movie metadata like IMDB or Rotten Tomatoes.
2. Books: Author, publication year, genre, and average ratings can serve as features
for books.
3. Products (E-commerce): Features for products could include brand, category, price,
and user reviews.
4. Documents: For documents, keywords or important terms (often derived from TF-
IDF) can form the features for an item profile. For example, news articles might use
terms such as "politics," "economy," or "health" to describe the subject matter.
• Movie 1: 3 Idiots:
o Features: Aamir Khan, Rajkumar Hirani, Comedy, Drama, 4.2 Average Rating.
o Profile: [1, 1, 1, 0, 1, 4.2]
• Movie 2: PK:
o Features: Aamir Khan, Rajkumar Hirani, Comedy, Drama, 4.5 Average Rating.
o Profile: [1, 1, 1, 0, 1, 4.5]
These profiles can be compared using similarity measures to recommend items with
similar profiles.
• Data Sparsity: Some items may lack enough features (e.g., unknown movies or
products).
• Feature Selection: Identifying the right set of features to represent each item can be
challenging, especially when dealing with large datasets or heterogeneous item types.
• Normalization: Different scales for numerical features (like ratings or prices) may
require normalization to ensure they are equally weighted in similarity calculations.
User profiles are critical in content-based recommendation systems as they help represent
a user's preferences for items, which is essential for recommending relevant items to the
user. In a content-based system, we aggregate the characteristics or features of the items
that a user has interacted with to create a personalized profile that describes the user's
tastes.
User profiles are typically constructed using data from the utility matrix, which contains
known information about the degree to which users like certain items. This matrix is often
sparse, meaning most of the entries are unknown, but it provides a useful framework for
constructing user profiles.
1. Utility Matrix Representation: Each user’s preferences for items are represented in
the utility matrix, where the entries can be binary (e.g., 1 for purchased or liked, 0 for
not interacted) or ratings (e.g., a 1-5 scale indicating how much a user likes an item).
o Example: User 1 likes Movie A (rating = 4) and Movie B (rating = 5), but has not rated
Movie C.
2. Aggregating Item Profiles: To form a user profile, you aggregate the profiles of the
items that the user has interacted with (rated or liked).
o Boolean Profiles: If the utility matrix has binary data (e.g., 1 for liked items), the user
profile is typically the average of the item profiles for the items the user likes.
▪ Example: If a user likes 20% of the movies with Julia Roberts, their profile will have a
0.2 in the component for Julia Roberts.
3. Non-Binary Data (Ratings): For non-binary data (ratings), the item profiles are
weighted by the user’s ratings.
o The user profile is calculated by normalizing the ratings (subtracting the user’s average
rating to emphasize deviations from their average).
▪ Example: If User 1 gives an average rating of 3 and rates three movies with Julia Roberts
as 3, 4, and 5, their profile for Julia Roberts would be the average of (3-3), (4-3), and (5-
3), which equals 1.
▪ Conversely, for another user, say User 2, if their average rating is 4 and their ratings
for Julia Roberts movies are 2, 3, and 5, their profile would be the average of (2-4), (3-
4), and (5-4), which equals -2/3.
4. Profile Representation: The user profile is often represented as a vector containing
values for each feature (e.g., actor, director, genre).
o Example:
▪ If the features are "Julia Roberts," "Action," and "Drama," and the user likes movies with
Julia Roberts in them and favors the "Action" genre, their profile might look like:
▪ [0.2 (Julia Roberts), 0.8 (Action), 0.3 (Drama)].
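The rating-normalization step from point 3 can be checked in code; the numbers below are exactly those from the Julia Roberts example above:

```python
# Ratings for movies featuring Julia Roberts, per the example above.
user1_avg, user1_ratings = 3, [3, 4, 5]
user2_avg, user2_ratings = 4, [2, 3, 5]

def profile_component(avg, ratings):
    """Average deviation of the ratings from the user's own mean rating."""
    return sum(r - avg for r in ratings) / len(ratings)

p1 = profile_component(user1_avg, user1_ratings)   # (0 + 1 + 2) / 3 = 1.0
p2 = profile_component(user2_avg, user2_ratings)   # (-2 - 1 + 1) / 3 = -2/3
print(p1, round(p2, 3))
```

This reproduces the values in the example: 1 for User 1 and -2/3 for User 2, showing how the same actor can carry opposite weights in two users' profiles.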
Similarity between User and Item Profiles: Once user profiles are created, we can
compute the similarity between a user’s profile and the item profiles using cosine
similarity or other distance measures. This helps to estimate which items the user will
like based on their preferences.
Challenges
1. Cold Start Problem: When a user has little or no history of interacting with items, it’s
difficult to build a meaningful profile. This is known as the cold start problem.
o Solution: Asking users to fill out preference questionnaires or using hybrid
recommendation systems that combine content-based and collaborative filtering can
help.
2. Feature Selection: Identifying the right features that represent an item (e.g., genres,
actors, directors) can be difficult, especially for complex items like books or movies with
multiple attributes.
o Solution: Use automated feature extraction techniques such as natural language
processing (NLP) for text-based items.
3. Sparsity: The utility matrix often has many empty or missing entries, making it
challenging to derive accurate user profiles.
o Solution: Use dimensionality reduction techniques or collaborative filtering to fill in the
gaps and improve the accuracy of predictions.
1. Item Profiles:
o Each item is represented by a set of features. For example, for movies, the profile may
include attributes like: Actors, Director, Genre, Year of release
2. User Profiles:
o User profiles are constructed by aggregating the features of items the user has liked or
interacted with. The idea is to identify what characteristics the user prefers.
3. Cosine Similarity:
o Once the user and item profiles are built, cosine similarity is often used to measure
how similar an item is to the user's preferences. This similarity is calculated by
comparing the vectors representing both the user and the item profiles.
Step 1: Construct Item Profiles: An item profile consists of features that describe the
item. For example, a movie profile may contain:
Step 2: Create User Profiles: A user profile aggregates the preferences the user has shown.
If a user watches or rates several movies featuring Aamir Khan, their profile will reflect a
preference for movies with this actor. The user profile will contain feature values
representing their affinity for specific actors, genres, or directors.
Step 3: Calculate Similarity: To recommend an item, the system computes the similarity
between the user’s profile and the profiles of all available items.
Step 4: Recommend Items: Based on the cosine similarity, the system recommends items
with the highest similarity scores. These items are likely to match the user's preferences,
as they share the most features with the items the user has shown interest in.
Consider a user who has watched and rated the following movies highly:
Step 1: Build Item Profiles: Each movie will have a profile with the following features:
Step 2: Create User Profile: The user’s profile will aggregate the features of the movies
they have rated:
Step 3: Calculate Cosine Similarity: For a new movie, say Dangal (also starring Aamir
Khan and directed by Nitesh Tiwari), its profile is compared to the user’s profile:
Step 4: Make Recommendations: If Dangal has a high cosine similarity to the user's
profile, it will be recommended.
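Steps 1 through 4 can be sketched end to end. The feature order and the 0/1 vectors below are assumptions chosen for illustration (features: Aamir Khan, Nitesh Tiwari, Comedy, Drama); they are not actual profiles from a real catalog.

```python
import math

# Step 1: item profiles (assumed feature order:
# [Aamir Khan, Nitesh Tiwari, Comedy, Drama]) -- illustrative values.
watched = {
    "3 Idiots":   [1, 0, 1, 1],
    "Chhichhore": [0, 1, 1, 1],
}
candidates = {"Dangal": [1, 1, 0, 1]}

# Step 2: the user profile is the average of the watched item profiles.
n = len(watched)
user_profile = [sum(v[i] for v in watched.values()) / n for i in range(4)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

# Steps 3-4: score each candidate and recommend the highest-scoring ones.
scores = {m: cosine(user_profile, v) for m, v in candidates.items()}
print({m: round(s, 3) for m, s in scores.items()})
```

With these assumed vectors *Dangal* scores about 0.73 against the user profile, so it would rank high among the candidates.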
1. Cold Start Problem: For new users with no interaction history, it’s difficult to build an
accurate profile.
o Solution: Asking users to explicitly input preferences can help mitigate this problem.
2. Limited Diversity: The system might recommend items that are too similar, leading to
a lack of variety.
o Solution: Introducing some level of randomness or diversity in recommendations can
help.
3. Feature Extraction: Identifying the right features for items can be complex, especially
for unstructured data like images or texts.
o Solution: Use techniques like TF-IDF for documents or deep learning for images to
automatically extract features.
Real-World Examples
Approach Overview
Instead of using item profiles and utility matrices directly, the recommendation system can
treat the problem as a classification task. For each user, we build a classifier to predict
their ratings for all items. This method relies on training the model on historical data
(known ratings) and using machine learning techniques to make predictions for the
unknown ratings.
Classification Process
1. Training Data: The training set consists of user-item interactions. This data is often
represented in a utility matrix, where rows represent users, columns represent items,
and entries represent ratings (or preferences).
2. Classifiers: Many classifiers can be used for this task, with decision trees being a
common choice for classification in recommendation systems. A decision tree classifies
data based on certain conditions applied to the features of the items.
3. Decision Trees: A decision tree is a collection of nodes arranged in a tree-like structure.
The internal nodes represent conditions on item features (e.g., whether an item belongs
to a certain genre), and the leaves represent decisions, which in the case of
recommendations are either "likes" or "dislikes" (or in more complex systems, ratings).
o How Decision Trees Work:
▪ Start at the root node and apply the condition (predicate) to the item.
▪ Depending on whether the condition is true or false, move to the left or right child node.
▪ Repeat this process until a leaf is reached, which will provide the classification (e.g.,
whether the user likes the item).
▪ Example: A decision tree for recommending movies could start by checking if the genre
is “comedy.” If true, move to one branch; if false, move to another branch that might
check the director or actor.
4. Building the Tree:
o The process of constructing a decision tree involves selecting the best predicates
(conditions) that divide the items into positive (liked) and negative (disliked) examples.
o Various techniques, such as Gini impurity or entropy measures, can be used to
evaluate the quality of a predicate.
5. Prediction:
o Once the decision tree is built for a user, it can predict whether the user will like or
dislike an item based on the features of that item.
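The predicate-selection idea from step 4 can be sketched with Gini impurity. The like/dislike labels below are made up to show how a hypothetical "genre = comedy" split would be scored; a lower weighted impurity means a better predicate.

```python
def gini(labels):
    """Gini impurity of a set of like/dislike labels (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_like = labels.count("like") / n
    return 1.0 - p_like ** 2 - (1.0 - p_like) ** 2

# Hypothetical outcome of splitting on "genre == comedy":
left = ["like", "like", "like", "dislike"]   # comedies
right = ["dislike", "dislike"]               # non-comedies

total = len(left) + len(right)
weighted = (len(left) / total) * gini(left) + (len(right) / total) * gini(right)
print(round(weighted, 3))
```

The tree-building procedure would compute this score for every candidate predicate and pick the one with the lowest weighted impurity.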
Challenges
1. Large Number of Features: In real-world applications, items often have many features,
such as actors, genres, directors, etc. Selecting the most relevant features for
classification is crucial to avoid overfitting.
2. Overfitting: Decision trees can become very complex and may overfit the data, meaning
they perform well on the training data but poorly on unseen data. To mitigate this,
techniques like pruning or using ensemble methods (e.g., Random Forests) can be
employed.
3. Scalability: Constructing a separate classifier for each user may not scale well when
dealing with a large number of users. Techniques like ensemble learning or combining
decision trees can be used to improve scalability.
Example of a Decision Tree for Movie Recommendations: Suppose we have a user who
generally likes action movies but dislikes movies with certain actors. The features of the
items (movies) might include:
This decision tree classifies movies as either "liked" or "not liked" based on the user’s
preferences for genres, actors, and directors.
Exercise 9.2.1: Computing Cosine Distances Between Vectors
Given three computers (A, B, C) with numerical features, we are tasked with calculating
the cosine similarity between pairs of computers. The features are:
The cosine of the angle between two vectors A and B is given by:

cos(θ) = (A · B) / (||A|| ||B||)

where A · B is the dot product and ||A||, ||B|| are the lengths of the vectors.
Let's break down the steps to find the angle between the vectors for three computers (A,
B, and C) with different scale factors for α (disk size) and β (main memory size). The
process involves calculating the cosine similarity and then using that to compute the
angle between the vectors.
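The exercise's actual feature values are not reproduced in these notes, so the sketch below uses hypothetical vectors (processor speed, disk size, memory) just to show the mechanics: compute the cosine, convert it to an angle, and observe how the α scale factor on disk size changes the result.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

# Hypothetical vectors: (speed in GHz, alpha * disk in GB, beta * memory in GB).
alpha, beta = 1.0, 1.0
A = (3.0, alpha * 500, beta * 6)
B = (2.7, alpha * 320, beta * 4)

cos_ab = cosine(A, B)
angle = math.degrees(math.acos(min(1.0, cos_ab)))

# With alpha = 0.01, disk size no longer dominates and the angle grows.
A2 = (3.0, 0.01 * 500, beta * 6)
B2 = (2.7, 0.01 * 320, beta * 4)
cos2 = cosine(A2, B2)

print(round(cos_ab, 6), round(angle, 3), round(cos2, 6))
```

With α = 1 the huge disk-size component swamps the others and the angle is nearly zero; shrinking α lets the remaining features influence the similarity, which is the effect the exercise explores.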
Exercise 9.2.2 : An alternative way of scaling components of a vector is to begin by
normalizing the vectors. That is, compute the average for each component and subtract it
from that component’s value in each of the vectors.
(a) Normalize the vectors for the three computers described in Exercise 9.2.1.
Final Answer:
Normalized Ratings:
A: 0.33
B: -1.67
C: 1.33
User Profile:
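The normalized values above are consistent with underlying ratings of 4, 2, and 5 (an inference from the answer, not stated in the notes). Subtracting the mean rating reproduces them:

```python
# Assumed ratings consistent with the normalized answers above (an inference).
ratings = {"A": 4, "B": 2, "C": 5}
mean = sum(ratings.values()) / len(ratings)   # 11/3, about 3.67

# Normalize: subtract the mean from each rating.
normalized = {k: round(v - mean, 2) for k, v in ratings.items()}
print(normalized)  # {'A': 0.33, 'B': -1.67, 'C': 1.33}
```

Note that the normalized ratings sum to zero, as they must after subtracting the mean.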