
Recommendation Systems

LVC 3: Recommendation Systems Part 3

In the first lecture, we discussed the basics of recommendation systems and their
applications in different domains. We then defined the problem statement for recommendation
systems and discussed two types of solutions: Averaging, where we assume that every user/item is
the same, and Content-Based, where we use additional information/features about users and/or
items.

In the second lecture, we discussed more advanced techniques for recommendation
systems: Clustering, Collaborative Filtering, Singular Value Thresholding, and
optimization using Alternating Least Squares.

Now, in this final lecture, we are going to use the algorithms we have learned so far as building
blocks for algorithms that can solve even more complex problems. In
general, problems in recommendation systems have three dimensions:

1. Multiple Measurements: Data for observed preferences
2. Content or Exogenous Features: Features of users/items
3. Dynamics: Time-varying aspects

So far we have only touched upon the second dimension in our problems, but in this lecture, we will
try to combine all three dimensions.

The main idea of this lecture is to demonstrate that recommendation system
techniques can be used to solve complex machine learning problems and are not limited to
just providing recommendations. Before discussing algorithms, let’s go through some examples to
establish this idea:

● Did terrorism have an impact on the economy of the Basque Country?

The Basque Country is a region in northern Spain. It was affected by terrorism from the mid-to-late
1970s, and it experienced a decline in per-capita GDP around the same time. The
question is whether this decline was due to terrorism or to other factors.

To answer this question, we need to come up with a counterfactual
prediction, or a synthetic prediction, of the per-capita GDP of the Basque
Country, i.e., the GDP if there had been no terrorism. In the corresponding graph, the dark line shows
the actual per-capita GDP of the Basque Country and the dotted line shows the synthetic prediction.

These synthetic predictions can be generated by using the matrix estimation techniques
discussed in previous lectures. Before discussing how to come up with these synthetic predictions,
let’s go through one more example:

● Did the California tobacco control program (Prop 99) work?

In 1988, California voters enacted Proposition 99, which increased the tax on cigarettes sold in
California by 25 cents per pack. Per-capita cigarette sales declined after
this proposition. The question is whether this decline in sales was due to the proposition or to
other factors.

Now, how do we find these predictions?

We will use Matrix Estimation to determine the synthetic predictions. The idea is similar to what we
learned in Collaborative Filtering: find users and items similar to a given user and item, and then
average (simple or weighted) over those similar users and items.

In the first example, we can observe the per-capita GDP of other regions in Spain or Europe that
were not affected by terrorism; similarly, for the second example, we can observe per-capita
cigarette sales in states where Proposition 99 was not introduced. We will call this main factor
(terrorism or Proposition 99) an intervention.

In this setup, time $T_0$ is the time when the intervention is introduced, the target is the main
user for which we need to find predictions (California or the Basque Country), and the donors are
similar users.

We will observe the pre-intervention data, find the similarity of each donor with the target, and use it
to create synthetic post-intervention data. This method is called synthetic control.
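
To make this concrete, here is a minimal sketch of the synthetic control idea in Python. It is not the exact method from the lecture: it uses plain (unregularized) least squares to learn donor weights, and the function name, array shapes, and data are illustrative assumptions.

```python
import numpy as np

def synthetic_control(target_pre, donors_pre, donors_post):
    """Fit donor weights on pre-intervention data, then project the
    target's counterfactual (synthetic) post-intervention trajectory.

    target_pre  : (T0,)   pre-intervention series of the target unit
    donors_pre  : (T0, D) pre-intervention series of D donor units
    donors_post : (T1, D) post-intervention series of the same donors
    """
    # Least-squares weights: how well each donor "explains" the target
    # before the intervention.
    weights, *_ = np.linalg.lstsq(donors_pre, target_pre, rcond=None)
    # Counterfactual: what the target would have looked like with no
    # intervention, as the same weighted combination of donors.
    return donors_post @ weights

# Illustrative usage with random data (20 pre- and 10 post-intervention steps).
rng = np.random.default_rng(0)
donors_pre = rng.normal(size=(20, 5))
target_pre = donors_pre @ np.array([0.4, 0.3, 0.2, 0.1, 0.0])
donors_post = rng.normal(size=(10, 5))
counterfactual = synthetic_control(target_pre, donors_pre, donors_post)
```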

There are many more applications where the technique of matrix estimation can be used to solve
such complex time series problems. We can even use this technique to predict scores in an
ongoing game. In such applications, the choice of $T_0$ matters greatly. Let’s look at an example of
forecasting a cricket score trajectory from an India vs. Australia game in 2011.

In a one-day cricket game, 300 balls are bowled by each team. The predictions shown in the lecture
give Australia’s score when 210 balls have been bowled, i.e., when 70% of Australia’s innings is
finished. The prediction is quite accurate: the lines for the predicted and actual scores almost
overlap.

But what if we change $T_0$ from 70% to 60%, i.e., 180 balls?

In this case, the predictions change, and a gap between the actual and predicted scores can be
observed. This is because a crucial moment occurred at this point in the game, which resulted in
Australia scoring lower than predicted. In other words, an important breakthrough happened around
this time that affected the outcome of the game. Hence, this method can also be used to find
highlights or breakthroughs in a game, or in history.

Now that we have established that recommendation system techniques can be used to solve
other complex problems, let’s go through the first module of this lecture.

Matrix Estimation & Content-Based Recommendations


In previous lectures, we discussed the prediction problem of recommendation systems: complete the
matrix, i.e., find the likelihood $L_{ij}$ of user $i$ matching with item $j$.

In the content-based model, we can do this using observed features of user $i$, denoted by $x_i$,
and of item $j$, denoted by $y_j$. The problem of estimating $L_{ij}$ then reduces to learning a
model $f$ that is a function of $x_i$ and $y_j$. The learned model can be a simple regression or
classification model, depending on the target values.

In matrix estimation, we assume that there are latent features of user $i$, denoted by $u_i$, and
of item $j$, denoted by $v_j$, and that $L_{ij}$ is a function of those latent features.

When using only one approach, content-based or matrix estimation, we lose the information in the
other set of features, latent or observed. What if we could combine the two methods?

We can take into account the function of latent features, $u_i$ and $v_j$, as well as the observed
features, $x_i$ and $y_j$, and combine them to get a more accurate estimation of $L_{ij}$.

The algorithm can be given in three steps as follows:

● Step 1: Content-based supervised learning

○ Learn a regressor (or classifier) $f_{obs}$ using the observed features, such that
$L^{obs}_{ij} = f_{obs}(x_i, y_j)$

● Step 2: Matrix estimation

○ Compute the “difference” matrix $L^{diff}_{ij} = L_{ij} - L^{obs}_{ij}$ over the observed entries

○ Use matrix estimation on $L^{diff}_{ij}$ to produce $L^{ME}$

● Step 3: Combine the final estimates: $L_{ij} = L^{obs}_{ij} + L^{ME}_{ij}$
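
A minimal sketch of these three steps, assuming a plain least-squares regression for $f_{obs}$ and one round of singular value thresholding (from the previous lecture) as the matrix-estimation step. The function names, the threshold value, and the feature layout are illustrative choices, not the lecture's exact implementation.

```python
import numpy as np

def svt(M, tau=5.0):
    """Soft-threshold the singular values of M (a crude one-shot
    stand-in for the matrix-estimation step)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def combine_estimates(L, mask, X, Y):
    """L: (N, M) preference matrix (unobserved entries are ignored),
    mask: True where L is observed,
    X: (N, p) user features, Y: (M, q) item features."""
    N, M = L.shape
    # Step 1: content-based regression L_obs[i, j] = f_obs(x_i, y_j),
    # fit on the observed entries only.
    feats = np.c_[np.repeat(X, M, axis=0),   # row i*M+j -> user i's features
                  np.tile(Y, (N, 1)),        # row i*M+j -> item j's features
                  np.ones(N * M)]            # intercept
    obs = mask.ravel()
    coef, *_ = np.linalg.lstsq(feats[obs], L.ravel()[obs], rcond=None)
    L_obs = (feats @ coef).reshape(N, M)
    # Step 2: matrix estimation on the difference matrix
    # (unobserved entries of the difference are set to zero).
    L_me = svt(np.where(mask, L - L_obs, 0.0))
    # Step 3: combine the two estimates.
    return L_obs + L_me
```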

Now, let’s move to the second module of this lecture, which takes into account the time-varying
factor while performing matrix estimation.

Matrix Estimation Across Time


So far, we have only considered matrix completion at a single instant in time. In reality, time
is an important factor, as people’s preferences or tastes change over time. For example, a successful
movie from the 90s might not get very good reviews today.

So, the problem reduces to estimating time-varying matrices where latent features are time-varying,
and observations are partial & noisy.

There are multiple time series, one for each entry, i.e., each $L_{ij}(t)$ is a time series indexed
by $t$. So, if there are $N$ users and $M$ items, then there are $N \times M$ time series. We
assume that each time series is explained by a series of latent features:

$$L_{ij}(t) = u_i(t)^T v_j(t)$$

where each component of $u_i$ and $v_j$ is a structured time series.

Note: For now, we are assuming that the observed features $x_i$ and $y_j$ are unknown or constant
over time.

Since we might not observe entries for each user/item at every instant of time, the observations
are partial and noisy. Also, we don’t know $u_i(t)$ and $v_j(t)$, because they are latent features.
So, how do we estimate the whole matrix $L_{ij}$? We will solve this massive time series problem by
converting it into a recommendation system problem.

First, let’s see how to estimate a single entry of the matrix $L_{ij}$, say $L_{1,1}$.

Suppose the following time series represents the true values of $L_{1,1}$:

$$\ldots, X(-1), X(0), X(1), \ldots, X(T-1), X(T), X(T+1), \ldots$$

where $T$ denotes the current time. Among these values, some observations may be missing, some are
observed, and some we need to forecast.

In the traditional time series approach, we might need to model all the components of the time
series: seasonality, trend, and residuals. Here, we will ignore all of these and instead convert the
time series into a matrix estimation problem. How do we do that? We will do it in three steps:

Step 1: Transform the time series into a matrix

We will map the time series to a matrix called the page matrix. We will choose some $L > 1$, say 20,
and divide the first $T$ values of the time series into $T/L$ equal segments. We then treat each
segment as a single column of the page matrix, denoted by $P$.

$P$ is a matrix of dimension $L \times (T/L)$ whose entries are given by:

$$P_{ij} = X(i + (j-1)L)$$

Step 2: Do matrix estimation on the generated matrix

Once we do matrix estimation, we can treat each row as a feature and
forecast the time series using simple regression models.

Step 3: Convert the matrix back to a univariate time series. That’s it.
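
The following is a minimal sketch of this single-series pipeline, assuming $T$ is a multiple of $L$: it uses hard rank truncation of the SVD as the matrix-estimation step and ordinary least squares for the row-wise forecast. The rank, the choice $L = 20$, and all names and data are illustrative.

```python
import numpy as np

def page_matrix(x, L):
    """Step 1: reshape the first T values of x (T a multiple of L) into
    an L x (T/L) matrix whose columns are consecutive segments, so that
    P[i, j] = x[i + j*L] (0-indexed)."""
    T = (len(x) // L) * L
    return x[:T].reshape(-1, L).T

def denoise(P, rank=3):
    """Step 2: matrix estimation via rank truncation of the SVD."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    s[rank:] = 0.0
    return (U * s) @ Vt

def forecast_next(P_hat, recent):
    """Treat each of the first L-1 rows as a feature and regress the
    last row on them, then apply the fit to the most recent L-1 values
    to get a one-step-ahead forecast."""
    beta, *_ = np.linalg.lstsq(P_hat[:-1].T, P_hat[-1], rcond=None)
    return recent @ beta

# Illustrative usage on a noisy sine wave (T = 200, L = 20).
rng = np.random.default_rng(1)
x = np.sin(0.1 * np.arange(200)) + 0.1 * rng.normal(size=200)
P_hat = denoise(page_matrix(x, L=20))
x_denoised = P_hat.T.ravel()          # Step 3: reverse the mapping
next_value = forecast_next(P_hat, x_denoised[-19:])
```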

Now, the question is: does this really work?

Yes, it does. In practice, it works really well. The comparison table presented in the lecture shows
that this algorithm has outperformed many state-of-the-art time series algorithms in multiple
domains.

Remark: ‘MSSA’ stands for Multivariate Singular Spectrum Analysis.

Now that we understand the algorithm for a single entry $L_{1,1}$, let’s look at how to extend it to
estimate the whole matrix $L_{ij}$. We will again follow three steps:

Step 1: Convert the time series for each entry into a page matrix

Step 2: Concatenate the matrices of all entries, along columns, into a big matrix, say $Z$.

The concatenated matrix has $L$ rows and $(T/L) \times N \times M$ columns, where $N$ and $M$ are
the numbers of users and items, respectively.

Step 3: Perform matrix estimation over $Z$.

Once step 3 is completed, we can obtain the predictions by reversing the mapping.
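
A minimal sketch of this stacking step, reusing the page_matrix and denoise helpers from the sketch above; the shapes N = 3, M = 2, T = 100, L = 10 and the random data are purely illustrative.

```python
import numpy as np

# series[i, j] holds the length-T time series for user i and item j.
rng = np.random.default_rng(2)
N, M, T, L = 3, 2, 100, 10
series = rng.normal(size=(N, M, T))

# Steps 1 & 2: one page matrix per (user, item) entry, concatenated
# along columns into Z of shape (L, (T/L) * N * M).
Z = np.concatenate([page_matrix(series[i, j], L)
                    for i in range(N) for j in range(M)], axis=1)

# Step 3: matrix estimation over Z (here, the denoise() from above);
# each block of T/L columns then maps back to one entry's series.
Z_hat = denoise(Z)
```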

So far, we have discussed two dimensions, content and time, but separately. Now, let’s move to the
third module of this lecture, combine everything together, and include the third dimension:
multiple measurements.

Everything Together
In real applications, we have more pieces of information: we may have different kinds of
interactions between users and items. These different interactions are called measurements. For
example, in retail, we may have measurements such as whether an item was purchased, whether it was
browsed, whether it was added to the cart, reviews for an item, etc., and these measurements are
related to each other. This multilinear relationship among many slices of data can be represented
by tensors.

The measurements that are available also change over time. A time-varying tensor is a very
high-dimensional, complex object that changes over time, with very partial and noisy information.

We need to model everything together now, i.e., we need to estimate the time-varying tensor where
each entry is represented as:

$$L_{ijk}(t)$$

where $i$ indexes the user, $j$ the item, $k$ the measurement, and $t$ the time.

We will follow the same idea discussed in module 1 of this lecture: model using the observed
features as well as the latent features, and then combine them. The general equation of the model
is:

$$L_{ijk}(t) = f^k_{obs}(x_i, y_j) + f^k_{latent}(u_i(t), v_j(t)),$$ where $u_i$, $v_j$ are time series.

The above equation implies that we are modeling each slice of measurements separately, using the
observed features (which do not change over time) and the latent features. But these slices cannot
be independent; they must have some relation, and the model should take it into account. For
example, the equations below show that $f^k_{obs}$ is different for each $k$, while $f^k_{latent}$
uses the same latent features, only multiplied by different weights for each $k$:

$$f^k_{obs}(x_i, y_j) = \alpha^k x_i + \beta^k y_j + \gamma^k$$

$$f^k_{latent}(u_i(t), v_j(t)) = \sum_{l=1}^{d} u_{il}(t)\, v_{jl}(t)\, w_{kl}$$
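
As a small sketch, here is how these two components might look in code for a single entry $(i, j, k, t)$; the dimensions and all parameter values are illustrative assumptions.

```python
import numpy as np

def f_obs(x_i, y_j, alpha_k, beta_k, gamma_k):
    """Measurement-specific linear model on the observed features."""
    return alpha_k @ x_i + beta_k @ y_j + gamma_k

def f_latent(u_it, v_jt, w_k):
    """Shared latent features u_i(t), v_j(t), weighted per measurement:
    the sum over l of u_il(t) * v_jl(t) * w_kl."""
    return np.sum(u_it * v_jt * w_k)

# One entry of the model: L_ijk(t) = f_obs(...) + f_latent(...).
rng = np.random.default_rng(3)
p, q, d = 3, 2, 4                      # feature and latent dimensions
x_i, y_j = rng.normal(size=p), rng.normal(size=q)
alpha_k, beta_k, gamma_k = rng.normal(size=p), rng.normal(size=q), 0.1
u_it, v_jt, w_k = (rng.normal(size=d) for _ in range(3))
L_ijk_t = f_obs(x_i, y_j, alpha_k, beta_k, gamma_k) + f_latent(u_it, v_jt, w_k)
```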

So, how do we find $\hat{L}$, i.e., the predictions for the tensor $L$? The algorithm consists of
five major steps. Let’s go through them one by one.

Step 1: Content-based learning

For each measurement $k$, learn $f^k_{obs}$, that is, $(\alpha^k, \beta^k, \gamma^k)$, via
supervised learning using the observed features.

Since the observed features are independent of time, this would be a good model if things were not
changing over time. Most likely, though, that won’t be the case, which takes us to the next step.

Step 2: Obtain the difference over the observed entries

$$L^{diff}_{ijk}(t) = L_{ijk}(t) - L^{obs}_{ijk}, \quad \text{where } L^{obs}_{ijk} = f^k_{obs}(x_i, y_j)$$

Step 3: Build a stacked page matrix across entries and slices of the tensor

As discussed in the second module of this lecture, we can create a page matrix $Z^k$ for each
measurement $k$ from the differences obtained in step 2. Then we can stack all the matrices
together to create a single huge matrix. This is called flattening a tensor into a matrix.

Step 4: Perform matrix estimation on the stacked matrix to obtain the predictions
$\hat{L}^{diff}_{ijk}(t)$.

Step 5: Compute the final estimate

$$\hat{L}_{ijk}(t) = \hat{L}^{diff}_{ijk}(t) + L^{obs}_{ijk}$$
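
Putting steps 2 through 5 together, here is a compact sketch that extends the module-2 stacking to tensor slices, reusing page_matrix and denoise from the earlier sketches and assuming the step-1 content model has already produced an L_obs tensor; all shapes and data are illustrative.

```python
import numpy as np

# Assumed inputs (illustrative): observed tensor L_t of shape
# (N, M, K, T) and step-1 content estimates L_obs of shape (N, M, K).
rng = np.random.default_rng(4)
N, M, K, T, L = 3, 2, 2, 100, 10
L_t = rng.normal(size=(N, M, K, T))
L_obs = rng.normal(size=(N, M, K))

# Step 2: difference over observed entries.
L_diff = L_t - L_obs[..., None]

# Step 3: a page matrix per (i, j, k), stacked along columns
# (flattening the tensor into a single matrix).
Z = np.concatenate([page_matrix(L_diff[i, j, k], L)
                    for i in range(N) for j in range(M) for k in range(K)],
                   axis=1)

# Step 4: matrix estimation on the stacked matrix.
Z_hat = denoise(Z)

# Step 5: reverse the mapping and add back the content-based estimate.
cols = T // L
blocks = [Z_hat[:, b * cols:(b + 1) * cols].T.ravel() for b in range(N * M * K)]
L_hat = np.stack(blocks).reshape(N, M, K, T) + L_obs[..., None]
```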

In general, the final estimate works well, as it takes into account all three dimensions of the
problem.

Remark: We flattened the tensor into a matrix to estimate $L^{diff}_{ijk}(t)$, but in an extremely
sparse data regime, directly estimating the tensor can help. However, this increases the
computational cost significantly.

Links for Additional Reading

● The original paper on tspDB (time series prediction database)
● Links to get started/explore tspDB:
○ http://tspdb.mit.edu/
○ https://www.powtoon.com/s/fkVh3axA4Jy/1/m
● CricketML
● The original paper on Model Agnostic Time Series Analysis via Matrix Estimation
