# Matrix Factorization for Recommendation Systems

This project implements recommendation systems from scratch in pure NumPy, exploring SVD, NMF (Lee & Seung multiplicative updates and NNLS-ALS), and SGD-based approaches. The models predict user ratings on the MovieLens dataset, and clusters in the latent space are visualized.


## Key Takeaways

- Low-Rank Approximation (SVD): Requires imputing the missing ratings before factorizing and performs poorly overall (a minimal sketch follows this list).

    *Figure: reconstruction errors, i.e. the final training and test MSE for different rank-$k$ approximations.*

- Non-negative Matrix Factorization (NMF): The problem we want to solve is (a sketch follows this list): $\min_{M, U} \sum_{(i, j) \in \kappa} \left(r_{ij} - M_i U_j^T\right)^2 + \lambda \left(\left\|M_i\right\|^2 + \left\|U_j\right\|^2\right), \quad \text{s.t.} \quad M \geq 0, \, U \geq 0$, where $\kappa$ is the set of observed (movie, user) pairs.

    Following Lee and Seung[^1], the multiplicative update rules (with element-wise multiplication and division), without regularization, for $M$ and $U$ are:

    $M \leftarrow M \cdot \frac{RU}{MU^{\top}U}$

    $U \leftarrow U \cdot \frac{R^{\top}M}{UM^{\top}M}$

    However, since $R$ is sparse, each $M_i$ must be updated using only the existing ratings of movie $i$, and similarly each $U_j$ using only the existing ratings of user $j$. Adding a regularization parameter $\lambda$, the updates become:

    $M_{i,k} \leftarrow M_{i,k} \cdot \frac{\sum_{j \in M_i^*}U_{j,k}\cdot r_{i,j}}{\sum_{j \in M_i^*}U_{j,k}\cdot \hat{r}_{i,j} + \lambda|M_i^*|M_{i,k}}$

    $U_{j,k} \leftarrow U_{j,k} \cdot \frac{\sum_{i \in U_j^*}M_{i,k}\cdot r_{i,j}}{\sum_{i \in U_j^*}M_{i,k}\cdot \hat{r}_{i,j} + \lambda|U_j^*|U_{j,k}}$

    Where:

    - $M_{i,k}$ is the $k^{th}$ latent factor of $M_i$
    - $U_{j,k}$ is the $k^{th}$ latent factor of $U_j$
    - $M_i^*$ is the set of users who rated movie $i$
    - $U_j^*$ is the set of movies rated by user $j$

    *Figure: NMF error comparison, i.e. the final train and test error for varying $k$.*

    This method is more robust and interpretable than SVD, and it overfits less.

- Non-negative least squares (NNLS-ALS): The idea, as the name suggests, is to alternate between fixing the $M_i$'s and fixing the $U_j$'s, solving a non-negativity-constrained least-squares problem for the free factor at each step; the fast NNLS algorithm is due to Bro and de Jong[^2]. The algorithm is essentially (a sketch follows this list):

    Repeat until the convergence criterion is met:

    1. For $i = 1$ to $n$:

      - Compute NNLS for: $\underbrace{\left(\sum_{r_{ij} \in r_{i *}} U_j U_j^{\top} + \lambda I_k \right)}_\text{A} M_i = \underbrace{\sum_{r_{ij} \in r_{i *}} r_{ij} U_j}_\text{b}$
    2. For $j = 1$ to $m$:

      - Compute NNLS for: $\underbrace{\left(\sum_{r_{ij} \in r_{* j}} M_i M_i^{\top} + \lambda I_k \right)}_\text{A} U_j = \underbrace{\sum_{r_{ij} \in r_{* j}} r_{ij} M_i}_\text{b}$

    *Figure: NNLS-ALS error comparison, i.e. the final train and test error for varying $k$.*
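Below are minimal NumPy sketches of the three approaches, each on a tiny hypothetical ratings matrix (rows are movies, columns are users, 0 marks a missing rating); they illustrate the update rules above rather than reproduce the repo's code. First, the low-rank SVD baseline; imputing each movie's mean rating is one simple choice for filling the missing entries:

```python
import numpy as np

# Hypothetical toy ratings: rows = movies, columns = users, 0 = missing.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
mask = R > 0

# One simple imputation choice: fill missing entries with the movie's mean rating.
counts = mask.sum(axis=1)
row_mean = R.sum(axis=1) / np.maximum(counts, 1)
R_filled = np.where(mask, R, row_mean[:, None])

# Rank-k truncated SVD reconstruction.
k = 2
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
R_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]

# MSE on the observed entries only.
print("train MSE:", np.mean((R[mask] - R_hat[mask]) ** 2))
```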
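Next, a sketch of the sparse, regularized multiplicative updates for NMF, vectorized with an observation mask `W`; the toy data, $k$, $\lambda$, and iteration count are arbitrary choices, not the repo's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ratings: rows = movies, columns = users, 0 = missing.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
W = (R > 0).astype(float)           # observation mask
n_movies, n_users = R.shape
k, lam, eps = 2, 0.05, 1e-9         # eps guards against division by zero

M = rng.random((n_movies, k))       # movie factors, stay non-negative
U = rng.random((n_users, k))        # user factors, stay non-negative

for _ in range(500):
    # Update M: sums run only over observed ratings thanks to the mask W,
    # matching the element-wise rule with the lambda*|M_i^*|*M_{i,k} term.
    R_hat = M @ U.T
    M *= ((W * R) @ U) / ((W * R_hat) @ U + lam * W.sum(1, keepdims=True) * M + eps)
    # Update U symmetrically over the movies each user has rated.
    R_hat = M @ U.T
    U *= ((W * R).T @ M) / ((W * R_hat).T @ M + lam * W.sum(0)[:, None] * U + eps)

mse = np.sum(W * (R - M @ U.T) ** 2) / W.sum()
print("NMF train MSE on observed ratings:", mse)
```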
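Last, a sketch of NNLS-ALS. Each bracketed system $Ax = b$ above is a small $k \times k$ problem; for brevity this sketch delegates the non-negativity-constrained solve to `scipy.optimize.nnls` rather than implementing Bro and de Jong's algorithm in pure NumPy as the repo does:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Hypothetical toy ratings: rows = movies, columns = users, 0 = missing.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
W = R > 0
n_movies, n_users = R.shape
k, lam = 2, 0.05

M = rng.random((n_movies, k))
U = rng.random((n_users, k))

for _ in range(20):
    # Fix U, solve one small NNLS system per movie.
    for i in range(n_movies):
        Uj = U[W[i]]                          # factors of users who rated movie i
        A = Uj.T @ Uj + lam * np.eye(k)
        b = Uj.T @ R[i, W[i]]
        M[i], _ = nnls(A, b)
    # Fix M, solve one small NNLS system per user.
    for j in range(n_users):
        Mi = M[W[:, j]]                       # factors of movies rated by user j
        A = Mi.T @ Mi + lam * np.eye(k)
        b = Mi.T @ R[W[:, j], j]
        U[j], _ = nnls(A, b)

mse = np.sum(W * (R - M @ U.T) ** 2) / W.sum()
print("NNLS-ALS train MSE on observed ratings:", mse)
```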

For more details, check the main document.

## References

[^1]: Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems, NIPS’00, pages 535–541, Cambridge, MA, USA, 2000. MIT Press.

[^2]: Rasmus Bro and Sijmen de Jong. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11, 1997.
