Commit 81d7766

Add recommendation model (tensorflow#4175)
* Add recommendation model
* Fix pylints check error
* Rename file
* Address comments, update input pipeline, and add distribution strategy
* Fix import error
* Address more comments
* Fix lints
1 parent 8507934 commit 81d7766

7 files changed: +1110 -0 lines changed

official/recommendation/README.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
# Recommendation Model
2+
## Overview
3+
This is an implementation of the Neural Collaborative Filtering (NCF) framework with the Neural Matrix Factorization (NeuMF) model, as described in the [Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031) paper. The current implementation is based on the authors' [NCF code](https://github.com/hexiangnan/neural_collaborative_filtering) and the Stanford implementation in the [MLPerf Repo](https://github.com/mlperf/reference/tree/master/recommendation/pytorch).
4+
5+
NCF is a general framework for collaborative filtering of recommendations in which a neural network architecture is used to model user-item interactions. Unlike traditional models, NCF does not resort to Matrix Factorization (MF) with an inner product on latent features of users and items. It replaces the inner product with a multi-layer perceptron that can learn an arbitrary function from data.
6+
7+
Two instantiations of NCF are Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP). GMF applies a linear kernel to model the latent feature interactions, and MLP uses a nonlinear kernel to learn the interaction function from data. NeuMF fuses GMF and MLP to better model complex user-item interactions, combining the linearity of MF with the non-linearity of MLP when modeling the user-item latent structures. NeuMF allows GMF and MLP to learn separate embeddings, and combines the two models by concatenating their last hidden layers. [neumf_model.py](neumf_model.py) defines the architecture details.
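The fusion described above can be illustrated with a minimal pure-Python sketch. It is not the code in `neumf_model.py`: the function names (`neumf_score`, `dense`, `relu`), the toy weights, and the sigmoid output layer are assumptions made for illustration only.

```python
import math


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def dense(vec, weights, bias):
    """Affine layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def neumf_score(gmf_user, gmf_item, mlp_user, mlp_item, mlp_layers, out_w, out_b):
    """Hypothetical NeuMF forward pass for one (user, item) pair."""
    # GMF branch: element-wise product of its own user and item embeddings.
    gmf_vec = [u * i for u, i in zip(gmf_user, gmf_item)]
    # MLP branch: concatenate its separate embeddings, then apply ReLU layers.
    hidden = mlp_user + mlp_item
    for weights, bias in mlp_layers:
        hidden = relu(dense(hidden, weights, bias))
    # Fusion: concatenate the last hidden layers of both branches.
    fused = gmf_vec + hidden
    logit = sum(w * x for w, x in zip(out_w, fused)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


# Toy example with hand-picked weights (illustrative values only).
score = neumf_score(
    gmf_user=[1.0, 2.0], gmf_item=[3.0, 4.0],
    mlp_user=[1.0], mlp_item=[-1.0],
    mlp_layers=[([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])],
    out_w=[1.0, -1.0, 5.0, 0.0], out_b=0.0,
)
print(score)  # 0.5, because the logit is 3 - 8 + 5 + 0 = 0
```

Keeping separate embeddings per branch means the GMF and MLP embedding sizes do not have to match; only the concatenated vector feeds the output layer.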
8+
9+
Some abbreviations used in the code base include:
10+
- NCF: Neural Collaborative Filtering
11+
- NeuMF: Neural Matrix Factorization
12+
- GMF: Generalized Matrix Factorization
13+
- MLP: Multi-Layer Perceptron
14+
- HR: Hit Ratio
15+
- NDCG: Normalized Discounted Cumulative Gain
16+
- ml-1m: MovieLens 1 million dataset
17+
- ml-20m: MovieLens 20 million dataset
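As a rough illustration of the two metrics in the list above, the sketch below computes HR@K and NDCG@K for one user under a leave-one-out setup, where a single held-out item is ranked among candidates. The function name, signature, and sample item IDs are hypothetical and not taken from this commit.

```python
import math


def hit_ratio_and_ndcg(ranked_items, test_item, k=10):
    """Return (HR@k, NDCG@k) for one user with a single held-out item.

    `ranked_items` is assumed to be sorted by predicted score, best first.
    """
    top_k = ranked_items[:k]
    if test_item not in top_k:
        return 0.0, 0.0
    position = top_k.index(test_item)  # 0-based rank of the held-out item
    # With one relevant item the ideal DCG is 1, so NDCG is 1 / log2(rank + 2).
    return 1.0, math.log(2) / math.log(position + 2)


# Illustrative ranking: the held-out item 42 sits at position 1 (second place).
hr, ndcg = hit_ratio_and_ndcg([7, 42, 13, 99], test_item=42, k=3)
print(hr, ndcg)
```

Averaging these per-user values over all users gives the dataset-level HR and NDCG.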
18+
19+
## Dataset
20+
The [MovieLens datasets](http://files.grouplens.org/datasets/movielens/) are used for model training and evaluation. Specifically, we use two datasets: **ml-1m** (short for MovieLens 1 million) and **ml-20m** (short for MovieLens 20 million).
21+
22+
### ml-1m
23+
The ml-1m dataset contains 1,000,209 anonymous ratings of 3,706 movies made by 6,040 users who joined MovieLens in 2000. All ratings are stored in the file "ratings.dat", which has no header row, in the following format:
24+
```
25+
UserID::MovieID::Rating::Timestamp
26+
```
27+
- UserIDs range between 1 and 6040.
28+
- MovieIDs range between 1 and 3952.
29+
- Ratings are made on a 5-star scale (whole-star ratings only).
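A line in this format can be split on the `::` delimiter. The sketch below is one possible way to do it; the `Rating` tuple, the `parse_ml1m_line` helper, and the sample line are illustrative assumptions, not part of this commit.

```python
from collections import namedtuple

# Hypothetical record type for one ml-1m rating.
Rating = namedtuple("Rating", ["user_id", "movie_id", "rating", "timestamp"])


def parse_ml1m_line(line):
    """Parse one `UserID::MovieID::Rating::Timestamp` line."""
    user, movie, rating, timestamp = line.strip().split("::")
    # Ratings in ml-1m are whole stars, so an int is sufficient here.
    return Rating(int(user), int(movie), int(rating), int(timestamp))


# Illustrative sample line in the documented format.
record = parse_ml1m_line("1::1193::5::978300760\n")
print(record)
```

Because the delimiter is two characters long, a plain `str.split("::")` is simpler than configuring a single-character CSV reader.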
30+
31+
### ml-20m
32+
The ml-20m dataset contains 20,000,263 ratings of 26,744 movies by 138,493 users. All ratings are contained in the file "ratings.csv". Each line of this file after the header row represents one rating of one movie by one user, and has the following format:
33+
```
34+
userId,movieId,rating,timestamp
35+
```
36+
- The lines within this file are ordered first by userId, then, within user, by movieId.
37+
- Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).
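Since "ratings.csv" has a header row, the standard `csv.DictReader` can map each row to the documented column names. This is a minimal sketch; the `read_ml20m` helper and the in-memory sample rows are assumptions for illustration.

```python
import csv
import io


def read_ml20m(fileobj):
    """Read ml-20m style rows into (user, movie, rating, timestamp) tuples."""
    reader = csv.DictReader(fileobj)
    # Ratings can be half stars (e.g. 3.5), so they are parsed as floats.
    return [
        (int(row["userId"]), int(row["movieId"]),
         float(row["rating"]), int(row["timestamp"]))
        for row in reader
    ]


# Illustrative sample in the documented format (header plus two ratings).
SAMPLE = (
    "userId,movieId,rating,timestamp\n"
    "1,2,3.5,1112486027\n"
    "1,29,3.5,1112484676\n"
)
rows = read_ml20m(io.StringIO(SAMPLE))
print(rows[0])
```

For the real file, an opened file handle would replace `io.StringIO(SAMPLE)`.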
38+
39+
In both datasets, the timestamp is represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970. Each user has at least 20 ratings.
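The timestamp encoding above is the usual Unix epoch, so it can be converted with the standard library. The helper name `to_utc` and the sample value are illustrative assumptions.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Illustrative timestamp value.
print(to_utc(978300760).isoformat())  # 2000-12-31T22:12:40+00:00
```

Passing `tz=timezone.utc` avoids silently converting to the local time zone of the machine running the code.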
40+
41+
## Running Code
42+
43+
### Download and preprocess dataset
44+
To download the dataset, first install the pandas package, then run the following command:
45+
```
46+
python data_download.py
47+
```
48+
Arguments:
49+
* `--data_dir`: Directory in which to download the data and save the preprocessed output. By default, it is `/tmp/movielens-data/`.
50+
* `--dataset`: The name of the dataset to download and preprocess. By default, it is `ml-1m`.
51+
52+
Use the `--help` or `-h` flag to get a full list of possible arguments.
53+
54+
Note that the ml-20m dataset is large (the ratings file is ~500 MB), so preprocessing may take around 10 minutes.
55+
56+
### Train and evaluate model
57+
To train and evaluate the model, issue the following command:
58+
```
59+
python ncf_main.py
60+
```
61+
Arguments:
62+
* `--model_dir`: Directory to save model training checkpoints. By default, it is `/tmp/ncf/`.
63+
* `--data_dir`: This should be set to the same directory that was passed as `--data_dir` to `data_download.py`.
64+
* `--dataset`: The name of the dataset to use for training and evaluation. By default, it is `ml-1m`.
65+
66+
Additional arguments control the model and the training process. Use the `--help` or `-h` flag to get a full list of possible arguments with detailed descriptions.
67+
68+
## Benchmarks (TODO)
69+
### Training times
70+
### Evaluation results

official/recommendation/__init__.py

Whitespace-only changes.

official/recommendation/constants.py

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
# ==============================================================================
15+
"""NCF Constants."""
16+
17+
TRAIN_RATINGS_FILENAME = 'train-ratings.csv'
18+
TEST_RATINGS_FILENAME = 'test-ratings.csv'
19+
TEST_NEG_FILENAME = 'test-negative.csv'
20+
21+
TRAIN_DATA = 'train_data.csv'
22+
TEST_DATA = 'test_data.csv'
23+
24+
USER = 'user_id'
25+
ITEM = 'item_id'
26+
RATING = 'rating'
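One way such constants might be consumed is sketched below. The constant values are copied from the file above, but the `train_ratings_path` and `as_record` helpers, and the `<dataset>-<filename>` path layout, are hypothetical and not confirmed by this diff.

```python
import os

# Values copied from constants.py; everything else here is illustrative.
TRAIN_RATINGS_FILENAME = 'train-ratings.csv'
USER = 'user_id'
ITEM = 'item_id'
RATING = 'rating'


def train_ratings_path(data_dir, dataset):
    """Build a path for the training ratings file (assumed naming scheme)."""
    return os.path.join(data_dir, dataset + '-' + TRAIN_RATINGS_FILENAME)


def as_record(user, item, rating):
    """Key one rating by the shared column-name constants."""
    return {USER: user, ITEM: item, RATING: rating}


print(train_ratings_path('/tmp/movielens-data/', 'ml-1m'))
print(as_record(1, 1193, 5))
```

Centralizing file and column names this way lets the preprocessing and training scripts agree on them without repeating string literals.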
