Pyreclab is a recommendation library for training recommendation models through a friendly, easy-to-use interface while keeping memory and CPU usage low.
To achieve this, Pyreclab is exposed as a Python module that gives convenient access to its algorithms, and it is implemented entirely in C++ to avoid the performance penalty of interpreted languages.
The following recommendation algorithms are currently supported:
Rating Prediction
- User Average
- Item Average
- Slope One
- User Based KNN
- Item Based KNN
- Funk's SVD
Item Recommendation
- Most Popular
Although Pyreclab can be compiled on most popular operating systems, it has been tested on the following systems.
| Operating System | Version |
|---|---|
| Ubuntu | 16.04 |
| CentOS | 6.4 |
| Mac OS X | 10.11 ( El Capitan ) |
| Mac OS X | 10.12 ( Sierra ) |
If you use this library, please cite:
@inproceedings{1706.06291v2,
  author = {Gabriel Sepulveda and Vicente Dominguez and Denis Parra},
  title = {pyRecLab: A Software Library for Quick Prototyping of Recommender Systems},
  year = {2017},
  month = {August},
  eprint = {arXiv:1706.06291v2},
  keywords = {Recommender Systems, Software Development, Recommender Library, Python Library}
}
1.- Before starting, verify that libboost-dev and cmake are installed on your system. If not, install them through your distribution's package manager, as shown next.
Ubuntu:
$ sudo apt-get install cmake
$ sudo apt-get install libboost-dev
CentOS:
$ yum install cmake
$ yum install boost-devel
Mac OS X:
$ brew install cmake
$ brew install boost
2.- Clone the source code of Pyreclab into a local directory.
$ git clone https://github.com/gasevi/pyreclab.git
3.- Build the Python module ( default: Python 2.7 ).
$ cd pyreclab
$ cmake .
$ make
By default, PyRecLab is compiled for Python 2.7. To build it for Python 3.x, run the following steps instead:
$ cd pyreclab
$ cmake -DCMAKE_PYTHON_VERSION=3.x .
$ make
4.- Install PyRecLab.
$ sudo make install
Pyreclab provides the following classes for representing each of the recommendation algorithms currently supported:
-
So, you can import any of them as follows:
>>> from pyreclab import <RecAlg>
or import the entire module, as you prefer:
>>> import pyreclab
After that, to create an instance of any of these classes, you must provide a dataset file with the training information, which must contain the fields user_id, item_id and rating.
The following example shows the generic format for creating one of these instances.
>>> obj = pyreclab.RecAlg( dataset = filename,
                           dlmchar = b'\t',
                           header = False,
                           usercol = 0,
                           itemcol = 1,
                           ratingcol = 2 )
Here, RecAlg stands for the recommendation algorithm class chosen from the previous list; its parameters are described in the following table.
| Parameter | Type | Default value | Description |
|---|---|---|---|
| dataset | mandatory | N.A. | Dataset filename with fields: userid, itemid and rating |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether dataset filename contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
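To make the expected file layout concrete, here is a minimal sketch that writes and re-reads a tab-separated ratings file using only the Python standard library. The helper names `write_dataset` and `read_dataset`, the file name and the sample ratings are hypothetical; only the column conventions (delimiter, header, usercol, itemcol, ratingcol) come from the table above.

```python
import csv
import os
import tempfile

# Hypothetical sample ratings: (user_id, item_id, rating)
ratings = [
    ("u1", "i1", 4.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 5.0),
]

def write_dataset(path, rows, delimiter="\t"):
    """Write one rating per line as user_id, item_id, rating."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)

def read_dataset(path, delimiter="\t", header=False,
                 usercol=0, itemcol=1, ratingcol=2):
    """Read the file back with the same column conventions as the table."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if header:
            next(reader, None)  # skip the header line
        return [(row[usercol], row[itemcol], float(row[ratingcol]))
                for row in reader]

dataset_path = os.path.join(tempfile.mkdtemp(), "ratings.tsv")
write_dataset(dataset_path, ratings)
loaded = read_dataset(dataset_path)
```

A file produced this way has no header line and uses the default column positions, so it should match the default values listed in the table.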
Because the algorithms differ in nature, their train methods can take different parameters. For this reason, the methods are described separately for each class below.
User Average
- Training
>>> obj.train()
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
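As an illustration of what a User Average baseline (the first entry in the algorithm list) typically computes, the sketch below predicts every rating as the mean of the user's known ratings. This is a plain-Python approximation under stated assumptions, not Pyreclab's C++ implementation; the sample data, the function names and the fallback to the global mean for unknown users are all assumptions.

```python
from collections import defaultdict

# Hypothetical training data: (user_id, item_id, rating)
ratings = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)]

def user_averages(rows):
    """Return the mean rating given by each user."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _item, rating in rows:
        totals[user] += rating
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}

averages = user_averages(ratings)
global_mean = sum(r for _, _, r in ratings) / len(ratings)

def predict(user_id, item_id):
    """The item is ignored: every item receives the user's mean rating.

    Falling back to the global mean for unseen users is an assumption.
    """
    return averages.get(user_id, global_mean)
```

An Item Average baseline follows the same idea with the roles of user and item swapped.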
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
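The effect of topN and includeRated can be sketched as a simple ranking step over predicted scores. The function `top_n`, its arguments and the tie-breaking by item identifier are hypothetical choices made for this example; how Pyreclab orders ties internally is not stated here.

```python
def top_n(scores, rated, topn=10, include_rated=False):
    """Rank items by predicted score and keep the best `topn`.

    scores: dict mapping item_id to predicted score
    rated:  set of item_ids the user has already rated
    """
    candidates = [
        (item, score) for item, score in scores.items()
        if include_rated or item not in rated
    ]
    # Highest score first; ties broken by item id (an assumption)
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _score in candidates[:topn]]

# Hypothetical predicted scores for one user
scores = {"i1": 4.5, "i2": 3.0, "i3": 4.9, "i4": 4.9}
rated = {"i3"}

without_rated = top_n(scores, rated, topn=2)
with_rated = top_n(scores, rated, topn=2, include_rated=True)
```

With the default setting, items the user has already rated are filtered out before ranking, so the list contains only new items.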
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
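The two values returned alongside the prediction list are named mae and rmse. Assuming they denote the usual mean absolute error and root mean squared error, they can be computed from (predicted, actual) pairs as sketched below; the function names and sample pairs are hypothetical.

```python
import math

def mean_absolute_error(pairs):
    """Average of |predicted - actual| over all test ratings."""
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)

def root_mean_squared_error(pairs):
    """Square root of the average squared prediction error."""
    return math.sqrt(
        sum((pred - actual) ** 2 for pred, actual in pairs) / len(pairs)
    )

# Hypothetical (predicted, actual) rating pairs
pairs = [(3.0, 4.0), (5.0, 5.0), (2.0, 4.0)]

mae_value = mean_absolute_error(pairs)
rmse_value = root_mean_squared_error(pairs)
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.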
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
Item Average
- Training
>>> obj.train()
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
Slope One
- Training
>>> obj.train()
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
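For readers unfamiliar with Slope One (listed among the supported algorithms), the following sketch shows the weighted variant of the scheme in plain Python: average rating differences between item pairs are collected first, then combined with the user's own ratings. All names and the sample data are hypothetical, and Pyreclab's C++ implementation may differ in its details.

```python
from collections import defaultdict

def build_deviations(ratings_by_user):
    """Accumulate rating differences for every ordered item pair (j, i)."""
    diffs = defaultdict(float)
    counts = defaultdict(int)
    for items in ratings_by_user.values():
        for j, r_j in items.items():
            for i, r_i in items.items():
                if i != j:
                    diffs[(j, i)] += r_j - r_i
                    counts[(j, i)] += 1
    return diffs, counts

def predict_slope_one(ratings_by_user, diffs, counts, user, target):
    """Weighted Slope One: each co-rated item votes with its support count."""
    numerator = 0.0
    denominator = 0
    for i, r_i in ratings_by_user[user].items():
        n = counts.get((target, i), 0)
        if i != target and n > 0:
            average_dev = diffs[(target, i)] / n
            numerator += (average_dev + r_i) * n
            denominator += n
    return numerator / denominator if denominator else None

# Hypothetical ratings: user -> {item: rating}
ratings_by_user = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
diffs, counts = build_deviations(ratings_by_user)
lucy_a = predict_slope_one(ratings_by_user, diffs, counts, "lucy", "A")
```

Here item A is on average 0.5 above B (two users) and 3 above C (one user), so the prediction for lucy on A is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3.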
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
User Based KNN
- Training
>>> obj.train( knn, similarity )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| knn | optional | 10 | K nearest neighbors |
| similarity | optional | 'pearson' | Similarity metric |
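To clarify what the knn and similarity parameters control, the sketch below computes a Pearson correlation between two users over their co-rated items and then keeps the k most similar users. The function names and sample data are hypothetical, and computing the means over co-rated items only is one common variant; the exact formula Pyreclab uses is not stated here.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / math.sqrt(var_a * var_b)

def nearest_neighbors(target, ratings_by_user, knn=10):
    """Return the `knn` users most similar to `target`, best first."""
    sims = [
        (other, pearson(ratings_by_user[target], ratings_by_user[other]))
        for other in ratings_by_user if other != target
    ]
    sims.sort(key=lambda pair: (-pair[1], pair[0]))
    return sims[:knn]

# Hypothetical ratings: user -> {item: rating}
ratings_by_user = {
    "u1": {"i1": 1.0, "i2": 2.0, "i3": 3.0},
    "u2": {"i1": 2.0, "i2": 4.0, "i3": 6.0},
    "u3": {"i1": 3.0, "i2": 2.0, "i3": 1.0},
}
neighbors = nearest_neighbors("u1", ratings_by_user, knn=1)
```

A larger knn averages over more neighbors, which usually smooths predictions at the cost of including less similar users.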
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
Item Based KNN
- Training
>>> obj.train( knn, similarity )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| knn | optional | 10 | K nearest neighbors |
| similarity | optional | 'pearson' | Similarity metric |
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
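An item-based neighborhood prediction can be sketched as a similarity-weighted average of the ratings the user gave to the items most similar to the target. The function below assumes the item-item similarities are already available in a dictionary; that structure, the function name and the choice to ignore non-positive similarities are assumptions made for this illustration, not a description of Pyreclab's internals.

```python
def predict_item_knn(user_ratings, similarities, target, knn=10):
    """Weighted average of the user's ratings on the nearest items.

    user_ratings: dict mapping item_id to the user's rating
    similarities: dict mapping (target_item, other_item) to a similarity
    """
    scored = [
        (similarities.get((target, item), 0.0), rating)
        for item, rating in user_ratings.items() if item != target
    ]
    # Keep the `knn` most similar items, then drop non-positive weights
    scored.sort(key=lambda pair: -pair[0])
    neighbors = [(sim, rating) for sim, rating in scored[:knn] if sim > 0]
    weight = sum(sim for sim, _rating in neighbors)
    if weight == 0:
        return None  # no usable neighbor for this user and item
    return sum(sim * rating for sim, rating in neighbors) / weight

# Hypothetical data for one user and one target item
user_ratings = {"i1": 4.0, "i2": 2.0}
similarities = {("i3", "i1"): 0.75, ("i3", "i2"): 0.25}
prediction = predict_item_knn(user_ratings, similarities, "i3")
```

In this example the more similar item i1 dominates, pulling the prediction toward the user's rating of 4.0.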
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
Funk's SVD
- Training
>>> obj.train( factors = 1000, maxiter = 100, lr = 0.01, lamb = 0.1 )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| factors | optional | 1000 | Number of latent factors in matrix factorization |
| maxiter | optional | 100 | Maximum number of iterations to run if convergence is not reached |
| lr | optional | 0.01 | Learning rate |
| lamb | optional | 0.1 | Regularization parameter |
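The roles of factors, maxiter, lr and lamb can be seen in a minimal stochastic gradient descent sketch of Funk-style matrix factorization. This is a plain-Python illustration under stated assumptions (small random initialization, no bias terms, no early stopping); it is not Pyreclab's C++ implementation, and the function names, the `seed` argument and the sample data are hypothetical.

```python
import random

def init_factors(ratings, factors, seed=0):
    """Create small random latent vectors for every user and item."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(factors)] for i in items}
    return p, q

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train(ratings, factors=2, maxiter=100, lr=0.01, lamb=0.1, seed=0):
    """Run `maxiter` SGD passes over the ratings."""
    p, q = init_factors(ratings, factors, seed)
    for _ in range(maxiter):
        for u, i, r in ratings:
            err = r - dot(p[u], q[i])
            for f in range(factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularization weighted by lamb
                p[u][f] += lr * (err * qi - lamb * pu)
                q[i][f] += lr * (err * pu - lamb * qi)
    return p, q

def squared_error(ratings, p, q):
    """Sum of squared prediction errors on the given ratings."""
    return sum((r - dot(p[u], q[i])) ** 2 for u, i, r in ratings)

# Hypothetical training data: (user_id, item_id, rating)
ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0),
           ("u2", "i1", 4.0), ("u2", "i2", 2.0)]
error_before = squared_error(ratings, *init_factors(ratings, 2))
p, q = train(ratings)
error_after = squared_error(ratings, p, q)
```

A predicted rating is the dot product of the user and item vectors; a larger lamb shrinks the vectors more strongly, and a larger lr takes bigger steps at the risk of instability.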
- Rating prediction
>>> prediction = obj.predict( userId, itemId )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
- Testing and evaluation for prediction
>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| topn | optional | 10 | Top N items to recommend |
| output_file | optional | N.A. | Output file to write rankings |
Most Popular
- Training
>>> obj.train()
- Top-N item recommendation
>>> ranking = obj.recommend( userId, topN, includeRated )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| userId | mandatory | N.A. | User identifier |
| topN | optional | 10 | Top N items to recommend |
| includeRated | optional | false | Include rated items in ranking generation |
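A Most Popular recommender is not personalized: it ranks items by how popular they are in the training data. The sketch below measures popularity as the number of ratings an item received, which is an assumption (other definitions, such as summed ratings, are possible), and the function name and sample data are hypothetical.

```python
from collections import Counter

def most_popular(ratings, topn=10):
    """Return the `topn` items with the most ratings, most popular first."""
    counts = Counter(item for _user, item, _rating in ratings)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:topn]]

# Hypothetical training data: (user_id, item_id, rating)
ratings = [
    ("u1", "i1", 4.0), ("u2", "i1", 5.0), ("u3", "i1", 3.0),
    ("u1", "i2", 2.0),
    ("u2", "i3", 4.0), ("u3", "i3", 5.0),
]
popular = most_popular(ratings, topn=2)
```

Because the ranking ignores the user, rating prediction does not apply to this model, which is consistent with the absence of a predict method in this part of the document.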
- Testing for recommendation
>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      output_file = 'ranking.json',
                                      topN = 10 )
| Parameter | Type | Default value | Description |
|---|---|---|---|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Dataset filename contains first line header to skip |
| usercol | optional | 0 | User column position in dataset file |
| output_file | optional | N.A. | Output file to write rankings |
| topN | optional | 10 | Top N items to recommend |
- Add ranking evaluation metrics.
- Add Windows support.