Filmwise is a proof-of-concept movie recommendation system designed to enhance the movie streaming experience for users. The system combines collaborative and content-based filtering techniques to provide personalized movie recommendations. This README file documents the process of developing the Filmwise project.
The first stage was cleaning the provided dataset, which involved the following steps:
- Handling Missing Values:
  - Missing rating values were replaced with the mean rating for that movie, giving a more complete dataset.
- Alphanumeric to Numeric Conversion:
  - Non-numeric ratings were converted to their numeric equivalents, for example 'Five' to 5.
- Standardizing Movie Names:
  - Typos and inconsistencies in movie names were corrected so that each movie has a single, standardized name.
- Handling Duplicate Ratings:
  - If a user had multiple ratings for the same movie, only the latest rating was retained.
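
The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual `data_preprocessing.py`: the row layout `(user_id, movie, rating, timestamp)`, the `NAME_FIXES` lookup, and the helper names are all assumptions, and the order of the steps (convert, deduplicate, then impute) is one reasonable choice rather than the documented one.

```python
from collections import defaultdict

# Hypothetical raw rows: (user_id, movie, rating, timestamp). None marks a missing rating.
raw_rows = [
    (1, "Inception", "Five", 100),
    (1, "Inception", 4, 200),      # later rating by the same user: this one is kept
    (2, "Inceptoin", 3, 150),      # typo in the movie name
    (3, "Inception", None, 300),   # missing rating
    (2, "Up", "4", 120),
]

WORD_TO_NUMBER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
NAME_FIXES = {"Inceptoin": "Inception"}  # assumed lookup of known typos


def to_number(rating):
    """Convert 'Five', '4', or 4 to a float; return None when missing or unparseable."""
    if rating is None:
        return None
    text = str(rating).strip().lower()
    if text in WORD_TO_NUMBER:
        return float(WORD_TO_NUMBER[text])
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    # 1. Standardize movie names and convert ratings to numbers.
    rows = [(u, NAME_FIXES.get(m, m), to_number(r), t) for u, m, r, t in rows]

    # 2. Keep only the latest rating per (user, movie) pair.
    latest = {}
    for u, m, r, t in rows:
        if (u, m) not in latest or t > latest[(u, m)][1]:
            latest[(u, m)] = (r, t)

    # 3. Replace missing ratings with the mean of the known ratings for that movie.
    known = defaultdict(list)
    for (_, m), (r, _) in latest.items():
        if r is not None:
            known[m].append(r)
    cleaned = {}
    for (u, m), (r, _) in latest.items():
        if r is None and known[m]:
            r = sum(known[m]) / len(known[m])
        cleaned[(u, m)] = r
    return cleaned


cleaned = clean(raw_rows)
```

In this sketch a movie with no known ratings at all keeps `None`, since there is no mean to fall back on; how the project handles that case is not stated.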
To enrich the dataset, additional information such as movie genres was incorporated. This supplementary data was sourced from a larger dataset obtained from Kaggle, enhancing the breadth of information available for analysis and recommendation.
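
As a rough illustration of this enrichment step, the sketch below attaches genres to rating rows with a left join on the movie name. The data, the field names, and the `add_genres` helper are hypothetical; the join key actually used in the project is not documented here.

```python
# Hypothetical catalog entries, standing in for the Kaggle-derived genre data.
catalog = {
    "Inception": ["Action", "Sci-Fi"],
    "Up": ["Animation", "Adventure"],
}

# Hypothetical cleaned rating rows.
ratings = [
    {"user_id": 1, "movie": "Inception", "rating": 4.0},
    {"user_id": 2, "movie": "Up", "rating": 4.0},
    {"user_id": 3, "movie": "Unknown Film", "rating": 2.0},
]


def add_genres(rows, genre_lookup):
    """Left join: every rating row is kept; unmatched movies get an empty genre list."""
    return [{**row, "genres": genre_lookup.get(row["movie"], [])} for row in rows]


expanded = add_genres(ratings, catalog)
```

A left join keeps ratings for movies that are missing from the catalog, which is why standardizing movie names beforehand matters: a misspelled name would silently end up with no genres.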
Data analysis was conducted to gain insights into user preferences and patterns within the dataset. Key analyses include:
- Average Ratings per Movie:
  - Calculated to understand the overall reception of each movie.
- Number of Ratings per User and per Movie:
  - Explored to identify user engagement and popular movies.
- Most Popular Genres:
  - Investigated to understand the distribution of genres and user preferences.
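
The three analyses above amount to simple group-and-count operations. The following is a small dependency-free sketch over made-up rows; the row layout and variable names are assumptions, and genre popularity is measured here as the number of ratings given to movies in each genre, which is only one possible definition.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned rows: (user_id, movie, rating, genres).
rows = [
    (1, "Inception", 4.0, ["Action", "Sci-Fi"]),
    (2, "Inception", 3.0, ["Action", "Sci-Fi"]),
    (2, "Up", 5.0, ["Animation", "Adventure"]),
    (3, "Up", 4.0, ["Animation", "Adventure"]),
    (3, "Heat", 4.0, ["Action", "Crime"]),
]

ratings_by_movie = defaultdict(list)
ratings_per_user = Counter()
genre_counts = Counter()

for user_id, movie, rating, genres in rows:
    ratings_by_movie[movie].append(rating)
    ratings_per_user[user_id] += 1
    genre_counts.update(genres)  # one count per rating of a movie in that genre

# Average rating and number of ratings for each movie.
average_rating = {m: sum(r) / len(r) for m, r in ratings_by_movie.items()}
ratings_per_movie = {m: len(r) for m, r in ratings_by_movie.items()}

# Single most-rated genre as a (genre, count) pair.
top_genre = genre_counts.most_common(1)[0]
```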
The recommendation algorithm involves a hybrid approach, combining collaborative and content-based filtering:
- Collaborative Filtering:
  - A collaborative filtering model using the Surprise library's SVD algorithm was trained on the cleaned dataset. This model leverages user ratings to make personalized recommendations.
- Content-Based Filtering:
  - A content-based filtering model using movie genres and TF-IDF was trained. This model considers the content of movies to enhance recommendations.
- Hybrid Recommendations:
  - A hybrid recommendation function was implemented to combine collaborative and content-based scores, providing more robust and personalized movie recommendations.
The weights in the combination formula were adjusted based on experimentation and performance evaluation to optimize the recommendation system.
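
To make the combination step concrete, here is a minimal sketch of a weighted hybrid score. It is not the code in `hybrid_recommendation.py`: the TF-IDF and cosine similarity are hand-rolled so the example runs without dependencies, the collaborative predictions are hard-coded stand-ins for the output of a trained SVD model, and the function names, the linear blending formula, and the `cf_weight=0.6` default are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical genre strings per movie.
movie_genres = {
    "Inception": "action sci-fi thriller",
    "Heat": "action crime thriller",
    "Up": "animation adventure",
}


def tfidf_vectors(docs):
    """Build a simple TF-IDF vector (term -> weight) for each document."""
    tokens = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(term for terms in tokens.values() for term in set(terms))
    vectors = {}
    for name, terms in tokens.items():
        counts = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_scores(liked_movie, predicted_ratings, vectors, cf_weight=0.6, max_rating=5.0):
    """Blend a collaborative prediction (scaled to 0..1) with genre similarity."""
    scores = {}
    for movie, predicted in predicted_ratings.items():
        if movie == liked_movie:
            continue  # do not recommend the seed movie itself
        content = cosine(vectors[liked_movie], vectors[movie])
        scores[movie] = cf_weight * (predicted / max_rating) + (1 - cf_weight) * content
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vectors = tfidf_vectors(movie_genres)
# Stand-in for one user's predicted ratings from a trained collaborative model.
predicted = {"Inception": 4.5, "Heat": 4.0, "Up": 4.2}
ranking = hybrid_scores("Inception", predicted, vectors)
```

With these made-up numbers, "Heat" ranks above "Up" even though its predicted rating is slightly lower, because it shares genres with the seed movie. Changing `cf_weight` shifts the balance between the two signals, which is the kind of tuning the paragraph above describes.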
The directory structure of the project is as shown below:
- data/
  - raw/
    - raw_data.csv
  - cleaned/
    - expanded.csv
  - catalog/
    - movies.csv
    - raw/
- src/
  - __init__.py
  - data/
    - __init__.py
    - data_analysis.py
    - data_preprocessing.py
  - algorithms/
    - __init__.py
    - hybrid_recommendation.py
  - main.py
- requirements.txt
- README.md
- .gitignore

To set up and run the project:
- Clone this repository - Clone the repo and open it in the terminal.
- Create a Python virtual environment - In the terminal, run `pip install virtualenv`, then `virtualenv env` to create a virtual environment named env.
- Activate the virtual environment - On Linux, run `source env/bin/activate` in the terminal.
- Install dependencies - Run `pip install -r requirements.txt`.
- Configure PYTHONPATH in env - In the project root, i.e., the filmwise folder, run `export PYTHONPATH="$PYTHONPATH:$PWD"`. Then navigate to the src folder (`cd src`) and run the same command.
- Run main.py - From the src folder, run `python3 main.py`.