Educational & developer-friendly Flask + NLP app that aggregates movie reviews from Letterboxd, Rotten Tomatoes, OMDb, and YouTube trailer comments to perform sentiment analysis, keyword extraction, genre visualization, and interactive Plotly dashboards.
This project was originally built for a college assignment and has been upgraded to a structured, extensible, and production-style educational repository. It demonstrates:
- Web scraping (ethical & rate-conscious patterns) for movie reviews
- Sentiment analysis using TextBlob, NLTK, optional profanity filtering, and basic keyword extraction
- Visualization with Plotly (distribution, polarity, keywords, genre breakdown)
- Comparison of audience mood before release (YouTube trailer comments) vs after release (user reviews)
- Clean project structure, Docker build, CI pipeline, tests, and contribution standards
β οΈ Disclaimer: This repository is for learning & research. Always review platform Terms of Service before scraping and avoid abusive traffic patterns.
- π Enter a movie title β automatic metadata & poster retrieval via OMDb
- π Aggregated reviews (Letterboxd + Rotten Tomatoes users)
βΆοΈ Pre-release sentiment via YouTube trailer comments- π Interactive charts: pie, bar, polarity histogram, genre distribution
- π§ Keyword extraction for positive & negative reviews
- βοΈ Word cloud generation
- π§Ό Profanity filtering & basic NLP cleanup (spaCy + NLTK)
- π³ Dockerized for reproducibility
- β CI: linting, tests, security scanning
Component | Purpose |
---|---|
app.py |
Flask web server, routing & orchestration |
imdb_scraper.py |
Fetches metadata (OMDb) + Letterboxd + Rotten Tomatoes + similar movies |
youtube_scraper.py |
Trailer search + comment harvesting (YouTube Data API) |
sentiment_analysis.py |
Sentiment classification, keyword extraction, word cloud, plots |
templates/ |
Jinja2 HTML templates for UI |
static/ |
CSS & generated plot assets |
config.py |
Central environment & app configuration |
tests/ |
Pytest-based unit tests |
- Python 3.11
- Flask, Plotly, TextBlob, NLTK, spaCy, WordCloud
- BeautifulSoup4, Requests
- Google API Client (YouTube Data API)
- Docker, GitHub Actions, Ruff, Black, Pytest
git clone https://github.com/jambhaleAnuj/senti_analysis_1.git
cd senti_analysis_1
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txt
pip install -r requirements-dev.txt # optional for contributors
Copy .env.example
β .env
and fill in:
OMDB_API_KEY=your_key
YOUTUBE_API=your_key
GEMMA_API_KEY = your_key
python app.py
Open: http://127.0.0.1:5000
docker build -t movie-sentiment .
docker run --rm -p 5000:5000 --env-file .env movie-sentiment
pytest -q
ruff check .
black .
Relevant keywords included: Movie Sentiment Analysis, Flask NLP, YouTube Comments Sentiment, Rotten Tomatoes reviews, Letterboxd scraping, Plotly dashboard, TextBlob sentiment, Python movie analytics, genre visualization, pre-release vs post-release audience mood.
Variable | Purpose | Required |
---|---|---|
OMDB_API_KEY |
Movie metadata & posters | Yes (fallback demo key used if absent) |
YOUTUBE_API |
Trailer comments sentiment | Optional (feature disabled if missing) |
GEMMA_API_KEY |
Trailer comments sentiment | Optional (feature disabled if missing) |
FLASK_DEBUG |
Enable debug mode | No |
Sentiment (TextBlob polarity): >0
positive, =0
neutral, <0
negative.
Keywords: filtered tokens (stopwords removed, alphanumeric, length>2) frequency top 10 per class.
Limitations: sarcasm, slang, named entities may skew results. Consider VADER / transformers for production.
- Caching (Redis) for repeated titles
- Transformer-based sentiment option
- Better keyword extraction (bigrams, POS filtering)
- PDF export of analysis
- Public JSON API endpoints
See CONTRIBUTING.md. PRs welcome.
No real keys committed. See SECURITY.md.