"Fair" IMDB Ranking

A web scraping practice project built with BeautifulSoup, Requests, and CSV. It scrapes the Top 250 Movies from IMDB, then recalculates their ranking based on the number of reviews and Oscars won.

Usage: Just run main.py

Optional arguments:

  -h, --help            Show this help message and exit.
  -o OUTPUT, --output OUTPUT
                        Output file name. Default: ./Fair_IMDB_Ratings_top20.csv
  -c CACHE, --cache CACHE
                        Cache file name. Default: ./cache.csv
                        (Use this if you have a previously scraped CSV, or want to recalculate rankings for movies other than the Top 250 list.)
  -w MAX_WORKERS, --max-workers MAX_WORKERS
                        Maximum number of workers to use for parallel scraping.
                        (Default is 4. 8-32 works best for performance; above 32, IMDB will temporarily block you.)
  -m MOVIES, --movies MOVIES
                        Number of movies to scrape. (Default is 20, maximum is 250.)
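
The options above map naturally onto an argparse parser. The following is a minimal sketch rather than the project's actual main.py: the function name build_parser is hypothetical, and only the flags and defaults are taken from the help text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='"Fair" IMDB Ranking')
    parser.add_argument(
        "-o", "--output",
        default="./Fair_IMDB_Ratings_top20.csv",
        help="Output file name.",
    )
    parser.add_argument(
        "-c", "--cache",
        default="./cache.csv",
        help="Cache file name.",
    )
    parser.add_argument(
        "-w", "--max-workers",
        type=int,
        default=4,
        help="Maximum number of workers to use for parallel scraping.",
    )
    parser.add_argument(
        "-m", "--movies",
        type=int,
        default=20,
        help="Number of movies to scrape.",
    )
    return parser


# Equivalent to running: python main.py --movies 50 --max-workers 8
args = build_parser().parse_args(["--movies", "50", "--max-workers", "8"])
print(args.movies, args.max_workers, args.output)
```

Note that argparse stores --max-workers under the attribute name max_workers, replacing the hyphen with an underscore.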

Todo:

Add some UI / Data Visualization.
Investigate further why multiprocessing outperforms the more obvious multithreading in this case.
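
One way to start on the multiprocessing question is to time the same workload under a thread pool and a process pool. The sketch below is not taken from the project: parse_page is a hypothetical CPU-bound stand-in for the HTML parsing step, and the idea that parsing (rather than network I/O) dominates the runtime is only a hypothesis to test.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parse_page(n: int) -> int:
    """Hypothetical stand-in for CPU-bound page parsing."""
    return sum(i * i for i in range(n))


def timed_run(executor_cls, max_workers: int, jobs: list) -> tuple:
    """Run parse_page over jobs with the given executor class.

    Returns (results, elapsed_seconds).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(parse_page, jobs))
    return results, time.perf_counter() - start


def compare(max_workers: int = 4, jobs: list = None) -> None:
    """Print the elapsed time for threads and for processes."""
    jobs = jobs if jobs is not None else [200_000] * 16
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = timed_run(executor_cls, max_workers, jobs)
        print(f"{executor_cls.__name__}: {elapsed:.2f}s")
```

Call compare() from under an `if __name__ == "__main__":` guard, which ProcessPoolExecutor requires on platforms that spawn worker processes. If the process pool is clearly faster for this stand-in but not for a pure network-wait workload, that would point to CPU-bound parsing contending for the interpreter lock in the threaded version.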

Done:

Make it more general (e.g. a user-definable list of movies, or a bigger dataset).

Repository: kasztp/FairMDB
