Web scraping practice project to recalculate IMDB Rankings to take into account the number of reviews and won Oscars as well.

kasztp/FairMDB

"Fair" IMDB Ranking

A web scraping practice project using BeautifulSoup, Requests, and CSV. It scrapes the Top 250 movies from IMDB, then recalculates their ranking based on the number of reviews and Oscars won.
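To make the idea of "recalculating" concrete, here is a minimal sketch of one way a rating could be adjusted by review count and Oscars won. The formula, the weights (`vote_weight`, `oscar_bonus`) and the `Movie` type are all assumptions invented for illustration; this README does not state the adjustment the project actually applies.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    rating: float  # IMDB rating on the 0-10 scale
    votes: int     # number of reviews/ratings the movie received
    oscars: int    # number of Oscars won


def fair_score(movie, max_votes, vote_weight=1.0, oscar_bonus=0.1):
    """Hypothetical adjustment, NOT the project's actual formula.

    Movies with fewer reviews than the most-reviewed movie lose up to
    `vote_weight` points; each Oscar adds `oscar_bonus` points.
    """
    vote_penalty = vote_weight * (1 - movie.votes / max_votes)
    return round(movie.rating - vote_penalty + oscar_bonus * movie.oscars, 2)


def rerank(movies):
    """Return the movies sorted by adjusted score, best first."""
    if not movies:
        return []
    max_votes = max(m.votes for m in movies)
    return sorted(movies, key=lambda m: fair_score(m, max_votes), reverse=True)
```

With this kind of adjustment, a highly rated movie with few reviews can drop below a slightly lower rated movie that has many reviews and several Oscars.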

Usage: Just run main.py

Optional arguments:

  -h, --help            Show this help message and exit.
  -o OUTPUT, --output OUTPUT
                        Output file name. Default: ./Fair_IMDB_Ratings_top20.csv
  -c CACHE, --cache CACHE
                        Cache file name. Default: ./cache.csv
                        (Use this if you have a previously scraped CSV, or want to recalculate rankings for movies other than the Top 250 list.)
  -w MAX_WORKERS, --max-workers MAX_WORKERS
                        Maximum number of workers to use for parallel scraping.
                        (Default is 4. 8-32 works best for performance; above 32 you'll get temporarily blocked by IMDB.)
  -m MOVIES, --movies MOVIES
                        Number of movies to scrape. (Default is 20, maximum is 250.)
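For readers who want to see how these options could be wired up, the following is a rough sketch using the standard `argparse` module. The flag names and defaults are taken from the help text above; the function names (`build_parser`, `parse_args`) and the range check on `--movies` are assumptions, and the real `main.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Recalculate IMDB rankings using review counts and Oscars won."
    )
    parser.add_argument("-o", "--output", default="./Fair_IMDB_Ratings_top20.csv",
                        help="Output file name.")
    parser.add_argument("-c", "--cache", default="./cache.csv",
                        help="Cache file name.")
    parser.add_argument("-w", "--max-workers", type=int, default=4,
                        help="Maximum number of workers for parallel scraping.")
    parser.add_argument("-m", "--movies", type=int, default=20,
                        help="Number of movies to scrape (maximum is 250).")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce the documented 250-movie maximum."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: the help text only says the maximum is 250.
    if not 1 <= args.movies <= 250:
        parser.error("--movies must be between 1 and 250")
    return args
```

For example, `parse_args(["-m", "50", "-w", "8"])` corresponds to running `main.py -m 50 -w 8`: it scrapes 50 movies with 8 workers and keeps the default output and cache file names.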

Todo:

  1. Add some UI / Data Visualization.
  2. Investigate further why multiprocessing performs better in this case than the more obvious choice, multithreading.
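One way to start on the second item is a small timing harness that runs the same workload under both executor types from `concurrent.futures`. The sketch below is an assumption about how such a comparison could look, not code from this project: `simulated_scrape` is a made-up stand-in that mixes a short wait (network) with pure-Python work (parsing). A possible explanation worth testing is that page parsing is CPU-bound and therefore limited by the GIL under threads, but this README does not confirm that.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulated_scrape(movie_id):
    """Hypothetical stand-in for fetching and parsing one movie page."""
    time.sleep(0.01)                      # pretend network wait (I/O-bound)
    _ = sum(i * i for i in range(20_000))  # pretend HTML parsing (CPU-bound)
    return movie_id


def run_pool(executor_cls, func, items, max_workers=4):
    """Run func over items with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(func, items))  # map preserves input order
    return results, time.perf_counter() - start


def compare(items, max_workers=4):
    """Time the same workload under threads and under processes."""
    timings = {}
    for name, cls in (("threads", ThreadPoolExecutor),
                      ("processes", ProcessPoolExecutor)):
        _, elapsed = run_pool(cls, simulated_scrape, list(items), max_workers)
        timings[name] = elapsed
    return timings
```

Call `compare(range(250), max_workers=8)` from under an `if __name__ == "__main__":` guard, since process pools need the main module to be safely importable on platforms that spawn worker processes. Replacing `simulated_scrape` with the real per-movie scraping function would show where the time actually goes.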

Done:

  1. Make it more general (e.g. a user-definable list of movies, or a bigger dataset).

