# Web Scraping Project

First project from the ITC program

Creators: Bar I. | Omer C. | Sahar G.

Main project purpose:
Create an easy-to-read database (db) of restaurants in chosen cities.
The db contains 5 tables: cities, restaurants, cuisines, reviews, awards.
See the ERD below for each table's contents.
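As a rough illustration only (the ERD in the repository is the authoritative schema, and the column names here are assumptions based on the fields mentioned in this README), the cities and restaurants tables could be created along these lines; cuisines, reviews, and awards follow the same pattern:

```python
# Illustrative sketch only: column names are assumptions based on the fields
# mentioned in this README; the ERD in the repository is the authoritative schema.
import mysql.connector  # pip install mysql-connector-python
from db_config import USERNAME, PASSWORD

CITIES = """
CREATE TABLE IF NOT EXISTS cities (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    num_restaurants INT,
    timezone VARCHAR(64),
    num_reviews INT,
    latitude FLOAT,
    longitude FLOAT
)"""

RESTAURANTS = """
CREATE TABLE IF NOT EXISTS restaurants (
    id INT AUTO_INCREMENT PRIMARY KEY,
    city_id INT,
    name VARCHAR(255),
    latitude FLOAT,
    longitude FLOAT,
    FOREIGN KEY (city_id) REFERENCES cities (id)
)"""

conn = mysql.connector.connect(host="localhost", user=USERNAME, password=PASSWORD,
                               database="tripadvisor")  # hypothetical db name
cur = conn.cursor()
for ddl in (CITIES, RESTAURANTS):
    cur.execute(ddl)
conn.commit()
conn.close()
```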

By running tripadvisor_scraper.py you can pass a list of cities and the number of pages to scrape per city (max 30 restaurants per page), and it will insert the desired data into the db tables.
The arguments of tripadvisor_scraper.py are as follows (a minimal sketch of the parser follows the list):

  • cities - names of the desired cities: `-c "city_1" "city_2"`
  • pages - number of restaurant pages to scrape per city: `-p num`
  • API - optional - perform scraping using the Travel Advisor API (RapidAPI): `--API`
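The following is a minimal sketch of how such a command line could be parsed with argparse; the flag names are taken from the list above, but the actual parser in tripadvisor_scraper.py may differ:

```python
import argparse

def parse_args():
    # Flag names follow the list above; help texts and defaults are illustrative only.
    parser = argparse.ArgumentParser(
        description="Scrape TripAdvisor restaurant data into the db")
    parser.add_argument("-c", "--cities", nargs="+", required=True,
                        help='names of the desired cities, e.g. -c "city_1" "city_2"')
    parser.add_argument("-p", "--pages", type=int, required=True,
                        help="number of restaurant pages to scrape per city "
                             "(max 30 restaurants per page)")
    parser.add_argument("--API", action="store_true",
                        help="perform scraping using the Travel Advisor API (RapidAPI)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.cities, args.pages, args.API)
```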

#### Initial Configuration

  • Make sure you have the Google Chrome browser installed (relevant for web scraping, not the API)
  • Install the requirements: `pip install -r requirements.txt`
  • Edit USERNAME and PASSWORD in db_config.py to match your local MySQL configuration
  • Edit HEADERS in config.py for the Travel Advisor API based on your personal account at https://rapidapi.com/apidojo/api/travel-advisor/ (see the illustrative sketch after this list)
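For orientation, the two config edits above could end up looking roughly like this; the header key names follow the usual RapidAPI convention, so check your account page for the exact values:

```python
# db_config.py -- local MySQL credentials
USERNAME = "root"            # replace with your MySQL user
PASSWORD = "your_password"   # replace with your MySQL password

# config.py -- HEADERS for the Travel Advisor API; the key names below follow
# the usual RapidAPI convention and may need adjusting to your account.
HEADERS = {
    "x-rapidapi-key": "<your RapidAPI key>",
    "x-rapidapi-host": "travel-advisor.p.rapidapi.com",
}
```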

Run: `tripadvisor_scraper.py -c "city_1" "city_2" ... -p num`

#### Data which can be retrieved only via the API

  • cities table - num_restaurants, timezone, num_reviews, latitude, longitude
  • restaurants table - latitude, longitude
  • awards table
  • reviews table - the API is limited to 3 reviews per restaurant; the web scraper is limited to 10
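For illustration, these API-only fields come from the Travel Advisor API on RapidAPI; a request might look like the sketch below (the endpoint path, parameter names, and response fields are assumptions based on the public RapidAPI listing, not code taken from this repository):

```python
import requests

# Hypothetical example: endpoint and parameters are assumptions based on the
# apidojo/travel-advisor listing on RapidAPI, not code from this repository.
HEADERS = {
    "x-rapidapi-key": "<your RapidAPI key>",
    "x-rapidapi-host": "travel-advisor.p.rapidapi.com",
}

def fetch_restaurants(location_id, limit=30):
    """Fetch restaurant records (including latitude/longitude) for a city."""
    url = "https://travel-advisor.p.rapidapi.com/restaurants/list"
    params = {"location_id": location_id, "limit": limit}
    response = requests.get(url, headers=HEADERS, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("data", [])

# Example usage with a placeholder location_id
for r in fetch_restaurants(location_id="293984", limit=5):
    print(r.get("name"), r.get("latitude"), r.get("longitude"))
```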
