# Web Scraping Project

First project from the ITC program

Creators: Bar I. | Omer C. | Sahar G.

Main project purpose:
Create an easy-to-read database (db) of restaurants in chosen cities.
The db contains 5 tables: cities, restaurants, cuisines, reviews, awards.
See the ERD below for each table's contents.
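As a rough illustration only (the ERD in the repository is the authoritative schema, and the column names here are assumptions based on the fields mentioned in this README), the cities and restaurants tables could be created along these lines; cuisines, reviews, and awards follow the same pattern:

```python
# Illustrative sketch only: column names are assumptions based on the fields
# mentioned in this README; the ERD in the repository is the authoritative schema.
import mysql.connector  # pip install mysql-connector-python
from db_config import USERNAME, PASSWORD

CITIES = """
CREATE TABLE IF NOT EXISTS cities (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    num_restaurants INT,
    timezone VARCHAR(64),
    num_reviews INT,
    latitude FLOAT,
    longitude FLOAT
)"""

RESTAURANTS = """
CREATE TABLE IF NOT EXISTS restaurants (
    id INT AUTO_INCREMENT PRIMARY KEY,
    city_id INT,
    name VARCHAR(255),
    latitude FLOAT,
    longitude FLOAT,
    FOREIGN KEY (city_id) REFERENCES cities (id)
)"""

conn = mysql.connector.connect(host="localhost", user=USERNAME, password=PASSWORD,
                               database="tripadvisor")  # hypothetical db name
cur = conn.cursor()
for ddl in (CITIES, RESTAURANTS):
    cur.execute(ddl)
conn.commit()
conn.close()
```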

By running tripadvisor_scraper.py you can pass a list of cities and the number of pages to scrape per city (max 30 restaurants per page), and it will insert the desired data into the db tables.
The arguments of tripadvisor_scraper.py are as follows (a minimal sketch of the parser follows the list):

  • cities - names of the desired cities: `-c "city_1" "city_2"`
  • pages - number of restaurant pages to scrape per city: `-p num`
  • API - optional - perform scraping using the Travel Advisor API (RapidAPI): `--API`
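The following is a minimal sketch of how such a command line could be parsed with argparse; the flag names are taken from the list above, but the actual parser in tripadvisor_scraper.py may differ:

```python
import argparse

def parse_args():
    # Flag names follow the list above; help texts and defaults are illustrative only.
    parser = argparse.ArgumentParser(
        description="Scrape TripAdvisor restaurant data into the db")
    parser.add_argument("-c", "--cities", nargs="+", required=True,
                        help='names of the desired cities, e.g. -c "city_1" "city_2"')
    parser.add_argument("-p", "--pages", type=int, required=True,
                        help="number of restaurant pages to scrape per city "
                             "(max 30 restaurants per page)")
    parser.add_argument("--API", action="store_true",
                        help="perform scraping using the Travel Advisor API (RapidAPI)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.cities, args.pages, args.API)
```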

#### Initial Configuration

  • Make sure you have the Google Chrome browser installed (relevant for web scraping, not the API)
  • Install the requirements: `pip install -r requirements.txt`
  • Edit USERNAME and PASSWORD in db_config.py to match your local MySQL configuration
  • Edit HEADERS in config.py for the Travel Advisor API based on your personal account at https://rapidapi.com/apidojo/api/travel-advisor/ (see the illustrative sketch after this list)
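For orientation, the two config edits above could end up looking roughly like this; the header key names follow the usual RapidAPI convention, so check your account page for the exact values:

```python
# db_config.py -- local MySQL credentials
USERNAME = "root"            # replace with your MySQL user
PASSWORD = "your_password"   # replace with your MySQL password

# config.py -- HEADERS for the Travel Advisor API; the key names below follow
# the usual RapidAPI convention and may need adjusting to your account.
HEADERS = {
    "x-rapidapi-key": "<your RapidAPI key>",
    "x-rapidapi-host": "travel-advisor.p.rapidapi.com",
}
```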

Run: `tripadvisor_scraper.py -c "city_1" "city_2" ... -p num`

#### Data which can be retrieved only via the API

  • cities table - num_restaurants, timezone, num_reviews, latitude, longitude
  • restaurants table - latitude, longitude
  • awards table
  • reviews table - the API is limited to 3 reviews per restaurant; the web scraper is limited to 10
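For illustration, these API-only fields come from the Travel Advisor API on RapidAPI; a request might look like the sketch below (the endpoint path, parameter names, and response fields are assumptions based on the public RapidAPI listing, not code taken from this repository):

```python
import requests

# Hypothetical example: endpoint and parameters are assumptions based on the
# apidojo/travel-advisor listing on RapidAPI, not code from this repository.
HEADERS = {
    "x-rapidapi-key": "<your RapidAPI key>",
    "x-rapidapi-host": "travel-advisor.p.rapidapi.com",
}

def fetch_restaurants(location_id, limit=30):
    """Fetch restaurant records (including latitude/longitude) for a city."""
    url = "https://travel-advisor.p.rapidapi.com/restaurants/list"
    params = {"location_id": location_id, "limit": limit}
    response = requests.get(url, headers=HEADERS, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("data", [])

# Example usage with a placeholder location_id
for r in fetch_restaurants(location_id="293984", limit=5):
    print(r.get("name"), r.get("latitude"), r.get("longitude"))
```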
