Web Scraping with Python

This repository was created for Project 1 of the course Collection, Preparation, and Data Analysis at the Pontifical Catholic University of Rio Grande do Sul.

It consists of two web scraping activities: one in a desktop application (the 'paises.ipynb' notebook) and one in a real environment (the 'imdb.ipynb' notebook).

Instructions

Both parts of this project use Beautiful Soup to parse the HTML files.
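
The parsing step can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the inline markup, the class names (`entry`, `title`, `value`), and the `parse_entries` helper are all hypothetical stand-ins for whatever the real pages contain.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real pages used in this project will differ.
SAMPLE_HTML = """
<div class="entry"><h3 class="title">Example A</h3><span class="value">10</span></div>
<div class="entry"><h3 class="title">Example B</h3><span class="value">20</span></div>
"""


def parse_entries(html):
    """Return one dict per <div class="entry"> block found in the HTML."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for block in soup.find_all("div", class_="entry"):
        entries.append({
            "title": block.find("h3", class_="title").get_text(strip=True),
            "value": block.find("span", class_="value").get_text(strip=True),
        })
    return entries


entries = parse_entries(SAMPLE_HTML)
print(entries)
```

The same function works on a page read from disk or on `driver.page_source` from Selenium, since Beautiful Soup only needs the HTML as a string.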

For the IMDb notebook:

This notebook runs a web scraping routine against the IMDb movie review website. It relies on the Selenium package for Python, which must be installed in your notebook kernel. On some operating systems the webdriver cannot run Chrome, so you may need to switch it to Firefox.
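
One way to handle the Chrome/Firefox difference without editing the notebook by hand is to try each browser in turn. The sketch below is an assumption about how that could look, not the notebook's own code: `first_working_driver`, `start_chrome`, and `start_firefox` are hypothetical helper names, and only `webdriver.Chrome()` and `webdriver.Firefox()` come from Selenium itself.

```python
def first_working_driver(factories):
    """Call each zero-argument factory in order; return the first driver that starts."""
    errors = []
    for factory in factories:
        try:
            return factory()
        except Exception as exc:  # startup failures vary by OS and browser
            name = getattr(factory, "__name__", repr(factory))
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No browser could be started:\n" + "\n".join(errors))


def start_chrome():
    # Imported inside the function so the helper above can be defined
    # and tested even where Selenium is not installed.
    from selenium import webdriver
    return webdriver.Chrome()


def start_firefox():
    from selenium import webdriver
    return webdriver.Firefox()


# Usage (requires Selenium and at least one of the two browsers):
# driver = first_working_driver([start_chrome, start_firefox])
# driver.get("https://www.imdb.com/")
# driver.quit()
```

Listing Chrome first keeps the default behaviour described above, with Firefox used only when Chrome fails to start.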

Part of the code locates elements on the page by matching text strings, so the website must be displayed in English.
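
To illustrate why the page language matters, here is a small sketch of text-based lookup with Beautiful Soup. The markup, the "Director" label, the placeholder name, and the `value_after_label` helper are hypothetical and chosen only to show the effect; the notebook may match different strings.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the same field rendered in English and in Portuguese.
PAGE_EN = "<li><span>Director</span><a>Jane Doe</a></li>"
PAGE_PT = "<li><span>Direção</span><a>Jane Doe</a></li>"


def value_after_label(html, label):
    """Find a text node equal to `label` and return the text of the next <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(rf"^\s*{re.escape(label)}\s*$")
    label_node = soup.find(string=pattern)
    if label_node is None:
        return None  # label not present, e.g. the page is in another language
    link = label_node.find_next("a")
    return link.get_text(strip=True) if link else None


print(value_after_label(PAGE_EN, "Director"))
print(value_after_label(PAGE_PT, "Director"))
```

The English page yields the linked value, while the Portuguese page returns `None` because the label text no longer matches.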

