SCRAPING FROM AMAZON

This script collects the product name, the review title, the star rating, and the full review text from the review pages of a product on Amazon.com. The example here uses the reviews for AirPods 2.

Required libraries

requests
BeautifulSoup from bs4
pandas

After importing the libraries, we define an empty list that collects the reviews and is later converted to a pandas DataFrame.
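The setup described above can be sketched as follows. The list name `reviewlist` comes from this README; the placeholder row and its column names are assumptions used only to show the shape each entry is expected to have.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Empty list that will collect one dict per review.
reviewlist = []

# Placeholder row (not real data) illustrating the expected keys.
example_row = {
    "product": "Example product",
    "title": "Example title",
    "rating": 5.0,
    "body": "Example body",
}

# A list of dicts converts directly to a DataFrame, one column per key.
df = pd.DataFrame([example_row])
print(df.columns.tolist())
```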

We send a `User-Agent` header with each request so that Amazon is less likely to treat the script as a bot and block it from retrieving data.
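A minimal sketch of such a header is shown here. The `User-Agent` string is only an example value, not the one used in the notebook; any current browser string could take its place. The request is prepared but never sent, so the snippet only shows how the header gets attached.

```python
import requests

# Example browser-like User-Agent (assumed value; replace with your own).
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}

# Prepare (but do not send) a request to inspect the attached header.
prepared = requests.Request(
    "GET", "https://www.amazon.com/", headers=headers
).prepare()
print(prepared.headers["User-Agent"])
```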

We define a function, `get_soup`, that sends the request to the site. It passes the link of the product's review page to `requests.get` together with `headers`, then parses the response with `BeautifulSoup` using the `lxml` parser (the built-in `html.parser` also works).
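One way `get_soup` might look is sketched below. It is not the notebook's exact code: the `User-Agent` value, the `timeout`, and the `raise_for_status()` call are assumptions, and the sketch uses `html.parser` to avoid the extra `lxml` dependency.

```python
import requests
from bs4 import BeautifulSoup

# Example header (assumed value).
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def get_soup(url):
    """Download one page and return it as a parsed BeautifulSoup tree."""
    response = requests.get(url, headers=headers, timeout=30)
    # Fail loudly on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    # "lxml" can be passed instead of "html.parser" if it is installed.
    return BeautifulSoup(response.text, "html.parser")
```

Calling `get_soup` with the URL of a product's review page returns a tree that can be searched with `find` and `find_all`.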

Then the `get_reviews` function picks the product, title, rating, and body of each review out of the parsed page according to its HTML structure.
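A sketch of `get_reviews` under stated assumptions is given below. The `data-hook` attribute values, the page-title prefix, and the "out of 5 stars" rating text are assumptions about Amazon's markup, which changes over time, so check them against the live page before relying on them.

```python
from bs4 import BeautifulSoup

reviewlist = []


def get_reviews(soup):
    """Append one dict per review found in the parsed page to reviewlist."""
    # Assumed: the page title carries the product name after this prefix.
    page_title = soup.title.get_text(strip=True) if soup.title else ""
    product = page_title.replace("Amazon.com: Customer reviews:", "").strip()

    # Assumed selectors: each review sits in a div with data-hook="review".
    for item in soup.find_all("div", {"data-hook": "review"}):
        title = item.find("a", {"data-hook": "review-title"})
        rating = item.find("i", {"data-hook": "review-star-rating"})
        body = item.find("span", {"data-hook": "review-body"})
        if not (title and rating and body):
            continue  # skip reviews that do not match the assumed layout

        # Assumed rating text such as "5.0 out of 5 stars".
        stars = rating.get_text(strip=True).replace("out of 5 stars", "")
        reviewlist.append({
            "product": product,
            "title": title.get_text(strip=True),
            "rating": float(stars.strip()),
            "body": body.get_text(strip=True),
        })
```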

Finally, we append the product, title, rating, and body of each review to the empty list (`reviewlist`) defined at the beginning and save the result as an Excel file. A `for` loop over `range` sets in advance how many pages of reviews to fetch. The `if` check at the end of the loop stops it once the last page is reached, which avoids an error.
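The paging loop and the export step could be sketched like this. The helper names `scrape_pages` and `save_reviews`, the `{page}` URL placeholder, and the `a-disabled a-last` class used to detect the last page are all assumptions made for illustration. `fetch` and `parse` stand in for `get_soup` and `get_reviews`.

```python
import pandas as pd
from bs4 import BeautifulSoup


def scrape_pages(url_template, max_pages, fetch, parse):
    """Fetch pages 1..max_pages, stopping early on the last page."""
    for page in range(1, max_pages + 1):
        soup = fetch(url_template.format(page=page))
        parse(soup)
        # Assumed marker: a disabled "Next" button means there are
        # no further pages, so stop instead of requesting more.
        if soup.find("li", {"class": "a-disabled a-last"}):
            break


def save_reviews(reviews, path):
    """Convert the collected dicts to a DataFrame and write an Excel file."""
    df = pd.DataFrame(reviews)
    # Writing .xlsx files requires the openpyxl package.
    df.to_excel(path, index=False)
    return df
```

With the functions from this README, a call might look like `scrape_pages(url_template, 50, get_soup, get_reviews)` followed by `save_reviews(reviewlist, "reviews.xlsx")`, where `url_template` is the review-page URL with `{page}` in place of the page number.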

erenonal/scraping-amazon
