Class Assign

The document outlines a multi-part project focused on web scraping, including ethical considerations, HTML basics, and practical scraping techniques using Python. It covers tasks such as creating an HTML page, scraping static pages, integrating public APIs, and advanced scraping challenges. The project emphasizes responsible scraping practices, data handling, and visualization skills.

Uploaded by caixuanhoa2004

Part 1: Foundations and Ethics

Task 1.1: Ethics and Legal Research

Write a 500-word report addressing:

• What is web scraping and when is it appropriate?
• Explain robots.txt files and how to check them
• Discuss the legal and ethical considerations
• Provide 3 real-world examples of responsible web scraping
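
For the robots.txt bullet, one way to check rules programmatically is Python's standard urllib.robotparser module. The sketch below is only an illustration: the robots.txt content, the bot name my-class-bot, and the example.com URLs are made up so the code runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-class-bot" and the example.com URLs are placeholders.
public_allowed = parser.can_fetch("my-class-bot", "https://example.com/catalogue/page-1.html")
private_allowed = parser.can_fetch("my-class-bot", "https://example.com/private/data.html")
crawl_delay = parser.crawl_delay("my-class-bot")

print(public_allowed, private_allowed, crawl_delay)
```

For a live site, the same parser can load the file itself: call set_url() with the site's /robots.txt address, then read(), before asking can_fetch().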

Task 1.2: Basic HTML Understanding (15 points)

Create a simple HTML page with:

• A table containing at least 10 rows of sample data (books, movies, products, etc.)
• Use proper HTML tags: <table>, <tr>, <td>, <th>
• Include attributes like class and id
• Add some basic CSS styling
• Practice using browser developer tools to inspect elements

Deliverable: HTML file and screenshot of developer tools inspection

Part 2: Basic Scraping Techniques

Task 2.1: Static Page Scraping

Using Python and BeautifulSoup, scrape the HTML page you created in Task 1.2:

python
# Required libraries: requests, beautifulsoup4, pandas
# Your code should:
# 1. Load the HTML file
# 2. Parse it with BeautifulSoup
# 3. Extract all table data
# 4. Save to CSV format

Requirements:

• Proper error handling
• Clean, commented code
• Output data to CSV file
• Print summary statistics (number of rows extracted)
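
One possible shape for this task is sketched below, not a model answer. It assumes the page contains a single table whose first row is the header, uses the standard csv module instead of pandas to keep the example short, and the file names books.html and books.csv are placeholders.

```python
import csv
from pathlib import Path

from bs4 import BeautifulSoup  # pip install beautifulsoup4


def extract_table_rows(html: str) -> list[list[str]]:
    """Return every row of the first table as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the HTML")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


def save_rows_to_csv(rows: list[list[str]], csv_path: str) -> None:
    """Write the rows (header included) to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def scrape_file(html_path: str, csv_path: str) -> int:
    """Load the HTML file, extract the table, save it, return the data-row count."""
    html = Path(html_path).read_text(encoding="utf-8")
    rows = extract_table_rows(html)
    save_rows_to_csv(rows, csv_path)
    return max(len(rows) - 1, 0)  # assumes the first row is the header


if __name__ == "__main__":
    try:
        count = scrape_file("books.html", "books.csv")  # placeholder file names
    except (OSError, ValueError) as error:
        print(f"Scraping failed: {error}")
    else:
        print(f"Extracted {count} data rows")
```

Keeping extraction and saving in separate functions makes each step easy to test with a small HTML string before running it on the real file.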

Task 2.2: Public API Integration (15 points)

Choose one of these free APIs and create a data collection script:

• JSONPlaceholder (fake data for testing)
• OpenWeatherMap (weather data)
• REST Countries (country information)
• Cat Facts API

Requirements:

• Make at least 10 API calls
• Handle API rate limits appropriately
• Save data in both JSON and CSV formats
• Include error handling for failed requests
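
The rate-limit and output requirements could be approached along these lines. This is a sketch under assumptions: the RateLimited exception, the function names, and the exponential backoff policy are illustrative choices, and records are assumed to be flat dictionaries. The fetch function is passed in by the caller so the retry logic can be tried without touching a real API.

```python
import csv
import json
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical signal: the caller's fetch function raises this on HTTP 429."""


def call_with_backoff(fetch: Callable[[], dict], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(); on RateLimited wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


def save_records(records: list[dict], json_path: str, csv_path: str) -> None:
    """Write the same flat records to a JSON file and to a CSV file."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    # Use the union of all keys so records with missing fields still fit.
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

With the requests library, the fetch function might call requests.get(url, timeout=10), raise RateLimited when response.status_code is 429, call response.raise_for_status() for other failures, and return response.json().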

Part 3: Intermediate Scraping

Task 3.1: Real Website Scraping

Choose ONE of these beginner-friendly websites:

• Books.toscrape.com (practice scraping site)
• Quotes.toscrape.com (quotes collection)
• Scrape.center (designed for learning)

Scraping Requirements:

• Extract at least 50 items
• Collect minimum 4 attributes per item
• Implement respectful delays (1-2 seconds between requests)
• Handle pagination if applicable
• Check and respect robots.txt
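
The pagination and delay requirements can be combined in a single loop. The sketch below is one possible structure rather than a prescribed design: crawl_pages and its parameters are hypothetical names, and the caller supplies fetch (URL to HTML) and parse (HTML to items plus the next-page URL), so the loop is independent of any particular site.

```python
import random
import time
from typing import Callable, Optional

# (items found on the page, URL of the next page or None when there is none)
ParseResult = tuple[list[dict], Optional[str]]


def crawl_pages(start_url: str,
                fetch: Callable[[str], str],
                parse: Callable[[str], ParseResult],
                min_items: int = 50,
                delay_range: tuple[float, float] = (1.0, 2.0),
                sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Follow next-page links until min_items are collected or pages run out."""
    items: list[dict] = []
    url: Optional[str] = start_url
    visited: set[str] = set()  # guards against pagination loops
    while url and url not in visited and len(items) < min_items:
        if visited:  # pause before every request except the first
            sleep(random.uniform(*delay_range))
        visited.add(url)
        page_items, url = parse(fetch(url))
        items.extend(page_items)
    return items
```

A robots.txt check fits naturally inside the fetch function: refuse to download any URL the site's rules disallow.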

Data Processing:

• Clean and validate the extracted data
• Handle missing values appropriately
• Create basic visualizations using matplotlib or seaborn
• Generate a summary report of your findings
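
The cleaning and missing-value steps might look like the sketch below. It assumes records shaped like Books.toscrape.com listings (a title, a price such as "£51.77", a rating written as a word, and an availability string); the field names and the choice to represent missing values as None are illustrative, not required by the assignment.

```python
import re
from collections import Counter
from statistics import mean
from typing import Optional

RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(raw: Optional[str]) -> Optional[float]:
    """Pull a number such as 51.77 out of text like '£51.77'; None if absent.

    Thousands separators (for example '1,234.50') are not handled.
    """
    if not raw:
        return None
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


def clean_record(raw: dict) -> dict:
    """Normalise one scraped record; missing or invalid fields become None."""
    title = (raw.get("title") or "").strip()
    rating_word = (raw.get("rating") or "").strip().title()
    availability = (raw.get("availability") or "").strip().lower()
    return {
        "title": title or None,
        "price": parse_price(raw.get("price")),
        "rating": RATING_WORDS.get(rating_word),
        "in_stock": ("in stock" in availability) if availability else None,
    }


def summarise(records: list[dict]) -> dict:
    """Compute a few figures for the summary report from cleaned records."""
    prices = [r["price"] for r in records if r["price"] is not None]
    ratings = Counter(r["rating"] for r in records if r["rating"] is not None)
    return {
        "rows": len(records),
        "missing_price": len(records) - len(prices),
        "mean_price": round(mean(prices), 2) if prices else None,
        "ratings": dict(ratings),
    }
```

The ratings count returned by summarise() is already in a convenient shape for a matplotlib bar chart, with rating values on the x-axis and counts on the y-axis.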

Task 3.2: Advanced Challenges

Implement TWO of the following features:

• User-Agent rotation: Use different user agents for requests
• Session handling: Maintain cookies across requests
• Data validation: Implement schema validation for scraped data
• Duplicate detection: Identify and handle duplicate entries
• Concurrent scraping: Use threading for faster collection (with care)
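
As an illustration of two of these features, the sketch below pairs User-Agent rotation with duplicate detection. The User-Agent strings, function names, and the rule for treating two records as duplicates (same key fields after ignoring case and extra whitespace) are all assumptions made for the example.

```python
from itertools import cycle
from typing import Iterator

# Hypothetical User-Agent strings; use values that identify your scraper honestly.
USER_AGENTS = [
    "ClassScraper/1.0 (student project)",
    "ClassScraper/1.0 (student project; variant B)",
]


def rotating_headers(user_agents: list[str]) -> Iterator[dict]:
    """Yield request headers forever, cycling through user_agents in order."""
    for agent in cycle(user_agents):
        yield {"User-Agent": agent}


def drop_duplicates(records: list[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the first record for each key, ignoring case and extra whitespace."""
    seen: set[tuple] = set()
    unique = []
    for record in records:
        key = tuple(
            " ".join(str(record.get(field, "")).split()).lower()
            for field in key_fields
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

With the requests library, each call can take the next set of headers, for example requests.get(url, headers=next(headers), timeout=10), where headers = rotating_headers(USER_AGENTS).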
