DAI 101 Tutorial 3 (Web Scraping)
Q.1 Consider the following code snippet:
What will links contain?
(a) All <a> tags in the HTML
(b) All <a> tags with an href attribute
(c) All href attributes in the HTML
(d) The first <a> tag with an href attribute
Answer: (b) All <a> tags with an href attribute
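The original snippet is not reproduced here; a minimal sketch consistent with answer (b), assuming links was built with find_all and href=True, would be:

    from bs4 import BeautifulSoup

    html = '<a href="https://example.com">Has href</a> <a>No href</a>'
    soup = BeautifulSoup(html, 'html.parser')

    # href=True keeps only <a> tags that actually carry an href attribute
    links = soup.find_all('a', href=True)
    print(links)  # [<a href="https://example.com">Has href</a>]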
Q.2 Using BeautifulSoup, what code would extract a list of all fruit names?
(a) soup.find_all('li')
(b) [item.text for item in soup.find_all('li', class_='item')]
(c) soup.find('ul').get_text()
(d) [item for item in soup.select('.fruits .item')]
Answer: (b) [item.text for item in soup.find_all('li', class_='item')]
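A minimal runnable sketch of option (b), assuming the page marks each fruit with <li class="item"> inside <ul class="fruits">:

    from bs4 import BeautifulSoup

    html = """
    <ul class="fruits">
      <li class="item">Apple</li>
      <li class="item">Banana</li>
      <li class="item">Cherry</li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    # Collect the text of every <li class="item"> element
    fruit_names = [item.text for item in soup.find_all('li', class_='item')]
    print(fruit_names)  # ['Apple', 'Banana', 'Cherry']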
Q.3 You are given the task of extracting and printing the href attributes of all anchor tags (<a>)
on a webpage using BeautifulSoup and requests. Which code snippet achieves this
correctly?
(a) links_a = [a['href'] for a in soup.find_all('a')]
(b) links_b = [a.get('href') for a in soup.find_all('a', href=True)]
(c) for a in soup.select('a'): print(a['href'])
(d) links_d = (a.get_attribute('href') for a in soup.find('a'))
Answer: (b) links_b = [a.get('href') for a in soup.find_all('a', href=True)]
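Option (b) embedded in a complete script (example.com stands in for the real page):

    import requests
    from bs4 import BeautifulSoup

    response = requests.get('https://example.com')
    soup = BeautifulSoup(response.content, 'html.parser')

    # href=True skips anchors without an href, so get('href') never returns None here
    links_b = [a.get('href') for a in soup.find_all('a', href=True)]
    for href in links_b:
        print(href)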
Q.4 What is the output of the following BeautifulSoup code snippet?
(a) FirstSecond
(b) First
(c) Second
(d) <span>First</span>
Answer: (b) First
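The original snippet is not shown; one snippet that would print this output, assuming find() is used on HTML containing two <span> tags, is:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div><span>First</span><span>Second</span></div>', 'html.parser')

    # find() returns only the first matching tag, so its text is 'First'
    print(soup.find('span').text)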
Q.5 Which of the following is the correct way to find a tag by its ID using BeautifulSoup?
(a) soup.find(id="example")
(b) soup.find("#example")
(c) soup.select("#example")
(d) Both b and c
(e) Both a and c
Answer: (e) Both a and c
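A short sketch showing both valid forms on a made-up id:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div id="example">Hello</div>', 'html.parser')

    by_find = soup.find(id="example")       # keyword argument filters on the id attribute
    by_select = soup.select("#example")     # CSS id selector; select() returns a list
    print(by_find.text, by_select[0].text)  # Hello Hello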
Q.6 Given the HTML snippet below, which BeautifulSoup command would correctly count
and return the number of <li> elements within the <ul class="fruits"> section?
(a) count = len(soup.find_all('li', class_='fruits'))
(b) count = len(soup.find('ul').find_all('li'))
(c) count = len(soup.find_all('li', parent='fruits'))
(d) count = soup.count('li')
Answer: (b) count = len(soup.find('ul').find_all('li'))
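Since the HTML snippet is not reproduced here, a sketch with a plausible <ul class="fruits"> block:

    from bs4 import BeautifulSoup

    html = """
    <ul class="fruits">
      <li>Apple</li>
      <li>Banana</li>
      <li>Cherry</li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    # find('ul') grabs the list, then find_all('li') returns its items
    count = len(soup.find('ul').find_all('li'))
    print(count)  # 3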
Q.7 What does the following code snippet do?
(a) Sends a POST request to example.com
(b) Creates a BeautifulSoup object from the HTML content of example.com
(c) Parses XML content from example.com
(d) Prints the HTML content of example.com
Answer: (b)
Explanation: This code snippet sends a GET request to example.com using requests.get(), then
creates a BeautifulSoup object from the response content using the HTML parser.
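A sketch of the kind of snippet the explanation describes:

    import requests
    from bs4 import BeautifulSoup

    # GET request, then parse the returned HTML into a navigable BeautifulSoup tree
    response = requests.get('https://example.com')
    soup = BeautifulSoup(response.content, 'html.parser')
    print(type(soup))  # <class 'bs4.BeautifulSoup'>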
Q.8 Which line of code correctly gets the first item in items and makes the most sense
following the code snippet below?
(a) first_item = items[0]
(b) first_item = items.get(0)
(c) first_item = items.find[0]
(d) first_item = soup.items[0]
Answer: (a) first_item = items[0]
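The preceding snippet is not reproduced here; assuming items came from find_all(), which returns a plain Python list, the answer works like this:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<ul><li class="item">One</li><li class="item">Two</li></ul>', 'html.parser')

    items = soup.find_all('li', class_='item')  # a list of matching tags
    first_item = items[0]                       # ordinary list indexing
    print(first_item.text)  # One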
Q.9 What does this code do?
a) Removes all JavaScript and CSS from the page
b) Executes all JavaScript and applies all CSS styles
c) Finds all script and style tags
d) Decompresses all script and style content
Answer: a) Removes all JavaScript and CSS from the page
Explanation: This code iterates over all <script> and <style> tags in the HTML and removes them from the parse tree (typically via decompose() or extract()).
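A sketch of such a cleanup loop, assuming decompose() is the removal method used:

    from bs4 import BeautifulSoup

    html = ('<html><head><style>body {color: red;}</style></head>'
            '<body><script>alert("hi");</script><p>Keep me</p></body></html>')
    soup = BeautifulSoup(html, 'html.parser')

    # soup(['script', 'style']) is shorthand for find_all(['script', 'style']);
    # decompose() deletes each matched tag and its contents from the parse tree
    for tag in soup(['script', 'style']):
        tag.decompose()
    print(soup.get_text())  # Keep me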
Q.10 Which code snippet selects all elements with the class item?
(a) items = soup.find_all('item')
(b) items = soup.select('.item')
(c) items = soup.find_all(class_='item')
(d) B and C
Answer: (d) B and C
Explanation: find_all(class_='item') and select('.item') are both valid ways to find all elements
with the class item. The first method uses find_all with the class_ argument, and the second uses
a CSS selector.
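Both forms side by side on a small document:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p class="item">a</p><p class="item">b</p><p class="other">c</p>', 'html.parser')

    by_css = soup.select('.item')            # CSS class selector
    by_kwarg = soup.find_all(class_='item')  # class_ avoids clashing with the Python keyword
    print(len(by_css), len(by_kwarg))        # 2 2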
Q.11 What does the following block of code do?
(a) It retrieves 'cover3.jpg' and saves it to your computer.
(b) It displays the image 'cover3.jpg'.
(c) It retrieves the URL to download 'cover3.jpg'
(d) None of the above
Answer: (a)
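The original block is not reproduced here; a sketch that does the same job with requests (the image URL is hypothetical, and the original may instead have used urllib.request.urlretrieve):

    import requests

    image_url = 'https://example.com/cover3.jpg'  # hypothetical URL
    response = requests.get(image_url)

    # Write the raw bytes to disk, saving the image locally
    with open('cover3.jpg', 'wb') as f:
        f.write(response.content)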
Q.12 What does the line soup.find('ul') do in the code?
(a) It finds the first unordered list in the HTML page
(b) It finds all unordered lists in the HTML page
(c) It finds the first hyperlink in the HTML page
(d) It finds all hyperlinks in the HTML page
Answer: (a) It finds the first unordered list in the HTML page
Q.13 In the context of the provided code in Q.12, what does the li.a statement retrieve?
(a) All text content within a list item
(b) The first link within each list item
(c) All list items in the unordered list
(d) The last list item in the unordered list
Answer: (b) The first link within each list item
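Since the Q.12 snippet is not reproduced here, a sketch consistent with both questions, assuming each list item wraps a single link:

    from bs4 import BeautifulSoup

    html = """
    <ul>
      <li><a href="/page1">Page 1</a></li>
      <li><a href="/page2">Page 2</a></li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    ul = soup.find('ul')          # first unordered list in the document
    for li in ul.find_all('li'):
        print(li.a.get('href'))   # li.a is the first <a> inside each list item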
Q.14 How can you convert a BeautifulSoup object back to a string?
(a) str(soup)
(b) soup.to_string()
(c) soup.prettify()
(d) Both a and c
(e) Both b and c
Answer: (d) Both a and c
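A quick comparison of the two forms:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p>Hi</p>', 'html.parser')

    plain = str(soup)         # compact string of the markup
    pretty = soup.prettify()  # string with one tag per line, indented
    print(plain)
    print(pretty)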
Q.15 What's the output of the following code?
a) The full HTML content of the page
b) The title tag of the page
c) The text content of the title tag
d) None
Answer: c) The text content of the title tag
Explanation: soup.title returns the <title> tag, and .string extracts the string content from that tag.
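A sketch of the snippet the explanation implies:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<html><head><title>My Page</title></head></html>', 'html.parser')
    print(soup.title.string)  # My Page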
Q.16 What does the following block of code print?
(a) retrieves and displays the webpage
(b) downloads the webpage
(c) prints the images from 'www.nytimes.com'
(d) prints all the 'img' sources under 'src' from 'www.nytimes.com'
Answer: (d)
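The original block is not reproduced here; a sketch consistent with answer (d):

    import requests
    from bs4 import BeautifulSoup

    response = requests.get('https://www.nytimes.com')
    soup = BeautifulSoup(response.content, 'html.parser')

    # Print the src attribute of every <img> tag that has one
    for img in soup.find_all('img', src=True):
        print(img.get('src'))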
Q.17 What will this code do?
a) Print all tag names in the HTML
b) Print all tag contents in the HTML
c) Print all attributes of each tag
d) Raise an error
Answer: a) Print all tag names in the HTML
Explanation: This code iterates over all tags in the HTML (because find_all(True) matches all
tags) and prints the name of each tag.
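A sketch of such a loop:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div><p>Hi</p><a href="#">Link</a></div>', 'html.parser')

    # find_all(True) matches every tag; .name is the tag's name
    for tag in soup.find_all(True):
        print(tag.name)   # div, p, a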
Q.18 Which of the following is the correct way to send a file using the Requests library?
(a) requests.post(url, data={'file': open('file.txt', 'rb')})
(b) requests.post(url, attachments={'file': 'file.txt'})
(c) requests.post(url, files={'file': open('file.txt', 'rb')})
(d) requests.post(url, upload={'file': open('file.txt', 'rb')})
Answer: (c) requests.post(url, files={'file': open('file.txt', 'rb')})
Explanation: This is the correct way to send a file using the Requests library. The files parameter
is used for file uploads in multipart/form-data requests.
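The same call wrapped in a with block so the file handle is closed (the upload URL is hypothetical):

    import requests

    url = 'https://example.com/upload'
    with open('file.txt', 'rb') as f:
        # files= makes requests build a multipart/form-data body
        response = requests.post(url, files={'file': f})
    print(response.status_code)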
Q.19 What is the purpose of the verify parameter in the requests.get() function?
a) To verify the SSL certificate of the website
b) To verify the content type of the response
c) To verify the HTTP status code
d) To verify the encoding of the response
Answer: a) To verify the SSL certificate of the website
Explanation: The verify parameter in requests.get() is used to control SSL certificate verification.
When set to True (default), it verifies the SSL certificate of the website. When set to False, it
disables SSL certificate verification, which can be useful for debugging but is not recommended
for production use due to security risks.
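Both settings in a short sketch:

    import requests

    # Default behaviour: the site's SSL certificate is checked (verify=True)
    response = requests.get('https://example.com', verify=True)

    # Skipping verification, e.g. against a self-signed test server (not for production)
    # response = requests.get('https://localhost:8443', verify=False)
    print(response.status_code)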
Q.20 In web scraping, what is the purpose of the strip method on a BeautifulSoup text
element?
(a) To convert the text to uppercase
(b) To remove HTML tags
(c) To remove leading and trailing whitespace
(d) To find all text elements
Answer: (c) To remove leading and trailing whitespace
Q.21 Which of the following is a potential ethical issue with web scraping?
(a) Enabling cookies
(b) Scraping public data
(c) Overloading a server with frequent requests
(d) Using the latest Python version
Answer: (c) Overloading a server with frequent requests
Q.22 In BeautifulSoup, how can you select a DOM element using a CSS selector?
(a) soup.find(css='selector')
(b) soup.select('selector')
(c) soup.css('selector')
(d) soup.locate('selector')
Answer: (b) soup.select('selector')
Q.23 What should you consider when scraping websites to avoid legal issues?
(a) Scraping as much data as possible
(b) Accessing high-security websites
(c) Complying with the website's robots.txt file and terms of service
(d) Using a high-speed internet connection
Answer: (c) Complying with the website's robots.txt file and terms of service
Q.24 How do you extract attributes, such as src from an img tag, using BeautifulSoup?
(a) img['src']
(b) img.src
(c) img.get('src')
(d) img.src.get()
Answer: (c) img.get('src')
Q.25 What is the primary limitation of using only requests and BeautifulSoup for web
scraping dynamic websites?
(a) They are unable to parse HTML
(b) They don't support HTTP requests
(c) They cannot execute JavaScript
(d) They are platform-dependent
Answer: (c) They cannot execute JavaScript
Q.26 Which Selenium WebDriver method is used to find an element by its class name?
(a) find_element_by_xpath()
(b) find_element_by_css_selector()
(c) find_element_by_class_name()
(d) find_element_by_tag_name()
Answer: (c) find_element_by_class_name()
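The name in answer (c) is the older Selenium style; a sketch using the current Selenium 4 API, with a hypothetical class name:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get('https://example.com')

    # Selenium 4 style; older releases exposed driver.find_element_by_class_name('item')
    element = driver.find_element(By.CLASS_NAME, 'item')
    print(element.text)
    driver.quit()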
Q.27 What technique can you employ to ensure your web scraping script mimics human
activity more closely to avoid detection?
(a) Scrape at the fastest speed possible
(b) Use a consistent IP address
(c) Implement random time delays between requests
(d) Scrape the entire website at once
Answer: (c) Implement random time delays between requests
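A minimal sketch, with hypothetical URLs:

    import random
    import time

    import requests

    urls = ['https://example.com/page1', 'https://example.com/page2']
    for url in urls:
        response = requests.get(url)
        # Pause a random 1-5 seconds so request timing looks less machine-like
        time.sleep(random.uniform(1, 5))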
Q.28 What is the purpose of the headers parameter in a requests.get() function?
(a) It specifies the URL of the webpage
(b) It contains cookies to pass along with the request
(c) It includes metadata such as User-Agent to disguise the request
(d) It writes the response to a file
Answer: (c) It includes metadata such as User-Agent to disguise the request
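For example, sending a browser-like User-Agent:

    import requests

    # Many sites treat the default 'python-requests' agent differently from a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.get('https://example.com', headers=headers)
    print(response.status_code)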
Q.29 Which of the following is a method to handle pagination in web scraping?
(a) Collect data arbitrarily
(b) Use proxies to change IP addresses
(c) Automate clicking of the next page button or modify page URLs
(d) Increase the script's execution speed
Answer: (c) Automate clicking of the next page button or modify page URLs
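A sketch of URL-based pagination, assuming a hypothetical page query parameter:

    import requests
    from bs4 import BeautifulSoup

    for page in range(1, 4):
        url = f'https://example.com/listings?page={page}'
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        print(url, len(soup.find_all('li')))  # e.g. count the items on each page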
Q.30 What can be a more advanced technique for extracting data from a heavily
JavaScript-rendered website?
(a) Using the json module
(b) Accessing public API endpoints directly
(c) Employing only BeautifulSoup
(d) Parsing data with regex
Answer: (b) Accessing public API endpoints directly
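A sketch of calling such an endpoint directly (the API URL is hypothetical):

    import requests

    # The JSON endpoint the site's own JavaScript front end calls
    api_url = 'https://example.com/api/products?page=1'
    response = requests.get(api_url)
    data = response.json()   # parsed JSON; no HTML parsing or JavaScript execution needed
    print(data)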