DAI 101 Tutorial 3 (Web Scraping)
Q.1 Consider the following code snippet:
What will links contain?
(a) All <a> tags in the HTML
(b) All <a> tags with an href attribute
(c) All href attributes in the HTML
(d) The first <a> tag with an href attribute
Answer: (b) All <a> tags with an href attribute
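The original snippet is not reproduced here; a minimal sketch consistent with answer (b), assuming links was built with find_all and href=True, would be:

    from bs4 import BeautifulSoup

    html = '<a href="https://example.com">Has href</a> <a>No href</a>'
    soup = BeautifulSoup(html, 'html.parser')

    # href=True keeps only <a> tags that actually carry an href attribute
    links = soup.find_all('a', href=True)
    print(links)  # [<a href="https://example.com">Has href</a>]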
Q.2 Using BeautifulSoup, what code would extract a list of all fruit names?
(a) soup.find_all('li')
(b) [item.text for item in soup.find_all('li', class_='item')]
(c) soup.find('ul').get_text()
(d) [item for item in soup.select('.fruits .item')]
Answer: (b) [item.text for item in soup.find_all('li', class_='item')]
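A minimal runnable sketch of option (b), assuming the page marks each fruit with <li class="item"> inside <ul class="fruits">:

    from bs4 import BeautifulSoup

    html = """
    <ul class="fruits">
      <li class="item">Apple</li>
      <li class="item">Banana</li>
      <li class="item">Cherry</li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    # Collect the text of every <li class="item"> element
    fruit_names = [item.text for item in soup.find_all('li', class_='item')]
    print(fruit_names)  # ['Apple', 'Banana', 'Cherry']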
Q.3 You are given the task of extracting and printing the href attributes of all anchor tags (<a>)
on a webpage using BeautifulSoup and requests. Which code snippet achieves this
correctly?
(a) links_a = [a['href'] for a in soup.find_all('a')]
(b) links_b = [a.get('href') for a in soup.find_all('a', href=True)]
(c) for a in soup.select('a'): print(a['href'])
(d) links_d = (a.get_attribute('href') for a in soup.find('a'))
Answer: (b) links_b = [a.get('href') for a in soup.find_all('a', href=True)]
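Option (b) embedded in a complete script (example.com stands in for the real page):

    import requests
    from bs4 import BeautifulSoup

    response = requests.get('https://example.com')
    soup = BeautifulSoup(response.content, 'html.parser')

    # href=True skips anchors without an href, so get('href') never returns None here
    links_b = [a.get('href') for a in soup.find_all('a', href=True)]
    for href in links_b:
        print(href)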
Q.4 What is the output of the following BeautifulSoup code snippet?
(a) FirstSecond
(b) First
(c) Second
(d) <span>First</span>
Answer: (b) First
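The original snippet is not shown; one snippet that would print this output, assuming find() is used on HTML containing two <span> tags, is:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div><span>First</span><span>Second</span></div>', 'html.parser')

    # find() returns only the first matching tag, so its text is 'First'
    print(soup.find('span').text)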
Q.5 Which of the following is the correct way to find a tag by its ID using BeautifulSoup?
(a) soup.find(id="example")
(b) soup.find("#example")
(c) soup.select("#example")
(d) Both b and c
(e) Both a and c
Answer: (e) Both a and c
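A short sketch showing both valid forms on a made-up id:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div id="example">Hello</div>', 'html.parser')

    by_find = soup.find(id="example")       # keyword argument filters on the id attribute
    by_select = soup.select("#example")     # CSS id selector; select() returns a list
    print(by_find.text, by_select[0].text)  # Hello Hello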
Q.6 Given the HTML snippet below, which BeautifulSoup command would correctly count
and return the number of <li> elements within the <ul class="fruits"> section?
(a) count = len(soup.find_all('li', class_='fruits'))
(b) count = len(soup.find('ul').find_all('li'))
(c) count = len(soup.find_all('li', parent='fruits'))
(d) count = soup.count('li')
Answer: (b) count = len(soup.find('ul').find_all('li'))
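Since the HTML snippet is not reproduced here, a sketch with a plausible <ul class="fruits"> block:

    from bs4 import BeautifulSoup

    html = """
    <ul class="fruits">
      <li>Apple</li>
      <li>Banana</li>
      <li>Cherry</li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    # find('ul') grabs the list, then find_all('li') returns its items
    count = len(soup.find('ul').find_all('li'))
    print(count)  # 3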
Q.7 What does the following code snippet do?
(a) Sends a POST request to example.com
(b) Creates a BeautifulSoup object from the HTML content of example.com
(c) Parses XML content from example.com
(d) Prints the HTML content of example.com
Answer: (b)
Explanation: This code snippet sends a GET request to example.com using requests.get(), then
creates a BeautifulSoup object from the response content using the HTML parser.
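A sketch of the kind of snippet the explanation describes:

    import requests
    from bs4 import BeautifulSoup

    # GET request, then parse the returned HTML into a navigable BeautifulSoup tree
    response = requests.get('https://example.com')
    soup = BeautifulSoup(response.content, 'html.parser')
    print(type(soup))  # <class 'bs4.BeautifulSoup'>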
Q.8 Which line of code correctly gets the first item in items and makes the most sense
following the code snippet below?
(a) first_item = items[0]
(b) first_item = items.get(0)
(c) first_item = items.find[0]
(d) first_item = soup.items[0]
Answer: (a) first_item = items[0]
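The preceding snippet is not reproduced here; assuming items came from find_all(), which returns a plain Python list, the answer works like this:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<ul><li class="item">One</li><li class="item">Two</li></ul>', 'html.parser')

    items = soup.find_all('li', class_='item')  # a list of matching tags
    first_item = items[0]                       # ordinary list indexing
    print(first_item.text)  # One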
Q.9 What does this code do?
a) Removes all JavaScript and CSS from the page
b) Executes all JavaScript and applies all CSS styles
c) Finds all script and style tags
d) Decompresses all script and style content
Answer: a) Removes all JavaScript and CSS from the page
Explanation: This code iterates over all <script> and <style> tags in the HTML and removes them from the parse tree (typically via decompose() or extract()).
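A sketch of such a cleanup loop, assuming decompose() is the removal method used:

    from bs4 import BeautifulSoup

    html = ('<html><head><style>body {color: red;}</style></head>'
            '<body><script>alert("hi");</script><p>Keep me</p></body></html>')
    soup = BeautifulSoup(html, 'html.parser')

    # soup(['script', 'style']) is shorthand for find_all(['script', 'style']);
    # decompose() deletes each matched tag and its contents from the parse tree
    for tag in soup(['script', 'style']):
        tag.decompose()
    print(soup.get_text())  # Keep me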
Q.10 Which code snippet selects all elements with the class item?
(a) items = soup.find_all('item')
(b) items = soup.select('.item')
(c) items = soup.find_all(class_='item')
(d) B and C
Answer: (d) B and C
Explanation: find_all(class_='item') and select('.item') are both valid ways to find all elements
with the class item. The first method uses find_all with the class_ argument, and the second uses
a CSS selector.
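Both forms side by side on a small document:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p class="item">a</p><p class="item">b</p><p class="other">c</p>', 'html.parser')

    by_css = soup.select('.item')            # CSS class selector
    by_kwarg = soup.find_all(class_='item')  # class_ avoids clashing with the Python keyword
    print(len(by_css), len(by_kwarg))        # 2 2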
Q.11 What does the following block of code do?
(a) It retrieves 'cover3.jpg' and saves it to your computer.
(b) It displays the image 'cover3.jpg'.
(c) It retrieves the URL to download 'cover3.jpg'
(d) None of the above
Answer: (a)
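The original block is not reproduced here; a sketch that does the same job with requests (the image URL is hypothetical, and the original may instead have used urllib.request.urlretrieve):

    import requests

    image_url = 'https://example.com/cover3.jpg'  # hypothetical URL
    response = requests.get(image_url)

    # Write the raw bytes to disk, saving the image locally
    with open('cover3.jpg', 'wb') as f:
        f.write(response.content)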
Q.12 What does the line soup.find('ul') do in the code?
(a) It finds the first unordered list in the HTML page
(b) It finds all unordered lists in the HTML page
(c) It finds the first hyperlink in the HTML page
(d) It finds all hyperlinks in the HTML page
Answer: (a) It finds the first unordered list in the HTML page
Q.13 In the context of the provided code in Q.12, what does the li.a statement retrieve?
(a) All text content within a list item
(b) The first link within each list item
(c) All list items in the unordered list
(d) The last list item in the unordered list
Answer: (b) The first link within each list item
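Since the Q.12 snippet is not reproduced here, a sketch consistent with both questions, assuming each list item wraps a single link:

    from bs4 import BeautifulSoup

    html = """
    <ul>
      <li><a href="/page1">Page 1</a></li>
      <li><a href="/page2">Page 2</a></li>
    </ul>
    """
    soup = BeautifulSoup(html, 'html.parser')

    ul = soup.find('ul')          # first unordered list in the document
    for li in ul.find_all('li'):
        print(li.a.get('href'))   # li.a is the first <a> inside each list item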
Q.14 How can you convert a BeautifulSoup object back to a string?
(a) str(soup)
(b) soup.to_string()
(c) soup.prettify()
(d) Both a and c
(e) Both b and c
Answer: (d) Both a and c
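A quick comparison of the two forms:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p>Hi</p>', 'html.parser')

    plain = str(soup)         # compact string of the markup
    pretty = soup.prettify()  # string with one tag per line, indented
    print(plain)
    print(pretty)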
Q.15 What's the output of the following code?
a) The full HTML content of the page
b) The title tag of the page
c) The text content of the title tag
d) None
Answer: c) The text content of the title tag
Explanation: soup.title returns the <title> tag, and .string extracts the string content from that tag.
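A sketch of the snippet the explanation implies:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<html><head><title>My Page</title></head></html>', 'html.parser')
    print(soup.title.string)  # My Page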
Q.16 What does the following block of code print?
(a) retrieves and displays the webpage
(b) downloads the webpage
(c) prints the images from 'www.nytimes.com'
(d) prints all the 'img' sources under 'src' from 'www.nytimes.com'
Answer: (d)
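The original block is not reproduced here; a sketch consistent with answer (d):

    import requests
    from bs4 import BeautifulSoup

    response = requests.get('https://www.nytimes.com')
    soup = BeautifulSoup(response.content, 'html.parser')

    # Print the src attribute of every <img> tag that has one
    for img in soup.find_all('img', src=True):
        print(img.get('src'))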
Q.17 What will this code do?
a) Print all tag names in the HTML
b) Print all tag contents in the HTML
c) Print all attributes of each tag
d) Raise an error
Answer: a) Print all tag names in the HTML
Explanation: This code iterates over all tags in the HTML (because find_all(True) matches all
tags) and prints the name of each tag.
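A sketch of such a loop:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<div><p>Hi</p><a href="#">Link</a></div>', 'html.parser')

    # find_all(True) matches every tag; .name is the tag's name
    for tag in soup.find_all(True):
        print(tag.name)   # div, p, a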
Q.18 Which of the following is the correct way to send a file using the Requests library?
(a) requests.post(url, data={'file': open('file.txt', 'rb')})
(b) requests.post(url, attachments={'file': 'file.txt'})
(c) requests.post(url, files={'file': open('file.txt', 'rb')})
(d) requests.post(url, upload={'file': open('file.txt', 'rb')})
Answer: (c) requests.post(url, files={'file': open('file.txt', 'rb')})
Explanation: This is the correct way to send a file using the Requests library. The files parameter
is used for file uploads in multipart/form-data requests.
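The same call wrapped in a with block so the file handle is closed (the upload URL is hypothetical):

    import requests

    url = 'https://example.com/upload'
    with open('file.txt', 'rb') as f:
        # files= makes requests build a multipart/form-data body
        response = requests.post(url, files={'file': f})
    print(response.status_code)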
Q.19 What is the purpose of the verify parameter in the requests.get() function?
a) To verify the SSL certificate of the website
b) To verify the content type of the response
c) To verify the HTTP status code
d) To verify the encoding of the response
Answer: a) To verify the SSL certificate of the website
Explanation: The verify parameter in requests.get() is used to control SSL certificate verification.
When set to True (default), it verifies the SSL certificate of the website. When set to False, it
disables SSL certificate verification, which can be useful for debugging but is not recommended
for production use due to security risks.
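Both settings in a short sketch:

    import requests

    # Default behaviour: the site's SSL certificate is checked (verify=True)
    response = requests.get('https://example.com', verify=True)

    # Skipping verification, e.g. against a self-signed test server (not for production)
    # response = requests.get('https://localhost:8443', verify=False)
    print(response.status_code)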
Q.20 In web scraping, what is the purpose of the strip method on a BeautifulSoup text
element?
(a) To convert the text to uppercase
(b) To remove HTML tags
(c) To remove leading and trailing whitespace
(d) To find all text elements
Answer: (c) To remove leading and trailing whitespace
Q.21 Which of the following is a potential ethical issue with web scraping?
(a) Enabling cookies
(b) Scraping public data
(c) Overloading a server with frequent requests
(d) Using the latest Python version
Answer: (c) Overloading a server with frequent requests
Q.22 In BeautifulSoup, how can you select a DOM element using a CSS selector?
(a) soup.find(css='selector')
(b) soup.select('selector')
(c) soup.css('selector')
(d) soup.locate('selector')
Answer: (b) soup.select('selector')
Q.23 What should you consider when scraping websites to avoid legal issues?
(a) Scraping as much data as possible
(b) Accessing high-security websites
(c) Complying with the website's robots.txt file and terms of service
(d) Using a high-speed internet connection
Answer: (c) Complying with the website's robots.txt file and terms of service
Q.24 How do you extract attributes, such as src from an img tag, using BeautifulSoup?
(a) img['src']
(b) img.src
(c) img.get('src')
(d) img.src.get()
Answer: (c) img.get('src')
Q.25 What is the primary limitation of using only requests and BeautifulSoup for web
scraping dynamic websites?
(a) They are unable to parse HTML
(b) They don't support HTTP requests
(c) They cannot execute JavaScript
(d) They are platform-dependent
Answer: (c) They cannot execute JavaScript
Q.26 Which Selenium WebDriver method is used to find an element by its class name?
(a) find_element_by_xpath()
(b) find_element_by_css_selector()
(c) find_element_by_class_name()
(d) find_element_by_tag_name()
Answer: (c) find_element_by_class_name()
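The name in answer (c) is the older Selenium style; a sketch using the current Selenium 4 API, with a hypothetical class name:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get('https://example.com')

    # Selenium 4 style; older releases exposed driver.find_element_by_class_name('item')
    element = driver.find_element(By.CLASS_NAME, 'item')
    print(element.text)
    driver.quit()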
Q.27 What technique can you employ to ensure your web scraping script mimics human
activity more closely to avoid detection?
(a) Scrape at the fastest speed possible
(b) Use a consistent IP address
(c) Implement random time delays between requests
(d) Scrape the entire website at once
Answer: (c) Implement random time delays between requests
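A minimal sketch, with hypothetical URLs:

    import random
    import time

    import requests

    urls = ['https://example.com/page1', 'https://example.com/page2']
    for url in urls:
        response = requests.get(url)
        # Pause a random 1-5 seconds so request timing looks less machine-like
        time.sleep(random.uniform(1, 5))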
Q.28 What is the purpose of the headers parameter in a requests.get() function?
(a) It specifies the URL of the webpage
(b) It contains cookies to pass along with the request
(c) It includes metadata such as User-Agent to disguise the request
(d) It writes the response to a file
Answer: (c) It includes metadata such as User-Agent to disguise the request
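For example, sending a browser-like User-Agent:

    import requests

    # Many sites treat the default 'python-requests' agent differently from a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.get('https://example.com', headers=headers)
    print(response.status_code)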
Q.29 Which of the following is a method to handle pagination in web scraping?
(a) Collect data arbitrarily
(b) Use proxies to change IP addresses
(c) Automate clicking of the next page button or modify page URLs
(d) Increase the script's execution speed
Answer: (c) Automate clicking of the next page button or modify page URLs
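A sketch of URL-based pagination, assuming a hypothetical page query parameter:

    import requests
    from bs4 import BeautifulSoup

    for page in range(1, 4):
        url = f'https://example.com/listings?page={page}'
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        print(url, len(soup.find_all('li')))  # e.g. count the items on each page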
Q.30 What can be a more advanced technique for extracting data from a heavily
JavaScript-rendered website?
(a) Using the json module
(b) Accessing public API endpoints directly
(c) Employing only BeautifulSoup
(d) Parsing data with regex
Answer: (b) Accessing public API endpoints directly
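A sketch of calling such an endpoint directly (the API URL is hypothetical):

    import requests

    # The JSON endpoint the site's own JavaScript front end calls
    api_url = 'https://example.com/api/products?page=1'
    response = requests.get(api_url)
    data = response.json()   # parsed JSON; no HTML parsing or JavaScript execution needed
    print(data)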