
Assignment 4

Web Scraping and Translation


Name: Abhiraj Singh    UID: 22BCS12473
Task: Scrape multilingual content from websites (e.g., news articles available in multiple
languages).
• Use web scraping to gather text in different languages.
• Implement a translation system using NLP techniques to translate the scraped content
into a target language of your choice.
• Evaluate the accuracy and fluency of the translated text.

Solution:

1. Preparation:

• Target Website: Choose a website with multilingual content (e.g., BBC News).
• Target Language: Select a target language for translation (e.g., English).
• Tools:
o Scraping Library: Choose a library based on your skills (Beautiful Soup for
Python in this example).
o Translation API: Sign up for a translation API like Google Translate or DeepL
(free tiers available with limitations).

2. Web Scraping:

• Inspect Website: Use your browser's developer tools to examine the website's HTML
structure. Identify the HTML elements containing the desired content (titles, body text,
etc.) in different languages.
• Write Scraper Script (a minimal sketch follows this list):
1. Import the necessary libraries (Beautiful Soup, requests).
2. Define a function to fetch the website content using requests.get(url).
3. Parse the HTML content with BeautifulSoup(content, 'html.parser').
4. Use appropriate methods (e.g., find, find_all) to locate the elements containing
the text in different languages, based on the identified HTML structure.
5. Extract the text content from each language element and store it in separate
variables (e.g., original_text, translated_text).
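A minimal scraping sketch, assuming a Python environment with requests and beautifulsoup4 installed, is given below. The URL and the h1/p tag choices are illustrative assumptions and must be adapted to whatever HTML structure the inspection step reveals.

# Minimal scraping sketch; the tag choices below are assumptions, not the real site markup.
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def fetch_article(url):
    """Fetch a page and return its headline and body text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()                       # fail loudly on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    title = soup.find('h1')                           # assumed: headline sits in an <h1> tag
    paragraphs = soup.find_all('p')                   # assumed: body text sits in <p> tags
    body = ' '.join(p.get_text(strip=True) for p in paragraphs)
    return (title.get_text(strip=True) if title else ''), body

# Example usage with a hypothetical article URL:
# title, original_text = fetch_article('https://www.bbc.com/mundo/articles/example-id')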

3. Translation:

• Integrate API: Follow the chosen translation API's documentation to integrate it into
your script.
• Translate Text: Use the API function to translate the original text (original_text)
into the target language (translated_text), as sketched below.
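The sketch below shows one possible integration, assuming the third-party deep-translator package rather than the official Google Cloud or DeepL clients; those follow the same pattern but require an API key.

# Translation sketch, assuming: pip install deep-translator
from deep_translator import GoogleTranslator

def translate_text(original_text, target='en'):
    """Translate a scraped string into the target language, chunking long inputs."""
    chunk_size = 4500                                 # stay under the per-request length limit
    chunks = [original_text[i:i + chunk_size]
              for i in range(0, len(original_text), chunk_size)]
    translator = GoogleTranslator(source='auto', target=target)
    return ' '.join(translator.translate(chunk) for chunk in chunks)

# translated_text = translate_text(original_text, target='en')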

4. Evaluation:

• Accuracy Check:
1. Manually translate a small sample of the original text.
2. Compare the machine translation with your human translation to identify any
significant errors (an automatic scoring sketch follows this list).
• Fluency Analysis:
1. Use a tool like Grammarly to check for unnatural phrasing or grammatical errors
in the translated text.
2. Manually review the fluency of the translated content, ensuring it reads naturally.
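For the accuracy check, an automatic score such as BLEU can back up the manual comparison. The sketch below uses NLTK's sentence-level BLEU as one possible metric; the example sentences are placeholders that a human reviewer would supply.

# BLEU scoring sketch, assuming: pip install nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_score(human_reference, machine_translation):
    """Score a machine translation against one human reference (0 = no overlap, 1 = identical)."""
    references = [human_reference.lower().split()]    # BLEU expects a list of reference token lists
    candidate = machine_translation.lower().split()
    smoothing = SmoothingFunction().method1           # avoids zero scores on short sentences
    return sentence_bleu(references, candidate, smoothing_function=smoothing)

# Hypothetical example:
# print(bleu_score("The president visited Paris on Monday.",
#                  "The president visited Paris Monday."))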

5. Enhancements:

• Error Handling: Implement error-handling mechanisms to gracefully handle situations
like website changes or API failures.
• Data Storage: Store the scraped and translated data in a structured format like CSV for
further analysis or use (a short sketch follows this list).
• Ethical Scraping: Respect the website's terms of service (check robots.txt) and avoid
overwhelming the website with excessive requests.
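The storage and error-handling points are sketched below; the CSV field names and file name are illustrative assumptions.

# Data-storage sketch; field names and file name are illustrative.
import csv

def save_results(rows, path='translations.csv'):
    """Write a list of scraped/translated records (dictionaries) to a CSV file."""
    fieldnames = ['url', 'language', 'title', 'original_text', 'translated_text']
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Wrapping the scraping and translation calls in try/except keeps one failed page
# or API error from stopping the whole run, e.g.:
# try:
#     title, original_text = fetch_article(url)
#     translated_text = translate_text(original_text)
# except requests.RequestException as error:          # network or HTTP failure
#     print(f"Skipping {url}: {error}")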

Implementation:

This code scrapes articles from Wikipedia using requests and BeautifulSoup. It fetches the page
content for the given URLs, extracts the title and body content based on the specified HTML tags, and
stores the results in a dictionary. The articles are then printed in different languages (English and
French).
The code then analyses and processes the result to assess the fluency of the translation by
back-translating it and comparing the output against the article text available on the website itself.
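The original source listing is not reproduced here; the sketch below shows one way such a pipeline could look, reusing the fetch_article and translate_text helpers from the earlier sketches, with a simple word-overlap ratio standing in for the back-translation comparison. It is illustrative only, not the author's actual code.

# End-to-end sketch (illustrative, not the original assignment code): scrape English and
# French Wikipedia articles, translate the French one, then back-translate to gauge fluency.
# Reuses fetch_article() and translate_text() from the sketches above.

articles = {
    'en': 'https://en.wikipedia.org/wiki/Natural_language_processing',
    'fr': 'https://fr.wikipedia.org/wiki/Traitement_automatique_des_langues',
}

def word_overlap(text_a, text_b):
    """Rough similarity: Jaccard overlap of lowercase word sets (illustrative metric only)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / max(len(a | b), 1)

results = {}
for lang, url in articles.items():
    title, body = fetch_article(url)
    results[lang] = {'title': title, 'text': body[:2000]}   # truncate for a quick demo
    print(lang, '-', title)

# Translate the French article into English, then back into French, and compare.
forward = translate_text(results['fr']['text'], target='en')
backward = translate_text(forward, target='fr')
print('Back-translation overlap with the original French text:',
      round(word_overlap(results['fr']['text'], backward), 3))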
