
Assignment 4

Web Scraping and Translation


Name: Abhiraj Singh    UID: 22BCS12473
Task: Scrape multilingual content from websites (e.g., news articles available in multiple
languages).
• Use web scraping to gather text in different languages.
• Implement a translation system using NLP techniques to translate the scraped content
into a target language of your choice.
• Evaluate the accuracy and fluency of the translated text.

Solution:

1. Preparation:

• Target Website: Choose a website with multilingual content (e.g., BBC News).
• Target Language: Select a target language for translation (e.g., English).
• Tools:
o Scraping Library: Choose a library based on your skills (Beautiful Soup for
Python in this example).
o Translation API: Sign up for a translation API like Google Translate or DeepL
(free tiers available with limitations).

2. Web Scraping:

• Inspect Website: Use your browser's developer tools to examine the website's HTML
structure. Identify the HTML elements containing the desired content (titles, body text,
etc.) in different languages.
• Write Scraper Script (a minimal sketch follows this list):
1. Import the necessary libraries (Beautiful Soup, requests).
2. Define a function to fetch the website content using requests.get(url).
3. Parse the HTML content with BeautifulSoup(content, 'html.parser').
4. Use appropriate methods (e.g., find, find_all) to locate the elements containing
the text in different languages, based on the identified HTML structure.
5. Extract the text content from each language element and store it in separate
variables (e.g., original_text, translated_text).
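A minimal scraping sketch, assuming a Python environment with requests and beautifulsoup4 installed, is given below. The URL and the h1/p tag choices are illustrative assumptions and must be adapted to whatever HTML structure the inspection step reveals.

# Minimal scraping sketch; the tag choices below are assumptions, not the real site markup.
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def fetch_article(url):
    """Fetch a page and return its headline and body text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()                       # fail loudly on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    title = soup.find('h1')                           # assumed: headline sits in an <h1> tag
    paragraphs = soup.find_all('p')                   # assumed: body text sits in <p> tags
    body = ' '.join(p.get_text(strip=True) for p in paragraphs)
    return (title.get_text(strip=True) if title else ''), body

# Example usage with a hypothetical article URL:
# title, original_text = fetch_article('https://www.bbc.com/mundo/articles/example-id')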

3. Translation:

• Integrate API: Follow the chosen translation API's documentation to integrate it into
your script.
• Translate Text: Use the API function to translate the original text (original_text)
into the target language (translated_text), as sketched below.
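The sketch below shows one possible integration, assuming the third-party deep-translator package rather than the official Google Cloud or DeepL clients; those follow the same pattern but require an API key.

# Translation sketch, assuming: pip install deep-translator
from deep_translator import GoogleTranslator

def translate_text(original_text, target='en'):
    """Translate a scraped string into the target language, chunking long inputs."""
    chunk_size = 4500                                 # stay under the per-request length limit
    chunks = [original_text[i:i + chunk_size]
              for i in range(0, len(original_text), chunk_size)]
    translator = GoogleTranslator(source='auto', target=target)
    return ' '.join(translator.translate(chunk) for chunk in chunks)

# translated_text = translate_text(original_text, target='en')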

4. Evaluation:

• Accuracy Check:
1. Manually translate a small sample of the original text.
2. Compare the machine translation with your human translation to identify any
significant errors (an automatic scoring sketch follows this list).
• Fluency Analysis:
1. Use a tool like Grammarly to check for unnatural phrasing or grammatical errors
in the translated text.
2. Manually review the fluency of the translated content, ensuring it reads naturally.
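For the accuracy check, an automatic score such as BLEU can back up the manual comparison. The sketch below uses NLTK's sentence-level BLEU as one possible metric; the example sentences are placeholders that a human reviewer would supply.

# BLEU scoring sketch, assuming: pip install nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_score(human_reference, machine_translation):
    """Score a machine translation against one human reference (0 = no overlap, 1 = identical)."""
    references = [human_reference.lower().split()]    # BLEU expects a list of reference token lists
    candidate = machine_translation.lower().split()
    smoothing = SmoothingFunction().method1           # avoids zero scores on short sentences
    return sentence_bleu(references, candidate, smoothing_function=smoothing)

# Hypothetical example:
# print(bleu_score("The president visited Paris on Monday.",
#                  "The president visited Paris Monday."))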

5. Enhancements:

• Error Handling: Implement error-handling mechanisms to gracefully handle situations
like website changes or API failures.
• Data Storage: Store the scraped and translated data in a structured format like CSV for
further analysis or use (a short sketch follows this list).
• Ethical Scraping: Respect the website's terms of service (check robots.txt) and avoid
overwhelming the website with excessive requests.
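The storage and error-handling points are sketched below; the CSV field names and file name are illustrative assumptions.

# Data-storage sketch; field names and file name are illustrative.
import csv

def save_results(rows, path='translations.csv'):
    """Write a list of scraped/translated records (dictionaries) to a CSV file."""
    fieldnames = ['url', 'language', 'title', 'original_text', 'translated_text']
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Wrapping the scraping and translation calls in try/except keeps one failed page
# or API error from stopping the whole run, e.g.:
# try:
#     title, original_text = fetch_article(url)
#     translated_text = translate_text(original_text)
# except requests.RequestException as error:          # network or HTTP failure
#     print(f"Skipping {url}: {error}")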

Implementation:

This code scrapes articles from Wikipedia using requests and BeautifulSoup. It fetches the page
content for the given URLs, extracts the title and body content based on the specified HTML tags, and
stores the results in a dictionary. The articles are then printed in different languages (English and
French).
The code then analyses and processes the result to assess the fluency of the translation by
back-translating it and comparing the output against the article text available on the website itself.
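The original source listing is not reproduced here; the sketch below shows one way such a pipeline could look, reusing the fetch_article and translate_text helpers from the earlier sketches, with a simple word-overlap ratio standing in for the back-translation comparison. It is illustrative only, not the author's actual code.

# End-to-end sketch (illustrative, not the original assignment code): scrape English and
# French Wikipedia articles, translate the French one, then back-translate to gauge fluency.
# Reuses fetch_article() and translate_text() from the sketches above.

articles = {
    'en': 'https://en.wikipedia.org/wiki/Natural_language_processing',
    'fr': 'https://fr.wikipedia.org/wiki/Traitement_automatique_des_langues',
}

def word_overlap(text_a, text_b):
    """Rough similarity: Jaccard overlap of lowercase word sets (illustrative metric only)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / max(len(a | b), 1)

results = {}
for lang, url in articles.items():
    title, body = fetch_article(url)
    results[lang] = {'title': title, 'text': body[:2000]}   # truncate for a quick demo
    print(lang, '-', title)

# Translate the French article into English, then back into French, and compare.
forward = translate_text(results['fr']['text'], target='en')
backward = translate_text(forward, target='fr')
print('Back-translation overlap with the original French text:',
      round(word_overlap(results['fr']['text'], backward), 3))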
