News Summarizer Using ChatGPT
Submitted by
Aravind R – 950022104040
Haribalaji H – 950022104016
Siva N – 950022104018
In partial fulfilment of the requirements for the course
ChatGPT
Conducted by GUVI
Under NaanMudhalvan Scheme
Department of Computer Science and Engineering
Anna University Regional Campus - Tirunelveli
ACKNOWLEDGEMENT
We thank the Government of Tamil Nadu for offering us this course through the Naan
Mudhalvan scheme. We would also like to thank our college, Anna University Regional
Campus - Tirunelveli, for providing us with a good environment until the completion
of the course.
With a heart full of gratitude, we express our sincere thanks to our Dean
Dr.N.SHENBAGA VINAYAGA MOORTHI M.E, Ph.D., Anna University
Regional Campus, Tirunelveli for providing the necessary facilities to carry out our
project.
It gives us immense pleasure to express our deep sense of gratitude to
Dr.C.AKILA, HOD & Assistant Professor, Department of Computer Science and
Engineering, Anna University Regional Campus, Tirunelveli.
We would also like to thank our SPOC, Dr.J.JESU VEDHA NAYAGI, Assistant
Professor, Department of Computer Science and Engineering, for helping us
throughout the session. We also thank our course coordinator,
Dr.C.AKILA, for guiding us until the completion of the session.
We especially thank the GUVI team for helping us develop our knowledge
of ChatGPT.
Aravind R – 950022104040
Haribalaji H – 950022104016
Siva N – 950022104018
TABLE OF CONTENTS
C. NO.  TITLE
ACKNOWLEDGEMENT
TABLE OF CONTENTS
1 INTRODUCTION
2 PROJECT OBJECTIVES
3 SYSTEM REQUIREMENTS
3.1 SOFTWARE REQUIREMENTS
3.2 HARDWARE REQUIREMENTS
4 METHODOLOGY
4.1 PROBLEM DEFINITION
4.2 DATA COLLECTION
4.3 PREPROCESSING THE DATA
4.4 CHATGPT INTEGRATION
4.5 BUILDING THE STREAMLIT
INTERFACE
4.6 TESTING AND DEBUGGING
4.7 OPTIMIZATION
4.8 DOCUMENTATION
5 EXISTING WORK
5.1 SUMMARIZATION MODELS
5.2 CHATGPT INTEGRATION
5.3 WEB SCRAPING FOR CONTENT
EXTRACTION
5.4 EXISTING SUMMARIZATION SERVICES
5.5 OPEN-SOURCE PROJECTS
6 PROPOSED WORK
6.1 INTEGRATION OF CHATGPT FOR
ABSTRACTIVE SUMMARIZATION
6.2 INTERACTIVE USER INTERFACE
WITH STREAMLIT
6.3 WEB SCRAPING FOR CONTENT
EXTRACTION
6.4 PERSONALIZATION AND
CUSTOMIZATION FEATURES
6.5 SUPPORT FOR MULTIPLE
LANGUAGES
6.6 REAL-TIME SUMMARIZATION
6.7 SCALABILITY AND FUTURE
ENHANCEMENTS
6.8 FLOW CHART
7 CODE & OUTPUT
7.1 NEWS SUMMARIZER USING
CHATGPT ( CODE )
7.2 OUTPUT
8 FUTURE WORK & CONCLUSION
8.1 FUTURE WORK
8.2 CONCLUSION
Project Title: News Summarizer Using ChatGPT
CHAPTER – 1
INTRODUCTION
The digital era has brought an exponential rise in the availability of news and
information, with thousands of articles being published daily across various
platforms. While this ensures that users are never short of information, it also
poses a significant challenge: filtering through the vast ocean of content to extract
relevant and meaningful insights. Long and detailed news articles often demand
substantial time and effort from readers, leaving many struggling to keep up with
the latest developments. This gap calls for an innovative solution that simplifies
news consumption without compromising on the quality of information.
Our project, News Summarizer Using ChatGPT, addresses this challenge by
leveraging cutting-edge artificial intelligence (AI) to provide concise, accurate,
and meaningful summaries of news articles. Built on the advanced capabilities of
OpenAI's ChatGPT, this tool is designed to condense lengthy articles into easy-
to-understand summaries, ensuring users can stay informed in a fraction of the
time required to read the full content.
The News Summarizer offers multiple input methods for user convenience. Users
can provide a URL to a news article, paste the article's text directly, or upload a
file containing the content. The tool then processes the input to extract the main
text and passes it to ChatGPT, which generates a summary tailored to the user’s
preferences. To further enhance usability, the summarizer supports adjustable
summary lengths, allowing users to choose between brief overviews, medium
summaries, or more detailed explanations. Additionally, the tool incorporates
automatic category detection, classifying articles into topics like Politics,
Technology, Sports, or Entertainment.
This project combines a powerful backend, driven by OpenAI’s API and
supporting frameworks, with an intuitive and accessible user interface. By
integrating APIs for real-time news fetching and data extraction, the application
ensures a seamless and reliable experience. Furthermore, the platform can be
extended to support multi-language summarization, catering to a broader audience
and making the tool accessible to users worldwide.
The News Summarizer is not just about saving time; it’s about empowering
readers to make informed decisions in an age of information overload. For
professionals, students, and everyday users, the ability to quickly access relevant
news summaries translates to better productivity and engagement with current
events. With its focus on clarity, precision, and ease of use, this tool aligns with
the growing demand for intelligent systems that simplify complex tasks.
Moreover, the News Summarizer can help bridge the gap for those who might
otherwise skip reading news due to time constraints or the daunting nature of
lengthy articles. By ensuring that the most important points are presented in a
clear and digestible format, the tool encourages broader news consumption and
fosters a more informed society.
The potential applications of the News Summarizer extend beyond individual use.
It could be deployed in educational settings to help students quickly review large
volumes of reading material, in corporate environments to summarize industry
news for decision-making, or even in media outlets to generate previews of
articles. Planned extensions such as voice-based input and output, real-time notifications for
trending topics, and options to export summaries as PDFs or emails would make this
project scalable and versatile for diverse use cases.
In conclusion, News Summarizer Using ChatGPT is a forward-thinking project
that combines AI technology with practical functionality to redefine how users
interact with news. It offers a seamless, time-saving, and informative experience,
ensuring that staying updated has never been easier.
CHAPTER - 2
PROJECT OBJECTIVES
➢ Efficient News Summarization: Provide users with concise and accurate
summaries of lengthy news articles to save time and enhance readability.
➢ User-Friendly Input Options: Allow users to input news articles via URLs,
text, or file uploads for flexibility and ease of use.
➢ Customizable Summary Length: Enable users to choose summary lengths,
such as brief, medium, or detailed, based on their preferences.
➢ Automatic Content Extraction: Use advanced techniques to extract relevant
content from news sources, ensuring accurate summarization.
➢ Category Classification: Automatically categorize news articles into topics
like Politics, Sports, Technology, and more for better organization.
➢ Language Support: Support multi-language summarization to make the tool
accessible to a diverse audience worldwide.
➢ Real-Time News Integration: Integrate with news APIs to fetch and
summarize the latest news articles from trusted sources.
➢ Keyword Highlighting: Highlight key terms and concepts in summaries to
improve user understanding and engagement.
➢ Scalable and Reliable Architecture: Build a robust backend using OpenAI’s
API and efficient frameworks to ensure smooth performance and scalability.
➢ Future Scalability: Design the system for easy integration with additional
features like voice input/output, notifications, and export options for broader
usability.
CHAPTER - 3
SYSTEM REQUIREMENTS
3.1 Software Requirements:
➢ Python: Core programming language for developing the application
(Python 3.9 or later).
➢ Streamlit: Framework for building the interactive user interface (pip
install streamlit).
➢ OpenAI API: For integrating ChatGPT to generate summaries (pip
install openai).
➢ Requests Library: For making API calls to interact with external
APIs like OpenAI (pip install requests).
➢ Pandas: For processing and managing input data (pip install pandas).
➢ Dependency Management: Use pip to install required libraries and
manage project dependencies.
➢ IDE / Code Editor: Tools like Visual Studio Code or PyCharm for
writing and managing the codebase.
➢ Streamlit Widgets: Built-in components like sliders, buttons, and text
inputs for user interaction.
➢ Streamlit Testing: Use the streamlit run command to test and debug
the application locally.
3.2 Hardware Requirements
➢ Processor (CPU):
• Minimum: Dual-core processor (Intel i3 or equivalent).
• Recommended: Quad-core processor (Intel i5 or equivalent) for smoother
performance.
➢ Memory (RAM):
• Minimum: 4 GB RAM (sufficient for basic development and testing).
• Recommended: 8 GB RAM for better multitasking and handling larger data.
➢ Storage:
• Minimum: 128 GB storage (enough for basic project files and software).
• Recommended: 256 GB SSD for faster read/write speeds.
➢ Operating System:
• Windows 10/11, macOS, or Linux (Ubuntu 20.04 or newer) for
compatibility with development tools.
➢ Network:
• Stable internet connection to access APIs (OpenAI, News API) and fetch
online data.
CHAPTER – 4
METHODOLOGY
4.1 Problem Definition
➢ Objective:
The project aims to create a tool that can summarize lengthy news articles
using ChatGPT.
➢ User Input:
The system allows users to either input articles manually or provide URLs for
automatic extraction.
➢ Customization:
The app should allow users to adjust the length of the summary, offering
options like short, medium, or long summaries.
4.2 Data Collection
➢ News API Integration:
Use external news APIs (e.g., NewsAPI or Google News API) to gather up-to-
date news articles for summarization.
➢ Manual Input and Web Scraping:
Enable users to paste articles directly into the app or input URLs for automatic
scraping of article content using web scraping tools like BeautifulSoup.
➢ Diverse Data Sources:
Consider expanding the app’s capabilities by allowing it to work with a variety
of news websites and handle different formats (e.g., text, multimedia).
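The scraping step described above can be sketched as follows. The report names BeautifulSoup for this task; the sketch below swaps in Python's standard-library html.parser so that it runs without extra packages. The names ParagraphExtractor and extract_article_text are hypothetical, and keeping only the text inside <p> tags is an assumed heuristic that will not suit every news site.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> tags, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []       # finished paragraphs
        self._buffer = []          # text pieces of the paragraph being read
        self._in_paragraph = False
        self._skip_depth = 0       # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_paragraph:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_article_text(html_source):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html_source)
    parser.close()
    return "\n".join(parser.paragraphs)


# Illustrative page: the advertisement <div> and the script are ignored.
sample = (
    "<html><body><div class='ad'>Buy now</div>"
    "<p>First <b>point</b>.</p><script>var x = 1;</script>"
    "<p>Second point.</p></body></html>"
)
print(extract_article_text(sample))
```

In the application, the HTML would first be downloaded from the user's URL, for example with requests.get(url, timeout=10).text, before being passed to extract_article_text.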
4.3 Preprocessing the Data
➢ Text Extraction:
Extract the main content of the news article from a webpage, eliminating
unnecessary elements like ads or sidebars.
➢ Text Cleaning:
Clean the extracted text by removing unwanted symbols, extra spaces, and
HTML tags to prepare the article for effective summarization.
➢ Data Formatting:
Format the cleaned text to ensure it’s compatible with ChatGPT’s token limits
and produces a concise summary.
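A minimal sketch of the cleaning and formatting steps above, using only the standard library. The function names are hypothetical, and the figure of about four characters per token is a common rule of thumb for English text rather than an exact tokenizer count, so the budget check is approximate.

```python
import html
import re

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def clean_text(raw):
    """Strip leftover HTML tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    decoded = html.unescape(no_tags)
    return re.sub(r"\s+", " ", decoded).strip()


def fit_to_token_budget(text, max_tokens):
    """Trim text to an approximate token budget, cutting at a sentence end when possible."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    last_stop = clipped.rfind(". ")
    # Keep the final full stop; fall back to a hard cut if no sentence end is found.
    return clipped[: last_stop + 1] if last_stop > 0 else clipped


raw_article = "<p>Rain &amp;   wind</p>\n<div>hit the coast.</div>"
print(clean_text(raw_article))
print(fit_to_token_budget("One. Two. Three.", max_tokens=3))
```

Cutting at a sentence boundary avoids sending the model a half-finished sentence, which could otherwise be echoed in the summary.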
4.4 ChatGPT Integration
➢ API Setup:
Integrate the OpenAI API to send cleaned article text to ChatGPT and request
summaries.
➢ Prompt Design:
Design clear and concise prompts to guide ChatGPT in generating accurate
summaries based on user input.
➢ Customization of Outputs:
Allow users to control the length and detail of the summary by modifying the
API request with additional parameters.
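The prompt design and output customization described above might look like the sketch below. It only builds the request and does not call the API. The preset wording, the token caps, the temperature, and the default model name "gpt-3.5-turbo" are illustrative assumptions, and build_summary_request is a hypothetical helper.

```python
# Hypothetical mapping from the user's length choice to a prompt instruction
# and an output-token cap; the values are illustrative only.
LENGTH_PRESETS = {
    "short": ("in 2-3 sentences", 120),
    "medium": ("in one short paragraph", 250),
    "long": ("in 3-4 paragraphs", 600),
}


def build_summary_request(article_text, length="medium", model="gpt-3.5-turbo"):
    """Return keyword arguments for a chat-completion request."""
    if length not in LENGTH_PRESETS:
        raise ValueError(f"Unknown summary length: {length!r}")
    instruction, max_tokens = LENGTH_PRESETS[length]
    return {
        "model": model,  # assumed model name; use whichever model the account offers
        "messages": [
            {
                "role": "system",
                "content": "You summarize news articles accurately and neutrally.",
            },
            {
                "role": "user",
                "content": f"Summarize the following news article {instruction}.\n\n{article_text}",
            },
        ],
        "max_tokens": max_tokens,
        "temperature": 0.3,  # assumed value; lower values give more consistent wording
    }


request = build_summary_request("Heavy rain flooded parts of the city on Monday.", "short")
print(request["messages"][1]["content"])

# With the openai package (version 1.x) and an API key configured, the
# request could be sent like this (not executed here):
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(**request)
#   summary = reply.choices[0].message.content
```

Keeping the request builder separate from the network call makes the prompt easy to inspect and test without spending API credits.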
4.5 Building the Streamlit Interface
➢ User Input Interface:
Develop an input section where users can paste articles or URLs, with a text
box or URL field for submission.
➢ Summary Display:
Design an intuitive area to display the generated summary after processing.
➢ Interactive Features:
Provide options for the user to select summary length (e.g., short, medium,
long) and regenerate summaries if desired.
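The interface outlined above could be laid out roughly as follows. This is a sketch under stated assumptions: pick_source and render_app are hypothetical names, and summarize is a hypothetical callback that the caller supplies to turn the chosen input into a summary.

```python
def pick_source(pasted_text, url):
    """Decide which input to use; pasted text wins when both are given."""
    if pasted_text.strip():
        return "text", pasted_text.strip()
    if url.strip():
        return "url", url.strip()
    return "none", ""


def render_app(summarize):
    """Draw the page. `summarize(kind, value, length)` is a hypothetical callback."""
    import streamlit as st  # imported here so pick_source works without Streamlit

    st.title("News Summarizer")
    pasted_text = st.text_area("Paste the article text")
    url = st.text_input("...or enter the article URL")
    length = st.select_slider(
        "Summary length", options=["short", "medium", "long"], value="medium"
    )

    # Pressing the button again re-runs the script, which regenerates the summary.
    if st.button("Summarize"):
        kind, value = pick_source(pasted_text, url)
        if kind == "none":
            st.warning("Please paste an article or enter a URL.")
        else:
            with st.spinner("Summarizing..."):
                summary = summarize(kind, value, length)
            st.subheader("Summary")
            st.write(summary)
```

To try the layout, save the sketch to a file, add a call such as render_app(lambda kind, value, length: value[:200]) at the bottom, and start it with the streamlit run command.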
4.6 Testing and Debugging
➢ API Testing:
Test the OpenAI API integration to ensure that summaries are accurate,
coherent, and relevant to the provided content.
➢ Edge Case Handling:
Implement error handling to manage broken links, excessively long articles, or
unsupported formats.
➢ User Acceptance Testing:
Gather feedback from real users to fine-tune the interface and improve
usability.
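The edge cases listed above can be checked before any API call is made. The sketch below is one possible arrangement: InputError and the three check functions are hypothetical, and the 20,000-character limit is an assumed value that would need tuning to the model in use.

```python
import os
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".txt", ".csv", ".pdf"}  # file types accepted by the uploader in Chapter 7
MAX_ARTICLE_CHARS = 20_000                       # assumed limit; tune to the model in use


class InputError(ValueError):
    """Raised when user input cannot be summarized."""


def check_url(url):
    """Accept only well-formed http(s) addresses."""
    cleaned = url.strip()
    parts = urlparse(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise InputError(f"Not a valid web address: {url!r}")
    return cleaned


def check_file_name(name):
    """Reject uploads whose extension is not supported."""
    extension = os.path.splitext(name)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise InputError(f"Unsupported file type: {name!r}")
    return name


def check_article(text):
    """Reject empty or excessively long articles."""
    if not text.strip():
        raise InputError("The article is empty.")
    if len(text) > MAX_ARTICLE_CHARS:
        raise InputError("The article is too long; please shorten it.")
    return text


for candidate in ("https://example.com/news/1", "not a link"):
    try:
        print("OK:", check_url(candidate))
    except InputError as error:
        print("Rejected:", error)
```

check_url only validates the form of the address. A link that is well-formed but dead would surface when the page is downloaded, so that call would still need its own handling, for example by catching requests.exceptions.RequestException.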
4.7 Optimization
➢ Performance Tuning:
Minimize the number of API requests by caching results for previously
summarized articles to improve response times.
➢ Scalability:
Optimize the system for handling multiple requests simultaneously and
ensuring smooth performance with high traffic.
➢ Resource Management:
Monitor resource usage (e.g., API limits, server load) and optimize the
backend code to reduce unnecessary computational load.
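The caching idea under Performance Tuning can be illustrated with a small in-memory cache. SummaryCache and fake_summarize are hypothetical names used only for this sketch; fake_summarize stands in for the real API call so that the example runs offline.

```python
import hashlib


class SummaryCache:
    """In-memory cache keyed by a hash of the article text and the length option."""

    def __init__(self, summarize):
        self._summarize = summarize  # function(text, length) -> summary
        self._store = {}
        self.api_calls = 0           # counts how often the real summarizer ran

    def _key(self, text, length):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{length}:{digest}"

    def get(self, text, length="medium"):
        key = self._key(text, length)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._summarize(text, length)
        return self._store[key]


# Stand-in for the real API call, used only to demonstrate the cache.
def fake_summarize(text, length):
    return f"[{length}] {text[:20]}"


cache = SummaryCache(fake_summarize)
cache.get("Markets rose sharply today.", "short")
cache.get("Markets rose sharply today.", "short")  # served from the cache
cache.get("Markets rose sharply today.", "long")   # different option, new call
print(cache.api_calls)
```

Hashing the text keeps the keys small even for long articles. In a Streamlit app, a similar effect can be obtained by decorating the summarization function with st.cache_data.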
4.8 Documentation
➢ User Guide:
Provide an easy-to-follow manual explaining how users can input articles,
select summary options, and interpret the results.
➢ Code Documentation:
Include comments and explanations in the code to help other developers
understand the flow and structure of the app.
CHAPTER - 5
EXISTING WORK
5.1 Summarization Models
➢ Abstractive Summarization: This method involves generating new sentences
that capture the meaning of the original content. Models such as GPT-3,
BART, and T5 are widely used for this purpose. These models are capable of
understanding context and rephrasing text, making them well suited to creating
fluent, human-like summaries. GPT-3 in particular can generate summaries
from a short instruction alone, preserving the essence of the article.
➢ Extractive Summarization: Extractive summarization focuses on selecting
key sentences directly from the text. Algorithms like TextRank and LexRank
are commonly used, ranking sentences based on importance and relevance.
These models work by identifying and extracting the most important
information from a given article without generating new content.
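To make the extractive approach concrete, the sketch below ranks sentences with a simplified TextRank-style procedure: sentences are graph nodes, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. It is an illustration only, not the implementation found in any particular library; the function names, the damping factor of 0.85, and the fixed 30 iterations are assumptions.

```python
import math
import re


def split_sentences(text):
    """Split on whitespace that follows a full stop, question mark, or exclamation mark."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Shared-word count normalised by the log sizes of both word sets."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=30):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= top_n:
        return sentences
    words = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour passes on a share of its score proportional to the edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


article = (
    "The city council approved the new transport budget. "
    "The transport budget funds new bus routes in the city. "
    "Critics say the budget ignores cycling. "
    "A local bakery won an award. "
    "The council will review the bus routes next year."
)
for sentence in textrank_summary(article, top_n=2):
    print(sentence)
```

Because the output consists of sentences copied from the article, this kind of summary cannot rephrase or merge ideas, which is the main contrast with the abstractive models described above.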
5.2 ChatGPT Integration
➢ GPT Models for Summarization: ChatGPT, built on OpenAI's GPT-3.5 and later
GPT models, has become a popular tool for text summarization. It can produce
contextually relevant summaries by processing articles through the language model. The
ability to generate coherent and concise summaries with user control over
length and style makes it a powerful tool for news summarization.
➢ Streamlit for UI: Streamlit is a Python framework that allows easy
development of web applications. It is commonly used in conjunction with
GPT-3 to create interactive summarization tools. By integrating Streamlit,
developers can offer users a simple interface where they can input news
articles and receive summaries almost instantly, with the flexibility to adjust
the summary length.
5.3 Web Scraping for Content Extraction
➢ Web Scraping Technologies: Tools like BeautifulSoup and Selenium are
widely used to extract articles from websites. BeautifulSoup works well for
static pages, while Selenium is more suited for dynamic websites that load
content with JavaScript. These technologies allow the automation of content
gathering, making it easier to collect and summarize articles without manual
intervention.
5.4 Existing Summarization Services
➢ Google News & Flipboard: Services like Google News and Flipboard
provide automatic news aggregation and basic summarization. However, these
platforms lack the ability to generate personalized summaries or offer
customization in terms of summary length or depth. They mostly provide
curated content based on user interests but do not allow detailed control over
summarization.
5.5 Open-Source Projects
➢ There are many open-source projects in the field of news summarization that
use models like BERT, BART, and GPT-3. Projects such as BERTSum and
Sumy offer frameworks for building summarization systems. These open-
source tools provide valuable resources for building customized
summarization systems, leveraging pre-trained models to generate accurate
summaries.
CHAPTER – 6
PROPOSED WORK
6.1 Integration of ChatGPT for Abstractive Summarization
➢ The core of the project will leverage ChatGPT (OpenAI's GPT models) for abstractive
summarization. ChatGPT excels in generating fluent, human-like summaries
by understanding the context of the text. This method will ensure that the
summaries are not just extracts but rephrased versions of the original content,
making them more coherent and readable.
➢ The user will be able to input a URL or paste the full content of a news article,
and ChatGPT will generate a brief yet comprehensive summary, reducing the
time and effort needed to read lengthy articles.
6.2 Interactive User Interface with Streamlit
➢ The project will use Streamlit to develop the user interface (UI). Streamlit is a
powerful framework that allows easy creation of interactive web applications.
The user interface will be simple, intuitive, and responsive, enabling users to
easily submit articles for summarization.
➢ Streamlit will provide options for users to adjust the length of the summary
(e.g., short, medium, or detailed summaries) and to specify which type of
summary they prefer—whether focusing on key facts, general overview, or
specific themes.
6.3 Web Scraping for Content Extraction
➢ To provide flexibility and automate content extraction, the project will
incorporate web scraping techniques. Using tools like BeautifulSoup and
Selenium, the system will allow users to simply enter a URL, and the tool will
automatically extract the article's content.
➢ The web scraper will handle dynamic web pages that load content with
JavaScript, making the solution adaptable to a wide range of news websites.
6.4 Personalization and Customization Features
➢ Users will have the option to tailor the summarization process by adjusting
parameters such as summary length, tone, and focus area. For example, they
may choose a more formal or casual tone for the summary or request a focus
on specific information (e.g., sports, politics).
➢ The project will also incorporate feedback mechanisms, allowing users to rate
the summary quality, helping improve future outputs.
6.5 Support for Multiple Languages
➢ Given the global nature of news consumption, the summarizer will be
designed to support multiple languages. Users will be able to input articles in
different languages, and the model will attempt to generate summaries in the
same language.
➢ Multilingual support will be integrated into the project by utilizing the
capabilities of the underlying GPT model, which can process many languages.
6.6 Real-Time Summarization
➢ The system will provide near real-time summarization, meaning that users can get
their summaries shortly after submitting the article content or URL. This will
make it easy for users to quickly access news summaries without having to
wait through long processing times.
6.7 Scalability and Future Enhancements
➢ The proposed solution will be designed to handle a growing number of users
and requests efficiently. Future enhancements could include integrating more
sophisticated machine learning models for better summarization quality or
adding support for multimedia content like videos and podcasts.
➢ The system could also incorporate additional AI-driven features, such as
sentiment analysis, to provide users with insights into the tone or mood of the
news article.
6.8 Flow Chart
CHAPTER – 7
CODE & OUTPUT
7.1 News Summarizer Using ChatGPT (Code):
import os

import fitz  # PyMuPDF, used to read PDF files
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv

# Load the API key from a local .env file and configure the Gemini client.
load_dotenv()
genai.configure(api_key=os.environ["gemini_api_key"])

generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    generation_config=generation_config,
)


def summarize_text(text):
    """Send the document text to the model and return its summary."""
    # The chat history primes the model with the summarization instruction.
    chat_session = model.start_chat(
        history=[
            {
                "role": "user",
                "parts": [
                    "summarize the document",
                ],
            },
        ]
    )
    response = chat_session.send_message(text)
    return response.text


def extract_text_from_pdf(uploaded_file):
    """Return the text of every page of an uploaded PDF."""
    pdf_text = ""
    pdf_document = fitz.open(stream=uploaded_file.read(), filetype="pdf")
    for page_num in range(pdf_document.page_count):
        page = pdf_document[page_num]
        pdf_text += page.get_text()
    pdf_document.close()
    return pdf_text


def main():
    st.title("News Summarizer")
    uploaded_file = st.file_uploader(
        "Upload News as Document (txt, csv, pdf)", type=["txt", "csv", "pdf"]
    )
    if uploaded_file is not None:
        file_content = ""
        # Plain-text and CSV uploads are decoded the same way.
        if uploaded_file.type in ("text/plain", "text/csv"):
            file_content = uploaded_file.read().decode("utf-8")
        elif uploaded_file.type == "application/pdf":
            file_content = extract_text_from_pdf(uploaded_file)
        st.subheader("Original News")
        st.write(file_content)
        if st.button("Summarize News"):
            with st.spinner("Summarizing the News..."):
                summary = summarize_text(file_content)
            st.subheader("Summarized News")
            st.write(summary)


if __name__ == "__main__":
    main()
7.2 Output :
CHAPTER - 8
FUTURE WORK & CONCLUSION
8.1 Future Work
➢ Enhanced Summarization Models: The current implementation leverages
ChatGPT for summarization. Future versions could explore integrating more
specialized models or fine-tuning ChatGPT on domain-specific news articles
to improve the quality and relevance of summaries.
➢ Multilingual Support: As the demand for news summarization grows
globally, adding multilingual capabilities would allow the tool to summarize
articles in multiple languages. This could be particularly useful for
international users, helping bridge the language gap in news dissemination.
➢ Sentiment Analysis: Future work could incorporate sentiment analysis to not
only summarize the news but also classify the tone (e.g., positive, negative,
neutral) of the content. This would add more value by allowing users to
understand the sentiment behind the article quickly.
➢ Real-time News Integration: Currently, the summarization is based on user-
uploaded or pasted content. A more advanced version could pull news articles
from live RSS feeds or news APIs to provide real-time summaries. This
feature would help users stay updated with the latest news efficiently.
➢ Customization of Summaries: More customization options, such as adjusting
summary length or focusing on specific sections (e.g., politics, economics),
could be implemented. This would give users control over the depth of
information they receive.
➢ Mobile Application Integration: The system could be developed into a
mobile application to make it more accessible and user-friendly. This would
allow users to easily summarize news articles on the go.
➢ Improved User Interface: While Streamlit provides a functional UI, future
work could involve building a more polished, interactive user interface that
provides better user experience (UX) with features like drag-and-drop for
article uploading, progress indicators, and more intuitive feedback systems.
➢ Artificial Intelligence for Content Prioritization: The project could be
expanded to analyze large volumes of content and prioritize articles based on
relevance, user preferences, or trending topics. This would make the
summarizer more effective in delivering curated content.
8.2 Conclusion :
The News Summarizer using ChatGPT project serves as an innovative solution
to address the growing need for efficient, concise information dissemination. By
leveraging the capabilities of ChatGPT, the system is able to generate accurate
and concise summaries of lengthy news articles, making it easier for users to stay
informed without investing time in reading entire articles.
Through its simple interface using Streamlit, the project demonstrates the
potential of AI-driven summarization technologies to improve productivity and
information accessibility. The integration of natural language processing with
tools like ChatGPT proves to be highly effective in generating human-like
summaries that maintain the essence of the original content.
However, the system is not without limitations, and there are several areas where
it can be improved, such as incorporating multilingual support, real-time news
feeds, and more customization options. As the project evolves, it could play a
crucial role in transforming how we consume news and stay updated with global
developments.
In summary, this project showcases the power of AI in transforming traditional
information consumption methods, providing users with quick, reliable, and
relevant summaries that help them make informed decisions in an increasingly
fast-paced world. With continuous improvements, this news summarizer could
significantly enhance how we interact with and absorb information.