WhatsApp Chat Analysis Report

Uploaded by

Vishwajit Singh
Copyright
© © All Rights Reserved
A

Project Report on

WhatsApp Chat Analysis


Submitted in partial fulfillment for the award of

Bachelor of Technology in
Artificial Intelligence and Data Science

Submitted to

Bikaner Technical University, Bikaner (Raj.)


Submitted by
Raj Acharya (20EEBAD048)
Vikram Vaishnav (20EEBAD058)
Tushar Bapna (20EEBAD055)

Under the supervision of

Project Guide: Ms. Anita Chandel, Professor
Head of Department: Dr. Ajay Choudhary, Professor
WHATSAPP CHAT ANALYSIS
A Project Report Submitted
In Partial Fulfillment of the Requirement for the Degree of

BACHELOR OF TECHNOLOGY
In
Artificial Intelligence & Data Science
Year IV, Semester VIII
2023-24
Paper No 8AD7-0
By

Raj Acharya
Vikram Vaishnav
Tushar Bapna

MENTORS AND GUIDES

Ms. Anita Chandel


Ms. Jyoti Bhati

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE


ENGINEERING COLLEGE BIKANER
[BIKANER TECHNICAL UNIVERSITY,
BIKANER] BIKANER, RAJASTHAN
CERTIFICATE
This is to certify that the work embodied in this Project entitled “WhatsApp Chat Analysis” is being submitted by Raj
Acharya, Vikram Vaishnav and Tushar Bapna in partial fulfillment of the requirement for the award of Bachelor of
Technology in Artificial Intelligence & Data Science to Bikaner Technical University, Bikaner (Raj.) during the
academic year 2023-24. This is a record of a bonafide piece of work, carried out by them under our supervision and
guidance in the Department of Artificial Intelligence and Data Science, ECB, Bikaner.

Project Guide: Ms. Anita Chandel, Professor
Head of Department: Dr. Ajay Choudhary, Professor
CANDIDATE DECLARATION

This is to certify that the work embodied in this Project entitled “WhatsApp Chat Analysis” being submitted by Raj
Acharya, Vikram Vaishnav and Tushar Bapna in partial fulfillment of the requirement for the award of Bachelor of
Technology in Artificial Intelligence & Data Science to Bikaner Technical University, Bikaner (Raj.) during the
academic year 2023-24 is a record of bonafide work carried out by them under our supervision and guidance in the
Department of Artificial Intelligence and Data Science, ECB, Bikaner.

Raj Acharya (20EEBAD048)
Vikram Vaishnav (20EEBAD058)
Tushar Bapna (20EEBAD055)

Department of Artificial Intelligence and Data Science,


Government Engineering College Bikaner, Bikaner
ACKNOWLEDGEMENT

I extend my heartfelt gratitude to all those who have contributed to the completion of this project on
WhatsApp chat analysis.

First and foremost, I would like to express my sincere appreciation to Ms. Anita Chandel, whose
guidance, support, and expertise have been invaluable throughout this journey. Their insightful feedback,
encouragement, and mentorship have played a pivotal role in shaping the direction of this project.

We would like to thank our Head of the Department, Dr. Ajay Choudhary, for his valuable suggestions towards
formulating the problem statement and planning the work. We also thank him for motivating us at appropriate
stages of the project work and for being a critic of this work.

I would also like to acknowledge the contributions of my peers and colleagues who provided assistance,
feedback, and constructive criticism at various stages of the project. Their collaboration and insights have
enriched the quality of our work.

Special thanks are due to the professionals and domain experts who generously shared their
knowledge, insights, and domain-specific expertise, which proved instrumental in refining the
functionalities and capabilities of this project.

Last but not least, I am grateful to my family and friends for their unwavering support, understanding,
and encouragement throughout this endeavor. Their belief in my abilities has been a constant source of
motivation and inspiration.
Abstract

WhatsApp Chat Analysis is an emerging field that leverages data science and natural language processing
techniques to extract and interpret information from WhatsApp chat logs. Given the platform's extensive global
user base and the vast volume of daily interactions, this analysis provides significant insights into communication
patterns, user behavior, and social dynamics. This study aims to explore various dimensions of WhatsApp chat
data, including message frequency, participant activity levels, peak communication times, common words and
phrases, and the sentiment expressed in messages.

The methodology involves exporting and preprocessing chat data, followed by detailed analysis using Python and
libraries such as Pandas, NLTK, and Matplotlib. Key findings include identifying the most active participants,
visualizing peak messaging hours, and generating word clouds to depict commonly used terms. Sentiment analysis
is conducted to gauge the overall mood of the conversations, revealing interesting trends in emotional expression.

The results of this analysis have broad applications, from enhancing customer service and marketing strategies in
business to supporting social science research and mental health monitoring. This abstract underscores the
potential of WhatsApp Chat Analysis as a valuable tool for gaining deeper insights into digital communication,
paving the way for more informed decision-making across various domains.
Contents
1 Introduction
1.1 Problem Statement and Objective
1.2 Objective
2 Prerequisites for Data Analysis
3 How to Export Chat from WhatsApp?
4 Data Analysis: Project Development Steps
4.1 Data Analysis: Create Data Frame from Chat File
4.2 Separate the Message and User Name
4.3 Breaking the Date Column into Different Columns
5 Data Analysis: Display Basic Statistics for Data Analysis
5.1 Get the Total Number of Messages
5.2 Get the Total Number of Words
5.3 Get the Number of Media Messages
5.4 Get the Total Number of Links Shared
6 Data Analysis: Find the Busiest Users in Group
6.1 Display Top Words in a Chat
6.2 Find the Top 20 Most Common Words
7 Emoji Analysis
8 Time-based Analysis
8.1 Monthly Chats Timeline
8.2 Daily Timeline
8.3 Day-based Activity Map
8.4 Monthly Activity Map
9 Which Time User Remains Active?
10 Creating Streamlit Web App after Data Analysis
10.1 Preprocessor.py
10.2 Helper.py
11 Running the App on your localhost
12 Prepare Cloud Files
12.1 Procfile
12.2 Requirements
12.3 Setup File
13 Conclusion
14 Resources
1. Introduction

In today's digital age, instant messaging applications have revolutionized the way we communicate, with
WhatsApp being one of the most prominent platforms globally. As of 2024, WhatsApp boasts over 2 billion active
users, facilitating a staggering amount of daily communication through text messages, voice notes, images, videos,
and other multimedia formats. This vast and continuously growing repository of data offers a unique opportunity
for in-depth analysis, commonly referred to as WhatsApp Chat Analysis.

WhatsApp Chat Analysis involves extracting and examining data from chat logs to uncover patterns, trends, and
insights about user behavior and communication dynamics. By analyzing chat data, one can gain valuable insights
into various aspects of social interactions, such as the frequency and timing of messages, the nature of the content
shared, and the overall sentiment expressed in conversations.

The primary objectives of WhatsApp Chat Analysis include understanding communication patterns among
participants, identifying the most active users, determining peak messaging times, and analyzing the types of
content shared. Additionally, sentiment analysis plays a crucial role in assessing the emotional tone of the
conversations, providing a deeper understanding of the mood and engagement levels within the chat.

This analysis is not only beneficial for personal insights but also has broader applications across multiple domains.
For businesses, it can enhance customer service by analyzing feedback and interactions. For social scientists, it
offers a window into studying human behavior and social dynamics. In the healthcare sector, it can be used to
monitor patient communication and support mental health initiatives.

To conduct a comprehensive WhatsApp Chat Analysis, various tools and techniques from the fields of data
science, natural language processing (NLP), and machine learning are employed. These include data preprocessing
to clean and structure the chat logs, statistical analysis to quantify communication patterns, and visualization
techniques to present the findings in an accessible manner. Advanced NLP techniques further allow for sentiment
analysis and the identification of key themes and topics within the chats.

In summary, WhatsApp Chat Analysis serves as a powerful tool to decode the rich data encapsulated in everyday
digital conversations, offering a wealth of insights that can be applied in diverse fields ranging from customer
service to social research and beyond. Through meticulous analysis of chat data, we can enhance our understanding
of communication behaviors and leverage this knowledge for various practical applications.

1.1 Problem Statement and Objective

Almost everyone today uses WhatsApp for daily conversation. It has also become one of the biggest business
engines: many e-commerce businesses use it to share product designs and product details,
accept orders, handle money transactions, and a lot more. Yet despite this business use,
WhatsApp does not offer analytics support with which people or businesses can analyze their monthly or
daily activity to see where they are lacking, what customers demand, how sales and marketing are doing,
how active group members are, and so on.

To address this, we aim to develop a complete interface where users
can upload a WhatsApp chat exported from WhatsApp in text format. It will provide
users with two options to study the chat: overall analysis or user-level analysis. On submitting the chat,
the engine will display a complete report with interactive graphs that are easy to understand, so the user
can get an in-depth idea of how the business over WhatsApp is performing. The report will include
the following analyses of the chat.

1. Total number of messages
2. Total words
3. Number of media and links shared
4. Monthly and daily timeline: chat activity on a daily basis and on a monthly basis
5. Busiest day and month: which day of the week and which month of the year have the most
conversations
6. Weekly activity map
7. Busiest users
8. Top and common words in conversation
9. Emoji analysis

2. Prerequisites for Data Analysis

• Python: You should be familiar with Python basics and syntax.
• Pandas: A Python library used to preprocess the data. We work with a dataframe, so
we will need some of the processing functions of Pandas. It is also used for data analysis.
• Matplotlib: A Python library for data visualization and data analysis.
• Streamlit: A Python-based UI framework used for creating the web application without HTML or
CSS. The basics of Streamlit are sufficient to understand the syntax. Please refer to this article if
you do not know about Streamlit or want to explore it.

3. How to Export Chat from WhatsApp?

The WhatsApp chat is the key data for our analysis. To get the text file of a chat, follow the
simple 3-step process below. We work with a 24-hour date-time format, so before exporting and
uploading the chat file, switch the date and time setting to the 24-hour format.

1. Open any WhatsApp group or individual chat you want to analyze.
2. Tap the three dots in the top right corner and tap More.
3. Find the Export chat option, tap it, and select Without media.

You can now share or download the text file of the chat.

4. Data Analysis: Project Development Steps

With the theoretical explanation of the project done, it is time for development. Before
developing, we need to keep our steps clear, as listed below.

1. Export one WhatsApp chat and create a new Jupyter notebook or Google Colab notebook.
2. Load the text file and convert the chat to a dataframe.
3. Create an analytics function to meet each objective.
4. Create a Streamlit app that integrates each function to display our analysis.
5. Deploy the app to the cloud so that people can get an analysis of any chat.

4.1. Data Analysis: Create Data Frame from Chat File

We need to create a dataframe from the text file containing the WhatsApp chat. The first column will
contain the user name together with the message, and the second column will contain the message’s date.
First, we load the file in read mode. Next, we have to separate the messages from the dates, so
we use a regular expression (regex) that matches the date prefix of each entry. Splitting the text on
this pattern yields the messages, and finding all matches of the same pattern yields the dates.
Below is the code snippet with comments to better understand each
statement.
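
The original snippet did not survive in this copy. The following is a minimal sketch of the split-and-find approach described above, not the authors' code. It assumes each entry starts with a prefix of the form "dd/mm/yy, HH:MM - " (the export format varies with phone locale), and the names DATE_PATTERN and chat_to_dataframe are hypothetical.

```python
import re
import pandas as pd

# Assumed prefix of every entry in a 24-hour export: "dd/mm/yy, HH:MM - ".
DATE_PATTERN = r"\d{1,2}/\d{1,2}/\d{2},\s\d{1,2}:\d{2}\s-\s"

def chat_to_dataframe(raw_text: str) -> pd.DataFrame:
    # Splitting on the date prefix leaves the "Name: message" bodies. The
    # first piece is whatever precedes the first date, so it is dropped.
    messages = re.split(DATE_PATTERN, raw_text)[1:]
    # findall returns the date prefixes themselves, in the same order.
    dates = re.findall(DATE_PATTERN, raw_text)
    df = pd.DataFrame({"user_message": messages, "message_date": dates})
    df["user_message"] = df["user_message"].str.strip()
    # Parse the prefix (including its trailing " - ") into a datetime.
    df["date"] = pd.to_datetime(df["message_date"], format="%d/%m/%y, %H:%M - ")
    return df.drop(columns="message_date")

sample = (
    "12/03/23, 09:15 - Asha: Good morning\n"
    "12/03/23, 21:40 - Ravi: See https://example.com\n"
)
chat_df = chat_to_dataframe(sample)
```

In a notebook, raw_text would come from reading the exported file, for example with open(path, encoding="utf-8").read(). If the export uses a different date layout, both the pattern and the format string need to change together.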

4.2. Separate the Message and User Name

We have a dataframe, but the user name and the message are still in a single column. To
separate them, we split each string at the first colon using a regex, then
take the part before the colon as the name and the part after it as the message.
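
A minimal sketch of this split, assuming a user_message column as described in section 4.1. The function name and the "group_notification" placeholder for entries without a "Name: " prefix are assumptions made for illustration.

```python
import re
import pandas as pd

def split_user_and_message(df: pd.DataFrame) -> pd.DataFrame:
    users, messages = [], []
    for entry in df["user_message"]:
        # Split once on the leading "Name: " prefix. With a capturing group,
        # a successful split returns ["", name, message].
        parts = re.split(r"^([^:]+?):\s", entry, maxsplit=1)
        if len(parts) == 3:
            users.append(parts[1])
            messages.append(parts[2])
        else:
            # No "Name: " prefix: treat the entry as a group notification.
            users.append("group_notification")
            messages.append(entry)
    out = df.drop(columns="user_message").copy()
    out["user"] = users
    out["message"] = messages
    return out

raw = pd.DataFrame({
    "user_message": ["Asha: Good morning", "Ravi added Meena"],
    "date": pd.to_datetime(["2023-03-12 09:15", "2023-03-12 09:20"]),
})
result = split_user_and_message(raw)
```

A notification that itself contains ": " would be misread as a user message by this pattern, so real chats may need an extra check.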

4.3. Breaking the Date Column into Different Columns

We will break the date column down into multiple columns (year, month, day, hour, and minute)
for easier analysis.

Now that the dataframe is ready, we can start our analysis and showcase each result with interactive
graphs. We also need to add a period column representing the hour slot in which a message was
recorded, that is, between which hour and which hour. When we create the Streamlit app, we will provide an
option for overall analysis or user-level analysis; the dataframe will be filtered accordingly, so the
same functions work in both cases.
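
The column breakdown could look like the sketch below. It assumes a datetime column named date; the function name and the "9-10" label style for the period column are illustrative choices, not taken from the report.

```python
import pandas as pd

def add_time_columns(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month_name()
    out["day"] = out["date"].dt.day
    out["day_name"] = out["date"].dt.day_name()
    out["hour"] = out["date"].dt.hour
    out["minute"] = out["date"].dt.minute
    # "period" labels the hour slot of a message, e.g. "9-10".
    # Hour 23 wraps around to "23-0".
    out["period"] = [f"{h}-{(h + 1) % 24}" for h in out["hour"]]
    return out

frame = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-12 09:15", "2023-03-13 23:05"]),
    "message": ["Good morning", "Good night"],
})
timed = add_time_columns(frame)
```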

5 Data Analysis: Display Basic Statistics for Data Analysis

We first provide an overview of the chat that includes the total number of messages, words, and media
shared, to give an idea of how much conversation has taken place.

5.1. Get the Total Number of Messages

To find the total number of messages, you only need to count the rows in the message
column, which equals the number of rows in the dataframe. To find the number of messages of a particular
user, filter the dataframe to that user first; the same count then gives the correct
result.

5.2. Get the Total Number of Words

To find the total number of words, loop over the message column, count the number of words in
each message, and sum the counts.
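
Both counts fit in one small function. This is a sketch under the assumption that the dataframe has user and message columns; the label "Overall" for the unfiltered case is a hypothetical convention.

```python
import pandas as pd

def message_and_word_counts(df: pd.DataFrame, selected_user: str = "Overall"):
    # Filter to one user unless the overall analysis was requested.
    if selected_user != "Overall":
        df = df[df["user"] == selected_user]
    num_messages = df.shape[0]
    # Whitespace-separated tokens are counted as words.
    num_words = sum(len(message.split()) for message in df["message"])
    return num_messages, num_words

chat = pd.DataFrame({
    "user": ["Asha", "Ravi", "Asha"],
    "message": ["Good morning everyone", "Hello", "See you at five"],
})
overall = message_and_word_counts(chat)
asha_only = message_and_word_counts(chat, "Asha")
```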

5.3. Get the Number of Media Messages

We selected the without-media option when we exported the chat data to a text file. In the
text file, each media item is therefore replaced by the placeholder text "media omitted". To count the
number of media files shared, we count the messages that consist of this placeholder.
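
A minimal sketch of the media count. It assumes the placeholder appears in the export exactly as "<Media omitted>"; check your own export file, since the wording can differ by app language.

```python
import pandas as pd

# Assumed placeholder text written by WhatsApp for each omitted media item.
MEDIA_PLACEHOLDER = "<Media omitted>"

def count_media(df: pd.DataFrame) -> int:
    # A message counts as media when it consists only of the placeholder.
    return int((df["message"].str.strip() == MEDIA_PLACEHOLDER).sum())

media_chat = pd.DataFrame({
    "user": ["Asha", "Ravi", "Asha"],
    "message": ["<Media omitted>", "Nice photo", "<Media omitted>\n"],
})
media_total = count_media(media_chat)
```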

5.4. Get the Total Number of Links Shared

To count the links, Python has a URL extractor library that returns
all the URLs in a given string as a list. We find all the URLs in each message and sum
their counts.
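
To keep the example free of third-party dependencies, the sketch below swaps in a simple regular expression instead of the URL extractor library the text mentions. The pattern is an assumption and only recognizes links that start with http://, https://, or www.

```python
import re

# Simplified URL pattern (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def count_links(messages) -> int:
    # Sum the number of URL-like matches over all messages.
    return sum(len(URL_PATTERN.findall(message)) for message in messages)

link_total = count_links([
    "See https://example.com and www.example.org",
    "no links here",
    "http://example.net/page",
])
```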

6 Data Analysis: Find the Busiest Users in Group

This statistic applies only to group-level analysis and is not meaningful at the user level. We find
the top 5 users who have sent more messages than the others. To find them, count the
number of messages sent by each user, sort the counts in descending order, and take the top five. You do
not need to code these steps by hand, because Pandas has a function that does this directly.
We display the result as a bar graph.
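
One Pandas function that counts and sorts in a single call is Series.value_counts; whether the authors used it is an assumption. A minimal sketch with a hypothetical function name:

```python
import pandas as pd

def busiest_users(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    # value_counts counts messages per user and sorts in descending order.
    return df["user"].value_counts().head(top_n)

group_chat = pd.DataFrame({
    "user": ["Asha", "Ravi", "Asha", "Meena", "Asha", "Ravi"],
})
top_two = busiest_users(group_chat, top_n=2)
```

The index of the returned Series holds the user names and its values hold the counts, so the two can be passed as the x and y data of a bar chart.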

Along with the bar graph, we also display the percentage of messages
each user has sent. To find the percentage, divide the count of each
user by the total number of messages and multiply by 100. We then round
the value to 2 decimal places and convert the result to a dataframe with renamed
columns.
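
The percentage table can be sketched as follows. The output column names name and percent are illustrative, not taken from the report.

```python
import pandas as pd

def user_share(df: pd.DataFrame) -> pd.DataFrame:
    # Messages per user divided by all messages, as a rounded percentage.
    share = (df["user"].value_counts() / df.shape[0] * 100).round(2)
    # Turn the Series into a two-column dataframe with readable names.
    return share.rename_axis("name").reset_index(name="percent")

members = pd.DataFrame({
    "user": ["Asha", "Ravi", "Asha", "Meena", "Asha", "Ravi"],
})
share_df = user_share(members)
```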

6.1. Display Top Words in a Chat

We will display a word cloud of the words used most frequently in the chat: the higher the
frequency of a word, the larger it is drawn in the word
cloud. The word cloud is generated from the message column, and Python has a word cloud
library for this purpose, which is mainly used in text mining.

We must clean the data a bit to find the most frequent words. If you want to see the stop
words and the problems listed below in the data, run the code once without the transformations
and then again with them to observe the difference.

• Remove group notifications: The chat contains many group notifications that are not user
messages, so we need to remove them.
• Remove media omitted: A lot of media is shared, and because we omitted the
media files, they appear as placeholder text, which needs to be removed.
• Remove stop words: Our WhatsApp chat data is in Hindi plus English because
Indian users often type in both languages. The Python stop words library supports only
English, so if your chats are only in English, that is enough. Otherwise, you can
download this file, which has stop words in both languages. You can adapt it to
your chats. You can also remove punctuation or other frequently used variants; for
example, some people write ‘Hi’ as ‘Hie.’

• The remove stopwords function loads the stop words file and, for each message,
checks whether each word in the message is in the stop words list. Words that are
found are excluded, and all remaining words in the message are kept.
• The remove punctuation function checks for any kind of punctuation and removes
it. The string module of Python provides the list of punctuation characters, which we
replace with an empty string using a regex.
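
The cleaning steps above can be sketched with the standard library alone. The tiny STOP_WORDS set, the "group_notification" user label, and the "<Media omitted>" placeholder are assumptions made for illustration; a real run would load a full Hindi and English stop-word file.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); load a real list in practice.
STOP_WORDS = {"the", "is", "a", "to", "hai", "ka"}
MEDIA_PLACEHOLDER = "<Media omitted>"

def remove_punctuation(text: str) -> str:
    # string.punctuation lists the ASCII punctuation characters; the regex
    # character class replaces each of them with an empty string.
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text)

def remove_stop_words(text: str) -> str:
    # Keep only the lowercased words that are not in the stop-word set.
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)

def clean_messages(rows):
    """rows: iterable of (user, message) pairs."""
    cleaned = []
    for user, message in rows:
        # Skip group notifications and media placeholders entirely.
        if user == "group_notification" or message.strip() == MEDIA_PLACEHOLDER:
            continue
        text = remove_stop_words(remove_punctuation(message))
        if text:
            cleaned.append(text)
    return cleaned

cleaned_rows = clean_messages([
    ("Asha", "Hello, the meeting is at 5!"),
    ("group_notification", "Ravi added Meena"),
    ("Ravi", "<Media omitted>"),
    ("Ravi", "Kal ka plan kya hai?"),
])
```

Joining the cleaned messages with spaces gives a single string, which is the usual input for a word cloud generator.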

6.2. Find the Top 20 Most Common Words

This question is similar to the previous one, but the goal is different:
we have to find the top 20 most frequently used words, excluding stop words. Stop words
are words that help form a sentence but carry no specific
meaning for the context. For this, we write some custom code that builds a
dictionary with each word as a key and its count across all messages as the value. After preparing
the dictionary, we can pick the words with the highest frequency. The data must be
cleaned first, as in the previous step.
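
A minimal sketch of the word-count dictionary, using collections.Counter from the standard library as the dictionary. The stop-word set shown is a small stand-in, and the function name is hypothetical.

```python
from collections import Counter

# Small stand-in stop-word set (assumption).
STOP_WORDS = {"the", "is", "a", "to"}

def most_common_words(messages, top_n: int = 20):
    counts = Counter()
    for message in messages:
        for word in message.lower().split():
            if word not in STOP_WORDS:
                # Counter behaves like a dict mapping word -> count.
                counts[word] += 1
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(top_n)

top_words = most_common_words(["The plan is good", "good plan", "Good"], top_n=2)
```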

7 Emoji Analysis

Expressions are part of the body language we use to convey a message to another person. While
chatting, we use different emojis to express different feelings. We will analyze which
emojis are used in a chat and how many times.

To count each emoji, you need to install a library named emoji. After this, the
code is very simple: first we find the emojis in each message and store them in a list, and
then we count the occurrences of each emoji.
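
The sketch below follows the same find-then-count steps but, to stay dependency-free, replaces the emoji library with a simplified check on Unicode code point ranges. The ranges are an assumption and do not cover every emoji; multi-character sequences such as flags or skin-tone variants are counted per code point.

```python
from collections import Counter

# Simplifying assumption: code points in these blocks are treated as emoji.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

def is_emoji_char(ch: str) -> bool:
    return any(low <= ord(ch) <= high for low, high in EMOJI_RANGES)

def emoji_counts(messages):
    # Step 1: collect every emoji character from every message in one list.
    found = [ch for message in messages for ch in message if is_emoji_char(ch)]
    # Step 2: count occurrences, most frequent first.
    return Counter(found).most_common()

GRIN = "\U0001F600"       # grinning face
THUMBS_UP = "\U0001F44D"  # thumbs up
emoji_table = emoji_counts([f"Nice {GRIN}", f"{GRIN}{GRIN} ok {THUMBS_UP}"])
```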

8 Time-based Analysis

Now we will do a time-based analysis, with the timeline on the x-axis and the number of
messages on the y-axis, to show when users were
more active. We will display it on a monthly and on a daily basis.
8.1. Monthly Chats Timeline

We will display a line chart of the number of messages per month of each year.
For this, we count the messages after grouping them by the year and month columns. To
plot the chart, we combine the month and year columns into a single label.
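
A minimal sketch of the monthly grouping, assuming date and message columns. The helper column month_num is an added assumption; it keeps the months in calendar order, because grouping by month name alone would sort alphabetically.

```python
import pandas as pd

def monthly_timeline(df: pd.DataFrame) -> pd.DataFrame:
    keyed = df.assign(
        year=df["date"].dt.year,
        month_num=df["date"].dt.month,   # numeric month, used for ordering
        month=df["date"].dt.month_name(),
    )
    timeline = (
        keyed.groupby(["year", "month_num", "month"])["message"]
        .count()
        .reset_index()
    )
    # Combine month and year into one label for the x-axis, e.g. "January-2023".
    timeline["time"] = timeline["month"] + "-" + timeline["year"].astype(str)
    return timeline[["time", "message"]]

log = pd.DataFrame({
    "date": pd.to_datetime(["2022-12-30", "2023-01-02", "2023-01-15"]),
    "message": ["a", "b", "c"],
})
monthly = monthly_timeline(log)
```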

8.2. Daily Timeline

Similarly, we can create a daily timeline by grouping the data by date and
counting the number of messages. A line chart is well suited to display this analysis.
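
The daily version groups on the calendar date only. This is a sketch with hypothetical column names only_date and messages.

```python
import pandas as pd

def daily_timeline(df: pd.DataFrame) -> pd.DataFrame:
    # Strip the time of day so that all messages of one day share a key.
    keyed = df.assign(only_date=df["date"].dt.date)
    return keyed.groupby("only_date")["message"].count().reset_index(name="messages")

day_log = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02 09:00", "2023-01-02 18:30", "2023-01-03 07:45"]),
    "message": ["a", "b", "c"],
})
daily = daily_timeline(day_log)
```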

8.3. Day-based Activity Map

This analysis finds the day of the week with the highest number of messages, in other words,
the busiest day of the week.

8.4. Monthly Activity Map

Find the month in which the most messages were sent. This is the same computation as above,
applied to months to find the busiest month of the year.
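
Both activity maps (by weekday and by month) reduce to counting values of a derived column. A minimal sketch, assuming a datetime column named date; the function name is hypothetical.

```python
import pandas as pd

def activity_maps(df: pd.DataFrame):
    # Messages per weekday name and per month name, busiest first.
    by_weekday = df["date"].dt.day_name().value_counts()
    by_month = df["date"].dt.month_name().value_counts()
    return by_weekday, by_month

activity = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-12 09:15", "2023-03-12 10:00", "2023-04-10 08:00"]),
})
by_weekday, by_month = activity_maps(activity)
```

The first index entry of each Series is the busiest weekday or month, and each Series can be drawn as a bar chart.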

9 Which Time User Remains Active?

This is an interesting analysis that shows at what times over the 24 hours of a day a user is
most active and when they are offline. A heatmap is a good graph for this analysis. Black
cells show that the user was offline, and colored cells show that the
user was online. It helps businesses decide at what time to post an advertisement so as to expect
more feedback and clicks within a short span of time and rank their ad.
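
The table behind such a heatmap can be sketched as a pivot of weekday against hour slot. The column names and the "9-10" period labels are assumptions carried over from the earlier sketches; the drawing step itself is left out.

```python
import pandas as pd

def activity_heatmap_table(df: pd.DataFrame) -> pd.DataFrame:
    keyed = df.assign(
        day_name=df["date"].dt.day_name(),
        hour=df["date"].dt.hour,
    )
    keyed["period"] = [f"{h}-{(h + 1) % 24}" for h in keyed["hour"]]
    # Rows are weekdays, columns are hour slots, cells are message counts.
    # Slots with no messages become 0, which a heatmap shows as "offline".
    return keyed.pivot_table(
        index="day_name", columns="period", values="message", aggfunc="count"
    ).fillna(0)

heat_log = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-12 09:15", "2023-03-12 09:45", "2023-03-13 23:05"]),
    "message": ["a", "b", "c"],
})
heat_table = activity_heatmap_table(heat_log)
```

A table of this shape can be handed to a heatmap plotting function such as seaborn.heatmap.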

10 Creating Streamlit Web App after Data Analysis

Streamlit is a Python web framework for creating data apps without any knowledge of front-end
technologies (HTML, CSS, and JS), and it is described as the fastest way to build and deploy data apps. It
includes various functions with pre-built HTML elements such as buttons, a sidebar, text boxes, and input
fields. If you do not know about Streamlit, you can review this article.

Create one folder for storing the project files. First, you need to install the libraries required to
create the app. Open the command prompt or Anaconda prompt in the project folder
and run the commands below one by one. After that, create the Python files below
to organize the code, in which we will combine the complete analysis code.

• Preprocessor.py – At the start, we did some data preprocessing, so we will store all
preprocessing steps as functions in a separate file.
• helper.py – We have created different analyses (monthly, weekly, busiest users, etc.),
so we will store the function for each analysis in a helper file.
• app.py – The main web app file, where the Streamlit code is written and where we receive the data.
Using each helper function, we then display our analysis with Streamlit in the
UI.

I hope that you have created the above files. We have done all the analysis and preprocessing in
a Jupyter notebook, so we only need to combine them in a Streamlit file, which I am
providing in the code snippets below, along with

10.1. Preprocessor.py

In this file, we will accept the text file data, and we need to create and return the dataframe with
the respective columns as we have prepared in the beginning.

10.2. Helper.py

In the helper file, we write all the analytics functions, each of which accepts the dataframe and
the selected user as parameters and returns the required results.

11 Running the App on your localhost

We are done with the coding, and you must be wondering how the app will look when it is
served. Open the command prompt in the project directory and run the command
below (streamlit run app.py). After running it, you will get a localhost URL to copy and open in your
browser.

12 Prepare Cloud Files

We have created a WhatsApp chat analyzer that runs on our local server. On the local system, only
we can use the app by running the server; if we want the app to be used by the public and to get
feedback, we need to deploy it on the cloud. A cloud platform such as Heroku allows deploying an
application and makes it reachable through a URL.

To deploy an application on the cloud, we need to provide some details, and to meet the
platform's criteria, we need to create some files that let the cloud understand the application and run it.

12.1. Procfile

Create a file named Procfile, without any extension, which indicates to the cloud which file to run.

12.2. Requirements

Create a file named requirements.txt that lists all the libraries we have used to build
the project, so that the cloud platform installs them before running the application.
You can also pin the version of each library in the file.

12.3. Setup File

It is an important file for the cloud that helps to create the directory structure on the cloud.

13 Conclusion

Hurray! We have developed and deployed a data analysis project that analyzes a WhatsApp
chat at the group and individual level. We started with data ingestion, continued with analysis at
different levels, and finished with cloud deployment. Let us recap the key learning points
of this report.

• We have learned the data analysis life cycle and the machine learning life cycle.
• We have used data visualization charts such as heatmaps, bar graphs, and line charts, and seen their
importance in conveying the data.
• Emojis (body language) play an important role in conversation, and we have learned at a
basic level how to analyze emojis with Python. There are more methods for
analyzing emojis that you can explore on the internet.

14. Resources

Below are the links to get the resource to the code and files for easy access and troubleshooting
of any errors while developing the project.

• Python Notebook for data analysis: Colab
• Streamlit Code Files: GitHub
