SENTIMENT ANALYSIS ON MOVIE REVIEWS
Natural Language Processing UML602
Project Report
BE Third Year, COE
Submitted by:
101603120 Himanshu Dhiman
101603125 Himanshu Pandey
Submitted to:
Dr. Aashima Sharma
Computer Science and Engineering Department
TIET, Patiala
April, 2019
1. INTRODUCTION
Sentiment Analysis means analyzing the sentiment of a given text or document and categorizing it into a specific class (here, positive or negative). By definition, sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is also referred to as opinion mining. It is mostly applied to social media and customer review data.
1.1 Steps involved during sentiment analysis
Figure 1.1
1.2 Libraries used
Natural Language Toolkit (NLTK)
NLTK is a leading platform for building Python programs to work with human language data. It
provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along
with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing,
and semantic reasoning.
2. STEPS OF WORKING
In this project, NLTK’s movie_reviews corpus is used as our labeled training data. The
movie_reviews corpus contains 2,000 movie reviews with sentiment polarity classification. The
two categories for classification are positive and negative, and the movie_reviews corpus already
has the reviews labeled accordingly. The reviews are classified using a supervised classification
technique, in which the classifier is trained with labeled training data.
The figure below depicts the workflow followed during training and testing of the model.
Figure 2.2
2.1 Pre-processing of data
Three different ways are used to pre-process the data to achieve maximum training and testing
accuracy.
2.1.1 Using 2000 most frequently occurring words:
1. Convert movie review data into useful format
2. Remove Stopwords and Punctuation
3. Create word feature using 2000 most frequently occurring words
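The three steps above can be sketched in plain Python (a hedged illustration: the toy reviews, the tiny stop-word list, and N = 5 stand in for the real movie_reviews corpus, NLTK's stop-word list, and the 2,000 most frequent words):

```python
import string
from collections import Counter

# Toy labeled reviews stand in for NLTK's movie_reviews corpus.
reviews = [
    ("a great and touching film , truly great".split(), "pos"),
    ("a dull , boring and bad film".split(), "neg"),
]

stopwords = {"a", "and", "the", "is"}  # tiny stand-in stop-word list

def clean(words):
    # Step 2: drop stop-words and punctuation tokens.
    return [w for w in words if w not in stopwords and w not in string.punctuation]

# Step 3: build the word-feature list from the N most frequent cleaned words.
N = 5
freq = Counter(w for words, _ in reviews for w in clean(words))
word_features = [w for w, _ in freq.most_common(N)]

def document_features(words):
    # One boolean feature per frequent word: is it present in this review?
    present = set(clean(words))
    return {f"contains({w})": (w in present) for w in word_features}
```

Each review is thus reduced to a fixed-length dictionary of presence/absence features, which is the format NLTK's classifiers expect.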
2.1.2 Bag of words feature
1. Create separate lists of positive and negative reviews
2. Shuffle both lists separately and take an equal number of reviews from each
3. Train the classifier and test the model
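The balancing step can be sketched in plain Python (a hedged illustration; the toy review lists and variable names are assumptions standing in for the corpus categories):

```python
import random

# Toy positive and negative review lists (stand-ins for the corpus categories).
pos_reviews = [["good", "movie"], ["great", "plot"], ["loved", "it"]]
neg_reviews = [["bad", "movie"], ["poor", "plot"], ["hated", "it"]]

def bag_of_words(words):
    # Every word in the review becomes a True-valued feature.
    return {w: True for w in words}

# Shuffle each list separately, then take an equal number from each
# so the train/test sets stay balanced between the two classes.
random.seed(0)
random.shuffle(pos_reviews)
random.shuffle(neg_reviews)
n = min(len(pos_reviews), len(neg_reviews))
feature_sets = ([(bag_of_words(r), "pos") for r in pos_reviews[:n]]
                + [(bag_of_words(r), "neg") for r in neg_reviews[:n]])
```

Shuffling each class separately before combining guarantees both classes are equally represented, which a single shuffled list does not.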
2.1.3 n-gram feature
1. Create separate lists of positive and negative reviews
2. Define two feature-extraction functions:
bag_of_words: extracts only unigram features from the movie review words
bag_of_ngrams: extracts only bigram features from the movie review words
3. Define another function, bag_of_all_words, that combines both unigram and bigram features
4. Train the classifier and test the model
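The three functions can be sketched in plain Python (the function names follow the report, but the bodies here are assumptions written without NLTK):

```python
def bag_of_words(words):
    # Unigram features: each single word maps to True.
    return {w: True for w in words}

def bag_of_ngrams(words, n=2):
    # Bigram features: each pair of adjacent words maps to True.
    return {ngram: True for ngram in zip(*[words[i:] for i in range(n)])}

def bag_of_all_words(words):
    # Combined feature set: unigrams plus bigrams.
    features = bag_of_words(words)
    features.update(bag_of_ngrams(words))
    return features
```

Bigrams let the classifier see pairs such as ("not", "good"), which carry sentiment that the individual unigrams "not" and "good" miss.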
2.2 Training of model
The model is trained using NLTK's built-in Naïve Bayes classifier. It is a simple, fast
probabilistic classifier that performs well on small datasets. It applies Bayes' theorem, which
describes the probability of an event based on prior knowledge of conditions that might be
related to the event, under the "naïve" assumption that the features are independent given the
class.
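The project itself uses NLTK's classifier; the following is only a minimal from-scratch sketch of the underlying idea, scoring each label by log P(label) + Σ log P(word | label) with add-one smoothing (all names and the toy documents are assumptions):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    # Count label frequencies and per-label word frequencies.
    label_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        word_counts[label].update(words)
        vocab.update(words)

    def classify(words):
        # Pick the label maximizing log P(label) + sum of log P(word | label),
        # using add-one (Laplace) smoothing for unseen words.
        best, best_score = None, -math.inf
        for label, count in label_counts.items():
            score = math.log(count / len(labeled_docs))
            total = sum(word_counts[label].values()) + len(vocab)
            for w in words:
                score += math.log((word_counts[label][w] + 1) / total)
            if score > best_score:
                best, best_score = label, score
        return best

    return classify

docs = [(["good", "fun"], "pos"), (["great", "fun"], "pos"),
        (["bad", "dull"], "neg"), (["poor", "dull"], "neg")]
classify = train_naive_bayes(docs)
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied together.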
2.3 Testing of model
The model accuracy is tested on training data as well as on custom data input by the user.
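Accuracy here is simply the fraction of reviews the classifier labels correctly; a minimal sketch (the always-"pos" stand-in classifier and the toy test set are assumptions for illustration):

```python
def accuracy(classify, labeled_docs):
    # Fraction of documents whose predicted label matches the true label.
    correct = sum(1 for words, label in labeled_docs if classify(words) == label)
    return correct / len(labeled_docs)

# A trivial stand-in classifier that always predicts "pos".
always_pos = lambda words: "pos"
test_docs = [(["good", "film"], "pos"), (["bad", "film"], "neg")]
```

NLTK exposes the same measure as nltk.classify.accuracy(classifier, test_set).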
3. CODE
Pre-processing of data
The code shown below builds a frequency distribution of all the words in the document and
removes stop-words and punctuation from the text; the resulting cleaned words are added to a
new list.
Figure 3.1
Creating document feature using top-N occurring words
The code shown below creates the document feature using the 2,000 most frequently occurring
words, then trains the model using the Naïve Bayes classifier and prints the accuracy of the model.
Figure 3.2
Creating feature word using bag of words method
The code shown below separates the reviews into positive and negative lists, which makes it
possible to sample positive and negative data evenly, and then pre-processes the data.
Figure 3.3
Bi-Gram Feature
In bag of words feature extraction, we used only unigrams. In the example below, we use
both unigram and bigram features, i.e. we deal with both single words and word pairs.
Figure 3.4
Training the model
After pre-processing, the created feature sets are trained using NLTK’s Naïve Bayes classifier.
Figure 3.5
Figure 3.6
4. Results
top-N most frequently occurring words –
Figure 4.1
We can see that custom negative reviews are categorized accurately, but for custom positive
reviews we get inaccurate results.
In the top-N feature, we only used the top 2,000 words in the feature set. We combined the
positive and negative reviews into a single list, randomized it, and then separated the train and
test sets. This approach can result in an uneven distribution of positive and negative reviews
across the train and test sets.
Bag of words Feature –
Figure 4.2
Using the bag of words feature we now get correct results on the custom test reviews, but the
overall accuracy of the model drops to 70%.
Bi-gram Feature –
Figure 4.3
The accuracy of the classifier increased significantly when trained with the combined feature set
(unigram + bigram).
Accuracy was 70% using only unigram features and rose to 77% using the combined
(unigram + bigram) features.
5. Applications & Future Scope
5.1 Brand Monitoring - also called reputation management. A good reputation matters greatly
these days, when the majority of us check social media reviews as well as review sites before
making a purchase decision; sentiment analysis can track what that reputation looks like at scale.
5.2 Customer support - Social media are now primary channels of communication with
customers, and whenever they are unhappy about something related to you, whether or not it is
your fault, they will call you out on Facebook, Twitter or Instagram.
Sentiment analysis can flag such mentions in a dashboard so they can be engaged with as soon
as they appear.
People nowadays expect brands to respond on social media almost immediately, and if you are
not quick enough, you may see them moving on to your competitors instead of waiting for your
reply.