Dataset Description
In this project, you will use a dataset of Amazon reviews of unlocked phones. PromptCloud
extracted 400 thousand reviews of unlocked mobile phones sold on Amazon.com to uncover
insights about reviews, ratings, price, and their relationships.
Tools and Libraries
You will use the following tools and libraries:
• Python
• Pandas
• TextBlob
///////////////////////////////////////////////////////////////////////////////////////////////////////////
• Task 1: Write a Python function called sentimentAnalyzer(text). This function
takes a text (i.e., a review) and returns the sentiment as follows:
- Score < -0.2: returns ‘Negative’
- Score between -0.2 and 0.2 (inclusive): returns ‘Neutral’
- Score > 0.2: returns ‘Positive’
The score is the polarity of the text as determined by TextBlob. An example of creating a
TextBlob object is:
blob = TextBlob(text)
(Refer to TextBlob’s documentation to learn about determining polarity)
• Task 2: Verify that the function does classify the sentiment correctly by passing the
following words to the function:
Group     Word          Polarity  Resulting Sentiment
Positive  happy         0.8       Positive
Positive  exciting      0.3       Positive
Positive  good          0.7       Positive
Positive  rich          0.375     Positive
Positive  smile         0.3       Positive
Negative  sad           -0.5      Negative
Negative  disappointed  -0.75     Negative
Negative  bad           -0.699    Negative
Negative  poor          -0.4      Negative
Negative  anger         -0.7      Negative
Neutral   food          0         Neutral
Neutral   animal        0         Neutral
• Task 3: Import the provided dataset into a Pandas DataFrame. Filter the data to only
include one product of your choice. The product you select must have at least 1000
reviews. Then, describe the data related to your product. This includes:
o Product name
o Number of rows
o Number of columns
o Length of the shortest review, length of the longest review, and the average
review length.
• Task 4: Apply the function sentimentAnalyzer(text) to the text column in your
dataframe. This should create a new column in the dataframe called Sentiment, which
contains the sentiment for each review.
• Task 5: Using visualizations and summary statistics (in pandas and Matplotlib), describe
the results of the sentiment analysis of your product and analyze the results. This should
include:
o Visualizations of the results. For example, a bar chart showing number of
documents with positive, negative, and neutral sentiment. Include any
visualization you think is helpful.
o Examples of reviews with positive, negative, and neutral sentiment along with
their polarity.
o Insights on what your client (producer or seller) needs to do to minimize the
negative sentiment and improve their reputation and product.
o Examples of reviews where you think TextBlob might have assigned the wrong
sentiment to the review. Explain why you think this happened. Note: You may
use the rating column to assess TextBlob’s sentiment.
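The note about using the rating column can be sketched as a cross-tabulation of star rating against sentiment label. This is only an illustration on toy data: the column names 'Rating' and 'Sentiment' and the cut-offs (2 stars or fewer counts as low, 4 or more as high) are assumptions, not something the task prescribes.

```python
import pandas as pd

# Toy data for illustration only; not taken from the Amazon dataset.
# 'Rating' and 'Sentiment' are assumed column names.
reviews = pd.DataFrame({
    "Rating": [5, 1, 4, 2, 3, 5],
    "Sentiment": ["Positive", "Positive", "Negative",
                  "Negative", "Neutral", "Positive"],
})

# Rows = star rating, columns = sentiment label, cells = review counts.
table = pd.crosstab(reviews["Rating"], reviews["Sentiment"])
print(table)

# Flag reviews whose label contradicts the rating:
# low rating (<= 2) but 'Positive', or high rating (>= 4) but 'Negative'.
low_but_positive = (reviews["Rating"] <= 2) & (reviews["Sentiment"] == "Positive")
high_but_negative = (reviews["Rating"] >= 4) & (reviews["Sentiment"] == "Negative")
suspect = reviews[low_but_positive | high_but_negative]
print(suspect)
```

Rows that end up in `suspect` are candidates to read by hand when looking for reviews that TextBlob may have mislabelled.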
My code for the first 3 tasks:
from textblob import TextBlob
import pandas as pd
# Task 1
def sentimentAnalyzer(text):
    # Classify a text by its TextBlob polarity score.
    blob = TextBlob(text)
    score = blob.sentiment.polarity
    if score < -0.2:
        return 'Negative'
    elif -0.2 <= score <= 0.2:
        return 'Neutral'
    else:
        return 'Positive'

def sentimentScore(text):
    # Return the raw polarity score, a float between -1.0 and 1.0.
    blob = TextBlob(text)
    score = blob.sentiment.polarity
    return score
# Task 2
words = ['happy', 'exciting', 'good', 'rich', 'smile', 'sad', 'disappointed',
         'bad', 'poor', 'anger', 'food', 'animal']
print("\nTask2>>\n")
score = []
sentiment = []
for i in words:
    score.append(sentimentScore(i))
    sentiment.append(sentimentAnalyzer(i))
# Build the results table directly from a dict of column name -> values.
data = {'Word': words, 'Polarity': score, 'Resulting Sentiment': sentiment}
results = pd.DataFrame(data)
print(results)
print("\n\n\n")
#Task3
print("Task3>>\n")
Amazon_reviews = pd.read_csv('/Users/reem/desktop/Amazon_Unlocked_Mobile.csv')
#print(Amazon_reviews)
product_name = 'Apple iPhone 4 32GB (Black) - AT&T'
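filt = (Amazon_reviews['Product Name'] == product_name)
filterd = Amazon_reviews.loc[filt]
num_rows, num_columns = filterd.shape
# Reviews may contain missing values (NaN), and len() raises a TypeError on NaN,
# so measure lengths (in characters) on the non-null reviews only.
review_lengths = filterd['Reviews'].dropna().astype(str).str.len()
shortest_review = review_lengths.min()
longest_review = review_lengths.max()
average_review = review_lengths.mean()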
print(f"Product Name: {product_name}")
print(f"Number of Rows: {num_rows}")
print(f"Number of Columns: {num_columns}")
print(f"Length of the Shortest Review: {shortest_review}")
print(f"Length of the Longest Review: {longest_review}")
print(f"Average Length of the Review: {average_review}")
#print(filterd)
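
A possible next step, sketched under assumptions: one helper that lists products with enough reviews (the "at least 1000 reviews" condition in Task 3) and one that adds the Sentiment column for Task 4. The helper names `products_with_min_reviews` and `add_sentiment_column` are hypothetical, and the demo data and `demo_classifier` are stand-ins so the snippet runs without the dataset or TextBlob.

```python
import pandas as pd


def products_with_min_reviews(df, min_reviews=1000, name_col="Product Name"):
    """Return review counts for products with at least min_reviews rows."""
    counts = df[name_col].value_counts()
    return counts[counts >= min_reviews]


def add_sentiment_column(df, classifier, text_col="Reviews"):
    """Return a copy of df with a 'Sentiment' column built from text_col.

    Missing reviews are replaced by an empty string before classification,
    so the classifier always receives a str.
    """
    out = df.copy()
    out["Sentiment"] = out[text_col].fillna("").astype(str).apply(classifier)
    return out


def demo_classifier(text):
    """Stand-in classifier for illustration only (not TextBlob)."""
    return "Positive" if "great" in text else "Neutral"


# Tiny stand-in data, for illustration only.
demo = pd.DataFrame({
    "Product Name": ["A", "A", "B"],
    "Reviews": ["great phone", None, "bad battery"],
})

enough_reviews = products_with_min_reviews(demo, min_reviews=2)
labelled = add_sentiment_column(demo, demo_classifier)
print(enough_reviews)
print(labelled)
```

With the script above, `add_sentiment_column(filterd, sentimentAnalyzer)` would be the corresponding call for Task 4. Missing reviews are passed as empty strings, so they would most likely come out as Neutral.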