
Python Jupyter Notebook scikit-learn DjangoREST AWS

Twitter Sentiment Prediction

Description

The project predicts the sentiment (positive or negative) of real-time tweets. Tweets are read from the Tweepy API by a Python producer process and pushed into a Kinesis data stream. A Python consumer process reads the tweets and calls the prediction API to predict the sentiment. The API is a Django application that uses a Logistic Regression model to make predictions. The predictions returned by the API are displayed on a Django dashboard to show trends. The Twitter topic used when pulling tweets can be configured, along with the number of tweets read at a time. The raw tweets and the predictions are stored on AWS S3 as JSON files.
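As a rough illustration of the prediction API described above, a minimal Django REST Framework view might look like the sketch below. The view name, model path, and response shape are assumptions for illustration, not the actual Prediction_API code.

```python
# Sketch of a sentiment prediction endpoint with Django REST Framework.
# The view name, model path, and response fields are assumptions, not the
# actual Prediction_API implementation.
import joblib
from rest_framework.decorators import api_view
from rest_framework.response import Response

# Load the persisted scikit-learn pipeline once at import time (hypothetical path).
MODEL = joblib.load("sentiment_model.joblib")

@api_view(["POST"])
def predict(request):
    """Return the predicted sentiment for a single tweet text."""
    text = request.data.get("text", "")
    label = MODEL.predict([text])[0]
    return Response({"text": text, "sentiment": "positive" if label == 1 else "negative"})
```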

Table of Contents

- Process Flow
- Architecture Diagram
- Data Collection
- Data Pre-processing, Model Selection, Training
- Installation
- Project Organization
- Credits
- License

Data Collection

Datasets used:

Sentiment140 Dataset Details
Source: http://help.sentiment140.com/for-students
Description: The training data was created automatically rather than by having humans manually annotate tweets. In this approach, any tweet containing positive emoticons, such as :), was labeled positive, and any tweet containing negative emoticons, such as :(, was labeled negative. The tweets were collected through the Twitter Search API using keyword searches. The approach is described in the following paper: https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf. The data is a CSV with the emoticons removed.
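For reference, a minimal sketch of loading this CSV with pandas, assuming the standard six-column Sentiment140 layout (target, id, date, flag, user, text) where target 0 is negative and 4 is positive; the file name is illustrative:

```python
# Minimal sketch: load the Sentiment140 training CSV with pandas.
# Assumes the standard six-column layout; the file name is illustrative.
import pandas as pd

COLUMNS = ["target", "id", "date", "flag", "user", "text"]

df = pd.read_csv(
    "training.1600000.processed.noemoticon.csv",
    names=COLUMNS,
    encoding="latin-1",  # the raw file is not valid UTF-8
)

# Map the numeric labels (0 = negative, 4 = positive) to binary targets.
df["label"] = (df["target"] == 4).astype(int)
print(df[["text", "label"]].head())
```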

Data Pre-processing, Model Selection, Training

Data preprocessing, vectorization, evaluation of multiple models, and training are covered in the following notebook: Notebook for model training.
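A condensed sketch of the vectorize-and-train step, assuming TF-IDF features feeding a scikit-learn LogisticRegression (the model family named in the description); the exact preprocessing and model selection in the notebook may differ:

```python
# Sketch: vectorize tweet text and train a logistic regression sentiment model.
# The preprocessing here is simplified; the notebook's exact pipeline may differ.
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load the Sentiment140 CSV as in the loading sketch above.
df = pd.read_csv(
    "training.1600000.processed.noemoticon.csv",
    names=["target", "id", "date", "flag", "user", "text"],
    encoding="latin-1",
)
df["label"] = (df["target"] == 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Persist the fitted pipeline so the prediction API can load it at startup.
joblib.dump(model, "sentiment_model.joblib")
```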

Installation

Clone the repository
```bash
git clone git@github.com:agvar/Prediction_Text.git
```

Install Django on AWS Elastic Beanstalk

To deploy the Django project on Elastic Beanstalk, follow the AWS guide:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-django.html
The following changes need to be made after the initial Beanstalk deployment:

1. The first time the environment and application are created in Elastic Beanstalk, make sure the only config file in .ebextensions contains the boilerplate code below, with the application name modified:

```yaml
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: ebdjango.wsgi:application
```

2. Check `eb status`; the status will be RED, meaning the application is not yet ready or usable.

3. Change the config files as follows. The django.config:

```yaml
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: "Prediction_API.settings"
    PYTHONPATH: "/var/app/current:$PYTHONPATH"
  aws:elasticbeanstalk:container:python:
    WSGIPath: Prediction_API.wsgi:application
```

The nginx.config (added to increase the timeouts):

```yaml
option_settings:
  - namespace: aws:elb:policies
    option_name: ConnectionSettingIdleTimeout
    value: 300

files:
  "/etc/nginx/conf.d/nginx.custom.conf":
    mode: "644"
    owner: "root"
    group: "root"
    content: |
      client_header_timeout 300;
      client_body_timeout 300;
      send_timeout 300;
      proxy_connect_timeout 300;
      proxy_read_timeout 300;
      proxy_send_timeout 300;

container_commands:
  01_restart_nginx:
    command: "sudo service nginx reload"
```

4. Change the Django settings.py file as follows:

```python
ALLOWED_HOSTS = ['<ebs service name>', '<IP address of the EC2 instance>']
```

The first element is the Elastic Beanstalk service name; the second is the EC2 instance behind the Elastic Beanstalk environment.

5. Deploy using `eb deploy` and run `eb status` again. (Always remember to commit any changes to git before deploying.)

Install the AWS Kinesis consumer and producer

```bash
pip install -r requirements.txt
```

Modify the ./twitter_streaming/twitter_streaming.ini file as needed to update the following (a sketch of reading these settings is shown after this list):

- stream_filter: sets the filter on tweets to be processed
- stream_language: sets the tweet language to look for
- file_max_tweet_limit: maximum number of tweets to be written into a single JSON file
- collect_max_tweet_limt: maximum number of tweets to be processed in a single run
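A sketch of what the configuration and its parsing might look like with configparser; the section name and example values are assumptions, only the key names come from the list above:

```python
# Sketch: reading the streaming settings with configparser.
# The section name ("twitter_streaming") and example values are assumptions;
# only the key names come from the list above.
import configparser

EXAMPLE_INI = """
[twitter_streaming]
stream_filter = python
stream_language = en
file_max_tweet_limit = 100
collect_max_tweet_limt = 1000
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # or: config.read("./twitter_streaming/twitter_streaming.ini")

section = config["twitter_streaming"]
stream_filter = section["stream_filter"]
stream_language = section["stream_language"]
file_max_tweet_limit = section.getint("file_max_tweet_limit")
collect_max_tweet_limt = section.getint("collect_max_tweet_limt")

print(stream_filter, stream_language, file_max_tweet_limit, collect_max_tweet_limt)
```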

To execute the producer process:

```bash
python ./twitter_streaming/twitter_streaming/producer/twitter_stream_message_producer.py
```
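A minimal sketch of what the producer does, assuming the Tweepy 3.x streaming interface and boto3 for Kinesis; the stream name, credentials, and filter values are placeholders (the real ones come from the .ini file and Twitter developer account):

```python
# Sketch: forward live tweets from the Tweepy stream into a Kinesis data stream.
# Stream name, credentials, and filter values are placeholders for illustration.
import json

import boto3
import tweepy

STREAM_NAME = "tweet-stream"  # placeholder Kinesis stream name

kinesis = boto3.client("kinesis", region_name="us-east-1")


class KinesisForwarder(tweepy.StreamListener):
    """Forward each incoming tweet to the Kinesis data stream."""

    def on_status(self, status):
        record = {
            "id": status.id_str,
            "text": status.text,
            "created_at": str(status.created_at),
        }
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=status.id_str,
        )

    def on_error(self, status_code):
        return status_code != 420  # stop on rate limiting


auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
stream = tweepy.Stream(auth=auth, listener=KinesisForwarder())
stream.filter(track=["python"], languages=["en"])  # values normally come from the .ini file
```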

To execute the consumer process:

```bash
python ./twitter_streaming/twitter_streaming/consumer/twitter_stream_message_consumer.py
```

The consumer process reads tweets from Kinesis, calls the prediction API to make predictions, and stores the results on S3.
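A rough sketch of the consumer loop (single-shard, polling), reading records from Kinesis with boto3, calling the prediction API, and writing results to S3; the stream name, bucket, endpoint, and key prefix are placeholders:

```python
# Sketch: read tweets from Kinesis, call the prediction API, and store results on S3.
# Stream name, bucket, endpoint, and key prefix are placeholders for illustration.
import json
import time
import uuid

import boto3
import requests

STREAM_NAME = "tweet-stream"          # placeholder Kinesis stream name
BUCKET = "tweet-predictions-bucket"   # placeholder S3 bucket
API_URL = "http://<elastic-beanstalk-host>/predict/"  # hypothetical prediction endpoint

kinesis = boto3.client("kinesis", region_name="us-east-1")
s3 = boto3.client("s3")

# Start reading from the latest position of the first shard (single-shard sketch).
shard_id = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    iterator = out["NextShardIterator"]

    results = []
    for record in out["Records"]:
        tweet = json.loads(record["Data"])
        resp = requests.post(API_URL, json={"text": tweet["text"]}, timeout=30)
        tweet["sentiment"] = resp.json().get("sentiment")
        results.append(tweet)

    if results:
        key = f"predictions/{uuid.uuid4()}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(results).encode("utf-8"))

    time.sleep(1)
```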

Project Organization

```
├── LICENSE
├── README.md                  <- The top-level README for developers using this project.
├── Prediction_API             <- Django API folder
├── Sentiment_Prediction_DS    <- Python notebooks and models folder
├── twitter_streaming          <- Consumer and producer modules for reading from Tweepy and writing to Kinesis and S3
└── images                     <- Images and diagrams for the project
```

Credits

https://ileriayo.github.io/markdown-badges/
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-django.html

License

See the LICENSE file in the repository root for details.
