The project predicts the sentiment (positive or negative) of real-time tweets. A Python producer process reads tweets using the tweepy API and pushes them into a Kinesis data stream. A Python consumer process reads the tweets from the stream and calls the prediction API to predict the sentiment. The API is a Django application that uses a Logistic Regression model to make predictions. The Django application also displays the predictions on a dashboard to show trends. The Twitter topic used when pulling tweets and the number of tweets read at a time are both configurable. The raw tweets and the predictions are stored on AWS S3 as JSON files.
- Process Flow
- Data Collection
- Data preprocessing, model selection, training
- Installation
- Project Organization
- Credits
- License
Datasets used:
Sentiment140 Dataset Details
Source: http://help.sentiment140.com/for-students
Description: The training data was created automatically, as opposed to having humans manually annotate tweets. In the approach used, any tweet with positive emoticons, like :), was labeled positive, and any tweet with negative emoticons, like :(, was labeled negative. The dataset authors collected these tweets with the Twitter Search API using keyword search. The approach is described in the following paper: https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf. The data is a CSV file with emoticons removed.
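The emoticon-based labeling described above can be illustrated with a minimal sketch. The emoticon lists and the function name `label_by_emoticon` are assumptions made for illustration; they are not taken from the Sentiment140 paper or from this project's code.

```python
import re

# Hypothetical emoticon lists; the lists used to build Sentiment140 may differ.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def label_by_emoticon(tweet):
    """Return (label, text_without_emoticons), or None if the tweet is ambiguous."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        # No emoticon at all, or mixed signals: the tweet gets no label.
        return None
    text = tweet
    for emoticon in POSITIVE + NEGATIVE:
        # Remove the emoticon so a model cannot simply memorize it.
        text = text.replace(emoticon, "")
    text = re.sub(r"\s+", " ", text).strip()
    return ("positive" if has_pos else "negative", text)


print(label_by_emoticon("great day :)"))
print(label_by_emoticon("missed the bus :("))
```

Tweets without emoticons, or with both kinds, return `None` here so that they can be skipped rather than mislabeled.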
Data preprocessing, vectorization, evaluation of multiple models, and training are covered in the following notebook: Notebook for model training
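As a rough illustration of the vectorization and Logistic Regression steps, the sketch below trains a tiny scikit-learn pipeline. The TF-IDF vectorizer, the hyperparameters, the toy data, and the 0/1 label encoding are all assumptions; the notebook may use different choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_sentiment_model(texts, labels):
    """Fit a vectorizer plus Logistic Regression pipeline on labeled texts."""
    model = make_pipeline(
        # Assumed vectorizer settings: lowercase unigrams and bigrams.
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


# Toy data for illustration only; real training would use Sentiment140.
texts = ["i love this phone", "what a great day", "i hate waiting", "this is awful"]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = positive, 0 = negative

model = train_sentiment_model(texts, labels)
predictions = model.predict(["love this day", "hate this"])
probabilities = model.predict_proba(["love this day", "hate this"])
print(predictions, probabilities.shape)
```

Bundling the vectorizer and the classifier in one pipeline means the same preprocessing is applied at training and prediction time.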
Clone the repository
git clone [email protected]:agvar/Prediction_Text.git
To deploy the Django project on Elastic Beanstalk:
Follow the AWS guide:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-django.html
The following changes need to be made after the initial Beanstalk deployment:
- The first time the environment and application are created in Elastic Beanstalk, make sure the only config file in .ebextensions contains the boilerplate code below, with the application name modified:
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: ebdjango.wsgi:application
- Run eb status. The status will be RED, meaning the application is not ready or usable.
- Change the config files as follows:
The django.config file:
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: "Prediction_API.settings"
    PYTHONPATH: "/var/app/current:$PYTHONPATH"
  aws:elasticbeanstalk:container:python:
    WSGIPath: Prediction_API.wsgi:application
The ngix.config file (added to increase the timeouts):
option_settings:
  - namespace: aws:elb:policies
    option_name: ConnectionSettingIdleTimeout
    value: 300
files:
  "/etc/nginx/conf.d/nginx.custom.conf":
    mode: "644"
    owner: "root"
    group: "root"
    content: |
      client_header_timeout 300;
      client_body_timeout 300;
      send_timeout 300;
      proxy_connect_timeout 300;
      proxy_read_timeout 300;
      proxy_send_timeout 300;
container_commands:
  01_restart_nginx:
    command: "sudo service nginx reload"
- Change the Django settings.py file as follows:
ALLOWED_HOSTS = ['<ebs service name>', '<IP address of the EC2 instance>']
The first element is the Elastic Beanstalk service name; the second is the IP address of the EC2 instance in the Elastic Beanstalk environment.
- Deploy using eb deploy and run eb status again
(Always remember to commit any changes to git before deploying, because eb deploy packages the latest commit.)
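One way to avoid hard-coding hosts in settings.py is to read the ALLOWED_HOSTS list from an environment variable. The sketch below assumes a hypothetical variable named `DJANGO_ALLOWED_HOSTS` and a hypothetical helper `parse_allowed_hosts`; neither is part of this project.

```python
import os


def parse_allowed_hosts(raw):
    """Split a comma-separated host list, dropping blanks and stray spaces."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# DJANGO_ALLOWED_HOSTS is a hypothetical variable name used for illustration.
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("DJANGO_ALLOWED_HOSTS", ""))

print(parse_allowed_hosts("my-env.elasticbeanstalk.com, 10.0.0.1"))
```

Stripping whitespace matters because a host string with a leading space would likely not match the incoming Host header.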
pip install -r requirements.txt
Modify the ./twitter_streaming/twitter_streaming.ini file as needed to update the following:
stream_filter
-> sets the filter on tweets to be processed
stream_language
-> sets the tweet language to look for
file_max_tweet_limit
-> maximum number of tweets to be written to a single JSON file
collect_max_tweet_limt
-> maximum number of tweets to be processed in a single run
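For orientation, the settings listed above could be read with Python's standard `configparser`, roughly as follows. The section name `[twitter]`, the sample values, and the function name `load_stream_settings` are assumptions; check the actual .ini file for the real layout.

```python
import configparser

# Section name and values are assumed for illustration.
SAMPLE_INI = """
[twitter]
stream_filter = python
stream_language = en
file_max_tweet_limit = 100
collect_max_tweet_limt = 1000
"""


def load_stream_settings(ini_text):
    """Parse the streaming options, converting the limits to integers."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["twitter"]
    return {
        "stream_filter": section.get("stream_filter"),
        "stream_language": section.get("stream_language"),
        "file_max_tweet_limit": section.getint("file_max_tweet_limit"),
        "collect_max_tweet_limt": section.getint("collect_max_tweet_limt"),
    }


settings = load_stream_settings(SAMPLE_INI)
print(settings)
```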
To execute the Producer process:
python ./twitter_streaming/twitter_streaming/producer/twitter_stream_message_producer.py
To execute the Consumer process:
python ./twitter_streaming/twitter_streaming/consumer/twitter_stream_message_consumer.py
The consumer process reads tweets from Kinesis, calls the prediction API to make predictions, and stores the results on S3.
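The hand-off between the producer and the consumer can be sketched as follows. This is a simplified illustration, not the project's actual modules: the helper names, the tweet fields `id` and `text`, and the stream name are hypothetical. The only AWS call shown is boto3's Kinesis `put_record`, and it is not invoked when the snippet runs.

```python
import json


def to_kinesis_record(tweet, stream_name):
    """Build the keyword arguments for a Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(tweet).encode("utf-8"),
        # Using the tweet id as partition key spreads records across shards.
        "PartitionKey": str(tweet["id"]),
    }


def from_kinesis_record(record):
    """Decode the Data payload of a record returned by get_records."""
    return json.loads(record["Data"].decode("utf-8"))


def split_into_files(items, file_max_tweet_limit):
    """Group items into batches of at most file_max_tweet_limit per JSON file."""
    return [
        items[start:start + file_max_tweet_limit]
        for start in range(0, len(items), file_max_tweet_limit)
    ]


def send_tweet(tweet, stream_name):
    """Push one tweet to Kinesis (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("kinesis")
    return client.put_record(**to_kinesis_record(tweet, stream_name))


# Hypothetical tweet and stream name, used only to show the round trip.
tweet = {"id": 1, "text": "hello world"}
record = to_kinesis_record(tweet, "tweets-stream")
print(from_kinesis_record(record))
print(split_into_files([1, 2, 3, 4, 5], 2))
```

Keeping the encoding and batching logic in pure functions makes it possible to test them without a live Kinesis stream.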
├── LICENSE
├── README.md <- The top-level README for developers using this project.
├── Prediction_API <- Django API folder
├── Sentiment_Prediction_DS <- Python notebooks and models folder
├── twitter_streaming <- consumer and producer modules for reading from tweepy and writing to Kinesis and S3
└── images <- images and diagrams for the project
https://ileriayo.github.io/markdown-badges/
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-django.html