After a disaster, social media receives millions of messages about what has happened and the immediate needs of the affected people. Disaster response organizations must then filter this data and direct actions to specific problems.
The goal of this project is to analyze disaster messages from [Figure Eight](https://appen.com/) by building a supervised learning model that classifies the messages into 36 pre-defined categories.
The project includes a web app where a user can input a new message and see the classification result. The app also displays some visualizations of the data.
The project contains Python and HTML files. It requires Python 3.* and the following packages: pandas, numpy, pickle, re, nltk, scikit-learn, sqlalchemy, sys, warnings, json, plotly and flask.
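Of these, pickle, re, sys, warnings and json ship with the Python standard library; the rest can be installed with pip, for example:

```
pip install pandas numpy nltk scikit-learn sqlalchemy plotly flask
```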
The ETL pipeline is implemented in the process_data.py file. The script takes the file paths of the two datasets and the database, cleans the datasets, and stores the clean data in a SQLite database called DisasterResponse.db.
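A minimal sketch of the cleaning steps, assuming the column layout of the Figure Eight CSVs (an `id` key, a `message` column, and a single semicolon-separated `categories` column) and assuming the table is named `DisasterResponse`:

```python
# Sketch of the ETL steps in process_data.py (column names and table
# name are assumptions; the actual script takes them as arguments).
import pandas as pd
from sqlalchemy import create_engine

messages = pd.read_csv('disaster_messages.csv')
categories = pd.read_csv('disaster_categories.csv')
df = messages.merge(categories, on='id')

# Split the single 'categories' column ("related-1;request-0;...")
# into 36 binary columns, one per category.
cats = df['categories'].str.split(';', expand=True)
cats.columns = [value.split('-')[0] for value in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)

df = pd.concat([df.drop(columns='categories'), cats], axis=1)
df = df.drop_duplicates()

# Store the clean data in the SQLite database.
engine = create_engine('sqlite:///DisasterResponse.db')
df.to_sql('DisasterResponse', engine, index=False, if_exists='replace')
```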
The ML pipeline is implemented in the train_classifier.py file. This script does the following (a sketch of the pipeline follows the list):
- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Builds a text processing and machine learning pipeline that uses NLTK, scikit-learn's Pipeline and GridSearchCV
- Trains and tunes a model using GridSearchCV
- Evaluates the model on the test set; the final model uses the message column to predict classifications for all 36 categories
- Exports the final model as a pickle file called classifier.pkl
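A minimal sketch of such a pipeline; the actual tokenizer, estimator, parameter grid, and table name in train_classifier.py may differ:

```python
# Sketch of the text processing and ML pipeline (estimator and
# parameter grid are assumptions, not the script's exact choices).
import pickle
import re

import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sqlalchemy import create_engine

nltk.download(['punkt', 'wordnet'])

def tokenize(text):
    """Normalize, tokenize, and lemmatize a message."""
    lemmatizer = WordNetLemmatizer()
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text.lower())
    return [lemmatizer.lemmatize(tok) for tok in word_tokenize(text)]

# Load data from the SQLite database (table name assumed).
engine = create_engine('sqlite:///data/DisasterResponse.db')
df = pd.read_sql_table('DisasterResponse', engine)
X = df['message']
Y = df.iloc[:, 4:]  # the 36 category columns

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier())),
])

# Tune over a small grid; the real script may search more parameters.
params = {'clf__estimator__n_estimators': [50, 100]}
model = GridSearchCV(pipeline, param_grid=params, cv=3)
model.fit(X_train, Y_train)

# Export the tuned model as a pickle file.
with open('models/classifier.pkl', 'wb') as f:
    pickle.dump(model.best_estimator_, f)
```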
The web app enables the user to enter a disaster message and view which of the 36 categories it is classified into. The main page includes two visualizations of the data on which the model was trained.
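A minimal sketch of the classification route in run.py; the route name, template fields, and column positions are assumptions, and the real app also builds the Plotly visualizations for the main page:

```python
# Sketch of the classification route (paths, table name, and template
# variables are assumptions). Note that unpickling a model whose
# vectorizer uses a custom tokenizer requires that function to be
# importable when the app starts.
import pickle

import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

engine = create_engine('sqlite:///../data/DisasterResponse.db')
df = pd.read_sql_table('DisasterResponse', engine)
with open('../models/classifier.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/go')
def go():
    # Classify the user's message and map labels to category names.
    query = request.args.get('query', '')
    labels = model.predict([query])[0]
    results = dict(zip(df.columns[4:], labels))
    return render_template('go.html', query=query,
                           classification_result=results)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3001)
```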
The files in the project follow the structure below:
- app
  - template
    - master.html - main page of the web app
    - go.html - classification result page of the web app
  - run.py - Flask file that runs the app
- data
  - disaster_categories.csv - input data to process
  - disaster_messages.csv - input data to process
  - process_data.py - ETL pipeline script
  - DisasterResponse.db - output database containing the clean data
- models
  - train_classifier.py - ML pipeline script
  - classifier.pkl - saved model
- README.md
- Run the ETL pipeline from the data folder:
  `python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db`
- Run the ML pipeline from the project root:
  `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- Run the web app from the app folder:
  `python run.py`
- Go to http://localhost:3001
This project is part of the Udacity Data Analysis Nanodegree.