Boosted Models for Parkinsons Prediction

View the Web App here:

Overview

Using the first 12 months of doctor's visits for which protein mass spectrometry data has been recorded, the models are meant to assist doctors in determining whether a patient is likely to develop moderate-to-severe Parkinson's, as rated on UPDRS 1, 2, and 3. A categorical prediction of 1 means the patient is predicted to have a moderate-to-severe UPDRS rating at some point in the future. A categorical prediction of 0 means the patient is predicted to have none-to-mild UPDRS ratings in the future.

If a protein or peptide column is not present in the data, it is given a value of 0, meaning it is not present in the sample. The visit month is defined as the number of months since the first recorded visit, and the models require it to predict the UPDRS score. The column upd23b_clinical_state_on_medication records whether the patient was taking medication during the clinical evaluation and can take the values "On", "Off", or NaN.

UPDRS 1 categorical ratings: 10 and below is mild, 11 to 21 is moderate, 22 and above is severe
UPDRS 2 categorical ratings: 12 and below is mild, 13 to 29 is moderate, 30 and above is severe
UPDRS 3 categorical ratings: 32 and below is mild, 33 to 58 is moderate, 59 and above is severe
UPDRS 4 was dropped because there were too few samples for training
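
The rating bands above can be written as a small lookup. This is only a sketch for illustration: the names `THRESHOLDS`, `severity`, and `binary_label` are hypothetical and are not taken from the project's source. The 0/1 mapping follows the Overview's description (0 for none-to-mild, 1 for moderate-to-severe).

```python
# Inclusive upper bounds of the "mild" and "moderate" bands, copied from the
# rating list above. The dictionary and function names are hypothetical.
THRESHOLDS = {
    "updrs_1": (10, 21),
    "updrs_2": (12, 29),
    "updrs_3": (32, 58),
}


def severity(updrs: str, score: float) -> str:
    """Return "mild", "moderate", or "severe" for a raw UPDRS score."""
    mild_max, moderate_max = THRESHOLDS[updrs]
    if score <= mild_max:
        return "mild"
    if score <= moderate_max:
        return "moderate"
    return "severe"


def binary_label(updrs: str, score: float) -> int:
    """Map a raw score to the 0/1 category described in the Overview."""
    return 0 if severity(updrs, score) == "mild" else 1
```

For example, a UPDRS 2 score of 12 falls in the mild band and maps to 0, while a score of 13 is moderate and maps to 1.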

Project Write-Up

For an in-depth look at the project, view the following document:

Parkinsons Project Write-Up

Data Source

The raw data can be found at Kaggle Parkinsons Dataset

To Use this Project

Make Predictions

Option 1:

Take the Kaggle dataset and get predictions for each of the patients

Take the files train_peptides.csv, train_proteins.csv, and train_clinical_data.csv from the Kaggle link in the Data Source section of this README. Place those CSV files in the ./data/raw/ directory
Create a Python virtual environment
Use the Makefile to install requirements:
$ make install
Change into the src/ directory and run the prediction pipeline:
$ python pred_pipeline.py
This processes all of the raw data and runs predictions with the trained models, which can be found in ./models/prod_models/. A new file called full_updrs_preds.csv is created in the ./data/predictions/ directory
- The predictions will have the column names:
- "updrs_1_cat_preds"
- "updrs_2_cat_preds"
- "updrs_3_cat_preds"
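
Once the predictions file exists, a quick way to inspect it is to tally the 0/1 values in the three prediction columns listed above. The sketch below is not part of the project. The helper name `count_predictions` is hypothetical, and the `visit_id` column in the sample data is an assumption about the file's layout. Only the three `*_cat_preds` column names come from this README.

```python
import csv
import io
from collections import Counter

# Column names taken from the README; everything else here is illustrative.
PRED_COLUMNS = ["updrs_1_cat_preds", "updrs_2_cat_preds", "updrs_3_cat_preds"]


def count_predictions(csv_file):
    """Count how many rows were predicted 0 and 1 for each UPDRS column."""
    counts = {col: Counter() for col in PRED_COLUMNS}
    for row in csv.DictReader(csv_file):
        for col in PRED_COLUMNS:
            # float() first so values written as "1.0" are also accepted
            counts[col][int(float(row[col]))] += 1
    return counts


# Tiny in-memory stand-in for full_updrs_preds.csv (made-up rows; the
# "visit_id" column is an assumption and is ignored by the tally).
sample = io.StringIO(
    "visit_id,updrs_1_cat_preds,updrs_2_cat_preds,updrs_3_cat_preds\n"
    "a_0,0,0,1\n"
    "a_6,1,0,1\n"
)
sample_counts = count_predictions(sample)
```

To run it on the real output, open ./data/predictions/full_updrs_preds.csv with `open(path, newline="")` and pass the file object to `count_predictions`.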

Option 2:

Use your own protein and peptide data in a .json file containing the keys "visit_month" and "patient_id" plus one name: value pair for each protein and peptide. Alternatively, use the examples in ./data/api_examples/ to return a prediction.

Create a virtual environment
Install the dependencies:
$ make install
Change to the src directory:
$ cd src
Run the prediction pipeline file with your data filepath:
$ python pred_pipeline_user_input.py file/path/to/data.json
The raw data and predictions are stored in ./data/predictions/ with the name {visit_id}_predictions.json
- If "visit_id" is among the keys of the input data file, that value is used; otherwise "visit_id" is set to {patient_id}_{visit_month}
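
The naming rule for the output file can be sketched as follows. This illustrates the rule described above and is not the code in pred_pipeline_user_input.py. The function names are hypothetical, and `EXAMPLE_PROTEIN` is a placeholder key, not a real protein identifier.

```python
import json


def resolve_visit_id(record: dict) -> str:
    """Use "visit_id" when present, else build {patient_id}_{visit_month}."""
    if "visit_id" in record:
        return str(record["visit_id"])
    return f"{record['patient_id']}_{record['visit_month']}"


def prediction_filename(record: dict) -> str:
    """Name of the output file written to ./data/predictions/."""
    return f"{resolve_visit_id(record)}_predictions.json"


# Illustrative input record; "EXAMPLE_PROTEIN" and its value are made up.
example = {"patient_id": 16566, "visit_month": 24, "EXAMPLE_PROTEIN": 1234.5}

# The same record serialized the way an input .json file would hold it.
example_json = json.dumps(example, indent=2)
```

With the record above, which has no "visit_id" key, the resolved identifier is 16566_24 and the output file name is 16566_24_predictions.json.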

Option 3:

Take user input of protein and peptide data and perform a prediction, or use the example .json files from ./data/api_examples/ to return a prediction.

Build the Docker image:
$ docker build -t parkinsons-predict .
Confirm the Docker image is listed:
$ docker images parkinsons-predict
Run the Docker container on port 5000:
$ docker run -p 5000:5000 -d --name parkinsons-predict parkinsons-predict
Confirm it is running by visiting http://localhost:5000 in the web browser
- It should read "Welcome to the Parkinsons Prediction API"
Run an automatic test prediction by visiting http://localhost:5000/test_predict
- It should return a JSON string with the predictions and the visit_id
Make an API request with the example data:
$ python api_request.py ./data/api_examples/16566_24_data.json
Make an API request with your own json data:
$ python api_request.py file/path/to/data.json
A JSON file will be stored in ./data/predictions/ with the name visit_id_prediction.json
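
For readers who want to call the API without api_request.py, the sketch below shows one way to send a JSON payload using only the Python standard library. It is not the contents of api_request.py. The `/predict` path, the function names, and the assumption that the endpoint accepts a JSON POST body are all hypothetical. Only the base URL http://localhost:5000 and the /test_predict route appear in this README, so check the API source for the real route before using it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"
PREDICT_PATH = "/predict"  # hypothetical route; not confirmed by the README


def build_request(payload: dict, url: str = BASE_URL + PREDICT_PATH):
    """Build (but do not send) a JSON POST request for the prediction API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running container)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it lets the payload and headers be inspected without the container running.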

Notebook Descriptions

Compare_medication_SMOTE_1yr_Models:
- This notebook trains the models using SMOTE data, including the medication data, for visits within the first 12 months. The notebook then compares these models. These models use hyperparameter values that were optimized using the hyperopt package. Two of the final models are in this notebook.
Compare_medication_SMOTE_1yr_Baseline_Models:
- This notebook trains and compares the performance of models using the default parameters. The data for evaluation includes the SMOTE data and medication data for visits within the first 12 months.
Compare_SMOTE_1yr_Models:
- This notebook trains the models using SMOTE data, with no medication data, for visits within the first 12 months. The notebook then compares these models. These models use hyperparameter values that were optimized using the hyperopt package. Two of the final models are in this notebook.
Compare_finetune_1yr_Cat_Results:
- This notebook compares the performance of models trained on data with no class-imbalance processing and no medication data, for patient visits within the first 12 months. These models were tuned manually.
Compare_1yr_Cat_Results:
- This notebook compares the performance of models using their default settings on data that was not preprocessed and does not include any medication data, for patient visits within the first 12 months.
Compare_Categorical_Results:
- This notebook compares the performance of models using their default settings on data that was not preprocessed, does not include any medication data, and covers all patient visits.
Raw_Data_EDA:
- Preliminary EDA of the UPDRS values, including value counts, histogram plots, correlations with each other and with the features, and changes in the maximum UPDRS value over time.
Patient_Data_EDA:
- Explores the distribution of patient visits, number of proteins with non-zero values for each class of patient, and subsampling the data to visit months of 12 or less and of 24 or less.
First_12_Months_EDA:
- Distribution of visit months and how many patients have their max UPDRS in the first 12 months and from which category.
EDA_RandomForest:
- Explores the data for regression analysis and explores the medication data. Looks at the correlation between protein values and the UPDRS values, and at the linear relationship between the visit month and the UPDRS score as well as between the visit month and the change in UPDRS score.

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project based on the cookiecutter data science project template. #cookiecutterdatascience

dagartga/Boosted-Models-for-Parkinsons-Prediction
