- Install the required packages listed in the dependencies by using `pip install [name]`
- Execute the `data_download.py` file (e.g. `python data_download.py`)
- Make sure you see the success message at the end
- Should the file download fail, follow the instructions at the top of the file
- Execute any file you want!
- Each file is completely self-contained, since these files used to be notebooks on kaggle.com
- You can find a description of each file below
- `data_analysis.py`: Gives an overview of the data
- `embedding_CNN_LSTM.py`: Contains a CNN and an LSTM classifier
- `meanembedding_SVM`: Contains an SVM classifier with mean-embedding vectorization (see the sketch after this list)
- `metaembedding_Capsule`: Contains a Capsule classifier with a meta embedding (External)
- `metaembedding_Ensemble`: Contains an ensemble classifier with a meta embedding (External)
- `preprocessing_comparison`: Contains a comparison between using and not using preprocessing before the embedding
- `tfidf_SVM`: Contains an SVM classifier with TF-IDF vectorization
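For orientation, here is a minimal sketch of the mean-embedding idea behind `meanembedding_SVM`; the helper name, toy vectors, and labels are illustrative, not the repo's actual code:

```python
import numpy as np
from sklearn.svm import LinearSVC

def mean_embedding(text, embeddings, dim=300):
    """Represent a question as the average of its known word vectors."""
    vecs = [embeddings[w] for w in text.split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy stand-in for a real embedding table (e.g. GloVe vectors)
embeddings = {"quora": np.full(300, 0.2), "question": np.full(300, -0.1)}

X = np.vstack([mean_embedding(q, embeddings) for q in ["quora question", "question"]])
y = [0, 1]  # 1 = "insincere" in the Kaggle task
clf = LinearSVC().fit(X, y)
```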
- Have around 22 GB of free disk space
- Have 16 GB+ of RAM for loading the embeddings
- Have all specified packages & data files
- Recommended: Have a GPU
- Recommended: Have tensorflow-gpu installed
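To verify that TensorFlow actually sees your GPU, a quick check like the following helps (the first call is for TF 2.x; the commented one matches the older `tensorflow-gpu` 1.x package):

```python
import tensorflow as tf

# TF 2.x: list the physical GPUs TensorFlow can use
print(tf.config.list_physical_devices("GPU"))

# TF 1.x (the separate tensorflow-gpu package): use this instead
# print(tf.test.is_gpu_available())
```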
Please make sure to have these Python packages installed before running the code (a quick import check follows the list):
pandas
tqdm
textblob
timeit
sklearn
keras
tensorflow
numpy
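As a sanity check of the dependencies (a sketch; these are import names, and some pip names differ, e.g. `sklearn` is installed as `scikit-learn`, while `timeit` ships with the standard library):

```python
import importlib

packages = ["pandas", "tqdm", "textblob", "timeit", "sklearn",
            "keras", "tensorflow", "numpy"]
for pkg in packages:
    try:
        importlib.import_module(pkg)
        print(f"ok: {pkg}")
    except ImportError:
        print(f"missing: {pkg}")
```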
- All rights reserved
- Certain code snippets were taken from other authors; I have tried to credit all sources.
- This GitHub repo: https://github.com/MarvinThiele/KaggleQuoraChallenge
- I receive GPU OOM errors when trying to execute the code
- To fix this issue, you can reduce the batch size in the code (in `train_pred()`); see the sketch below
- If this doesn't fix the issue, it's most likely the embedding layer, which cannot be reduced in size
- Use a server with more VRAM in this case
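As a rough illustration of the batch-size fix (the signature and defaults here are assumptions, not the repo's actual `train_pred()`):

```python
# Sketch only: lowering batch_size reduces peak GPU memory during training.
def train_pred(model, train_X, train_y, val_X, val_y, epochs=2, batch_size=512):
    model.fit(train_X, train_y,
              batch_size=batch_size,  # reduce (e.g. 256 or 128) on GPU OOM
              epochs=epochs,
              validation_data=(val_X, val_y))
    return model.predict(val_X, batch_size=batch_size)
```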
- My computer freezes and becomes unresponsive
- This can happen if you have less than 16 GB of RAM
- This is caused by loading the word embeddings into memory; see the sketch below for a way to load only the vectors you need
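One way to relieve the memory pressure (a sketch assuming a GloVe-style text file; the path and the `vocab` set are assumptions): load only the vectors for words that actually occur in the data instead of the whole file.

```python
import numpy as np

def load_embeddings(path, vocab, dim=300):
    """Keep only the vectors for words that occur in `vocab`."""
    emb = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = " ".join(parts[:-dim])  # tolerate tokens containing spaces
            if word in vocab:
                emb[word] = np.asarray(parts[-dim:], dtype="float32")
    return emb

# vocab = {w for q in train_df["question_text"] for w in q.split()}
# emb = load_embeddings("embeddings/glove.840B.300d/glove.840B.300d.txt", vocab)
```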
- I have problems not listed here
- Contact me at [email protected]
- I have taken code snippets & inspiration from other public Kaggle notebooks
- Special Thanks to:
- https://www.kaggle.com/christofhenkel/how-to-preprocessing-when-using-embeddings
- https://www.kaggle.com/shujian/different-embeddings-with-attention-fork-fork
- https://www.kaggle.com/sudalairajkumar/a-look-at-different-embeddings
- https://www.kaggle.com/gmhost/gru-capsule
- https://www.kaggle.com/vanshjatana/magic-numbers-is-all-you-need-0-692-lb-986394