Leaf Classification from Kaggle

#Overview

A mixed model: a 2-layer convolutional neural net trained on the images, combined with a 1-layer convolutional neural net trained on the features.
Images are standardized to 128 x 128: each one is scaled proportionally up to the largest dimension found across all the images (1706 x 1706) and then scaled back down.
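
The scale-up-then-down step can be sketched as plain arithmetic on image dimensions. This is only an illustration under an assumption: it reads "proportional scaling" as stretching each image until its longer side reaches 1706, then shrinking by 128/1706. The function name `standardized_size` and its parameters are hypothetical and do not come from this repository.

```python
def standardized_size(width, height, max_dim=1706, target=128):
    """Return (new_width, new_height) after scaling up to max_dim, then down to target.

    Hypothetical helper: assumes the aspect ratio is preserved and the longer
    side ends up at `target` pixels.
    """
    # Step 1: scale proportionally so the longer side reaches max_dim.
    up = max_dim / max(width, height)
    up_w, up_h = width * up, height * up

    # Step 2: scale back down so that max_dim maps onto target.
    down = target / max_dim
    new_w = max(1, round(up_w * down))
    new_h = max(1, round(up_h * down))
    return new_w, new_h


if __name__ == "__main__":
    print(standardized_size(853, 400))
```

Under this reading, a non-square image still has to be padded or stretched to fill the full 128 x 128 input; which of the two the original code does is not stated here.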

#Getting Started

Install the NVIDIA driver according to:
- https://github.com/NVIDIA/nvidia-docker/wiki/Deploy-on-Amazon-EC2
- For the driver to survive restarts of your EC2 instance, refer to:
  - NVIDIA/nvidia-docker#137
- Then use either the Docker image
  - docker pull rllin/gpu-tensorflow-python
  - sudo nvidia-docker run -itd --name=leaf -e "PASSWORD=password" -p 8754:8888 -p 6006:6006 rllin/gpu-tensorflow-python
  - sudo nvidia-docker exec -it leaf bash
  - cd ./leaf_classification
  - git pull
- or requirements.txt (not tested)
  - git clone https://github.com/rllin/leaf_classification.git
  - cd ./leaf_classification
  - pip install -r requirements.txt

#Detailed Usage

Running python run_specific.py starts a training and validation session using the following hyperparameters:

{
  "f_conv1_num": 8,
  "f_conv1_out": 512,
  "f_d_out": 1024,
  "f_dropout": 0.8255444236474426,
  "conv1_num": 5,
  "conv1_out": 128,
  "conv2_num": 7,
  "conv2_out": 256,
  "d_out": 1024,
  "dropout": 0.7296244459829335,
  "report_interval": 100,
  "l2_penalty": 0.01,
  "LEARNING_RATE": 0.001,
  "TRAIN_SIZE": 1.0,
  "WIDTH": 128,
  "SEED": 42,
  "BATCH_SIZE": 66,
  "ITERATION": 5000.0,
  "HEIGHT": 128,
  "CHANNEL": 1,
  "VALIDATION_SIZE": 0.2,
  "NUM_CLASSES": 99,
  "CLASS_SIZE": 1.0,
  "features_images": "features only"
}

LEARNING_RATE looks like a fixed parameter, but it is also searched over.
These hyperparameters achieve a fairly high validation accuracy of 80% for a features-only run after 5000 iterations.
- However, running python run_specific.py as above is meant to use images + features, and with these hyperparameters that run doesn't seem to break 60% validation accuracy.
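
One detail of the configuration above is that ITERATION is stored as a float (5000.0), so it needs to be cast before it can drive a loop. The sketch below shows one way to load and sanity-check such a dictionary. `load_hyperparameters` is a hypothetical helper, not the way run_specific.py actually reads its settings, and only a subset of the keys is reproduced.

```python
import json

# A subset of the hyperparameters listed above, as a JSON string.
RAW = """
{
  "report_interval": 100,
  "LEARNING_RATE": 0.001,
  "BATCH_SIZE": 66,
  "ITERATION": 5000.0,
  "VALIDATION_SIZE": 0.2,
  "features_images": "features only"
}
"""


def load_hyperparameters(text):
    """Parse the JSON and normalize count-like values to int (hypothetical helper)."""
    params = json.loads(text)
    # Counts may arrive as floats (e.g. 5000.0); range() and slicing need ints.
    for key in ("ITERATION", "BATCH_SIZE", "report_interval"):
        params[key] = int(params[key])
    # The validation fraction must leave some data for training.
    if not 0.0 < params["VALIDATION_SIZE"] < 1.0:
        raise ValueError("VALIDATION_SIZE must be strictly between 0 and 1")
    return params


params = load_hyperparameters(RAW)
# Number of validation reports expected over the whole run.
reports = params["ITERATION"] // params["report_interval"]

if __name__ == "__main__":
    print(params["ITERATION"], reports)
```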

#Next steps

I've ordered the possible next steps in decreasing order of combined ease of implementation and expected marginal benefit:
- Consider a third convolutional layer to further pool and shrink the image.
- Combine dropout with max norm rather than l2_loss, as that seems to be suggested as best for preventing exploding or imploding weights.
- Include image sizes as hyperparameters to search over. Most of this code is in place already.
- Write a better batching process that's more pipe-like, perhaps with workers that create or find images according to the image-size hyperparameters.
- Formalize testing for the batching process. I was testing this myself ad hoc in Jupyter notebooks.
- Note that util/cnn_classifier has decorators capable of assigning TensorFlow variables as properties of the class without scoping issues, thanks to one of the references below. I had the class nice and clean with the decorators, but changed it to make debugging easier. I would change it back for clarity.
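
To make the max-norm item above concrete, here is a minimal sketch of the constraint in plain Python. It assumes each row of `weights` is the incoming weight vector of one unit and that the limit `c` is a tunable hyperparameter; the name `max_norm` and the default `c=3.0` are illustrative and not taken from this repository. In a real training loop the projection would be applied to the weight tensors after each update.

```python
import math


def max_norm(weights, c=3.0):
    """Rescale each row so that its L2 norm does not exceed c.

    Rows already inside the limit are returned unchanged (hypothetical helper).
    """
    constrained = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Only shrink vectors that lie outside the ball of radius c.
        scale = c / norm if norm > c else 1.0
        constrained.append([w * scale for w in row])
    return constrained


if __name__ == "__main__":
    # The first row has norm 5 and is shrunk to norm 1; the second is left alone.
    print(max_norm([[3.0, 4.0], [0.3, 0.4]], c=1.0))
```

Unlike an L2 penalty, which adds a term to the loss, this is a hard projection: it never pulls small weights toward zero and only caps large ones.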

#References
