Classification of images of ants using deep learning
- FormicID
- Description
- How to use
- Project Structure
- AntWeb
- Neural Network
- Requirements
- Credits
- Why this name, FormicID?
Code repository for CNN-based image classification of AntWeb images
The proposal can be found here.
The report can be found here.
Below are some steps to get you going. Furthermore, all functions have docstrings that provide more information.
Clone the repository.

```bash
$ git clone https://github.com/naturalis/FormicID
$ cd ./FormicID
```

Skip step 2 if you don't need to download the data.
Create a 2 column .csv file with the genus and species of each species you want to download from AntWeb. Species named indet will be skipped, because indet is just an aggregation of unidentified specimens within a genus. Species with a specimen count of 0 will also be skipped.
| genus | species | 
|---|---|
| genus1 | species1 | 
| genus2 | species2 | 
| ... | ... | 
get_species_list.py can do this for you. You only have to set the minimum number of images a species should have, and a 2 column csv file with genus and species names is created. However, some specimens have more than 3 images (e.g. close-ups). As a result, the image count per species is incorrect if you only want the dorsal, head and profile shot types. Use this script with caution.
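If you build the csv yourself, the skipping rules from this step can be sketched as below. This is a minimal illustration and not get_species_list.py. It assumes you already have (genus, species, specimen count) tuples. The helper name `write_species_csv` is hypothetical. Writing a `genus,species` header row is an assumption based on the table above.

```python
import csv
from typing import Iterable, Tuple


def write_species_csv(records: Iterable[Tuple[str, str, int]], path: str) -> int:
    """Write a 2 column (genus, species) csv, skipping unusable species.

    Hypothetical helper: `records` holds (genus, species, specimen_count)
    tuples collected by you. Returns the number of species rows written.
    """
    written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["genus", "species"])  # assumed header row
        for genus, species, count in records:
            # 'indet' aggregates unidentified specimens within a genus: skip it.
            if species.lower() == "indet":
                continue
            # A species without specimens has nothing to download: skip it.
            if count <= 0:
                continue
            writer.writerow([genus, species])
            written += 1
    return written
```

For example, passing `[("lasius", "flavus", 120), ("lasius", "indet", 40), ("tetramorium", "gollum", 0)]` writes only the `lasius,flavus` row and returns 1.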
Configure the configuration file formicID/configs/config.json or create your own configuration file based on this one.
- Set an experiment name using exp_name.
- Set a data set name in data_set.
- Set the following values (batch_size, num_epochs and seed as integers; dropout and learning_rate as floats):
  - batch_size (for InceptionResNetV2: 32)
  - dropout
  - learning_rate
  - num_epochs
  - seed
- Set model to one of:
  - InceptionV3
  - InceptionResNetV2
  - Xception
  - Resnet50
  - DenseNet169
  - Build (the network designed by the author)
- Set optimizer to one of:
  - Nadam
  - Adam
  - RMSprop
  - SGD
  - Eve (not working as of now)
- Set test_split and val_split as floats, i.e. fractions of the data set (e.g. 0.1).
- Set the shottype to use (dorsal, head, profile or stitched).
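A mistyped model or optimizer name may only surface once training starts. The sketch below checks a configuration dictionary against the values listed above. It is an illustration and not the repository's load_config.py. The function name `check_config` is hypothetical, and the key names are taken from the example config.json. Rejecting Eve and requiring test_split plus val_split to stay below 1 are assumptions; the second rule ensures some data is left for training.

```python
from typing import Any, Dict, List

MODELS = {"InceptionV3", "InceptionResNetV2", "Xception", "Resnet50", "DenseNet169", "Build"}
# Eve is listed in this README but reported as not working, so it is rejected here.
OPTIMIZERS = {"Nadam", "Adam", "RMSprop", "SGD"}
SHOTTYPES = {"dorsal", "head", "profile", "stitched"}
INT_KEYS = ("batch_size", "num_epochs", "seed")
FLOAT_KEYS = ("dropout", "learning_rate", "test_split", "val_split")


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a FormicID-style config dict."""
    problems = []
    for key in INT_KEYS:
        value = config.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{key} should be an integer, got {value!r}")
    for key in FLOAT_KEYS:
        value = config.get(key)
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            problems.append(f"{key} should be a number, got {value!r}")
    if config.get("model") not in MODELS:
        problems.append(f"unknown model {config.get('model')!r}")
    if config.get("optimizer") not in OPTIMIZERS:
        problems.append(f"unsupported optimizer {config.get('optimizer')!r}")
    if config.get("shottype") not in SHOTTYPES:
        problems.append(f"unknown shottype {config.get('shottype')!r}")
    test_split, val_split = config.get("test_split"), config.get("val_split")
    if isinstance(test_split, (int, float)) and isinstance(val_split, (int, float)):
        if not (test_split > 0 and val_split > 0 and test_split + val_split < 1):
            problems.append("test_split and val_split must be > 0 and sum to less than 1")
    return problems
```

Loading the example configuration from this README with `json.load` and passing it to `check_config` returns an empty list.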
```json
{
    "exp_name": "experiment_name",
    "data_set": "dataset_name",
    "batch_size": 32,
    "dropout": 0.5,
    "learning_rate": 0.001,
    "model": "InceptionResNetV2",
    "num_epochs": 100,
    "num_iter_per_epoch": 32,
    "optimizer": "Nadam",
    "seed": 1,
    "test_split": 0.1,
    "val_split": 0.2,
    "shottype": "head"
}
```

Next, using the Python file get_dataset.py, you can download, stitch and split data and/or remove reproductives.
Set the correct values for the function below. This function downloads JSON files that hold all the information on species (such as names, catalog identifiers and URLs to images). It then filters out the relevant information and downloads the images. The image quality is one of: low, medium, thumbview or high. Shottypes can be d, h or p, or a combination of those. If you set multi_only to True, shottypes needs to be dhp.
```python
get_dataset(
    input='species.csv',  # The csv file from step 2.1.
    n_jsonfiles=5,        # Should be equal to the number of species from step 2.
    config=config,        # The configuration file.
    shottypes='dhp',      # The shottypes to download images for.
    quality='medium',     # The quality of images.
    update=True,          # Whether to update for broken URLs.
    offset_set=0,         # The offset for specimens in a JSON file.
    limit_set=99999,      # The specimen limit to add to the JSON file.
    multi_only=True,      # Set to True if doing multi-view.
)
```

Run the function below to stitch together the images from the three shottypes, if you are doing the multi-view approach.
```python
stitch_maker(config=config)
```

Run this function to split the data into a training, validation and test set, as configured in the config file. You can also pass a 1 column csv file containing bad specimens (e.g. affected by fungi, or missing body parts).
```python
split_in_directory(config=config, bad="data/badspecimens.csv")
```

Together with a 1 column csv file containing catalog numbers, you can remove the reproductives from a test set using the function below.
```python
remove_reproductives(
    csv="data/reproductives.csv",
    dataset="top97species_Qmed_def_clean_wtest",
    config=config,
)
```

Now you can run formicID/main.py with config.json as a command-line argument. The model will be initialized and compiled, and training will begin as set in the configuration file.
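The call looks something like `python main.py configs/config.json`; the exact paths depend on your working directory. This README does not document how main.py parses that argument. The sketch below shows one common way to load a JSON configuration whose path is passed on the command line. The function name `config_from_argv` is hypothetical, and this is not the repository's load_config.py.

```python
import json
import sys
from typing import Any, Dict, List, Optional


def config_from_argv(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Load the JSON configuration whose path is the first command-line argument."""
    argv = sys.argv if argv is None else argv
    if len(argv) < 2:
        # Exit with a usage message instead of a bare IndexError.
        raise SystemExit("usage: main.py path/to/config.json")
    with open(argv[1]) as handle:
        return json.load(handle)
```

Inside a script, `config = config_from_argv()` then gives access to values such as `config["model"]` and `config["num_epochs"]`.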
Possible callbacks, loaded from utils/logger.py, are TensorBoard, EarlyStopping, ModelCheckpoint, CSVLogger and ReduceLROnPlateau.
- Using TensorBoard, you can get insight into the training metrics.
- EarlyStopping makes sure the model does not overfit or continue training for too long.
- Weights are saved every time the model improves (based on the validation loss) and at the end of training.
- CSVLogger logs all training and validation metrics per epoch.
- ReduceLROnPlateau reduces the learning rate when the model has stopped improving.
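EarlyStopping and ReduceLROnPlateau both watch a monitored metric, usually the validation loss. Each waits a number of epochs without improvement (the patience) before acting. The plain Python sketch below replays that logic on a list of per-epoch validation losses to show how the two interact. It is a simplified illustration, not the Keras implementation, which has further options such as min_delta, cooldown and min_lr. The patience and factor values are arbitrary examples, not the ones set in utils/logger.py.

```python
from typing import List, Tuple


def replay_callbacks(val_losses: List[float], lr: float = 1e-3,
                     es_patience: int = 4, lr_patience: int = 2,
                     factor: float = 0.1) -> Tuple[int, List[float]]:
    """Simulate early stopping plus learning-rate reduction on a plateau.

    Returns the number of epochs actually trained and the learning rate
    used in each of those epochs.
    """
    best = float("inf")
    epochs_since_best = 0     # counter for early stopping
    epochs_since_lr_drop = 0  # counter for reducing the learning rate
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        lrs.append(lr)
        if loss < best:
            # Improvement: this is also when a checkpoint would save weights.
            best = loss
            epochs_since_best = 0
            epochs_since_lr_drop = 0
            continue
        epochs_since_best += 1
        epochs_since_lr_drop += 1
        if epochs_since_lr_drop >= lr_patience:
            lr *= factor  # takes effect from the next epoch
            epochs_since_lr_drop = 0
        if epochs_since_best >= es_patience:
            return epoch, lrs  # stop training early
    return len(val_losses), lrs
```

With these defaults, the learning rate is cut tenfold after every 2 epochs without improvement, and training stops after 4 such epochs in a row.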
After training, you can launch TensorBoard to view the loss, accuracy and top-3 accuracy for training and validation. Using evaluator(), the model is run on the test set to obtain test metrics.
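Top-3 accuracy counts a prediction as correct when the true species is among the three classes with the highest predicted probability. Below is a minimal NumPy sketch of that computation. It assumes `probs` is an (n_samples, n_classes) array of softmax outputs and `labels` holds integer class indices. The function name `top_k_accuracy` is illustrative; it is not a function in this repository.

```python
import numpy as np


def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    # Column indices of the k largest probabilities per row (their order does not matter).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(hits.mean())
```

With `k=1`, this reduces to ordinary accuracy.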
Further evaluation options are:
- It is possible to plot metrics right after training using plot_history().
- Predict labels for the test set using predictor().
- Get prediction reports for the test set using predictor_reports() in 2 forms:
  - a classification report with precision, recall, f1 and support
  - the true labels and their corresponding predicted labels
- Plot a confusion matrix from the species names, true labels and predicted labels using plot_confusion_matrix().
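The confusion matrix and the classification report are built from the same counts. Below is an independent NumPy sketch, not the repository's implementation. It counts a confusion matrix from integer true and predicted labels and derives per-class precision and recall from it. Rows are true classes and columns are predicted classes, which is the usual convention.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same (true, pred) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def precision_recall(cm: np.ndarray):
    """Per-class precision and recall; 0 where a class is never predicted/present."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall
```

The diagonal holds the correct predictions, so a well-performing model shows most counts there.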
Using predict_image.py, you can initialize a model, load pre-trained weights and pass in an image to get a classification for that image.
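A classification for a single image comes back as one probability per class. The sketch below turns such a vector into a ranked list of species names. It assumes the probabilities are in the same order as a list of species names. For example, if images were loaded with Keras' flow_from_directory, label indices follow the alphanumeric order of the class subdirectories. The function name `top_predictions` and the species names in the example are illustrative only.

```python
from typing import List, Sequence, Tuple

import numpy as np


def top_predictions(probs: Sequence[float], species: List[str],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely species with their probabilities, best first."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (len(species),):
        raise ValueError("need exactly one probability per species")
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(species[i], float(probs[i])) for i in order]
```

For example, `top_predictions([0.1, 0.7, 0.2], ["lasius_flavus", "tetramorium_gollum", "formica_rufa"], k=2)` ranks tetramorium_gollum first and formica_rufa second.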
Utilities that can be loaded are:
- Image utilities
  - Saving data augmentation examples of 1 sample image: augmentation.py
  - Viewing data augmentation for 1 sample image: show_augmentation_from_dir()
  - Viewing multiple images: show_multi_img()
- Handling models and weights
  - Saving a model: save_model()
  - Loading a model from a file: load_model_from_file()
  - Loading weights: weights_load()
  - Model summary: model_summary()
  - Saving a model as a configuration file: model_config()
  - Loading a model from a configuration file: model_from_config()
  - Loading a model from a JSON file: model_from_architecture()
  - Visualizing the model: model_visualization()
  - Training on multiple GPUs: make_multi_gpu()
```
|-- formicID
    |-- __version__.py
    |-- augmentation.py
    |-- get_dataset.py
    |-- get_species_list.py
    |-- main.py
    |-- predict_image.py
    |-- AntWeb
    |   |-- AW2_to_json.py
    |   |-- AW3_to_json.py
    |   |-- json_to_csv.py
    |-- configs
    |   |-- config.json
    |-- data_loader
    |   |-- data_input.py
    |-- data_scraper
    |   |-- scrape.py
    |-- models
    |   |-- build.py
    |   |-- models.py
    |-- testers
    |   |-- tester.py
    |-- trainers
    |   |-- train.py
    |-- utils
        |-- img.py
        |-- load_config.py
        |-- logger.py
        |-- model_utils.py
        |-- utils.py
```
AntWeb is the world's largest online database of images, specimen records, and natural history information on ants. It is community driven and open to contribution from anyone with specimen records, natural history comments, or images.
Our mission is to publish for the scientific community high quality images of all the world's ant species. AntWeb provides tools for submitting images, specimen records, annotating species pages, and managing regional species lists.
Text is taken from www.AntWeb.org
Images are harvested from www.AntWeb.org. At this moment API version 2 is used, because version 3 was still in beta when the project started. Later, the scripts could be changed to use version 3.
Below you can see two images representing the dataset. One is an image of Lasius flavus and the other one is a mosaic of Tetramorium gollum I made using the image set.
| Lasius flavus | Mosaic of Tetramorium gollum | 
|---|---|
Inception based
- Inception v3
- Inception-ResNet V2 (recommended)
- Xception (Inception based)

ResNet based
- ResNet
- DenseNet (ResNet based)
It is also possible to use a model designed by the author by setting model to Build in the configuration file.
- Naturalis Biodiversity Center
- Supervisor: dr. Rutger Vos
- 2nd Corrector: dr. Jeremy Miller
- Bookmarks and Resources
FormicID is a concatenation of Formicidae (the family name of ants) and identification