Set of tools to simplify interactions with IFCB data
IFCB Tools extracts raw IFCB data to MATLAB files and incorporates classifications from EcoTaxa (extractIFCBdata.py). It also provides a utility to download data from EcoTaxa (getEcoTaxa.py).
Install Python dependencies
pip install numpy pandas Pillow beautifulsoup4 imageio matlab openpyxl tqdm
Install the MATLAB Engine API for Python (https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html)
cd /Applications/MATLAB_R2024b.app/extern/engines/python
python -m pip install .
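Once both installation steps are done, a quick check along the following lines can confirm that the modules are visible to Python. This is a minimal sketch and not part of IFCB Tools: the helper name missing_modules is hypothetical, and the list uses the usual import names for the packages above (PIL for Pillow, bs4 for beautifulsoup4, matlab.engine for the MATLAB Engine API).

```python
import importlib.util

# Import names for the packages installed above. The pip name and the import
# name differ for Pillow (PIL) and beautifulsoup4 (bs4).
REQUIRED_MODULES = ["numpy", "pandas", "PIL", "bs4", "imageio",
                    "openpyxl", "tqdm", "matlab.engine"]


def missing_modules(names):
    """Return the entries of `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when the parent package (e.g. `matlab`) is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only reports whether a module can be located. It does not start a MATLAB session, so a licensing or version problem would still surface later.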
MATLAB package requirements: Parallel Computing Toolbox and MatlabProgressBar. Both can be installed using the MATLAB Add-On Explorer.
Download the IFCB Analysis code required to extract features from IFCB data.
cd ifcb-tools
wget https://github.com/hsosik/ifcb-analysis/archive/master.zip
unzip master.zip
wget https://github.com/hsosik/ifcb-analysis/archive/features_v3.zip
unzip features_v3.zip
IFCB Analysis requirements:
- the file /ifcb-analysis-<branch>/feature_extraction/ModHausdorffDistMex.cpp compiled for your operating system. If you are using an operating system other than 64-bit Windows or 64-bit OSX, compile the original cpp file. Instructions are available here
- the functions statxture, statmoments, invmoments, and bound2im from DIPUM must be present in the folder DIPUM. They can be downloaded here
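The DIPUM requirement can be checked by looking for the four function files. The sketch below is illustrative only: it assumes each function is stored in a file named after it with a .m extension (the usual MATLAB convention), and it takes the DIPUM folder as an argument because its location depends on your checkout. The helper name missing_dipum_functions is hypothetical.

```python
from pathlib import Path

# DIPUM functions listed above; assumed to be stored as <name>.m files.
DIPUM_FUNCTIONS = ("statxture", "statmoments", "invmoments", "bound2im")


def missing_dipum_functions(dipum_dir):
    """Return the DIPUM function names with no matching .m file in `dipum_dir`."""
    folder = Path(dipum_dir)
    return [name for name in DIPUM_FUNCTIONS
            if not (folder / f"{name}.m").is_file()]
```

An empty list means all four files were found; any names returned still need to be downloaded into the DIPUM folder.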
getEcoTaxa.py downloads project classifications from EcoTaxa. It requires the user to authenticate with their EcoTaxa account.
Usage: getEcoTaxa.py [-h] -u USER [-p PATH] [-i IDS [IDS ...]] [-a AUTHORIZATION]
Optional arguments:
-h, --help
    show this help message and exit
-u USER, --user USER
    (required) Set email of EcoTaxa account.
-p PATH, --path PATH
    (optional) Set download directory. The download directory is the directory where all files will be saved.
-i IDS [IDS ...], --ids IDS [IDS ...]
    (optional) Set project identification numbers to be downloaded. Multiple projects can be given (must be separated by a space). If not provided, all projects from the EcoTaxa account are downloaded.
-a AUTHORIZATION, --authorization AUTHORIZATION
    (optional) Provide EcoTaxa password through the command line. Not recommended.
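The usage line above maps naturally onto Python's argparse module. The sketch below shows how such a parser could be declared. It is an illustration built from the documented usage, not the script's actual source; the function name build_ecotaxa_parser is hypothetical, and leaving the project IDs as strings is an assumption.

```python
import argparse


def build_ecotaxa_parser():
    """Build a parser mirroring the documented getEcoTaxa.py usage (illustrative)."""
    parser = argparse.ArgumentParser(prog="getEcoTaxa.py")
    parser.add_argument("-u", "--user", required=True,
                        help="Email of the EcoTaxa account.")
    parser.add_argument("-p", "--path",
                        help="Directory where all files will be saved.")
    # nargs="+" accepts one or more space-separated project IDs.
    parser.add_argument("-i", "--ids", nargs="+",
                        help="Project identification numbers to download.")
    parser.add_argument("-a", "--authorization",
                        help="EcoTaxa password (not recommended).")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list instead of sys.argv for demonstration.
    args = build_ecotaxa_parser().parse_args(["-u", "USER", "-i", "1234", "4321"])
    print(args.user, args.ids, args.path)
```

With a parser of this shape, omitting -i leaves args.ids as None, which a script can treat as "download all projects".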
Example:
./getEcoTaxa.py -u user@example.com -i 1234 4321 -p ~/Downloads/
extractIFCBdata.py extracts raw IFCB data for machine learning training, machine learning classification, EcoTaxa, or ecological studies.
Usage: extractIFCBdata.py [-h] -r RAW -m ENVIRONMENTAL [-t TAXONOMY] [-e ECOTAXA] -o OUTPUT [-p] [-s SAMPLE] [-f] [-u] mode
Positional arguments:
mode
    Set data extraction mode. Options available are: ml-train, ml-classify-batch, ml-classify-rt, ecotaxa, ecology.
Optional arguments:
-h, --help
    show this help message and exit
-r RAW, --raw RAW
    Set path to raw IFCB directory (adc, hdr, and roi files).
-m ENVIRONMENTAL, --environmental ENVIRONMENTAL
    Set path to environmental metadata file.
-t TAXONOMY, --taxonomy TAXONOMY
    Set path to taxonomic grouping file.
-e ECOTAXA, --ecotaxa ECOTAXA
    Set path to EcoTaxa classification directory or file.
-o OUTPUT, --output OUTPUT
    Set path to directory of formatted output data.
-p, --parallel
    Enable Matlab parallel processing.
-s SAMPLE, --sample SAMPLE
    Set sample to process in mode ml-classify-rt.
-f, --force
    Force update of all data in mode ecology.
-u, --update-classification
    Update classification data in mode ecology.
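To make the relationship between the positional mode and the flags concrete, the sketch below declares an argparse parser that follows the documented usage. It is an illustration, not the script's actual source: the function name build_extract_parser is hypothetical, and restricting mode with choices is an assumption about how validation might be done.

```python
import argparse

# Extraction modes listed in the documentation above.
MODES = ["ml-train", "ml-classify-batch", "ml-classify-rt", "ecotaxa", "ecology"]


def build_extract_parser():
    """Build a parser mirroring the documented extractIFCBdata.py usage (illustrative)."""
    parser = argparse.ArgumentParser(prog="extractIFCBdata.py")
    parser.add_argument("mode", choices=MODES, help="Data extraction mode.")
    parser.add_argument("-r", "--raw", required=True,
                        help="Raw IFCB directory (adc, hdr, and roi files).")
    parser.add_argument("-m", "--environmental", required=True,
                        help="Environmental metadata file.")
    parser.add_argument("-t", "--taxonomy", help="Taxonomic grouping file.")
    parser.add_argument("-e", "--ecotaxa",
                        help="EcoTaxa classification directory or file.")
    parser.add_argument("-o", "--output", required=True,
                        help="Directory of formatted output data.")
    # Boolean switches default to False and become True when present.
    parser.add_argument("-p", "--parallel", action="store_true")
    parser.add_argument("-s", "--sample",
                        help="Sample to process in mode ml-classify-rt.")
    parser.add_argument("-f", "--force", action="store_true")
    parser.add_argument("-u", "--update-classification", action="store_true")
    return parser
```

Under these assumptions, a call such as extractIFCBdata.py -r raw -m meta.csv -o out -p ecology (all paths hypothetical) would yield mode "ecology" with parallel processing enabled, while an unknown mode would be rejected before any data is touched.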