Reproducible workflow of Colares et al. (2024)
Lucas Colares 2023-08-02
Hey there, nerds! This R markdown file summarizes all the steps needed to reproduce the results reported in the paper Contrasting responses of insects to forest loss promoted by a mega dam in the Amazon, by Lucas Colares and co-authors. First of all, download and unzip (or clone) the GitHub repository of the project at https://github.com/lucas-colares/blow-me-a-fly. Then, open the R project by double-clicking on the HabIns.Rproj file. If this R markdown file does not open automatically, double-click on the full_step_by_step.Rmd file and follow the steps below. Hope it helps!
Okay, after this introduction, we can now get coding! First of all, you will need to set a few configurations and functions for this markdown file to work. We are going to need some (ok, a lot of) packages and other small setups. Don't worry, I already created an R script that installs all required packages and functions. Just run the following code to set up your gear:
source("scripts/00. setup.R")
## Loading required package: imagerExtra
## Loading required package: imager
## Loading required package: magrittr
##
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
##
## add
## The following objects are masked from 'package:stats':
##
## convolve, spectrum
## The following object is masked from 'package:graphics':
##
## frame
## The following object is masked from 'package:base':
##
## save.image
## Loading required package: vegan
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
## Loading required package: XML
Now that you are set up, we need to download the raw images of all 236 sticky traps into the folder insect_imgs/, which should be empty when you download or clone the GitHub repository. You can either manually download the raw images at https://figshare.com/ndownloader/files/41804211 and place the file in the specified folder, or run the following command (please note that the zip file is ~3 GB, so the download may take a while):
#download.file(url = "https://figshare.com/ndownloader/files/41804211",
#              destfile = "insect_imgs/originals.zip", mode = "wb")
Nice! Now we need to unzip the file. Run the following:
#unzip(zipfile = "insect_imgs/originals.zip",exdir = "insect_imgs/")
Now that we have our images, we are going to conduct a process called image segmentation, which basically means selecting the elements of an image we are interested in (in our case, insects). To do this, we will automatically identify the background (the yellow sticky traps) and separate it from the foreground (the insects) using the function `mask_segmentation()`, which was created by our team specifically for this task. Before running it, let's take a look at what exactly this function is doing.

First of all, the `mask_segmentation()` function loads the original image of the sticky trap:
Second, the `mask_segmentation()` function applies a blur to the image, so that the edges of the foreground are softened for the segmentation. This blur is very soft, so you may not even notice it:
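To build intuition for what the blur does, here is a toy sketch in base R: a 3x3 mean filter that spreads each pixel's value over its neighbourhood. This is only an illustration, not the package's implementation; the actual function presumably delegates to an imager blur such as `isoblur()`.

```r
# Toy 3x3 box blur on a numeric matrix (illustration only; the real
# pipeline likely uses imager's blur functions).
box_blur <- function(m) {
  out <- m
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      # Replace each interior pixel by the mean of its 3x3 neighbourhood
      out[i, j] <- mean(m[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

m <- matrix(0, 5, 5)
m[3, 3] <- 1          # a single bright pixel
b <- box_blur(m)
b[3, 3]               # the spike is spread out: now 1/9
```

Softening edges this way is what lets the later thresholding step produce smoother foreground outlines.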
Thirdly, `mask_segmentation()` will resize the image, if you want to. This resize step is useful if you have limited computational power: the function runs much faster with a resized image. You can adjust how much you want to resize the image using the `Resize` argument. The default option is `1`, which means that the full resolution will be used. If you set the `Resize` argument to `2`, the resolution used will be 1/2 of the original image; if you set it to `4`, the image will be resized to 1/4 of the original resolution, and so on. For our images, we recommend setting the `Resize` argument to a number between `1` and `5`, no more than that.
Next, the `mask_segmentation()` function converts the RGB image to grayscale for the following steps:
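Conceptually, grayscale conversion collapses the three RGB channels into one intensity value per pixel. The sketch below uses the common Rec. 601 luma weights as an illustration; whether `imager::grayscale()` uses these exact weights is an assumption, not something stated in the paper.

```r
# Toy RGB-to-grayscale conversion using the common luma weights
# (illustrative; the weights used by imager may differ).
rgb_to_gray <- function(r, g, b) 0.299 * r + 0.587 * g + 0.114 * b

rgb_to_gray(1, 1, 1)  # pure white -> 1
rgb_to_gray(0, 0, 0)  # pure black -> 0
```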
In the next step, the `mask_segmentation()` function applies an adaptive threshold to the image, which means that light-colored pixels are converted to white pixels and darker pixels are converted to black pixels, leaving the insects on the sticky trap evident (figure 4). This adaptive threshold method is useful for images with varying lighting conditions, which is the case for our images. The threshold is applied to small pieces of the original image so that every region has its own threshold values (pieces, i.e., windows, of size 10% relative to the largest side of the image). In the function, you can set the `K` argument to any number between 0 and 1. When `K` is high, local threshold values tend to be lower; when `K` is low, local threshold values tend to be higher. The default is 0.2, which works fine for our images. The image will look like this after the adaptive threshold:
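The idea of a per-window threshold can be sketched in base R as follows. The formula here (each window's threshold is its own mean scaled by `1 - K`, so a higher `K` lowers the local threshold) is an assumption chosen to match the behaviour described above; the actual pipeline uses imagerExtra's adaptive thresholding, whose exact formula may differ.

```r
# Toy adaptive threshold: each window is binarized against its OWN
# mean scaled by (1 - K), so regions with different lighting get
# different thresholds (illustrative formula, not imagerExtra's).
adaptive_threshold <- function(m, K = 0.2, win = 2) {
  out <- m
  for (i in seq(1, nrow(m), by = win)) {
    for (j in seq(1, ncol(m), by = win)) {
      ri <- i:min(i + win - 1, nrow(m))
      cj <- j:min(j + win - 1, ncol(m))
      thr <- mean(m[ri, cj]) * (1 - K)  # higher K -> lower local threshold
      out[ri, cj] <- ifelse(m[ri, cj] >= thr, 1, 0)
    }
  }
  out
}

# A "bright" region (left) next to a "dim" region (right):
m <- matrix(c(0.9, 0.8, 0.2, 0.1,
              0.9, 0.8, 0.2, 0.1,
              0.5, 0.4, 0.6, 0.7,
              0.5, 0.4, 0.6, 0.7), 4, 4, byrow = TRUE)
bin <- adaptive_threshold(m, K = 0.2, win = 2)
# Note that 0.2 passes the threshold of its own dim window, while a
# single global threshold would have dropped it along with 0.1.
bin
```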
Next, the function applies an erode morphological operation to slightly enlarge the edges of the white pixels. Then, white polygons that are too small are cleaned out, and the colors are inverted so that black pixels now represent the foreground and white pixels represent the background:
Figure 5. Image after erosion, cleaning, and inversion.

Then, all connected white polygons are separated into small pieces in an imlist, and the x and y coordinates of the limits of each small piece are extracted to a data frame. Next, a cluster analysis is conducted using the Gower distance to check which sets of white pixels are closest to each other:
Figure 6. Cluster dendrogram representing which sets of white pixels are closest to each other in the image.

In this step, the dendrogram is cut at a specific height so that groups of white pixels that are close to each other are formed. The numeric scalar at which the dendrogram is cut can be set with the `H` argument of the `mask_segmentation()` function. The default value is 0.05, which works fine for our images. The groups formed after cutting the dendrogram represent the white pixels that will end up together in the final image piece; in this way, insects that are close to each other stay in the same image. In the end, the original image is sliced into many small pieces. Note that you can specify a margin for these small pieces using the `Margin` argument of the `mask_segmentation()` function. We suggest you set this margin to 100 pixels (the default); in this way, we can capture details from the image that would otherwise go missing. These small pieces are then saved in a folder specified by the user, which can be set with the `destFolder` argument of the `mask_segmentation()` function. Here, we chose to save all the pieces in the "insect_imgs/slices/all_slices/" folder. An XML file is saved with each small piece, recording the positions where the white pixels (i.e., the insects) were located. The white pixels in all XML files are labelled as "insects", but we will refine these annotations later. We can use these XML files to accelerate the annotation process in the next steps.
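The cluster-and-cut step can be sketched with base R's `hclust()` and `cutree()`. The paper uses the Gower distance (available, for example, through `cluster::daisy()`); plain Euclidean distance on toy bounding-box coordinates is used below only to keep the sketch dependency-free.

```r
# Toy version of the grouping step: cluster polygon coordinates and
# cut the tree at height H, so nearby polygons share a slice.
# (Euclidean distance here; the paper uses the Gower distance.)
coords <- data.frame(
  x = c(0.10, 0.12, 0.80, 0.82),   # hypothetical polygon positions
  y = c(0.20, 0.22, 0.75, 0.74)
)
hc <- hclust(dist(coords))
groups <- cutree(hc, h = 0.05)     # the H argument: cut height
groups                             # polygons 1 & 2 form one group, 3 & 4 another
```

Raising `H` merges more distant polygons into the same slice; lowering it yields more, smaller slices.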
Nice! After this (not so) short explanation of how the function works, let's finally generate these small pieces for all our images. You will need to provide a vector with the paths of all images you want to split into small pieces; we do this in the line `paste0("insect_imgs/originals/",dir("insect_imgs/originals/"))` and store the paths in the `images` object. These are then passed to the function through the `imgs` argument. Next, we specify the folder where we want the images and XML files to be saved in the line `destination.folder="insect_imgs/slices/all_slices/"`; this folder is stored in the `destination.folder` object. Now we can move forward and run the `mask_segmentation()` function with these arguments. You will get a progress bar that indicates when the process is over; this can take several hours. If you want to download the same image slices that we used in the paper, you can skip ahead and download them directly using the code chunk after the next one.
images=paste0("insect_imgs/originals/",dir("insect_imgs/originals/"))[300:469]
destination.folder="insect_imgs/slices/all_slices/"
#mask_segmentation(imgs = images,destFolder = destination.folder,Resize = 4, K=0.2, Blur=5, H=0.05, Margin=100)
Please note that we removed several sliced images from our dataset during annotation because there was nothing in them. In the end, we ended up with the 8926 slices described in the paper. To download these, run the following:
Now that we have all the image pieces we need, it's time to properly label these insects! This process is called annotation and basically consists of naming a considerable number of images to train our object detection model. For this task, we developed an interactive function that uses the XML files we created in the previous step.
slices=paste0("insect_imgs/slices/all_slices/",dir("insect_imgs/slices/all_slices/",pattern = ".jpg"))
destination.folder2="insect_imgs/slices/annotated/"
#auto_annotation(imgs = slices, destFolder = destination.folder2, Blur = 2, K = 0.2, Erode = 5)