Dr. Ronit Mandal AG60202
AI Applications in Agriculture
AI Applications in Image-based Quality Detection
Image analysis for food quality
Image analysis of food products uses cameras to automatically observe, capture, evaluate, and recognize stationary or moving objects. Image acquisition (RGB imaging, thermal imaging, NMR, MRI, …) is followed by image processing, which allows learned predictions to be made about food quality. Image analysis enables non-destructive testing (grading, package quality, foreign object and debris detection, …). The imaging mode can be online (continuous monitoring) or offline (individual capture).
Figure. Imaging techniques
Image analysis can follow two routes: conventional feature-extraction methods based on pixel values (statistical models or conventional ML models), or deep learning-based methods.
Figure. Image analysis techniques
Conventional ML models typically involve more manual feature extraction and may require prior
knowledge about the image data. Deep learning, on the other hand, eliminates the need for manual
feature extraction by automatically learning the features from the raw image data.
Image analysis by conventional ML models
Pipeline:
• Preprocessing: Images are often resized, normalized, or converted into a different format (e.g.,
grayscale or a set of specific color channels).
• Feature Extraction: This is a crucial step where the model extracts meaningful features from the
image. This can be done using traditional computer vision techniques such as edge detection
(e.g., Canny or Sobel filters), Histograms of Oriented Gradients (HOG), Local Binary Patterns
(LBP), and colour histograms.
• Modeling: The features extracted from the images are then used as input for conventional
machine learning algorithms, such as Support Vector Machines (SVM), Random Forests, k-Nearest Neighbors (k-NN), Logistic Regression, or Decision Trees.
Image pre-processing (done by a set of algorithms [you don't need to remember the codes]; a combined sketch follows the list)
• Resizing: Standardize image size for model input (e.g., 224x224 for CNNs); Algorithm:
OpenCV: cv2.resize()
• Gray scaling: Reduce complexity by decreasing the number of channels; Algorithm: Formula: $Y = 0.299R + 0.587G + 0.114B$; OpenCV: cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
• Denoising: Remove graininess from images; Algorithm: Gaussian Blur: Smooths image,
cv2.GaussianBlur(), Median Filter: Removes salt-and-pepper noise, cv2.medianBlur(),
Bilateral Filter: Preserves edges, cv2.bilateralFilter()
• Equalization: Improve contrast in low-light images; Algorithm: Global Equalization:
cv2.equalizeHist() (for grayscale)
• Thresholding: Convert grayscale to black & white for segmentation; Algorithms: Global
Thresholding: _, thresh = cv2.threshold(img, T, 255, cv2.THRESH_BINARY)
• Edge Detection: Highlight boundaries of defects, cracks, or structures (Canny Detection)
• Normalization: Scale pixel values to a standard range (e.g., 0 to 1); Algorithm: Formula:
X_norm = (X - min) / (max - min)
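A minimal sketch chaining these pre-processing steps with OpenCV; the file name, kernel size, and threshold value T = 127 are illustrative assumptions, not fixed choices:

import cv2
import numpy as np

img = cv2.imread("sample_food.jpg")                # hypothetical input image
resized = cv2.resize(img, (224, 224))              # standardize size for model input
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)   # Y = 0.299R + 0.587G + 0.114B
denoised = cv2.GaussianBlur(gray, (5, 5), 0)       # smooth out graininess
equalized = cv2.equalizeHist(denoised)             # improve contrast (global equalization)
_, thresh = cv2.threshold(equalized, 127, 255, cv2.THRESH_BINARY)  # binary image
normalized = equalized.astype(np.float32) / 255.0  # scale pixel values to [0, 1]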
Figure. Image pre-processing steps
Image segmentation
The process of partitioning an image into regions or objects of interest to isolate a defect, feature, or structure. Segmentation algorithms (ROI segmentation, contour detection) identify relevant regions, reduce background noise, and enable accurate feature extraction and automated decision-making (e.g., fresh vs. spoilt). A minimal sketch is given below.
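A minimal segmentation sketch, assuming a thresholded binary image and OpenCV 4.x (the image path and the area cutoff of 100 px are hypothetical):

import cv2

gray = cv2.imread("sample_food.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# Contour detection: each external contour is a candidate region of interest
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Discard tiny contours that are likely background noise, not real defects
regions = [c for c in contours if cv2.contourArea(c) > 100]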
Image feature extraction
It is the process of converting an image (complex and high-dimensional) into a meaningful numerical representation that can be input into ML models. The raw images are transformed into a structured set of numerical values (features) that capture the characteristics of the image. Features include geometric features, colour features, and textural features.
• Geometric features
Area ($A$): measures the size of a detected defect,
$$A = \sum_{(x,y)} \mathbf{1}[I(x,y) = 1]$$
(Here, $\mathbf{1}[\cdot]$ is an indicator function that is 1 for pixels belonging to the defect and 0 otherwise.)
Perimeter ($P$): length of the contour around the defect.
The geometric features are stored in the feature matrix
$$F = \begin{bmatrix} A_1 & P_1 & \cdots \\ A_2 & P_2 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$$
where each row represents a sample.
• Colour features
Mean colour intensity: the average intensity value within the ROI, computed for each colour channel. For red,
$$\mu_R = \frac{1}{N} \sum_{(x,y) \in \mathrm{ROI}} R(x, y)$$
The feature matrix grows to
$$F = \begin{bmatrix} A_1 & P_1 & \mu_{R,1} & \cdots \\ A_2 & P_2 & \mu_{R,2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
(a sketch computing the geometric and colour features follows)
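A sketch extracting the geometric and colour features per detected region into rows of $F$ (the file name and threshold value are illustrative):

import cv2
import numpy as np

img = cv2.imread("sample_food.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

rows = []
for c in contours:
    A = cv2.contourArea(c)        # area: pixel count inside the contour
    P = cv2.arcLength(c, True)    # perimeter of the closed contour
    mask = np.zeros(gray.shape, np.uint8)
    cv2.drawContours(mask, [c], -1, 255, -1)        # filled ROI mask
    mu_b, mu_g, mu_r, _ = cv2.mean(img, mask=mask)  # mean per channel (OpenCV is BGR)
    rows.append([A, P, mu_r, mu_g, mu_b])

F = np.array(rows)  # feature matrix: one row per sample region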
• Textural features
Local Binary Patterns (LBP)
Describes texture by thresholding the neighborhood of each pixel and treating the result as a binary number. This is carried out for each pixel in the image: if neighbor ≥ center, then 1, else 0.
For example, for the $3 \times 3$ neighborhood
$$\begin{bmatrix} 35 & 54 & 21 \\ 34 & 30 & 54 \\ 22 & 42 & 65 \end{bmatrix}$$
thresholding against the center value 30 gives
$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & - & 1 \\ 0 & 1 & 1 \end{bmatrix}$$
Reading the neighbors clockwise from the top-left gives the pattern [11011101], which converts from binary to the decimal LBP code $11011101_2 = 221$.
The LBP operation is carried out over the whole image, and the resulting LBP codes are stored as columns in the feature vector (see the sketch below).
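A sketch of LBP extraction with scikit-image, applied here to the $3 \times 3$ neighborhood from the example (note that skimage samples a circular neighborhood, so its codes can differ slightly from the hand computation above):

import numpy as np
from skimage.feature import local_binary_pattern

patch = np.array([[35, 54, 21],
                  [34, 30, 54],
                  [22, 42, 65]], dtype=np.uint8)

# 8 neighbors at radius 1; 'default' returns raw 8-bit LBP codes
lbp = local_binary_pattern(patch, P=8, R=1, method="default")

# In practice, codes over the whole image are summarized as a histogram feature
hist, _ = np.histogram(lbp, bins=np.arange(257), density=True)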
Gray Level Co-occurrence Matrix (GLCM)
Analyzes spatial relationships between pixel intensities to capture texture statistics such as contrast, correlation, energy, and homogeneity (see the sketch below).
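A sketch computing the four GLCM statistics with scikit-image (using the graycomatrix/graycoprops spelling of skimage ≥ 0.19; the random array stands in for a real grayscale image):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in grayscale image

# Co-occurrence of gray levels for horizontally adjacent pixels (distance 1, angle 0)
glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)

glcm_features = [graycoprops(glcm, p)[0, 0]
                 for p in ("contrast", "correlation", "energy", "homogeneity")]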
The feature matrix $F$ includes the data for each sample in rows, and the columns hold the individual features:
• Shape feature: [A, P]
• Colour features: [𝜇𝑅 , 𝜇𝐺 , 𝜇𝐵 ]
• GLCM features: [Contrast, Correlation, Energy, Homogeneity]
• Local binary pattern histogram: [𝐿𝐵𝑃1 , 𝐿𝐵𝑃2 , …, 𝐿𝐵𝑃𝑛 ]
Feature matrix,
$$F = \begin{bmatrix} A_1 & P_1 & \mu_{R,1} & \mu_{G,1} & \mu_{B,1} & \mathrm{Contrast}_1 & \cdots & \mathrm{LBP}_{1,1} & \cdots & \mathrm{LBP}_{n,1} \\ A_2 & P_2 & \mu_{R,2} & \mu_{G,2} & \mu_{B,2} & \mathrm{Contrast}_2 & \cdots & \mathrm{LBP}_{1,2} & \cdots & \mathrm{LBP}_{n,2} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & & \ddots \end{bmatrix}$$
Here, each row denotes an individual sample image and the columns are the values of the individual features.
The ML models take the feature vector as input ($X_i$) and predict classes ($\hat{y}_i$) labelled {0, 1}. Let
$$x_i = [A, P, \mu_R, \mu_G, \mu_B, \mathrm{Contrast}, \mathrm{Correlation}, \mathrm{Homogeneity}, \ldots, \mathrm{LBP}_1, \ldots] \in \mathbb{R}^n$$
so that $F_i = X_i \in \mathbb{R}^n$ is the feature vector for the $i$th sample.
Mapping: $f: \mathbb{R}^n \to \{0, 1\}$; the problem is passed to ML classification models such as random forest, logistic regression, SVM, or k-NN (see the sketch below).
The output is denoted as
$$\hat{y}_i = \begin{cases} 1 & \text{(e.g., spoiled)} \\ 0 & \text{(e.g., fresh)} \end{cases}$$
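A minimal classification sketch with scikit-learn; the random F and y here stand in for the real feature matrix and labels produced by the extraction steps above:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

F = np.random.rand(100, 12)          # stand-in feature matrix (100 samples)
y = np.random.randint(0, 2, 100)     # stand-in labels: 0 = fresh, 1 = spoiled

X_train, X_test, y_train, y_test = train_test_split(F, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_hat = clf.predict(X_test)          # predicted classes in {0, 1}
print("Accuracy:", accuracy_score(y_test, y_hat))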
Classroom Exercise
Classification of bread crumb quality:
Dataset curated from research images to extract features [number of gas cells, 'num_cells'; average area of cells, 'avg_area'; SD of area of cells, 'std_area'; area of largest cell, 'max_area'; area of smallest cell, 'min_area']. Based on expert knowledge, the samples were labelled as 0, 1, 2 {Underproofed: 0; Normal: 1; Overproofed: 2}.
The model was trained on the dataset using random forest, logistic regression, SVM, and k-NN models, and the results were compared. Logistic regression performed the best. Its predictions were then used to check whether the model can generalize to unseen images.
Figure. Results for bread crumb quality prediction
Image analysis by Deep-learning models for food quality prediction
Deep learning has the ability to learn complex features without manual engineering. Traditional ML needs hand-crafted features (colour, texture, etc.), whereas DL learns features automatically from raw pixels.
Figure. CNN architecture
A raw color image is made up of three distinct color channels: Red (R), Green (G), and Blue (B).
This is often referred to as the RGB color model. Each pixel in the image has three components
corresponding to these channels: R, G, and B. The intensity values of these channels (for each
pixel) are usually in the range [0, 255] (8-bit per channel).
Considering an image of resolution 𝑀 × 𝑁, 𝑀 = Number of rows (height of the image) and 𝑁 =
Number of columns (width of the image). A colour image with this resolution is represented as a
3D matrix where 𝑀 corresponds to the height of the image, 𝑁 corresponds to the width of the
image and third dimension (3) corresponds to the R,G,B channels.
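A short NumPy/OpenCV sketch of this representation (the file name is hypothetical):

import cv2

img = cv2.imread("apple.jpg")        # 3-D array of shape (M, N, 3)
M, N, C = img.shape                  # M rows (height), N columns (width), C = 3 channels
B, G, R = img[:, :, 0], img[:, :, 1], img[:, :, 2]     # OpenCV stores channels as B, G, R
patch = img[M//2 - 2 : M//2 + 3, N//2 - 2 : N//2 + 3]  # central 5 x 5 x 3 patch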
Let the image of an apple be used as an example. Consider a small central 5 × 5 patch, which can be decomposed into three individual channels with individual pixel values (see below).
Figure. Image decomposition into individual RGB channels
Step 1:
Convolutional layer: applying a 3 × 3 filter or kernel as shown below
The kernel and pixel values at location (0,0) are combined by element-wise multiplication and summation (a dot product), and the corresponding output is calculated. The kernel then slides along the image patch to produce the remaining outputs.
$$\mathrm{Output}(i,j) = \sum_{m=0}^{2} \sum_{n=0}^{2} \left[ R_{i+m,\,j+n} \cdot K_R[m,n] + G_{i+m,\,j+n} \cdot K_G[m,n] + B_{i+m,\,j+n} \cdot K_B[m,n] \right]$$
Location (0,0):
$$\mathrm{Output}(0,0) = \begin{bmatrix} 237 & 237 & 238 \\ 233 & 228 & 236 \\ 233 & 242 & 233 \end{bmatrix} \odot \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} + \begin{bmatrix} 118 & 109 & 91 \\ 116 & 103 & 94 \\ 90 & 90 & 87 \end{bmatrix} \odot \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 52 & 44 & 48 \\ 49 & 37 & 48 \\ 48 & 53 & 38 \end{bmatrix} \odot \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$$
where $\odot$ denotes element-wise multiplication followed by summation:
$= [237 \times 1 + 237 \times 0 + 238 \times (-1) + 233 \times 1 + 228 \times 0 + 236 \times (-1) + 233 \times 1 + 242 \times 0 + 233 \times (-1)]$
$+ [118 \times 0 + 109 \times 1 + 91 \times 0 + 116 \times 0 + 103 \times 1 + 94 \times 0 + 90 \times 0 + 90 \times 1 + 87 \times 0]$
$+ [52 \times (-1) + 44 \times 0 + 48 \times 1 + 49 \times (-1) + 37 \times 0 + 48 \times 1 + 48 \times (-1) + 53 \times 0 + 38 \times 1]$
$= -4 + 302 - 15 = 283$
Compiling all outputs: $\mathrm{Output}(0,0) = 283$; $\mathrm{Output}(0,1) = 234$; $\mathrm{Output}(0,2) = 215$; $\mathrm{Output}(1,0) = 240$; $\mathrm{Output}(1,1) = 234$; $\mathrm{Output}(1,2) = 230$; $\mathrm{Output}(2,0) = 229$; $\mathrm{Output}(2,1) = 230$; $\mathrm{Output}(2,2) = 228$.
Final output $3 \times 3$ feature map $= \begin{bmatrix} 283 & 234 & 215 \\ 240 & 234 & 230 \\ 229 & 230 & 228 \end{bmatrix}$
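The Output(0,0) value can be verified numerically with NumPy, using the kernels and $3 \times 3$ windows from the worked example above:

import numpy as np

K_R = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
K_G = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
K_B = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])

R0 = np.array([[237, 237, 238], [233, 228, 236], [233, 242, 233]])
G0 = np.array([[118, 109, 91], [116, 103, 94], [90, 90, 87]])
B0 = np.array([[52, 44, 48], [49, 37, 48], [48, 53, 38]])

# Element-wise multiply each window with its kernel, then sum over all channels
out00 = np.sum(R0 * K_R) + np.sum(G0 * K_G) + np.sum(B0 * K_B)
print(out00)  # 283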
Activation function
After convolution, each pixel in the feature map is passed through an activation function like ReLU (Rectified Linear Unit), which introduces non-linearity into the model:
$$\mathrm{ReLU}(x) = \max(0, x)$$
If $x$ is greater than or equal to 0, the function returns $x$; if $x$ is less than 0, it returns 0.
Final output $3 \times 3$ feature map $= \begin{bmatrix} 283 & 234 & 215 \\ 240 & 234 & 230 \\ 229 & 230 & 228 \end{bmatrix} \rightarrow \mathrm{ReLU}(\mathrm{Output}) = \begin{bmatrix} 283 & 234 & 215 \\ 240 & 234 & 230 \\ 229 & 230 & 228 \end{bmatrix}$
(all values are already positive, so ReLU leaves the map unchanged)
Step 2:
Pooling layer
It reduces the size of the feature map while preserving important features, taking the maximum value of each $2 \times 2$ block (here with stride 1, so the $3 \times 3$ map becomes $2 \times 2$).
$$\mathrm{ReLU}(\mathrm{Output}) = \begin{bmatrix} 283 & 234 & 215 \\ 240 & 234 & 230 \\ 229 & 230 & 228 \end{bmatrix}$$
After max pooling, the top-left block $\begin{bmatrix} 283 & 234 \\ 240 & 234 \end{bmatrix}$ gives output 283; repeating for the other blocks, the pooled feature map $= \begin{bmatrix} 283 & 234 \\ 240 & 234 \end{bmatrix}$
Figure. After convolutional layer and max pooling layer
Step 3:
Flattening layer
It flattens the pooled feature map into the vector [283, 234, 240, 234], which is input into the fully connected layer of the CNN (see the sketch below).
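Steps 1–3 can be reproduced in NumPy; $2 \times 2$ max pooling with stride 1 is assumed, which is what turns the $3 \times 3$ map into the $2 \times 2$ pooled map above:

import numpy as np

fmap = np.array([[283, 234, 215],
                 [240, 234, 230],
                 [229, 230, 228]])

relu = np.maximum(fmap, 0)   # ReLU: negative activations become 0

# 2 x 2 max pooling with stride 1: 3 x 3 map -> 2 x 2 pooled map
pooled = np.array([[relu[i:i+2, j:j+2].max() for j in range(2)] for i in range(2)])
# pooled == [[283, 234], [240, 234]]

x = pooled.flatten()         # [283, 234, 240, 234], the dense-layer input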
Step 4:
Dense layer
The flattened feature vector is passed into one or more dense layers, multiplied by the corresponding weights $W$, with a bias term $b$ added. Let the initial weights and bias be:
$$W = \begin{bmatrix} 0.1 & -0.2 & 0.05 & 0.1 \\ 0.2 & 0.1 & -0.3 & 0.05 \\ -0.05 & 0.2 & 0.1 & -0.2 \\ 0.3 & -0.1 & 0.4 & 0.2 \end{bmatrix}, \qquad b = [1, -2, 0.5, 0]$$
$$Z = W^{\top} x + b$$
(each column of $W$ holds one neuron's weights, as used in the computations below)
Figure. Flatten layer to fully connected layer
Neuron 1: z1=283(0.1)+234(0.2)+240(-0.05)+234(0.3)+1=28.3+46.8-12+70.2+1= 134.3
Neuron 2: z2=283(-0.2)+234(0.1)+240(0.2)+234(-0.1)-2=-56.6+23.4+48-23.4-2= -10.6
Neuron 3: z3=283(0.05)+234(-0.3)+240(0.1)+234(0.4)+0.5=14.15-70.2+24+93.6+0.5=62.05
Neuron 4: z4=283(0.1)+234(0.05)+240(-0.2)+234(0.2)+0=28.3+11.7-48+46.8 = 38.8
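These four values follow from a single matrix product, with each column of W holding one neuron's weights per the convention above:

import numpy as np

x = np.array([283, 234, 240, 234])
W = np.array([[0.1, -0.2, 0.05, 0.1],
              [0.2, 0.1, -0.3, 0.05],
              [-0.05, 0.2, 0.1, -0.2],
              [0.3, -0.1, 0.4, 0.2]])
b = np.array([1, -2, 0.5, 0])

Z = W.T @ x + b   # [134.3, -10.6, 62.05, 38.8]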
Step 5:
Output layer
The ReLU-activated values are passed into the output layer. The $Z$ output from the dense layer becomes the new input here, with new weights $W'$ and a new bias $b'$:
$$Z' = \sum_{i=1}^{n} W'_i x'_i + b' = (0.2 \times 134.3) + (-0.1 \times 0) + (0.05 \times 62.05) + (0.3 \times 38.8) + (-10)$$
$$= 26.86 + 0 + 3.10 + 11.64 - 10 = 31.6$$
(note that ReLU has already set the second activation, $-10.6$, to 0)
Activation function (Sigmoid function)
$$\hat{y} = \sigma(Z') = \sigma(31.6) = \frac{1}{1 + e^{-31.6}} \approx 1 \in [0, 1]$$
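The last two steps in NumPy, starting from the dense-layer outputs of Step 4:

import numpy as np

a = np.maximum(np.array([134.3, -10.6, 62.05, 38.8]), 0)  # ReLU zeroes out -10.6
W_out = np.array([0.2, -0.1, 0.05, 0.3])                  # output-layer weights
b_out = -10                                               # output-layer bias

z_out = W_out @ a + b_out         # ~ 31.6
y_hat = 1 / (1 + np.exp(-z_out))  # sigmoid -> ~ 1.0, i.e., class 1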
Step 6
Loss function, backpropagation and training loop
The predicted label $\hat{y}$ is compared with the true label using a loss function $\mathcal{L}(y, \hat{y})$. The network updates its weights via backpropagation to minimize the misclassification error. This is repeated for many epochs over the training data until the accuracy is satisfactory (a minimal end-to-end sketch follows).
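A minimal Keras sketch of this training loop; the architecture and hyperparameters are illustrative assumptions, not a prescribed design:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output, as in Step 5
])

# Binary cross-entropy is the loss L(y, y_hat); weights update by backpropagation
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=20)    # repeat over epochs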
Transfer learning in DL
Instead of training from scratch, a model pre-trained on ImageNet is used to classify images. The pre-trained CNN can be fine-tuned for better results. Some pre-trained models include ResNet50, MobileNetV2, and VGG16 (see the sketch below).
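A sketch of transfer learning with a pre-trained MobileNetV2 backbone; the input size and classifier head are illustrative choices:

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False   # freeze the ImageNet features; unfreeze later to fine-tune

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary classifier head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])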
Classroom exercise
Classification of apples as {Defective: 0; Non-defective: 1}
Apple dataset of 550 images (275 defective, 275 non-defective). After 14 epochs over the training
data, accuracy was satisfactory