Journal of Electronic Systems and Programming www.epc.ly
A Review of Building a Search Engine Using Image Recognition
Amal Omar O. Saad, Libyan Academy for Postgraduate Studies, Tripoli, Libya
[email protected], [email protected]
Abstract—Image recognition is the process of identifying objects, places, people, writing, and actions in images using software. It has been the subject of extensive research over the past decade and has found many new applications, such as intelligent assistants in e-commerce. Artificial Intelligence (AI) is replacing ordinary computing, and machine learning is being applied in areas such as disease prediction, weather forecasting, and image recognition. Convolutional Neural Network (CNN) models, which belong to the family of data mining (DM) techniques, have been widely used to analyze images and retrieve a query image from a massive number of images stored in a dataset. Image recognition is based on the features of the image: it searches for the best-matching patterns to obtain the required image in a short time. Image processing is not a simple task, since it must take all image features into account, which is why AI and DM are well suited to building an effective image search engine. This paper reviews the models and algorithms that have been used to develop image search engines, focusing on the techniques that best retrieve queried images. The models reviewed are ResNet-50, VGG16, Inception V3, and ConvNets.
Keywords— Image recognition, Pattern Recognition, CBIR, neural networks, CNN
I. INTRODUCTION
Image recognition has become more prevalent, and search by image (image-to-image search) is used in search engines to find products, providing strong support for e-commerce and online shopping. With so many alternatives available, buyers may need a lot of searching to locate a product that meets their needs. Many studies have tried to retrieve the required image by studying image
properties. For example, V. Ragatha et al. [7] proposed an image-based search engine using edge change detection, a Sobel filter design, and color coherence vectors (CCV); such techniques fall under Content-Based Image Retrieval (CBIR). Y. C. Wahl et al. (2013) proposed a scheme that uses feature learning for clothing image interpretation, which is widely used in e-commerce. Reverse image search is a technique for querying the internet with an image to see whether any other exact copies of it exist. More technically, it is a CBIR query method that gives the CBIR system a sample image on which to base its search. When a picture is uploaded, it passes through the reverse image search engine's fingerprinting algorithm; the search engine then tries to find the entries with the closest fingerprints, referred to as the "image distance". Vissarut Surkarin et al. (2016) suggested a way to categorize and identify clothing types using Speeded-Up Robust Features (SURF) for feature extraction, in combination with Bag of Features (BoF)-based LDP descriptors [2].
A query image's feature vectors are compared against the database of feature vectors using a CBIR technique to arrive at the closest, most relevant photos. An image search engine can be applied to many everyday items such as clothing, and searching server databases for such information is useful in the modern world of fashion. Separately, Earth science researchers at NASA developed the Worldview search engine tools, which compare an input satellite image with the images saved in the dataset [2].
Various algorithms have been proposed for the feature extraction task using data mining techniques, and these have proved to yield efficient, interactive, flexible, and fast image-similarity search engines [3]. Two search approaches can be used: metadata-based search and feature-based search.
This paper presents algorithms that have been proposed for the feature extraction task under both approaches. In the metadata approach, the contents of each image are identified using tags; when a user searches for images (using either text or another image as input), the images whose metadata most closely match the query metadata are returned. In the second approach, the search is performed on the image features themselves: a trained Convolutional Neural Network (CNN) model extracts features from each image, and the database is queried with these deep features. This kind of deep neural network is particularly effective at pattern matching and image recognition, for example in fabric prediction, because it looks for patterns across the database. CNN-based systems use architectures such as the Residual Neural Network ResNet-50 [3]-[5], VGG16 [5], and InceptionV3 [6], which are based on image feature extraction. The image recognition system consists of the standard operational stages described in Fig. 1.
Fig. 1. Image recognition system: image acquisition → pre-processing (filtering, enhancement, etc.) → image segmentation → feature extraction (texture, colors) → classification → result in terms of accuracy. Adapted from [4]
Image acquisition involves capturing an image using different units; a variety of sensors are available, such as light sensors, ultrasound, and radar. Pre-processing prepares the image before objects are extracted and rests on certain assumptions and conditions, such as image format, resolution, color, and size; typical operations include histogram equalization and noise reduction.
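As a small illustration of such pre-processing operations (interpreting the garbled phrase in the source as histogram equalization), the following sketch applies equalization, noise reduction, and resizing with OpenCV; the file names are placeholders, not part of any reviewed study.

```python
# Sketch: typical pre-processing before feature extraction (illustrative only).
# File names are placeholders; the operations follow the steps named in the text.
import cv2

img = cv2.imread("query.jpg")                          # result of image acquisition
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # work on a single channel

equalized = cv2.equalizeHist(gray)                     # histogram equalization
denoised = cv2.fastNlMeansDenoising(equalized, h=10)   # noise reduction
resized = cv2.resize(denoised, (224, 224))             # normalize size for a CNN

cv2.imwrite("query_preprocessed.png", resized)
```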
Segmentation selects different image regions during processing and identifies suitable points for further processing. Feature extraction captures the information contained in each segmented region, either for the image as a whole or for one specific element. Classification assigns all elements to multiple categories based on their attributes. This paper discusses various techniques and approaches used in image search engines. Three search engines are presented: the Worldview search engine, a Reverse Image Search Engine for the Garment Industry, and Snapshop, an image-based search application. Finally, a table summarizing the models used to develop the search engines in the three studies is presented.
A. Snapshop Search Engine
In-depth research is underway in the field of image recognition, based on the maxim that an image is worth a thousand words, and visual search built on image recognition will continue to grow in the future. Convolutional Neural Networks (CNNs) are deep learning algorithms capable of taking an input image, assigning importance (learnable weights and biases) to different elements/regions of the image, and differentiating them from one another. The ConvNets model is used to develop this search engine; it is able to learn these filters/features with proper training. Figure 2 presents the main steps of the CNN process. A CNN consists of the following layers (a minimal code sketch is given after the list):
- Convolution (filter) layer: a filter of a given pixel size (e.g., 3×3, 5×5) slides over the image, and the convolution output at each position is the dot product of the filter values and the underlying pixel values.
- Activation layer: applies a nonlinear activation function so the network can learn complex patterns; common activation functions are Sigmoid, Softmax, ReLU, and tanh.
- Pooling (merge) layer: further reduces the size of the feature maps and removes redundancy, which allows the network to learn faster while focusing on the most significant details in every part of the image.
- Fully connected layer: a standard multilayer perceptron that outputs a set of probabilities associated with each class (e.g., dog, cat, bird).
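To make the layer roles above concrete, the following is a minimal sketch of a simple CNN for Fashion-MNIST-style 28×28 grayscale images using the Keras API. It is not the configuration used in the reviewed Snapshop study; the layer sizes and hyperparameters are illustrative assumptions.

```python
# Minimal CNN sketch for Fashion-MNIST-style input (28x28 grayscale, 10 classes).
# Illustrative only: layer sizes and hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution (filter) layer
    layers.MaxPooling2D((2, 2)),                    # pooling layer: shrink feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),           # fully connected layer
    layers.Dense(10, activation="softmax"),         # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```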
Experiments on the ImageNet dataset have shown that CNNs provide highly accurate results and strong performance. However, they still struggle with fine-grained cases, such as small ants on a flower stalk, and they find it hard to distinguish images processed with filters similar to Instagram filters. The main challenge in image processing is that every image has many features, so several kinds of image recognition exist for different purposes:
1. Object recognition: detecting whether one or more previously learned objects or object classes are present, in two or three spatial dimensions. Google is one example of a search engine that offers this.
2. Identification: recognizing a specific instance of an object, such as identifying a particular person's face or a handwritten digit.
Fig. 2. A simple CNN design for Fashion-MNIST. Adapted from [4]
A range of query images of various types is taken. The top-4 similar images retrieved by Annoy metric indexing (approximate nearest neighbors) are tabulated with their similarity scores. The retrieved images correspond to the query in terms of color, shape, pattern, texture, and style, showing that the image search engine can successfully capture features for a wide range of images and deliver accurate results.
Figure 3 shows the architecture of the Snapshop-based search engine, which consists of three layers:
1. User Interface: an essential part of any user-facing application; it engages users and gives them a good experience.
2. Middle Layer: a web service based on picture recognition.
3. Dataset Layer: a collection of related data organized into some type of data structure.
Fig. 3. System architecture. Adapted from [4]
B. Worldview Search Engine
NASA's Earth Observing System Data and Information System (EOSDIS) created an application that contains over 52,000 satellite images and about 900 layers saved in a database [3]. This application lets users interact with the Earth map as layers, which are updated daily. The Worldview search engine uses an artificial neural network that is specifically designed to process pixel data for image recognition and processing. The image data are saved as lists of numbers called "features", and these features are the key to searching for an image. ResNet-50, a CNN model, compresses the 2048-dimensional feature vector down to 128 features, and with the Annoy library as the feature store, data can be retrieved in about 5 seconds per query on a single virtual machine (VM) in the cloud. Additionally, the UX built for Worldview helps the user refine the search iteratively until the required image is found; the UX takes a subset of the input image using a snipping tool. The search engine in this study is called the Pipeline. The Pipeline works on more than 52k images taken from NASA GIBS at zoom level 8 and 200 images taken at zoom level 4. The images' features are stored in an index file (*.ann) using the Annoy Python library; the storage algorithm is based on binary trees and is organized for query speed. The Annoy search returns the indexes of the features that most closely match the user's query image, and these indexes can be converted into NASA GIBS URLs pointing to the images that the features represent. The application is developed as a Flask app. Figure 4 below shows the image search engine steps.
Fig. 4. Pipeline of the reverse image search. Adapted from [3]
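As an illustration of this feature-indexing step, the sketch below builds and queries an Annoy index over pre-computed feature vectors. The file names, dimensionality, and tree count are assumptions for illustration, not the values used in the NASA pipeline.

```python
# Sketch: store image feature vectors in an Annoy index and query it.
# Assumes features were already extracted (e.g., 128-d vectors per image);
# file names and parameters are illustrative, not the NASA pipeline's values.
import numpy as np
from annoy import AnnoyIndex

DIM = 128                                      # length of each feature vector
index = AnnoyIndex(DIM, metric="euclidean")

features = np.load("image_features.npy")       # shape (num_images, DIM), hypothetical file
for i, vec in enumerate(features):
    index.add_item(i, vec)                     # item id i maps back to the i-th image

index.build(n_trees=10)                        # more trees -> better accuracy, larger index
index.save("worldview.ann")                    # binary-tree index saved to disk

# Query: find the 4 stored images closest to a query feature vector.
query_vec = features[0]
ids, distances = index.get_nns_by_vector(query_vec, 4, include_distances=True)
print(ids, distances)
```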
The researchers designed a scalable reverse image search system on a cloned version of NASA's Worldview. The ResNet-50 model can support reverse search over a larger number of image classes and a database containing several years of imagery. The current CNN is only applicable to the labeled image classes on which it was trained, and it would be impractical to label images in order to train a model in a supervised manner for reverse image search. Instead, scientists would store their image database as a collection of the features of all images and save their custom model in an (*.h5) file, because it is not feasible to save and manage all of the images themselves. The researchers have shown that creating a reverse image search system on Worldview is highly feasible; this allowed them to speed up search while reducing the storage required for massive databases of satellite images. ResNet was adapted to speed up search and to shrink the storage of the enormous satellite image databases. Even so, a standard ResNet model was not ideal for reverse image search, as it classified well by color but lacked detection of finer features.
C. Garment Industry Reverse Image Search Engine
The garment industry is aiming to enter the world of e-commerce aggressively.
However, current search engines such as Google and Bing that search text to text and
text to images do not achieve this goal. This study developed an approach consisting of
four models for a search engine based on searching image to image, as shown in Figure
4. The four models are listed below:
1. Image Pre-processing and Feature Extraction: The first module focuses on pre-processing the images in the clothing dataset and extracting relevant features that represent their visual information. The images are first resized and then converted into RGB format. They are then converted into a 3-dimensional numpy representation and zero-centered using the RGB mean values specific to each model. Lastly, the numpy representations are passed through the model and the features are extracted from the pre-final layer, before the final classification layers. The feature files are saved with the (.npy) extension.
2. CNN Model Selection: The second module compares three pre-trained CNN models, VGG16, ResNet50, and InceptionV3, to choose the optimal model for the clothing dataset. The selected CNN model has a significant effect on the extracted feature vectors and the subsequent accuracy of similar-image retrieval.
3. Training Model using Transfer Learning: The third module applies transfer learning to refine the selected model for retrieving domain-specific items. The development of the CNN model allows selecting layers of the pre-trained CNN model as well as fine-tuning the CNN model on the custom dataset. Viewing the feature maps after each convolution block in ResNet-50 helps in deciding the optimal extraction layer. The ResNet-50 model up to, but not including, the fully connected head is used as the base model, to which additional layers are added. The base model is frozen before retraining, so only the added layers are trained on the dataset. This fine-tuning ensures that the model is more specialized to the dataset and can capture more subtle features (a minimal sketch of this setup is given after the module list).
Fig. 5. Basic block. Adapted from [3]
4. Fine-tuning ResNet-50: The fourth module performs image retrieval for a given query image using Annoy indexing as the similarity measure. The dataset contains about 4200 images; unique image identifiers, image URLs, alternative image texts, and product page URLs are also stored for access. ResNet50 returns similar images with better quality than the less similar ones. The elapsed time for extracting the original features from a subset of the dataset is 1150 seconds, while extracting the fine-tuned features from the same subset takes 1032 seconds. The optimal similarity metric for the models should be chosen with both accuracy and retrieval time in mind.
The retrieval time is significant here. A similar search engine must consider scalability and the capacity to process requests as quickly as possible, and the differences, although small, become more significant at larger scale. The Annoy indexing process offers faster indexing and optimizes the comparison process by building a rough static tree index structure.
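The following is a minimal sketch of the feature-extraction and transfer-learning steps described in modules 1 and 3, using the Keras ResNet-50 API. The image file, class count, and added head layers are assumptions for illustration rather than the study's exact configuration.

```python
# Sketch: ResNet-50 feature extraction plus a small transfer-learning head.
# Image file, class count, and head layers are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras import layers, models

# Base model without the fully connected head; frozen so only added layers train.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False

# Added head for the custom clothing dataset (number of classes is assumed).
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dense(20, activation="softmax"),        # e.g., 20 garment categories
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Feature extraction from the pre-final (pooled) layer for one image.
img = tf.keras.utils.load_img("shirt.jpg", target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
features = base.predict(x)                         # shape (1, 2048)
np.save("shirt_features.npy", features)            # saved with the .npy extension
```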
The top three resulting images for each query image, demonstrating the similarity search process of the image search engine, are shown in Table I.
Table I. Results of similar images. Adapted from [5]
The differences between similarity measures, although small per query, add up at scale. The elapsed times for retrieving 10 images for each of 10 queries are listed below (and illustrated in the sketch that follows):
- Euclidean Distance: 4.33 seconds.
- Manhattan Distance: 4.76 seconds.
- Cosine Similarity: 3.83 seconds.
- Annoy Indexing: 3.45 seconds.
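To illustrate how these similarity measures differ in practice, the sketch below computes exact Euclidean, Manhattan, and cosine distances over a set of feature vectors and contrasts them with an approximate Annoy query. The vectors are synthetic placeholders; the timings above come from the reviewed study, not from this code.

```python
# Sketch: compare exact distance metrics with approximate Annoy retrieval.
# Feature vectors here are random placeholders, not the study's clothing features.
import numpy as np
from scipy.spatial.distance import cdist
from annoy import AnnoyIndex

rng = np.random.default_rng(0)
db = rng.normal(size=(4200, 128)).astype("float32")   # assumed database of features
query = db[0:1]

# Exact top-10 neighbors under three metrics.
for metric in ("euclidean", "cityblock", "cosine"):   # cityblock == Manhattan
    dists = cdist(query, db, metric=metric)[0]
    top10 = np.argsort(dists)[:10]
    print(metric, top10[:5])

# Approximate top-10 neighbors with Annoy (static tree index, built once).
index = AnnoyIndex(128, metric="euclidean")
for i, vec in enumerate(db):
    index.add_item(i, vec)
index.build(n_trees=10)
print("annoy", index.get_nns_by_vector(query[0], 10)[:5])
```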
II. Discussion
Deep convolutional neural networks (CNNs) are the dominant technology in computer vision today. Unfortunately, it is not always clear how different the best CNNs really are from one another: different processing algorithms often end up extracting similar features, and the selection of the training set may be more important than the selection of the algorithm.
Table II summarizes the three studies that presented image search engines in terms of the model or algorithm used in the search engine's development and the techniques applied.
Table II. Summary of models

| No | Study | Techniques | Algorithms (Model) | Advantage | Disadvantage |
|----|-------|------------|--------------------|-----------|--------------|
| 1 | Worldview search engine | CNN | ResNet | Uses CNN to enhance color features in the satellite images; high-speed and efficient storage. | The system cannot generate rules without learning from datasets. |
| 2 | Reverse Image Search Engine for the Garment Industry | CNN; Annoy indexing algorithm (an unsupervised, non-linear approximate nearest-neighbor algorithm originally developed at Spotify for music recommendation) | ResNet, VGG16, Inception V3 | Successfully retrieved accurate similar images for the clothing industry. | CNN models extract image characteristics including the background, so a noisy background leads to confusion. |
| 3 | Snapshop, an Image-Based Search Application | CNN classification (supervised) | ResNet | The extracted features allow searching for images at a much more specific level than metadata. | - |
III. Conclusion
This paper addressed the research question of how to build a search engine using image recognition. It examined three studies, showing the model and method used to build each image search engine: the Worldview search engine, the Reverse Image Search Engine for the Garment Industry, and Snapshop, an image-based search application. In each case, the image recognition service is the core functionality of the application, enabling users to extract visual information from images. The technology and models used in these image search engines are CNNs, which are a subset of machine learning and lie at the heart of deep learning algorithms. They belong to the family of data mining techniques and create an adaptive system that learns from its mistakes and improves continuously; artificial neural networks thus attempt to solve complicated problems, such as summarizing documents or recognizing faces, with greater accuracy. The reviewed uses of CNN models are successful, but the limited exploration of similarity measures restricts their implementation. Image recognition is a field of computer vision that has been growing rapidly in recent years and has many applications, including building search engines based on image recognition. Future work could build an image search engine using deep learning models such as CLIP, a neural network built to learn image features through natural language supervision; an engine built this way can be quick and accurate.
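As a pointer for that future direction, the following is a minimal sketch, not part of the reviewed studies, of extracting CLIP image embeddings with the Hugging Face transformers library and comparing them by cosine similarity; the image file names are hypothetical.

```python
# Sketch: CLIP image embeddings for an image search engine (illustrative only).
# Model name is the public OpenAI CLIP checkpoint; file names are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    """Return a unit-normalized CLIP embedding for one image file."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Cosine similarity between a query image and a database image.
query, candidate = embed("query.jpg"), embed("product.jpg")
print(float(query @ candidate.T))
```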
IV. REFERENCES
[1] S. Jain and J. Dhar, "Image based search engine using deep learning," 2017 Tenth International Conference on Contemporary Computing (IC3), 2017, pp. 1-7, doi: 10.1109/IC3.2017.8284301.
[2] M. Seeley, F. Civilini, N. Srishankar, S. Praveen, A. Koul, A. Berea, and H. El-Askary, "Knowledge Discovery Framework," 2020.
[3] A. Sodani, M. Levy, A. Koul, M. An, and K. Ganju, "Scalable Reverse Image Search Engine for NASA Worldview," 10 Aug 2021.
[4] K. D. Argade, D. M. Gaware, P. S. Umap, S. P. Nalawade, and S. Baravkar, "Snapshop-An Image Based Search Application," International Journal of Scientific Research in Science and Technology, Print ISSN: 2395-6011, Online ISSN: 2395-602X, published 17 June 2021, URL: https://www.academia.edu/49659377/Snapshop_An_Image_Based_Search_Application, accessed 5 May 2022.
[5] A. Eswaran and Varshini E., "Reverse Image Search Engine for Garment Industry," 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), 2022.
[6] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818-2826, doi: 10.1109/CVPR.2016.
[7] D. Venkata and D. Yadav, "Image Query Based Search Engine Using Image Content Retrieval," International Conference on Modeling and Simulation, 2012, doi: 10.1109/UKSim.2012.48.
[8] A. Nodari, M. Ghiringhelli, A. Zamberletti, M. Vanetti, S. Albertini, and I. Gallo, "A mobile visual search application for content based image retrieval in the fashion domain," 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012.
[9] J. Wu and the LAMDA Group, "Introduction to Convolutional Neural Networks," Nanjing University, China.
[10] "Building a powerful Image Search Engine for your pictures using Deep Learning," published 22 Jul 2021, URL: Building a powerful Image Search Engine using DL | CodeX (medium.com), accessed 8 Jun 2023.