DEEP LEARNING IN INDUS VALLEY SCRIPT DIGITIZATION
by
DEVA MUNIKANTA REDDY ATTURU
Bachelor of Technology
Computer Science and Engineering
Siddharth Institute of Engineering and Technology
2021
A Thesis
submitted to the College of Engineering and Science
at Florida Institute of Technology
in partial fulfillment of the requirements
for the degree of
Master of Science
in
Computer Science
Melbourne, Florida
May, 2024
© Copyright 2024 DEVA MUNIKANTA REDDY ATTURU
All Rights Reserved
Title:
DEEP LEARNING IN INDUS VALLEY SCRIPT DIGITIZATION
Author:
DEVA MUNIKANTA REDDY ATTURU
Major Advisor:
Debasis Mitra, Ph.D.
lowed by the Motif on the IVC Seal in the structured format.
Table of Contents
Abstract
Abbreviations
Acknowledgments
Dedication
1 Introduction
2 Literature Survey
3 Conceptual Landscape
3.1 Convolutional Neural Networks
3.2 YOLOv3
3.3 MobileNet
4 Proposed Approach
5.1.2 Dataset
5.2 Grapheme Identification
5.2.1 Description
5.2.2 Dataset
5.2.3 M-net Architecture
5.3 Model Accuracy
5.3.1 M-net
5.4 Motif Identification
5.4.1 Description
5.4.2 MIP-net Architecture
5.4.3 Dataset
5.4.4 MIP-net
5.5 Database
5.5.1 UML Diagrams
5.5.1.1 Class Diagram
5.5.1.2 Component Diagram
5.5.1.3 Sequence Diagram
5.5.2 Storing the Final Data
5.5.2.1 Description
5.5.3 Sample Queries to Retrieve Data from Database
5.6 End-to-End Workflow of Indus Script Digitization
6.4 Motif Identification
6.5 Database Storage
7 Challenges
7.1 Low sample size for some classes
7.2 Broken seals with only partially visible motif
7.3 Stylistic variations and uncertain class
References
List of Figures
List of Symbols, Nomenclature or Abbreviations
Acknowledgements
I owe a profound debt of gratitude to my thesis advisor, Dr. Debasis Mitra, for his
unwavering support, invaluable guidance, and constant motivation throughout this
research endeavor. His depth of knowledge and insightful critiques have been instru-
mental in shaping every aspect of this thesis. I extend my heartfelt thanks to my
esteemed committee members, Dr. Xianqi Li and Dr. Eraldo Ribeiro, for graciously
accepting to be part of my committee and for their invaluable contributions and sup-
port. Additionally, I am deeply appreciative of the staff and faculty at the Florida
Institute of Technology for fostering a stimulating academic environment conducive to
intellectual growth. Lastly, I extend my gratitude to all the participants whose invalu-
able contributions made this research possible. Special thanks also go to Shubam B,
Ali N, Ujjwal B, and Mukharjee A for their significant contributions.
This work was supported by the National Endowment for the Humanities, grant PR-290075-23.
Dedication
To my beloved family, whose unwavering love, encouragement, and sacrifices have been
my anchor throughout this academic journey. Your steadfast support has fueled my
determination to reach this milestone. This thesis is dedicated to you, with heartfelt
gratitude and immense love. To my esteemed thesis advisor, Debasis Mitra, whose
constant support, guidance, and motivation have been indispensable. Your expertise
and insightful critiques have profoundly shaped this thesis. I also extend my sincere
appreciation to my committee members, Dr. Xianqi Li and Dr. Eraldo, for their
invaluable feedback and support. To the staff and faculty at Florida Institute of Tech-
nology, thank you for fostering a stimulating academic environment that has nurtured
my growth and learning. And to all the participants who made this research possible,
your contributions are deeply appreciated. This thesis is a reflection of the collective
efforts and support that have propelled me forward. Thank you all.
Chapter 1
Introduction
The Indus Civilization, also known as the Harappan Civilization, represents one of the
world’s oldest urban societies, flourishing in the vast floodplains of the Indus River and
possibly the now-extinct Saraswati River in present-day Pakistan and northwest India.
Spanning roughly from 2600 BCE to 1900 BCE, this ancient civilization is renowned
for its advanced urban planning, sophisticated drainage systems, standardized weights
and measures, and distinctive artifacts, including seals bearing inscriptions in the enig-
matic Indus script. Despite its prominence, the Indus script remains undeciphered,
posing a significant challenge to scholars seeking to unravel the mysteries of this an-
cient civilization. Unlike other ancient civilizations such as Egypt and Mesopotamia,
which have benefited from the discovery of bilingual inscriptions like the Rosetta Stone,
the Indus Civilization lacks a comparable linguistic key, hindering efforts to decipher
its script and understand its society, economy, and culture.
Over the past century, scholars have engaged in meticulous studies of the Indus
script, employing various methodologies to decipher its meaning. However, the ab-
sence of a Rosetta Stone equivalent has compelled researchers to explore alternative
approaches, such as statistical analyses of grapheme sequences, intra-script grapheme
associations, and contextual clues derived from archaeological artifacts. These manual
efforts, while insightful, are labor-intensive, time-consuming, and limited in scalability.
In recent years, advancements in data science and machine learning have opened up
new avenues for the computational analysis of ancient scripts, offering the potential to
automate and expedite the decipherment process. To address the challenge of grapheme
identification within the Indus script, we propose the use of ASR-net, a novel neural
network architecture that combines the strengths of M-net and YOLOv3 for efficient
and accurate identification of individual graphemes. ASR-net leverages the capabilities
of M-net for character recognition and YOLOv3 for object detection, enabling robust
detection and classification of graphemes on Indus seals.
Moreover, motif identification on Indus seals presents another significant challenge,
as these motifs often serve as key elements for understanding the symbolic and cul-
tural significance of the artifacts. To tackle this challenge, we introduce MIP-net, a
machine learning framework specifically designed for motif identification in archaeo-
logical imagery. MIP-net employs convolutional neural networks (CNNs) trained on
annotated datasets of Indus seals to automatically identify and classify motifs, allowing
for efficient analysis of large collections of artifacts.
In light of these developments, our research aims to bridge the gap between tradi-
tional scholarship and computational analysis by proposing a machine learning-based
approach for the automated identification and analysis of motifs—distinctive symbols
or iconographic elements—found on Indus seals. These seals, typically made of steatite
or other soft stones, feature intricate engravings comprising motifs, often accompanied
by short inscriptions in the Indus script. By leveraging ASR-net for grapheme identifi-
cation and MIP-net for motif identification, our proposed system seeks to automate the
process of deciphering Indus seals, enabling researchers to efficiently analyze large col-
lections of artifacts and extract valuable insights into the socio-cultural and economic
aspects of the Indus Civilization.
Additionally, we have developed a comprehensive database comprising high-resolution
images of Indus seals, along with metadata detailing their provenance, dimensions, and
associated inscriptions where available. This database serves as a foundational resource
for our research, providing a rich repository of visual and contextual data for training
and validating our machine learning models. Through the development of automated
tools for motif identification, we aim to contribute to the broader scholarly efforts aimed
at deciphering the Indus script and shedding light on the rich tapestry of the ancient
Indus Civilization. By harnessing the power of machine learning and computational
analysis, we hope to unlock new avenues of research and deepen our understanding of
this enigmatic ancient society.
Below is a brief overview of what each chapter covers.
Chapter 2 presents a comprehensive survey of existing literature on the decipher-
ment of the Indus script. Traditional methodologies and computational approaches
used in Indus script analysis are reviewed, critically evaluating previous efforts and
identifying gaps in research.
Chapter 3 introduces key concepts and methodologies employed in the research. It
explains machine learning algorithms and techniques relevant to motif identification,
along with an overview of data annotation, model training, and evaluation processes.
Chapter 4 describes the proposed methodology for automated motif identification
on Indus seals. It discusses the rationale behind the selection of machine learning
algorithms and data preprocessing techniques, providing an outline of the workflow for
data annotation, model training, and deployment.
Chapter 5 provides a detailed explanation of the implementation process, including
data collection, annotation, and model training. It describes the tools and technolo-
gies utilized in the implementation phase, along with an overview of the challenges
encountered and solutions devised during implementation and includes Unified Mod-
eling Language (UML) diagrams illustrating the system architecture, data flow, and
entity relationships. It explains each diagram and its relevance to the proposed ap-
proach and implementation.
Chapter 6 presents and analyzes the results obtained from the implementation
phase. It evaluates the performance of the machine learning models in motif identi-
fication and discusses the implications of the results for deciphering the Indus script
and understanding the Indus Civilization.
Chapter 7 discusses the challenges encountered during the research process. It ex-
plores the difficulties faced in implementing the proposed approach, including technical
limitations, data quality issues, and methodological constraints.
Chapter 8 provides a summary of the research findings and their significance in
the context of deciphering the Indus script. It reflects on the strengths and limita-
tions of the proposed approach and proposes future research directions and potential
improvements to the methodology.
The bibliography follows, listing all the references cited throughout the thesis. It
provides readers with a comprehensive list of sources for further reading and
verification of the information presented in the document.
The following part will elaborate on the background work associated with the
project.
Chapter 2
Literature Survey
The study by Varun Venkatesh et al. [31] investigated the Indus script by analyzing
patterns and positions of individual signs, pairs, and sequences. They built statistical
models and algorithms to predict sign behavior based on their position. This analysis
revealed significant differences in the language used in Indus texts from West Asia
compared to those from the Indian subcontinent, suggesting distinct regional dialects
within the Indus civilization.
Researchers have proposed a novel method to tackle the challenges of deciphering
undeciphered scripts like the Indus Valley Script in a study by Shurthi Daggumati et
al. [3]. This method focuses on identifying and grouping together different ways of
writing the same symbol (allographs) based on their positions within the inscriptions.
The authors argue that this approach can significantly simplify the script by reducing
the number of unique symbols, potentially paving the way for a breakthrough in de-
ciphering its hidden messages. They applied their method to the Indus Valley Script
and identified 50 symbol pairs that could be grouped, reducing the complexity of the
script by 12%. This exciting development holds promise for unlocking the secrets of
these ancient languages.
In a paper by Michael Oakes et al. [17], the distribution of Indus Valley script signs
found in Mahadevan’s 1977 concordance is analyzed. Using Large Numbers of Rare
Events (LNRE) models, the authors estimate a vocabulary of around 857 signs, includ-
ing undiscovered ones. Statistical analysis reveals non-random distributions based on
factors like position, archaeological site, object type, and direction of writing. The au-
thors conclude that further analysis is needed to understand the underlying structure
and meaning of the Indus Valley script.
While the study by Ansumali Mukhopadhyay et al. [16] offers an intriguing ap-
proach to deciphering the Indus Valley script using Dravidian languages, it acknowl-
edges several key areas requiring further exploration. The connection between Dravid-
ian languages and the Rig Veda remains a point of debate within academic circles, and
the vast timeframe between the Mehrgarh civilization and the Indus Valley necessi-
tates careful consideration. Additionally, the paper highlights the uncertainties sur-
rounding the Aryan invasion and its impact on pottery styles. By acknowledging these
open questions and encouraging further research, the analysis ultimately contributes
to the ongoing quest to unlock the secrets of the Indus script, even if it doesn’t provide
definitive answers at this stage.
In their study, S. Palaniappan et al. [18] recognize the endeavor to automate the
preparation of standardized corpora for undeciphered scripts as a significant challenge,
often requiring laborious manual effort from raw archaeological records. Recent efforts
have sought to address this challenge by exploring the potential of machine learning al-
gorithms to streamline the process, offering valuable insights for epigraphical research.
Building upon this groundwork, authors present a pioneering deep learning pipeline
tailored for the Indus script, aiming to automate the extraction and classification of
graphemes from archaeological artifacts. Through the integration of convolutional neu-
ral networks and established image processing techniques, their methodology demon-
strates promising advancements in accurately identifying and categorizing textual el-
ements. This work contributes to the evolving landscape of computational epigraphy,
showcasing the potential of deep learning approaches to revolutionize research method-
ologies in the digital humanities domain.
The related works presented by the cited papers offer valuable insights and method-
ologies relevant to the project of deep learning in Indus Valley script digitization.
Firstly, they highlight the complexity of the script and the challenges associated with
deciphering it, emphasizing the need for innovative approaches. The studies on statisti-
cal analysis and allograph identification provide crucial groundwork for understanding
the patterns and structures within the script, which can inform the design of deep
learning models. Additionally, the exploration of linguistic connections, such as with
Dravidian languages, offers potential insights into the script’s origins and linguistic
context. Moreover, the efforts to automate corpus preparation and grapheme extrac-
tion demonstrate the application of advanced computational techniques, particularly
deep learning, in streamlining the digitization process. By building upon these previous
works, the project aims to leverage deep learning algorithms to automate the analysis
and interpretation of the Indus Valley script, ultimately contributing to the broader
goal of unlocking its hidden messages and historical significance.
The subsequent section will detail the array of concepts utilized in the project.
Chapter 3
Conceptual Landscape
3.1 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision
by introducing powerful hierarchical representations of visual data. Unlike traditional
neural networks, CNNs are specifically designed to effectively capture spatial hierar-
chies in images through the use of convolutional layers. These layers consist of filters
that slide over input images, capturing local patterns and features at different spatial
scales. By stacking multiple convolutional layers followed by pooling layers, CNNs are
able to progressively learn complex representations of visual data.
The architecture of a typical CNN comprises multiple layers, including convolutional
layers, activation functions, pooling layers, and fully connected layers. Convolutional
layers are responsible for learning features from input images by applying convolution
operations with learnable filters. Activation functions, such as ReLU (Rectified Linear
Unit), introduce non-linearity to the network, allowing it to learn complex relationships
between features. Pooling layers, such as max pooling or average pooling, downsample
feature maps to reduce the spatial dimensions and computational complexity of subsequent
layers. Fully connected layers integrate extracted features for final classification
or regression tasks.
Figure 3.1: Basic CNN Architecture
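To make this layer stack concrete, the following minimal TensorFlow/Keras sketch assembles a small CNN of the kind described above; the input size, filter counts, and the 40-class output are illustrative placeholders, not the exact architecture trained in this thesis.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_basic_cnn(input_shape=(224, 224, 3), num_classes=40):
    """Minimal CNN: stacked convolution + ReLU + pooling, then fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),  # learn local features
        layers.MaxPooling2D((2, 2)),                                   # downsample feature maps
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                          # integrate extracted features
        layers.Dense(num_classes, activation="softmax"),               # final classification
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model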
CNNs have demonstrated remarkable success in various computer vision tasks, in-
cluding image classification, object detection, and semantic segmentation. Their ability
to automatically learn hierarchical representations of visual data has led to significant
advancements in fields such as medical imaging, autonomous driving, and image-based
biometrics. Additionally, CNNs have been widely adopted in industry applications,
powering image recognition systems in smartphones, surveillance cameras, and quality
control systems.
The widespread adoption of CNNs can be attributed to their effectiveness in han-
dling large-scale visual data, robustness to variations in input, and scalability to dif-
ferent tasks and domains. Their architecture and design principles have laid the foun-
dation for numerous advancements in deep learning and computer vision research.
As CNNs continue to evolve with innovations such as residual connections, attention
mechanisms, and efficient architectures like MobileNet, they remain at the forefront of
cutting-edge research and practical applications in the field of computer vision.
3.2 YOLOv3
YOLOv3, short for You Only Look Once version 3, is an advanced object detection
model renowned for its efficiency and accuracy. Introduced by Joseph Redmon and Ali
Farhadi in 2018, YOLOv3 represents a significant improvement over its predecessors by
incorporating several key enhancements. The fundamental concept behind YOLOv3 is
its ability to perform object detection in real-time by dividing the input image into a
grid and predicting bounding boxes and class probabilities directly from the grid cells.
It uses a deeper backbone network and makes predictions at three different resolutions.
One of the key features of YOLOv3 is its ability to predict bounding boxes at
different scales using a technique called multi-scale prediction. This allows YOLOv3
to detect objects of varying sizes and aspect ratios with high accuracy. Additionally,
YOLOv3 incorporates anchor boxes to improve the localization of objects by predicting
bounding box offsets relative to predefined anchor shapes.
YOLOv3 has gained widespread popularity due to its impressive performance in
real-time object detection tasks across diverse domains, including surveillance, au-
tonomous driving, and robotics. Its efficiency in processing images and videos in real-
time makes it a popular choice for applications requiring rapid and accurate object
detection capabilities.
We integrate YOLOv3 into our project to create bounding boxes around the graphemes
present on Indus seals. This enables us to accurately identify and isolate the individual
graphemes for further analysis.
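As an illustration of how such a detector can be applied, the sketch below runs a trained YOLOv3 network with OpenCV's DNN module and returns grapheme bounding boxes. The configuration and weight file names are hypothetical placeholders, and the thresholds are typical defaults rather than the values used in our pipeline.

import cv2
import numpy as np

# Hypothetical file names for a YOLOv3 detector trained on grapheme annotations.
net = cv2.dnn.readNetFromDarknet("yolov3_graphemes.cfg", "yolov3_graphemes.weights")

def detect_graphemes(image_path, conf_threshold=0.5, nms_threshold=0.4):
    """Return bounding boxes (x, y, w, h) for graphemes detected on a seal image."""
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores = [], []
    for output in outputs:
        for det in output:                      # det = [cx, cy, bw, bh, objectness, class scores...]
            confidence = float(det[4])
            if confidence > conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(confidence)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, nms_threshold)
    return [boxes[i] for i in np.array(keep).flatten()]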
3.3 MobileNet
MobileNet is a lightweight CNN architecture built around depthwise separable convolutions,
which factor a standard convolution into a depthwise convolution followed by a pointwise
(1x1) convolution. By stacking these layers sequentially, MobileNet achieves a remarkable
reduction in the number of parameters and computations required, making it particularly
well-suited for deployment on resource-constrained platforms.
The significance of MobileNet lies in its ability to democratize deep learning on
mobile devices, enabling a wide range of applications in fields such as image classifi-
cation, object detection, and semantic segmentation. By reducing the computational
burden without compromising accuracy, MobileNet empowers developers to deploy so-
phisticated computer vision models on smartphones, tablets, and other edge devices.
Its efficiency makes it an ideal choice for real-time applications where latency and re-
source constraints are critical considerations. As a result, MobileNet has become a
cornerstone in the development of mobile vision applications, driving innovation and
accessibility in the field of deep learning for mobile platforms.
TensorFlow, developed by Google Brain, is an open-source machine learning frame-
work renowned for its flexibility, scalability, and ease of use. TensorFlow provides com-
prehensive tools and resources for building, training, and deploying machine learning
models across a variety of platforms, including mobile and embedded devices. With its
robust ecosystem and support for diverse hardware accelerators, TensorFlow enables
developers to seamlessly integrate sophisticated deep learning models such as Mo-
bileNet into mobile applications. Furthermore, TensorFlow’s optimization techniques,
such as model quantization and conversion to TensorFlow Lite format, further enhance
the deployment efficiency of deep learning models on resource-constrained platforms.
In our project, we utilized MobileNet to encode graphemes, leveraging its effi-
cient architecture to handle the computational demands of processing visual data on
resource-constrained devices. By integrating MobileNet into our workflow, we were
able to achieve high performance in grapheme encoding.
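A minimal sketch of MobileNet used as a feature encoder is shown below, assuming an ImageNet-pretrained backbone with global average pooling; the actual encoder in this work is trained on grapheme crops, so the weights and input size here are placeholders.

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet import MobileNet, preprocess_input

# Illustrative only: an ImageNet-pretrained MobileNet used as a fixed feature encoder.
encoder = MobileNet(weights="imagenet", include_top=False, pooling="avg",
                    input_shape=(224, 224, 3))

def encode_grapheme(crop):
    """Map a grapheme crop (H x W x 3, uint8) to a fixed-length feature vector."""
    crop = tf.image.resize(crop, (224, 224)).numpy()
    batch = preprocess_input(np.expand_dims(crop, axis=0))
    return encoder.predict(batch, verbose=0)[0]   # 1024-dimensional embedding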
The upcoming chapter will detail the proposed approach.
Chapter 4
Proposed Approach
In this section, I outline the methodology employed in our project, which integrates
various deep learning models to analyze and extract information from visual data.
Firstly, we utilize the YOLOv3 model as a foundational component of our system.
YOLOv3 acts as a robust visual detector, efficiently identifying and delineating indi-
vidual characters within input images. This is akin to the process of drawing chalk
outlines around suspects at a crime scene, where each character is enclosed within
a bounding box. These bounding boxes serve as the initial step in organizing and
preparing the visual data for further analysis.
Following the detection stage, our approach incorporates specialized models such
as M-net and MIP-net to delve deeper into the extracted bounding boxes.
M-net is responsible for decoding the sequence of graphemes represented by each
bounding box. It meticulously analyzes the spatial arrangement of characters within
the image, sorting them from top to bottom and investigating each row from right to
left. This sequential processing mirrors the reading pattern observed in certain lan-
guages and ensures accurate character recognition, even in scenarios involving multiple
lines of text.
On the other hand, MIP-net focuses on extracting information regarding motifs and
symbols present in the input image. By examining the deeper context and symbolism
embedded within visual elements, MIp-net enriches our understanding of the image’s
content beyond mere character recognition.
The collaborative approach of these models allows for efficient processing and ex-
traction of valuable insights from diverse visual data. While YOLOv3 handles the
initial detection and organization of characters, M-net and MIP-net specialize in deci-
phering the identities of characters and extracting contextual information, respectively.
This synergy enables our system to provide comprehensive analysis and utilization of
visual data stored within our database.
By combining these advanced deep learning techniques, our proposed approach
aims to achieve accurate and insightful analysis of visual data, contributing to various
applications such as image understanding, text recognition, and content extraction.
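The sketch below summarizes this orchestration as a single function. The three callables stand in for the trained YOLOv3 detector, the M-net grapheme decoder, and the MIP-net motif classifier; the function names, dictionary keys, and return format are illustrative, not the project's actual API.

def digitize_seal(image_path, detector, grapheme_decoder, motif_classifier):
    """Sketch of the three-stage pipeline: detect characters, decode graphemes, identify motif."""
    boxes = detector(image_path)                       # bounding boxes around individual characters
    sequence = grapheme_decoder(image_path, boxes)     # graphemes in reading order
    motif, probability = motif_classifier(image_path)  # most probable motif and its confidence
    return {
        "Image": image_path,
        "GraphemeSequence": sequence,
        "Motif": motif,
        "MotifConfidence": probability,
    }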
The subsequent chapter will provide an in-depth discussion on constructing the
model.
Chapter 5
5.1.1 Description
Data Collection: The process began with the collection of images containing graphemes.
These images likely consisted of text or handwritten characters that needed to be
analyzed. In total, 232 images were gathered for training purposes.
Annotation: Each image was meticulously annotated to mark the location of individual
graphemes. This annotation process likely involved outlining or labeling each grapheme
within the image. The annotations were then stored in XML files, which served as a
structured format to record the coordinates and other relevant information about each
grapheme's position within the image.
Model Selection: YOLOv3, short for "You Only Look Once version 3," was chosen as the
object detection model for this task. YOLOv3 is known for its efficiency and accuracy
in detecting objects within images.
5.1.2 Dataset
Training: The YOLOv3 model was trained using the 232 annotated images. During training,
the model learned to recognize the patterns and features associated with graphemes
within the images, ultimately enabling it to predict bounding boxes around them.
Validation: To assess the performance of the trained model and ensure its generalization
ability, a separate set of 13 images with annotations was used for validation. These
images were likely selected to represent a diverse range of scenarios and grapheme
configurations.
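For illustration, a small parser for one such annotation file is sketched below, assuming a Pascal VOC-style XML layout (the format produced by common annotation tools such as LabelImg); the exact schema used for the 232 images may differ.

import xml.etree.ElementTree as ET

def load_annotation(xml_path):
    """Parse one annotation file into (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bnd = obj.find("bndbox")
        boxes.append((label,
                      int(float(bnd.findtext("xmin"))),
                      int(float(bnd.findtext("ymin"))),
                      int(float(bnd.findtext("xmax"))),
                      int(float(bnd.findtext("ymax")))))
    return filename, boxes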
5.2 Grapheme Identification
5.2.1 Description
In the initial approach, Convolutional Neural Networks (CNNs) are employed to rec-
ognize characters within bounding boxes due to their adeptness in learning and ex-
tracting features from images automatically. The M-net model is integrated into this
architecture to provide further refinement in character recognition. Unlike traditional
CNNs that operate on entire images, M-net focuses specifically on the characters within
bounding boxes, ensuring precise decoding of sequences of graphemes.
During the process, M-net meticulously analyzes the spatial arrangement of char-
acters within each bounding box. It follows a sequential processing approach, sorting
characters from top to bottom and examining each row from right to left. This ap-
proach mirrors typical reading patterns in certain languages, ensuring accurate char-
acter recognition even in complex scenarios involving multiple lines of text or irregular
arrangements.
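A minimal sketch of this ordering step is given below: detected boxes are grouped into rows by their top edge and each row is then read from right to left. The row tolerance is a tunable assumption, not a value taken from the M-net implementation.

def reading_order(boxes, row_tolerance=20):
    """Sort grapheme bounding boxes (x, y, w, h) into reading order:
    rows from top to bottom, and right to left within each row."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])          # coarse sort by top edge
    rows, current = [], [boxes[0]]
    for box in boxes[1:]:
        if abs(box[1] - current[-1][1]) <= row_tolerance:
            current.append(box)                        # same row as the previous box
        else:
            rows.append(current)
            current = [box]
    rows.append(current)
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0], reverse=True))  # right to left
    return ordered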
Furthermore, as part of the validation process, multiple layers of CNN-based classi-
fication models are utilized. These models work in conjunction with M-net to validate
and refine the accuracy of character recognition. The combination of CNN-based classi-
fication models and M-net’s sequential processing enhances the robustness of character
recognition within the bounding boxes.
Additionally, to explore avenues for further improvement, transfer learning tech-
niques are employed. Pre-trained transfer learning-based models, including popular
architectures like ResNet and DenseNet, are considered. While traditionally used for
image classification tasks, these models can be adapted and fine-tuned to enhance char-
acter recognition within bounding boxes. By integrating transfer learning techniques
with the M-net model, the initial approach aims to leverage the knowledge and fea-
tures learned from large datasets to improve the accuracy and efficiency of character
recognition in diverse scenarios.
Overall, the M-net model serves as a critical component within the initial approach,
contributing to the accuracy and robustness of character recognition within bounding
boxes. Its sequential processing, combined with the capabilities of CNN-based classi-
fication models and transfer learning techniques, enables comprehensive analysis and
extraction of information from visual data.
5.2.2 Dataset
There are a total of 40 classes (labels) in the dataset. The 40 labels are: M8, M12,
M15, M17, M19, M28, M48, M51, M53, M59, M102, M104, M141, M162, M173, M174, M176,
M204, M205, M211, M216, M245, M249, M267, M287, M294, M296, M302, M307, M326, M327,
M328, M330, M336, M342, M387, M389, M391, and Other.
Number of images used for training: 12,264 (300+ images for each class). Number of
images used for validation: 200 (5 images for each class).
5.2.3 M-net Architecture
5.3 Model Accuracy
5.3.1 M-net
Figure 5.2 shows the accuracy of the M-net model. The x-axis of the graph is labeled
"Epoch" and the y-axis is labeled "Accuracy". The graph shows that the accuracy of the
model increases as the number of epochs increases. The training accuracy is shown in
blue and the validation accuracy in green. The highest training accuracy is 0.94 and the
highest validation accuracy is 0.95. The model was trained on 40 classes with around
12,264 pre-augmented images. The validation data, which totals 200 images across all
classes, is not augmented. The accuracy starts at 0.40 and reaches 0.94 after 10 epochs.
Figure 5.2: M-Net Accuracy
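The split described above, augmented training images versus unaugmented validation images, can be set up as in the following Keras sketch. The thesis pre-augments the training images offline, whereas this sketch augments on the fly; the directory names and transform parameters are placeholders, not the exact configuration used for M-net.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1 / 255.0,
                               rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1)      # training images are augmented
val_gen = ImageDataGenerator(rescale=1 / 255.0)     # validation images are not augmented

train_data = train_gen.flow_from_directory("graphemes/train", target_size=(224, 224),
                                           batch_size=32, class_mode="sparse")
val_data = val_gen.flow_from_directory("graphemes/val", target_size=(224, 224),
                                       batch_size=32, class_mode="sparse")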
5.4 Motif Identification
5.4.1 Description
The MIP-net model, short for Motif Identification and Prediction Network, is a machine
learning model designed for motif identification tasks. In this case, it’s specifically
trained for identifying motifs in images, particularly the IVC Seal image.
Here’s how the process typically works:
Training the MIP-net Model: The MIP-net model is trained using a dataset of IVC
Seal images, where each image is associated with a particular motif. The model learns
to recognize patterns and features in the images that are indicative of different motifs.
Utilizing 11 Different Classes: The model is trained to classify the motifs into
11 different classes. These classes represent the different motifs that the model can
identify. Each class corresponds to a specific motif that the model has been trained to
recognize.
Input Image and Prediction: When an IVC Seal image is provided as input to the
trained MIP-net model, the model predicts the probability for each of the 11 classes.
This is done by passing the image through the trained neural network, which computes
the likelihood or confidence score for each motif class.
Selecting the Most Probable Motif: After obtaining the probabilities for each class,
the model selects the class with the highest probability as the predicted motif. In other
words, the class that the model is most confident about is chosen as the output motif.
Returning the Output: Finally, the predicted motif, along with its associated prob-
ability score, is returned as the output of the model. This motif represents the pattern
or feature that the model believes is present in the input IVC Seal image.
Overall, the MIP-net model serves as a tool for automatically identifying motifs in
IVC Seal images, providing a systematic and efficient way to analyze and categorize
these images based on their visual characteristics.
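The prediction step can be summarized by the short sketch below, which assumes a trained Keras classifier with a softmax output over the 11 motif classes; the preprocessing and input size are illustrative assumptions.

import numpy as np
import tensorflow as tf

def predict_motif(model, seal_image, class_names):
    """Run a trained motif classifier on one IVC seal image and return the most
    probable motif with its confidence score."""
    img = tf.image.resize(seal_image, (224, 224)) / 255.0
    probs = model.predict(np.expand_dims(img, axis=0), verbose=0)[0]  # 11 class probabilities
    best = int(np.argmax(probs))                                      # most confident class
    return class_names[best], float(probs[best])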
5.4.3 Dataset
There are a total of 11 classes (labels) in the dataset. The 11 labels are: "buffalo",
"bull", "elephant", "horned ram", "man holding tigers", "pashupati", "sharp horn and
long trunk", "short horned bull with head lowered towards a trough", "swastik",
"tiger looking man on tree", and "unicorn".
Number of images used for training: 3,300. Number of images used for validation: 55
(5 images for each class).
5.4.4 MIP-net
The above graph shows the accuracy of the MIP-net model. The x-axis of the graph is
labeled "Epoch" and the y-axis is labeled "Accuracy". The graph shows that the accuracy
of the model increases as the number of epochs increases. The training accuracy is shown
in blue and the validation accuracy in green. The highest training accuracy is 0.95 and
the highest validation accuracy is approximately 0.96. The model was trained on 11
classes with around 3,300 pre-augmented images. The validation data, which totals 55
images across all classes, is not augmented. The accuracy starts at 0.20 and reaches
0.96 after training for 10 epochs.
5.5 Database
5.5.1 UML Diagrams
5.5.1.1 Class Diagram
The class diagram illustrates the structure of the system by showing the classes in
the system and their relationships. In this context, the class diagram depicts the
main entities involved in the pipeline, such as Image, Grapheme, and Motif, along
with their attributes and associations. It provides an overview of the data structure
and relationships within the system, aiding in understanding the organization of the
system's components.
5.5.1.2 Component Diagram
The component diagram illustrates the physical deployment of components in the sys-
tem and their interactions.
In this context, the component diagram depicts the various components involved in
the system, such as the Image Processing Module, YOLOv3 Model, MobileNet Model,
MIP-net Model, and Database. It provides an overview of the deployment architecture
of the system, showing how different components are interconnected and deployed in
the system environment.
5.5.1.3 Sequence Diagram
The sequence diagram illustrates the interactions between objects in the system over
time, showing the flow of messages between objects. In this context, the sequence
diagram depicts the sequence of actions involved in executing the pipeline, from the
user initiating the process to the various components processing the image and storing
the data. It helps in understanding the dynamic behavior of the system and the
sequence of activities performed during the execution of the pipeline.
5.5.2 Storing the Final Data
5.5.2.1 Description
Storing project results in a SQL database is crucial for data management and acces-
sibility. This step ensures that the valuable insights gained from the previous phases
of the project are preserved in a structured and organized manner. Here’s a detailed
breakdown of the process:
Database Setup: First, a SQL database needs to be set up. This involves creating
a new database or using an existing one where the project results will be stored. The
database schema should be designed to accommodate the data to be stored, ensuring
that it reflects the structure of the project results.
Table Creation: Within the database, tables need to be created to represent dif-
ferent entities or aspects of the project results. For example, there may be a table to
store image data, another table for grapheme sequences, and another for motifs. Each
table should have appropriate columns to store relevant information, such as ImageID,
Image, GraphemeSequence, and Motif.
Data Insertion: Once the tables are set up, the project results can be inserted into
the database. This involves executing SQL INSERT statements to add records to the
respective tables. For image data, the actual images may be stored in the database as
binary large objects (BLOBs) or as file paths pointing to image files stored externally.
Grapheme sequences and motifs are typically stored as text or varchar data types.
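A minimal sketch of the table creation and insertion steps is shown below, using SQLite from Python. The thesis describes the database generically as a SQL database, so the engine, column types, and connection details here are assumptions; the table and column names follow the details table described above.

import sqlite3

conn = sqlite3.connect("indus_seals.db")   # assumed database file name
conn.execute("""
    CREATE TABLE IF NOT EXISTS details (
        ImageID          INTEGER PRIMARY KEY,
        Image            BLOB,      -- seal image stored as a binary large object
        GraphemeSequence TEXT,      -- decoded sequence of grapheme labels
        Motif            TEXT       -- predicted motif class
    )
""")

def store_result(image_path, grapheme_sequence, motif):
    """Insert one pipeline result as a new row in the details table."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    conn.execute(
        "INSERT INTO details (Image, GraphemeSequence, Motif) VALUES (?, ?, ?)",
        (sqlite3.Binary(image_bytes), grapheme_sequence, motif),
    )
    conn.commit()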
Data Retrieval and Querying: SQL SELECT statements can be used to extract
specific data or perform analysis on the stored information.
Data Integrity and Maintenance: It’s essential to ensure data integrity within the
database. This involves implementing constraints, such as primary keys, foreign keys,
and unique constraints, to maintain data consistency and prevent errors.
Scalability and Performance: As the project progresses and more data is collected,
the database should be scalable to accommodate the growing volume of information.
Overall, storing project results in a SQL database provides a centralized and struc-
tured repository for the data, enabling easy access, analysis, and collaboration among
project team members. It ensures that the insights generated from the project are
well-preserved and can be leveraged effectively for future research or decision-making
purposes.
5.5.3 Sample Queries to Retrieve Data from Database
• SELECT * FROM details; returns all the rows from the details table, which contains
the ImageID, Image, GraphemeSequence, and Motif columns.
• SELECT * FROM details WHERE Motif IN ('swastik', 'bull'); returns all the rows
where the Motif is either swastik or bull.
5.6 End-to-End Workflow of Indus Script Digitization
In the upcoming chapter, you can expect a thorough exploration of the results
achieved.
Chapter 6
This pipeline processes images of seals to extract information about graphemes (written
symbols) and motifs (patterns) on the seal. Here’s a breakdown of each step:
The pipeline starts with an image as input. This image is resized and reshaped to
match the specific format required by the trained model. This ensures compatibility
and optimal processing.
6.2 Bounding Box Creation
A MobileNet model named "Mahadevan" takes the grapheme bounding boxes from
the previous step as input.
This model extracts features from each grapheme based on its location and ap-
pearance. These extracted features are then encoded into text format and stored in a
separate text file alongside the original image.
6.4 Motif Identification
The MIP-net model analyzes the original image again, this time focusing on identifying
motifs present on the seal. Motifs could be specific patterns, symbols, or designs with
meaning. MIP-net extracts information about these motifs and provides it in a format
understandable by the system.
• Motif: Information about the identified motifs from step 4.
The subsequent chapter will outline the intricacies surrounding the challenges en-
countered.
Chapter 7
Challenges
7.1 Low sample size for some classes
As mentioned before, a few motifs may be sparsely present in any corpus. Note that any
corpus is only a small subset of the seals produced by the Indus Civilization (IC) over
nearly a thousand years and over a large geographical region.
While a single sample may be sufficient for an archaeologist, it is not feasible for
most ML algorithms to learn a motif from only one sample. Extreme imbalance in sample
sizes over multiple classes is our first challenge in deep learning.
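One common, generic mitigation, not necessarily the one adopted in this work, is to weight the loss inversely to class frequency so that rare motifs are not drowned out by frequent ones, as in the sketch below.

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in the training labels."""
    classes = np.unique(labels)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
    return dict(zip(classes, weights))

# Example: pass the result to Keras via model.fit(..., class_weight=weights).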
7.2 Broken seals with only partially visible motif
Many seals are broken during their long burial period, or even discarded for being
broken during their active lifetime. A broken seal may not have a reduced importance
in archaeological research. For example, the context in which the seal was used, and
the motif present on it, may infer the same conclusion irrespective of the seal being
broken or not. The motif present on a broken or damaged seal may be only partially
visible, and yet, it may be well recognizable by a human observing only a small part of
it. Can our ML model be trained to perform at the same level as a human being in
recognizing a motif from only a small but relevant part of it? We address this question
in this work.
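One possible way to prepare a model for this situation, offered here only as an assumed strategy rather than the method used in this thesis, is to occlude random regions of training images so the classifier also sees partially visible motifs:

import numpy as np

def simulate_partial_motif(image, rng=np.random.default_rng(), max_occlusion=0.4):
    """Blank out a random rectangle of a motif image to mimic a broken seal."""
    img = image.copy()
    h, w = img.shape[:2]
    cut_h = int(h * rng.uniform(0.1, max_occlusion))
    cut_w = int(w * rng.uniform(0.1, max_occlusion))
    top = rng.integers(0, h - cut_h)
    left = rng.integers(0, w - cut_w)
    img[top:top + cut_h, left:left + cut_w] = 0   # occlude the selected region
    return img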
7.3 Stylistic variations and uncertain class
Artisans from the IVC have carved motifs in many different styles and variations, either
for artistic reasons or for conveying some meaning. For identification purposes, archae-
ologists group all such variability under one motif class or type. For example, the most
frequently found Unicorn motif may have two to twelve thread patterns on its neck.
Wide variation within a class is a challenge for deep learning algorithms unless each
variation is strongly represented in the training set. Another problem is that a motif
may look like a different one, even to the human eye. For example, a "horned-zebra"
may look like a "unicorn." While most such cases could be discerned easily by an expert
with only a closer examination, it is not clear how one can train an ML algorithm to
make such discrimination over different motifs that look very similar.
The upcoming chapter will cover information about the future prospects and con-
clusion.
Chapter 8
8.1 Future Work
The future of analyzing ancient civilizations lies in enriching the data available for
study. One key approach involves incorporating additional sources like the Parpo-
la/Uesugi corpus and the complete motif list from the Mahadevan corpus. This ex-
panded data pool will allow researchers to delve deeper into the linguistic nuances of
ancient texts and uncover the symbolic meanings embedded in artifacts. With a more
comprehensive understanding of these elements, we can gain a richer appreciation of
the cultural heritage of these lost civilizations.
Our understanding of the past can be further enhanced by moving beyond the study
of individual civilizations. Expanding the scope of research to include other ancient
societies, like Mesopotamia, Egypt, and Mesoamerica, presents exciting opportunities.
By comparing and contrasting writing systems and cultural practices across different
regions and time periods, we can uncover broader patterns and trends in human devel-
opment. This comparative approach will provide a richer tapestry of human history,
allowing us to appreciate the diversity and interconnectedness of ancient civilizations.
As I plan my future projects, I intend to explore advanced techniques for object detec-
tion beyond the current framework. While YOLOv5 has gained attention, I will also
investigate alternative methodologies that align with my project requirements. By
conducting this exploration, I aim to identify solutions that can significantly enhance
object detection performance, ensuring the reliability and accuracy of my system.
recognition. By adopting such an approach, I anticipate elevating the overall efficacy
and performance of my systems for text analysis tasks.
8.2 Conclusion
which are believed to hold specific functions within certain periods of the civilization.
Despite the challenges associated with applying deep learning to MIP, the creation of
an open-source dataset of annotated seals serves as a crucial stepping stone for further
theoretical archaeological research on the Indus Valley Civilization. Through the inte-
gration of advanced technological approaches and interdisciplinary collaboration, this
research contributes to the ongoing efforts to decipher the ancient mysteries embedded
within the artifacts of the Indus Valley Civilization.
The following news items have been published about the project:
• phys.org
• Infobae.com
• Omnia.com
The bibliography follows, listing the references cited throughout this thesis.
Bibliography
[1] Berkeley Vision and Learning Center. BVLC GoogLeNet ILSVRC 2014 Snapshot.
https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet.
[2] Andrew Brock and et al. Biggan: Large scale gan training for high fidelity natural
image synthesis. Proceedings of the International Conference on Learning Repre-
sentations (ICLR), 2019.
[4] Jia Deng and et al. Imagenet: A large-scale hierarchical image database. 2009.
[5] Mark Everingham and et al. The pascal visual object classes challenge: A retro-
spective. International Journal of Computer Vision, 111(1):98–136, 2015.
[6] Ross Girshick. Fast r-cnn. Proceedings of the IEEE international conference on
computer vision, 2015.
[7] Ian Goodfellow and et al. Generative adversarial nets. Advances in neural infor-
mation processing systems, 27:2672–2680, 2014.
[8] Kaiming He and et al. Deep residual learning for image recognition. Proceedings
of the IEEE conference on computer vision and pattern recognition, 2016.
[9] Kaiming He and et al. Mask r-cnn. Proceedings of the IEEE international conference
on computer vision, 2017.
[10] Gao Huang and et al. Densely connected convolutional networks. Proceedings of
the IEEE conference on computer vision and pattern recognition, 2017.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification
with deep convolutional neural networks. Advances in neural information processing
systems, 25:1097–1105, 2012.
[12] Yann LeCun and et al. Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11):2278–2324, 1998.
[13] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint
arXiv:1312.4400, 2013.
[14] Tsung-Yi Lin and et al. Feature pyramid networks for object detection. Proceed-
ings of the IEEE conference on computer vision and pattern recognition, 2017.
[15] Wei Liu and et al. Ssd: Single shot multibox detector. European Conference on
Computer Vision, 2016.
[16] Ansumali Mukhopadhyay. Ancestral dravidian languages in indus civilization:
Ultraconserved dravidian tooth-word reveals deep linguistic ancestry and supports
genetics. Humanit Soc Sci Commun, 8(193):193, 2021.
[17] Michael P. Oakes. Statistical analysis of the tables in mahadevan's concordance
of the indus valley script. Journal of Quantitative Linguistics, 26(4):401–422, 2019.
[18] S Palaniappan and R. Adhikari. Deep learning indus script. PLOS Submission,
2017.
[19] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transac-
tions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.
[20] Asko Parpola. The indus script: A challenging puzzle. World Archaeology,
17(3):399–419, Feb. 1986.
[21] Venkatesh-Prasad Ranganath and John Hatcliff. An overview of the indus frame-
work for analysis and slicing of concurrent java software (keynote talk - extended
abstract). In Proceedings of the Sixth IEEE International Workshop on Source Code
Analysis and Manipulation (SCAM ’06). IEEE Xplore, October 2006.
[22] S. R. Rao. Indus script and language. Annals of the Bhandarkar Oriental Research
Institute, 61(1/4):157–188, 1980.
[24] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint
arXiv:1804.02767, 2018.
[25] Shaoqing Ren and et al. Faster r-cnn: Towards real-time object detection with re-
gion proposal networks. Advances in neural information processing systems, 28:91–
99, 2015.
[26] Olga Russakovsky and et al. Imagenet large scale visual recognition challenge.
International Journal of Computer Vision, 115(3):211–252, 2015.
[27] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[28] Christian Szegedy and et al. Going deeper with convolutions. Proceedings of the
IEEE conference on computer vision and pattern recognition, 2015.
[29] Christian Szegedy and et al. Inception-v4, inception-resnet and the impact of
residual connections on learning. Proceedings of the AAAI conference on artificial
intelligence, 2017.
[30] Ali A. Vahdati and Raffaele Biscione. The seals from khorsan. In A. Parpola
and P. Koskikallio, editors, Corpus of Indus Seals and Inscriptions 3.3 (CISI 3.3),
pages l–lvi. Printed in Finland by Kirjapaino Hermes Oy, Tampere, Helsinki, jan
2022.
[31] Varun Venkatesh and Ali Farghaly. Identifying anomalous indus texts from west
asia using markov chain language models. pages 1–7, 2023.
[32] Jason Yosinski and et al. How transferable are features in deep neural networks?
Advances in neural information processing systems, 27:3320–3328, 2014.