Final Doc Genomics

This document provides an introduction to genomics and its intersection with artificial intelligence (AI) and machine learning (ML). It discusses key concepts in genomics like DNA, genes, and genomes. It also covers the importance of genomics in healthcare and research. Additionally, it introduces AI/ML and explains why these fields are needed for genomic analysis given the vast amounts of complex genomic data. The document outlines several ways AI/ML are being applied in genomics like genome sequencing, disease prediction, and drug discovery.

Uploaded by

ydeepak3240

Table of Contents

Candidate's Declaration

Certificate

Acknowledgement

1. Introduction

1.1 Importance of Genomics in Healthcare and Research
1.2 Genomics Basic Terminology
1.3 Genomics vs. Genetics

2. BASICS OF GENOMICS

2.1 Definitions of Genomics
2.2 Structure and Function of DNA
2.3 Genome Sequencing Techniques
2.4 Genomic Variation and Its Significance
2.5 Functional Genomics

3. INTRODUCTION TO AI & ML

3.1 Artificial Intelligence and Key concepts
3.2 Machine Learning and Key concepts
3.3 Why is there a need for AI/ML in Genomics?
3.4 Some Ways in which AI/ML are being used in genomics

4. INTERSECTION OF AI, ML, and GENOMICS

4.1 Harnessing AI in Genomics
4.2 AI Applications in Functional Genomics
4.3 Machine Learning in Genomic Analysis
4.4 Some ML Algorithms and their Contributions

5. APPLICATIONS OF AI and ML in Genomics

5.1 Genome Sequencing and Variant Calling
5.2 Disease Prediction and Risk Assessment
5.3 Drug Discovery and Development

CONCLUSION

REFERENCES
Chapter 1

1. Introduction

Genomics, the study of the complete set of genetic material within an organism, has emerged
as a cornerstone in modern biology, revolutionizing our understanding of life at the molecular
level. The unraveling of the human genome and advancements in genome sequencing
technologies have paved the way for unprecedented insights into genetic diversity, evolution,
and the molecular basis of diseases.

Genomics is an interdisciplinary field of biology focusing on the structure, function,
evolution, mapping, and editing of genomes. A genome is an organism's complete set
of DNA, including all of its genes as well as its hierarchical, three-dimensional structural
configuration.[1][2][3][4] In contrast to genetics, which refers to the study of individual genes
and their roles in inheritance, genomics aims at the collective characterization and
quantification of all of an organism's genes, their interrelations and influence on the
organism.[5] Genes may direct the production of proteins with the assistance of enzymes and
messenger molecules. In turn, proteins make up body structures such as organs and tissues,
control chemical reactions, and carry signals between cells. Genomics also involves
the sequencing and analysis of genomes, using high-throughput DNA
sequencing and bioinformatics to assemble and analyze the function and structure of entire
genomes. Advances in genomics have triggered a revolution in discovery-based research
and systems biology, facilitating the understanding of even the most complex biological systems
such as the brain.

The field also includes studies of intragenomic (within the genome) phenomena such
as epistasis (effect of one gene on another), pleiotropy (one gene affecting more than one
trait), heterosis (hybrid vigour), and other interactions between loci and alleles within the
genome.

As genomics continues to evolve, so does the need for sophisticated tools and methodologies
to interpret and harness the vast amount of genomic data generated. In this context, Artificial
Intelligence (AI) and Machine Learning (ML) have emerged as transformative technologies,
offering novel solutions to the challenges posed by the complexity and scale of genomic
information.

1.1 Importance of Genomics in Healthcare and Research

The implications of genomics in healthcare are profound. By understanding the genetic basis
of diseases, researchers and clinicians can identify genetic markers associated with various
conditions, enabling more accurate diagnosis and prognosis. Moreover, the advent of
precision medicine leverages genomic information to tailor treatments to an individual's
genetic makeup, increasing the efficacy and minimizing adverse effects.

In research, genomics plays a pivotal role in unraveling the mysteries of evolution,
biodiversity, and ecological systems. It aids in the identification of genes responsible for
specific traits in organisms, facilitating advancements in agriculture, conservation, and
environmental science. The comprehensive knowledge generated by genomics provides a
foundation for addressing global challenges, from feeding a growing population to preserving
biodiversity in the face of environmental changes.

1.2 Genomics Basic Terminology

A gene is the basic physical and functional unit of heredity. Genes are made up of DNA.
Some genes act as instructions to make molecules called proteins. However, many genes do
not code for proteins. In humans, genes vary in size from a few hundred DNA bases to more
than 2 million bases. An international research effort called the Human Genome Project,
which worked to determine the sequence of the human genome and identify the genes that it
contains, estimated that humans have between 20,000 and 25,000 genes.

DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other
organisms. Nearly every cell in a person’s body has the same DNA. Most DNA is located in
the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be
found in the mitochondria (where it is called mitochondrial DNA or mtDNA). Mitochondria
are structures within cells that convert the energy from food into a form that cells can use.

Cells are the basic building blocks of all living things. The human body is composed of
trillions of cells. They provide structure for the body, take in nutrients from food, convert
those nutrients into energy, and carry out specialized functions. Cells also contain the body’s
hereditary material and can make copies of themselves.

Cells have many parts, each with a different function. Some of these parts, called organelles,
are specialized structures that perform certain tasks within the cell.

1.3 Genomics vs. Genetics

Genetics and genomics are two terms that are often incorrectly used interchangeably.
Genetics is the study of single genes and their role in the way traits or conditions are passed
from one generation to the next. Genomics is the study of all of an organism's genes taken
together, that is, its genome.

Genetics

Genetics is a scientific study of the effects that genes — which are units of heredity — have
on an individual. Genes hold information in the molecule DNA, which is a string of
chemicals called bases. The order, or sequence, of bases on the string determines the meaning
of a genetic message. The message contains instructions for making proteins, which, in turn,
direct cells and functions of the body. Humans have thousands of genes that are packaged
into 23 pairs of chromosomes.

Genomics

All of the genes of an organism taken together, plus all of the sequences and information
contained therein, are called the genome. The human genome consists of all of the thousands
of genes and the 23 chromosome pairs. Genomics includes study of how the genes within the
genome interact with each other and with the individual’s environment.

Researchers may conduct genetic or genomic tests. Genetic testing investigates a single
piece of genetic information, looking for specific bits of DNA with a known
function. By investigating a single known entity, scientists may isolate the underlying causes
of the specific genetic variant in question. Genomic testing is broader, with no single target:
it involves investigating large sections of genetic material and information,
from which broad or specific conclusions may be drawn.

Some examples of genetic or inherited disorders include cystic fibrosis, Down syndrome,
hemophilia, Huntington’s disease, phenylketonuria (PKU) and sickle-cell disease.

Some disorders and complex diseases that have been studied in the field of genomics include
asthma, cancer, diabetes and heart disease. These diseases are caused by a combination of
genetic and environmental factors, rather than simply a single genetic defect. The study of
genomics has provided the medical community with new diagnostic tools and therapies for
these complex diseases.

Chapter 2

2. BASICS OF GENOMICS

2.1 Definitions of Genomics

Genomics, a field within molecular biology, involves the study of the entire set of genetic
material within an organism, encompassing its DNA, genes, and non-coding regions. The
term "genome" refers to the complete set of genetic instructions that an organism inherits
from its parents. Genomics extends beyond individual genes, aiming to understand the
structure, function, evolution, and interactions of all the genetic elements within a given
species.

2.2 Structure and Function of DNA

The iconic double helical structure of DNA, discovered by James Watson and Francis Crick
in 1953, serves as the molecular backbone of genomics. Each DNA strand is composed of
nucleotides, with a sugar-phosphate backbone and nitrogenous bases projecting inward. The
specific sequence of these bases encodes genetic information. Adenine pairs with thymine,
forming a stable base pair, while guanine pairs with cytosine. This complementary base-
pairing ensures the faithful transmission of genetic information during processes like DNA
replication.
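
The pairing rule above (A with T, G with C) can be illustrated with a minimal Python sketch. The function names and the example strand are made up for illustration; this is not a description of any particular bioinformatics tool.

```python
# Watson-Crick pairing as described above: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(strand)[::-1]

template = "ATGCGTAC"  # hypothetical example strand
print(complement(template))          # TACGCATG
print(reverse_complement(template))  # GTACGCAT
```

Because each base has exactly one partner, either strand fully determines the other, which is what makes faithful copying during replication possible.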

Beyond its role as a genetic blueprint, DNA is a dynamic molecule involved in various
cellular processes. Genes, segments of DNA many of which code for proteins, contribute to the
synthesis of molecules essential for cell structure, function, and regulation. Non-coding regions of
DNA, once considered "junk DNA," are now recognized for their regulatory roles,
influencing gene expression and contributing to the complexity of cellular processes.

2.3 Genome Sequencing Techniques

The evolution of genome sequencing techniques has been a transformative force in genomics
research. Early methods, such as Maxam-Gilbert sequencing and Sanger sequencing, were
groundbreaking but limited in their scalability and efficiency. The advent of Next-Generation
Sequencing (NGS) technologies in the 21st century marked a paradigm shift, enabling the
simultaneous sequencing of millions to billions of DNA fragments.

NGS platforms, such as Illumina and Ion Torrent, utilize various
sequencing-by-synthesis or sequencing-by-ligation approaches. These technologies offer
high-throughput capabilities, allowing researchers to sequence entire genomes quickly and
cost-effectively. The Human Genome Project, completed in 2003 using mainly the earlier
Sanger method, provided a reference genome that serves as a cornerstone for further
genomic exploration; NGS has since made whole-genome sequencing far faster and cheaper.

In addition to NGS, emerging third-generation sequencing technologies, such as Oxford
Nanopore and Pacific Biosciences' single-molecule real-time (SMRT) sequencing, offer
advantages in long-read sequencing, enhancing our ability to resolve complex genomic
regions and structural variations.

2.4 Genomic Variation and Its Significance

Genomes exhibit remarkable diversity within and between species, manifesting as genomic
variations. These variations can take the form of single nucleotide polymorphisms (SNPs),
insertions, deletions, and structural alterations.
Understanding genomic variation is essential as it underlies the diversity observed in
populations and influences susceptibility to diseases.
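
To make the variant types above concrete, the following Python sketch compares two sequences that are assumed to be already aligned, with '-' marking a gap, and labels each differing column as a SNP, an insertion, or a deletion. The sequences and the function name are hypothetical; real variant callers work from sequencing reads and are far more involved.

```python
def call_variants(ref_aln: str, sample_aln: str):
    """Compare two pre-aligned sequences ('-' marks a gap) column by column."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for r, s in zip(ref_aln, sample_aln):
        if r != "-":
            ref_pos += 1
        if r == s:
            continue
        if r == "-":
            # Extra base in the sample, placed after reference position ref_pos.
            variants.append((ref_pos, "insertion", r, s))
        elif s == "-":
            variants.append((ref_pos, "deletion", r, s))
        else:
            variants.append((ref_pos, "SNP", r, s))
    return variants

# Hypothetical aligned reference and sample sequences.
for variant in call_variants("ACGT-ACGT", "ACCTGAC-T"):
    print(variant)
```

For this toy pair the sketch reports a SNP at reference position 3, an insertion after position 4, and a deletion at position 7.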

Genomic variations contribute to the phenotypic diversity seen in populations, influencing
traits such as eye color, susceptibility to certain diseases, and response to drugs. Moreover,
studying genomic variation is crucial for unraveling the evolutionary history of species,
providing insights into adaptation and speciation.

The significance of genomic variation extends to personalized medicine, where knowledge of
an individual's unique genetic makeup informs treatment decisions. Identifying variations
associated with diseases allows for targeted therapies, enhancing treatment efficacy and
minimizing adverse effects. Advances in genomics, coupled with sophisticated analytical
tools, continue to deepen our understanding of genomic variation, unlocking new avenues for
precision medicine and personalized interventions.

2.5 Functional Genomics

Functional genomics is the science that studies, on a genome-wide scale, the relationships
among the components of a biological system (genes, transcripts, proteins, metabolites, etc.)
and how these components work together to produce a given phenotype. The term “functional
genomics” took root in the scientific community with the rise of the first genome
sequencing projects. These projects ultimately aim to determine the complete genome
sequence of a given organism and to functionally annotate the relevant features therein, such as
protein-coding and non-coding genes as well as DNA regulatory regions. The landmark
endeavour of this kind is the Human Genome Project (HGP), a worldwide collaborative project
launched in 1990 and officially completed in 2003 (International Human Genome
Sequencing Consortium [33]). However, the first completely sequenced genome from a
eukaryote, that of the budding yeast Saccharomyces cerevisiae, had already been released in 1996
[34] and provided material to start exploring the complex relationships between genes and
gene products at the genome scale. Indeed, a tentative definition of functional genomics was
first published in 1997 by Hieter and Boguski [35], who state at the beginning of their paper:
“An informal poll of colleagues indicates that the term [functional genomics] is widely used,
but has many different interpretations. There is even some sentiment that the term is
unnecessary and that it does nothing more than refer to biological research as a whole.”

Beyond the static representation of the genome, functional genomics explores how genes and
their interactions contribute to biological functions. It involves deciphering the roles of
individual genes, understanding gene regulation, and uncovering the networks that govern
cellular processes. Systems biology, an interdisciplinary approach, integrates genomics data
with computational modeling to comprehend the dynamic and interconnected nature of
biological systems.

Functional genomics employs tools such as gene expression profiling, RNA interference
(RNAi), and CRISPR-Cas9 gene editing to unveil the functions of genes and their products.
By dissecting the molecular mechanisms underlying cellular processes, functional genomics
provides valuable insights into the relationships between genotype and phenotype.

Systems biology complements functional genomics by considering the holistic view of
biological systems. It involves the integration of genomics, transcriptomics, proteomics, and
metabolomics data to construct comprehensive models of cellular processes and organismal
behavior. This integrative approach enhances our understanding of the intricate networks that
govern life, offering a systems-level perspective on biology.

Understanding functional genomics and systems biology is pivotal for comprehending the
complexity of living organisms and elucidating the molecular basis of health and disease. As
genomics advances, these approaches contribute to a more nuanced understanding of the
interplay between genes and their functional outcomes.

Chapter 3

3. INTRODUCTION TO AI & ML

3.1 Artificial Intelligence

Artificial intelligence (AI) is the development of computer systems that are able to perform
tasks that normally require human intelligence. Advances in AI software and hardware,
especially deep learning algorithms and the graphics processing units (GPUs) that power their
training, have led to a recent and rapidly increasing interest in medical AI applications. In
clinical diagnostics, AI-based computer vision approaches are poised to revolutionize image-
based diagnostics, while other AI subtypes have begun to show similar promise in various
diagnostic modalities.

In some areas, such as clinical genomics, a specific type of AI algorithm known as deep
learning is used to process large and complex genomic datasets. In this review, we first
summarize the main classes of problems that AI systems are well suited to solve and describe
the clinical diagnostic tasks that benefit from these solutions. Next, we focus on emerging
methods for specific tasks in clinical genomics, including variant calling, genome annotation
and variant classification, and phenotype-to-genotype correspondence. Finally, we end with a
discussion on the future potential of AI in individualized medicine applications, especially for
risk prediction in common complex diseases, and the challenges, limitations, and biases that
must be carefully addressed for the successful deployment of AI in medical applications,
particularly those utilizing human genetics and genomics data.

Artificial intelligence (AI) is the simulation of intelligence in a non-living agent. In the
context of clinical diagnostics, we define AI as any computer system that can correctly
interpret health data, especially in its native form as observed by humans. Often, these
clinical applications adopt AI frameworks to enable the efficient interpretation of large
complex datasets. These AI systems are trained on external health data that have usually been
interpreted by humans and that have been minimally processed before exposure to the AI
system, for example, clinical images that have been labeled and interpreted by a human
expert. The AI system then learns to execute the interpretation task on new health data of the
same type, which in clinical diagnostics is often the identification or forecasting of a disease
state.

3.2 Machine Learning

Machine learning (ML) and deep learning are fields of study frequently mentioned in the
context of AI. Both kinds of learning are subfields of AI. Machine learning is a process by
which machines can be given the capability to learn about a given dataset without being
explicitly programmed on what to learn.

Machines can usually learn in either a supervised or unsupervised manner. Under supervised
learning, scientists provide machines with separate training and test data sets. The training
data has defined categories (e.g., people with coronary heart disease and those without) that
the machine can use to infer hidden qualities of the data and distinguish the categories from
each other. It is then able to use this knowledge to work on the test data and make informed
predictions (e.g., which people in a population are likely to develop coronary heart disease).
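
The supervised workflow described above (learn from labeled training data, then predict on separate test data) can be sketched in a few lines of Python. The nearest-centroid classifier, the two numeric features, and all data values below are assumptions chosen only to keep the example small; they do not come from any real clinical dataset.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Training step: average the feature vectors of each labeled category."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Assign a new sample to the category whose centroid is closest."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

# Made-up training data: two features per person, with known categories.
train_x = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = ["no_disease"] * 3 + ["disease"] * 3

# Separate, made-up test data that the model never saw during training.
test_x = [[0.9, 1.1], [3.2, 3.1]]
test_y = ["no_disease", "disease"]

model = fit_centroids(train_x, train_y)
predictions = [predict(model, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Keeping the test set separate is the point of the design: accuracy measured on data the model has already seen would overstate how well it generalizes.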

In an unsupervised learning setting, machines are given no predefined categories. They
recognize patterns in large datasets, such as groups of similar samples, and can make
predictions about the real world without requiring any additional help from humans.
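
As a minimal sketch of unsupervised pattern recognition, the Python code below runs plain k-means on a handful of unlabeled numbers. The values (imagine expression levels of one gene across six samples) and the starting centers are made up, and k-means is only one of many unsupervised methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalar values, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled measurements; no categories are supplied.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(values, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

No labels are given, yet the algorithm separates the low values from the high ones, ending with centers near 1 and 5.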

Deep learning is a relatively modern technique used to implement machine
learning, in supervised as well as unsupervised settings. A deep learning algorithm takes a
dataset and finds patterns and critical information
by loosely imitating how a human brain’s neurons interact with each other. The algorithms are
artificial neural networks with many layers, computing systems that simulate the brain’s
ability to weigh the importance of some data versus others, and handle bias.

3.3 Why is there a need for AI/ML in Genomics?

As of 2021, 20 years have passed since the landmark completion of the draft human genome
sequence. This milestone has led to the generation of an extraordinary amount of genomic
data. Estimates predict that genomics research will generate between 2 and 40 exabytes of
data within the next decade.

DNA sequencing and other biological techniques will continue to increase the number and
complexity of such data sets. This is why genomics researchers need AI/ML-based
computational tools that can handle, extract and interpret the valuable information hidden
within this large trove of data.
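
To give a rough sense of the scale involved, the following back-of-the-envelope Python sketch converts the 2 to 40 exabyte estimate into an equivalent number of bare genome sequences. The genome length (about 3.1 billion bases) and the 2-bit encoding are assumptions made for this illustration; real sequencing output, with raw reads and quality scores, takes far more space per genome.

```python
# Back-of-the-envelope arithmetic; the constants below are stated assumptions.
BASES_PER_GENOME = 3.1e9  # assumption: approximate length of one human genome copy
BITS_PER_BASE = 2         # four possible bases (A, C, G, T) fit in 2 bits
BYTES_PER_EXABYTE = 1e18  # decimal exabyte

def bytes_per_genome():
    """Size of one genome's bare sequence with 2-bit packing."""
    return BASES_PER_GENOME * BITS_PER_BASE / 8

def genomes_per_exabytes(exabytes):
    """How many bare 2-bit genome sequences would fill the given volume."""
    return exabytes * BYTES_PER_EXABYTE / bytes_per_genome()

print(f"one genome: {bytes_per_genome() / 1e6:.0f} MB")
for eb in (2, 40):
    print(f"{eb} EB ~ {genomes_per_exabytes(eb):.2e} bare genome sequences")
```

Under these assumptions one bare sequence is about 775 MB, so even the lower estimate of 2 exabytes corresponds to billions of such sequences, which is well beyond what manual analysis can handle.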

The integration of Artificial Intelligence (AI) and Machine Learning (ML) in genomics has
become increasingly important due to the complexity and vast amount of data involved in
understanding the genetic makeup of individuals and populations. Here are several reasons
why AI and ML are crucial in genomics:

1. Data Complexity and Volume:

• Genomic data is massive, with each individual's genome containing billions of
base pairs. Analyzing and interpreting this data manually is practically
impossible.

• AI and ML algorithms can efficiently process and analyze large-scale genomic
datasets, identifying patterns and relationships that might be missed by
traditional methods.

2. Pattern Recognition:

• AI and ML excel at recognizing patterns in complex data sets. In genomics,
these patterns could represent genetic variations, mutations, or associations
between genes and diseases.

• The ability to identify subtle patterns is crucial for understanding the genetic
basis of diseases and developing targeted treatments.

3. Disease Prediction and Diagnosis:

• AI and ML algorithms can analyze genomic data to predict an individual's
susceptibility to certain diseases based on their genetic makeup.

• These technologies can aid in early disease diagnosis by identifying genetic
markers associated with specific conditions.

4. Personalized Medicine:

• AI and ML play a key role in the development of personalized medicine,
where treatments are tailored to an individual's genetic profile.

• By analyzing genomic data, AI can help identify the most effective treatments
and predict potential adverse reactions to specific drugs.

5. Variant Interpretation:

• Genomic sequencing often reveals numerous genetic variants, and
distinguishing between benign and pathogenic variants is a challenging task.

• ML models can be trained to interpret variants, prioritize those with clinical
relevance, and assist geneticists in making informed decisions.

6. Drug Discovery:

• AI and ML can accelerate drug discovery by analyzing genomic data to
identify potential drug targets, predict drug responses, and optimize drug
development processes.

• Targeting specific genetic markers associated with diseases can lead to more
effective and personalized therapeutic interventions.

7. Functional Genomics:

• Understanding the functions of individual genes and their interactions is
crucial in genomics research.

• AI and ML algorithms can analyze functional genomics data to infer gene
functions, identify regulatory elements, and decipher complex genetic
networks.

8. Data Integration:

• Genomic data often needs to be integrated with clinical, environmental, and
other omics data for a comprehensive understanding of health and diseases.

• AI and ML techniques facilitate the integration of diverse datasets, enabling
researchers to derive more holistic insights.

3.4 Some Ways in which AI/ML are being used in genomics

Although the use of AI/ML tools in genomics is still at an early stage, researchers have
already benefited from developing programs that assist in specific ways.

Some examples include:

➢ Examining people’s faces with facial analysis AI programs to accurately identify
genetic disorders.

➢ Using machine learning techniques to identify the primary kind of cancer from a
liquid biopsy.

➢ Predicting how a certain kind of cancer will progress in a patient.

➢ Identifying disease-causing genomic variants compared to benign variants using
machine learning.

➢ Using deep learning to improve the function of gene editing tools such as CRISPR.

These are just a few ways by which AI/ML methods are helping predict and identify hidden
patterns in genomic data. Scientists are also using AI/ML to predict future variations in the
genomes of the influenza and SARS-CoV-2 viruses to assist public health efforts.

Chapter 4

4. INTERSECTION OF AI, ML, and GENOMICS

Advancements in genomics have ushered in an era of unprecedented data generation, with the
capability to sequence entire genomes swiftly and cost-effectively. However, the sheer
volume and complexity of genomic data present challenges in analysis, interpretation, and
deriving meaningful insights. This is where the integration of Artificial Intelligence (AI) and
Machine Learning (ML) emerges as a transformative force in genomics research.

4.1 Harnessing AI in Genomics

Artificial Intelligence (AI), a branch of computer science that aims to create intelligent
machines capable of learning and problem-solving, has found a fertile ground in genomics.
AI algorithms, particularly those based on deep learning, demonstrate remarkable capabilities
in pattern recognition and feature extraction, making them well-suited for handling intricate
genomic datasets.

AI is applied to tasks such as variant calling, where it can accurately identify genetic
variations from raw sequencing data, and genomic annotation, where it assists in interpreting
the functional significance of genetic variants. Additionally, AI plays a pivotal role in
predicting the three-dimensional structures of proteins, aiding in understanding their
functions and interactions.

4.2 AI Applications in Functional Genomics

In recent decades, ML has been widely used in many areas of the “omics” sciences, especially
those characterized by the production of large amounts of data and/or complex mechanisms
governed by the synergistic participation of different factors. Important applications include:
prediction of DNA regulatory regions; discovery of cell morphology and spatial organization;
identification of associations between phenotypes and genotypes; classification of DNA
methylation and histone modifications; biomarker discovery; detection of transcriptional
enhancers; cancer diagnosis; and analysis of evolutionary mechanisms. The first attempts to
apply supervised training techniques to the “omics” sciences date back to the 1980s.
In 1982, Stormo et al. used the Perceptron algorithm to distinguish E. coli translational
initiation sites from all other sites in a library of over 78,000 nucleotides of mRNA sequence.
In 1993, Rost and Sander implemented a neural network to predict protein secondary
structure [43]. DL techniques began to be used on a large scale in functional genomics only in
the second decade of the 2000s, thanks to improvements in computing performance and the
collapse of genome sequencing costs. In 2015, two important deep architectures were implemented
and applied to functional genomics, producing results of great scientific impact.
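
The perceptron idea mentioned above can be sketched on toy data as follows. This is not a reconstruction of the 1982 study: the four-letter sequences, their labels (1 for a hypothetical "site", 0 for "non-site"), and the one-hot encoding are assumptions chosen so that the example is small and linearly separable.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector with four slots per position."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

def predict(weights, bias, x):
    """Perceptron decision rule: fire (1) if the weighted sum is positive."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def train_perceptron(examples, epochs=10, lr=1):
    """Classic perceptron learning: adjust weights only on misclassified examples."""
    n = len(one_hot(examples[0][0]))
    weights, bias = [0] * n, 0
    for _ in range(epochs):
        for seq, label in examples:
            x = one_hot(seq)
            error = label - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Made-up labeled sequences: 1 = hypothetical "site", 0 = "non-site".
training = [("ATGA", 1), ("CTGA", 0), ("ATGC", 1), ("GCGC", 0), ("ATGG", 1), ("TTAG", 0)]
weights, bias = train_perceptron(training)

for seq in ("ATGT", "CCGT"):  # sequences not seen during training
    print(seq, predict(weights, bias, one_hot(seq)))
```

On this toy set the weights stop changing after the first pass, and the learned rule simply keys on the first base, which shows both how a perceptron learns from labeled sequences and how little a linear model can capture from so few examples.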

DeepBind is a fully automatic standalone software for the prediction of sequence
specificities of DNA- and RNA-binding proteins. DeepSEA (deep learning-based sequence
analyser) predicts chromatin effects of sequence alterations with single-nucleotide resolution,
by learning regulatory sequences from large-scale chromatin-profiling data. Both methods,
based on deep architectures, have overcome many challenges such as the processing of
millions of sequences, the generalization between data from different technologies, the
tolerance of noise and missing data and the end-to-end and totally automatic learning, without
the need for hand-tuning. These approaches outperformed other state-of-the-art methods and
encouraged many scientists to follow similar exciting paths.
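
Convolutional models for sequence data typically begin by sliding small filters along a one-hot encoded sequence. The Python sketch below shows only that single operation, with a hand-set filter for a hypothetical "TATA" motif followed by a ReLU and global max pooling. It is an illustrative assumption, not the architecture of DeepBind or DeepSEA, which learn their filters from data and stack further layers on top.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows, one row per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def filter_for(motif, match=1.0, mismatch=-1.0):
    """Build a hand-set (width x 4) filter that rewards the given motif."""
    return [[match if base == b else mismatch for b in BASES] for base in motif]

def conv_scan(seq, motif_filter):
    """Slide the filter along the sequence; return one ReLU score per offset."""
    x = one_hot(seq)
    width = len(motif_filter)
    scores = []
    for start in range(len(x) - width + 1):
        s = sum(motif_filter[i][j] * x[start + i][j]
                for i in range(width) for j in range(4))
        scores.append(max(0.0, s))  # ReLU keeps only positive evidence
    return scores

tata = filter_for("TATA")             # hypothetical motif, not a learned filter
scores = conv_scan("GGTATAGG", tata)  # made-up input sequence
best = max(scores)                    # global max pooling over all offsets
print(scores, best, scores.index(best))
```

The highest score appears at the offset where the motif occurs, which is the basic signal that later layers of such a network combine into a binding or chromatin-effect prediction.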

4.3 Machine Learning in Genomic Analysis

Machine Learning (ML), a subset of AI, involves the development of algorithms that enable
computers to learn patterns and make predictions without explicit programming. In genomics,
ML algorithms contribute to a wide array of applications, including disease prediction, drug
discovery, and population genetics.

ML models excel in predicting disease risks based on genomic information. By analyzing
patterns across large datasets, ML algorithms identify subtle correlations between genetic
variations and disease susceptibility. These predictive models offer valuable insights into
personalized medicine, guiding clinicians in tailoring treatments based on individual genetic
profiles.

Within AI, ML refers to computer-based models that recognize and learn patterns in
large volumes of information in order to build classification and prediction models based on the
training data. Arthur Samuel, an IBM employee, coined the term “machine learning”
in the 1950s. Machine learning has progressed significantly since then [28]. ML is divided
into supervised and unsupervised learning, as well as reinforcement learning [29].
Reinforcement learning models are trained with rewards for good performance and penalties
for bad performance. Positive feedback guides the ML model to make the same choice
again in the future.

In contrast, negative feedback essentially guides the ML model to evade making the same
decision again in the hereafter. In contrast to supervised or unsupervised ML techniques,
reinforcement learning plays a minor part in precision medicine approaches because of the
direct response. Machine learning is primarily classified into three types: classification,
clustering, and regression. Supervised learning techniques include classification and
regression, whereas clustering is an unsupervised learning technique. Classification uses
labels and parameters to predict discrete, categorical response values, such as detecting
malignancy through biopsy samples. Clustering is used to segment data, for example, to
determine the currency of a disease in a given community as a result of pollution or chemical
spills. Regression forecasts continuous-response numeric data to discover administration
trends, such as the time interval between a patient's discharge and readmission to the hospital
(positive/negative).
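
The three task types can be illustrated on toy data. The following is a minimal sketch in plain Python, not a clinical method: the function names (`nearest_centroid_label`, `fit_line`, `two_means`) and all numeric values are hypothetical and chosen only to show how labeled data (classification, regression) differs from unlabeled data (clustering).

```python
from statistics import mean

def nearest_centroid_label(train, x):
    """Classification: predict a discrete label for x.

    train maps each label to a list of 1-D feature values.
    """
    centroids = {label: mean(values) for label, values in train.items()}
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def two_means(values, iterations=10):
    """Clustering: split unlabeled 1-D values into two groups (k-means with k = 2).

    Assumes the input contains at least two distinct values.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(group_lo), mean(group_hi)
    return sorted(group_lo), sorted(group_hi)
```

Only the first two functions receive known answers (labels or target values) during fitting; `two_means` has to find the grouping on its own, which is the practical difference between supervised and unsupervised learning.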

Machine learning is transforming healthcare by guiding individual and population health
through a variety of computational benefits. It contributes to monitoring sick patients,
analyzing disease patterns, diagnosing disease and prescribing drugs, providing
patient-centered care, reducing clinical errors, predictive scoring, therapeutic decision
making, and detecting sepsis and high-risk emergencies in patients. It also identifies
phenotypes; decodes clinical statements from death certificates and post-mortem reports;
identifies cardiovascular diseases, cancer, and symptoms related to different diseases;
predicts risk and supports intervention; and assists with paneling and resourcing. In
precision medicine, the generally used algorithms are SVM, genetic algorithms, hidden Markov
models (HMM), linear regression, discriminant analysis (DA), decision trees, logistic
regression, Naïve Bayes, deep learning models, random forest, and K-nearest neighbor (KNN).

4.4 Some ML Algorithms and their Contributions

• SVM: Support vector machines (SVMs) classify and analyze symptoms to improve diagnostic
accuracy. Other contributions of SVMs in precision medicine include identifying biomarkers
of neurological and psychological diseases and analyzing SNPs in multiple myeloma and
breast cancer. SVMs have been used to analyze clinical, pathological, and epidemiological
data for breast and cervical cancer, and clinical, molecular, and genomic data for oral
cancer and for the diagnosis of mental disease.

• Deep Learning: Deep learning is commonly used in medicine. It is generally used to analyze
images from different healthcare sectors and has been employed most heavily in oncology.
It has been applied to lung cancer; CT and MRI scans of the abdominal and pelvic area;
colonoscopy; mammography; brain scans for brain tumors; radiation oncology; skin cancer;
visualization of biopsy samples; ultrasound of prostate tumor biopsy samples; radiographs
of malignant lung nodules; histopathological scans of glioma; and biomarker and sequencing
(DNA and RNA) data. It has also been applied in the diagnosis of many diseases, for
instance diabetic retinopathy, nodular basal cell carcinoma (BCC), histopathological
prediction in women with cytological abnormalities, dermal nevus and seborrheic keratosis,
cardiac abnormalities, and cardiac muscle failure (by analyzing MRI of the ventricles of
the heart).

• Logistic Regression: This algorithm can evaluate the potential risk of several complex
diseases such as breast cancer and tuberculosis. It also contributes to assessing patient
survival rates and identifying cardiovascular disease. By analyzing prognostic factors, it
can support the diagnosis of pulmonary thromboembolism (PTE) and non-Hodgkin's lymphoma.

• Decision Tree: This machine-learning algorithm is well suited to real-time healthcare
monitoring, detecting aberrant sensor data, data-extraction models for pollution
prediction, and therapeutic decision support systems. Reported applications of decision
trees include ordering alternative therapies for oncology patients, identifying predictors
of health outcomes, supporting clinical decisions, diagnosing hypertension by finding
contributing factors, locating genes associated with pressure ulcers (PUs) among elderly
patients, therapeutic decision making for psychological patients, stratifying patient data
to interpret decision making for precision medicine, finding potential users of telehealth
services, assessing diabetic foot amputation risk, and analyzing content to help patients
with medical decisions.

• Random Forest: This algorithm has been widely employed in several parts of the healthcare
system. Its reported contributions include predicting the metabolic pathways of
individuals, predicting the outcome of a patient’s encounter with a psychiatrist,
predicting mortality of ICU patients, classifying and diagnosing Alzheimer’s disease,
monitoring medical wireless sensors, detecting knee osteoarthritis, predicting healthcare
costs, diagnosing mental illness, identifying non-medical factors related to health,
predicting the risk of emergency admission, forecasting disease risks from clinical error
data, finding factors associated with a diagnosis of diabetic peripheral neuropathy,
identifying patients who are ready to be discharged from the ICU, detecting depression in
Alzheimer’s patients, diagnosing sleep disorders, and estimating non-assumptive diverse
treatment effects.

• Naïve Bayes: This algorithm is used in distinct areas of medicine, such as predicting risk
by identifying Mucopolysaccharidosis type II, utilizing censored and time-to-event data,
classifying EHRs, shaping clinical diagnosis for decision support, extracting genome-wide
data to identify Alzheimer's disease, modeling decisions related to cardiovascular disease,
measuring the quality of healthcare services, and constructing predictive models for asthma
and for brain, prostate, and breast cancer.

• KNN: KNN has been employed in various scientific domains, although it has only a few uses
in the healthcare system. Examples include preserving the confidentiality of clinical
prediction information in the e-Health cloud, pattern classification for breast cancer
diagnosis, pancreatic cancer prediction using published literature, modeling diagnostic
performance, detection of gastric cancer, pattern classification for health monitoring
applications, medical dataset classification, and analysis of EHR data.
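
To make one of the listed algorithms concrete, the following is a minimal sketch of K-nearest neighbor classification in plain Python. It is an illustration under stated assumptions, not a method from the cited studies: genotypes are assumed to be coded as the number of alternate alleles (0, 1 or 2) at each variant, distance is a simple mismatch count, and the samples and labels are made up.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(samples, labels, query, k=3):
    """Return the majority label among the k training samples closest to the query."""
    ranked = sorted(range(len(samples)), key=lambda i: hamming(samples[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: six individuals, four variants each.
SAMPLES = [
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0],
    [2, 1, 2, 1], [2, 2, 2, 1], [1, 2, 2, 2],
]
LABELS = ["control", "control", "control", "case", "case", "case"]
```

Calling `knn_predict(SAMPLES, LABELS, [2, 1, 2, 2])` returns the label shared by the three most similar individuals. An odd `k` avoids ties when there are two classes.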

Chapter 5

5. Applications of AI and ML in Genomics

5.1 Genome Sequencing and Variant Calling

One of the fundamental applications of AI and ML in genomics is in the domain of genome
sequencing and variant calling. AI algorithms, particularly those based on deep learning, can
efficiently analyze sequencing data to identify genetic variations. These algorithms can
improve the accuracy of variant calling, supporting more reliable detection of single
nucleotide polymorphisms (SNPs), insertions, deletions, and structural variations across the
genome.
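
For orientation, variant calling can be sketched as a comparison between a reference sequence and the bases that aligned reads show at each position. The example below is a deliberately naive, hand-written threshold rule in plain Python, which is the kind of baseline that learned callers aim to improve on. It is not a deep learning method, and the function name `call_snvs`, the thresholds, and the input layout are assumptions made for illustration.

```python
from collections import Counter

def call_snvs(reference, pileups, min_depth=4, min_alt_fraction=0.3):
    """Return (position, ref_base, alt_base, alt_fraction) for candidate single-base variants.

    pileups[i] is a string holding the bases that aligned reads show at position i.
    """
    calls = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileups)):
        depth = len(column)
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        counts = Counter(base for base in column if base != ref_base)
        if not counts:
            continue  # every read agrees with the reference
        alt_base, alt_count = counts.most_common(1)[0]
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, fraction))
    return calls
```

A fixed threshold cannot tell a true low-frequency variant from a sequencing error, which is one reason models trained on labeled examples are attractive for this task.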

The ability of ML models to discern patterns in complex genomic datasets contributes to
more precise and comprehensive genomic analyses. This has implications for understanding
genetic diversity within populations, uncovering disease-associated variations, and
facilitating large-scale genomic projects.

Sequencing simply means determining the exact order of the bases in a strand of DNA.
Because bases exist as pairs, and the identity of one of the bases in the pair determines the
other member of the pair, researchers do not have to report both bases of the pair.
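
Because A pairs with T and C pairs with G, one strand is enough to reconstruct the other. A minimal Python sketch of this rule follows; the function names are illustrative.

```python
# Watson-Crick pairing: A <-> T and C <-> G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(strand):
    """Base-by-base partner of the given strand."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """The partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]
```

For example, `complement("ATGC")` gives `"TACG"` and `reverse_complement("ATGC")` gives `"GCAT"`. The reversal reflects the fact that the two strands of a DNA molecule run in opposite directions.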

In the most common type of sequencing used today, called sequencing by synthesis, DNA
polymerase (the enzyme in cells that synthesizes DNA) is used to generate a new strand of
DNA from a strand of interest. In the sequencing reaction, the enzyme incorporates into the
new DNA strand individual nucleotides that have been chemically tagged with a fluorescent
label. As this happens, the nucleotide is excited by a light source, and a fluorescent signal is
emitted and detected. The signal is different depending on which of the four nucleotides was
incorporated. This method can generate 'reads' of 125 nucleotides in a row and billions of
reads at a time.
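
The step from fluorescent signal to base can be pictured as choosing, for each cycle, the nucleotide whose signal is strongest. The sketch below is a toy model only: it assumes one intensity value per nucleotide per cycle and a hypothetical `min_margin` parameter for ambiguous cycles, whereas real instruments use their own channel layouts and calibrated models.

```python
CHANNELS = ("A", "C", "G", "T")

def call_bases(cycles, min_margin=0.2):
    """Call one base per cycle; emit 'N' when the two strongest signals are too close.

    cycles is a sequence of 4-tuples of intensities ordered as in CHANNELS.
    """
    read = []
    for intensities in cycles:
        ranked = sorted(zip(intensities, CHANNELS), reverse=True)
        (best, base), (second, _) = ranked[0], ranked[1]
        read.append(base if best - second >= min_margin else "N")
    return "".join(read)
```

Reporting `N` for an ambiguous cycle keeps an uncertain signal from being presented as a confident base.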

To assemble the sequence of all the bases in a large piece of DNA such as a gene, researchers
need to read the sequence of overlapping segments. This allows the longer sequence to be
assembled from shorter pieces, somewhat like putting together a linear jigsaw puzzle. In this
process, each base has to be read not just once, but at least several times in the overlapping
segments to ensure accuracy.
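
The jigsaw analogy can be sketched as a greedy merge of reads that share overlapping ends. This is a toy illustration in plain Python with hypothetical function names and a made-up minimum overlap. Greedy merging can misassemble repeated regions, and production assemblers rely on graph-based methods instead.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs
```

With the reads `"ATGCGTAC"`, `"GTACGTTA"` and `"GTTAGC"`, the function rebuilds the single sequence `"ATGCGTACGTTAGC"`.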

Researchers can use DNA sequencing to search for genetic variations and/or mutations that
may play a role in the development or progression of a disease. The disease-causing change
may be as small as the substitution, deletion, or addition of a single base pair or as large as a
deletion of thousands of bases.
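
The kinds of change described above can be told apart by comparing the reference and alternate alleles. The sketch below assumes alleles are given as plain strings in the style commonly used in variant files; the function name and the returned labels are illustrative choices.

```python
def classify_variant(ref, alt):
    """Label a variant by comparing reference and alternate allele strings."""
    if len(ref) == len(alt) == 1:
        return "substitution" if ref != alt else "no change"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-base substitution" if ref != alt else "no change"

def size_change(ref, alt):
    """Net number of bases gained (positive) or lost (negative)."""
    return len(alt) - len(ref)
```

For example, `classify_variant("A", "G")` is a substitution, `classify_variant("A", "AT")` is an insertion of one base, and `classify_variant("ATG", "A")` is a deletion of two bases.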

5.2 Disease Prediction and Risk Assessment

AI and ML play a pivotal role in predicting disease risks based on genomic information. By
analyzing patterns in genomic data from both affected and healthy individuals, ML
algorithms can identify subtle correlations between genetic variations and disease
susceptibility. These predictive models assist in assessing an individual's likelihood of
developing certain diseases, allowing for proactive and personalized healthcare strategies.

Applications in disease prediction extend to various medical fields, including cancer risk
assessment, cardiovascular disease prediction, and neurodegenerative disorders. ML models,
trained on diverse genomic datasets, contribute to more accurate risk assessments and early
interventions.
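
One simple way to turn genetic variation into a risk estimate is a weighted sum of risk-allele counts, often called a polygenic risk score. The report does not name this technique, so it is introduced here as an assumption for illustration. The variant identifiers and weights below are invented, and the logistic mapping is only one possible way to express a score on a 0 to 1 scale.

```python
import math

def polygenic_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2) over the variants listed in weights."""
    return sum(weights[variant] * genotype.get(variant, 0) for variant in weights)

def risk_probability(score, intercept=0.0):
    """Map a score to a value between 0 and 1 with the logistic function."""
    return 1.0 / (1.0 + math.exp(-(score + intercept)))

# Hypothetical per-variant weights; in practice these would come from association studies.
WEIGHTS = {"variant_1": 0.5, "variant_2": -0.2, "variant_3": 0.3}
```

A person carrying two copies of `variant_1` and one copy of `variant_2` would score 0.5 × 2 − 0.2 × 1 = 0.8 under these made-up weights.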

Cancer Genomics

In recent decades, the rise of NGS techniques has revolutionized the medical approach to
cancer [177]. Genomics has become increasingly important in clinical study, prevention,
treatment and monitoring practices. Cancer genomics studies differences in DNA sequences
and gene expression between tumour and normal cells, with the aim of understanding the
dynamics underlying the formation and spread of tumours at the genetic, metabolic, systemic
and environmental level. The Cancer Genome Atlas [178] project collected multi-level NGS
data for 33 different types of common tumours, an enormous data resource made available to
study tumour-specific as well as recurrent cancer mechanisms. The availability and
integration of large quantities of genomic, proteomic and epigenomic information has allowed
increasingly comprehensive representations of complex dynamics, such as cancer formation
[179], to be obtained. Indeed, integration of multiple omics data can
help overcome possible noise and/or bias of single data layers, thus improving the relevance
of extracted representative features. In this framework, data integration has been an active
field of research for ML and DL techniques applied to omics data, especially cancer
genomics [180,181] (see Section 4.3 for a more detailed discussion on data integration). In
particular, the introduction of autoencoders, such as denoising autoencoders, has allowed
robust representations of heterogeneous data to be provided, and extraction of highly
representative and predictive features to be more easily performed [182–184]. Indeed, AI
applications to cancer genomics can provide useful information for a rapid growth of
precision medicine and for disease prevention and monitoring. ML applications to mutation
detection and interpretation can help in identifying cancer-predisposing genes such as
BRCA1/2 and in predicting cancer risk [185,186]. AI performances in cancer genomics are
very promising.
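
Before several omics layers can be given to a model such as an autoencoder, they are typically put on a common scale and joined sample by sample. The sketch below shows only that preparatory step, sometimes called early integration, in plain Python. It does not implement an autoencoder, and the function names and example values are assumptions.

```python
from statistics import mean, pstdev

def zscore_columns(matrix):
    """Standardise each feature (column) to mean 0 and unit variance."""
    columns = list(zip(*matrix))
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # A constant feature carries no information; map it to zeros.
        scaled.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]

def integrate_layers(*layers):
    """Concatenate standardised layers sample by sample.

    Each layer is a list of rows, one row per sample, in the same sample order.
    """
    scaled = [zscore_columns(layer) for layer in layers]
    return [sum(rows, []) for rows in zip(*scaled)]
```

Standardising first keeps a layer with large raw values, such as expression counts, from dominating a layer measured on a 0 to 1 scale, such as methylation fractions.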

In 2017, Way et al. developed an ML approach based on ensemble logistic regression, which
was trained on both mutation and transcriptomic profiles of glioblastoma from The Cancer
Genome Atlas, to predict genes that may exhibit synthetic lethality in cancer cells lacking the
neurofibromin 1 tumour suppressor gene. In 2019, Das et al. implemented DiscoverSL, a
multiparameter RF classifier trained on multi-omic cancer data from The Cancer Genome
Atlas [178] to predict and visualize synthetic lethality in cancers. In 2020, Wan et al.
developed EXP2SL, a semisupervised NN-based method, which was trained on a large
collection of cancer cell line expression signatures from the LINCS1000 Program [198], to
predict cancer cell-line specific synthetic lethal interactions.
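
Two genes are described as synthetic lethal when a cell tolerates the loss of either one alone but not the loss of both. The methods above predict such pairs from omics features. The sketch below only encodes the definition itself, using a hypothetical viability table, made-up gene names and an arbitrary threshold.

```python
def is_synthetic_lethal(viability, gene_a, gene_b, threshold=0.5):
    """True if losing either gene alone is tolerated but losing both is not.

    viability maps a frozenset of knocked-out genes to relative cell viability (0 to 1).
    """
    single_a = viability[frozenset([gene_a])]
    single_b = viability[frozenset([gene_b])]
    double = viability[frozenset([gene_a, gene_b])]
    return single_a >= threshold and single_b >= threshold and double < threshold

# Hypothetical measurements for three made-up genes.
VIABILITY = {
    frozenset(["GENE_A"]): 0.90,
    frozenset(["GENE_B"]): 0.85,
    frozenset(["GENE_C"]): 0.30,
    frozenset(["GENE_A", "GENE_B"]): 0.10,
    frozenset(["GENE_A", "GENE_C"]): 0.10,
}
```

Here `GENE_A` and `GENE_B` satisfy the definition, while `GENE_A` and `GENE_C` do not, because losing `GENE_C` alone is already harmful.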

5.3 Drug Discovery and Development

• Target Identification and Validation: AI and ML algorithms analyze vast genomic
datasets to predict and prioritize potential drug targets. By scrutinizing the genetic
underpinnings of diseases and identifying key molecular players, these technologies
streamline the process of target identification and validation. ML models sift through
extensive biological data to uncover associations between specific genes or proteins
and diseases, aiding researchers in selecting targets with a higher likelihood of
therapeutic success. This targeted approach accelerates the initial stages of drug
discovery, enabling researchers to focus their efforts on biologically relevant targets,
ultimately increasing the chances of developing successful therapeutics.

• Predictive Toxicology and Safety Assessment: AI and ML contribute significantly to
predictive toxicology and safety assessment during drug development. By leveraging
genomic and chemical data, ML models predict potential adverse effects and toxicity
of drug candidates. This early identification of safety concerns enhances decision-
making, allowing researchers to prioritize compounds with a more favorable safety
profile. These technologies not only reduce the risk of late-stage clinical trial failures
due to safety issues but also expedite the identification of safe and effective drug
candidates. Predictive toxicology models, trained on diverse datasets, provide a
comprehensive understanding of potential risks associated with specific compounds.
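
As a simplified picture of target prioritization, candidate genes can be ranked by how strongly mutations in them are associated with disease. The sketch below uses an odds ratio with 0.5 added to each cell (the Haldane-Anscombe correction) so that zero counts do not cause division by zero. The gene names and counts are invented, and real target identification draws on far richer evidence than a single statistic.

```python
def odds_ratio(case_mut, case_total, control_mut, control_total):
    """Odds ratio of carrying a mutation in cases versus controls, with 0.5 added to each cell."""
    a = case_mut + 0.5
    b = case_total - case_mut + 0.5
    c = control_mut + 0.5
    d = control_total - control_mut + 0.5
    return (a * d) / (b * c)

def rank_targets(counts):
    """Sort gene names by descending odds ratio.

    counts maps gene -> (case_mut, case_total, control_mut, control_total).
    """
    return sorted(counts, key=lambda gene: odds_ratio(*counts[gene]), reverse=True)

# Hypothetical mutation counts in 100 cases and 100 controls per gene.
COUNTS = {
    "GENE_Y": (10, 100, 9, 100),
    "GENE_X": (30, 100, 5, 100),
    "GENE_Z": (2, 100, 8, 100),
}
```

An odds ratio well above 1 marks a gene whose mutations are enriched in cases, which makes it a candidate for closer study rather than a validated target.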

CONCLUSION

In the dynamic intersection of Artificial Intelligence (AI), Machine Learning (ML), and
genomics, we find ourselves at the forefront of a scientific revolution that is reshaping the
landscape of biological research, healthcare, and drug discovery. This report has traversed the
fundamentals of genomics, explored the transformative applications of AI and ML in
genomics, and delved into the synergy that is propelling the fields forward.

The advent of high-throughput sequencing technologies has unleashed an unprecedented era
of genomic data generation. However, it is the fusion of AI and ML with genomics that has
catalyzed a paradigm shift in how we analyze, interpret, and extract knowledge from these
vast datasets. From genome sequencing and disease prediction to drug discovery and
personalized medicine, the applications are both diverse and profound.

AI and ML bring forth a new era in genomics, offering solutions to challenges that were once
insurmountable. The accuracy of variant calling, the precision of disease risk prediction, and
the efficiency of drug discovery exemplify the transformative power of these technologies.
The ability to analyze massive datasets, recognize intricate patterns, and predict outcomes has
propelled genomics into a realm where the translation of research findings into tangible
applications is more rapid and effective than ever before.

The integration of AI and ML in genomics is not without challenges, including ethical
considerations, the need for robust validation, and interpretability of complex models. As we
navigate these challenges, the promise of advancements in real-time genomic data analysis,
improved model interpretability, and novel applications in emerging fields awaits
exploration.

Looking ahead, the implications of this intersection extend beyond the confines of scientific
laboratories. In healthcare, the promise of personalized medicine is becoming a reality, where
treatments are tailored to individual genetic profiles. In drug discovery, the acceleration of
the development pipeline is bringing potential therapeutics to market faster and with higher
success rates. The impact on agriculture, environmental conservation, and our fundamental
understanding of life itself is profound.

As we stand at the crossroads of genomics and AI/ML, the journey forward holds great
promise and responsibility. The knowledge gleaned from this synergy has the potential to not
only transform the way we approach health and disease but also contribute to a deeper
understanding of the fundamental mechanisms that govern life. By fostering interdisciplinary
collaboration, embracing ethical considerations, and pushing the boundaries of innovation,
we embark on a path where the convergence of AI, ML, and genomics continues to shape a
future rich in scientific discovery and societal impact.

REFERENCES

https://en.wikipedia.org/wiki/Genomics

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-019-0689-8

https://medlineplus.gov/genetics/understanding/basics/gene/

https://www.genome.gov/about-nhgri/Director/genomics-landscape/jan-5-2023-artificial-intelligence-and-machine-learning-becoming-pervasive-at-nhgri-and-in-genomics

https://www.genome.gov/about-genomics/fact-sheets/A-Brief-Guide-to-Genomics

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9198206/
