Artificial Intelligence To Deep Learning: Machine Intelligence Approach For Drug Discovery
https://doi.org/10.1007/s11030-021-10217-3
Received: 29 January 2021 / Accepted: 22 March 2021 / Published online: 12 April 2021
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021
Abstract
Drug design and development is an important area of research for pharmaceutical companies and chemical scientists.
However, low efficacy, off-target delivery, long development times, and high cost are hurdles that impede drug
design and discovery. Further, large and complex data sets from genomics, proteomics, microarrays, and clinical trials
pose an additional obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technologies play
a crucial role in drug discovery and development; in particular, artificial neural networks and deep learning algorithms
have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery
processes, such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity
prediction, drug monitoring and release, pharmacophore modeling, quantitative structure–activity relationship, drug
repositioning, polypharmacology, and physicochemical activity. Evidence from the past strengthens the case for
implementing artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and
management techniques have provided critical support to recently developed modeling algorithms. In summary, advancements
in artificial intelligence and deep learning provide an excellent opportunity for rational drug design and discovery,
which will eventually impact mankind.
Graphic abstract
The primary concerns associated with drug design and development are time consumption and production cost. Further,
inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that hinder drug delivery and
development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms
can help overcome the challenges and hurdles of traditional drug design and development. Artificial intelligence is a
superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning,
and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug
design and development. Artificial neural networks, deep neural networks, support vector machines, classification and
regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to
the drug design and discovery process. Artificial intelligence has been applied to different areas of the drug design and
development process, from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative
structure–activity relationship to drug repositioning, protein misfolding to protein–protein interactions, and molecular
pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of
active and inactive compounds, monitoring of drug release, pre-clinical and clinical development, primary and secondary
drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physicochemical
properties, prediction of toxicity, and identification of mode of action.
* Pravir Kumar
[email protected]; [email protected]
1 Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological
University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi 110042, India
Molecular Diversity (2021) 25:1315–1360
Keywords Artificial intelligence · Machine learning · Deep learning · Virtual screening · Drug design and discovery ·
Artificial neural networks · Computer-aided drug design · Quantitative structure–activity relationship · Drug repurposing
Fig. 1 a History of artificial intelligence in healthcare: the first breakthrough of artificial intelligence in healthcare came in 1950 with the development of the Turing test. Later on, in 1975, the first research resource on computers in medicine was developed, followed by the NIH's first central AIM workshop, which marked the importance of artificial intelligence in healthcare. With the development of deep learning in the 2000s and the introduction of DeepQA in 2007, the scope of artificial intelligence in healthcare increased. Further, in 2010 CAD was applied to endoscopy for the first time, whereas in 2015 the first Pharmbot was developed. In 2017, the first FDA-approved cloud-based DL application was introduced, which also marked the implementation of artificial intelligence in healthcare. From 2018 to 2020, several AI trials in gastroenterology were performed. b Classification of artificial intelligence: there are seven classifications of artificial intelligence, which are reasoning and problem solving, knowledge representation, planning and social intelligence, perception, machine learning, robotics (motion and manipulation), and natural language processing, as discussed by Russell and Norvig in their book "Artificial Intelligence: A Modern Approach." Machine learning is further divided into three significant subsets: supervised learning, unsupervised learning, and deep learning, whereas vision is divided into two subsets, image recognition and machine vision. Similarly, speech is divided into two subsets, speech to text and text to speech, whereas natural language processing is classified into five main subsets: classification, machine translation, question answering, text generation, and content extraction. c Artificial intelligence in the healthcare and pharmaceutical industry has five significant applications, which change the entire scenario: research and discovery, clinical development, manufacturing and supply chain, patient surveillance, and post-market surveillance

companies is managing the cost and speed of the process [5]. AI has addressed these questions in a simple and scientific manner, reducing the time and cost of the process. Moreover, the increase in data digitization in pharmaceutical companies and the healthcare sector motivates the implementation of AI to overcome the problems of scrutinizing complex data [6].

AI, which is also referred to as machine intelligence, means the ability of computer systems to learn from input or past data. The term AI is commonly used when a machine mimics cognitive behavior associated with the human brain during learning and problem solving [7]. Nowadays, biological and chemical scientists extensively incorporate AI algorithms in the drug design and discovery process [8]. Computational modeling based on AI and ML principles provides a great avenue for identification and validation of chemical compounds, target identification, peptide synthesis, evaluation of drug toxicity and physicochemical properties, drug monitoring, drug efficacy and effectiveness, and drug repositioning [9]. With the advent of AI principles along with ML and DL algorithms, virtual screening (VS) of compounds from chemical libraries, which comprise more than 10^6 million compounds, has become easy and time-effective. Further, AI models help eliminate the toxicity problems that arise due to off-target interactions [10]. Herein, we briefly discuss the evolution of AI from ML to DL and the involvement of big data in revolutionizing the drug discovery process. Later on, we present an overview of the congregation of AI and conventional chemistry in the improvement of the drug discovery process and the application of AI in the improvement of the traditional drug discovery process. Afterward, we discuss the numerous AI applications throughout the drug design and discovery processes, such as primary and secondary screening, drug toxicity, drug release and monitoring, drug dosage effectiveness and efficacy, drug repositioning, polypharmacology, and drug-target interactions.

Evolution of artificial intelligence: machine learning to deep learning

In September 2015, the Google search trend showed that after the introduction of ML, AI was the most searched term. Some describe ML as the primary AI application, while others describe it as a subset of AI [11, 12]. AI is an umbrella term for computer programs that are able to think and behave as humans do, whereas in ML data are fed into the machine along with an algorithm such as Naïve Bayes, decision tree (DT), hidden Markov models (HMM), and others, which helps the machine to learn without being explicitly programmed. Later, with the development of neural networks, machines could classify and organize input data in a way that mimics the human brain, which marked a further advancement in AI. Around the year 2000, Igor Aizenberg and his colleagues, while discussing the artificial neural network (ANN), brought up the term "deep learning" for the first time. DL is a subset of ML, which itself is a subset of AI, and thus the hierarchy is AI > ML > DL [13, 14]. ML uses either supervised learning, where the model is trained on labeled data, meaning that each input has been tagged with the corresponding preferred output label, or unsupervised learning, where the model is trained on unlabeled data and looks for recurring patterns in the input data [15]. Other types are semi-supervised learning, which combines supervised and unsupervised learning; self-supervised learning, a special case that uses a two-step process in which unsupervised learning generates labels for unlabeled data with the ultimate goal of building a supervised learning model; reinforcement learning, a type of ML that improves its algorithm over time with the help of a constant feedback loop; and lastly DL, which stacks many layers of ML algorithms and is described as a brain-inspired family of algorithms that mimics the human brain but requires high computational power for training and big data to succeed [16, 17]. The origin of ML dates back to 1943, when McCulloch and Pitts published an article named "A logical calculus of the ideas immanent in nervous activity," where they gave the first-ever mathematical model of a neural network [18]. Alan M. Turing theorized the concept of ML in his seminal paper published
in 1950 [19]. In 1952, Arthur L. Samuel popularized the term "machine learning" by writing a checkers-playing program for IBM [20]. In 1957, Frank Rosenblatt developed the perceptron, which was built for image recognition [21]. Henry J. Kelley developed the continuous backpropagation model in 1960, and a simpler version based only on the chain rule was developed by Stuart Dreyfus in 1962 [22, 23]. In 1965, Ivakhnenko and Lapa developed the first working DL networks. Around 1980, Kunihiko Fukushima developed an ANN called the neocognitron, which had a multilayered design that could help the computer learn how to recognize visual patterns [24]. He also developed the first convolutional neural network (CNN), which was based on the visual cortex organization found in animals [25] [Fig. 1].

David Rumelhart, Geoffrey Hinton, and Ronald J. Williams published a paper entitled "Learning Representations by Back-propagating Errors" in 1986, which demonstrated that backpropagation could provide an improvement in shape recognition and word prediction [26]. After the initial success, there were some setbacks, but Hinton kept working during the second AI Winter to achieve new heights. Thus, he is considered the Godfather of DL. Soon after, in 1989, Yann LeCun gave the first practical demonstration of backpropagation at Bell Labs [27]. The same year, Christopher Watkins published his thesis entitled "Learning from Delayed Rewards," which introduced the concept of Q-learning and further improved reinforcement learning in computer programs [28]. In 1995, Corinna Cortes and Vladimir Vapnik developed support vector machines (SVM) to map and recognize similar data [29]. Two years later, in 1997, Jürgen Schmidhuber and Sepp Hochreiter developed long short-term memory (LSTM) for recurrent neural networks [30].

In 1999, the graphics processing unit (GPU) was launched as a microprocessor circuit, developed initially to accelerate 3D graphics processing for computer gaming. Later on, GPUs became popular in technology and research as well because of their capacity for parallel computing. A research report presented by META Group in 2001 stated that the volume, speed, sources, and types of data were increasing, which was a call to prepare for the onslaught of big data. In 2007, Nvidia introduced compute unified device architecture (CUDA), a framework that allowed programmers and researchers to use GPUs for general-purpose computing [31]. Since then, with the help of CUDA, researchers started using GPUs for DL-driven operations, as the high memory bandwidth of GPUs allowed easy handling of the massive data involved in DL algorithms, and the thousands of cores in GPUs allowed simultaneous parallel processing of neural networks. In 2009, Fei-Fei Li launched ImageNet, a free database containing millions of labeled images that can be used for research purposes [32]. AlexNet, a convolutional neural network, was created by Alex Krizhevsky around 2012; it used rectified linear units to improve training speed, together with dropout [33]. In the same year, "the cat experiment" conducted by Google Brain concluded that the network correctly recognizes less than 16% of the presented objects [34]. In 2014, Nvidia introduced the CUDA deep neural network (cuDNN), a CUDA-based DL library, which accelerated DL-based operations [35]. Similarly, "Deep Face" was developed and released in 2014 to identify faces with 97.5% accuracy [36]. In the same year, generative adversarial networks (GANs) were introduced, using two competing neural networks to check whether the data are genuine or generated [37]. In 2016, Cray Inc. used Microsoft's neural network software on its XC50 supercomputer with 1000 Nvidia Tesla P100 GPUs, which could perform the task and give output in fractions of a second. In 2017, Nvidia introduced the Tesla V100 GPU, which had tensor cores that accelerated AI-based operations. However, DL is still in its growth phase, and creative ideas are required for further advancement in this field.

Revolutionizing drug discovery process: role of big data and artificial intelligence

Big data can be defined as data sets that are too large and intricate to be analyzed with conventional data analysis software, tools, and techniques. The three main characteristic features of big data are volume, velocity, and variety, where volume represents the huge amount and mass of data generated, velocity represents the rate at which these data are being produced, and variety represents the heterogeneity present in the data sets [38]. With the advent of microarray, RNA-seq, and high-throughput sequencing (HTS) technologies, a plethora of biomedical data is being generated every day, due to which contemporary drug discovery has made a transition into the big data era. In drug discovery, the first and foremost step is the identification of appropriate targets (e.g., genes, proteins) involved in disease pathophysiology, followed by finding suitable drugs or drug-like molecules which can interact with these targets, and we now have access to a constellation of biomedical data repositories which can help us in this regard [39]. Moreover, the evolution of AI has made big data analytics a lot easier, as a myriad of ML techniques is now available to help extract useful features, patterns, and structures present in these big biomedical data sets [40]. For target identification, a feature such as gene expression is widely used to understand disease mechanisms and find genes responsible for the disease. Microarray and RNA-seq technologies have generated a large amount of gene expression data for various disorders. NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) [41], The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/about-nci/organization/ccg/resea
rch/structural-genomics/tcga) [42], and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) [43] are some of the big repositories which contain gene expression data. By analyzing gene expression signatures, we can find target genes responsible for different disorders. For example, using an ML approach and gene expression data, van IJzendoorn et al. 2019 found novel biomarkers and potential drug targets for rare soft tissue sarcoma [44].

Further, genome-wide association studies (GWAS) can determine the interrelation of genomic variants with particular complex disorders [45]. GWAS Central (https://www.gwascentral.org/) [46] and the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/home) [47] are some of the repositories which contain GWAS data. With the help of GWAS, we can ascertain disease-associated genetic loci, and it has been observed that genes linked with these loci are potential therapeutic targets. For instance, Li et al. [48] used the GWAS catalog, gene expression, epigenomics, and methylation data to determine target genes associated with juvenile idiopathic arthritis loci through ML analysis. In addition, specific genes whose mutations can lead to different threatening diseases are also promising therapeutic targets. These risk genes can be identified by analyzing various genome and exome sequencing data. For sequencing data, we have public repositories like the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) [49], which contains sequencing data obtained from next-generation sequencing technology. The National Cancer Institute Genomic Data Commons (NCIGDC) (https://gdc.cancer.gov/) [50] and TCGA are data repositories that contain sequencing data related to cancer. Moreover, taking advantage of big data and AI, Han et al. 2019 have developed DriverML (https://github.com/HelloYiHan/DriverML), a supervised ML-based tool that can point out driver genes related to cancer [51] [Fig. 2].

Moreover, sometimes even published literature can be used for target identification, and PubMed (https://pubmed.ncbi.nlm.nih.gov/) [52] is a major repository of published biomedical literature, whose data mining can help in identifying targets for different disorders. After an appropriate target has been identified and validated, the next step is to find suitable drugs and/or drug-like molecules that can interact with the target and elicit the desired response [53]. In the age of big data, a multitude of big chemical databases is at our disposal, which can help in finding suitable drugs for a specific target. For instance, PubChem (https://pubchem.ncbi.nlm.nih.gov/) [54] is a freely accessible chemical database that contains data on various chemical structures, including their biological, physical, chemical, and toxic properties [55]. Further, the ChEMBL database (https://www.ebi.ac.uk/chembl/) [56] is an open-access big database containing data on numerous bioactive compounds exhibiting drug-like properties [57]. The ChEMBL database also contains information on the absorption, distribution, metabolism, and excretion (ADME) and toxicity properties of these compounds, and even their target interactions. Further, DrugBank (https://go.drugbank.com/) [58] is another open-access pharmaceutical data repository which contains data on various drugs, their targets, and their mechanisms [59]. Additionally, the library of integrated network-based cellular signatures (LINCS) L1000 (https://lincsproject.org/LINCS/) [60] is another repository that contains information on the change in gene expression signatures of human cell lines when treated with different chemical compounds. The LINCS L1000 data-driven search engine, known as L1000CDS2, is an open-access search engine that contains data on drugs that can revert the expression of differentially expressed genes; hence, it too can be used for drug discovery [61]. Further, the Protein Data Bank (PDB) (https://www.rcsb.org/) [62] is another freely accessible online repository that contains data on the three-dimensional structures of proteins, DNA, and RNA [63]. PDB data are also widely used to assess protein–ligand interactions and then find appropriate inhibitors of a target protein. Xu et al. [64] combined ML and molecular docking to find inhibitors of COVID 3CL proteinase; here, the crystal structure of COVID 3CL proteinase was obtained from PDB.

Congregation of artificial intelligence and conventional chemistry: improves drug discovery

In the pharmaceutical industry, AI has emerged as a possible solution to the problems raised by classical chemistry or chemical space, which hamper drug discovery and development. With the advancements in technologies and the development of high-performance computers, the use of AI algorithms, from ML to DL, has increased in computer-aided drug design (CADD). AI is not a new technique for scientists in drug discovery and development, and neither is chemists' desire to accurately forecast chemical structure–activity relationships. For example, Hammett related equilibrium constants to reaction rates, whereas Hansch performed computer-assisted prediction of drug compounds' physicochemical properties and biological activity. The success of Hansch opened an avenue for research focused on (a) detailed identification and prediction of the chemical structure along with the characterization of properties such as pharmacophores and three-dimensional structure and (b) the formulation of complex mathematical equations that relate the chemical representation to the biological activity of the predicted compound. However, scientists' main aim in the current era is to improve the drug discovery and development process with high accuracy and confidence scores through ML algorithms based on classical chemistry activities. This will encourage chemists to identify the potential of AI techniques for answering two crucial questions of medicinal chemistry, such
as "what should be the next compound?" and "what is the process of making a compound?". Thus, the last two decades have seen the development of many techniques and tools for computational drug discovery, quantitative structure–activity relationship (QSAR) methods, and free-energy minimization techniques. For example, the authors of [65] distinguished compound cell activity using machine intelligence methods such as DT, random forest (RF), CNN, SVM, LSTM networks, and gradient boosting machines. In some of these models, the compounds were expressed as strings in the simplified molecular input line entry system (SMILES) and used directly as input data instead of chemical descriptors, so that the task was treated like natural language processing. The authors used two different cutoffs, one for the single data set (Z-score = 3) and one for the whole data set (Z-score = 5 or 6). They then used nine different metrics to evaluate the models, including precision, accuracy, the area under the curve, and Cohen's kappa. The results demonstrated that the gradient boosting machine performs well on balanced data distributions. The experiment's outcomes also concluded that classical ML methods and
DL methods could classify compound cell activity [65]. Similarly, the authors of [66] predicted the PAMPA effective permeability using a two-QSAR approach, in which they developed a classical QSAR model using a partial least squares (PLS) scheme and an ML-based QSAR model using a hierarchical support vector regression (HSVR) scheme. The authors concluded that the HSVR scheme performed better than the PLS scheme on the training set, on the test set, and in statistical analysis [66]. Further, for the synthesis of new compounds, chemical scientists have traditionally depended on published literature. With advancements in automated drug discovery methods involving AI and ML, it is relatively simple to distinguish between existing drugs and novel chemical structures. For example, the authors of [67] applied a computational approach to screen the hepatotoxic ingredients in traditional Chinese medicines, whereas the authors of [68] demonstrated the phylogenetic relationship, structure–toxicity relationship, and herb-ingredient network using computational techniques. Recently, Zhang et al. implemented computational analysis against a novel coronavirus, in which the authors screened different compounds that were biologically active against severe acute respiratory syndrome (SARS). Later on, the compounds were subjected to ADME and docking analysis. The results concluded that 13 existing Chinese traditional medicines were effective against the novel coronavirus [69]. Thus, conventional chemistry-oriented drug discovery and development concepts combined with computational drug design provide a great platform for future research. Moreover, systems biology and chemical scientists worldwide, in coordination with computational scientists, are developing modern ML algorithms and principles to enhance drug discovery and development.

Fig. 2 Application of big data for drug designing and discovery: with the increase in biological and chemical data from the literature, in vitro, in vivo, clinical studies, genomics studies, proteomics studies, metabolomics studies, gene ontology studies, and molecular pathway data, different data repositories have been developed. For instance, ChemSpider, ChEMBL, ZINC, BindingDB, and PubChem are the essential databases for compound synthesis and screening in the drug designing and discovery process. The data stored in these databases were curated and screened for the pharmacological and physicochemical properties of compounds necessary for the drug discovery process, instead of quantum mechanical calculations such as solvation energy, proton affinity, the wave function, atomic forces, and transition state. The high-throughput screened data were subject to filtration based on drug-likeness, PAINS calculation, ADMET analysis, and toxicity. The filtered compounds were subject to artificial intelligence models such as deep learning, random forest, classification and regression, and neural networks for further analysis. These compounds were then subjected to quantitative structure–activity relationship and pharmacophore models, followed by molecular docking and molecular dynamics simulation studies. Afterward, the final predicted compounds were visualized for binding energy calculations and active site identification. Thus, the final compound was identified and underwent in vitro and in vivo experimental studies for validation. Although quantum mechanical properties play a crucial role in drug discovery and design, these properties cannot directly hamper the process of drug design. QM methods include ab initio, density functional theory, and semi-empirical calculations, where accurate calculations use electron correlation methods. QM will become a more prominent tool in the repertoire of the computational medicinal chemist. Therefore, modern QM approaches will play a more direct role in informing and streamlining the drug-discovery process

Transforming traditional computational drug design through artificial intelligence and machine learning techniques

For many years, computational methods have played an essential role in drug design and discovery and have transformed the whole process of drug design. However, many issues, such as time cost, computational cost, and reliability, are still associated with traditional computational methods [70, 71]. AI has the potential to remove these bottlenecks in the area of computational drug design, and it can also enhance the role of computational methods in drug development. Moreover, with the advent of ML-based tools, it has become relatively easier to determine the three-dimensional structure of a target protein, which is a critical step in drug discovery, as novel drugs are designed based on the three-dimensional ligand binding environment of a protein [72, 73]. Recently, Google's DeepMind (https://github.com/deepmind) has devised an AI-based tool trained on PDB structural data, referred to as AlphaFold, which can predict the 3D structure of proteins from their amino acid sequences [74]. AlphaFold predicts 3D structures of proteins in two steps: (i) first, using a CNN, it transforms the amino acid sequence of a protein into a distance matrix as well as a torsion angle matrix; (ii) second, using a gradient optimization technique, it translates these two matrices into the three-dimensional structure of the protein [75]. Likewise, Mohammed AlQuraishi from Harvard Medical School has also designed a DL-based tool that takes a protein's amino acid sequence as input and generates its three-dimensional structure. This model, referred to as the Recurrent Geometric Network (https://github.com/aqlaboratory/rgn), uses a single neural network to figure out the bond angles and the angles of rotation of the chemical bonds connecting different amino acids in order to predict the three-dimensional structure of a given protein [76].

Further, quantum mechanics is used to determine the properties of molecules at a subatomic level, which are used to estimate protein–ligand interactions during drug development. However, with conventional computational techniques, quantum mechanics can sometimes be computationally very expensive and demanding, which can affect its accuracy [77]. With AI, quantum mechanics can become more user-friendly and efficacious. Schütt et al. 2019 have recently developed a DL-driven tool, referred to as SchNOrb (https://github.com/atomistic-machine-learning/SchNOrb), which can predict molecular orbitals and wave functions of
organic molecules accurately. With these data, we can determine the electronic properties of molecules, the arrangement of chemical bonds around a molecule, and the location of reactive sites [78]. Thus, SchNOrb can help researchers in designing new pharmaceutical drugs. Moreover, molecular dynamics (MD) simulation analyzes how molecules behave and interact at an atomistic level [79]. In drug discovery, MD simulation is used to evaluate protein–ligand interactions and binding stability. One major issue with MD simulation is that it can be very arduous and time-consuming. AI has the capacity to accelerate the process of MD simulation [80]. In this regard, Drew Bennett et al. performed MD simulations to calculate free energies for transferring 15,000 small molecules from water to cyclohexane, and then trained a 3D convolutional network and a spatial graph CNN using these free energies and some other atomistic features. The researchers found that the trained neural networks predicted free energies of transfer with almost similar accuracy compared to MD simulation calculations [81]. This study shows that ML techniques can improve and expedite MD simulations. However, a large amount of training data is required to achieve this.

Moreover, de novo drug design has also taken advantage of AI in recent years. For example, Q. Bai et al. 2020 have devised MolAICal (https://molaical.github.io/), a tool that can design three-dimensional drugs in three-dimensional protein pockets [82]. MolAICal designs 3D drugs through the action of two components: (i) the first component uses DL and a genetic algorithm trained on US Food and Drug Administration (FDA)-approved drugs for de novo drug design; (ii) the second component combines molecular docking and a DL model trained on the ZINC database (https://zinc.docking.org/) [83].

synthesis pathways by using a collection of chemical rules which are generated via ML models [88].

Additionally, various text mining-based tools have also been developed, which can aid the process of traditional drug discovery. Text mining uses methods like natural language processing (NLP) to transform unstructured texts in various literature and databases into structured data, which can be analyzed appropriately to gain new insights. NLP is a branch of AI which allows computers to process and analyze human languages, such as speech and text, through AI-based algorithms. Taking advantage of these AI-driven techniques, various text mining-based tools have been developed. For instance, Jang et al. 2018 developed PISTON (http://databio.gachon.ac.kr/tools/PISTON/), a tool that can predict drug side effects and drug indications using NLP and topic modeling [89]. Likewise, DisGeNET (https://www.disgenet.org/) is a text mining-driven database that contains a plethora of information on gene-disease and variant-disease relationships [90]. Data in DisGeNET can be used to analyze various biological processes, such as adverse drug reactions, molecular pathways involved in disease, and drug action on targets. Further, STRING (https://string-db.org/) is another text mining-driven database containing a myriad of information on protein–protein interactions for various organisms [91]. In addition, STITCH (http://stitch.embl.de/) is another text mining-driven database, which contains information on interactions between proteins and chemicals/small molecules [92]. Information in STITCH can also be used to ascertain binding affinities of drugs and drug-target associations.
Likewise, Popova et al. 2018 designed a deep reinforcement Artificial intelligence in primary
learning-based algorithm, referred to as ReLeaSE (https:// and secondary drug screening
github.com/isayev/ReLeaSE), for de novo drug design.
ReLeaSE achieves its desired outcome by integrating two Today AI has come out as a very successful and demanding
deep neural networks (DNN), known as generative and pre- technology because it saves time and is cost-efficient [93].
dictive, where the generative model is used to produce new In general, cell classification, cell sorting, calculating prop-
compounds, and the predictive model is used to predict the erties of small molecules, synthesizing organic compounds
properties of the compound [84]. Further, in recent times, with the help of computer programs, designing new com-
AI has been used to upgrade the process of synthesis plan- pounds, developing assays, and predicting the 3D structure
ning as well, a process that is used to determine an optimal of target molecules are some time-consuming and tiresome
synthesis pathway for a molecule of interest. Recently, Grzy- tasks which with the help of AI can be reduced and can
bowski et al. [85] developed a DT-based program, referred to speed up the process of drug discovery [94, 95]. The primary
as chematica, to design novel synthesis pathways for desired drug screening includes the classification and sorting of cells
molecules. Similarly, Genheden et al. have implemented by image analysis through AI technology. Many ML mod-
AiZynthFinder (https://github.com/MolecularAI/aizynthfin els using different algorithms recognize images with great
der), an open-source tool for retrosynthesis planning built accuracy but become incompetent when analyzing big data.
on Monte Carlo tree search, which is regulated by a neural To classify the target cell, firstly, the ML model needs to be
network [86]. Likewise, Segler et al. [87] used the integra- trained so that it can identify the cell and its features, which
tion of three distinct neural networks in conjugation with is basically done by contrasting the image of the targeted
the Monte Carlo tree search to discover novel retrosynthe- cells, which separates it from the background [96]. Images
sis routes. ICSYNTH (https://www.deepmatter.io/products/ with varying textured features like wavelet-based texture
icsynth/) is another tool that can produce novel chemical features and Tamura texture features are extracted, which
is further reduced in dimensions through principal component analysis (PCA). A study suggests that least-square SVM (LS-SVM) showed the highest classification accuracy of 95.34% [97, 98]. Regarding cell sorting, the machine needs to be fast to separate out the targeted cell type from the given sample. Evidence suggests that image-activated cell sorting (IACS) is the most advanced device that could measure the optical, electrical, and mechanical properties of the cell [99] [Fig. 3].
The secondary drug screening includes analyzing the physical properties, bioactivity, and toxicity of the compound. Melting point and partition coefficient are some of the physical properties that govern the compound’s bioavailability and are also essential to design new compounds [100]. While designing a drug, molecular representation can be done using different methods like molecular fingerprinting, simplified molecular-input line-entry system (SMILES), and Coulomb matrices [101]. These data can be used in DNN, which comprises two different stages, namely generative and predictive stage. Though both the stages are trained separately through supervised learning, when they are trained jointly, bias can be applied to the output, where it is either rewarded or penalized for a specific property. This whole procedure can be used for reinforcement learning [84]. Matched molecular pair (MMP) has been extensively used for QSAR studies. MMP is associated with a single change in a drug candidate, which further influences the bioactivity of the compound [102]. Along with MMP, other ML methods are used like DNN, RF, and gradient boosting machines (GBM) to get modifications. It has been observed that DNN can predict better than RF and GBM [103]. With the increase in publicly available databases like ChEMBL, PubChem, and ZINC, we have access to millions of compounds annotated with information like their structure, known targets, and purchasability; MMP plus ML can predict bioactivity like oral exposure, intrinsic clearance, ADMET, and mode of action [98, 104, 105]. Optimizing the toxicity of a compound is the most time-consuming and expensive task in drug discovery and is a crucial parameter as it adds significant value to the drug development process.

Applications of artificial intelligence in drug development process

The most arduous and discouraging step in the drug discovery and development process is identifying suitable and bioactive drug molecules present in the vast size of chemical space, which is on the order of 10^60 molecules. Further, the drug discovery and development process is considered a time- and cost-consuming process. The most frustrating point is that nine out of ten drug molecules usually fail to pass phase II clinical trials and other regulatory approvals [106–108]. The above-said limitations of drug discovery and development can be addressed by implementing AI-based tools and techniques. AI is involved in every stage of the drug development process such as small molecule design, identification of drug dosage and associated effectiveness, prediction of bioactive agents, protein–protein interactions, identification of protein folding and misfolding, structure and ligand-based VS, QSAR modeling, drug repurposing, prediction of toxicity and bioactive properties, and identification of mode of action of drug compounds as discussed below.

Peptide synthesis and small molecule design

Peptides are biologically active small chains of around 2–50 amino acids, which are increasingly being explored for therapeutic purposes as they have the ability to cross the cellular barrier and can reach the desired target site [109]. In recent years, researchers have taken advantage of AI and used it to discover novel peptides. For instance, Yan et al. 2020 developed Deep-AmPEP30, a DL-based platform for the identification of short anti-microbial peptides (AMPs) [110]. Deep-AmPEP30 (https://cbbio.online/AxPEP/) is a CNN-driven tool that predicts short AMPs from DNA sequence data. Using Deep-AmPEP30, Yan et al. identified novel AMPs from the genome sequence of C. glabrata, a fungal pathogen present in the GI tract. Likewise, Plisson et al. 2020 combined an ML algorithm with an outlier detection technique to discover AMPs with non-hemolytic profiles [111]. In addition, Kavousi et al. developed IAMPE (http://cbb1.ut.ac.ir/), a web server for the identification of anti-microbial peptides, which integrates ¹³C NMR-based features and physicochemical features of peptides as input to ML algorithms, in order to identify novel AMPs [112]. Similarly, Yi et al. 2019 devised ACP-DL (https://github.com/haichengyi/ACP-DL), a DL-based tool for the discovery of novel anti-cancer peptides [113]. ACP-DL uses the LSTM algorithm, which is an improved version of the recurrent neural network (RNN), for differentiating anti-cancer peptides from non-anti-cancer peptides. Moreover, Yu et al. [114] proposed DeepACP, a deep recurrent neural network-based model for identifying anti-cancer peptides. Likewise, Tyagi et al. 2013 developed an SVM-based platform for identifying new anti-cancer peptides [115]. In addition, Rao et al. 2020 combined a graph convolutional network and one-hot encoding to design ACP-GCN for the discovery of anti-cancer peptides [116]. Moreover, Grisoni et al. used an ensemble of four counter propagation ANN for identifying new anti-cancer peptides. Likewise, Wu et al. [117] proposed PTPD, a tool based on CNN and word2vec, for the discovery of novel peptides for therapeutics.
Moreover, small molecules are molecules that have very low molecular weight, and like peptides, small molecules
are also being explored for therapeutic purposes using AI-based tools. For instance, Zhavoronkov et al. [118] devised generative tensorial reinforcement learning (GENTRL), a generative reinforcement learning-based tool for the de novo design of small molecules. With the help of GENTRL (https://github.com/insilicomedicine/GENTRL),
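As an illustration of one of the molecular representations named in the secondary-screening discussion (molecular fingerprints, SMILES, and Coulomb matrices [101]), the following minimal Python sketch builds a Coulomb matrix from nuclear charges and 3D coordinates. It assumes the commonly used definition (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|), which is not spelled out in this review; the function name `coulomb_matrix` and the approximate water geometry are hypothetical and serve only as an example.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the Coulomb matrix of a molecule as a list of lists.

    Assumed definition (not given in the review itself):
      M[i][i] = 0.5 * Z_i ** 2.4
      M[i][j] = Z_i * Z_j / |R_i - R_j|   for i != j
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: depends only on the nuclear charge of atom i
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: pairwise nuclear repulsion
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Illustrative input: O, H, H with an approximate, hypothetical geometry
water_charges = [8, 1, 1]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
cm = coulomb_matrix(water_charges, water_coords)
```

The resulting matrix is symmetric and depends only on interatomic distances, not on the absolute position or orientation of the molecule, which is one reason such a representation could be fed to a DNN. It does depend on atom ordering, so practical pipelines usually sort or otherwise canonicalize the matrix first.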
Fig. 3 Artificial intelligence in primary and secondary drug screening: in drug discovery and designing pipeline, screening of potential lead is crucial, and artificial intelligence plays a great role in identifying novel and potential lead compounds. There are approximately 106 million chemical structures present in chemical space from different studies such as OMIC studies, clinical and pre-clinical studies, in vivo assays, and microarray analysis. With machine learning models such as reinforcement models, logistic models, regression models, and generative models, these chemical structures are screened out based on active sites, structure, and target binding ability. The complete drug discovery process through artificial intelligence will take about 14–18 years, which is comparatively less than the traditional drug discovery process. The first step in the drug discovery process is lead identification, in which disease-modifying target protein is identified through reverse docking, bioinformatics analysis, and computational chemical biology. In the second step, primary screening of compounds is done to select potential lead compounds, which can inhibit target protein. This can be done through virtual screening and de novo designing. The next step in the drug discovery process includes lead optimization and lead compound identification through focused library design, drug-like analysis, drug-target reproducibility, and computational biology. Afterward, secondary screening of compounds is performed, followed by pre-clinical trials. The drug discovery process’s final step is clinical development through cell-culture analysis, animal model experimentation, and patient analysis

Zhavoronkov et al. discovered novel inhibitors of an enzyme, DDR1 kinase [118]. Likewise, McCloskey et al. [119] combined DNA-encoded small molecule libraries (DEL) data with ML models like Graph CNN and RF to discover novel small drug-like molecules. Similarly, Xing et al. [120] integrated XGBoost, SVM, and DNN to find small molecules for targets implicated in rheumatoid arthritis.

Identification of drug dosage and drug delivery effectiveness

Administering an improper dose of any drug to a patient can lead to undesirable and lethal side effects; hence, it is crucial to determine a safe drug dose for treatment purposes. Over the years, it has been challenging to ascertain the optimum dose of a drug that can achieve the desired efficacy with minimum toxic side effects [121]. With the emergence of AI, many researchers are using ML and DL algorithms to determine appropriate drug dosage. For instance, Shen et al. [122] developed an AI-based platform, referred to as AI-PRS, to determine the optimum dose and combinations of drugs to be used for HIV treatment through antiretroviral therapy. AI-PRS is a neural network-driven approach, which relates drug combinations and dosage to efficacy through a parabolic response surface (PRS). In their study, Shen et al. administered a combination of tenofovir, efavirenz, and lamivudine to 10 HIV patients, and in due course, using the PRS method, they found that the dose of tenofovir could be reduced by 33% of the starting dose without causing virus relapse. Hence, AI-PRS could be used to find optimum drug dosages for other diseases as well.
Further, Pantuck et al. [123] developed CURATE.AI to determine adequate drug dose; it uses a patient’s personal data and transforms it into a CURATE.AI profile in order to ascertain the optimum dose. In their study, a combination of the cancer drug enzalutamide and the investigational drug ZEN-3694 was given to a patient with metastatic castration-resistant prostate cancer. Using CURATE.AI, they found over time that a dose of ZEN-3694 50% lower than the starting dose could achieve the desired results and arrest the cancer growth.
Further, Julkunen et al. [124] devised comboFM (https://github.com/aalto-ics-kepaco/comboFM), a novel ML-driven tool that ascertains appropriate drug combinations and doses in pre-clinical studies such as cancer cell lines. comboFM determines appropriate drug combinations and doses by using factorization machines (https://github.com/geffy/tffm), an ML framework for high-dimensional data analysis. In their study, using comboFM, Julkunen et al. identified a novel combination of the anti-cancer drugs crizotinib and bortezomib, showing promising efficacy in lymphoma cell lines. Similarly, Sharabiani et al. used an ML approach to determine the optimum initial dose of the anticoagulant drug warfarin. They used relevance vector machines to classify patients based on their dose demands, and then regression models were used to predict appropriate doses for the patients [125]. Likewise, Nemati et al. [126] developed a deep reinforcement learning model trained on the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) database to find an ideal dose of another anticoagulant drug, heparin. Likewise, Tang et al. [127] used ML techniques like ANN, Bayesian additive regression trees, boosted regression trees, and multivariate adaptive regression splines to determine the optimum dose of the immunosuppressive drug tacrolimus. Moreover, Hu et al. [128] performed ML analysis with techniques like classification and regression trees, multilayer perceptron networks, and k-nearest neighbor to find the safe initial dose of the cardiac drug digoxin. In addition, Imai et al. [129] developed a DT model to find a safe starting dose of the antibiotic drug vancomycin.

Predicting bioactive agents and monitoring of drug release

Designing and monitoring of drug-likeness is a tedious and time-consuming process. Lately, multiple online tools have been developed to analyze drug release and check accountability of selected bioactive compounds as a carrier. Benchmark data sets are later used to validate the computational analysis. For such evaluations, a pharmacophore based on chemical features suits best. These models construct large 3D data sets developed via in silico experiments or in-house compound collections [130]. To study ligand-based chemical features, various successful experiments have been
established using the CATALYST program (www.accelrys.com), and a group of researchers was successful in predicting 11β-hydroxysteroid dehydrogenase type 1 inhibitors using the VS experiments [131].
Determining bioactive ligands is a crucial step for selecting a potent drug for a specific target. Now, researchers are taking advantage of artificial intelligence in determining bioactive compounds that can be used for specific targets associated with a disease. For instance, Wu et al. integrated DL and RF methods to devise WDL-RF (https://zhanglab.ccmb.med.umich.edu/WDL-RF/) for determining the bioactivity of ligands targeting G protein-coupled receptors (GPCRs). Likewise, Cichonska et al. [132] developed pairwiseMKL (https://github.com/aalto-ics-kepaco), a multiple kernel learning-based method, for determining the bioactivity of compounds [133]. To test their model’s efficiency, they used it to predict the anti-cancer potency of compounds. Further, Mustapha et al. [134] developed an XGBoost model to determine bioactive chemical molecules. In addition, Merget et al. [135] created machine learning models like DNN and RF to determine the bioactivity of more than 280 different kinases. Furthermore, Arshadi et al. [136] have devised DeepMalaria, a DL-based model for identifying compounds having Plasmodium falciparum inhibitory activity. Likewise, Sugaya et al. [137] created a ligand-efficiency-driven support vector regression model to ascertain the biological activity of various chemical compounds. Moreover, Afolabi et al. [138] used data from the MDL Drug Data Report (MDDR) repository and applied it to a combination of boosting algorithms to identify novel bioactive compounds. Additionally, Petinrin et al. [139] used the majority voting technique with an ensemble of different machine learning models to determine biologically active molecules.
Further, adverse drug reactions (ADRs) are unexpected, pernicious, and sometimes fatal side effects caused by drug administration. ADRs are a major challenge in drug development, and it has become essential to identify possible ADRs during the nascent stage of drug development to make the drug development process more robust and efficacious. Lately, researchers have used AI to determine possible ADRs associated with different drugs before they are launched in the market for public use. For instance, Dey et al. [140] used a DL-based model, which can predict ADRs associated with a drug and even identify chemical substructures responsible for those ADRs. In addition, Liu et al. [141] integrated chemical, biological, and phenotypic properties of drugs to predict the ADRs associated with them via machine learning analysis. Likewise, Jamal et al. [142] combined biological, chemical, and phenotypic properties to predict nervous system ADRs linked with drugs through machine learning analysis. The authors also used their model to find out ADRs associated with current Alzheimer’s drugs. Further, Xue et al. [143] integrated biomedical network topology with a DL algorithm to predict Drug-ADR correlation. Moreover, Raja et al. [144] used machine learning analysis to predict ADRs, which are a result of drug-drug interactions. They further used their model to predict ADRs related to cutaneous disease drugs.
Besides screening for an effective bioactive agent, another critical area to work with is drug-likeness and its interaction post-release. Recently, a freely accessible, user-friendly graphical interface, SwissADME (http://www.swissadme.ch), was developed to evaluate the compatibility of the drug and its pharmacokinetic actions [145]. Mathematical models such as Higuchi, Hixson–Crowell, Ritger–Peppas–Korsmeyer, Brazel–Peppas, Baker–Lonsdale, Hopfenberg, Weibull, and Peppas–Sahlin have also been applied in drug discovery, and one of the most common practices has been the calculation of drug loading capacity of the selected or screened bioactive molecule.

Prediction of protein folding and protein–protein interactions

Analyzing protein–protein interactions (PPIs) is crucial for effective drug development and discovery. Most of the protein annotation methods use sequence homology that has limited scope. High-throughput protein–protein interaction data, with ever-increasing volume, are becoming the foundation for new biological discoveries. A great challenge to bioinformatics is to manage, analyze, and model these data. Hence, computational models were developed that predict multiple inputs at one place simultaneously [146]. Computational methods are applied to study both PPIs and protein–protein non-interactions (PPNIs), although PPIs are considered more informative than PPNIs. PPI prediction can be categorized as direct PPIs, direct PPIs with indirect functional associations, and PPIs for signal transduction pathways [147]. Machine and statistical learning approaches like K-nearest neighbor, Naïve Bayesian, SVM, ANN, DT, and RF are used to predict the hindrance in PPIs. Bayesian networks (BN) have been applied to predict PPIs essentially using gene co-expression, gene ontology (GO), and other biological process similarity. Data set integration using BN produces precise and accurate PPI networks illustrating the comprehensive yeast interactome [148]. Another group also used BN to combine data sets for yeast to study PPIs [149]. A novel hierarchical model, PCA-ensemble extreme learning machine (PCA-EELM), which predicts protein–protein interactions using only protein sequence information, has appeared as a powerful tool that gives accurate output in less time [150]. Further, the PPI prediction efficiency of DNNs was improved by a novel method known as DNN for protein–protein interactions prediction (DeepPPI) (http://ailab.ahu.edu.cn:8087/DeepPPI/index.html) [151]. In mammalian cells, signal transduction is mostly controlled by PPIs between unstructured motifs and globular proteins binding
domains (PBDs). To predict these PBDs across multiple protein families, a bespoke ML tool was developed, known as hierarchical statistical mechanical modeling (HSMM) [152]. For the prediction of protein–protein interactions based on ML, domain-domain affinities, and frequency tables, a novel tool referred to as PPI_SVM was developed in 2011, which is freely accessible at http://code.google.com/p/cmater-bioinfo/ [153]. Due to the increased number of solved complex structures, a multimeric threading approach, MULTIPROSPECTOR, has been developed. In this method, proteins with known template structures are rethreaded, and their interaction with other proteins, their interfacial energy, and Z-score are established [154]. The structure-based threading logistic regression tool Struct2Net (http://struct2net.csail.mit.edu), which evaluates the probability of interaction, is the first structure-based PPI predictor apart from homology modeling [155]. Gene cluster-based methods calculate the co-occurrence probability of orthologs of query proteins encoded from the same gene clusters. This method is also named domain/gene co-occurrence. If two proteins’ genes are not close by in the genome, then this method cannot reliably predict an interaction between these two genes [156, 157].

Structure‑based and ligand‑based virtual screening

In drug designing and drug discovery, VS is one of the crucial methods of CADD. VS refers to the identification of a small chemical compound that binds to a drug target. VS is an efficient method to screen out the promising therapeutic compound from a pool of compounds [158]. Thus, it becomes an important tool in high-throughput screening, which incurred the problem of high cost and low accuracy rate. In general, there are two important types of VS: structure-based VS (SBVS) and ligand-based VS (LBVS) [159, 160]. LBVS depends on the chemical structure and empirical data of both active and inactive ligands, and uses the chemical and physicochemical similarities of active ligands to predict other active ligands with high bioactivity from a pool of compounds. However, LBVS does not depend on the 3-D structure of the target protein, and thus, this method is implemented where target structure or information is missing, and the obtained structural accuracy is low [161]. On the other hand, SBVS has been implemented in cases where 3-D structural information of the protein or target has been elucidated either through in vitro or in vivo experiments or through computational modeling [162, 163]. In general, this method is used to predict the interaction between the active ligand and its associated target and to predict the amino acid residues that are involved in drug-target binding. In comparison with LBVS, SBVS possesses high accuracy and precision. However, SBVS is associated with the problem of an increasing number of disease-causing proteins and their complicated conformations [164]. To use ML for VS, there should be a filtered training set comprising known active and inactive compounds. These training data are used to train a model using supervised learning techniques. The trained model is then validated, and if it is accurate enough, the model is used on new data sets to screen compounds with desired activity against a target [165]. After that, the shortlisted compounds can go for ADMET analysis, followed by various bioassays before entering clinical trials. Hence, ML has the power to speed up VS, make it more robust, and even reduce false positives in VS. Docking is the main principle applied in SBVS, where several AI- and ML-based scoring algorithms have been developed such as NNScore, CScore, SVR-Score, and ID-Score [166]. Similarly, ML and DL methods such as RFs, SVMs, CNNs, and shallow neural networks have been constructed to predict protein–ligand affinity in SBVS. Moreover, AI-based algorithms have been developed for molecular dynamics simulation assays in SBVS [167]. On the other hand, LBVS consists of several steps, and each step comes up with novel AI- and ML-based algorithms to speed up the process and increase reliability. For example, several ML- and DL-based algorithms have been constructed for the preparation of useful decoy sets such as Gaussian mixture models (GMMs), isolation forests, and artificial neural networks (ANNs).
Further, ML models such as PARASHIFT, HEX, USR, and ShaPE algorithms have been constructed for LBVS [168, 169]. Currently, with the rise of AI algorithms in the healthcare and pharma industry, different tools and models have been developed for both LBVS and SBVS. For example, tools such as MTiOpenScreen (http://bioserv.rpbs.univ-paris-diderot.fr/services/MTiOpenScreen/) [170], FlexX‐Scan [171], CompScore (http://bioquimio.udla.edu.ec/compscore/) [172], PlayMolecule BindScope (PlayMolecule.org) [173], GeauxDock (http://www.brylinski.org/geauxdock) [174], EasyVS (http://biosig.unimelb.edu.au/easyvs) [175], DEKOIS 2.0 [176], PL-PatchSurfer2 (http://www.kiharalab.org/plps2/) [177], SPOT-ligand 2 (http://sparks-lab.org/) [178], Gypsum-DL (https://durrantlab.pitt.edu/gypsum-dl/) [179], and ENRI [180] have been developed for SBVS. Moreover, mounting evidence validates the hypothesis that AI plays a critical role in SBVS, such as identification of non-peptide cysteine-cysteine chemokine receptor 5 receptor agonists [181], screening of partial agonists of the β2 adrenergic receptor [182], identification of bromodomain-containing protein 4 inhibitors [183], discovery of natural product-like signal transducer and activator of transcription 3 dimerization inhibitor [184], prediction of VHL and hypoxia-inducible factor 1-alpha inhibitors [185], and prediction of Kelch-like ECH-associated protein-nuclear factor erythroid 2-related factor 2 (Keap-Nrf2) small-molecule inhibitors [186]. Likewise, Liu et al. 2017 discovered low toxicity O-GlcNAc transferase inhibitors, whereas Dou
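The supervised workflow outlined for ML-based VS in this section (train on known active and inactive compounds, validate, then screen new compounds [165]) can be sketched with a deliberately small Python example. This is only an illustration under stated assumptions: the review does not name a similarity measure or classifier here, so Tanimoto similarity on fingerprint bit sets and a k-nearest-neighbor vote are stand-in choices, leave-one-out is used as the validation step, and all compound names and fingerprints are hypothetical toy data.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_active(query, training_set, k=3):
    """Call a compound active if most of its k most similar training compounds are active."""
    ranked = sorted(training_set, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return sum(votes) > k / 2


def leave_one_out_accuracy(training_set, k=3):
    """Validation step: hold out each labeled compound once and predict it from the rest."""
    correct = 0
    for i, (bits, label) in enumerate(training_set):
        rest = training_set[:i] + training_set[i + 1:]
        correct += predict_active(bits, rest, k) == label
    return correct / len(training_set)


# Hypothetical toy training set: (fingerprint on-bits, known activity label)
training = [
    ({1, 2, 3, 4}, True),
    ({1, 2, 3, 5}, True),
    ({2, 3, 4, 6}, True),
    ({10, 11, 12}, False),
    ({10, 11, 13}, False),
    ({11, 12, 14}, False),
]
# Hypothetical unlabeled library to be screened
library = {"cmpd_A": {1, 2, 3, 7}, "cmpd_B": {10, 12, 15}}

accuracy = leave_one_out_accuracy(training)
hits = [name for name, bits in library.items() if predict_active(bits, training)]
```

In a realistic setting the fingerprints would come from a cheminformatics toolkit and the validation would use a held-out set of actives and decoys; the shortlisted `hits` would then proceed to ADMET analysis and bioassays as described in the text.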
Fig. 4 a Ligand-based virtual screening: in the drug design and discovery process, ligand-based virtual screening is the most crucial step, which comprises different steps as shown in the figure. The initial step consists of database screening and the 3-D structural model’s prediction through the active site for a special target and X-ray structure of complexes. Later on, pharmacophore modeling of selected compounds with selected features is performed, followed by pharmacophore and docking-based virtual screening of compounds. The screened compounds are subjected to different toxicity and physicochemical properties for further analysis. Finally, the lead compounds are subjected to in vitro and in vivo bioassays for validation. b Structure-based virtual screening: it is another type of virtual screening applied in the drug discovery process, where target structure preparation and chemical compound library preparation are initial steps. Afterward, structural analysis and binding site prediction are done, followed by molecular docking of compounds with the selected target. Later on, molecular dynamics simulation studies are carried out to validate the screened compounds in silico, followed by experimental validation through bioassays

et al. [187] identified novel glycogen synthase kinase 3 beta (GSK-3β) inhibitors through SBVS [188]. Different studies were conducted on cancer and leukemia through SBVS, such as the discovery of novel GSK-3β for treatment of acute myeloid leukemia [189], identification of novel protein arginine methyltransferase 5 inhibitor in non-small cell lung cancer [190], identification of vascular endothelial growth factor receptor 2 potent compounds for the treatment of renal cell carcinoma [191], identification of multi-targeted inhibitors against breast cancer [192], and discovery of Mdm2-p53 inhibitor [193]. Recently, the novel coronavirus became a huge problem worldwide, and here also SBVS provides a great opportunity for chemical and biological scientists to identify novel drug compounds against disease-causing targets. For example, Gahlawat et al. 2020 identified saquinavir, lithospermic acid, and 11m_32045235 as promising therapeutic compounds against the SARS-CoV-2 main protease, whereas Selvaraj et al. 2020 demonstrated that TCM 57,025, TCM 3495, TCM 5376, TCM 20,111, and TCM 31,007 were therapeutic compounds that interact with the substrate-binding site of N7-MTase [194, 195]. Along the same lines, Cruz et al. 2018 concluded that ZINC91881108 was a potent compound against RIPK2, whereas Simoben et al. 2018 demonstrated eight novel N-(2,5-dioxopyrrolidin-3-yl)-n-alkylhydroxamate derivatives as smHDAC8 inhibitors with IC50 values ranging from 4.4 to 20.3 µM against smHDAC8 [196, 197] [Fig. 4].
Moreover, different algorithms and tools have been devel-
ress.com/) [208], Ligity [209], D3Similarity (https://www.d3pharma.com/D3Targets-2019-nCoV/D3Similarity/index.php) [210], and GCAC (http://ccbb.jnu.ac.in/gcac) [211]. Emerging evidence suggests the potential implementation of AI algorithms in LBVS, such as identification of aurora kinase A inhibitors [212], G-quadruplex-targeting chemotypes [213], PI3Kα inhibitors [214], compounds targeting dengue virus non-structural protein 3 helicases [215], potential selective histone deacetylase 8 inhibitors [216], and novel p-hydroxyphenylpyruvate dioxygenase inhibitors [217]. Apart from these studies, a number of reports have validated the possible implementation of AI in LBVS, such as identification of HIV entry inhibitors and potent inhibitors of DNA methyltransferase [218, 219]. Like SBVS, LBVS also plays a crucial role in identifying potential therapeutic compounds against novel human coronaviruses. For example, Amin et al. 2020 demonstrated the molecular docking study of some in-house molecules as papain-like protease inhibitors, whereas Hofmarcher et al. 2020 used a DNN to identify 30,000 compounds from a library of 3.6 M compounds as SARS-CoV-2 inhibitors [220, 221]. Similarly, Choudhary et al. 2020 identified SARS-CoV-2 cell entry inhibitors, whereas Ferraz et al. 2020 identified bedaquiline, glibenclamide, and miconazole as potential therapeutic compounds against coronavirus [222, 223]. Xiao et al. 2018 developed ligand-based big data DNN models for VS of compound libraries against six anti-cancer targets. The study integrated 0.5 M chemical compounds, and the models developed were evaluated by tenfold cross-validation [224]. With the growing size of chemical compound libraries, it has become so difficult to find a potential hit that it is like finding a “needle in a haystack.” Thus, SBVS and LBVS have a huge role in minimizing the complexity of identifying potential therapeutic compounds against the disease-causing target. Further, AI-based models in SBVS and LBVS make it simpler, with high accuracy and precision. Table 1 discusses the different AI- and DL-based web tools and algorithms implemented in LBVS and SBVS.

QSAR modeling and drug repurposing

In drug designing and discovery, it is crucial to develop the relationship between chemical structures and their physicochemical properties with biological activities. Thus, QSAR modeling is a computational approach through which quanti-
oped for LBVS such as SwissSimilarity (http://www.swiss tative mathematical models can be created between chemical
simila rity.c h/) [198], METADOCK [199], Open-source plat- structure and biological activities. The main advantage of
form [200], HybridSim-VS (http://www.rcidm.org/Hybri developing a mathematical model is identifying the diverse
dSim-VS/) [201], PKRank [202], PyGOLD (http://www. chemical structure from molecular databases, which can
agkoch.de/) [203], BRUSELAS (http://bio-hpc.eu/softw be used as therapeutic compounds against a disease target.
are/B
rusel as) [204], RADER (http://r cidm.o rg/r ader/) [205], Once the most promising compound is selected, it is sub-
QEX [206], IVS2vec (https://g ithub.c om/h aiping1010/ jected to laboratory synthesis and in vitro or in vivo testing.
IVS2Ve c) [207], AutoDock Bias (http://a utodo ckbia s.w ordp QSAR models are broadly classified into two types that are
Table 1 Application of artificial intelligence (AI) algorithms including machine learning (ML) and deep learning principles in structure- and ligand-based virtual screening
Tool and software | Description | Method | Feature | Reference
LS-align | An atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. http://zhanglab.ccmb.med.umich.edu/LS-align/ | Machine learning | Generate fast and accurate atom-level structural alignments of ligand molecules | [225]
LigGrep | A tool for filtering docked poses to improve virtual-screening hit rates. http://durrantlab.com/liggrep/ | Machine learning | It can improve the hit rates of test VS targeting H. sapiens poly(ADPribose) polymerase 1 (HsPARP1), H. sapiens peptidyl-prolyl cis–trans isomerase NIMA-interacting 1 (HsPin1p), and S. cerevisiae hexokinase-2 (ScHxk2p) | [226]
AutoGrow4 | De novo drug design and lead optimization. http://durrantlab.com/autogrow4 | Genetic algorithm | The predicted binding modes of the AutoGrow4 compounds mimic those of the known inhibitors, even when AutoGrow4 is seeded with random small molecules | [227]
DLIGAND2 | Improved knowledge-based energy function for protein–ligand interactions. https://github.com/sysu-yanglab/DLIGAND2 | Distance-scaled | Best performance as a parameter-free statistical potential and among the best in all performance measures | [228]
StackCBPred | A stacking-based prediction of protein-carbohydrate binding sites from the sequence. https://bmll.cs.uno.edu/ | Machine learning | Predicted structural properties of amino acids to effectively train a Stacking-based machine learning method for the accurate prediction of protein-carbohydrate binding sites | [229]
LSA | A local-weighted structural alignment tool for virtual pharmaceutical screening | Conventional similarity algorithms | Computes the similarity of two molecular structures by considering the contributions of both overall similarity and local substructure match | [230]
ProPose | Steered Virtual Screening by Simultaneous Protein−Ligand Docking and Ligand−Ligand Alignment | Machine learning | The combination of ligand- and receptor-based methods steers the virtual screening by ranking molecules according to the similarity of their interaction pattern with known ligands | [231]
TrixX | Structure-based molecule indexing for large-scale virtual screening in sublinear time | Machine learning | TrixX counts among the fastest virtual screening tools currently available and is nearly two orders of magnitude faster than standard FlexX | [232]
DrugFinder | In silico virtual screening service | Machine learning | It is intended as a validation of the screening platform and its methods, and to promote confidence in its software components to produce valuable results | [233]
DEEPScreen | High-performance drug-target interaction prediction. https://github.com/cansyl/DEEPscreen | Convolutional neural networks | The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space | [234]
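
Several of the ligand-based entries in Table 1 rank library compounds by their similarity to a known active. The sketch below illustrates that general idea only. It assumes each molecule has already been encoded as a fingerprint, represented here as a set of "on" bit indices, and it uses the Tanimoto coefficient, a common similarity measure that is not necessarily the one used by any tool listed in the table. The compound names and bit sets are hypothetical.

```python
# Illustrative sketch: similarity-based ranking for ligand-based virtual screening.
# Fingerprints are hypothetical sets of "on" bit indices, not output of any listed tool.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by the bits present in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical known active (query) and a three-compound screening library
query = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
}

ranking = rank_library(query, library)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In practice the top-ranked compounds would be passed on to docking or bioassays; a real screen would compute fingerprints with a cheminformatics toolkit rather than writing bit sets by hand.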
Molecular Diversity (2021) 25:1315–1360
regression models and classification models. Gaussian processes (GPs) are a type of QSAR building regression model,
which is a robust and powerful method of QSAR modeling. GP methods can handle a large number of descriptors and
identify the crucial ones. Recently, two classification models have been demonstrated using GP: one is an intrinsic GP
classification method, and the other is a combination of a GP regression technique and probit analysis [235, 236].
Further, the method is suitable for modeling nonlinear relationships and does not require subjective determination of
the model parameters [237]. Recent advancements and increasing applications of ML algorithms such as neural networks,
DL, and SVM provide a great avenue for QSAR modeling. Several web-based tools and algorithms have been developed for
QSAR modeling such as VEGA platform (https://www.vega-qsar.eu/) [238], QSAR-Co (https://sites.google.com/view/qsar-co)
[239], FL-QSAR (https://github.com/bm2-lab/FL-QSAR) [240], Meta-QSAR (https://github.com/meta-QSAR/simple-tree)
(https://github.com/meta-QSAR/drug-target-descriptors) [241], DPubChem (www.cbrc.kaust.edu.sa/dpubchem) [242],
Transformer-CNN (https://github.com/bigchem/transformer-cnn) [243], Cloud 3D-QSAR
(http://chemyang.ccnu.edu.cn/ccb/server/cloud3dQSAR/) [244], MoDeSuS and Chemception
(https://github.com/Abdulk084/Chemception) [245]. Karpov et al. 2020 developed a novel algorithm for QSAR modeling
based on ANN called transformer-CNN. The method uses SMILES augmentation for training and inference. Similarly, Wang
et al. 2020 developed web-based QSAR modeling tools by integrating the characteristic features of molecular structure
generation, alignment, and molecular interaction field. Jin et al., through Cloud 3D-QSAR, discovered a potent and
selective monoamine oxidase B (MAO-B) inhibitor. In this study, the authors concluded that
(S)-1-(4-((3-fluorobenzyl)oxy)benzyl)azetidine-2-carboxamide (C3) was a more potent and selective inhibitor of MAO-B
than safinamide. Further, in vivo analysis revealed that compound C3 could inhibit cerebral MAO-B activity and rescue
1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP)-induced dopaminergic neuronal loss [246]. On the same trend,
Bennett et al. 2020, through Chemception, predicted the transfer free energy of small molecules by combining MD
simulations and DL [81]. Moreover, the QSAR-Co tool was implemented in different studies such as the development of
multi-target chemometric models for the inhibition of class I phosphoinositide 3-kinases enzyme isoforms, screening of
ERK inhibitors as anti-cancer agents, prediction of K562 cells functional inhibitors, and prediction of antifungal
properties of phenolic compounds [247–250]. Likewise, Kim and Cho 2018 developed a novel algorithm called PyQSAR
(https://github.com/crong-k/pyqsar_tutorial) for a fast QSAR modeling platform using ML and Jupyter notebook. PyQSAR is
a standalone python package that combines all QSAR modeling processes in a single workbench [251]. A. S. Geoffrey
et al. 2020 conducted two different studies using PyQSAR, such as identification of potent drug candidates for novel
coronavirus and development of QSAR of quercetin and its tumor necrosis factor-alpha inhibition activity [252, 253].
Further, Zuvela et al. developed ANN-based QSAR models for prediction of antioxidant activity of flavonoids. In this
study, the authors integrated six methods such as PaD, PaD2, weights, stepwise, perturbation, and profile for
interpretation and elucidation of ANN-based models, which calculate trolox-equivalent antioxidant properties. The
results concluded that the ANN-based algorithm could eliminate the difficulties that arise due to poor interpretation
of quantum mechanical parameters describing the molecular structure [254]. In parallel, Ding et al. 2020 generated a
web-based tool known as VISAR (https://github.com/Svvord/visar) for dissecting chemical features through the DNN QSAR
approach [255]. The mounting evidence demonstrates the implementation of QSAR modeling in drug designing and discovery
process such as modeling of ToxCast assays relevant to the molecular initiating events of AOPs in Hepatic Steatosis
[256], development of dipeptidyl peptidase 4 inhibitors against dipeptidyl peptidase 8 and dipeptidyl peptidase 9
enzymes [257], the applicability of QSAR model on domain analysis of HIV-1 protease inhibitors [258], and targeting
HIV/HCV coinfection [259]. A well-recognized problem of ML models is data imputation for missing values in the bioassay
data for SAR model generation. Basically there are three major types of missing values: (i) Missing Completely at
Random (MCAR), which occurs when the probability of missing values in a variable is the same for all samples; (ii)
Missing at Random (MAR), which means that the probability of missing values in a variable depends only on the available
information in other predictors; (iii) Missing Not at Random (MNAR), which means that the probability of missing values
is not random and depends on information that is not recorded, and the existing information predicts the missing
values [260]. There are several ways to handle missing values, such as imputation using zero, the mean, the median, or
the mode (most common value), imputation using a randomly selected value, imputing with a model, or imputation using
the deep learning library Datawig. Every data set has missing values that need to be handled wisely in order to build a
robust model [261]. Moreover, the complexity of data should be removed, and data must be curated to increase the
accuracy and precision of the models generated. Initially, QSAR models were implemented for predicting the toxicity
and metabolism of small molecules, such as molecules with a molecular weight (mw) of less than 1500. However, the QSAR
technology applied in the early 2000s came with constraints such as limited accuracy and reliability [262]. With the
growing application of QSAR in drug discovery and design process such as
VS, lead optimization, and target identification, medicinal scientists and biologists have made constant efforts to
develop more reliable and dependable approaches [263]. AI/ML algorithms-based QSAR models have the potential to
eliminate the constraints imposed by early methods. AI/ML-based QSAR models, namely hologram-based QSAR (HQSAR),
group-based QSAR (G-QSAR), and Ensemble-based, have accelerated the drug discovery process severalfold [264, 265].
Further, apart from classical Hansch and Free-Wilson approaches, QSAR has gradually evolved over the past few years
with newer refinement approaches, new methods for descriptors calculations, implementation of methodical validation
tests, and involvement of receptor structural information. Similarly, apart from classical lead optimization, QSAR has
been applied in different emerging areas of drug discovery and designing such as peptide QSAR, mixture toxicity QSAR,
nanoparticles QSAR, QSAR of ionic liquids, cosmetic QSAR, phytochemical QSAR, and material informatics [266] [Fig. 5].

Fig. 5 a Quantitative structure–activity relationship workflow: the initial step comprises data set compilation, where
data from public databases and literature databases are accumulated and compiled, and are further divided into
different subsets for investigation. Afterward, data set processing is performed, where data pre-processing and
curation followed by calculation of molecular descriptors are done. After descriptor calculation, normalization of
data and splitting of data into different sets are performed. In the third step, model construction is performed, where
data sets such as internal data and external data are accumulated, and learning algorithms are applied for QSAR
modeling. Finally, the statistical calculation is done to measure the model robustness. The final step in the
quantitative structure–activity relationship is model evaluation, where the model is evaluated by comparison with
previous benchmark models, identifying characteristic features, performance evaluation, and interpretation of
essential features. b Drug repurposing or repositioning workflow: the first step is collection of data and data
pre-processing followed by computational model generation. The models generated are support vector machines, logistic
regression, random forest, deep learning, and matrix factorization. Afterward, the generation of proof-of-concept from
a literature source is performed. Later on, evaluation of repositioning models through cross-validation, case
analysis, and evaluation metrics is performed. Finally, validation of repurposed drugs is carried out through clinical
trials, in vitro studies, and in vivo studies

Apart from QSAR modeling, AI algorithms have also been implemented in drug repurposing or drug repositioning. In drug
designing and discovery, drug repositioning refers to the investigation of drugs that have already been developed for
one diseased condition in order to reposition them for other diseased conditions. Repositioning drugs might be
successful due to the possibility of multiple-target involvement in multiple diseases [267–269]. On another note, the
emergence of large data sets from genomics, proteomics, and pharmacological in vivo and in vitro studies provides a
great avenue for drug repositioning. Recently, the emergence of AI-based tools and algorithms in drug discovery
provides a platform for future research. ML algorithms replace the chemical similarity and molecular docking-based
conventional methods with new system biology methods, which can evaluate drug effects [270–273]. Thus, different
AI-based algorithms and web-based tools have been developed in recent times such as DrugNet
(http://genome2.ugr.es/drugnet/) [274], DRIMC (https://github.com/linwang1982/DRIMC) [275], DPDR-CPI
(http://cpi.bio-x.cn/dpdr/) [276], PHARMGKB (https://www.pharmgkb.org/) [277], PROMISCUOUS 2.0
(http://bioinformatics.charite.de/promiscuous2) [278], and DRRS
(http://bioinformatics.csu.edu.cn/resources/softs/DrugRepositioning/DRRS/index.html) [279]. Moreover, Yella and Jegga
2020 constructed a model for drug repositioning using a multi-view graph attention approach known as MGATRx [280],
whereas Yan et al. 2019 constructed a novel algorithm for drug repurposing based on a multisimilarity fusion approach
known as BiRWDDA [281]. Further, Fahimian et al. 2020 constructed a novel algorithm known as RepCOOL to identify
promising repurposed drugs for breast cancer stage II. The results concluded that doxorubicin, paclitaxel, trastuzumab,
and tamoxifen were potential therapeutic agents against breast cancer stage II [282]. Likewise, Li et al. 2020
constructed a computational framework of host-based drug repurposing for broad-spectrum antivirals against RNA viruses.
In this study, the authors investigated 2352 approved drugs and 1062 natural compounds against different viral
pathogens and concluded that the repurposed drugs were effective against zika virus and coronavirus [283]. Further, Wu
et al. 2020 applied ML models, namely a structural profile prediction model and a biological profile prediction model,
to predict anti-fibrosis drug candidates. The results demonstrated that the areas under the receiver operating
characteristic curve were 0.879 and 0.972 in the training set and 0.814 and 0.874 in the testing set. The results
concluded that natural products possess anti-fibrosis characteristics and serve as potential anti-fibrosis drug targets
[284]. Recently, COVID-19 emerged as a global pandemic and researchers around the globe started the hunt for promising
therapeutic agents. In this regard, AI-based drug repositioning plays a crucial role. For example, network-based drug
repurposing identified 16 potential anti-HCoV repurposable drugs, whereas Hooshmand et al. 2020 identified 12 promising
drug targets for COVID-19 based on the multimodal DL approach [285, 286]. In recent times, the development of neural
networks, DL models, and pipelines for drug repositioning has increased to a great extent. For example, SNF-CVAE, based
on drug similarity network fusion, identified promising therapeutic agents for Alzheimer’s disease (AD) and juvenile
rheumatoid arthritis, whereas DTI-RCNN, which is based on a neural network algorithm and integrates long short-term
memory, predicts drug-target interactions [287, 288]. PhenoPredict and SDTNBI are two other ML-based algorithms used to
identify disease
phenome-wide drug repositioning for schizophrenia and prediction of drug-target interactions, respectively [289, 290].
Zang et al. 2019 developed a DL-based model known as deepDR (https://github.com/ChengF-Lab/deepDR) to predict in silico
drug repositioning. In the study, the authors integrated 10 different types of biological networks: drug-disease,
drug-side effects, drug-target, and seven drug-drug networks. The results concluded that deepDR predicted approved
drugs such as risperidone and aripiprazole for the treatment of Alzheimer’s disease (AD), and methylphenidate and
pergolide for the treatment of Parkinson’s disease (PD) [291]. Likewise, Chen et al. 2020 constructed a novel AI-based
algorithm called iDrug (https://github.com/Case-esaC/iDrug) for the integration of drug repositioning and drug-target
prediction through cross-network embedding. The efficiency and effectiveness of iDrug allow users to understand novel
clinical insights of drug-target-disease mechanisms [292]. Studies demonstrated that drug repurposing through an
AI-based algorithm can be implemented in cancer. For example, Li et al. 2020 integrated transcriptomics data and
chemical structure information using DL and identified pimozide as a promising therapeutic candidate against non-small
cell lung cancer [293]. Similarly, Kuenzi et al. 2020 predicted drug response and synergy using a DL model of human
cancer cells. The results concluded that predicted combinations improve progression-free survival, and response
predictions stratify ER-positive breast cancer patient clinical outcomes [294]. Another AI application in drug
repurposing comes from the study performed by Wang et al. 2020, which used bipartite graph convolutional networks for
in silico drug repurposing. The authors constructed a model known as BiFusion (https://github.com/zcwang0702/BiFusion)
through DL and heterogeneous information fusion. The results demonstrated that BiFusion achieved better performance
than multiple baselines for drug repurposing [295]. The examples mentioned above demonstrate the potential role of
AI-based algorithms in drug repurposing. Further, with the advancement in technology, chemical scientists, biological
scientists, and computational scientists continue to search for methods to improve the accuracy and precision of
AI-based models. Moreover, both QSAR and drug repositioning methods of drug discovery are incomplete without the
involvement of molecular docking, which is used to analyze the interaction between the target molecule and a ligand
molecule. Initially, in the early 2000s, molecular docking was developed as a standalone tool used to determine the
interaction between two molecules, that is, a target molecule and a ligand molecule. However, with the advent of AI
technology the applicability of molecular docking has changed. Now molecular docking is being used in conjunction with
MD simulation and AI-based tools in different areas of drug discovery like VS, target identification,
polypharmacology, and drug repurposing [296]. The implementation of MD simulation and AI-based algorithms can increase
the efficiency and accuracy of molecular docking. In addition, over the years, limitations in the use of molecular
docking have also been addressed. For instance, in drug designing, molecular docking can be used only for those
biological targets whose crystal structures are available, yet the structures of many targets are not available. Thus,
a technique like homology modeling has been developed to overcome this hindrance [297]. Further, crystal structure data
in PDB are increasing exponentially, enhancing the applicability of molecular docking in drug discovery. Table 2
discusses the tools and algorithms that have been implemented in in silico QSAR and drug repositioning.
Prediction of physicochemical properties and bioactivity
It is a well-established fact that every chemical compound is associated with physicochemical properties such as
solubility, partition coefficient, ionization degree, and permeability coefficient, which may hinder the
pharmacokinetic properties of the compound and drug-target binding efficiency. Thus, the physicochemical properties of
compounds must be considered while designing a novel drug molecule [100, 298]. For this, different AI-based tools have
been developed to predict the physicochemical properties of chemical compounds. The AI-based tools developed for
predicting biophysical and biochemical properties of compounds include molecular fingerprinting, a SMILES format,
Coulomb matrices, and potential energy measurements, which are used in the DNN training phase [299, 300]. Recently,
Zhang et al. developed a QSAR model to predict six different physicochemical properties of environmental agents
extracted from the Environmental Protection Agency (EPA). Similarly, Lusci et al. 2013 constructed a neural
network-based model to predict the molecular properties. In the study, molecules are described by undirected cyclic
graphs, whereas the former approaches for predicting physicochemical properties use directed acyclic graphs [301].
Later on, six AI-based algorithms were constructed for the prediction of human intestinal absorption of compounds. The
methods constructed are SVM, k-nearest neighbor, probabilistic neural network, ANN, PLS, and linear discriminant model.
Among these models, SVM had the highest accuracy, 91.54% [302]. In 2016, Zang et al. developed an ML-based model for
the prediction of physicochemical properties such as octanol–water partition coefficient, water solubility, boiling
point, melting point, vapor pressure, and bioconcentration factors of environmental chemicals [303]. Moreover,
different AI-based tools have been developed such as ALOGPS 2.1 (http://www.vcclab.org/lab/alogps/) [304], ASNN
(http://www.vcclab.org/lab/asnn/) [305], E-BABEL (http://www.vcclab.org/lab/babel/) [304], PCLIENT (http://www.
Table 2 Application of artificial intelligence (AI) algorithms including machine learning (ML) and deep learning principles in drug design and discovery process
Tool and Software | Description | Method | Feature | Reference
QSAR modeling
QSAR-Co-X | Open-source toolkit for multi-target QSAR modeling. https://github.com/ncordeirfcup/QSAR-Co-X | Machine learning and classification model | Integrate diverse chemical and biological data into a single model equation | [239]
Cloud 3D-QSAR | A web tool for the development of quantitative structure–activity relationship models in drug discovery. http://agroda.gzu.edu.cn:9999/ccb/server/cloud3dQSAR/ | Machine learning | Integrating the functions of molecular structure generation, alignment, molecular interaction field (MIF) | [244]
ChemDes | An integrated web-based platform for molecular descriptor and fingerprint computation. http://www.scbdd.com/chemdes | Pybel, CDK, RDKit, BlueDesc, Chemopy, PaDEL and jCompoundMapper | Format converting, MOPAC optimization and fingerprint similarity calculation | [379]
OntoQSAR | An Ontology for Interpreting Chemical and Biological Data in Quantitative Structure–Activity Relationship Studies | Machine learning mathematical model | Obtain chemical descriptors and biological properties of chemical compounds | [380]
ChemGrapher | Optical graph recognition of chemical compounds | Deep learning | Produces all information necessary to relate each component of the resulting graph to the source image | [381]
ChemSAR | An online pipelining platform for molecular SAR modeling. http://chemsar.scbdd.com/ | RDKit or ChemoPy package, scikit-learn package | Generating SAR classification models that will benefit cheminformatics and other biomedical users | [382]
ANFIS | Evaluate physicochemical descriptors of certain chemical compounds for their appropriate biological activities in terms of QSAR models with the aid of artificial neural network (ANN) approach combined with the principle of fuzzy logic | Neuro-fuzzy modeling and principal component analysis | ANFIS was applied to train the final descriptors (Mor22m, E3s, R3v+, and R1e+) using a hybrid algorithm consisting of back-propagation and least-square estimation while the optimum number and shape of related functions were obtained through the subtractive clustering algorithm | [383]
Drug repurposing
DrugNet | Network-based drug-disease prioritization by integrating heterogeneous data. http://genome2.ugr.es/drugnet/ | Machine learning | Simultaneous integration of information about diseases, drugs and targets can lead to a significant improvement in drug repositioning tasks | [274]
RepCOOL | Computational drug repositioning via integrating heterogeneous biological networks | Random forest classifier | The potency of the proposed method in detecting true drug-disease relationships | [282]
GIPAE | Computational drug repositioning, designed to identify new indications for existing drugs | Gaussian interaction profile kernel and autoencoder | The batch normalization layer and the full-connected layer are introduced to reduce training complexity | [384]
DrPOCS | Drug Repositioning Based on Projection onto Convex Sets | Machine learning | DrPOCS predicts potential associations between drugs and diseases with matrix completion | [385]
HeteroDualNet | A dual convolutional neural network with heterogeneous layers for drug-disease association prediction via chou’s five-step rule | Neural network | Embedded heterogeneous layers of original and neighboring drug-disease representations in a dual neural network improved the association prediction performance | [386]
RCDR | A Recommender Based Method for Computational Drug Repurposing | Collaborative filtering model | Prioritize candidate drugs for diseases | [387]
GRTR | Drug-disease association prediction based on graph regularized transductive regression on a heterogeneous network | Regression model | Graph-regularized transductive regression is used to score and rank drug-disease associations iteratively | [388]
SAEROF | An ensemble approach for large-scale drug-disease association prediction by incorporating rotation forest and sparse autoencoder deep neural network | Deep neural network | This model is a feasible and effective method to predict drug-disease correlation, and its performance is significantly improved compared with existing methods | [389]
WGMFDDA | A novel weighted-based graph regularized matrix factorization for predicting drug-disease associations | K-nearest neighbor | The framework of graph regularized matrix factorization is utilized to reveal unknown associations of drugs with the disease. To evaluate the prediction performance of the proposed WGMFDDA method, ten-fold cross-validation is performed on Fdata set | [390]
HNet-DNN | Inferring new drug–disease associations with deep neural network based on heterogeneous network features | Deep neural network | Topological features for drug-disease associations from the heterogeneous network and used them to train a DNN model | [391]
DeepConv-DTI | Prediction of drug-target interactions via deep learning with convolution on protein sequences. https://github.com/GIST-CSBL/DeepConv-DTI | Deep learning | Prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches | [392]
DeepH-DTA | Predicting Drug-Target Interactions. https://github.com/Hawash-AI/deepH-DTA | Deep learning | Heterogeneous graph attention (HGAT) model to learn topological information of compound molecules and bidirectional ConvLSTM layers for modeling spatio-sequential information in simplified molecular-input line-entry system (SMILES) sequences of drug data | [393]
NegStacking | Drug-target interaction prediction. https://github.com/Open-ss/NegStacking | Ensemble learning and logistic regression | NegStacking can improve the performance of predictive DTIs, and it has broad application prospects for improving the drug discovery process | [394]
SPIDR | Small-molecule peptide-influenced drug repurposing | Genetic algorithm and heuristic search procedure | SPIDR has been generalized and integrated into DockoMatic v 2.1 | [395]
DeepPurpose | Library for drug-target interaction prediction. https://github.com/kexinhuang12345/DeepPurpose | Deep learning | Supports the training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features | [396]
DTI-CDF | A cascade deep forest model toward the prediction of drug-target interactions based on hybrid features. https://github.com/a96123155/DTI-CDF | Deep forest model | There are 1352 newly predicted DTIs that are proved to be correct by KEGG and DrugBank databases | [397]
Pred-binding | Large-scale protein–ligand binding affinity prediction | Support vector machine and random forest | 1589 molecular descriptors and 1080 protein descriptors in 9948 ligand–protein pairs predicted DTIs that were quantified by Ki values. The cross-validation coefficient of determination of 0.6079 for SVM and 0.6267 for RF was obtained, respectively | [398]
Physicochemical properties and bioactivity
Chembench | A Publicly Accessible, Integrated Cheminformatics Portal. https://chembench.mml.unc.edu | Machine learning | Tools and services for computer-assisted drug design and computational toxicology available on Chembench | [399]
mCSM-lig | Quantifying the effects of mutations on protein-small molecule affinity in genetic disease and emergence of drug resistance. http://structure.bioc.cam.ac.uk/mcsm_lig | Machine learning models, Platinum database | Effective in predicting a range of chemotherapeutic, antiviral and antibiotic resistance mutations, providing useful insights for genotypic screening and guiding drug development | [400]
CSM-lig | A web server for assessing and comparing protein-small molecule affinities. http://structure.bioc.cam.ac.uk/csm_lig | Machine learning, graph-based chemical signatures based on PDBbind databases | Automatically predict binding affinities of collections of structures and assess the interactions made | [401]
mCSM-AB | A web server for predicting antibody-antigen affinity changes upon mutation. http://structure.bioc.cam.ac.uk/mcsm_ab | Machine learning | Predicting antibody-antigen affinity changes upon mutation which relies on graph-based signatures | [402]
dendPoint | A web resource for dendrimer pharmacokinetics investigation and prediction. http://biosig.unimelb.edu.au/dendpoint | Machine learning and principal component analysis | Used to guide dendrimer construct design and refinement before embarking on more time-consuming and expensive in vivo testing | [403]
MDCKpred | A web tool to calculate MDCK permeability coefficient of small molecule using membrane-interaction chemical features. http://www.mdckpred.in/ | Regression model | An intuitive way of prioritizing small molecules based on calculated MDCK permeabilities | [404]
Vienna LiverTox | Prediction of interactions profiles of small molecules with transporters relevant for regulatory agencies. https://livertox.univie.ac.at/ | Machine learning classification model | Identify pharmacokinetic properties | [405]
Ambit-SMIRKS | A software module for reaction representation, reaction search and structure transformation. http://ambit.sourceforge.net/smirks | The Chemistry Development Kit | Standardization of large chemical databases and pathway transformation database and prediction | [406]
COSMOfrag | A Novel Tool for High-Throughput ADME Property Prediction and Similarity Screening | Quantum Chemistry | In the COSMO-RS picture, any molecular information is gathered in the so-called σ profiles, COSMOfrag replaces the single σ profile with a composition of partial σ profiles, selected by the use of extensive similarity searching algorithms | [407]
RosENet | Predicting the absolute binding affinity of protein–ligand complexes | Convolutional neural networks | Combines voxelized molecular mechanics energies and molecular descriptors | [408]
MDeePred | Novel multi-channel protein featurization for deep learning-based binding affinity prediction in drug discovery. https://github.com/cansyl/MDeePred | Deep learning | MDeePred is a scalable method with sufficiently high predictive performance | [409]
Table 2 (continued)
Mode of action and toxicity of compounds
ProTox-II | Webserver for the prediction of toxicity of chemicals. http://tox.charite.de/protox_II | Molecular similarity, fragment propensities, and machine learning | Predicts acute toxicity, hepatotoxicity, cytotoxicity, carcinogenicity, mutagenicity, immunotoxicity | [410]
ADMETlab | A platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. http://admet.scbdd.com/ | Designed based on the Django framework in Python | Early drug-likeness evaluation, rapid ADMET virtual screening or filtering and prioritization of chemical structures | [411]
lazar | A modular predictive toxicology framework | QSAR model, classification model, and regression model | Choose between a large variety of algorithms for descriptor calculation and selection, chemical similarity indices, and model building | [412]
TargetNet | A web service for predicting potential drug-target interaction profiling via multi-target SAR models. http://targetnet.scbdd.com | Naïve Bayes models | The server will predict the activity of the user’s molecule across 623 human proteins by the established high-quality SAR model, thus generating a DTI profiling that can be used as a feature vector of chemicals for wide applications | [413]
PSBP-SVM | The computational identifier for predicting polystyrene binding peptides. http://server.malab.cn/PSBP-SVM/index.jsp | Machine learning: support vector machines | Model contains four machine learning steps, including feature extraction, feature selection, model training and optimization | [414]
IDDkin | Prediction of kinase inhibitors. https://github.com/CS-BIO/IDDkin | Deep diffusion model | Network-based computational methods could be employed to aggregate the effective information from heterogeneous sources | [415]
SMPDB 2.0 | Comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. http://www.smpdb.ca/ | | Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases, including HMDB and DrugBank | [416]
DruGeVar | Online resource triangulating drugs with genes and genomic biomarkers for clinical pharmacogenomics. http://drugevar.genomicmedicinealliance.org | | Allows users to formulate simple and complex queries | [417]
DrugPathSeeker | Interactive UI for exploring drug-ADR relation via pathways | Machine learning | Uses a Small Molecular Risk Profiler to make ADR predictions for a given drug | [418]
SNF-NN | Computational method to predict drug-disease interactions | Neural networks | Computational drug repositioning research can significantly benefit from integrating similarity measures in heterogeneous networks | [419]
DeepDrug | A general graph-based deep learning framework for drug relation prediction. https://github.com/wanwenzeng/deepdrug | Graph convolutional networks | The structural features learned by DeepDrug, which display compatible and accordant patterns in chemical properties, providing additional evidence to support the strong predictive power of DeepDrug | [420]
vcclab.org/lab/pclient/) [304], E-DRAGON (http://www.vcclab.org/lab/edragon/) [304], ChemSpider (http://www.chemspider.com/) [306], SPARC (http://sparc.chem.uga.edu/sparc/) [307], and OSIRIS property explorer (https://www.organic-chemistry.org/prog/peo/) [308]. In 2020, a study was conducted on the design, synthesis, and ADMET prediction of bis-benzimidazoles as anticancer agents. In the same study, the authors calculated the molecular properties of the compounds using Lipinski’s rule of five and predicted the pre-ADMET properties of the synthetic compounds [309]. Further, Puratchikody et al. 2016 used OSIRIS property explorer to predict the quantitative structural toxicity of tyrosine derivatives intended for safe, potent inflammation treatment. The results showed that, out of 55 potent molecules, only 19 were considered potent cyclooxygenase-2 inhibitors [310]. Along similar lines, RF- and DNN-based models were constructed to predict the human intestinal absorption of different chemical compounds. These examples indicate that the AI-based approach has a significant role in drug discovery and development through the prediction of physicochemical properties.

Moreover, the therapeutic activity of drug molecules depends on their binding efficiency with the receptor or target; a chemical molecule that shows no binding affinity for the drug target will not be considered a therapeutic agent. For this reason, the prediction of the binding affinity of a chemical molecule with the therapeutic target is vital for drug discovery and development [311]. Recent advancements in AI algorithms enhance the process of binding affinity prediction, which uses similarity features of the drug and its associated target. Several web-based tools have been developed, such as ChemMapper and the similarity ensemble approach (SEA). Further, ML- and DL-based models for the identification of drug-target affinity have been constructed, such as KronRLS, SimBoost, DeepDTA, and PADME [312]. KronRLS is an ML algorithm that uses similarity information for a drug and its target to calculate the drug-target binding affinity. KronRLS considers both feature-based and similarity-based interactions while predicting drug-target binding affinity [313]. DL approaches such as DeepDTA (https://github.com/hkmztrk/DeepDTA) [314] and PADME [315] predict drug-target binding affinity, which depends on the 3-D structure of a protein. Beck et al. 2020 conducted a study to predict commercially available antiviral drugs as potential therapeutic agents against the novel coronavirus (SARS-CoV-2) through DeepDTA [316]. Similarly, Lee and Kim 2019 predicted drug-target interactions by DNN based on large-scale drug-induced transcriptome data using PADME [317]. Another DL model that uses both RNN and CNN, called DeepAffinity (https://github.com/Shen-Lab/DeepAffinity) [318], was constructed to predict drug-target binding affinity. Jiang et al. 2019, using DeepAffinity, proposed a novel protein descriptor for identifying drug-target interactions, whereas Born et al. 2020, with the help of DeepAffinity, identified antiviral candidates for SARS-CoV-2 [319, 320]. The above data validate the importance of ML and DL algorithms for predicting the physicochemical properties and bioactivity of drug molecules during drug design. However, the validation and accuracy of such algorithms remain a significant drawback from a research perspective. Thus, extensive research should be done to maximize the accuracy and precision of AI-based algorithms through curated and extensive data input. In Table 2, we have summarized the tools and databases for physicochemical and bioactivity prediction based on AI algorithms, including DL, neural networks, SVM, and others.

Prediction of mode of action and toxicity of compounds

Drug toxicity refers to a chemical molecule’s adverse effect on an organism or on any part of the organism due to the compound’s mode of action or metabolism. The potential of AI to predict the off-target and on-target effects of drug molecules, along with the in vivo safety analysis of chemical compounds before their synthesis, has fascinated the scientists associated with the drug development process. The involvement of AI has reduced drug development time, cost, attrition rates, and human resources. For this purpose, different web-based tools have been developed, such as LimTox (http://limtox.bioinfo.cnio.es/) [321], pkCSM (http://biosig.unimelb.edu.au/pkcsm/) [322], admetSAR (http://lmmd.ecust.edu.cn/admetsar2/) [323], and Toxtree (http://toxtree.sourceforge.net/) [324]. Srivastava et al. 2020 used admetSAR to evaluate the toxicity of Withania somnifera as a therapeutic compound against COVID-19, whereas Uygun et al. 2021 used pkCSM to identify the therapeutic effect and toxicological properties of pyrazolo[1,5-a]pyrazine-4(5H)-one derivatives on a lung adenocarcinoma cell line [325, 326]. Advancements in AI-based approaches led to the development of different toxicity prediction software and web-based tools such as Tox21 (https://ntp.niehs.nih.gov/whatwestudy/tox21/index.html) [327], SEA (http://sea.bkslab.org/) [328], eToxPred (https://www.brylinski.org/etoxpred-0) [329], and TargeTox (https://github.com/artem-lysenko/TargeTox) [330]. Tox21 evaluates the toxicity of 12,707 environmental compounds and drugs, whereas SEA forecasts the toxicity of 656 marketed drugs against 73 unintended targets. TargeTox predicts toxicity risk based on the target-drug biological network. In 2016, Huang et al. predicted the in vivo toxicity profile and mechanism characterization of more than 10,000 chemical compounds through modeling Tox21, whereas, in the same year, Zhou et al. predicted the cancer-relevant proteins using
an improved molecular SEA [331, 332]. Further, Gupta and Rana 2019 employed eToxPred to predict the toxicity of small molecules of the androgen receptor. The authors incorporated 1444 characteristic features of small molecules for 10,273 drugs, of which 461 were considered active and 9812 inactive [333].

DeepTox (http://bioinf.jku.at/research/DeepTox/tox21.html) [334] and PrOCTOR (https://github.com/kgayvert/PrOCTOR) [335] are used for predicting the toxicity of new compounds and the probability of toxicity in clinical trials, respectively. For example, Robledo-Cadena et al. 2020 predicted the effect of non-steroidal anti-inflammatory drugs on cisplatin, paclitaxel, and doxorubicin efficacy against cervix cancer cells using PrOCTOR, whereas Gilvary et al. 2020 identified novel indications for 2,576 small molecules incorporated with 16 different drug features for PD and Type 2 diabetes [336, 337]. Similarly, using DeepTox, Simm et al. 2018 analyzed and repurposed high-throughput imaging assay data to predict the biological activity of different chemical compounds that were targeting alternative biological pathways and processes [338]. Furthermore, DeepTox was used for the development of several ML and DL algorithms that predict the toxicity properties and chemical characteristics of drug compounds, such as SMILES2Vec (predicts chemical properties) [339], Chemception (DNN-based prediction of chemical properties) [245], DeepSynergy (prediction of anti-cancer drug synergy with DL) [340], and deepAOT (prediction of compound acute oral toxicity) [341]. However, the accuracy and precision of DeepTox and PrOCTOR could be increased by using large and refined data sets, which could be achieved with the pharmaceutical industry’s involvement. Recently, other ML-based tools such as SPIDER [342] and read-across structure–activity relationships (RASAR) [343] were developed, which are capable of analyzing β-lapachone targets and linking molecular structures and toxic properties of an unknown compound, respectively.

Zhang et al. [344] developed different toxicity predictive models for drug-induced liver toxicity based on five ML algorithms combined with MACCS or FP4 fingerprinting. The results demonstrated that the best model yielded an accuracy of 75% against an external validation data set [344]. Similarly, several toxicity evaluation algorithms were constructed based on ML methods such as relevance vector machine (RVM), regularized-RF, C5.0 trees, eXtreme gradient boosting (XGBoost), AdaBoost, SVM boosting (SVMBoost), and RVM boosting (RVMBoost). The constructed models were used to evaluate rat oral acute toxicity, respiratory toxicity, and urinary tract toxicity [345–348]. In recent years, the application of deep-learning algorithms has led to novel approaches for the molecular representation of chemical compounds, making DL methods suitable for predicting compound toxicity. Further, the potential of DL algorithms for toxicity prediction depends on the quality and quantity of data sets. In short, more research should be done to make AI-based algorithms reliable for toxicity prediction. Although the current ML-based predictors cannot yet replace biological systems, they are sufficient to steer medicinal chemistry in the right direction, which reduces the number of synthesis cycles. Further, the AI-based algorithms and tools for toxicity prediction are described in detail in Table 2.

Identification of molecular pathways and polypharmacology

One of the significant outcomes of AI and ML algorithms in drug discovery and development is the prediction and estimation of the overall topology and dynamics of disease networks, drug-drug interactions, or drug-target relationships [349]. This methodology offers a vast avenue for the identification of novel molecular therapeutic targets for a particular disease. Text mining-driven databases like DisGeNET, STITCH, and STRING are widely used to ascertain gene-disease associations, drug-target associations, and molecular pathways, respectively. For instance, Gu et al. 2020 used the similarity ensemble approach to identify targets for the 197 most commonly used Chinese herbs. Later, the DisGeNET database was used to associate those drug targets with different diseases, thus linking herbs with diseases in which they can be used [350]. Further, Chen et al. 2019 used the STITCH database to find targets of potential drugs shortlisted for esophageal carcinoma [351]. Likewise, Taha et al. 2020 used the STITCH database to find targets for active constituents of Nandina domestica, a plant used for treating various tumors. Later, the STRING database was used to construct compound-target pathways with the help of the Cytoscape tool [352].

In medicinal chemistry, polypharmacology refers to designing a single drug molecule capable of interacting with multiple targets in a disease-related drug-target biological network. It is best suited for designing a promising therapeutic agent for more complex diseases such as cancer, neurodegenerative diseases (NDDs), diabetes, heart failure, and many others [353–355]. ML-based methods have the potential to analyze guilt-by-association molecular networks due to strong mining capabilities and data analysis. Further, ML models assist in the rational design of multi-target ligands through the generation of chemical compounds with desired polypharmacological features, as ML models generate a vast number of chemical structures with different chemical and topological features. Thus, the probability of discovering multi-target ligands increases. Furthermore, ML models help in the identification of multi-target ligands where there are dissimilar binding pockets. Recent advancements in AI in drug discovery and development have led to the generation
of web-based tools and stand-alone software packages for polypharmacology prediction such as polypharmacology browser (PPB) (http://www.gdb.unibe.ch/) [356], TarPred (http://www.dddc.ac.cn/tarpred/) [140], Self-Organizing Map Based Prediction of Drug Equivalence Relationship (SPiDER) (http://modlabcadd.ethz.ch/software/spider) [357], Targethunter (https://www.cbligand.org/TargetHunter3D/) [358], PharmMapper (http://lilab-ecust.cn/pharmmapper/) [359], ChemMapper (http://lilab.ecust.edu.cn/chemmapper/) [360], and Swiss Target Prediction (SwissTargetPrediction) (http://www.swisstargetprediction.ch/) [361]. Poirier et al. 2018 conducted an experiment using PPB for the identification of lysophosphatidic acid acyltransferase β as a therapeutic target of nanomolar angiogenesis, whereas Ozhathil et al. 2018 identified potent and selective small-molecule inhibitors of the cation channel transient receptor potential cation channel subfamily M member 4 using PPB [362, 363]. Further, Vleet Van et al. 2018 implemented the TarPred tool for screening strategies and methods for improved off-target liability prediction, whereas, in the same year, Ratnawati et al. predicted the active compounds from SMILES codes using a backpropagation algorithm [364, 365]. Among the above-mentioned web-based tools, PharmMapper and ChemMapper have been frequently used in recent research. For example, studies on the synergistic mechanism of huangqi and huanglian for diabetes mellitus [366], the blood-enriching mechanism of danggui buxue decoction [367], and the multiple mechanisms of Hedyotis diffusa Willd. on colorectal cancer [368] used PharmMapper. Similarly, identification of a human copper trafficking blocker in cancer [369], identification of multi-target ligands through chemical-protein interaction in AD [370], prediction of the anticancer mechanism of Kushen Injection against hepatocellular carcinoma [371], and discovery of pteridin-7(8H)-one-based therapeutic compounds against the epidermal growth factor receptor kinase T790M/L858R mutant [372] were performed using ChemMapper. One major limitation of AI algorithms for polypharmacology prediction is inadequate data or unreliable data sets. Thus, quantum chemical calculations, which provide fine-tuned data sets, should be performed to increase the accuracy of a predictive model.

Moreover, AI in drug development opened the gates for identifying molecular pathways or molecular targets for the treatment of human disease through genomics information, biochemical features, and target specifications [373]. “OpenTargets” (https://www.opentargets.org/) [374], a freeware and ML-based tool, is used for prioritizing potential therapeutic drug targets with over 71% accuracy. Recently, Nabirotchkin et al. identified the unfolded protein response and autophagy-related pathways of common approved drugs against COVID-19, whereas Lopez-Cortes et al. identified allele frequencies in colorectal cancer [375, 376]. Further, GWAS studies conducted by Isac-Lopez et al. [377] predicted multiple risk loci and highlighted fibrotic and vasculopathy pathways. The results demonstrated that 27 independent genome-wide-associated signals and 13 novel risk loci were associated with systemic sclerosis. Martin et al. studied chromatin interactions to predict novel gene targets in rheumatic diseases. In the same study, the authors concluded that 454 high-confidence genes were associated with rheumatic disease, of which 48 were drug targets and 11 were existing targets. Finally, they demonstrated that 367 drugs were suitable for repositioning [378].

Implementation of artificial intelligence in de novo drug designing

The iterative process of using the 3D structure of a receptor to generate novel molecules is termed de novo drug design, which is intended to produce new active compounds. However, de novo drug design has not seen widespread use in drug discovery. The field has seen some revival recently because of advancements in AI [421, 422]. VS has emerged as a powerful tool in the drug development process, as it conducts efficient in silico searches over an enormous number of compounds, thereby increasing the yield of potential drug leads. As a subset of AI, ML is a technique for guiding VS for drug leads, which generally involves gathering a filtered set of compounds, containing known actives and inactive compounds, to train a model [423, 424]. After training, the model is tested and, if accurate enough, used on a previously unseen database to identify novel drugs. In this section, we discuss how AI has proved to be a boon for drug design using the de novo technique.

In one study, the researchers utilized the latent space representation to train a model based on the quantitative estimate of drug-likeness (QED) score and the synthetic accessibility score (SAS) [425]. In another publication, the performance of such a variational autoencoder was compared with that of an adversarial autoencoder [426]. The adversarial autoencoder comprises a generative model producing novel compound structures. A second, discriminative adversarial model is trained to distinguish real molecules from generated ones, while the generative model attempts to fool the discriminative one [427]. The adversarial autoencoder generated significantly more valid structures than the variational autoencoder in generation mode. In combination with an in silico model, novel structures predicted to be active against the dopamine receptor type 2 could be obtained. Researchers utilized a generative adversarial network (GAN) to propose compounds with putative anticancer properties [428].
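
The generative models discussed in this section typically read molecules as SMILES strings, that is, as sequences of characters. As a minimal sketch of that input step (not taken from any of the cited studies), the Python code below splits a SMILES string into tokens and one-hot encodes it. The function names `tokenize`, `build_vocab`, and `one_hot` and the simplified token pattern are illustrative assumptions; production pipelines usually rely on a cheminformatics toolkit for parsing and validation.

```python
import re

# Assumed, simplified token pattern: bracket atoms, two-letter halogens and
# two-digit ring closures are matched before falling back to single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the data set to an integer index (sorted order)."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(smiles, vocab):
    """Encode a SMILES string as one row per token, with a single 1 per row."""
    rows = []
    for tok in tokenize(smiles):
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows
```

For instance, `tokenize("CC(=O)Cl")` keeps `Cl` as a single token instead of splitting it into `C` and `l`, which is the main reason for not iterating over raw characters. A sequence model would then be trained on such encoded sequences; that training step is outside the scope of this sketch.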
RNNs have likewise been successfully used for de novo drug design. Since SMILES strings encode chemical structures as a sequence of letters, RNNs have been used to generate compound structures. It was observed that RNNs have the potential to utilize SMILES strings for drug design [429]. A similar methodology was also successfully used for the development of novel peptide structures [430]. Neural network learning was effectively applied to bias the generated compounds toward desired properties [431]. Similarly, transfer learning was used as another strategy to generate novel chemical structures with a desired biological activity. In the first step, the network is trained to learn the SMILES syntax with a large training set [432, 433]. In the subsequent step, the training is continued with compounds having the desired activity. Moreover, additional epochs of training were sufficient to move the generation of novel compounds into a chemical space occupied by active molecules. Five molecules were synthesized based on such a methodology, and the designed activity could be confirmed for four molecules against nuclear hormone receptors [434]. Several different architectures have been proposed, which have generated valid, meaningful novel structures. Novel chemical space has been explored by these strategies, with the property distribution of the generated molecules being similar to that of the extensive training set used. The first application of this strategy was successful, with 4 out of 5 molecules showing the desired activity [435]. Combining AI with multi-objective optimization has been a promising solution to bridge the chemical and biological phases. Novel pairs of multi-objectives based on RNN for automated de novo design based on SMILES were developed to find the best possible match between physicochemical properties and their constrained biological targets. The results indicated that AI and multi-objective optimization allow capturing the latent links joining chemical and biological aspects, thus providing easy-to-use options for customizable design strategies, which proved especially effective for both lead generation and lead optimization [436].

ML models like SVM, RF, DNNs, and many others have been used in drug discovery for pharmaceutical applications ranging from docking to VS [437]. Recently, drug repurposing, which usually involves data mining and AI, has emerged as an innovative approach to minimize drug development duration [438]. A group proposed a question–answer artificial system (QAAI) capable of repurposing drugs, which used the Google semantic AI universal encoder to compute the sentence embedding in the red brain JSON database. The study validated the prediction for the lipoxygenase inhibitor drug zileuton as a modulator of the NRF2 pathway in vitro, with potential applications to reduce macrophage M1 phenotype and reactive oxygen species production. This novel approach has proved to be effective for drug repositioning in NDDs [439]. With the rapid development of systems-based pharmacology and polypharmacology, method development for the rational design of multi-target drugs has become urgent. The first de novo multi-target drug design program, known as LigBuilder V3 (http://www.pkumdl.cn/ligbuilder3/), has been devised to design ligands for multiple receptors, multiple binding sites of one receptor, or different conformations of one receptor. LigBuilder V3 is also used for multi-target drug design and optimization, particularly for compact ligands for proteins with varying ligand binding sites [440]. De novo drug design actively seeks to use sets of chemical rules for the fast and efficient identification of structurally new chemotypes with the desired set of biological properties. Moreover, fragment-based de novo design tools have been successfully applied in the discovery of non-covalent inhibitors. A new protocol, called Cov_FB3D, has been devised, which involves the in silico assembly of potential novel covalent inhibitors by identifying the active fragments in the covalently binding site of the target protein [441].

Artificial intelligence: possible role in pharmaceutical manufacturing and clinical trial design

The use of computational methods is quite well established in the pharmaceutical industry. However, the introduction of AI has given a broader scope to develop new approaches that can improve and optimize drug discovery [442]. This has not only encouraged the scientific community but has also resulted in a growing partnership between the pharmaceutical industry and AI companies [443]. A study stated that the overall success rate for 21,143 drugs was nearly 5.2% in 2013, down from 11.2% in 2005. Thus, the use of AI is mainly associated with a need to reduce attrition and costs [444]. It usually takes 12 years to bring a new drug to the market, which can cost up to 3 billion USD [445]. Further, it is a huge task to find a new drug when there are ~10^60 possible drug-like molecules [446]. The current drug discovery challenges are related to the toxicity of the drug, its side effects, choosing the right target site, appropriate dosages, and even intellectual property [447]. The pharmaceutical industry mostly does not share pharmacokinetic and pharmacodynamic measurements of drugs until they are approved. In addition, very little drug discovery data are available to train AI models [448]. There needs to be a community that can regulate and manage preclinical and clinical pharmacology data to accelerate the progress of AI in this field. Recent advances in AI have impacted clinical pharmacology in many ways, such as literature searching and processing, interactions with online predictive ML models, ML methods in framing policy to encourage healthcare in
many countries and also to get predictive analysis for drug-related information [449, 450].
When a drug candidate successfully passes all preclinical tests, it is administered to patients in clinical trials,
which comprise three phases: Phase 1, drug safety testing with a small number of people; Phase 2, drug efficacy
testing with a small number of human subjects affected by a particular disease; and Phase 3, efficacy studies with a
large number of patients. After the drug passes the clinical trials, the FDA reviews it for approval and
commercialization [451, 452]. Further, the failure rate of clinical trials adds to the inefficiency of the drug
development process, and each failed trial wastes the investment and forfeits the costs of preclinical testing. The
two main reasons behind high failure rates are improper patient selection and inefficient monitoring during trials.
Since the introduction of AI technology, however, the success rates of clinical trials have improved drastically
[453]. A system for clinical trial matching has been developed by IBM Watson, which uses patients' medical records
and an abundance of past clinical trial data to create detailed clinical findings profiles. It could also be used to
keep a check on enrolled patients [454]. AI models can also reduce the cost of clinical trials by raising the success
rate through analysis of toxicity, side effects, and other related parameters [455]. One such DL-based example, which
predicted the outcome of phase I and phase II clinical trials, calculated the probability of possible side effects
and a pathway activation score, which were then used to train the model [456]. Similarly, another project, the
Virtual Physiological Human, was set up to support in silico trials [457]. Further development in AI technology will
help in better management of clinical trial data, ultimately aiming to develop personalized medicines.
Involvement of artificial intelligence in drug development: a case of neurodegenerative diseases
NDDs are lethal, multifaceted, enervating disorders of the central nervous system and a major cause of death
worldwide. AD, PD, Amyotrophic Lateral Sclerosis (ALS), and Huntington’s disease (HD) are some of the most commonly
observed NDDs, which can ultimately lead to the death of neurons in different areas of the central nervous system
[458]. The aggregation of toxic, misfolded, cytoplasmic proteins in different brain regions is one of the primary
reasons for the inception of these disorders [459]. Further, these disorders can exhibit varying symptoms such as
cognitive decline, slow movement, tremors, memory loss, depression, speaking problems, and muscle stiffness
[460, 461]. The major challenge posed by NDDs is in the area of drug discovery: to date, no drug has been discovered
that can arrest and reverse the progression of these disorders. Hence, there is a dire need for new drug targets and
drug compounds that can alleviate the symptoms and mitigate the diseased conditions of the central nervous system
[462]. Nowadays, ML is extensively used to find novel targets and biomarkers associated with NDDs. For example,
Martínez-Ballesteros et al. 2016 combined DT, quantitative association rules, and hierarchical clustering to
determine potential risk genes associated with AD via gene expression profiling of patient and control samples.
Further, [463] used a combination of protein–protein interaction networks, an autoencoder, and SVM to predict novel
target genes associated with PD. Likewise, [464] used ML models such as RF, DT, a generalized linear model, and rule
induction to find risk genes of HD through gene expression profiling. Moreover, [465] used a CNN trained on an
extensive GWAS data set to find novel risk single nucleotide polymorphisms and genes associated with ALS.
Moreover, ML techniques are also being used to find suitable inhibitors of target proteins implicated in NDDs. For
instance, [466] applied a combination of VS, ML, and molecular docking to find class I and class IIb histone
deacetylase (HDAC) inhibitors, as HDAC enzymes have been reported to promote AD neurotoxicity. Here, ML was used to
classify inhibitors and non-inhibitors after VS. Further, [467] used descriptors derived from MD simulation
trajectories of the caspase-8 protein–ligand complex to train ANN and RF models to find inhibitors of caspase-8, a
protease that has been implicated in AD pathogenesis. In another study, [468] used data from a traditional Chinese
medicine database, followed by VS, molecular docking, and ML techniques, including DL, to find inhibitors of GSK3β,
an enzyme implicated in AD. Further, MD simulation was used to assess the stability of GSK3β-ligand interactions.
Additionally, Ponzoni et al. 2019 built a QSAR model for finding inhibitors of the BACE1 enzyme, which is responsible
for β-amyloid (Aβ) aggregation in AD. Here, the QSAR model was built using an optimum set of molecular descriptors,
which were selected using an amalgamation of ML algorithms, hybridization techniques, a backward elimination
strategy, and visual analysis [469]. Similarly, [470] used a cascade of Naïve Bayes networks to find potent and safe
abelson tyrosine-protein kinase 1 (c-Abl) inhibitors, which promote neuroprotection in PD. Likewise, Shao et al. 2018
integrated an SVM algorithm with Tanimoto similarity-based clustering, followed by in vitro experiments, to find
novel antagonists of both the A2A adenosine receptor and the dopamine D2 receptor, as it has been observed that
blocking these two receptors leads to neuroprotection in PD [471]. In addition, [472] implemented molecular docking,
AI-QSAR, and MD simulations to find inhibitors of the NLR family pyrin domain containing 3 (NLRP3), an inflammasome
involved in PD pathogenesis. Here, VS followed by docking was used to shortlist compounds from the traditional
Chinese medicine database, whereas AI and QSAR models were used to ascertain the bioactivity of the compounds,
followed by assessing their binding stability via MD simulations [472]. Similarly, [473] used molecular docking, AI,
and MD simulations to discover inhibitors of Galectin-3, a protein implicated in neuroinflammation in HD. Here,
molecular docking was used for initial shortlisting, followed by evaluating the bioactivity of compounds through ML
and assessing their binding stability through MD simulations. Further, different studies have used ML algorithms for
drug repurposing in NDDs. For instance, Zeng et al. 2019 developed a DL-based drug repurposing tool called deepDR
(https://github.com/ChengF-Lab/deepDR), which is used to find new repurposed drugs for AD and PD [291]. Furthermore,
[474] proposed telmisartan as a potential repurposed drug for AD by using a genetic network-driven classification
model. In addition, [475] proposed a drug repurposing strategy for PD by scanning scientific literature through an
integration of knowledge representation learning and ML algorithms.
Future challenges and possible solutions
At present, the major challenge for the pharmaceutical industry in developing a new drug is increased cost and
reduced efficiency. However, ML approaches and recent developments in DL offer great opportunities to reduce this
cost, increase efficiency, and save time during the drug discovery and development process. Advances in AI
algorithms, especially in DL approaches, along with improving hardware architecture and easy accessibility of big
data, all point toward the third wave of AI. AI approaches in drug development have aroused great interest among
researchers, such that many pharmaceutical companies have collaborated with AI companies. Moreover, the number of
startups in this field has also escalated and reached 230 by June 2020 [476]. Further, DL approaches integrate data
at multiple levels through nonlinear models, which is a shortcoming of other AI and ML approaches. This integration
of data at multiple levels makes DL algorithms advantageous, as it provides great accuracy and precision. Moreover,
in comparison with other AI and ML algorithms, DL provides a much more flexible architecture for creating a neural
network for a specific problem [477–480]. Applications of AI such as natural language processing and image and voice
recognition are readily achievable these days and have beaten humans in terms of performance [481]. So, it comes as
no surprise that AI can very well be used in the drug discovery process. Today, AI is used in drug discovery for
target identification, hit discovery, lead optimization, ADMET prediction, and structuring clinical trials. Despite
great success, there are many remaining challenges, such as high-quality data acquisition, under which there are two
significant concerns. Firstly, labeling cannot be binary, as the action of drugs in biological systems is
complicated; secondly, the amount of data available in drug discovery is infinitesimal compared to the enormous
amount of information available. Therefore, a community is required that provides not only quantity but also quality
of data. In the pharmaceutical industry, open data sharing is not common, and the Pistoia Alliance has taken the
initiative to start a movement that has encouraged many companies to share their data with others. They also intend
to establish a uniform data format, which is technically challenging [161]. A possible solution to this problem is
to develop an algorithm that can handle sparse data; one such approach, named “one-shot learning,” has been developed
at Stanford University and predicts properties of a drug on the basis of heterogeneous data [482]. Moreover, the
accuracy and uncertainty of the experimental data can be used for model building; that is, instead of establishing
new ML technologies, one can put effort into training an existing one by tuning a large number of hyperparameters
and optimizing it for good results, although some studies indicated that some reasonable parameters can be used to
start the optimization [435]. Molecular representation is also a challenge, as it is one of the governing factors in
model building. A few recently developed models learn task-related features from the raw data and refine the
molecular representation to a standard. Earlier, drug repurposing used to rely only on clinical observations.
However, the large amount of data now available, comprising scientific literature, patents, and clinical trial
results, can collectively be used to improve the screening process. Additionally, DL-based VS can make full use of
the data and reduce the false-positive rates caused by imbalance between positive and negative data. Lead
optimization is also a challenge when developing an efficient drug with good ADMET properties and target activities,
because these parameters are independent and at times mutually incompatible. This problem can be solved by optimizing
each parameter separately and further improving the model. Pharmaceutical companies face trouble recruiting a
sufficient number of patients for clinical trials. AI approaches will help identify and recruit target patients and
will also help in managing the collected data. Regarding drug discovery for neurodegenerative disorders, the major
problem is their unknown pathophysiology, which makes drug identification even more challenging. The “black box”
nature of ML models is an additional challenge: even experts cannot explain how the model arrives at a result or
comprehend the biological mechanism behind it. Furthermore, the escalating number of ML models, each claiming to be
the latest, has left non-professionals helpless, as they cannot decide which model to choose to solve their problem.
Thus, it will be better if users and developers agree upon a standard objective evaluation and thereafter check
the performance of the model. Further, it is important to note that most countries do not grant patents for
inventions created exclusively by AI technology. Moreover, companies that use AI technology for drug discovery have
to go through a rigorous process to copyright their work so as to secure patent rights. Security is also a major
concern, as AI-driven personalized medicine requires a person’s genetic code, for which personal information will be
required. Finally, faster computation will be required for handling big data, and it is said that in the future
current supercomputers will be replaced by quantum computers or another technology that will do the job in minutes
rather than hours. Although AI has yielded many novel targets and novel compounds for different diseases, there has
not yet been a success story in which a compound generated through AI made it to the market for public use. Recently,
for the first time, a novel target and its novel inhibitor have been proposed through AI-based tools. Insilico
Medicine, a biotechnology company, proposed a novel target involved in idiopathic pulmonary fibrosis and designed its
novel inhibitor from scratch with their AI-based tools. The identified small molecule inhibitor has shown good
efficacy in human cells and animal models. In December 2020, Insilico nominated their small molecule inhibitor for
investigational new drug (IND) enabling studies, and they are targeting clinical trials by early 2022. If the trials
are successful, it will be the first time that a novel target and its inhibitor were proposed through AI-based tools
and gained approval. Though there are some unavoidable obstacles and a tremendous amount of work has to be done to
incorporate AI tools in the drug discovery cycle, there is no doubt that in the near future AI will bring
revolutionary changes to the drug discovery and development process.
Acknowledgements We would like to thank the senior management of Delhi Technological University for their constant
support and guidance.
Authors’ contribution All authors have read the paper and agreed to submit. PK conceived the idea. RG, DS, MS, ST
arranged the data. RG, DS, MS, and ST contributed equally to this work. RKA and PK gave their critical comments and
structured this paper. Artwork was done by RG, RAK, and PK. The paper was written by PK.
Declarations
Conflict of interest There is no conflict of interest declared by the authors.
References
1. Lipinski CF, Maltarollo VG, Oliveira PR et al (2019) Advances and perspectives in applying deep learning for drug design and discovery. Front Robot AI. https://doi.org/10.3389/frobt.2019.00108
2. Hamet P, Tremblay J (2017) Artificial intelligence in medicine. Metabolism. https://doi.org/10.1016/j.metabol.2017.01.011
3. Hassanzadeh P, Atyabi F, Dinarvand R (2019) The significance of artificial intelligence in drug delivery system design. Adv Drug Deliv Rev. https://doi.org/10.1016/j.addr.2019.05.001
4. Duch W, Swaminathan K, Meller J (2007) Artificial intelligence approaches for rational drug design and discovery. Curr Pharm Des. https://doi.org/10.2174/138161207780765954
5. Zhang L, Tan J, Han D, Zhu H (2017) From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today. https://doi.org/10.1016/j.drudis.2017.08.010
6. Jordan AM (2018) Artificial intelligence in drug design–the storm before the calm? ACS Med Chem Lett. https://doi.org/10.1021/acsmedchemlett.8b00500
7. Goel AK, Davies J (2019) Artificial intelligence. In: The Cambridge Handbook of Intelligence. Cambridge
8. Harrer S, Shah P, Antony B, Hu J (2019) Artificial Intelligence for Clinical Trial Design. Trends Pharmacol Sci. https://doi.org/10.1016/j.tips.2019.05.005
9. Zhong F, Xing J, Li X et al (2018) Artificial intelligence in drug design. Sci China Life Sci. https://doi.org/10.1007/s11427-018-9342-2
10. Brown N, Ertl P, Lewis R et al (2020) Artificial intelligence in chemistry and drug design. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-020-00317-x
11. Badillo S, Banfai B, Birzele F et al (2020) An introduction to machine learning. Clin Pharmacol Ther. https://doi.org/10.1002/cpt.1796
12. Dutta Majumdar D (1985) Trends in pattern recognition and machine learning. Def Sci J. https://doi.org/10.14429/dsj.35.6027
13. Kubat M (2017) An Introduction to Machine Learning
14. Aggarwal M, Murty MN (2021) Deep Learning. In: SpringerBriefs in Applied Sciences and Technology. https://doi.org/10.1007/978-981-33-4022-0_3
15. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw. https://doi.org/10.1016/j.neunet.2014.09.003
16. Hu YH, Hwang JN (2001) Introduction to neural networks for signal processing. In: Handbook of Neural Network Signal Processing. CRC Press, pp 12–41
17. Angermueller C, Pärnamaa T, Parts L, Stegle O (2016) Deep learning for computational biology. Mol Syst Biol. https://doi.org/10.15252/msb.20156651
18. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133. https://doi.org/10.1007/BF02478259
19. Turing AM (2009) Computing machinery and intelligence. Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer. Springer, Netherlands, pp 23–65
20. Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3:210–229. https://doi.org/10.1147/rd.33.0210
21. Rosenblatt F (1957) The Perceptron: A Perceiving and Recognizing Automaton, Report 85–60–1
22. Kelley HJ (1960) Gradient theory of optimal flight paths. ARS J 30:947–954. https://doi.org/10.2514/8.5282
23. Dreyfus S (1962) The numerical solution of variational problems. J Math Anal Appl 5:30–45. https://doi.org/10.1016/0022-247X(62)90004-5
24. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36:193–202. https://doi.org/10.1007/BF00344251
25. Fukushima K (1988) Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Netw 1(2):119–130. https://doi.org/10.1016/0893-6080(88)90014-7
26. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536. https://doi.org/10.1038/323533a0
27. LeCun Y, Boser B, Denker JS et al (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1:541–551. https://doi.org/10.1162/neco.1989.1.4.541
28. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8:279–292. https://doi.org/10.1007/bf00992698
29. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297. https://doi.org/10.1023/A:1022627411411
30. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
31. Ilievski A, Zdraveski V, Gusev M (2018) How CUDA Powers the machine learning revolution. 2018 26th Telecommun Forum, TELFOR 2018 - Proc 420–425. https://doi.org/10.1109/TELFOR.2018.8611982
32. Deng J, Dong W, Socher R et al (2010) ImageNet: a large-scale hierarchical image database. Inst Electric Electron Eng IEEE. https://doi.org/10.1109/CVPR.2009.5206848
33. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1
34. Le Q V, Ranzato M’ A, Monga R, et al (2012) Building High-level Features Using Large Scale Unsupervised Learning. https://arxiv.org/abs/1112.6209v5
35. Jorda M, Valero-Lara P, Pena AJ (2019) Performance evaluation of cuDNN convolution algorithms on NVIDIA volta GPUs. IEEE Access 7:70461–70473. https://doi.org/10.1109/ACCESS.2019.2918851
36. Taigman Y, Yang M, Ranzato M, Wolf L (2014) DeepFace: Closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, pp 1701–1708
37. Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative Adversarial Networks. Commun ACM. https://doi.org/10.1145/3422622
38. Gandomi A, Haider M (2015) Beyond the hype: Big data concepts, methods, and analytics. Int J Inf Manage 35:137–144. https://doi.org/10.1016/j.ijinfomgt.2014.10.007
39. Brazma A, Kapushesky M, Parkinson H et al (2006) [20] Data Storage and Analysis in ArrayExpress. Methods Enzymol 411:370–86. https://doi.org/10.1016/S0076-6879(06)11020-4
40. Lo Y-C, Ren G, Honda H, L. Davis K (2020) Artificial Intelligence-Based Drug Design and Discovery. In: Cheminformatics and its Applications. https://doi.org/10.5772/intechopen.89012
41. Edgar R, Domrachev M, Lash AE (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. https://doi.org/10.1093/nar/30.1.207
42. Wang Z, Jensen MA, Zenklusen JC (2016) A practical guide to The Cancer Genome Atlas (TCGA). In: Methods in Molecular Biology 1418:111–41. https://doi.org/10.1007/978-1-4939-3578-9_6
43. Parkinson H, Kapushesky M, Shojatalab M et al (2007) ArrayExpress-a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. https://doi.org/10.1093/nar/gkl995
44. van IJzendoorn DGP, Szuhai K, Briaire-De Bruijn IH, et al (2019) Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas. PLoS Comput Biol 15:1–19. https://doi.org/10.1371/journal.pcbi.1006826
45. Lau A, So HC (2020) Turning genome-wide association study findings into opportunities for drug repositioning. Comput Struct Biotechnol J 18:1639–1650. https://doi.org/10.1016/j.csbj.2020.06.015
46. Beck T, Hastings RK, Gollapudi S et al (2014) GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies. Eur J Hum Genet. https://doi.org/10.1038/ejhg.2013.274
47. Buniello A, Macarthur JAL, Cerezo M et al (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1120
48. Li J, Yuan X, March ME et al (2019) Identification of target genes at juvenile idiopathic arthritis GWAS loci in human neutrophils. Front Genet. https://doi.org/10.3389/fgene.2019.00181
49. Leinonen R, Sugawara H, Shumway M (2011) The sequence read archive. Nucleic Acids Res. https://doi.org/10.1093/nar/gkq1019
50. Jensen MA, Ferretti V, Grossman RL, Staudt LM (2017) The NCI genomic data commons as an engine for precision medicine. Blood 130(4):453–459. https://doi.org/10.1182/blood-2017-03-735654
51. Han Y, Yang J, Qian X et al (2019) DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz096
52. Guillaume JC (1998) PubMed. Ann Dermatol Venereol. https://doi.org/10.1002/9783527678679.dg10319
53. Canese K, Weis S (2013) PubMed: The bibliographic database. NCBI Handb
54. Kim S, Chen J, Cheng T et al (2021) PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. https://doi.org/10.1093/nar/gkaa971
55. Kim S, Chen J, Cheng T et al (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1033
56. Mendez D, Gaulton A, Bento AP et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1075
57. Bento AP, Gaulton A, Hersey A et al (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res. https://doi.org/10.1093/nar/gkt1031
58. Wishart DS, Knox C, Guo AC et al (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. https://doi.org/10.1093/nar/gkm958
59. Wishart DS, Feunang YD, Guo AC et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. https://doi.org/10.1093/nar/gkx1037
60. Keenan AB, Jenkins SL, Jagodnik KM et al (2018) The library of integrated network-based cellular signatures NIH program: system-level cataloging of human cells response to perturbations. Cell Syst 6(1):13–24. https://doi.org/10.1016/j.cels.2017.11.001
61. Duan Q, Reid SP, Clark NR et al (2016) L1000CDS2: LINCS L1000 characteristic direction signatures search engine. npj Syst Biol Appl 2:1–12. https://doi.org/10.1038/npjsba.2016.15
62. Rose PW, Prlić A, Altunkaya A et al (2017) The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw1000
63. Burley SK, Berman HM, Bhikadiya C et al (2019) RCSB Protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1004
64. Xu Z, Yang L, Zhang X et al (2020) Discovery of potential flavonoid inhibitors against COVID-19 3CL proteinase based
on virtual screening strategy. Front Mol Biosci 7:1–8. https://doi.org/10.3389/fmolb.2020.556481
65. Fan Y, Zhang Y, Hua Y et al (2019) Investigation of machine intelligence in compound cell activity classification. Mol Pharm. https://doi.org/10.1021/acs.molpharmaceut.9b00558
66. Chi CT, Lee MH, Weng CF, Leong MK (2019) In silico prediction of PAMPA effective permeability using a two-QSAR approach. Int J Mol Sci. https://doi.org/10.3390/ijms20133170
67. He S, Zhang X, Lu S et al (2019) A computational toxicology approach to screen the hepatotoxic ingredients in traditional chinese medicines: polygonum multiflorum thunb as a case study. Biomolecules. https://doi.org/10.3390/biom9100577
68. He S, Zhang C, Zhou P et al (2019) Herb-induced liver injury: Phylogenetic relationship, structure-toxicity relationship, and herb-ingredient network analysis. Int J Mol Sci 20(15):3633. https://doi.org/10.3390/ijms20153633
69. Zhang DH, Wu KL, Zhang X et al (2020) In silico screening of Chinese herbal medicines with the potential to directly inhibit 2019 novel coronavirus. J Integr Med. https://doi.org/10.1016/j.joim.2020.02.005
70. Baldi A (2010) Computational approaches for drug design and discovery: an overview. Syst Rev Pharm 1(1):99. https://doi.org/10.4103/0975-8453.59519
71. Lavecchia A, Cerchia C (2016) In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov Today 21(2):288–298. https://doi.org/10.1016/j.drudis.2015.12.007
72. Smith JS, Roitberg AE, Isayev O (2018) Transforming computational drug discovery with machine learning and AI. ACS Med Chem Lett 9(11):1065–1069. https://doi.org/10.1021/acsmedchemlett.8b00437
73. Jing Y, Bian Y, Hu Z et al (2018) Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J 20(3):58. https://doi.org/10.1208/s12248-018-0210-0
74. Powles J, Hodson H (2017) Google deepmind and healthcare in an age of algorithms. Health Technol (Berl). https://doi.org/10.1007/s12553-017-0179-1
75. Senior AW, Evans R, Jumper J et al (2020) Improved protein structure prediction using potentials from deep learning. Nature 577:706–710. https://doi.org/10.1038/s41586-019-1923-7
76. AlQuraishi M (2019) End-to-End differentiable learning of protein structure. Cell Syst 8:292-301.e3. https://doi.org/10.1016/j.cels.2019.03.006
77. Kalaiarasi C, Manjula S, Kumaradhas P (2019) Combined quantum mechanics/molecular mechanics (QM/MM) methods to understand the charge density distribution of estrogens in the active site of estrogen receptors. RSC Adv. https://doi.org/10.1039/c9ra08607b
78. Schütt KT, Gastegger M, Tkatchenko A et al (2019) Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat Commun. https://doi.org/10.1038/s41467-019-12875-2
79. Gastegger M, McSloy A, Luya M et al (2020) A deep neural network for molecular wave functions in quasi-atomic minimal basis representation. J Chem Phys. https://doi.org/10.1063/5.0012911
80. De Vivo M, Masetti M, Bottegoni G, Cavalli A (2016) Role of molecular dynamics and related methods in drug discovery. J Med Chem 59(9):4035–4061. https://doi.org/10.1021/acs.jmedchem.5b01684
81. Bennett WFD, He S, Bilodeau CL et al (2020) Predicting small molecule transfer free energies by combining molecular dynamics simulations and deep learning. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.0c00318
82. Bai Q, Tan S, Xu T et al (2020) MolAICal: a soft tool for 3D drug design of protein targets by artificial intelligence and classical algorithm. Brief Bioinform 00:1–12. https://doi.org/10.1093/bib/bbaa161
83. Sterling T, Irwin JJ (2015) ZINC 15-ligand discovery for everyone. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.5b00559
84. Popova M, Isayev O, Tropsha A (2018) Deep reinforcement learning for de novo drug design. Sci Adv 4:1–15. https://doi.org/10.1126/sciadv.aap7885
85. Grzybowski BA, Szymkuć S, Gajewska EP et al (2018) Chematica: a story of computer code that started to think like a chemist. Chem 4:390–398. https://doi.org/10.1016/j.chempr.2018.02.024
86. Genheden S, Thakkar A, Chadimová V et al (2020) AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J Cheminform 12:1–9. https://doi.org/10.1186/s13321-020-00472-1
87. Segler MHS, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555:604–610. https://doi.org/10.1038/nature25978
88. Bøgevig A, Federsel HJ, Huerta F et al (2015) Route design in the 21st century: the IC SYNTH software tool as an idea generator for synthesis prediction. Org Process Res Dev 19:357–368. https://doi.org/10.1021/op500373e
89. Jang G, Lee T, Hwang S et al (2018) PISTON: predicting drug indications and side effects using topic modeling and natural language processing. J Biomed Inform 87:96–107. https://doi.org/10.1016/j.jbi.2018.09.015
90. Piñero J, Bravo Á, Queralt-Rosinach N et al (2017) DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw943
91. Szklarczyk D, Gable AL, Lyon D et al (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1131
92. Szklarczyk D, Santos A, Von Mering C et al (2016) STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res 44:D380–D384. https://doi.org/10.1093/nar/gkv1277
93. Davenport TH, Ronanki R (2018) Artificial intelligence for the real world. Harv Bus Rev
94. Zhavoronkov A, Vanhaelen Q, Oprea TI (2020) Will Artificial Intelligence for Drug Discovery Impact Clinical Pharmacology? Clin Pharmacol Ther. https://doi.org/10.1002/cpt.1795
95. Watson O, Cortes-Ciriano I, Taylor A, Watson JA (2018) A decision theoretic approach to model evaluation in computational drug discovery. arXiv. https://arxiv.org/abs/1807.08926
96. Tripathy RK, Mahanta S, Paul S (2014) Artificial intelligence-based classification of breast cancer using cellular images. RSC Adv 4:9349–9355. https://doi.org/10.1039/c3ra47489e
97. Samui P, Kothari DP (2011) Utilization of a least square support vector machine (LSSVM) for slope stability analysis. Sci Iran 18:53–58. https://doi.org/10.1016/j.scient.2011.03.007
98. Chan HCS, Shan H, Dahoun T et al (2019) Advancing Drug Discovery via Artificial Intelligence. Trends Pharmacol Sci 40:592–604. https://doi.org/10.1016/j.tips.2019.06.004
99. Ho CWL, Soon D, Caals K, Kapur J (2019) Governance of automated image analysis and artificial intelligence analytics in healthcare. Clin Radiol 74:329–337. https://doi.org/10.1016/j.crad.2019.02.005
100. Andrysek T (2003) Impact of physical properties of formulations on bioavailability of active substance: Current and novel drugs with cyclosporine. In: Molecular Immunology 39(17–18):1061–5. https://doi.org/10.1016/s0161-5890(03)00077-4
101. Elton DC, Boukouvalas Z, Butrico MS et al (2018) Applying machine learning techniques to predict the properties of energetic materials. Sci Rep 8:9059. https://doi.org/10.1038/s41598-018-27344-x
102. Tyrchan C, Evertsson E (2017) Matched molecular pair analysis in short: algorithms, applications and limitations. Comput Struct Biotechnol J 15:86–90. https://doi.org/10.1016/j.csbj.2016.12.003
103. Turk S, Merget B, Rippmann F, Fulle S (2017) Coupling matched molecular pairs with machine learning for virtual compound optimization. J Chem Inf Model 57:3079–3085. https://doi.org/10.1021/acs.jcim.7b00298
104. Carpenter KA, Huang X (2018) Machine learning-based virtual screening and its applications to Alzheimer’s drug discovery: a review. Curr Pharm Des 24:3347–3358. https://doi.org/10.2174/1381612824666180607124038
105. Schyman P, Liu R, Desai V, Wallqvist A (2017) vNN web server for ADMET predictions. Front Pharmacol 8:889. https://doi.org/10.3389/fphar.2017.00889
106. Álvarez-Machancoses Ó, Fernández-Martínez JL (2019) Using artificial intelligence methods to speed up drug discovery. Expert Opin Drug Discov 14(8):769–777. https://doi.org/10.1080/17460441.2019.1621284
107. Fleming N (2018) How artificial intelligence is changing drug discovery. Nature. https://doi.org/10.1038/d41586-018-05267-x
108. Segler MHS, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci. https://doi.org/10.1021/acscentsci.7b00512
109. Bruno BJ, Miller GD, Lim CS (2013) Basics and recent advances in peptide and protein drug delivery. Ther Deliv 4(11):1443–1467. https://doi.org/10.4155/tde.13.104
110. Yan J, Bhadra P, Li A et al (2020) Deep-AmPEP30: improve short antimicrobial peptides prediction with deep learning. Mol Ther-Nucleic Acids 20:882–894. https://doi.org/10.1016/j.omtn.2020.05.006
111. Plisson F, Ramírez-Sánchez O, Martínez-Hernández C (2020) Machine learning-guided discovery and design of non-hemolytic peptides. Sci Rep 10:1–19. https://doi.org/10.1038/s41598-020-73644-6
112. Kavousi K, Bagheri M, Behrouzi S et al (2020) IAMPE: NMR-assisted computational prediction of antimicrobial peptides. J Chem Inf Model 60:4691–4701. https://doi.org/10.1021/acs.jcim.0c00841
113. Yi HC, You ZH, Zhou X et al (2019) ACP-DL: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation. Mol Ther-Nucleic Acids 17:1–9. https://doi.org/10.1016/j.omtn.2019.04.025
114. Yu L, Jing R, Liu F et al (2020) DeepACP: a novel computational approach for accurate identification of anticancer peptides by deep learning algorithm. Mol Ther-Nucleic Acids 22:862–870. https://doi.org/10.1016/j.omtn.2020.10.005
115. Tyagi A, Kapoor P, Kumar R et al (2013) In silico models for designing and discovering novel anticancer peptides. Sci Rep 3:1–8. https://doi.org/10.1038/srep02984
116. Rao B, Zhang L, Zhang G (2020) ACP-GCN: the identification of anticancer peptides based on graph convolution networks. IEEE Access 8:176005–176011. https://doi.org/10.1109/access.2020.3023800
117. Wu C, Gao R, Zhang Y, De Marinis Y (2019) PTPD: predicting therapeutic peptides by deep learning and word2vec. BMC Bioinformatics 20:1–8. https://doi.org/10.1186/s12859-019-3006-z
118. Zhavoronkov A, Ivanenkov YA, Aliper A et al (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 37:1038–1040. https://doi.org/10.1038/s41587-019-0224-x
119. McCloskey K, Sigel EA, Kearnes S et al (2020) Machine learning on DNA-encoded libraries: a new paradigm for hit finding. J Med Chem 63:8857–8866. https://doi.org/10.1021/acs.jmedchem.0c00452
120. Xing G, Liang L, Deng C et al (2020) Activity prediction of small molecule inhibitors for antirheumatoid arthritis targets based on artificial intelligence. ACS Comb Sci. https://doi.org/10.1021/acscombsci.0c00169
121. Dimmitt S, Stampfer H, Martin JH (2017) When less is more–efficacy with less toxicity at the ED50. Br J Clin Pharmacol 83(7):1365–1368. https://doi.org/10.1111/bcp.13281
122. Shen Y, Liu T, Chen J et al (2020) Harnessing artificial intelligence to optimize long-term maintenance dosing for antiretroviral-naive adults with HIV-1 Infection. Adv Ther 3:1900114. https://doi.org/10.1002/adtp.201900114
123. Pantuck AJ, Lee D-K, Kee T et al (2018) Modulating BET bromodomain inhibitor ZEN-3694 and Enzalutamide combination dosing in a metastatic prostate cancer patient using CURATE.AI, an artificial intelligence platform. Adv Ther. https://doi.org/10.1002/adtp.201800104
124. Julkunen H, Cichonska A, Gautam P et al (2020) Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun. https://doi.org/10.1038/s41467-020-19950-z
125. Sharabiani A, Bress A, Douzali E, Darabi H (2015) Revisiting warfarin dosing using machine learning techniques. Comput Math Methods Med. https://doi.org/10.1155/2015/560108
126. Nemati S, Ghassemi MM, Clifford GD (2016) Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS. https://doi.org/10.1109/EMBC.2016.7591355
127. Tang J, Liu R, Zhang YL et al (2017) Application of machine-learning models to predict tacrolimus stable dose in renal transplant recipients. Sci Rep. https://doi.org/10.1038/srep42192
128. Hu YH, Tai CT, Tsai CF, Huang MW (2018) Improvement of adequate digoxin dosage: an application of machine learning approach. J Healthc Eng. https://doi.org/10.1155/2018/3948245
129. Imai S, Takekuma Y, Miyai T, Sugawara M (2020) A new algorithm optimized for initial dose settings of vancomycin using machine learning. Biol Pharm Bull 43:188–193. https://doi.org/10.1248/bpb.b19-00729
130. Rollinger JM, Stuppner H, Langer T (2008) Virtual screening for the discovery of bioactive natural products. Prog Drug Res 65:212–249. https://doi.org/10.1007/978-3-7643-8117-2_6
131. Schuster D, Maurer EM, Laggner C et al (2006) The discovery of new 11β-hydroxysteroid dehydrogenase type 1 inhibitors by common feature pharmacophore modeling and virtual screening. J Med Chem 49:3454–3466. https://doi.org/10.1021/jm0600794
132. Wu J, Zhang Q, Wu W et al (2018) WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest. Bioinformatics 34:2271–2282. https://doi.org/10.1093/bioinformatics/bty070
133. Cichonska A, Pahikkala T, Szedmak S et al (2018) Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics 34:i509–i518. https://doi.org/10.1093/bioinformatics/bty277
134. Babajide Mustapha I, Saeed F (2016) Bioactive molecule prediction using extreme gradient boosting. Molecules 21:1–11. https://doi.org/10.3390/molecules21080983
135. Merget B, Turk S, Eid S et al (2017) Profiling prediction of kinase inhibitors: toward the virtual assay. J Med Chem 60:474–485. https://doi.org/10.1021/acs.jmedchem.6b01611
136. Arshadi AK, Salem M, Collins J et al (2020) Deepmalaria: artificial intelligence driven discovery of potent antiplasmodials. Front Pharmacol. https://doi.org/10.3389/fphar.2019.01526
137. Sugaya N (2014) Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. J Chem Inf Model 54:2751–2763. https://doi.org/10.1021/ci5003262
138. Afolabi LT, Saeed F, Hashim H, Petinrin OO (2018) Ensemble learning method for the prediction of new bioactive molecules. PLoS ONE 13:1–14. https://doi.org/10.1371/journal.pone.0189538
139. Petinrin OO, Saeed F (2018) Bioactive molecule prediction using majority voting-based ensemble method. J Intell Fuzzy Syst 35:383–392. https://doi.org/10.3233/JIFS-169596
140. Liu X, Gao Y, Peng J et al (2015) TarPred: a web application for predicting therapeutic and side effect targets of chemical compounds. Bioinformatics. https://doi.org/10.1093/bioinformatics/btv099
141. Liu M, Wu Y, Chen Y et al (2012) Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. J Am Med Informatics Assoc 19:28–35. https://doi.org/10.1136/amiajnl-2011-000699
142. Jamal S, Goyal S, Shanker A, Grover A (2017) Predicting neurological adverse drug reactions based on biological, chemical and phenotypic properties of drugs using machine learning models. Sci Rep 7:1–12. https://doi.org/10.1038/s41598-017-00908-z
143. Xue R, Liao J, Shao X et al (2020) Prediction of adverse drug reactions by combining biomedical tripartite network and graph representation model. Chem Res Toxicol 33:202–210. https://doi.org/10.1021/acs.chemrestox.9b00238
144. Raja K, Patrick M, Elder JT, Tsoi LC (2017) Machine learning workflow to enhance predictions of adverse drug reactions (ADRs) through drug-gene interactions: application to drugs for cutaneous diseases. Sci Rep 7:1–11. https://doi.org/10.1038/s41598-017-03914-3
145. Daina A, Michielin O, Zoete V (2017) SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep. https://doi.org/10.1038/srep42717
146. Rost B, Liu J, Nair R et al (2003) Automatic prediction of protein function. Cell Mol Life Sci 60:2637–2650. https://doi.org/10.1007/s00018-003-3114-8
147. Browne F, Zheng H, Wang H, Azuaje F (2010) From experimental approaches to computational techniques: a review on the prediction of protein-protein interactions. Adv Artif Intell. https://doi.org/10.1155/2010/924529
148. Hale WH (1913) American association for the advancement of science. Sci Am 75:34–34. https://doi.org/10.1038/scientificamerican01181913-34supp
149. Troyanskaya OG, Dolinski K, Owen AB et al (2003) A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci U S A 100:8348–8353. https://doi.org/10.1073/pnas.0832373100
150. You ZH, Lei YK, Zhu L et al (2013) Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics 14:1–11. https://doi.org/10.1186/1471-2105-14-S8-S10
151. Du X, Sun S, Hu C et al (2017) DeepPPI: boosting prediction of protein-protein interactions with deep neural networks. J Chem Inf Model 57:1499–1510. https://doi.org/10.1021/acs.jcim.7b00028
152. Cunningham JM, Koytiger G, Sorger PK, AlQuraishi M (2020) Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat Methods 17:175–183. https://doi.org/10.1038/s41592-019-0687-1
153. Chatterjee P, Basu S, Kundu M et al (2011) PPI_SVM: prediction of protein-protein interactions using machine learning, domain-domain affinities and frequency tables. Cell Mol Biol Lett 16:264–278. https://doi.org/10.2478/s11658-011-0008-x
154. Lu L, Lu H, Skolnick J (2002) Multiprospector: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins Struct Funct Genet 49:350–364. https://doi.org/10.1002/prot.10222
155. Singh R, Park D, Xu J et al (2010) Struct2Net: a web service to predict protein-protein interactions using a structure-based approach. Nucleic Acids Res 38:508–515. https://doi.org/10.1093/nar/gkq481
156. Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23:324–328. https://doi.org/10.1016/S0968-0004(98)01274-2
157. Keskin O, Tuncbag N, Gursoy A (2016) Predicting protein-protein interactions from the molecular to the proteome level. Chem Rev 116:4884–4909. https://doi.org/10.1021/acs.chemrev.5b00683
158. Lavecchia A, Giovanni C (2013) Virtual screening strategies in drug discovery: a critical review. Curr Med Chem. https://doi.org/10.2174/09298673113209990001
159. Gonczarek A, Tomczak JM, Zaręba S et al (2018) Interaction prediction in structure-based virtual screening using deep learning. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2017.09.007
160. Goh GB, Hodas NO, Vishnu A (2017) Deep learning for computational chemistry. J Comput Chem 38(16):1291–1307. https://doi.org/10.1002/jcc.24764
161. Yang X, Wang Y, Byrne R et al (2019) Concepts of artificial intelligence for computer-assisted drug discovery. Chem Rev 119(18):10520–10594. https://doi.org/10.1021/acs.chemrev.8b00728
162. Arciniega M, Lange OF (2014) Improvement of virtual screening results by docking data feature analysis. J Chem Inf Model. https://doi.org/10.1021/ci500028u
163. Feinstein WP, Brylinski M (2015) Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets. J Cheminform. https://doi.org/10.1186/s13321-015-0067-5
164. Gazgalis D, Zaka M, Zaka M et al (2020) Protein binding pocket optimization for virtual high-throughput screening (vHTS) drug discovery. ACS Omega. https://doi.org/10.1021/acsomega.0c00522
165. Carpenter KA, Huang X (2018) Machine learning-based virtual screening and its applications to Alzheimer’s drug discovery: a review. Curr Pharm Des. https://doi.org/10.2174/1381612824666180607124038
166. Serafim MSM, Kronenberger T, Oliveira PR et al (2020) The application of machine learning techniques to innovative antibacterial discovery and development. Expert Opin Drug Discov. https://doi.org/10.1080/17460441.2020.1776696
167. Melville J, Burke E, Hirst J (2009) Machine learning in virtual screening. Comb Chem High Throughput Screen. https://doi.org/10.2174/138620709788167980
168. Wójcikowski M, Ballester PJ, Siedlecki P (2017) Performance of machine-learning scoring functions in structure-based virtual screening. Sci Rep. https://doi.org/10.1038/srep46710
169. Carpenter KA, Cohen DS, Jarrell JT, Huang X (2018) Deep learning and virtual drug screening. Future Med Chem 10(21):2557–2567. https://doi.org/10.4155/fmc-2018-0314
170. Labbé CM, Rey J, Lagorce D et al (2015) MTiOpenScreen: a web server for structure-based virtual screening. Nucleic Acids Res. https://doi.org/10.1093/nar/gkv306
171. Schellhammer I, Rarey M (2004) FlexX-Scan: Fast, structure-based virtual screening. Proteins Struct Funct Bioinforma 57:504–517. https://doi.org/10.1002/prot.20217
172. Perez-Castillo Y, Sotomayor-Burneo S, Jimenes-Vargas K et al (2019) CompScore: boosting structure-based virtual screening performance by incorporating docking scoring function components into consensus scoring. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b00343
173. Skalic M, Martínez-Rosell G, Jiménez J, De Fabritiis G (2019) PlayMolecule bindscope: large scale CNN-based virtual screening on the web. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty758
174. Fang Y, Ding Y, Feinstein WP et al (2016) GeauxDock: accelerating structure-based virtual screening with heterogeneous computing. PLoS ONE. https://doi.org/10.1371/journal.pone.0158898
175. Pires DEV, Veloso WNP, Myung YC et al (2020) EasyVS: a user-friendly web-based tool for molecule library selection and structure-based virtual screening. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa480
176. Ibrahim TM, Bauer MR, Boeckler FM (2015) Applying DEKOIS 2.0 in structure-based virtual screening to probe the impact of preparation procedures and score normalization. J Cheminform. https://doi.org/10.1186/s13321-015-0074-6
177. Shin WH, Christoffer CW, Wang J, Kihara D (2016) PL-PatchSurfer2: improved local surface matching-based virtual screening method that is tolerant to target and ligand structure variation. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.6b00163
178. Litfin T, Zhou Y, Yang Y (2017) SPOT-ligand 2: improving structure-based virtual screening by binding-homology search on an expanded structural template library. Bioinformatics. https://doi.org/10.1093/bioinformatics/btw829
179. Ropp PJ, Spiegel JO, Walker JL et al (2019) Gypsum-DL: An open-source program for preparing small-molecule libraries for structure-based virtual screening. J Cheminform. https://doi.org/10.1186/s13321-019-0358-3
180. Akbar R, Jusoh SA, Amaro RE, Helms V (2017) ENRI: a tool for selecting structure-based virtual screening target conformations. Chem Biol Drug Des. https://doi.org/10.1111/cbdd.12900
181. Kellenberger E, Springael JY, Parmentier M et al (2007) Identification of nonpeptide CCR5 receptor agonists by structure-based virtual screening. J Med Chem. https://doi.org/10.1021/jm061389p
182. De Graaf C, Rognan D (2008) Selective structure-based virtual screening for full and partial agonists of the β2 adrenergic receptor. J Med Chem. https://doi.org/10.1021/jm800710x
183. Vidler LR, Filippakopoulos P, Fedorov O et al (2013) Discovery of novel small-molecule inhibitors of BRD4 using structure-based virtual screening. J Med Chem. https://doi.org/10.1021/jm4011302
184. Liu LJ, Leung KH, Chan DSH et al (2014) Identification of a natural product-like STAT3 dimerization inhibitor by structure-based virtual screening. Cell Death Dis. https://doi.org/10.1038/cddis.2014.250
185. Yang C, Wang W, Chen L et al (2016) Discovery of a VHL and HIF1α interaction inhibitor with in vivo angiogenic activity via structure-based virtual screening. Chem Commun. https://doi.org/10.1039/c6cc04938a
186. Zhuang C, Narayanapillai S, Zhang W et al (2014) Rapid identification of Keap1-Nrf2 small-molecule inhibitors through structure-based virtual screening and hit-based substructure search. J Med Chem. https://doi.org/10.1021/jm4017174
187. Dou X, Jiang L, Wang Y et al (2018) Discovery of new GSK-3β inhibitors through structure-based virtual screening. Bioorganic Med Chem Lett. https://doi.org/10.1016/j.bmcl.2017.11.036
188. Liu Y, Ren Y, Cao Y et al (2017) Discovery of a low toxicity O-GlcNAc Transferase (OGT) inhibitor by structure-based virtual screening of natural products. Sci Rep. https://doi.org/10.1038/s41598-017-12522-0
189. Wang Y, Dou X, Jiang L et al (2019) Discovery of novel glycogen synthase kinase-3α inhibitors: Structure-based virtual screening, preliminary SAR and biological evaluation for treatment of acute myeloid leukemia. Eur J Med Chem. https://doi.org/10.1016/j.ejmech.2019.03.039
190. Wang Q, Xu J, Li Y et al (2018) Identification of a novel protein arginine methyltransferase 5 inhibitor in non-small cell lung cancer by structure-based virtual screening. Front Pharmacol. https://doi.org/10.3389/fphar.2018.00173
191. Sharma K, Patidar K, Ali MA et al (2018) Structure-based virtual screening for the identification of high affinity compounds as potent VEGFR2 inhibitors for the treatment of renal cell carcinoma. Curr Top Med Chem. https://doi.org/10.2174/1568026619666181130142237
192. Yousuf Z, Iman K, Iftikhar N, Mirza MU (2017) Structure-based virtual screening and molecular docking for the identification of potential multi-targeted inhibitors against breast cancer. Breast Cancer Targets Ther. https://doi.org/10.2147/BCTT.S132074
193. Leão M, Pereira C, Bisio A et al (2013) Discovery of a new small-molecule inhibitor of p53-MDM2 interaction using a yeast-based approach. Biochem Pharmacol. https://doi.org/10.1016/j.bcp.2013.01.032
194. Gahlawat A, Kumar N, Kumar R et al (2020) Structure-based virtual screening to discover potential lead molecules for the SARS-CoV-2 main protease. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.0c00546
195. Selvaraj C, Dinesh DC, Panwar U et al (2020) Structure-based virtual screening and molecular dynamics simulation of SARS-CoV-2 guanine-N7 methyltransferase (nsp14) for identifying antiviral inhibitors against COVID-19. J Biomol Struct Dyn. https://doi.org/10.1080/07391102.2020.1778535
196. Cruz JV, Neto MFA, Silva LB et al (2018) Identification of novel protein kinase receptor type 2 inhibitors using pharmacophore and structure-based virtual screening. Molecules. https://doi.org/10.3390/molecules23020453
197. Kannan S, Melesina J, Hauser AT et al (2014) Discovery of inhibitors of Schistosoma mansoni HDAC8 by combining homology modeling, virtual screening, and in vitro validation. J Chem Inf Model. https://doi.org/10.1021/ci5004653
198. Zoete V, Daina A, Bovigny C, Michielin O (2016) SwissSimilarity: a web tool for low to ultra high throughput ligand-based virtual screening. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.6b00174
199. Imbernón B, Cecilia JM, Pérez-Sánchez H, Giménez D (2018) METADOCK: a parallel metaheuristic schema for virtual screening methods. Int J High Perform Comput Appl. https://doi.org/10.1177/1094342017697471
200. Riniker S, Landrum GA (2013) Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminform. https://doi.org/10.1186/1758-2946-5-26
201. Li H, Leung KS, Wong MH, Ballester PJ (2016) USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw320
202. Suzuki SD, Ohue M, Akiyama Y (2018) PKRank: a novel learning-to-rank method for ligand-based virtual screening using pairwise kernel and RankSVM. Artif Life Robot. https://doi.org/10.1007/s10015-017-0416-8
203. Patel H, Brinkjost T, Koch O (2017) PyGOLD: a python based API for docking based virtual screening workflow generation. Bioinformatics. https://doi.org/10.1093/bioinformatics/btx197
204. Banegas-Luna AJ, Cerón-Carrasco JP, Puertas-Martín S, Pérez-Sánchez H (2019) BRUSELAS: HPC generic and customizable software architecture for 3D ligand-based virtual screening of large molecular databases. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b00279
205. Wang L, Pang X, Li Y et al (2017) RADER: a rapid decoy retriever to facilitate decoy based assessment of virtual screening. Bioinformatics. https://doi.org/10.1093/bioinformatics/btw783
206. Mochizuki M, Suzuki SD, Yanagisawa K et al (2019) QEX: target-specific druglikeness filter enhances ligand-based virtual screening. Mol Divers. https://doi.org/10.1007/s11030-018-9842-3
207. Zhang H, Liao L, Cai Y et al (2019) IVS2vec: a tool of inverse virtual screening based on word2vec and deep learning techniques. Methods. https://doi.org/10.1016/j.ymeth.2019.03.012
208. Arcon JP, Modenutti CP, Avendaño D et al (2019) AutoDock Bias: improving binding mode prediction and virtual screening using known protein-ligand interactions. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz152
209. Ebejer JP, Finn PW, Wong WK et al (2019) Ligity: a non-superpositional, knowledge-based approach to virtual screening. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.8b00779
210. Zhu Z, Wang X, Yang Y et al (2020) D3Similarity: a ligand-based approach for predicting drug targets and for virtual screening of active compounds against COVID-19. ChemRxiv. https://doi.org/10.26434/chemrxiv.11959323.v1
211. Bharti DR, Hemrom AJ, Lynn AM (2019) GCAC: Galaxy workflow system for predictive model building for virtual screening. BMC Bioinformatics. https://doi.org/10.1186/s12859-018-2492-8
212. Kong Y, Bender A, Yan A (2018) Identification of Novel Aurora Kinase A (AURKA) Inhibitors via Hierarchical Ligand-Based Virtual Screening. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.7b00300
213. Musumeci D, Amato J, Zizza P et al (2017) Tandem application of ligand-based virtual screening and G4-OAS assay to identify novel G-quadruplex-targeting chemotypes. Biochim Biophys Acta - Gen Subj. https://doi.org/10.1016/j.bbagen.2017.01.024
214. Yu M, Gu Q, Xu J (2018) Discovering new PI3Kα inhibitors with a strategy of combining ligand-based and structure-based virtual screening. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-017-0092-8
215. Halim SA, Khan S, Khan A et al (2017) Targeting dengue virus NS-3 Helicase by Ligand based Pharmacophore Modeling and structure based virtual screening. Front Chem. https://doi.org/10.3389/fchem.2017.00088
216. Debnath S, Debnath T, Bhaumik S et al (2019) Discovery of novel potential selective HDAC8 inhibitors by combine ligand-based, structure-based virtual screening and in-vitro biological evaluation. Sci Rep. https://doi.org/10.1038/s41598-019-53376-y
217. Fu Y, Sun YN, Yi KH et al (2017) 3D pharmacophore-based virtual screening and docking approaches toward the discovery of novel HPPD inhibitors. Molecules. https://doi.org/10.3390/molecules22060959
218. Krishna S, Shukla S, Lakra AD et al (2017) Identification of potent inhibitors of DNA methyltransferase 1 (DNMT1) through a pharmacophore-based virtual screening approach. J Mol Graph Model. https://doi.org/10.1016/j.jmgm.2017.05.014
219. Pérez-Nueno VI, Pettersson S, Ritchie DW et al (2009) Discovery of novel HIV entry inhibitors for the CXCR4 receptor by prospective virtual screening. J Chem Inf Model. https://doi.org/10.1021/ci800468q
220. Hofmarcher M, Mayr A, Rumetshofer E et al (2020) Large-scale ligand-based virtual screening for SARS-CoV-2 inhibitors using deep neural networks. SSRN Electron J. https://doi.org/10.2139/ssrn.3561442
221. Amin SA, Ghosh K, Gayen S, Jha T (2020) Chemical-informatics approach to COVID-19 drug discovery: Monte Carlo based QSAR, virtual screening and molecular docking study of some in-house molecules as papain-like protease (PLpro) inhibitors. J Biomol Struct Dyn. https://doi.org/10.1080/07391102.2020.1780946
222. Ferraz WR, Gomes RA, Novaes ALS, Goulart Trossini GH (2020) Ligand and structure-based virtual screening applied to the SARS-CoV-2 main protease: an in silico repurposing study. Future Med Chem. https://doi.org/10.4155/fmc-2020-0165
223. Choudhary S, Malik YS, Tomar S (2020) Identification of SARS-CoV-2 Cell entry inhibitors by drug repurposing using in silico structure-based virtual screening approach. Front Immunol. https://doi.org/10.3389/fimmu.2020.01664
224. Xiao T, Qi X, Chen Y, Jiang Y (2018) Development of Ligand-based big data deep neural network models for virtual screening of large compound libraries. Mol Inform. https://doi.org/10.1002/minf.201800031
225. Hu J, Liu Z, Yu DJ, Zhang Y (2018) LS-align: An atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. Bioinformatics 34(13):2209–2218. https://doi.org/10.1093/bioinformatics/bty081
226. Ha EJ, Lwin CT, Durrant JD (2020) LigGrep: a tool for filtering docked poses to improve virtual-screening hit rates. J Cheminform. https://doi.org/10.1186/s13321-020-00471-2
227. Spiegel JO, Durrant JD (2020) AutoGrow4: an open-source genetic algorithm for de novo drug design and lead optimization. J Cheminform. https://doi.org/10.1186/s13321-020-00429-4
228. Chen P, Ke Y, Lu Y et al (2019) Dligand2: an improved knowledge-based energy function for protein–ligand interactions using the distance-scaled, finite, ideal-gas reference state. J Cheminform. https://doi.org/10.1186/s13321-019-0373-4
229. Gattani S, Mishra A, Hoque MT (2019) StackCBPred: a stacking based prediction of protein-carbohydrate binding sites from sequence. Carbohydr Res. https://doi.org/10.1016/j.carres.2019.107857
230. Li X, Yan X, Yang Y et al (2019) LSA: a local-weighted structural alignment tool for pharmaceutical virtual screening. RSC Adv. https://doi.org/10.1039/c8ra08915a
231. Seifert MHJ (2005) ProPose: steered virtual screening by simultaneous protein-ligand docking and ligand-ligand alignment. J Chem Inf Model. https://doi.org/10.1021/ci0496393
232. Schellhammer I, Rarey M (2007) TrixX: Structure-based molecule indexing for large-scale virtual screening in sublinear time. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-007-9103-5
233. Lagarde N, Goldwaser E, Pencheva T et al (2019) A free web-based protocol to assist structure-based virtual screening experiments. Int J Mol Sci. https://doi.org/10.3390/ijms20184648
234. Rifaioglu AS, Nalbat E, Atalay V et al (2020) DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem Sci. https://doi.org/10.1039/c9sc03414e
235. Obrezanova O, Segall MD (2010) Gaussian processes for classification: QSAR modeling of ADMET and target activity. J Chem Inf Model. https://doi.org/10.1021/ci900406x
236. Wu Z, Zhu M, Kang Y et al (2020) Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief Bioinform. https://doi.org/10.1093/bib/bbaa321
237. Obrezanova O, Csányi G, Gola JMR, Segall MD (2007) Gaussian processes: a method for automatic QSAR modeling of ADME properties. J Chem Inf Model. https://doi.org/10.1021/ci7000633
238. Benfenati E, Manganaro A, Gini G (2013) VEGA-QSAR: AI inside a platform for predictive toxicology. In: CEUR Workshop Proceedings
239. Ambure P, Halder AK, González Díaz H, Cordeiro MNDS (2019) QSAR-Co: an open source software for developing robust multitasking or multitarget classification-based QSAR models. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b00295
240. Chen S, Xue D, Chuai G et al (2020) FL-QSAR: a federated learning based QSAR prototype for collaborative drug discovery. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa1006
241. Olier I, Sadawi N, Bickerton GR et al (2018) Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. Mach Learn. https://doi.org/10.1007/s10994-017-5685-x
242. Soufan O, Ba-Alawi W, Magana-Mora A et al (2018) DPubChem: a web tool for QSAR modeling and high-throughput virtual screening. Sci Rep. https://doi.org/10.1038/s41598-018-27495-x
243. Karpov P, Godin G, Tetko IV (2020) Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform. https://doi.org/10.1186/s13321-020-00423-w
244. Wang Y-L, Wang F, Shi X-X et al (2020) Cloud 3D-QSAR: a web tool for the development of quantitative structure–activity relationship models in drug discovery. Brief Bioinform. https://doi.org/10.1093/bib/bbaa276
245. Goh GB, Siegel C, Vishnu A et al (2017) Chemception: A deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models. arXiv
246. Reis J, Cagide F, Chavarria D et al (2016) Discovery of new chemical entities for old targets: insights on the lead optimization of chromone-based monoamine oxidase B (MAO-B) inhibitors. J Med Chem. https://doi.org/10.1021/acs.jmedchem.6b00527
247. Hoelz L, Horta B, Araújo J et al (2010) Quantitative structure-activity relationships of antioxidant phenolic compounds. J Chem Pharm Res 2(5):291–306
248. Zhang Y, Han Z, Gao Q et al (2019) Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches. Curr Pharm Des. https://doi.org/10.2174/1381612825666191107092214
249. Halder AK, Giri AK, Dias Soeiro Cordeiro MN (2019) Multi-target chemometric modelling, fragment analysis and virtual screening with ERK inhibitors as potential anticancer agents. Molecules. https://doi.org/10.3390/molecules24213909
250. Halder AK, Cordeiro MNDS (2019) Development of multi-target chemometric models for the inhibition of class I PI3K enzyme isoforms: a case study using QSAR-Co tool. Int J Mol Sci. https://doi.org/10.3390/ijms20174191
251. Kim S, Cho KH (2019) PyQSAR: a fast QSAR modeling platform using machine learning and Jupyter notebook. Bull Korean Chem Soc. https://doi.org/10.1002/bkcs.11638
252. Ben Geoffrey AS, Christian Prasana J, Muthu S (2020) Structure-activity relationship of Quercetin and its tumor necrosis factor alpha inhibition activity by computational and machine learning methods. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.07.464
253. Ben Geoffrey AS, Rafal Madaj, Akhil Sanker, Mario Sergio Valdés Tresanco, Host Antony Davidd, Gitanjali Roy, Rinnu Sarah Saji, Abdulbasit Haliru Yakubu BM. Automated In Silico Identification of Drug Candidates for Coronavirus Through a Novel Programmatic Tool and Extensive Computational (MD, DFT) Studies of Select Drug Candidates. https://doi.org/10.26434/chemrxiv.12423638.v3
and Molecular Docking of Hit Compounds to DPP-8 and DPP-9 Enzymes. https://doi.org/10.21203/rs.2.22282/v1
258. Tian Y, Zhang S, Yin H, Yan A (2020) Quantitative structure-activity relationship (QSAR) models and their applicability domain analysis on HIV-1 protease inhibitors by machine learning methods. Chemom Intell Lab Syst. https://doi.org/10.1016/j.chemolab.2019.103888
259. Wei Y, Li W, Du T et al (2019) Targeting HIV/HCV coinfection using a machine learning-based multiple quantitative structure-activity relationships (multiple QSAR) method. Int J Mol Sci. https://doi.org/10.3390/ijms20143572
260. Kana M (2020) Handling Missing Data For Advanced Machine Learning
261. Kumar S (2020) 7 Ways to Handle Missing Values in Machine Learning. Towards Data Science
262. Gad SC (2014) QSAR. In: Wexler P (ed) Encyclopedia of Toxicology, 3rd edn. Academic Press, Oxford, pp 1–9
263. Neves BJ, Braga RC, Melo-Filho CC et al (2018) QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front Pharmacol 9:1275. https://doi.org/10.3389/fphar.2018.01275
264. Roy K, Kar S, Das RN (2015) Chapter 9 - Newer QSAR Techniques. In: Roy K, Kar S, Das RN (eds) Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, Boston
265. Kwon S, Bae H, Jo J, Yoon S (2019) Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinformatics 20:521. https://doi.org/10.1186/s12859-019-3135-4
266. Roy K, Kar S, Das RN (2015) Chapter 12 - Future Avenues. In: Roy K, Kar S, Das RN (eds) Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, Boston, pp 455–462. https://doi.org/10.1016/B978-0-12-801505-6.00012-0
267. Paolini GV, Shapland RHB, Van Hoorn WP et al (2006) Global mapping of pharmacological space. Nat Biotechnol. https://doi.org/10.1038/nbt1228
268. Koch U, Hamacher M, Nussbaumer P (2014) Cheminformatics at the interface of medicinal chemistry and proteomics. Biochim Biophys Acta-Proteins Proteomics 1844(1):156–161. https://doi.org/10.1016/j.bbapap.2013.05.010
269. Makhouri FR, Ghasemi JB (2019) Combating diseases with computational strategies used for drug design and discovery. Curr Top Med Chem. https://doi.org/10.2174/1568026619666190121125106
270. Würth R, Thellung S, Bajetto A et al (2016) Drug-repositioning opportunities for cancer therapy: novel molecular targets for known compounds. Drug Discov Today 21(1):190–199. https://doi.org/10.1016/j.drudis.2015.09.017
271. Joachim Haupt V, Schroeder M (2011) Old friends in new guise: repositioning of known drugs with structural bioinformatics. Brief Bioinform. https://doi.org/10.1093/bib/bbr011
272. Butcher EC (2005) Can cell systems biology rescue drug discovery? Nat Rev Drug Discov. https://doi.org/10.1038/nrd1754
254. Žuvela P, David J, Wong MW (2018) Interpretation of ANN- 273. Iyengar R, Zhao S, Chung SW et al (2012) Merging systems
based QSAR models for prediction of antioxidant activity of biology with pharmacodynamics. Sci Transl Med 4(126):126ps7.
flavonoids. J Comput Chem. https://doi.org/10.1002/jcc.25168 https://doi.org/10.1126/scitranslmed.3003563
255. Ding Q, Hou S, Zu S et al (2020) VISAR: an interactive tool 274. Martínez V, Navarro C, Cano C et al (2015) DrugNet: network-
for dissecting chemical features learned by deep neural network based drug-disease prioritization by integrating heterogeneous
QSAR models. Bioinformatics. https://doi.org/10.1093/bioin data. Artif Intell Med. https://doi.org/10.1016/j.artmed.2014.11.
formatics/btaa187 003
256. Gadaleta D, Manganelli S, Roncaglioni A et al (2018) QSAR 275. Zhang W, Xu H, Li X et al (2020) DRIMC: an improved drug
modeling of ToxCast assays relevant to the molecular initiating repositioning approach using Bayesian inductive matrix com-
events of AOPs leading to hepatic steatosis. J Chem Inf Model. pletion. Bioinformatics. https://doi.org/10.1093/bioinformatics/
https://doi.org/10.1021/acs.jcim.8b00297 btaa062
257. Hermansyah O, Bustamam A, Yanuar A (2020) Virtual Screening 276. Luo H, Zhang P, Cao XH et al (2016) DPDR-CPI, a server
of DPP-4 Inhibitors Using QSAR-Based Artificial Intelligence that predicts drug positioning and drug repositioning via
chemical-protein interactome. Sci Rep. https://doi.org/10.1038/srep35996
277. Zhu Q, Tao C, Shen F, Chute CG (2014) Exploring the pharmacogenomics knowledge base (pharmgkb) for repositioning breast cancer drugs by leveraging Web ontology language (owl) and cheminformatics approaches. In: Pacific Symposium on Biocomputing
278. Gallo K, Goede A, Eckert A et al (2020) PROMISCUOUS 2.0: a resource for drug-repositioning. Nucleic Acids Res. https://doi.org/10.1093/nar/gkaa1061
279. Luo H, Li M, Wang S et al (2018) Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty013
280. Yella JK, Jegga AG (2020) MGATRx: discovering drug repositioning candidates using multi-view graph attention. biorxiv. https://doi.org/10.1101/2020.06.29.171876
281. Yan CK, Wang WX, Zhang G et al (2019) BiRWDDA: a novel drug repositioning method based on multisimilarity fusion. J Comput Biol. https://doi.org/10.1089/cmb.2019.0063
282. Fahimian G, Zahiri J, Arab SS, Sajedi RH (2019) RepCOOL: computational drug repositioning via integrating heterogeneous biological networks. biorxiv. https://doi.org/10.1101/817882
283. Li Z, Yao Y, Cheng X et al (2020) A Computational Framework of Host-Based Drug Repositioning for Broad-Spectrum Antivirals against RNA Viruses. https://doi.org/10.26434/chemrxiv.12927260.v1
284. Wu D, Gao W, Li X et al (2020) Dr AFC: drug repositioning through anti-fibrosis characteristic. Brief Bioinform. https://doi.org/10.1093/bib/bbaa115
285. Hooshmand SA, Zarei Ghobadi M, Hooshmand SE et al (2020) A multimodal deep learning-based drug repurposing approach for treatment of COVID-19. Mol Divers. https://doi.org/10.1007/s11030-020-10144-9
286. Zhou Y, Hou Y, Shen J et al (2020) Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov. https://doi.org/10.1038/s41421-020-0153-3
287. Zheng X, He S, Song X et al (2018) DTI-RCNN: New efficient hybrid neural network model to predict drug–target interactions. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
288. Jarada TN, Rokne JG, Alhajj R (2020) SNF–CVAE: computational method to predict drug–disease interactions using similarity network fusion and collective variational autoencoder. Knowledge-Based Syst. https://doi.org/10.1016/j.knosys.2020.106585
289. Xu R, Wang QQ (2015) PhenoPredict: a disease phenome-wide drug repositioning approach towards schizophrenia drug discovery. J Biomed Inform. https://doi.org/10.1016/j.jbi.2015.06.027
290. Wu Z, Cheng F, Li J et al (2017) SDTNBI: an integrated network and chemoinformatics tool for systematic prediction of drug-target interactions and drug repositioning. Brief Bioinform. https://doi.org/10.1093/bib/bbw012
291. Zeng X, Zhu S, Liu X et al (2019) DeepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz418
292. Chen H, Cheng F, Li J (2020) IDrug: Integration of drug repositioning and drug-target prediction via cross-network embedding. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.1008040
293. Li B, Dai C, Wang L et al (2020) A novel drug repurposing approach for non-small cell lung cancer using deep learning. PLoS ONE. https://doi.org/10.1371/journal.pone.0233112
294. Kuenzi BM, Park J, Fong SH et al (2020) Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell. https://doi.org/10.1016/j.ccell.2020.09.014
295. Wang Z, Zhou M, Arnold C (2020) Toward heterogeneous information fusion: bipartite graph convolutional networks for in silico drug repurposing. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa437
296. Pinzi L, Rastelli G (2019) Molecular docking: Shifting paradigms in drug discovery. Int J Mol Sci. https://doi.org/10.3390/ijms20184331
297. Muhammed MT, Aki-Yalcin E (2019) Homology modeling in drug discovery: overview, current applications, and future perspectives. Chem Biol Drug Des 93:12–20. https://doi.org/10.1111/cbdd.13388
298. Lynch SR, Bothwell T, Campbell L et al (2007) A comparison of physical properties, screening procedures and a human efficacy trial for predicting the bioavailability of commercial elemental iron powders used for food fortification. Int J Vitam Nutr Res. https://doi.org/10.1024/0300-9831.77.2.107
299. Schneider P, Walters WP, Plowright AT et al (2020) Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 19:353–364. https://doi.org/10.1038/s41573-019-0050-3
300. Chen H, Engkvist O, Wang Y et al (2018) The rise of deep learning in drug discovery. Drug Discov Today 23(6):1241–1250. https://doi.org/10.1016/j.drudis.2018.01.039
301. Lusci A, Pollastri G, Baldi P (2013) Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model. https://doi.org/10.1021/ci400187y
302. Kumar R, Sharma A, Siddiqui MH, Tiwari RK (2017) Prediction of human intestinal absorption of compounds using artificial intelligence techniques. Curr Drug Discov Technol. https://doi.org/10.2174/1570163814666170404160911
303. Zang Q, Mansouri K, Williams AJ et al (2017) In silico prediction of physicochemical properties of environmental chemicals using molecular fingerprints and machine learning. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.6b00625
304. Tetko IV, Gasteiger J, Todeschini R et al (2005) Virtual computational chemistry laboratory - design and description. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-005-8694-y
305. Radchenko E V, Palyulin VA, Zefirov NS (2002) Virtual computational chemistry laboratory. System
306. Royal Society of Chemistry (2015) ChemSpider. Search and Share Chemistry. R. Soc, Chem
307. Kucukdereli H, Allen NJ, Lee AT et al (2011) Control of excitatory CNS synaptogenesis by astrocyte-secreted proteins hevin and SPARC. Proc Natl Acad Sci U S A. https://doi.org/10.1073/pnas.1104977108
308. Ayati A, Falahati M, Irannejad H, Emami S (2012) Synthesis, in vitro antifungal evaluation and in silico study of 3-azolyl-4-chromanone phenylhydrazones. DARU, J Pharm Sci. https://doi.org/10.1186/2008-2231-20-46
309. Rashid M (2020) Design, synthesis and ADMET prediction of bis-benzimidazole as anticancer agent. Bioorg Chem. https://doi.org/10.1016/j.bioorg.2020.103576
310. Puratchikody A, Sriram D, Umamaheswari A, Irfan N (2016) 3-D structural interactions and quantitative structural toxicity studies of tyrosine derivatives intended for safe potent inflammation treatment. Chem Cent J. https://doi.org/10.1186/s13065-016-0169-9
311. Nascimento ACA, Prudêncio RBC, Costa IG (2016) A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. https://doi.org/10.1186/s12859-016-0890-3
312. Öztürk H, Özgür A, Ozkirimli E (2018) A chemical language based approach for protein-Ligand interaction prediction. arXiv https://doi.org/10.1002/minf.202000212
313. Nascimento ACA, Prudêncio RBC, Costa IG (2019) A drug-target network-based supervised machine learning repurposing method allowing the use of multiple heterogeneous information sources. Methods Mol Biol 1903:281–289. https://doi.org/10.1007/978-1-4939-8955-3_17
314. Öztürk H, Özgür A, Ozkirimli E (2018) DeepDTA: Deep drug-target binding affinity prediction. Bioinformatics 34(17):i821–i829. https://doi.org/10.1093/bioinformatics/bty593
315. Feng Q, Dueva E, Cherkasov A, Ester M (2018) PADME: A deep learning-based framework for drug-target interaction prediction. arXiv https://arxiv.org/abs/1807.09741v4
316. Beck BR, Shin B, Choi Y et al (2020) Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model. Comput Struct Biotechnol J. https://doi.org/10.1016/j.csbj.2020.03.025
317. Lee H, Kim W (2019) Comparison of target features for predicting drug-target interactions by deep neural network based on large-scale drug-induced transcriptome data. Pharmaceutics. https://doi.org/10.3390/pharmaceutics11080377
318. Karimi M, Wu D, Wang Z, Shen Y (2019) DeepAffinity: interpretable deep learning of compound-protein affinity through unified recurrent and convolutional neural networks. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz111
319. Born J, Manica M, Cadow J et al (2020) PaccMannRL on SARS-CoV-2: Designing antiviral candidates with conditional generative models. arXiv https://arxiv.org/abs/2005.13285v3
320. Jiang M, Li Z, Bian Y, Wei Z (2019) A novel protein descriptor for the prediction of drug binding sites. BMC Bioinformatics. https://doi.org/10.1186/s12859-019-3058-0
321. Cañada A, Capella-Gutierrez S, Rabal O et al (2017) LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes. Nucleic Acids Res. https://doi.org/10.1093/nar/gkx462
322. Pires DEV, Blundell TL, Ascher DB (2015) pkCSM: Predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J Med Chem. https://doi.org/10.1021/acs.jmedchem.5b00104
323. Cheng F, Li W, Zhou Y et al (2012) AdmetSAR: A comprehensive source and free tool for assessment of chemical ADMET properties. J Chem Inf Model. https://doi.org/10.1021/ci300367a
324. Patlewicz G, Jeliazkova N, Safford RJ et al (2008) An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR QSAR Environ Res. https://doi.org/10.1080/10629360802083871
325. Uygun MT, Amudi K, Turaçlı İD, Menges N (2021) A new synthetic approach for pyrazolo[1,5-a]pyrazine-4(5H)-one derivatives and their antiproliferative effects on lung adenocarcinoma cell line. Mol Divers. https://doi.org/10.1007/s11030-020-10161-8
326. Srivastava A, Siddiqui S, Ahmad R et al (2020) Exploring nature’s bounty: identification of Withania somnifera as a promising source of therapeutic agents against COVID-19 by virtual screening and in silico evaluation. J Biomol Struct Dyn. https://doi.org/10.1080/07391102.2020.1835725
327. Attene-Ramos MS, Miller N, Huang R et al (2013) The Tox21 robotic platform for the assessment of environmental chemicals - From vision to reality. Drug Discov Today 18(15–16):716–23. https://doi.org/10.1016/j.drudis.2013.05.015
328. Wang Z, Liang L, Yin Z, Lin J (2016) Improving chemical similarity ensemble approach in target prediction. J Cheminform. https://doi.org/10.1186/s13321-016-0130-x
329. Pu L, Naderi M, Liu T et al (2019) eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates. BMC Pharmacol Toxicol. https://doi.org/10.1186/s40360-018-0282-6
330. Lysenko A, Sharma A, Boroevich KA, Tsunoda T (2018) An integrative machine learning approach for prediction of toxicity-related drug safety. Life Sci Alliance. https://doi.org/10.26508/lsa.201800098
331. Zhou B, Sun Q, Kong DX (2016) Predicting cancer-relevant proteins using an improved molecular similarity ensemble approach. Oncotarget. https://doi.org/10.18632/oncotarget.8716
332. Huang R, Xia M, Sakamuru S et al (2016) Modelling the Tox21 10K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nat Commun. https://doi.org/10.1038/ncomms10425
333. Gupta VK, Rana PS (2019) Toxicity prediction of small drug molecules of androgen receptor using multilevel ensemble model. J Bioinform Comput Biol. https://doi.org/10.1142/S0219720019500331
334. Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci. https://doi.org/10.3389/fenvs.2015.00080
335. Gayvert KM, Madhukar NS, Elemento O (2016) A data-driven approach to predicting successes and failures of clinical trials. Cell Chem Biol. https://doi.org/10.1016/j.chembiol.2016.07.023
336. Gilvary C, Elkhader J, Madhukar N et al (2020) A machine learning and network framework to discover new indications for small molecules. PLoS Comput Biol. https://doi.org/10.1371/JOURNAL.PCBI.1008098
337. Robledo-Cadena DX, Gallardo-Pérez JC, Dávila-Borja V et al (2020) Non-steroidal anti-inflammatory drugs increase cisplatin, paclitaxel, and doxorubicin efficacy against human cervix cancer cells. Pharmaceuticals (Basel). https://doi.org/10.3390/ph13120463
338. Simm J, Klambauer G, Arany A et al (2018) Repurposing high-throughput image assays enables biological activity prediction for drug discovery. Cell Chem Biol. https://doi.org/10.1016/j.chembiol.2018.01.015
339. Goh GB, Siegel C, Hodas N, Vishnu A (2017) SMILES2vec: An interpretable general-purpose deep neural network for predicting chemical properties. arXiv https://arxiv.org/abs/1712.02034v2
340. Preuer K, Lewis RPI, Hochreiter S et al (2018) Deepsynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics. https://doi.org/10.1093/bioinformatics/btx806
341. Xu Y, Pei J, Lai L (2017) Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.7b00244
342. Rodrigues T, Werner M, Roth J et al (2018) Machine intelligence decrypts β-lapachone as an allosteric 5-lipoxygenase inhibitor. Chem Sci. https://doi.org/10.1039/c8sc02634c
343. Luechtefeld T, Marsh D, Rowlands C, Hartung T (2018) Machine learning of toxicological big data enables read-across structure activity relationships (RASAR) outperforming animal test reproducibility. Toxicol Sci. https://doi.org/10.1093/toxsci/kfy152
344. Zhang C, Cheng F, Li W et al (2016) In silico prediction of drug induced liver toxicity using substructure pattern recognition method. Mol Inform. https://doi.org/10.1002/minf.201500055
345. Lei T, Li Y, Song Y et al (2016) ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform. https://doi.org/10.1186/s13321-016-0117-7
346. Lei T, Chen F, Liu H et al (2017) ADMET evaluation in drug discovery. Part 17: development of quantitative and qualitative prediction models for chemical-induced respiratory toxicity. Mol Pharm. https://doi.org/10.1021/acs.molpharmaceut.7b00317
347. Lei T, Sun H, Kang Y et al (2017) ADMET evaluation in drug discovery. 18 reliable prediction of chemical-induced urinary tract toxicity by boosting machine learning approaches. Mol Pharm. https://doi.org/10.1021/acs.molpharmaceut.7b00631
348. Pandya R, Pandya J (2015) C5.0 algorithm to improved decision tree with feature selection and reduced error pruning. Int J Comput Appl. https://doi.org/10.5120/20639-3318
349. Jenwitheesuk E, Horst JA, Rivas KL et al (2008) Novel paradigms for drug discovery: computational multitarget screening. Trends Pharmacol Sci. https://doi.org/10.1016/j.tips.2007.11.007
350. Gu S, Lai LH (2020) Associating 197 Chinese herbal medicine with drug targets and diseases using the similarity ensemble approach. Acta Pharmacol Sin 41:432–438. https://doi.org/10.1038/s41401-019-0306-9
351. Chen YT, Xie JY, Sun Q, Mo WJ (2019) Novel drug candidates for treating esophageal carcinoma: a study on differentially expressed genes, using connectivity mapping and molecular docking. Int J Oncol 54:152–166. https://doi.org/10.3892/ijo.2018.4618
352. Taha KF, Khalil M, Abubakr MS, Shawky E (2020) Identifying cancer-related molecular targets of Nandina domestica Thunb. by network pharmacology-based analysis in combination with chemical profiling and molecular docking studies. J Ethnopharmacol 249:112413. https://doi.org/10.1016/j.jep.2019.112413
353. Anighoro A, Bajorath J, Rastelli G (2014) Polypharmacology: Challenges and opportunities in drug discovery. J Med Chem 57(19):7874–87. https://doi.org/10.1021/jm5006463
354. Zhang W, Pei J, Lai L (2017) Computational multitarget drug design. J Chem Inf Model 57(3):403–412. https://doi.org/10.1021/acs.jcim.6b00491
355. Proschak E, Stark H, Merk D (2019) Polypharmacology by design: a medicinal chemist’s perspective on multitargeting compounds. J Med Chem 62(2):420–444. https://doi.org/10.1021/acs.jmedchem.8b00760
356. Awale M, Reymond JL (2017) The polypharmacology browser: a web-based multi-fingerprint target prediction tool using ChEMBL bioactivity data. J Cheminform. https://doi.org/10.1186/s13321-017-0199-x
357. Reker D, Rodrigues T, Schneider P, Schneider G (2014) Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proc Natl Acad Sci U S A. https://doi.org/10.1073/pnas.1320001111
358. Wang L, Ma C, Wipf P et al (2013) Targethunter: an in silico target identification tool for predicting therapeutic potential of small organic molecules based on chemogenomic database. AAPS J. https://doi.org/10.1208/s12248-012-9449-z
359. Xia W, Chenxu P, Honglin L (2016) PharmMapper. In: Enhancing Enrich. Pharmacophore-Based Target Predict. Polypharmacological Profiles Drugs 56(6):1175–83. https://doi.org/10.1021/acs.jcim.5b00690
360. Gong J, Cai C, Liu X et al (2013) ChemMapper: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method. Bioinformatics. https://doi.org/10.1093/bioinformatics/btt270
361. Gfeller D, Grosdidier A, Wirth M et al (2014) SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res. https://doi.org/10.1093/nar/gku293
362. Poirier M, Awale M, Roelli MA et al (2019) Identifying lysophosphatidic acid acyltransferase β (LPAAT-β) as the target of a nanomolar angiogenesis inhibitor from a phenotypic screen using the polypharmacology browser PPB2. ChemMedChem. https://doi.org/10.1002/cmdc.201800554
363. Ozhathil LC, Delalande C, Bianchi B et al (2018) Identification of potent and selective small molecule inhibitors of the cation channel TRPM4. Br J Pharmacol. https://doi.org/10.1111/bph.14220
364. Ratnawati DE, Marjono M, Anam S (2018) Prediction of active compounds from SMILES codes using backpropagation algorithm. In: AIP Conference Proceedings
365. Van Vleet TR, Liguori MJ, Lynch JJ et al (2019) Screening strategies and methods for better off-target liability prediction and identification of small-molecule pharmaceuticals. SLAS Discov 24(1):1–24. https://doi.org/10.1177/2472555218799713
366. Yue SJ, Liu J, Feng WW et al (2017) System pharmacology-based dissection of the synergistic mechanism of huangqi and huanglian for diabetes mellitus. Front Pharmacol. https://doi.org/10.3389/fphar.2017.00694
367. Shi XQ, Yue SJ, Tang YP et al (2019) A network pharmacology approach to investigate the blood enriching mechanism of Danggui buxue Decoction. J Ethnopharmacol. https://doi.org/10.1016/j.jep.2019.01.027
368. Liu X, Wu J, Zhang D et al (2018) A network pharmacology approach to uncover the multiple mechanisms of hedyotis diffusa willd on colorectal cancer. Evidence-Based Complem Altern Med. https://doi.org/10.1155/2018/6517034
369. Wang J, Luo C, Shan C et al (2015) Inhibition of human copper trafficking by a small molecule significantly attenuates cancer cell proliferation. Nat Chem. https://doi.org/10.1038/nchem.2381
370. Fang J, Li Y, Liu R et al (2015) Discovery of multitarget-directed ligands against Alzheimer’s disease through systematic prediction of chemical-protein interactions. J Chem Inf Model. https://doi.org/10.1021/ci500574n
371. Gao L, Wang KX, Zhou YZ et al (2018) Uncovering the anticancer mechanism of compound Kushen Injection against HCC by integrating quantitative analysis, network analysis and experimental validation. Sci Rep. https://doi.org/10.1038/s41598-017-18325-7
372. Zhou W, Liu X, Tu Z et al (2013) Discovery of pteridin-7(8H)-one-based irreversible inhibitors targeting the epidermal growth factor receptor (EGFR) kinase T790M/L858R Mutant. J Med Chem 56:7821–7837. https://doi.org/10.1021/jm401045n
373. Wang Q, Feng YH, Huang JC et al (2017) A novel framework for the identification of drug target proteins: combining stacked auto-encoders with a biased support vector machine. PLoS ONE. https://doi.org/10.1371/journal.pone.0176486
374. Carvalho-Silva D, Pierleoni A, Pignatelli M et al (2019) Open targets platform: new developments and updates two years on. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1133
375. López-Cortés A, Paz-y-Miño C, Guerrero S et al (2020) Pharmacogenomics, biomarker network, and allele frequencies in colorectal cancer. Pharmacogenomics J 20(1):136–158. https://doi.org/10.1038/s41397-019-0102-4
376. Nabirotchkin S, Peluffo AE, Bouaziz J, Cohen D (2020) Focusing on the unfolded protein response and autophagy related pathways to reposition common approved drugs against COVID-19. Preprints
377. López-Isac E, Acosta-Herrera M, Kerick M et al (2019) GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways. Nat Commun. https://doi.org/10.1038/s41467-019-12760-y
378. Martin P, Ding J, Duffus K et al (2019) Chromatin interactions reveal novel gene targets for drug repositioning in rheumatic diseases. Ann Rheum Dis. https://doi.org/10.1136/annrheumdis-2018-214649
379. Dong J, Cao DS, Miao HY et al (2015) ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation. J Cheminform. https://doi.org/10.1186/s13321-015-0109-z
380. Angelo RM, Io AK, Almeida MP et al (2020) OntoQSAR: An ontology for interpreting chemical and biological data in quantitative structure-activity relationship studies. In: Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020
381. Oldenhof M, Arany A, Moreau Y, Simm J (2020) Chemgrapher: optical graph recognition of chemical compounds by deep
learning. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.0c00459
382. Dong J, Yao ZJ, Zhu MF et al (2017) ChemSAR: An online pipelining platform for molecular SAR modeling. J Cheminform. https://doi.org/10.1186/s13321-017-0215-1
383. Buyukbingol E, Sisman A, Akyildiz M et al (2007) Adaptive neuro-fuzzy inference system (ANFIS): a new approach to predictive modeling in QSAR applications: a study of neuro-fuzzy modeling of PCP-based NMDA receptor antagonists. Bioorg Med Chem 15:4265–4282. https://doi.org/10.1016/j.bmc.2007.03.065
384. Jiang HJ, Huang YA, You ZH (2019) Predicting drug-disease associations via using Gaussian interaction profile and kernel-based autoencoder. Biomed Res Int. https://doi.org/10.1155/2019/2426958
385. Wang YY, Cui C, Qi L et al (2019) DrPOCS: drug repositioning based on projection onto convex sets. IEEE/ACM Trans Comput Biol Bioinforma. https://doi.org/10.1109/TCBB.2018.2830384
386. Xuan P, Cui H, Shen T et al (2019) HeteroDualNet: a dual convolutional neural network with heterogeneous layers for drug-disease association prediction via chou’s five-step rule. Front Pharmacol. https://doi.org/10.3389/fphar.2019.01301
387. Sadeghi SS, Keyvanpour M (2019) RCDR: A Recommender Based Method for Computational Drug Repurposing. In: 2019 IEEE 5th Conference on Knowledge Based Engineering and Innovation, KBEI 2019
388. Zhu Q, Luo J, Ding P, Xiao Q (2018) GRTR: Drug-disease association prediction based on graph regularized transductive regression on heterogeneous network. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
389. Jiang HJ, Huang YA, You ZH (2020) SAEROF: an ensemble approach for large-scale drug-disease association prediction by incorporating rotation forest and sparse autoencoder deep neural network. Sci Rep. https://doi.org/10.1038/s41598-020-61616-9
390. Wang MN, You ZH, Li LP et al (2020) WGMFDDA: A Novel Weighted-Based Graph Regularized Matrix Factorization for Predicting Drug-Disease Associations. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
391. Liu H, Zhang W, Song Y et al (2020) HNet-DNN: inferring new drug-disease associations with deep neural network based on heterogeneous network features. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.9b01008
392. Lee I, Keum J, Nam H (2019) DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.1007129
393. Abdel-Basset M, Hawash H, Elhoseny M et al (2020) DeepH-DTA: deep learning for predicting drug-target interactions: a case study of COVID-19 drug repurposing. IEEE Access. https://doi.org/10.1109/access.2020.3024238
394. Yang J, He S, Zhang Z, Bo X (2020) NegStacking: drug-target interaction prediction based on ensemble learning and logistic regression. IEEE/ACM Trans Comput Biol Bioinforma. https://doi.org/10.1109/TCBB.2020.2968025
395. King MD, Long T, Pfalmer DL et al (2018) SPIDR: small-molecule peptide-influenced drug repurposing. BMC Bioinformatics. https://doi.org/10.1186/s12859-018-2153-y
396. Huang K, Fu T, Glass LM et al (2020) DeepPurpose: a deep learning library for drug–target interaction prediction. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa1005
397. Chu Y, Kaushik AC, Wang X et al (2019) DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform. https://doi.org/10.1093/bib/bbz152
398. Shar PA, Tao W, Gao S et al (2016) Pred-binding: large-scale protein–ligand binding affinity prediction. J Enzyme Inhib Med Chem 31:1443–1450. https://doi.org/10.3109/14756366.2016.1144594
399. Capuzzi SJ, Kim ISJ, Lam WI et al (2017) Chembench: a publicly accessible, integrated cheminformatics portal. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.6b00462
400. Pires DEV, Blundell TL, Ascher DB (2016) MCSM-lig: quantifying the effects of mutations on protein-small molecule affinity in genetic disease and emergence of drug resistance. Sci Rep. https://doi.org/10.1038/srep29575
401. Pires DEV, Ascher DB (2016) CSM-lig: a web server for assessing and comparing protein-small molecule affinities. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw390
402. Pires DEV, Ascher DB (2016) mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw458
403. Kaminskas LM, Pires DEV, Ascher DB (2019) dendPoint: a web resource for dendrimer pharmacokinetics investigation and prediction. Sci Rep. https://doi.org/10.1038/s41598-019-51789-3
404. Patel RD, Prasanth Kumar S, Pandya HA, Solanki HA (2018) MDCKpred: a web-tool to calculate MDCK permeability coefficient of small molecule using membrane-interaction chemical features. Toxicol Mech Methods. https://doi.org/10.1080/15376516.2018.1499840
405. Montanari F, Knasmüller B, Kohlbacher S et al (2020) Vienna LiverTox workspace: a set of machine learning models for prediction of interactions profiles of small molecules with transporters relevant for regulatory agencies. Front Chem. https://doi.org/10.3389/fchem.2019.00899
406. Kochev N, Avramova S, Jeliazkova N (2018) Ambit-SMIRKS: a software module for reaction representation, reaction search and structure transformation. J Cheminform. https://doi.org/10.1186/s13321-018-0295-6
407. Hornig M, Klamt A (2005) COSMOfrag: a novel tool for high-throughput ADME property prediction and similarity screening based on quantum chemistry. J Chem Inf Model. https://doi.org/10.1021/ci0501948
408. Hassan-Harrirou H, Zhang C, Lemmin T (2020) RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.0c00075
409. Rifaioglu AS, Cetin Atalay R, Cansen Kahraman D et al (2020) MDeePred: novel multi-channel protein featurization for deep learning-based binding affinity prediction in drug discovery. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa858
410. Banerjee P, Eckert AO, Schrey AK, Preissner R (2018) ProTox-II: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. https://doi.org/10.1093/nar/gky318
411. Dong J, Wang NN, Yao ZJ et al (2018) Admetlab: a platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. J Cheminform. https://doi.org/10.1186/s13321-018-0283-x
412. Maunz A, Gütlein M, Rautenberg M et al (2013) Lazar: a modular predictive toxicology framework. Front Pharmacol. https://doi.org/10.3389/fphar.2013.00038
413. Yao ZJ, Dong J, Che YJ et al (2016) TargetNet: a web service for predicting potential drug–target interaction profiling via multi-target SAR models. J Comput Aided Mol Des. https://doi.org/10.1007/s10822-016-9915-2
414. Meng C, Hu Y, Zhang Y, Guo F (2020) PSBP-SVM: a machine learning-based computational identifier for predicting polystyrene binding peptides. Front Bioeng Biotechnol. https://doi.org/10.3389/fbioe.2020.00245
415. Shen C, Luo J, Ouyang W et al (2020) IDDkin: network-based influence deep diffusion model for enhancing prediction of kinase inhibitors. Bioinformatics. https://doi.org/10.1093/bioinformatics/btaa1058
416. Jewison T, Su Y, Disfany FM et al (2014) SMPDB 2.0: big improvements to the small molecule pathway database. Nucleic Acids Res. https://doi.org/10.1093/nar/gkt1067
417. Dalabira E, Viennas E, Daki E et al (2014) DruGeVar: an online resource triangulating drugs with genes and genomic biomarkers for clinical pharmacogenomics. Public Health Genom. https://doi.org/10.1159/000365895
418. Verma J, Luo H, Hu J, Zhang P (2017) DrugPathSeeker: Interactive UI for exploring drug-ADR relation via pathways. In: IEEE Pacific Visualization Symposium
419. Jarada T, Rokne J, Alhajj R (2021) SNF-NN: Computational Method To Predict Drug-Disease Interactions Using Similarity Network Fusion and Neural Networks. Res Sq. https://doi.org/10.21203/rs.3.rs-56433/v1
420. Cao X, Fan R, Zeng W (2020) DeepDrug: a general graph-based deep learning framework for drug relation prediction. biorxiv. https://doi.org/10.1101/2020.11.09.375626
421. Hartenfeller M, Schneider G (2011) Enabling future drug discovery by de novo design. Wiley Interdiscip Rev Comput Mol Sci 1:742–759. https://doi.org/10.1002/wcms.49
422. Schneider P, Schneider G (2016) De Novo design at the edge of chaos. J Med Chem 59:4077–4086. https://doi.org/10.1021/acs.jmedchem.5b01849
423. Lavecchia A (2015) Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 20(3):318–331. https://doi.org/10.1016/j.drudis.2014.10.012
424. Vyas V, Jain A, Jain A, Gupta A (2008) Virtual screening: a fast tool for drug design. Sci Pharm 76(3):333–360. https://doi.org/10.3797/scipharm.0803-03
425. Ertl P, Schuffenhauer A (2009) Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform 1:1–11. https://doi.org/10.1186/1758-2946-1-8
426. Blaschke T, Olivecrona M, Engkvist O et al (2018) Application of generative autoencoder in De novo molecular design. Mol Inform 37:1–11. https://doi.org/10.1002/minf.201700123
427. Jaakkola TS, Haussler D (1999) Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems
428. Kadurin A, Aliper A, Kazennov A et al (2017) The cornucopia of meaningful leads: applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 8:10883–10890. https://doi.org/10.18632/oncotarget.14073
429. Müller AT, Hiss JA, Schneider G (2018) Recurrent neural network model for constructive peptide design. J Chem Inf Model 58:472–479. https://doi.org/10.1021/acs.jcim.7b00414
430. Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9:1–14. https://doi.org/10.1186/s13321-017-0235-x
431. Merk D, Friedrich L, Grisoni F, Schneider G (2018) De novo design of bioactive small molecules by artificial intelligence. Mol Inform 37:3–6. https://doi.org/10.1002/minf.201700153
432. Sarkar D (2018) A comprehensive hands-on guide to transfer learning with real-world applications in deep learning. Medium
433. Li X, Fourches D (2020) Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J Cheminform. https://doi.org/10.1186/s13321-020-00430-x
434. Engkvist O, Norrby PO, Selmi N et al (2018) Computational prediction of chemical reactions: current status and outlook. Drug Discov Today 23:1203–1218. https://doi.org/10.1016/j.drudis.2018.02.014
435. Hessler G, Baringhaus KH (2018) Artificial intelligence in drug design. Molecules. https://doi.org/10.3390/molecules23102520
436. Domenico A, Nicola G, Daniela T et al (2020) De novo drug design of targeted chemical libraries based on artificial intelligence and pair-based multiobjective optimization. J Chem Inf Model 60:4582–4593. https://doi.org/10.1021/acs.jcim.0c00517
437. Ekins S, Puhl AC, Zorn KM et al (2019) Exploiting machine learning for end-to-end drug discovery and development. Nat Mater 18:435–441. https://doi.org/10.1038/s41563-019-0338-z
438. Pushpakom S, Iorio F, Eyers PA et al (2018) Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov 18(1):41–58. https://doi.org/10.1038/nrd.2018.168
439. Kubick N, Pajares M, Enache I et al (2020) Repurposing Zileuton as a depression drug using an AI and in vitro approach. Molecules. https://doi.org/10.3390/molecules25092155
440. Yuan Y, Pei J, Lai L (2020) LigBuilder V3: a multi-target de novo drug design approach. Front Chem 8:1–18. https://doi.org/10.3389/fchem.2020.00142
441. Wei L, Wen W, Rao L et al (2020) Cov_FB3D: a de novo covalent drug design protocol integrating the Ba-SAMP strategy and machine-learning-based synthetic tractability evaluation. J Chem Inf Model 60:4388–4402. https://doi.org/10.1021/acs.jcim.9b01197
442. Jiménez-Luna J, Grisoni F, Schneider G (2020) Drug discovery with explainable artificial intelligence. Nat Mach Intell 2:573–584. https://doi.org/10.1038/s42256-020-00236-4
443. Cavasotto CN, Di Filippo JI (2021) Artificial intelligence in the early stages of drug discovery. Arch Biochem Biophys 698:108730. https://doi.org/10.1016/j.abb.2020.108730
444. Wong CH, Siah KW, Lo AW (2019) Estimation of clinical trial success rates and related parameters. Biostatistics 20:273–286. https://doi.org/10.1093/biostatistics/kxx069
445. DiMasi JA, Grabowski HG, Hansen RW (2016) Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ 47:20–33. https://doi.org/10.1016/j.jhealeco.2016.01.012
446. Ruddigkeit L, Van Deursen R, Blum LC, Reymond JL (2012) Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model 52:2864–2875. https://doi.org/10.1021/ci300415d
447. Mohs RC, Greig NH (2017) Drug discovery and development: role of basic biological research. Alzheimer’s Dement Transl Res Clin Interv 3(4):651–657. https://doi.org/10.1016/j.trci.2017.10.005
448. Vamathevan J, Clark D, Czodrowski P et al (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18(6):463–477. https://doi.org/10.1038/s41573-019-0024-5
449. Niel O, Bastard P (2019) Artificial intelligence in nephrology: core concepts, clinical applications, and perspectives. Am J Kidney Dis. https://doi.org/10.1053/j.ajkd.2019.05.020
450. Ahuja AS (2019) The impact of artificial intelligence in medicine on the future role of the physician. PeerJ. https://doi.org/10.7717/peerj.7702
451. Rubin EH, Gilliland DG (2012) Drug development and clinical trials-the path to an approved cancer drug. Nat Rev Clin Oncol 9:215–222. https://doi.org/10.1038/nrclinonc.2012.22
452. Rautio J, Kumpulainen H, Heimbach T et al (2008) Prodrugs: design and clinical applications. Nat Rev Drug Discov 7(3):255–270. https://doi.org/10.1038/nrd2468
453. Harrer S, Shah P, Antony B, Hu J (2019) Artificial intelligence for clinical trial design. Trends Pharmacol Sci 40:577–591. https://doi.org/10.1016/j.tips.2019.05.005
454. Fogel DB (2018) Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: a review. Contemp Clin Trials Commun 11:156–164. https://doi.org/10.1016/j.conctc.2018.08.001
455. Toh TS, Dondelinger F, Wang D (2019) Looking beyond the hype: applied AI and machine learning in translational medicine. EBioMedicine 47:607–615. https://doi.org/10.1016/j.ebiom.2019.08.027
456. Qi Y (2019) Predicting phase 3 clinical trial results by modeling phase 2 clinical trial subject level data using deep learning. Proc Mach Learn Res 106:1–14
457. Viceconti M, Henney A, Morley-Fletcher E (2016) In silico clinical trials: how computer simulation will transform the biomedical industry. Int J Clin Trials. https://doi.org/10.18203/2349-3259.ijct20161408
458. Magalingam KB, Radhakrishnan A, Ping NS, Haleagrahara N (2018) Current concepts of neurodegenerative mechanisms in Alzheimer’s disease. Biomed Res Int. https://doi.org/10.1155/2018/3740461
459. Hussain R, Zubair H, Pursell S, Shahab M (2018) Neurodegenerative diseases: regenerative mechanisms and novel therapeutic approaches. Brain Sci. https://doi.org/10.3390/brainsci8090177
460. Levenson RW, Sturm VE, Haase CM (2014) Emotional and behavioral symptoms in neurodegenerative disease: a model for studying the neural bases of psychopathology. Annu Rev Clin Psychol 10:581–606. https://doi.org/10.1146/annurev-clinpsy-032813-153653
461. Gitler AD, Dhillon P, Shorter J (2017) Neurodegenerative disease: Models, mechanisms, and a new hope. DMM Dis Model Mech 10:499–502. https://doi.org/10.1242/dmm.030205
462. Mak KK, Pichika MR (2019) Artificial intelligence in drug development: present status and future prospects. Drug Discov Today 24(3):773–780. https://doi.org/10.1016/j.drudis.2018.11.014
463. Peng J, Guan J, Shang X (2019) Predicting Parkinson’s disease genes based on node2vec and autoencoder. Front Genet 10:1–6. https://doi.org/10.3389/fgene.2019.00226
464. Thomas SN, Funk KE, Wan Y et al (2012) Dual modification of Alzheimer’s disease PHF-tau protein by lysine methylation and ubiquitylation: a mass spectrometry approach. Acta Neuropathol. https://doi.org/10.1007/s00401-011-0893-0
465. Yousefian-Jazi A, Sung MK, Lee T et al (2020) Functional fine-mapping of noncoding risk variants in amyotrophic lateral sclerosis utilizing convolutional neural network. Sci Rep 10:1–12. https://doi.org/10.1038/s41598-020-69790-6
466. Gupta R, Ambasta RK, Kumar P (2020) Identification of novel class I and class IIb histone deacetylase inhibitor for Alzheimer’s disease therapeutics. Life Sci. https://doi.org/10.1016/j.lfs.2020.117912
467. Jamal S, Grover A, Grover S (2019) Machine learning from molecular dynamics trajectories to predict caspase-8 Inhibitors against Alzheimer’s disease. Front Pharmacol 10:1–13. https://doi.org/10.3389/fphar.2019.00780
468. Chen HY, Chen JQ, Li JY et al (2019) Deep learning and random forest approach for finding the optimal traditional Chinese medicine formula for treatment of Alzheimer’s Disease. J Chem Inf Model 59:1605–1623. https://doi.org/10.1021/acs.jcim.9b00041
469. Ponzoni I, Sebastián-Pérez V, Martínez MJ et al (2019) QSAR Classification models for predicting the activity of inhibitors of beta-secretase (BACE1) associated with Alzheimer’s disease. Sci Rep 9:1–13. https://doi.org/10.1038/s41598-019-45522-3
470. Kaiser TM, Dentmon ZW, Dalloul CE et al (2020) Accelerated discovery of novel Ponatinib Analogs with improved properties for the treatment of Parkinson’s disease. ACS Med Chem Lett 11:491–496. https://doi.org/10.1021/acsmedchemlett.9b00612
471. Shao YM, Ma X, Paira P et al (2018) Discovery of indolylpiperazinylpyrimidines with dual-target profiles at adenosine A2A and dopamine D2 receptors for Parkinson’s disease treatment. PLoS ONE 13:1–27. https://doi.org/10.1371/journal.pone.0188212
472. Chen ZD, Zhao L, Chen HY et al (2020) A novel artificial intelligence protocol to investigate potential leads for Parkinson’s disease. RSC Adv 10:22939–22958. https://doi.org/10.1039/d0ra04028b
473. Deng L, Zhong W, Zhao L et al (2020) Artificial intelligence-based application to explore inhibitors of neurodegenerative diseases. Front Neurorobot. https://doi.org/10.3389/fnbot.2020.617327
474. Oh M, Ahn J, Yoon Y (2014) A network-based classification model for deriving novel drug-disease associations and assessing their molecular actions. PLoS ONE 9:1–12. https://doi.org/10.1371/journal.pone.0111668
475. Zhu Y, Jung W, Wang F, Che C (2020) Drug repurposing against Parkinson’s disease by text mining the scientific literature. Libr Hi Tech 38:741–750. https://doi.org/10.1108/LHT-08-2019-0170
476. Vatansever S, Schlessinger A, Wacker D et al (2020) Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: state-of-the-arts and future directions. Med Res Rev Online ahead of print
477. Stokes JM, Yang K, Swanson K et al (2020) A deep learning approach to antibiotic discovery. Cell. https://doi.org/10.1016/j.cell.2020.01.021
478. Mamoshina P, Vieira A, Putin E, Zhavoronkov A (2016) Applications of deep learning in biomedicine. Mol Pharm 13(5):1445–1454. https://doi.org/10.1021/acs.molpharmaceut.5b00982
479. Preuer K, Klambauer G, Rippmann F et al (2019) Interpretable deep learning in drug discovery. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
480. Ramsundar B, Liu B, Wu Z et al (2017) Is multitask deep learning practical for pharma? J Chem Inf Model. https://doi.org/10.1021/acs.jcim.7b00146
481. Grace K, Salvatier J, Dafoe A et al (2017) When Will AI Exceed Human Performance? Evidence from AI experts. J Artif Intell Res 62:1–48. https://arxiv.org/abs/1705.08807
482. Altae-Tran H, Ramsundar B, Pappu AS, Pande V (2017) Low data drug discovery with one-shot learning. ACS Cent Sci 3:283–293. https://doi.org/10.1021/acscentsci.6b00367