Abstract
Acquired immune deficiency syndrome (AIDS) is a fatal disease caused by human immunodeficiency virus (HIV). Although 23 different drugs are available, the treatment of AIDS remains challenging because the virus mutates very quickly, which can lead to drug resistance. Predicting drug resistance before treatment is therefore crucial for individualized therapy. Here, based on HIV target protein sequence information, we analyzed resistance to 21 drugs caused by mutated residues using machine learning (ML) methods. To transform target sequences into numeric vectors, we used seven physicochemical properties that represent the interaction characteristics of the target proteins well. Principal component analysis (PCA) was then applied to reduce the feature dimensionality. Random forest (RF) and support vector machine (SVM) models with three different kernel functions (linear, polynomial and radial basis function (RBF)) were employed. By comparison, we found that the RBF-based SVM performs comparably to the RF model. Further, we added weight information to the RBF-based SVM using four different weight evaluation methods: RF, eXtreme Gradient Boosting (XGB), CfsSubsetEval and ReliefFAttributeEval. The results show that the RF-weighted RBF-based SVM yields the best performance: 13 of the 21 drug models give correlation coefficients (R²) over 0.8, and 3 of them exceed 0.9. Finally, position-specific importance analysis indicates that most of the mutated residues with high RF weight scores have been shown in previous reports to be closely related to drug resistance. Overall, we expect that this method can serve as a supplementary tool for predicting HIV drug resistance for newly discovered mutations.
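The abstract does not state how the weights enter the RBF-based SVM. One common way to fold per-feature weights into an RBF kernel is to scale each squared feature difference by its weight, giving k(x, y) = exp(-gamma * sum_j w_j (x_j - y_j)^2). The sketch below illustrates only that idea in plain Python; the function names, the normalization of the scores, the example vectors and the value of gamma are all assumptions made for illustration and are not taken from the paper.

```python
import math


def normalize_weights(raw_scores):
    """Scale non-negative importance scores so that they sum to 1.

    The scores stand in for per-feature importances (for example from a
    random forest); the normalization step itself is an assumption.
    """
    total = sum(raw_scores)
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return [s / total for s in raw_scores]


def weighted_rbf_kernel(x, y, weights, gamma=1.0):
    """Weighted RBF kernel: exp(-gamma * sum_j w_j * (x_j - y_j)**2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    sq_dist = sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    return math.exp(-gamma * sq_dist)


# Hypothetical importance scores and feature vectors (illustrative only).
weights = normalize_weights([6.0, 3.0, 1.0])
a = [0.2, 1.0, -0.5]
b = [0.2, 0.0, 1.5]

# The third feature differs most but carries the smallest weight, so the
# weighted kernel treats the two samples as more similar than the plain one.
k_weighted = weighted_rbf_kernel(a, b, weights, gamma=0.5)
k_plain = weighted_rbf_kernel(a, b, [1.0, 1.0, 1.0], gamma=0.5)
```

Multiplying every feature by the square root of its weight and then applying a standard RBF kernel gives the same kernel value, so this kind of weighting can be applied as a preprocessing step without changing the SVM itself.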
Graphic abstract
Here, based on HIV target protein sequence information, we analyzed resistance to 21 drugs caused by mutated residues using machine learning (ML) methods that fuse the weight information of different mutation positions.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interests
The authors declare no competing financial interests.
Cite this article
Cai, Q., Yuan, R., He, J. et al. Predicting HIV drug resistance using weighted machine learning method at target protein sequence-level. Mol Divers 25, 1541–1551 (2021). https://doi.org/10.1007/s11030-021-10262-y
DOI: https://doi.org/10.1007/s11030-021-10262-y