
Enhancing the identification of malonylation sites using AlphaFold2 and ensemble learning

Original Article
Molecular Diversity

Abstract

Protein malonylation is closely associated with many diseases, such as diabetes and cancer, so accurate identification of malonylation sites is crucial for elucidating the molecular mechanisms underlying these diseases. Traditional experimental methods are costly, time-consuming, and technically demanding. With advances in artificial intelligence, computational prediction of protein post-translational modification sites has become a vital complement to experimental approaches. In this paper, we present Catsoft_Kmalsite, a malonylation site prediction model whose core innovation is the integration of complementary information from protein three-dimensional structural features and sequence/physicochemical features, combined with a soft voting ensemble of Bayesian-optimized base classifiers. Specifically, we use AlphaFold2 to obtain protein tertiary structure information and apply CTDC (the composition part of the composition/transition/distribution descriptors), EAAC (enhanced amino acid composition), and EGAAC (enhanced grouped amino acid composition) to extract sequence and physicochemical features. We then build two base classifiers with the CatBoost algorithm, one for each of these two feature sets, fine-tune their parameters with Bayesian optimization, and finally integrate them by soft voting. All ablation experiments show that Catsoft_Kmalsite is robust and generalizes well. In fivefold cross-validation, the model achieved average values of 94.03% (AUC), 87.91% (ACC), 89.15% (Sen), 86.91% (Pre), 88.00% (F1), and 0.7585 (MCC); on the independent test set it achieved 95.18%, 89.55%, 90.87%, 88.79%, 89.82%, and 0.7912 on the same six metrics. Catsoft_Kmalsite also outperformed other state-of-the-art methods on all evaluated metrics.
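To illustrate what two of the sequence encodings named above compute, the sketch below slides a fixed-length window along a peptide fragment and records, for each window, the frequency of each of the 20 standard residues (EAAC) or of five physicochemical residue groups (EGAAC). It is a minimal sketch in the spirit of the iFeature/iLearn descriptors, not the authors' code: the window size of 5, the five-group partition, the use of `-` as padding, and the function names `sliding_windows`, `eaac`, and `egaac` are assumptions for illustration. CTDC is omitted for brevity.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed five-group physicochemical partition for the grouped composition.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def sliding_windows(fragment: str, window: int):
    """Yield every contiguous sub-sequence of length `window`, left to right."""
    if not 1 <= window <= len(fragment):
        raise ValueError("window must be between 1 and len(fragment)")
    for start in range(len(fragment) - window + 1):
        yield fragment[start:start + window]


def eaac(fragment: str, window: int = 5) -> list[float]:
    """Enhanced amino acid composition: 20 residue frequencies per window."""
    features = []
    for win in sliding_windows(fragment, window):
        residues = win.replace("-", "")  # '-' marks padding and is not counted
        counts = Counter(residues)
        total = len(residues) or 1       # avoid dividing by zero on all-padding windows
        features.extend(counts[aa] / total for aa in AMINO_ACIDS)
    return features


def egaac(fragment: str, window: int = 5) -> list[float]:
    """Enhanced grouped amino acid composition: 5 group frequencies per window."""
    features = []
    for win in sliding_windows(fragment, window):
        residues = win.replace("-", "")
        total = len(residues) or 1
        for members in GROUPS.values():
            # Count residues in this window that belong to the current group.
            features.append(sum(res in members for res in residues) / total)
    return features


if __name__ == "__main__":
    # Hypothetical 7-residue fragment with the candidate lysine (K) in the centre.
    fragment = "GAEKDFW"
    print(len(eaac(fragment)))   # 60 (3 windows x 20 residues)
    print(egaac(fragment)[:5])   # [0.4, 0.0, 0.2, 0.4, 0.0] for window "GAEKD"
```

For a fragment of length L and window size w, this EAAC sketch yields 20(L - w + 1) features and the EGAAC sketch yields 5(L - w + 1); the fragment length used in the paper is not stated in the abstract.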
A web server is available at http://1.94.102.146:8501/Catsoft_Kmalsite. The code and dataset of Catsoft_Kmalsite are available at https://github.com/flyinsky6/Catsoft_Kmalsite.
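The last two steps described in the abstract, soft voting over the two CatBoost base classifiers and evaluation with AUC, ACC, Sen, Pre, F1, and MCC, can be sketched as follows. This is a minimal illustration, not the authors' implementation: equal weights for the two classifiers and a 0.5 decision threshold are assumptions (the abstract states neither), and the function names and toy probabilities are hypothetical. In practice the probability lists would come from each trained model, for example the positive-class column of CatBoost's `predict_proba`.

```python
import math


def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Fuse two classifiers by a weighted average of positive-class probabilities."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have the same length")
    weight_b = 1.0 - weight_a
    fused = [weight_a * pa + weight_b * pb for pa, pb in zip(prob_a, prob_b)]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


def binary_metrics(y_true, y_pred):
    """ACC, Sen (recall), Pre, F1 and MCC for labels where 1 = malonylated."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if tp + fn else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * sen / (pre + sen) if pre + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"ACC": acc, "Sen": sen, "Pre": pre, "F1": f1, "MCC": mcc}


def auc(y_true, scores):
    """ROC AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical positive-class probabilities from the two base classifiers.
    structure_probs = [0.9, 0.4, 0.7, 0.2]
    sequence_probs = [0.8, 0.7, 0.2, 0.1]
    y_true = [1, 0, 1, 0]

    fused, labels = soft_vote(structure_probs, sequence_probs)
    print(labels)               # [1, 1, 0, 0]
    print(auc(y_true, fused))   # 0.75
    print(binary_metrics(y_true, labels))
```

Soft voting lets a confident model outweigh a hesitant one: for the second toy sample, 0.4 and 0.7 average to 0.55, so the fused prediction is positive, whereas two hard votes would simply tie.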

Figs. 1–11 (figure images not reproduced)


Data availability

The raw data were obtained from the CPLM database (https://cplm.biocuckoo.cn/index.php). The data generated during processing and the code for this paper are available at https://github.com/flyinsky6/Catsoft_Kmalsite.

Funding

This work was supported by the Jiangsu Training Program of Innovation and Entrepreneurship for Undergraduates (202410313030Z); the General Project of the Natural Science Foundation of Jiangsu Higher Education Institutions of China (22KJB520040); the Natural Science Research of Jiangsu Higher Education Institutions of China (22KJB310021); the Postdoctoral Science Foundation of Jiangsu Province (1701062B, 2017107001); and the Joint Project of Industry University Research of Jiangsu Province (BY20230198).

Author information

Contributions

LLX and YTQ contributed equally to the literature review and the construction of the initial framework. JYY, XWX, ZQL, and YHW participated in data collection and pre-processing. EHL and XXK were responsible for the technical verification and algorithm optimization of the model. HWZ and YPL provided support with software development and data analysis tools. FW contributed industry-relevant insights and suggestions on application scenarios. XL supervised the whole project, guided the research direction, and participated in revising and approving the final paper.

Corresponding author

Correspondence to Xin Liu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, L., Qian, Y., Yang, J. et al. Enhancing the identification of malonylation sites using AlphaFold2 and ensemble learning. Mol Divers (2025). https://doi.org/10.1007/s11030-025-11357-6

  • DOI: https://doi.org/10.1007/s11030-025-11357-6
