Enhancing the identification of malonylation sites using AlphaFold2 and ensemble learning

Xu, Linlin; Qian, Yuting; Yang, Jiayi; Xu, Xiaowei; Li, Zhiqiang; Wang, Yanhan; Lv, Enhui; Kang, Xingxing; Zhang, Hongwei; Lu, Yaping; Wang, Fei; Liu, Xin

doi:10.1007/s11030-025-11357-6

Original Article
Published: 05 October 2025

Molecular Diversity

Abstract

Protein malonylation is closely associated with many diseases, including diabetes and cancer, so accurate identification of malonylation sites is crucial for elucidating the molecular mechanisms underlying these diseases. Traditional experimental methods are costly, time-consuming, and technically difficult. With advances in artificial intelligence, computational prediction of protein post-translational modification sites has emerged as a vital complement to experimental approaches. In this paper, we present Catsoft_Kmalsite, a malonylation site prediction model whose core innovation is the integration of complementary information from protein three-dimensional structural features and sequence/physicochemical features, coupled with a soft voting ensemble strategy based on Bayesian-optimized base classifiers. Specifically, we use AlphaFold2 to obtain protein tertiary structural information and employ the CTDC, EAAC, and EGAAC methods to extract protein sequence and physicochemical features. Two base classifiers are then built with the CatBoost algorithm, one on each of these two feature sets. After their parameters are fine-tuned via Bayesian optimization, the base classifiers are integrated using a soft voting strategy. Ablation experiments show that Catsoft_Kmalsite has good robustness and generalization ability. On six metrics (AUC, ACC, Sen, Pre, F1, and MCC), the model achieved average values of 94.03%, 87.91%, 89.15%, 86.91%, 88.00%, and 0.7585, respectively, in fivefold cross-validation, and values of 95.18%, 89.55%, 90.87%, 88.79%, 89.82%, and 0.7912 on the independent test set. Catsoft_Kmalsite also outperformed other state-of-the-art methods on all evaluated metrics.
A web server for Catsoft_Kmalsite is available at http://1.94.102.146:8501/Catsoft_Kmalsite, and the code and dataset are available at https://github.com/flyinsky6/Catsoft_Kmalsite.
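
The soft voting step and the threshold-based metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `soft_vote` and `binary_metrics`, the equal weights, the 0.5 decision threshold, and the toy probabilities are all assumptions made for illustration. In the paper the two probability vectors would come from the Bayesian-optimized CatBoost base classifiers (one trained on structural features, one on sequence/physicochemical features); here they are hard-coded. AUC is omitted for brevity.

```python
import math
from typing import Dict, List, Sequence


def soft_vote(prob_structure: Sequence[float],
              prob_sequence: Sequence[float],
              weights: Sequence[float] = (0.5, 0.5)) -> List[float]:
    """Weighted average of positive-class probabilities from two base classifiers.

    Equal weights are an assumption; the source does not state the weighting.
    """
    if len(prob_structure) != len(prob_sequence):
        raise ValueError("both classifiers must score the same samples")
    w_struct, w_seq = weights
    total = w_struct + w_seq
    return [(w_struct * ps + w_seq * pq) / total
            for ps, pq in zip(prob_structure, prob_sequence)]


def binary_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """ACC, Sen, Pre, F1 and MCC computed from thresholded probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sen = safe_div(tp, tp + fn)   # sensitivity (recall)
    pre = safe_div(tp, tp + fp)   # precision
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": safe_div(tp + tn, len(pairs)),
        "Sen": sen,
        "Pre": pre,
        "F1": safe_div(2 * pre * sen, pre + sen),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Toy example with made-up probabilities (not data from the paper).
labels = [1, 1, 1, 0, 0, 0]
p_structure = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]   # hypothetical structure-based scores
p_sequence = [0.8, 0.7, 0.2, 0.3, 0.3, 0.2]    # hypothetical sequence-based scores

fused = soft_vote(p_structure, p_sequence)
for name, value in binary_metrics(labels, fused).items():
    print(f"{name}: {value:.4f}")
```

Averaging probabilities rather than hard labels lets a confident classifier outweigh an uncertain one on a given site, which is the usual motivation for soft over hard voting when the base models see complementary feature sets.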

Data availability

The raw data were obtained from the CPLM database (https://cplm.biocuckoo.cn/index.php). The process-related data and code for this paper are available at https://github.com/flyinsky6/Catsoft_Kmalsite.

Funding

This work was supported by the Jiangsu Training Program of Innovation and Entrepreneurship for Undergraduates (202410313030Z), the General Project of the Natural Science Foundation of Jiangsu Higher Education Institutions of China (22KJB520040), the Natural Science Research of Jiangsu Higher Education Institutions of China (22KJB310021), the Postdoctoral Science Foundation of Jiangsu Province (1701062B, 2017107001), and the Joint Project of Industry University Research of Jiangsu Province (BY20230198).

Author information

Linlin Xu and Yuting Qian have contributed equally to this work.

Authors and Affiliations

School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004, China
Linlin Xu, Jiayi Yang, Xiaowei Xu, Zhiqiang Li, Yanhan Wang, Enhui Lv, Xingxing Kang, Hongwei Zhang & Xin Liu
Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
Linlin Xu
Suzhou Institute of Systems Medicine, Suzhou, 215123, China
Linlin Xu
Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, 221004, China
Yuting Qian
School of Humanities and Arts, China University of Mining and Technology, Xuzhou, 221116, China
Yaping Lu
China Telecom Corporation Limited, Xuzhou, 221000, China
Fei Wang

Contributions

LLX and YTQ contributed equally to the literature review and initial framework construction. JYY, XWX, ZQL, and YHW participated in data collection and pre-processing. EHL and XXK were responsible for the technical verification and algorithm optimization of the model. HWZ and YPL provided support in software development and data analysis tools. FW contributed industry-relevant insights and application-scenario suggestions. XL supervised the whole project, guided the research direction, and participated in the final revision and approval of the paper.

Corresponding author

Correspondence to Xin Liu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, L., Qian, Y., Yang, J. et al. Enhancing the identification of malonylation sites using AlphaFold2 and ensemble learning. Mol Divers (2025). https://doi.org/10.1007/s11030-025-11357-6

Received: 20 May 2025
Accepted: 05 September 2025
Published: 05 October 2025
DOI: https://doi.org/10.1007/s11030-025-11357-6
