Dynamic characterization and interpretation for protein–RNA interactions across diverse cellular conditions using HDRNet
HDRNet is a python package developed for RNA-RBP interaction sites identification and high-attention binding peak recognization using CNN based deep network.
RNA-binding proteins (RBPs) play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function and post-transcriptional regulatory mechanisms. However, the limitations of the current computational methods to account for the diversity of cellular conditions poses a formidable challenge to the cross-prediction of RNA-protein binding events across different cell lines and tissue contexts. Here, we developed HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events. To extract the rich information available in RNA sequences, multi-source information including in vivo RNA secondary structure information and bio-language features are integrated to characterize both the sequence and structural features of RNA. Then, hierarchical multi-scale residual networks (HMRN) are leveraged to understand the contextual dependencies between the nucleotides and their structure. In addition, a deep protein-RNA binding predictor (DPRBP) is proposed to learn the underlying representation , and to select the crucial nucleotide tokens in a synergistic manner. We demonstrate the effectiveness of HDRNet by comparing it to state-of-the-art RNA-binding event identification methods on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with several additional tissue data in both static and dynamic cellular conditions. Our results indicated that the proposed method outperforms the current approaches and is particularly suitable for dynamic prediction. Moreover, we conducted motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explored the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.
Due to the capacity limiation of Github, we put the relevant files (including the BERT model and all datasets) in our webserver webserver and Zenodo. All source code, data and model are open source and can be downloaded from GitHub or any other processes.
HDRNet package requires only a standard computer with enough RAM to support the in-memory operations.
This package is supported for Linux. The package has been tested on the following systems:
- Linux: Ubuntu 20.04
HDRNet mainly depends on the Python scientific stack.
numpy
scipy
pytorch
scikit-learn
scikit-image
pandas
transformers
shap
For specific setting, please see requirements or yml.
$ conda env create -f HDRNet.yml
The BERT model and all to be trained datasets should first be download and put into the corresponding
folder. Then, you can train a model with a certain RBP dataset using the following command:
python main.py --data_file TIA1_Hela --train --BERT_model_path ./BERT_Model --model_save_path ./results/modelThe --BERT_model_path parameter can be any BERT model path that takes RNA sequences as input,
and can be identified by the transformers module in python. All main parameters are listed in main.py.
After training, you can validate the model by using :
python main.py --data_file TIA1_Hela --validate --BERT_model_path ./BERT_ModelTake TIA1_Hela dataset as an example, the validation is upon the trained TIA1_Hela model stored
in the --model_save_path.
We also provide dynamic prediction tasks by using:
python main.py --data_file some_dataset --dynamic_validate --BERT_model_path ./BERT_ModelThe --data_file is the dataset to be validated. For example, if the input data file is AARS_K562,
then the AARS_HepG2 model will be loaded to valid the AARS_K562 dataset. The trained model should
be saved first.
The prediction results will be displayed automatically. If you need to save the results, please
specify the path yourself.
We also provide users with a complete prediction process in the tutorial, where the prediction tasks and the high-attention binding region visualization are included.
We present a user-friendly web server for the HDRNet method at webserver, which enables users to determine whether a given RNA sequence is a binding site for RNA-binding proteins (RBPs). Moreover, all supporting source code and data can be downloaded from here and Zenodo, and FigShare.
This project is covered under the MIT License.
Thank you for using HDRNet! Any questions, suggestions or advice are welcome!
Contact: [email protected], [email protected]