DeepRT(+): ultra-precise peptide retention predictor

1 Installation

git clone https://github.com/horsepurve/DeepRTplus  
cd DeepRTplus

Then follow DeepRT_install.sh to install the prerequisites.

2 Scripts to reproduce the results

2.1 RPLC datasets

To apply DeepRT to the HeLa dataset (modifications included), simply type:

python data_split.py data/mod.txt 9 1 2
python capsule_network_emb.py

The HeLa data are split 9:1 (9 parts for training, 1 part for testing) with random seed 2, and then the capsule network starts training. The prediction result (about 0.985 ACC) and the log file are typically available in about 3 minutes (for example, on a laptop with a GTX 1070).
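To make the three arguments (9, 1, 2) concrete, here is a minimal sketch of a seeded 9:1 split. It is not the code in data_split.py; the function name `split_lines` and its handling of the data are assumptions for illustration only, and the real script may treat the header line or rounding differently.

```python
import random


def split_lines(lines, train_parts, test_parts, seed):
    """Shuffle lines reproducibly and split them train_parts:test_parts.

    Hypothetical helper for illustration; not taken from data_split.py.
    """
    rng = random.Random(seed)  # a private generator keeps the split reproducible
    shuffled = list(lines)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    rows = [f"PEPTIDE{i}\t{1000 + i}" for i in range(100)]
    train, test = split_lines(rows, 9, 1, seed=2)
    print(len(train), len(test))
```

Using the same seed again gives the same split, which is why the seed appears in file names such as mod_train_2.txt.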

To reproduce the results in the paper, run:

cd work
sh ../pipeline_mod.sh

You can then find the reports in the work directory.

2.2 Other datasets

See data/README_data.md for a summary and run the corresponding pipeline. All the necessary parameters for those datasets are stored in config_backup.py.

3 Change to your own datasets

3.1 Datasets

Prepare your dataset in the following format:

sequence	RT
4GSQEHPIGDK	2507.67
GDDLQAIK	2996.73
FA2FNAYENHLK	4681.428
AH3PLNTPDPSTK	2754.66
WDSE2NSERDVTK	2645.274
TEEGEIDY2AEEGENRR	3210.3959999999997
SQGD1QDLNGNNQSVTR	2468.946

Separate the peptide sequence and the RT (in seconds) with a tab (\t), and encode the modified amino acids as digits (currently only four kinds of modification are included in the pre-trained models):

'M[16]' -> '1',
'S[80]' -> '2',
'T[80]' -> '3',
'Y[80]' -> '4'
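The encoding above can be sketched as a small helper. The mapping is the one listed in this section; the names `MOD_CODES`, `encode_peptide` and `format_row` are hypothetical and are not part of the repository. The check for leftover brackets is an added safeguard, on the assumption that any other modification is unsupported.

```python
# Mapping from this README: the modified residue is replaced by a single digit.
MOD_CODES = {"M[16]": "1", "S[80]": "2", "T[80]": "3", "Y[80]": "4"}


def encode_peptide(seq):
    """Replace each supported modified residue with its digit code."""
    for mod, code in MOD_CODES.items():
        seq = seq.replace(mod, code)
    if "[" in seq or "]" in seq:
        # Anything still in brackets is a modification outside the four above.
        raise ValueError(f"unsupported modification in {seq!r}")
    return seq


def format_row(seq, rt_seconds):
    """Build one tab-separated dataset line: encoded sequence, then RT."""
    return f"{encode_peptide(seq)}\t{rt_seconds}"


if __name__ == "__main__":
    print(format_row("FAS[80]FNAYENHLK", 4681.428))
```

For example, `Y[80]GSQEHPIGDK` becomes `4GSQEHPIGDK`, matching the first data row shown in 3.1.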

3.2 Model parameters

Only a few parameters need to be specified, and they are self-explanatory. For example, for the HeLa data:

train_path = 'data/mod_train_2.txt' 
test_path = 'data/mod_test_2.txt' 
result_path = 'result/mod_test_2.pred.txt'
log_path = 'result/mod_test_2.log'
save_prefix = 'epochs'
pretrain_path = ''
dict_path = '' 

conv1_kernel = 10
conv2_kernel = 10

min_rt = 0
max_rt = 110 
time_scale = 60 # set to 60 if your retention times are in seconds
max_length = 50 # maximum length of the peptides

Then type the following:

python capsule_network_emb.py
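When adapting max_rt and max_length to your own data, a quick scan of the dataset file can help. The sketch below assumes the file starts with the `sequence	RT` header shown in 3.1, and that RT values are divided by time_scale before being compared with max_rt; the second point is an interpretation of the parameter comments, not something confirmed from the code. The function name `summarize_dataset` is hypothetical.

```python
import csv
import io


def summarize_dataset(handle, time_scale=60):
    """Return (longest sequence length, largest RT divided by time_scale)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # assumption: the first line is the "sequence\tRT" header
    longest, largest_rt = 0, 0.0
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        seq, rt = row[0], float(row[1])
        longest = max(longest, len(seq))
        largest_rt = max(largest_rt, rt / time_scale)
    return longest, largest_rt


if __name__ == "__main__":
    demo = io.StringIO("sequence\tRT\nGDDLQAIK\t2996.73\nFA2FNAYENHLK\t4681.428\n")
    print(summarize_dataset(demo))
```

If the reported length exceeds max_length, or the scaled RT exceeds max_rt, the parameters probably need to be raised for that dataset.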

4 Transfer learning using our pre-trained models

Training deep neural network models is time-consuming, especially for large datasets such as the Misc dataset here. However, prediction accuracy is far from satisfactory without a sufficiently large training dataset. The transfer learning strategy used here can overcome this issue: you can use the small dataset you have at hand to fine-tune our model pre-trained on RPLC data.

For a demo using the transfer learning strategy, just type:

cd work
sh ../pipeline_mod_trans_emb.sh

Note that you must use the GPU version to load the pre-trained models; otherwise you have to train from scratch on CPU.

5 Make prediction using the trained models

Predicting the unknown RT of new peptides with a trained model is straightforward. In the demo below, the four parameters are, in order, the maximum RT, the saved RT model, the convolutional filter size and the test file:

python prediction_emb.py max_rt param/dia_all_trans_mod_epo20_dim24_conv10.pt 10 ${rt_file}

Before training, the RTs of all peptides are normalized (rt_norm=(rt-min_rt)/(max_rt-min_rt)), so max_rt is used here to convert the predictions back to the original RT scale (assuming min_rt==0).
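The normalization formula and its inverse can be written out as follows. This is a minimal sketch of the arithmetic only; the function names are hypothetical and do not come from prediction_emb.py.

```python
def normalize_rt(rt, min_rt, max_rt):
    """Map an RT onto [0, 1]: rt_norm = (rt - min_rt) / (max_rt - min_rt)."""
    return (rt - min_rt) / (max_rt - min_rt)


def denormalize_rt(rt_norm, min_rt, max_rt):
    """Invert normalize_rt to recover the RT on its original scale."""
    return rt_norm * (max_rt - min_rt) + min_rt


if __name__ == "__main__":
    # With min_rt == 0 the inverse reduces to rt_norm * max_rt.
    print(denormalize_rt(normalize_rt(55.0, 0, 110), 0, 110))
```

With min_rt equal to 0, only max_rt is needed to undo the normalization, which is why it is the first argument of the prediction command.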

6 Publication

doi: 10.1021/acs.analchem.8b02386 (PubMed)

7 Other models

Because ResNet and LSTM (both already optimized) were less accurate than the capsule network, the code for ResNet and LSTM has been deprecated, and DeepRT(+) (based on CapsNet) is recommended.

Of course, you can still use SVM for training: use data_adaption.py to convert the data format, and then import the result into Elude/GPTime.

8 CPU version

Running DeepRT on CPU is not recommended, because it is far too slow. However, if you have to, use capsule_network_emb_cpu.py instead of capsule_network_emb.py. You can set BATCH_SIZE to a very large value if you have enough memory.

9 Questions

contact
