DeepRT(+): ultra-precise peptide retention predictor

1 Installation

git clone https://github.com/horsepurve/DeepRTplus  
cd DeepRTplus

Then follow DeepRT_install.sh to install the prerequisites.

2 Scripts to reproduce the results

2.1 RPLC datasets

To apply DeepRT to the HeLa dataset (modifications included), run:

python data_split.py data/mod.txt 9 1 2
python capsule_network_emb.py

The HeLa data is split 9:1 (training:testing) with random seed 2, and then the capsule network begins training. The prediction result (about 0.985 ACC) and the log file are typically ready in about 3 minutes (on a laptop with a GTX 1070, for example).
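The seeded 9:1 split described above can be sketched as follows. This is a minimal illustration, not the code in data_split.py: the function name `split_lines` and the use of Python's `random` module are assumptions, and the actual script may shuffle or round differently.

```python
import random


def split_lines(lines, train_parts=9, test_parts=1, seed=2):
    """Shuffle records with a fixed seed and split them train_parts:test_parts.

    Hypothetical helper for illustration; data_split.py may differ in detail.
    """
    records = list(lines)
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(records)
    n_train = len(records) * train_parts // (train_parts + test_parts)
    return records[:n_train], records[n_train:]


# Toy records in the "sequence<TAB>RT" layout used by the datasets.
demo = [f"PEPTIDE{i}\t{1000 + i}" for i in range(20)]
train, test = split_lines(demo)
print(len(train), len(test))
```

Re-running with the same seed returns the same partition, which is what makes a reported result comparable across runs.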

To reproduce the results in the paper, run:

cd work
sh ../pipeline_mod.sh

The reports are then written to the work directory.

2.2 Other datasets

See data/README_data.md for a summary and run the corresponding pipeline. All the necessary parameters for those datasets are stored in config_backup.py.

3 Change to your own datasets

3.1 Datasets

Prepare your dataset in the following format:

sequence	RT
4GSQEHPIGDK	2507.67
GDDLQAIK	2996.73
FA2FNAYENHLK	4681.428
AH3PLNTPDPSTK	2754.66
WDSE2NSERDVTK	2645.274
TEEGEIDY2AEEGENRR	3210.3959999999997
SQGD1QDLNGNNQSVTR	2468.946

Separate the peptide sequence and RT (in seconds) with a tab (\t), and encode the modified amino acids as digits (currently only four kinds of modification are included in the pre-trained models):

'M[16]' -> '1',
'S[80]' -> '2',
'T[80]' -> '3',
'Y[80]' -> '4'
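The mapping above can be applied to bracket-annotated sequences with a small helper. This is a sketch only: `encode_modifications` is a hypothetical name, and the choice to reject any other bracketed modification is an assumption based on the note that only four kinds are covered by the pre-trained models.

```python
# Mapping taken from the list above: the modified residue is replaced by a digit.
MOD_CODES = {"M[16]": "1", "S[80]": "2", "T[80]": "3", "Y[80]": "4"}


def encode_modifications(sequence):
    """Replace each supported 'X[mass]' tag with its single-digit code."""
    for tag, digit in MOD_CODES.items():
        sequence = sequence.replace(tag, digit)
    # Anything still bracketed is a modification outside the four listed ones.
    if "[" in sequence or "]" in sequence:
        raise ValueError(f"unsupported modification in {sequence!r}")
    return sequence


print(encode_modifications("FAS[80]FNAYENHLK"))  # prints FA2FNAYENHLK
```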

3.2 Model parameters

Only a few parameters need to be specified. For example, for the HeLa data (the names are self-explanatory):

train_path = 'data/mod_train_2.txt' 
test_path = 'data/mod_test_2.txt' 
result_path = 'result/mod_test_2.pred.txt'
log_path = 'result/mod_test_2.log'
save_prefix = 'epochs'
pretrain_path = ''
dict_path = '' 

conv1_kernel = 10
conv2_kernel = 10

min_rt = 0
max_rt = 110 
time_scale = 60 # set to 60 if your retention times are in seconds
max_length = 50 # maximum length of the peptides

Then run:

python capsule_network_emb.py
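Before starting a long training run, it can help to check that a dataset fits the limits in the config. The sketch below is not part of DeepRT; `check_dataset` is a hypothetical helper. It assumes that RT values are divided by time_scale before being compared with min_rt and max_rt, which is what the comment on time_scale suggests but is not confirmed here.

```python
import csv
import io


def check_dataset(handle, min_rt=0, max_rt=110, time_scale=60, max_length=50):
    """Return (line_number, problem) pairs for rows outside the configured limits."""
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != ["sequence", "RT"]:
        problems.append((1, f"unexpected header: {header}"))
    for line_number, row in enumerate(reader, start=2):
        if len(row) != 2:
            problems.append((line_number, "expected two tab-separated fields"))
            continue
        sequence, rt_text = row
        try:
            rt = float(rt_text) / time_scale  # assumed rescaling, e.g. seconds to minutes
        except ValueError:
            problems.append((line_number, f"RT is not a number: {rt_text!r}"))
            continue
        if len(sequence) > max_length:
            problems.append((line_number, f"peptide longer than {max_length}"))
        if not min_rt <= rt <= max_rt:
            problems.append((line_number, f"scaled RT {rt:.2f} outside [{min_rt}, {max_rt}]"))
    return problems


sample = "sequence\tRT\n4GSQEHPIGDK\t2507.67\nGDDLQAIK\t7000\n"
for line_number, problem in check_dataset(io.StringIO(sample)):
    print(line_number, problem)
```

With a real file, pass `open(train_path, newline="")` instead of the in-memory sample.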

4 Transfer learning using our pre-trained models

Training deep neural network models is time-consuming, especially for large datasets such as the Misc dataset here. However, prediction accuracy is far from satisfactory without a sufficiently large training dataset. The transfer learning strategy used here overcomes this issue: you can use the small datasets you have at hand to fine-tune our model pre-trained on RPLC data.

For a demo of the transfer learning strategy, run:

cd work
sh ../pipeline_mod_trans_emb.sh

Note that you have to use the GPU version to load the pre-trained models; otherwise you have to train from scratch on CPU.

5 Make prediction using the trained models

Predicting the unknown RT of a new peptide with an existing model is straightforward. In the demo below, the four parameters are the maximum RT, the saved RT model, the convolutional filter size, and the testing file, respectively:

python prediction_emb.py max_rt param/dia_all_trans_mod_epo20_dim24_conv10.pt 10 ${rt_file}

Before training, we normalize the RTs of all peptides (rt_norm=(rt-min_rt)/(max_rt-min_rt)), so here max_rt is used to convert predictions back to the original RT scale (assuming min_rt==0).
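The normalization and its inverse can be written out directly from the formula above. The function names are illustrative, not taken from the DeepRT code, and the default of 110 simply mirrors the max_rt value in the HeLa config.

```python
def normalize_rt(rt, min_rt=0.0, max_rt=110.0):
    """Map an RT onto [0, 1]: rt_norm = (rt - min_rt) / (max_rt - min_rt)."""
    return (rt - min_rt) / (max_rt - min_rt)


def denormalize_rt(rt_norm, min_rt=0.0, max_rt=110.0):
    """Invert normalize_rt to recover the RT on its original scale."""
    return rt_norm * (max_rt - min_rt) + min_rt


print(normalize_rt(55.0))   # prints 0.5
print(denormalize_rt(0.5))  # prints 55.0
```

With min_rt equal to 0, the inverse reduces to multiplying the normalized prediction by max_rt, which is why max_rt is the only scale parameter passed to the prediction script.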

6 Publication

doi: 10.1021/acs.analchem.8b02386 (PubMed)

7 Other models

As ResNet and LSTM (both already optimized) were less accurate than the capsule network, the code for ResNet and LSTM has been deprecated, and DeepRT(+) (based on CapsNet) is recommended.

You can still use SVM for training: use data_adaption.py to change the data format, and then import the result into Elude/GPTime.

8 CPU version

Running DeepRT on CPU is not recommended because it is very slow. However, if you have to, use capsule_network_emb_cpu.py instead of capsule_network_emb.py. You can set BATCH_SIZE to a very large value if you have enough memory.

9 Questions

contact
