Description
Hello,
I have now moved on to generating the synthetic reads themselves. However, when I check the alignment with 'samtools tview', each read shows approximately 30 mismatches over its 76 bp length. Once these synthetic reads are converted to FASTQ format and realigned using BWA, most of them end up as unaligned records. I have tried tweaking the SimSeq parameters, so far to no avail. The original BAM file is free of duplicates and has a much lower error rate, although some regions may have a high error rate due to structural events.
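To put that figure in perspective, here is a minimal sketch of the arithmetic. The function name is my own and not part of SimSeq, and the count of 30 is only my rough estimate from tview. It converts the observed mismatches into a per-base rate and into the equivalent Phred score, Q = -10 log10(p):

```python
import math

def phred_from_error_rate(p):
    """Return the Phred quality Q = -10 * log10(p) for a per-base error probability p."""
    return -10.0 * math.log10(p)

mismatches = 30   # approximate mismatches per read, as eyeballed in tview
read_length = 76  # read length in bp

rate = mismatches / read_length
print(f"per-base mismatch rate: {rate:.3f}")
print(f"equivalent Phred score: {phred_from_error_rate(rate):.1f}")
```

This gives a mismatch rate of roughly 39%, i.e. about Q4, which is far beyond what I would expect BWA to tolerate on 76 bp reads.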
In order to generate the alignment I did the following:
-estimate the error profile for an existing BAM file using getErrorProfile
-generate a modified sequence according to my own needs
-feed that sequence and the error profile into SimSeq
-add headers to the generated SAM file
-convert the SAM to BAM
-sort the BAM
-use tview to examine the alignment against the modified sequence
I also ran the final steps once (SamToFastq, then BWA).
The command line I used for SimSeq is as follows:
java -jar -Xmx2048m SimSeq.jar  -1 76 -2 76 -e ep.txt -l 5500 -m -n 64000000 -o synth.sam --phred64 -r modified.fa --mate_pulldown_error_p 0.7
I tried several values for the mate pulldown error probability, but it didn't help. My guess is that the values in ep.txt are overestimated for some reason. Is there any way either to control how the program uses them, or to lower them in a principled way to more realistic values? Also, you mentioned that the values are single-position error probabilities and not conditional ones. Could this be the cause?
Thanks in advance,
A./