Paul Staab edited this page Jan 14, 2018 · 10 revisions

Output Format

Like its command-line arguments, scrm's output is built to mimic that of ms. Here is an example:

scrm 4 2 -t 5
2052984505

//
segsites: 6
positions: 0.0179738 0.465467 0.553021 0.713282 0.726011 0.761832 
000010
001110
000010
110001

//
segsites: 2
positions: 0.276044 0.313401 
10
11
00
00

The first two lines are the executed command and the random seed that was used. These two lines are sufficient to reproduce the simulation: to repeat this run, call scrm 4 2 -t 5 -seed 2052984505.

Next, the output contains two blocks, each beginning with //. Each block holds the summary statistics for a single locus. Here, we only ask for the segregating sites summary statistic, which tells us that there are 6 SNPs in the first locus and 2 SNPs in the second one. The segsites line is followed by the positions of the SNPs on the chromosome (where 0 and 1 mark the ends of the chromosome) and by a matrix that tells us whether a SNP (column) is present (1) or absent (0) in each of the four individuals (rows).
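To post-process this output, each block can be read into its SNP positions and haplotype rows. The following is a minimal sketch, not part of scrm; the function name parse_ms_output is hypothetical, and it assumes that every block starts with // and that haplotype rows consist only of 0s and 1s, as in the example above.

```python
def parse_ms_output(text):
    """Return one (positions, haplotypes) tuple per locus block."""
    loci = []
    positions, haplotypes = None, None
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            # A new locus begins; store the previous one, if any.
            if positions is not None:
                loci.append((positions, haplotypes))
            positions, haplotypes = [], []
        elif positions is None:
            # Command line and seed before the first "//".
            continue
        elif line.startswith("segsites:"):
            continue
        elif line.startswith("positions:"):
            positions = [float(x) for x in line.split()[1:]]
        elif line and set(line) <= {"0", "1"}:
            # One row per sampled individual, one column per SNP.
            haplotypes.append(line)
    if positions is not None:
        loci.append((positions, haplotypes))
    return loci
```

Other lines inside a block (such as an SFS line) are simply skipped by this sketch.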

In the output, individuals are listed in the same order in which they are mentioned on the command line. This is important when using multiple populations (-I) and possibly multiple sample times (-eI). For example, if the model specification includes

-I 3 2 0 2 -eI 1.0 0 2 1 -eI 0.5 1 0 0

individuals will be in the following order:

Individual  Population  Sample time
1           1           0.0
2           1           0.0
3           3           0.0
4           3           0.0
5           2           1.0
6           2           1.0
7           3           1.0
8           1           0.5
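The ordering rule can be written down as a short helper. This is a sketch, not scrm code: sample_order is a hypothetical name, and the -I and -eI arguments are passed in by hand rather than parsed from a command line. Note that the -eI events are processed in command-line order, not sorted by time.

```python
def sample_order(initial_counts, ei_events):
    """List sampled individuals in output order.

    initial_counts: sample sizes per population, as given to -I.
    ei_events: (time, counts) tuples, one per -eI, in command-line order.
    Returns (individual, population, sample_time) tuples.
    """
    order = []
    for time, counts in [(0.0, initial_counts)] + list(ei_events):
        for pop, count in enumerate(counts, start=1):
            order.extend([(pop, time)] * count)
    return [(i, pop, t) for i, (pop, t) in enumerate(order, start=1)]


# -I 3 2 0 2 -eI 1.0 0 2 1 -eI 0.5 1 0 0
for row in sample_order([2, 0, 2], [(1.0, [0, 2, 1]), (0.5, [1, 0, 0])]):
    print(*row)
```

Running it prints the same eight rows as the table above.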

Trees

The option -T prints the local trees in Newick format:

scrm 4 1 -T
3950440499

//
(((1:0.0512222,2:0.0512222):0.10392,3:0.155142):0.0471195,4:0.202262);
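A Newick tree like this one can be reduced to its TMRCA (the root height) and its total branch length with a small recursive parser. The sketch below is illustrative only: tree_stats is a hypothetical name, it assumes every non-root node carries a branch length, and it skips a leading bracketed prefix such as [1], which appears when recombination is enabled.

```python
def tree_stats(newick):
    """Return (tmrca, total_branch_length) of one Newick tree line."""
    s = newick.strip().rstrip(";")
    if s.startswith("["):
        # Skip a "[k]" prefix printed in front of each tree with recombination.
        s = s[s.index("]") + 1:]
    pos = 0

    def branch_length():
        # Read ":<number>" if present; the root has no branch length.
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            start = pos + 1
            pos = start
            while pos < len(s) and s[pos] not in ",)":
                pos += 1
            return float(s[start:pos])
        return 0.0

    def subtree():
        # Returns (height above the leaves, summed branch lengths below this node).
        nonlocal pos
        if s[pos] != "(":
            while s[pos] not in ":,)":  # skip a leaf label such as "3"
                pos += 1
            return 0.0, 0.0
        pos += 1  # consume "("
        height, total = 0.0, 0.0
        while True:
            child_height, child_total = subtree()
            length = branch_length()
            height = max(height, child_height + length)
            total += child_total + length
            separator = s[pos]
            pos += 1  # consume "," or ")"
            if separator == ")":
                return height, total

    return subtree()
```

For the tree above, this gives a TMRCA of 0.202262 and a total branch length of about 0.611.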

When recombination is enabled, the number in square brackets before each tree gives the number of sequence positions for which that tree is valid:

scrm 4 1 -T -r 1 10 -seed 17
17

//
[1](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[2](3:1.37223,((4:0.066595,1:0.066595):0.113835,2:0.18043):1.1918);
[1](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[6](2:0.296408,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):0.20011);

Here, the first tree spans the first position, the second one the next two positions, and so on; together the four trees cover all 10 positions of the locus. Please note that, for compatibility with ms, the number of positions covered by each tree is given as an integer. Internally, however, scrm assumes an infinite-sites model for both recombinations and mutations. It is therefore often a good idea, in particular when mapping mutations to trees, to use the -SC rel option, which makes scrm print the length of each tree's segment as a fraction of the locus length:

scrm 4 1 -T -r 1 10 -SC rel -seed 17
17

//
[0.0807038](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.198787](3:1.37223,((4:0.066595,1:0.066595):0.113835,2:0.18043):1.1918);
[0.0489976](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.00554392](2:1.64584,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):1.54954);
[0.665967](2:0.296408,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):0.20011);

Please note that this is the same simulation as above, which shows five trees where the integer output showed only four: the fourth tree was omitted from the integer output because its segment does not contain a single integer position.
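The relation between both outputs can be illustrated with a sketch that counts the integer positions 0, 1, ..., L-1 falling inside each tree's half-open segment [start, end). This rounding rule is an assumption that happens to reproduce the integer counts of this example; it is not taken from scrm's source. The same cumulative boundaries also let you look up the tree on which a SNP lies, assuming a run that combines -t with -T -SC rel so that both the SNP positions and the segment lengths are fractions of the locus length. The function names are hypothetical.

```python
import bisect
import math


def cumulative_ends(fractions):
    """Right end of each tree's segment, as a fraction of the locus."""
    ends, total = [], 0.0
    for f in fractions:
        total += f
        ends.append(total)
    return ends


def integer_counts(fractions, locus_length):
    """Assumed rule: count integer positions inside [start, end) of each segment."""
    counts, previous = [], 0
    for end in cumulative_ends(fractions):
        boundary = min(math.ceil(end * locus_length), locus_length)
        counts.append(boundary - previous)
        previous = boundary
    return counts


def tree_for_position(fractions, position):
    """0-based index of the tree whose segment contains a relative SNP position."""
    index = bisect.bisect_right(cumulative_ends(fractions), position)
    return min(index, len(fractions) - 1)


rel = [0.0807038, 0.198787, 0.0489976, 0.00554392, 0.665967]
print(integer_counts(rel, 10))  # [1, 2, 1, 0, 6]
```

Dropping the zero entry leaves 1, 2, 1, 6, the bracketed numbers of the integer output.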

Alternatively, you can use -O instead of -T to print the trees in the oriented forest format:

scrm 4 1 -O -r 1 100 -seed 16
16

//
{"length":70, "parents":[5,6,7,5,6,7,0], "node_times":[0,0,0,0,0.328647,0.559424,0.88458]}
{"length":30, "parents":[5,7,5,6,6,7,0], "node_times":[0,0,0,0,0.0700593,0.328647,0.559424]}

Here, each local tree is encoded as a JSON string in which length is again the number of positions for which the tree is valid (70 + 30 = 100 positions in this example). Nodes are identified by their 1-based position in the "parents" and "node_times" vectors. The parents vector gives the position of each node's parent (e.g. in the first tree, nodes 1 and 4 have node 5 as parent, which in turn has node 6 as parent). The root is the last node and has parent 0. The node_times vector gives the time of each node; the four sampled sequences (nodes 1 to 4) have time 0.
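Because each line is valid JSON, it can be read with Python's standard json module. The sketch below (forest_stats is a hypothetical name) derives the segment length, the root time and the total branch length of one tree, summing time(parent) - time(node) over all non-root nodes. It reports times in whatever units node_times uses.

```python
import json


def forest_stats(line):
    """Return (length, root_time, total_branch_length) for one -O line."""
    tree = json.loads(line)
    parents, times = tree["parents"], tree["node_times"]
    root = parents.index(0)  # 0-based index of the root (parent 0)
    total = 0.0
    for node, parent in enumerate(parents):
        if parent != 0:
            # Parents are 1-based, Python lists are 0-based.
            total += times[parent - 1] - times[node]
    return tree["length"], times[root], total
```

For the first tree above, this yields a length of 70, a root time of 0.88458 and a total branch length of about 2.657.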

TMRCA and tree length

The -L option prints the TMRCA and the total length of the local tree for each segment, in coalescent time units. Each line lists the TMRCA first, followed by the total tree length:

scrm 4 1 -L -r 1 100
1997945978

//
time:	1.443   	3.15757
time:	1.36679 	3.00516
time:	2.13995 	4.55148
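These lines are easy to collect for further analysis. A minimal sketch (parse_tmrca_lines is a hypothetical name) that keeps only lines starting with time: and reads the two whitespace-separated numbers:

```python
def parse_tmrca_lines(text):
    """Return (tmrca, total_tree_length) pairs from -L output."""
    results = []
    for line in text.splitlines():
        if line.startswith("time:"):
            tmrca, length = (float(x) for x in line.split()[1:3])
            results.append((tmrca, length))
    return results
```

As a quick sanity check, the total length of a tree with at least two samples is never smaller than twice its TMRCA, because the two subtrees below the root each contain a path of length TMRCA down to a leaf; all three lines above satisfy this.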

Site Frequency Spectrum

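If you use -oSFS along with -t, scrm additionally prints a site frequency spectrum for the complete sample. Its i-th entry counts the SNPs that are present in exactly i of the n sampled sequences (i = 1, ..., n-1); in the example below, one SNP is present in a single sequence and one in two sequences: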

scrm 4 1 -t 5 -oSFS
2428761806

//
segsites: 2
positions: 0.347745 0.780384 
00
10
00
11
SFS: 1 1 0 
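The same spectrum can be recomputed from the haplotype matrix by counting the 1s in each column. This is a sketch for illustration (site_frequency_spectrum is a hypothetical name), not the code scrm uses internally:

```python
def site_frequency_spectrum(haplotypes):
    """Entry i-1 counts SNPs present in exactly i of the n sequences, i = 1..n-1."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):  # one column per SNP
        carriers = sum(allele == "1" for allele in column)
        if 0 < carriers < n:
            sfs[carriers - 1] += 1
    return sfs


print(site_frequency_spectrum(["00", "10", "00", "11"]))  # [1, 1, 0]
```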