Output
Similar to the command line arguments, scrm's output is built to mimic that of ms. Here is an example:
```
scrm 4 2 -t 5
2052984505
//
segsites: 6
positions: 0.0179738 0.465467 0.553021 0.713282 0.726011 0.761832
000010
001110
000010
110001
//
segsites: 2
positions: 0.276044 0.313401
10
11
00
00
```
The first two lines are the executed command and the random seed used. Together they are sufficient to reproduce the simulation: to repeat this run, call `scrm 4 2 -t 5 -seed 2052984505`.
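Because the echoed command and the seed are all that is needed, building the reproduction command is easy to automate. The helper below is a hypothetical sketch (the name `reproduction_command` is ours, not part of scrm). It appends `-seed` only when the echoed command does not already contain one, which is the case when the seed was passed explicitly.

```python
def reproduction_command(command_line: str, seed_line: str) -> str:
    """Build a command that re-runs a simulation from the first two output lines.

    Hypothetical helper: appends "-seed <seed>" unless the echoed command
    already contains a -seed option.
    """
    args = command_line.split()
    if "-seed" in args:
        # The seed was given explicitly, so the echoed command is already reproducible.
        return " ".join(args)
    return " ".join(args + ["-seed", seed_line.strip()])


print(reproduction_command("scrm 4 2 -t 5", "2052984505"))
# scrm 4 2 -t 5 -seed 2052984505
```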
Next, the output contains two blocks, each beginning with `//`. Each block holds the summary statistics for a single locus. Here we only ask for the segregating sites summary statistic, which tells us that there are 6 SNPs in the first locus and 2 SNPs in the second one. The `segsites` line is followed by the positions of the SNPs on the locus (scaled so that 0 and 1 mark its ends) and by a matrix that shows whether each SNP (column) is present (1) or absent (0) in each of the four individuals (rows).
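To make the block structure concrete, here is a minimal parser sketch for this ms-style output. `parse_ms_output` and `Locus` are hypothetical names chosen for illustration; the sketch only handles the `segsites`, `positions` and haplotype lines and silently skips anything else (for example trees printed by `-T`). Note that each row is one individual and each column one SNP.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    segsites: int = 0
    positions: list[float] = field(default_factory=list)
    # One string per individual (row); one character per SNP (column).
    haplotypes: list[str] = field(default_factory=list)


def parse_ms_output(text: str) -> tuple[str, str, list[Locus]]:
    """Split ms-style output into the echoed command, the seed and one Locus per '//' block."""
    lines = [line.strip() for line in text.strip().splitlines()]
    command, seed = lines[0], lines[1]
    loci: list[Locus] = []
    for line in lines[2:]:
        if line == "//":
            loci.append(Locus())  # a new locus starts here
        elif not line or not loci:
            continue
        elif line.startswith("segsites:"):
            loci[-1].segsites = int(line.split(":", 1)[1])
        elif line.startswith("positions:"):
            loci[-1].positions = [float(x) for x in line.split(":", 1)[1].split()]
        elif set(line) <= {"0", "1"}:
            loci[-1].haplotypes.append(line)
    return command, seed, loci


example = """scrm 4 2 -t 5
2052984505
//
segsites: 6
positions: 0.0179738 0.465467 0.553021 0.713282 0.726011 0.761832
000010
001110
000010
110001
//
segsites: 2
positions: 0.276044 0.313401
10
11
00
00
"""
command, seed, loci = parse_ms_output(example)
print(seed, [locus.segsites for locus in loci])  # 2052984505 [6, 2]
# Column 0 of the first locus: the first SNP is carried only by individual 4.
print([row[0] for row in loci[0].haplotypes])  # ['0', '0', '0', '1']
```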
In the output, individuals are given in the same order as they are specified on the command line. This matters when using multiple populations (`-I`) and possibly multiple sample times (`-eI`). For example, if the model specification includes
```
-I 3 2 0 2 -eI 1.0 0 2 1 -eI 0.5 1 0 0
```
individuals will be in the following order:
| Individual | Population | Sample time |
|---|---|---|
| 1 | 1 | 0.0 |
| 2 | 1 | 0.0 |
| 3 | 3 | 0.0 |
| 4 | 3 | 0.0 |
| 5 | 2 | 1.0 |
| 6 | 2 | 1.0 |
| 7 | 3 | 1.0 |
| 8 | 1 | 0.5 |
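The ordering rule can be sketched in a few lines: walk through `-I` and then each `-eI` in command-line order, and within each one emit the samples population by population. `sample_order` is a hypothetical helper; it assumes the argument layout used in the example above (`-I` followed by the number of populations and one sample size per population, `-eI` followed by a time and one sample size per population) and only takes the sample sizes as input.

```python
def sample_order(initial: list[int],
                 ei_events: list[tuple[float, list[int]]]) -> list[tuple[int, float]]:
    """Return (population, sample time) for each individual, in output order.

    initial   -- sample sizes per population given to -I (population 1 first)
    ei_events -- (time, sizes per population) for each -eI, in command-line order
    """
    order = []
    # -I samples are taken at time 0.0; -eI blocks follow in the order given,
    # not sorted by time.
    for time, sizes in [(0.0, initial)] + ei_events:
        for pop, count in enumerate(sizes, start=1):
            order.extend([(pop, time)] * count)
    return order


# -I 3 2 0 2 -eI 1.0 0 2 1 -eI 0.5 1 0 0
order = sample_order([2, 0, 2], [(1.0, [0, 2, 1]), (0.5, [1, 0, 0])])
for individual, (pop, time) in enumerate(order, start=1):
    print(individual, pop, time)
```

The printed rows match the table: the `-eI 0.5` sample comes last even though it is older than time 0.0 and younger than the `-eI 1.0` samples.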
The option `-T` prints the local trees in Newick format:
```
scrm 4 1 -T
3950440499
//
(((1:0.0512222,2:0.0512222):0.10392,3:0.155142):0.0471195,4:0.202262);
```
When used with recombination, the number in square brackets before each tree gives the number of sequence positions for which this tree is valid:
```
scrm 4 1 -T -r 1 10 -seed 17
17
//
[1](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[2](3:1.37223,((4:0.066595,1:0.066595):0.113835,2:0.18043):1.1918);
[1](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[6](2:0.296408,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):0.20011);
```
Here, the first tree spans the first position, the second one the next two, and so on. Note that for compatibility with ms, these lengths are given as integer values, although internally scrm assumes an infinite-sites model for both recombination and mutation. It is often a good idea, in particular when mapping mutations to trees, to use the `-SC rel` option, which makes scrm print the length spanned by each tree as a fraction of the locus length:
```
scrm 4 1 -T -r 1 10 -SC rel -seed 17
17
//
[0.0807038](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.198787](3:1.37223,((4:0.066595,1:0.066595):0.113835,2:0.18043):1.1918);
[0.0489976](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.00554392](2:1.64584,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):1.54954);
[0.665967](2:0.296408,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):0.20011);
```
This is the same simulation as above. The fourth tree shown here is missing from the integer-valued output because it does not span a single integer position.
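The bracketed prefixes can be turned into explicit segments by accumulating the lengths. The sketch below (`tree_spans` is a hypothetical helper) converts either format into `(start, end, newick)` triples on `[0, locus_length]`. As an illustration of why the fourth tree disappears, it then rounds every segment boundary up to the next integer. This rounding rule is our assumption, chosen because it reproduces the integer output above (`[1]`, `[2]`, `[1]`, `[6]`); it is not necessarily how scrm computes those values internally.

```python
import math
import re

# "[length](newick);" as printed by -T when recombination is enabled.
TREE_LINE = re.compile(r"^\[([^\]]+)\](.*;)$")


def tree_spans(lines: list[str], locus_length: float,
               relative: bool) -> list[tuple[float, float, str]]:
    """Return (start, end, newick) for each local tree.

    The bracketed number is an absolute length by default, or a fraction
    of the locus length when the output was produced with -SC rel.
    """
    spans = []
    start = 0.0
    for line in lines:
        match = TREE_LINE.match(line.strip())
        if match is None:
            continue  # skip the command, the seed and '//' lines
        length = float(match.group(1))
        if relative:
            length *= locus_length
        spans.append((start, start + length, match.group(2)))
        start += length
    return spans


rel_output = """scrm 4 1 -T -r 1 10 -SC rel -seed 17
17
//
[0.0807038](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.198787](3:1.37223,((4:0.066595,1:0.066595):0.113835,2:0.18043):1.1918);
[0.0489976](((4:0.066595,1:0.066595):0.113835,2:0.18043):1.46541,3:1.64584);
[0.00554392](2:1.64584,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):1.54954);
[0.665967](2:0.296408,((4:0.066595,1:0.066595):0.0297034,3:0.0962985):0.20011);
"""
spans = tree_spans(rel_output.splitlines(), locus_length=10, relative=True)
# Assumed rounding rule: boundaries rounded up; a length of 0 means the tree
# covers no integer position and would be dropped from the integer output.
int_lengths = [math.ceil(end) - math.ceil(start) for start, end, _ in spans]
print(int_lengths)  # [1, 2, 1, 0, 6]
```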
You can also use `-O` instead of `-T` to print the trees in the oriented forest format:
```
scrm 4 1 -O -r 1 100 -seed 16
16
//
{"length":70, "parents":[5,6,7,5,6,7,0], "node_times":[0,0,0,0,0.328647,0.559424,0.88458]}
{"length":30, "parents":[5,7,5,6,6,7,0], "node_times":[0,0,0,0,0.0700593,0.328647,0.559424]}
```
Here each local tree is encoded as a JSON string, where `length` is again the number of positions for which the tree is valid. Each node is identified by its (1-based) position in the `parents` and `node_times` vectors. The `parents` vector gives the position of each node's parent (e.g. in the first tree, nodes 1 and 4 have node 5 as parent, which itself has node 6 as parent). The root is the last node and has parent 0. The `node_times` vector gives the time of each node.
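Since each line is plain JSON, it can be decoded with Python's standard `json` module. The sketch below (`summarize_oriented_tree` is a hypothetical helper) rebuilds the child lists from the 1-based `parents` vector and derives two quantities that are not fields of the format itself: the time of the root (TMRCA) and the total branch length, obtained by summing `node_times[parent] - node_times[node]` over all non-root nodes.

```python
import json


def summarize_oriented_tree(line: str) -> dict:
    """Decode one -O line and derive children, TMRCA and total branch length.

    Node k (1-based) has parent parents[k-1]; a parent of 0 marks the root.
    """
    tree = json.loads(line)
    parents, times = tree["parents"], tree["node_times"]
    children: dict[int, list[int]] = {}
    total_length = 0.0
    root = None
    for node, parent in enumerate(parents, start=1):
        if parent == 0:
            root = node
            continue
        children.setdefault(parent, []).append(node)
        # Length of the branch from this node up to its parent.
        total_length += times[parent - 1] - times[node - 1]
    return {
        "length": tree["length"],
        "children": children,
        "tmrca": times[root - 1],
        "total_length": total_length,
    }


first = summarize_oriented_tree(
    '{"length":70, "parents":[5,6,7,5,6,7,0], '
    '"node_times":[0,0,0,0,0.328647,0.559424,0.88458]}'
)
print(first["children"][5])  # [1, 4]
print(first["tmrca"])  # 0.88458
```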
The `-L` option prints the TMRCA and the total tree length (the sum of all branch lengths) for each segment, in coalescent time units:
```
scrm 4 1 -L -r 1 100
1997945978
//
time: 1.443 3.15757
time: 1.36679 3.00516
time: 2.13995 4.55148
```
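The `time:` lines are easy to collect for further analysis. In this hypothetical sketch (`parse_time_lines` is our name), each line becomes a `(TMRCA, total length)` pair. As a sanity check it uses a property of binary trees whose samples are all taken at time 0: the root has two child subtrees, each containing a root-to-leaf path of length TMRCA, so the total length cannot be smaller than twice the TMRCA. The check does not necessarily hold with ancient samples (`-eI`).

```python
def parse_time_lines(text: str) -> list[tuple[float, float]]:
    """Collect (TMRCA, total tree length) pairs from the 'time:' lines printed by -L."""
    pairs = []
    for line in text.splitlines():
        if line.startswith("time:"):
            tmrca, total = (float(x) for x in line.split()[1:3])
            pairs.append((tmrca, total))
    return pairs


output = """scrm 4 1 -L -r 1 100
1997945978
//
time: 1.443 3.15757
time: 1.36679 3.00516
time: 2.13995 4.55148
"""
for tmrca, total in parse_time_lines(output):
    # Only valid when all samples are taken at time 0 (see the lead-in).
    assert total >= 2 * tmrca
    print(tmrca, total)
```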
If you use `-oSFS` along with `-t`, scrm also prints the site frequency spectrum of the complete sample:
```
scrm 4 1 -t 5 -oSFS
2428761806
//
segsites: 2
positions: 0.347745 0.780384
00
10
00
11
SFS: 1 1 0
```