Walk-through example

Content

1) Introduction

The objective of puutools is to efficiently update the structure of a lineage tree during a simulation, so that the amount of data held in live memory is minimized. The puutools algorithms have been optimized for this purpose.

First consult the main documentation to install puutools on your device.

puutools is distributed as an external static library. Once installed, the header must be included with the standard #include directive:

#include <puutools.h>

The main structure you will manipulate is the class puu_tree, which provides a dynamic representation of a lineage or coalescence tree. puu_tree is a template class:

template <typename selection_unit>
class puu_tree

where selection_unit can be any class of your own; the only constraint is that its copy constructor must be fully implemented.

For example, if your individual class is Cell, the tree is instantiated as:

puu_tree<Cell> my_tree;

In this example, we implement a simple evolutionary simulation with a constant population of size $N$. Individuals are asexual and generations are non-overlapping. Each individual has a single phenotypic trait $x \in \mathbb{R}$, which mutates with probability $m$ (per individual per generation) with a mutation size $s$: after mutation, the trait becomes $x' = x + \epsilon,\ \epsilon \sim \mathcal{N}(0, s)$. The fitness function is Gaussian: $w = e^{-\frac{x^2}{2}}$. Reproduction is fitness-proportionate: at each generation, the numbers of descendants are drawn from a multinomial distribution in which each individual's probability is proportional to its fitness.

We will now walk through puutools classes step by step.

2) Pre-processor #include directives

We first include the necessary standard library (std) utilities and the puutools library:

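#include <iostream>
#include <fstream>   // std::ofstream, used to write the output files
#include <vector>
#include <tuple>     // std::tie
#include <cassert>
#include <cstdlib>   // atof, atoi
#include <ctime>     // time, used to seed the PRNG
#include <puutools.h>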

We then include three classes that have been implemented specifically for this tutorial (see the example folder of this repository):

#include "Prng.h"
#include "Individual.h"
#include "Simulation.h"

The Prng class contains several random number generation functions based on the GNU Scientific Library. The class Individual contains the structure of an individual (one phenotypic trait and one fitness value, plus a few methods); puutools uses this class to instantiate the trees, so it must have a properly defined copy constructor. The class Simulation contains all the code needed to run an evolutionary simulation.

3) Read command line parameters

We need to define five parameters:

  • The initial trait value $x_0$;
  • The simulation time $T$ (in generations);
  • The population size $N$;
  • The mutation rate $m$;
  • The mutation size $s$.

Let's implement a piece of code to read our parameters from the command line:

int main( int argc, char const** argv )
{
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 1) Read simulation parameters         */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

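  /* Check the number of arguments explicitly: assert() is disabled in
     release (NDEBUG) builds, which would leave argv unchecked */
  if (argc != 6)
  {
    std::cerr << "Usage: " << argv[0] << " initial_trait_value simulation_time population_size mutation_rate mutation_size" << std::endl;
    return 1;
  }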
  double  initial_trait_value = atof(argv[1]);
  int     simulation_time     = atoi(argv[2]);
  int     population_size     = atoi(argv[3]);
  double  mutation_rate       = atof(argv[4]);
  double  mutation_size       = atof(argv[5]);
  std::cout << "> Running a simulation with the following parameters:" << std::endl;
  std::cout << "  • Initial trait value: " << initial_trait_value << std::endl;
  std::cout << "  • Simulation time    : " << simulation_time << std::endl;
  std::cout << "  • Population size    : " << population_size << std::endl;
  std::cout << "  • Mutation rate      : " << mutation_rate << std::endl;
  std::cout << "  • Mutation size      : " << mutation_size << std::endl;

4) Instantiate the pseudo-random number generator (PRNG)

We then instantiate a PRNG, seeded with the current time:

  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 2) Create the prng                    */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

  Prng prng(time(0));

5) Initialize the population

This step creates the simulation and initializes the population:

  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 3) Create the simulation              */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

  Simulation simulation(&prng, initial_trait_value, population_size, mutation_rate, mutation_size);
  simulation.initialize_population();

6) Create a lineage and a coalescence tree, and add the roots

We will create two trees:

  • A lineage tree, containing parent-child relationships at every generation,
  • A coalescence tree, which will only contain common ancestors.

  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 4) Create trees and add roots         */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

  puu_tree<Individual> lineage_tree;
  puu_tree<Individual> coalescence_tree;

  for (int i = 0; i < population_size; i++)
  {
    lineage_tree.add_root(simulation.get_individual(i));
    coalescence_tree.add_root(simulation.get_individual(i));
  }

We first instantiate two trees with the class Individual (your own class does not need to be named "Individual").

We then create a root in both trees for each of the $N$ individuals at generation zero, using the function add_root(*individual). Every tree must be rooted this way at the beginning of a simulation.

7) Run the evolutionary algorithm

This is the core of our algorithm. The different tasks are written as separate loops for clarity; however, the code can be optimized by merging several loops together. At each generation:

  1. The next generation of individuals is created;
  2. All reproduction events are added to the trees;
  3. The previous generation is "inactivated" in the trees (i.e. parents die);
  4. The population is updated with next generation's individuals;
  5. Tree structures are updated.

  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 5) Evolve the population              */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

  for (int generation = 1; generation <= simulation_time; generation++)
  {
    if (generation%1000==0)
    {
      std::cout << ">> Generation " << generation << "\n";
    }

    /* STEP 1 : Create the next generation
       ------------------------------------ */
    simulation.create_next_generation();

    /* STEP 2 : Add reproduction events
       --------------------------------- */
    Individual* parent;
    Individual* descendant;
    std::tie(parent, descendant) = simulation.get_first_parent_descendant_pair();
    while (parent != NULL)
    {
      lineage_tree.add_reproduction_event(parent, descendant, (double)generation);
      coalescence_tree.add_reproduction_event(parent, descendant, (double)generation);
      std::tie(parent, descendant) = simulation.get_next_parent_descendant_pair();
    }

    /* STEP 3 : Inactivate parents
       ---------------------------- */
    for (int i = 0; i < population_size; i++)
    {
      lineage_tree.inactivate(simulation.get_individual(i), true);
      coalescence_tree.inactivate(simulation.get_individual(i), false);
    }

    /* STEP 4 : Replace the current population with the new one
       --------------------------------------------------------- */
    simulation.update_population();

    /* STEP 5: Update the lineage and coalescence trees
       ------------------------------------------------- */
    lineage_tree.update_as_lineage_tree();
    coalescence_tree.update_as_coalescence_tree();
  }

At STEP 2, we register every reproduction event in the trees to add the new node relationships. This is done with the method add_reproduction_event(*parent, *child, time).

At STEP 3, we must indicate to each tree which individuals from the previous generation are now dead, using the method inactivate(*individual, copy). The parameter copy is a boolean (true/false). If true, the tree creates a copy of the individual and stores it independently from the main simulation (this is why puutools requires a copy constructor). When to call inactivate(*individual, copy) depends on your algorithm: a parent and its children may both remain alive at the next generation (e.g. in a bacterial population). However, calling this function is mandatory, because tree-structure manipulations can only be applied to inactivated nodes.

Note also that at STEP 3, we copy the dead individuals into the lineage tree, but not into the coalescence tree. This is because we will later recover the evolution of the phenotypic trait and the fitness from the lineage tree, whereas we will only extract the structure of the coalescence tree.

💡 TIP: STEP 5 does not need to be executed at every generation. Updating a tree more often increases the computational load; updating it less often increases the memory load (trees grow at each generation until they are pruned and shortened). Choose the update period according to the performance of your own code.

💡 TIP: The size of a coalescence tree remains approximately constant over time ($2N-1$ nodes), while a lineage tree grows slowly. Depending on the complexity of your simulation, it can be useful to create a secondary class that stores only the most important information about your individuals (e.g. phenotypic trait values, mutational events, etc.) and to provide it to the trees instead of your main individual class.

8) Final step: extracting information from the trees

Now that the simulation has ended, we extract some information from the trees. We first call the update functions one last time to ensure a correct final structure:

  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
  /* 6) Save lineage and coalescence data  */
  /*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/

  lineage_tree.update_as_lineage_tree();
  coalescence_tree.update_as_coalescence_tree();

We first retrieve the lineage of the last best individual. To do so, we get the best individual's node using the method get_node_by_selection_unit(*individual), then trace back the lineage of this node using the method get_parent() until the root of the tree is reached. Along the way, we write statistics to a file:

  /* Save the lineage of the last best individual
     --------------------------------------------- */
  std::ofstream file("./output/lineage_best.txt", std::ios::out | std::ios::trunc);
  file << "generation mutation_size trait fitness" << std::endl;
  puu_node<Individual>* best_node = lineage_tree.get_node_by_selection_unit(simulation.get_best_individual());
  while (best_node != NULL)
  {
    file << best_node->get_insertion_time() << " ";
    file << best_node->get_selection_unit()->get_mutation_size() << " ";
    file << best_node->get_selection_unit()->get_trait() << " ";
    file << best_node->get_selection_unit()->get_fitness() << std::endl;
    best_node = best_node->get_parent();
    file.flush();
  }
  file.close();

We then save the data of the whole lineage tree. To do so, we iterate with the methods get_first() and get_next(): get_first() returns the first node, each call to get_next() returns the following one, and get_next() returns NULL once all nodes have been visited.

  /* Save the lineage of all alive individuals
     ------------------------------------------ */
  file.open("./output/lineage_all.txt", std::ios::out | std::ios::trunc);
  file << "generation mutation_size trait fitness" << std::endl;
  puu_node<Individual>* node = lineage_tree.get_first();
  while (node != NULL)
  {
    file << node->get_insertion_time() << " ";
    file << node->get_selection_unit()->get_mutation_size() << " ";
    file << node->get_selection_unit()->get_trait() << " ";
    file << node->get_selection_unit()->get_fitness() << std::endl;
    file.flush();
    node = lineage_tree.get_next();
  }
  file.close();

Finally, we save the structure of the coalescence tree in Newick format (.phb extension):

  /* Save the coalescence tree
     -------------------------- */
  coalescence_tree.write_newick_tree("./output/coalescence_tree.phb");

9) Results

This simulation example is available in the folder example of this repository and can be compiled with CMake: navigate to the folder example/cmake in a terminal and run the following command:

  sh make_release.sh

The binary executable puutools_example is located in the folder example/build/bin.

As an example, a simulation has been run by shifting an initial population of size $N=200$ away from the fitness optimum (initial trait value $x_0 = 2$). The simulation time is $T=10000$ generations, with a mutation rate $m=0.02$ and a mutation size $s=0.02$:

  ../build/bin/puutools_example 2.0 10000 200 0.02 0.02

Output files are written to the folder example/output, which also contains an R script to generate a figure. Here, we can see that the population evolved towards the optimum. Because we recovered the lineage of the last best individual, we also have access to the size of fixed mutations.