2.0 version
MAAG
- Implement shifts in event probabilities.
- In Clonotypes.
- In MAAGBuilder.
- Replace generation probability computation with a forward-algorithm-like procedure.
- Add MarkovChain with errors.
- Tests
- Implement errors in MAAGBuilder.
- V.
- D.
- J.
- Implement errors in MAAGForwardBackward.
- VJ (waiting for 100K test)
- VDJ
- Implement errors in alignments.
- Fix replacement of MAAG event probabilities in MAAGBuilder.
- Add move assignment operator to MAAG.
- Which value should the error probability be initialised with?
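The forward-algorithm-like procedure can be pictured as follows, assuming the MAAG is a chain of probability matrices whose product gives the generation probability. Instead of enumerating every path, it carries a vector of partial sums through the chain. This is a minimal sketch: the dense `Matrix` type and `generation_probability` are hypothetical and do not reflect Ymir's actual MAAG storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix (not Ymir's MAAG storage).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Forward-style evaluation of the matrix-chain product.
// Assumes a non-empty chain whose first matrix is 1 x n and whose
// adjacent matrices have compatible dimensions.
double generation_probability(const std::vector<Matrix>& chain) {
    std::vector<double> forward(chain.front().data);  // partial sums, 1 x n
    for (std::size_t k = 1; k < chain.size(); ++k) {
        const Matrix& m = chain[k];
        std::vector<double> next(m.cols, 0.0);
        for (std::size_t i = 0; i < m.rows; ++i)
            for (std::size_t j = 0; j < m.cols; ++j)
                next[j] += forward[i] * m.at(i, j);  // fold matrix k into the sums
        forward.swap(next);
    }
    // Sum the remaining entries; if the last matrix is m x 1 there is only one.
    double total = 0.0;
    for (double p : forward) total += p;
    return total;
}
```

The cost is linear in the chain length (times the matrix sizes), whereas enumerating paths grows with the product of all matrix dimensions.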
PAM
- Implement a PAM + inference algorithm with errors in alignments.
- VJ
- VDJ
- Fix segfault
"There are four common mistakes that lead to segmentation faults: dereferencing NULL, dereferencing an uninitialized pointer, dereferencing a pointer that has been freed (or deleted, in C++) or that has gone out of scope (in the case of arrays declared in functions), and writing off the end of an array.
A fifth way of causing a segfault is a recursive function that uses all of the stack space. On some systems, this will cause a "stack overflow" report, and on others, it will merely appear as another type of segmentation fault. "
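Two of the causes quoted above, writing off the end of an array and dereferencing a null pointer, can be turned into recoverable errors with standard C++ facilities. This is a minimal sketch; `safe_store` and `value_or` are hypothetical helper names, not part of Ymir.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Bounds-checked write: std::vector::at throws std::out_of_range
// instead of silently writing past the end of the buffer.
bool safe_store(std::vector<int>& buffer, std::size_t index, int value) {
    try {
        buffer.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // index was outside the buffer
    }
}

// Guarded dereference: check for null before reading through the pointer.
int value_or(const std::unique_ptr<int>& ptr, int fallback) {
    return ptr ? *ptr : fallback;
}
```

To locate the actual fault, building with `-fsanitize=address` (GCC and Clang) or running under gdb or Valgrind usually points directly at the offending line.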
IO
- Fix Python converter (a single V / D / J alignments column instead of separate start/end columns).
- Fix writer
- Refactor parser.
- Refactor the parser to use the new aligner, with virtual functions instead of templates.
- Implement a separate class for aligning all genes to clonotype sequences. The user can optionally pass it as an object to Parser.
- Implement SW local aligner for Variable genes.
- Implement SW local aligner for Joining genes.
- Add translation subroutine.
- Add aligner parameters, e.g. thresholds for alignment length and score.
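The SW local aligner items above could start from the classic Smith-Waterman recurrence. This sketch returns only the best local score with a linear gap penalty; the `SWScores` values are arbitrary placeholders, not Ymir's aligner parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring scheme for illustration only.
struct SWScores {
    int match = 2;
    int mismatch = -1;
    int gap = -2;  // linear gap penalty
};

// Best local alignment score between a gene segment and a read
// (Smith-Waterman with a linear gap penalty).
int smith_waterman_score(const std::string& gene, const std::string& read,
                         const SWScores& s = SWScores{}) {
    const std::size_t n = gene.size(), m = read.size();
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int diag = H[i - 1][j - 1] + (gene[i - 1] == read[j - 1] ? s.match : s.mismatch);
            int up = H[i - 1][j] + s.gap;
            int left = H[i][j - 1] + s.gap;
            // The zero floor lets an alignment restart anywhere: this makes it local.
            H[i][j] = std::max({0, diag, up, left});
            best = std::max(best, H[i][j]);
        }
    }
    return best;
}
```

A score threshold, as in the aligner-parameters item, would then simply be a comparison such as `smith_waterman_score(gene, read) >= min_score`.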
2.1 version
MAAG
- Add MarkovChain to MAAG (for amino acids).
- VJ
- VDJ
- Implement MAAGaa
- VJ
- VDJ
- Implement amino acid sequence MAAG builder.
- Tests.
IO
- Implement amino acid aligner.
- VJ
- Tests.
- VDJ
- Tests.
- VJ
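The amino acid aligner, like the translation subroutine listed for 2.0, needs to convert nucleotide sequences into amino acids. Below is a minimal sketch using the standard genetic code. The function name `translate` and its handling of non-ACGT characters (mapped to `X`) are assumptions, not Ymir's behaviour.

```cpp
#include <cstddef>
#include <string>

// Translates uppercase A/C/G/T into amino acids with the standard genetic code.
// '*' marks stop codons, 'X' marks codons with other characters;
// trailing bases that do not form a full codon are ignored.
std::string translate(const std::string& nuc) {
    // Codon table in TCAG order: index = 16*first + 4*second + third, T=0 C=1 A=2 G=3.
    static const char kTable[] =
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG";
    auto code = [](char c) -> int {
        switch (c) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default:  return -1;
        }
    };
    std::string protein;
    for (std::size_t i = 0; i + 2 < nuc.size(); i += 3) {
        int a = code(nuc[i]), b = code(nuc[i + 1]), c = code(nuc[i + 2]);
        protein += (a < 0 || b < 0 || c < 0) ? 'X' : kTable[16 * a + 4 * b + c];
    }
    return protein;
}
```

For example, `translate("ATGGCCTAA")` yields `"MA*"`.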
2.2 version
PAM
- Data diversity measure.
- Implement and test new secret EM algorithm.
- Save the iteration count for each parameter, not globally.
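The per-parameter iteration item could be handled with a small counter keyed by parameter name, so that a parameter skipped in some EM steps keeps its own history. This is a minimal sketch. `IterationCounter` and the parameter names used with it are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical bookkeeping: one iteration counter per parameter (or group).
class IterationCounter {
public:
    // Call after `param` has actually been updated in an EM step.
    void mark_updated(const std::string& param) { ++iters_[param]; }

    // Number of updates so far; 0 for a parameter never updated.
    std::size_t iterations(const std::string& param) const {
        auto it = iters_.find(param);
        return it == iters_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::size_t> iters_;
};
```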
2.3 version
Optimisations
- Add parallelisation.
- Parallel MAAG building.
- Parallel marginal probs updates in EM.
http://www.futurechips.org/tips-for-power-coders/writing-optimizing-parallel-programs-complete.html
- Optimise MAAG updating in the builder.
- Replace objects in MAAGBuilder with pointers.
- Do not reallocate ProbMMC in MAAGForwardBackward at each step; clear and resize it instead, so the reserved space stays the same.
- Compute full probabilities while building MAAGs and store them in the MAAGs.
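The ProbMMC item relies on a standard `std::vector` property: `clear()` keeps the capacity, and `resize(n)` does not reallocate while `n <= capacity()`. This is a minimal sketch with a hypothetical `StepBuffer` standing in for ProbMMC.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-step probability buffer, allocated once up front.
class StepBuffer {
public:
    explicit StepBuffer(std::size_t max_size) { data_.reserve(max_size); }

    // Prepare the buffer for the next step: `size` zero-filled elements.
    std::vector<double>& prepare(std::size_t size) {
        data_.clear();            // destroys elements, capacity unchanged
        data_.resize(size, 0.0);  // no reallocation while size <= capacity()
        return data_;
    }

    std::size_t capacity() const { return data_.capacity(); }

private:
    std::vector<double> data_;
};
```

Reserving the largest size expected over all steps once means the forward-backward loop never touches the allocator.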
2.4 version
Docs
- Add support for high precision numbers or decide to work only with long doubles.
- Write API documentation using Doxygen.
- Write general / usage documentation using MkDocs.
- Publish all documentation on GitHub pages.
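A possible shape for the Doxygen comments, combined with the precision decision above: keep the probability type behind one alias so that switching to `long double` or a high-precision type is a one-line change. This is a minimal sketch; `prob_t` and `probability_product` are hypothetical names, not existing Ymir code.

```cpp
#include <vector>

/// Floating-point type used for all probabilities (hypothetical alias).
/// Changing it here changes the precision everywhere it is used.
using prob_t = long double;

/**
 * @brief Computes the product of a sequence of probabilities.
 *
 * @param probs Probabilities to multiply; an empty vector yields 1.
 * @return The product of all elements of @p probs.
 */
prob_t probability_product(const std::vector<prob_t>& probs) {
    prob_t result = 1.0L;
    for (prob_t p : probs) result *= p;
    return result;
}
```

Doxygen picks up both the `///` and `/** ... */` comment styles, along with `@brief`, `@param` and `@return`.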
2.5 version
IO
- MAAG serialization.
- Binary representation.
- Tests.
- Reading.
- Tests.
- Writing.
- Tests.
- Binary representation.
- ??? Memory-mapped MAAG repertoire for very large files (align -> save to disk -> read from the memory-mapped file).
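The binary MAAG serialization could start from a length-prefixed layout for each probability matrix. The sketch below assumes a hypothetical format of `[uint32 rows][uint32 cols][rows*cols doubles]` in native byte order; a real format would also need a version tag and a fixed endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory round trips
#include <vector>

// Hypothetical serializable probability matrix, row-major.
struct ProbMatrix {
    std::uint32_t rows = 0, cols = 0;
    std::vector<double> values;
};

void write_matrix(std::ostream& out, const ProbMatrix& m) {
    out.write(reinterpret_cast<const char*>(&m.rows), sizeof(m.rows));
    out.write(reinterpret_cast<const char*>(&m.cols), sizeof(m.cols));
    out.write(reinterpret_cast<const char*>(m.values.data()),
              static_cast<std::streamsize>(m.values.size() * sizeof(double)));
}

// Returns false if the stream ends before a full matrix was read.
bool read_matrix(std::istream& in, ProbMatrix& m) {
    if (!in.read(reinterpret_cast<char*>(&m.rows), sizeof(m.rows))) return false;
    if (!in.read(reinterpret_cast<char*>(&m.cols), sizeof(m.cols))) return false;
    m.values.resize(static_cast<std::size_t>(m.rows) * m.cols);
    in.read(reinterpret_cast<char*>(m.values.data()),
            static_cast<std::streamsize>(m.values.size() * sizeof(double)));
    return static_cast<bool>(in);
}
```

The same functions work with `std::ofstream`/`std::ifstream` opened in `std::ios::binary` mode.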
Far Future
MAAG
- Add checks for zero or error gene segments and other events in MAAG builder.
AAPAG
- Implement AAPAG (Amino Acid Pattern Assembly Graph).
- Implement fast generation of neighbour amino acid sequences.
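For the neighbour generation item, one simple definition of a neighbour is a single amino acid substitution (Hamming distance 1). The sketch below uses that definition, which is an assumption; insertions and deletions are not generated. It mutates one working copy in place instead of rebuilding each string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All sequences at Hamming distance exactly 1 over the 20 standard amino acids.
std::vector<std::string> substitution_neighbours(const std::string& seq) {
    static const std::string kAminoAcids = "ACDEFGHIKLMNPQRSTVWY";
    std::vector<std::string> result;
    result.reserve(seq.size() * (kAminoAcids.size() - 1));
    std::string work = seq;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        const char original = seq[i];
        for (char aa : kAminoAcids) {
            if (aa == original) continue;  // skip the unchanged sequence
            work[i] = aa;
            result.push_back(work);
        }
        work[i] = original;  // restore before moving to the next position
    }
    return result;
}
```

A sequence of length L over the standard alphabet has 19 * L such neighbours.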
Optimisations
- Experiment with SIMD https://github.com/p12tic/libsimdpp
- Markov chains and probabilities in forward-backward.
- Computation of full probabilities.
- Rewrite everything using templates, so the code has no unnecessary "if"s; basic scripts (compute, inference and generate) for each possible recombination type.
- Use return value optimisation wherever possible.
- Check if lazy evaluation can be added anywhere.
- Decide whether to refactor MarkovChain in MAAGBuilder.
- Branching (if-statement) optimisations.
- Try always building the event indices MMC, but do not include it in the resulting MAAG.
- Move the if (full_build) checks out of the loops in MAAGBuilder, so that each branch runs its own single loop.
- Use ?: instead of if-else in MAAGBuilder deletions and insertions.
- Compare the speed of ClonotypeBuilder procedures returning void vs. returning ClonotypeBuilder&.
- Use fixed-size matrices in some cases, such as VJ deletions, because all VJ gene segment sequences are similar in size. (???)
- Rewrite ModelParameterVector with plain arrays.
- Optimise the sequence class (currently std::string); use bit vectors to improve speed and memory usage.
- Add a compilation option that removes all verbose output for speed.
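The "rewrite using templates" and branching items could rely on making the recombination type a template parameter. The "has a D segment?" check is then resolved at compile time, and each instantiation contains only its own code (C++17 `if constexpr`). This is a minimal sketch. The `VJ`/`VDJ` tags and `segment_usage_prob` are hypothetical, and multiplying independent segment probabilities is purely illustrative.

```cpp
#include <type_traits>

// Hypothetical recombination tags.
struct VJ {};
struct VDJ {};

// Product of the chosen segments' probabilities, assuming (for illustration
// only) independent segment choice.
template <typename Recombination>
double segment_usage_prob(double p_v, double p_d, double p_j) {
    double result = p_v * p_j;
    if constexpr (std::is_same_v<Recombination, VDJ>) {
        result *= p_d;  // compiled only into the VDJ instantiation
    }
    return result;
}
```

Unlike a runtime `if (is_vdj)`, the discarded branch does not exist in the VJ binary code at all.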
Refactoring
- Replace all raw pointers with std::unique_ptr.
- Replace my own tests with Google Test.
- Use std::shared_ptr for VDJRecombinationGenes.
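The smart pointer items could look like this: exclusive ownership through `std::unique_ptr`, and one gene set shared between several models through `std::shared_ptr`. This is a minimal sketch. `MAAGStub`, `Model` and the fields of `VDJRecombinationGenes` are hypothetical stand-ins for the real classes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-ins for the real classes.
struct VDJRecombinationGenes {
    std::vector<std::string> v_names, j_names;
};
struct MAAGStub { std::vector<double> probs; };

class Model {
public:
    explicit Model(std::shared_ptr<const VDJRecombinationGenes> genes)
        : genes_(std::move(genes)), maag_(std::make_unique<MAAGStub>()) {}

    // Several models share one gene set without copying it; it is freed
    // automatically when the last owner is destroyed.
    const VDJRecombinationGenes& genes() const { return *genes_; }
    MAAGStub& maag() { return *maag_; }

private:
    std::shared_ptr<const VDJRecombinationGenes> genes_;
    std::unique_ptr<MAAGStub> maag_;  // sole owner, no manual delete
};
```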
Other
- Add error rate as an argument to the script.
- Models as gzipped folders.
- Add speed benchmarks.
- Estimate the memory used by MAAG matrices and compare it to the graph representation (pointers occupy memory too!). On average, the matrix representation should be more space-efficient.
- Installation instructions for GMP and GCC: https://github.com/davidsd/sdpb/blob/master/Readme.md#mac-os-x-installation
- Add progress bars.
http://stackoverflow.com/questions/14539867/how-to-display-a-progress-indicator-in-pure-c-c-cout-printf
https://gist.github.com/cmckni3/6420109
- ?? Design an R package for cloneset analysis using generation probabilities, with an interface to call the Ymir tool.
- ?? Add a set of Python scripts for easy integration of Ymir into other software tools and scripts.
- Add a binary (very convenient, but not human-readable) format for models (parameters and sequences).
- Add Python script for converting between text <-> binary representations.
- Always save both representations.
- Add an easy way to create models from existing models with different parameters.
- ?? Docker container for Ymir?
- Add the "Algorithm" field to model JSON files.
- Add Python bindings with pybind11 and https://github.com/tbenthompson/cppimport
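For the progress bars item, the approach in the linked answers is to redraw a single terminal line with a carriage return. This is a minimal sketch. `progress_bar` and `print_progress` are hypothetical names, and the bar width and characters are arbitrary.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Builds a text bar such as "[#####-----] 50%".
std::string progress_bar(std::size_t done, std::size_t total, std::size_t width = 10) {
    if (done > total) done = total;  // clamp so the bar never overflows
    const std::size_t filled = total == 0 ? width : done * width / total;
    const std::size_t percent = total == 0 ? 100 : done * 100 / total;
    return "[" + std::string(filled, '#') + std::string(width - filled, '-') + "] " +
           std::to_string(percent) + "%";
}

// '\r' moves the cursor back to the line start; flushing shows the update now.
void print_progress(std::size_t done, std::size_t total) {
    std::cerr << '\r' << progress_bar(done, total) << std::flush;
    if (done >= total) std::cerr << '\n';
}
```

Writing to `std::cerr` keeps the bar out of any results that are printed to or redirected from standard output.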