This is a fork of minimap2 that replaces the minimizer algorithm with the mod-minimizer scheme, based on the paper:
The mod-minimizer: a simple and efficient sampling algorithm for long k-mers
Ragnar Groot Koerkamp, Giulio Ermanno Pibiri
bioRxiv 2024.05.25.595898; doi: 10.1101/2024.05.25.595898
This fork changes the minimizer sampling in minimap2 so that (w,k)-minimizers on DNA sequences are selected with the mod-minimizer algorithm, a simple sampling scheme designed to remain efficient for long k-mers.
Note: This implementation produces a lower minimizer density than stock minimap2. To get approximately the same minimizer density as minimap2's default settings, use the flag -w 8.
The mod-minimizer algorithm finds (w,k)-minimizers on a DNA sequence using the following procedure:
Notation:
- tmer: The t-mer entering the window, formed from the previous t-mer by removing its first base and appending the next base of the sequence.
- W: Set of t-mers in the current window.
- tmer_i: The i-th t-mer of the window.
- kmer_i: The i-th k-mer of the window.
- h(a): Hash of sequence a.
- rc(a): Reverse complement of sequence a.
- pos_W(a): Position of a within window W (0-indexed).
- M: List of minimizers.
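The two helper functions in the notation, rc(a) and h(a), can be illustrated with a small Python sketch. The hash used here (8-byte BLAKE2b) is only a stand-in chosen for determinism; it is an assumption and not the hash function used in this fork. The name `canonical` is likewise hypothetical and denotes the strand-independent value min(h(a), h(rc(a))) that the procedure uses.

```python
import hashlib

# Translation table for complementing the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rc(a):
    """Reverse complement of a DNA string over A, C, G, T."""
    return a.translate(_COMPLEMENT)[::-1]


def h(a):
    """Stand-in 64-bit hash (assumption: the fork's real hash differs)."""
    digest = hashlib.blake2b(a.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def canonical(a):
    """min(h(a), h(rc(a))): identical for a sequence and its reverse complement."""
    return min(h(a), h(rc(a)))
```

Because `canonical` takes the minimum over both strands, a t-mer or k-mer gets the same value regardless of which strand it is read from.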
Procedure:
Using a sliding window, construct the entering t-mer, then:
- Update Window: Remove the oldest t-mer from W:
  W = W \ {W_0}
- Compute t-mer Info:
  info = min( h(tmer), h(rc(tmer)) )
- Add t-mer to Window:
  W = W ∪ {info}
- Find Minimal t-mer:
  min = min(W)
- Compute Position:
  p = pos_W(min) mod w
- Select k-mer:
  kmer = min( h(kmer_p), h(rc(kmer_p)) )
- Update Minimizers:
  M = M ∪ {kmer}
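The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the C code of this fork: the hash is a stand-in (8-byte BLAKE2b), the window is assumed to hold w + k - t t-mers (so that it spans w consecutive k-mers, as in the paper), ties are broken by taking the leftmost minimal t-mer, and t is passed in as a plain argument. The function name `mod_minimizers` and its return format are hypothetical.

```python
from collections import deque
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(a):
    """min(h(a), h(rc(a))) with a stand-in 64-bit hash h (an assumption)."""
    def h(s):
        digest = hashlib.blake2b(s.encode("ascii"), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return min(h(a), h(a.translate(_COMPLEMENT)[::-1]))


def mod_minimizers(seq, w, k, t):
    """Return sorted (position, canonical k-mer hash) pairs sampled from seq."""
    if not 1 <= t <= k:
        raise ValueError("expected 1 <= t <= k")
    n_tmers = w + k - t            # t-mers in a window spanning w k-mers
    W = deque(maxlen=n_tmers)      # appending to a full deque drops W_0
    M = {}                         # position -> canonical k-mer hash
    for i in range(len(seq) - t + 1):
        # Update window, compute t-mer info, add t-mer to window.
        W.append(canonical(seq[i:i + t]))
        if len(W) < n_tmers:
            continue               # first window is not complete yet
        start = i - n_tmers + 1    # sequence position of the window start
        # Find the minimal t-mer (leftmost on ties), reduce its position mod w.
        p = W.index(min(W)) % w
        # Select the p-th k-mer of the window and record it.
        pos = start + p
        M[pos] = canonical(seq[pos:pos + k])
    return sorted(M.items())
```

For example, `mod_minimizers("ACGTTGCATGCCGATAGCTAGCTAGGATCCA", w=4, k=7, t=3)` returns a list of (position, hash) pairs in which every window of 4 consecutive k-mers contains at least one sampled position. The sketch recomputes each hash from the substring for clarity; an efficient implementation would use rolling hashes and avoid rescanning the window for its minimum.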
For more details on the algorithm and its implementation, please refer to the original paper.
For all other details, usage instructions, and documentation, please refer to the original minimap2 repository.
This fork and the corresponding benchmarks are based on minimap2 Release 2.28-r1209 (27 March 2024).