Hi Giulio,
Thanks again for all the work on Fulgor. I really appreciate all the progressive innovations that have gone into it.
It's really a great tool, both algorithm- and data-structure-wise – however, unfortunately, in almost all (biology-centric) applications where we want to use it, we have to make modifications to the source code to make it work for our purposes, because its output and query parameters do not correspond to our needs. As the program changes quite a lot (e.g., it will soon migrate to the new SSHash), this way of using it is not really sustainable.
Is there a chance that Rust bindings (or, in the worst case, C++ bindings) could be provided for the main query functionality? Essentially, we are interested in the following three functions (two high-priority ones and one low-priority one):
1. open an index – this would simply open an index from disk, or fail if it is the wrong version of the index (e.g., from an older Fulgor) or if it is corrupted; it could also return the list of reference names in the indexed order
2. query sequence – for a given string, it would return an array of numbers: the number of matching k-mers in the first, second, third, ... reference. Important – we are, in principle, interested in all of this information, not just a pre-filtered output. If some pre-filtering must be done for some reason (e.g., thresholding), it should offer two options: a minimum number of matching k-mers or a minimum proportion of matching k-mers (without any "advanced" filtering functionality to make it "smart").
3. get matching bit-vectors – this would return the bit vectors of the matches in a given reference (expected to be used for a small subset of references, e.g., only the best-matching ones)
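To make the requested semantics concrete, here is a minimal Rust sketch of what I have in mind. It is not Fulgor's API: `ToyIndex`, `Threshold`, `query_counts`, `match_bits`, and `apply_threshold` are all hypothetical names, and the "index" is a naive in-memory hash map with no canonical k-mers or reverse complements. Opening from disk and version checking are left out. The bit vector is interpreted here as one bit per k-mer position of the query, which is my assumption about the most useful form.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical filter options: only the two simple thresholds, nothing "smart".
#[derive(Debug, Clone, Copy)]
pub enum Threshold {
    /// Keep every reference, including those with zero matches.
    None,
    /// Keep references with at least this many matching k-mers.
    MinCount(u32),
    /// Keep references whose matches cover at least this fraction (0.0 to 1.0)
    /// of the k-mers in the query.
    MinFraction(f64),
}

/// Toy stand-in for an index: maps each k-mer to the ids of references containing it.
pub struct ToyIndex {
    k: usize,
    reference_names: Vec<String>,
    kmer_to_refs: HashMap<String, HashSet<usize>>,
}

/// All overlapping k-mers of `seq`, left to right (assumes ASCII input).
fn kmers<'a>(seq: &'a str, k: usize) -> impl Iterator<Item = &'a str> + 'a {
    let n = if k == 0 || seq.len() < k { 0 } else { seq.len() - k + 1 };
    (0..n).map(move |i| &seq[i..i + k])
}

impl ToyIndex {
    /// Build from (name, sequence) pairs; reference ids follow the input order.
    pub fn build(k: usize, references: &[(&str, &str)]) -> Self {
        let mut kmer_to_refs: HashMap<String, HashSet<usize>> = HashMap::new();
        let mut reference_names = Vec::new();
        for (ref_id, (name, seq)) in references.iter().enumerate() {
            reference_names.push(name.to_string());
            for kmer in kmers(seq, k) {
                kmer_to_refs.entry(kmer.to_string()).or_default().insert(ref_id);
            }
        }
        ToyIndex { k, reference_names, kmer_to_refs }
    }

    /// Reference names in indexed order.
    pub fn reference_names(&self) -> &[String] {
        &self.reference_names
    }

    /// Unfiltered result: counts[i] = number of query k-mer positions found in reference i.
    pub fn query_counts(&self, seq: &str) -> Vec<u32> {
        let mut counts = vec![0u32; self.reference_names.len()];
        for kmer in kmers(seq, self.k) {
            if let Some(refs) = self.kmer_to_refs.get(kmer) {
                for &r in refs {
                    counts[r] += 1;
                }
            }
        }
        counts
    }

    /// One bit per query k-mer position: true if that k-mer occurs in `ref_id`.
    pub fn match_bits(&self, seq: &str, ref_id: usize) -> Vec<bool> {
        kmers(seq, self.k)
            .map(|kmer| self.kmer_to_refs.get(kmer).map_or(false, |refs| refs.contains(&ref_id)))
            .collect()
    }
}

/// Optional post-filter over the raw counts; returns (reference id, count) pairs.
pub fn apply_threshold(counts: &[u32], num_query_kmers: usize, threshold: Threshold) -> Vec<(usize, u32)> {
    counts
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, c)| match threshold {
            Threshold::None => true,
            Threshold::MinCount(min) => c >= min,
            Threshold::MinFraction(f) => {
                num_query_kmers > 0 && (c as f64) / (num_query_kmers as f64) >= f
            }
        })
        .collect()
}
```

The point of keeping `query_counts` unfiltered and the threshold as a separate step is that the caller always has access to the full per-reference counts and can apply any downstream logic on top.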
In my opinion, the first two (the high-priority ones) might not be that difficult for you. They would massively increase the applicability of Fulgor – we could then really use it as an essential building block in many of our applications.
(Also cc @rob-p @Francii-B)