A new implementation of my graph index; I wanted to start from scratch. It now uses a different Louvain method implementation that needs much less memory.
- Add a fast buffer-based GFA reader, inspired by strangepg
- Generate edge lists from a GFA
- Integrate strangepg file reading for faster GFA loading
- On-disk binary search from node IDs to their community IDs (see the binary-search sketch after this list)
- It works, but storing the community IDs still needs to be implemented in the code.
- Separate the GFA file into chunks based on the communities produced.
- Either change the map to be keyed by string with an <int, int> value whose second int is the community ID, or keep a vector of node-count length and fill it with <string, int> pairs of node ID and community ID. Need to test the memory usage of both (see the layout sketch after this list).
- Generate the community-ID-to-file-offset index (int → <int, int>, i.e. community ID → <start, end>); a possible record layout is sketched after this list.
- Need to check whether the chunks can then be gzipped separately and how that would change the offsets (see the compression sketch after this list).
- Separate the edges that belong to different communities into their own chunks. I don't think this is actually needed.
- Since I'm hashing the sequence IDs later anyway, maybe I should hash them up front and use that Abseil dictionary that uses less memory (see the hashing sketch after this list).
- Parallelize the GFA chunking/gzipping. Not necessary: it's fast enough now with compression level 6 instead of 9.
- Maybe make the community index a binary file that is loaded entirely into memory for faster access (see the loading sketch after this list).
- Index the paths and other line types; these will be indexed line by line, which should be easy.
- Recursive chunking: I think communities that are too large should be chunked further. Do this on the separated file (see the recursion sketch after this list).
- Add command line interface
- Add unit tests
- Benchmark against other graph clustering tools
- Add Rust interface
- Add conda package
- Edit old ChGraph to work with the new indexes
- Investigate why retrieving (node_id 3456, community 743) takes so long; every chunk probably needs to be checked.
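
A minimal sketch of the on-disk binary search from node ID to community ID, assuming the index file holds fixed-width `(node_id, community_id)` records as two native-endian `uint32_t` values, sorted by node ID; the layout and names are assumptions, not the actual format:

```cpp
#include <cstdint>
#include <fstream>
#include <optional>

// One fixed-width index record: 8 bytes, sorted by node_id on disk.
// (Assumed layout, not the real file format.)
struct Record {
    uint32_t node_id;
    uint32_t community_id;
};

// Binary search the index file directly, without loading it into memory.
std::optional<uint32_t> find_community(std::ifstream& idx, uint64_t n_records,
                                       uint32_t target) {
    uint64_t lo = 0, hi = n_records;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        Record rec;
        idx.seekg(static_cast<std::streamoff>(mid * sizeof(Record)));
        idx.read(reinterpret_cast<char*>(&rec), sizeof(Record));
        if (rec.node_id == target) return rec.community_id;
        if (rec.node_id < target) lo = mid + 1;
        else                      hi = mid;
    }
    return std::nullopt;  // node ID not present in the index
}
```

Each probe costs one seek plus an 8-byte read, so a lookup is O(log n) seeks.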
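
Illustrative declarations of the two layouts to memory-test; the container choices and names are assumptions:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Option A: map keyed by the string node ID; the value pair's second
// int is the community ID.
std::unordered_map<std::string, std::pair<int, int>> by_name;

// Option B: one vector entry per node, holding the string node ID and
// its community ID; lower per-entry overhead, but lookup needs either
// the node's position or a sort plus binary search.
std::vector<std::pair<std::string, int>> by_position;
```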
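
One possible fixed-width record for the community-ID-to-offset index; the field names and widths are assumptions:

```cpp
#include <cstdint>

// Maps a community ID to the byte range [start, end) of its chunk in
// the separated GFA file.
struct CommunityRange {
    uint32_t community_id;
    uint64_t start;  // offset of the chunk's first byte
    uint64_t end;    // offset one past the chunk's last byte
};
```

If these records are written sorted by community ID, the same on-disk binary search as above works here too.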
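
On gzipping the chunks separately: if each community chunk is deflated into its own gzip member and the members are concatenated into one file, the index has to store compressed [start, end) offsets, which are only known after compression. A zlib-based sketch, assuming the chunk text is already in memory:

```cpp
#include <string>
#include <vector>
#include <zlib.h>

// Compress one chunk into a standalone gzip member at level 6.
std::vector<unsigned char> gzip_chunk(const std::string& chunk) {
    z_stream zs{};
    // windowBits = 15 + 16 tells zlib to emit a gzip header and trailer.
    deflateInit2(&zs, 6, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
    std::vector<unsigned char> out(deflateBound(&zs, chunk.size()));
    zs.next_in   = reinterpret_cast<Bytef*>(const_cast<char*>(chunk.data()));
    zs.avail_in  = static_cast<uInt>(chunk.size());
    zs.next_out  = out.data();
    zs.avail_out = static_cast<uInt>(out.size());
    deflate(&zs, Z_FINISH);  // single shot: deflateBound guarantees room
    out.resize(zs.total_out);
    deflateEnd(&zs);
    return out;
}
```

Recording the offsets is then a running sum: after appending each member, the next community's start is the current file position.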
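
A sketch of pre-hashing the sequence IDs and keying an `absl::flat_hash_map` by the 64-bit digest instead of the string; the map then never stores the strings, at the cost of a small collision risk. The hash choice (`absl::HashOf`) is an assumption:

```cpp
#include <cstdint>
#include <string_view>

#include "absl/container/flat_hash_map.h"
#include "absl/hash/hash.h"

// Community lookup keyed by the hash of the sequence ID, so the map
// stores 8-byte keys instead of heap-allocated strings.
absl::flat_hash_map<uint64_t, uint32_t> community_by_hash;

void insert_node(std::string_view seq_id, uint32_t community_id) {
    // Hash once up front; two distinct IDs could collide, which would
    // conflate their entries, so this needs a collision check in practice.
    community_by_hash[absl::HashOf(seq_id)] = community_id;
}
```

Note that `absl::HashOf` is not guaranteed stable across program runs, so if the digests ever get written into the on-disk index, a stable hash would be needed instead.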
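
If the community index becomes a binary file that is loaded whole, lookups turn into in-memory binary searches. A sketch, reusing the assumed record layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Same assumed layout as in the binary-search sketch above.
struct Record {
    uint32_t node_id;
    uint32_t community_id;
};

// Read the whole index file into memory; lookups then become
// std::lower_bound over the vector instead of file seeks.
std::vector<Record> load_index(const char* path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    const auto bytes = static_cast<std::size_t>(in.tellg());
    std::vector<Record> records(bytes / sizeof(Record));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(records.data()),
            static_cast<std::streamsize>(bytes));
    return records;
}
```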
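
A sketch of the recursive chunking idea: chunks over a size threshold are split again until everything fits. The threshold and the stand-in `recluster` are placeholders; the real version would re-run the Louvain step on the oversized community's subgraph:

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxChunkBytes = 64 * 1024 * 1024;  // assumed limit

// Placeholder for re-clustering an oversized community's subgraph; it
// just halves the chunk here so the sketch is self-contained.
std::vector<std::string> recluster(const std::string& chunk) {
    const std::size_t mid = chunk.size() / 2;
    return {chunk.substr(0, mid), chunk.substr(mid)};
}

// Emit chunks that fit; split and recurse on chunks that do not.
void chunk_recursively(const std::string& chunk,
                       std::vector<std::string>& out) {
    if (chunk.size() <= kMaxChunkBytes) {
        out.push_back(chunk);
        return;
    }
    for (const auto& sub : recluster(chunk)) {
        chunk_recursively(sub, out);
    }
}
```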