gfaidx

A new implementation of my graph index, I wanted to start from scratch. Now using a different Louvain Method implementation that uses much less memory.

TODO cpp

  • Add fast buffer-based GFA reader, inspired by strangepg
  • Generate edge lists from a GFA
    • Integrate strangepg file reading for faster GFA loading
  • On-disk binary search from node IDs to their community ID
    • The search itself works, but it still needs to be wired into the code that stores the community IDs.
  • Separate the GFA file based on the communities produced.
    • Either change the map to string: <int, int>, where the second int is the community ID, or keep a vector of node length and append <string, int> pairs of node ID and community ID. Need to test the memory usage of both.
  • Generate the community-ID-to-file-offset index (int: <int, int>, i.e. community ID: <start, end>)
    • Need to check whether the chunks can be gzipped separately and how that would change the offsets.
  • Separate the edges that belong to different communities into their own chunks. I don't think this is actually needed.
  • Since I'm hashing the sequence IDs later anyway, maybe I should hash them first and use the Abseil dictionary that uses less memory.
  • Parallelize the GFA chunking/gzipping. Not necessary for now: it's already much faster with compression level 6 instead of 9.
  • Maybe make the community index a binary file that gets loaded into memory completely for faster access.
  • Index the Paths and other line types; these will be indexed line by line, which should be easy.
  • Recursive chunking: I think I should further chunk the communities that are too large. Do it on the separated file.
  • Add command line interface
  • Add unit tests
  • Benchmark against other graph clustering tools
  • Add Rust interface
  • Add conda package
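
The on-disk binary search over the node-ID-to-community-ID index described above can be sketched roughly as follows. This is a minimal illustration, not the actual gfaidx format: the fixed-width record layout (`<qi`, i.e. int64 node ID plus int32 community ID) and the function names are my own assumptions.

```python
import struct

# Assumed record layout: int64 node ID, int32 community ID (not gfaidx's real format).
RECORD = struct.Struct("<qi")

def write_index(path, pairs):
    """Write (node_id, community_id) pairs sorted by node ID,
    so the file supports binary search by record offset."""
    with open(path, "wb") as f:
        for node_id, community_id in sorted(pairs):
            f.write(RECORD.pack(node_id, community_id))

def lookup_community(path, node_id):
    """Binary-search the fixed-width records on disk by seeking,
    without loading the whole index into memory."""
    with open(path, "rb") as f:
        f.seek(0, 2)                     # seek to end to get file size
        n = f.tell() // RECORD.size      # number of records
        lo, hi = 0, n - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD.size)
            nid, cid = RECORD.unpack(f.read(RECORD.size))
            if nid == node_id:
                return cid
            if nid < node_id:
                lo = mid + 1
            else:
                hi = mid - 1
    return None
```

Because every record has the same width, the search needs only `O(log n)` seeks and no in-memory table, which is the point of doing it on disk.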
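
On the open question of gzipping the chunks separately: gzip members can be concatenated into a single file and each member decompressed independently, so one option is to record <start, end> offsets into the *compressed* file and inflate only the member for the requested community. A rough sketch under that assumption (function names and layout are hypothetical, not gfaidx's actual implementation):

```python
import gzip

def write_chunks(path, chunks):
    """Compress each community chunk as its own gzip member,
    recording (start, end) byte offsets into the compressed file."""
    offsets = {}
    with open(path, "wb") as f:
        for community_id, data in chunks.items():
            start = f.tell()
            # level 6, per the tradeoff noted in the TODO list above
            f.write(gzip.compress(data, compresslevel=6))
            offsets[community_id] = (start, f.tell())
    return offsets

def read_chunk(path, offsets, community_id):
    """Seek to one community's gzip member and decompress only it."""
    start, end = offsets[community_id]
    with open(path, "rb") as f:
        f.seek(start)
        return gzip.decompress(f.read(end - start))
```

The cost is that the offsets now point into the compressed stream, so the community-ID-to-offset index has to be rebuilt whenever the chunks are recompressed.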

TODO Python

  • Add command line interface
  • Edit old ChGraph to work with the new indexes
  • Investigate why retrieving node_id 3456 (community 743) takes so long; probably every chunk should be checked.
