Problem
The current /graphify query is BFS keyword matching, which is effectively grep plus graph traversal. Searching "find what handles authentication" only works if the word "auth" appears in node labels.
Goal
Replace keyword BFS with embedding-based semantic search so queries find concepts by meaning, not exact string match.
Plan
Embedding backend (local by default):
- sentence-transformers with all-MiniLM-L6-v2 (80MB, no API key, works offline)
- Optional: OpenAI embeddings API, nomic-embed via ollama
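One way the pluggable backend could look, as a rough sketch only: the `Embedder` type and the `local_embedder` / `load_embedder` helpers are hypothetical names for illustration, not existing graphify code. The sentence-transformers import is deferred so the base install keeps working without the optional dependency, and only the local backend is sketched.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a backend maps a batch of texts to one vector per text.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def local_embedder(model_name: str = "all-MiniLM-L6-v2") -> Embedder:
    """Default local backend; needs the optional sentence-transformers package."""
    try:
        # Imported lazily so the base install works without the extra.
        from sentence_transformers import SentenceTransformer
    except ImportError as exc:
        raise RuntimeError(
            "Embeddings need the optional extra: pip install graphifyy[embeddings]"
        ) from exc

    model = SentenceTransformer(model_name)

    def embed(texts: Sequence[str]) -> List[List[float]]:
        # normalize_embeddings=True returns unit vectors, so a dot product
        # between two of them equals their cosine similarity.
        return model.encode(list(texts), normalize_embeddings=True).tolist()

    return embed


def load_embedder(backend: str = "local") -> Embedder:
    """Hypothetical selector; only the local backend is sketched here."""
    if backend == "local":
        return local_embedder()
    raise ValueError(f"unsupported embedding backend: {backend!r}")
```

The optional OpenAI and ollama backends would slot in as further branches of `load_embedder` that return the same callable shape, so the rest of the pipeline would not need to know which backend produced the vectors.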
What changes:
- On graph build, embed every node label + source context, store vectors in graph.json
- /graphify query computes query embedding, ranks nodes by cosine similarity, then does BFS from top-k hits
- semantically_similar_to edge detection can use embeddings instead of LLM (faster, cheaper)
- Node similarity surfaced in graph visualization
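The proposed query flow (rank nodes by cosine similarity, then BFS from the top-k hits) can be sketched in plain Python. Everything here is illustrative: the function names, the toy 2-d vectors, and the adjacency dict are assumptions, and real vectors would come from the embedding backend and graph.json.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          node_vecs: Dict[str, Sequence[float]],
          k: int = 3) -> List[str]:
    """Return the k node ids most similar to the query, best first."""
    ranked = sorted(node_vecs,
                    key=lambda n: cosine(query_vec, node_vecs[n]),
                    reverse=True)
    return ranked[:k]


def bfs_from(seeds: List[str],
             adjacency: Dict[str, List[str]],
             max_depth: int = 2) -> List[str]:
    """Breadth-first expansion from the seed nodes, up to max_depth hops."""
    seen = set(seeds)
    order = list(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


# Toy 2-d vectors stand in for real embeddings (made-up values).
node_vecs = {
    "login_handler": [0.9, 0.1],
    "token_store": [0.8, 0.3],
    "plot_utils": [0.0, 1.0],
}
adjacency = {"login_handler": ["session_db"], "token_store": ["login_handler"]}
query_vec = [1.0, 0.0]  # pretend embedding of "what handles authentication"

seeds = top_k(query_vec, node_vecs, k=2)
results = bfs_from(seeds, adjacency, max_depth=1)
print(results)
```

In this toy graph the query surfaces `login_handler` and `token_store` even though neither label contains "auth", and the BFS step then pulls in `session_db` as a one-hop neighbour. The same `cosine` function, applied pairwise to node vectors with a threshold, is one possible basis for the embedding-based semantically_similar_to detection.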
New optional dependency:
pip install graphifyy[embeddings]
Why this matters
This is the difference between a search tool and an understanding tool. "Find what connects the optimizer to the attention mechanism" should work even if those exact words don't appear together anywhere in the codebase.