feat: semantic query via sentence-transformers embeddings #424

Open
juhii31 wants to merge 90 commits into safishamsi:v4 from juhii31:feature/semantic-query-embeddings

juhii31 commented Apr 17, 2026

Closes #1

What this does

Adds semantic embedding support to graphify query using
sentence-transformers with all-MiniLM-L6-v2 (80MB, local,
no API key required).

Changes

  • graphify/embed.py — new module with embed_graph() function

    • Embeds all node labels + docstrings using all-MiniLM-L6-v2
    • Computes pairwise cosine similarity matrix
    • Adds semantically_similar_to edges above configurable
      threshold (default 0.82)
    • Tags edges as INFERRED with confidence_score = cosine similarity
    • Caches embedding vectors in graphify-out/cache/embeddings.json
    • On re-runs, only new or changed nodes are embedded (SHA-256 cache key)
  • tests/test_embed.py — 3 tests covering:

    • Similar nodes get connected
    • Cache file is created on first run
    • No duplicate edges on repeated runs
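The embed_graph() behaviour listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's code: nodes are assumed to be a dict of id to (label, docstring), edges a dict keyed by (u, v), the cache an in-memory dict keyed by the SHA-256 of the embedded text, and encode an injected callable (in practice something like SentenceTransformer("all-MiniLM-L6-v2").encode). The attribute names relation and confidence are assumptions; provenance follows the rename discussed later in this thread.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def cache_key(text: str) -> str:
    # SHA-256 of the embedded text: unchanged nodes hit the cache on re-runs.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_graph(
    nodes: Dict[str, Tuple[str, str]],       # hypothetical layout: id -> (label, docstring)
    edges: Dict[Tuple[str, str], dict],       # hypothetical layout: (u, v) -> attributes
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    cache: Dict[str, Vector],                 # in-memory stand-in for graphify-out/cache/embeddings.json
    threshold: float = 0.82,
) -> int:
    """Add semantically_similar_to edges and return how many were added."""
    # Embed label + docstring for every node.
    texts = {nid: f"{label}\n{doc}".strip() for nid, (label, doc) in nodes.items()}

    # Only texts whose hash is not cached yet (new or changed nodes) are encoded.
    missing = sorted({t for t in texts.values() if cache_key(t) not in cache})
    if missing:
        for text, vec in zip(missing, encode(missing)):
            cache[cache_key(text)] = [float(x) for x in vec]

    vectors = {nid: cache[cache_key(t)] for nid, t in texts.items()}
    ids = sorted(vectors)
    added = 0
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            # Skip pairs already linked in either direction: no duplicates on re-runs.
            if (u, v) in edges or (v, u) in edges:
                continue
            score = cosine(vectors[u], vectors[v])
            if score >= threshold:  # keep pairs at or above the threshold
                edges[(u, v)] = {
                    "relation": "semantically_similar_to",
                    "confidence": "INFERRED",
                    "confidence_score": score,
                    "provenance": "embeddings",  # not "source": reserved in node-link JSON
                }
                added += 1
    return added
```

The double loop keeps the sketch dependency-free; a real implementation would more likely compute the whole similarity matrix with one normalized matrix product (for example in NumPy).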

Notes

  • Works fully offline, zero API cost after model download
  • All 393 applicable tests pass (symlink and git hook tests
    are skipped due to a pre-existing Windows privilege limitation)

Minidoracat and others added 30 commits April 8, 2026 19:39
* fix: git hooks fail when graphify is installed via pipx

When installed via pipx, the graphify module is only available in
pipx's isolated venv, not the system python3. The git hooks
(post-commit, post-checkout) hardcoded `python3` which cannot import
graphify in this case.

Detect the correct Python interpreter from the graphify binary's
shebang line, matching the approach already used in skill.md Step 1.
Falls back to python3 for system installs.

* fix: handle env-style shebangs and improve interpreter detection

- Use POSIX `command -v` instead of non-standard `which`
- Parse `#!/usr/bin/env python3` shebangs correctly (previous
  `tr -d ' '` would produce `/usr/bin/envpython3`)
- Add import validation fallback to python3 if resolved interpreter
  cannot import graphify
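The shebang rules in the two commits above can be illustrated in Python, although the actual hooks do this in POSIX shell. In this sketch (interpreter_from_shebang is a hypothetical name), the line is split on whitespace instead of having its spaces deleted, so an env-style shebang yields the interpreter name rather than /usr/bin/envpython3, and python3 is the fallback when there is no usable shebang.

```python
def interpreter_from_shebang(first_line: str, fallback: str = "python3") -> str:
    """Return the interpreter named by a script's shebang line (hypothetical helper)."""
    if not first_line.startswith("#!"):
        return fallback
    parts = first_line[2:].split()  # split on whitespace; never strip spaces out
    if not parts:
        return fallback
    # "#!/usr/bin/env python3" -> the program env would launch (skipping flags like -S).
    if parts[0].rsplit("/", 1)[-1] == "env":
        args = [p for p in parts[1:] if not p.startswith("-")]
        return args[0] if args else fallback
    # "#!/path/to/pipx/venvs/graphify/bin/python" -> use the absolute path as-is.
    return parts[0]
```

The import-validation step from the second commit would sit on top of this: run the resolved interpreter with -c "import graphify" and fall back to python3 if that fails.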
… buffer

* fix: suppress graspologic ANSI output that breaks PowerShell scrolling

graspologic's leiden() emits ANSI escape sequences (progress bars,
colored warnings) that corrupt PowerShell 5.1's scroll buffer on
Windows, disabling vertical scrolling. Redirect stdout/stderr to
StringIO during leiden() calls to prevent any escape codes from
reaching the terminal.

Add 2 tests verifying cluster() produces no stdout/stderr output.

Fixes safishamsi#19

Co-Authored-By: Claude Opus 4.6 <[email protected]>
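A minimal sketch of the StringIO redirection described in this commit, using only the standard library. run_quietly is an illustrative name and noisy_leiden is a hypothetical stand-in for graspologic's leiden(). One caveat worth knowing: contextlib.redirect_stdout and redirect_stderr only swap sys.stdout and sys.stderr, so they catch output written through Python's streams but not bytes a C extension writes directly to file descriptors 1 and 2.

```python
import contextlib
import io
import sys
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def run_quietly(fn: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Call fn with stdout/stderr captured so no ANSI escapes reach the terminal."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        return fn(*args, **kwargs)


def noisy_leiden(n: int) -> list:
    # Hypothetical stand-in that emits ANSI sequences, as a progress bar or colored warning would.
    print("\x1b[32mprogress 100%\x1b[0m")
    print("\x1b[33mwarning\x1b[0m", file=sys.stderr)
    return [i % 2 for i in range(n)]
```

Capturing into a buffer rather than discarding output also leaves room to log it if a clustering run fails.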

* docs: add PowerShell troubleshooting section to Windows skill

Document the PowerShell 5.1 scrolling issue and provide 4
workarounds: upgrade graphify, use Windows Terminal, reset
terminal, or uninstall graspologic to use Louvain fallback.

Fixes safishamsi#19

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
- Register 'trae' and 'trae-cn' in _PLATFORM_CONFIG (skill-trae.md,
  ~/.trae/skills/ and ~/.trae-cn/skills/, claude_md=False)
- Add CLI subcommands: graphify trae install/uninstall,
  graphify trae-cn install/uninstall (routes to _agents_install/uninstall)
- Update help text with new platform entries
- Create skill-trae.md (Agent-tool based extraction, AGENTS.md integration,
  no PreToolUse hook support per Trae limitations)
- Update README.md and README.zh-CN.md with Trae platform docs

Co-authored-by: lijinshuan <[email protected]>
…extension drift, click detection, skill coverage, .graphify_python persistence

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…e relations in innerHTML (#sec)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…msi#127)

Tree-sitter resolves call targets directly from source — marking them
INFERRED was incorrect. Cross-file class-level uses edges remain INFERRED.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
safishamsi and others added 23 commits April 13, 2026 08:39
… path bug, .graphifyignore subfolder patterns; v0.4.10: Dart, Hermes, 6 CLI commands, PHP improvements

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…e plugin, cache root, PHP missing edges, Windows stability, cross-file calls

- safishamsi#352: add skill-kiro.md to pyproject.toml package-data
- safishamsi#341: guard edge_betweenness at >5000 nodes; use approximate k=100 for suggest_questions on large graphs
- safishamsi#354/safishamsi#229: add Step 6b in skill.md to call to_wiki() when --wiki given (before Step 9 cleanup)
- safishamsi#356: call _install_opencode_plugin() from install --platform opencode path
- safishamsi#350: add cache_root param to extract() so subdirectory runs keep cache at ./graphify-out/cache/
- safishamsi#230: PHP class_constant_access_expression emits references_constant edges
- safishamsi#232: PHP scoped_call_expression (static method calls) emits calls edges
- safishamsi#287: os.replace fallback for Windows WinError 5; graphify update exits 1 on failure; templates use graphify update . instead of python3 -c
- safishamsi#348: cross-file call resolution for all languages via raw_calls + global label map pass in extract()

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
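The os.replace fallback from safishamsi#287 might look roughly like the sketch below. This is an assumption about the approach, not the commit's code: on Windows, os.replace can raise PermissionError (WinError 5) while another process briefly holds the destination, so the sketch retries and then falls back to a non-atomic copy. The function name and retry parameters are hypothetical.

```python
import os
import shutil
import tempfile
import time


def atomic_write_text(path: str, text: str, retries: int = 3, delay: float = 0.1) -> None:
    """Write text via a temp file + os.replace, with a copy fallback (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        for attempt in range(retries):
            try:
                os.replace(tmp, path)  # atomic rename within the same directory
                return
            except PermissionError:
                # WinError 5: destination briefly locked; back off and retry.
                time.sleep(delay * (attempt + 1))
        # Last resort: non-atomic copy over the destination.
        shutil.copyfile(tmp, path)
    finally:
        # After a successful os.replace the temp file is already gone.
        if os.path.exists(tmp):
            os.remove(tmp)
```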
…js, macOS watch, god_nodes degree rename

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
 safishamsi#385, team workflow docs, Windows/pipx tips

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

juhii31 (author) commented Apr 17, 2026

This PR adds the embedding engine, caching, and edge creation. Query wiring in main.py and the optional dependency in pyproject.toml will follow in the next commit, before merge.

juhii31 (author) commented Apr 18, 2026

Updated: query wiring in main.py and the optional dependency in pyproject.toml are now included. graphify query "question" --embeddings now ranks results by semantic cosine similarity instead of BFS keyword matching. This fully closes #1.
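Ranking by cosine similarity instead of BFS keyword matching can be sketched as follows. This assumes the node vectors come from the embedding cache and that the query is embedded with the same model; rank_nodes, its arguments, and top_k are hypothetical names, not graphify's actual query API.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_nodes(
    query_vec: Sequence[float],
    node_vecs: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (node_id, score) pairs, most similar first."""
    scored = ((nid, cosine(query_vec, vec)) for nid, vec in node_vecs.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])
```

Unlike keyword matching, this can surface a node whose label shares no words with the question, as long as the model places the two texts close together.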

Qodo-Free-For-OSS commented

Hi, embed_graph() sets an edge attribute named "source", which collides with NetworkX node-link JSON’s reserved edge endpoint key and can corrupt exported graph.json edge endpoints.

Severity: action required | Category: correctness

How to fix: Rename edge attribute "source"

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

embed_graph() adds an edge attribute source="embeddings". NetworkX’s node-link JSON format uses source/target for edge endpoints, so this attribute name can collide and corrupt exported graphs.

Issue Context

The repo exports graphs via json_graph.node_link_data(...) in graphify/export.py, which emits edge endpoint keys named source and target.

Fix Focus Areas

  • graphify/embed.py[60-66]
  • graphify/export.py[282-297]

Suggested change

Rename the edge attribute from source to something non-reserved (e.g., edge_source, provenance, or inferred_by).
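To see why the name matters, here is a plain-Python illustration of building a node-link style edge record. It does not reproduce NetworkX's exact export code; it only shows that when edge attributes and endpoint keys share one dict, whichever is written last wins, so either the endpoint or the provenance tag is lost. The helper link_record is hypothetical.

```python
from typing import Any, Dict


def link_record(u: str, v: str, attrs: Dict[str, Any], attrs_last: bool = False) -> Dict[str, Any]:
    """Build a node-link style edge record (illustrative only)."""
    endpoints = {"source": u, "target": v}
    # Dict unpacking: later keys overwrite earlier keys with the same name.
    return {**endpoints, **attrs} if attrs_last else {**attrs, **endpoints}


bad = {"relation": "semantically_similar_to", "source": "embeddings"}
print(link_record("a", "b", bad, attrs_last=True)["source"])  # embeddings (endpoint lost)
print(link_record("a", "b", {"provenance": "embeddings"}, attrs_last=True))
# {'source': 'a', 'target': 'b', 'provenance': 'embeddings'}
```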

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Qodo code review - free for open-source.

juhii31 (author) commented Apr 21, 2026

Hi, fixed: renamed the edge attribute source to provenance in embed.py to avoid the NetworkX node-link JSON collision. Happy to see the other issues too.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v3: semantic query with embeddings

8 participants