feat: add Haxe language support via tree-sitter-haxe #1307

Open
mallyskies wants to merge 3 commits into safishamsi:v8 from masquepublishing:feat/haxe-language-support

mallyskies commented Jun 13, 2026

Adds Haxe (.hx) as a supported language for AST extraction.

What this does:

  • detect.py: registers .hx in CODE_EXTENSIONS
  • extract.py: adds extract_haxe(), which extracts classes, interfaces,
    enums, enum abstracts, typedefs, and functions from .hx files using the
    tree-sitter-haxe grammar
  • extract.py: adds _haxe_recover_scattered() fallback for files where
    the grammar emits scattered tokens instead of proper declaration nodes
    (minified files, unsupported preprocessor patterns)
  • pyproject.toml: no haxe extra — tree-sitter-haxe has no PyPI
    release, and PyPI rejects packages with a direct URL/VCS dependency in
    Requires-Dist, so declaring one here would block every future
    graphifyy release. extract_haxe() lazy-imports tree_sitter_haxe
    with a graceful ImportError guard (same pattern as dm/terraform),
    so this is a pure packaging change with no functional impact.
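For readers unfamiliar with the dm/terraform pattern referenced above, here is a minimal sketch of a lazily imported, optional grammar. It assumes only the module name tree_sitter_haxe from this PR; try_import() and extract_haxe_sketch() are hypothetical names, and the returned dict shape is illustrative rather than graphify's actual result format.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def extract_haxe_sketch(source: str) -> dict:
    """Hypothetical stand-in for extract_haxe(): the grammar is imported lazily."""
    grammar = try_import("tree_sitter_haxe")
    if grammar is None:
        # Grammar not installed: skip the file instead of crashing the whole run.
        return {"nodes": [], "edges": [], "skipped": "tree-sitter-haxe not installed"}
    # A real extractor would build a parser from `grammar` and walk the tree here.
    return {"nodes": [], "edges": []}
```

Because the import happens inside the function, installing graphifyy without the grammar costs nothing until a .hx file is actually encountered.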

Implementation notes:

CR/CRLF line endings are normalized before parsing. The codebase being
tested against contains legacy files with classic Mac OS \r-only line endings, which
would cause the // comment rule to run to EOF.
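A minimal sketch of the normalization and of why it matters, assuming (as described above) that the comment rule ends a // comment only at the next \n. normalize_newlines() and line_comment_span() are hypothetical helpers, not functions from extract.py. Since tree-sitter parsers consume bytes, a real implementation might apply the same replacements to the byte string before parsing.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR (classic Mac OS) line endings to LF."""
    # Replace CRLF first so that "\r\n" becomes one "\n", not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def line_comment_span(source: str, start: int) -> tuple[int, int]:
    """Span of a // comment starting at `start`: it ends at the next "\n", or at EOF."""
    end = source.find("\n", start)
    return (start, len(source) if end == -1 else end)


legacy = "// note\rclass Main {}"
print(line_comment_span(legacy, 0))                      # (0, 21): runs to EOF
print(line_comment_span(normalize_newlines(legacy), 0))  # (0, 7): stops at the line end
```

Without normalization, the class declaration on the "second line" is swallowed by the comment and never reaches the declaration rules.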

The fallback (_haxe_recover_scattered) handles three patterns the grammar
currently struggles with: bare class/enum tokens in ERROR nodes,
struct typedef bodies with optional fields, and @deprecated-prefixed
declarations that block declaration recognition.
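To illustrate what a text-level fallback can recover when the grammar yields only scattered tokens, here is a regex-based sketch. It is not the PR's _haxe_recover_scattered() implementation; DECL_RE and recover_declarations() are hypothetical, and the sketch only recovers declaration kinds and names (it does not parse struct typedef bodies or their optional fields).

```python
import re

# Matches a declaration keyword plus its name at the start of a line, optionally
# preceded by @deprecated / @:deprecated metadata and common modifiers.
DECL_RE = re.compile(
    r"^[ \t]*(?:@:?deprecated(?:\([^)]*\))?\s*)?"
    r"(?:(?:private|extern|final)\s+)*"
    r"(class|interface|enum\s+abstract|enum|typedef|abstract)\s+"
    r"([A-Za-z_]\w*)",
    re.MULTILINE,
)


def recover_declarations(source: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs found by a plain text scan of `source`."""
    # Collapse internal whitespace so "enum   abstract" is reported as "enum abstract".
    return [(" ".join(kind.split()), name) for kind, name in DECL_RE.findall(source)]


sample = (
    "@deprecated class Old {}\n"
    "enum abstract Color(Int) {}\n"
    "typedef Opts = { ?verbose:Bool };\n"
)
print(recover_declarations(sample))
# [('class', 'Old'), ('enum abstract', 'Color'), ('typedef', 'Opts')]
```

Listing enum\s+abstract before enum in the alternation matters: otherwise "enum abstract Color" would be reported as an enum named "abstract".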

Tested against 6,978 Haxe source files with zero parse errors.
Produces 73,419 nodes and 88,084 edges.

Dependency:

Requires tree-sitter-haxe — not on PyPI, so install the patched fork
directly:

pip install git+https://github.com/masquepublishing/tree-sitter-haxe.git

mallyskies (Author) commented Jun 13, 2026

Note on parse quality

This PR depends on tree-sitter-haxe from PyPI (currently v0.0.1). That version
has two known grammar bugs that affect .hx parsing:

  • Strings swallowing inline comments: "str" // comment causes the comment
    to be consumed into the string token, corrupting the rest of the expression.
  • Member expressions are right-associative: a.b.c is parsed as a.(b.c)
    instead of (a.b).c, producing incorrect AST structure for all chained
    property access.

Both are fixed in this pending PR to the grammar repo:
vantreeseba/tree-sitter-haxe#67
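The associativity bug is easiest to see by building both tree shapes for a.b.c. In this sketch the ("member", object, field) tuples are an illustrative stand-in, not tree-sitter-haxe node types; member_left() builds the shape the fix is described as producing, and member_right() builds the buggy one.

```python
from functools import reduce


def member_left(parts: list[str]):
    """Left-associative fold: a.b.c -> ((a.b).c)."""
    return reduce(lambda obj, field: ("member", obj, field), parts)


def member_right(parts: list[str]):
    """Right-associative fold: a.b.c -> (a.(b.c))."""
    node = parts[-1]
    for part in reversed(parts[:-1]):
        node = ("member", part, node)
    return node


def render(node) -> str:
    """Print a member tree with explicit parentheses around every access."""
    if isinstance(node, str):
        return node
    _, obj, field = node
    return f"({render(obj)}.{render(field)})"


chain = ["a", "b", "c"]
print(render(member_left(chain)))   # ((a.b).c)
print(render(member_right(chain)))  # (a.(b.c))
```

In the left-associative tree the outer node reads field c from the object a.b, which is what chained property access means; in the right-associative tree the outer node's "field" is itself the expression b.c, so anything that walks object/field pairs attributes the access to the wrong receiver.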

Update, correcting the above: there is actually no PyPI release of
tree-sitter-haxe at all ("v0.0.1" was this repo's own unpublished package
metadata, not a PyPI version). Rather than wait on the upstream merge, this
PR now depends directly on our patched fork (branch
fix/grammar-issues-52-53), which already includes both fixes; see the
Dependency section of the updated PR description. Once
vantreeseba/tree-sitter-haxe#67 merges and a real PyPI release exists,
this PR will switch to a normal haxe extra and the fork can be retired.

safishamsi (Owner) commented

Thanks @mallyskies — the Haxe extractor itself is well-built (follows the LanguageConfig/dispatch pattern, graceful ImportError guard, sensible inherits-vs-implements split). Two blockers before it can merge:

  1. Dependency doesn't resolve. tree-sitter-haxe returns 404 on PyPI — pip install "graphifyy[haxe]" / uv sync can't find it. Worse, it's added to the all extra, so it breaks uv sync/CI for everyone (dependency resolution fails before tests can even collect), contradicting the #1104 note above the dm/terraform extras. Please publish the grammar to PyPI first and pin it (tree-sitter-haxe>=x.y), and at minimum keep it out of the all group until then.
  2. Rebase — the branch is currently CONFLICTING with v8.

Nice-to-have: add imports and calls edge assertions (the fixture has both but no test checks them). Happy to merge once the dependency is real + pinned and the branch rebases clean.

mallyskies force-pushed the feat/haxe-language-support branch 2 times, most recently from f3b0361 to 63ee262, July 2, 2026 00:09
mallyskies marked this pull request as draft July 2, 2026 00:23
- extract_haxe(): extracts classes, interfaces, enums, enum abstracts,
  typedefs, and functions from .hx files using the tree-sitter-haxe grammar
- _haxe_recover_scattered(): fallback parser for files where the grammar
  produces scattered tokens instead of proper declaration nodes
- CR/CRLF normalization before parsing (handles old Mac \r-only files)
- detect.py: register .hx extension → Haxe language
- pyproject.toml: add haxe optional dependency group; add tree-sitter-haxe to the all extra

Tested against 5,490 .hx files; 2 empty files (both legitimately
all-commented-out). Produces 82,867 nodes and 98,717 edges.
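A sketch of the extension registration and dispatch, loosely modeled on the LanguageConfig/dispatch pattern mentioned in review. The LanguageConfig fields, CONFIG_BY_EXT, language_for() and extract_haxe_stub() are assumptions for illustration; the project's actual detect.py and extract.py structures may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical per-language record: a name, its file extensions, its extractor."""
    name: str
    extensions: tuple[str, ...]
    extractor: Callable[[str], dict]


def extract_haxe_stub(source: str) -> dict:
    """Placeholder for extract_haxe(); always returns an empty graph here."""
    return {"nodes": [], "edges": []}


LANGUAGES = [LanguageConfig("haxe", (".hx",), extract_haxe_stub)]

# Lookup tables derived from the configs (the ".hx" entry mirrors CODE_EXTENSIONS).
CODE_EXTENSIONS = {ext for cfg in LANGUAGES for ext in cfg.extensions}
CONFIG_BY_EXT = {ext: cfg for cfg in LANGUAGES for ext in cfg.extensions}


def language_for(path: str) -> Optional[LanguageConfig]:
    """Return the config whose extension matches the file suffix, or None."""
    return CONFIG_BY_EXT.get(Path(path).suffix.lower())


cfg = language_for("src/Main.hx")
print(cfg.name if cfg else None)        # haxe
print(cfg.extractor("class Main {}"))   # {'nodes': [], 'edges': []}
```

Lowercasing the suffix is a design choice in this sketch so that Main.HX also dispatches; the real CODE_EXTENSIONS check may be case-sensitive.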

- README.md: add .hx to supported languages table (36 → 37 grammars)
- CHANGELOG.md: add Unreleased entry for Haxe support
- tests/fixtures/sample.hx: fixture covering class, interface, enum,
  enum abstract, typedef, methods, inheritance, and implements
- tests/test_languages.py: 9 tests for extract_haxe(); skipped when
  tree-sitter-haxe is not installed (mirrors [dm] skip pattern)
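The skip-when-missing behavior can be sketched with the standard library alone. The existing [dm] tests may well use pytest (for example pytest.importorskip); this unittest.skipUnless version is an assumed equivalent, and the class and test names are hypothetical.

```python
import importlib.util
import unittest

# True only when the optional grammar package can be found on the import path.
HAXE_AVAILABLE = importlib.util.find_spec("tree_sitter_haxe") is not None


@unittest.skipUnless(HAXE_AVAILABLE, "tree-sitter-haxe not installed")
class TestHaxeGrammarAvailable(unittest.TestCase):
    def test_grammar_imports(self):
        # Only runs when the grammar is installed; otherwise reported as skipped.
        import tree_sitter_haxe  # noqa: F401
        self.assertIsNotNone(tree_sitter_haxe)
```

find_spec() checks availability without importing the module, so merely collecting the test file never fails on machines without the fork installed.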

PyPI/Warehouse rejects any package upload whose metadata contains a
direct URL/VCS dependency. graphifyy is actively published to PyPI, so
the haxe extra's git+https dependency would block every future release
of the package, not just fail to build for haxe users.

Drop the extra entirely and document a manual pip install
git+https://github.com/masquepublishing/tree-sitter-haxe.git step in
the README instead, matching how the project treats every other
grammar with install friction (real PyPI name, or nothing); there is
no existing precedent for a non-PyPI dependency in pyproject.toml.
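Since the blocker is specifically a PEP 508 direct reference (name @ url) in Requires-Dist, a pre-release check could flag such requirements before upload. The sketch below is a heuristic regex, not PyPI's own validation and not a full PEP 508 parser (packaging.requirements.Requirement, whose url attribute is set for direct references, would be the robust route); is_direct_reference() is hypothetical and example-grammar>=1.0 is a made-up requirement.

```python
import re

# PEP 508 "direct reference" form: name [extras] @ url
DIRECT_REFERENCE = re.compile(
    r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*(?:\[[^\]]*\])?\s*@\s*\S"
)


def is_direct_reference(requirement: str) -> bool:
    """True if a requirement string points at a URL (e.g. git+https) instead of a version."""
    return DIRECT_REFERENCE.match(requirement) is not None


requirements = [
    "example-grammar>=1.0",  # hypothetical version-pinned dependency: fine for PyPI
    "tree-sitter-haxe @ git+https://github.com/masquepublishing/tree-sitter-haxe.git",
]
blocking = [r for r in requirements if is_direct_reference(r)]
print(blocking)  # only the git+https entry would block an upload
```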
mallyskies force-pushed the feat/haxe-language-support branch from aa4f473 to ef21405, July 2, 2026 15:56
mallyskies marked this pull request as ready for review July 2, 2026 16:14
mallyskies (Author) commented

@safishamsi Thanks for your help and patience. I believe I've addressed both concerns:

  1. Dependency resolution: removed the haxe extra from pyproject.toml entirely (rather than pinning it), because the tree-sitter-haxe project I forked still has no PyPI release; the upstream grammar-fix PR (vantreeseba/tree-sitter-haxe#67, "Fix string/comment parsing and member_expression associativity (#52 and #53)") is still open. extract_haxe() already lazy-imports the grammar with a graceful fallback, so this is a packaging-only change. The manual install step is documented in the README and CHANGELOG, and the PR description has been updated to match.
  2. Rebase: done; the branch is on the current v8 tip with no conflicts.

NOTE: test_haxe_finds_imports/test_haxe_finds_calls (with real edge-label assertions) were already in the second commit, so the edge-assertion ask should be covered, but let me know if I didn't do that the way you want.
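For reference, edge-label assertions of the kind requested in review might look like the sketch below. The result dict, its "relation" key, and the node ids are assumptions standing in for what extract_haxe() returns on tests/fixtures/sample.hx; the real tests and schema may differ.

```python
import unittest


def edge_relations(result: dict) -> set:
    """Collect the relation labels on every edge of an extraction result."""
    return {edge["relation"] for edge in result.get("edges", [])}


class TestHaxeEdgeLabels(unittest.TestCase):
    # Hand-written stand-in for extract_haxe() output on the fixture (hypothetical shape).
    RESULT = {
        "nodes": [{"id": "Main"}, {"id": "Main.main"}, {"id": "Main.run"},
                  {"id": "haxe.ds.StringMap"}],
        "edges": [
            {"source": "Main", "target": "haxe.ds.StringMap", "relation": "imports"},
            {"source": "Main.main", "target": "Main.run", "relation": "calls"},
        ],
    }

    def test_finds_imports(self):
        self.assertIn("imports", edge_relations(self.RESULT))

    def test_finds_calls(self):
        self.assertIn("calls", edge_relations(self.RESULT))
```

Checking only the label confirms that some imports or calls edge exists; asserting on specific (source, target) pairs from the fixture would also catch edges attached to the wrong node.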
