feat: add Haxe language support via tree-sitter-haxe #1307

Open
mallyskies wants to merge 2 commits into safishamsi:v8 from masquepublishing:feat/haxe-language-support

@mallyskies

Adds Haxe (.hx) as a supported language for AST extraction.

What this does:

  • detect.py: registers .hx in CODE_EXTENSIONS
  • extract.py: adds extract_haxe() which extracts classes, interfaces,
    enums, enum abstracts, typedefs, and functions from .hx files using the
    tree-sitter-haxe grammar
  • extract.py: adds _haxe_recover_scattered() fallback for files where
    the grammar emits scattered tokens instead of proper declaration nodes
    (minified files, unsupported preprocessor patterns)
  • pyproject.toml: adds haxe = ["tree-sitter-haxe"] optional dep group;
    adds tree-sitter-haxe to the all group

Implementation notes:

CR/CRLF line endings are normalized before parsing — the codebase being
tested against contains legacy files with \r-only Mac line endings which
would cause the // comment rule to run to EOF.
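
A minimal sketch of this kind of normalization, assuming the source is read as bytes before being handed to the parser. The helper name normalize_newlines is hypothetical and is not taken from extract.py:

```python
def normalize_newlines(source: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first; otherwise the bare-CR pass would turn
    # each "\r\n" into two newlines.
    return source.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# A legacy file mixing \r-only, \r\n, and \n endings.
legacy = b"// old Mac comment\rclass Foo {}\r\nclass Bar {}\n"
for line in normalize_newlines(legacy).splitlines():
    print(line)
```

Line numbers are preserved by this approach, but byte offsets shift by one for every CRLF pair that is collapsed.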

The fallback (_haxe_recover_scattered) handles three patterns the grammar
currently struggles with: bare class/enum tokens in ERROR nodes,
struct typedef bodies with optional fields, and @deprecated-prefixed
declarations that block declaration recognition.

Tested against 6,978 Haxe source files with zero parse errors.
Produces 73,419 nodes and 88,084 edges.

Dependency:

Requires tree-sitter-haxe, which is available on PyPI (pip install tree-sitter-haxe).
To install it through the optional dependency group:
pip install "graphifyy[haxe]"

- extract_haxe(): extracts classes, interfaces, enums, enum abstracts,
  typedefs, and functions from .hx files using tree-sitter-haxe grammar
- _haxe_recover_scattered(): fallback parser for files where the grammar
  produces scattered tokens instead of proper declaration nodes
- CR/CRLF normalization before parsing (handles old Mac \r-only files)
- detect.py: register .hx extension → Haxe language
- pyproject.toml: add haxe optional dep group; add tree-sitter-haxe to all

Tested against 5,490 .hx files; 2 empty files (both legitimately
all-commented-out). Produces 82,867 nodes and 98,717 edges.
@mallyskies

Note on parse quality

This PR depends on tree-sitter-haxe from PyPI (currently v0.0.1). That version
has two known grammar bugs that affect .hx parsing:

  • Strings swallowing inline comments: "str" // comment causes the comment
    to be consumed into the string token, corrupting the rest of the expression.
  • Member expressions are right-associative: a.b.c is parsed as a.(b.c)
    instead of (a.b).c, producing incorrect AST structure for all chained
    property access.
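
To illustrate the associativity difference, here is a small sketch that models a member-access node as a nested (object, field) tuple. This is not the grammar's actual node layout, only a way to compare the two tree shapes:

```python
from functools import reduce


def left_assoc(parts):
    # Expected shape, (a.b).c: each new field wraps everything parsed so far.
    return reduce(lambda obj, field: (obj, field), parts)


def right_assoc(parts):
    # Shape described in the bug, a.(b.c): the chain nests to the right.
    return reduce(lambda rest, head: (head, rest), reversed(parts))


chain = ["a", "b", "c"]
print(left_assoc(chain))   # (('a', 'b'), 'c')
print(right_assoc(chain))  # ('a', ('b', 'c'))
```

In the right-nested shape, the object of the outermost access is the bare identifier a rather than the expression a.b, so anything that walks the tree to resolve what c belongs to sees the wrong receiver.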

Both are fixed in this pending PR to the grammar repo:
vantreeseba/tree-sitter-haxe#67

Once that is merged and a new tree-sitter-haxe release is published to PyPI,
graphify users will automatically get correct Haxe ASTs with no changes needed here.

- README.md: add .hx to supported languages table (36 → 37 grammars)
- CHANGELOG.md: add Unreleased entry for Haxe support
- tests/fixtures/sample.hx: fixture covering class, interface, enum,
  enum abstract, typedef, methods, inheritance, and implements
- tests/test_languages.py: 9 tests for extract_haxe(); skipped when
  tree-sitter-haxe is not installed (mirrors [dm] skip pattern)
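
A skip-when-missing guard like the one described for tests/test_languages.py could look roughly as follows. This sketch uses only the standard library. The import name tree_sitter_haxe and the test class are assumptions for illustration, and the tests in this PR may use a different mechanism:

```python
import importlib.util
import unittest

# Assumption: the PyPI package tree-sitter-haxe is importable as tree_sitter_haxe.
HAS_HAXE = importlib.util.find_spec("tree_sitter_haxe") is not None


@unittest.skipUnless(HAS_HAXE, "tree-sitter-haxe is not installed")
class HaxeExtractionTests(unittest.TestCase):
    def test_grammar_importable(self):
        # Hypothetical placeholder; real tests would call extract_haxe()
        # on tests/fixtures/sample.hx and assert on the extracted nodes.
        import tree_sitter_haxe  # noqa: F401
```

Checking with find_spec avoids importing the grammar at collection time, so the suite still loads cleanly when the optional dependency is absent.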