Forward: HCL2 Text → [PostLexer] → Lark Parse Tree → LarkElement Tree → Python Dict/JSON
Reverse: Python Dict/JSON → LarkElement Tree → Lark Tree → HCL2 Text
Direct: HCL2 Text → [PostLexer] → Lark Parse Tree → LarkElement Tree → Lark Tree → HCL2 Text
The Direct pipeline (parse_to_tree → transform → to_lark → reconstruct) skips serialization to dict, so all IR nodes (including NewLineOrCommentRule nodes for whitespace/comments) directly influence the reconstructed output. Any information discarded before the IR is lost in this pipeline.
| Module | Role |
|---|---|
| `hcl2/hcl2.lark` | Lark grammar definition |
| `hcl2/api.py` | Public API (load/loads/dump/dumps + intermediate stages) |
| `hcl2/postlexer.py` | Token stream transforms between lexer and parser |
| `hcl2/parser.py` | Lark parser factory with caching |
| `hcl2/transformer.py` | Lark parse tree → LarkElement tree |
| `hcl2/deserializer.py` | Python dict → LarkElement tree |
| `hcl2/formatter.py` | Whitespace alignment and spacing on LarkElement trees |
| `hcl2/reconstructor.py` | LarkElement tree → HCL2 text via Lark |
| `hcl2/builder.py` | Programmatic HCL document construction |
| `hcl2/walk.py` | Generic tree-walking primitives for the LarkElement IR tree |
| `hcl2/utils.py` | SerializationOptions, SerializationContext, string helpers |
| `hcl2/const.py` | Constants: IS_BLOCK, COMMENTS_KEY, INLINE_COMMENTS_KEY |
| `cli/helpers.py` | File/directory/stdin conversion helpers |
| `cli/hcl_to_json.py` | hcl2tojson entry point |
| `cli/json_to_hcl.py` | jsontohcl2 entry point |
| `cli/hq.py` | hq CLI entry point: query dispatch, formatting, optional operator |
| `hcl2/query/__init__.py` | Public query API exports |
| `hcl2/query/_base.py` | NodeView base class, view registry, view_for() factory |
| `hcl2/query/body.py` | DocumentView, BodyView facades for top-level and body queries |
| `hcl2/query/blocks.py` | BlockView facade for block queries |
| `hcl2/query/attributes.py` | AttributeView facade for attribute queries |
| `hcl2/query/containers.py` | TupleView, ObjectView facades for container queries |
| `hcl2/query/expressions.py` | ConditionalView facade for conditional expressions |
| `hcl2/query/functions.py` | FunctionCallView facade for function call queries |
| `hcl2/query/for_exprs.py` | ForTupleView, ForObjectView facades for for-expressions |
| `hcl2/query/path.py` | Structural path parser (PathSegment, parse_path, [select()], type:name) |
| `hcl2/query/resolver.py` | Path resolver: segment-by-segment with label depth, type filter |
| `hcl2/query/pipeline.py` | Pipe operator: split_pipeline, classify_stage, execute_pipeline |
| `hcl2/query/builtins.py` | Built-in transforms: keys, values, length |
| `hcl2/query/diff.py` | Structural diff between two HCL documents |
| `hcl2/query/predicate.py` | select() predicate tokenizer, recursive descent parser, evaluator |
| `hcl2/query/safe_eval.py` | AST-validated Python expression eval for hybrid/eval modes |
| `hcl2/query/introspect.py` | --describe and --schema output generation |
hcl2/__main__.py is a thin wrapper that imports cli.hcl_to_json:main.
| File | Domain |
|---|---|
| `rules/abstract.py` | LarkElement, LarkRule, LarkToken base classes |
| `rules/tokens.py` | StringToken (cached factory), StaticStringToken, punctuation constants |
| `rules/base.py` | StartRule, BodyRule, BlockRule, AttributeRule |
| `rules/containers.py` | TupleRule, ObjectRule, ObjectElemRule, ObjectElemKeyRule |
| `rules/expressions.py` | ExprTermRule, BinaryOpRule, UnaryOpRule, ConditionalRule |
| `rules/literal_rules.py` | IntLitRule, FloatLitRule, IdentifierRule, KeywordRule |
| `rules/strings.py` | StringRule, InterpolationRule, HeredocTemplateRule, TemplateStringRule |
| `rules/functions.py` | FunctionCallRule, ArgumentsRule |
| `rules/indexing.py` | GetAttrRule, SqbIndexRule, splat rules |
| `rules/for_expressions.py` | ForTupleExprRule, ForObjectExprRule, ForIntroRule, ForCondRule |
| `rules/directives.py` | TemplateIfRule, TemplateForRule, and flat directive start/end rules |
| `rules/whitespace.py` | NewLineOrCommentRule, InlineCommentMixIn |
Follows the json module convention. All option parameters are keyword-only.
- `load`/`loads`: HCL2 text → Python dict
- `dump`/`dumps`: Python dict → HCL2 text
- `query`: HCL2 text/file → `DocumentView` for structured queries
- Intermediate stages: `parse`/`parses`, `parse_to_tree`/`parses_to_tree`, `transform`, `serialize`, `from_dict`, `from_json`, `reconstruct`
SerializationOptions (LarkElement → dict):
with_comments, with_meta, wrap_objects, wrap_tuples, explicit_blocks, preserve_heredocs, force_operation_parentheses, preserve_scientific_notation, strip_string_quotes
DeserializerOptions (dict → LarkElement):
heredocs_to_strings, strings_to_heredocs, object_elements_colon, object_elements_trailing_comma
FormatterOptions (whitespace/alignment):
indent_length, open_empty_blocks, open_empty_objects, open_empty_tuples, vertically_align_attributes, vertically_align_object_elements
Console scripts defined in pyproject.toml. All three CLIs accept positional PATH arguments (files, directories, glob patterns, or - for stdin). When no PATH is given, stdin is read by default (like jq).
All CLIs use structured error output (plain text to stderr) and distinct exit codes:
| Code | hcl2tojson | jsontohcl2 | hq |
|---|---|---|---|
| 0 | Success | Success | Success |
| 1 | Partial (some skipped) | JSON/encoding parse error | No results |
| 2 | All unparsable | Bad HCL structure | Parse error |
| 3 | n/a | n/a | Query error |
| 4 | I/O error | I/O error | I/O error |
| 5 | n/a | Differences found (--diff / --semantic-diff) | n/a |
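For scripts that wrap these CLIs, the exit codes can be encoded as a lookup. The following is a minimal sketch: `EXIT_CODE_MEANINGS` and `describe_exit` are hypothetical helper names, not part of the package; only the codes and their meanings come from the table.

```python
from typing import Dict

# Meanings copied from the exit-code table; codes a tool does not use are omitted.
EXIT_CODE_MEANINGS: Dict[str, Dict[int, str]] = {
    "hcl2tojson": {
        0: "Success",
        1: "Partial (some skipped)",
        2: "All unparsable",
        4: "I/O error",
    },
    "jsontohcl2": {
        0: "Success",
        1: "JSON/encoding parse error",
        2: "Bad HCL structure",
        4: "I/O error",
        5: "Differences found (--diff / --semantic-diff)",
    },
    "hq": {
        0: "Success",
        1: "No results",
        2: "Parse error",
        3: "Query error",
        4: "I/O error",
    },
}


def describe_exit(tool: str, code: int) -> str:
    """Return the documented meaning of an exit code (hypothetical helper)."""
    return EXIT_CODE_MEANINGS[tool].get(code, f"Undocumented exit code {code}")
```

A wrapper could pass the `returncode` of a `subprocess.run(...)` call to `describe_exit` to decide whether a non-zero status is a hard failure or, as with code 1 for hcl2tojson, a partial result.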
hcl2tojson file.tf # single file to stdout
hcl2tojson --ndjson dir/ # directory → NDJSON to stdout
hcl2tojson a.tf b.tf -o out/ # multiple files to output dir
hcl2tojson --ndjson 'modules/**/*.tf' # glob + NDJSON streaming
hcl2tojson --only resource,module file.tf # block type filtering
hcl2tojson --exclude variable file.tf # exclude block types
hcl2tojson --fields cpu,memory file.tf # field projection
hcl2tojson --compact file.tf # single-line JSON
hcl2tojson -q dir/ -o out/ # quiet (no stderr progress)
echo 'x = 1' | hcl2tojson # stdin (no args needed)
Key flags: --ndjson, --compact, --only/--exclude, --fields, -q/--quiet, --json-indent N, --with-meta, --with-comments, --strip-string-quotes (breaks round-trip). Multi-file NDJSON adds a __file__ provenance key to each object.
jsontohcl2 file.json # single file to stdout
jsontohcl2 --diff original.tf modified.json # preview text changes
jsontohcl2 --semantic-diff original.tf modified.json # semantic-only changes
jsontohcl2 --semantic-diff original.tf --diff-json m.json # semantic diff as JSON
jsontohcl2 --dry-run file.json # convert without writing
jsontohcl2 --fragment - # attribute snippets from stdin
jsontohcl2 --indent 4 --no-align file.json
Key flags: --diff ORIGINAL, --semantic-diff ORIGINAL, --diff-json, --dry-run, --fragment, -q/--quiet, --indent N, --no-align, --colon-separator.
Add new options as parser.add_argument() calls in the relevant entry point module.
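As a sketch of that pattern, the snippet below builds a small `argparse` parser. `build_parser` and `--example-limit` are hypothetical; `--compact`, `-q/--quiet` and the positional `PATH` mirror flags listed above, but the real entry points may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser (not the project's actual one)."""
    parser = argparse.ArgumentParser(prog="hcl2tojson")
    # Zero or more PATH arguments; an empty list stands for "read stdin".
    parser.add_argument("paths", metavar="PATH", nargs="*")
    parser.add_argument("--compact", action="store_true", help="single-line JSON")
    parser.add_argument("-q", "--quiet", action="store_true", help="no stderr progress")
    # A new option is one more add_argument() call (hypothetical flag).
    parser.add_argument("--example-limit", type=int, default=None, metavar="N")
    return parser


args = build_parser().parse_args(["--compact", "--example-limit", "3", "main.tf"])
```

After parsing, the new option is available as `args.example_limit`, next to the existing `args.compact` and `args.paths`.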
Lark's postlex parameter accepts a single object with a process(stream) method that transforms the token stream between the lexer and LALR parser. The PostLexer class is designed for extensibility: each transformation is a private method that accepts and yields tokens, and process() chains them together.
Current passes:
_merge_newlines_into_operators
To add a new pass: create a private method with the same (self, stream) -> generator signature, and add a yield from call in process().
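The chaining pattern can be sketched as follows. This is an illustration under assumptions, not the project's `PostLexer`: `Tok` is a stand-in for Lark's token type so the snippet runs without Lark installed, and both passes (`_collapse_newlines`, `_drop_leading_newlines`) and the `NL`/`NAME` token types are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Tok(NamedTuple):
    """Stand-in for lark.Token so the sketch runs without Lark installed."""

    type: str
    value: str


class SketchPostLexer:
    """Illustrative only: process() chains private generator passes."""

    # Lark's PostLex interface also reads this attribute; left empty here.
    always_accept = ()

    def process(self, stream: Iterable[Tok]) -> Iterator[Tok]:
        stream = self._collapse_newlines(stream)  # hypothetical first pass
        yield from self._drop_leading_newlines(stream)  # hypothetical second pass

    def _collapse_newlines(self, stream: Iterable[Tok]) -> Iterator[Tok]:
        """Drop a newline token that directly follows another newline token."""
        previous_type = None
        for token in stream:
            if token.type == "NL" and previous_type == "NL":
                continue
            previous_type = token.type
            yield token

    def _drop_leading_newlines(self, stream: Iterable[Tok]) -> Iterator[Tok]:
        """Skip newline tokens that appear before any other token."""
        seen_content = False
        for token in stream:
            if token.type == "NL" and not seen_content:
                continue
            seen_content = True
            yield token


tokens = [Tok("NL", "\n"), Tok("NL", "\n"), Tok("NAME", "x"),
          Tok("NL", "\n"), Tok("NL", "\n"), Tok("NAME", "y")]
result = list(SketchPostLexer().process(tokens))
```

Because every pass is a generator over the previous one, tokens flow through lazily and a new pass only needs to be wired into `process()`.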
These are project-specific constraints that must not be violated:
- Always use the LarkElement IR. Never transform directly from Lark parse tree to Python dict or vice versa.
- Block vs object distinction. Use `__is_block__` markers (`const.IS_BLOCK`) to preserve semantic intent during round-trips. The deserializer must distinguish blocks from regular objects.
- Bidirectional completeness. Every serialization path must have a corresponding deserialization path. Test round-trip integrity: Parse → Serialize → Deserialize → Serialize produces identical results.
- One grammar rule = one `LarkRule` class. Each class implements `lark_name()`, typed property accessors, `serialize()`, and declares `_children_layout: Tuple[...]` (annotation only, no assignment) to document child structure.
- Token caching. Use the `StringToken` factory in `rules/tokens.py`; never create token instances directly.
- Interpolation context. `${...}` generation depends on nesting depth; always pass and respect `SerializationContext`.
- Update both directions. When adding language features, update `transformer.py`, `deserializer.py`, `formatter.py` and `reconstructor.py`.
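The "one grammar rule = one class" shape can be illustrated with a self-contained sketch. `SketchRule` stands in for the project's `LarkRule` base class and `PairRule` is a hypothetical rule; the real signatures may differ (for example, `lark_name()` need not be a static method and `serialize()` may take extra arguments).

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class SketchRule(ABC):
    """Stand-in for the project's LarkRule base class (assumed interface)."""

    def __init__(self, children: List[Any]) -> None:
        self.children = children

    @staticmethod
    @abstractmethod
    def lark_name() -> str:
        """Name of the grammar rule this class represents."""

    @abstractmethod
    def serialize(self) -> Any:
        """Convert this node to plain Python data."""


class PairRule(SketchRule):
    """Hypothetical rule with two children: a key and a value."""

    # Annotation only: documents the child structure without creating an attribute.
    _children_layout: Tuple[str, int]

    @staticmethod
    def lark_name() -> str:
        return "pair"

    @property
    def key(self) -> str:
        return self.children[0]

    @property
    def value(self) -> int:
        return self.children[1]

    def serialize(self) -> dict:
        return {self.key: self.value}


rule = PairRule(["count", 3])
```

Since `_children_layout` is annotated but never assigned, it shows up in the class annotations for readers and type checkers without adding any runtime attribute.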
- Add grammar rules to `hcl2.lark`
- If the new construct creates LALR ambiguities with `NL_OR_COMMENT`, add a postlexer pass in `postlexer.py`
- Create rule class(es) in the appropriate `rules/` file
- Add transformer method(s) in `transformer.py`
- Implement `serialize()` in the rule class
- Update `deserializer.py`, `formatter.py` and `reconstructor.py` for round-trip support
Framework: unittest.TestCase (not pytest).
python -m unittest discover -s test -p "test_*.py" -v
Unit tests (test/unit/): instantiate rule objects directly (no parsing).
- `rules/`: one file per rules module
- `cli/`: one file per CLI module
- `test_*.py`: tests for corresponding files from the `hcl2/` directory
Use concrete stubs when testing ABCs (e.g., StubExpression(ExpressionRule)).
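A minimal sketch of the stub pattern with `unittest.TestCase` follows. The `ExpressionRule` defined here is a stand-in ABC written for the example, and its `wrapped()` method is hypothetical; only the `StubExpression(ExpressionRule)` naming comes from the text above.

```python
import unittest
from abc import ABC, abstractmethod


class ExpressionRule(ABC):
    """Stand-in ABC; the real ExpressionRule lives in the rules package."""

    @abstractmethod
    def serialize(self) -> str:
        """Return the serialized form of the expression."""

    def wrapped(self) -> str:
        """Concrete behaviour under test (hypothetical)."""
        return f"({self.serialize()})"


class StubExpression(ExpressionRule):
    """Minimal concrete subclass so the ABC can be instantiated in tests."""

    def serialize(self) -> str:
        return "a + b"


class TestExpressionRule(unittest.TestCase):
    def test_abc_cannot_be_instantiated(self):
        with self.assertRaises(TypeError):
            ExpressionRule()  # pylint: disable=abstract-class-instantiated

    def test_wrapped_uses_stub_serialize(self):
        self.assertEqual(StubExpression().wrapped(), "(a + b)")
```

The stub implements only the abstract members, so the test exercises the base class's concrete logic without involving the parser.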
Integration tests (test/integration/): full-pipeline tests with golden files.
- `test_round_trip.py`: iterates over all suites in `hcl2_original/`, tests HCL→JSON, JSON→JSON, JSON→HCL, and full round-trip
- `test_specialized.py`: feature-specific tests with golden files in `specialized/`
Always run the full round-trip test suite after any modification.
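The round-trip property (Serialize → Deserialize → Serialize produces identical results) can be expressed as a small generic check. This is a sketch, not the project's test code: `assert_round_trip` is a hypothetical helper, and `json.loads`/`json.dumps` are used purely as stand-ins for the real serialize and deserialize steps.

```python
import json
from typing import Any, Callable


def assert_round_trip(
    serialize: Callable[[Any], Any],
    deserialize: Callable[[Any], Any],
    tree: Any,
) -> None:
    """Check that Serialize -> Deserialize -> Serialize reproduces the first result."""
    first = serialize(tree)
    second = serialize(deserialize(first))
    assert first == second, f"round-trip mismatch: {first!r} != {second!r}"


# Stand-ins: JSON text plays the role of the IR, a dict the serialized form.
assert_round_trip(json.loads, json.dumps, '{"x": 1, "tags": ["a", "b"]}')
```

Comparing the two serialized forms, rather than the intermediate trees, keeps the check independent of how the IR represents whitespace or node identity.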
Hooks are defined in .pre-commit-config.yaml (includes black, mypy, pylint, and others). All changed files must pass these checks before committing. When writing or modifying code:
- Format Python with black (Python 3.8 target).
- Ensure mypy and pylint pass. Pylint config is in `pylintrc`, scoped to `hcl2/` and `test/`.
- End files with a newline; strip trailing whitespace (except under `test/integration/(hcl2_reconstructed|specialized)/`).
Update this file when architecture, modules, API surface, or testing conventions change. Also update README.md and the docs in docs/ (01_getting_started.md, 02_querying.md, 03_advanced_api.md, 04_hq.md, 05_hq_examples.md) when changes affect the public API, CLI flags, or option fields.