Soplang is a compiled language. The tree-walking interpreter is replaced by a two-backend pipeline: Cranelift (`soplang file.sop`, JIT to native code) and LLVM (`soplang build file.sop`, AOT standalone binary). Both share the same front-end and HIR.
- Goals
- Architecture Overview
- Pipeline Stages
- Value Representation
- Runtime Library
- Cranelift JIT Design
- LLVM AOT Design
- Phase 1 — Semantic Analyzer
- Phase 2 — HIR (High-Level IR)
- Phase 3 — Runtime Library
- Phase 4 — Cranelift JIT Backend
- Phase 5 — LLVM AOT Backend
- Phase 6 — Remove Interpreter, Final Wiring
- Crate Layout
- Validation + Timeline
- Compiled-only: no tree-walking interpreter in the final product.
- Two backends sharing the same front-end (Lexer, Parser, AST, Semantic, HIR):
  - Cranelift JIT: `soplang file.sop` → HIR → native machine code in memory → run. Near-native speed, fast compile, ~5 MB dependency.
  - LLVM AOT: `soplang build file.sop` → HIR → LLVM IR → .o → link → standalone binary. Maximum optimization; best for static types (`abn`, `jajab`).
- Reuse: Lexer, Parser, AST, `Value`, `SoplangError`, stdlib (wrapped by runtime for native backends).
- Remove interpreter: `src/interpreter.rs` deleted in Phase 6.
Source (.sop)
│
▼
┌─────────────────────────────┐
│ Lexer → Token stream │ (existing)
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ Parser → AST │ (existing)
└─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Semantic Analyzer (Phase 1) │
│ • Name resolution & scope building │
│ • Variable slot assignment (local #0, #1, …) │
│ • Closure analysis (which vars escape to heap?) │
│ • Static type checking (abn, jajab, qoraal, …) │
│ • Constant folding & propagation │
└─────────────────────────────────────────────────────┘
│ Annotated AST + Symbol Table
▼
┌─────────────────────────────────────────────────────┐
│ HIR Lowering (Phase 2) │
│ Flat, linear IR. No nested AST. SSA-ready. │
│ Variables are slots, jumps are explicit labels. │
└─────────────────────────────────────────────────────┘
│ HIR (HirModule)
▼
┌────────────────────────────────────────────────────────────────────────┐
│ TWO BACKENDS (same HIR → two targets) │
├─────────────────────────────────┬──────────────────────────────────────┤
│ Cranelift JIT │ LLVM AOT │
│ HIR → Cranelift IR │ HIR → LLVM IR │
│ → native code in memory │ → .o → link → standalone binary │
│ Used for: soplang file.sop │ Used for: soplang build file.sop │
│ (Phase 4) │ (Phase 5) │
└─────────────────────────────────┴──────────────────────────────────────┘
| Stage | Module | Input → Output | Phase |
|---|---|---|---|
| Lexer | lexer.rs | Source → Tokens | existing |
| Parser | parser.rs | Tokens → AST | existing |
| Semantic Analyzer | semantic.rs | AST → Annotated AST + SymbolTable | 1 |
| HIR Lowering | hir.rs | Annotated AST → HIR | 2 |
| Runtime Library | runtime.rs | C-ABI functions called by both backends | 3 |
| Cranelift Backend | backend/cranelift.rs | HIR → native code (JIT) | 4 |
| LLVM Backend | backend/llvm.rs | HIR → LLVM IR → binary | 5 |
Both Cranelift and LLVM use the same tagged value for dynamic types (door):
SoplangValue = { tag: u8, payload: i64 } (C ABI: 16 bytes, passed as two i64 in practice)
Tag Meaning Payload
──── ───────── ────────────────────────────────
0 Null 0
1 Int i64 value directly
2 Float f64 bit-cast to i64
3 Bool 0 or 1
4 Str pointer to heap String (as i64)
5 List pointer to Rc<Vec<Value>> (as i64)
6 Object pointer to Rc<HashMap> (as i64)
7 Function pointer to closure struct (as i64)
Static types (abn, jajab) skip boxing in both backends:
- `abn x = 42` → emit native `i64` in Cranelift/LLVM; no struct, no runtime call.
- `jajab y = 3.14` → emit native `f64` (double).
This is where Soplang's static type system gives large optimization wins.
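The tag table above can be made concrete with a small sketch. It assumes the tag numbers from the table and a 64-bit target; the constant names and helper methods (`from_float`, `as_float`, and so on) are illustrative only and are not taken from the Soplang source.

```rust
// Sketch of the tagged value from the table above.
// Helper names are hypothetical; only the layout and tag numbers come from the plan.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SoplangValue {
    pub tag: u8,
    pub payload: i64,
}

pub const TAG_NULL: u8 = 0;
pub const TAG_INT: u8 = 1;
pub const TAG_FLOAT: u8 = 2;
pub const TAG_BOOL: u8 = 3;

impl SoplangValue {
    pub fn null() -> Self {
        Self { tag: TAG_NULL, payload: 0 }
    }

    pub fn from_int(n: i64) -> Self {
        Self { tag: TAG_INT, payload: n }
    }

    /// Floats are stored by bit-casting the f64 into the i64 payload.
    pub fn from_float(x: f64) -> Self {
        Self { tag: TAG_FLOAT, payload: x.to_bits() as i64 }
    }

    pub fn from_bool(b: bool) -> Self {
        Self { tag: TAG_BOOL, payload: b as i64 }
    }

    /// Returns the integer only if the tag says this is an Int.
    pub fn as_int(self) -> Option<i64> {
        (self.tag == TAG_INT).then_some(self.payload)
    }

    /// Reverses the bit-cast done in `from_float`.
    pub fn as_float(self) -> Option<f64> {
        (self.tag == TAG_FLOAT).then(|| f64::from_bits(self.payload as u64))
    }
}
```

A statically typed `abn` or `jajab` local never goes through this struct; the backends would keep it as a plain `i64` or `f64` and only build a `SoplangValue` when the value crosses into dynamic code.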
Both backends call a small Rust runtime via C ABI. New file: src/runtime.rs.
#[repr(C)]
pub struct SoplangValue { pub tag: u8, pub payload: i64 }
// Primitives
#[no_mangle] pub extern "C" fn soplang_int(n: i64) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_float(x: f64) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_str(ptr: *const u8, len: usize) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_bool(b: bool) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_null() -> SoplangValue { ... }
// Arithmetic (dynamic dispatch)
#[no_mangle] pub extern "C" fn soplang_add(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_sub(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_mul(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_div(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_mod(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_neg(a: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_not(a: SoplangValue) -> SoplangValue { ... }
// Comparison
#[no_mangle] pub extern "C" fn soplang_eq(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_ne(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_lt(a: SoplangValue, b: SoplangValue) -> SoplangValue { ... }
// ... le, gt, ge
// IO / stdlib
#[no_mangle] pub extern "C" fn soplang_qor(v: SoplangValue) { ... }
#[no_mangle] pub extern "C" fn soplang_gelin() -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_nooc(v: SoplangValue) -> SoplangValue { ... }
// Collections
#[no_mangle] pub extern "C" fn soplang_list_new() -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_list_push(list: SoplangValue, val: SoplangValue) { ... }
#[no_mangle] pub extern "C" fn soplang_object_new() -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_get_index(obj: SoplangValue, idx: SoplangValue) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_set_index(obj: SoplangValue, idx: SoplangValue, val: SoplangValue) { ... }
#[no_mangle] pub extern "C" fn soplang_get_prop(obj: SoplangValue, name: *const u8, len: usize) -> SoplangValue { ... }
#[no_mangle] pub extern "C" fn soplang_set_prop(obj: SoplangValue, name: *const u8, len: usize, val: SoplangValue) { ... }
// Calls
#[no_mangle] pub extern "C" fn soplang_call(callee: SoplangValue, args: *const SoplangValue, n: i32) -> SoplangValue { ... }

Each function converts `SoplangValue` to/from the existing `Value` enum and delegates to existing stdlib logic where applicable.
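To illustrate what "dynamic dispatch" means for an arithmetic entry point, here is a minimal sketch of an add function that switches on the operand tags. It is self-contained rather than delegating to the real `Value` enum, and its semantics are assumptions: int plus int wraps on overflow, mixed int/float promotes to float, and any other combination returns Null where the real runtime would presumably raise a `SoplangError`. The `#[no_mangle]` attribute is omitted so the snippet compiles on its own.

```rust
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SoplangValue {
    pub tag: u8,
    pub payload: i64,
}

const TAG_NULL: u8 = 0;
const TAG_INT: u8 = 1;
const TAG_FLOAT: u8 = 2;

fn int(n: i64) -> SoplangValue {
    SoplangValue { tag: TAG_INT, payload: n }
}

fn float(x: f64) -> SoplangValue {
    SoplangValue { tag: TAG_FLOAT, payload: x.to_bits() as i64 }
}

/// Numeric view of a value: ints are promoted, floats are un-bit-cast.
fn to_f64(v: SoplangValue) -> Option<f64> {
    match v.tag {
        TAG_INT => Some(v.payload as f64),
        TAG_FLOAT => Some(f64::from_bits(v.payload as u64)),
        _ => None,
    }
}

/// Sketch of a dynamically dispatched add (assumed semantics, see lead-in).
pub extern "C" fn soplang_add(a: SoplangValue, b: SoplangValue) -> SoplangValue {
    match (a.tag, b.tag) {
        // Fast path: both operands are integers.
        (TAG_INT, TAG_INT) => int(a.payload.wrapping_add(b.payload)),
        // Otherwise try numeric promotion to float.
        _ => match (to_f64(a), to_f64(b)) {
            (Some(x), Some(y)) => float(x + y),
            // Assumption: unsupported operands yield Null in this sketch.
            _ => SoplangValue { tag: TAG_NULL, payload: 0 },
        },
    }
}
```

String concatenation and list handling are left out here; in the plan those cases would go through the existing stdlib logic.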
Crates: cranelift-codegen, cranelift-jit, cranelift-frontend, cranelift-module.
- Invocation: `soplang file.sop` (default "run" path).
- Flow: HIR → Cranelift IR → native machine code in process memory → call compiled function.
- No standalone binary: user always runs the `soplang` executable with a `.sop` file.
Pass SoplangValue as two i64 (tag widened): params/returns (i64, i64). For static-typed locals use plain i64 or f64 and only box at boundaries (e.g. call, return, qor).
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_frontend::{FunctionBuilder, FunctionBuilderContext};
use cranelift_codegen::ir::{types, AbiParam, InstBuilder};
pub struct CraneliftBackend {
module: JITModule,
ctx: cranelift_codegen::Context,
fn_ctx: FunctionBuilderContext,
}
impl CraneliftBackend {
pub fn new() -> Self { ... }
pub fn compile_module(&mut self, hir: &HirModule) -> Result<(), SoplangError> { ... }
pub fn get_main_fn(&self) -> *const u8 { ... } // entry point to call
}

Cranelift applies its own passes (constant propagation, dead code elimination, register allocation). For `abn`/`jajab`, emit `iadd`, `fadd`, etc. with no boxing.
Crate: inkwell (LLVM 17+ bindings).
- Invocation: `soplang build file.sop` (or `soplang build file.sop -o mybin`).
- Flow: HIR → LLVM IR → optimize (O2) → emit .o → link with runtime → standalone binary.
- Standalone binary: user gets an executable that runs without `soplang`.
%SoplangValue = type { i8, i64 }
declare %SoplangValue @soplang_add(%SoplangValue, %SoplangValue)
declare %SoplangValue @soplang_int(i64)
declare void @soplang_qor(%SoplangValue)
; ... etc

For `abn x = 42; abn y = x + 3`, emit pure `i64` arithmetic; LLVM constant-folds and optimizes. For `door`, use `%SoplangValue` and runtime calls.
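The static versus dynamic split can be sketched as a single decision per binary operation. The plan emits IR through inkwell; the snippet below instead builds textual LLVM IR with plain string formatting, purely to show which instruction each path would produce. The function name `emit_add` and its parameters are hypothetical.

```rust
/// Returns one line of textual LLVM IR for `dst = lhs + rhs`.
/// `typed == true` models an `abn` operation (native i64 add);
/// `typed == false` models a `door` operation (runtime call on %SoplangValue).
/// Illustration only: the planned backend would use inkwell, not strings.
pub fn emit_add(dst: &str, lhs: &str, rhs: &str, typed: bool) -> String {
    if typed {
        format!("%{dst} = add i64 %{lhs}, %{rhs}")
    } else {
        format!(
            "%{dst} = call %SoplangValue @soplang_add(%SoplangValue %{lhs}, %SoplangValue %{rhs})"
        )
    }
}
```

The `typed` flag here plays the same role as the `typed: bool` field on the HIR `BinOp` instruction: the lowering decides once, and each backend only has to branch on it.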
pub struct LlvmBackend<'ctx> {
ctx: &'ctx Context,
module: Module<'ctx>,
builder: Builder<'ctx>,
value_ty: StructType<'ctx>,
}
impl<'ctx> LlvmBackend<'ctx> {
pub fn compile_module(&mut self, hir: &HirModule) -> Result<(), SoplangError> { ... }
pub fn emit_object_file(&self, path: &Path) -> Result<(), SoplangError> { ... }
pub fn link_binary(&self, obj_path: &Path, out_path: &Path) -> Result<(), SoplangError> { ... }
}

Goal: Walk the AST, build the symbol table, resolve names, assign variable slots, analyse closures, validate static types.
pub struct SymbolTable {
pub scopes: Vec<Scope>,
pub functions: Vec<FunctionMeta>,
pub classes: HashMap<String, ClassMeta>,
}
pub struct Scope {
pub vars: HashMap<String, VarInfo>,
}
pub struct VarInfo {
pub slot: usize,
pub type_ann: TypeAnnotation,
pub is_const: bool,
pub is_captured: bool,
}
pub struct FunctionMeta {
pub name: String,
pub param_slots: Vec<usize>,
pub local_count: usize,
pub captures: Vec<String>,
}
pub fn analyze(stmts: &[Stmt]) -> Result<SymbolTable, SoplangError> { ... }

Tasks: push/pop scopes, assign slots, mark captured vars, validate static types, record `FunctionMeta` per `hawl`.
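The scope and slot bookkeeping listed in the tasks can be sketched with a stack of hash maps. This is a simplified stand-in, not the planned `SymbolTable`: the type annotation is omitted, `ScopeStack` and its methods are hypothetical names, and slots are assumed to grow monotonically within a function (they are not reused when a scope is popped).

```rust
use std::collections::HashMap;

/// Simplified stand-in for `VarInfo` (type annotation omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct VarInfo {
    pub slot: usize,
    pub is_const: bool,
    pub is_captured: bool,
}

/// Hypothetical helper: a stack of lexical scopes plus a slot counter.
#[derive(Default)]
pub struct ScopeStack {
    scopes: Vec<HashMap<String, VarInfo>>,
    next_slot: usize,
}

impl ScopeStack {
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    pub fn pop_scope(&mut self) {
        self.scopes.pop();
    }

    /// Declares `name` in the innermost scope and returns its fresh slot.
    pub fn declare(&mut self, name: &str, is_const: bool) -> usize {
        if self.scopes.is_empty() {
            self.push_scope();
        }
        let slot = self.next_slot;
        self.next_slot += 1;
        let info = VarInfo { slot, is_const, is_captured: false };
        self.scopes.last_mut().unwrap().insert(name.to_string(), info);
        slot
    }

    /// Resolves a name, searching from the innermost scope outwards.
    pub fn resolve(&self, name: &str) -> Option<&VarInfo> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }

    /// Marks the nearest binding of `name` as captured by a closure.
    /// Returns false if the name is not declared anywhere.
    pub fn mark_captured(&mut self, name: &str) -> bool {
        for scope in self.scopes.iter_mut().rev() {
            if let Some(info) = scope.get_mut(name) {
                info.is_captured = true;
                return true;
            }
        }
        false
    }
}
```

Shadowing falls out of the innermost-first lookup: an inner `x` gets its own slot and hides the outer one until its scope is popped.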
- All example files pass semantic analysis. Symbol table has correct slots and metadata.
Goal: Define flat, backend-agnostic IR. Lower annotated AST → HIR.
pub enum HirInstr {
Const { dst: Slot, val: HirConst },
Copy { dst: Slot, src: Slot },
Load { dst: Slot, name: String },
Store { name: String, src: Slot },
BinOp { dst: Slot, op: BinOpKind, lhs: Slot, rhs: Slot, typed: bool },
UnOp { dst: Slot, op: UnOpKind, src: Slot },
BuildList { dst: Slot, items: Vec<Slot> },
BuildObject { dst: Slot, pairs: Vec<(String, Slot)> },
GetIndex { dst: Slot, obj: Slot, idx: Slot },
SetIndex { obj: Slot, idx: Slot, val: Slot },
GetProp { dst: Slot, obj: Slot, prop: String },
SetProp { obj: Slot, prop: String, val: Slot },
Label(LabelId),
Jump(LabelId),
JumpIf { cond: Slot, on_true: LabelId, on_false: LabelId },
Call { dst: Slot, callee: Slot, args: Vec<Slot> },
CallMethod { dst: Slot, obj: Slot, method: String, args: Vec<Slot> },
Return { val: Slot },
Break(LabelId),
Continue(LabelId),
TryBegin { catch: LabelId },
TryEnd,
BindError { dst: Slot },
}
pub enum HirConst { Int(i64), Float(f64), Str(String), Bool(bool), Null }
pub type Slot = usize;
pub type LabelId = usize;
pub struct HirFunction {
pub name: String,
pub params: Vec<Slot>,
pub local_count: usize,
pub body: Vec<HirInstr>,
pub is_static: bool,
}
pub struct HirModule {
pub functions: Vec<HirFunction>,
pub top_level: Vec<HirInstr>,
}

Lowering: `HirLowering::lower(sym, stmts) -> HirModule` with backpatching for jumps.
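One way to handle forward jumps during lowering is sketched below: labels are allocated before their position is known, jumps refer to them by id, and a final pass records where each label landed. The `Stmt` and `Instr` types are cut-down stand-ins for the real AST and `HirInstr`, and `Lowering` and `label_positions` are hypothetical names.

```rust
use std::collections::HashMap;

/// Cut-down stand-in for `HirInstr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    ConstInt { dst: usize, val: i64 },
    Label(usize),
    Jump(usize),
    JumpIf { cond: usize, on_true: usize, on_false: usize },
    Return { val: usize },
}

/// Cut-down stand-in for the annotated AST.
pub enum Stmt {
    SetInt { slot: usize, val: i64 },
    While { cond_slot: usize, body: Vec<Stmt> },
    Return { slot: usize },
}

#[derive(Default)]
pub struct Lowering {
    pub out: Vec<Instr>,
    next_label: usize,
}

impl Lowering {
    fn fresh_label(&mut self) -> usize {
        let id = self.next_label;
        self.next_label += 1;
        id
    }

    /// Flattens one statement into linear instructions with explicit labels.
    pub fn lower(&mut self, stmt: &Stmt) {
        match stmt {
            Stmt::SetInt { slot, val } => {
                self.out.push(Instr::ConstInt { dst: *slot, val: *val });
            }
            Stmt::Return { slot } => self.out.push(Instr::Return { val: *slot }),
            Stmt::While { cond_slot, body } => {
                // Labels are allocated up front; `exit` is placed only at the end.
                let head = self.fresh_label();
                let body_label = self.fresh_label();
                let exit = self.fresh_label();
                self.out.push(Instr::Label(head));
                self.out.push(Instr::JumpIf {
                    cond: *cond_slot,
                    on_true: body_label,
                    on_false: exit,
                });
                self.out.push(Instr::Label(body_label));
                for s in body {
                    self.lower(s);
                }
                self.out.push(Instr::Jump(head));
                self.out.push(Instr::Label(exit));
            }
        }
    }
}

/// Final pass: maps each label id to its instruction index.
pub fn label_positions(code: &[Instr]) -> HashMap<usize, usize> {
    code.iter()
        .enumerate()
        .filter_map(|(i, instr)| match instr {
            Instr::Label(id) => Some((*id, i)),
            _ => None,
        })
        .collect()
}
```

The loop condition is assumed to already live in a slot; a fuller lowering would re-evaluate the condition expression after the `head` label on every iteration.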
- `--dump-hir` prints valid HIR for all examples. No panics in lowering.
Goal: Implement src/runtime.rs with all extern "C" functions that Cranelift and LLVM IR will call.
- Define `SoplangValue` (`repr(C)`) and conversion to/from existing `Value`.
- Implement each runtime function (arithmetic, comparison, qor, gelin, list/object get/set, call).
- Reuse existing stdlib logic behind these wrappers.
- Ensure ABI is stable (e.g. two i64 for Value on 64-bit).
- Runtime compiles. Unit tests or small C/JIT callers can call runtime and get correct results.
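The conversion between `SoplangValue` and `Value` might look like the sketch below. The `Value` enum here is a reduced stand-in with only five variants, and the function names `to_raw` and `from_raw` are hypothetical. String ownership is handled in the simplest possible way (the boxed string is leaked and only cloned on the way back); the real runtime would need a reclamation strategy such as reference counting.

```rust
/// Reduced stand-in for the interpreter's existing `Value` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(String),
}

#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct SoplangValue {
    pub tag: u8,
    pub payload: i64,
}

/// Value → tagged pair. Tag numbers follow the value representation table.
pub fn to_raw(v: Value) -> SoplangValue {
    match v {
        Value::Null => SoplangValue { tag: 0, payload: 0 },
        Value::Int(n) => SoplangValue { tag: 1, payload: n },
        Value::Float(x) => SoplangValue { tag: 2, payload: x.to_bits() as i64 },
        Value::Bool(b) => SoplangValue { tag: 3, payload: b as i64 },
        // The String moves to the heap and its address becomes the payload.
        // In this sketch it is never freed.
        Value::Str(s) => SoplangValue {
            tag: 4,
            payload: Box::into_raw(Box::new(s)) as usize as i64,
        },
    }
}

/// Tagged pair → Value.
///
/// # Safety
/// `raw` must have been produced by `to_raw`, and a Str payload must
/// still point at a live `String`.
pub unsafe fn from_raw(raw: SoplangValue) -> Value {
    match raw.tag {
        1 => Value::Int(raw.payload),
        2 => Value::Float(f64::from_bits(raw.payload as u64)),
        3 => Value::Bool(raw.payload != 0),
        // Clone the pointed-to String instead of taking ownership back.
        4 => Value::Str(unsafe { (*(raw.payload as usize as *const String)).clone() }),
        // Tag 0 and anything unknown map to Null in this sketch.
        _ => Value::Null,
    }
}
```

A round trip such as `unsafe { from_raw(to_raw(Value::Int(7))) }` gives back `Value::Int(7)`, which is the kind of property the Phase 3 unit tests could check for every variant.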
Goal: Full “run” path: source → … → HIR → Cranelift → native code → execute. All language features work.
- HIR → Cranelift IR: For each `HirFunction` and top-level block, emit Cranelift blocks and instructions. Use runtime calls for dynamic ops; use native `i64`/`f64` for static-typed slots.
- Control flow: Map `Label`/`Jump`/`JumpIf` to Cranelift blocks and branches.
- Calls: Compile user functions to Cranelift functions; `Call` compiles to indirect or direct call with correct signature (SoplangValue = two i64).
- Closures: Closure value = function pointer + env pointer; pass env into compiled function as extra arg.
- Classes/methods: `cusub` and method dispatch via runtime helpers or compiled stubs that call runtime.
- Import: Lex/parse/analyze/lower/compile imported file; merge "globals" into current JIT context or call into compiled module.
- Try/catch: Unwind or setjmp/longjmp to catch block; store error in slot.
- REPL: Each input is a small module; compile and run; keep globals in a persistent JIT context.
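Before `Label`/`Jump`/`JumpIf` can be mapped to Cranelift blocks, the flat instruction list has to be cut into basic blocks. The sketch below shows one way to do that split in plain Rust, without the Cranelift crates: a new block starts at every label and after every terminator. The `Instr` enum is a reduced stand-in for `HirInstr`, and `split_blocks` is a hypothetical helper.

```rust
/// Reduced stand-in for `HirInstr`; `Op` represents any non-control instruction.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    Op(&'static str),
    Label(usize),
    Jump(usize),
    JumpIf { cond: usize, on_true: usize, on_false: usize },
    Return,
}

fn is_terminator(instr: &Instr) -> bool {
    matches!(instr, Instr::Jump(_) | Instr::JumpIf { .. } | Instr::Return)
}

/// Splits flat code into basic blocks: a block begins at each `Label`
/// and ends at each terminator (or just before the next label).
pub fn split_blocks(code: &[Instr]) -> Vec<Vec<Instr>> {
    let mut blocks = Vec::new();
    let mut current: Vec<Instr> = Vec::new();
    for instr in code {
        // A label closes the block in progress, if there is one.
        if matches!(instr, Instr::Label(_)) && !current.is_empty() {
            blocks.push(std::mem::take(&mut current));
        }
        current.push(instr.clone());
        // A terminator closes the block it belongs to.
        if is_terminator(instr) {
            blocks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        blocks.push(current);
    }
    blocks
}
```

Blocks that end because the next instruction is a label, rather than with a terminator, fall through in the HIR. Since a Cranelift block must end in a terminator instruction, the backend would need to emit an explicit jump to the following block in that case.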
pub fn run_source(source: &str, path: Option<&Path>) -> Result<(), SoplangError> {
let tokens = Lexer::new(source).tokenize()?;
let ast = Parser::new(tokens).parse()?;
let sym = semantic::analyze(&ast)?;
let hir = hir::HirLowering::lower(&sym, &ast);
let mut jit = backend::cranelift::CraneliftBackend::new();
jit.compile_module(&hir)?;
jit.run_main() // assumed helper: calls the entry point returned by get_main_fn
}

- All 43 examples produce correct output via `soplang file.sop`. Integration tests (.expected) pass.
Goal: soplang build file.sop produces a standalone native binary.
- HIR → LLVM IR: Same HIR as Cranelift; emit LLVM IR via inkwell (runtime declarations, `%SoplangValue`, static vs dynamic as in Cranelift).
- Optimization: Run O2 (or configurable) passes on the module.
- Emit .o: Write object file; link with runtime (same `runtime.rs` compiled into a static lib or linked from the soplang binary).
- CLI: `soplang build <file.sop> [-o output]` → compile → link → output binary.
- `soplang build examples/hello.sop -o hello` produces `./hello` that runs and matches expected output.
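The two command shapes, `soplang file.sop` and `soplang build <file.sop> [-o output]`, can be told apart with a small argument parser. The sketch below uses only the standard library; the `Command` enum and `parse_args` function are hypothetical, and what the output path defaults to when `-o` is absent is left open because the plan does not specify it.

```rust
use std::path::PathBuf;

/// Hypothetical representation of the two CLI modes.
#[derive(Debug, PartialEq)]
pub enum Command {
    /// `soplang file.sop`: JIT-compile and run via Cranelift.
    Run { file: PathBuf },
    /// `soplang build file.sop [-o output]`: AOT-compile via LLVM.
    Build { file: PathBuf, output: Option<PathBuf> },
}

/// Parses the arguments that follow the program name.
pub fn parse_args(args: &[&str]) -> Result<Command, String> {
    match args {
        ["build", file] => Ok(Command::Build {
            file: PathBuf::from(*file),
            output: None,
        }),
        ["build", file, "-o", out] => Ok(Command::Build {
            file: PathBuf::from(*file),
            output: Some(PathBuf::from(*out)),
        }),
        // A lone argument is treated as a file to run, unless it is the
        // bare word "build" with no file after it.
        [file] if *file != "build" => Ok(Command::Run {
            file: PathBuf::from(*file),
        }),
        _ => Err("usage: soplang <file.sop> | soplang build <file.sop> [-o output]".to_string()),
    }
}
```

In `main.rs` this could be fed from `std::env::args().skip(1)`, with `Run` dispatching to `run_source` and `Build` to `build_source`.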
Goal: Interpreter removed. Single run path (Cranelift) and single build path (LLVM). Tests and docs updated.
- Delete `src/interpreter.rs` and any interpreter-only code.
- Ensure `lib.rs` and `main.rs` use only semantic → HIR → Cranelift (run) or LLVM (build).
- REPL/shell: compile each line with Cranelift and run in a persistent context.
- Tests: all use `run_source` (Cranelift); no references to `Interpreter`.
- Benchmarks: measure Cranelift run and LLVM-built binary; update RESULTS.md.
- Docs: README and COMPILER_PLAN state "compiled language; Cranelift for run, LLVM for build".
- No interpreter in tree. `cargo test` green. `soplang file.sop` and `soplang build file.sop` documented and working.
src/
├── token.rs
├── lexer.rs
├── ast.rs
├── parser.rs
├── value.rs
├── scope.rs
├── error.rs
├── stdlib.rs
│
├── semantic.rs Phase 1
├── hir.rs Phase 2
├── runtime.rs Phase 3 — C-ABI for both backends
│
├── backend/
│ ├── mod.rs
│ ├── cranelift.rs Phase 4 — JIT (soplang file.sop)
│ └── llvm.rs Phase 5 — AOT (soplang build file.sop)
│
├── shell.rs REPL: compile + run via Cranelift
├── main.rs CLI: run vs build
├── lib.rs run_source (Cranelift), build_source (LLVM)
│
└── interpreter.rs DELETED Phase 6
| Phase | Focus | Validation |
|---|---|---|
| 1 | Semantic Analyzer | All examples pass; symbol table correct |
| 2 | HIR | --dump-hir works; no panic on examples |
| 3 | Runtime Library | Runtime builds; ABI used by backends |
| 4 | Cranelift JIT | All 43 .expected tests pass via soplang file.sop |
| 5 | LLVM AOT | soplang build file.sop produces working binary |
| 6 | Remove interpreter | No interpreter code; tests green |
| Backend | Command | Output | Dep |
|---|---|---|---|
| Cranelift JIT | soplang file.sop | Run in process; no standalone binary | cranelift-* ~5 MB |
| LLVM AOT | soplang build file.sop | Standalone native binary | inkwell + LLVM ~300 MB dev |
When ready, start with Phase 1 (Semantic Analyzer).