DaemonDB is a lightweight relational database engine built from scratch in Go. It implements core database concepts including B+ tree indexing, heap file storage, SQL parsing, and query execution, designed for educational purposes and learning database internals.
DaemonDB provides a readable implementation of the fundamental database components, covering the stack from page storage to query execution:
Key Features:
- Complete B+ tree with insert, search, delete, and range scan operations
- Heap file storage system with slot directory for O(1) row lookup
- SQL parser supporting DDL and DML statements
- Query executor with bytecode-based virtual machine
- Page-based storage architecture (4KB pages)
- Thread-safe operations using reader-writer locks
The database follows a layered architecture separating storage, indexing, and query processing:
┌─────────────────────────────────────────┐
│            SQL Query Layer              │
│  (Parser → Code Generator → Executor)   │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│         Index Layer (B+ Tree)           │
│  PrimaryKey → RowPointer(File,Page,Slot)│
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│       Storage Layer (Heap Files)        │
│    Pages (4KB) with Slot Directory      │
└─────────────────────────────────────────┘
- Query Parser: Lexical analysis and syntax parsing of SQL statements
- Code Generator: Converts AST to bytecode instructions
- Query Executor (VM): Executes bytecode, orchestrates B+ tree and heap file operations
- B+ Tree: Index structure mapping primary keys to row locations
- Heap File Manager: Manages row storage in page-based heap files
- Pager: Abstract interface for page-level I/O (currently in-memory)
The B+ tree serves as the primary index, providing O(log n) performance for key lookups and range scans.
- Internal Nodes: Store separator keys and child pointers (navigation only)
- Leaf Nodes: Store key-value pairs with linked list structure for range scans
- Node Capacity: 32 keys per node (configurable via MaxKeys)
- Automatic Splitting: Nodes split when capacity exceeded, with parent propagation
- Balanced Tree: All leaves at same depth, maintains B+ tree invariants
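The splitting rule in the list above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not DaemonDB's actual code: the `leaf` type, the `insertIntoLeaf` function, and the small `maxKeys` constant are hypothetical names introduced here, and keys are ordered with `bytes.Compare`.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// maxKeys is a hypothetical stand-in for the MaxKeys setting (the text uses 32);
// a small value keeps the demo readable.
const maxKeys = 4

// leaf is an illustrative leaf node: sorted keys, parallel values, and a
// next pointer that links leaves for range scans.
type leaf struct {
	keys   [][]byte
	values [][]byte
	next   *leaf
}

// insertIntoLeaf inserts key/value in sorted position. If the leaf then
// exceeds maxKeys, it splits and returns the new right sibling together with
// the separator key that the parent would need to store.
func insertIntoLeaf(l *leaf, key, value []byte) (right *leaf, separator []byte) {
	// Binary search for the first key >= the new key.
	i := sort.Search(len(l.keys), func(j int) bool {
		return bytes.Compare(l.keys[j], key) >= 0
	})
	if i < len(l.keys) && bytes.Equal(l.keys[i], key) {
		l.values[i] = value // overwrite an existing key
		return nil, nil
	}
	// Shift the tail right by one position and place the new entry.
	l.keys = append(l.keys, nil)
	copy(l.keys[i+1:], l.keys[i:])
	l.keys[i] = key
	l.values = append(l.values, nil)
	copy(l.values[i+1:], l.values[i:])
	l.values[i] = value

	if len(l.keys) <= maxKeys {
		return nil, nil
	}
	// Split: the upper half moves to a new right sibling.
	mid := len(l.keys) / 2
	right = &leaf{
		keys:   append([][]byte(nil), l.keys[mid:]...),
		values: append([][]byte(nil), l.values[mid:]...),
		next:   l.next,
	}
	l.keys, l.values, l.next = l.keys[:mid], l.values[:mid], right
	// In a B+ tree the separator is a copy of the right sibling's first key.
	return right, right.keys[0]
}

func main() {
	l := &leaf{}
	for _, k := range []string{"S003", "S001", "S005", "S002", "S004"} {
		if right, sep := insertIntoLeaf(l, []byte(k), []byte("ptr-"+k)); right != nil {
			fmt.Printf("split: separator=%s left=%d keys right=%d keys\n",
				sep, len(l.keys), len(right.keys))
		}
	}
}
```

The separator is copied rather than moved, so every key stays in a leaf and the linked leaf chain remains a complete, ordered view of the data.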
// Initialize tree
pager := bplus.NewInMemoryPager()
cache := bplus.NewBufferPool(10)
tree := bplus.NewBPlusTree(pager, cache, bytes.Compare)
// Insert data
tree.Insertion([]byte("S001"), []byte("RowPointerBytes"))
// Search data
result, _ := tree.Search([]byte("S001"))
fmt.Printf("Found: %s\n", string(result))
// Delete data
tree.Delete([]byte("S001"))
// Range scan
iter := tree.SeekGE([]byte("S001"))
for iter.Valid() {
    // Use the key and value; Go rejects declared-but-unused variables
    fmt.Printf("%s => %s\n", iter.Key(), iter.Value())
    iter.Next()
}
Completed Features:
- ✅ Leaf node insertion with splitting
- ✅ Internal node splitting with parent propagation
- ✅ Recursive parent updates (handles multi-level splits)
- ✅ Delete operations with borrow/merge logic
- ✅ Binary search optimization for key lookups
- ✅ Range scan iterator (SeekGE, Next)
- ✅ Thread-safe operations with reader-writer locks
File Structure:
- struct.go: Node and tree data structures
- insertion.go: Insert operations
- split_leaf.go: Leaf node splitting
- split_internal.go: Internal node splitting
- parent_insert.go: Parent propagation logic
- deletion.go: Delete with borrow/merge
- search.go: Point lookup
- iterator.go: Range scan operations
- find_leaf.go: Leaf node navigation
- binary_search.go: Binary search utilities
The heap file system stores actual row data in page-based files with a slot directory for efficient row access.
Each 4KB page contains:
- Header (32 bytes): FileID, PageNo, FreePtr, NumRows, SlotCount, etc.
- Data Area: Rows stored sequentially from offset 32
- Slot Directory: Grows backward from end of page (4 bytes per slot: offset + length)
┌─────────────────────────────────────────┐
│ Page (4096 bytes)                       │
├─────────────────────────────────────────┤
│ Header (32B): metadata                  │
├─────────────────────────────────────────┤
│ Row 1 | Row 2 | Row 3 | ...             │
│ (grows forward)                         │
├─────────────────────────────────────────┤
│ ...                                     │
│ Slot 3 | Slot 2 | Slot 1 | Slot 0       │
│ (grows backward)                        │
└─────────────────────────────────────────┘
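The page layout above can be sketched as a small slotted page in Go. This is an illustration only: the `page` type and its methods are hypothetical, the 4-byte slot is assumed to be a `uint16` offset followed by a `uint16` length in little-endian order, and the free pointer and slot count are kept as struct fields instead of being encoded in the 32-byte header.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	pageSize   = 4096 // page size from the text
	headerSize = 32   // header size from the text
	slotSize   = 4    // assumed layout: uint16 offset + uint16 length
)

// page is an illustrative slotted page. freePtr and slotCount would live in
// the header on disk; they are plain fields here to keep the sketch short.
type page struct {
	buf       [pageSize]byte
	freePtr   int // next free byte in the data area
	slotCount int // number of slots in the directory
}

func newPage() *page { return &page{freePtr: headerSize} }

// slotPos returns the byte position of slot i; slot 0 sits at the page end.
func slotPos(i int) int { return pageSize - (i+1)*slotSize }

// insert appends the row to the data area and records it in a new slot.
func (p *page) insert(row []byte) (int, error) {
	dirStart := slotPos(p.slotCount) // where the new slot would begin
	if p.freePtr+len(row) > dirStart {
		return 0, errors.New("page full")
	}
	copy(p.buf[p.freePtr:], row)
	binary.LittleEndian.PutUint16(p.buf[dirStart:], uint16(p.freePtr))
	binary.LittleEndian.PutUint16(p.buf[dirStart+2:], uint16(len(row)))
	p.freePtr += len(row)
	p.slotCount++
	return p.slotCount - 1, nil
}

// get reads a row in O(1): one slot lookup, then one slice of the data area.
func (p *page) get(slot int) ([]byte, error) {
	if slot < 0 || slot >= p.slotCount {
		return nil, errors.New("invalid slot")
	}
	pos := slotPos(slot)
	off := int(binary.LittleEndian.Uint16(p.buf[pos:]))
	n := int(binary.LittleEndian.Uint16(p.buf[pos+2:]))
	return p.buf[off : off+n], nil
}

func main() {
	p := newPage()
	a, _ := p.insert([]byte("S001|Alice|20|A"))
	b, _ := p.insert([]byte("S002|Bob|21|B"))
	rowA, _ := p.get(a)
	rowB, _ := p.get(b)
	fmt.Printf("slot %d: %s\nslot %d: %s\n", a, rowA, b, rowB)
}
```

Because rows grow forward and slots grow backward, the page is full exactly when the two regions would meet, which is the single check in `insert`.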
// Create heap file manager
hfm, _ := heapfile.NewHeapFileManager("./data")
// Create heap file for a table
hfm.CreateHeapfile("students", fileID)
// Insert row
rowPtr, _ := hfm.InsertRow(fileID, rowData)
// Returns: RowPointer{FileID, PageNumber, SlotIndex}
// Read row
// rowData is already declared above, so assign with = instead of :=
rowData, _ = hfm.GetRow(rowPtr)
Completed Features:
- ✅ Page-based storage (4KB pages)
- ✅ Slot directory for O(1) row lookup within pages
- ✅ Automatic page creation when pages fill up
- ✅ Row insertion with space management
- ✅ Row retrieval using RowPointer
- ✅ Thread-safe file operations
File Structure:
- struct.go: PageHeader, Slot, RowPointer, HeapFile structures
- heapfile.go: HeapFile operations (insertRow, GetRow, findSuitablePage)
- heapfile_manager.go: HeapFileManager (CreateHeapfile, InsertRow, GetRow)
- page_io.go: Low-level page read/write operations
- page_header.go: Header serialization/deserialization
- slots.go: Slot directory operations (readSlot, writeSlot, addSlot)
Architecture:
- 1 Table = 1 HeapFile (one .heapfile per table)
- 1 HeapFile = Multiple Pages (grows automatically as needed)
- 1 Page = Multiple Rows (~100-200 rows per page, depends on row size)
A complete SQL processing pipeline from lexical analysis to AST generation.
-- Table creation
CREATE TABLE students {
id string,
name string,
age int,
grade string
}
-- Data insertion
INSERT INTO students VALUES ("S001", "Alice", 20, "A")
INSERT INTO students VALUES ("S002", "Bob", 21, "B")
-- Data querying
SELECT * FROM students
SELECT name, grade FROM students
-- Data updates
UPDATE students SET grade = "A+" WHERE id = "S001"
-- Table Join
-- JOIN by default does INNER JOIN
SELECT * FROM table1 JOIN table2 ON id1 = id2
SELECT * FROM table1 INNER JOIN table2 ON id1 = id2
SELECT * FROM table1 JOIN table2 ON table1.id1 = table2.id2
SELECT * FROM table1 JOIN table2 ON id1 = id2 WHERE table1.id = 5
SELECT * FROM table1 JOIN table2 ON table1.id1 = table2.id2 WHERE table1.id = 5
SELECT * FROM table1 JOIN table2 ON table1.name = table2.refname WHERE table1.name = "abc"
SELECT * FROM table1 JOIN table2 ON table1.name = table2.refname WHERE table1.id = NULL
-- Similarly for the other join types
SELECT * FROM table1 LEFT JOIN table2 ON id1 = id2
SELECT * FROM table1 RIGHT JOIN table2 ON id1 = id2
SELECT * FROM table1 FULL JOIN table2 ON id1 = id2
-- Table management
DROP students
- Lexer: Hand-written tokenizer for SQL keywords, identifiers, literals
- Parser: Recursive descent parser for syntax analysis
- AST: Abstract syntax tree generation for each statement type
- Code Generator: Converts AST to bytecode instructions
File Structure:
- lexer/lexer.go: Tokenization implementation
- lexer/token.go: Token definitions
- parser/parser.go: Recursive descent parser
- parser/ast.go: AST node definitions
- code-generator/code_generator.go: Bytecode emission
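The lexer stage can be illustrated with a small hand-written tokenizer. This is a sketch under assumptions, not the code in lexer/lexer.go: the `token` type, the token kinds, and the keyword subset are hypothetical, and only double-quoted strings, integers, identifiers, and single-character symbols are handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type tokenKind string

const (
	keyword tokenKind = "KEYWORD"
	ident   tokenKind = "IDENT"
	number  tokenKind = "NUMBER"
	str     tokenKind = "STRING"
	symbol  tokenKind = "SYMBOL"
)

type token struct {
	kind tokenKind
	text string
}

// keywords is a small illustrative subset, not DaemonDB's full list.
var keywords = map[string]bool{
	"SELECT": true, "FROM": true, "WHERE": true, "INSERT": true,
	"INTO": true, "VALUES": true, "JOIN": true, "ON": true,
}

// tokenize scans the input once, left to right, emitting one token per lexeme.
func tokenize(input string) ([]token, error) {
	var out []token
	rs := []rune(input)
	for i := 0; i < len(rs); {
		c := rs[i]
		switch {
		case unicode.IsSpace(c):
			i++
		case unicode.IsLetter(c) || c == '_':
			start := i
			for i < len(rs) && (unicode.IsLetter(rs[i]) || unicode.IsDigit(rs[i]) || rs[i] == '_') {
				i++
			}
			word := string(rs[start:i])
			if keywords[strings.ToUpper(word)] {
				// Keywords are case-insensitive, so normalize them to upper case.
				out = append(out, token{keyword, strings.ToUpper(word)})
			} else {
				out = append(out, token{ident, word})
			}
		case unicode.IsDigit(c):
			start := i
			for i < len(rs) && unicode.IsDigit(rs[i]) {
				i++
			}
			out = append(out, token{number, string(rs[start:i])})
		case c == '"':
			start := i + 1
			i++
			for i < len(rs) && rs[i] != '"' {
				i++
			}
			if i == len(rs) {
				return nil, fmt.Errorf("unterminated string starting at %d", start-1)
			}
			out = append(out, token{str, string(rs[start:i])})
			i++ // skip the closing quote
		default:
			out = append(out, token{symbol, string(c)})
			i++
		}
	}
	return out, nil
}

func main() {
	toks, err := tokenize(`SELECT name FROM students WHERE id = "S001"`)
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%-8s %s\n", t.kind, t.text)
	}
}
```

A recursive descent parser would then consume this token slice one statement type at a time.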
The query executor uses a bytecode-based virtual machine (VDBE-style) to execute SQL statements.
SQL Query
    ↓
Parser → AST
    ↓
Code Generator → Bytecode Instructions
    ↓
VM.Execute()
    ├── HeapFileManager.InsertRow() → Write row data
    ├── B+ Tree.Insertion() → Index the row
    └── Return result
Completed:
- ✅ CREATE TABLE execution
- ✅ INSERT execution (writes to heap file + indexes in B+ tree)
- ✅ Bytecode instruction set (OP_PUSH_VAL, OP_INSERT, OP_SELECT, etc.)
- ✅ Row serialization/deserialization
- ✅ Primary key extraction
- ✅ RowPointer serialization
In Progress:
- 🚧 SELECT execution (primary-key lookups and full scans work; non-PK WHERE filtering TODO)
- 🚧 UPDATE execution (parser complete, executor TODO)
- 🚧 DELETE execution (parser complete, executor TODO)
File Structure:
- executor.go: VM implementation and statement execution
- helpers.go: Serialization utilities, table schema management
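The bytecode approach can be sketched as a tiny stack machine. The opcode names OP_PUSH_VAL and OP_INSERT come from the list above, but their semantics here are assumptions, and the `vm` and `instruction` types are hypothetical. An in-memory map stands in for the heap file and B+ tree calls that the real executor makes.

```go
package main

import "fmt"

type opcode int

const (
	OP_PUSH_VAL opcode = iota // push an operand onto the value stack
	OP_INSERT                 // pop all pushed values and store them as one row
)

type instruction struct {
	op      opcode
	operand string // a value for OP_PUSH_VAL, a table name for OP_INSERT
}

// vm is an illustrative stack machine. The tables map stands in for the
// heap file and B+ tree that the real executor would call.
type vm struct {
	stack  []string
	tables map[string][][]string
}

// execute runs the instructions in order as a simple fetch-dispatch loop.
func (m *vm) execute(program []instruction) error {
	for pc, ins := range program {
		switch ins.op {
		case OP_PUSH_VAL:
			m.stack = append(m.stack, ins.operand)
		case OP_INSERT:
			if len(m.stack) == 0 {
				return fmt.Errorf("pc %d: OP_INSERT with empty stack", pc)
			}
			// Copy the stack so the stored row does not alias later pushes.
			row := append([]string(nil), m.stack...)
			m.tables[ins.operand] = append(m.tables[ins.operand], row)
			m.stack = m.stack[:0]
		default:
			return fmt.Errorf("pc %d: unknown opcode %d", pc, ins.op)
		}
	}
	return nil
}

func main() {
	// Hand-assembled bytecode for:
	// INSERT INTO students VALUES ("S001", "Alice", 20, "A")
	program := []instruction{
		{OP_PUSH_VAL, "S001"},
		{OP_PUSH_VAL, "Alice"},
		{OP_PUSH_VAL, "20"},
		{OP_PUSH_VAL, "A"},
		{OP_INSERT, "students"},
	}
	m := &vm{tables: map[string][][]string{}}
	if err := m.execute(program); err != nil {
		panic(err)
	}
	fmt.Println(m.tables["students"])
}
```

Separating code generation from execution this way means a new statement type only needs new opcodes and a new case in the dispatch loop.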
DaemonDB/
├── bplustree/                  # B+ Tree index implementation
│   ├── struct.go               # Data structures
│   ├── insertion.go            # Insert operations
│   ├── deletion.go             # Delete operations
│   ├── search.go               # Point lookup
│   ├── iterator.go             # Range scan
│   ├── split_leaf.go           # Leaf splitting
│   ├── split_internal.go       # Internal node splitting
│   ├── parent_insert.go        # Parent propagation
│   ├── find_leaf.go            # Leaf navigation
│   ├── binary_search.go        # Binary search utilities
│   ├── pager.go                # Pager interface (in-memory)
│   └── ...
├── heapfile_manager/           # Heap file storage system
│   ├── struct.go               # PageHeader, Slot, RowPointer
│   ├── heapfile.go             # HeapFile operations
│   ├── heapfile_manager.go     # HeapFileManager
│   ├── page_io.go              # Page read/write
│   ├── page_header.go          # Header serialization
│   ├── slots.go                # Slot directory operations
│   └── heapfile_test.go        # Comprehensive tests
├── query_parser/               # SQL parsing
│   ├── lexer/                  # Lexical analysis
│   ├── parser/                 # Syntax analysis
│   └── code-generator/         # Bytecode generation
├── query_executor/             # Query execution
│   ├── executor.go             # VM and execution
│   └── helpers.go              # Utilities
├── main.go                     # Entry point
└── README.md                   # This file
go run main.go
Then enter SQL queries:
CREATE TABLE students {
id string primary key,
name string,
age int,
grade string
}
INSERT INTO students VALUES ("S001", "Alice", 20, "A")
INSERT INTO students VALUES ("S002", "Bob", 21, "B")
-- Point lookup uses the B+ tree on the declared primary key
SELECT * FROM students WHERE id = "S002"
cd heapfile_manager
go test -v -run TestAll
This runs comprehensive tests:
- Basic insert/read operations
- Multiple pages
- Slot directory functionality
- Page header management
cd bplustree
go run bplus.go
| Component | Status | Description |
|---|---|---|
| B+ Tree Core | ✅ Complete | Full CRUD with parent propagation, internal splits |
| B+ Tree Iterator | ✅ Complete | Range scan operations (SeekGE, Next) |
| Heap File Storage | ✅ Complete | Page-based storage with slot directory |
| Heap File Operations | ✅ Complete | Insert, GetRow (Delete/Update TODO) |
| SQL Parser | ✅ Complete | Lexer and parser for DDL/DML |
| Code Generator | ✅ Complete | AST to bytecode conversion |
| Query Executor | 🚧 Partial | INSERT/CREATE TABLE working, SELECT uses PK index; UPDATE/DELETE TODO |
| Concurrency | ✅ Complete | Thread-safe operations |
| File Persistence | 🚧 Partial | Heap files on disk, B+ tree index pages on disk (root persisted) |
| Buffer Pool | ✅ Complete | LRU cache with pin/unpin, dirty tracking |
| Node Serialization | ✅ Complete | Encode/decode nodes to pages |
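The buffer pool row in the table mentions an LRU cache with pin/unpin and dirty tracking. The following is a minimal sketch of that general technique, assuming hypothetical names (`bufferPool`, `frame`, `fetch`, `unpin`, `evict`). It is not DaemonDB's implementation, and the `flushed` slice only records which pages a real pool would write back to disk.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// frame is one cached page. All names here are hypothetical.
type frame struct {
	pageID   int
	data     []byte
	pinCount int
	dirty    bool
}

// bufferPool is an illustrative LRU cache: the list front is most recently used.
type bufferPool struct {
	capacity int
	lru      *list.List            // elements hold *frame
	frames   map[int]*list.Element // pageID -> list element
	flushed  []int                 // page IDs written back on eviction (demo only)
}

func newBufferPool(capacity int) *bufferPool {
	return &bufferPool{capacity: capacity, lru: list.New(), frames: map[int]*list.Element{}}
}

// fetch returns the page pinned; on a miss it loads the page, evicting the
// least recently used unpinned frame if the pool is full.
func (bp *bufferPool) fetch(pageID int) (*frame, error) {
	if el, ok := bp.frames[pageID]; ok {
		bp.lru.MoveToFront(el)
		f := el.Value.(*frame)
		f.pinCount++
		return f, nil
	}
	if bp.lru.Len() >= bp.capacity {
		if err := bp.evict(); err != nil {
			return nil, err
		}
	}
	f := &frame{pageID: pageID, data: make([]byte, 4096), pinCount: 1}
	bp.frames[pageID] = bp.lru.PushFront(f)
	return f, nil
}

// unpin releases one pin and records whether the caller modified the page.
func (bp *bufferPool) unpin(pageID int, dirty bool) {
	if el, ok := bp.frames[pageID]; ok {
		f := el.Value.(*frame)
		if f.pinCount > 0 {
			f.pinCount--
		}
		f.dirty = f.dirty || dirty
	}
}

// evict walks from the LRU end and removes the first unpinned frame,
// recording a flush first if the frame is dirty.
func (bp *bufferPool) evict() error {
	for el := bp.lru.Back(); el != nil; el = el.Prev() {
		f := el.Value.(*frame)
		if f.pinCount == 0 {
			if f.dirty {
				bp.flushed = append(bp.flushed, f.pageID) // a real pool would write to disk here
			}
			bp.lru.Remove(el)
			delete(bp.frames, f.pageID)
			return nil
		}
	}
	return errors.New("all frames are pinned")
}

func main() {
	bp := newBufferPool(2)
	bp.fetch(1)
	bp.unpin(1, true) // page 1 was modified
	bp.fetch(2)
	bp.unpin(2, false)
	bp.fetch(3) // pool is full: page 1 is least recently used, so it is flushed and evicted
	fmt.Println("flushed on eviction:", bp.flushed)
}
```

Pinned frames are never evicted, which is what lets a B+ tree operation hold a node safely while it reads or modifies it.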
1. User: INSERT INTO students VALUES ("S001", "Alice", 20, "A")
   ↓
2. Parser: Parse SQL → AST
   ↓
3. Code Generator: AST → Bytecode
   ↓
4. VM.Execute():
   a. SerializeRow() → Convert values to bytes
   b. HeapFileManager.InsertRow() → Write to heap file
      → Returns: RowPointer(FileID=1, PageNumber=0, SlotIndex=3)
   c. SerializeRowPointer() → Convert to 10 bytes (FileID, PageNumber, SlotIndex)
   d. ExtractPrimaryKey() → declared PK if present, otherwise implicit rowid
   e. B+ Tree.Insertion(PK, RowPointerBytes) → Stores index: PK → RowPointer
1. User: SELECT * FROM students WHERE id = "S001"
   ↓
2. Parser: Parse SQL → AST
   ↓
3. Code Generator: AST → Bytecode
   ↓
4. VM.Execute():
   a. B+ Tree.Search("S001") (only when WHERE is on the primary key)
      → Returns: RowPointer bytes
   b. DeserializeRowPointer() → RowPointer(1, 0, 3)
   c. HeapFileManager.GetRow(RowPointer)
      → Reads page 0, slot 3 → Returns row data
   d. DeserializeRow() → Convert bytes to values
   e. Return result to user (SELECT without WHERE still does a full scan)
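Both flows pass a 10-byte RowPointer through the index. A possible encoding is sketched below. The field widths (4 bytes for FileID, 4 for PageNumber, 2 for SlotIndex) and the little-endian byte order are assumptions chosen to total 10 bytes, and the function names are illustrative, not DaemonDB's actual helpers.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// RowPointer mirrors the (FileID, PageNumber, SlotIndex) triple from the text.
// The field widths (4 + 4 + 2 bytes) are an assumption chosen to total 10 bytes.
type RowPointer struct {
	FileID     uint32
	PageNumber uint32
	SlotIndex  uint16
}

const rowPointerSize = 10

// serializeRowPointer packs the pointer into a fixed 10-byte value suitable
// for storing as the B+ tree value. Little-endian order is an assumption.
func serializeRowPointer(rp RowPointer) []byte {
	buf := make([]byte, rowPointerSize)
	binary.LittleEndian.PutUint32(buf[0:4], rp.FileID)
	binary.LittleEndian.PutUint32(buf[4:8], rp.PageNumber)
	binary.LittleEndian.PutUint16(buf[8:10], rp.SlotIndex)
	return buf
}

// deserializeRowPointer is the inverse, used after an index lookup.
func deserializeRowPointer(buf []byte) (RowPointer, error) {
	if len(buf) != rowPointerSize {
		return RowPointer{}, errors.New("row pointer must be exactly 10 bytes")
	}
	return RowPointer{
		FileID:     binary.LittleEndian.Uint32(buf[0:4]),
		PageNumber: binary.LittleEndian.Uint32(buf[4:8]),
		SlotIndex:  binary.LittleEndian.Uint16(buf[8:10]),
	}, nil
}

func main() {
	rp := RowPointer{FileID: 1, PageNumber: 0, SlotIndex: 3}
	encoded := serializeRowPointer(rp)
	decoded, _ := deserializeRowPointer(encoded)
	fmt.Printf("%d bytes: % x -> %+v\n", len(encoded), encoded, decoded)
}
```

A fixed-width value keeps every index entry the same size, which simplifies node capacity calculations.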
- B+ Tree Search: O(log n) for point lookups
- B+ Tree Range Scan: O(log n + k) where k is result size
- Heap File Insert: O(1) per row (amortized)
- Heap File Read: O(1) using slot directory
- Page Size: 4KB (disk-aligned)
- Node Capacity: 32 keys per node
- Concurrency: Reader-writer locks for optimal read performance
- Language: Go 1.19+
- Storage: Heap files on disk (4KB pages)
- Indexing: B+ tree (currently in-memory, disk persistence planned)
- Query Language: SQL with DDL/DML support
- Concurrency: Thread-safe with reader-writer mutex locks
- Architecture: Heap-organized tables with a B+ tree primary index (index entries point to heap file rows)
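To make the O(log n) lookup cost concrete, the sketch below computes a lower bound on tree height for a node capacity of 32 keys. It assumes every node is completely full, so internal nodes have 33 children; `minHeight` is a hypothetical helper, and real trees can be taller because nodes are only partly full after splits.

```go
package main

import "fmt"

// minHeight returns the smallest number of levels a B+ tree needs to index n
// keys when every node is full: leaves hold maxKeys entries and internal
// nodes have maxKeys+1 children.
func minHeight(n, maxKeys int) int {
	if n <= 0 {
		return 0
	}
	height := 1
	capacity := maxKeys // keys reachable with a single leaf
	for capacity < n {
		capacity *= maxKeys + 1 // one more internal level multiplies the fan-out
		height++
	}
	return height
}

func main() {
	for _, n := range []int{32, 1000, 1000000} {
		fmt.Printf("%8d keys -> at least %d levels\n", n, minHeight(n, 32))
	}
}
```

With these assumptions a million keys fit in four levels, so a point lookup touches only a handful of pages.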
The project includes comprehensive tests:
# Test heap file system
cd heapfile_manager
go test -v
# Test specific functionality
go test -v -run TestHeapFileOperations
go test -v -run TestMultiplePages
go test -v -run TestSlotDirectory
- Implement UPDATE/DELETE operations in heap files
- Add secondary indexes and non-PK WHERE filtering
- Add transaction support
- Implement WAL (Write-Ahead Logging) for durability
This project is licensed under the MIT License.
This is an educational project built for learning database internals. Contributions and suggestions are welcome!