Not to be confused with Yet Another Compiler Compiler (which I learned about only after naming this project...)
French version: README-fr.md
Yet Another C Compiler (YACC) is a simple compiler for a subset of the C programming language, written in Python as part of a compilation course at an engineering school. It translates C code into assembly code for a minimal stack machine (MSM).
- Python 3.10 or higher
- GCC (for compiling the MSM simulator)

- Clone the repository:

      git clone https://github.com/AlexZeGamer/yacc.git
      cd yacc

- (optional) Install the project as a package:

      pip install .

- Compile the MSM simulator:

      cd msm
      gcc -o msm msm.c  # or 'make msm' if make is installed

- (optional) Add the MSM simulator to your PATH for easy access. Run these from the repository root, so that `msm` refers to the directory containing the compiled simulator:

      # Linux / WSL
      export PATH="$PATH:$(pwd)/msm"  # To make it permanent, add this line to ~/.bashrc with the absolute path

      # Windows PowerShell
      $env:PATH += ";$PWD/msm"  # To make it permanent, add this line to your PowerShell profile with the absolute path

    python yacc.py <parameters>

or

    yacc <parameters>  # if installed as a package

Input:
- `input.c` or `-i <input.c>` or `--input <input.c>`: Source file to compile
- `--str="<source_code>"`: Compile the provided C source code string instead of reading from a file
- `--stdin`: Read the C source code from standard input (e.g. via a pipe)

Output:
- `output.asm` or `-o <output.asm>` or `--output <output.asm>`: Output assembly file
- `--stdout`: Print the generated assembly code to standard output instead of writing to a file (you can pipe it to the MSM simulator to run it directly, e.g. `yacc input.c --stdout | ./msm/msm`)

Other options:
- `-v` or `--verbose` or `--debug`: Enable verbose mode for detailed output of every compilation step
- `-h` or `--help`: Show help message

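As an illustration of how the options listed above fit together, here is a minimal `argparse` sketch. It is not the project's actual command-line code: the function name `build_parser`, the destination names (`input_file`, `input_opt`, `source`) and the single-dash short forms `-i` and `-o` are assumptions made for this example (the usage examples in this README do use `-o`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options described in this README."""
    parser = argparse.ArgumentParser(
        prog="yacc",
        description="Compile a subset of C to MSM assembly",
    )
    # Input: a positional file, -i/--input, an inline string, or stdin.
    parser.add_argument("input_file", nargs="?", help="source file to compile")
    parser.add_argument("-i", "--input", dest="input_opt", metavar="INPUT.C",
                        help="source file to compile")
    parser.add_argument("--str", dest="source", metavar="SOURCE_CODE",
                        help="compile this C source string instead of a file")
    parser.add_argument("--stdin", action="store_true",
                        help="read the C source from standard input")
    # Output: a file, or standard output.
    parser.add_argument("-o", "--output", metavar="OUTPUT.ASM",
                        help="output assembly file")
    parser.add_argument("--stdout", action="store_true",
                        help="print the assembly instead of writing a file")
    # Other options (-h/--help is added by argparse automatically).
    parser.add_argument("-v", "--verbose", "--debug", dest="verbose",
                        action="store_true", help="enable verbose mode")
    return parser
```

With this sketch, `build_parser().parse_args(["input.c", "-o", "output.asm", "--verbose"])` yields a namespace whose `input_file`, `output` and `verbose` fields hold the values passed on the command line.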
See the MSM simulator README for more details about the MSM assembly language and simulator.
To use the MSM simulator, you can pipe the output of the YACC compiler directly into the simulator. For example:
    yacc input.c --stdout | ./msm/msm

or

    yacc input.c --stdout | msm  # if msm is in your PATH

Compile a C source file to an assembly file:

    yacc input.c -o output.asm

Compile a C source file and run it directly with the MSM simulator:

    yacc input.c --stdout | msm

Compile a C source code string and run it directly with the MSM simulator:

    yacc --str="int main() { return 42; }" --stdout | msm

Compile C source code read from standard input and run it directly with the MSM simulator:

    cat input.c | yacc --stdin --stdout | msm

Compile a C source file with verbose output for debugging:

    yacc input.c -o output.asm --verbose

Install the test runner once (for example with `python -m pip install pytest`) and execute the suite from the repository root:

    python -m pytest

You can also target a single pipeline step by pointing pytest at the matching file, e.g. `python -m pytest tests/test_parser.py` to limit the run to parser tests.

The compilation process consists of the following steps, each implemented in the file listed with it:

- Lexical analysis (tokenisation): lexer.py
  Converts the input C source code into a list of tokens representing the smallest units of meaning (keywords, identifiers, operators, etc.)
- Syntax analysis (parsing): parser.py
  Parses the sequence of tokens to build an Abstract Syntax Tree (AST) representing the program structure
- Semantic analysis: sema.py
  Checks the AST for semantic errors (e.g., in variable declarations or types) and annotates it with additional information
- Optimization: optimizer.py
  Performs optimizations on the AST to improve performance (e.g., constant folding, dead code elimination)
- Code generation (Codegen): codegen.py
  Generates the target assembly code from the optimized AST
- Optimization: optimizer.py
  Performs optimizations on the generated assembly code to improve performance (e.g., removing instructions that cancel each other out)
- Binary: (This step is done by the MSM simulator)
  The generated assembly code is interpreted line by line to run the program

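To make the assembly-level optimization step more concrete, here is a small peephole sketch that removes instructions that cancel each other out. It is not taken from optimizer.py: the function name `peephole` and both rewrite rules are assumptions for illustration, and the second rule is only valid if `set N` pops the value it stores, which is assumed here and should be checked against the MSM simulator README.

```python
def peephole(instructions: list[str]) -> list[str]:
    """Single left-to-right pass applying two local rewrite rules.

    Patterns only match consecutive instructions, so a label line between
    them (a possible jump target) prevents the rewrite.
    """
    out: list[str] = []
    for ins in instructions:
        out.append(ins)
        # Rule 1: "push X" then "drop 1" leaves the stack unchanged.
        if len(out) >= 2 and out[-2].startswith("push ") and out[-1] == "drop 1":
            del out[-2:]
        # Rule 2 (assumes "set N" pops its operand):
        # "dup; set N; drop 1" has the same effect as "set N".
        elif (len(out) >= 3 and out[-3] == "dup"
              and out[-2].startswith("set ") and out[-1] == "drop 1"):
            out[-3:] = [out[-2]]
    return out
```

For instance, `peephole(["push 5", "dup", "set 1", "drop 1"])` returns `["push 5", "set 1"]` under the stated assumption about `set`.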
Show diagram
graph TB
left_CS(["Source code"]) --> left_AL
subgraph Frontend
left_AL("**Lexical analysis (tokenisation)**<br>lexer.py") --> left_ASy
left_ASy("**Syntax analysis (parsing)**<br>parser.py") --> left_ASe
end
left_ASe("**Semantic analysis**<br>sema.py") --> left_Opt
left_Opt("**Optimization**<br>optimizer.py") --> left_GC
subgraph Backend
left_GC("**Code generation (Codegen)**<br>codegen.py") --> left_Opt2
end
left_Opt2("**Optimization**<br>optimizer.py") -- msm --> left_Bin
left_Bin(["**Binary**"])

- Source code:

      ...
      if (a == 3) {
          b = 5;
      }
      ...

- Lexical analysis (tokenisation):

      ...
      TOK_IF "if"
      TOK_LPARENTHESIS "("
      TOK_IDENT "a"
      TOK_EQ "=="
      TOK_CONST "3"
      TOK_RPARENTHESIS ")"
      TOK_LBRACE "{"
      TOK_IDENT "b"
      TOK_AFFECT "="
      TOK_CONST "5"
      TOK_SEMICOLON ";"
      TOK_RBRACE "}"
      ...
      TOK_EOF

- Syntax analysis (AST generation):

      NODE_BLOCK
      ...
      ├── NODE_COND
      │   ├── NODE_EQ
      │   │   ├── NODE_REF a
      │   │   └── NODE_CONST #3
      │   └── NODE_BLOCK
      │       └── NODE_DROP
      │           └── NODE_AFFECT
      │               ├── NODE_REF b
      │               └── NODE_CONST #5
      ...

- Semantic analysis (AST verification & annotation):

      NODE_BLOCK
      ...
      ├── NODE_COND
      │   ├── NODE_EQ
      │   │   ├── NODE_REF a @0
      │   │   └── NODE_CONST #3
      │   └── NODE_BLOCK
      │       └── NODE_DROP
      │           └── NODE_AFFECT
      │               ├── NODE_REF b @1
      │               └── NODE_CONST #5
      ...

- Optimization:

      TODO

- Code generation:

      ...
      get 0
      push 3
      cmpeq
      jumpf L0_else
      push 5
      dup
      set 1
      drop 1
      .L0_else
      ...

- Optimization:

      TODO

- Binary:

      0101010101101000111010...

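The tokenisation step of this example can be sketched with a small regex-based lexer. This is only an illustration, not the code in lexer.py: the token names are the ones used in the listing above, while `TOKEN_SPEC`, `KEYWORDS`, `MASTER` and `tokenize` are names invented for this sketch, and only the handful of tokens needed by the example are covered.

```python
import re

# Order matters: "==" must be tried before "=".
TOKEN_SPEC = [
    ("TOK_CONST", r"\d+"),
    ("TOK_IDENT", r"[A-Za-z_]\w*"),
    ("TOK_EQ", r"=="),
    ("TOK_AFFECT", r"="),
    ("TOK_LPARENTHESIS", r"\("),
    ("TOK_RPARENTHESIS", r"\)"),
    ("TOK_LBRACE", r"\{"),
    ("TOK_RBRACE", r"\}"),
    ("TOK_SEMICOLON", r";"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
KEYWORDS = {"if": "TOK_IF"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token_type, text) pairs, ending with TOK_EOF."""
    tokens: list[tuple[str, str]] = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning here
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "TOK_IDENT":
            kind = KEYWORDS.get(text, kind)  # keywords look like identifiers
        tokens.append((kind, text))
    tokens.append(("TOK_EOF", ""))
    return tokens
```

Calling `tokenize("if (a == 3) { b = 5; }")` produces the token types of the example in order, from `TOK_IF` to `TOK_RBRACE`, followed by `TOK_EOF`.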
Show diagram
graph TB
right_CS("**Source code :**<code><p style='text-align:left; white-space:pre;'>...
if (a == 3) {
    b = 5;
}
...</p></code>") --> right_AL
right_AL("**Lexical analysis<br>(tokenisation) :**<code><p style='text-align:left;white-space:pre;'>...
TOK_IF "if"
TOK_LPARENTHESIS "("
TOK_IDENT "a"
TOK_EQ "=="
TOK_CONST "3"
TOK_RPARENTHESIS ")"
TOK_LBRACE "{"
TOK_IDENT "b"
TOK_AFFECT "="
TOK_CONST "5"
TOK_SEMICOLON ";"
TOK_RBRACE "}"
...
TOK_EOF</p></code>") --> right_ASy
right_ASy("**Syntax analysis<br>(AST generation) :**<code><p style='text-align:left; white-space:pre;'>NODE_BLOCK
...
├── NODE_COND
│   ├── NODE_EQ
│   │   ├── NODE_REF a
│   │   └── NODE_CONST #3
│   └── NODE_BLOCK
│       └── NODE_DROP
│           └── NODE_AFFECT
│               ├── NODE_REF b
│               └── NODE_CONST #5
...</p></code>") --> right_ASe
right_ASe("**Semantic analysis<br>(AST verification & annotation) :**<code><p style='text-align:left;white-space:pre;'>NODE_BLOCK
...
├── NODE_COND
│   ├── NODE_EQ
│   │   ├── NODE_REF a @0
│   │   └── NODE_CONST #3
│   └── NODE_BLOCK
│       └── NODE_DROP
│           └── NODE_AFFECT
│               ├── NODE_REF b @1
│               └── NODE_CONST #5
...</p></code>") --> right_Opt
right_Opt("**Optimization :**<code><p style='text-align:left;white-space:pre;'>TODO</p></code>") --> right_GC
right_GC("**Code generation :**<code><p style='text-align:left;white-space:pre;'>...
get 0
push 3
cmpeq
jumpf L0_else
push 5
dup
set 1
drop 1
.L0_else
...</p></code>") --> right_Opt2
right_Opt2("**Optimization :**<code><p style='text-align:left;white-space:pre;'>TODO</p></code>") -- msm --> right_Bin
right_Bin(["**Binary :**<code><p style='text-align:left;'>0101010101101000111010...</p></code>"])
Alexandre MALFREYT (GitHub | LinkedIn | Website)