C-to-Assembly (x86-64) compiler for a basic subset of C.
Written simply to learn more about compilers, assembly, and how not to design languages. :)
Anything missing from the list below is not planned.
- Operators:
  - Unary:
    - Prefix (--, ++, !, ~, -)
    - Postfix (--, ++)
  - Binary:
    - Arithmetic (+, -, *, /, %)
    - Bitwise (&, |, ^, <<, >>)
  - Logical (!, &&, ||)
  - Relational (<, <=, >, >=, ==, !=)
- Local variables:
  - Declaration
  - Assignments
  - Compound assignments (+=, -=, etc.)
  - Scopes
- Storage-class specifiers:
  - static
  - extern
  - typedef
- Conditionals and control flow:
  - If statements
  - Ternary expressions
  - Labeled statements
  - Switch statements
  - goto statements
  - break and continue
- Loops:
  - For loops
  - While loops
  - Do-while loops
- Functions:
  - Function declarations
  - Function definitions
  - Function calls
- Types:
  - void
  - int
  - long
  - unsigned int
  - unsigned long
  - double
  - char
  - signed char
  - unsigned char
  - Structs
  - Unions
  - Pointers
  - Pointer arithmetic
  - Arrays
- Memory management:
  - sizeof operator
  - malloc
  - calloc
  - realloc
  - aligned_alloc
  - free
Optimizations:
- Constant folding
- Dead code elimination
- Dead store elimination
- Copy propagation
- Register allocation
- Register coalescing
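Constant folding, the first optimization in the list above, can be sketched on a toy expression tree. The `Expr` type and the `fold_constants` function are hypothetical names invented for this example, not taken from this compiler's source, and the sketch ignores signed overflow, which a real pass would have to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node, invented for this sketch. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_BINARY } ExprKind;

typedef struct Expr {
    ExprKind kind;
    long value;        /* meaningful when kind == EXPR_CONST */
    char op;           /* '+', '-', '*', '/' or '%' when kind == EXPR_BINARY */
    struct Expr *lhs;  /* operands, set only for EXPR_BINARY */
    struct Expr *rhs;
} Expr;

/* Fold constant subexpressions bottom-up, in place. Returns the same node.
 * Overflow is not checked here (an assumption made to keep the sketch short). */
Expr *fold_constants(Expr *e) {
    if (e == NULL || e->kind != EXPR_BINARY)
        return e;
    assert(e->lhs != NULL && e->rhs != NULL);

    /* Fold the children first so nested constants collapse. */
    fold_constants(e->lhs);
    fold_constants(e->rhs);
    if (e->lhs->kind != EXPR_CONST || e->rhs->kind != EXPR_CONST)
        return e;

    long a = e->lhs->value;
    long b = e->rhs->value;
    long result;
    switch (e->op) {
    case '+': result = a + b; break;
    case '-': result = a - b; break;
    case '*': result = a * b; break;
    case '/':
        if (b == 0) return e;  /* leave division by zero unfolded */
        result = a / b;
        break;
    case '%':
        if (b == 0) return e;
        result = a % b;
        break;
    default:
        return e;              /* unknown operator: do nothing */
    }

    /* Rewrite the binary node into a constant node. */
    e->kind = EXPR_CONST;
    e->value = result;
    e->lhs = NULL;
    e->rhs = NULL;
    return e;
}
```

With this sketch, a tree for 2 + 3 * 4 collapses into a single constant node holding 14, while x * (1 + 2) keeps its multiplication and only the right operand becomes the constant 3.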
Additional features:
- Use the QBE backend.
- Non-standard extensions to the language, like modules (not macros).
The grammar is defined using EBNF-like notation.
Definition
program
= { function-declaration }
declaration
= variable-declaration
| function-declaration
function-declaration
= "int" identifier "(" param-list ")" ( block | ";" )
variable-declaration
= "int" identifier [ "=" expression ] ";"
param-list
= "void"
| "int" identifier { "," "int" identifier }
block
= "{" { block-item } "}"
block-item
= declaration
| statement
statement
= "return" expression ";"
| expression ";"
| identifier ":" statement
| "if" "(" expression ")" statement [ "else" statement ]
| "break" ";"
| "continue" ";"
| "switch" "(" expression ")" statement
| "while" "(" expression ")" statement
| "do" statement "while" "(" expression ")" ";"
| "for" "(" initializer [ expression ] ";" [ expression ] ")" statement
| "goto" identifier ";"
| block
| ";"
initializer
= variable-declaration
| [ expression ] ";"
expression
= factor
| expression binary-op expression
| expression "?" expression ":" expression
factor
= unary-op factor
| postfix
postfix
= primary { postfix-op }
primary
= int
| identifier
| "(" expression ")"
| identifier "(" [ argument-list ] ")"
argument-list
= expression { "," expression }
unary-op
= "-" | "~" | "!" | "++" | "--"
postfix-op
= "++" | "--"
binary-op
= "+" | "-" | "*" | "/" | "%"
| "<<" | ">>" | "&" | "|" | "^"
| "&&" | "||" | "==" | "!=" | "<" | "<=" | ">" | ">="
| "=" | "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>="
identifier
= ? An identifier token ?
int
= ? A constant token ?

The standard drill: programs get lexed, then parsed into an AST, then lowered to TAC, then into an AAST, which is then used to emit assembly.
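To make this concrete, here is a small input of the kind the grammar above should accept: only int functions with int parameters, local declarations, loops, conditionals, and the listed operators. The function names are made up for illustration, and this is a sketch rather than a test case from the project.

```c
/* Greatest common divisor (Euclid): uses "while", "%" and plain assignment. */
int gcd(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Sum of the odd numbers in 1..n: uses "for", "if", "continue", "+=" and "++". */
int sum_odd(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        if (i % 2 == 0)
            continue;
        total += i;
    }
    return total;
}

/* Larger of two ints: uses the ternary expression. */
int max_of(int a, int b) {
    return a > b ? a : b;
}
```

Adding an `int main(void)` that returns `gcd(48, 18)` would turn this into a complete program whose exit status is 6.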
AST: represents the syntax tree of the program and is used to perform semantic analysis.
TAC: this IR sits between the AST and the assembly code. It will let us handle structural transformations separately from the details of assembly language (still to be done), and it should also be well suited to applying some compile-time optimizations (also still to be done).
AAST: this IR is very low-level and relatively flat, and is used to emit assembly code in AT&T syntax.
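The lowering step from a tree to a flat IR can be sketched as follows, assuming TAC here means three-address code, where each instruction has at most one operator and results go into fresh temporaries. All types and names (`Node`, `TacInstr`, `TacBuilder`, `lower`) are hypothetical and invented for this example; the project's actual IR may look quite different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tree node, invented for this sketch. */
typedef enum { NODE_CONST, NODE_VAR, NODE_BINARY } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                /* NODE_CONST */
    const char *name;         /* NODE_VAR */
    char op;                  /* NODE_BINARY: '+', '-', '*', ... */
    const struct Node *lhs;   /* NODE_BINARY operands */
    const struct Node *rhs;
} Node;

enum { MAX_INSTRS = 64, OPERAND_LEN = 16 };

/* One three-address instruction: dst = src1 op src2. */
typedef struct {
    char op;
    char dst[OPERAND_LEN];
    char src1[OPERAND_LEN];
    char src2[OPERAND_LEN];
} TacInstr;

typedef struct {
    TacInstr instrs[MAX_INSTRS];
    int count;      /* number of instructions emitted so far */
    int next_temp;  /* counter for fresh temporaries t0, t1, ... */
} TacBuilder;

/* Lower `n` and write the operand that holds its value into `out`. */
void lower(TacBuilder *b, const Node *n, char out[OPERAND_LEN]) {
    switch (n->kind) {
    case NODE_CONST:
        snprintf(out, OPERAND_LEN, "%d", n->value);
        return;
    case NODE_VAR:
        snprintf(out, OPERAND_LEN, "%s", n->name);
        return;
    case NODE_BINARY: {
        char lhs[OPERAND_LEN];
        char rhs[OPERAND_LEN];
        /* Operands are lowered first, so their instructions come earlier. */
        lower(b, n->lhs, lhs);
        lower(b, n->rhs, rhs);
        assert(b->count < MAX_INSTRS);
        TacInstr *ins = &b->instrs[b->count++];
        ins->op = n->op;
        snprintf(ins->dst, OPERAND_LEN, "t%d", b->next_temp++);
        snprintf(ins->src1, OPERAND_LEN, "%s", lhs);
        snprintf(ins->src2, OPERAND_LEN, "%s", rhs);
        snprintf(out, OPERAND_LEN, "%s", ins->dst);
        return;
    }
    }
}
```

For the expression a + b * 2, this sketch emits two instructions, `t0 = b * 2` followed by `t1 = a + t0`, and reports `t1` as the operand holding the result. Keeping every instruction this simple is what makes a flat IR convenient for optimization passes and for the later translation to assembly.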
License: MIT.