C-to-Assembly (x86-64) compiler for a basic subset of C.

norskeld/cyan

cyan

C-to-Assembly (x86-64) compiler for a basic subset of C.

Why

Simply to learn more about compilers, assembly, and how not to design languages. :)

Features

If something is missing from the list below, it is not planned.

  • Operators:
    • Unary:
      • Prefix (--, ++, !, ~, -)
      • Postfix (--, ++)
    • Binary
      • Arithmetic (+, -, *, /, %)
      • Bitwise (&, |, ^, <<, >>)
    • Logical (!, &&, ||)
    • Relational (<, <=, >, >=, ==, !=)
  • Local variables:
    • Declaration
    • Assignments
    • Compound assignments (+=, -=, etc.)
    • Scopes
  • Storage-class specifiers:
    • static
    • extern
    • typedef
  • Conditionals and control flow:
    • If statements
    • Ternary expressions
    • Labeled statements
    • Switch statements
    • goto statements
    • break and continue
  • Loops:
    • For loops
    • While loops
    • Do-while loops
  • Functions:
    • Function declarations
    • Function definitions
    • Function calls
  • Types:
    • void
    • int
    • long
    • unsigned int
    • unsigned long
    • double
    • char
    • signed char
    • unsigned char
    • Structs
    • Unions
    • Pointers
    • Pointer arithmetic
    • Arrays
  • Memory management:
    • sizeof operator
    • malloc
    • calloc
    • realloc
    • aligned_alloc
    • free

Optimizations:

  • Constant folding
  • Dead code elimination
  • Dead store elimination
  • Copy propagation
  • Register allocation
  • Register coalescing
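
To make the first item concrete, here is a minimal sketch of constant folding over a TAC-like instruction list. The `Binary` and `Copy` classes and the `fold_constants` function are hypothetical names chosen for illustration; they are not taken from cyan's source. The sketch also ignores C's fixed-width integer semantics and skips division, which a real pass would have to handle.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Hypothetical operand: an int constant or a variable name.
Operand = Union[int, str]


@dataclass(frozen=True)
class Binary:
    """dst = lhs op rhs"""
    op: str
    lhs: Operand
    rhs: Operand
    dst: str


@dataclass(frozen=True)
class Copy:
    """dst = src"""
    src: Operand
    dst: str


# Only operators that are safe to evaluate without extra checks.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(instructions):
    """Replace Binary instructions with two constant operands by a Copy."""
    folded = []
    for ins in instructions:
        if (
            isinstance(ins, Binary)
            and ins.op in FOLDABLE
            and isinstance(ins.lhs, int)
            and isinstance(ins.rhs, int)
        ):
            value = FOLDABLE[ins.op](ins.lhs, ins.rhs)
            folded.append(Copy(value, ins.dst))
        else:
            folded.append(ins)
    return folded
```

Folding usually pays off in combination with copy propagation: once `t0 = 2 + 3` becomes `t0 = 5`, propagating `5` into later uses of `t0` can expose further foldable instructions.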

Additional features:

  • Use the QBE backend.
  • Non-standard extensions to the language, like modules (not macros).

Grammar

Defined using EBNF-like notation.

Definition
program
  = { function-declaration }

declaration
  = variable-declaration
  | function-declaration

function-declaration
  = "int" identifier "(" param-list ")" ( block | ";" )

variable-declaration
  = "int" identifier [ "=" expression ] ";"

param-list
  = "void"
  | "int" identifier { "," "int" identifier }

block
  = "{" { block-item } "}"

block-item
  = declaration
  | statement

statement
  = "return" expression ";"
  | expression ";"
  | identifier ":" statement
  | "if" "(" expression ")" statement [ "else" statement ]
  | "break" ";"
  | "continue" ";"
  | "switch" "(" expression ")" statement
  | "while" "(" expression ")" statement
  | "do" statement "while" "(" expression ")" ";"
  | "for" "(" initializer [ expression ] ";" [ expression ] ")" statement
  | "goto" identifier ";"
  | block
  | ";"

initializer
  = variable-declaration
  | [ expression ] ";"

expression
  = factor
  | expression binary-op expression
  | expression "?" expression ":" expression

factor
  = unary-op factor
  | postfix

postfix
  = primary { postfix-op }

primary
  = int
  | identifier
  | "(" expression ")"
  | identifier "(" [ argument-list ] ")"

argument-list
  = expression { "," expression }

unary-op
  = "-" | "~" | "!" | "++" | "--"

postfix-op
  = "++" | "--"

binary-op
  = "+" | "-" | "*" | "/" | "%"
  | "<<" | ">>" | "&" | "|" | "^"
  | "&&" | "||" | "==" | "!=" | "<" | "<=" | ">" | ">="
  | "=" | "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>="

identifier
  = ? An identifier token ?

int
  = ? A constant token ?
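
The `expression` rule above is ambiguous as written (`expression binary-op expression` does not say which operator binds tighter), so a parser needs a separate precedence table. One common technique is precedence climbing. The sketch below is an illustration under assumptions, not necessarily what cyan does: tokens are a plain Python list, trees are tuples, and only a few unary operators and left-associative binary operators are covered (no ternary, assignment, postfix, or calls).

```python
# Binding power of binary operators (higher binds tighter),
# following C's usual precedence order.
PRECEDENCE = {
    "*": 50, "/": 50, "%": 50,
    "+": 45, "-": 45,
    "<<": 40, ">>": 40,
    "<": 35, "<=": 35, ">": 35, ">=": 35,
    "==": 30, "!=": 30,
    "&": 25, "^": 20, "|": 15,
    "&&": 10, "||": 5,
}


def parse_expression(tokens, pos=0, min_prec=0):
    """Parse tokens starting at pos; return (tree, next_pos)."""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
        op = tokens[pos]
        # Left-associative: the right operand must bind strictly tighter.
        right, pos = parse_expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse_factor(tokens, pos):
    """factor = unary-op factor | "(" expression ")" | int | identifier"""
    tok = tokens[pos]
    if tok in ("-", "~", "!"):
        operand, pos = parse_factor(tokens, pos + 1)
        return (tok, operand), pos
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    return tok, pos + 1  # int constant or identifier
```

For example, `parse_expression([1, "+", 2, "*", 3])` groups the multiplication first, while `1 - 2 - 3` groups to the left.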

Trees and IRs

The standard drill: programs get lexed, parsed into an AST, lowered to TAC, and then converted into an AAST, which is used to emit assembly.

AST

This is used to represent the syntax tree of the program, and to perform semantic analysis.

Three Address Code (TAC)

This IR sits between the AST and the assembly code. It lets us handle structural transformations separately from the details of assembly language (still to be done), and it should also be well suited to compile-time optimizations (also still to be done).
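
As a rough illustration of what lowering to TAC means, the sketch below flattens a nested expression into a list of instructions that each use at most two source operands and one destination. The tuple-based tree shape, the instruction tuples, and the `tmp.N` naming are assumptions made for this example, not cyan's actual representation. Short-circuiting operators (`&&`, `||`) need jumps and are not handled here.

```python
import itertools


def lower(expr, out, temps):
    """Append TAC instructions for expr to out; return the operand holding its value."""
    if isinstance(expr, (int, str)):
        return expr  # constants and variables are already valid operands
    if len(expr) == 2:
        op, inner = expr
        src = lower(inner, out, temps)
        dst = f"tmp.{next(temps)}"
        out.append(("unary", op, src, dst))
        return dst
    op, left, right = expr
    lhs = lower(left, out, temps)
    rhs = lower(right, out, temps)
    dst = f"tmp.{next(temps)}"
    out.append(("binary", op, lhs, rhs, dst))
    return dst


def lower_return(expr):
    """Lower `return expr;` into a flat instruction list."""
    out = []
    value = lower(expr, out, itertools.count())
    out.append(("return", value))
    return out
```

Lowering `return 1 + 2 * -x;` this way produces one instruction per operator, with every intermediate result stored in a fresh temporary.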

Assembly AST (AAST)

This IR is very low-level, relatively flat, and is used to emit assembly code in AT&T syntax.
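
A flat assembly tree maps almost one-to-one onto text. The sketch below shows the idea for a function that returns a constant; the class names (`Imm`, `Reg`, `Mov`, `Ret`, `Function`) and the `emit` function are hypothetical, and the output assumes a Linux-style symbol name (macOS would need a leading underscore). In AT&T syntax the source operand comes first, immediates are prefixed with `$`, and registers with `%`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Imm:
    value: int


@dataclass
class Reg:
    name: str


@dataclass
class Mov:
    src: Union[Imm, Reg]
    dst: Reg


@dataclass
class Ret:
    pass


@dataclass
class Function:
    name: str
    instructions: List[Union[Mov, Ret]]


def format_operand(operand):
    """Render an operand in AT&T syntax."""
    if isinstance(operand, Imm):
        return f"${operand.value}"
    return f"%{operand.name}"


def emit(function):
    """Render a function as AT&T-syntax assembly text."""
    lines = [f"\t.globl {function.name}", f"{function.name}:"]
    for ins in function.instructions:
        if isinstance(ins, Mov):
            src = format_operand(ins.src)
            dst = format_operand(ins.dst)
            lines.append(f"\tmovl {src}, {dst}")
        elif isinstance(ins, Ret):
            lines.append("\tret")
        else:
            raise ValueError(f"unsupported instruction: {ins!r}")
    return "\n".join(lines) + "\n"
```

Calling `emit(Function("main", [Mov(Imm(2), Reg("eax")), Ret()]))` yields the assembly for `int main(void) { return 2; }`.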

Links

License

MIT.
