Compiler Internals

This section is a technical deep dive into how the Turf compiler transforms source code into native executables, covering the AST, code generation, and LLVM integration.

The Compilation Pipeline

  1. Lexical Analysis (Lexer.cpp): Converts the raw source text into a stream of tokens (e.g., TOKEN_INT, TOKEN_IDENTIFIER, TOKEN_PLUS).
  2. Parsing (Parser.cpp): A recursive-descent parser consumes tokens and builds an Abstract Syntax Tree (AST).
  3. Semantic Analysis: Performed during parsing and AST construction to ensure types are compatible and variables are declared before use.
  4. Code Generation (Codegen.cpp): The AST nodes are traversed, and each node’s codegen() method generates LLVM Intermediate Representation (IR).
  5. Linking & Optimization: The generated LLVM IR is passed to the LLVM backend, which produces machine code that is then linked into a final binary.
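
As a rough illustration of the first stage, the sketch below tokenizes integers, identifiers, and `+`. The token names TOKEN_INT, TOKEN_IDENTIFIER, and TOKEN_PLUS come from the list above; TOKEN_UNKNOWN, TOKEN_EOF, the Token struct, and the tokenize() function are assumptions made for this example and are not taken from Turf's Lexer.cpp.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token kinds. TOKEN_UNKNOWN and TOKEN_EOF are assumed names for this sketch.
enum TokenKind { TOKEN_INT, TOKEN_IDENTIFIER, TOKEN_PLUS, TOKEN_UNKNOWN, TOKEN_EOF };

// Hypothetical token record: the kind plus the matched source text.
struct Token {
  TokenKind kind;
  std::string text;
};

// Hypothetical helper: converts raw source text into a flat token list.
std::vector<Token> tokenize(const std::string &src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) { ++i; continue; }  // whitespace separates tokens
    std::size_t start = i;
    if (std::isdigit(c)) {
      // Integer literal: consume a run of digits.
      while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
      out.push_back({TOKEN_INT, src.substr(start, i - start)});
    } else if (std::isalpha(c) || c == '_') {
      // Identifier: a letter or underscore, then letters, digits, underscores.
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
      out.push_back({TOKEN_IDENTIFIER, src.substr(start, i - start)});
    } else if (c == '+') {
      ++i;
      out.push_back({TOKEN_PLUS, "+"});
    } else {
      // Anything else is reported as a single unknown character.
      ++i;
      out.push_back({TOKEN_UNKNOWN, src.substr(start, 1)});
    }
  }
  out.push_back({TOKEN_EOF, ""});  // explicit end marker for the parser
  return out;
}
```

Under these assumptions, tokenize("x + 42") yields an identifier, a plus, an integer, and the end marker, which is the kind of stream a recursive-descent parser consumes one token at a time.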

Multi-Pass Compilation

To support features like forward function declarations and recursion without requiring the developer to manually define prototypes, Turf employs a Multi-Pass Compilation strategy in main.cpp.

  1. Pre-Pass (Prototype Registration): The compiler performs an initial, partial parse of the source file. It looks specifically for fn (function definition) nodes. For each function found, it generates an LLVM function prototype (name and signature) but skips the body.
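  2. Main Pass (Full Codegen): The compiler resets the lexer and performs a full parse. Because all functions were registered in the pre-pass, calls to functions defined later in the file are correctly resolved during this second pass.

The pre-pass loop has roughly this shape:

// Pre-pass: register function prototypes
while (CurTok != TOK_EOF) {
  if (CurTok == TOK_FN) {
    // Parse the function definition that starts at the fn token.
    if (auto AST = ParseExpression()) {
      AST->codegen(); // Registers prototype on first call
    } else {
      getNextToken(); // Parse failed: skip a token so the loop keeps advancing
    }
  } else {
    getNextToken(); // Not a function definition: skip ahead
  }
}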
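
To see why the pre-pass makes forward references work, the following self-contained sketch models the two passes without any LLVM dependency. The FnDef struct and the registerPrototypes() and resolveCalls() functions are hypothetical stand-ins invented for this example; Turf's real passes operate on tokens and LLVM function declarations.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a parsed function.
struct FnDef {
  std::string name;
  int arity;                       // number of parameters
  std::vector<std::string> calls;  // names of functions called in the body
};

// Pass 1: record every signature without inspecting any body.
std::map<std::string, int> registerPrototypes(const std::vector<FnDef> &fns) {
  std::map<std::string, int> protos;
  for (const FnDef &f : fns) protos[f.name] = f.arity;
  return protos;
}

// Pass 2: walk the bodies and collect callees that have no registered prototype.
std::vector<std::string> resolveCalls(const std::vector<FnDef> &fns,
                                      const std::map<std::string, int> &protos) {
  std::vector<std::string> unresolved;
  for (const FnDef &f : fns)
    for (const std::string &callee : f.calls)
      if (protos.count(callee) == 0) unresolved.push_back(callee);
  return unresolved;
}
```

With a file in which main calls helper before helper is defined, and helper calls itself, running registerPrototypes() first leaves resolveCalls() with nothing unresolved. Skipping the first pass (passing an empty prototype map) reports both calls as unresolved.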

Abstract Syntax Tree (AST)

The AST is the heart of the compiler’s intermediate representation. Every language construct is represented by a class inheriting from ExprAST (defined in AST.h).
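
A minimal sketch of such a hierarchy is shown below. Only the ExprAST name and the codegen() method come from the text; NumberExprAST and BinaryExprAST are hypothetical node names, and codegen() returns a string here purely so the example runs without LLVM (in Turf it generates LLVM IR).

```cpp
#include <memory>
#include <string>
#include <utility>

// Base class for all nodes. The string return type is an assumption of this
// sketch; it stands in for whatever IR value the real codegen() produces.
class ExprAST {
public:
  virtual ~ExprAST() = default;
  virtual std::string codegen() const = 0;
};

// Hypothetical node: an integer literal.
class NumberExprAST : public ExprAST {
  long Val;

public:
  explicit NumberExprAST(long V) : Val(V) {}
  std::string codegen() const override { return std::to_string(Val); }
};

// Hypothetical node: a binary operation that owns its two operands.
class BinaryExprAST : public ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;

public:
  BinaryExprAST(char O, std::unique_ptr<ExprAST> L, std::unique_ptr<ExprAST> R)
      : Op(O), LHS(std::move(L)), RHS(std::move(R)) {}

  // Recursively generates the operands, then combines them.
  std::string codegen() const override {
    return "(" + LHS->codegen() + " " + Op + " " + RHS->codegen() + ")";
  }
};
```

The tree owns its children through std::unique_ptr, so destroying the root releases the whole expression, and calling codegen() on the root visits every node exactly once.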

Key AST Nodes

Symbol Table Architecture

The SymbolTable (implemented in SymbolTable.cpp and SymbolTable.h) is responsible for managing variable scopes and ensuring semantic correctness.
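
One common way to manage nested scopes is a stack of maps, sketched below. This is an assumption about the general technique, not a description of Turf's SymbolTable.cpp: the method names enterScope(), exitScope(), declare(), and lookup(), and the choice to store a type name per variable, are all hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of a scoped symbol table: one map per active scope, innermost last.
class SymbolTable {
  std::vector<std::unordered_map<std::string, std::string>> scopes;  // name -> type

public:
  SymbolTable() { scopes.emplace_back(); }  // start with the global scope

  void enterScope() { scopes.emplace_back(); }

  // Never pops the global scope.
  void exitScope() {
    if (scopes.size() > 1) scopes.pop_back();
  }

  // Returns false if the name already exists in the innermost scope.
  bool declare(const std::string &name, const std::string &type) {
    return scopes.back().emplace(name, type).second;
  }

  // Searches from the innermost scope outward; returns nullptr if undeclared.
  const std::string *lookup(const std::string &name) const {
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return &found->second;
    }
    return nullptr;
  }
};
```

In this design a null result from lookup() corresponds to a "used before declaration" error, and a false result from declare() corresponds to a redeclaration in the same scope. An inner declaration shadows an outer one until exitScope() is called.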

LLVM Backend Integration

Turf uses the LLVM C++ API to lower AST nodes to LLVM IR, which the LLVM backend then compiles to machine code.

Codegen Context

During codegen(), nodes often need to access shared resources. Turf exposes Builder, TheModule, and TheContext as global pointers, so AST nodes can emit instructions without each node carrying its own references to this state.
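
The pattern can be sketched without LLVM as follows. ModuleStub and BuilderStub are hypothetical stand-ins for the LLVM module and IR builder types, initCodegen() and AddExpr are invented for this example, and TheContext is omitted because the stubs have no equivalent of it. Only the global names Builder and TheModule come from the text.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for an LLVM module: collects emitted instruction text.
struct ModuleStub {
  std::vector<std::string> instructions;
};

// Stand-in for an IR builder: appends instructions to a target module.
struct BuilderStub {
  ModuleStub *target = nullptr;
  int nextTemp = 0;

  // Emits an add and returns the name of the temporary holding the result.
  std::string createAdd(const std::string &l, const std::string &r) {
    std::string name = "%t" + std::to_string(nextTemp++);
    target->instructions.push_back(name + " = add " + l + ", " + r);
    return name;
  }
};

// Global codegen state, named after the globals mentioned in the text.
static std::unique_ptr<ModuleStub> TheModule;
static std::unique_ptr<BuilderStub> Builder;

// Hypothetical setup step: must run before any node's codegen().
void initCodegen() {
  TheModule = std::make_unique<ModuleStub>();
  Builder = std::make_unique<BuilderStub>();
  Builder->target = TheModule.get();
}

// A node reaches the shared state directly; codegen() takes no parameters.
struct AddExpr {
  std::string lhs, rhs;
  std::string codegen() const { return Builder->createAdd(lhs, rhs); }
};
```

The trade-off is the usual one for globals: node classes stay small and codegen() signatures stay uniform, at the cost of hidden shared state that has to be initialized before the first codegen() call.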

Built-in Functions

Turf includes a built-in function system (defined in Builtins.cpp). Functions like print() and printline() are implemented as calls to external functions declared in the LLVM module (e.g., C's printf), which lets Turf programs write to standard output.
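
The sketch below simulates the runtime effect of such a lowering for an integer argument, writing into a buffer rather than standard output so the result can be inspected. The runBuiltin() function is hypothetical, and the specific format strings (including the assumption that printline() appends a newline while print() does not) are guesses for illustration, not taken from Builtins.cpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical model of what a built-in call does once lowered to a
// printf-style C function. Unknown names produce an empty string.
std::string runBuiltin(const std::string &name, int value) {
  char buf[32];
  if (name == "print") {
    std::snprintf(buf, sizeof buf, "%d", value);    // assumed: no newline
  } else if (name == "printline") {
    std::snprintf(buf, sizeof buf, "%d\n", value);  // assumed: trailing newline
  } else {
    return "";  // not a built-in in this sketch
  }
  return buf;
}
```

In a real LLVM-based compiler, the equivalent step would declare the external C function in the module and emit a call instruction with a format string argument, leaving the actual formatting to the C library at run time.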