Smart Error Recovery
In-depth architectural overview of Turf’s advanced diagnostic engine and error recovery mechanisms.
Turf is designed with a “Diagnostics-First” philosophy. Unlike traditional compilers that halt at the first sign of trouble, Turf’s architecture is built to gracefully handle errors, recover, and continue analysis to provide the most comprehensive feedback possible.
Distributed Diagnostic Engine
The core of Turf’s error handling is the DiagnosticEngine, a centralized registry for all compiler issues. This engine is decoupled from the parser and code generator, allowing for:
- Deferred Reporting: Errors are collected and sorted by source location before being flushed to the user.
- Context Preservation: Each error captures the full SourceLocation, the relevant code snippet, and specialized hints.
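The deferred-reporting idea can be sketched in a few lines of C++. This is a minimal illustration, not Turf's actual code: the field layout of SourceLocation and Diagnostic and the report()/flush() method names are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical types for illustration; Turf's real definitions may differ.
struct SourceLocation {
    std::string file;
    int line;
    int column;
};

struct Diagnostic {
    SourceLocation loc;
    std::string message;
    std::string snippet;  // the offending source line
    std::string hint;     // optional suggestion text
};

class DiagnosticEngine {
public:
    // Record an issue without printing it, so the caller can keep going.
    void report(Diagnostic d) { pending_.push_back(std::move(d)); }

    // Sort by (file, line, column) and hand back everything collected so far.
    // stable_sort keeps report order for diagnostics at the same location.
    std::vector<Diagnostic> flush() {
        std::stable_sort(pending_.begin(), pending_.end(),
            [](const Diagnostic& a, const Diagnostic& b) {
                return std::tie(a.loc.file, a.loc.line, a.loc.column) <
                       std::tie(b.loc.file, b.loc.line, b.loc.column);
            });
        std::vector<Diagnostic> out;
        out.swap(pending_);  // leaves the engine empty for the next phase
        return out;
    }

    bool empty() const { return pending_.empty(); }

private:
    std::vector<Diagnostic> pending_;
};
```

Because the engine only stores and sorts records, a parser, linter, or code generator can each call report() without knowing how or when the output is rendered.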
Phase 1: Heuristic-Based Recovery
The current implementation (Phase 1) combines string-similarity heuristics, which suggest fixes for common developer mistakes, with parser-level recovery points.
1. Damerau-Levenshtein Distance
Turf implements the Damerau-Levenshtein distance algorithm in Algorithms.cpp to provide suggestions for:
- Keyword Typos: If a user writes funtion instead of fn, the compiler identifies the similarity and suggests the correct keyword.
- Variable Name Typos: Misspelled variables are matched against the SymbolTable to find the most likely intended reference.
- Transposition Detection: Special handling for swapped adjacent characters (e.g., tehn instead of then), which standard Levenshtein distance counts as two edits rather than one.
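As a rough illustration of the matching step, the sketch below computes the optimal string alignment distance, the restricted form of Damerau-Levenshtein in which each substring is edited at most once. Whether Algorithms.cpp uses this variant or the unrestricted one is not stated here, and the names osaDistance and suggest, as well as the maxDistance cutoff, are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment distance: Levenshtein edits (insert, delete,
// substitute) plus the swap of two adjacent characters, each costing 1.
std::size_t osaDistance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;  // delete all of a's prefix
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;  // insert all of b's prefix
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,           // deletion
                                d[i][j - 1] + 1,           // insertion
                                d[i - 1][j - 1] + cost});  // substitution or match
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);  // transposition
            }
        }
    }
    return d[n][m];
}

// Hypothetical helper: return the closest candidate whose distance is at most
// maxDistance, or an empty string when nothing is close enough.
std::string suggest(const std::string& word,
                    const std::vector<std::string>& candidates,
                    std::size_t maxDistance) {
    std::string best;
    std::size_t bestDistance = maxDistance + 1;
    for (const std::string& candidate : candidates) {
        const std::size_t dist = osaDistance(word, candidate);
        if (dist < bestDistance) {
            bestDistance = dist;
            best = candidate;
        }
    }
    return best;
}
```

With this function, tehn and then are one edit apart (a single swap), whereas plain Levenshtein distance would report two. In a real compiler the candidate list would presumably come from the keyword set and the SymbolTable.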
2. Guarded Recursive Descent
The parser uses a setjmp/longjmp mechanism (as seen in main.cpp) to create “recovery points.” When an error is encountered:
- The parser records the error via TurfError::raise().
- It then jumps back to a known-good state (e.g., the start of the next statement or function).
- This allows Turf to report multiple independent syntax errors in a single pass.
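The recovery-point pattern can be sketched with a toy grammar (statement := IDENT "=" NUMBER ";"). Everything here is illustrative: raiseError is a stand-in for TurfError::raise(), and the global parser state, the function names, and the "skip to the next semicolon" resynchronization rule are assumptions rather than a description of main.cpp.

```cpp
#include <cctype>
#include <csetjmp>
#include <cstddef>
#include <string>
#include <vector>

namespace {
std::jmp_buf g_recovery;                            // current recovery point
const std::vector<std::string>* g_tokens = nullptr; // token stream being parsed
std::size_t g_pos = 0;                              // index of the next token
std::size_t g_parsed = 0;                           // statements parsed cleanly
std::vector<std::string> g_errors;                  // collected error messages

bool atEnd() { return g_pos >= g_tokens->size(); }

// Record the error, then unwind to the most recent recovery point. The
// temporaries built for the message are destroyed before longjmp runs.
[[noreturn]] void raiseError(const char* message) {
    g_errors.push_back(std::string(message) + " at token " + std::to_string(g_pos));
    std::longjmp(g_recovery, 1);
}

bool isIdentifier(const std::string& t) {
    return !t.empty() && (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

bool isNumber(const std::string& t) {
    if (t.empty()) return false;
    for (char c : t) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    return true;
}

// statement := IDENT "=" NUMBER ";"
// No local objects with destructors live here, so longjmp can safely skip this frame.
void parseStatement() {
    if (atEnd() || !isIdentifier((*g_tokens)[g_pos])) raiseError("expected identifier");
    ++g_pos;
    if (atEnd() || (*g_tokens)[g_pos] != "=") raiseError("expected '='");
    ++g_pos;
    if (atEnd() || !isNumber((*g_tokens)[g_pos])) raiseError("expected number");
    ++g_pos;
    if (atEnd() || (*g_tokens)[g_pos] != ";") raiseError("expected ';'");
    ++g_pos;
}
}  // namespace

// Parses every statement, recovering after each error. Returns the number of
// statements that parsed cleanly; state is kept in globals so nothing modified
// after setjmp lives in a local variable.
std::size_t parseProgram(const std::vector<std::string>& tokens) {
    g_tokens = &tokens;
    g_pos = 0;
    g_parsed = 0;
    g_errors.clear();
    while (!atEnd()) {
        if (setjmp(g_recovery) != 0) {
            // Reached via longjmp: resynchronize just past the next ';'.
            while (!atEnd() && (*g_tokens)[g_pos] != ";") ++g_pos;
            if (!atEnd()) ++g_pos;
            continue;
        }
        parseStatement();
        ++g_parsed;
    }
    return g_parsed;
}

const std::vector<std::string>& parseErrors() { return g_errors; }
```

One design caveat worth keeping in mind: in C++, longjmp does not run destructors, and jumping over a frame that holds an object with a non-trivial destructor is undefined behavior. The sketch avoids this by keeping only references and integers alive in the frames that get skipped.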
Phase 2: Speculative Re-parsing (Conceptual)
Future versions of Turf will introduce Speculative Re-parsing, where the compiler internally “tries out” various fixes for an error.
Architectural Flow:
- Error Trigger: An invalid token is encountered.
- Context Snapshot: The compiler creates a snapshot of the current AST and Symbol Table.
- Speculation: The compiler generates multiple “fix candidates” (using heuristics or LLMs).
- Validation: Each candidate is speculatively parsed.
- Ranking: Candidates that resolve the error and lead to a valid AST are ranked and presented to the user.
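Since Phase 2 is still conceptual, the following is only a toy sketch of the flow above, working on a token list instead of an AST. The candidate kinds (delete, replace, insert), the cost weights used for ranking, and the bracket-balance check that stands in for a real speculative parse are all assumptions chosen to keep the example small. Copying the token vector for each candidate plays the role of the context snapshot.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;

struct FixCandidate {
    std::string description;
    Tokens tokens;  // patched copy; the original stream is never mutated
    int cost;       // hypothetical ranking weight, lower is better
};

// Toy validator standing in for a speculative parse: bracket balance only.
bool bracketsBalanced(const Tokens& tokens) {
    std::vector<std::string> open;
    for (const std::string& t : tokens) {
        if (t == "(" || t == "[") {
            open.push_back(t);
        } else if (t == ")" || t == "]") {
            const std::string want = (t == ")") ? "(" : "[";
            if (open.empty() || open.back() != want) return false;
            open.pop_back();
        }
    }
    return open.empty();
}

// Speculation step. Precondition: errorIndex < tokens.size().
std::vector<FixCandidate> generateCandidates(const Tokens& tokens,
                                             std::size_t errorIndex,
                                             const Tokens& expected) {
    const auto offset = static_cast<std::ptrdiff_t>(errorIndex);
    const std::string& bad = tokens[errorIndex];
    std::vector<FixCandidate> out;

    Tokens deleted = tokens;
    deleted.erase(deleted.begin() + offset);
    out.push_back({"delete '" + bad + "'", deleted, 2});

    for (const std::string& e : expected) {
        Tokens replaced = tokens;
        replaced[errorIndex] = e;
        out.push_back({"replace '" + bad + "' with '" + e + "'", replaced, 1});

        Tokens inserted = tokens;
        inserted.insert(inserted.begin() + offset, e);
        out.push_back({"insert '" + e + "' before '" + bad + "'", inserted, 1});
    }
    return out;
}

// Validation and ranking: keep candidates the validator accepts, cheapest first.
std::vector<FixCandidate> speculate(const Tokens& tokens,
                                    std::size_t errorIndex,
                                    const Tokens& expected,
                                    const std::function<bool(const Tokens&)>& isValid) {
    std::vector<FixCandidate> survivors;
    for (FixCandidate& c : generateCandidates(tokens, errorIndex, expected)) {
        if (isValid(c.tokens)) survivors.push_back(std::move(c));
    }
    std::stable_sort(survivors.begin(), survivors.end(),
        [](const FixCandidate& a, const FixCandidate& b) { return a.cost < b.cost; });
    return survivors;
}
```

For the token stream a ) b with + expected at the error position, this sketch keeps two candidates (replace and delete) and ranks the replacement first because of its lower assumed cost. A production version would need to snapshot and restore the AST and Symbol Table as well, which this example does not attempt.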
Semantic Analysis & Linting
Beyond syntax, Turf’s Lint.cpp performs deep semantic checks to prevent logical errors:
- Unreachable Code Detection: Analyzing control flow to identify code that can never be executed (e.g., statements after a return).
- Dead Branch Warnings: Identifying if/else branches that are logically impossible to enter.
- Suspicious Loop Detection: Warning about infinite loops that lack a break or return.
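The simplest of these checks, flagging statements that follow a return in the same block, can be sketched as below. The Stmt structure and the findUnreachable function are hypothetical and far smaller than a real AST; they are not taken from Lint.cpp, and conditionals and loops are deliberately left out.

```cpp
#include <vector>

// Hypothetical miniature AST for illustration only.
struct Stmt {
    enum class Kind { Expr, Return, Block };
    Kind kind;
    int line;
    std::vector<Stmt> body;  // only used when kind == Block
};

// Appends the line of every statement that follows a `return` within the same
// block (or follows a nested bare block that always returns). The result says
// whether the block itself always returns.
bool findUnreachable(const Stmt& block, std::vector<int>& lines) {
    bool returned = false;
    for (const Stmt& s : block.body) {
        if (returned) {
            lines.push_back(s.line);  // control can never reach this statement
            continue;
        }
        if (s.kind == Stmt::Kind::Return) {
            returned = true;
        } else if (s.kind == Stmt::Kind::Block) {
            returned = findUnreachable(s, lines);
        }
    }
    return returned;
}
```

Dead-branch and infinite-loop warnings need more machinery (constant evaluation of conditions, and a search of loop bodies for break or return), but they follow the same pattern of walking the tree and reporting line numbers to the diagnostic engine.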
By combining edit-distance suggestions, parser recovery points, and semantic linting, Turf aims to report as many independent problems as possible in a single run, each with a concrete hint for the programmer.