Simone Siega

CFG Parser

Rust-based command-line parser that turns raw arithmetic input into validated numeric results through a tokenizer and recursive-descent parser/evaluator.

  • Language: Rust
  • Type: CLI tool
  • Parser: Recursive descent
  • Architecture: Tokenizer -> Parser/Evaluator
  • Supported syntax: 8 operators/forms

Overview

CFG Parser is a command-line project for parsing and evaluating arithmetic expressions through a grammar-based architecture. Instead of treating evaluation as a single opaque operation, the project splits the process into explicit phases (tokenization, then recursive-descent parsing and evaluation), so each phase can be inspected, tested, and refined independently. It was developed as an early hands-on exploration of Rust and context-free grammars. Rust was chosen for its memory safety and its explicit control over program structure and data flow, which suit deterministic, systems-style tooling.

Goal

The goal was to turn raw user input into a validated numeric result while preserving operator precedence and rejecting malformed syntax before any result is produced. A central objective was to keep lexical analysis separate from the combined grammar traversal and result computation, so each part of the flow remained easier to reason about and extend.

Technical Approach

The implementation separates lexical analysis from recursive-descent parsing and evaluation. Input is first converted into a token stream, then consumed by grammar-driven parser methods that encode precedence, validate semantic constraints, and compute results in a predictable way.
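
The first phase described above can be sketched as a small standalone tokenizer. This is a minimal illustration, not the project's actual code: the Token enum, its variant names, the tokenize function, and the choice of f64 for numbers are all assumptions made for the example.

```rust
/// Hypothetical token set for illustration; the project's real tokens may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Plus,
    Minus,
    Star,
    Slash,
    Caret,
    LParen,
    RParen,
}

/// Converts raw input into a token stream, or reports the first lexical error.
pub fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
            continue;
        }
        // Collect a run of digits and dots, then let f64 parsing validate it.
        if c.is_ascii_digit() || c == '.' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<f64>()
                .map_err(|_| format!("invalid number '{}' at position {}", text, start))?;
            tokens.push(Token::Number(value));
            continue;
        }
        let token = match c {
            '+' => Token::Plus,
            '-' => Token::Minus,
            '*' => Token::Star,
            '/' => Token::Slash,
            '^' => Token::Caret,
            '(' => Token::LParen,
            ')' => Token::RParen,
            other => {
                return Err(format!("unexpected character '{}' at position {}", other, i));
            }
        };
        tokens.push(token);
        i += 1;
    }
    Ok(tokens)
}
```

Because the tokenizer returns either a complete token stream or an error, the parser never has to deal with raw characters, which is the separation the approach relies on.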

Architecture

The application is organized around clear boundaries between input handling, tokenization, recursive-descent parsing/evaluation, and formatted output. Keeping lexical analysis separate from grammar traversal made it easier to inspect token streams, isolate parser failures, and evolve the grammar without coupling unrelated parts of the system.

Figure: CFG Parser architecture diagram. High-level parser architecture, from tokenization to recursive-descent parsing/evaluation, including grammar layers and structured error handling.

Key Decisions

The project favors explicit parsing logic and readable control flow over compact abstractions. That decision made the parser easier to trace, adapt, and extend while keeping the grammar understandable as the implementation evolved.

  • Keep tokenization separate from parser/evaluator logic so lexical and grammar concerns remain isolated.
  • Represent grammar rules explicitly through recursive-descent functions to make precedence handling easier to inspect and debug.
  • Compute results during grammar traversal instead of introducing an intermediate AST for this project scope.
  • Design CLI output to surface concise diagnostics and fail early when input is invalid.
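
The decisions above (one function per grammar rule, results computed during traversal, no intermediate AST) can be illustrated with a compact evaluator. This is a sketch under stated assumptions: the Tok type, the Parser struct, and the specific grammar layering in the comments are hypothetical and not taken from the repository. Semantic checks such as division by zero are left out here for brevity.

```rust
// Hypothetical token type; in the project these would come from the tokenizer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Num(f64), Plus, Minus, Star, Slash, Caret, LParen, RParen }

pub struct Parser<'a> {
    tokens: &'a [Tok],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [Tok]) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Tok> {
        self.tokens.get(self.pos).copied()
    }

    /// Evaluates the whole input and rejects leftover tokens.
    pub fn evaluate(&mut self) -> Result<f64, String> {
        let value = self.expr()?;
        match self.peek() {
            None => Ok(value),
            Some(tok) => Err(format!("unexpected token {:?} at index {}", tok, self.pos)),
        }
    }

    // expr := term (('+' | '-') term)*      (lowest precedence, left-associative)
    fn expr(&mut self) -> Result<f64, String> {
        let mut acc = self.term()?;
        loop {
            match self.peek() {
                Some(Tok::Plus) => { self.pos += 1; acc += self.term()?; }
                Some(Tok::Minus) => { self.pos += 1; acc -= self.term()?; }
                _ => return Ok(acc),
            }
        }
    }

    // term := unary (('*' | '/') unary)*    (left-associative)
    fn term(&mut self) -> Result<f64, String> {
        let mut acc = self.unary()?;
        loop {
            match self.peek() {
                Some(Tok::Star) => { self.pos += 1; acc *= self.unary()?; }
                Some(Tok::Slash) => { self.pos += 1; acc /= self.unary()?; }
                _ => return Ok(acc),
            }
        }
    }

    // unary := '-' unary | power
    fn unary(&mut self) -> Result<f64, String> {
        if self.peek() == Some(Tok::Minus) {
            self.pos += 1;
            return Ok(-self.unary()?);
        }
        self.power()
    }

    // power := atom ('^' unary)?            (right-associative via recursion)
    fn power(&mut self) -> Result<f64, String> {
        let base = self.atom()?;
        if self.peek() == Some(Tok::Caret) {
            self.pos += 1;
            return Ok(base.powf(self.unary()?));
        }
        Ok(base)
    }

    // atom := Num | '(' expr ')'
    fn atom(&mut self) -> Result<f64, String> {
        match self.peek() {
            Some(Tok::Num(n)) => { self.pos += 1; Ok(n) }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.expr()?;
                if self.peek() == Some(Tok::RParen) {
                    self.pos += 1;
                    Ok(inner)
                } else {
                    Err("expected ')'".to_string())
                }
            }
            Some(tok) => Err(format!("unexpected token {:?} at index {}", tok, self.pos)),
            None => Err("unexpected end of input".to_string()),
        }
    }
}
```

Precedence falls out of which function calls which: expr calls term, term calls unary, and so on down to atom, so tighter-binding operators are resolved deeper in the call stack. Skipping the AST keeps the code short, at the cost of not being able to inspect or transform the expression before evaluating it.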

Challenges

One of the main challenges was rejecting malformed expressions before invalid state could propagate through the system. This required defensive checks around token consumption, controlled parser branches, and error paths that remained understandable for CLI users.

  • Handle unexpected symbols and incomplete expressions without triggering cascading parser failures.
  • Prevent invalid token or parse states from producing a result.
  • Keep diagnostics useful and readable without exposing unnecessary internal complexity.
  • Validate semantic math constraints such as division by zero, invalid roots, overflow, and underflow.
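
The semantic constraints listed above could be enforced with small checked helpers that the evaluator calls instead of using raw operators. The sketch below assumes f64 arithmetic and invents the MathError enum and the checked_* function names for illustration; the project's own numeric type and error names may differ. Inputs are assumed to be finite numbers.

```rust
/// Hypothetical semantic error type, kept separate from syntax errors.
#[derive(Debug, PartialEq)]
pub enum MathError {
    DivisionByZero,
    InvalidRoot,
    Overflow,
    Underflow,
}

/// Rejects infinities and NaN produced by an operation on finite inputs.
fn finite(value: f64) -> Result<f64, MathError> {
    if value.is_finite() {
        Ok(value)
    } else {
        Err(MathError::Overflow)
    }
}

/// Division that refuses a zero divisor and detects range loss.
pub fn checked_div(lhs: f64, rhs: f64) -> Result<f64, MathError> {
    if rhs == 0.0 {
        return Err(MathError::DivisionByZero);
    }
    let quotient = finite(lhs / rhs)?;
    // A zero quotient from a non-zero dividend means the result underflowed.
    if quotient == 0.0 && lhs != 0.0 {
        return Err(MathError::Underflow);
    }
    Ok(quotient)
}

/// Multiplication that detects overflow to infinity and underflow to zero.
pub fn checked_mul(lhs: f64, rhs: f64) -> Result<f64, MathError> {
    let product = finite(lhs * rhs)?;
    if product == 0.0 && lhs != 0.0 && rhs != 0.0 {
        return Err(MathError::Underflow);
    }
    Ok(product)
}

/// Square root restricted to non-negative inputs.
pub fn checked_sqrt(x: f64) -> Result<f64, MathError> {
    if x < 0.0 {
        Err(MathError::InvalidRoot)
    } else {
        Ok(x.sqrt())
    }
}
```

Returning a Result from every operation lets the parser stop with `?` at the first invalid step, so no partial or non-finite value reaches the output.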

Implementation Evidence

The parser is backed by documented grammar rules and explicit non-terminal semantics, making the implementation traceable from formal definition to code structure. The repository shows how precedence, associativity, validation, and evaluation are modeled directly inside the parser.

  • Formal grammar defines the accepted arithmetic language, including expression termination, precedence layers, product operations, exponentiation, roots, unary negation, numbers, and parenthesized expressions.
  • Non-terminal semantics document the role of each grammar component, from full input formulas to atomic base operands.
  • Recursive-descent functions map directly to grammar responsibilities, keeping parsing and evaluation behavior readable and debuggable.
  • Operator precedence and associativity are handled structurally through grammar decomposition rather than through post-processing.
  • Implicit multiplication is treated as a parser-level transition triggered by valid token adjacency within the product context.
  • The validation model separates token/syntax errors from semantic/math errors, keeping malformed syntax and invalid operations distinct.
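
Implicit multiplication as a parser-level transition can be sketched in the product rule alone. The example below is an assumption about how such a rule might look, not the repository's implementation: it treats an opening parenthesis directly after an operand as a multiplication, so 2(3)(4) is read as 2 * 3 * 4. The Tok type and the product and atom function names are hypothetical, and the project may accept other adjacency forms.

```rust
// Reduced, hypothetical token set: just enough to show the product context.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Num(f64), Star, LParen, RParen }

/// product := atom ( '*' atom | atom-starting-with-'(' )*
pub fn product(tokens: &[Tok], pos: &mut usize) -> Result<f64, String> {
    let mut acc = atom(tokens, pos)?;
    loop {
        match tokens.get(*pos) {
            // Explicit multiplication: consume '*' and parse the right operand.
            Some(Tok::Star) => {
                *pos += 1;
                acc *= atom(tokens, pos)?;
            }
            // Implicit multiplication: '(' follows an operand, so nothing is
            // consumed here and atom() parses the parenthesized operand itself.
            Some(Tok::LParen) => {
                acc *= atom(tokens, pos)?;
            }
            // Any other token ends the product; the caller decides if it is valid.
            _ => return Ok(acc),
        }
    }
}

/// atom := Num | '(' product ')'
fn atom(tokens: &[Tok], pos: &mut usize) -> Result<f64, String> {
    match tokens.get(*pos) {
        Some(Tok::Num(n)) => {
            *pos += 1;
            Ok(*n)
        }
        Some(Tok::LParen) => {
            *pos += 1;
            let inner = product(tokens, pos)?;
            match tokens.get(*pos) {
                Some(Tok::RParen) => {
                    *pos += 1;
                    Ok(inner)
                }
                _ => Err("expected ')'".to_string()),
            }
        }
        Some(other) => Err(format!("unexpected token {:?}", other)),
        None => Err("unexpected end of input".to_string()),
    }
}
```

Handling adjacency inside the product loop means the tokenizer never has to insert synthetic multiplication tokens, and the implicit form automatically gets the same precedence as an explicit one.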

What I Learned

Building this project strengthened my understanding of parser construction as a design problem, not just an implementation task. It gave me practical experience with recursive-descent parsing, grammar decomposition, and Rust-based tooling where correctness, control flow, and module boundaries all matter.

Future Improvements

Future iterations could extend the project beyond arithmetic evaluation into a richer grammar and a more expressive parsing toolchain.

  • Add support for functions such as sin, cos, and tan as first-class grammar constructs.
  • Allow symbolic variables rather than limiting expressions to numeric-only input.
  • Extend the parser toward equation handling and broader validation rules.

Links

Additional resources for exploring the project, including the source repository and technical documentation.