Compiler Architecture
wirespec is a hand-written compiler implemented in Rust. There are no parser generators, no template engines, and no external dependencies beyond standard Rust crates.
Pipeline Overview
Source (.wspec)
→ AST (wirespec-syntax)
→ Semantic IR (wirespec-sema)
→ Layout IR (wirespec-layout)
→ Codec IR (wirespec-codec)
→ C Code (wirespec-backend-c)
→ Rust Code (wirespec-backend-rust)
Each stage transforms the representation into something progressively closer to machine output. Backends (C, Rust) consume Codec IR; they never touch the AST directly.
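The staged design can be pictured as a chain of functions in which each stage accepts only the previous stage's output. The sketch below is illustrative: the type names, function names, and the `ToyBackend` trait are all hypothetical stand-ins, not the real crate APIs. Each IR simply wraps a string so the flow through the stages is visible.

```rust
// Hypothetical stand-ins for the real IR types. Each wraps a description
// string so the path through the pipeline can be observed.
#[derive(Debug, Clone, PartialEq)]
pub struct Ast(pub String);
#[derive(Debug, Clone, PartialEq)]
pub struct SemanticIr(pub String);
#[derive(Debug, Clone, PartialEq)]
pub struct LayoutIr(pub String);
#[derive(Debug, Clone, PartialEq)]
pub struct CodecIr(pub String);

// One function per stage; the signatures make it impossible to skip a stage.
pub fn parse(src: &str) -> Ast {
    Ast(format!("ast({})", src))
}
pub fn analyze(ast: &Ast) -> SemanticIr {
    SemanticIr(format!("sema({})", ast.0))
}
pub fn lower_layout(sema: &SemanticIr) -> LayoutIr {
    LayoutIr(format!("layout({})", sema.0))
}
pub fn lower_codec(layout: &LayoutIr) -> CodecIr {
    CodecIr(format!("codec({})", layout.0))
}

/// Hypothetical backend trait: the only input a backend can see is Codec IR.
pub trait ToyBackend {
    fn emit(&self, codec: &CodecIr) -> String;
}

pub struct CBackend;
pub struct RustBackend;

impl ToyBackend for CBackend {
    fn emit(&self, codec: &CodecIr) -> String {
        format!("c({})", codec.0)
    }
}

impl ToyBackend for RustBackend {
    fn emit(&self, codec: &CodecIr) -> String {
        format!("rust({})", codec.0)
    }
}

/// Runs every stage in order and hands the final Codec IR to the backend.
pub fn compile(src: &str, backend: &dyn ToyBackend) -> String {
    let ast = parse(src);
    let sema = analyze(&ast);
    let layout = lower_layout(&sema);
    let codec = lower_codec(&layout);
    backend.emit(&codec)
}
```

Because `emit` takes a `&CodecIr`, a backend in this sketch has no way to reach the AST, which is the property the pipeline is designed around.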
Crate Structure
wirespec/
├── Cargo.toml # Workspace root
├── crates/
│ ├── wirespec-syntax/ # Lexer + Parser → AST
│ ├── wirespec-sema/ # Semantic analysis → Semantic IR
│ ├── wirespec-layout/ # Wire shape → Layout IR
│ ├── wirespec-codec/ # Parse/serialize strategy → Codec IR
│ ├── wirespec-backend-api/ # Backend trait + contracts
│ ├── wirespec-backend-c/ # C code generation
│ ├── wirespec-backend-rust/ # Rust code generation
│ └── wirespec-driver/ # Module resolver + CLI
├── examples/ # .wspec protocol definitions
├── docs/ # Design documents and plans
└── protospec/ # Python reference implementation (legacy)
| Crate | Description |
|---|---|
| wirespec-syntax | Hand-written lexer + recursive descent parser, AST node types, span tracking |
| wirespec-sema | Semantic analysis: name resolution, type checking, validation rules |
| wirespec-layout | Layout lowering: wire field ordering, bit group packing, endianness |
| wirespec-codec | Codec lowering: parse/serialize strategies, zero-copy decisions, capacity checks |
| wirespec-backend-api | Backend trait definitions (Backend, BackendDyn, ArtifactSink, checksum bindings) |
| wirespec-backend-c | C code generator: header + source, bitgroup shift/mask, checksum verify/compute |
| wirespec-backend-rust | Rust code generator: single .rs file, lifetime tracking, Rust enums for frames |
| wirespec-driver | Compilation driver: module resolution, dependency graph, multi-module pipeline, CLI binary |
Stage Descriptions
Parser (wirespec-syntax)
The lexer and recursive descent parser are implemented as a single crate. Each grammar production maps to one parse method. The result is an AST — a direct structural echo of the source text. No name resolution happens here. Field references in expressions are just name strings at this stage.
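As a rough illustration of "one parse method per production", here is a minimal recursive descent parser for a two-rule toy grammar. The grammar, token set, and AST types are assumptions made for this sketch and are not wirespec's real syntax. Note that field types stay as plain strings, mirroring the point that no name resolution happens at this stage.

```rust
// Toy grammar (an assumption, not wirespec's real syntax):
//   packet := "packet" IDENT "{" (field ",")* "}"
//   field  := IDENT ":" IDENT

#[derive(Debug, Clone, PartialEq)]
pub enum Token { Ident(String), Colon, Comma, LBrace, RBrace }

/// Splits source text into tokens; rejects characters outside the toy grammar.
pub fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        let punct = match c {
            ':' => Some(Token::Colon),
            ',' => Some(Token::Comma),
            '{' => Some(Token::LBrace),
            '}' => Some(Token::RBrace),
            _ => None,
        };
        if let Some(tok) = punct {
            tokens.push(tok);
            chars.next();
        } else if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_alphanumeric() || c == '_' {
            let mut name = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_ascii_alphanumeric() || d == '_') { break; }
                name.push(d);
                chars.next();
            }
            tokens.push(Token::Ident(name));
        } else {
            return Err(format!("unexpected character {:?}", c));
        }
    }
    Ok(tokens)
}

#[derive(Debug, PartialEq)]
pub struct FieldDecl { pub name: String, pub ty: String }

#[derive(Debug, PartialEq)]
pub struct PacketDecl { pub name: String, pub fields: Vec<FieldDecl> }

pub struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self { Parser { tokens, pos: 0 } }

    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    /// Consumes and returns the next token, if any.
    fn bump(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() { self.pos += 1; }
        tok
    }

    fn expect(&mut self, want: Token) -> Result<(), String> {
        match self.bump() {
            Some(tok) if tok == want => Ok(()),
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    fn ident(&mut self) -> Result<String, String> {
        match self.bump() {
            Some(Token::Ident(name)) => Ok(name),
            other => Err(format!("expected identifier, found {:?}", other)),
        }
    }

    /// packet := "packet" IDENT "{" (field ",")* "}"
    pub fn parse_packet(&mut self) -> Result<PacketDecl, String> {
        let keyword = self.ident()?;
        if keyword != "packet" {
            return Err(format!("expected `packet`, found `{}`", keyword));
        }
        let name = self.ident()?;
        self.expect(Token::LBrace)?;
        let mut fields = Vec::new();
        while self.peek() != Some(&Token::RBrace) {
            fields.push(self.parse_field()?);
            self.expect(Token::Comma)?;
        }
        self.expect(Token::RBrace)?;
        Ok(PacketDecl { name, fields })
    }

    /// field := IDENT ":" IDENT (the type is kept as an unresolved string)
    fn parse_field(&mut self) -> Result<FieldDecl, String> {
        let name = self.ident()?;
        self.expect(Token::Colon)?;
        let ty = self.ident()?;
        Ok(FieldDecl { name, ty })
    }
}
```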
Semantic Analyzer (wirespec-sema)
The largest crate in the compiler. It takes an AST and produces a SemanticModule with:
- All names resolved to their definitions
- Option[T] applied to conditional fields
- Wire types assigned to every field
- Typed semantic expressions replacing raw AST expression nodes
- State machine validation (reachability, complete action coverage)
- require constraints checked for type correctness
This is the last stage that understands the language's meaning. Downstream passes treat it as ground truth.
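To make "names resolved to their definitions" concrete, the sketch below turns name strings in a raw expression into field indices. Everything here is hypothetical: the `AstExpr`, `SemExpr`, and `AstField` types are invented for illustration, and the rule that a field may only reference an earlier field is an assumption of the sketch, not a documented wirespec rule.

```rust
use std::collections::HashMap;

/// Raw expression as the parser might leave it: names are just strings.
#[derive(Debug, PartialEq)]
pub enum AstExpr { Name(String), Int(u64) }

/// Resolved expression: a name has become an index into the field list.
#[derive(Debug, PartialEq)]
pub enum SemExpr { FieldRef(usize), Int(u64) }

/// Hypothetical AST field with an optional length expression.
pub struct AstField { pub name: String, pub len_expr: Option<AstExpr> }

/// Resolves every length expression, in declaration order.
/// Assumption: only fields declared earlier are in scope.
pub fn resolve(fields: &[AstField]) -> Result<Vec<Option<SemExpr>>, String> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut out = Vec::with_capacity(fields.len());
    for (index, field) in fields.iter().enumerate() {
        let resolved = match &field.len_expr {
            None => None,
            Some(AstExpr::Int(n)) => Some(SemExpr::Int(*n)),
            Some(AstExpr::Name(name)) => match seen.get(name.as_str()) {
                Some(&target) => Some(SemExpr::FieldRef(target)),
                None => {
                    return Err(format!(
                        "field `{}` references unknown or later field `{}`",
                        field.name, name
                    ))
                }
            },
        };
        out.push(resolved);
        // Register this field only after resolving it, so it cannot
        // reference itself.
        if seen.insert(field.name.as_str(), index).is_some() {
            return Err(format!("duplicate field `{}`", field.name));
        }
    }
    Ok(out)
}
```

After a pass like this, later stages can work with indices and never need to look up a name again.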
Layout Pass (wirespec-layout)
Translates semantic types into wire byte geometry:
- Endianness resolved per field (from a type suffix like u16le, from the @endian module annotation, or from a type alias chain)
- Consecutive bits[N] fields grouped into a single read operation: a BitGroup that is then shift-and-masked into individual fields
- No code generation logic here; purely descriptive
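The bit group idea can be sketched as follows: read the whole group once as an integer, then shift and mask each field out of it. This is a minimal illustration, not the compiler's implementation. The `BitField` type and `unpack_bitgroup` function are hypothetical, and the sketch assumes big-endian byte order with fields laid out from the most significant bit downward.

```rust
/// One field inside a bit group: a name and a width in bits (hypothetical type).
pub struct BitField { pub name: &'static str, pub width: u32 }

/// Reads the whole group with a single big-endian load, then extracts each
/// field by shift and mask, most significant field first (an assumption).
/// Returns None if the group is not byte-aligned, exceeds 64 bits, contains
/// a zero-width field, or the buffer is too short.
pub fn unpack_bitgroup(buf: &[u8], fields: &[BitField]) -> Option<Vec<(&'static str, u64)>> {
    if fields.iter().any(|f| f.width == 0 || f.width > 64) {
        return None;
    }
    let total_bits: u32 = fields.iter().map(|f| f.width).sum();
    if total_bits == 0 || total_bits > 64 || total_bits % 8 != 0 {
        return None;
    }
    let bytes = buf.get(..(total_bits / 8) as usize)?;

    // The single read: fold the group's bytes into one integer.
    let mut word: u64 = 0;
    for &b in bytes {
        word = (word << 8) | u64::from(b);
    }

    // Walk the fields from the top of the word downward.
    let mut remaining = total_bits;
    let mut out = Vec::with_capacity(fields.len());
    for f in fields {
        remaining -= f.width;
        let mask = if f.width == 64 { u64::MAX } else { (1u64 << f.width) - 1 };
        out.push((f.name, (word >> remaining) & mask));
    }
    Some(out)
}
```

For example, the byte 0x45 split into two 4-bit fields yields 4 and 5 under these assumptions. The sketch returns a `Vec` for convenience; it makes no claim about how generated code stores the results.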
Codec Pass (wirespec-codec)
Assigns a field strategy to every field:
- Primitive: read N bytes, apply endian conversion
- VarInt: prefix-match variable-length integer
- ContVarInt: continuation-bit variable-length integer (MQTT-style)
- BytesFixed: zero-copy byte slice
- BytesLength: length-prefixed byte slice
- BytesRemaining: consume scope remainder
- Array: loop over element count from prior field
- BitGroup: single read + shift/mask
- Struct: nested struct parse call
- Conditional: if COND { T } optional field
- Checksum: verify on parse, compute on serialize
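The two variable-length integer strategies differ in where the length information lives. The sketch below shows one decoder of each kind. It assumes the QUIC encoding from RFC 9000 as a representative prefix-match scheme and the MQTT Variable Byte Integer as the continuation-bit scheme; the function names are hypothetical, and this is not the code wirespec generates.

```rust
/// Prefix-match decoding, assuming the QUIC scheme: the top two bits of the
/// first byte select a total length of 1, 2, 4, or 8 bytes, and the remaining
/// bits hold the value in big-endian order.
/// Returns (value, bytes consumed), or None if the buffer is too short.
pub fn parse_quic_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let len = 1usize << (first >> 6);
    let bytes = buf.get(..len)?;
    let mut value = u64::from(first & 0x3f);
    for &b in &bytes[1..] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, len))
}

/// Continuation-bit decoding, assuming the MQTT scheme: each byte carries
/// 7 value bits, least significant group first, and bit 7 signals that
/// another byte follows. At most 4 bytes are allowed.
/// Returns (value, bytes consumed), or None if the input is truncated or
/// the fourth byte still has its continuation bit set.
pub fn parse_cont_varint(buf: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &b) in buf.iter().take(4).enumerate() {
        value |= u32::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

With the prefix-match form the decoder knows the full length after one byte, so it can bounds-check once. With the continuation-bit form the length is only known when the final byte is reached.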
C Code Generator (wirespec-backend-c)
Consumes CodecModule and emits .h + .c files. Every generated function follows the same contract:
wirespec_result_t PREFIX_parse(
const uint8_t *buf, size_t len,
PREFIX_t *out, size_t *consumed);
wirespec_result_t PREFIX_serialize(
const PREFIX_t *in,
uint8_t *buf, size_t cap, size_t *written);
size_t PREFIX_serialized_len(const PREFIX_t *in);
No heap allocation. All buffers are caller-provided. This invariant is enforced during code review and tested by compiling with -Wall -Wextra -Werror.
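The same three-function contract can be mirrored in Rust to show how the pieces fit together: parse reports how many bytes it consumed, serialize reports how many it wrote, and both work only on caller-provided buffers. This is a hand-written analogue for a made-up 3-byte header, not output from either backend; `Header`, `WireError`, and the function names are hypothetical.

```rust
/// Hypothetical error type; the generated C uses wirespec_result_t instead.
#[derive(Debug, PartialEq)]
pub enum WireError { ShortBuffer }

/// Made-up wire format: 1-byte kind followed by a big-endian u16 length.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Header { pub kind: u8, pub length: u16 }

/// Size the header occupies on the wire (fixed for this toy format).
pub const fn header_serialized_len(_h: &Header) -> usize { 3 }

/// Parses from a caller-provided buffer; returns the value and bytes consumed.
pub fn header_parse(buf: &[u8]) -> Result<(Header, usize), WireError> {
    if buf.len() < 3 {
        return Err(WireError::ShortBuffer);
    }
    let header = Header {
        kind: buf[0],
        length: u16::from_be_bytes([buf[1], buf[2]]),
    };
    Ok((header, 3))
}

/// Serializes into a caller-provided buffer; returns bytes written.
/// Nothing is allocated: the capacity check happens before any write.
pub fn header_serialize(h: &Header, buf: &mut [u8]) -> Result<usize, WireError> {
    let need = header_serialized_len(h);
    if buf.len() < need {
        return Err(WireError::ShortBuffer);
    }
    buf[0] = h.kind;
    buf[1..3].copy_from_slice(&h.length.to_be_bytes());
    Ok(need)
}
```

A caller would typically query the serialized length first, provide a buffer of at least that size, and then round-trip through serialize and parse.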
Rust Code Generator (wirespec-backend-rust)
Consumes CodecModule and emits a single .rs file. Uses the same structured emitter approach as the C backend. The Rust backend is fully implemented and covered by codegen and end-to-end tests.
Module Resolver (wirespec-driver)
Handles multi-file compilation:
- Parse the entry .wspec file, collect import declarations
- Locate each imported module on disk (respects -I include paths)
- Recursively parse imported modules
- Detect cycles (error if found)
- Return modules in topological order (dependencies first)
The compiler then processes modules in that order, collecting exported types for use in downstream modules.
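The resolution steps above amount to a depth-first traversal with cycle detection that emits modules in post-order. A minimal sketch, assuming the import graph has already been collected into a map from module name to its imports (the real driver discovers imports by parsing files on disk; the function name and map shape here are hypothetical):

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully processed and already in the output
}

/// Returns all modules reachable from `entry`, dependencies first.
/// Fails on an import cycle or on an import that is not in the map.
pub fn topo_order(
    entry: &str,
    imports: &HashMap<String, Vec<String>>,
) -> Result<Vec<String>, String> {
    fn visit(
        name: &str,
        imports: &HashMap<String, Vec<String>>,
        marks: &mut HashMap<String, Mark>,
        order: &mut Vec<String>,
    ) -> Result<(), String> {
        match marks.get(name) {
            Some(Mark::Done) => return Ok(()),
            // Reaching a module that is still on the DFS path means a cycle.
            Some(Mark::Visiting) => return Err(format!("import cycle through `{}`", name)),
            None => {}
        }
        marks.insert(name.to_string(), Mark::Visiting);
        let deps = imports
            .get(name)
            .ok_or_else(|| format!("module `{}` not found", name))?;
        for dep in deps {
            visit(dep, imports, marks, order)?;
        }
        marks.insert(name.to_string(), Mark::Done);
        // Post-order push: every dependency is already in `order`.
        order.push(name.to_string());
        Ok(())
    }

    let mut marks = HashMap::new();
    let mut order = Vec::new();
    visit(entry, imports, &mut marks, &mut order)?;
    Ok(order)
}
```

With a graph where quic.frames imports quic.varint, the result lists quic.varint before quic.frames, which is the order the compiler needs so that exported types exist before the importing module is analyzed.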
Multi-Module Compilation Flow
Entry .wspec file
└─ wirespec-driver: find + parse all imports (depth-first, topo sorted)
└─ For each module (dependencies first):
wirespec-syntax → AST
wirespec-sema → Semantic IR (with imported types injected)
wirespec-layout → Layout IR
wirespec-codec → Codec IR
wirespec-backend-c → .h + .c (-t c)
wirespec-backend-rust → .rs (-t rust)
Downstream modules receive the exported types from upstream modules before their own semantic analysis runs. This is how import quic.varint.VarInt makes VarInt available as a named type in quic.frames.
Running Tests
# All tests (933+ across 8 crates)
cargo test --workspace
# Run tests for a specific crate
cargo test -p wirespec-sema
# Build the compiler
cargo build --release
# Compile a .wspec file
./target/release/wirespec compile examples/quic/varint.wspec -t c -o build/
# Compile and test generated C
cd build && gcc -Wall -Wextra -Werror -O2 -std=c11 \
-o test_varint quic_varint.c tests/test_varint.c && ./test_varint
Design Principles
- No heap allocation in generated C. All buffers are caller-provided. Generated code uses stack and zero-copy views only.
- Generated code compiles warning-free under gcc -Wall -Wextra -Werror -std=c11.
- Backends consume Codec IR only; all name resolution and type checking is complete before code generation.