The 4-Stage IR Pipeline
The wirespec compiler uses four intermediate representations (IRs), each progressively closer to machine output. This separation keeps language semantics independent from code generation, making it possible to add new backends without touching the parser or type checker.
This page is for contributors who want to understand or modify compiler internals.
Why Four Stages?
The stages form a chain in which each IR is lower-level than the one before it and answers exactly one question, so language semantics never depend on the code generation target.
In practice this means:
- AST = what the programmer wrote (syntax)
- Semantic IR = what it means (semantics, backend-agnostic)
- Layout IR = how bytes are arranged on the wire
- Codec IR = how to parse/serialize those bytes (backend-specific)
A Rust backend and a C backend share the first three stages. They diverge only at Codec IR, where target type names differ (uint32_t vs u32) and backend-specific idioms apply.
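The division of labor above can be sketched with toy stand-ins. None of the types or functions below come from the wirespec crates: `Ast`, `SemanticIr`, `LayoutIr`, `Target`, `analyze`, `layout`, and `codec` are hypothetical names chosen only to show that the first three steps are target-independent and that target type names appear in the last step.

```rust
/// Stage 1 stand-in: field names paired with unresolved type names.
struct Ast(Vec<(String, String)>);
/// Stage 2 stand-in: every type name resolved to a width in bytes.
struct SemanticIr(Vec<(String, usize)>);
/// Stage 3 stand-in: every field placed at a byte offset (name, offset, width).
struct LayoutIr(Vec<(String, usize, usize)>);

/// Hypothetical code generation target.
#[derive(Clone, Copy)]
enum Target {
    C,
    Rust,
}

/// Resolve type names; unknown names are rejected here, not in a backend.
fn analyze(ast: &Ast) -> Result<SemanticIr, String> {
    let mut out = Vec::new();
    for (name, ty) in &ast.0 {
        let width = match ty.as_str() {
            "u8" => 1,
            "u16" => 2,
            "u32" => 4,
            "u64" => 8,
            other => return Err(format!("unknown type `{other}` for field `{name}`")),
        };
        out.push((name.clone(), width));
    }
    Ok(SemanticIr(out))
}

/// Assign byte offsets in declaration order.
fn layout(sem: &SemanticIr) -> LayoutIr {
    let mut offset = 0;
    let mut out = Vec::new();
    for (name, width) in &sem.0 {
        out.push((name.clone(), offset, *width));
        offset += *width;
    }
    LayoutIr(out)
}

/// Only this step knows how the target spells its integer types.
fn codec(layout: &LayoutIr, target: Target) -> Vec<String> {
    layout
        .0
        .iter()
        .map(|(name, offset, width)| {
            let bits = width * 8;
            let ty = match target {
                Target::C => format!("uint{bits}_t"),
                Target::Rust => format!("u{bits}"),
            };
            format!("{ty} {name} @ byte {offset}")
        })
        .collect()
}
```

Calling `codec` twice on the same `LayoutIr`, once per target, yields identical names and offsets and differs only in the type spelling.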
Data Flow
AST ─── wirespec-sema ──→ Semantic IR
                               │
                        wirespec-layout
                               │
                               ▼
                           Layout IR
                               │
                        wirespec-codec
                               │
                               ▼
                           Codec IR
                               │
               ┌───────────────┴───────────────┐
      wirespec-backend-c              wirespec-backend-rust
               │                               │
               ▼                               ▼
     .h + .c output files               .rs output file
Stage 1: AST
Produced by: wirespec-syntax crate
Key types: WireFile, PacketDef, FrameDef, CapsuleDef, TypeDef, StateMachineDef, Expr
The AST is a direct structural representation of the source text. Nothing is resolved yet: field references in expressions are bare name strings, and type references are unresolved identifiers.
Key types:
| Type | Description |
|---|---|
| WireFile | Top-level container: module decl, imports, definitions |
| PacketDef | packet Foo { ... } definition |
| FrameDef | frame Foo = match tag: T { ... } definition |
| CapsuleDef | capsule Foo { ... within ... } definition |
| TypeDef | type Foo = ... (alias or computed type) |
| StateMachineDef | state machine Foo { ... } |
| Expr | Expression AST nodes: NameExpr, LiteralExpr, BinaryExpr, CoalesceExpr, etc. |
The AST mirrors syntax only, so backends must not consume it directly. Anything that needs resolved names or types must work from Semantic IR or a later stage.
Stage 2: Semantic IR
Produced by: wirespec-sema crate
Key types: SemanticModule, SemanticStruct, SemanticFrame, StateMachine, SemanticVarInt, SemField
This is the first stage where the program's meaning is fully established. Every name is resolved. Every type is known.
WireType Enum
Every field gets a WireType variant:
| Variant | Meaning |
|---|---|
| U8, U16, U24, U32, U64 | Unsigned integers |
| I8, I16, I32, I64 | Signed integers |
| VARINT | Prefix-match variable-length integer |
| CONT_VARINT | Continuation-bit variable-length integer |
| BOOL | Semantic boolean (derived fields / guards only) |
| BYTES | Byte sequence (fixed, length-prefixed, or remaining) |
| BITS | Sub-byte field (bits[N]) |
| BIT | Single-bit field |
| ARRAY | Homogeneous array [T; count] |
| STRUCT | Named packet/frame/capsule reference |
| ENUM | Named enum reference |
| FLAGS | Named flags reference |
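As a rough illustration, the table could map onto a Rust enum along these lines. The variant spellings and payloads here are assumptions made for the sketch (the table lists the variants in upper case, and the real definition in wirespec-sema may carry different data). `fixed_width` is a hypothetical helper, not part of the documented API.

```rust
/// Hypothetical mirror of the WireType table; payloads are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum WireType {
    U8, U16, U24, U32, U64,
    I8, I16, I32, I64,
    VarInt,
    ContVarInt,
    Bool,
    Bytes,
    Bits(u8),             // bits[N]
    Bit,
    Array(Box<WireType>), // element type; the count lives elsewhere in this sketch
    Struct(String),
    Enum(String),
    Flags(String),
}

impl WireType {
    /// Width in bytes when it is known from the type alone.
    /// Variable-length, sub-byte, and named types return None.
    fn fixed_width(&self) -> Option<usize> {
        match self {
            WireType::U8 | WireType::I8 => Some(1),
            WireType::U16 | WireType::I16 => Some(2),
            WireType::U24 => Some(3),
            WireType::U32 | WireType::I32 => Some(4),
            WireType::U64 | WireType::I64 => Some(8),
            _ => None,
        }
    }
}
```

A closed enum like this lets later stages match exhaustively, so adding a variant forces every consumer to decide how to handle it.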
SemExpr Replaces AST Expr
Raw Expr nodes are replaced by typed SemExpr variants:
| SemExpr | Description |
|---|---|
| SemFieldRef | Field reference with resolved type |
| SemLiteral | Integer, string, bool literal |
| SemBinaryOp | Binary operation with typed operands |
| SemCoalesce | ?? coalesce (Option[T] + default) |
| SemInState | in_state(S) predicate for state machine guards |
| SemAll | all(collection, predicate) quantifier |
| SemSlice | field[start..end] half-open slice |
| SemSubscript | field[index] array subscript |
Option[T] Tracking
Conditional fields (if COND { T }) are typed as Option[T] in the Semantic IR. Any expression that references such a field must either use ?? (coalesce) or sit inside a guard; this stage enforces that rule.
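A minimal sketch of how such a check might work, under simplifying assumptions: the `Expr` enum and `first_unguarded` function are hypothetical, the set of optional field names is passed in directly, and guards are not modeled at all, so only the ?? half of the rule is shown.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily reduced expression tree.
enum Expr {
    Field(String),
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
    /// `lhs ?? default`
    Coalesce(Box<Expr>, Box<Expr>),
}

/// Returns the name of the first optional field that is read without a
/// surrounding `??`, or None if every optional read is coalesced.
fn first_unguarded<'a>(expr: &'a Expr, optional: &HashSet<&str>) -> Option<&'a str> {
    match expr {
        Expr::Field(name) if optional.contains(name.as_str()) => Some(name.as_str()),
        Expr::Field(_) | Expr::Literal(_) => None,
        Expr::Binary(lhs, rhs) => {
            first_unguarded(lhs, optional).or_else(|| first_unguarded(rhs, optional))
        }
        Expr::Coalesce(lhs, default) => {
            // A bare field directly under `??` is exactly the allowed case.
            let lhs_issue = match lhs.as_ref() {
                Expr::Field(_) => None,
                other => first_unguarded(other, optional),
            };
            lhs_issue.or_else(|| first_unguarded(default, optional))
        }
    }
}
```

With `ecn_count` marked optional, the expression `ecn_count + 1` is reported, while `(ecn_count ?? 0) + 1` passes.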
State Machine Validation
State machine definitions are validated in this stage:
- All states referenced in transitions exist
- All events handled in on clauses have consistent parameter types
- action blocks fully initialize dst fields that have no default value
- delegate and action do not coexist in the same transition
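The first of these checks can be sketched as follows. The `Transition` and `StateMachine` structs here are simplified, hypothetical shapes, not the real wirespec-sema types, and `check_states` is an illustrative name.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced transition: source state, triggering event, target state.
struct Transition {
    from: String,
    event: String,
    to: String,
}

/// Hypothetical, reduced state machine definition.
struct StateMachine {
    states: Vec<String>,
    transitions: Vec<Transition>,
}

/// Reports every transition endpoint that names a state which was never declared.
fn check_states(sm: &StateMachine) -> Vec<String> {
    let known: HashSet<&str> = sm.states.iter().map(String::as_str).collect();
    let mut errors = Vec::new();
    for t in &sm.transitions {
        for state in [&t.from, &t.to] {
            if !known.contains(state.as_str()) {
                errors.push(format!(
                    "transition on `{}` references unknown state `{}`",
                    t.event, state
                ));
            }
        }
    }
    errors
}
```

Collecting all errors instead of stopping at the first one lets a single compiler run report every bad transition.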
Key Rust Types
SemanticModule // One per .wspec file: structs, frames, state machines, imports
SemanticStruct // A packet or frame branch: name, fields, constraints
SemanticFrame // A tagged union: tag field + list of SemanticStruct branches
StateMachine // State machine: states, transitions, initial state
SemanticVarInt // Computed type (prefix-match or continuation-bit)
SemField // A single field with WireType, name, optional SemExpr condition
Stage 3: Layout IR
Produced by: wirespec-layout crate
Key types: LayoutModule, LayoutField, BitGroup, Endianness
Layout IR describes wire byte geometry. It answers: in what order do the bytes appear, and how are bits packed?
Endianness Resolution
Each field gets an Endianness value (Big, Little, None):
- Explicit type suffix wins: u16le → Little
- Module-level @endian annotation is the fallback
- Type aliases are chased to their underlying type
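The three rules can be combined into one small resolution function. This is a sketch under several assumptions that the text does not confirm: types are plain strings, aliases live in a map, a `be` suffix exists alongside `le`, and `Endianness::None` is used for single-byte types where byte order does not apply. `resolve` and `explicit_suffix` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Big,
    Little,
    None,
}

/// Recognizes spellings such as u16le or i32be (the `be` suffix is an assumption).
fn explicit_suffix(ty: &str) -> Option<Endianness> {
    let (base, end) = if let Some(base) = ty.strip_suffix("le") {
        (base, Endianness::Little)
    } else if let Some(base) = ty.strip_suffix("be") {
        (base, Endianness::Big)
    } else {
        return None;
    };
    // Require an integer spelling before the suffix so names like "Table" do not match.
    let digits = base.strip_prefix('u').or_else(|| base.strip_prefix('i'))?;
    (!digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit())).then_some(end)
}

fn resolve(ty: &str, aliases: &HashMap<&str, &str>, module_default: Endianness) -> Endianness {
    // Rule 3: chase aliases first; the bounded loop also guards against cycles.
    let mut current = ty;
    for _ in 0..=aliases.len() {
        match aliases.get(current) {
            Some(next) => current = *next,
            None => break,
        }
    }
    // Assumption: byte order is meaningless for one-byte integers.
    if matches!(current, "u8" | "i8") {
        return Endianness::None;
    }
    // Rule 1 wins over rule 2.
    explicit_suffix(current).unwrap_or(module_default)
}
```

Chasing aliases before looking at the suffix means an alias such as Port = u16le keeps its explicit byte order even in a module whose default is big-endian.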
BitGroup Collapsing
Consecutive bits[N] fields (and bit fields) in a packet, frame branch, or capsule body are collapsed into a single BitGroup. For example:
packet IPv4Header {
version: bits[4],
ihl: bits[4],
dscp: bits[6],
ecn: bits[2],
...
}
The first two fields become one BitGroup that reads 1 byte, and the next two become a second BitGroup that also reads 1 byte. For each group, the C backend emits a single uint8_t read followed by shift/mask extractions.
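The grouping and the shift/mask step can be sketched as below. Two assumptions are baked in: a group is closed whenever the running bit count reaches a byte boundary (which matches the IPv4 example), and fields are packed most-significant-bit first. `BitField`, `collapse`, and `extract` are hypothetical names, and `BitGroup` here is a reduced stand-in for the real type.

```rust
#[derive(Debug)]
struct BitField {
    name: &'static str,
    width: u32,
}

#[derive(Debug)]
struct BitGroup {
    fields: Vec<BitField>,
    bytes: u32,
}

/// Collapse consecutive bit fields into groups that each end on a byte boundary.
fn collapse(fields: Vec<BitField>) -> Result<Vec<BitGroup>, String> {
    let mut groups = Vec::new();
    let mut current = Vec::new();
    let mut bits = 0u32;
    for field in fields {
        bits += field.width;
        current.push(field);
        if bits % 8 == 0 {
            groups.push(BitGroup { fields: std::mem::take(&mut current), bytes: bits / 8 });
            bits = 0;
        }
    }
    if current.is_empty() {
        Ok(groups)
    } else {
        Err(format!("{bits} trailing bits do not fill a byte"))
    }
}

/// One read, then one shift and mask per field (MSB-first, one-byte groups only).
fn extract(byte: u8, group: &BitGroup) -> Vec<(&'static str, u8)> {
    assert_eq!(group.bytes, 1, "this sketch only handles one-byte groups");
    let mut remaining = 8u32;
    let mut out = Vec::new();
    for field in &group.fields {
        remaining -= field.width;
        let mask = ((1u16 << field.width) - 1) as u8;
        out.push((field.name, (byte >> remaining) & mask));
    }
    out
}
```

For the common first byte of an IPv4 header, 0x45, the first group extracts version = 4 and ihl = 5.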
Key Rust Types
LayoutModule // Layout wrapper for a full module
LayoutField // Single field with Endianness
BitGroup // Grouped consecutive bits[N] fields + total byte width
Endianness // Big | Little | None
Layout IR is deliberately free of code generation concerns. It says nothing about uint32_t or function signatures; that is Codec IR's job.
Stage 4: Codec IR
Produced by: wirespec-codec crate
Key types: CodecModule, CodecStruct, CodecFrame, CodecField, FieldStrategy
Codec IR is the final backend-agnostic representation. It assigns a FieldStrategy and target type to every field.
FieldStrategy
| Strategy | What it represents |
|---|---|
| Primitive | Read N bytes, apply endian conversion |
| VarInt | Decode prefix-match variable-length integer |
| ContVarInt | Decode continuation-bit variable-length integer |
| BytesFixed | Zero-copy byte slice (pointer + length) |
| BytesLength | Length-prefixed byte slice |
| BytesRemaining | Consume all remaining bytes in current scope |
| Array | Loop: read count field, parse N elements |
| BitGroup | Single read + per-field shift/mask |
| Struct | Nested struct parse/serialize call |
| Conditional | Evaluate condition, parse if true |
| Checksum | Verify on parse, auto-compute on serialize |
| Derived | let field (computed from other fields, not on the wire) |
| Constraint | require expression (runtime check only) |
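To make two of these rows concrete, here is a sketch of what parsing under Primitive and VarInt might look like. The enum below is a two-variant, hypothetical reduction of FieldStrategy with invented payloads. The VarInt arm assumes that "prefix-match" means the QUIC encoding from RFC 9000, where the top two bits of the first byte select a length of 1, 2, 4, or 8 bytes. A given wirespec definition may describe a different scheme.

```rust
/// Hypothetical two-variant reduction of FieldStrategy.
enum FieldStrategy {
    /// Fixed-width integer; `width` is expected to be at most 8 in this sketch.
    Primitive { width: usize, little_endian: bool },
    VarInt,
}

/// QUIC-style variable-length integer: returns (value, bytes consumed).
fn decode_prefix_varint(input: &[u8]) -> Option<(u64, usize)> {
    let first = *input.first()?;
    let len = 1usize << (first >> 6); // prefix 00, 01, 10, 11 -> 1, 2, 4, 8 bytes
    let bytes = input.get(..len)?; // None if the input is truncated
    let mut value = u64::from(first & 0x3f);
    for b in &bytes[1..] {
        value = (value << 8) | u64::from(*b);
    }
    Some((value, len))
}

/// Parse one field according to its strategy: returns (value, bytes consumed).
fn parse_field(strategy: &FieldStrategy, input: &[u8]) -> Option<(u64, usize)> {
    match strategy {
        FieldStrategy::Primitive { width, little_endian } => {
            let bytes = input.get(..*width)?;
            let mut value = 0u64;
            if *little_endian {
                // The last byte on the wire is the most significant one.
                for b in bytes.iter().rev() {
                    value = (value << 8) | u64::from(*b);
                }
            } else {
                for b in bytes {
                    value = (value << 8) | u64::from(*b);
                }
            }
            Some((value, *width))
        }
        FieldStrategy::VarInt => decode_prefix_varint(input),
    }
}
```

Because the strategy is data rather than code, a backend can walk a list of fields and emit the matching read for each one without re-deriving anything from the layout.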
MemoryTier
The three-tier memory model:
| Tier | Example | Strategy |
|---|---|---|
| A | bytes[length] | Zero-copy: pointer + length view into the input buffer |
| B | [u16le; N] | Materialized: memcpy + byte-swap into fixed array |
| C | [AckRange; N] | Materialized: parse each element into pre-allocated struct array |
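The difference between the tiers shows up in the function signatures. The sketch below is illustrative only: the function names are hypothetical, and `AckRange` is given an invented two-byte layout purely so the example compiles.

```rust
/// Tier A: zero-copy. The returned view borrows from the input buffer.
fn view_bytes(input: &[u8], len: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < len {
        return None;
    }
    Some(input.split_at(len))
}

/// Tier B: materialized. Bytes are copied and byte-swapped into a fixed array.
fn read_u16le_array<const N: usize>(input: &[u8]) -> Option<([u16; N], &[u8])> {
    let needed = N.checked_mul(2)?;
    if input.len() < needed {
        return None;
    }
    let (head, rest) = input.split_at(needed);
    let mut out = [0u16; N];
    for (slot, pair) in out.iter_mut().zip(head.chunks_exact(2)) {
        *slot = u16::from_le_bytes([pair[0], pair[1]]);
    }
    Some((out, rest))
}

/// Hypothetical element type with an invented two-byte wire layout.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct AckRange {
    gap: u8,
    len: u8,
}

/// Tier C: materialized. Each element is parsed into caller-provided storage.
fn read_ack_ranges<'a>(input: &'a [u8], out: &mut [AckRange]) -> Option<&'a [u8]> {
    let needed = out.len().checked_mul(2)?;
    if input.len() < needed {
        return None;
    }
    let (head, rest) = input.split_at(needed);
    for (slot, pair) in out.iter_mut().zip(head.chunks_exact(2)) {
        *slot = AckRange { gap: pair[0], len: pair[1] };
    }
    Some(rest)
}
```

Tier A ties the lifetime of the result to the input buffer, while tiers B and C produce owned values that outlive it at the cost of a copy.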
Key Rust Types
CodecModule // Full module codec representation
CodecStruct // One struct (packet or frame branch) with codec fields
CodecFrame // Tagged union with tag codec + branch list
CodecField // One field: strategy, wire_type, layout ref, semantic ref
FieldStrategy // Enum of parse/serialize strategies (see above)
Using the Pipeline Programmatically
The wirespec-driver crate provides the entry point for full-pipeline compilation:
use wirespec_driver::{compile, CompileRequest};
let result = compile(&CompileRequest {
entry: "examples/quic/varint.wspec".into(),
include_paths: vec!["examples/".into()],
profile: wirespec_sema::ComplianceProfile::default(),
});
match result {
Ok(result) => {
for module in &result.modules {
// module.codec is the CodecModule ready for backend consumption
println!("Module: {}", module.module_name);
}
}
Err(e) => eprintln!("error: {e}"),
}
For single-module compilation without import resolution:
use wirespec_driver::compile_module;
let source = std::fs::read_to_string("examples/net/udp.wspec").unwrap();
let compiled = compile_module(
&source,
wirespec_sema::ComplianceProfile::default(),
&Default::default(),
).unwrap();
// compiled.codec is the CodecModule
Debugging the Pipeline
To inspect each stage during development:
// Each crate is referenced by its full path below, so no use declarations are needed.
let source = std::fs::read_to_string("examples/quic/varint.wspec").unwrap();
let ast = wirespec_syntax::parse(&source).unwrap();
// Inspect AST nodes
let sem = wirespec_sema::analyze(
&ast,
wirespec_sema::ComplianceProfile::default(),
&Default::default(),
).unwrap();
// Inspect SemanticModule: sem.packets, sem.frames, etc.
let layout = wirespec_layout::lower(&sem).unwrap();
// Inspect LayoutModule
let codec = wirespec_codec::lower(&layout).unwrap();
// Inspect CodecModule: codec.packets, codec.frames, etc.
Each IR is a plain Rust struct that implements Debug, so any stage can be printed for inspection.