Timing Violation Detection
See also:
timing-correctness.md— forward-looking validation contract and timing IR requirements (in progress). The document below describes current behaviour.
Guide to enabling, reading, and debugging setup/hold timing violations in GEM.
Overview
Setup and hold violations occur when data arrives too late (setup) or too early (hold) relative to the clock edge at a flip-flop. GEM checks for these violations during GPU simulation by tracking arrival times — the accumulated gate delay from primary inputs or DFF outputs through combinational logic to the next DFF data input.
Approximation model: GEM tracks one arrival time per 32-signal group (one GPU thread position). The arrival is the maximum across all 32 signals in the group. This is conservative: it may over-report violations but will never miss a real one. See Reducing False Positives for details.
Enabling Timing Checks
Prerequisites
- SDF file with back-annotated delays from your place-and-route tool
- Gate-level netlist synthesized to
aigpdk.libcells
Step-by-step
-
Generate SDF from your P&R tool (or use
scripts/generate_sdf.pyfor test designs):# Example: OpenROAD flow output ls my_build/6_final.sdf -
Run the simulator with
--sdfand a clock period:Metal (macOS):
cargo run -r --features metal --bin jacquard -- sim \ design.gv input.vcd output.vcd 1 \ --sdf design.sdf \ --sdf-corner typCUDA (NVIDIA):
cargo run -r --features cuda --bin jacquard -- sim \ design.gv input.vcd output.vcd 8 \ --sdf design.sdf \ --sdf-corner typ \ --enable-timing \ --timing-clock-period 1200cosim (co-simulation):
cargo run -r --features metal --bin jacquard -- cosim \ design.gv \ --config testbench.json \ --sdf design.sdf \ --sdf-corner typ
CLI Flags Reference
| Flag | Binary | Description |
|---|---|---|
--sdf <path> | all | Path to SDF file with back-annotated delays |
--sdf-corner <min|typ|max> | all | Which SDF corner to use (default: typ) |
--sdf-debug | all | Print unmatched SDF instances for debugging |
--enable-timing | jacquard sim | Enable timing analysis (arrival + violation checks) |
--timing-clock-period <ps> | jacquard sim | Clock period in picoseconds (default: 1000) |
--timing-report-violations | jacquard sim | Report all violations, not just summary |
--timing-report <path.json> | jacquard sim | Write a structured end-of-run JSON report (schema in src/timing_report.rs, ADR 0008). |
--timing-summary | jacquard sim | Print a human-readable text summary at end of run. Independent of --timing-report; both can be combined. |
--timing-report-max-violations <N> | jacquard sim | Cap on the per-cycle violations list in --timing-report. Default 100k. 0 = unbounded. Totals + worst-slack always reflect every event. |
--liberty <path> | jacquard sim | Liberty library for timing data (optional, falls back to AIGPDK defaults) |
Example: inv_chain_pnr Test Case
# Run with SDF timing
cargo run -r --features metal --bin jacquard -- sim \
tests/timing_test/inv_chain_pnr/6_final.v \
tests/timing_test/inv_chain_pnr/input.vcd \
tests/timing_test/inv_chain_pnr/output.vcd 1 \
--sdf tests/timing_test/inv_chain_pnr/6_final.sdf
Reading Violation Reports
Setup Violation Format
[cycle 42] SETUP VIOLATION at top/cpu/regs[7][bit 22] [word=5]: arrival=900ps setup=200ps slack=-100ps
(WS-P1.1.a, 2026-05-02: state-word indices are now resolved to symbolic
hierarchical signal names. The bare [word=N] suffix is preserved for
grep compatibility. Words packing more than 4 DFFs truncate with a
+N more suffix.)
| Field | Meaning |
|---|---|
| cycle | Simulation cycle where the violation occurred |
| word | State word index — identifies a group of 32 DFF data inputs |
| arrival | Maximum accumulated gate delay to this word's signals (picoseconds) |
| setup | DFF setup time constraint from SDF/Liberty (picoseconds) |
| slack | clock_period - arrival - setup. Negative = violation amount |
Hold Violation Format
[cycle 11] HOLD VIOLATION at top/cpu/state[bit 3] [word=3]: arrival=10ps hold=50ps slack=-40ps
| Field | Meaning |
|---|---|
| cycle | Simulation cycle where the violation occurred |
| word | State word index |
| arrival | Accumulated gate delay to this word's signals (picoseconds) |
| hold | DFF hold time constraint from SDF/Liberty (picoseconds) |
| slack | arrival - hold. Negative = violation amount |
Summary Statistics
At the end of simulation, GEM prints totals:
Simulation complete: 1000 cycles, 5 setup violations, 0 hold violations
Text Summary (--timing-summary)
A one-screen human summary printed to stdout at end of run. Reuses the
same data the JSON report builds (so --timing-report and
--timing-summary cost the same; only the output channel differs).
Sample output:
=== Jacquard Timing Summary ===
Design: my_cpu.gv
Vectors: boot.vcd (1000 cycles)
Clock period: 1000 ps
Timing source: my_cpu.jtir
Violations:
Setup: 5
Hold: 2
Total: 7
Worst slack:
Setup: -150ps at top/cpu/regs[7][bit 22] [word=5] (cycle 87)
Hold: -40ps at top/cpu/state[bit 3] [word=12] (cycle 91)
Top 2 by violation count (of 2 total words with violations):
top/cpu/regs[7][bit 22] [word=5] (5 violations): worst setup=-150ps hold=- arrival=950ps
top/cpu/state[bit 3] [word=12] (2 violations): worst setup=- hold=-40ps arrival=10ps
The format is for human inspection — explicitly not a stable
parseable contract. Tools that need to script against the data should
use --timing-report JSON.
Structured JSON Report (--timing-report <path.json>)
For CI integration and downstream tooling, pass --timing-report <path>
to get an end-of-run JSON document. The schema is versioned (ADR 0008's
stability contract: additive-only extensions, breaking changes bump
the major). Sample at tests/timing_ir/sample_reports/two_violations.json;
authoritative type definitions in src/timing_report.rs.
Top-level shape:
{
"schema_version": "1.0.0",
"metadata": { "design": "...", "cycles_run": 1000, "clock_period_ps": 1000, "...": "..." },
"stats": { "setup_violations": 5, "hold_violations": 0, "events_dropped": 0 },
"violations": [
{ "cycle": 42, "kind": "setup", "word_id": 5, "site": "top/cpu/regs[7][bit 22] [word=5]",
"arrival_ps": 900, "constraint_ps": 200, "slack_ps": -100 }
],
"per_word": [
{ "word_id": 5, "site": "...", "setup_violations": 5, "hold_violations": 0,
"worst_setup_slack_ps": -100, "worst_hold_slack_ps": null, "worst_arrival_ps": 900 }
],
"worst_slack": {
"setup": [ /* top-N most-negative slacks across the run */ ],
"hold": [ /* same shape */ ]
}
}
per_word is sorted by total violation count desc, then by word_id.
worst_slack.setup / .hold are top-10 by closest-to-violation slack
(most negative first). Caveats:
- The "even when no violation occurred" half of WS-P1.1.d (per-DFF
closest-to-violation tracking when the design never tripped a
violation) needs GPU-side near-miss instrumentation and is not in
v1.0.0; for now,
worst_slackis populated only from actual violation events. --timing-reportonly produces output today on the Metal sim path. The CUDA / HIP / cosim paths do not currently route runtime violations throughprocess_events— bringing them in is independent plumbing.- The
violationsarray is capped at 100,000 records by default (~8 MB JSON). Override or disable the cap with--timing-report-max-violations <N>(0= unbounded). Setup/hold totals,events_dropped, andworst_slackrankings always reflect every observed event; only the per-cycle list is bounded.stats.violations_truncatedreports how many records were dropped because the cap was reached.
Tracing Violations to Source Signals
When you see a violation on a specific word, follow this workflow to identify the offending signals and their logic cone.
1. Get the Word Index
From the log: word 5 means state word index 5.
2. Map Word to DFF Signals
Each word covers 32 bits of state. The DFFs in that word have data_state_pos / 32 == word_index. To find which DFFs:
-
Look at the
dff_constraintsentries in theFlattenedScriptV1:dff_constraints entries where data_state_pos / 32 == 5 → cell_id values → netlist cell names -
In
gpu_sim, violations are logged with word IDs that map directly to theoutput_mappositions. Each word covers bit positionsword * 32throughword * 32 + 31.
3. Trace Backwards with netlist_graph
Use the netlist_graph tool to trace the combinational logic cone feeding the DFF. After uv sync --group dev, the netlist-graph console script is on the workspace's uv run path — no cd required:
# Find the DFF data input driver chain
uv run netlist-graph drivers design.v "dff_name.D" -d 10
# Search for DFFs matching a pattern
uv run netlist-graph search design.v "dff_out*"
Discovered signal names can be passed directly into jacquard sim --trace-signals <file> / jacquard cosim --trace-signals <file> (one name per line) to surface them in the output VCD alongside top-level IO.
4. Detailed Timing Analysis with CVC
For per-signal accuracy (no 32-signal approximation), use CVC (open-src-cvc) with SDF back-annotation:
# Run CVC with SDF timing
cvc64 +typdelays tb.v design.v
./cvcsim
CVC provides event-driven simulation with full SDF support (IOPATH + INTERCONNECT delays), allowing you to pinpoint exactly which path is critical.
The Approximation Caveat
GEM tracks one arrival time per 32-signal group (one GPU thread position). The tracked value is the maximum arrival across all 32 signals in that thread. This means:
- Conservative: If any signal in the group has a long path, the arrival for the entire group reflects that worst case. Violations may be reported for signals that individually meet timing.
- Never misses real violations: A real violation always results in a reported violation (the max is >= any individual signal's arrival).
Reducing False Positives
If a violation is reported but you suspect it's a false positive from the approximation:
- Use CVC for per-signal accuracy (see Detailed Timing Analysis with CVC above).
- Timing-aware bit packing groups signals with similar arrival times into the same thread, reducing the approximation error. See
docs/timing-simulation.md§ "Timing-Aware Bit Packing" for details.
Common Scenarios
Setup violations on many words, same cycle: The clock period is likely too tight for the design. The combinational logic depth exceeds what can settle in one clock period. Try increasing the clock period.
Setup violation on a single word: A critical path through one specific logic cone. Use netlist_graph drivers to trace the path and identify the bottleneck.
Hold violation: Rare with SKY130 process (negative hold times clamp to 0 in the SDF). If seen, the design likely has minimum-delay paths that are too short. Check for direct connections between DFF outputs and nearby DFF inputs with minimal combinational logic.
Violations only on first cycle: The arrival > 0 guard in the GPU kernel skips setup checks when arrival is zero (meaning no data has propagated through combinational logic yet). If you see violations on cycle 0, they are hold violations — setup violations on cycle 0 are suppressed by design.