Timing Violation Detection

See also: timing-correctness.md — forward-looking validation contract and timing IR requirements (in progress). The document below describes current behaviour.

Guide to enabling, reading, and debugging setup/hold timing violations in GEM.

Overview

Setup and hold violations occur when data arrives too late (setup) or too early (hold) relative to the clock edge at a flip-flop. GEM checks for these violations during GPU simulation by tracking arrival times — the accumulated gate delay from primary inputs or DFF outputs through combinational logic to the next DFF data input.

Approximation model: GEM tracks one arrival time per 32-signal group (one GPU thread position). The arrival is the maximum across all 32 signals in the group. This is conservative: it may over-report violations but will never miss a real one. See Reducing False Positives for details.

Enabling Timing Checks

Prerequisites

  1. SDF file with back-annotated delays from your place-and-route tool
  2. Gate-level netlist synthesized to aigpdk.lib cells

Step-by-step

  1. Generate SDF from your P&R tool (or use scripts/generate_sdf.py for test designs):

    # Example: OpenROAD flow output
    ls my_build/6_final.sdf
    
  2. Run the simulator with --sdf and a clock period:

    Metal (macOS):

    cargo run -r --features metal --bin jacquard -- sim \
        design.gv input.vcd output.vcd 1 \
        --sdf design.sdf \
        --sdf-corner typ
    

    CUDA (NVIDIA):

    cargo run -r --features cuda --bin jacquard -- sim \
        design.gv input.vcd output.vcd 8 \
        --sdf design.sdf \
        --sdf-corner typ \
        --enable-timing \
        --timing-clock-period 1200
    

    cosim (co-simulation):

    cargo run -r --features metal --bin jacquard -- cosim \
        design.gv \
        --config testbench.json \
        --sdf design.sdf \
        --sdf-corner typ
    

CLI Flags Reference

FlagBinaryDescription
--sdf <path>allPath to SDF file with back-annotated delays
--sdf-corner <min|typ|max>allWhich SDF corner to use (default: typ)
--sdf-debugallPrint unmatched SDF instances for debugging
--enable-timingjacquard simEnable timing analysis (arrival + violation checks)
--timing-clock-period <ps>jacquard simClock period in picoseconds (default: 1000)
--timing-report-violationsjacquard simReport all violations, not just summary
--timing-report <path.json>jacquard simWrite a structured end-of-run JSON report (schema in src/timing_report.rs, ADR 0008).
--timing-summaryjacquard simPrint a human-readable text summary at end of run. Independent of --timing-report; both can be combined.
--timing-report-max-violations <N>jacquard simCap on the per-cycle violations list in --timing-report. Default 100k. 0 = unbounded. Totals + worst-slack always reflect every event.
--liberty <path>jacquard simLiberty library for timing data (optional, falls back to AIGPDK defaults)

Example: inv_chain_pnr Test Case

# Run with SDF timing
cargo run -r --features metal --bin jacquard -- sim \
    tests/timing_test/inv_chain_pnr/6_final.v \
    tests/timing_test/inv_chain_pnr/input.vcd \
    tests/timing_test/inv_chain_pnr/output.vcd 1 \
    --sdf tests/timing_test/inv_chain_pnr/6_final.sdf

Reading Violation Reports

Setup Violation Format

[cycle 42] SETUP VIOLATION at top/cpu/regs[7][bit 22] [word=5]: arrival=900ps setup=200ps slack=-100ps

(WS-P1.1.a, 2026-05-02: state-word indices are now resolved to symbolic hierarchical signal names. The bare [word=N] suffix is preserved for grep compatibility. Words packing more than 4 DFFs truncate with a +N more suffix.)

FieldMeaning
cycleSimulation cycle where the violation occurred
wordState word index — identifies a group of 32 DFF data inputs
arrivalMaximum accumulated gate delay to this word's signals (picoseconds)
setupDFF setup time constraint from SDF/Liberty (picoseconds)
slackclock_period - arrival - setup. Negative = violation amount

Hold Violation Format

[cycle 11] HOLD VIOLATION at top/cpu/state[bit 3] [word=3]: arrival=10ps hold=50ps slack=-40ps
FieldMeaning
cycleSimulation cycle where the violation occurred
wordState word index
arrivalAccumulated gate delay to this word's signals (picoseconds)
holdDFF hold time constraint from SDF/Liberty (picoseconds)
slackarrival - hold. Negative = violation amount

Summary Statistics

At the end of simulation, GEM prints totals:

Simulation complete: 1000 cycles, 5 setup violations, 0 hold violations

Text Summary (--timing-summary)

A one-screen human summary printed to stdout at end of run. Reuses the same data the JSON report builds (so --timing-report and --timing-summary cost the same; only the output channel differs). Sample output:

=== Jacquard Timing Summary ===
Design:        my_cpu.gv
Vectors:       boot.vcd (1000 cycles)
Clock period:  1000 ps
Timing source: my_cpu.jtir

Violations:
  Setup: 5
  Hold:  2
  Total: 7

Worst slack:
  Setup: -150ps  at top/cpu/regs[7][bit 22] [word=5]  (cycle 87)
  Hold:   -40ps  at top/cpu/state[bit 3] [word=12]  (cycle 91)

Top 2 by violation count (of 2 total words with violations):
  top/cpu/regs[7][bit 22] [word=5] (5 violations): worst setup=-150ps hold=- arrival=950ps
  top/cpu/state[bit 3] [word=12] (2 violations): worst setup=- hold=-40ps arrival=10ps

The format is for human inspection — explicitly not a stable parseable contract. Tools that need to script against the data should use --timing-report JSON.

Structured JSON Report (--timing-report <path.json>)

For CI integration and downstream tooling, pass --timing-report <path> to get an end-of-run JSON document. The schema is versioned (ADR 0008's stability contract: additive-only extensions, breaking changes bump the major). Sample at tests/timing_ir/sample_reports/two_violations.json; authoritative type definitions in src/timing_report.rs.

Top-level shape:

{
  "schema_version": "1.0.0",
  "metadata": { "design": "...", "cycles_run": 1000, "clock_period_ps": 1000, "...": "..." },
  "stats": { "setup_violations": 5, "hold_violations": 0, "events_dropped": 0 },
  "violations": [
    { "cycle": 42, "kind": "setup", "word_id": 5, "site": "top/cpu/regs[7][bit 22] [word=5]",
      "arrival_ps": 900, "constraint_ps": 200, "slack_ps": -100 }
  ],
  "per_word": [
    { "word_id": 5, "site": "...", "setup_violations": 5, "hold_violations": 0,
      "worst_setup_slack_ps": -100, "worst_hold_slack_ps": null, "worst_arrival_ps": 900 }
  ],
  "worst_slack": {
    "setup": [ /* top-N most-negative slacks across the run */ ],
    "hold":  [ /* same shape */ ]
  }
}

per_word is sorted by total violation count desc, then by word_id. worst_slack.setup / .hold are top-10 by closest-to-violation slack (most negative first). Caveats:

  • The "even when no violation occurred" half of WS-P1.1.d (per-DFF closest-to-violation tracking when the design never tripped a violation) needs GPU-side near-miss instrumentation and is not in v1.0.0; for now, worst_slack is populated only from actual violation events.
  • --timing-report only produces output today on the Metal sim path. The CUDA / HIP / cosim paths do not currently route runtime violations through process_events — bringing them in is independent plumbing.
  • The violations array is capped at 100,000 records by default (~8 MB JSON). Override or disable the cap with --timing-report-max-violations <N> (0 = unbounded). Setup/hold totals, events_dropped, and worst_slack rankings always reflect every observed event; only the per-cycle list is bounded. stats.violations_truncated reports how many records were dropped because the cap was reached.

Tracing Violations to Source Signals

When you see a violation on a specific word, follow this workflow to identify the offending signals and their logic cone.

1. Get the Word Index

From the log: word 5 means state word index 5.

2. Map Word to DFF Signals

Each word covers 32 bits of state. The DFFs in that word have data_state_pos / 32 == word_index. To find which DFFs:

  • Look at the dff_constraints entries in the FlattenedScriptV1:

    dff_constraints entries where data_state_pos / 32 == 5
    → cell_id values → netlist cell names
    
  • In gpu_sim, violations are logged with word IDs that map directly to the output_map positions. Each word covers bit positions word * 32 through word * 32 + 31.

3. Trace Backwards with netlist_graph

Use the netlist_graph tool to trace the combinational logic cone feeding the DFF. After uv sync --group dev, the netlist-graph console script is on the workspace's uv run path — no cd required:

# Find the DFF data input driver chain
uv run netlist-graph drivers design.v "dff_name.D" -d 10

# Search for DFFs matching a pattern
uv run netlist-graph search design.v "dff_out*"

Discovered signal names can be passed directly into jacquard sim --trace-signals <file> / jacquard cosim --trace-signals <file> (one name per line) to surface them in the output VCD alongside top-level IO.

4. Detailed Timing Analysis with CVC

For per-signal accuracy (no 32-signal approximation), use CVC (open-src-cvc) with SDF back-annotation:

# Run CVC with SDF timing
cvc64 +typdelays tb.v design.v
./cvcsim

CVC provides event-driven simulation with full SDF support (IOPATH + INTERCONNECT delays), allowing you to pinpoint exactly which path is critical.

The Approximation Caveat

GEM tracks one arrival time per 32-signal group (one GPU thread position). The tracked value is the maximum arrival across all 32 signals in that thread. This means:

  • Conservative: If any signal in the group has a long path, the arrival for the entire group reflects that worst case. Violations may be reported for signals that individually meet timing.
  • Never misses real violations: A real violation always results in a reported violation (the max is >= any individual signal's arrival).

Reducing False Positives

If a violation is reported but you suspect it's a false positive from the approximation:

  1. Use CVC for per-signal accuracy (see Detailed Timing Analysis with CVC above).
  2. Timing-aware bit packing groups signals with similar arrival times into the same thread, reducing the approximation error. See docs/timing-simulation.md § "Timing-Aware Bit Packing" for details.

Common Scenarios

Setup violations on many words, same cycle: The clock period is likely too tight for the design. The combinational logic depth exceeds what can settle in one clock period. Try increasing the clock period.

Setup violation on a single word: A critical path through one specific logic cone. Use netlist_graph drivers to trace the path and identify the bottleneck.

Hold violation: Rare with SKY130 process (negative hold times clamp to 0 in the SDF). If seen, the design likely has minimum-delay paths that are too short. Check for direct connections between DFF outputs and nearby DFF inputs with minimal combinational logic.

Violations only on first cycle: The arrival > 0 guard in the GPU kernel skips setup checks when arrival is zero (meaning no data has propagated through combinational logic yet). If you see violations on cycle 0, they are hold violations — setup violations on cycle 0 are suppressed by design.