
Logic / Protocol Analyzer Guide for I2C, SPI, and UART


This page shows how to use a logic/protocol analyzer to turn “intermittent bus bugs” (I²C/SPI/UART) into a repeatable, data-backed evidence chain. It focuses on trustworthy capture, trigger design, timing alignment, and pass criteria so issues can be reproduced, attributed, and handed off without guesswork.

What a Logic / Protocol Analyzer is (and when it beats a scope)

Scope guard
  • Covered: what the tool proves (decode/trigger/stats) and when it is the fastest path to a reproducible root cause.
  • Not covered: I²C/SPI/UART protocol theory, pull-up sizing, signal-integrity routing, or PHY-level analog design (those belong to the bus-specific subpages).

A logic/protocol analyzer turns raw transitions into time-stamped events that can be decoded, filtered, triggered, and summarized. The goal is not “seeing a waveform” but producing evidence: what happened, in what order, how often, and under which conditions.

Pick the right tool in ~60 seconds
Choose based on what must be proven.
Oscilloscope
  • Best for: edge shape, ringing/reflection, overshoot, analog noise, eye/jitter work.
  • Weak at: proving long event sequences, rare faults, and frequency-of-occurrence.
  • Output: waveform evidence.
Logic / Protocol Analyzer
  • Best for: decode timelines, advanced triggers, event correlation, and stats.
  • Weak at: precise analog shape and SI root causes (still needs a scope).
  • Output: decode + trigger capture + statistics.
Combined (Most robust)
  • Use when: the system “looks OK” but fails intermittently; both analog quality and digital sequence must be proven.
  • Method: trigger/capture on the analyzer; verify edge integrity on the scope around the same window.
  • Output: full evidence chain.
Analyzer wins fastest when…
  • Intermittent faults must be turned into a reproducible trigger (rare NAK/retry, sporadic bit-slip, random framing errors).
  • Root cause depends on event order (who spoke first, which transaction preceded the failure, which CS asserted).
  • Proof requires frequency and distribution (how often per minute, clustered under load, correlated to a marker).
  • Debug needs time alignment across multiple signals (bus + reset/IRQ + firmware marker).
Not the right first tool when…
  • The key question is analog edge quality: reflection, ringing, overshoot, termination, eye/jitter, or power integrity coupling.
  • Failure is dominated by physical layer constraints (cable SI, high-speed edges, EMI compliance). Use a scope/SI workflow first, then return for decode/sequence proof.
The 3 deliverables that make debug “engineering-grade”
1) Decode timeline

Human-readable transactions (bytes/frames/events) with precise timestamps; supports “what exactly happened?”

2) Trigger capture

A designed capture window around the fault (pre/post), so a rare bug becomes a repeatable dataset.

3) Stats & evidence

Counts, rates, and correlations (NAK rate, retries, error bursts, latency spikes) that survive peer review and production triage.

Diagram: tool comparison and what each one can produce (waveform vs decode vs stats)

Measurement Setup That Won’t Lie: probing, reference, thresholds

Protocol debug collapses when the measurement chain injects artifacts: ground bounce, threshold mismatch, probe loading, or input damage can create “bugs” that disappear when the setup changes. This section sets measurement readiness rules so decode/trigger results can be trusted.

Setup checklist (actionable)
Each item includes a quick validation hook.
Probe & return path
  • Keep return short: long ground leads create inductive bounce and false edges.
  • Minimize loading: avoid clips that add capacitance on fast edges or weak pull-ups.
  • Quick check: swap to a shorter ground/return; decode error rate should drop, not shift randomly.
Logic thresholds
  • Match voltage domain: 1.2/1.8/3.3/5 V logic needs correct VIH/VIL interpretation.
  • Respect signaling type: open-drain (I²C) vs push-pull (SPI/UART) changes margin behavior.
  • Quick check: shift threshold by a small step; a real protocol fault persists, a threshold artifact moves/disappears.
Input protection & safety
  • Prevent damage: ESD/over-voltage events can silently degrade analyzer inputs.
  • Use series-R / isolation: especially when probing unknown headers or hot-plug ports.
  • Quick check: verify analyzer input range and clamp strategy before first contact; avoid “capture once, then everything changes.”
Sampling intuition (decode reliability)
  • Edge quality sets the floor: slow edges + noise shrink timing margin and increase threshold sensitivity.
  • Sample fast enough: under-sampling converts real timing into random decode shifts.
  • Quick check: increase sample rate and compare the same transaction; correct decode should converge, not diverge.
Red flags: suspect the measurement first
These patterns often indicate setup artifacts.
  • Same transaction decodes differently across captures without any firmware/state change → threshold/return-path instability is likely.
  • ACK/NAK toggles when only probe position/ground lead changes → ground bounce or loading dominates the result.
  • Bug disappears when the clip is removed or moved → probe capacitance/induced crosstalk is acting as a “fix.”
  • Analyzer-to-analyzer disagreement on the same signal → validate thresholds, sample rate, and channel skew before blaming the DUT.
Pass criteria (measurement readiness)
  • Repeatability: N replays of the same action produce the same decode result (N = X).
  • Threshold stability: small threshold adjustments do not flip the interpretation of stable bits/ACKs; only marginal edges move (Δthreshold = X).
  • Error convergence: increasing sample rate reduces decode ambiguity (no new “random” bytes appear).
  • Setup traceability: probe type, return path method, threshold setting, and input range are recorded for review/hand-off.
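The repeatability criterion above is easy to automate once decode exports are available. A minimal sketch in Python (the function name, the tuple-based decode representation, and the N = 5 default are illustrative assumptions, not any analyzer's API):

```python
def decode_is_repeatable(captures, n_required=5):
    """Measurement-readiness check: N replays of the same action
    must produce the same decoded byte sequence.
    `captures` is a list of decoded byte tuples, one per replay."""
    if len(captures) < n_required:
        return False  # not enough replays to claim repeatability
    reference = captures[0]
    return all(c == reference for c in captures[1:])
```

If the check fails, suspect the measurement chain (threshold, return path, sample rate) before suspecting the DUT.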
Diagram: probe + DUT + return path, with threshold selection and input clamp modules

Sampling & Timebase: how much sample rate is enough (and when it isn’t)

Scope guard
  • Covered: sample rate, timing resolution, and memory depth as they impact decode stability, trigger reliability, and evidence quality.
  • Not covered: protocol field semantics, bus-specific electrical design, or signal-integrity root causes (those belong to the bus/SI subpages).

Sampling choices determine whether captures converge into a consistent decode timeline or diverge into shifting markers and contradictory transactions. A “good” configuration is not simply higher numbers—it is one where repeated captures of the same action produce the same decoded events and timing relationships.

The 3 knobs that decide capture truth
Treat them as a coupled system: rate ↔ resolution ↔ depth.
Sample rate
  • Controls: how finely edges and short pulses are represented; directly impacts decode stability.
  • Rule of thumb: sample ≥ the fastest relevant edge/timing feature (X = placeholder).
  • Failure signature: the same transaction shifts byte/bit boundaries between captures.
  • Quick check: increase rate by one step; decode should converge, not “randomize.”
Timing resolution
  • Controls: event localization (setup/hold windows, stretch duration, inter-frame gaps).
  • Where it bites: ordering and causality (“what happened first”) becomes ambiguous if time bins are coarse.
  • Failure signature: event times quantize into a few buckets; deltas look inconsistent.
  • Quick check: zoom into a known boundary; markers should anchor to stable edges.
Memory depth
  • Controls: record length (how far back/forward evidence extends).
  • Where it bites: burst/DMA transfers and rare faults need long windows to capture the “pre-cause.”
  • Failure signature: captures show only the failure moment, not the lead-up sequence.
  • Quick check: enable pre-trigger and increase depth; the prior context must appear.
Practical convergence test (turn “settings” into proof)
  1. Capture the same action with two or three configurations (sample rate and timebase steps).
  2. Compare decoded markers (start/stop, CS edges, frame boundaries): they should align to the same physical transitions.
  3. Decision: if decode shifts between configurations, the dataset is not trustworthy yet—raise rate/resolution and re-check.
  4. Pass criteria: repeated captures produce the same decode timeline and stable deltas (thresholds = X placeholders).
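The marker comparison in steps 1–3 can be scripted directly against exported timestamps. A sketch in Python (the timestamp-list representation and the 100 ns tolerance are illustrative placeholders):

```python
def markers_converge(markers_a, markers_b, tol_s=1e-7):
    """Compare decoded event timestamps (seconds) from two capture
    configurations. Convergence means the same event count, with each
    marker anchored to the same physical edge within `tol_s`."""
    if len(markers_a) != len(markers_b):
        return False  # a decoded event appeared or vanished: not converged
    return all(abs(a - b) <= tol_s for a, b in zip(markers_a, markers_b))
```

A `False` result means the dataset is not trustworthy yet: raise sample rate or resolution and re-check.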
Window planning (why pre-trigger is mandatory for intermittent bugs)
  • Trigger point captures the symptom; pre-trigger captures the cause chain (prior transactions, retries, queueing gaps).
  • Use a fixed template window around the trigger (pre = X, post = X) so datasets are comparable across runs.
  • For long bursts (DMA), prioritize depth and event markers; for rare faults, prioritize pre-trigger history over post-trigger length.
Diagram: sparse sampling near an edge can shift detected markers and create “jitter-like” decode artifacts

Timing Alignment: correlate SCL/SDA/CS/SCLK/RX/TX with a shared reference

Scope guard
  • Covered: alignment methodology—shared references, markers, skew calibration, and delay decomposition.
  • Not covered: protocol correctness rules or bus electrical design (use the bus-specific pages for those).

Timing alignment turns multiple observations into a single causal timeline. The objective is to answer: what happened first, how long each phase took, and whether observed delays are fixed (deterministic) or variable (state-dependent). This is essential when troubleshooting bridges, isolators, extenders, and firmware-driven transactions.

Step 1–5 alignment workflow
Each step leaves an artifact that can be shared.
  1. Choose a shared reference. Use one of: same-instrument timebase, shared trigger, a reference pulse, or a marker strobe. Artifact: reference definition (signal name, wiring point, electrical levels).
  2. Unify trigger and capture window. Use fixed pre/post windows (pre = X, post = X) so datasets are comparable. Artifact: trigger condition + window template.
  3. Calibrate channel skew (single analyzer). Validate multi-channel simultaneity and apply skew correction if supported. Artifact: residual skew < X (placeholder).
  4. Align across instruments (analyzer + scope). Anchor both to the same reference pulse/marker or share trigger. Artifact: reference delta distribution (mean/peak-to-peak = X).
  5. Decompose delay. Separate fixed offsets (cable/isolator/level shifter) from variable latency (bridge FIFO/state). Artifact: annotated path map (fixed vs variable).
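Step 5's fixed-vs-variable split can be computed from a list of measured reference-to-event delays. A Python sketch (treating the minimum delay as the fixed offset is a common simplifying assumption; names are illustrative):

```python
def decompose_delay(delays_s):
    """Split measured reference-to-event delays (seconds) into a fixed
    offset (cable / isolator / level shifter) and a variable component
    (bridge FIFO / state-dependent latency)."""
    fixed = min(delays_s)                      # deterministic path delay
    variable = [d - fixed for d in delays_s]   # state-dependent residual
    return {
        "fixed_s": fixed,
        "var_mean_s": sum(variable) / len(variable),
        "var_p2p_s": max(variable) - min(variable),
    }
```

A large peak-to-peak variable component over many events is the "delay jumps in discrete steps" signature described below.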
Alignment failure signatures
Diagnose by symptom → first sanity check.
  • Reference line drifts between captures → trigger/timebase is not actually shared. First check: re-inject a hard marker pulse and verify it lands at the same timestamp.
  • Delay “jumps” in discrete steps → variable latency (FIFO levels, buffering, state machine phases). First check: collect a distribution of delay deltas over many events.
  • Decode becomes contradictory after alignment → sampling/thresholds are unstable. First check: run the H2-3 convergence test (decode must converge).
  • Scope and analyzer disagree on edge timing → skew/calibration mismatch. First check: align on a single sharp marker captured by both.
Diagram: reference pulse / marker drives a unified timeline across scope + analyzer + firmware, with delay decomposition

Trigger Strategy: from “capture everything” to “capture the bug”

Scope guard
  • Covered: trigger types, layered composition, convergence workflow, and trigger reliability.
  • Not covered: protocol field meaning or bus electrical design (use bus-specific subpages for those).

Effective triggering is a designed funnel: start wide to capture real samples, extract the minimum signature, then tighten into sequences and statistical thresholds. The goal is to convert intermittent faults into repeatable, comparable evidence windows.

Layered trigger stack
Build from raw signals → protocol events → sequences → thresholds.
Level / Edge
  • Use for: first capture, coarse localization, line-state anomalies.
  • Common forms: edge, pulse width, timeout, stuck level.
  • Risk: noise/glitches cause false triggers.
  • Sanity check: add debounce or adjust threshold; false hits should change dramatically.
Byte / Frame
  • Use for: lock onto protocol-level events (NAK, frame boundary, error flag).
  • Benefit: fewer irrelevant captures than edge-only triggering.
  • Risk: unstable decode if sampling/thresholds do not converge.
  • Sanity check: run a convergence test (H2-3): decode must remain consistent across settings.
Sequence
  • Use for: rare bugs that require a specific lead-up chain.
  • Key: define the minimum signature (event list + ordering + limits).
  • Risk: overly strict rules never trigger; overly wide rules create large datasets.
  • Sanity check: start wide, then tighten one condition at a time.
Statistic / Threshold
  • Use for: error bursts, retry storms, latency spikes, throughput collapse.
  • Key: define “abnormal” with explicit thresholds (X placeholders).
  • Risk: wrong window length hides bursts or overreacts to noise.
  • Sanity check: sweep window length; true faults remain correlated to the symptom.
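A statistic/threshold trigger of this kind can be prototyped offline against exported error timestamps before committing it to analyzer hardware. A Python sketch (the window length and count threshold stand in for the X placeholders; names are illustrative):

```python
def find_error_bursts(error_times_s, window_s=0.1, threshold=5):
    """Statistic/threshold trigger sketch: flag any sliding window of
    length `window_s` seconds containing more than `threshold` error
    events. Returns the start times of windows that would fire."""
    hits = []
    times = sorted(error_times_s)
    start = 0
    for end in range(len(times)):
        # shrink the window until it spans at most window_s
        while times[end] - times[start] > window_s:
            start += 1
        if end - start + 1 > threshold:
            hits.append(times[start])
    return hits
```

Sweeping `window_s` here mirrors the sanity check above: a true fault stays correlated to the symptom as the window changes.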
Rare-bug workflow (wide → narrow, guaranteed progress)
  1. Start wide. Trigger on a coarse symptom (edge/timeout/NAK) with a large pre-trigger window to capture lead-up.
  2. Extract the minimum signature. Identify the smallest set of events that always precede the bug (ordering and gaps).
  3. Convert to sequence trigger. Implement the signature as a sequence with explicit limits (counts/timeouts = X).
  4. Fix the evidence window. Use a standard template: pre = X, post = X to produce comparable captures.
  5. Validate quality. Measure hit rate and false-trigger rate over repeated runs; thresholds remain explicit placeholders.
Trigger reliability (avoid false conclusions)
Symptom → most likely cause → first sanity check.
  • Trigger rate swings wildly → glitches / threshold drift / ground bounce. First check: enable debounce and shift threshold by one step.
  • Sequence never triggers → rules too strict or decode does not converge. First check: revert to wide trigger and re-run convergence (H2-3).
  • Captures show the symptom but no cause → pre-trigger missing. First check: enforce a fixed pre/post template window.
  • Different analyzers disagree → timing base and thresholds differ. First check: add a hard reference pulse or marker and compare alignment.
Evidence window template (standardize every capture)
Window
  • Pre: X
  • Post: X
  • Trigger: explicit condition + versioned notes
Channels
  • bus lines (e.g., SCL/SDA or CS/SCLK or RX/TX)
  • reset/IRQ lines (if present)
  • firmware marker pulse (recommended)
Outputs
  • decode timeline export
  • trigger configuration record
  • stats summary (counts, burst rate, deltas)
Diagram: Trigger funnel—reduce raw edges into a captured bug window with fixed evidence

I²C Decode Playbook: NAK, arbitration, stretch, stuck-bus (analyzer-first)

Scope guard
  • Covered: analyzer-first capture → trigger → evidence interpretation for common I²C failure modes.
  • Not covered: pull-up sizing, bus capacitance budgeting, or board-level electrical design (use I²C electrical subpages).

The playbook is built around a repeatable evidence chain: symptom → what to capture → how to trigger → how to read the evidence. Each section prioritizes stable decoding and comparable windows over one-off screenshots.

Key events to surface in decode
Keep the view event-driven (not waveform-only):
  • START / STOP
  • ACK / NAK
  • Repeated START
  • Clock stretching
  • Arbitration lost
  • SCL/SDA stuck
Analyzer-first playbook cards
Symptom → capture → trigger → interpretation
NAK / retry burst
Symptom: intermittent NAK or retry storms.
  • Capture: the transaction chain leading into the first NAK; note whether NAK clusters by stage (address vs data).
  • Trigger: ACK/NAK event trigger; enforce fixed windows (pre = X, post = X). Optional: filter to a target address as a view reduction.
  • Interpret: if NAK aligns to a consistent lead-up pattern, tighten into a sequence trigger; if NAK rate changes with thresholds/sample rate, revisit H2-2/H2-3.
Evidence artifact: export decode + NAK frequency + lead-up event list.
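Given a decoded event export, the NAK frequency and lead-up lists can be produced mechanically. A Python sketch (the `(timestamp, kind)` event representation and the lead-up depth of 3 are illustrative assumptions):

```python
def nak_summary(events, lead_up=3):
    """From a decoded I2C event list [(t_seconds, kind), ...], count
    NAKs, compute the NAK rate over the capture span, and collect the
    lead-up event kinds preceding each NAK (to seed a sequence trigger)."""
    if not events:
        return {"nak_count": 0, "rate_hz": 0.0, "lead_ups": []}
    naks = [i for i, (_, kind) in enumerate(events) if kind == "NAK"]
    span = events[-1][0] - events[0][0] or 1.0  # guard zero-length span
    lead_ups = [tuple(k for _, k in events[max(0, i - lead_up):i])
                for i in naks]
    return {"nak_count": len(naks),
            "rate_hz": len(naks) / span,
            "lead_ups": lead_ups}
```

Identical lead-up tuples across many captures are exactly the consistent pattern that justifies tightening into a sequence trigger.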
Arbitration lost / multi-master conflict
Symptom: unexpected stalls or conflict errors.
  • Capture: the last clean sequence before the loss; include any secondary lines (IRQ/reset) if present.
  • Trigger: arbitration-lost event trigger (or an error flag) with a wide pre-trigger window.
  • Interpret: use alignment markers (H2-4) to correlate bus activity with firmware events; unstable positioning indicates sampling/threshold issues.
Evidence artifact: time-aligned timeline showing “what happened first.”
Clock stretching / timeout
Symptom: slow transactions, timeout failures, sporadic hangs.
  • Capture: stretch duration distribution (min/typ/max) and the event chain before long stretches.
  • Trigger: pulse-width/timeout trigger on SCL low > X, plus decode enabled to keep event context.
  • Interpret: multi-modal duration buckets often indicate state-dependent phases; tighten into a sequence trigger using the lead-up events.
Evidence artifact: histogram-ready list of stretch durations + lead-up signature.
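The stretch-duration summary can be derived from exported pulse-width measurements. A Python sketch (using the median as "typical" and a 5× long-tail heuristic are assumptions for illustration, not analyzer features):

```python
def stretch_stats(durations_us):
    """Summarize SCL clock-stretch durations (microseconds) into
    min/typ/max and flag a long tail, which often indicates
    state-dependent stretch phases."""
    d = sorted(durations_us)
    typ = d[len(d) // 2]  # median as the 'typical' value
    return {
        "min_us": d[0],
        "typ_us": typ,
        "max_us": d[-1],
        "long_tail": d[-1] > 5 * typ,  # heuristic threshold (assumption)
    }
```

A long tail or clearly multi-modal buckets is the cue to capture the lead-up events before the longest stretches.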
Stuck bus (SCL/SDA held low)
Symptom: bus stops responding until reset.
  • Capture: the last complete transaction and the missing/abnormal termination event (if any).
  • Trigger: SCL low > X or SDA low > X with deep pre-trigger to keep the “last good” context.
  • Interpret: missing stop/termination events indicate an unclosed transaction chain; correlate to power/plug markers if available (H2-4).
Evidence artifact: last-good transaction + stuck-line duration + recovery action timing.
“Looks OK” traps (avoid mis-decode)
If any signature matches, revisit H2-2/H2-3 first.
  • ACK flips across captures of the same action → sampling near edges or wrong thresholds can turn SDA into a false bit.
  • NAK rate changes when thresholds or sample rate changes → the dataset is not stable enough for protocol conclusions.
  • Decode “drifts” in time despite the same stimulus → convergence failed; raise sample rate/time resolution and re-test.
Diagram: I²C transaction timeline with trigger point and fixed capture window (pre/post)

SPI Decode Playbook: mode mismatch, CS timing, bit-slip, throughput drops

Scope guard
  • Covered: decode / trigger / timing alignment evidence for common SPI failures.
  • Not covered: termination, reflections, or long-trace SI implementation details (use the SPI Long-Trace SI subpage).

The SPI workflow is layered to prevent misdiagnosis: confirm CPOL/CPHA first, then validate CS semantics, then check sampling windows, and finally quantify throughput and burst behavior. Each step produces exportable evidence rather than one-off screenshots.

Layer order (diagnosis): Mode → CS → window → throughput
CPOL/CPHA → CS setup/hold → Sampling edge → Burst / DMA → Stats (gap/duty)
Analyzer-first SPI cards
Symptom → capture → trigger → interpretation
CPOL/CPHA quick confirmation
Symptom: writes appear successful, but registers do not change; reads look “random” or shifted.
  • Capture: the same transaction under Mode 0–3 decode views; compare stability (byte boundaries and lengths should converge).
  • Trigger: wide trigger on CS active; enforce a fixed evidence window (pre = X, post = X).
  • Interpret: the correct mode typically yields stable decode across repeated captures; if all modes drift, revisit sampling/threshold setup (H2-2/H2-3).
Artifact: mode comparison snapshot + “decode convergence” notes.
CS semantics (setup/hold, gap, re-assert)
Symptom: intermittent failures clustered near frame boundaries.
  • Capture: CS low duration, CS high gap distribution, and the last/first clock edges around CS transitions.
  • Trigger: CS rising edge (frame end) or CS pulse width abnormal (< X or > X).
  • Interpret: if errors align with CS transitions or gap jitter, tighten the trigger to “boundary signatures” and export gap statistics.
Artifact: CS gap min/typ/max + boundary capture window.
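The CS gap/duration statistics can be computed from an exported edge list. A Python sketch (the `(timestamp, new_level)` edge representation, starting from the first falling edge, is an illustrative assumption):

```python
def cs_timing(edges):
    """From a CS edge list [(t_seconds, new_level), ...] where level 0
    means asserted (active low), compute frame (low) durations and
    inter-frame (high) gaps."""
    lows, gaps = [], []
    for (t0, l0), (t1, _) in zip(edges, edges[1:]):
        if l0 == 0:
            lows.append(t1 - t0)   # CS asserted: frame duration
        else:
            gaps.append(t1 - t0)   # CS deasserted: inter-frame gap
    return {"frame_min": min(lows), "frame_max": max(lows),
            "gap_min": min(gaps), "gap_max": max(gaps)}
```

Comparing `gap_min` against the target device's required CS high time is the boundary-signature check described above.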
Multi-slave: correct blame with CS fanout
Symptom: “SPI is broken” reports, but only one target misbehaves.
  • Capture: all CS lines and bucket anomalies by CS (error counts, abnormal lengths, unusual gaps).
  • Trigger: filter by a specific CS active state; start wide then narrow to the failing CS.
  • Interpret: single-CS clustering indicates target-specific issues; multi-CS anomalies indicate a shared root cause (align with markers, H2-4).
Artifact: per-CS anomaly histogram (counts per minute / per 1k frames).
Bit-slip / sampling-edge issues
Symptom: occasional one-bit shifts or “byte boundary drift” across otherwise similar frames.
  • Capture: the first bad frame and the last good frame; compare edge-to-sample markers and frame length stability.
  • Trigger: abnormal frame length / unexpected boundary (if supported), else wide CS trigger + post-filter by anomaly.
  • Interpret: if slip rate changes with sampling/threshold settings, convergence is not met (H2-3). If slip clusters at burst boundaries, revisit CS semantics and burst framing.
Artifact: annotated “first-bad frame” capture with sample-edge markers.
Throughput drops / DMA burst behavior
Symptom: throughput collapses under small transfers or burst boundaries.
  • Capture: gap distribution, burst length distribution, and payload duty (active clock time / total time).
  • Trigger: statistical threshold triggers: gap > X, burst < X, jitter spikes > X.
  • Interpret: boundary-aligned drops point to framing/queueing; event-aligned drops point to external back-pressure or firmware scheduling (confirm with markers, H2-4).
Artifact: duty + gap + burst exports for apples-to-apples comparison.
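Payload duty and gap statistics follow directly from exported burst intervals. A Python sketch (the `(t_start, t_end)` burst representation is an illustrative assumption):

```python
def payload_duty(bursts, t_total):
    """Throughput evidence sketch: fraction of the capture during which
    the clock is actively transferring data, plus the worst inter-burst
    gap. `bursts` is [(t_start, t_end), ...] in seconds, sorted."""
    active = sum(t1 - t0 for t0, t1 in bursts)
    gaps = [b[0] - a[1] for a, b in zip(bursts, bursts[1:])]
    return {"duty": active / t_total,
            "gap_max": max(gaps) if gaps else 0.0}
```

A low duty with large `gap_max` points at framing/queueing or back-pressure rather than raw clock speed.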
Diagram: SPI rails (CS/SCLK/MOSI/MISO) with sampling-edge markers and Mode 0–3 selector

UART Decode Playbook: baud error, framing/parity, flow control, wake/break

Scope guard
  • Covered: analyzer decode, error classification, timing alignment, and flow-control evidence.
  • Not covered: RS-232/RS-485 electrical/PHY details (use the UART Voltage Levels & PHY subpage).

UART failures often appear random until they are typed and aligned in time. The layered workflow starts with a baud-rate sanity check, then converts “garbage” into error classes, separates back-pressure from link issues, and finally designs robust wake/break capture windows.

Workflow order: Baud (auto) → Errors → RTS/CTS → Wake/Break
Baud rate & sampling sanity
Symptom: intermittent garbage or periodic failures.
  • Capture: auto-baud estimate (if available) and measured bit widths over time; track drift across frames.
  • Trigger: framing/parity events with fixed windows (pre = X, post = X).
  • Interpret: if errors cluster with bit-width drift, treat as a timing budget issue; typical combined clock tolerance is often around ±2% as a practical checkpoint.
Artifact: bit-width trend + time-stamped error bursts.
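The ±2% checkpoint can be applied to measured bit widths in a few lines. A Python sketch (the function name and the 2% default stand in for the X thresholds):

```python
def baud_error_ok(nominal_baud, measured_bit_us, budget_pct=2.0):
    """Compare a measured bit width against the nominal bit time; the
    combined TX+RX clock error should stay inside the practical ~2%
    checkpoint. Returns (error_percent, within_budget)."""
    nominal_us = 1e6 / nominal_baud          # nominal bit time in µs
    error_pct = abs(measured_bit_us - nominal_us) / nominal_us * 100.0
    return error_pct, error_pct <= budget_pct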
Error classes (framing / parity / overrun)
Symptom: “random” corruption without an obvious pattern.
  • Capture: per-class counts + the exact timestamps; identify whether errors are isolated or bursty.
  • Trigger: trigger on a specific error type to build clean evidence packs.
  • Interpret: framing suggests boundary loss; parity suggests bit-level corruption; overrun suggests buffering/back-pressure—confirm by aligning with RTS/CTS lines.
Artifact: typed error timeline export (counts + burst intervals).
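Separating isolated errors from bursts is a simple grouping pass over the typed-error timestamps. A Python sketch (the 10 ms burst gap is an illustrative threshold):

```python
def classify_error_bursts(timestamps_s, burst_gap_s=0.01):
    """Group typed-error timestamps (seconds) into bursts: consecutive
    errors closer than `burst_gap_s` belong to the same burst."""
    bursts, current = [], []
    for t in sorted(timestamps_s):
        if current and t - current[-1] > burst_gap_s:
            bursts.append(current)  # gap too large: close this burst
            current = []
        current.append(t)
    if current:
        bursts.append(current)
    return bursts
```

Many short bursts aligned to RTS/CTS transitions suggest back-pressure; evenly spread isolated errors point back to the baud/timing budget.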
RTS/CTS: detect back-pressure masquerading as link faults
Symptom: timeouts or stalls that look like a link problem.
  • Capture: CTS/RTS alongside TX/RX; measure block durations and correlate to dropped frames/timeouts.
  • Trigger: CTS blocked > X (duration threshold) or timeout event + examine CTS state in the fixed window.
  • Interpret: if all stalls align to CTS blocking, treat as back-pressure; if CTS is normal but errors persist, return to baud and error typing.
Artifact: CTS duty + block time distribution (min/typ/max).
Wake / Break / Idle detect windows
Symptom: wake-up occasionally fails or the first frame after wake is corrupted.
  • Capture: break/idle boundary with deep pre-trigger to preserve the lead-up; include a firmware marker if available.
  • Trigger: break detected or idle-to-active transition; enforce a fixed window template.
  • Interpret: if failures align to wake boundaries, treat as a capture-window/alignment problem first (H2-4), not as “random link noise”.
Artifact: wake boundary capture with first-frame correctness check.
Diagram: UART bit frame (start/data/parity/stop) with sample points and error labels

Debug Workflow: prove root cause with a repeatable evidence chain

Scope guard
  • Covered: evidence chain template that is repeatable, transferable, and bug-report ready.
  • Not covered: long repair plans for circuits/firmware; focus is proof and pass criteria.

The goal is not “a capture” but a closed loop: each cycle produces artifacts that another engineer can replay and compare. The workflow below turns intermittent symptoms into quantifiable triggers, isolates variables one at a time, and ends with a pass criterion suitable for regression.

5-step closed loop
Goal → Do → Validate → Output
Step 1 · Define the symptom
  • Goal: convert “intermittent” into a measurable event type.
  • Do: pick the observable signature (NAK / bit-slip / framing / timeout / stall).
  • Validate: reproduce under the same conditions at least N times.
  • Output: one-line symptom definition + counting rule (X placeholders).
Step 2 · Design the trigger
  • Goal: capture the bug window reliably.
  • Do: start wide, capture a real sample, then narrow into a sequence/threshold trigger.
  • Validate: track hit rate vs miss rate (X placeholders).
  • Output: trigger definition + fixed window template (pre = X, post = X).
Step 3 · Capture a clean sample
  • Goal: ensure the sample reflects the system, not the measurement chain.
  • Do: run a convergence check (sampling/threshold changes must not flip the conclusion).
  • Validate: repeated captures decode consistently.
  • Output: raw capture + decode export + setup record.
Step 4 · Isolate variables (one at a time)
  • Goal: turn correlation into a causality chain.
  • Do: change one variable only (bridge / isolator / queue depth / firmware build / cable).
  • Validate: anomaly rate changes monotonically or predictably.
  • Output: A/B evidence packs with identical trigger/window template.
Step 5 · Define pass criteria (regression-ready)
  • Goal: make “fixed” measurable in production-like runs.
  • Do: set thresholds (error rate ≤ X, p99 latency ≤ X, stall count ≤ X).
  • Validate: repeat under a fixed workload and always pass.
  • Output: pass criteria + regression steps (X placeholders).
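The pass criteria above can be encoded as an explicit threshold table so a regression run fails or holds automatically. A Python sketch (metric names and limits are the X placeholders made concrete for illustration):

```python
def regression_pass(stats, limits):
    """Evaluate run statistics against explicit pass thresholds.
    `stats` and `limits` map metric name -> value; a metric fails when
    it exceeds its limit. Returns (passed, failing_metrics)."""
    failures = {k: v for k, v in stats.items()
                if k in limits and v > limits[k]}
    return len(failures) == 0, failures
```

Keeping the limits in a versioned table makes "fixed" measurable and reviewable across builds.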
Evidence pack (artifacts)
Must-have vs optional.
Must-have
  • Raw capture: preserves threshold/sampling auditability.
  • Decode export: makes interpretation reproducible.
  • Trigger definition: explains why the bug window is captured.
  • Statistics: converts “rare” into measurable rates/distributions.
  • Marker notes: pin polarity + meaning + mapping to firmware events.
Optional
  • Cross-instrument alignment: correlates analyzer + scope when needed.
  • A/B comparison pack: identical trigger/window with one-variable change.
  • One-page summary: symptom + trigger + pass criteria (bug-report ready).
Naming & packaging rule (recommended)
Use a stable naming pattern such as bus_tool_symptom_trigger_window_timestamp and include a small README with three lines: Symptom, Trigger, Pass criteria.
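The naming pattern can be enforced with a tiny helper so every station produces comparable file names. A Python sketch (the lowercasing and space-to-hyphen normalization are illustrative choices):

```python
def evidence_name(bus, tool, symptom, trigger, window, timestamp):
    """Build the recommended capture name in the pattern
    bus_tool_symptom_trigger_window_timestamp (lowercase, no spaces)."""
    parts = [bus, tool, symptom, trigger, window, timestamp]
    return "_".join(p.lower().replace(" ", "-") for p in parts)
```

One helper shared across stations guarantees packs sort and diff cleanly.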
Misattribution traps (avoid false conclusions)
  • Retry ≠ hardware failure: align retry bursts to bus events and markers before blaming the physical layer.
  • Threshold drift ≠ protocol error: if the conclusion flips by changing thresholds or sampling, fix the measurement chain first.
  • Single screenshot ≠ proof: require repeat captures and a distribution (rate, gaps, burst intervals).
Diagram: evidence chain loop (Capture → Annotate → Compare → Isolate → Pass → Capture)

Engineering Checklist (design → bring-up → production)

Intent
  • Focus: observability hooks that make analyzer evidence fast and repeatable.
  • Not a bus design guide: avoids routing/SI rules and protocol-level teaching.

This checklist embeds observability into the product lifecycle. Each item is phrased as a hook with a verification action and a pass condition, enabling consistent bring-up and production regression.

Design · observability hooks to build in
  • Test pads / probe access — Verify: stable capture without clip-induced flips; Pass: decode remains consistent when sampling/threshold changes one step.
  • Marker GPIO / strobe pin — Verify: a deterministic pulse at key firmware events; Pass: marker-to-bus alignment error < X (placeholder).
  • Config options (pull-up/termination variants) — Verify: A/B switchable by BOM option or strap; Pass: A/B evidence packs use identical trigger/window template.
  • Bypass hooks (mux/isolator bypass) — Verify: bypass toggles reproducibly; Pass: anomaly rate shifts predictably across bypass state.
  • Logging with timestamps — Verify: event IDs and timestamps are emitted; Pass: log events can be correlated to analyzer markers (shared reference concept).
MVP hooks (minimal viable): test pads + one marker pin + a bypass option + a stable log timestamp field.
Bring-up · templates and minimal repro
  • Default trigger templates — Verify: one-click templates catch common anomalies; Pass: hit rate ≥ X with miss rate ≤ X (placeholders).
  • Minimal reproducer — Verify: fixed script/sequence reproduces the symptom; Pass: appears ≥ X times across N runs (placeholders).
  • Alignment plan — Verify: marker + bus lines capture simultaneously; Pass: event order is stable and repeatable across captures.
  • Evidence packaging — Verify: exports include raw + decode + trigger + stats; Pass: another station can replay and reach the same conclusion.
MVP bring-up pack: symptom line + trigger rule + pre/post window + typed error stats.
Production · regression and fast triage
  • BIST / loopback — Verify: runs at power-on or station time; Pass: error count = 0 or ≤ X (placeholders).
  • Stats threshold alarms — Verify: station captures and logs key metrics; Pass: exceed threshold → automatic fail/hold decision.
  • Station capture triage — Verify: fixed trigger/window produces a compact evidence pack; Pass: pack naming + content are consistent and reviewable.
  • Golden reference runs — Verify: periodic baseline captures; Pass: distributions remain within control limits (X placeholders).
MVP production guard: loopback + one threshold alarm + a fixed capture template for triage.
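A minimal sketch of the stats-threshold alarm above, assuming the station has already computed per-unit KPIs from its capture. The limit values are illustrative placeholders in the spirit of the X's used throughout this page.

```python
# Illustrative limits only; replace these placeholder values with
# project-specific thresholds.
LIMITS = {"nak_per_10k": 5.0, "framing_errors": 0.0, "cs_gap_p99_us": 50.0}

def triage(kpis, limits=LIMITS):
    """Return (verdict, violations): any KPI above its limit holds the unit."""
    violations = {k: v for k, v in kpis.items()
                  if k in limits and v > limits[k]}
    return ("FAIL" if violations else "PASS", violations)

verdict, bad = triage({"nak_per_10k": 12.0, "framing_errors": 0.0})
print(verdict, bad)  # FAIL {'nak_per_10k': 12.0}
```

The same function can drive the automatic fail/hold decision: a non-empty violations dict is the trigger to keep the unit and its evidence pack for review.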
Diagram: observability hooks map around the DUT (test pads, marker pin, mux/isolator bypass, config option, logging) feeding the analyzer probe and evidence pack. Hooks enable fast capture and alignment; regression reuses the same trigger/window.

Applications (where analyzers save days)

Scope guard
  • Covered: application buckets for I²C/SPI/UART evidence-first debugging.
  • Not covered: Industrial Ethernet, high-speed SerDes, USB/PCIe/TSN domains, or long repair plans.

Each card below maps a real board-level scenario to the first evidence to capture and the first trigger to use. The objective is fast, repeatable proof (raw + decode + stats + markers), not guesswork.

I²C EEPROM write then read-back fails
  • Scenario: configuration/page write followed by immediate verify read.
  • Common failures: NAK burst, stale data read-back, intermittent verify fail.
  • Evidence to capture: START/STOP + ACK/NAK timeline, repeated START sequence, write-to-read gap stats (busy window).
  • First trigger: trigger on NAK (or specific address NAK), fixed pre/post window.
  • Pass hint: read-back mismatch rate ≤ X (placeholder) under a fixed workload.
Material examples (verify suffix/package)
Microchip 24LC256, ST M24C64, Microchip/Atmel AT24C256C
I²C Clock stretching / slow slave causes timeouts
  • Scenario: sensor reads succeed on bench but fail under load/temperature.
  • Common failures: master timeout, repeated retries, “looks OK” but latency explodes.
  • Evidence to capture: SCL low-hold duration distribution, transaction latency stats, marker-aligned firmware event stamps.
  • First trigger: trigger on SCL low > X (placeholder) or a protocol timeout event; keep long pre-trigger.
  • Pass hint: stretch duration p99 ≤ X (placeholder) in the target workload.
Material examples (verify suffix/package)
Bosch BME280 (I²C sensor), InvenSense MPU-6050 (I²C IMU), NXP PCA9535 (I²C I/O expander)
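The SCL low-hold distribution called out above can be computed offline from an edge export. The `(timestamp, level)` pair format is an assumption; most analyzers can export something equivalent as CSV.

```python
def scl_low_durations(edges):
    """edges: (t, level) pairs in seconds, level 0 = SCL low.
    Returns the list of low-hold durations (falling -> rising edge)."""
    lows, t_fall = [], None
    for t, level in edges:
        if level == 0:
            t_fall = t
        elif t_fall is not None:
            lows.append(t - t_fall)
            t_fall = None
    return lows

def p99(samples):
    """Crude rank-based p99; fine for distribution-shape checks."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

edges = [(0.0, 0), (5e-6, 1), (10e-6, 0), (12e-6, 1), (20e-6, 0), (520e-6, 1)]
durs = scl_low_durations(edges)
print(max(durs))  # the 500 µs stretch stands out against normal low times
```

Comparing the p99 of this distribution against the master's timeout setting turns "sometimes it times out" into a measured margin.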
I²C MUX / long-reach segment adds delay or stalls
  • Scenario: bus works direct; adding mux/segment/cable introduces NAK or stuck-bus.
  • Common failures: first transaction after channel switch fails, SDA stuck low, bus “hangs” after hot-plug.
  • Evidence to capture: channel-switch marker alignment, first post-switch transaction, SCL/SDA low-stuck events.
  • First trigger: trigger on SCL low > X or SDA low > X (placeholders), with long pre-trigger.
  • Pass hint: post-switch first-try success ≥ X% (placeholder).
Material examples (verify suffix/package)
TI TCA9548A (I²C mux), NXP PCA9515A (I²C buffer), TI TCA9617A / NXP PCA9615 (differential I²C extender)
SPI SPI Flash XIP intermittent failures
  • Scenario: large reads / execute-in-place; rare CRC faults or random crashes.
  • Common failures: mode mismatch, CS gap abnormal, burst boundary drift, bit-slip.
  • Evidence to capture: CS gap distribution, transaction length distribution, decode convergence across Mode 0–3.
  • First trigger: start with CS-active wide trigger; then narrow to “gap > X” or “length abnormal” (placeholders).
  • Pass hint: payload duty stable; gap p99 ≤ X (placeholder) under target load.
Material examples (verify suffix/package)
Winbond W25Q64JV, Macronix MX25L25645G, GigaDevice GD25Q64
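The CS-gap and payload-duty evidence above reduces to simple arithmetic over CS edges. This sketch assumes an active-low CS and a `(timestamp, level)` edge export; adapt to your analyzer's format.

```python
def cs_stats(edges, t_end):
    """edges: (t, level) pairs for an active-low CS; level 1 = inactive.
    Returns (inactive gaps, payload duty = CS-active time / total time)."""
    gaps, active = [], 0.0
    t_rise = t_fall = None
    for t, level in edges:
        if level == 1:                 # CS released: a gap starts
            if t_fall is not None:
                active += t - t_fall
                t_fall = None
            t_rise = t
        else:                          # CS asserted: the gap ends
            if t_rise is not None:
                gaps.append(t - t_rise)
                t_rise = None
            t_fall = t
    if t_fall is not None:             # still active at the capture end
        active += t_end - t_fall
    return gaps, (active / t_end if t_end else 0.0)

edges = [(0.0, 0), (10e-6, 1), (15e-6, 0), (40e-6, 1), (100e-6, 0)]
gaps, duty = cs_stats(edges, 120e-6)
print(gaps, round(duty, 2))
```

A low duty with large gaps points at firmware/DMA framing rather than the flash or the wiring, which is exactly the attribution this card asks for.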
SPI Multi-slave bus: wrong attribution (which CS?)
  • Scenario: many slaves share SCLK/MOSI/MISO; one device misbehaves intermittently.
  • Common failures: abnormal frame length under one CS, gap jitter on one branch, “bus is bad” misdiagnosis.
  • Evidence to capture: error counts binned by CS, per-CS gap/length stats, first-bad-frame timestamp.
  • First trigger: trigger on target CS-active (or capture-wide then filter by CS labels).
  • Pass hint: anomaly concentration (Top1 CS share ≥ X%) used for fast root attribution.
Material examples (verify suffix/package)
TI SN74HC138 / SN74LVC138 (CS decode), NXP 74HC4051 (signal mux), TI SN74LVC1G125 (CS buffering)
UART Console garbling: baud drift vs framing/parity
  • Scenario: mostly readable console with rare bursts of junk characters.
  • Common failures: baud mismatch/drift, framing errors, parity errors, overrun.
  • Evidence to capture: error classification stats, auto-baud/bit-width measurement, error timestamps aligned to markers.
  • First trigger: trigger on framing/parity error; if unavailable, wide trigger on start-bit then filter by error events.
  • Pass hint: framing=0 and overrun ≤ X (placeholder) in a fixed stress run.
Material examples (verify suffix/package)
FTDI FT232R, Silicon Labs CP2102N, WCH CH340C
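The bit-width measurement above maps to a one-line baud-error estimate. The 115200-baud numbers below are illustrative only; the acceptable |Δbaud| budget is project-specific.

```python
def baud_error_pct(bit_widths_s, nominal_baud):
    """Signed baud error (%) from measured bit widths (seconds)."""
    mean_w = sum(bit_widths_s) / len(bit_widths_s)
    measured_baud = 1.0 / mean_w
    return (measured_baud - nominal_baud) / nominal_baud * 100.0

# Nominal 115200 baud is about 8.68 µs/bit; 8.8 µs bits imply a slow clock.
err = baud_error_pct([8.8e-6] * 16, 115200)
print(round(err, 2))  # about -1.36 %
```

If the measured error stays well inside the budget while framing errors persist, that shifts suspicion from baud drift to noise at the start/stop boundary, which is the decision this FAQ-style check is for.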
UART RTS/CTS back-pressure masks as “link failure”
  • Scenario: long frames or high log volume causes stalls/timeouts.
  • Common failures: CTS held in block state, RTS not released, false hardware blame.
  • Evidence to capture: CTS/RTS alongside TX/RX, CTS-block duration distribution, timeout alignment to CTS state.
  • First trigger: trigger on CTS block lasting > X (placeholder), fixed window pre/post.
  • Pass hint: timeout events not correlated with CTS-block in the final run.
Material examples (verify suffix/package)
NXP SC16IS750 / SC16IS752 (UART bridge with FIFOs), Maxim MAX3232 (RS-232 level), TI SN65HVD3082E (RS-485 transceiver)
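The CTS-alignment check above can be done offline against drop/timeout timestamps. The polarity convention (level 1 = CTS blocking TX) is an assumption; confirm it against your transceiver and wiring before trusting the result.

```python
def blocked_intervals(cts_edges):
    """cts_edges: (t, level) pairs; level 1 = CTS blocking TX (assumed)."""
    spans, t0 = [], None
    for t, level in cts_edges:
        if level == 1:
            t0 = t
        elif t0 is not None:
            spans.append((t0, t))
            t0 = None
    return spans

def fraction_in_block(drop_times, spans):
    """Share of drop/timeout timestamps that fall inside a blocked span."""
    if not drop_times:
        return 0.0
    inside = sum(any(a <= t <= b for a, b in spans) for t in drop_times)
    return inside / len(drop_times)

spans = blocked_intervals([(0.0, 1), (0.5, 0), (2.0, 1), (2.4, 0)])
print(fraction_in_block([0.2, 1.0, 2.1], spans))  # 2 of 3 drops in-block
```

A fraction near 1.0 says the "link failure" is back-pressure policy; a fraction near 0.0 sends the investigation back to overrun or the PHY.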
ALIGN Bridges/muxes add new fixed/variable latency
  • Scenario: adding a bridge/isolator/mux changes timing and breaks edge cases.
  • Common failures: fixed offset shift, variable latency jitter, event ordering instability.
  • Evidence to capture: reference/marker pulse aligned with bus events, delay distribution (min/typ/max/p99).
  • First trigger: trigger on marker/reference pulse; long pre-trigger to capture the cause window.
  • Pass hint: delay distribution p99 ≤ X (placeholder) and stable event ordering.
Material examples (verify suffix/package)
NXP SC18IS602B (I²C↔SPI bridge), TI TCA9548A (I²C mux), Analog Devices ADuM1250 / TI ISO1540 (I²C isolator)
Diagram: application buckets grouped by bus (I²C / SPI / UART). Each bucket maps to: Evidence → First trigger → Pass hint (X placeholders).

Tool Selection Notes (logic analyzer / protocol analyzer) — specs that actually matter

Scope guard
  • Covered: selection logic and engineering thresholds.
  • Not covered: shopping lists or brand/model recommendations.

Selection should start from the bug class (speed, rarity, alignment, production/regression) and map to tool capabilities. The cards below focus on what changes the ability to capture, decode, correlate, and export evidence reliably.

Channel count (minimum viable set)
  • Why it matters: bus lines alone rarely explain failures; reset/IRQ/marker often unlock causality.
  • Rule of thumb: bus lines + 2 control lines + 1 marker (project-dependent).
  • Quick check: capture with/without marker; causality should become unambiguous.
Hook material examples
TI SN74LVC1G17 (Schmitt buffer for clean marker), TI SN74LVC1T45 (level shift marker to analyzer IO)
Thresholds & input tolerance
  • Why it matters: threshold mismatch creates protocol “illusions” (false ACK/bit values).
  • Rule of thumb: confirm the bus voltage domain and open-drain vs push-pull behavior.
  • Quick check: shift threshold one step; conclusion should remain stable (decode convergence).
Domain helpers (examples)
NXP PCA9306 (I²C level shifter), TI TXS0108E (multi-bit level translation; automatic direction sensing is use-case dependent)
Sample rate vs timing resolution
  • Why it matters: insufficient sampling causes edge ambiguity and bit-slip misreads.
  • Timing resolution: determines visibility of stretching, inter-frame gaps, and setup/hold margins.
  • Quick check: increase one tier; decode should not change class of conclusion.
Stability aids (examples)
Analog Devices LTC4311 (I²C rise-time accelerator), NXP PCA9517 (I²C buffer)
Memory depth (rare bugs)
  • Why it matters: rare failures demand long capture windows with pre-trigger context.
  • Rule of thumb: depth defines “how far back” the cause can be proven.
  • Quick check: verify pre-trigger can include the full lead-up interval.
Bridge/buffering examples
NXP SC16IS752 (dual UART bridge with FIFOs), NXP SC18IS602B (I²C↔SPI bridge)
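The depth rule of thumb above is plain arithmetic (time span = samples / sample rate), so budget it before capturing rather than discovering mid-session that the lead-up does not fit. The 90 % pre-trigger split below is an assumption; use your tool's actual pre/post allocation.

```python
def pretrigger_span_s(depth_samples, sample_rate_hz, pretrigger_frac=0.9):
    """Seconds of pre-trigger history available at this depth and rate."""
    return depth_samples * pretrigger_frac / sample_rate_hz

# 100 Msamples at 100 MS/s with a 90 % pre-trigger split buys 0.9 s of lead-up.
print(pretrigger_span_s(100_000_000, 100_000_000))  # 0.9
```

If the suspected lead-up interval exceeds this span, either lower the sample rate (where decode still converges), use segmented capture, or trigger on a higher-level event closer to the failure.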
Trigger capability (edge → sequence → stats)
  • Why it matters: rare bugs become capturable when triggers are designed, not guessed.
  • Need-to-have: edge/level + protocol events + sequence + threshold/statistics triggers.
  • Quick check: create a “first funnel” trigger then a narrow trigger; compare hit/miss rates.
Protection examples
Nexperia PESD5V0S1BA (ESD diode), Littelfuse SMF05C (multi-line ESD array)
Export & automation (regression-ready)
  • Why it matters: production and CI-style regression require machine-readable artifacts.
  • Need-to-have: CSV/JSON export, stable timestamps, scripting/API (where available).
  • Quick check: run an A/B pack and compute the same KPI (error rate, gap p99) from exports.
Isolation examples
TI ISO7741 (digital isolator), Analog Devices ADuM1250 / TI ISO1540 (I²C isolator)
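The A/B check above can be sketched with the standard-library CSV reader: compute the same KPI from both packs' exports instead of comparing screenshots. The `t,event` column layout is an assumption about the export format.

```python
import csv
import io

def nak_rate(csv_text):
    """NAK events per decoded transaction, from a 't,event' CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    return sum(r["event"] == "NAK" for r in rows) / len(rows)

pack_a = "t,event\n0.1,ACK\n0.2,NAK\n0.3,ACK\n0.4,ACK\n"
pack_b = "t,event\n0.1,ACK\n0.2,ACK\n0.3,ACK\n0.4,ACK\n"
print(nak_rate(pack_a), nak_rate(pack_b))  # 0.25 0.0
```

The same pattern extends to framing-error rate or gap p99: one parser, one KPI function, applied identically to the A and B evidence packs.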
Selection flow (If/Else) map needs → capability thresholds
If the need is…
  • Higher speed / edge ambiguity: sampling + threshold stability matter most.
  • Rare bug (minutes→hours): depth + pre-trigger + stats triggers dominate.
  • Timing alignment: marker/reference + sync IO capability is required.
  • Production/regression: export + automation + stable KPIs are mandatory.
Then prioritize…
  • Channels ≥ X: bus + reset/IRQ + marker (project-specific).
  • Sample ≥ X: enough to avoid bit-slip misreads (X placeholder).
  • Depth ≥ X: covers lead-up + aftermath for causality (X placeholder).
  • Trigger: sequence + stats threshold triggers for rare events.
  • Export/API: CSV/JSON + repeatable templates for A/B packs.
Reference BOM (examples)
I²C mux TCA9548A · I²C buffer PCA9515A · Diff I²C extender PCA9615/TCA9617A · I²C↔SPI bridge SC18IS602B · UART bridge SC16IS752 · RS-232 level MAX3232 · RS-485 transceiver SN65HVD3082E · I²C isolator ISO1540/ADuM1250 · ESD diode PESD5V0S1BA · ESD array SMF05C
Diagram: selection flow (needs → thresholds → tool class)


FAQs (fixed four-line answers)

These FAQs close common long-tail debugging loops without expanding the main text. Each answer is a data-driven checklist: likely cause → quick check → fix → pass criteria (thresholds use X placeholders).

Decode shows intermittent I²C NAK, but the scope “looks fine” — threshold or sample rate first?
Likely cause: analyzer Vth mismatched to the logic domain (open-drain/level-shift), or sampling too low causing edge ambiguity.
Quick check: shift the threshold by ±ΔV (placeholder) and double the sample rate (A/B); confirm whether the NAK count changes and whether decode alignment shifts.
Fix: lock Vth to the correct rail domain; set sample rate ≥ X×fSCL (placeholder) and enable glitch/debounce where supported.
Pass criteria: NAK rate ≤ X per 10k transactions (placeholder) and decode remains stable across ±ΔV sweep (no event class changes).
One branch hot-plugged, then the whole I²C bus locks — fastest way to identify who holds SDA low?
Likely cause: a device keeps SDA low (stuck-bus) after brown-out/hot-plug, or a mux/isolator channel is left in a bad state.
Quick check: trigger on SDA low > X ms (placeholder) and keep long pre-trigger; correlate the last valid address before the hold and the branch-switch marker (if present).
Fix: add/enable a recovery sequence (clock pulses + STOP), and use segmentation (mux/buffer) to isolate the faulty branch for immediate attribution.
Pass criteria: SDA low holds > X ms occur 0 times in N hot-plug cycles (placeholder), and recovery completes within X ms (placeholder).
Clock stretching “sometimes” causes timeouts — how to prove it’s stretch latency (not decode artifacts)?
Likely cause: genuine SCL low-hold latency spikes, or analyzer time resolution insufficient to measure hold durations reliably.
Quick check: compute SCL-low duration distribution (min/typ/p99); re-run with one higher timing-resolution tier and check whether the distribution shape changes.
Fix: enforce a master timeout policy aligned to measured p99 stretch; add marker to timestamp firmware request/response boundaries for correlation.
Pass criteria: SCL-low hold p99 ≤ X µs (placeholder) under stress, and timeout events drop to ≤ X per hour (placeholder).
SPI register write “succeeds” but does not take effect — CPOL/CPHA first or CS hold first?
Likely cause: mode mismatch causing bit/byte shift, or CS deassertion violates hold/setup timing so the slave drops the frame.
Quick check: A/B decode Mode 0–3 and look for alignment convergence; measure CS hold time around the last clock edge (CS↑ vs SCLK) against X ns (placeholder).
Fix: lock a known-good mode at power-up; add guard time: CS hold ≥ X ns and CS inactive gap ≥ X ns (placeholders) in the transaction template.
Pass criteria: write-then-readback match ≥ X% (placeholder) and decode remains correct under mode lock (no sporadic shifts).
SPI decode intermittently shifts (bit-slip) — is it sampling, threshold, or trigger placement?
Likely cause: marginal sampling rate/time resolution near SCLK edges, or threshold drift on MISO/MOSI creating false transitions.
Quick check: raise sample rate ≥ ×2 and shift Vth by ±ΔV (placeholder); compare “shift event” timestamps to SCLK duty anomalies or edge clusters.
Fix: set sample rate ≥ X×fSCLK (placeholder), enable hysteresis/glitch reject if available, and trigger on CS-active with fixed pre/post window.
Pass criteria: decode shift events = 0 over N transactions (placeholder) and conclusions remain unchanged across ±ΔV threshold sweep.
SPI throughput collapses with many small transfers — how to prove it’s gaps/queueing (not “bad wires”)?
Likely cause: large CS inactive gaps and burst fragmentation from firmware/DMA framing, not signal integrity defects.
Quick check: export CS gap p50/p99 and payload duty-cycle stats; compare “many small” vs “batched” runs (A/B) using identical clock settings.
Fix: batch transactions, increase in-flight depth where applicable, and set a minimum inter-transfer gap template (≤ X µs target, placeholder).
Pass criteria: effective throughput ≥ X Mbps (placeholder) and CS gap p99 ≤ X µs (placeholder) under the target workload.
UART shows intermittent framing errors — baud error first or noise/glitches first?
Likely cause: clock mismatch/drift exceeding tolerance, or narrow noise pulses corrupting the start/stop boundary.
Quick check: measure bit width over M frames and compare to nominal; then enable glitch reject / adjust Vth by ±ΔV (placeholder) to see if framing rate changes.
Fix: tighten baud error budget (target |Δbaud| ≤ X%, placeholder) and add de-glitch/oversampling where available; align errors to markers for causality.
Pass criteria: framing errors = 0 in N frames (placeholder) and measured bit width deviation ≤ X% (placeholder) across temperature/load.
UART “random drops” but no framing/parity — how to prove overrun or RTS/CTS back-pressure?
Likely cause: receiver overrun (buffer/service latency), or flow control holding the link (CTS asserted/blocked).
Quick check: capture RTS/CTS alongside TX/RX and compute CTS-block duration distribution; align drop timestamps to CTS state and burst size.
Fix: increase buffering/servicing margin and enforce back-pressure policy; define a maximum CTS-block time ≤ X ms (placeholder) for the system.
Pass criteria: overruns ≤ X per hour (placeholder) and CTS-block p99 ≤ X ms (placeholder) under worst-case log load.
Trigger hits “too often” or “never” — how to design the trigger funnel without guessing?
Likely cause: trigger condition too narrow before a baseline is understood, or too sensitive to noise (edge/glitch misfires).
Quick check: run a two-stage plan: wide trigger to collect M samples, then measure which event/sequence occurs only near failures (frequency ratio ≥ X, placeholder).
Fix: implement layered triggers (edge → event → sequence) and add debounce/time-qualification windows (≥ X ns/ms, placeholders).
Pass criteria: trigger hit rate within [Xmin, Xmax] per hour (placeholders) and captures include ≥ X ms pre-trigger context (placeholder).
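The two-stage plan above can be quantified as a frequency ratio: how much more often does the candidate event fire near failures than in baseline traffic? The window counts below are illustrative.

```python
def funnel_ratio(hits_near_fail, windows_near_fail,
                 hits_baseline, windows_baseline):
    """Candidate-event rate near failures vs baseline; a ratio well above 1
    makes the event a good narrow-trigger condition."""
    fail_rate = hits_near_fail / windows_near_fail
    base_rate = hits_baseline / windows_baseline
    return fail_rate / base_rate if base_rate else float("inf")

# 9 hits in 10 failure windows vs 3 hits in 100 baseline windows
print(funnel_ratio(9, 10, 3, 100))  # about 30
```

Events with a high ratio graduate from the wide "funnel" trigger to the narrow one; events near 1.0 are background noise and should not become trigger conditions.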
The bug happens once per hour — sample rate is fine, but the capture still misses the “lead-up”
Likely cause: memory depth too small for the required pre-trigger window, or segmented capture not configured for rare events.
Quick check: compute required pre-trigger time = suspected lead-up interval (X s/min, placeholder); compare to available pre-trigger depth at current sample rate.
Fix: lower sample rate selectively (if allowed) to buy time span, enable segmented/deep memory modes, and trigger on a higher-level event closer to the failure.
Pass criteria: ≥ X seconds of pre-trigger is present in ≥ X% of captures (placeholders), and the causal event appears within the window consistently.
Analyzer vs scope timing does not align — what is the first reliable correlation method?
Likely cause: independent timebases, trigger latency differences, or missing common reference/marker between instruments.
Quick check: inject a reference pulse/marker GPIO visible to both tools; measure Δt between marker and the target event across K repeats (placeholder).
Fix: use shared trigger/marker routing, calibrate channel skew, and report fixed offset + jitter (min/typ/p99) instead of single-shot deltas.
Pass criteria: alignment error p99 ≤ X ns/µs (placeholder) and Δt distribution remains stable across tool re-arming cycles.
Same board, different analyzer gives different conclusions — what correlation check comes first?
Likely cause: different thresholds, sampling/time resolution tiers, decode assumptions (mode/baud), or probe/grounding differences.
Quick check: lock both tools to the same Vth, sample rate, trigger template, and decode settings; run an A/B pack of N identical transactions (placeholder) and compare event counts.
Fix: standardize capture templates (threshold/sampling/trigger/export) and require exports (CSV/JSON) to compute KPIs instead of screenshot-only judgments.
Pass criteria: KPI deltas (NAK rate, framing rate, gap p99) differ by ≤ X% across tools (placeholder) and root-cause classification matches.