
Logic / Protocol Analyzer Guide for I2C, SPI, and UART


This page shows how to use a logic/protocol analyzer to turn “intermittent bus bugs” (I²C/SPI/UART) into a repeatable, data-backed evidence chain. It focuses on trustworthy capture, trigger design, timing alignment, and pass criteria so issues can be reproduced, attributed, and handed off without guesswork.

What a Logic / Protocol Analyzer is (and when it beats a scope)

Scope guard
  • Covered: what the tool proves (decode/trigger/stats) and when it is the fastest path to a reproducible root cause.
  • Not covered: I²C/SPI/UART protocol theory, pull-up sizing, signal-integrity routing, or PHY-level analog design (those belong to the bus-specific subpages).

A logic/protocol analyzer turns raw transitions into time-stamped events that can be decoded, filtered, triggered, and summarized. The goal is not “seeing a waveform” but producing evidence: what happened, in what order, how often, and under which conditions.

Pick the right tool in ~60 seconds
Choose based on what must be proven.
Oscilloscope
  • Best for: edge shape, ringing/reflection, overshoot, analog noise, eye/jitter work.
  • Weak at: proving long event sequences, rare faults, and frequency-of-occurrence.
  • Output: waveform evidence.
Logic / Protocol Analyzer
  • Best for: decode timelines, advanced triggers, event correlation, and stats.
  • Weak at: precise analog shape and SI root causes (still needs a scope).
  • Output: decode + trigger capture + statistics.
Combined (Most robust)
  • Use when: the system “looks OK” but fails intermittently; both analog quality and digital sequence must be proven.
  • Method: trigger/capture on the analyzer; verify edge integrity on the scope around the same window.
  • Output: full evidence chain.
Analyzer wins fastest when…
  • Intermittent faults must be turned into a reproducible trigger (rare NAK/retry, sporadic bit-slip, random framing errors).
  • Root cause depends on event order (who spoke first, which transaction preceded the failure, which CS asserted).
  • Proof requires frequency and distribution (how often per minute, clustered under load, correlated to a marker).
  • Debug needs time alignment across multiple signals (bus + reset/IRQ + firmware marker).
Not the right first tool when…
  • The key question is analog edge quality: reflection, ringing, overshoot, termination, eye/jitter, or power integrity coupling.
  • Failure is dominated by physical layer constraints (cable SI, high-speed edges, EMI compliance). Use a scope/SI workflow first, then return for decode/sequence proof.
The 3 deliverables that make debug “engineering-grade”
1) Decode timeline

Human-readable transactions (bytes/frames/events) with precise timestamps; supports “what exactly happened?”

2) Trigger capture

A designed capture window around the fault (pre/post), so a rare bug becomes a repeatable dataset.

3) Stats & evidence

Counts, rates, and correlations (NAK rate, retries, error bursts, latency spikes) that survive peer review and production triage.

Diagram: tool comparison and what each one can produce (waveform vs decode vs stats)

Measurement Setup That Won’t Lie: probing, reference, thresholds

Protocol debug collapses when the measurement chain injects artifacts: ground bounce, threshold mismatch, probe loading, or input damage can create “bugs” that disappear when the setup changes. This section sets measurement readiness rules so decode/trigger results can be trusted.

Setup checklist (actionable)
Each item includes a quick validation hook.
Probe & return path
  • Keep return short: long ground leads create inductive bounce and false edges.
  • Minimize loading: avoid clips that add capacitance on fast edges or weak pull-ups.
  • Quick check: swap to a shorter ground/return; decode error rate should drop, not shift randomly.
Logic thresholds
  • Match voltage domain: 1.2/1.8/3.3/5 V logic needs correct VIH/VIL interpretation.
  • Respect signaling type: open-drain (I²C) vs push-pull (SPI/UART) changes margin behavior.
  • Quick check: shift threshold by a small step; a real protocol fault persists, a threshold artifact moves/disappears.
Input protection & safety
  • Prevent damage: ESD/over-voltage events can silently degrade analyzer inputs.
  • Use series-R / isolation: especially when probing unknown headers or hot-plug ports.
  • Quick check: verify analyzer input range and clamp strategy before first contact; avoid “capture once, then everything changes.”
Sampling intuition (decode reliability)
  • Edge quality sets the floor: slow edges + noise shrink timing margin and increase threshold sensitivity.
  • Sample fast enough: under-sampling converts real timing into random decode shifts.
  • Quick check: increase sample rate and compare the same transaction; correct decode should converge, not diverge.
Red flags: suspect the measurement first
These patterns often indicate setup artifacts.
  • Same transaction decodes differently across captures without any firmware/state change → threshold/return-path instability is likely.
  • ACK/NAK toggles when only probe position/ground lead changes → ground bounce or loading dominates the result.
  • Bug disappears when the clip is removed or moved → probe capacitance/induced crosstalk is acting as a “fix.”
  • Analyzer-to-analyzer disagreement on the same signal → validate thresholds, sample rate, and channel skew before blaming the DUT.
Pass criteria (measurement readiness)
  • Repeatability: N replays of the same action produce the same decode result (N = X).
  • Threshold stability: small threshold adjustments do not flip the interpretation of stable bits/ACKs; only marginal edges move (Δthreshold = X).
  • Error convergence: increasing sample rate reduces decode ambiguity (no new “random” bytes appear).
  • Setup traceability: probe type, return path method, threshold setting, and input range are recorded for review/hand-off.
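The repeatability criterion above is easy to automate once decode exports are available. A minimal sketch in Python (the function name, the tuple-based decode representation, and the N = 5 default are illustrative assumptions, not any analyzer's API):

```python
def decode_is_repeatable(captures, n_required=5):
    """Measurement-readiness check: N replays of the same action
    must produce the same decoded byte sequence.
    `captures` is a list of decoded byte tuples, one per replay."""
    if len(captures) < n_required:
        return False  # not enough replays to claim repeatability
    reference = captures[0]
    return all(c == reference for c in captures[1:])
```

If the check fails, suspect the measurement chain (threshold, return path, sample rate) before suspecting the DUT.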
Diagram: probe + DUT + return path, with threshold selection and input clamp modules

Sampling & Timebase: how much sample rate is enough (and when it isn’t)

Scope guard
  • Covered: sample rate, timing resolution, and memory depth as they impact decode stability, trigger reliability, and evidence quality.
  • Not covered: protocol field semantics, bus-specific electrical design, or signal-integrity root causes (those belong to the bus/SI subpages).

Sampling choices determine whether captures converge into a consistent decode timeline or diverge into shifting markers and contradictory transactions. A “good” configuration is not simply higher numbers—it is one where repeated captures of the same action produce the same decoded events and timing relationships.

The 3 knobs that decide capture truth
Treat them as a coupled system: rate ↔ resolution ↔ depth.
Sample rate
  • Controls: how finely edges and short pulses are represented; directly impacts decode stability.
  • Rule of thumb: sample ≥ the fastest relevant edge/timing feature (X = placeholder).
  • Failure signature: the same transaction shifts byte/bit boundaries between captures.
  • Quick check: increase rate by one step; decode should converge, not “randomize.”
Timing resolution
  • Controls: event localization (setup/hold windows, stretch duration, inter-frame gaps).
  • Where it bites: ordering and causality (“what happened first”) becomes ambiguous if time bins are coarse.
  • Failure signature: event times quantize into a few buckets; deltas look inconsistent.
  • Quick check: zoom into a known boundary; markers should anchor to stable edges.
Memory depth
  • Controls: record length (how far back/forward evidence extends).
  • Where it bites: burst/DMA transfers and rare faults need long windows to capture the “pre-cause.”
  • Failure signature: captures show only the failure moment, not the lead-up sequence.
  • Quick check: enable pre-trigger and increase depth; the prior context must appear.
Practical convergence test (turn “settings” into proof)
  1. Capture the same action with two or three configurations (sample rate and timebase steps).
  2. Compare decoded markers (start/stop, CS edges, frame boundaries): they should align to the same physical transitions.
  3. Decision: if decode shifts between configurations, the dataset is not trustworthy yet—raise rate/resolution and re-check.
  4. Pass criteria: repeated captures produce the same decode timeline and stable deltas (thresholds = X placeholders).
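The marker comparison in steps 1–3 can be scripted directly against exported timestamps. A sketch in Python (the timestamp-list representation and the 100 ns tolerance are illustrative placeholders):

```python
def markers_converge(markers_a, markers_b, tol_s=1e-7):
    """Compare decoded event timestamps (seconds) from two capture
    configurations. Convergence means the same event count, with each
    marker anchored to the same physical edge within `tol_s`."""
    if len(markers_a) != len(markers_b):
        return False  # a decoded event appeared or vanished: not converged
    return all(abs(a - b) <= tol_s for a, b in zip(markers_a, markers_b))
```

A `False` result means the dataset is not trustworthy yet: raise sample rate or resolution and re-check.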
Window planning (why pre-trigger is mandatory for intermittent bugs)
  • Trigger point captures the symptom; pre-trigger captures the cause chain (prior transactions, retries, queueing gaps).
  • Use a fixed template window around the trigger (pre = X, post = X) so datasets are comparable across runs.
  • For long bursts (DMA), prioritize depth and event markers; for rare faults, prioritize pre-trigger history over post-trigger length.
Diagram: sparse sampling near an edge can shift detected markers and create “jitter-like” decode artifacts

Timing Alignment: correlate SCL/SDA/CS/SCLK/RX/TX with a shared reference

Scope guard
  • Covered: alignment methodology—shared references, markers, skew calibration, and delay decomposition.
  • Not covered: protocol correctness rules or bus electrical design (use the bus-specific pages for those).

Timing alignment turns multiple observations into a single causal timeline. The objective is to answer: what happened first, how long each phase took, and whether observed delays are fixed (deterministic) or variable (state-dependent). This is essential when troubleshooting bridges, isolators, extenders, and firmware-driven transactions.

Step 1–5 alignment workflow
Each step leaves an artifact that can be shared.
  1. Choose a shared reference. Use one of: same-instrument timebase, shared trigger, a reference pulse, or a marker strobe. Artifact: reference definition (signal name, wiring point, electrical levels).
  2. Unify trigger and capture window. Use fixed pre/post windows (pre = X, post = X) so datasets are comparable. Artifact: trigger condition + window template.
  3. Calibrate channel skew (single analyzer). Validate multi-channel simultaneity and apply skew correction if supported. Artifact: residual skew < X (placeholder).
  4. Align across instruments (analyzer + scope). Anchor both to the same reference pulse/marker or share trigger. Artifact: reference delta distribution (mean/peak-to-peak = X).
  5. Decompose delay. Separate fixed offsets (cable/isolator/level shifter) from variable latency (bridge FIFO/state). Artifact: annotated path map (fixed vs variable).
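Step 5's fixed-vs-variable split can be computed from a list of measured reference-to-event delays. A Python sketch (treating the minimum delay as the fixed offset is a common simplifying assumption; names are illustrative):

```python
def decompose_delay(delays_s):
    """Split measured reference-to-event delays (seconds) into a fixed
    offset (cable / isolator / level shifter) and a variable component
    (bridge FIFO / state-dependent latency)."""
    fixed = min(delays_s)                      # deterministic path delay
    variable = [d - fixed for d in delays_s]   # state-dependent residual
    return {
        "fixed_s": fixed,
        "var_mean_s": sum(variable) / len(variable),
        "var_p2p_s": max(variable) - min(variable),
    }
```

A large peak-to-peak variable component over many events is the "delay jumps in discrete steps" signature described below.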
Alignment failure signatures
Diagnose by symptom → first sanity check.
  • Reference line drifts between captures → trigger/timebase is not actually shared. First check: re-inject a hard marker pulse and verify it lands at the same timestamp.
  • Delay “jumps” in discrete steps → variable latency (FIFO levels, buffering, state machine phases). First check: collect a distribution of delay deltas over many events.
  • Decode becomes contradictory after alignment → sampling/thresholds are unstable. First check: run the H2-3 convergence test (decode must converge).
  • Scope and analyzer disagree on edge timing → skew/calibration mismatch. First check: align on a single sharp marker captured by both.
Diagram: reference pulse / marker drives a unified timeline across scope + analyzer + firmware, with delay decomposition

Trigger Strategy: from “capture everything” to “capture the bug”

Scope guard
  • Covered: trigger types, layered composition, convergence workflow, and trigger reliability.
  • Not covered: protocol field meaning or bus electrical design (use bus-specific subpages for those).

Effective triggering is a designed funnel: start wide to capture real samples, extract the minimum signature, then tighten into sequences and statistical thresholds. The goal is to convert intermittent faults into repeatable, comparable evidence windows.

Layered trigger stack
Build from raw signals → protocol events → sequences → thresholds.
Level / Edge
  • Use for: first capture, coarse localization, line-state anomalies.
  • Common forms: edge, pulse width, timeout, stuck level.
  • Risk: noise/glitches cause false triggers.
  • Sanity check: add debounce or adjust threshold; false hits should change dramatically.
Byte / Frame
  • Use for: lock onto protocol-level events (NAK, frame boundary, error flag).
  • Benefit: fewer irrelevant captures than edge-only triggering.
  • Risk: unstable decode if sampling/thresholds do not converge.
  • Sanity check: run a convergence test (H2-3): decode must remain consistent across settings.
Sequence
  • Use for: rare bugs that require a specific lead-up chain.
  • Key: define the minimum signature (event list + ordering + limits).
  • Risk: overly strict rules never trigger; overly wide rules create large datasets.
  • Sanity check: start wide, then tighten one condition at a time.
Statistic / Threshold
  • Use for: error bursts, retry storms, latency spikes, throughput collapse.
  • Key: define “abnormal” with explicit thresholds (X placeholders).
  • Risk: wrong window length hides bursts or overreacts to noise.
  • Sanity check: sweep window length; true faults remain correlated to the symptom.
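A statistic/threshold trigger of this kind can be prototyped offline against exported error timestamps before committing it to analyzer hardware. A Python sketch (the window length and count threshold stand in for the X placeholders; names are illustrative):

```python
def find_error_bursts(error_times_s, window_s=0.1, threshold=5):
    """Statistic/threshold trigger sketch: flag any sliding window of
    length `window_s` seconds containing more than `threshold` error
    events. Returns the start times of windows that would fire."""
    hits = []
    times = sorted(error_times_s)
    start = 0
    for end in range(len(times)):
        # shrink the window until it spans at most window_s
        while times[end] - times[start] > window_s:
            start += 1
        if end - start + 1 > threshold:
            hits.append(times[start])
    return hits
```

Sweeping `window_s` here mirrors the sanity check above: a true fault stays correlated to the symptom as the window changes.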
Rare-bug workflow (wide → narrow, guaranteed progress)
  1. Start wide. Trigger on a coarse symptom (edge/timeout/NAK) with a large pre-trigger window to capture lead-up.
  2. Extract the minimum signature. Identify the smallest set of events that always precede the bug (ordering and gaps).
  3. Convert to sequence trigger. Implement the signature as a sequence with explicit limits (counts/timeouts = X).
  4. Fix the evidence window. Use a standard template: pre = X, post = X to produce comparable captures.
  5. Validate quality. Measure hit rate and false-trigger rate over repeated runs; thresholds remain explicit placeholders.
Trigger reliability (avoid false conclusions)
Symptom → most likely cause → first sanity check.
  • Trigger rate swings wildly → glitches / threshold drift / ground bounce. First check: enable debounce and shift threshold by one step.
  • Sequence never triggers → rules too strict or decode does not converge. First check: revert to wide trigger and re-run convergence (H2-3).
  • Captures show the symptom but no cause → pre-trigger missing. First check: enforce a fixed pre/post template window.
  • Different analyzers disagree → timing base and thresholds differ. First check: add a hard reference pulse or marker and compare alignment.
Evidence window template (standardize every capture)
Window
  • Pre: X
  • Post: X
  • Trigger: explicit condition + versioned notes
Channels
  • bus lines (e.g., SCL/SDA or CS/SCLK or RX/TX)
  • reset/IRQ lines (if present)
  • firmware marker pulse (recommended)
Outputs
  • decode timeline export
  • trigger configuration record
  • stats summary (counts, burst rate, deltas)
Diagram: Trigger funnel—reduce raw edges into a captured bug window with fixed evidence

I²C Decode Playbook: NAK, arbitration, stretch, stuck-bus (analyzer-first)

Scope guard
  • Covered: analyzer-first capture → trigger → evidence interpretation for common I²C failure modes.
  • Not covered: pull-up sizing, bus capacitance budgeting, or board-level electrical design (use I²C electrical subpages).

The playbook is built around a repeatable evidence chain: symptom → what to capture → how to trigger → how to read the evidence. Each section prioritizes stable decoding and comparable windows over one-off screenshots.

Key events to surface in decode
Keep the view event-driven (not waveform-only):
  • START / STOP
  • ACK / NAK
  • Repeated START
  • Clock stretching
  • Arbitration lost
  • SCL/SDA stuck
Analyzer-first playbook cards
Symptom → capture → trigger → interpretation
NAK / retry burst
Symptom: intermittent NAK or retry storms.
  • Capture: the transaction chain leading into the first NAK; note whether NAK clusters by stage (address vs data).
  • Trigger: ACK/NAK event trigger; enforce fixed windows (pre = X, post = X). Optional: filter to a target address as a view reduction.
  • Interpret: if NAK aligns to a consistent lead-up pattern, tighten into a sequence trigger; if NAK rate changes with thresholds/sample rate, revisit H2-2/H2-3.
Evidence artifact: export decode + NAK frequency + lead-up event list.
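Given a decoded event export, the NAK frequency and lead-up lists can be produced mechanically. A Python sketch (the `(timestamp, kind)` event representation and the lead-up depth of 3 are illustrative assumptions):

```python
def nak_summary(events, lead_up=3):
    """From a decoded I2C event list [(t_seconds, kind), ...], count
    NAKs, compute the NAK rate over the capture span, and collect the
    lead-up event kinds preceding each NAK (to seed a sequence trigger)."""
    if not events:
        return {"nak_count": 0, "rate_hz": 0.0, "lead_ups": []}
    naks = [i for i, (_, kind) in enumerate(events) if kind == "NAK"]
    span = events[-1][0] - events[0][0] or 1.0  # guard zero-length span
    lead_ups = [tuple(k for _, k in events[max(0, i - lead_up):i])
                for i in naks]
    return {"nak_count": len(naks),
            "rate_hz": len(naks) / span,
            "lead_ups": lead_ups}
```

Identical lead-up tuples across many captures are exactly the consistent pattern that justifies tightening into a sequence trigger.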
Arbitration lost / multi-master conflict
Symptom: unexpected stalls or conflict errors.
  • Capture: the last clean sequence before the loss; include any secondary lines (IRQ/reset) if present.
  • Trigger: arbitration-lost event trigger (or an error flag) with a wide pre-trigger window.
  • Interpret: use alignment markers (H2-4) to correlate bus activity with firmware events; unstable positioning indicates sampling/threshold issues.
Evidence artifact: time-aligned timeline showing “what happened first.”
Clock stretching / timeout
Symptom: slow transactions, timeout failures, sporadic hangs.
  • Capture: stretch duration distribution (min/typ/max) and the event chain before long stretches.
  • Trigger: pulse-width/timeout trigger on SCL low > X, plus decode enabled to keep event context.
  • Interpret: multi-modal duration buckets often indicate state-dependent phases; tighten into a sequence trigger using the lead-up events.
Evidence artifact: histogram-ready list of stretch durations + lead-up signature.
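The stretch-duration summary can be derived from exported pulse-width measurements. A Python sketch (using the median as "typical" and a 5× long-tail heuristic are assumptions for illustration, not analyzer features):

```python
def stretch_stats(durations_us):
    """Summarize SCL clock-stretch durations (microseconds) into
    min/typ/max and flag a long tail, which often indicates
    state-dependent stretch phases."""
    d = sorted(durations_us)
    typ = d[len(d) // 2]  # median as the 'typical' value
    return {
        "min_us": d[0],
        "typ_us": typ,
        "max_us": d[-1],
        "long_tail": d[-1] > 5 * typ,  # heuristic threshold (assumption)
    }
```

A long tail or clearly multi-modal buckets is the cue to capture the lead-up events before the longest stretches.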
Stuck bus (SCL/SDA held low)
Symptom: bus stops responding until reset.
  • Capture: the last complete transaction and the missing/abnormal termination event (if any).
  • Trigger: SCL low > X or SDA low > X with deep pre-trigger to keep the “last good” context.
  • Interpret: missing stop/termination events indicate an unclosed transaction chain; correlate to power/plug markers if available (H2-4).
Evidence artifact: last-good transaction + stuck-line duration + recovery action timing.
“Looks OK” traps (avoid mis-decode)
If any signature matches, revisit H2-2/H2-3 first.
  • ACK flips across captures of the same action → sampling near edges or wrong thresholds can turn SDA into a false bit.
  • NAK rate changes when thresholds or sample rate changes → the dataset is not stable enough for protocol conclusions.
  • Decode “drifts” in time despite the same stimulus → convergence failed; raise sample rate/time resolution and re-test.
Diagram: I²C transaction timeline with trigger point and fixed capture window (pre/post)

SPI Decode Playbook: mode mismatch, CS timing, bit-slip, throughput drops

Scope guard
  • Covered: decode / trigger / timing alignment evidence for common SPI failures.
  • Not covered: termination, reflections, or long-trace SI implementation details (use the SPI Long-Trace SI subpage).

The SPI workflow is layered to prevent misdiagnosis: confirm CPOL/CPHA first, then validate CS semantics, then check sampling windows, and finally quantify throughput and burst behavior. Each step produces exportable evidence rather than one-off screenshots.

Layer order (diagnosis): Mode → CS → window → throughput
CPOL/CPHA → CS setup/hold → Sampling edge → Burst / DMA → Stats (gap/duty)
Analyzer-first SPI cards
Symptom → capture → trigger → interpretation
CPOL/CPHA quick confirmation
Symptom: writes appear successful, but registers do not change; reads look “random” or shifted.
  • Capture: the same transaction under Mode 0–3 decode views; compare stability (byte boundaries and lengths should converge).
  • Trigger: wide trigger on CS active; enforce a fixed evidence window (pre = X, post = X).
  • Interpret: the correct mode typically yields stable decode across repeated captures; if all modes drift, revisit sampling/threshold setup (H2-2/H2-3).
Artifact: mode comparison snapshot + “decode convergence” notes.
CS semantics (setup/hold, gap, re-assert)
Symptom: intermittent failures clustered near frame boundaries.
  • Capture: CS low duration, CS high gap distribution, and the last/first clock edges around CS transitions.
  • Trigger: CS rising edge (frame end) or CS pulse width abnormal (< X or > X).
  • Interpret: if errors align with CS transitions or gap jitter, tighten the trigger to “boundary signatures” and export gap statistics.
Artifact: CS gap min/typ/max + boundary capture window.
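The CS gap/duration statistics can be computed from an exported edge list. A Python sketch (the `(timestamp, new_level)` edge representation, starting from the first falling edge, is an illustrative assumption):

```python
def cs_timing(edges):
    """From a CS edge list [(t_seconds, new_level), ...] where level 0
    means asserted (active low), compute frame (low) durations and
    inter-frame (high) gaps."""
    lows, gaps = [], []
    for (t0, l0), (t1, _) in zip(edges, edges[1:]):
        if l0 == 0:
            lows.append(t1 - t0)   # CS asserted: frame duration
        else:
            gaps.append(t1 - t0)   # CS deasserted: inter-frame gap
    return {"frame_min": min(lows), "frame_max": max(lows),
            "gap_min": min(gaps), "gap_max": max(gaps)}
```

Comparing `gap_min` against the target device's required CS high time is the boundary-signature check described above.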
Multi-slave: correct blame with CS fanout
Symptom: “SPI is broken” reports, but only one target misbehaves.
  • Capture: all CS lines and bucket anomalies by CS (error counts, abnormal lengths, unusual gaps).
  • Trigger: filter by a specific CS active state; start wide then narrow to the failing CS.
  • Interpret: single-CS clustering indicates target-specific issues; multi-CS anomalies indicate a shared root cause (align with markers, H2-4).
Artifact: per-CS anomaly histogram (counts per minute / per 1k frames).
Bit-slip / sampling-edge issues
Symptom: occasional one-bit shifts or “byte boundary drift” across otherwise similar frames.
  • Capture: the first bad frame and the last good frame; compare edge-to-sample markers and frame length stability.
  • Trigger: abnormal frame length / unexpected boundary (if supported), else wide CS trigger + post-filter by anomaly.
  • Interpret: if slip rate changes with sampling/threshold settings, convergence is not met (H2-3). If slip clusters at burst boundaries, revisit CS semantics and burst framing.
Artifact: annotated “first-bad frame” capture with sample-edge markers.
Throughput drops / DMA burst behavior
Symptom: throughput collapses under small transfers or burst boundaries.
  • Capture: gap distribution, burst length distribution, and payload duty (active clock time / total time).
  • Trigger: statistical threshold triggers: gap > X, burst < X, jitter spikes > X.
  • Interpret: boundary-aligned drops point to framing/queueing; event-aligned drops point to external back-pressure or firmware scheduling (confirm with markers, H2-4).
Artifact: duty + gap + burst exports for apples-to-apples comparison.
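Payload duty and gap statistics follow directly from exported burst intervals. A Python sketch (the `(t_start, t_end)` burst representation is an illustrative assumption):

```python
def payload_duty(bursts, t_total):
    """Throughput evidence sketch: fraction of the capture during which
    the clock is actively transferring data, plus the worst inter-burst
    gap. `bursts` is [(t_start, t_end), ...] in seconds, sorted."""
    active = sum(t1 - t0 for t0, t1 in bursts)
    gaps = [b[0] - a[1] for a, b in zip(bursts, bursts[1:])]
    return {"duty": active / t_total,
            "gap_max": max(gaps) if gaps else 0.0}
```

A low duty with large `gap_max` points at framing/queueing or back-pressure rather than raw clock speed.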
Diagram: SPI rails (CS/SCLK/MOSI/MISO) with sampling-edge markers and Mode 0–3 selector

UART Decode Playbook: baud error, framing/parity, flow control, wake/break

Scope guard
  • Covered: analyzer decode, error classification, timing alignment, and flow-control evidence.
  • Not covered: RS-232/RS-485 electrical/PHY details (use the UART Voltage Levels & PHY subpage).

UART failures often appear random until they are typed and aligned in time. The layered workflow starts with a baud-rate sanity check, then converts “garbage” into error classes, separates back-pressure from link issues, and finally designs robust wake/break capture windows.

Workflow order: Baud (auto) → Errors → RTS/CTS → Wake/Break
Baud rate & sampling sanity
Symptom: intermittent garbage or periodic failures.
  • Capture: auto-baud estimate (if available) and measured bit widths over time; track drift across frames.
  • Trigger: framing/parity events with fixed windows (pre = X, post = X).
  • Interpret: if errors cluster with bit-width drift, treat as a timing budget issue; typical combined clock tolerance is often around ±2% as a practical checkpoint.
Artifact: bit-width trend + time-stamped error bursts.
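The ±2% checkpoint can be applied to measured bit widths in a few lines. A Python sketch (the function name and the 2% default stand in for the X thresholds):

```python
def baud_error_ok(nominal_baud, measured_bit_us, budget_pct=2.0):
    """Compare a measured bit width against the nominal bit time; the
    combined TX+RX clock error should stay inside the practical ~2%
    checkpoint. Returns (error_percent, within_budget)."""
    nominal_us = 1e6 / nominal_baud          # nominal bit time in µs
    error_pct = abs(measured_bit_us - nominal_us) / nominal_us * 100.0
    return error_pct, error_pct <= budget_pct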
Error classes (framing / parity / overrun)
Symptom: “random” corruption without an obvious pattern.
  • Capture: per-class counts + the exact timestamps; identify whether errors are isolated or bursty.
  • Trigger: trigger on a specific error type to build clean evidence packs.
  • Interpret: framing suggests boundary loss; parity suggests bit-level corruption; overrun suggests buffering/back-pressure—confirm by aligning with RTS/CTS lines.
Artifact: typed error timeline export (counts + burst intervals).
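Separating isolated errors from bursts is a simple grouping pass over the typed-error timestamps. A Python sketch (the 10 ms burst gap is an illustrative threshold):

```python
def classify_error_bursts(timestamps_s, burst_gap_s=0.01):
    """Group typed-error timestamps (seconds) into bursts: consecutive
    errors closer than `burst_gap_s` belong to the same burst."""
    bursts, current = [], []
    for t in sorted(timestamps_s):
        if current and t - current[-1] > burst_gap_s:
            bursts.append(current)  # gap too large: close this burst
            current = []
        current.append(t)
    if current:
        bursts.append(current)
    return bursts
```

Many short bursts aligned to RTS/CTS transitions suggest back-pressure; evenly spread isolated errors point back to the baud/timing budget.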
RTS/CTS: detect back-pressure masquerading as link faults
Symptom: timeouts or stalls that look like a link problem.
  • Capture: CTS/RTS alongside TX/RX; measure block durations and correlate to dropped frames/timeouts.
  • Trigger: CTS blocked > X (duration threshold) or timeout event + examine CTS state in the fixed window.
  • Interpret: if all stalls align to CTS blocking, treat as back-pressure; if CTS is normal but errors persist, return to baud and error typing.
Artifact: CTS duty + block time distribution (min/typ/max).
Wake / Break / Idle detect windows
Symptom: wake-up occasionally fails or the first frame after wake is corrupted.
  • Capture: break/idle boundary with deep pre-trigger to preserve the lead-up; include a firmware marker if available.
  • Trigger: break detected or idle-to-active transition; enforce a fixed window template.
  • Interpret: if failures align to wake boundaries, treat as a capture-window/alignment problem first (H2-4), not as “random link noise”.
Artifact: wake boundary capture with first-frame correctness check.
Diagram: UART bit frame (start/data/parity/stop) with sample points and error labels

Debug Workflow: prove root cause with a repeatable evidence chain

Scope guard
  • Covered: evidence chain template that is repeatable, transferable, and bug-report ready.
  • Not covered: long repair plans for circuits/firmware; focus is proof and pass criteria.

The goal is not “a capture” but a closed loop: each cycle produces artifacts that another engineer can replay and compare. The workflow below turns intermittent symptoms into quantifiable triggers, isolates variables one at a time, and ends with a pass criterion suitable for regression.

5-step closed loop
Goal → Do → Validate → Output
Step 1 · Define the symptom
  • Goal: convert “intermittent” into a measurable event type.
  • Do: pick the observable signature (NAK / bit-slip / framing / timeout / stall).
  • Validate: reproduce under the same conditions at least N times.
  • Output: one-line symptom definition + counting rule (X placeholders).
Step 2 · Design the trigger
  • Goal: capture the bug window reliably.
  • Do: start wide, capture a real sample, then narrow into a sequence/threshold trigger.
  • Validate: track hit rate vs miss rate (X placeholders).
  • Output: trigger definition + fixed window template (pre = X, post = X).
Step 3 · Capture a clean sample
  • Goal: ensure the sample reflects the system, not the measurement chain.
  • Do: run a convergence check (sampling/threshold changes must not flip the conclusion).
  • Validate: repeated captures decode consistently.
  • Output: raw capture + decode export + setup record.
Step 4 · Isolate variables (one at a time)
  • Goal: turn correlation into a causality chain.
  • Do: change one variable only (bridge / isolator / queue depth / firmware build / cable).
  • Validate: anomaly rate changes monotonically or predictably.
  • Output: A/B evidence packs with identical trigger/window template.
Step 5 · Define pass criteria (regression-ready)
  • Goal: make “fixed” measurable in production-like runs.
  • Do: set thresholds (error rate ≤ X, p99 latency ≤ X, stall count ≤ X).
  • Validate: repeat under a fixed workload and always pass.
  • Output: pass criteria + regression steps (X placeholders).
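The pass criteria above can be encoded as an explicit threshold table so a regression run fails or holds automatically. A Python sketch (metric names and limits are the X placeholders made concrete for illustration):

```python
def regression_pass(stats, limits):
    """Evaluate run statistics against explicit pass thresholds.
    `stats` and `limits` map metric name -> value; a metric fails when
    it exceeds its limit. Returns (passed, failing_metrics)."""
    failures = {k: v for k, v in stats.items()
                if k in limits and v > limits[k]}
    return len(failures) == 0, failures
```

Keeping the limits in a versioned table makes "fixed" measurable and reviewable across builds.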
Evidence pack (artifacts)
Must-have vs optional.
Must-have
  • Raw capture: preserves threshold/sampling auditability.
  • Decode export: makes interpretation reproducible.
  • Trigger definition: explains why the bug window is captured.
  • Statistics: converts “rare” into measurable rates/distributions.
  • Marker notes: pin polarity + meaning + mapping to firmware events.
Optional
  • Cross-instrument alignment: correlates analyzer + scope when needed.
  • A/B comparison pack: identical trigger/window with one-variable change.
  • One-page summary: symptom + trigger + pass criteria (bug-report ready).
Naming & packaging rule (recommended)
Use a stable naming pattern such as bus_tool_symptom_trigger_window_timestamp and include a small README with three lines: Symptom, Trigger, Pass criteria.
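The naming pattern can be enforced with a tiny helper so every station produces comparable file names. A Python sketch (the lowercasing and space-to-hyphen normalization are illustrative choices):

```python
def evidence_name(bus, tool, symptom, trigger, window, timestamp):
    """Build the recommended capture name in the pattern
    bus_tool_symptom_trigger_window_timestamp (lowercase, no spaces)."""
    parts = [bus, tool, symptom, trigger, window, timestamp]
    return "_".join(p.lower().replace(" ", "-") for p in parts)
```

One helper shared across stations guarantees packs sort and diff cleanly.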
Misattribution traps (avoid false conclusions)
  • Retry ≠ hardware failure: align retry bursts to bus events and markers before blaming the physical layer.
  • Threshold drift ≠ protocol error: if the conclusion flips by changing thresholds or sampling, fix the measurement chain first.
  • Single screenshot ≠ proof: require repeat captures and a distribution (rate, gaps, burst intervals).
Diagram: evidence chain loop (Capture → Annotate → Compare → Isolate → Pass → Capture)

Engineering Checklist (design → bring-up → production)

Intent
  • Focus: observability hooks that make analyzer evidence fast and repeatable.
  • Not a bus design guide: avoids routing/SI rules and protocol-level teaching.

This checklist embeds observability into the product lifecycle. Each item is phrased as a hook with a verification action and a pass condition, enabling consistent bring-up and production regression.

Design · observability hooks to build in
  • Test pads / probe access — Verify: stable capture without clip-induced flips; Pass: decode remains consistent when sampling/threshold changes one step.
  • Marker GPIO / strobe pin — Verify: a deterministic pulse at key firmware events; Pass: marker-to-bus alignment error < X (placeholder).
  • Config options (pull-up/termination variants) — Verify: A/B switchable by BOM option or strap; Pass: A/B evidence packs use identical trigger/window template.
  • Bypass hooks (mux/isolator bypass) — Verify: bypass toggles reproducibly; Pass: anomaly rate shifts predictably across bypass state.
  • Logging with timestamps — Verify: event IDs and timestamps are emitted; Pass: log events can be correlated to analyzer markers (shared reference concept).
MVP hooks (minimal viable): test pads + one marker pin + a bypass option + a stable log timestamp field.
Bring-up · templates and minimal repro
  • Default trigger templates — Verify: one-click templates catch common anomalies; Pass: hit rate ≥ X with miss rate ≤ X (placeholders).
  • Minimal reproducer — Verify: fixed script/sequence reproduces the symptom; Pass: appears ≥ X times across N runs (placeholders).
  • Alignment plan — Verify: marker + bus lines capture simultaneously; Pass: event order is stable and repeatable across captures.
  • Evidence packaging — Verify: exports include raw + decode + trigger + stats; Pass: another station can replay and reach the same conclusion.
MVP bring-up pack: symptom line + trigger rule + pre/post window + typed error stats.
Production · regression and fast triage
  • BIST / loopback — Verify: runs at power-on or station time; Pass: error count = 0 or ≤ X (placeholders).
  • Stats threshold alarms — Verify: station captures and logs key metrics; Pass: exceed threshold → automatic fail/hold decision.
  • Station capture triage — Verify: fixed trigger/window produces a compact evidence pack; Pass: pack naming + content are consistent and reviewable.
  • Golden reference runs — Verify: periodic baseline captures; Pass: distributions remain within control limits (X placeholders).
MVP production guard: loopback + one threshold alarm + a fixed capture template for triage.
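A minimal sketch of the stats-threshold alarm above, assuming the station has already computed per-unit KPIs from its capture. The limit values are illustrative placeholders in the spirit of the X's used throughout this page.

```python
# Illustrative limits only; replace these placeholder values with
# project-specific thresholds.
LIMITS = {"nak_per_10k": 5.0, "framing_errors": 0.0, "cs_gap_p99_us": 50.0}

def triage(kpis, limits=LIMITS):
    """Return (verdict, violations): any KPI above its limit holds the unit."""
    violations = {k: v for k, v in kpis.items()
                  if k in limits and v > limits[k]}
    return ("FAIL" if violations else "PASS", violations)

verdict, bad = triage({"nak_per_10k": 12.0, "framing_errors": 0.0})
print(verdict, bad)  # FAIL {'nak_per_10k': 12.0}
```

The same function can drive the automatic fail/hold decision: a non-empty violations dict is the trigger to keep the unit and its evidence pack for review.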
Diagram: observability hooks map around the DUT (test pads, marker pin, mux/isolator bypass, config option, logging) feeding the analyzer probe and evidence pack. Hooks enable fast capture and alignment; regression reuses the same trigger/window.

Applications (where analyzers save days)

Scope guard
  • Covered: application buckets for I²C/SPI/UART evidence-first debugging.
  • Not covered: Industrial Ethernet, high-speed SerDes, USB/PCIe/TSN domains, or long repair plans.

Each card below maps a real board-level scenario to the first evidence to capture and the first trigger to use. The objective is fast, repeatable proof (raw + decode + stats + markers), not guesswork.

I²C EEPROM write then read-back fails
  • Scenario: configuration/page write followed by immediate verify read.
  • Common failures: NAK burst, stale data read-back, intermittent verify fail.
  • Evidence to capture: START/STOP + ACK/NAK timeline, repeated START sequence, write-to-read gap stats (busy window).
  • First trigger: trigger on NAK (or specific address NAK), fixed pre/post window.
  • Pass hint: read-back mismatch rate ≤ X (placeholder) under a fixed workload.
Material examples (verify suffix/package)
Microchip 24LC256, ST M24C64, Microchip/Atmel AT24C256C
I²C Clock stretching / slow slave causes timeouts
  • Scenario: sensor reads succeed on bench but fail under load/temperature.
  • Common failures: master timeout, repeated retries, “looks OK” but latency explodes.
  • Evidence to capture: SCL low-hold duration distribution, transaction latency stats, marker-aligned firmware event stamps.
  • First trigger: trigger on SCL low > X (placeholder) or a protocol timeout event; keep long pre-trigger.
  • Pass hint: stretch duration p99 ≤ X (placeholder) in the target workload.
Material examples (verify suffix/package)
Bosch BME280 (I²C sensor), InvenSense MPU-6050 (I²C IMU), NXP PCA9535 (I²C I/O expander)
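The SCL low-hold distribution called out above can be computed offline from an edge export. The `(timestamp, level)` pair format is an assumption; most analyzers can export something equivalent as CSV.

```python
def scl_low_durations(edges):
    """edges: (t, level) pairs in seconds, level 0 = SCL low.
    Returns the list of low-hold durations (falling -> rising edge)."""
    lows, t_fall = [], None
    for t, level in edges:
        if level == 0:
            t_fall = t
        elif t_fall is not None:
            lows.append(t - t_fall)
            t_fall = None
    return lows

def p99(samples):
    """Crude rank-based p99; fine for distribution-shape checks."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

edges = [(0.0, 0), (5e-6, 1), (10e-6, 0), (12e-6, 1), (20e-6, 0), (520e-6, 1)]
durs = scl_low_durations(edges)
print(max(durs))  # the 500 µs stretch stands out against normal low times
```

Comparing the p99 of this distribution against the master's timeout setting turns "sometimes it times out" into a measured margin.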
I²C MUX / long-reach segment adds delay or stalls
  • Scenario: bus works direct; adding mux/segment/cable introduces NAK or stuck-bus.
  • Common failures: first transaction after channel switch fails, SDA stuck low, bus “hangs” after hot-plug.
  • Evidence to capture: channel-switch marker alignment, first post-switch transaction, SCL/SDA low-stuck events.
  • First trigger: trigger on SCL low > X or SDA low > X (placeholders), with long pre-trigger.
  • Pass hint: post-switch first-try success ≥ X% (placeholder).
Material examples (verify suffix/package)
TI TCA9548A (I²C mux), NXP PCA9515A (I²C buffer), TI TCA9617A / NXP PCA9615 (differential I²C extender)
SPI SPI Flash XIP intermittent failures
  • Scenario: large reads / execute-in-place; rare CRC faults or random crashes.
  • Common failures: mode mismatch, CS gap abnormal, burst boundary drift, bit-slip.
  • Evidence to capture: CS gap distribution, transaction length distribution, decode convergence across Mode 0–3.
  • First trigger: start with CS-active wide trigger; then narrow to “gap > X” or “length abnormal” (placeholders).
  • Pass hint: payload duty stable; gap p99 ≤ X (placeholder) under target load.
Material examples (verify suffix/package)
Winbond W25Q64JV, Macronix MX25L25645G, GigaDevice GD25Q64
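The CS-gap and payload-duty evidence above reduces to simple arithmetic over CS edges. This sketch assumes an active-low CS and a `(timestamp, level)` edge export; adapt to your analyzer's format.

```python
def cs_stats(edges, t_end):
    """edges: (t, level) pairs for an active-low CS; level 1 = inactive.
    Returns (inactive gaps, payload duty = CS-active time / total time)."""
    gaps, active = [], 0.0
    t_rise = t_fall = None
    for t, level in edges:
        if level == 1:                 # CS released: a gap starts
            if t_fall is not None:
                active += t - t_fall
                t_fall = None
            t_rise = t
        else:                          # CS asserted: the gap ends
            if t_rise is not None:
                gaps.append(t - t_rise)
                t_rise = None
            t_fall = t
    if t_fall is not None:             # still active at the capture end
        active += t_end - t_fall
    return gaps, (active / t_end if t_end else 0.0)

edges = [(0.0, 0), (10e-6, 1), (15e-6, 0), (40e-6, 1), (100e-6, 0)]
gaps, duty = cs_stats(edges, 120e-6)
print(gaps, round(duty, 2))
```

A low duty with large gaps points at firmware/DMA framing rather than the flash or the wiring, which is exactly the attribution this card asks for.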
SPI Multi-slave bus: wrong attribution (which CS?)
  • Scenario: many slaves share SCLK/MOSI/MISO; one device misbehaves intermittently.
  • Common failures: abnormal frame length under one CS, gap jitter on one branch, “bus is bad” misdiagnosis.
  • Evidence to capture: error counts binned by CS, per-CS gap/length stats, first-bad-frame timestamp.
  • First trigger: trigger on target CS-active (or capture-wide then filter by CS labels).
  • Pass hint: anomaly concentration (Top1 CS share ≥ X%) used for fast root attribution.
Material examples (verify suffix/package)
TI SN74HC138 / SN74LVC138 (CS decode), NXP 74HC4051 (signal mux), TI SN74LVC1G125 (CS buffering)
UART Console garbling: baud drift vs framing/parity
  • Scenario: mostly readable console with rare bursts of junk characters.
  • Common failures: baud mismatch/drift, framing errors, parity errors, overrun.
  • Evidence to capture: error classification stats, auto-baud/bit-width measurement, error timestamps aligned to markers.
  • First trigger: trigger on framing/parity error; if unavailable, wide trigger on start-bit then filter by error events.
  • Pass hint: framing=0 and overrun ≤ X (placeholder) in a fixed stress run.
Material examples (verify suffix/package)
FTDI FT232R, Silicon Labs CP2102N, WCH CH340C
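The bit-width measurement above maps to a one-line baud-error estimate. The 115200-baud numbers below are illustrative only; the acceptable |Δbaud| budget is project-specific.

```python
def baud_error_pct(bit_widths_s, nominal_baud):
    """Signed baud error (%) from measured bit widths (seconds)."""
    mean_w = sum(bit_widths_s) / len(bit_widths_s)
    measured_baud = 1.0 / mean_w
    return (measured_baud - nominal_baud) / nominal_baud * 100.0

# Nominal 115200 baud is about 8.68 µs/bit; 8.8 µs bits imply a slow clock.
err = baud_error_pct([8.8e-6] * 16, 115200)
print(round(err, 2))  # about -1.36 %
```

If the measured error stays well inside the budget while framing errors persist, that shifts suspicion from baud drift to noise at the start/stop boundary, which is the decision this FAQ-style check is for.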
UART RTS/CTS back-pressure masks as “link failure”
  • Scenario: long frames or high log volume causes stalls/timeouts.
  • Common failures: CTS held in block state, RTS not released, false hardware blame.
  • Evidence to capture: CTS/RTS alongside TX/RX, CTS-block duration distribution, timeout alignment to CTS state.
  • First trigger: trigger on CTS block lasting > X (placeholder), fixed window pre/post.
  • Pass hint: timeout events not correlated with CTS-block in the final run.
Material examples (verify suffix/package)
NXP SC16IS750 / SC16IS752 (UART bridge with FIFOs), Maxim MAX3232 (RS-232 level), TI SN65HVD3082E (RS-485 transceiver)
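The CTS-alignment check above can be done offline against drop/timeout timestamps. The polarity convention (level 1 = CTS blocking TX) is an assumption; confirm it against your transceiver and wiring before trusting the result.

```python
def blocked_intervals(cts_edges):
    """cts_edges: (t, level) pairs; level 1 = CTS blocking TX (assumed)."""
    spans, t0 = [], None
    for t, level in cts_edges:
        if level == 1:
            t0 = t
        elif t0 is not None:
            spans.append((t0, t))
            t0 = None
    return spans

def fraction_in_block(drop_times, spans):
    """Share of drop/timeout timestamps that fall inside a blocked span."""
    if not drop_times:
        return 0.0
    inside = sum(any(a <= t <= b for a, b in spans) for t in drop_times)
    return inside / len(drop_times)

spans = blocked_intervals([(0.0, 1), (0.5, 0), (2.0, 1), (2.4, 0)])
print(fraction_in_block([0.2, 1.0, 2.1], spans))  # 2 of 3 drops in-block
```

A fraction near 1.0 says the "link failure" is back-pressure policy; a fraction near 0.0 sends the investigation back to overrun or the PHY.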
ALIGN Bridges/muxes add new fixed/variable latency
  • Scenario: adding a bridge/isolator/mux changes timing and breaks edge cases.
  • Common failures: fixed offset shift, variable latency jitter, event ordering instability.
  • Evidence to capture: reference/marker pulse aligned with bus events, delay distribution (min/typ/max/p99).
  • First trigger: trigger on marker/reference pulse; long pre-trigger to capture the cause window.
  • Pass hint: delay distribution p99 ≤ X (placeholder) and stable event ordering.
Material examples (verify suffix/package)
NXP SC18IS602B (I²C↔SPI bridge), TI TCA9548A (I²C mux), Analog Devices ADuM1250 / TI ISO1540 (I²C isolator)
Diagram: application buckets grouped by bus (I²C / SPI / UART). Each bucket maps to: Evidence → First trigger → Pass hint (X placeholders).

Tool Selection Notes (logic analyzer / protocol analyzer) — specs that actually matter

Scope guard
  • Covered: selection logic and engineering thresholds.
  • Not covered: shopping lists or brand/model recommendations.

Selection should start from the bug class (speed, rarity, alignment, production/regression) and map to tool capabilities. The cards below focus on what changes the ability to capture, decode, correlate, and export evidence reliably.

Channel count (minimum viable set)
  • Why it matters: bus lines alone rarely explain failures; reset/IRQ/marker often unlock causality.
  • Rule of thumb: bus lines + 2 control lines + 1 marker (project-dependent).
  • Quick check: capture with/without marker; causality should become unambiguous.
Hook material examples
TI SN74LVC1G17 (Schmitt buffer for clean marker), TI SN74LVC1T45 (level shift marker to analyzer IO)
Thresholds & input tolerance
  • Why it matters: threshold mismatch creates protocol “illusions” (false ACK/bit values).
  • Rule of thumb: confirm the bus voltage domain and open-drain vs push-pull behavior.
  • Quick check: shift threshold one step; conclusion should remain stable (decode convergence).
Domain helpers (examples)
NXP PCA9306 (I²C level shifter), TI TXS0108E (multi-bit level translation; automatic direction sensing is use-case dependent)
Sample rate vs timing resolution
  • Why it matters: insufficient sampling causes edge ambiguity and bit-slip misreads.
  • Timing resolution: determines visibility of stretching, inter-frame gaps, and setup/hold margins.
  • Quick check: increase one tier; decode should not change class of conclusion.
Stability aids (examples)
Analog Devices LTC4311 (I²C rise-time accelerator), NXP PCA9517 (I²C buffer)
Memory depth (rare bugs)
  • Why it matters: rare failures demand long capture windows with pre-trigger context.
  • Rule of thumb: depth defines “how far back” the cause can be proven.
  • Quick check: verify pre-trigger can include the full lead-up interval.
Bridge/buffering examples
NXP SC16IS752 (dual UART bridge with FIFOs), NXP SC18IS602B (I²C↔SPI bridge)
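The depth rule of thumb above is plain arithmetic (time span = samples / sample rate), so budget it before capturing rather than discovering mid-session that the lead-up does not fit. The 90 % pre-trigger split below is an assumption; use your tool's actual pre/post allocation.

```python
def pretrigger_span_s(depth_samples, sample_rate_hz, pretrigger_frac=0.9):
    """Seconds of pre-trigger history available at this depth and rate."""
    return depth_samples * pretrigger_frac / sample_rate_hz

# 100 Msamples at 100 MS/s with a 90 % pre-trigger split buys 0.9 s of lead-up.
print(pretrigger_span_s(100_000_000, 100_000_000))  # 0.9
```

If the suspected lead-up interval exceeds this span, either lower the sample rate (where decode still converges), use segmented capture, or trigger on a higher-level event closer to the failure.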
Trigger capability (edge → sequence → stats)
  • Why it matters: rare bugs become capturable when triggers are designed, not guessed.
  • Need-to-have: edge/level + protocol events + sequence + threshold/statistics triggers.
  • Quick check: create a “first funnel” trigger then a narrow trigger; compare hit/miss rates.
Protection examples
Nexperia PESD5V0S1BA (ESD diode), Littelfuse SMF05C (multi-line ESD array)
Export & automation (regression-ready)
  • Why it matters: production and CI-style regression require machine-readable artifacts.
  • Need-to-have: CSV/JSON export, stable timestamps, scripting/API (where available).
  • Quick check: run an A/B pack and compute the same KPI (error rate, gap p99) from exports.
Isolation examples
TI ISO7741 (digital isolator), Analog Devices ADuM1250 / TI ISO1540 (I²C isolator)
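The A/B check above can be sketched with the standard-library CSV reader: compute the same KPI from both packs' exports instead of comparing screenshots. The `t,event` column layout is an assumption about the export format.

```python
import csv
import io

def nak_rate(csv_text):
    """NAK events per decoded transaction, from a 't,event' CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    return sum(r["event"] == "NAK" for r in rows) / len(rows)

pack_a = "t,event\n0.1,ACK\n0.2,NAK\n0.3,ACK\n0.4,ACK\n"
pack_b = "t,event\n0.1,ACK\n0.2,ACK\n0.3,ACK\n0.4,ACK\n"
print(nak_rate(pack_a), nak_rate(pack_b))  # 0.25 0.0
```

The same pattern extends to framing-error rate or gap p99: one parser, one KPI function, applied identically to the A and B evidence packs.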
Selection flow (If/Else) map needs → capability thresholds
If the need is…
  • Higher speed / edge ambiguity: sampling + threshold stability matter most.
  • Rare bug (minutes→hours): depth + pre-trigger + stats triggers dominate.
  • Timing alignment: marker/reference + sync IO capability is required.
  • Production/regression: export + automation + stable KPIs are mandatory.
Then prioritize…
  • Channels ≥ X: bus + reset/IRQ + marker (project-specific).
  • Sample ≥ X: enough to avoid bit-slip misreads (X placeholder).
  • Depth ≥ X: covers lead-up + aftermath for causality (X placeholder).
  • Trigger: sequence + stats threshold triggers for rare events.
  • Export/API: CSV/JSON + repeatable templates for A/B packs.
Reference BOM (examples)
I²C mux TCA9548A · I²C buffer PCA9515A · Diff I²C extender PCA9615/TCA9617A · I²C↔SPI bridge SC18IS602B · UART bridge SC16IS752 · RS-232 level MAX3232 · RS-485 transceiver SN65HVD3082E · I²C isolator ISO1540/ADuM1250 · ESD diode PESD5V0S1BA · ESD array SMF05C
Diagram: selection flow (needs → thresholds → tool class)


FAQs (fixed four-line answers)

These FAQs close common long-tail debugging loops without expanding the main text. Each answer is a data-driven checklist: likely cause → quick check → fix → pass criteria (thresholds use X placeholders).

Decode shows intermittent I²C NAK, but the scope “looks fine” — threshold or sample rate first?
Likely cause: analyzer Vth mismatched to the logic domain (open-drain/level-shift), or sampling too low causing edge ambiguity.
Quick check: shift the threshold by ±ΔV (placeholder) and double the sample rate (A/B); confirm whether the NAK count changes and whether decode alignment shifts.
Fix: lock Vth to the correct rail domain; set sample rate ≥ X×fSCL (placeholder) and enable glitch/debounce where supported.
Pass criteria: NAK rate ≤ X per 10k transactions (placeholder) and decode remains stable across ±ΔV sweep (no event class changes).
One branch hot-plugged, then the whole I²C bus locks — fastest way to identify who holds SDA low?
Likely cause: a device keeps SDA low (stuck-bus) after brown-out/hot-plug, or a mux/isolator channel is left in a bad state.
Quick check: trigger on SDA low > X ms (placeholder) and keep long pre-trigger; correlate the last valid address before the hold and the branch-switch marker (if present).
Fix: add/enable a recovery sequence (clock pulses + STOP), and use segmentation (mux/buffer) to isolate the faulty branch for immediate attribution.
Pass criteria: SDA low holds > X ms occur 0 times in N hot-plug cycles (placeholder), and recovery completes within X ms (placeholder).
Clock stretching “sometimes” causes timeouts — how to prove it’s stretch latency (not decode artifacts)?
Likely cause: genuine SCL low-hold latency spikes, or analyzer time resolution insufficient to measure hold durations reliably.
Quick check: compute SCL-low duration distribution (min/typ/p99); re-run with one higher timing-resolution tier and check whether the distribution shape changes.
Fix: enforce a master timeout policy aligned to measured p99 stretch; add marker to timestamp firmware request/response boundaries for correlation.
Pass criteria: SCL-low hold p99 ≤ X µs (placeholder) under stress, and timeout events drop to ≤ X per hour (placeholder).
SPI register write “succeeds” but does not take effect — CPOL/CPHA first or CS hold first?
Likely cause: mode mismatch causing bit/byte shift, or CS deassertion violates hold/setup timing so the slave drops the frame.
Quick check: A/B decode Mode 0–3 and look for alignment convergence; measure CS hold time around the last clock edge (CS↑ vs SCLK) against X ns (placeholder).
Fix: lock a known-good mode at power-up; add guard time: CS hold ≥ X ns and CS inactive gap ≥ X ns (placeholders) in the transaction template.
Pass criteria: write-then-readback match ≥ X% (placeholder) and decode remains correct under mode lock (no sporadic shifts).
SPI decode intermittently shifts (bit-slip) — is it sampling, threshold, or trigger placement?
Likely cause: marginal sampling rate/time resolution near SCLK edges, or threshold drift on MISO/MOSI creating false transitions.
Quick check: raise sample rate ≥ ×2 and shift Vth by ±ΔV (placeholder); compare “shift event” timestamps to SCLK duty anomalies or edge clusters.
Fix: set sample rate ≥ X×fSCLK (placeholder), enable hysteresis/glitch reject if available, and trigger on CS-active with fixed pre/post window.
Pass criteria: decode shift events = 0 over N transactions (placeholder) and conclusions remain unchanged across ±ΔV threshold sweep.
SPI throughput collapses with many small transfers — how to prove it’s gaps/queueing (not “bad wires”)?
Likely cause: large CS inactive gaps and burst fragmentation from firmware/DMA framing, not signal integrity defects.
Quick check: export CS gap p50/p99 and payload duty-cycle stats; compare “many small” vs “batched” runs (A/B) using identical clock settings.
Fix: batch transactions, increase in-flight depth where applicable, and set a minimum inter-transfer gap template (≤ X µs target, placeholder).
Pass criteria: effective throughput ≥ X Mbps (placeholder) and CS gap p99 ≤ X µs (placeholder) under the target workload.
UART shows intermittent framing errors — baud error first or noise/glitches first?
Likely cause: clock mismatch/drift exceeding tolerance, or narrow noise pulses corrupting the start/stop boundary.
Quick check: measure bit width over M frames and compare to nominal; then enable glitch reject / adjust Vth by ±ΔV (placeholder) to see if framing rate changes.
Fix: tighten baud error budget (target |Δbaud| ≤ X%, placeholder) and add de-glitch/oversampling where available; align errors to markers for causality.
Pass criteria: framing errors = 0 in N frames (placeholder) and measured bit width deviation ≤ X% (placeholder) across temperature/load.
UART “random drops” but no framing/parity — how to prove overrun or RTS/CTS back-pressure?
Likely cause: receiver overrun (buffer/service latency), or flow control holding the link (CTS asserted/blocked).
Quick check: capture RTS/CTS alongside TX/RX and compute CTS-block duration distribution; align drop timestamps to CTS state and burst size.
Fix: increase buffering/servicing margin and enforce back-pressure policy; define a maximum CTS-block time ≤ X ms (placeholder) for the system.
Pass criteria: overruns ≤ X per hour (placeholder) and CTS-block p99 ≤ X ms (placeholder) under worst-case log load.
Trigger hits “too often” or “never” — how to design the trigger funnel without guessing?
Likely cause: trigger condition too narrow before a baseline is understood, or too sensitive to noise (edge/glitch misfires).
Quick check: run a two-stage plan: wide trigger to collect M samples, then measure which event/sequence occurs only near failures (frequency ratio ≥ X, placeholder).
Fix: implement layered triggers (edge → event → sequence) and add debounce/time-qualification windows (≥ X ns/ms, placeholders).
Pass criteria: trigger hit rate within [Xmin, Xmax] per hour (placeholders) and captures include ≥ X ms pre-trigger context (placeholder).
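The two-stage plan above can be quantified as a frequency ratio: how much more often does the candidate event fire near failures than in baseline traffic? The window counts below are illustrative.

```python
def funnel_ratio(hits_near_fail, windows_near_fail,
                 hits_baseline, windows_baseline):
    """Candidate-event rate near failures vs baseline; a ratio well above 1
    makes the event a good narrow-trigger condition."""
    fail_rate = hits_near_fail / windows_near_fail
    base_rate = hits_baseline / windows_baseline
    return fail_rate / base_rate if base_rate else float("inf")

# 9 hits in 10 failure windows vs 3 hits in 100 baseline windows
print(funnel_ratio(9, 10, 3, 100))  # about 30
```

Events with a high ratio graduate from the wide "funnel" trigger to the narrow one; events near 1.0 are background noise and should not become trigger conditions.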
The bug happens once per hour — sample rate is fine, but the capture still misses the “lead-up”
Likely cause: memory depth too small for the required pre-trigger window, or segmented capture not configured for rare events.
Quick check: compute required pre-trigger time = suspected lead-up interval (X s/min, placeholder); compare to available pre-trigger depth at current sample rate.
Fix: lower sample rate selectively (if allowed) to buy time span, enable segmented/deep memory modes, and trigger on a higher-level event closer to the failure.
Pass criteria: ≥ X seconds of pre-trigger is present in ≥ X% of captures (placeholders), and the causal event appears within the window consistently.
Analyzer vs scope timing does not align — what is the first reliable correlation method?
Likely cause: independent timebases, trigger latency differences, or missing common reference/marker between instruments.
Quick check: inject a reference pulse/marker GPIO visible to both tools; measure Δt between marker and the target event across K repeats (placeholder).
Fix: use shared trigger/marker routing, calibrate channel skew, and report fixed offset + jitter (min/typ/p99) instead of single-shot deltas.
Pass criteria: alignment error p99 ≤ X ns/µs (placeholder) and Δt distribution remains stable across tool re-arming cycles.
Same board, different analyzer gives different conclusions — what correlation check comes first?
Likely cause: different thresholds, sampling/time resolution tiers, decode assumptions (mode/baud), or probe/grounding differences.
Quick check: lock both tools to the same Vth, sample rate, trigger template, and decode settings; run an A/B pack of N identical transactions (placeholder) and compare event counts.
Fix: standardize capture templates (threshold/sampling/trigger/export) and require exports (CSV/JSON) to compute KPIs instead of screenshot-only judgments.
Pass criteria: KPI deltas (NAK rate, framing rate, gap p99) differ by ≤ X% across tools (placeholder) and root-cause classification matches.