
I2C Clock Stretching: Slave Delay, Master Timeouts & Recovery


Clock stretching is only “real” when a slave actively holds SCL low; this page shows how to separate true stretching from electrical/bridge artifacts, then enforce bounded timeouts and reliable recovery.

The goal is production-grade behavior: signature-based diagnosis, transparent topology checks, and measurable counters/pass criteria (X placeholders) from bring-up through mass production.

H2-1 · Scope & Quick Triage: Is it really clock stretching?

Goal: classify “SCL looks low for too long” into the correct bucket in under a minute. This prevents chasing clock stretching when the root cause is actually edge-rate, intermediate clamping, or master pacing.

Scope guard (to avoid cross-page overlap)
  • This page focuses on clock-stretch behavior, timeout policy, and robust error handling.
  • Detailed pull-up sizing math and full RC budgeting belong to the Open-Drain & Pull-Up Network subpage.
  • Deep multi-master arbitration and repeated-start edge cases belong to their dedicated subpages. Here they are mentioned only as false-positive discriminators.
Required signature (measurable, repeatable)
Treat “true clock stretching” as a measured extension of SCL-low time:
tLOW_EXT = observed SCL-low − expected SCL-low
  • Actor: SCL is held low by an external open-drain holder (typically the slave on that segment), not by the master’s own duty-cycle shaping.
  • Context: the extension tends to align with protocol boundaries (ACK/byte boundary or specific command response), not purely random jitter.
  • Consistency: the phenomenon is reproducible under the same command and can be confirmed at two probe points (near master vs near slave).
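The required-signature metric can be computed directly from captured edge timestamps. A minimal sketch in C, assuming the analyzer export provides microsecond timestamps for the SCL falling and rising edges of one low phase (the type and field names are illustrative, not a real capture API):

```c
#include <stdint.h>

/* Hypothetical capture record for one SCL low phase. */
typedef struct {
    uint32_t scl_fall_us;   /* SCL falling edge timestamp */
    uint32_t scl_rise_us;   /* next SCL rising edge timestamp */
} scl_low_capture_t;

/* tLOW_EXT = observed SCL-low - expected SCL-low (clamped at 0).
 * expected_low_us comes from the bus configuration (nominal clocking). */
static uint32_t tlow_ext_us(const scl_low_capture_t *c,
                            uint32_t expected_low_us)
{
    uint32_t observed = c->scl_rise_us - c->scl_fall_us;
    return (observed > expected_low_us) ? (observed - expected_low_us) : 0;
}
```

Run this over every low phase in a trace and log the resulting distribution, not a single value.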
Three common false positives (fast discrimination)
A) Slow edge / RC distortion (looks like “long low”)
  • Symptom: SCL “low” width changes with probe location; rising edge is a long ramp.
  • Quick check: compare near-master vs near-slave SCL. If “low width” shrinks near master, it is likely threshold-crossing delay rather than a real hold-low.
  • Conclusion bucket: electrical edge-rate problem (RC/EMI). Resolve in pull-up/network design pages.
B) Level shifter / isolator clamp (direction-control trap)
  • Symptom: the bus becomes “held” only when a specific segment/device is connected.
  • Quick check: temporarily bypass/short the intermediate device or move the slave to the same voltage domain. If the issue disappears instantly, the intermediate path is not transparent.
  • Conclusion bucket: intermediate clamp / non-transparent segment.
C) Master pacing (controller inserts waits; not a slave)
  • Symptom: similar “long low” appears across many slaves; depends on master configuration/load.
  • Quick check: switch driver mode (polling ↔ IRQ ↔ DMA) or change I²C peripheral timing mode. If the pattern tracks master settings, it is pacing rather than stretching.
  • Conclusion bucket: master-side timing/driver behavior.
3-step triage checklist (produce a conclusion label)
Step 1 — Identify who holds SCL low

Use two probe points (near master vs near slave or across the intermediate device). The holder is the segment where SCL remains low even when the upstream side attempts to release.

Output label: HOLDER = slave / intermediate / master-pace / unknown
Step 2 — Correlate with protocol phase

Determine whether the extension aligns with byte boundaries (ACK/data) or appears random. Boundary alignment strongly supports intentional stretching; random alignment suggests edge-rate/EMI artifacts.

Output label: PHASE = ACK boundary / data boundary / random
Step 3 — Check cross-device & cross-topology consistency

Repeat the same transaction across a second slave (or with the intermediate device bypassed). If the symptom follows a specific slave/segment, it is not a generic master pacing artifact.

Output label: CONSISTENCY = same across slaves / only one slave / only after shifter
Fast conclusion mapping
  • Likely true stretching: HOLDER=slave and PHASE=ACK/data boundary.
  • Likely edge-rate artifact: PHASE=random and the “low width” changes by probe point.
  • Likely intermediate clamp: CONSISTENCY=only after shifter.
  • Likely master pacing: symptom reproduces across multiple slaves and tracks controller/driver configuration.
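The fast conclusion mapping above can be encoded as a small classifier over the three triage labels. A sketch, assuming the labels are logged as the snake_case tokens shown below (all token spellings are illustrative; the rule order favors the most specific evidence first):

```c
#include <string.h>

typedef enum {
    CAUSE_TRUE_STRETCH,
    CAUSE_EDGE_RATE,
    CAUSE_INTERMEDIATE_CLAMP,
    CAUSE_MASTER_PACING,
    CAUSE_UNKNOWN
} cause_t;

/* Map HOLDER / PHASE / CONSISTENCY labels to a cause bucket,
 * following the "fast conclusion mapping" rules above. */
static cause_t triage(const char *holder, const char *phase,
                      const char *consistency)
{
    if (strcmp(consistency, "only_after_shifter") == 0)
        return CAUSE_INTERMEDIATE_CLAMP;
    if (strcmp(holder, "master-pace") == 0 ||
        strcmp(consistency, "same_across_slaves") == 0)
        return CAUSE_MASTER_PACING;
    if (strcmp(phase, "random") == 0)
        return CAUSE_EDGE_RATE;          /* confirm with probe-point check */
    if (strcmp(holder, "slave") == 0 &&
        (strcmp(phase, "ack_boundary") == 0 ||
         strcmp(phase, "data_boundary") == 0))
        return CAUSE_TRUE_STRETCH;
    return CAUSE_UNKNOWN;
}
```

This keeps triage outcomes machine-comparable across captures instead of living in free-text notes.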
Diagram: a practical triage tree—start from the symptom, probe at strategic points, then classify into a cause bucket.

H2-2 · What Clock Stretching Is (and is not)

This section establishes a single engineering definition used throughout the page: measurable, repeatable, and phase-aware. Without this, timeout policy and recovery logic become inconsistent across teams and tooling.

Engineering definition (portable across tools and teams)
  • Signal: stretching occurs on SCL (not SDA). SCL is held low longer than the master’s nominal clocking pattern.
  • Mechanism: the holder is an open-drain device on the bus segment (commonly the slave).
  • Observable metric: tLOW_EXT = measured SCL-low extension (distribution, not a single point)
  • Protocol context: typical stretching is command-dependent and aligns with byte/ACK boundaries more often than it appears as random timing noise.
Typical legitimate reasons (kept high-level to avoid overlap)
Internal latency
Conversion, compute, CDC synchronization, or NVM operations that must complete before a safe response.
Power-state transitions
Wake-up latency or safety gates (UVLO/thermal) that delay readiness.
Rate mismatch protection
A slave throttles the bus to avoid buffer overruns or to guarantee coherent register reads.
“Is NOT stretching” exclusion clauses (each with a fastest check)
Not #1 — Master intentionally slows SCL
Fast check: change master timing (rate/duty mode). If “long low” scales deterministically with settings across multiple slaves, it is master pacing.
Not #2 — Duty-cycle shaping by controller
Fast check: the low-time ratio remains fixed (patterned) regardless of command content; no command-dependent “latency signature” exists.
Not #3 — RC/threshold distortion (false low width)
Fast check: compare near-master vs near-slave and inspect the SCL rising edge. A long ramp indicates threshold-crossing delay rather than a stable hold-low plateau.
Diagram: normal SCL vs stretched SCL. Clock stretching is defined by a stable hold-low segment and a measurable extension (tLOW_EXT).

H2-3 · Timing Anatomy: Where stretching can appear inside a transaction

Clock stretching becomes actionable only after it is mapped to a transaction segment. This section defines a phase-aware signature so debug and timeout policy can target the right bucket (internal latency, intermediate clamp, or master pacing) without guessing.

Three anchors to label every event (produce tags, not opinions)
Anchor 1 · Alignment
Determine whether tLOW_EXT aligns to a byte/ACK boundary or appears random.
Output tag: ALIGN = addr_ack / data_ack / read_gap / pre_restart / random
Anchor 2 · Direction
Check whether stretching happens only on read, only on write, or both. This often separates “response preparation” from “commit operations”.
Output tag: DIR = read_only / write_only / both
Anchor 3 · Trigger coupling
Identify whether stretching is tied to a specific register/command. Strong coupling implies a functional latency bucket (conversion/NVM/security/wake) rather than generic bus integrity.
Output tag: TRIGGER = specific_reg / specific_cmd / any_access
Transaction segments (use the same labels in logs and captures)
  • Segment A: START → Address bits → Address ACK
  • Segment B (write): Data byte → Data ACK (repeat)
  • Segment C (read): Data byte → Master ACK/NACK (repeat); watch the read gap
  • Segment D: Repeated START / STOP boundaries (used here only as a signature marker, not as a full protocol tutorial)
Stretch signature table (position → likely bucket → first checks)
Signature: ALIGN=addr_ack (Segment A)
  • Likely bucket: wake-up gate, CDC sync, security gate
  • First check: does it happen only on first access after idle/power change?
  • Pass indicator: tLOW_EXT stays below X ms for this address ACK
Signature: DIR=write_only + ALIGN=data_ack (Segment B)
  • Likely bucket: NVM commit, erase/program protect, atomic config update
  • First check: does it correlate with specific writes (calibration/config/OTP)?
  • Pass indicator: timeout policy uses T_TXN_MAX = X ms and retry count N_RETRY = X
Signature: DIR=read_only + ALIGN=read_gap (Segment C)
  • Likely bucket: response preparation, conversion/filters, buffer fill
  • First check: does “kick then later read” eliminate the stretch signature?
  • Pass indicator: tLOW_EXT distribution stays bounded (p99 ≤ X ms)
Signature: ALIGN=pre_restart (Segment D)
  • Likely bucket: state-machine mismatch, bridge queueing, intermediate non-transparency
  • First check: split the combined operation into two transactions (only for diagnosis) and compare signatures.
  • Pass indicator: “pre-restart” stretch does not exceed X ms and does not leave the bus busy.
Signature: ALIGN=random (any segment)
  • Likely bucket: edge-rate artifact, EMI/crosstalk, master pacing
  • First check: compare near-master vs near-slave captures; inspect rising edge slope and threshold behavior.
  • Pass indicator: after mitigation, signature becomes boundary-aligned or disappears; “random” should not dominate logs.
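The signature table can be made table-driven in the driver or in log post-processing. A minimal sketch mapping ALIGN/DIR tags to the likely-bucket strings above (TRIGGER weighting omitted for brevity; all enum and string spellings are illustrative):

```c
#include <string.h>

typedef enum { ALIGN_ADDR_ACK, ALIGN_DATA_ACK, ALIGN_READ_GAP,
               ALIGN_PRE_RESTART, ALIGN_RANDOM } align_t;
typedef enum { DIR_READ_ONLY, DIR_WRITE_ONLY, DIR_BOTH } dir_t;

/* Likely bucket per the signature table above. */
static const char *likely_bucket(align_t align, dir_t dir)
{
    switch (align) {
    case ALIGN_ADDR_ACK:
        return "wake-up gate / CDC sync / security gate";
    case ALIGN_DATA_ACK:
        return (dir == DIR_WRITE_ONLY)
             ? "NVM commit / atomic config update"
             : "unclassified data-ack stretch";
    case ALIGN_READ_GAP:
        return (dir == DIR_READ_ONLY)
             ? "response preparation / buffer fill"
             : "unclassified read-gap stretch";
    case ALIGN_PRE_RESTART:
        return "state-machine mismatch / bridge queueing";
    case ALIGN_RANDOM:
        return "edge-rate artifact / EMI / master pacing";
    }
    return "unknown";
}
```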
Diagram: transaction anatomy as a signature map. Common insertion points are marked for quick alignment with logic analyzer captures.

H2-4 · Why slaves stretch: internal latency buckets (practical causes)

After locating the phase signature (H2-3), the next step is to assign the event to an internal latency bucket. Each bucket below provides: expected tLOW_EXT range (placeholder), typical trigger, and the fastest check. The intent is to shorten root-cause time, not to expand into unrelated protocol chapters.

Six practical latency buckets (each includes a fastest discriminator)
Bucket 1 · Register map / CDC sync
  • Typical triggers: reads requiring coherent snapshot; shadow register refresh; cross-clock sync.
  • Expected tLOW_EXT: ~X µs to X ms (placeholder).
  • Fastest check: repeated reads of the same register show a bounded, repeatable extension (tight distribution).
Bucket 2 · Sensor conversion / filter window
  • Typical triggers: on-demand measurement; averaging/filter pipeline; “read before ready”.
  • Expected tLOW_EXT: ~X ms (often ms-class; placeholder).
  • Fastest check: “kick then later read” eliminates the stretch signature; DRDY/INT-based flow yields stable bus timing.
Bucket 3 · NVM write / program protection
  • Typical triggers: EEPROM/flash commit; configuration lock; atomic update; wear-level gate.
  • Expected tLOW_EXT: ~X ms to X tens of ms (placeholder).
  • Fastest check: stretching correlates with specific write sequences; read-only traffic shows no similar signature.
Bucket 4 · Security / authentication gate
  • Typical triggers: CRC/signature check; challenge-response; secure counters; integrity verification.
  • Expected tLOW_EXT: ~X µs to X ms (placeholder, engine-dependent).
  • Fastest check: signature appears only on security commands and scales with payload/attempt count.
Bucket 5 · Low-power wake (sleep → active)
  • Typical triggers: first transaction after long idle; power-domain enable; clock tree startup.
  • Expected tLOW_EXT: ~X ms (placeholder; often “first-hit only”).
  • Fastest check: the first access stretches; subsequent accesses in a short window do not.
Bucket 6 · Protection gating (UVLO / thermal / fault)
  • Typical triggers: undervoltage/thermal boundary; fault handling; brown-out recovery; watchdog gating.
  • Expected tLOW_EXT: distribution widens (p95/p99 drift); use placeholders.
  • Fastest check: tLOW_EXT statistics correlate with temperature/voltage logs and error counters rise together.
Deliverable: bucket → expected range (X) → typical triggers (short list)

Use the buckets above to define a system-specific timeout contract: T_STRETCH_MAX = X ms, T_TXN_MAX = X ms, N_RETRY = X. The purpose is to cap worst-case latency while keeping legitimate device behavior functional.
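One way to carry the contract through firmware is a plain configuration struct per device class. A sketch with illustrative numbers standing in for the X placeholders (real values must come from the measured tLOW_EXT distribution per bucket, not from a single capture):

```c
#include <stdint.h>

/* System-specific timeout contract. Numbers below are illustrative
 * placeholders; derive real values from measured per-bucket data. */
typedef struct {
    uint32_t t_stretch_max_ms;  /* single SCL low-hold bound */
    uint32_t t_txn_max_ms;      /* whole-transaction wall clock */
    uint8_t  n_retry;           /* bounded retry count */
    uint32_t backoff_ms;        /* base backoff between retries */
} i2c_timeout_contract_t;

/* Example: a device in the NVM-commit bucket (hypothetical values). */
static const i2c_timeout_contract_t eeprom_contract = {
    .t_stretch_max_ms = 10,     /* cover the commit busy window */
    .t_txn_max_ms     = 50,     /* guard multi-byte write accumulation */
    .n_retry          = 3,
    .backoff_ms       = 5,
};
```

Keeping the contract as data (rather than scattered literals) makes per-bucket tuning from field statistics straightforward.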

Diagram: a high-level internal pipeline. Waiting at the gate can extend SCL-low until the response buffer is ready.

H2-5 · Master-side policy: timeouts, retries, and bus release rules

Clock stretching must be treated as bounded waiting. The master should enforce layered time limits, classify failures, and convert every timeout into a recoverable, observable event instead of a deadlock.

Policy guardrails (layered limits)
T_STRETCH_MAX (single low-hold bound)
Maximum allowed tLOW_EXT for a single stretching event. Prevents one hold from stalling the system.
Placeholder: T_STRETCH_MAX = X ms
T_TRANSACTION_MAX (transaction wall clock)
Maximum wall-clock time for a full transaction (START→end condition). Captures multi-stretch accumulation and controller stalls.
Placeholder: T_TRANSACTION_MAX = X ms
N_RETRY + backoff (avoid retry storms)
Retry count and delay policy to absorb transient busy windows while preventing feedback loops under fault conditions.
Placeholders: N_RETRY = X, BACKOFF = X ms (± jitter)
Failure classification (controls retry vs escalation)
Class R · Retryable
Boundary-aligned, command-coupled busy windows (conversion/NVM busy/security gate). Distribution stays bounded.
Action: retry with backoff, keep counters and snapshots.
Class S · Suspect
Statistics drift (p99 grows), partial reproducibility, or mixed signatures. Often correlates with environment or intermediary behavior.
Action: limited retries then escalate recovery steps; consider feature downgrade.
Class H · Hard / Non-retryable
Bus stuck low (SCL/SDA held), controller error state, or persistent BUSY across transactions. Retry risks a storm.
Action: immediate strong recovery (bus clear / controller reset / isolate / power-cycle) and raise an error event.
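The R/S/H classification can be reduced to a few flags the driver already tracks. A hedged sketch (real systems would also consult rolling statistics and the signature tags from H2-3):

```c
#include <stdbool.h>

typedef enum { CLASS_R, CLASS_S, CLASS_H } fail_class_t;

/* Classify a timeout event per the Class R / S / H rules above.
 * Flag names are illustrative driver-state inputs. */
static fail_class_t classify_failure(bool bus_stuck_low,
                                     bool controller_error,
                                     bool boundary_aligned,
                                     bool p99_drifting)
{
    if (bus_stuck_low || controller_error)
        return CLASS_H;          /* hard: no retry loop, strong recovery */
    if (!boundary_aligned || p99_drifting)
        return CLASS_S;          /* suspect: limited retries, escalate */
    return CLASS_R;              /* retryable: bounded busy window */
}
```

The class then selects the retry/backoff policy and the entry point on the recovery ladder.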
Deliverable: master strategy table (scenario → timeouts → retry → fail action → log fields)
Scenario · DIR=read_only, ALIGN=read_gap, TRIGGER=specific_cmd (Class R)
  • Timeout: T_STRETCH_MAX=X ms; T_TRANSACTION_MAX=X ms
  • Retry: N_RETRY=X; backoff=X ms (± jitter)
  • Fail action: abort transaction → recover step 1 → retry; escalate if repeat rate rises
  • Log fields: addr, rw, ALIGN, TRIGGER, tLOW_EXT_max, txn_time, retry_count, recovery_step
Scenario · DIR=write_only, ALIGN=data_ack, TRIGGER=specific_reg (Class R/S)
  • Timeout: T_STRETCH_MAX=X ms; T_TRANSACTION_MAX=X ms (guard multi-byte writes)
  • Retry: N_RETRY=X; backoff=X ms; avoid infinite retry on write side-effects
  • Fail action: abort → recover step 1/2; escalate to controller reset if BUSY persists
  • Log fields: reg/cmd id (if known), tLOW_EXT_p99, cnt_timeouts, last_recovery_result
Scenario · Bus stuck low (SCL or SDA held) (Class H)
  • Timeout: detect low-hold beyond X ms → treat as hard fault (no retry loop)
  • Retry: N_RETRY=0 (or minimal) for immediate escalation
  • Fail action: bus clear pulses → controller re-init → isolate segment or power-cycle (if supported)
  • Log fields: line_state(SCL/SDA), recovery_step, recovery_outcome, cnt_bus_clear, cnt_ctrl_reset
Scenario · ALIGN=random, TRIGGER=any_access (Class S)
  • Timeout: keep tight T_STRETCH_MAX=X ms; enforce T_TRANSACTION_MAX=X ms
  • Retry: small N_RETRY=X; backoff with jitter to avoid correlated failures
  • Fail action: recover step ladder; if recurrence rate exceeds X%, enter degrade mode and alert
  • Log fields: p95/p99 drift, temp/voltage snapshot, cross-device consistency flag

Recommendation: store both “event snapshots” and “rolling statistics” so policies can be tuned using field data.
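A rolling-statistics store can be as small as a fixed ring buffer per bus. A sketch tracking recent tLOW_EXT samples and their windowed maximum (a p95/p99 estimate could be layered on the same buffer; window size is an arbitrary example):

```c
#include <stdint.h>

#define STAT_WINDOW 32          /* illustrative window size */

typedef struct {
    uint32_t samples[STAT_WINDOW];
    uint8_t  head, count;
} tlow_stats_t;

/* Record one tLOW_EXT sample (microseconds). */
static void stats_add(tlow_stats_t *s, uint32_t tlow_ext_us)
{
    s->samples[s->head] = tlow_ext_us;
    s->head = (uint8_t)((s->head + 1) % STAT_WINDOW);
    if (s->count < STAT_WINDOW)
        s->count++;
}

/* Windowed maximum: the tLOW_EXT_max log field. */
static uint32_t stats_max(const tlow_stats_t *s)
{
    uint32_t m = 0;
    for (uint8_t i = 0; i < s->count; i++)
        if (s->samples[i] > m)
            m = s->samples[i];
    return m;
}
```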

Diagram: master policy as a bounded state machine with escalation steps and mandatory observability.

H2-6 · Robust firmware driver design: non-blocking state machine

A robust I²C driver must never block indefinitely. Waiting is expressed as state, not as a busy loop. The driver must be watchdog-friendly, observable, and idempotent under retries so that higher layers can take actionable decisions.

Design pillars (must-haves)
Watchdog-friendly
Every wait point must yield control and resume safely. Recovery paths cannot depend on a full system reset.
Observable
Every timeout emits counters and a snapshot (addr, phase signature, tLOW_EXT, recovery step, result code).
Idempotent under retry
Retries must not amplify device state. Write-like operations require conservative retry rules and clear error reporting.
Actionable error codes (driver → upper layer)
BUS_BUSY / BUS_STUCK
Indicates the bus is not idle or lines are held. Upper layers should not queue blindly; trigger recovery or degrade mode.
STRETCH_TIMEOUT / TXN_TIMEOUT
Bounded waiting exceeded. Upper layers may retry with backoff (Class R) or escalate recovery (Class S/H).
NACK
Address/data not acknowledged. Upper layers should differentiate “device absent” vs “busy window” using phase signatures and counters.
ARB_LOST / BERR
Controller-level events. Driver should reset or re-init the controller before returning an actionable status upstream.
Deliverable: driver contract checklist (interfaces + counters + snapshots)
Required interfaces (minimum set)
  • submit_xfer(desc): includes addr/rw/len/flags and a policy id
  • cancel(bus_id): abort current transfer and enter RECOVER
  • service(): advances the non-blocking state machine (polling path)
  • get_stats(): returns counters + last snapshot (phase signature + timing)
Required counters (examples)
  • cnt_stretch_timeout / cnt_txn_timeout / cnt_retries
  • cnt_bus_clear / cnt_ctrl_reset / cnt_nack
  • tLOW_EXT_max (window) / p99 estimate (optional placeholder)
Snapshot fields (on every timeout)
  • addr, rw, phase signature (ALIGN/DIR/TRIGGER if known)
  • tLOW_EXT, txn_time, retry_count, recovery_step
  • result_code, line_state (SCL/SDA), timestamp

Contract goal: upper layers can decide “retry vs escalate” using structured evidence, not opaque failures.
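The wait-as-state principle can be shown with a reduced service loop. A sketch covering only the RUN → WAIT_SCL → TIMEOUT path; the caller supplies the SCL line state and a millisecond tick, and performs snapshot/recovery when a timeout is reported (all names and the reduced state set are illustrative):

```c
#include <stdint.h>

typedef enum { ST_IDLE, ST_RUN, ST_WAIT_SCL, ST_TIMEOUT, ST_RECOVER } drv_state_t;

typedef struct {
    drv_state_t state;
    uint32_t    wait_start_ms;       /* when the low-hold began */
    uint32_t    t_stretch_max_ms;    /* bounded-wait policy */
    uint32_t    cnt_stretch_timeout; /* observability counter */
} i2c_drv_t;

/* One non-blocking service step; returns 1 if a stretch timeout was
 * raised on this call (caller must snapshot and enter recovery). */
static int drv_service(i2c_drv_t *d, int scl_low, uint32_t now_ms)
{
    switch (d->state) {
    case ST_RUN:
        if (scl_low) {               /* slave may be stretching */
            d->state = ST_WAIT_SCL;
            d->wait_start_ms = now_ms;
        }
        return 0;
    case ST_WAIT_SCL:
        if (!scl_low) {
            d->state = ST_RUN;       /* hold released: resume */
        } else if (now_ms - d->wait_start_ms > d->t_stretch_max_ms) {
            d->state = ST_TIMEOUT;   /* bounded wait exceeded */
            d->cnt_stretch_timeout++;
            return 1;
        }
        return 0;
    default:
        return 0;                    /* IDLE/TIMEOUT/RECOVER not shown */
    }
}
```

Each call returns immediately, so the watchdog keeps getting serviced and the state survives across ticks.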

Diagram: layered responsibilities. Timeouts/watchdog yields/logging belong in the driver layer for non-blocking operation.

H2-7 · Hardware compatibility traps: controllers/bridges that break stretching

Clock stretching only works when the entire link is transparent to SCL low-hold. Any node that regenerates, queues, re-times, or latches direction can turn stretching into bus errors, false timeouts, or “invisible” timing changes.

Stretch transparency model (what the link does to SCL)
Transparent
Preserves the same SCL behavior end-to-end. A slave low-hold remains a low-hold across the node.
Conditionally transparent
Transparent only in specific modes/speeds/direction states. Configuration drift can silently break stretching.
Non-transparent
Regenerates timing (queue/re-time/protocol conversion). SCL after the node is no longer the same “wire behavior”.
Field symptoms that indicate a node is breaking stretching
  • Direct connection works, but inserting a node causes bus error / immediate timeout.
  • Stretch signatures lose byte-boundary alignment after insertion (queue/re-time artifacts).
  • Different controller modes treat the same low-hold as BERR/timeout rather than wait.
  • Near-side and far-side SCL disagree on whether the line is being held low (non-transparent behavior).
Must-do compatibility verification (insertion changes behavior)
Two-point simultaneous probe
Measure SCL both before and after the inserted node on the same transaction. A real low-hold must propagate through a transparent link.
Hold-through test
Trigger a known “busy window” transaction (device-specific command) and verify that SCL low-hold remains low across the chain.
Boundary preservation check
Confirm whether low-hold aligns to ACK / byte boundaries. Loss of alignment after insertion suggests queue/re-time or internal state-machine translation.
Mode-switch check (controller)
Repeat the same test under different controller modes (filters, hardware state machine options). A true-compatibility path does not change semantics from “wait” to “bus error”.
Deliverable: Stretch compatibility matrix (roles → transparency → must-test items)
Role: Master / Controller
  • Transparency: must treat low-hold as wait (not bus error)
  • Break mechanism: “no-stretch” HW mode, fixed timeout, misclassified error flags
  • Must-test: mode-switch check + hold-through test + error flag semantics
Role: Buffer / Repeater
  • Transparency: transparent or conditional (depends on architecture)
  • Break mechanism: edge shaping that changes timing visibility, low-hold not propagated
  • Must-test: two-point probe + boundary preservation
Role: Mux / Switch
  • Transparency: typically transparent, but can be conditional under hot-switch states
  • Break mechanism: channel isolation timing, “stuck channel” that holds a line
  • Must-test: hold-through test per channel + bypass comparison
Role: Isolator
  • Transparency: often conditional (depends on how edges/low states are transferred)
  • Break mechanism: internal state replication and delay that distorts low-hold semantics
  • Must-test: two-point probe + boundary preservation + bus-idle detection across barrier
Role: Bridge (I²C↔SPI/UART/other)
  • Transparency: frequently non-transparent (queue/re-time/command staging)
  • Break mechanism: internal FIFO hides timing; SCL may be regenerated or decoupled
  • Must-test: hold-through + boundary preservation + “insertion changes SCL behavior” rule
Role: Slave
  • Transparency: source of the low-hold; must produce a stable low platform
  • Break mechanism: undefined behavior under brown-out or internal fault state
  • Must-test: repeatability under the same command; cross-device comparison on same bus

Scope boundary: this section focuses on transparency and validation, not device selection or pull-up design.

Diagram: topology and insertion points that may break clock-stretch transparency (bridges and direction-latched nodes are highest risk).

H2-8 · Stretching vs electrical effects: how to avoid false positives

A true stretching event is a stable low-hold driven by a device on SCL. Electrical effects can imitate “long low time” through delayed threshold crossing, noise spikes, or reference shifts. The goal is to separate wire-held-low from measurement/edge artifacts.

Minimal mechanism set (only what matters for stretching vs false positives)
Delayed threshold crossing (slow rise)
The line is rising, but the logic threshold is reached later. Tools may report a longer “low” even without a device holding SCL down.
Noise / crosstalk spikes
Short negative spikes or ringing can be decoded as extra low segments. These are usually narrow and not boundary-aligned.
Ground bounce / common-mode shift
Reference movement changes comparator thresholds. The “low duration” can vary with probe reference method and load switching.
Deliverable: false-positive checklist (3 waveform points + 2 control experiments)
Waveform point 1 · low platform stability
True stretching shows a clean, stable low platform. Electrical artifacts often show ringing/spikes during “low”.
Waveform point 2 · edge shape (rise-time)
Slow rise produces a long slope through the threshold region. A long slope can mimic “low extended” in decoded timing.
Waveform point 3 · boundary alignment
True stretching is commonly correlated with ACK/byte boundaries. Random alignment suggests noise or non-transparent links.
Control experiment A · near vs far (same transaction)
Probe SCL near the controller and near the slave. A real low-hold should appear as a low-hold at both locations. A rise-time artifact usually changes with distance.
Control experiment B · bypass / simplify the chain
Temporarily remove the suspected node or shorten the link. True stretching keeps a consistent signature; electrical artifacts often shrink or disappear.

Scope boundary: this section explains false-positive discrimination, not pull-up sizing or full SI/EMC design.

Diagram: “true stretching” produces a stable low platform; “slow rise-time” shifts threshold timing and can mimic extended low time.

H2-9 · Debug & instrumentation: what to capture and how to trigger

Debugging clock stretching is less about “seeing SCL low” and more about capturing the first failing transaction and aligning it with command context and timeout semantics. The workflow is: Trigger → Tag → Correlate → Decide bucket.

Deliverable: Minimum debug bundle (6 fields that classify the failure)
ts · addr (R/W) · reg/cmd · len + dir · timeout bucket · attempt#

These six fields tie together transaction context (address/command/length), policy outcome (timeout bucket), and repeatability (attempt#), enabling fast mapping to “true stretch vs compatibility break vs electrical false positive”.

Logic analyzer capture recipe (focus: the first failing transaction)
Trigger
  • SCL low > X (threshold placeholder for “unexpected hold”).
  • Pre-trigger buffer enabled to preserve the lead-in bytes.
  • Stop after first hit to avoid later retries contaminating context.
Filter
  • Filter by addr first (target device).
  • Then filter by reg/cmd (the command that triggers the slow path).
  • Capture both read and write variants for comparison.
What to mark
  • Transaction boundaries (START/STOP, repeated START).
  • Byte boundaries (address, ACK, each data byte).
  • The first point where SCL stays low beyond expectation.
Firmware logging schema (make timeouts actionable, not just “failed”)
Per-transaction log line
  • ts, addr, reg/cmd, len, dir
  • timeout bucket: T_STRETCH_MAX / T_TRANSACTION_MAX / BUS_STUCK
  • attempt# and backoff tier
  • result: OK / NACK / ARB_LOST / BERR / TIMEOUT
Correlation hooks
  • Include a monotonically increasing txn_id to match logs with analyzer captures.
  • Record the maximum observed SCL low-hold duration if available (optional).
  • Record whether bus was idle before/after (optional exit criteria hook).
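The per-transaction line can be emitted from a snapshot struct with a single snprintf, which keeps formatting out of the hot path. A sketch; the field set follows the schema above, while the struct name and exact wire format are assumptions:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Per-timeout snapshot matching the logging schema above. */
typedef struct {
    uint32_t    txn_id, ts_ms;
    uint8_t     addr;
    char        rw;              /* 'R' or 'W' */
    const char *bucket;          /* T_STRETCH_MAX / T_TXN_MAX / BUS_STUCK */
    uint8_t     attempt;
    const char *result;          /* OK / NACK / ARB_LOST / BERR / TIMEOUT */
} i2c_log_t;

/* Format one log line into a caller-provided buffer. */
static int log_format(char *out, unsigned cap, const i2c_log_t *e)
{
    return snprintf(out, cap,
        "txn=%lu ts=%lu addr=0x%02X rw=%c bucket=%s attempt=%u result=%s",
        (unsigned long)e->txn_id, (unsigned long)e->ts_ms,
        (unsigned)e->addr, e->rw, e->bucket,
        (unsigned)e->attempt, e->result);
}
```

The monotonically increasing txn_id in each line is what lets a parser align firmware logs with analyzer captures.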
Signals to capture (stretching-focused minimum set)
Must-capture
SCL · SDA
Strongly correlated (when present)
VDD (slave) · RESET · VDD (bridge)
Anti-false-positive hook

Capture SCL near the controller and near the slave on the same transaction to detect non-transparent nodes and rise-time artifacts.

Diagram: use a single pipeline to align analyzer triggers and firmware logs, then map the failure into a minimal set of actionable buckets.

H2-10 · Recovery playbook: from stretch timeout to hung-bus recovery

Recovery should be an ordered escalation ladder, not an unlimited wait or blind retry loop. Each step defines a goal, a risk, an escalation condition, and a pass criterion to exit cleanly.

Deliverable: Recovery ladder (actions → risks → escalation → pass criteria)
L1 End transaction · backoff · retry
  • Goal: recover from transient “busy” windows without altering bus ownership.
  • Risk: repeated retries can hide the first failure and increase latency.
  • Escalate when: N_RETRY exceeded or repeated timeouts on the same command.
  • Pass criteria: transaction completes and bus returns to idle.
L2 GPIO release · clock pulses · STOP
  • Goal: forcibly unwind a stuck slave state machine and return the bus to idle.
  • Risk: multi-master conflicts or double-driving if controller/GPIO ownership is not exclusive.
  • Escalate when: SCL/SDA do not both return high after pulses/STOP.
  • Pass criteria: bus-idle holds and a health probe transaction succeeds.
L3 Segment isolate · power-cycle · reset
  • Goal: recover from hard faults by isolating a stuck segment or resetting the offender.
  • Risk: device state loss, partial configuration, or business-impacting resets.
  • Escalate when: L2 fails, or the bus is held low indefinitely (hung-bus).
  • Pass criteria: bus-idle holds + health probe passes + event is logged with recovery level.
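The ladder above reduces to a small decision function: stay at a level while its pass criteria hold, escalate only after bounded attempts, and terminate rather than loop forever. This is a sketch with illustrative names (`recovery_level_t`, `N_RETRY` are placeholders, not a standard API).

```c
#include <stdbool.h>

/* L1→L2→L3 escalation ladder as a pure decision function. */
typedef enum {
    REC_L1_RETRY,          /* end transaction · backoff · retry */
    REC_L2_GPIO_RELEASE,   /* GPIO release · clock pulses · STOP */
    REC_L3_ISOLATE_RESET,  /* segment isolate · power-cycle · reset */
    REC_FAILED             /* report upward; no further automatic action */
} recovery_level_t;

#define N_RETRY 3  /* placeholder: tune per product */

/* Given the current level and whether its pass criteria were met,
 * return the next level to attempt. */
recovery_level_t recovery_next(recovery_level_t cur, bool pass, int attempts)
{
    if (pass)
        return cur;  /* recovered at this level: do not escalate */
    switch (cur) {
    case REC_L1_RETRY:
        /* escalate only after N_RETRY bounded attempts on the same command */
        return (attempts >= N_RETRY) ? REC_L2_GPIO_RELEASE : REC_L1_RETRY;
    case REC_L2_GPIO_RELEASE:
        return REC_L3_ISOLATE_RESET;
    case REC_L3_ISOLATE_RESET:
    default:
        return REC_FAILED;  /* bounded: never loop L3 indefinitely */
    }
}
```

Keeping the escalation decision in one pure function makes it trivially unit-testable and keeps the “no unlimited wait, no blind retry loop” policy auditable.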
Recovery exit criteria (do not return to normal operation without these)
Exit 1 · bus idle holds

SCL=HIGH and SDA=HIGH continuously for ≥ X (placeholder). This prevents returning to a half-recovered state.

Exit 2 · health probe transaction

Perform one safe read/write probe (e.g., status/ID register or a known side-effect-free access) and require a clean ACK path.

Exit 3 · event accounting

Record recovery level (L1/L2/L3), timeout bucket, attempt#, and whether bus-idle and health probe passed. This supports trend analysis.
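Exit 1 can be sketched as a check over a sampled capture window: every sample of SCL and SDA must be high before recovery is declared complete. The sample arrays here stand in for periodic GPIO reads (an assumption for illustration, not a real HAL).

```c
#include <stdbool.h>
#include <stddef.h>

/* Exit 1 sketch: verify the bus stayed idle (SCL=1 and SDA=1) across every
 * sample of the hold window before returning to normal operation. */
bool bus_idle_held(const int *scl, const int *sda, size_t n_samples)
{
    for (size_t i = 0; i < n_samples; i++)
        if (!scl[i] || !sda[i])
            return false;  /* any low sample → half-recovered, do not resume */
    return true;
}
```

On real hardware the window length corresponds to the “≥ X” placeholder above, sampled at a rate fast enough to catch a brief re-assertion of the stuck state.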

Recovery ladder: from timeout to hung-bus recovery
Escalate only when the previous level fails its pass criteria (still timeout, bus not idle). L1 Retry / backoff (end txn · delay · retry) → L2 GPIO release (clock pulses · STOP; requires exclusive bus ownership) → L3 Isolate / reset (segment off · power-cycle; highest-risk level). Pass criteria (exit conditions): bus idle ≥ X and health probe OK. Record the recovery level and outcome to prevent recurring “recover but fragile” systems.
Diagram: a controlled escalation ladder with explicit exit criteria (bus idle + health probe) to avoid partial recovery states.

Engineering Checklist (Design → Bring-up → Production)

This section turns repeated-start + page-write requirements into a practical, tick-box checklist across the full lifecycle. Each item should have evidence (capture/log) and a bounded failure action (timeout + cleanup).

Design checklist (datasheet → firmware rules)

  • Freeze datasheet fields (per EEPROM): page size = X bytes, address width = X-bit, tWR(max) = X ms, busy behavior supports ACK polling (Yes/No), write-protect pin behavior (WP / hardware block protect).
  • Write chunking rule (mandatory): one page-write transaction must never span a page boundary; payload_len ≤ page_remaining.
  • Write completion rule (mandatory): page write ends with STOP to trigger internal programming, then poll until ACK or timeout.
  • Polling budget (bounded): poll_interval = X, poll_max = X, total_wait ≤ X ms. Persistent NACK outside the post-write window must escalate (not treated as “busy forever”).
  • Read termination rule (mandatory): burst read ends with NACK(last) then STOP/bus-free.
  • Failure cleanup rule (mandatory): any error branch must end in STOP (or controller-defined bus-free), and must be time-bounded (no infinite waits).
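The mandatory chunking rule reduces to one line of arithmetic: the legal payload for a write starting at `mem_addr` is the smaller of the bytes left and the bytes remaining in the current page. A minimal sketch, assuming a 64-byte page (`PAGE_SIZE` is a placeholder; take the real value from the datasheet):

```c
#include <stdint.h>
#include <stddef.h>

/* Chunking rule: one page-write transaction must never span a page boundary.
 * PAGE_SIZE is a placeholder (e.g., 64 for a 24LC256); verify per device. */
#define PAGE_SIZE 64u

/* Largest legal payload length for a write starting at mem_addr:
 * min(bytes_left, bytes remaining in the current page). */
size_t page_chunk_len(uint16_t mem_addr, size_t bytes_left)
{
    size_t page_remaining = PAGE_SIZE - (mem_addr % PAGE_SIZE);
    return (bytes_left < page_remaining) ? bytes_left : page_remaining;
}
```

For example, writing 100 bytes starting at address 0x3F with a 64-byte page yields chunks of 1, 64, and 35 bytes, so no single transaction crosses a page boundary (which would otherwise wrap and silently overwrite the start of the page).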

Concrete EEPROM material examples (fill X fields from datasheets)

Microchip 24LC256 / 24AA256 / 24FC256 · STMicro M24C64 / M24M02 · onsemi CAT24C256 · ROHM BR24G256 (verify package/suffix/availability)

Bring-up checklist (analyzer evidence)

  1. Sr placement: appears after pointer/register bytes and before the read address phase.
  2. Re-address after Sr: Sr is immediately followed by ADDR + R/W.
  3. NACK(last): the final read byte is NACK, then STOP/bus-free.
  4. STOP after page write: payload ends with STOP before polling begins.
  5. Busy window looks sane: post-write polling shows NACK(s) transitioning to ACK within X_timeout.
  6. No stuck-low after cleanup: SDA/SCL return high after STOP; no long low stretches.

Worst-case bring-up exercises

  • Full-page writes: write exactly page_size bytes repeatedly, then verify readback.
  • Boundary stress: attempt a cross-page payload in test code and confirm firmware chunks it (no wrap/overwrite).
  • Power-fail rehearsal: interrupt power during a write and validate boot-time detection + recovery policy.

Production checklist (BIST + counters)

Minimal BIST sequence (recommended)

  • Write pattern → STOP (trigger tWR) → ACK poll until done → Readback → CRC/compare → Log counters.
  • Use at least: one full-page write, one multi-page (chunked) write, and one random-read verification pass.

Required counters (minimum set)

  • poll_count_avg / poll_count_max (distribution matters more than a single average)
  • poll_wait_ms_avg / poll_wait_ms_max
  • nack_count_total (optionally classify: busy-NACK vs unexpected-NACK)
  • timeout_count
  • readback_error_count (compare/CRC mismatch)
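Because the distribution matters more than the average, the counter update should track the running maximum alongside the sum. A minimal sketch (struct and function names are illustrative, mirroring the counter list above):

```c
#include <stdint.h>

/* Minimal BIST counter set; field names mirror the list above but are
 * illustrative, not a standard API. */
typedef struct {
    uint32_t samples;
    uint32_t poll_count_sum, poll_count_max;
    uint32_t poll_wait_ms_sum, poll_wait_ms_max;
    uint32_t nack_count_total, timeout_count, readback_error_count;
} eeprom_bist_counters_t;

/* Record one ACK-polling episode: polls taken and total wall time waited. */
void bist_record_poll(eeprom_bist_counters_t *c, uint32_t polls, uint32_t wait_ms)
{
    c->samples++;
    c->poll_count_sum += polls;
    if (polls > c->poll_count_max) c->poll_count_max = polls;
    c->poll_wait_ms_sum += wait_ms;
    if (wait_ms > c->poll_wait_ms_max) c->poll_wait_ms_max = wait_ms;
}

uint32_t bist_poll_count_avg(const eeprom_bist_counters_t *c)
{
    return c->samples ? c->poll_count_sum / c->samples : 0;
}
```

The max fields are what catch a drifting tWR tail long before the average moves, which is the whole point of the poll_max pass criterion below.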

Pass criteria (placeholders; tune per product)

  • poll_max < X ms (derived from tWR(max) + margin)
  • timeout_rate < X ppm (per lot / per shift)
  • readback_error = 0 (or < X ppm if rework policy allows)
Production test flow: Write → STOP → Poll → Readback → Compare → Log
WRITE pattern → STOP (triggers tWR) → ACK POLL (NACK… → ACK) → READBACK → COMPARE (CRC / match) → LOG counters → PASS/FAIL (fail on timeout or mismatch). Evidence: a NACK window followed by ACK; counters capture the distribution, not just a single pass/fail.
Production flow: STOP triggers tWR, polling confirms completion, readback validates data, and counters capture reliability trends.

Applications & IC Selection Notes (Strictly relevant to Sr + Page Write)

Only use-cases that depend on combined transactions (write pointer → Sr → read) and EEPROM page-write behavior are listed here. Selection notes focus on the minimum required device/controller capabilities for stable, bounded operation.

Applications (Sr + page write strongly relevant)

1) EEPROM parameter storage (config / serial / calibration)

Templates: Random read: S → AddrW → WordAddr → Sr → AddrR → Data… → NACK(last) → P · Page write: S → AddrW → WordAddr → Payload… → P → Poll(NACK…ACK)

Material examples: Microchip 24LC256 / 24AA256 · ST M24C64 / M24M02 · onsemi CAT24C256 · ROHM BR24G256 (verify suffix/package).

2) Sensor register reads (burst reads, multi-register, FIFO drains)

Template: S → AddrW → Reg → Sr → AddrR → Data… → NACK(last) → P (Sr keeps the combined transaction semantics and avoids relying on vendor-specific “pointer retention after STOP”.)

Material examples: Bosch BME280 · TI TMP117 · Microchip MCP9808 · TI INA219 (verify address map + auto-increment behavior).

3) PMIC / clock / system-control register access (status read, fault clear)

Template: write register address → Sr → read status (burst if needed), then NACK(last) + STOP.

Material examples: Silicon Labs Si5351 · Microchip MCP7940N · NXP PCF8563 · TI TPS65217 (verify register pointer behavior).

IC selection notes (capabilities that matter for Sr + page write)

EEPROM requirements (page write stability)

  • Page size + wrap behavior: firmware chunking must match device page boundaries.
  • tWR(max) and busy behavior: polling and timeout budgets must be based on worst-case datasheet limits.
  • Address width: 8/16-bit word address impacts the pointer phase length and templates.
  • Write protection: WP pin / block protection should align with field update policy.

Examples: Microchip 24LC256 / 24AA256 / 24FC256 · ST M24C64 / M24M02 · onsemi CAT24C256 · ROHM BR24G256 (verify ordering codes).

MCU/SoC I²C controller requirements (combined transactions + bounded cleanup)

  • Repeated START support: must generate Sr reliably (capture must match the template).
  • Last-byte NACK control: hardware/driver must guarantee NACK(last) at the end of burst reads.
  • Timeout + stuck detection: bounded waits and forced cleanup (no infinite busy loops).
  • Error visibility: interrupts/counters for NACK/timeout/bus errors are strongly preferred for production telemetry.

Concrete MCU examples (controller capability must be confirmed by TRM + capture): ST STM32G031K8 · ST STM32L052C8 · NXP LPC55S16 · Microchip ATSAMD21G18A · TI MSP430FR2355.

“Bus health” telemetry (optional, production-friendly)

  • Prefer platforms that can log NACK bursts, timeout counts, and controller reset counts.
  • Treat capture evidence as truth: verify actual Sr/STOP behavior on the wire, not only via driver API assumptions.
Selection decision mini-flow (strictly Sr + page write)
Need a combined read (write pointer → Sr → read)? Then the controller must provide Sr support, NACK(last) control, timeout + cleanup, and capture evidence matching the template. Need EEPROM page write (STOP triggers tWR; poll until ACK)? Then verify the EEPROM’s page size, tWR(max), and busy ACK-polling behavior. Add telemetry: poll / NACK / timeout counters. Output: EEPROM critical specs + I²C controller critical capabilities + telemetry hooks.
Mini-flow: select EEPROM based on page-write behavior (page size, tWR(max), polling), and select controller based on Sr/NACK(last)/timeout behavior verified by capture.


H2-13 · FAQs (clock stretching troubleshooting)

These FAQs only close long-tail troubleshooting. Each answer is intentionally short and actionable with measurable pass criteria (X placeholders).

SCL low holds for ~X ms, but only on one register read—why? Signature-locked stretching usually maps to a specific internal “slow-path” bucket.
Likely cause That specific register read triggers a slow internal path (e.g., sensor conversion, FIFO refill, NVM access, CDC sync), so the slave holds SCL low until data is ready.
Quick check Repeat the same read X times and confirm the stretch aligns to a consistent byte boundary (address/ACK vs data/ACK). Then run a “nearby” register read: if only one register stretches, it is command-triggered.
Fix Redesign access as “kick + later read” (two short transactions), or use INT/DRDY when available. If stretching must remain, bound it with T_STRETCH_MAX and log a per-command stretch histogram.
Pass criteria For that register: stretch_low_hold ≤ X ms (p99), timeout_rate ≤ X per 10k reads, and retries do not increase stretch_count beyond X%.
Works on bench, fails with a bridge/extender—first transparency check? Many bridges/extenders are not timing-transparent; prove it with probe A/B.
Likely cause The inserted bridge/extender buffers or re-times SCL/SDA so slave-held SCL low is not propagated (or is reshaped), causing the master to see bus error/timeout.
Quick check Use a known “stretching command” and probe SCL on both sides of the node (A: master-side, B: slave-side). If slave-side shows a flat low-hold but master-side does not, transparency is broken.
Fix Replace or reconfigure the node to a mode documented as clock-stretch transparent, or redesign to avoid stretching (INT/DRDY or kick+later-read). Keep T_TXN_MAX bounded regardless.
Pass criteria Under the same command: A/B probes match (stretch visible on both sides), timeout_rate ≤ X per 10k, and recovery_count ≤ X per 1k transactions.
Timeout triggers, but retry makes it worse—what state-machine bug is common? The most common failure is “retry without cleanup,” which compounds bus/device state.
Likely cause Retry is not idempotent: controller state (BUSY/flags/FIFO/DMA) is not reset, STOP is not issued, or the “slow command” is re-fired while the slave is still busy.
Quick check Log a “retry trace” of attempt# with bus-state snapshots (BUSY flag, error flags, FIFO level, STOP sent). If attempt#2 fails faster or with a new error bucket, cleanup is missing.
Fix Make retry idempotent: abort DMA, clear controller flags, flush FIFOs, issue STOP (or bus reset sequence), enforce backoff, and only re-issue the command after a verified bus-idle + optional health probe.
Pass criteria After any timeout: cleanup completes in ≤ X ms, bus-idle is true for ≥ X µs, and retry success_rate ≥ X% without increasing recovery_count.
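The “retry without cleanup” bug can be made structurally impossible by forcing every timeout path through one ordered cleanup routine and gating the retry on verified bus-idle. A sketch with a simulated controller (the `ops` hooks and `sim_*` stubs are illustrative, not a specific driver API):

```c
#include <stdbool.h>

/* Platform hooks; in real firmware these map to driver/DMA primitives. */
typedef struct {
    void (*dma_abort)(void);
    void (*clear_flags)(void);
    void (*flush_fifos)(void);
    void (*send_stop)(void);
    bool (*bus_idle)(void);
} i2c_cleanup_ops_t;

/* Returns true only when it is safe to retry: cleanup ran in order and the
 * bus was verified idle afterwards. */
bool i2c_timeout_cleanup(const i2c_cleanup_ops_t *ops)
{
    ops->dma_abort();       /* 1: stop in-flight DMA first */
    ops->clear_flags();     /* 2: clear BUSY/error flags */
    ops->flush_fifos();     /* 3: drop stale TX/RX bytes */
    ops->send_stop();       /* 4: release the bus */
    return ops->bus_idle(); /* retry gate: verified idle only */
}

/* --- tiny simulated controller to exercise the sequence --- */
static bool sim_idle_after_stop;
static int  sim_steps;
static void sim_dma_abort(void)   { sim_steps++; }
static void sim_clear_flags(void) { sim_steps++; }
static void sim_flush_fifos(void) { sim_steps++; }
static void sim_send_stop(void)   { sim_steps++; sim_idle_after_stop = true; }
static bool sim_bus_idle(void)    { return sim_idle_after_stop; }

bool i2c_cleanup_demo(void)
{
    const i2c_cleanup_ops_t ops = {
        sim_dma_abort, sim_clear_flags, sim_flush_fifos,
        sim_send_stop, sim_bus_idle
    };
    sim_idle_after_stop = false;
    sim_steps = 0;
    return i2c_timeout_cleanup(&ops) && sim_steps == 4;
}
```

Because the cleanup and the retry gate live in one function, attempt #2 can never observe leftover state from attempt #1.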
Looks like stretching, but SCL waveform is sloped—what’s the fastest discrimination? True stretching shows a flat low-hold; RC/threshold artifacts show a ramp.
Likely cause Slow rise-time (RC) or threshold-crossing delay is extending “apparent low time,” not a slave holding SCL low.
Quick check Compare SCL at the controller pin vs the far node on the same transaction. True stretching has a stable low plateau; RC artifact shows a ramp with delayed threshold crossing and often varies with probe point.
Fix Improve edge margins (reduce effective bus C, adjust pull-up, avoid over-slow edge shaping). Then re-run the same command signature to verify timing stability.
Pass criteria Measured tR ≤ X (per mode budget) and “low-hold” duration variance ≤ X% across X repeats; timeout_rate ≤ X per 10k.
After timeout, bus stays busy forever—what’s the correct recovery ladder? Recovery must escalate from soft abort to forced bus release with measurable exit criteria.
Likely cause The transaction aborted without releasing the bus (missing STOP), or a slave remains in a partial state holding SDA/SCL low (hung-bus).
Quick check Immediately after timeout, sample SCL and SDA levels. If either is low for ≥ X µs, the bus is not idle and a forced release ladder is required.
Fix Ladder: (1) abort + STOP + backoff; (2) switch to GPIO and clock out 9 pulses, then STOP; (3) isolate segment / reset node / power-cycle the affected branch (if available). Always finish with a health probe read/write.
Pass criteria Bus-idle holds (SCL=H and SDA=H) for ≥ X µs, recovery completes within ≤ X attempts, and the health probe succeeds with error_rate ≤ X.
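Step (2) of the ladder, the classic clock-out release, can be sketched against a simulated stuck slave that frees SDA after enough SCL pulses. The `sim_*` names are illustrative; on real hardware the pulse maps to GPIO toggling of SCL with SDA released.

```c
#include <stdbool.h>

/* Simulated stuck slave: its shift register empties (and SDA releases)
 * after a known number of SCL pulses. */
typedef struct {
    int  edges_needed;  /* pulses remaining until the slave releases SDA */
    bool sda_released;
} sim_stuck_slave_t;

static void sim_pulse_scl(sim_stuck_slave_t *s)
{
    if (s->edges_needed > 0 && --s->edges_needed == 0)
        s->sda_released = true;
}

/* Clock out at most 9 pulses; return the pulse count that freed SDA,
 * or -1 if SDA never released (escalate to L3). After a successful
 * release, real firmware issues STOP and re-checks bus idle. */
int i2c_bus_release(sim_stuck_slave_t *s)
{
    for (int i = 1; i <= 9; i++) {
        sim_pulse_scl(s);
        if (s->sda_released)
            return i;
    }
    return -1;
}
```

Nine pulses covers the worst case of a slave mid-byte (8 data bits plus ACK); if SDA is still low afterwards, the holder is not a stalled shift register and escalation is the right move.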
Two slaves behave differently—how to classify latency bucket quickly? Use a signature table: position + direction + command correlation.
Likely cause Different slaves map to different internal latency buckets (conversion vs NVM vs wake-up vs protection), even on the same bus and same master policy.
Quick check For each slave, capture: (a) where stretching occurs (addr/ACK vs data/ACK), (b) read vs write correlation, (c) command correlation (specific reg/cmd). Build a quick histogram of low-hold durations.
Fix Apply bucket-appropriate mitigation: replace “read-until-ready” with INT/DRDY, split slow ops (kick+read), or increase per-command budget while keeping global T_TXN_MAX bounded.
Pass criteria Per slave: p99 stretch_low_hold ≤ X ms (or documented per-command X), and timeout_rate ≤ X per 10k with stable histogram shape across X runs.
Stretch count rises at cold/hot—what to log first? Temperature sensitivity is often indirect; log the minimum fields to avoid guesswork.
Likely cause Cold/hot changes internal latency (conversion time, NVM write timing, wake latency) and can also reduce edge margin (rise-time, thresholds), increasing apparent stretching and timeouts.
Quick check For each timeout/stress window, log: temperature, bus speed, command signature (addr/reg/dir/len), stretch_low_hold (duration), timeout bucket, and recovery result. Compare p95/p99 across cold vs hot.
Fix Separate “real latency” vs “edge artifact”: confirm waveform plateau vs slope. If real latency increases, adjust per-command budget while keeping T_TXN_MAX bounded; if edge margin is the limiter, fix rise-time and noise susceptibility.
Pass criteria Across temperature corners: timeout_rate ≤ X per 10k, p99 stretch_low_hold ≤ X ms (or bucket-specific X), and recovery_success ≥ X% within ≤ X attempts.
DMA I²C occasionally locks—where to put watchdog hooks? DMA needs progress-based watchdogging, not just a global timer.
Likely cause DMA completion or I²C controller IRQ is missed, leaving the driver in a waiting state; without a non-blocking state machine, a stretch or bus glitch can freeze the pipeline.
Quick check Add “progress markers” to log: state transitions, DMA submit, DMA done, STOP sent, and bus-idle reached. If progress stops for ≥ X ms without a state change, the hook point is missing.
Fix Implement a non-blocking driver: watchdog each wait-state (WAIT_SCL / WAIT_DMA / WAIT_STOP), abort-and-cleanup on timeout, and always end with a verified bus-idle + optional health probe.
Pass criteria No stuck state for ≥ X hours of stress: watchdog recovers within ≤ X ms, and recovery_count ≤ X per 10k transactions with health probe success ≥ X%.
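Progress-based watchdogging can be sketched as a timestamp stamped on every driver state transition, with expiry defined as “no transition within budget” rather than a single global timer. Names here (`wd_state_t`, `i2c_watchdog_t`) are illustrative placeholders.

```c
#include <stdint.h>
#include <stdbool.h>

/* Driver wait-states that each need a watchdog hook. */
typedef enum { ST_IDLE, ST_WAIT_SCL, ST_WAIT_DMA, ST_WAIT_STOP } wd_state_t;

typedef struct {
    wd_state_t state;
    uint32_t   last_progress_ms;  /* stamped on every transition */
    uint32_t   budget_ms;         /* placeholder X ms per wait-state */
} i2c_watchdog_t;

/* Any state transition counts as progress. */
void wd_progress(i2c_watchdog_t *w, wd_state_t next, uint32_t now_ms)
{
    w->state = next;
    w->last_progress_ms = now_ms;
}

/* True → abort-and-cleanup, then re-verify bus idle before resuming.
 * ST_IDLE never expires: an idle driver is not stuck. */
bool wd_expired(const i2c_watchdog_t *w, uint32_t now_ms)
{
    return (w->state != ST_IDLE) &&
           (now_ms - w->last_progress_ms > w->budget_ms);
}
```

A global timer cannot distinguish a long-but-progressing transfer from a frozen one; per-transition stamping catches exactly the “waiting state with no state change” failure described above.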
Oscilloscope shows SCL released but analyzer still decodes “clock held”—why? Tool thresholds, sampling, or probe point mismatch can create a decode illusion.
Likely cause The analyzer’s threshold/sampler sees SCL below its logic-high threshold (slow rise-time/noise), or it is attached at a different node than the scope (far node still low).
Quick check Put scope and analyzer on the same node; set analyzer threshold to match measured VIH; increase analyzer sample rate; compare the time SCL crosses the threshold vs the scope waveform.
Fix Align measurement conditions (node, threshold, sampling). If the root is edge margin, fix rise-time/noise; if the root is topology, re-probe near/far to find the node that remains low.
Pass criteria Analyzer decode matches scope at the same node; measured tR/tF meet budgets (≤ X), and “clock held” false positives drop to ≤ X per 10k.
If you must cap stretching, what’s a safe timeout policy? Use layered limits: per-hold, per-transaction, and bounded retry with cleanup.
Likely cause An unbounded wait treats stretching as “infinite busy” and causes deadlocks; an overly tight timeout causes false failures on legitimate slow-path commands.
Quick check Measure p95/p99 stretch_low_hold for the worst-latency command at temperature corners, then set T_STRETCH_MAX with margin; separately bound total transaction time with T_TXN_MAX.
Fix Policy template: T_STRETCH_MAX (per-hold), T_TXN_MAX (end-to-end), N_RETRY with backoff; on timeout, always cleanup (STOP/abort/flush) then optional health probe before retry.
Pass criteria Under worst-case command: p99 stretch_low_hold ≤ T_STRETCH_MAX, p99 txn_time ≤ T_TXN_MAX, timeout_rate ≤ X per 10k, and no deadlock for ≥ X hours stress.
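The layered policy template maps naturally onto the timeout buckets used by the logging schema earlier on this page. A classification sketch (threshold values are placeholders derived from measured p99 plus margin, per the text; the type names are illustrative):

```c
#include <stdint.h>

/* Timeout buckets, matching the logging schema. */
typedef enum { TMO_NONE, TMO_STRETCH_MAX, TMO_TXN_MAX, TMO_BUS_STUCK } tmo_bucket_t;

typedef struct {
    uint32_t t_stretch_max_us;  /* per single SCL low-hold */
    uint32_t t_txn_max_us;      /* whole transaction, end to end */
    uint32_t t_bus_stuck_us;    /* bus held low with no transaction in flight */
} i2c_timeout_policy_t;

/* Classify observed timings into the most severe applicable bucket. */
tmo_bucket_t tmo_classify(const i2c_timeout_policy_t *p,
                          uint32_t scl_low_us, uint32_t txn_us,
                          uint32_t idle_low_us)
{
    if (idle_low_us > p->t_bus_stuck_us)   return TMO_BUS_STUCK;   /* most severe first */
    if (scl_low_us  > p->t_stretch_max_us) return TMO_STRETCH_MAX;
    if (txn_us      > p->t_txn_max_us)     return TMO_TXN_MAX;
    return TMO_NONE;
}
```

Checking the most severe condition first matters: a hung bus also violates the per-hold limit, but the recovery action (L2/L3 ladder) differs from a plain stretch timeout (L1 retry).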
How to prove production readiness for stretching behavior? Production readiness is counters + thresholds + reproducible worst-case loop.
Likely cause Without standardized counters and a worst-case stimulus, stretching regressions appear as “random” field failures and cannot be screened in production.
Quick check Run a production BIST loop that triggers worst-latency commands for X iterations and record stretch_count / timeout_count / recovery_count plus per-bucket classification.
Fix Standardize: policy parameters (T_STRETCH_MAX/T_TXN_MAX/N_RETRY), a golden waveform reference for bring-up, and production thresholds with automated logging/reporting.
Pass criteria Over X-loop BIST: timeout_rate ≤ X, recovery_count ≤ X, and post-recovery health probe success ≥ X%; transparency matrix is signed for all inserted nodes.
What’s the minimum debug bundle to send to a vendor? A small, consistent bundle shortens vendor triage cycles and avoids back-and-forth.
Likely cause Vendor support stalls when key context is missing (signature, bucket, topology, and recovery results), even if a waveform exists.
Quick check Confirm the bundle contains exactly the minimum fields below, plus one capture of the “first failing transaction” (not only a late-stage failure).
Fix Send: (1) bus speed/mode, (2) topology nodes list (buffer/mux/isolator/bridge), (3) command signature (addr+reg/cmd+dir+len), (4) stretch_low_hold duration + where it occurs (phase/byte boundary), (5) timeout bucket + attempt#, (6) recovery action + outcome; attach one scope/LA capture with node location noted.
Pass criteria Vendor can reproduce/triage within X iterations: the first-failing signature is repeatable (≥ X%), and bucket classification remains consistent across X runs.