Robustness for I2C/SPI/UART: ESD, Surge, EMI, Hot-Plug, UVLO
Robustness means a serial port survives stress (ESD/surge/EMI/hot-plug/UVLO), keeps operating without false actions or link drops, and recovers automatically, all judged against measurable pass criteria.
This page turns datasheet ratings into board-and-system acceptance gates using practical protection stacks, return-path governance, and recovery hooks validated by counters, logs, and repeatable tests.
Definition & Scope Guard: What “Robustness” means here
Definition (board-level, serial-port focused)
Robustness is the ability of a serial bus port to survive stress (no permanent damage), operate correctly under disturbance (no false triggers, drops, or protocol corruption), and recover to a known-good state after an event (deterministic restoration without manual rework).
This page treats robustness as a system property spanning component ratings, board implementation (placement/return path), power sequencing, and firmware recovery policy.
Failure triad: Damage / Upset / Hang
Damage:
- Permanent or cumulative degradation (leakage/threshold drift)
- Often survives once but worsens over repeated stress
- Confirm with post-event param checks and trend logging
Upset:
- Transient errors (NAKs/CRC/framing) correlated with disturbance
- Self-recovers when stress stops
- Mitigate via edge control, filtering, and better return paths
Hang:
- State-machine lock (stuck line, wedged peripheral, stuck break)
- Persists after the event; requires bus-clear/reset sequencing
- Prevent with timeouts + deterministic recovery hooks
Fast classification rule: if the symptom persists until a forced recovery step (timeout/bus-clear/reset), treat it as Hang; if it disappears with the disturbance, treat it as Upset; if it grows worse over repeats, treat it as Damage.
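The fast classification rule can be expressed as a small decision function, for example inside a test-bench script that tags each logged event. This is a sketch: the names `Outcome` and `classify` are hypothetical, and checking the worsening trend first is an assumption (a degrading port can also hang or throw transient errors).

```python
from enum import Enum


class Outcome(Enum):
    DAMAGE = "damage"
    UPSET = "upset"
    HANG = "hang"


def classify(worsens_over_repeats: bool, needed_forced_recovery: bool) -> Outcome:
    """Apply the fast classification rule to one observed symptom.

    Precedence is an assumption: a worsening trend is checked first.
    """
    if worsens_over_repeats:
        return Outcome.DAMAGE
    if needed_forced_recovery:  # timeout, bus-clear, or reset was required
        return Outcome.HANG
    return Outcome.UPSET  # symptom cleared when the disturbance stopped
```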
Scope Guard (to prevent topic overlap)
In scope:
- ESD / Surge / EFT: current paths, clamp strategy, placement rules
- EMI emission: edge/loop/return-path levers without breaking timing
- EMI immunity: coupling paths to thresholds, clocks, and rails
- Hot-plug & UVLO: back-powering, sequencing hazards, safe states
- Verification: bring-up gates, production monitors, pass criteria schema
Out of scope (handled on linked pages):
- Protocol-specific timing deep dives (I²C pull-up RC math, SPI CPOL/CPHA waveforms, UART baud derivations)
- Product catalogs for level shifters/isolators (this page covers only their robustness budgets and failure modes)
- Generic EMC textbook theory unrelated to serial-port implementation
Protocol pages should link here for robustness strategy; this page links back to protocol pages only for bus-specific timing details.
Robustness Metrics & Acceptance Philosophy: specs → system pass
Metrics hierarchy: Component rating → Board level → System level
Component rating:
- Datasheet ESD/Surge classes and I–V clamp behavior
- Leakage, capacitance, dynamic resistance vs stress
- Useful for screening, but not a system guarantee
Board level:
- Placement and return path decide real current flow
- Edge-rate shaping changes emission and immunity
- Power sequencing prevents back-power and latch hazards
System level:
- Cables, chassis bond, and earth reference dominate outcomes
- Hot-plug frequency and environment amplify degradation risk
- Acceptance must be defined in observable system behavior
A port can meet component-level ratings and still fail in the field if the board current path and system return path are not governed by design.
Standardized acceptance schema (for every stress type)
Use a single measurement grammar across ESD/Surge, emission, immunity, hot-plug, and UVLO. This prevents “passed once” ambiguity and makes results comparable across boards and revisions.
Practical rule: if a stress “passes” without a defined observable and recovery, the result is not actionable for system acceptance.
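One way to keep the grammar identical across stress types is to store every run in the same record and evaluate it with one function. The field names, the per-1k normalization, and the zero-forced-recovery default below are illustrative assumptions; X/Y/Z remain project-specific placeholders.

```python
from dataclasses import dataclass


@dataclass
class StressResult:
    stress: str             # e.g. "contact, +/-X kV, N hits, 1 s interval"
    errors: int             # protocol errors in the observation window
    transactions: int       # workload in the same window
    max_recovery_s: float   # worst automatic recovery time observed
    forced_recoveries: int  # resets or power-cycles that were required


@dataclass
class PassCriteria:
    max_errors_per_1k: float  # placeholder "Z"
    max_recovery_s: float     # placeholder "X"
    max_forced: int = 0       # assumed default: no forced recovery allowed


def passes(result: StressResult, criteria: PassCriteria) -> bool:
    """Evaluate one run with the same grammar for every stress type."""
    rate = 1000.0 * result.errors / max(result.transactions, 1)
    return (rate <= criteria.max_errors_per_1k
            and result.max_recovery_s <= criteria.max_recovery_s
            and result.forced_recoveries <= criteria.max_forced)
```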
Acceptance philosophy: Bring-up gate vs Production gate
Bring-up gate (correct by design):
- Verify protection path and placement are correct
- Prove that every hang state has a deterministic recovery path
- Confirm margins with controlled setups and repeatability
Production gate (stable by statistics):
- Track stats: retries/NAKs/CRC, hang time, recovery counts
- Detect slow degradation (heat, leakage, clamp aging)
- Close the loop: logs → root cause → layout/stack updates
A robust port is accepted only when it passes both gates: it is correct by design and also stable by statistics.
ESD Ratings & Real-World Translation: HBM/CDM vs IEC
Decision: what ratings can (and cannot) guarantee
HBM/CDM (component level):
- Chip-level robustness for handling and assembly risk
- Useful for screening internal clamp strength and leakage stability
- Not a promise of port stability under system discharge conditions
IEC 61000-4-2 (system level):
- Port-level current path governance under defined fixtures and coupling
- Determines whether the port survives, operates, and recovers
- Outcome depends heavily on return path, cable coupling, and chassis bond
Practical translation: treat HBM/CDM as chip survivability, and IEC as system path + behavior. A robust port must demonstrate measurable observables and deterministic recovery under IEC-style events.
Mechanism: IEC outcomes are determined by discharge and return paths
Common weak spots (where IEC current bypasses protection)
- Unprotected trace length between connector and first clamp (the “exposed window”).
- Rail injection: clamp current dumped into VDD/GND causes rail bounce and UVLO chatter.
- Ground bounce: return path crosses sensitive reference ground and shifts thresholds.
- IO clamp conduction during partial power states, enabling back-power and wedged logic.
Verify: translate “ratings” into system acceptance (one grammar)
Stress:
- Level: ±kV, contact/air; repetition and interval
- Hit matrix: connector shell, signal pins, nearby metal
- Boundary conditions: cable attached, chassis bond, earth state
Observe:
- Protocol stats: NAK/CRC/framing counts, link drops
- Hang detection: bus-stuck duration, wedged-state flags
- System logs: reset reason, PG/UVLO flags, error IRQ counters
Recover:
- Classify: automatic (timeouts/retry/bus-clear) vs forced (reset/power-cycle)
- Measure: recovery time window (ms/s) and post-recovery stability time
Pass criteria:
- After X hits, within Y minutes, error rate ≤ Z
- No cumulative degradation: leakage trend, rail bounce sensitivity, recovery time growth
A “pass” without defined observables and recovery is not actionable. The objective is repeatable, logged behavior under a clearly defined stress envelope.
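For the "no cumulative degradation" criterion, a trend is more informative than a single before/after comparison. A minimal sketch, assuming one leakage (or recovery-time) reading per event: fit a least-squares slope against the event index and flag the port when the slope exceeds a project-defined limit. The function names and example limits are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values vs. event index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_degrading(values, max_slope_per_event):
    """True when the reading grows faster than the allowed per-event drift."""
    return trend_slope(values) > max_slope_per_event
```

For example, leakage readings of 10, 12, 15, 19 µA after successive hits give a slope of 3 µA per event, which would fail a 0.5 µA-per-event limit even if every single reading is inside the datasheet maximum.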
ESD Protection Network Design: low-C, placement, clamps
Architecture: a 3-stage protection stack (roles are distinct)
The stack must be evaluated against three failure outcomes: Damage (energy and heat), Upset (false edges/frames), and Hang (wedged protocol states).
Selection metrics (data-driven, port-behavior focused)
Capacitance and line-to-line matching:
- Lower capacitance preserves edge shape and reduces loading
- Large line-to-line capacitance mismatch causes mode conversion between differential and common-mode disturbances
- Validate with emission/immunity and timing margin, not by capacitance alone
Clamp voltage (Vclamp):
- Clamp voltage must be checked at the relevant peak current region
- Over-high Vclamp increases internal IO clamp conduction and latch risk
- Use curves rather than a single “typical” clamp number
Dynamic resistance (Rdyn):
- Higher dynamic resistance means clamp voltage rises faster with current
- Explains why the same footprint from a different vendor can change error behavior
- Evaluate alongside placement and return, not in isolation
Leakage:
- Leakage can bias levels and create false thresholds under temperature
- Can convert “survive” into “operate fails” (false edges/frames)
- Track post-event leakage trend to detect hidden degradation
Selection is complete only when the chosen protection preserves operation (no false triggers) and guarantees recovery (no wedged states), not merely “no damage”.
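The Vclamp and dynamic-resistance metrics combine into a first-order estimate of the voltage the protected pin actually sees. IEC 61000-4-2 specifies a contact-discharge first peak of about 3.75 A per kV (30 A at 8 kV); a linear model Vclamp ≈ Vhold + Rdyn · Ipeak then shows why two parts with the same footprint can behave differently. The device parameters used below are hypothetical, and a real comparison should use the vendor's TLP curves.

```python
IEC_CONTACT_A_PER_KV = 3.75  # IEC 61000-4-2 contact first peak: 30 A at 8 kV


def clamp_voltage_v(v_hold: float, r_dyn_ohm: float, level_kv: float) -> float:
    """First-order clamp voltage: turn-on/holding voltage plus Rdyn * Ipeak.

    Linear approximation of a TLP curve; real devices deviate at high current.
    """
    i_peak_a = IEC_CONTACT_A_PER_KV * level_kv
    return v_hold + r_dyn_ohm * i_peak_a
```

Two hypothetical arrays with the same 5 V holding voltage but Rdyn of 0.5 Ω and 0.2 Ω clamp at roughly 20 V and 11 V for an 8 kV contact hit, which can be the difference between a quiet port and internal IO-clamp conduction.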
Layout hooks (placement and return path decide real behavior)
Do:
- Place first clamp at the connector; minimize exposed trace length
- Provide a short, low-impedance return to the correct reference (often chassis/earth)
- Use stitching vias near clamps to reduce loop area and ground bounce
- Keep rail-clamp return close to the rail decoupling reference
Avoid:
- Crossing split grounds in the clamp return path
- Routing protection returns through sensitive reference ground
- Letting clamp current inject into rails without a controlled second clamp
- Long, thin returns that turn ESD into ground bounce victims
A layout that “looks correct” can still fail if the return is not governed. Treat the clamp and its return as a single functional component: placement + path.
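Why return length matters can be shown with V = L · di/dt. A sketch, assuming an IEC 61000-4-2 contact discharge with a roughly 0.8 ns rise to a 30 A first peak and a rule-of-thumb return inductance of about 1 nH per mm; the inductance figure is an assumption that depends strongly on trace width, via count, and plane spacing.

```python
def return_path_kick_v(length_mm: float, i_peak_a: float = 30.0,
                       t_rise_ns: float = 0.8, nh_per_mm: float = 1.0) -> float:
    """Estimate V = L * di/dt across a clamp return of the given length.

    nh_per_mm is a rule-of-thumb assumption, and di/dt is averaged over
    the rise time, so treat the result as an order-of-magnitude figure.
    """
    inductance_h = length_mm * nh_per_mm * 1e-9
    di_dt_a_per_s = i_peak_a / (t_rise_ns * 1e-9)
    return inductance_h * di_dt_a_per_s
```

Under these assumptions each millimetre of return adds roughly 37 V during the edge, so a 5 mm return develops on the order of 190 V on top of the clamp voltage.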
Surge / EFT / Cable Events (Long-Cable Reality)
Difference: why cable events behave unlike ESD
ESD (IEC 61000-4-2):
- Dominated by discharge path and return impedance
- Often exposes “exposed window” before the first clamp
- Can pass yet still leave hidden marginal recovery behavior
Surge / EFT (IEC 61000-4-5 / IEC 61000-4-4):
- Energy and repetition matter (thermal stress and aging)
- Common-mode injection via cable is the usual entry mode
- Rail lift and ground-potential difference are frequent root causes
Long-cable failures are typically not “timing problems.” They are path-and-energy problems: common-mode current enters at the cable boundary, shifts rails and references, and converts a stable link into resets, drops, or wedged states.
Injection model: how cable disturbance becomes board-level victims
Typical weak spots (high-yield failure mechanisms)
- Rail lift: clamp current dumped into VDD/GND drives UVLO chatter or brownout resets.
- Ground-potential difference: return chooses unintended paths and shifts receiver thresholds.
- Repeated absorption aging: TVS/CMC heating causes leakage and clamp drift over time.
- Chassis bond instability: common-mode energy is forced into logic reference ground.
Design strategy (robustness-only): suppress common-mode, clamp energy, govern return
Baseline boundary stack:
- CMC to reduce incoming common-mode current peaks
- TVS to clamp surge energy before it reaches internal traces
- Chassis bond to provide a low-impedance return where energy belongs
Escalation triggers and budget:
- Long cable with large ground-potential difference (GPD) or a harsh common-mode environment
- Field requirement: the link must operate or recover without manual intervention
- Budget: added propagation delay and recovery behavior must remain deterministic
The objective is not “more parts.” The objective is controlled energy flow at the cable boundary: prevent common-mode conversion, avoid rail injection, and keep return currents out of sensitive reference ground.
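Ground-potential difference erodes static noise margin directly: the driver's levels arrive at the receiver shifted by the offset between the two grounds. A minimal worst-case check, assuming single-ended CMOS levels; the 3.3 V example thresholds in the check are hypothetical and should come from the actual driver and receiver datasheets.

```python
def worst_case_margin(v_oh_min, v_ol_max, v_ih_min, v_il_max, gpd_abs_v):
    """Smallest static noise margin (V) with a ground offset of either polarity.

    If the driver's ground sits above the receiver's, its LOW level arrives
    raised by the offset; if it sits below, its HIGH level arrives lowered.
    """
    low_margin = v_il_max - (v_ol_max + gpd_abs_v)
    high_margin = (v_oh_min - gpd_abs_v) - v_ih_min
    return min(low_margin, high_margin)
```

With example levels of VOH ≥ 2.9 V, VOL ≤ 0.4 V, VIH = 2.31 V, and VIL = 0.99 V, the margin is about 0.59 V at zero offset and goes negative at a 0.7 V offset. Once the margin is negative, filtering cannot recover it; the offset itself has to be governed or broken.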
Verify: acceptance grammar for cable events (one measurable template)
Stress:
- Injection at cable entry (level, polarity, repetition)
- Cable length and shield state; chassis bond state
- Worst-case power states (warm start, partial power, sleep)
Observe:
- Protocol stats: retries/NAKs/CRC/framing; link drops
- System logs: reset reason, PG/UVLO flags, error IRQ counters
- Hang detection: bus-stuck duration and recovery success rate
Recover:
- Automatic: timeouts/retry/bus-clear within X ms/s
- Forced: reset/power-cycle must be rare and deterministic
Pass criteria:
- After X events, within Y minutes: error rate ≤ Z
- No cumulative degradation: leakage trend, thermal drift, recovery time growth
EMI Emission: Edge Control Without Breaking Timing
Physics: three knobs (edge rate, loop area, common-mode conversion) explain most emission outcomes
Emission fixes must be coupled with a timing-safe verification gate: reduce edge energy and common-mode conversion without shrinking sampling margin below acceptable limits.
Engineering levers: action → risk → timing-safe check
Edge rate:
- Action: source series-R / slew control
- Risk: timing-margin loss
- Check: worst-case functional error counters
Loop area:
- Action: keep return adjacent and continuous
- Risk: plane splits create detours
- Check: near-field hotspot reduction
Common-mode conversion:
- Action: avoid return discontinuities
- Risk: cable becomes a CM radiator
- Check: cable CM current trend
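To check that an edge-rate change moves the spectrum as expected, the standard trapezoidal-pulse envelope is a useful first-order reference: flat up to 1/(π·width), falling 20 dB/decade up to 1/(π·rise), then 40 dB/decade. A sketch with hypothetical signal parameters, assuming the rise time is shorter than the pulse width.

```python
import math


def trapezoid_envelope_db(f_hz, amplitude_v, period_s, width_s, rise_s):
    """Asymptotic spectral envelope (dBV) of a periodic trapezoidal signal.

    Assumes rise_s < width_s, so the width corner comes first.
    """
    env = 2 * amplitude_v * width_s / period_s
    f1 = 1 / (math.pi * width_s)  # -20 dB/decade above this corner
    f2 = 1 / (math.pi * rise_s)   # -40 dB/decade above this corner
    if f_hz > f1:
        env *= f1 / f_hz
    if f_hz > f2:
        env *= f2 / f_hz
    return 20 * math.log10(env)
```

Slowing a 2 ns edge to 8 ns lowers the envelope by about 12 dB at frequencies above both rise-time corners (here, above roughly 160 MHz) and by less in between; a measured shift that does not follow this pattern suggests the emission is dominated by loop or common-mode paths rather than edge rate.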
Bus-focused actions (emission-only, minimal to avoid topic overlap)
I²C:
- Shape edge rate via pull-up and damping choices
- Keep return reference continuous to limit CM conversion
- Avoid stubs that enlarge loop area and radiation
SPI:
- Control SCLK edge with source series-R
- Route SCLK with tight return to reduce loop radiation
- Preserve plane continuity to prevent CM conversion
UART:
- Limit edge rate using driver/slew configuration where available
- Treat cable shield termination as a CM control decision
- Add CM control at the cable boundary if the cable dominates radiation
Detailed timing budgets and protocol-specific constraints belong to the dedicated I²C/SPI/UART pages; this section is intentionally limited to emission levers and verification gates.
Verify gate: emissions improvement with no functional regression
Emission evidence:
- Peak reduction and hotspot reduction (near-field)
- Cable common-mode current trend
- Spectrum shift consistent with slower edges
No functional regression:
- No increase in retries/NAKs/CRC/framing errors
- No new drops or wedged states under worst-case conditions
- Recovery time does not grow after changes
Pass criteria:
- Measurable emission improvement at target bands
- Error rate remains ≤ Z under worst-case usage
- No hang; recovery remains deterministic within X ms/s
EMI Immunity: Why “Looks OK on Scope” Still Fails
Failure entrances: three ways disturbances create errors without obvious waveform damage
Many immunity failures originate where the scope is not looking: at the receiver’s threshold, in the reference return, or in the local supply impedance, rather than at a convenient probe location.
Symptom mapping: what “immunity failure” looks like on each bus
I²C:
- False START/STOP, SDA “ghost edges”
- NAK bursts without obvious pin distortion
- Hung bus (stuck-low or wedged state)
SPI:
- Bit flips or “fake toggles” on MISO
- Sampling window squeezed by noise
- CRC/frame errors concentrated in bursts
UART:
- Framing/parity error bursts
- False start-bit / break detection
- Garbage characters then self-recovery
Design actions (organized by entrance): filter, hysteresis, reference strategy, CM path control
Threshold entrance (receiver):
- Receiver hysteresis / deglitch behavior
- Band-limit spikes with controlled filtering
- Edge-energy control with timing-safe verification
Reference entrance (return / CM path):
- Continuous return reference (avoid plane splits)
- Define where CM current is allowed to return
- Suppress CM conversion at connectors and transitions
Supply entrance (local rail):
- Local decoupling and supply impedance control
- Brownout/UVLO behavior must be deterministic
- Keep clamp currents out of sensitive rails
This section is intentionally limited to immunity mechanisms and acceptance gates; timing-budget math and protocol deep dives are handled on the dedicated bus pages.
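Receiver deglitch behavior can be illustrated with a consecutive-sample filter: a new level is accepted only after it persists for N samples. This is a generic sketch, not any specific peripheral's filter; N and the sample rate are assumptions, and the resulting N-sample latency must fit inside the bus timing budget defined on the protocol pages.

```python
def deglitch(samples, n_stable):
    """Accept a level change only after it persists for n_stable consecutive samples."""
    if not samples:
        return []
    out = []
    state = samples[0]           # currently accepted level
    candidate, count = state, 0  # level trying to replace it, and its run length
    for s in samples:
        if s == state:
            candidate, count = state, 0  # glitch ended: discard the candidate
        elif s == candidate:
            count += 1
        else:
            candidate, count = s, 1
        if count >= n_stable:
            state, count = candidate, 0
        out.append(state)
    return out
```

A single-sample spike is rejected, while a level held for three samples is accepted on its third sample, which is exactly the latency cost that must be budgeted.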
Verify gate: immunity acceptance grammar (stress → observable → recovery → pass)
Stress:
- Disturbance source + injection point (field/cable/ground)
- Level, dwell time, and repetition profile
- Worst-case operating states (traffic, sleep/wake, hot load)
Observe:
- Errors: NAK/CRC/framing; false events; bus-stuck counters
- Drops, resets, watchdog trips, and state wedge duration
- Correlation with rail/ground/CM monitors
Recover:
- Automatic recovery within X ms/s (timeouts, retry, bus-clear)
- No “power-cycle required” states under specified stress
Pass criteria:
- Within Y minutes: error rate ≤ Z (defined per bus)
- No cumulative fragility: recovery time and leakage do not trend worse
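To quantify the "correlation with rail/ground/CM monitors" observable, compare the error rate inside disturbance windows with the rate outside them. A sketch, assuming timestamped error logs and non-overlapping disturbance windows in seconds; the function name is hypothetical.

```python
def in_window_rate_ratio(error_times_s, windows_s, total_s):
    """Ratio of error rate inside disturbance windows to the rate outside them.

    windows_s: non-overlapping (start, end) pairs within [0, total_s].
    Returns 1.0 when there is no evidence either way (no errors at all).
    """
    exposed_s = sum(end - start for start, end in windows_s)
    inside = sum(1 for t in error_times_s
                 if any(start <= t <= end for start, end in windows_s))
    outside = len(error_times_s) - inside
    quiet_s = total_s - exposed_s
    rate_in = inside / exposed_s if exposed_s > 0 else 0.0
    rate_out = outside / quiet_s if quiet_s > 0 else 0.0
    if rate_out == 0.0:
        return float("inf") if rate_in > 0 else 1.0
    return rate_in / rate_out
```

A ratio well above 1 says the errors follow the disturbance (Upset); a ratio near 1 points to a background problem unrelated to the applied stress.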
Hot-Plug & Inrush: Ghost-Powering, Latch-Up, Brownout
Event chain: plug event → inrush / reference bounce → IO ahead of VDD → back-powering → latch-up or wedged states
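Most common failure modes during hot-plug
- Inrush collapses the rail or bounces the reference, causing brownout resets
- IO is driven before VDD is up, back-powering the device through its IO clamp diodes
- Injection current triggers latch-up or leaves a peripheral in a wedged state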
Design strategy: manage injection paths, gate unsafe states, and guarantee deterministic recovery
Injection path and IO state:
- Series resistance / current limiting to reduce injection
- Control diode/clamp paths to avoid feeding internal rails
- Define safe-state for unpowered IO (no undefined drive)
Power path and gating:
- Inrush control to prevent rail collapse and reference bounce
- UVLO gating to avoid chatter and partial-state lockups
- Deterministic power-good window before enabling bus activity
Robust hot-plug behavior requires both hardware gating (safe-state under uncertain contact order) and a deterministic recovery sequence (timeouts, re-initialize, re-enumerate).
Recovery and acceptance: plug/unplug must not create “power-cycle required” states
Recovery sequence:
- Detect plug event → force safe-state gating
- Wait for power-good stable window
- Reset interface state → re-initialize → re-enumerate
- Re-enter normal operation with health counters cleared
Acceptance:
- Stress: X plug cycles, cable length, speed, worst-case load
- Observe: reset reason, abnormal current peaks, drop/hang counters
- Recover: automatic within Y ms/s, no manual intervention
- Pass: no latch-up; no cumulative fragility after repeated cycles
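The "wait for power-good stable window" step is often implemented as a fixed delay, which can enable the bus during a contact bounce. A sketch of a stable-window gate instead, assuming power-good is sampled at a fixed rate; any dropout restarts the window. The function name and sample counts are hypothetical.

```python
from typing import Optional, Sequence


def bus_enable_index(pg_samples: Sequence[bool], stable_samples: int) -> Optional[int]:
    """Index of the sample at which bus activity may be enabled, or None.

    Power-good must stay high for stable_samples consecutive samples; any
    dropout (e.g. contact bounce during plug-in) restarts the window.
    """
    run = 0
    for index, pg in enumerate(pg_samples):
        run = run + 1 if pg else 0
        if run >= stable_samples:
            return index
    return None
```

Multiplying the returned index by the sample period gives the enable time, which can be logged per plug event alongside the recovery-time observable.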
UVLO / Brown-Out Behaviors: Safe States & Bus Recovery
The risk is not “power goes down.” The risk is IO behavior inside the brownout window.
A robust design defines deterministic IO behavior in the gray zone and guarantees an automated recovery path.
IO behavior matrix: behavior → risk → what to log
Common brownout traps: a slave holds I²C SDA low, an SPI slave wakes half-way and drives MISO unpredictably, or a UART TX line emits a false start-bit during threshold drift.
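For the I²C SDA-held-low trap, the usual recovery hook is a bus clear: the master toggles SCL (the I²C specification suggests nine clock pulses) until the slave releases SDA, then issues a STOP. The sketch below uses injected callbacks so it can run against a hypothetical stuck-slave model; on hardware, `read_sda`, `pulse_scl`, and `send_stop` would wrap GPIO or controller access, which is an assumption about the platform.

```python
from typing import Callable


class StuckSlave:
    """Hypothetical model: a slave interrupted mid-byte holds SDA low until
    it has been clocked a certain number of times."""

    def __init__(self, clocks_until_release: int):
        self.remaining = clocks_until_release

    def sda(self) -> int:
        return 1 if self.remaining <= 0 else 0

    def clock(self) -> None:
        self.remaining -= 1


def i2c_bus_clear(read_sda: Callable[[], int], pulse_scl: Callable[[], None],
                  send_stop: Callable[[], None], max_pulses: int = 9) -> bool:
    """Toggle SCL until SDA is released, then issue a STOP condition."""
    for _ in range(max_pulses):
        if read_sda():
            send_stop()
            return True
        pulse_scl()
    if read_sda():
        send_stop()
        return True
    return False  # still stuck: escalate to peripheral reset / power sequencing
```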
Recovery hooks: escalation ladder from soft timeout to deterministic reset sequencing
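The escalation ladder can be written as an ordered list of recovery levels that stops at the first success and logs every attempt, which is what makes the L0/L1/L2/L3 record auditable. The level meanings (L0 timeout/retry, L1 bus-clear/re-sync, L2 peripheral reset sequencing, L3 power-cycle) are an assumed mapping, and `recover` is a hypothetical helper.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def recover(steps: Sequence[Tuple[str, Callable[[], bool]]],
            log: List[Tuple[str, bool]]) -> Optional[str]:
    """Try recovery levels in order and stop at the first that restores the bus.

    Each action returns True when the bus is healthy again. Every attempt is
    appended to log so the escalation level used can be audited later.
    """
    for level, action in steps:
        ok = action()
        log.append((level, ok))
        if ok:
            return level
    return None  # all levels failed: report as an unrecovered event
```

In a product, `steps` would be built from the platform's own retry, bus-clear, reset, and power-cycle routines, and the returned level would be counted per window so that L3 usage stays rare and visible.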
Verify gate: brownout must be recoverable and auditable
Stress:
- VDD minimum and dwell in the gray zone
- Repetition count and spacing (burst vs sporadic)
- Worst-case traffic state (busy / idle / sleep-wake)
Observe:
- Hung bus / frame loss / false start-bit counters
- Reset reason distribution and recurrence
- Abnormal current peaks and thermal flags
Recover:
- Automated recovery within X seconds
- Escalation ladder used is recorded (L0/L1/L2/L3)
Pass criteria:
- Recovery ≤ X s for N repeats
- No cumulative fragility trend (errors and leakage do not worsen)
- No “power-cycle required” state under the defined stress
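The first stress parameters, VDD minimum and dwell in the gray zone, can be extracted directly from a captured rail waveform. A sketch, assuming uniformly sampled VDD; the gray-zone threshold is a hypothetical value that should be chosen from the supervisor and device datasheets.

```python
def brownout_stats(vdd_samples, dt_s, v_threshold):
    """Return (vdd_min, longest continuous dwell below v_threshold in seconds)."""
    vdd_min = min(vdd_samples)
    longest = run = 0
    for v in vdd_samples:
        run = run + 1 if v < v_threshold else 0
        longest = max(longest, run)
    return vdd_min, longest * dt_s
```

Logging both values per event makes it possible to reproduce the exact gray-zone profile that caused a wedge, rather than only "a brownout happened".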
Grounding, Shielding & Isolation Strategy: Robustness Budget
Return-path governance: keep signal reference stable while routing common-mode energy to chassis/earth.
This section keeps the scope on robustness: how to route energy and references so serial buses keep operating, recovering, and avoiding cumulative fragility.
Shielding decisions: define what the shield is allowed to carry (and what it must not)
Isolation thresholds for robustness: delay budget, CMTI, and logic-hold under transients
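For the CMTI threshold, a quick screen is to convert the worst-case common-mode step into a slew rate and compare it with the isolator's rated CMTI. Note that 1 V/ns equals 1 kV/µs. A sketch; the step amplitude, rise time, and 2x design margin are hypothetical and must come from measured transients and the isolator datasheet.

```python
def required_cmti_kv_per_us(step_v: float, rise_ns: float, margin: float = 2.0) -> float:
    """Worst-case common-mode slew rate times a design margin.

    step_v / rise_ns is in V/ns, which is numerically equal to kV/us.
    """
    return (step_v / rise_ns) * margin


def isolator_ok(rated_cmti_kv_per_us: float, step_v: float, rise_ns: float,
                margin: float = 2.0) -> bool:
    """True when the rated CMTI covers the measured transient with margin."""
    return rated_cmti_kv_per_us >= required_cmti_kv_per_us(step_v, rise_ns, margin)
```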
Robustness budget and acceptance: energy routing must translate into measurable outcomes
Stress:
- Common-mode step / cable coupling / ground potential difference
- Hot-plug related transients and shield current profiles
- Isolation CM transient profile (worst-case dV/dt)
Pass criteria:
- Error rate stays below the defined bus threshold
- Automatic recovery within X seconds (no manual intervention)
- No cumulative fragility (no worsening leakage or recovery time)
Bring-Up & Production Checklist (Test Hooks + Logging)
Convert robustness into a repeatable engineering workflow: a Bring-up Gate that closes design risks, and a Production Gate that detects drift and feeds back into fixes.
Output format is copyable checklists + data definitions: Stress input → Observable → Recovery → Pass criteria (threshold placeholders).
ESD / surge design checks:
- First clamp placed at the connector (short lead + short return).
- Clamp return goes to the intended reference (chassis corridor vs quiet ground).
- Series element footprint exists (series-R / ferrite) for edge damping.
- Second clamp to rail has a defined rail sink path (no “floating rail”).
ESD / surge red flags:
- ESD passes once but later becomes “fragile” (cumulative leakage or clamp heating).
- Transient causes hung I²C (SDA low), SPI desync, or reset storms.
- Unexpected current spike during events (clamp conduction / back-powering).
EMI emission design checks:
- Return path is continuous (avoid crossing splits with fast edges).
- Source series-R footprint exists on SCLK / UART TX (tunable damping).
- I²C pull-ups are not over-strong (balance emission against rise time).
- Shield continuity is verified at the connector (bond points are intentional).
EMI emission red flags:
- EMI improves but timing margin collapses (over-slowed edges, distorted sampling window).
- Hot spots appear near SCLK/UART transitions (common-mode conversion via return discontinuity).
EMI immunity design checks:
- Inputs vulnerable to threshold jitter are protected (filter/RC, hysteresis where appropriate).
- Common-mode path is governed (CM energy is routed to chassis corridor, not to receiver reference).
- Supply injection is bounded (local decoupling + rail clamp path exists).
- Error signatures are measurable (counters + time windows are defined).
EMI immunity red flags:
- I²C false START/STOP, SDA false toggles, or hung-bus spikes.
- SPI sampling window squeeze causes bit flips without obvious amplitude collapse.
- UART framing/parity bursts aligned with disturbance exposure.
Hot-plug / UVLO design checks:
- Enable order is enforced: rails stable → reset release → bus enabled.
- Back-powering is observable (IO voltage higher than VDD window is logged).
- Bus recovery ladder is present: timeout → bus-clear/re-sync → reset sequencing.
- Inrush is bounded (load switch / ideal diode / UVLO gating as needed).
Hot-plug / UVLO red flags:
- After plug/unplug: I²C SDA stuck low, SPI slave half-awake, UART emits false start-bit.
- Reset storms or “power-cycle required” wedges during brownout windows.
- Ports become more fragile after repeated plug cycles (cumulative stress).
Test hooks (board-level): make failures observable and repeatable
- Shield/chassis bond test pad near connector (verify continuity and path).
- TVS return vias made probe-friendly (short loop for current probing).
- Rail clamp node test pad (observe rail lift and recovery).
- Inline series element footprints (0Ω/series-R/FB swap options).
- Header for logic/protocol analyzer with ground reference point.
- Reset/UVLO status capture (reset reason pin or register log).
Power-path and current-monitoring examples:
Load switch: TI TPS22919, TI TPS22965
Ideal diode: TI LM66100
Current monitor: TI INA219, TI INA226
Shunt resistor: Vishay WSL0603R0100FEA (example)
MPNs are examples. Always validate voltage rating, capacitance, clamp behavior, package, and derating in the target port environment.
Counters to log:
- I²C: NAK, arbitration lost, clock-stretch timeout, bus-clear invoked, hung-bus duration.
- SPI: CRC fails (if used), framing mismatch, desync events, retry count.
- UART: framing error, parity error, break detect, overrun, resync count.
Normalization windows:
- Per X minutes window (time-based stability).
- Per Y transactions window (workload-normalized).
- Per Z plug/power cycles window (event-normalized).
Drift indicators:
- Leakage trend after repeated events (port becomes “more fragile”).
- Recovery time trend (P95/P99 creeping upward).
- Error bursts aligned with plug/power windows.
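Normalizing counters and tracking recovery-time percentiles per window takes only a few lines of host-side analysis. A sketch, assuming counters and recovery times are exported per window; nearest-rank is one common percentile definition (others interpolate), and the names are hypothetical.

```python
import math


def per_1k(errors: int, transactions: int) -> float:
    """Workload-normalized error rate (errors per 1000 transactions)."""
    return 1000.0 * errors / transactions if transactions else 0.0


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by_window(windows):
    """P95 recovery time per window; a rising sequence indicates creep."""
    return [percentile(w, 95) for w in windows]
```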
Concrete example parts (MPNs) for robustness building blocks (verify for your rails, speed, and port environment)
ESD arrays / diodes: Littelfuse SP0502BAHT, Littelfuse SP1003-01WTG, Semtech RClamp0524P
Surge TVS: STMicroelectronics SMBJ6.0A (example family), Vishay SMBJ5.0A (example family)
Common-mode choke: TDK ACM2012 series (CMC family example)
SPI/UART isolator: TI ISO7741, Analog Devices ADuM1401
Ideal diode: TI LM66100
Supervisor / reset: TI TPS3808 (example)
Ferrite bead family examples: Murata BLM18 series, TDK MPZ2012 series
Part selection is context-dependent: signal speed, rise-time budget, rail voltage, cable environment, chassis bonding, and repetition heating all matter. Treat the MPN list as a starting set and validate in your acceptance gates.
FAQs (Robustness: ESD/Surge/EFT, EMI, Hot-Plug, UVLO)
Format per FAQ is fixed and data-structured: Likely cause / Quick check / Fix / Pass criteria (threshold placeholders X/Y/Z).
Pass IEC ESD once, but the bus becomes “more fragile” later—what degradation check is fastest?
Quick check: Compare pre/post-event leakage (Vport→GND, Vport→VDD), track error counters per window, and check clamp temperature rise under repeated hits (trend, not single shot).
Fix: Increase energy margin (stronger clamp or better rail-sink path), shorten/clean the return loop to the intended reference, and add a small series element to limit peak current into the IC.
Pass criteria: ΔLeakage ≤ X µA; error rate ≤ X / 1k transactions over Y minutes; recovery ≤ X s; no upward trend after N events.
Same ESD array footprint, different vendor makes errors worse—what’s the first C/leakage sanity check?
Quick check: Measure rise/fall time and steady-state bias (idle level) at the receiver, then compare leakage at the intended bias voltage and temperature (e.g., 25°C vs 85°C).
Fix: Select a lower-capacitance, lower-leakage ESD array; add minimal series damping; re-verify pull-up/drive strength with the new parasitics.
Pass criteria: tR/tF within budget (≤ X ns); idle bias error ≤ X mV; error rate ≤ X / 1k over Y minutes.
ESD gun hits nearby metal, the bus hangs—return path issue or latch-up?
Quick check: Log reset reason, capture VDD_min and ground bounce during the strike, and check for abnormal supply current that persists after the event.
Fix: Improve chassis bonding and “short, direct” ESD return routing; add rail clamps/sink capacity; ensure IO injection is limited (series element + correct clamp placement).
Pass criteria: No hung-bus; no persistent overcurrent; automatic recovery ≤ X s for N strikes on nearby metal with stable error counters.
Cable hot-plug causes random resets—first check inrush path or back-powering path?
Quick check: Record VDD_min, brownout_dwell, and reset_reason during plug; also log “IO > VDD” window as a back-powering indicator.
Fix: Add soft-start/load switch, ideal-diode isolation, and gate bus enable until rails are stable; add IO series resistance to limit injection current.
Pass criteria: Reset count = 0 over N plug cycles; VDD_min ≥ X V; recovery ≤ X ms; no drift in counters across cycles.
Brownout doesn’t reset MCU, but peripherals lock—what’s the first UVLO behavior to verify?
Quick check: Capture bus line state across the brownout window (e.g., SDA low-hold time), log peripheral reset/POR status, and record whether recovery requires power-cycle.
Fix: Enforce a deterministic safe state: supervisor-gated enables, explicit peripheral reset sequencing, and a defined bus-recovery ladder (timeout → bus-clear/re-sync → reset).
Pass criteria: After the brownout profile, bus returns to idle and devices re-enumerate ≤ X s; hung time ≤ X ms; no manual power-cycle required across N repeats.
EMI test passes emissions, fails immunity—what coupling path is most common on serial ports?
Quick check: Correlate error bursts with common-mode current (clamp-on probe if available) and compare behavior with controlled chassis bond/return-path changes (A/B test).
Fix: Govern the common-mode return corridor to chassis; improve return continuity; add CM suppression at cable entry and bound supply injection with local decoupling/rail clamps.
Pass criteria: Under the immunity stress level, errors ≤ X / window; no persistent wedge; recovery ≤ X s; counters show no trend over N runs.
Scope shows clean edges, but immunity fails—what threshold/ground-bounce proxy should you log?
Quick check: Log time-stamped errors with VDD ripple, VDD_min, and a local reference proxy (receiver-side ground bounce or IO-to-local-ground delta).
Fix: Add hysteresis/filtering where valid, strengthen local decoupling and return governance, and reduce common-impedance coupling into the receiver reference.
Pass criteria: Error counter ≤ X / Y minutes; no hung state; recovery ≤ X s under the defined immunity exposure for N repetitions.
Adding series-R fixes EMI but breaks timing—what’s the “minimum viable damping” decision rule?
Quick check: Sweep series-R and measure timing margin proxy (edge rate + receiver-side sampling margin) while tracking error counters at the target bus speed.
Fix: Choose the smallest series-R that meets emission targets while maintaining ≥X% timing margin; keep a “tuning footprint” and avoid damping that forces operating near the threshold.
Pass criteria: Timing margin ≥ X%; tR/tF ≤ X ns; error rate ≤ X / 1k over Y minutes; EMI meets target without new retries.
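The decision rule can be sketched as a sweep over candidate resistor values: pick the smallest value that both meets the emission target and keeps the edge within the rise-time budget. This assumes a lumped first-order model (10-90% rise ≈ 2.2 · R · C) and a hypothetical `emission_ok` callback fed by measured scan results; traces long enough to behave as transmission lines need a different model.

```python
def rise_time_ns(r_drive_ohm: float, r_series_ohm: float, c_load_pf: float) -> float:
    """First-order 10-90% rise time of a driver into a lumped capacitive load."""
    return 2.2 * (r_drive_ohm + r_series_ohm) * c_load_pf * 1e-3  # ohm*pF = ps -> ns


def min_viable_series_r(candidates, emission_ok, r_drive_ohm, c_load_pf, max_rise_ns):
    """Smallest series-R that passes emissions and stays within the rise budget."""
    for r in sorted(candidates):
        if emission_ok(r) and rise_time_ns(r_drive_ohm, r, c_load_pf) <= max_rise_ns:
            return r
    return None  # no value satisfies both: fix the loop/return path instead
```

With a hypothetical 25 Ω driver, 20 pF load, and scans showing that emissions pass from 33 Ω upward, a 3 ns budget selects 33 Ω; a 2 ns budget returns None, signalling that damping alone cannot close the gap.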
ESD hits and UART starts framing errors—filtering issue or reference ground shift?
Quick check: Compare framing-error bursts with local ground movement and VDD disturbance; check whether errors reduce when the return/chassis bond is improved vs when RX filtering is adjusted.
Fix: Govern the common-mode return to chassis, improve reference stability, and apply minimal RX filtering that reduces spikes without violating baud timing margin.
Pass criteria: Framing/parity errors ≤ X / hour; no stuck state; recovery ≤ X s after N ESD events; no timing-margin regression.
Protection clamps heat up in surge tests—what’s the first energy accounting check?
Quick check: Calculate E = ∫V·I·dt from the measured surge waveform, log repetition rate, and measure clamp temperature rise vs time to detect cumulative heating.
Fix: Increase surge margin (higher power TVS, better rail sink, distribute energy), add series impedance, and ensure the return path targets chassis rather than sensitive ground.
Pass criteria: ΔT_clamp ≤ X °C at worst-case repetition; no leakage drift beyond X µA; no error-rate increase after N surges.
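The energy accounting check integrates the measured clamp voltage and current. A sketch using the trapezoidal rule on captured samples, plus the average dissipation at a given repetition rate; the waveform arrays are assumed to come from a scope export, and the function names are hypothetical.

```python
def pulse_energy_j(t_s, v, i):
    """E = integral of V*I dt over one captured pulse (trapezoidal rule)."""
    energy = 0.0
    for k in range(1, len(t_s)):
        p_prev = v[k - 1] * i[k - 1]
        p_now = v[k] * i[k]
        energy += 0.5 * (p_prev + p_now) * (t_s[k] - t_s[k - 1])
    return energy


def average_power_w(energy_per_pulse_j, repetition_hz):
    """Mean dissipation in the clamp when pulses repeat at repetition_hz."""
    return energy_per_pulse_j * repetition_hz
```

Comparing the average power against the clamp's thermal derating is what links repetition rate to the ΔT_clamp criterion.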
After hot-plug, only one slave is dead—what sequencing mistake is typical?
Quick check: Observe that slave’s rail ramp vs bus activity, detect “IO > VDD” injection windows, and log whether the device recovers with reset-only vs power-cycle.
Fix: Gate bus enable by per-rail power-good, add IO series resistance/limiting, and ensure deterministic reset sequencing after plug events.
Pass criteria: For N plug cycles, no dead slave; enumeration completes ≤ X ms; injection window ≤ X ms; no “power-cycle required” condition.
UVLO triggers oscillation—how to detect a “power-good chatter” artifact quickly?
Quick check: Log UVLO/power-good toggles and VDD around the threshold; detect chatter by counting toggles per second and correlating with error bursts.
Fix: Add hysteresis and a minimum hold-off (RC delay), enforce “rails stable → reset release → bus enable”, and block bus activity during unstable windows.
Pass criteria: Chatter toggles ≤ X (ideally 0) per event; single clean reset; recovery ≤ X s over N brownout profiles with stable counters.
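Chatter detection and the hysteresis fix can be illustrated together: a power-good comparator with a single threshold toggles on every ripple crossing, while separate rise and fall thresholds produce one clean transition. The thresholds and VDD samples used in the check are hypothetical.

```python
from typing import List, Sequence


def power_good(vdd_samples: Sequence[float], v_rise: float, v_fall: float) -> List[bool]:
    """Comparator with hysteresis: asserts at or above v_rise, deasserts below v_fall."""
    pg, out = False, []
    for v in vdd_samples:
        if not pg and v >= v_rise:
            pg = True
        elif pg and v < v_fall:
            pg = False
        out.append(pg)
    return out


def toggles(states: Sequence[bool]) -> int:
    """Count power-good transitions; more than one per event indicates chatter."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)
```

With VDD rippling around 2.9 V, a single 2.9 V threshold produces five toggles, while 3.0 V rise / 2.8 V fall thresholds produce one.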