UART Framing & Parity Errors Under Noise: Oversampling & Filters
This page turns UART framing and parity errors under noise into an executable workflow: map FE/PE to receiver decision points, then harden the link with oversampling, start-bit qualification/de-glitch, input conditioning, and safe resync policies.
The result is measurable robustness—lower FE/PE rates, shorter error bursts, and bounded recovery time under real board and system noise.
Problem definition & symptom taxonomy (FE/PE/noise)
This section converts “garbled bytes / missing bytes / sporadic failures” into a repeatable, engineering-first triage. The goal is to classify symptoms quickly and choose the first measurement that separates noise-induced receiver mis-detect from configuration/throughput issues—without expanding into baud-budget or PHY deep dives.
Scope guard: Covers FE/PE meaning, noise symptom patterns, first checks. Does not cover detailed baud error budgeting, frame-format selection strategy, or long-cable PHY design.
Minimal distinction (flags → first interpretation)
FE (Framing Error)
- Meaning: stop-bit check failed at the receiver decision point.
- First suspicion: false start → bit misalignment, or stop level pulled low by noise/glitch.
- First check: capture idle→start→stop waveform; look for short low glitches near stop.
PE (Parity Error)
- Meaning: parity check mismatched (computed vs received parity bit).
- First suspicion: parity mode mismatch OR single-bit flips caused by noise.
- First check: run a known pattern (walking-1/0 or PRBS); correlate PE to bit position/value.
OE (Overrun)
- Meaning: RX FIFO/register overwritten before software/DMA drained it.
- First suspicion: ISR latency, DMA configuration, or flow-control mismatch (not “noise” first).
- First check: inspect FIFO level/overrun counters; reproduce with lower baud or reduced bursts.
BREAK / long-low
- Meaning: line held low longer than a frame (receiver treats as break/abort).
- First suspicion: brown-out dragging IO, contention, hot-plug transient, or deliberate break.
- First check: measure low duration and correlate with power events and TX enable states.
Symptom map (translate “it fails” into first actions)
Symptom A: “FE spikes in bursts” (normally 0, then sudden clusters)
- Primary suspicion: false start events or stop-bit glitches caused by impulsive EMI/ground bounce.
- First check: trigger capture around the spike; look for short low pulses during idle/stop windows.
- Fast divider: if errors align with a system event (motor/relay/ESD), treat as coupling-path first.
Symptom B: “PE only on certain byte values”
- Primary suspicion: parity mode mismatch OR noise flipping specific bit positions near edges.
- First check: send walking-1/0 (bit-sweep) and compare PE distribution per bit position.
- Fast divider: deterministic “always wrong” points to configuration; probabilistic points to noise.
Symptom C: “Errors happen during idle” (no traffic)
- Primary suspicion: false start detection (idle is not stable) or input threshold chatter.
- First check: log false-start counts (if available) and observe RX pin idle stability + noise floor.
- Fast divider: add start-bit qualification or a de-glitch filter and verify whether the “idle errors” collapse quickly.
Symptom D: “Garbled bytes without FE/PE” (parser fails, flags look clean)
- Primary suspicion: higher-layer framing mismatch, buffer overrun without flag visibility, or analyzer decode mismatch.
- First check: compare raw captured bits vs MCU flags; verify sampling point assumptions and decode settings.
- Fast divider: if logic analyzer decodes cleanly but MCU flags errors, sampling-window/qualify settings are suspect.
Minimum logging (makes noise issues reproducible)
- Counts: FE/PE/OE/BREAK counters with time windowing (per second or per N bytes).
- Context: supply state (UVLO/brown-out indicator), temperature band, and “event tag” (motor/relay/hot-plug).
- RX metrics (if supported): false-start rejects, majority-vote disagreement rate, or noise filter hit rate.
- Recovery: time-to-resync and discard policy used (byte drop vs frame drop).
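The minimum logging set above can be sketched as a small windowed counter. This is only an illustration: the class name, the field names, and the 1000-byte default window are assumptions, not part of any driver API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorWindowLog:
    """Per-window UART error counters plus context tags (hypothetical schema)."""
    window_bytes: int = 1000                      # assumed window size, in bytes
    counts: Counter = field(default_factory=Counter)
    bytes_seen: int = 0
    tags: set = field(default_factory=set)
    closed: list = field(default_factory=list)    # finished windows, oldest first

    def tag(self, label):
        """Attach an event tag (e.g. 'relay', 'hot-plug') to the open window."""
        self.tags.add(label)

    def on_byte(self, fe=False, pe=False, oe=False, brk=False):
        """Record one received byte and the flags the UART raised for it."""
        for name, flag in (("FE", fe), ("PE", pe), ("OE", oe), ("BREAK", brk)):
            if flag:
                self.counts[name] += 1
        self.bytes_seen += 1
        if self.bytes_seen >= self.window_bytes:
            self._close()

    def _close(self):
        """Freeze the open window into a record and start a new one."""
        self.closed.append({
            "bytes": self.bytes_seen,
            "FE": self.counts["FE"],
            "PE": self.counts["PE"],
            "OE": self.counts["OE"],
            "BREAK": self.counts["BREAK"],
            "tags": sorted(self.tags),
        })
        self.counts = Counter()
        self.bytes_seen = 0
        self.tags = set()
```

Counting per N bytes rather than per second keeps rates comparable across baud settings; a time-based window works equally well as long as one choice is used consistently.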
Diagram intent: FE anchors to the stop-bit decision; PE anchors to the parity decision. OE/BREAK are included only as a minimal triage boundary.
What framing/parity errors really mean (receiver decision points)
UART errors are not “random.” Each flag corresponds to a specific receiver decision moment. Understanding where the decision happens turns vague symptoms into targeted checks: false start detection, sampling-window collapse, and stop/parity mis-judgment.
Receiver decision points (the only moments that matter)
- Start qualify: decide whether a low level is a valid start bit or a glitch.
- Sampling window: choose a sampling instant (often mid-bit; sometimes majority vote across sub-samples).
- Parity decision: compute expected parity from data bits and compare to received parity bit.
- Stop-bit decision: verify line is at the idle level during stop-bit check (framing integrity).
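The four decision points can be made concrete with a toy receiver model. The sketch below assumes 16x oversampling, 8 data bits sent LSB-first, one stop bit, and a single mid-bit sample per decision; `encode_frame` and `decode_frame` are illustrative names, not a real peripheral's behavior.

```python
OVERSAMPLE = 16  # assumed oversampling ratio


def encode_frame(byte, parity="even", oversample=OVERSAMPLE):
    """Build an oversampled frame: start, 8 data bits LSB-first, parity, stop."""
    data = [(byte >> i) & 1 for i in range(8)]
    bits = [0] + data                        # start bit is LOW
    if parity in ("even", "odd"):
        p = sum(data) % 2                    # even parity: total count of 1s is even
        bits.append(p ^ 1 if parity == "odd" else p)
    bits.append(1)                           # stop bit is HIGH (idle level)
    return [b for b in bits for _ in range(oversample)]


def decode_frame(samples, parity="even", oversample=OVERSAMPLE):
    """Decode one frame beginning at samples[0].

    Returns None if the start bit does not qualify, else (byte, pe, fe).
    """
    mid = oversample // 2

    def bit_at(slot):
        # Sampling-window decision: one sample at the middle of the bit slot.
        return samples[slot * oversample + mid]

    if bit_at(0) != 0:                       # start qualify: still LOW at mid-bit?
        return None
    data = [bit_at(1 + i) for i in range(8)]
    byte = sum(b << i for i, b in enumerate(data))
    slot, pe = 9, False
    if parity in ("even", "odd"):
        expected = sum(data) % 2
        if parity == "odd":
            expected ^= 1
        pe = bit_at(slot) != expected        # parity decision
        slot += 1
    fe = bit_at(slot) != 1                   # stop-bit decision
    return byte, pe, fe
```

Flipping a single sample at a decision instant is enough to raise a flag in this model, which is why the later hardening steps concentrate on those instants.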
FE (Framing Error) = stop-bit decision failed
- Most common narrative: a false start or edge glitch shifts the perceived bit boundary, so the “stop check” samples inside data/low.
- Noise signature: clustered FE bursts aligned to impulsive events (switching, hot-plug, ESD, ground bounce).
- Fast confirmation: capture the stop window and look for brief low pulses or ringing crossing threshold.
PE (Parity Error) = parity decision mismatched
- Two dominant causes: parity mode mismatch (deterministic) OR single-bit flips from noise (probabilistic).
- Noise signature: PE correlates with specific bit positions or edge-sensitive transitions; may rise with EMI events.
- Fast confirmation: run a bit-sweep pattern and build “PE vs bit position” statistics.
Three common failure narratives (FE/PE combinations)
Case 1: FE + PE together (both spike)
Most consistent with false start or severe sampling-window collapse: the receiver locks onto the wrong bit boundary, then parity and stop checks both fail because the decision points are no longer aligned to the actual frame.
Case 2: FE dominates, PE is rare
Points to stop window disturbance: a short low glitch or ringing crosses threshold near stop sampling. The data bits may still be mostly correct, but stop validation fails under noise.
Case 3: PE dominates, FE stays near zero
Either a parity configuration mismatch (highly deterministic) or single-bit flips that do not disrupt stop validation. Pattern-driven statistics separate configuration from noise quickly.
What this implies next: false start → start qualification + de-glitch; sampling collapse → oversampling/majority vote; stop disturbance → input conditioning and edge control; parity mismatch → verify parity mode + pattern-based noise check.
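As a rough first-pass triage aid, the three narratives can be expressed as a classifier over one counting window. The `dominance` ratio of 5 is an arbitrary assumption for illustration; tune it against real logs.

```python
def classify_window(fe, pe, dominance=5.0):
    """Map FE/PE counts from one window to the most likely failure narrative.

    `dominance` is an assumed ratio: one flag "dominates" when its count is
    at least `dominance` times the other's.
    """
    if fe == 0 and pe == 0:
        return "clean"
    if fe >= dominance * pe:
        return "stop-window disturbance"              # Case 2: FE dominates
    if pe >= dominance * fe:
        return "parity mismatch or single-bit flips"  # Case 3: PE dominates
    return "false start or sampling-window collapse"  # Case 1: both spike
```

The label is only a starting hypothesis; the pattern tests and waveform captures described elsewhere on this page are what confirm it.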
Diagram intent: sampling points and qualify stages are the only places where short glitches become FE/PE. Later sections address oversampling, de-glitch filters, and resync policies.
Noise coupling paths that create FE/PE (board & system reality)
FE/PE spikes are usually not “mystery UART behavior.” They are the receiver’s decision points reacting to energy coupled into the RX path. This section classifies failures by coupling path, so the first measurement targets the right chain (ground, supply/threshold, near-field, or impulsive events).
Scope guard: This section provides diagnostic entry points (what to measure first). It avoids deep PHY/RS-485 transmission theory and baud-budget math.
Coupling-path taxonomy (signature → first measurement → fast mitigations)
1) Ground bounce / return-path discontinuity
- Signature: clustered FE bursts aligned with switching; “idle errors” appear when nearby high di/dt toggles; stop window crosses threshold intermittently.
- First measurement: probe RX pin and reference ground near the RX pad; compare to a distant ground to expose ground delta and reference movement.
- Fast mitigations: enforce continuous return path; shorten RX return loop; add local ground stitching; reduce aggressor loop area near RX routing.
2) Conducted noise via supply / IO threshold (VDDIO/ground injection)
- Signature: FE/PE increases with load transients or DC/DC state changes; false-start rate rises while RX waveform looks “fine” relative to a moving threshold.
- First measurement: measure VDDIO ripple and ground at the RX domain; correlate error counters with supply events and threshold crossings.
- Fast mitigations: improve local decoupling; isolate noisy rails; add series impedance where appropriate; verify input hysteresis behavior at the IO domain.
3) Crosstalk / near-field coupling (neighbor aggressors)
- Signature: errors correlate with a specific neighbor activity (PWM, fast clocks); PE may correlate with bit transitions if coupling hits edges.
- First measurement: probe the neighbor aggressor and RX simultaneously; check timing alignment (edge-to-error correlation) and measure overshoot/ringing at RX.
- Fast mitigations: increase spacing; route with ground shielding; reduce edge rate at the aggressor; add small series-R to tame ringing at the victim input.
4) Impulsive events (motor/relay switching, hot-plug, ESD-like transients)
- Signature: short high-amplitude disturbances; FE burst dominates; stop-bit window shows brief low glitches or threshold crossings.
- First measurement: trigger on the event (relay coil, motor PWM edge, plug-in moment) and capture RX with a time-aligned window around the event.
- Fast mitigations: add suppression on the event source (snubbers/TVS where appropriate); ensure robust return; harden RX input with conditioning and filtering strategy.
5) Post-ESD / stress drift (threshold shift, leakage, edge distortion)
- Signature: system “passes once” but becomes fragile later; FE/PE rises without a clear new aggressor; idle level looks different across temperature/humidity.
- First measurement: compare RX pin idle level and edge shape before/after stress; look for increased leakage or shifted effective threshold behavior.
- Fast mitigations: re-check protection network loading; validate clamp capacitance/placement; confirm no partial damage or ghost-power paths affecting IO state.
“Only during a specific action”: EMI/coupling or clock/config?
- Event correlation: if errors align tightly with a motor/relay/hot-plug/ESD-like event, treat coupling-path first.
- Error morphology: bursts suggest impulsive coupling; value/bit-position correlation suggests edge-sensitive coupling or parity mismatch.
- A/B quick experiment: one fast change (series-R, re-route cable path, local decoupling, qualify/filter setting) should shift error rate by >X if coupling dominates.
See also (do not expand here): Baud Rate & Error Budget (clock drift/ppm) · Voltage Levels & PHY (long-cable/RS-232/RS-485) · Start-bit qualify & de-glitch (next section).
Oversampling fundamentals (8x/16x) & sampling window robustness
Oversampling improves noise tolerance by turning a fragile single decision into a more robust sampling window. FE/PE spikes often appear when the window collapses toward bit edges (where ringing, jitter, and short glitches are most harmful).
Sampling-window model (why mid-bit sampling is safer)
- Mid-bit is stable: it maximizes distance from edges where noise and ringing cross thresholds.
- Window collapse: if noise or reference movement shifts the effective crossing time, the “mid-bit” point drifts toward the edge and becomes error-prone.
- Receiver symptom: stop-bit checks fail (FE) or parity mismatches rise (PE) when decisions land in the uncertain edge region.
8x vs 16x oversampling (principle-level trade-offs)
8x oversampling
- Strength: simpler timing; fewer internal phases; often stable in basic implementations.
- Risk: less phase granularity for qualify/vote; more sensitive when a single sample is used.
- Best when: edges are clean and coupling is mild; filtering/qualify hooks are limited.
16x oversampling
- Strength: finer phase resolution enables start qualify and multi-sample voting.
- Risk: if voting window spans edge regions, short glitches can influence multiple sub-samples.
- Best when: the receiver supports robust qualify/vote configuration and edge regions are controlled.
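The phase-granularity difference can be put in numbers with a simplified model. It assumes the receiver notices the start edge on the first oversample tick after the true edge (up to one tick late) and then counts half a bit to the sample point; baud mismatch and jitter are deliberately ignored, and the function names are illustrative.

```python
def worst_case_edge_margin(oversample):
    """Worst-case distance from the sample point to the nearest bit edge, in bit times.

    Assumed model: start-edge detection is late by up to one oversample tick
    (1/oversample of a bit), so a nominal mid-bit sample can sit that much
    closer to the trailing edge of the bit.
    """
    detect_uncertainty = 1.0 / oversample
    return 0.5 - detect_uncertainty


def margin_seconds(oversample, baud):
    """Same margin expressed in seconds for a given baud rate."""
    return worst_case_edge_margin(oversample) / baud
```

Under this model 8x leaves 0.375 of a bit of margin and 16x leaves 0.4375, so the finer grid buys about 6% of a bit time before any noise is considered.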
Majority vote (3-sample voting) as a short-glitch suppressor
- Single-sample receiver: a short glitch hitting the sampling instant can flip the bit immediately.
- 3-sample voting: the same glitch must corrupt at least 2 out of 3 sub-samples to change the decision.
- Design implication: keep the vote cluster centered (avoid edge proximity) and avoid overly wide spacing that overlaps edges.
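A minimal sketch of 3-sample voting, assuming 16x oversampling and a vote cluster of the three samples centered on mid-bit. Actual tap positions vary by receiver, so treat the indices here as an assumption.

```python
def majority3(a, b, c):
    """Return the value held by at least two of three binary samples."""
    return (a & b) | (a & c) | (b & c)


def sample_bit(samples, bit_index, oversample=16, vote=True):
    """Decide one bit from an oversampled stream.

    With vote=False a single mid-bit sample is used; with vote=True the three
    samples centered on mid-bit are majority-voted.
    """
    mid = bit_index * oversample + oversample // 2
    if not vote:
        return samples[mid]
    return majority3(samples[mid - 1], samples[mid], samples[mid + 1])
```

A one-sample glitch at the decision instant flips the single-sample result but not the voted one; a glitch two samples wide defeats both, which is why voting complements rather than replaces de-glitch filtering and edge conditioning.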
Edge noise vs low-frequency drift (what to fear first)
Edge noise (ringing, crosstalk, impulsive spikes)
- Typical outcome: stop window crosses threshold → FE dominates.
- Most effective: center sampling + majority vote + edge conditioning (series-R / controlled slew) + de-glitch.
Low-frequency drift (threshold wander, reference movement, slow baseline shift)
- Typical outcome: start qualify becomes fragile; sampling points drift toward edges → FE/PE rise over time.
- Most effective: stronger qualify/resync policy + stable IO domain reference + correlation logging (events/rails/temperature).
Start-bit qualification & de-glitch filters (reject false start)
The most destructive path for FE/PE bursts is a false start: an idle line is pulled low by a transient, the receiver “locks” on the wrong bit boundary, and parity/stop checks land in the wrong time slots. This section hardens the RX front-end using start-bit qualification and de-glitch policies.
False-start signatures (idle disturbed)
- Short low dip on idle: a narrow low pulse (glitch) that resembles start for a fraction of a bit time.
- Threshold chatter: repeated threshold crossings around idle due to supply/ground movement.
- Edge ringing: a fast aggressor creates overshoot/undershoot that crosses the RX threshold briefly.
- Symptom pattern: FE/PE appear in bursts and align with a switching event (relay/motor/hot-plug).
Start-bit qualification (N-of-M low samples)
- Core rule: declare START only after N consecutive oversamples are LOW (or N out of a short window are LOW, depending on implementation).
- Practical effect: narrow idle glitches fail qualification and are rejected before the receiver commits to a bit boundary.
- Where it helps most: event-driven EMI and ringing that produces brief threshold crossings.
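The consecutive-LOW form of the rule can be sketched as below. The default of 4 oversamples is an assumption for illustration, and real receivers may use an N-of-M window instead.

```python
def find_qualified_start(samples, n_low=4):
    """Return the index where the first qualified START begins, or None.

    A START qualifies only after `n_low` consecutive oversamples are LOW;
    shorter LOW runs are treated as glitches and ignored.
    """
    run = 0
    for i, level in enumerate(samples):
        if level == 0:
            run += 1
            if run == n_low:
                return i - n_low + 1     # index of the falling edge
        else:
            run = 0
    return None
```

With `n_low=1` the same function behaves like an unqualified edge detector, which makes it a convenient A/B reference when replaying captured idle noise.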
De-glitch filters (minimum pulse width / vote window / digital integrator)
Minimum pulse width
- Rule: pulses shorter than Tglitch are ignored.
- Best for: sharp, narrow spikes (impulsive coupling).
- Risk: if Tglitch is too long, true START edges may be delayed or missed.
Sliding-window vote
- Rule: in a window of W oversamples, require ≥K LOW samples.
- Best for: threshold chatter and edge ringing.
- Risk: if W is too wide, edge regions are included and timing shifts worsen.
Digital integrator
- Rule: LOW evidence accumulates; trigger when an internal score crosses a threshold.
- Best for: noisy idle with frequent small crossings.
- Risk: excessive accumulation delays START recognition (sampling point drift).
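The three filter styles can be compared on the same captured sample stream with small offline models. These are sketches operating on lists of 0/1 oversamples; the function names and parameters are illustrative, and the minimum-pulse-width model edits the record after the fact, so it does not show the delay a real-time filter would add.

```python
def reject_short_low(samples, t_glitch):
    """Minimum pulse width: erase LOW runs shorter than t_glitch samples."""
    out = list(samples)
    i = 0
    while i < len(out):
        if out[i] == 0:
            j = i
            while j < len(out) and out[j] == 0:
                j += 1
            if j - i < t_glitch:
                out[i:j] = [1] * (j - i)     # too short: treat as idle
            i = j
        else:
            i += 1
    return out


def sliding_vote_low(samples, w, k):
    """Sliding-window vote: output LOW when >= k of the last w samples are LOW."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - w + 1): i + 1]
        out.append(0 if window.count(0) >= k else 1)
    return out


def integrate_low(samples, threshold):
    """Digital integrator: an up/down score with hysteresis on the output.

    The score rises on LOW and falls on HIGH, clamped to [0, threshold].
    Output goes LOW when the score reaches threshold and HIGH when it returns to 0.
    """
    score, state, out = 0, 1, []
    for level in samples:
        score = min(threshold, score + 1) if level == 0 else max(0, score - 1)
        if score == threshold:
            state = 0
        elif score == 0:
            state = 1
        out.append(state)
    return out
```

On a stream with a 2-sample glitch followed by a real start, the vote and integrator models both reject the glitch but report the start two samples late, which is the recognition delay the guardrails below are meant to bound.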
Guardrails (avoid “filter so strong it breaks real UART”)
- Do not push START confirmation too late: qualification should consume only a small fraction of one bit time (α·Tbit, α = X placeholder).
- Set Tglitch with two constraints: reject the observed glitch width, but remain shorter than the stable-low portion of a real START edge (worst voltage/temperature).
- Validate with A/B tests: inject narrow glitches and real frames; require both false-start rejection and frame-detect retention.
Pass criteria (threshold placeholders)
- False-start rate: < X per minute (idle-only test window).
- Event burst reduction: FE/PE burst length drops to < X bytes at the triggering event.
- True-frame detect: ≥ X% frame detection under worst edge/voltage/temperature conditions.
- Recovery: receiver returns to stable decoding within < X ms after a disturbance.
Parity error patterns (why only certain bytes trigger PE)
“Parity errors only for certain byte values” typically means either a deterministic configuration mismatch or a probabilistic single-bit flip driven by edge-sensitive noise. The fastest way to distinguish them is to use targeted test patterns and observe how PE maps to byte values and bit positions.
Deterministic fingerprints (config mismatch)
- High, stable PE rate: errors reproduce consistently across runs and environments.
- Weak event correlation: PE does not track switching events or noise injections.
- Typical causes: parity enable/disable mismatch; even/odd/mark/space mismatch.
Probabilistic fingerprints (noise bit flips)
- Variable PE rate: changes with event timing, edge quality, temperature, or supply noise.
- Edge sensitivity: patterns with high transition density (e.g., alternating bits) may show higher PE.
- Hot bit position: PE concentrates on a specific bit index when coupling targets one timing window.
Fast checks (patterns that separate mismatch from noise)
Pattern set A: walking-1 / walking-0
- Goal: reveal a “hot” bit position (bit index sensitivity).
- Interpretation: if PE clusters at one bit position, noise is likely aligning to one edge/window.
Pattern set B: 0x00, 0xFF, 0x55, 0xAA
- Goal: compare low-transition vs high-transition density.
- Interpretation: higher PE on 0x55/0xAA suggests edge-related coupling and sampling-window fragility.
Pattern set C: PRBS (long-run statistics)
- Goal: separate stable deterministic mismatch from environment-dependent noise.
- Interpretation: deterministic mismatch stays stable across conditions; noise-driven PE varies with events and coupling.
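The pattern sets and the hot-bit statistic can be generated with a few helpers. This sketch assumes 8 data bits and even parity; `hot_bit_histogram` is a hypothetical name and relies on the test harness knowing what was sent.

```python
from collections import Counter


def walking_ones():
    """0x01, 0x02, ... 0x80: a single 1 walks across the byte."""
    return [1 << i for i in range(8)]


def walking_zeros():
    """0xFE, 0xFD, ... 0x7F: a single 0 walks across the byte."""
    return [0xFF ^ (1 << i) for i in range(8)]


def transitions(byte):
    """Number of level changes between adjacent data bits (transition density)."""
    bits = [(byte >> i) & 1 for i in range(8)]
    return sum(a != b for a, b in zip(bits, bits[1:]))


def even_parity_bit(byte):
    """Parity bit that makes the total number of 1s even."""
    return bin(byte).count("1") % 2


def hot_bit_histogram(sent, received):
    """Count which bit indices differ between sent and received bytes."""
    hist = Counter()
    for s, r in zip(sent, received):
        diff = s ^ r
        for i in range(8):
            if diff & (1 << i):
                hist[i] += 1
    return hist
```

A histogram concentrated on one index suggests noise aligned to one timing window, while a parity mismatch shows up as PE on every byte whose population count has the "wrong" parity, regardless of bit position.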
Config check entry (do not expand into system-level framing guidance)
- Parity enabled? verify both ends match (enabled/disabled).
- Parity type: verify even/odd/mark/space match.
- Observation: if PE is near-constant across patterns and conditions, treat mismatch first.
Pass criteria (threshold placeholders)
- After mismatch correction: PE < X per 10^N bytes on stable patterns.
- After noise hardening: PE reduces by > X% and “hot bit” concentration disappears or drops > X dB (placeholder).
- Event correlation: PE bursts at switching events are < X bytes and recover within < X ms.
Hardware input conditioning (threshold, hysteresis, RC, series-R)
FE/PE bursts often originate at the RX threshold: ringing, spikes, and threshold chatter create false-starts or destabilize sampling windows. This section turns hardware measures into an executable checklist while keeping scope limited to single-ended UART RX conditioning (not PHY-level migrations).
Threshold stability & hysteresis (Schmitt behavior)
- Why it matters: repeated threshold crossings on idle can trigger false START and shift the receiver’s bit boundary.
- Hysteresis value: two thresholds reduce “chatter” around the switching point and suppress micro-glitches.
- Quick checks: observe idle-level stability at the RX pin; look for multiple crossings near the threshold during events.
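The effect of hysteresis on chatter can be illustrated by counting output transitions for the same sampled waveform with one threshold and with two. The voltages used in the usage note are made-up illustration values, not figures from any datasheet.

```python
def count_transitions(voltages, v_low, v_high, initial=1):
    """Count output toggles of a Schmitt-style comparator.

    The output falls when the input drops below v_low and rises when it
    exceeds v_high. Setting v_low == v_high models a single threshold.
    """
    state, toggles = initial, 0
    for v in voltages:
        if state == 1 and v < v_low:
            state, toggles = 0, toggles + 1
        elif state == 0 and v > v_high:
            state, toggles = 1, toggles + 1
    return toggles
```

For a hypothetical idle level wandering between 1.6 V and 1.7 V, a single 1.65 V threshold toggles on nearly every sample, while thresholds of 1.2 V and 2.0 V produce no transitions at all.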
Series-R (edge damping)
- Target: reduce ringing and overshoot that re-crosses the threshold.
- Verification: count threshold crossings on a scope; require fewer crossings after adding R.
- Guardrail: avoid slowing edges so much that start qualification or sampling becomes timing-fragile (placeholder: X).
RC (glitch shaping)
- Target: attenuate narrow spikes without changing bit-time structure.
- Verification: compare spike width/amplitude vs RX threshold; require fewer false-starts.
- Guardrail: avoid stacking a strong analog RC with a strong digital de-glitch (see the start-bit qualification & de-glitch section), which can delay true START.
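A first-order estimate helps size the RC against both the glitch and the bit time. The sketch assumes an ideal rectangular full-swing LOW pulse driven through R into C, so the pin voltage decays as V_idle·exp(−t/RC) and reaches the threshold after RC·ln(V_idle/V_th). Component values in the usage note are illustrative only.

```python
import math


def rc_crossing_time(r_ohms, c_farads, v_idle, v_th):
    """Time for an RC-filtered falling edge to decay from v_idle to v_th.

    LOW pulses shorter than this never reach the threshold (rejected), and a
    real START edge is delayed by the same amount.
    """
    return r_ohms * c_farads * math.log(v_idle / v_th)


def delay_fraction_of_bit(r_ohms, c_farads, v_idle, v_th, baud):
    """The same delay expressed as a fraction of one bit time."""
    return rc_crossing_time(r_ohms, c_farads, v_idle, v_th) * baud
```

With 1 kΩ, 100 pF, a 3.3 V idle level, and a mid-rail threshold, the crossing time is about 69 ns, well under 1% of a bit at 115200 baud. The same calculation shows how quickly a larger RC starts to consume the timing margin.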
ESD/Clamp side effects (what to watch)
- Parasitic capacitance: slows edges and changes ringing; can increase timing sensitivity and start-detect fragility.
- Leakage drift: after ESD or in hot/humid conditions, leakage can bias idle level toward threshold and raise false-start risk.
- Clamp current path: poor return routing can convert a spike into ground bounce that destabilizes RX threshold during events.
- Quick check: A/B swap (same footprint, different part) and compare idle level, edge shape, and FE/PE statistics.
Escalation entry (when single-ended fixes are not enough)
- Common-mode disturbance is large: long cabling, ground potential differences, or repeated event-driven spikes.
- Best-effort conditioning still fails: after series-R/RC and reasonable protection, FE/PE cannot meet acceptance thresholds (X).
- Next step: consider differential or isolation strategies and link to the relevant PHY/isolator topics.
Pass criteria (threshold placeholders)
- Threshold crossings: ringing-induced multi-crossing reduces by > X% (scope-based).
- False-start rate: < X per minute during idle + event stress.
- Edge integrity: true-frame detect ≥ X% under worst-case voltage/temperature with conditioning enabled.
Firmware/driver handling (flags, resync, discard policy)
Robust UART systems treat FE/PE as a control problem: decide what to discard, how to resynchronize, what to count, and how to recover. This section defines practical handling policies without expanding into idle-detect or throughput-tuning topics.
FE handling (stop check failed)
- Default stance: treat bit boundary as suspect; prefer discarding the current frame segment.
- Resync trigger: enter a resync state and wait for stable idle-high or clean stop patterns.
- Measure: burst length (bytes) and recovery time (ms) vs acceptance threshold X.
PE handling (parity mismatch)
- Default stance: discard the byte or mark it invalid (depends on upper-layer tolerance).
- Fingerprinting: if PE is stable across patterns/conditions, prioritize config mismatch checks.
- Measure: PE rate per 10^N bytes and PE hot-bit concentration (placeholder).
Overrun warning (avoid misclassification)
- Risk: FIFO/DMA overruns can look like “noise corruption”.
- Minimum check: track overrun flags and FIFO watermark alongside FE/PE counters.
- Scope: no throughput tuning here; only classification and logging entry.
Resynchronization (restore a clean boundary)
- Enter resync: on FE bursts or repeated PE clusters within a short window.
- Wait condition: stable idle-high for ≥ X time (placeholder) or a clean stop/idle pattern sequence.
- Re-arm: re-enable start qualification (see the start-bit qualification & de-glitch section) before accepting the next START.
- Exit criteria: sustained decode stability for ≥ X bytes after resync.
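The enter/wait/exit rules can be sketched as a three-state policy object. The state names and every threshold default are placeholders chosen for illustration, and re-arming start qualification is left to the driver, since it is hardware-specific.

```python
class ResyncPolicy:
    """RUN -> WAIT_IDLE -> PROBATION -> RUN (hypothetical policy sketch)."""

    def __init__(self, burst_limit=3, idle_needed=32, stable_needed=16):
        self.burst_limit = burst_limit       # consecutive error bytes that trigger resync
        self.idle_needed = idle_needed       # consecutive HIGH line samples required
        self.stable_needed = stable_needed   # clean bytes required to leave probation
        self.state = "RUN"
        self.streak = 0
        self.idle_run = 0
        self.good = 0
        self.resync_count = 0

    def on_line_sample(self, level):
        """Feed raw RX levels while waiting for stable idle-high."""
        if self.state != "WAIT_IDLE":
            return
        self.idle_run = self.idle_run + 1 if level == 1 else 0
        if self.idle_run >= self.idle_needed:
            self.state, self.good = "PROBATION", 0

    def on_byte(self, fe=False, pe=False):
        """Return True if the byte should be passed upward, False to discard it."""
        if self.state == "WAIT_IDLE":
            return False                     # boundary is suspect: drop everything
        if fe or pe:
            self.streak += 1
            if self.state == "PROBATION" or self.streak >= self.burst_limit:
                self._enter_wait()
            return False
        self.streak = 0
        if self.state == "PROBATION":
            self.good += 1
            if self.good >= self.stable_needed:
                self.state = "RUN"
        return True

    def _enter_wait(self):
        self.state = "WAIT_IDLE"
        self.idle_run = 0
        self.streak = 0
        self.resync_count += 1
```

Any error during probation sends the policy straight back to waiting, which keeps a half-recovered link from leaking misaligned bytes to the parser.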
Counters & logging (minimum evidence set)
- Windowing: log every X ms or every N bytes (choose one) to keep statistics interpretable.
- Counters: FE count, PE count, (optional) overrun count, recovery count.
- Context tags: temperature (if available), VDDIO status (if available), and event labels (relay/motor/hot-plug).
- Outcome: enable reproduction and regression checks across builds, boards, and environments.
Pass criteria (threshold placeholders)
- Recovery time: < X ms from disturbance to stable decoding.
- Burst containment: FE/PE burst length < X bytes, followed by ≥ X bytes error-free.
- Logging completeness: every error window has counters + context tags (no missing fields).
Debug workflow (what to measure first, how to trigger)
Turn “garbled bytes” and “sporadic FE/PE” into a 10-minute executable path: capture the moment an error happens, classify the failure mode, then run fast A/B experiments to converge on the root cause.
Step 0 · Establish a minimal baseline (1 minute)
- Window: count per X ms or per 10^N bytes (choose one; placeholder X).
- Counters: FE, PE, (optional) overrun, recovery/resync count.
- Tags: cable length, power mode, and event labels (motor start / relay / hot-plug).
- Output: a baseline snapshot that makes A/B comparisons meaningful.
Logic analyzer (protocol decode)
- Best for: pinpointing which byte/frame fails and how long bursts last.
- Fingerprinting: deterministic PE patterns vs probabilistic noise.
- Patterns: 0x00/0xFF, 0x55/0xAA, walking-1/0, PRBS (fast correlation).
Oscilloscope (edge & threshold behavior)
- Best for: ringing, spikes, and multiple threshold crossings on RX.
- Idle sanity: detect idle drifting near the threshold (leakage/return noise).
- Event linkage: align RX behavior with supply ripple or motor/relay/hot-plug events.
Trigger cookbook (make “sporadic” reproducible)
- Flag-trigger: capture pre/post windows around FE/PE flags (±X ms placeholder).
- Long-LOW trigger: treat abnormal long-LOW as an “RX disturbance” capture (used only to grab the moment, not a protocol lesson).
- Event-trigger: tag motor/relay/hot-plug timestamps and align with FE/PE burst density.
- Supply-trigger: correlate error clusters with VDDIO ripple/step transitions (scope + log tags).
A/B ladder (fast root-cause narrowing)
- Change cable/length/route: strong length dependence suggests SI/return-path coupling.
- Change ground reference/return: event-driven improvements point to ground bounce/return discontinuity.
- Change supply mode: if idle errors disappear, suspect threshold drift driven by VDDIO noise.
- Add/adjust series-R: fewer multi-crossings point to ringing from overly fast edges.
- Adjust qualify/de-glitch thresholds: reduced false-start rate without “missed frames” indicates false-start dominance.
- A/B protection parts: systematic shifts implicate Cpar/leakage/clamp path side effects.
Output: Debug decision tree
Classify failures into false start, sampling shift, or threshold noise, then select the smallest fix set: qualify/de-glitch, front-end conditioning, or policy/resync/logging.
Pass criteria & metrics (quantify robustness)
Replace “looks better” with quantified acceptance. Use a consistent counting window, measure burst behavior and recovery time, and gate results across defined stress conditions.
Metric definitions (consistent windows)
- FE rate: FE per 10^N bytes (placeholder N, threshold X).
- PE rate: PE per 10^N bytes (placeholder N, threshold X).
- False-start rate: false START detections per minute under idle-only stress (threshold X).
- Burst length: distribution of consecutive error bytes (P95 or max; threshold X bytes).
- Time-to-recover: time from the first error until a run of consecutive error-free bytes is received (run length: placeholder X bytes; threshold: X ms).
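The metric definitions above can be computed from a per-byte error log. The sketch below assumes the log is a sequence of (timestamp in ms, error flag) pairs and uses a nearest-rank percentile for P95; both the log shape and the function names are assumptions made for illustration.

```python
import math


def rate_per_10n(error_count, total_bytes, n):
    """Errors per 10^n bytes over one counting window."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return error_count * (10 ** n) / total_bytes


def burst_lengths(flags):
    """Lengths of each run of consecutive error bytes."""
    bursts, run = [], 0
    for bad in flags:
        if bad:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; returns 0 when there are no bursts at all."""
    if not values:
        return 0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_to_recover(samples, clean_needed):
    """Time from the first error to the byte completing `clean_needed` clean bytes in a row.

    samples: iterable of (timestamp_ms, is_error). Returns None if no error
    occurred or the link never recovered inside the window.
    """
    first_error = None
    clean = 0
    for t, bad in samples:
        if bad:
            if first_error is None:
                first_error = t
            clean = 0
        elif first_error is not None:
            clean += 1
            if clean >= clean_needed:
                return t - first_error
    return None
```

Returning `None` for an unrecovered window, rather than a large number, keeps a failed recovery from being averaged into a passing result.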
Conditions pack (minimal but meaningful)
- Temperature: room / hot / cold (placeholders).
- Power: nominal / ripple injected / transient step (placeholders).
- Events: motor start / relay toggle / hot-plug (tagged and repeatable).
- Cabling: short vs long harness/trace (placeholders).
- Config edges: highest baud and worst-case operating corners (placeholders; no budget derivation here).
Pass/Fail gates (threshold placeholders)
- Gate 1: FE rate < X per 10^N bytes.
- Gate 2: PE rate < X per 10^N bytes.
- Gate 3: Burst length P95 (or max) < X bytes.
- Gate 4: Time-to-recover < X ms.
- Gate 5: False-start rate < X per minute (idle-only).
- Rule: gates must pass across the defined conditions pack (or document exceptions explicitly).
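The gates and the all-conditions rule can be checked mechanically once the metrics exist. In this sketch the gate names, the `evaluate` helper, and the numeric limits in the example are hypothetical; the limits stand in for the X placeholders and are not recommended values.

```python
GATES = ("fe_rate", "pe_rate", "burst_p95", "recover_ms", "false_start_per_min")


def evaluate(measured_by_condition, limits, exceptions=frozenset()):
    """Return {condition: [failed gate names]}; an empty dict means overall PASS.

    A gate passes only if measured < limit (strict, as in the gate list).
    `exceptions` holds explicitly documented (condition, gate) waivers.
    A missing measurement raises KeyError instead of passing silently.
    """
    failures = {}
    for condition, measured in measured_by_condition.items():
        failed = [
            gate for gate in GATES
            if (condition, gate) not in exceptions and not measured[gate] < limits[gate]
        ]
        if failed:
            failures[condition] = failed
    return failures


# Placeholder limits and illustrative measurements (not real data).
limits = {"fe_rate": 10, "pe_rate": 10, "burst_p95": 4,
          "recover_ms": 50, "false_start_per_min": 1}
measured = {
    "room_nominal": {"fe_rate": 2, "pe_rate": 1, "burst_p95": 2,
                     "recover_ms": 10, "false_start_per_min": 0},
    "motor_start": {"fe_rate": 12, "pe_rate": 1, "burst_p95": 4,
                    "recover_ms": 20, "false_start_per_min": 0},
}
print(evaluate(measured, limits))
```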
Engineering checklist (design → bring-up → production)
Building a noise-tolerant UART RX is an evidence-driven process: control threshold behavior, avoid false starts, measure burst morphology, and prove recovery under corner conditions. This checklist stays tightly scoped to FE/PE/noise robustness and recovery handling.
Design gate
- Input chain: RX trace, reference plane, and return continuity are documented.
- Filter plan: avoid stacking strong analog RC with strong digital de-glitch (define X guardrail).
- ESD Cpar budget: verify protection capacitance/leakage won’t pull idle toward the threshold.
- Ground path: the clamp-current return path does not inject ground bounce into the RX threshold reference.
- Expected waveform: “single crossing” edges (no repeated threshold crossings during events).
Evidence: scope captures (idle + event), schematic notes (Cpar/return path), planned counters window.
Bring-up gate
- Patterns: 0x55/0xAA, 0x00/0xFF, walking-1/0, PRBS (for PE fingerprints).
- Triggers: FE/PE flag windows (±X ms), abnormal long-LOW capture, event-tag correlation.
- Metrics: FE/PE rate, burst length, time-to-recover, false-start rate (X placeholders).
- A/B ladder: cable/ground/power/series-R/thresholds/protection swap to classify root cause.
Evidence: decoded burst logs + scope snapshots aligned to events.
Production gate
- BIST/loopback: fixed patterns + counter capture (prove observability and recovery).
- Corners: temperature, supply ripple/transients, cable length, event injection (placeholders).
- Minimum logs: window definition + FE/PE/burst/recovery/false-start + event tags.
- Gate rules: pass/fail thresholds must hold across conditions pack (document exceptions).
Evidence: “PASS/FAIL” record sheet per build/board/lot.
Applications & IC selection notes (noise-tolerant UART)
Noise-tolerant UART reception focuses on false-start rejection, sampling robustness, and observable recovery. The notes below remain scoped to UART RX behavior (not a PHY migration guide).
Industrial service/debug port
- Noise source: relays, motors, ESD events.
- Signature: FE bursts aligned with events; idle errors.
- Key hooks: start validation + de-glitch, series-R, hysteresis, resync + counters.
- Metric: burst length < X and recovery < X ms under event tags.
Long harness console (cabinet / multi-board)
- Noise source: return discontinuities, common-mode shifts.
- Signature: errors vary strongly with cable length/route.
- Key hooks: front-end conditioning + conservative sampling robustness.
- Metric: FE/PE rate below gates across short/long cabling conditions.
High-noise bypass link (fallback channel)
- Noise source: switching supplies and transient load steps.
- Signature: clustered errors during power steps.
- Key hooks: event-tag logging + strict recovery targets.
- Metric: time-to-recover and resync count within gates.
Robust RX before low-power transitions
- Noise source: rail ramping and thresholds shifting during mode transitions.
- Signature: idle instability and false starts near transitions.
- Key hooks: start qualification + conservative de-glitch + clear discard/resync policy.
- Metric: false-start rate under idle-only stress < X per minute.
MCU/UART “noise-tolerance” feature checklist
- Oversampling options: selectable oversampling (commonly 8×/16×) and robust mid-bit sampling behavior.
- Start-bit validation: configurable START qualification or digital de-glitch support, either built in or achievable through programmable input filters.
- Error observability: FE/PE/overrun flags plus counters or low-overhead logging capability.
- Recovery hooks: resync/discard policy can be implemented deterministically without losing long frames.
- Clocking flexibility: stable clock source options and divider granularity for robust sampling windows.
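How oversampling, START qualification, and mid-bit sampling work together can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of any particular peripheral: it assumes 16x oversampling, 8N1 framing, and a majority-of-three vote around each bit center, and all function names are hypothetical. Hardware UARTs implement these steps in logic and differ by vendor.

```python
OSR = 16  # assumed oversampling ratio


def encode_frame(byte, osr=OSR):
    """8N1 frame as oversampled line levels: START(0), 8 data bits LSB first, STOP(1)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(osr)]


def majority3(samples, center):
    """Majority vote over the three samples centered on index `center`."""
    window = samples[center - 1:center + 2]
    return 1 if sum(window) >= 2 else 0


def receive(samples, osr=OSR):
    """Decode frames; returns a list of (byte, framing_error).

    A falling edge is accepted as START only if the line is still low
    at the middle of the START bit, so short low glitches are ignored.
    """
    out = []
    mid = osr // 2
    frame_len = 10 * osr
    i = 1
    while i + frame_len <= len(samples):
        falling = samples[i - 1] == 1 and samples[i] == 0
        if not falling or majority3(samples, i + mid) != 0:
            i += 1  # no edge, or START failed qualification
            continue
        byte = 0
        for bit in range(8):
            center = i + (bit + 1) * osr + mid
            byte |= majority3(samples, center) << bit
        stop = majority3(samples, i + 9 * osr + mid)
        out.append((byte, stop == 0))  # STOP sampled low => framing error
        i += frame_len
    return out


# Idle, a 3-sample low glitch, idle, one 0xA5 frame, idle.
line = [1] * 32 + [0] * 3 + [1] * 32 + encode_frame(0xA5) + [1] * 32
print(receive(line))
```

In this model the 3-sample glitch is rejected because it is shorter than half a bit, and a single corrupted sample at a bit center is outvoted by its two neighbors. Skipping a full frame length after a framing error is one simple policy choice; a real design would pair it with the discard/resync policy described earlier.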
Example MCU/UART platforms (for feature comparison)
- ST: STM32G0 series (e.g., STM32G071) — verify package/suffix/availability.
- ST: STM32L4 series (e.g., STM32L476) — verify package/suffix/availability.
- NXP: i.MX RT (e.g., MIMXRT1062) — verify package/suffix/availability.
- Microchip: SAM E5x (e.g., ATSAME54P20A) — verify package/suffix/availability.
- TI: MSPM0 (e.g., MSPM0G3507) — verify package/suffix/availability.
Note: these examples anchor a comparison checklist; feature availability varies by sub-family and revision.
Schmitt buffer / input conditioning ICs
- TI: SN74LVC1G17 (Schmitt buffer) — verify package/suffix/availability.
- Nexperia: 74LVC1G17 variants — verify package/suffix/availability.
- Onsemi: NC7SZ17 (Schmitt buffer family) — verify package/suffix/availability.
Use when RX threshold chatter is dominant; validate input capacitance and edge timing margins.
Digital isolators (isolation entry)
- ADI: ADuM1201 (dual-channel isolator) — verify package/suffix/availability.
- TI: ISO7721 (dual-channel isolator) — verify package/suffix/availability.
- Silicon Labs: Si8621 family — verify package/suffix/availability.
Use when common-mode disturbance dominates; evaluate propagation delay and edge shaping vs sampling windows.
Protection arrays (ESD entry examples)
- TI: TPD2E007 (low-cap ESD protection) — verify package/suffix/availability.
- Nexperia: PESD5V0 families — verify package/suffix/availability.
- Semtech: RClamp families (low-cap arrays) — verify package/suffix/availability.
Always validate Cpar/leakage/clamp return path; A/B swap can reveal hidden edge and idle shifts.
Selection rule: example part numbers are reference anchors only; always verify package/suffix/availability and confirm capacitance/delay/edge effects against sampling robustness and false-start rejection gates.
FAQs (framing/parity errors under noise)
These FAQs address long-tail troubleshooting cases without expanding the main text. Each answer is fixed to four lines: Likely cause / Quick check / Fix / Pass criteria (thresholds use placeholders X and N).