UART Baud Rate & Error Budget for Reliable Serial Links
Scope & Assumptions
UART reliability is a timing-window problem. A “correct” nominal baud rate can still fail when the combined TX+RX clock error, divider quantization, and drift push sampling toward bit edges—especially across temperature, power modes, or long bursts.
- Covers: UART baud error budget (TX+RX), sampling-window intuition, divider rounding/quantization, drift contributors (ppm/temperature/aging), practical verification, and calibration hooks that keep combined error within a typical ±2% target (or a defined system target).
- Not covered: RS-232/RS-485 electrical layer details, galvanic isolation/CMTI design, surge/ESD standards compliance, and long-trace SI/termination as full topics (those belong to dedicated pages; only brief pointers may appear).
- Outputs: an error-budget checklist (what to count and how), a measurement plan (how to measure actual baud and margin), and calibration/recovery hooks that remain production-robust.
- Who should read: hardware (clock/baud generator choices), firmware (timeouts/recovery/calibration), and test/production (screening, trim, traceability fields).
Default assumptions (explicit baseline)
- Frame baseline: 8N1 (1 start, 8 data, no parity, 1 stop).
- Receiver baseline: 16× oversampling with a center/majority decision (implementation varies; differences are addressed later by mapping them to effective timing margin).
- Budget baseline: “clock error” includes source accuracy (ppm/%), temperature drift, divider quantization, and mode-dependent clock switching (sleep/performance states).
UART Sampling Window: why baud error matters
UART is asynchronous: there is no shared clock on the wire. The receiver synchronizes at the start-bit edge, then “free-runs” its sampling schedule for the rest of the frame. Any mismatch between the transmitter and receiver bit time causes the sampling point to drift toward bit edges, where noise and threshold uncertainty can flip decisions.
Intuition (engineering-first)
- The “safe” place to sample a UART bit is near the bit center, not near edges.
- If TX is slightly faster than RX (or vice versa), the sample timing shifts a little every bit. The shift accumulates across the frame.
- When accumulated drift reaches the edge region, failures appear as framing errors, intermittent garble, or bit slips—often only under temperature, power-state changes, or long bursts.
Minimal model (enough to build a budget)
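Let the transmitter bit time be $T_{TX}$ and the receiver’s expected bit time be $T_{RX}$. The timing mismatch per bit, in absolute and relative form, is approximately:
$$\Delta T = T_{TX} - T_{RX}, \qquad \varepsilon \approx \frac{\Delta T}{T_{RX}}$$
After $k$ bits from the start edge, the accumulated sampling offset is roughly $k \cdot \Delta T$ (equivalently $k \cdot \varepsilon$ bit times). For 8N1 with mid-bit sampling, the stop-bit sample sits 9.5 bit times after the start edge, so that is where the offset is largest. The practical design goal is to keep the sampling point away from edges with margin, so that line noise and threshold variation do not dominate.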
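To make the accumulation concrete, the sketch below estimates how far the stop-bit sample has moved from bit center for a given combined TX+RX error. It assumes ideal mid-bit sampling and a perfectly detected start edge; `sample_offset_bits` is an illustrative name, not part of any library.

```python
def sample_offset_bits(error_pct, data_bits=8, parity_bits=0):
    """Offset of the stop-bit sample from bit center, in fractions of a bit.

    Assumption: the receiver samples every bit at its nominal center, so the
    stop-bit sample sits (start + data + parity + 0.5) bit times after the
    start edge.
    """
    bits_to_stop_center = 1 + data_bits + parity_bits + 0.5
    return bits_to_stop_center * abs(error_pct) / 100.0


# Print the offset for a few combined-error values (8N1 frame).
for err in (0.5, 2.0, 5.0):
    offset = sample_offset_bits(err)
    print(f"{err:.1f}% combined error -> {offset:.3f} bit offset at the stop bit")
```

At 2% combined error the stop-bit sample is about 0.19 bit from center in this model. The remaining distance to the edge is what must absorb start-edge detection granularity, noise, and threshold variation.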
What “failure” looks like (symptom mapping)
- Framing error: the stop-bit sample lands too close to an edge; a valid stop is misread.
- Intermittent garble: some data-bit samples enter the edge region; noise flips a subset of bits.
- Bit slip / phase walk: drift crosses a bit boundary; subsequent bits are decoded shifted.
This drift model is the reason a combined error target (often cited as ±2%) must be treated as a budget with measurable contributors—not a single-number checkbox.
Error Budget Definition: what counts as baud error
“Baud error” is not just crystal ppm. UART timing margin is consumed by clock-source accuracy, divider quantization/rounding, temperature and mode drift, and an implementation margin that covers real-world sampling and edge uncertainty. A usable budget expresses every contributor in the same unit, then sums TX and RX sides into a combined limit.
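Budget structure (single-line rule)
Combined error (%) = TX clock error + TX quantization + RX clock error + RX quantization + margin ≤ system target (often ±2%), with every term taken at its worst case.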
- Clock error includes ppm accuracy, temperature drift, aging, and any mode-dependent clock switching (sleep/performance states).
- Quantization is the gap between the desired baud and the baud actually produced by integer/fractional divider steps.
- Margin accounts for implementation differences (oversampling strategy, edge noise sensitivity, threshold uncertainty) not explicitly modeled.
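Units: ppm ↔ percent (must normalize before summing)
1% = 10,000 ppm, so error(%) = ppm / 10,000. A ±50 ppm crystal contributes only ±0.005%, whereas a ±1% RC oscillator is ±10,000 ppm.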
Divider quantization can be orders of magnitude larger than ppm-level clock accuracy. Treat quantization as a first-class budget line item; otherwise “ppm looks great” while UART still fails.
Error budget template (fill this before declaring “±2% OK”)
| Error term | Typical | Worst-case | How to measure | How to mitigate |
|---|---|---|---|---|
| TX clock accuracy (ppm/%) | datasheet @ 25°C | temp range + aging | frequency counter / timer capture | better source, trim, or calibration |
| RX clock accuracy (ppm/%) | datasheet @ 25°C | temp range + aging | frequency counter / timer capture | better source, trim, or calibration |
| Baud-gen quantization/rounding | register-derived actual baud | worst baud point / mode | read divisor + compute actual baud | choose different baud/clk, fractional, calibrate |
| Temperature / mode drift | room temp, steady mode | cold/hot + sleep/perf switch | temperature sweep + baud re-measure | lock clock source, re-trim, periodic re-cal |
| Implementation margin | known-good baseline | noisy edges / weak sampling | error-rate vs margin tests | increase margin, improve edges, adjust frame strategy |
The budget is complete only when each contributor is expressed as percent at worst-case conditions, then summed with a defined margin.
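A minimal sketch of that summation follows. It normalizes ppm to percent and adds the absolute value of every term, which is the conservative case where TX and RX errors pull in opposite directions. All numbers are hypothetical placeholders, not datasheet values.

```python
def ppm_to_percent(ppm):
    """1% equals 10,000 ppm."""
    return ppm / 10_000.0


def combined_error_percent(tx_terms_pct, rx_terms_pct, margin_pct):
    """Worst-case sum: absolute value of every TX and RX term, plus margin."""
    tx = sum(abs(t) for t in tx_terms_pct)
    rx = sum(abs(t) for t in rx_terms_pct)
    return tx + rx + margin_pct


# Hypothetical example: crystal-clocked TX, trimmed-RC RX.
tx = [ppm_to_percent(50), 0.08]   # 50 ppm source, 0.08% divider quantization
rx = [1.0, 0.16]                  # 1% RC over temperature, 0.16% quantization
total = combined_error_percent(tx, rx, margin_pct=0.5)

TARGET_PCT = 2.0                  # assumed system target
print(f"combined worst-case error: {total:.3f}% (target {TARGET_PCT}%)")
print("PASS" if total <= TARGET_PCT else "FAIL")
```

In this example the RC oscillator and the margin dominate, while the crystal's 50 ppm is almost invisible. This is why every term has to be in percent before anything is compared.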
Typical Tolerance Targets: why “±2%” is common (and when it is not enough)
The widely cited “combined error within ±2%” is an engineering rule-of-thumb for many MCU UART implementations with robust oversampling. It assumes the full budget in the previous section is honored—especially quantization and drift. The correct target is risk-based: frame behavior and effective sampling margin determine how much combined error can be tolerated.
Target bands (choose based on frame and margin risk)
Tighter than ±2% (conservative target), when any of these apply:
- Long bursts / long frames where drift accumulates.
- Large temperature span or frequent clock/mode switching.
- Weak sampling margin (low oversampling or limited filtering).
Around ±2% (typical target), when all of these hold:
- Common MCU UART with robust oversampling.
- Stable clock source and controlled drift.
- Quantization error accounted for in the budget.
±2% may not be enough (re-derive the target from measured margin), when:
- Low oversampling or narrow effective sampling window.
- High baud with unfavorable divider steps (large quantization).
- Stop-bit/framing sensitivity under noise (reduced margin).
“±2%” is not a checkbox. The correct band is the one that keeps sampling away from edges under worst-case drift and real margin.
Divider & Baud Generator Pitfalls: rounding, quantization, and drift
Baud settings do not always land exactly on the requested rate. Integer divider rounding, fractional step size limits, oversampling differences, and clock-tree mode switching can create a quantization error that is much larger than ppm-level clock accuracy. The only reliable way to budget UART timing is to compute the actual baud from registers and the selected UART clock source.
Practical structure (use this during bring-up)
Symptoms
- “Same baud” configured, but interoperability differs across vendors or boards.
- Works at one baud (e.g., 115200), fails at another “standard” baud.
- Stable at room temperature, fails after clock/power-mode switching.
- Short frames are fine, long bursts show intermittent errors.
Root causes
- Integer rounding: divisor must be an integer, producing step-like baud error.
- Fractional limits: fractional divider exists, but step size still leaves residual error.
- Oversample mismatch: OSR differs (16× vs 8×/4×) or changes with mode.
- Clock-tree drift: UART clock source changes across sleep/performance states.
How to compute actual baud (generic)
- Identify f_uart_clk (the actual clock feeding UART, including pre-dividers and mode switching).
- Confirm OSR (oversampling ratio used for RX sampling; may be configurable or implicit).
- Read divisor registers and build divisor_eff (integer + fractional fields if present).
- Compute baud_actual and error(%); record typical and worst-case (temperature/modes).
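The steps above can be sketched as follows, assuming the common generic model baud = f_uart_clk / (OSR × divisor_eff). Real peripherals differ in register layout and in how the fractional field is encoded, so the formula must be checked against the device's reference manual. The 16 MHz clock and the 1/16 fractional step are illustrative assumptions.

```python
def baud_actual(f_uart_clk_hz, osr, divisor_eff):
    """Generic model: baud = f_uart_clk / (OSR * effective divisor)."""
    return f_uart_clk_hz / (osr * divisor_eff)


def baud_error_pct(actual, target):
    """Signed error of the produced baud relative to the requested baud."""
    return (actual - target) / target * 100.0


F_CLK = 16_000_000      # assumed UART clock
OSR = 16                # assumed oversampling ratio
TARGET = 115_200

ideal = F_CLK / (OSR * TARGET)               # about 8.68, not realizable exactly
integer_div = round(ideal)                   # integer-only divider
frac_div = round(ideal * 16) / 16            # assumed 1/16 fractional steps

for label, div in (("integer", integer_div), ("fractional", frac_div)):
    actual = baud_actual(F_CLK, OSR, div)
    err = baud_error_pct(actual, TARGET)
    print(f"{label:10s} divisor={div:<7} baud={actual:10.1f} error={err:+.3f}%")
```

With these assumptions the integer divider lands about 3.5% low, well outside a ±2% budget on its own, while the 1/16-step fractional divider is within 0.1%.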
Fix options (lowest friction first)
- Choose a friendlier baud or clock that reduces quantization (better divisor fit).
- Use fractional divider if available and validate residual error across modes.
- Lock the UART clock source (avoid hidden clock switching across power states).
- Add calibration hooks (trim or auto-baud) and guard with rollback criteria.
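As an illustration of the first fix option, the sketch below compares integer-divider quantization for two candidate UART clocks across a few standard rates. It assumes 16× oversampling and an integer-only divider; the clock values are examples, and `best_integer_error_pct` is a hypothetical helper.

```python
def best_integer_error_pct(f_clk_hz, target_baud, osr=16):
    """Return (divisor, signed error %) for the nearest integer divisor."""
    ideal = f_clk_hz / (osr * target_baud)
    divisor = max(1, round(ideal))
    actual = f_clk_hz / (osr * divisor)
    return divisor, (actual - target_baud) / target_baud * 100.0


CLOCKS_HZ = (16_000_000, 14_745_600)    # example candidates
BAUDS = (9_600, 115_200, 230_400)

for f_clk in CLOCKS_HZ:
    for baud in BAUDS:
        div, err = best_integer_error_pct(f_clk, baud)
        print(f"f_clk={f_clk:>10} Hz  baud={baud:>7}  div={div:>4}  error={err:+.3f}%")
```

A clock such as 14.7456 MHz divides exactly into these rates, whereas a round 16 MHz clock leaves a residual that grows as the divisor shrinks.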
Clock Source Choice: XTAL vs RC vs PLL (ppm vs jitter)
UART timing is dominated by frequency offset and drift (ppm → %), because sampling-point drift accumulates across the frame. Jitter matters indirectly: when sampling is pushed toward edges, edge uncertainty (noise/threshold variation) reduces effective margin. Clock selection should therefore be driven by worst-case drift, burst length, and whether calibration is available.
Quick comparison (what matters for UART)
Crystal (XTAL)
- Accuracy: stable ppm-class baseline.
- Temp drift: predictable across rated range.
- Start-up: may be slower than RC in deep sleep.
- Calibration: optional; often used to tighten worst-case.
External oscillator (XO / MEMS)
- Accuracy: typically stable and spec-driven.
- Temp drift: controlled; check rated range.
- Start-up: may be favorable for some systems.
- Calibration: usually not required for typical targets.
Internal RC
- Accuracy: can be poor without trim; varies by voltage/temperature.
- Temp drift: often the dominant risk for long bursts.
- Start-up: fast; common in low-power wake paths.
- Calibration: typically required for tight targets or wide temperature.
PLL-derived clock
- Accuracy: depends on reference and configuration.
- Temp/mode: verify stability across clock-tree switching.
- Jitter: usually secondary for UART, but edge uncertainty can reduce margin when drift is high.
- Calibration: may still be needed when worst-case drift is tight.
Selection logic (inputs that change the answer)
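- Baud target: higher baud means smaller divisors, so each divider step is a larger percentage (more quantization), and edge rise/fall time consumes a larger fraction of each bit.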
- Temperature span: wide temperature demands a stable source or calibration.
- Burst length / frame behavior: longer bursts accumulate sampling drift.
- Low-power wake: if UART runs on RC during wake, budget that path explicitly.
- Calibration availability: if calibration exists, RC can be viable; without it, prefer stable sources.
Calibration Hooks: auto-baud, runtime trimming, and safe re-calibration
Calibration turns a baud-rate budget into a closed-loop control: measure frequency or edge timing, estimate combined error, adjust divisor/trim, validate under real conditions, then commit or rollback. The defining requirement is fail-safe behavior—calibration must never make the link less stable.
Safety rules (do not skip)
- Two-phase apply: test new settings first, commit only after validation.
- Rollback ready: keep a last-known-good profile and restore immediately on failure.
- Guardrails: limit adjustment step size and calibration rate to avoid oscillation.
- Traceability: log temperature, mode, target/actual baud, error%, and profile ID.
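The two-phase apply, rollback, and guardrail rules can be sketched as below. `Profile`, `calibrate_step`, and the trim-code model are hypothetical; a real implementation would measure error from a counter capture or reference rather than the fake function used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    profile_id: int
    trim: int            # hypothetical oscillator/divisor trim code


def calibrate_step(active, proposed_trim, measure_error_pct,
                   max_step=2, min_improvement_pct=0.05):
    """Trial a bounded trim step; commit only if measured error improves.

    Returns (profile_to_use, committed). On any non-improvement the
    last-known-good profile is returned unchanged (rollback).
    """
    # Guardrail: limit how far one calibration pass may move the trim.
    step = max(-max_step, min(max_step, proposed_trim - active.trim))
    if step == 0:
        return active, False
    candidate = Profile(active.profile_id + 1, active.trim + step)
    baseline = abs(measure_error_pct(active))
    trial = abs(measure_error_pct(candidate))      # phase 1: test only
    if trial <= baseline - min_improvement_pct:
        return candidate, True                     # phase 2: commit
    return active, False                           # rollback


def fake_measure(profile):
    """Stand-in measurement: error is zero at trim code 10 (assumption)."""
    return 0.2 * (profile.trim - 10)


good = Profile(profile_id=1, trim=4)
after_ok, committed_ok = calibrate_step(good, 10, fake_measure)
after_bad, committed_bad = calibrate_step(good, 0, fake_measure)
print(after_ok, committed_ok)
print(after_bad, committed_bad)
```

The first call moves only two trim codes toward the proposal and commits because the error shrinks. The second proposes the wrong direction and leaves the last-known-good profile in place.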
Three practical calibration classes
Production trim (end-of-line)
- Trigger: end-of-line, after programming, at controlled conditions.
- Hooks needed: measurable clock path (tick count / counter capture), writable divisor/trim, stable reference (fixture or board reference).
- Risk: wrong reference or unmodeled clock-tree switching leads to a “perfect” trim for the wrong mode.
- Rollback: commit only after re-measure; reject if correction exceeds a sane bound.
Runtime trimming (in-field)
- Trigger: periodic timer, temperature delta, or clock/power-mode change.
- Hooks needed: temperature input, counter capture or frequency measurement path, adjustable divisor/trim.
- Risk: noisy measurements cause “over-correction” that degrades margin.
- Rollback: apply small steps, validate with error counters / stability window, then commit.
Auto-baud (peer training)
- Trigger: unknown peer, first attach, repeated framing/parity errors, or service mode.
- Hooks needed: edge-timing capture (timer input capture or UART assist), defined training pattern, timeout policy.
- Risk: false lock due to noise/glitches; training interpreted as real payload.
- Rollback: require repeated-consistent estimates; revert to safe profile on timeout or instability.
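For the auto-baud case, the sketch below shows one way to require repeated-consistent estimates before accepting a rate. It assumes the training byte is 0x55 in an 8N1 frame, so every edge-to-edge interval is exactly one bit time, and that edge timestamps come from a timer capture. Function names and thresholds are hypothetical.

```python
from statistics import median


def estimate_baud(edge_times_s, min_intervals=4):
    """Estimate baud from edge timestamps of one 0x55 training frame."""
    intervals = [b - a for a, b in zip(edge_times_s, edge_times_s[1:])]
    if len(intervals) < min_intervals:
        return None                      # too few transitions to trust
    return 1.0 / median(intervals)       # median resists a single glitch


def lock_baud(captures, k=3, tolerance_pct=1.0):
    """Accept a rate only after k consecutive estimates agree."""
    streak = []
    for edges in captures:
        est = estimate_baud(edges)
        if est is None:
            streak = []
            continue
        if streak and abs(est - streak[0]) / streak[0] * 100.0 > tolerance_pct:
            streak = [est]               # inconsistent: restart from this one
        else:
            streak.append(est)
        if len(streak) >= k:
            return sum(streak) / len(streak)
    return None                          # no lock: caller keeps the safe profile


# Synthetic captures: 10 edges per 0x55 frame (start edge through stop edge).
good = [i / 115_200 for i in range(10)]
wrong = [i / 57_600 for i in range(10)]
locked = lock_baud([good, wrong, good, good, good])
print(locked)
print(lock_baud([good, wrong]))
```

Returning `None` rather than a best guess keeps the decision with the caller, which can then time out and revert to the safe profile.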
Minimum logging fields (traceable calibration)
| Field | Why it matters | Example use |
|---|---|---|
| baud_target / baud_actual / error% | quantifies timing margin | trend vs temperature/modes |
| clock source ID / mode | detects hidden clock switching | correlate failures to states |
| temperature | drift driver | trigger/validate re-cal |
| profile_id (last-known-good) | safe rollback anchor | field recovery |
| error counters (frame/parity/overrun) | validation signal | commit/rollback decision |
Frame-Length Effect: why longer frames tighten the error budget
UART aligns on the start bit, then the sampling point drifts by a small amount every bit when TX and RX clocks differ. Longer frames and longer bursts allow this drift to accumulate, reducing margin until sampling approaches a transition edge. Frame format therefore changes robustness even when baud target is the same.
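A rough way to see the frame-format dependence is to invert the drift model: divide the usable half-window by the number of bit times between the start edge and the last sample. The sketch below does that for a few formats. The 0.3-bit usable half-window is an assumed derating for start-edge granularity and edge noise, not a measured value.

```python
def max_combined_error_pct(data_bits, parity_bits, usable_half_window_bits):
    """Largest combined TX+RX error that keeps the stop-bit sample in window.

    Assumption: mid-bit sampling, so the stop-bit sample is
    (1 start + data + parity + 0.5) bit times after the start edge.
    """
    bits_to_stop_center = 1 + data_bits + parity_bits + 0.5
    return usable_half_window_bits / bits_to_stop_center * 100.0


FORMATS = {"7N1": (7, 0), "8N1": (8, 0), "8E1": (8, 1), "9N1": (9, 0)}

for name, (data, parity) in FORMATS.items():
    ideal = max_combined_error_pct(data, parity, 0.5)     # theoretical limit
    derated = max_combined_error_pct(data, parity, 0.3)   # assumed real window
    print(f"{name}: ideal {ideal:.2f}%  derated {derated:.2f}%")
```

Each extra bit before the stop bit lowers the tolerable error, which is why adding parity or a ninth data bit tightens the budget at the same baud.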
Design moves in high-error environments
- Increase stop bits to widen the safe decision window and reduce sensitivity at frame end.
- Shorten bursts or segment transfers so drift does not run uninterrupted for long periods.
- Insert idle gaps between frames to re-center the receiver’s start-edge alignment opportunities.
- Use hardware flow control to prevent overruns when error handling or re-sync requires pauses (protocol details stay outside this page).
What to record and compare (for bring-up and field)
- Frame format: data bits / parity / stop bits, plus inter-frame gap policy.
- Burst length and idle insertion rules (worst-case continuous bits).
- baud_target and baud_actual with computed error% under representative conditions.
- Counters: framing/parity/overrun, and the time correlation to temperature and mode changes.
- Calibration state: active profile ID, last calibration time, and rollback events.
Verification & Measurement: measure actual baud and practical margin
Verification proves whether the system truly stays within a target tolerance. A single frame is not enough: measure distributions over many frames, repeat across temperature and power/clock modes, and compute error% from a consistent definition.
Verification target (minimum proof)
- Compute: baud_actual and error% from a stable measurement window.
- Cover: temperature and power/clock modes that can change the UART clock tree.
- Judge: typical and worst-case distribution, not a single sample.
- Log: clock source ID, OSR, divisor registers, and window length.
Three measurement paths (easy → robust)
Logic analyzer
- Setup: high sample rate relative to baud; stable threshold.
- Capture: many bits and many frames; include long bursts.
- Compute: bit_time → baud_actual → error% (store distribution).
- Use when: quick bring-up, rounding/quantization detection.
Oscilloscope
- Setup: consistent trigger threshold; avoid “auto” period traps.
- Capture: apply statistics on repeated edges (many frames).
- Compute: edge-to-edge timing → baud_actual; compare across conditions.
- Use when: edge quality or threshold sensitivity must be observed.
Clock measurement (frequency counter / timer capture)
- Setup: measure f_uart_clk (or derived tick) with a known gate window.
- Capture: repeat across sleep/boost states and clock-tree switching.
- Compute: use f_uart_clk, OSR, divisor_eff to derive baud_actual.
- Use when: traceability and root-cause across modes is needed.
Common measurement traps
- Trigger jitter: unstable trigger threshold corrupts period statistics.
- Sample rate too low: edge quantization inflates apparent timing error.
- Auto-measurement misreads: glitches/reflections can be treated as real transitions.
- Single-frame confirmation: misses drift and worst-case tails in the distribution.
- Condition coverage missing: temperature and clock-mode switching not included.
- Window too short: counter gate time too short increases variance.
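Two of these traps can be quantified before any capture is taken. The sketch below uses the usual ±1-sample and ±1-count resolution limits: a sampled edge interval is uncertain by about one sample period, and a gated frequency count by about one count. The function names and example numbers are illustrative.

```python
def timing_resolution_pct(sample_rate_hz, baud, bits_in_window=1):
    """Approximate +/-1-sample uncertainty as a percentage of the window."""
    return baud / (sample_rate_hz * bits_in_window) * 100.0


def counter_resolution_ppm(f_measured_hz, gate_time_s):
    """Approximate +/-1-count uncertainty of a gated frequency count."""
    return 1e6 / (f_measured_hz * gate_time_s)


BAUD = 115_200
for fs in (1_000_000, 24_000_000):
    one_bit = timing_resolution_pct(fs, BAUD)
    hundred = timing_resolution_pct(fs, BAUD, bits_in_window=100)
    print(f"fs={fs:>10} S/s: 1 bit +/-{one_bit:.2f}%, 100 bits +/-{hundred:.4f}%")

for gate in (1e-3, 100e-3):
    ppm = counter_resolution_ppm(16_000_000, gate)
    print(f"16 MHz clock, gate {gate * 1e3:.0f} ms: +/-{ppm:.3f} ppm")
```

A 1 MS/s capture of a single 115200-baud bit cannot resolve a 2% error at all, while averaging over a long window or raising the sample rate brings the resolution well below the budget.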
Pass criteria (fill with project thresholds)
- Error% distribution: typical ≤ X%, worst-case ≤ Y% across the full test set.
- Condition coverage: temperature points and clock/power modes defined and tested.
- Stability: no sustained framing/parity storm under representative bursts.
- Traceability: logs include mode, clock source ID, OSR, divisor registers, and window length.
Robust Error Handling (UART boundary): timeouts, re-sync, and watchdog-safe recovery
Even with a good baud budget, real systems can hit RX lockups, framing storms, and burst overruns. Robust UART handling needs bounded recovery: detect errors, perform a controlled flush and re-sync, retry with limits, and fall back to a safe profile without creating watchdog traps.
Minimum hooks (must-have)
- Timeout: frame/byte timeout that aborts stalled reception.
- Line idle detect: a stable “idle” condition to re-arm cleanly.
- Buffer flush: clear RX/TX FIFO on error.
- Error counters: framing/parity/overrun statistics for validation.
- Safe reset path: reset UART peripheral without destabilizing the system.
- Retry limit: finite retries with a fail-safe exit path.
Case A: RX stuck / never returns to idle
- Symptom: RX busy persists; no valid frames; ISR rate abnormal.
- First checks: idle detect state, timeout counters, error flags.
- Recovery: disable RX → flush FIFO → wait for stable idle → re-enable RX; escalate to peripheral reset if repeated.
- Pass criteria: returns to idle and stays stable for N frames; error counters return to baseline.
Case B: framing/parity storm (continuous errors)
- Symptom: error interrupts dominate; payload becomes unusable.
- First checks: baud_actual drift across temperature/modes; divisor/OSR changes; clock-source switching.
- Recovery: enter quiet time → flush → re-sync on idle; optionally switch to a safe UART profile (e.g., more margin) for recovery only.
- Pass criteria: storm ends; retry count bounded; system returns to stable receive state.
Case C: burst overrun / buffer overflow
- Symptom: overrun flags; missing bytes during bursts.
- First checks: service latency, FIFO threshold, DMA/ISR load, flow-control availability.
- Recovery: flush and re-sync; reduce ISR work; enable hardware flow control if available; ensure retries are bounded.
- Pass criteria: no overrun at max burst; bounded latency and stable counters.
Case D: timeout & re-sync (partial frames, stalls)
- Symptom: partial frames; long gaps mid-frame; stuck parser behavior at UART boundary.
- First checks: timeout base, idle criteria, stop-bit/gap assumptions.
- Recovery: timeout abort → flush partial state → wait idle → re-arm receiver; retry with limits.
- Pass criteria: auto recovery within bounded time; no infinite loops; watchdog remains satisfied.
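The recovery sequences in Cases A to D share one bounded pattern, sketched below against a stand-in UART object. `FakeUart` and its methods are hypothetical placeholders for a real driver; the point is that every loop has a fixed upper bound, so worst-case recovery time is known in advance.

```python
class FakeUart:
    """Hypothetical stand-in for a UART driver, used only for this sketch."""

    def __init__(self, polls_until_idle):
        self.rx_enabled = True
        self.fifo = [0xFF, 0x00]
        self.was_reset = False
        self._polls_until_idle = polls_until_idle

    def disable_rx(self):
        self.rx_enabled = False

    def enable_rx(self):
        self.rx_enabled = True

    def flush_fifo(self):
        self.fifo.clear()

    def line_is_idle(self):
        if self._polls_until_idle > 0:
            self._polls_until_idle -= 1
            return False
        return True

    def reset_peripheral(self):
        self.was_reset = True


def recover(uart, max_retries=3, max_idle_polls=100):
    """Disable RX, flush, wait (bounded) for idle, re-arm; else fail safe."""
    for attempt in range(1, max_retries + 1):
        uart.disable_rx()
        uart.flush_fifo()
        for _ in range(max_idle_polls):          # bounded wait, never forever
            if uart.line_is_idle():
                uart.enable_rx()
                return "recovered", attempt
    uart.reset_peripheral()                      # escalate after retry limit
    return "fail_safe", max_retries


ok_uart = FakeUart(polls_until_idle=5)
print(recover(ok_uart))

stuck_uart = FakeUart(polls_until_idle=10**6)
print(recover(stuck_uart, max_retries=2, max_idle_polls=10))
```

The worst case is `max_retries × max_idle_polls` polls followed by one peripheral reset, which gives a concrete figure to compare against the watchdog period.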
Engineering Checklist (Design → Bring-up → Production)
This checklist compresses UART baud-rate budgeting into verifiable gates. Every item is written as an action with an evidence field and a pass-criteria placeholder, so the flow can be used in reviews, bring-up logs, and production test specs.
Concrete parts (examples for implementation hooks)
Verify package/suffix/availability for the target design. The list focuses on parts that enable accurate clocks, capture/measurement, and robust UART bridging during bring-up and production.
- MEMS XO: SiTime SIT1602 (fixed-frequency oscillator family; ppm vs temp depends on grade).
- MEMS XO: SiTime SIT8008 (small XO family; common for MCU reference).
- TCXO: Epson TG2520SMN family (temperature-compensated XO options).
- 32.768 kHz reference: Abracon ABS07 (watch crystal family; grade impacts ppm/temp).
- Precision ref: Analog Devices LTC6655 (stable reference option for analog timing/measurement subsystems).
- Time capture assist: TI TDC7200 (time-to-digital converter for edge interval measurement in fixtures).
- Counter/clock buffer: TI CDCLVC1102 (clock buffer family; useful to fan-out a measured clock point).
- USB-UART: FTDI FT232R (classic USB-UART family used in debug adapters).
- USB-UART: Silicon Labs CP2102N (common USB-UART bridge family; variants differ by GPIO/clock).
- USB-UART: WCH CH343 family (multi-UART options exist; verify exact part/driver needs).
Note: these part examples are for hooks (stable clock, measurable clock point, and reliable bridging). UART transceivers such as RS-232/RS-485 are intentionally out of scope for this page.
Design gate (prevent budget leaks before layout)
- Choose a clock strategy that matches baud, temperature, and burst length constraints.
  Evidence: clock_source_id
  Pass criteria: clock plan documented; mode switching rules defined (X).
- Compute worst-case baud error budget (TX% + RX% + divider quantization + margin) using the page’s definition.
  Evidence: error_budget_sheet
  Pass criteria: worst-case ≤ Y% (placeholder) with margin M%.
- Define a safe profile set: typical operating profile + conservative recovery profile (more margin).
  Evidence: profile_id
  Pass criteria: recovery profile documented and testable (X).
- Specify calibration hooks (measurement point + adjust path + rollback storage). If an accurate oscillator is required, pre-select candidate families (e.g., SiTime SIT1602 / Epson TG2520SMN grades as appropriate).
  Evidence: trim_storage
  Pass criteria: commit/rollback defined; trim persistence verified (X).
- Plan frame robustness rules (stop bits / idle gaps / burst segmentation) for worst-case continuous bits.
  Evidence: frame_policy
  Pass criteria: worst-case burst length and gaps specified (X).
- Allocate minimum recovery hooks: timeout, idle detect, FIFO flush, error counters, retry limit, safe UART reset.
  Evidence: hook_matrix
  Pass criteria: hooks exist in design & firmware plan (X).
Bring-up gate (turn nominal settings into measured distributions)
- Measure baud_actual with a stable window and store error% distribution (not a single frame).
  Evidence: la_scope_capture
  Pass criteria: typical ≤ X%, worst-case ≤ Y% (placeholder).
- Sweep temperature (or stepped temperature proxy) and repeat worst-case measurements.
  Evidence: temp_log
  Pass criteria: tails remain within limits across defined points (X).
- Toggle power/clock modes (sleep/boost/PLL relock) and check baud stability after each transition.
  Evidence: mode_id
  Pass criteria: no hidden clock switching without detection (X).
- Stress long frames and worst-case bursts; verify no framing storm and no silent byte loss.
  Evidence: burst_profile
  Pass criteria: error counters remain below threshold (X).
- Inject controlled faults (framing/parity/overrun) and verify bounded recovery (timeout → flush → resync → retry limit → fail-safe).
  Evidence: fault_injection_log
  Pass criteria: recovery completes within T ms and never loops forever (X).
- Validate calibration loop with two-phase commit and rollback. If a fixture uses time interval measurement, validate the measurement chain (e.g., TI TDC7200 in a capture-based fixture design).
  Evidence: cal_profile_id
  Pass criteria: commit only after stability; rollback on anomaly (X).
Production gate (repeatable test, trim write-in, full traceability)
- Implement a fixture method to measure clock/baud quickly with a defined window (gate time) and fixed thresholds.
  Evidence: fixture_rev
  Pass criteria: measurement repeatability within R (placeholder).
- Run UART loopback/pattern test at the target baud and the conservative profile; confirm no framing storms under stress.
  Evidence: prod_pattern_id
  Pass criteria: error counters below threshold (X).
- Program trim/profile with two-phase commit; keep the last-known-good baseline as the default fallback.
  Evidence: trim_value
  Pass criteria: verification read-back matches; rollback path proven (X).
- Verify trace fields are complete: clock source ID, divisor regs, OSR, temperature (if available), measured error%, profile ID, timestamp.
  Evidence: trace_blob
  Pass criteria: field completeness = 100% (X).
- Audit bounded recovery behavior in production firmware builds (retry limit, timeout, safe reset) to avoid watchdog traps in the field.
  Evidence: fw_build_id
  Pass criteria: recovery bounded; no infinite loops; watchdog remains satisfied (X).
FAQs (baud error, calibration, measurement, frame effects, recovery)
These FAQs close out long-tail troubleshooting without expanding the main text. Every answer uses the same 4-line structure and includes data-like pass criteria placeholders with units.
Both ends set 115200, but characters are garbled randomly—what’s the first baud sanity check?
Likely cause: Actual baud mismatch from divider rounding/OSR mismatch, or an unexpected clock source/mode; sometimes frame format mismatch (8N1 vs others).
Quick check: Read both ends’ UART divisor + oversampling (OSR) settings and compute baud_actual; confirm both ends use the same frame format.
Fix: Choose the nearest divisor minimizing worst-case error, align OSR/frame format, and pin the UART to a stable clock source across modes (or re-apply settings after mode switches).
Pass criteria: E_typ ≤ X%, E_worst ≤ Y% over Temp_range = [Tmin..Tmax]°C; framing error rate ≤ Z per 10^6 bits across N_frames ≥ N.
Works at room temp, fails cold/hot—what clock-drift field should be logged first?
Likely cause: Clock ppm drift vs temperature (RC drift, marginal oscillator grade, or PLL source changes) pushes the combined UART error beyond margin.
Quick check: Log temperature + clock_source_id + mode_id + baud_actual/error% at Tmin/room/Tmax (same capture window).
Fix: Use a more stable clock source (or add runtime trim triggered by ΔT), and validate the conservative UART profile under cold/hot.
Pass criteria: E_worst ≤ Y% across Temp_range = [Tmin..Tmax]°C and all modes; sustained framing/parity storms = 0 over N_frames ≥ N.
Only fails on long bursts—why does frame length tighten the error budget?
Likely cause: Asynchronous sampling re-syncs at the start bit, then sample-point drift accumulates with each bit; longer continuous bits reduce “re-alignment opportunities.”
Quick check: Repeat a worst-case continuous-bit pattern for L bits and record where errors start; compare with short frames/added idle gaps.
Fix: Shorten bursts, insert idle gaps, add stop bits (if supported), or reduce baud/increase OSR to widen sampling margin.
Pass criteria: L_max ≥ L bits per burst with error rate ≤ Z per 10^6 bits; no errors over N_frames ≥ N at the worst-case burst profile.
Auto-baud sometimes locks to a wrong rate—what preamble pattern is safest?
Likely cause: Ambiguous edge spacing (too few transitions or noisy edges) causes the estimator to pick a harmonic/alias rate.
Quick check: Confirm what the auto-baud algorithm measures (start-bit width, bit time, or edge-to-edge) and test candidate preambles under worst-case conditions.
Fix: Use BREAK (line held low for longer than one frame) + repeated 0x55 (01010101) for high transition density + a delimiter byte to validate; only accept lock after K consistent measurements.
Pass criteria: Lock_success ≥ P% over Temp_range and modes; false-lock ≤ F per 10^4 attempts; locked E_worst ≤ Y%.
Logic analyzer shows “correct baud” but the device still reports framing errors: what measurement trap is common?
Likely cause: Auto-decode hides timing tails; sample rate is too low; trigger threshold jitter/glitches are counted as edges, corrupting period statistics.
Quick check: Measure raw bit time over a long window (median + p99), increase capture sample rate, and compare “decoded baud” vs measured distribution tails.
Fix: Use fixed thresholds, long captures, and distribution-based decisions; prefer clock-source measurement when mode switching is suspected.
Pass criteria: bit_time_p99 error ≤ Y% in the same window; framing storm = 0 across N_frames ≥ N with identical test conditions.
RC oscillator works at low baud but fails at high baud—what’s the quickest mitigation?
Likely cause: RC accuracy and temperature drift are too large; divider quantization consumes margin, so combined error exceeds tolerance at higher baud.
Quick check: Measure baud_actual at room/cold/hot and compare E_worst vs target; verify whether UART clock changes in low-power modes.
Fix: Lower baud or switch to a more stable clock; if RC must be used, add periodic trim using a known reference and only commit trims after validation.
Pass criteria: E_worst ≤ Y% at target baud across Temp_range; calibration interval ≤ t_cal s with stable post-cal N_frames ≥ N.
Fractional divider looks accurate on paper, but RX still errors—what rounding/oversampling detail to check?
Likely cause: Fractional accumulator introduces periodic timing modulation; OSR differs from the assumed value; peak error (not average) breaks sampling near bit boundaries.
Quick check: Read OSR + fractional settings and compute effective baud over a long window; look for periodic error patterns aligned with the fractional cycle length.
Fix: Choose the setting that minimizes worst-case/peak error, align OSR, and prefer an integer divider (or higher OSR) when peak timing modulation is the limiting factor.
Pass criteria: peak_error_p99 ≤ Y%; periodic framing at fractional-cycle boundaries = 0 over N_frames ≥ N.
Switching power modes breaks UART—what clock source change is most likely?
Likely cause: UART clock source silently switches (PLL → RC / different divider), or fractional/divider state resets during gating, changing baud_actual after the transition.
Quick check: Log clock_source_id + mode_id before/after the transition and measure baud_actual immediately after wake/boost.
Fix: Pin UART to a stable source across modes, re-apply divisor/OSR after transitions, and perform a bounded flush/resync on mode changes.
Pass criteria: post-transition E_worst ≤ Y% within T_settle ms; dropped bytes ≤ B per transition; recovery bounded by R_retry ≤ R.
After calibration, it got worse—what rollback rule should be enforced?
Likely cause: Noisy measurement/reference, too-large trim step, or committing before validation; calibration “learns” the wrong direction and increases worst-case error.
Quick check: Compare pre/post error% distribution under the same window and conditions; verify reference stability; check whether improvement is consistent across N_frames.
Fix: Two-phase commit: apply a small step → validate for N_frames → commit only if E_worst improves by ≥ Δ%; otherwise rollback to last-known-good and rate-limit retries.
Pass criteria: E_worst never exceeds baseline + M%; rollback success = 100%; calibration attempts per hour ≤ K.
One board batch fails more—what production-time clock test is missing?
Likely cause: No per-unit measurement of clock/baud_actual and no traceability; oscillator grade spread or mis-trim slips through without a gate-time frequency check.
Quick check: Add a fixture check that measures UART clock (or TX bit time) over a defined gate window and logs trim_value + measured error%.
Fix: Enforce production auto-trim with two-phase commit, store trace fields (clock_source_id, divisor_regs, OSR, measured_error%), and quarantine outliers by measured E_worst.
Pass criteria: yield ≥ Yld%; E_worst ≤ Y% for all units; trace completeness = 100% (required fields present).
Parity errors spike only when EMI is present—how to separate baud drift vs noise?
Likely cause: EMI-induced edge disturbances/glitches cause sporadic bit errors; true baud drift typically correlates with temperature/mode and often increases framing/stop-bit errors.
Quick check: Measure baud_actual/error% during the EMI event and compare to baseline; compare parity-only spikes vs framing/stop-bit error signatures.
Fix: If baud stays within spec, use the conservative UART profile and ensure bounded recovery (timeouts + flush + resync); treat the event as noise, not drift, and keep error counters for correlation.
Pass criteria: Δbaud ≤ E% during event; parity error rate ≤ P per 10^6 bits; framing storm duration = 0 over N_frames ≥ N.
Framing storms lock up firmware—what timeout/recovery sequence is safest?
Likely cause: Unbounded error ISR loops with RX FIFO never cleared; missing timeouts/retry limits cause the system to spin and starve the watchdog.
Quick check: Confirm error interrupt rate and RX busy state; verify timeout counters progress; check whether FIFO flush and line-idle detection exist.
Fix: Bounded state machine: detect storm → disable RX → flush FIFO → wait idle for T_idle → re-enable RX → retry ≤ R_retry → fall back to safe profile and report.
Pass criteria: T_recover ≤ T ms; CPU usage returns ≤ C% after recovery; watchdog resets = 0; retries limited to R_retry ≤ R.