Balise / Transponder: RF Demodulation, Diagnostics, Event Timing
A balise/transponder link is reliable only when it can prove every pass with a consistent RF → decode → timestamp → commit evidence chain. Unreadable passes, wrong reads, and post-ESD failures are then debugged by checking a few key fields (RSSI, AGC, CRC, timebase, commit) and applying the first fix pattern before changing the architecture.
H2-1. Scope & Interfaces
This page locks the scope to the balise/transponder over-the-air read chain: the trackside balise and the onboard BTM (Balise Transmission Module) path that performs coupling, RF front-end processing, demod/decoding, and diagnostic evidence recording. It intentionally avoids broader ETCS/CBTC onboard architecture and any unrelated trackside subsystems.
1) System roles (responsibility boundary)
- Trackside balise (passive / semi-active) is responsible for producing a valid telegram under allowed coupling conditions and environmental tolerances. The key requirement is not “responds once,” but responds repeatably across installation variation, temperature drift, and nearby metal effects.
- Onboard BTM is responsible for the full read chain: coupling → RF conditioning → demod → sync → decode → integrity check → output, plus diagnostic evidence generation when reads fail (failure without evidence is not diagnosable).
2) Interface map (interfaces are evidence entry points)
Treat each interface as a place where the system must expose at least one measurable field for acceptance and field debugging:
- Antenna / coupler interface → coupling strength indicators (RSSI proxy), AGC level, saturation/weak-signal counters.
- RF front-end I/O and tap points → limiter flags, gain state, amplitude metrics (enough to classify “too strong vs too weak”).
- Demod/decoder output → frame detect state, sync-failure reason code, CRC results, retry counters.
- MCU interface (SPI/I²C/parallel) → structured fault codes, read attempt history, reset reason, temperature/voltage snapshots.
- Timestamp source → event timepoints (frame start / decode OK / commit done) and drift indicators.
- Storage & service/maintenance port → evidence packet commit status and sequence continuity (detect missing events).
3) Operating conditions (variation → mechanism → what to observe)
- High-speed pass reduces the effective read window → raises sensitivity to sync timing margin and retry policy.
- Gap / height / attitude changes create rapid coupling swings → AGC/limiter behavior and decision thresholds dominate.
- Trackside EMC + nearby metal introduces common-mode pickup and reflection effects → front-end dynamic range and filtering must be provable via logged fields, not guessed.
ALT (for this figure): Balise transponder system context showing train underside antenna coupling to trackside balise, onboard BTM RF demod decode logging chain, and evidence outputs.
H2-2. User Intent: What “Good” Looks Like in the Field
In rail signaling, “good” is not a subjective impression. It must be measurable, repeatable, and diagnosable. A balise read chain is acceptable only when it can prove: (1) reads are reliable across boundary conditions, (2) decoded content is integrity-checked and consistent, (3) event timing is trustworthy, and (4) failures still produce a usable evidence packet.
Acceptance targets (metrics → evidence → decision)
A) Read reliability
- Metric: success probability per pass, retry-count distribution, failure-type mix (sync vs CRC vs power).
- Evidence fields: read_attempts, read_success, retry_hist, fail_reason_code
- Decision: prove margin at boundary conditions (max speed / worst gap / temperature corners / EMC stress).
B) Timing correctness
- Metric: timestamp jitter and alignment error versus speed/odometer reference (if available).
- Evidence fields: t_frame_start, t_decode_done, t_commit_done, clock_drift_est
- Decision: confirm timepoints are taken at defined stages (not “some time later”).
C) Decode integrity
- Metric: CRC/format check rate and multi-read consistency (same balise → same telegram hash).
- Evidence fields: crc_fail_count, telegram_hash, consistency_score
- Decision: avoid “false confidence”: CRC pass alone is insufficient without consistency checks under noise.
D) Diagnostics completeness
- Metric: evidence packet completeness rate, especially for failed reads and reset/brownout events.
- Evidence fields: commit_status, reset_reason, last_event_seqno
- Decision: failures must be classifiable without oscilloscopes on track.
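As a minimal sketch of how the reliability and completeness metrics above could be computed offline from logged records: the record layout (a list of dicts with `seqno`, `read_success`, and `retry_count`) and the function name `summarize_passes` are assumptions made for illustration, not a defined log format.

```python
from collections import Counter

def summarize_passes(records):
    """Summarize read reliability from per-pass evidence records.

    Each record is assumed to be a dict with 'seqno' (int),
    'read_success' (bool) and 'retry_count' (int). This layout is
    hypothetical and only mirrors the field names used in the text.
    """
    if not records:
        return {"success_rate": None, "retry_hist": {}, "missing_seqnos": []}
    successes = sum(1 for r in records if r["read_success"])
    retry_hist = Counter(r["retry_count"] for r in records)
    # Sequence continuity: any gap between the lowest and highest seqno
    # indicates an event that never reached storage.
    seen = sorted(r["seqno"] for r in records)
    missing = sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))
    return {
        "success_rate": successes / len(records),
        "retry_hist": dict(retry_hist),
        "missing_seqnos": missing,
    }
```

A heavy tail in `retry_hist` or a non-empty `missing_seqnos` list is a prompt to look at margin or commit behavior before trusting the headline success rate.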
A practical rule: if the system cannot tell whether a failure is weak coupling, front-end saturation, sync failure, or power/reset, the architecture is not field-ready.
ALT (for this figure): Balise read failure-to-evidence funnel mapping symptoms to mechanisms, logged evidence fields, and first corrective actions.
H2-3. Over-the-Air Coupling & Antenna Path
Balise read failures are most often margin problems, not protocol mysteries. The over-the-air path must be treated as a variable channel: coupling changes with height, tilt, nearby metal, and train speed. A field-ready design proves reliability by converting these variations into an explicit SNR margin budget and a time-window budget.
1) Practical coupling model (engineering, not academic)
- Coupling is not constant. Effective coupling varies with installation height/gap, antenna attitude, and metal proximity (rails, fasteners, underbody structures).
- Coupling variation becomes amplitude variation at the RF input. The read chain must classify failures as “too weak,” “too strong/saturated,” or “timing/sync limited,” based on observable fields.
- Reflections and near-field distortion can change the apparent signal shape even when average level is similar, which is why a robust design tracks both level indicators (RSSI/AGC) and decode-stage outcomes (sync/CRC).
2) Matching network impact (Q, bandwidth, tolerance)
- Higher Q can increase peak gain, but narrows bandwidth and increases sensitivity to component tolerance, temperature drift, and frequency offset. A “lab-perfect” tune can reduce field robustness.
- Tolerance budgeting is mandatory. Component tolerance and temperature drift shift the resonance point, reducing effective coupling and changing the noise bandwidth seen by the detector.
- Design goal: choose a Q/bandwidth that preserves SNR margin across worst-case installation and environment, rather than maximizing peak response at a single condition.
3) Speed-driven time window (read budget)
Higher speed primarily reduces the effective acquisition window. A shorter window limits synchronization time, reduces the number of retry opportunities, and tightens allowable processing latency. The design must explicitly budget: detect → sync → decode → verify → commit evidence within the available window.
4) What to prove (margin, not anecdotes)
SNR margin budget
- Coupling loss variation + mismatch loss + cable loss + interference pickup
- Front-end noise figure and effective bandwidth
- Demod threshold and required margin at boundary conditions
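The SNR margin items above can be combined into a single number per boundary condition. The sketch below assumes the standard thermal noise density of about -174 dBm/Hz at room temperature and models interference pickup as a rise in the noise floor. The function name, parameter names, and any input values are illustrative assumptions.

```python
import math

def snr_margin_db(signal_dbm, losses_db, noise_figure_db, bandwidth_hz,
                  required_snr_db, interference_rise_db=0.0):
    """Return the remaining SNR margin in dB for one boundary condition.

    losses_db is a list of loss terms (coupling variation, mismatch,
    cable). interference_rise_db is an assumed noise-floor increase
    caused by interference pickup.
    """
    # Thermal noise floor: -174 dBm/Hz + 10*log10(BW) + noise figure.
    noise_floor_dbm = (-174.0 + 10.0 * math.log10(bandwidth_hz)
                       + noise_figure_db + interference_rise_db)
    received_dbm = signal_dbm - sum(losses_db)
    snr_db = received_dbm - noise_floor_dbm
    return snr_db - required_snr_db
```

Evaluating this at each corner (worst gap, temperature extremes, EMC stress) gives a margin table rather than a single lab figure, which is the form the budget needs to take.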
Time-window budget
- Window length at max speed (worst geometry)
- Minimum time for sync and frame detection
- Retry policy that converges before leaving the coupling zone
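The time-window budget can be sketched in the same way. The example below assumes the coupling zone can be approximated by a fixed contact length, so the window is simply length divided by speed, and that one commit must fit after the last attempt. The function name, the stage durations, and the contact length are placeholders, not measured values.

```python
import math

def window_budget(contact_length_m, speed_mps, t_detect_s, t_sync_s,
                  t_decode_s, t_verify_s, t_commit_s):
    """Estimate how many full read attempts fit into one pass window."""
    window_s = contact_length_m / speed_mps
    # One attempt covers detect -> sync -> decode -> verify.
    attempt_s = t_detect_s + t_sync_s + t_decode_s + t_verify_s
    # Reserve time for a single evidence commit at the end of the window.
    usable_s = window_s - t_commit_s
    max_attempts = max(0, math.floor(usable_s / attempt_s))
    return {
        "window_s": window_s,
        "attempt_s": attempt_s,
        "max_attempts": max_attempts,
        "fits": max_attempts >= 1,
    }
```

If `max_attempts` at maximum speed is smaller than the N used by the retry policy, the policy cannot converge inside the coupling zone and either the stage times or the policy has to change.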
ALT (for this figure): Coupling and SNR budget diagram showing loss segments, demod threshold, remaining margin, and speed window impact for balise reads.
H2-4. RF Front-End Architecture
The RF front-end must be designed as an observable system. A non-observable front-end forces field teams to guess. A robust architecture separates “weak coupling,” “front-end saturation,” “sync failure,” and “power/reset” using measurable tap points and structured status codes.
1) Front-end blocks (what each block protects or proves)
- LNA / gain stage preserves weak-signal sensitivity. Failure signature: low level with high AGC demand, repeated sync-fail without limiter activity.
- Limiter prevents overdrive and clamps transient peaks. Failure signature: limiter active frequently, distorted amplitude leading to “CRC bursts” or unstable sync.
- AGC stabilizes amplitude under coupling swings. Failure signature: AGC pinned high (too weak) or pinned low (too strong).
- Filter trades interference rejection versus sensitivity. Too narrow harms frequency tolerance; too wide increases noise bandwidth.
- Detector / demod interface must expose “why decoding failed” (sync code, CRC) rather than only “fail.”
2) Dynamic range requirement (near-strong vs far-weak)
The required dynamic range must cover the full combination of coupling variation, tolerance drift, temperature corners, and interference pickup. The architecture should prove that strong coupling does not saturate the chain, while weak coupling still exceeds the minimum detectable level with adequate SNR margin.
3) Built-in observability (minimum tap-point set)
- Level indicators: RSSI proxy, AGC code, detector amplitude
- Clipping indicators: limiter flag, saturation counter
- Decode indicators: sync reason code, CRC fail count, retry histogram
- Context snapshot: temperature, supply voltage, reset reason, commit status
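To illustrate how this tap-point set separates the four failure classes, here is a rough first-pass classifier. It assumes that a higher `agc_code` means more gain (so a code pinned at maximum indicates a weak signal), that `sync_reason_code` is 0 when sync succeeded, and that `reset_reason` uses the string values shown. All of these encodings are hypothetical.

```python
def classify_failure(ev, agc_min=0, agc_max=255):
    """Map one evidence record (dict) to a coarse failure class.

    The thresholds and code values are placeholders; a real system
    would take them from its own register and fault-code definitions.
    """
    # Power problems are checked first because they invalidate the rest.
    if ev.get("reset_reason") in ("BROWNOUT", "WATCHDOG"):
        return "power/reset"
    # Clipping evidence or AGC pinned at minimum gain: signal too strong.
    if (ev.get("limiter_flag") or ev.get("sat_count", 0) > 0
            or ev.get("agc_code") == agc_min):
        return "front-end saturation"
    # AGC pinned at maximum gain: signal too weak.
    if ev.get("agc_code") == agc_max:
        return "weak coupling"
    if ev.get("sync_reason_code", 0) != 0:
        return "sync failure"
    return "unclassified"
```

Records that end up as "unclassified" are themselves useful: they show which additional field would have been needed to make the failure diagnosable.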
ALT (for this figure): RF front-end block diagram for balise reader showing antenna, LNA, limiter, AGC, filter, demod, and diagnostic tap points for RSSI, AGC, limiter, and sync/CRC.
H2-5. Demodulation, Decoding & Telegram Integrity
A “wrong read” is rarely a single bug. It is the result of a chain where waveform quality, synchronization margin, bit decisions, and integrity policy interact. A field-ready design must convert every decode failure into a pipeline-stage outcome with a clear evidence field: “failed at frame detect,” “failed at sync,” “CRC burst,” or “CRC pass but inconsistent across reads.”
1) Demod approach (kept at engineering abstraction)
- Envelope / amplitude path (ASK-like): sensitive to clipping, noise floor rise, and threshold jitter. Key evidence: amplitude stats, limiter activity, decision threshold state.
- Phase / zero-crossing path (PSK-like): sensitive to phase noise and sampling-phase drift. Key evidence: sync quality score and phase/clock error bins (compressed codes are acceptable).
- Correlation-based detect: sensitive to window length and multipath distortion. Key evidence: correlation peak ratio and peak position stability across attempts.
2) Frame detection and synchronization (deterministic failure classification)
- Frame detect must distinguish miss-detect vs false-detect. If the design cannot log false-detect rate, field tuning becomes guesswork. Evidence: frame_detect_count, false_detect_count, preamble_quality
- Sync lock must record where it failed (timing window, jitter, threshold). “Sync fail” without a reason code is not actionable. Evidence: sync_reason_code, sync_lock_time, sync_quality_score
- Bit clock recovery failures often appear as “gets worse over the frame.” Capture slips/drift rather than only CRC. Evidence: bit_slip_count, phase_error_bin (or clock_drift_code)
3) Bit decisions (soft vs hard) and parameter traceability
Soft decisions can improve robustness near the SNR edge but cost compute and power. Hard decisions are simpler but require well-managed thresholds. In either case, a field-debuggable system must record the decision mode and a configuration version (or threshold ID) so a failure can be reproduced.
Hard decision (threshold-driven)
- Risk: threshold jitter under noise/clipping
- Log: decision_mode, threshold_id (or config_version)
Soft decision (confidence-driven)
- Benefit: improved error tolerance near margin
- Log: decision_mode, confidence_bin (compressed), compute budget
4) Telegram integrity: CRC pass is necessary, not sufficient
CRC proves internal consistency for one read attempt, but it does not prove correctness under noise and interference. A robust implementation adds a multi-read consistency gate: repeated reads of the same balise within the same pass must converge to a consistent telegram hash before output is accepted.
- Consistency rule: accept output only when K-of-N attempts agree (majority or thresholded policy).
- Non-convergence: output “inconsistent” with evidence fields preserved (do not silently pick an arbitrary attempt).
- Evidence: telegram_hash per attempt, read_group_id, consistency_score, crc_pass/fail stats
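A minimal sketch of the K-of-N gate described above, assuming the input is the list of `telegram_hash` values from the CRC-passing attempts of one `read_group_id`. The function name, the result strings, and the tie rule (two different hashes both reaching K are treated as inconsistent) are illustrative choices rather than a prescribed policy.

```python
from collections import Counter

def k_of_n_accept(hashes, k):
    """Apply a K-of-N consistency gate to the hashes of one read group."""
    if not hashes:
        return {"k_of_n_result": "INCONSISTENT", "telegram_hash": None,
                "consistency_score": 0.0}
    ranked = Counter(hashes).most_common(2)
    best_hash, best_count = ranked[0]
    score = best_count / len(hashes)
    # A second hash that also reaches K means the group did not converge.
    tie = len(ranked) > 1 and ranked[1][1] >= k
    if best_count >= k and not tie:
        return {"k_of_n_result": "OK", "telegram_hash": best_hash,
                "consistency_score": score}
    return {"k_of_n_result": "INCONSISTENT", "telegram_hash": None,
            "consistency_score": score}
```

Returning `consistency_score` even on rejection keeps the evidence usable: a group that fails at 0.5 tells a different story from one that fails at 0.2.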
ALT (for this figure): Decode pipeline timeline for balise reads showing sampling, frame detect, sync, demod, decode, CRC and multi-read consistency with symptoms and evidence fields.
H2-6. Low-Power MCU & Wake-Up Strategy
Low-power MCU design in a balise read chain is not only about reducing standby current. It must preserve read availability, prevent brownout mid-read, and guarantee that failures still produce a minimal evidence record. The wake-up strategy, power sequencing, and firmware state machine must be engineered as a single reliability system.
1) Wake-up paths (stability vs power)
- RF-field wake: lowest standby power, but sensitive to threshold drift and false wake from interference. Log: wake_source, rf_wake_level, false_wake_count
- External interrupt wake: more deterministic timing integration, but vulnerable to EMI on harness lines. Log: int_source_id, debounce_status, emi_event_counter
- Timer wake: predictable for self-test and health checks, but increases energy cost if poorly scheduled. Log: wake_timer_id, schedule_version
2) Power sequencing and brownout prevention (avoid “dies mid-decode”)
Wake-up creates a steep current transient: clock start, RF front-end enable, demod compute, and non-volatile commit. The most common field symptom is partial reads followed by reset. Prevention requires:
- Power-good gating: do not enter Acquire until VDD is stable above a defined threshold for a minimum time.
- Staged enable: avoid overlapping peak loads (RF enable vs storage writes) unless hold-up is guaranteed.
- Minimal evidence first: on any failure path, commit a compact evidence packet before optional processing.
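The power-good gating rule can be expressed as a simple check over recent supply samples. This sketch assumes VDD is sampled at a fixed interval, so "stable for a minimum time" becomes "the last M samples are all above the threshold". The function name and the sample-based formulation are assumptions.

```python
def power_good(samples_mv, threshold_mv, min_stable_samples):
    """Return True only if the most recent samples all exceed the threshold.

    samples_mv is the VDD history in millivolts, oldest first. With a
    fixed sampling interval, min_stable_samples encodes the minimum
    stable time required before entering Acquire.
    """
    if len(samples_mv) < min_stable_samples:
        return False  # not enough history to claim stability
    recent = samples_mv[-min_stable_samples:]
    return all(v > threshold_mv for v in recent)
```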
3) Firmware state machine (deterministic transitions + timeouts)
A rail-grade implementation requires explicit Commit and Recovery states. Every state must define entry conditions, exit criteria, timeouts, and the evidence fields it updates.
- Listen/Idle: wait for wake source, collect baseline context (temp, VDD, counters).
- Acquire: enable RF chain, confirm power-good, open the decode window.
- Decode/Verify: perform demod/decoding and CRC + multi-read consistency gating.
- Commit: write result and evidence with sequence number; confirm completion.
- Recovery: on any failure, store minimal evidence and return to safe state (sleep or controlled retry).
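The state list above maps naturally onto a transition table with per-state timeouts. The following is a sketch only: the event names (`wake`, `power_good`, `verify_ok`, `commit_done`, `evidence_stored`, `fault`) and the timeout values are hypothetical, and a real implementation would also update the evidence fields on each transition.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    ACQUIRE = "acquire"
    DECODE_VERIFY = "decode_verify"
    COMMIT = "commit"
    RECOVERY = "recovery"

# Per-state timeouts in microseconds; the numbers are placeholders.
TIMEOUT_US = {State.ACQUIRE: 500, State.DECODE_VERIFY: 3000, State.COMMIT: 1000}

TRANSITIONS = {
    (State.IDLE, "wake"): State.ACQUIRE,
    (State.ACQUIRE, "power_good"): State.DECODE_VERIFY,
    (State.DECODE_VERIFY, "verify_ok"): State.COMMIT,
    (State.COMMIT, "commit_done"): State.IDLE,
    (State.RECOVERY, "evidence_stored"): State.IDLE,
}

def next_state(state, event, elapsed_us):
    """Return the next state; any timeout or fault routes to RECOVERY."""
    if state in TIMEOUT_US and elapsed_us > TIMEOUT_US[state]:
        return State.RECOVERY
    if event == "fault":
        return State.RECOVERY
    # Unknown events leave the state unchanged instead of guessing.
    return TRANSITIONS.get((state, event), state)
```

Keeping the table explicit makes it easy to review that every state has a defined exit and that no path bypasses Commit or Recovery.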
ALT (for this figure): Low-power MCU power-state machine for balise reader showing sleep, idle, acquire, decode, verify, commit, and recovery with power-good thresholds and timeouts.
H2-7. Event Timestamping & Correlation
“Event timestamp” becomes useful only when it forms a provable time chain: the time source has a known trust level, each critical decode milestone is time-stamped, and the record can be correlated to speed/odometer inputs using explicit fields. This chapter defines a minimal, auditable timestamp set that preserves alignment without expanding into full vehicle positioning.
1) Time sources (trust levels + health evidence)
- Local RTC provides a local axis but can drift or lose validity after power events. Log: timebase_mode, rtc_valid, rtc_epoch, rtc_health_code
- TCXO / stable local timebase improves short-window stability. Temperature is still relevant. Log: timebase_mode, temp_c, timebase_health_code
- External sync input (if present) enables cross-module alignment, but must detect loss/jitter. Log: sync_present, sync_lost_count, sync_offset_est, sync_jitter_bin
2) Timestamp strategy (milestones that form a closed evidence loop)
A reliable chain uses milestone stamps that map to both performance and integrity. The following minimal set supports root-cause classification and correlation:
- t_frame_start: anchors the entry into the coupling window.
- t_decode_done: proves processing time and window sufficiency.
- t_verify_ok: marks CRC + consistency acceptance time.
- t_commit_done: proves the evidence packet actually landed (not “lost mid-write”).
- t_fail_stage: records failure stage time when verification does not converge.
3) Correlation (speed/odometer alignment by fields, not inference)
Correlation is implemented through explicit input snapshots and alignment quality codes. The intent is to prove that an event belongs to a specific pass window, not to compute vehicle position.
- Inputs: speed_in, odometer_in, speed_sample_time
- Outputs: event_speed, event_odometer, align_quality (OK / STALE / MISSING)
- Grouping: read_group_id ties multiple read attempts within one pass window.
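One way to derive `align_quality` from the input snapshot is to compare the age of the speed/odometer sample against the frame-start timestamp. The sketch below assumes both times are in microseconds on the same timebase and that a single `max_age_us` limit separates OK from STALE. The function name and that limit are assumptions.

```python
def align_event(t_frame_start_us, speed_in, odometer_in,
                speed_sample_time_us, max_age_us):
    """Attach speed/odometer context to an event with a quality code."""
    if speed_in is None or odometer_in is None or speed_sample_time_us is None:
        return {"event_speed": None, "event_odometer": None,
                "align_quality": "MISSING"}
    # Age of the snapshot relative to the start of the coupling window.
    age_us = abs(t_frame_start_us - speed_sample_time_us)
    quality = "OK" if age_us <= max_age_us else "STALE"
    return {"event_speed": speed_in, "event_odometer": odometer_in,
            "align_quality": quality}
```

A STALE result still records the values, so later analysis can decide how much weight to give them instead of losing the snapshot entirely.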
ALT (for this figure): Timestamp correlation map linking BTM timebase and milestone event timestamps with speed and odometer inputs and greyed upper-layer logs for alignment.
H2-8. Diagnostics & Evidence Packet (What to Log)
Diagnostics becomes differentiating only when it is a copyable evidence contract. The evidence packet must answer the key field questions even when reading fails: whether the failure is power-related, RF-margin-related, synchronization-related, or commit-related. This chapter defines a fixed schema with a minimal subset that must survive brownout, and a full subset used for deep root-cause analysis.
1) Minimal evidence (must survive failures)
Minimal evidence is written on every failure path before optional processing. It ensures that “read failed” can still be correlated to a specific pass window and root-cause direction.
- Identity: event_id, seqno, read_group_id
- Failure classification: fail_stage, fail_reason_code
- Power proof: reset_reason, vdd_min_mv, brownout_count
- Commit proof: commit_status, t_commit_done
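To make "minimal evidence" concrete, the fields above can be packed into a small fixed-size record with a trailing checksum, so a partially written record is detectable after a brownout. The byte layout, field widths, and the use of CRC-32 here are assumptions for illustration, not a defined storage format.

```python
import struct
import zlib

# Hypothetical little-endian layout: three 32-bit IDs, three 8-bit codes,
# two 16-bit power values, one 8-bit status, one 64-bit timestamp (us).
_FIELDS = ("event_id", "seqno", "read_group_id", "fail_stage",
           "fail_reason_code", "reset_reason", "vdd_min_mv",
           "brownout_count", "commit_status", "t_commit_done")
_FMT = "<IIIBBBHHBQ"

def pack_minimal(ev):
    """Serialize a minimal evidence dict and append a CRC-32 of the body."""
    body = struct.pack(_FMT, *(ev[name] for name in _FIELDS))
    return body + struct.pack("<I", zlib.crc32(body))

def unpack_minimal(blob):
    """Return the evidence dict, or None if the record is torn or corrupt."""
    if len(blob) != struct.calcsize(_FMT) + 4:
        return None
    body, (crc,) = blob[:-4], struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != crc:
        return None
    return dict(zip(_FIELDS, struct.unpack(_FMT, body)))
```

With this layout the record is 32 bytes, small enough to be written early on a failure path before any optional processing.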
2) Full evidence (answers the “why” with measurable fields)
Environment & state
- temp_c, vdd_mv, power_good_state
- reset_reason, boot_count
RF status
- rssi_bin, agc_code
- limiter_flag, limiter_count
- sat_count, under_amp_count
Decode status
- preamble_quality, sync_reason_code, sync_quality_score
- crc_fail_count, crc_reason_code, retry_count
- telegram_hash, consistency_score, k_of_n_result
Time & window
- t_frame_start, t_decode_done, t_verify_ok
- sample_window_us, timebase_mode, sync_present
- event_speed, event_odometer, align_quality
3) Result policy (hash/summary instead of full payload when needed)
The result section can store a telegram summary rather than full payload depending on policy. The evidence contract remains valid when it includes a deterministic summary identifier and a schema version.
- Result: telegram_hash, payload_len, store_policy_id
- Versioning: schema_version, fw_version, config_version
ALT (for this figure): Evidence packet schema diagram for balise diagnostics showing grouped fields for environment, RF, decode, time, and result with mappings to key troubleshooting questions.
H2-9. EMI/ESD/Environmental Hardening for Trackside Reality
Trackside conditions can disturb the balise read chain through two dominant failure routes: (1) front-end saturation / threshold collapse that drives CRC bursts and unstable hashes, and (2) power/reference perturbation that triggers brownout/reset and incomplete commits. Hardening is effective only when each mitigation maps to a specific coupling path and can be verified through observable fields.
1) ESD/EFT → front-end saturation → decode instability
Fast transients near the antenna/coupler or shield termination can inject common-mode current into the RF input path. The result is often not a total read loss but a destabilized decision process: limiter activity rises, AGC rails, and the demod threshold becomes noisy. The field signature is a combination of RSSI/AGC anomalies and CRC bursts.
- Victim nodes: limiter/LNA input, AGC loop, ADC/demod front-end
- Symptoms: RSSI jump, AGC at rails, limiter toggles, false detect increase, CRC bursts, inconsistent telegram_hash
- Observable fields: rssi_bin, agc_code, limiter_flag/limiter_count, sat_count, crc_fail_count, sync_reason_code, false_detect_count
2) ESD/EFT → reference/power disturbance → reset & incomplete evidence
A second dominant route is reference disturbance: ground bounce and supply dip can force brownout reset during acquire/decode, or abort non-volatile writes. A robust design must ensure that failure paths still commit a minimal evidence packet and record commit completion explicitly.
- Victim nodes: MCU BOR/WDT, clock start-up, NVM commit window
- Symptoms: mid-read reset, missing t_commit_done, commit_status failures, boot-count jumps
- Observable fields: reset_reason, vdd_min_mv, brownout_count, commit_status, t_commit_done
3) Shielding, grounding, and common-mode return (antenna-to-front-end specific)
Shielding is effective only when the common-mode return path is controlled. For the antenna-to-front-end segment, the goal is to prevent high-frequency return currents from flowing through sensitive reference networks. The most common failure pattern is a long/uncertain return path that converts transient current into threshold jitter or reset events.
- Design focus: minimize RF input loop area and provide a low-impedance return to chassis/reference
- Verification hint: correlate limiter/AGC anomalies with reset events and harness/shield configurations
4) Temperature and vibration: drift → margin loss → retries
Temperature and vibration can shift matching network parameters and degrade connector integrity. The typical signature is not a single failure, but a progressive loss of margin: sync becomes slower, retries rise, and K-of-N consistency converges less reliably under the same pass window.
- Observable fields: temp_c, retry_count, consistency_score, k_of_n_result
- Actionable linkage: tie drift evidence to the timestamp correlation fields (H2-7) for repeatability
ALT (for this figure): Interference path map for balise systems showing sources, coupling paths, victim nodes, mitigation actions, and observable evidence fields like RSSI, AGC, reset and commit status.
H2-10. Verification & Test Playbook (Bench + Field)
Verification becomes repeatable when each test stimulus is tied to expected evidence fields and pass/fail criteria. This playbook is organized as a closed loop: bench margin characterization, transient injection with evidence capture, pass-window stress (speed/time budget), and field regression driven by the evidence packet schema.
1) Bench margin characterization (coupling + SNR + saturation)
- Coupling control: use a repeatable antenna-to-balise fixture to vary coupling loss in defined steps.
- SNR sweep: reduce margin and observe convergence (retry and K-of-N consistency) rather than only CRC.
- Strong-field sweep: increase field to locate limiter/AGC rail points and verify the system fails safely (classified stage + evidence retained).
- Evidence focus: rssi_bin, agc_code, limiter_count, crc_fail_count, retry_count, consistency_score
2) Injection tests (ESD/EFT/transients) with commit-proof logging
- Stimulus: apply ESD/EFT in a controlled set of points around antenna, shield termination, and I/O entry.
- Expected evidence: limiter/AGC anomalies (route A) or reset/commit events (route B) must be captured, not inferred.
- Pass condition: any fail must still produce minimal evidence: stage + reason + reset/VDDmin + commit_status.
- Evidence focus: reset_reason, vdd_min_mv, commit_status, t_commit_done
3) Speed / pass-window stress (time budget + re-read policy)
- Window scaling: shorten effective read window and confirm sync/verify still converges or fails with a clear stage code.
- Policy validation: verify K-of-N convergence under reduced window without silently outputting unstable telegrams.
- Evidence focus: t_frame_start, t_verify_ok, sample_window_us, retry_count, k_of_n_result
4) Field regression loop (reproduce → evidence → root cause → fix → rerun)
- Reproduce: define repeatable conditions (temperature band, speed band, installation state, location segment).
- Capture: collect evidence packets with schema_version and configuration IDs.
- Classify: map to RF margin / sync / power / commit routes using the evidence fields.
- Fix & rerun: apply mitigation or policy change and rerun the same test-to-evidence matrix.
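The test-to-evidence matrix can be kept as data and checked automatically on every captured packet. In the sketch below the field sets are taken from the "Evidence focus" lines of this playbook, while the test names used as dictionary keys and the function name are hypothetical.

```python
# Required evidence fields per test type (keys are illustrative names).
REQUIRED_FIELDS = {
    "bench_margin": {"rssi_bin", "agc_code", "limiter_count",
                     "crc_fail_count", "retry_count", "consistency_score"},
    "injection": {"reset_reason", "vdd_min_mv", "commit_status",
                  "t_commit_done"},
    "window_stress": {"t_frame_start", "t_verify_ok", "sample_window_us",
                      "retry_count", "k_of_n_result"},
}

def missing_fields(test_name, packet):
    """Return the sorted list of required fields absent from a packet."""
    return sorted(REQUIRED_FIELDS[test_name] - set(packet))
```

A run only counts as passed when `missing_fields` is empty for every packet in it, including packets from failed reads.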
ALT (for this figure): Verification matrix for balise systems mapping bench and field tests to evidence groups (environment, RF, decode, time, commit, result) with required check marks and pass criteria.
H2-11. Design Pitfalls & Fix Patterns
This chapter compresses common field failures into actionable fix patterns. Each pattern follows the same workflow: Symptom → 2-field checks → First fix → Confirm. Concrete MPN examples are provided as starting points for design reviews and lab trials.
Pattern 1 — RSSI is high but CRC keeps failing
- Symptom: strong field indicated, yet crc_fail_count stays high; output is unstable.
- 2-field checks: (RF) agc_code + limiter_count/sat_count; (DECODE) sync_reason_code + crc_fail_count
- Interpretation: high RSSI can be distorted RSSI (clipping/AGC rail), not “good margin”.
- First fix: shorten the input clamp loop and stabilize the AGC operating window (avoid rail + limiter chatter).
- Confirm: run strong-field sweep (H2-10) and verify limiter_count decreases and CRC bursts disappear.
Input ESD/TVS (compact): PESD5V0S1UL (Nexperia), PESD5V0X1B (Nexperia), ESD9B5.0ST5G (onsemi)
RF limiter / front-end protection (broad use): HMC547ALC3 (Analog Devices), HMC987ALP5E (Analog Devices)
RF gain/AGC building blocks (broad use): ADL5501 (Analog Devices, RF detector), AD8361 (Analog Devices, detector), ADL5611 (Analog Devices, LNA)
Pattern 2 — RSSI is low, reads sometimes succeed (high sensitivity to conditions)
- Symptom: intermittent success; retry_count distribution becomes heavy-tail.
- 2-field checks: (RF) rssi_bin + under_amp_count; (DECODE/TIME) retry_count + sample_window_us
- Interpretation: margin is insufficient; the system “wins by retries” rather than stable convergence.
- First fix: restore margin at the antenna/matching/connector path before changing decode thresholds (avoid false locks).
- Confirm: coupling/SNR sweep shows smoother convergence and reduced retries at the same window.
Matching network components (robust MLCC families): GRM series (Murata), C0G/NP0 MLCC where possible (various vendors)
Low-loss RF switches (if needed in antenna path): SKY13385-679LF (Skyworks), ADRF5020 (Analog Devices)
Connector/vibration reliability (example families): JST GH/PH series (board-to-wire), TE MicroMatch (board-to-wire)
Pattern 3 — Fails only at low temperature
- Symptom: field failures cluster at cold; warm conditions pass.
- 2-field checks: (ENV/TIME) temp_c + timebase_health_code; (DECODE) sync_reason_code + retry_count
- Interpretation: cold drift can move matching, oscillator, or decision thresholds.
- First fix: use temperature-bucketed thresholds/AGC targets and a more stable timebase (TCXO) if sync drift dominates.
- Confirm: thermal soak regression shows stage-coded failures disappear or migrate to a stable, explainable route.
TCXO examples: SiT5358 (SiTime), TXETBLSANF-26.000000 (Taitien, example family), ASTX-H11 series (Abracon)
RTC examples (if RTC required): RV-3028-C7 (Micro Crystal), PCF2129 (NXP)
Temperature sensor (for evidence + compensation): TMP117 (Texas Instruments), MCP9808 (Microchip)
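The temperature-bucketed threshold idea from the first fix of Pattern 3 can be sketched as a lookup from `temp_c` to a `threshold_id`. The bucket edges and the IDs below are placeholders. Logging the selected ID alongside `temp_c` keeps a cold-only failure reproducible.

```python
import bisect

# Bucket edges in degrees Celsius and one threshold ID per bucket.
# Four edges define five buckets; all values are illustrative only.
BUCKET_EDGES_C = [-20, 0, 25, 60]
THRESHOLD_IDS = [0, 1, 2, 3, 4]

def threshold_id_for(temp_c):
    """Select the threshold/AGC-target set for the measured temperature."""
    # bisect_right places a temperature equal to an edge in the upper bucket.
    return THRESHOLD_IDS[bisect.bisect_right(BUCKET_EDGES_C, temp_c)]
```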
Pattern 4 — Resets mid-read (read half then reboot)
- Symptom: incomplete telegram handling; missing commit proof; boot counters jump.
- 2-field checks: (POWER/COMMIT) reset_reason + vdd_min_mv; (COMMIT) commit_status + t_commit_done
- Interpretation: brownout/wake-up current spike or “commit too late” causes evidence loss.
- First fix: enforce “minimal evidence first” commit ordering and add brownout-safe power gating / supervisor policy.
- Confirm: transient injection + power dip tests still produce minimal evidence and t_commit_done when possible.
Supervisors / BOR helpers: TPS3839 (Texas Instruments), MAX809 (Analog Devices/Maxim), MCP1316 (Microchip)
Load switch / eFuse (local rail control): TPS22918 (Texas Instruments), TPS25940 (Texas Instruments), LTC4412 (Analog Devices, ideal diode controller)
Hold-up / bulk (application-dependent): polymer electrolytic families (Panasonic OS-CON), low-ESR electrolytics (various vendors)
Pattern 5 — “CRC passes” yet the decoded output is wrong (rare but critical)
- Symptom: incorrect telegram interpretation slips through single-pass checks; later correlation fails.
- 2-field checks: (DECODE) false_detect_count + sync_reason_code; (RESULT) telegram_hash + k_of_n_result/consistency_score
- Interpretation: frame detection or sync false-lock can yield “valid-looking” frames; CRC alone is not sufficient.
- First fix: promote acceptance from “CRC OK” to “K-of-N convergence + hash consistency” and tighten frame/sync gating.
- Confirm: edge SNR + interference injection no longer produces wrong outputs; failures become stage-coded.
FRAM for robust event hashing/logging (fast commit): FM24CL64B (Cypress/Infineon), MB85RS64V (Fujitsu)
Serial flash (if policy allows): W25Q32JV (Winbond), MX25R6435F (Macronix, low-power family)
Hardware CRC acceleration MCUs (example families): STM32L4 series (ST), MSP430FR series (TI, FRAM-based)
Pattern 6 — After ESD, performance degrades over time (not immediate failure)
- Symptom: baseline shifts; same installation now shows lower margin or higher retries days later.
- 2-field checks: (RF) rssi_bin distribution shift + under_amp_count; (DECODE) retry_count trend + crc_fail_count
- Interpretation: connector/matching drift or shield termination loosened; latent damage is plausible.
- First fix: re-baseline with the same fixture (SNR sweep) and inspect the antenna/matching/shield path before firmware changes.
- Confirm: post-ESD baseline matches pre-ESD within the test matrix; trend stabilizes.
Low-capacitance ESD for RF nodes: PESD1CAN (Nexperia, example family), ESD5Z series (various vendors)
Shield termination accessories (system-level): 360° EMC cable glands (HUMMEL/Pflitsch families), braid clamps (various vendors)
Adhesive/strain relief (mechanical): epoxy/RTV families (application-dependent)
Pattern 7 — Passes on bench, fails in trackside reality
- Symptom: lab fixture is stable; field shows high variance and unexplained dropouts.
- 2-field checks: (RF) agc_code stability + limiter_count; (TIME/DECODE) read_group_id clustering + retry_count tail
- Interpretation: uncontrolled common-mode return, installation posture, metal reflections, or cable routing dominates.
- First fix: make the field setup measurable: record installation state in logs and enforce a controlled return path (shield→chassis).
- Confirm: field regression can reproduce failures and map them to one coupling path (H2-9), then resolve after the fix.
Isolated transceivers for maintenance/debug links: ISO3082 (Texas Instruments, isolated RS-485), ADM2587E (Analog Devices, isolated RS-485)
Common-mode choke (as an interface helper where applicable): WE-CMB series (Würth Elektronik), TDK ACM series (TDK)
Ethernet isolation magnetics (if applicable in the module boundary): Pulse H5007NL (Pulse, example family)
Pattern 8 — Logs exist but cannot be correlated to the same pass window (evidence is unusable)
- Symptom: events cannot be aligned; upper-layer comparisons are ambiguous.
- 2-field checks: (TIME) timebase_mode + align_quality; (ID) event_id + read_group_id
- Interpretation: missing group IDs or timebase health makes timestamps non-evidentiary.
- First fix: enforce group IDs and alignment quality codes; record timebase health fields on every event.
- Confirm: multi-read events within one pass window cluster under the same read_group_id with OK alignment.
Secure timestamp / tamper-aware RTC options (if needed): NXP PCF85063A (RTC), Microchip MCP79410 (RTC)
Small backup supply element (application-dependent): ML1220 rechargeable coin cell (Panasonic), supercap families (various vendors)
Low-power MCU families with good retention: STM32L0/L4 (ST), EFR32BG (Silicon Labs), MSP430FR (TI)
Figure: Root-cause decision tree for balise failures, mapping symptoms to two evidence-field checks and first fix actions such as clamp loop tuning, brownout-safe commits, and temperature-bucket thresholds.
Implementation note: MPN examples are provided as evaluation starting points. Final selection must be validated against the actual RF band, interface impedance, transient levels, and mechanical constraints of the balise module and antenna path.
H2-12. FAQs (Evidence-Driven Troubleshooting)
Each FAQ is designed as a fast field SOP: Verdict → Evidence ×2 → First fix, and explicitly maps back to H2-3…H2-11.
RSSI is high but decoding always fails — front-end saturation or sync thresholds?
Evidence: Check agc_code with limiter_count/sat_count (H2-4) and determine whether failures cluster at the sync stage (sync_reason_code) or at the CRC stage (crc_fail_count) (H2-5/H2-8).
First fix: Shorten the clamp loop / improve return path, then retune AGC target window before touching decode logic (H2-11).
Unreadable only at high speed — window too short or re-read policy wrong?
Evidence: Compare sample_window_us vs retry_count distribution (H2-3), and locate the dominant fail stage via sync_reason_code / CRC bursts (H2-5).
First fix: Enforce a speed-aware K-of-N policy (bounded retries + early abort stage codes) and validate with window-stress tests (H2-10).
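The speed dependence of the read window is simple arithmetic: the dwell time over the balise is the effective contact length divided by speed, and the number of read attempts that fit is the dwell time divided by sample_window_us. The sketch below illustrates that calculation under stated assumptions: contact_length_m, the cap of 8 attempts, and the nominal K of 3 are hypothetical parameters chosen for illustration, not values from a specification.

```python
def max_read_attempts(speed_mps, contact_length_m, sample_window_us, cap=8):
    """Return how many full sample windows fit into one pass, capped at `cap`."""
    if speed_mps <= 0:
        return cap  # stationary or invalid speed: fall back to the cap
    dwell_us = contact_length_m * 1_000_000 / speed_mps
    return min(cap, int(dwell_us // sample_window_us))


def k_of_n_feasible(speed_mps, contact_length_m, sample_window_us,
                    k_nominal=3, cap=8):
    """Return (n, feasible): the attempt budget and whether K reads can fit."""
    n = max_read_attempts(speed_mps, contact_length_m, sample_window_us, cap)
    return n, n >= k_nominal


if __name__ == "__main__":
    for speed in (5, 50, 100):
        n, ok = k_of_n_feasible(speed, contact_length_m=0.5,
                                sample_window_us=2000)
        print(f"{speed} m/s: n={n}, k-of-n feasible={ok}")
```

When the budget falls below K, the pass should be logged with an explicit "window too short" stage code rather than retried blindly; that keeps the retry_count distribution interpretable.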
Occasional wrong read but CRC shows no error — telegram version compatibility?
Evidence: Require telegram_hash consistency across reads and track k_of_n_result/consistency_score (H2-8). Also inspect false_detect_count and stage codes around sync/parse (H2-5).
First fix: Upgrade acceptance to “K-of-N + hash stable,” and log telegram_version_id/parser_path_id (H2-11).
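One possible reading of "K-of-N + hash stable" is sketched below: a telegram is accepted only when at least K reads pass CRC and every CRC-passing read carries the same telegram_hash. The tuple layout of a read and the way consistency_score is computed here are assumptions for illustration; the source names the fields but does not define them.

```python
from collections import Counter


def k_of_n_accept(reads, k):
    """Apply a K-of-N acceptance rule with a hash-stability requirement.

    `reads` is a list of (crc_ok, telegram_hash) tuples from one pass window
    (hypothetical layout). Returns (accepted, telegram_hash, consistency_score),
    where the score is the share of CRC-OK reads agreeing with the majority.
    """
    hashes = [h for crc_ok, h in reads if crc_ok]
    if not hashes:
        return False, None, 0.0
    top_hash, top_count = Counter(hashes).most_common(1)[0]
    score = top_count / len(hashes)
    # Hash stable: no CRC-OK read may disagree with the majority hash.
    accepted = top_count >= k and top_count == len(hashes)
    return accepted, (top_hash if accepted else None), score


if __name__ == "__main__":
    stable = [(True, "a1"), (True, "a1"), (False, "ff"), (True, "a1")]
    unstable = [(True, "a1"), (True, "a1"), (True, "b2")]
    print(k_of_n_accept(stable, k=3))
    print(k_of_n_accept(unstable, k=2))
```

The second case is the one CRC alone cannot catch: every read passes CRC, but the reads disagree, so the pass is rejected and the low consistency score is logged as evidence.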
Failure spikes in cold/heat — matching drift or clock drift causing sampling misalignment?
Evidence: Correlate temp_c with rssi_bin shift (margin drift, H2-3) and with sync_reason_code/sync_lock_time (timing drift, H2-5). Track interference sensitivity changes (H2-9).
First fix: Apply temperature-bucketed thresholds/AGC targets and add timebase health reporting before redesigning hardware.
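Temperature-bucketed thresholds can be as simple as a lookup keyed on temp_c. The sketch below shows the lookup only; the bucket edges and the agc_target and sync_threshold values are placeholders invented for illustration and would have to come from characterization data.

```python
import bisect

# Hypothetical bucket edges in degrees Celsius; four edges give five buckets.
BUCKET_EDGES_C = [-20, 0, 25, 50]

# Placeholder parameters per bucket, ordered from coldest to hottest.
BUCKET_PARAMS = [
    {"bucket": "below_-20", "agc_target": 110, "sync_threshold": 0.62},
    {"bucket": "-20_to_0", "agc_target": 105, "sync_threshold": 0.60},
    {"bucket": "0_to_25", "agc_target": 100, "sync_threshold": 0.58},
    {"bucket": "25_to_50", "agc_target": 100, "sync_threshold": 0.58},
    {"bucket": "50_and_up", "agc_target": 108, "sync_threshold": 0.61},
]


def params_for_temp(temp_c):
    """Return the parameter set for the bucket that contains temp_c.

    A temperature equal to an edge belongs to the warmer bucket.
    """
    return BUCKET_PARAMS[bisect.bisect_right(BUCKET_EDGES_C, temp_c)]


if __name__ == "__main__":
    for t in (-30, -20, 10, 49.9, 50, 70):
        print(t, params_for_temp(t)["bucket"])
```

Logging the selected bucket with every event keeps the temp_c correlation in the Evidence step reproducible after thresholds change.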
Works in one direction but fails on the return pass — posture/reflection or install height?
Evidence: Compare rssi_bin and agc_code stability between directions (H2-3). If interference route changes, limiter_count spikes and stage codes shift (H2-9).
First fix: Log installation state (height/posture ID) and correlate with the coupling/SNR budget before firmware changes (H2-3/H2-10).
After ESD, reads fail — front-end damage or MCU reset loop?
Evidence: Check reset_reason with vdd_min_mv (H2-6/H2-8) and compare limiter_count/sat_count behavior pre/post ESD (H2-9).
First fix: Ensure minimal evidence commits survive resets, then improve clamp/return path if RF saturation signatures dominate (H2-11).
Unreadable and no logs either — did power loss prevent commit?
Evidence: Look for missing t_commit_done and non-OK commit_status, then correlate with reset_reason and vdd_min_mv (H2-6/H2-8).
First fix: Move “minimal evidence first” ahead of heavy parsing and use a commit-proof scheme (FRAM or two-phase flags) (H2-11).
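The two-phase flag idea can be illustrated with a small simulation: each record is written as a begin flag, a payload, and a CRC, and the done flag is written last. A record whose done flag is still erased after a reset is reported as an interrupted commit rather than silently lost. The record layout, the flag values, and the status strings below are assumptions for illustration; a bytearray stands in for the non-volatile memory.

```python
import struct
import zlib

BEGIN_FLAG = 0xA5   # hypothetical marker written first
DONE_FLAG = 0x5A    # hypothetical marker written last
ERASED = 0xFF       # value of an unwritten byte in the simulated store

PAYLOAD = struct.Struct("<IHH")          # event_id, reset_reason, vdd_min_mv
SLOT_SIZE = 1 + PAYLOAD.size + 4 + 1     # begin + payload + crc32 + done


def commit(store, event_id, reset_reason, vdd_min_mv, power_loss=False):
    """Append one evidence record; `power_loss` skips the final done flag."""
    payload = PAYLOAD.pack(event_id, reset_reason, vdd_min_mv)
    # Phase 1: begin flag, payload, and CRC.
    slot = bytearray([BEGIN_FLAG]) + payload
    slot += struct.pack("<I", zlib.crc32(payload))
    # Phase 2: the done flag proves that phase 1 finished.
    slot.append(ERASED if power_loss else DONE_FLAG)
    store.extend(slot)


def recover(store):
    """Scan all slots and report a commit_status for each record."""
    records = []
    for off in range(0, len(store) - SLOT_SIZE + 1, SLOT_SIZE):
        slot = bytes(store[off:off + SLOT_SIZE])
        payload = slot[1:1 + PAYLOAD.size]
        (crc,) = struct.unpack("<I", slot[1 + PAYLOAD.size:SLOT_SIZE - 1])
        complete = (slot[0] == BEGIN_FLAG and slot[-1] == DONE_FLAG
                    and crc == zlib.crc32(payload))
        event_id, reset_reason, vdd_min_mv = PAYLOAD.unpack(payload)
        records.append({
            "event_id": event_id,
            "reset_reason": reset_reason,
            "vdd_min_mv": vdd_min_mv,
            "commit_status": "OK" if complete else "INCOMPLETE",
        })
    return records


if __name__ == "__main__":
    nvm = bytearray()
    commit(nvm, event_id=1, reset_reason=0, vdd_min_mv=3300)
    commit(nvm, event_id=2, reset_reason=3, vdd_min_mv=2710, power_loss=True)
    for rec in recover(nvm):
        print(rec)
```

Keeping the payload to a few bytes is what makes "minimal evidence first" practical: the shorter the write, the smaller the window in which a brownout can interrupt it.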
Needs many re-reads to stabilize — sync/threshold issue or insufficient SNR margin?
Evidence: Use retry_count trend plus sync_reason_code to separate “can’t lock” from “locks then CRC bursts” (H2-5). Check whether rssi_bin is near the sensitivity knee (H2-3).
First fix: Restore margin first (coupling/matching), then tune gating thresholds to avoid false acceptance.
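The separation described in the Evidence step can be sketched as a small classifier over the attempts in one pass. It assumes each attempt is logged as a (sync_reason_code, crc_ok) pair and that a sync_reason_code of 0 means the sync stage locked; both the pair layout and that encoding are assumptions for illustration.

```python
def classify_reread_cause(attempts, sync_ok_code=0):
    """Classify why a pass needed re-reads.

    `attempts` is a list of (sync_reason_code, crc_ok) tuples (hypothetical
    layout). Returns one of: "clean", "cannot_lock",
    "locks_then_crc_fails", or "mixed".
    """
    no_lock = sum(1 for code, _ in attempts if code != sync_ok_code)
    crc_fail = sum(1 for code, crc_ok in attempts
                   if code == sync_ok_code and not crc_ok)
    if no_lock == 0 and crc_fail == 0:
        return "clean"
    if no_lock > crc_fail:
        return "cannot_lock"            # points at sync/threshold settings
    if crc_fail > no_lock:
        return "locks_then_crc_fails"   # points at SNR margin
    return "mixed"


if __name__ == "__main__":
    print(classify_reread_cause([(2, False), (2, False), (0, True)]))
    print(classify_reread_cause([(0, False), (0, False), (0, True)]))
```

Reading the result together with rssi_bin matters: "locks_then_crc_fails" near the sensitivity knee supports the margin-first fix, while "cannot_lock" at comfortable RSSI points back at sync thresholds.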
AGC stays pinned — weak coupling or wrong gain configuration?
Evidence: Compare agc_code with rssi_bin and under_amp_count (H2-8). If RSSI is not low, the gain table/register path is suspect (H2-4).
First fix: Audit gain/AGC register configuration and tap-point observability before mechanical changes (H2-4).
Timestamps don’t align with speed/odometer — wrong marking point or clock drift?
Evidence: Compare align_quality with timebase_health_code (H2-7). Use test logs to confirm that every timestamp is taken at the same marking point (frame start, frame end, or verify OK) (H2-10).
First fix: Standardize marking points (t_frame_start/t_verify_ok) and add alignment quality codes for every event (H2-7/H2-8).
“Read OK but position is still wrong” — missing correlation fields between telegram and upper layers?
Evidence: Verify read_group_id clustering and event_id uniqueness per pass window (H2-7/H2-8). Check whether align_quality is OK for the events where the upstream layer reports a mismatch.
First fix: Add/validate correlation IDs and a minimal telegram summary (hash + version) so upstream mapping can be proven (H2-8).
Failure rate changes after switching balise batches — protocol nuance or RF tolerance distribution?
Evidence: Compare telegram_version_id/parser_path_id outcomes (H2-5) and baseline rssi_bin + retry/CRC statistics under identical fixture conditions (H2-10).
First fix: Run an A/B baseline against the test-to-evidence matrix and follow the decision-tree route before making hardware changes (H2-11).