Balise / Transponder: RF Demodulation, Diagnostics, Event Timing

Q: RSSI is high but decoding always fails — front-end saturation or sync thresholds?

Most cases are front-end clipping or an AGC pinned at its rail, so the high RSSI does not reflect real SNR margin. Check agc_code together with limiter_count/sat_count, and confirm whether failures cluster at the sync stage (sync_reason_code) or at the CRC stage (crc_fail_count). First fix: shorten the clamp loop and improve its return path, then retune the AGC target window before changing decode logic.

Q: Unreadable only at high speed — window too short or re-read policy wrong?

High-speed failures usually come from a time-budget collapse before convergence. Compare sample_window_us against retry_count distribution and locate the dominant fail stage using sync_reason_code and CRC bursts. First fix: enforce a speed-aware K-of-N policy with bounded retries and validate using window-stress verification.

Q: Occasional wrong read but CRC shows no error — telegram version compatibility?

Treat CRC OK as insufficient because false-lock or parse mismatch can still yield wrong semantics. Require telegram_hash stability across reads and track k_of_n_result/consistency_score, plus false_detect_count and stage codes around sync/parse. First fix: accept only K-of-N + hash stability and log telegram_version_id/parser_path_id.

Q: Failure spikes in cold/heat — matching drift or clock drift causing sampling misalignment?

Separate RF margin drift from timebase/threshold drift using trends. Correlate temp_c with rssi_bin shift and with sync_reason_code (or lock-time metrics) to identify the dominant route, and track interference sensitivity changes. First fix: apply temperature-bucketed thresholds/AGC targets and add timebase health reporting before redesigning hardware.

Q: Works in one direction but fails on the return pass — posture/reflection or install height?

Direction-dependent reads usually point to coupling geometry and metal environment sensitivity. Compare rssi_bin and agc_code stability between directions; if the interference route changes, limiter_count spikes and stage codes shift. First fix: log installation state (height/posture ID) and correlate with coupling/SNR budget before firmware changes.

Q: After ESD, reads fail — front-end damage or MCU reset loop?

Distinguish RF path injury from power/reset collapse using one RF field and one power field. Check reset_reason with vdd_min_mv, then compare limiter_count/sat_count behavior pre/post ESD. First fix: ensure minimal evidence commits survive resets, then improve clamp/return path if RF saturation signatures dominate.

Q: Unreadable and no logs either — did power loss prevent commit?

No log is a power/commit problem until proven otherwise. Look for missing t_commit_done and non-OK commit_status, then correlate with reset_reason and vdd_min_mv. First fix: move minimal-evidence commits ahead of heavy parsing and use a commit-proof scheme such as FRAM or two-phase flags.

Q: Needs many re-reads to stabilize — sync/threshold issue or insufficient SNR margin?

Persistent retries mean the system is not converging. Use retry_count plus sync_reason_code to separate can’t-lock from locks-then-CRC-bursts, and check whether rssi_bin is near the sensitivity knee. First fix: restore margin first (coupling/matching), then tune gating thresholds to avoid false acceptance.

Q: AGC stays pinned — weak coupling or wrong gain configuration?

An AGC pinned at its rail can mean the input is too small or the gain path is misconfigured; RSSI bins tell the two apart. Compare agc_code with rssi_bin and under_amp_count; if RSSI is not low, the gain table/register path is suspect. First fix: audit gain/AGC register configuration and tap-point observability before mechanical changes.

Q: Timestamps don’t align with speed/odometer — wrong marking point or clock drift?

Misalignment is usually a marking-definition issue before it is a hardware clock issue. Compare align_quality with timebase_health_code and validate consistent timestamp points (frame start/end/verify OK) using regression logs. First fix: standardize marking points and add alignment quality codes on every event.

A Balise/Transponder link is only “reliable” when it can prove every pass with a consistent RF→decode→timestamp→commit evidence chain. Unreadable passes, wrong reads, and post-ESD failures are then debugged by checking a few key fields (RSSI/AGC/CRC/timebase/commit) and applying the first-fix pattern before changing the architecture.

H2-1. Scope & Interfaces

This page locks the scope to the balise/transponder over-the-air read chain: the trackside balise and the onboard BTM (Balise Transmission Module) path that performs coupling, RF front-end processing, demod/decoding, and diagnostic evidence recording. It intentionally avoids broader ETCS/CBTC onboard architecture and any unrelated trackside subsystems.

Balise / Transponder · BTM Read Chain · RF Front-End · Demod / Decode · Event Evidence & Timestamps

1) System roles (responsibility boundary)

Trackside balise (passive / semi-active) is responsible for producing a valid telegram under allowed coupling conditions and environmental tolerances. The key requirement is not “responds once,” but responds repeatably across installation variation, temperature drift, and nearby metal effects.
Onboard BTM is responsible for the full read chain: coupling → RF conditioning → demod → sync → decode → integrity check → output, plus diagnostic evidence generation when reads fail (failure without evidence is not diagnosable).

2) Interface map (interfaces are evidence entry points)

Treat each interface as a place where the system must expose at least one measurable field for acceptance and field debugging:

Antenna / coupler interface → coupling strength indicators (RSSI proxy), AGC level, saturation/weak-signal counters.
RF front-end I/O and tap points → limiter flags, gain state, amplitude metrics (enough to classify “too strong vs too weak”).
Demod/decoder output → frame detect state, sync-failure reason code, CRC results, retry counters.
MCU interface (SPI/I²C/parallel) → structured fault codes, read attempt history, reset reason, temperature/voltage snapshots.
Timestamp source → event timepoints (frame start / decode OK / commit done) and drift indicators.
Storage & service/maintenance port → evidence packet commit status and sequence continuity (detect missing events).
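
Sequence continuity is the easiest of these fields to check offline. The sketch below is a minimal illustration, assuming each evidence packet carries an integer seqno that increments by one per event and does not wrap around; the function name is hypothetical.

```python
def find_missing_seqnos(seqnos):
    """Return the sequence numbers absent between the lowest and highest seen.

    Assumes seqno increments by 1 per event and never wraps
    (an illustrative simplification, not a stated requirement).
    """
    seen = set(seqnos)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Example: two events never reached storage between 102 and 105.
missing = find_missing_seqnos([100, 101, 102, 105, 106])
print(missing)
```

A non-empty result points to events that were generated but never committed, which is the case the commit-status field is meant to explain.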

3) Operating conditions (variation → mechanism → what to observe)

High-speed pass reduces the effective read window → raises sensitivity to sync timing margin and retry policy.
Gap / height / attitude changes create rapid coupling swings → AGC/limiter behavior and decision thresholds dominate.
Trackside EMC + nearby metal introduces common-mode pickup and reflection effects → front-end dynamic range and filtering must be provable via logged fields, not guessed.

Figure F1. System context map for the balise/transponder read chain. The upper-layer ETCS/CBTC block is shown only as an interface (out of scope).

ALT (for this figure): Balise transponder system context showing train underside antenna coupling to trackside balise, onboard BTM RF demod decode logging chain, and evidence outputs.

H2-2. User Intent: What “Good” Looks Like in the Field

In rail signaling, “good” is not a subjective impression. It must be measurable, repeatable, and diagnosable. A balise read chain is acceptable only when it can prove: (1) reads are reliable across boundary conditions, (2) decoded content is integrity-checked and consistent, (3) event timing is trustworthy, and (4) failures still produce a usable evidence packet.

Acceptance targets (metrics → evidence → decision)

A) Read reliability

Metric: success probability per pass, retry-count distribution, failure-type mix (sync vs CRC vs power).
Evidence fields: read_attempts, read_success, retry_hist, fail_reason_code
Decision: prove margin at boundary conditions (max speed / worst gap / temperature corners / EMC stress).

B) Timing correctness

Metric: timestamp jitter and alignment error versus speed/odometer reference (if available).
Evidence fields: t_frame_start, t_decode_ok, t_commit_done, clock_drift_est
Decision: confirm timepoints are taken at defined stages (not “some time later”).

C) Decode integrity

Metric: CRC/format check rate and multi-read consistency (same balise → same telegram hash).
Evidence fields: crc_fail_count, telegram_hash, consistency_score
Decision: avoid “false confidence”: CRC pass alone is insufficient without consistency checks under noise.

D) Diagnostics completeness

Metric: evidence packet completeness rate, especially for failed reads and reset/brownout events.
Evidence fields: log_commit_status, reset_reason, last_event_seqno
Decision: failures must be classifiable without oscilloscopes on track.
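
The read-reliability metrics above can be derived directly from per-pass records. The following is a minimal sketch, assuming one dict per pass with read_success, retry_count, and fail_reason_code keys; the record shape, the reason-code strings, and the function name are illustrative assumptions.

```python
from collections import Counter


def summarize_passes(passes):
    """Summarize per-pass records into read-reliability metrics.

    Each record is a dict with illustrative keys:
    read_success (bool), retry_count (int), fail_reason_code (str or None).
    """
    total = len(passes)
    if total == 0:
        raise ValueError("no passes recorded")
    successes = sum(1 for p in passes if p["read_success"])
    retry_hist = Counter(p["retry_count"] for p in passes)
    fail_mix = Counter(
        p["fail_reason_code"] for p in passes if not p["read_success"]
    )
    return {
        "success_rate": successes / total,
        "retry_hist": dict(retry_hist),
        "fail_mix": dict(fail_mix),
    }


passes = [
    {"read_success": True, "retry_count": 0, "fail_reason_code": None},
    {"read_success": True, "retry_count": 2, "fail_reason_code": None},
    {"read_success": False, "retry_count": 3, "fail_reason_code": "SYNC"},
    {"read_success": False, "retry_count": 3, "fail_reason_code": "CRC"},
]
summary = summarize_passes(passes)
print(summary)
```

Running the same summary per boundary condition (speed band, gap, temperature corner) shows where margin is lost rather than only that it is lost.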

A practical rule: if the system cannot tell whether a failure is weak coupling, front-end saturation, sync failure, or power/reset, the architecture is not field-ready.

Figure F2. A practical funnel that turns field symptoms into mechanisms, required evidence fields, and a first action.

ALT (for this figure): Balise read failure-to-evidence funnel mapping symptoms to mechanisms, logged evidence fields, and first corrective actions.

H2-3. Over-the-Air Coupling & Antenna Path

Balise read failures are most often margin problems, not protocol mysteries. The over-the-air path must be treated as a variable channel: coupling changes with height, tilt, nearby metal, and train speed. A field-ready design proves reliability by converting these variations into an explicit SNR margin budget and a time-window budget.

Coupling variation · Matching & Q · SNR margin · Speed window

1) Practical coupling model (engineering, not academic)

Coupling is not constant. Effective coupling varies with installation height/gap, antenna attitude, and metal proximity (rails, fasteners, underbody structures).
Coupling variation becomes amplitude variation at the RF input. The read chain must classify failures as “too weak,” “too strong/saturated,” or “timing/sync limited,” based on observable fields.
Reflections and near-field distortion can change the apparent signal shape even when average level is similar, which is why a robust design tracks both level indicators (RSSI/AGC) and decode-stage outcomes (sync/CRC).

2) Matching network impact (Q, bandwidth, tolerance)

Higher Q can increase peak gain, but narrows bandwidth and increases sensitivity to component tolerance, temperature drift, and frequency offset. A “lab-perfect” tune can reduce field robustness.
Tolerance budgeting is mandatory. Component tolerance and temperature drift shift the resonance point, reducing effective coupling and changing the noise bandwidth seen by the detector.
Design goal: choose a Q/bandwidth that preserves SNR margin across worst-case installation and environment, rather than maximizing peak response at a single condition.
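
The Q versus tolerance trade-off can be checked numerically. This sketch uses the standard relations f0 = 1/(2π√(LC)) and BW ≈ f0/Q for a single resonant circuit; the component values, the 5% tolerance, and the "shift must stay inside half the bandwidth" criterion are illustrative assumptions, not figures from any balise specification.

```python
import math


def resonant_frequency_hz(l_henry, c_farad):
    """Resonant frequency of an ideal LC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def bandwidth_hz(f0_hz, q):
    """Approximate 3 dB bandwidth of a single resonant circuit."""
    return f0_hz / q


def worst_case_shift_hz(l_henry, c_farad, c_tol):
    """Largest resonance shift when C varies by +/- c_tol (fractional)."""
    f_nom = resonant_frequency_hz(l_henry, c_farad)
    f_low = resonant_frequency_hz(l_henry, c_farad * (1.0 + c_tol))
    f_high = resonant_frequency_hz(l_henry, c_farad * (1.0 - c_tol))
    return max(f_nom - f_low, f_high - f_nom)


def stays_in_band(l_henry, c_farad, c_tol, q):
    """True if the worst-case shift stays within half the 3 dB bandwidth."""
    f0 = resonant_frequency_hz(l_henry, c_farad)
    return worst_case_shift_hz(l_henry, c_farad, c_tol) <= bandwidth_hz(f0, q) / 2.0


# Hypothetical tank: 10 uH with 100 pF and a 5 % capacitor tolerance.
L_H, C_F, TOL = 10e-6, 100e-12, 0.05
f0 = resonant_frequency_hz(L_H, C_F)
for q in (10, 50):
    print(q, round(bandwidth_hz(f0, q)), stays_in_band(L_H, C_F, TOL, q))
```

With these assumed values the lower-Q tune tolerates the capacitor spread while the higher-Q tune does not, which is the "lab-perfect but fragile" case described above.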

3) Speed-driven time window (read budget)

Higher speed primarily reduces the effective acquisition window. A shorter window limits synchronization time, reduces the number of retry opportunities, and tightens allowable processing latency. The design must explicitly budget: detect → sync → decode → verify → commit evidence within the available window.
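
A worked time-window budget makes the constraint concrete. The sketch below assumes the window is simply the effective coupling-zone length divided by train speed, and that each read attempt costs a fixed time; the 0.5 m zone length, the per-attempt and commit durations, and the function names are hypothetical placeholders to be replaced with measured values.

```python
def read_window_us(contact_length_m, speed_kmh):
    """Time the antenna spends over the coupling zone, in microseconds."""
    speed_mps = speed_kmh / 3.6
    return contact_length_m / speed_mps * 1e6


def attempts_that_fit(window_us, per_attempt_us, commit_us):
    """Whole read attempts (detect+sync+decode+verify) that fit before commit."""
    usable_us = window_us - commit_us
    if usable_us < per_attempt_us:
        return 0
    return int(usable_us // per_attempt_us)


# Hypothetical numbers: 0.5 m coupling zone, 1500 us per attempt, 500 us commit.
for speed in (100, 300):
    window = read_window_us(0.5, speed)
    print(speed, round(window), attempts_that_fit(window, 1500, 500))
```

The number of attempts that fit bounds N in any K-of-N policy, so the retry policy has to be derived from this budget rather than chosen independently.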

4) What to prove (margin, not anecdotes)

SNR margin budget

Coupling loss variation + mismatch loss + cable loss + interference pickup
Front-end noise figure and effective bandwidth
Demod threshold and required margin at boundary conditions

Time-window budget

Window length at max speed (worst geometry)
Minimum time for sync and frame detection
Retry policy that converges before leaving the coupling zone
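
The SNR margin budget can be written as a simple sum in decibels. This is a minimal sketch, assuming a conventional thermal noise floor of -174 dBm/Hz plus 10·log10(bandwidth) plus noise figure, and treating interference pickup as an equivalent penalty in dB; every numeric value is a hypothetical placeholder, and a real near-field link may be limited by interference rather than thermal noise.

```python
import math


def noise_floor_dbm(bandwidth_hz, noise_figure_db):
    """Thermal noise floor referred to the input, in dBm."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db


def snr_margin_db(source_dbm, losses_db, bandwidth_hz, noise_figure_db,
                  required_snr_db):
    """Remaining margin above the demod threshold after all listed losses."""
    received_dbm = source_dbm - sum(losses_db.values())
    snr_db = received_dbm - noise_floor_dbm(bandwidth_hz, noise_figure_db)
    return snr_db - required_snr_db


# Hypothetical worst-case budget (all values are placeholders).
losses = {
    "coupling": 40.0,
    "mismatch": 2.0,
    "cable": 1.0,
    "interference_penalty": 5.0,
}
margin = snr_margin_db(
    source_dbm=-20.0,
    losses_db=losses,
    bandwidth_hz=1e6,
    noise_figure_db=6.0,
    required_snr_db=15.0,
)
print(round(margin, 1))
```

Keeping each loss as a named entry makes it straightforward to rerun the budget per boundary condition and see which segment consumes the margin.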

Figure F3. A budget-style view that turns installation and environment variation into explicit SNR margin and time-window constraints.

ALT (for this figure): Coupling and SNR budget diagram showing loss segments, demod threshold, remaining margin, and speed window impact for balise reads.

H2-4. RF Front-End Architecture

The RF front-end must be designed as an observable system. A non-observable front-end forces field teams to guess. A robust architecture separates “weak coupling,” “front-end saturation,” “sync failure,” and “power/reset” using measurable tap points and structured status codes.

LNA · Limiter · AGC · Filter · Detector / Demod · Tap points

1) Front-end blocks (what each block protects or proves)

LNA / gain stage preserves weak-signal sensitivity. Failure signature: low level with high AGC demand, repeated sync-fail without limiter activity.
Limiter prevents overdrive and clamps transient peaks. Failure signature: limiter active frequently, distorted amplitude leading to “CRC bursts” or unstable sync.
AGC stabilizes amplitude under coupling swings. Failure signature: AGC pinned high (too weak) or pinned low (too strong).
Filter trades interference rejection versus sensitivity. Too narrow harms frequency tolerance; too wide increases noise bandwidth.
Detector / demod interface must expose “why decoding failed” (sync code, CRC) rather than only “fail.”

2) Dynamic range requirement (near-strong vs far-weak)

The required dynamic range must cover the full combination of coupling variation, tolerance drift, temperature corners, and interference pickup. The architecture should prove that strong coupling does not saturate the chain, while weak coupling still exceeds the minimum detectable level with adequate SNR margin.

3) Built-in observability (minimum tap-point set)

Level indicators: RSSI proxy, AGC code, detector amplitude
Clipping indicators: limiter flag, saturation counter
Decode indicators: sync reason code, CRC fail count, retry histogram
Context snapshot: temperature, supply voltage, reset reason, commit status
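
With this tap-point set, a first-pass classifier needs only a handful of comparisons. The sketch below is one possible ordering of checks, assuming agc_code 0 means minimum gain and 255 means maximum gain, rssi_bin is a small integer where low means weak, and sync_reason_code 0 means sync succeeded; these encodings, the route names, and the thresholds are illustrative assumptions.

```python
def classify_failure(ev, agc_low_rail=0, agc_high_rail=255, weak_rssi_bin=1):
    """Map one evidence record (dict) to a coarse root-cause route."""
    # Power/commit problems are checked first: they invalidate the RF fields.
    if ev.get("reset_reason", "NONE") != "NONE":
        return "POWER_RESET"
    if ev.get("commit_status", "OK") != "OK":
        return "POWER_RESET"
    # Clipping evidence with gain already at minimum: signal is too strong.
    clipped = ev.get("limiter_count", 0) > 0 or ev.get("sat_count", 0) > 0
    if clipped and ev.get("agc_code") == agc_low_rail:
        return "FRONT_END_SATURATION"
    # Gain at maximum and RSSI still low: signal is too weak.
    rssi_bin = ev.get("rssi_bin", weak_rssi_bin + 1)
    if ev.get("agc_code") == agc_high_rail and rssi_bin <= weak_rssi_bin:
        return "WEAK_COUPLING"
    if ev.get("sync_reason_code", 0) != 0:
        return "SYNC_FAILURE"
    if ev.get("crc_fail_count", 0) > 0:
        return "CRC_FAILURE"
    return "UNCLASSIFIED"


example = {"agc_code": 0, "limiter_count": 12, "rssi_bin": 7, "crc_fail_count": 3}
print(classify_failure(example))
```

An "UNCLASSIFIED" result is itself useful: it indicates that the logged fields are not sufficient for that failure and the tap-point set needs extending.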

Figure F4. The front-end is structured to expose level, clipping, gain state, and decode outcomes, enabling fast root-cause classification.

ALT (for this figure): RF front-end block diagram for balise reader showing antenna, LNA, limiter, AGC, filter, demod, and diagnostic tap points for RSSI, AGC, limiter, and sync/CRC.

H2-5. Demodulation, Decoding & Telegram Integrity

A “wrong read” is rarely a single bug. It is the result of a chain where waveform quality, synchronization margin, bit decisions, and integrity policy interact. A field-ready design must convert every decode failure into a pipeline-stage outcome with a clear evidence field: “failed at frame detect,” “failed at sync,” “CRC burst,” or “CRC pass but inconsistent across reads.”

Frame detect · Sync · Bit clock · Soft/Hard decision · CRC · Multi-read consistency

1) Demod approach (kept at engineering abstraction)

Envelope / amplitude path (ASK-like): sensitive to clipping, noise floor rise, and threshold jitter. Key evidence: amplitude stats, limiter activity, decision threshold state.
Phase / zero-crossing path (PSK-like): sensitive to phase noise and sampling-phase drift. Key evidence: sync quality score and phase/clock error bins (compressed codes are acceptable).
Correlation-based detect: sensitive to window length and multipath distortion. Key evidence: correlation peak ratio and peak position stability across attempts.

2) Frame detection and synchronization (deterministic failure classification)

Frame detect must distinguish miss-detect vs false-detect. If the design cannot log false-detect rate, field tuning becomes guesswork.
Evidence: frame_detect_count, false_detect_count, preamble_quality
Sync lock must record where it failed (timing window, jitter, threshold). “Sync fail” without a reason code is not actionable.
Evidence: sync_reason_code, sync_lock_time, sync_quality_score
Bit clock recovery failures often appear as “gets worse over the frame.” Capture slips/drift rather than only CRC.
Evidence: bit_slip_count, phase_error_bin (or clock_drift_code)

3) Bit decisions (soft vs hard) and parameter traceability

Soft decisions can improve robustness near the SNR edge but cost compute and power. Hard decisions are simpler but require well-managed thresholds. In either case, a field-debuggable system must record the decision mode and a configuration version (or threshold ID) so a failure can be reproduced.

Hard decision (threshold-driven)

Risk: threshold jitter under noise/clipping
Log: decision_mode, threshold_id (or config_version)

Soft decision (confidence-driven)

Benefit: improved error tolerance near margin
Log: decision_mode, confidence_bin (compressed), compute budget

4) Telegram integrity: CRC pass is necessary, not sufficient

CRC proves internal consistency for one read attempt, but it does not prove correctness under noise and interference. A robust implementation adds a multi-read consistency gate: repeated reads of the same balise within the same pass must converge to a consistent telegram hash before output is accepted.

Consistency rule: accept output only when K-of-N attempts agree (majority or thresholded policy).
Non-convergence: output “inconsistent” with evidence fields preserved (do not silently pick one of the disagreeing attempts).
Evidence: telegram_hash per attempt, read_group_id, consistency_score, crc_pass/fail stats
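
The consistency gate can be sketched in a few lines. This is a minimal illustration, assuming each attempt is reported as a (crc_ok, telegram_hash) pair and that acceptance requires exactly one hash to reach K CRC-passing reads; the result strings and the definition of consistency_score as "agreeing reads divided by total attempts" are assumptions.

```python
from collections import Counter


def k_of_n_gate(attempts, k):
    """Apply a K-of-N consistency gate to one read group.

    attempts: list of (crc_ok, telegram_hash) tuples for a single pass.
    Returns a dict with k_of_n_result, telegram_hash and consistency_score.
    """
    n = len(attempts)
    counts = Counter(h for crc_ok, h in attempts if crc_ok)
    if n == 0 or not counts:
        return {"k_of_n_result": "NO_VALID_READ", "telegram_hash": None,
                "consistency_score": 0.0}
    top_hash, top_count = counts.most_common(1)[0]
    score = top_count / n
    winners = [h for h, c in counts.items() if c >= k]
    if len(winners) == 1:
        return {"k_of_n_result": "ACCEPT", "telegram_hash": winners[0],
                "consistency_score": score}
    # Either no hash reached K, or two hashes both did: never pick one silently.
    return {"k_of_n_result": "INCONSISTENT", "telegram_hash": None,
            "consistency_score": score}


group = [(True, "a1"), (True, "a1"), (False, "ff"), (True, "a1"), (True, "b2")]
result = k_of_n_gate(group, k=3)
print(result)
```

CRC-failed attempts still count toward N, so a pass with many CRC failures lowers the score even when the surviving reads agree.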

Figure F5. The decode chain is diagnosable only when each stage emits a clear outcome and evidence fields, not a single “read failed” bit.

ALT (for this figure): Decode pipeline timeline for balise reads showing sampling, frame detect, sync, demod, decode, CRC and multi-read consistency with symptoms and evidence fields.

H2-6. Low-Power MCU & Wake-Up Strategy

Low-power MCU design in a balise read chain is not only about reducing standby current. It must preserve read availability, prevent brownout mid-read, and guarantee that failures still produce a minimal evidence record. The wake-up strategy, power sequencing, and firmware state machine must be engineered as a single reliability system.

RF-field wake · External interrupt · Timer wake · Brownout · State machine · Commit & Recovery

1) Wake-up paths (stability vs power)

RF-field wake: lowest standby power, but sensitive to threshold drift and false wake from interference.
Log: wake_source, rf_wake_level, false_wake_count
External interrupt wake: more deterministic timing integration, but vulnerable to EMI on harness lines.
Log: int_source_id, debounce_status, emi_event_counter
Timer wake: predictable for self-test and health checks, but increases energy cost if poorly scheduled.
Log: wake_timer_id, schedule_version

2) Power sequencing and brownout prevention (avoid “dies mid-decode”)

Wake-up creates a steep current transient: clock start, RF front-end enable, demod compute, and non-volatile commit. The most common field symptom is partial reads followed by reset. Prevention requires:

Power-good gating: do not enter Acquire until VDD is stable above a defined threshold for a minimum time.
Staged enable: avoid overlapping peak loads (RF enable vs storage writes) unless hold-up is guaranteed.
Minimal evidence first: on any failure path, commit a compact evidence packet before optional processing.
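
Power-good gating amounts to a "stable for long enough" test on supply samples. The sketch below models the logic only, assuming VDD is sampled at a fixed interval and that stability is expressed as a number of consecutive samples; the threshold and sample count are hypothetical, and on a real MCU this would typically be done with a supervisor or comparator rather than in software.

```python
def power_good(samples_mv, threshold_mv, min_stable_samples):
    """True once the latest min_stable_samples readings are all at or above threshold."""
    if min_stable_samples <= 0:
        raise ValueError("min_stable_samples must be positive")
    if len(samples_mv) < min_stable_samples:
        return False
    recent = samples_mv[-min_stable_samples:]
    return all(v >= threshold_mv for v in recent)


# Hypothetical ramp with one dip; gate requires 3 stable samples above 3000 mV.
ramp = [2400, 2900, 3100, 2950, 3150, 3200, 3250]
print(power_good(ramp[:4], 3000, 3), power_good(ramp, 3000, 3))
```

Requiring consecutive samples means a single dip restarts the wait, which is what keeps Acquire from starting on a supply that is still ringing.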

3) Firmware state machine (deterministic transitions + timeouts)

A rail-grade implementation requires explicit Commit and Recovery states. Every state must define entry conditions, exit criteria, timeouts, and the evidence fields it updates.

Listen/Idle: wait for wake source, collect baseline context (temp, VDD, counters).
Acquire: enable RF chain, confirm power-good, open the decode window.
Decode/Verify: perform demod/decoding and CRC + multi-read consistency gating.
Commit: write result and evidence with sequence number; confirm completion.
Recovery: on any failure, store minimal evidence and return to safe state (sleep or controlled retry).
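
The five states above map naturally onto a transition table. This is a minimal sketch of the control flow only, assuming event names such as "wake", "power_good", and "timeout" that are not defined in the text; timeouts, evidence updates, and hardware actions are left out.

```python
from enum import Enum, auto


class State(Enum):
    LISTEN = auto()
    ACQUIRE = auto()
    DECODE_VERIFY = auto()
    COMMIT = auto()
    RECOVERY = auto()


# Normal-path transitions; event names are illustrative.
TRANSITIONS = {
    (State.LISTEN, "wake"): State.ACQUIRE,
    (State.ACQUIRE, "power_good"): State.DECODE_VERIFY,
    (State.DECODE_VERIFY, "verify_ok"): State.COMMIT,
    (State.COMMIT, "commit_done"): State.LISTEN,
    (State.RECOVERY, "evidence_stored"): State.LISTEN,
}

FAILURE_EVENTS = {"timeout", "fail"}
ACTIVE_STATES = {State.ACQUIRE, State.DECODE_VERIFY, State.COMMIT}


def next_state(state, event):
    """Return the next state; any failure in an active state goes to RECOVERY."""
    if (state, event) in TRANSITIONS:
        return TRANSITIONS[(state, event)]
    if event in FAILURE_EVENTS and state in ACTIVE_STATES:
        return State.RECOVERY
    return state  # Unknown or irrelevant events leave the state unchanged.


def run(events, start=State.LISTEN):
    """Feed a list of events through the machine and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


print(run(["wake", "power_good", "verify_ok", "commit_done"]))
print(run(["wake", "power_good", "timeout"]))
```

Keeping the failure rule separate from the table guarantees that every active state has a defined exit to Recovery, even when a new state is added later.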

Figure F6. A low-power design remains field-reliable only when wake-up, power-good gating, timeouts, and evidence commit are part of a single state machine.

ALT (for this figure): Low-power MCU power-state machine for balise reader showing sleep, idle, acquire, decode, verify, commit, and recovery with power-good thresholds and timeouts.

H2-7. Event Timestamping & Correlation

“Event timestamp” becomes useful only when it forms a provable time chain: the time source has a known trust level, each critical decode milestone is time-stamped, and the record can be correlated to speed/odometer inputs using explicit fields. This chapter defines a minimal, auditable timestamp set that preserves alignment without expanding into full vehicle positioning.

Timebase trust · Milestone stamps · Commit proof · Speed/Odo alignment · Quality codes

1) Time sources (trust levels + health evidence)

Local RTC provides a local axis but can drift or lose validity after power events.
Log: timebase_mode, rtc_valid, rtc_epoch, rtc_health_code
TCXO / stable local timebase improves short-window stability. Temperature is still relevant.
Log: timebase_mode, temp_c, timebase_health_code
External sync input (if present) enables cross-module alignment, but must detect loss/jitter.
Log: sync_present, sync_lost_count, sync_offset_est, sync_jitter_bin

2) Timestamp strategy (milestones that form a closed evidence loop)

A reliable chain uses milestone stamps that map to both performance and integrity. The following minimal set supports root-cause classification and correlation:

t_frame_start: anchors the entry into the coupling window.
t_decode_done: proves processing time and window sufficiency.
t_verify_ok: marks CRC + consistency acceptance time.
t_commit_done: proves the evidence packet actually landed (not “lost mid-write”).
t_fail_stage: records failure stage time when verification does not converge.

3) Correlation (speed/odometer alignment by fields, not inference)

Correlation is implemented through explicit input snapshots and alignment quality codes. The intent is to prove that an event belongs to a specific pass window, not to compute vehicle position.

Inputs: speed_in, odometer_in, speed_sample_time
Outputs: event_speed, event_odometer, align_quality (OK / STALE / MISSING)
Grouping: read_group_id ties multiple read attempts within one pass window.
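
The align_quality code can be derived from the age of the speed/odometer snapshot relative to the event. This sketch assumes both times share one timebase in microseconds and that a single maximum-age limit separates OK from STALE; the 50 ms limit and the function name are hypothetical.

```python
def align_quality(event_time_us, speed_sample_time_us, max_age_us):
    """Return OK, STALE or MISSING for one event's speed/odometer snapshot."""
    if speed_sample_time_us is None:
        return "MISSING"
    # A sample far from the event in either direction cannot be trusted.
    if abs(event_time_us - speed_sample_time_us) > max_age_us:
        return "STALE"
    return "OK"


# Hypothetical limit: snapshot must be within 50 ms of t_frame_start.
print(align_quality(1_000_000, 980_000, 50_000))
print(align_quality(1_000_000, 900_000, 50_000))
print(align_quality(1_000_000, None, 50_000))
```

Storing the code with every event lets later analysis discard weakly aligned records instead of treating all correlations as equally trustworthy.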

Figure F7. A provable chain requires timebase health + milestone stamps + explicit correlation fields, while keeping upper-layer positioning out of scope.

ALT (for this figure): Timestamp correlation map linking BTM timebase and milestone event timestamps with speed and odometer inputs and greyed upper-layer logs for alignment.

H2-8. Diagnostics & Evidence Packet (What to Log)

Diagnostics becomes differentiating only when it is a copyable evidence contract. The evidence packet must answer the key field questions even when reading fails: whether the failure is power-related, RF-margin-related, synchronization-related, or commit-related. This chapter defines a fixed schema with a minimal subset that must survive brownout, and a full subset used for deep root-cause analysis.

Schema contract · Minimal vs full · Power & reset · RF status · Decode status · Time & window

1) Minimal evidence (must survive failures)

Minimal evidence is written on every failure path before optional processing. It ensures that “read failed” can still be correlated to a specific pass window and root-cause direction.

Identity: event_id, seqno, read_group_id
Failure classification: fail_stage, fail_reason_code
Power proof: reset_reason, vdd_min_mv, brownout_count
Commit proof: commit_status, t_commit_done

2) Full evidence (answers the “why” with measurable fields)

Environment & state

temp_c, vdd_mv, power_good_state
reset_reason_code, boot_count

RF status

rssi_bin, agc_code
limiter_flag, limiter_count
sat_count, under_amp_count

Decode status

preamble_quality, sync_reason_code, sync_quality
crc_fail_count, crc_reason_code, retry_count
telegram_hash, consistency_score, k_of_n_result

Time & window

t_frame_start, t_decode_done, t_verify_ok
sample_window_us, timebase_mode, sync_present
event_speed, event_odometer, align_quality

3) Result policy (hash/summary instead of full payload when needed)

The result section can store a telegram summary rather than full payload depending on policy. The evidence contract remains valid when it includes a deterministic summary identifier and a schema version.

Result: telegram_hash, payload_len, store_policy_id
Versioning: schema_version, fw_version, config_version

Figure F8. A fixed evidence packet schema turns failures into answerable questions, even under brownout or incomplete reads.

ALT (for this figure): Evidence packet schema diagram for balise diagnostics showing grouped fields for environment, RF, decode, time, and result with mappings to key troubleshooting questions.

H2-9. EMI/ESD/Environmental Hardening for Trackside Reality

Trackside conditions can disturb the balise read chain through two dominant failure routes: (1) front-end saturation / threshold collapse that drives CRC bursts and unstable hashes, and (2) power/reference perturbation that triggers brownout/reset and incomplete commits. Hardening is effective only when each mitigation maps to a specific coupling path and can be verified through observable fields.

ESD/EFT coupling · Common-mode return · Front-end saturation · Reset/brownout · Temp/vibration drift · Observable fields

1) ESD/EFT → front-end saturation → decode instability

Fast transients near the antenna/coupler or shield termination can inject common-mode current into the RF input path. The result is often not a total read loss but a destabilized decision process: limiter activity rises, AGC rails, and the demod threshold becomes noisy. The field signature is a combination of RSSI/AGC anomalies and CRC bursts.

Victim nodes: limiter/LNA input, AGC loop, ADC/demod front-end
Symptoms: RSSI jump, AGC at rails, limiter toggles, false detect increase, CRC bursts, inconsistent telegram_hash
Observable fields: rssi_bin, agc_code, limiter_flag/limiter_count, sat_count, crc_fail_count, sync_reason_code, false_detect_count

2) ESD/EFT → reference/power disturbance → reset & incomplete evidence

A second dominant route is reference disturbance: ground bounce and supply dip can force brownout reset during acquire/decode, or abort non-volatile writes. A robust design must ensure that failure paths still commit a minimal evidence packet and record commit completion explicitly.

Victim nodes: MCU BOR/WDT, clock start-up, NVM commit window
Symptoms: mid-read reset, missing t_commit_done, commit_status failures, boot-count jumps
Observable fields: reset_reason, vdd_min_mv, brownout_count, commit_status, t_commit_done

3) Shielding, grounding, and common-mode return (antenna-to-front-end specific)

Shielding is effective only when the common-mode return path is controlled. For the antenna-to-front-end segment, the goal is to prevent high-frequency return currents from flowing through sensitive reference networks. The most common failure pattern is a long/uncertain return path that converts transient current into threshold jitter or reset events.

Design focus: minimize RF input loop area and provide a low-impedance return to chassis/reference
Verification hint: correlate limiter/AGC anomalies with reset events and harness/shield configurations

4) Temperature and vibration: drift → margin loss → retries

Temperature and vibration can shift matching network parameters and degrade connector integrity. The typical signature is not a single failure, but a progressive loss of margin: sync becomes slower, retries rise, and K-of-N consistency converges less reliably under the same pass window.

Observable fields: temp_c, retry_count, consistency_score, k_of_n_result
Actionable linkage: tie drift evidence to the timestamp correlation fields (H2-7) for repeatability
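
Progressive margin loss shows up when retries are grouped by temperature. This is a minimal sketch, assuming evidence records expose temp_c and retry_count as numbers; the 10 °C bucket width and the function name are arbitrary illustrative choices.

```python
from collections import defaultdict
from statistics import mean


def retries_by_temp_bucket(records, bucket_c=10):
    """Average retry_count per temperature bucket, keyed by bucket lower edge."""
    buckets = defaultdict(list)
    for rec in records:
        lower_edge = (rec["temp_c"] // bucket_c) * bucket_c  # floors negatives too
        buckets[lower_edge].append(rec["retry_count"])
    return {edge: mean(values) for edge, values in sorted(buckets.items())}


records = [
    {"temp_c": -25, "retry_count": 4},
    {"temp_c": -21, "retry_count": 6},
    {"temp_c": 20, "retry_count": 1},
    {"temp_c": 25, "retry_count": 1},
]
trend = retries_by_temp_bucket(records)
print(trend)
```

A rising average toward one temperature corner, with unchanged installation and speed, points at drift rather than at a random interference event.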

Figure F9. Hardening is trackable only when each interference route is mapped to a coupling path, a victim node, and observable evidence fields.

ALT (for this figure): Interference path map for balise systems showing sources, coupling paths, victim nodes, mitigation actions, and observable evidence fields like RSSI, AGC, reset and commit status.

H2-10. Verification & Test Playbook (Bench + Field)

Verification becomes repeatable when each test stimulus is tied to expected evidence fields and pass/fail criteria. This playbook is organized as a closed loop: bench margin characterization, transient injection with evidence capture, pass-window stress (speed/time budget), and field regression driven by the evidence packet schema.

Coupling control · SNR sweep · Strong-field sweep · ESD/EFT injection · Window stress · Field regression

1) Bench margin characterization (coupling + SNR + saturation)

Coupling control: use a repeatable antenna-to-balise fixture to vary coupling loss in defined steps.
SNR sweep: reduce margin and observe convergence (retry and K-of-N consistency) rather than only CRC.
Strong-field sweep: increase field to locate limiter/AGC rail points and verify the system fails safely (classified stage + evidence retained).
Evidence focus: rssi_bin, agc_code, limiter_count, crc_fail_count, retry_count, consistency_score

2) Injection tests (ESD/EFT/transients) with commit-proof logging

Stimulus: apply ESD/EFT in a controlled set of points around antenna, shield termination, and I/O entry.
Expected evidence: limiter/AGC anomalies (route A) or reset/commit events (route B) must be captured, not inferred.
Pass condition: any fail must still produce minimal evidence: stage + reason + reset/VDDmin + commit_status.
Evidence focus: reset_reason, vdd_min_mv, commit_status, t_commit_done

3) Speed / pass-window stress (time budget + re-read policy)

Window scaling: shorten effective read window and confirm sync/verify still converges or fails with a clear stage code.
Policy validation: verify K-of-N convergence under reduced window without silently outputting unstable telegrams.
Evidence focus: t_frame_start, t_verify_ok, sample_window_us, retry_count, k_of_n_result

4) Field regression loop (reproduce → evidence → root cause → fix → rerun)

Reproduce: define repeatable conditions (temperature band, speed band, installation state, location segment).
Capture: collect evidence packets with schema_version and configuration IDs.
Classify: map to RF margin / sync / power / commit routes using the evidence fields.
Fix & rerun: apply mitigation or policy change and rerun the same test-to-evidence matrix.
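
The test-to-evidence matrix can be enforced automatically on captured packets. The sketch below takes the "Evidence focus" lists from this chapter as the required fields per test; the test names used as keys and the convention that a missing field is absent or None are assumptions.

```python
# Required evidence per test, taken from the "Evidence focus" lists above.
# The dictionary keys (test names) are illustrative.
REQUIRED_FIELDS = {
    "bench_margin": {"rssi_bin", "agc_code", "limiter_count",
                     "crc_fail_count", "retry_count", "consistency_score"},
    "injection": {"reset_reason", "vdd_min_mv", "commit_status",
                  "t_commit_done"},
    "window_stress": {"t_frame_start", "t_verify_ok", "sample_window_us",
                      "retry_count", "k_of_n_result"},
}


def missing_evidence(test_name, packet):
    """Return the sorted list of required fields absent (or None) in a packet."""
    present = {key for key, value in packet.items() if value is not None}
    return sorted(REQUIRED_FIELDS[test_name] - present)


packet = {"reset_reason": "BROWNOUT", "vdd_min_mv": 2710,
          "commit_status": "FAIL", "t_commit_done": None}
print(missing_evidence("injection", packet))
```

A test run counts as complete only when this list is empty for every captured packet, independent of whether the read itself succeeded.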

Figure F10. A verification plan is complete only when each test maps to required evidence groups and a clear acceptance intent (evidence present, stage coded, commit proven).

ALT (for this figure): Verification matrix for balise systems mapping bench and field tests to evidence groups (environment, RF, decode, time, commit, result) with required check marks and pass criteria.

H2-11. Design Pitfalls & Fix Patterns

This chapter compresses common field failures into actionable fix patterns. Each pattern follows the same workflow: Symptom → 2-field checks → First fix → Confirm. Concrete manufacturer part number (MPN) examples are provided as starting points for design reviews and lab trials.

2-field checks · Stage-coded failures · Commit-proof logging · RF margin vs saturation · Temp/vibration drift · K-of-N consistency

Pattern 1 — RSSI is high but CRC keeps failing

Symptom: strong field indicated, yet crc_fail_count stays high; output is unstable.
2-field checks: (RF) agc_code + limiter_count/sat_count; (DECODE) sync_reason_code + crc_fail_count
Interpretation: high RSSI can be distorted RSSI (clipping/AGC rail), not “good margin”.
First fix: shorten the input clamp loop and stabilize the AGC operating window (avoid rail + limiter chatter).
Confirm: run strong-field sweep (H2-10) and verify limiter_count decreases and CRC bursts disappear.

MPN starting points:
Input ESD/TVS (compact): PESD5V0S1UL (Nexperia), PESD5V0X1B (Nexperia), ESD9B5.0ST5G (onsemi)
RF limiter / front-end protection (broad use): HMC547ALC3 (Analog Devices), HMC987ALP5E (Analog Devices)
RF gain/AGC building blocks (broad use): ADL5501 (Analog Devices, RF detector), AD8361 (Analog Devices, detector), ADL5611 (Analog Devices, LNA)

Note: choose variants by frequency band and package constraints; these are evaluation starting points.
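The Pattern 1 two-field check can be expressed as a small triage function. This is a minimal sketch, not a reference implementation: the AGC code range (0..63), the `sat_limit` threshold, and the convention that `sync_reason_code == 0` means "locked OK" are all assumptions for illustration.

```python
def classify_high_rssi_failure(pkt, agc_min=0, agc_max=63, sat_limit=5):
    """Triage one evidence packet for the 'high RSSI, CRC fails' symptom.

    agc_min/agc_max/sat_limit are hypothetical values; use the real
    register range and a limit derived from bench data.
    """
    agc_railed = pkt["agc_code"] in (agc_min, agc_max)
    clipping = pkt["limiter_count"] + pkt["sat_count"] > sat_limit
    # RF evidence first: a railed AGC or limiter activity means RSSI is distorted.
    if agc_railed or clipping:
        return "front_end_saturation"
    # Assumption: a non-zero sync_reason_code means sync did not lock cleanly.
    if pkt["sync_reason_code"] != 0:
        return "sync_gating"
    if pkt["crc_fail_count"] > 0:
        return "bit_errors_after_lock"
    return "no_fault_seen"

pkt_sat = {"agc_code": 0, "limiter_count": 40, "sat_count": 12,
           "sync_reason_code": 3, "crc_fail_count": 9}
pkt_sync = {"agc_code": 20, "limiter_count": 0, "sat_count": 0,
            "sync_reason_code": 3, "crc_fail_count": 0}
pkt_crc = {"agc_code": 20, "limiter_count": 0, "sat_count": 0,
           "sync_reason_code": 0, "crc_fail_count": 4}
print(classify_high_rssi_failure(pkt_sat))
```

Checking the RF fields before the decode fields mirrors the "first fix" ordering: the front end is ruled out before any decode logic is changed.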

Pattern 2 — RSSI is low, reads sometimes succeed (high sensitivity to conditions)

Symptom: intermittent success; the retry_count distribution becomes heavy-tailed.
2-field checks: (RF) rssi_bin + under_amp_count; (DECODE/TIME) retry_count + sample_window_us
Interpretation: margin is insufficient; the system “wins by retries” rather than stable convergence.
First fix: restore margin at the antenna/matching/connector path before changing decode thresholds (avoid false locks).
Confirm: coupling/SNR sweep shows smoother convergence and reduced retries at the same window.

MPN starting points:
Matching network components (robust MLCC families): GRM series (Murata), C0G/NP0 MLCC where possible (various vendors)
Low-loss RF switches (if needed in antenna path): SKY13385-679LF (Skyworks), ADRF5020 (Analog Devices)
Connector/vibration reliability (example families): JST GH/PH series (board-to-wire), TE MicroMatch (board-to-wire)

Note: connector choice is mechanical-system-dependent; use as a shortlist for reviews, not a single “correct” answer.
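One way to make "heavy-tailed retry_count" measurable for Pattern 2 is to compare a high percentile against the median. The sketch below uses nearest-rank percentiles; the factor of 3 in the `heavy_tail` flag is an arbitrary illustrative threshold, not a value from this document.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile of a pre-sorted list; p is an integer 1..100."""
    rank = (p * len(sorted_vals) + 99) // 100   # integer ceiling of p*n/100
    return sorted_vals[max(rank, 1) - 1]

def retry_tail_summary(retry_counts, tail_factor=3):
    """Summarise a retry_count sample; tail_factor is a hypothetical threshold."""
    vals = sorted(retry_counts)
    p50 = percentile(vals, 50)
    p95 = percentile(vals, 95)
    return {
        "p50": p50,
        "p95": p95,
        "max": vals[-1],
        # Flag a tail when p95 is far above the typical retry count.
        "heavy_tail": p95 >= tail_factor * max(p50, 1),
    }

marginal = [0] * 14 + [1] * 3 + [6, 8, 9]   # mostly clean, a few long retry runs
stable = [0] * 15 + [1] * 5
print(retry_tail_summary(marginal))
print(retry_tail_summary(stable))
```

Running the same summary before and after a matching or connector fix, at the same sample_window_us, gives a like-for-like confirmation that the tail has shrunk.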

Pattern 3 — Fails only at low temperature

Symptom: field failures cluster at cold; warm conditions pass.
2-field checks: (ENV/TIME) temp_c + timebase_health_code; (DECODE) sync_reason_code + retry_count
Interpretation: cold drift can move matching, oscillator, or decision thresholds.
First fix: use temperature-bucketed thresholds/AGC targets and a more stable timebase (TCXO) if sync drift dominates.
Confirm: thermal soak regression shows stage-coded failures disappear or migrate to a stable, explainable route.

MPN starting points:
TCXO examples: SiT5358 (SiTime), TXETBLSANF-26.000000 (Taitien, example family), ASTX-H11 series (Abracon)
RTC examples (if RTC required): RV-3028-C7 (Micro Crystal), PCF2129 (NXP)
Temperature sensor (for evidence + compensation): TMP117 (Texas Instruments), MCP9808 (Microchip)
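Temperature-bucketed thresholds from Pattern 3 amount to a lookup keyed on temp_c. A minimal sketch follows; the bucket edges, bucket names, and parameter values are placeholders chosen for illustration and would have to come from thermal soak data.

```python
from bisect import bisect_right

# Hypothetical bucket edges in degrees C; each edge starts a new bucket.
BUCKET_EDGES_C = [-20, 0, 25, 60]

# One parameter set per bucket (len(edges) + 1 entries). Values are placeholders.
BUCKET_PARAMS = [
    {"bucket": "extreme_cold", "sync_threshold": 0.62, "agc_target": 34},
    {"bucket": "cold",         "sync_threshold": 0.58, "agc_target": 32},
    {"bucket": "cool",         "sync_threshold": 0.55, "agc_target": 30},
    {"bucket": "warm",         "sync_threshold": 0.55, "agc_target": 30},
    {"bucket": "hot",          "sync_threshold": 0.60, "agc_target": 28},
]

def params_for_temp(temp_c):
    """Pick the threshold/AGC parameter set for the measured temperature."""
    return BUCKET_PARAMS[bisect_right(BUCKET_EDGES_C, temp_c)]

for t in (-30, -20, 10, 25, 85):
    print(t, params_for_temp(t)["bucket"])
```

Logging the selected bucket name alongside temp_c in each evidence packet keeps the applied parameters traceable during regression.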

Pattern 4 — Resets mid-read (read half then reboot)

Symptom: incomplete telegram handling; missing commit proof; boot counters jump.
2-field checks: (POWER/COMMIT) reset_reason + vdd_min_mv; (COMMIT) commit_status + t_commit_done
Interpretation: a brownout or wake-up current spike resets the MCU mid-read, or the commit is scheduled too late in the read sequence; either way the evidence is lost.
First fix: enforce “minimal evidence first” commit ordering and add brownout-safe power gating / supervisor policy.
Confirm: transient injection + power dip tests still produce minimal evidence and t_commit_done when possible.

MPN starting points:
Supervisors / BOR helpers: TPS3839 (Texas Instruments), MAX809 (Analog Devices/Maxim), MCP1316 (Microchip)
Load switch / eFuse (local rail control): TPS22918 (Texas Instruments), TPS25940 (Texas Instruments), LTC4412 (Analog Devices, ideal diode controller)
Hold-up / bulk (application-dependent): polymer electrolytic families (Panasonic OS-CON), low-ESR electrolytics (various vendors)
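The "minimal evidence first" ordering in Pattern 4 can be illustrated with a simulation. In this sketch a Python dict stands in for non-volatile storage and an exception stands in for a brownout reset; `handle_read`, `BrownoutError`, and the status strings `MINIMAL`/`DONE` are names invented for the example.

```python
class BrownoutError(Exception):
    """Stands in for a reset that interrupts processing (simulation only)."""

def handle_read(store, event_id, raw_frame, parse, clock):
    # Phase 1: commit minimal evidence before any heavy work.
    record = {"event_id": event_id, "t_frame_start": clock(),
              "commit_status": "MINIMAL"}
    store[event_id] = dict(record)
    # Phase 2: heavy parsing; a brownout may strike here.
    result = parse(raw_frame)
    # Phase 3: complete the record and prove the commit with a timestamp.
    store[event_id] = {**record, "result": result,
                       "commit_status": "DONE", "t_commit_done": clock()}
    return store[event_id]

def failing_parse(_frame):
    raise BrownoutError("vdd dipped during parse")

ticks = iter(range(100, 200))          # fake monotonic clock
clock = lambda: next(ticks)
store = {}                             # stand-in for FRAM

handle_read(store, 1, b"\x01\x02", lambda f: f.hex(), clock)
try:
    handle_read(store, 2, b"\x03", failing_parse, clock)
except BrownoutError:
    pass                               # the "reboot"

print(store[1]["commit_status"], store[2]["commit_status"])
```

After the simulated reset, event 2 still has a record with `commit_status` set to `MINIMAL` and no `t_commit_done`, which is exactly the signature the 2-field check looks for.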

Pattern 5 — “CRC passes” yet the decoded output is wrong (rare but critical)

Symptom: incorrect telegram interpretation slips through single-pass checks; later correlation fails.
2-field checks: (DECODE) false_detect_count + sync_reason_code; (RESULT) telegram_hash + k_of_n_result/consistency_score
Interpretation: frame detection or sync false-lock can yield “valid-looking” frames; CRC alone is not sufficient.
First fix: promote acceptance from “CRC OK” to “K-of-N convergence + hash consistency” and tighten frame/sync gating.
Confirm: edge SNR + interference injection no longer produces wrong outputs; failures become stage-coded.

MPN starting points:
FRAM for robust event hashing/logging (fast commit): FM24CL64B (Cypress/Infineon), MB85RS64V (Fujitsu)
Serial flash (if policy allows): W25Q32JV (Winbond), MX25R6435F (Macronix, low-power family)
Hardware CRC acceleration MCUs (example families): STM32L4 series (ST), MSP430FR series (TI, FRAM-based)
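The Pattern 5 acceptance rule, "K-of-N convergence + hash consistency", can be sketched as follows. The input shape (a list of `(crc_ok, telegram_hash)` tuples per pass window) and the exact definition of `consistency_score` are assumptions for this example; the document does not define them.

```python
from collections import Counter

def accept_telegram(reads, k):
    """Decide acceptance for one pass window.

    reads: list of (crc_ok, telegram_hash) tuples, one per read attempt.
    Accept only if at least k CRC-OK reads exist and all of them agree.
    """
    hashes = [h for crc_ok, h in reads if crc_ok]
    if not hashes:
        return {"k_of_n_result": False, "telegram_hash": None,
                "consistency_score": 0.0}
    best, votes = Counter(hashes).most_common(1)[0]
    # Hash stability: any CRC-OK read with a different hash hints at a false lock.
    stable = votes == len(hashes)
    accepted = stable and votes >= k
    return {
        "k_of_n_result": accepted,
        "telegram_hash": best if accepted else None,
        # Assumed definition: share of all attempts that agree with the winner.
        "consistency_score": votes / len(reads),
    }

clean = [(True, "a1"), (True, "a1"), (False, "ff"), (True, "a1")]
conflict = [(True, "a1"), (True, "b2"), (True, "a1"), (True, "a1")]
print(accept_telegram(clean, k=3))
print(accept_telegram(conflict, k=3))
```

The second window is rejected even though three CRC-OK reads agree, because a fourth CRC-OK read disagrees; that is the case single-pass CRC checking cannot catch.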

Pattern 6 — After ESD, performance degrades over time (not immediate failure)

Symptom: baseline shifts; same installation now shows lower margin or higher retries days later.
2-field checks: (RF) rssi_bin distribution shift + under_amp_count; (DECODE) retry_count trend + crc_fail_count
Interpretation: latent ESD damage is plausible, as is connector/matching drift or a loosened shield termination.
First fix: re-baseline with the same fixture (SNR sweep) and inspect the antenna/matching/shield path before firmware changes.
Confirm: post-ESD baseline matches pre-ESD within the test matrix; trend stabilizes.

MPN starting points:
Low-capacitance ESD for RF nodes: PESD1CAN (Nexperia, example family), ESD5Z series (various vendors)
Shield termination accessories (system-level): 360° EMC cable glands (HUMMEL/Pflitsch families), braid clamps (various vendors)
Adhesive/strain relief (mechanical): epoxy/RTV families (application-dependent)
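The Pattern 6 re-baselining step needs a concrete comparison between pre-ESD and post-ESD rssi_bin samples. A minimal sketch using the median shift is shown below; the one-bin tolerance is an assumed value, and a real check would likely compare the full distribution.

```python
from statistics import median

def baseline_shift(pre_rssi_bins, post_rssi_bins, tolerance_bins=1):
    """Compare rssi_bin samples taken on the same fixture before and after ESD.

    tolerance_bins is a hypothetical acceptance limit.
    """
    shift = median(post_rssi_bins) - median(pre_rssi_bins)
    return {"median_shift": shift, "within_baseline": abs(shift) <= tolerance_bins}

pre = [12, 13, 13, 14, 13]
post_degraded = [10, 11, 10, 11, 12]
post_ok = [13, 12, 13, 14, 13]
print(baseline_shift(pre, post_degraded))
print(baseline_shift(pre, post_ok))
```

The comparison is only meaningful when both samples come from the same fixture and SNR sweep settings, as the first fix requires.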

Pattern 7 — Passes on bench, fails in trackside reality

Symptom: lab fixture is stable; field shows high variance and unexplained dropouts.
2-field checks: (RF) agc_code stability + limiter_count; (TIME/DECODE) read_group_id clustering + retry_count tail
Interpretation: uncontrolled common-mode return, installation posture, metal reflections, or cable routing dominates.
First fix: make the field setup measurable: record installation state in logs and enforce a controlled return path (shield→chassis).
Confirm: field regression can reproduce failures and map them to one coupling path (F9), then resolve after the fix.

MPN starting points:
Isolated transceivers for maintenance/debug links: ISO3082 (Texas Instruments, isolated RS-485), ADM2587E (Analog Devices, isolated RS-485)
Common-mode choke (as an interface helper where applicable): WE-CMB series (Würth Elektronik), TDK ACM series (TDK)
Ethernet isolation magnetics (if applicable in the module boundary): Pulse H5007NL (Pulse, example family)
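Once installation state is recorded in the logs (the Pattern 7 first fix), field failures can be grouped by it. The sketch below assumes each event carries hypothetical `height_id`, `posture_id`, and `read_ok` fields; these names are illustrative and not defined elsewhere in this document.

```python
from collections import defaultdict

def fail_rate_by_install_state(events):
    """Return {(height_id, posture_id): failure rate} over logged events."""
    totals = defaultdict(lambda: [0, 0])   # state -> [fails, total]
    for ev in events:
        key = (ev["height_id"], ev["posture_id"])
        totals[key][1] += 1
        if not ev["read_ok"]:
            totals[key][0] += 1
    return {key: fails / total for key, (fails, total) in totals.items()}

events = [
    {"height_id": "H1", "posture_id": "P0", "read_ok": True},
    {"height_id": "H1", "posture_id": "P0", "read_ok": True},
    {"height_id": "H2", "posture_id": "P1", "read_ok": False},
    {"height_id": "H2", "posture_id": "P1", "read_ok": True},
    {"height_id": "H2", "posture_id": "P1", "read_ok": False},
    {"height_id": "H2", "posture_id": "P1", "read_ok": False},
]
rates = fail_rate_by_install_state(events)
for state, rate in sorted(rates.items()):
    print(state, rate)
```

A failure rate that concentrates in one installation state points at a coupling path to investigate before any firmware change.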

Pattern 8 — Logs exist but cannot be correlated to the same pass window (evidence is unusable)

Symptom: events cannot be aligned; upper-layer comparisons are ambiguous.
2-field checks: (TIME) timebase_mode + align_quality; (ID) event_id + read_group_id
Interpretation: missing group IDs or timebase health makes timestamps non-evidentiary.
First fix: enforce group IDs and alignment quality codes; record timebase health fields on every event.
Confirm: multi-read events within one pass window cluster under the same read_group_id with OK alignment.

MPN starting points:
RTC options (check timestamp/tamper features per variant): PCF85063A (NXP), MCP79410 (Microchip, power-fail timestamp)
Small backup supply element (application-dependent): ML1220 rechargeable coin cell (Panasonic), supercap families (various vendors)
Low-power MCU families with good retention: STM32L0/L4 (ST), EFR32BG (Silicon Labs), MSP430FR (TI)
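The Pattern 8 confirm step can be automated as a consistency check over the events of one pass window. This sketch assumes that `align_quality` is logged as a string with `"OK"` as the healthy value; that encoding is an assumption for illustration.

```python
def check_pass_window(events):
    """Return a list of correlation problems for events from one pass window."""
    problems = []
    seen_ids = set()
    groups = set()
    for ev in events:
        eid = ev["event_id"]
        if eid in seen_ids:
            problems.append(f"duplicate event_id {eid}")
        seen_ids.add(eid)
        group = ev.get("read_group_id")
        if group is None:
            problems.append(f"event {eid} has no read_group_id")
        else:
            groups.add(group)
        if ev.get("align_quality") != "OK":
            problems.append(f"event {eid} align_quality={ev.get('align_quality')}")
    if len(groups) > 1:
        problems.append(f"multiple read_group_id values: {sorted(groups)}")
    return problems

good = [
    {"event_id": 1, "read_group_id": "g7", "align_quality": "OK"},
    {"event_id": 2, "read_group_id": "g7", "align_quality": "OK"},
]
bad = [
    {"event_id": 1, "read_group_id": "g7", "align_quality": "OK"},
    {"event_id": 1, "read_group_id": None, "align_quality": "DEGRADED"},
]
print(check_pass_window(good))
print(check_pass_window(bad))
```

An empty result means the window's events are usable as evidence; any entry in the list names the field that has to be fixed at the logging source.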

Figure F11. A fast field-debug flow: start from the symptom, inspect two evidence fields, then apply one “first fix” before expanding scope.

ALT (for this figure): Root-cause decision tree for balise failures mapping symptoms to two evidence-field checks and first fix actions such as clamp loop tuning, brownout-safe commits, and temperature-bucket thresholds.

Implementation note: MPN examples are provided as evaluation starting points. Final selection must be validated against the actual RF band, interface impedance, transient levels, and mechanical constraints of the balise module and antenna path.

H2-12. FAQs (Evidence-Driven Troubleshooting)

Each FAQ is designed as a fast field SOP: Verdict → Evidence ×2 → First fix, and explicitly maps back to H2-3…H2-11.

RSSI is high but decoding always fails — front-end saturation or sync thresholds?

Maps to: H2-4 / H2-5 / H2-8

Verdict: Most cases are front-end clipping/AGC rail, not “good margin.”
Evidence: Check agc_code with limiter_count/sat_count (H2-4) and confirm whether failures cluster at sync_reason_code vs crc_fail_count (H2-5/H2-8).
First fix: Shorten the clamp loop / improve return path, then retune AGC target window before touching decode logic (H2-11).

Unreadable only at high speed — window too short or re-read policy wrong?

Maps to: H2-3 / H2-5 / H2-10

Verdict: High-speed failures usually come from time-budget collapse before convergence, not raw RF loss alone.
Evidence: Compare sample_window_us vs retry_count distribution (H2-3), and locate the dominant fail stage via sync_reason_code / CRC bursts (H2-5).
First fix: Enforce a speed-aware K-of-N policy (bounded retries + early abort stage codes) and validate with window-stress tests (H2-10).
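The time-budget argument can be made concrete with a short calculation. In this sketch the contact length, telegram duration, and required K are hypothetical inputs, the `WINDOW_TOO_SHORT` stage code is invented for the example, and the model (window = contact length / speed) ignores wake-up and lock time.

```python
def read_policy(speed_mps, contact_length_m, telegram_time_us, k_required=2):
    """Derive a bounded K-of-N read policy from the pass window.

    All inputs are illustrative; real values come from the coupling
    geometry and the telegram format in use.
    """
    sample_window_us = contact_length_m * 1e6 / speed_mps
    n_max = int(sample_window_us // telegram_time_us)   # whole reads that fit
    feasible = n_max >= k_required
    return {
        "sample_window_us": sample_window_us,
        "n": n_max,
        "k": k_required,
        "max_retries": n_max - k_required if feasible else 0,
        "feasible": feasible,
        # Early-abort stage code when K reads cannot fit in the window.
        "stage_code": None if feasible else "WINDOW_TOO_SHORT",
    }

slow = read_policy(speed_mps=50, contact_length_m=0.5, telegram_time_us=2000, k_required=3)
fast = read_policy(speed_mps=100, contact_length_m=0.5, telegram_time_us=2000, k_required=3)
print(slow)
print(fast)
```

With these assumed numbers, doubling the speed halves the window from 10,000 µs to 5,000 µs, so only two reads fit and a 3-of-N policy is reported as infeasible instead of retrying until the window closes.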

Occasional wrong read but CRC shows no error — telegram version compatibility?

Maps to: H2-5 / H2-8

Verdict: Treat “CRC OK” as insufficient; false-lock/parse mismatch can still yield wrong semantics.
Evidence: Require telegram_hash consistency across reads and track k_of_n_result/consistency_score (H2-8). Also inspect false_detect_count and stage codes around sync/parse (H2-5).
First fix: Upgrade acceptance to “K-of-N + hash stable,” and log telegram_version_id/parser_path_id (H2-11).

Failure spikes in cold/heat — matching drift or clock drift causing sampling misalignment?

Maps to: H2-3 / H2-5 / H2-9

Verdict: Separate “RF margin drift” from “timebase/threshold drift” using trends, not guesses.
Evidence: Correlate temp_c with rssi_bin shift (margin drift, H2-3) and with sync_reason_code/sync_lock_time (timing drift, H2-5). Track interference sensitivity changes (H2-9).
First fix: Apply temperature-bucketed thresholds/AGC targets and add timebase health reporting before redesigning hardware.

Works in one direction but fails on the return pass — posture/reflection or install height?

Maps to: H2-3 / H2-9

Verdict: Direction-dependent reads point to coupling geometry and metal environment sensitivity.
Evidence: Compare rssi_bin and agc_code stability between directions (H2-3). If interference route changes, limiter_count spikes and stage codes shift (H2-9).
First fix: Log installation state (height/posture ID) and correlate with the coupling/SNR budget before firmware changes (H2-3/H2-10).

After ESD, reads fail — front-end damage or MCU reset loop?

Maps to: H2-9 / H2-6 / H2-8

Verdict: Distinguish “RF path injured” vs “power/reset collapse” using one RF field and one power field.
Evidence: Check reset_reason with vdd_min_mv (H2-6/H2-8) and compare limiter_count/sat_count behavior pre/post ESD (H2-9).
First fix: Ensure minimal evidence commits survive resets, then improve clamp/return path if RF saturation signatures dominate (H2-11).

Unreadable and no logs either — did power loss prevent commit?

Maps to: H2-6 / H2-8

Verdict: “No log” is a power/commit problem until proven otherwise.
Evidence: Look for missing t_commit_done and non-OK commit_status, then correlate with reset_reason and vdd_min_mv (H2-6/H2-8).
First fix: Move “minimal evidence first” ahead of heavy parsing and use a commit-proof scheme (FRAM or two-phase flags) (H2-11).

Needs many re-reads to stabilize — sync/threshold issue or insufficient SNR margin?

Maps to: H2-3 / H2-5

Verdict: Persistent retries mean the system is not converging; treat it as margin or gating instability.
Evidence: Use retry_count trend plus sync_reason_code to separate “can’t lock” from “locks then CRC bursts” (H2-5). Check whether rssi_bin is near the sensitivity knee (H2-3).
First fix: Restore margin first (coupling/matching), then tune gating thresholds to avoid false acceptance.

AGC stays pinned — weak coupling or wrong gain configuration?

Maps to: H2-4 / H2-8

Verdict: AGC rail can indicate either “input too small” or “gain path misconfigured”; differentiate by RSSI bins.
Evidence: Compare agc_code with rssi_bin and under_amp_count (H2-8). If RSSI is not low, the gain table/register path is suspect (H2-4).
First fix: Audit gain/AGC register configuration and tap-point observability before mechanical changes (H2-4).

Timestamps don’t align with speed/odometer — wrong marking point or clock drift?

Maps to: H2-7 / H2-10

Verdict: Misalignment is usually a definition/marking problem before it is a hardware clock problem.
Evidence: Compare align_quality with timebase_health_code (H2-7). Validate whether timestamps are taken at frame start/end/verify OK consistently using test logs (H2-10).
First fix: Standardize marking points (t_frame_start/t_verify_ok) and add alignment quality codes for every event (H2-7/H2-8).

“Read OK but position is still wrong” — missing correlation fields between telegram and upper layers?

Maps to: H2-7 / H2-8

Verdict: Position disputes are often correlation failures, not decode failures.
Evidence: Verify read_group_id clustering and event_id uniqueness per pass window (H2-7/H2-8). Check whether align_quality is OK when the upstream claims mismatch.
First fix: Add/validate correlation IDs and a minimal telegram summary (hash + version) so upstream mapping can be proven (H2-8).

Failure rate changes after switching balise batches — protocol nuance or RF tolerance distribution?

Maps to: H2-5 / H2-10 / H2-11

Verdict: Batch shifts must be separated into “parser/protocol differences” vs “RF margin distribution.”
Evidence: Compare telegram_version_id/parser_path_id outcomes (H2-5) and baseline rssi_bin + retry/CRC statistics under identical fixture conditions (H2-10).
First fix: Run A/B baseline with the test-to-evidence matrix and apply the decision-tree route before hardware changes (H2-11).
