Axle Counter / Track Circuit Receiver: AFE, BIST, EMC & Timing
Core idea: An axle counter / track circuit receiver is a vital “clear/occupied” decision front-end that must stay reliable under harsh railway EMC and changing rail conditions, with false-clear treated as the highest-risk outcome. The winning architecture is the one that pairs interference-hardened sensing and timing with auditable evidence logs, so every decision can be verified, debugged, and defended in the field.
H2-1. System Role & Safety Context
Axle counters and track circuit receivers sit at a safety boundary where weak sensing signals must survive harsh railway interference, and the output directly influences occupancy state decisions. This topic is not about “detecting a wheel” in isolation; it is about producing an auditable occupancy evidence output (Clear / Occupied) that can be trusted by vital logic.
What “Clear / Occupied” means in a vital chain
Occupancy status is commonly consumed by interlocking or equivalent vital decision modules through trackside I/O. The receiver subsystem must therefore provide more than a binary state: it must also provide health flags and event context so downstream logic can detect “bad sensing” versus “true occupancy.”
- Primary output: Occupied / Clear state (or axle event sequence for count-based occupancy).
- Health output: front-end saturation, coil/line integrity, self-test status, clock validity, supply status.
- Evidence output: timestamped transitions, threshold crossings, self-test results, and reset/brownout markers.
Axle counter vs. track circuit receiver: difference that matters in practice
A useful comparison is not a feature list; it is a failure-mode-driven comparison. Both aim to support occupancy decisions, but they rely on different physical mechanisms and therefore fail in different ways.
- Axle counter (event-based evidence): detects wheel-induced magnetic coupling changes at a sensor head, producing a timed event sequence. It is highly sensitive to sensor geometry, mounting, and local magnetic interference.
- Track circuit receiver (state-based evidence): receives an injected signal over the rails and infers occupancy from impedance/shorting changes. It is highly sensitive to rail electrical conditions (leakage, rust, water film, insulated joints).
- Evidence expectation: axle counters must prove event correctness (no missing/double counts); track circuits must prove threshold/decoding correctness (no false clears under degraded rail conditions).
Where it is installed and why it is installed there
The most common placement is inside a trackside cabinet or wayside I/O enclosure, close to the sensing boundary. The reason is simple: railway environments create large common-mode stress (ground potential differences, surges, stray currents) and wideband noise. Keeping the fragile analog boundary short and well-controlled improves determinism and diagnosability.
- Sensing boundary: sensor head / rail interface (weak signals + strong interference).
- Receiver boundary: preamp/detector and thresholding (where false decisions are born).
- System boundary: isolated communications to vital logic (where evidence must remain trustworthy).
Why False Clear is more dangerous than False Occupied
Fault consequences are asymmetric. A false occupied tends to reduce capacity (conservative behavior), while a false clear can create an unsafe permissive condition. This is why robust designs prioritize: (1) controlled behavior under front-end saturation, (2) strong self-test coverage for “silent failures,” and (3) evidence logs to support audit and maintenance triage.
- False Occupied typical roots: noise bursts crossing thresholds, adjacent track coupling, leakage-induced distortion.
- False Clear typical roots: front-end clamped/saturated, sampling/clock corruption, sensor/line open-circuit appearing “quiet.”
- Shared roots: surges, common-mode injection, water film/rust changes, shield/ground termination errors.
H2-2. Signal Physics: What Is Actually Measured
Robust receiver design starts with a precise definition of what the front-end is allowed to treat as “signal” and what it must treat as “environment.” Railway sensing is dominated by non-ideal conductors, distributed networks, and large common-mode stress. The core objective is to map physical reality to observable electrical features that remain stable enough to support thresholds, self-test, and audit.
Axle counter: magnetic coupling change becomes an observable feature
Axle counters typically excite a sensor head and observe how a wheelset perturbs the magnetic field. The measurable result is not a binary “wheel” but a change in coupling that appears as feature movement in the receiver domain (amplitude, phase, or derived envelope metrics). The design implication is that the detection feature should be chosen for stability: it must survive interference, mounting variance, and temperature drift.
- Excitation: controlled drive that sets the measurement band (often kHz-range to avoid low-frequency rail currents).
- Coupling mechanism: wheel flange / wheelset modifies the magnetic path and effective coupling.
- Observable variables: amplitude shift, phase shift, harmonic distortion, and time-windowed feature stability.
- Risk coupling: nearby magnetic fields, mechanical misalignment, and saturation can create “signal-like” artifacts.
Track circuit receiver: rails behave like a distributed network, not a wire
In track circuits, an injected signal propagates through rails that behave like a distributed impedance network. Occupancy changes effective impedance because the wheelset introduces a low-impedance shunt; however, real rail conditions can also shunt or distort the signal. The receiver therefore relies on feature quality (SNR, correlation, stable amplitude/phase) rather than raw amplitude alone.
- Injected signal: audio-frequency or coded waveforms designed to be distinguishable from interference.
- Non-ideal shunt: wheel-rail contact resistance varies with contamination and moisture.
- Insulated joints: change boundaries and leakage paths; faults here can mimic occupancy changes.
- Observable variables: amplitude + phase + distortion + code correlation / decode confidence.
Environment variables that directly move electrical features
A receiver becomes field-ready when every “common incident condition” is translated into: physical change → electrical feature shift → risk. This mapping is what later supports threshold strategy and self-test design.
- Water film / humidity: increased leakage paths → lower received amplitude / altered phase → false occupied or unstable state.
- Rust / contamination: higher contact resistance → reduced shunt effect or additional distortion → missed occupancy evidence (a false-clear risk).
- Return / stray currents: common-mode injection and magnetic interference → front-end saturation or false features.
- Temperature: coil resistance and analog bias drift → baseline movement → threshold drift if not controlled/compensated.
Why high-CMR and anti-alias constraints are non-negotiable
Two constraints dominate wayside sensing. First, large ground potential differences and common-mode disturbances can exceed the signal magnitude by orders of magnitude. Without high common-mode rejection and controlled front-end headroom, the receiver can saturate and create false differential content. Second, traction and switching interference inject wideband energy; if not filtered before sampling, it can alias into the detection band and produce “valid-looking” feature changes.
- Common-mode stress: can drive clipping/rectification and create pseudo-signal artifacts.
- Anti-alias reality: wideband noise can fold into the decision band and defeat threshold logic.
- Field observables: clipped waveforms, long recovery tails, noise-floor lift, and unexpected spectral peaks near the detect band.
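The folding effect described above can be checked numerically before any hardware exists. The following is a minimal sketch that assumes ideal uniform sampling; the function names and the example frequencies (a 20 kHz switching tone, a 16 kHz sample rate, a 3.5 to 4.5 kHz decision band) are hypothetical and not taken from any specific receiver.

```python
def alias_frequency(f_in_hz: float, f_sample_hz: float) -> float:
    """Apparent frequency (0 to fs/2) of a tone after ideal uniform sampling."""
    if f_sample_hz <= 0:
        raise ValueError("sample rate must be positive")
    # Fold the tone into one sampling period, then mirror it about Nyquist.
    folded = f_in_hz % f_sample_hz
    return min(folded, f_sample_hz - folded)


def folds_into_band(f_in_hz: float, f_sample_hz: float,
                    band_lo_hz: float, band_hi_hz: float) -> bool:
    """True if an interferer aliases into the decision band."""
    return band_lo_hz <= alias_frequency(f_in_hz, f_sample_hz) <= band_hi_hz


# Hypothetical numbers: a 20 kHz switching tone sampled at 16 kHz
# appears at 4 kHz, inside an assumed 3.5 to 4.5 kHz decision band.
print(alias_frequency(20_000.0, 16_000.0))                    # 4000.0
print(folds_into_band(20_000.0, 16_000.0, 3_500.0, 4_500.0))  # True
```

A sweep of known interferer frequencies through `folds_into_band` gives a quick list of tones that the analog filter must attenuate before the sampler.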
H2-3. Analog Front-End Architecture
The analog front-end (AFE) is the highest-risk boundary in the axle counter / track circuit receiver chain. Most “mysterious” field mis-detections originate here: common-mode stress pushes the input into non-linearity, wideband interference folds into the decision band, or long-cable effects distort features until thresholds become meaningless. A railway-grade AFE is therefore designed around three guardrails: linearity under common-mode, hard decision-band boundaries, and measurable health evidence.
Differential preamp: high CMR must include headroom and recovery
High common-mode rejection is necessary but not sufficient. Railway environments introduce ground potential differences, stray return currents, and surge-induced common-mode events that can drive inputs beyond the linear region. Once the input stage clips or rectifies, it can generate pseudo-differential content that looks like a valid detection feature. A railway-grade differential preamp must therefore be evaluated by: (1) common-mode headroom, (2) saturation recovery time, and (3) symmetry under stress.
- Headroom: maintain linear operation under expected common-mode swings and burst conditions.
- Recovery: clipped stages must return quickly; long recovery tails create “phantom windows.”
- Symmetry: mismatch turns common-mode into differential error; symmetry is an EMI countermeasure.
Band-pass filtering: define the decision band before the detector
Filtering is not only about noise reduction; it is about enforcing a hard boundary between “allowed signal” and “rejected energy.” Traction-related interference and mains harmonics can be orders of magnitude larger than the sensing feature. The band-pass stage must be chosen to preserve the target physics features (amplitude/phase/correlation) while rejecting the 50/60 Hz mains fundamental, its harmonics, and out-of-band switching energy that would otherwise contaminate detection statistics.
- Passband sizing: too wide invites aliasing and feature pollution; too narrow increases missed detections in worst cases.
- Group delay awareness: filtering can shift phase; phase-based detectors must account for deterministic delay.
- Field evidence: noise-floor lift or unexpected peaks near the decision band indicates insufficient boundary control.
Programmable gain: preserve linearity across rail conditions
Programmable gain (or controlled AGC) is a reliability tool, not a sensitivity knob. Rail conditions change the received feature amplitude (humidity, rust, leakage, rail impedance) and long-cable capacitance changes the effective signal shape. Gain staging is used to keep the feature inside a linear measurement window without saturating during interference bursts. Gain decisions are also part of the evidence chain: gain state should be exposed as a health and audit field to explain “why the threshold changed.”
- Gain scheduling: avoid gain switching inside a detection window to prevent artificial transitions.
- State logging: record gain range changes and clip counters as part of event context.
- Degraded-mode behavior: when saturation is detected, force conservative state output until recovery is proven.
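The gain-scheduling and state-logging rules above can be sketched as a small scheduler that holds a requested range change until no detection window is active, then records the change. This is an illustrative sketch only: the class name `GainScheduler`, the gain steps, and the log tuple layout are assumptions, not a description of a particular product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GainScheduler:
    """Defers PGA range changes until no detection window is active."""
    gains: Tuple[int, ...] = (1, 2, 4, 8)   # hypothetical gain steps
    index: int = 0
    pending: Optional[int] = None
    log: List[Tuple[float, int, int]] = field(default_factory=list)

    def request(self, new_index: int) -> None:
        """Queue a gain change; it is not applied immediately."""
        if not 0 <= new_index < len(self.gains):
            raise ValueError("gain index out of range")
        self.pending = new_index

    def tick(self, window_active: bool, t: float) -> int:
        """Apply a pending change only between windows; return current gain."""
        if self.pending is not None and not window_active:
            if self.pending != self.index:
                # Audit record: (time, old gain, new gain).
                self.log.append(
                    (t, self.gains[self.index], self.gains[self.pending])
                )
            self.index = self.pending
            self.pending = None
        return self.gains[self.index]
```

Because the change is applied only in `tick`, a request that arrives mid-window cannot create an artificial amplitude step inside that window, and every applied change leaves an audit entry.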
ΣΔ ADC vs. SAR ADC: choose based on feature stability and alias risk
ADC selection should be driven by the observable feature and the interference model. Sigma-delta (ΣΔ) converters provide oversampling and digital decimation that naturally suppress out-of-band noise, which can reduce (but not remove) the analog anti-alias burden for narrow-band or slowly varying features; energy near multiples of the modulator rate still needs analog filtering. SAR converters provide low latency and deterministic sampling that can benefit windowed transient detection, but they demand stronger analog anti-alias filtering because wideband interference can fold directly into the decision band.
- ΣΔ fit: stable narrow-band features, strong out-of-band rejection through decimation, and applications that can tolerate the decimation filter’s group delay.
- SAR fit: low-latency windows, tight timing control, but requires strict analog anti-alias and robust front-end headroom.
- Evidence expectation: whichever ADC is used, expose “effective bandwidth” and “clip/overrange indicators” for audit.
Anti-alias design: the most common “looks fine” failure
In wayside sensing, aliasing is a silent failure mode. Traction and switching noise inject wideband energy; if filtering and sampling are not co-designed, that energy folds into the decision band and becomes indistinguishable from real features. Anti-alias design must therefore be treated as a system constraint: analog filtering, sampling rate, and any digital decimation must align so that out-of-band energy cannot produce in-band threshold crossings.
- Co-design rule: analog anti-alias + sampling + digital filtering must share a single decision-band definition.
- Verification hint: check for clipped recovery tails and in-band noise-floor lift during worst-case traction events.
- Health metric: maintain counters for overrange, clip duration, and filter bypass states (if any).
H2-4. Detection & Threshold Strategy
Detection is the boundary where physics becomes a decision. A reliable railway receiver does not rely on a single instantaneous value. It uses windowed features, hysteresis, and multi-parameter consistency checks so that bursts, drift, and rail-condition changes cannot easily create false clear states. The output should be a state plus a confidence/health context that explains why the state is trusted.
Static vs. adaptive thresholds: adaptation must be bounded and auditable
Static thresholds can be robust in stable environments but become fragile when rail impedance and leakage change with weather, rust, or contamination. Adaptive thresholds are not “auto-follow” logic; they must be bounded by rate limits and locked during detection windows so interference cannot drag the baseline and convert noise into apparent “normal.” A practical rule is to make adaptation explicit: track a baseline and track a confidence, and never allow baseline motion to masquerade as evidence.
- Static threshold fit: predictable rail condition, stable SNR, strong margins.
- Adaptive threshold fit: changing rail conditions; requires rate limits, window freeze, and profile logging.
- Audit requirement: record threshold/baseline evolution so state changes remain explainable.
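One way to make "bounded and auditable" adaptation concrete is a baseline tracker with a per-update rate limit, absolute bounds, and a freeze input that is asserted during detection windows. The sketch below is illustrative; the class name `BoundedBaseline` and all numeric values are assumptions.

```python
class BoundedBaseline:
    """Baseline tracker with a rate limit, hard bounds, and window freeze."""

    def __init__(self, initial: float, max_step: float, lo: float, hi: float):
        self.value = initial
        self.max_step = max_step      # largest allowed move per update
        self.lo, self.hi = lo, hi     # absolute limits on the baseline
        self.history = []             # audit trail: (time, old, new)

    def update(self, sample: float, frozen: bool, t: float) -> float:
        """Move toward `sample` unless frozen; return the current baseline."""
        if frozen:
            # Detection window active: interference must not drag the baseline.
            return self.value
        step = max(-self.max_step, min(self.max_step, sample - self.value))
        new_value = max(self.lo, min(self.hi, self.value + step))
        if new_value != self.value:
            self.history.append((t, self.value, new_value))
        self.value = new_value
        return self.value
```

With `max_step=0.5` and bounds of 90 to 110, a sustained disturbance at 150 moves the baseline by at most 0.5 per update and never past 110, and each move is recorded so a later threshold change remains explainable.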
Windowed detection: window length is the “false-trigger bandwidth”
Windowing is the primary defense against short bursts. A too-short window allows spikes to trip thresholds; a too-long window can delay detection or miss high-speed cases when the feature is present only briefly. Windowing parameters are therefore engineering control knobs, not afterthoughts: window length, overlap, entry/exit conditions, and hold times determine both safety conservatism and operational stability.
- Key parameters: window length, overlap ratio, entry count, exit count, minimum hold time.
- Railway reality: windows must tolerate burst interference without suppressing true short events.
- Evidence logging: store window statistics (mean/RMS/variance) around transitions for postmortems.
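The window parameters and statistics listed above can be sketched in a few lines. The helper names `sliding_windows` and `window_stats` are hypothetical, and the statistics use the population variance for simplicity.

```python
import math


def sliding_windows(samples, length: int, overlap: int):
    """Yield fixed-length windows that overlap by `overlap` samples."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    step = length - overlap
    for start in range(0, len(samples) - length + 1, step):
        yield samples[start:start + length]


def window_stats(window) -> dict:
    """Mean, RMS, and population variance of one window."""
    n = len(window)
    if n == 0:
        raise ValueError("empty window")
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    variance = sum((x - mean) ** 2 for x in window) / n
    return {"mean": mean, "rms": rms, "variance": variance}


# Ten samples, window length 4, overlap 2 -> windows start at 0, 2, 4, 6.
for w in sliding_windows(list(range(10)), length=4, overlap=2):
    print(w, window_stats(w)["mean"])
```

Storing the `window_stats` output for the few windows on either side of a transition gives the postmortem record the bullet list calls for.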
Hysteresis: prevent state chatter under drift and noise-floor changes
Hysteresis is required because rail conditions shift baselines and noise floors. Separate enter/exit thresholds reduce chatter and protect against partial recoveries after saturation. The safest practice is to make clearing more strict than occupying: entering occupied may be allowed with strong evidence; exiting occupied should require higher confidence and stronger consistency, especially when health flags indicate degraded sensing.
- Two-threshold logic: enter threshold vs exit threshold with deterministic hold time.
- Fail-safe bias: degraded health should tighten “clear” conditions, not loosen them.
- Drift protection: hysteresis width should scale with observed noise-floor, not a fixed guess.
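The asymmetric enter/exit rule can be illustrated with a small state machine. The sketch assumes a track-circuit-style metric in which a low received level indicates occupancy; the class name `OccupancyState`, the thresholds, and the window counts are hypothetical. Clearing requires more consecutive windows than occupying, and degraded health blocks clearing entirely.

```python
class OccupancyState:
    """Two-threshold occupancy decision with stricter clearing than occupying."""

    def __init__(self, occupy_below: float, clear_above: float,
                 occupy_count: int = 1, clear_count: int = 3):
        if clear_above <= occupy_below:
            raise ValueError("clear threshold must exceed occupy threshold")
        self.occupy_below = occupy_below
        self.clear_above = clear_above
        self.occupy_count = occupy_count
        self.clear_count = clear_count
        self.occupied = True          # start in the conservative state
        self._occ_run = 0
        self._clr_run = 0

    def step(self, level: float, health_ok: bool = True) -> bool:
        """Process one window-level metric; return True if occupied."""
        if level < self.occupy_below:
            self._occ_run += 1
            self._clr_run = 0
        elif level > self.clear_above and health_ok:
            self._clr_run += 1
            self._occ_run = 0
        else:
            # Dead band or degraded health: no evidence accumulates.
            self._occ_run = 0
            self._clr_run = 0
        if not self.occupied and self._occ_run >= self.occupy_count:
            self.occupied = True
        elif self.occupied and self._clr_run >= self.clear_count:
            self.occupied = False
        return self.occupied
```

A level inside the dead band between the two thresholds changes nothing, which is what suppresses chatter when the baseline drifts near a single threshold.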
Envelope vs. phase detection: choose the feature that survives the interference model
Envelope features are straightforward but can be sensitive to amplitude noise and common-mode artifacts. Phase-based features can be more robust to amplitude fluctuations but become sensitive to timing integrity and deterministic filtering delay. The correct choice depends on which feature remains stable under traction interference, cable capacitance distortion, and temperature drift. A robust design often extracts both and uses them in a consistency gate rather than trusting a single feature.
- Envelope strength: simplicity and directness; vulnerable to amplitude bursts and gain drift if not controlled.
- Phase strength: robustness to amplitude variation; requires clock integrity and controlled group delay.
- Practical rule: extract multiple features and require agreement before clearing states.
Multi-parameter “voting”: enforce evidence consistency, not just threshold crossing
Multi-parameter voting is best treated as a consistency gate: a detection is considered trustworthy only when independent features agree within defined bounds. For example, an amplitude change without the expected phase/correlation behavior should not be promoted to a clear/occupied transition. This is especially important for preventing false clears after front-end saturation or clock disturbance, where one feature may appear “normal” while others indicate a degraded measurement.
- Typical feature set: amplitude metric + phase metric + band-energy / correlation metric.
- Consistency rule: all required features must pass within the same window; otherwise output a degraded confidence state.
- Audit rule: log which feature failed the gate and by how much (bounded numeric fields).
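A consistency gate of this kind can be sketched as a function that checks each required feature against its bounds and reports which ones failed and by how much. The feature names and bounds below are illustrative assumptions.

```python
def consistency_gate(features: dict, bounds: dict):
    """Return (passed, failures).

    `bounds` maps feature name -> (low, high). `failures` maps each failing
    feature to its signed distance outside the bound, or None if missing.
    """
    failures = {}
    for name, (low, high) in bounds.items():
        value = features.get(name)
        if value is None:
            failures[name] = None          # a missing feature is a failure
        elif value < low:
            failures[name] = value - low   # negative: below the lower bound
        elif value > high:
            failures[name] = value - high  # positive: above the upper bound
    return (not failures), failures


# Hypothetical window: amplitude and correlation look normal, phase does not.
bounds = {"amplitude": (0.8, 1.2), "phase_deg": (-10.0, 10.0),
          "correlation": (0.9, 1.0)}
features = {"amplitude": 1.0, "phase_deg": 25.0, "correlation": 0.95}
print(consistency_gate(features, bounds))   # (False, {'phase_deg': 15.0})
```

The `failures` dictionary maps directly onto the audit rule: it names the feature that failed the gate and carries a bounded numeric margin for the log.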
Threshold drift, temperature compensation, and rail-condition profiles
Threshold drift is unavoidable: analog baselines move with temperature, component aging, and rail condition changes. A railway-grade receiver treats drift as a monitored variable and uses explicit compensation: baseline tracking, temperature-aware correction, and rail-condition profiles (e.g., dry / wet / heavy contamination) that bind coherent parameter sets (thresholds, hysteresis width, window length, adaptation limits). Profile switching must be logged so later investigations can reproduce the exact decision context.
- Drift sources: temperature, aging, leakage changes, cable impedance changes.
- Monitor fields: baseline trend, noise-floor, gain state, threshold offset, clip counter.
- Compensation fields: temperature coefficient table, bounded adaptation rate, profile ID in logs.
H2-5. Built-In Self-Test (BIST) & Diagnostic Coverage
In a vital sensing chain, self-test is not a “nice-to-have.” It is the mechanism that prevents silent failures from producing unsafe permissive states. The objective is to detect faults that can make the receiver appear stable while its measurement is no longer trustworthy (e.g., open coils, clamped inputs, stuck sampling, reference drift). A railway-grade BIST program therefore focuses on: coverage of dangerous failures, bounded diagnostic latency, and fail-safe response with auditable logs.
What diagnostic coverage means without citing standards
Diagnostic coverage should be treated as a practical question: for each dangerous failure mode, is there a mechanism that detects it quickly enough and forces a provably conservative behavior? Coverage is not measured by the number of tests, but by whether the test set can detect faults that otherwise lead to false clears, and whether the system records evidence that makes the detection explainable during audits and maintenance.
- Danger focus: prioritize faults that can yield false clear or hide real occupancy evidence.
- Latency focus: ensure the maximum time to detect and react (diagnostic latency) is bounded.
- Response focus: move to degraded/fail-safe output rules and record the reason.
Injection test: verify end-to-end measurement integrity
Injection tests validate the measurement chain beyond “continuity.” A controlled calibration stimulus is injected so the receiver can confirm that the AFE, sampling, feature extraction, and decision statistics behave as expected. The injection should be designed so it cannot be confused with real occupancy features and should avoid disturbing live decision windows.
- Injection points: front-end input node, reference node, or sensor-drive loop depending on architecture.
- Pass criteria: amplitude/phase/correlation within bounds over a defined window, with stable variance.
- Failure action: assert a diagnostic flag and tighten or block clear decisions until integrity is restored.
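The pass criteria above can be sketched by measuring a known injected tone with a single-frequency correlation and comparing the result to expected bounds. This is a minimal sketch, assuming the test window contains an integer number of tone cycles and that the phase tolerance is small enough that wrap-around can be ignored; the function names and the example tone are hypothetical.

```python
import math


def measure_tone(samples, f_tone_hz: float, f_sample_hz: float):
    """Return (amplitude, phase_rad) of a known tone via I/Q correlation."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty window")
    w = 2.0 * math.pi * f_tone_hz / f_sample_hz
    i_avg = 2.0 * sum(x * math.cos(w * k) for k, x in enumerate(samples)) / n
    q_avg = 2.0 * sum(x * math.sin(w * k) for k, x in enumerate(samples)) / n
    # For x[k] = A*cos(w*k + phi): i_avg = A*cos(phi), q_avg = -A*sin(phi).
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)


def injection_passes(amplitude: float, phase_rad: float,
                     expected_amp: float, amp_tol: float,
                     expected_phase: float, phase_tol: float) -> bool:
    """True if the measured reference is within both tolerances."""
    return (abs(amplitude - expected_amp) <= amp_tol
            and abs(phase_rad - expected_phase) <= phase_tol)


# Hypothetical stimulus: 1 kHz tone, 8 kHz sampling, 80 samples (10 cycles).
fs, f0, amp, phi = 8000.0, 1000.0, 0.5, 0.3
x = [amp * math.cos(2.0 * math.pi * f0 * k / fs + phi) for k in range(80)]
a, p = measure_tone(x, f0, fs)
print(injection_passes(a, p, 0.5, 0.01, 0.3, 0.01))   # True
```

Running the injected tone through the same measurement path as the live signal is what lets the test say something about the chain the decision actually depends on.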
Coil continuity monitoring: detect the most common silent fault
Coil and line integrity faults are frequent in harsh trackside environments: connectors loosen, water ingress changes impedance, and mechanical stress breaks conductors. Continuity monitoring should detect open/short/intermittent behavior using impedance-aware checks rather than a single DC threshold, because temperature and cable length change the baseline.
- Methods: DC resistance trend, AC impedance probe, or drive-current feedback with temperature compensation.
- Intermittent detection: track variance and dropouts over time, not just a single snapshot.
- Fail-safe bias: loss of continuity should prevent “clear” from being declared as a normal state.
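A temperature-aware continuity check with intermittent detection might look like the sketch below. It assumes a copper coil (temperature coefficient of roughly 0.393 % per °C near 20 °C) and classifies a short history of resistance readings instead of a single snapshot; the function names, tolerance, and readings are illustrative.

```python
COPPER_TEMPCO_PER_C = 0.00393  # approximate value for copper near 20 °C


def expected_resistance(r20_ohm: float, temp_c: float,
                        alpha: float = COPPER_TEMPCO_PER_C) -> float:
    """Nominal coil resistance corrected from 20 °C to `temp_c`."""
    return r20_ohm * (1.0 + alpha * (temp_c - 20.0))


def classify_continuity(samples_ohm, nominal_ohm: float, tol_frac: float) -> str:
    """Classify a reading history as OK, OPEN, SHORT, or INTERMITTENT."""
    if not samples_ohm:
        raise ValueError("no readings")
    low = nominal_ohm * (1.0 - tol_frac)
    high = nominal_ohm * (1.0 + tol_frac)
    too_high = [r > high for r in samples_ohm]
    too_low = [r < low for r in samples_ohm]
    if all(too_high):
        return "OPEN"
    if all(too_low):
        return "SHORT"
    if any(too_high) or any(too_low):
        return "INTERMITTENT"   # some readings in range, some not
    return "OK"


def clear_permitted(status: str) -> bool:
    """Fail-safe bias: only a healthy coil may support a clear decision."""
    return status == "OK"
```

For example, readings of `[10.0, 1e6, 10.0]` against a 10 Ω nominal classify as `INTERMITTENT`, which a single-snapshot check taken at the first or last reading would have missed.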
ADC range and stuck-data checks: prevent “stable but wrong” sensing
ADC faults and overrange behavior often manifest as deceptively stable values. A receiver must detect saturation, clamping, reference drift, and stuck codes. Practical implementations combine overrange/clip counters with distribution checks on window statistics (min/max/variance) to detect frozen or implausibly quiet signals.
- Range checks: overrange count, clip duration, near-rail dwell time.
- Stuck-data checks: near-zero variance and repeated patterns across windows.
- Logging: store the first timestamp of the anomaly and the last known good state.
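The range and stuck-data checks can be combined into one per-window health summary. The sketch below assumes unsigned ADC codes (for example 0 to 4095 for a 12-bit converter); the function name, margins, and variance floor are hypothetical.

```python
def adc_window_health(codes, code_min: int, code_max: int,
                      rail_margin: int, min_variance: float) -> dict:
    """Summarize near-rail dwell and implausibly quiet data for one window."""
    n = len(codes)
    if n == 0:
        raise ValueError("empty window")
    # Samples sitting within `rail_margin` codes of either rail.
    near_rail = sum(
        1 for c in codes
        if c <= code_min + rail_margin or c >= code_max - rail_margin
    )
    mean = sum(codes) / n
    variance = sum((c - mean) ** 2 for c in codes) / n
    return {
        "near_rail_count": near_rail,
        "variance": variance,
        "stuck": variance < min_variance,   # frozen or implausibly quiet
    }


# A frozen mid-scale code looks "stable" but fails the variance floor.
print(adc_window_health([2048] * 64, 0, 4095, 8, 0.5))
```

A real front-end always shows some noise, so a variance below the expected noise floor is treated as a fault indicator, not as a sign of a clean signal.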
Reference injection loop: catch drift in gain, phase, and filter behavior
Reference injection loops validate that “known input produces known output.” They are used to detect gain drift, phase shift, or filter deviation caused by temperature, aging, or component damage. The reference feature is evaluated with the same feature extraction path used for real sensing, ensuring the diagnostic covers what the decision actually relies on.
- What it detects: baseline drift, gain/phase mismatch, filter shape deviation.
- Pass criteria: bounded error in amplitude/phase and bounded variance over a window.
- Action: raise “recalibration needed” flags and switch to conservative decision settings.
Sampling-chain watchdog: detect clock/logic stalls and partial updates
A sampling chain can fail without obvious symptoms: DMA stalls, buffers stop updating, or the feature engine stops advancing while the system still “runs.” A watchdog should therefore monitor liveness at multiple levels: sample counters, window statistics heartbeats, and decision-engine progress. If liveness fails, the system should reinitialize the sampling chain and record a root-cause marker.
- Liveness signals: sample counter increments, DMA heartbeat, window-stat update counter.
- Timeout policy: bounded detection latency for stalls and partial updates.
- Action: controlled restart + degraded state output + log the watchdog event.
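Multi-level liveness monitoring can be sketched as a monitor that records when each counter last advanced and reports any that have not moved within a timeout. The class name, signal names, and timeout are illustrative assumptions; timestamps are passed in explicitly so the logic stays deterministic.

```python
class LivenessMonitor:
    """Flags counters that have not advanced within `timeout_s`."""

    def __init__(self, names, timeout_s: float):
        self.timeout_s = timeout_s
        # name -> (last counter value, time of last change)
        self._last = {name: (None, None) for name in names}

    def observe(self, name: str, counter: int, now: float) -> None:
        """Record a counter reading; only a changed value counts as progress."""
        prev_counter, _ = self._last[name]
        if counter != prev_counter:
            self._last[name] = (counter, now)

    def stalled(self, now: float):
        """Return the sorted names that never updated or have timed out."""
        return sorted(
            name for name, (_, t) in self._last.items()
            if t is None or now - t > self.timeout_s
        )


mon = LivenessMonitor(["sample", "dma", "window_stat"], timeout_s=0.1)
for name in ("sample", "dma", "window_stat"):
    mon.observe(name, 1, now=0.0)
mon.observe("sample", 2, now=0.05)
mon.observe("dma", 2, now=0.05)
mon.observe("window_stat", 1, now=0.05)   # feature engine did not advance
print(mon.stalled(now=0.12))              # ['window_stat']
```

The example shows the partial-update case: samples and DMA are still moving, yet the window statistics have stopped, which a single top-level watchdog would not catch.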
Boot, periodic, and online self-tests: schedule to maximize coverage without false events
Self-tests should be staged by their impact and diagnostic latency requirements. Boot tests catch configuration and structural faults before any state is trusted. Periodic tests track drift and aging. Online tests monitor for sudden failures without injecting disturbances into live decision windows. All test outcomes should be exported as health flags and recorded with timestamps so the decision history is reproducible.
- Boot self-test: continuity, ADC range, reference check, sampling heartbeat before enabling trusted state output.
- Periodic self-test: reference injection and baseline/noise-floor tracking to catch drift and degradation.
- Online self-test: clip counters, stuck-data checks, heartbeats, and short non-disruptive probes where applicable.
H2-6. Clocking & Interference-Hardened Timing
Timing integrity is a hidden cause of receiver mis-detections. Clock instability does not merely shift frequency; it changes how features are sampled, how phase is measured, and how windows align. A receiver can therefore appear electrically “healthy” yet still misclassify states when jitter, EMI-induced corruption, or drift degrades measurement consistency. Railway timing design treats the clock as part of the evidence chain: sampling integrity and timestamp integrity must both be monitored and gated.
Two timing roles: sampling clock vs. timestamp clock
A robust design separates the roles even if they share a source. The sampling clock governs feature extraction quality (amplitude/phase/correlation). The timestamp clock governs event traceability and cross-system alignment. Each role requires its own health indicators so that a degraded clock can tighten “clear” decisions and mark logs as less trustworthy without hiding the problem.
- Sampling domain: impacts window statistics, phase accuracy, and correlation confidence.
- Timestamp domain: impacts event ordering, transition audits, and alignment with interlocking records.
- Gating rule: degraded clock health should bias the output toward conservative (non-clear) behavior.
Interference-hardened oscillators: selection is about resilience, not only accuracy
Trackside environments expose oscillators and PLLs to conducted and radiated interference. Resilient clocking focuses on start-up behavior, lock robustness, temperature drift stability, and the ability to detect and report loss-of-lock or abnormal frequency excursions. The goal is to ensure that any timing degradation becomes observable and cannot silently poison the decision pipeline.
- Resilience metrics: lock robustness, shock/EMI tolerance, predictable warm-up drift.
- Observability: loss-of-lock markers and frequency health flags must be exported to diagnostics.
- Fail-safe tie-in: degraded timing should tighten thresholds for clearing states.
Jitter tolerance: how timing noise becomes “feature noise”
Jitter directly perturbs sampling instants. For envelope detectors, jitter can appear as added noise and inflate window variance. For phase detectors, jitter becomes phase error and reduces the stability of phase-based features. Both effects push window statistics closer to thresholds and can increase false transitions unless jitter tolerance is engineered and monitored.
- Envelope impact: variance inflation and noisy window statistics.
- Phase impact: phase error and reduced correlation stability.
- Evidence: sudden rise in feature variance or correlation drop without a matching physical cause.
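The link between timing noise and feature noise can be quantified with two standard relations for a sine input sampled with RMS aperture jitter σ: the jitter-limited SNR is −20·log10(2π·f·σ), and the equivalent RMS phase error is 360·f·σ degrees. The sketch below evaluates both; the 10 kHz signal and 1 ns jitter are hypothetical example values.

```python
import math


def jitter_limited_snr_db(f_signal_hz: float, jitter_rms_s: float) -> float:
    """Upper bound on SNR for a sine input sampled with RMS jitter."""
    return -20.0 * math.log10(2.0 * math.pi * f_signal_hz * jitter_rms_s)


def jitter_phase_error_deg(f_signal_hz: float, jitter_rms_s: float) -> float:
    """RMS phase error (degrees) caused by RMS timing jitter."""
    return 360.0 * f_signal_hz * jitter_rms_s


# Hypothetical case: 10 kHz feature, 1 ns RMS jitter.
print(round(jitter_limited_snr_db(10e3, 1e-9), 2))    # about 84.04 dB
print(jitter_phase_error_deg(10e3, 1e-9))             # about 0.0036 degrees
```

Both quantities scale with signal frequency, so the same oscillator that is comfortable at audio frequencies can consume meaningful margin for higher-frequency excitation or when EMI raises the effective jitter.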
EMI-induced clock corruption: short-lived faults can create long-lived misclassifications
EMI can cause transient clock corruption: PLL unlock/relock events, edge perturbations, or counter jumps. Even short events can create a chain reaction: a corrupted window can trigger a false transition, and the state machine can then “legitimize” that transition via hysteresis and hold timers. Therefore, clock fault detection must be fast and explicitly linked to decision gating and logging.
- Fault types: loss-of-lock, frequency excursions, counter discontinuities, abnormal period jitter.
- Mitigation: clock-health gating + discard windows during recovery + log a timing fault marker.
- Outcome: prevent transient timing corruption from causing false clears.
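The mitigation above (gate on clock health, discard windows during recovery, log a marker) can be sketched as a small gate placed in front of the decision logic. The class name `ClockGate` and the recovery length are illustrative assumptions.

```python
class ClockGate:
    """Rejects windows during a clock fault and for N windows afterwards."""

    def __init__(self, recovery_windows: int):
        self.recovery_windows = recovery_windows
        self._remaining = 0
        self.fault_log = []     # times at which a clock fault was seen

    def accept_window(self, clock_ok: bool, t: float) -> bool:
        """Return True only if this window may feed the decision logic."""
        if not clock_ok:
            self._remaining = self.recovery_windows
            self.fault_log.append(t)   # timing fault marker for the audit log
            return False
        if self._remaining > 0:
            self._remaining -= 1       # still inside the recovery period
            return False
        return True


gate = ClockGate(recovery_windows=2)
health = [True, False, True, True, True]
print([gate.accept_window(ok, t) for t, ok in enumerate(health)])
# [True, False, False, False, True]
```

Discarded windows contribute no evidence, so a hysteresis or hold-timer stage downstream cannot "legitimize" a transition that originated from corrupted samples.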
Spread-spectrum clock tradeoff: EMI benefits vs. feature consistency
Spread-spectrum clocking can reduce EMI peak energy but introduces controlled frequency modulation that can complicate narrow-band feature extraction, especially phase and correlation measurements. When spread-spectrum is used, feature extraction must be designed to remain consistent under the modulation, and the modulation profile should be part of the configuration evidence so investigations can reproduce behavior.
- Pros: reduced EMI peaks and improved compliance margins in some installations.
- Cons: added frequency motion can disturb phase/correlation stability if not modeled.
- Evidence: record whether spread-spectrum is enabled and its profile ID.
Timestamp requirements and alignment with vital systems
Timestamps are required for audit, incident reconstruction, and alignment with vital decision records. Synchronization is not treated here as a protocol tutorial; it is treated as an evidence requirement. If alignment is lost or clock health is degraded, the system should record the condition and apply conservative state rules until timing integrity is restored.
- Required fields: transition timestamp, clock health flag, and any loss-of-lock / recovery markers.
- Alignment behavior: mark logs when timing is not trustworthy; tighten clear decisions under degraded timing.
- Root-cause hint: unexplained false transitions often correlate with clock health events and recovery windows.
H2-7. Isolation & Communications
Wayside installations combine long outdoor cables, variable grounding conditions, and strong traction-related interference. In this setting, external communication links are not only functional interfaces; they are energy coupling paths. A robust design starts by placing the galvanic isolation boundary correctly, then controlling common-mode surge return paths, and finally defining fail-safe behavior when links degrade. The goal is to ensure that comm disturbances cannot silently corrupt receiver evidence or trigger unsafe permissive states.
External interface set: treat every cable as a potential surge path
Rail-side cabinets typically expose multiple physical links (data, maintenance, and auxiliary connections). The key engineering question is not which protocol is used, but which conductors can carry common-mode current into the cabinet during traction events or lightning-induced surges. Long cables and remote earth references often create large common-mode swings that a receiver must withstand without front-end saturation or false events.
- Data links: RS-485 / CAN / Ethernet (physical layer behavior under common-mode stress matters).
- Reference paths: shield, chassis/PE, and any bonding conductors.
- Service links: maintenance ports should be treated as additional coupling points.
Galvanic isolation boundary placement: isolate at the cabinet entry, not deep inside
The isolation boundary should be placed so that surge energy is contained on the “dirty side” before it reaches sensitive measurement and decision domains. If isolation is located too far downstream, the cable entry region still drives common-mode current across the sensitive reference, and the system behaves as if it were not isolated during fast transients. Practical placement principles are: isolate close to connectors, keep the dirty-side return loop short, and provide a clean-side reference that is not forced to follow cable-induced common-mode motion.
- Dirty side: cable entry, surge clamp, chassis return paths, common-mode suppression.
- Clean side: AFE, sampling, detection logic, timestamps and event logs.
- Proof point: during a surge, the clean-side baseline should not exhibit large recovery tails or clip events.
Isolated RS-485 / CAN / Ethernet: reduce fault propagation, not only noise
Isolation on RS-485, CAN, and Ethernet interfaces prevents ground potential differences and surge events from propagating into the cabinet’s internal ground reference. The objective is to keep the receiver measurement chain stable under common-mode stress and to keep link disturbances from creating misleading evidence (e.g., corrupted logs or timing discontinuities). Isolated interfaces should therefore be paired with explicit health monitoring so that link faults become observable and auditable.
- Isolation role: block DC and low-frequency ground shifts; limit common-mode energy transfer.
- Health role: expose link state (UP/DOWN/FLAP) and error counters as diagnostic evidence.
- Fail-safe tie-in: degraded comm health should tighten state-clearing conditions.
Common-mode surge and lightning coupling: control the return path or lose the isolation benefit
Lightning-induced surges and traction-related common-mode events couple through multiple routes: direct conduction via shields and bonding, capacitive displacement current across isolation barriers, and inductive pickup along long parallel runs. Effective protection is therefore not a single component, but a controlled system: clamp and divert energy at the entry, minimize loop area, and ensure that shield and chassis returns provide a preferred low-impedance path that does not cross the sensitive reference.
- Coupling routes: shield conduction, chassis/PE injection, displacement current across isolation.
- Protection intent: keep the highest dV/dt and surge currents on the dirty side.
- Evidence: correlate comm errors or resets with surge markers and clip counters.
Cable shield termination strategy: decide where common-mode current is allowed to flow
Shield termination is a current-routing decision. The shield should carry high-frequency common-mode currents to the chassis/earth without forcing them through signal reference nodes. Poor termination can make the shield act as an injection antenna that drives common-mode motion across sensitive circuits. A practical strategy is to prioritize a low-impedance high-frequency path to chassis while avoiding large low-frequency ground loops that pull the internal reference away from its stable domain.
Watchdog reset and degraded behavior on comm failure: make failure visible and conservative
Communication failures should not be handled with a generic “reboot and hope” policy. The receiver must classify link faults (silent outage, burst errors, flapping) and react with a conservative decision policy until stable recovery is proven. Watchdog resets can restore liveness, but the critical feature is auditable degradation: health flags, timestamps, outage duration, and error counters must be recorded so maintenance can distinguish external disturbance from internal faults.
- Fault classes: DOWN (no data), ERROR (burst faults), FLAP (intermittent).
- First reaction: assert comm health degraded and tighten clearing behavior.
- Evidence fields: link state, error-rate counters, last-good timestamp, outage duration.
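The fault classes and the "tighten clearing" reaction above can be sketched as a small classifier. This is a minimal illustration, not a vital implementation: the `LinkWindow` counters, the timeout, the error-ratio limit, the flap limit, and the number of stable windows required before clearing are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(Enum):
    UP = "UP"
    DOWN = "DOWN"    # no data within the silence timeout
    ERROR = "ERROR"  # burst of bad frames
    FLAP = "FLAP"    # repeated up/down transitions


@dataclass
class LinkWindow:
    """Counters gathered over one observation window (hypothetical names)."""
    seconds_since_last_good: float
    frames_ok: int
    frames_bad: int
    up_down_transitions: int


def classify_link(w: LinkWindow,
                  silence_timeout_s: float = 2.0,
                  error_ratio_limit: float = 0.05,
                  flap_limit: int = 3) -> LinkState:
    """Classify one window; the limits are placeholders, not recommendations."""
    if w.seconds_since_last_good > silence_timeout_s:
        return LinkState.DOWN
    if w.up_down_transitions >= flap_limit:
        return LinkState.FLAP
    total = w.frames_ok + w.frames_bad
    if total > 0 and w.frames_bad / total > error_ratio_limit:
        return LinkState.ERROR
    return LinkState.UP


def clearing_allowed(state: LinkState, stable_windows: int,
                     required_stable: int = 5) -> bool:
    """Permit a transition to Clear only after several consecutive UP windows."""
    return state is LinkState.UP and stable_windows >= required_stable
```

In this sketch a link that has just recovered is still treated as degraded for clearing purposes until it has stayed UP for `required_stable` windows, which mirrors the "stable recovery is proven" requirement.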
H2-8. EMC & Interference Sources in Railway
Railway receivers operate in a multi-source interference field. The risk is not only higher noise amplitude, but a combination of diverse spectra, multiple coupling paths, and non-linear front-end behavior under stress. The most damaging failures occur when interference drives the input stage into saturation or clamping, producing recovery tails and pseudo-features that survive filtering and appear as legitimate detection evidence. A practical EMC view therefore starts with an interference map: sources → coupling paths (DM vs CM) → receiver symptoms → evidence fields.
Major interference sources: what “shape” they typically produce
- Traction inverter noise: switching harmonics and spectral clusters that move with operating conditions.
- Regenerative braking noise: time-varying return-current patterns and baseline shifts during braking phases.
- Stray return current: low-frequency common-mode bias plus pulsating components that distort baselines.
- Lightning-induced surge: high dV/dt transients; short events that can create long recovery tails.
- Adjacent track interference: windowed, event-correlated disturbances that can resemble real detections.
DM vs CM coupling paths: the single most useful EMC classification
Differential-mode (DM) interference directly adds to the signal pair and corrupts amplitude/phase features. Common-mode (CM) interference moves the reference of both conductors together and often becomes dangerous when it drives the front-end into non-linearity or forces displacement currents across isolation boundaries. In practice, many “DM-looking” failures originate from CM injection that is converted into differential error by mismatch or saturation.
- DM path: conductor-to-conductor coupling, loop pickup, and direct injection into the measurement pair.
- CM path: shield/chassis/earth injection, ground shifts, and fast dV/dt displacement across isolation.
- Classification rule: if symptoms track chassis/shield events, treat it as CM until proven otherwise.
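The DM/CM split, and the way mismatch turns common-mode motion into a differential error, can be made concrete with two small helpers. This is a first-order sketch; the path gains `gain_pos` and `gain_neg` are hypothetical stand-ins for any asymmetry in dividers, filters, or protection networks.

```python
from typing import Tuple


def split_cm_dm(v_pos: float, v_neg: float) -> Tuple[float, float]:
    """Return (common-mode, differential-mode) voltage of a conductor pair."""
    v_cm = (v_pos + v_neg) / 2.0
    v_dm = v_pos - v_neg
    return v_cm, v_dm


def dm_error_from_mismatch(v_cm: float, gain_pos: float, gain_neg: float) -> float:
    """Differential error created when a pure CM voltage passes through two
    paths whose gains differ slightly."""
    return v_cm * (gain_pos - gain_neg)
```

With these definitions, a 10 V common-mode swing through paths that differ by 1 % in gain appears as roughly 0.1 V of differential error, which is why a "DM-looking" symptom can have a CM origin.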
Shielding vs grounding: define the goal as current routing, not “more metal”
Shielding is primarily about controlling where common-mode current flows. Grounding is about controlling reference stability and preventing large loop areas from turning into antennas. A poor combination can create a path that pushes common-mode current through sensitive references, triggering saturation and recovery artifacts. A robust strategy provides a preferred high-frequency return path to chassis while avoiding low-frequency loops that drag internal references across large potentials.
Front-end saturation behavior: why transients become false “events”
When interference is strong enough to saturate or clamp the front-end, the receiver can generate false evidence even after the transient ends. Saturation creates clipped waveforms; recovery creates tails that contaminate multiple windows; mismatch can convert common-mode motion into a differential artifact that resembles the real sensing feature. This is why EMC must be linked to AFE headroom, recovery time, and decision gating.
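One way to link saturation to decision gating is to mask clipped samples and a fixed number of samples after them, so the recovery tail never reaches the detector. The sketch below assumes signed integer ADC codes; `clip_margin` and `inhibit_samples` are hypothetical tuning parameters that would have to come from the measured recovery time of the actual front-end.

```python
from typing import List


def count_clipped(samples: List[int], full_scale: int, clip_margin: int) -> int:
    """Count samples within clip_margin codes of full scale."""
    return sum(1 for x in samples if abs(x) >= full_scale - clip_margin)


def valid_sample_mask(samples: List[int], full_scale: int,
                      clip_margin: int, inhibit_samples: int) -> List[bool]:
    """Mark samples that may be used for detection.

    A clipped sample and the next inhibit_samples samples are masked out
    so that the recovery tail cannot feed the decision stage.
    """
    mask = []
    hold = 0
    for x in samples:
        if abs(x) >= full_scale - clip_margin:
            hold = inhibit_samples   # restart the inhibit period
            mask.append(False)
        elif hold > 0:
            hold -= 1                # still inside the recovery tail
            mask.append(False)
        else:
            mask.append(True)
    return mask
```

The clip count doubles as the evidence field, while the mask is what keeps a tail from being scored as a detection feature.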
Mitigation philosophy: control paths first, then linearity, then decisions
- Path control: shield termination, chassis return, cable entry surge diversion, minimize loop areas.
- Linearity control: front-end headroom, symmetric protection, fast recovery, clip detection.
- Decision control: windowing, hysteresis, multi-parameter consistency gate, conservative clearing under degraded health.
- Evidence control: clip counters, noise-floor trends, comm-error counters, and timing fault markers.
Field evidence checklist: what to correlate during investigations
- Clip/overrange counters versus acceleration/braking periods and known traction operating phases.
- Decision-band noise-floor lift during regeneration or adjacent-track activity.
- Comm error bursts aligned to storm events or known surge markers.
- Phase/correlation instability aligned to jitter/clock-health events and strong CM disturbances.
H2-9. Event Recording & Evidence Chain
A railway receiver is judged not only by detection performance, but by its ability to prove why a decision was valid. In safety-critical contexts, an event record must answer: what changed, when it changed, which evidence triggered the transition, and whether the measurement chain was healthy at that moment. This turns the receiver into an auditable system: a decision is backed by structured snapshots (measurement + thresholds + health) wrapped in a storage and integrity model that survives outages and supports maintenance extraction.
Timestamped detection events: record context, not just a state change
A “clear/occupied” transition is only useful if it can be reconstructed. Each detection event should include a compact context window: a short pre-roll and post-roll summary of key features (amplitude/phase/correlation/noise floor) and the exact decision gate that fired (threshold crossing, consistency gate, hysteresis/hold timer). This prevents disputes from turning into guesswork and makes false-event patterns visible.
- Event types: state transition, axle pulse, diagnostic fault, self-test status change.
- Context window: pre/post window feature summaries around the transition.
- Decision reason: which gate fired (crossing, hysteresis state, hold timer state).
Threshold crossing logs: capture the effective threshold and why it was effective
Thresholds are rarely single constants in the field. They are often the output of a profile, compensation state, and gating logic. To make decisions auditable, the system should record both the threshold identity (profile ID) and the effective value that was actually applied at the time of the event, along with the hysteresis and hold conditions that shaped the transition.
- Record: profile ID, effective threshold value, hysteresis state, hold timer state.
- Compensation linkage: baseline/temperature compensation state that influenced the effective threshold.
- Audit outcome: the “why” of the crossing becomes reproducible, not anecdotal.
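A threshold record of this kind could look like the sketch below. The additive composition of the effective threshold (profile base plus baseline offset plus a linear temperature term) is an assumption made for illustration; the source does not specify how compensation is applied, and the state strings are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ThresholdSnapshot:
    """What was actually applied at the moment of the event."""
    profile_id: str
    effective_threshold_value: float
    compensation_state: str
    hysteresis_state: str
    hold_timer_state: str


def effective_threshold(profile_base: float, baseline_offset: float,
                        temp_coeff_per_c: float, temp_delta_c: float) -> float:
    """Assumed additive model: base + baseline offset + temperature term."""
    return profile_base + baseline_offset + temp_coeff_per_c * temp_delta_c


def snapshot_for_event(profile_id: str, profile_base: float,
                       baseline_offset: float, temp_coeff_per_c: float,
                       temp_delta_c: float, hysteresis_state: str,
                       hold_timer_state: str) -> dict:
    """Build the log entry that travels with the transition."""
    value = effective_threshold(profile_base, baseline_offset,
                                temp_coeff_per_c, temp_delta_c)
    snap = ThresholdSnapshot(profile_id, value, "baseline+temperature",
                             hysteresis_state, hold_timer_state)
    return asdict(snap)
```

Logging the computed value next to the profile ID means a later reviewer does not have to re-run the compensation logic to learn which boundary was in force.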
Self-test result logs: bind BIST health to every decision snapshot
Self-test is only valuable if its status is inseparable from decisions. Each detection event should reference the most recent BIST summary and whether the measurement chain is in a healthy, degraded, or blocked state. When diagnostics are degraded, event records should automatically carry a lower-confidence marker and the system should bias toward conservative behavior until health is restored.
- Bind: BIST status and diagnostic markers into each event snapshot.
- Degraded behavior: tighten “clear” rules while health is uncertain.
- Traceability: investigations can confirm the chain was healthy at the time of decision.
Power rail monitoring logs: turn brownouts and resets into evidence
Many field disputes are actually power integrity problems that look like sensing failures. Power rail logs should record minimum rail levels, sag duration markers, brownout flags, and reset reasons (when applicable). This enables correlation between detection anomalies and rail behavior, and prevents false attribution of power-induced glitches to threshold design or sensor physics.
- Record: rail minimum, sag duration, brownout marker, reset reason marker.
- Correlation: align power markers with detection transitions and diagnostics.
- Outcome: power issues become measurable evidence, not speculation.
Black-box storage: preserve decision evidence through outages
A black-box approach is defined by survivability and consistency. Event records should remain intact across power loss or watchdog resets, avoid partially written records, and maintain a monotonic sequence that supports reconstruction. Storage health itself should be observable (e.g., integrity flags and lifetime counters) so the evidence system cannot silently degrade.
- Survivability: keep critical event summaries through resets and outages.
- Consistency: avoid half-written records and preserve sequence continuity.
- Observability: integrity flags and storage-health markers.
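The consistency requirement (no half-written records, monotonic sequence) can be illustrated with a simple framed record format. The layout below (sequence number, length, payload, CRC-32) is a hypothetical format chosen for the example, not one prescribed by the source.

```python
import struct
import zlib

_HEADER = struct.Struct("<IH")  # log_sequence (uint32), payload length (uint16)
_CRC = struct.Struct("<I")


def encode_record(log_sequence: int, payload: bytes) -> bytes:
    """Frame one record as header + payload + CRC-32 over both."""
    body = _HEADER.pack(log_sequence, len(payload)) + payload
    return body + _CRC.pack(zlib.crc32(body))


def decode_records(blob: bytes):
    """Return (records, intact).

    Decoding stops at the first torn, corrupt, or out-of-sequence record;
    intact is True only if the whole blob decoded cleanly.
    """
    records = []
    offset = 0
    while offset + _HEADER.size + _CRC.size <= len(blob):
        seq, length = _HEADER.unpack_from(blob, offset)
        end = offset + _HEADER.size + length
        if end + _CRC.size > len(blob):
            return records, False          # record was cut off mid-write
        (crc,) = _CRC.unpack_from(blob, end)
        if zlib.crc32(blob[offset:end]) != crc:
            return records, False          # content does not match its CRC
        if records and seq != records[-1][0] + 1:
            return records, False          # sequence continuity broken
        records.append((seq, blob[offset + _HEADER.size:end]))
        offset = end + _CRC.size
    return records, offset == len(blob)
```

After a power loss, a reader using this scheme recovers every complete record and can set an integrity flag for the torn remainder instead of silently accepting it.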
Signed event logs: integrity and authenticity as “evidence prerequisites”
In railway operations, evidence is only usable if its integrity can be verified. A practical approach is to package event sequences with integrity markers and verification hooks so maintenance extraction can confirm that logs were not altered and that time ordering remains consistent. The implementation details vary, but the requirements are stable: verifiable integrity, identifiable extraction sessions, and clear handling of time discontinuities or degraded clock quality.
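Since the implementation details vary, the following is only one possible sketch of "integrity markers and verification hooks": an HMAC-SHA256 chain in which each tag covers the record and the previous tag, so edits, deletions, and reordering all break verification. The record layout and key handling are assumptions; a keyed MAC proves integrity to holders of the shared key, and a deployment that needs third-party verifiable authenticity would use asymmetric signatures instead.

```python
import hashlib
import hmac
import json

_GENESIS = b"\x00" * 32  # fixed starting value for the chain


def _tag(key: bytes, prev_tag: bytes, record: dict) -> bytes:
    """MAC over the previous tag plus a canonical encoding of the record."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, prev_tag + body, hashlib.sha256).digest()


def seal(records, key: bytes):
    """Attach a chained tag to each record, in order."""
    sealed = []
    prev = _GENESIS
    for rec in records:
        prev = _tag(key, prev, rec)
        sealed.append({"record": rec, "tag": prev.hex()})
    return sealed


def verify(sealed, key: bytes) -> bool:
    """Recompute the chain; any altered, missing, or reordered entry fails."""
    prev = _GENESIS
    for entry in sealed:
        expected = _tag(key, prev, entry["record"])
        if not hmac.compare_digest(expected.hex(), entry["tag"]):
            return False
        prev = expected
    return True
```

A maintenance tool would run `verify` on the exported sequence before any analysis and record the result together with the extraction session identifier.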
Maintenance data extraction: export, verify, and reproduce the chain
Extraction is part of the evidence system. A maintenance workflow should support selecting a time range, exporting event sequences and health summaries, and verifying integrity markers before analysis. Export records should include extraction session identifiers and the health state of timing and power to ensure results are interpreted correctly.
Evidence field checklist: minimum set for auditable decisions
| Evidence Group | Minimum Fields | Why it matters |
|---|---|---|
| Decision | state, transition_reason, window_id, hysteresis_state, hold_timer_state | Reconstructs why a state changed and which gate fired. |
| Threshold | profile_id, effective_threshold_value, compensation_state | Shows the applied decision boundary, not just the configured intent. |
| Measurement | amp_metric, phase_metric, corr_metric, noise_floor, clip_count | Separates real physics from saturation/recovery artifacts. |
| Health | bist_status, adc_range_status, sampling_heartbeat, clock_health, comm_health | Proves the chain was healthy (or degraded) when the decision happened. |
| Power | rail_min, sag_duration, brownout_flag, reset_reason | Links anomalies to supply events and avoids misdiagnosis. |
| Time | event_time, time_quality_flag, discontinuity_marker | Enables alignment across systems and flags “untrustworthy time.” |
| Storage | log_sequence, integrity_flag, extract_session_id | Supports black-box reconstruction and verification after extraction. |
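The checklist can be enforced mechanically at log-write or export time. The sketch below assumes a record is a nested mapping keyed by evidence group; that nesting is a hypothetical layout, while the field names are taken from the table above.

```python
MINIMUM_FIELDS = {
    "Decision": ["state", "transition_reason", "window_id",
                 "hysteresis_state", "hold_timer_state"],
    "Threshold": ["profile_id", "effective_threshold_value",
                  "compensation_state"],
    "Measurement": ["amp_metric", "phase_metric", "corr_metric",
                    "noise_floor", "clip_count"],
    "Health": ["bist_status", "adc_range_status", "sampling_heartbeat",
               "clock_health", "comm_health"],
    "Power": ["rail_min", "sag_duration", "brownout_flag", "reset_reason"],
    "Time": ["event_time", "time_quality_flag", "discontinuity_marker"],
    "Storage": ["log_sequence", "integrity_flag", "extract_session_id"],
}


def missing_fields(record: dict) -> dict:
    """Return {group: [absent field names]}; an empty dict means complete."""
    missing = {}
    for group, names in MINIMUM_FIELDS.items():
        present = record.get(group, {})
        absent = [name for name in names if name not in present]
        if absent:
            missing[group] = absent
    return missing
```

A record that fails this check is still worth keeping, but it should be marked as incomplete evidence rather than treated as a full decision snapshot.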
H2-10. Failure Modes & Field Debug Methodology
Field issues should be approached as a repeatable evidence workflow, not ad-hoc intuition. The fastest path is to first gate on chain trust (diagnostics, clock health, comm health, power markers), then separate non-linearity (saturation/clamping/recovery tails) from threshold misfit (profile/compensation/windowing mismatch), and finally apply the lowest-cost fix that restores conservative behavior. Each case below is framed as: two waveforms to capture, three log fields to check, and a decision split between saturation-driven artifacts and threshold-driven misclassification.
Universal debug gate: confirm the evidence is trustworthy
- Diagnostics gate: bist_status and sampling_heartbeat are normal (not degraded).
- Timing gate: clock_health and time_quality_flag show no discontinuity near the event.
- Power/comm gate: brownout/reset markers and comm_health do not align with the symptom window.
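The three gates can be expressed as a single check that runs before any waveform analysis. The value encodings used here (the string "healthy", boolean flags, and a `reset_in_window` marker) are hypothetical; the real encodings depend on the log format in use.

```python
def failed_trust_gates(rec: dict) -> list:
    """Return the names of gates that failed; empty means the evidence
    around the symptom can be analysed as trustworthy."""
    failed = []
    # Diagnostics gate: self-test healthy and sampling chain alive.
    if rec["bist_status"] != "healthy" or not rec["sampling_heartbeat_ok"]:
        failed.append("diagnostics")
    # Timing gate: clock healthy and no time discontinuity near the event.
    if rec["clock_health"] != "healthy" or rec["discontinuity_marker"]:
        failed.append("timing")
    # Power/comm gate: no supply or link disturbance in the symptom window.
    if (rec["brownout_flag"] or rec["reset_in_window"]
            or rec["comm_health"] != "healthy"):
        failed.append("power/comm")
    return failed
```

If the list is non-empty, the investigation starts with the failed gate rather than with thresholds or sensor physics.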
Case A — False occupied during heavy rain
Heavy rain often changes insulation and return-current behavior, raising common-mode stress and moving the baseline. The first step is to determine whether the receiver is producing occupancy evidence from a linear measurement or from saturation/recovery artifacts.
- Waveforms to capture: (1) AFE/ADC input for clipping and recovery tails, (2) feature output trend (noise floor or envelope stability).
- Log fields to check: clip_count, noise_floor trend, effective_threshold_value/profile_id.
- Interpretation: clip_count rise with tails → saturation/CM path problem; noise floor lift without clipping → threshold/profile mismatch.
Case B — Missed axle at high speed
High speed reduces the effective feature window and increases sensitivity to windowing and sampling alignment. The goal is to decide whether events are being filtered out by configuration (window/hold) or lost to timing instability.
- Waveforms to capture: (1) feature peak width (duration of correlation/envelope peak), (2) window update cadence (feature window liveness).
- Log fields to check: window_id/window duration marker, clock_health/time_quality_flag, transition_reason or “no-crossing” markers.
- Interpretation: peak becomes narrower but windows/holds unchanged → threshold/window misfit; clock health anomalies → timing integrity issue.
Case C — Occasional double count
Double counts are commonly caused by recovery tails creating a second pseudo-peak, or by hysteresis/hold settings that allow a second crossing before the system has “settled.” The quickest discriminator is whether a second peak exists in the features and whether it aligns with clipping behavior.
- Waveforms to capture: (1) phase/correlation feature trace to see double peaks, (2) AFE/ADC input to check for saturation rebound.
- Log fields to check: hysteresis_state/hold_timer_state, transition_reason sequence, pre/post event window snapshots (summary).
- Interpretation: double peak with clipping or tails → saturation + gating; double peak without clipping → hysteresis/hold too permissive.
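The interaction between hysteresis and hold time can be reproduced offline on a recorded feature trace. The counter below is a simplified sketch with hypothetical parameter names: it re-arms once the feature falls to `off_threshold`, and it accepts a new count no earlier than `hold_samples` samples after the previous one.

```python
from typing import Sequence


def count_axles(feature: Sequence[float], on_threshold: float,
                off_threshold: float, hold_samples: int) -> int:
    """Count threshold crossings with hysteresis and a hold timer."""
    count = 0
    armed = True
    hold = 0
    for x in feature:
        if hold > 0:
            hold -= 1
        if armed and hold == 0 and x >= on_threshold:
            count += 1
            armed = False
            hold = hold_samples
        elif not armed and x <= off_threshold:
            armed = True   # hysteresis satisfied; hold may still block
    return count
```

Replaying the same trace with different `hold_samples` values shows whether a second count comes from a tail-induced pseudo-peak that a longer hold would reject, or from a genuinely separate event.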
Case D — Detection jitter near switch points
Switch points introduce geometry changes and complex coupling that can destabilize phase and correlation. The primary question is whether jitter is environmental (feature variance rises while the chain is healthy) or systemic (comm/clock/power faults coincide).
- Waveforms to capture: (1) phase metric variance over windows, (2) correlation/consistency metric variance over windows.
- Log fields to check: phase_metric variance marker, corr_metric variance marker, comm_health and clock_health markers.
- Interpretation: phase/corr variance rises with healthy chain → environment/DM-CM coupling; faults coincide → investigate timing/comms first.
Fast discriminator: saturation vs threshold misfit
- Saturation signature: clipping counters, flat-top waveforms, long recovery tails, and sudden feature variance spikes.
- Threshold misfit signature: stable linear waveforms, systematic near-threshold behavior, and strong dependence on profile/compensation state.
- First fix bias: control CM return paths and preserve linearity before increasing sensitivity via thresholds.
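As a rough triage aid, the two signatures can be encoded as a rule over log deltas taken across the symptom window. The inputs and the decibel margin are hypothetical; the rule only restates the discriminator above and does not replace waveform inspection.

```python
def classify_symptom(clip_count_delta: int, noise_floor_rise_db: float,
                     threshold_margin_db: float) -> str:
    """Suggest a first hypothesis from evidence fields.

    clip_count_delta:    increase in clip_count across the symptom window
    noise_floor_rise_db: noise-floor lift relative to the quiet baseline
    threshold_margin_db: margin between quiet noise floor and the threshold
    """
    if clip_count_delta > 0:
        return "saturation"            # non-linearity first: check CM paths
    if noise_floor_rise_db >= threshold_margin_db:
        return "threshold_misfit"      # linear chain, margin consumed
    return "inconclusive"              # capture waveforms before changing anything
```

Saturation is tested first on purpose, because a clipped chain can also show a raised noise floor and would otherwise be misread as a threshold problem.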
H2-11. Design Trade-Offs & Component Domains
A railway axle-counter / track-circuit receiver is defined by constraints: harsh common-mode interference, long cable runs, conservative safety bias (false-clear is the most dangerous outcome), and a requirement to provide auditable evidence for every critical state transition. This chapter turns those constraints into design decisions. Each trade-off below includes: what is being traded, what to measure, how it maps to evidence fields (H2-9), and which field symptoms it typically triggers (H2-10) when chosen incorrectly.
Trade-Off A: ADC resolution vs latency
More resolution can improve separation between weak “real” signatures and interference, but only if the analog chain stays linear. Latency comes from filtering, conversion, and decision processing; excessive end-to-end delay compresses the effective detection window at high speed, and makes event timestamps harder to align across cabinets and upstream safety logic.
- Measure: end-to-end decision latency budget vs minimum event width; group delay through filters; clip_count rate under worst CM stress.
- Evidence impact: time_quality_flag stability + event_time alignment; pre/post window summaries remain meaningful only if latency is bounded.
- Typical field symptom when wrong: missed axle at high speed (window becomes too short) or “late” evidence that fails to correlate with other logs.
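The latency-versus-event-width comparison is simple arithmetic and is worth scripting so it is rerun for every speed class. The model below is a first-order assumption: the event width is taken as an effective sensing length divided by train speed, and the sensing length, delays, and sample rate in the usage note are illustrative values, not figures from any specific sensor.

```python
def event_width_s(sensing_length_m: float, speed_kmh: float) -> float:
    """Approximate time a wheel spends inside the effective sensing zone."""
    return sensing_length_m / (speed_kmh / 3.6)


def latency_budget(event_width: float, group_delay_s: float,
                   conversion_s: float, processing_s: float,
                   sample_rate_hz: float) -> dict:
    """Compare end-to-end latency with the minimum event width."""
    total = group_delay_s + conversion_s + processing_s
    return {
        "total_latency_s": total,
        "samples_per_event": event_width * sample_rate_hz,
        "latency_to_width_ratio": total / event_width,
    }
```

For example, an assumed 0.2 m sensing length at 360 km/h gives an event about 2 ms wide; at an 8 kHz sample rate that is about 16 samples, and 0.8 ms of total latency already consumes about 40 % of the event width.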
Trade-Off B: fixed thresholds vs programmable thresholds
Fixed thresholds are simple and predictable, but they can fail when rail conditions change (wet contamination, corrosion, return-current distribution). Programmable thresholds allow profile-based adaptation, but only if the receiver stays auditable: the applied effective threshold and compensation state must be recorded alongside each decision, or the system becomes a black box in maintenance.
- Measure: sensitivity margin across worst-case noise floor shifts; stability of near-threshold behavior (how often events sit “on the edge”).
- Evidence impact: profile_id + effective_threshold_value + compensation_state must be bound to each transition_reason snapshot.
- Typical field symptom when wrong: rain false occupied (noise floor rises) or drift-triggered chatter near switch points.
Trade-Off C: hardware detection vs DSP detection
Hardware detection offers deterministic latency and simple failure boundaries, but it can struggle with complex interference patterns and multi-parameter consistency checks. DSP-based detection enables amplitude/phase/correlation voting and richer diagnostics, but it depends heavily on clock integrity, compute determinism, and complete evidence logging.
- Measure: worst-case processing time; time-base sensitivity (clock_health correlation); robustness after saturation recovery tails.
- Evidence impact: DSP decisions require richer measurement snapshots (amp/phase/corr/noise_floor) to remain auditable.
- Typical field symptom when wrong: occasional double count (recovery tail creates pseudo-peaks) or jitter near switch points (variance not gated).
Trade-Off D: isolation rating vs PCB spacing and coupling control
Higher isolation ratings reduce the risk that ground potential shifts and common-mode surge energy cross into the clean side, but they can increase layout constraints, routing length, and parasitic coupling if spacing is achieved without controlling return paths. In long-cable trackside cabinets, isolation is not just a rating: boundary placement and transient return control determine whether the system recovers cleanly or produces false evidence.
- Measure: comm_health resets vs surge markers; correlation between clip_count spikes and comm dropouts; recovery time after CM transients.
- Evidence impact: comm_health + integrity_flag must stay coherent across CM events; otherwise evidence chain confidence collapses.
- Typical field symptom when wrong: “random” comm resets during surges and unexplainable occupancy flips.
Trade-Off E: power budget vs redundancy
Redundancy is valuable only if it preserves conservative behavior and evidence quality under partial failure. If power budget pressure forces health monitors or black-box durability features to be removed, maintenance cost and dispute risk increase sharply. A practical approach is to define a minimum evidence set that must never be powered down: event snapshots, brownout/reset markers, clock/comm health markers, and integrity flags.
- Measure: brownout frequency vs event anomalies; hold-up sufficiency for writing an atomic event record; reset_reason distribution.
- Evidence impact: brownout_flag + reset_reason must align with detection timelines; otherwise root-cause separation becomes impossible.
- Typical field symptom when wrong: “ghost” events around power sags and inability to prove whether a decision was valid.
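Hold-up sufficiency for an atomic event write can be estimated from the energy a bulk capacitor releases between its starting voltage and the minimum usable voltage, E = ½·C·(V_start² − V_min²). The sketch below assumes a constant-power load; the efficiency term and the safety factor are hypothetical design margins.

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_min: float,
                  load_w: float, efficiency: float = 1.0) -> float:
    """Time a capacitor can supply a constant-power load.

    Usable energy is 0.5 * C * (v_start^2 - v_min^2), scaled by the
    converter efficiency and divided by the load power.
    """
    energy_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return energy_j * efficiency / load_w


def can_commit_record(holdup_s: float, write_time_s: float,
                      safety_factor: float = 2.0) -> bool:
    """True if the hold-up time covers the record write with margin."""
    return holdup_s >= safety_factor * write_time_s
```

As an illustration, 1000 µF discharging from 12 V to 6 V into a 2 W load gives about 27 ms, which covers a 5 ms record write with a 2x margin but not a 20 ms one.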
Component domains: example MPN buckets (by function)
The part numbers below are representative examples to anchor design thinking and BOM discussions. Exact selection depends on isolation level, temperature grade, availability, and the noise/latency budget of the chosen detection strategy.
| Domain | Role | Key selection levers | Example MPNs (buckets) |
|---|---|---|---|
| Differential / INA front-end | Extract small signals under CM noise; preserve linearity. | CMRR vs frequency, noise density, input bias, recovery behavior. | AD8421, INA826 |
| ADC / sampling chain | Convert features with stable timing and auditable performance. | Resolution vs latency, simultaneous sampling, reference stability. | ADS131M04, ADS131M04-Q1 |
| Digital isolation | Protect the clean side; preserve data integrity across CM transients. | Isolation rating, CMTI, propagation delay, fail-safe behavior. | ADuM141E, ADuM141ES |
| Isolated fieldbus (examples) | Robust long-cable comms in trackside cabinets. | Bus fault protection, EMC robustness, loop delay, isolation. | ISO3082 (RS-485), ISO1042 (CAN) |
| Clocking (interference-hardened) | Stable sampling and reliable timestamps under vibration/EMI. | g-sensitivity, shock/vibration, temperature range, jitter. | SiT8920B |
| Surge / ESD front-line | Clamp induced surges before they corrupt evidence and timing. | Clamping capability, surge waveform rating, leakage vs temperature. | SMBJ58A |
Mapping these domains back to the trade-offs keeps selection disciplined: instrumentation/INA choices determine linearity headroom (saturation vs threshold misfit), ADC choices determine evidence timing and latency, isolation/transceiver choices determine comm_health behavior during CM events, and clocking determines whether DSP-based methods remain repeatable and auditable.
H2-12. FAQs (Evidence-First Triage)
Each FAQ below follows a strict field triage pattern: one conclusion, two evidence checks (waveform/feature + log fields), and one first fix. Every answer maps back to upstream chapters so the investigation stays auditable and repeatable.
1) Heavy rain causes false occupied — threshold drift or rail leakage? → H2-2 / H2-4 / H2-8
Conclusion: This is usually rail leakage and noise-floor lift unless clipping artifacts are present.
- Evidence check #1 (waveform/feature): Verify whether the AFE/ADC input shows flat-topping or long recovery tails; if not, check whether envelope/phase metrics show a sustained noise-floor rise rather than isolated spikes.
- Evidence check #2 (logs): Compare noise_floor_trend vs clip_count, and confirm effective_threshold_value/profile_id remained stable across the transition.
- First fix: Lock a conservative wet-condition profile and record the applied effective_threshold_value for every transition, then re-test under similar rainfall to confirm the noise-floor margin.
2) High-speed trains occasionally miss counts — sampling window too short or phase shift? → H2-4 / H2-6
Conclusion: Misses at high speed typically come from window/hold settings that no longer match the shortened feature peak, not from “random” drift.
- Evidence check #1: Measure feature peak width (correlation/envelope duration) and confirm it fits within the configured detection window and hold logic.
- Evidence check #2: Check window_id/window_duration markers plus time_quality_flag/clock_health near the miss; timing anomalies point to sampling integrity issues.
- First fix: Reduce window latency and retune window/hold parameters for the observed peak width while keeping conservative hysteresis to avoid double counts.
3) False double count near switch — coil coupling or adaptive algorithm issue? → H2-3 / H2-4
Conclusion: Double counts near switch points are more often recovery-tail or coupling-induced pseudo-peaks than “algorithm instability.”
- Evidence check #1: Inspect phase/correlation traces for a second peak and align it to the raw input; a rebound after saturation strongly indicates non-linearity.
- Evidence check #2: Review clip_count, hysteresis_state, and hold_timer_state around the event sequence to see whether gating allowed a second crossing too early.
- First fix: Increase hold time or tighten consistency gating after a strong peak, and add a “recovery tail” inhibit condition when clip_count is non-zero.
4) Self-test passes but field errors occur — what coverage is missing? → H2-5
Conclusion: Passing BIST can still miss environmental coupling paths that are not exercised by the injection model.
- Evidence check #1: Compare “field failure” waveforms to BIST injection signatures; if failures involve common-mode bursts or saturation, the injection path is not representative.
- Evidence check #2: Correlate failure timestamps with comm_health, clock_health, and clip_count to identify missing cross-domain coverage (sampling, timing, isolation, or surge path).
- First fix: Extend BIST with periodic chain-heartbeat checks (sampling, reference range, continuity) and log a degraded marker that forces conservative behavior when coverage is incomplete.
5) Lightning storm leads to reboot — front-end surge or comm isolation gap? → H2-7 / H2-8
Conclusion: Distinguish power-rail collapse from isolation/comm upset by which marker appears first and how recovery behaves.
- Evidence check #1: Check whether the analog input shows immediate clipping and prolonged recovery, which indicates surge energy coupling into the front end.
- Evidence check #2: Align brownout_flag/reset_reason with comm_health and integrity_flag; early comm faults without brownout suggest an isolation boundary weakness.
- First fix: Harden the surge return path at the cabinet entry and enforce a comm “fail-safe reset” policy that preserves black-box integrity and marks time discontinuities.
6) ADC values saturate randomly — common-mode spike or reference drift? → H2-3 / H2-8
Conclusion: Random saturation is typically common-mode spikes; reference drift tends to be slow and correlated with temperature or supply trends.
- Evidence check #1: Observe whether saturation events are impulsive with fast onset and recovery tails (CM spikes) versus gradual bias movement (reference drift).
- Evidence check #2: Compare clip_count bursts to rail_min/sag_duration and to any reference-range diagnostics; drift will correlate with slow changes rather than bursts.
- First fix: Increase CM resilience and headroom in the AFE path (reduce coupling, prevent clamping into the measurement node) before changing thresholds or gains.
7) Clock drift causes detection instability — oscillator aging or EMI coupling? → H2-6
Conclusion: Instability that appears suddenly or during interference events is more likely EMI coupling; gradual long-term drift suggests aging.
- Evidence check #1: Look for detection variance spikes that align to high-interference moments versus a slow degradation trend over days/weeks.
- Evidence check #2: Inspect time_quality_flag and discontinuity_marker plus clock_health events; EMI coupling often produces discontinuities rather than smooth drift.
- First fix: Add a clock-health gate that blocks aggressive decisions when time quality degrades, and stabilize the sampling time base before retuning thresholds.
8) Track circuit shows occupied but axle counter clear — who is right? → H2-1 / H2-2
Conclusion: Treat the inconsistency as an evidence-alignment problem; conservative operation depends on verifying what each system actually measured and when.
- Evidence check #1: Confirm the physical condition each method senses (rail shunt / injected signal versus wheel/coil disturbance) and whether rail conditions (leakage, contamination) can bias either signal.
- Evidence check #2: Align event timestamps using event_time + time_quality_flag; mismatched time bases can create apparent disagreement even with correct sensing.
- First fix: Enforce time-quality marking and cross-system correlation windows so disagreements trigger an auditable “investigate” state, not an untraceable override.
9) Long cable causes noise bursts — shield termination or differential imbalance? → H2-7 / H2-8
Conclusion: Bursts over long cables often come from shield/return handling that converts common-mode energy into differential error.
- Evidence check #1: Check whether bursts coincide with traction events and whether the AFE input shows symmetrical CM movement versus true differential spikes.
- Evidence check #2: Correlate comm_health and clip_count spikes with noise_floor_trend; CM-to-DM conversion usually produces simultaneous disturbances across domains.
- First fix: Correct the shield termination strategy at the cabinet entry and validate that differential balance improves before adjusting thresholds or DSP logic.
10) Interlocking flags inconsistency — timestamp misalignment or voting logic? → H2-9 / H2-6
Conclusion: Investigate timestamp alignment first; “logic inconsistency” is frequently a time-base or latency mismatch between evidence streams.
- Evidence check #1: Compare the receiver’s event timeline to the upstream timeline and confirm whether decision latency and windowing could shift the event boundary.
- Evidence check #2: Use time_quality_flag/discontinuity_marker plus logged latency/window IDs to show whether the event was reported late or with degraded time quality.
- First fix: Add a “time-quality required” rule for critical transitions and log the effective latency budget so upstream correlation is deterministic.
11) Power brownout leads to missed events — holdup insufficient or reset policy too aggressive? → H2-9 / H2-7
Conclusion: Missed events around brownouts are usually an evidence-write survivability problem or a reset policy that collapses the sampling chain too early.
- Evidence check #1: Check whether event summaries stop abruptly or show partial sequences near the brownout, indicating insufficient time to write atomic records.
- Evidence check #2: Align rail_min/sag_duration with reset_reason and integrity_flag; aggressive resets will coincide with incomplete evidence markers.
- First fix: Reserve hold-up budget for committing a minimal event snapshot and delay non-essential resets until the snapshot is sealed and integrity-marked.
12) Maintenance mode causes unexpected reset — watchdog config or test injection misused? → H2-5 / H2-10
Conclusion: Unexpected resets in maintenance mode are typically watchdog timing not aligned to test workflows, or injection sequences that violate gating assumptions.
- Evidence check #1: Confirm whether the reset happens during long injection/diagnostic steps where normal sampling heartbeat pauses, triggering watchdog conditions.
- Evidence check #2: Inspect reset_reason with sampling_heartbeat and BIST markers; misused injection often leaves inconsistent BIST states around the reset.
- First fix: Add an explicit “maintenance gate” that extends watchdog windows only for approved test states, and force all injected tests to be logged with session IDs for audit.