CBTC/ETCS Onboard Unit: Safety Compute, Positioning & Security
The CBTC/ETCS Onboard Unit (OBU/EVC) is a safety-critical evidence engine: it turns speed/position, radio sessions, and time into trusted proof, then drives braking/limits deterministically when any evidence becomes unreliable. In practice, stable operation depends on measurable fields (reason codes, counters, latency/timestamps) and a BOM built around safety compute, secure boot/keys, isolation, supervision/holdup, and tamper-evident logging.
H2-1. What it is: OBU/EVC role and safety boundary
Engineering role: turning movement authority into audited safety actions
A CBTC Onboard Unit (OBU) / ETCS European Vital Computer (EVC) is the train-borne safety decision point that converts position & speed evidence plus received authority into supervision limits and vital actuation (e.g., brake demand / traction restriction), while producing an evidentiary record of why each safety action occurred.
What the onboard unit must do (written as verifiable responsibilities)
- Fuse evidence: combine odometry-based motion evidence with discrete position anchors (e.g., balise) and authority constraints to produce a safe state estimate and confidence.
- Supervise motion: compute and enforce supervision limits (speed ceilings, braking curves, approach constraints) derived from the currently valid authority.
- Trigger safety actions: when evidence indicates a boundary violation or insufficient confidence, force the system back into a safe envelope (restriction or braking), with clear trigger criteria.
- Prove decisions: record the minimum set of fields needed to reconstruct “what was known, what was decided, and why” at the time of each event.
This page treats the onboard unit as a supervisor (safety boundary enforcement), not a traction controller or HMI system.
Safety boundary: vital vs non-vital inputs/outputs (a practical classification)
The safety boundary is defined by the consequences of being wrong. An interface becomes vital if an error can directly cause unsafe permission (false release) or prevent required protection (missed braking).
- Typical vital inputs: dual-channel odometry evidence (speed/distance), discrete position anchors (balise-derived), authority validity fields (sequence/time validity), essential brake feedback.
- Typical non-vital aiding: GNSS for optional aiding, non-critical maintenance telemetry, driver convenience indications.
- Typical vital outputs: brake demand / traction restriction with feedback confirmation and fault-latched states.
- Typical non-vital outputs: informational displays and diagnostics that must never block safety cycles.
Not covered here: wayside controller internals (Zone Controller / RBC), interlocking logic, or full RF front-end design. Those belong to separate pages.
Figure (H2-1): Safety boundary map for an onboard unit
H2-2. System block diagram: sensing → safety compute → vital outputs
Reading method: follow the evidence arrow, not the module list
A useful onboard architecture diagram is not a “box pile.” It is an evidence pipeline: sensing produces measurements, integrity checks turn measurements into evidence, safety compute turns evidence into supervised limits, and vital outputs enforce the safe envelope. Each stage must emit the minimum fields required to prove correctness during audits and post-event analysis.
Inputs: each must state (1) what it proves, (2) how health is proven
- Odometry (wheel tach / encoder / radar aiding): proves motion (speed & distance). Health evidence: dual-channel consistency, jump detection, plausibility vs acceleration limits, sensor timeout.
- Balise-derived anchors: proves discrete position constraints. Health evidence: ID/sequence consistency, validity windows, missed/duplicate discrimination flags.
- Radio authorization/session: proves what is currently permitted. Health evidence: message sequencing, freshness/time validity, session state transitions, reason codes for reconnects.
- Train integrity & essential discretes: proves “allowed to move” constraints. Health evidence: input echo/feedback consistency, debounced state with timeout discipline.
The point is not to list buses; it is to define the evidence required to safely use each input.
Safety compute: four defenses that keep decisions deterministic
- Parallelism & comparison: lockstep / dual execution prevents single-event faults from silently changing safety decisions.
- Partitioning: safety tasks run with guaranteed timing; non-safety tasks (UI, bulk logging, comms) cannot starve supervision cycles.
- Diagnostics coverage: ECC/Parity, clock monitors, watchdogs, LBIST/MBIST turn hardware trust into auditable evidence.
- Fault state machine: each abnormal condition maps to a defined restriction/brake behavior and a defined recovery criterion.
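The fault state machine described above can be sketched as a table-driven lookup: each abnormal condition maps to a defined restriction/brake behavior and a recovery criterion, and anything unrecognized fails closed. All names here (fault classes, gate strings) are illustrative assumptions, not a production design.

```python
from enum import Enum

class FaultClass(Enum):
    LOCKSTEP_MISMATCH = "lockstep_mismatch"
    ECC_UNCORRECTABLE = "ecc_uncorrectable"
    SENSOR_TIMEOUT = "sensor_timeout"

class Action(Enum):
    SERVICE_BRAKE = "service_brake"
    TRACTION_RESTRICTION = "traction_restriction"

# Each abnormal condition maps to (restriction/brake behavior, recovery criterion).
FAULT_TABLE = {
    FaultClass.LOCKSTEP_MISMATCH: (Action.SERVICE_BRAKE, "power_cycle_and_bist_pass"),
    FaultClass.ECC_UNCORRECTABLE: (Action.SERVICE_BRAKE, "memory_retest_pass"),
    FaultClass.SENSOR_TIMEOUT: (Action.TRACTION_RESTRICTION, "sensor_fresh_for_n_cycles"),
}

def react(fault: FaultClass) -> dict:
    """Deterministic lookup: unknown faults fail closed to the most restrictive action."""
    action, recovery_gate = FAULT_TABLE.get(
        fault, (Action.SERVICE_BRAKE, "manual_inspection"))
    return {"fault_class": fault.value, "action_taken": action.value,
            "recovery_gate": recovery_gate}
```

The point of the table form is auditability: the mapping itself is reviewable data, not logic scattered across handlers.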
Outputs: vital actuation must be closed-loop and reconstructable
Vital outputs are not “control signals.” They are enforcement actions driven by supervision decisions. Each vital output must have feedback confirmation (or a deterministic fault-latched behavior), plus an event record containing: trigger class, evidence summary, timestamp status, and the resulting actuation state.
- Vital outputs: brake demand, traction restriction, fault latch states (with feedback/echo where applicable).
- Non-vital outputs: indications and maintenance telemetry that must never block the safety cycle.
Evidence checklist (used later by validation + FAQs)
- Sensor health: consistency flags, plausibility counters, timeout markers.
- Session health: sequence counters, freshness checks, reconnect reason codes.
- Timestamp health: drift/holdover flags, time validity, monotonic counters.
- Decision state: current supervision mode, restriction/brake rationale code.
- Actuation feedback: output echo/feedback consistency, latch state, recovery gate.
- Audit integrity: signature status, event chain continuity, anti-rollback indicators.
This checklist is intentionally compact so each later chapter can point back to a named field class.
Figure (H2-2): Evidence-chain system block diagram (single-column, mobile-safe)
H2-3. Safety compute platform: SoC/MCU choices & partitioning
Why safety compute looks this way
A CBTC/ETCS onboard unit cannot rely on “correct software” alone. The compute platform must turn random faults (EMI, soft errors, clock anomalies, transient brownouts) into detectable, controlled, and auditable events. This is why modern safety SoCs/MCUs combine execution comparison, domain partitioning, data-path integrity, and built-in diagnostics.
Defense Layer A — execution comparison (lockstep / dual-core compare / 1oo2D)
Comparison is not for performance. It exists to convert silent computation faults into explicit “mismatch” evidence that can force deterministic restriction or braking.
- Lockstep: tightly synchronized cores execute the same instructions; mismatches raise immediate fault evidence.
- Dual-core compare: two independently scheduled domains compute equivalent safety results and compare within a defined window.
- 1oo2D concept: dual-channel with diagnostics that can identify which channel is unhealthy (without exposing voter implementation details).
Design output of this layer: mismatch flags, comparison window violations, channel health counters.
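A dual-channel comparison step can be sketched as follows: two channel results are compared within a bounded window, and any disagreement becomes explicit evidence (the counter names mirror this layer's design outputs; the function signature is an assumption for illustration).

```python
def compare_channels(result_a: int, result_b: int,
                     t_a_ms: float, t_b_ms: float,
                     window_ms: float, state: dict) -> bool:
    """Compare two safety-channel results; disagreement or a comparison-window
    violation becomes explicit evidence instead of a silent fault."""
    if abs(t_a_ms - t_b_ms) > window_ms:
        state["comparison_window_violations"] = state.get("comparison_window_violations", 0) + 1
        return False
    if result_a != result_b:
        state["mismatch_flags"] = state.get("mismatch_flags", 0) + 1
        return False
    state["channel_healthy_cycles"] = state.get("channel_healthy_cycles", 0) + 1
    return True
```

In real lockstep silicon this comparison happens in hardware per instruction; the sketch only shows how the outcome should surface as countable evidence.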
Defense Layer B — partitioning (safety island + high-performance domain)
Partitioning separates deterministic safety cycles from non-safety workloads. Safety tasks (supervision, fault state machine, vital output gating) must keep timing guarantees even when the non-safety domain is busy (comms stacks, UI, bulk logs, maintenance tools).
The safety argument becomes simpler when safety-critical behavior is contained and timing is provable.
Defense Layer C — data-path integrity (memory ECC, bus parity)
Safety decisions depend on data integrity as much as compute integrity. ECC and parity mechanisms provide measurable evidence that stored state and in-flight transfers are trustworthy.
- ECC (correctable): corrects single-bit faults and increments counters that signal environmental stress or aging.
- ECC (uncorrectable): triggers deterministic fault states when corruption cannot be repaired.
- Bus parity/CRC: surfaces transfer corruption as explicit error flags tied to time and address context.
Evidence fields: correctable_count, uncorrectable_flag, parity_error_count, fault_address (if available).
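A minimal sketch of how those evidence fields might be maintained, assuming a handler that is invoked per ECC event (field names follow the dictionary above; the return strings are illustrative):

```python
def on_ecc_event(evidence: dict, correctable: bool, address=None) -> str:
    """Correctable errors are counted (trend evidence for stress/aging);
    uncorrectable errors force a deterministic fault state rather than
    continuing on corrupted data."""
    if correctable:
        evidence["correctable_count"] = evidence.get("correctable_count", 0) + 1
        return "CONTINUE"
    evidence["uncorrectable_flag"] = True
    if address is not None:
        evidence["fault_address"] = address
    return "ENTER_FAULT_STATE"
```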
Defense Layer D — diagnostics & clock supervision (LBIST/MBIST, WDT, clock monitor)
Built-in tests and monitors provide the coverage material required in safety cases: what faults are detected, how fast they are detected, and how the system reacts.
- LBIST/MBIST: detects logic and memory faults at startup and/or periodic intervals; failures must map to a defined restriction/brake path.
- Watchdogs (WDT): detect stalls or runaway execution; safe reaction must be independent of non-safety workloads.
- Clock monitors: detect frequency drift, stop, or instability; time validity is an evidence primitive for logs and supervision.
Evidence fields: bist_status, wdt_reset_reason, clock_valid_flag, fault_reaction_timestamp.
ASIL/SIL evidence points (without turning into a textbook)
Safety cases require defensible evidence that faults are detected and handled within defined reaction times. The compute platform contributes by producing measurable artifacts rather than marketing claims.
- Diagnostic coverage artifacts: which failure modes are detected by compare/ECC/BIST/WDT/clock monitors.
- Fault reaction timing: detection latency and transition time to restrictive or braking states.
- Containment proof: non-safety crashes must not override safety island gating or timing.
- Audit continuity: event chains that show mismatch/fault → decision → enforcement → recorded evidence.
Figure (H2-3): Safety compute “four defense layers” with partitioning
H2-4. Vital I/O & isolation strategy (inputs/outputs you must trust)
Signal credibility engineering: trust is measured, not assumed
Vital I/O is not an interface list. It is a trust loop that ensures every safety-relevant input and output can answer two questions: is it credible now, and what deterministic action occurs if it is not? The onboard unit must maintain closed-loop evidence: a command does not equal actuation until feedback confirms it.
Vital inputs: what they influence and how credibility is proven
- Speed/odometry: drives supervision limits and braking curves. Credibility: dual-channel agreement, jump detection, plausibility bounds, timeout discipline.
- Brake feedback: confirms enforcement outcomes and detects stuck paths. Credibility: command-vs-feedback alignment, response window timing, fault-latched mismatch flags.
- Essential discretes (e-stop chain, critical interlocks): gate permission and force conservative states. Credibility: debounced state, line monitoring (open/short), deterministic timeout actions.
Each vital input must emit explicit health evidence; otherwise it cannot be safely consumed by the safety island.
Vital output gating: only safety domain can authorize enforcement
Vital outputs must be controlled through a safety gate that is independent of non-safety workloads. The non-safety domain may request actions, but it cannot directly drive vital enforcement. When evidence is insufficient or fault states are active, the safety gate must force restrictive behavior.
- Deterministic gating: fixed rules map evidence/fault states to allowed outputs.
- Fail-safe default: loss of credible evidence leads to restriction or braking paths, not silent continuation.
- Audit linkage: each gating decision produces a reason code and timestamp validity marker.
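The three gating rules above can be sketched as one deterministic function: fixed rules map evidence and fault state to the allowed output, loss of credible evidence defaults to braking, and every decision carries a reason code. Signature and reason strings are assumptions for illustration.

```python
def vital_output_gate(requested: str, evidence_ok: bool, fault_latched: bool,
                      timestamp_valid: bool) -> dict:
    """Deterministic gating: the non-safety domain may request an output,
    but only credible evidence allows anything other than the fail-safe
    default. Every decision emits a reason code for the audit chain."""
    if fault_latched or not evidence_ok or not timestamp_valid:
        reason = ("fault_latched" if fault_latched
                  else "evidence_insufficient" if not evidence_ok
                  else "timestamp_invalid")
        return {"output": "BRAKE_DEMAND", "reason_code": reason}
    return {"output": requested, "reason_code": "evidence_ok"}
```

Note the ordering of the reason checks is itself part of the deterministic contract: the same input state must always yield the same reason code.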
Vital outputs: redundancy + feedback (closed-loop enforcement)
Redundancy exists to make failures observable and controllable. Feedback/echo closes the loop so that enforcement is confirmed, timed, and recorded. A vital path should be able to detect stuck-on, stuck-off, delayed response, and intermittent wiring faults through deterministic checks.
- Outputs: brake demand, traction restriction, fault latch states.
- Feedback: output echo/feedback consistency, response latency windows, latch confirmation.
- Record: event reason, evidence snapshot, action taken, recovery gate condition.
Isolation strategy: manage common-mode paths, not just components
Isolation is not solved by adding parts. It is solved by controlling where common-mode current flows. Digital isolators, isolated transceivers, and isolated power must be chosen and placed so that high dv/dt and surge energy do not corrupt sensing, comparison, or vital gating.
- Isolation boundary: define which signals cross, which reference they use, and how health evidence crosses the boundary.
- Common-mode suppression: provide a controlled return path so noise does not enter sensitive measurement/comparison nodes.
- Deterministic comms: isolated links must still provide CRC/sequence/freshness evidence for safety consumption.
Evidence fields dictionary (used later by validation + FAQs)
- input_consistency_flag: dual-channel disagreement markers and counters.
- input_timeout_flag: stale data detection with bounded timeout rules.
- crc_fail_count / seq_jump: integrity and ordering evidence on safety-relevant messages.
- output_echo_mismatch: commanded vs feedback mismatch with latch and clear conditions.
- actuation_latency_ms: response timing evidence (window-based, not best-effort).
- fault_latch_state: current enforced restriction/brake state and recovery gate reason.
These names act as anchors: each later diagnostic question can point back to a specific field class instead of vague “check logs.”
Figure (H2-4): Vital I/O trust loop (inputs → gate → outputs → feedback → audit)
Scope cut (to avoid overlap)
This page defines the onboard unit’s vital I/O gating, confirmation, and evidence fields. Detailed actuator physics (valve/pump drivers, pressure control loops) belongs to the Brake Control Unit page.
H2-5. Speed & odometry AFE chain (how speed becomes evidence)
Goal: speed is only usable when it becomes safety-grade evidence
A safety supervisor does not consume a raw “speed number.” It consumes speed evidence: speed + validity window + credibility flags + health counters + timestamp status. The AFE chain and plausibility checks must turn physical pulses and analog signals into a deterministic evidence stream that can be audited after incidents.
Typical sensing stack (primary + optional aiding)
- Tach/encoder: pulse-based distance and speed evidence; strong resolution, sensitive to missing teeth and mechanical looseness.
- Hall/MR: robust in harsh environments; sensitive to air-gap and magnetic conditions; used as primary or redundant channel.
- Doppler radar (optional): independent motion evidence that can help during wheel slip; installation and multipath must be considered.
- IMU aiding (non-vital aiding): used for short-window plausibility and continuity, not as a sole speed authority.
The onboard unit should treat aiding sources as credibility checks unless they can produce independent integrity evidence.
AFE conditioning: turning noisy physics into stable edges
The AFE does not “improve accuracy” by magic. It prevents noise, bounce, and wiring faults from becoming false motion evidence.
- Threshold + hysteresis: stable switching with margin; prevents edge chatter near noise floors.
- Debounce / glitch reject: blocks micro-spikes from EMI or mechanical bounce; produces reject counters as evidence.
- Open/short detection: converts dead sensors and wiring faults into explicit fault flags.
- Bandwidth shaping: ensures out-of-band noise does not produce “valid” pulses.
- Latency discipline: detects abnormal capture delay that would break supervision timing assumptions.
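The threshold-with-hysteresis and debounce ideas above can be sketched together: an edge counts only after it stays beyond a threshold for a minimum number of consecutive samples, and spikes that never stabilize increment a reject counter (exactly the debounce_reject_count evidence named later). The thresholds and sample-count logic here are illustrative assumptions.

```python
def qualify_edges(samples, high_th, low_th, min_stable):
    """Threshold + hysteresis + debounce: count a rising edge only after the
    signal stays beyond a threshold for min_stable consecutive samples;
    rejected spikes are counted as evidence instead of silently dropped."""
    state, stable, edges, rejects = 0, 0, 0, 0
    pending = None
    for v in samples:
        # Hysteresis: between low_th and high_th the logical state holds.
        want = 1 if v >= high_th else 0 if v <= low_th else state
        if want != state:
            stable = stable + 1 if pending == want else 1
            pending = want
            if stable >= min_stable:
                state = want
                edges += 1 if want == 1 else 0
                pending, stable = None, 0
        else:
            if pending is not None:
                rejects += 1  # a spike that reverted before stabilizing
            pending, stable = None, 0
    return edges, rejects
```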
Error models that matter (and how they appear in evidence fields)
- Slip/spin: wheel speed diverges from true train motion; surfaces as channel disagreement and slip_detect_flag.
- Missing/damaged teeth: periodic pulse dropouts; surface as pulse_period_outlier_count.
- EMI spikes / contact bounce: false edges near the noise floor; surface as debounce_reject_count.
- Wiring faults: dead or shorted sensor lines; surface as sensor_open_short_flag.
Control actions for sanding/anti-slip are out of scope; this chapter focuses on evidence generation and credibility.
What must be recorded to prove speed credibility (field dictionary)
These fields turn “speed reading” into audit-grade evidence that can justify supervision decisions.
- wheel_speed_chA / wheel_speed_chB: raw or normalized channel speeds.
- wheel_speed_delta: channel disagreement magnitude (with thresholds).
- accel_plausibility_flag: non-physical acceleration/jump detection marker.
- slip_detect_flag: slip/spin marker driven by consistency and aiding checks.
- sensor_open_short_flag: explicit wiring/sensor failure markers.
- debounce_reject_count: count of rejected spikes/bounce events.
- pulse_period_outlier_count: outlier timing that indicates tooth loss or noise.
- health_counter / health_score: rolling credibility metric for supervision gating.
- timestamp_valid_flag: timebase validity for evidence traceability.
Acceptance rules: when speed can be treated as evidence
- Consistency: channel delta remains within limit for a bounded window (N cycles) with no open/short faults.
- Plausibility: acceleration/jump checks pass and slip flag is not active (or the system enters a defined conservative mode).
- Timing: timestamp validity is true and evidence is not stale (timeout discipline holds).
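The three acceptance rules above can be sketched as one predicate over the field dictionary. The field names follow this chapter; consistent_cycles and stale_flag are assumed helper fields (a bounded-window counter and a timeout marker) introduced here for illustration.

```python
def speed_is_evidence(f: dict, delta_limit: float, n_cycles: int) -> bool:
    """Speed is usable as evidence only when consistency, plausibility,
    and timing checks all hold simultaneously."""
    consistent = (f["wheel_speed_delta"] <= delta_limit
                  and f["consistent_cycles"] >= n_cycles
                  and not f["sensor_open_short_flag"])
    plausible = not f["accel_plausibility_flag"] and not f["slip_detect_flag"]
    timely = f["timestamp_valid_flag"] and not f["stale_flag"]
    return consistent and plausible and timely
```

Failing this predicate should not be an error path; it should route the system into the defined conservative mode named in the rules above.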
Figure (H2-5): Speed evidence pipeline (sensor → AFE → checks → evidence)
H2-6. Position referencing: balise interface and map constraints (onboard side)
Balise as a discrete anchor: shrinking uncertainty, not just “reading a tag”
Onboard odometry is an integration process and will drift. A balise provides discrete position anchors that allow the onboard unit to correct drift, tighten uncertainty bounds, and validate map constraints. The onboard side must treat a balise read as evidence only when identity and timing integrity are provable.
Onboard reception chain (abstracted to “data enters OBU”)
- Antenna/interface: provides the physical receive point; the onboard concern is signal presence and interface health.
- Demod output: produces decoded balise data units; onboard logic consumes identity and integrity markers.
- Safety intake: validates data within time windows and integrity rules before allowing anchor use in the safety state estimate.
RF front-end internals and transponder implementation are out of scope; this chapter focuses on onboard validation, correction, and evidence logging.
How anchors constrain the onboard position state
The onboard position state is best treated as estimate + uncertainty bound + validity window. When an anchor is accepted, the system applies a bounded correction and tightens the uncertainty. When an anchor is suspicious or missing, the system widens uncertainty and enforces conservative behavior through supervision.
- Correct drift: reduce accumulated odometry bias using accepted anchor evidence.
- Trigger map states: anchors can confirm region boundaries or constraints already present in the onboard map.
- Protect safety: uncertainty growth and time validity directly influence supervision conservatism.
Evidence checks: identity match + timing + anomaly discrimination
- Identity consistency: anchor ID must match an expected set for the current corridor/time window.
- Timestamp validity: acceptance requires a valid timebase and bounded freshness.
- Duplicate detection: same anchor appears again within an impossible distance/time window.
- Missed detection: expected anchor window passes with no read; uncertainty must widen deterministically.
- Misread detection: unexpected ID or integrity failure; anchor is rejected and logged as evidence.
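The identity, duplicate, and misread checks above can be sketched as a single classification step that runs before an anchor may tighten the position uncertainty. The spacing-based duplicate test and the return labels are illustrative assumptions; real acceptance windows would come from the map.

```python
def validate_anchor(balise_id: str, expected: set, last_seen_m: dict,
                    odo_pos_m: float, min_spacing_m: float,
                    timestamp_valid: bool) -> str:
    """Classify a balise read as ACCEPT / DUPLICATE / MISREAD / REJECT_TIME
    before anchor correction is allowed."""
    if not timestamp_valid:
        return "REJECT_TIME"          # timebase not credible: no anchor use
    if balise_id not in expected:
        return "MISREAD"              # unexpected ID for this segment
    prev = last_seen_m.get(balise_id)
    if prev is not None and (odo_pos_m - prev) < min_spacing_m:
        return "DUPLICATE"            # same anchor within impossible distance
    last_seen_m[balise_id] = odo_pos_m
    return "ACCEPT"
```

Missed-anchor detection is the complementary case: it is triggered by the expected window expiring with no read at all, so it lives in the supervision cycle rather than in this per-read classifier.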
Balise evidence fields (field dictionary)
- balise_id: decoded anchor identity.
- expected_set_hit: whether the ID belongs to the expected set for the current segment.
- balise_timestamp: capture time used for evidence ordering.
- timestamp_valid_flag: timebase validity marker.
- balise_duplicate_flag: duplicate/too-soon occurrence marker.
- balise_missed_flag: expected-window miss marker.
- balise_misread_flag: unexpected ID or integrity failure marker.
- correction_applied_flag: whether anchor correction was applied.
- uncertainty_bound_changed_flag: whether uncertainty tightened/widened after the event.
Figure (H2-6): Onboard balise anchor + map constraint flow
Scope cut (to avoid overlap)
This chapter covers onboard reception, validation, correction, and evidence logging. Transponder RF front-end design and wayside placement strategy belong to the Balise/Transponder page.
H2-7. Radio/session interface for CBTC/ETCS (safety comms without RF deep dive)
Focus: trusted session + movement authority integrity (not RF)
The onboard unit does not need RF implementation details to be safe. It needs a trusted session interface that turns an unreliable bearer into deterministic safety inputs. Movement authority and restriction curves must be accepted only when identity, integrity, freshness, and timing are provable and auditable.
Session boundary: what the OBU must expose to safety logic
Treat the radio path as a bearer. The safety boundary is the session layer that outputs: (1) trusted messages, (2) session health, and (3) explicit reasons when messages are rejected or when the system must degrade.
- Inputs: authenticated/checked authorization updates, bounded timing metadata, and link statistics with timestamps.
- Outputs: session_state, message_accept/reject decisions, and deterministic degrade triggers.
- Rule: non-safety processing can request actions, but the safety gate decides whether updates are usable.
CBTC side: session state + authorization curve update integrity
CBTC authorization updates must be treated as versioned evidence. A valid update requires integrity checks, sequence/freshness acceptance, and a bounded activation policy so that network jitter cannot silently shift supervision.
The exact ground-side logic is out of scope; the onboard view is a deterministic session gate with auditable acceptance rules.
ETCS side: Euroradio/RBC session (concept-level, acceptance is still testable)
Safety communication must prevent silent corruption and stale reuse. Even at a conceptual level, each safety attribute maps to observable fields.
- Integrity: message checks must fail closed; failures increment explicit counters and reason codes.
- Anti-replay: stale or repeated messages are rejected by sequence/freshness windows; violations are logged.
- Session binding: session changes require explicit reason codes; acceptance must be tied to the current session context.
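A fail-closed acceptance step combining sequence and freshness checks might look like the sketch below. This is a conceptual illustration only; actual Euroradio/EN 50159 acceptance rules are richer (MAC verification, session binding) and are not reproduced here. Field names follow the dictionary later in this chapter.

```python
def accept_message(seq: int, sent_ms: int, now_ms: int,
                   state: dict, freshness_ms: int = 2000) -> bool:
    """Reject stale or out-of-order messages and record each violation as
    evidence; only then advance the accepted-sequence state."""
    last = state.get("msg_seq_last", -1)
    if seq <= last:                      # replay or reorder
        state["msg_seq_jump_count"] = state.get("msg_seq_jump_count", 0) + 1
        return False
    if now_ms - sent_ms > freshness_ms:  # stale: freshness window expired
        state["freshness_expired_count"] = state.get("freshness_expired_count", 0) + 1
        return False
    state["msg_seq_last"] = seq
    return True
```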
Unreliable network, deterministic behavior: turning jitter into bounded inputs
Packet loss and delay are not “annoyances.” They are safety inputs. The session layer must quantify them and feed deterministic gating so supervision never depends on best-effort timing.
- Loss window: sustained loss raises session health degradation and may freeze authorization advancement.
- Latency distribution: p50/p99 windowing prevents rare long delays from being misinterpreted as valid immediacy.
- Seq anomalies: jumps and reorder events become explicit evidence and rejection triggers.
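Turning raw per-window samples into the bounded metrics above can be sketched as follows (windowing policy and percentile method are assumptions; a deployed system would use fixed-point windowed estimators, not lists):

```python
import statistics

def window_metrics(latencies_ms, received, expected):
    """Produce the bounded per-window metrics supervision consumes:
    loss rate plus p50/p99 latency (p99 captures the tail that a mean hides)."""
    loss_rate = 1.0 - (received / expected) if expected else 0.0
    lat_sorted = sorted(latencies_ms)
    p50 = statistics.median(lat_sorted)
    p99 = lat_sorted[min(len(lat_sorted) - 1, int(0.99 * len(lat_sorted)))]
    return {"packet_loss_rate": loss_rate,
            "latency_ms_p50": p50, "latency_ms_p99": p99}
```

One rare 500 ms delay barely moves p50 but dominates p99, which is exactly why the gate keys on the tail metric.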
Evidence fields dictionary (used later by validation + FAQs)
- session_state: CONNECTED / DEGRADED / RECONNECTING / EXPIRED.
- reconnect_reason_code: auth_fail • seq_violation • timeout • bearer_loss (examples).
- msg_seq_last & msg_seq_jump_count: ordering and replay evidence.
- auth_version_id: current movement authority / curve version accepted by the gate.
- freshness_expired_count: stale update rejections.
- crc_or_mac_fail_count: integrity failures (fail closed).
- packet_loss_rate: windowed loss metric.
- latency_ms_p50 & latency_ms_p99: bounded timing metrics.
- rx_stale_flag: stale data detection flag used by supervision.
Figure (H2-7): Trusted session gate for authorization updates
Scope cut (to avoid overlap)
This chapter defines the safety session interface and message trust evidence. RF chain design, antennas, power amplifiers, and bearer-specific engineering belong to the Train Radio page.
H2-8. Time sync & deterministic behavior (why timestamps matter)
Time is part of the evidence chain
In train control, timestamps are not a convenience. They are a core evidence primitive. If time is not credible, event ordering becomes disputable, freshness windows become meaningless, and sensor alignment can silently corrupt supervision decisions. The onboard unit must expose time credibility as an explicit input to safety logic.
Single-unit timebase: dual sources + credibility monitoring
The onboard unit should treat its own time as a monitored subsystem: multiple sources, continuous comparison, and explicit quality states.
- Dual sources: primary clock and backup clock are comparable and can be selected with bounded rules.
- Offset/drift monitoring: source-to-source offset and drift rate expose degradation before it becomes a safety incident.
- Jump detection: sudden time steps must raise explicit flags and trigger deterministic reactions.
- Holdover policy: when external reference is unavailable, time remains usable but with a declared quality level.
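The dual-source comparison, jump detection, and quality declaration above can be sketched as one per-cycle assessment (limits and the simple source-selection rule are illustrative assumptions):

```python
def assess_timebase(primary_ms: float, backup_ms: float, prev_primary_ms: float,
                    dt_ms: float, offset_limit_ms: float, jump_limit_ms: float) -> dict:
    """Dual-source time credibility: compare sources, detect step changes
    against the expected cycle advance, and declare an explicit quality
    state instead of silently trusting one clock."""
    offset = primary_ms - backup_ms
    step = primary_ms - prev_primary_ms
    jump = abs(step - dt_ms) > jump_limit_ms   # sudden time step vs expected advance
    valid = abs(offset) <= offset_limit_ms and not jump
    return {"clock_offset_ms": offset,
            "clock_jump_flag": jump,
            "timestamp_valid_flag": valid,
            "time_source": "primary" if valid else "backup"}
```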
Where timestamps are consumed (and why each needs time credibility)
- Authorization freshness windows: stale updates must be rejected by time validity, not by luck.
- Event ordering in audit logs: post-event reconstruction depends on credible, monotonic timestamps.
- Sensor evidence alignment: odometry, anchor, and session evidence must share a consistent timebase, or fusion silently degrades.
When time is abnormal: deterministic degrade and diagnostic triggers
A “bad clock” must not create ambiguous safety behavior. The onboard unit should map time credibility states to deterministic restrictions and diagnostic actions.
- timestamp_invalid: block immediate application of new authorization updates; enter conservative curves where required.
- clock_jump_detected: freeze or reset bounded state machines, annotate logs, and raise explicit reason codes.
- holdover_level degraded: widen uncertainty, tighten supervision margins, and prioritize re-establishing credible time.
Full train-wide PTP architecture is out of scope; this chapter focuses on onboard time credibility and deterministic reactions.
Evidence fields dictionary (time credibility)
- time_source: primary • backup • holdover.
- timestamp_valid_flag: whether timestamps are credible for evidence ordering.
- clock_offset_ms: source-to-source offset evidence.
- drift_rate_ppm: monitored drift estimate.
- clock_jump_flag & jump_magnitude_ms: sudden step evidence.
- holdover_level: declared quality tier when reference is missing.
- freshness_window_violation_count: stale-use prevention evidence.
- reaction_timestamp: time-stamped fault reaction evidence.
Figure (H2-8): Timestamp as an evidence spine (sources → monitor → consumers → actions)
Scope cut (to avoid overlap)
This chapter defines onboard time credibility, monitored fields, and deterministic reactions. Train-wide PTP topology, TSN distribution, and backbone synchronization belong to the Timing & Sync and Backbone Gateway pages.
H2-11. Validation & field diagnostics (bench → track)
Why this chapter exists: make safety evidence reproducible
Validation is not a checklist. It is a reproducible loop that turns each evidence chain (speed/odometry, balise anchors, trusted session, time credibility, secure boot, fault state machine) into injectable, measurable, and auditable pass/fail behavior. Bench injection creates controlled repeatability; track scenarios confirm real-world boundaries using the same evidence gates.
Evidence map (what to validate, tied to earlier chapters)
This map keeps the chapter vertical: every test points to specific evidence fields and deterministic actions.
- Speed/Odometry evidence (H2-5): wheel_speed_delta • accel_plausibility_flag • debounce_reject_count • sensor_health_counter
- Balise anchor evidence (H2-6): balise_missed_flag • expected_set_hit • misread_flag • anchor_timestamp
- Trusted session evidence (H2-7): session_state • reconnect_reason_code • latency_ms_p99 • packet_loss_rate • seq_jump_count • freshness_expired_count
- Time credibility evidence (H2-8): timestamp_valid_flag • clock_offset_ms • drift_rate_ppm • clock_jump_flag • holdover_level
- Secure boot & key evidence (H2-9): boot_measurement_digest • signature_fail_reason_code • key_version_id • rollback_attempt_count
- Fault state machine evidence (H2-10): fault_class • state_transition_reason • action_taken • recovery_window_count
Bench injection principle (a single template for every test)
Every bench test must define the same four components — injection knobs, observables (waveforms and log fields), pass/fail gates, and repro parameters. This prevents “hand-wavy” validation and guarantees deterministic behavior under identical injections.
Bench: sensor injection (speed/odometry chain)
The purpose is not to “break the sensor,” but to prove that plausibility logic and fault handling remain deterministic and auditable.
First corrective step (bench diagnosis): before replacing sensors, check debounce_reject_count and open/short indicators to confirm the failure is not a filtering/reference issue.
MPN examples (AFE / isolation / safety compute commonly used in safety-critical designs):
- Safety MCU (lockstep-class): Infineon AURIX TC3xx (e.g., TC397 family), Renesas RH850 family, ST SPC58 family.
- Digital isolators: Analog Devices ADuM141E, Texas Instruments ISO7741.
- Isolated RS-485 (for rugged links around cabinets): Analog Devices ADM2587E, TI ISO1410 + transceiver pairing.
These MPNs are provided as concrete examples for documentation and test-fixture planning; exact selection depends on SIL/rail qualification constraints and design rules.
Bench: session latency/loss injection (trusted session gate)
The session gate must convert an unreliable bearer into bounded evidence. Validation proves that jitter and loss translate into predictable gating, not silent timing drift.
MPN examples (platform/communications building blocks frequently referenced in safety systems):
- Secure element (for device identity & credential operations): NXP SE050, Microchip ATECC608B, STMicroelectronics STSAFE-A110.
- Ethernet PHY (industrial/harsh environments often use robust PHY families): TI DP83867 family (example PHY class), NXP TJA110x family (example rugged PHY class).
PHY selection is system-dependent (train backbone/TSN constraints belong to the backbone page); this chapter validates the session gate behavior and evidence fields.
Bench: clock drift/jump injection + signature failure injection (time & secure boot evidence)
MPN examples (time/clocking and crypto hardware building blocks):
- Jitter-cleaning / timing PLL (system time conditioning examples): Renesas 8A34001 (class), Texas Instruments LMK04828 (class).
- Hardware root-of-trust / TPM-style modules (example class): Infineon OPTIGA™ TPM family (class), NXP EdgeLock secure element families (class).
Track: scenario validation (real conditions, same evidence gates)
Track scenarios validate the boundaries in real dynamics. The goal is not “coverage reporting,” but proving that the same gates and fields produce deterministic state transitions under real disturbances.
- Slip/slide conditions: confirm wheel_speed_delta + plausibility flags cause conservative behavior with clear reason codes.
- Weak coverage zones: confirm session_state + latency_ms_p99/packet_loss_rate drive stable gating (freeze auth_version advancement where required).
- Balise missing segments: confirm balise_missed_flag + expected_set_hit tighten uncertainty and trigger bounded degrade (no silent drift).
- Fast station entry/exit switching: confirm action timing remains explainable (reaction_timestamp) and time validity is enforced.
Decision template (waveforms + fields + gates + repro parameters)
Use one template for every bench/track test so failures become actionable and FAQ-ready (2 evidence checks + 1 first fix).
Test ID: BENCH-SESSION-LAT-01
Injection knob:
- latency_tail_ms, loss_burst_len, reorder_rate, duration_s, seed
Observe (waveforms):
- bearer_rx_timestamps vs session_gate_accept_pulses
Observe (log fields):
- session_state, latency_ms_p99, packet_loss_rate, seq_jump_count
- reconnect_reason_code, auth_version_id, action_taken, state_transition_reason
Pass/Fail gate:
- if latency_ms_p99 exceeds window_limit for N windows → DEGRADED within T
- if session_state becomes EXPIRED → freeze auth_version_id advancement, log reason_code
Expected deterministic reaction:
- NORMAL → DEGRADED → RECOVERY_PENDING (bounded), no silent acceptance
Repro parameters (recorded in log):
- duration_s, window_len, threshold_id, seed, build_id, config_hash
Keep thresholds as IDs (threshold_id) so field feedback can update limits without rewriting test procedures.
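The pass/fail gate in the template can be sketched as a small state holder. This is a bench-side illustration under assumed defaults (window_limit, N): state names mirror the template, but bounded RECOVERY_PENDING handling is deliberately left out to keep the sketch short.

```python
# Minimal sketch of the BENCH-SESSION-LAT-01 gate: N consecutive windows
# with latency_ms_p99 over the limit force DEGRADED; an EXPIRED session
# freezes authority advancement immediately. Defaults are illustrative.
class SessionGate:
    def __init__(self, window_limit_ms=500.0, n_windows=3):
        self.window_limit_ms = window_limit_ms
        self.n_windows = n_windows
        self.bad_windows = 0
        self.state = "NORMAL"

    def on_window(self, latency_ms_p99, session_state):
        """Return (state, reason_code) after one evaluation window."""
        if session_state == "EXPIRED":
            # Caller must not accept a newer auth_version_id while expired.
            self.state = "DEGRADED"
            return self.state, "SESSION_EXPIRED"
        if latency_ms_p99 > self.window_limit_ms:
            self.bad_windows += 1
        else:
            self.bad_windows = 0
        if self.bad_windows >= self.n_windows:
            self.state = "DEGRADED"
            return self.state, "LATENCY_TAIL_EXCEEDED"
        return self.state, "OK"
```

Because every transition returns a reason code, the bench fixture can assert on (state, reason) pairs instead of scraping waveforms.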
Figure (H2-11): Bench → evidence gates → track → feedback loop
Scope cut (to avoid overlap)
This chapter defines OBU-level injection knobs, observable evidence fields, deterministic state expectations, and reproducible decision templates. Line-test organization, project management process, and wayside acceptance workflows are intentionally excluded.
H2-12. Reference architecture BOM hotspots (IC classes + example MPN buckets)
How to use this chapter (evidence → hardware buckets)
This BOM view is organized by evidence responsibilities: compute credibility, key custody, trusted I/O across isolation, speed/position credibility, power/reset determinism, and durable logging. The MPNs below are example buckets for documentation and test-fixture planning; final selection must follow the project’s safety, environment, and lifecycle constraints.
Bucket 1 — Safety MCU / SoC (lockstep / safety island)
Evidence role: deterministic supervision under faults (H2-3/H2-10/H2-11). Key selection gates: lockstep/compare support, ECC/parity visibility, LBIST/MBIST reason codes, watchdog/reset attribution, partitioning support for safety vs non-safety workloads.
Example MPNs (typical safety MCU families):
- Infineon AURIX™ TC3xx: TC397, TC387, TC377
- NXP S32 Safety MCU: S32S247 (family example), S32K3 (safety-capable variants)
- Renesas RH850 (functional safety families): RH850/P1x (family example), RH850/U2A (family example)
- ST SPC58 (Stellar/Power Architecture safety families): SPC58EC (family example), SPC58NH (family example)
- Texas Instruments safety MCU class: TMS570LS (family example)
Suggested log fields: lockstep_mismatch_count, ecc_correctable_rate, bist_fail_reason, wdt_reset_reason, sched_jitter_hist.
Bucket 2 — HSM / Secure Element (keys + secure boot + anti-rollback)
Evidence role: prevent silent safety-logic replacement (H2-9) and provide anti-rollback proof for audits (H2-11). Selection gates: non-exportable keys, TRNG, monotonic counters, attestation/measurement support, service/OTA policy hooks.
Example MPNs (SE/HSM-class components):
- Microchip CryptoAuthentication™: ATECC608B
- NXP EdgeLock: SE050
- STMicroelectronics secure element: STSAFE-A110
- Infineon OPTIGA™ Trust: OPTIGA Trust M (family example)
- NXP secure element family example: A71CH
- TPM-style module class (example): Infineon OPTIGA TPM (family example)
Suggested log fields: key_version_id, anti_rollback_counter, signature_fail_reason_code, rollback_attempt_count, boot_measurement_digest.
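The anti-rollback gate behind these fields can be sketched as a pure acceptance check. This is an assumption-level illustration: real designs keep the monotonic counter inside the SE/HSM, and the reason strings here simply mirror Bucket 2's suggested log vocabulary.

```python
# Hypothetical update-acceptance check: signature first, then monotonic
# counter, then key version. Rejections map to Bucket 2 log fields
# (rollback_attempt_count, signature_fail_reason_code).
def check_update(stored_counter, stored_key_version,
                 image_counter, image_key_version, signature_ok):
    """Return (accept, reason_code) for a candidate image."""
    if not signature_ok:
        return False, "SIGNATURE_FAIL"
    if image_counter < stored_counter:
        return False, "ROLLBACK_ATTEMPT"        # increments rollback_attempt_count
    if image_key_version < stored_key_version:
        return False, "KEY_VERSION_DOWNGRADE"
    return True, "ACCEPTED"
```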
Bucket 3 — Isolated communications (CAN / RS-485 / generic isolators)
Evidence role: preserve trust across harsh common-mode and surge environments (H2-4/H2-7/H2-10). Selection gates: isolation rating, CMTI robustness, ESD, predictable fail mode, and isolation power integrity.
Example MPNs (isolated comms building blocks):
- Isolated CAN transceiver (TI): ISO1050, ISO1042
- Isolated CAN transceiver (ADI): ADM3055E
- Isolated RS-485 transceiver (ADI): ADM2587E
- Isolated RS-485 transceiver (TI): ISO1410
- Digital isolators (TI): ISO7741, ISO7721
- Digital isolators (ADI): ADuM140E, ADuM141E
Suggested log fields: comm_crc_err_count, seq_jump_count, link_reset_reason, latency_ms_p99, isolation_fault_flag.
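The seq_jump_count field above can be produced by a tiny per-link monitor. A minimal sketch, assuming a 16-bit wrapping sequence number (the width is an assumption, not part of any named protocol here):

```python
# Sketch of sequence-jump accounting on an isolated link. Any frame whose
# sequence number is not the expected successor increments seq_jump_count;
# classifying forward jumps (loss) vs backward jumps (replay/reorder) is
# left to the caller.
SEQ_MOD = 1 << 16   # assumed 16-bit sequence counter

class SeqMonitor:
    def __init__(self):
        self.expected = None
        self.seq_jump_count = 0

    def on_frame(self, seq):
        if self.expected is not None and seq != self.expected:
            self.seq_jump_count += 1
        self.expected = (seq + 1) % SEQ_MOD
```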
Bucket 4 — Speed / position AFE (encoder / Hall / MR + ΣΔ / ADC examples)
Evidence role: turn raw edges/fields into credible speed evidence with diagnosable failure modes (H2-5/H2-10). Selection gates: input protection, thresholds/hysteresis, open/short diagnostics, noise tolerance, and observable health counters.
Example MPNs (sensor interface + conversion):
- Hall/MR sensor interface examples: Maxim/ADI MAX9926, MAX9927 (speed sensor front-end class)
- Magnetic encoder IC examples: ams AS5047P, Infineon TLE5012B
- Hall sensor IC examples (signal source): TI DRV5055 (linear Hall class), Allegro A1332 (angle sensor class)
- ΣΔ modulators (for isolated measurement patterns): ADI AD7403, AD7405
- Isolated amplifier / modulator class examples: TI AMC1301, AMC1311
- High-speed precision ADC class examples: ADI AD7380 (class), TI ADS131M family (class)
Suggested log fields: debounce_reject_count, open_short_flag, sensor_health_counter, wheel_speed_delta, slip_flag.
Bucket 5 — Power & supervision (wide-VIN, supervisor, watchdog, holdup)
Evidence role: prevent “mystery resets” and preserve last-gasp logging (H2-10/H2-11). Selection gates: reset reason observability, brownout thresholds, watchdog independence, holdup monitoring, and deterministic shutdown / commit windows.
Example MPNs (power front-end + supervision + protection):
- Supervisors / reset ICs: TI TPS3890, Maxim/ADI MAX16052, ADI LTC2937
- Watchdog timers: TI TPS3431, Maxim/ADI MAX6369
- Wide-VIN buck regulator class examples: TI LM5009 (class), ADI LT8609S (class)
- eFuse / protection: TI TPS25982, ADI LTC4368
- Power-path / power-mux controller class: TI TPS2121 (power mux class)
Suggested log fields: brownout_count, reset_reason_code, holdup_voltage_trace, last_gasp_write_status, wdt_reset_reason.
Bucket 6 — NVM & logging (durable storage + evidentiary logs)
Evidence role: preserve explainable timelines and tamper-resistant event history (H2-8/H2-9/H2-11). Selection gates: endurance strategy, power-fail behavior, commit granularity, and integrity tags (hash/signature) aligned with timestamps.
Example MPNs (NVM buckets commonly used for robust logging):
- SPI FRAM (Fujitsu): MB85RS64V, MB85RS256
- SPI FRAM (Cypress/Infineon): FM25V10 (family example)
- SPI NOR Flash (Winbond): W25Q128JV, W25Q256JV
- SPI NOR Flash (Micron): MT25QL128ABA (family example)
- Serial NAND Flash (Winbond): W25N01GV (family example)
- eMMC class (example bucket): JEDEC eMMC devices (use vendor-qualified part per lifecycle policy)
Suggested log fields: log_commit_latency, log_drop_count, time_valid_flag, hash_chain_head, signature_status, storage_wear_index.
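The hash_chain_head field above implies a chained commit scheme, which can be sketched in a few lines. SHA-256 and the JSON canonicalization are assumptions for illustration; the property that matters is that editing any record changes every later head.

```python
# Minimal hash-chain append for tamper-evident logging: each new head
# binds the current record to all prior ones via the previous head.
import hashlib
import json

def append_record(chain_head: bytes, record: dict) -> bytes:
    """Return the new chain head covering this record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(chain_head + payload).digest()

def chain_heads(records, genesis=b"\x00" * 32):
    """Recompute all heads; verification compares against stored heads."""
    head, heads = genesis, []
    for rec in records:
        head = append_record(head, rec)
        heads.append(head)
    return heads
```

In practice the current head (hash_chain_head) is committed with each segment, and signature_status covers a periodic signature over that head rather than over every record.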
H2-13. FAQs (Accordion ×12; each = conclusion + 2 evidence checks + 1 first fix)
How these FAQs are engineered (no scope creep)
Each answer is constrained to: one-sentence conclusion, two evidence checks (field-level), and one first fix (actionable). Every item maps back to H2-4 through H2-12, so the troubleshooting path is mechanically verifiable.
Train occasionally triggers emergency braking, but the wireless link looks “up” — session freshness or timestamp anomaly?
Conclusion: This pattern usually comes from an evidence gate expiring (freshness/time validity), not a total link drop.
Evidence checks: (1) Compare freshness_expired_count against session_state to see if messages are accepted but stale. (2) Check timestamp_valid_flag and clock_jump_flag around the brake trigger time.
First fix: Inspect reconnect_reason_code and time-source switch logs before tuning thresholds.
Speed suddenly spikes and triggers overspeed — sensor glitch or slip/slide misclassification?
Conclusion: A true spike must be separated from slip/slide plausibility conflicts using two-layer evidence.
Evidence checks: (1) Review debounce_reject_count and open_short_flag for transient edge artifacts. (2) Correlate wheel_speed_delta with accel_plausibility_flag (and slip_flag if present) to confirm whether dynamics are physically plausible.
First fix: Tighten capture of raw edge timing and filtered speed in the same window before replacing sensors.
After passing a balise, position still drifts — balise misread or odometry drift model not converging?
Conclusion: Drift after an anchor is usually an anchor credibility issue or a slow odometry bias that remains unbounded.
Evidence checks: (1) Validate misread_flag and expected_set_hit for anchor identity consistency. (2) Compare anchor_timestamp to timestamp_valid_flag and track post-anchor wheel_speed_delta trend for bias.
First fix: Require “anchor accepted” events to include a stable time-valid window and a bias-reset marker in logs.
Startup occasionally enters degraded mode — secure boot rejection or safety self-test coverage trigger?
Conclusion: Degraded-at-boot is normally either integrity failure (fail-closed) or compute credibility failing self-tests.
Evidence checks: (1) Read signature_fail_reason_code with boot_stage_id to locate where verification failed. (2) Check bist_fail_reason and ecc_correctable_rate to see if self-test or memory corrections forced safety downgrade.
First fix: Freeze the boot measurement digest and key version into the incident record before retrying or re-flashing.
Log timeline is scrambled and accountability fails — RTC drift or logging pipeline latency?
Conclusion: Timeline issues are rarely “just RTC”; they often come from a time-validity collapse or late commits.
Evidence checks: (1) Inspect clock_offset_ms, drift_rate_ppm, and timestamp_valid_flag across the suspect window. (2) Compare log_commit_latency and log_drop_count to see whether writes are delayed or lost.
First fix: Record time-valid transitions as explicit events and tag each log segment with a monotonic sequence number.
Frequent reconnects in weak coverage — latency tail out-of-window or replay/sequence protection triggering?
Conclusion: Reconnect storms typically come from tail latency and sequence/freshness gates, not average RSSI.
Evidence checks: (1) Use latency_ms_p99 and packet_loss_rate to confirm tail behavior vs thresholds. (2) Check seq_jump_count and freshness_expired_count to identify sequence/replay gating vs transport loss.
First fix: Start from reconnect_reason_code distribution to separate auth/seq/timeouts before any RF investigation.
After maintenance, “input mismatch” alarms increase — wiring/common-mode across isolation or thresholds too tight?
Conclusion: Maintenance often changes reference/return paths; isolation and common-mode behavior can create false mismatches.
Evidence checks: (1) Compare input_consistency_fail_count (or equivalent) with isolation_fault_flag/link_reset_reason to detect common-mode events. (2) Check clock_offset_ms or timing validity if mismatch correlates with timestamp anomalies.
First fix: Re-validate the isolation power/return layout and then re-baseline threshold IDs, not raw numeric thresholds.
Same line, different trains show big behavior differences — wheel/gear variation or calibration parameter version drift?
Conclusion: Cross-train variance is often parameter/version drift plus mechanical differences that the model does not absorb.
Evidence checks: (1) Compare calibration_param_version (or config hash) and threshold_id across vehicles. (2) Track long-term bias signals via wheel_speed_delta trend and sensor_health_counter to separate mechanics from electronics.
First fix: Lock parameter bundles under signed configuration and log config_hash at every trip start.
Compute has enough performance but shows periodic jitter — safety partition preemption or ECC correction “storm”?
Conclusion: Periodic jitter is usually a determinism problem (scheduling/ECC), not raw compute shortage.
Evidence checks: (1) Compare sched_jitter_hist against safety task periods to confirm preemption patterns. (2) Inspect ecc_correctable_rate and any lockstep_mismatch_count spikes that correlate with missed deadlines or degraded transitions.
First fix: Separate safety and non-safety bus/memory contention and cap ECC handling bursts to bounded service windows.
Authority updates look normal, but braking curve execution lags — I/O path delay or state machine stuck in conservative branch?
Conclusion: “Update OK but action late” is either vital I/O latency or a state machine holding conservative mode.
Evidence checks: (1) Compare reaction_timestamp and io_path_latency (or equivalent) from command to actuation confirmation readback. (2) Check state_transition_reason and action_taken to see whether RESTRICTED/DEGRADED policies block fast ramp-in.
First fix: Log the full I/O readback chain with sequence numbers and enforce explicit exit criteria from conservative branches.
Field suspects “firmware rollback” — how to prove anti-rollback is actually working?
Conclusion: Anti-rollback proof requires monotonic counters and version gating evidence, not verbal assurance.
Evidence checks: (1) Verify anti_rollback_counter_value and key_version_id are strictly increasing across updates. (2) Confirm rollback_attempt_count and signature_fail_reason_code capture any downgrade attempts with timestamps and boot stage attribution.
First fix: Ensure boot measurement digest and counter values are exported into signed incident reports for audits.
Critical events are lost during power-off — insufficient holdup or incorrect write policy?
Conclusion: Event loss at power-off is often policy/commit timing, even when holdup energy exists.
Evidence checks: (1) Compare holdup_voltage_trace against last_gasp_write_status to see whether voltage remained sufficient but commit started too late. (2) Inspect log_commit_latency and log_drop_count to detect oversized transactions or blocked I/O.
First fix: Implement a bounded “last-gasp” journal (small, fixed-size) to capture the final critical record before bulk writes.
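The bounded last-gasp record from this first fix can be sketched as a fixed-size packed structure. The layout, field widths, and CRC32 integrity tag are illustrative assumptions; the design point is that the record is small and constant-size, so it can be committed before any bulk write starts.

```python
# Hypothetical fixed-size last-gasp record: reason_code, brownout_count,
# microsecond timestamp, CRC32 tail. 20 bytes total, committed first.
import struct
import time
import zlib

LAST_GASP_FMT = "<IIQI"                           # reason, count, ts_us, crc32
LAST_GASP_SIZE = struct.calcsize(LAST_GASP_FMT)   # fixed and small (20 bytes)

def build_last_gasp(reason_code: int, brownout_count: int) -> bytes:
    """Pack one record; commit this FIRST on a power-fail warning."""
    body = struct.pack("<IIQ", reason_code, brownout_count,
                       int(time.time() * 1e6))
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)

def parse_last_gasp(rec: bytes):
    """Return (reason_code, brownout_count, ts_us), or None on CRC mismatch."""
    reason, count, ts, crc = struct.unpack(LAST_GASP_FMT, rec)
    if zlib.crc32(rec[:-4]) & 0xFFFFFFFF != crc:
        return None
    return reason, count, ts
```

A constant LAST_GASP_SIZE also lets the holdup budget be verified on the bench: the commit window only has to cover one write of known length.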