Brake Control Unit (EBD/EP) for Rolling Stock
A Brake Control Unit (EBD/EP) is only “safe” when its pressure and wheel-speed signals remain trustworthy under rail harness transients, and every valve/pump action can be proven by evidence fields (resets, drift, CRC, actuation counters) rather than guesses. This page shows how to design the sensing, drive, redundancy voting, EMC hardening, and black-box logging so faults trigger deterministic fail-safe behavior and every field issue can be reproduced and fixed with a bench→rig→train validation plan.
H2-1. Role, Boundaries, and What This Page Covers
A Brake Control Unit (EBD/EP) is the execution controller that converts a brake request into a verifiable pneumatic outcome: pressure build/hold/release with traceable evidence, controlled actuator behavior, and fail-safe responses under rail transients. The focus is the electronics that make the brake action measurable, diagnosable, and safe-to-fail.
What the Brake Control Unit owns (in-scope)
- Pressure feedback control: line/cylinder pressure acquisition, filtering, plausibility checks, and closed-loop actuation decisions.
- Speed-based slip inputs: wheel/axle speed conditioning for reliable slip/slide evidence (thresholds, hysteresis, missing-pulse handling).
- Actuation outputs: valve coil driving and pump motor driving with current/voltage evidence for open/short/stall detection.
- Redundancy & diagnostics: redundant MCU architecture, voting windows, watchdogs, fault latching, and event logging for root-cause traceability.
- Rail constraints: EN 50155-style power/environment realities and EN 50121-style EMC realities translated into checkable design rules.
Interfaces framed as evidence-in / action-out
- Evidence-in: pressure sensors (P_line, P_cyl), speed inputs (wheel/axle), brake requests (driver/ATP interface types only).
- Action-out: valve coil drive, pump motor drive, fail-safe cut/vent path, safety relay status and alarms.
- Rail stressors: supply dips/surges, EFT/ESD coupling via harness, ground bounce, and common-mode noise across isolation boundaries.
H2-2. System Block Diagram: Pneumatic Chain + Electronics Chain
The fastest way to make a Brake Control Unit understandable (and reviewable) is a single top-level diagram that ties the pneumatic path, sensing path, actuation path, isolation boundaries, and event evidence into one page. The diagram below is intentionally “review-ready”: each label corresponds to a measurable point or a safety action.
How to read this architecture (review checklist)
- Pneumatic outcome: pressure is measured at both the supply/line side and the cylinder side (P_line, P_cyl).
- Evidence points: actuator current (I_valve, I_pump) is sampled to prove whether the commanded action physically occurred.
- Isolation boundary: sensing + comms are separated from noisy actuation and vehicle supply disturbances.
- Fail-safe path: a defined cut/vent route exists even when control logic is degraded.
- Rail stressors: surges/EFT/ESD enter via harness; clamp loops and grounding must be drawn as paths, not as parts lists.
H2-3. Rail Requirements Translated into Checkable Specs
Rail compliance is most useful when it becomes a measurable acceptance list: what must be injected, what must be observed, what is unacceptable behavior, and which evidence fields must exist to prove the outcome. The tables below translate power, environment, EMC, functional performance, and diagnostics into checkable items that later chapters reference.
Power & supply events (platform-dependent input, behavior-defined acceptance)
| Event / condition | Checkable behavior (pass/fail) | Evidence to record (fields) |
|---|---|---|
| Wide input window (24/36/72/110 V class) | Closed-loop pressure control remains stable; no output chatter; no unintended vent/cut. | Vbat_min/Vbat_max, control_state, fault_code (if any) |
| Brownout / dip | No unsafe valve/pump behavior during dip; if reset occurs, outputs move to defined safe state and restart is deterministic. | reset_reason, brownout_cnt, last_good_pressure_ts, safe_output_latched |
| Surge / overvoltage | No false apply/release; no permanent latch unless evidence proves actuator/AFE fault. | Vbat_peak, fault_code, latch_reason, event_ts |
| Hold-up need | Time budget exists to finish a safe action (e.g., cut/vent or freeze outputs) and commit an evidence snapshot. | holdup_ms, event_commit_ok, snapshot_seq |
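The hold-up row implies a time budget that can be estimated before any hardware exists. The sketch below assumes a bulk capacitor feeding a constant-power load, so the usable energy between the starting voltage and the minimum operating voltage is 0.5·C·(V_start² − V_min²). The function names, the 1.5× margin, and all numeric values are illustrative assumptions, not platform requirements.

```python
def holdup_ms(c_farads, v_start, v_min, load_watts, efficiency=1.0):
    """Hold-up time (ms) of a bulk capacitor feeding a constant-power load.

    Usable energy between v_start and v_min is 0.5 * C * (v_start^2 - v_min^2).
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    if v_start <= v_min:
        return 0.0
    usable_joules = 0.5 * c_farads * (v_start ** 2 - v_min ** 2)
    return 1000.0 * efficiency * usable_joules / load_watts


def holdup_budget_ok(available_ms, safe_action_ms, snapshot_commit_ms, margin=1.5):
    """True if hold-up covers the safe action plus the snapshot commit, with margin."""
    return available_ms >= margin * (safe_action_ms + snapshot_commit_ms)


# Hypothetical numbers: 2200 uF bulk capacitor, 24 V rail, 12 V minimum, 5 W load.
available = holdup_ms(2200e-6, 24.0, 12.0, 5.0)
print(round(available, 2),
      holdup_budget_ok(available, safe_action_ms=30.0, snapshot_commit_ms=10.0))
```

The result of `holdup_ms` is the kind of value the holdup_ms evidence field would carry; the measured figure on the bench should replace the estimate once available.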
Environment (temperature, vibration, humidity) tied to evidence drift and reliability
| Stress | What must not happen | Evidence fields |
|---|---|---|
| Temperature range | Pressure offset does not drift beyond plausibility; valve/pump control does not oscillate due to sensor noise. | temp_local, P_offset_est, P_line/P_cyl, P_rate |
| Vibration / shock | No intermittent open/short events from harness/connector; no intermittent pulse loss on speed input. | sensor_status_bits, missing_pulse_cnt, comm_crc_err_cnt (if applicable) |
| Humidity / condensation | No leakage-driven bias that mimics pressure changes; no false saturation events. | ADC_saturation_count, P_line/P_cyl, sensor_status_bits |
EMC & immunity (conducted/radiated, EFT/ESD) with unacceptable behaviors defined
| Injection / disturbance | Unacceptable behavior | Evidence fields |
|---|---|---|
| Conducted/Radiated | False apply/release; speed pulse mis-detection that triggers slip logic; pressure loop instability. | P_rate, speed_valid, jitter_ppm, control_state |
| EFT | Silent reset with missing logs; random latch with no correlated evidence; uncontrolled valve driver state. | reset_reason, event_commit_ok, latch_reason, I_valve_peak |
| ESD | Spurious sensor faults without status bits; pressure jump without saturation/status correlation. | sensor_status_bits, ADC_saturation_count, P_line/P_cyl |
Functional performance + diagnostics (what must be proven and recorded)
| Capability | Checkable acceptance | Evidence snapshot fields |
|---|---|---|
| Brake response time | Time from command to pressure reaching target is bounded; overshoot/undershoot is controlled. | event_ts, P_cyl, P_rate, control_state |
| Pressure accuracy | Steady-state error stays within design bounds; drift triggers plausibility alarms before unsafe behavior. | P_line, P_cyl, P_offset_est, sensor_status_bits |
| Actuator redundancy | Single fault in one drive channel does not create uncontrolled actuation; safe output path is deterministic. | I_valve_peak/I_valve_hold, I_pump_rms, safe_output_latched |
| Fault coverage | Open/short/saturation/drift are detected with a bounded detection time and consistent fault coding. | fault_code, sensor_status_bits, ADC_saturation_count, mismatch_cnt |
H2-4. Pressure Sensing Chain (AFE): Accuracy vs EMI vs Fault Coverage
In rail brake control, pressure is not “just an ADC reading.” The pressure chain must remain trustworthy under common-mode noise, long harness coupling, sensor supply variation, temperature drift, and ESD/EFT stress. The goal is a measurement path that can prove when the reading is valid—and quickly declare it invalid when it is not.
Design priorities (trust first, then precision)
- Integrity under noise: preserve P_line/P_cyl signal meaning when ground reference and common-mode conditions move.
- Supply-awareness: ratiometric sensors require supply correlation or ratio strategies to avoid “fake pressure drift.”
- Fault coverage: open/short/saturation/drift must be detectable with clear status bits and bounded detection time.
- Evidence fields: measurements must be tied to an evidence snapshot usable for root-cause analysis.
Evidence fields for this chain: P_line, P_cyl, P_rate, sensor_status_bits, ADC_saturation_count (commonly paired with Vbat and temp_local in snapshots).
Sensor interface options (pick based on rail failure modes)
| Interface | Strength under rail noise | Key diagnostics to implement |
|---|---|---|
| 0.5–4.5V ratiometric | Simple wiring; vulnerable to sensor supply movement and reference shifts unless supply is measured or ratios are used. | Open/short-to-bat/short-to-gnd, overrange, drift trend; correlate pressure with sensor supply state. |
| 4–20mA loop | More robust over long harness; less sensitive to voltage drops and common-mode pickup. | Loop open/short, shunt resistor drift, saturation; detect implausible current steps vs physical dynamics. |
| Digital sensor | Resists analog pickup; introduces bus integrity and isolation boundary concerns. | Bus error counters, timeout handling, stale-data detection; status bits must map to safety actions. |
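To make the ratiometric row concrete, here is a minimal sketch of a supply-aware conversion for a 0.5–4.5 V sensor on a nominal 5 V supply (output spans 10% to 90% of supply). The full-scale pressure, the supply window, the 5%/95% diagnostic bands, and the assumption that an open wire is pulled low are all hypothetical choices for illustration.

```python
from enum import Enum


class SensorStatus(Enum):
    OK = "ok"
    SUPPLY_FAULT = "supply_fault"
    SHORT_TO_GND_OR_OPEN = "short_to_gnd_or_open"   # assumes a pull-down on the input
    SHORT_TO_SUPPLY = "short_to_supply"


def ratiometric_pressure(v_out, v_supply, p_full_scale_bar=10.0,
                         supply_min=4.5, supply_max=5.5):
    """Return (pressure_bar or None, status) using the output/supply ratio."""
    if not (supply_min <= v_supply <= supply_max):
        return None, SensorStatus.SUPPLY_FAULT
    ratio = v_out / v_supply
    if ratio < 0.05:            # well below the 10% floor: wiring fault, not pressure
        return None, SensorStatus.SHORT_TO_GND_OR_OPEN
    if ratio > 0.95:            # well above the 90% ceiling
        return None, SensorStatus.SHORT_TO_SUPPLY
    ratio = min(max(ratio, 0.10), 0.90)
    return (ratio - 0.10) / 0.80 * p_full_scale_bar, SensorStatus.OK


# The same physical pressure at two supply voltages yields the same reading.
print(ratiometric_pressure(2.5, 5.0))
print(ratiometric_pressure(2.4, 4.8))
```

Because the conversion uses the ratio rather than the absolute voltage, a sagging sensor supply does not appear as pressure drift, which is the "fake pressure drift" risk named in the design priorities.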
AFE checklist (what makes pressure trustworthy)
| AFE function | Rail-specific risk addressed | Evidence linkage |
|---|---|---|
| Input protection (ESD/EFT/surge loops) | Harness-injected transients must clamp without forcing false pressure steps or saturating the ADC path. | ADC_saturation_count, sensor_status_bits |
| Noise + ripple control (CMRR/PSRR) | Common-mode and supply ripple can masquerade as pressure change; rejection prevents false P_rate spikes. | P_rate correlation vs actuation events |
| Filtering (dynamic-safe) | Filters must reduce pickup while preserving response time so the pressure loop remains stable. | Response time vs overshoot evidence |
| Fault detection (open/short) | Connector intermittency and harness faults must be detected deterministically rather than as “random drift.” | sensor_status_bits + fault_code |
| Self-test injection | Proves end-to-end integrity (sensor→AFE→ADC→MCU) without relying on external equipment. | selftest_result + snapshot_seq |
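One way to turn the checklist into a validity gate is to combine a saturation check with a rate-of-change plausibility check. The sketch below is an assumption-laden illustration: the 12-bit ADC range, the saturation margin, and the 20 bar/s physical rate limit are hypothetical values, and the class name is not from any existing codebase.

```python
class PressureChannel:
    """Tracks validity evidence for one pressure channel (P_line or P_cyl)."""

    def __init__(self, adc_max=4095, sat_margin=8, max_rate_bar_per_s=20.0):
        self.adc_max = adc_max
        self.sat_margin = sat_margin
        self.max_rate = max_rate_bar_per_s
        self.adc_saturation_count = 0
        self.p_rate = 0.0
        self.last_valid_p = None
        self.since_valid_s = 0.0     # time elapsed since the last trusted sample

    def update(self, adc_code, p_bar, dt_s):
        """Return True if the sample is trusted; evidence is updated either way."""
        self.since_valid_s += dt_s
        saturated = (adc_code <= self.sat_margin
                     or adc_code >= self.adc_max - self.sat_margin)
        if saturated:
            self.adc_saturation_count += 1
            return False
        if self.last_valid_p is not None:
            # Rate is measured against the last trusted sample, not the last raw one.
            self.p_rate = (p_bar - self.last_valid_p) / self.since_valid_s
            if abs(self.p_rate) > self.max_rate:
                return False      # step is faster than the pneumatics can move
        self.last_valid_p = p_bar
        self.since_valid_s = 0.0
        return True
```

Rejected samples leave `last_valid_p` untouched, so a clamp event that saturates the ADC shows up as a counter increment rather than as a false pressure step.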
H2-5. Speed/Axle Input Front-End: Noisy Pulses to Trustworthy Speed
Speed inputs are not collected for display; they are used to justify slip-related decisions. The front-end must convert long-harness,
EMI-polluted pulses into a speed stream with explicit validity. The output is therefore a pair: a value (wheel_speed_raw) and a gate
(speed_valid), supported by counters and jitter metrics.
What “trustworthy speed” means in a brake controller
- Edge integrity: comparator threshold + hysteresis prevent false edges under common-mode pickup.
- Time integrity: debounce/filtering reduce noise without destroying low-speed pulses.
- Missing pulse logic: distinguish true stop from dropped edges; expose it as missing_pulse_cnt.
- Validity gate: downstream logic must consume speed_valid before using slip_ratio_est.
Evidence fields: wheel_speed_raw, speed_valid, missing_pulse_cnt, jitter_ppm, slip_ratio_est.
The goal is deterministic behavior under EMI: if the speed is untrustworthy, the system must say so and record why.
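The edge-integrity point can be illustrated with a software model of a comparator with hysteresis (Schmitt-trigger behavior). This is a sketch only: in a real front-end the hysteresis is normally implemented in hardware, and the threshold values here are hypothetical.

```python
def detect_rising_edges(samples, v_high, v_low):
    """Return sample indices of rising edges using two thresholds.

    The output goes high only at or above v_high and low only at or below v_low,
    so noise that stays between the thresholds cannot create edges.
    """
    if v_low >= v_high:
        raise ValueError("v_low must be below v_high")
    edges = []
    state = bool(samples) and samples[0] >= v_high
    for i, v in enumerate(samples):
        if not state and v >= v_high:
            state = True
            edges.append(i)
        elif state and v <= v_low:
            state = False
    return edges


# Chatter around 2.5 V is ignored; only the two real pulses are reported.
noisy = [0.0, 2.4, 2.6, 2.4, 2.6, 5.0, 5.0, 2.4, 2.6, 0.0, 5.0]
print(detect_rising_edges(noisy, v_high=3.0, v_low=2.0))
```

A single-threshold comparator at 2.5 V would report four rising edges on the same data, which is the threshold-near chatter failure mode.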
Input forms (BCU-relevant view)
| Input form | Typical rail failure mode | Front-end design emphasis |
|---|---|---|
| Hall / MR pulses | Threshold-near chatter under CM pickup; low-speed jitter; intermittent missing pulses. | Comparator hysteresis, debounce, CM suppression, missing-pulse handling. |
| Encoder / conditioned pulses | Edge distortion from filtering; isolation/reference errors causing time jitter. | Clean threshold levels, controlled filter corner, isolation boundary discipline. |
| Pulse shaping IC | ESD/EFT shifts thresholds; output becomes “valid-looking” but wrong. | Status gating, plausibility windows, jitter + missing pulse counters. |
Engineering items (minimal algorithm, maximum determinism)
- Speed windowing: compute raw speed in a bounded window; expose window stability as jitter_ppm.
- Missing pulse handling: increment missing_pulse_cnt when expected edges are absent; gate output via speed_valid.
- Low-speed jitter: apply hysteresis + debounce rules that avoid “speed toggling” near zero speed; track with jitter_ppm.
- Slip ratio input quality: slip_ratio_est must freeze or degrade when speed_valid=0.
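The four items above can be sketched as one windowed computation plus a gated slip estimate. Several details are assumptions made for illustration: jitter_ppm is taken as the interval spread relative to the mean interval, a gap longer than 1.5× the median interval counts as missing pulses, and the slip ratio uses a separate reference speed that this page does not define.

```python
from statistics import median


def speed_from_edges(edge_times_s, pulses_per_rev, wheel_circ_m,
                     min_edges=4, max_jitter_ppm=50_000):
    """Compute speed evidence from edge timestamps inside one bounded window."""
    out = {"wheel_speed_raw": 0.0, "jitter_ppm": 0.0,
           "missing_pulse_cnt": 0, "speed_valid": False}
    if len(edge_times_s) < min_edges:
        return out
    intervals = [b - a for a, b in zip(edge_times_s, edge_times_s[1:])]
    nominal = median(intervals)
    if nominal <= 0:
        return out
    # A gap much longer than the nominal interval means edges were dropped.
    out["missing_pulse_cnt"] = sum(round(iv / nominal) - 1
                                   for iv in intervals if iv > 1.5 * nominal)
    normal = [iv for iv in intervals if iv <= 1.5 * nominal]
    mean_iv = sum(normal) / len(normal)
    out["jitter_ppm"] = (max(normal) - min(normal)) / mean_iv * 1e6
    out["wheel_speed_raw"] = wheel_circ_m / (pulses_per_rev * mean_iv)
    out["speed_valid"] = (out["missing_pulse_cnt"] == 0
                          and out["jitter_ppm"] <= max_jitter_ppm)
    return out


def slip_ratio_est(wheel_speed, ref_speed, speed_valid, last_estimate):
    """Freeze the previous estimate whenever the speed input is not trusted."""
    if not speed_valid or ref_speed <= 0.0:
        return last_estimate
    return (ref_speed - wheel_speed) / ref_speed
```

Note that a window with a dropped edge still reports a speed value but clears speed_valid, so downstream slip logic holds its last estimate instead of reacting to the gap.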
H2-6. Valve & Pump Drivers: Solenoids and Motors Under Transients
Actuator channels are both noise sources and fault sources. The driver design must define a deterministic action sequence (kick/hold/release), measure current evidence to prove the action occurred, and execute protective responses under shorts, opens, thermal stress, and rail transients. The result is “drive + evidence + protection,” not just a switching stage.
Evidence-first actuation goals
- Valve channel: prove pull-in and holding via I_valve_peak and I_valve_hold; validate clamp path via flyback_clamp_v.
- Pump channel: detect overload and stall via I_pump_rms and stall_flag; manage heat via thermal_derate_level.
- Protection behavior: faults must drive deterministic outputs and produce a snapshot (no silent resets, no unlogged latches).
Evidence fields: I_valve_peak, I_valve_hold, flyback_clamp_v, I_pump_rms, stall_flag, thermal_derate_level.
Solenoid valve drive (kick/hold/release) — what must be controlled and proven
| Phase | What the driver does | Evidence / protection |
|---|---|---|
| Kick | Apply strong pull-in drive to guarantee actuation within a bounded time budget. | I_valve_peak proves pull-in energy; overcurrent triggers fast protection. |
| Hold | Use PWM or controlled current to hold with lower dissipation while resisting supply variation. | I_valve_hold stability indicates wiring/coil health; thermal limits feed derating. |
| Release | Turn-off sequence routes inductive energy into a defined clamp loop with controlled dv/dt. | flyback_clamp_v indicates clamp path integrity; abnormal clamp voltage flags layout/loop faults. |
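A per-cycle evidence check for the kick/hold/release sequence might look like the sketch below. The limit values are hypothetical placeholders that would come from coil and clamp characterization on the bench, and the finding names are illustrative rather than defined fault codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValveLimits:
    # Hypothetical limits; set per coil and clamp design.
    kick_min_a: float = 1.2
    kick_max_a: float = 3.0
    hold_min_a: float = 0.3
    hold_max_a: float = 0.8
    clamp_min_v: float = 30.0
    clamp_max_v: float = 60.0


def classify_valve_cycle(i_valve_peak, i_valve_hold, flyback_clamp_v,
                         limits=ValveLimits()):
    """Return findings for one kick/hold/release cycle; an empty list means proven."""
    findings = []
    if i_valve_peak < limits.kick_min_a:
        findings.append("pull_in_not_proven")      # open coil or high resistance
    elif i_valve_peak > limits.kick_max_a:
        findings.append("overcurrent")             # shorted coil or wiring
    if not (limits.hold_min_a <= i_valve_hold <= limits.hold_max_a):
        findings.append("hold_out_of_range")
    if not (limits.clamp_min_v <= flyback_clamp_v <= limits.clamp_max_v):
        findings.append("clamp_path_abnormal")
    return findings


print(classify_valve_cycle(2.0, 0.5, 45.0))
print(classify_valve_cycle(0.1, 0.0, 45.0))
```

Returning a list of findings rather than a single boolean keeps the three phases separable in the event snapshot.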
Pump motor drive — protection and evidence without algorithm sprawl
| Condition | Required action | Evidence fields |
|---|---|---|
| Normal load | Start/stop without injecting excessive noise; keep current within expected envelope. | I_pump_rms, snapshot_ts |
| Overload trend | Apply current limiting or derate to prevent thermal runaway; keep behavior deterministic. | I_pump_rms, thermal_derate_level |
| Stall | Stop or enter controlled retry policy (bounded attempts); protect wiring and supply. | stall_flag, I_pump_rms, fault_code |
Actuator switching noise can show up as elevated jitter_ppm in the speed chain and increased ADC_saturation_count in the pressure chain. Evidence fields enable correlation without expanding into unrelated subsystems.
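The stall row calls for a bounded retry policy. The sketch below shows one possible shape for it, assuming a stall is declared when current is high while the motor is not rotating. The `rotating` input, the current threshold, and the action names are hypothetical; how rotation is sensed is platform-dependent.

```python
class PumpSupervisor:
    """Bounded-retry stall handling for one pump channel."""

    def __init__(self, stall_current_a=8.0, max_attempts=3):
        self.stall_current_a = stall_current_a
        self.max_attempts = max_attempts
        self.recovery_attempts = 0
        self.stall_flag = False
        self.latched = False

    def on_sample(self, i_pump_rms, rotating):
        """Return the commanded action: 'run', 'retry', or 'stop_latched'."""
        if self.latched:
            return "stop_latched"
        self.stall_flag = i_pump_rms >= self.stall_current_a and not rotating
        if not self.stall_flag:
            return "run"
        if self.recovery_attempts < self.max_attempts:
            self.recovery_attempts += 1
            return "retry"
        self.latched = True       # retries exhausted: stop and wait for service
        return "stop_latched"


sup = PumpSupervisor(max_attempts=2)
print([sup.on_sample(10.0, rotating=False) for _ in range(4)])
```

The retry counter is deliberately never cleared automatically in this sketch, so the number of attempts stays visible as evidence until a service action resets it.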
H2-7. Redundant MCU + Fault Voting: 1oo2/2oo3 Without False Trips
Redundancy must be engineered to avoid spurious trips. The voting system should vote on decisions (permits, validity gates, and state IDs),
not raw analog numbers. Deterministic windows, synchronized sampling, and cross-check integrity (CRC + counters) are required to keep
voter_state stable under transients while still forcing a safe outcome when evidence cannot be trusted.
Design rules that prevent false trips
- Vote decisions, not raw signals: vote permits and validity gates instead of raw sensor readings.
- Bound the time domain: define window_ms and apply anti-chatter logic to stabilize voter_state.
- Align sampling: mismatch must be evaluated on time-aligned samples to avoid “phase error” false mismatches.
- Cross-check integrity: CRC + monotonic counters expose link issues via crosscheck_crc_err.
- Deterministic safe action: force and record safe_output_latched only when evidence cannot be proven safe.
Evidence fields: voter_state, mismatch_cnt, window_ms, crosscheck_crc_err, safe_output_latched.
These fields explain whether a trip was caused by true disagreement, window instability, or cross-check integrity failures.
Architecture options (BCU-focused)
| Option | What it is best at | What can cause false trips |
|---|---|---|
| Lockstep MCU | Detects internal execution faults with tight cycle-level agreement. | Common-mode disturbances; external input validity not clearly gated can still mislead both lanes. |
| Dual MCU (main + monitor) | Independent supervision path; strong “decision voting” and evidence gating. | Cross-check link errors and sampling phase offsets; mitigated by window_ms and CRC+counters. |
| 2oo3 (if platform requires) | Improves tolerance to single-point failures and reduces spurious shutdown risk. | Window instability and voter chatter if thresholds and timing alignment are not explicitly engineered. |
What is voted (granularity that remains stable under EMI)
- Output permits: valve/pump enable/inhibit is voted rather than PWM details.
- Input validity gates: pressure_valid and speed_valid are voted rather than raw P or raw speed values.
- State IDs: vote state machine identifiers plus a bounded time budget (window_ms) to avoid drift.
- Mismatch handling: update mismatch_cnt per window; hold stable voter_state with anti-chatter.
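As an illustration of voting on decisions inside a bounded window, the sketch below votes a boolean output permit from two lanes (main + monitor). The window is expressed in samples, so window_ms would be the sample count times the sample period. The class name, the state labels, and the mismatch limit are assumptions; a real voter would follow the platform safety case.

```python
class PermitVoter:
    """Votes a boolean permit from two lanes; state changes only at window close."""

    def __init__(self, window_samples=5, mismatch_limit=3):
        self.window_samples = window_samples
        self.mismatch_limit = mismatch_limit
        self.voter_state = "SAFE"
        self.mismatch_cnt = 0            # mismatches in the last completed window
        self.safe_output_latched = False
        self._seen = 0
        self._mismatch = 0
        self._both_permit = 0

    def sample(self, lane_a_permit, lane_b_permit):
        if self.safe_output_latched:
            return self.voter_state
        self._seen += 1
        if lane_a_permit != lane_b_permit:
            self._mismatch += 1
        elif lane_a_permit:
            self._both_permit += 1
        if self._seen == self.window_samples:
            self._close_window()
        return self.voter_state

    def _close_window(self):
        self.mismatch_cnt = self._mismatch
        agreed = self._seen - self._mismatch
        if self._mismatch >= self.mismatch_limit:
            self.voter_state = "SAFE"
            self.safe_output_latched = True      # disagreement could not be resolved
        elif agreed > 0 and self._both_permit == agreed:
            self.voter_state = "PERMIT"          # every agreeing sample said permit
        else:
            self.voter_state = "SAFE"            # non-latching conservative outcome
        self._seen = self._mismatch = self._both_permit = 0
```

Because voter_state is updated only when a window closes, a single-sample glitch raises mismatch_cnt without toggling the output; the cost is that a genuine disagreement takes up to one window to act on, which is the trade-off window_ms sets.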
Synchronization + cross-check (engineer the time axis)
- Sampling alignment: compare time-aligned samples; treat phase error as a synchronization fault, not a sensor fault.
- Cross-check link: CRC + counter prevents silent corruption/replay; expose errors via crosscheck_crc_err.
- Windowed decision: a bounded window_ms converts momentary EMI glitches into measurable, non-latching evidence.
- Safe latch policy: safe_output_latched is asserted only when a safe decision cannot be proven under the voter rules.
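A cross-check frame protected by a CRC and a monotonic counter can be sketched with the Python standard library. The frame layout (32-bit counter, 16-bit state ID, 16-bit permit bitmap, CRC-32) and the stale_or_replay_cnt counter are hypothetical; counter wrap-around is not handled here.

```python
import struct
import zlib


def pack_frame(counter, state_id, permits):
    """Build a 12-byte frame: counter, state ID, permit bits, then CRC-32."""
    body = struct.pack("<IHH", counter, state_id, permits)
    return body + struct.pack("<I", zlib.crc32(body))


class CrossCheckRx:
    """Receiver side: rejects corrupted, stale, or replayed frames."""

    def __init__(self):
        self.crosscheck_crc_err = 0
        self.stale_or_replay_cnt = 0
        self.last_counter = None

    def accept(self, frame):
        """Return (state_id, permits) for a fresh, intact frame, else None."""
        if len(frame) != 12:
            self.crosscheck_crc_err += 1
            return None
        body = frame[:8]
        (crc,) = struct.unpack("<I", frame[8:])
        if zlib.crc32(body) != crc:
            self.crosscheck_crc_err += 1
            return None
        counter, state_id, permits = struct.unpack("<IHH", body)
        if self.last_counter is not None and counter <= self.last_counter:
            self.stale_or_replay_cnt += 1    # intact but not newer: stalled or replayed
            return None
        self.last_counter = counter
        return state_id, permits
```

Keeping corruption and staleness in separate counters helps distinguish harness coupling (CRC errors) from a peer that has stopped updating (stalled counter).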
H2-8. Safety I/O and Fail-Safe Behavior: Trigger → Action → Recovery
Fail-safe behavior must be predictable and auditable. Each fault is defined by a checkable trigger, a deterministic action on safety I/O, and an explicit recovery policy (manual reset, bounded retries, or condition-cleared recovery). The system must record the “why” and the “last known good” evidence so that safe outcomes are repeatable and not opaque.
Safety I/O boundary (BCU-facing)
- Permit/cut: hard enable/inhibit to valve and pump drivers.
- Vent/hold commands: safe pressure actions expressed as state transitions (not continuous tuning here).
- Alarm/relay outputs: fault annunciation and service signaling.
- Reset/unlatch input: manual reset or service procedure entry, when required.
Evidence fields: fault_code, latch_reason, recovery_attempts, last_good_pressure_ts.
These fields enable post-event traceability: what happened, why it latched, how many retries occurred, and when pressure was last trusted.
Fault Response Spec (each row is a state-machine rule)
| Fault (fault_code) | Trigger (checkable) | Action (safety I/O) | Latch? | Recovery policy |
|---|---|---|---|---|
| Pressure not trustworthy | Pressure validity fails; saturation or plausibility checks fail; update last_good_pressure_ts only on valid frames. | Enter degraded braking state; restrict actuation to safe subset; raise alarm. | Conditional (latch_reason) | Condition-cleared + stable interval; manual reset if persistent or hard fault indicated. |
| Speed not trustworthy | speed_valid=0, excessive missing_pulse_cnt, or high jitter_ppm (from H2-5). | Disable slip-based decisions; switch to conservative mode; log evidence. | No (typically) | Auto-recover when validity returns and remains stable for bounded time. |
| Valve driver fault | Open/short indicated by current evidence; abnormal clamp behavior; driver reports fault. | Cut permit for affected channel; prevent repeated actuation; raise alarm. | Yes | Manual reset after service validation; do not auto-retry without bounded policy. |
| Pump stall / overload | stall_flag=1 and rising I_pump_rms; thermal derate active. | Stop pump; apply bounded retry; derate if temperature demands. | Conditional | Bounded recovery_attempts; recover when load clears and temperature returns below threshold. |
| Undervoltage reset | Reset reason indicates UV/brownout; power stability not proven. | Start in safe init state; hold permits off; validate inputs before enabling actuation. | No | Auto when voltage is stable for defined interval and validity checks pass. |
| Cross-check / comm lost | crosscheck_crc_err increases; counters stall; mismatch patterns unstable (from H2-7). | Switch to conservative voter mode or cut permits; log latch_reason. | Conditional | Recover when link integrity is stable and voter state remains stable over window. |
Latch vs recoverable (to avoid both unsafe recovery and unnecessary shutdown)
- Latched (typically): actuator hard faults (short/open), repeated unsafe mismatches, conditions that cannot prove safety.
- Recoverable (typically): transient validity loss under EMI, short cross-check disturbances, undervoltage with proven recovery.
- Always logged: fault_code + latch_reason + recovery_attempts + last_good_pressure_ts.
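The fault response spec lends itself to a table-driven implementation, so that each row stays reviewable as data. The sketch below encodes the six rows with hypothetical fault-code names and action labels; the `persistent` flag stands in for whatever condition the platform uses to resolve a "Conditional" latch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRule:
    action: str
    latch: str        # "yes", "no", or "conditional"
    recovery: str


# Fault-code names and action labels are illustrative, not defined identifiers.
FAULT_RULES = {
    "PRESSURE_UNTRUSTED": FaultRule("degraded_braking", "conditional", "condition_cleared"),
    "SPEED_UNTRUSTED": FaultRule("disable_slip_logic", "no", "auto"),
    "VALVE_DRIVER_FAULT": FaultRule("cut_channel_permit", "yes", "manual_reset"),
    "PUMP_STALL": FaultRule("stop_pump", "conditional", "bounded_retry"),
    "UNDERVOLTAGE_RESET": FaultRule("safe_init", "no", "auto"),
    "CROSSCHECK_LOST": FaultRule("conservative_voter", "conditional", "condition_cleared"),
}

# Unknown codes fall back to the most conservative response.
DEFAULT_RULE = FaultRule("cut_all_permits", "yes", "manual_reset")


def respond(fault_code, persistent=False):
    """Return the event record for a fault, including latch decision and reason."""
    rule = FAULT_RULES.get(fault_code, DEFAULT_RULE)
    latched = rule.latch == "yes" or (rule.latch == "conditional" and persistent)
    if rule.latch == "yes":
        reason = "hard_fault"
    elif latched:
        reason = "persistent_condition"
    else:
        reason = ""
    return {"fault_code": fault_code, "action": rule.action, "latched": latched,
            "latch_reason": reason,
            "recovery": "manual_reset" if latched else rule.recovery}
```

Every returned record carries a latch_reason whenever `latched` is true, which is what makes an "unexplained latch" detectable in later audits.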
H2-9. Isolation, EMC & Transient Hardening for Brake Harness Reality
Rail brake harnesses are long, grounded in complex ways, and exposed to frequent transients. Effective hardening is not a bill-of-materials checklist; it is a controlled energy path: where noise is generated, how it couples into victim nodes, where it is clamped, and which return path it uses. Design success is measured by reduced false-trip rate, fewer resets, stable pressure readings, and controlled communication error counters.
Noise sources → coupling paths → victim nodes (path-first)
- Sources: valve/pump switching edges, flyback energy, supply surges, ground bounce.
- Coupling: harness capacitive/inductive coupling, shared return impedance, common-mode current.
- Victims: speed front-end edge detection, pressure AFE saturation/recovery, MCU brownout, voter cross-check link.
Isolation boundary partition (separate risk domains)
- Sensor domain: pressure/speed front-ends and validity gates; keep reference stable against power return noise.
- Control domain: MCU + voter logic; protect against resets and false mismatches under transient stress.
- Actuator/power domain: valve/pump drivers; confine switching and flyback energy with short clamp loops and defined returns.
Grounding & return paths (measure vs power return)
- Measurement reference: pressure/speed front-ends must not share high di/dt return segments.
- Power return: valve/pump currents must return on a controlled path, not through sensitive reference nodes.
- Most common failure mode: a “short” clamp loop physically exists, but the return path closes through the wrong ground.
Protection that works in the harness (close the clamp loop)
- EFT/Surge/ESD: clamp at the entry and keep the clamp loop short and local (device + return path as a closed loop).
- Common-mode suppression: use CMC/RC/shield termination to steer common-mode current away from sensitive domains.
- Shield termination: termination point defines where common-mode current returns; choose it to avoid polluting measurement reference.
Evidence fields: reset_reason, brownout_cnt, pressure_offset_est, comm_crc_err_cnt.
H2-10. Diagnostics & Event Logging: Evidence-First Black-Box Style
Diagnostics are a competitive advantage when logs are structured to prove root cause, not just to report “a fault happened.” A black-box approach uses three layers—transient summary, state snapshot, and hardware counters—so that every trip, reset, or degraded transition can be reconstructed with time-correlated evidence.
Three-layer logging model (what gets recorded)
- Transient layer: compact action-adjacent summaries (peaks, durations, counts) around key events.
- State layer: state machine + voter + protection decisions that explain why a safety action occurred.
- Hardware layer: power/reset traces, temperature and lifetime counters that reveal stress and maintenance trends.
Evidence fields: reset_reason, brownout_cnt, valve_actuation_count, pump_runtime_h, pressure_offset_est, comm_crc_err_cnt, event_signature_status.
These fields connect behavior to power integrity, actuation stress, drift, and communication integrity.
Event Snapshot (single record that joins all evidence)
- Header: event timestamp + fault_code / event ID.
- Transient: short-window summaries around actuation and protection transitions.
- State: voter + fail-safe decisions and their reasons (latch_reason).
- Hardware: reset/brownout, lifetime counters, drift estimates, and comm integrity counters.
- Trust (optional): signature status for tamper-evident evidence (event_signature_status).
Evidence → root-cause routes (examples)
- Reset-driven events: reset_reason = brownout and brownout_cnt rising → power transient is a prime suspect.
- Link-driven events: rising comm_crc_err_cnt during disturbances → cross-check integrity or harness coupling dominates.
- Drift-driven changes: slowly increasing pressure_offset_est with recurring validity dropouts → reference pollution or input chain stability issues.
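The three routes above can be sketched as a comparison between two snapshots taken before and after an incident. The `validity_dropout_cnt` key and the drift limit are hypothetical additions for illustration; the other keys are the evidence fields already named on this page.

```python
def route_root_cause(before, after, drift_limit=0.05):
    """Compare two evidence snapshots (dicts) and return suspected routes."""
    def rose(name):
        return after.get(name, 0) > before.get(name, 0)

    suspects = []
    if after.get("reset_reason") == "brownout" and rose("brownout_cnt"):
        suspects.append("power_transient")
    if rose("comm_crc_err_cnt"):
        suspects.append("crosscheck_integrity_or_harness_coupling")
    drift = after.get("pressure_offset_est", 0.0) - before.get("pressure_offset_est", 0.0)
    # abs() treats drift in either direction as suspicious (an assumption).
    if abs(drift) > drift_limit and rose("validity_dropout_cnt"):
        suspects.append("reference_pollution_or_input_chain")
    return suspects


before = {"brownout_cnt": 2, "comm_crc_err_cnt": 10, "pressure_offset_est": 0.01}
after = {"reset_reason": "brownout", "brownout_cnt": 3,
         "comm_crc_err_cnt": 10, "pressure_offset_est": 0.02}
print(route_root_cause(before, after))
```

The function returns a list because routes are not mutually exclusive: a pump-start brownout can raise both brownout_cnt and comm_crc_err_cnt in the same event.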
Black-box record layout (checkable, maintainable)
| Layer | What it captures | Field examples |
|---|---|---|
| Transient | Compact, event-adjacent summaries (peaks, clamp voltage, counters) in a bounded time window. | I_valve_peak / I_valve_hold / flyback_clamp_v, I_pump_rms, stall_flag |
| State | State machine + voter decisions that explain why a safe action was taken and whether it latched. | voter_state, mismatch_cnt, window_ms, safe_output_latched, fault_code, latch_reason |
| Hardware | Power integrity, lifetime, and integrity counters that trend over time and correlate with failures. | reset_reason, brownout_cnt, valve_actuation_count, pump_runtime_h, pressure_offset_est, comm_crc_err_cnt |
| Trust (optional) | Integrity status for tamper-evident records in safety investigations. | event_signature_status |
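The layered layout can be modeled as a single record type with an integrity check on the stored bytes. The sketch below uses JSON plus CRC-32 purely for readability; an embedded implementation would more likely use a fixed binary layout, and the class and function names are assumptions.

```python
import json
import zlib
from dataclasses import asdict, dataclass, field


@dataclass
class EventSnapshot:
    event_ts: float
    fault_code: str
    transient: dict = field(default_factory=dict)   # e.g. I_valve_peak, flyback_clamp_v
    state: dict = field(default_factory=dict)       # e.g. voter_state, latch_reason
    hardware: dict = field(default_factory=dict)    # e.g. reset_reason, brownout_cnt
    event_signature_status: str = "unsigned"

    def to_record(self):
        """Serialize to bytes with a trailing CRC-32 (hex) after a '|' separator."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return payload + b"|" + format(zlib.crc32(payload), "08x").encode("ascii")


def load_record(record):
    """Return the snapshot as a dict, or None if the CRC does not match."""
    payload, _, crc_hex = record.rpartition(b"|")
    if format(zlib.crc32(payload), "08x").encode("ascii") != crc_hex:
        return None
    return json.loads(payload)


snap = EventSnapshot(12.5, "VALVE_DRIVER_FAULT",
                     transient={"I_valve_peak": 0.1},
                     state={"latch_reason": "hard_fault"},
                     hardware={"reset_reason": "none", "brownout_cnt": 0})
print(load_record(snap.to_record())["state"])
```

A CRC only detects accidental corruption; tamper evidence requires the optional signature layer tracked by event_signature_status.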
H2-11. Validation Plan: Bench → Rig → Train (How to Prove It)
Validation should close the loop with executable checklists and measurable pass/fail criteria. The plan below progresses from component-level evidence (bench), to closed-loop behavior (rig), to immunity under injection (EMC), and finally to field reproducibility (train) using black-box evidence fields.
Evidence fields (captured for every major test step)
- Power & resets: reset_reason, brownout_cnt
- Actuation stress: valve_actuation_count, pump_runtime_h
- Drift & stability: pressure_offset_est, last_good_pressure_ts
- Integrity: comm_crc_err_cnt, event_signature_status (optional)
- Safety outcomes: fault_code, latch_reason, recovery_attempts
Bench (component-level proof)
Scope: valve current, clamp loop, brownout, sensor open/short.
- Valve coil current & flyback clamp characterization
  Stimulus: PWM kick/hold, fast-off, repeated pulses.
  Observe: I_valve_peak / I_valve_hold and clamp voltage behavior (flyback_clamp_v).
  Fixture MPN examples: AMC1301 (TI isolated amplifier) or ADuM7701 (Analog Devices isolated ΣΔ modulator) for isolated current/voltage measurement; TPD1E10B06 (TI ESD) at entry; TVS example SM8S series (automotive-style high-power TVS family) for surge clamping (rating per platform).
- Power dip / drop-out and deterministic safe start
  Stimulus: controlled brownout pulses, short supply interruptions, recovery ramps.
  Observe: reset_reason, brownout_cnt, and that safety outputs stay inhibited until validity gates are proven.
  Fixture MPN examples: TPS3840 (TI supervisor) or TPS3839 (TI supervisor) for reset threshold validation; TPS2660 (TI eFuse/hot-swap) as a protection front-end reference point.
- Sensor open/short injection (pressure / speed inputs)
  Stimulus: open, short-to-GND, short-to-supply, overrange injection (via resistor/cable fixtures).
  Observe: fault_code / latch_reason, last_good_pressure_ts discontinuity, recovery_attempts behavior.
  Fixture MPN examples: MAX3160 (Analog Devices/Maxim RS-485 transceiver) + ISO1410 (TI isolated RS-485 transceiver) for isolated comm fault simulation; MCP2562 (Microchip CAN transceiver) if CAN-based test harness is used.
Rig (closed-loop proof under real pneumatic/mechanical load)
Scope: step response, cold viscosity, hot drift, pump stall replay.
- Pressure loop step response script (repeatable)
  Stimulus: defined pressure step commands and load profiles (rig-defined).
  Observe: response time, overshoot bounds, stability; record pressure_offset_est trend and last_good_pressure_ts continuity.
  Instrumentation MPN examples: ADS131M04 (TI multi-channel ADC) for synchronized pressure/current capture in a lab logger; MB85RS64V (Fujitsu FRAM) for high-endurance event storage in the rig controller.
- Cold start / low-temperature stick-slip script
  Stimulus: cold soak then actuation cycles; include worst-case harness routing if possible.
  Observe: increased actuation current signatures and any false latch; track valve_actuation_count and recovery_attempts.
- Hot soak / drift and offset stability
  Stimulus: high-temperature soak with periodic actuation and sensor checks.
  Observe: pressure_offset_est and event-to-event consistency; ensure no systematic shift causes safety misclassification.
- Pump stall replay (controlled)
  Stimulus: repeatable stall condition (mechanical brake or flow restriction).
  Observe: stall detection flags, thermal derate behavior, bounded retries; track pump_runtime_h and fault_code correctness.
  Driver MPN examples: DRV8701 (TI brushed DC driver) for test-bench motor drive reference; DRV8305 (TI 3-phase gate driver) for BLDC-style rigs (if used).
EMC (injection → observe false trips, resets, and shifts)
Scope: EFT, surge, ESD, harness injection points.
- Injection points (must be documented): power entry, actuator switching region, harness segments (near/remote).
  Observe: false trip rate, reset frequency (reset_reason/brownout_cnt), pressure shift (pressure_offset_est), comm errors (comm_crc_err_cnt).
  Pass intent: brief validity dropouts may be acceptable; unexplained latching is not; recovery time must be bounded and logged.
- Protection & isolation reference MPN examples (for fixture design and boundary checks):
  Isolated transceivers: ISO1042 (TI isolated CAN), ISO35 (TI isolated RS-485).
  High-speed robust comms: DP83TD510E (TI 10BASE-T1L PHY) for long-cable lab stress rigs (if Ethernet-based diagnostics are used).
  TVS examples: SMBJ/SMCJ families (select by voltage/power); ESD: TPD1E10B06.
Train / Field (reproducibility and evidence capture)
Scope: trigger conditions, repro script, must-capture fields.
- Trigger conditions: any safe_output latch, repeated resets, pressure shift beyond threshold, sudden rise in comm CRC errors.
  Repro script: minimal steps to reproduce (environment, speed range, actuation pattern, harness conditions).
  Must-capture fields: reset_reason, brownout_cnt, fault_code, latch_reason, recovery_attempts, last_good_pressure_ts, pressure_offset_est, comm_crc_err_cnt, valve_actuation_count, pump_runtime_h.
- Tamper-evident record (optional): sign event snapshots to preserve evidence integrity.
  MPN examples: ATECC608B (Microchip secure element) for event_signature_status workflows; RTC/timestamping reference: DS3231 (Maxim/ADI RTC) for stable time-base in lab/rig controllers (platform-dependent).
Unified Pass/Fail criteria (what “prove it” means)
| Metric | How it is measured | Pass intent (set limits per platform) |
|---|---|---|
| Response time | Rig step scripts with time-to-settle; correlate with event timestamps. | Within platform limit; no unstable oscillation or repeated hunting. |
| False trip rate | Count unexplained safety latches vs injection/operating hours; track fault_code + latch_reason. | Below threshold; any latch must have consistent evidence fields. |
| Recovery time | Time from condition-cleared to restored permits; track recovery_attempts. | Bounded, deterministic; no uncontrolled auto-retry loops. |
| Pressure shift under stress | Track pressure_offset_est and last_good_pressure_ts discontinuities across EMC/thermal scripts. | Within allowed shift; no persistent offset that biases safety decisions. |
| Log completeness | Event Snapshot audit: required fields present and time-correlated. | 100% completeness for safety-relevant events; optional signature status if used. |
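The log-completeness and false-trip rows can be checked mechanically. The sketch below audits a list of event records (plain dicts) against the must-capture field list from the Train / Field step and counts latches that carry no latch_reason. Treating an empty log as 0% complete is an assumption made so that "no events captured" cannot pass by default.

```python
REQUIRED_FIELDS = (
    "reset_reason", "brownout_cnt", "fault_code", "latch_reason",
    "recovery_attempts", "last_good_pressure_ts", "pressure_offset_est",
    "comm_crc_err_cnt", "valve_actuation_count", "pump_runtime_h",
)


def audit_events(events):
    """Return (completeness_pct, unexplained_latches) for a list of event dicts."""
    complete = 0
    unexplained = 0
    for ev in events:
        if all(ev.get(name) is not None for name in REQUIRED_FIELDS):
            complete += 1
        # A latch with an empty or missing reason counts as unexplained.
        if ev.get("safe_output_latched") and not ev.get("latch_reason"):
            unexplained += 1
    pct = 100.0 * complete / len(events) if events else 0.0
    return pct, unexplained


good = {name: 0 for name in REQUIRED_FIELDS}
good.update(latch_reason="hard_fault", safe_output_latched=True)
bad = {"fault_code": "VALVE_DRIVER_FAULT", "safe_output_latched": True}
print(audit_events([good, bad]))
```

Running the same audit after bench, rig, EMC, and train campaigns gives one comparable number per stage for the 100% completeness pass intent.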
H2-12. FAQs (Evidence-First Troubleshooting)
Each answer follows a fixed structure (one-sentence conclusion, two evidence checks, one first fix) and maps back to H2-4 through H2-11 only.
1. Intermittent brake pressure “jitter”: pressure AFE noise or valve PWM hold strategy?
Conclusion: Most “jitter” is control reacting to a noisy pressure estimate, not true pneumatic instability.
Evidence 1: Check pressure_offset_est trend and any saturation/validity flags around the jitter window; noise-driven jitter often correlates with short validity drops.
Evidence 2: Compare jitter timing against I_valve_hold ripple/PWM pattern and whether the ripple period matches pressure oscillations.
First fix: Temporarily lower the pressure filter cutoff (a tighter bandwidth limit) or add a notch at the PWM ripple frequency, then re-run the same step/hold script.
2
Low-speed slip is misdetected — insufficient hysteresis or missing-pulse handling bug?
Conclusion: At very low speed, missing-pulse handling usually dominates slip false positives more than threshold tuning.
Evidence 1: Inspect missing_pulse_cnt and speed_valid transitions during the misdetection; false slip often coincides with validity toggling.
Evidence 2: Review jitter_ppm (or edge-to-edge interval spread) on wheel_speed_raw; noisy pulses suggest hysteresis/conditioning issues.
First fix: Enforce a minimum-speed gating window and clamp slip computation when speed_valid is unstable; validate with a controlled low-speed rig script.
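The gating idea can be sketched as follows. This is a hedged illustration, not the unit's actual slip logic: the class name, the reference speed input `ref_speed`, and the "N consecutive valid samples" stability rule are assumptions; only `speed_valid` is a field named on this page.

```python
from collections import deque

class SlipGate:
    """Suppress slip computation at low speed or while speed_valid is unstable."""

    def __init__(self, min_speed, stable_samples):
        self.min_speed = min_speed
        # Rolling record of the most recent speed_valid flags.
        self.history = deque(maxlen=stable_samples)

    def slip_ratio(self, wheel_speed, ref_speed, speed_valid):
        """Return (ref - wheel) / ref, or None while the computation is gated."""
        self.history.append(bool(speed_valid))
        stable = len(self.history) == self.history.maxlen and all(self.history)
        if not stable or ref_speed < self.min_speed:
            return None  # gated: caller should not raise a slip event
        return (ref_speed - wheel_speed) / ref_speed
```

Returning None instead of zero keeps "no evidence" distinguishable from "no slip" in the logs, which helps when reviewing a low-speed rig script.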
3
Pump start causes resets — ground bounce/flyback coupling or undervoltage threshold too sensitive?
Conclusion: If resets cluster exactly at pump inrush, power integrity and return paths are the first suspect.
Evidence 1: Correlate reset_reason and increments in brownout_cnt with pump start events; undervoltage resets leave a consistent signature.
Evidence 2: Check whether comm_crc_err_cnt spikes right before reset, indicating common-mode/ground bounce coupling into control links.
First fix: Temporarily slow the pump start (soft-start) and shorten/relocate the clamp/return loop; re-test with the same brownout script and EMC injection points.
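Evidence 1 amounts to a timestamp correlation between pump starts and resets. A minimal sketch, assuming both event lists are available as millisecond timestamps on a shared time base (the function name and the window argument are hypothetical):

```python
import bisect

def resets_near_pump_start(pump_start_ts, reset_ts, window_ms):
    """Count resets that occur within window_ms after any pump start."""
    starts = sorted(pump_start_ts)
    hits = 0
    for t in reset_ts:
        i = bisect.bisect_right(starts, t) - 1  # latest pump start at or before t
        if i >= 0 and t - starts[i] <= window_ms:
            hits += 1
    return hits
```

If most resets fall inside a short window after pump start, the inrush hypothesis is supported; resets spread evenly across the log suggest looking elsewhere first.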
4
After ESD, braking enters degraded mode — pressure channel self-test failure or unstable voting window?
Conclusion: Degraded mode after ESD is most often triggered by a validity gate failing, not by “random software.”
Evidence 1: Look for pressure self-test flags and discontinuities in last_good_pressure_ts; ESD often causes brief AFE saturation that trips self-test criteria.
Evidence 2: Check mismatch_cnt growth and the configured window_ms; if mismatch rises during ESD without sensor anomalies, the voter window/sync is too tight.
First fix: Add an ESD recovery debounce for pressure validity (bounded) and widen the cross-check window only for post-ESD settling, then validate under ESD injection.
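A bounded recovery debounce could look like the sketch below. It is an assumption-laden illustration: the class, the three state names, and the sample-count parameters are hypothetical. The key property is that the settling state can only last a bounded number of samples before a fault is latched.

```python
class ValidityDebounce:
    """Debounce pressure validity after a disturbance, with a hard time bound."""

    def __init__(self, recover_samples, max_settle_samples):
        self.recover_samples = recover_samples        # consecutive good samples needed
        self.max_settle_samples = max_settle_samples  # upper bound on settling time
        self.settling_for = 0
        self.good_run = 0
        self.latched = False

    def update(self, raw_valid):
        """Feed one raw validity flag; return 'valid', 'settling' or 'fault'."""
        if self.latched:
            return "fault"
        if self.settling_for == 0 and raw_valid:
            return "valid"
        # Entering or continuing the settling window.
        self.settling_for += 1
        self.good_run = self.good_run + 1 if raw_valid else 0
        if self.good_run >= self.recover_samples:
            self.settling_for = 0
            self.good_run = 0
            return "valid"
        if self.settling_for > self.max_settle_samples:
            self.latched = True   # did not recover in time: deterministic fail-safe
            return "fault"
        return "settling"
```

What the controller does while the state is "settling" (for example, holding the last good pressure) is a platform decision and is not defined here.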
5
Valve coil overheats but current looks “normal” — clamp loop too long or PWM frequency unsuitable?
Conclusion: “Normal” average current can still hide excessive RMS heating caused by ripple and poor flyback energy control.
Evidence 1: Compare I_valve_hold ripple (or inferred RMS) across PWM settings; overheating often tracks ripple amplitude, not the average.
Evidence 2: Review flyback_clamp_v behavior at turn-off; a long clamp loop can increase dissipation and inject noise back into the harness.
First fix: Change PWM frequency (or use current-regulated hold) and physically shorten the clamp return path; re-run a repeated-actuation thermal script and log drift.
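The gap between average and RMS current can be checked directly from sampled coil current. This sketch assumes evenly spaced current samples covering whole PWM periods; the function name is hypothetical. Resistive coil heating scales with the square of the RMS value, so the ratio of RMS squared to mean squared shows how much extra heating the ripple adds.

```python
import math

def mean_and_rms(samples):
    """Return (mean, rms) of evenly spaced coil-current samples."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return mean, rms
```

For example, a current that alternates between 0.5 A and 1.5 A averages 1.0 A, but its RMS squared is 1.25 A², which is 25% more resistive heating than a steady 1.0 A would produce.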
6
Two MCUs frequently mismatch — sampling not synchronized or cross-communication CRC loss?
Conclusion: Mismatch is usually timing/synchronization first, CRC loss second—unless errors spike under disturbances.
Evidence 1: Inspect mismatch_cnt versus window_ms; if mismatch falls when the window is widened, sampling alignment is the root.
Evidence 2: Track comm_crc_err_cnt or crosscheck_crc_err; rising CRC errors point to link integrity issues amplified by EMC/harness coupling.
First fix: Align sampling triggers and time-stamp comparisons (same epoch), then add bounded retries for cross-check packets; validate under EMC injection and rig scripts.
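The window-sensitivity check in Evidence 1 can be replayed offline on logged samples from both MCUs. A minimal sketch, assuming each channel is logged as time-sorted `(timestamp_ms, value)` pairs on the same epoch; the function name and the `tolerance` argument are hypothetical.

```python
def count_mismatches(samples_a, samples_b, window_ms, tolerance):
    """Pair each A sample with the nearest-in-time B sample and count mismatches.

    A pair mismatches if the timestamps differ by more than window_ms
    or the values differ by more than tolerance.
    """
    if not samples_b:
        return len(samples_a)
    mismatches = 0
    j = 0
    for t_a, v_a in samples_a:
        # Advance j while the next B sample is at least as close in time.
        while (j + 1 < len(samples_b)
               and abs(samples_b[j + 1][0] - t_a) <= abs(samples_b[j][0] - t_a)):
            j += 1
        t_b, v_b = samples_b[j]
        if abs(t_b - t_a) > window_ms or abs(v_b - v_a) > tolerance:
            mismatches += 1
    return mismatches
```

Running this over the same log with several `window_ms` values shows whether the mismatch count collapses once the window exceeds the sampling skew, which is the signature of an alignment problem rather than a data problem.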
7
Pressure drift gradually increases — sensor aging or temperature compensation coefficients not updated?
Conclusion: A slow monotonic drift is more often aging/offset shift than random EMI, but it must be proven by validation data.
Evidence 1: Trend pressure_offset_est across temperature points; compensation issues typically show temperature-correlated curvature, not a simple monotonic shift.
Evidence 2: Use the bench/rig plan to repeat the same pressure reference points; if drift repeats at fixed temperatures, coefficient update is likely required.
First fix: Lock a calibration/compensation version and re-fit coefficients from rig data; confirm by rerunning hot/cold scripts with unchanged harness routing.
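A crude first pass at separating the two hypotheses is to compare how strongly `pressure_offset_est` tracks elapsed time versus temperature. The sketch below is only a hint generator under stated assumptions: it uses linear (Pearson) correlation, so it will not capture the curvature mentioned above, and the function names and input lists are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def drift_hint(hours, temps_c, offsets):
    """Return 'time' or 'temperature', whichever correlates more with the offset."""
    r_time = pearson(hours, offsets)
    r_temp = pearson(temps_c, offsets)
    return "temperature" if abs(r_temp) > abs(r_time) else "time"
```

A "time" result leans toward aging or offset shift; a "temperature" result leans toward compensation coefficients. Either way, the conclusion still has to be confirmed with the repeated reference points from the bench/rig plan.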
8
Rig passes, but train shows false alarms — unmanaged common-mode injection path or wrong grounding strategy?
Conclusion: “Rig OK, train fails” usually indicates an uncontrolled coupling path (common-mode current or return path) rather than algorithm mistakes.
Evidence 1: Check whether comm_crc_err_cnt and validity dropouts rise only on-train; that pattern is typical of common-mode injection via harness/termination changes.
Evidence 2: Compare reset_reason/brownout_cnt rates between rig and train; added ground bounce on-train often increases brownouts.
First fix: Re-audit shield termination and clamp loop location at the vehicle entry; replicate the same injection points on the rig using longer harness segments.
9
A spike at valve turn-off triggers false events — comparator/input protection or threshold/filtering?
Conclusion: Turn-off spikes usually originate in the actuation path and couple into the sensing path through protection/return loops.
Evidence 1: Correlate false triggers with flyback_clamp_v excursions; if spikes align to clamp behavior, the clamp loop/return is the coupling source.
Evidence 2: Observe pressure AFE saturation markers or abrupt validity toggles near turn-off; persistent toggles suggest input protection layout/threshold filtering is insufficient.
First fix: Add a bounded blanking window around turn-off plus improve local input protection return; confirm by repeating the same turn-off pattern on bench.
10
After one emergency brake event, recovery fails — latch policy too strict or recovery conditions incomplete?
Conclusion: Recovery failures are typically policy/condition issues: the system is behaving as designed, but the exit criteria are not achievable in the real sequence.
Evidence 1: Inspect latch_reason and whether recovery_attempts stops early; this reveals which condition blocks recovery.
Evidence 2: Verify that required “last good” evidence exists (e.g., last_good_pressure_ts is recent and stable) before re-enabling outputs.
First fix: Make recovery conditions explicit and testable (state machine checklist), then validate with a scripted emergency-brake → clear → recover sequence on the rig.
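One way to make recovery conditions explicit and testable is to evaluate them as a named checklist and report which ones block recovery. The sketch below is illustrative: the condition names, the `state` dictionary keys other than `last_good_pressure_ts` and `recovery_attempts`, and the age limit are assumptions, not a defined policy.

```python
def recovery_blockers(state, now_ms, max_pressure_age_ms):
    """Return the names of unmet recovery conditions (empty list = may recover)."""
    checks = {
        "fault_condition_cleared": state["fault_active"] is False,
        "pressure_evidence_fresh":
            now_ms - state["last_good_pressure_ts"] <= max_pressure_age_ms,
        "speed_valid": state["speed_valid"] is True,
        "retry_budget_left":
            state["recovery_attempts"] < state["max_recovery_attempts"],
    }
    return [name for name, ok in checks.items() if not ok]
```

Logging the returned list alongside `latch_reason` turns "recovery failed" into a specific, reproducible condition that a scripted rig sequence can target.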
11
Logs claim “pressure abnormal” but the issue cannot be reproduced — which missing field breaks the evidence chain?
Conclusion: Non-reproducible “pressure abnormal” events are often logging gaps: the record lacks the context needed to separate drift, saturation, and EMC-induced glitches.
Evidence 1: Confirm the event snapshot contains pressure_offset_est and last_good_pressure_ts; without both, drift vs transient cannot be distinguished.
Evidence 2: Check whether reset_reason/brownout_cnt or comm_crc_err_cnt is present; missing power/link context makes root-cause ambiguous.
First fix: Add the missing fields to the Event Snapshot and rerun EMC injection at documented points to force a comparable event with full evidence.
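The two evidence checks above can be turned into an automated snapshot audit. This is a minimal sketch, assuming each Event Snapshot is available as a dictionary keyed by the field names used on this page; the grouping names and the function name are hypothetical.

```python
# Field names come from this page; the grouping labels are illustrative.
EVIDENCE_GROUPS = {
    "drift_vs_transient": ("pressure_offset_est", "last_good_pressure_ts"),
    "power_context": ("reset_reason", "brownout_cnt"),
    "link_context": ("comm_crc_err_cnt",),
}

def audit_snapshot(snapshot):
    """Return {group: [missing fields]} for every group with a gap."""
    gaps = {}
    for group, fields in EVIDENCE_GROUPS.items():
        missing = [f for f in fields if snapshot.get(f) is None]
        if missing:
            gaps[group] = missing
    return gaps
```

An empty result means the snapshot carries enough context to separate drift, power, and link causes; any non-empty group names the exact fields to add before rerunning the EMC injection.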
12
False trips increase after maintenance — harness/termination change or configuration/version governance issue?
Conclusion: Post-maintenance false trips most often come from physical termination/return-path changes; configuration issues are the second suspect.
Evidence 1: Compare comm_crc_err_cnt and validity dropout rates before/after maintenance; termination changes often increase CRC errors and intermittent validity failures.
Evidence 2: Audit the validation checklist: if pass/fail scripts now fail only on-train, the harness path changed; if failures appear on bench/rig too, suspect parameter version drift.
First fix: Re-verify shield termination and clamp loop placement at reworked connectors, then lock and record configuration versions in the Event Snapshot for traceability.