Onboard Battery & Charger for Rail Transit

Q: Battery shows “full” but drops fast — SOC algorithm error or rising internal resistance?

Conclusion: This is most often SOP loss from higher internal resistance, not a simple SOC display issue.

Evidence 1: Compare pre/post event windows: Vpack sag under the same Ipack increases while cell delta widens.
Evidence 2: Trend logs show rising sag_index/impedance proxy and stronger temperature dependence at low temperature.
First fix: Re-tune SOP/IR model weighting and add a load-step validation before declaring SOC as “full”.

Q: Float charging slowly raises temperature — balancing runaway or wrong charging policy?

Conclusion: Persistent float temperature rise usually indicates balancing energy plus float setpoints that keep the pack in a high-loss region.

Evidence 1: Logs show long balancing_on_time and balancing_energy increasing while cell delta is already small.
Evidence 2: Charger state stays in CV/float with repeated micro-restarts and higher Ipack ripple than expected.
First fix: Add balancing inhibit during float (or tighten entry criteria) and validate float thresholds with temperature-time limits.

Q: Intermittent insulation alarms — sensor drift or grounding/return-path issue?

Conclusion: If alarms correlate with switching or cable conditions, the dominant cause is often common-mode return-path disturbance rather than pure sensor drift.

Evidence 1: Injection measurement shows bursts of invalid intervals (CMR/saturation flags) aligned to switching edges or comm bursts.
Evidence 2: Event packets repeat with specific harness/load transitions and chassis return conditions.
First fix: Stabilize the measurement window and improve CM suppression/termination at isolation boundaries; then re-calibrate drift only if residual remains.

Q: Under disturbance the MCU resets — PMIC thresholds too sensitive or holdup energy insufficient?

Conclusion: If reset coincides with rail droop and repeated restart loops, holdup and brownout policy are the first suspects before firmware logic.

Evidence 1: reset_reason indicates brownout and Vrail droop appears in pre/post windows; restart_count rises quickly.
Evidence 2: UV threshold/debounce sit near the worst-case transient, and commit markers are missing after reset.
First fix: Increase holdup margin (or reduce load) and add UV hysteresis plus restart budget gating.

Q: After battery replacement the system behaves strangely — calibration/config not updated?

Conclusion: Post-replacement anomalies often come from mismatched calibration/config versions and stale aging parameters rather than a hardware fault.

Evidence 1: Metadata shows pack profile differs while model_ver/threshold_ver do not match the new chemistry.
Evidence 2: SOC offset or temperature mapping shifts at swap time and no commissioning marker exists.
First fix: Run commissioning: update pack profile, reset/seed SOH state, and lock config with versioned audit records.

Q: Balancing runs too frequently — real cell imbalance or measurement noise/offset?

Conclusion: Excessive balancing is commonly triggered by noisy/offset cell measurements that mimic drift, especially under temperature gradients.

Evidence 1: Cell delta spikes correlate with vibration/load changes while validity flags flicker.
Evidence 2: Balancing energy rises without net improvement and the “weak cell” ID changes frequently.
First fix: Add plausibility filtering plus minimum dwell time and inhibit balancing when measurement validity is degraded or during float/standby.

Q: Charger repeatedly restarts (start–stop loop) — control instability or protection gating?

Conclusion: Frequent restart loops usually indicate protection gating (UV/OT/timeout) interacting with charging control rather than pure loop instability.

Evidence 1: State logs show consistent inhibit reasons followed by re-entry after a short delay.
Evidence 2: Pre/post windows show values hovering near thresholds, and temperature never clears hysteresis.
First fix: Increase hysteresis/debounce, enforce cooldown/lockout, and cap restart_count before re-enable.

Q: A trip happened but the log is incomplete — storage commit/holdup issue or wrong trigger policy?

Conclusion: Missing evidence almost always points to commit/holdup weakness or trigger configuration lacking sufficient pre/post windows.

Evidence 1: gap_flag or hash-chain break plus missing commit_marker aligns with resets.
Evidence 2: Window policy is too short or triggers only on hard trips, not warnings.
First fix: Increase holdup for commit and promote warning-level triggers to capture pre-window evidence before hard trips.

Q: Timestamps look wrong across subsystems — time sync loss or time-quality not declared?

Conclusion: Bad cross-system correlation is usually time-quality loss (sync unlock or missing lock/unlock edges), not random drift.

Evidence 1: Packets show time_source/sync_lock transitions with offset/uncertainty jumps.
Evidence 2: Correlation improves when filtering to sync_lock=true periods and invalid intervals match disturbance windows.
First fix: Log time_quality on every packet and treat sync-unlock intervals as time-uncertain for analytics.

Q: False trips happen mainly in cold weather — threshold design or SOP model mismatch?

Conclusion: Cold-weather false trips often come from SOP model mismatch (IR rise) combined with thresholds lacking temperature-aware hysteresis.

Evidence 1: Larger sag at the same load clusters trips near UV/OC boundaries.
Evidence 2: Estimators lag temperature transitions and overpredict available power at cold start.
First fix: Add temperature-conditioned SOP limits and widen hysteresis/debounce for cold start, then validate using temp-cycle scripts.


Onboard Battery & Charger in rail vehicles is not just “a battery with a charger”—it is the last line of low-voltage stability and safety for critical loads. A rail-grade design must combine isolated measurements, a clear protection state machine, and black-box evidence logging so every abnormal event is explainable, auditable, and continuously improved through field feedback.

24/48/72/110 Vdc low-voltage domains • Holdup & brownout survival • Insulation / leakage monitoring • EN 50155 / EN 50121 / EN 61373 touchpoints

H2-1. System Scope & Rail Context Boundary

What this page covers

This subsystem is the rolling stock’s low-voltage energy buffer and controlled charger, designed to keep critical control, safety, and evidentiary functions stable during supply disturbances and maintenance transitions. It focuses on the end-to-end chain: battery pack sensing and protection → charge control and isolated power conversion → balancing and insulation monitoring → watchdog-safe behavior → verifiable logs.

Typical LV domains: 24 Vdc, 48 Vdc, 72 Vdc, 110 Vdc (domain choice drives thresholds, holdup energy, and load-shedding order).
Chemistry options (implementation implications): Lead-acid / NiCd / Li-ion (LFP preferred for safety margin; requires tighter SOC/SOH/SOP separation and balancing discipline).
Primary roles: LV control power stability, emergency hold-up, black-box commit power, and supply continuity for safety-relevant loads (e.g., door and safety chain).

Reset reason (POR/BOR/WDT) • Min-voltage & duration stats • Holdup time & commit marker • Insulation/leakage trend counters

What this page does not cover

Keeping the scope tight avoids mixing traction HV powertrain and wayside energy topics into this onboard LV page; the items below involve different power levels, standards emphasis, and verification evidence.

Traction DC-link and traction inverter: HV power conversion, gate-drive protection, and DC-link dynamics belong to traction powertrain pages.
Station UPS architecture: fixed-site power redundancy and facility maintenance workflows differ from rolling stock constraints.
Substation energy storage: grid-tied energy management and substation protection are outside onboard LV system boundary.

F1. Rail Power Tree Context Map. The onboard battery and charger stabilizes the LV bus and must preserve safety behavior and evidence integrity during disturbances.

H2-2. Rail Power Conditions & Transient Environment

Rail-specific stressors that shape the design

In rolling stock, the battery and charger must survive supply volatility and interference without losing control stability or corrupting evidence. The threat model is not only “does it reboot,” but also “does it reboot predictably with a complete, time-stamped record.”

Input variation (EN 50155 touchpoint): repeated UV/OV excursions can force charge-state oscillation, thermal stress, and brownout resets.
Long under-voltage windows: a slowly sagging supply can collapse individual rails (causing comms dropouts) before a full reset occurs.
Shock & vibration (EN 61373 touchpoint): intermittent contacts and sensor micro-disconnects create false alarms unless plausibility checks and counters exist.
Temperature cycling: capacity and internal resistance drift can invalidate SOC assumptions and trip protection thresholds if not temperature-aware.
EFT/Surge/ESD (EN 50121 touchpoint): interference often causes misbehavior (false trips, timebase drift, logging gaps) before outright failure.

Three non-negotiables: wide input, distinct brownout vs deep-discharge logic, and holdup

A rail charger front-end must remain functional across the LV domain’s realistic extremes, otherwise the system can bounce between CC/CV and fault states. Brownout handling must protect system stability and data integrity, while deep-discharge handling protects battery safety and lifetime—these are different policies with different recovery rules. Holdup is mandatory because “graceful shutdown + evidence commit” must complete even when the upstream LV source collapses.

V_in / V_bat / V_core minima + duration • charger_state transitions count • reset_reason + watchdog_trip_count • commit_marker + holdup_time_ms
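The separation between brownout and deep-discharge handling can be sketched as a small classifier. This is a minimal illustration, not a reference design: the function name, the `Policy` enum, and every threshold (a 24 Vdc domain, an LFP-like cell floor) are hypothetical assumptions chosen only to show that the two policies key on different signals.

```python
from enum import Enum


class Policy(Enum):
    NORMAL = "normal"
    BROWNOUT = "brownout"              # protect system stability: stop charge pull, commit evidence
    DEEP_DISCHARGE = "deep_discharge"  # protect the battery: shed loads, gated recovery


# Hypothetical thresholds for a 24 Vdc domain (illustrative only).
V_BUS_BROWNOUT = 16.8        # bus level below which brownout handling starts (V)
BROWNOUT_MIN_MS = 10         # minimum dwell so that short dips are ignored (ms)
V_CELL_DEEP_DISCHARGE = 2.5  # per-cell floor, LFP-like value (V)


def classify_supply_event(v_bus: float, below_ms: int, v_cell_min: float) -> Policy:
    """Return which policy applies to the current supply condition.

    Deep discharge is judged on the weakest cell (a battery safety and
    lifetime condition); brownout is judged on bus level plus duration
    (a system stability condition).
    """
    if v_cell_min < V_CELL_DEEP_DISCHARGE:
        return Policy.DEEP_DISCHARGE
    if v_bus < V_BUS_BROWNOUT and below_ms >= BROWNOUT_MIN_MS:
        return Policy.BROWNOUT
    return Policy.NORMAL


print(classify_supply_event(15.0, 25, 3.2))  # Policy.BROWNOUT
print(classify_supply_event(24.0, 0, 2.3))   # Policy.DEEP_DISCHARGE
print(classify_supply_event(15.0, 3, 3.2))   # Policy.NORMAL (dip too short)
```

Deep discharge is checked first because it protects the battery itself and has stricter recovery rules; a short bus dip alone does not change policy.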

F2. Transient & Brownout Survival Flow. Under rail transients, detection and classification drive state actions; evidence must be committed before recovery to avoid silent data loss.

H2-3. Battery Chemistry & Aging Model

Chemistry choice drives policy, not just capacity

In rolling stock LV systems, chemistry selection should be translated into control policies and evidence fields. The goal is predictable power delivery under temperature swing and disturbance, plus explainable aging that can be trended and audited.

LFP (Li-ion): wider safety margin; SOC estimation often needs coulomb counting + temperature + resistance trend because voltage is flatter in mid-SOC.
NMC (Li-ion): higher energy density; typically tighter thermal and protection margins; aging can accelerate at high temperature and high SOC dwell.
Lead-acid: operationally common for standby; float strategy dominates lifetime; terminal voltage helps indicate SOC but is load- and temperature-sensitive.
NiCd: robust in low temperature and high discharge; maintenance policy and capacity tracking require consistent logging and periodic verification.

Temperature-aware thresholds • Time-at-high-SOC management • Resistance (R) trend evidence • Explainable SOC/SOH/SOP

Rail aging paths and what must be observable

Rail aging should be modeled as multiple concurrent paths: cycling throughput, float/high-SOC dwell, and internal resistance rise. Internal resistance rise is often the most operationally visible because it converts load steps into voltage sag and under-voltage events.

Cycle fade: capacity loss correlates with throughput and depth-of-discharge distribution (trend Ah, not only “cycles”).
Float aging / high-SOC dwell: long standby charging can accelerate degradation; dwell-time counters matter.
Resistance rise (R↑): reduces SOP; increases voltage sag and brownout probability under the same load transient.

SOC ≠ SOH ≠ SOP. SOC describes the remaining charge relative to present usable capacity, SOH describes degradation state, and SOP describes the deliverable peak power at the current temperature and resistance. For rail stability, SOP is often the decisive metric because it predicts whether a load step will cause a bus collapse.

SOC_est / SOH_est / SOP_est • temperature_map • R_est trend (or sag_index) • UV events count + duration
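The claim that resistance rise drives SOP loss can be made concrete with a first-order Thevenin model: estimate R from a load step as ΔV/ΔI, then cap current so the terminal voltage stays above a minimum. This sketch ignores polarization and temperature dynamics; the function names and all numbers are hypothetical.

```python
def estimate_r_ohm(v_before: float, v_after: float,
                   i_before: float, i_after: float) -> float:
    """Load-step resistance estimate R = -dV/dI (discharge current positive)."""
    di = i_after - i_before
    if abs(di) < 1e-6:
        raise ValueError("load step too small to estimate resistance")
    return (v_before - v_after) / di


def sop_w(ocv: float, r_ohm: float, v_min: float, i_limit: float) -> float:
    """Peak discharge power that keeps the terminal voltage at or above v_min.

    Thevenin approximation: V_term = OCV - I * R, so I_max = (OCV - V_min) / R,
    further capped by the hardware current limit i_limit.
    """
    if ocv <= v_min:
        return 0.0
    i_max = min((ocv - v_min) / r_ohm, i_limit)
    return (ocv - i_max * r_ohm) * i_max


# Hypothetical 24 V pack: a 10 A -> 110 A step sags the terminal by 1.0 V.
r_new = estimate_r_ohm(26.2, 25.2, 10.0, 110.0)    # about 0.010 ohm
print(round(sop_w(26.4, r_new, 20.0, 300.0)))       # current-limited: about 7020 W
print(round(sop_w(26.4, 3 * r_new, 20.0, 300.0)))   # same pack, R tripled: about 4267 W
```

With SOC unchanged, tripling R cuts available power from about 7.0 kW to about 4.3 kW in this example, which is why R_est or sag_index belongs in the evidence fields.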

F3. Aging & Internal Resistance Drift Model. Aging should be treated as parallel paths. Resistance rise is a primary driver of SOP drop and voltage sag, so it must be trended with evidence fields.

H2-4. BMS Core Architecture

Architecture must separate measurement chain and safety/evidence chain

A rail BMS should be described as two linked chains: (1) measurement and estimation, and (2) safety decisions with evidence logging. Isolation boundaries and redundant sensing are not optional; they are the basis for stable behavior under high common-mode noise and for explainable faults.

Cell monitoring AFE: per-cell voltage and temperature acquisition with built-in diagnostics and plausibility checks.
Isolated measurement: defined isolation boundary to tolerate common-mode shifts while preserving measurement integrity.
Pack current sensing: isolated ΣΔ modulator path or Hall-sensor path; both must support drift detection and trend evidence.
Safety MCU: lockstep or dual-core execution for protection state machine, log commit, and recovery policy enforcement.
Balancing control: policy-driven equalization with action logging; freeze rules under brownout or thermal limits.
Watchdog + brownout detect: layered supervision; reset causes must be recorded to avoid “silent resets”.
Isolated communications: robust comms under common-mode stress; link health must be observable.

Isolated comms • Redundant voltage sense • Fault latching + timestamps • Commit marker for logs

What “fault latching” means in practice

Fault latching is a policy that preserves the first-seen context of safety-relevant events even if the stimulus disappears. This prevents “transient amnesia” where intermittent wiring, vibration-induced disconnects, or interference produces a brief fault that leaves no trace. Latching should include first_seen timestamp, last_seen timestamp, and the minimal evidence window needed for root cause.

Latch: insulation fault, over-temperature, critical under-voltage, current sensor plausibility failure.
Non-latch (telemetry only): short comm glitch with automatic recovery, non-critical temperature warning (if policy allows).
Always record: reset_reason, watchdog trips, and commit status across disturbances.
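A fault table that preserves first-seen context can be sketched as below. The fault IDs, the `LATCHING_FAULTS` set, and the clear-authorization flag are hypothetical and only mirror the latch/non-latch split above; a production design would also attach the evidence window to each entry.

```python
from dataclasses import dataclass

# Hypothetical IDs for the latching faults listed above.
LATCHING_FAULTS = {"insulation_fault", "over_temperature", "critical_uv", "current_plausibility"}


@dataclass
class FaultLatch:
    fault_id: str
    first_seen_ms: int
    last_seen_ms: int
    occurrences: int = 1
    latched: bool = False


class FaultTable:
    """Keeps first-seen context even after the stimulus disappears."""

    def __init__(self) -> None:
        self.entries: dict[str, FaultLatch] = {}

    def report(self, fault_id: str, now_ms: int) -> FaultLatch:
        entry = self.entries.get(fault_id)
        if entry is None:
            entry = FaultLatch(fault_id, first_seen_ms=now_ms, last_seen_ms=now_ms,
                               latched=fault_id in LATCHING_FAULTS)
            self.entries[fault_id] = entry
        else:
            entry.last_seen_ms = now_ms  # first_seen_ms is never overwritten
            entry.occurrences += 1
        return entry

    def clear(self, fault_id: str, service_authorized: bool) -> bool:
        """Latched faults clear only with explicit service/remote-clear authorization."""
        entry = self.entries.get(fault_id)
        if entry is None:
            return True
        if entry.latched and not service_authorized:
            return False
        del self.entries[fault_id]
        return True
```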

F4. BMS Block Diagram with Isolation Boundaries. BMS architecture should make isolation boundaries and redundancy explicit. Safety decisions must be tied to fault latching and evidence commits.

H2-5. Charging Topology & Power Stage

Energy path options and where isolation belongs

Rail onboard charging is best described as an energy path plus a protection-and-audit chain. The topology choice determines the isolation point, the controllable variables (I/V/P), and the dominant failure modes under input volatility.

AC → DC (PFC + LLC): stabilizes an intermediate bus, then provides isolated conversion. State stability at light load and robust restart policy are critical in standby-heavy operation.
DC → DC isolated: common for LV domain charging; isolation supports common-mode stress tolerance and clean measurement; brownout rules must prevent “charger pull” from collapsing the LV bus.
Multi-stage charging (CC/CV): implemented as an explicit state machine with debounced transitions and dwell-time tracking to avoid oscillation under rail input disturbances.

Isolation boundary explicit • CC/CV transitions debounced • Brownout: stop charge pull • Restart policy logged

Standby/float policy and overcharge protection must be auditable

Standby behavior is not “do nothing.” It is a controlled policy that limits high-SOC dwell and prevents thermal stress while keeping readiness. Overcharge protection should be treated as a composite condition of voltage, temperature, and time (V+T+t), not a single threshold. Charge state transitions and dwell time must be recorded for maintenance audit and root-cause analysis.

Standby/float: track time-at-high-SOC and float dwell; prefer bounded SOC windows where applicable; log entry/exit conditions.
Overcharge guard (V+T+t): raise severity when high voltage coincides with elevated temperature for sustained duration; record duration and maxima.
Audit trail: record charger_state transitions, dwell time per stage, derate reasons, and a commit marker to prove persistence across disturbances.

charger_state + dwell • Vpack / Ichg / Tmax • ov_duration + derate_reason • commit_marker
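The V+T+t overcharge guard can be sketched as a duration accumulator that runs only while high voltage and elevated temperature coincide, escalating severity with sustained duration and recording maxima. All thresholds here are hypothetical placeholders, and a separate hard per-cell OV limit (not shown) would still apply regardless of temperature.

```python
from dataclasses import dataclass


@dataclass
class OverchargeGuard:
    """Composite V+T+t overcharge condition; all thresholds are hypothetical."""
    v_cell_high: float = 3.60     # per-cell voltage treated as "high" (V), LFP-like
    t_high_c: float = 45.0        # temperature treated as "elevated" (degC)
    warn_after_s: float = 60.0    # sustained coincidence before each escalation
    derate_after_s: float = 120.0
    trip_after_s: float = 300.0
    ov_duration_s: float = 0.0    # logged as ov_duration
    v_max_seen: float = 0.0       # maxima recorded for the audit trail
    t_max_seen: float = -273.15

    def update(self, v_cell_max: float, t_max_c: float, dt_s: float) -> str:
        """Advance by dt_s seconds and return 'ok', 'warn', 'derate' or 'trip'."""
        if v_cell_max >= self.v_cell_high and t_max_c >= self.t_high_c:
            self.ov_duration_s += dt_s
            self.v_max_seen = max(self.v_max_seen, v_cell_max)
            self.t_max_seen = max(self.t_max_seen, t_max_c)
        else:
            self.ov_duration_s = 0.0  # coincidence must be continuous to escalate
        if self.ov_duration_s >= self.trip_after_s:
            return "trip"
        if self.ov_duration_s >= self.derate_after_s:
            return "derate"
        if self.ov_duration_s >= self.warn_after_s:
            return "warn"
        return "ok"
```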

F5. Rail Charger Energy Flow & Protection Chain. The charger should be designed as an energy path with an explicit protection chain and an audit log that preserves state transitions, dwell time, and commit markers.

H2-6. Balancing Strategies & Failure Risk

Balancing is a controlled intervention, not a background task

Balancing should be treated as an explicit control loop that reduces cell-to-cell divergence while avoiding thermal stress and avoiding decisions based on drifting sensors. In rail service, imbalance can translate into incorrect SOC/SOH/SOP interpretation and early protection triggers, even when most cells remain healthy.

Passive balancing: dissipative; simple failure modes; requires thermal guards, duty limits, and action logging.
Active balancing: energy transfer; improved efficiency; higher control complexity; requires strict plausibility checks and audit evidence.
Core risk: a single outlier cell can dominate pack behavior, causing misleading “pack-level” conclusions and repeated under-voltage events.

Outlier cell detection • Thermal guard + duty limit • Sensor plausibility • Convergence trend

Balancing actions must be auditable (three required records)

Every balancing action should leave a compact evidence record. This prevents silent degradation and enables maintenance teams to distinguish true cell divergence from measurement drift. For rail service, the minimum action record includes a timestamp, an energy estimate, and a trend of cell delta over time.

Timestamp: start/end times, stage and trigger reason (e.g., standby window, post-charge window, thermal guard entry).
Energy: dissipated (passive) or transferred (active) energy estimate to quantify stress and duty.
Delta trend: cell_max − cell_min trend and outlier cell IDs to verify convergence and detect sensor drift.

bal_event_ts • bal_energy_mWh • delta_cell_trend • bal_trigger_reason
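A balancing audit record and a passive energy estimate might look like the following. The bleed-resistor model E = V²/R·t assumes a constant cell voltage during the action; field names follow the log tags above, while `drift_suspect` and its 100 mWh threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalancingEvent:
    """One audit record per balancing action (field names follow the log tags)."""
    bal_event_ts: int          # start time (ms)
    end_ts: int                # end time (ms)
    cell_id: int
    bal_trigger_reason: str    # e.g. "post_charge_window"
    bal_energy_mWh: float
    delta_before_mV: float     # cell_max - cell_min at start
    delta_after_mV: float      # cell_max - cell_min at end

    @property
    def converged(self) -> bool:
        return self.delta_after_mV < self.delta_before_mV


def passive_energy_mWh(v_cell: float, r_bleed_ohm: float, duration_s: float) -> float:
    """Energy dissipated in a bleed resistor, E = V^2 / R * t, converted from J to mWh."""
    return v_cell * v_cell / r_bleed_ohm * duration_s / 3.6  # 1 mWh = 3.6 J


def drift_suspect(events: list[BalancingEvent], min_energy_mWh: float = 100.0) -> bool:
    """Flag energy spent without net convergence, a hint of measurement drift."""
    if not events:
        return False
    spent = sum(e.bal_energy_mWh for e in events)
    net_gain = events[0].delta_before_mV - events[-1].delta_after_mV
    return spent >= min_energy_mWh and net_gain <= 0.0
```

For example, a 3.3 V cell bled through 33 Ω for 10 minutes dissipates about 55 mWh; repeated records like that with no narrowing of cell delta point toward measurement drift rather than real imbalance.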

F6. Cell Imbalance Detection & Balancing Control Loop. Balancing should be a closed-loop control with plausibility checks and an audit record (timestamp, energy, delta trend) to distinguish true imbalance from sensor drift.

H2-7. Isolation & Insulation Monitoring

Insulation monitoring is a compliance safety chain, not an option

Rail onboard battery domains must support insulation resistance (Riso) measurement, ground leakage detection, and leakage trend logging. The system should expose both the estimate and its validity, so maintenance can distinguish real degradation from measurement saturation or interference.

Insulation resistance (Riso): estimate value + validity flag + update period.
Ground leakage events: severity grading (warn/derate/trip) with duration and counters.
Trend: slope/index over defined windows for predictive maintenance and audit queries.

Riso_est + Riso_valid • leak_severity + duration • Riso_trend_slope • event_ts + commit_marker

Injection method + high-CMR sensing: where errors come from

Insulation monitoring commonly uses a controlled injection signal and measures the response with a high common-mode rejection (CMR) differential front end. The measurement chain should defend against saturation, frequency-dependent CMRR loss, and interference coupling into the sense loop.

Injection stability: injected amplitude and frequency must be identifiable under rail EMI background.
High-CMR differential AFE: needs adequate common-mode range and recovery behavior; record saturation and recovery time.
Model mismatch: distributed capacitance, surface leakage (humidity/contamination), and cable routing can bias Riso_est; validity must reflect this.
Evidence discipline: when validity is false, trend updates should freeze and log “invalid interval”.

Recommended flags: afe_sat_flag • afe_recovery_time • cmrr_health_flag • cal_offset_trend • Riso_valid
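The freeze-on-invalid rule can be sketched as a trend tracker that discards samples while the AFE reports saturation or poor CMRR and logs the gap as an invalid interval. The exponential moving average and its weight are assumptions for illustration, not a prescribed Riso trend method; the class and argument names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RisoTrend:
    """Riso trend that freezes on invalid samples and records invalid intervals.

    Validity is assumed to be: afe_sat_flag cleared AND cmrr_health_flag healthy.
    """
    alpha: float = 0.1                     # EMA weight (hypothetical)
    riso_trend_kohm: float | None = None
    invalid_interval_ms: int = 0           # accumulated invalid time
    invalid_intervals: list[tuple[int, int]] = field(default_factory=list)
    _invalid_since: int | None = None

    def update(self, t_ms: int, riso_kohm: float, afe_sat: bool, cmrr_ok: bool) -> bool:
        """Feed one estimate; returns True if it was accepted into the trend."""
        valid = (not afe_sat) and cmrr_ok
        if not valid:
            if self._invalid_since is None:
                self._invalid_since = t_ms  # open an invalid interval
            return False                    # trend frozen: sample discarded
        if self._invalid_since is not None:
            self.invalid_intervals.append((self._invalid_since, t_ms))
            self.invalid_interval_ms += t_ms - self._invalid_since
            self._invalid_since = None      # close the invalid interval
        if self.riso_trend_kohm is None:
            self.riso_trend_kohm = riso_kohm
        else:
            self.riso_trend_kohm += self.alpha * (riso_kohm - self.riso_trend_kohm)
        return True
```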

F7. Insulation Injection & Detection Flow. Injection-based insulation monitoring must output both the estimate and validity, then log events and trend snapshots with commit markers for audit and maintenance.

H2-8. Protection State Machine & Safe State Logic

Protection must be implemented as a state machine with evidence windows

Rail protection is not a list of thresholds. It is a state machine that enforces safe states, defines which faults are latched (non-auto-restart), captures pre/post-trip evidence windows, and exposes remote query capability for audit and service workflows.

Fault coverage: OV, UV, OC, OT, resistance anomaly (R↑), insulation fault (Riso), and sensor-invalid conditions.
Non-auto-restart: selected safety faults must latch until explicit service/remote-clear policy allows transition.
Evidence window: capture pre/post data (V/I/T/SOC/SOP/Riso/R_est) and commit markers to survive resets.
Remote query: current state, last trip record, counters, and trend snapshots available at capability level.

fault_id + severity • pre/post window • reset_reason • remote query snapshot

Safe-state rules: graded actions and explicit recovery conditions

Safe-state logic should grade responses (warn/derate/trip) and enforce explicit recovery conditions with hysteresis and debounce. Latch levels prevent silent cycling and ensure that repeated brownout or insulation faults do not self-clear without evidence review.

Warn: record-only or advisory with counters; no disruptive action unless escalation rules trigger.
Derate: limit charge/discharge power; freeze balancing; enforce thermal guard and SOP-based limits.
Trip (Latched): stop charge/discharge; preserve evidence bundle; require service/remote-clear policy for recovery.
Recovery check: verify stability for a minimum time window before returning to Normal.
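The graded warn/derate/trip logic with a latched trip and a timed recovery check can be sketched as a small state machine. It assumes the severity input is already debounced and hysteresis-filtered upstream; the class name, the integer severity encoding, and the hold time are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    NORMAL = 0
    WARN = 1
    DERATE = 2
    TRIP_LATCHED = 3
    RECOVERY_CHECK = 4


class ProtectionFSM:
    """Graded warn/derate/trip with a latch and a timed recovery check.

    severity: 0 = none, 1 = warn, 2 = derate, 3 = trip (hypothetical encoding).
    """

    def __init__(self, recovery_hold_ms: int = 5000) -> None:
        self.state = State.NORMAL
        self.recovery_hold_ms = recovery_hold_ms
        self._clean_since: int | None = None
        self.transitions: list[tuple[int, State, State]] = []  # audit trail

    def _go(self, now_ms: int, new: State) -> None:
        if new is not self.state:
            self.transitions.append((now_ms, self.state, new))
            self.state = new

    def step(self, now_ms: int, severity: int, service_clear: bool = False) -> State:
        if self.state is State.TRIP_LATCHED:
            # No auto-restart: exit only on explicit clear with the fault gone.
            if service_clear and severity == 0:
                self._clean_since = now_ms
                self._go(now_ms, State.RECOVERY_CHECK)
            return self.state
        if severity >= 3:
            self._go(now_ms, State.TRIP_LATCHED)
        elif severity == 2:
            self._go(now_ms, State.DERATE)
        elif severity == 1:
            self._go(now_ms, State.WARN)
        elif self.state in (State.WARN, State.DERATE):
            self._clean_since = now_ms
            self._go(now_ms, State.RECOVERY_CHECK)
        elif self.state is State.RECOVERY_CHECK:
            # Return to Normal only after a stable, fault-free hold window.
            if now_ms - self._clean_since >= self.recovery_hold_ms:
                self._go(now_ms, State.NORMAL)
        return self.state
```

Every transition is appended to `transitions`, which supplies the state-transition record that the evidence packet needs.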

F8. Battery Protection State Machine. A rail-ready protection system is a state machine with explicit latch rules, evidence windows, and remote query outputs, so that faults are explainable and recoveries are controlled.

H2-9. Event Logging & Black-Box Evidence

Evidence logging is a packet, not a dump

Rail-grade logging should create an evidence packet that can reconstruct cause-and-effect and survive resets. The most useful design splits data into fast window captures (pre/post) and slow snapshots (maps and trends), then binds them to a verified timebase and integrity metadata.

Fast window (pre/post): pack voltage, current waveform, fault flags edge-aligned, charge state transitions.
Slow snapshot: cell-min/max/delta summary, temperature map summary, lifetime counters and health indices.
Metadata: device ID, firmware/config version, trigger ID, time quality (PTP/GNSS/local), and commit marker.

pre/post window • state transitions • time quality • commit marker
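A pre/post window capture is typically built on a ring buffer that always holds recent history, so the pre-trigger window already exists when a fault fires. The class below is a hypothetical sketch; sample contents (for example Vpack, Ipack, and fault flags) and window lengths in samples are assumptions.

```python
from collections import deque


class WindowCapture:
    """Pre/post-trigger window capture on a fixed-size ring buffer (sketch)."""

    def __init__(self, pre_n: int, post_n: int) -> None:
        if post_n < 1:
            raise ValueError("post window must hold at least one sample")
        self.pre = deque(maxlen=pre_n)  # always the most recent pre_n samples
        self.post_n = post_n
        self.packet = None              # frozen pre window + growing post window
        self._post_left = 0

    def trigger(self, trigger_id: str) -> None:
        if self.packet is None:         # ignore re-triggers while a capture is open
            self.packet = {"trigger_id": trigger_id, "pre": list(self.pre), "post": []}
            self._post_left = self.post_n

    def push(self, sample):
        """Feed one sample; returns the finished packet dict, else None."""
        self.pre.append(sample)         # history keeps rolling for the next event
        if self.packet is None:
            return None
        self.packet["post"].append(sample)
        self._post_left -= 1
        if self._post_left > 0:
            return None
        done, self.packet = self.packet, None
        return done
```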

Integrity + time alignment: what makes it “black-box” evidence

An evidence packet should carry integrity fields that detect both tampering and missing segments, and it should declare its time source and sync lock state. When sync is lost, packets should record the transition and mark the interval as time-uncertain to prevent incorrect cross-system correlation.

Integrity fields: payload hash, previous hash (chain), signature, and key/cert ID.
Gap detection: segment IDs and gap flags; a broken chain indicates deletion or reordering.
Time quality: time_source + sync_lock + offset/uncertainty estimate; record lock/unlock edges.
Remote query/export: last-trip packet, counters, and trend snapshot must be retrievable at capability level.

payload_hash + prev_hash • signature + key_id • time_source + sync_lock • gap_flag + segment_id
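Hash chaining and gap detection can be sketched with SHA-256 from the Python standard library. This is an illustrative scheme, not a specified packet format: the field layout and the `GENESIS` convention are assumptions, and the signature/key_id step is omitted (in practice the packet hash would be signed with a device key).

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash of the first packet (assumed convention)


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _payload_hash(payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    return _sha256(json.dumps(payload, sort_keys=True, separators=(",", ":")).encode())


def _packet_hash(segment_id: int, prev_hash: str, payload_hash: str) -> str:
    return _sha256(f"{segment_id}|{prev_hash}|{payload_hash}".encode())


def make_packet(segment_id: int, payload: dict, prev_hash: str) -> dict:
    """Build one packet chained to its predecessor (signature and key_id omitted)."""
    ph = _payload_hash(payload)
    return {"segment_id": segment_id, "payload": payload, "payload_hash": ph,
            "prev_hash": prev_hash, "packet_hash": _packet_hash(segment_id, prev_hash, ph)}


def verify_chain(packets: list[dict]) -> list[tuple[int, str]]:
    """Return (segment_id, finding) pairs; an empty list means the chain is intact."""
    findings = []
    prev_hash, prev_seg = GENESIS, None
    for p in packets:
        seg = p["segment_id"]
        if _payload_hash(p["payload"]) != p["payload_hash"]:
            findings.append((seg, "payload_modified"))
        if p["prev_hash"] != prev_hash:
            findings.append((seg, "chain_break"))   # deletion or reordering
        if prev_seg is not None and seg != prev_seg + 1:
            findings.append((seg, "gap_flag"))      # missing segment IDs
        if _packet_hash(seg, p["prev_hash"], p["payload_hash"]) != p["packet_hash"]:
            findings.append((seg, "packet_hash_mismatch"))
        prev_hash, prev_seg = p["packet_hash"], seg
    return findings
```

Deleting a packet shows up as both a chain break and a segment gap, while editing a payload in place breaks only its payload hash.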

F9. Battery Event Evidence Packet Structure. A rail-grade evidence packet is layered: fast pre/post windows for causality, slow snapshots for context, declared time quality, and signed hash chaining for tamper/gap detection.

H2-10. EMC & Rail Compliance Mapping

Standards mapped into engineering actions and log fields

Rail compliance becomes actionable when each standard is translated into: requirement intent → design actions → test method → required log fields. EMC should be treated as return-path design and common-mode current control, not only as “add a filter”.

EN 50155 — Power and environmental operating conditions

Wide input operation, brownout resilience, thermal cycling behavior, and functional continuity verified through controlled voltage/temperature profiles and restart budgets.

uv_event_count • restart_count • derate_reason • recovery_time_ms • time_at_high_SOC

EN 50121 — EMC emission/immunity behavior

Immunity events should yield explainable behavior: no silent data corruption, bounded recovery time, and declared invalid intervals for saturated measurement chains.

reset_reason • afe_sat_flag • comm_error_burst • invalid_interval_ms • recovery_time_ms

EN 61373 — Shock and vibration robustness

Vibration-induced intermittency and drift must be detectable: connector/ground reference changes should raise plausibility failures and be tied to evidence packets.

intermittent_counter • sensor_plausibility_fail • connector_fault_flag • fault_counter • last_trip

requirement → action • test → record • time-stamped evidence • bounded recovery

Common-mode path thinking: the practical EMC view

Common-mode problems typically start at a switching node (high dv/dt and di/dt), couple through parasitic capacitance into cable/shield structures, and return through chassis paths into sensitive analog or digital domains. Mitigation aims to control the return path and isolate sensitive references.

Source: switching node edges create common-mode currents via parasitic coupling.
Path: cable shields and harness geometry provide unintended return paths to chassis.
Victims: AFE saturation, MCU resets, and comm burst errors.
Mitigation tags: return-path control, shield termination, isolation boundary, CM suppression placement, loop-area reduction.
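A first-order estimate shows why the switching node dominates: the common-mode current injected through a parasitic capacitance scales with the edge rate. The capacitance and slew values below are illustrative assumptions, not measurements from any specific design.

```latex
i_{\mathrm{CM}} = C_{\mathrm{par}}\,\frac{dv}{dt},
\qquad
C_{\mathrm{par}} = 50\ \mathrm{pF},\quad \frac{dv}{dt} = 20\ \mathrm{V/ns}
\;\Rightarrow\;
i_{\mathrm{CM}} = \left(50\times10^{-12}\ \mathrm{F}\right)\left(20\times10^{9}\ \mathrm{V/s}\right) = 1\ \mathrm{A}\ \text{(peak, during the edge)}
```

Reducing either the coupling capacitance or the edge rate reduces the injected current; return-path control and shield termination then decide where the remaining current flows and whether it reaches a victim circuit.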

F10. Common-Mode Noise Path & Mitigation. Common-mode issues follow a path: switching node → parasitic coupling → cable/shield → chassis return → victim circuits. Mitigation is return-path control with evidence logging.

H2-11. Validation & Field Feedback Loop

Why this chapter exists: BMS is a living aging-model system

Rail battery reliability improves when validation and field telemetry form a closed loop: tests generate evidence bundles, field events refine models, and updates ship through regression-controlled releases with versioned thresholds and policies.

model_ver (SOH/SOP) • threshold_ver • policy_ver (balancing) • evidence_bundle • regression_pass

Executable checklist: bring-up (measurement + decision + evidence)

Bring-up is not just “power-on”. It verifies measurement integrity (validity and saturation), protection state machine transitions, and evidence persistence across resets.

Measurement chain: confirm cell-voltage validity, current zero-drift behavior, and temperature plausibility flags.
Decision chain: verify warn/derate/trip/latch/recovery transitions and debounce/hysteresis behavior.
Evidence chain: validate pre/post window capture and commit markers survive brownout/reset.

MPN examples (bring-up relevant):

ADI LTC6813 (cell monitor AFE) • TI BQ76952 (battery monitor) • TI AMC1301 (isolated amplifier) • ADI AD7403 (isolated ΣΔ mod) • TI TPS3839 (watchdog/supervisor) • Infineon AURIX TC3xx (safety MCU)

Deep-discharge recovery validation: recovery gating + data trust

The goal is controlled recovery without restart storms and without silently trusting corrupted or time-uncertain data. Recovery must be gated by voltage hysteresis, thermal safety, and restart budgets, and it must produce an evidence packet for audit.

During UV/deep discharge: inhibit charge/discharge actions as defined, freeze trend updates when validity is false.
Recovery check: voltage/temperature back inside safe window for a minimum time; enforce restart budget limits.
Post-recovery: SOC realignment flagging, last-trip packet exportable, and commit marker confirmed.
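The recovery check and restart budget can be combined into one gate. This is a sketch under assumed parameters: the re-enable voltage sits above the UV trip level to provide hysteresis, the temperature window and dwell time are hypothetical, and the budget counts re-enables inside a sliding time window.

```python
from collections import deque


class RecoveryGate:
    """Gate re-enable after UV/deep discharge: hysteresis, dwell, restart budget."""

    def __init__(self, v_recover: float, t_min_c: float, t_max_c: float,
                 dwell_ms: int, max_restarts: int, budget_window_ms: int) -> None:
        self.v_recover = v_recover         # set above the UV trip level (hysteresis)
        self.t_min_c, self.t_max_c = t_min_c, t_max_c
        self.dwell_ms = dwell_ms
        self.max_restarts = max_restarts
        self.budget_window_ms = budget_window_ms
        self._ok_since = None
        self._restarts = deque()           # timestamps of recent re-enables

    def allow_restart(self, now_ms: int, v_pack: float, t_c: float) -> bool:
        in_window = v_pack >= self.v_recover and self.t_min_c <= t_c <= self.t_max_c
        if not in_window:
            self._ok_since = None          # any excursion restarts the dwell timer
            return False
        if self._ok_since is None:
            self._ok_since = now_ms
        if now_ms - self._ok_since < self.dwell_ms:
            return False                   # not yet stable for the minimum time
        while self._restarts and now_ms - self._restarts[0] > self.budget_window_ms:
            self._restarts.popleft()       # forget restarts outside the budget window
        if len(self._restarts) >= self.max_restarts:
            return False                   # budget exhausted: stay inhibited
        self._restarts.append(now_ms)
        self._ok_since = None
        return True
```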

MPN examples (recovery control & logging):

TI TPS2121 (power mux) • ADI LTC4368 (surge stopper) • TI ISO7741 (digital isolator) • Microchip ATECC608B (secure element ID) • Fujitsu MB85RS64V (FRAM, small logs)

Thermal cycling: validate temperature dependence of SOH/SOP + thresholds

Temperature cycling should validate that model outputs (SOH/SOP) separate true aging from temperature-driven capacity and resistance shifts, while thresholds remain robust (low false trips, bounded recovery time) across temperature ranges.

Model checks: resistance/sag indices vs temperature; consistency of SOC/energy accounting.
Protection checks: OT/UT behaviors, derate entry/exit stability, and latch rules where applicable.
Evidence outputs: temperature map summaries tied to pre/post windows and time quality fields.

MPN examples (sensing & robustness):

ADI ADT7420 (digital temp sensor) • TI TMP117 (high-accuracy temp) • ADI ADXL355 (low-noise accel) • TI TPS3703 (window comparator)

Disturbance injection: declare invalid intervals + bounded recovery

Immunity validation should prove there is no silent corruption. When measurement chains saturate, the design must declare invalid intervals, freeze trend updates, and recover within a defined bound while logging reset reasons and error bursts.

After ESD/EFT/surge events: record AFE saturation flags and recovery times.
Communications behavior: capture burst error counters and time alignment edges.
Evidence discipline: evidence packets must show time quality and integrity status for the event interval.

MPN examples (immunity + isolation):

TI ISO1042 (isolated CAN) • ADI ADuM140D (digital isolator) • TI SN65HVD1050 (CAN transceiver) • TI TPS2660 (eFuse/hot-swap)

Aging trend comparison: lab vs field alignment

Aging validation compares lab profiles and field distributions using the same indicators, so drift in real operations triggers targeted updates. Key outputs are false-trip rate, drift indices, and mismatch flags mapped back to test scripts and evidence bundles.

Compare: capacity fade, resistance rise, imbalance energy, deep-discharge counts, and thermal hot-spot indices.
Normalize by: time, mileage equivalent, cycle count, and cumulative Ah.
Trigger criteria: mismatch thresholds start an update task (SOH model or threshold/policy revision) with traceable evidence.
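Normalization and mismatch triggering can be sketched as comparing per-throughput rates from lab and field. Normalizing by cumulative Ah is one of the options listed above; the function names, indicator names, and the 25 % relative threshold are hypothetical.

```python
def normalized_rate(delta: float, cumulative_ah: float) -> float:
    """Indicator change per 1000 Ah of throughput."""
    if cumulative_ah <= 0:
        raise ValueError("throughput must be positive")
    return delta / cumulative_ah * 1000.0


def mismatch_flags(lab_rates: dict[str, float], field_rates: dict[str, float],
                   rel_threshold: float = 0.25) -> dict[str, bool]:
    """Flag indicators whose field rate deviates from the lab rate by more than rel_threshold."""
    flags = {}
    for name, lab_rate in lab_rates.items():
        field_rate = field_rates.get(name)
        if field_rate is None or lab_rate == 0:
            continue  # not comparable; left for manual review
        flags[name] = abs(field_rate - lab_rate) / abs(lab_rate) > rel_threshold
    return flags


lab = {"capacity_fade_pct": 0.8, "r_rise_mohm": 2.0}     # per 1000 Ah (hypothetical)
fleet = {"capacity_fade_pct": 1.1, "r_rise_mohm": 2.2}
print(mismatch_flags(lab, fleet))  # {'capacity_fade_pct': True, 'r_rise_mohm': False}
```

A `True` flag would open an update task (SOH model or threshold/policy revision) linked to the evidence bundles that produced the field rate.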

MPN examples (data retention / robustness):

Infineon S25FL (SPI NOR, logs) • Micron MTFC (industrial eMMC) • Renesas 8A34001 (jitter cleaner PLL) • u-blox ZED-F9T (timing GNSS)

Field feedback: SOH model, thresholds, and balancing policy updates

Field telemetry should feed controlled updates. Each update must be versioned, justified by evidence packets and statistics, validated by regression scripts, and released with staged rollout and rollback readiness.

SOH algorithm update: recalibrate aging/impedance mapping using field distributions; publish model_ver and applicability notes.
Threshold update: tune hysteresis/debounce and latch policy using false-trip evidence; publish threshold_ver.
Balancing strategy update: adjust start/stop windows and limits using imbalance trend and thermal risk; publish policy_ver.
Governance: every change links to evidence_bundle IDs and passes regression gates before release.

MPN examples (update governance & identity):

Microchip ATECC608B (device identity), Infineon OPTIGA Trust M (secure ID), TI TMS570 (safety MCU family)

F11. A closed-loop validation process turns field events into versioned improvements: evidence packets → drift analytics → controlled updates → regression gates → staged release.



H2-12. FAQs

Format rule: each answer contains one conclusion, two evidence checks, and one first fix, and maps back to the relevant chapters (H2-2 to H2-11).

Battery shows “full” but drops fast — SOC algorithm error or rising internal resistance? → H2-3 / H2-9

Conclusion: This is most often SOP loss from higher internal resistance, not a simple SOC display issue.

Evidence 1: Compare pre/post event windows: Vpack sag under the same current (Ipack) increases while cell delta widens.
Evidence 2: Trend logs show rising sag_index/impedance proxy and a stronger temperature dependence at low temperature.
First fix: Re-tune SOP/IR model weighting and add a “load-step validation” before declaring SOC as “full”.

MPN examples: ADI LTC6813 (cell monitor), TI BQ76952 (battery monitor), TI AMC1301 (isolated amplifier), ADI AD7403 (isolated ΣΔ).
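The load-step validation in the first fix can be sketched as two steps. First, estimate apparent DC resistance from a known current step. Then refuse a plain "full" when that resistance exceeds a baseline by a margin. The 98% SOC level, the 1.5x ratio limit and the status strings are hypothetical. The sketch also assumes the baseline was taken at a comparable temperature and SOC.

```python
def step_resistance_ohm(v_pre: float, i_pre: float, v_post: float, i_post: float) -> float:
    """Apparent DC resistance from a load step (discharge current positive)."""
    di = i_post - i_pre
    if abs(di) < 1.0:
        raise ValueError("load step too small for a meaningful estimate")
    return (v_pre - v_post) / di


def full_status(soc_pct: float, r_ohm: float, r_baseline_ohm: float,
                soc_full_pct: float = 98.0, r_ratio_limit: float = 1.5) -> str:
    """Classify 'full' only when both charge and deliverable power check out."""
    if soc_pct < soc_full_pct:
        return "not_full"
    if r_ohm > r_ratio_limit * r_baseline_ohm:
        return "full_sop_limited"  # charge is present, deliverable power is not
    return "full"
```

For example, a 3 V sag over a 50 A step gives 0.06 Ω. Against a 0.03 Ω baseline, a pack at 99% SOC would be reported as `full_sop_limited` rather than `full`.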

Float charging slowly raises temperature — balancing runaway or wrong charging policy? → H2-5 / H2-6

Conclusion: Persistent float temperature rise usually indicates balancing energy plus float setpoints that keep the pack in a high-loss region.

Evidence 1: Logs show long balancing_on_time and balancing_energy increasing while cell delta is already small.
Evidence 2: Charger state stays in CV/float with repeated micro-restarts and higher Ipack ripple than expected.
First fix: Add balancing inhibit during float (or tighten entry criteria) and validate float thresholds with temperature-time limits.

MPN examples: TI UCC25630 (LLC controller), TI UCC28180 (PFC controller), TI BQ76952 (balancing control), TI TPS3839 (supervisor).
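A minimal sketch of the first fix, under assumed parameters, has two parts. Balancing is inhibited while the charger is in CV/float and otherwise requires a real imbalance to start. A temperature-time monitor flags float operation that stays hot for too long. The state names, the 30 mV entry delta and the 40 °C / 30 min limit are hypothetical placeholders.

```python
INHIBIT_STATES = ("cv", "float")  # hypothetical charger state names


def balancing_allowed(charger_state: str, cell_delta_mv: float,
                      entry_delta_mv: float = 30.0) -> bool:
    """Inhibit balancing in CV/float and require a real imbalance to start."""
    if charger_state in INHIBIT_STATES:
        return False
    return cell_delta_mv >= entry_delta_mv


class FloatThermalLimit:
    """Flags float operation that stays above a temperature for too long."""

    def __init__(self, temp_limit_c: float = 40.0, max_time_s: float = 1800.0):
        self.temp_limit_c = temp_limit_c
        self.max_time_s = max_time_s
        self.time_above_s = 0.0

    def update(self, temp_c: float, dt_s: float) -> bool:
        if temp_c > self.temp_limit_c:
            self.time_above_s += dt_s
        else:
            self.time_above_s = 0.0  # continuous-exposure rule (assumed)
        return self.time_above_s > self.max_time_s
```

Resetting the timer when the temperature drops below the limit is one design choice. An accumulating budget over the whole float period would be stricter.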

Intermittent insulation alarms — sensor drift or grounding/return-path issue? → H2-7 / H2-10

Conclusion: If alarms correlate with switching or cable conditions, the dominant cause is often common-mode return-path disturbance rather than pure sensor drift.

Evidence 1: Insulation injection measurement shows bursts of invalid intervals (CMR/saturation flags) aligned to switching edges or comm bursts.
Evidence 2: Event packets show repeatability with specific harness states (door/HVAC load transitions) and chassis return conditions.
First fix: Stabilize the injection measurement window and improve CM suppression/termination at isolation boundaries; then re-calibrate drift only if residual remains.

MPN examples: TI ISO7741 (digital isolator), ADI ADuM140D (isolator), TI SN65HVD1050 (CAN transceiver), TI ISO1042 (isolated CAN).
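"Stabilizing the measurement window" can be sketched as follows. Flagged samples (CMR or saturation) are dropped, and the whole window is declared invalid when too few valid samples remain. An alarm is raised only from a valid estimate. The sample tuple layout, the 80% valid-fraction rule and the kΩ thresholds are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple

# One sample: (r_iso_kohm, cmr_flag, saturation_flag)
Sample = Tuple[float, bool, bool]


def window_estimate(samples: List[Sample], min_valid_fraction: float = 0.8) -> Optional[float]:
    """Median insulation resistance of a window, or None if the window is invalid."""
    valid = [r for r, cmr, sat in samples if not cmr and not sat]
    if not samples or len(valid) / len(samples) < min_valid_fraction:
        return None  # declare the whole window invalid
    return statistics.median(valid)


def insulation_alarm(estimate: Optional[float], alarm_kohm: float) -> bool:
    """Alarm only on a valid estimate below the threshold."""
    return estimate is not None and estimate < alarm_kohm
```

An invalid window is recorded as an invalid interval, not as a low-insulation alarm. This is what separates switching-edge disturbance from genuine drift.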

Under disturbance the MCU resets — PMIC thresholds too sensitive or holdup energy insufficient? → H2-2 / H2-4

Conclusion: If reset coincides with rail droop and repeated restart loops, holdup and brownout policy are the first suspects before firmware logic.

Evidence 1: Reset_reason = brownout and Vrail droop appears in pre/post windows; restart_count climbs in a short time.
Evidence 2: PMIC undervoltage threshold and debounce are near the worst-case transient; time_quality remains valid but commit markers are missing after reset.
First fix: Increase holdup margin (or reduce load during event) and add UV hysteresis + restart budget gating for safe recovery.

MPN examples: TI TPS2121 (power mux), TI TPS3839 (supervisor), TI TPS2660 (eFuse/hot-swap), ADI LTC4368 (OV/UV and reverse-supply protection controller).
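The UV hysteresis part of the first fix can be sketched as a comparator with debounce on entry and a higher release level. The 3.00 V trip, 3.15 V release and three-sample debounce are hypothetical values for a 3.3 V rail. Restart budget gating would be layered on top and is not shown here.

```python
class UvComparator:
    """Brownout detector with debounce on entry and hysteresis on release."""

    def __init__(self, trip_v: float = 3.00, release_v: float = 3.15, debounce_samples: int = 3):
        if release_v <= trip_v:
            raise ValueError("release_v must sit above trip_v")
        self.trip_v = trip_v
        self.release_v = release_v
        self.debounce_samples = debounce_samples
        self.asserted = False
        self._below = 0

    def sample(self, v_rail: float) -> bool:
        if not self.asserted:
            # Count consecutive samples under the trip level; a single dip is ignored.
            self._below = self._below + 1 if v_rail < self.trip_v else 0
            if self._below >= self.debounce_samples:
                self.asserted = True
        elif v_rail >= self.release_v:
            # Release only above the higher level, so ripple near trip_v cannot chatter.
            self.asserted = False
            self._below = 0
        return self.asserted
```

A one-sample transient dip never asserts UV. A sustained droop asserts and stays asserted until the rail clearly recovers.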

After battery replacement the system behaves strangely — calibration/config not updated? → H2-3 / H2-11

Conclusion: Post-replacement anomalies often come from mismatched calibration/config versions and stale aging parameters rather than a hardware fault.

Evidence 1: Device metadata shows pack_id/cell_count differs from stored config; model_ver/threshold_ver do not match the new battery type.
Evidence 2: SOC offset or temperature mapping shifts abruptly at swap time; evidence packets show no “commissioning” marker.
First fix: Run a commissioning workflow: update pack profile, reset/seed SOH state, and lock config with versioned audit records.

MPN examples: Microchip ATECC608B (device identity), Infineon OPTIGA Trust M (secure ID), Fujitsu MB85RS64V (FRAM for config markers).
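A pre-commissioning check like the one below could gate the workflow. It compares the installed pack against the stored configuration and reports every mismatch before the pack profile is updated and the SOH state is seeded. All dataclass fields and example values are hypothetical. They mirror the pack_id, cell_count, model_ver and threshold_ver fields and the commissioning marker mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackProfile:          # read from the installed pack (hypothetical fields)
    pack_id: str
    cell_count: int
    chemistry: str


@dataclass
class StoredConfig:         # persisted controller configuration (hypothetical fields)
    pack_id: str
    cell_count: int
    model_ver: str
    threshold_ver: str
    qualified_chemistries: Tuple[str, ...]
    commissioning_marker: bool = False


def commissioning_issues(pack: PackProfile, cfg: StoredConfig) -> List[str]:
    """List mismatches that must be resolved before the config is locked."""
    issues = []
    if pack.pack_id != cfg.pack_id:
        issues.append("pack_id differs from stored config")
    if pack.cell_count != cfg.cell_count:
        issues.append("cell_count differs from stored config")
    if pack.chemistry not in cfg.qualified_chemistries:
        issues.append(f"{cfg.model_ver}/{cfg.threshold_ver} not qualified for {pack.chemistry}")
    if not cfg.commissioning_marker:
        issues.append("no commissioning marker recorded")
    return issues
```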

Balancing runs too frequently — real cell imbalance or measurement noise/offset? → H2-4 / H2-6

Conclusion: Excessive balancing is commonly triggered by noisy/offset cell measurements that mimic drift, especially under temperature gradients.

Evidence 1: Cell delta spikes correlate with vibration/load changes while temperature map shows strong gradients and validity flags flicker.
Evidence 2: Balancing energy rises without net improvement in delta trend; the same “weak cell” ID changes frequently over time.
First fix: Add plausibility filtering + minimum dwell time and inhibit balancing when measurement validity is degraded or during float/standby.

MPN examples: ADI LTC6813 (cell monitor AFE), TI BQ76952 (monitor/balancing), TI TPS3703 (window supervisor), TI AMC1301 (isolated sensing).
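Plausibility filtering plus minimum dwell can be sketched as a gate that selects a bleed target only under three conditions. Readings must be valid and in range, the pack must not be in float/standby, and the same highest cell must persist for several consecutive updates. The plausible range, 20 mV entry delta, five-update dwell and mode names are hypothetical.

```python
from typing import List, Optional, Tuple


class BalancingGate:
    """Hypothetical gate: balance a cell only after a plausible, persistent imbalance."""

    def __init__(self, entry_delta_mv: float = 20.0, min_dwell: int = 5,
                 plausible_mv: Tuple[float, float] = (2500.0, 4300.0)):
        self.entry_delta_mv = entry_delta_mv
        self.min_dwell = min_dwell
        self.plausible_mv = plausible_mv
        self.candidate: Optional[int] = None
        self.dwell = 0

    def _reset(self) -> None:
        self.candidate, self.dwell = None, 0

    def update(self, cell_mv: List[float], valid: bool, mode: str) -> Optional[int]:
        """Return the cell index to bleed, or None while the gate stays closed."""
        lo, hi = self.plausible_mv
        plausible = bool(cell_mv) and all(lo <= v <= hi for v in cell_mv)
        if not (valid and plausible) or mode in ("float", "standby"):
            self._reset()
            return None
        top = max(range(len(cell_mv)), key=lambda i: cell_mv[i])  # highest cell = bleed target
        if cell_mv[top] - min(cell_mv) < self.entry_delta_mv:
            self._reset()
            return None
        if top == self.candidate:
            self.dwell += 1
        else:
            self.candidate, self.dwell = top, 1  # target ID changed: restart the dwell
        return top if self.dwell >= self.min_dwell else None
```

A target ID that flickers between cells never accumulates dwell. This addresses the "weak cell ID changes frequently" signature directly.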

Charger repeatedly restarts (start–stop loop) — control instability or protection gating? → H2-5 / H2-8

Conclusion: Frequent restart loops usually indicate protection gating (UV/OT/timeout) interacting with charging control rather than pure loop instability.

Evidence 1: State machine logs show transitions into inhibit with consistent reason codes (UV/OT/timer), then re-entry after a short delay.
Evidence 2: Ipack/Vpack windows show boundary hovering near thresholds; temperature rises slowly but never fully clears hysteresis.
First fix: Increase hysteresis/debounce on gating, enforce a cooldown/lockout timer, and cap restart_count before allowing re-enable.

MPN examples: TI UCC25630 (LLC), TI UCC28180 (PFC), TI TPS3839 (supervisor), TI TPS2660 (hot-swap/eFuse).
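The cooldown/lockout and restart cap can be sketched as a small governor consulted before each re-enable. The 60 s cooldown, the limit of three re-enables per hour and the latched lockout are hypothetical parameters.

```python
from typing import List, Optional


class RestartGovernor:
    """Cooldown after each inhibit plus a cap on re-enables per rolling window."""

    def __init__(self, cooldown_s: float = 60.0, max_restarts: int = 3, window_s: float = 3600.0):
        self.cooldown_s = cooldown_s
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.restart_times: List[float] = []
        self.last_inhibit_s: Optional[float] = None
        self.locked_out = False

    def on_inhibit(self, t_s: float) -> None:
        self.last_inhibit_s = t_s

    def try_reenable(self, t_s: float) -> bool:
        if self.locked_out:
            return False  # latched: needs an explicit maintenance clear
        if self.last_inhibit_s is not None and t_s - self.last_inhibit_s < self.cooldown_s:
            return False  # still cooling down
        self.restart_times = [r for r in self.restart_times if t_s - r < self.window_s]
        if len(self.restart_times) >= self.max_restarts:
            self.locked_out = True  # restart_count cap reached inside the window
            return False
        self.restart_times.append(t_s)
        return True
```

With these values, a fourth inhibit inside the hour latches the charger off instead of feeding the start-stop loop.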

A trip happened but the log is incomplete — storage commit/holdup issue or wrong trigger policy? → H2-9 / H2-2

Conclusion: Missing evidence almost always points to commit/holdup weakness or trigger configuration that does not allocate enough pre/post window.

Evidence 1: gap_flag set or hash chain breaks; commit_marker absent after reset; restart_reason aligns with the event time.
Evidence 2: Window lengths are too short to capture the causal lead-in (no pre-window), or triggers fire only on hard trips, not on warnings.
First fix: Increase holdup for storage commit and promote “warning-level triggers” to capture pre-window evidence before the hard trip.

MPN examples: TI TPS2121 (power mux), TI TPS2660 (eFuse), Fujitsu MB85RS64V (FRAM), Micron industrial eMMC (log storage).
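Promoting warning-level triggers can be sketched with a rolling pre-window buffer. A warning (or a hard trip) snapshots the buffered lead-in and opens a packet that collects a fixed post-window. The warning threshold and window sizes are hypothetical. Storage commit and holdup are not modeled.

```python
from collections import deque
from typing import Any, Dict, List, Optional


class EvidenceRecorder:
    """Opens an evidence packet on a warning (or hard trip) with a pre-window lead-in."""

    def __init__(self, warn_v: float = 3.1, pre_samples: int = 3, post_samples: int = 2):
        self.warn_v = warn_v
        self.post_samples = post_samples
        self._pre = deque(maxlen=pre_samples)  # rolling lead-in buffer
        self._open: Optional[Dict[str, Any]] = None
        self.packets: List[Dict[str, Any]] = []

    def sample(self, t_s: float, v: float, hard_trip: bool = False) -> None:
        if self._open is None and (hard_trip or v < self.warn_v):
            self._open = {
                "trigger_t": t_s,
                "level": "trip" if hard_trip else "warning",
                "pre": list(self._pre),  # snapshot of the lead-in before this sample
                "post": [],
            }
        if self._open is not None:
            self._open["post"].append((t_s, v))
            if len(self._open["post"]) >= self.post_samples:
                self.packets.append(self._open)
                self._open = None
        self._pre.append((t_s, v))
```

Suppose a warning at t = 3 precedes a hard trip at t = 4. The trip then lands inside the packet that the warning opened, so the causal lead-in is preserved.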

Timestamps look wrong across subsystems — time sync loss or time-quality not declared? → H2-9 / H2-10

Conclusion: Bad cross-system correlation is usually a time-quality issue (loss of sync or missing lock/unlock edges), not “random clock drift”.

Evidence 1: Evidence packets show time_source changes or sync_lock transitions without a recorded edge; offset/uncertainty jumps.
Evidence 2: Event correlation improves when filtering packets to “sync_lock=true” periods; invalid intervals match disturbance windows.
First fix: Log time_quality fields on every packet and treat sync unlock intervals as time-uncertain for analytics and maintenance decisions.

MPN examples: u-blox ZED-F9T (timing GNSS), Renesas 8A34001 (jitter cleaner PLL), PTP-capable TSN switch silicon (time stamping).
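Treating unlocked intervals as time-uncertain can be sketched as a filter applied before cross-subsystem correlation. The packet fields time_source and sync_lock follow the names above. The `uncertainty_us` field, the 100 µs limit and the 10 ms pairing window are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvidencePacket:
    t_s: float
    time_source: str       # e.g. "gnss", "ptp", "rtc" (hypothetical labels)
    sync_lock: bool
    uncertainty_us: float  # hypothetical declared time uncertainty


def time_trusted(p: EvidencePacket, max_uncertainty_us: float = 100.0) -> bool:
    return p.sync_lock and p.uncertainty_us <= max_uncertainty_us


def correlate(a: List[EvidencePacket], b: List[EvidencePacket],
              window_s: float = 0.01) -> List[Tuple[EvidencePacket, EvidencePacket]]:
    """Pair events from two subsystems, skipping packets whose time is not trusted."""
    trusted_b = [pb for pb in b if time_trusted(pb)]
    return [(pa, pb) for pa in a if time_trusted(pa)
            for pb in trusted_b if abs(pa.t_s - pb.t_s) <= window_s]
```

Packets recorded while unlocked are kept in the log but excluded from correlation. They would need separate, explicitly uncertain handling in maintenance analytics.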

False trips happen mainly in cold weather — threshold design or SOP model mismatch? → H2-3 / H2-8

Conclusion: Cold-weather false trips often come from SOP model mismatch (IR rise) combined with thresholds lacking temperature-aware hysteresis.

Evidence 1: At low temperature, the same load produces larger sag; trips cluster near UV/OC boundaries without true overcurrent.
Evidence 2: SOH/SOP estimators lag temperature transitions; the model predicts more available power than reality in cold start conditions.
First fix: Add temperature-conditioned SOP limits and widen hysteresis/debounce for cold-start, then validate with controlled temp-cycle scripts.

MPN examples: TI TMP117 (temp sensor), ADI ADT7420 (temp), TI TPS3703 (window supervisor), ADI LTC6813 (cell monitor).
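Temperature-conditioned SOP limits can be sketched as a derating factor, linearly interpolated from a table, that caps the model's power prediction. A second helper widens UV hysteresis below a cold threshold. Every table entry and voltage here is a hypothetical placeholder. Real values would come from the controlled temp-cycle scripts.

```python
from typing import List, Tuple

# Hypothetical derating table: (temperature_c, fraction of model-predicted power allowed)
DERATING: List[Tuple[float, float]] = [(-20.0, 0.3), (0.0, 0.6), (10.0, 0.85), (25.0, 1.0)]


def sop_limit_kw(model_kw: float, temp_c: float,
                 table: List[Tuple[float, float]] = DERATING) -> float:
    """Cap the model's SOP with a temperature-conditioned factor (linear interpolation)."""
    if temp_c <= table[0][0]:
        factor = table[0][1]
    elif temp_c >= table[-1][0]:
        factor = table[-1][1]
    else:
        factor = table[-1][1]
        for (t0, f0), (t1, f1) in zip(table, table[1:]):
            if t0 <= temp_c <= t1:
                factor = f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
                break
    return model_kw * factor


def uv_hysteresis_v(temp_c: float, base_v: float = 0.05, cold_v: float = 0.10,
                    cold_below_c: float = 0.0) -> float:
    """Wider hysteresis below a cold threshold so cold-start sag does not chatter the trip."""
    return cold_v if temp_c < cold_below_c else base_v
```

At 5 °C this table allows 72.5% of the model's predicted power. Clamping outside the table range keeps the limit defined during extreme cold starts.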

Communication errors spike during charging — EMI common-mode path or isolation boundary weakness? → H2-10 / H2-7

Conclusion: Communication error bursts during charging typically indicate common-mode current paths and imperfect isolation boundary treatment rather than a protocol bug.

Evidence 1: comm_error_burst aligns with switching edges and AFE saturation flags; improvement occurs when filtering by “quiet switching” intervals.
Evidence 2: Errors increase with harness configuration or shield termination changes; insulation measurement shows more invalid intervals concurrently.
First fix: Control CM return path (shield termination, reference strategy) and add CM suppression at the interface; re-test with disturbance injection scripts.

MPN examples: TI SN65HVD1050 (CAN transceiver), TI ISO1042 (isolated CAN), ADI ADuM140D (isolator), TI ISO7741 (isolator).

Deep-discharge recovery takes too long — protection policy too conservative or hardware holdup too small? → H2-2 / H2-8 / H2-11

Conclusion: Slow recovery usually results from conservative gating interacting with marginal holdup, causing repeated partial recoveries instead of one clean restart.

Evidence 1: restart_count grows while recovery_time_ms remains high; UV inhibit reasons repeat and commit markers appear intermittently.
Evidence 2: Vrail droops during bring-up load; time spent near the UV boundary is high and temperature gating never fully clears.
First fix: Improve holdup margin (or reduce start-up load) and re-tune recovery gating with clear hysteresis + cooldown timers validated by scripts.

MPN examples: TI TPS2121 (power mux), TI TPS2660 (eFuse/hot-swap), TI TPS3839 (supervisor), ADI LTC4368 (OV/UV and reverse-supply protection controller).
