Emergency Lighting Inverter: Charge, Transfer & Self-Test
Emergency lighting is not just “turning on”—it is proven reliability. This page explains how to guarantee fast, stable transfer to battery/inverter power, verify runtime with evidence-based logs, and keep self-test results traceable for maintenance and compliance.
H2-1. One-line Thesis + What problem this page solves
What this page solves: A practical, evidence-ready architecture for emergency luminaires that must (1) detect mains loss, (2) transfer to battery-backed power without visible failure, (3) regulate emergency LED current within defined limits, and (4) prove compliance through runtime metering and self-test traceability.
Scope boundary (locked): Covers charger + battery readiness, emergency transfer (mains/brownout detect, hold-up, soft-start), emergency inverter/power stage behavior, emergency LED constant-current regulation, runtime/SOC evidence, and self-test + log design. Excludes protocol dimming ecosystems and any networked/PoE/PLC/wireless node content.
Why “deep” matters: Field failures are rarely continuous. Typical issues are intermittent: “sometimes no transfer,” “brief flash then off,” “self-test fails but manual test passes,” or “runtime below label.” This page is built around a minimal evidence dictionary so each symptom maps to a measurable discriminator and a repairable root cause.
Evidence dictionary (8 fields used throughout): Keep these names consistent across firmware logs, test reports, and service tools. Each field exists to separate charger vs battery vs transfer vs inverter vs LED-loop faults with minimal measurement overhead.
| Field | Unit / Type | Source (where it comes from) | Primary use (what it proves / diagnoses) |
|---|---|---|---|
| LINE_OK | bool | Mains sense comparator/ADC + debounce result | Ground truth for “mains present.” Prevents false transfer due to noise/brownout chatter. |
| MAINS_LOSS_TS | timestamp | Latched at the first validated LINE_OK=0 event | Start anchor for transfer timing; enables consistent transfer-time audits. |
| TRANSFER_MS | ms | Computed: ILED_STABLE_TS − MAINS_LOSS_TS | Single KPI for transfer reliability; correlates with “flash”/“dark gap” complaints. |
| VBUS_MIN_XFER | V | Min DC-link/bus voltage captured during the transfer window | Separates “control logic transfer” vs “energy hold-up collapse” vs “reset during dip.” |
| VBAT | V | Battery ADC (pack sense) with temperature context | Battery readiness & discharge trajectory; helps explain short runtime and early cutoff. |
| IBAT_DISCH | A | Battery shunt/HS sense (discharge current) | Detects overload, inrush problems, and aging-related IR drop under emergency load. |
| ILED_AVG / ILED_RIPPLE_PCT | A / % | LED sense resistor/CSA sampling (average + ripple estimate) | Proves emergency output quality; flags loop instability and flicker-risk conditions. |
| SELFTEST_CODE + FAULT_BITS | enum + bitfield | State machine result + latched protection causes | Makes failures traceable: where it failed (sense/transfer/inverter/LED/runtime) and why (UV/OV/OCP/OTP/timeout/reset). |
H2-2. System requirements & modes (Normal / Standby / Emergency / Test)
Goal of this chapter: Define operating modes with explicit entry/exit conditions and measurable success criteria. Clear mode boundaries prevent hidden conflicts (for example: charger behavior interfering with emergency transfer, or self-test triggering during an actual mains-loss event).
Mode model (4 states):
- Normal Lighting (Mains ON): Mains present; lighting may run from the normal power path; charger can operate in parallel. Must-log: LINE_OK.
- Standby Charging (Ready): Luminaire is idle/ready; charger maintains battery readiness with minimal standby loss. Must-log: battery readiness evidence (use VBAT plus internal charger phase, if available).
- Emergency Supply (Mains OFF): Battery-backed power engages; emergency LED current must reach a defined stable level. Must-log: MAINS_LOSS_TS, TRANSFER_MS, VBUS_MIN_XFER, ILED_AVG / ILED_RIPPLE_PCT, RUNTIME_MINUTES.
- Test Mode (Self-test): Controlled verification of emergency chain health. Two classes: functional test (short) and duration test (long). Must-log: SELFTEST_CODE + FAULT_BITS (and the measured KPIs used for pass/fail).
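The mode boundaries above can be written as one explicit transition function. This is a minimal Python sketch that assumes all inputs are already-debounced booleans. The names `next_mode`, `restore_stable`, `lamp_on_request`, `test_request` and `test_done` are hypothetical and are not part of the evidence dictionary. The sketch encodes two rules from the text:

- A validated mains loss always wins, so a running self-test is abandoned.
- Emergency mode is left only after the restore debounce confirms mains.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "Normal"
    STANDBY = "Standby"
    EMERGENCY = "Emergency"
    TEST = "Test"


def next_mode(mode: Mode, *, line_ok: bool, restore_stable: bool,
              lamp_on_request: bool, test_request: bool, test_done: bool) -> Mode:
    """Return the next operating mode (hypothetical helper; inputs pre-debounced)."""
    # A validated mains loss overrides every other state, including a self-test,
    # so a real outage is never blocked by (or mistaken for) a test.
    if not line_ok:
        return Mode.EMERGENCY
    if mode is Mode.EMERGENCY:
        # Mains is back, but stay in emergency until the restore debounce
        # confirms it; this prevents repeated transfer cycling.
        return Mode.STANDBY if restore_stable else Mode.EMERGENCY
    if mode is Mode.TEST:
        return Mode.STANDBY if test_done else Mode.TEST
    if test_request:
        # Tests can only start while LINE_OK is true (checked above).
        return Mode.TEST
    return Mode.NORMAL if lamp_on_request else Mode.STANDBY
```

After an emergency, the sketch returns to STANDBY rather than straight to NORMAL. That gives recovery charging a defined starting point, and the next call moves to NORMAL if the lamp is requested.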
Key requirements (measurable definitions):
- Transfer time (TRANSFER_MS): Measured from MAINS_LOSS_TS (first validated mains-loss) to the time the emergency LED current reaches a defined stable window (for example, ≥X% of target and stable for Y ms). This avoids “success by momentary spike.”
- Emergency constant-current quality: Use both ILED_AVG (accuracy) and ILED_RIPPLE_PCT (stability/visual risk). The minimum requirement should be expressed as a bound, not as “good.”
- Minimum output power / illumination policy: Implemented as an emergency current setpoint and/or derating ladder. Derating decisions must be explainable through runtime evidence (battery voltage/current and time).
- Runtime compliance: Proved by RUNTIME_MINUTES (and optionally SOC estimate in later chapters). Store the battery condition snapshot at start and end of emergency operation.
- Restore strategy: When mains returns, mode switching must avoid chatter. A restore debounce and a controlled re-entry (soft-start/recharge gating) prevents repeated transfer cycling.
Typical constraints (why requirements fail in real luminaires):
- Small enclosure / thermal stress: Reduces battery capacity margin and increases protection triggers; transfer success must be measured at temperature corners.
- Low standby power: Forces sparse sensing/logging; the evidence dictionary prevents “too little data to diagnose.”
- Noise/EMI bursts at transition: The transfer moment is the highest-risk EMI and reset window; capture VBUS_MIN_XFER and fault bits to prevent non-reproducible investigations.
Evidence method (how transfer time and test results are recorded):
- Transfer timing: Latch MAINS_LOSS_TS when LINE_OK becomes false after debounce. Declare “transfer complete” when LED current enters the stable band and stays there (log that moment internally as ILED_STABLE_TS), then compute TRANSFER_MS = ILED_STABLE_TS − MAINS_LOSS_TS. During the same window, capture VBUS_MIN_XFER and any reset/fault indicators.
- Functional self-test (short): Verify sense → transfer → inverter start → LED current regulation. Pass/fail is encoded in SELFTEST_CODE. If fail, set FAULT_BITS to indicate the stage and dominant protection cause.
- Duration self-test (long): Verify runtime. Record RUNTIME_MINUTES and the final battery/LED condition snapshot to show whether “short runtime” is capacity-related or load-related.
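The transfer-timing method above uses a stable band rather than a momentary spike. It can be sketched as follows, assuming time-ordered samples of (t_ms, ILED, VBUS) captured from MAINS_LOSS_TS onward.

- The band limits (`lo_frac`, `hi_frac`) and `hold_ms` stand in for the X% and Y ms placeholders and are illustrative only.
- ILED_STABLE_TS is taken as the moment the current entered the band, provided it then stayed there for `hold_ms`.
- The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, float, float]  # (t_ms, iled_a, vbus_v)


def measure_transfer(samples: Iterable[Sample], mains_loss_ts_ms: float,
                     iled_target_a: float, lo_frac: float = 0.9,
                     hi_frac: float = 1.1, hold_ms: float = 20.0
                     ) -> Tuple[Optional[float], float]:
    """Return (TRANSFER_MS, VBUS_MIN_XFER).

    TRANSFER_MS is None if ILED never stays inside the band long enough.
    VBUS_MIN_XFER covers the window from mains loss to the stability decision.
    """
    lo, hi = lo_frac * iled_target_a, hi_frac * iled_target_a
    band_entry_ms: Optional[float] = None
    vbus_min = float("inf")
    for t_ms, iled_a, vbus_v in samples:
        vbus_min = min(vbus_min, vbus_v)
        if lo <= iled_a <= hi:
            if band_entry_ms is None:
                band_entry_ms = t_ms              # candidate ILED_STABLE_TS
            if t_ms - band_entry_ms >= hold_ms:
                return band_entry_ms - mains_loss_ts_ms, vbus_min
        else:
            band_entry_ms = None                  # left the band: restart the hold timer
    return None, vbus_min
```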
H2-3. Reference architecture (Power tree + control partition)
Intent: This chapter provides the single “map” used by all later deep-dives. It separates the power tree (energy flow) from the control/telemetry partition (who decides transfer, who enforces protection, and who produces logs). Each major block has a corresponding test point (TP) and a log source, so symptoms can be diagnosed without guesswork.
Power tree (energy flow):
- AC input → Rectifier / (optional PFC) → DC bus (VBUS): Establishes the main energy reservoir and defines the transfer stress window.
- Charger → Battery pack: Maintains readiness; battery condition dominates runtime compliance over product life.
- Battery → Inverter / emergency power path → LED constant-current stage: Produces stable emergency output; any instability here shows up as current ripple, early cutoff, or protection latching.
Control partition (decisions + evidence generation):
- Mains sense + debounce: Produces LINE_OK and the “start timestamp” for transfer evidence.
- Transfer actuation: Relay/solid-state control that gates emergency power engagement (and defines where bus dips may occur).
- Inverter enable + soft-start policy: Controls inrush and peak stress; ties directly to transfer success and fault bits.
- LED CC enable + setpoint: Defines emergency output quality (average current + ripple proxy).
- Logger + self-test scheduler: Converts measured behavior into traceable results (SELFTEST_CODE, FAULT_BITS, runtime evidence).
Evidence mapping: The table below is the practical bridge from “block diagram” to “what to measure first.” Keep TP naming consistent across lab notes, firmware, and service procedures.
| TP | Node / Block | What to measure | Primary purpose | Typical failure discrimination |
|---|---|---|---|---|
| TP1 | Mains sense | LINE_OK | Truth source for mode entry; prevents chatter | False transfer vs real brownout; debounce margin issues |
| TP2 | DC bus | VBUS (incl. dip) | Hold-up adequacy during transfer | “Flash then reset” vs “transfer logic OK but bus collapses” |
| TP3 | Charger status | Charge phase / ready flag | Battery readiness evidence | Short runtime caused by undercharged pack vs aging |
| TP4 | Battery pack | VBAT | Pack condition snapshot | Early cutoff from UV vs genuine capacity loss |
| TP5 | Battery discharge | IBAT_DISCH | Load stress + overload indication | Overload/inrush vs normal load but weak pack IR |
| TP6 | Inverter control | INV_EN + peak proxy | Soft-start quality + peak events | OCP trip at start vs stable start but poor regulation |
| TP7 | LED CC sense | ILED_AVG | Emergency output correctness | Current below target (derate/limit) vs loop failure |
| TP8 | LED ripple proxy | ILED_RIPPLE_PCT | Stability / visual risk proxy | Control instability vs measurement aliasing / sampling design |
| TP9 | Logger | SELFTEST_CODE | Traceable test result | Fail-stage pinpointing (sense/transfer/inverter/LED/runtime) |
| TP10 | Fault latch | FAULT_BITS (and reset cause if available) | Root-cause evidence | UV/OV/OCP/OTP/timeout vs unexpected reset during dip |
H2-4. Battery pack choices & safety envelope (Li-ion / LiFePO₄ / NiMH)
Intent: Battery selection in emergency luminaires is not about “maximum energy density.” It is about long standby life, temperature corners, maintenance strategy, and provable runtime compliance. This chapter compares chemistries only through the lens that matters for emergency lighting: readiness, safety envelope, and evidence fields for ongoing compliance.
What actually matters in emergency lighting:
- Standby behavior: The system may spend most of its life in standby/charging. Chemistry determines whether long-term float/maintenance charging is benign or aging-accelerating.
- Temperature envelope: Enclosures run warm; cold corners reduce deliverable capacity and increase internal resistance effects. Compliance must be defensible at realistic temperature extremes.
- Low-temperature discharge: Runtime shortfalls frequently appear only in cold conditions; the evidence set must capture temperature context.
- Maintenance strategy: Some systems require periodic controlled discharge tests to keep runtime claims calibrated over aging.
Chemistry selection notes (emergency-luminaire lens only):
- Li-ion: Strong energy density but more sensitive to charging policy and thermal history; ensure protection boundaries and evidence logging support safe long-life standby.
- LiFePO₄: Often chosen for safety/thermal robustness; still needs defined UV/OV and temperature-aware behavior to avoid false runtime assumptions.
- NiMH: Different standby/maintenance behavior; selection is driven by the permitted maintenance approach and the acceptable runtime margin under temperature extremes.
Safety envelope (must be explicit, not “implied”): Define what happens when any boundary is crossed. The luminaire must enter a safe state, and the event must be traceable via pack_fault_flags and FAULT_BITS.
- OV / UV: Stop charge/discharge as appropriate; record fault source and snapshot VBAT.
- OCP: Protect against overload/inrush and short conditions; log whether it happened during transfer or steady emergency output.
- OTP: Use pack NTC placement to represent worst-case heating; record temperature extremes for maintenance evidence.
- Reverse / miswire / connector faults: Treat as a pack fault condition; log a distinct fault bit for serviceability.
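An explicit envelope can be checked in one place, so every boundary crossing produces a traceable bit. This is a minimal sketch with hypothetical names (`PackFault`, `Envelope`, `check_envelope`); the limits are assumed to be supplied per chemistry. The returned snapshot records VBAT and whether the event fell inside the transfer window, as the OV/UV and OCP bullets require.

```python
from dataclasses import dataclass
from enum import IntFlag


class PackFault(IntFlag):
    NONE = 0
    OV = 1 << 0
    UV = 1 << 1
    OCP = 1 << 2
    OTP = 1 << 3
    MISWIRE = 1 << 4   # reverse polarity / connector fault


@dataclass(frozen=True)
class Envelope:
    vbat_ov_v: float
    vbat_uv_v: float
    ibat_ocp_a: float    # compared against the current magnitude (charge or discharge)
    temp_max_c: float    # from the pack NTC


def check_envelope(env: Envelope, vbat_v: float, ibat_a: float, temp_c: float,
                   polarity_ok: bool = True, during_transfer: bool = False) -> dict:
    """Return a log snapshot whose pack_fault_flags has one bit per boundary crossed."""
    flags = PackFault.NONE
    if not polarity_ok:
        flags |= PackFault.MISWIRE
    if vbat_v > env.vbat_ov_v:
        flags |= PackFault.OV
    if vbat_v < env.vbat_uv_v:
        flags |= PackFault.UV
    if abs(ibat_a) > env.ibat_ocp_a:
        flags |= PackFault.OCP
    if temp_c > env.temp_max_c:
        flags |= PackFault.OTP
    return {"pack_fault_flags": flags, "vbat_v": vbat_v,
            "temp_c": temp_c, "during_transfer": during_transfer}
```

The caller still chooses the safe state (stop charge, stop discharge) from the flags. Mapping each bit into FAULT_BITS is what keeps the event service-visible.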
Aging → runtime compliance (the key linkage): A luminaire often “works” while silently losing compliant runtime. To keep runtime claims defensible, record an evidence set that connects capacity/aging to the observed duration outcome.
Evidence fields (minimal maintenance-ready set):
- capacity_estimate: A capacity proxy derived from controlled tests or consistent discharge characterization (concept-level; implementation detailed later).
- temp_extremes: Max/min temperatures observed near the pack; temperature is the first-order modifier of deliverable runtime.
- cycle_count: Tracks wear; helps distinguish “normal aging” from a sudden defect or abnormal maintenance policy.
- pack_fault_flags: Pack/BMS fault indicators (OV/UV/OCP/OTP, etc.) that must map into service-visible fault codes.
H2-5. Charge management (CC/CV, standby, health monitoring)
Intent: Charge control must be a phase machine with explicit entry/exit rules, temperature guards, per-phase timeouts, and a traceable termination reason. “Standby” is an active policy state: it maintains readiness while collecting evidence that explains long-term runtime compliance.
Phase machine: PRECHG → CC → CV → TERM → STANDBY
Implement charging as discrete phases, each with entry conditions, a control objective, exit conditions, and mandatory evidence. Avoid hidden behavior that cannot be audited later.
| Phase | Entry conditions | Control objective | Exit conditions | Must log |
|---|---|---|---|---|
| PRECHG | VBAT below safe fast-charge window; temp within guard | Safe low-current recovery; avoid stress at deep discharge | VBAT reaches CC-entry threshold OR timeout OR temp violation | CHG_PHASE, phase start/end, VBAT/IBAT/T snapshots, timeout flag |
| CC | VBAT in fast-charge window; no pack fault | Constant current (or bounded current) while tracking temp | VBAT reaches CV target OR timeout OR temp violation | Phase duration, IBAT trend, temp guard events |
| CV | VBAT at regulation point | Constant voltage; current naturally tapers | IBAT below termination threshold OR timeout OR temp violation | VBAT stability, IBAT taper samples, termination candidate evidence |
| TERM | Termination criteria satisfied | Stop/limit charge per policy | Enter standby policy state | TERMINATION_REASON (enum), charge end timestamp |
| STANDBY | Charging ended | Maintain readiness with minimal aging and minimal power | Recharge trigger OR test trigger OR fault | Standby entry time, periodic health snapshots, exceptions |
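The table translates almost directly into a step function. This minimal sketch assumes the caller tracks time-in-phase and logs the returned CHG_PHASE and TERMINATION_REASON. Names such as `charge_step` and `ChargeCfg` are hypothetical, and real thresholds depend on the chemistry chosen in H2-4.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Phase(Enum):
    PRECHG = "PRECHG"
    CC = "CC"
    CV = "CV"
    TERM = "TERM"
    STANDBY = "STANDBY"


class TermReason(Enum):
    I_TERM = "I_TERM"
    TIMEOUT = "TIMEOUT"
    TEMP_GUARD = "TEMP_GUARD"
    FAULT = "FAULT"
    MANUAL = "MANUAL"


@dataclass(frozen=True)
class ChargeCfg:
    v_cc_entry: float               # PRECHG -> CC threshold (V)
    v_cv: float                     # CV regulation point (V)
    i_term: float                   # CV termination current (A)
    t_min_c: float                  # temperature guard, low
    t_max_c: float                  # temperature guard, high
    timeout_s: Dict[Phase, float]   # per-phase timeouts for PRECHG/CC/CV


def charge_step(phase: Phase, vbat: float, ibat: float, temp_c: float,
                t_in_phase_s: float, cfg: ChargeCfg, pack_fault: bool = False,
                manual_stop: bool = False) -> Tuple[Phase, Optional[TermReason]]:
    """One evaluation of the phase machine: returns (next phase, termination reason)."""
    if phase in (Phase.TERM, Phase.STANDBY):
        # TERM hands over to the standby policy, which owns recharge triggers.
        return Phase.STANDBY, None
    # Guards are checked before normal progression, so a phase that ends by
    # fault, timeout or temperature is never logged as a normal exit.
    if pack_fault:
        return Phase.TERM, TermReason.FAULT
    if manual_stop:
        return Phase.TERM, TermReason.MANUAL
    if not cfg.t_min_c <= temp_c <= cfg.t_max_c:
        return Phase.TERM, TermReason.TEMP_GUARD
    if t_in_phase_s > cfg.timeout_s[phase]:
        return Phase.TERM, TermReason.TIMEOUT
    if phase is Phase.PRECHG and vbat >= cfg.v_cc_entry:
        return Phase.CC, None
    if phase is Phase.CC and vbat >= cfg.v_cv:
        return Phase.CV, None
    if phase is Phase.CV and ibat <= cfg.i_term:
        return Phase.TERM, TermReason.I_TERM
    return phase, None
```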
Standby policy: readiness + low aging + traceability
- Recharge triggers: OCV drift below a threshold (measured under low-load/quiet conditions), scheduled maintenance events, or post-test recovery.
- Guards: temperature out of range, active pack fault, excessive recharge frequency (anti-chatter) to avoid accelerated aging.
- Exception handling: if a charge cycle is repeatedly terminated by timeouts or temp guards, raise a service-visible code rather than silently looping.
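The recharge trigger, the guards and the anti-chatter rule can be combined into one decision. This sketch rests on a few assumptions:

- `standby_decision`, `ocv_recharge_v` and `min_interval_s` are hypothetical names.
- Scheduled-maintenance and post-test triggers are left to the caller.
- Any thresholds are illustrative.

```python
from enum import Enum


class StandbyAction(Enum):
    HOLD = "HOLD"            # readiness OK, or OCV not trustworthy right now
    RECHARGE = "RECHARGE"
    GUARD = "GUARD"          # temperature or pack fault blocks charging
    EXCEPTION = "EXCEPTION"  # raise a service-visible code


def standby_decision(ocv_v: float, quiet: bool, temp_ok: bool, pack_fault: bool,
                     now_s: float, last_recharge_end_s: float,
                     ocv_recharge_v: float, min_interval_s: float) -> StandbyAction:
    if pack_fault or not temp_ok:
        return StandbyAction.GUARD
    if not quiet:
        return StandbyAction.HOLD          # only trust OCV under quiet, settled conditions
    if ocv_v >= ocv_recharge_v:
        return StandbyAction.HOLD
    if now_s - last_recharge_end_s < min_interval_s:
        # OCV drifted low again too soon after the last recharge: report it
        # instead of silently looping through recharge cycles.
        return StandbyAction.EXCEPTION
    return StandbyAction.RECHARGE
```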
Health monitoring (minimal, implementable set)
OCV sampling: only when the system is in a quiet/low-load condition and after a short settling interval. Log OCV with timestamp and temperature to avoid misinterpretation.
IR proxy (concept): use a repeatable load step or known discharge event and record VBAT droop characteristics as a proxy for rising internal resistance (no complex model required).
Timeout diagnostics: maintain per-phase timeout flags (PRECHG/CC/CV) to distinguish “slow charge due to temperature guard” from “abnormal pack behavior.”
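Both health signals reduce to a few lines. The IR proxy is simply the voltage droop divided by the current step, ΔV/ΔI. This sketch makes the following assumptions:

- Discharge current is positive.
- The function names and default thresholds (quiet current, settling time, minimum step) are hypothetical.
- IR values are only compared at similar temperatures, since IR rises in the cold.

```python
from typing import Optional


def ocv_sample_valid(load_a: float, settle_elapsed_s: float,
                     quiet_below_a: float = 0.01, settle_s: float = 60.0) -> bool:
    """OCV is only logged when load is near zero and the pack has settled."""
    return abs(load_a) < quiet_below_a and settle_elapsed_s >= settle_s


def ir_proxy_mohm(v_before: float, i_before: float,
                  v_during: float, i_during: float,
                  min_step_a: float = 0.1) -> Optional[float]:
    """Internal-resistance proxy from a repeatable load step: dV / dI in milliohms."""
    d_i = i_during - i_before
    if d_i < min_step_a:
        return None                     # step too small for a meaningful ratio
    return (v_before - v_during) / d_i * 1000.0


def ir_trend_ratio(ir_now_mohm: float, ir_baseline_mohm: float) -> float:
    """Values above 1.0 mean resistance has grown versus the baseline."""
    return ir_now_mohm / ir_baseline_mohm
```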
Evidence fields (service-usable)
- CHG_PHASE (PRECHG/CC/CV/TERM/STANDBY)
- PHASE_T_START / PHASE_T_END per phase
- VBAT_SNAP / IBAT_SNAP / T_SNAP (enter/mid/exit sampling points)
- CHG_TIMEOUT_FLAG (per-phase bits recommended)
- TERMINATION_REASON (I_TERM / TIMEOUT / TEMP_GUARD / FAULT / MANUAL)
H2-6. Emergency transfer (mains sense, hold-up, inrush, switchover)
Intent: This is the reliability core: explain why transfer can fail (“no transfer”), why it can cause a visible disturbance (“flicker dip”), and why it can trigger a restart (“reset/brownout”). The chapter is structured as a time-chain: detect → decide → actuate → stabilize, with evidence fields that make outcomes defensible.
1) Mains detection (logic-level): threshold, debounce, delay
- Signal definition: create a single truth signal (e.g., LINE_OK) that represents “mains is valid for normal operation.”
- Debounce: apply separate down/up debounces to prevent chatter during dips and recovery.
- Delay policy: a short delay reduces false transfers; too long a delay increases the probability of a deep DC-bus dip and restart.
- Evidence anchor: the moment LINE_OK deasserts (after debounce) defines transfer_timestamp, the same instant logged as MAINS_LOSS_TS in the evidence dictionary.
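Separate down/up debounces can be implemented with one timer that restarts whenever the raw input agrees with the current state again. This minimal sketch assumes a periodic call with a millisecond timestamp. `LineOkDebouncer`, `down_ms` and `up_ms` are hypothetical names; real values would come from field dip and recovery data. The loss timestamp is latched when the loss is validated, not at the first raw edge.

```python
from typing import Optional


class LineOkDebouncer:
    """Asymmetric debounce producing LINE_OK and latching the validated loss time."""

    def __init__(self, down_ms: int, up_ms: int) -> None:
        self.down_ms = down_ms        # loss must persist this long before LINE_OK=0
        self.up_ms = up_ms            # recovery must persist this long before LINE_OK=1
        self.line_ok = True
        self.mains_loss_ts: Optional[int] = None
        self._pending_since: Optional[int] = None

    def update(self, raw_ok: bool, now_ms: int) -> bool:
        if raw_ok == self.line_ok:
            self._pending_since = None            # chatter ended: restart the timer
            return self.line_ok
        if self._pending_since is None:
            self._pending_since = now_ms          # raw input started to disagree
        needed = self.down_ms if self.line_ok else self.up_ms
        if now_ms - self._pending_since >= needed:
            self.line_ok = raw_ok
            self._pending_since = None
            if not raw_ok:
                self.mains_loss_ts = now_ms       # validated LINE_OK=0 event
        return self.line_ok
```

Choosing `up_ms` longer than `down_ms` is one way to express the restore debounce from H2-2: loss is accepted quickly, recovery only after mains has stayed valid.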
2) Hold-up (keep control alive during the transfer window)
- Hold-up window: from mains invalidation to stable emergency output (ILED stable).
- Critical metric: VDC-link dip depth (how low VBUS falls) relative to the system’s brownout threshold.
- Priority: maintain the control/log domain first; an uncontrolled restart turns a brief dip into a long blackout and destroys traceability.
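A first-order check of the hold-up window treats a capacitor as the only energy source between mains invalidation and stable emergency output. That capacitor can be the bus capacitor or the one on the control rail. The check assumes a constant-power load and ignores converter efficiency and ESR. Under those assumptions, the usable energy between the starting voltage and the brownout threshold gives t_hold = C·(V_start² − V_min²) / (2·P). The function names are hypothetical.

```python
def holdup_time_s(c_farad: float, v_start: float, v_min: float, p_load_w: float) -> float:
    """Time for C to fall from v_start to v_min while feeding a constant-power load."""
    if v_min >= v_start:
        return 0.0
    return c_farad * (v_start ** 2 - v_min ** 2) / (2.0 * p_load_w)


def min_capacitance_f(t_hold_s: float, v_start: float, v_min: float, p_load_w: float) -> float:
    """Capacitance needed to ride through t_hold_s before reaching v_min."""
    return 2.0 * p_load_w * t_hold_s / (v_start ** 2 - v_min ** 2)
```

Take an illustrative case: 100 µF falling from 300 V to a 200 V threshold at a 10 W load gives 250 ms. If the expected TRANSFER_MS plus margin exceeds the computed hold-up time, VBUS_MIN_XFER will cross the threshold and a restart becomes likely.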
3) Inrush and soft-start (battery/inverter start events)
- Battery-side stress: discharge current peaks can create VBAT droop that mimics an undervoltage event.
- Inverter-side stress: peak events at inverter start can trigger OCP/OTP or destabilize the LED CC stage.
- Soft-start objective: cap peaks, avoid false UV/OCP, and reach stable ILED quickly without overshoot.
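One common way to cap start peaks is to ramp the current limit (or the LED current command) by a bounded step per control tick instead of applying the full value at once. In this sketch, `slew_limit`, `softstart_profile` and the per-tick step are hypothetical. The step size trades directly against transfer time: reaching the target takes about target/step ticks.

```python
from typing import List


def slew_limit(prev: float, target: float, max_step: float) -> float:
    """Move from prev toward target by at most max_step."""
    delta = max(-max_step, min(max_step, target - prev))
    return prev + delta


def softstart_profile(target_a: float, max_step_a: float, ticks: int,
                      start_a: float = 0.0) -> List[float]:
    """Per-tick current-limit command during inverter/LED start."""
    out, cmd = [], start_a
    for _ in range(ticks):
        cmd = slew_limit(cmd, target_a, max_step_a)
        out.append(cmd)
    return out
```

The ramp never passes the target, so the command itself does not overshoot. The actual current can still overshoot, depending on the loop response.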
4) Switchover actuation: relay vs solid-state (impact metrics only)
- Relay: can introduce contact bounce and timing variation; impacts transfer time and dip shape.
- Solid-state: supports tighter timing control but adds conduction loss and thermal constraints; impacts steady efficiency and heat.
- Engineering lens: choose based on impact to transfer timestamp precision, dip depth, inrush peak, and restart probability.
Evidence fields (transfer reliability proof set)
- transfer_timestamp: when the system declared mains invalid and initiated transfer.
- VDC-link dip depth: minimum VBUS during the window (logged as VBUS_MIN_XFER); predicts brownout risk.
- reset_cause: explicit reason if a restart occurs (brownout/WDT/other).
- brownout_counter: frequency measure for field reliability and maintenance decisions.
H2-7. Inverter / emergency power stage (topologies + control priorities)
Intent: The emergency inverter stage is evaluated by start reliability, peak containment, and deterministic fault behavior—not by general-purpose UPS features. Topology is compared only by its impact on isolation integration, efficiency tendency, cost/complexity, and EMI risk.
Common stage forms (compared by impact metrics)
| Form | Isolation integration | Efficiency tendency | Cost / complexity | EMI risk lens |
|---|---|---|---|---|
| Push-pull | Often used with a transformer; isolation-friendly block partition | Can be efficient in the typical emergency power range; depends on stress margins | Moderate complexity; symmetric drive timing matters | Layout-sensitive current loops; switching edges shape conducted/radiated risk |
| Half-bridge | Works well with isolation stages; straightforward power-path mapping | Balanced switching behavior; good controllability across load transitions | Medium gate-drive complexity; fewer devices than full-bridge | dv/dt + loop area drive EMI risk; timing and return path dominate |
| Full-bridge | Isolation-capable; flexible modulation and stress distribution | Often favorable under higher power demands; strong control authority | Highest gate-drive complexity; more devices and coordination | More switching nodes; EMI depends heavily on edge control and loop closure |
Control priorities ladder (what wins when constraints conflict)
- 1) Start-up reliability: reach a stable switching state quickly after INV_EN, without oscillation or repeated retries.
- 2) Peak containment: enforce power/current limits to prevent VBAT droop and over-current events during ramp and load steps.
- 3) Regulation stability: provide a stable supply condition to the emergency LED current loop (or direct emergency CC path).
- 4) Efficiency: optimize only after the above constraints are satisfied.
Defined behaviors for edge cases (deterministic, service-friendly)
- Light-load / no-load: enter a bounded switching mode (e.g., frequency reduction or controlled burst) with a logged mode flag, not an uncontrolled oscillation.
- Short-circuit: prefer hiccup (periodic retry) when safe to attempt recovery; otherwise latched off with a clear cause code.
- Over-temperature: use a staged response (warn → limit → shutdown) so field logs reveal whether the event was gradual heating or a sudden thermal spike.
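The staged over-temperature response can be expressed as a small state function. It escalates immediately but de-escalates only with hysteresis, so the logged stage does not chatter around a threshold. The stages map to T_WARN, T_LIMIT and OTP_LATCH. Three things here are illustrative choices rather than requirements from the text:

- the names `ThermalStage` and `thermal_stage`;
- the 5 °C hysteresis;
- treating shutdown as latched.

```python
from enum import IntEnum


class ThermalStage(IntEnum):
    NORMAL = 0
    WARN = 1       # T_WARN: log only
    LIMIT = 2      # T_LIMIT: reduce power
    SHUTDOWN = 3   # OTP_LATCH: off until service/reset clears it


def thermal_stage(prev: ThermalStage, temp_c: float, warn_c: float,
                  limit_c: float, shutdown_c: float,
                  hyst_c: float = 5.0) -> ThermalStage:
    if prev == ThermalStage.SHUTDOWN:
        return prev                                   # latched
    if temp_c >= shutdown_c:
        return ThermalStage.SHUTDOWN
    if temp_c >= limit_c:
        rising = ThermalStage.LIMIT
    elif temp_c >= warn_c:
        rising = ThermalStage.WARN
    else:
        rising = ThermalStage.NORMAL
    if rising >= prev:
        return rising                                 # escalate immediately
    # De-escalate one stage at a time, only once the temperature is hyst_c
    # below the threshold that entered that stage.
    entry = {ThermalStage.WARN: warn_c, ThermalStage.LIMIT: limit_c}
    stage = prev
    while stage > rising and temp_c <= entry[stage] - hyst_c:
        stage = ThermalStage(stage - 1)
    return stage
```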
Evidence fields (minimal set that explains real failures)
- IINV_PEAK: peak inverter input current in a defined window (e.g., around T2→T3 of transfer).
- SWFREQ_STATE: startup / normal / foldback / hiccup (enum), to disambiguate “attempting recovery” vs “stable.”
- FAULT_CAUSE_BITS: OCP / UV / OV / SHORT / DRIVE_FAULT (bitset).
- THERMAL_FLAGS: T_WARN / T_LIMIT / OTP_LATCH (bitset).
- FAULT_FIRST_CAUSE: first-cause capture when multiple faults occur simultaneously (prevents ambiguous field diagnosis).
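First-cause capture only needs to remember the earliest fault while still accumulating the full bitset. This sketch uses hypothetical names. When several causes appear in the same sample, a fixed priority order breaks the tie; that order is an assumption to be set per design.

```python
from enum import IntFlag
from typing import Optional


class FaultCause(IntFlag):
    OCP = 1 << 0
    UV = 1 << 1
    OV = 1 << 2
    SHORT = 1 << 3
    DRIVE_FAULT = 1 << 4


# Tie-break for causes arriving in the same sample (assumed order: hard faults
# first, because a short or OCP often drags VBAT down and sets UV as a side effect).
SAME_SAMPLE_PRIORITY = (FaultCause.SHORT, FaultCause.OCP, FaultCause.DRIVE_FAULT,
                        FaultCause.OV, FaultCause.UV)


class FaultLatch:
    def __init__(self) -> None:
        self.cause_bits = FaultCause(0)                   # FAULT_CAUSE_BITS
        self.first_cause: Optional[FaultCause] = None     # FAULT_FIRST_CAUSE

    def record(self, new_bits: FaultCause) -> None:
        if self.first_cause is None and new_bits:
            for cause in SAME_SAMPLE_PRIORITY:
                if cause in new_bits:
                    self.first_cause = cause
                    break
        self.cause_bits |= new_bits

    def clear(self) -> None:
        """Call only after the record has been logged or read out."""
        self.cause_bits = FaultCause(0)
        self.first_cause = None
```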
H2-8. LED current regulation during emergency (CC loop, dim level, flicker safety)
Intent: In emergency mode, the current loop must stay stable while respecting a tighter energy budget. The key is priority: battery-side power limiting sets the available envelope, and the LED CC loop tracks within that envelope using a setpoint clamp (or duty cap) to prevent integrator wind-up, oscillation, and visible flicker.
Emergency CC targets (measurable, service-friendly)
- ILED_SETPOINT: defined current target (or power-to-current mapping) in emergency mode.
- RIPPLE_BOUNDS: acceptable current ripple bounds (pp or rms; choose one consistently).
- STABILITY: no sustained oscillation or repeated mode toggling during derate events.
Dual-loop concept (battery budget loop + LED CC loop)
Budget loop: estimates available power/current from VBAT, IINV limits, and temperature constraints to output P_AVAIL / I_AVAIL.
LED CC loop: regulates ILED using sensed current feedback to meet the current command.
Arbitration: when the budget is insufficient, clamp the command (ILED_CMD_CLAMP) or cap duty (DUTY_CAP) so the CC controller does not wind up and create flicker-inducing recovery bursts.
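The arbitration rule can be sketched as a PI loop. Its command is first clamped to the budget, and its integrator stops accumulating while the output is pinned and the error pushes further into saturation. This is one conventional anti-windup scheme (conditional integration), offered as an assumption rather than the design used here. The gains, `duty_cap` and the class name are hypothetical, and `i_avail_a` stands for the budget loop's I_AVAIL output.

```python
from typing import Tuple


class EmergencyCcLoop:
    """PI current loop whose command is clamped by the battery budget (hypothetical)."""

    def __init__(self, kp: float, ki: float, duty_cap: float = 0.95) -> None:
        self.kp, self.ki, self.duty_cap = kp, ki, duty_cap
        self.integ = 0.0

    def step(self, iled_setpoint_a: float, i_avail_a: float,
             iled_meas_a: float, dt_s: float) -> Tuple[float, str]:
        # Arbitration: the budget limits the command first (ILED_CMD_CLAMP).
        cmd = min(iled_setpoint_a, i_avail_a)
        err = cmd - iled_meas_a
        integ_next = self.integ + self.ki * err * dt_s
        duty_raw = self.kp * err + integ_next
        duty = min(max(duty_raw, 0.0), self.duty_cap)    # DUTY_CAP and zero floor
        saturated = duty != duty_raw
        pushing_further = ((duty_raw > self.duty_cap and err > 0)
                           or (duty_raw < 0.0 and err < 0))
        if not pushing_further:
            self.integ = integ_next                       # anti-windup: freeze otherwise
        if saturated:
            mode = "SATURATION"
        elif cmd < iled_setpoint_a:
            mode = "CC_CLAMPED"
        else:
            mode = "CC_NORMAL"
        return duty, mode                                 # mode feeds LOOP_MODE_FLAG
```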
Derate strategy (low battery / thermal / power limit) with anti-chatter
- Triggers: VBAT low, power limit active, thermal warning, abnormal IR proxy trend.
- Shape: step-down or ramp-down with a minimum hold time; avoid rapid up/down toggles.
- Logging: every derate event must record a DERATE_REASON and an updated current command.
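A stepped derate with a minimum hold time between level changes can be sketched as below. The level table, the hold time and the `Derater` name are illustrative assumptions. Each level change appends a log entry of (time, DERATE_REASON, new command), so the event is replayable.

```python
from typing import List, Optional, Tuple


class Derater:
    """Stepped derate with a minimum hold time between level changes (hypothetical)."""

    def __init__(self, levels: List[float], min_hold_s: float) -> None:
        self.levels = levels                  # fractions of ILED_SETPOINT, e.g. [1.0, 0.8, 0.6]
        self.min_hold_s = min_hold_s
        self.idx = 0
        self.last_change_s = float("-inf")
        self.log: List[Tuple[float, str, float]] = []   # (time, DERATE_REASON, command)

    def update(self, now_s: float, reason: Optional[str], iled_setpoint_a: float) -> float:
        """reason is the active trigger (e.g. 'VBAT_LOW'), or None when all triggers are clear."""
        held = now_s - self.last_change_s >= self.min_hold_s
        new_idx = self.idx
        if reason is not None and held and self.idx < len(self.levels) - 1:
            new_idx = self.idx + 1            # step down one level
        elif reason is None and held and self.idx > 0:
            new_idx = self.idx - 1            # recover one level
            reason = "RECOVER"
        cmd = iled_setpoint_a * self.levels[new_idx]
        if new_idx != self.idx:
            self.idx, self.last_change_s = new_idx, now_s
            self.log.append((now_s, reason, cmd))
        return cmd
```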
Flicker safety: evidence chain (no standards deep-dive)
- ILED_RIPPLE_PP / ILED_RIPPLE_RMS: ripple evidence at the time of complaint.
- LOOP_MODE_FLAG: CC normal / CC clamped / power-limit / saturation (enum).
- MODE_TOGGLE_COUNT: number of mode switches in a time window (high counts correlate with visible instability).
- Complaint triage fields: timestamp, temperature, VBAT snapshot, and whether the event occurred shortly after transfer.
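The ripple and toggle evidence can be computed from buffered samples. This sketch makes three assumptions:

- ILED is sampled fast enough to resolve the ripple; otherwise the result reflects aliasing, which is the TP8 concern.
- `mode_log` holds time-ordered (time, LOOP_MODE_FLAG) entries.
- The function names are hypothetical.

Whichever ripple definition is chosen, the same one should back RIPPLE_BOUNDS.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def ripple_pp_pct(iled_samples: Sequence[float]) -> float:
    """ILED_RIPPLE_PP as a percentage of the average current."""
    return (max(iled_samples) - min(iled_samples)) / fmean(iled_samples) * 100.0


def ripple_rms_pct(iled_samples: Sequence[float]) -> float:
    """ILED_RIPPLE_RMS (AC component only) as a percentage of the average current."""
    return pstdev(iled_samples) / fmean(iled_samples) * 100.0


def mode_toggle_count(mode_log: List[Tuple[float, str]], now_s: float, window_s: float) -> int:
    """MODE_TOGGLE_COUNT: LOOP_MODE_FLAG changes within [now_s - window_s, now_s]."""
    count = 0
    for (_, prev_mode), (t, mode) in zip(mode_log, mode_log[1:]):
        if mode != prev_mode and now_s - window_s <= t <= now_s:
            count += 1
    return count
```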
H2-9. Discharge curve, runtime metering & SOC estimation (aging-aware)
Intent: Turn “runtime” into a verifiable and self-testable metric. That requires an evidence chain: a meter that is explainable, thresholds that adapt to temperature/aging, and a derate/cutoff policy that is logged and replayable.
Runtime metering: why coulomb count + voltage need to work together
- Coulomb count (primary continuity): tracks delivered charge/energy during emergency discharge, enabling a continuous runtime_minutes counter.
- Voltage-based check (absolute sanity anchor): validates boundaries and protects against drift, especially near low-battery thresholds where load/temperature distort terminal voltage.
- Engineering conclusion: coulomb count explains “how much was delivered,” voltage-based checks prevent “drifted truth.” Combined, runtime becomes reproducible across field conditions without exposing algorithm internals.
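A minimal sketch of the combination, assuming discharge current is positive; `RuntimeMeter`, `v_empty_v` and `soc_floor` are hypothetical names.

- Coulomb counting advances SOC and runtime_minutes.
- When terminal voltage has already reached the empty boundary, SOC is pulled down to a floor and the counter is re-anchored. This stops drift from reporting charge that is not there.

In practice the voltage boundary should come from a temperature/aging profile, because load and cold distort terminal voltage.

```python
class RuntimeMeter:
    """Coulomb count for continuity, voltage boundary as a sanity anchor (hypothetical)."""

    def __init__(self, capacity_ah: float, soc_start: float = 1.0) -> None:
        self.capacity_ah = capacity_ah            # e.g. from capacity_estimate
        self.remaining_ah = capacity_ah * soc_start
        self.runtime_s = 0.0

    def update(self, ibat_disch_a: float, dt_s: float, vbat_v: float,
               v_empty_v: float, soc_floor: float = 0.05) -> float:
        self.remaining_ah -= ibat_disch_a * dt_s / 3600.0
        self.runtime_s += dt_s
        soc = min(1.0, max(0.0, self.remaining_ah / self.capacity_ah))
        if vbat_v <= v_empty_v and soc > soc_floor:
            # Voltage says "near empty" but the count disagrees: trust the anchor.
            soc = soc_floor
            self.remaining_ah = soc * self.capacity_ah
        return soc                                 # fraction; x100 gives SOC_percent

    @property
    def runtime_minutes(self) -> float:
        return self.runtime_s / 60.0
```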
Discharge curve drift: temperature + aging move the thresholds
- Temperature: low temperature increases effective impedance; terminal voltage collapses earlier under load → fixed cutoffs can cause premature shutdown.
- Aging: reduced usable capacity and softer voltage profile compress the tail of discharge → stability and flicker risk rise near end-of-runtime.
- Load level: emergency brightness/power changes the curve shape; thresholds must be consistent with the active power budget.
Low-battery policy: staged derate + deterministic cutoff (battery-safe, recoverable)
- Stage 1 (Early warning): mild derate to keep regulation stable; log the event and the active threshold profile.
- Stage 2 (Stability priority): stronger derate with command clamp; prevent mode-chatter that creates visible instability.
- Stage 3 (Safe cutoff): shutdown at a defined boundary, record a full snapshot so maintenance can replay the decision.
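The staged policy and the profile selection can be sketched together. The bin edges, any threshold values and all names are illustrative assumptions. The `profile_id` string mirrors the TEMP_BIN + CAP_EST pairing used in the field playbook.

Stages only advance within one discharge. After a derate, lower current means less IR drop, so terminal voltage recovers. Without the one-way rule, that recovery could step the policy back and create visible chatter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    profile_id: str
    stage1_v: float    # early warning: mild derate
    stage2_v: float    # stability priority: stronger derate + command clamp
    cutoff_v: float    # safe cutoff


def temp_bin(temp_c: float, cold_below_c: float = 5.0, hot_above_c: float = 40.0) -> str:
    if temp_c < cold_below_c:
        return "COLD"
    return "HOT" if temp_c > hot_above_c else "ROOM"


def cap_bin(capacity_frac: float, low_below: float = 0.8) -> str:
    return "LOW_CAP" if capacity_frac < low_below else "HIGH_CAP"


class LowBattPolicy:
    def __init__(self, profile: Profile) -> None:
        self.profile = profile
        self.stage = 0
        self.low_batt_events = 0
        self.cutoff_snapshot: Optional[dict] = None

    def update(self, vbat_v: float, runtime_min: float) -> int:
        p = self.profile
        if vbat_v <= p.cutoff_v:
            new = 3
        elif vbat_v <= p.stage2_v:
            new = 2
        elif vbat_v <= p.stage1_v:
            new = 1
        else:
            new = 0
        if new > self.stage:                      # never step back within a discharge
            self.low_batt_events += new - self.stage
            self.stage = new
            if new == 3:
                self.cutoff_snapshot = {"profile_id": p.profile_id, "vbat_v": vbat_v,
                                        "runtime_minutes": runtime_min}
        return self.stage
```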
Evidence fields (the minimum set that proves runtime)
- runtime_minutes: elapsed emergency runtime from transfer start to cutoff/restore.
- SOC_percent: reported SOC with a method tag (e.g., “cc+v sanity”) for interpretability.
- low_batt_events: count of low-battery stage entries (process trace).
- capacity_estimate: aging-aware usable capacity estimate used to select thresholds.
- temp_at_discharge: temperature snapshot (and/or bin) used for the threshold profile.
H2-10. Self-test & reporting (functional test vs duration test, logs & indicators)
Intent: Self-test is the compliance and maintenance core: define what is tested, how pass/fail is decided, and how results are recorded and indicated so failures are deterministic, explainable, and repeatable.
Two self-test classes (coverage vs cost)
- Functional test (short): validates transfer path, inverter start, and LED current regulation stability over a short window. Low battery wear, high frequency scheduling.
- Duration test (long): validates runtime compliance and end-of-discharge derate/cutoff policy. Higher wear and time cost, lower frequency scheduling.
Pass/fail gates (each gate maps to a measurable field + failure stage)
- Transfer gate: transfer_time meets target and does not trigger brownout/reset; failure_stage=transfer.
- Regulation gate: ILED setpoint is reached and remains stable (no sustained oscillation, limited mode toggles); failure_stage=cc_reg.
- Runtime gate (duration test only): runtime_minutes reaches requirement or follows the approved derate/cutoff policy; failure_stage=runtime or cutoff.
- Diagnosability gate: failure is explainable (first-cause + stage), not random; failure_stage=diag.
Reporting: minimal self-test record (field-replayable)
- self_test_schedule: periodic / manual / maintenance-triggered (source + cadence id).
- test_type: functional / duration (enum).
- test_result_code: PASS or FAIL_xxx (enum).
- failure_stage: transfer / cc_reg / runtime / cutoff / diag (enum).
- last_pass_time: timestamp of last PASS (supports overdue logic).
Indicators: align UI state with logs (no black boxes)
- PASS: last_pass_time recent, no active failure codes.
- FAIL: test_result_code indicates failure; indicator must be explainable by failure_stage.
- OVERDUE: schedule missed; does not imply failure, but requires maintenance action.
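The gates and the indicator mapping can be sketched as two pure functions over logged fields, so the UI state is always derivable from the record. These parts are hypothetical:

- the measurement keys (`transfer_ms`, `brownout`, `iled_in_band`, `mode_toggles`, `cutoff_per_policy`, `fault_without_first_cause`);
- the FAIL_xxx spellings;
- the limits.

For the duration test, this sketch requires both a policy-conformant cutoff and the runtime target, which is one reading of the runtime gate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Gates:
    transfer_ms_max: float
    max_mode_toggles: int
    runtime_min_required: float


def evaluate_selftest(m: dict, g: Gates, test_type: str) -> Tuple[str, Optional[str]]:
    """Return (test_result_code, failure_stage); gates are checked in chain order."""
    t = m.get("transfer_ms")
    if t is None or t > g.transfer_ms_max or m.get("brownout"):
        return "FAIL_TRANSFER", "transfer"
    if not m.get("iled_in_band") or m.get("mode_toggles", 0) > g.max_mode_toggles:
        return "FAIL_CC_REG", "cc_reg"
    if test_type == "duration":
        if not m.get("cutoff_per_policy"):
            return "FAIL_CUTOFF", "cutoff"
        if m.get("runtime_minutes", 0.0) < g.runtime_min_required:
            return "FAIL_RUNTIME", "runtime"
    if m.get("fault_without_first_cause"):
        return "FAIL_DIAG", "diag"          # something tripped but cannot be explained
    return "PASS", None


def indicator_state(now_s: float, last_result_code: Optional[str],
                    last_pass_time_s: Optional[float], max_interval_s: float) -> str:
    """PASS / FAIL / OVERDUE, derived only from logged fields."""
    if last_result_code is not None and last_result_code != "PASS":
        return "FAIL"
    if last_pass_time_s is None or now_s - last_pass_time_s > max_interval_s:
        return "OVERDUE"
    return "PASS"
```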
H2-11. Validation & field debug playbook (what to measure first)
Goal: make field troubleshooting repeatable. Each symptom maps to two probes first, then a deterministic branch, then a first-fix action, and finally a minimal log/trace set so the outcome is replayable.
Quick start: “two probes first” per symptom
- No light on outage → MAINS_SENSE_STATUS + VDC_LINK (Vbus)
- Blinks then off → VDC_LINK dip + RESET_CAUSE / BROWNOUT_COUNTER
- Runtime short → battery condition (capacity_estimate + temp_at_discharge) + discharge load (IBAT tail / low_batt_events)
- Self-test fails but manual OK → test gating state + self-test record (failure_stage + test_result_code)
The two probes are chosen to separate trigger chain vs energy chain within minutes.
Recommended field test points (TP naming)
- TP-MS: mains-sense logic output (post-debounce; TP1 in H2-3)
- TP-EN: transfer/inverter enable (control to power stage; INV_EN at TP6)
- TP-BAT: VBAT at pack terminals (under load; TP4)
- TP-IBAT: battery current sense output (TP5)
- TP-BUS: VDC_LINK / emergency bus (Vbus; TP2)
- TP-ILED: LED current sense output (CC loop; TP7)
- TP-RST: supervisor reset line / reset-flag readback (reset cause alongside TP10)
Use consistent TP labels so waveform screenshots are comparable across sites.
Symptom playbooks (evidence → branch → first fix)
Symptom A — “No light on outage”
- Measure first (2): TP-MS + TP-BUS
- Branch:
- TP-MS does not indicate mains loss → detection chain issue (threshold/debounce/supply collapse)
- TP-MS indicates mains loss but TP-BUS stays low → power-stage start blocked (enable, protection, IBAT limit)
- TP-BUS rises but TP-ILED = 0 → CC enable/loop clamp or load protection path
- First fix: align mains-sense debounce/threshold with field bins; verify TP-EN timing; confirm protection is logged (not silent).
Symptom B — “Blinks then off”
- Measure first (2): TP-BUS (dip depth) + RESET_CAUSE
- Branch:
- Deep Vbus dip + brownout reset → excessive inrush, insufficient hold-up, or a soft-start slope that is too aggressive
- Vbus OK but turns off → false protection trigger (short detect / CC instability → hiccup/latched)
- First fix: limit inrush (IBAT_peak) by staged enable; tighten protection debounce; require every off-event to produce a failure_stage + snapshot.
Symptom C — “Runtime short (does not meet requirement)”
- Measure first (2): battery condition (capacity_estimate + temp_at_discharge) + discharge load (TP-IBAT tail / low_batt_events)
- Branch:
- capacity_estimate low → battery aging/maintenance window (not a control bug)
- temp low → wrong threshold profile causes premature derate/cutoff
- IBAT abnormally high → efficiency/short/load anomaly (not “small capacity”)
- First fix: bind cutoffs to PROFILE_ID (TEMP_BIN + CAP_EST); log derate stage transitions and cutoff snapshot; verify tail stability (avoid flicker/reset).
Symptom D — “Self-test FAIL but manual works”
- Measure first (2): test gating state + self-test record (failure_stage + test_result_code)
- Branch:
- gating differs (SOC/temp/load) → self-test runs at a different operating point
- threshold/debounce stricter than runtime policy → “manual OK, test FAIL” becomes systematic
- logs incomplete → mis-decision (record chain not closed)
- First fix: make gating explicit and recorded; align pass/fail gates with the same thresholds used in emergency mode; require FAIL to include a full snapshot.
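The branches for symptoms A to C are deterministic enough to encode in a service tool, so different sites reach the same conclusion from the same captures. This sketch assumes the inputs have already been read from the TPs and logs. The thresholds (`brownout_v`, `cap_low_below`, `ibat_margin`) and the return labels are hypothetical. Checks follow the order in which the playbook lists the branches.

```python
def triage_no_light(mains_loss_seen: bool, vbus_ok: bool,
                    iled_a: float, iled_min_a: float) -> str:
    """Symptom A: TP-MS, then TP-BUS, then TP-ILED."""
    if not mains_loss_seen:
        return "detection_chain"              # threshold / debounce / supply collapse
    if not vbus_ok:
        return "power_stage_start_blocked"    # enable, protection, IBAT limit
    if iled_a < iled_min_a:
        return "cc_enable_or_clamp"           # CC enable, loop clamp, load protection
    return "not_reproduced"


def triage_blink_off(vbus_min_v: float, brownout_v: float, reset_cause: str) -> str:
    """Symptom B: dip depth from TP-BUS plus RESET_CAUSE."""
    if vbus_min_v < brownout_v and reset_cause == "BROWNOUT":
        return "inrush_holdup_softstart"
    if vbus_min_v >= brownout_v:
        return "false_protection_trigger"     # short detect / CC instability
    return "needs_more_evidence"              # deep dip but no brownout recorded


def triage_short_runtime(capacity_frac: float, temp_bin: str, ibat_tail_a: float,
                         ibat_expected_a: float, cap_low_below: float = 0.8,
                         ibat_margin: float = 1.2) -> str:
    """Symptom C: capacity_estimate, temp_at_discharge bin, IBAT tail."""
    if capacity_frac < cap_low_below:
        return "battery_aging"
    if temp_bin == "COLD":
        return "threshold_profile_mismatch"
    if ibat_tail_a > ibat_expected_a * ibat_margin:
        return "load_or_efficiency_anomaly"
    return "needs_more_evidence"
```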
Validation matrix (minimal coverage that exposes hidden failures)
Avoid exploding combinations; pick a minimal set that stresses transfer + tail + test consistency.
- Temperature bins: Cold / Room / Hot (use the same bins as threshold selection)
- Aging bins: “High CAP_EST” vs “Low CAP_EST” (aging-aware)
- Start SOC bins: High / Mid / Low (matches derate thresholds)
- Load bins: Nominal vs Worst-case emergency brightness
Recommended minimal runs (example): Cold×LowSOC×WorstLoad, Room×MidSOC×Nominal, Hot×HighSOC×WorstLoad, Room×LowSOC×Nominal, Cold×MidSOC×Nominal, Hot×MidSOC×Nominal.
Evidence: minimum capture set (10 fields) + waveform screenshot points
Minimum 10 fields (one capture closes the loop)
- event_timestamp
- mode_state (Normal/Standby/Emergency/Test)
- mains_sense_status (post-debounce)
- transfer_time_ms
- Vdc_link_min_mV (during transfer window)
- reset_cause
- IBAT_peak_mA
- runtime_minutes
- capacity_estimate
- test_result_code + failure_stage
If temperature is a primary driver in your field returns, add temp_at_discharge as an optional but recommended 11th field, or swap it in for capacity_estimate (the 9th field in the list).
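To keep one capture small enough for an EEPROM log (such as the 24LC256 class in the example MPNs), the ten fields can be packed into a fixed binary record. The following are illustrative choices, not a defined format:

- field widths;
- units: capacity_estimate in mAh, event_timestamp in seconds;
- enum encodings;
- the appended CRC-32.

```python
import struct
import zlib
from typing import NamedTuple


class Capture(NamedTuple):
    event_timestamp: int      # seconds (assumed unit)
    mode_state: int           # 0=Normal 1=Standby 2=Emergency 3=Test (assumed encoding)
    mains_sense_status: int   # post-debounce, 0/1
    transfer_time_ms: int
    Vdc_link_min_mV: int      # uint32 so high-voltage buses fit
    reset_cause: int          # enum code
    IBAT_peak_mA: int
    runtime_minutes: int
    capacity_estimate: int    # mAh (assumed unit)
    test_result_code: int     # enum code
    failure_stage: int        # enum code


_FMT = "<IBBHIBHHHBB"         # little-endian, no padding: 21 bytes


def pack_capture(c: Capture) -> bytes:
    body = struct.pack(_FMT, *c)
    return body + struct.pack("<I", zlib.crc32(body))   # 25 bytes total


def unpack_capture(blob: bytes) -> Capture:
    body, (crc,) = blob[:-4], struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("capture CRC mismatch")
    return Capture(*struct.unpack(_FMT, body))
```

A CRC per record lets a service tool reject a capture torn by a reset during the write, which is exactly the window this log is meant to explain.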
Waveform screenshots (3 captures are usually enough)
- Waveform A (transfer window): TP-MS + TP-BUS (mark outage edge + Vbus rise/dip)
- Waveform B (inrush): TP-BUS + TP-IBAT (correlate Vbus dip with IBAT peak)
- Waveform C (tail policy): TP-ILED + derate stage markers (show stability near end-of-runtime)
Always include the same timebase rules (e.g., “event −200 ms to +800 ms” for transfer) to make comparisons meaningful.
Example MPNs (debug-friendly building blocks used in emergency lighting)
These are examples to make the playbook actionable (signal naming + likely IC blocks). Equivalent parts are acceptable if they expose the same debug hooks.
Mains sense / transfer & reset
- AC mains detect optocoupler: H11AA1
- Comparator (thresholding/debounce support): LMV331, TLV3201
- Power mux / switchover (solid-state): TPS2121
- eFuse / inrush limiting: TPS25940
- Reset supervisor: TPS3839
Battery charge, SOC & metering
- 1-cell Li-ion charger (compact): BQ24074
- Switch-mode charger (higher power class): BQ24650, LTC4015
- Coulomb counter / meter IC: LTC2942
- Fuel gauge: MAX17048 (voltage-based ModelGauge, no sense resistor), BQ27441 (Impedance Track, uses a current-sense resistor)
- Battery protector (pack front-end class): BQ2970x (example family)
Inverter / power stage observability
- Half-bridge gate driver: IRS2101S
- Isolated gate driver: ADuM3223
- Current sense amplifier (fast, switching-friendly): INA240, AD8418
- Bus/battery monitor (telemetry): INA226
- MCU for logs/self-test: STM32G030 (example family)
LED current regulation / protection hooks
- Buck LED driver (CC building block): TPS92515, LM3409
- High-side switch / protection: TPS1H100 (example class)
- Temperature sensor (derate trigger): TMP117
- EEPROM for logs (optional): 24LC256 (I²C EEPROM class)
- RTC (optional schedule): DS3231
Tip: if your platform already has different ICs, keep the same evidence fields and map them to your part’s available pins/telemetry registers.
H2-12. FAQs (12): evidence-based
Each answer stays within 40–70 words and follows Short answer → What to measure → First fix. Evidence field and TP names match earlier chapters so each answer can be checked mechanically.
After a power outage it sometimes does not transfer — should you suspect undervoltage threshold or debounce delay first? Maps to: H2-6
Short answer: Most intermittent misses come from the mains-sense threshold plus debounce window being wrong for sagging input, or the controller browning out before it can commit the transfer.
What to measure: Measure TP-MS (post-debounce) and TP-BUS, and log transfer_time_ms and reset_cause.
First fix: Add hysteresis and tune the debounce window, keep the mains-sense circuit powered through the transfer, and consider a reset supervisor (TPS3839) and an AC-detect optocoupler (H11AA1).
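A minimal sketch of hysteresis plus debounce producing LINE_OK and latching MAINS_LOSS_TS, written as a Python model of the firmware logic. The input is assumed to be a normalized mains level (1.0 = nominal). `MainsSense`, its parameters and the 0.80/0.90 thresholds are hypothetical placeholders, not recommended values.

```python
class MainsSense:
    """Hysteresis + N-sample debounce -> LINE_OK, MAINS_LOSS_TS."""

    def __init__(self, loss_level=0.80, restore_level=0.90, debounce_n=3):
        assert loss_level < restore_level      # hysteresis band
        self.loss_level = loss_level
        self.restore_level = restore_level
        self.n = debounce_n
        self.line_ok = True
        self.count = 0
        self.mains_loss_ts = None

    def update(self, level, ts):
        # Hysteresis: the threshold depends on the current LINE_OK state.
        threshold = self.loss_level if self.line_ok else self.restore_level
        raw_ok = level >= threshold
        # Debounce: only n consecutive disagreeing samples flip LINE_OK.
        if raw_ok == self.line_ok:
            self.count = 0
            return self.line_ok
        self.count += 1
        if self.count >= self.n:
            self.line_ok = raw_ok
            self.count = 0
            if not raw_ok:
                self.mains_loss_ts = ts        # validated LINE_OK=0 event
        return self.line_ok
```

A single-sample glitch resets the counter and is ignored. A sagging input that hovers between the two thresholds cannot chatter LINE_OK.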
The lamp “flashes once” at switchover — is it a Vbus dip or an LED loop restart? Which two waveforms first? Maps to: H2-6 / H2-8
Short answer: A single flash at switchover usually means Vbus dipped below the LED stage’s UVLO, or the CC loop restarted after an enable glitch.
What to measure: Measure TP-BUS and TP-ILED around transfer_timestamp; check Vdc_link_min_mV and loop_mode_flag.
First fix: Delay LED enable until the bus recovers, limit inrush with soft-start, and stabilize the CC loop so it does not re-arm.
Why is EMI more likely to fail in emergency mode — inverter switching or harness return loop? Maps to: H2-7 / H2-11
Short answer: Emergency mode often increases dV/dt and changes return paths, so common-mode noise rises even when steady-state power is lower.
What to measure: Measure switching_freq_state (or gate edge timing) and do a quick near-field scan at the bridge and harness; compare with TP-IBAT ripple.
First fix: Slow edges (gate R/driver), shrink hot-loops, add CM filtering where allowed, and verify harness grounding.
Runtime is below nominal — is it capacity aging or low temperature? Which log fields separate them? Maps to: H2-9
Short answer: Separate aging from cold by correlating capacity_estimate with temp_at_discharge and the timing of low_batt_events.
What to measure: Measure runtime_minutes, capacity_estimate, temp_at_discharge, plus average IBAT and derate_reason.
First fix: Select cutoff/derate profiles by TEMP_BIN + CAP_EST; replace the pack when CAP_EST is low, and for cold conditions derate earlier instead of hard-cutting into resets.
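A first-pass triage of a short runtime from logged fields might look like the sketch below. It assumes that capacity_estimate is logged as a fraction of rated capacity and that a rated runtime is known. `runtime_shortfall_cause` and the cap_low, cold_c and shortfall thresholds are hypothetical. It does not use the timing of low_batt_events, which would refine the split further.

```python
def runtime_shortfall_cause(runtime_minutes, rated_minutes, capacity_estimate,
                            temp_at_discharge, cap_low=0.80, cold_c=5.0,
                            shortfall=0.90):
    """Classify a short runtime as aging, cold, both, or other.
    All thresholds are placeholders; capacity_estimate is assumed 0..1."""
    if runtime_minutes >= shortfall * rated_minutes:
        return "OK"
    aged = capacity_estimate < cap_low      # low CAP_EST bin
    cold = temp_at_discharge < cold_c       # cold TEMP_BIN
    if aged and cold:
        return "AGING+COLD"
    if aged:
        return "AGING"     # replace the pack
    if cold:
        return "COLD"      # use the cold derate/cutoff profile
    return "OTHER"         # check average IBAT, derate_reason, load
```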
Self-test fails often but manual test seems OK — is it test window design or threshold decision logic? Maps to: H2-10
Short answer: Most cases are gating or policy mismatch: the test runs at a different SOC/temp/load point, or its thresholds/debounce are stricter than emergency mode.
What to measure: Measure self_test_schedule, gating_state, failure_stage, test_result_code, and TP-BUS/TP-ILED during the run.
First fix: Record gating explicitly, align pass/fail gates with runtime policy, and require a FAIL snapshot so it’s diagnosable.
Charging is very slow or never reaches full — check the charge phase machine or temperature limits first? Maps to: H2-5
Short answer: Confirm the charger state machine before blaming the battery: slow charging often means the charger is held in precharge or a reduced CC phase by temperature limits, insufficient input headroom, or safety timers.
What to measure: Measure charge_phase, charge_timeout, termination_reason, and the VBAT/IBAT curve; verify NTC reading.
First fix: Validate NTC placement/limits, tune timers/termination, and confirm the charger (e.g., BQ24074/BQ24650) sees the correct current-setting or sense resistor and enough VIN margin.
At low battery should the system derate first or shut off immediately — how to protect the pack and ensure recovery? Maps to: H2-9
Short answer: Derate first to preserve light continuity, then cut off to protect the pack and guarantee recovery.
What to measure: Measure SOC_percent or capacity_estimate, low_batt_events, derate_reason, and Vdc_link_min_mV near the tail.
First fix: Implement staged derate (P1→P2) then UV cutoff with hysteresis; store last_cutoff snapshots and define recharge-clear conditions to avoid latch-off.
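A minimal sketch of the staged tail policy, assuming that stages are driven by battery voltage on a single Li-ion cell. This page also allows SOC_percent or capacity_estimate as the driver. `TailPolicy`, the FULL/CUTOFF stage names and all millivolt thresholds are hypothetical. During one discharge the stages only move forward, which prevents flicker between brightness levels. Cutoff clears only when a recharge lifts the battery above a separate, higher level, so the unit cannot latch off.

```python
class TailPolicy:
    """FULL -> P1 -> P2 -> CUTOFF, with a recharge-clear condition."""

    def __init__(self, p1_mv=3500, p2_mv=3300, cutoff_mv=3000, clear_mv=3700):
        assert cutoff_mv < p2_mv < p1_mv < clear_mv
        self.p1, self.p2 = p1_mv, p2_mv
        self.cut, self.clear = cutoff_mv, clear_mv
        self.stage = "FULL"
        self.last_cutoff = None

    def update(self, vbat_mv, charging=False):
        if charging:
            if vbat_mv >= self.clear:
                self.stage = "FULL"        # recharge-clear condition met
            return self.stage              # never advance derate on charge
        if self.stage == "CUTOFF":
            return self.stage              # stays off until recharge-clear
        if vbat_mv <= self.cut:
            self.stage = "CUTOFF"
            self.last_cutoff = {"vbat_mv": vbat_mv}   # snapshot for the log
        elif vbat_mv <= self.p2:
            self.stage = "P2"
        elif vbat_mv <= self.p1 and self.stage == "FULL":
            self.stage = "P1"
        return self.stage
```

Voltage recovery under a lighter load (e.g. 3250 mV rising to 3400 mV after entering P2) does not step back up, and that forward-only rule acts as the hysteresis between derate stages.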
Inverter startup trips overcurrent — is it inrush or power-stage drive timing? Maps to: H2-6 / H2-7
Short answer: Startup overcurrent is typically bus-cap inrush or timing/blanking mismatch, not a real short.
What to measure: Measure IINV_peak with TP-IBAT, and TP-EN versus gate signals; check Vbus ramp and fault_cause bits.
First fix: Add precharge/soft-start and peak limiting, tune deadtime/blanking, and use a driver with consistent timing (IRS2101S or isolated ADuM3223).
Emergency CC ripple is high and causes visible flicker — adjust compensation first or change power-limit strategy? Maps to: H2-8
Short answer: Large ripple and visible flicker often indicate a loop-priority conflict: the power-limit loop modulates the CC loop too aggressively.
What to measure: Measure ILED_ripple, loop_mode_flag, power_limit_active, and correlate TP-ILED with TP-IBAT.
First fix: Enforce priority (power-limit slower than CC), add ripple filtering on sense paths, then retune CC compensation and verify deep-dim stability.
How to do long-term standby health monitoring without false alarms or misses? Maps to: H2-5
Short answer: Avoid false alarms by using periodic, low-disturbance checks and trend thresholds; single OCV snapshots are noisy.
What to measure: Measure OCV_sample after rest, internal_resistance_proxy (ΔV/ΔI), temperature trend, and false_alarm_counter; log charge_phase transitions.
First fix: Gate checks to stable temperature and known rest time, require N consecutive anomalies, and update baselines slowly to track aging.
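The three parts of this fix (gating, N consecutive anomalies, slow baseline) can be sketched as below. The sketch assumes that the internal_resistance_proxy is computed as ΔV/ΔI from a small test-load step. `StandbyHealth`, its parameters and all limits are hypothetical. It also assumes that false_alarm_counter counts anomaly streaks that ended before reaching N. Each of those would have raised a false alarm without the consecutive-sample rule.

```python
class StandbyHealth:
    """Gated periodic check with N-consecutive alarm and slow baseline."""

    def __init__(self, baseline_mohm, n_consecutive=3, rise_limit=1.5,
                 alpha=0.05, min_rest_s=3600, max_dtemp_c=2.0):
        self.baseline = baseline_mohm
        self.n = n_consecutive
        self.rise_limit = rise_limit    # anomaly if R > rise_limit * baseline
        self.alpha = alpha              # small alpha = slow aging tracking
        self.min_rest_s = min_rest_s
        self.max_dtemp_c = max_dtemp_c
        self.streak = 0
        self.false_alarm_counter = 0

    def check(self, dv_mv, di_ma, rest_s, dtemp_c):
        """Return 'SKIPPED', 'OK', 'SUSPECT' or 'ALARM'."""
        if (rest_s < self.min_rest_s or abs(dtemp_c) > self.max_dtemp_c
                or di_ma == 0):
            return "SKIPPED"                   # not a comparable point
        r_mohm = 1000.0 * dv_mv / di_ma        # mV/mA = ohm -> milliohm
        if r_mohm > self.rise_limit * self.baseline:
            self.streak += 1
            return "ALARM" if self.streak >= self.n else "SUSPECT"
        if 0 < self.streak < self.n:
            self.false_alarm_counter += 1      # anomaly did not persist
        self.streak = 0
        # Only healthy samples update the baseline (EMA).
        self.baseline += self.alpha * (r_mohm - self.baseline)
        return "OK"
```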
After replacing the battery, runtime becomes worse — pack ID/temperature sensing or SOC initialization? Maps to: H2-4 / H2-9
Short answer: Worse runtime after replacement usually comes from a wrong NTC curve or placement, current limiting by the pack protector, or an SOC model initialized to the wrong capacity.
What to measure: Measure pack_fault_flags, pack_ntc, capacity_estimate reset behavior, and first-discharge IBAT; confirm SOC_init_reason.
First fix: Validate NTC, clear/relearn capacity model, and ensure protector FETs aren’t limiting (BQ2970x-class).
How can a minimal test set prove the fault is in the battery / charger / transfer / inverter / LED loop block? Maps to: H2-11
Short answer: Use a two-probes-first sequence and a single capture package to isolate the failing block.
What to measure: Measure TP-MS+TP-BUS (transfer), TP-IBAT (battery/charger), TP-EN/gates (inverter), and TP-ILED (LED loop); save the min-10-fields set from H2-11.
First fix: Repair the first failing link, then rerun only that block’s temp/SOC/load bin to confirm repeatability.
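The "repair the first failing link" rule can be expressed as an ordered walk over the probes. The sketch below assumes that blocks are judged in the probe order listed in this answer and that each probe has already been reduced to pass/fail. `CHECKS` and `first_failing_block` are hypothetical names. A missing capture stops the walk instead of being treated as a pass.

```python
# Probe order as listed above: transfer, battery/charger, inverter, LED loop.
CHECKS = [
    ("transfer", "TP-MS+TP-BUS"),
    ("battery/charger", "TP-IBAT"),
    ("inverter", "TP-EN/gates"),
    ("LED loop", "TP-ILED"),
]


def first_failing_block(results):
    """results maps probe -> True (pass), False (fail); missing = not captured."""
    for block, probe in CHECKS:
        ok = results.get(probe)
        if ok is None:
            return f"incomplete: capture {probe} before judging {block}"
        if not ok:
            return block
    return "no block failed"
```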
MPN quick reference (examples mentioned in this FAQ):
- H11AA1 (AC mains detect optocoupler)
- TPS3839 (reset supervisor)
- BQ24074, BQ24650 (battery charger IC examples)
- IRS2101S, ADuM3223 (gate driver examples)
- BQ2970x family (battery protection example class)