Hold-Up & Emergency Power (Supercap Ride-Through & Fast OR-ing)
Hold-up & emergency power is about buying a predictable energy window between a bus drop and a controlled degrade/safe shutdown—by sizing supercaps realistically, switching over with zero backfeed, and triggering last-gasp logging before rails fall below UVLO.
H2-1 · What it is & what this page owns
Hold-up (ride-through) is the energy window that keeps critical avionics alive long enough to switch sources and complete a controlled last-gasp. This page focuses on three pieces that must close the loop: supercap energy management, ideal-diode OR-ing for fast switchover, and power-fail logging so every dropout can be reconstructed.
What this page owns (hard boundary)
A reusable hold-up module view: energy store (supercap), power path (OR-ing/ideal diode), and evidence chain (power-fail → last-gasp → NVM record).
What is intentionally not covered
Upstream surge/lightning/DO-160 front-end design, EMI filter implementation, and full-aircraft multi-rail architecture. Those belong to sibling pages (linked only).
- 28 V Aircraft Power Front-End (surge/spike handling, hot-swap/eFuse at the input)
- Multi-Rail PoL & Sequencing (system sequencing, PMBus telemetry as a platform topic)
- BIT/BIST & Health Monitoring (fleet analytics beyond hold-up module health)
H2-2 · Requirements you must pin down first (the 6 numbers)
Hold-up design fails most often because the “inputs” are vague. This chapter turns the requirement into six measurable numbers so energy sizing, switchover design, and last-gasp logging can be verified on a bench and defended in reviews.
The six inputs (each must be definable, measurable, worst-case)
- Load model (constant power vs constant current): define the power profile during last-gasp (steady + peak), not just “average current.”
  Measure: capture rail power during the exact shutdown/flush routine; Worst-case: peak power and minimum input voltage.
- Voltage window (Vstart → Vmin): Vmin is set by the tightest limit among DC/DC dropout, UVLO, and PG/RESET behavior under droop.
  Measure: sweep input droop while monitoring regulation loss + PG; Worst-case: lowest temperature + aged caps.
- Target hold-up time (ms→s): split into switchover budget + last-gasp budget + margin.
  Measure: timestamp PF asserted → “flush done”; Worst-case: cold ESR + highest load.
- Max switchover disturbance (ΔV and Δt): define the largest allowed droop and interruption without causing resets or state corruption.
  Measure: step load + forced dropout and record droop at the rail; Worst-case: fastest droop edge + highest di/dt.
- Temperature + life targets: supercap ESR rises at cold; leakage rises at hot; usable capacitance and margin degrade with aging.
  Measure: ESR/leakage at temperature corners; Worst-case: end-of-life ESR + lowest ambient.
- Recharge constraints (charge time + bus impact): define acceptable inrush/charge current and how much bus droop is permitted while recharging.
  Measure: bus droop during recharge and steady leakage; Worst-case: simultaneous loads + limited source current.
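As a small illustration of the hold-up time split (switchover budget + last-gasp budget + margin), the sketch below adds the two budgets and applies a fractional margin. The class name `HoldUpBudget`, the choice of a multiplicative margin, and the numbers are assumptions for illustration, not values from any specific system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldUpBudget:
    """Hypothetical container for the hold-up time split (worst-case values)."""
    t_switchover_s: float   # source handoff time
    t_last_gasp_s: float    # PF asserted -> "flush done"
    margin_fraction: float  # extra margin as a fraction of the two budgets

    def target_hold_up_s(self) -> float:
        # Target = (switchover + last-gasp) scaled up by the margin.
        base = self.t_switchover_s + self.t_last_gasp_s
        return base * (1.0 + self.margin_fraction)


# Illustrative numbers only; replace with measured worst-case values.
budget = HoldUpBudget(t_switchover_s=0.002, t_last_gasp_s=0.150,
                      margin_fraction=0.30)
print(f"target hold-up: {budget.target_hold_up_s() * 1e3:.1f} ms")
```

Keeping the three terms separate makes it easier to see which one consumed the window when a bench measurement misses the target.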
Common requirement traps (and what they break)
- “System still runs” is used as Vmin. Result: the DC/DC is already out of regulation; the CPU resets before logs finish.
- Only average current is specified. Result: last-gasp peaks (flush/write/interrupt storms) collapse the rail early.
- Switchover is specified as “fast” without ΔV/Δt. Result: OR-ing oscillation or brief droop trips PG/RESET and creates phantom faults.
H2-3 · Hold-up energy sizing (supercap math that matches reality)
Hold-up sizing is not only “pick a capacitance.” A complete requirement produces three deliverables: C_required at worst-case temperature/end-of-life, ESR_max so the rail never crosses UVLO during the initial step, and charge-limit so recharge does not disturb the bus.
Choose the right load model first
Constant power is typical when a DC/DC keeps a rail regulated (the input current rises as voltage falls). Constant current applies to simple resistive/linear loads or known current sinks.
Pick V1 and V2 from hard limits
V1 is the usable starting voltage after any immediate ESR step. V2 must stay above the strictest boundary among DC/DC dropout, UVLO, and PG/RESET behavior under droop.
ΔV = (I · t) / C (use when current is fixed)
E_load = P · t and ΔE = 0.5 · C · (V1² − V2²)
Reality corrections (apply before final margin)
- ESR step: ensure the initial droop I_peak × ESR does not push below V2/UVLO.
- Conversion loss: include DC/DC efficiency so the store energy matches the input power required during last-gasp.
- Temperature corners: cold raises ESR; heat raises leakage; usable C shifts with chemistry and construction.
- End-of-life margin: design to EoL ESR/C, not only day-one measurements.
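The energy balance and the corrections above can be combined into a first-pass sizing sketch for a constant-power load. The function name `size_supercap`, the parameter names, and the example numbers are hypothetical. Two simplifications are assumed: peak current is estimated at the starting voltage, and ESR loss during the discharge itself is ignored, so the result is a starting point for margin analysis rather than a final value.

```python
def size_supercap(p_hold_w, p_peak_w, t_hold_s, v_start, v_min, eta,
                  esr_eol_ohm, c_eol_fraction):
    """Return (C_nominal_F, V1, ESR_max_ohm) for a constant-power load."""
    # Peak current drawn from the store at the start of hold-up
    # (assumption: evaluated at v_start; it rises as the voltage falls).
    i_peak = p_peak_w / (eta * v_start)
    # V1: usable starting voltage after the immediate ESR step.
    v1 = v_start - i_peak * esr_eol_ohm
    if v1 <= v_min:
        raise ValueError("ESR step alone pushes the rail below V2")
    # Energy the store must deliver, including conversion loss.
    e_store_j = p_hold_w * t_hold_s / eta
    # 0.5 * C * (V1^2 - V2^2) >= E_store, solved for C at end of life.
    c_eol_f = 2.0 * e_store_j / (v1 ** 2 - v_min ** 2)
    # Day-one capacitance so that the aged capacitance still suffices.
    c_nominal_f = c_eol_f / c_eol_fraction
    # Largest ESR for which the initial step still stays above V2.
    esr_max_ohm = (v_start - v_min) / i_peak
    return c_nominal_f, v1, esr_max_ohm


# Illustrative inputs only: 10 W hold, 15 W peak, 200 ms, 5.0 V -> 3.0 V.
c_nom, v1, esr_max = size_supercap(
    p_hold_w=10.0, p_peak_w=15.0, t_hold_s=0.2, v_start=5.0, v_min=3.0,
    eta=0.85, esr_eol_ohm=0.05, c_eol_fraction=0.7)
print(f"C_nominal >= {c_nom:.3f} F, V1 = {v1:.3f} V, ESR_max = {esr_max:.3f} ohm")
```

The three return values map onto the deliverables named earlier in this chapter: required capacitance, the ESR ceiling, and the usable starting voltage that sets the energy window.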
| Input | How to define / measure | What it drives |
|---|---|---|
| P_hold, P_peak, t_last-gasp | Capture power during the actual shutdown/flush routine (include worst interrupt/service load). Use peak for ESR-step checks; use hold for energy budget. | C_required (energy) and ESR_max (step droop) |
| V1, V2 | V2 from dropout/UVLO/PG boundary; V1 after accounting for step droop and path losses. Define V2 as a hard “no reset/no corruption” limit. | Energy window (V1²−V2²) and acceptance limits |
| η (efficiency) | Use worst-case efficiency at the relevant load and temperature. η links store energy to rail power needs. | Effective power drawn from the store during last-gasp |
| ESR(T, EoL), C(T, EoL) | Use vendor curves + verification data at cold/room/hot, then apply aging margin. Design to worst-case; validate with sweep tests. | Final C sizing, ESR limit, and confidence under corners |
| I_charge_limit, recharge time | Set by allowable bus disturbance and thermal budget of charger/FETs. Recharge must not cause brownout of other loads. | Charger spec and bus impact constraints (module-level) |
H2-4 · Architecture options (where the hold-up actually sits)
Architecture choice should be a criteria decision, not a preference. The hold-up location determines required energy, switchover difficulty, recharge impact, and how clearly the “critical domain” can be bounded.
Bus hold-up
Hold-up is placed on the bus side to cover multiple downstream loads. It maximizes coverage but tends to increase energy, recharge impact, and thermal stress on the power path.
Rail hold-up
Hold-up is placed on a selected critical rail (or the output of a critical DC/DC). Energy is smaller and control is tighter, but the boundary must be correct—unsupported rails will reset first.
Hybrid is common for last-gasp
A small bus hold-up plus a dedicated critical-rail hold-up often balances coverage and energy efficiency. It is a practical way to guarantee “log flush + safe state” without trying to hold everything.
Hard criteria (turn this into a simple decision)
- Coverage scope: multiple loads must ride through → bus/hybrid; only “critical compute + log” → rail/hybrid.
- Energy scale (P × t): large energy pushes toward bus/hybrid; small energy favors rail/hybrid.
- Switchover acceptance: near-zero interruption needs clean OR-ing and tight boundaries; define ΔV/Δt explicitly.
- Recharge constraint: sensitive bus + limited source current favors rail/hybrid with strict charge limiting.
- Maintainability: modular hold-up domains reduce “mystery resets” and simplify health checks.
Typical failure modes (one per architecture)
- Bus: recharge disturbance or excessive heat causes repeated brownout events during recovery.
- Rail: the chosen “critical rail” misses a dependency (PG/RESET chain), so the system resets before logs finish.
- Hybrid: boundaries are unclear and domains “fight” (oscillatory handoff), creating intermittent resets.
H2-5 · Supercap management: charging, balancing, health (the real IC jobs)
A hold-up module is only reliable when four IC-level jobs work together: charger (controlled recharge), balancer (safe series stacking), supervisor (protection and gating), and sense/health (capacity/ESR/leakage trends that explain field events).
Charger: recharge without disturbing the bus
Implement current limit with a voltage clamp and thermal derating. The target is not “fastest charge” but predictable bus impact and controlled heat.
- Set: I_charge_limit / Vcap_max / T_derate
- Verify: bus droop during recharge at weakest source condition
- Fail mode: recharge droop triggers false PF / repeated last-gasp
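A rough way to reason about the charger settings is to model the current limit with a linear thermal foldback and estimate constant-current recharge time from C · ΔV / I. The linear foldback shape and the names `derated_charge_current_a` and `recharge_time_s` are assumptions for illustration; a real charger IC may fold back differently.

```python
def derated_charge_current_a(i_limit_a, temp_c, t_derate_start_c, t_stop_c):
    """Charge current after a hypothetical linear thermal foldback."""
    if temp_c <= t_derate_start_c:
        return i_limit_a
    if temp_c >= t_stop_c:
        return 0.0
    frac = (t_stop_c - temp_c) / (t_stop_c - t_derate_start_c)
    return i_limit_a * frac


def recharge_time_s(c_f, v_now, v_cap_max, i_charge_a):
    """Constant-current recharge time up to the voltage clamp (ideal cap)."""
    if i_charge_a <= 0.0:
        return float("inf")  # charging is inhibited
    return c_f * max(v_cap_max - v_now, 0.0) / i_charge_a


# Illustrative numbers only.
i_hot = derated_charge_current_a(1.0, temp_c=85.0,
                                 t_derate_start_c=70.0, t_stop_c=100.0)
print(i_hot, recharge_time_s(c_f=1.0, v_now=3.0, v_cap_max=5.0, i_charge_a=i_hot))
```

Running the estimate at the hottest expected temperature shows how much foldback stretches recharge time, which matters when another outage can follow quickly.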
Balancer: series stacks need balancing (passive or active)
Series cells drift due to tolerance and leakage differences. Without balancing, a single cell can hit OV or UV while the total pack voltage still looks “normal.”
- Passive: simplest, good for small drift; costs continuous heat
- Active: faster recovery for larger drift; higher complexity and qualification burden
- Boundary inputs: cell count, allowed equalization time, permitted standby loss
Supervisor: module-self protection and gating
Keep protection local to the hold-up module: per-cell OV/UV, pack OV/UV, short-circuit response, and defined reverse-path behavior (what directions are allowed vs blocked).
- OV/UV: per-cell thresholds + pack thresholds
- Short: limit or disconnect with a predictable state
- Outcome: no “mystery” behavior—faults are latched and reportable
Sense & Health: trends that predict failures
Use simple, testable estimates: C_eff from controlled ΔQ/ΔV windows, ESR from ΔV_step/ΔI, and leakage from rest-current/voltage decay. Apply temperature-aware thresholds.
- ESR↑ at cold → higher step droop during switchover
- Leakage↑ at hot → reduced ready energy / longer recharge
- Validate: trend consistency across temperature corners
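The three health estimates can be written as one-line calculations. This is a minimal sketch assuming a constant test current during the ΔQ/ΔV window and a short, roughly linear voltage decay for the leakage estimate; the function names are hypothetical.

```python
def c_eff_f(i_test_a, dt_s, dv_v):
    """Effective capacitance from a controlled charge window: C = dQ / dV."""
    return (i_test_a * dt_s) / dv_v


def esr_ohm(dv_step_v, di_a):
    """ESR from the instantaneous voltage step caused by a current step."""
    return dv_step_v / di_a


def leakage_a(c_f, v_start, v_end, dt_rest_s):
    """Leakage current from voltage decay at rest: I = C * dV / dt."""
    return c_f * (v_start - v_end) / dt_rest_s


# Illustrative measurements only.
print(c_eff_f(i_test_a=1.0, dt_s=2.0, dv_v=0.5))
print(esr_ohm(dv_step_v=0.1, di_a=2.0))
print(leakage_a(c_f=4.0, v_start=5.0, v_end=4.99, dt_rest_s=100.0))
```

Logging these values together with temperature lets the thresholds be compared at like-for-like conditions instead of mixing cold and hot readings.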
H2-6 · OR-ing & fast switchover (zero-gap power path design)
“Fast switchover” must be testable. A correct OR-ing path meets four acceptance checks: max droop, max interruption, no hunting, and no reverse current. Everything else (ideal diode control, thresholds, damping) exists to satisfy these four.
Fast switchover acceptance (bench-verifiable)
- Max droop (ΔV): output stays above Vmin / avoids PG/RESET faults during the handoff.
- Max interruption (Δt): dead-time stays below the system’s dropout tolerance.
- No hunting: sources do not “fight” (no oscillatory takeover / return behavior).
- No reverse current: backup energy never backfeeds the primary path.
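The four checks can be applied to a captured waveform with a short script. The sketch below assumes sampled arrays of time, output voltage, and primary-path current (positive = forward), treats "interruption" as time spent below Vmin, and flags hunting when the primary path changes conduction state more than once during a single outage. The function name, the `i_on_a` conduction threshold, and the sample data are all hypothetical.

```python
def check_switchover(t_s, v_out, i_primary_a, v_nominal, v_min,
                     dv_max, dt_max_s, i_rev_max_a, i_on_a):
    """Evaluate one outage capture against the four acceptance checks."""
    droop_v = v_nominal - min(v_out)
    # Time below Vmin, attributing each sample interval to its end sample.
    time_below_s = sum(t_s[k] - t_s[k - 1]
                       for k in range(1, len(t_s)) if v_out[k] < v_min)
    # Reverse current is the most negative primary current, as a magnitude.
    reverse_a = max(0.0, -min(i_primary_a))
    # Count conduction-state changes of the primary path.
    conducting = [i > i_on_a for i in i_primary_a]
    handoffs = sum(1 for a, b in zip(conducting, conducting[1:]) if a != b)
    return {
        "droop_v": droop_v,
        "time_below_s": time_below_s,
        "reverse_a": reverse_a,
        "handoffs": handoffs,
        "passed": (droop_v <= dv_max and time_below_s <= dt_max_s
                   and reverse_a <= i_rev_max_a and handoffs <= 1),
    }


# Synthetic capture for illustration only (1 us sample spacing).
t = [k * 1e-6 for k in range(6)]
v = [5.0, 4.8, 4.6, 4.7, 4.9, 4.9]
i = [2.0, 0.5, -0.05, 0.0, 0.0, 0.0]
result = check_switchover(t, v, i, v_nominal=5.0, v_min=4.5, dv_max=0.5,
                          dt_max_s=2e-6, i_rev_max_a=0.1, i_on_a=0.1)
print(result)
```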
Why diode OR-ing is often insufficient
- Voltage loss: diode drop steals hold-up window and shortens ride-through time.
- Thermal loss: high current raises dissipation and reduces reliability margin.
- Control limits: passive behavior cannot enforce clean reverse-current blocking under all transients.
Ideal diode / OR-ing FET control (what it actually does)
- Detect: ΔV or Vds plus current direction cues.
- Actuate: gate drive to minimize drop while enabling fast reverse-current turn-off.
- Stabilize: thresholds, hysteresis, and timing windows to prevent “fight-back.”
Two hidden enemies (the causes of “it should work but resets anyway”)
- Parasitics + Qg delay: wiring inductance/resistance and gate charge slow the real transition, creating droop or dead-time.
- Mis-detection and chatter: noisy ΔV sensing or insufficient hysteresis causes oscillatory handoff and intermittent PG faults.
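A first-order estimate shows how gate charge turns into droop. The sketch assumes the gate transition takes roughly Qg / I_gate, that the load is carried only by the local output capacitance during the dead-time, and that ΔV = I · t / C applies over that short interval. Wiring inductance is ignored, so real droop is typically larger. All names and numbers are illustrative.

```python
def gate_transition_time_s(qg_c, i_gate_a):
    """Approximate time to move the MOSFET gate charge: t = Qg / I_gate."""
    return qg_c / i_gate_a


def dead_time_droop_v(i_load_a, t_dead_s, c_out_f):
    """Droop while only the output capacitance feeds the load: dV = I*t/C."""
    return i_load_a * t_dead_s / c_out_f


# Illustrative values only: 0.5 us detection delay, 100 nC gate, 0.2 A drive.
t_detect_s = 0.5e-6
t_gate_s = gate_transition_time_s(qg_c=100e-9, i_gate_a=0.2)
t_dead_s = t_detect_s + t_gate_s
print(t_dead_s, dead_time_droop_v(i_load_a=10.0, t_dead_s=t_dead_s, c_out_f=100e-6))
```

Comparing the estimated droop with the ΔV acceptance limit indicates early whether gate drive strength or local capacitance is the tighter constraint.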
H2-7 · Power-fail detection & last-gasp controller (how you buy time)
A reliable last-gasp flow needs two things: a PF decision that is early enough and actions that are provably completed. Treat PF as an event pipeline (detect → confirm → act), not a single threshold.
Where to sense PF (choose a role, not just a node)
- Vbus sense (early warning): triggers “prepare” actions early; needs stronger immunity to transients.
- Key rail sense (true boundary): aligned with dropout/UVLO reality; may be too late for long flush tasks.
- DC/DC in/out (two-stage): use input as early warning and output as final confirmation.
A practical strategy is Vbus → PF_pending, then rail confirm before entering hard last-gasp.
Threshold + debounce (budget-based)
- Debounce too short: false PF from load steps → repeated last-gasp and wasted energy/write cycles.
- Debounce too long: PF arrives late → flush misses the window before UVLO/reset.
- Rule: debounce time plus the last-gasp actions must fit inside the window between the PF threshold crossing and UVLO.
Add hysteresis and a short blanking window so PF decisions do not chatter around the threshold.
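The detect → confirm flow with debounce and hysteresis can be sketched as a small sample-driven state machine. The class `PfDetector`, the state names other than PF_pending, and the sample-count debounce are assumptions for illustration. A real design would run this in hardware or a high-priority interrupt, with thresholds taken from the measured budget.

```python
class PfDetector:
    """Two-stage PF decision: Vbus dip -> PF_PENDING, rail low -> PF_CONFIRMED."""

    def __init__(self, v_assert, v_release, debounce_samples):
        self.v_assert = v_assert          # Vbus below this starts the debounce
        self.v_release = v_release        # Vbus above this cancels PF_PENDING
        self.debounce_samples = debounce_samples
        self.state = "NORMAL"
        self.count = 0

    def step(self, vbus, rail_low):
        if self.state == "NORMAL":
            if vbus < self.v_assert:
                self.count += 1
                if self.count >= self.debounce_samples:
                    self.state = "PF_PENDING"
            else:
                self.count = 0            # a short dip resets the debounce
        elif self.state == "PF_PENDING":
            if vbus > self.v_release:     # hysteresis: must recover above release
                self.state = "NORMAL"
                self.count = 0
            elif rail_low:                # key-rail confirmation
                self.state = "PF_CONFIRMED"
        # PF_CONFIRMED is latched until the caller creates a new detector.
        return self.state


def run(detector, samples):
    """Feed (vbus, rail_low) samples and return the final state."""
    state = detector.state
    for vbus, rail_low in samples:
        state = detector.step(vbus, rail_low)
    return state


det = PfDetector(v_assert=24.0, v_release=26.0, debounce_samples=3)
print(run(det, [(23.0, False), (23.0, False), (28.0, False)]))  # short dip
print(run(det, [(20.0, False)] * 3 + [(20.0, True)]))           # true outage
```

The gap between `v_assert` and `v_release` is what keeps the decision from chattering when Vbus sits near the threshold.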
Last-gasp controller: actions with completion proof
- Shed load first: disable non-critical tasks/peripherals to reduce power immediately.
- Snapshot & flush: trigger NVM write for the event record and critical state.
- Record evidence: PF time (relative ticks), Vcap/Vbus at PF, temperature, reset cause.
- Completion flag: write a “flush_done” marker only after CRC/sequence is committed.
The most useful outcome is not “attempted to write,” but provably written with a consistent record format.
H2-8 · Power-fail logging that is actually useful (what to log + how to trust it)
Power-fail logs are only valuable when they can replay cause and timing after the next boot. The minimum recipe is: evidence fields + atomic write rules + simple time base (monotonic ticks).
Minimum fields for replay (evidence chain)
- Trigger: pf_time_ticks, pf_source, pf_debounce_id
- Energy: vbus_at_pf, vcap_at_pf, temp_at_pf
- State: hold_up_enter_ticks, hold_up_exit_ticks, or_ing_state
- Outcome: flush_started, flush_done, reset_cause, task_shed_mask
The key discriminator is flush_done with a valid CRC/sequence—without it, the record is not trustworthy.
| Field | Why it exists | When to write |
|---|---|---|
| pf_time_ticks | Orders events and measures budgets without needing absolute time. | At PF detection (before long actions). |
| pf_source | Explains whether PF came from bus sense, rail boundary, or UVLO proximity. | At PF_pending entry. |
| vbus_at_pf / vcap_at_pf | Captures remaining energy margin at the moment decisions were made. | At PF_pending or Hold-up entry. |
| hold_up_enter_ticks | Proves when backup took over; useful to debug droop and chatter. | When OR-ing switches to hold-up path. |
| flush_started / flush_done | Separates “attempted write” from “provably committed write.” | Start flag early; done flag after CRC/sequence commit. |
| reset_cause | Confirms if the outcome was BOR/UVLO/WDT and supports root-cause replay. | After reboot (first code path on next boot). |
Write strategy: snapshot + ring (minimal and robust)
- Snapshot record: written on PF to capture the high-value evidence chain.
- Ring buffer: optional low-rate background records for trends (keep small).
- Rule: PF events must produce a snapshot; ring logs never replace snapshots.
Atomicity: how to trust the record after a hard cut
- sequence_id: increments per record to detect missing/partial writes.
- crc: validates payload integrity.
- valid_marker: written last; if missing, record is invalid.
Write order: payload → CRC/sequence → valid_marker. Read only records with valid_marker + CRC OK.
1) PF detected → pf_time_ticks, pf_source, vbus_at_pf, vcap_at_pf
2) Enter PF_pending → debounce window starts
3) Hold-up active → hold_up_enter_ticks, or_ing_state
4) Last-gasp → task_shed_mask set, flush_started=1
5) Commit record → payload written, CRC+sequence written, valid_marker written, flush_done=1
6) Next boot → reset_cause recorded and linked to last valid PF record
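The write order (payload → CRC/sequence → valid_marker) and the read rule can be demonstrated against a simulated NVM buffer. This is a minimal sketch: the record layout, the 0xA5 marker value, the field subset, and the `cut_after_stage` parameter (which simulates losing power mid-write) are all assumptions for illustration, not a format defined by this page.

```python
import struct
import zlib

PAYLOAD_FMT = "<IBHHh"   # pf_time_ticks, pf_source, vbus_mV, vcap_mV, temp_0.1C
TRAILER_FMT = "<II"      # sequence_id, crc32
VALID = 0xA5             # hypothetical marker value; erased NVM reads 0xFF
P_LEN = struct.calcsize(PAYLOAD_FMT)
T_LEN = struct.calcsize(TRAILER_FMT)
SLOT = P_LEN + T_LEN + 1


def write_record(nvm, offset, seq, fields, cut_after_stage=3):
    """Write one record in commit order; stop early to simulate a power cut."""
    payload = struct.pack(PAYLOAD_FMT, *fields)
    p_end = offset + P_LEN
    nvm[offset:p_end] = payload                              # stage 1: payload
    if cut_after_stage < 2:
        return
    crc = zlib.crc32(payload + struct.pack("<I", seq))
    nvm[p_end:p_end + T_LEN] = struct.pack(TRAILER_FMT, seq, crc)  # stage 2
    if cut_after_stage < 3:
        return
    nvm[p_end + T_LEN] = VALID                               # stage 3: marker


def read_record(nvm, offset):
    """Return (seq, fields) only if the marker is present and the CRC matches."""
    p_end = offset + P_LEN
    if nvm[p_end + T_LEN] != VALID:
        return None
    payload = bytes(nvm[offset:p_end])
    seq, crc = struct.unpack(TRAILER_FMT, nvm[p_end:p_end + T_LEN])
    if zlib.crc32(payload + struct.pack("<I", seq)) != crc:
        return None
    return seq, struct.unpack(PAYLOAD_FMT, payload)


def latest_valid(nvm):
    """Boot-time scan: pick the valid record with the highest sequence_id."""
    best = None
    for offset in range(0, len(nvm) - SLOT + 1, SLOT):
        rec = read_record(nvm, offset)
        if rec is not None and (best is None or rec[0] > best[0]):
            best = rec
    return best


nvm = bytearray(b"\xff" * (SLOT * 2))
write_record(nvm, 0, 1, (1000, 0, 27500, 4900, 250))
write_record(nvm, SLOT, 2, (2000, 1, 20000, 4800, -50), cut_after_stage=2)
print(latest_valid(nvm))  # the interrupted second record is ignored
```

Because the marker is the last byte written, a cut at any earlier stage leaves a slot that the reader rejects rather than a record that looks plausible but is incomplete.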
H2-9 · Failure modes & safety behaviors (what breaks in the field)
Field failures should be described as a closed loop: Failure → Symptom → Observable signal → Mitigation. The most useful mitigations are those that make the system fail in a diagnosable way and leave evidence in logs.
How to troubleshoot (minimal toolset)
- Waveforms: Vbus / Vrail / Vcap, switchover droop, dead-time, reverse current spikes.
- Thermal: OR-ing FET temperature rise, charger/balancer hotspots.
- Event evidence: pf_source, vcap_at_pf, hold_up_enter, flush_done, record_valid, reset_cause.
A failure mode is considered “closed” only when the observable signal uniquely points to a cause and a fix.
| Failure | Symptom | Observable signal | Quick check | Mitigation / safety behavior |
|---|---|---|---|---|
| Cap ESR ↑ | Deeper droop at switchover; PG/RESET chatter | ΔV_step at switchover grows; worst at cold | Repeat same ΔI step at two temps; compare step size | Shed load earlier; enforce ESR end-of-life limit; keep droop margin |
| Leakage ↑ | Standby drain; Vcap not “ready” when needed | Vcap decays faster after charge; charger duty rises | Charge → rest → log Vcap decay curve | Add “Vcap_ready” gate; alert on leakage trend; reduce background load |
| Balancing fails | Single cell over-voltage / reduced lifetime | Cell voltage spread increases; OV event counter increases | Check cell taps near end-of-charge; look for early peaking cell | Limit per-cell max; log cell-OV events; degrade to safe stop if persistent |
| Backfeed | Unexpected energizing of upstream node | Reverse current spike; upstream voltage rises after main removal | Open main supply; monitor upstream node for lift | Fast reverse cutoff; treat backfeed as fault and latch in logs |
| Hunting / oscillation | Repeated small droops; intermittent resets | Alternating current share; periodic ripple around switchover | Repeat identical outage; verify repeatable oscillation signature | Add hysteresis + blanking; prevent chattering; prefer stable takeover rule |
| OR-ing FET thermal | Efficiency drops; thermal shutdown or drift | Vds drop increases; temperature rise accelerates at high current | Measure Vds + temperature vs current; look for runaway region | Enforce safe derating; log over-temp; degrade to controlled shutdown (fail-safe) |
| Power path short | Uncontrolled behavior; protection trips | Abnormal current, persistent droop, no stable state | Observe current clamp behavior during takeover attempt | Prefer fail-safe isolation if possible; record fault and stop noncritical loads |
| False PF | Frequent last-gasp entries; write wear | PF events without real outage; pf_debounce_id points to short dips | Inject short droop pulses; count false triggers | Budget-based debounce; PF_pending + confirm; log false-PF counter |
| Missed / late PF | flush_done=0; logs missing for true outage | Record invalid (CRC/valid missing) or absent event record | Speed up outage edge; repeat N trials and measure failure rate | Sense earlier node; prioritize shed before flush; tighten action ordering |
H2-10 · Validation checklist (prove hold-up & switchover are done)
Validation is complete only when it covers bench waveforms, system task completion, and worst-case reliability. Each test should define: setup → stimulus → measurement → pass/fail.
Three validation layers (coverage intent)
- Bench: droop, interrupt, reverse current, thermal rise under controlled load steps.
- System: last-gasp success rate (N trials), record validity rate (valid + CRC).
- Reliability: cold/hot behavior, ESR/leakage worst points, aging margin checks.
Must-test checklist (with pass/fail language)
- Switchover droop & interrupt: measure ΔV_droop and Δt_interrupt under multiple load steps; require no hunting and no backfeed.
- PF debounce boundary: inject short dips; require “no last-gasp” in the non-outage window and “must trigger” in the true outage window.
- Last-gasp success rate: run N outage trials per condition; require flush_done and record_valid for each intended snapshot.
- Temperature worst point: confirm cold/hot limits for ESR/leakage and verify hold-up margin is still met.
Avoid statements like “test passed.” Replace with measurable criteria and required repeatability (N consecutive trials).
1) Switchover:
At condition {TEMP, LOAD, OUTAGE_SHAPE}, ΔV_droop ≤ {V_MAX} and Δt_interrupt ≤ {T_MAX}
for {N} consecutive trials, with no hunting and no reverse current beyond {I_REV_MAX}.
2) PF decision:
Short dips within {DIP_DEPTH, DIP_WIDTH} shall NOT enter Last-gasp.
True outages beyond {OUTAGE_DEPTH, OUTAGE_DURATION} shall enter Last-gasp within {T_PF_MAX}.
3) Logging:
For {N} outage trials, record_valid_rate ≥ {R_VALID_MIN} and flush_success_rate ≥ {R_FLUSH_MIN}.
A record is valid only if valid_marker=1 AND CRC OK.
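The switchover and logging templates can be turned into automated pass/fail checks over a trial log. The sketch below assumes each trial is a dictionary with the hypothetical keys shown, and interprets "N consecutive trials" as: at least N trials were run and none of them failed.

```python
def switchover_passes(trials, v_max, t_max_s, i_rev_max_a, n_required):
    """All trials must meet every limit, and at least n_required must exist."""
    if len(trials) < n_required:
        return False
    return all(tr["droop_v"] <= v_max
               and tr["interrupt_s"] <= t_max_s
               and tr["reverse_a"] <= i_rev_max_a
               and not tr["hunting"]
               for tr in trials)


def logging_passes(records, r_valid_min, r_flush_min):
    """Check record_valid_rate and flush_success_rate against their minimums."""
    n = len(records)
    if n == 0:
        return False
    # A record is valid only if valid_marker == 1 AND the CRC checks out.
    valid = sum(1 for r in records if r["valid_marker"] == 1 and r["crc_ok"])
    flushed = sum(1 for r in records if r["flush_done"])
    return valid / n >= r_valid_min and flushed / n >= r_flush_min


# Synthetic results for illustration only.
trials = [{"droop_v": 0.3, "interrupt_s": 1e-6, "reverse_a": 0.02,
           "hunting": False}] * 5
records = [{"valid_marker": 1, "crc_ok": True, "flush_done": True}] * 9 + \
          [{"valid_marker": 0, "crc_ok": False, "flush_done": False}]
print(switchover_passes(trials, v_max=0.5, t_max_s=2e-6,
                        i_rev_max_a=0.1, n_required=5))
print(logging_passes(records, r_valid_min=0.9, r_flush_min=0.9))
```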
H2-11 · BOM / IC selection criteria (use criteria, not part numbers)
Use module-specific criteria that can be verified on the bench (waveforms, thermal, fault behavior) and audited in logs (PF timestamps, Vcap/Vbus, reset cause, flush success). Any part numbers used for shortlisting are examples only; always validate temperature range, qualification, derating, and fault behavior in the target system.
1) Supercap charger / backup controller
- Input range & UV/OV behavior — confirm what happens at brownout edges (no surprise oscillation).
- Charge current limit accuracy — specify whether it is peak/average and how it behaves over temperature.
- Thermal regulation — require deterministic foldback rather than random shutdown near hot limits.
- Reverse blocking — prevent backfeed into the upstream bus during switchover or bus collapse.
- Stack support & visibility — how many series caps/cells can be monitored and protected.
- Health signals — ability to derive ESR/leakage/capacity trends for maintenance and margin checks.
- Fault reporting — OV/UV/OT/short indicators that map cleanly into last-gasp logs.
2) Balancer / stack monitor (cell safety & drift control)
- Series count support — match the actual stack (2/3/4/5…); avoid “almost fits” designs.
- Balancing method — passive bleed vs active transfer; define when complexity is justified.
- Balancing current & correction time — ensure worst-case imbalance can be corrected within maintenance windows.
- Quiescent current (Iq) — leakage + balancer Iq sets “always-ready” standby drain.
- Fault detection — open sense lead, shorted cell/cap, stuck balancer path; require explicit flags.
- Measurement accuracy — cell/cap voltage error directly impacts OV protection margins.
- Fail behavior — define the safe reaction if balancing is lost (alarm + disable charge + protect stack).
3) OR-ing / ideal diode controller (fast switchover, no backfeed)
- Reverse current cutoff speed — must shut off backfeed before it trips upstream rails or causes latchups.
- Gate drive strength — enough to slew MOSFET Qg with real PCB parasitics (trace L/R included).
- Forward drop control — stable handoff without light-load oscillation or “ping-pong” sharing.
- Parallel sharing — explicit support for multi-path OR-ing without current-hogging.
- Fault mode definition — what happens if MOSFET shorts/opens (fail-safe vs fail-dead must be tested).
- SOA / thermal validation path — provide a clear method to bound MOSFET stress during takeover.
- Testability — criteria must map to measurable: droop, dead-time, reverse spike, recovery.
4) PF supervisor + sensing (buy last-gasp time reliably)
- Threshold accuracy & temp drift — PF errors directly shrink usable last-gasp budget.
- Debounce/delay control — filter load transients without missing true bus collapse.
- Window / multi-rail logic — define priority when bus vs critical rail disagree.
- Reset behavior consistency — deterministic pulse width, release condition, and brownout handling.
- Low Iq — supervisor current becomes permanent standby drain alongside supercap leakage.
- Event latch hooks — clean “PF asserted” latch for the logger and firmware state machine.
- Sensing bandwidth & accuracy — current/voltage capture must match takeover dynamics, not generic telemetry.
H2-12 · FAQs ×12 – Hold-Up & Emergency Power
These FAQs focus on sizing, switchover, PF/last-gasp, logging robustness, and validation within the hold-up module. Answers are concise and map back to the main sections.
› How do you size supercap C for a constant-power load?
Use an energy balance, not ΔV = I·t/C. Require 0.5·C·(Vstart² − Vend²)·ηpath ≥ Pload·thold, where ηpath includes DC/DC efficiency and power-path losses. Budget an immediate ESR droop (Iload·ESR) and ensure Vend stays above converter UVLO after that step. Add margin for cold ESR rise and end-of-life capacitance loss.
› Why does switchover still dip even with “ideal diode” OR-ing?
“Ideal diode” control still needs time to detect ΔV and move MOSFET gate charge, and the power path has real R/L. During takeover, parasitic inductance plus MOSFET Qg produces a short dead-time and a transient droop. Extra dip often comes from source impedance (cap ESR), sense-point mismatch (measured at the wrong node), or control chatter under light load.
› What’s the practical limit for “zero interruption” switchover?
“Zero interruption” should be defined as “no reset, no PG drop, and no functional fault,” not literal 0 mV droop. Any path has finite resistance/inductance, so a fast load step will create ΔV. Set acceptance targets as (a) maximum droop at the load, and (b) maximum time below the rail’s valid window, measured under worst-case load and temperature.
› Supercap vs small backup battery—when is each better for last-gasp?
Supercaps excel when last-gasp needs high peak power, rapid response, and frequent cycles (ms–seconds) with simple state estimation, but leakage and cold ESR must be budgeted. Small batteries win when the required backup time is longer (seconds–minutes) and standby leakage must be very low, at the cost of more constraints on charging, health, and maintenance.
› How do ESR and temperature shift your hold-up margin?
ESR increases at cold and with aging, causing a larger immediate droop when the hold-up source takes load. Capacitance can also fall over life and at low temperature, shrinking usable energy. Design margin must guarantee Vend after ESR step stays above UVLO and task-complete thresholds. Validation should explicitly test cold/aged corners and confirm droop/interrupt plus last-gasp success rate.
› How do you prevent inrush when charging a large supercap bank?
Treat supercap charging as a controlled load: enforce current limiting, soft-start, and power limiting so the aircraft bus does not droop or trip protections. Use a charger that supports CC/CV (or CC with a clear voltage clamp), thermal foldback, and a predictable ramp. Coordinate charge enable with system state (staggered start) and verify bus current and droop during worst-case cold ESR.
› Passive vs active balancing—what failure does each prevent?
Balancing primarily prevents single-cell overvoltage and long-term drift in series stacks. Passive balancing bleeds energy to keep cells aligned; it is simple but wastes power and can be slow for large imbalances. Active balancing moves charge between cells for faster correction and better efficiency, but it adds switches/control and introduces new fault modes. Require explicit fault detection (stuck switch, open bleed, sense lead issues).
› Where should the power-fail threshold be set to avoid false triggers?
Set PF so it triggers early enough to buy last-gasp time, but not so early that normal load steps cause false alarms. A practical rule is PF threshold above the rail’s minimum operating window plus a measured transient droop margin. Use a two-stage approach: PF-pending with short debounce to reject spikes, then PF-confirmed before committing to load shedding and NVM flush. Validate by injecting brief dips and real load transients.
› What must be logged so post-event analysis is actually possible?
Log the minimum set that recreates the timeline: PF asserted time (relative counter), Vbus and Vcap snapshots, switchover status (entered/exited hold-up), reset cause, and a “flush completed” marker. Add temperature and a cap-health proxy (ESR/leak trend or simple flags) to explain margin shifts. Include a sequence ID and CRC so corrupted or half-written records can be detected and ignored.
› How do you avoid half-written logs during brownout?
Use a commit pattern: write record payload first, then write a final “valid/commit” marker last, and protect the record with CRC plus a monotonic sequence ID. Prefer append-only or double-buffered slots rather than in-place updates. Budget NVM write time against last-gasp energy and enforce a hard stop if voltage falls below a safe write threshold. On boot, scan for the latest valid committed record only.
› What field symptoms indicate supercap aging vs OR-ing instability?
Supercap aging often shows as growing droop during takeover (especially at cold), longer recharge time, and increased standby drain from rising leakage; the behavior is usually repeatable with temperature correlation. OR-ing instability tends to show chatter or oscillatory current sharing, unexpected heating in the power-path MOSFETs, and waveform signatures like reverse-current spikes or “ping-pong” handoff. Correlate logs (PF frequency) with scope captures during injected dips.
› What’s a minimal validation plan to prove hold-up is “done”?
A minimal plan should prove function, margin, and repeatability: (1) measure switchover droop and time-below-valid under worst-case load steps, (2) sweep PF threshold/debounce with injected brief dips and real transients, (3) run N-cycle last-gasp tests and track flush success rate, (4) test cold/hot corners and end-of-life assumptions for ESR/leakage, and (5) verify log integrity (CRC/sequence/commit marker).