Edge Power & Backup: eFuse, Hot-Swap, PMBus Telemetry
Edge power and backup design is about making power-up, faults, and short hold-up events behave in a controlled, repeatable way: avoid false trips, stay within SOA/thermal limits, keep the bus above UVLO/PG during switchover, and leave actionable telemetry/log evidence when something fails.
H2-1|Scope & Boundary: What “Edge Power & Backup” actually covers
This page focuses on the power path that decides whether an edge device survives the field: controlled turn-on (inrush), fault isolation (short/overload/over-voltage), ride-through/hold-up (seconds to minutes), and telemetry + fault logs that explain what happened. It does not expand into facility power, long-duration UPS, or full battery-management algorithms.
Typical input sources (and what they stress):
- 24 V industrial: long cables, load steps, transient events → inrush + brownout + nuisance trips.
- PoE PD → 48 V / intermediate bus: negotiation and dynamic power limits → restart storms if thresholds are wrong.
- 12 V adapters: different dynamic response and current limits → "swapping the adapter changes stability" is a common symptom.
Backup scope (ride-through only):
- Hold-up window (thold): how long critical loads stay alive after input droops.
- Allowed droop (Vhi → Vlo): usable voltage range set by PG thresholds and DC/DC UVLO.
- Goal: finish essential actions (log flush / safe state / link preservation), not long runtime.
Success criteria (pass/fail, not slogans)
Start-up:
- Cold/hot start, max load, worst input source: no repeated trip/retry loops.
- Blanking/debounce windows prevent false OCP/UV events during intended inrush.
SOA / thermal:
- Repeated start attempts do not push the pass FET into unsafe linear-energy zones.
- OTP flags and temperature telemetry remain inside design limits across stress cases.
Brownout / hold-up:
- Input droop does not cause reboot storms; PG/RESET sequencing behaves deterministically.
- Critical rail stays up for thold to complete "must-do" actions.
Diagnosability:
- Telemetry/logs answer: input collapse vs inrush too aggressive vs thresholds/time constants wrong.
- Last-event record includes key voltages/currents/status bits near the fault timestamp.
H2-2|Power Path Topology: Input → controlled turn-on → bus → critical loads
Practical edge systems usually fall into one of three power-path templates. The goal is to pick a topology that matches the field failure pattern (brownout, overload, source swapping, intermittent shorts), then set thresholds/time constants so the system is stable across cold start, hot start, and worst-case input sources.
Three templates (selection intent)
Template A: single input with controlled turn-on (eFuse or hot-swap)
- Best for: one power source, but "dirty input" and frequent load steps.
- Primary risk: inrush and linear-energy stress during startup or repeated retries.
- Key metrics: dV/dt ramp, Ilimit, blanking, SOA/thermal, retry policy.
Template B: dual source with ORing / power mux
- Best for: switchover is required without dropping the bus below UVLO/PG.
- Primary risk: reverse feeding and supply "fighting", plus bus dip during switchover.
- Key metrics: reverse block, switchover threshold, ORing loss/heat, Vbus capacitance.
Template C: critical-rail backup (keep-alive only)
- Best for: the whole system does not need backup; only critical functions must stay alive.
- Primary risk: the critical-load definition expands over time → hold-up budget is exceeded.
- Key metrics: keep-alive power cap, Vlo boundary, hold-up window, load-shedding order.
Deliverable: topology selection mini-table (scenario → topology → what must be verified)
| Field scenario trigger | Recommended template | Critical design knobs (must be explicit) | First waveforms / evidence to capture | Common failure mode if mis-set |
|---|---|---|---|---|
| Cold start trips, warm start is fine ("starts only after repeated plugging") | Template A | dV/dt ramp, Ilimit, blanking window, SOA margin, retry/latch policy | TP1 Vin, TP3 Iin peak, TP2 Vbus ramp, FAULT status and retry counter | Inrush too aggressive → nuisance trips or linear-energy overheating during retries |
| Input droops cause reboot storms ("device keeps rebooting during brownouts") | Template C (or A + tuned UV/PG) | Vlo boundary (PG vs UVLO), hold-up window, load-shedding order, reset gating | TP2 Vbus vs PG/RESET timing, TP4 Vcap, last-event log around droop | PG/UV thresholds inconsistent → uncontrolled resets and log corruption |
| Main power fails but system must stay alive briefly ("needs safe shutdown / log flush") | Template C | keep-alive rail power cap, Vhi/Vlo, storage ESR, peak-power tasks | TP4 Vcap droop under peak load, keep-alive rail current, completion time for critical actions | Underestimated peak tasks (flash write, radio burst) → hold-up window collapses |
| Dual source present (main + backup), but switchover is unstable ("sometimes stable, sometimes resets") | Template B | reverse blocking, switchover threshold, ORing loss/heat, bus capacitance, priority policy | TP2 Vbus dip at switchover, reverse-current indication, ORing thermal, event logs | Reverse feed or supply fighting → unexpected trips, overheating, or bus collapse |
Next chapters (H2-3+) will zoom into eFuse vs hot-swap boundaries, inrush/SOA setting, fault response policy, hold-up sizing, ORing switchover stability, and PMBus evidence logging—without drifting into UPS, harvesting, or full BMS.
H2-3|eFuse vs Hot-Swap: Practical boundary, tradeoffs, and the most common mistakes
eFuse and hot-swap controllers often solve the same headline problem—protecting an input rail—but they behave very differently under linear operation (inrush limiting, brownout recovery, and repeated retries). A correct choice is less about the steady-state current rating and more about fault energy, response policy, and diagnostics.
Where the boundary actually sits
An eFuse is usually the better fit when:
- Compact integration is valued and the expected fault energy is modest.
- Start-up is predictable (Cload and ramp behavior are well-bounded).
- Basic fault flags and simple GPIO reporting are sufficient for service.
A hot-swap controller (external FET) is usually the better fit when:
- Repeated retries, large Cload, or harsh shorts make linear energy the dominant risk.
- External FET selection and thermal design are needed to scale SOA margin.
- Richer fault policy and diagnostics are required to cut field debug time.
The 3 mistakes that cause “it limits current but still fails”
- Only checking Ilimit, ignoring MOSFET SOA: during inrush limit or short-circuit limiting, the pass FET operates in the linear region where loss is Vds × Id, not I²R. If the device repeats this event, thermal stress accumulates quickly.
- Using steady-state current to size protection: inrush peak is driven by Cload and ramp rate, so a “small-load” system can still trip at cold start or overload the upstream supply.
- Treating eFuse like a breaker (unbounded retries): automatic retries can turn a recoverable event into repeated linear-energy hits, causing the device to run hotter each cycle and fail sooner.
One-card comparison table (selection + debug oriented)
| Dimension | eFuse (integrated pass FET) | Hot-swap (controller + external FET) | Typical symptom when misapplied |
|---|---|---|---|
| Linear energy / SOA headroom | Fixed by package + internal FET; limited ability to scale | Scalable by FET choice, heatsinking, and layout/thermal path | “Limits current” but fails after retries or long-limit events |
| Inrush control | Often simpler ranges; may be adequate for bounded Cload | Wider ramp control and policy options for complex starts | Cold start trips, warm start works; long ramp causes upstream droop |
| Fault response policy | Commonly fixed or limited modes (retry/hiccup/latch) | More flexible policies and timing windows | Restart storms; “random” resets around brownouts |
| Voltage drop & dissipation | Rds(on) and heating are fixed; bus margin may shrink | Drop/heat can be optimized by choosing a lower-Rds FET | PG flicker or UVLO events even without overload |
| Diagnostics | Often basic flags and limited counters | Potentially richer fault reason codes and telemetry integration | Long debug time; parts swapped without identifying root cause |
This chapter intentionally avoids ORing switchover details and hold-up energy sizing; those are handled in later chapters to prevent scope overlap.
H2-4|Inrush & SOA: Controlled turn-on that avoids nuisance trips and device damage
Most edge power failures attributed to “random trips” are predictable outcomes of two coupled mechanisms: inrush peak (charging Cload) and linear energy (Vds × Id × time) during current limiting. A robust design chooses a ramp strategy that the upstream supply can tolerate and verifies SOA margin across single events and retry sequences.
Where inrush really comes from (Cload is a system property)
- Input bulk/EMI capacitors (near the connector).
- DC/DC input capacitors (often the dominant term).
- Downstream module caps (radio, storage, compute modules).
- Sequenced attachments (modules that connect after enable).
Design implications:
- Inrush is governed by dV/dt and effective Cload, not steady load current.
- Upstream supplies with different current-limit behavior can reshape dV/dt and stability.
- Repeated limiting events can accumulate heat even when average current looks small.
Two engineering-level calculations (useful, minimal)
- Inrush peak: Iinrush ≈ Cload × dV/dt
- dV/dt is the controllable knob (soft-start / gate ramp).
- Use this to keep the peak current below the upstream supply's limit.
- Linear-region pulse energy: E_pulse ≈ ∫ Vds × Id dt
- During limiting, loss is dominated by Vds × Id.
- Retry loops add energy repeatedly; the FET may not cool fully between attempts.
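The two formulas above can be checked numerically. The sketch below is a minimal illustration under stated assumptions: a stiff input, a bus that ramps linearly from 0 V to Vin, and a load that is capacitive, optionally plus a resistor. The helper names `inrush_current` and `ramp_pulse_energy` are hypothetical. One useful property falls out of the integral: for a purely capacitive load, the pass-FET energy equals ½ · C · Vin² regardless of ramp speed. A slower ramp therefore lowers peak current and power, not energy, and with a resistive load attached during the ramp, a slower ramp adds energy.

```python
from typing import Optional


def inrush_current(c_load_f: float, dv_dt_v_per_s: float) -> float:
    """Iinrush ≈ Cload × dV/dt (capacitor charging current only)."""
    return c_load_f * dv_dt_v_per_s


def ramp_pulse_energy(v_in: float, c_load_f: float, dv_dt_v_per_s: float,
                      r_load_ohm: Optional[float] = None,
                      steps: int = 10_000) -> float:
    """Estimate E_pulse = ∫ Vds × Id dt for one controlled ramp.

    Assumptions (illustrative): stiff Vin, Vbus rising linearly from 0 V to
    Vin at dv_dt_v_per_s, and a capacitive load, optionally plus a resistor
    that draws Vbus / R from the start of the ramp.
    """
    t_ramp = v_in / dv_dt_v_per_s
    dt = t_ramp / steps
    energy_j = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                      # midpoint of each time slice
        v_bus = dv_dt_v_per_s * t
        i_d = c_load_f * dv_dt_v_per_s          # Cload charging current
        if r_load_ohm is not None:
            i_d += v_bus / r_load_ohm           # resistive load current
        energy_j += (v_in - v_bus) * i_d * dt   # Vds × Id × dt
    return energy_j


if __name__ == "__main__":
    c, vin, dvdt = 470e-6, 24.0, 12_000.0       # 2 ms ramp to 24 V
    print(f"Iinrush ≈ {inrush_current(c, dvdt):.2f} A")
    print(f"E_pulse (capacitive only) ≈ {ramp_pulse_energy(vin, c, dvdt):.4f} J")
    print(f"E_pulse (+24 Ω load)      ≈ {ramp_pulse_energy(vin, c, dvdt, 24.0):.4f} J")
```

With these example numbers (470 µF, 24 V, 2 ms ramp), the peak is about 5.6 A and the capacitive-only energy is about 0.135 J (½ · C · Vin²). The 24 Ω load adds Vin² · t_ramp / (6R) ≈ 0.008 J. Multiply by the retry count to see what a retry loop does to the FET.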
Limit-mode choice (startup success vs heat vs safety)
Constant-current limit:
- Higher startup success for heavy loads.
- Risk: longer linear operation can raise E_pulse and heating.
- Needs strict time windows and thermal protection.
Foldback current limit:
- Reduces dissipation under hard faults.
- Risk: some loads cannot start if current collapses too early.
- Useful when fault probability is high and restart must be safe.
Fast trip (electronic circuit breaker):
- Strong safety and minimal linear stress.
- Risk: poor availability without a controlled retry/latch policy.
- Best paired with evidence logging to avoid "blind swapping".
Symptom → likely cause (evidence-first mapping)
Symptom: cold start trips, warm start works
- Effective Cload and ESR behave differently at low temperature.
- dV/dt is too fast, or the blanking window is too short for the actual ramp.
- Upstream supply cold-start dynamics are weaker than assumed.
- Evidence: Iin peak, Vbus ramp slope, fault timing relative to blanking.
Symptom: swapping the adapter changes stability
- Different upstream current-limit strategies reshape dV/dt and Vin sag.
- Output impedance differences change the Vin droop depth and UV behavior.
- Some sources enter foldback early, forcing repeated restart attempts.
- Evidence: Vin sag profile, Vbus build time, retry counter/fault reason.
Deliverable: 3-step parameter-setting flow (repeatable)
- Step 1: Estimate worst-case Cload. Output: Cload_max used as the inrush design baseline (include DC/DC input caps and attached modules).
- Step 2: Choose dV/dt to bound Iinrush. Output: a dV/dt range that keeps Iinrush below the upstream limit while meeting startup-time needs.
- Step 3: Check SOA/thermal for single events and retry sequences. Output: E_pulse margin and a bounded retry/latch policy (count + cooldown condition) tied to diagnosable logs.
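The three steps can be sketched as a small calculation. This is an illustration, not a vendor procedure. `InrushInputs`, `dvdt_window`, `retry_sequence_ok`, and the 0.7 inrush margin are hypothetical. The energy budget in Step 3 is a simplification: real SOA limits depend on pulse width and junction temperature, so treat the budget as a number read off the FET's SOA and thermal data for the actual pulse length.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InrushInputs:
    c_load_max_f: float        # Step 1 output: worst-case Cload (bulk + DC/DC input + modules)
    v_in: float                # nominal input voltage the bus ramps to
    i_upstream_limit_a: float  # current limit of the weakest expected upstream source
    t_start_max_s: float       # longest acceptable bus ramp time
    margin: float = 0.7        # assumed fraction of the upstream limit allowed for inrush


def dvdt_window(p: InrushInputs) -> Optional[Tuple[float, float]]:
    """Step 2: return (dvdt_min, dvdt_max) in V/s, or None if no ramp fits."""
    dvdt_max = p.margin * p.i_upstream_limit_a / p.c_load_max_f  # keeps Iinrush below the limit
    dvdt_min = p.v_in / p.t_start_max_s                            # meets the startup-time target
    if dvdt_min > dvdt_max:
        return None  # reduce Cload, raise the source limit, or relax the startup time
    return dvdt_min, dvdt_max


def retry_sequence_ok(e_pulse_j: float, max_retries: int, e_budget_j: float) -> bool:
    """Step 3 (simplified): worst case assumes no cooling between attempts,
    so the first attempt plus every retry must fit one energy budget."""
    return e_pulse_j * (1 + max_retries) <= e_budget_j


if __name__ == "__main__":
    inputs = InrushInputs(c_load_max_f=470e-6, v_in=24.0,
                          i_upstream_limit_a=3.0, t_start_max_s=0.05)
    print(dvdt_window(inputs))                 # roughly (480, 4468) V/s
    print(retry_sequence_ok(0.135, 3, 0.5))    # four hits of 0.135 J exceed 0.5 J
```

A `None` from Step 2 is itself a useful result: it means no soft-start setting can satisfy both the upstream limit and the startup time, so Cload or the source must change.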
This section sets up later chapters: fault policy (blanking/retry/latch) and evidence logging must be consistent with SOA/thermal margins, otherwise stability varies with temperature and upstream power sources.
H2-5|Fault Coverage Map: What to protect, and how to prioritize responses
A stable edge power path is defined by what faults are covered and how each fault is qualified and acted on. “Nuisance trips” and “insufficient protection” are usually the same design error: thresholds were chosen without consistent time qualification and without respecting linear-energy limits during current limiting.
Fault list (edge-relevant) and why they matter
- SCP: hard short; linear operation becomes destructive quickly.
- OCP: overload or partial short; can heat the pass FET during limiting.
- Reverse: reverse polarity or reverse current backfeed.
- UV / Brownout: sag causes repeated restarts without proper qualification.
- Power bounce: upstream foldback or cable drop produces oscillation.
- OVP: abnormal source or transient; may require immediate isolation.
- OTP: device/FET heating from repeated limit or poor airflow.
- Clamp heating: surge clamp path (e.g., TVS) runs hot after events.
Three response styles (choose by amplitude × time)
Disconnect (fast isolation):
- Use for SCP, severe reverse, unsafe OVP, runaway thermal.
- Goal: avoid extended Vds × Id dissipation in the linear region.
- Typical post-action: Latch + Report.
Limit (time-bounded current limiting):
- Use for controlled start and bounded overload.
- Must include: time limit + retry budget + cooldown rule.
- Typical post-action: bounded Retry/Hiccup + Report.
Qualify (filter before acting):
- Use for UV/bounce/PG noise where false trips are costly.
- Key tool: blanking/debounce aligned with safe energy limits.
- Typical post-action: Report with min/peak capture.
Deliverable: Fault → Action matrix (policy + evidence)
| Fault | Primary action | Qualification | Post-action | Retry budget (if any) | Report / evidence to capture |
|---|---|---|---|---|---|
| SCP (hard short) | Disconnect | Very short / none | Latch | 0–small (only if proven safe) | Fault reason + Iin peak + Vbus min + counter |
| OCP (overload/soft short) | Limit | Short window to avoid false trips | Retry or Hiccup | Bounded retries + cooldown condition | Limit duration + Iin peak/avg + temp peak + counter |
| OVP | Disconnect (typ.) | Short (avoid slow thermal overstress) | Latch or Retry | Retry only if source is known to recover | Vin max + event timestamp + reason code |
| UV / Brownout | Qualify then Disconnect or Hold | Debounce/blanking to avoid oscillation | Retry | Bounded; prevent reset storms | Vin min + Vbus ramp time + retry counter |
| Reverse (polarity/backfeed) | Disconnect / block | None | Latch | 0 | Reverse detect flag + Vbus/Vin relation |
| OTP | Limit or Disconnect | Time-qualified to avoid chatter | Latch or Retry | Cooldown-based | Temperature peak + time-to-trip + counter |
| Power bounce | Qualify + stabilize | Debounce + restart gating | Retry | Bounded | Vin sag profile + Vbus min + retry counter |
| Clamp heating | Derate / Report | Time-qualified | Report | N/A | Event count + thermal sensor (if present) |
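The "bounded retry + cooldown" column of the matrix can be expressed as a small state machine. The sketch below is hypothetical: the class, method names, and the numbers in `POLICY_TABLE` are placeholders, not values from any device. Cooldown is qualified by both elapsed time and temperature, and the policy latches once the retry budget is used up. It does not reset the counter after a long fault-free run; a production policy would usually add that rule.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT_COOLDOWN = "wait_cooldown"   # a bounded retry is still available
    LATCH = "latch"                   # budget exhausted: stay off and report


@dataclass
class RetryPolicy:
    max_retries: int           # retry budget (0 = latch on the first fault)
    cooldown_s: float          # minimum off-time before another attempt
    restart_max_temp_c: float  # temperature must be below this to retry
    retries_used: int = 0
    last_fault_s: float = float("-inf")
    latched: bool = False

    def on_fault(self, t_now_s: float) -> Action:
        """Record a qualified fault and decide the post-action."""
        self.last_fault_s = t_now_s
        if self.retries_used >= self.max_retries:
            self.latched = True
        return Action.LATCH if self.latched else Action.WAIT_COOLDOWN

    def may_restart(self, t_now_s: float, temp_c: float) -> bool:
        """Allow a retry only if not latched and cooled down in time AND temperature."""
        if self.latched:
            return False
        if t_now_s - self.last_fault_s < self.cooldown_s:
            return False
        if temp_c >= self.restart_max_temp_c:
            return False
        self.retries_used += 1
        return True


# Illustrative budgets only (max_retries, cooldown_s, restart_max_temp_c);
# derive real values from SOA/thermal data.
POLICY_TABLE = {
    "SCP": (0, 1.0, 85.0),
    "REVERSE": (0, 1.0, 85.0),
    "OCP": (3, 0.5, 85.0),
    "UV": (5, 0.2, 85.0),
}


def policy_for(fault: str) -> RetryPolicy:
    max_retries, cooldown_s, max_temp_c = POLICY_TABLE[fault]
    return RetryPolicy(max_retries, cooldown_s, max_temp_c)
```

Each restart decision and each latch should also be written to the event log with the counter value, so the "Retry budget" and "Report / evidence" columns stay consistent.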
Evidence checklist (minimum to make field failures diagnosable)
- TP1 Vin: min/max during the event (sag/overshoot behavior).
- TP2 Vbus: minimum value and ramp time (brownout and restart storms).
- TP3 Iin: peak and/or limit duration (linear-energy driver).
- TPx Fault: reason code + counter (and temperature peak if available).
The matrix defines consistent behavior across sources, temperature, and load variability by tying actions to time qualification and bounded retry policies with diagnosable evidence.
H2-6|Backup Energy: Supercap vs Battery (hold-up / ride-through only)
“Backup” in edge devices typically means hold-up / ride-through: maintaining critical operation for seconds to minutes through brownouts, cable drops, or brief input loss. The decision is driven by power level, required time, and the allowed voltage droop window (Vhi → Vlo).
When each option fits (and what it costs)
Supercap fits when:
- High pulse power is needed for short durations.
- Frequent ride-through events demand long cycle life.
- Cold environments require predictable short-term behavior.
Supercap costs:
- Leakage current increases standby energy cost.
- Volume grows quickly for minute-level energy.
- Series stacks may require basic balancing/protection.
Small battery fits when:
- Minute-level energy is required in a small volume.
- Critical loads are modest and power is not extremely pulsed.
- Graceful shutdown or data preservation needs more time.
Key metrics that decide the design
- thold: required ride-through time (seconds → minutes).
- Pcrit: critical load power to maintain.
- V window: allowed droop from Vhi to Vlo.
- ESR matters: ΔV ≈ Ipeak × ESR can cause immediate brownout.
- Peak current and thermal rise must be checked under worst-case pulses.
- Charge time and charge limiting define recovery behavior after events.
Minimal engineering formulas (enough for sizing decisions)
- Usable energy: E ≈ ½ · C · (Vhi² − Vlo²)
- A wider droop window yields more usable energy for the same capacitance.
- Immediate droop: ΔV ≈ Ipeak · ESR
- "Energy is sufficient" does not guarantee "voltage stays above Vlo".
Deliverable: three selection questions (fast decision tool)
- Q1: How long must the system ride through? Seconds often favor a supercap; minutes often favor a small battery, depending on power and volume constraints.
- Q2: How much power is truly critical? High pulse loads push the design toward low-ESR buffering (supercap-friendly behavior).
- Q3: How low may the rail droop (Vhi → Vlo)? A tighter droop window increases required storage and raises the importance of ESR and ORing losses.
A hold-up design is only “real” when it passes both checks: (1) energy is sufficient over Vhi→Vlo for the required time, and (2) ESR and ORing losses do not cause an immediate droop below Vlo during peak load.
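The two acceptance checks can be written as one function. This is a sketch with a hypothetical name (`holdup_checks`), and it assumes a capacitor-based store and a constant-power critical load. Check 2 here evaluates the ESR step at hold-up start, from Vhi. A peak that arrives later lands on a lower capacitor voltage, which is why H2-7 reserves ΔVESR across the whole window.

```python
from typing import Tuple


def holdup_checks(c_f: float, v_hi: float, v_lo: float,
                  esr_total_ohm: float, i_peak_a: float,
                  p_crit_w: float, t_hold_s: float,
                  eta: float = 1.0) -> Tuple[bool, bool]:
    """Return (energy_ok, droop_ok) for a capacitor-based hold-up design."""
    # Check 1: usable energy over Vhi -> Vlo covers Pcrit for thold.
    e_usable_j = eta * 0.5 * c_f * (v_hi ** 2 - v_lo ** 2)
    energy_ok = e_usable_j >= p_crit_w * t_hold_s
    # Check 2: the immediate ESR step at a peak does not start below Vlo.
    droop_ok = (v_hi - i_peak_a * esr_total_ohm) >= v_lo
    return energy_ok, droop_ok


if __name__ == "__main__":
    # Example: 1 F, 5 V -> 3 V window, 2 W for 3 s, 2 A peaks.
    print(holdup_checks(1.0, 5.0, 3.0, 0.1, 2.0, 2.0, 3.0, eta=0.9))  # low ESR
    print(holdup_checks(1.0, 5.0, 3.0, 1.5, 2.0, 2.0, 3.0, eta=0.9))  # high ESR
```

The second call has the same energy but fails Check 2. This is the "energy looks sufficient, yet it resets immediately" case.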
H2-7|Hold-up Sizing: an engineering-ready way to estimate “how long it lasts”
Hold-up time is determined by two practical constraints: (1) the usable voltage window (Vhi → Vlo) and (2) the real load profile (average + peaks). A correct estimate must map Vlo to the system’s actual stop points (UVLO/PG/critical-min) and must subtract immediate droop from ESR and path losses.
Map Vhi / Vlo to real system thresholds
Vhi is bounded by:
- Storage charge limit (supercap charger CV / stack rating).
- Backup source regulation point (battery + protection cutoff).
- Path headroom (ORing drop, protection drop).
Vlo is bounded by (use the highest):
- DC/DC UVLO (input lockout of the downstream converter).
- Critical load minimum (MCU/modem/storage min voltage).
- PG/Reset policy (avoid premature reset due to short dips).
Immediate droop to reserve:
- ΔVESR ≈ Ipeak · ESRtotal
- ESRtotal includes storage ESR + trace/cable + ORing/series elements.
- Subtract ΔVESR from the usable window before computing time.
Supercap sizing: two approximations (choose by load behavior)
Constant-current model:
- Use when the bus-side load current can be treated as roughly constant.
- thold ≈ C · (Vhi − Vlo) / Iload
- Good for quick checks and bounded current profiles.
Constant-power model:
- Use when critical loads are closer to constant power (DC/DC regulated).
- thold ≈ [½ · C · (Vhi² − Vlo²)] / Pload
- Apply a conservative path efficiency factor η for drops and conversion losses.
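The two models diverge for regulated loads, and a short comparison shows by how much. The function names below are hypothetical. The example assumes a load that draws 0.5 A at Vhi = 12 V, expressed as a 6 W constant-power load for the second model.

```python
def t_hold_constant_current(c_f: float, v_hi: float, v_lo: float, i_load_a: float) -> float:
    """thold ≈ C · (Vhi − Vlo) / Iload (load current roughly fixed)."""
    return c_f * (v_hi - v_lo) / i_load_a


def t_hold_constant_power(c_f: float, v_hi: float, v_lo: float,
                          p_load_w: float, eta: float = 1.0) -> float:
    """thold ≈ η · ½ · C · (Vhi² − Vlo²) / Pload (regulated, constant-power load)."""
    return eta * 0.5 * c_f * (v_hi ** 2 - v_lo ** 2) / p_load_w


if __name__ == "__main__":
    c, v_hi, v_lo = 0.5, 12.0, 9.0
    i_at_vhi = 0.5                 # load current measured at Vhi
    p_load = v_hi * i_at_vhi       # the same load expressed as constant power (6 W)
    print(t_hold_constant_current(c, v_hi, v_lo, i_at_vhi))  # 3.0 s
    print(t_hold_constant_power(c, v_hi, v_lo, p_load))      # 2.625 s
```

The constant-current estimate is about 14% optimistic here. A regulated load draws more current as the bus falls, so when the downstream is a DC/DC converter, the constant-power model is the safer default.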
Battery sizing (minute-level): energy estimate only
- thold ≈ (Eusable · η) / Pcrit
- Eusable can be approximated by Wh with a conservative usable fraction k.
- Keep the model simple; focus on cutoff voltage and protection boundaries.
Boundaries to respect:
- Protection cutoff / minimum allowed battery voltage (effective "Vlo").
- Peak power events during hold-up (radio TX, flash write, actuator pulses).
- Temperature and aging margin in the usable fraction k.
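The battery estimate is mostly a unit conversion (Wh to joules), so a one-line helper helps avoid slips. The function name and the example values (1 Wh cell, k = 0.6, η = 0.85, Pcrit = 5 W) are hypothetical placeholders.

```python
def t_hold_battery_s(capacity_wh: float, usable_fraction_k: float,
                     eta: float, p_crit_w: float) -> float:
    """thold ≈ (Eusable · η) / Pcrit with Eusable ≈ k · capacity (Wh -> J)."""
    e_usable_j = usable_fraction_k * capacity_wh * 3600.0  # 1 Wh = 3600 J
    return e_usable_j * eta / p_crit_w


if __name__ == "__main__":
    t = t_hold_battery_s(capacity_wh=1.0, usable_fraction_k=0.6, eta=0.85, p_crit_w=5.0)
    print(f"{t:.1f} s ≈ {t / 60:.1f} min")
```

The example yields roughly six minutes. Lowering k for cold or aged cells is usually the first sensitivity to check.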
Common sizing pitfalls (and the symptom they create)
| Pitfall | Typical symptom | Engineering fix |
|---|---|---|
| Capacitance is sized, but ESR droop is ignored | “Energy looks sufficient,” yet the system resets immediately on input loss | Reserve ΔVESR = Ipeak·ESRtotal; increase headroom, reduce ESR, or limit peaks |
| Average load is used, but peak events occur during hold-up | Stable at idle; fails during flash write / modem TX / startup spikes | Model Ppeak / Ipeak; define whether peak events are allowed during hold-up |
| Vlo is chosen without UVLO/PG/critical-min mapping | Unexpected early shutdown or repeated reset storms | Set Vlo to the highest of (UVLO, critical-min, PG policy) with margin |
| Path drops are ignored (ORing, protection, conversion) | Time estimate matches bench but fails in the field at temperature | Use conservative η; treat series drops as reduced V window |
Deliverable: minimum usable calculation box (variables + steps)
The box below is structured for copy/paste into a design note or test plan. Replace placeholders with measured values.
| Symbol | Meaning | Typical unit | Fill-in |
|---|---|---|---|
| C | Effective capacitance (at operating voltage) | F | [ ___ ] |
| Vhi | Initial bus/storage voltage at hold-up start | V | [ ___ ] |
| Vlo | Minimum allowed bus voltage (mapped to UVLO/PG/critical-min) | V | [ ___ ] |
| Iavg | Average bus-side current during hold-up | A | [ ___ ] |
| Ipeak | Peak current during hold-up events | A | [ ___ ] |
| ESRtotal | Storage + interconnect + ORing equivalent ESR | Ω | [ ___ ] |
| η | Path efficiency (drops and conversion losses) | – | [ ___ ] |
| Pcrit | Critical-load power (for constant-power estimate) | W | [ ___ ] |
- Step 1: Set Vlo from real stop points. Vlo ≥ max(UVLO, critical-min, PG policy) + margin.
- Step 2: Reserve immediate droop. ΔVESR = Ipeak · ESRtotal. When peaks can occur, use (Vhi − Vlo − ΔVESR) as the usable droop, i.e., replace Vlo with Vlo + ΔVESR in Step 3.
- Step 3: Compute hold-up time (choose one model). Constant-current: t ≈ C · (Vhi − Vlo) / Iavg. Constant-power: t ≈ η · [½ · C · (Vhi² − Vlo²)] / Pcrit, with a conservative η.
- Step 4: Sanity-check against peaks. Verify Vbus does not cross UVLO/PG during peak events (radio TX / flash write / actuator pulse).
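The calculation box can be run as a script so the steps are applied the same way every time. This is a sketch: `HoldupBox` and `run_box` are hypothetical names, and the example values are placeholders to replace with measured ones. Step 4 needs a measured Vbus trace, so it is left out of the calculation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HoldupBox:
    c_f: float             # effective capacitance at operating voltage
    v_hi: float            # bus/storage voltage at hold-up start
    uvlo_v: float          # downstream DC/DC UVLO
    critical_min_v: float  # MCU/modem/storage minimum
    pg_policy_v: float     # PG/Reset threshold
    margin_v: float        # margin added on top of the highest stop point
    i_avg_a: float         # average bus-side current during hold-up
    i_peak_a: float        # peak current during hold-up events
    esr_total_ohm: float   # storage + interconnect + ORing
    eta: float             # path efficiency
    p_crit_w: float        # critical-load power
    peaks_during_holdup: bool = True


def run_box(b: HoldupBox, model: str = "constant_power") -> Tuple[float, float, float]:
    """Return (Vlo, ΔVESR, thold_s) following Steps 1-3 of the box."""
    # Step 1: Vlo from real stop points.
    v_lo = max(b.uvlo_v, b.critical_min_v, b.pg_policy_v) + b.margin_v
    # Step 2: reserve the immediate ESR droop if peaks may occur.
    dv_esr = b.i_peak_a * b.esr_total_ohm
    v_floor = v_lo + dv_esr if b.peaks_during_holdup else v_lo
    if v_floor >= b.v_hi:
        return v_lo, dv_esr, 0.0      # no usable window at all
    # Step 3: hold-up time with the chosen model.
    if model == "constant_current":
        t_hold = b.c_f * (b.v_hi - v_floor) / b.i_avg_a
    elif model == "constant_power":
        t_hold = b.eta * 0.5 * b.c_f * (b.v_hi ** 2 - v_floor ** 2) / b.p_crit_w
    else:
        raise ValueError(f"unknown model: {model}")
    # Step 4 (peak sanity check) requires a measured Vbus trace; not modeled here.
    return v_lo, dv_esr, t_hold


if __name__ == "__main__":
    box = HoldupBox(c_f=1.0, v_hi=12.0, uvlo_v=8.5, critical_min_v=9.0,
                    pg_policy_v=9.5, margin_v=0.3, i_avg_a=0.5, i_peak_a=2.0,
                    esr_total_ohm=0.15, eta=0.85, p_crit_w=6.0)
    print(run_box(box))
```

Returning Vlo and ΔVESR alongside the time keeps the intermediate values visible, so a design note can show where the window went.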
A sizing result is only acceptable if the measured Vbus minimum stays above UVLO/PG during the worst allowed peak event, and if ΔVESR is explicitly reserved rather than assumed away.
H2-8|Power-Path Management: switching between main and backup without dropouts or reverse feed
Main/backup switching fails for two reasons: uncontrolled current direction (reverse feed or source fighting) and insufficient transient support (Vbus briefly crosses UVLO/PG during handover). Power-path management must guarantee reverse-current blocking, bounded voltage drop, and a PG/Reset policy that matches real switching dynamics.
ORing / ideal-diode control: what it actually prevents
Reverse feed:
- Prevents backup from energizing the main bus or upstream supply.
- Avoids unintended heating, "mystery power", and unsafe current loops.
- Key feature: reverse-current blocking (not just low forward drop).
Source fighting:
- Two sources with slightly different voltages can circulate current.
- Creates heat and can trip protection even at modest load power.
- Controlled ORing ensures only the preferred path conducts.
Forward drop:
- Forward drop reduces the usable V window for hold-up sizing.
- Losses scale with current; thermal rise can worsen drop and stability.
- Design target: "low drop + controlled direction + diagnosable events."
What matters during the handover instant (microseconds to milliseconds)
- Current direction: load current must migrate to the intended source without reverse conduction.
- Bus capacitance support: Cbus bridges the delay while ORing transitions and control reacts.
- PG/Reset timing: PG thresholds and blanking must prevent reset storms during brief dips, without hiding real brownouts.
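The "bus capacitance support" point can be quantified with ΔV = I · t / C. The sketch below assumes the worst case: constant load current and no contribution from either source during the handover gap. The function names and the example numbers (2 A load, 200 µs gap, 12 V bus, 10.5 V trip, 0.5 V margin) are hypothetical.

```python
def cbus_min_f(i_load_a: float, t_gap_s: float,
               v_bus_start: float, v_trip: float, margin_v: float = 0.0) -> float:
    """Minimum Cbus so Vbus stays above v_trip + margin during a t_gap_s handover.

    Assumes constant load current and that neither source delivers current
    during the gap (worst case), so ΔV = I · t / C.
    """
    dv_allowed = v_bus_start - (v_trip + margin_v)
    if dv_allowed <= 0:
        raise ValueError("no droop budget: Vbus already at or below trip + margin")
    return i_load_a * t_gap_s / dv_allowed


def bus_dip_v(i_load_a: float, t_gap_s: float, c_bus_f: float) -> float:
    """Predicted Vbus dip for a given Cbus under the same assumptions."""
    return i_load_a * t_gap_s / c_bus_f


if __name__ == "__main__":
    print(f"Cbus_min ≈ {cbus_min_f(2.0, 200e-6, 12.0, 10.5, 0.5) * 1e6:.0f} µF")
```

The gap time has to come from measurement (ORing turn-on delay plus control reaction), not from a datasheet typical, because it dominates the result.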
Two failure modes that most often “flip the system”
Failure mode 1: reverse feed / backpowering
- Observation: main input "looks alive" after removal; unexpected heating or current.
- Root causes: missing reverse blocking, body-diode paths, unintended conduction through protection.
- Fix direction: true ideal-diode ORing or a back-to-back FET strategy; log reverse-current events.
Failure mode 2: handover dip / reset storm
- Observation: repeating resets / reconnect loops during brownouts and cable drops.
- Root causes: Cbus too small, ORing delay, excessive path drop, PG too sensitive.
- Fix direction: widen the effective window (reduce drop, add C, tune thresholds/blanking) and verify with waveforms.
Deliverable: switching validation checklist (must-capture waveforms)
- Vin_main and Vin_backup (relative timing and sag).
- Vbus (Vmin, dip duration, recovery time).
- Ipath (direction + peak during handover).
- PG/Reset (whether resets correlate with brief dips).
- ORing Vds / gate (confirm intended turn-on/off and reverse blocking).
- Vcrit (critical rail at the load, not only at the bus).
- Trigger: on Vbus falling edge or UVLO crossing for repeatable captures.
A switching design is considered robust only if it demonstrates (1) no reverse feed under worst-case source mismatch, and (2) no UVLO/PG crossings at the critical rail during the full handover envelope.
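The pass criterion can be checked automatically on an exported scope capture of Vbus (or Vcrit). This is a hypothetical post-processing sketch. It assumes time/voltage samples as two equal-length sequences and reports Vmin, the longest continuous time below a dip threshold (at sample resolution), and whether the trip level (UVLO or PG) was crossed.

```python
from typing import NamedTuple, Sequence


class DipReport(NamedTuple):
    v_min: float
    t_at_min: float
    dip_duration_s: float   # longest continuous time below v_dip
    crossed_trip: bool      # any sample at or below v_trip (UVLO/PG)


def analyze_handover(t: Sequence[float], v: Sequence[float],
                     v_dip: float, v_trip: float) -> DipReport:
    if len(t) != len(v) or len(t) < 2:
        raise ValueError("need matching time/voltage samples")
    i_min = min(range(len(v)), key=v.__getitem__)
    longest, start = 0.0, None
    for tk, vk in zip(t, v):
        if vk < v_dip and start is None:
            start = tk                              # dip begins
        elif vk >= v_dip and start is not None:
            longest = max(longest, tk - start)      # dip ends (sample resolution)
            start = None
    if start is not None:                           # capture ended while still dipped
        longest = max(longest, t[-1] - start)
    return DipReport(v[i_min], t[i_min], longest, v[i_min] <= v_trip)


if __name__ == "__main__":
    ts = [0.0, 0.001, 0.002, 0.003, 0.004, 0.005]
    vs = [12.0, 12.0, 11.0, 10.8, 11.6, 12.0]
    print(analyze_handover(ts, vs, v_dip=11.5, v_trip=10.5))
```

Running the same function over every capture in a switchover test campaign turns "no UVLO/PG crossings" into a countable pass/fail result.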
H2-9|PMBus Telemetry & Fault Logs: treating the power system as a “black box”
A robust edge device does not rely on guesswork during brownouts and trips. It collects a small but decisive set of telemetry signals and turns them into two outcomes: (1) online closed-loop protection (derate before a crash) and (2) offline forensics (reconstruct what happened right before the last reset or shutdown).
What to measure (minimum set that explains most failures)
Input side:
- Vin_main / Vin_backup (or post-PD bus for PoE)
- Iin (or Pin if available)
- UV/brownout indication (sag window)
Bus / power path:
- Vbus (Vmin + dip duration)
- Ibus / Pbus (load-migration signature)
- ORing / switch state (handover events)
Critical rails:
- MCU rail, modem rail, storage rail (only the rails that define "alive")
- PG/Reset reason (if accessible)
- Rail undervoltage flags
Protection status and backup readiness:
- eFuse / hot-swap: OCP/OTP/UV/OV flags
- Retry / hiccup counters, latch reason
- Vcap / Vbat (backup energy readiness)
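When these signals come from PMBus devices, most READ_* telemetry words (for example READ_VIN, READ_IIN, and READ_TEMPERATURE_1) use the LINEAR11 format defined in the PMBus specification. READ_VOUT is the exception: it follows VOUT_MODE, typically LINEAR16. The decoder below is a minimal sketch. The command-code constants are from the PMBus Part II command table. Several monitors listed in H2-11 are plain I²C parts with their own register formats, so check each datasheet before assuming LINEAR11.

```python
# PMBus command codes (PMBus Part II command table).
STATUS_WORD = 0x79
READ_VIN = 0x88
READ_IIN = 0x89
READ_TEMPERATURE_1 = 0x8D


def decode_linear11(word: int) -> float:
    """Decode a PMBus LINEAR11 word: value = Y · 2^N.

    Bits 15..11 hold N (5-bit two's complement); bits 10..0 hold Y
    (11-bit two's complement). `word` is the 16-bit value after combining
    the low and high bytes of an SMBus Read Word.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("LINEAR11 word must be 16 bits")
    exponent = word >> 11
    mantissa = word & 0x7FF
    if exponent > 0x0F:        # sign-extend the 5-bit exponent
        exponent -= 0x20
    if mantissa > 0x3FF:       # sign-extend the 11-bit mantissa
        mantissa -= 0x800
    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    print(decode_linear11(0xF030))   # N = -2, Y = 48 -> 12.0
```

Decoding at read time (rather than storing raw words) keeps log records readable, but storing the raw word as well preserves exact evidence if a decoding bug is found later.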
Two values from the same telemetry: online loop vs offline forensics
Online closed loop (act before a crash):
- Soft derate: CPU frequency cap, radio TX power cap
- Peak shaping: postpone non-critical tasks, limit concurrent loads
- Selective disable: turn off high-peak peripherals first
- Safe shutdown when inevitable: persist state → power down cleanly
Offline forensics (explain the last event):
- Reconstruct a timeline: what changed first (Vin sag, OCP, OTP, PG drop)
- Capture a snapshot window: pre-event + post-event samples
- Use counters to detect patterns: repeated retries vs single hard trip
Event logging strategy: timestamp + reason code + pre/post window
A black-box log must do more than store a fault bit. It needs a timestamped event record, a cause code, and a bounded capture window around the event. The practical implementation is a ring buffer in RAM that keeps recent telemetry frames and freezes on triggers, then commits the frozen record to non-volatile memory.
| Field | Why it matters | Example content (compact) |
|---|---|---|
| Timestamp | Places all evidence on a single timeline | t = 123456 ms (monotonic) or RTC time |
| Event type + cause | Distinguishes source sag from protection action | UV / OCP / OTP / ORing switch / PG drop |
| Pre-trigger frames | Explains what was trending before the event | Last N frames of Vin/Vbus/I/T/Vcap |
| Post-trigger frames | Shows recovery attempt (or collapse behavior) | Next M frames after trigger |
| State snapshot | Converts raw waveforms into actionable clues | eFuse flags + retry count + latch reason |
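The record above can be sketched as a RAM ring buffer that freezes on a trigger, collects M post-trigger frames, and then commits the record. The class and field names are hypothetical, and `_commit` only appends to a list as a stand-in for the non-volatile write (FRAM or flash). A second trigger during an active capture is ignored here. A real design might count it or chain records instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Frame:
    t_ms: int        # monotonic timestamp
    vin: float
    vbus: float
    iin: float
    temp_c: float


@dataclass
class EventRecord:
    t_ms: int                       # trigger timestamp
    cause: str                      # e.g. "UV", "OCP", "OTP", "ORING_SWITCH", "PG_DROP"
    pre: List[Frame]                # last N frames before the trigger
    post: List[Frame] = field(default_factory=list)  # next M frames


class BlackBox:
    def __init__(self, n_pre: int, m_post: int) -> None:
        self._ring: Deque[Frame] = deque(maxlen=n_pre)  # RAM ring buffer
        self._m_post = m_post
        self._capturing: Optional[EventRecord] = None
        self.committed: List[EventRecord] = []          # stand-in for NVM

    def push(self, frame: Frame) -> None:
        """Call once per telemetry frame."""
        if self._capturing is not None:
            self._capturing.post.append(frame)
            if len(self._capturing.post) >= self._m_post:
                self._commit(self._capturing)
                self._capturing = None
        self._ring.append(frame)    # recent history keeps rolling

    def trigger(self, t_ms: int, cause: str) -> bool:
        """Freeze the pre-trigger window; ignored while a capture is in progress."""
        if self._capturing is not None:
            return False
        self._capturing = EventRecord(t_ms, cause, list(self._ring))
        return True

    def _commit(self, record: EventRecord) -> None:
        # Hypothetical hook: a real design serializes and writes to FRAM/flash here.
        self.committed.append(record)
```

Copying the ring at trigger time (`list(self._ring)`) is what "freezes" the pre-trigger window, so later frames cannot overwrite the evidence.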
Deliverable: “Telemetry → Action” mapping table (firmware-ready)
Thresholds below are expressed as “value + time window” to prevent false triggers. Replace with product-specific numbers.
| Anomaly signal | Detection rule (value + time) | Firmware action | What to log |
|---|---|---|---|
| Vin sag / brownout window | Vin < threshold for Δt (debounced) | Derate loads; prepare safe state; prefer backup path if available | Vin/Vbus/Iin + event stamp + mode |
| Vbus dip near UVLO | Vbus min < margin-to-UVLO | Peak shaping; temporary peripheral disable; PG blanking policy check | Vbus min, dip duration, PG state |
| Iin spike / repeated pulses | Iin peaks repeating within window | Limit peak, extend ramp, avoid repeated retries | Iin peaks + retry counter + cause flags |
| Protection warning (pre-OTP / near-limit) | T rising to warning band | Thermal derate; reduce duty; pause high-power features | T trend + current + power |
| Backup energy low | Vcap/Vbat below ready threshold | Reduce hold-up expectations; block certain peak actions | Vcap/Vbat + charge state |
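The "value + time window" detection rules can share one qualifier. The sketch below fires once when a condition has held continuously for a hold time, and any clean sample restarts the window. `WindowedDetector`, `qualified_sag_times`, and the example threshold (20 V for 5 ms) are hypothetical. On a True return, firmware would run the row's action and log the listed fields.

```python
from typing import Iterable, List, Optional, Tuple


class WindowedDetector:
    """Qualify a condition by time: fire once it has held for >= hold_ms."""

    def __init__(self, hold_ms: int) -> None:
        self.hold_ms = hold_ms
        self._since_ms: Optional[int] = None
        self.active = False

    def update(self, t_ms: int, condition: bool) -> bool:
        """Return True only on the sample where the condition becomes qualified."""
        if not condition:
            self._since_ms = None      # any clean sample restarts the window
            self.active = False
            return False
        if self._since_ms is None:
            self._since_ms = t_ms
        if not self.active and t_ms - self._since_ms >= self.hold_ms:
            self.active = True
            return True
        return False


def qualified_sag_times(samples: Iterable[Tuple[int, float]],
                        vin_threshold_v: float, hold_ms: int) -> List[int]:
    """Timestamps where 'Vin < threshold for Δt' becomes true."""
    det = WindowedDetector(hold_ms)
    return [t for t, vin in samples if det.update(t, vin < vin_threshold_v)]


if __name__ == "__main__":
    trace = [(0, 24.0), (1, 19.0), (2, 19.0), (3, 19.0), (4, 24.0)]
    trace += [(t, 19.0) for t in range(5, 13)]
    print(qualified_sag_times(trace, vin_threshold_v=20.0, hold_ms=5))
```

In the example, the 3 ms sag is filtered out and only the sustained sag fires, once, at 10 ms. Firing only on the rising edge prevents the action and the log entry from repeating on every sample.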
Telemetry is successful only if it can answer three questions after any failure: (1) did the input collapse, (2) did protection act, and (3) which threshold or sequence happened first on the timeline.
H2-10|Field Debug Evidence: why swapping an adapter or cable can fix—or worsen—dropouts
In the field, “swap the adapter” often changes three hidden parameters: source current limiting behavior, output impedance, and transient response. A cable swap changes DC drop and pulse droop. The fastest way to converge is to follow an evidence priority: voltage first, current second, and state+temperature+logs last to confirm the diagnosis.
Evidence priority (must be executable in the field)
1) Voltage first:
- Vin sag depth + duration (brownout window)
- Vbus minimum vs UVLO/PG thresholds
- Ripple growth during load events
2) Current second:
- Inrush peak (start-up and retries)
- Repeating pulses (hiccup source or retry protection)
- Abnormal bursts aligned with resets
3) State + temperature + logs last:
- eFuse/hot-swap flags: OCP/OTP/UV/OV
- Retry counters and latch reasons
- Thermal trend near protection limits
Diagnosis split (three outcomes)
| Outcome | What the waveforms usually show | What the logs/flags usually show | Action direction (within scope) |
|---|---|---|---|
| A) Source-limited | Vin collapses first; Vbus follows; dips repeat if the source hiccups | UV/brownout timestamps lead; protection may not trip first | Reduce peak demand; widen ramp; validate cable drop; confirm input margin |
| B) Inrush / peak-limited | Iin spike then Vin/Vbus dip; repeated current pulses indicate retries | OCP with retry count rising; “start succeeds warm but fails cold” patterns | Shape inrush; avoid repeated retries; reserve ESR droop; align thresholds |
| C) Policy mismatch | Vbus briefly dips near thresholds; PG/Reset toggles despite quick recovery | PG/Reset events correlate with short dips; UVLO/PG policy too aggressive | Tune debounce/blanking; ensure Vmin stays above UVLO; validate handover dynamics |
Deliverable: “10-minute triage card” (meter → scope → logs)
- 0–3 min: basic checks (meter / quick record). Measure Vin and Vbus at light and heavy load. Note cable and connector temperature or discoloration if present.
- 3–7 min: capture waveforms (scope). Capture Vin, Vbus, PG/Reset; add Iin/Ipath if possible. Trigger on the Vbus falling edge or a threshold crossing.
- 7–10 min: confirm with state and logs. Read the last event: cause flags, retry counters, and the pre/post snapshots. Confirm what happened first on the timeline.
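The "what happened first" step of the triage card can be encoded as a first-pass classifier over the last-event log. This is a hypothetical heuristic that follows the three-outcome table, not a definitive diagnosis. It assumes the log provides the first Vin brownout crossing time, the first OCP/current-limit flag time, whether a reset occurred, and the captured Vbus minimum.

```python
from typing import Optional


def classify_dropout(t_vin_sag_ms: Optional[float],
                     t_ocp_ms: Optional[float],
                     reset_seen: bool,
                     vbus_min_v: float,
                     uvlo_v: float) -> str:
    """Map the event timeline onto outcomes A/B/C of the diagnosis split.

    t_vin_sag_ms: first time Vin crossed the brownout threshold (None if never)
    t_ocp_ms:     first OCP / current-limit flag time (None if never)
    """
    # C: a reset occurred although Vbus never reached UVLO and protection
    # did not act, which points at PG/Reset policy.
    if reset_seen and vbus_min_v > uvlo_v and t_ocp_ms is None:
        return "C: policy mismatch"
    # B: protection acted first (current spike / retries led the sag).
    if t_ocp_ms is not None and (t_vin_sag_ms is None or t_ocp_ms <= t_vin_sag_ms):
        return "B: inrush / peak-limited"
    # A: the input collapsed first.
    if t_vin_sag_ms is not None:
        return "A: source-limited"
    return "unclassified: capture more evidence"


if __name__ == "__main__":
    print(classify_dropout(t_vin_sag_ms=12.0, t_ocp_ms=10.0,
                           reset_seen=True, vbus_min_v=7.0, uvlo_v=8.5))
```

An "unclassified" result is deliberate. It says the evidence set is incomplete, which is more useful than forcing a guess.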
H2-11|Validation Plan: proving it is truly “more stable” (not just lucky)
A stable edge power + backup design must be verified as a repeatable system behavior, not a one-off demo. This plan converts “more stable” into measurable pass/fail criteria plus a mandatory evidence set (waveforms + fault logs) so every failure is classifiable and reproducible.
Define “stable” as testable metrics
Startup robustness:
- Cold start and hot start success across N cycles
- No protection misfire during ramp or load attach
- PG/Reset does not chatter near thresholds
Brownout and switchover:
- Brownout / jitter / dropout handled without reboot loops
- Backup switchover does not cross UVLO/PG trip points
- System enters derate / safe mode before collapse (if designed)
Fault handling:
- Short/overload forces the intended action within time bounds
- No uncontrolled heating from repeated retries
- Fault cause is consistent with waveforms and logs
Example BOM part numbers (for building a verifiable platform)
The list below is a pragmatic “validation-friendly” BOM: protection devices with readable status, power-path mux/ORing, backup managers, telemetry parts, and non-volatile logging. Choose according to voltage/current, SOA, thermal limits, and availability.
TPS25982(TI) — eFuse with adjustable protection, diagnosticsTPS25947(TI) — eFuse family for inrush/OV/UV/OCP use-casesTPS2662(TI) — dual-channel eFuse-class protection for rails/branchesMAX17613(Analog Devices/Maxim) — protected high-voltage switch family (use-case dependent)
LTC4215(Analog Devices) — hot-swap controller (external MOSFET)LTC4217(Analog Devices) — hot-swap controller options for higher powerLTC4368(Analog Devices) — surge stopper / overvoltage protectionLTC4365(Analog Devices) — OV/UV + reverse protection controller class
Power muxes and ideal-diode ORing:
- TPS2121 (TI): power mux (priority + seamless switchover)
- TPS2120 (TI): power mux family option (ratings vary)
- LTC4412 (Analog Devices): ideal diode controller class
- LTC4359 (Analog Devices): ideal diode ORing controller
Backup energy managers:
- LTC3350 (Analog Devices): supercap backup controller + monitoring
- LTC4040 (Analog Devices): backup power manager for 2.5 V–5 V rails (use-case dependent)
- LTC3225 (Analog Devices): 2-cell supercap charger/balancer class
- MAX38888 (Analog Devices/Maxim): backup/supercap manager family (check rail domain)
Telemetry and temperature sensing:
- INA238 (TI): current/voltage/power monitor (I²C, fast captures)
- INA229 (TI): precision monitor options (logging-friendly)
- LTC2946 (Analog Devices): power/energy monitor class (application dependent)
- ADT7420 (Analog Devices): digital temperature sensor (for OTP correlation)
PMBus managers and supervisors:
- LTC2977 (Analog Devices): PMBus power system manager (multi-rail)
- LTC2974 (Analog Devices): PMBus manager family
- UCD9090A (TI): power sequencer / system health monitor with telemetry
- UCD90240 (TI): digital power sequencer / supervisor class
Non-volatile log storage:
- FM24CL64B (Infineon/Cypress): I²C FRAM (write-endurance friendly)
- FM25V10 (Infineon/Cypress): SPI FRAM option
- 24LC256 (Microchip): I²C EEPROM (budget logging, slower writes)
- W25Q32JV (Winbond): SPI NOR flash (bulk logs; manage wear)
Supercap charge/balance and simple detect hooks:
- LTC3225 (Analog Devices): 2-cell supercap charger + balance
- LTC3350 (Analog Devices): system-level supercap controller + monitor
- TLV431 (TI): shunt reference used in simple balance/clamp topologies
- LMV331 (TI): comparator class for simple UV/OV detect hooks
Validation matrix (test item → setup → stimulus → required waveforms/log fields → pass/fail)
Use this as an execution script. For each test, capture the evidence set and enforce the pass criteria. If a failure occurs, the waveform + log must produce a consistent “failure signature” (source-limited vs inrush-limited vs policy mismatch).
| Test item | Setup (conditions) | Stimulus / injection | Must-capture waveforms | Must-capture log fields | Pass / Fail criteria |
|---|---|---|---|---|---|
| Cold start | T = Tmin, load = typical, input = source A | Power-on ramp to nominal | Vin, Vbus, PG/Reset (Iin recommended) | Boot event stamp; any OCP/UV/OTP flags; retry counters | Pass: 100% boot success across N cycles; no false trips; no PG chatter |
| Hot start | T = after warm-up (steady thermal), load = typical | Power-cycle with short off-time | Vin, Vbus, PG/Reset, Iin | Cause of any trip; pre-trigger frames (Vin/Vbus/I/T) | Pass: no degradation vs cold start; no “retry heating” pattern |
| Input source variation | Same device/load; swap input sources (A/B/C) | Repeat start + load steps per source | Vin, Vbus, Iin, PG/Reset | Event timeline; UV window duration; source sag signature | Pass: stable across sources; if behavior changes, logs must clearly classify root cause |
| Max load + peak events | Load = max; enable known peak (radio TX / write / actuator) | Peak bursts while monitoring | Vbus min, Iin peaks, PG/Reset | Derate actions taken; peak limiter state; counters | Pass: no unexpected reset; no thermal runaway; controlled derate if designed |
| Brownout window | Operate near minimum input margin | Inject sag (depth + duration sweep) | Vin sag profile, Vbus response, PG/Reset | UV event; pre/post capture; recovery mode | Pass: no false trips outside defined boundary; within boundary → recovery strategy is correct and repeatable |
| Jitter / dropout | Nominal load; input toggles or intermittent contact | Short dropouts & bursts | Vin/Vbus/PG, Iin pulses | Event counters; retry count growth; any latch reason | Pass: no reboot loops; protection does not self-heat via repeated retries |
| Backup switchover (main→backup) | Main + backup present; backup charged/ready (Vcap/Vbat) | Remove main input | Vin_main, Vin_backup, Vbus, PG/Reset, Vcap/Vbat | ORing/switch event; Vcap/Vbat min; any UV flags | Pass: Vbus stays above UVLO/PG; switchover logged with correct time order |
| Backup switchover (backup→main) | Running on backup; main restored | Restore main input | Vin_main/Vbus/Iin, PG/Reset | Switch event; reverse-current flags if available | Pass: no shoot-through or source contention; no PG dip; no reverse feed into the main path |
| Short-circuit injection | Representative output/branch point (document the point) | Hard short for a controlled duration | Vbus collapse response, Iin limit profile, device temperature trend | OCP cause; action type (latch/retry); retry count | Pass: intended action occurs within bounds; no uncontrolled heating; logs match waveforms |
| Overload injection | Step load beyond rated; then sustained overload | Load step + dwell | Vbus dip, Iin profile, PG/Reset behavior | Derate/limit flags; OCP/OTP warnings | Pass: no oscillatory reboot; controlled limit or shutdown with evidence |
| Reverse polarity (if allowed) | Only if design intent supports it | Apply reverse input within safe bounds | Input current behavior; protection response | Reverse flag/cause; latch reason | Pass: no damage; no uncontrolled current; event correctly logged |
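Each row names a required evidence set. The matrix can therefore also be encoded as data, so that a run missing a channel or log field is rejected before anyone reads its verdict. Below is a minimal sketch that transcribes two rows. The channel and log-field identifiers are hypothetical names chosen for illustration, not fields of any particular logger.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class EvidenceSpec:
    waveforms: FrozenSet[str]   # channels that must exist in the capture
    log_fields: FrozenSet[str]  # fields that must exist in the fault log


# Two rows transcribed from the matrix; identifiers are illustrative.
MATRIX: Dict[str, EvidenceSpec] = {
    "cold_start": EvidenceSpec(
        frozenset({"Vin", "Vbus", "PG/Reset"}),
        frozenset({"boot_stamp", "fault_flags", "retry_count"}),
    ),
    "switchover_main_to_backup": EvidenceSpec(
        frozenset({"Vin_main", "Vin_backup", "Vbus", "PG/Reset", "Vcap/Vbat"}),
        frozenset({"switch_event", "vcap_min", "uv_flags"}),
    ),
}


def missing_evidence(test: str, channels: Set[str], fields: Set[str]) -> List[str]:
    """Required items absent from a run; an empty list means the run is admissible."""
    spec = MATRIX[test]
    gaps = [f"waveform:{name}" for name in sorted(spec.waveforms - channels)]
    gaps += [f"log:{name}" for name in sorted(spec.log_fields - fields)]
    return gaps
```

Optional channels (such as Iin for the cold start) can be captured without affecting the check, since only missing items are reported.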
Practical rule: a test only “counts” when the captured waveforms and the logged cause/timeline agree. If they disagree, the platform lacks observability.
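The agreement rule can be checked mechanically once each failed run is reduced to a few waveform metrics. Below is a minimal sketch. It assumes the three signature classes named for the matrix (source-limited, inrush-limited, policy mismatch). It also assumes the logged cause has already been mapped onto those same labels. The decision order (current limit first, then input sag, otherwise policy) is one plausible heuristic, not a standard classification.

```python
from dataclasses import dataclass


@dataclass
class FailedRun:
    vin_min_v: float   # lowest input voltage at the device terminals
    vin_uv_v: float    # UV threshold of the input protection
    iin_peak_a: float  # peak input current around the event
    ilim_a: float      # configured current limit


def waveform_signature(r: FailedRun) -> str:
    """Heuristic classification of a run that tripped or reset."""
    if r.iin_peak_a >= r.ilim_a:
        return "inrush-limited"    # local demand reached the limit
    if r.vin_min_v < r.vin_uv_v:
        return "source-limited"    # input sagged while local current stayed legal
    return "policy-mismatch"       # tripped although V and I stayed inside limits


def evidence_agrees(r: FailedRun, logged_cause: str) -> bool:
    """The run counts only if the logged cause matches the waveform signature."""
    return waveform_signature(r) == logged_cause
```

A mismatch is itself a finding: per the rule above, it points to missing observability rather than to a root cause.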
H2-12|FAQs (12) — Edge Power & Backup
Each FAQ maps to the corresponding chapter (H2-3…H2-11) and stays within this page boundary: protection, hold-up, switchover, telemetry/logs, field evidence, and validation methodology.
1. What is the clean engineering boundary between an eFuse and a hot-swap controller, and when is hot-swap mandatory?
- Evidence: startup Vbus ramp, Iin profile, and whether the pass element spends a long time in linear mode.
- Rule: pick the device class by energy + thermal, not just steady-state current.
Example parts · eFuse: TPS25982, TPS2662 (TI) · hot-swap / input protection: LTC4215, LTC4368 (Analog Devices)
2. Why can a system trip frequently even when average load current is small, and what are the top three root-cause classes?
- Inrush or burst peaks (capacitor charge, radio TX bursts, motor/relay events) that exceed the limit briefly.
- Threshold/blanking mismatch where ripple or short droops hit OCP/UV filters too aggressively.
- Thermal accumulation from repeated retries: each attempt adds heat even if “average” looks small.
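The first two classes can be reproduced with a deliberately simplified OCP model. In this model, the limit trips only when current stays above the threshold for longer than a blanking time. As a result, a short burst can trip it even when the average current is low. Below is a minimal sketch assuming uniformly sampled current. The limit, blanking times, and waveform are made-up values and do not model any specific device.

```python
from typing import List


def average_a(samples_a: List[float]) -> float:
    return sum(samples_a) / len(samples_a)


def ocp_trips(samples_a: List[float], dt_s: float, limit_a: float, blank_s: float) -> bool:
    """Trip when current stays above limit_a for longer than blank_s.

    Simplified timer model: the over-current timer resets as soon as the
    current drops back below the limit.
    """
    over_s = 0.0
    for i_a in samples_a:
        over_s = over_s + dt_s if i_a > limit_a else 0.0
        if over_s > blank_s:
            return True
    return False
```

Take a 0.3 A baseline and one 300 µs burst at 3 A in a 10 ms window (10 µs samples). The average is only 0.381 A, yet a 200 µs blanking time trips while a 500 µs one does not.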
Example parts · monitor: INA238, INA229 (TI) · protection with diagnostics: TPS25947 (TI), LTC4217 (Analog Devices)
3. After load capacitance increases, should dV/dt or current limit be adjusted first?
- Evidence: compare peak Iin, ramp time, and pass-element temperature across settings.
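A first-order comparison of settings needs only the ramp arithmetic. For a linear ramp at slew rate dV/dt into C_load:

- The charging current is C_load·dV/dt.
- The ramp lasts V_in/(dV/dt).
- Because V_DS falls linearly from V_in to 0, the pass element dissipates (C_load·dV/dt + I_load)·V_in·t_ramp/2. That is ½·C_load·V_in² plus a load term.

Below is a minimal sketch. It assumes the load draws a constant I_load for the whole ramp, which is a worst case (many loads stay off until PG). The helper name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RampResult:
    i_charge_a: float  # capacitor charging current C*dV/dt
    i_peak_a: float    # charging current plus constant load current
    t_ramp_s: float    # time to ramp from 0 V to v_in
    e_fet_j: float     # energy dissipated in the pass element during the ramp
    p_avg_w: float     # average pass-element power over the ramp


def dvdt_ramp(c_load_f: float, v_in_v: float, dvdt_v_per_s: float,
              i_load_a: float = 0.0) -> RampResult:
    i_charge = c_load_f * dvdt_v_per_s
    t_ramp = v_in_v / dvdt_v_per_s
    # V_DS falls linearly from v_in to 0, so its average over the ramp is v_in / 2.
    e_fet = (i_charge + i_load_a) * v_in_v * t_ramp / 2.0
    return RampResult(i_charge, i_charge + i_load_a, t_ramp, e_fet, e_fet / t_ramp)
```

Consider 1000 µF, 24 V, and a 0.5 A load. Halving dV/dt (1000 → 500 V/s) halves the charging current. However, it raises pass-element energy from 0.432 J to 0.576 J, because the FET spends twice as long in the linear region. This is why peak Iin, ramp time, and temperature have to be compared together.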
Example parts · eFuse: TPS25982, TPS2662 (TI) · external-FET hot-swap: LTC4215 (Analog Devices) · mux for controlled sourcing: TPS2121 (TI)
4. Foldback vs constant-current limiting: which is more likely to “start successfully but run hotter”?
- Evidence: log retry counts and temperature; correlate with Vbus ramp duration under limiting.
Example parts · eFuse: TPS25947, TPS25982 (TI) · hot-swap control: LTC4217 (Analog Devices) · telemetry: INA238 (TI)
5. Why do more retries make failures more likely to burn parts, and how should retry vs latched behavior be set safely?
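The heating mechanism can be illustrated with a lumped first-order thermal model. Each failed attempt adds a fixed temperature rise, and the pass element cools exponentially during the off-time. Below is a minimal sketch of a hypothetical retry policy that latches on either a retry cap or an estimated-rise limit. The model, parameter names, and numbers are assumptions, not the built-in behavior of any listed device. The per-attempt rise would in practice come from temperature telemetry.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RetryPolicy:
    max_retries: int           # retries allowed after the first attempt
    rise_limit_c: float        # allowed estimated pass-element temperature rise
    rise_per_attempt_c: float  # rise added by one failed attempt (from measurement)
    tau_s: float               # lumped thermal time constant
    t_off_s: float             # cool-down delay between attempts


def retry_outcome(p: RetryPolicy, failing_attempts: int) -> Tuple[str, int, float]:
    """Model a fault that clears after `failing_attempts` failed starts.

    Returns (result, attempts_made, estimated_rise_c), where result is
    'recovered', 'latched:thermal', or 'latched:count'.
    """
    rise = 0.0
    decay = math.exp(-p.t_off_s / p.tau_s)  # first-order cooling during t_off
    for attempt in range(1, p.max_retries + 2):
        if attempt > failing_attempts:
            return ("recovered", attempt, rise)
        rise = rise * decay + p.rise_per_attempt_c  # heat from this failed attempt
        if rise > p.rise_limit_c:
            return ("latched:thermal", attempt, rise)
    return ("latched:count", p.max_retries + 1, rise)
```

Assume 15 °C per attempt, τ = 2 s, and a 60 °C limit. With a 0.1 s off-time, heat stacks up until the thermal guard latches on the fifth attempt. With a 5 s off-time, the estimated rise settles near 16 °C and the retry cap decides instead. In this model the steady-state rise is ΔT_attempt/(1 − e^(−t_off/τ)), so the off-time matters as much as the retry count.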
Example parts · protection: TPS25982 (TI), LTC4215 (Analog Devices) · persistent logs: FM24CL64B (FRAM, Infineon/Cypress) · supervisor/health monitor: UCD9090A (TI)
6. In supercap hold-up sizing, how does ESR cause “the math is right but the hardware fails”?
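An energy-only estimate, t = η·½C(V_hi² − V_lo²)/P, treats the capacitor terminal voltage as if it were the internal voltage. With ESR, the terminal voltage sits I·ESR lower. A constant-power DC/DC also draws more current as the voltage falls. The terminal therefore reaches V_lo while usable energy is still stored in the capacitor. Below is a minimal sketch comparing both estimates. It assumes constant load power P behind a converter of fixed efficiency η, a single lumped ESR, and forward-Euler integration. The function names and example values are hypothetical.

```python
import math


def holdup_ideal_s(c_f: float, v_hi: float, v_lo: float, p_w: float, eff: float) -> float:
    """Energy-window estimate that ignores ESR."""
    return eff * 0.5 * c_f * (v_hi ** 2 - v_lo ** 2) / p_w


def holdup_with_esr_s(c_f: float, esr_ohm: float, v_hi: float, v_lo: float,
                      p_w: float, eff: float, dt_s: float = 1e-4) -> float:
    """Step a constant-power discharge until the terminal voltage drops below v_lo."""
    p_in = p_w / eff   # power drawn at the capacitor terminals
    v_cap = v_hi       # voltage of the ideal capacitor behind the ESR
    t = 0.0
    while True:
        # Terminal voltage V satisfies V = v_cap - esr * (p_in / V); take the larger root.
        disc = v_cap ** 2 - 4.0 * esr_ohm * p_in
        if disc < 0.0:
            return t   # ESR drop makes p_in undeliverable: the rail collapses
        v_term = 0.5 * (v_cap + math.sqrt(disc))
        if v_term < v_lo:
            return t
        i_a = p_in / v_term
        # Charge leaves the ideal capacitor at i_a; since i_a * v_cap equals
        # p_in + i_a**2 * esr, the ESR loss is included automatically.
        v_cap -= i_a * dt_s / c_f
        t += dt_s
```

Take 1 F, 5 V → 3 V, 5 W, and η = 0.9. The energy-only figure is 1.44 s, while 0.1 Ω of ESR cuts the simulated window by roughly 10%. The gap widens with higher load power or higher ESR.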
Example parts · supercap backup controller: LTC3350 (Analog Devices) · 2-cell charger/balancer: LTC3225 (Analog Devices) · current monitor: INA238 (TI) · power mux: TPS2121 (TI)
7. For hold-up, should V_lo be defined by the PG threshold or by the DC/DC UVLO threshold?
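If PG/RESET is what ends the window for “must-do” actions, then on a falling bus the usable floor is whichever threshold is crossed first, plus any margin. Below is a minimal sketch. It assumes both thresholds are referred to the same bus node and uses an ESR-free energy-window estimate. The function names and the 10 mF / 12 V / 10 W example values are hypothetical.

```python
def effective_v_lo(pg_fall_v: float, uvlo_fall_v: float, margin_v: float = 0.0) -> float:
    """On a falling bus, the usable window ends at whichever threshold is crossed first."""
    return max(pg_fall_v, uvlo_fall_v) + margin_v


def holdup_window_s(c_f: float, v_hi: float, v_lo: float, p_w: float, eff: float) -> float:
    """Energy-window hold-up estimate (ignores ESR and leakage)."""
    return eff * 0.5 * c_f * (v_hi ** 2 - v_lo ** 2) / p_w
```

With these values, sizing against a 9.0 V UVLO instead of a 10.8 V PG threshold overstates the window by about 2.3× (28.4 ms vs 12.3 ms).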
Example parts · voltage supervisor: TPS3702 (TI) · system monitor/sequencer: UCD9090A (TI) · backup manager: LTC3350 (Analog Devices) · power mux: TPS2121 (TI)
8. If the system reboots during backup switchover, which three waveforms should be captured first?
- Vin_main and Vin_backup (or the mux/ORing input nodes) to confirm the source transition.
- Vbus to see whether the bus crosses UVLO/PG boundaries and for how long.
- PG/Reset to mark exactly when the system decides it is “not OK.”
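Once the three captures exist, the useful numbers are time-ordered events:

- when the main input dropped
- when, and for how long, Vbus sat below the PG threshold
- when PG fell

Below is a minimal sketch. It assumes the channels were exported on a common time base as equal-length sample lists, with PG sampled as a 0/1 logic level. The threshold values and function names are hypothetical.

```python
from typing import Dict, List, Optional


def first_below(t: List[float], v: List[float], thr: float) -> Optional[float]:
    """Time of the first sample below thr, or None if it never goes below."""
    return next((ti for ti, vi in zip(t, v) if vi < thr), None)


def time_below(t: List[float], v: List[float], thr: float) -> float:
    """Total time below thr, treating each sample as held until the next one."""
    return sum(t1 - t0 for t0, t1, v0 in zip(t, t[1:], v) if v0 < thr)


def switchover_report(t: List[float], vin_main: List[float], vbus: List[float],
                      pg: List[float], vin_lost_v: float,
                      vbus_pg_v: float) -> Dict[str, Optional[float]]:
    return {
        "t_main_lost": first_below(t, vin_main, vin_lost_v),
        "t_vbus_below_pg": first_below(t, vbus, vbus_pg_v),
        "t_pg_low": first_below(t, pg, 0.5),  # PG/Reset sampled as logic 0/1
        "vbus_min": min(vbus),
        "vbus_below_pg_time": time_below(t, vbus, vbus_pg_v),
    }
```

In a made-up capture (time in ms), the main input is lost at 2 ms, Vbus crosses the PG threshold at 3 ms, and PG falls at 4 ms. That order is consistent with a reset caused by the switchover dip rather than by an unrelated event.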
Example parts · power mux: TPS2121 (TI) · ideal diode: LTC4412 (Analog Devices) · monitor: INA238 (TI) · PMBus manager (logs): LTC2977 (Analog Devices)
9. How can backfeed from the backup rail into the main bus be prevented?
Example parts · ideal diode / ORing: LTC4412, LTC4359 (Analog Devices) · power mux with priority: TPS2121 (TI) · reverse/OV protection class: LTC4365 (Analog Devices)
10. For PMBus telemetry, which fields are most valuable for a “black-box” record around power-loss events?
- VIN/VOUT, IIN/IOUT, input/output power, temperature (board/hotspot if possible).
- Status/fault flags: OCP/OV/UV/OTP, retry counters, last fault code.
- Energy store voltage (Vcap/Vbat) and switchover state.
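A black-box record can be built as a ring buffer of periodic telemetry frames that is frozen when a trigger fires. The pre-trigger history then survives along with a short post-trigger tail. Below is a minimal in-memory sketch whose frame fields mirror the list above. The trigger source, buffer depths, status-bit layout, and class names are assumptions. A real implementation would persist the frozen record to FRAM or flash.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Frame:
    t_ms: int
    vin: float
    vout: float
    iin: float
    iout: float
    temp_c: float
    vcap: float
    on_backup: bool
    status: int        # packed OCP/OV/UV/OTP bits; layout is hypothetical


class BlackBox:
    """Keep `pre` frames of history; on a trigger, freeze them plus `post` more."""

    def __init__(self, pre: int, post: int) -> None:
        self.ring: Deque[Frame] = deque(maxlen=pre)
        self.post = post
        self.record: Optional[List[Frame]] = None
        self._post_left = 0

    def push(self, frame: Frame, trigger: bool = False) -> None:
        if self.record is None:
            if trigger:
                # Snapshot pre-trigger history together with the trigger frame.
                self.record = list(self.ring) + [frame]
                self._post_left = self.post
            else:
                self.ring.append(frame)
        elif self._post_left > 0:
            self.record.append(frame)   # post-trigger tail
            self._post_left -= 1
        # Otherwise the record is complete and later frames are dropped
        # until it has been persisted and clear() is called.

    def complete(self) -> bool:
        return self.record is not None and self._post_left == 0

    def clear(self) -> None:
        """Re-arm after the frozen record has been written to non-volatile storage."""
        self.record = None
        self._post_left = 0
```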
Example parts · PMBus managers: LTC2977, LTC2974 (Analog Devices) · health monitor/sequencer: UCD9090A (TI) · non-volatile logs: FM24CL64B (FRAM, Infineon/Cypress)
11. When “swapping the adapter” makes the system stable (or worse), what does it usually imply, and should the input source or the local design be checked first?
- Voltage evidence: Vin droop depth and duration, ripple, and brownout window relative to UV thresholds.
- Current evidence: inrush peak, retry pulse train, and whether the source hits its own current limit.
- Status/thermal evidence: protection flags, OTP events, and retry counters over time.
Example parts · monitor: INA238 (TI) · eFuse diagnostics: TPS25982 (TI) · OV/UV/reverse input protection: LTC4368 (Analog Devices) · supervisor: TPS3702 (TI)
12. How should validation cases be designed to prove the architecture is truly more stable, not just “occasionally not failing”?
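One way to turn “more stable” into a number is a zero-failure demonstration. If N consecutive cycles under the matrix conditions show no failure, the binomial success-run relation (1 − p)^N ≤ 1 − confidence bounds the per-cycle failure probability p. This standard statistical technique is added here as an illustration; it is not a method stated above. It assumes independent cycles with a constant failure probability, which effects such as retry heating can violate. The function names are hypothetical.

```python
import math


def cycles_needed(p_fail_max: float, confidence: float) -> int:
    """Zero-failure cycles needed to claim P(fail per cycle) < p_fail_max at `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail_max))


def upper_bound_zero_failures(n: int, confidence: float) -> float:
    """Upper confidence bound on per-cycle failure probability after n clean cycles."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)
```

For example, claiming fewer than 1 failure per 100 cold starts at 95% confidence takes 299 clean cycles. A handful of successful boots therefore says little.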
Example parts · monitor: INA238 (TI) · PMBus logs: LTC2977 (Analog Devices) · persistent storage: FM25V10 (FRAM, Infineon/Cypress) · switchover element: TPS2121 (TI)
Tip: keep each FAQ answer (acceptedAnswer) aligned with the evidence set (Vin/Vbus/PG/Iin/Vcap + flags + timestamped pre/post window) so Google snippets and field engineers see the same source of truth.