← Back to: eFuse / Hot-Swap / OR-ing Protection
Definition & Role
This page frames the protection manager layer as three stacked responsibilities: Telemetry (normalize RAW to engineering units), Policy (limits, hysteresis, debounce), and Audit (PG/FAULT semantics and black-box logging). It abstracts below-layer eFuse, Hot-Swap, and Ideal-Diode devices without repeating device-level details.
- Chain:
RAW → ENG → POLICY → PG/FAULT → LOGwith explicit field names, units, and timebase. - Device boundary: devices expose registers/interrupts; the manager normalizes units, applies limits, latches faults, and writes events.
- Small-batch procurement: a unified API (names, units, scaling) lowers cross-brand swap risk.
Scope
API & semantics layer: telemetry normalization, threshold policy, PG/FAULT states, ring-buffer events, last-gasp write-back, and field mapping.
Non-Goals
No SOA math, MOSFET selection, gate charge tuning, or Ideal-Diode reverse-current criteria—those belong to sibling device pages.
Terminology
- PG
- Power-Good state (good/warn/bad).
- FAULT
- Latched abnormal condition with code.
- Debounce
- Time requirement before entering/exiting a state.
- Hysteresis
- Separated enter/exit thresholds to avoid chatter.
- Last-Gasp
- Guaranteed write-back before power loss.
- Ring Buffer
- Head/tail indexed event log with CRC.
Telemetry Schema
Use lowercase snake_case; all values are integers with unit suffixes (mV, mA, °C, ms). Provide UTC millisecond timestamps for samples or batch headers.
Sampling & Precision
- Voltage: 1–5 mV/LSB; Current: 1–10 mA/LSB; Temperature: 1 °C/LSB.
- Refresh via 10–100 ms windows with
avg_k,p95,max. - Calibration: zero/gain two-point; verify by read-back.
Debounce & Hysteresis
- Enter:
time_above(H) ≥ Td_enter; Exit:time_below(L) ≥ Td_exit. - Hysteresis:
H − L ≥ Δmin(≥1% Vnom / 5% Inom typical). - Thermal events use longer
Td_enter/Td_exit.
Missing/Edge Cases
- If
iout_mAabsent:iout_mA ≈ round(shunt_mV*1000/Rsense_mΩ). - Mark
quality_flag= valid | inferred | stale. - Saturation: set
saturated=1and clamp last reliable value.
Validation & KPIs
- Unit consistency: all exported values are integers with suffixes (mV/mA/°C/ms).
- Chatter robustness: with ±1–2% threshold wobble, PG transitions remain stable; event rate is limited.
- Cross-brand alignment: voltage/current curves within ≤2%; inferred values within ≤5% after calibration.
PMBus / I²C Mapping
Normalize vendor-specific PMBus/I²C registers into a stable manager API. Keep device-level semantics aligned: measurement (V/I/T), status (PG/STATUS), fault limits, response behavior, and private extensions for black-box and last-gasp.
PMBus essentials
- READ_VOUT, READ_IOUT, READ_TEMPERATURE_1
- STATUS_WORD / STATUS_BYTE, MFR_STATUS
- VOUT_OV_FAULT_LIMIT, IOUT_OC_FAULT_LIMIT, OT_FAULT_LIMIT
- FAULT_RESPONSE, PAGE (multi-rail)
I²C private extensions
- BB_EVT_HEAD / BB_EVT_TAIL / BB_EVT_READ(n)
- LAST_GASP_EN, LG_CAP_VHI / LG_CAP_VLO
- EVT_MASK, CLEAR_LATCHED_FAULT
Configuration transaction
- Write limits (OV/OC/OT, hysteresis, debounce).
- Read-back & verify; enable PEC/CRC.
- Commit marker (status bit or MFR flag).
SMBus timing & timeout; I²C 100/400 kHz.
Polling vs. interrupt
- Poll slow variables (T/avg current).
- Interrupt for fast events (SC/UV/OV) and take an immediate snapshot: V/I/T + STATUS + unified fault_code.
Reliability
- Write->read-back match ≥ 99.9% with retry/backoff.
- Pre-shutdown commit check before last-gasp.
- Shadow registers on MCU for consistency.
Black-Box Events
Minimal event set
- OV / UV / OC / SC / OT
- PG lost, reverse current, foldback
- Restart source: cold-start / remote / thermal
Entry structure (16–24 B)
{ ts_utc32, code16, v_mV16, i_mA16, t_c8, detail16, crc8 }
code16 = domain(hi8) + subcode(lo8), e.g., OV=0x01, UV=0x02; subcode=enter/exit/latch/source.
Capacity & compression
- N = 16 / 32 / 64 depth; size ≈ N × entry_size.
- Collapse same-type events; rate limit with min Δt.
- Stage summary: merge enter/exit into one period log.
Readout protocol
- HEAD/TAIL pointers; BB_EVT_READ(n) for bursts.
- Clear policy: on_read vs explicit_clear.
- CRC on every entry; mark
crc_okin diagnostics.
Storm suppression
- min_delta_t between identical domains.
- Collapse window merges back-to-back toggles.
- Hot events prioritized for last-gasp write-back.
Validation & KPIs
- CRC pass rate ≥ 99.9% under UV/SC/thermal injection.
- Lost-entry rate < 0.5% at full depth.
- READ(n) for full depth ≤ 10 ms @ 400 kHz.
Last-Gasp Write-Back
Guarantee writing back N critical events plus one CRC before power collapses. Compute energy and timing budgets, then enforce a graceful shutdown sequence with downgrade strategies if the hold-up is insufficient.
E_req = N_evt·E_write + E_crc + E_handshake + MarginE_cap = 0.5·C·(V_hi² − V_lo²)must satisfyE_cap ≥ E_reqt_total = N_evt·t_write + t_crc + t_flush ≤ t_hold
Shutdown sequence
- Detect power-fail IRQ → freeze sampling.
- Serialize key events (add latest V/I/T snapshot).
- Write
Nentries → write CRC → set commit flag. - Assert power-off policy (gate disable / foldback).
Degrade on short hold-up
- Reduce
NtoN’(prioritize SC/OC/UV > OT > PG). - Replace edges with “phase summary”.
- Mark
truncated=1for audit.
Temperature & ESR
Low temperature raises ESR → reduces usable energy and bus voltage. Compensate via higher V_hi, lower N, or earlier power-fail threshold.
Validation & KPIs
- Three collapse shapes (linear/exponential/step): success ≥ 99% (−20~85 °C).
- CRC pass rate ≥ 99.9%; timing error ≤ 10%.
- On truncate: tail marker + reason code readable at next boot.
BOM remarks (small-batch)
Specify C,V_hi,V_lo,N,CRC=on,clear_policy. Substitutions must be within TI / ST / NXP / Renesas / onsemi / Microchip / Melexis and re-validated against E_cap/E_req and t_hold.
PG/FAULT Normalization
Unified fields
pg_state ∈ {good, warn, bad}fault_code(domain + subcode)fault_latched ∈ {0,1},clear_policy ∈ {auto, remote, power_cycle}
Debounce & hysteresis
Enter when time_above(H) ≥ Td_enter; exit when time_below(L) ≥ Td_exit. Use a middle “warn” band for predictive maintenance (rising temp, contact resistance, nearing OC).
Latch & clear
After fault_latched=1, do not auto-clear even if the signal returns to a safe window; require a remote clear once in_safe_window & dwell ≥ Td_exit & no_new_event.
Vendor bitfields → unified map
- STATUS_WORD/BYTE bits → domains: OV, UV, OC, SC, OT, RCB, etc.
- Boolean PG → derive “warn” via thresholds; tri-state PG → direct mapping.
Validation & SLAs
- Edge alignment across vendors ≤ 5 ms under identical waveforms.
- False-alarm ≤ 0.1% with ±1–2% threshold wobble.
- “warn” must be visible via API within 1 s.
Logs & reporting
- Every transition good↔warn↔bad emits
{ts, cause, code, latched}. - Warn dwell time rolls into a health score; expose via telemetry.
- Remote clear action is logged with operator/host ID (optional).
Cloud Mapper
Unify heterogeneous PMBus/I²C telemetry from seven brands into a single cloud schema. Provide lossless field mapping, quality flags, and deterministic fallbacks for missing signals.
Minimum viable schema
- vin_mV, vout_mV, iout_mA, shunt_mV, die_temp_c
- pg_state ∈ {good,warn,bad}, fault_code, fault_latched
- policy_state, uptime_ms, fw_rev, device_profile, schema_rev
- quality: valid{0|1}, stale{0|1}, src{direct|derived}
All fields integer with unit suffix; UTC timestamp in ms; 10–100 ms bucketed sampling.
Derivation rules
- iout_mA ← shunt_mV / Rsense(mΩ); mark
src=derived. - board temp fallback ← die_temp_c + ΔT_profile.
- p_out_mW = vout_mV * iout_mA / 1000; windowed energy integration.
Post-calibration error bounds ≤ 5% for derived current.
Versioning & compatibility
- schema_rev backward compatible; new fields default valid=0, stale=1.
- device_profile pins vendor/series/page/PEC mapping.
- Charts must show continuous lines when mixed-brand uploads occur.
Brand → PN examples (1/3)
- TI LM5066I (PMBus): direct READ_* / STATUS_* mapping; PAGE per-rail.
- TI TPS25985 (I²C eFuse): limits + status; derive current via shunt chain.
- Renesas ISL6146A + ISL28022: Hot-swap + I²C current monitor, easy iout_mA.
Brand → PN examples (2/3)
- Microchip PAC1934 + MIC25404: quad power monitor + limiter.
- onsemi NIS5021/NIS5420 + FAN4010: eFuse + current sense.
- ST STEF01/12 + TSC2010/2020 + STTS22H: eFuse + current + temp.
Brand → PN examples (3/3)
- NXP PCA9450 / PF series: PMIC telemetry mapped via profile.
- Melexis MLX91220/91221/91230: hall current sensors → iout_mA.
- Melexis MLX90632: board temp fallback source.
Validation
- Mixed-brand uploads render continuous charts with no breaks.
- Derived current error ≤ 5% after calibration.
- Update rate capped ≤ 2 Hz under oscillation to avoid storms.
Small-Batch Procurement Hooks
BOM note (copy & paste)
Manager must expose vin_mV, vout_mV, iout_mA, die_temp_c, pg_state, fault_code. Black-box ring ≥ 32 entries. Last-gasp enabled with ≥ 2 event writes + CRC. PG/FAULT latched semantics required. Cloud mapper profile = efuse_mgr_v1. Cross-brand swap limited to TI / ST / NXP / Renesas / onsemi / Microchip / Melexis and requires updated mapping before use.
Compliance checklist
- Fields complete; units/scale/time-base consistent.
- Event codes compatible; remote clear supported; latched semantics match.
- Last-gasp passes three collapse replays.
- Cloud mapping updated; charts show no discontinuity.
Concrete PNs & rationale (1/2)
- TI LM5066I — PMBus native; telemetry/limits map 1:1.
- TI TPS25985 — I²C eFuse; limits/status + shunt chain for current.
- Renesas ISL6146A + ISL28022 — hot-swap + monitor; complete V/I/T via I²C.
- Microchip PAC1934 + MIC25404 — power monitor + limiter; small-batch friendly.
Concrete PNs & rationale (2/2)
- onsemi NIS5021/NIS5420 + FAN4010 — eFuse + current sense; PG normalized in MCU.
- ST STEF01/12 + TSC2010/2020 + STTS22H — protection + current + temp.
- NXP PCA9450 / PF series — PMIC telemetry; map via device_profile.
- Melexis MLX91220/91221/91230 — current sensors for iout_mA; MLX90632 for temp fallback.
Validation & Diagnostics
Establish offline log replay and boundary-condition fault injection to verify consistency of thresholds, debounce/hysteresis, and PG/FAULT edges across brands.
Key performance indicators
- Cross-brand PG edge alignment < 5 ms under identical stimulus.
vout_mV/iout_mAcurve error < 2% (RMS or peak, defined below).- Black-box integrity: CRC pass ≥ 99.9%; storm-throttled logs retain at least one enter and exit per fault class.
Error definitions
RMS error on window W: √(Σ((x_ref−x)/x_ref)² / |W|). Peak error is max absolute deviation over W.
Offline log replay
- Join black-box events
{ts, code, v_mV, i_mA, t_c, detail, crc}with bucketed telemetry (10–100 ms). - Reconstruct
vout_mV/iout_mAvia segmented linear or cubic spline. - Recompute golden PG using chapter-6 rules:
H/L,Td_enter/exit, hysteresis/warn band. - Compare actual vs golden edges (time delta, jitter histogram); export artifacts.
Boundary fault injection
- Cold start (post last-gasp), hot-swap with contact bounce, brown-out (linear/exp/step).
- R_contact drift to accumulate warn; light-load oscillation near loop margin.
- Negative tests: out-of-order timestamps, duplicates, CRC error → must be rejected and reason logged.
Outputs & reporting
- Replay YAML: scenario, seed, window_ms, collapse_type{linear,exp,step}, R_contact(mΩ/t).
- Comparison CSV: brand, pn, metric, value, limit, pass.
- Regression gate: every mapping/threshold change re-runs full replay + injections.
Regression gate
Any change to mapping, thresholds, debounce/hysteresis, or last-gasp policies triggers full replay + injection. Release only when all KPIs pass with headroom and charts show no discontinuity.
Cross-Brand Alternatives (Stable API)
Stable contract
- Field names/units/time base fixed: lowercase snake_case with unit suffix.
- Event code space
domain:subcodegrows only by addition; no re-use. clear_policy ∈ {auto, remote, power_cycle}invariant across swaps.- New fields start with
valid=0,stale=1to preserve chart continuity.
Migration steps (A → B)
- Create
device_profilefor B (bitfields, PAGE, PEC/CRC, thresholds). - Shadow compare: A is primary; B logs in parallel on same rail(s).
- Calibrate
H/L,Td_enter/exit, warn band to meet KPIs. - Gray switch: small traffic to B while dual-logging persists.
- Full switch: B primary; keep A as rollback for a window.
- Archive: freeze A’s profile and mapping for historical reports.
Rollback & auditing
- Instant rollback to A without API change or chart breaks.
- Dual-write comparison on ΔPG_edge, Δcurve, Δfault_hist with thresholds.
- Every switch/rollback emits a black-box event with operator and reason code.
Migration runbook
- Define device_profile for B; bind mapping.
- Enable shadow; collect Δ metrics for ≥ 24 h typical load.
- Calibrate thresholds/timers until KPIs pass with margin.
- Gray switch > 20%; monitor alarms; then full switch.
Deliverables
- Profile diff table: brand, pn, bitfield_map, thresholds, debounce, hysteresis.
- Stability report: pre/post KPIs and exception samples.
- Audit trail: switch/rollback events with operator IDs.
FAQ
Frequently asked questions about the PMBus/I²C protection manager layer (Telemetry · Policy · Audit). Answers are scoped to this page and match the JSON-LD exactly.
Why normalize PG/FAULT semantics across vendors?
Normalization makes policies portable and verifiable. Mapping diverse boolean/tri-state and latched behaviors into pg_state{good|warn|bad}, fault_code, and fault_latched keeps thresholds, debounce, and clear rules brand-agnostic. It enables cross-brand shadow comparisons, consistent charts, and automated regression gates without rewriting higher-level logic or retraining alert receivers.
How many black-box entries are enough for field diagnostics?
For small-batch systems, ≥32 entries is a practical floor: enter/exit pairs for several fault types plus housekeeping transitions. If unattended intervals are long or events are bursty, raise depth to 64/128 and enable storm-throttling. Each entry should store timestamp, domain:subcode, compact V/I/T snapshot, detail bits, and CRC for integrity.
What’s a safe last-gasp budget for two event writes and a CRC?
Size energy so E_cap ≥ E_req = N_evt·E_write + E_crc + E_handshake + margin with N_evt=2. Validate under linear, exponential, and step collapses. Account for temperature-dependent ESR and minimum hold-up. If margin is tight, persist one compressed summary plus CRC, then queue deferred details for upload after reboot completes successfully.
How do I prevent event-storm during threshold chatter?
Combine hysteresis windows H/L and Td_enter/exit with a per-class rate limiter. Coalesce identical back-to-back entries, carrying counts and last timestamp. Guarantee at least one enter and exit per storm. For borderline conditions, raise warn instead of repeatedly toggling bad, reducing flapping without masking genuine escalations.
Which PMBus registers should I log at each fault snapshot?
Capture STATUS_WORD/STATUS_BYTE, relevant *_FAULT_LIMIT, current READ_VOUT, READ_IOUT, READ_TEMPERATURE_1, and MFR_STATUS if present, plus page via PAGE. Add manager-side policy_state and debounce timers. This minimal set reconstructs decision boundaries and lets you replay expected pg_state against golden rules for that rail.
Can I infer missing fields (e.g., shunt current) from available data?
Yes—derive iout_mA = shunt_mV / Rsense(mΩ) when current is absent, mark src=derived, and publish an error bound (≤5% post-calibration). Temperature can fall back to die_temp_c + ΔT_profile. Derived values never overwrite originals and must carry valid/stale flags plus calibration revision in metadata for auditability.
How do I prove the manager wrote logs before power went away?
Use a two-phase last-gasp: persist events to non-volatile storage, then append a commit marker with CRC and monotonic counter. On next boot, verify the latest marker and continuity. Record a dedicated “power-loss imminent” event before writes begin; absence of the commit indicates partial or failed persistence requiring diagnostic attention.
Best practice to clear latched faults remotely?
Expose a clear request that only succeeds when measurements return within safe windows for Td_exit. Log operator, reason, and a pre-clear snapshot. If hardware mandates power-cycle clear, express it via clear_policy. Never auto-clear on transient reads; pair remote clear with a brief, rate-limited PG grace period to avoid flapping.
Procurement: What must be in the BOM to allow cross-brand swap?
Require vin_mV, vout_mV, iout_mA, die_temp_c, pg_state, fault_code; black-box depth ≥32; last-gasp with two writes plus CRC; and latched semantics with remote clear. Pin schema_rev and reference device_profile. Swaps are limited to TI, ST, NXP, Renesas, onsemi, Microchip, Melexis and require updated mapping.
Testing: How to replay logs to reproduce a field failure?
Align black-box events and bucketed telemetry on a common UTC-ms axis. Recompute golden PG using thresholds, hysteresis, and debounce, then overlay measured edges. Inject boundary conditions—brown-out profile, contact bounce, light-load oscillation—to bracket the failure. Export CSV and plots to compare edge deltas and curve error for pass/fail decisions.
How to time-align multi-rail events from different managers?
Use monotonic uptime_ms plus UTC anchors to correct clock skew. When only uptime exists, align on distinctive transitions—such as simultaneous PG drops—and refine by minimizing edge-to-edge deltas. Propagate the derived offset into charts and CSV exports so downstream analytics operate on a single, coherent timeline.
What’s the fallback when a vendor lacks PMBus but has only I²C?
Adopt an I²C profile mirroring PMBus fields: publish integer physical units with the same names, implement snapshot reads, and map faults to domain:subcode. If limits are write-only, emulate readback via cached policy state. Mark such sources as src=direct_i2c to distinguish them while keeping the manager API unchanged and charts continuous.