123 Main Street, New York, NY 10001

← Back to: eFuse / Hot-Swap / OR-ing Protection

Definition & Role

This page frames the protection manager layer as three stacked responsibilities: Telemetry (normalize RAW to engineering units), Policy (limits, hysteresis, debounce), and Audit (PG/FAULT semantics and black-box logging). It abstracts below-layer eFuse, Hot-Swap, and Ideal-Diode devices without repeating device-level details.

  • Chain: RAW → ENG → POLICY → PG/FAULT → LOG with explicit field names, units, and timebase.
  • Device boundary: devices expose registers/interrupts; the manager normalizes units, applies limits, latches faults, and writes events.
  • Small-batch procurement: a unified API (names, units, scaling) lowers cross-brand swap risk.

Scope

API & semantics layer: telemetry normalization, threshold policy, PG/FAULT states, ring-buffer events, last-gasp write-back, and field mapping.

Non-Goals

No SOA math, MOSFET selection, gate charge tuning, or Ideal-Diode reverse-current criteria—those belong to sibling device pages.

Terminology

PG
Power-Good state (good/warn/bad).
FAULT
Latched abnormal condition with code.
Debounce
Time requirement before entering/exiting a state.
Hysteresis
Separated enter/exit thresholds to avoid chatter.
Last-Gasp
Guaranteed write-back before power loss.
Ring Buffer
Head/tail indexed event log with CRC.
Telemetry Pipeline Overview Pipeline from RAW to ENG units, Policy thresholds and debounce, into PG/FAULT abstraction and finally into a LOG ring buffer. RAW ADC / Registers ENG Units & Scaling POLICY Limits · Hysteresis · Debounce PG / FAULT good · warn · bad LOG Ring Buffer + CRC
Pipeline — RAW → ENG → POLICY → PG/FAULT → LOG

Telemetry Schema

Use lowercase snake_case; all values are integers with unit suffixes (mV, mA, °C, ms). Provide UTC millisecond timestamps for samples or batch headers.

vin_mV
Input voltage (mV)
vout_mV
Output voltage (mV)
iout_mA
Output current (mA); infer from shunt if absent
shunt_mV
Shunt drop (mV) for cross-check/inference
die_temp_c
Die temperature (°C)
pg_state
good | warn | bad
fault_code
16-bit domain + subcode
fault_latched
0/1; requires clear policy
policy_state
normal | limit | foldback | shutdown
uptime_ms
Device uptime in ms (uint32)
fw_rev
Firmware revision (short string)

Sampling & Precision

  • Voltage: 1–5 mV/LSB; Current: 1–10 mA/LSB; Temperature: 1 °C/LSB.
  • Refresh via 10–100 ms windows with avg_k, p95, max.
  • Calibration: zero/gain two-point; verify by read-back.

Debounce & Hysteresis

  • Enter: time_above(H) ≥ Td_enter; Exit: time_below(L) ≥ Td_exit.
  • Hysteresis: H − L ≥ Δmin (≥1% Vnom / 5% Inom typical).
  • Thermal events use longer Td_enter / Td_exit.

Missing/Edge Cases

  • If iout_mA absent: iout_mA ≈ round(shunt_mV*1000/Rsense_mΩ).
  • Mark quality_flag = valid | inferred | stale.
  • Saturation: set saturated=1 and clamp last reliable value.
Field Dictionary & Hysteresis Left: standard telemetry field cards. Right: threshold with hysteresis and debounce windows to stabilize PG transitions. vin_mV Input voltage (mV) vout_mV Output voltage (mV) iout_mA Output current (mA) shunt_mV Shunt drop (mV) die_temp_c Die temperature (°C) pg_state good | warn | bad fault_code 16-bit domain + subcode fault_latched 0/1 policy_state normal | limit | foldback | shutdown uptime_ms Device uptime H (enter) L (exit) Td_exit Td_enter
Field dictionary (left) and hysteresis/debounce stabilization (right)

Validation & KPIs

  • Unit consistency: all exported values are integers with suffixes (mV/mA/°C/ms).
  • Chatter robustness: with ±1–2% threshold wobble, PG transitions remain stable; event rate is limited.
  • Cross-brand alignment: voltage/current curves within ≤2%; inferred values within ≤5% after calibration.

PMBus / I²C Mapping

Normalize vendor-specific PMBus/I²C registers into a stable manager API. Keep device-level semantics aligned: measurement (V/I/T), status (PG/STATUS), fault limits, response behavior, and private extensions for black-box and last-gasp.

PMBus essentials

  • READ_VOUT, READ_IOUT, READ_TEMPERATURE_1
  • STATUS_WORD / STATUS_BYTE, MFR_STATUS
  • VOUT_OV_FAULT_LIMIT, IOUT_OC_FAULT_LIMIT, OT_FAULT_LIMIT
  • FAULT_RESPONSE, PAGE (multi-rail)

I²C private extensions

  • BB_EVT_HEAD / BB_EVT_TAIL / BB_EVT_READ(n)
  • LAST_GASP_EN, LG_CAP_VHI / LG_CAP_VLO
  • EVT_MASK, CLEAR_LATCHED_FAULT

Configuration transaction

  1. Write limits (OV/OC/OT, hysteresis, debounce).
  2. Read-back & verify; enable PEC/CRC.
  3. Commit marker (status bit or MFR flag).

SMBus timing & timeout; I²C 100/400 kHz.

Polling vs. interrupt

  • Poll slow variables (T/avg current).
  • Interrupt for fast events (SC/UV/OV) and take an immediate snapshot: V/I/T + STATUS + unified fault_code.

Reliability

  • Write->read-back match ≥ 99.9% with retry/backoff.
  • Pre-shutdown commit check before last-gasp.
  • Shadow registers on MCU for consistency.
Register Map Join — PMBus / I²C to Unified API Left: Vendor A/B/C command sets. Right: Unified API table. Center: join arrows; notes for PAGE/PEC/CRC/Timeout. Vendor A READ_VOUT READ_IOUT READ_TEMPERATURE_1 STATUS_WORD VOUT_OV_FAULT_LIMIT IOUT_OC_FAULT_LIMIT OT_FAULT_LIMIT FAULT_RESPONSE MFR_STATUS Vendor B (I²C ext) BB_EVT_HEAD / TAIL BB_EVT_READ(n) LAST_GASP_EN LG_CAP_VHI / VLO EVT_MASK CLEAR_LATCHED_FAULT Vendor C PAGE (multi-rail) PEC / CRC Timeout / Retry Shadow mirror Snapshot on IRQ Unified API api_name unit R/W PEC page notes read(“vout_mV”)mVRyesoptREAD_VOUT read(“iout_mA”)mARyesoptREAD_IOUT / shunt read(“die_temp_c”)°CRyesREAD_TEMPERATURE_1 status()RyesSTATUS_WORD/BYTE write_limit(“vout_ov_fault”)mVWyesper pageOV limit + hyst + Td write_limit(“iout_oc_fault”)mAWyesper pageOC limit + foldback write_limit(“ot_fault”)°CWyesOT limit fault_response()R/WyesFAULT_RESPONSE bb.read(n)bytesRcrcBB_EVT_READ bb.head/tailidxRcrcHEAD/TAIL bb.clear()WcrcCLEAR_LATCHED_FAULT last_gasp.enable()WcrcLAST_GASP_EN last_gasp.cap(VHI/VLO)mVWcrcLG_CAP_VHI/VLO Notes: PAGE per-rail, PEC/CRC recommended, timeout & retry with backoff.
Register map join — vendors to a unified manager API

Black-Box Events

Minimal event set

  • OV / UV / OC / SC / OT
  • PG lost, reverse current, foldback
  • Restart source: cold-start / remote / thermal

Entry structure (16–24 B)

{ ts_utc32, code16, v_mV16, i_mA16, t_c8, detail16, crc8 }

code16 = domain(hi8) + subcode(lo8), e.g., OV=0x01, UV=0x02; subcode=enter/exit/latch/source.

Capacity & compression

  • N = 16 / 32 / 64 depth; size ≈ N × entry_size.
  • Collapse same-type events; rate limit with min Δt.
  • Stage summary: merge enter/exit into one period log.

Readout protocol

  • HEAD/TAIL pointers; BB_EVT_READ(n) for bursts.
  • Clear policy: on_read vs explicit_clear.
  • CRC on every entry; mark crc_ok in diagnostics.

Storm suppression

  • min_delta_t between identical domains.
  • Collapse window merges back-to-back toggles.
  • Hot events prioritized for last-gasp write-back.

Validation & KPIs

  • CRC pass rate ≥ 99.9% under UV/SC/thermal injection.
  • Lost-entry rate < 0.5% at full depth.
  • READ(n) for full depth ≤ 10 ms @ 400 kHz.
Ring Buffer Layout — Black-Box Events Left: circular buffer with HEAD/TAIL. Right: single entry fields with CRC. Bottom: rate-limit and collapse strategy. Ring Buffer N = 16 / 32 / 64 HEAD TAIL Event Entry (16–24 B) ts_utc32 — UTC timestamp (ms) code16 — domain(hi8) + subcode(lo8) v_mV16 — voltage snapshot i_mA16 — current snapshot t_c8 — die temperature detail16 — auxiliary info crc8 — integrity check Clear policy: on_read vs explicit_clear · Last-gasp writes minimum 2 entries + CRC Storm suppression min Δt collapse window merged summary
Ring buffer layout — head/tail pointers, compact entry fields, and storm suppression

Last-Gasp Write-Back

Guarantee writing back N critical events plus one CRC before power collapses. Compute energy and timing budgets, then enforce a graceful shutdown sequence with downgrade strategies if the hold-up is insufficient.

  • E_req = N_evt·E_write + E_crc + E_handshake + Margin
  • E_cap = 0.5·C·(V_hi² − V_lo²) must satisfy E_cap ≥ E_req
  • t_total = N_evt·t_write + t_crc + t_flush ≤ t_hold

Shutdown sequence

  1. Detect power-fail IRQ → freeze sampling.
  2. Serialize key events (add latest V/I/T snapshot).
  3. Write N entries → write CRC → set commit flag.
  4. Assert power-off policy (gate disable / foldback).

Degrade on short hold-up

  • Reduce N to N’ (prioritize SC/OC/UV > OT > PG).
  • Replace edges with “phase summary”.
  • Mark truncated=1 for audit.

Temperature & ESR

Low temperature raises ESR → reduces usable energy and bus voltage. Compensate via higher V_hi, lower N, or earlier power-fail threshold.

Energy & Timing Budget for Last-Gasp Write-Back Top: E_cap vs E_req comparison with headroom. Bottom: Gantt timeline of Write×N, CRC, Flush versus t_hold. Energy budget E_cap E_req headroom = E_cap − E_req Timing budget Write × N_evt CRC Flush t_hold t_total = N_evt·t_write + t_crc + t_flush
Energy (top) and timing (bottom) budgets for last-gasp write-back

Validation & KPIs

  • Three collapse shapes (linear/exponential/step): success ≥ 99% (−20~85 °C).
  • CRC pass rate ≥ 99.9%; timing error ≤ 10%.
  • On truncate: tail marker + reason code readable at next boot.

BOM remarks (small-batch)

Specify C,V_hi,V_lo,N,CRC=on,clear_policy. Substitutions must be within TI / ST / NXP / Renesas / onsemi / Microchip / Melexis and re-validated against E_cap/E_req and t_hold.

PG/FAULT Normalization

Unified fields

  • pg_state ∈ {good, warn, bad}
  • fault_code (domain + subcode)
  • fault_latched ∈ {0,1}, clear_policy ∈ {auto, remote, power_cycle}

Debounce & hysteresis

Enter when time_above(H) ≥ Td_enter; exit when time_below(L) ≥ Td_exit. Use a middle “warn” band for predictive maintenance (rising temp, contact resistance, nearing OC).

Latch & clear

After fault_latched=1, do not auto-clear even if the signal returns to a safe window; require a remote clear once in_safe_window & dwell ≥ Td_exit & no_new_event.

Vendor bitfields → unified map

  • STATUS_WORD/BYTE bits → domains: OV, UV, OC, SC, OT, RCB, etc.
  • Boolean PG → derive “warn” via thresholds; tri-state PG → direct mapping.

Validation & SLAs

  • Edge alignment across vendors ≤ 5 ms under identical waveforms.
  • False-alarm ≤ 0.1% with ±1–2% threshold wobble.
  • “warn” must be visible via API within 1 s.
PG/FAULT Normalization Map Left: three vendor mini state machines. Right: unified good/warn/bad state machine with debounce, hysteresis, and clear policy. Vendor A good warn bad Vendor B (bool PG) pg=1 pg=0 Vendor C (latched) ok fault_latched Unified PG/FAULT state machine good warn bad Debounce: enter when time_above(H) ≥ Td,enter; exit when time_below(L) ≥ Td,exit Hysteresis: separate H/L thresholds; warn band between Hwarn and Lwarn Clear policy: auto | remote | power_cycle remote requires safe_window & dwell & no_new_event STATUS bits → domains: OV, UV, OC, SC, OT, RCB … → fault_code (domain:subcode)
Vendor semantics (left) normalized to a unified good/warn/bad state machine (right) with debounce, hysteresis, and clear policy.

Logs & reporting

  • Every transition good↔warn↔bad emits {ts, cause, code, latched}.
  • Warn dwell time rolls into a health score; expose via telemetry.
  • Remote clear action is logged with operator/host ID (optional).

Cloud Mapper

Unify heterogeneous PMBus/I²C telemetry from seven brands into a single cloud schema. Provide lossless field mapping, quality flags, and deterministic fallbacks for missing signals.

Minimum viable schema

  • vin_mV, vout_mV, iout_mA, shunt_mV, die_temp_c
  • pg_state ∈ {good,warn,bad}, fault_code, fault_latched
  • policy_state, uptime_ms, fw_rev, device_profile, schema_rev
  • quality: valid{0|1}, stale{0|1}, src{direct|derived}

All fields integer with unit suffix; UTC timestamp in ms; 10–100 ms bucketed sampling.

Derivation rules

  • iout_mA ← shunt_mV / Rsense(mΩ); mark src=derived.
  • board temp fallback ← die_temp_c + ΔT_profile.
  • p_out_mW = vout_mV * iout_mA / 1000; windowed energy integration.

Post-calibration error bounds ≤ 5% for derived current.

Versioning & compatibility

  • schema_rev backward compatible; new fields default valid=0, stale=1.
  • device_profile pins vendor/series/page/PEC mapping.
  • Charts must show continuous lines when mixed-brand uploads occur.

Brand → PN examples (1/3)

  • TI LM5066I (PMBus): direct READ_* / STATUS_* mapping; PAGE per-rail.
  • TI TPS25985 (I²C eFuse): limits + status; derive current via shunt chain.
  • Renesas ISL6146A + ISL28022: Hot-swap + I²C current monitor, easy iout_mA.

Brand → PN examples (2/3)

  • Microchip PAC1934 + MIC25404: quad power monitor + limiter.
  • onsemi NIS5021/NIS5420 + FAN4010: eFuse + current sense.
  • ST STEF01/12 + TSC2010/2020 + STTS22H: eFuse + current + temp.

Brand → PN examples (3/3)

  • NXP PCA9450 / PF series: PMIC telemetry mapped via profile.
  • Melexis MLX91220/91221/91230: hall current sensors → iout_mA.
  • Melexis MLX90632: board temp fallback source.
7-Brand Cloud Mapping Left: seven brand lanes. Center: arrows converging. Right: unified schema with units and quality flags. TI ST NXP Renesas onsemi Microchip Melexis Unified cloud schema field unit src valid stale notes vin_mVmVdirect10PAGE-aware vout_mVmVdirect10bucket 10–100 ms iout_mAmAderived10shunt_mV/Rsense die_temp_c°Cdirect10fallback for board pg_statemapped10good/warn/bad fault_codemapped10domain:subcode fault_latchedmapped10clear policy schema_revmeta10backward compatible device_profilemeta10vendor/series/pages
Cloud-side mapping unifies telemetry from TI, ST, NXP, Renesas, onsemi, Microchip, Melexis

Validation

  • Mixed-brand uploads render continuous charts with no breaks.
  • Derived current error ≤ 5% after calibration.
  • Update rate capped ≤ 2 Hz under oscillation to avoid storms.

Small-Batch Procurement Hooks

BOM note (copy & paste)

Manager must expose vin_mV, vout_mV, iout_mA, die_temp_c, pg_state, fault_code. Black-box ring ≥ 32 entries. Last-gasp enabled with ≥ 2 event writes + CRC. PG/FAULT latched semantics required. Cloud mapper profile = efuse_mgr_v1. Cross-brand swap limited to TI / ST / NXP / Renesas / onsemi / Microchip / Melexis and requires updated mapping before use.

Compliance checklist

  • Fields complete; units/scale/time-base consistent.
  • Event codes compatible; remote clear supported; latched semantics match.
  • Last-gasp passes three collapse replays.
  • Cloud mapping updated; charts show no discontinuity.

Concrete PNs & rationale (1/2)

  • TI LM5066I — PMBus native; telemetry/limits map 1:1.
  • TI TPS25985 — I²C eFuse; limits/status + shunt chain for current.
  • Renesas ISL6146A + ISL28022 — hot-swap + monitor; complete V/I/T via I²C.
  • Microchip PAC1934 + MIC25404 — power monitor + limiter; small-batch friendly.

Concrete PNs & rationale (2/2)

  • onsemi NIS5021/NIS5420 + FAN4010 — eFuse + current sense; PG normalized in MCU.
  • ST STEF01/12 + TSC2010/2020 + STTS22H — protection + current + temp.
  • NXP PCA9450 / PF series — PMIC telemetry; map via device_profile.
  • Melexis MLX91220/91221/91230 — current sensors for iout_mA; MLX90632 for temp fallback.
Procurement Compliance Card Left: copy-ready BOM note. Right: checklist with tick marks for compliance items; bottom banner limiting swaps to seven brands. BOM note Manager must expose vin_mV, vout_mV, iout_mA, die_temp_c, pg_state, fault_code. Black-box ring ≥ 32 entries. Last-gasp enabled with ≥ 2 event writes + CRC. PG/FAULT latched semantics required. Cloud mapper profile = efuse_mgr_v1. Cross-brand swap limited to TI/ST/NXP/Renesas/onsemi/ Microchip/Melexis and requires updated mapping before use. Compliance checklist Fields complete; units/scale/time-base consistent Event code compatible; remote clear; latched semantics Last-gasp passes linear/exponential/step replays Cloud mapping updated; charts show no discontinuity Cross-brand swaps are limited to TI / ST / NXP / Renesas / onsemi / Microchip / Melexis; update device_profile before use.
Procurement checklist and BOM notes for small-batch, cross-brand compatibility

Request a Quote

Accepted Formats

pdf, csv, xls, xlsx, zip

Attachment

Drag & drop files here or use the button below.

Validation & Diagnostics

Establish offline log replay and boundary-condition fault injection to verify consistency of thresholds, debounce/hysteresis, and PG/FAULT edges across brands.

Key performance indicators

  • Cross-brand PG edge alignment < 5 ms under identical stimulus.
  • vout_mV / iout_mA curve error < 2% (RMS or peak, defined below).
  • Black-box integrity: CRC pass ≥ 99.9%; storm-throttled logs retain at least one enter and exit per fault class.

Error definitions

RMS error on window W: √(Σ((x_ref−x)/x_ref)² / |W|). Peak error is max absolute deviation over W.

Offline log replay

  1. Join black-box events {ts, code, v_mV, i_mA, t_c, detail, crc} with bucketed telemetry (10–100 ms).
  2. Reconstruct vout_mV/iout_mA via segmented linear or cubic spline.
  3. Recompute golden PG using chapter-6 rules: H/L, Td_enter/exit, hysteresis/warn band.
  4. Compare actual vs golden edges (time delta, jitter histogram); export artifacts.

Boundary fault injection

  • Cold start (post last-gasp), hot-swap with contact bounce, brown-out (linear/exp/step).
  • R_contact drift to accumulate warn; light-load oscillation near loop margin.
  • Negative tests: out-of-order timestamps, duplicates, CRC error → must be rejected and reason logged.

Outputs & reporting

  • Replay YAML: scenario, seed, window_ms, collapse_type{linear,exp,step}, R_contact(mΩ/t).
  • Comparison CSV: brand, pn, metric, value, limit, pass.
  • Regression gate: every mapping/threshold change re-runs full replay + injections.
Replay & Fault Injection Top: reconstructed waveforms with PG markers. Bottom: fault injection modules feeding replay pipeline. Offline replay (V/I + PG) vout_mV iout_mA enter warn enter bad exit bad Fault injection modules Cold start NVM restore Hot-swap contact bounce Brown-out linear/exp/step R_contact drift warn accumulation Light-load oscillation Replay pipeline: synth/merge → golden PG → edge/curve comparison → report
Offline log replay and boundary fault injection to validate PG/FAULT consistency

Regression gate

Any change to mapping, thresholds, debounce/hysteresis, or last-gasp policies triggers full replay + injection. Release only when all KPIs pass with headroom and charts show no discontinuity.

Cross-Brand Alternatives (Stable API)

Stable contract

  • Field names/units/time base fixed: lowercase snake_case with unit suffix.
  • Event code space domain:subcode grows only by addition; no re-use.
  • clear_policy ∈ {auto, remote, power_cycle} invariant across swaps.
  • New fields start with valid=0, stale=1 to preserve chart continuity.

Migration steps (A → B)

  1. Create device_profile for B (bitfields, PAGE, PEC/CRC, thresholds).
  2. Shadow compare: A is primary; B logs in parallel on same rail(s).
  3. Calibrate H/L, Td_enter/exit, warn band to meet KPIs.
  4. Gray switch: small traffic to B while dual-logging persists.
  5. Full switch: B primary; keep A as rollback for a window.
  6. Archive: freeze A’s profile and mapping for historical reports.

Rollback & auditing

  • Instant rollback to A without API change or chart breaks.
  • Dual-write comparison on ΔPG_edge, Δcurve, Δfault_hist with thresholds.
  • Every switch/rollback emits a black-box event with operator and reason code.
Stable API Under Swap Lower layer A↔B device swap with shadow channel; upper manager API bar remains unchanged; right-side gray/rollback controls and comparison metrics. Stable manager API: fields/units/timebase · event codes · clear_policy Device A current primary Device B shadow (parallel) Shadow compare: ΔPG_edge, Δcurve, Δfault_hist Gray & rollback Gray switch: route x% to B Dual-write logs for comparison Auto rollback on threshold breach Emit audit event for switch/rollback
Stable manager API while swapping underlying eFuse/Hot-Swap/Ideal-Diode parts

Migration runbook

  1. Define device_profile for B; bind mapping.
  2. Enable shadow; collect Δ metrics for ≥ 24 h typical load.
  3. Calibrate thresholds/timers until KPIs pass with margin.
  4. Gray switch > 20%; monitor alarms; then full switch.

Deliverables

  • Profile diff table: brand, pn, bitfield_map, thresholds, debounce, hysteresis.
  • Stability report: pre/post KPIs and exception samples.
  • Audit trail: switch/rollback events with operator IDs.

FAQ

Frequently asked questions about the PMBus/I²C protection manager layer (Telemetry · Policy · Audit). Answers are scoped to this page and match the JSON-LD exactly.

Why normalize PG/FAULT semantics across vendors?

Normalization makes policies portable and verifiable. Mapping diverse boolean/tri-state and latched behaviors into pg_state{good|warn|bad}, fault_code, and fault_latched keeps thresholds, debounce, and clear rules brand-agnostic. It enables cross-brand shadow comparisons, consistent charts, and automated regression gates without rewriting higher-level logic or retraining alert receivers.

How many black-box entries are enough for field diagnostics?

For small-batch systems, ≥32 entries is a practical floor: enter/exit pairs for several fault types plus housekeeping transitions. If unattended intervals are long or events are bursty, raise depth to 64/128 and enable storm-throttling. Each entry should store timestamp, domain:subcode, compact V/I/T snapshot, detail bits, and CRC for integrity.

What’s a safe last-gasp budget for two event writes and a CRC?

Size energy so E_cap ≥ E_req = N_evt·E_write + E_crc + E_handshake + margin with N_evt=2. Validate under linear, exponential, and step collapses. Account for temperature-dependent ESR and minimum hold-up. If margin is tight, persist one compressed summary plus CRC, then queue deferred details for upload after reboot completes successfully.

How do I prevent event-storm during threshold chatter?

Combine hysteresis windows H/L and Td_enter/exit with a per-class rate limiter. Coalesce identical back-to-back entries, carrying counts and last timestamp. Guarantee at least one enter and exit per storm. For borderline conditions, raise warn instead of repeatedly toggling bad, reducing flapping without masking genuine escalations.

Which PMBus registers should I log at each fault snapshot?

Capture STATUS_WORD/STATUS_BYTE, relevant *_FAULT_LIMIT, current READ_VOUT, READ_IOUT, READ_TEMPERATURE_1, and MFR_STATUS if present, plus page via PAGE. Add manager-side policy_state and debounce timers. This minimal set reconstructs decision boundaries and lets you replay expected pg_state against golden rules for that rail.

Can I infer missing fields (e.g., shunt current) from available data?

Yes—derive iout_mA = shunt_mV / Rsense(mΩ) when current is absent, mark src=derived, and publish an error bound (≤5% post-calibration). Temperature can fall back to die_temp_c + ΔT_profile. Derived values never overwrite originals and must carry valid/stale flags plus calibration revision in metadata for auditability.

How do I prove the manager wrote logs before power went away?

Use a two-phase last-gasp: persist events to non-volatile storage, then append a commit marker with CRC and monotonic counter. On next boot, verify the latest marker and continuity. Record a dedicated “power-loss imminent” event before writes begin; absence of the commit indicates partial or failed persistence requiring diagnostic attention.

Best practice to clear latched faults remotely?

Expose a clear request that only succeeds when measurements return within safe windows for Td_exit. Log operator, reason, and a pre-clear snapshot. If hardware mandates power-cycle clear, express it via clear_policy. Never auto-clear on transient reads; pair remote clear with a brief, rate-limited PG grace period to avoid flapping.

Procurement: What must be in the BOM to allow cross-brand swap?

Require vin_mV, vout_mV, iout_mA, die_temp_c, pg_state, fault_code; black-box depth ≥32; last-gasp with two writes plus CRC; and latched semantics with remote clear. Pin schema_rev and reference device_profile. Swaps are limited to TI, ST, NXP, Renesas, onsemi, Microchip, Melexis and require updated mapping.

Testing: How to replay logs to reproduce a field failure?

Align black-box events and bucketed telemetry on a common UTC-ms axis. Recompute golden PG using thresholds, hysteresis, and debounce, then overlay measured edges. Inject boundary conditions—brown-out profile, contact bounce, light-load oscillation—to bracket the failure. Export CSV and plots to compare edge deltas and curve error for pass/fail decisions.

How to time-align multi-rail events from different managers?

Use monotonic uptime_ms plus UTC anchors to correct clock skew. When only uptime exists, align on distinctive transitions—such as simultaneous PG drops—and refine by minimizing edge-to-edge deltas. Propagate the derived offset into charts and CSV exports so downstream analytics operate on a single, coherent timeline.

What’s the fallback when a vendor lacks PMBus but has only I²C?

Adopt an I²C profile mirroring PMBus fields: publish integer physical units with the same names, implement snapshot reads, and map faults to domain:subcode. If limits are write-only, emulate readback via cached policy state. Mark such sources as src=direct_i2c to distinguish them while keeping the manager API unchanged and charts continuous.