V/I/T System Monitor
← Back to: Supervisors & Reset
What It Solves
Under multi-rail, multi-heat-source, and dynamic loads, a binary PG/RESET cannot reveal “near-violation” trends. A V/I/T System Monitor builds window thresholds + hysteresis + delay + debounce to emit actionable early signals enabling soft derating and black-box forensics.
Reader Scenarios & Pain Points
- Multi-rails (5V/3V3/1V8/1V2/0V9) ramping, transients, thermal drift need trend and early warning—not just RESET.
- Production/field ask for black-box proofs: when it broke limits, by how much, and who first.
- Industrial/automotive require quantified false-alarm & false-negative rates (e.g., ≤10⁻⁶ per hour).
Key Takeaways
- Alert logic = OV/UV & OT/UT windows + HYS + tdelay + debounce.
- False-alarm rate depends on noise spectrum, HYS, filter window, and sampling rate.
- Minimal black-box set:
rail_id, evt, thr, meas, t_tag, policy, crc, monotonic_cnt.
3R Parameterization
Range (thresholds/hysteresis) · Rate (Fs/response) · Robustness (debounce/CRC).
Budget false alarms and back-solve HYS/window:
P(false) ≈ Q(HYS/σ_eff) × N_samples
Validation & Ramp-to-Mass
- Ramps/steps/thermal sweep/injected noise/under-sampling stress.
- Upper/lower tolerance stack: reference → divider → ADC → temp drift → PCB leakage.
Common Pitfalls
- Slow ramp with too-small HYS → chatter.
- Reference and sense return not co-located → ground bounce mis-trip.
Brand Capability (No PNs)
Check: internal precision ref, digital programmability for windows/HYS, event NVM depth & brown-out protection, AEC-Q grade, diagnostic registers.
Architecture & Signals
Signal Domains
- MON_IN (V/I/T): divider/shunt/NTC with RC anti-alias.
- ALERT/IRQ, RESET, PG/FAULT: reporting & chaining.
- I²C/SMBus/PMBus: thresholds, HYS, filter windows, events.
- REF, GND_SENSE: reference and Kelvin returns (single-point).
Timing & Rates
- Sampling:
F_s ≥ 5 × f_maxof interest. - Observe window
T_obsgoverns miss probability. - Alert → derate command latency < X ms.
- Power-fail tag written >= Y μs before collapse.
Interface Matrix
- ALERT → MCU (IRQ/poll).
- RESET → cross-domain: prefer open-drain + pull-up (avoid back-power).
- PG/FAULT → power tree interlocks.
- Bus pull-ups, fan-out, capacitance → bound alert latency under arbitration.
Validation & Mass-Prod
- Cross-level compatibility, pull-up sizing, fan-out, back-power current.
- Alert end-to-end latency under bus congestion (P95/P99).
Common Pitfalls
- RESET mistakenly push-pull → back-power into low-V domain.
- Long NTC/shunt routes → noise coupling & thermal lag.
- Marginal connectors → sporadic timeouts → “holes” in black-box logs.
BOM Remarks (Template)
Thresholds/HYS: UV=__V, OV=__V, HYS=__mV, DELAY=__ms, DEBOUNCE=__ms ·
Rail Map: rail_id ↔ net_name ·
Black-Box: {rail_id, evt, thr, meas, time, policy, crc} ·
Policy: derate x% / y ms; escalate reset/cutoff ·
TP: TP_sense, TP_ref, TP_alert, TP_derate
Thresholds, Hysteresis & Debounce
Key Conclusions & Metrics
- Windows: OV/UV (voltage), OT/UT (temperature).
- Hysteresis:
HYS ≥ k·σ_eff + margin_temp(k≈3–5). - Debounce: time window (tdb) or sample count (N).
- Delays: rise/fall independently tuned to suppress slow-ramp chatter.
Engineering Method
- σeff estimate = noise aggregation + environmental drift.
- Two-tier policy: pre-warn (IRQ/PG only) vs upgrade (RESET/Derate).
- Temperature compensation via LUT or piecewise NTC linearization.
Quantitative Placement
Budget false-alarms, then back-solve HYS and the debounce window:
P(false) ≈ Q(HYS/σ_eff) × N_samples
Validation & Mass Production
- Sweep jitter bands near thresholds and measure false-alarm rate.
- HYS-sweep to obtain Pareto (false-alarm vs response time).
- Full temp span (−40~+85/125 °C) and under/over-sampling checks.
Common Pitfalls
- Mean-only filtering ignores peaks → spike miss.
- HYS plus factory offset shrink the effective window excessively.
- Single delay for both edges → slow-ramp chatter.
- NTC self-heating and wire resistance cause apparent drift.
Emergency Derating State Machine
Goal & Transitions
Turn alerts into soft reactions rather than hard shutdown: Alert → Derate → Observe Window → {Clear | Escalate}. Policy parameters include max derating depth, step size ΔP, retries, cool-down, and escalation criteria.
Key Metrics
- Stability: match ΔP and Tobs to thermal-electrical coupling constants.
- Anti-toggle: Stickiness + Cool-down near thresholds.
- End-to-end latency: Alert → Derate < X ms.
- Escalation when limits persist or repeat within Tobs.
Implementation Notes
- VR/PMIC: current limit, freq/duty down-shift, power cap.
- eFuse/Driver: soft-limit, gate-slope control, pulsed power-limit.
- Fail-safe: on comms loss → revert to safest policy (power-limit) and log event.
Validation & Consistency
- Thermal chamber + load curves → stability map of {ΔP, T_obs}.
- Inject OV/OT/OC to verify clear vs escalate paths and black-box consistency.
- Measure P95/P99 end-to-end latency; verify recovery time to nominal.
Common Pitfalls
- ΔP too large → workload jitter and EMI jumps.
- Tobs too short → frequent escalations.
- Ignoring multi-rail coupling → “fix one rail, break another”.
- Sticky flags not cleared on recovery → permanent derate.
Black-Box Events (Tamper-Evident)
Goal
Define a minimal, power-fail–resilient event set and a tamper-evident write path that survives brown-outs and enables forensic ordering without a full RTC.
Key Conclusions & Metrics
- μs-level write budget; ring buffer + atomic write.
- No RTC: use monotonic seq + boot_epoch for total ordering.
- Tamper-evident: CRC16 + one-way monotonic counter; optional lightweight signature if MCU/SE supports.
Engineering Methods
- Power-fail anticipate ISR: write pre_fault tag first, then full payload → CRC → commit flag.
- Compress: collapse adjacent same-type events; bucketize amplitudes.
- Partition high-frequency vs milestone logs to limit wear and contention.
struct event {
uint8_t rail_id; // rail identifier
uint8_t event_type; // OV/UV/OT/OC/Derate/Escalate/Reset...
int16_t thr_set; // threshold (scaled / temp-compensated)
int16_t meas_val; // measured value at trigger
uint32_t time_tag; // RTC or monotonic ticks
uint8_t policy_id; // active policy/version
uint32_t seq_monotonic; // never-decreasing counter
uint16_t crc16; // CRC over header+payload
};
Validation & Mass Production
- Drop-curve dV/dt vs PF→Vmin window; measure write success rate (P95/P99).
- Power yank at pre/half/post write to verify atomicity & replay resistance.
- Abuse tests for reordering/replay; endurance and wear-leveling checks.
Common Pitfalls
- Write amplification → early NVM wear-out.
- Unmasked nested ISRs → out-of-order records.
- CRC coverage gap or commit flag not coupled with CRC → spoofable frames.
Analog Front-End Recipes
Key Conclusions & Metrics
- Shunt value: balance
P=I²Rvs resolution; meet LSB/noise/TC targets. - RTD/NTC linearization: prefer LUT; 3rd-order approx with stated temp range.
- Common-mode protection: series R + RC anti-alias + low-leak/low-C TVS (signal path only).
- RC↔Fs co-design:
τ=RCset near 1/(2π·3–5×fmax). - REF/GND_SENSE single-point return; divider matching & TC pairing.
Implementation Steps
- Set ranges with overload headroom; reserve window for HYS + tolerance stack.
- Budget σ_eff from amplifier/quantization/thermal + wiring/connectors/self-heating.
- Choose RC near targeted band; verify steps/slow-ramp boundaries.
- Layout: Kelvin at load; short symmetric pairs; keep sensors away from hot airflow.
- Select TVS with low leakage/C; check resistor power and temp rise.
Validation & Pitfalls
- Temp/noise/EMI injection; σ and drift statistics across boards/batches.
- RC too large → edge blunting → missed peaks; improper ground domains → false trips.
- NTC placement near heat source or inconsistent airflow → laggy readings.
Layout & Isolation
Grounding & Returns
- Analog/Digital split with single-point reunification at the measurement reference.
- REF and GND_SENSE routed in parallel with the sense pair; short and away from high dv/dt.
- Guard rings in sensitive areas to block domain coupling.
Coupling & Thermal
- Keep SW/rectifier/gate-drive away from sense pairs; close the return path locally.
- Thermal isolation for NTC/RTD; use via fences near hot zones.
- Prefer true differential or pseudo-differential sense to reject ground bounce.
Testability & Pull-ups
- Provide TP_sense±, TP_ref, TP_alert, TP_derate with clean silks and guard.
- Open-drain pull-ups must reside in the target domain to avoid back-power.
- Reserve apertures for near-field probe and IR camera lines of sight.
Verification
- Near-field EMI scan across threshold neighborhoods; tag hot/coupled spots.
- IR + electrical co-measurement for drift/false-trip correlation.
- Boundary-board review for tolerance extremes and mixed diff/pseudo-diff cases.
Common Pitfalls
- Long sense lines behaving as antennas.
- Cross-domain pull-ups → back-power paths.
- “Ground by copper pour” instead of single-point → hidden loops.
Validation & Corner Cases
Validation Matrix
- Electrical: ramp/step/spike/droop (PF→Vmin window).
- Thermal: −40~+85/125 °C sweeps and shocks.
- Noise: wideband/narrowband injection over the band of interest.
- Sampling: under/over-sampling boundaries.
- Statistics: false-/miss-rate targets with proper CIs.
Pass Criteria & MP Gates
- P(false_alarm) ≤ target; Min detectable amplitude ≤ target.
- End-to-end latency within target; DOE-derived feasible region for HYS/t_db.
- Mass production: boundary lots + aged samples + expanded uncertainty stack.
Common Pitfalls
- Single-load emulation → mis-tuned policies.
- Insufficient sample size → overly wide confidence intervals.
- Ignoring thermal hysteresis and multi-rail coupling effects.
Cross-Brand Shortlist
Scope & Method
Parameter-to-capability mapping across seven brands. Columns align channels (V/I/T), interface, ADC resolution/rate, reference accuracy, window/hysteresis programmability, event/NVM depth, diagnostics, AEC-Q, quiescent current, package, ESD/surge, and temperature range. Links stay disabled until verified (use rel="nofollow when activated).
Brand Highlights
- TI: PMBus ecosystem, multi-rail aggregation, broad AEC-Q.
- ST: Low Iq, flexible window/hysteresis, industrial/auto temp.
- NXP: PMIC pairing, system-level interfaces.
- Renesas: Wide reference options, strong industrial/auto mix.
- onsemi: Mature thermal alert chain (ALERT/THERM).
- Microchip: Multi-channel power metering + accumulators/NVM.
- Melexis: Automotive-grade thermal/IR monitoring strength.
| Brand | Part Number | Channels (V/I/T) | Interface | ADC (res/rate) | Reference & Accuracy | Window/HYS Prog. | Event / NVM | Diagnostics | AEC-Q | Iq | Pkg | ESD/Surge | Temp | Reason (Engineering & Procurement) | Docs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TI | INA233 | V/I (1), T via ext sensor | I²C / PMBus | 16-bit, kSPS-class | Internal ref; high-accuracy shunt chain | Programmable thresholds/alerts | Accumulator (energy tally) | Status/alert masks | Platform-level Q options | Low | MSOP/TSSOP | HBM/IEC per DS | −40~+125 °C | PMBus-friendly; energy accumulation helps black-box analytics and derating policies. | Datasheet |
| TI | UCD90120A | Multi-rail V (up to 12), PG agg. | PMBus | Internal monitor matrix | Precision ref (see DS) | Window/HYS/delay/debounce CFG | Sequencing + logging features | Rich status/PG/fault tree | Q variants exist | Med-low (system dep.) | QFN et al. | Per DS | −40~+125 °C | One-stop PMBus multi-rail supervisor; easy migration across TI power trees. | Datasheet |
| ST | L9963E | Multi-V/T (battery chain) | SPI / diag lines | High-res ADC (per DS) | Precise refs; robust temp path | Flexible windows/HYS options | Built-in logging support (sys-dep) | Rich diag/status map | AEC-Q100/-Q specific | Low (IC + system) | TQFP/QFP | Per DS/IEC | −40~+125/+150 °C opt. | Strong V/T monitoring for automotive thermal paths; good fit for T-domain alerts. | Datasheet |
| NXP | PF5023 | Multi-rail V (PMIC) | I²C/PMIC control & PG/IRQ | Integrated PMIC monitors | PMIC-grade reference block | Programmable thresholds/windows | OTP/logging (platform-dep.) | Status/diag registers | AEC-Q variants | Low/Med (PMIC) | QFN/BGA | Per DS | −40~+125 °C | Good pairing with NXP SoC power trees; simplifies migration for system monitors. | Datasheet |
| Renesas | ISL28023 | V/I (1), power calc. | I²C/SMBus | 16-bit (typ.) | High-accuracy measurement chain | Programmable alert thresholds | Counters/log (sys-dep.) | Detailed status bits | Industrial/Auto options | Low | MSOP/TSSOP | Per DS | −40~+125 °C | High precision and robust SMBus; solid building block for multi-rail monitors. | Datasheet |
| onsemi | NCT218 / NCT203 | T (local/remote) | I²C + ALERT/THERM pins | 12-bit equiv. (typ.) | Precision temp front-end | Programmable alert windows/HYS | NVM optional (platform) | Alert/status registers | Industrial/Auto vars. | Ultra-low (typ.) | SOT/QFN | IEC-ESD per DS | −40~+125 °C | Mature T-domain alert chain; direct hardware path for derating triggers. | NCT218 DS · NCT203 DS |
| Microchip | PAC1933 / PAC1953 | V/I (3), energy accum. | I²C/SMBus + ALERT | 16-bit, multi-channel metering | Internal ref; scaling per DS | Programmable alert levels/HYS | Accumulators + status latches | Status/overflow/alert bits | Industrial/Auto recs | Low/Med | TQFN/QFN | Per DS | −40~+125 °C | Multi-channel metering suits black-box energy analytics and threshold policies. | PAC1933 DS · PAC1953 DS |
| Melexis | MLX90614 (Auto var.) | T (IR, 1–2 zones) | SMBus + ALARM opt. | 16-bit internal pipeline (typ.) | Factory-trimmed IR front-end | Programmable alarm thresholds | Rolling registers (sys-dep.) | Status + emissivity config | AEC-Q options | Very low (sensor) | TO-can/SMT | Per DS/IEC | −40~+125 °C | Fast non-contact hotspot monitoring complements NTC/Tc paths for derating. | Datasheet |
Procurement & Migration Notes
BOM Remarks Template
Thresholds/Hysteresis: UV=__V / OV=__V, HYS=__mV, DELAY=__ms, DEBOUNCE=__ms
Channel Map: rail_id ↔ net_name
Black-box Fields: {rail_id, evt, thr, meas, time, policy, crc}
Policy: derate: x% / y ms; escalate: reset / cut-off
Test Points: TP_sense, TP_ref, TP_alert, TP_derate
Compliance: AEC-Q100 grade, IEC 61000-4-2 level
Migration Checklist
- Threshold equivalence: align register units/steps (mV, LSB, scaling).
- HYS/Delay equivalence: confirm independent rising/falling settings and debounce definition.
- Event format: field order, CRC polynomial, monotonic counter mapping.
- Pins/Package: RESET/ALERT/PG polarity, voltage domain, pull-up location, fan-out budget.
- Tooling: PMBus/I²C scripts and mass-production programming workflow.
Risk Cards
Do
Validate black-box/event consistency in a minimal system before rolling into the full platform.
Don’t
Change thresholds without aligning units/LSB first; avoid cross-domain pull-ups that create back-power paths.
Frequently Asked Questions
How do I size hysteresis to suppress chatter on slow ramps?
Size hysteresis from the effective noise plus drift budget: HYS ≥ k·σ_eff + margin_temp, with k≈3–4 for sub-ppm false-alarm rates. Measure σ_eff under worst-case bandwidth and loading. If ramps are very slow, add separate rise/fall delays so the window does not re-arm mid-slope. Validate with sweep ramps and injected noise across your band of interest.
What debounce strategy avoids false triggers on temperature drift?
Use sample-count debounce for fast events and time-window debounce for slow drift, then gate both with a minimum persistence rule. Compensate the threshold versus temperature (LUT or linear segments) so drift moves the setpoint, not the measurement window. Always separate rising/falling debounce constants and confirm no aliasing versus the chosen sampling frequency.
When should Alert drive RESET versus IRQ/PG only?
Use RESET only for conditions that threaten data integrity or safety (e.g., brown-out below clock domain limits, over-temperature near damage). Prefer IRQ/PG for early-warning and derating actions. If domains mix, drive RESET through open-drain into the correct voltage domain and include stickiness plus cool-down to prevent ping-pong resets on marginal rails.
How fast must sampling be to catch sub-millisecond dips?
Nyquist alone is insufficient; aim for at least 5–10× the highest event bandwidth you must observe. For 200 µs dips, target ≥50 kS/s and minimize front-end RC so the time constant does not blunt edges. Add a peak-capture or window-comparator path to avoid averaging away narrow dips that violate safety or data-integrity margins.
What’s a safe minimal black-box dataset without an RTC?
Log the minimal, tamper-evident tuple: {rail_id, event_type, thr_set, meas_val, seq_monotonic, power_phase, policy_id, crc16}. The monotonic counter and power-phase tags reconstruct ordering without wall-clock time. Use atomic writes into a ring buffer and pre-fault tags so brown-outs still capture the last transition reliably.
How to make power-fail logs tamper-evident under brown-outs?
Pair a per-event CRC with an irreversible monotonic counter and sign block headers when a secure element is available. Commit entries using copy-on-write or dual-page journaling, and emit a pre-fault tag on power-fail interrupt. On readout, verify sequence continuity and CRCs; any gap or counter rollback marks the log as suspect.
Derating vs hard cut-off—what escalation policy is safer for SSD/MCU?
Prefer staged derating (limit current/clock/duty) with an observe window that tracks thermal/electrical recovery. Use stickiness and cool-down to avoid oscillation. Escalate to controlled reset only when the hazard persists or integrity is at risk (write windows, flash operations). Hard cut-off is last resort for over-temperature or destructive surge conditions.
How do I map multi-rail thresholds across brands during migration?
First normalize units and LSBs (mV vs raw codes) and confirm independent rising/falling thresholds. Recreate window, hysteresis, delay, and debounce semantics exactly; some parts gate debounce by samples, others by time. Validate policy equivalence on a minimal testbed with scripted sweeps, then export brand-neutral CSV for mass programming across boards and variants.
How to route Kelvin sense to minimize ground-bounce errors?
Route sense+ and sense− as a tight pair directly to the measurement node with a short return, and keep them away from high dv/dt nets. Tie REF and GND_SENSE at a single point. Place anti-alias RC close to the ADC pins. Provide TP_sense± and TP_ref for correlation and near-field scans during validation.
What tolerance stack kills a tight window (ref, divider, ADC, temp)?
The usual culprits are reference tolerance and temperature coefficient, divider mismatch and drift, ADC gain/offset, and sensor self-heating. Build a worst-case stack including board leakage and connector resistance. If the margin collapses, widen hysteresis, tighten component grades, or calibrate in production with a small trim to recover usable window width.
Can a single monitor supervise mixed-voltage domains safely?
Yes, if interfaces are domain-safe: use open-drain outputs with domain-local pull-ups, observe input common-mode limits with proper dividers or attenuators, and isolate any push-pull crossings. Where domains interact, aggregate PG/FAULT through level-compatible buffers. Always test back-power susceptibility and verify that resets never drive into an unpowered domain.
How to validate false-negative rate without million-cycle tests?
Use designed experiments: sweep amplitude, duration, and slew near thresholds, then fit detection probability versus margin. Combine with analytical bounds using your noise model and debounce policy. Apply Clopper–Pearson or Wilson intervals to estimate miss-rate with manageable runs, and confirm with spot long-haul tests at the identified worst-case corners.