What It Solves
Typical Failures
- Single-rail chatter triggers unintended global reset.
- Slow rise/fall causes PG flapping; short pulses from worn connectors.
- Hot-plug ping-pong between sources breaks stability.
Design Objective
Stabilize multi-rail readiness with N-of-M voting, time-domain constraints (deglitch/hold/blanking), interlocks to EN/RESET, and degraded policies for non-critical rails.
Where It Applies
4–12 rails with mixed PG sources: Vcore/Vio/Vana/Vddq plus charger/battery and load-side PG. Works with discrete regulators or modular power trees.
PG Sources & Levels
Physical Layer Unification
Prefer an open-drain (OD) PG bus with a shared pull-up (1.8/3.3/5 V). Convert push-pull (PP) PGs via buffer or level-shifter before joining the bus to avoid back-powering and contention.
Level Compatibility
Check VIL/VIH and hysteresis. Choose Rpull-up by bus capacitance and fanout: 4.7–10 kΩ for heavy bus, 10–47 kΩ for light/long lines. Keep rise-time ≤ 0.5× deglitch window.
Fanout & Wiring
Limit star branches; buffer long stubs; ensure clean return paths to reduce coupled spikes. Tag PP→OD adapters and document pull-up domain for audit.
Rise-time budget: t_rise(10–90%) ≈ 2.2 × R_pullup × C_bus → keep t_rise ≤ 0.5× t_deglitch.
Check back-powering risk on PP sources; buffer or series-resist if needed. Document the Vdomain and R_pullup in your BOM notes.
| PG_src | Type (OD/PP) | Vdomain (V) | Rp_kΩ | C_bus_pF | Fanout | Note |
|---|---|---|---|---|---|---|
| DC/DC PG | OD | 3.3 | 10 | 120 | 5 | Shared bus, close return path |
| LDO PG | PP→OD | 3.3 | 22 | 200 | 6 | Buffered to avoid back-power |
| Charger/Batt PG | OD | 3.3 | 10 | 150 | 4 | Cable-aware; check connector chatter |
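The rise-time budget can be checked numerically for each row of the table. The following is a minimal sketch, not a definitive tool: the helper names and the 40 µs deglitch window are assumptions chosen for illustration, while the R and C values come from the table above.

```python
def rise_time_s(r_pullup_ohm: float, c_bus_f: float) -> float:
    """10-90% rise time of an RC-limited open-drain edge: t_rise ~= 2.2 * R * C."""
    return 2.2 * r_pullup_ohm * c_bus_f


def edge_budget_ok(r_pullup_ohm: float, c_bus_f: float, t_deglitch_s: float) -> bool:
    """True when the edge meets the rule t_rise <= 0.5 * t_deglitch."""
    return rise_time_s(r_pullup_ohm, c_bus_f) <= 0.5 * t_deglitch_s


# (name, R_pullup in ohms, C_bus in farads), copied from the table rows
PG_SOURCES = [
    ("DC/DC PG", 10e3, 120e-12),
    ("LDO PG", 22e3, 200e-12),
    ("Charger/Batt PG", 10e3, 150e-12),
]
T_DEGLITCH_S = 40e-6  # assumed example deglitch window, not from the table

for name, r, c in PG_SOURCES:
    t = rise_time_s(r, c)
    ok = edge_budget_ok(r, c, T_DEGLITCH_S)
    print(f"{name}: t_rise = {t * 1e6:.2f} us, within budget = {ok}")
```

With these values the slowest edge is the buffered LDO PG (22 kΩ, 200 pF) at roughly 9.7 µs, which still fits under half of a 40 µs window but would not fit a 10 µs window.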
Voting Schemes
Why Voting
1oo1 is fragile under chatter and slow ramps. Voting improves availability and suppresses false trips by tolerating a single misbehaving rail.
Time-Domain Guard
t_deglitch filters short spikes; t_hold enforces minimum high width; blanking masks start/stop transients. Align windows across rails to avoid early-bird bias.
Weighted & Penalty
Give critical rails higher weights. Track a penalty counter for jittery rails and enter degraded rather than immediate shutdown when it exceeds a threshold.
Core rules: valid = (PG==1 for ≥ t_hold) ∧ (glitch < t_deglitch) · vote = Σ(weighted_valid_rails) ≥ threshold
Start points: t_deglitch 20–100 μs · t_hold 2–20 ms · align window 5–20 ms · penalty_max 3–5/min.
| Rail | Weight (0–3) | Class | t_deglitch (μs) | t_hold (ms) | blanking (ms) | priority | penalty_max | notes |
|---|---|---|---|---|---|---|---|---|
| Vcore | 3 | critical | 50 | 5 | 100 | 1 | 3 | DDR release gated by Vcore OK |
| Vio | 2 | non-critical | 40 | 4 | 80 | 2 | 4 | Allow degraded I/O service on jitter |
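The core rule (a rail is valid once PG has been high for at least t_hold, and the vote passes when the weighted sum of valid rails reaches the threshold) can be sketched as below. This is an illustration under assumptions: `RailState`, `rail_valid`, and `vote` are hypothetical names, and deglitching is assumed to happen upstream so that `high_for_ms` only resets on a low longer than t_deglitch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RailState:
    name: str
    weight: int          # 0-3, as in the table above
    high_for_ms: float   # how long the (already deglitched) PG has been high
    t_hold_ms: float     # minimum high width before the rail counts as valid


def rail_valid(rail: RailState) -> bool:
    """A rail is valid once PG has stayed high for at least t_hold."""
    return rail.high_for_ms >= rail.t_hold_ms


def vote(rails: Iterable[RailState], threshold: int) -> bool:
    """Weighted vote: the sum of weights of valid rails must reach the threshold."""
    score = sum(r.weight for r in rails if rail_valid(r))
    return score >= threshold


rails = [
    RailState("Vcore", weight=3, high_for_ms=12.0, t_hold_ms=5.0),  # valid
    RailState("Vio", weight=2, high_for_ms=1.0, t_hold_ms=4.0),     # not yet valid
]
print(vote(rails, threshold=3))  # Vcore alone reaches 3
print(vote(rails, threshold=5))  # needs Vio as well
```

Choosing the threshold relative to the weights decides which rails can be missing without dropping the vote; in this example a threshold of 3 tolerates a jittery Vio, while 5 does not.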
Interlocks & Safe-State
Interlock Paths
PG_vote → EN_main, PG_mismatch → limit/derate/RO. Use Δt_release and Δt_reassert to prevent ping-pong.
Reset Tree Interface
Stage releases: Core → Clock → Memory → IO. Match minimum pulse widths and phases across the reset tree.
Fault Semantics
PG = power ready; FAULT = boundary violation. Keep channels separate; only bind EN/RESET to PG_vote.
| Condition | Action (EN/RESET/limit/derate) | Δt_gap (ms) | Latch (l/nl) | Clear_rule | Affected_blocks |
|---|---|---|---|---|---|
| PG_vote == OK & penalty < k | EN=1, RESET released (staged) | 50 | nl | auto (PG stable) | Core + DDR + IO |
| PG_mismatch transient | Derate/limit; keep session RO | 200 | nl | PG stable ≥ 5 s | Core |
| PG_mismatch persistent | Orderly shutdown | 500 | l | service_only | All |
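The anti-ping-pong behaviour implied by Δt_gap can be modelled as a small gate that refuses to re-enable until the vote has been continuously OK for the gap. The sketch below assumes a hypothetical `ReleaseGate` class driven by a periodic caller that supplies a millisecond timestamp; it covers only the non-latching rows of the table.

```python
class ReleaseGate:
    """Keep EN low until the vote has been OK for gap_ms since the last drop."""

    def __init__(self, gap_ms: float):
        self.gap_ms = gap_ms
        self.last_drop_ms = None   # timestamp of the most recent not-OK sample
        self.enabled = False

    def update(self, now_ms: float, vote_ok: bool) -> bool:
        if not vote_ok:
            # Any drop disables immediately and restarts the gap timer.
            self.last_drop_ms = now_ms
            self.enabled = False
        elif not self.enabled:
            # Re-enable only if no drop was ever seen or the gap has elapsed.
            if self.last_drop_ms is None or now_ms - self.last_drop_ms >= self.gap_ms:
                self.enabled = True
        return self.enabled


gate = ReleaseGate(gap_ms=500)
for t, ok in [(0, True), (100, False), (200, True), (599, True), (600, True)]:
    print(t, gate.update(t, ok))
```

Because the timer restarts on every not-OK sample, a bouncing vote keeps EN low for the whole disturbance plus one full gap, which is the intent of the Δt_release/Δt_reassert rule.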
Bypass & Degraded Operation
Policy Goals
For non-critical transient failures, maintain service (bypass). For critical rails, enter time-limited degrade and then orderly shutdown, preserving data integrity via write-protect.
Runtime Modes
Frequency/voltage derate, resolution/bitrate downscale, read-only (RO), session keep-alive, user notice, event logging, and well-defined recovery rules.
Field Overrides
Override priority: register > OTP > jumper. After the maintenance window, auto-revert when PG is stable.
| Rail_class | Fault_type | Timer_ms | Mode | User_notice | Recovery_rule |
|---|---|---|---|---|---|
| non-critical | transient | 0 | bypass | banner: “Video scaled” | auto when PG_stable ≥ 5 s |
| non-critical | persistent | 3000 | degrade (fps↓/bitrate↓) | toast: “Limited performance” | auto after 60 s stable |
| critical | transient | 200 | degrade (RO/limit) | log: WARN(PG_TRANSIENT) | require PG_stable ≥ 10 s |
| critical | persistent | 500 | shutdown (orderly) | dialog + log | save ctx → power-off |
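The policy table maps directly onto a lookup keyed by rail class and fault type. The following sketch encodes the four rows above; the `select_mode` helper and its fail-safe fallback for unknown combinations are assumptions for illustration, not part of the stated policy.

```python
# (rail_class, fault_type) -> (timer_ms, mode), copied from the table above
POLICY = {
    ("non-critical", "transient"): (0, "bypass"),
    ("non-critical", "persistent"): (3000, "degrade"),
    ("critical", "transient"): (200, "degrade"),
    ("critical", "persistent"): (500, "shutdown"),
}


def select_mode(rail_class: str, fault_type: str):
    """Return (timer_ms, mode) for a classified fault."""
    try:
        return POLICY[(rail_class, fault_type)]
    except KeyError:
        # Assumption: unknown inputs fall back to the most conservative row.
        return POLICY[("critical", "persistent")]


print(select_mode("non-critical", "transient"))
print(select_mode("critical", "persistent"))
```

Keeping the policy as data rather than branching logic makes it straightforward to version it and to diff one policy revision against another.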
Timing Budget
Key Windows
t_deglitch: 10–200 μs · t_hold: 2–50 ms · blanking: 50–300 ms · t_pg_delay: device-specific P95/P99.
Clock Tolerance
Map ppm of RC/XTAL/IRC to a safety factor (≥1.25×). Apply guardband for temp/aging and lot variation (+10–20%).
Edge Budget
Rise time: t_rise ≈ 2.2·R_pullup·C_bus → keep t_rise ≤ 0.5 × t_deglitch.
Alignment
Use a 5–20 ms alignment window for multi-rail decisions to avoid early-bird bias.
| Rail | t_ramp_ms | t_pg_assert_ms | t_pg_deassert_ms | t_deglitch_us | t_hold_ms | blanking_ms | clk_tol_ppm | safety_factor |
|---|---|---|---|---|---|---|---|---|
| Vcore | 8 | 12 | 6 | 60 | 6 | 120 | 30 | 1.3 |
| Vio | 15 | 22 | 10 | 40 | 4 | 80 | 50 | 1.25 |
| Periph | 25 | 30 | 16 | 80 | 8 | 150 | 100 | 1.4 |
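One possible reading of the safety-factor and guardband guidance is to multiply a nominal window by both terms to get the value actually programmed. The text does not define this formula, so the sketch below is an assumption: `guardbanded_ms` is a hypothetical helper, and the 10% default guardband is simply the low end of the quoted +10–20% range.

```python
def guardbanded_ms(nominal_ms: float, safety_factor: float, guardband: float = 0.10) -> float:
    """Scale a nominal window by the clock safety factor and a temp/aging guardband."""
    if safety_factor < 1.25:
        raise ValueError("safety_factor should be >= 1.25")
    return nominal_ms * safety_factor * (1.0 + guardband)


# rail -> (t_hold_ms, safety_factor), taken from the table above
BUDGET = {"Vcore": (6, 1.3), "Vio": (4, 1.25), "Periph": (8, 1.4)}

for rail, (t_hold_ms, sf) in BUDGET.items():
    print(f"{rail}: t_hold {t_hold_ms} ms -> {guardbanded_ms(t_hold_ms, sf):.2f} ms")
```

Under this reading the 6 ms Vcore hold becomes about 8.6 ms once the 1.3× factor and a 10% guardband are applied.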
Diagnostics & Telemetry
Event Classes
Classify transient (glitch/short loss) vs persistent (PG low ≥ T_persist); classification feeds bypass/degrade/shutdown.
Counters & Timestamps
Track fault_count[rail], min_high_width[rail], last_event_ts. Expose IRQ/Wire-OR, write audit logs, allow remote clear.
Storm Control
Coalesce repeats, rate-limit uploads, add cooldown during brown-out/hot-swap to prevent telemetry storms.
| Field | Type | Unit | Update_rule | Clear_rule | Notes |
|---|---|---|---|---|---|
| fault_count[rail] | uint32 | events | on classify(event) | remote_clear with audit | rate-limit uplink |
| min_high_width[rail] | uint32 | ms | if PG==1 then track min | manual only | windowed measurement |
| last_event_ts | utc string | ISO-8601 | on classify(event) | n/a | synced to host clock |
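The classification and counter updates described above might look like the following in host-side tooling. This is a sketch under assumptions: the 500 ms `T_PERSIST_MS` value and the function names are illustrative, and `min_high_width` tracking is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone

T_PERSIST_MS = 500.0  # assumed threshold; the text leaves T_persist open

fault_count = defaultdict(int)   # fault_count[rail]
last_event_ts = {}               # last_event_ts[rail], ISO-8601 UTC string


def classify(low_duration_ms: float, t_persist_ms: float = T_PERSIST_MS) -> str:
    """PG low for at least T_persist is persistent; anything shorter is transient."""
    return "persistent" if low_duration_ms >= t_persist_ms else "transient"


def record_event(rail: str, low_duration_ms: float, now: datetime = None) -> str:
    """Update counters and timestamp on classify(event), as in the table."""
    kind = classify(low_duration_ms)
    fault_count[rail] += 1
    ts = now if now is not None else datetime.now(timezone.utc)
    last_event_ts[rail] = ts.isoformat()
    return kind


print(record_event("Vio", 0.06))     # a 60 us glitch
print(record_event("Vio", 1200.0))   # a 1.2 s loss
print(fault_count["Vio"], last_event_ts["Vio"])
```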
Procurement Hooks · Seven Brands
Note: N-of-M voting is rarely native to a supervisor IC; it is typically realized in an MCU or external logic using the supervisor outputs (PG/RESET/WDO). Mark NofM_support = via MCU/logic unless the device explicitly implements it.
| Brand | Part | Channels | Logic (OD/PP) | NofM_support | t_deglitch_range (μs) | t_delay_range (ms) | Vdomain (V) | AEC-Q100 | Pkg | TempGrade | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TI | TPS386000-Q1 | 4 + WDT | OD RESET | via MCU/logic | — | prog delay | 1.8/3.3 | Yes | QFN/TSSOP | -40~125°C | Multi-rail, flexible thresholds |
| TI | TPS3890-Q1 | 1 | OD | via MCU/logic | — | fixed/prog | 1.7–6.5 | Yes | SOT-23 | -40~125°C | Low Iq, accurate threshold |
| ST | L99PM62GXP | multi | OD/PP mix | via MCU/logic | — | prog | 3.3/5.0 | Yes | QFP | -40~150°C | SBC with diagnostics |
| NXP | FS6500 | multi + WDT | OD/PP config | via MCU/logic | — | prog | 3.3/5.0 | Yes | QFN | -40~150°C | ASIL-capable SBC |
| Renesas | ISL88014 | 1 | PP/OD options | via MCU/logic | — | fixed/prog | 1.8/3.3/5.0 | Var | SOT/QFN | -40~125°C | Compact supervisor |
| onsemi | NCV809 | 1 | PP RESET | via MCU/logic | — | fixed | 1.8–5.5 | Yes | SOT/SC-70 | -40~125°C | Low Iq reset |
| Microchip | MCP1316/18/19/132x | 1 | PP/OD options | via MCU/logic | — | fixed/prog | 1.5–5.5 | Some | SOT/TDFN | -40~125/150°C | Automotive variants available |
| Melexis | MLX8003x/8005x (LIN-SBC) | LDO+Reset+WDT | OD/PP mix | via MCU/logic | — | prog | 5.0 | Some | QFN/QFP | -40~150°C | Use as PG/WDT node |
BOM Remark (copy): PG voting requires N-of-M = __; t_deglitch ≥ __ μs; t_hold ≥ __ ms; anti-ping-pong gap ≥ __ ms; OD pull-ups to __ V, R = __ kΩ; non-critical rails degrade per policy v__ ; brand preference: TI / ST / NXP / Renesas / onsemi / Microchip / Melexis (interchangeable).
FMEA & Test Matrix
What this chapter delivers
A reproducible fault-injection plan and coverage matrix executable in factory and field, aligned with voting/interlock policies.
Injection set
Glitch width/frequency sweep · Slow-rise/slow-fall · Power rebound · Hot-swap · Cross-temperature (−40/25/85/105 °C) · Contact-R ↑ (20/50/100 mΩ).
Acceptance lines
- False-trigger rate ≤ 1e-6 (test-approx via count)
- MTTR_PG ≤ 300 ms
- Δt_gap ≥ 500 ms (anti-ping-pong)
- Reporting throttle meets policy (≤ 3 pushes/60 s for the same root cause)
- Classification accuracy (transient vs persistent) ≥ 99%
| Case | Rail | Profile | Temp (°C) | Expected_behavior | Pass/Fail | Metric | Action |
|---|---|---|---|---|---|---|---|
| FMEA-JIT-001 | Vcore | glitch width=20/60/120µs @5Hz ×60s | 25 | ignore, fault_count↑, min_high_width tracked | PASS | {"false_trigger":0,"mttr_ms":0,"reports":0} | — |
| FMEA-JIT-012 | Vio | slow-rise t_rise=150ms (supply @−40°C) | -40 | ignore by deglitch/hold; no reset | PASS | {"false_trigger":0,"mttr_ms":0,"reports":1} | t_hold ≥ 8ms |
| FMEA-JIT-021 | Vddq | rebound 8%V for 20ms after drop | 25 | no shutdown; enforce Δt_gap before reassert | MARGINAL | {"false_trigger":1,"mttr_ms":210,"reports":2} | Δt_gap ≥ 500ms |
| FMEA-JIT-030 | Periph | hot-swap 10 cycles (port) | 85 | no ping-pong; degrade→recover | PASS | {"false_trigger":0,"mttr_ms":180,"reports":1} | series-R / clamp at edge |
Put Δt_gap and report throttling into the acceptance sheet. Version your PG policy (policy_rev) and store it in each event header for field cross-check.
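The reporting-throttle acceptance line (at most 3 pushes per 60 s for the same root cause) can be exercised with a sliding-window limiter. The sketch below uses a hypothetical `ReportThrottle` class with a caller-supplied time in seconds; the `throttle_hits` counter mirrors the telemetry field of the same name.

```python
from collections import defaultdict, deque


class ReportThrottle:
    """Allow at most max_reports per window_s for each root cause."""

    def __init__(self, max_reports: int = 3, window_s: float = 60.0):
        self.max_reports = max_reports
        self.window_s = window_s
        self._sent = defaultdict(deque)  # root_cause -> timestamps of sent reports
        self.throttle_hits = 0

    def allow(self, root_cause: str, now_s: float) -> bool:
        sent = self._sent[root_cause]
        # Drop timestamps that have aged out of the window.
        while sent and now_s - sent[0] >= self.window_s:
            sent.popleft()
        if len(sent) < self.max_reports:
            sent.append(now_s)
            return True
        self.throttle_hits += 1
        return False


throttle = ReportThrottle()
print([throttle.allow("PG_TRANSIENT:Vio", t) for t in range(5)])
print(throttle.throttle_hits)
```

Keying the window on the root cause rather than on the rail lets unrelated faults still report while a chattering connector is being suppressed.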
Implementation Patterns
Pure Logic
Schmitt + RC deglitch (t≈2.2RC) + monostable + gates for 1oo2/2oo3. Ultra-low latency, but limited flexibility and temperature drift must be budgeted.
Programmable Supervisor
I²C/PMBus-configurable deglitch/hold/blanking/delay with IRQ/counters. Great for consistency; complex N-of-M often needs MCU cooperation.
MCU Firmware Voting
GPIO sampling + software counters + window alignment + weighted sum ≥ threshold + interlocks. Highest flexibility; mind boot masking and admission delays.
| Pattern | Latency | Flexibility | Diagnostics | BOM_cost | Tooling | Notes |
|---|---|---|---|---|---|---|
| Logic | 1–5 µs | Low | Low | Low | None | Small rail count; fixed policy; temp drift on RC |
| ProgSupervisor | 50–500 µs | Medium | Medium-High | Medium | GUI/Script | Consistent mass production; IRQ/counters ready |
| MCU | 0.5–2 ms | High | High | Low-Medium | FW+CI | Complex voting & interlocks; boot mask needed |
Copy (BOM/Spec): Δt_gap ≥ 500 ms, t_deglitch ≥ 60 µs, t_hold ≥ 8 ms, report_throttle ≤ 3/60 s, classify transient/persistent per policy_rev=__.
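For the MCU firmware pattern, the "GPIO sampling + software counters" step is commonly a consecutive-sample filter: the filtered PG only changes after N samples in a row disagree with the current state. The sketch below models that logic in Python for clarity; `SampledDeglitch` is a hypothetical name, and the effective deglitch time is N times whatever sample period the firmware uses (for example, 6 samples at an assumed 10 µs period gives 60 µs).

```python
class SampledDeglitch:
    """Change state only after n_samples consecutive samples of the new level."""

    def __init__(self, n_samples: int, initial: bool = False):
        self.n_samples = n_samples
        self.state = initial
        self._count = 0  # consecutive samples that disagree with the state

    def sample(self, raw: bool) -> bool:
        if raw == self.state:
            self._count = 0          # any agreeing sample resets the counter
        else:
            self._count += 1
            if self._count >= self.n_samples:
                self.state = raw     # level held long enough: accept it
                self._count = 0
        return self.state


f = SampledDeglitch(n_samples=3)
raw = [True, True, False, True, True, True, False, True]
print([f.sample(x) for x in raw])
```

In this trace the first two high samples are rejected because a low interrupts them, the filtered output goes high only after three highs in a row, and the single low sample afterwards is ignored.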
Copy-Ready Assets
BOM Note (copy & paste)
PG voting requires N-of-M=__, t_deglitch ≥ __ µs, t_hold ≥ __ ms, Δt_gap ≥ __ ms. OD pull-ups to __ V with R=__ kΩ; align window W=__ ms. Non-critical rails follow degrade policy v__. Acceptable brands: TI / ST / NXP / Renesas / onsemi / Microchip / Melexis.
Silkscreen & Labels
- Signals: PG_VCORE, PG_VIO, VOTE_OK, DEGRADE, BYPASS_EN, PG_FAULT.
- Polarity/Domain: OD@3V3 or PP@Vdomain; annotate Rp=10k, C_bus=100pF near pull-ups.
- Harness tags: PG_BUS, VOTE_BUS, IRQ_PG.
pg_vote_matrix.csv
| Rail | Weight(0-3) | Class(critical/noncritical) | N_of_M_threshold | Align_window_ms | t_deglitch_us | t_hold_ms | Penalty_max | Priority | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Vcore | 3 | critical | 3 | 10 | 60 | 10 | 2 | P1 | Weighted critical rail |
| Vio | 2 | noncritical | 2 | 10 | 60 | 8 | 3 | P2 | Shared IO domain |
| Vddq | 2 | critical | 3 | 12 | 80 | 10 | 2 | P1 | Memory rail |
Rule: N_of_M_threshold ≤ ΣWeight; Align_window_ms 5–20; t_deglitch_us ≥ 40; t_hold_ms ≥ 8.
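The rule line can be turned into an automated check on the CSV before it is released. The sketch below is illustrative only: it assumes simplified column headers (without the bracketed value hints), embeds the three example rows inline, and uses a hypothetical `check_vote_matrix` function.

```python
import csv
import io

# Example rows from the table above, with simplified headers (an assumption).
VOTE_MATRIX_CSV = """Rail,Weight,Class,N_of_M_threshold,Align_window_ms,t_deglitch_us,t_hold_ms
Vcore,3,critical,3,10,60,10
Vio,2,noncritical,2,10,60,8
Vddq,2,critical,3,12,80,10
"""


def check_vote_matrix(text: str) -> list:
    """Return a list of human-readable rule violations (empty when clean)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    total_weight = sum(int(r["Weight"]) for r in rows)
    errors = []
    for r in rows:
        rail = r["Rail"]
        if int(r["N_of_M_threshold"]) > total_weight:
            errors.append(f"{rail}: threshold exceeds total weight {total_weight}")
        if not 5 <= float(r["Align_window_ms"]) <= 20:
            errors.append(f"{rail}: Align_window_ms outside 5-20")
        if float(r["t_deglitch_us"]) < 40:
            errors.append(f"{rail}: t_deglitch_us below 40")
        if float(r["t_hold_ms"]) < 8:
            errors.append(f"{rail}: t_hold_ms below 8")
    return errors


print(check_vote_matrix(VOTE_MATRIX_CSV))
```

Running a check like this in CI alongside the policy revision helps keep the matrix and the rule line from drifting apart.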
pg_timing_budget.csv
| Rail | t_ramp_ms | t_pg_assert_ms | t_pg_deassert_ms | t_deglitch_us | t_hold_ms | blanking_ms | clk_tol_ppm | safety_factor |
|---|---|---|---|---|---|---|---|---|
| Vio | 120 | 15 | 12 | 60 | 8 | 20 | 50 | 1.3 |
| Vcore | 90 | 12 | 10 | 60 | 10 | 16 | 30 | 1.5 |
| Vddq | 140 | 18 | 14 | 80 | 10 | 24 | 80 | 1.4 |
Rule: set t_deglitch ≥ 1.5× the shortest measured false pulse; include temperature drift in clk_tol_ppm.
pg_degrade_matrix.csv
| Rail_class(critical/noncritical) | Fault_type(transient/persistent) | Timer_ms | Mode(degrade/bypass/shutdown) | User_notice | Recovery_rule |
|---|---|---|---|---|---|
| noncritical | transient | 0 | bypass | silent | auto |
| noncritical | persistent | 3000 | degrade | banner | stable ≥ 10s |
| critical | persistent | 1000 | shutdown | modal | manual |
pg_interlock_policy.csv
| Condition | Action(EN/RESET/Current_limit/Freq_cap/RO_mode) | Δt_gap_ms | Latch(l/nl) | Clear_rule(manual/auto) | Affected_blocks | Notes |
|---|---|---|---|---|---|---|
| PG_vote_drop | EN | 500 | l | manual | GPU,DDR | anti-ping-pong |
| PG_mismatch | Freq_cap | 0 | nl | auto | CPU,ISP | graceful degrade |
| Thermal_warn | Current_limit | 0 | nl | auto | All | coupled policy |
pg_test_matrix.csv
| Case | Rail | Profile | Temp | Expected_behavior | Pass/Fail | Metric | Action |
|---|---|---|---|---|---|---|---|
| FMEA-JIT-001 | Vcore | glitch 20/60/120µs @5Hz ×60s | 25 | ignore; count↑; min_high_width tracked | PASS | {"false_trigger":0,"mttr_ms":0,"reports":0} | — |
| FMEA-JIT-021 | Vddq | rebound 8%V/20ms | 25 | no shutdown; enforce Δt_gap before reassert | MARGINAL | {"false_trigger":1,"mttr_ms":210,"reports":2} | Δt_gap ≥ 500ms |
| FMEA-JIT-030 | Periph | hot-swap 10 cycles | 85 | no ping-pong; degrade→recover | PASS | {"false_trigger":0,"mttr_ms":180,"reports":1} | series-R / clamp |
pg_telemetry_fields.csv
| Field | Type(u8/u16/u32/ts/bool) | Unit | Update_rule | Clear_rule | Notes |
|---|---|---|---|---|---|
| fault_count[Vcore] | u16 | count | inc | manual | monotonic per boot |
| min_high_width[Vio] | u16 | us | min | manual | shortest PG=1 pulse |
| last_event_ts[Vddq] | ts | unix | overwrite | manual | UTC |
| throttle_hits | u8 | count | inc | manual | per 60 s window |
| policy_rev | u16 | rev | set | manual | logged in header |
supervisor_brand_fields.csv
| Brand | Channels | Logic(OD/PP/both) | NofM_support(yes/no/viaMCU) | t_deglitch_range_us | t_delay_range_ms | Vdomain(V) | AECQ(Q100/—) | Pkg | TempGrade | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| TI | 4 | both | viaMCU | 40-160 | 2-50 | 1.8/3.3 | Q100 | QFN-24 | -40~125 | IRQ+counters |
| ST | 2 | OD | no | 40-120 | 2-20 | 3.3/5 | Q100 | TSSOP-16 | -40~125 | low Iq |
| NXP | 3 | both | viaMCU | 60-200 | 5-40 | 1.8/3.3 | — | QFN-24 | -40~105 | robust IO |
| Renesas | 4 | OD | yes | 50-150 | 2-30 | 3.3 | Q100 | QFN-20 | -40~125 | PG vote pins |
| onsemi | 2 | PP | no | 40-80 | 2-10 | 3.3/5 | Q100 | SOIC-8 | -40~125 | simple |
| Microchip | 4 | both | viaMCU | 40-160 | 2-50 | 1.8/3.3 | — | QFN-24 | -40~125 | I²C cfg |
| Melexis | 2 | OD | no | 60-120 | 2-15 | 3.3 | Q100 | SOIC-8 | -40~125 | sensor PG |
Frequently Asked Questions
When should I choose 2oo3 instead of a simple AND?
2-out-of-3 lets one rail fail or flutter without collapsing the system, improving availability versus a strict AND. It is valuable when rails have different ramp profiles or noisy connectors. The trade-off is a third sensor or PG channel and slightly higher latency from alignment windows and voting logic.
How large should deglitch be to tame connector chatter?
Start from the measured worst-case pulse width plus margin. A practical baseline is 40–60 µs for digital rails, rising to 80–120 µs with long harnesses or high EMI. Verify across temperature and slew-rate corners and set deglitch at least 1.5× the shortest observed false pulse during qualification.
How do I size the anti-ping-pong gap Δt?
Pick Δt so the fastest realistic bounce cannot retrigger release—typically 300–800 ms. Use field data: the longest rebound or brownout recovery plus twice the safety margin. Longer gaps reduce flicker but may delay recovery, so pair with reporting throttling to avoid alert storms during repeated disturbances.
Can I weight critical rails inside N-of-M voting?
Yes. Assign higher weights to Vcore or storage rails and raise the threshold accordingly. Combine with alignment windows and minimum-high times so brief transients on noncritical rails do not dominate. Document weights in the policy revision, and reflect them explicitly in FMEA cases and acceptance criteria.
How do I interlock PG-vote with EN/RESET without adding boot latency?
Mask voting until primary rails cross early thresholds, then release alignment windows and hold timers. Use staged enables and a single anti-ping-pong gap. Keep POR sequencing in hardware for determinism; let firmware confirm, log, and publish events rather than blocking first-boot power-up timing.
Is OD aggregation more robust than converting to push-pull first?
Open-drain with a common pull-up tolerates level mismatches and avoids back-power from failed sources. However, large bus capacitance slows edges. For long busses, buffer into a local domain or segment fan-out. If sources are strictly push-pull, convert to OD or pass through Schmitt buffers before aggregation.
How do I distinguish transient from persistent PG loss in the field?
Track min_high_width, fault_count, and last_event_ts per rail. A loss that clears before T_persist is transient; otherwise persistent. Throttle identical events within 60 seconds. Keep counters across soft resets and require explicit maintenance actions to clear them for auditability.
What are safe degraded modes for noncritical rails?
Prefer read-only, reduced frequency, or lower image quality rather than shutdown. Maintain session keep-alive and cache integrity. Exit degrade automatically once VOTE_OK is stable for a defined hold time. Critical rails still trigger an orderly shutdown after a bounded grace period to protect data.
How do I validate timing budgets across temperature and ramp slopes?
Sweep slow-rise and slow-fall profiles at cold and hot corners. Confirm deglitch is at least 1.5× the shortest false pulse and hold covers the longest settling. Measure with real harness capacitance and pull-ups. Any fail converts to policy changes: raise deglitch, increase hold, or widen alignment windows.
How do I balance pull-up value between no back-power and fast edges?
Lower resistances speed edges but increase power loss and back-feed risks. Start near 10 kΩ for 3.3 V domains with moderate fan-out, then adjust by measuring far-end edges on the scope. If leakage or long busses slow transitions, buffer or segment rather than forcing very small resistors.
What’s a good minimum anti-ping-pong for hot-swap events?
Use 500–1000 ms if cables or external modules are present. Hot-swap introduces rebounds and partial insertions; longer gaps prevent oscillation and repeated resets. Combine with input clamps or series resistors to limit surges, and enable event throttling so logs do not flood during repeated insertions.
What should a BOM note include for PG wiring and voting?
State the N-of-M threshold, deglitch and hold values, the alignment window, and the anti-ping-pong gap. Specify OD or push-pull domains, pull-up voltage and resistor, and whether logging and throttling are enabled. Include the degrade-policy revision and acceptable supervisor brands so procurement can substitute safely.