Telco Power & Sequencing for -48V/48V Front Ends

Telco Power & Sequencing is about turning a harsh -48V input into a safe, repeatable power-up: controlled hot-plug/inrush, coordinated protection, and deterministic PGOOD/RESET sequencing.

The goal is high availability with evidence: PMBus telemetry and fault logs that pinpoint the first trigger, so brownouts, redundancy switchovers, and load faults can be diagnosed and fixed without guesswork.

H2-1 · What “Telco Power & Sequencing” Means (Scope & Boundaries)

This page defines and engineers the -48V/48V front-end from the input connector to a repeatable, stable “power-good / reset-release” state. The goal is not only to survive real input events, but to start reliably and leave actionable evidence when something goes wrong.

What is in scope (engineering deliverables)
  • Safe attach: protection layers that keep the node alive during surges, dips, hot-plug stress, and reverse-current situations.
  • Stable bring-up: hot-swap inrush control, branch protection (eFuse/high-side switches), and sequencing/RESET behavior that avoids “intermittent boot failures.”
  • Observable & replayable: PMBus telemetry + alert/status + fault logs that allow post-mortem reasoning and fast field triage.

Out of scope: packet/traffic features, optics modules, clock trees, PoE, or detailed management-plane architecture. These may appear only as generic loads or alarm consumers.

After reading, it should be possible to
  • Sketch a reference front-end from -48V input to PGOOD/RESET, including measurement points and the alert/log chain.
  • Define a sequencing & reset policy (dependencies, timeouts, debounce) that is robust to dips and hot-plug transients.
  • Write a validation & troubleshooting checklist that proves repeatable bring-up and yields fast root-cause isolation.
Figure F1 — System boundary: from input to stable PGOOD/RESET + logs
[Diagram: system boundary — -48V input connector → EMI/surge → ORing → hot-swap → eFuse/switch → sequencer/reset (PGOOD • RESET) → DC/DC loads, with PMBus telemetry (VIN, IIN, VBUS, TEMP, STATUS) and fault logs; PoE, optics, timing, and protocols are out of scope.]
A useful mental model is two parallel paths: a power path that safely attaches and ramps the bus, and a control/telemetry path that decides when to release reset and records evidence for post-mortem analysis.

H2-2 · Input Realities: -48V Nominal, Brownouts, Surges, Redundancy Feeds

A telco node rarely sees a “clean bench supply.” Real inputs are dominated by events (hot-plug, dips, spikes, feed switchover), and the front-end must turn those events into bounded stress, stable PGOOD/RESET behavior, and clean evidence instead of mystery resets.

Event taxonomy (phenomenon → risk → front-end objective)

Hot-plug / plug-in

Risk: inrush + device stress. Objective: monotonic bus ramp, bounded peak current, controlled dv/dt.

Brownout / sag

Risk: PGOOD chatter + false reset. Objective: debounce/timeouts that distinguish “dip” vs “true loss.”

Surge / spike

Risk: over-voltage energy and MOSFET VDS stress. Objective: layered clamping + controlled stress envelope.

Reverse / backfeed

Risk: heating, unexpected shutdown, feed fighting. Objective: ORing that blocks reverse current and switches stably.

Redundancy feeds (A/B) can still “fight” each other when small voltage offsets and dynamic response differences create reverse-current paths or rapid switchover. The architecture must prefer stable selection over constant toggling, because toggling often becomes an alarm storm and eventually a reset cascade.

What to observe first (fastest triage)
  • VIN / VBUS shape: is the ramp monotonic, and do dips align with resets?
  • IIN / reverse-current hints: does current spike during attach or during feed switchover?
  • PGOOD/RESET timing: does reset release only after the bus is stable (debounce + timeout), or does it chatter?
  • Status + logs: do alerts say “why” (OV/UV/OCP/OTP) at the moment the symptom occurs?
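The "dip vs true loss" distinction above can be sketched as a simple debounce classifier over sampled VIN. A minimal sketch; the 40 V undervoltage threshold and 10 ms loss window are illustrative assumptions, not recommendations:

```python
# Sketch: distinguish a brief dip from a true input loss with a debounce
# timer. Threshold and timing values are illustrative placeholders.

def classify_undervoltage(samples_mv, period_ms, uv_mv=40_000, loss_ms=10):
    """Scan VIN samples taken every period_ms. Report 'dip' if VIN recovers
    before loss_ms, 'loss' if it stays below uv_mv for loss_ms or longer."""
    below_ms = 0   # current time spent below the UV threshold
    worst_ms = 0   # longest undervoltage excursion seen
    for v in samples_mv:
        if v < uv_mv:
            below_ms += period_ms
            worst_ms = max(worst_ms, below_ms)
        else:
            below_ms = 0  # recovered: restart the debounce timer
    if worst_ms == 0:
        return "ok"
    return "loss" if worst_ms >= loss_ms else "dip"
```

The same pattern (threshold plus persistence time) is what keeps a 2 ms sag from being logged as an input loss while a sustained undervoltage still trips deterministically.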
Figure F2 — Input event timeline and what the front-end must enforce
[Diagram: timeline showing plug-in, inrush, settle, brownout dip, and surge spike over simplified VIN, IIN, and PGOOD/RESET traces, annotated with LIMIT, CONTROL, DEBOUNCE, LOG, and PROTECT actions; a dip may cause a false reset, a spike causes stress.]
Treat input behavior as a sequence of events. A robust design enforces a bounded inrush, filters dips with debounce/timeouts, clamps spikes into a known stress envelope, and writes actionable logs at the moment symptoms occur.

H2-3 · Reference Architecture: Connector → Power-Good → Fault Logs

A practical way to design and debug a -48V/48V front-end is to treat it as two parallel channels: a power path that carries energy and stress, and a control/observability path that decides when to release reset, raises alarms, and freezes evidence into logs.

Power path (thick line): where energy and stress flow

Input conditioning

Clamps and filters events so the rest of the system sees a bounded stress envelope.

ORing

Prevents reverse current and stabilizes feed selection during transients.

Hot-swap

Shapes the bus ramp (dv/dt) and limits inrush so the pass device stays inside its safe region.

Branch protection

Isolates faults so a single bad branch does not collapse the whole node.

Control & observability (thin line): how behavior becomes deterministic

Sequencer / reset supervisor

Implements dependency logic and time rules: when to assert reset, when to release it, and when to shut down.

Telemetry + status

Turns voltages/currents/temperature into actionable states (warn vs fault) and correlated time-ordered evidence.

Signal semantics (short, engineering meaning)
  • EN: permission to ramp; a policy output, not a measurement.
  • PGOOD: “stable enough” declaration after filtering/timeout; not a raw instantaneous voltage indicator.
  • RESET: system coordination line; held until the bus and required rails are stable under the defined policy.
  • FAULT: protection action occurred (hard fact); used to isolate and to force a safe state.
  • ALERT: “reason entry point” to query status and decide whether to log, retry, or latch off.

Debug rule of thumb: if the bus waveform is wrong, follow the power path. If behavior is intermittent, follow PGOOD/FAULT/ALERT and the log trigger.

Figure F3 — Dual-channel architecture: thick power path + thin control/telemetry path
[Diagram: architecture showing power blocks with thick arrows (-48V IN connector → conditioning → ORing → hot-swap → eFuse → bus/loads) and control/telemetry signals with thin lines (sequencer PGOOD/RESET, telemetry STATUS/ALERT, fault logs frozen on trigger to keep evidence).]
The thick line is “what can burn or drop.” The thin line is “what makes behavior deterministic”: release reset only after stability, and record the reason when alarms occur.

H2-4 · Hot-Swap Deep Dive: Inrush, dv/dt, SOA, and Fault Timing

Hot-swap is controlled attachment: it charges the effective load capacitance while keeping the pass device inside a safe stress envelope. The most common failures happen when the design focuses only on “peak current,” while the real limiter is the VDS × ID × time stress window.

Mental model (what hot-swap is really doing)

During bring-up, the bus behaves like a capacitor that must be charged. A faster ramp increases charging current; a slower ramp extends the time the pass device must dissipate power. Robust bring-up therefore requires shaping both current and time, not just clamping a peak.
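The coupling between ramp speed, current, and dissipation follows from first-order capacitor math: I = C·dV/dt, and the energy burned in the pass device while charging C from a fixed source is ½CV² regardless of ramp time, so slowing the ramp lowers current and average power but stretches the stress window. A minimal sketch (the 2 mF / 48 V / 10 ms numbers are hypothetical):

```python
# Sketch: first-order hot-swap bring-up numbers for a linear bus ramp.
# C_load, V_bus, and t_ramp below are illustrative assumptions.

def inrush_current(c_load_f, v_bus, t_ramp_s):
    """Charging current for a linear ramp: I = C * dV/dt."""
    return c_load_f * v_bus / t_ramp_s

def fet_dissipation(c_load_f, v_bus, t_ramp_s):
    """Energy burned in the pass FET while charging C from a fixed source:
    E = 1/2 * C * V^2 (independent of ramp time). Average power is
    E / t_ramp, so a faster ramp concentrates the same energy."""
    energy_j = 0.5 * c_load_f * v_bus ** 2
    return energy_j, energy_j / t_ramp_s

# Example: 2 mF at 48 V over a 10 ms ramp
i_inrush = inrush_current(2e-3, 48.0, 10e-3)       # ~9.6 A
e_j, p_avg = fet_dissipation(2e-3, 48.0, 10e-3)    # ~2.3 J, ~230 W average
```

The takeaway matches the text: tuning only the peak current ignores that the same ½CV² must be dissipated somewhere in time, which is why ramp shape and timers must be tuned together.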

The four tuning knobs (cause → waveform impact)

Inrush limit

Caps the current pulse; may lengthen the stress window if the ramp becomes too slow.

dv/dt (ramp rate)

Sets the bus slope; too fast risks shock, too slow risks long high-VDS dissipation.

Current limit behavior

Defines what happens under abnormal loads (foldback/hold/turn-off) and shapes the IIN waveform.

Fault blanking timer

Ignores expected transients; too short causes nuisance trips, too long hides real faults.

Why MOSFETs can fail without an “obvious overcurrent”

A slow ramp into a large capacitance (often worsened by long cabling) can keep the pass device in a region of high VDS with moderate current for a long time. The peak may look acceptable, but the integrated dissipation builds heat until the device leaves its safe region. When that happens, the symptom is often repeatable: the bus rises, pauses, heats, then collapses or trips late.

Waveform-first diagnosis (fastest path to the right knob)
  • VOUT ramp: non-monotonic ramps or plateaus often indicate stress-window problems or premature fault timing.
  • IIN pulse width: a “not huge” peak can still be dangerous if the pulse is long (energy/time problem).
  • VDS stress window: a long high-VDS interval is a strong indicator of SOA/thermal margin risk.
  • FAULT timing: if FAULT aligns with the early transient, blanking is too short; if it aligns late after heating, stress is too long.

Practical pass criteria: the bus ramp is monotonic and repeatable, the stress window is bounded, and protection timing separates “expected transient” from “true fault” while leaving a clean, time-ordered reason trail.
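As a rough screen for the stress window, captured VDS and ID samples can be integrated into a pulse energy and compared against a budget. This is only a crude first check, assuming a single-pulse energy limit; a real assessment must use the MOSFET's SOA curves and a thermal model:

```python
# Sketch: integrate sampled VDS * ID over time and compare against a
# crude single-pulse energy budget. The budget is a placeholder; real
# designs must check against the device's SOA curves and thermal model.

def stress_energy_j(vds_v, id_a, period_s):
    """Rectangular sum: E = sum(VDS * ID * dt) over the capture window."""
    return sum(v * i * period_s for v, i in zip(vds_v, id_a))

def soa_margin_ok(vds_v, id_a, period_s, budget_j):
    """True when the integrated pulse energy stays inside the budget."""
    return stress_energy_j(vds_v, id_a, period_s) <= budget_j

# Example: a decaying-VDS ramp at 2 A, sampled every 1 ms
e = stress_energy_j([40, 30, 20, 10], [2, 2, 2, 2], 1e-3)  # 0.2 J
```

This makes the earlier point measurable: a "not huge" current at high VDS for a long time can still exceed the budget even though no single sample looks alarming.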

Figure F4 — Four traces on one timeline: VOUT, IIN, VDS stress, and FAULT blanking window
[Diagram: hot-swap waveforms on a single time axis — VOUT ramp, IIN pulse, VDS stress window, and FAULT blanking — with a danger zone where VDS stays high too long, annotated with the dv/dt, inrush-limit, current-limit, and blanking knobs.]
When the VDS stress window stays high for too long, failures can occur even if the current peak looks moderate. Tune ramp shape and timers as a coupled policy: current, dv/dt, and blanking must agree with the expected transient envelope.

H2-5 · eFuse / High-Side Switch Strategy: Protection Without Killing Availability

Branch protection exists to keep a “bad cable or bad load” inside its own compartment. The input front-end keeps the node safe to attach; the branch layer keeps the node available when one branch misbehaves.

What the branch layer must contain
  • Short / overload: isolate a faulty branch before the shared bus collapses.
  • Thermal runaway: prevent repeated stress cycles from turning into a permanent hardware failure.
  • Intermittent faults: turn “mystery resets” into a counted, time-stamped, explainable pattern.
Fault policy is an availability policy (latch-off vs hiccup vs retry)

Latch-off

Clean isolation and no repeated stress. Requires explicit re-enable. Best when repeated retries would be unsafe.

Hiccup

Automatic periodic attempts. Useful for transient faults, but can create alarm storms if not budgeted.

Retry (with budget)

A controlled number of attempts with backoff, then escalates to latch-off when the budget is exhausted.

Why budget matters

Budgeted retries protect availability while avoiding endless stress cycles and repeated brownout-like disturbances.

Selective power shedding (critical vs non-critical groups)

Avoid “one fault kills everything” by grouping loads. A critical group should favor deterministic isolation (often latch-off) so the rest of the node remains stable. A non-critical group can use budgeted retry to recover from transient faults without requiring manual intervention.

Coordination rule: input front-end vs branch protection
  • Split responsibilities: the input front-end shapes the shared bus; branch protection isolates individual loads.
  • Avoid timer overlap: expected inrush / transient windows must not look like a branch short-circuit window.
  • Preserve root cause: branch faults should produce a clear reason trail instead of triggering a larger “mysterious shutdown.”

Minimum log fields: first-trip timestamp, fault type, temperature/current peak, retry count + backoff, final state (recovered vs latched), and external re-enable action.

Figure F5 — Fault policy state machine (hiccup / retry / latch-off)
[Diagram: branch fault policy state machine — ON → OCP_DETECT (threshold + FAULT_TMR) → TRIP → COOLDOWN (COOLDOWN_TMR) → RETRY_WAIT while RETRY_BUDGET remains, LATCHED_OFF when exhausted, RE-ENABLE via CLEAR/EN — with log triggers on each transition. Timers and budgets turn chaos into deterministic behavior.]
Use a state machine with explicit timers and a retry budget. Log at first-trip, each failed retry, and any latched-off escalation so field evidence is time-correlated and actionable.
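The Figure F5 policy reduces to a small state machine. A minimal sketch, with timer expirations abstracted into explicit events and logging reduced to an in-memory list; the retry budget value is an assumption:

```python
# Sketch of the Figure F5 policy: ON -> TRIP -> COOLDOWN -> retry while a
# budget remains, else LATCHED_OFF until an explicit CLEAR/EN.
# Timer values are abstracted into events; the budget is illustrative.

class BranchPolicy:
    def __init__(self, retry_budget=3):
        self.state = "ON"
        self.retries_left = retry_budget
        self.log = []  # each entry: (event, retries_left) — LOG_TRIG points

    def on_overcurrent(self):
        """OCP persisted past the fault timer: trip the switch, log first-trip."""
        self.state = "TRIP"
        self.log.append(("TRIP", self.retries_left))

    def on_cooldown_done(self):
        """Cooldown timer expired: retry if budget remains, else latch off."""
        if self.retries_left > 0:
            self.retries_left -= 1
            self.state = "ON"
            self.log.append(("RETRY", self.retries_left))
        else:
            self.state = "LATCHED_OFF"
            self.log.append(("LATCH", 0))

    def on_clear(self):
        """Explicit CLEAR/EN is the only exit from LATCHED_OFF."""
        if self.state == "LATCHED_OFF":
            self.state = "ON"
            self.log.append(("RE-ENABLE", self.retries_left))
```

Because every transition appends a log entry with the remaining budget, field evidence shows not just that a branch latched, but how many retries it burned first.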

H2-6 · ORing & Redundancy: Ideal Diode, Dual Feeds, Reverse Current, and Switchover Behavior

ORing is not just “two supplies in parallel.” It must block reverse current, select the better feed without chatter, and keep the shared bus stable enough that PGOOD/RESET policies do not oscillate.

ORing objectives (system-facing)
  • No backfeed: prevent reverse current from heating paths and destabilizing inputs.
  • Stable switchover: avoid rapid A↔B toggling that creates alarm storms and bus wobble.
  • Low loss: reduce drop and heat so redundancy does not become a thermal liability.
Three common failure patterns

Chatter / feed fighting

Small offsets and dynamic response differences cause repeated toggling and noisy alarms.

Reverse current

A feed is unintentionally powered through the other path, raising heat and confusing telemetry.

Bus wobble → PGOOD risk

Switchover dips can trigger false PGOOD transitions unless events are debounced and logged.

What to measure

VA, VB, VBUS, and Irev indicators plus a switchover event marker.

System-level hold-up behavior (focus on bus and policy)

The key metric is not a component choice but the depth and duration of any VBUS dip during switchover. ORing decisions should be aligned with the reset/PGOOD policy so brief transitions do not become system resets.

Log triggers: switchover detected, reverse-current event, VBUS dip below threshold, and any resulting PGOOD/RESET assertion.

Figure F6 — Dual-feed ORing into a shared bus, with Irev sensing and switchover debounce
[Diagram: Feed A (VA) and Feed B (VB) each through an ideal-diode block with reverse-current (Irev_A / Irev_B) sensing into a shared VBUS feeding the hot-swap stage, plus switchover event detection, debounce to stabilize alarms, and telemetry/logs recording switchover + Irev + VBUS dip + PGOOD.]
Redundancy problems are diagnosed by events: switchover, reverse current, and bus dips. Add debounce to prevent chatter-driven alarms and log the event chain so PGOOD/RESET consequences can be traced back to the real cause.
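The anti-chatter idea above — prefer the current feed unless the other is clearly better — can be sketched as hysteresis-based selection. The 500 mV hysteresis band is an illustrative assumption:

```python
# Sketch: select feed A or B with hysteresis so millivolt-level offsets
# between healthy feeds do not cause chatter. The hysteresis band is an
# illustrative assumption, not a recommendation.

def select_feed(va_mv, vb_mv, current, hysteresis_mv=500):
    """Keep the currently selected feed unless the other one is better by
    more than the hysteresis band. Returns (selection, switched_flag);
    the flag is the natural hook for a switchover log trigger."""
    if current == "A" and vb_mv > va_mv + hysteresis_mv:
        return "B", True
    if current == "B" and va_mv > vb_mv + hysteresis_mv:
        return "A", True
    return current, False
```

With this rule, a 100 mV offset between feeds never toggles the selection, while a genuine feed failure still switches over, and the `switched_flag` gives the event marker the log triggers need.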

H2-7 · Sequencing & RESET: Dependency Graph, PGOOD Logic, Timeouts, Safe Shutdown

A sequencing plan is not a list of rails. It is a dependency policy: which conditions must be true before enabling the next domain, who can assert RESET, and when the system should stop retrying and enter a safe shutdown state.

Why order matters (system consequences)
  • Prevent false start: dependent domains must not run before prerequisites are stable.
  • Prevent reset storms: unstable PGOOD signals create repeated resets and non-deterministic behavior.
  • Preserve evidence: shutdown must leave a path for logs/telemetry to capture the cause and sequence.
Model it as a dependency graph (not prose)

Nodes

BUS_OK, MGMT_RAIL, CORE_RAIL, IO_RAIL, PGOOD_AGG, RESET_OUT.

Edges

Each edge means a PGOOD dependency or an enable permission (EN).

RESET permissions

Many sources may assert RESET, but only a single policy should release it.

Policy output

A deterministic bring-up / shutdown flow with explicit timers and states.
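The dependency graph can be expressed as data and sorted into a bring-up order. A sketch using Python's stdlib `graphlib`; the exact edge set below is an assumption modeled on the Figure F7 node names:

```python
# Sketch: a Figure F7-style dependency graph as data, with bring-up order
# derived by topological sort. The edge set is an illustrative assumption.

from graphlib import TopologicalSorter

# node -> prerequisites that must report PGOOD before this node is enabled
graph = {
    "MGMT_RAIL": {"BUS_OK"},
    "CORE_RAIL": {"BUS_OK", "MGMT_RAIL"},
    "IO_RAIL":   {"CORE_RAIL"},
    "PGOOD_AGG": {"MGMT_RAIL", "CORE_RAIL", "IO_RAIL"},
    "RESET_OUT": {"PGOOD_AGG"},
}

# Prerequisites come first: BUS_OK leads, RESET_OUT is released last.
bring_up_order = list(TopologicalSorter(graph).static_order())
```

Keeping the graph as data rather than prose means the safe-shutdown order falls out for free (walk the same order in reverse) and a cycle in the dependencies is detected at build time instead of in the lab.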

PGOOD is a policy signal (window + time + status)

Treat PGOOD as “conditions satisfied” rather than “voltage reached.” A practical definition is: voltage-in-window and stable for a defined interval and no critical fault status. This prevents transient spikes and noise from toggling the dependency chain.
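That definition — in-window AND stable for an interval AND no critical fault — can be written as a predicate over recent samples. A minimal sketch; the window and interval values are placeholders:

```python
# Sketch: PGOOD as a policy predicate, not a raw comparator. Window and
# stability-interval values are illustrative placeholders.

def pgood(samples_mv, period_ms, lo_mv, hi_mv, stable_ms, fault_active):
    """Declare PGOOD only if the newest stable_ms worth of samples all sit
    inside [lo_mv, hi_mv] and no critical fault status is flagged."""
    if fault_active:
        return False
    need = stable_ms // period_ms          # samples required for persistence
    recent = samples_mv[-need:]
    return len(recent) >= need and all(lo_mv <= v <= hi_mv for v in recent)
```

A single out-of-window sample inside the persistence window is enough to withhold PGOOD, which is exactly what keeps transient spikes from toggling the dependency chain.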

Timers: blanking, bring-up timeout, stability window
  • Blanking window: ignore expected transients so the system does not misfire during normal ramp events.
  • Bring-up timeout: if a domain cannot reach PGOOD in time, fail fast instead of dragging the node into partial power states.
  • Stability window: require persistence so brief dips do not cause PGOOD/RESET oscillation.
Safe shutdown: isolate first, keep evidence, stop storms

Safe shutdown is not “everything off.” It is an ordered exit: isolate the fault domain when possible, keep the minimum logging path alive long enough to record the event, and enforce retry limits so repeated transitions do not become a field reliability problem.

Figure F7 — Dependency graph (left) + simplified timing chart (right)
[Diagram: left, rail dependency graph with PGOOD aggregation and reset permissions (BUS_OK → MGMT_RAIL → CORE_RAIL → IO_RAIL → PGOOD_AGG → RESET_OUT, with EN/PGOOD edges and assert/release arrows); right, timing chart of EN, V_MGMT, V_CORE, V_IO, PGOOD, and RESET with blanking ("ignore transients"), timeout ("fail fast"), and stability ("persist") windows. If PGOOD is not reached → timeout → shutdown + log.]
Left: dependencies define who can enable the next domain and how PGOOD is aggregated. Right: blanking, timeout, and stability windows turn noisy rails into deterministic PGOOD/RESET behavior.

H2-8 · PMBus Digital Power: What to Monitor, What to Log, and How to Make It Actionable

PMBus is valuable here because it standardizes observability and evidence. The goal is a power “black box”: layered telemetry, graded alerts (warn vs fault), and logs that explain what happened and why.

Monitoring layers (Input → Bus → Branch)
  • Input: VIN / IIN to capture supply events and attach stress.
  • Bus: VBUS / IBUS to correlate dips with PGOOD/RESET consequences.
  • Branch: IBRANCH / TEMP to identify the fault domain and repeated stress cycles.
Alerts: warn vs fault (avoid noise-driven trips)

WARN

Trend or margin loss. Record and notify, but do not destabilize the node.

FAULT

Requires action: isolate a domain, assert RESET, or enter a safe shutdown state.

Persistence

Use time-based persistence so brief spikes do not create false faults.

Context

Use different rules for bring-up vs steady state to reduce mis-triggers.

Minimum event log set (enough to replay failures)

Events: power-on start/done, brownout or VBUS dip, OCP, OTP, PGOOD drop, RESET assert, and retry-count changes.

Make it actionable: a simple triage loop
  • Start from the consequence: find PGOOD drop / RESET assert timestamps.
  • Check the system cause: did VBUS dip or did input/bus status change in the same window?
  • Drill into the domain: which branch current/temperature rose first, and did retry budget escalate?
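The triage loop can be mechanized over a time-ordered event log: start from the RESET consequence and search backwards for the first cause-class event in the same window. A sketch where the event names and the 50 ms correlation window are illustrative assumptions:

```python
# Sketch: walk a time-ordered event log backwards from a RESET assert and
# report the earliest plausible cause in the same window. Event names and
# the correlation window are illustrative assumptions.

def first_cause(events, window_ms=50):
    """events: list of (timestamp_ms, name) sorted by time. Find the last
    RESET_ASSERT, then return the earliest cause-class event that occurred
    within window_ms before it, or None if nothing correlates."""
    causes = {"VBUS_DIP", "OCP", "OTP", "PGOOD_DROP"}
    resets = [t for t, n in events if n == "RESET_ASSERT"]
    if not resets:
        return None
    t_reset = resets[-1]
    hits = [(t, n) for t, n in events
            if n in causes and t_reset - window_ms <= t <= t_reset]
    return min(hits) if hits else None
```

Taking the earliest correlated event matters: a VBUS dip that precedes a PGOOD drop is the root cause, while the PGOOD drop is only the consequence.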
Figure F8 — Telemetry map: point → sensor → PMBus class → alert → log entry
[Diagram: telemetry map dividing input (VIN/IIN), bus (VBUS/IBUS), and branch (IBRANCH/TEMP) measurements and mapping each to PMBus register classes (VOUT/IOUT/TEMP/STATUS), then to WARN/FAULT alerts with persistence, then to log entries with event, snapshot, state, and count. Bus telemetry links to the PGOOD/RESET policy: use VBUS dips to explain resets and shutdowns.]
A useful power “black box” is built from layers (input/bus/branch), graded alerts (warn vs fault), and event logs with snapshots and counters. Keep labels short and consistent: VOUT/IOUT/TEMP/STATUS → ALERT → LOG.

H2-9 · Fault Policy Design: Coordination, Retry Budgets, Graceful Degradation

A robust front-end is defined by policy, not by parts. The goal is predictable behavior under stress: isolate where possible, cut fast when required, and stop infinite retry loops while preserving evidence.

Protection vs availability (when to cut vs when to degrade)

Immediate cut (hard safety)

Thermal runaway risk, uncontrolled stress window, reverse-current risk, or unstable system states.

Graceful degradation

Non-critical branch faults can be isolated while keeping critical rails and logging alive.

Severity levels (S0–S3) mapped to actions
  • S0 Info: record only (no action).
  • S1 Warning: notify + record (avoid destabilizing the node).
  • S2 Recoverable fault: isolate and/or retry under a defined budget.
  • S3 Critical fault: immediate cut or latched shutdown, with explicit clear conditions.
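The S0–S3 mapping can be frozen into a small table so every fault path resolves to the same action set. The action labels below are illustrative names, not a real device API:

```python
# Sketch: severity levels mapped to fixed action sets, mirroring the
# S0-S3 list above. Action labels are illustrative, not a device API.

SEVERITY_ACTIONS = {
    "S0": {"log"},                                   # record only
    "S1": {"log", "notify"},                         # warn without destabilizing
    "S2": {"log", "isolate", "retry_with_budget"},   # recoverable fault
    "S3": {"log", "cut_or_latch", "require_clear"},  # critical fault
}

def actions_for(severity):
    """Resolve a severity level to its deterministic action set."""
    return SEVERITY_ACTIONS[severity]
```

Note that every level includes "log": evidence is mandatory regardless of severity, which is the coordination principle restated as data.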
Retry budget (the anti-oscillation mechanism)
  • Retry count: limit automatic restarts per fault type and per time window.
  • Cooldown time: enforce cooling/settling between retries to avoid heat accumulation and chatter.
  • Escalation: repeated faults within a short window must step up severity (prevents reset storms).
  • Manual intervention: budget exhaustion becomes a latched event requiring explicit recovery conditions.
Critical vs non-critical rails (policy differs)

Non-critical faults should prefer isolation and continued operation of the minimum evidence path.

Critical faults should prefer deterministic reset/shutdown, because continued operation is unsafe or non-deterministic.

Coordination principles (without management-architecture details)
  • Detection is local: the domain that sees the fault must flag it and freeze context.
  • Decision is unified: one policy point decides isolate/retry/reset/shutdown to avoid “protections fighting.”
  • Evidence is mandatory: pre/post snapshots plus retry counters must be logged for every action.
Figure F9 — Policy decision tree: fault type → severity → action (with required log fields)
[Diagram: decision tree mapping fault types (input event, bus instability, branch fault, thermal fault, policy fault with budget exhausted) through a severity gate (S0/S1, S2, S3, escalate) to action leaves — log, isolate + retry + cooldown, cut/latch, shutdown — each carrying the required log fields TS · TYPE · SEV · SNAP · STATE · COUNT.]
The decision tree forces consistency: every fault becomes a severity, every severity maps to a deterministic action set, and every action writes the same minimum evidence fields (TS/TYPE/SEV/SNAP/STATE/COUNT).

H2-10 · Validation & Production Checklist: Proving It’s Done

“Done” requires evidence. Validation must cover worst-case hot-plug stress, redundancy transitions, input events, and fault injection—each with explicit pass criteria and captured waveforms plus logs.

R&D validation (stress the real failure modes)
  • Hot-plug stress: maximum load capacitance, minimum re-plug interval, repeated cycles.
  • SOA margin: worst-case stress windows (voltage drop, current limit, thermal rise).
  • Redundancy switching: switchover behavior, reverse-current prevention, alarm debouncing.
  • Input events: brownout dips and surge spikes with expected policy behavior.
Fault injection (policy must match reality)
  • OCP / short: isolate vs shutdown decisions and retry budget behavior.
  • OTP: cooldown rules, escalation on repeats, and “stop storm” behavior.
  • PGOOD drop / RESET assert: timing windows and log triggers must be consistent.
Production tests (fast, stable, traceable)

Threshold sanity

Verify alert/fault triggers without relying on long test times.

Logging integrity

Write/read-back checks: events include snapshot + counters.

Sequencing consistency

Bring-up timing windows remain consistent across repeated power cycles.

Evidence bundle

Waveform capture + log bundle mapped to a matrix of cases and criteria.

Required deliverables

Validation matrix (case × pass criteria × evidence), waveform bundle, log bundle, and policy versioning for traceability.

Figure F10 — Validation matrix (3×4): case × criteria × evidence
[Diagram: 3×4 validation matrix with rows PLUG / BROWNOUT / FAULT and columns WAVEFORM / TEMP / LOG / RECOVERY; cells T1–T12 hold check/cross placeholders (✔ pass criteria met, ✖ investigate) and map to detailed cases and evidence bundles.]
The matrix enforces completeness: every case must define pass criteria and attach evidence (waveforms, temperature observations, logs, and recovery behavior). The check/cross marks are placeholders and can be replaced by a full table later.

H2-11 · Field troubleshooting: symptoms → measurements → root cause → fix

The fastest way to win in the field is to treat power events as time-ordered evidence: (1) capture the first failing waveform/log, (2) identify the stage that created it, (3) change one knob, and (4) re-run the same stimulus until the outcome is repeatable.

Triage goal in 15 minutes
Locate the failing stage → extract the first trigger → pick the correct knob → verify with repeatability.
Focus stays inside: ORing / hot-swap / branch protection / sequencing / telemetry & logs.
1) Start from symptoms, but lock onto the “first observable”
  • Intermittent resets → first observable: RESET/PGOOD edge time and which rail dropped first.
  • Power-on fails → first observable: VBUS never reaches its target, or reaches it and then trips on a timer.
  • Load brownouts (traffic burst / fan spin / cold start) → first observable: IIN step vs VBUS dip.
  • Alarm storm → first observable: retry counter, fault type, and debounce window.
  • Unexpected heating → first observable: Vdrop across the pass FET + time spent in the linear region.
2) Minimum measurement set (works even with limited access)
  • VBUS (after ORing / before hot-swap) and VOUT (after hot-swap): identify which stage collapses.
  • IIN (shunt/IMON) and FAULT/ALERT: decide “real overcurrent” vs “policy / debounce”.
  • PGOOD + RESET + EN sequence: decide “sequence dependency” vs “front-end trip”.
  • Retries / latch state + timestamps: decide “one-off transient” vs “infinite oscillation”.
Capture tip: a 4-channel scope is enough — map channels to VBUS, VOUT, IIN (or IMON), and RESET/PGOOD. Logs then explain “why”; waveforms prove “when”.
3) Symptom → stage mapping (use this to avoid wrong fixes)
Each row: symptom — most likely stage. Check first · most common misread · typical fix knobs.
  • Reset happens with VBUS “mostly OK” — Sequencing / PGOOD logic. Check first: which rail drops first; PGOOD debounce; timeout. Common misread: chasing inrush while the failure is dependency order. Fix knobs: PGOOD blanking, timeout, dependency graph, safe shutdown policy.
  • VBUS chatters between A/B feeds — ORing / ideal-diode control. Check first: Irev / switchover event; hysteresis; gate stability. Common misread: assuming a “bad PSU” when it’s controller chatter. Fix knobs: ORing hysteresis, reverse-current threshold, event debounce.
  • VOUT ramps then trips repeatedly — Hot-swap timers / SOA. Check first: fault timer window vs VOUT ramp; VDS stress. Common misread: raising the current limit (which worsens SOA). Fix knobs: dv/dt, inrush limit, fault blanking, SOA tuning.
  • Only one load group dies; others stay up — Branch eFuse / switch policy. Check first: latch vs hiccup; thermal cooldown; retry budget. Common misread: using a global reset as a “hammer”. Fix knobs: retry mode, grouping, per-rail policy, selective shutdown.
  • Alarm storm with no visible droop — Telemetry thresholds / filtering. Check first: status bits; warn vs fault; moving-average / peak capture. Common misread: treating noise as faults (threshold too tight). Fix knobs: threshold margining, alert debounce, log trigger logic.
4) Close the loop: reproduce → change one knob → verify
  • Reproduce the same stimulus (plug cycle, brownout dip, load step, redundancy switchover).
  • Pick one knob tied to the failing stage (dv/dt, blanking, retry mode, PGOOD debounce, ORing hysteresis).
  • Verify by repeatability: 20–50 cycles with consistent waveforms + consistent log classification.
  • Freeze the fix as a policy + verification artifact (parameter set + pass criteria + captured evidence).
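The repeatability criterion can be made mechanical: N cycles must agree on both a waveform metric and the log classification before the fix is frozen. A sketch with an illustrative 5% tolerance on a single ramp-time metric:

```python
# Sketch: declare a fix "repeatable" only when all cycles agree on both
# the waveform metric and the log classification. The fractional
# tolerance is an illustrative assumption.

def repeatable(cycles, tol=0.05):
    """cycles: list of (ramp_time_s, log_class) from repeated runs of the
    same stimulus. Pass when every log class matches and every ramp time
    lies within tol (fractional) of the mean."""
    if len(cycles) < 2:
        return False  # a single run proves nothing about repeatability
    times = [t for t, _ in cycles]
    classes = {c for _, c in cycles}
    mean = sum(times) / len(times)
    return len(classes) == 1 and all(abs(t - mean) <= tol * mean for t in times)
```

A run of 20-50 cycles through this check, with the parameter set and captured evidence attached, is exactly the "policy + verification artifact" the last bullet asks for.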
Figure F11 — Troubleshooting flow: symptom → stage → measurement → fix
[Diagram: evidence-first workflow — symptom (intermittent reset, power-on fails, alarm storm, heating/smell) → identify stage (sequencing, hot-swap, ORing/eFuse, linear stress) → first probes (EN/PGOOD/RESET and which rail first, VBUS/VOUT/IIN, timer window, Irev/retries/temp, Vdrop + duration) → fix knob (PGOOD debounce/timeout, dv/dt/blanking, SOA tuning, retry budget, hysteresis, reduce linear time, shorten ramp). Rule of thumb: waveforms prove timing; telemetry explains classification. Change one knob, then re-run the same stimulus until repeatable.]

H2-12 · BOM / IC selection checklist (criteria-based, with example P/Ns)

Part numbers are only useful when attached to pass/fail criteria. This checklist builds a selection “contract” per block: requirements → protection behavior → observability → validation evidence.

How to use this section
  • Write targets first (voltage, current, capacitance, fault policy, logging needs).
  • Pick a control IC only after deciding the fault policy (latch / hiccup / retry budget).
  • Ensure telemetry/logs can answer: what happened, when, and how many times.
A) Hot-swap controller (front-end) — selection criteria
  • Input domain: -48V (negative return path) vs +48V (positive bus), and required transient headroom.
  • SOA management: power limiting / foldback / timer behavior that protects the pass MOSFET under long ramps.
  • Programmable knobs: inrush limit, dv/dt, current limit, fault blanking, retry vs latch-off.
  • Observability: IMON/VMON, fault cause, peak capture, and (ideally) bus interface for logs.
  • Integration fit: external sense resistor range, gate drive strength, UV/OV thresholds.
Example part numbers (hot-swap front-end)
  • Negative (-48V) hot-swap: TI LM5067 (negative hot-swap/inrush controller)
  • Negative (-48V) hot-swap: ADI LTC4252 (negative hot-swap controller)
  • Positive (48V class) hot-swap: TI TPS2490 / TPS2491 (hot-swap controller family)
  • Hot-swap + PMBus telemetry/log-friendly: TI LM5066 / LM5066I (hot-swap + monitoring via PMBus/SMBus)
  • Hot-swap + PMBus telemetry: ADI ADM1276 / ADM1278 (hot-swap controllers with PMBus monitoring)
  • Hot-swap + PMBus power monitor: ADI LTC4286 (hot-swap controller with PMBus monitoring)

Tip: if field evidence and fleet observability matter, prefer parts with PMBus/SMBus fault reporting over “analog-only” designs.

B) eFuse / high-side switch (branch protection) — selection criteria
  • Fault response mode: latch-off vs hiccup vs auto-retry (and a bounded retry budget).
  • Selectivity: per-branch isolation (critical vs non-critical loads) to avoid “one fault drops all”.
  • Thermal realism: RON and thermal shutdown behavior under airflow variability.
  • Diagnostics: current monitor output, fault flag, and readable cause classification.
  • Coordination: ensure branch policy does not fight front-end hot-swap policy.
Example part numbers (eFuse / branch protection)
  • 60V eFuse (low-medium current): TI TPS2660 (industrial eFuse, reverse polarity protection)
  • 60V eFuse (higher current): TI TPS2663 (power limiting eFuse family)
  • 60V eFuse (smaller loads): TI TPS2662 (compact eFuse for lighter branches)
  • Secondary rails (post-conversion) eFuse option: TI TPS25985 (stackable high-current eFuse for lower-voltage rails)

Branch rule: critical rails should degrade gracefully (bounded retries + clear alarm). Non-critical rails can latch off to protect overall uptime.
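A bounded retry budget can be sketched as a small firmware-side policy; the retry count, cooldown, and escalation rule below are assumptions to illustrate the structure, not values from any specific eFuse.

```python
# Sketch: bounded retry ("hiccup budget") policy for a branch eFuse,
# supervised from firmware. All thresholds/counts are assumptions.

from dataclasses import dataclass

@dataclass
class RetryPolicy:
    max_retries: int = 3          # bounded retry budget
    cooldown_s: float = 1.0       # wait between retry attempts
    escalate_latch: bool = True   # latch off + alarm once budget is spent

def on_fault(policy: RetryPolicy, retry_count: int) -> tuple[str, int]:
    """Return (action, new_retry_count) for one fault event."""
    if retry_count < policy.max_retries:
        # Benign-transient path: retry after cooldown and count the attempt,
        # so repeated events converge to a stable state instead of oscillating.
        return (f"retry_after_{policy.cooldown_s}s", retry_count + 1)
    # Budget exhausted: isolate the branch and raise a latched alarm.
    action = "latch_off_and_alarm" if policy.escalate_latch else "stay_off"
    return (action, retry_count)

policy = RetryPolicy()
count = 0
for _ in range(5):   # five consecutive fault events on one branch
    action, count = on_fault(policy, count)
    print(action, count)
```

The key property is convergence: after `max_retries` attempts the branch stops cycling and emits one latched, actionable alarm rather than an endless hiccup loop.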

C) ORing / ideal diode (redundant feeds) — selection criteria
  • Reverse current control: detect and stop back-feed quickly; optional Irev reporting.
  • Stability: avoid chatter during small feed voltage differences and fast load transients.
  • Loss & heat: MOSFET selection + gate control for low drop without oscillation.
  • Event debouncing: switchover should not create false “brownout/reset” events downstream.
Example part numbers (ideal diode / ORing controllers)
  • High-voltage ideal diode: ADI LTC4357 (ideal diode controller, external MOSFET)
  • Dual ideal-diode ORing: ADI LTC4355 (diode-OR controller for two supplies, external MOSFETs)
  • Dual controller / redundancy focus: ADI LTC4370 (dual ideal diode / ORing controller family)
  • Low-side ORing (negative systems): TI LM5051 (low-side OR-ing FET controller)
  • High-voltage ORing option: TI LM5050-1 (ideal diode controller family)
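The no-chatter requirement above reduces to hysteresis plus debounce on the feed-selection decision. The sketch below models that logic in the abstract (sampled feed voltages, hypothetical thresholds); real ORing controllers implement this in analog hardware, so treat it as a way to reason about the thresholds, not as firmware to ship.

```python
# Sketch: hysteresis + debounce on a dual-feed selection decision, to
# avoid chatter between inputs. Thresholds and cadence are assumptions.

HYST_V = 0.5       # a feed must win by this margin to trigger a switch [V]
DEBOUNCE_N = 3     # consecutive agreeing samples required before switching

def select_feed(samples, start="A"):
    """samples: list of (v_feed_a, v_feed_b). Returns (active_feed, switchovers)."""
    active, pending, streak, switchovers = start, None, 0, 0
    for va, vb in samples:
        better = "B" if vb > va + HYST_V else ("A" if va > vb + HYST_V else active)
        if better != active:
            streak = streak + 1 if better == pending else 1
            pending = better
            if streak >= DEBOUNCE_N:     # one clean switchover, not chatter
                active, pending, streak = better, None, 0
                switchovers += 1
        else:
            pending, streak = None, 0    # candidate lost the margin: reset
    return active, switchovers

# Noisy samples where feed B only briefly exceeds A: no switchover occurs.
noisy = [(48.0, 48.2), (48.0, 48.7), (48.0, 48.2), (48.0, 48.1)]
print(select_feed(noisy))   # stays on feed A, zero switchovers
```

A sustained advantage (B winning by more than HYST_V for DEBOUNCE_N samples) produces exactly one switchover event, which is the behavior the downstream alarm filtering should expect.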
D) Sequencer / reset supervisor — selection criteria
  • Dependency graph capacity: number of rails, AND/OR PGOOD logic, cascading.
  • Timeout discipline: separate “transient ignore” from “true fault cutoff”.
  • Safe shutdown: defined order for turn-off to protect ASIC/FPGA states.
  • Root-cause retention: store first-fault cause (do not overwrite it with cascading faults).
  • Factory usability: easy configuration, margining support, and predictable boot behavior.
Example part numbers (sequencing / monitoring)
  • Multi-rail sequencer + PMBus: TI UCD90120A (12-rail sequencer/monitor via PMBus/I²C)
  • Configurable supervisor/sequencer: ADI ADM1066 (Super Sequencer®, configurable monitoring/sequencing)
  • Compact programmable sequencer: ADI LTC2937 (power supply sequencer/supervisor with fault logging)
  • Simple rail sequencing: ADI LTC2924 (quad power supply sequencer)
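Two of the criteria above, the dependency graph and root-cause retention, can be sketched together. The rail names and gating rules below are hypothetical and not tied to any specific sequencer IC; the point is that cascaded PGOOD drops must never overwrite the first fault.

```python
# Sketch: rail dependency graph with first-fault retention.
# Rail names and dependencies are illustrative assumptions.

DEPENDS_ON = {              # rail -> PGOODs that must be asserted first
    "VCORE": ["V12"],
    "V1V8":  ["V12"],
    "DDR":   ["VCORE", "V1V8"],
}

class Sequencer:
    def __init__(self):
        self.pgood = {"V12": False, "VCORE": False, "V1V8": False, "DDR": False}
        self.first_fault = None   # retained; never overwritten by cascades

    def can_enable(self, rail):
        return all(self.pgood[d] for d in DEPENDS_ON.get(rail, []))

    def report_fault(self, rail, cause):
        # Retain only the FIRST trigger; later cascading PGOOD drops are
        # still recorded as state but must not replace the root cause.
        if self.first_fault is None:
            self.first_fault = (rail, cause)
        self.pgood[rail] = False

seq = Sequencer()
seq.pgood["V12"] = True
assert seq.can_enable("VCORE") and not seq.can_enable("DDR")
seq.report_fault("V12", "UV")             # the root cause
seq.report_fault("VCORE", "PGOOD_DROP")   # cascade; must not overwrite
print(seq.first_fault)                    # ('V12', 'UV')
```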
E) Telemetry/logging blocks — selection criteria
  • Coverage: Vin/Iin + Vbus/Ibus + critical branches (at least one thermal point).
  • Actionability: warn vs fault thresholds, filtering/averaging, peak capture.
  • Log integrity: first-fault capture, retry counter, and time ordering (timestamps if available).
  • Fleet operations: consistent status taxonomy so field data can be aggregated.
Example part numbers (PMBus/SMBus telemetry-friendly)
  • Hot-swap + PMBus telemetry: TI LM5066 / LM5066I
  • Hot-swap + PMBus monitoring: ADI ADM1276 / ADM1278
  • Sequencer + PMBus: TI UCD90120A
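Reading these devices over PMBus usually means decoding the LINEAR11 word format used by commands such as READ_VIN/READ_IIN. The decoder below follows the PMBus LINEAR11 definition (11-bit signed mantissa, 5-bit signed exponent in one 16-bit word); the raw word is a made-up example, and individual devices may use DIRECT format or device-specific coefficients instead, so verify the data format per command in the datasheet.

```python
# Sketch: decoding a PMBus LINEAR11 telemetry word.
# value = Y * 2**N, where Y is an 11-bit two's-complement mantissa
# (bits 10:0) and N a 5-bit two's-complement exponent (bits 15:11).

def linear11_to_real(word: int) -> float:
    y = word & 0x07FF
    if y & 0x0400:           # sign-extend the 11-bit mantissa
        y -= 0x0800
    n = (word >> 11) & 0x1F
    if n & 0x10:             # sign-extend the 5-bit exponent
        n -= 0x20
    return y * (2.0 ** n)

# Illustrative word: exponent -2 (0b11110), mantissa 192 -> 192 * 2**-2
raw = (0b11110 << 11) | 192
print(linear11_to_real(raw))   # 48.0
```

Getting this decode (and its per-command format) right matters for fleet operations: a consistent conversion layer is what lets VIN/IIN readings from different boards be aggregated and compared.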
Figure F12 — Criteria blocks + “target/priority” slots (selection worksheet)
BOM Checklist — Criteria First, Part Numbers Second (worksheet columns: Criterion / Target / Priority)
  • Hot-swap (front-end): SOA / power limiting · inrush + dv/dt knobs · fault timers / blanking · retry vs latch-off · monitor points (Vin/Iin)
  • eFuse / branch switch: mode (hiccup / latch) · retry budget + cooldown · IMON + fault cause · thermal realism (RON)
  • ORing / ideal diode: reverse current stop · no-chatter switchover · low loss + heat
  • Sequencing / RESET: dependency graph · timeout discipline · root-cause retention
  • Telemetry / logs: warn vs fault · peak + averaging · retry counters
Fill “Target” and “Priority” first. Only then shortlist candidate P/Ns and validate with the same stress cases.


FAQs (Telco Power & Sequencing)

Each answer starts with a one-line verdict, followed by concrete checks and actions (waveforms + logs) to keep troubleshooting fast and repeatable.

1. Why can a hot-swap MOSFET fail even when current limit never “looks high”?

Verdict: MOSFETs often die from VDS × ID × time (linear-region energy) rather than peak current.

  • Check VOUT ramp, IMON/IIN, and VDS (VIN−VOUT) during the slowest part of startup.
  • Identify the “SOA window”: high VDS while current is non-zero for too long.
  • Fix by reducing time in linear (dv/dt, inrush limit shaping, timer/blanking), not by raising current limit.
Related: H2-4 (Hot-swap deep dive).
2. How to choose dv/dt and inrush limit when the load capacitance is uncertain?

Verdict: Design against a worst-case “unknown C” and tune knobs using repeatable stress tests.

  • Start from constraints: allowable VBUS dip, connector hot-plug limits, pass-FET SOA margin.
  • Use dv/dt to control ramp duration and inrush limit to cap peak current; add fault blanking to ignore harmless transients.
  • Validate with repeated hot-plug at maximum assumed capacitance and worst temperature/line conditions.
Related: H2-4 (Hot-swap), H2-10 (Validation).
3. What’s the clean boundary between front-end hot-swap and branch eFuses?

Verdict: Hot-swap “forms the bus safely”; eFuses “isolate faulty loads selectively.”

  • Front-end hot-swap: inrush control, entry protection, safe ramp to a stable bus (VBUS/VOUT success).
  • Branch eFuses/switches: per-load OCP/OTP policy, grouping (critical vs non-critical), preventing one fault from dropping everything.
  • Avoid double-protection fights: do not let both stages run aggressive hiccup loops on the same event.
Related: H2-5 (eFuse strategy).
4. Latch-off vs hiccup vs retry — how to decide without hurting availability?

Verdict: Choose behavior by fault severity and define a bounded retry budget to prevent endless oscillation.

  • Hard short/over-temperature/reverse-current risk: prefer latch-off or limited retries with long cooldown.
  • Benign transients (plug noise, short dips): allow hiccup/retry, but cap count and add cooldown + escalation.
  • Differentiate critical vs non-critical rails: keep critical up when safe; isolate non-critical early.
Related: H2-5 (Protection modes), H2-9 (Fault policy).
5. Why does dual-feed ORing sometimes oscillate or chatter between inputs?

Verdict: Chatter happens when small feed deltas and fast load steps cross ORing thresholds without enough hysteresis/debounce.

  • Look for repeated switchover events aligned with VBUS ripple and load steps.
  • Check reverse-current sense thresholds and any control-loop stability around the ORing MOSFETs.
  • Fix with hysteresis, switchover debounce, and alarm filtering so “one clean switchover” does not trigger resets.
Related: H2-6 (ORing & redundancy).
6. How to prevent reverse current during brownouts or feed switchover?

Verdict: Reverse current control must stay effective during undervoltage events, when back-feed risk is highest.

  • Verify IREV behavior during brownout: does the ORing stage quickly block back-feed as VIN collapses?
  • Ensure switchover logic avoids “ping-pong” that briefly opens a reverse path.
  • Log switchover + brownout as explicit events so downstream resets can be correlated to the true cause.
Related: H2-6 (Reverse current & switchover).
7. What makes a sequencing scheme “fragile” and prone to intermittent boot failures?

Verdict: Fragile schemes have unclear dependencies and timeouts that misclassify transients as faults (or hide real ones).

  • Document the dependency graph: who gates EN, who asserts RESET, and which PGOODs are required.
  • Separate “startup transient ignore” from “sustained fault cutoff” with distinct windows and policies.
  • Preserve first-fault cause (do not overwrite it with cascading drops) to avoid false root causes.
Related: H2-7 (Sequencing & RESET).
8. How should PGOOD/RESET blanking be set to avoid false resets yet catch real faults?

Verdict: Blanking should cover known transient widths but remain shorter than “damage time” for real faults.

  • Measure worst-case droop/glitch width during hot-plug, ORing switchover, and load steps.
  • Set PGOOD debounce/blanking slightly above those benign transients, then enforce a hard timeout for sustained undervoltage.
  • Use two-tier reporting: WARN for short events, FAULT for sustained events, each with clear log fields.
Related: H2-7 (PGOOD/RESET timeouts).
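The two-tier policy above can be sketched as a simple classifier on measured dropout width. The window values below are placeholders; derive them from the worst-case benign transient widths actually captured during hot-plug, switchover, and load-step tests, plus margin.

```python
# Sketch: two-tier PGOOD dropout classification by pulse width.
# BLANK_MS and FAULT_MS are illustrative assumptions, not fixed values.

BLANK_MS = 2.0    # at/below this: known-benign glitch widths, ignored
FAULT_MS = 50.0   # above this: sustained undervoltage, hard fault

def classify_dropout(width_ms: float) -> str:
    """Classify one PGOOD low pulse by its measured width."""
    if width_ms <= BLANK_MS:
        return "IGNORE"   # inside the blanking window
    if width_ms <= FAULT_MS:
        return "WARN"     # logged + debounced, no reset asserted
    return "FAULT"        # latched with evidence, reset asserted

for w in (0.5, 5.0, 120.0):
    print(w, classify_dropout(w))
```

The design constraint is the gap between the tiers: BLANK_MS must cover every measured benign transient, while FAULT_MS must stay shorter than the time a real undervoltage can do damage downstream.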
9. Which telemetry points deliver the highest debugging value for the lowest BOM cost?

Verdict: The highest ROI set is the one that pins down “where it collapsed” and “why it tripped.”

  • Minimum trio: VBUS, IIN (or IMON), and PGOOD/RESET edge timing.
  • Next best: FAULT/ALERT cause classification and retry counters.
  • Prefer telemetry that can be logged and correlated (even without absolute timestamps, ordering still matters).
Related: H2-8 (Telemetry map), H2-11 (3-waveform triage).
10. How to design alarm thresholds so they don’t become a “false alarm storm”?

Verdict: Alarms must be policy-driven: separate WARN from FAULT, apply filtering, and cap retries.

  • Define WARN as noisy-but-informative (debounced); define FAULT as rare-and-actionable (latched with evidence).
  • Use averaging for slow drift, peak capture for spikes; avoid thresholds tighter than measurement noise.
  • Bind alarms to retry budget escalation so repeated events converge to a stable state, not oscillation.
Related: H2-8 (Thresholding), H2-9 (Policy).
11. What validation tests prove SOA margin and repeatable hot-plug behavior?

Verdict: Proof requires a stress matrix + captured waveforms + consistent log classification across repeats.

  • Run hot-plug at maximum assumed load capacitance, worst cable/temperature, and shortest re-plug interval.
  • Capture VOUT ramp, IIN pulse, and VDS stress window; verify no timer mis-trips and no thermal accumulation.
  • Record pass criteria per case (waveform shape, temperature rise, fault counters, recovery behavior).
Related: H2-10 (Validation checklist).
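The stress matrix above is just the cross product of the worst-case corners, and enumerating it explicitly keeps repeats consistent. The corner values below are assumptions; substitute the design's real limits.

```python
# Sketch: enumerating the hot-plug stress matrix as explicit test cases.
# Corner values are illustrative assumptions.

from itertools import product

cap_uF   = [470, 2200]    # min / max assumed load capacitance
temp_C   = [-5, 55]       # worst ambient corners
replug_s = [0.1, 5.0]     # shortest / relaxed re-plug interval
line_V   = [40, 48, 60]   # brownout / nominal / high-line corners

matrix = [
    {"C_uF": c, "T_C": t, "replug_s": r, "VIN": v,
     "pass_criteria": ["VOUT ramp monotonic", "IIN pulse within limit",
                       "no timer mis-trip", "no thermal accumulation"]}
    for c, t, r, v in product(cap_uF, temp_C, replug_s, line_V)
]
print(len(matrix), "cases")   # 2 * 2 * 2 * 3 = 24
```

Attaching the pass criteria to each case (rather than keeping them in a separate document) is what makes the repeats comparable across lab runs and operators.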
12. In the field, what’s the fastest path from symptom to root cause using logs + 3 waveforms?

Verdict: Use logs to pick the first trigger, then use three waveforms to assign the failing stage.

  • Read logs first: first-fault cause, retry count, and event order (brownout, switchover, OCP, OTP, PGOOD drop).
  • Capture VBUS, VOUT, and IIN (or swap one channel for PGOOD/RESET if logic timing is suspect).
  • Change one knob (dv/dt, blanking, threshold, retry budget) and re-run the same stimulus until repeatable.
Related: H2-11 (Troubleshooting loop).
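The "read logs first" step above amounts to ordering events and picking the earliest one that is a genuine fault rather than a cascade. The event names and log shape below are hypothetical; the only assumption carried over from the text is that relative ordering is reliable even without absolute timestamps.

```python
# Sketch: picking the first trigger from an ordered event log.
# Sources, event names, and sequence numbers are illustrative.

events = [  # (sequence_no, source, event)
    (101, "oring",   "SWITCHOVER"),
    (102, "hotswap", "VIN_UV_WARN"),
    (103, "efuse_2", "OCP_FAULT"),
    (104, "seq",     "PGOOD_DROP"),    # cascade, not the root cause
    (105, "seq",     "RESET_ASSERT"),
]

FAULTS = {"OCP_FAULT", "OTP_FAULT", "UV_FAULT"}  # hard faults, not warnings

# Earliest hard fault in sequence order = where to aim the three probes.
first_fault = next((e for e in sorted(events) if e[2] in FAULTS), None)
print(first_fault)   # (103, 'efuse_2', 'OCP_FAULT')
```

Here the earlier SWITCHOVER and VIN_UV_WARN entries are context, not triggers: the first hard fault points at the eFuse branch, which is where the VBUS/VOUT/IIN capture should be taken.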