PoE PD Controller: Classification, Isolated DC-DC & Event Logging
A PoE PD controller turns Ethernet power into predictable, field-reliable rails by managing detect/class, inrush/startup, MPS, and fault/logging.
This page focuses on PD-side design hooks for integrated isolated DC-DC (including sync rectification), so power-up stays stable, disconnects are avoided, and every field failure leaves a usable “black-box” record.
H2-1. Definition & Scope of a PoE PD Controller (with Isolated DC-DC + Sync Rectification)
A PoE PD controller is the power-admission brain of a Powered Device: it negotiates power (PD-side), performs safe power-up, keeps the port alive (MPS), and exposes protection + telemetry—so the isolated converter can start reliably and recover predictably.
- PD controller ≠ PHY/MAC. PHY/MAC moves Ethernet data; the PD controller controls PoE power entry and power-state behavior.
- “Power-on” is not the end. Inrush, MPS, retry policy, and fault recovery decide field reliability.
- Isolated DC-DC + SR raises the bar. Startup sequencing, fault policy, and telemetry must cooperate with the converter and the load.
- Signature & classification: presents the correct PD signatures so the port can grant power.
- Inrush / hot-swap admission: limits input surge, ramps the bulk capacitor, and prevents repeated brownout loops.
- MPS & disconnect behavior: maintains port validity during light-load and sleep modes, and handles safe removal.
- Protection & recovery policy: UVLO/OVP/OCP/OTP responses, latch vs hiccup vs backoff retry.
- Telemetry & event hooks: exposes PG/fault states, counters, and logs for service and forensics.
- Multiple isolated rails: an isolated main rail plus a housekeeping rail needs clean sequencing and PG/fault gating.
- Higher efficiency at low headroom: SR reduces secondary losses, but requires margin against ringing and false turn-on.
- Field reliability: deterministic retry/backoff plus event logging prevents “mystery resets” and speeds service triage.
This page stays strictly on the PD side: signature/classification, inrush/MPS/disconnect, isolated DC-DC coordination (including SR), and fault/event logging. PSE policy, magnetics/ESD/surge deep-dives, and TSN/PTP topics are handled on their dedicated pages.
H2-2. Power Path & Interfaces: From RJ45 to Isolated Rails
Treat the PD as a traceable chain. Every field failure should map to one block on the path (input admission, bulk energy storage, isolation conversion, or telemetry/control). This section defines that chain.
- Power path (energy): PoE input → hot-swap/inrush → bulk capacitor → primary switch → transformer → SR → isolated rails.
- Control/telemetry path (behavior): DET/CLS decisions, gate control, enable/PG/fault, and I²C/PMBus reporting.
- Diode bridge: simplest polarity handling, but it costs two diode drops of headroom and concentrates heat, so classification margins and startup robustness become more sensitive.
- Ideal bridge: reduces losses and improves thermal margin, but adds control behavior that must remain predictable across transients and fault recovery.
- Bulk is both a reservoir and a load: it stabilizes the converter input, but it is the main reason inrush exists.
- Placement is part of the control loop: distance increases loop inductance, distorts inrush shape, and can trigger protection or brownout loops.
- Size must match policy: bulk sizing must be consistent with inrush limit and retry/backoff strategy (large bulk + aggressive retry is a reboot amplifier).
- Isolated main rail: powers the payload. Its behavior dominates efficiency and thermal performance.
- Housekeeping rail: powers control and diagnostics so state and logs survive short droops and controlled shutdown.
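A first-order way to compare the two bridge options above is to compute the series drop and conduction loss at the operating current. The sketch below assumes a constant DC input current, a constant diode forward voltage, and two bridge elements conducting at a time. The operating point and the `Vf`/`Rds(on)` values are illustrative, not datasheet figures, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeResult:
    drop_v: float  # series drop across the two conducting elements (V)
    loss_w: float  # conduction loss in the bridge (W)


def diode_bridge(i_in_a: float, vf_v: float) -> BridgeResult:
    # Two diodes conduct in series: drop = 2*Vf, loss = 2*Vf*I.
    # Assumes Vf is constant over the operating current (first-order).
    return BridgeResult(drop_v=2 * vf_v, loss_w=2 * vf_v * i_in_a)


def ideal_bridge(i_in_a: float, rds_on_ohm: float) -> BridgeResult:
    # Two MOSFETs conduct in series: drop = 2*I*Rds(on), loss = 2*I^2*Rds(on).
    # Rds(on) rises with temperature, so use the hot value for worst case.
    return BridgeResult(drop_v=2 * i_in_a * rds_on_ohm,
                        loss_w=2 * i_in_a ** 2 * rds_on_ohm)


if __name__ == "__main__":
    i_in = 0.6  # A, illustrative operating point
    for name, r in (("diode", diode_bridge(i_in, vf_v=0.7)),
                    ("ideal", ideal_bridge(i_in, rds_on_ohm=0.1))):
        print(f"{name} bridge: drop {r.drop_v:.2f} V, loss {r.loss_w:.3f} W")
```

At this illustrative point the diode bridge costs roughly 1.3 V more headroom and more than ten times the loss of the ideal bridge.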
This section defines the PD-side power/control chain only. Detailed magnetics/ESD/surge layout rules and PSE allocation policy are linked elsewhere and not expanded here.
H2-3. IEEE 802.3 PD Handshake (Only What PD Designers Must Use)
This section keeps only the PD-side handshake logic that directly constrains hardware design: what the PD must present in each phase, and what “Class” actually constrains (power budget, inrush behavior, and MPS/maintain requirements).
- Handshake is phase-based: Detect → Class → Power-on → Maintain. Each phase has a PD “presentation” that is checked in a window.
- Class is a constraint bundle: it bounds the budget, influences inrush limits, and affects MPS behavior during light-load or sleep.
- Common pitfall: treating Class as “guaranteed usable power” ignores path losses and policy limits (cable, bridge, thermal, DC-DC efficiency, protection).
- Power budget constraint: caps average draw assumptions for system sizing and thermal planning.
- Inrush behavior constraint: limits how quickly bulk and the converter input can be energized without violating windows.
- MPS constraint: restricts “too-light” operating modes; maintain strategies must keep the port alive.
- Class is a handshake promise checked in defined windows; it is not a constant “always available” payload guarantee.
- Field headroom shrinks with temperature, cable length, connector aging, and conversion losses.
- Policy matters: inrush limiting and MPS compliance can force conservative behaviors that reduce usable payload power.
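The gap between a Class budget and usable payload power can be estimated by walking the PD-internal losses. The sketch assumes the per-class PD input power values commonly quoted for IEEE 802.3af/at/bt (verify against the edition you design to), a lumped front-end loss, a single worst-case converter efficiency, and a housekeeping rail fed from the same converter. `usable_payload_w` and its parameters are hypothetical names.

```python
# PD-side power at the PD input per class, as commonly quoted for
# IEEE 802.3af/at (Class 0-4) and 802.3bt (Class 5-8). Cable loss is
# already excluded from these figures; verify against your standard edition.
CLASS_PD_POWER_W = {0: 12.95, 1: 3.84, 2: 6.49, 3: 12.95, 4: 25.5,
                    5: 40.0, 6: 51.0, 7: 62.0, 8: 71.3}


def usable_payload_w(pd_class: int, *, front_end_loss_w: float,
                     dcdc_efficiency: float, housekeeping_w: float,
                     derating: float = 1.0) -> float:
    """Estimate payload power left after PD-internal losses.

    front_end_loss_w: bridge + hot-swap FET + sense resistor (lumped, assumed).
    dcdc_efficiency: isolated converter efficiency at the worst-case corner.
    housekeeping_w: housekeeping rail load, assumed fed from the same converter.
    derating: optional policy/thermal derating factor (1.0 = none).
    """
    if not 0.0 < dcdc_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    budget_w = CLASS_PD_POWER_W[pd_class] * derating
    converter_in_w = budget_w - front_end_loss_w
    payload_w = converter_in_w * dcdc_efficiency - housekeeping_w
    return max(payload_w, 0.0)


if __name__ == "__main__":
    # Illustrative corner: Class 4, 0.8 W front end, 88 % efficiency, 1 W housekeeping.
    print(round(usable_payload_w(4, front_end_loss_w=0.8,
                                 dcdc_efficiency=0.88, housekeeping_w=1.0), 2))
```

Even with these mild assumptions, roughly 19 % of the Class 4 budget never reaches the payload.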
This is a PD-side, minimal-use view of the handshake. It preserves phase semantics and constraints only. Full standard text and PSE allocation policy are not reproduced here.
H2-4. Detection & Classification Circuit Design (Accuracy, Tolerance, and Failure Modes)
Mis-detect and mis-classify rarely come from a single “bad part.” Robust PD design treats detection/classification as a three-layer error chain: tolerance stack, leakage/parasitics, and event-driven drift (plug, cold start, humidity, and post-ESD shifts).
- Layer 1 — tolerance stack: Rsig tolerance, reference/threshold spread, temperature drift.
- Layer 2 — leakage & parasitics: ESD clamp leakage, bridge behavior, PCB surface leakage, unintended parallel paths.
- Layer 3 — multi-event interaction: plug/unplug + cold start + humidity + post-ESD drift creating corner-only failures.
- Rsig is never alone: any parallel leakage path shifts the effective signature and narrows margin.
- Temperature creates direction: leakage typically increases with temperature; Rsig drift depends on its technology and coefficient.
- ESD structures are part of the circuit: clamp leakage (especially after stress) can change detect/class results without visible damage.
- Edge spikes: fast transients can “look like” over-current inside the window even when average current is correct.
- Plateau instability: a wobbling current plateau behaves like noise at the classifier’s measurement point.
- Cold-start coupling: undervoltage and startup sequencing can distort current shape and cause intermittent class results.
- Plug/unplug: contact bounce and transient paths change what the detector sees.
- Humidity/contamination: surface leakage adds a hidden parallel path that is absent on a clean bench.
- Post-ESD drift: the system may still “work” but margins shrink; detect/class failures become corner-triggered.
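Layers 1 and 2 can be checked together by computing the effective signature at its corners: Rsig at minimum tolerance in parallel with the worst-case leakage, and Rsig at maximum tolerance with no leakage. The sketch below uses the 23.75 kΩ to 26.25 kΩ PD signature window and models leakage as a linear resistance, which is a simplification (clamp leakage is nonlinear and rises with temperature and after ESD stress). The Rsig, tolerance, and leakage values are illustrative.

```python
import math

# Valid PD detection signature window (IEEE 802.3, PD side).
PD_SIG_MIN_OHM = 23_750.0
PD_SIG_MAX_OHM = 26_250.0


def effective_signature_ohm(rsig_ohm: float, r_leak_ohm: float = math.inf) -> float:
    """Rsig in parallel with a leakage path modeled as a linear resistance."""
    if math.isinf(r_leak_ohm):
        return rsig_ohm
    return rsig_ohm * r_leak_ohm / (rsig_ohm + r_leak_ohm)


def signature_corners(rsig_nom_ohm: float, tol: float,
                      r_leak_worst_ohm: float = math.inf):
    """Return (ok, r_low, r_high) for the tolerance + leakage corners."""
    # Low corner: Rsig at its minimum with the worst (lowest) leakage resistance.
    r_low = effective_signature_ohm(rsig_nom_ohm * (1 - tol), r_leak_worst_ohm)
    # High corner: Rsig at its maximum with no leakage.
    r_high = rsig_nom_ohm * (1 + tol)
    ok = PD_SIG_MIN_OHM <= r_low and r_high <= PD_SIG_MAX_OHM
    return ok, r_low, r_high


if __name__ == "__main__":
    for r_leak in (math.inf, 1e6, 500e3):
        ok, lo, hi = signature_corners(24_900.0, 0.01, r_leak)
        print(f"R_leak={r_leak:>9.0f} ohm  low={lo:8.0f}  high={hi:8.0f}  ok={ok}")
```

With a 1 % 24.9 kΩ resistor, a 1 MΩ leakage path still passes, while 500 kΩ (about 20 µA at a 10 V probe) pushes the low corner below 23.75 kΩ without any visibly damaged part.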
This section covers detection/classification accuracy and failure mechanisms at circuit level. Detailed IEC test setup and layout rules live in the protection/magnetics pages.
H2-5. Inrush, Hot-Swap, and Safe Startup Sequencing
Startup failures are usually not “random.” They are outcomes of three coupled constraints: inrush limiting, bulk energy, and the retry/backoff state machine. This section turns those constraints into design inputs, waveforms, and verification hooks.
- Plug-in reboot loop: INRUSH ↔ UVLO oscillation or timeout before Vbulk reaches the RUN threshold.
- Cold-start fails, warm-start OK: margin shrink from Rds(on), leakage, and reference drift affects inrush slope and thresholds.
- Bigger bulk makes it worse: longer inrush window + added drop/heat increases retry probability and storm risk.
- Starts once, then never again: thermal accumulation or lockout after repeated FAULT events.
- Constraint target: limit Iin peak/average and control dVbulk/dt to stay inside handshake/startup windows.
- Design intent: charge bulk fast enough to reach RUN, but not so aggressively that upstream checks or protection trip.
- Waveform requirement: a stable inrush plateau is safer than a spiky peak with the same average.
- Too small: Vbulk sags during load steps and falls into UVLO/brownout loops.
- Too large: longer charge time under Iin_limit increases timeout risk and heats the hot-swap path during retries.
- Matched sizing: size bulk so that Cbulk × Vbulk_OK ≤ Iin_limit × t_allowed (charge balance at the inrush limit), leaving margin for Vbulk to reach Vbulk_OK inside the window.
- UVLO hysteresis: insufficient hysteresis creates oscillation near thresholds (RUN↔FAULT).
- Retry/backoff: backoff prevents thermal accumulation and avoids “retry storms.”
- Brownout loop: load step pulls Vbulk below UVLO, causing repeated restarts unless policy breaks the loop.
- Waveforms: Vin, Iin, Vbulk, PG/Fault, retry counter (trigger on plug-in and on FAULT edges).
- Corner sequences: cold-start, long cable drop, repeated plug/unplug, load step at RUN entry.
- Pass criteria (placeholders): start_time ≤ X, Iin_peak ≤ X, retry_cnt ≤ X within Y minutes, Vbulk_min ≥ X.
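The charge balance behind the sizing bullets can be checked numerically: time to reach Vbulk_OK at the inrush limit, energy dumped into the hot-swap FET per attempt, and the largest Cbulk that still fits the allowed window. The sketch assumes constant-current charging from 0 V with the downstream converter held off until PG, and a constant Vin. All numbers, including the 50 ms window standing in for X, are illustrative; `inrush_check` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InrushResult:
    t_charge_s: float   # time to charge Cbulk to Vbulk_OK at Iin_limit
    e_fet_j: float      # energy dissipated in the hot-swap FET per attempt
    c_max_f: float      # largest Cbulk that still reaches Vbulk_OK in t_allowed
    fits_window: bool


def inrush_check(c_bulk_f: float, v_bulk_ok_v: float, v_in_v: float,
                 i_limit_a: float, t_allowed_s: float) -> InrushResult:
    """Constant-current charge of Cbulk from 0 V with the converter held off.

    Assumes no load current during inrush (load gated by PG) and constant Vin.
    """
    t_charge = c_bulk_f * v_bulk_ok_v / i_limit_a
    # E = integral of (Vin - Vc) * I dt from Vc = 0 to Vbulk_OK
    #   = C * (Vin * Vok - Vok**2 / 2); equals C * Vin**2 / 2 when Vok = Vin.
    e_fet = c_bulk_f * (v_in_v * v_bulk_ok_v - v_bulk_ok_v ** 2 / 2)
    c_max = i_limit_a * t_allowed_s / v_bulk_ok_v
    return InrushResult(t_charge, e_fet, c_max, t_charge <= t_allowed_s)


if __name__ == "__main__":
    # Illustrative: 47 uF, Vbulk_OK = 42 V, Vin = 48 V, 140 mA limit, 50 ms window.
    r = inrush_check(47e-6, 42.0, 48.0, 0.14, 0.050)
    print(f"t_charge={r.t_charge_s * 1e3:.1f} ms  E_fet={r.e_fet_j * 1e3:.1f} mJ  "
          f"C_max={r.c_max_f * 1e6:.0f} uF  fits={r.fits_window}")
```

Multiplying `e_fet_j` by the number of retries inside the FET's thermal time constant shows why a large bulk combined with aggressive retry heats the hot-swap path.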
H2-6. Maintain Power Signature (MPS) & Disconnect Behavior
A PD can “look alive” locally yet still get disconnected upstream when the maintain criteria are violated. The most common triggers are light-load gaps, burst/skip modes, and deep sleep states that drop the effective load below the maintain window.
- Low-load operation: average power may be adequate, but the effective load can fall below the maintain window.
- Burst/skip mode gaps: long “no-load” gaps create maintain holes even if bursts are large.
- Deep sleep: main rails collapse while only housekeeping remains; the port can be seen as inactive.
- AUX-only operation: housekeeping may keep logs running while main rails are off; maintain can still be lost.
- Periodic wake: long intervals create maintain holes; short intervals reduce savings and raise temperature.
- Mixed bursts: communications or sensing bursts can create irregular load profiles that violate maintain windows unexpectedly.
- Record: load profile (or duty-equivalent), longest gap, disconnect timestamp, and last keep-alive action.
- Stress: deepest sleep, lowest ambient, highest cable drop, and repeated wake/sleep cycling.
- Pass criteria (placeholders): no disconnect within Y hours; max_gap ≤ X; keep-alive energy ≤ X.
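The "longest gap" record can be computed from a sampled input-current trace. The sketch treats a maintain pulse as a run of samples at or above a hold current that lasts at least a minimum time, then reports the longest gap between such pulses. The example values (10 mA, 75 ms, 250 ms) are the Type 1/2 figures commonly cited; treat them as assumptions and substitute the limits for your PD type and standard edition. `MpsParams` and `mps_report` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpsParams:
    i_hold_a: float          # current that counts as "maintain" (assumed)
    t_mps_min_s: float       # minimum duration of a qualifying pulse (assumed)
    t_dropout_max_s: float   # longest allowed gap between qualifying pulses (assumed)


def mps_report(samples_a, dt_s: float, p: MpsParams):
    """Return (ok, max_gap_s) for an input-current trace sampled every dt_s.

    Gaps include the leading and trailing parts of the trace (conservative).
    """
    min_pulse = round(p.t_mps_min_s / dt_s)   # durations in whole samples
    max_gap = round(p.t_dropout_max_s / dt_s)
    runs, start = [], None
    for i, i_a in enumerate(samples_a):
        if i_a >= p.i_hold_a and start is None:
            start = i
        elif i_a < p.i_hold_a and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(samples_a)))
    # bounds = [0, s1, e1, s2, e2, ..., N]; gaps are (0->s1), (e1->s2), ..., (ek->N)
    bounds = [0]
    for s, e in runs:
        if e - s >= min_pulse:                # only long-enough pulses count
            bounds += [s, e]
    bounds.append(len(samples_a))
    gaps = [bounds[k + 1] - bounds[k] for k in range(0, len(bounds), 2)]
    worst = max(gaps)
    return worst <= max_gap, worst * dt_s


if __name__ == "__main__":
    p = MpsParams(i_hold_a=0.010, t_mps_min_s=0.075, t_dropout_max_s=0.250)
    # 1 ms samples: 80 ms at 20 mA, then 240 ms idle at 2 mA, repeated twice.
    trace = ([0.020] * 80 + [0.002] * 240) * 2
    print(mps_report(trace, 0.001, p))
```

Note that a trace with frequent but short bursts (below the minimum pulse width) fails even if its average current is well above the hold level.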
H2-7. Isolated DC-DC Integration: Primary Control, Feedback, and Sync Rectification (SR)
SR is not “just higher efficiency.” It introduces timing and recovery risks that couple secondary current, gate timing, feedback behavior, and PD enable/PG policy. This section turns SR into a controllable interface with measurable hooks.
- PD controller: inrush/hot-swap, MPS, enable sequencing, telemetry, and fault/event capture.
- Primary control: switch drive and current limiting; startup dynamics determine whether SR ever sees valid current.
- SR stage: gate timing must avoid reverse current and reduce diode conduction without creating overlap.
- Feedback: optocoupler feedback vs primary-side regulation (PSR) affects light-load stability and how PG/FAULT should be defined.
- Self-driven SR: simpler, but timing shifts with load, transformer parasitics, and temperature. Risks reverse current or excessive diode conduction.
- Controller-driven SR: controllable and efficient, but dead-time must be tuned. Too short → overlap/reverse current. Too long → diode loss and heating.
- Scope points: SR gate, secondary current (or proxy), Vout ripple, SR MOS temperature, PG/FAULT edges.
- Corner cases: light load + burst/skip, cold start, lowest Vin, load steps near sleep transitions.
- Pass criteria (placeholders): reverse-current duration ≤ X, diode conduction share ≤ X, PG toggles ≤ X per Y minutes.
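Two of the SR measurements above reduce to simple arithmetic. The sketch estimates body-diode loss from the two dead-time intervals per cycle, and computes the worst SR turn-off margin from scope captures (SR gate off vs primary switch on, as in continuous-conduction flyback operation). It assumes constant SR current and Vf across each dead-time interval; the function names and capture values are hypothetical.

```python
def diode_conduction_loss_w(vf_v: float, i_sr_a: float,
                            t_dt_on_s: float, t_dt_off_s: float,
                            f_sw_hz: float) -> float:
    """Body-diode loss from the two dead-time intervals per switching cycle.

    First-order: SR current (i_sr_a) and Vf assumed constant in each interval.
    """
    return vf_v * i_sr_a * (t_dt_on_s + t_dt_off_s) * f_sw_hz


def min_turnoff_margin_ns(captures) -> float:
    """captures: iterable of (t_sr_gate_off_ns, t_primary_on_ns) per corner.

    Margin > 0 means the SR gate was off before the primary switch turned on;
    margin <= 0 indicates overlap (shoot-through / reverse-current risk).
    """
    return min(t_pri_on - t_sr_off for t_sr_off, t_pri_on in captures)


if __name__ == "__main__":
    # Illustrative: Vf = 0.8 V, 5 A SR current, 50 ns + 50 ns dead time, 250 kHz.
    print(f"{diode_conduction_loss_w(0.8, 5.0, 50e-9, 50e-9, 250e3):.3f} W")
    # Hypothetical scope captures (ns) at three load/line corners.
    corners = [(100.0, 160.0), (120.0, 150.0), (135.0, 150.0)]
    print("min margin:", min_turnoff_margin_ns(corners), "ns")
```

The two functions pull in opposite directions: longer dead time raises diode loss linearly, while shorter dead time eats turn-off margin, which is why the margin should be taken as the minimum across corners rather than at nominal.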
H2-8. Protection & Fault Handling (PD + DC-DC Combined)
Fault handling should be expressed as cause → symptom → quick check → action. Combined PD + DC-DC designs must prevent retry storms, preserve first-fault evidence, and choose latch-off, hiccup, or auto-retry based on safety and recoverability.
- Input: UVLO / OVP / inrush-OCP → startup loops, drop/reset, or no power-on.
- Power stage: primary OCP, SR timing faults, feedback faults → current limit, ripple, or overheat.
- Thermal: OTP in PD, primary, or SR → periodic dropouts and heat accumulation under retries.
- Output: short/overload/OVP → collapse, hiccup cycling, or latch-off depending on policy.
- Handshake-related: classification/detect faults → never enters RUN or powers briefly then stops.
- Latch first-fault: fault_code_first, timestamp, and the state at fault entry.
- Snapshot: Vin_min, Vbulk_min, Vout_min, temp_max, retry_cnt (placeholders).
- Apply policy: latch-off / hiccup / retry+backoff based on fault class and thermal margin.
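The three steps above (latch first fault, snapshot, apply policy) can be sketched as a small fault manager. The fault-to-action map, backoff base and cap, and retry budget are hypothetical placeholders; the point is the ordering (evidence before action) and the capped exponential backoff that stops retry storms.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    LATCH = "latch"
    HICCUP = "hiccup"
    RETRY_BACKOFF = "retry+backoff"


# Hypothetical fault-class -> action map; tune per safety and recoverability.
POLICY = {
    "OVP": Action.LATCH,           # hard fault: do not auto-restart
    "SHORT": Action.HICCUP,        # often transient: bounded hiccup
    "OCP": Action.HICCUP,
    "OTP": Action.RETRY_BACKOFF,   # let heat dissipate before retrying
    "UVLO": Action.RETRY_BACKOFF,
}


@dataclass
class FaultManager:
    base_backoff_s: float = 0.5
    max_backoff_s: float = 30.0
    max_retries: int = 5
    retry_count: int = 0
    first_fault: Optional[Tuple[str, float, dict]] = None

    def on_fault(self, code: str, t_s: float, snapshot: dict) -> Tuple[Action, float]:
        # Evidence first: latch the first fault before any policy decision.
        if self.first_fault is None:
            self.first_fault = (code, t_s, dict(snapshot))
        action = POLICY.get(code, Action.LATCH)  # unknown faults fail safe
        if action is Action.LATCH or self.retry_count >= self.max_retries:
            return Action.LATCH, 0.0
        # Capped exponential backoff so repeated faults cannot form a retry storm.
        delay = min(self.base_backoff_s * 2 ** self.retry_count, self.max_backoff_s)
        self.retry_count += 1
        return action, delay

    def on_stable_run(self) -> None:
        # A sustained RUN period resets the retry budget; first_fault is kept.
        self.retry_count = 0


if __name__ == "__main__":
    fm = FaultManager()
    for _ in range(7):
        print(fm.on_fault("SHORT", 1.0, {"vbulk_min_v": 30.2}))
    print(fm.first_fault)
```

In this sketch a persistent short produces five hiccups with growing delays and then latches, while the first-fault record still points at the original cause.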
H2-9. Telemetry, Event Logging & “Black-Box” for Field Diagnostics
Field issues are rarely reproducible on demand. A minimal black-box turns “intermittent” into an ordered event timeline with consistent counter definitions, brownout-safe retention, and fast readout.
- Power class / negotiation result
- Start attempts and state transitions (IDLE → INRUSH → RUN → FAULT → RETRY)
- Faults with codes (UVLO/OVP/OCP/OTP/SHORT/FB/SR)
- Recovery action taken (LATCH / HICCUP / RETRY+backoff)
- Bind a denominator: per start_attempts (startup issues) or per uptime_minutes (run-time issues). Avoid mixing.
- Declare the time window: since boot vs rolling window vs last N events. Keep one default for dashboards.
- Define state coverage: whether INRUSH/RETRY are included. “RUN-only” metrics often hide startup storms.
- Prevent endpoint mixing: per-port vs per-device totals must not be merged without labels.
- Readout: I²C or PMBus for structured fields; an INT pin for immediate “new event” notification.
- Storage: use a ring buffer with schema_version and a commit marker to avoid partial records after brownout.
- Brownout rule: log first-fault + snapshot before retry policy decisions. Preserve evidence before cycling.
- Retention policy: keep last N events and last M faults, plus last boot record (placeholders).
- EventID (enum)
- Time (ms since boot; optional UTC if available)
- Vin (X), Iin (X), Temp (X)
- State (IDLE/INRUSH/RUN/FAULT/RETRY)
- FaultCode (UVLO/OVP/OCP/OTP/SHORT/FB/SR)
- RetryCount (X) and PolicyAction (LATCH/HICCUP/RETRY+backoff)
- CommitFlag (valid/partial) and SchemaVersion
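A brownout-safe record can be approximated with fixed-size slots, a CRC over the body, and a commit byte written last, so a torn write is detected as partial instead of being read back as data. The sketch simulates NVM as a `bytearray`. The field layout, units (mV, mA, 0.1 °C), enum numbering, and the `0xA5` commit value are assumptions, not a defined format.

```python
import struct
import zlib
from enum import IntEnum

SCHEMA_VERSION = 1
COMMIT_MARK = 0xA5  # hypothetical "record complete" marker value


class State(IntEnum):
    IDLE = 0
    INRUSH = 1
    RUN = 2
    FAULT = 3
    RETRY = 4


class FaultCode(IntEnum):
    NONE = 0
    UVLO = 1
    OVP = 2
    OCP = 3
    OTP = 4
    SHORT = 5
    FB = 6
    SR = 7


# schema, event_id, time_ms, vin_mV, iin_mA, temp_0.1degC, state, fault, retries, action
_PAYLOAD = struct.Struct("<BHIHHhBBBB")
_SLOT = _PAYLOAD.size + 4 + 1  # payload + CRC32 + commit byte


class EventRing:
    """Fixed-slot ring buffer; a slot counts only if commit mark and CRC agree."""

    def __init__(self, n_slots: int):
        self.n = n_slots
        self.mem = bytearray(b"\xff" * (_SLOT * n_slots))  # erased-flash pattern
        self.next = 0

    def append(self, event_id, time_ms, vin_mv, iin_ma, temp_dc,
               state, fault, retries, action) -> None:
        payload = _PAYLOAD.pack(SCHEMA_VERSION, event_id, time_ms, vin_mv,
                                iin_ma, temp_dc, state, fault, retries, action)
        off = (self.next % self.n) * _SLOT
        self.mem[off + _SLOT - 1] = 0x00  # 1) invalidate the slot first
        body = payload + struct.pack("<I", zlib.crc32(payload))
        self.mem[off:off + _SLOT - 1] = body  # 2) body + CRC
        self.mem[off + _SLOT - 1] = COMMIT_MARK  # 3) commit marker written last
        self.next += 1

    def records(self):
        """Valid records in slot order (add a sequence field for chronology)."""
        out = []
        for k in range(self.n):
            slot = self.mem[k * _SLOT:(k + 1) * _SLOT]
            body, crc_b, commit = slot[:_PAYLOAD.size], slot[_PAYLOAD.size:-1], slot[-1]
            if commit == COMMIT_MARK and struct.unpack("<I", crc_b)[0] == zlib.crc32(body):
                out.append(_PAYLOAD.unpack(body))
        return out


if __name__ == "__main__":
    ring = EventRing(n_slots=8)
    # action=3 stands for RETRY+backoff in this hypothetical numbering.
    ring.append(3, 1234, 47_100, 350, 412, State.FAULT, FaultCode.OTP, 2, 3)
    print(ring.records())
```

A brownout between steps 2 and 3 leaves the slot without its commit mark, so it reads back as partial rather than as a corrupted fault record.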
H2-10. Verification Plan: Bench Bring-Up → System → Production Gates
Verification should run as gates. Each gate has must-test items, captured evidence, and pass criteria (X). Failures should route to the correct chapter (handshake/inrush/SR/protection/logging) without expanding scope.
- Purpose: what this gate proves (electrical truth, system robustness, or production screen).
- Must-test: 3–5 checks that cover the dominant risks.
- Evidence: waveforms + event logs aligned by time markers.
- Pass: thresholds (X) and stability windows (X).
- Handshake capture: detection/class/power-on windows (PD view).
- Inrush: Iin_peak ≤ X, Vbulk ramp time ≤ X, no oscillation.
- Startup stability: PG stable for ≥ X time after RUN entry.
- SR timing margin: dead-time X validated against secondary current across load sweep.
- Policy sanity: inject one controlled fault; confirm fault_code + action + first-fault log.
- Cable corners: validate startup and retry behavior under worst-case line drop (X).
- Temperature corners: cold/hot stability; temp_peak ≤ X and no runaway retries.
- Sleep/light-load: maintain stable operation without unintended drops; log continuity preserved.
- Load transients: Vout droop ≤ X and no false FAULT/PG toggling.
- Recovery behavior: backoff enforced; retry_cnt ≤ X per Y minutes (placeholders).
- Signature/class sanity: class result within expected window (X).
- Fast startup: start_time ≤ X and PG asserted within X.
- Log readout: schema_version + last N events + first-fault fields readable.
- Param limits: key thresholds set to allowed ranges (X).
- Controlled fault (optional): short pulse or overload pulse; action matches policy.
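Gate pass/fail can be made mechanical so that a missing measurement fails the gate instead of being silently skipped. The sketch below is a hypothetical evaluator; the criterion names and limits stand in for the X placeholders above.

```python
import operator
from dataclasses import dataclass

_OPS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Criterion:
    name: str     # measurement key, e.g. "iin_peak_a"
    op: str       # "<=" or ">="
    limit: float  # stands in for an X placeholder once the limit is agreed


def evaluate_gate(criteria, measured: dict):
    """Return (passed, failures); a missing measurement is a failure, not a skip."""
    failures = []
    for c in criteria:
        if c.name not in measured:
            failures.append(f"{c.name}: not measured")
        elif not _OPS[c.op](measured[c.name], c.limit):
            failures.append(f"{c.name}={measured[c.name]} fails {c.op} {c.limit}")
    return not failures, failures


if __name__ == "__main__":
    # Hypothetical Gate 1 limits and bench results (values are illustrative).
    gate1 = [Criterion("iin_peak_a", "<=", 0.20),
             Criterion("vbulk_ramp_ms", "<=", 30.0),
             Criterion("pg_stable_ms", ">=", 500.0),
             Criterion("sr_deadtime_margin_ns", ">=", 10.0)]
    print(evaluate_gate(gate1, {"iin_peak_a": 0.16, "vbulk_ramp_ms": 14.1,
                                "pg_stable_ms": 800.0}))
```

Here the gate fails only because the SR margin was never captured, which is exactly the gap a manual review tends to miss.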
H2-11. Applications
This page targets Powered Device (PD) designs that benefit from an isolated power stage, optional synchronous rectification (SR), and fault/event records for field diagnostics. Use the buckets below to map system needs to an implementable PD power architecture.
- Why isolated: long cable + chassis coupling + remote mounting often demand isolation to reduce ground-loop and noise injection.
- Why SR matters: sealed enclosures and compact mechanicals make efficiency-to-thermal headroom a first-order constraint.
- Why logging: repeated brownouts / restart loops / thermal peaks are costly without a minimal evidence record.
- Integrated PD + flyback (size-first): TI TPS23758
- PD interface + isolated controller: TI TPS23754-1, TI TPS23753A
- Integrated PD + switching regulator: ADI LTC4269-1, ADI LTC4267
- High-power PD interface (external PWM): TI TPS2373 + TI LM51551-Q1 (PWM)
- Secondary SR controller (if external SR): TI UCC24610
- When integrated isolated DC-DC is preferred: tight BOM, faster bring-up, fewer topology gotchas, and repeatable production limits.
- Thermal/space trigger: if airflow is weak and heatsinking is limited, SR + spread-spectrum options become decision drivers.
- Service trigger: field failures often present as “reboot loops”; logging should preserve first-fault cause across brownouts.
- High-power PD interface + external PWM: TI TPS2373 + TI LM51551-Q1
- High-power PD interface (external pass FET option): TI TPS2379
- PoE-PD interface (high power family): onsemi NCP1096
- Integrated PD + switching regulator: ADI LTC4269-2 (forward + SR-friendly use cases)
- Secondary SR controller: TI UCC24610
- Primary risk: deep-sleep or burst-mode loads can fall below Maintain Power Signature (MPS) and trigger disconnect.
- Design posture: choose a PD + isolated controller that supports predictable startup/backoff and clean PG/fault signaling.
- Logging posture: record power-on attempts, fault codes, and thermal peaks to avoid “no-fault-found” returns.
- PD + isolated controller: TI TPS23754-1, TI TPS23753A
- Integrated PD + flyback regulator: ADI LTC4267 (802.3af class range), ADI LTC4269-1
- Secondary SR controller (if external SR): TI UCC24610
- When SR matters: multi-rail systems (isolated main + housekeeping) often run warm; SR recovers margin without larger heatsinks.
- When logging matters: intermittent shorts or overloads can create retry storms; recovery policy must preserve evidence and rate-limit retries.
- PD interface + isolated converter controller: TI TPS23754-1
- PD interface + external PWM (scales power): TI TPS2373 + TI LM51551-Q1
- PoE-PD interface family: onsemi NCP1096
- Secondary SR controller: TI UCC24610
- Isolation is non-negotiable: remote nodes, ground-loop exposure, or mixed chassis/field grounds.
- Thermal density is high: limited airflow + compact enclosure + power above X W (placeholder).
- Service cost is high: “reboot loop” returns require black-box evidence (first-fault + retry counters + thermal peaks).
Buckets map system intent to PD architecture triggers. Labels show the minimum decision axes that affect stability, thermals, and serviceability.
H2-12. IC Selection Logic + Engineering Checklist (Design → Bring-Up → Production)
The goal is a repeatable selection path: must-have gates → nice-to-have → risk hooks, ending with a checklist that forces each risk hook to have a measurable pass criterion (X placeholders).
- Required input power: target class/type and worst-case load (X W) plus startup margin.
- Isolation requirement: yes/no, number of isolated rails (Y), and any functional partition constraints.
- SR requirement: required efficiency/thermal headroom (ΔT = X) and enclosure airflow assumptions.
- Telemetry/logging requirement: minimum record fields + retention across brownout (N events / T hours).
- Adapter ORing: external adapter coexistence (priority policy) and enable/PG handshakes.
- Category 1 — Integrated PD + flyback controller (fastest bring-up): TI TPS23758
- Category 2 — PD + isolated converter controller (flexible power stage): TI TPS23754-1, TI TPS23753A
- Category 3 — High-power PD interface + external PWM (scales power / tuning): TI TPS2373 + TI LM51551-Q1
- Category 4 — PoE-PD interface family (system-defined DC-DC): onsemi NCP1096
- Category 5 — Integrated PD + switching regulator options: ADI LTC4269-1, ADI LTC4269-2, ADI LTC4267
- Programmable inrush profile: reduces startup surprises with large bulk caps.
- Retry/backoff controls: prevents “storm” behavior under intermittent faults.
- PG/fault semantics: unambiguous enable/disable of the downstream converter.
- Sync / spread-spectrum options: help manage switching interference without a dedicated EMI deep-dive.
- Low-load behavior: avoids accidental MPS drop during sleep or burst-mode loads.
- Log readout hooks: INT pin + I²C/PMBus (or simple GPIO codes) for field triage.
- Retry storm risk: short/overload + fast auto-retry can cause repeated inrush stress → require backoff (X) and first-fault logging.
- Light-load MPS drop: deep sleep or burst mode can look like “dead PD” → require MPS retention test across modes (X).
- SR timing margin: ringing/noise can collapse dead-time → require SR timing capture and minimum margin (X ns).
- Thermal headroom illusion: enclosure airflow assumptions often fail → require ΔT measurement at worst case (X °C).
- Counter definition mismatch: window/denominator confusion ruins diagnostics → freeze metric definitions in firmware (schema version).
Keep targets explicit; use placeholders (X) until measured.
| Spec | Target (X) | Why it matters | How to verify |
|---|---|---|---|
| Inrush limit profile | Iin ≤ X | Prevents PSE trips and startup oscillation | Capture Vin/Iin/Vbulk during startup |
| UVLO hysteresis | ΔV ≥ X | Avoids brownout loops and repeated restarts | Sweep input & observe state transitions |
| MPS retention at light load | No disconnect for X | Prevents “system runs then suddenly dies” | Sleep/load profile tests + logs |
| SR dead-time margin | DT ≥ X ns | Avoids cross-conduction and thermal spikes | Scope SR gate vs secondary current |
| Fault recovery policy | Backoff = X | Prevents repeated stress and false RMA | Fault injection + verify logs |
The list below is intentionally pragmatic: pick a category first, then validate class/power, topology, and thermal margin in Gate tests.
| Block | Example IC P/N | Use when… |
|---|---|---|
| Integrated PD + flyback | TI TPS23758 | Size/BOM reduction is the priority |
| PD + isolated controller | TI TPS23754-1, TI TPS23753A | Isolated stage needs control flexibility |
| High-power PD interface | TI TPS2373, TI TPS2379 | Power scaling / external pass device is needed |
| PWM controller (flyback) | TI LM51551-Q1 | Used with PD interface that supports advanced startup |
| Secondary SR controller | TI UCC24610 | External SR is preferred or required |
| Integrated PD + switching regulator | ADI LTC4269-1, ADI LTC4269-2, ADI LTC4267 | Complete front-end + regulator simplifies design |
| PoE-PD interface family | onsemi NCP1096 | System-defined DC-DC; PD handshake + inrush managed |
- Inrush policy: set I-limit profile and bulk capacitance target (Cbulk = X) with waveforms as evidence.
- Brownout policy: UVLO thresholds + hysteresis (ΔV = X) to stop oscillation loops.
- SR policy: required dead-time margin (DT ≥ X ns) and capture plan (gate + current).
- Fault taxonomy: map fault → action (latch/hiccup/retry) and set backoff (X).
- Forensics schema: freeze counter definitions + event record fields + versioning.
- Handshake capture: detect/class/power-on traces and signatures (windows = X).
- Startup stability: verify no repeated restart loops across cable and temperature corners.
- MPS retention: validate deep sleep/light-load profiles do not drop power for X minutes/hours.
- SR margin: verify dead-time at worst ringing condition; confirm no cross-conduction spikes.
- Fault injection: short/overload/OTP tests must leave usable first-fault logs.
- Class sanity: verify correct class behavior under controlled input (pass = X).
- Startup time: time-to-regulation within X under nominal load.
- Log readout: verify event counters readable and schema version matches firmware.
- Thermal spot-check: ΔT within X at controlled airflow condition.
- Controlled fault: one scripted fault must produce consistent action and evidence record.
The flow forces every “risk hook” to appear as a verification item with a measurable pass criterion.
H2-13. FAQs (PD classification / inrush / MPS / SR / logging)
Each answer uses a fixed, data-oriented format: Likely cause → Quick check → Fix → Pass criteria (X). Scope is strictly PD-side behavior (no PSE policy).
Class is correct, but the PD keeps rebooting under load — inrush retry storm or DC-DC hiccup?
Likely cause: Brownout loop (UVLO hysteresis too small) or converter hiccup colliding with PD retry/backoff.
Quick check: Capture Vin/Iin/Vbulk and log state transitions around reboot; verify reboot aligns to UVLO or OCP/OTP event.
Fix: Increase UVLO hysteresis (ΔV=X), tune inrush profile, and enforce retry backoff (Tbackoff=X) before re-enable.
Pass criteria: No reboot for X minutes at Y% load step; retry count ≤ X per hour; Vbulk never drops below X V.
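The brownout loop in this answer can be reproduced with a static model. The sketch below is hypothetical: it ignores Cbulk, converter soft-start, and cable inductance, and treats the source as an open-circuit voltage behind a resistance. The numbers are illustrative. It shows the rule of thumb that UVLO hysteresis must exceed the loaded I·R drop when Vin only just reaches the turn-on threshold.

```python
class UvloComparator:
    """UVLO with hysteresis: enable at >= v_on, disable below v_off."""

    def __init__(self, v_on: float, v_off: float):
        if v_on <= v_off:
            raise ValueError("v_on must be above v_off")
        self.v_on, self.v_off, self.enabled = v_on, v_off, False

    def update(self, v: float) -> bool:
        if not self.enabled and v >= self.v_on:
            self.enabled = True
        elif self.enabled and v < self.v_off:
            self.enabled = False
        return self.enabled


def count_toggles(v_open: float, r_src_ohm: float, i_load_a: float,
                  v_on: float, v_off: float, steps: int = 20) -> int:
    """Static brownout-loop model: enabling the load drops Vbulk by I*R."""
    uvlo, enabled, toggles = UvloComparator(v_on, v_off), False, 0
    for _ in range(steps):
        v = v_open - (i_load_a * r_src_ohm if enabled else 0.0)
        new_state = uvlo.update(v)
        toggles += new_state != enabled
        enabled = new_state
    return toggles


if __name__ == "__main__":
    # Illustrative: 40 V open-circuit, 10 ohm source path, 0.5 A load (5 V drop).
    print(count_toggles(40.0, 10.0, 0.5, v_on=39.0, v_off=37.0))  # 2 V hysteresis -> 20
    print(count_toggles(40.0, 10.0, 0.5, v_on=39.0, v_off=34.0))  # 5 V hysteresis -> 1
```

With 2 V of hysteresis against a 5 V loaded drop, every step toggles RUN and FAULT; with 5 V the PD enables once and stays up.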
Passes on bench, fails on long cable — marginal MPS or startup timing?
Likely cause: Long-cable resistance lowers Vin and raises source impedance, so load steps produce larger input dips; marginal MPS current during light-load phases then triggers disconnect.
Quick check: Compare long-cable vs bench: measure Vin ripple, record MPS-related events, and run a controlled sleep/load profile.
Fix: Increase hold-up (Cbulk=X) or adjust load shaping (bleeder/pulsed load) to keep MPS, and avoid startup windows with insufficient margin.
Pass criteria: No disconnect for X hours on Z meters cable; Vin_min ≥ X V during worst transient; MPS events = 0.
Classification is unstable between boots — leakage/tolerance drift or input bridge drops?
Likely cause: Leakage paths (ESD/clamp/PCB contamination) or DET/CLS bias interacts with bridge drop and tolerance stack.
Quick check: Measure signature/class currents across X cold boots, varying humidity/temperature; compare with bridge vs ideal-bridge path if available.
Fix: Tighten leakage budget (cleanliness/guarding), verify Rsig tolerance stack, and re-bias DET/CLS network to meet window with margin.
Pass criteria: Class result identical across X boots and Y conditions; class current stays within ±X% window.
SR improves efficiency but creates random faults — dead-time too tight or ringing false turn-on?
Likely cause: SR dead-time margin too small, or drain ringing couples into SR gate sense and causes false turn-on.
Quick check: Scope SR gate and secondary current; look for overlap and false pulses during ringing at worst load/line.
Fix: Increase dead-time (DT=X ns), add damping/snubber, and tighten SR gate routing/return to reduce false triggering.
Pass criteria: SR overlap time = 0; minimum DT margin ≥ X ns across Y corners; random fault rate ≤ X / day.
Light-load sleep saves power but gets disconnected — MPS missing due to burst mode?
Likely cause: Burst mode or deep sleep lets input current fall below the MPS hold level for longer than the allowed dropout time during idle windows; a healthy average current does not prevent this.
Quick check: Log MPS-related events and measure input current profile over the sleep duty-cycle; confirm disconnect aligns with low-load segment.
Fix: Add a controlled maintenance load (bleeder or pulsed loading) and schedule periodic wake if needed; make sure AUX-only operation does not drop input current into MPS loss.
Pass criteria: No disconnect for X hours with sleep duty-cycle Y%; input maintenance pulses occur every X ms (or bleeder ≤ X mW).
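The bleeder-vs-pulsed trade-off in the fix and pass criteria can be quantified. The sketch compares a continuous bleeder with a periodic pulse of the same current and checks the pulse schedule against a minimum pulse width and a maximum dropout gap. It assumes the maintenance current is drawn on top of a near-zero sleep load. The 48 V input, the 15 mA current (1.5× an assumed 10 mA hold level), and the 75 ms / 250 ms timing limits are illustrative assumptions, not values that apply to every PD type.

```python
def bleeder_power_w(v_in_v: float, i_hold_a: float) -> float:
    """Continuous bleeder sized to carry the full maintenance current by itself."""
    return v_in_v * i_hold_a


def pulsed_mps_power_w(v_in_v: float, i_pulse_a: float,
                       t_on_ms: int, period_ms: int) -> float:
    """Average input power of a periodic maintenance pulse."""
    if not 0 < t_on_ms <= period_ms:
        raise ValueError("need 0 < t_on <= period")
    return v_in_v * i_pulse_a * t_on_ms / period_ms


def pulse_schedule_ok(t_on_ms: int, period_ms: int,
                      t_mps_min_ms: int, t_dropout_max_ms: int) -> bool:
    """Pulse long enough, and the off-gap short enough, to keep MPS."""
    return t_on_ms >= t_mps_min_ms and (period_ms - t_on_ms) <= t_dropout_max_ms


if __name__ == "__main__":
    v_in, i_maint = 48.0, 0.015  # assumed 1.5x margin over a 10 mA hold level
    print(f"bleeder: {bleeder_power_w(v_in, i_maint):.3f} W")
    print(f"pulsed : {pulsed_mps_power_w(v_in, i_maint, 80, 300):.3f} W, "
          f"schedule ok = {pulse_schedule_ok(80, 300, 75, 250)}")
```

Timing is kept in integer milliseconds so the dropout comparison is exact; the pulsed scheme uses about a quarter of the bleeder's power, at the cost of a timer that must keep running in the deepest sleep state.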
Fault pin toggles but logs show nothing — brownout wiped log or interrupt not latched?
Likely cause: Log write happens too late and gets lost during brownout, or INT/fault edge is not latched/qualified.
Quick check: Force a repeatable fault and verify: (1) log commit time, (2) hold-up time, (3) INT latch behavior and debounce.
Fix: Log “first-fault” immediately, add brownout-safe commit (NVM/retention RAM), and latch INT until host acknowledges.
Pass criteria: First-fault record retained after X ms brownout; INT remains asserted ≥ X ms or until ACK; missing logs = 0 / X trials.
Thermal looks fine, still trips OTP — hotspot near SR/rectifier or sensor placement mismatch?
Likely cause: Local hotspot (SR MOSFET/rectifier/transformer) exceeds OTP while the measured “average” point stays cool.
Quick check: Compare OTP trigger time with an IR scan (or thermocouples) at SR devices, transformer, and controller sensor location.
Fix: Improve hotspot spreading (copper/thermal vias), adjust SR timing to reduce loss, and align sensor placement to worst-case hotspot.
Pass criteria: Hotspot ΔT ≤ X °C at ambient Y °C; no OTP in X minutes at Y% load.
Inrush looks within limit, yet startup fails — UVLO hysteresis or bulk cap ESR/ESL?
Likely cause: Vbulk droops due to ESR/ESL and triggers UVLO; or UVLO hysteresis is too small and creates oscillation.
Quick check: Capture Vbulk droop and UVLO threshold crossing at startup; compare capacitor ESR/ESL and placement-induced inductance.
Fix: Increase hysteresis (ΔV=X), reduce loop inductance (placement/return), and choose cap ESR/ESL to meet hold-up window.
Pass criteria: Startup succeeds across X cold boots; Vbulk_min ≥ X V during enable; UVLO transitions ≤ X per boot.
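A first-order budget for the droop in this answer adds the ESR step, the inductive spike from ESL plus loop inductance, and the capacitive sag until the source or controller catches up. The sketch is hypothetical and conservative: it adds the terms as if their peaks coincided. All component values and the 10 µs interval are illustrative.

```python
def bulk_droop_v(i_step_a: float, esr_ohm: float, esl_h: float,
                 di_dt_a_per_s: float, c_bulk_f: float, t_hold_s: float) -> float:
    """First-order Vbulk droop when the converter steps its input current.

    Sum of the resistive step (I*ESR), the inductive spike (L*di/dt, with L the
    capacitor ESL plus loop inductance), and the capacitive sag (I*t/C) over
    t_hold_s. Adding the terms treats their peaks as coincident (worst case).
    """
    return (i_step_a * esr_ohm
            + esl_h * di_dt_a_per_s
            + i_step_a * t_hold_s / c_bulk_f)


def uvlo_margin_v(v_bulk_start_v: float, droop_v: float, v_uvlo_off_v: float) -> float:
    """Positive margin means Vbulk_min stays above the UVLO falling threshold."""
    return (v_bulk_start_v - droop_v) - v_uvlo_off_v


if __name__ == "__main__":
    # Illustrative: 1 A step, 0.1 ohm ESR, 20 nH total, 1 A/us, 47 uF, 10 us interval.
    d = bulk_droop_v(1.0, 0.1, 20e-9, 1e6, 47e-6, 10e-6)
    print(f"droop = {d:.3f} V, margin = {uvlo_margin_v(42.0, d, 36.0):.2f} V")
```

If the estimated margin is small compared with the measured droop, the loop-inductance term is usually the first place to look, since it is set by placement rather than by the capacitor datasheet.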
Output short causes long recovery time — latch-off policy too strict or retry backoff too long?
Likely cause: Latch-off requires manual intervention, or conservative backoff delays re-power after transient shorts.
Quick check: Inject a controlled short for X ms; log fault code, action (latch/hiccup/retry), and backoff timing.
Fix: Use hiccup for transient faults, latch for hard faults; tune backoff and limit restart attempts per window.
Pass criteria: Recovery time ≤ X s for transient short; restart attempts ≤ X per Y minutes; first-fault cause recorded.
Field units fail after an ESD event but still power — leakage shift breaks signature?
Likely cause: Post-ESD leakage increases and shifts signature/class behavior or corrupts DET/CLS bias, making handshake fragile.
Quick check: Before/after ESD: measure leakage on the input path and compare signature/class stability; correlate with new boot failures.
Fix: Improve protection/leakage budget (layout return paths, clamp selection, cleanliness), and add margin to signature/class networks.
Pass criteria: After IEC ESD stress, class/signature stable within ±X%; boot success ≥ X% across Y boots; leakage ≤ X µA.
Different PD controller vendor in the same footprint changes behavior — DET/CLS pin bias mismatch?
Likely cause: Pin-level analog expectations differ (DET/CLS bias, thresholds, leakage, timing), even if footprint matches.
Quick check: Compare DET/CLS node voltages/currents during detect/class across vendors; validate required external component ranges.
Fix: Recalculate external networks (R/C values) for the new device’s biasing model; re-validate tolerance stack and leakage.
Pass criteria: Detect/class traces overlap within ±X%; class result stable across X boots; no false detect events in X trials.
Event counters disagree across firmware versions — window/denominator definition drift?
Likely cause: Counter definitions changed (window length, reset rules, denominator scope), producing incompatible metrics.
Quick check: Verify schema version and document: window=T, denominator=scope, reset=rule; replay identical test and compare raw events.
Fix: Freeze metric definitions, add schema version to logs, and publish a migration map between versions.
Pass criteria: Same test yields counters within ±X% across versions; schema version present in 100% of records; window = X s fixed.
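One way to freeze definitions is to make the window, denominator, and reset rule explicit fields and to log a fingerprint of them next to schema_version. The sketch below is a hypothetical illustration using a CRC32 over a canonical string. A replayed raw event list then produces comparable rates across firmware versions only when the fingerprints match.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterDef:
    name: str
    window_s: int       # rolling window length
    denominator: str    # "start_attempts" or "uptime_minutes"
    reset_rule: str     # e.g. "rolling", "since_boot"

    def fingerprint(self) -> int:
        # Canonical text -> CRC32; store next to schema_version in each record so
        # two firmware builds can show they computed the same metric.
        canon = f"{self.name}|{self.window_s}|{self.denominator}|{self.reset_rule}"
        return zlib.crc32(canon.encode("ascii"))


def rolling_rate(event_times_s, now_s: float, window_s: float,
                 denominator: float) -> float:
    """Events in (now - window, now] divided by an explicit denominator."""
    n = sum(1 for t in event_times_s if now_s - window_s < t <= now_s)
    return n / denominator if denominator else float("nan")


if __name__ == "__main__":
    v1 = CounterDef("retry_rate", 3600, "start_attempts", "rolling")
    v2 = CounterDef("retry_rate", 600, "start_attempts", "rolling")
    print(v1.fingerprint() == v2.fingerprint())  # window changed: expected to differ
    print(rolling_rate([10, 100, 3500, 3700], now_s=3700, window_s=3600, denominator=4))
```

A zero denominator returns NaN rather than 0, so an empty window cannot masquerade as a healthy "zero failures" rate.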