Power & Thermal: Per-Gbps Power and Cooling for PHY/SerDes
Power & Thermal is about turning “per-Gbps” claims into an engineering budget: define measurement scope, model steady vs transient load, and map power into junction temperature with board-realistic thermal paths. The goal is to lock guardrails (cooling, derating, and pass criteria) so links remain stable across worst-case ambient, airflow, and aging.
Scope & non-overlap guardrails
This page treats power and thermal as an engineering contract: define comparable power metrics, map them to junction temperature, and choose package/cooling that preserves throughput margin across worst-case traffic, environment, and aging.
This page covers
- Per-Gbps power definitions and comparable measurement/estimation records (mW/Gbps).
- Thermal network usage (θJA/θJC/ψJT/ψJB) and how PCB/copper/airflow shift results.
- Workload & link states (training/idle/low-power/full-rate sustained) as power waveforms and worst-case selection.
- Package/cooling/layout choices and derating rules to maintain junction margin.
- Production & field temperature rise validation, thermal failure signatures, and logging fields.
This page does NOT cover
- Electrical eye/equalization mechanisms (only referenced as power inputs). Go to: Electrical Layer / CDR & Equalization
- ESD/surge/EMI compliance design (only referenced for leakage/heat side-effects). Go to: PHY Robustness
- Protocol versions/training state machines (only referenced for state transitions and wake energy). Go to: Protocol Compatibility
- PTP/jitter/clocking theory (only referenced for PLL/clock-tree power). Go to: Timing & Synchronization
Mandatory inputs to declare (minimum reproducible record)
Link parameters
- Link type (cable/backplane/multi-lane serial)
- Line rate (Gbps) + active lane count
- Traffic profile (duty/burst/idle ratio)
Feature & power rails
- Feature flags (DFE/FEC/retime/EEE on/off)
- Supply rails + voltage (VDDx) and measurement method
- State label (training/locked/idle/LPM/recovery)
Thermal environment
- Ambient (Ta) + airflow availability
- Package + PCB class (copper/vias/spreading area)
- Pass criteria placeholders (Tj margin X°C, case Y°C, τ)
Guardrail: any power number without this record is treated as non-comparable.
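The guardrail above can be enforced mechanically. The following is a minimal sketch, assuming a hypothetical `PowerRecord` container whose field names are illustrative (they mirror the inputs listed above and are not taken from any tool or standard); a record with any empty field is flagged as non-comparable.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PowerRecord:
    """Hypothetical container for the minimum reproducible record."""
    link_type: Optional[str] = None           # cable / backplane / multi-lane serial
    line_rate_gbps: Optional[float] = None
    active_lanes: Optional[int] = None
    traffic_profile: Optional[str] = None     # duty / burst / idle ratio
    feature_flags: Optional[dict] = None      # e.g. {"DFE": True, "FEC": False}
    rails_v: Optional[dict] = None            # rail name -> voltage
    measurement_method: Optional[str] = None  # shunt / PMBus / bench
    state_label: Optional[str] = None         # training / locked / idle / LPM / recovery
    ambient_c: Optional[float] = None
    airflow: Optional[str] = None
    package: Optional[str] = None
    pcb_class: Optional[str] = None


def missing_fields(record):
    """Return the names of all record fields that were left undeclared."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def is_comparable(record):
    """A power number is comparable only when nothing is missing."""
    return not missing_fields(record)


partial = PowerRecord(line_rate_gbps=25.0, active_lanes=4, state_label="locked")
print("comparable:", is_comparable(partial))
print("missing:", missing_fields(partial))
```

Listing the missing fields, rather than returning only a boolean, makes it easier to ask a vendor or lab for exactly the declarations that are absent.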
Diagram — Boundary map (solar-orbit view)
Use this boundary map to keep the page lean: sibling topics are referenced only as inputs or links, never expanded here.
Power taxonomy for PHY/SerDes
A comparable power discussion needs a shared dictionary: component blocks, link-state labels, PVT corners, and a minimum record that makes numbers portable across boards, labs, and vendors.
Component blocks (what consumes power)
- Static / bias: baseline that shifts with temperature and process leakage.
- Clock / PLL / CDR: reference conditioning and lock maintenance cost.
- Tx driver: swing/pre-emphasis settings (treated as power knobs here).
- Rx EQ (CTLE/DFE/ADC): often the dominant block when adaptive features are enabled.
- DSP / FEC: scales with sustained throughput and activity duty.
- Mgmt I/O: MDIO/I²C/GPIO/EEPROM access (small but persistent).
Record (minimum)
active lanes, line rate, feature flags, and which block is expected to dominate (Rx EQ vs DSP) for the target workload.
State power (what changes over time)
- Training / re-lock: can create short spikes that drive supply and hotspot stress.
- Locked / active: sustained operating point for thermal steady-state.
- Idle: must be defined (payload idle vs electrical idle) to be comparable.
- Low-power modes: reduce average power but introduce wake energy and recovery heat.
State log schema
- state_name, entry_condition
- steady_power, peak_power, duration
- event_count (per hour/day)
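As an illustration of how the state log can be reduced to budget numbers, the sketch below computes a time-weighted average power and the overall peak from log entries. It assumes `duration` is the length of one occurrence and `event_count` is the number of occurrences in the observation window; the class name, units, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StateLogEntry:
    state_name: str
    entry_condition: str
    steady_power_w: float
    peak_power_w: float
    duration_s: float   # assumed: duration of ONE occurrence of the state
    event_count: int    # assumed: occurrences inside the observation window


def summarize(entries):
    """Time-weighted average power and worst peak over all logged states."""
    total_time_s = sum(e.duration_s * e.event_count for e in entries)
    energy_j = sum(e.steady_power_w * e.duration_s * e.event_count for e in entries)
    return {
        "avg_power_w": energy_j / total_time_s,
        "peak_power_w": max(e.peak_power_w for e in entries),
    }


# Illustrative one-hour log (made-up values).
log = [
    StateLogEntry("locked", "training done", 2.0, 2.2, 3000.0, 1),
    StateLogEntry("idle", "no payload", 0.5, 0.6, 500.0, 1),
    StateLogEntry("training", "re-lock request", 3.0, 4.0, 1.0, 100),
]
summary = summarize(log)
print(summary)
```

The average feeds the sustained thermal budget, while the peak feeds supply-transient and hotspot checks; keeping both in one summary avoids quoting one in place of the other.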
Typical vs max (PVT corners)
Datasheet “max” is not automatically the worst-case workload. Worst-case thermal often comes from a specific combination of corner + state + activity duty.
Corner declaration template
- Voltage: Vmin / Vnom / Vmax
- Temperature: Tmin / Tnom / Tmax
- Workload: steady / bursty / sleep-wake / retrain spikes
- Feature flags: EQ/FEC/retime/EEE (on/off)
Measurement rules (MVR)
A power number is comparable only when the minimum reproducible record (abbreviated MVR on this page) is present. Without it, “per-Gbps” is a marketing artifact rather than a measurement.
- device state label + time to steady state (τ)
- line rate + active lane count
- feature flags (DFE/FEC/retime/EEE)
- rails + voltage + measurement method (shunt/PMBus/bench)
- ambient Ta + airflow availability
- package + PCB class (copper/vias/spreading area)
Default rule: per-Gbps uses line-rate. Payload-rate must declare coding/FEC overhead.
Diagram — Power breakdown stacks (components × states)
Use stacked blocks (not curves) to compare designs: align the MVR record, then identify which block dominates under the target state.
Per-Gbps metric done right
Per-Gbps power becomes an engineering metric only after the denominator and numerator are declared, overhead is explicit, lane scaling is stated, and peak versus sustained windows are reported with a steady-state rule.
Denominator: line-rate vs payload-rate
- Line-rate is preferred for IC-to-IC comparison and port power budgeting.
- Payload-rate requires overhead to be declared (coding/FEC/idle).
Required declaration
denominator = line_rate or payload_rate; overhead = X% (coding/FEC/idle); duty = Y%
Numerator: chip vs board-level
- Chip power: sum of declared rails (best for device comparison).
- Board-level: includes DC/DC loss (better for system budgeting).
Allowed output labels
- mW/Gbps (chip)
- mW/Gbps (board)
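To make the denominator and numerator declarations concrete, here is a small sketch that produces a labeled metric. It assumes overhead is expressed as the fraction of line rate consumed by coding/FEC/idle, and that board power is chip power divided by a measured converter efficiency; the function name and example values are hypothetical.

```python
def mw_per_gbps(power_w, lane_rate_gbps, lanes, denominator="line", overhead_frac=0.0):
    """Return mW/Gbps for a declared denominator.

    overhead_frac is only applied for the payload denominator and is assumed
    to be the fraction of line rate lost to coding/FEC/idle.
    """
    if denominator not in ("line", "payload"):
        raise ValueError("denominator must be 'line' or 'payload'")
    if not 0.0 <= overhead_frac < 1.0:
        raise ValueError("overhead_frac must be in [0, 1)")
    rate_gbps = lane_rate_gbps * lanes
    if denominator == "payload":
        rate_gbps *= 1.0 - overhead_frac
    return power_w * 1000.0 / rate_gbps


chip_w = 1.2                 # sum of declared rails (illustrative)
board_w = chip_w / 0.90      # assumed 90 % DC/DC efficiency at this load

chip_line = mw_per_gbps(chip_w, 25.0, 4)
chip_payload = mw_per_gbps(chip_w, 25.0, 4, "payload", overhead_frac=0.03)
board_line = mw_per_gbps(board_w, 25.0, 4)

print(f"{chip_line:.2f} mW/Gbps (chip, line-rate)")
print(f"{chip_payload:.2f} mW/Gbps (chip, payload-rate, 3% overhead)")
print(f"{board_line:.2f} mW/Gbps (board, line-rate)")
```

The same silicon yields three different numbers here, which is why the output label carries both scope and denominator.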
Lane scaling: linear vs non-linear
Low-lane configurations often show worse mW/Gbps because fixed costs (bias, clocking, management) are amortized over fewer lanes.
Minimal model
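P_total = P_fixed + N_lanes × P_lane, so mW/Gbps = P_total / (N_lanes × R_lane) falls toward P_lane / R_lane as N_lanes grows (P in mW, R_lane = per-lane line rate in Gbps)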
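A short worked example of the minimal model shows how the fixed cost is amortized as lanes are added. The power values are illustrative assumptions, not data for any device.

```python
def mw_per_gbps_for_lanes(p_fixed_w, p_lane_w, lane_rate_gbps, n_lanes):
    """mW/Gbps for a fixed-plus-per-lane power model."""
    total_w = p_fixed_w + n_lanes * p_lane_w
    return 1000.0 * total_w / (n_lanes * lane_rate_gbps)


# Assumed: 0.4 W fixed (bias, PLL, management), 0.2 W per lane, 25 Gbps lanes.
table = {n: mw_per_gbps_for_lanes(0.4, 0.2, 25.0, n) for n in (1, 2, 4, 8)}
for n, value in table.items():
    print(f"{n} lane(s): {value:.1f} mW/Gbps")

# The floor as n grows is the per-lane cost alone.
floor = 1000.0 * 0.2 / 25.0
print(f"asymptote: {floor:.1f} mW/Gbps")
```

With these assumptions a single lane costs 24 mW/Gbps while eight lanes cost 10 mW/Gbps, approaching the 8 mW/Gbps floor. A per-Gbps claim therefore needs the lane count stated next to it.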
Peak vs sustained (thermal relevance)
- Peak window catches transition spikes (power integrity and hotspot stress).
- Sustained window must reach thermal steady-state for reliability decisions.
Steady-state rule (placeholders)
- dT/dt < X °C/min over Y minutes
- or power variation < X% over Y minutes
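The slope form of the steady-state rule can be checked on logged samples. The sketch below fits a least-squares slope over the trailing window and compares it to the threshold; the function names are hypothetical and the warm-up curve is synthetic (a first-order rise with a 5-minute time constant, chosen only for illustration).

```python
import math


def temperature_slope(samples, window_min):
    """Least-squares slope (°C/min) over the trailing window of (time_min, temp_c) samples."""
    t_end = samples[-1][0]
    window = [(t, v) for t, v in samples if t >= t_end - window_min]
    n = len(window)
    mean_t = sum(t for t, _ in window) / n
    mean_v = sum(v for _, v in window) / n
    den = sum((t - mean_t) ** 2 for t, _ in window)
    if den == 0:
        raise ValueError("window needs at least two distinct sample times")
    return sum((t - mean_t) * (v - mean_v) for t, v in window) / den


def is_steady(samples, max_slope_c_per_min, window_min):
    """True when the log covers the window and |dT/dt| is below the threshold."""
    if samples[-1][0] - samples[0][0] < window_min:
        return False
    return abs(temperature_slope(samples, window_min)) < max_slope_c_per_min


# Synthetic warm-up: 25 °C start, 40 °C rise, tau = 5 min, one sample per minute.
warmup = [(t, 25.0 + 40.0 * (1.0 - math.exp(-t / 5.0))) for t in range(31)]

print("steady after 10 min:", is_steady(warmup[:11], 0.1, 5))
print("steady after 30 min:", is_steady(warmup, 0.1, 5))
```

A fitted slope is less sensitive to single-sample noise than a two-point difference, which matters when thermocouple readings jitter by a few tenths of a degree.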
Do
- Declare denominator (line vs payload) and overhead (coding/FEC/idle) + duty.
- Declare numerator scope (chip rails or board-level boundary) and method.
- Report peak window and sustained window with a steady-state rule.
Don’t
- Use payload-rate without explicit overhead and duty (non-comparable).
- Mix DC/DC, fan, or terminations into chip mW/Gbps (scope violation).
- Quote burst-only mW/Gbps as sustained power (thermal risk).
Diagram — Per-Gbps definition decision tree
Decision output must be a labeled metric (scope + denominator). Any missing record element makes cross-device comparisons invalid.
Workload model for links
Link power is a waveform driven by duty, burstiness, and retrain probability. A workload model converts traffic to peak and sustained power windows that can be validated against temperature rise and derating thresholds.
Workload primitives (measurable)
- Duty (%): active time within a window.
- Burstiness: burst length + idle gaps (avg/p95/max).
- Retrain rate: events per hour (or per day) under temperature drift.
Record (placeholders)
duty = X%; burst = (avg/p95/max) ms; gap = (avg/p95/max) ms; retrain = Y events/hour
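One way to fill this record from a traffic capture is sketched below. It assumes the capture has already been reduced to lists of burst and gap lengths, and it uses the nearest-rank percentile definition (other definitions give slightly different p95 values); all names and numbers are illustrative.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct % of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def stats(values):
    """Return (avg, p95, max) for a list of durations."""
    return (sum(values) / len(values), percentile_nearest_rank(values, 95), max(values))


def workload_record(bursts_ms, gaps_ms, retrain_events, window_hours):
    active_ms = sum(bursts_ms)
    total_ms = active_ms + sum(gaps_ms)
    return {
        "duty_pct": 100.0 * active_ms / total_ms,
        "burst_ms": stats(bursts_ms),
        "gap_ms": stats(gaps_ms),
        "retrain_per_hour": retrain_events / window_hours,
    }


record = workload_record(
    bursts_ms=[2.0, 4.0, 4.0, 10.0],
    gaps_ms=[8.0, 6.0, 6.0, 10.0],
    retrain_events=3,
    window_hours=1.5,
)
print(record)
```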
Power-sensitive activity (observables)
- Adaptive updates can appear as small periodic steps in average power.
- Re-lock events appear as spikes followed by a new steady plateau.
- Queue/timestamp activity appears as duty increase and higher baseline.
Rule: identify the workload template from the waveform shape before choosing pass criteria.
W1 — Steady full-rate
Profile: duty ≈ X%; long continuous bursts; retrain ≈ Y/hour
Power impact: drives thermal steady-state and worst-case junction.
Log fields: steady_power, Ta/airflow, τ, Tcase, state_name
Use for: heatsink/airflow sizing and derating rule validation.
W2 — Bursty traffic
Profile: duty ≈ X%; burst/gap defined (avg/p95/max); retrain low
Power impact: average rises with duty; peak follows burst edges.
Log fields: duty, peak_power, burst_len, gap, power variance
Use for: average power budgeting and temperature ramp-rate checks.
W3 — Idle + wake cycles
Profile: long idle gaps; frequent wake events; short active bursts
Power impact: low average, but wake peak and recovery energy dominate stress.
Log fields: wake_count, peak_power, recovery_duration, state transitions
Use for: PSU transient margin and hotspot fatigue risk screening.
W4 — Retrain spikes (temperature drift)
Profile: sustained traffic with occasional re-lock spikes; retrain = Y/hour
Power impact: spikes raise peak and may shift the steady plateau upward.
Log fields: event_count, spike_amplitude, spike_duration, Ta drift
Use for: field drop-risk prediction and derating guardrails.
Diagram — Workload power waveforms (W1–W4)
Assign per-Gbps numbers to a workload template (W1–W4). Sustained power is valid only after the steady-state rule is met.
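The difference between templates can be seen with a simple average-power estimate. This sketch assumes average power is the duty-weighted mix of active and idle power plus a separate wake term (extra energy per wake event times wake rate); that decomposition and all numbers are illustrative assumptions.

```python
def average_power_w(duty, p_active_w, p_idle_w, wake_energy_j=0.0, wakes_per_s=0.0):
    """Duty-weighted average power plus the contribution of wake events.

    wake_energy_j is assumed to be the EXTRA energy of one wake event,
    beyond what the active/idle powers already account for.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be a fraction between 0 and 1")
    return duty * p_active_w + (1.0 - duty) * p_idle_w + wake_energy_j * wakes_per_s


# W1: steady full-rate, no wake events.
w1 = average_power_w(1.0, 2.0, 0.1)

# W3: 5 % duty, but two wake events per second at 50 mJ each.
w3_no_wake = average_power_w(0.05, 2.0, 0.1)
w3 = average_power_w(0.05, 2.0, 0.1, wake_energy_j=0.05, wakes_per_s=2.0)

print(f"W1 average: {w1:.3f} W")
print(f"W3 average without wake term: {w3_no_wake:.3f} W")
print(f"W3 average with wake term:    {w3:.3f} W")
```

In this example the wake term adds about half again to the W3 average, which illustrates why a low-duty template cannot be budgeted from duty alone.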
Thermal fundamentals you actually use
Thermal decisions become repeatable when power is tied to a workload window and the temperature path is expressed as a network with declared boundary conditions. This section turns θ/ψ parameters into practical equations and measurement records.
One-line estimate (screening)
Tj = Ta + P × θJA
Use for early feasibility screening. Final sign-off requires a declared test condition and a measured temperature proxy.
- Ta: ambient at a declared location (not “room temperature”).
- P: dissipated power tied to workload (W1–W4) and window (peak/sustained).
- θJA: effective junction-to-ambient under a specific board/airflow/orientation.
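A worked instance of the screening estimate, using illustrative numbers (the θJA value and the 125 °C limit are assumptions for the example, not datasheet figures):

```python
def junction_temp_c(ta_c, power_w, theta_ja_c_per_w):
    """Screening estimate: Tj = Ta + P * thetaJA."""
    return ta_c + power_w * theta_ja_c_per_w


ta_c = 70.0          # declared ambient at the board
power_w = 1.5        # sustained power for the chosen workload window
theta_ja = 30.0      # °C/W, valid only for the declared board and airflow
tj_limit_c = 125.0   # assumed junction limit

tj = junction_temp_c(ta_c, power_w, theta_ja)
margin = tj_limit_c - tj
print(f"Tj = {tj:.1f} °C, margin = {margin:.1f} °C")
```

A 10 °C margin from a screening number is thin: a few degrees of local-ambient error or a different board class can consume it, so this result would call for measurement rather than sign-off.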
θ pitfalls (why θJA is not a constant)
Datasheet θ values are valid only under stated boundary conditions. Changing PCB copper, airflow, or mounting orientation can shift θJA enough to invalidate cross-board comparisons.
Declare test conditions (placeholders)
- JEDEC board class: X (layers/copper thickness/openings)
- Airflow: X m/s; orientation: horizontal/vertical
- Convection mode: natural/forced
- Power window: peak X ms; sustained ≥ X×τ
Rule: θ values without test conditions are treated as non-portable.
Using θJC / ψJT / ψJB (measured → junction)
When junction sensing is unavailable, estimate Tj using a defined temperature proxy and the appropriate parameter for that proxy.
Tj ≈ Tcase + P × θJC
Use when case definition and contact are controlled (heatsink/interface conditions declared).
Tj ≈ Ttop + P × ψJT
Common for board bring-up. Valid only with a defined Ttop measurement point and boundary condition.
Tj ≈ Tboard + P × ψJB
Useful when board sensors exist near the package. Sensor location must be specified (distance/side).
Guardrail
ψ parameters are characterization values. They must be tied to a measurement point and condition; otherwise, back-calculated Tj is not portable.
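The three conversions differ only in which parameter is paired with which measured temperature, so a lookup keeps that pairing explicit. In this sketch the proxy labels, function name, and parameter values are hypothetical placeholders; real values come from the datasheet under its stated conditions.

```python
# Which thermal parameter belongs to which measured proxy.
PROXY_TO_PARAM = {"case": "theta_jc", "top": "psi_jt", "board": "psi_jb"}


def tj_from_proxy(proxy, t_proxy_c, power_w, params):
    """Estimate Tj from a measured proxy temperature.

    Raises ValueError if the proxy is unknown or its parameter was not
    declared, so an undeclared mapping cannot silently produce a number.
    """
    if proxy not in PROXY_TO_PARAM:
        raise ValueError(f"unknown proxy: {proxy}")
    name = PROXY_TO_PARAM[proxy]
    if name not in params:
        raise ValueError(f"{name} not declared for proxy '{proxy}'")
    return t_proxy_c + power_w * params[name]


# Placeholder parameters in °C/W (illustrative only).
params = {"theta_jc": 5.0, "psi_jt": 0.5, "psi_jb": 12.0}

tj_top = tj_from_proxy("top", 95.0, 2.0, params)
tj_board = tj_from_proxy("board", 80.0, 2.0, params)
print(f"from Ttop:   Tj ≈ {tj_top:.1f} °C")
print(f"from Tboard: Tj ≈ {tj_board:.1f} °C")
```

When two proxies give estimates that differ by several degrees, as in this example, the disagreement itself is worth logging: it usually points at a measurement-point or boundary-condition mismatch.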
Multi-source effects (keep it measurable)
Neighbor heat sources and shielding can raise local ambient and shift hot-spot behavior. Treat them as declared inputs, not surprises.
Neighbor record (placeholders)
- neighbor_power = X W; distance = X mm
- shielding_present = yes/no; airflow_path = open/blocked
- Ta_local measured at X location; window = sustained
Rule of thumb: an undeclared local ambient is more often the dominant error source than the θ parameter itself.
Diagram — Thermal resistance network (junction / case / board / ambient)
Use θ values only with declared boundary conditions. Use ψ values only with a defined measurement point when estimating Tj.
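The network view can be approximated with a two-path model: a top path (junction → case → ambient) in parallel with a bottom path (junction → board → ambient). This is a common simplification, not a full model, and every resistance below is an illustrative assumption.

```python
def series(*resistances):
    """Resistances in series add."""
    return sum(resistances)


def parallel(a, b):
    """Two resistances in parallel."""
    return a * b / (a + b)


def theta_ja_two_path(top_path, bottom_path):
    """Effective junction-to-ambient resistance for two parallel paths (°C/W)."""
    return parallel(top_path, bottom_path)


bottom = series(10.0, 20.0)                      # junction-board + board-ambient

bare = theta_ja_two_path(series(5.0, 40.0), bottom)            # no heatsink
good_sink = theta_ja_two_path(series(5.0, 1.0, 9.0), bottom)   # 1 °C/W contact
poor_sink = theta_ja_two_path(series(5.0, 10.0, 9.0), bottom)  # 10 °C/W contact

print(f"bare package:          {bare:.1f} °C/W")
print(f"heatsink, good contact: {good_sink:.1f} °C/W")
print(f"heatsink, poor contact: {poor_sink:.1f} °C/W")
```

With these numbers a poor interface gives back roughly 40 % of the heatsink's benefit, and because the bottom path carries part of the heat in every case, PCB changes shift the result even when the heatsink is untouched.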
Package & PCB heat-spreading choices
Package thermal behavior is defined by its dominant heat path and the PCB it is mounted on. This section provides a package-to-PCB selection logic that stays in the thermal/manufacturing domain (no electrical-layer discussions).
Selection anchor (placeholders)
θJA,target = ΔTallowed / P
Pick package + PCB spreading that can meet θJA,target under the declared airflow/orientation.
- ΔTallowed: allowed junction rise above ambient, i.e. derating threshold minus Ta(max) minus reserve margin (X °C placeholder)
- P: sustained workload power (W1–W4 + window)
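A worked example of the selection anchor follows. It assumes ΔTallowed is the rise from worst-case ambient to the derating threshold, less a reserve margin; the candidate θJA values are made-up placeholders standing in for numbers measured or quoted under declared conditions.

```python
def theta_ja_target(t_derate_c, ta_max_c, power_w, margin_c=0.0):
    """Largest thetaJA (°C/W) that keeps Tj below the derating threshold."""
    delta_t_allowed = t_derate_c - margin_c - ta_max_c
    if delta_t_allowed <= 0:
        raise ValueError("no thermal headroom at this ambient and margin")
    return delta_t_allowed / power_w


target = theta_ja_target(t_derate_c=110.0, ta_max_c=70.0, power_w=2.0, margin_c=10.0)
print(f"thetaJA target: {target:.1f} °C/W")

# Hypothetical package + PCB bundles with effective thetaJA under the declared airflow.
candidates = {
    "QFN, 2-layer, few vias": 35.0,
    "QFN, 4-layer, via array": 18.0,
    "BGA, 6-layer, heatsink": 12.0,
}
feasible = [name for name, theta in candidates.items() if theta <= target]
print("feasible bundles:", feasible)
```

Here only the third bundle meets the 15 °C/W target; the second would pass if the 10 °C reserve were dropped, which is exactly the kind of trade that should be made explicitly rather than by omission.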
Heat paths (what matters)
- Bottom path: die → pad/balls → solder → vias → planes (often dominant).
- Top path: die → mold/top → ambient (improves with contact/heatsink).
- Spreading: copper planes convert a hotspot into a lower-gradient field.
Key PCB levers (placeholders)
- copper_spread_area = X mm²; plane_layers = X
- via_count = X; via_pitch = X; backside_copper = yes/no
- keepout/slot = yes/no (controls isolation vs spreading)
Manufacturing & reliability coupling
Thermal performance is affected by assembly quality. Voiding and warpage can raise effective thermal resistance and shift hot-spot location.
Process record (placeholders)
- voiding_percent = X%; reflow_profile_id = X
- warpage_flag = yes/no; rework_cycles = X
Package quick cards (thermal + manufacturing only)
QFN (exposed pad)
Heat path: strong bottom conduction through exposed pad.
PCB needs: via array + large copper spreading region.
Process risks: voiding under pad; paste/thermal-via tuning.
Suitable power: < X W or X–Y W (placeholder; depends on copper/airflow)
QFP
Heat path: weaker bottom conduction; more dependence on convection.
PCB needs: copper spreading helps but is less direct than pad/balls.
Process risks: lead coplanarity; airflow sensitivity.
Suitable power: < X W (placeholder; depends on airflow and surface area)
BGA
Heat path: improved bottom conduction via ball array into PCB planes.
PCB needs: plane layers + via stitching for spreading.
Process risks: warpage control; inspection complexity.
Suitable power: X–Y W or > Y W (placeholder; strongly PCB-dependent)
FCBGA
Heat path: strong top path (die back to lid/heatsink) plus bottom conduction through bumps, substrate, and balls; suited to high power density.
PCB needs: robust planes, thermal via strategy, and controlled stack-up.
Process risks: cost and assembly control; warpage/underfill tradeoffs.
Suitable power: > Y W (placeholder; validate with θJA,target)
WLCSP
Heat path: low thermal mass; highly sensitive to PCB spreading and local hotspots.
PCB needs: careful copper strategy; strong control of nearby heat sources.
Process risks: board-level reliability and rework constraints.
Suitable power: < X W (placeholder; validate with real board conditions)
Diagram — Package + PCB thermal conduction cross-section (concept)
Package thermal performance is a package × PCB function. Use θJA,target to anchor selection, then validate with measured proxies (Ttop/Tboard).
Cooling options and when they backfire
Cooling is not a checklist of parts. Each method has a boundary condition where it becomes ineffective or moves heat into a more sensitive area. This section focuses on actionable “works when…” criteria and “backfires when…” failure triggers.
Natural vs forced convection
Works when
- Air path reaches the hotspot (no bypass short-circuit).
- Boundary layer is disrupted at the hotspot surface.
- Intake air temperature is controlled/declared.
Backfires when
- Airflow bypasses the device (high system CFM, low local flow).
- Fan is present but hotspot boundary layer remains intact.
- Hot recirculation raises local ambient around the device.
Quick checks (placeholders)
- ΔTin-out = X °C (intake vs exhaust)
- Local hotspot airflow present: yes/no
- Hotspot location shifts with airflow direction: yes/no
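For the intake-versus-exhaust check, an expected value can be estimated from a heat balance on the air stream: ΔT = P / (ρ · V̇ · cp). The sketch assumes dry air near 20 °C at sea level and that all dissipated heat leaves through the measured airflow; both are simplifications.

```python
AIR_DENSITY_KG_M3 = 1.2       # assumed: air near 20 °C at sea level
AIR_CP_J_PER_KG_K = 1005.0    # assumed specific heat of air
M3_PER_S_PER_CFM = 0.000471947


def expected_air_rise_c(heat_w, airflow_cfm):
    """Expected intake-to-exhaust temperature rise for a given heat load."""
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    mass_flow_kg_s = AIR_DENSITY_KG_M3 * airflow_cfm * M3_PER_S_PER_CFM
    return heat_w / (mass_flow_kg_s * AIR_CP_J_PER_KG_K)


rise_10cfm = expected_air_rise_c(50.0, 10.0)
rise_20cfm = expected_air_rise_c(50.0, 20.0)
print(f"50 W at 10 CFM: {rise_10cfm:.1f} °C rise")
print(f"50 W at 20 CFM: {rise_20cfm:.1f} °C rise")
```

A measured ΔTin-out well below the expected value with a hot device suggests bypass (air is moving but not collecting the heat); a value well above suggests recirculation or less real airflow than assumed.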
Heatsink + contact + TIM
Works when
- Contact resistance is minimized (flatness + controlled pressure).
- TIM thickness/compression is within the intended range.
- Mechanical retention maintains pressure across thermal cycling.
Backfires when
- Large heatsink but poor contact (θcontact dominates).
- Clamp pressure relaxes (vibration / thermal cycling).
- TIM pumps out or ages → gradual temperature drift in the field.
Quick checks (placeholders)
- Pre/post re-test ΔTtop drift ≤ X °C
- Contact area evidence (inspection): pass/fail
- Thermal decay after stop-load is “fast” or “slow” (qualitative)
Chassis conduction (when heat “moves”)
Works when
- Heat is routed to a large thermal mass away from sensitive parts.
- Interface stack is controlled (pad thickness + compression).
- Resulting gradients are reduced, not relocated.
Backfires when
- Thermal “short” conducts heat into a more sensitive component zone.
- Hotspot relocates (device cools, nearby critical part heats up).
- Local gradients increase → hidden mechanical stress risk.
Quick checks (placeholders)
- Max temperature location changes after adding chassis path: yes/no
- Sensitive-zone ΔT increase ≤ X °C (placeholder)
- Gradient metric ≤ X °C/cm (placeholder)
Fan strategy (field degradation aware)
Works when
- Control targets temperature margin, not maximum RPM.
- Intake filtering and maintenance are designed-in.
- Fan health is monitored and logged.
Backfires when
- Dust clogging reduces airflow over time → hidden drift.
- Noise constraints force lower RPM without re-validating margin.
- Fan stalls/degrades with no telemetry → misdiagnosed failures.
Log fields (placeholders)
- fan_rpm, pwm_duty, alarm_count
- intake_temp, exhaust_temp, Ta_local
- maintenance_age_days (or filter ΔP if available)
Selection matrix (goal → recommended bundle)
Low cost
- Copper spreading + via array
- Air path cleanup (avoid bypass)
- Declare boundary conditions
Backfire check: hotspot local airflow = yes/no; ΔTin-out within expected range.
Low noise
- Maximize spreading first
- Low-RPM fan + clean ducting
- Control contact resistance (TIM + pressure)
Backfire check: contact-driven drift (pre/post ΔTtop ≤ X °C).
High reliability
- Fan telemetry + alarms + maintenance plan
- Dust mitigation (filtering/accessible service)
- Chassis conduction with sensitive-zone guard
Backfire check: hotspot migration = no; sensitive-zone ΔT ≤ X °C.
Diagram — Cooling selection matrix (2×2)
The matrix is anchored by power density and airflow constraints. Each quadrant adds a “backfire” tag to force a verification check.
Thermal-aware bring-up & validation
Thermal validation becomes meaningful only when measurement points are defined, steady-state is gated, workload is repeatable, and Tj is derived via the appropriate θ/ψ mapping. This section provides a bring-up flow that is production- and field-friendly.
Define conditions
- Ambient and location: Ta measured at X point (placeholder).
- Airflow mode: fixed RPM or closed-loop control (declared).
- Workload template: W1–W4 + window (peak/sustained).
- Power scope: chip-only vs board-including (label).
Pitfall
Uncontrolled conditions produce non-comparable results and false “randomness”.
Measure points & sensors
Minimum measurement set
- Ttop/Tcase (top-of-package)
- Tboard (nearby copper region)
- Tin/Tout (intake/exhaust)
- Ta (ambient; declared location)
Sensor notes (high-impact)
- Thermocouple attachment method must be consistent (adhesive/coverage).
- IR camera requires emissivity declaration (placeholder) and reflection control.
- On-board sensors are position-biased → record location and distance.
Gate steady-state
Use a steady-state gate based on time constant and temperature slope. Avoid taking a snapshot during transient warm-up.
Gating criteria (placeholders)
- Wait time ≥ X × τ (placeholder)
- dT/dt < X °C/min over Y min (placeholder)
- Re-run workload template if retrain/error spikes occurred
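The wait-time gate needs a value for τ. One common approach, sketched here, is to take τ as the time to reach 63.2 % of the total rise on a recorded warm-up curve. This assumes an approximately first-order response and a log that runs until the temperature has essentially settled; the multiple of 5 is an assumed placeholder for X, and the curve is synthetic.

```python
import math


def estimate_tau_min(samples):
    """Estimate tau from a rising warm-up curve of (time_min, temp_c) pairs.

    Assumes a first-order response and that the last sample is close to
    the final temperature.
    """
    t0, start = samples[0]
    final = samples[-1][1]
    target = start + 0.632 * (final - start)   # 63.2 % of the total rise
    for (ta, va), (tb, vb) in zip(samples, samples[1:]):
        if va <= target <= vb and vb > va:
            # Linear interpolation between the two bracketing samples.
            return ta + (target - va) * (tb - ta) / (vb - va) - t0
    raise ValueError("curve never crosses 63.2 % of the rise")


def wait_gate_met(elapsed_min, tau_min, multiple=5.0):
    """True once the elapsed time is at least `multiple` time constants."""
    return elapsed_min >= multiple * tau_min


# Synthetic curve with a true tau of 5 minutes, sampled once per minute.
curve = [(t, 25.0 + 40.0 * (1.0 - math.exp(-t / 5.0))) for t in range(0, 41)]
tau = estimate_tau_min(curve)
print(f"estimated tau: {tau:.2f} min")
print("gate met at 10 min:", wait_gate_met(10.0, tau))
print("gate met at 30 min:", wait_gate_met(30.0, tau))
```

Real assemblies often show more than one time constant (package, board, chassis), so an estimate like this is best combined with the slope criterion rather than used alone.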
Convert & decide
Tj conversion (choose one)
- Tj ≈ Ttop + P × ψJT
- Tj ≈ Tboard + P × ψJB
- Tj ≈ Tcase + P × θJC (case defined)
Pass criteria (placeholders)
- Tj margin ≥ X °C
- Tcase ≤ X °C
- Hotspot gradient ≤ X °C/cm
Record fields (placeholders)
- workload_id (W1–W4), window_type (peak/sustained)
- steady_state_met (yes/no), retrain_spikes (count)
- Ttop, Tboard, Tin, Tout, Ta, P_total (scope label)
Diagram — Thermal validation flow
The flow enforces steady-state gating and a declared θ/ψ mapping before pass/fail decisions are made.
Power integrity & thermal coupling
This section treats power delivery only through a power/thermal lens: conversion efficiency, transient spikes, protection-driven derating, and the temperature-to-power positive feedback loop. It avoids electrical-layer tuning details.
DC/DC efficiency becomes board heat
Even if chip power is unchanged, converter loss (input power minus delivered power) turns into extra heat near the power stage and copper planes. A practical placeholder range is +10–20% board heat overhead, depending on efficiency and loading.
Thermal viewpoint checklist
- Declare DC/DC efficiency at the tested load (η = X%, placeholder).
- Log converter temperature zone near hotspots (Tpwr, placeholder).
- Separate “chip power” vs “board-including power” in reports.
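The relationship between efficiency and added board heat is simple arithmetic: input power is load power divided by η, and the difference is dissipated in the converter. The efficiencies below are illustrative.

```python
def converter_loss_w(load_power_w, efficiency):
    """Heat dissipated in the converter for a given delivered load power."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return load_power_w / efficiency - load_power_w


def overhead_pct(efficiency):
    """Converter loss as a percentage of delivered (chip) power."""
    return 100.0 * converter_loss_w(1.0, efficiency)


chip_w = 2.0
for eta in (0.90, 0.85):
    loss = converter_loss_w(chip_w, eta)
    print(f"eta = {eta:.2f}: board input {chip_w + loss:.2f} W, "
          f"converter heat {loss:.2f} W (+{overhead_pct(eta):.1f}%)")
```

Efficiencies of 90 % and 85 % correspond to roughly +11 % and +18 % overhead, which is where the +10–20 % placeholder range comes from. Efficiency usually falls at light load, so η should be taken at the tested load point.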
Transient spikes distort power and heat readings
Training / wake / re-lock windows can create short power spikes. These can cause voltage ripple and measurement aliasing, leading to under-estimated sustained heat or over-estimated per-Gbps numbers if sampling windows are not aligned with the events.
Quick “sanity gates” (placeholders)
- Peak window length = X ms; sustained window length = Y s.
- Spike count per minute ≤ X (placeholder).
- Ripple proxy / variance metric flagged: yes/no.
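To show why the two window lengths must be reported together, the sketch below computes a peak-window value (the highest rolling mean over a short window) and a sustained value (the mean over the whole capture) from the same sampled trace. The trace and window lengths are synthetic and the function name is hypothetical.

```python
def window_stats(samples_w, sample_period_ms, peak_window_ms):
    """Return the highest short-window average and the full-capture average."""
    n = max(1, round(peak_window_ms / sample_period_ms))
    if n > len(samples_w):
        raise ValueError("peak window is longer than the capture")
    running = sum(samples_w[:n])
    best = running
    # Slide the window one sample at a time, updating the running sum.
    for i in range(n, len(samples_w)):
        running += samples_w[i] - samples_w[i - n]
        best = max(best, running)
    return {
        "peak_window_w": best / n,
        "sustained_w": sum(samples_w) / len(samples_w),
    }


# Synthetic 42 ms capture at 1 ms sampling with a 2 ms, 3 W re-lock spike.
trace = [1.0] * 20 + [3.0] * 2 + [1.0] * 20
result = window_stats(trace, sample_period_ms=1.0, peak_window_ms=2.0)
print(result)
```

The spike triples the peak-window figure but moves the sustained figure by less than 10 %, so quoting either one alone would misstate the other.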
Protection-driven derating (thermal safety valves)
Over-temperature protection and throttling are “breakers” that prevent runaway. They also change observed power, so validation must log whether the system is in normal, throttled, or degraded mode.
Common breakers (strategy only)
- OTP threshold reached → throttle / cut activity (placeholder T).
- Lane disable / reduced duty cycle to preserve temperature margin.
- Low-power modes (e.g., energy saving) to reduce sustained heating.
Temperature → leakage → power (positive feedback)
As temperature rises, leakage and bias shifts can increase power, especially in advanced processes. This creates a positive feedback loop that must be broken by airflow, throttling, or derating rules.
What to log (minimal set)
- Ttop/Tj proxy, power scope label (chip/board)
- Throttle state (normal/throttled), lane count active
- Converter η (placeholder), local power-stage temperature
Thermal–electrical coupling loop (explain + break)
A practical way to prevent misleading power numbers is to explicitly label the loop states: normal → warming → near-limit → throttled. Validation should record which “breaker” is active (airflow, throttling, derating) when the temperature slope changes.
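The loop can be explored numerically by iterating Tj = Ta + θJA × P(Tj) until it settles or exceeds a cap. The sketch below assumes a purely illustrative leakage model in which leakage power doubles every fixed number of degrees; the doubling interval, reference leakage, and cap are assumptions, not characterized values.

```python
def settle_junction_temp(ta_c, theta_ja, p_dyn_w, p_leak_ref_w,
                         t_ref_c=25.0, doubling_c=20.0,
                         tj_cap_c=150.0, tol_c=0.01, max_iter=200):
    """Iterate the temperature/leakage loop.

    Returns (tj_c, power_w) when the loop settles, or None when Tj exceeds
    tj_cap_c or fails to converge (treated as needing a breaker).
    Leakage is assumed to double every `doubling_c` degrees above t_ref_c.
    """
    tj = ta_c
    for _ in range(max_iter):
        power = p_dyn_w + p_leak_ref_w * 2.0 ** ((tj - t_ref_c) / doubling_c)
        tj_next = ta_c + theta_ja * power
        if tj_next > tj_cap_c:
            return None
        if abs(tj_next - tj) < tol_c:
            return tj_next, power
        tj = tj_next
    return None


stable = settle_junction_temp(ta_c=50.0, theta_ja=20.0, p_dyn_w=1.5, p_leak_ref_w=0.05)
weak_cooling = settle_junction_temp(ta_c=50.0, theta_ja=40.0, p_dyn_w=1.5, p_leak_ref_w=0.05)

print("theta_ja = 20 °C/W:", stable)
print("theta_ja = 40 °C/W:", weak_cooling)
```

With the better thermal path the loop settles near 89 °C, noticeably above the roughly 82 °C that a leakage-free estimate at ambient would suggest. With the weaker path it crosses the cap, which is the condition the breakers exist to prevent.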
Diagram — Thermal–electrical positive feedback loop with breakers
The loop is intentionally simple: it enforces scope labeling and breaker logging rather than electrical tuning details.
Design guardrails & derating rules
This section turns the page into reusable engineering guardrails: power budgets across chip/port/system, thermal budgets with explicit margin, and derating rules that map temperature back to allowable throughput or duty-cycle.
Three-layer power budget
- Chip: silicon-only power (declared state + temperature)
- Port: per-port / per-lane budget (scales, but not perfectly linear)
- System: DC/DC loss + fan + chassis paths (board-including)
Guardrail rule: reports must label scope (chip vs board-including).
Thermal budget (with explicit margin)
- Ta(max) = X °C (placeholder)
- Tj(max allowed) = X °C (placeholder)
- Reserve margin = X °C (placeholder)
Guardrail rule: validation must demonstrate margin under sustained workload, not only peak snapshots.
Derating rule model (placeholder)
Use a declared linear model to map temperature to allowed activity: for each +ΔT, reduce duty-cycle or throughput by a fixed step.
Example format (placeholders)
- If T ≥ Tguard, then duty_allowed = duty_nominal − k × (T − Tguard)
- If T ≥ Tcrit, then throttle state = ON
- If T returns below Trecover, then ramp back with hysteresis
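The example format can be sketched as a small state machine. This version assumes the allowed duty is computed from the nominal duty on each update (not decremented cumulatively), clamps at a floor while throttled, and returns straight to the linear model once the temperature falls below Trecover; a ramp-rate limiter is omitted. Class name, thresholds, and k are hypothetical.

```python
class Derater:
    """Linear derating above Tguard with a latched throttle between Tcrit and Trecover."""

    def __init__(self, t_guard_c, t_crit_c, t_recover_c, k_per_c,
                 duty_nominal=1.0, duty_floor=0.2):
        if not t_guard_c <= t_recover_c < t_crit_c:
            raise ValueError("expected Tguard <= Trecover < Tcrit")
        self.t_guard_c = t_guard_c
        self.t_crit_c = t_crit_c
        self.t_recover_c = t_recover_c
        self.k_per_c = k_per_c
        self.duty_nominal = duty_nominal
        self.duty_floor = duty_floor
        self.throttled = False

    def update(self, temp_c):
        """Return the allowed duty (0..1) for the latest temperature reading."""
        if temp_c >= self.t_crit_c:
            self.throttled = True
        elif temp_c < self.t_recover_c:
            self.throttled = False      # hysteresis: release only below Trecover
        if self.throttled:
            return self.duty_floor
        if temp_c >= self.t_guard_c:
            linear = self.duty_nominal - self.k_per_c * (temp_c - self.t_guard_c)
            return max(self.duty_floor, linear)
        return self.duty_nominal


derater = Derater(t_guard_c=90.0, t_crit_c=105.0, t_recover_c=95.0, k_per_c=0.04)
trace = [(t, derater.update(t)) for t in (85.0, 100.0, 106.0, 100.0, 94.0)]
for temp, duty in trace:
    print(f"T = {temp:5.1f} °C -> duty {duty:.2f}")
```

The fourth reading (100 °C on the way down) stays at the floor even though the same temperature allowed a duty of 0.60 on the way up. That asymmetry is the hysteresis, and it should be logged so that throttled-mode results are not mistaken for normal operation.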
Copyable guardrails (If / Then)
- If system power scope is used, then DC/DC efficiency (η) and fan power must be logged.
- If peak windows are reported, then sustained windows (Y s) must be reported with steady-state gating.
- If Tj margin < X °C, then reduce duty-cycle or throughput per the declared derating model.
- If hotspot relocates after adding chassis conduction, then redesign thermal path to protect sensitive zones.
- If P/Gbps exceeds X mW/Gbps (placeholder), then upgrade package / spreading / airflow bundle before lock-in.
- If throttling is observed in validation, then published performance must be labeled “throttled-mode”, or the design must be reworked until throttling no longer occurs.
- If dust/aging is expected, then add maintenance fields (age, filter status) and re-validate margins over time.
- If any budget changes, then the loop (budget → Tj → validate → derate/redesign) must be re-run and re-locked.
Diagram — Budget → thermal → derate loop (spec lock)
The loop enforces re-validation whenever a budget, cooling bundle, or workload assumption changes.
Applications (Power/Thermal-only)
Applications are framed strictly as “how thermal constraints reshape system design.” No protocol details are expanded here. Each card provides: scenario → thermal pain points → design actions → validation items.
Material numbers note: examples below are common, widely stocked families. Always verify thickness / size / suffix / safety rating / availability against your mechanical stack and compliance needs.
Industrial long-line gateway (sealed box + high Ta + sustained load)
Thermal pain points: steady-state heat soak, converter loss adding board heat, hotspots trapped under shields, and margin collapse when dust aging reduces airflow.
Design actions: define a front-to-back airflow path (even “pseudo-ducts” with foam), isolate power-stage heat from PHY/SerDes, and enforce a derating curve tied to measured surface temperature.
Validation items (placeholders)
- Steady-state gate: wait ≥ X·τ before declaring pass/fail.
- Record: Ta, Tin/Tout, Ttop, Tboard-hotspot, throttle state, DC/DC η (placeholder).
- Pass: Tj margin ≥ X °C; surface ≤ X °C; hotspot gradient ≤ X °C/cm.
Example materials / part numbers (verify fit)
- Thermal tape: 3M 8810 / 3M 8815 (bonding, thin interface).
- Thermal pad family: (Henkel/Bergquist) GAP PAD 1500 series (thickness-dependent variants).
- Heatsink (example): Aavid Thermalloy 576802B00000G (board-level clip-on style).
- Fan family (example): Delta AFB series (size/speed variants); SUNON MagLev series (variants).
Camera links (small volume + remote power heat concentration)
Thermal pain points: hotspot under RF/EMI shields, heat path competing with shielding, and skin temperature constraints (touch-safe surface).
Design actions: create a defined conduction path (chip → spreader → chassis), avoid “thermal shorting” into sensitive optics, and prefer pads/tapes that keep thickness stable over aging.
Validation items (placeholders)
- Surface temperature: Tsurf ≤ X °C (touch-safe placeholder).
- Measure Ttop at shielded hotspots; verify gradient vs chassis region.
- Run steady + burst template to detect wake/processing heat spikes.
Example materials / part numbers (verify fit)
- Thermal tape: 3M 8810 (thin bonding to chassis/spreader).
- Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose thickness for stack).
- Thermal grease (example): DOWSIL TC-5022 (industrial TIM family; verify grade/spec).
- Shield interface: conductive foam + thermal pad stack (select by compliance need).
Server retimers (high lane count + high heat density)
Thermal pain points: heatsink attachment quality (contact resistance), airflow shadowing, and “best-looking per-Gbps” that collapses once throttling engages under sustained traffic.
Design actions: prioritize mechanical repeatability (clip/torque), enforce airflow directionality, and require telemetry / temperature flags for production correlation.
Validation items (placeholders)
- Require “sustained” window: Y minutes at worst duty + worst Ta.
- Log: Ttop/Tboard, airflow state, heatsink attachment lot/process.
- Pass: no throttling within X margin; stable temperature slope.
Example materials / part numbers (verify fit)
- Heatsink (example): Aavid Thermalloy 576802B00000G (example family; choose footprint).
- Thermal tape: 3M 8815 (higher thickness variant vs 8810).
- Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (compressibility helps tolerance).
- Fan family: Delta AFB series / Nidec UltraFlo series (select by airflow/noise targets).
Automotive (temperature cycling + lifetime + degradation)
Thermal pain points: repeated thermal stress changes contact resistance and pad performance; field conditions drift beyond lab assumptions; failures can be intermittent and correlation requires telemetry.
Design actions: enforce conservative margins, define derating by temperature bands, and implement graceful degradation modes (reduced lane count / reduced duty) tied to measured temperature.
Validation items (placeholders)
- Cycle test: ΔT range = X °C; cycles = N (placeholders).
- Log: max Ttop, time-over-threshold, derate events, recovery hysteresis.
- Pass: no uncontrolled throttling; predictable derate curve; stable hotspot mapping.
Example materials / part numbers (verify fit)
- Thermal tape: 3M 8810 / 3M 8815 (process repeatability; verify temp rating).
- Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose for cycling compliance).
- Thermal grease (example): DOWSIL TC-5022 (verify grade/spec for automotive constraints).
- Heatsink (example): Aavid Thermalloy 576802B00000G (verify vibration/attachment needs).
Diagram — Scenario → thermal constraints mapping (card-style)
Each scenario card intentionally shows only thermal constraints, actions, and validation signals.
IC selection logic (Power/Thermal)
This is a decision workflow (not a product list). It starts with required inputs, computes junction temperature feasibility, then decides package/cooling tier and whether derating/telemetry is mandatory.
Required inputs (must-fill)
- Line rate + lane/port count
- Duty-cycle / workload template (steady vs burst)
- Ta(max), airflow availability, enclosure type
- Board area + z-height for heatsink/duct
- Allowed surface temperature (Tsurf limit)
Missing inputs = misleading mW/Gbps comparisons.
Power/Thermal metrics to demand
- mW/Gbps with declared scope (chip-only vs board-including)
- Pmax at Ta(max) (peak vs sustained windows labeled)
- θ/ψ metrics with test conditions (board, copper, airflow)
- Package thermal path features (EPAD / BGA heat spread)
- OTP + thermal telemetry availability (verify in datasheet)
- Low-power state benefit vs wake spike cost (thermal shock)
Red flags (risk markers)
- No θ/ψ test conditions published
- No power vs rate/state/temperature curves (or equivalent)
- Only “typical” power; no max/corners labeling
- No OTP or no measurable thermal/derate status
- Peak-only claims without sustained thermal gating
Example IC material numbers (for comparison workflows)
These are examples only (not recommendations). Use them to build a consistent power/thermal comparison sheet and validate availability, package, and suffix.
PCIe redriver / high-speed retimer (examples)
- TI DS80PCI402
- TI DS160PR412
- TI DS280DF810 (28 Gbps multi-rate retimer, not PCIe-specific; verify in datasheet)
USB redriver / hub (examples)
- TI TUSB522
- TI TUSB1046
- Microchip USB5744 (hub family example)
Ethernet PHY (examples)
- TI DP83822I
- Microchip LAN8840
- NXP TJA1103 (automotive Ethernet PHY example)
Thermal management materials
- 3M 8810 / 3M 8815 (thermal tape)
- (Henkel/Bergquist) GAP PAD 1500 series
- Aavid 576802B00000G (heatsink example)
- DOWSIL TC-5022 (TIM example)
Diagram — Power/Thermal selection decision tree
The workflow forces declared measurement scope and sustained thermal gating before any “per-Gbps” comparison is accepted.
FAQs (Power & Thermal)
Troubleshooting is scoped strictly to power definitions, thermal metrics, cooling effectiveness, derating behavior, and measurement traps. Each answer is intentionally short and executable.
Answer format: Likely cause / Quick check / Fix / Pass criteria (thresholds use X/Y placeholders)
“Datasheet per-Gbps looks great, but system power is 30% higher”—first accounting check?
Likely cause: mismatched scope (chip-only vs board-including), wrong denominator (line rate vs payload), or missing losses (DC/DC η, LDO drop, termination, clocks).
Quick check: measure VIN×IIN at board input and compare against summed rail power at the IC load points; tag state: lanes, rate, FEC/DFE enable, idle vs sustained window.
Fix: lock a single comparison template: scope (chip/board), denominator (line/payload), window (peak/sustained), and always include conversion losses using measured DC/DC efficiency at that load point.
Pass criteria: accounting gap ≤ X% (board-in vs sum-of-rails), and reported mW/Gbps varies ≤ X% across repeated runs with identical state tags.
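The accounting check above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the `PowerRecord` fields, the function names, and all numbers are hypothetical, and `payload_fraction` is assumed to be known from the link's encoding/FEC overhead.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PowerRecord:
    """One tagged measurement record; all field names are illustrative."""
    vin_v: float                    # board input voltage (V)
    iin_a: float                    # board input current (A)
    rail_powers_w: Sequence[float]  # per-rail power at the IC load points (W)
    line_rate_gbps: float           # per-lane line rate (Gbps)
    lanes: int                      # active lane count
    payload_fraction: float         # payload rate / line rate (assumed known)


def board_power_w(rec: PowerRecord) -> float:
    """Board scope: VIN x IIN at the board input."""
    return rec.vin_v * rec.iin_a


def chip_power_w(rec: PowerRecord) -> float:
    """Chip scope: sum of rail powers measured at the IC load points."""
    return sum(rec.rail_powers_w)


def accounting_gap_pct(rec: PowerRecord) -> float:
    """Share of board input power not accounted for by the summed rails."""
    p_in = board_power_w(rec)
    return 100.0 * (p_in - chip_power_w(rec)) / p_in


def mw_per_gbps(power_w: float, rec: PowerRecord, denominator: str = "line") -> float:
    """mW/Gbps with an explicit denominator: 'line' or 'payload'."""
    gbps = rec.line_rate_gbps * rec.lanes
    if denominator == "payload":
        gbps *= rec.payload_fraction
    elif denominator != "line":
        raise ValueError("denominator must be 'line' or 'payload'")
    return 1000.0 * power_w / gbps


# Illustrative numbers only (not from any datasheet).
rec = PowerRecord(vin_v=12.0, iin_a=0.5, rail_powers_w=[2.0, 1.5, 1.0],
                  line_rate_gbps=25.0, lanes=4, payload_fraction=0.8)
print(f"gap: {accounting_gap_pct(rec):.1f} %")
print(f"chip/line: {mw_per_gbps(chip_power_w(rec), rec):.1f} mW/Gbps")
print(f"board/payload: {mw_per_gbps(board_power_w(rec), rec, 'payload'):.1f} mW/Gbps")
```

With these made-up inputs the same hardware reports 45 mW/Gbps (chip scope, line rate) or 75 mW/Gbps (board scope, payload rate), which is why the scope and denominator tags have to travel with every number.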
“Works for 5 minutes then drops”—is it thermal throttling or link instability?
Likely cause: steady-state thermal rise crossing a throttle/OTP threshold, or temperature-driven leakage pushing power higher until protection engages.
Quick check: log a timeline of Ttop/Tboard, input power, and throttle/derate flags; look for “drop event” alignment with a temperature knee or a power step.
Fix: reproduce with a controlled workload window (sustained Y min), add airflow or heatsink contact improvement, and apply a derate rule before the threshold (temperature hysteresis included).
Pass criteria: no throttle/OTP during Y minutes sustained worst-case workload at Ta(max), and temperature slope |dT/dt| ≤ X °C/min after reaching steady state.
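One way to evaluate the slope criterion and to triage a drop event from the logged timeline is sketched below. The function names, the guard band, and the "thermal-suspect"/"link-suspect" labels are assumptions made for illustration; the classification is a simple heuristic, not a definitive diagnosis.

```python
from typing import Sequence


def slope_c_per_min(times_s: Sequence[float], temps_c: Sequence[float]) -> float:
    """Least-squares slope of temperature vs time, converted to degC/min."""
    n = len(times_s)
    if n < 2 or n != len(temps_c):
        raise ValueError("need at least two matching samples")
    t_mean = sum(times_s) / n
    y_mean = sum(temps_c) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, temps_c))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return 60.0 * num / den


def classify_drop(temp_at_drop_c: float, throttle_flag: bool,
                  threshold_c: float, guard_c: float) -> str:
    """Heuristic triage: a drop is thermal-suspect if a throttle/derate flag
    was set, or the temperature was within guard_c of the threshold."""
    if throttle_flag or temp_at_drop_c >= threshold_c - guard_c:
        return "thermal-suspect"
    return "link-suspect"


# Illustrative log: one sample per minute, still rising 0.5 degC/min.
times = [0.0, 60.0, 120.0, 180.0]
temps = [80.0, 80.5, 81.0, 81.5]
print(f"slope: {slope_c_per_min(times, temps):.2f} degC/min")
print(classify_drop(96.0, False, threshold_c=100.0, guard_c=5.0))
```

A drop that lands in the "link-suspect" bucket, with no flag and ample temperature headroom, points away from throttling and toward the link-stability pages referenced in the scope section.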
Same throughput, different PCB revision runs hotter—what board-level variable usually changed?
Likely cause: reduced heat spreading (copper area/planes), fewer or smaller thermal vias, changed component placement creating an airflow shadow, or added nearby heat sources (DC/DC, shield can).
Quick check: compare rev-A vs rev-B: copper pour under the package, via count/pitch, inner-plane continuity, and physical obstructions above the device; capture identical workload heatmaps at the same Ta.
Fix: restore thermal path symmetry (spread copper, add via array, keep planes continuous), and enforce a mechanical keep-out to protect airflow over the hotspot.
Pass criteria: ΔT(top) between revisions ≤ X °C at the same sustained window, or demonstrated θJA improvement ≥ X% with the same test setup.
IR camera shows low temp but failures correlate with heat—what emissivity/spot error to suspect?
Likely cause: wrong emissivity on shiny surfaces, reflections from hot neighbors, spot size larger than the hotspot, or viewing angle causing under-read.
Quick check: place matte tape/paint dot on the package top, re-measure with emissivity set accordingly, and cross-check with a thin thermocouple; confirm the IR spot diameter < X× hotspot size.
Fix: standardize IR procedure: emissivity reference patch, fixed distance/angle, and a calibration step against a contact sensor on the same surface.
Pass criteria: IR vs contact measurement error ≤ X °C on the same marked spot under steady-state conditions.
θJA from datasheet underestimates Tj by a lot—what JEDEC condition mismatch is typical?
Likely cause: θJA was measured on a specific JEDEC board (copper, layers, airflow, orientation) that does not match the real PCB (smaller copper, blocked airflow, enclosure).
Quick check: read the datasheet θJA conditions (board type, copper area, airflow) and compare to the actual stack; use ψJT/ψJB with measured Ttop/Tboard to back-calculate Tj.
Fix: replace single θJA with a board-calibrated model: measure Ttop and Tboard under known power, derive an effective θ for that assembly, then re-run margin and derate rules.
Pass criteria: the effective θ model predicts measured Ttop/Tj within X °C across two workloads (steady + burst) at Ta(max).
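The back-calculation and board-calibrated model described above reduce to a few linear relations: Tj ≈ Ttop + ψJT·P, Tj ≈ Tboard + ψJB·P, and an effective θ = (Tj − Ta)/P for the assembly as built. The sketch below assumes steady-state conditions and a single lumped power value; the function names and all numeric values are illustrative, and the ψ values must come from the device datasheet.

```python
def tj_from_top(t_top_c: float, psi_jt_c_per_w: float, power_w: float) -> float:
    """Junction estimate from the measured package-top temperature."""
    return t_top_c + psi_jt_c_per_w * power_w


def tj_from_board(t_board_c: float, psi_jb_c_per_w: float, power_w: float) -> float:
    """Junction estimate from the measured board temperature near the package."""
    return t_board_c + psi_jb_c_per_w * power_w


def effective_theta_c_per_w(tj_c: float, ta_c: float, power_w: float) -> float:
    """Board-calibrated junction-to-ambient resistance for this assembly."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (tj_c - ta_c) / power_w


def predict_tj(ta_c: float, theta_c_per_w: float, power_w: float) -> float:
    """Steady-state junction prediction at another ambient or power level."""
    return ta_c + theta_c_per_w * power_w


# Illustrative values only: 4 W sustained, Ta = 45 degC, Ttop = 85 degC.
tj = tj_from_top(t_top_c=85.0, psi_jt_c_per_w=0.5, power_w=4.0)
theta_eff = effective_theta_c_per_w(tj, ta_c=45.0, power_w=4.0)
print(f"Tj estimate: {tj:.1f} degC, effective theta: {theta_eff:.2f} degC/W")
print(f"Tj at Ta(max)=70 degC: {predict_tj(70.0, theta_eff, 4.0):.1f} degC")
```

In this made-up case the assembly behaves like 10.5 °C/W; a datasheet θJA of, say, 8 °C/W would have predicted a junction 10 °C cooler at the same power, which is the kind of underestimate this FAQ describes.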
Fan increases airflow but hotspot gets worse—what airflow path or recirculation pattern causes this?
Likely cause: short-circuit airflow (inlet to outlet bypass), recirculation loop, or a new “shadow zone” where high-speed flow skips the hotspot and pulls warm air back over it.
Quick check: run a simple airflow visualization (smoke thread / tissue / anemometer points) to confirm direction and bypass; measure Tin/Tout and hotspot Ttop with the same workload window.
Fix: add ducting/foam baffles to force flow across the hotspot, separate inlet/outlet, and remove obstructions that create a local recirculation pocket.
Pass criteria: hotspot temperature improves by ≥ X °C and ΔT(Tout−Tin) increases by ≥ X °C (indicating useful heat extraction) under the same sustained load.
Low-power mode saves energy but increases temperature spikes after wake—what transient to log?
Likely cause: wake causes a short burst of digital activity and analog re-lock that raises instantaneous power; average energy improves but peak thermal shock increases.
Quick check: log power at high time resolution (≥ X kS/s) for the first Y ms after wake, plus temperature slope; correlate spikes with wake events.
Fix: stagger wake (ports/lane groups), limit wake burst rate, or pre-condition airflow; if allowed, add a soft-start ramp in firmware/power sequencing.
Pass criteria: peak power during wake ≤ X% of sustained power budget, and post-wake ΔT within Y seconds ≤ X °C (thermal shock bound).
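The wake-window check can be sketched as follows, assuming the power log is a uniformly sampled list of watts that starts at the wake event. The function names, the sample rate, and the numbers are hypothetical; energy is integrated with a simple rectangle rule.

```python
from typing import Sequence, Tuple


def wake_window_stats(samples_w: Sequence[float], sample_rate_hz: float,
                      window_ms: float) -> Tuple[float, float]:
    """Return (peak power in W, energy in J) over the first window_ms of the log."""
    n = round(sample_rate_hz * window_ms / 1000.0)
    if n < 1 or n > len(samples_w):
        raise ValueError("window does not fit the captured samples")
    window = samples_w[:n]
    peak_w = max(window)
    energy_j = sum(window) / sample_rate_hz  # rectangle-rule integration
    return peak_w, energy_j


def peak_pct_of_budget(peak_w: float, sustained_budget_w: float) -> float:
    """Wake peak expressed as a percentage of the sustained power budget."""
    return 100.0 * peak_w / sustained_budget_w


# Illustrative capture: 1 kS/s, wake burst in the first few samples.
samples = [2.0, 6.0, 8.0, 4.0, 2.0, 2.0]
peak, energy = wake_window_stats(samples, sample_rate_hz=1000.0, window_ms=4.0)
print(f"peak: {peak:.1f} W, energy: {energy * 1000:.1f} mJ")
print(f"peak vs budget: {peak_pct_of_budget(peak, 5.0):.0f} %")
```

Logging both peak and energy separates the two effects in this FAQ: the energy term shows whether low-power mode pays off on average, while the peak term is what the thermal-shock bound constrains.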
Thermal pad added but no improvement—what contact resistance or mounting pressure issue is common?
Likely cause: pad too thick/hard (insufficient compression), uneven pressure, air gaps/voids, or a flatness mismatch that prevents real contact at the hotspot.
Quick check: inspect pad “imprint” after disassembly; verify compression ratio (actual thickness vs nominal), and check if the hotspot aligns with the contact zone.
Fix: choose a pad with appropriate softness/compressibility and thickness tolerance; improve mounting planarity and clamp force; consider tape/grease if stack height allows.
Pass criteria: confirmed compression within X–Y% target range and hotspot reduction ≥ X °C under identical sustained workload.
Why does higher ambient cause more than linear power rise—what leakage/OTP behavior explains it?
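Likely cause: temperature increases leakage and bias currents, raising power; because leakage grows roughly exponentially with temperature, the added power raises Tj further and the rise becomes steeper than linear. Higher temperature can also reduce DC/DC efficiency and trigger pre-throttle behavior that changes the operating point.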
Quick check: sweep Ta in steps (e.g., +10 °C) and log input power, device temperature, and any throttle flags; check whether power rise accelerates near a protection threshold.
Fix: apply derating by temperature bands, ensure airflow margin at Ta(max), and re-validate power budget using the highest-Ta sustained window (not a short burst).
Pass criteria: at Ta(max), sustained input power ≤ P_budget and Tj margin ≥ X °C; no protection/pre-throttle activation within the Y-minute steady window.
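A minimal way to test whether the power rise accelerates across a Ta sweep is to compare the per-step slope dP/dTa between neighbouring steps. The helper names, the tolerance, and the sweep data below are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def step_slopes(ta_c: Sequence[float], p_w: Sequence[float]) -> List[float]:
    """dP/dTa (W per degC) for each consecutive pair of sweep points."""
    if len(ta_c) != len(p_w) or len(ta_c) < 2:
        raise ValueError("need at least two matching sweep points")
    return [(p_w[i + 1] - p_w[i]) / (ta_c[i + 1] - ta_c[i])
            for i in range(len(ta_c) - 1)]


def first_accelerating_ta(ta_c: Sequence[float], p_w: Sequence[float],
                          tol_w_per_c: float = 1e-3) -> Optional[float]:
    """Return the Ta at which the slope first grows by more than the
    tolerance versus the previous step, or None if the rise looks linear."""
    slopes = step_slopes(ta_c, p_w)
    for i in range(len(slopes) - 1):
        if slopes[i + 1] - slopes[i] > tol_w_per_c:
            return ta_c[i + 1]
    return None


# Illustrative sweep in +10 degC steps (values are made up).
ta = [25.0, 35.0, 45.0, 55.0]
power = [4.0, 4.1, 4.3, 4.7]
print([round(s, 3) for s in step_slopes(ta, power)])
print("acceleration starts near Ta =", first_accelerating_ta(ta, power))
```

If the returned Ta sits close to a protection threshold, that supports the pre-throttle explanation; if acceleration starts well below it, leakage growth is the more likely driver.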
Port-to-port temperature spread is huge—what copper/heat-spreading asymmetry is most common?
Likely cause: unequal copper planes/via arrays, one port located near a hot converter or shield edge, or uneven airflow distribution across the board.
Quick check: compare each port’s copper/via pattern and nearby heat sources; measure a synchronized heatmap under the same traffic/duty and confirm airflow direction across all ports.
Fix: balance heat spreading geometry (plane continuity, via density), move or isolate nearby heat sources, and use airflow guides so each port sees similar inlet temperature.
Pass criteria: ΔT(port max − port min) ≤ X °C at the sustained worst-case window and Ta(max).
DC/DC runs cool but PHY is hot—what efficiency/load point misleads the measurement?
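Likely cause: power was measured at the wrong node (upstream of the LDOs, or on the wrong side of a sense resistor), so the reading does not reflect what the IC actually dissipates; the DC/DC sits away from the hotspot and loses little heat itself, while the IC dissipates most of the power locally. The converter's efficiency may also be poor under transient load.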
Quick check: measure rail power at the IC load points (V at the pins + I into the rail) during both peak and sustained windows; compare against VIN×IIN to reveal hidden drops/losses.
Fix: instrument power with correct sense locations (Kelvin), include all intermediate losses (LDO, cable, connector), and use a converter operating region that matches the sustained load point.
Pass criteria: discrepancy between VIN×IIN and the sum of rail powers ≤ X%, and rail ripple/voltage droop stays within X (in the rail's specified unit, e.g. mV) during retrain/wake events.
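To see how intermediate losses open a gap between VIN×IIN and the power delivered at the IC pins, the rail chain can be walked backwards from the load. The sketch assumes a single DC/DC feeding a single LDO, ignores LDO quiescent current, and uses a measured efficiency at the actual load point; all names and values are illustrative.

```python
def ldo_input_power_w(v_in: float, i_out: float) -> float:
    """LDO input power, assuming input current equals output current
    (quiescent current ignored)."""
    return v_in * i_out


def ldo_loss_w(v_in: float, v_out: float, i_out: float) -> float:
    """Heat dissipated in the LDO pass element."""
    return (v_in - v_out) * i_out


def dcdc_input_power_w(p_out_w: float, efficiency: float) -> float:
    """Converter input power for a given output power and measured efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return p_out_w / efficiency


def discrepancy_pct(p_board_in_w: float, p_at_ic_w: float) -> float:
    """Gap between board input power and power measured at the IC pins."""
    return 100.0 * (p_board_in_w - p_at_ic_w) / p_board_in_w


# Illustrative chain: 1.0 V / 1.0 A at the IC, LDO fed from 1.8 V,
# DC/DC at 90 % efficiency at this load point.
p_ic = 1.0 * 1.0
p_ldo_in = ldo_input_power_w(v_in=1.8, i_out=1.0)
p_board = dcdc_input_power_w(p_ldo_in, efficiency=0.9)
print(f"IC: {p_ic:.2f} W, LDO loss: {ldo_loss_w(1.8, 1.0, 1.0):.2f} W, "
      f"board input: {p_board:.2f} W")
print(f"discrepancy: {discrepancy_pct(p_board, p_ic):.0f} %")
```

In this made-up chain the converter itself loses only about 0.2 W and runs cool, while half of the board input power never appears as a rail measurement at the IC pins unless the LDO drop is counted.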
Production passes at 25°C but field fails in summer—what margin number is usually missing?
Likely cause: qualification used short bursts (not sustained), did not test Ta(max), ignored airflow degradation (dust/blocked vents), or lacked an explicit Tj/surface margin requirement.
Quick check: replicate a field-like sustained duty at elevated Ta and reduced airflow; log time-over-temperature, throttle events, and peak-to-steady temperature rise.
Fix: define acceptance at Ta(max) with a sustained window and explicit margin, then implement derating before the limit; add telemetry/log fields for correlation in the field.
Pass criteria: at Ta(max) and “degraded airflow” condition, Tj margin ≥ X °C, surface ≤ X °C, and no throttle/OTP during Y minutes sustained duty.