Power & Thermal: Per-Gbps Power and Cooling for PHY/SerDes

Q: “Datasheet per-Gbps looks great, but system power is 30% higher”—first accounting check?

Likely cause: mismatched scope (chip-only vs board-including), wrong denominator (line vs payload), or missing losses (DC/DC efficiency, LDO drop, termination, clocks). Quick check: measure VIN×IIN at board input and compare against summed rail power at the IC load points; tag lanes, rate, feature enables, and peak/sustained windows. Fix: lock a single comparison template for scope/denominator/window and include conversion losses using measured efficiency at the same load point. Pass criteria: accounting gap ≤ X% and reported mW/Gbps varies ≤ X% across repeated runs with identical state tags.

Q: “Works for 5 minutes then drops”—is it thermal throttling or link instability?

Likely cause: steady-state thermal rise crossing a throttle/OTP threshold or temperature-driven leakage pushing power up until protection engages. Quick check: log Ttop/Tboard, input power, and throttle/derate flags; confirm drop alignment with a temperature knee or power step. Fix: reproduce with a sustained Y-minute window; improve airflow/contact and apply derating before the threshold with hysteresis. Pass criteria: no throttle/OTP during Y minutes at Ta(max) and |dT/dt| ≤ X °C/min after reaching steady state.

Q: Same throughput, different PCB revision runs hotter—what board-level variable usually changed?

Likely cause: reduced copper/planes, fewer thermal vias, airflow shadowing, or added nearby heat sources. Quick check: compare copper/via patterns, plane continuity, and mechanical obstructions; capture identical workload heatmaps at the same Ta. Fix: restore heat-spreading geometry (copper, via array, planes) and enforce mechanical keep-outs for airflow. Pass criteria: ΔT(top) between revisions ≤ X °C at the same sustained window or effective θJA improves ≥ X% with the same setup.

Q: IR camera shows low temp but failures correlate with heat—what emissivity/spot error to suspect?

Likely cause: wrong emissivity on shiny surfaces, reflections, spot size larger than hotspot, or angle-induced under-read. Quick check: add a matte reference patch and cross-check with a thin thermocouple; ensure IR spot diameter < X× hotspot size. Fix: standardize IR procedure (reference patch, fixed distance/angle, calibration against contact sensor). Pass criteria: IR vs contact error ≤ X °C on the same marked spot at steady state.

Q: θJA from datasheet underestimates Tj by a lot—what JEDEC condition mismatch is typical?

Likely cause: datasheet θJA was measured on a JEDEC board/airflow/orientation that does not match the real PCB/enclosure. Quick check: compare θJA conditions to the actual stack; use ψJT/ψJB with measured Ttop/Tboard to estimate Tj. Fix: build a board-calibrated effective θ model from measurements under known power, then re-run margin/derating. Pass criteria: model predicts measured temperatures within ≤ X °C across two workloads at Ta(max).

Q: Fan increases airflow but hotspot gets worse—what airflow path or recirculation pattern causes this?

Likely cause: inlet-to-outlet bypass, recirculation loop, or a shadow zone that skips the hotspot and reuses warm air. Quick check: visualize airflow direction and bypass; measure Tin/Tout and hotspot Ttop under the same workload window. Fix: add ducting/baffles to force flow across the hotspot and separate inlet/outlet paths. Pass criteria: hotspot improves by ≥ X °C and ΔT(Tout−Tin) increases by ≥ X °C under the same sustained load.

Q: Low-power mode saves energy but increases temperature spikes after wake—what transient to log?

Likely cause: wake bursts (digital activity + analog re-lock) raise instantaneous power; average energy improves but peak thermal shock increases. Quick check: log power at ≥ X ksps-equivalent for the first Y ms after wake plus temperature slope; correlate spikes with wake events. Fix: stagger wake across ports/lane groups, limit wake rate, or add soft-start ramp and airflow pre-conditioning. Pass criteria: wake peak power ≤ X% of sustained budget and post-wake ΔT within Y seconds ≤ X °C.

Q: Thermal pad added but no improvement—what contact resistance or mounting pressure issue is common?

Likely cause: pad too thick/hard, uneven pressure, voids, or flatness mismatch preventing real contact at the hotspot. Quick check: inspect pad imprint; verify compression ratio and hotspot alignment with contact zone. Fix: choose appropriate thickness/compressibility, improve planarity/clamp force, or switch to tape/grease if allowed. Pass criteria: compression within X–Y% target and hotspot reduction ≥ X °C under identical sustained workload.

Q: Why does higher ambient cause more than linear power rise—what leakage/OTP behavior explains it?

Likely cause: leakage and bias currents rise with temperature; DC/DC efficiency may drop and protection behavior can shift operating point. Quick check: sweep Ta in steps and log input power, device temperature, and throttle flags; watch for acceleration near thresholds. Fix: derate by temperature bands and validate at Ta(max) using sustained windows. Pass criteria: sustained input power ≤ P_budget at Ta(max), Tj margin ≥ X °C, and no protection/pre-throttle within Y minutes.

Q: Port-to-port temperature spread is huge—what copper/heat-spreading asymmetry is most common?

Likely cause: unequal copper/via arrays, proximity to hot blocks, or uneven airflow distribution. Quick check: compare geometry and neighbors per port; capture synchronized heatmaps under identical duty and airflow. Fix: balance heat-spreading (planes/vias), isolate hot neighbors, and guide airflow for consistent inlet conditions. Pass criteria: ΔT(port max − port min) ≤ X °C at Ta(max) during sustained worst-case window.

← Back to:Interfaces, PHY & SerDes

Power & Thermal is about turning “per-Gbps” claims into an engineering budget: define measurement scope, model steady vs transient load, and map power into junction temperature with board-realistic thermal paths. The goal is to lock guardrails (cooling, derating, and pass criteria) so links remain stable across worst-case ambient, airflow, and aging.

Scope & non-overlap guardrails

This page treats power and thermal as an engineering contract: define comparable power metrics, map them to junction temperature, and choose package/cooling that preserves throughput margin across worst-case traffic, environment, and aging.

Covers: power → temperature → cooling decisions Guardrail: link out, do not expand

This page covers

Per-Gbps power definitions and comparable measurement/estimation records (mW/Gbps).
Thermal network usage (θJA/θJC/ψJT/ψJB) and how PCB/copper/airflow shift results.
Workload & link states (training/idle/low-power/full-rate sustained) as power waveforms and worst-case selection.
Package/cooling/layout choices and derating rules to maintain junction margin.
Production & field temperature rise validation, thermal failure signatures, and logging fields.

This page does NOT cover

Electrical eye/equalization mechanisms (only referenced as power inputs). Go to: Electrical Layer / CDR & Equalization
ESD/surge/EMI compliance design (only referenced for leakage/heat side-effects). Go to: PHY Robustness
Protocol versions/training state machines (only referenced for state transitions and wake energy). Go to: Protocol Compatibility
PTP/jitter/clocking theory (only referenced for PLL/clock-tree power). Go to: Timing & Synchronization

Mandatory inputs to declare (minimum reproducible record)

Link parameters

Link type (cable/backplane/multi-lane serial)
Line rate (Gbps) + active lane count
Traffic profile (duty/burst/idle ratio)

Feature & power rails

Feature flags (DFE/FEC/retime/EEE on/off)
Supply rails + voltage (VDDx) and measurement method
State label (training/locked/idle/LPM/recovery)

Thermal environment

Ambient (Ta) + airflow availability
Package + PCB class (copper/vias/spreading area)
Pass criteria placeholders (Tj margin X°C, case Y°C, τ)

Guardrail: any power number without this record is treated as non-comparable.

Diagram — Boundary map (solar-orbit view)

Use this boundary map to keep the page lean: sibling topics are referenced only as inputs or links, never expanded here.

Power taxonomy for PHY/SerDes

A comparable power discussion needs a shared dictionary: component blocks, link-state labels, PVT corners, and a minimum record that makes numbers portable across boards, labs, and vendors.

Component blocks (what consumes power)

Static / bias: baseline that shifts with temperature and process leakage.
Clock / PLL / CDR: reference conditioning and lock maintenance cost.
Tx driver: swing/pre-emphasis settings (treated as power knobs here).
Rx EQ (CTLE/DFE/ADC): often the dominant block when adaptive features are enabled.
DSP / FEC: scales with sustained throughput and activity duty.
Mgmt I/O: MDIO/I²C/GPIO/EEPROM access (small but persistent).

Record (minimum)

active lanes, line rate, feature flags, and which block is expected to dominate (Rx EQ vs DSP) for the target workload.

State power (what changes over time)

Training / re-lock: can create short spikes that drive supply and hotspot stress.
Locked / active: sustained operating point for thermal steady-state.
Idle: must be defined (payload idle vs electrical idle) to be comparable.
Low-power modes: reduce average power but introduce wake energy and recovery heat.

State log schema

state_name, entry_condition
steady_power, peak_power, duration
event_count (per hour/day)

Typical vs max (PVT corners)

Datasheet “max” is not automatically the worst-case workload. Worst-case thermal often comes from a specific combination of corner + state + activity duty.

Corner declaration template

Voltage: Vmin / Vnom / Vmax
Temperature: Tmin / Tnom / Tmax
Workload: steady / bursty / sleep-wake / retrain spikes
Feature flags: EQ/FEC/retime/EEE (on/off)

Measurement rules (MVR)

A power number is comparable only when the minimum reproducible record (MVR) is present. This prevents “per-Gbps” from becoming a marketing artifact.

device state label + duration to steady (τ)
line rate + active lane count
feature flags (DFE/FEC/retime/EEE)
rails + voltage + measurement method (shunt/PMBus/bench)
ambient Ta + airflow availability
package + PCB class (copper/vias/spreading area)

Default rule: per-Gbps uses line-rate. Payload-rate must declare coding/FEC overhead.

Diagram — Power breakdown stacks (components × states)

Use stacked blocks (not curves) to compare designs: align the MVR record, then identify which block dominates under the target state.

Per-Gbps metric done right

Per-Gbps power becomes an engineering metric only after the denominator and numerator are declared, overhead is explicit, lane scaling is stated, and peak versus sustained windows are reported with a steady-state rule.

Denominator: line-rate vs payload-rate

Line-rate is preferred for IC-to-IC comparison and port power budgeting.
Payload-rate requires overhead to be declared (coding/FEC/idle).

Required declaration

denominator = line_rate or payload_rate; overhead = X% (coding/FEC/idle); duty = Y%

Numerator: chip vs board-level

Chip power: sum of declared rails (best for device comparison).
Board-level: includes DC/DC loss (better for system budgeting).

Allowed output labels

mW/Gbps (chip)
mW/Gbps (board)

Lane scaling: linear vs non-linear

Low-lane configurations often show worse mW/Gbps because fixed costs (bias, clocking, management) cannot be amortized.

Minimal model

P_total = P_fixed + N_lanes × P_lane → mW/Gbps depends on N_lanes

Peak vs sustained (thermal relevance)

Peak window catches transition spikes (power integrity and hotspot stress).
Sustained window must reach thermal steady-state for reliability decisions.

Steady-state rule (placeholders)

dT/dt < X °C/min over Y minutes
or power variation < X% over Y minutes

Do

Declare denominator (line vs payload) and overhead (coding/FEC/idle) + duty.
Declare numerator scope (chip rails or board-level boundary) and method.
Report peak window and sustained window with a steady-state rule.

Don’t

Use payload-rate without explicit overhead and duty (non-comparable).
Mix DC/DC, fan, or terminations into chip mW/Gbps (scope violation).
Quote burst-only mW/Gbps as sustained power (thermal risk).

Diagram — Per-Gbps definition decision tree

Decision output must be a labeled metric (scope + denominator). Any missing record element makes cross-device comparisons invalid.

Workload model for links

Link power is a waveform driven by duty, burstiness, and retrain probability. A workload model converts traffic to peak and sustained power windows that can be validated against temperature rise and derating thresholds.

Workload primitives (measurable)

Duty (%): active time within a window.
Burstiness: burst length + idle gaps (avg/p95/max).
Retrain rate: events per hour (or per day) under temperature drift.

Record (placeholders)

duty = X%; burst = (avg/p95/max) ms; gap = (avg/p95/max) ms; retrain = Y events/hour

Power-sensitive activity (observables)

Adaptive updates can appear as small periodic steps in average power.
Re-lock events appear as spikes followed by a new steady plateau.
Queue/timestamp activity appears as duty increase and higher baseline.

Rule: workload templates are identified by waveform shape before choosing pass criteria.

W1 — Steady full-rate

Profile: duty ≈ X%; long continuous bursts; retrain ≈ Y/hour

Power impact: drives thermal steady-state and worst-case junction.

Log fields: steady_power, Ta/airflow, τ, Tcase, state_name

Use for: heatsink/airflow sizing and derating rule validation.

W2 — Bursty traffic

Profile: duty ≈ X%; burst/gap defined (avg/p95/max); retrain low

Power impact: average rises with duty; peak follows burst edges.

Log fields: duty, peak_power, burst_len, gap, power variance

Use for: average power budgeting and temperature ramp-rate checks.

W3 — Idle + wake cycles

Profile: long idle gaps; frequent wake events; short active bursts

Power impact: low average, but wake peak and recovery energy dominate stress.

Log fields: wake_count, peak_power, recovery_duration, state transitions

Use for: PSU transient margin and hotspot fatigue risk screening.

W4 — Retrain spikes (temperature drift)

Profile: sustained traffic with occasional re-lock spikes; retrain = Y/hour

Power impact: spikes raise peak and may shift the steady plateau upward.

Log fields: event_count, spike_amplitude, spike_duration, Ta drift

Use for: field drop-risk prediction and derating guardrails.

Diagram — Workload power waveforms (W1–W4)

Assign per-Gbps numbers to a workload template (W1–W4). Sustained power is valid only after the steady-state rule is met.

Thermal fundamentals you actually use

Thermal decisions become repeatable when power is tied to a workload window and the temperature path is expressed as a network with declared boundary conditions. This section turns θ/ψ parameters into practical equations and measurement records.

One-line estimate (screening)

T_j = T_a + P × θ_JA

Use for early feasibility screening. Final sign-off requires a declared test condition and a measured temperature proxy.

T_a: ambient at a declared location (not “room temperature”).
P: dissipated power tied to workload (W1–W4) and window (peak/sustained).
θ_JA: effective junction-to-ambient under a specific board/airflow/orientation.

θ pitfalls (why θJA is not a constant)

Datasheet θ values are valid only under stated boundary conditions. Changing PCB copper, airflow, or mounting orientation can shift θJA enough to invalidate cross-board comparisons.

Declare test conditions (placeholders)

JEDEC board class: X (layers/copper thickness/openings)
Airflow: X m/s; orientation: horizontal/vertical
Measurement method: natural/forced convection
Power window: peak X ms; sustained ≥ X×τ

Rule: θ values without test conditions are treated as non-portable.

Using θJC / ψJT / ψJB (measured → junction)

When junction sensing is unavailable, estimate T_j using a defined temperature proxy and the appropriate parameter for that proxy.

T_j ≈ T_case + P × θ_JC

Use when case definition and contact are controlled (heatsink/interface conditions declared).

T_j ≈ T_top + P × ψ_JT

Common for board bring-up. Valid only with a defined T_top measurement point and boundary condition.

T_j ≈ T_board + P × ψ_JB

Useful when board sensors exist near the package. Sensor location must be specified (distance/side).

Guardrail

ψ parameters are characterization values. They must be tied to a measurement point and condition; otherwise, back-calculated T_j is not portable.

Multi-source effects (keep it measurable)

Neighbor heat sources and shielding can raise local ambient and shift hot-spot behavior. Treat them as declared inputs, not surprises.

Neighbor record (placeholders)

neighbor_power = X W; distance = X mm
shielding_present = yes/no; airflow_path = open/blocked
Ta_local measured at X location; window = sustained

Practical approximation: local ambient dominates error more often than the θ parameter itself.

Diagram — Thermal resistance network (junction / case / board / ambient)

Use θ values only with declared boundary conditions. Use ψ values only with a defined measurement point when estimating T_j.

Package & PCB heat-spreading choices

Package thermal behavior is defined by its dominant heat path and the PCB it is mounted on. This section provides a package-to-PCB selection logic that stays in the thermal/manufacturing domain (no electrical-layer discussions).

Selection anchor (placeholders)

θ_JA,target = ΔT_allowed / P

Pick package + PCB spreading that can meet θ_JA,target under the declared airflow/orientation.

ΔT_allowed: junction margin to derating threshold (X °C placeholder)
P: sustained workload power (W1–W4 + window)

Heat paths (what matters)

Bottom path: die → pad/balls → solder → vias → planes (often dominant).
Top path: die → mold/top → ambient (improves with contact/heatsink).
Spreading: copper planes convert a hotspot into a lower-gradient field.

Key PCB levers (placeholders)

copper_spread_area = X mm²; plane_layers = X
via_count = X; via_pitch = X; backside_copper = yes/no
keepout/slot = yes/no (controls isolation vs spreading)

Manufacturing & reliability coupling

Thermal performance is affected by assembly quality. Voiding and warpage can raise effective thermal resistance and shift hot-spot location.

Process record (placeholders)

voiding_percent = X%; reflow_profile_id = X
warpage_flag = yes/no; rework_cycles = X

Package quick cards (thermal + manufacturing only)

QFN (exposed pad)

Heat path: strong bottom conduction through exposed pad.

PCB needs: via array + large copper spreading region.

Process risks: voiding under pad; paste/thermal-via tuning.

Suitable power: < X W or X–Y W (placeholder; depends on copper/airflow)

QFP

Heat path: weaker bottom conduction; more dependence on convection.

PCB needs: copper spreading helps but is less direct than pad/balls.

Process risks: lead coplanarity; airflow sensitivity.

Suitable power: < X W (placeholder; depends on airflow and surface area)

BGA

Heat path: improved bottom conduction via ball array into PCB planes.

PCB needs: plane layers + via stitching for spreading.

Process risks: warpage control; inspection complexity.

Suitable power: X–Y W or > Y W (placeholder; strongly PCB-dependent)

FCBGA

Heat path: high power-density handling with stronger, controllable conduction paths.

PCB needs: robust planes, thermal via strategy, and controlled stack-up.

Process risks: cost and assembly control; warpage/underfill tradeoffs.

Suitable power: > Y W (placeholder; validate with θ_JA,target)

WLCSP

Heat path: low thermal mass; highly sensitive to PCB spreading and local hotspots.

PCB needs: careful copper strategy; strong control of nearby heat sources.

Process risks: board-level reliability and rework constraints.

Suitable power: < X W (placeholder; validate with real board conditions)

Diagram — Package + PCB thermal conduction cross-section (concept)

Package thermal performance is a package × PCB function. Use θ_JA,target to anchor selection, then validate with measured proxies (Ttop/Tboard).

Cooling options and when they backfire

Cooling is not a checklist of parts. Each method has a boundary condition where it becomes ineffective or moves heat into a more sensitive area. This section focuses on actionable “works when…” criteria and “backfires when…” failure triggers.

Natural vs forced convection

Works when

Air path reaches the hotspot (no bypass short-circuit).
Boundary layer is disrupted at the hotspot surface.
Intake air temperature is controlled/declared.

Backfires when

Airflow bypasses the device (high system CFM, low local flow).
Fan is present but hotspot boundary layer remains intact.
Hot recirculation raises local ambient around the device.

Quick checks (placeholders)

ΔT_in-out = X °C (intake vs exhaust)
Local hotspot airflow present: yes/no
Hotspot location shifts with airflow direction: yes/no

Heatsink + contact + TIM

Works when

Contact resistance is minimized (flatness + controlled pressure).
TIM thickness/compression is within the intended range.
Mechanical retention maintains pressure across thermal cycling.

Backfires when

Large heatsink but poor contact (θ_contact dominates).
Clamp pressure relaxes (vibration / thermal cycling).
TIM pumps out or ages → gradual temperature drift in the field.

Quick checks (placeholders)

Pre/post re-test ΔT_top drift ≤ X °C
Contact area evidence (inspection): pass/fail
Thermal decay after stop-load is “fast” or “slow” (qualitative)

Chassis conduction (when heat “moves”)

Works when

Heat is routed to a large thermal mass away from sensitive parts.
Interface stack is controlled (pad thickness + compression).
Resulting gradients are reduced, not relocated.

Backfires when

Thermal “short” conducts heat into a more sensitive component zone.
Hotspot relocates (device cools, nearby critical part heats up).
Local gradients increase → hidden mechanical stress risk.

Quick checks (placeholders)

Max temperature location changes after adding chassis path: yes/no
Sensitive-zone ΔT increase ≤ X °C (placeholder)
Gradient metric ≤ X °C/cm (placeholder)

Fan strategy (field degradation aware)

Works when

Control targets temperature margin, not maximum RPM.
Intake filtering and maintenance are designed-in.
Fan health is monitored and logged.

Backfires when

Dust clogging reduces airflow over time → hidden drift.
Noise constraints force lower RPM without re-validating margin.
Fan stalls/degrades with no telemetry → misdiagnosed failures.

Log fields (placeholders)

fan_rpm, pwm_duty, alarm_count
intake_temp, exhaust_temp, Ta_local
maintenance_age_days (or filter ΔP if available)

Selection matrix (goal → recommended bundle)

Low cost

Copper spreading + via array
Air path cleanup (avoid bypass)
Declare boundary conditions

Backfire check: hotspot local airflow = yes/no; ΔT_in-out within expected range.

Low noise

Maximize spreading first
Low-RPM fan + clean ducting
Control contact resistance (TIM + pressure)

Backfire check: contact-driven drift (pre/post ΔT_top ≤ X °C).

High reliability

Fan telemetry + alarms + maintenance plan
Dust mitigation (filtering/accessible service)
Chassis conduction with sensitive-zone guard

Backfire check: hotspot migration = no; sensitive-zone ΔT ≤ X °C.

Diagram — Cooling selection matrix (2×2)

The matrix is anchored by power density and airflow constraints. Each quadrant adds a “backfire” tag to force a verification check.

Thermal-aware bring-up & validation

Thermal validation becomes meaningful only when measurement points are defined, steady-state is gated, workload is repeatable, and T_j is derived via the appropriate θ/ψ mapping. This section provides a bring-up flow that is production- and field-friendly.

Define conditions

Ambient and location: T_a measured at X point (placeholder).
Airflow mode: fixed RPM or closed-loop control (declared).
Workload template: W1–W4 + window (peak/sustained).
Power scope: chip-only vs board-including (label).

Pitfall

Unfixed conditions produce non-comparable results and false “randomness”.

Measure points & sensors

Minimum measurement set

T_top/T_case (top-of-package)
T_board (nearby copper region)
T_in/T_out (intake/exhaust)
T_a (ambient; declared location)

Sensor notes (high-impact)

Thermocouple attachment method must be consistent (adhesive/coverage).
IR camera requires emissivity declaration (placeholder) and reflection control.
On-board sensors are position-biased → record location and distance.

Gate steady-state

Use a steady-state gate based on time constant and temperature slope. Avoid taking a snapshot during transient warm-up.

Gating criteria (placeholders)

Wait time ≥ X × τ (placeholder)
dT/dt < X °C/min over Y min (placeholder)
Re-run workload template if retrain/error spikes occurred

Convert & decide

Tj conversion (choose one)

T_j ≈ T_top + P × ψ_JT
T_j ≈ T_board + P × ψ_JB
T_j ≈ T_case + P × θ_JC (case defined)

Pass criteria (placeholders)

T_j margin ≥ X °C
T_case ≤ X °C
Hotspot gradient ≤ X °C/cm

Record fields (placeholders)

workload_id (W1–W4), window_type (peak/sustained)
steady_state_met (yes/no), retrain_spikes (count)
Ttop, Tboard, Tin, Tout, Ta, P_total (scope label)

Diagram — Thermal validation flow

The flow enforces steady-state gating and a declared θ/ψ mapping before pass/fail decisions are made.

Power integrity & thermal coupling

This section treats power delivery only through a power/thermal lens: conversion efficiency, transient spikes, protection-driven derating, and the temperature-to-power positive feedback loop. It avoids electrical-layer tuning details.

DC/DC efficiency becomes board heat

Even if chip power is unchanged, input-side loss turns into extra heat near the power stage and copper planes. A practical placeholder range is +10–20% board heat overhead, depending on efficiency and loading.

Thermal viewpoint checklist

Declare DC/DC efficiency at the tested load (η = X%, placeholder).
Log converter temperature zone near hotspots (T_pwr, placeholder).
Separate “chip power” vs “board-including power” in reports.

Transient spikes distort power and heat readings

Training / wake / relock windows can create short power spikes. These can cause voltage ripple and measurement aliasing, leading to under-estimated sustained heat or over-estimated per-Gbps numbers if sampling windows are not aligned.

Quick “sanity gates” (placeholders)

Peak window length = X ms; sustained window length = Y s.
Spike count per minute ≤ X (placeholder).
Ripple proxy / variance metric flagged: yes/no.

Protection-driven derating (thermal safety valves)

Over-temperature protection and throttling are “breakers” that prevent runaway. They also change observed power, so validation must log whether the system is in normal, throttled, or degraded mode.

Common breakers (strategy only)

OTP threshold reached → throttle / cut activity (placeholder T).
Lane disable / reduced duty cycle to preserve temperature margin.
Low-power modes (e.g., energy saving) to reduce sustained heating.

Temperature → leakage → power (positive feedback)

As temperature rises, leakage and bias shifts can increase power, especially in advanced processes. This creates a positive loop that must be broken by airflow, throttling, or derating rules.

What to log (minimal set)

T_top/T_j proxy, power scope label (chip/board)
Throttle state (normal/throttled), lane count active
Converter η (placeholder), local power-stage temperature

Thermal–electrical coupling loop (explain + break)

A practical way to prevent misleading power numbers is to explicitly label the loop states: normal → warming → near-limit → throttled. Validation should record which “breaker” is active (airflow, throttling, derating) when the temperature slope changes.

Diagram — Thermal–electrical positive feedback loop with breakers

The loop is intentionally simple: it enforces scope labeling and breaker logging rather than electrical tuning details.

Design guardrails & derating rules

This section turns the page into reusable engineering guardrails: power budgets across chip/port/system, thermal budgets with explicit margin, and derating rules that map temperature back to allowable throughput or duty-cycle.

Three-layer power budget

Chip: silicon-only power (declared state + temperature)
Port: per-port / per-lane budget (scales, but not perfectly linear)
System: DC/DC loss + fan + chassis paths (board-including)

Guardrail rule: reports must label scope (chip vs board-including).

Thermal budget (with explicit margin)

T_a(max) = X °C (placeholder)
T_j(max allowed) = X °C (placeholder)
Reserve margin = X °C (placeholder)

Guardrail rule: validation must demonstrate margin under sustained workload, not only peak snapshots.

Derating rule model (placeholder)

Use a declared linear model to map temperature to allowed activity: for each +ΔT, reduce duty-cycle or throughput by a fixed step.

Example format (placeholders)

If T ≥ T_guard, then duty = duty − k × (T − T_guard)
If T ≥ T_crit, then throttle state = ON
If T returns below T_recover, then ramp back with hysteresis

Copyable guardrails (If / Then)

If system power scope is used, then DC/DC efficiency (η) and fan power must be logged.
If peak windows are reported, then sustained windows (Y s) must be reported with steady-state gating.
If T_j margin < X °C, then reduce duty-cycle or throughput per the declared derating model.
If hotspot relocates after adding chassis conduction, then redesign thermal path to protect sensitive zones.
If P/Gbps exceeds X mW/Gbps (placeholder), then upgrade package / spreading / airflow bundle before lock-in.
If throttling is observed in validation, then published performance must be labeled “throttled-mode” or redesigned.
If dust/aging is expected, then add maintenance fields (age, filter status) and re-validate margins over time.
If any budget changes, then the loop (budget → Tj → validate → derate/redesign) must be re-run and re-locked.

Diagram — Budget → thermal → derate loop (spec lock)

The loop enforces re-validation whenever a budget, cooling bundle, or workload assumption changes.

Applications (Power/Thermal-only)

Applications are framed strictly as “how thermal constraints reshape system design.” No protocol details are expanded here. Each card provides: scenario → thermal pain points → design actions → validation items.

Material numbers note: examples below are common, widely stocked families. Always verify thickness / size / suffix / safety rating / availability against your mechanical stack and compliance needs.

Industrial long-line gateway (sealed box + high Ta + sustained load)

High Ta Sustained duty Airflow limited

Thermal pain points: steady-state heat soak, converter loss adding board heat, hotspots trapped under shields, and margin collapse when dust aging reduces airflow.

Design actions: define a front-to-back airflow path (even “pseudo-ducts” with foam), isolate power-stage heat from PHY/SerDes, and enforce a derating curve tied to measured surface temperature.

Validation items (placeholders)

Steady-state gate: wait ≥ X·τ before declaring pass/fail.
Record: Ta, Tin/Tout, Ttop, Tboard-hotspot, throttle state, DC/DC η (placeholder).
Pass: Tj margin ≥ X °C; surface ≤ X °C; hotspot gradient ≤ X °C/cm.

Example materials / part numbers (verify fit)

Thermal tape: 3M 8810 / 3M 8815 (bonding, thin interface).
Thermal pad family: (Henkel/Bergquist) GAP PAD 1500 series (thickness-dependent variants).
Heatsink (example): Aavid Thermalloy 576802B00000G (board-level clip-on style).
Fan family (example): Delta AFB series (size/speed variants); SUNON MagLev series (variants).

Camera links (small volume + remote power heat concentration)

Small enclosure Heat concentrated Shield conflict

Thermal pain points: hotspot under RF/EMI shields, heat path competing with shielding, and skin temperature constraints (touch-safe surface).

Design actions: create a defined conduction path (chip → spreader → chassis), avoid “thermal shorting” into sensitive optics, and prefer pads/tapes that keep thickness stable over aging.

Validation items (placeholders)

Surface temperature: T_surf ≤ X °C (touch-safe placeholder).
Measure Ttop at shielded hotspots; verify gradient vs chassis region.
Run steady + burst template to detect wake/processing heat spikes.

Example materials / part numbers (verify fit)

Thermal tape: 3M 8810 (thin bonding to chassis/spreader).
Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose thickness for stack).
Thermal grease (example): DOWSIL TC-5022 (industrial TIM family; verify grade/spec).
Shield interface: conductive foam + thermal pad stack (select by compliance need).

Server retimers (high lane count + high heat density)

High heat density Airflow-dependent Sustained workloads

Thermal pain points: heatsink attachment quality (contact resistance), airflow shadowing, and “best-looking per-Gbps” that collapses once throttling engages under sustained traffic.

Design actions: prioritize mechanical repeatability (clip/torque), enforce airflow directionality, and require telemetry / temperature flags for production correlation.

Validation items (placeholders)

Require “sustained” window: Y minutes at worst duty + worst Ta.
Log: Ttop/Tboard, airflow state, heatsink attachment lot/process.
Pass: no throttling within X margin; stable temperature slope.

Example materials / part numbers (verify fit)

Heatsink (example): Aavid Thermalloy 576802B00000G (example family; choose footprint).
Thermal tape: 3M 8815 (higher thickness variant vs 8810).
Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (compressibility helps tolerance).
Fan family: Delta AFB series / Nidec UltraFlo series (select by airflow/noise targets).

Automotive (temperature cycling + lifetime + degradation)

Cycling & aging Derating required Logging & fallback

Thermal pain points: repeated thermal stress changes contact resistance and pad performance; field conditions drift beyond lab assumptions; failures can be intermittent and correlation requires telemetry.

Design actions: enforce conservative margins, define derating by temperature bands, and implement graceful degradation modes (reduced lane count / reduced duty) tied to measured temperature.

Validation items (placeholders)

Cycle test: ΔT range = X °C; cycles = N (placeholders).
Log: max Ttop, time-over-threshold, derate events, recovery hysteresis.
Pass: no uncontrolled throttling; predictable derate curve; stable hotspot mapping.

Example materials / part numbers (verify fit)

Thermal tape: 3M 8810 / 3M 8815 (process repeatability; verify temp rating).
Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose for cycling compliance).
Thermal grease (example): DOWSIL TC-5022 (verify grade/spec for automotive constraints).
Heatsink (example): Aavid Thermalloy 576802B00000G (verify vibration/attachment needs).

Diagram — Scenario → thermal constraints mapping (card-style)

Each scenario card intentionally shows only thermal constraints, actions, and validation signals.

IC selection logic (Power/Thermal)

This is a decision workflow (not a product list). It starts with required inputs, computes junction temperature feasibility, then decides package/cooling tier and whether derating/telemetry is mandatory.

Required inputs (must-fill)

Line rate + lane/port count
Duty-cycle / workload template (steady vs burst)
Ta(max), airflow availability, enclosure type
Board area + z-height for heatsink/duct
Allowed surface temperature (Tsurf limit)

Missing inputs = misleading mW/Gbps comparisons.

Power/Thermal metrics to demand

mW/Gbps with declared scope (chip-only vs board-including)
Pmax at Ta(max) (peak vs sustained windows labeled)
θ/ψ metrics with test conditions (board, copper, airflow)
Package thermal path features (EPAD / BGA heat spread)
OTP + thermal telemetry availability (verify in datasheet)
Low-power state benefit vs wake spike cost (thermal shock)

Red flags (risk markers)

No θ/ψ test conditions published
No power vs rate/state/temperature curves (or equivalent)
Only “typical” power; no max/corners labeling
No OTP or no measurable thermal/derate status
Peak-only claims without sustained thermal gating

Example IC material numbers (for comparison workflows)

These are examples only (not recommendations). Use them to build a consistent power/thermal comparison sheet and validate availability, package, and suffix.

PCIe retimer / redriver

TI DS80PCI402
TI DS160PR412
TI DS280DF810

USB redriver / hub (examples)

TI TUSB522
TI TUSB1046
Microchip USB5744 (hub family example)

Ethernet PHY (examples)

TI DP83822I
Microchip LAN8840
NXP TJA1103 (automotive Ethernet PHY example)

Thermal management materials

3M 8810 / 3M 8815 (thermal tape)
(Henkel/Bergquist) GAP PAD 1500 series
Aavid 576802B00000G (heatsink example)
DOWSIL TC-5022 (TIM example)

Diagram — Power/Thermal selection decision tree

The workflow forces declared measurement scope and sustained thermal gating before any “per-Gbps” comparison is accepted.

Request a Quote

Name

Company

Part Number(s) / BOM

Quantity & Target Lead Time

Alternates Allowed

Temperature Grade

Package / Footprint

Compliance

Budget Window

Lot Size / Qty

Message

Attachment

Accepted Formats

pdf, csv, xls, xlsx, zip

Attachment

Drag & drop files here or use the button below.

FAQs (Power & Thermal)

Troubleshooting is scoped strictly to power definitions, thermal metrics, cooling effectiveness, derating behavior, and measurement traps. Each answer is intentionally short and executable.

Answer format: Likely cause / Quick check / Fix / Pass criteria (thresholds use X/Y placeholders)

“Datasheet per-Gbps looks great, but system power is 30% higher”—first accounting check?

Likely cause: mismatched scope (chip-only vs board-including), wrong denominator (line rate vs payload), or missing losses (DC/DC η, LDO drop, termination, clocks).

Quick check: measure VIN×IIN at board input and compare against summed rail power at the IC load points; tag state: lanes, rate, FEC/DFE enable, idle vs sustained window.

Fix: lock a single comparison template: scope (chip/board), denominator (line/payload), window (peak/sustained), and always include conversion losses using measured DC/DC efficiency at that load point.

Pass criteria: accounting gap ≤ X% (board-in vs sum-of-rails), and reported mW/Gbps varies ≤ X% across repeated runs with identical state tags.

“Works for 5 minutes then drops”—is it thermal throttling or link instability?

Likely cause: steady-state thermal rise crossing a throttle/OTP threshold, or temperature-driven leakage pushing power higher until protection engages.

Quick check: log a timeline of Ttop/Tboard, input power, and throttle/derate flags; look for “drop event” alignment with a temperature knee or a power step.

Fix: reproduce with a controlled workload window (sustained Y min), add airflow or heatsink contact improvement, and apply a derate rule before the threshold (temperature hysteresis included).

Pass criteria: no throttle/OTP during Y minutes sustained worst-case workload at Ta(max), and temperature slope |dT/dt| ≤ X °C/min after reaching steady state.

Same throughput, different PCB revision runs hotter—what board-level variable usually changed?

Likely cause: reduced heat spreading (copper area/planes), fewer or smaller thermal vias, changed component placement creating an airflow shadow, or added nearby heat sources (DC/DC, shield can).

Quick check: compare rev-A vs rev-B: copper pour under the package, via count/pitch, inner-plane continuity, and physical obstructions above the device; capture identical workload heatmaps at the same Ta.

Fix: restore thermal path symmetry (spread copper, add via array, keep planes continuous), and enforce a mechanical keep-out to protect airflow over the hotspot.

Pass criteria: ΔT(top) between revisions ≤ X °C at the same sustained window, or demonstrated θJA improvement ≥ X% with the same test setup.

IR camera shows low temp but failures correlate with heat—what emissivity/spot error to suspect?

Likely cause: wrong emissivity on shiny surfaces, reflections from hot neighbors, spot size larger than the hotspot, or viewing angle causing under-read.

Quick check: place matte tape/paint dot on the package top, re-measure with emissivity set accordingly, and cross-check with a thin thermocouple; confirm the IR spot diameter < X× hotspot size.

Fix: standardize IR procedure: emissivity reference patch, fixed distance/angle, and a calibration step against a contact sensor on the same surface.

Pass criteria: IR vs contact measurement error ≤ X °C on the same marked spot under steady-state conditions.

θJA from datasheet underestimates Tj by a lot—what JEDEC condition mismatch is typical?

Likely cause: θJA was measured on a specific JEDEC board (copper, layers, airflow, orientation) that does not match the real PCB (smaller copper, blocked airflow, enclosure).

Quick check: read the datasheet θJA conditions (board type, copper area, airflow) and compare to the actual stack; use ψJT/ψJB with measured Ttop/Tboard to back-calculate Tj.

Fix: replace single θJA with a board-calibrated model: measure Ttop and Tboard under known power, derive an effective θ for that assembly, then re-run margin and derate rules.

Pass criteria: effective θ model predicts measured Ttop/Tj within ≤ X °C across two workloads (steady + burst) at Ta(max).

Fan increases airflow but hotspot gets worse—what airflow path or recirculation pattern causes this?

Likely cause: short-circuit airflow (inlet to outlet bypass), recirculation loop, or a new “shadow zone” where high-speed flow skips the hotspot and pulls warm air back over it.

Quick check: run a simple airflow visualization (smoke thread / tissue / anemometer points) to confirm direction and bypass; measure Tin/Tout and hotspot Ttop with the same workload window.

Fix: add ducting/foam baffles to force flow across the hotspot, separate inlet/outlet, and remove obstructions that create a local recirculation pocket.

Pass criteria: hotspot temperature improves by ≥ X °C and ΔT(Tout−Tin) increases by ≥ X °C (indicating useful heat extraction) under the same sustained load.

Low-power mode saves energy but increases temperature spikes after wake—what transient to log?

Likely cause: wake causes a short burst of digital activity and analog re-lock that raises instantaneous power; average energy improves but peak thermal shock increases.

Quick check: log power at high time resolution (≥ X ksps equivalent) for the first Y ms after wake, plus temperature slope; correlate spikes with wake events.

Fix: stagger wake (ports/lane groups), limit wake burst rate, or pre-condition airflow; if allowed, add a soft-start ramp in firmware/power sequencing.

Pass criteria: peak power during wake ≤ X% of sustained power budget, and post-wake ΔT within Y seconds ≤ X °C (thermal shock bound).

Thermal pad added but no improvement—what contact resistance or mounting pressure issue is common?

Likely cause: pad too thick/hard (insufficient compression), uneven pressure, air gaps/voids, or a flatness mismatch that prevents real contact at the hotspot.

Quick check: inspect pad “imprint” after disassembly; verify compression ratio (actual thickness vs nominal), and check if the hotspot aligns with the contact zone.

Fix: choose a pad with appropriate softness/compressibility and thickness tolerance; improve mounting planarity and clamp force; consider tape/grease if stack height allows.

Pass criteria: confirmed compression within X–Y% target range and hotspot reduction ≥ X °C under identical sustained workload.

Why does higher ambient cause more than linear power rise—what leakage/OTP behavior explains it?

Likely cause: temperature increases leakage and bias currents, raising power; higher temperature can also reduce DC/DC efficiency and trigger pre-throttle behavior that changes operating point.

Quick check: sweep Ta in steps (e.g., +10 °C) and log input power, device temperature, and any throttle flags; check whether power rise accelerates near a protection threshold.

Fix: apply derating by temperature bands, ensure airflow margin at Ta(max), and re-validate power budget using the highest-Ta sustained window (not a short burst).

Pass criteria: at Ta(max), sustained input power ≤ P_budget and Tj margin ≥ X °C; no protection/pre-throttle activation within Y minutes steady window.

Port-to-port temperature spread is huge—what copper/heat-spreading asymmetry is most common?

Likely cause: unequal copper planes/via arrays, one port located near a hot converter or shield edge, or uneven airflow distribution across the board.

Quick check: compare each port’s copper/via pattern and nearby heat sources; measure a synchronized heatmap under the same traffic/duty and confirm airflow direction across all ports.

Fix: balance heat spreading geometry (plane continuity, via density), move or isolate nearby heat sources, and use airflow guides so each port sees similar inlet temperature.

Pass criteria: ΔT(port max − port min) ≤ X °C at the sustained worst-case window and Ta(max).

DC/DC runs cool but PHY is hot—what efficiency/load point misleads the measurement?

Likely cause: power was measured at the wrong node (before LDOs or after sense resistors), or DC/DC is off the hotspot while the IC dissipates most heat locally; efficiency curve may be poor at transient load.

Quick check: measure rail power at the IC load points (V at the pins + I into the rail) during both peak and sustained windows; compare against VIN×IIN to reveal hidden drops/losses.

Fix: instrument power with correct sense locations (Kelvin), include all intermediate losses (LDO, cable, connector), and use a converter operating region that matches the sustained load point.

Pass criteria: power discrepancy between VIN×IIN and sum of rail powers ≤ X%, and sustained rail ripple/voltage droop stays within X (unit placeholder) during retrain/wake events.

Production passes at 25°C but field fails in summer—what margin number is usually missing?

Likely cause: qualification used short bursts (not sustained), did not test Ta(max), ignored airflow degradation (dust/blocked vents), or lacked an explicit Tj/surface margin requirement.

Quick check: replicate a field-like sustained duty at elevated Ta and reduced airflow; log time-over-temperature, throttle events, and peak-to-steady temperature rise.

Fix: define acceptance at Ta(max) with a sustained window and explicit margin, then implement derating before the limit; add telemetry/log fields for correlation in the field.

Pass criteria: at Ta(max) and “degraded airflow” condition, Tj margin ≥ X °C, surface ≤ X °C, and no throttle/OTP during Y minutes sustained duty.

Power & Thermal: Per-Gbps Power and Cooling for PHY/SerDes

Power & Thermal: Per-Gbps Power and Cooling for PHY/SerDes

Scope & non-overlap guardrails

This page covers

This page does NOT cover

Mandatory inputs to declare (minimum reproducible record)

Power taxonomy for PHY/SerDes

Component blocks (what consumes power)

State power (what changes over time)

Typical vs max (PVT corners)

Measurement rules (MVR)

Per-Gbps metric done right

Denominator: line-rate vs payload-rate

Numerator: chip vs board-level

Lane scaling: linear vs non-linear

Peak vs sustained (thermal relevance)

Do

Don’t

Workload model for links

Workload primitives (measurable)

Power-sensitive activity (observables)

W1 — Steady full-rate

W2 — Bursty traffic

W3 — Idle + wake cycles

W4 — Retrain spikes (temperature drift)

Thermal fundamentals you actually use

One-line estimate (screening)

θ pitfalls (why θJA is not a constant)

Using θJC / ψJT / ψJB (measured → junction)

Multi-source effects (keep it measurable)

Package & PCB heat-spreading choices

Selection anchor (placeholders)

Heat paths (what matters)

Manufacturing & reliability coupling

Package quick cards (thermal + manufacturing only)

Cooling options and when they backfire

Natural vs forced convection

Heatsink + contact + TIM

Chassis conduction (when heat “moves”)

Fan strategy (field degradation aware)

Selection matrix (goal → recommended bundle)

Thermal-aware bring-up & validation

Define conditions

Measure points & sensors

Gate steady-state

Convert & decide

Power integrity & thermal coupling

DC/DC efficiency becomes board heat

Transient spikes distort power and heat readings

Protection-driven derating (thermal safety valves)

Temperature → leakage → power (positive feedback)

Thermal–electrical coupling loop (explain + break)

Design guardrails & derating rules

Three-layer power budget

Thermal budget (with explicit margin)

Derating rule model (placeholder)

Copyable guardrails (If / Then)

Applications (Power/Thermal-only)

Industrial long-line gateway (sealed box + high Ta + sustained load)

Camera links (small volume + remote power heat concentration)

Server retimers (high lane count + high heat density)

Automotive (temperature cycling + lifetime + degradation)

IC selection logic (Power/Thermal)

Required inputs (must-fill)

Power/Thermal metrics to demand

Red flags (risk markers)

Example IC material numbers (for comparison workflows)

Recommended topics you might also need

Request a Quote

Accepted Formats

Attachment

FAQs (Power & Thermal)

Explore

Categories

Get in Touch