Power & Thermal: Per-Gbps Power and Cooling for PHY/SerDes

Power & Thermal is about turning “per-Gbps” claims into an engineering budget: define measurement scope, model steady vs transient load, and map power into junction temperature with board-realistic thermal paths. The goal is to lock guardrails (cooling, derating, and pass criteria) so links remain stable across worst-case ambient, airflow, and aging.

Scope & non-overlap guardrails

This page treats power and thermal as an engineering contract: define comparable power metrics, map them to junction temperature, and choose package/cooling that preserves throughput margin across worst-case traffic, environment, and aging.

Covers: power → temperature → cooling decisions. Guardrail: link out, do not expand.

This page covers

  • Per-Gbps power definitions and comparable measurement/estimation records (mW/Gbps).
  • Thermal network usage (θJA/θJC/ψJT/ψJB) and how PCB/copper/airflow shift results.
  • Workload & link states (training/idle/low-power/full-rate sustained) as power waveforms and worst-case selection.
  • Package/cooling/layout choices and derating rules to maintain junction margin.
  • Production & field temperature rise validation, thermal failure signatures, and logging fields.

This page does NOT cover

Mandatory inputs to declare (minimum reproducible record)

Link parameters

  • Link type (cable/backplane/multi-lane serial)
  • Line rate (Gbps) + active lane count
  • Traffic profile (duty/burst/idle ratio)

Feature & power rails

  • Feature flags (DFE/FEC/retime/EEE on/off)
  • Supply rails + voltage (VDDx) and measurement method
  • State label (training/locked/idle/LPM/recovery)

Thermal environment

  • Ambient (Ta) + airflow availability
  • Package + PCB class (copper/vias/spreading area)
  • Pass criteria placeholders (Tj margin X°C, case Y°C, τ)

Guardrail: any power number without this record is treated as non-comparable.

Diagram — Boundary map (solar-orbit view)

Center: Power & Thermal scope. Sibling topics: Electrical Layer (link only), PHY Robustness (input only), Timing & Sync (input only), Protocol Compat. (link only), CDR & Equalization (input only).

Use this boundary map to keep the page lean: sibling topics are referenced only as inputs or links, never expanded here.

Power taxonomy for PHY/SerDes

A comparable power discussion needs a shared dictionary: component blocks, link-state labels, PVT corners, and a minimum record that makes numbers portable across boards, labs, and vendors.

Component blocks (what consumes power)

  • Static / bias: baseline that shifts with temperature and process leakage.
  • Clock / PLL / CDR: reference conditioning and lock maintenance cost.
  • Tx driver: swing/pre-emphasis settings (treated as power knobs here).
  • Rx EQ (CTLE/DFE/ADC): often the dominant block when adaptive features are enabled.
  • DSP / FEC: scales with sustained throughput and activity duty.
  • Mgmt I/O: MDIO/I²C/GPIO/EEPROM access (small but persistent).

Record (minimum)

active lanes, line rate, feature flags, and which block is expected to dominate (Rx EQ vs DSP) for the target workload.

State power (what changes over time)

  • Training / re-lock: can create short spikes that drive supply and hotspot stress.
  • Locked / active: sustained operating point for thermal steady-state.
  • Idle: must be defined (payload idle vs electrical idle) to be comparable.
  • Low-power modes: reduce average power but introduce wake energy and recovery heat.

State log schema

  • state_name, entry_condition
  • steady_power, peak_power, duration
  • event_count (per hour/day)

Typical vs max (PVT corners)

Datasheet “max” is not automatically the worst-case workload. Worst-case thermal often comes from a specific combination of corner + state + activity duty.

Corner declaration template

  • Voltage: Vmin / Vnom / Vmax
  • Temperature: Tmin / Tnom / Tmax
  • Workload: steady / bursty / sleep-wake / retrain spikes
  • Feature flags: EQ/FEC/retime/EEE (on/off)

Measurement rules (MVR)

A power number is comparable only when the minimum reproducible record (MVR) is present. This prevents “per-Gbps” from becoming a marketing artifact.

  • device state label + duration to steady (τ)
  • line rate + active lane count
  • feature flags (DFE/FEC/retime/EEE)
  • rails + voltage + measurement method (shunt/PMBus/bench)
  • ambient Ta + airflow availability
  • package + PCB class (copper/vias/spreading area)

Default rule: per-Gbps uses line-rate. Payload-rate must declare coding/FEC overhead.
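
The record above can be enforced mechanically. The following is a minimal sketch, assuming a Python logging tool; the class name `PowerRecord`, its field names, and the helper functions are illustrative and not part of any vendor format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PowerRecord:
    """One power number plus its minimum reproducible record (hypothetical schema)."""
    power_mw: float
    state_label: Optional[str] = None          # training / locked / idle / LPM / recovery
    time_to_steady_s: Optional[float] = None   # duration to steady (tau)
    line_rate_gbps: Optional[float] = None
    active_lanes: Optional[int] = None
    feature_flags: Optional[dict] = None       # e.g. {"DFE": True, "FEC": False}
    rails_v: Optional[dict] = None             # rail name -> measured voltage
    method: Optional[str] = None               # shunt / PMBus / bench
    ambient_c: Optional[float] = None
    airflow: Optional[str] = None
    package: Optional[str] = None
    pcb_class: Optional[str] = None


def missing_fields(rec: PowerRecord) -> list:
    """Names of record fields that were left undeclared."""
    return [f.name for f in fields(rec) if getattr(rec, f.name) is None]


def is_comparable(rec: PowerRecord) -> bool:
    """A number is treated as comparable only when every field is declared."""
    return not missing_fields(rec)
```

A report generator could refuse to print a mW/Gbps figure when `is_comparable` returns `False` and list `missing_fields` instead.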

Diagram — Power breakdown stacks (components × states)

Stacks: ACTIVE / LOCKED (base bias, clock / PLL, Tx driver, Rx EQ, DSP / FEC) versus IDLE / LOW-POWER (base bias, clock / PLL, Tx reduced, Rx EQ reduced, DSP low, other / mgmt). Legend: base bias, clock / PLL / CDR, Tx driver, Rx EQ, DSP / FEC, other / mgmt. Notes: the dominant block can shift with state (Rx EQ ↔ DSP); state transitions can spike peak power.

Use stacked blocks (not curves) to compare designs: align the MVR record, then identify which block dominates under the target state.

Per-Gbps metric done right

Per-Gbps power becomes an engineering metric only after the denominator and numerator are declared, overhead is explicit, lane scaling is stated, and peak versus sustained windows are reported with a steady-state rule.

Denominator: line-rate vs payload-rate

  • Line-rate is preferred for IC-to-IC comparison and port power budgeting.
  • Payload-rate requires overhead to be declared (coding/FEC/idle).

Required declaration

denominator = line_rate or payload_rate; overhead = X% (coding/FEC/idle); duty = Y%
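
As an illustration of why the denominator must be declared, here is a small sketch. It assumes that overhead is expressed as the fraction of aggregate line rate consumed by coding/FEC/idle, so payload rate = line rate × (1 − overhead); the function name and all numbers are hypothetical.

```python
def mw_per_gbps(power_mw, line_rate_gbps, lanes, denominator="line", overhead_frac=None):
    """Per-Gbps power with an explicit denominator choice.

    overhead_frac is the declared coding/FEC/idle overhead as a fraction of
    line rate (an assumption of this sketch, e.g. 0.2 for 20%).
    """
    aggregate_gbps = line_rate_gbps * lanes
    if denominator == "line":
        rate = aggregate_gbps
    elif denominator == "payload":
        if overhead_frac is None:
            raise ValueError("payload-rate requires a declared overhead")
        rate = aggregate_gbps * (1.0 - overhead_frac)
    else:
        raise ValueError("denominator must be 'line' or 'payload'")
    return power_mw / rate


# Same device, same power, two different labels (made-up numbers):
line_metric = mw_per_gbps(1000.0, 25.0, 4)                              # 10.0 mW/Gbps (line)
payload_metric = mw_per_gbps(1000.0, 25.0, 4, "payload", overhead_frac=0.2)  # about 12.5 mW/Gbps (payload)
```

The two results differ by 25% for identical hardware, which is why an unlabeled figure cannot be compared across devices.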

Numerator: chip vs board-level

  • Chip power: sum of declared rails (best for device comparison).
  • Board-level: includes DC/DC loss (better for system budgeting).

Allowed output labels

  • mW/Gbps (chip)
  • mW/Gbps (board)

Lane scaling: linear vs non-linear

Low-lane configurations often show worse mW/Gbps because fixed costs (bias, clocking, management) cannot be amortized.

Minimal model

Ptotal = Pfixed + Nlanes × Plane → mW/Gbps depends on Nlanes
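
A short numeric sketch of the minimal model shows the amortization effect. The power values are invented for illustration and do not describe any real device.

```python
def mw_per_gbps_for_lanes(p_fixed_mw, p_lane_mw, line_rate_gbps, lanes):
    """Ptotal = Pfixed + Nlanes * Plane, divided by aggregate line rate."""
    p_total_mw = p_fixed_mw + lanes * p_lane_mw
    return p_total_mw / (lanes * line_rate_gbps)


# Hypothetical device: 200 mW fixed cost, 150 mW per lane, 25 Gbps per lane.
for n in (1, 4, 8):
    print(n, mw_per_gbps_for_lanes(200.0, 150.0, 25.0, n))
# 1 lane  -> 14.0 mW/Gbps
# 4 lanes ->  8.0 mW/Gbps
# 8 lanes ->  7.0 mW/Gbps
```

As the lane count grows, the metric approaches Plane / line rate (6.0 mW/Gbps here), so a per-Gbps figure quoted without its lane count hides the fixed-cost share.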

Peak vs sustained (thermal relevance)

  • Peak window catches transition spikes (power integrity and hotspot stress).
  • Sustained window must reach thermal steady-state for reliability decisions.

Steady-state rule (placeholders)

  • dT/dt < X °C/min over Y minutes
  • or power variation < X% over Y minutes
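
The temperature-slope form of the rule can be sketched as a simple gate over logged samples. This version checks the slope between consecutive samples, which is an assumption of the sketch; noisy sensors may need averaging first, and the thresholds remain the X/Y placeholders above.

```python
def steady_state_met(samples, max_slope_c_per_min, window_min):
    """Gate on dT/dt over the trailing window.

    samples: list of (time_min, temp_c) tuples in increasing time order.
    Returns True only if the log spans at least window_min and every
    consecutive slope inside the trailing window is below the threshold.
    """
    if len(samples) < 2:
        return False
    t_end = samples[-1][0]
    if t_end - samples[0][0] < window_min:
        return False  # not enough history to judge
    window = [(t, c) for t, c in samples if t >= t_end - window_min]
    for (t0, c0), (t1, c1) in zip(window, window[1:]):
        if t1 <= t0:
            raise ValueError("samples must be strictly increasing in time")
        if abs(c1 - c0) / (t1 - t0) >= max_slope_c_per_min:
            return False
    return len(window) >= 2
```

A sustained-window power number would then be accepted only when this gate returns `True` for the same log.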

Do

  1. Declare denominator (line vs payload) and overhead (coding/FEC/idle) + duty.
  2. Declare numerator scope (chip rails or board-level boundary) and method.
  3. Report peak window and sustained window with a steady-state rule.

Don’t

  1. Use payload-rate without explicit overhead and duty (non-comparable).
  2. Mix DC/DC, fan, or terminations into chip mW/Gbps (scope violation).
  3. Quote burst-only mW/Gbps as sustained power (thermal risk).

Diagram — Per-Gbps definition decision tree

Flow: Goal (IC comparison: prefer chip scope; system budgeting: board scope allowed) → Denominator (line-rate or payload-rate) → Numerator (chip or board-level) → Overhead declared? (coding/FEC/idle + duty) → Lane count stated? (fixed cost amortization) → Output label: mW/Gbps (chip|board, line|payload). Any missing record element leads to "Non-comparable".

Decision output must be a labeled metric (scope + denominator). Any missing record element makes cross-device comparisons invalid.

Link power is a waveform driven by duty, burstiness, and retrain probability. A workload model converts traffic to peak and sustained power windows that can be validated against temperature rise and derating thresholds.

Workload primitives (measurable)

  • Duty (%): active time within a window.
  • Burstiness: burst length + idle gaps (avg/p95/max).
  • Retrain rate: events per hour (or per day) under temperature drift.

Record (placeholders)

duty = X%; burst = (avg/p95/max) ms; gap = (avg/p95/max) ms; retrain = Y events/hour
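
To show how the record turns into an average power number, here is a sketch that assumes a two-level waveform (one active power level, one idle level). Real links have more states; the function names and figures are illustrative only.

```python
def duty_from_burst_gap(burst_ms, gap_ms):
    """Fraction of time active, from average burst length and average idle gap."""
    return burst_ms / (burst_ms + gap_ms)


def average_power_mw(duty, p_active_mw, p_idle_mw):
    """Time-weighted average for a two-level (active/idle) waveform."""
    return duty * p_active_mw + (1.0 - duty) * p_idle_mw


duty = duty_from_burst_gap(burst_ms=2.0, gap_ms=6.0)   # 0.25
p_avg = average_power_mw(duty, p_active_mw=1000.0, p_idle_mw=200.0)  # 400.0 mW
```

The average is what drives steady-state temperature, while the p95/max burst figures from the same record bound the peak window.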

Power-sensitive activity (observables)

  • Adaptive updates can appear as small periodic steps in average power.
  • Re-lock events appear as spikes followed by a new steady plateau.
  • Queue/timestamp activity appears as duty increase and higher baseline.

Rule: workload templates are identified by waveform shape before choosing pass criteria.

W1 — Steady full-rate

Profile: duty ≈ X%; long continuous bursts; retrain ≈ Y/hour

Power impact: drives thermal steady-state and worst-case junction.

Log fields: steady_power, Ta/airflow, τ, Tcase, state_name

Use for: heatsink/airflow sizing and derating rule validation.

W2 — Bursty traffic

Profile: duty ≈ X%; burst/gap defined (avg/p95/max); retrain low

Power impact: average rises with duty; peak follows burst edges.

Log fields: duty, peak_power, burst_len, gap, power variance

Use for: average power budgeting and temperature ramp-rate checks.

W3 — Idle + wake cycles

Profile: long idle gaps; frequent wake events; short active bursts

Power impact: low average, but wake peak and recovery energy dominate stress.

Log fields: wake_count, peak_power, recovery_duration, state transitions

Use for: PSU transient margin and hotspot fatigue risk screening.

W4 — Retrain spikes (temperature drift)

Profile: sustained traffic with occasional re-lock spikes; retrain = Y/hour

Power impact: spikes raise peak and may shift the steady plateau upward.

Log fields: event_count, spike_amplitude, spike_duration, Ta drift

Use for: field drop-risk prediction and derating guardrails.

Diagram — Workload power waveforms (W1–W4)

Panels (power vs time): W1 steady, W2 bursty, W3 sleep/wake, W4 retrain spikes. Annotations: sustained, duty ↑, wake peak, re-lock spikes, τ, peak, count, duration.

Assign per-Gbps numbers to a workload template (W1–W4). Sustained power is valid only after the steady-state rule is met.

Thermal fundamentals you actually use

Thermal decisions become repeatable when power is tied to a workload window and the temperature path is expressed as a network with declared boundary conditions. This section turns θ/ψ parameters into practical equations and measurement records.

One-line estimate (screening)

Tj = Ta + P × θJA

Use for early feasibility screening. Final sign-off requires a declared test condition and a measured temperature proxy.

  • Ta: ambient at a declared location (not “room temperature”).
  • P: dissipated power tied to workload (W1–W4) and window (peak/sustained).
  • θJA: effective junction-to-ambient under a specific board/airflow/orientation.
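
A worked screening example, with invented values, may make the estimate concrete. The θJA figure below is a placeholder and would only be valid for the board and airflow it was characterized on.

```python
def tj_screen_c(ta_c, power_w, theta_ja_c_per_w):
    """Screening estimate: Tj = Ta + P * thetaJA."""
    return ta_c + power_w * theta_ja_c_per_w


# Hypothetical case: 70 C local ambient, 1.5 W sustained, thetaJA = 30 C/W.
tj = tj_screen_c(70.0, 1.5, 30.0)   # 115.0 C
```

If the allowed junction temperature were 125 °C, this leaves 10 °C of margin on paper, which is thin enough that measured proxies would be needed before sign-off.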

θ pitfalls (why θJA is not a constant)

Datasheet θ values are valid only under stated boundary conditions. Changing PCB copper, airflow, or mounting orientation can shift θJA enough to invalidate cross-board comparisons.

Declare test conditions (placeholders)

  • JEDEC board class: X (layers/copper thickness/openings)
  • Airflow: X m/s; orientation: horizontal/vertical
  • Convection mode: natural/forced
  • Power window: peak X ms; sustained ≥ X×τ

Rule: θ values without test conditions are treated as non-portable.

Using θJC / ψJT / ψJB (measured → junction)

When junction sensing is unavailable, estimate Tj using a defined temperature proxy and the appropriate parameter for that proxy.

Tj ≈ Tcase + P × θJC

Use when case definition and contact are controlled (heatsink/interface conditions declared).

Tj ≈ Ttop + P × ψJT

Common for board bring-up. Valid only with a defined Ttop measurement point and boundary condition.

Tj ≈ Tboard + P × ψJB

Useful when board sensors exist near the package. Sensor location must be specified (distance/side).

Guardrail

ψ parameters are characterization values. They must be tied to a measurement point and condition; otherwise, back-calculated Tj is not portable.
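
The three proxy equations share one shape, so a single helper can keep the proxy and its parameter paired. This is a sketch with hypothetical names and values; it does not check that the parameter was characterized under the same boundary condition as the measurement.

```python
# Which parameter belongs to which measured proxy (from the equations above).
PROXY_PARAM = {"Tcase": "theta_jc", "Ttop": "psi_jt", "Tboard": "psi_jb"}


def tj_from_proxy(proxy, proxy_temp_c, power_w, params):
    """Tj ~= T_proxy + P * (parameter matched to that proxy).

    params: dict of datasheet/characterization values in C/W,
    keyed by 'theta_jc', 'psi_jt', 'psi_jb'.
    """
    if proxy not in PROXY_PARAM:
        raise ValueError("unknown proxy: " + proxy)
    return proxy_temp_c + power_w * params[PROXY_PARAM[proxy]]


# Hypothetical bring-up reading: Ttop = 85 C at 2 W with psiJT = 0.5 C/W.
tj = tj_from_proxy("Ttop", 85.0, 2.0, {"psi_jt": 0.5})   # 86.0 C
```

Forcing the lookup through `PROXY_PARAM` prevents the common mistake of applying θJC to a top-of-package thermocouple reading taken without a heatsink.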

Multi-source effects (keep it measurable)

Neighbor heat sources and shielding can raise local ambient and shift hot-spot behavior. Treat them as declared inputs, not surprises.

Neighbor record (placeholders)

  • neighbor_power = X W; distance = X mm
  • shielding_present = yes/no; airflow_path = open/blocked
  • Ta_local measured at X location; window = sustained

Practical approximation: local ambient dominates error more often than the θ parameter itself.

Diagram — Thermal resistance network (junction / case / board / ambient)

Nodes: Junction (Tj), Case top (Ttop / Tcase), Board (Tboard), Ambient (Ta). Links: θJC, θJA, ψJT, ψJB, θBA. Legend: θ = thermal resistance (condition-dependent); ψ = proxy mapping (measurement-defined).

Use θ values only with declared boundary conditions. Use ψ values only with a defined measurement point when estimating Tj.

Package & PCB heat-spreading choices

Package thermal behavior is defined by its dominant heat path and the PCB it is mounted on. This section provides a package-to-PCB selection logic that stays in the thermal/manufacturing domain (no electrical-layer discussions).

Selection anchor (placeholders)

θJA,target = ΔTallowed / P

Pick package + PCB spreading that can meet θJA,target under the declared airflow/orientation.

  • ΔTallowed: junction margin to derating threshold (X °C placeholder)
  • P: sustained workload power (W1–W4 + window)
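
The selection anchor is a one-line calculation; the sketch below adds a comparison against an effective θJA for a candidate package + PCB combination. All numbers are placeholders.

```python
def theta_ja_target(delta_t_allowed_c, power_w):
    """thetaJA,target = deltaT_allowed / P (C/W)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return delta_t_allowed_c / power_w


def meets_target(theta_ja_effective, delta_t_allowed_c, power_w):
    """True if the candidate's effective thetaJA is at or below the target."""
    return theta_ja_effective <= theta_ja_target(delta_t_allowed_c, power_w)


# Hypothetical: 40 C allowed rise at 2 W sustained -> target of 20 C/W.
target = theta_ja_target(40.0, 2.0)        # 20.0
ok = meets_target(18.0, 40.0, 2.0)         # True
too_weak = meets_target(25.0, 40.0, 2.0)   # False
```

The effective θJA fed into `meets_target` has to come from the declared airflow and orientation, not from a datasheet value measured on a different board.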

Heat paths (what matters)

  • Bottom path: die → pad/balls → solder → vias → planes (often dominant).
  • Top path: die → mold/top → ambient (improves with contact/heatsink).
  • Spreading: copper planes convert a hotspot into a lower-gradient field.

Key PCB levers (placeholders)

  • copper_spread_area = X mm²; plane_layers = X
  • via_count = X; via_pitch = X; backside_copper = yes/no
  • keepout/slot = yes/no (controls isolation vs spreading)

Manufacturing & reliability coupling

Thermal performance is affected by assembly quality. Voiding and warpage can raise effective thermal resistance and shift hot-spot location.

Process record (placeholders)

  • voiding_percent = X%; reflow_profile_id = X
  • warpage_flag = yes/no; rework_cycles = X

Package quick cards (thermal + manufacturing only)

QFN (exposed pad)

Heat path: strong bottom conduction through exposed pad.

PCB needs: via array + large copper spreading region.

Process risks: voiding under pad; paste/thermal-via tuning.

Suitable power: < X W or X–Y W (placeholder; depends on copper/airflow)

QFP

Heat path: weaker bottom conduction; more dependence on convection.

PCB needs: copper spreading helps but is less direct than pad/balls.

Process risks: lead coplanarity; airflow sensitivity.

Suitable power: < X W (placeholder; depends on airflow and surface area)

BGA

Heat path: improved bottom conduction via ball array into PCB planes.

PCB needs: plane layers + via stitching for spreading.

Process risks: warpage control; inspection complexity.

Suitable power: X–Y W or > Y W (placeholder; strongly PCB-dependent)

FCBGA

Heat path: high power-density handling with stronger, controllable conduction paths.

PCB needs: robust planes, thermal via strategy, and controlled stack-up.

Process risks: cost and assembly control; warpage/underfill tradeoffs.

Suitable power: > Y W (placeholder; validate with θJA,target)

WLCSP

Heat path: low thermal mass; highly sensitive to PCB spreading and local hotspots.

PCB needs: careful copper strategy; strong control of nearby heat sources.

Process risks: board-level reliability and rework constraints.

Suitable power: < X W (placeholder; validate with real board conditions)

Diagram — Package + PCB thermal conduction cross-section (concept)

Layers (top to bottom): ambient air, package body, die, pad / balls, PCB stack with copper planes (spreading), inner planes, and via array. Paths: top path, bottom path, spreading. Process sensitivities: voiding, warpage, reflow profile.

Package thermal performance is a package × PCB function. Use θJA,target to anchor selection, then validate with measured proxies (Ttop/Tboard).

Cooling options and when they backfire

Cooling is not a checklist of parts. Each method has a boundary condition where it becomes ineffective or moves heat into a more sensitive area. This section focuses on actionable “works when…” criteria and “backfires when…” failure triggers.

Natural vs forced convection

Works when

  • Air path reaches the hotspot (no bypass short-circuit).
  • Boundary layer is disrupted at the hotspot surface.
  • Intake air temperature is controlled/declared.

Backfires when

  • Airflow bypasses the device (high system CFM, low local flow).
  • Fan is present but hotspot boundary layer remains intact.
  • Hot recirculation raises local ambient around the device.

Quick checks (placeholders)

  • ΔTin-out = X °C (intake vs exhaust)
  • Local hotspot airflow present: yes/no
  • Hotspot location shifts with airflow direction: yes/no

Heatsink + contact + TIM

Works when

  • Contact resistance is minimized (flatness + controlled pressure).
  • TIM thickness/compression is within the intended range.
  • Mechanical retention maintains pressure across thermal cycling.

Backfires when

  • Large heatsink but poor contact (θcontact dominates).
  • Clamp pressure relaxes (vibration / thermal cycling).
  • TIM pumps out or ages → gradual temperature drift in the field.

Quick checks (placeholders)

  • Pre/post re-test ΔTtop drift ≤ X °C
  • Contact area evidence (inspection): pass/fail
  • Thermal decay after stop-load is “fast” or “slow” (qualitative)

Chassis conduction (when heat “moves”)

Works when

  • Heat is routed to a large thermal mass away from sensitive parts.
  • Interface stack is controlled (pad thickness + compression).
  • Resulting gradients are reduced, not relocated.

Backfires when

  • Thermal “short” conducts heat into a more sensitive component zone.
  • Hotspot relocates (device cools, nearby critical part heats up).
  • Local gradients increase → hidden mechanical stress risk.

Quick checks (placeholders)

  • Max temperature location changes after adding chassis path: yes/no
  • Sensitive-zone ΔT increase ≤ X °C (placeholder)
  • Gradient metric ≤ X °C/cm (placeholder)

Fan strategy (field degradation aware)

Works when

  • Control targets temperature margin, not maximum RPM.
  • Intake filtering and maintenance are designed-in.
  • Fan health is monitored and logged.

Backfires when

  • Dust clogging reduces airflow over time → hidden drift.
  • Noise constraints force lower RPM without re-validating margin.
  • Fan stalls/degrades with no telemetry → misdiagnosed failures.

Log fields (placeholders)

  • fan_rpm, pwm_duty, alarm_count
  • intake_temp, exhaust_temp, Ta_local
  • maintenance_age_days (or filter ΔP if available)

Selection matrix (goal → recommended bundle)

Low cost

  • Copper spreading + via array
  • Air path cleanup (avoid bypass)
  • Declare boundary conditions

Backfire check: hotspot local airflow = yes/no; ΔTin-out within expected range.

Low noise

  • Maximize spreading first
  • Low-RPM fan + clean ducting
  • Control contact resistance (TIM + pressure)

Backfire check: contact-driven drift (pre/post ΔTtop ≤ X °C).

High reliability

  • Fan telemetry + alarms + maintenance plan
  • Dust mitigation (filtering/accessible service)
  • Chassis conduction with sensitive-zone guard

Backfire check: hotspot migration = no; sensitive-zone ΔT ≤ X °C.

Diagram — Cooling selection matrix (2×2)

Cooling method selection (concept). Axes: power density (low → high) versus ambient severity / airflow availability.

  • Airflow constrained, low density: spreading + chassis path (copper, chassis). Backfire: heat migration.
  • Airflow constrained, high density: chassis + controlled contact (TIM, chassis). Backfire: sensitive-zone heat.
  • Airflow available, low density: natural / light forced (air path, no fan). Backfire: bypass airflow.
  • Airflow available, high density: heatsink + forced airflow (heatsink, fan). Backfire: contact R / clogging.

The matrix is anchored by power density and airflow constraints. Each quadrant adds a “backfire” tag to force a verification check.

Thermal-aware bring-up & validation

Thermal validation becomes meaningful only when measurement points are defined, steady-state is gated, workload is repeatable, and Tj is derived via the appropriate θ/ψ mapping. This section provides a bring-up flow that is production- and field-friendly.

1

Define conditions

  • Ambient and location: Ta measured at X point (placeholder).
  • Airflow mode: fixed RPM or closed-loop control (declared).
  • Workload template: W1–W4 + window (peak/sustained).
  • Power scope: chip-only vs board-including (label).

Pitfall

Unfixed conditions produce non-comparable results and false “randomness”.

2

Measure points & sensors

Minimum measurement set

  • Ttop/Tcase (top-of-package)
  • Tboard (nearby copper region)
  • Tin/Tout (intake/exhaust)
  • Ta (ambient; declared location)

Sensor notes (high-impact)

  • Thermocouple attachment method must be consistent (adhesive/coverage).
  • IR camera requires emissivity declaration (placeholder) and reflection control.
  • On-board sensors are position-biased → record location and distance.

3

Gate steady-state

Use a steady-state gate based on time constant and temperature slope. Avoid taking a snapshot during transient warm-up.

Gating criteria (placeholders)

  • Wait time ≥ X × τ (placeholder)
  • dT/dt < X °C/min over Y min (placeholder)
  • Re-run workload template if retrain/error spikes occurred

4

Convert & decide

Tj conversion (choose one)

  • Tj ≈ Ttop + P × ψJT
  • Tj ≈ Tboard + P × ψJB
  • Tj ≈ Tcase + P × θJC (case defined)

Pass criteria (placeholders)

  • Tj margin ≥ X °C
  • Tcase ≤ X °C
  • Hotspot gradient ≤ X °C/cm

Record fields (placeholders)

  • workload_id (W1–W4), window_type (peak/sustained)
  • steady_state_met (yes/no), retrain_spikes (count)
  • Ttop, Tboard, Tin, Tout, Ta, P_total (scope label)
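
The convert-and-decide step can be sketched as a function over one logged record. This example assumes the Ttop + ψJT conversion, treats Ttop as the case-temperature proxy, and uses hypothetical key names for the limits; the thresholds stand in for the X placeholders above.

```python
def evaluate(record, limits):
    """Return the list of failed checks; an empty list means pass.

    record: dict with 'steady_state_met', 'Ttop' (C), 'P_total' (W, chip scope).
    limits: dict with 'psi_jt' (C/W), 'tj_max', 'tj_margin_min', 'ttop_max' (C).
    """
    failures = []
    if not record["steady_state_met"]:
        failures.append("steady_state_not_met")
    # Tj from the top-of-package proxy: Tj ~= Ttop + P * psiJT
    tj = record["Ttop"] + record["P_total"] * limits["psi_jt"]
    if limits["tj_max"] - tj < limits["tj_margin_min"]:
        failures.append("tj_margin")
    if record["Ttop"] > limits["ttop_max"]:
        failures.append("ttop_limit")
    return failures


limits = {"psi_jt": 0.5, "tj_max": 125.0, "tj_margin_min": 20.0, "ttop_max": 100.0}
cool = {"steady_state_met": True, "Ttop": 90.0, "P_total": 2.0}
hot = {"steady_state_met": True, "Ttop": 110.0, "P_total": 2.0}
print(evaluate(cool, limits))  # []
print(evaluate(hot, limits))   # ['tj_margin', 'ttop_limit']
```

Returning named failures rather than a single boolean keeps the log useful for production correlation.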

Diagram — Thermal validation flow

Flow: Setup (Ta / airflow / scope) → Run workload (W1–W4 + window) → Steady? (dT/dt gate) → Measure (Ttop / Tboard / Tin / Tout) → Convert to Tj (ψJT / ψJB / θJC) → Pass / Fail (margin thresholds). Log packet (placeholders): workload_id, window_type, steady_state, Ttop / Tboard, Tin / Tout / Ta, P_total.

The flow enforces steady-state gating and a declared θ/ψ mapping before pass/fail decisions are made.

Power integrity & thermal coupling

This section treats power delivery only through a power/thermal lens: conversion efficiency, transient spikes, protection-driven derating, and the temperature-to-power positive feedback loop. It avoids electrical-layer tuning details.

DC/DC efficiency becomes board heat

Even if chip power is unchanged, input-side loss turns into extra heat near the power stage and copper planes. A practical placeholder range is +10–20% board heat overhead, depending on efficiency and loading.

Thermal viewpoint checklist

  • Declare DC/DC efficiency at the tested load (η = X%, placeholder).
  • Log converter temperature zone near hotspots (Tpwr, placeholder).
  • Separate “chip power” vs “board-including power” in reports.
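
The relation between converter efficiency and extra board heat follows from P_in = P_out / η. The sketch below uses invented loads to show where an overhead in the +10–20% range can come from; it ignores light-load efficiency droop and multi-stage conversion.

```python
def dcdc_loss_w(p_out_w, efficiency):
    """Heat dissipated in the converter: P_in - P_out = P_out * (1/eta - 1)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return p_out_w / efficiency - p_out_w


def board_heat_overhead(efficiency):
    """Converter loss as a fraction of delivered (chip) power."""
    return 1.0 / efficiency - 1.0


print(dcdc_loss_w(9.0, 0.90))        # about 1.0 W of extra board heat
print(board_heat_overhead(0.90))     # about 0.11 (11%)
print(board_heat_overhead(0.85))     # about 0.18 (18%)
```

Because the loss is referred to delivered power, a 90% efficient stage adds about 11% board heat, not 10%.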

Transient spikes distort power and heat readings

Training / wake / relock windows can create short power spikes. These can cause voltage ripple and measurement aliasing, leading to under-estimated sustained heat or over-estimated per-Gbps numbers if sampling windows are not aligned.

Quick “sanity gates” (placeholders)

  • Peak window length = X ms; sustained window length = Y s.
  • Spike count per minute ≤ X (placeholder).
  • Ripple proxy / variance metric flagged: yes/no.

Protection-driven derating (thermal safety valves)

Over-temperature protection and throttling are “breakers” that prevent runaway. They also change observed power, so validation must log whether the system is in normal, throttled, or degraded mode.

Common breakers (strategy only)

  • OTP threshold reached → throttle / cut activity (placeholder T).
  • Lane disable / reduced duty cycle to preserve temperature margin.
  • Low-power modes (e.g., energy saving) to reduce sustained heating.

Temperature → leakage → power (positive feedback)

As temperature rises, leakage and bias shifts can increase power, especially in advanced processes. This creates a positive loop that must be broken by airflow, throttling, or derating rules.

What to log (minimal set)

  • Ttop/Tj proxy, power scope label (chip/board)
  • Throttle state (normal/throttled), lane count active
  • Converter η (placeholder), local power-stage temperature
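
The feedback loop can be explored with a fixed-point iteration. This sketch assumes leakage doubles every fixed number of degrees (a rule-of-thumb shape, not a property of any specific process) and uses Tj = Ta + P × θJA as the thermal side; every parameter value is hypothetical.

```python
def settle_tj(ta_c, theta_ja, p_dynamic_w, p_leak_ref_w,
              t_ref_c=25.0, doubling_c=20.0, tj_limit_c=150.0,
              max_iter=200, tol_c=0.01):
    """Iterate temperature -> leakage -> power -> temperature.

    Returns the settled Tj in C, or None if the loop exceeds tj_limit_c
    or fails to converge (a sign that a breaker is needed).
    """
    tj = ta_c
    for _ in range(max_iter):
        # Assumed leakage model: doubles every `doubling_c` degrees above t_ref_c.
        p_leak = p_leak_ref_w * 2.0 ** ((tj - t_ref_c) / doubling_c)
        tj_next = ta_c + theta_ja * (p_dynamic_w + p_leak)
        if tj_next > tj_limit_c:
            return None
        if abs(tj_next - tj) < tol_c:
            return tj_next
        tj = tj_next
    return None


stable = settle_tj(50.0, 20.0, 1.0, 0.05)    # settles near 75.8 C
runaway = settle_tj(50.0, 60.0, 1.0, 0.20)   # None: exceeds the limit
```

In the second call a higher θJA and higher reference leakage push the loop past the limit, which is the situation where airflow, throttling, or derating has to break the loop.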

Thermal–electrical coupling loop (explain + break)

A practical way to prevent misleading power numbers is to explicitly label the loop states: normal → warming → near-limit → throttled. Validation should record which “breaker” is active (airflow, throttling, derating) when the temperature slope changes.

Diagram — Thermal–electrical positive feedback loop with breakers

Loop: Temp ↑ (Ttop / Tj proxy) → Leakage ↑ (bias drift) → Power ↑ (chip + board) → Heat ↑ (local hotspots, P × θ) → Temp ↑. Breakers: throttle / derate, airflow / cooling, budget guardrails. Notes: DC/DC loss adds board heat overhead (placeholder: +10–20%); transient spikes require explicit peak vs sustained windows.

The loop is intentionally simple: it enforces scope labeling and breaker logging rather than electrical tuning details.

Design guardrails & derating rules

This section turns the page into reusable engineering guardrails: power budgets across chip/port/system, thermal budgets with explicit margin, and derating rules that map temperature back to allowable throughput or duty-cycle.

Three-layer power budget

  • Chip: silicon-only power (declared state + temperature)
  • Port: per-port / per-lane budget (scales, but not perfectly linear)
  • System: DC/DC loss + fan + chassis paths (board-including)

Guardrail rule: reports must label scope (chip vs board-including).

Thermal budget (with explicit margin)

  • Ta(max) = X °C (placeholder)
  • Tj(max allowed) = X °C (placeholder)
  • Reserve margin = X °C (placeholder)

Guardrail rule: validation must demonstrate margin under sustained workload, not only peak snapshots.

Derating rule model (placeholder)

Use a declared linear model to map temperature to allowed activity: for each +ΔT, reduce duty-cycle or throughput by a fixed step.

Example format (placeholders)

  • If T ≥ Tguard, then duty = duty − k × (T − Tguard)
  • If T ≥ Tcrit, then throttle state = ON
  • If T returns below Trecover, then ramp back with hysteresis
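
One possible reading of the example format is sketched below. It interprets the linear rule as a reduction from the nominal duty and models the throttle as a latch that sets at Tcrit and clears only below Trecover; both interpretations, the function names, and the numbers are assumptions.

```python
def allowed_duty(temp_c, duty_nominal, t_guard, k_per_c, duty_min=0.0):
    """Linear derating: nominal duty below Tguard, then minus k per degree above it."""
    if temp_c < t_guard:
        return duty_nominal
    return max(duty_min, duty_nominal - k_per_c * (temp_c - t_guard))


def next_throttle_state(temp_c, throttled, t_crit, t_recover):
    """Hysteresis latch: ON at or above Tcrit, held until T drops below Trecover."""
    if temp_c >= t_crit:
        return True
    if throttled and temp_c >= t_recover:
        return True
    return False


# Hypothetical thresholds: Tguard = 85 C, Tcrit = 105 C, Trecover = 95 C, k = 2% per C.
print(allowed_duty(80.0, 1.0, 85.0, 0.02))              # 1.0
print(allowed_duty(95.0, 1.0, 85.0, 0.02))              # about 0.8
print(next_throttle_state(100.0, True, 105.0, 95.0))    # True (still held)
print(next_throttle_state(100.0, False, 105.0, 95.0))   # False (never tripped)
```

The gap between Tcrit and Trecover keeps the system from oscillating between throttled and normal states when the temperature hovers near the threshold.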

Copyable guardrails (If / Then)

  • If system power scope is used, then DC/DC efficiency (η) and fan power must be logged.
  • If peak windows are reported, then sustained windows (Y s) must be reported with steady-state gating.
  • If Tj margin < X °C, then reduce duty-cycle or throughput per the declared derating model.
  • If hotspot relocates after adding chassis conduction, then redesign thermal path to protect sensitive zones.
  • If P/Gbps exceeds X mW/Gbps (placeholder), then upgrade package / spreading / airflow bundle before lock-in.
  • If throttling is observed in validation, then published performance must be labeled “throttled-mode” or redesigned.
  • If dust/aging is expected, then add maintenance fields (age, filter status) and re-validate margins over time.
  • If any budget changes, then the loop (budget → Tj → validate → derate/redesign) must be re-run and re-locked.

Diagram — Budget → thermal → derate loop (spec lock)

Guardrail loop: Budget power (chip / port / system) → Map to Tj (θ / ψ + Ta(max)) → Validate (steady-state gate) → Derate (duty / throughput rules) or Redesign (package / cooling bundle) → Lock spec (publish with scope). Always label scope + window; re-lock after any change.

The loop enforces re-validation whenever a budget, cooling bundle, or workload assumption changes.

Applications (Power/Thermal-only)

Applications are framed strictly as “how thermal constraints reshape system design.” No protocol details are expanded here. Each card provides: scenario → thermal pain points → design actions → validation items.

Material numbers note: examples below are common, widely stocked families. Always verify thickness / size / suffix / safety rating / availability against your mechanical stack and compliance needs.

Industrial long-line gateway (sealed box + high Ta + sustained load)

High Ta · Sustained duty · Airflow limited

Thermal pain points: steady-state heat soak, converter loss adding board heat, hotspots trapped under shields, and margin collapse when dust aging reduces airflow.

Design actions: define a front-to-back airflow path (even “pseudo-ducts” with foam), isolate power-stage heat from PHY/SerDes, and enforce a derating curve tied to measured surface temperature.

Validation items (placeholders)

  • Steady-state gate: wait ≥ X·τ before declaring pass/fail.
  • Record: Ta, Tin/Tout, Ttop, Tboard-hotspot, throttle state, DC/DC η (placeholder).
  • Pass: Tj margin ≥ X °C; surface ≤ X °C; hotspot gradient ≤ X °C/cm.

Example materials / part numbers (verify fit)

  • Thermal tape: 3M 8810 / 3M 8815 (bonding, thin interface).
  • Thermal pad family: (Henkel/Bergquist) GAP PAD 1500 series (thickness-dependent variants).
  • Heatsink (example): Aavid Thermalloy 576802B00000G (board-level clip-on style).
  • Fan family (example): Delta AFB series (size/speed variants); SUNON MagLev series (variants).

Camera links (small volume + remote power heat concentration)

Small enclosure · Heat concentrated · Shield conflict

Thermal pain points: hotspot under RF/EMI shields, heat path competing with shielding, and skin temperature constraints (touch-safe surface).

Design actions: create a defined conduction path (chip → spreader → chassis), avoid “thermal shorting” into sensitive optics, and prefer pads/tapes that keep thickness stable over aging.

Validation items (placeholders)

  • Surface temperature: Tsurf ≤ X °C (touch-safe placeholder).
  • Measure Ttop at shielded hotspots; verify gradient vs chassis region.
  • Run steady + burst template to detect wake/processing heat spikes.

Example materials / part numbers (verify fit)

  • Thermal tape: 3M 8810 (thin bonding to chassis/spreader).
  • Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose thickness for stack).
  • Thermal grease (example): DOWSIL TC-5022 (industrial TIM family; verify grade/spec).
  • Shield interface: conductive foam + thermal pad stack (select by compliance need).

Server retimers (high lane count + high heat density)

High heat density · Airflow-dependent · Sustained workloads

Thermal pain points: heatsink attachment quality (contact resistance), airflow shadowing, and “best-looking per-Gbps” that collapses once throttling engages under sustained traffic.

Design actions: prioritize mechanical repeatability (clip/torque), enforce airflow directionality, and require telemetry / temperature flags for production correlation.

Validation items (placeholders)

  • Require “sustained” window: Y minutes at worst duty + worst Ta.
  • Log: Ttop/Tboard, airflow state, heatsink attachment lot/process.
  • Pass: no throttling within X margin; stable temperature slope.

Example materials / part numbers (verify fit)

  • Heatsink (example): Aavid Thermalloy 576802B00000G (example family; choose footprint).
  • Thermal tape: 3M 8815 (higher thickness variant vs 8810).
  • Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (compressibility helps tolerance).
  • Fan family: Delta AFB series / Nidec UltraFlo series (select by airflow/noise targets).

Automotive (temperature cycling + lifetime + degradation)

Cycling & aging · Derating required · Logging & fallback

Thermal pain points: repeated thermal stress changes contact resistance and pad performance; field conditions drift beyond lab assumptions; failures can be intermittent and correlation requires telemetry.

Design actions: enforce conservative margins, define derating by temperature bands, and implement graceful degradation modes (reduced lane count / reduced duty) tied to measured temperature.

Validation items (placeholders)

  • Cycle test: ΔT range = X °C; cycles = N (placeholders).
  • Log: max Ttop, time-over-threshold, derate events, recovery hysteresis.
  • Pass: no uncontrolled throttling; predictable derate curve; stable hotspot mapping.

Example materials / part numbers (verify fit)

  • Thermal tape: 3M 8810 / 3M 8815 (process repeatability; verify temp rating).
  • Gap pad family: (Henkel/Bergquist) GAP PAD 1500 series (choose for cycling compliance).
  • Thermal grease (example): DOWSIL TC-5022 (verify grade/spec for automotive constraints).
  • Heatsink (example): Aavid Thermalloy 576802B00000G (verify vibration/attachment needs).

Diagram — Scenario → thermal constraints mapping (card-style)

Applications (Power/Thermal-only): constraints → actions → validation

  • Industrial gateway (High Ta • Sealed • Sustained duty). Pain: heat soak + converter loss + dust aging. Action: airflow path + isolation + derate curve.
  • Camera link node (Small volume • Heat concentrated • Shield conflict). Pain: hotspot under shield + touch surface limit. Action: conduction path + stable TIM stack.
  • Server retimer (High lanes • High density • Airflow-dependent). Pain: contact R + airflow shadow + throttling. Action: repeatable attach + telemetry + ducting.
  • Automotive node (Cycling • Lifetime • Field drift). Pain: aging + intermittent thermal correlation. Action: conservative margin + derate + logging.

Each scenario card intentionally shows only thermal constraints, actions, and validation signals.

IC selection logic (Power/Thermal)

This is a decision workflow (not a product list). It starts with required inputs, computes junction temperature feasibility, then decides package/cooling tier and whether derating/telemetry is mandatory.

Required inputs (must-fill)

  • Line rate + lane/port count
  • Duty-cycle / workload template (steady vs burst)
  • Ta(max), airflow availability, enclosure type
  • Board area + z-height for heatsink/duct
  • Allowed surface temperature (Tsurf limit)

Missing inputs = misleading mW/Gbps comparisons.

Power/Thermal metrics to demand

  • mW/Gbps with declared scope (chip-only vs board-including)
  • Pmax at Ta(max) (peak vs sustained windows labeled)
  • θ/ψ metrics with test conditions (board, copper, airflow)
  • Package thermal path features (EPAD / BGA heat spread)
  • OTP + thermal telemetry availability (verify in datasheet)
  • Low-power state benefit vs wake spike cost (thermal shock)

Red flags (risk markers)

  • No θ/ψ test conditions published
  • No power vs rate/state/temperature curves (or equivalent)
  • Only “typical” power; no max/corners labeling
  • No OTP or no measurable thermal/derate status
  • Peak-only claims without sustained thermal gating

Example IC material numbers (for comparison workflows)

These are examples only (not recommendations). Use them to build a consistent power/thermal comparison sheet and validate availability, package, and suffix.

Retimer / redriver (examples)

  • TI DS80PCI402 (PCIe redriver)
  • TI DS160PR412 (PCIe redriver)
  • TI DS280DF810 (multi-rate retimer; verify protocol support)

USB redriver / hub (examples)

  • TI TUSB522
  • TI TUSB1046
  • Microchip USB5744 (hub family example)

Ethernet PHY (examples)

  • TI DP83822I
  • Microchip LAN8840
  • NXP TJA1103 (automotive Ethernet PHY example)

Thermal management materials

  • 3M 8810 / 3M 8815 (thermal tape)
  • (Henkel/Bergquist) GAP PAD 1500 series
  • Aavid 576802B00000G (heatsink example)
  • DOWSIL TC-5022 (TIM example)

Diagram — Power/Thermal selection decision tree

Power/Thermal selection: inputs → compute → decide → lock guardrails

  • Inputs (must-fill): Rate + lanes/ports; Duty / workload; Ta(max) + airflow; Board space + z
  • Compute: Budget power (scope); Peak vs sustained; Map to Tj (θ/ψ); Margin check
  • Decide: Package tier; Cooling tier; Derate required?; Telemetry needed?
  • Red flags (risk markers): No θ/ψ conditions • No power-vs-state curves • Typical-only • No OTP/telemetry • Peak-only claims
  • Action: flag risk → require data → re-run compute loop → then lock
  • Output (what gets locked): Package/cooling tier • derating rule • required telemetry • published scope + windows
  • Examples: 3M 8810/8815 • GAP PAD 1500 • Aavid 576802B00000G (verify fit)

The workflow forces declared measurement scope and sustained thermal gating before any “per-Gbps” comparison is accepted.
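As a rough sketch of the compute → decide step, the snippet below maps sustained power to junction temperature with the first-order relation Tj ≈ Ta + P × θ, then picks the lightest cooling tier that still leaves the required margin. The tier names, θ values, and the function name `select_cooling_tier` are illustrative assumptions. Real θ values must come from board-calibrated measurements, not from this table.

```python
def junction_temp_c(ta_c, power_w, theta_c_per_w):
    """First-order estimate: Tj = Ta + P * theta (steady state, linear model)."""
    return ta_c + power_w * theta_c_per_w

# Hypothetical tiers, lightest first: (name, effective junction-to-ambient theta in C/W).
EXAMPLE_TIERS = [("bare", 40.0), ("heatsink", 20.0), ("heatsink+fan", 10.0)]

def select_cooling_tier(ta_max_c, p_sustained_w, tiers, tj_max_c, margin_c):
    """Return the lightest tier that keeps Tj margin >= margin_c at Ta(max).

    If no tier passes at full sustained power, return the lowest-theta tier,
    mark derating as required, and report the power it can sustain.
    """
    for name, theta in tiers:
        tj = junction_temp_c(ta_max_c, p_sustained_w, theta)
        if tj_max_c - tj >= margin_c:
            return {"tier": name, "tj_c": tj, "derate_required": False}
    name, theta = min(tiers, key=lambda t: t[1])
    return {
        "tier": name,
        "tj_c": junction_temp_c(ta_max_c, p_sustained_w, theta),
        "derate_required": True,
        "p_allowed_w": (tj_max_c - margin_c - ta_max_c) / theta,
    }
```

With the placeholder tiers, 2 W at Ta(max) = 70 °C and a 125 °C limit with 10 °C margin lands on the heatsink tier, while 5 W fails every tier and returns a derated power ceiling instead. The sustained power fed in should be the worst-case sustained window, not a peak or typical figure.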


FAQs (Power & Thermal)

Troubleshooting is scoped strictly to power definitions, thermal metrics, cooling effectiveness, derating behavior, and measurement traps. Each answer is intentionally short and executable.

Answer format: Likely cause / Quick check / Fix / Pass criteria (thresholds use X/Y placeholders)

“Datasheet per-Gbps looks great, but system power is 30% higher”—first accounting check?

Likely cause: mismatched scope (chip-only vs board-including), wrong denominator (line rate vs payload), or missing losses (DC/DC η, LDO drop, termination, clocks).

Quick check: measure VIN×IIN at board input and compare against summed rail power at the IC load points; tag state: lanes, rate, FEC/DFE enable, idle vs sustained window.

Fix: lock a single comparison template: scope (chip/board), denominator (line/payload), window (peak/sustained), and always include conversion losses using measured DC/DC efficiency at that load point.

Pass criteria: accounting gap ≤ X% (board-in vs sum-of-rails), and reported mW/Gbps varies ≤ X% across repeated runs with identical state tags.
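A minimal sketch of the accounting check, assuming hypothetical rail readings and a 64b/66b line code for the payload denominator (substitute the actual coding overhead of the link):

```python
def rail_power_w(rails):
    """Sum V*I over (volts, amps) pairs measured at the IC load points."""
    return sum(v * i for v, i in rails)

def accounting_gap_pct(vin_v, iin_a, rails):
    """Percent of board-input power not accounted for by the summed rails."""
    board_w = vin_v * iin_a
    return 100.0 * (board_w - rail_power_w(rails)) / board_w

def mw_per_gbps(power_w, gbps):
    """Per-Gbps figure for a declared scope (power) and denominator (gbps)."""
    return 1000.0 * power_w / gbps

# Hypothetical record: two rails at the IC, 12 V board input, 4 lanes x 25 Gbps.
RAILS = [(0.9, 1.5), (1.8, 0.5)]     # (V, A) per rail -> 2.25 W chip-only
VIN_V, IIN_A = 12.0, 0.25            # 3.0 W at board input
LINE_GBPS = 100.0
PAYLOAD_GBPS = LINE_GBPS * 64 / 66   # assumption: 64b/66b line coding

chip_line = mw_per_gbps(rail_power_w(RAILS), LINE_GBPS)        # chip-only, line rate
board_payload = mw_per_gbps(VIN_V * IIN_A, PAYLOAD_GBPS)       # board-in, payload rate
gap = accounting_gap_pct(VIN_V, IIN_A, RAILS)
```

In this made-up record the same hardware reports about 22.5 mW/Gbps chip-only against line rate and a visibly larger figure board-in against payload, which is why scope and denominator have to be stated next to every number.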

“Works for 5 minutes then drops”—is it thermal throttling or link instability?

Likely cause: steady-state thermal rise crossing a throttle/OTP threshold, or temperature-driven leakage pushing power higher until protection engages.

Quick check: log a timeline of Ttop/Tboard, input power, and throttle/derate flags; look for “drop event” alignment with a temperature knee or a power step.

Fix: reproduce with a controlled workload window (sustained Y min), add airflow or heatsink contact improvement, and apply a derate rule before the threshold (temperature hysteresis included).

Pass criteria: no throttle/OTP during Y minutes sustained worst-case workload at Ta(max), and temperature slope |dT/dt| ≤ X °C/min after reaching steady state.
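One way to evaluate the slope criterion and the drop-event alignment from a log is sketched below. The log layout (a list of `(time_s, temp_c)` tuples), the guard band, and the function names are assumptions for illustration.

```python
def slope_c_per_min(samples, window_s):
    """Least-squares temperature slope (C/min) over the last window_s seconds.

    samples: list of (time_s, temp_c), sorted by time, with distinct timestamps.
    """
    t_end = samples[-1][0]
    pts = [(t, c) for t, c in samples if t >= t_end - window_s]
    if len(pts) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_c = sum(c for _, c in pts) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return 60.0 * num / den

def temp_at_event(samples, t_event_s):
    """Last logged temperature at or before the event time, or None."""
    prior = [c for t, c in samples if t <= t_event_s]
    return prior[-1] if prior else None

def drop_is_thermal_suspect(samples, t_drop_s, throttle_c, guard_c):
    """True if the drop happened within guard_c of the throttle threshold."""
    temp = temp_at_event(samples, t_drop_s)
    return temp is not None and temp >= throttle_c - guard_c
```

A drop that occurs far below the throttle threshold with a flat slope points away from thermal causes. A drop that lands inside the guard band while the slope is still positive is the pattern the quick check above is looking for.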

Same throughput, different PCB revision runs hotter—what board-level variable usually changed?

Likely cause: reduced heat spreading (copper area/planes), fewer or smaller thermal vias, changed component placement creating an airflow shadow, or added nearby heat sources (DC/DC, shield can).

Quick check: compare rev-A vs rev-B: copper pour under the package, via count/pitch, inner-plane continuity, and physical obstructions above the device; capture identical workload heatmaps at the same Ta.

Fix: restore thermal path symmetry (spread copper, add via array, keep planes continuous), and enforce a mechanical keep-out to protect airflow over the hotspot.

Pass criteria: ΔT(top) between revisions ≤ X °C at the same sustained window, or demonstrated θJA improvement ≥ X% with the same test setup.

IR camera shows low temp but failures correlate with heat—what emissivity/spot error to suspect?

Likely cause: wrong emissivity on shiny surfaces, reflections from hot neighbors, spot size larger than the hotspot, or viewing angle causing under-read.

Quick check: place matte tape/paint dot on the package top, re-measure with emissivity set accordingly, and cross-check with a thin thermocouple; confirm the IR spot diameter < X× hotspot size.

Fix: standardize IR procedure: emissivity reference patch, fixed distance/angle, and a calibration step against a contact sensor on the same surface.

Pass criteria: IR vs contact measurement error ≤ X °C on the same marked spot under steady-state conditions.

θJA from datasheet underestimates Tj by a lot—what JEDEC condition mismatch is typical?

Likely cause: θJA was measured on a specific JEDEC board (copper, layers, airflow, orientation) that does not match the real PCB (smaller copper, blocked airflow, enclosure).

Quick check: read the datasheet θJA conditions (board type, copper area, airflow) and compare to the actual stack; use ψJT/ψJB with measured Ttop/Tboard to back-calculate Tj.

Fix: replace single θJA with a board-calibrated model: measure Ttop and Tboard under known power, derive an effective θ for that assembly, then re-run margin and derate rules.

Pass criteria: the effective-θ model predicts measured Ttop/Tj within X °C across two workloads (steady + burst) at Ta(max).
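The back-calculation can be sketched with two standard first-order relations: Tj ≈ Ttop + ψJT × P for the junction estimate, and θ_eff = (Tj − Ta) / P for the board-calibrated resistance. The function names and the numbers in the example are assumptions. The model is linear, so it ignores the temperature dependence of power.

```python
def tj_from_top_c(ttop_c, psi_jt_c_per_w, power_w):
    """Estimate junction temperature from a measured package-top temperature."""
    return ttop_c + psi_jt_c_per_w * power_w

def effective_theta(tj_c, ta_c, power_w):
    """Board-calibrated junction-to-ambient resistance for this assembly (C/W)."""
    return (tj_c - ta_c) / power_w

def predict_tj_c(ta_c, power_w, theta_eff):
    """Re-run the margin check at another ambient or power using theta_eff."""
    return ta_c + power_w * theta_eff

# Hypothetical bench point: Ttop = 95 C at Ta = 40 C, 2 W, psi_JT = 0.5 C/W.
tj_bench = tj_from_top_c(95.0, 0.5, 2.0)        # 96 C
theta_eff = effective_theta(tj_bench, 40.0, 2.0)  # 28 C/W for this board
tj_at_ta_max = predict_tj_c(70.0, 2.0, theta_eff)  # 126 C at Ta(max) = 70 C
```

If the datasheet θJA for the same package were, say, well below the derived 28 °C/W, the difference is the JEDEC-board mismatch described above. The derived value, not the datasheet one, should feed the margin and derate rules.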

Fan increases airflow but hotspot gets worse—what airflow path or recirculation pattern causes this?

Likely cause: short-circuit airflow (inlet to outlet bypass), recirculation loop, or a new “shadow zone” where high-speed flow skips the hotspot and pulls warm air back over it.

Quick check: run a simple airflow visualization (smoke thread / tissue / anemometer points) to confirm direction and bypass; measure Tin/Tout and hotspot Ttop with the same workload window.

Fix: add ducting/foam baffles to force flow across the hotspot, separate inlet/outlet, and remove obstructions that create a local recirculation pocket.

Pass criteria: hotspot temperature improves by ≥ X °C and ΔT(Tout−Tin) increases by ≥ X °C (indicating useful heat extraction) under the same sustained load.

Low-power mode saves energy but increases temperature spikes after wake—what transient to log?

Likely cause: wake causes a short burst of digital activity and analog re-lock that raises instantaneous power; average energy improves but peak thermal shock increases.

Quick check: log power at high time resolution (≥ X ksps equivalent) for the first Y ms after wake, plus temperature slope; correlate spikes with wake events.

Fix: stagger wake (ports/lane groups), limit wake burst rate, or pre-condition airflow; if allowed, add a soft-start ramp in firmware/power sequencing.

Pass criteria: peak power during wake ≤ X% of sustained power budget, and post-wake ΔT within Y seconds ≤ X °C (thermal shock bound).
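A small sketch of how the wake window could be reduced to the logged fields, assuming a uniformly sampled power trace and a known sample index for the wake event (both assumptions; the field names are hypothetical):

```python
def wake_metrics(power_w, sample_rate_hz, wake_idx, window_ms, sustained_budget_w):
    """Summarize the first window_ms after a wake event.

    power_w: list of instantaneous power samples (W), uniformly sampled.
    Returns peak power, peak as a percent of the sustained budget,
    and the energy delivered inside the window.
    """
    n = int(round(sample_rate_hz * window_ms / 1000.0))
    window = power_w[wake_idx:wake_idx + n]
    if not window:
        raise ValueError("wake window is empty; check wake_idx and window_ms")
    peak = max(window)
    return {
        "peak_w": peak,
        "peak_pct_of_budget": 100.0 * peak / sustained_budget_w,
        "energy_j": sum(window) / sample_rate_hz,  # rectangle-rule integral
    }

# Hypothetical trace at 1 kHz: low-power idle, then a wake burst settling to 2 W.
TRACE = [0.2] * 5 + [3.0, 2.5, 2.0, 2.0] + [2.0] * 10
m = wake_metrics(TRACE, sample_rate_hz=1000.0, wake_idx=5,
                 window_ms=4.0, sustained_budget_w=2.0)
```

The sample rate has to be high enough that the burst spans several samples. A slow logger averages the spike away and reports a peak that looks compliant.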

Thermal pad added but no improvement—what contact resistance or mounting pressure issue is common?

Likely cause: pad too thick/hard (insufficient compression), uneven pressure, air gaps/voids, or a flatness mismatch that prevents real contact at the hotspot.

Quick check: inspect pad “imprint” after disassembly; verify compression ratio (actual thickness vs nominal), and check if the hotspot aligns with the contact zone.

Fix: choose a pad with appropriate softness/compressibility and thickness tolerance; improve mounting planarity and clamp force; consider tape/grease if stack height allows.

Pass criteria: confirmed compression within X–Y% target range and hotspot reduction ≥ X °C under identical sustained workload.
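The compression check is simple arithmetic; a sketch follows, with the target range passed in as placeholders rather than taken from any pad datasheet:

```python
def compression_pct(nominal_mm, installed_mm):
    """Pad compression as a percent of nominal (uncompressed) thickness."""
    return 100.0 * (nominal_mm - installed_mm) / nominal_mm

def compression_ok(nominal_mm, installed_mm, min_pct, max_pct):
    """True if compression falls inside the target range (placeholder limits)."""
    return min_pct <= compression_pct(nominal_mm, installed_mm) <= max_pct
```

For example, a 1.0 mm pad sitting in a 0.8 mm gap is compressed by about 20%, while the same pad in a 0.98 mm gap is barely touching and is unlikely to make real contact at the hotspot.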

Why does higher ambient cause more than linear power rise—what leakage/OTP behavior explains it?

Likely cause: temperature increases leakage and bias currents, raising power; higher temperature can also reduce DC/DC efficiency and trigger pre-throttle behavior that changes operating point.

Quick check: sweep Ta in steps (e.g., +10 °C) and log input power, device temperature, and any throttle flags; check whether power rise accelerates near a protection threshold.

Fix: apply derating by temperature bands, ensure airflow margin at Ta(max), and re-validate power budget using the highest-Ta sustained window (not a short burst).

Pass criteria: at Ta(max), sustained input power ≤ P_budget and Tj margin ≥ X °C; no protection/pre-throttle activation within Y minutes steady window.
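To spot where the power rise starts to accelerate in a Ta sweep, one option is to compare the slope of each step against the previous one. The sweep data and the 1.5× ratio below are illustrative assumptions:

```python
def step_slopes(sweep):
    """Power slope (W per C) for each consecutive pair of (ta_c, power_w) points."""
    return [(p2 - p1) / (t2 - t1)
            for (t1, p1), (t2, p2) in zip(sweep, sweep[1:])]

def accelerating_points(sweep, ratio=1.5):
    """Return the Ta values that end a step whose slope exceeds ratio x the prior step."""
    slopes = step_slopes(sweep)
    return [sweep[i + 1][0]
            for i in range(1, len(slopes))
            if slopes[i - 1] > 0 and slopes[i] > ratio * slopes[i - 1]]

# Hypothetical sweep in +10 C steps: (Ta in C, sustained input power in W).
SWEEP = [(25, 2.00), (35, 2.05), (45, 2.10), (55, 2.20), (65, 2.40)]
flagged = accelerating_points(SWEEP)
```

In this made-up sweep the rise is roughly linear up to 45 °C and then doubles at each step, so the 55 °C and 65 °C points are flagged. Those are the ambients where throttle flags and DC/DC efficiency should be checked first.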

Port-to-port temperature spread is huge—what copper/heat-spreading asymmetry is most common?

Likely cause: unequal copper planes/via arrays, one port located near a hot converter or shield edge, or uneven airflow distribution across the board.

Quick check: compare each port’s copper/via pattern and nearby heat sources; measure a synchronized heatmap under the same traffic/duty and confirm airflow direction across all ports.

Fix: balance heat spreading geometry (plane continuity, via density), move or isolate nearby heat sources, and use airflow guides so each port sees similar inlet temperature.

Pass criteria: ΔT(port max − port min) ≤ X °C at the sustained worst-case window and Ta(max).

DC/DC runs cool but PHY is hot—what efficiency/load point misleads the measurement?

Likely cause: power was measured at the wrong node (before LDOs or after sense resistors), or the DC/DC sits away from the hotspot while the IC dissipates most of the heat locally; converter efficiency may also be poor at the transient load point.

Quick check: measure rail power at the IC load points (V at the pins + I into the rail) during both peak and sustained windows; compare against VIN×IIN to reveal hidden drops/losses.

Fix: instrument power with correct sense locations (Kelvin), include all intermediate losses (LDO, cable, connector), and use a converter operating region that matches the sustained load point.

Pass criteria: power discrepancy between VIN×IIN and sum of rail powers ≤ X%, and sustained rail ripple/voltage droop stays within X (unit placeholder) during retrain/wake events.

Production passes at 25°C but field fails in summer—what margin number is usually missing?

Likely cause: qualification used short bursts (not sustained), did not test Ta(max), ignored airflow degradation (dust/blocked vents), or lacked an explicit Tj/surface margin requirement.

Quick check: replicate a field-like sustained duty at elevated Ta and reduced airflow; log time-over-temperature, throttle events, and peak-to-steady temperature rise.

Fix: define acceptance at Ta(max) with a sustained window and explicit margin, then implement derating before the limit; add telemetry/log fields for correlation in the field.

Pass criteria: at Ta(max) and “degraded airflow” condition, Tj margin ≥ X °C, surface ≤ X °C, and no throttle/OTP during Y minutes sustained duty.