
NVLink / High-Speed Interconnect Switch Explained


An NVLink / high-speed interconnect switch is the fabric node that connects many SerDes links, restoring signal margin with retiming/equalization and protecting system stability with reference-clock jitter conditioning and rich link telemetry.

The core of design-in success is measurable margin: a repeatable tuning flow (CTLE/FIR/DFE), a defensible jitter budget, and logs/counters that turn intermittent field issues into actionable root-cause buckets.

H2-1 · What it is & boundary

What an interconnect switch is—and what it is not

An NVLink-class interconnect switch is a multi-port SerDes switching node that routes high-speed lane groups between endpoints (typically GPUs/accelerators). It combines crosspoint switching (port-to-port mapping), a switch fabric (multi-port forwarding under load), and often signal-conditioning capabilities such as equalization and retiming. In practice, its success is measured by stable BER/CRC behavior across temperature, voltage, and traffic stress, backed by actionable counters and event logs.

The engineering boundary is defined by what must be solved at the topology level versus what can be solved on a single link:

Solve on a single link:
  • Single-link eye closure
  • Lane skew / bonding stability
  • Refclk jitter margin
Solve at the topology level:
  • Port mapping / isolation
  • Multi-hop latency predictability
  • Per-port RAS counters & logs

Out of scope by design: PCIe/CXL switching protocols (ACS/SR-IOV), Ethernet/InfiniBand stack behavior, NIC/DPU dataplane offloads, GPU card VRM/HBM power design, rack-level power/cooling infrastructure, and full BMC/Redfish system architecture. When those topics are needed, a short pointer/link is sufficient; detailed coverage belongs to the relevant sibling pages.

Output: practical boundary comparison

Component · What it fixes · What it won’t fix · Cost & “use when” triggers

  • Redriver
    Fixes: boosts/reshapes signaling to compensate moderate loss; provides basic equalization knobs.
    Won’t fix: cannot remove accumulated timing noise (no CDR); limited help on severe jitter/ISI; does not solve multi-port routing.
    Cost: low latency, lower complexity.
    Use when: the channel loss is manageable and the problem is amplitude/ISI, not clocking margin.
  • Retimer
    Fixes: re-clocks data with CDR; reduces jitter accumulation; improves the eye at the receiver; stabilizes long or noisy channels.
    Won’t fix: does not provide topology-level port mapping; cannot isolate traffic domains; cannot replace fabric-level forwarding.
    Cost: adds fixed latency, power/thermal cost.
    Use when: BER improves with re-clocking and failures correlate with jitter/phase-noise margin.
  • Interconnect switch
    Fixes: routes lane groups across multiple endpoints; enables isolation, reroute/disable policies, and switch-local observability (counters/logs); may also include equalization/retiming.
    Won’t fix: cannot “mask” a fundamentally broken channel without margin; cannot replace endpoint SerDes quality; does not handle protocol-stack performance tuning.
    Cost: more latency/power, system-integration effort.
    Use when: multi-endpoint routing, fault isolation, and verifiable RAS/telemetry are required, not just a cleaner eye.
Figure F1 — Boundary map: Channel → Retimer/Redriver → Switch
(Diagram: the channel's loss, crosstalk, reflections, and skew consume the jitter budget; a redriver/retimer applies equalization (TX FIR · CTLE · DFE) and retiming (CDR) at a latency/power/tuning cost; the interconnect switch adds port mapping, a multi-port fabric, and observability. Takeaway: use the switch when topology routing, isolation, and verifiable telemetry are required, not just a cleaner eye.)
H2-2 · Where it sits

Topology, ports, and lane groups: a practical placement model

In an interconnect domain, a “port” is best treated as a bonded lane group rather than a single wire. This matters because most real-world failures show up as one lane becoming the limiter (skew, loss, crosstalk, or margin collapse under temperature). A placement model that speaks in lane groups makes topology design, validation, and field-debug repeatable.

A useful abstraction is: endpoint ↔ (lane-group links) ↔ interconnect switch ↔ (lane-group links) ↔ endpoint. Without discussing any protocol stack, the key system behaviors can be predicted by three latency contributors:

  • SerDes pipeline latency (per port): baseline encode/decode and elastic buffering.
  • Retiming latency (optional): fixed delay added when CDR re-clocks the data path.
  • Fabric hop latency: forwarding delay that scales with hop count and internal contention.
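The three contributors above add along a path. A minimal sketch of the additive budget, with all class/field names and numbers invented for illustration (not vendor values):

```python
# Hypothetical latency-budget sketch for a multi-hop interconnect path.
# Field names and nanosecond values are illustrative, not vendor data.
from dataclasses import dataclass

@dataclass
class HopLatency:
    serdes_ns: float   # per-port SerDes pipeline (encode/decode + elastic buffer)
    retime_ns: float   # fixed CDR retiming delay (0.0 when retiming is off)
    fabric_ns: float   # forwarding delay for one fabric hop, no contention

def path_latency_ns(hops: list[HopLatency]) -> float:
    """Sum the three contributors over every hop on the path."""
    return sum(h.serdes_ns + h.retime_ns + h.fabric_ns for h in hops)

# Example: two-hop path with retiming enabled on the first hop only.
path = [HopLatency(25.0, 10.0, 40.0), HopLatency(25.0, 0.0, 40.0)]
total = path_latency_ns(path)   # 140.0 ns with these illustrative numbers
```

Contention adds a queueing term on top of this floor, which is why the section recommends recording min/p50/p99 rather than a single number.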

Port planning should optimize not only bandwidth but also maintainability: clear lane-group naming, predictable breakout rules, and an explicit plan for isolation (what happens when a single port or lane fails). This reduces “random” failures into observable, bounded cases.

Output: port planning checklist (interconnect domain)

  • Lane-group definition: fixed lanes-per-port, stable naming, and consistent polarity/ordering rules.
  • Breakout policy: defined breakout modes and the debug plan for worst-lane identification.
  • Skew control: deskew tolerance budgets for bonding; avoid mixing very different path lengths inside one group.
  • Clock domain clarity: which ports share a reference clock; where jitter cleaning sits; how skew is bounded.
  • Sideband intent: reset/health visibility and a minimal method to trigger training/loopback when needed.
  • Isolation paths: ability to disable a port/lane group and keep the rest of the domain stable.
  • Validation hooks: test access (PRBS/loopback), counters snapshot points, and a “worst-case matrix” plan.
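To make the checklist concrete, here is a hypothetical lane-group record; every field and port name is an assumption for illustration, not a vendor schema:

```python
# Sketch of a lane-group ("port") planning record following the checklist
# above. All names (SW0_P0, refclk_A, ...) are hypothetical.
from dataclasses import dataclass

@dataclass
class LaneGroup:
    name: str                # stable name with consistent ordering rules
    lanes: tuple[int, ...]   # fixed lanes-per-port, explicit lane order
    refclk_domain: str       # which reference clock this group shares
    skew_budget_ps: float    # deskew tolerance budget for bonding
    isolated: bool = False   # True once the group is fenced off

def isolate(group: LaneGroup) -> None:
    """Disable one group while the rest of the domain stays stable."""
    group.isolated = True

ports = [
    LaneGroup("SW0_P0", (0, 1, 2, 3), "refclk_A", 30.0),
    LaneGroup("SW0_P1", (4, 5, 6, 7), "refclk_A", 30.0),
]
isolate(ports[1])
active = [p.name for p in ports if not p.isolated]   # ["SW0_P0"]
```

Keeping isolation a first-class field in the plan is what turns a failed lane group into a bounded, observable case rather than a domain-wide outage.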

The placement model intentionally stays at the interconnect domain level. Anything that depends on PCIe/CXL or Ethernet/IB protocol behavior is excluded here and should be handled in its dedicated pages.

Figure F2 — Topology view: endpoints, lane groups, ref clock, sideband
(Diagram: four endpoints with lane-group ports A–H connect through the interconnect switch; a reference clock with a jitter-cleaner PLL and a sideband for reset/health sit alongside the latency contributors (SerDes · retime · hop). Takeaway: plan ports for bandwidth AND maintainability: naming, isolation, clock domains, and validation hooks.)
H2-3 · Key metrics that matter

Datasheet metrics that predict real stability

For an interconnect switch, “good on paper” is not enough. The practical goal is predictable latency and stable error behavior across stress (temperature, voltage, and traffic). The most useful metrics are the ones that can be measured, trended, and tied to a fail signature using switch-local counters and margin tests.

Spec traps to ignore (or demand proof for)

  • Peak bandwidth quoted without internal-contention assumptions.
  • “Typical latency” without retiming mode and hop count.
  • “Supports PAM4” without margining methods.
  • “Low jitter” without test conditions and reference-clock assumptions.

Output: Metric → Engineering meaning → How to measure

Metric (what to check) · Engineering meaning (why it matters) · How to measure / prove

  • Ports, lanes/port, aggregate bandwidth (Capacity)
    Meaning: determines topology scale and whether lane groups can be mapped without awkward breakouts. Weak capacity often forces extra hops, increasing latency variance and margin loss.
    Measure: validate with a topology model: lane-group map, hop count, and “worst-case” mapping. Require a clear porting diagram and supported lane-group modes.
  • SerDes rate mode (NRZ/PAM4) (PHY)
    Meaning: indicates signaling style and sensitivity to channel loss and jitter margin. PAM4 typically demands stronger equalization and more disciplined margining to avoid “runs but unstable.”
    Measure: prove with margining results (eye height/width or equivalent) at the target data rate and channel condition (loss/XT). Demand corner coverage, not only typical.
  • Latency components (Determinism)
    Meaning: latency is not one number: SerDes pipeline + optional retime fixed delay + fabric hop/queue. This predicts tail behavior and multi-hop predictability.
    Measure: request a latency breakdown by mode (retime on/off) and by hop. Measure with controlled traffic patterns and a hop sweep; record min/p50/p99 under thermal stress.
  • Equalization knobs (Margin control)
    Meaning: tunable TX FIR, CTLE, and DFE determine whether the channel can be pulled back from eye closure without overfitting noise. More knobs are useful only if they are observable and repeatable.
    Measure: use PRBS/BERT or built-in margin tests to map “knob sweep → margin change.” Require saved profiles per port and a method to export settings + results.
  • Lane margining support (Proof)
    Meaning: margining turns “it works” into “it has headroom.” It separates marginal designs from robust ones and supports fast binning in production.
    Measure: run a margin sweep per lane group across corners (temp/voltage). Capture the worst-lane distribution and a pass/fail threshold tied to field risk.
  • Error counters (Observability)
    Meaning: CRC trends, deskew events, CDR lock events, and retry/training events provide the earliest signal that margin is collapsing, often before a hard failure.
    Measure: verify counter coverage per port and counter reset semantics. Trend counters against temperature and traffic; require event timestamps or ordered snapshots.
  • Switch-local thermal & rail alerts (Operate)
    Meaning: many “random” failures are thermal or rail-noise correlated. Switch-local alarms enable correlation without depending on external systems.
    Measure: confirm alert thresholds, hysteresis behavior, and log visibility. Heat-soak tests: correlate error slope with temperature and alert states.
  • RAS features (Reliability)
    Meaning: lane repair, port isolation, and link downgrade policies prevent a single weak lane from cascading into full-domain instability and reduce MTTR in the field.
    Measure: fault-inject with worst-lane conditions (margin squeeze) and verify isolation behavior. Require logs that prove why a downgrade/isolation happened.

The best “real metrics” are the ones that can be closed into a loop: measure → trend → correlate → act. If a spec cannot be measured in the intended environment, it should not be used as the primary selection driver.
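One way to sketch the measure → trend → correlate → act loop is a least-squares slope of an error counter against temperature; the counter values and the action threshold below are invented for illustration:

```python
# Minimal "measure -> trend -> correlate -> act" sketch: ordinary
# least-squares slope of CRC-error counts vs temperature. All numbers
# and the trigger threshold are illustrative, not recommended values.
def slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys over xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

temps_c  = [40.0, 50.0, 60.0, 70.0]   # measured per sampling window
crc_errs = [2.0, 3.0, 9.0, 20.0]      # CRC errors per window
err_slope = slope(temps_c, crc_errs)  # errors per degree C (0.6 here)

ACT_THRESHOLD = 0.3                   # illustrative trigger
action = "investigate margin" if err_slope > ACT_THRESHOLD else "keep trending"
```

A rising slope is the "trend" signal the text describes; the "correlate" step would then align it with temperature zones and rail alerts before acting.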

Figure F3 — Metrics map: capacity, determinism, margin, and proof
(Diagram: metric groups Capacity (ports, lanes/port, aggregate bandwidth), Determinism (SerDes latency, retime delay, fabric hop/queue), Margin control (TX FIR, CTLE/DFE, lane margining), and Operate & prove (counters, events, RAS actions) feeding a “stable · predictable · debuggable” outcome. Takeaway: favor metrics that can be measured and trended: margining + counters + latency breakdown.)
H2-4 · Inside the box

Data path anatomy: where switching differs from “a bigger retimer”

A practical internal view is a data-path stack: ingress SerDes conditioning builds a clean lane stream; lane bonding/deskew forms a stable lane group (“port”); a crosspoint or fabric maps ingress ports to egress ports; and the egress SerDes drives the channel. Optional retiming points trade fixed latency for jitter cleanup.

Switching is fundamentally different from retiming because it introduces topology control: port mapping, isolation, and fault containment. Those features must be paired with switch-local counters and events; otherwise failures look random and cannot be proven robust.

Output: error-injection points (what becomes sensitive where)

  • Ingress PHY: EQ overfit can hide margin loss until temperature shifts; CDR lock margin collapses under refclk noise.
  • Bonding/deskew: one “worst lane” dominates; skew drift triggers deskew events and error bursts.
  • Fabric/crosspoint: internal contention creates latency variance; hot spots couple into timing margin if thermals rise.
  • Egress PHY: output jitter/ISI sensitivity depends on final EQ profile and channel variation.
  • Monitor taps: poorly placed counters can show “clean” while the real weak lane is failing.

Architecture discussion stays at the SerDes/crosspoint/fabric level. Protocol-level behaviors and endpoint architectures are excluded and should be treated as separate topics.

Figure F4 — Chip anatomy: PHY, bonding, fabric, retime points, monitors
(Diagram: ingress PHY (CTLE, DFE, TX FIR) → bond/deskew (lane group, worst lane) → crosspoint/fabric (port map, contention, hop-latency variance) → egress PHY (drive + EQ, output jitter), with optional retiming (CDR, fixed latency) and monitor taps (counters, margining, events) marked as sensitive points. Takeaway: switching adds topology control (mapping + isolation) and must be paired with counters and events.)
H2-5 · Retiming & equalization

Channel budget, EQ boundaries, and a repeatable tuning SOP

The fastest way to stabilize a high-speed interconnect is to treat the link as a channel budget problem, not a “knob-twiddling” problem. Channel impairments collapse eye margin through distinct mechanisms, and each EQ tool has a clear boundary: CTLE shapes the receive spectrum, TX FIR pre-emphasizes to counter loss, DFE targets post-cursor ISI, and retiming (CDR) trades fixed latency for timing cleanup.

Channel model → eye / BER impact

Insertion loss reduces high-frequency energy and increases ISI (eye closure in width/height).
Return loss creates reflections that produce “patterned” distortion and unstable convergence.
Crosstalk injects noise that can look like ISI; aggressive DFE may amplify errors.
Group delay ripple distorts symbol timing across frequency, causing non-intuitive failures under corners.

EQ toolbox: boundaries and tradeoffs

Tool · Primary job · Typical side effects / limits

  • TX FIR: counters insertion loss by shaping the transmit spectrum and reducing ISI at the receiver. Limits: can increase sensitivity to coupling/XT; poor profiles cause overshoot and mask real noise.
  • CTLE: boosts high-frequency components at the receiver to reopen the eye under lossy channels. Limits: also boosts noise; too much CTLE reduces SNR and makes DFE decisions unstable.
  • DFE: cancels post-cursor ISI with decision feedback when linear EQ is insufficient. Limits: can misinterpret noise/crosstalk as ISI and amplify error bursts; must be bounded.
  • CDR / retiming: improves timing stability by re-establishing the sampling phase; reduces accumulated jitter sensitivity. Limits: adds fixed latency and can create mode-dependent determinism risks; requires proof under corners.

Output: a copyable tuning SOP (steps + record fields)

  • Step 0 — Lock the experiment (baseline)

    Fix data rate and training mode. Snapshot per-port counters (CRC/deskew/CDR events) and an initial margin readout. Record: rate/mode, profile ID, ambient/board temperature, rail state.

  • Step 1 — Find the worst lane (do not average)

    Run PRBS/BERT (or equivalent) and lane margining to rank lanes. Treat the “worst lane” as the governing constraint for the whole lane group. Record: worst-lane ID, margin curve key points, event-rate slope vs temperature.

  • Step 2 — Converge in a disciplined order: CTLE → TX FIR → DFE

    Adjust one dimension at a time. First stabilize the receive spectrum (CTLE), then shape TX (FIR), then use bounded DFE only if needed. Stop when margin improves monotonically without counter spikes. Record: knob values, pass/fail points, counter deltas per change.

  • Step 3 — Decide on retiming using a threshold, not preference

    Enable retiming when margining shows timing headroom is insufficient or error slopes rise sharply with temperature/voltage. Record fixed latency impact and confirm determinism across modes.

  • Step 4 — Prove headroom (margining across corners)

    Build a corner matrix (temperature, voltage, traffic stress). Require a minimum residual margin and stable counters (no event bursts). Store final per-port profiles and export the proof artifacts.

Common pitfalls: over-DFE can amplify noise and create burst errors; “works at room temp” can fail at hot/cold due to drift; tuning against an average lane hides the true limiter.
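Step 2's one-dimension-at-a-time order can be sketched as a greedy per-knob sweep against a margin readout. The `mock_margin` surface below is a stand-in; a real flow would call the device's margining interface, which this sketch only assumes exists:

```python
# Sketch of the Step-2 convergence order (CTLE -> TX FIR -> bounded DFE),
# one dimension at a time against a margin readout. The margin model is
# a hypothetical stand-in for real per-lane margining results.
def converge(measure_margin, ctle_range, fir_range, dfe_range):
    """Greedy per-dimension sweep in the disciplined order above."""
    best = {"ctle": ctle_range[0], "fir": fir_range[0], "dfe": dfe_range[0]}
    for knob, values in (("ctle", ctle_range),
                         ("fir", fir_range),
                         ("dfe", dfe_range)):
        # adjust one knob at a time, keeping the others fixed
        best[knob] = max(values, key=lambda v: measure_margin({**best, knob: v}))
    return best

def mock_margin(s):
    # Illustrative separable margin surface peaking at ctle=2, fir=1, dfe=1.
    return 10 - (s["ctle"] - 2) ** 2 - (s["fir"] - 1) ** 2 - (s["dfe"] - 1) ** 2

best = converge(mock_margin,
                ctle_range=[0, 1, 2, 3],
                fir_range=[0, 1, 2],
                dfe_range=[0, 1, 2])
# best == {"ctle": 2, "fir": 1, "dfe": 1} for this separable mock surface
```

A real margin surface is not separable, which is exactly why the SOP also requires recording counter deltas per change and stopping when margin stops improving monotonically.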
Figure F5 — Channel budget waterfall: loss/noise/jitter vs recovered eye margin
(Diagram: the ideal eye opening reduced by insertion loss, return loss, crosstalk, and timing jitter, then partially recovered by TX FIR, CTLE, bounded DFE, and retiming (CDR), leaving a residual margin checked against the pass threshold. Takeaway: tune by workflow: worst-lane → CTLE → FIR → bounded DFE → margin proof across corners.)
H2-6 · Reference clock & jitter cleaners

Why reference-clock jitter is a make-or-break line

In high-speed SerDes links, the reference clock is not just a “frequency source.” Its phase noise and distribution noise shape the timing uncertainty seen by the sampling system. When timing headroom becomes small, links can look acceptable by frequency tolerance yet still show elevated error rates, training instability, or temperature-dependent dropouts.

Concept chain: phase noise → integrated jitter → BER risk

What changes · What it does in SerDes · What it looks like in the field

  • Refclk phase noise: reduces effective timing margin through the CDR/PLL path and increases sampling uncertainty. Field signature: BER slope rises with temperature; more CDR lock/training events before hard failures.
  • Distribution noise (fanout / coupling): injects additional jitter after the source; port-to-port sensitivity becomes location-dependent. Field signature: some ports are consistently weaker; failures correlate with certain load/thermal states.
  • Skew / isolation issues: create lane-group instability and reduce the ability to deskew/hold alignment under stress. Field signature: deskew events spike; the link “flaps” only in specific corners.

Output: jitter budget template (fill-in fields)

Inputs

Refclk source (measurement or vendor curve) · distribution nodes (fanout, routing segments) · cleaner mode (if used) · operating corners (temp/voltage).

Process

Convert “noise description” into integrated timing risk (conceptually: phase noise → integrated jitter → RJ/DJ behavior), then correlate with margining and switch-local events/counters.

Outputs

Residual timing margin vs pass threshold · per-port sensitivity map · event-rate trend (CDR/deskew/training) · corner matrix result.
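The “phase noise → integrated jitter” step can be sketched numerically: linearize the single-sideband phase-noise curve, integrate it over the offset band, and convert to RMS seconds. The L(f) points below are illustrative, not from any datasheet:

```python
# Conceptual SSB phase-noise -> RMS jitter conversion:
#   sigma_t = sqrt(2 * integral(10^(L(f)/10) df)) / (2*pi*f_carrier)
# Offsets and the flat -150 dBc/Hz floor are illustrative values only.
import math

def rms_jitter_s(f_carrier_hz, offsets_hz, l_dbc_hz):
    """Integrate L(f) (dBc/Hz) over the offset band into RMS jitter (s)."""
    lin = [10 ** (l / 10.0) for l in l_dbc_hz]
    # trapezoidal integration over the offset band
    area = sum((lin[i] + lin[i + 1]) / 2.0 * (offsets_hz[i + 1] - offsets_hz[i])
               for i in range(len(offsets_hz) - 1))
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_carrier_hz)

# Illustrative 156.25 MHz refclk, flat -150 dBc/Hz floor, 10 kHz..20 MHz band
offsets = [1e4, 1e5, 1e6, 1e7, 2e7]
noise   = [-150.0] * len(offsets)
jitter_fs = rms_jitter_s(156.25e6, offsets, noise) * 1e15   # ~200 fs RMS
```

In practice the curve is not flat and the integration band must match the downstream PLL/CDR transfer characteristics, which is why the process above correlates the number with margining rather than treating it as a pass/fail by itself.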

When a jitter cleaner is justified (practical criteria)

  • Event correlation: CDR/deskew/training events rise sharply with temperature or operating mode.
  • Timing-direction margin deficit: margining indicates timing headroom is the limiting axis even when amplitude looks acceptable.
  • Location dependence: a subset of ports fail earlier, consistent with clock-tree injection points.
A cleaner is not automatically beneficial. Placement and loop bandwidth shape what noise is rejected vs passed through. If fanout coupling or return-path noise dominates, “adding a cleaner” can mask the real injection point and still fail under corners.
Figure F6 — Refclk tree and cleaner placement: noise injection points to ports
(Diagram: refclk source (phase noise) → jitter-cleaner PLL (loop bandwidth) → fanout (coupling noise) → ports A–G, with source phase noise, fanout noise, and skew/isolation marked as injection points. Takeaway: diagnose by correlation: margining + switch-local events vs temperature and operating modes.)
H2-7 · Power, package, and thermal

Environment-driven drift: why links pass cold and fail hot

Interconnect stability is often limited by environment-driven drift. Temperature rise, coupling noise, and board-level return-path discontinuities can reduce timing and equalization headroom even when frequency tolerance appears acceptable. This chapter focuses only on factors that directly perturb SerDes PHY and PLL/clocking behavior—without expanding into VRM design.

Only the rails that matter (PHY / PLL cleanliness)

Sensitive rails: why “cleanliness” matters

PHY rail noise can translate into eye degradation and higher BER sensitivity.
PLL/clock rail noise can increase timing uncertainty, triggering deskew/CDR events.
Coupling paths are often board-level: return-path detours, plane splits, and shared noisy reference regions.

Decoupling principles (interconnect-domain only)

Keep local bypass close to the sensitive block, preserve a short and continuous return path, and avoid routing that forces the return current to cross discontinuities near SerDes/PLL regions.

Thermal hotspots and stability drift

SerDes banks and PLL regions can form hotspots. As temperature increases, equalization effectiveness can drift and timing headroom can shrink. A practical symptom is a rising slope of error or training/deskew events versus temperature, followed by link flaps or dropouts. Thermal throttling can further change activity patterns and noise coupling, producing second-order stability shifts that appear “random” unless correlated with telemetry.

Package & board-level contributors (focused on return-path continuity)

  • Reference plane continuity: discontinuities can force return currents to detour, increasing coupling into sensitive zones.
  • Return-path control: the interconnect domain should avoid unintentional shared return segments with noisy regions.
  • Local isolation: keep clocking/PLL neighborhoods protected from adjacent switching noise injection points.

Output: thermal–SI linked checklist (what to verify and correlate)

Check item · How to measure / observe · Decision signal

  • Temperature points (die hotspot / SerDes zone / PLL zone). Observe: switch-local sensors (if available) and board sensors closest to SerDes/PLL neighborhoods. Decision signal: event rates change sharply across a temperature band; failures repeat at specific temperatures.
  • Event correlation (deskew / CDR lock / training). Observe: trend counters versus temperature and operating mode (rate/profile/retime state). Decision signal: stable at cold, then sudden increases at hot; “port-local” sensitivity emerges.
  • EQ drift sensitivity. Observe: compare margining or eye metrics before/after thermal soak using the same profile snapshot. Decision signal: residual margin collapses at hot even though the profile is unchanged.
  • Rail alert association (PHY/PLL). Observe: correlate rail alerts (switch-local) with event bursts and margin drops. Decision signal: alerts align with error spikes; mitigation must focus on the injection path, not on average readings.
  • Threshold strategy (with hysteresis). Observe: define trigger thresholds on event slopes and temperature bands; log pre/post snapshots. Decision signal: actions occur before dropouts: degrade/isolate/retrain with evidence retained for root cause.
Field signature focus: “cold OK, hot fails” is rarely random. It is typically a correlated interaction between thermal drift, coupling paths, and reduced timing headroom in SerDes/PLL neighborhoods.
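The hysteresis part of the threshold strategy can be sketched as a simple trip/clear state machine; the threshold values are illustrative:

```python
# Sketch of a hysteresis trigger for event-slope thresholds: trip when
# the slope crosses a high threshold, re-arm only after it falls below
# a lower one, so the action does not chatter. Values are illustrative.
class HysteresisTrigger:
    def __init__(self, trip: float, clear: float):
        assert clear < trip, "clear level must sit below the trip level"
        self.trip, self.clear = trip, clear
        self.active = False

    def update(self, event_slope: float) -> bool:
        """Return True while mitigation (degrade/isolate/retrain) should hold."""
        if not self.active and event_slope >= self.trip:
            self.active = True
        elif self.active and event_slope <= self.clear:
            self.active = False
        return self.active

trig = HysteresisTrigger(trip=1.0, clear=0.4)
states = [trig.update(s) for s in (0.2, 1.2, 0.7, 0.3, 0.5)]
# -> [False, True, True, False, False]: no chattering around one threshold
```

The gap between trip and clear is what lets the action fire once, hold through noisy readings, and release cleanly, with pre/post snapshots logged at both transitions.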
Figure F7 — Thermal hotspots and sensitive blocks (drift path to link instability)
(Diagram: SerDes banks and PLL/clocking regions shown as hot zones on the package next to the fabric, with sensitive PHY/PLL rails and the drift chain: temp ↑ → EQ drift / jitter ↑ → margin ↓ → errors ↑. Takeaway: use telemetry correlation: temperature zones + event slopes + margin proof to avoid “random” hot failures.)
H2-8 · Management & telemetry (switch-local)

Observability: the ability to see degradation before it becomes a dropout

High-speed interconnects are maintainable only when degradation is observable. The goal is not to expose full system-management stacks, but to ensure the switch has switch-local telemetry and logging that can separate gradual link-quality decline from transient external causes.

Management interfaces (existence only)

Switches commonly expose configuration and readout paths through sideband-style interfaces such as I²C/SMBus/MDIO classes. These channels allow reading counters, margining results, and local temperature/rail alerts. System management layers are intentionally out of scope.

Telemetry tiers: what to watch

Tier · Metrics (examples) · Why it matters

  • Link health. Metrics: CRC/error counters, deskew events, CDR lock/loss, training events. Why: shows whether the link is weakening (trend) or experiencing bursts (transient).
  • Margin proof. Metrics: lane margining, eye height/width (or equivalent), worst-lane identification. Why: separates “works” from “has headroom” and identifies the governing lane.
  • Environment correlation. Metrics: switch-local temperature zones, PHY/PLL rail alerts, rate/mode/profile IDs. Why: explains why failures cluster at hot/certain modes and enables deterministic reproduction.

Sampling strategy (trend + trigger)

Periodic sampling (baseline trend)

Sample counters and temperature zones at a steady cadence to detect gradual degradation. Trend slopes are more informative than single-point snapshots.

Triggered sampling (capture evidence)

On mode changes (rate/profile/retime toggles) or event spikes (deskew/CDR bursts), take an immediate full snapshot (cfg + env + counters + margin).

Burst window (around failures)

When link flaps or drops, enable a short burst window to collect dense pre/post evidence for reproducibility and root-cause correlation.
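The trend + trigger part of this strategy can be sketched as a snapshot predicate over successive samples; all field names are assumptions for illustration:

```python
# Sketch of the "triggered sampling" rule above: take a full snapshot on
# a mode change (profile/retime toggle) or on a counter burst. Field
# names (profile_id, retime_on, crc_errors) are illustrative only.
def should_snapshot(prev: dict, cur: dict, burst_delta: int = 50) -> bool:
    """Full snapshot on mode change or on a counter burst between samples."""
    mode_changed = (cur["profile_id"] != prev["profile_id"]
                    or cur["retime_on"] != prev["retime_on"])
    burst = cur["crc_errors"] - prev["crc_errors"] >= burst_delta
    return mode_changed or burst

prev   = {"profile_id": "P7", "retime_on": True, "crc_errors": 120}
steady = {"profile_id": "P7", "retime_on": True, "crc_errors": 123}
spike  = {"profile_id": "P7", "retime_on": True, "crc_errors": 400}
assert not should_snapshot(prev, steady)   # baseline trend only
assert should_snapshot(prev, spike)        # burst -> capture full evidence
```

Periodic samples feed the trend slope; this predicate decides when to escalate from a counter sample to the full cfg + env + counters + margin snapshot.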

Figure F8 — Observability dashboard: Port → counters/margin/env → decision & action
(Diagram: ports/lanes feed switch-local telemetry (counters: CRC · deskew · CDR events; margining: eye_h · eye_w · worst_lane; environment: temp zones · rail alerts; config snapshot: profile_id · retime state) into a “degrading vs transient” decision and retrain/degrade actions, with snapshot triggers on mode changes and event spikes. Takeaway: always log port/lane + profile + env + counters + margin for a reproducible diagnosis.)
H2-9 · Failure modes & field debug

Unstable links: triage by symptom tree (fast narrowing, evidence-driven)

A link can come up and still fail to run stably when headroom is marginal or when a trigger condition (temperature band, mode switch, jitter injection, or coupling path) pushes the channel across its limit. Field debug should follow a repeatable loop: localize (port / direction / condition), separate domains (loopback / PRBS), and prove the trigger using a minimal reproduction matrix.

First 10 minutes: localize before changing anything

1) Localize: port + direction + condition

Identify whether the issue is port-local, direction-specific, or tied to a specific rate/mode/profile.

2) Classify: creeping vs burst behavior

Rising slopes suggest shrinking margin; sudden bursts suggest a trigger (temperature, mode switch, jitter injection).

3) Check worst-lane stability

A fixed worst lane is often position-related; a moving worst lane is often condition-related.

4) Capture evidence

Take a snapshot of config + environment + counters + margining before applying any “fix.”

Symptom classes → what to watch → what to do next

Symptom class · Watch (switch-local) · Next action (fast narrowing)

  • Training fails / retrains. Watch: training events, deskew events, CDR lock/loss; link state transitions. Next: freeze the condition; run loopback/PRBS to separate channel vs clock/retime domain; log a before/after snapshot.
  • BER/CRC creeps upward. Watch: CRC/error slope, margin score trend; temperature zone trend. Next: run a short temperature sweep and compare margin proof; verify whether a single port group dominates the slope.
  • Worst lane is always the same. Watch: worst-lane ID, worst-lane margin; repeatability across re-trains. Next: swap path/cable if possible; keep the profile constant; confirm “position-related” behavior with a minimal matrix.
  • Only hot/cold triggers it. Watch: event bursts vs temperature band; rail alerts (PHY/PLL) if present. Next: thermal-soak at the trigger band; capture dense pre/post evidence; compare margining at identical profiles.
  • Only high load triggers it. Watch: event bursts aligned with activity transitions; counters and margin shifts. Next: hold the rate constant; test activity step changes while logging; look for condition-correlated loss of headroom.

Diagnosis loop: counters → domain split → conclusion

The most reliable triage flow is evidence-first. Use counters to localize the failing port group and direction, then use a loopback/PRBS-style separation step to decide whether the dominant contributor is channel/equalization margin, clock/jitter headroom, or a trigger condition such as temperature.

Top 5 pitfalls (field signatures)

  • Refclk quality / injection
  • Lane order / deskew
  • EQ overfit
  • Broken return path
  • Thermal hotspot drift

Output: minimal reproduction matrix (temperature × rate/mode × port × path)

Use a small matrix to prove triggers with minimal combinations. Each cell should record: pass/fail, profile_id, temperature zones, counters snapshot, and worst-lane + margin. This turns “intermittent” into “reproducible.”

  • Temperature: Cold / Ambient / Hot
  • Rate/Mode: Mode A / Mode B (+ retime on/off)
  • Port group: Group 1 / Group 2
  • Path/Cable: Path A / Path B
  • Record (evidence) per cell: pass/fail + profile_id + env + counters + margin + worst_lane
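Enumerating the matrix is mechanical; a sketch over the axes above, where each cell carries the evidence fields from the “Record” column:

```python
# Sketch of the minimal reproduction matrix: enumerate every axis
# combination, then prune to the suspected trigger band. Axis labels
# mirror the table above; the record fields are the evidence to capture.
from itertools import product

temps  = ["cold", "ambient", "hot"]
modes  = ["modeA", "modeA+retime", "modeB"]
groups = ["group1", "group2"]
paths  = ["pathA", "pathB"]

matrix = [
    {"temp": t, "mode": m, "group": g, "path": p,
     "record": ["pass_fail", "profile_id", "temp_zones",
                "counters_snapshot", "worst_lane", "margin"]}
    for t, m, g, p in product(temps, modes, groups, paths)
]
# 3 * 3 * 2 * 2 = 36 cells; run the suspected trigger band first.
```

Even a pruned subset of cells, each with a full evidence record, is enough to turn “intermittent” into “reproducible under cell X”.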
Figure F9 — Symptom decision tree: from “link flaps” to evidence-driven actions
(Diagram: “link flaps / unstable” branches into training loop, creeping errors, fixed worst lane, temperature-triggered, and load-triggered paths, each with the watch items and actions from the table above; top pitfalls flagged: refclk · deskew · EQ overfit · return path · thermal drift.)
H2-10 · Validation & production test

Proving delivery: lab margin → production screening → field evidence loop

“Done” requires traceability across three environments. Lab characterization must demonstrate margin under stress; production tests must screen edge cases quickly and preserve worst-lane traceability; field telemetry must provide evidence that can be reproduced back in the lab. The output is a closed loop: Lab defines headroom, Prod enforces gates, Field feeds failures back into cases and gates.

Lab: characterize headroom (not just “it runs”)

BERT / PRBS evidence

Quantify error behavior under controlled patterns and conditions, capturing counters and margin proof.

Eye / margin proof

Use eye or equivalent margin metrics to show headroom and identify the governing lane group.

Stress corners

Temperature and supply-corner stress plus coupling scenarios to surface the edge of stability.

Production: fast screening + worst-lane traceability

PRBS loopback / self-test

Use repeatable patterns with loopback to screen marginal links quickly and consistently.

Quick margining

Run a short margin check to catch “passes now, fails later” units before shipment.

Record for traceability

Always store profile_id, worst_lane, margin score, counters, and temperature zones for each port group.

Coverage binding: metrics → method → gate → recorded fields

Validation becomes actionable only when each key metric is bound to a test method, a pass/fail gate type, and required record fields. Gates are expressed by threshold types (margin ≥ threshold, event slope ≤ threshold), without locking to a single vendor value.

Metric / risk · Method · Gate type · Record fields

  • Margin headroom (worst-lane governs). Method: margining / eye-equivalent measurement. Gate: margin ≥ threshold. Record: profile_id, margin_score, worst_lane, worst_lane_margin, temp zones.
  • Stability (creeping vs burst). Method: PRBS/BERT run with trend logging. Gate: event slope ≤ threshold. Record: counters snapshot (CRC/deskew/CDR), timestamps, mode/rate.
  • Training robustness. Method: repeated bring-up cycles + stress corners. Gate: retrain count ≤ threshold. Record: training_events, link_state transitions, profile_id.
  • Temperature susceptibility. Method: thermal sweep / soak with identical profile. Gate: margin drop ≤ threshold. Record: temp zones, margin trend, counters slope, worst-lane stability.

Output: test-case checklist template (ready for lab + prod + field)

Case ID · Purpose · Setup · Method · Gate · Record

  • TC-01 · Worst-lane headroom proof. Setup: Mode A, fixed profile, ambient. Method: margining. Gate: margin ≥ thr. Record: profile_id + worst_lane + margin.
  • TC-02 · Training robustness. Setup: repeated bring-up cycles. Method: bring-up + logs. Gate: retrain ≤ thr. Record: training_events + link_state.
  • TC-03 · Thermal susceptibility. Setup: cold/hot soak, fixed profile. Method: sweep + trend. Gate: drop ≤ thr. Record: temp + margin trend + counters.
  • TC-04 · Production quick screen. Setup: prod-line fixture, standard mode. Method: PRBS loopback. Gate: errors ≤ thr. Record: counters + timestamp + profile_id.
Figure F9 — Closed-loop validation: Lab → Production → Field logs → back to Lab
(Diagram: LAB (BERT/PRBS, eye/margin, stress corners) → PRODUCTION (PRBS loopback, quick margin, pass/fail gates) → FIELD (telemetry, logs & snapshots, worst-lane proof) → closed loop back to LAB: field failure → reproduce → update lab cases → update prod gates. Evidence continuity: profile_id + env + counters + margin + worst_lane across Lab, Prod, and Field.)
H2-11 · IC selection & design-in checklist

IC Selection & Design-In Checklist (with MPN examples)

This chapter converts “what matters” (metrics, SI/clock margin, telemetry, validation) into a purchase-ready checklist: what to ask before committing, what to lock down during design-in, and what to require as evidence for production readiness.

Procurement reality: NVLink/NVLink Switch silicon is commonly sourced as part of a platform / OEM solution path, not as a simple catalog MPN. Plan the supply path and evidence package early (reports + tools + reproducible configuration profiles).

A) Selection dimensions — shortlist axes (what to require as evidence)

“Good-looking datasheet numbers” are not enough. The selection axes below are framed as capability → engineering meaning → evidence type. The MPN list is provided for supporting clock/power building blocks that are typically orderable and must be aligned with the switch’s requirements.

Axis | What it really controls | Orderable MPN examples (non-exhaustive)
Ports & lane groups | Lane bonding/breakout constraints, port remap limits, worst-lane behavior under temperature and load. Require a clear “unsupported mapping” list + verified channel envelope. | Platform-sourced silicon — NVLink Switch is typically obtained via an OEM/platform path (confirm supply + support channel in RFQ).
Retiming modes | Fixed latency cost vs stability gain. Require a mode matrix: which paths retime, how latency classes differ, and how jitter transfer behaves across modes. | Clock cleaners: Si5345 / Si5341, LMK04832, HMC7044, 8V19N850, ZL30273 (as reference-clock conditioning building blocks).
Equalization knobs | Whether TX FIR / CTLE / DFE are controllable, repeatable, exportable. Require: ranges + step sizes + default profiles + an “export/import profile” mechanism. | Jitter attenuators: Si5345, LMK04832, HMC7044, ZL30273; fanout: ADCLK948, LMK1C1104 (clock distribution helpers; choose by I/O standard needs).
Reference clock requirements | “ppm OK” ≠ “phase noise OK”. Require the measurement method (integration band, units) and a decision rule for when a cleaner is required. | Si5345, Si5341, LMK04832, HMC7044, 8V19N850, ZL30273.
Telemetry / RAS | Ability to “see degradation”: margining, CDR/training status, error counters, worst-lane flags, port isolate, lane repair. Require a counter dictionary + event/log export format. | Evidence-driven — the MPN matters less than tooling, counter definitions, log export, and reproducible profiles.
Clean rails for PHY/PLL | Noise-sensitive rails shift jitter margin and EQ behavior with temperature/load. Require PSRR/noise targets and a layout guideline for the rail’s “quiet zone”. | LDOs: TPS7A94, TPS7A88, ADM7150, LT3045.
Package & routability | Whether return paths and reference planes can remain continuous across dense escape routing. Require stackup guidance + keepouts for sensitive clock/SerDes zones. | Layout constraint pack — ask for ball map + breakout guidance + SI channel rules + reference design notes.
Ecosystem & support | Margining tools, scripts, register/profile workflows, version compatibility rules. Require an evidence package: reports + tool versions + reproducible recipes. | EVM/Tools — prefer solutions with evaluation kits & documented automation paths (tool/SDK version pinned in BOM).

MPN notes: clock/fanout/LDO examples above are orderable components frequently used to meet refclk and “clean rail” requirements. Final selection must follow the switch’s specific I/O standards, jitter transfer needs, and power/thermal envelope.

B) Design-in checklist — make it controllable and traceable

The design-in goal is not “link comes up once”, but “margin is measurable, profiles are reproducible, and failures are diagnosable”.

1) Schematic hooks (interconnect domain only)

  • Refclk injection path: defined entry point, optional cleaner placement footprint, and a measurement-friendly node (test header/connector).
  • Clock distribution: controlled fanout / buffering plan (e.g., ADCLK948 or LMK1C1104 class, depending on required I/O standards).
  • Sideband visibility: ensure required access for reading counters, margining metrics, and event logs (protocol specifics remain out of scope).

2) Layout / channel hygiene (what prevents “hot OK, cold fail”)

  • Return path continuity: avoid reference plane splits under critical SerDes/clock routes and fanout branches.
  • Clock quiet zone: keep noisy aggressors away from cleaner/fanout + SerDes PLL region; prioritize short, shielded, consistent-impedance routes.
  • Thermal correlation: place thermal sensors where the SerDes/PLL hotspots actually live; align logging with those sensors.

3) Test hooks (minimum viable bring-up + production screening)

  • PRBS/loopback plan: define how a failing port/lane can be isolated without relying on external “good system state”.
  • Worst-lane capture: ensure the “worst lane” is identifiable and recorded under stress (temperature × rate × load).
  • Clock margin checks: preserve the ability to swap cleaner profiles and verify pass/fail deltas with the same test recipe.

4) Configuration management (non-negotiable for scaling)

Equalization / retiming / clock settings must be treated as a versioned artifact, not a one-off tuning session. The following “minimum record” prevents irreproducible builds:

profile_id: “EQ_NVX_112G_PAM4_A01”
device_stepping: “rev / stepping”
mode: “rate + lane_group + retime_mode”
eq_summary: “CTLE preset, TX FIR preset, DFE enable + caps”
refclk_chain: “source -> (cleaner MPN + config) -> (fanout MPN) -> port groups”
conditions: “temp range, supply range, cable/backplane class”
evidence: “margining snapshot IDs + counter baseline + stress duration”
tooling: “SDK/tool version pinned”

Example orderable building blocks for the refclk_chain: Si5345/Si5341, LMK04832, HMC7044, 8V19N850, ZL30273 (cleaners/attenuators); ADCLK948 or LMK1C1104 class (fanout/buffer); TPS7A94/TPS7A88, ADM7150, LT3045 (clean rails for PLL/SerDes).
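As a sketch of treating the minimum record as a versioned artifact, the fields can be canonicalized and hashed so any knob change yields a new, traceable version. Field values here are illustrative, mirroring the record above:

```python
import hashlib
import json

def freeze_profile(profile: dict) -> dict:
    # Canonical JSON (sorted keys, fixed separators) → stable hash, so any
    # change to the tuned settings produces a new, traceable artifact version.
    blob = json.dumps(profile, sort_keys=True, separators=(",", ":")).encode()
    return {**profile, "artifact_sha256": hashlib.sha256(blob).hexdigest()[:16]}

profile = {
    "profile_id": "EQ_NVX_112G_PAM4_A01",
    "mode": "112G PAM4 / 8-lane group / retime-on",
    "eq_summary": {"ctle": "P3", "tx_fir": "P2", "dfe": True},
    "refclk_chain": "XO -> cleaner (cfg C12) -> fanout -> port groups",
    "tooling": "SDK/tool version pinned",
}
frozen = freeze_profile(profile)
```

The hash makes “same profile_id, silently different knobs” impossible to ship unnoticed: two builds match only if every recorded field matches.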

C) RFQ must-ask questions (≤20) — require verifiable artifacts

Each question below is designed to force an evidence-backed answer (report, tool output, counter dictionary, or documented limitation), so selection does not collapse at thermal corners or in production.

  1. Supply path: Is the interconnect switch silicon available as a discrete MPN, or only via a platform/OEM solution? Provide supported procurement routes and lifecycle policy.
  2. Port/lane mapping limits: Provide a matrix of supported lane-grouping/breakout/remap constraints and known unsupported combinations.
  3. Channel envelope: Provide verified channel conditions (IL/RL/crosstalk classes) and the measurement method used.
  4. Retiming scope: Which paths truly retime? Provide retime mode list and fixed latency classes per mode.
  5. Jitter transfer: Provide jitter transfer / tolerance characterization under key modes and temperature corners.
  6. EQ control: Provide TX FIR / CTLE / DFE range, step sizes, and a documented “export/import profile” workflow.
  7. Lane margining: What margining metrics exist (eye height/width/score), and how can they be read programmatically?
  8. Worst-lane behavior: How is worst-lane detected and flagged? Provide a sample log/counter snapshot under stress.
  9. Training stability: Provide known causes for retrain/deskew churn and mitigation notes (temperature, voltage, connector variance).
  10. Refclk spec (method): Provide refclk phase-noise/jitter requirement including integration band and units.
  11. Cleaner decision rule: Provide a rule-of-thumb (with supporting evidence) for when a jitter cleaner is required and recommended placements.
  12. Clock chain options (orderable MPNs): List qualified/reference cleaners and fanout parts used in validated designs (e.g., Si5345/Si5341, LMK04832, HMC7044, 8V19N850, ZL30273; fanout such as ADCLK948 / LMK1C1104 class).
  13. Clean rails guidance (orderable MPNs): Provide rail noise/PSRR targets and known-good LDO examples for PLL/SerDes rails (e.g., TPS7A94/TPS7A88, ADM7150, LT3045 class).
  14. Telemetry dictionary: Provide a complete counter/state dictionary (names, meanings, reset behavior, overflow behavior).
  15. RAS: Does the device support lane repair and port isolate? Provide conditions, limits, and expected behavior.
  16. Event logs: What event logs are available, how are timestamps generated, and what is the export format?
  17. Validation bundle: Provide a recommended lab characterization plan (BERT/eye/jitter tolerance/thermal stress) and sample reports.
  18. Production recipe: Provide a production screening recipe (PRBS loopback + margin quick check) and pass/fail criteria guidance.
  19. Tooling & versioning: Provide the required SDK/tools, supported automation APIs, and a version-compatibility policy.
  20. Escalation artifacts: If field issues occur, what minimum dataset must be captured (counters + conditions + profiles) for root-cause turnaround?

D) BOM fields template — encode traceability (copy/paste)

These BOM fields separate “demo success” from “production scalable”. The intent is to pin the configuration and evidence chain, not just hardware.

Field | Meaning | Example value
switch_solution_path | How the interconnect switch is procured (discrete MPN vs platform/OEM); also pins the support channel. | “OEM platform module / partner SKU”
device_mpn / stepping | Exact material number + stepping/revision for all orderable supporting chips (clock/LDO/fanout). | LMK04832NKDT; ADCLK948BCPZ; TPS7A94…
ports / lane_grouping | Port count + lane-group plan + any remap assumptions. | “N ports; 8-lane groups; map vA”
supported_modes | Rate/mode list actually validated for this design. | “112G PAM4 class; mode set M1”
retime_mode / latency_class | Selected retiming behavior + the associated latency class. | “Retime-On; Latency-L2”
eq_profile_id | Versioned EQ/retime profile identifier (must be reproducible). | EQ_NVX_112G_PAM4_A01
eq_knob_summary | Human-readable summary of key knobs (not a full register dump). | “CTLE P3; FIR P2; DFE on”
refclk_chain_mpn | Refclk chain components with explicit MPNs and config IDs. | Si5345 + ADCLK948 + config C12
quiet_rail_mpn | Noise-sensitive rail regulator MPN(s) for PHY/PLL supply islands. | TPS7A94 (PLL); ADM7150 (RF/PLL)
telemetry_support | What is readable (margining, counters, thermal/rail alarms) + doc reference. | “Margin Y; Worst-lane Y; Dict v3”
validation_report_refs | Report identifiers for BERT/eye/jitter/thermal stress used for sign-off. | “BERT-RPT-07; TH-RPT-03”
production_test_recipe_id | Screening recipe version (loopback + margin quick check + thresholds). | “PROD_PRBS_MRG_R1”
tool_sdk_version | Tool/SDK version pinned so results remain reproducible across builds. | “SDK 1.8.2; tool 5.4”
Figure F10 — Selection → Design-in → Proof (evidence-first pipeline)
(Diagram: Requirements (ports · lane groups, rate · retime mode, refclk target, telemetry/RAS, channel envelope, thermal corner plan) → Shortlist (evidence required: exportable EQ knobs, readable margining, clock chain plan, RAS + logs) → Design-in (controllable + traceable: PRBS test hooks, thermal sensors, quiet PLL rails, profile versioning). Evidence pack sidebar: reports (BERT · eye · jitter), counters (worst-lane · margin · events · logs), pinned MPNs (Si5345 · LMK04832 · HMC7044 · ADCLK948 · TPS7A94 · ADM7150), reproducibility (profile_id · tool version · stress matrix).)

Keep the diagram “low text, high structure”: each box is a decision or artifact. This prevents mobile clutter while preserving the engineering logic.


H2-12 · FAQs (Field & Selection)

These FAQs are designed to capture long-tail searches and common field-debug questions while staying strictly inside this page’s scope: interconnect switching, retiming/equalization, reference-clock jitter, observability, validation, and design-in/RFQ evidence.

What is the practical boundary between a Retimer, a Redriver, and an Interconnect Switch?

A redriver mainly boosts and equalizes analog signals; a retimer recovers data with a CDR and re-times the stream; an interconnect switch adds fabric-level connectivity (multi-source/multi-destination), isolation, and port remapping. The switch solves topology and fault-domain problems that “bigger retimers” cannot, at the cost of power, latency, and validation complexity.

  • Use a redriver when loss is modest and topology stays 1:1.
  • Use a retimer when CDR retiming is required to restore margin.
  • Use a switch when many endpoints must be dynamically connected and isolated.
See: H2-1 (boundary table), H2-4 (fabric vs retimer data path).

Link training succeeds, but BER/CRC slowly increases over time—what are the most common root-cause buckets?

A “slow climb” typically means the link is running with thin margin that is being consumed by temperature drift, supply/PLL noise, or an overfit equalization profile that degrades across PVT. Another common cause is inadequate observability: counters are sampled too coarsely, so early warning signals (deskew events, CDR near-unlock, margin drops) are missed.

  • Pinpoint port/direction/condition first using counters and snapshots.
  • Correlate with temperature, data rate, and EQ profile changes.
  • Use a minimal reproduction matrix to separate environment vs topology.
See: H2-8 (telemetry & logging), H2-9 (symptom tree).
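The “slow climb” bucket can be separated from burst events automatically once counters are sampled at a fixed interval. The median-based burst heuristic below is an illustrative sketch, not a vendor rule:

```python
def classify_counter_trend(snapshots, burst_factor=10.0):
    """Label cumulative error-counter snapshots as 'stable', 'creeping', or 'burst'.

    snapshots: cumulative error counts sampled at a fixed interval.
    burst_factor: how far one interval's delta must exceed the typical
                  (lower-median) delta to count as a burst — illustrative.
    """
    deltas = [b - a for a, b in zip(snapshots, snapshots[1:])]
    if not any(deltas):
        return "stable"
    nonzero = sorted(d for d in deltas if d)
    typical = nonzero[(len(nonzero) - 1) // 2]      # lower median of nonzero deltas
    if max(deltas) > burst_factor * typical:
        return "burst"       # one interval dominates → event-driven (e.g. retrain hit)
    return "creeping"        # steady accumulation → margin being consumed by drift

print(classify_counter_trend([0, 0, 0, 0]))        # stable
print(classify_counter_trend([0, 2, 4, 6, 9]))     # creeping
print(classify_counter_trend([0, 1, 1, 1, 60]))    # burst
```

A “creeping” label points at thermal/PVT margin consumption; a “burst” label points at discrete events (retrains, deskew slips) and should be correlated with timestamps in the event log.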

Why can adding a “jitter cleaner” make the system less stable instead of better?

Jitter cleaning is not “always better.” Instability often comes from a poor injection point or a loop-bandwidth choice that either tracks reference noise (too wide) or reacts slowly to real disturbances (too narrow), creating wander, lock stress, or unexpected phase steps. Cleaner settings must match the link’s jitter tolerance and the system’s noise spectrum.

  • Verify loop bandwidth and holdover behavior against the target jitter budget.
  • Check where noise is injected (before/after fanout, across isolation boundaries).
  • Confirm the cleaner’s output format and level match downstream requirements.

Example jitter-cleaner families used in practice include devices like Si5345, LMK04832, and HMC7044 (as reference examples, not requirements).

See: H2-6 (refclk & cleaners).

Refclk ppm is “in spec” but links still flap—how should phase noise/jitter be measured and attributed?

PPM only describes frequency accuracy over long intervals; it does not guarantee low phase noise. The correct approach is to convert phase noise to integrated jitter over a defined band, then map that jitter to the link’s CDR and BER sensitivity. Attribution requires controlled tests: isolate the refclk contribution from channel/EQ effects by using repeatable stress conditions and consistent pass/fail criteria.

  • Fix the integration band and report the jitter metric consistently.
  • Correlate jitter margin with counters (deskew, CDR stress, BER slope).
  • Bind the measurement method to validation gates and logs.
See: H2-6 (jitter budgeting), H2-10 (validation gates).
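The phase-noise-to-jitter conversion is the standard single-sideband integration: RMS jitter = sqrt(2 · ∫10^(L(f)/10) df) / (2π · f_carrier). A minimal sketch with a trapezoidal integral and an illustrative flat noise floor (band edges and levels are examples, not a spec):

```python
import math

def integrated_rms_jitter(freqs_hz, l_dbc_hz, f_carrier_hz):
    """RMS jitter (seconds) from SSB phase noise L(f) over a defined band.

    freqs_hz: offset frequencies in ascending order (the integration band).
    l_dbc_hz: phase noise in dBc/Hz at each offset.
    Trapezoidal integration of the linearized noise; the factor of 2
    accounts for both sidebands.
    """
    lin = [10 ** (l / 10.0) for l in l_dbc_hz]
    area = sum((f2 - f1) * (a + b) / 2.0
               for f1, f2, a, b in zip(freqs_hz, freqs_hz[1:], lin, lin[1:]))
    phase_rms_rad = math.sqrt(2.0 * area)
    return phase_rms_rad / (2.0 * math.pi * f_carrier_hz)

# Illustrative: flat -130 dBc/Hz from 12 kHz to 20 MHz on a 156.25 MHz refclk
jit = integrated_rms_jitter([12e3, 20e6], [-130.0, -130.0], 156.25e6)
print(f"{jit * 1e15:.0f} fs RMS")
```

Fixing the integration band in code (rather than in a lab notebook) is what makes the reported jitter number comparable across vendors and across validation runs.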

What is the correct tuning order for TX FIR / CTLE / DFE, and how to avoid overfitting?

A stable methodology starts by locking data rate and baseline presets, then targeting the worst lane under a representative stress. Typically, converge CTLE first (undo channel tilt), then TX FIR (shape pre-emphasis), and use DFE last and sparingly. Overfitting happens when DFE “learns noise” or when a profile is validated only at a single corner.

  • Rate fixed → find worst lane → CTLE → FIR → minimal DFE.
  • Validate with margining across temperature/voltage corners.
  • Freeze and version-control the final profile for traceability.
See: H2-5 (retiming & EQ SOP).
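The tuning order can be expressed as a nested search that prefers minimal DFE. Here `dev` and its `set_*`/`read_margin` methods are hypothetical illustrative names, not a real vendor API — a sketch of the SOP, under the assumption that margin is programmatically readable:

```python
def tune_port(dev, port, rate, ctle_presets, fir_presets, max_dfe_taps=2):
    """Tuning-order sketch: lock rate, find worst lane, CTLE -> FIR -> minimal DFE.

    `dev` is a hypothetical device handle; method names are illustrative.
    """
    dev.set_rate(port, rate)                          # 1) fix the data rate first
    worst = min(dev.lanes(port), key=lambda l: dev.read_margin(port, l))
    best = {"margin": -1.0}
    for ctle in ctle_presets:                         # 2) CTLE: undo channel tilt
        dev.set_ctle(port, worst, ctle)
        for fir in fir_presets:                       # 3) TX FIR: shape pre-emphasis
            dev.set_tx_fir(port, worst, fir)
            for taps in range(max_dfe_taps + 1):      # 4) DFE last, and sparingly
                dev.set_dfe_taps(port, worst, taps)
                m = dev.read_margin(port, worst)
                # strict '>' keeps the fewest DFE taps at equal margin,
                # which avoids letting DFE "learn noise"
                if m > best["margin"]:
                    best = {"margin": m, "ctle": ctle, "fir": fir, "dfe_taps": taps}
    return best  # freeze + version-control this result as the profile
```

The search must then be re-validated at temperature/voltage corners before the winning combination is frozen; a profile that only wins at ambient is exactly the overfit case the FAQ warns about.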

Only a few lanes are always the worst—how can margining/loopback separate channel issues from a bad port?

The fastest separation is “location-coupled vs port-coupled.” Margining reveals whether the failure is eye height, eye width, or timing; loopback and controlled swaps test whether the weakness follows a physical route or a specific silicon lane/port. The goal is to reduce the problem to one variable before deeper EQ changes are attempted.

  • Use margining to characterize the weakness signature consistently.
  • Swap endpoints or mapping to see whether the weakness follows the path.
  • Record deskew events and per-lane counters for repeatable evidence.
See: H2-5 (margining), H2-9 (debug loop).

Links fail only when hot—how does temperature affect SerDes/PLL margin, and what should be logged?

Temperature changes can shift channel loss, alter equalization convergence, and degrade PLL/clock margin—turning a borderline eye into intermittent retraining, deskew stress, or BER slope changes. Robust logging must capture the condition, not just the outcome: per-port temperature, data rate, EQ profile, CDR lock indicators, and counter snapshots at consistent intervals.

  • Correlate failures with thermal zones and “time-at-temperature.”
  • Log per-port: rate, EQ profile ID, CDR/deskew status, BER/CRC counters.
  • Reproduce with a temperature × rate × port matrix before redesign.
See: H2-7 (thermal drift), H2-8 (logging fields), H2-9 (symptom tree).
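The temperature × rate × port matrix from the last bullet is a plain Cartesian product with a soak-time field, so “time-at-temperature” is exercised rather than just instantaneous temperature (values below are illustrative):

```python
from itertools import product

def repro_matrix(temps_c, rates, ports, soak_min=20):
    """Enumerate the minimal reproduction matrix for hot-only failures.

    Each entry is one controlled run; soak_min records the dwell time so
    thermal drift effects are captured, not just the setpoint.
    """
    return [
        {"temp_c": t, "rate": r, "port": p, "soak_min": soak_min}
        for t, r, p in product(temps_c, rates, ports)
    ]

runs = repro_matrix([25, 55, 85], ["100G", "112G"], ["P0", "P3"])
print(len(runs))  # 3 temps × 2 rates × 2 ports = 12 runs
```

Enumerating the matrix up front also bounds the cost of a redesign decision: if the failure reproduces only in one cell, the attribution is already half done.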

The same port behaves very differently across data rates—does this indicate EQ coverage or jitter margin limits?

Large rate sensitivity usually comes from either insufficient equalization range for the channel at a given Nyquist, or a refclk/PLL jitter margin that is rate-dependent. The correct approach is to tie “what changed” to measurable quantities: margining distribution, worst-lane shift, CDR stress flags, and BER slope. Treat it as an attribution problem, not a tuning guess.

  • Compare margining and worst-lane identity across rates.
  • Check CDR/deskew stability and counter behavior vs rate.
  • Validate with a fixed stress recipe and consistent pass/fail gates.
See: H2-3 (metrics→how to measure), H2-5 (EQ), H2-6 (jitter).

How to design a “minimal” production test that catches marginal ports without blowing up cycle time?

Minimal production testing should focus on the most discriminating conditions rather than exhaustive coverage. Use a short PRBS loopback, a margining quick-check, and a “worst-lane” screening rule per port group. The key is traceability: record just enough fields to connect a marginal result to a specific port/rate/temperature and to reproduce it in the lab.

  • Pick representative worst-case rates and channel classes.
  • Use fast margin checks and stop rules instead of long soak tests.
  • Store per-port summaries plus a few snapshots for escalation.
See: H2-10 (validation & production test mapping).

Without high-end lab gear, how can counters + PRBS quickly decide whether “link quality” is acceptable?

A practical field method is to establish a counter baseline, run a short PRBS/loopback window under controlled conditions, and compare the error slope against a known-good reference. Counters provide the “where and when,” while PRBS provides a fast stress. Decisions should follow a symptom tree: identify which port, direction, and condition triggers instability first.

  • Snapshot: rate, temperature, EQ profile, key counters per port.
  • Run PRBS/loopback for a fixed short duration and compare slopes.
  • Escalate only after reproducing with a minimal condition matrix.
See: H2-8 (telemetry), H2-9 (field debug tree).
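The slope-comparison decision can be sketched as follows; the 3× ratio limit is an illustrative heuristic for “meaningfully worse than the known-good reference”, not a vendor threshold:

```python
def field_verdict(ref_slope, dut_counts, window_s, ratio_limit=3.0):
    """Compare a DUT's short-PRBS error slope to a known-good reference.

    ref_slope: errors/second measured on a known-good unit, same recipe.
    dut_counts: (start, end) cumulative error counts over the PRBS window.
    ratio_limit: how much worse than reference is still acceptable
                 (illustrative heuristic).
    """
    dut_slope = (dut_counts[1] - dut_counts[0]) / window_s
    if dut_slope == 0:
        return "pass"
    if ref_slope == 0 or dut_slope / ref_slope > ratio_limit:
        return "escalate"   # reproduce with the minimal condition matrix first
    return "pass"

print(field_verdict(0.01, (100, 103), 300))   # 0.01/s vs 0.01/s reference → pass
print(field_verdict(0.01, (100, 160), 300))   # 0.2/s, 20× reference → escalate
```

Note the asymmetry: any errors on a zero-error reference escalate immediately, because there is no baseline slope to normalize against.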

During port/channel planning, what deskew and layout risks come from lane bonding and breakout?

Lane bonding and breakout increase the probability of lane-to-lane delay mismatch, deskew-window pressure, and reference-plane discontinuities that hurt return paths. They also complicate maintenance and debug because mapping changes can hide which physical path corresponds to a logical port group. Planning should treat “port = lane group” as a first-class constraint and document mapping explicitly.

  • Budget deskew: length, via count, and discontinuity symmetry across lanes.
  • Keep reference planes continuous across breakout regions.
  • Version-control the mapping between logical ports and physical routes.
See: H2-2 (topology & port groups), H2-4 (deskew in data path).
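The deskew budget in the first bullet starts from a simple conversion of trace-length mismatch into time skew. The effective permittivity is stackup-dependent; 3.6 below is an illustrative stripline value, not a rule:

```python
def lane_skew_ps(len_mm, ref_len_mm, eps_eff=3.6):
    """Convert trace-length mismatch to lane skew in picoseconds.

    Propagation velocity v = c / sqrt(eps_eff). eps_eff ~ 3.6 is a typical
    stripline effective permittivity — substitute your stackup's value.
    """
    c_mm_per_ps = 0.299792458              # speed of light in mm/ps
    v = c_mm_per_ps / (eps_eff ** 0.5)     # ~0.158 mm/ps for eps_eff = 3.6
    return (len_mm - ref_len_mm) / v

# A 3 mm breakout mismatch costs roughly 19 ps of the deskew window
print(round(lane_skew_ps(153.0, 150.0), 1))
```

Via count and discontinuity asymmetry add on top of this geometric term, which is why the bullet asks for symmetry across lanes, not just matched nominal lengths.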

What “provable” evidence should be requested from suppliers (margin, jitter, RAS, thermal drift) during selection?

Selection should be driven by evidence that can be reproduced: margining definitions and reports, jitter measurement methods and integration bands, RAS behaviors (degrade/repair/isolation) with limits, and temperature-corner stability data with clear test setups. The strongest RFQs bind each claim to a measurement, a pass/fail gate, and required log fields to support field forensics.

  • Ask for margining methodology, exported metrics, and corner conditions.
  • Ask for jitter/phase-noise test setup and the exact integration band.
  • Ask for RAS feature boundaries, event logs, and failure-mode handling.
See: H2-11 (RFQ + design-in checklist), H2-10 (validation mapping).

Tip: For mobile-friendly troubleshooting, each answer is written as a “mini closed loop”: what it means → what to check first → what evidence to log → which chapter contains the deeper method.

Figure F11 — FAQ map: symptom → tool → decision loop
(Diagram: symptoms (link flaps/retrains/training loops, BER/CRC climbs/slow margin loss, repeating worst lane: path vs port?, hot-only failures/PVT drift, rate sensitivity: EQ vs jitter?) → tools on this page (telemetry & logs: counters + snapshots; margining: eye width/height hints; EQ SOP: CTLE → FIR → DFE; jitter budget: phase noise → risk; decision tree: fast attribution) → outputs (root-cause bucket: channel / clock / EQ / thermal; repro matrix: temp × rate × port × path; pass/fail gate: validation + production rules; RFQ evidence list: provable claims only).)