CDR / Retimer: Jitter Clean-up with Programmable Loop BW

A retimer cleans up jitter by re-clocking data through a programmable CDR loop, but real stability is set by loop bandwidth, reference quality, and added-jitter floor. This page provides a repeatable workflow: find a safe BW window, prove it with A/B/C measurements, and lock down pass/fail metrics for bring-up, production, and field diagnosis.

Definition & Where It Sits in the Link (Retimer vs Redriver vs CDR-only)

Page boundary: This section defines retimer/redriver/CDR-only at the electrical layer and shows where each block lives in a cable/backplane link. Protocol-specific compliance, presets, and training behaviors are intentionally not covered here.

What it is

  • Retimer performs clock recovery + re-timing: it rebuilds a sampling clock and re-launches data on a new timing reference.
  • It can break the input jitter transfer chain (high-frequency input jitter is not passed 1:1 to the output).
  • It has lock behavior (lock / relock / lock time) and adds its own jitter (no “free” clean-up).
  • In a link budget, retimer impact is best viewed as: Filtered(Input) + Added(Retimer).

What it is not

  • A redriver is primarily linear gain / equalization (CTLE/FFE-class behavior): it does not rebuild timing.
  • If there is no lock indicator and no concept of jitter transfer versus frequency, the block is unlikely to be a retimer.
  • “Compliance-ready” claims are not a definition. The electrical question is: does it re-time? and what jitter does it add?
  • This section does not expand into protocol ecosystems (training sequences, presets, masks). Those belong to dedicated protocol pages.

When it is needed (first discrimination)

Symptoms that point to timing margin

  • Short link passes, long cable/backplane becomes random-error dominated.
  • Lower data rate “fixes” the issue, suggesting sampling margin rather than pure amplitude.
  • Behavior changes with temperature, supply ripple, or reference quality (typical lock-margin stressors).

First decision rule (avoid overlap)

  • If errors remain after reasonable linear equalization and eye height looks “recoverable,” suspect jitter / sampling → retiming becomes relevant.
  • If the failure is clearly ISI / loss dominated (amplitude/edge-rate collapse with little timing evidence), prioritize linear equalization pages and treat retimer as optional.
  • If the system requires a new timing reference to stop jitter propagation across segments, a retimer is the correct category.

Next: “jitter clean-up” is not magic—output jitter equals filtered input jitter plus the retimer’s added jitter, shaped by loop bandwidth.

Figure 1 — Link block placement + category comparison

Figure description: a cable/backplane link drawn as Tx → Channel (cable / backplane) → Retimer (CDR, Elastic Buffer, Tx) → Channel (cable / backplane) → Rx, with a side comparison of redriver, retimer, and CDR-only features.

Category comparison (electrical behavior)

Feature             Redriver  Retimer  CDR-only
Clock recovery      No        Yes      Yes
Re-timed output     No        Yes      Depends
Lock behavior       No        Yes      Yes
Jitter chain break  No        Yes      Partial
Added jitter        Low       Exists   Exists

Practical takeaway: a retimer is the category that rebuilds timing. Its benefit is always the balance between filtered input jitter and its own added jitter.

What “Jitter Clean-up” Actually Means (Transfer Function View)

Core model: Output jitter = Filtered(input jitter) + Added(retimer). Loop bandwidth defines what is tracked versus what is filtered.

Low-frequency wander (slow timing drift)

  • Wander is the slow component of timing/frequency error (thermal drift, long-term perturbations, reference offset).
  • Below loop bandwidth, the CDR tends to track input timing changes, so some wander can appear at the output by design.
  • A narrower loop bandwidth generally means less tracking (more isolation), but can increase lock stress under frequency offset and slow perturbations.
  • Verification focus (electrical layer): record lock time, relock count, and timing stability across temperature and supply ripple (pass criteria threshold X).

High-frequency jitter (fast edge movement)

  • High-frequency jitter is the fast edge displacement that reduces sampling margin on long channels (random + deterministic components).
  • Above loop bandwidth, the CDR behaves more like a filter for input jitter; higher-frequency input jitter is attenuated rather than followed.
  • A wider loop bandwidth typically means more input jitter is passed through (clean-up weakens), while a narrower bandwidth improves filtering but may worsen acquisition/robustness in stressed conditions.
  • Verification focus: compare jitter metrics at three points (pre-retimer / post-retimer / far-end) and avoid “settings artifacts” by keeping instrument bandwidth and equalization assumptions consistent.

Useful vocabulary (retimer view): Jitter transfer describes how input jitter reaches the output across frequency, while jitter tolerance describes how much input jitter the CDR can tolerate while staying locked. Standards and protocol templates are intentionally out of scope here.

Figure 2 — Jitter transfer intuition (loop bandwidth sets the corner)

Figure description: jitter transfer (gain) versus frequency for a narrow and a wide loop bandwidth. Below the loop bandwidth, tracking dominates; above it, filtering dominates. Interpretation: output clean-up improves when high-frequency input jitter is attenuated, but added jitter always remains.

Engineering framing: loop bandwidth is a knob that trades “how much timing is followed” against “how much input jitter is filtered,” while added jitter sets a hard floor.
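The tracking-versus-filtering corner can be illustrated with a minimal sketch. It assumes a first-order low-pass jitter transfer, which is a simplification and not a model of any specific device; the function names and the bandwidth and tone frequencies are hypothetical.

```python
import math


def jitter_transfer_gain(f_hz: float, loop_bw_hz: float) -> float:
    """Gain of an ASSUMED first-order low-pass jitter transfer at f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / loop_bw_hz) ** 2)


def gain_db(f_hz: float, loop_bw_hz: float) -> float:
    """Same gain expressed in dB (0 dB = fully tracked)."""
    return 20.0 * math.log10(jitter_transfer_gain(f_hz, loop_bw_hz))


if __name__ == "__main__":
    # Hypothetical narrow and wide loop settings, and two jitter tones:
    # a slow wander component and a fast jitter component.
    for loop_bw_hz in (1e6, 10e6):
        for f_hz in (10e3, 50e6):
            print(f"BW={loop_bw_hz:.0e} Hz, tone={f_hz:.0e} Hz, "
                  f"gain={gain_db(f_hz, loop_bw_hz):.1f} dB")
```

In this model the slow tone passes at close to 0 dB for both settings, while the fast tone is attenuated more by the narrow setting. Real loops are usually higher order and can peak near the corner, so the numbers are only directional.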

Inside the Retimer: CDR Loop, Buffering, and Why Loop BW Is Programmable

Page boundary: This section explains the retimer’s timing loop (CDR), its buffering, and how programmable loop bandwidth changes tracking versus filtering. Protocol training sequences and equalizer algorithm details are intentionally out of scope.

Internal structure (what each block controls)

A retimer behaves like a closed-loop timing system plus a rate/phase absorption buffer. Understanding the loop and where noise enters is the fastest way to predict whether “clean-up” will help or hurt.

Core loop blocks

  • Phase detector (PD): converts edge timing error into an error signal; sensitive to threshold noise and supply/ground coupling.
  • Loop filter (LF): shapes loop response; sets the boundary between tracking and filtering.
  • VCO/DCO: generates the recovered clock; a dominant added-jitter source via phase noise.
  • Divider / clock conditioning: maps internal clock domains; affects loop gain scaling and measurement observability (lock flags, counters).

Buffering & re-launch path

  • Elastic buffer: absorbs slow rate/phase differences between input and the recovered clock; protects lock margin but introduces latency and buffer-stress modes.
  • Re-timing latch: aligns data to the recovered clock; converts clock jitter into data edge movement at the output.
  • Tx serializer: re-launches a new edge stream; output edge quality is bounded by added jitter + output path noise.

Engineering rule: the retimer output is not “input cleaned.” It is a new edge stream produced by the recovered clock, with jitter shaped by the loop and limited by added noise sources.

Parameter knobs (what changes and what to probe first)

Programmability exists because different channels and environments demand different trade-offs between tracking (robust lock under drift) and filtering (jitter attenuation). Each knob should be tied to a primary effect, a failure risk, and a first probe point.

Loop bandwidth (BW)

  • Primary effect: moves the tracking↔filtering corner frequency (jitter transfer shape).
  • Risk: too narrow → lock stress; too wide → clean-up weakens and input noise passes.
  • First probe: compare pre/post-retimer jitter trend and lock time under temperature/supply stress (threshold X).

Peaking / damping

  • Primary effect: sets how “resonant” the loop is near the corner frequency.
  • Risk: wide BW + high peaking can amplify a mid-band disturbance into spur-like errors.
  • First probe: check for periodic error bursts and whether they correlate with a supply/reference spur.

Lock detect / relock policy

  • Primary effect: defines what “locked” means and how quickly the loop reacts to loss of lock.
  • Risk: false lock flags can hide marginal timing; aggressive relock can look like random faults.
  • First probe: validate lock flag consistency against BER/CRC counters under controlled stress.

Gearbox / clock ratio (concept-level)

  • Primary effect: changes internal clock-domain mapping and buffer pressure.
  • Risk: increased latency or reduced buffer margin under drift.
  • First probe: monitor buffer status/margin indicators and event logs (threshold X).

Next: the same loop bandwidth knob that improves filtering can also reduce tracking margin. The trade-off must be tuned against channel drift and noise injection.

Figure 3 — CDR internal closed-loop view (BW knob + added jitter entry points)

Figure description: closed-loop timing diagram. Loop blocks: Data In, Phase Detector (PD), Loop Filter (LF, with the BW knob), VCO / DCO, Recovered Clock, Divider, and Lock Detect (flags / counters). Data path: Elastic Buffer, Re-timing Latch (data aligned to recovered clock), Tx Serializer (new edge stream), Out. Arrows mark the added-jitter entry points: VCO phase noise, PD noise, and supply/ground coupling into timing-sensitive nodes, which together set the phase noise floor (added jitter).

Design intuition: the loop filter (BW and damping) determines how much input timing is followed, while noise injected into PD/VCO sets the added-jitter floor.
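The interaction between damping and peaking can be sketched with the textbook second-order closed-loop transfer H(s) = (2ζωₙs + ωₙ²) / (s² + 2ζωₙs + ωₙ²). This is an assumed generic model, not a description of any particular retimer; `fn_hz` (natural frequency) and `zeta` (damping factor) are illustrative parameters, and the natural frequency is related to, but not the same as, a datasheet loop bandwidth.

```python
import math


def closed_loop_gain(f_hz: float, fn_hz: float, zeta: float) -> float:
    """|H(j2*pi*f)| for the assumed second-order loop with one zero."""
    x = f_hz / fn_hz  # frequency normalized to the natural frequency
    num = 1.0 + (2.0 * zeta * x) ** 2
    den = (1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2
    return math.sqrt(num / den)


def peaking_db(fn_hz: float, zeta: float, points: int = 2000) -> float:
    """Numerically scan two decades either side of fn_hz for the gain peak."""
    freqs = (fn_hz * 10 ** (-2 + 4 * i / (points - 1)) for i in range(points))
    return 20.0 * math.log10(max(closed_loop_gain(f, fn_hz, zeta) for f in freqs))


if __name__ == "__main__":
    for zeta in (0.5, 0.7, 1.0, 2.0):  # hypothetical damping settings
        print(f"zeta={zeta}: peaking = {peaking_db(1e6, zeta):.2f} dB")
```

In this model lower damping gives more peaking, which is why a disturbance sitting near the corner can come out larger than it went in.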

Loop Bandwidth Trade-offs (Clean-up vs Tracking vs Stress)

Loop bandwidth is a risk-control knob. Set too narrow, lock margin collapses under drift; set too wide, clean-up weakens and mid-band disturbances can be amplified by peaking. The correct setting is found by matching channel drift and noise injection to the required stability margin.

BW too narrow

  • What shows up: long lock time, more relock events, and failures triggered by temperature change or slow drift.
  • Why: insufficient tracking of low-frequency wander and offset; accumulated phase error pushes the loop/buffer toward the edge.
  • Quick check: stress with slow thermal sweep and supply ripple; log lock time and relock count (threshold X).
  • Fix direction: widen BW slightly or relax damping/lock policy carefully; re-validate clean-up afterward.

BW too wide

  • What shows up: lock is fast, but BER/CRC rises on long channels; sensitivity to supply/reference noise increases.
  • Why: more input jitter passes through; with peaking, a mid-band disturbance can be amplified into spur-like errors.
  • Quick check: look for periodic error bursts and correlation with known spurs; compare pre/post-retimer jitter trend consistently.
  • Fix direction: reduce BW or lower peaking; improve supply/ground isolation to lower added-jitter floor.

Just right (how to find it)

  1. Start stable: use a conservative BW and confirm lock stability across temperature/supply variation (pass threshold X).
  2. Then verify clean-up: compare jitter at three points (pre/post/far-end) with consistent instrument bandwidth and assumptions.
  3. Optimize margin: widen BW stepwise until the error-rate trend worsens, then step back one notch to keep headroom.
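Step 3 above can be sketched as a small search routine. This is a minimal illustration, assuming lock stability (step 1) has already been confirmed; `measure_error_rate` is a hypothetical callback that applies a BW setting and returns the far-end error rate, and `worsen_ratio` and `min_rate` are assumed tuning parameters standing in for the threshold X.

```python
from typing import Callable, Sequence


def find_bw_setting(bw_steps: Sequence[float],
                    measure_error_rate: Callable[[float], float],
                    worsen_ratio: float = 2.0,
                    min_rate: float = 1e-15) -> float:
    """Walk bw_steps from narrow to wide; when the error rate worsens,
    return the setting one notch narrower than the step that worsened."""
    best_rate = measure_error_rate(bw_steps[0])
    for i in range(1, len(bw_steps)):
        rate = measure_error_rate(bw_steps[i])
        # "Worsens" = clearly above the best rate seen so far.
        if rate > max(best_rate, min_rate) * worsen_ratio:
            return bw_steps[i - 1]
        best_rate = min(best_rate, rate)
    # Never worsened within the tested range: report the widest tested step.
    return bw_steps[-1]


if __name__ == "__main__":
    # Fabricated illustration data only (setting index -> error rate).
    fake_rates = {1: 1e-9, 2: 5e-10, 3: 4e-10, 4: 5e-9, 5: 1e-7}
    print(find_bw_setting([1, 2, 3, 4, 5], fake_rates.__getitem__))
```

Comparing against the best rate seen so far, rather than only the previous step, keeps a slow creep in error rate from going unnoticed.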

Selection cue: the “best” BW is channel- and environment-dependent; it is the setting that keeps drift tracking within margin while pushing high-frequency jitter below the system’s tolerance.

Figure 4 — Input jitter spectrum (wander / spurs / RJ) + BW window (tracked vs filtered)

Figure description: relative jitter energy versus frequency, showing low-frequency wander, mid-band spurs, and high-frequency random jitter (RJ). A loop-bandwidth window separates the tracked and filtered regions and marks the peaking risk near the corner. Interpretation: drift pressure lives at low frequency; spurs can be amplified near the corner; random jitter dominates at high frequency.

Tuning goal: choose BW and damping so slow drift stays within lock margin, while high-frequency jitter is attenuated without introducing mid-band peaking sensitivity.

Note: if a setting improves an eye screenshot but worsens errors, suspect added jitter or mid-band peaking sensitivity before changing unrelated link parameters.

Reference Clocking & Frequency Error (What Breaks Lock First)

Page boundary: This section explains how reference quality, frequency error, temperature drift, and spread-like modulation increase CDR tracking stress and erode lock margin. Protocol-specific SSC profiles and compliance rules are intentionally out of scope.

Reference-related (source and clock-tree stress)

A retimer does not ignore the reference environment. Noise and slow modulation that land inside the loop’s tracking region can be followed and re-launched. When the effective frequency error exceeds tracking capacity, lock margin is consumed first.

  • How reference noise enters: low-frequency phase/frequency variation is more likely to be tracked; higher-frequency components are more likely to be filtered by the CDR (loop-BW dependent).
  • What breaks lock early: static offset (ppm), temperature drift, and supply-induced slow wander add up into an “equivalent modulation” the CDR must follow.
  • Spread-like modulation (SSC principle only): deliberate low-frequency frequency modulation increases tracking pressure. If tracking headroom is small, modulation can turn a stable bench link into a system-level unlock.

First probe points

  • Log lock time, relock count, and error bursts versus temperature/supply state (threshold X).
  • Check whether failure correlates with a known periodic disturbance (fan/VRM/clock-tree spur behavior) before changing unrelated link parameters.

Channel-related (why system wiring collapses margin)

The channel rarely increases frequency error by itself. Instead, long cables/backplanes reduce effective detector SNR and make the loop more sensitive to the same reference and supply environment. The practical outcome is less tracking headroom and earlier unlock.

  • Loss/ISI and reflections: reduce edge clarity, increasing timing-error noise seen by the PD.
  • Return-path and common-mode currents: inject noise into timing-sensitive nodes, turning small reference issues into lock events.
  • System integration effect: the enclosure introduces more coupling paths (grounding, harness, airflow, power distribution) that do not exist on a bench.

First discrimination step

  • Compare the short-link versus long-link relock rate and burstiness; margin collapse typically shows up as unlock/relock events first.
  • If touching cable/ground changes behavior, prioritize return-path and coupling fixes before re-tuning BW.

Figure 5 — Frequency error vs tracking capacity (margin gets consumed first)

Figure description (concept): a tracking-capacity bar is consumed by stacked contributors (offset, temperature drift, supply wander, SSC). The remaining margin indicates lock robustness; negative margin indicates unlock risk. Rule: if consumed contributors exceed tracking capacity, lock breaks before “eye cosmetics” matter.

Practical interpretation: system integration increases the number of contributors that consume tracking headroom. Lock/relock behavior is often the first observable symptom.
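The budget idea can be written down as simple bookkeeping. The sketch below assumes a worst-case linear sum of contributors expressed in ppm, which is a conservative simplification; the contributor names follow Figure 5, and every number is an illustrative placeholder rather than a device or protocol value.

```python
def tracking_margin_ppm(capacity_ppm: float, contributors_ppm: dict) -> float:
    """Remaining tracking headroom under an ASSUMED worst-case linear sum."""
    return capacity_ppm - sum(contributors_ppm.values())


def largest_contributors(contributors_ppm: dict) -> list:
    """Contributor names sorted from largest to smallest consumer."""
    return sorted(contributors_ppm, key=contributors_ppm.get, reverse=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    budget = {"offset": 100, "temp_drift": 50, "supply_wander": 30, "ssc": 500}
    margin = tracking_margin_ppm(600, budget)
    print("margin (ppm):", margin, "-> unlock risk" if margin < 0 else "-> ok")
    print("attack first:", largest_contributors(budget)[0])
```

Sorting the contributors shows which stressor to reduce first when the margin comes out negative.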

Added Jitter Budget: When Retimer Makes Things Worse

Core model: Out jitter = Filtered(In) + Retimer Added. A retimer helps only if the reduction in filtered input jitter is larger than the increase in added jitter.
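A quick way to apply this rule is to compare the jitter with and without the retimer. The sketch below assumes the filtered input jitter and the added jitter are uncorrelated RMS quantities, so they combine root-sum-square; `surviving_fraction` is a hypothetical summary of how much input jitter passes the loop, and deterministic or correlated components would need separate treatment.

```python
import math


def retimer_output_rms(input_rms: float, surviving_fraction: float,
                       added_rms: float) -> float:
    """Output RMS jitter = RSS of filtered input and added jitter (assumed uncorrelated)."""
    return math.hypot(surviving_fraction * input_rms, added_rms)


def retimer_helps(input_rms: float, surviving_fraction: float,
                  added_rms: float) -> bool:
    """True when the retimer output jitter is lower than the jitter it received."""
    return retimer_output_rms(input_rms, surviving_fraction, added_rms) < input_rms


if __name__ == "__main__":
    # Illustrative values in arbitrary but consistent units (e.g. ps RMS).
    print(retimer_helps(2.0, 0.2, 0.5))  # noisy input, strong filtering
    print(retimer_helps(0.5, 0.9, 0.4))  # clean input, weak filtering
```

The second case shows the failure mode this section describes: when the input is already clean and little is filtered, the added floor dominates and the output is worse than the input.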

Path 1 — Supply noise → VCO/DCO phase noise floor rises

  • Symptom: output looks “cleaner” in shape yet error rate rises; failures correlate with load steps or system power modes.
  • First probe point: correlate retimer-rail ripple with error bursts; compare behavior with additional local decoupling or a cleaner rail (threshold X).
  • Fix direction: isolate and filter retimer supplies; reduce coupling into clock-control nodes; enforce short return paths and high-frequency bypass placement.

Path 2 — Return-path / package coupling → PD threshold noise increases

  • Symptom: behavior changes when cable/ground is touched; failures appear “random” but are sensitive to routing and enclosure grounding.
  • First probe point: check for common-mode current paths and reference-plane discontinuities near the retimer and connector; test with controlled grounding changes.
  • Fix direction: improve return continuity, shielding, and pin/ballout current paths; reduce coupling into PD/sampling sensitive nodes.

Path 3 — Reference spur / spread-like modulation → mid/low-frequency jitter is tracked

  • Symptom: bench passes, system fails when clock-tree or power environment changes; errors become periodic or state-dependent.
  • First probe point: search for a dominant spur and check if error periodicity matches; test sensitivity by reducing peaking or adjusting BW one step.
  • Fix direction: clean the reference environment (power/ground/shielding) or shape the loop to avoid amplifying near-corner disturbances.
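The periodicity check in Path 3 can be sketched by folding error timestamps onto the suspected spur period and measuring how tightly the phases cluster (vector strength). This is one possible technique, not a method prescribed by this page; the timestamp source and spur period are assumed to come from the error log and a spectrum measurement.

```python
import cmath
import math


def periodicity_strength(error_times_s, spur_period_s: float) -> float:
    """Vector strength in [0, 1]: near 1 when errors recur at a fixed phase
    of the spur period, near 0 when they are spread across all phases."""
    if not error_times_s:
        return 0.0
    phasors = [cmath.exp(2j * math.pi * (t / spur_period_s)) for t in error_times_s]
    return abs(sum(phasors)) / len(phasors)


if __name__ == "__main__":
    period_s = 1e-3  # hypothetical 1 kHz spur
    locked = [k * period_s + 0.0002 for k in range(50)]   # same phase every cycle
    spread = [k * period_s / 10 for k in range(50)]       # evenly spread phases
    print(periodicity_strength(locked, period_s))
    print(periodicity_strength(spread, period_s))
```

With only a handful of events the statistic is biased high, so a strong value should be confirmed over a longer window before blaming the spur.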

Path 4 — Over-equalization raises noise → CDR input becomes noise-dominated

Equalization can improve eye height while also raising the noise floor. If the retimer input becomes noise-dominated, the phase detector sees a noisier timing error and added jitter can grow. This section does not expand into equalizer algorithms; it only highlights the interaction.

  • Symptom: increasing EQ makes the eye look better but errors increase; adding a retimer after aggressive EQ worsens stability.
  • First probe point: compare error rate across EQ gain steps; look for noise-floor rise or burstiness change near the retimer input.
  • Fix direction: reduce over-EQ, improve input SNR, and re-check BW/peaking sensitivity after noise is controlled.

Path 5 — Output-side power/termination issues → re-launched edges degrade

  • Symptom: the second segment (post-retimer) is unusually sensitive to connector/cable changes; errors increase despite improved mid-channel eye.
  • First probe point: inspect output termination continuity and supply noise local to the Tx/output driver; compare with a known-good short segment.
  • Fix direction: stabilize output supply and return path; enforce consistent termination and reduce reflection points near the retimer output.

Figure 6 — Noise injection map (sources → sensitive nodes → worse outcomes)

Figure description: noise sources (supply ripple, ground bounce, reference spur / SSC, crosstalk / EMI, over-EQ noise rise) couple into sensitive nodes (phase detector, loop filter node, VCO / DCO, Tx output). Outcomes: added jitter floor ↑, burst errors ↑.

Diagnostic takeaway: when a retimer makes things worse, look for noise injection into PD/VCO and output-side integrity before changing unrelated link settings.

Measurement Reality: How to Prove “Clean-up” Without Fooling Yourself

Goal: prove jitter clean-up with a repeatable experiment, not a single screenshot. The minimum requirement is a three-point method: before retimer (A), after retimer (B), and at the far-end receiver (C).

Do (repeatable, comparable, actionable)

Lock the measurement settings

Freeze RBW/VBW, bandwidth limits, averaging, and any receiver presets. Treat settings as experimental variables and record them. Pass: A/B/C runs use identical settings (threshold X).
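The "identical settings" check can be automated before any A/B/C comparison is accepted. A minimal sketch, assuming each run's instrument settings are recorded as a flat dictionary; the key names (`rbw_hz`, `averages`, and so on) are hypothetical and should mirror whatever the instrument actually exposes.

```python
def settings_mismatches(runs: dict) -> dict:
    """runs maps a point name ("A", "B", "C") to its settings dict.
    Returns {setting: {point: value}} for every setting that differs
    or is missing at one of the points."""
    all_keys = set().union(*(s.keys() for s in runs.values()))
    mismatches = {}
    for key in sorted(all_keys):
        values = {point: s.get(key) for point, s in runs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    runs = {
        "A": {"rbw_hz": 1e3, "vbw_hz": 1e3, "averages": 16},
        "B": {"rbw_hz": 1e3, "vbw_hz": 1e3, "averages": 16},
        "C": {"rbw_hz": 1e3, "vbw_hz": 1e3, "averages": 64},
    }
    print(settings_mismatches(runs))  # any entry means the runs are not comparable
```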

Measure three points (A/B/C)

Point A: before retimer. Point B: after retimer. Point C: far-end receiver statistics. Pass: conclusions are based on A→B→C trends, not a single point.

Use BER/CRC as the primary truth

Eye plots are supportive evidence. Link reliability is proven at Point C via BER/CRC in a fixed observation window. Pass: BER < X or CRC rate < X.
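The "fixed observation window" has to be long enough for the BER claim to mean anything. A common rule for an error-free run is N = −ln(1 − CL) / BER bits, which assumes independent bit errors; bursty errors violate that assumption, so the result is a lower bound on test length. The line rate and BER target below are illustrative placeholders.

```python
import math


def bits_for_zero_error_test(ber_limit: float, confidence: float = 0.95) -> float:
    """Bits that must pass error-free to claim BER < ber_limit at the given
    confidence, ASSUMING independent errors."""
    return -math.log(1.0 - confidence) / ber_limit


def observation_window_s(ber_limit: float, line_rate_bps: float,
                         confidence: float = 0.95) -> float:
    """Minimum error-free observation time at the given line rate."""
    return bits_for_zero_error_test(ber_limit, confidence) / line_rate_bps


if __name__ == "__main__":
    # Placeholder target and rate, not taken from any standard.
    print(f"{bits_for_zero_error_test(1e-12):.3e} bits")
    print(f"{observation_window_s(1e-12, 10e9):.0f} s at 10 Gb/s")
```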

Record trigger and reference topology

Shared trigger/reference can hide system jitter by canceling common-mode timing variation. Pass: independent timing reference still shows the same A→B→C conclusion.

Repeat under stress (not only at room temperature)

Add controlled temperature and supply stress while keeping settings fixed. Pass: relock count < X and BER/CRC remain within limits.

Compare trends, not “pretty” images

A legitimate improvement survives equipment swaps and minor setup differences. Pass: direction of change is consistent across runs (threshold X).

Don’t (common self-deception traps)

  • Do not change RBW/VBW/filters between A/B/C and claim the result is comparable.
  • Do not share trigger/reference in a way that cancels system timing variation and then conclude “jitter is gone”.
  • Do not measure only after the retimer; Point A is required to separate filtering from added jitter.
  • Do not stop at Point B; the far-end receiver (C) is the only place that proves link reliability.
  • Do not treat eye height/width as a substitute for BER/CRC. Beautiful eyes can still fail under burst noise.
  • Do not allow receiver equalization presets to drift run-to-run without recording the state.
  • Do not average long enough to hide spurs or burst errors and then declare the link “clean”.
  • Do not mix probes/fixtures without a quick correlation run that proves the setup is not the dominant error source.

Why “eye looks better” can still mean worse BER

Eye plots are a visual statistic and can under-represent low-frequency wander, correlated spurs, and burst noise. Use BER/CRC at Point C as the primary decision metric, and use the eye only to classify the failure mode.

Trigger/reference sharing can create a false “clean-up”

If the measurement reference tracks the same timing variation as the DUT, common-mode jitter can be partially canceled. Validate conclusions with an independent timing reference or with far-end receiver statistics.

Figure 7 — Three-point measurement topology (A/B/C) and what to compare

Figure description: Tx → Channel #1 → Retimer (CDR + buffer) → Channel #2 → Rx, with measurement points A, B, and C marked.

  • Point A: TJ / RJ / DJ trend
  • Point B: TJ / RJ, added floor
  • Point C: BER / CRC (primary)

Comparison rule: same RBW/VBW, same presets, same cable/temperature; compare A→B→C trends. Point C proves reliability; A/B explain why.

Minimum proof: identical settings and a consistent A→B→C trend where far-end BER/CRC improves without hidden reference/trigger cancellation.

Bring-up Playbook (Cables/Backplanes): A Step-by-step Debug Flow

The purpose is fast discrimination: identify whether the first dominant limiter is loss/ISI or timing/jitter stress, then apply a conservative-to-aggressive loop tuning strategy with logging.

Figure 8 — Debug decision flow (first discrimination + next probe point)

Figure description: flowchart with yes/no branches; each branch gives the next probe point.

Start: symptom (CRC spikes / drops / unlock); collect the A/B/C baseline.

  • Reduce rate improves? Yes → timing stress (probe: lock/relock). No → loss / ISI (probe: short link).
  • Short cable improves? Yes → channel dominated (probe: reflections). No → system coupling (probe: supply/ground).
  • Change reference improves? Yes → reference limited (probe: spur / drift). No → added jitter (probe: rails/PD).

Flow intent: each branch produces a next probe point. Use the three-point method (A/B/C) as the baseline for all comparisons.
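The three questions in the flow can be captured as a small lookup so that every debug session reports the same labels. The figure text does not fully specify how the branches nest, so this sketch treats them as three independent yes/no questions, which is an assumption; the function name is hypothetical.

```python
def classify_limiter(rate_reduction_helps: bool,
                     short_cable_helps: bool,
                     reference_change_helps: bool) -> list:
    """Return (suspected limiter, next probe) for each question in the flow."""
    return [
        ("timing stress", "lock/relock") if rate_reduction_helps
        else ("loss / ISI", "short link"),
        ("channel dominated", "reflections") if short_cable_helps
        else ("system coupling", "supply/ground"),
        ("reference limited", "spur / drift") if reference_change_helps
        else ("added jitter", "rails/PD"),
    ]


if __name__ == "__main__":
    for limiter, probe in classify_limiter(True, False, True):
        print(f"suspect: {limiter:18s} next probe: {probe}")
```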

Step 1 — Capture a baseline (A/B/C)

Action: measure A (before), B (after), and C (far-end) with locked settings. Pass: a stable baseline window with BER/CRC < X or a clearly repeatable failure signature.

Step 2 — Discriminate loss/ISI vs timing stress

Action: reduce data rate and observe whether errors drop sharply. Probe: relock events and burstiness. Pass: clear directionality (improves vs unchanged) within the same observation window (threshold X).

Step 3 — Shorten the channel to localize the dominant limiter

Action: swap to a short, known-good interconnect. Probe: whether unlock/relock disappears and whether C improves. Pass: improvement indicates channel reflections/return-path issues; no change pushes focus to system coupling.

Step 4 — Test reference sensitivity (principle-level)

Action: change the reference source or clock-tree condition in a controlled way. Probe: periodic error signatures and lock stability. Pass: strong sensitivity indicates reference/clock-tree limitation; weak sensitivity suggests added jitter or channel coupling.

Step 5 — Conservative loop settings, then open stepwise

Action: start with conservative BW and low peaking to protect against mid-band amplification. Then widen BW in steps until error-rate trends worsen, and step back one notch. Pass: improved C without increased relock count (threshold X).

Step 6 — Log list (must-have fields)

Consistent logging converts debug into engineering evidence. Keep the list short but mandatory.

  • temperature
  • rail ripple
  • reference ID
  • cable type/len
  • link rate
  • BW / peaking
  • lock/relock
  • BER/CRC window

Pass criteria placeholder: required fields present on every run; missing fields are treated as invalid experiments.
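The "missing fields invalidate the run" rule is easy to enforce in the logging script itself. A minimal sketch; the field names are hypothetical spellings of the must-have list above, and the record is assumed to be a plain dictionary.

```python
# Hypothetical key names for the must-have log fields.
REQUIRED_FIELDS = (
    "temperature_c", "rail_ripple_mv", "reference_id", "cable_type_len",
    "link_rate_gbps", "bw_peaking", "lock_relock", "ber_crc_window",
)


def missing_fields(record: dict) -> list:
    """Required fields that are absent or set to None in this run record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


def is_valid_run(record: dict) -> bool:
    """A run counts as evidence only when every required field is present."""
    return not missing_fields(record)


if __name__ == "__main__":
    record = {name: "recorded" for name in REQUIRED_FIELDS}
    del record["reference_id"]
    print(is_valid_run(record), missing_fields(record))
```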

Production & Field Diagnostics (BIST/PRBS/Loopback Hooks)

Objective: convert retimer behavior into segmentable, repeatable, and thresholded pass/fail checks suitable for factory screening, system integration, and field service.

Factory test (screening + station-to-station consistency)

Use PRBS/BIST and loopback to turn the link into controlled segments. Fix the test recipe so different stations and different operators produce comparable results.

Required pass/fail indicators (placeholders)

  • BER < X in a fixed time window
  • Lock time < X
  • Relock count under temperature points < X
  • Error bursts per window < X
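Fixing the recipe in code is one way to keep stations comparable. The sketch below evaluates the four indicators against limits supplied by the caller; the limit values stand in for the placeholders X, and the dictionary keys and class name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoryLimits:
    """Station-independent limits (the placeholders X, supplied by the caller)."""
    max_ber: float
    max_lock_time_ms: float
    max_relocks: int
    max_bursts_per_window: int


def evaluate_station(measured: dict, limits: FactoryLimits) -> dict:
    """Per-indicator pass/fail plus an overall verdict under key "pass"."""
    checks = {
        "ber": measured["ber"] < limits.max_ber,
        "lock_time": measured["lock_time_ms"] < limits.max_lock_time_ms,
        "relocks": measured["relock_count"] < limits.max_relocks,
        "bursts": measured["bursts_per_window"] < limits.max_bursts_per_window,
    }
    checks["pass"] = all(checks.values())
    return checks


if __name__ == "__main__":
    limits = FactoryLimits(1e-12, 5.0, 1, 1)  # illustrative numbers only
    unit = {"ber": 1e-13, "lock_time_ms": 2.0, "relock_count": 0,
            "bursts_per_window": 0}
    print(evaluate_station(unit, limits))
```

Returning the per-indicator results, not just the verdict, lets two stations be compared on which check failed.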

PRBS / loopback coverage notes

  • Good at catching margin collapse due to loss/ISI, reflections, and strong added-jitter floors.
  • May miss failures driven by real remote Rx behavior, system-level coupling, or rare EMI bursts.
  • Shared fixtures/references can create false “good” results; require a correlation check (threshold X).

System integration (segment localization with controlled toggles)

Use loopback positions to localize whether the dominant limiter is pre-retimer, inside retimer, or post-retimer. Change one variable at a time: channel length, reference condition, BW/peaking mode, and power state.

Action sequence

  1. Run near-end loopback to validate the first segment (probe: BER/CRC, lock/relock).
  2. Run far-end loopback to extend coverage into the second segment (probe: burst errors, relock).
  3. If loopback passes but system fails, prioritize remote Rx behavior and system coupling paths.
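The action sequence maps directly onto a localization rule. A minimal sketch of that mapping, assuming each test reduces to a single pass/fail result; the function name and the returned labels are illustrative.

```python
def localize_fault(near_loopback_pass: bool, far_loopback_pass: bool,
                   system_pass: bool) -> str:
    """Name the first place to look, following the near -> far -> system order."""
    if not near_loopback_pass:
        return "first segment (Tx + Channel #1 + retimer Rx side)"
    if not far_loopback_pass:
        return "second segment (retimer output + Channel #2)"
    if not system_pass:
        return "remote Rx behavior or system coupling"
    return "no fault localized"


if __name__ == "__main__":
    print(localize_fault(True, True, False))
```

Evaluating the tests in this order matters: a far-end loopback failure is only attributed to the second segment once the near-end loopback has passed.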

Pass criteria (placeholders)

  • Segment-level BER/CRC improves in the same direction across repeated runs (trend stability X).
  • Unlock/relock events are confined to one segment (localization confidence X).

Field service (minimum-tool repeatability)

Field diagnostics must be scriptable. The goal is a short test that produces a stable signature, logs the environment, and can be replayed in the lab without interpretation drift.

Field script outputs

  • Lock time, relock count, and error bursts in a fixed window (threshold X).
  • Temperature, rail ripple estimate, cable ID/length, and BW/peaking mode tags.
  • Saved configuration snapshot (treat runs with missing fields as invalid evidence).

Risk guardrails

  • PRBS pass does not guarantee system pass; remote Rx behavior can be the true limiter.
  • Loopback pass does not prove real payload patterns; keep a “real traffic” sanity test step.

Figure 9 — BIST & loopback positions (near-end vs far-end) and coverage

Figure description: Tx → Channel #1 → Retimer (BIST / PRBS) → Channel #2 → Rx, with near loopback (N) and far loopback (F) positions marked.

  • Near loopback coverage: Tx + Channel #1 + retimer Rx side
  • Far loopback coverage: Tx + Channel #1 + retimer + Channel #2 (approaches Rx)

Coverage ≠ real Rx behavior; keep far-end validation.

Interpretation: loopback isolates segments quickly, but remote receiver behavior and system coupling can still dominate real failures.

Power, Thermal, and Layout: The Non-Obvious Failure Drivers

Many failures are intermittent because the dominant driver is not the channel itself but rail noise, thermal gradients, and return-path/layout asymmetry. Each card below uses the same engineering structure: symptom → probe → fix → pass.

Power — Rail noise increases added jitter floor (VCO/DCO + PD sensitivity)

Failure symptom

Errors are bursty and correlate with system power modes, load steps, or nearby switching activity. Short-link tests may pass while system-level integration fails.

Probe

  • Time-align rail ripple logging with error timestamps (threshold X).
  • Compare A/B/C results across power states without changing measurement settings.
  • If a dominant spur exists, test whether error periodicity matches the spur behavior.
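Time alignment can be scored with a simple coincidence count. The sketch below assumes ripple excursions have already been reduced to a list of event timestamps (for example, scope triggers above a ripple threshold) on the same clock as the error log; the function name and the window width are hypothetical.

```python
import bisect


def fraction_of_errors_near_events(error_times_s, ripple_event_times_s,
                                   window_s: float) -> float:
    """Fraction of errors with at least one ripple event within +/- window_s."""
    if not error_times_s:
        return 0.0
    events = sorted(ripple_event_times_s)
    hits = 0
    for t in error_times_s:
        # First event at or after the start of the window around this error.
        i = bisect.bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            hits += 1
    return hits / len(error_times_s)


if __name__ == "__main__":
    errors = [1.000, 2.000, 5.000]   # fabricated timestamps, seconds
    ripple = [1.001, 4.000]
    print(fraction_of_errors_near_events(errors, ripple, window_s=0.01))
```

A high fraction only suggests coupling; comparing it against the fraction expected from randomly placed errors at the same event density guards against coincidence.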

Fix

  • Improve local decoupling placement and reduce current-loop area.
  • Isolate sensitive rails from noisy domains; add filtering where it reduces injected noise.
  • Enforce clean return paths around the retimer and connector region.

Pass

Ripple < X, relock count < X, burst error rate < X, and far-end BER/CRC meets target.

Thermal — gradients push frequency/lock margin to the edge

Failure symptom

Bench passes at room temperature, but failures appear only hot/cold or when airflow changes. Relock events cluster around temperature transitions.

Probe

  • Log temperature at multiple points (near retimer, near connector, ambient) and align with lock/relock time stamps.
  • Check whether failures occur at a repeatable temperature window (threshold X).
  • Repeat A/B/C under temperature stress with identical measurement settings.

Fix

  • Reduce gradients: improve heat spreading, avoid hot spots, and control airflow paths.
  • Avoid direct thermal paths between heat sources and sensitive clock/control regions.
  • After thermal stabilization, re-check conservative BW/peaking settings if lock margin is still tight.

Pass

Across temperature points: lock time < X, relock count < X, and far-end BER/CRC meets target.

Layout — deterministic jitter from return-path breaks and asymmetry

Failure symptom

Stability is sensitive to the connector region, via transitions, or specific routing areas. Different board lots show different stability. Errors track specific data patterns or layout positions rather than behaving as purely random noise.

Probe

  • Check return-plane continuity under the diff pair (plane gaps/slots, reference changes, and via fields).
  • Compare “good vs bad” boards at the same A/B/C settings to avoid instrumentation bias.
  • Inspect asymmetry: pair skew, impedance discontinuities, and stubs near connectors.

Fix

  • Restore return continuity: add stitching vias, avoid plane splits under high-speed routes, and control transitions.
  • Reduce discontinuities near connectors and retimer pins/balls; keep the routing symmetric.
  • Place decoupling to minimize the power-loop area that couples into the high-speed region.

Pass

Deterministic-error sensitivity drops below X, and far-end BER/CRC meets target with stable lock (relock count < X).

Figure 10 — Return-path + decoupling + plane gap: a compact coupling model

[Figure 10: simplified PCB sketch. A diff pair runs from the connector to the retimer's sensitive nodes over a return plane (low-impedance path). A plane gap interrupts the return current and forces a detour, increasing coupling. Arrows mark the decap loop area and the noise-injection path.]

Rule: a continuous return path and a small power-loop area reduce deterministic coupling into timing-sensitive nodes. Plane gaps and asymmetric transitions often create intermittent, temperature-sensitive failures.

Debug shortcut: if touching ground/cable changes behavior, return-path continuity and power-loop coupling are high-priority suspects.

Key Specs & Selection Logic (CDR / Retimer family)

This section converts “retimer” marketing into a requirements checklist: given channel, reference / frequency stress, target reliability, and latency constraints, it outputs must-have capabilities and how to validate them.

Scope guard: electrical layer + system impact + validation actions only. Protocol compliance templates and spec-by-spec requirements belong to protocol pages.

Selection Decision Tree (inputs → must-have capabilities)

  • Start from what stresses the link first: channel loss/echo, frequency error (ppm/thermal/SSC), or power/thermal intermittency.
  • Force a 3-point validation plan: Point A (before retimer), Point B (after retimer), Point C (far-end receiver).
  • Treat each knob as a “stability window search”: pick a conservative default, then widen loop BW / adjust peaking only if the error mode points to tracking limits.

Figure 11 · Selection Decision Tree (inputs → must-have capabilities). Validate with A/B/C points; write requirements as measurable pass criteria.

  • Inputs: Channel (cable/backplane; loss/echo), Reference & PPM (thermal drift, SSC, supply noise), Target Reliability (BER/CRC, burst tolerance), Latency (budget; stability/drift), Lanes & Topology (x1/x4/x8/x16; skew risk).
  • Decision, need re-time? Loss/reflections close the eye, or jitter margin collapses.
  • Decision, tracking stress high? ppm/thermal/SSC stress; frequent relock or slips.
  • Decision, reliability strict? BER target is very low, or burst errors are not allowed.
  • Must-have #1: low added jitter floor; A/B/C validation ready.
  • Must-have #2: wide capture/lock range; programmable loop BW; peaking control.
  • Must-have #3: latency meets budget; lane consistency (skew); power/thermal headroom.

Tip: write the purchasing requirement as a measurable list (channels, data rate, capture/lock range, loop-BW modes, added jitter floor, latency drift, and a mandatory A/B/C validation plan).

Key Spec Cards (why it matters + how to validate)

1) Data-rate headroom & channel reach

Why it matters: retiming only helps if the device supports the link’s highest symbol rate with margin; otherwise, “clean-up” turns into stress and early failures under temperature or cable variation.

How to validate: run PRBS / stress traffic at max rate across worst-case channel; log lock time, error bursts, and far-end BER/CRC at Point C. Pass: BER ≤ X, lock time ≤ X ms, no recurring burst pattern (X events/hour).

2) Capture / Lock range (ppm, thermal drift, SSC stress)

Why it matters: “bench OK, system fails” often starts with frequency stress eating lock margin: ppm error, warm-up drift, supply-induced modulation, or spread-spectrum behavior.

How to validate: sweep effective ppm offset and temperature points; record relock count and time-to-recover under identical traffic. Pass: capture range ≥ X ppm, relock count ≤ X across temp, recovery time ≤ X ms.
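
The ppm sweep can be reduced to a single capture-range number. This is a minimal sketch, assuming each tested offset is recorded as held lock / lost lock; the function name and sweep values are hypothetical. It reports a symmetric, conservative range: a failure on one side caps the other side too.

```python
def capture_range_ppm(sweep):
    """Largest |offset| such that every tested offset at or inside it held lock.

    sweep maps a frequency offset in ppm to True (lock held) or False (lock lost).
    """
    failing = [abs(ppm) for ppm, locked in sweep.items() if not locked]
    limit = min(failing) if failing else float("inf")
    passing = [abs(ppm) for ppm, locked in sweep.items()
               if locked and abs(ppm) < limit]
    return max(passing) if passing else 0.0

# Hypothetical sweep result at one temperature point
sweep = {-300: False, -200: True, -100: True, 0: True,
         100: True, 200: True, 300: True}
measured_range = capture_range_ppm(sweep)
print(measured_range)
```

Repeating the sweep at each temperature point and taking the minimum gives the figure to compare against the capture-range requirement.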

3) Programmable loop bandwidth & peaking control (stability window)

Why it matters: loop BW defines tracking vs filtering; peaking can amplify mid-band jitter. Correct selection finds a stable window: clean-up improves without creating new error bursts.

How to validate: test 2–3 BW modes × 2 peaking modes under the same channel/temperature; compare Point C BER and relock stats. Pass: a stable window exists where BER ≤ X and relock ≤ X, and any mode that is worse in the mid-band has an identified cause.
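
One way to post-process the BW × peaking sweep is sketched below. It assumes each run is summarized as a Point C BER and a relock count; the mode names, limits, and sample numbers are hypothetical stand-ins for the X placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SweepResult:
    bw_mode: str
    peaking_mode: str
    ber_point_c: float
    relocks: int

def stable_window(results, ber_limit, relock_limit):
    """Modes that meet both the BER and the relock limit."""
    return [r for r in results
            if r.ber_point_c <= ber_limit and r.relocks <= relock_limit]

def midband_outliers(results, bw_order):
    """Flag a BW mode whose BER is worse than both neighbouring BW modes."""
    by_key = {(r.bw_mode, r.peaking_mode): r for r in results}
    flagged = []
    for peaking in sorted({r.peaking_mode for r in results}):
        for trio_names in zip(bw_order, bw_order[1:], bw_order[2:]):
            trio = [by_key.get((bw, peaking)) for bw in trio_names]
            if any(r is None for r in trio):
                continue  # incomplete sweep for this peaking mode
            lo, mid, hi = trio
            if mid.ber_point_c > max(lo.ber_point_c, hi.ber_point_c):
                flagged.append(mid)
    return flagged

# Hypothetical sweep: 3 BW modes x 2 peaking modes
results = [
    SweepResult("narrow", "low", 1e-13, 0),
    SweepResult("mid", "low", 5e-11, 2),
    SweepResult("wide", "low", 1e-12, 0),
    SweepResult("narrow", "high", 2e-13, 0),
    SweepResult("mid", "high", 8e-13, 1),
    SweepResult("wide", "high", 3e-10, 6),
]
window = stable_window(results, ber_limit=1e-12, relock_limit=1)
outliers = midband_outliers(results, ["narrow", "mid", "wide"])
print(len(window), [(r.bw_mode, r.peaking_mode) for r in outliers])
```

A flagged mid mode is the kind of result that needs an explanation (for example peaking or a coupled spur) before the window is accepted.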

4) Added jitter floor (filtered-in + added)

Why it matters: a retimer can make links worse if its own added jitter (VCO/DCO PN, PD noise, supply coupling) consumes system margin.

How to validate: enforce the 3-point method: Point A (pre), Point B (post), Point C (far-end). If B looks “better” but C does not, the metric is not representative. Pass: Point B jitter floor ≤ X (ps rms / fs rms), and Point C BER/CRC meets target across temperature.
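
The "filtered input plus added" view can be illustrated numerically. The sketch below assumes a first-order low-pass jitter transfer with no peaking and treats the filtered input and the retimer's own jitter as uncorrelated, so they combine as root-sum-square. Real CDR loops are typically higher order and can peak; all numbers are illustrative.

```python
import math

def jitter_transfer_gain(f_hz, loop_bw_hz):
    """|H(f)| of an assumed first-order low-pass jitter transfer."""
    return 1.0 / math.sqrt(1.0 + (f_hz / loop_bw_hz) ** 2)

def point_b_jitter_rms(input_rms, f_hz, loop_bw_hz, added_rms):
    """RSS of the filtered input jitter and the retimer's added jitter."""
    filtered = input_rms * jitter_transfer_gain(f_hz, loop_bw_hz)
    return math.hypot(filtered, added_rms)

# Hypothetical case: 1.0 ps rms input jitter at 10 MHz, 0.3 ps rms added floor
wide = point_b_jitter_rms(1.0, 10e6, loop_bw_hz=1e6, added_rms=0.3)
narrow = point_b_jitter_rms(1.0, 10e6, loop_bw_hz=0.1e6, added_rms=0.3)
print(f"wide BW: {wide:.4f} ps, narrow BW: {narrow:.4f} ps")
```

Narrowing the loop removes more of the input term, but the result never drops below the added floor, which is why the floor has to be specified and validated separately.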

5) Latency & latency stability (system budget)

Why it matters: retimers may insert buffering/gearboxing; systems fail not only on “too much latency” but on latency drift with temperature, supply, or mode changes.

How to validate: measure end-to-end latency using repeatable markers under hot/cold and supply ripple stress. Pass: latency ≤ X ns and drift ≤ X ns across conditions; mode switching does not create step jumps beyond X.

6) Lane count & lane-to-lane consistency (skew / symmetry)

Why it matters: multi-lane systems can fail even when single-lane looks clean: mismatch, skew, or unequal margin creates intermittent lane dropouts.

How to validate: stress all lanes simultaneously; compare per-lane error counters and margin indicators at Point C. Pass: lane-to-lane skew ≤ X, no “one weak lane” dominating BER/CRC under thermal cycling.
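
A "one weak lane" check on the per-lane counters can be sketched as follows, assuming the counters were read over the same window on all lanes; the 50% share limit and the counter values are hypothetical.

```python
def dominant_lanes(lane_errors, share_limit=0.5):
    """Lanes whose share of the total error count exceeds share_limit."""
    total = sum(lane_errors.values())
    if total == 0:
        return []
    return sorted(lane for lane, count in lane_errors.items()
                  if count / total > share_limit)

# Hypothetical x4 link: error counts per lane over one thermal cycle
lane_errors = {0: 2, 1: 1, 2: 40, 3: 3}
weak = dominant_lanes(lane_errors)
print(weak)
```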

7) Diagnostics hooks (lock detect / loopback / PRBS readiness)

Why it matters: production and field success depend on repeatable pass/fail metrics; missing hooks force “scope-only debugging” and yield inconsistent decisions.

How to validate: ensure loopback/PRBS modes cover both sides of the retimer; verify that pass/fail metrics correlate with far-end success. Pass: BER threshold X, lock time X, relock count X, all reproducible across stations.

8) Power / thermal headroom (intermittent killers)

Why it matters: supply noise and thermal gradients can translate into phase noise, marginal lock, and “only fails hot/cold” behavior.

How to validate: run the same BER test after thermal soak; inject controlled ripple and watch lock margin change. Pass: BER stable within X, no new spurs/periodic errors, lock remains stable across thermal soak window.

Requirement Output (copy/paste into design review)

  • Channels / lanes: ≥ X, lane-to-lane skew ≤ X
  • Max data rate: ≥ X (with margin on worst-case channel)
  • Capture range: ≥ X ppm; lock stability across temperature: relock ≤ X
  • Programmable loop BW: ≥ X modes; peaking control present
  • Added jitter floor at Point B: ≤ X (ps rms / fs rms); validated by Point C BER ≤ X
  • Latency: ≤ X ns; latency drift across conditions: ≤ X ns
  • Diagnostics hooks: loopback/PRBS readiness; lock time ≤ X ms; reproducible pass/fail
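
The requirement list can also be kept in a machine-checkable form so that bench, production, and field stations apply the same limits. This is a minimal sketch; the field names and every numeric limit below are hypothetical stand-ins for the X placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    name: str
    limit: float
    kind: str  # "max": measured <= limit passes; "min": measured >= limit passes

def check(requirements, measured):
    """Return {name: True | False | None}; None means the value was not measured."""
    report = {}
    for req in requirements:
        value = measured.get(req.name)
        if value is None:
            report[req.name] = None
        elif req.kind == "max":
            report[req.name] = value <= req.limit
        elif req.kind == "min":
            report[req.name] = value >= req.limit
        else:
            raise ValueError(f"unknown requirement kind: {req.kind}")
    return report

# Hypothetical limits and measurements
requirements = [
    Requirement("capture_range_ppm", 200, "min"),
    Requirement("point_c_ber", 1e-12, "max"),
    Requirement("latency_ns", 50, "max"),
    Requirement("latency_drift_ns", 2, "max"),
]
measured = {"capture_range_ppm": 250, "point_c_ber": 3e-13, "latency_ns": 61}
report = check(requirements, measured)
print(report)
```

Reporting unmeasured items as None rather than as a pass keeps gaps in the validation plan visible during design review.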

Representative Material Numbers (retimers & boundary comparators)

These part numbers are examples used in real links. Always confirm: data rate, channel count, package, temperature grade, suffix, and availability.

A) Multi-rate signal-conditioning retimers (generic high-speed links)

  • Texas Instruments: DS250DF410ABMR / DS250DF410ABMT (4-ch multi-rate retimer)
  • Texas Instruments: DS280DF810ABWR (8-ch multi-rate retimer)

B) PCIe / CXL-class retimers (cable/backplane reach extension examples)

  • Astera Labs: PT4161LRS (x16 class), PT5161LRS (x16 class)
  • Astera Labs family examples: PT5161LR, PT5161LX, PT5081LR, PT5081LX, PT4080LR
  • Montage (澜起): M88RT51632 (x16 class), M88RT40816 (x8 class), M88RT61632 (x16 class)
  • Renesas (IDT): 89HT0832P (x16 class), 89HT0808P (x8 class)

C) USB-C / USB4 retimers (retiming examples; protocol compliance handled elsewhere)

  • Parade: PS8830 (USB4 retimer class)
  • Parade: PS8833 (USB4/TBT4 retimer class)
  • Parade: PS8838 (retiming/switch class)

D) Boundary comparators (linear redrivers; not retimers)

These parts are useful for “retimer vs redriver” A/B comparisons in the same channel, especially when diagnosing whether the failure is tracking-related or purely loss/EQ-related.

  • Texas Instruments: DS100BR410SQE/NOPB (redriver class)
  • Texas Instruments: DS160PR410 (linear redriver class)
  • NXP: PTN36502 / PTN36502A (combo redriver class)

Procurement note: when collecting quotes, request the full orderable suffix (package/reel/temp grade) and require evidence of A/B/C validation correlation to the far-end receiver (Point C).

FAQs (CDR / Retimer) — Debug-first, no protocol compliance

Intent: close long-tail troubleshooting without expanding main text. Each answer is a 4-line, executable mini-procedure: Likely cause → Quick check → Fix → Pass criteria. All thresholds use placeholders X_* for easy customization.

Standard measurement anchors (recommended):

  • Point A = before retimer, Point B = after retimer, Point C = far-end receiver.
  • Prefer Point C BER/CRC as the final truth; “pretty eye” at A/B is not a pass by itself.
  • Common placeholder examples: BER ≤ X_BER, lock time ≤ X_LOCK_MS, relock ≤ X_RELOCK_PER_HR, rail ripple ≤ X_RIPPLE_MVPP.
Adding a retimer makes the link less stable — “added jitter” or “over-tracking / weak filtering”?

Likely cause: (1) retimer added jitter floor (VCO/PD noise, supply/reference coupling), or (2) loop BW too wide / peaking amplifies mid-band jitter (tracks noise instead of filtering).

Quick check: freeze channel + traffic, toggle BW (wide → narrow) and compare trend A→B vs B→C; if C improves strongly when BW narrows, tracking/peaking is the primary stress.

Fix: if BW-sensitive, search a “stable window” (narrower BW + low peaking first); if BW-insensitive but B jitter floor is high, prioritize rail/REF isolation, decoupling, and ground return integrity.

Pass criteria: Point C BER ≤ X_BER, relock ≤ X_RELOCK_PER_HR, and no periodic burst pattern over X_OBS_WINDOW.

Short cable is stable, long cable loses lock — check loop BW first or reference ppm/thermal first?

Likely cause: long channel reduces eye/SNR and magnifies sensitivity to tracking stress (ppm drift, SSC, thermal); lock loss often appears when margin is already narrow.

Quick check: keep BW constant and swap only channel length/type; if failures correlate tightly with length while ppm/temperature is unchanged, channel margin is primary; then repeat with BW narrow vs wide to see if tracking aggravates.

Fix: start conservative (narrow BW, low peaking) to avoid tracking noise; if still failing, improve segmentation (retime location) or reduce frequency stress (cleaner ref / lower ppm drift path).

Pass criteria: long channel meets Point C BER ≤ X_BER and lock holds for X_SOAK_TIME with relock ≤ X_RELOCK_PER_HR.

Works at room temperature, fails hot/cold — what are the first 3 log fields to record?

Likely cause: temperature pushes frequency error (ppm), rail noise, and/or thermal gradients into a marginal tracking window, triggering relock or burst errors.

Quick check: log these three fields with timestamps aligned to errors: (1) local retimer temperature (or closest sensor), (2) key rail ripple / droop state, (3) lock/relock events + error counter window at Point C.

Fix: require thermal soak before measurement, then tighten rails (decoupling, regulator stability) and re-evaluate BW window under hot/cold; avoid optimizing only at room conditions.

Pass criteria: across temp range X_TMIN to X_TMAX, Point C BER ≤ X_BER and relock ≤ X_RELOCK_PER_HR.

Eye diagram looks larger, but BER gets worse — which measurement trap to eliminate first?

Likely cause: “better eye” is an instrument/settings artifact (CTLE preset, equalizer state, RBW/VBW, reference/trigger sharing) that does not correlate with far-end sampling.

Quick check: lock measurement settings (same bandwidths, same equalizer preset, same trigger/reference) and compare Point C BER/CRC under identical traffic; treat eye-only improvement as “not proven.”

Fix: enforce A/B/C correlation: accept any tuning only if it improves C (BER/CRC) and does not raise relock; remove hidden EQ presets that change between captures.

Pass criteria: Point C BER ≤ X_BER (or CRC ≤ X_CRC_PER_WINDOW) with stable lock; eye metrics are secondary.

Locks fast, but errors appear under load — power ripple or crosstalk injection first?

Likely cause: load increases rail ripple/ground bounce that modulates the CDR (phase noise), or introduces switching-coupled deterministic jitter via return-path/crosstalk into the sampling front-end.

Quick check: align error timestamps with rail ripple state; if error bursts track ripple/VRM events, power coupling is primary; otherwise prioritize physical injection (return path/crosstalk).
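
The timestamp alignment can be quantified as the fraction of errors that land near a ripple/VRM event. This is a minimal sketch, assuming both logs share one clock; the function name, the ±50 ms window, and the sample data are hypothetical.

```python
from bisect import bisect_left

def fraction_near_events(error_times, event_times, window_s):
    """Fraction of errors occurring within +/- window_s of any event."""
    if not error_times:
        return 0.0
    events = sorted(event_times)
    hits = 0
    for t in error_times:
        # First event at or after the start of the window around this error
        i = bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            hits += 1
    return hits / len(error_times)

# Hypothetical logs (seconds)
error_times = [1.0, 5.02, 9.0, 10.01]
vrm_events = [5.0, 10.0]
share = fraction_near_events(error_times, vrm_events, window_s=0.05)
print(share)
```

A share well above what random overlap would give points to power coupling; a share near the chance level points toward return-path or crosstalk injection instead.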

Fix: power-first path: reduce ripple (decoupling placement, rail damping, isolation of sensitive rails); injection-first path: improve return continuity, spacing, shielding, and symmetry near retimer pins.

Pass criteria: at max load, rail ripple ≤ X_RIPPLE_MVPP and Point C BER/CRC meets target (BER ≤ X_BER) over X_OBS_WINDOW.

Swapping one cable turns stable into intermittent — reflection point or tracking stress?

Likely cause: cable/connector impedance changes create reflections (ISI/eye collapse), or change loss such that tracking margin becomes fragile under ppm/thermal noise.

Quick check: freeze BW/peaking and compare only cable SKUs/lengths; if failures follow specific cable types regardless of BW, reflection/loss dominates; then toggle BW to see if tracking sensitivity is the amplifier.

Fix: reflection/loss path: improve segmentation or restrict cable spec; tracking path: move to conservative BW window and reduce peaking, then re-qualify cable variation set.

Pass criteria: across qualified cable set (SKU/length), Point C BER ≤ X_BER and relock ≤ X_RELOCK_PER_HR.

Output jitter at Point B looks great, but far-end still errors — what is the first 3-point comparison?

Likely cause: Point B metric is not representative of far-end sampling, or B measurement is biased by hidden EQ/trigger/reference; far-end may be dominated by channel/receiver interaction not visible at B alone.

Quick check: compare trends, not absolute numbers: does A→B improvement produce B→C improvement under the same traffic and settings? If not, treat B as non-authoritative and pivot to C-driven validation.

Fix: lock measurement configuration, then optimize only against Point C BER/CRC and relock; re-choose which metric at B correlates with C (often error statistics beat eye-only measures).

Pass criteria: A/B/C correlation holds within X_CORR_PCT and Point C BER ≤ X_BER over X_OBS_WINDOW.

Narrower loop BW improves BER but increases lock time — how to set an acceptable trade-off threshold X?

Likely cause: narrower BW filters more jitter but reduces acquisition/tracking speed; the “best BER” mode may violate system recovery/startup budget.

Quick check: define the system budget first (max acceptable lock/recover time), then sweep BW modes and record (BER, lock time, relock) under the same stress.

Fix: select the most conservative BW that still meets X_LOCK_MS (startup/relock budget); within that subset, pick the mode with lowest Point C errors and no mid-band burst behavior.
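
The selection rule above can be sketched as a filter followed by a ranking. This assumes each BW mode has a measured lock time and Point C BER under the same stress; ties are broken toward the narrower loop. Mode names and numbers are hypothetical.

```python
def pick_bw_mode(modes, lock_budget_ms):
    """Among modes meeting the lock-time budget, pick the lowest Point C BER.

    Ties on BER go to the narrower (more conservative) loop bandwidth.
    Returns None when no mode meets the budget.
    """
    eligible = [m for m in modes if m["lock_ms"] <= lock_budget_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda m: (m["ber"], m["bw_hz"]))

# Hypothetical sweep results
modes = [
    {"name": "narrow", "bw_hz": 1e6, "lock_ms": 12.0, "ber": 1e-13},
    {"name": "mid", "bw_hz": 4e6, "lock_ms": 4.0, "ber": 4e-13},
    {"name": "wide", "bw_hz": 10e6, "lock_ms": 1.5, "ber": 6e-12},
]
choice = pick_bw_mode(modes, lock_budget_ms=5.0)
print(choice["name"] if choice else "no mode meets the lock budget")
```

Here the narrow mode has the best BER but misses the 5 ms budget, so the mid mode is selected; the chosen mode still has to meet X_BER and the relock limit on its own.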

Pass criteria: lock time ≤ X_LOCK_MS and Point C BER ≤ X_BER with relock ≤ X_RELOCK_PER_HR.

Same board behaves very differently across chassis — thermal path or ground/return first?

Likely cause: chassis changes airflow/thermal gradients (ppm drift, margin loss) or changes grounding/return current paths (deterministic jitter injection).

Quick check: force comparable thermal steady state (soak to stable temperature) and repeat the same BER test; if behavior remains different at the same temperature, prioritize ground/return coupling.

Fix: thermal path: improve heat spreading/airflow consistency; ground/return path: enforce return continuity near retimer, reduce common-mode currents, and stabilize reference/rail routing.

Pass criteria: across chassis variants, at steady-state temperature, Point C BER ≤ X_BER and relock ≤ X_RELOCK_PER_HR.

Field failure cannot be reproduced — how to use BIST/loopback to turn “field conditions” into a repeatable experiment?

Likely cause: missing context (temperature, rails, reference state, BW/peaking mode, channel identity) hides the true trigger; “one-time” failures are often statistical bursts under specific stress combinations.

Quick check: capture a “field replay bundle”: retimer config snapshot + cable ID/length + temperature + rail ripple state + error counter window (duration X_OBS_WINDOW), then replay with loopback segmentation.
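
A replay bundle is easiest to reuse when it is captured as one serializable record. The sketch below is one possible shape; the field names, units, and values are hypothetical, and the retimer config snapshot is kept as an opaque dictionary because register layouts are device-specific.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ReplayBundle:
    retimer_config: dict      # mode/register snapshot, e.g. BW and peaking mode
    cable_id: str
    cable_length_m: float
    temperature_c: float
    rail_ripple_mvpp: float
    error_count: int          # errors counted during the observation window
    window_s: float           # observation window duration

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

# Hypothetical field capture
bundle = ReplayBundle(
    retimer_config={"bw_mode": "mid", "peaking_mode": "low"},
    cable_id="CBL-0042",
    cable_length_m=3.0,
    temperature_c=61.5,
    rail_ripple_mvpp=18.0,
    error_count=7,
    window_s=600.0,
)
restored = ReplayBundle.from_json(bundle.to_json())
print(restored == bundle)
```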

Fix: use near-end and far-end loopback points to isolate which segment fails; then reproduce with the same stress factors and only one variable change per run (BW, cable, ref, rails).

Pass criteria: replay reproduces the field symptom at ≥ X_REPRO_PCT (or explains root cause with ≥ X_EXPLAIN_PCT correlation to a stress variable).

Periodic errors (spur-like) — check reference spur or supply coupling first?

Likely cause: periodic modulation enters the CDR via (1) reference spur path, or (2) rail ripple / switching harmonics coupling into VCO/PD and creating deterministic timing error.

Quick check: change only the reference condition (alternate ref source or ref routing) and observe whether the error periodicity shifts; if unchanged but tracks load/VRM state, prioritize rail coupling.
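
Whether the error periodicity "shifts" can be judged from burst timestamps alone. This is a minimal sketch, assuming bursts are logged as sorted timestamps; a low coefficient of variation indicates regular, spur-like spacing. The 5% tolerance and the sample data are hypothetical.

```python
import statistics

def burst_period(burst_times):
    """Return (mean spacing, coefficient of variation) of bursts, or None."""
    intervals = [b - a for a, b in zip(burst_times, burst_times[1:])]
    if len(intervals) < 2:
        return None
    mean = statistics.fmean(intervals)
    if mean <= 0:
        return None
    return mean, statistics.pstdev(intervals) / mean

def period_shifted(before_s, after_s, rel_tol=0.05):
    """True if the mean burst spacing moved by more than rel_tol."""
    return abs(after_s - before_s) > rel_tol * before_s

# Hypothetical runs: same traffic, only the reference source changed
baseline = burst_period([0.0, 2.0, 4.0, 6.0, 8.0])
alt_ref = burst_period([0.0, 2.5, 5.0, 7.5])
shifted = period_shifted(baseline[0], alt_ref[0])
print(baseline, alt_ref, shifted)
```

A period that moves with the reference condition implicates the reference path; a period that stays put but follows load or VRM state implicates rail coupling.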

Fix: reference path: isolate/clean ref distribution and reduce spur injection; rail path: reduce ripple in the relevant band and improve local decoupling/return continuity near sensitive pins.

Pass criteria: periodic burst rate ≤ X_BURSTS_PER_WINDOW, and Point C BER ≤ X_BER across X_OBS_WINDOW.

“Downspeed fixes it” — when is a retimer required instead of adding more EQ?

Likely cause: downspeed increases margin; if failures are dominated by tracking/clean-up limits (ppm/thermal/rail coupling), linear EQ alone cannot restore stability at the target rate.

Quick check: at target rate, run a BW/peaking sweep and compare C results; if no configuration meets BER/lock targets across stress while downspeed does, the limitation is beyond linear equalization.

Fix: require re-timing (retimer) when the target rate must be maintained and stability requires breaking jitter transfer; place the retimer to segment the channel and validate with A/B/C correlation.

Pass criteria: at target rate, Point C BER ≤ X_BER, lock time ≤ X_LOCK_MS, relock ≤ X_RELOCK_PER_HR across the qualified stress set.