CDR (Clock & Data Recovery): Design, JTOL, and Validation
Clock & Data Recovery (CDR) is the receiver function that reconstructs a stable sampling clock directly from incoming data so bits can be decided reliably on lossy, asynchronous, and SSC/ppm-drifting links. This page turns CDR selection and validation into an engineering loop—architecture → loop bandwidth → JTOL/BER/slip tests → board/measurement fixes—so bring-up becomes repeatable and production-ready.
What is a CDR and where it sits (Definition + scope)
A CDR (Clock & Data Recovery) is a receiver control loop that recovers a sampling clock from incoming data by tracking phase/frequency error and placing the sampling instant where decision errors are minimized. The goal is not “a pretty clock,” but a stable sampling point that meets JTOL/mask + BER evidence.
Minimal engineering model (input → loop → outputs)
- JTOL / jitter mask is met under defined modulation, SSC, and ppm offset conditions.
- BER/bathtub passes at required stress and observation time (not just “looks OK”).
- No slips or alignment loss under temperature, supply, and aggressor activity.
Terminology map (avoid category mistakes)
CDR is a receiver sampling problem. Clock cleaners and synthesizers are reference clock problems. EQ is a channel compensation problem. This page stays on the CDR boundary and uses EQ only as it affects lock/tolerance.
CDR:
- Primary goal: stable sampling point + low decision errors
- Key evidence: JTOL/mask + BER/bathtub + slip=0
- Typical location: RX PHY, retimer core, optical/SerDes front-end
Retimer:
- Includes CDR; adds elastic buffering and lane alignment features
- Watch: latency class (fixed/variable), slip behavior, monitoring hooks
- Validate: compliance under SSC/ppm + multi-lane drift
EQ:
- Helps eye opening but does not recover a new sampling clock
- Risk: amplifies noise/jitter; can worsen CDR lock downstream if mis-set
- Validate: eye mask + BER, not just amplitude
Mention-only here: cleaners shape reference clocks; CDR shapes sampling from data. Deep loop design belongs to the PLL/cleaner page.
Does not cover: PLL synthesizer theory, jitter cleaner deep dive, protocol compliance tables, EQ algorithm derivations.
When you need CDR (and when you don’t)
The decision is not “CDR is always better.” A CDR is justified when the receive sampling clock must be derived from data or when the link’s timing uncertainty cannot be contained by a shared reference alone. This section turns the decision into measurable inputs → a yes/no flow → a validation plan.
Decision drivers (what really forces CDR)
- No shared stable reference at RX: the sampling clock must be recovered from embedded transitions.
- High-loss / heavy ISI channel: eye closure requires EQ+CDR cooperation to maintain margin.
- SSC / ppm offset / drift / wander: the receiver must track slow and mid-band timing variation without losing lock.
- Multi-lane alignment: lane-to-lane drift and slip monitoring become first-class requirements.
- Latency behavior matters: fixed vs variable latency constraints can rule out certain approaches.
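The ppm-offset driver can be made concrete with a short calculation. The sketch below assumes an untracked, constant frequency offset between the transmit clock and a local sampling clock; the function names are illustrative and not taken from any tool or datasheet.

```python
def drift_ui_per_second(ppm_offset: float, data_rate_bps: float) -> float:
    """Rate at which an untracked sampling point walks across the eye, in UI/s."""
    return abs(ppm_offset) * 1e-6 * data_rate_bps


def bits_until_drift(ppm_offset: float, drift_ui: float = 0.5) -> float:
    """Number of bits before the sampling point has moved by drift_ui.

    Assumes a constant offset and no tracking at all (no CDR, no shared reference).
    """
    if ppm_offset == 0:
        return float("inf")
    return drift_ui / (abs(ppm_offset) * 1e-6)


if __name__ == "__main__":
    for ppm in (10, 100, 300):
        rate = drift_ui_per_second(ppm, 10e9)
        bits = bits_until_drift(ppm)
        print(f"{ppm:>4} ppm at 10 Gb/s: {rate:.3g} UI/s, 0.5 UI after {bits:.0f} bits")
```

With a 100 ppm offset the sampling point moves half a UI in about 5,000 bits regardless of data rate, which is why an unshared reference forces recovery from the data itself.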
Required link inputs (fill this before selecting any CDR)
- Data rate range (min/typ/max)
- Shared refclk available at RX (yes/no); allowed ppm offset
- SSC presence (depth/rate) and whether SSC is allowed end-to-end
- Expected drift/wander (temperature, oscillator class, system modes)
- Insertion loss near Nyquist (or a comparable channel loss metric)
- Expected aggressors (crosstalk, power noise coupling, mode-switching)
- Need for EQ and whether EQ is inside the device or external
- Target BER and observation time (confidence matters)
- JTOL/jitter mask requirement under defined stress
- Latency constraint (fixed vs variable; deterministic behavior)
- Multi-lane count, allowed skew/drift, and slip tolerance (usually zero)
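The "target BER and observation time" input can be sized before any lab time is booked. The sketch below uses the standard zero-error confidence bound, N = -ln(1 - CL) / BER, which assumes independent bit errors and a run that observes no errors; the function names are illustrative.

```python
import math


def bits_for_confidence(target_ber: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to claim BER <= target_ber at the given confidence level."""
    if not 0.0 < target_ber < 1.0:
        raise ValueError("target_ber must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1.0 - confidence) / target_ber


def observation_time_s(target_ber: float, data_rate_bps: float,
                       confidence: float = 0.95) -> float:
    """Error-free run time needed at a given data rate."""
    return bits_for_confidence(target_ber, confidence) / data_rate_bps


if __name__ == "__main__":
    for ber in (1e-9, 1e-12, 1e-15):
        t = observation_time_s(ber, 10e9)
        print(f"BER {ber:.0e} at 10 Gb/s, 95% confidence: {t:.3g} s error-free")
```

At 95% confidence this works out to roughly 3/BER bits, so a 1e-12 target at 10 Gb/s needs about 300 seconds of error-free traffic per test point.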
When adding CDR does not solve the root cause
CDR architectures you’ll actually meet
Many datasheets label “CDR” as a feature, but engineering outcomes depend on recognizing the underlying architecture and the behaviors it tends to produce in JTOL, BER/bathtub, lock time, and slip events. This section builds a practical map that helps interpret block diagrams and predict validation risks.
The practical buckets (what the block diagram implies)
- PLL-based CDR: phase detector → loop filter → VCO/DCO (or phase interpolator). Best understood as a tracking loop whose bandwidth shapes jitter transfer.
- Oversampling / Digital CDR: multi-phase sampling → digital phase estimation → sampler adjustment. Often includes adaptation logic and internal margin telemetry.
- PD type (bang-bang vs linear) is a cross-cutting choice: it changes small-error response and how decision noise maps into recovered timing.
Datasheet grab handles (fields that predict behavior)
PLL-based CDR:
- Hold-in / pull-in range: survival under ppm offset and drift.
- Loop bandwidth (or equivalent): tracking vs jitter transfer trade-off.
- Lock detect + slip indicators: observability in bring-up.
- JTOL curve/mask: which modulation bands dominate failure.
- Latency behavior: fixed vs variable constraints.
Oversampling / digital CDR:
- Sampling phases / OS ratio: margin granularity vs complexity.
- Margin telemetry: internal eye/phase counters for debug.
- Adaptation hooks: ability to stage and freeze training.
- Pattern sensitivity: PRBS changes that move BER tails.
- Tracking window: SSC/ppm handling during transitions.
PD type (bang-bang vs linear):
- Bang-bang: quantized correction; robust acquisition; may show timing dither floor.
- Linear: fine small-error response; can be more eye-quality sensitive.
- Evidence differences often appear in JTOL mid-band and bathtub tails.
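The difference between quantized and proportional correction can be seen in a toy loop. This is a minimal sketch of an idealized, noiseless first-order loop, not a model of any particular device; the gain values, the 0.255 UI target, and all function names are assumptions chosen for illustration.

```python
def bang_bang(error_ui: float) -> float:
    """Early/late decision only: the detector reports sign, not magnitude."""
    return 1.0 if error_ui >= 0 else -1.0


def linear(error_ui: float) -> float:
    """Output proportional to the phase error."""
    return error_ui


def run_loop(detector, gain: float, target_phase_ui: float, steps: int = 400) -> list:
    """First-order loop: each update moves the sampling phase by gain * detector output."""
    phase = 0.0
    trace = []
    for _ in range(steps):
        phase += gain * detector(target_phase_ui - phase)
        trace.append(phase)
    return trace


def residual_pkpk(trace: list, tail: int = 100) -> float:
    """Peak-to-peak phase movement over the last `tail` updates (after settling)."""
    settled = trace[-tail:]
    return max(settled) - min(settled)


if __name__ == "__main__":
    bb = run_loop(bang_bang, gain=0.01, target_phase_ui=0.255)
    lin = run_loop(linear, gain=0.1, target_phase_ui=0.255)
    print(f"bang-bang residual: {residual_pkpk(bb):.4f} UIpp")
    print(f"linear residual:    {residual_pkpk(lin):.2e} UIpp")
```

In this idealized case the bang-bang loop never stops moving: it toggles by one correction step around the target, which is the dither floor noted above, while the linear loop settles. Real loops add noise and data-dependent detector gain, so the sketch only shows the mechanism.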
Architecture comparison (strengths, risks, validation focus)
PLL-based CDR:
- JTOL sweep (dominant band)
- Lock time + hold-in/pull-in
- Slip counters under SSC/temp
Oversampling / digital CDR:
- PRBS A/B (7 vs 31)
- EQ state sweep (train/freeze)
- Margin counter ↔ BER correlation
PD type (bang-bang vs linear):
- JTOL mid-band margin
- Bathtub tails
- Lock-but-high-BER cases
Loop dynamics: tracking vs cleaning (the CDR bandwidth story)
CDR bandwidth is the main lever that decides whether the recovered clock behaves like a tracker or a filter. Lower bandwidth tends to suppress more input timing variation at mid/high offsets, but it may fail under drift/SSC and can stretch lock time. Higher bandwidth improves tracking of slow variation (ppm, wander, SSC), but it can transfer more input jitter into sampling.
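A first-order approximation makes the tracker-versus-filter trade-off easy to tabulate. The sketch below assumes a single-pole loop, which real CDRs only approximate (second-order loops add peaking); `loop_bw_hz` and the function names are illustrative. The low-pass term is the fraction of input jitter the recovered clock follows, and the complementary high-pass term is the fraction left as sampling phase error.

```python
import math


def jitter_transfer(f_hz: float, loop_bw_hz: float) -> float:
    """|H(f)| of a single-pole low-pass: input jitter followed by the recovered clock."""
    return 1.0 / math.sqrt(1.0 + (f_hz / loop_bw_hz) ** 2)


def error_transfer(f_hz: float, loop_bw_hz: float) -> float:
    """|1 - H(f)|: input jitter the loop does not track, seen as sampling phase error."""
    r = f_hz / loop_bw_hz
    return r / math.sqrt(1.0 + r * r)


if __name__ == "__main__":
    offsets_hz = (33e3, 1e6, 10e6, 100e6)  # 33 kHz stands in for an SSC-like rate
    for bw in (1e6, 10e6):
        print(f"loop bandwidth {bw / 1e6:.0f} MHz")
        for f in offsets_hz:
            print(f"  {f / 1e6:8.3f} MHz  tracked={jitter_transfer(f, bw):.3f}"
                  f"  untracked={error_transfer(f, bw):.3f}")
```

Raising the bandwidth shrinks the untracked fraction at low offsets (better SSC/ppm tracking) while increasing the tracked fraction at higher offsets, which is the trade-off described above.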
Bandwidth selection in three steps (goal → trade-off → verify)
Step 1 (goal): what must be tracked?
- Is SSC enabled? What depth/rate must be tolerated?
- What ppm offset and drift/wander must be tracked without slips?
- Is multi-lane alignment tight (skew/drift budget)?
- Is latency determinism (fixed vs variable) a hard requirement?
Step 2 (trade-off): lower bandwidth
- Cleaner-like: reduces jitter transfer at mid/high offsets
- Slower tracking: vulnerable to ppm/SSC/wander
- Often longer lock and slower recovery from mode changes
Step 2 (trade-off): higher bandwidth
- Stronger tracking of slow variation (ppm/SSC)
- More jitter transfer into sampling at higher offsets
- Can create “lock-but-high-BER” if input jitter dominates
Step 3 (verify):
- JTOL sweep across modulation frequency: confirm mask margin in the failure-dominant band.
- SSC on/off A/B test: confirm slip=0 and stable BER.
- Temperature + supply stress: confirm lock stability, not only a clean lab condition pass.
- PRBS7 vs PRBS31: expose pattern-sensitive tails and data-dependent jitter effects.
Common bandwidth-related failure signatures (fast diagnosis)
Jitter taxonomy for CDR (what matters in practice)
CDR does not “care” about every textbook jitter category equally. What matters is the path from a timing disturbance to sampling-point error: whether the loop tracks it, transfers it, or leaves it as residual phase error that shows up as JTOL mask loss, BER tails, slips, or slow recovery.
The only taxonomy that matters: frequency band + impact path
Symptom cards (cause → check → fix) for fast debugging
Jitter tolerance (JTOL) and masks: how to read and use them
JTOL is the most actionable way to express “how much timing modulation a CDR can survive” across modulation frequency. Treat it as a verification language: stimulus, setup, sweep, and pass criteria must be specified together or the result is not portable.
JTOL test recipe (Stimulus / Setup / Sweep / Pass)
Stimulus:
- Sine phase modulation (PJ): sweep modulation frequency; increase UIpp until failure.
- SSC: verify tracking margin and slip=0 across the configured spread depth/rate.
- Frequency offset (ppm): stress hold-in/pull-in behavior and long-term stability.
Setup:
- Injection point: TX / pre-channel / RX input; do not mix results across points.
- EQ state: training vs frozen; log the state for every sweep.
- Pattern: PRBS length and encoding must match the intended worst case.
- Observation window: BER accumulation time and confidence; log duration.
- Telemetry: lock indication, slip counters, internal margin counters (if available).
Sweep:
- Sweep modulation frequency (log spacing) to find the weakest band.
- At each frequency, increase UIpp until fail; record margin to the mask boundary.
- Run A/B sweeps with SSC on/off, temperature points, and at least one alternate EQ/CDR profile.
Pass:
- Mask pass across the specified modulation band.
- BER ≤ threshold under the logged conditions (pattern, time, temperature, SSC).
- Slip events = 0 over the defined observation window.
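The sweep portion of the recipe can be sketched as a small harness. Everything device-specific here is a stand-in: `simulated_dut` is a toy single-pole model in place of real instrument control, and `example_mask_uipp` uses made-up corner and floor values that do not come from any standard. Only the sweep structure (log-spaced frequencies, amplitude ramp to failure, margin to the mask) reflects the steps above.

```python
import math


def log_spaced(f_start_hz: float, f_stop_hz: float, points: int) -> list:
    """Modulation frequencies with equal spacing on a log axis."""
    ratio = (f_stop_hz / f_start_hz) ** (1.0 / (points - 1))
    return [f_start_hz * ratio ** i for i in range(points)]


def max_tolerated_uipp(passes, f_hz: float, start_uipp: float = 0.05,
                       step_ratio: float = 1.1, limit_uipp: float = 100.0) -> float:
    """Ramp amplitude geometrically until the DUT fails; return the last passing value."""
    amp, last_pass = start_uipp, 0.0
    while amp <= limit_uipp and passes(f_hz, amp):
        last_pass = amp
        amp *= step_ratio
    return last_pass


def simulated_dut(loop_bw_hz: float = 4e6, eye_margin_uipp: float = 0.3):
    """Toy DUT (assumption): passes while untracked jitter stays within the eye margin."""
    def passes(f_hz: float, amp_uipp: float) -> bool:
        r = f_hz / loop_bw_hz
        return amp_uipp * r / math.sqrt(1.0 + r * r) <= eye_margin_uipp
    return passes


def example_mask_uipp(f_hz: float, corner_hz: float = 4e6,
                      floor_uipp: float = 0.1) -> float:
    """Hypothetical mask: flat floor above the corner, 20 dB/decade rise below it."""
    return floor_uipp * max(1.0, corner_hz / f_hz)


def jtol_sweep(passes, mask, freqs) -> list:
    """Return (frequency, tolerated UIpp, margin ratio to mask) per sweep point."""
    rows = []
    for f in freqs:
        tolerated = max_tolerated_uipp(passes, f)
        rows.append((f, tolerated, tolerated / mask(f)))
    return rows


if __name__ == "__main__":
    rows = jtol_sweep(simulated_dut(), example_mask_uipp, log_spaced(1e5, 1e8, 13))
    for f, tolerated, margin in rows:
        print(f"{f:12.0f} Hz  tolerated={tolerated:7.3f} UIpp  margin={margin:5.2f}x")
    weakest = min(rows, key=lambda row: row[2])
    print(f"weakest band near {weakest[0]:.0f} Hz")
```

In a real setup the `passes` callable would wrap the BER and slip-counter gates for one observation window, and the logged conditions (injection point, EQ state, pattern, SSC, temperature) would be stored with every row.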
Common JTOL traps (why “good hardware” looks bad)
Equalization interaction (CDR + CTLE/DFE)
Equalization can improve eye opening while making clock recovery less stable. The root cause is usually not “EQ vs CDR” as separate blocks, but the coupling path: CTLE/DFE reshape edge slope, ISI residue, and data-dependent timing behavior, which changes what the phase detector “sees” and where loop dynamics become sensitive.
What a CDR-friendly eye looks like (practical criteria)
Tuning sequence and rollback strategy (step-by-step)
- If slips appear or lock becomes intermittent → revert one step and freeze EQ.
- If BER tails worsen while the eye looks larger → suspect DDJ/DFE bias; reduce DFE aggressiveness.
- If JTOL fails in a narrow mid-band → shift profile away from that band; avoid peaking that piles jitter near loop sensitivity.
Common “EQ makes CDR worse” cases (fast mapping)
Multi-lane links: deskew, lane-to-lane alignment, and slips
In multi-lane links, the dominant failure mode is rarely “one lane cannot lock.” The real risk is relative drift: lane-to-lane skew changes with temperature, supply disturbance, or channel differences. Deskew buffers and alignment markers keep the system coherent, but they also provide clear points where slips can be detected and managed.
Architecture map: per-lane CDR vs shared clock + deskew
Per-lane CDR:
- Strength: each lane adapts to its channel and drift.
- Risk: recovered clocks differ; relative drift accumulates into FIFO over/under events.
- Breaks first: shallow FIFO, weak drift monitoring, frequent re-alignment events.
Shared clock + deskew:
- Strength: a common timing base simplifies lane-to-lane coherence.
- Risk: distribution asymmetry and routing imbalance become the main sensitivity.
- Breaks first: skew budget exceeded under temperature gradients or supply noise.
Multi-lane risk checklist and monitoring points
Risk checklist:
- Lane drift: temperature/supply causes relative phase migration.
- Deskew FIFO margin: fill level approaches thresholds; over/under flags increase.
- Alignment stress: marker/comma re-alignment becomes frequent.
- Slip events: overflow/underflow or marker loss triggers alignment reset.
Monitoring points:
- Per-lane: lock state, slip counter (if provided), margin telemetry (if available).
- Deskew: FIFO fill level, over/under flags, threshold crossings per time.
- Alignment: marker detect rate, alignment error flags, re-alignment count.
- System: drift vs temperature and supply disturbance correlation.
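The deskew monitoring point (fill level and threshold crossings) can be reduced to a small post-processing routine. This is a sketch under stated assumptions: fill levels are taken to be periodic integer readings from some device telemetry, and the depth, thresholds, and names are hypothetical rather than taken from any register map.

```python
from dataclasses import dataclass


@dataclass
class FifoReport:
    min_fill: int
    max_fill: int
    low_crossings: int   # times the fill level entered the low-warning region
    high_crossings: int  # times the fill level entered the high-warning region
    hit_limit: bool      # fill reached empty (0) or full (depth) at least once


def analyze_fill(samples, depth: int, low_threshold: int, high_threshold: int) -> FifoReport:
    """Count entries into the warning regions of a deskew FIFO fill-level trace."""
    if not samples:
        raise ValueError("samples must not be empty")
    low_crossings = high_crossings = 0
    was_low = was_high = False
    for fill in samples:
        is_low = fill <= low_threshold
        is_high = fill >= high_threshold
        if is_low and not was_low:
            low_crossings += 1
        if is_high and not was_high:
            high_crossings += 1
        was_low, was_high = is_low, is_high
    return FifoReport(min(samples), max(samples), low_crossings, high_crossings,
                      min(samples) <= 0 or max(samples) >= depth)


if __name__ == "__main__":
    trace = [8, 9, 12, 13, 12, 9, 13, 8, 3, 8]  # made-up readings for a depth-16 FIFO
    print(analyze_fill(trace, depth=16, low_threshold=4, high_threshold=12))
```

Counting entries into the warning region, rather than samples spent inside it, gives a rate that can be correlated against temperature or supply events without depending on the polling interval.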
Typical multi-lane failures (fast mapping)
Design hooks & pitfalls (board + power + layout)
Board-level details often dominate CDR outcomes. Power ripple, return-path discontinuities, and termination placement can convert voltage noise into sampling jitter by disturbing VCO/DCO/phase-interpolator nodes, bias/common-mode networks, or edge crossings seen by the phase detector. This section provides a practical checklist with quick checks and fix actions.
How voltage noise becomes sampling jitter (three sensitive paths)
Layout triad: Power, Return, Termination (fast rules)
Power:
- Prioritize the closest capacitors to CDR/VCO/DCO/PI rails; minimize loop area.
- Use a small “local island” approach: tight cap cluster + short vias + solid reference plane.
- Keep noisy digital rails from sharing impedance with sensitive clock-control rails.
Return:
- Avoid crossing plane splits/slots with high-speed differential pairs.
- Control via transitions: keep pair symmetry and provide a continuous return path.
- Reduce common-mode conversion by maintaining pair geometry and reference continuity.
Termination:
- Place termination close to the receiver/DUT to prevent reflections from re-shaping crossings.
- Keep AC coupling capacitors symmetric and near the intended interface boundary.
- Protect common-mode/bias nodes from sharing routing with aggressor lines.
Top 10 pitfalls checklist (each includes quick check + fix)
Measurement & validation: BER, bathtub, eye, and injection tests
Validation must be reproducible and production-friendly. Results often disagree across labs because pattern, observation window, injection point, and bandwidth definitions are not held constant. The goal here is a closed-loop approach: define stimulus, control the measurement chain, and log a minimal metadata set so comparisons remain meaningful.
Reproducible test rules (minimum metadata to log)
- Pattern: PRBS7 for fast bring-up; PRBS31 to expose long-correlation DDJ/ISI.
- Observation window: time or bits counted; avoid “short peek” conclusions for tails/floor.
- EQ/CDR state: training vs frozen; selected profile; relock events.
- Injection definition: point, calibration note, and bandwidth consistency.
- Environment: temperature, SSC on/off, frequency offset condition.
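One way to enforce the minimum metadata set is to make it a record that every run must fill in. The sketch below is an illustration only; the field names, the `RunRecord` class, and the choice of which fields count as "conditions" are assumptions layered on the list above.

```python
import json
from dataclasses import asdict, dataclass

# Fields that must match before two runs are compared (assumed split, see lead-in).
CONDITION_FIELDS = ("pattern", "eq_state", "cdr_profile", "injection_point",
                    "temperature_c", "ssc_enabled", "ppm_offset")


@dataclass(frozen=True)
class RunRecord:
    pattern: str            # e.g. "PRBS7" or "PRBS31"
    bits_observed: float    # observation window, in bits
    eq_state: str           # "training" or "frozen"
    cdr_profile: str
    relock_events: int
    injection_point: str    # "TX", "pre-channel", or "RX input"
    injection_note: str     # calibration / bandwidth note
    temperature_c: float
    ssc_enabled: bool
    ppm_offset: float
    errors: int
    slips: int


def condition_mismatches(a: RunRecord, b: RunRecord) -> list:
    """Names of test conditions that differ, i.e. reasons two results are not comparable."""
    return [name for name in CONDITION_FIELDS if getattr(a, name) != getattr(b, name)]


def to_log_line(record: RunRecord) -> str:
    """One JSON object per run, suitable for appending to a plain-text log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    lab_a = RunRecord("PRBS7", 3e12, "frozen", "default", 0, "RX input",
                      "cal 2024-01", 25.0, True, 100.0, 0, 0)
    lab_b = RunRecord("PRBS31", 3e12, "frozen", "default", 0, "RX input",
                      "cal 2024-01", 25.0, True, 100.0, 4, 0)
    print(condition_mismatches(lab_a, lab_b))
    print(to_log_line(lab_a))
```

A non-empty mismatch list is a prompt to rerun under matched conditions before treating a BER difference between labs as real.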
Measurement traps (quick check + fix)
IC selection logic: device class → key fields → validation plan
Selection is treated as a closed engineering loop: fields (read under consistent conditions), decision gates (pick the right device class), and validation mapping (prove JTOL, SSC, latency, and slip behavior on a reproducible setup). Example MPNs are included for faster datasheet lookup; always verify speed grade, package, temperature range, and lifecycle status.
A) Selection field sheet (what to compare + how to validate)
| Category | Field | How to read (conditions) | What to validate (lab/production) |
|---|---|---|---|
| Link & range | Data rate range / sub-rates | Confirm NRZ vs PAM4, supported rates, and any “auto-rate” assumptions. Check if half/quarter rates are supported for legacy. | Rate sweep with PRBS (e.g., PRBS7/PRBS31): lock detect stable, BER < target, no unexpected mode flips. |
| Acquisition | Lock range / hold-in / pull-in | Read ppm (or Hz) limits, whether referenced or reference-less, and whether limits depend on pattern, temperature, or supply. | Frequency-offset sweep: record lock time, stable tracking, and slip/elastic-buffer events = 0. |
| SSC & wander | SSC tolerance (depth/rate) | Note down-spread %, modulation frequency range, and whether tolerance is guaranteed across corners. | Apply SSC-stressed source: confirm no intermittent slips, BER stays below target, and lane alignment remains stable. |
| Jitter | JTOL masks / jitter transfer | Read injection method (sine-PM/SSC), measurement bandwidth, and any peaking constraints (transfer shape matters more than a single number). | JTOL sweep (mod freq vs UIpp): mask pass, BER gate pass, slip counter = 0. Keep injection point and bandwidth consistent. |
| Latency | Fixed vs variable latency | Confirm whether latency changes with equalization, rate switching, or relock. Determinism matters for sync/deskew budgets. | Measure latency distribution across power cycles and temperature: bounded variation and predictable relock behavior. |
| EQ | CTLE/DFE integration + bypass | Check whether EQ can be bypassed, training order constraints, and whether DFE adds data-dependent jitter sensitivity. | Bring-up with a controlled sequence: coarse EQ → lock → fine EQ. Track lock margin and BER changes after tuning. |
| Multi-lane | Deskew FIFO / slip counter | Verify lane-bonding assumptions, marker alignment, and availability of counters/telemetry for slips and margin. | Stress temperature/supply: lane-to-lane drift bounded; deskew never overflows; slip events remain 0 in the pass window. |
| Monitoring | LOS/LOL, margin & eye monitors | Prefer devices with actionable observability (lock states, counters, eye/margin telemetry) for bring-up and production. | Production screen fields: lock time, lock stability, slip counter, BER under stress, margin width (bathtub). |
| Power/layout | Supply sensitivity & I/O constraints | Read recommended rails, filtering, AC-coupling, termination placement, and any “reference-less” caveats. | Correlate supply noise to jitter/BER: inject ripple (bounded) and confirm no lock instability or margin collapse. |
Tip: Any field without a matching validation step is “non-actionable” and should not drive the decision.
B) Concrete MPN examples (for datasheet lookup only)
- DS280DF810 (TI): 28Gbps multi-rate 8-channel retimer (reference-less option on some configs)
- DS125DF410 (TI): 9.8–12.5Gbps quad retimer with adaptive EQ / CDR / DFE
- DS125RT410 (TI): 9.8–12.5Gbps quad retimer with adaptive EQ + CDR
- DS110DF111 (TI): 8.5–11.3Gbps 2-channel retimer
- LMH1219RTWR (TI): 12G-SDI adaptive cable equalizer with integrated reclocker
- LMH1226 (TI): dual-output 12G UHD reclocker (video/SDI + 10GbE use cases)
- DS160PT801 (TI): PCIe® 4.0 protocol-aware retimer (16 GT/s, 8-lane/16-channel)
Use this class when “link training / deterministic behavior / platform compatibility” is a first-order requirement, not just eye opening.
- ADN2814ACPZ (Analog Devices): CDR for 10 Mb/s to 675 Mb/s (continuous-rate lock without external refclk)
- SY87701L (Microchip): AnyRate® CDR / data retiming up to 1.25 Gb/s NRZ
- GN2255 (Semtech): 50Gbps PAM4 Tri-Edge™ CDR (optical-module oriented integration)
- GN2044 (Semtech): integrated bi-directional CDR with laser-driver/limiting-amp building blocks (module use cases)
C) Decision gates (choose the device class first)
- Link semantics required? If training/compatibility/deterministic behavior is mandatory → protocol-aware retimer/PHY-class.
- Recovered clock as a deliverable? If a dedicated recovered clock/output interface is needed → standalone CDR or reclocker-class.
- Rate coverage & drift stress? Wide rate range + frequent SSC/ppm drift → prioritize hold-in/pull-in + SSC tolerance + slip observability.
- Multi-lane bonding risk? If lane-to-lane drift matters → deskew FIFO + marker alignment + slip counters become non-negotiable.
- EQ interaction controllability? If bring-up must be repeatable → require EQ bypass/telemetry and a stable tuning sequence.
- Production readiness? Prefer devices with lock states, counters, eye/margin monitors, loopback/PRBS features.
- Pass window: max BER, “slip=0” rule, allowed latency variation, temperature range.
- Stress set: max ppm offset, SSC depth/rate, worst-case channel loss, aggressor coupling condition.
- Observability: lock state granularity, slip counters, margin/eye monitors, accessible telemetry bus.
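The gates above can be written down as a small function so the same questions are asked for every project. This is a sketch only: the `LinkRequirements` fields are hypothetical names for the gate questions, and evaluating the link-semantics gate before the recovered-clock gate is an assumption about priority, not something the gates themselves define.

```python
from dataclasses import dataclass


@dataclass
class LinkRequirements:
    needs_link_semantics: bool = False        # training / compatibility / determinism
    needs_recovered_clock_output: bool = False
    wide_rate_or_ssc_drift: bool = False
    multi_lane_bonding: bool = False
    repeatable_bring_up: bool = False
    production_screening: bool = False


def apply_gates(req: LinkRequirements):
    """Return (device class or None, list of must-have fields) from the decision gates."""
    device_class = None
    if req.needs_link_semantics:
        device_class = "protocol-aware retimer/PHY-class"
    elif req.needs_recovered_clock_output:
        device_class = "standalone CDR or reclocker-class"

    must_have = []
    if req.wide_rate_or_ssc_drift:
        must_have += ["hold-in/pull-in range", "SSC tolerance", "slip observability"]
    if req.multi_lane_bonding:
        must_have += ["deskew FIFO", "marker alignment", "slip counters"]
    if req.repeatable_bring_up:
        must_have += ["EQ bypass/telemetry", "stable tuning sequence"]
    if req.production_screening:
        must_have += ["lock states", "counters", "eye/margin monitors",
                      "loopback/PRBS features"]
    return device_class, must_have


if __name__ == "__main__":
    req = LinkRequirements(needs_link_semantics=True, multi_lane_bonding=True)
    device_class, must_have = apply_gates(req)
    print(device_class)
    print(must_have)
```

A result of `None` for the class means neither of the first two gates forced a choice, in which case the must-have list and the field sheet drive the comparison.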
D) Validation plan mapping (fields → tests → logs)
| Test item | Stimulus / sweep | Logging | Pass criteria (placeholders) |
|---|---|---|---|
| Lock acquisition | Cold start, rate sweep, ppm sweep | lock time, lock state trace | lock within < X ms; no relock loops |
| JTOL | sine-PM sweep (freq, UIpp) | BER, slip counter | mask pass; slip=0; BER < target |
| SSC robustness | down-spread depth/rate sweep | slips, deskew status | no intermittent slips across corners |
| Latency determinism | power-cycle + temperature sweep | latency histogram | variation bounded to < X UI (or < X ns) |
Diagram rule: requirements choose the class; fields choose the part; validation proves the selection.
FAQs: CDR bring-up, jitter/SSC, EQ interaction, and multi-lane slips
This section closes long-tail debug questions without expanding scope: each item is a repeatable hypothesis loop with measurable checks and pass criteria.
Formatting contract: each answer remains a 4-line, measurable loop (cause → check → fix → pass) to keep scope tight and production-friendly.