SATA/SAS TxRx & Redriver: EQ, CDR, and OOB Compatibility
SATA/SAS redrivers and retimers keep storage links reliable by restoring electrical margin (eye opening, BER, OOB robustness) on the worst channel segments without changing the upper-layer protocol. The core discipline is measurable proof: choose an OOB-safe preset and a high-rate preset, verify both across cables, backplanes, and temperature, and ship with explicit pass/fail thresholds.
Definition & Scope: SATA/SAS TxRx, Redriver, Repeater, Retimer
This page covers signal-conditioning devices used in existing SATA/SAS physical links to restore usable eye margin (loss/ISI/jitter tolerance) while staying protocol-transparent to upper layers.
In scope:
- Channel impairments: insertion loss, ISI, reflections/return loss, crosstalk, jitter, supply-noise modulation.
- RX EQ knobs (CTLE/VGA/adaptation behavior) and TX conditioning (swing / pre-emphasis) when provided.
- Retimer/CDR behavior: latency, jitter transfer, lock/recover characteristics that impact stability.
- OOB / electrical-idle compatibility: detect thresholds, burst/gap integrity, pass-through behavior.
Out of scope:
- Upper-layer SATA/SAS protocol, frames/commands, register-level sequencing.
- Training specifics of other SerDes families (PCIe/CXL/JESD, etc.).
- Generic SI textbook expansion not tied to storage link bring-up and interoperability.
Redriver:
- Does: gain + EQ shaping (CTLE/VGA/limiting).
- Does not: CDR resampling.
- Risk: noise can rise; over-boost can worsen ringing/reflections.
Repeater:
- May include: detect/gating/threshold features plus EQ.
- Verify: idle/OOB bypass and threshold behavior for interoperability.
Retimer:
- Does: CDR lock + resample + re-transmit.
- Changes: latency and lock/recover behavior.
- Risk: OOB/idle transitions must be validated, not assumed.
SATA/SAS links include non-continuous phases (electrical idle and OOB bursts/gaps). A device that looks “better” under continuous patterns can still fail link-up or recovery if its detection thresholds shift or its conditioning reshapes these transitions.
Storage Link Topologies: Where These Parts Sit (Host/Backplane/Expander/Cable)
Placement is a margin-allocation problem: split the dominant-loss segment into smaller, verifiable pieces without breaking electrical-idle/OOB behavior. SATA commonly appears as point-to-point links; SAS often adds backplanes, expanders, and multi-hop paths—still treated here as physical segments.
- Localize loss/ISI: split the worst segment; do not let one segment consume the full budget.
- Control reflections: gain/EQ does not “fix” return loss; placement must respect discontinuities.
- Multi-hop discipline: each hop closes its own margin budget; avoid relying on one downstream “big fix”.
- Bring-up protection: validate idle/OOB detect behavior at the planned insertion point.
Symptom: intermittent link-up or instability that correlates with connector count.
Placement: split the reflection-dominant region; avoid placing gain where ringing dominates.
Gotcha: over-boost can amplify ringing and shift detect thresholds during idle/OOB transitions.
Symptom: eye collapse and BER sensitivity to temperature/voltage.
Placement: split the dominant-loss trace so each segment stays within an achievable EQ/retime envelope.
Gotcha: linear gain after excessive loss often raises noise floor without restoring transition sharpness.
Symptom: failures cluster on hot-plug/recovery or specific cable/backplane combinations.
Placement: treat each hop as a separate budget; insert conditioning at hop boundaries to prevent accumulation.
Gotcha: lock/recover behavior and idle/OOB handling become first-order interoperability risks.
Channel Impairments & Margin Budget (Eye, Jitter, Loss, Return Loss)
Storage links fail when margin is consumed by a small number of dominant physical impairments. The goal here is not “pretty scope shots”; it is a verifiable budget that predicts whether link-up, recovery, and steady-state BER remain stable across worst-case conditions.
- Insertion loss (IL): high-frequency attenuation slows edges → more ISI; long backplanes/cables and high connector counts amplify it.
- Return loss (RL) / reflections: ringing and double edges reduce sampling margin and can corrupt detect thresholds during idle/OOB transitions.
- ISI (history dependence): eye closes because current bits depend on prior bits; EQ may help, but only within a bounded range.
- Crosstalk: aggressor activity collapses victim margin; often appears as “only fails in chassis / full population”.
- Jitter (RJ/DJ): timing uncertainty reduces eye width; recovery/lock events can become first-order failure triggers.
- Supply noise modulation: AM/PM injection shifts amplitude/edge timing; correlates with load steps and platform power domains.
Treat margin as a running balance: TX margin + EQ benefit − noise lift − reflection residual − crosstalk penalty − jitter increment → RX usable margin.
- Identify the dominant consumer: loss/ISI vs reflections vs jitter vs crosstalk (do not average them into “eye looks bad”).
- Apply the cheapest corrective action first: placement + termination + EQ knob selection, before adding complexity.
- Validate in two regimes: (a) continuous data margin and (b) storage-specific phases (idle/OOB/recovery/link-up success rate).
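The running balance above can be sketched as a small bookkeeping helper. This is only an illustration: the class name `MarginBudget`, its field names, and the choice to express every term in one common unit (assumed here to be mV of equivalent eye height) are assumptions, and the numbers are made up rather than measured.

```python
from dataclasses import dataclass


@dataclass
class MarginBudget:
    """Running margin balance; all terms share one unit (assumed: mV)."""
    tx_margin: float
    eq_benefit: float
    noise_lift: float
    reflection_residual: float
    crosstalk_penalty: float
    jitter_increment: float

    def consumers(self) -> dict:
        # Terms that subtract from the budget, kept separate so the
        # dominant one stays visible instead of being averaged away.
        return {
            "noise_lift": self.noise_lift,
            "reflection_residual": self.reflection_residual,
            "crosstalk_penalty": self.crosstalk_penalty,
            "jitter_increment": self.jitter_increment,
        }

    def rx_usable_margin(self) -> float:
        # TX margin + EQ benefit - (sum of all consumers)
        return self.tx_margin + self.eq_benefit - sum(self.consumers().values())

    def dominant_consumer(self) -> str:
        consumers = self.consumers()
        return max(consumers, key=consumers.get)


# Illustrative numbers only, not measurements.
budget = MarginBudget(400.0, 120.0, 60.0, 90.0, 40.0, 30.0)
print(budget.rx_usable_margin(), budget.dominant_consumer())
```

Keeping the consumers as named terms makes the "identify the dominant consumer" step a lookup rather than a judgment call from a scope screenshot.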
A clean eye under a continuous pattern does not guarantee stable link-up, recovery, or OOB/idle transitions. Pass criteria must include BER and success-rate metrics across the non-continuous phases.
Large eye height can coexist with poor eye width if jitter grows. Always include a timing view (bathtub or width proxy) before concluding that margin is adequate.
Scope improvements can be misleading when noise floor rises or reflections remain. Close the loop with BER (or error counters) under worst-case activity.
Many failures are population- and platform-dependent (crosstalk, PSU modulation, hot-plug/recovery). Validate across temperature/voltage and realistic lane activity.
- Eye height @ RX: ≥ [X]
- Eye width / timing margin @ RX: ≥ [X]
- BER target: ≤ [Y] (spec / system requirement)
- Link-up / recovery success rate: ≥ [Z]% across worst-case conditions (include idle/OOB events)
Architecture Deep Dive: Equalizer Path + CDR Path (What Changes, What Doesn’t)
Redrivers and retimers can both “improve the eye,” but they do it through fundamentally different paths. The decision should be driven by what is dominant: loss/ISI shaping vs timing recovery and jitter isolation—while keeping storage link behavior (idle/OOB/recovery) intact.
- Analog EQ path (redriver): shapes frequency response (CTLE/VGA/limiting). It can lift noise and, because it does not resample, cannot remove upstream timing jitter.
- CDR re-timed path (retimer): recovers clock, resamples data, and re-transmits. It changes latency and lock/recover behavior, and alters jitter transfer.
Redriver typically adds low, stable latency. Retimer adds meaningful latency and introduces lock/recover timing that must be acceptable to the platform.
Redriver reshapes edges but does not resample timing; upstream jitter still propagates. Retimer can isolate some upstream jitter, but recovery robustness becomes critical.
Linear gain/EQ can lift the noise floor (especially under high boost). Retimers change noise/jitter shaping after resampling, but PSU/ground injection still matters.
Any device must preserve idle/OOB detection behavior. Retimers have higher risk because lock/recover and pass-through behavior can affect link-up and recovery phases.
Before optimizing EQ presets, validate that the device does not break electrical-idle / OOB burst/gap behavior and that recovery remains stable. This becomes the primary interoperability gate for SATA/SAS links.
RX Equalization: CTLE/VGA/DFE and Adaptation Traps
RX equalization primarily compensates high-frequency attenuation and ISI so that the receiver can recover a stable sampling margin. In storage links, improvement under continuous patterns must be validated against link-up/recovery phases because overly aggressive EQ can distort electrical-idle/OOB detection behavior.
- High-frequency loss: edges slow down, eye closes, and ISI increases.
- ISI (history dependence): current bits depend on prior bits; EQ attempts to reshape the effective channel response.
- What EQ does not “solve”: reflections/return loss driven by discontinuities; gain/boost can make ringing more visible.
CTLE:
- Effect: boosts high-frequency content to counter loss and reduce apparent ISI.
- Cost: lifts noise floor; can amplify ringing when reflections dominate.
- Verify: BER/error counters improve, not just the eye opening.
VGA:
- Effect: restores usable swing at the front-end input.
- Cost: amplifies signal + noise together; too much gain can push stages toward compression.
- Verify: stability across temperature/voltage and across different cables/backplanes.
DFE:
- Effect: cancels specific ISI components that remain after linear EQ.
- Cost: can be sensitive to noise/crosstalk and may create bursty error behavior if adaptation is unstable.
- Verify: error statistics stay stable (no “clusters”) and recovery events remain robust.
- Over-boost trap: eye appears wider, but BER does not improve because noise/ringing dominates the decision point.
- Reflection-dominant trap: more boost increases overshoot/ringing; discontinuities require physical fixes (layout/termination/placement).
- DFE burst trap: noise/crosstalk triggers wrong decisions; feedback can turn isolated errors into clusters.
- Non-continuous phase trap: EQ/detect behavior that passes continuous patterns may distort idle/OOB transitions (validate with the checks in the OOB & Electrical Idle Compatibility section).
- Baseline: default settings; record BER/error counters and note any recovery/link-up issues.
- CTLE coarse sweep: move in a few discrete steps to find the “BER improves” region (not just bigger eye).
- VGA adjust: restore usable amplitude without pushing the front-end into compression.
- Re-check statistics: confirm error behavior is stable (no clusters) across time and activity.
- DFE only if needed: start conservative; stop if error clustering or sensitivity increases.
- Storage guardrail: replay idle/OOB/recovery events to ensure the tuned EQ does not break detection behavior.
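The tuning steps above end with a joint decision: a preset only counts if it meets the BER target and keeps OOB detection intact. A minimal sketch of that selection follows. The names `PresetResult` and `pick_lowest_passing_preset` are hypothetical, and it assumes presets are indexed from least to most boost, which a real device may not guarantee.

```python
from typing import NamedTuple, Optional, Sequence


class PresetResult(NamedTuple):
    preset_id: int        # assumed ordering: lower id = less boost
    ber: float            # measured BER (or error-counter proxy)
    oob_pass_rate: float  # fraction of OOB/idle/recovery events that passed


def pick_lowest_passing_preset(
    results: Sequence[PresetResult], ber_limit: float, oob_min: float
) -> Optional[PresetResult]:
    """Return the least-boost preset meeting BER and OOB limits together."""
    passing = [
        r for r in results
        if r.ber <= ber_limit and r.oob_pass_rate >= oob_min
    ]
    if not passing:
        return None  # no stable window: revisit placement/termination first
    return min(passing, key=lambda r: r.preset_id)


# Illustrative sweep data only.
sweep = [
    PresetResult(0, 1e-9, 0.999),   # too little EQ: BER fails
    PresetResult(1, 1e-13, 0.999),  # passes both gates
    PresetResult(2, 1e-14, 0.95),   # better BER, but OOB degraded
]
choice = pick_lowest_passing_preset(sweep, ber_limit=1e-12, oob_min=0.99)
print(choice)
```

Preferring the lowest passing preset reflects the guidance to avoid over-boost: the preset with the best BER in the sweep is rejected here because it degrades OOB behavior.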
TX Conditioning: Swing, Pre-emphasis/De-emphasis, and Spectral Side Effects
TX conditioning compensates channel loss by shaping the transmitted spectrum. Swing and pre/de-emphasis interact with RX equalization as a coupled system: more high-frequency energy can improve edge integrity but also increases EMI risk, overshoot/ringing, and crosstalk sensitivity.
- Swing: overall amplitude. Helps eye height, but can increase power, crosstalk, and risk of RX compression.
- Pre-emphasis / de-emphasis: redistributes energy around edges to counter high-frequency loss and ISI.
- Coupling with RX EQ: TX emphasis + RX CTLE both lift high-frequency content; excessive combined boost often causes instability.
- EMI: more high-frequency content increases radiated/conducted emission risk.
- Overshoot / ringing: reflections become more visible; return-loss issues can dominate.
- Crosstalk: faster edges couple more energy into adjacent lanes/traces.
- Interop risk: overly sharp edges can impact detect thresholds and recovery stability (validate with idle/OOB tests).
Start with swing and ensure RX is not saturating. Use emphasis only after confirming amplitude is not the primary limiter.
Try pre-emphasis (or a modest RX CTLE step) to sharpen edges. Stop if ringing/noise grows faster than BER improves.
Reduce emphasis and/or swing first, then recover margin with the minimum RX EQ required. Avoid “max TX + max RX” combinations.
Suspect crosstalk and platform noise. Back off high-frequency energy (emphasis) and re-validate recovery and idle/OOB behavior.
Jitter & CDR: Tolerance, Transfer, and Why “Cleaner” Can Still Fail
A waveform that looks “cleaner” on a scope does not automatically mean the link is more stable. CDR loop bandwidth determines what jitter is tracked, what is filtered, and where jitter peaking can occur. Storage links must remain robust not only in steady streaming data, but also during idle, recovery, and burst-like phases that stress lock and re-lock behavior.
- Low-frequency tracking: wider BW tends to follow slow wander more closely; narrower BW rejects more LF movement but may become fragile during events.
- High-frequency filtering: narrower BW can reduce HF jitter on recovered timing, but may increase sensitivity to burst/recovery transitions.
- Jitter peaking risk: near the bandwidth corner, transfer can peak; scope “cleanliness” may improve while BER/lock margin worsens.
- Jitter generation: device power/ground noise can modulate timing even if transfer looks “better” in one measurement view.
- Output timing jitter: RMS/TJ with a defined integration window (placeholders: [BW1]…[BW2]).
- Recovered clock stability: lock / re-lock behavior, re-lock time (placeholder: ≤ [T]), and any frequency/phase discontinuities around events.
- BER or error counters: statistics over time windows (placeholder: ≥ [N] seconds) and across realistic activity patterns.
- PVT sensitivity: BER/lock margin trends vs temperature/voltage (placeholders: [Temp] / [V]).
- Baseline: record BER/error counters and re-lock behavior using default settings and a fixed observation window.
- Separate regimes: evaluate steady data and event phases (idle/recovery/burst transitions) independently.
- Sweep loop BW (or equivalent knob): find the stability region where BER and re-lock metrics both improve.
- Stress sensitivity: introduce controlled disturbances (for example, periodic jitter injection or PSU ripple coupling) to locate fragile frequency bands.
- PVT check: repeat a reduced sweep across temperature/voltage corners to confirm the “best point” generalizes.
- Storage guardrail: re-run idle/recovery/OOB-facing behaviors after tuning to ensure compatibility is not degraded.
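The jitter-peaking risk mentioned earlier in this section can be illustrated with a textbook model. The sketch below assumes a generic second-order loop with one zero, whose jitter transfer is H(s) = (2ζω_n s + ω_n²) / (s² + 2ζω_n s + ω_n²). The parameters `fn_hz` (natural frequency) and `zeta` (damping factor) are assumptions for illustration and do not describe any specific retimer.

```python
import math


def jitter_transfer_mag(f_hz: float, fn_hz: float, zeta: float) -> float:
    """|H(j*2*pi*f)| for the assumed second-order loop with one zero."""
    x = f_hz / fn_hz  # frequency normalized to the natural frequency
    num = 1.0 + (2.0 * zeta * x) ** 2
    den = (1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2
    return math.sqrt(num / den)


def peaking_db(fn_hz: float, zeta: float, points: int = 2000) -> float:
    """Largest transfer gain (dB) over a log sweep from 0.01*fn to 100*fn."""
    peak = max(
        jitter_transfer_mag(fn_hz * 10 ** (-2 + 4 * i / (points - 1)), fn_hz, zeta)
        for i in range(points)
    )
    return 20.0 * math.log10(peak)


for zeta in (0.5, 1.0, 2.0):
    print(f"zeta={zeta}: peaking = {peaking_db(1e6, zeta):.2f} dB")
```

In this model, jitter well below the loop bandwidth passes through at unity gain, jitter well above it is attenuated, and the region near the corner is amplified by an amount that grows as damping falls. That amplified band is where a "cleaner" output can coincide with reduced BER or lock margin.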
OOB & Electrical Idle Compatibility: COMRESET/COMINIT/COMWAKE (and SAS Considerations)
In storage links, OOB and electrical-idle behaviors are a primary interoperability gate. SATA uses COMRESET, COMINIT, and COMWAKE sequences for initialization and wake-up handshakes. Devices placed in the channel must not over-filter, over-EQ, or re-shape these burst-and-gap patterns in a way that causes false triggers or missed detections. SAS also includes non-continuous sequences around reset/initialization that require compatible detect and pass-through behavior.
- OOB is burst + gap detection: the receiver detects specific burst patterns separated by gaps and idle periods.
- Common failure mode: “data looks fine” but OOB fails because analog shaping changes the detect envelope and thresholds.
- Retimer sensitivity: lock/recovery policies can interact with non-continuous phases; validate behavior during events.
- SAS note: this page covers SAS only at the level of compatibility checks and troubleshooting; it does not reproduce the SAS PHY timing tables.
- Idle / wake detect gate: confirm electrical-idle detection and wake detect are consistent (no intermittent behavior).
- Burst/gap integrity gate: check whether burst envelope or gap timing is re-shaped by EQ, filtering, or gain (look for compression, smear, or clipping).
- Reflection/termination gate: test sensitivity to connectors/cables/backplane variants; strong sensitivity often points to return-loss and ringing causing false detect or miss-detect.
- OOB pass-through mode: supported (Y/N) + conditions (placeholder).
- Electrical idle detect: supported (Y/N) + threshold/gating knobs (placeholder).
- OOB-safe EQ policy: bypass / conservative / auto (placeholder).
- Detect observability: status pins / counters / logs for OOB detect (placeholder).
- Recovery robustness: event success rate ≥ [Z]% (placeholder).
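The burst/gap integrity gate above comes down to comparing burst and gap durations before and after the conditioner. The sketch below assumes the OOB envelope has already been captured as uniformly sampled amplitude values. The function names, the threshold, and the sample period are hypothetical, and no specification timing limits are encoded here.

```python
from typing import List, Sequence, Tuple


def run_lengths(
    envelope: Sequence[float], threshold: float, dt_ns: float
) -> List[Tuple[bool, float]]:
    """Split an envelope into (is_burst, duration_ns) runs."""
    runs: List[list] = []
    for value in envelope:
        active = value >= threshold
        if runs and runs[-1][0] == active:
            runs[-1][1] += dt_ns           # extend the current burst or gap
        else:
            runs.append([active, dt_ns])   # start a new burst or gap
    return [(active, duration) for active, duration in runs]


def max_timing_shift_ns(
    before: Sequence[Tuple[bool, float]], after: Sequence[Tuple[bool, float]]
) -> float:
    """Largest change in any burst/gap duration between two captures."""
    if [a for a, _ in before] != [a for a, _ in after]:
        raise ValueError("burst/gap sequence differs (missing or extra burst)")
    return max(abs(d1 - d2) for (_, d1), (_, d2) in zip(before, after))


# Illustrative envelopes (arbitrary units), sampled every 10 ns.
pre = run_lengths([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0], threshold=0.5, dt_ns=10.0)
post = run_lengths([0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], threshold=0.5, dt_ns=10.0)
print(max_timing_shift_ns(pre, post))
```

A burst that smears into the following gap shows up as a non-zero shift. A mismatch in the burst/gap sequence itself is raised as an error because it points to a missed or false detect rather than a timing change.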
Power, Clocking, Reset, and Sideband Controls (Practical Integration)
Integration stability is often determined by power integrity, reset/enable sequencing, and configuration paths (strap, I²C, EEPROM). In storage systems, hot-plug and backplane power ordering can introduce real disturbances that translate into timing jitter, amplitude noise, false detects, or intermittent initialization failures.
- Rail ripple / ground bounce → gain/threshold variation → amplitude noise and margin loss.
- Supply noise coupling → timing modulation → added jitter, lock sensitivity, or re-lock failures.
- Brownout during hot-plug → undefined state → wrong strap capture / partial EEPROM load / mis-detect behavior.
- Key verification: correlate event logs with rails, reset timing, and status pins (not only scope waveforms).
Strap configuration:
- Strap sampling window: ensure rails stable before capture (placeholder: ≥ [T]).
- Pull-up/down sizing and tolerance: avoid marginal logic levels (placeholder: [R] range).
- Default mode must be “safe” for initialization events and detection (platform gate).
I²C configuration:
- Record a “register snapshot” after boot (placeholder: cfg_rev / preset_id).
- Apply settings in a stable order: enable detect → set mode → then EQ.
- Avoid abrupt changes during active links unless the part supports hitless updates.
EEPROM presets:
- Define the “factory safe preset” and version it (placeholder: preset_id).
- Guard against partial reads during brownout/hot-plug (platform reset policy).
- Keep an escape path: force-safe strap or recovery command (placeholder).
Fault handling and observability:
- Define behavior on fault: disable-drive / bypass / safe-EQ (placeholder policy).
- Use status pins/counters: LOS/detect/lock/fail and event counts.
- Hot-plug: log re-lock time and failure signatures (placeholder: ≤ [T]).
rail_ok_ts · rst_deassert_ts · enable_ts · strap_mode · cfg_rev · eq_preset · los_cnt · lock_cnt · relock_ms · oob_fail_cnt · temp · vin
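One way to keep these fields consistent across stations is a fixed record schema. The sketch below uses the field names listed above. The Python types and units (seconds for timestamps, milliseconds for `relock_ms`), the class name `LinkEventRecord`, and the helper `to_csv` are assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class LinkEventRecord:
    rail_ok_ts: float        # assumed: seconds since power-on
    rst_deassert_ts: float   # assumed: seconds since power-on
    enable_ts: float         # assumed: seconds since power-on
    strap_mode: str
    cfg_rev: str
    eq_preset: int
    los_cnt: int
    lock_cnt: int
    relock_ms: float
    oob_fail_cnt: int
    temp: float              # assumed: degrees C
    vin: float               # assumed: volts

    def reset_after_rails_s(self) -> float:
        """Delay between rails-stable and reset deassert."""
        return self.rst_deassert_ts - self.rail_ok_ts


def to_csv(records: Iterable[LinkEventRecord]) -> str:
    """Serialize records with a header row in schema order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(LinkEventRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Illustrative values only.
rec = LinkEventRecord(0.010, 0.012, 0.015, "safe", "r3", 2, 0, 1, 4.5, 0, 41.0, 3.3)
print(to_csv([rec]))
```

Deriving the reset-after-rails delay from logged timestamps lets the sequencing criterion be checked from the same record that carries the error counters.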
Board/Backplane/Cable Co-design: Placement, Routing, Return Path, ESD/EMI
Storage link stability is strongly tied to placement and return-path continuity. The most effective device location is typically the one that splits the worst channel segment, reduces cumulative discontinuities, and preserves predictable impedance. Protection components (TVS, CM chokes) are treated here only as principles and risk checks, not as a device selection guide.
- Why: splitting the loss/return-loss peak increases recoverable margin.
- Quick check: mark the highest discontinuity region (connector/via field) and measure distance to device.
- Why: plane splits force return current detours, increasing ringing, EMI, and error sensitivity.
- Quick check: ensure the differential pair does not cross gaps or reference swaps without stitching.
- Why: stubs create resonances and sharp return-loss spikes that break links at specific rates.
- Quick check: identify unused via barrels; compare failure rate vs slots/lanes with larger stubs.
- Why: imbalance converts to common-mode and increases crosstalk and EMI.
- Quick check: review neck-down, via transitions, and connector escape zones for asymmetry.
- Why: extra capacitance and imbalance can close the eye or distort detect behavior.
- Quick check: validate insertion loss/return loss impact and lane-to-lane matching before freezing the BOM.
Engineering checklist: Design → Bring-up → Production (Pass criteria placeholders)
The purpose of this section is to turn signal-integrity knowledge into a repeatable, auditable workflow. Every checkpoint includes a measurable threshold placeholder and the evidence that must be saved.
- Pass criteria placeholders: BER ≤ [BER], Eye @ Rx ≥ [EH]/[EW], OOB pass rate ≥ [X]%, Link-up time ≤ [T] ms, Relock ≤ [T2] ms.
- Evidence types to archive: scope screenshots, eye/bathtub exports, BERT logs, temperature/voltage logs, fixture/cable revision, and register dumps (I²C/strap snapshots).
- MPN rule: part numbers below are examples; always verify exact suffix/package/temp grade/availability before freezing BOM.
Design (prevent non-fixable failures): channel budget, placement, power/clock hygiene, OOB-safe defaults, and observability.
Pass criteria: IL@Nyquist ≤ [IL], RL ≥ [RL], margin ≥ [M].
Evidence: link budget sheet + measured S-parameters (or validated stack-up surrogate).
Pass criteria: OOB pass rate ≥ [X]% across cable/backplane variance.
Evidence: OOB waveform captures + device mode/config snapshot.
Pass criteria: post-device eye opening improvement ≥ [ΔE] with stable OOB.
Evidence: topology map + before/after eye/BERT result.
Pass criteria: supply ripple ≤ [Vpp], rail droop at events ≤ [ΔV].
Evidence: rail probe capture synchronized with link errors (time correlation).
Pass criteria: config readback matches golden image (CRC/bytewise) = [OK].
Evidence: I²C transaction log + register dump + firmware/script revision.
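The "readback matches golden image (CRC/bytewise)" check can be sketched as follows. It assumes the register dump has already been read into a `bytes` object; the I²C access itself is platform-specific and not shown. CRC-32 via Python's `zlib.crc32` is one possible choice rather than a requirement of any device, and the function names are hypothetical.

```python
import zlib
from typing import Optional


def config_crc32(dump: bytes) -> int:
    """CRC-32 of a register dump, as an unsigned 32-bit value."""
    return zlib.crc32(dump) & 0xFFFFFFFF


def first_mismatch(readback: bytes, golden: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if identical."""
    for offset, (got, want) in enumerate(zip(readback, golden)):
        if got != want:
            return offset
    if len(readback) != len(golden):
        return min(len(readback), len(golden))  # one dump is truncated
    return None


def readback_ok(readback: bytes, golden: bytes) -> bool:
    # CRC gives a compact value to log; the bytewise pass locates the fault.
    return (
        config_crc32(readback) == config_crc32(golden)
        and first_mismatch(readback, golden) is None
    )


# Illustrative register images only.
golden_image = bytes([0x01, 0x22, 0x3C, 0x00])
partial_load = bytes([0x01, 0x22, 0x00, 0x00])
print(readback_ok(golden_image, golden_image), first_mismatch(partial_load, golden_image))
```

Logging both the CRC and the first mismatching offset makes a partial EEPROM load after a brownout distinguishable from a wrong preset revision.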
Example MPNs: TI SN75LVCP601 (SATA 6Gb/s, 2ch), TI SN75LVCP600S (SATA/SAS 6Gb/s, 1ch), Diodes PI3EQX6741ST (SATA 6Gb/s), Diodes PI3EQX12902A/PI3EQX12908A (SAS3 12Gb/s).
Pass criteria: datasheet limits satisfied and OOB validation plan exists.
Evidence: approved vendor list + validated config matrix.
Bring-up (make link-up repeatable and event-stable): the goal is not “pretty scope pictures” but stable initialization, OOB compliance, and repeatable timing.
Pass criteria: reset deassert after rails stable by ≥ [Δt]; mode readback = [expected].
Evidence: timing capture + register dump.
Pass criteria: BER ≤ [BER] for at least [N] seconds per preset.
Evidence: BERT log + preset map.
Pass criteria: OOB first-try pass ≥ [X]%; link-up time ≤ [T] ms.
Evidence: OOB captures + pass-rate log (timestamped).
Pass criteria: BER ≤ [BER] and OOB pass ≥ [X]% at the same preset.
Evidence: preset table + BER/OOB joint report.
Pass criteria: relock ≤ [T2] ms; burst error count ≤ [E].
Evidence: event script log + error counter snapshots.
Production (station correlation and traceability): prevent “passes on ATE but fails in system” by controlling fixtures, scripts, and logs.
Pass criteria: delta(BER) ≤ [ΔBER]; delta(link-up) ≤ [ΔT].
Evidence: correlation report + station configuration snapshot.
Pass criteria: fixture return loss degradation ≤ [ΔRL] over [N] cycles.
Evidence: fixture serial + revision + maintenance log.
Pass criteria: preset CRC matches golden = [OK].
Evidence: golden config file + script revision + register dump.
Pass criteria: required-field completeness ≥ [C]%.
Evidence: raw logs + schema version + audit sample.
Applications & IC selection notes
This section stays within storage fabrics only (SATA/SAS/STP). It provides application buckets and a selection matrix.
Strategy: place the redriver close to the dominant discontinuity; keep defaults OOB-safe.
Top risk: over-boost that improves data eye but distorts burst/gap detect.
Validation: OOB pass rate ≥ [X]% + link-up ≤ [T] ms across cable lots.
Strategy: split the worst hop and control reflection peaks; tune per-hop EQ windows.
Top risk: “cleaner output” still fails due to CDR/equalization interaction and event behavior.
Validation: segmented BERT + relock ≤ [T2] ms + corner coverage (T/V).
Strategy: prioritize OOB/idle compatibility verification; test combination matrix early.
Top risk: threshold/shape mismatch amplified across vendors.
Validation: combo matrix (controller/drive/cable) with OOB and link-up metrics logged.
Why it matters: insufficient bandwidth forces over-EQ and collapses OOB/idle robustness.
Pass criteria: supports ≥ [rate] with validated margin.
Trade-off: higher-rate parts typically increase power and tuning complexity.
Why it matters: CDR retiming can isolate jitter, but can introduce different event/latency behaviors.
Pass criteria: choose “linear-only” or “CDR path” based on BER/OOB joint success across corners.
Trade-off: retiming adds latency and may require deeper validation for event states.
Why it matters: insufficient range forces “too much boost” that amplifies noise and reflections.
Pass criteria: stable window exists where BER ≤ [BER] and OOB pass ≥ [X]%.
Trade-off: more knobs require controlled presets + production logging.
Why it matters: a device can “open the data eye” yet break initialization by reshaping burst/gap.
Pass criteria: OOB first-try pass ≥ [X]% across cables/slots + stable link-up ≤ [T] ms.
Trade-off: safest presets may not maximize eye; prioritize system success over scope cosmetics.
Why it matters: storage backplanes are airflow-limited; thermal drift can show as BER spread.
Pass criteria: junction headroom ≥ [ΔTj] at worst ambient.
Trade-off: higher-channel parts reduce BOM but increase hotspot density.
- TI SN75LVCP601 — 2-channel SATA 6Gb/s redriver (e.g., SN75LVCP601RTJR is a common orderable code; confirm reel/suffix).
- TI SN75LVCP600S — single-channel SATA/SAS up to 6Gb/s (useful when only one direction/lane needs conditioning).
- Diodes/Pericom PI3EQX6741ST — SATA Gen3 6Gb/s ReDriver with programmable EQ and output emphasis.
- TI SN75LVCP412A — SATA up to 3Gb/s (legacy or lower-rate segments).
- ADI/Maxim MAX4951 — SATA bidirectional redriver (commonly used near eSATA connector paths).
- Diodes PI3EQX12902A — 12Gb/s 2-channel linear ReDriver (SAS3/10GE/PCIe3/SATA3 combo electrical layer).
- Diodes PI3EQX12908A — 12Gb/s 8-channel linear ReDriver (high-lane-count backplane conditioning).
- Diodes PI3EQX32904 — 32Gb/s 4-channel linear ReDriver (family targeting PCIe 5.0 / SAS-4-class electrical layers).
- (Optional shortlist expansion) Diodes PI3EQX32908E — 32Gb/s 8-channel linear ReDriver family (confirm feature set per revision).
FAQs (storage-only): SATA/SAS TxRx, Redriver/Retimer, OOB/Idle robustness
These FAQs are intentionally narrow: electrical-layer margin, OOB/idle compatibility, EQ/CDR behavior, and production correlation. Every answer is actionable and includes measurable pass-criteria placeholders.
Eye @ Rx ≥ [EH]/[EW] · supply droop ΔV ≤ [ΔV] · RL ≥ [RL] · station delta ≤ [Δ]
Link is OK in lab, but OOB intermittently fails on the backplane: suspect “burst/gap reshaped” or “reflection false-trigger” first? (OOB / idle / discontinuity triage)
Quick check: capture OOB at the receiver side (before/after the conditioner if test points exist): compare burst amplitude/width and gap timing vs a known-good lab setup; repeat with “minimal EQ / OOB-safe preset”. If failures correlate with connector/via fields, run quick TDR/return-loss check (look for large reflection peaks).
Fix: prioritize an OOB-safe preset (reduced high-frequency boost, adjusted squelch/idle-detect thresholds, aggressive limiting disabled); then reduce reflection sources (termination tuning, damping, stub control, placement nearer the dominant discontinuity).
Pass criteria: OOB first-try pass ≥ [X]% across slots/cables; link_up_time P95 ≤ [T] ms; false-wake / missed-detect counters ≤ [E].
Eye looks larger after adding a redriver, but BER gets worse: noise amplification or over-EQ? (“Pretty eye” ≠ better system margin)
Quick check: sweep EQ presets and log BER + eye/bathtub at the same measurement point. If BER worsens monotonically with boost while eye height improves, suspect noise amplification/peaking. If BER worsens only at certain presets, suspect resonance/over-boost interacting with reflections (look for ringing/overshoot).
Fix: back off CTLE/gain to the lowest preset that meets BER; then coordinate Tx pre-emphasis/de-emphasis (small steps) instead of forcing Rx boost. If ringing dominates, address RL/stubs/placement before adding more EQ.
Pass criteria: BER ≤ [BER] for ≥ [D] seconds per preset; eye @ Rx ≥ [EH]/[EW]; no BER regression when switching from “lab cable” to “worst backplane path”.
Retimer output jitter is smaller, yet link-up becomes harder: which step is most suspicious? (CDR lock vs idle/OOB recovery)
Quick check: log retimer lock/status pins/registers vs time during link-up attempts; capture link-up time distribution (P50/P95) and correlate failures to “CDR not locked” vs “OOB not detected” events. Try a retimer preset that favors faster acquisition (if available) and compare OOB pass rate.
Fix: select a preset that prioritizes acquisition/idle detect robustness; ensure reset/enable timing and reference/power stability are not violating acquisition windows. If failures correlate to idle detect gating, relax squelch thresholds or enable OOB pass-through mode (if supported).
Pass criteria: OOB first-try pass ≥ [X]%; link_up_time P95 ≤ [T] ms; relock ≤ [T2] ms after controlled disturbance; lock-fail counters ≤ [E].
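The link-up distribution metrics used in these pass criteria (first-try pass rate, P50, P95) can be computed from a simple attempt log. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions. The function names and the `(ok, link_up_ms)` tuple layout are assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with >= pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_link_up(
    attempts: Sequence[Tuple[bool, float]]
) -> Dict[str, Optional[float]]:
    """attempts: one (ok, link_up_ms) entry per first-try link-up attempt."""
    ok_times = [ms for ok, ms in attempts if ok]
    return {
        "first_try_pass_rate": len(ok_times) / len(attempts),
        "p50_ms": percentile_nearest_rank(ok_times, 50) if ok_times else None,
        "p95_ms": percentile_nearest_rank(ok_times, 95) if ok_times else None,
    }


# Illustrative log: 20 successful attempts (1..20 ms) and 5 failures.
log = [(True, float(ms)) for ms in range(1, 21)] + [(False, 0.0)] * 5
print(summarize_link_up(log))
```

Reporting P95 alongside P50 exposes the slow tail that a mean would hide, and intermittent lock or detect problems tend to appear in that tail first.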
Failures happen only on hot-plug: power droop or signal-detect threshold? (Event-state robustness)
Quick check: simultaneously capture supply rails at the conditioner and a detect/status pin/counter during hot-plug. If failures correlate with ΔV spikes, suspect power integrity. If rails are stable but detect toggles rapidly, suspect threshold chatter (hysteresis or debounce needed).
Fix: add hot-plug-friendly power buffering (bulk + local decoupling, soft-start where applicable) and tighten return-path continuity. Then adjust detect/squelch hysteresis or debounce logic; ensure reset/enable sequence after hot-plug is deterministic.
Pass criteria: supply droop ΔV ≤ [ΔV] during hot-plug; relock ≤ [T2] ms; hot-plug success rate ≥ [X]% across [N] cycles with no lingering error-counter growth.
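The chatter case can be quantified from a sampled detect/status pin. The sketch below is a software model only: it counts toggles in the raw samples and applies a consecutive-sample debounce. The function names and the `stable_count` parameter are hypothetical, and a real design might implement the equivalent in hardware hysteresis or firmware instead.

```python
from typing import List, Sequence


def count_toggles(samples: Sequence[int]) -> int:
    """Number of level changes in a sampled binary signal."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def debounce(samples: Sequence[int], stable_count: int) -> List[int]:
    """Output changes only after the input holds a new level for
    stable_count consecutive samples."""
    if not samples:
        return []
    state = samples[0]
    run = 0  # consecutive samples that disagree with the current state
    out = []
    for sample in samples:
        if sample != state:
            run += 1
            if run >= stable_count:
                state = sample
                run = 0
        else:
            run = 0
        out.append(state)
    return out


# Illustrative hot-plug capture: the detect pin chatters before settling high.
raw = [0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
clean = debounce(raw, stable_count=3)
print(count_toggles(raw), count_toggles(clean))
```

A large gap between raw and debounced toggle counts while the rails stay flat supports the threshold/chatter diagnosis over a power-integrity one.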
SATA links up, but SAS negotiation is unstable at one speed grade: how to capture the rate-switch transient? (Evidence-first speed-change debug)
Quick check: instrument the system to timestamp negotiation/state changes and error counters; trigger scope on rate-switch event (via sideband/status pin if available) and record eye/edge behavior around the transition. Compare “fixed-rate forced” vs “auto-negotiate” behaviors to isolate transition sensitivity.
Fix: define per-rate presets (Rx EQ + Tx emphasis) with a controlled switch sequence; avoid aggressive boost near the transition if it destabilizes detect/lock. Verify that the selected device truly supports the target SAS speed grade with margin in the worst path.
Pass criteria: negotiation success ≥ [X]% at the target rate across [N] cycles; BER ≤ [BER] after stabilization time ≤ [Ts]; link_up_time P95 ≤ [T] ms.
A certain board lot shows intermittent “drive drop”: what are the first 3 correlation fields to log? (Production correlation hygiene)
Quick check: log at minimum: (1) preset ID + config CRC/register dump, (2) fixture/cable/connector revision (and usage cycles), (3) temperature + key rails during the event. Correlate failures by these 3 fields before changing hardware.
Fix: freeze and version presets/scripts; lock fixtures and track lifetime; add event-triggered logging for rails/temp and error counters. Only then chase layout/placement deltas if correlation points to SI variance.
Pass criteria: required-field completeness ≥ [C]%; station delta ≤ [Δ]; intermittent drop rate ≤ [F] ppm over [H] hours soak.
The issue disappears after swapping the cable: insertion-loss dominated or return-loss/reflection dominated? (Fast IL vs RL discriminator)
Quick check: A/B two cables with similar length but different construction; if BER improves mainly with higher-quality cable while reflections/ringing reduce in time-domain capture, suspect RL. If BER scales mostly with length and responds predictably to EQ, suspect IL. A quick TDR/return-loss check is the fastest confirmation.
Fix: IL-dominated: optimize EQ window and consider moving the conditioner closer to the dominant-loss segment. RL-dominated: reduce discontinuities (connector/via stubs), tune termination/damping, and avoid over-boost that amplifies reflection artifacts.
Pass criteria: RL ≥ [RL] (where measurable) and BER ≤ [BER] on the worst path; link_up_time P95 ≤ [T] ms across cable lots.
Stability collapses when connector count increases: change placement first, or Tx pre-emphasis first? (“One knob at a time” decision)
Quick check: do an A/B experiment: keep presets constant and move only the conditioner position in a topology prototype (or use alternate test points/slots that emulate shorter/longer segments). If improvement is large, placement dominates. If placement effect is small but pre-emphasis steps change BER predictably, Tx conditioning dominates.
Fix: prioritize placement that splits the dominant discontinuity; then use minimal Tx pre-emphasis and minimal Rx boost that meet BER. Avoid “max boost” solutions that break OOB/idle and increase crosstalk sensitivity.
Pass criteria: BER ≤ [BER] + OOB first-try pass ≥ [X]% at the chosen preset; no degradation when adding [k] connectors in the worst path.
Production passes, but system burn-in drops link: where do station-to-station differences usually hide? (ATE vs system mismatch)
Quick check: run the same DUT through multiple stations and in-system with the same preset/config; compare BER/link-up distributions and rail ripple logs. If failures correlate with a station/fixture revision, it is test-path dominated; if only system fails, it is event-state/power/return-path dominated.
Fix: lock down fixtures and scripts (revision control + lifetime), add rail ripple capture to ATE, and include OOB/idle event checks in production screening. Align acceptance thresholds with system-representative conditions.
Pass criteria: station delta ≤ [Δ] for BER/link-up; burn-in drop rate ≤ [F] ppm over [H] hours; required-field completeness ≥ [C]%.
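A station-delta check can be as simple as comparing a robust statistic of the same metric across stations. The sketch below uses the median of link-up time per station and reports the largest spread. The function name, the station labels, and the choice of median are assumptions; the same shape would apply to log-scaled BER.

```python
from statistics import median
from typing import Dict, Sequence, Tuple


def station_delta(
    metric_by_station: Dict[str, Sequence[float]]
) -> Tuple[float, str, str]:
    """Return (max median spread, best station, worst station)."""
    medians = {name: median(values) for name, values in metric_by_station.items()}
    best = min(medians, key=medians.get)
    worst = max(medians, key=medians.get)
    return medians[worst] - medians[best], best, worst


# Illustrative link-up times in ms for the same DUT on three setups.
link_up_ms = {
    "ATE1": [10.0, 11.0, 12.0],
    "ATE2": [10.0, 12.0, 14.0],
    "system": [15.0, 18.0, 30.0],
}
delta, best, worst = station_delta(link_up_ms)
print(delta, best, worst)
```

When the outlier is the in-system setup rather than one ATE station, the quick check above points to event-state, power, or return-path causes instead of the test path.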
OOB passes, but the link cannot reach the target speed: is training/negotiation being limited by EQ/clipping? (“Pass OOB” does not guarantee “pass high-rate”)
Quick check: force a known lower rate and verify BER margin; then attempt the target rate while logging speed-change events and error counters. Look for signs of saturation (flat-topped waveform, clipped transitions) or abrupt BER spikes only during/after rate change.
Fix: define separate “OOB-safe bring-up preset” and “high-rate operating preset” with a controlled switch once the link is stable. If clipping/saturation is observed, reduce gain/boost or adjust Tx swing/emphasis to stay within linear range.
Pass criteria: target-rate negotiation success ≥ [X]%; BER ≤ [BER] after [Ts] settling; eye @ Rx ≥ [EH]/[EW] at the operating preset.
After changing CTLE, OOB starts to miss-detect: why can a “low-speed burst” be affected? (CTLE is not “data-only”)
Quick check: capture OOB before and after the CTLE change and compare burst peak, edge slew, and gap baseline; repeat with “CTLE minimal / OOB-safe preset” and confirm whether miss-detect disappears. If miss-detect only happens on the backplane, suspect RL + CTLE peaking interaction.
Fix: keep a dedicated OOB-safe preset (minimal CTLE, relaxed squelch/idle thresholds) for link bring-up; then switch to the operating preset after stable link. If the device supports it, enable OOB pass-through / separate detection path behavior.
Pass criteria: OOB first-try pass ≥ [X]% across cables/slots; link_up_time P95 ≤ [T] ms; no increase in false-wake or miss-detect counters after CTLE changes.
Only a certain drive/cable vendor combination fails: how to build a minimal reproduction matrix? (Avoid blind testing with a 2×2/3×3 matrix)
Quick check: start with a minimal 2×2 matrix: (Drive A/B) × (Cable A/B) at the same backplane slot and the same preset/config CRC. Record OOB pass rate, link_up_time distribution (P50/P95), and BER (or error counters) per cell. Expand to 3×3 only if the 2×2 result is inconclusive.
Fix: define vendor-robust presets (wider safe window), and gate final selection by joint metrics (BER + OOB + event stability). If a single vendor pair fails consistently, treat it as an electrical compatibility constraint and document it in the support matrix.
Pass criteria: all matrix cells meet: OOB first-try pass ≥ [X]%, link_up_time P95 ≤ [T] ms, BER ≤ [BER] (or error count ≤ [E]) over [D] seconds.
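The 2×2 matrix and its interpretation can be sketched as below. The function names and the result labels are hypothetical, and the classification rule (a whole failing row means the drive, a whole failing column means the cable, a single failing cell means the pair) is a simplification that ignores intermittent cells.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple

Cell = Tuple[str, str]  # (drive, cable)


def build_matrix(drives: Sequence[str], cables: Sequence[str]) -> List[Cell]:
    """All drive x cable cells, to be run at one slot and one preset/config CRC."""
    return list(product(drives, cables))


def classify(
    failing: Iterable[Cell], drives: Sequence[str], cables: Sequence[str]
) -> str:
    """Attribute failures to a drive, a cable, a specific pair, or neither."""
    failed = set(failing)
    if not failed:
        return "none"
    for drive in drives:
        if failed == {(drive, cable) for cable in cables}:
            return f"drive:{drive}"
    for cable in cables:
        if failed == {(drive, cable) for drive in drives}:
            return f"cable:{cable}"
    if len(failed) == 1:
        drive, cable = next(iter(failed))
        return f"pair:{drive}+{cable}"
    return "inconclusive"  # expand to 3x3 before drawing conclusions


drives, cables = ["A", "B"], ["X", "Y"]
print(build_matrix(drives, cables))
print(classify([("B", "Y")], drives, cables))
```

An "inconclusive" result is the trigger for expanding to 3×3, and a "pair" result is the case to record as an electrical compatibility constraint in the support matrix.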