
Ethernet PHY (10/100/1G/2.5G) for TSN: Clocks, EEE, PTP


An Ethernet PHY is not just "link up or down" for TSN: it must keep time and latency bounded under real conditions. This page shows how to design and measure low-jitter clocks and deterministic latency, bound EEE side effects, and validate PHY hardware timestamp paths so the link stays stable and the timing stays trustworthy.

What an Ethernet PHY is (and what TSN expects from it)

An Ethernet PHY is the physical-layer engine that turns MAC-side digital traffic into a compliant electrical link (and back), while managing clocks, training, link states, and—when required—hardware timestamp hooks that determine timing repeatability.

Scope boundaries (to prevent topic overlap)
In scope (this page)
  • Ethernet PHY functions for 10/100/1G/2.5G electrical links
  • Low-jitter clocking paths and how they affect determinism
  • EEE (802.3az) behavior as a timing/latency side-effect source
  • PHY-side hardware timestamp paths (for TSN timing use)
Not in scope (linked elsewhere)
  • TSN scheduling/shaping (switch/bridge algorithms and profiles)
  • MAC/driver stack tutorials (queues, OS networking, firmware)
  • Optical modules and high-speed SerDes beyond 2.5G
  • PoE system design (PSE/PD power delivery specifics)

PHY vs MAC: responsibilities that matter for timing repeatability

MAC (host-side digital)
  • Frame handling (Tx/Rx), CRC, flow-control exposure
  • Management interface control (e.g., MDIO register access)
  • May provide timestamping—but location can be far from the line
PHY (line-side mixed-signal + DSP)
  • Physical coding/decoding, analog front-end, link training
  • Clock recovery/PLL behavior that shapes jitter and lock stability
  • EEE state machine transitions (enter/exit) that can introduce steps
  • Hardware timestamp capture/insert points tied to physical events
Practical rule

Determinism is rarely limited by average throughput. It is limited by bounded behavior: clock integrity, latency variability, and timestamp integrity must be measurable, repeatable, and stable across environment changes.

TSN-friendly PHY: minimum capability checklist (bounded, measurable, repeatable)

A) Clock integrity
Quick check
Verify refclk quality meets the system jitter budget (< X) and lock is stable across temperature and cable conditions.
Engineering hook
Must expose lock/training state and (ideally) allow selecting a clean external refclk.
Pass criteria
No clock-related drops; recovered/derived clocks remain within the defined jitter envelope under stress.
B) Bounded latency
Quick check
Confirm latency is either fixed-mode or at least measurable/compensable; detect step events < X ns.
Engineering hook
Track FIFO alignment, training convergence, and EEE exit behavior as the primary variability sources.
Pass criteria
Under controlled A/B conditions, peak-to-peak latency variation stays below the system bound (X).
C) Timestamp integrity
Quick check
Determine where timestamps are captured (near the line vs host-side). Target resolution/precision < X.
Engineering hook
Identify clock-domain crossings and buffers between capture point and software-visible timestamp registers.
Pass criteria
Offset steps remain bounded (X) and do not correlate with EEE transitions, relinks, or temperature ramps.
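Once the "X" placeholders are pinned to project numbers, the three pass criteria reduce to small numeric checks. A minimal Python sketch; all thresholds and function names are illustrative, not from any standard:

```python
# Minimal checks for the A/B/C pass criteria. All bounds are
# project-specific placeholders, not values from any standard.

def clock_ok(refclk_jitter_ps, jitter_budget_ps):
    # A) recovered/derived clocks must stay within the jitter envelope
    return refclk_jitter_ps < jitter_budget_ps

def latency_ok(latency_samples_ns, pp_bound_ns):
    # B) peak-to-peak latency variation must stay below the system bound
    return (max(latency_samples_ns) - min(latency_samples_ns)) < pp_bound_ns

def timestamps_ok(offset_steps_ns, step_bound_ns):
    # C) every observed offset step must remain bounded
    return all(abs(s) < step_bound_ns for s in offset_steps_ns)
```

The value of writing the criteria down this way is that the same checks run unchanged across temperature, cable, and EEE stress runs.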
Diagram: System layering and TSN-relevant hooks (clocks, EEE, timestamps)
Layered diagram (MAC → PHY → magnetics → cable) highlighting refclk, Tx/Rx clocks, the EEE gate, and the timestamp capture/insertion points.

Modes & interfaces: 10/100/1G/2.5G and MAC-side buses

PHY selection becomes predictable when rate, MAC-side interface, and clocking are treated as one decision. The goal is not only link-up—it is repeatable timing behavior: bounded latency, stable clocks, and a timestamp path that stays consistent under stress.

Selection framing: rate × host interface × clocking × determinism risk

  • Parallel buses (e.g., RGMII): low algorithmic variability, but highly sensitive to skew, timing margins, and IO/power noise.
  • Serial buses (e.g., SGMII / 2500BASE-X): clock recovery and alignment can add bounded—but real—variability during lock, retraining, and transitions.
  • 2.5G choices: the “right” answer is the one that matches SoC capability and keeps the clock/timestamp path verifiable.

Quick map: speed → common MAC bus → clock notes → primary risk

10/100
  • MAC bus: MII / RMII
  • Clock notes: lower edge rates; simpler timing closure
  • Primary risk: misconfiguration, wiring, basic EMC/ESD paths
1G
  • MAC bus: RGMII or SGMII
  • Clock notes: timing/skew closure matters; refclk quality starts to dominate
  • Primary risk: RGMII delay/skew or SGMII refclk/lock stability
2.5G
  • MAC bus: SGMII@2.5 / 2500BASE-X / (SoC-dependent)
  • Clock notes: clock integrity and training behavior become first-order
  • Primary risk: mode/partner compatibility, retraining steps, cable/magnetics margins
TSN focus
  • What matters: timestamp capture point, bounded latency, stable clocks under EEE/link events
  • Pass criteria: offset steps < X, latency variation < X, no relink-induced timing excursions

Common pitfalls (engineer-style: symptom → fastest check → fix direction)

RGMII internal/external delay mismatch
Symptom
Link appears up, but CRC errors or intermittent drops increase with temperature or load.
Fastest check
Force a stable mode; compare error counters across “delay on PHY vs delay on MAC” configurations (one-variable A/B).
Fix direction
Apply delay on only one side; tighten skew via routing and reduce IO supply noise coupling.
SGMII refclk quality or distribution instability
Symptom
Rare training failures, periodic relinks, or error bursts correlated with environment changes.
Fastest check
Swap clock source (clean XO vs shared clock tree) and correlate lock state + error counters.
Fix direction
Isolate clock routing, improve decoupling, and avoid aggressive sharing across noisy domains.
2.5G mode/partner compatibility (autoneg edge cases)
Symptom
Negotiates to 2.5G then falls back to 1G, or only fails with specific switches/cables.
Fastest check
Force speed/duplex where possible; test against a known-good partner and fixed cable length.
Fix direction
Align the intended 2.5G host-side interface and configuration straps; keep clocking deterministic.
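The RGMII one-variable A/B above reduces to comparing error-counter deltas between the two delay placements. A sketch of the comparison logic, assuming the CRC counts come from a driver/MDIO counter read (not shown here) and using a hypothetical 3x ratio as the decision margin:

```python
def compare_delay_configs(errors_phy_delay, errors_mac_delay, min_ratio=3.0):
    """Decide the RGMII delay A/B: same traffic, same duration, only the
    delay placement (PHY vs MAC) changed. A configuration 'wins' only if
    the other side accumulates min_ratio x more errors; otherwise extend
    the run instead of concluding."""
    a, b = errors_phy_delay, errors_mac_delay
    if a == 0 and b == 0:
        return "inconclusive"   # both clean: lengthen the run or add stress
    if b >= min_ratio * max(a, 1):
        return "phy_delay"      # delay on the PHY side is the stable config
    if a >= min_ratio * max(b, 1):
        return "mac_delay"
    return "inconclusive"
```

The 3x margin is an arbitrary guard against run-to-run noise; derive it from repeated baseline runs on your own hardware.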
Diagram: Speed → host interface → clocking → determinism risk (decision tree)
Flow diagram from target speed (10/100 · 1G · 2.5G) through SoC host bus (MII/RMII · RGMII · SGMII) and clocking plan (refclk · lock · stability) to determinism risk buckets: skew-sensitive RGMII timing, refclk-sensitive SGMII lock, training-sensitive 2.5G modes.

Inside the PHY: PCS/PMA, DSP blocks, and where determinism is lost

Deterministic timing depends on predictable internal paths. Ethernet PHYs combine analog front-end blocks, digitization, adaptive DSP, coding/decoding, and multiple clock domains. The same link can be “up” while timing repeatability is degraded by state transitions, adaptive convergence, and clock-recovery dynamics.

Internal block relationships (what each stage changes)

PMA/AFE + ADC/DAC
  • Defines line-side signal integrity and noise coupling points
  • Sets SNR margins that limit DSP headroom and lock robustness
  • Susceptible to supply noise and common-mode disturbances
DSP / EQ (adaptive)
  • Adaptive equalization converges differently across cables/temperature
  • Convergence and re-training can introduce step-like timing shifts
  • Error bursts often correlate with adaptation events
PCS (coding/decoding)
  • Controls symbol alignment, buffering, and link state transitions
  • Elastic buffers can create bounded but real variability
  • State changes (autoneg/relink) are common sources of steps
PLL/CDR (clock recovery)
  • Defines jitter tolerance and how refclk noise transfers into the link
  • Lock/relock dynamics create time windows of higher variability
  • Temperature drift can shift loop behavior and margins

“Determinism killers” (grouped by what they look like in measurements)

Step events
  • EEE exit / LPI transitions (state gate toggles)
  • Link retrain / renegotiation / buffer realignment
  • Timestamp capture point mode changes or re-sync
Fast check
Correlate timing steps with EEE/link-state counters and lock/relock events.
Short-term jitter
  • Refclk phase noise transfer through PLL/CDR
  • Supply noise coupling into PLL/AFE bias networks
  • Measurement settings that hide spurs or amplify artifacts
Fast check
Swap clock source and re-measure at multiple points (refclk vs MAC IF vs recovered clock).
Slow drift
  • Temperature drift shifting CDR loop behavior and margins
  • AFE bias drift or common-mode shifts under load
  • Clock-source frequency drift interacting with timestamp domains
Fast check
Log temperature and airflow changes and align them against drift slope (X per °C).
Key takeaway

A cleaner refclk mainly improves noise-floor-driven jitter. It does not automatically remove step events caused by state changes (EEE exit, retraining, buffer realignment) or slow drift driven by thermal behavior. Classify the symptom first.
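Classifying the symptom shape can be done mechanically before any root-cause work. A sketch classifier over a uniformly sampled latency/offset trace; all thresholds are placeholders to be derived from the system budget:

```python
def classify_trace(samples, step_thresh, drift_per_sample, jitter_thresh):
    """Classify a uniformly sampled latency/offset trace into the three
    symptom shapes. Order matters: a step dominates, then drift, then
    noise-floor jitter."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    if any(abs(d) > step_thresh for d in diffs):
        return "step"        # correlate with EEE/link-state and lock events
    slope = (samples[-1] - samples[0]) / (len(samples) - 1)
    if abs(slope) > drift_per_sample:
        return "drift"       # correlate with the temperature log
    mean = sum(samples) / len(samples)
    rms = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
    return "jitter" if rms > jitter_thresh else "within_bounds"
```

Whatever the classifier returns is a starting hypothesis for the fast checks above, not a verdict.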

Five internal observation points used throughout this page

① PLL/CDR lock events
Confirms whether timing excursions correlate with lock/relock windows and jitter tolerance margins.
② Training/EQ convergence
Detects adaptive convergence changes that often precede error bursts and latency variability.
③ EEE state transitions
Provides the first correlation test for step-like offset/latency jumps caused by LPI entry/exit.
④ Timestamp engine status
Identifies capture/insert mode, domain crossings, and whether corrections are applied consistently.
⑤ Error/quality counters
Separates physical-layer errors from higher-layer symptoms and supports one-variable A/B experiments.
Diagram: PHY internal pipeline + determinism-sensitive points (①–⑤)
Pipeline AFE → ADC → DSP/EQ → PCS → MAC IF (RGMII/SGMII), with PLL/CDR (lock · transfer), refclk, EEE gate, PTP timestamp points, and observation points ① lock, ② EQ, ③ EEE, ④ TS, ⑤ err. Steps correlate with state events, jitter with clocks, drift with temperature.

Low-jitter clocks: reference, recovered clock, jitter transfer & tolerance

Clock quality is only useful when it is mapped to measurable points and bounded outcomes. A clock plan must specify which domain drives refclk, how PLL/CDR transfers noise, where clocks are observed, and what pass criteria define “good enough” under stress.

Clock sources and roles (engineer view)

Common sources
  • On-board crystal / oscillator (XO)
  • External low-jitter XO (dedicated)
  • SoC-generated clock (shared tree)
  • Synchronized clock feed (PHY-side only)
Key roles
  • refclk: sets PLL/CDR input noise and lock robustness
  • txclk/rxclk: defines host-side timing margins (parallel buses)
  • recovered clock: reflects line-side recovery behavior and tolerance
  • TS domain: timestamp engine clock-domain consistency
Practical boundary

Clean refclk reduces noise-floor-driven jitter. It does not guarantee removal of step events caused by state changes (EEE exit, relink, buffer realignment). Measurements must separate noise-floor problems from state-driven steps.

Where to measure (multi-point) and how to avoid artifacts

Measurement points
  • refclk (input)
  • PLL/CDR-related clock (internal/available output)
  • MAC IF clock domain (txclk/rxclk or serial stats)
  • Timestamp engine domain (TS clock path)
Typical tools
  • Oscilloscope (timing/skew sanity, clock edges)
  • Phase noise analyzer (integrated jitter windows)
  • PHY internal counters/statistics (locked-state correlation)
Artifact traps (fast flags)
  • RBW/VBW/window changes that “improve” plots while system behavior stays unchanged
  • Probing/ground loop effects that add spurs
  • Excess averaging that hides rare step events

Clock budget template (X placeholders) + first 3 correlation checks

Budget template
  • refclk jitter (integrated X1–X2): < X
  • PLL/CDR output jitter: < X
  • Host interface timing margin: > X
  • Worst-case Δjitter across stress: < X
  • Lock/relock stability window: < X
Pass criteria
The defined bounds remain valid with temperature, cable length, and EEE/link events.
Correlation checks
1) Swap clock source (A/B)
Confirms whether the symptom tracks refclk noise floor or remains unchanged (suggesting a state-driven step).
2) Move measurement point
Separates “clean refclk” from degraded MAC IF/TS domains, indicating internal crossings or gating effects.
3) Change RBW/VBW/window
Detects settings artifacts: if plots “improve” but link/timing behavior does not, the configuration is masking the true issue.
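For the refclk line of the budget, integrated jitter over a defined band is the usual comparable number. A sketch using the standard phase-noise-to-RMS-jitter conversion, with a piecewise-flat L(f) profile as a simplifying assumption:

```python
import math

def integrated_jitter_ps(f0_hz, segments):
    """Convert a piecewise-flat phase-noise profile into RMS jitter.
    segments: list of (f_lo_hz, f_hi_hz, L_dbc_hz). Standard conversion:
    sigma_t = sqrt(2 * A) / (2 * pi * f0), where A is the integral of
    10^(L(f)/10) over the chosen band."""
    area = sum((f_hi - f_lo) * 10 ** (L / 10.0)
               for f_lo, f_hi, L in segments)
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f0_hz) * 1e12
```

For example, a 125 MHz refclk with a flat -150 dBc/Hz floor integrated from 12 kHz to 20 MHz comes out to roughly 0.25 ps RMS; real profiles need more segments (and spur handling), which this sketch ignores.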
Diagram: Clock tree + measurement points (M1–M4) + tools
Refclk sources (dedicated XO, shared SoC clock tree, PHY-side sync feed) feed the PLL/CDR (transfer · tolerance), then the MAC IF (txclk/rxclk) and timestamp domains; measurement points M1–M4 map to scope (edges/skew), phase-noise analyzer (RBW/VBW), and internal statistics counters.

Deterministic latency: what varies, what can be bounded, how to measure

Deterministic latency is not a single number. It is a decomposable set of terms across TX/RX pipelines, buffers, adaptive convergence, and lock-related behavior. TSN readiness requires identifying which terms are fixed, which are configurable, which are adaptive, and how each term can be observed and bounded.

Latency pipeline map (engineer view)

TX path (host → line)
  • MAC IF → PCS → FIFO/buffer
  • DSP/EQ → PMA/AFE → MDI
  • State events can create step-like shifts
Link segment (line + peer)
  • Cable + magnetics + peer PHY behavior
  • Changes can alter EQ convergence
  • Peer state transitions matter for repeatability
RX path (line → host)
  • MDI → AFE/PMA → DSP/EQ
  • PCS → FIFO/buffer → MAC IF
  • Lock windows can distort short-term timing

Latency decomposition (fixed vs configurable vs adaptive) with observation hooks

Fixed terms (treat as constants in one mode)
Pipeline base delay
AFE/PCS basic pipeline latency under a fixed speed/mode.
Observe:
External end-to-end delay (stable conditions) + error counters (⑤) for sanity.
Constant interface path
Host-side interface timing remains constant when configuration is unchanged.
Observe:
MAC IF clock domain checks + stable skew measurements.
Configurable terms (can be pinned down)
Interface/mode settings
Speed, MAC-side bus, internal delay toggles, and buffer strategies.
Observe:
Configuration registers + repeatability across power cycles.
Timestamp capture mode
Capture/insert point selection changes the effective measured latency.
Observe:
Timestamp engine status (④) + stable end-to-end test under fixed traffic.
Adaptive terms (must be bounded by verification)
EQ convergence behavior
Cable/peer/temperature changes alter convergence and may create small but real timing variability.
Observe:
Training/EQ convergence indicators (②) aligned with latency traces.
Lock/state windows
Lock/relock and state transitions can cause step events and short windows of degraded repeatability.
Observe:
PLL/CDR lock events (①) + EEE/link-state counters (③) + error bursts (⑤).

How to bound variability (pin config, pin environment, bound state events)

Layer 1: configuration bounds
  • Freeze speed + MAC IF mode + delay toggles
  • Define EEE policy for test runs (on/off)
  • Lock peer device and firmware version
Layer 2: environmental bounds
  • Fix cable type/length and routing
  • Fix airflow and thermal state (soak defined)
  • Hold supply noise and load conditions constant
Layer 3: state-event bounds
  • Log state counters and lock events with timestamps
  • Separate step events from noise-floor jitter
  • Bound “risk windows” after relock/exit events (X)

Measurement methodology: one-variable A/B (repeatable and falsifiable)

Step 0 — lock the controls
Same cable, same peer, same temperature window, same traffic pattern, same power state.
Step 1 — change only one variable
Example: EEE on/off, clock source A/B, cable short/long, temperature step.
Step 2 — log internal hooks
Align latency traces with ① lock, ② EQ convergence, ③ EEE/link events, ⑤ error bursts.
Step 3 — classify the output shape
  • Step: suspect state transitions / buffers / EEE
  • Noise-floor: suspect clocks / supply / measurement settings
  • Drift: suspect thermal path and steady-state shifts
Diagram: Latency pipeline (MAC ↔ PHY ↔ MDI ↔ cable ↔ peer) with fixed/config/adaptive markers
MAC ↔ PHY TX (PCS/FIFO, EQ) ↔ MDI ↔ cable (peer-dependent) ↔ PHY RX (PCS/FIFO, lock), with fixed, configurable, and adaptive segments marked; variability is bounded by pinning configuration, controlling the environment, and correlating state events.

EEE (802.3az): power states, wake behavior, and side effects on timing

EEE is not a simple power-save toggle. It is a state machine that changes link behavior. The most important engineering task is to verify side effects: step-like latency changes around wake events, short-term jitter changes, and compatibility differences across peers.

EEE in practice: state machine and what to correlate

Active
  • Normal latency distribution (baseline)
  • Use as reference for A/B comparisons
  • Correlate with error counters (⑤)
LPI entry / LPI
  • Low-power idle behavior depends on traffic pattern
  • Exit behavior is the critical risk for timing
  • Track EEE state transitions (③)
Wake / recovery
  • Possible step in latency/offset around exit
  • Short risk window until stable behavior returns
  • Correlate with lock events (①) and error bursts (⑤)

Timing side effects (grouped by symptom shape) + fast discrimination

Step events
  • Latency/offset “jumps” near LPI exit
  • Buffer realignment and domain gating effects
  • Peer-dependent behavior across switches
Fast discrimination
Disable EEE and check whether the step disappears; then align the step time with EEE transitions (③).
Short-term jitter rise
  • Wake window shows degraded jitter tolerance
  • Clock-domain gating exposes noise coupling
  • Measurement settings can hide/overstate effects
Fast discrimination
Measure at multiple points and use the same RBW/VBW/window; compare EEE on/off under identical traffic.
Compatibility issues
  • Peer switch/port differences change stability
  • Increased error bursts or renegotiation events
  • Temperature/cable length amplifies differences
Fast discrimination
Hold traffic constant, swap peer device, and compare EEE transition counts (③) and error bursts (⑤).
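The fast discrimination for step events is ultimately a time alignment: does each detected step fall inside a window around an EEE transition? A sketch, assuming step times and transition times are logged on the same timebase:

```python
def steps_near_events(step_times_ms, event_times_ms, window_ms):
    """Split detected offset/latency steps into those that land within
    window_ms of an EEE transition (suspect LPI exit) and the rest
    (which need a different hypothesis: relink, retrain, thermal)."""
    near = [t for t in step_times_ms
            if any(abs(t - e) <= window_ms for e in event_times_ms)]
    far = [t for t in step_times_ms if t not in near]
    return near, far
```

If most steps land in the "near" list with EEE on and the list empties with EEE off, the LPI exit window is the prime suspect.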

EEE bring-up verification checklist (A/B, peer, cable, temperature)

A/B toggle
  • EEE off baseline vs EEE on
  • Same traffic pattern and rate
  • Record risk-window stability (X)
Log: ③ transitions, ① lock events, ⑤ bursts
Cable length & type
  • Short / medium / long runs (fixed per test)
  • Keep routing and shielding consistent
  • Watch EQ and wake sensitivity
Log: ② convergence, ③ transitions, end-to-end latency
Peer device matrix
  • Swap switch/port or peer PHY implementation
  • Hold firmware versions fixed per run
  • Detect peer-sensitive timing steps
Log: ⑤ counters, renegotiation events, step timestamps
Temperature / airflow
  • Cold start vs steady-state soak
  • Keep airflow direction constant
  • Bound drift slope (X per °C)
Log: temperature, ① lock, ③ transitions, latency drift
Diagram: EEE state timeline (Active ↔ LPI ↔ Wake) with measurement points and pass checks
Timeline Active → LPI entry → LPI → Wake (exit) → Active with a wake risk window (X) and measurement points M1 offset/latency, M2 EEE state, M3 lock, M4 bursts; pass checks: no step, stable within X, no bursts.

PTP / gPTP hardware timestamping at the PHY: paths, errors, calibration

Hardware timestamping accuracy is determined by where the timestamp is captured/inserted, which clock domain is used, and which internal stages are included in the correction path. Engineering validation focuses on repeatability, bounded error terms, and correlation with link state transitions rather than protocol-level details.

Scope (PHY-side reality)
  • Capture/insert points and their relation to MDI
  • Error terms: bias, jitter, drift, event steps
  • Calibration/verification workflow and pass criteria
Not in scope (kept out to avoid cross-page overlap)
  • Protocol state machines and sync-tree behavior
  • BMCA, scheduling, or switch queue models
  • Network-wide configuration policies

Where timestamps happen: MAC vs PHY (engineering differences)

MAC timestamp
  • Capture point is closer to host processing
  • MAC↔PHY interface latency becomes an error term
  • FIFO/CDC effects can show as jitter or step
PHY timestamp
  • Capture point can be closer to MDI/line side
  • Internal pipeline + correction path must be consistent
  • CDC and state transitions must be bounded by tests
First checks
  • Disable EEE and check if steps disappear
  • Swap speed/cable length and verify predictable bias change
  • Align offset/latency changes with link/state counters

Timestamp error sources and an error-budget template

Bias (static offset)
  • Capture point relative to MDI and fixed pipeline terms
  • Mode/speed-specific baseline differences
  • Correction-field configuration mismatches
Budget line (placeholder)
Bias after calibration < X ns (per speed/mode).
Random jitter
  • FIFO/CDC phase relationship and sampling uncertainty
  • Clock noise mapped into timestamp domain
  • Measurement window and statistics definition matter
Budget line (placeholder)
Jitter (RMS) < X ns, p-p < X ns in window T.
Drift + event steps
  • Temperature-dependent delay and clock parameter shifts
  • EEE exit windows causing step-like offset/latency changes
  • Relock/retrain events affecting short-term timing
Budget lines (placeholders)
Drift < X ns/°C and event step < X ns.
After EEE exit, stable within X ms (no step beyond threshold).
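The budget lines above map directly onto a small evaluation helper. A sketch that reports which error terms violate their placeholder bounds; field names and values are illustrative:

```python
from dataclasses import dataclass, asdict

@dataclass
class TsErrorBudget:
    bias_ns: float          # static offset after calibration, per speed/mode
    jitter_rms_ns: float    # RMS jitter in the defined window T
    drift_ns_per_c: float   # temperature-driven slope
    event_step_ns: float    # worst allowed step around EEE exit / relock

def failing_terms(measured, budget):
    """Return the names of error terms that violate their bounds.
    `measured` is a dict with the same keys as the budget fields."""
    return [name for name, limit in asdict(budget).items()
            if abs(measured[name]) >= limit]
```

Keeping one budget instance per speed/mode mirrors the "bias per speed/mode" line above, since baselines differ across modes.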

Calibration and verification: minimal system → peer test → variable scan → full system

Level 0 — minimal board
  • Fix speed/mode and cable
  • EEE off baseline
  • Calibrate bias to target (X)
Level 1 — peer-to-peer
  • Two-ended comparison under fixed traffic
  • Validate correction settings consistency
  • Check peer sensitivity of bias/jitter
Level 2 — one-variable scan
  • Speed change (10/100/1G/2.5G)
  • Cable length/type change
  • EEE on/off and wake window checks
Level 3 — full system
  • Bring-up with real power/thermal conditions
  • Correlate offset with link/state counters
  • Prove bounded drift and no event steps beyond X
Diagram: Timestamp capture/insertion map (capture points, correction, and CDC)
Block diagram of MAC IF, PCS, FIFO, DSP/EQ, PMA, and MDI with CAPTURE/INSERT points, CDC bridges across the MAC, PHY, and recovered clock domains, and the correction path (pipeline + mode-dependent terms). Verification focus: lock the configuration, measure bias/jitter/drift, correlate with CDC/state transitions, bound risk windows (X).

Analog front-end, magnetics, EMC/ESD/surge: making the link robust

Robust Ethernet links require treating the PHY AFE, magnetics, protection network, and connector region as a coupled system. Field failures often come from return-path mistakes and parasitics that convert differential energy into common-mode noise, especially at higher data rates such as 2.5G.

AFE + magnetics as a coupled system (loss, echo, common-mode conversion)

Differential channel
  • Insertion loss and return loss shape eye margin
  • Small impedance breaks can create reflections
  • Parasitic capacitance becomes visible at 2.5G
Common-mode channel
  • Diff-to-CM conversion drives EMI and sensitivity issues
  • Asymmetry and return-path discontinuities amplify CM
  • CMC helps only when placed and referenced correctly
2.5G “no more hand-waving”
  • ESD capacitance and stubs become first-order terms
  • Layout asymmetry directly impacts CM and robustness
  • Partitioning errors show as intermittent bursts

ESD & surge: real damage paths (energy + return path)

ESD path
  • Connector → protection → short return to reference
  • Long return loops inject noise into PHY reference
  • Wrong placement drives current through magnetics/PHY zone
Surge path
  • Energy must be steered to the external reference (chassis/PE)
  • Floating paths cause board ground lift and false failures
  • Partition boundaries prevent energy from crossing zones
Practical rule
Place protection near the connector, keep the return path short and controlled, and prevent ESD/surge current from entering the magnetics and PHY zones.

Protection selection dimensions (C, clamping, matching, placement)

Capacitance (C)
Lower C reduces eye/return-loss impact. At 2.5G, device parasitics and stubs are often dominant.
Clamping
Energy must be clamped into a controlled return path. If the return is long, clamping does not prevent reference bounce.
Differential matching
Symmetry keeps diff-to-CM conversion low. Protection arrays and routing must preserve pair balance.
Placement
Place near connector to capture energy early. Keep the return to chassis/PE or reference short and wide.

Three-zone layout partition: PHY zone → magnetics zone → connector zone

PHY zone
  • Keep reference stable and local
  • Preserve pair symmetry leaving the PHY
  • Prevent ESD/surge return from entering this zone
Magnetics zone
  • Maintain tight pair coupling and minimal stubs
  • Control impedance transitions across components
  • Place CMC only where it supports the CM strategy
Connector / external zone
  • Protection arrays close to connector
  • Short, wide return to chassis/PE/reference
  • Keep shield/ground strategy consistent and testable
Diagram: Board partition & protection placement (PHY → magnetics → connector) with ESD return path
Three-zone layout (PHY zone → magnetics zone → connector zone) showing the differential pair through PHY AFE, magnetics (XFMR + CMC), and RJ45, with the TVS/ESD array at the connector and a short ESD return to chassis/PE; ESD current is kept out of the PHY and magnetics zones.

Bring-up & debug playbook: link training, autoneg, cable issues, diagnostics

Debugging Ethernet PHY stability is fastest when symptoms are routed into a small set of “root-cause buckets” using correlation checks first (peer swap, EEE off, forced speed). The goal is to reduce time-to-isolation: identify whether failures track the peer, the channel, the environment, or internal state transitions.

Symptom entry points (route first, then drill down)

Link flap
Frequent up/down or retrain loops. Start with correlation checks before deep logging.
Speed drop
Falls back to 100M/1G unexpectedly. Check autoneg outcome and partner capability.
Long cable only
Stable on short cable but fails at long run. Treat as channel-margin sensitivity first.
Hot only / soak
Pass at room temp but fails hot. Correlate with drift and state/event counters.
Chassis-only
Bench passes but fails in full system. Treat as power/ground/EMC coupling until proven otherwise.

Correlation-first triage (highest information gain)

Swap peer
If the issue follows the peer, bucket into compatibility/partner behavior.
EEE OFF
If steps/flaps disappear, bucket into LPI entry/exit windows and wake behavior.
Force speed
If forced mode is stable, bucket into autoneg/training process rather than steady-state channel.
Output of triage
Peer bucket
Different peer fixes it → compatibility or partner state behavior.
Channel bucket
Worse with length/type → magnetics/cable/impedance margin.
Power/state bucket
Temperature/chassis/EEE correlation → state/event coupling.
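The triage order above can be captured as a tiny routing function; each answer is only meaningful once the earlier checks came back negative. A sketch (bucket names mirror the text; the final fallback is a hypothesis to confirm with a cable length/type A/B, not a verdict):

```python
def triage(peer_changes_result, eee_off_removes_steps, forced_mode_stable):
    """Route the three highest-gain correlation checks into a
    root-cause bucket. Inputs are the yes/no outcomes of the checks,
    evaluated in order."""
    if peer_changes_result:
        return "peer"         # compatibility / partner state behavior
    if eee_off_removes_steps:
        return "power_state"  # LPI entry/exit windows, wake behavior
    if forced_mode_stable:
        return "autoneg"      # negotiation/training, not steady-state channel
    return "channel"          # next hypothesis: magnetics/cable/impedance margin
```

Encoding the order keeps a team from running the low-gain checks first when a symptom reappears months later.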

Autoneg failures: 5 common root causes (symptom → quickest check → next action)

1) Mode mismatch / straps
Likely cause: forced mode conflicts with partner autoneg or strap defaults.
Quick check: force a known-good speed/duplex and compare stability.
Fix: align autoneg policy and strap/config across boots.
Pass criteria: stable link with identical mode after reboot and peer swap.
2) Channel margin (cable/magnetics)
Likely cause: training/FLP is corrupted by loss/echo or poor term/magnetics.
Quick check: short-cable baseline vs long-cable failure reproduction.
Fix: isolate cable type, check magnetics + connector placement and symmetry.
Pass criteria: no renegotiation loops; error counters remain bounded.
3) Clock/power noise → state churn
Likely cause: marginal refclk or supply noise triggers retrain or false transitions.
Quick check: correlate flaps with counters/events; compare clean vs noisy power condition.
Fix: stabilize ref/power; retest under forced speed to separate autoneg vs steady-state.
Pass criteria: retrain count and drops stay below X per hour in window T.
4) Peer compatibility
Likely cause: partner implementation differences or strict corner behavior.
Quick check: swap peer across vendors/models; keep all other variables fixed.
Fix: lock negotiated subset, adjust advertisement policy, or select a compatible profile.
Pass criteria: identical results across peer set under same cable/temperature.
5) EEE interaction
Likely cause: LPI entry/exit timing causes transient instability or misclassification.
Quick check: EEE off A/B; look for steps at wake moments.
Fix: tune EEE policy or disable for deterministic timing applications.
Pass criteria: no offset step beyond X ns; no burst errors after wake.

Cable diagnostics / TDR: use it without false conclusions

When it is meaningful
  • Stable link state and fixed speed
  • Known cable type and baseline reference run
  • Trend comparison, not blind absolute distance
Common false-positive sources
  • Magnetics + CMC parasitics and routing stubs
  • ESD array capacitance near connector
  • Connector/patch-cord reflections
Correct workflow
  1. EEE off and forced speed baseline
  2. Capture a known-good reference signature
  3. Compare deltas under one-variable changes

Minimal log fields (enough to reproduce and correlate)

Link + mode
  • link up/down
  • speed / duplex
  • autoneg enabled / forced
State & features
  • EEE state (LPI, wake count)
  • timestamp state (enable/valid)
  • retrain / renegotiation reason
Error counters
  • CRC / PCS errors
  • symbol / alignment errors
  • drop count / burst markers
Context (minimal)
  • peer model/vendor
  • cable length/type
  • temperature point
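A log record that carries exactly these fields is enough to reproduce and correlate most failures. A sketch of the minimal schema; field names are illustrative:

```python
from dataclasses import dataclass, asdict

@dataclass
class PhyLogRecord:
    # link + mode
    link_up: bool
    speed_mbps: int
    full_duplex: bool
    autoneg: bool
    # state & features
    eee_lpi: bool
    wake_count: int
    ts_valid: bool
    retrain_reason: str
    # error counters
    crc_errors: int
    pcs_errors: int
    symbol_errors: int
    drops: int
    # context
    peer: str
    cable: str
    temp_c: float
```

asdict() turns each record into a flat dict, so the same schema feeds CSV/JSON logging without extra glue.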
Diagram: Debug decision flow (few branches, high-gain steps first)
Flowchart: start from the symptom (flap / speed drop / long cable / hot / chassis-only), run swap peer → EEE off → force speed, and route yes/no answers into the peer bucket (advertise policy, reason codes), channel bucket (short-cable baseline, magnetics/ESD parasitics, TDR delta compare), or power/state bucket (event correlation, temp scan points, counter bursts). Always log: link state, speed/duplex, autoneg, EEE state, timestamp state, error counters, peer, cable, temperature.

Compliance & validation: what to test, what to log, and pass/fail criteria

Validation should be structured as a matrix: conditions (cable, temperature, peer, power, feature states) × metrics (BER, drop rate, recovery, timestamp stability). The goal is to prevent “bench pass, chassis fail” by forcing worst-case corners and logging enough context to correlate failures to specific transitions.

Coverage dimensions (minimum set that prevents blind spots)

Signal & errors
  • BER / CRC / PCS errors
  • burst behavior and retrain count
  • drop rate over window T
Timing & timestamp
  • offset step during events
  • timestamp validity rate
  • recovery time after wake/relock
Environment & channel
  • short vs long cable
  • temperature corners
  • peer diversity (models/vendors)
Power disturbance
  • nominal vs ripple/step
  • mode transitions
  • correlate bursts with supply events
Feature states
  • EEE on/off
  • timestamp on/off
  • autoneg vs forced

Test matrix template (conditions × metrics)

Conditions (axis)
  • cable: short / long (X m)
  • temp: cold / room / hot (X °C)
  • peer: A / B / C
  • power: nominal / ripple (X)
  • EEE: on / off
  • timestamp: on / off
Metrics (axis)
  • BER / CRC / PCS errors
  • drop rate (X / hour)
  • recovery time (X ms)
  • offset step (X ns)
  • timestamp validity (X%)
  • retrain count (X / hour)
Corner priority
Mark high-risk corners explicitly (long + hot + EEE on + timestamp on). Run these first and gate release on them.
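The conditions axis can be expanded into a full cross-product and sorted so high-risk corners run first. A sketch over an illustrative subset of the axes (peer and power omitted for brevity); the corner definition follows the text above:

```python
# Sketch: enumerate conditions as a cross-product, tag and front-load
# the high-risk corner (long + hot + EEE on + timestamp on).
from itertools import product

axes = {
    "cable":     ["short", "long"],
    "temp":      ["cold", "room", "hot"],
    "eee":       ["on", "off"],
    "timestamp": ["on", "off"],
}

def build_matrix(axes):
    keys = list(axes)
    rows = [dict(zip(keys, combo)) for combo in product(*axes.values())]
    for row in rows:
        row["high_risk"] = (row["cable"] == "long" and row["temp"] == "hot"
                            and row["eee"] == "on" and row["timestamp"] == "on")
    # Run high-risk corners first; gate release on them.
    return sorted(rows, key=lambda r: not r["high_risk"])

matrix = build_matrix(axes)
print(len(matrix), matrix[0])
```

Generating the matrix mechanically also makes blind spots visible: any corner that is skipped must be skipped explicitly, not forgotten.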

Preventing “bench pass, chassis fail” (force coupling paths into the matrix)

Why chassis differs
  • ground reference shifts and CM noise increases
  • thermal gradients and soak effects
  • power mode changes and fan/load transients
  • neighbor interface coupling
What to do
  • repeat key corners in the full system
  • log the same minimal fields as bring-up
  • align errors with state/event transitions
  • keep one-variable changes during debug

Pass/fail criteria templates (define the measurement window)

Offset stability
After event (EEE exit / retrain), in window T: offset step < X ns.
Drop rate
Over T hours: drops < X / hour.
Recovery time
From event to stable state: < X ms (stable means counters stop increasing and offset returns within range).
Timestamp validity
Valid timestamps > X% across corners (long + hot included).
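The four templates above reduce to one window-evaluation function. A minimal sketch, with all limit values as placeholders to be filled from the system budget:

```python
# Sketch: evaluate the four pass/fail templates over one measurement window.
def evaluate_window(offset_step_ns, drops, hours, recovery_ms,
                    ts_valid, ts_total, limits):
    return {
        "offset_ok":   offset_step_ns < limits["step_ns"],
        "drop_ok":     (drops / hours) < limits["drops_per_hour"],
        "recovery_ok": recovery_ms < limits["recovery_ms"],
        "ts_ok":       (100.0 * ts_valid / ts_total) > limits["ts_valid_pct"],
    }

# Placeholder limits: replace each with the real system budget.
limits = {"step_ns": 50, "drops_per_hour": 1, "recovery_ms": 10,
          "ts_valid_pct": 99.9}
verdict = evaluate_window(offset_step_ns=32, drops=2, hours=24,
                          recovery_ms=4.2, ts_valid=99998, ts_total=100000,
                          limits=limits)
print(all(verdict.values()))
```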
Diagram: Validation matrix map (card-grid, not a dense table)
[Card-grid matrix. Conditions (cable, temp, peer, power, EEE, TS) × metrics (BER, drops, recovery, step, validity, retrain), with high-risk corners highlighted. Legend: RUN = execute test, LOG = record minimal fields, GATE = release criterion.]

Engineering checklist (design → bring-up → production)

This checklist turns the page's key points around determinism / low jitter / EEE / PTP / robustness into executable checks, gated by stage: the Design gate prevents board re-spins, the Bring-up gate prevents misdiagnosis, and the Production gate prevents "Monday yield collapse" and station-to-station mismatch.

Design gate · Build “determinism” into schematic + layout first

  • Quantify the refclk budget first: refclk phase noise/jitter must match the TSN timestamp error budget (placeholder: refclk jitter < X ps RMS, fill X per system budget), and explicitly define probe points (XO output pin / PHY refclk pin / MAC-side txclk).
  • Lock down one interface/timing strategy: for RGMII, choose either internal delay or board delay (one and only one); for SGMII/2500BASE-X, define refclk (25/125 MHz) + jitter requirement to avoid “eye passes but offset jumps.”
  • Place magnetics + protection by return-path logic: ESD/surge return must avoid the PHY ground-reference sensitive area; prioritize “clamp energy in the outside zone first” when setting distances between diff pairs, protection, and connector.
  • Partition rails + add low-noise post-regulation: for PHY analog/PLL rails, prefer a high-PSRR low-noise LDO as a second-stage cleanup; DC/DC switching ripple should support repeatable injection testing (placeholder: under ripple X mVpp, offset/jitter does not degrade).
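The refclk budget in the first bullet can be sanity-checked with a root-sum-square combination of independent jitter contributors. A sketch under the usual assumption that the sources are uncorrelated; all numbers are illustrative placeholders, not datasheet values:

```python
# Sketch: RSS-combine independent RMS jitter contributors against the
# TSN timestamp error allowance for the refclk path.
import math

def rss_ps(contributors_ps):
    """RSS combination of independent RMS jitter sources, in ps."""
    return math.sqrt(sum(j * j for j in contributors_ps))

budget_ps = 5.0                  # placeholder: total refclk RMS allowance
contributors = {
    "XO intrinsic":   1.5,       # measured at the XO output pin
    "PLL added":      2.0,       # PHY refclk pin vs XO output
    "supply-induced": 1.0,       # delta under ripple injection
}
total = rss_ps(contributors.values())
print(round(total, 2), total < budget_ps)
```

Mapping each contributor to one of the defined probe points (XO output, PHY refclk pin, MAC-side txclk) is what makes the budget measurable instead of aspirational.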

Reference part numbers (examples; re-verify package/grade/availability)

PHY (candidates with PTP / timestamp capability)

  • Microchip LAN8841-V/Q2A (10/100/1G; RGMII/GMII/MII; IEEE 1588 v2 timestamp)
  • TI DP83867IRRGZ (10/100/1G; RGMII/SGMII; IEEE 1588 SFD / sync capability)
  • Marvell Alaska 88E1512/88E1514 (10/100/1G; IEEE 1588 v2 / 802.1AS; includes cable-diagnostics-class features)
  • MaxLinear MxL86288I (2.5GBASE-T multi-port; NBASE-T; for 2.5G TSN ports / multi-port use cases)

Clock source (25 MHz examples)

  • Abracon ASV-25.000MHZ-LC-T (25 MHz SMD oscillator)
  • SiTime SiT1602AI-22-33E-25.000000 (25 MHz MEMS oscillator; helps standardize across voltages/inventory)

Low-noise power (2nd-stage LDO examples)

  • TI TPS7A20 (high-PSRR low-noise LDO for PLL/analog post-regulation)
  • ADI ADP150 (ultra-low-noise LDO for sensitive analog/clock rail cleanup)

Connector magnetics / ESD protection (examples)

  • Würth 7499010121A (specific example within an RJ45 integrated-magnetics family; selection must match rate/pinout/EMC)
  • Pulse J0011D21BNL (10/100 Base-TX integrated-magnetics RJ45 example; for low-speed ports/control ports)
  • Nexperia PESD2ETH-D (dual-line low-cap ESD diode for high-speed lines)
  • Littelfuse SP3012-04UTG (specific example within a low/ultra-low-cap TVS array family)
  • Würth 744231371 (2-line common-mode choke example; “last-mile” tuning for cable common-mode noise)
  • TDK ACT45B-101-2P-TL003 (signal-line common-mode filter example; re-check insertion loss vs target band)

Note: part numbers are “implementation anchors + datasheet lookup helpers,” not universal answers. Within a single family, variants often differ by speed grade, pinout, capacitance, package suffix, etc.

Bring-up gate · Turn “autoneg/EEE/PTP” into repeatable A/B tests

  • Minimal-system baseline: fix cable / peer / temperature first; run the baseline of “lock speed + EEE OFF + PTP OFF,” and confirm BER/link-flap rate/error counters are zero or at the noise floor.
  • Add complexity one variable at a time: enable only one variable (EEE or PTP or speed) per step; every step must be roll-backable, reproducible, and explainable.
  • EEE validation focus: log LPI enter/exit counts and exit latency (placeholder: exit X µs), and check whether offset shows a step at the exit instant (placeholder: step X ns).
  • PTP timestamp validation focus: fix speed + cable length; do loopback/dual-end correlation; ensure the timestamp path is not distorted by FIFO/clock-domain crossings that introduce “rate-dependent bias,” and validate temperature-drift slope is calibratable.
  • Use cable diagnostics/TDR only as a locator: first remove EEE/peer/power-noise via correlation checks, then enable diagnostics—avoid mislabeling “clock/power issues” as “cable issues.”
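The EEE validation step above (does the offset step at the LPI exit instant?) is a simple event-alignment check. A minimal sketch, with timestamps in seconds and an illustrative 1 ms correlation window:

```python
# Sketch: align each PTP offset step with the nearest LPI exit event and
# report which steps fall inside the correlation window.
def correlated_steps(step_times, lpi_exit_times, window_s=0.001):
    """Return the offset-step instants within window_s of an LPI exit."""
    hits = []
    for t in step_times:
        if any(abs(t - e) <= window_s for e in lpi_exit_times):
            hits.append(t)
    return hits

lpi_exits = [1.000, 2.500, 4.010]        # logged LPI exit instants
steps     = [1.0004, 3.300, 4.0101]      # logged offset-step instants
hits = correlated_steps(steps, lpi_exits)
print(len(hits), len(steps))  # if most steps align with exits, suspect EEE
```

If the hit ratio stays high across repeated runs, the EEE state machine is implicated; if steps occur with EEE forced OFF, look at power or clock coupling instead.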

Minimum fields to log during bring-up (for reproduction + correlation)

  • link up/down
  • speed/duplex
  • autoneg result
  • EEE state / LPI count
  • PTP enable + mode
  • timestamp health
  • CRC/FCS errors
  • symbol/alignment errors
  • die temp / board temp
  • supply ripple snapshot

Production gate · Make “station consistency” a system capability

  • Fixture/cable standardization: define a golden cable (length/category/shielding/bend radius) and a golden peer (peer model/firmware/port config); re-check insertion loss and contact resistance drift weekly.
  • Mandatory environment fields in logs: temperature/humidity/airflow/enclosure door state/power-station ID/fixture version; otherwise “Monday yield collapse” is hard to root-cause.
  • Sampling must cover state-machine edges: frequent EEE in/out, frequent PTP servo updates, power-disturbance injection, long vs short cable A/B; avoid testing only steady-state.
  • Criteria must be numeric: flap rate X /hour; recovery time X ms; offset step X ns; temp drift slope X ns/°C (fill X per system budget).

The core of the production gate is not “testing more,” but controlling key variables and making every anomaly traceable to one of: cable / peer / environment / configuration / power.

Checklist map (stage-gated)

Blocks are “must-pass” gates for determinism + TSN readiness

[Three-stage pipeline. Design gate: refclk budget & measurement points; MAC interface timing strategy (RGMII/SGMII); magnetics + ESD placement & return paths; power domains & low-noise LDO staging; layout rules (diff match / planes / stitching). Bring-up gate: baseline (lock speed, EEE OFF, PTP OFF); autoneg + counters + error taxonomy; EEE A/B (LPI enter/exit); PTP timestamp health (loopback / dual-end); cable diagnostics only after correlation checks. Production gate: golden peer + golden cable + fixture versioning; mandatory environment fields in logs; sampling covers state edges (EEE/PTP transitions); stress knobs (ripple / temp / long cable / peer swap); numeric pass/fail (< X ns step, < X/hour flap). Gate fails → roll back one variable → re-run baseline → only then escalate to deeper PHY internals.]

Applications + IC selection logic (PHY-focused)

This section only covers Ethernet PHY-side application points and selection logic that strongly impact TSN/determinism: speed / interface / clock / EEE / PTP timestamp / EMC robustness / power & thermal. Protocol stack and switching/scheduling details are intentionally out of scope for this page.

Typical applications (strongly PHY-related)

Industrial TSN endpoints (motion control / drives / I/O nodes)

  • Key focus: calibratable PHY hardware timestamp path + bounded delay jitter (FIFO/EEE/adaptation impacts are controlled).
  • Engineering anchor: coupling from refclk/power noise into timestamp/offset must be measurable, controllable, and reproducible.

Industrial controllers / gateways (single or dual ports)

  • Key focus: whether EEE is allowed (many deterministic systems force-disable it or bound policy) and link recovery time (placeholder: < X ms).
  • Engineering anchor: autoneg/downshift/peer-compat issues must be quickly classified via “EEE OFF / lock speed / swap peer.”

2.5G uplinks (multi-Gig backhaul / port aggregation)

  • Key focus: 2.5GBASE-T is more sensitive to cabling/magnetics/EMI; the capacitance + diff-matching “margin” is smaller.
  • Engineering anchor: prioritize a test matrix proving that under “cable length / temperature / power disturbance / EEE / PTP,” BER/flap rate and offset criteria still hold.

Automotive Ethernet note (avoid overlap with sibling pages)

If you later build a dedicated “Automotive Ethernet PHY” subpage, handle it via an internal link here. This page does not expand into T1/TC10/automotive topology or harness constraints to avoid topic overlap.

Selection logic (scoring + decision tree)

Selection is not “comparing datasheet numbers,” but working backward from TSN goals: trustworthy timestamps, bounded latency, controlled EEE side effects, passable EMC/ESD/surge, maintainable production consistency.

Scoring dimensions (PHY-focused)

  • Speed & MAC interface: 10/100/1G/2.5G; RGMII/SGMII/2500BASE-X/USXGMII (choose based on real SoC/FPGA constraints).
  • PTP hardware timestamp: sampling point, available calibration/compensation mechanisms, GPIO/interrupt/1PPS support, driver/register accessibility.
  • Determinism controls: FIFO depth + bypass ability; whether adaptation/power-saving states impact latency in a bounded/observable way.
  • EEE behavior: LPI enter/exit timing, exit transient impact on offset/BER, peer-compat risk and configurable policy.
  • Robustness: ESD/surge path design difficulty, protection-capacitance tolerance, magnetics compatibility window, long-cable/high-temp stability.
  • Power & thermal: per-speed power, temperature rise, package thermal capability, number of power domains + cleanup cost.

Reference “starter set” (BOM anchors)

Ethernet PHY candidates

LAN8841-V/Q2A, DP83867IRRGZ, 88E1512/88E1514, MxL86288I

Clock + power cleanup

ASV-25.000MHZ-LC-T, SiT1602AI-22-33E-25.000000, TPS7A20, ADP150

Magnetics + protection

7499010121A, J0011D21BNL, PESD2ETH-D, SP3012-04UTG, 744231371, ACT45B-101-2P-TL003

Recommended workflow: use the “starter set” to pass bring-up + the verification matrix first, then converge magnetics/protection by EMC/power/cost. Replace the PHY itself only when “timestamp/interface/diagnostics capability” is insufficient.

Selection decision tree (TSN-first)

Short labels only; details stay in text to keep the diagram clean

[Decision tree. Start: TSN / determinism required? → need PHY hardware timestamp? → max link rate (1G or 2.5G) → MAC-side constraint (RGMII/SGMII/other) → EEE policy (OFF / bounded / allowed). 1G branch: TSN-friendly PHY set (examples: LAN8841, DP83867, 88E151x); focus on timestamp path + bounded latency + diagnostics. 2.5G branch: TSN / multi-port PHY set (example: MxL86288I); focus on magnetics/ESD capacitance budget + EMC margin. Common board-level anchors: RJ45/magnetics 7499010121A, J0011D21BNL; ESD PESD2ETH-D, SP3012-04UTG.]

FAQs (TSN / EEE / PTP / determinism — PHY-focused)

Each FAQ is intentionally short and executable (no protocol deep-dives): Likely cause → Quick check → Fix → Pass criteria (thresholds use placeholders "X" to be filled from the system budget). Example part numbers are provided as "BOM anchors" only (e.g., PESD2ETH-D, SP3012-04UTG, 744231371, TPS7A20, ADP150, SiT1602) — verify package/suffix/ratings/availability.

PTP / EEE · PTP offset occasionally "steps" — suspect EEE exit or inconsistent timestamp capture point?
Likely cause A transient on EEE (LPI) exit changes the timestamp path/clock domain behavior, or the system mixes MAC-side vs PHY-side capture points across builds/ports.
Quick check (1) A/B: EEE OFF vs ON, count offset steps per hour and align each step to LPI exit events. (2) Confirm one consistent capture mode: PHY HW timestamp enabled (or MAC), not mixed.
Fix Keep a deterministic policy: disable EEE for TSN-critical windows, or enforce a bounded EEE policy; standardize on a single timestamp capture point and apply the vendor’s correction/calibration flow. If supply/clock coupling is suspected, isolate PLL/clock rails with a low-noise LDO stage (e.g., TPS7A20 or ADP150).
Pass criteria With EEE policy applied: offset step < X ns and step rate < X/hour over a soak window T, with steps not correlated to LPI exit.
Correlation · Same board, different switch peer → very different PTP accuracy. What is the first end-to-end correlation check?
Likely cause The peer is driving different EEE advertisement/policy, different timestamping behavior, or different link conditions (rate/duplex, downshift, retries), which changes the effective time error seen at the endpoint.
Quick check Normalize the physical layer first: (1) lock the same speed/duplex; (2) A/B with EEE forced OFF on both ends; (3) compare offset jitter while logging LPI enter/exit counts and link partner ID/capabilities.
Fix Standardize port policy: enforce the same EEE behavior, timestamp mode (PHY HW timestamp vs MAC), and link configuration across switches. If the switch peer is non-negotiable, prioritize a PHY with robust HW timestamp + diagnostics (e.g., LAN8841, DP83867, 88E151x) and validate per-peer profiles.
Pass criteria Under normalized settings: offset jitter < X ns RMS and no peer-dependent bias > X ns over window T.
2.5G / AN · 2.5G links up, but later drops to 1G. Check autoneg first, or cable/magnetics bandwidth first?
Likely cause A marginal 2.5G channel triggers downshift / renegotiation (cable loss, magnetics limits, excess ESD capacitance, or supply/thermal drift).
Quick check (1) Read and log downshift / AN-restart counters at the event time. (2) A/B: known-good short Cat6 cable vs the failing cable; then A/B: force 2.5G (no AN) if supported.
Fix If channel-limited: tighten magnetics/EMC BOM and placement, reduce line capacitance (e.g., replace a high-C array with a low-C Ethernet TVS such as PESD2ETH-D or SP3012-04UTG where appropriate), and tune common-mode control (e.g., a CMC like Würth 744231371 as a tuning knob). If AN-limited: standardize advertisements and disable aggressive downshift policies.
Pass criteria At 2.5G worst-case: downshift events = 0 and link uptime > (100% − X) over T, with RX/PCS errors below threshold X.
Long / Hot · Link flaps only on long cable or at high temperature — which 3 PHY states/counters should be logged first?
Likely cause A reduced margin corner exposes channel loss + EQ/CDR sensitivity, or supply/thermal drift that triggers renegotiation, retrain, or LOS events.
Quick check Log these three “first responders”: (1) link state + negotiated speed/duplex; (2) AN restart / downshift / retrain counters; (3) RX-side error counters (PCS/alignment and FCS/CRC). Correlate flap events with die temperature and supply ripple snapshots.
Fix If errors rise before the flap: improve channel margin (better cable category, magnetics choice/placement, reduce protection capacitance, tune CMC). If no errors but AN restarts spike: stabilize power/thermal and pin the link policy (lock speed for validation; then re-enable AN only after margin is proven).
Pass criteria Worst-case (long + hot): link flap < X/hour, AN restart = 0 (or < X) over window T, and error counters remain below X.
EEE · EEE ON: throughput is fine, but latency jitter increases. How to prove it is caused by EEE state transitions?
Likely cause The Active ↔ LPI ↔ Wake transitions change buffering and clock behavior, adding a bursty component to latency even if average throughput remains unchanged.
Quick check (1) Collect a latency histogram with timestamps of LPI enter/exit; overlay jitter spikes on exit events. (2) A/B: EEE OFF vs ON at the same link rate and traffic pattern; keep all other variables fixed.
Fix Disable EEE for TSN-critical ports, or raise the EEE entry threshold so LPI is rare during deterministic traffic. If EEE must remain enabled, require a bounded wake behavior and validate across peers and temperature (do not assume peer compatibility).
Pass criteria With the final EEE policy: p99.9 latency jitter < X ns and wake recovery < X ms over window T, with no jitter bursts aligned to LPI exit.
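The latency-histogram A/B in the quick check reduces to comparing a high percentile between the EEE OFF and EEE ON runs. A sketch on synthetic data, using a simple nearest-rank percentile (the sample values and spike magnitude are invented for illustration):

```python
# Sketch: compare p99.9 latency between EEE OFF and EEE ON runs; rare
# wake-up spikes move the tail while leaving the average almost unchanged.
def percentile(samples, p):
    """Nearest-rank percentile (p in [0, 100])."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

eee_off = [100 + (i % 5) for i in range(1000)]   # tight distribution, ns
eee_on  = eee_off[:-10] + [1000] * 10            # 1% wake-up spikes added

print(percentile(eee_off, 99.9), percentile(eee_on, 99.9))
```

Overlaying the spike timestamps on logged LPI exit events then turns the tail difference into direct evidence, not just suspicion.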
RGMII · RGMII shows occasional CRC errors, but the line side looks clean. Check internal delay / routing skew or I/O supply noise first?
Likely cause Most often it is RGMII timing margin (wrong internal delay mode, skew, edge rate) — but a close second is I/O rail noise / ground bounce corrupting sampling.
Quick check (1) A/B: toggle the PHY’s RGMII internal delay configuration (ID on/off) and see whether CRC errors track the setting. (2) Scope the I/O rail ripple and correlate CRC bursts with rail noise and simultaneous switching events.
Fix Lock one timing strategy: correct internal delay mode + enforce length matching; add modest series damping where needed (board-specific). If rail noise is implicated, strengthen decoupling and isolate sensitive rails with a clean LDO stage (e.g., TPS7A20) and tighten the ground return stitching near the PHY.
Pass criteria At worst-case traffic and temperature: CRC errors = 0 over T, and timing margin remains > X (as defined by the interface budget).
SGMII · SGMII "looks locked" but occasionally fails training — verify refclk quality first, or lane termination first?
Likely cause If training failures are rare and temperature/voltage-dependent, refclk jitter / supply noise is often the first suspect; if failures are deterministic by board/cable, then lane SI/termination is more likely.
Quick check (1) Swap to a known low-jitter 25 MHz source (e.g., SiT1602…25.000000 or ASV-25.000MHZ-LC-T) and re-run training statistics. (2) If still failing, validate lane termination/coupling against the PHY reference design and check for skew/return-path discontinuities.
Fix Improve refclk integrity (clean routing, isolation, stable rail; add a low-noise LDO stage such as ADP150 where applicable) and match the recommended SGMII termination/AC coupling scheme. Then lock the interface configuration (no mixed modes across builds).
Pass criteria Training failures = 0 over soak T across temperature range, and interface error counters remain below X.
EMI · EMI fails in one band — suspect common-mode leakage or clock-harmonic coupling? What is the "one-step" validation?
Likely cause A narrow band peak is often either clock-harmonic coupling (refclk/PLL domains) or common-mode conversion at magnetics/connector due to return-path asymmetry.
Quick check Do a single A/B change while measuring the failing band: populate/bypass the common-mode choke (e.g., try a CMC option such as 744231371) to see if the peak moves/drops. If it does not, A/B the clock source path (swap to a cleaner oscillator like SiT1602) and observe peak correlation.
Fix If common-mode dominated: enforce symmetry and return stitching, tune CMC, and reduce parasitic imbalance near the connector. If clock-harmonic dominated: shorten and shield the clock path, isolate the clock/PLL rail with a clean LDO stage (TPS7A20/ADP150), and reduce coupling loops in layout.
Pass criteria EMI margin at the failing band improves to ≥ X dB under the same test setup, with no regression in link stability or TSN timing metrics.
ESD · ESD hit drops the link, but reconnect is normal. What is the most common return-path mistake?
Likely cause ESD current returns through the PHY reference ground (or sensitive analog/PLL ground) instead of being dumped at the connector/chassis region, causing a transient reset/LOS/renegotiation.
Quick check Verify placement and return: TVS arrays (e.g., PESD2ETH-D / SP3012-04UTG) should be connector-side with a short, low-inductance return to the intended ESD sink (often chassis/connector ground). Correlate link drops with PHY reset/interrupt flags.
Fix Re-route the ESD return so the discharge loop closes locally at the connector region; add stitching vias/short paths; keep the PHY zone “quiet.” Use low-capacitance protection where the channel margin is tight, and re-validate at the worst-case link rate.
Pass criteria Under the target ESD level: no link drop (preferred), or auto-recovery < X ms with no speed downgrade and no persistent error-counter increase.
Diagnostics · Cable diagnostics/TDR reports a short, but swapping the cable "fixes it." How to tell measurement artifact vs real defect?
Likely cause The diagnostic ran under inconsistent conditions (link/EEE state, peer behavior, calibration), or the connector contact is intermittent — creating a false short signature.
Quick check Run diagnostics only in a controlled state: EEE OFF, stable link policy (or link forced down per vendor guidance), and with a known “golden cable.” Repeat N times; if the result is not repeatable, treat it as an artifact.
Fix Standardize the diagnostic procedure (state + temperature + cable type), update PHY firmware/driver if required, and treat intermittent connector contact as a hardware defect. Use diagnostics as a secondary tool after correlation checks, not as the first root-cause verdict.
Pass criteria Under the standardized procedure: false-short rate = 0 / N, and any reported defect is repeatable across runs and correlates with independent evidence (errors/flaps).
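The repeat-N rule in the quick check can be made mechanical: accept a verdict only if every controlled run agrees, otherwise classify it as an artifact. A minimal sketch with invented verdict strings:

```python
# Sketch: a diagnostic verdict is "real" only if it repeats across N
# controlled runs (EEE OFF, golden cable, same state each time).
def classify(results, n_required=None):
    """results: list of verdict strings from N repeated diagnostic runs."""
    n_required = n_required or len(results)
    counts = {}
    for r in results:
        counts[r] = counts.get(r, 0) + 1
    verdict, hits = max(counts.items(), key=lambda kv: kv[1])
    return verdict if hits >= n_required else "artifact"

print(classify(["short@12m"] * 5))                            # repeatable defect
print(classify(["short@12m", "ok", "short@3m", "ok", "ok"]))  # non-repeatable
```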
Chassis · PTP is stable on the bench, but drift increases in the chassis. First priority: airflow baffle, ground bounce, or supply noise?
Likely cause In chassis, two new couplings dominate: supply noise / ground bounce from shared rails and return paths, and temperature gradients (airflow) that change oscillator/PLL behavior.
Quick check Start with the highest information gain: (1) log PHY die temp and PLL/I/O rail ripple at the same cadence as drift; (2) A/B: power from a clean bench supply vs system supply; (3) only then A/B a simple airflow baffle.
Fix If drift tracks supply/ground: add a dedicated low-noise stage for clock/PLL rails (TPS7A20 / ADP150), improve return stitching, and reduce shared high-di/dt coupling. If drift tracks temperature gradients: improve airflow guidance and keep the clock source stable (e.g., a robust oscillator like SiT1602).
Pass criteria In chassis worst-case: drift slope < X ns/°C and offset stability meets the system requirement over soak T, with no correlation to rail ripple beyond X mVpp.
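The drift-slope criterion (X ns/°C) can be extracted from the logged pairs of die temperature and PTP offset with an ordinary least-squares fit. A sketch on synthetic data (the temperature and offset values are invented for illustration):

```python
# Sketch: estimate the temperature-drift slope of PTP offset, in ns/°C,
# from logged (die temp, offset) pairs via ordinary least squares.
def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

die_temp_c = [35.0, 45.0, 55.0, 65.0, 75.0]
offset_ns  = [10.0, 14.0, 18.2, 22.1, 25.9]   # offset grows with temperature
slope = ols_slope(die_temp_c, offset_ns)
print(round(slope, 3))
```

If the slope exceeds the budget, the same data correlated against rail ripple decides whether the coupling is thermal or supply-driven.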
Field / Logs · Compliance passes, but the field shows intermittent downshift. Which log field is usually missing (EEE / PTP / temperature / cable info)?
Likely cause Field downshift is often triggered by a state transition or corner condition that compliance didn’t exercise (peer diversity, EEE policy, supply/thermal drift, cable variability).
Quick check Compare “good vs bad” field captures: if downshift events exist but EEE state (LPI enter/exit count at event time) is missing, root-cause classification becomes impossible. Correlate downshift with temperature and cable category/length if available.
Fix Add EEE state + LPI enter/exit counters as mandatory fields, plus the peer ID/capabilities and a minimal cable descriptor. Then re-run the validation matrix at the field-relevant corner (long + hot + peer diversity + EEE policy).
Pass criteria Field events become classifiable: each downshift is attributable to a bucket (EEE/peer/cable/temp/supply) with > X% confidence, and the final policy holds downshift < X/hour over T.