An Ethernet PHY is not just “link up or down” for TSN: it must keep time and latency bounded under real conditions.
This page shows how to design, measure, and validate low-jitter clocks, deterministic latency, EEE side-effects, and PHY hardware timestamp paths so the link stays stable and the timing stays trustworthy.
What an Ethernet PHY is (and what TSN expects from it)
An Ethernet PHY is the physical-layer engine that turns MAC-side digital traffic into a compliant electrical link (and back),
while managing clocks, training, link states, and—when required—hardware timestamp hooks that determine timing repeatability.
Scope boundaries (to prevent topic overlap)
In scope (this page)
Ethernet PHY functions for 10/100/1G/2.5G electrical links
Low-jitter clocking paths and how they affect determinism
EEE (802.3az) behavior as a timing/latency side-effect source
Management interface control (e.g., MDIO register access)
MAC (host-side digital)
May provide timestamping—but the capture location can be far from the line
PHY (line-side mixed-signal + DSP)
Physical coding/decoding, analog front-end, link training
Clock recovery/PLL behavior that shapes jitter and lock stability
EEE state machine transitions (enter/exit) that can introduce steps
Hardware timestamp capture/insert points tied to physical events
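Register access over the management interface is typically IEEE 802.3 Clause-22 MDIO. As a sketch of what a driver puts on the wire, the 32-bit management frame that follows the preamble can be packed like this (the field widths come from the standard; the PHY address, register, and value below are only illustrative):

```python
def mdio_c22_frame(read: bool, phy_addr: int, reg_addr: int, data: int = 0) -> int:
    """Build the 32-bit Clause-22 frame that follows the 32-bit preamble.

    Layout (MSB first): ST(2) OP(2) PHYAD(5) REGAD(5) TA(2) DATA(16).
    """
    assert 0 <= phy_addr < 32 and 0 <= reg_addr < 32 and 0 <= data < (1 << 16)
    st = 0b01                      # start-of-frame bits
    op = 0b10 if read else 0b01    # opcode: read = 10, write = 01
    ta = 0b10                      # turnaround (driven by the station on writes)
    return (st << 30) | (op << 28) | (phy_addr << 23) | (reg_addr << 18) | (ta << 16) | (data & 0xFFFF)

# Example: write 0x8000 (soft-reset bit) to BMCR (register 0) of PHY address 1.
frame = mdio_c22_frame(read=False, phy_addr=1, reg_addr=0, data=0x8000)
```

Clause-45 devices use a different (two-cycle, address/data) frame; the sketch above covers only the Clause-22 case named in the list.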
Practical rule
Determinism is rarely limited by average throughput. It is limited by bounded behavior:
clock integrity, latency variability, and timestamp integrity must be measurable, repeatable, and stable across environment changes.
A) Clock integrity
Quick check
Verify refclk quality meets the system jitter budget (< X) and lock is stable across temperature and cable conditions.
Engineering hook
Must expose lock/training state and (ideally) allow selecting a clean external refclk.
Pass criteria
No clock-related drops; recovered/derived clocks remain within the defined jitter envelope under stress.
B) Bounded latency
Quick check
Confirm latency is either fixed-mode or at least measurable/compensable; detect step events down to X ns.
Engineering hook
Track FIFO alignment, training convergence, and EEE exit behavior as the primary variability sources.
Pass criteria
Under controlled A/B conditions, peak-to-peak latency variation stays below the system bound (X).
C) Timestamp integrity
Quick check
Determine where timestamps are captured (near the line vs host-side). Target resolution/precision < X.
Engineering hook
Identify clock-domain crossings and buffers between capture point and software-visible timestamp registers.
Pass criteria
Offset steps remain bounded (X) and do not correlate with EEE transitions, relinks, or temperature ramps.
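The pass criterion above can be scripted once offsets and EEE counters share a timebase. A minimal sketch, assuming a sampled offset log and a list of LPI-exit times (the thresholds stand in for the "X" placeholders and are filled per system budget):

```python
STEP_LIMIT_NS = 50.0    # placeholder "X": maximum allowed offset step
CORR_WINDOW_S = 0.010   # steps within 10 ms of an LPI exit count as correlated

def find_steps(t, offset_ns, limit):
    """Return the times where consecutive offset samples jump by more than `limit`."""
    return [t[i] for i in range(1, len(t)) if abs(offset_ns[i] - offset_ns[i - 1]) > limit]

def correlated(step_times, event_times, window):
    """Keep only the steps that land within `window` of a logged event."""
    return [s for s in step_times if any(abs(s - e) <= window for e in event_times)]

# Synthetic data: one 80 ns step at t = 60 ms, with an LPI exit logged just after.
t = [i * 0.001 for i in range(100)]
offset = [5.0] * 60 + [85.0] * 40
lpi_exits = [0.0601]

steps = find_steps(t, offset, STEP_LIMIT_NS)
bad = correlated(steps, lpi_exits, CORR_WINDOW_S)
# A non-empty `bad` list fails the pass criterion: the step tracks an EEE event.
```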
Diagram: System layering and TSN-relevant hooks (clocks, EEE, timestamps)
Modes & interfaces: 10/100/1G/2.5G and MAC-side buses
PHY selection becomes predictable when rate, MAC-side interface, and clocking are treated as one decision.
The goal is not only link-up—it is repeatable timing behavior: bounded latency, stable clocks, and a timestamp path that stays consistent under stress.
Inside the PHY: PCS/PMA, DSP blocks, and where determinism is lost
Deterministic timing depends on predictable internal paths. Ethernet PHYs combine analog front-end blocks, digitization, adaptive DSP,
coding/decoding, and multiple clock domains. The same link can be “up” while timing repeatability is degraded by state transitions,
adaptive convergence, and clock-recovery dynamics.
Internal block relationships (what each stage changes)
PMA/AFE + ADC/DAC
Defines line-side signal integrity and noise coupling points
Sets SNR margins that limit DSP headroom and lock robustness
Susceptible to supply noise and common-mode disturbances
DSP / EQ (adaptive)
Adaptive equalization converges differently across cables/temperature
Convergence and re-training can introduce step-like timing shifts
Error bursts often correlate with adaptation events
PCS (coding/decoding)
Controls symbol alignment, buffering, and link state transitions
Elastic buffers can create bounded but real variability
State changes (autoneg/relink) are common sources of steps
PLL/CDR (clock recovery)
Defines jitter tolerance and how refclk noise transfers into the link
Lock/relock dynamics create time windows of higher variability
Temperature drift can shift loop behavior and margins
“Determinism killers” (grouped by what they look like in measurements)
Step events
EEE exit / LPI transitions (state gate toggles)
Link retrain / renegotiation / buffer realignment
Timestamp capture point mode changes or re-sync
Fast check
Correlate timing steps with EEE/link-state counters and lock/relock events.
Short-term jitter
Refclk phase noise transfer through PLL/CDR
Supply noise coupling into PLL/AFE bias networks
Measurement settings that hide spurs or amplify artifacts
Fast check
Swap clock source and re-measure at multiple points (refclk vs MAC IF vs recovered clock).
Slow drift
Temperature drift shifting CDR loop behavior and margins
AFE bias drift or common-mode shifts under load
Clock-source frequency drift interacting with timestamp domains
Fast check
Log temperature and airflow changes and align them against drift slope (X per °C).
Key takeaway
A cleaner refclk mainly improves noise-floor-driven jitter. It does not automatically remove step events caused by state changes
(EEE exit, retraining, buffer realignment) or slow drift driven by thermal behavior. Classify the symptom first.
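Classifying the symptom first can be automated from a single timing series. A rough sketch with illustrative thresholds (not vendor limits): a large sample-to-sample jump reads as a step, a large start-to-end shift as drift, and elevated RMS of the remainder as short-term jitter.

```python
def classify(samples, step_ns=20.0, drift_ns=10.0, jitter_ns=2.0):
    """Bucket a timing series into step / drift / jitter / within-envelope."""
    n = len(samples)
    diffs = [samples[i] - samples[i - 1] for i in range(1, n)]
    if max(abs(d) for d in diffs) > step_ns:
        return "step"            # state change: EEE exit, retrain, buffer realign
    if abs(samples[-1] - samples[0]) > drift_ns:
        return "drift"           # thermal or bias drift
    mean = sum(samples) / n
    rms = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5
    return "jitter" if rms > jitter_ns else "within-envelope"
```

The order matters: a step also inflates RMS, so it must be tested first, exactly as the takeaway above suggests.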
Five internal observation points used throughout this page
① PLL/CDR lock events
Confirms whether timing excursions correlate with lock/relock windows and jitter tolerance margins.
② Training/EQ convergence
Detects adaptive convergence changes that often precede error bursts and latency variability.
③ EEE state transitions
Provides the first correlation test for step-like offset/latency jumps caused by LPI entry/exit.
④ Timestamp engine status
Identifies capture/insert mode, domain crossings, and whether corrections are applied consistently.
⑤ Error/quality counters
Separates physical-layer errors from higher-layer symptoms and supports one-variable A/B experiments.
Low-jitter clocks: reference, recovered clock, jitter transfer & tolerance
Clock quality is only useful when it is mapped to measurable points and bounded outcomes.
A clock plan must specify which domain drives refclk, how PLL/CDR transfers noise, where clocks are observed,
and what pass criteria define “good enough” under stress.
Clock sources and roles (engineer view)
Common sources
On-board crystal / oscillator (XO)
External low-jitter XO (dedicated)
SoC-generated clock (shared tree)
Synchronized clock feed (PHY-side only)
Key roles
refclk: sets PLL/CDR input noise and lock robustness
Clean refclk reduces noise-floor-driven jitter. It does not guarantee removal of step events caused by state changes
(EEE exit, relink, buffer realignment). Measurements must separate noise-floor problems from state-driven steps.
Where to measure (multi-point) and how to avoid artifacts
RBW/VBW/window changes that “improve” plots while system behavior stays unchanged
Probing/ground loop effects that add spurs
Excess averaging that hides rare step events
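The third artifact is easy to demonstrate: a one-sample step that is obvious in raw data nearly vanishes under a long moving average. A synthetic sketch:

```python
def moving_average(x, n):
    """Trailing moving average with a growing window at the start."""
    return [sum(x[max(0, i - n + 1): i + 1]) / min(n, i + 1) for i in range(len(x))]

raw = [0.0] * 50 + [40.0] * 50          # a clean 40 ns step
avg = moving_average(raw, 32)

raw_jump = max(abs(raw[i] - raw[i - 1]) for i in range(1, len(raw)))   # 40.0
avg_jump = max(abs(avg[i] - avg[i - 1]) for i in range(1, len(avg)))   # 40/32 = 1.25
# The averaged trace only ever moves by 40/32 ns per sample: the step is hidden.
```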
Clock budget template (X placeholders) + first 3 correlation checks
Budget template
refclk jitter (integrated X1–X2): < X
PLL/CDR output jitter: < X
Host interface timing margin: > X
Worst-case Δjitter across stress: < X
Lock/relock stability window: < X
Pass criteria
The defined bounds remain valid with temperature, cable length, and EEE/link events.
Correlation checks
1) Swap clock source (A/B)
Confirms whether the symptom tracks refclk noise floor or remains unchanged (suggesting a state-driven step).
2) Move measurement point
Separates “clean refclk” from degraded MAC IF/TS domains, indicating internal crossings or gating effects.
3) Change RBW/VBW/window
Detects settings artifacts: if plots “improve” but link/timing behavior does not, the configuration is masking the true issue.
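The budget template lends itself to a mechanical check. A sketch with placeholder limits (every number below stands in for an "X" from the template):

```python
# Budget lines: (comparison, limit). Values are placeholders, not recommendations.
BUDGET = {
    "refclk_jitter_ps":       ("<", 1.0),   # integrated X1–X2
    "pll_output_jitter_ps":   ("<", 2.0),
    "host_if_margin_ps":      (">", 50.0),
    "stress_delta_jitter_ps": ("<", 0.5),   # worst-case Δjitter across stress
    "relock_window_ms":       ("<", 10.0),
}

def check_budget(measured: dict) -> list:
    """Return the budget lines that fail for one measured condition."""
    fails = []
    for key, (op, limit) in BUDGET.items():
        value = measured[key]
        ok = value < limit if op == "<" else value > limit
        if not ok:
            fails.append(key)
    return fails
```

Running the same check across temperature, cable, and EEE/link-event conditions implements the pass criterion above: the bounds must remain valid under all of them.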
Diagram: Clock tree + measurement points (M1–M4) + tools
Deterministic latency: what varies, what can be bounded, how to measure
Deterministic latency is not a single number. It is a decomposable set of terms across TX/RX pipelines, buffers,
adaptive convergence, and lock-related behavior. TSN readiness requires identifying which terms are fixed, which are configurable,
which are adaptive, and how each term can be observed and bounded.
Latency pipeline map (engineer view)
TX path (host → line)
MAC IF → PCS → FIFO/buffer
DSP/EQ → PMA/AFE → MDI
State events can create step-like shifts
Link segment (line + peer)
Cable + magnetics + peer PHY behavior
Changes can alter EQ convergence
Peer state transitions matter for repeatability
RX path (line → host)
MDI → AFE/PMA → DSP/EQ
PCS → FIFO/buffer → MAC IF
Lock windows can distort short-term timing
Latency decomposition (fixed vs configurable vs adaptive) with observation hooks
Fixed terms (treat as constants in one mode)
Pipeline base delay
AFE/PCS basic pipeline latency under a fixed speed/mode.
EEE (802.3az): power states, wake behavior, and side effects on timing
EEE is not a simple power-save toggle. It is a state machine that changes link behavior. The most important engineering task is to
verify side effects: step-like latency changes around wake events, short-term jitter changes, and compatibility differences across peers.
EEE in practice: state machine and what to correlate
Active
Normal latency distribution (baseline)
Use as reference for A/B comparisons
Correlate with error counters (⑤)
LPI entry / LPI
Low-power idle behavior depends on traffic pattern
Exit behavior is the critical risk for timing
Track EEE state transitions (③)
Wake / recovery
Possible step in latency/offset around exit
Short risk window until stable behavior returns
Correlate with lock events (①) and error bursts (⑤)
Timing side effects (grouped by symptom shape) + fast discrimination
Step events
Latency/offset “jumps” near LPI exit
Buffer realignment and domain gating effects
Peer-dependent behavior across switches
Fast discrimination
Disable EEE and check whether the step disappears; then align the step time with EEE transitions (③).
Short-term jitter rise
Wake window shows degraded jitter tolerance
Clock-domain gating exposes noise coupling
Measurement settings can hide/overstate effects
Fast discrimination
Measure at multiple points and use the same RBW/VBW/window; compare EEE on/off under identical traffic.
Compatibility issues
Peer switch/port differences change stability
Increased error bursts or renegotiation events
Temperature/cable length amplifies differences
Fast discrimination
Hold traffic constant, swap peer device, and compare EEE transition counts (③) and error bursts (⑤).
Log: temperature, ① lock, ③ transitions, latency drift
Diagram: EEE state timeline (Active ↔ LPI ↔ Wake) with measurement points and pass checks
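The EEE-on/EEE-off A/B above reduces to comparing peak-to-peak latency under identical traffic. A synthetic sketch of the comparison (the wake step size is illustrative):

```python
def p2p(samples):
    """Peak-to-peak spread of a latency series."""
    return max(samples) - min(samples)

lat_eee_off = [1000 + (i % 3) for i in range(300)]   # tight baseline distribution
lat_eee_on = lat_eee_off[:]
lat_eee_on[120] += 180                               # one wake-related step near an LPI exit

delta = p2p(lat_eee_on) - p2p(lat_eee_off)
# A large positive delta, with all other variables held constant, points at the
# EEE state machine rather than the channel or the clock tree.
```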
PTP / gPTP hardware timestamping at the PHY: paths, errors, calibration
Hardware timestamping accuracy is determined by where the timestamp is captured/inserted, which clock domain is used,
and which internal stages are included in the correction path. Engineering validation focuses on repeatability, bounded error terms,
and correlation with link state transitions rather than protocol-level details.
Scope (PHY-side reality)
Capture/insert points and their relation to MDI
Error terms: bias, jitter, drift, event steps
Calibration/verification workflow and pass criteria
Not in scope (kept out to avoid cross-page overlap)
Protocol state machines and sync-tree behavior
BMCA, scheduling, or switch queue models
Network-wide configuration policies
Where timestamps happen: MAC vs PHY (engineering differences)
MAC timestamp
Capture point is closer to host processing
MAC↔PHY interface latency becomes an error term
FIFO/CDC effects can show as jitter or step
PHY timestamp
Capture point can be closer to MDI/line side
Internal pipeline + correction path must be consistent
CDC and state transitions must be bounded by tests
First checks
Disable EEE and check if steps disappear
Swap speed/cable length and verify predictable bias change
Align offset/latency changes with link/state counters
Timestamp error sources and an error-budget template
Bias (static offset)
Capture point relative to MDI and fixed pipeline terms
Mode/speed-specific baseline differences
Correction-field configuration mismatches
Budget line (placeholder)
Bias after calibration < X ns (per speed/mode).
Random jitter
FIFO/CDC phase relationship and sampling uncertainty
Clock noise mapped into timestamp domain
Measurement window and statistics definition matter
Budget line (placeholder)
Jitter (RMS) < X ns, p-p < X ns in window T.
Drift + event steps
Temperature-dependent delay and clock parameter shifts
EEE exit windows causing step-like offset/latency changes
Relock/retrain events affecting short-term timing
Budget lines (placeholders)
Drift < X ns/°C and event step < X ns.
After EEE exit, stable within X ms (no step beyond threshold).
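The three budget terms can be separated from one offset log by removing the mean and a fitted slope. A minimal sketch (scaling drift to ns/°C would need a parallel temperature log; here the slope is per second):

```python
def error_terms(t_s, offset_ns):
    """Split an offset log into bias (mean), drift (least-squares slope),
    and random jitter (RMS of the residual)."""
    n = len(t_s)
    bias = sum(offset_ns) / n
    t_mean = sum(t_s) / n
    num = sum((t_s[i] - t_mean) * (offset_ns[i] - bias) for i in range(n))
    den = sum((t_s[i] - t_mean) ** 2 for i in range(n))
    drift = num / den
    resid = [offset_ns[i] - bias - drift * (t_s[i] - t_mean) for i in range(n)]
    jitter_rms = (sum(r * r for r in resid) / n) ** 0.5
    return bias, drift, jitter_rms
```

Event steps are deliberately excluded here: they should be detected separately (as in the step-correlation checks earlier on this page) before fitting, or they corrupt both the bias and the drift estimate.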
Calibration and verification: minimal system → peer test → variable scan → full system
Level 0 — minimal board
Fix speed/mode and cable
EEE off baseline
Calibrate bias to target (X)
Level 1 — peer-to-peer
Two-ended comparison under fixed traffic
Validate correction settings consistency
Check peer sensitivity of bias/jitter
Level 2 — one-variable scan
Speed change (10/100/1G/2.5G)
Cable length/type change
EEE on/off and wake window checks
Level 3 — full system
Bring-up with real power/thermal conditions
Correlate offset with link/state counters
Prove bounded drift and no event steps beyond X
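Level 0 typically ends with a per-speed/mode bias table. A sketch of building and applying such a table; the loopback means below are synthetic:

```python
# Synthetic Level-0 loopback bias means, measured with EEE off and a fixed cable.
raw_bias_ns = {"100M": 412.0, "1G": 248.0, "2.5G": 131.0}

def make_correction(raw: dict) -> dict:
    """Store the negative of each measured bias as the per-mode correction."""
    return {mode: -bias for mode, bias in raw.items()}

def corrected(mode: str, measured_ns: float, corr: dict) -> float:
    """Apply the calibration for the current speed/mode."""
    return measured_ns + corr[mode]

corr = make_correction(raw_bias_ns)
# After calibration, a measurement near the raw bias should land near zero,
# and Levels 1–3 then test whether that zero survives peers, scans, and the chassis.
```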
Diagram: Timestamp capture/insertion map (capture points, correction, and CDC)
Analog front-end, magnetics, EMC/ESD/surge: making the link robust
Robust Ethernet links require treating the PHY AFE, magnetics, protection network, and connector region as a coupled system.
Field failures often come from return-path mistakes and parasitics that convert differential energy into common-mode noise,
especially at higher data rates such as 2.5G.
AFE + magnetics as a coupled system (loss, echo, common-mode conversion)
Differential channel
Insertion loss and return loss shape eye margin
Small impedance breaks can create reflections
Parasitic capacitance becomes visible at 2.5G
Common-mode channel
Diff-to-CM conversion drives EMI and sensitivity issues
Asymmetry and return-path discontinuities amplify CM
CMC helps only when placed and referenced correctly
2.5G “no more hand-waving”
ESD capacitance and stubs become first-order terms
Layout asymmetry directly impacts CM and robustness
Partitioning errors show as intermittent bursts
ESD & surge: real damage paths (energy + return path)
ESD path
Connector → protection → short return to reference
Long return loops inject noise into PHY reference
Wrong placement drives current through magnetics/PHY zone
Surge path
Energy must be steered to the external reference (chassis/PE)
Floating paths cause board ground lift and false failures
Partition boundaries prevent energy from crossing zones
Practical rule
Place protection near the connector, keep the return path short and controlled, and prevent ESD/surge current from entering
the magnetics and PHY zones.
Bring-up & debug playbook: link training, autoneg, cable issues, diagnostics
Debugging Ethernet PHY stability is fastest when symptoms are routed into a small set of “root-cause buckets”
using correlation checks first (peer swap, EEE off, forced speed). The goal is to reduce time-to-isolation:
identify whether failures track the peer, the channel, the environment, or internal state transitions.
Symptom entry points (route first, then drill down)
Link flap
Frequent up/down or retrain loops. Start with correlation checks before deep logging.
Speed drop
Falls back to 100M/1G unexpectedly. Check autoneg outcome and partner capability.
Long cable only
Stable on short cable but fails at long run. Treat as channel-margin sensitivity first.
Hot only / soak
Pass at room temp but fails hot. Correlate with drift and state/event counters.
Chassis-only
Bench passes but fails in full system. Treat as power/ground/EMC coupling until proven otherwise.
Correlation-first triage (highest information gain)
Swap peer
If the issue follows the peer, bucket into compatibility/partner behavior.
EEE OFF
If steps/flaps disappear, bucket into LPI entry/exit windows and wake behavior.
Force speed
If forced mode is stable, bucket into autoneg/training process rather than steady-state channel.
Output of triage
Peer bucket
Different peer fixes it → compatibility or partner state behavior.
Channel bucket
Worse with length/type → magnetics/cable/impedance margin.
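The correlation-first triage above can be encoded as a small router over the three A/B outcomes. Bucket names follow this page; the decision order is an assumption (peer swap first, since it has the highest information gain):

```python
def triage(follows_peer: bool, clean_with_eee_off: bool, stable_when_forced: bool) -> str:
    """Route a link-stability symptom into a root-cause bucket."""
    if follows_peer:
        return "peer/compatibility"
    if clean_with_eee_off:
        return "EEE LPI entry/exit window"
    if stable_when_forced:
        return "autoneg/training process"
    return "channel or environment (continue one-variable scan)"
```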
Autoneg failures: 5 common root causes (symptom → quickest check → next action)
1) Mode mismatch / straps
Likely cause: forced mode conflicts with partner autoneg or strap defaults.
Quick check: force a known-good speed/duplex and compare stability.
Fix: align autoneg policy and strap/config across boots.
Pass criteria: stable link with identical mode after reboot and peer swap.
2) Channel margin (cable/magnetics)
Likely cause: training/FLP is corrupted by loss/echo or poor term/magnetics.
Quick check: short-cable baseline vs long-cable failure reproduction.
Fix: isolate cable type, check magnetics + connector placement and symmetry.
Pass criteria: no renegotiation loops; error counters remain bounded.
3) Clock/power noise → state churn
Likely cause: marginal refclk or supply noise triggers retrain or false transitions.
Quick check: correlate flaps with counters/events; compare clean vs noisy power condition.
Fix: stabilize ref/power; retest under forced speed to separate autoneg vs steady-state.
Pass criteria: retrain count and drops stay below X per hour in window T.
4) Peer compatibility
Likely cause: partner implementation differences or strict corner behavior.
Quick check: swap peer across vendors/models; keep all other variables fixed.
Fix: lock negotiated subset, adjust advertisement policy, or select a compatible profile.
Pass criteria: identical results across peer set under same cable/temperature.
5) EEE interaction
Likely cause: LPI entry/exit timing causes transient instability or misclassification.
Quick check: EEE off A/B; look for steps at wake moments.
Fix: tune EEE policy or disable for deterministic timing applications.
Pass criteria: no offset step beyond X ns; no burst errors after wake.
Cable diagnostics / TDR: use it without false conclusions
When it is meaningful
Stable link state and fixed speed
Known cable type and baseline reference run
Trend comparison, not blind absolute distance
Common false-positive sources
Magnetics + CMC parasitics and routing stubs
ESD array capacitance near connector
Connector/patch-cord reflections
Correct workflow
EEE off and forced speed baseline
Capture a known-good reference signature
Compare deltas under one-variable changes
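The delta-comparison step can be scripted directly: flag only departures from a golden reference signature, so the connector and ESD-array reflections that are always present never count as faults. A sketch with synthetic traces (the threshold is a placeholder):

```python
def tdr_deltas(reference, trace, threshold=0.05):
    """Indices where a TDR trace departs from the known-good reference signature."""
    assert len(reference) == len(trace)
    return [i for i, (r, t) in enumerate(zip(reference, trace)) if abs(t - r) > threshold]

ref  = [0.0, 0.02, 0.03, 0.02, 0.01, 0.0]   # golden run: stubs/ESD bumps included
meas = [0.0, 0.02, 0.03, 0.30, 0.01, 0.0]   # one new reflection
flags = tdr_deltas(ref, meas)
# flags == [3]: only the new feature is reported; baseline reflections are ignored.
```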
Minimal log fields (enough to reproduce and correlate)
Compliance & validation: what to test, what to log, and pass/fail criteria
Validation should be structured as a matrix: conditions (cable, temperature, peer, power, feature states) × metrics (BER, drop rate,
recovery, timestamp stability). The goal is to prevent “bench pass, chassis fail” by forcing worst-case corners and logging enough
context to correlate failures to specific transitions.
Coverage dimensions (minimum set that prevents blind spots)
Signal & errors
BER / CRC / PCS errors
burst behavior and retrain count
drop rate over window T
Timing & timestamp
offset step during events
timestamp validity rate
recovery time after wake/relock
Environment & channel
short vs long cable
temperature corners
peer diversity (models/vendors)
Power disturbance
nominal vs ripple/step
mode transitions
correlate bursts with supply events
Feature states
EEE on/off
timestamp on/off
autoneg vs forced
Test matrix template (conditions × metrics)
Conditions (axis)
cable: short / long (X m)
temp: cold / room / hot (X °C)
peer: A / B / C
power: nominal / ripple (X)
EEE: on / off
timestamp: on / off
Metrics (axis)
BER / CRC / PCS errors
drop rate (X / hour)
recovery time (X ms)
offset step (X ns)
timestamp validity (X%)
retrain count (X / hour)
Corner priority
Mark high-risk corners explicitly (long + hot + EEE on + timestamp on). Run these first and gate release on them.
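Expanding the condition axes into an ordered run list, with the declared high-risk corner forced to the front, takes only a few lines. A sketch over a subset of the axes above:

```python
from itertools import product

AXES = {
    "cable": ["short", "long"],
    "temp":  ["cold", "room", "hot"],
    "eee":   ["off", "on"],
    "ts":    ["off", "on"],
}

def run_list():
    """All condition combinations, with the gating corner sorted first."""
    runs = [dict(zip(AXES, combo)) for combo in product(*AXES.values())]
    risky = {"cable": "long", "temp": "hot", "eee": "on", "ts": "on"}
    runs.sort(key=lambda r: r != risky)     # False sorts before True: corner first
    return runs

runs = run_list()
# 2 * 3 * 2 * 2 = 24 runs; runs[0] is the long + hot + EEE on + timestamp on corner.
```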
Preventing “bench pass, chassis fail” (force coupling paths into the matrix)
Why chassis differs
ground reference shifts and CM noise increases
thermal gradients and soak effects
power mode changes and fan/load transients
neighbor interface coupling
What to do
repeat key corners in the full system
log the same minimal fields as bring-up
align errors with state/event transitions
keep one-variable changes during debug
Pass/fail criteria templates (define the measurement window)
Offset stability
After event (EEE exit / retrain), in window T:
offset step < X ns.
Drop rate
Over T hours:
drops < X / hour.
Recovery time
From event to stable state:
< X ms (stable means counters stop increasing and offset returns within range).
Timestamp validity
Valid timestamps > X% across corners (long + hot included).
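The templates above can be applied mechanically to one corner's results. A sketch in which every limit is a placeholder for "X" (note the validity line is a minimum, the others are maxima):

```python
# Placeholder limits; fill per system budget and measurement window T.
CRITERIA = {
    "offset_step_ns":  50.0,   # max, after EEE exit / retrain
    "drops_per_hour":   1.0,   # max, over T hours
    "recovery_ms":     10.0,   # max, event to stable state
    "ts_validity_pct": 99.9,   # min, across corners
}

def verdict(results: dict) -> dict:
    """Per-criterion pass/fail for one corner of the matrix."""
    return {
        "offset_step_ns":  results["offset_step_ns"]  < CRITERIA["offset_step_ns"],
        "drops_per_hour":  results["drops_per_hour"]  < CRITERIA["drops_per_hour"],
        "recovery_ms":     results["recovery_ms"]     < CRITERIA["recovery_ms"],
        "ts_validity_pct": results["ts_validity_pct"] > CRITERIA["ts_validity_pct"],
    }
```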
Diagram: Validation matrix map (card-grid, not a dense table)
This checklist turns the page's key points (determinism, low jitter, EEE, PTP, robustness) into executable checks, gated by stage:
Design gate prevents board re-spins,
Bring-up gate prevents misdiagnosis,
Production gate prevents “Monday yield collapse” and station-to-station mismatch.
Design gate · Build “determinism” into schematic + layout first
Clock / Jitter
MAC interface
Magnetics / Protection
Power integrity
Quantify the refclk budget first: refclk phase noise/jitter must match the TSN timestamp error budget (placeholder:
refclk jitter < X ps RMS, fill X per system budget), and explicitly define probe points (XO output pin / PHY refclk pin / MAC-side txclk).
Lock down one interface/timing strategy: for RGMII, choose either internal delay or board delay (one and only one);
for SGMII/2500BASE-X, define refclk (25/125 MHz) + jitter requirement to avoid “eye passes but offset jumps.”
Place magnetics + protection by return-path logic: ESD/surge return must avoid the PHY ground-reference sensitive area;
prioritize “clamp energy in the outside zone first” when setting distances between diff pairs, protection, and connector.
Partition rails + add low-noise post-regulation: for PHY analog/PLL rails, prefer a high-PSRR low-noise LDO as a second-stage cleanup;
DC/DC switching ripple should support repeatable injection testing (placeholder: under ripple X mVpp, offset/jitter does not degrade).
Reference part numbers (examples; re-verify package/grade/availability)
TDK ACT45B-101-2P-TL003 (signal-line common-mode filter example; re-check insertion loss vs target band)
Note: part numbers are “implementation anchors + datasheet lookup helpers,” not universal answers.
Within a single family, variants often differ by speed grade, pinout, capacitance, package suffix, etc.
Bring-up gate · Turn “autoneg/EEE/PTP” into repeatable A/B tests
Minimal-system baseline: fix cable / peer / temperature first; run the baseline of “lock speed + EEE OFF + PTP OFF,”
and confirm BER/link-flap rate/error counters are zero or at the noise floor.
Add complexity one variable at a time: enable only one variable (EEE or PTP or speed) per step;
every step must be roll-backable, reproducible, and explainable.
EEE validation focus: log LPI enter/exit counts and exit latency (placeholder: exit < X µs),
and check whether offset shows a step at the exit instant (placeholder: step < X ns).
PTP timestamp validation focus: fix speed + cable length; do loopback/dual-end correlation;
ensure the timestamp path is not distorted by FIFO/clock-domain crossings that introduce “rate-dependent bias,”
and validate that the temperature-drift slope is calibratable.
Use cable diagnostics/TDR only as a locator: first remove EEE/peer/power-noise via correlation checks,
then enable diagnostics—avoid mislabeling “clock/power issues” as “cable issues.”
Minimum fields to log during bring-up (for reproduction + correlation)
Production gate · Make “station consistency” a system capability
Fixture/cable standardization: define a golden cable (length/category/shielding/bend radius) and a golden peer (peer model/firmware/port config);
re-check insertion loss and contact resistance drift weekly.
Mandatory environment fields in logs: temperature/humidity/airflow/enclosure door state/power-station ID/fixture version;
otherwise “Monday yield collapse” is hard to root-cause.
Sampling must cover state-machine edges: frequent EEE in/out, frequent PTP servo updates, power-disturbance injection,
long vs short cable A/B; avoid testing only steady-state.
Criteria must be numeric: flap rate < X/hour; recovery time < X ms;
offset step < X ns; temp drift slope < X ns/°C (fill X per system budget).
The core of the production gate is not “testing more,” but controlling key variables and making every anomaly traceable to one of:
cable / peer / environment / configuration / power.
Checklist map (stage-gated)
Blocks are “must-pass” gates for determinism + TSN readiness
Applications + IC selection logic (PHY-focused)
This section only covers Ethernet PHY-side application points and selection logic that strongly impact TSN/determinism:
speed / interface / clock / EEE / PTP timestamp / EMC robustness / power & thermal.
Protocol stack and switching/scheduling details are intentionally out of scope for this page.
Typical applications (strongly PHY-related)
Industrial TSN endpoints (motion control / drives / I/O nodes)
Engineering anchor: coupling from refclk/power noise into timestamp/offset must be measurable, controllable, and reproducible.
Industrial controllers / gateways (single or dual ports)
Key focus: whether EEE is allowed (many deterministic systems force-disable it or bound policy) and link recovery time (placeholder: < X ms).
Engineering anchor: autoneg/downshift/peer-compat issues must be quickly classified via “EEE OFF / lock speed / swap peer.”
2.5G uplinks (multi-Gig backhaul / port aggregation)
Key focus: 2.5GBASE-T is more sensitive to cabling/magnetics/EMI; the capacitance + diff-matching “margin” is smaller.
Engineering anchor: prioritize a test matrix proving that under “cable length / temperature / power disturbance / EEE / PTP,” BER/flap rate and offset criteria still hold.
Automotive Ethernet note (avoid overlap with sibling pages)
If you later build a dedicated “Automotive Ethernet PHY” subpage, handle it via an internal link here.
This page does not expand into T1/TC10/automotive topology or harness constraints to avoid topic overlap.
Selection logic (scoring + decision tree)
Selection is not “comparing datasheet numbers,” but working backward from TSN goals:
trustworthy timestamps,
bounded latency,
controlled EEE side effects,
passable EMC/ESD/surge,
maintainable production consistency.
Scoring dimensions (PHY-focused)
Speed & MAC interface: 10/100/1G/2.5G; RGMII/SGMII/2500BASE-X/USXGMII (choose based on real SoC/FPGA constraints).
PTP hardware timestamp: sampling point, available calibration/compensation mechanisms, GPIO/interrupt/1PPS support, driver/register accessibility.
Determinism controls: FIFO depth + bypass ability; whether adaptation/power-saving states impact latency in a bounded/observable way.
EEE behavior: LPI enter/exit timing, exit transient impact on offset/BER, peer-compat risk and configurable policy.
Recommended workflow: use the “starter set” to pass bring-up + the verification matrix first, then converge magnetics/protection by EMC/power/cost.
Replace the PHY itself only when “timestamp/interface/diagnostics capability” is insufficient.
Selection decision tree (TSN-first)
Short labels only; details stay in text to keep the diagram clean
Each FAQ is intentionally short and executable (no protocol deep-dives): Likely cause → Quick check → Fix → Pass criteria (thresholds use placeholders “X” to be filled by the system budget).
Example part numbers are provided as “BOM anchors” only (e.g., PESD2ETH-D, SP3012-04UTG, 744231371, TPS7A20, ADP150, SiT1602) — verify package/suffix/ratings/availability.
Likely cause
A transient on EEE (LPI) exit changes the timestamp path/clock domain behavior, or the system mixes MAC-side vs PHY-side capture points across builds/ports.
Quick check
(1) A/B: EEE OFF vs ON, count offset steps per hour and align each step to LPI exit events. (2) Confirm one consistent capture mode: PHY HW timestamp enabled (or MAC), not mixed.
Fix
Keep a deterministic policy: disable EEE for TSN-critical windows, or enforce a bounded EEE policy; standardize on a single timestamp capture point and apply the vendor’s correction/calibration flow. If supply/clock coupling is suspected, isolate PLL/clock rails with a low-noise LDO stage (e.g., TPS7A20 or ADP150).
Pass criteria
With EEE policy applied: offset step < X ns and step rate < X/hour over a soak window T, with steps not correlated to LPI exit.
Correlation
Same board, different switch peer → very different PTP accuracy. What is the first end-to-end correlation check?
Likely cause
The peer is driving different EEE advertisement/policy, different timestamping behavior, or different link conditions (rate/duplex, downshift, retries), which changes the effective time error seen at the endpoint.
Quick check
Normalize the physical layer first: (1) lock the same speed/duplex; (2) A/B with EEE forced OFF on both ends; (3) compare offset jitter while logging LPI enter/exit counts and link partner ID/capabilities.
Fix
Standardize port policy: enforce the same EEE behavior, timestamp mode (PHY HW timestamp vs MAC), and link configuration across switches. If the switch peer is non-negotiable, prioritize a PHY with robust HW timestamp + diagnostics (e.g., LAN8841, DP83867, 88E151x) and validate per-peer profiles.
Pass criteria
Under normalized settings: offset jitter < X ns RMS and no peer-dependent bias > X ns over window T.
2.5G / AN
2.5G links up, but later drops to 1G. Check autoneg first, or cable/magnetics bandwidth first?
Likely cause
A marginal 2.5G channel triggers downshift / renegotiation (cable loss, magnetics limits, excess ESD capacitance, or supply/thermal drift).
Quick check
(1) Read and log downshift / AN-restart counters at the event time. (2) A/B: known-good short Cat6 cable vs the failing cable; then A/B: force 2.5G (no AN) if supported.
Fix
If channel-limited: tighten magnetics/EMC BOM and placement, reduce line capacitance (e.g., replace a high-C array with a low-C Ethernet TVS such as PESD2ETH-D or SP3012-04UTG where appropriate), and tune common-mode control (e.g., a CMC like Würth 744231371 as a tuning knob). If AN-limited: standardize advertisements and disable aggressive downshift policies.
Pass criteria
At 2.5G worst-case: downshift events = 0 and link uptime > (100% − X) over T, with RX/PCS errors below threshold X.
Long / Hot
Link flaps only on long cable or at high temperature — which 3 PHY states/counters should be logged first?
Likely cause
A reduced margin corner exposes channel loss + EQ/CDR sensitivity, or supply/thermal drift that triggers renegotiation, retrain, or LOS events.
Quick check
Log these three “first responders”: (1) link state + negotiated speed/duplex; (2) AN restart / downshift / retrain counters; (3) RX-side error counters (PCS/alignment and FCS/CRC). Correlate flap events with die temperature and supply ripple snapshots.
Fix
If errors rise before the flap: improve channel margin (better cable category, magnetics choice/placement, reduce protection capacitance, tune CMC). If no errors but AN restarts spike: stabilize power/thermal and pin the link policy (lock speed for validation; then re-enable AN only after margin is proven).
Pass criteria
Worst-case (long + hot): link flap < X/hour, AN restart = 0 (or < X) over window T, and error counters remain below X.
EEE
EEE ON: throughput is fine, but latency jitter increases. How to prove it is caused by EEE state transitions?
Likely cause
The Active ↔ LPI ↔ Wake transitions change buffering and clock behavior, adding a bursty component to latency even if average throughput remains unchanged.
Quick check
(1) Collect a latency histogram with timestamps of LPI enter/exit; overlay jitter spikes on exit events. (2) A/B: EEE OFF vs ON at the same link rate and traffic pattern; keep all other variables fixed.
Fix
Disable EEE for TSN-critical ports, or raise the EEE entry threshold so LPI is rare during deterministic traffic. If EEE must remain enabled, require a bounded wake behavior and validate across peers and temperature (do not assume peer compatibility).
Pass criteria
With the final EEE policy: p99.9 latency jitter < X ns and wake recovery < X ms over window T, with no jitter bursts aligned to LPI exit.
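The overlay in the Quick check (jitter spikes vs LPI exit timestamps) reduces to counting how many latency spikes fall inside a short window after an LPI exit. A sketch with illustrative thresholds and timebases:

```python
# Check whether latency spikes align with LPI exit events.
# Spike threshold and alignment window are assumed judgment values.

def spikes_aligned_with_lpi_exit(latencies, lpi_exits, spike_ns, align_us):
    """latencies: list of (t_us, latency_ns); lpi_exits: list of t_us.
       Returns (n_spikes, n_aligned_with_an_exit)."""
    spikes = [t for t, lat in latencies if lat > spike_ns]
    aligned = sum(
        1 for t in spikes
        if any(0 <= t - e <= align_us for e in lpi_exits)
    )
    return len(spikes), aligned

# Example: two spikes, both within 5 us after an LPI exit
lat = [(10, 800), (20, 4200), (30, 900), (55, 5100), (70, 850)]
exits = [18, 52]
n_spikes, n_aligned = spikes_aligned_with_lpi_exit(
    lat, exits, spike_ns=2000, align_us=5)
```

If nearly all spikes are aligned with LPI exits in the EEE-ON run and absent in the EEE-OFF run, the causal claim is made.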
RGMII
RGMII shows occasional CRC errors, but the line side looks clean. Check internal delay / routing skew or I/O supply noise first?
Likely cause
Most often it is RGMII timing margin (wrong internal delay mode, skew, edge rate) — but a close second is I/O rail noise / ground bounce corrupting sampling.
Quick check
(1) A/B: toggle the PHY’s RGMII internal delay configuration (ID on/off) and see whether CRC errors track the setting. (2) Scope the I/O rail ripple and correlate CRC bursts with rail noise and simultaneous switching events.
Fix
Lock one timing strategy: correct internal delay mode + enforce length matching; add modest series damping where needed (board-specific). If rail noise is implicated, strengthen decoupling and isolate sensitive rails with a clean LDO stage (e.g., TPS7A20) and tighten the ground return stitching near the PHY.
Pass criteria
At worst-case traffic and temperature: CRC errors = 0 over T, and timing margin remains > X (as defined by the interface budget).
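The A/B in the Quick check ("do CRC errors track the ID setting?") can be made an explicit decision. A sketch, assuming you have measured CRC error rates per delay configuration; how the delay mode is toggled is PHY-specific (vendor register or a devicetree `phy-mode` variant):

```python
# Decide whether CRC errors track the RGMII internal-delay (ID) setting.
# The 10x ratio is an assumed significance threshold.

def faulty_delay_mode(rate_id_on, rate_id_off, ratio=10.0):
    """rate_*: CRC errors per unit traffic in each configuration.
       Returns the implicated setting, or None if inconclusive
       (then suspect I/O rail noise / ground bounce instead)."""
    hi = max(rate_id_on, rate_id_off)
    lo = min(rate_id_on, rate_id_off)
    if hi == 0 or (lo > 0 and hi / lo < ratio):
        return None  # errors do not track the ID mode
    return "id_on" if rate_id_on > rate_id_off else "id_off"
```

A clean split (errors in one mode, none in the other) points at timing margin; comparable rates in both modes push the investigation toward the I/O rail.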
SGMII
SGMII “looks locked” but occasionally fails training — verify refclk quality first, or lane termination first?
Likely cause
If training failures are rare and temperature/voltage-dependent, refclk jitter / supply noise is often the first suspect; if failures are deterministic by board/cable, then lane SI/termination is more likely.
Quick check
(1) Swap to a known low-jitter 25 MHz source (e.g., SiT1602…25.000000 or ASV-25.000MHZ-LC-T) and re-run training statistics. (2) If still failing, validate lane termination/coupling against the PHY reference design and check for skew/return-path discontinuities.
Fix
Improve refclk integrity (clean routing, isolation, stable rail; add a low-noise LDO stage such as ADP150 where applicable) and match the recommended SGMII termination/AC coupling scheme. Then lock the interface configuration (no mixed modes across builds).
Pass criteria
Training failures = 0 over soak T across the temperature range, and interface error counters remain below X.
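The triage rule in the Likely cause step (rare and corner-dependent vs deterministic per board) can be encoded over training-run records. A sketch with hypothetical record fields:

```python
# Rank the two SGMII suspects from training statistics.
# Rule from the text: deterministic per-board failures -> lane SI/termination;
# rare, corner-dependent failures -> refclk jitter / supply noise.

def first_suspect(runs):
    """runs: list of dicts {'board', 'temp_c', 'failed': bool}."""
    fails = [r for r in runs if r["failed"]]
    if not fails:
        return "none"
    boards_failing = {r["board"] for r in fails}
    boards_all = {r["board"] for r in runs}
    # The same board(s) failing in every run, while others never fail,
    # is the deterministic signature.
    deterministic = all(
        all(r["failed"] for r in runs if r["board"] == b)
        for b in boards_failing
    )
    if deterministic and boards_failing != boards_all:
        return "lane-termination"
    return "refclk-or-supply"

runs = [
    {"board": "A", "temp_c": 25, "failed": True},
    {"board": "A", "temp_c": 85, "failed": True},
    {"board": "B", "temp_c": 25, "failed": False},
    {"board": "B", "temp_c": 85, "failed": False},
]
suspect = first_suspect(runs)
```

This only orders the investigation; the refclk swap and termination review in the Quick check remain the actual evidence.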
EMI
EMI fails in one band — suspect common-mode leakage or clock-harmonic coupling? What is the “one-step” validation?
Likely cause
A narrow band peak is often either clock-harmonic coupling (refclk/PLL domains) or common-mode conversion at magnetics/connector due to return-path asymmetry.
Quick check
Do a single A/B change while measuring the failing band: populate/bypass the common-mode choke (e.g., try a CMC option such as 744231371) to see if the peak moves/drops. If it does not, A/B the clock source path (swap to a cleaner oscillator like SiT1602) and observe peak correlation.
Fix
If common-mode dominated: enforce symmetry and return stitching, tune CMC, and reduce parasitic imbalance near the connector. If clock-harmonic dominated: shorten and shield the clock path, isolate the clock/PLL rail with a clean LDO stage (TPS7A20/ADP150), and reduce coupling loops in layout.
Pass criteria
EMI margin at the failing band improves to ≥ X dB under the same test setup, with no regression in link stability or TSN timing metrics.
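The one-step A/B above yields two peak measurements to compare against the baseline. A sketch of the attribution, with an assumed "significant delta" threshold; levels are illustrative dBµV readings at the failing band:

```python
# Attribute a narrow-band EMI peak from the two single-variable A/B runs.
# The 6 dB minimum delta is an assumed judgment threshold.

def attribute_peak(baseline_db, cmc_ab_db, clock_ab_db, min_delta_db=6.0):
    """Each *_ab_db is the peak level after exactly one change."""
    if baseline_db - cmc_ab_db >= min_delta_db:
        return "common-mode"     # CMC change moved/dropped the peak
    if baseline_db - clock_ab_db >= min_delta_db:
        return "clock-harmonic"  # cleaner oscillator dropped the peak
    return "inconclusive"        # look for other coupling paths

verdict = attribute_peak(baseline_db=58.0, cmc_ab_db=57.5, clock_ab_db=49.0)
```

Keeping every other variable fixed between runs is what makes the subtraction meaningful.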
ESD
ESD hit drops the link, but reconnect is normal. What is the most common return-path mistake?
Likely cause
ESD current returns through the PHY reference ground (or sensitive analog/PLL ground) instead of being dumped at the connector/chassis region, causing a transient reset/LOS/renegotiation.
Quick check
Verify placement and return: TVS arrays (e.g., PESD2ETH-D / SP3012-04UTG) should be connector-side with a short, low-inductance return to the intended ESD sink (often chassis/connector ground). Correlate link drops with PHY reset/interrupt flags.
Fix
Re-route the ESD return so the discharge loop closes locally at the connector region; add stitching vias/short paths; keep the PHY zone “quiet.” Use low-capacitance protection where the channel margin is tight, and re-validate at the worst-case link rate.
Pass criteria
Under the target ESD level: no link drop (preferred), or auto-recovery < X ms with no speed downgrade and no persistent error-counter increase.
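The pass criteria above can be evaluated mechanically over per-discharge event logs. A sketch, assuming hypothetical event fields:

```python
# Pass/fail the ESD recovery criteria from per-discharge event records.
# Field names and the event format are assumptions.

def esd_pass(events, max_recovery_ms):
    """events: one dict per discharge: {'link_dropped': bool,
       'recovery_ms', 'speed_before', 'speed_after', 'error_delta'}."""
    for e in events:
        if not e["link_dropped"]:
            continue                          # preferred outcome: no drop
        if e["recovery_ms"] > max_recovery_ms:
            return False
        if e["speed_after"] < e["speed_before"]:
            return False                      # speed downgrade after strike
        if e["error_delta"] > 0:
            return False                      # persistent counter increase
    return True

ok = esd_pass(
    [{"link_dropped": True, "recovery_ms": 40.0,
      "speed_before": 2500, "speed_after": 2500, "error_delta": 0}],
    max_recovery_ms=100.0,
)
```

Correlating each record with the PHY reset/interrupt flags from the Quick check tells you which failure branch actually fired.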
Diagnostics
Cable diagnostics/TDR reports a short, but swapping the cable “fixes it.” How to tell measurement artifact vs real defect?
Likely cause
The diagnostic ran under inconsistent conditions (link/EEE state, peer behavior, calibration), or the connector contact is intermittent — creating a false short signature.
Quick check
Run diagnostics only in a controlled state: EEE OFF, stable link policy (or link forced down per vendor guidance), and with a known “golden cable.” Repeat N times; if the result is not repeatable, treat it as an artifact.
Fix
Standardize the diagnostic procedure (state + temperature + cable type), update PHY firmware/driver if required, and treat intermittent connector contact as a hardware defect. Use diagnostics as a secondary tool after correlation checks, not as the first root-cause verdict.
Pass criteria
Under the standardized procedure: false-short rate = 0 / N, and any reported defect is repeatable across runs and correlates with independent evidence (errors/flaps).
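The "repeat N times and demand repeatability" rule is easy to codify. A sketch over the per-run fault reports from the standardized procedure:

```python
# Artifact-vs-defect decision from N repeated diagnostic runs under the
# standardized state (EEE OFF, controlled link policy, golden cable).

def classify_tdr(results):
    """results: list of per-run verdicts, e.g. 'short' / 'open' / 'ok'."""
    faults = [r for r in results if r != "ok"]
    if not faults:
        return "no-defect"
    if len(set(results)) == 1 and len(results) > 1:
        return "repeatable-defect"  # still corroborate with errors/flaps
    return "artifact"               # not repeatable -> measurement artifact

verdict = classify_tdr(["short", "ok", "ok", "ok", "short"])
```

Even a repeatable fault is only promoted to root cause once it correlates with independent evidence, per the pass criteria.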
Chassis
PTP is stable on the bench, but drift increases in the chassis. First priority: airflow baffle, ground bounce, or supply noise?
Likely cause
In chassis, two new couplings dominate: supply noise / ground bounce from shared rails and return paths, and temperature gradients (airflow) that change oscillator/PLL behavior.
Quick check
Start with the highest information gain: (1) log PHY die temp and PLL/I/O rail ripple at the same cadence as drift; (2) A/B: power from a clean bench supply vs system supply; (3) only then A/B a simple airflow baffle.
Fix
If drift tracks supply/ground: add a dedicated low-noise stage for clock/PLL rails (TPS7A20 / ADP150), improve return stitching, and reduce shared high-di/dt coupling. If drift tracks temperature gradients: improve airflow guidance and keep the clock source stable (e.g., a robust oscillator like SiT1602).
Pass criteria
In chassis worst-case: drift slope < X ns/°C and offset stability meets the system requirement over soak T, with no correlation to rail ripple beyond X mVpp.
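The drift-slope metric in the pass criteria is a simple least-squares fit of PTP offset against die temperature. A stdlib-only sketch with illustrative data:

```python
# Compute the drift slope (ns/degC) from logged (die temp, PTP offset)
# pairs via ordinary least squares, for comparison against the X ns/degC
# pass threshold. Data below are illustrative.

def drift_slope_ns_per_c(temps_c, offsets_ns):
    n = len(temps_c)
    mt = sum(temps_c) / n
    mo = sum(offsets_ns) / n
    num = sum((t - mt) * (o - mo) for t, o in zip(temps_c, offsets_ns))
    den = sum((t - mt) ** 2 for t in temps_c)
    return num / den

temps = [35.0, 45.0, 55.0, 65.0]
offsets = [10.0, 30.0, 50.0, 70.0]  # perfectly linear: 2 ns/degC
slope = drift_slope_ns_per_c(temps, offsets)
```

The same fit run against rail-ripple samples instead of temperature gives the "no correlation beyond X mVpp" half of the check.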
Field / Logs
Compliance passes, but the field shows intermittent downshift. Which log field is usually missing (EEE / PTP / temperature / cable info)?
Likely cause
Field downshift is often triggered by a state transition or corner condition that compliance didn’t exercise (peer diversity, EEE policy, supply/thermal drift, cable variability).
Quick check
Compare “good vs bad” field captures: if downshift events exist but EEE state (LPI enter/exit count at event time) is missing, root-cause classification becomes impossible. Correlate downshift with temperature and cable category/length if available.
Fix
Add EEE state + LPI enter/exit counters as mandatory fields, plus the peer ID/capabilities and a minimal cable descriptor. Then re-run the validation matrix at the field-relevant corner (long + hot + peer diversity + EEE policy).
Pass criteria
Field events become classifiable: each downshift is attributable to a bucket (EEE/peer/cable/temp/supply) with > X% confidence, and the final policy holds downshift < X/hour over T.
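The bucketing in the pass criteria only works once the mandatory fields exist; absence of the EEE field should be an explicit outcome, not a silent default. A sketch with hypothetical field names and thresholds:

```python
# Bucket a field downshift event once the mandatory log fields are present.
# Field names and thresholds are assumptions; order encodes priority.

def bucket_downshift(ev):
    if "lpi_exits_at_event" not in ev:
        return "unclassifiable"   # the usually-missing EEE state field
    if ev["lpi_exits_at_event"] > 0:
        return "eee"
    if ev.get("die_temp_c", 0) > 85:
        return "temp"
    if ev.get("cable_len_m", 0) > 80:
        return "cable"
    if not ev.get("peer_known", True):
        return "peer"
    return "supply-or-other"

b1 = bucket_downshift({"die_temp_c": 90})  # EEE field missing -> no verdict
b2 = bucket_downshift({"lpi_exits_at_event": 2, "die_temp_c": 40})
```

Counting "unclassifiable" results per fleet release is also a direct measure of whether the logging mandate is actually deployed.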