Automotive Ethernet T1 PHY reliability is won by closing three loops: TC10 wake robustness, EMC-shaped transmit, and ASIL-grade evidence.
This page turns each loop into measurable checks and pass/fail criteria so link stability, EMI compliance, and diagnosability can be proven—not guessed.
What this page delivers
This page focuses on 100/1000BASE-T1 PHY link robustness in vehicles—specifically TC10 sleep/wake,
EMC-shaped transmit, and ASIL-friendly diagnostics—so design, validation, and troubleshooting can be executed with repeatable evidence.
H2-1. Definition & Where It Fits (100/1000BASE-T1 PHY in the vehicle)
Automotive Ethernet PHYs translate a controller/MAC interface into a single-twisted-pair in-vehicle link. The engineering goal is not only “link up”,
but stable margins with evidence across harness variation, EMC constraints, low-power wake, and safety diagnostics.
Positioning: what problem is being solved
Physical link reality: in-vehicle wiring (connectors, branches, shields) creates reflections and common-mode paths that can destabilize training and increase burst errors.
EMC constraint: “works on a bench” may still fail radiated limits; transmit shaping and port co-design are required to reduce peaks without collapsing margin.
Low-power behavior: TC10 sleep/wake adds state transitions and timing windows; robust wake must handle noisy environments without wake storms.
Safety observability: ASIL-friendly hooks depend on counters, fault reasons, and consistent evidence logs—not just a link LED.
100BASE-T1 vs 1000BASE-T1: the practical differences that matter
The headline rate is not the only decision lever. For bring-up and production repeatability, the differentiators are margin sensitivity,
EMC shaping headroom, and how harness variability shifts the training boundary.
100BASE-T1: often easier to stabilize on diverse harnesses; EMC limits still apply, but the link typically tolerates more wiring variance before retrains appear.
1000BASE-T1: higher throughput often means tighter constraints on port co-design (layout, protection parasitics, common-mode control) and tuning discipline for shaping vs margin.
Decision principle: select by system constraints first (harness, EMC limits, wake behavior, diagnostic evidence needs),
then map to the minimum viable link rate.
Scope guard (this page)
In scope: TC10, EMC-shaped TX, ASIL hooks.
Coverage stays at the PHY and its co-design boundaries (port, harness, wake, diagnostics). Topics below are intentionally out of scope here and should be handled on their own pages.
Out of scope: TSN (802.1 time scheduling/shaping), PTP/SyncE deep dives, industrial protocol stacks (PROFINET/EtherCAT/CIP), PoDL/PoE power classes.
Allowed mention: one-sentence constraints + link to the dedicated page (no tutorials inside this page).
Key terms (minimal, PHY-boundary only)
PHY vs MAC
The PHY handles signaling, training, and error visibility at the physical/PCS boundary; the MAC/controller handles frames, queues, and host interfacing.
TC10
A low-power state model where wake entry/exit timing and wake noise immunity become first-class link requirements.
EMC-shaped transmit
PHY-side knobs (edge/drive, pre-emphasis/EQ, common-mode control) used to reduce radiated peaks while maintaining a stable training boundary.
ASIL hooks
Diagnostics that make link health observable: fault reasons, counters, interrupt causes, and evidence fields for consistent logging.
Diagram — Vehicle Domain Link Map (PHY boundary + three main hooks)
The diagram intentionally keeps only PHY-boundary hooks: TC10 behavior, EMC transmit shaping knobs, and ASIL-friendly diagnostics evidence.
H2-2. Link Bring-up State Machine (Power-up → Link → Normal)
Bring-up must be driven by state + evidence. Each state below defines what must be true,
what can be observed (interrupt reasons, MDIO status, counters), and the fastest next action when the link stalls or flaps.
Why a state machine matters (field-proof bring-up)
Avoid false diagnoses: a configuration/clock problem often looks like “training failure” unless states are separated.
Make failures comparable: production and field logs become actionable only when counters and state transitions are consistent.
Connect to main hooks: TC10 mainly affects transitions and recovery; EMC shaping affects training boundary and burst errors; ASIL hooks define evidence fields.
Pre-link prerequisites (the minimum set that prevents blind alleys)
Power rails: correct ramp + stable reset release; eliminate brownout oscillation during first training attempt.
Reference clock: stable and within spec; treat marginal clock as a primary cause for intermittent training or link flaps.
Configuration latch: straps/MDIO values must be read back and recorded as a baseline for correlation across units and harnesses.
Port co-design sanity: protection and common-mode parts must not introduce excessive mismatch; verify the PHY sees a consistent port environment.
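The readback-and-diff habit from the configuration latch item can be sketched in a few lines of Python. This is an illustration only: the register names and values in `GOLDEN_CONFIG` are hypothetical placeholders (not taken from any PHY datasheet), and `diff_config` is an assumed helper name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical golden profile: register name -> expected value.
# Names and values are placeholders, not real PHY registers.
GOLDEN_CONFIG: Dict[str, int] = {
    "strap_mode": 0x2,
    "tc10_enable": 0x1,
    "tx_shaping_profile": 0x3,
}


def diff_config(
    readback: Dict[str, int],
    golden: Dict[str, int] = GOLDEN_CONFIG,
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (name, expected, actual) for every mismatched or missing register."""
    mismatches = []
    for name, expected in golden.items():
        actual = readback.get(name)  # None means the register was never read back
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


# Example: one unit latched the wrong TC10 enable value.
unit_readback = {"strap_mode": 0x2, "tc10_enable": 0x0, "tx_shaping_profile": 0x3}
for name, expected, actual in diff_config(unit_readback):
    print(f"{name}: expected {expected:#x}, got {actual}")
```

Logging the diff (rather than only pass/fail) keeps the baseline comparable across units and harnesses.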
RESET
Entry: reset asserted or power-on reset active. Exit: reset deasserted with rails + clock stable for ≥ X ms. Observables: reset pin level; basic status shows “not ready”; no meaningful counters yet. Failure signature: reset bounce / brownout oscillation → repeated partial bring-ups. First action: validate rail stability + reset release timing; log the exact release moment and any droop events.
CONFIG (straps / MDIO)
Entry: reset released; strap latch window or MDIO configuration start. Exit: configuration readback matches intended profile; interrupts cleared; baseline counters zeroed. Observables: MDIO readback; strap status; interrupt cause register after clear. Failure signature: “training fails” only on some units → mismatched strap pull or MDIO write sequence. First action: always read back and log config; standardize a single golden config and diff against it.
INIT
Entry: clocks locked internally; port biasing active; link attempt begins. Exit: PHY declares readiness to train (pre-train checks passed). Observables: “ready” status bit; interrupt reason if pre-checks fail. Failure signature: immediate error before training → clock/rail/config issue masquerading as link problem. First action: confirm readiness flags and capture the first failing reason code (no guessing).
TRAIN
Entry: training starts (equalization / adaptation / convergence attempts). Exit: convergence passes quality thresholds (e.g., quality metric ≥ X, no timeouts). Observables: training progress/status; timeouts; error counters (should remain within X during training). Failure signature: passes on bench but times out on harness → reflection/common-mode environment moves the boundary. First action: capture training result codes + time-to-converge; avoid changing multiple knobs at once.
LINK_UP
Entry: training complete; link reported up. Exit: stable counters and no retrain within Y seconds; baseline established. Observables: link status; CRC/error counters; “retrain” events; interrupt reasons. Failure signature: link up but unusable → burst errors, repeated retrains, or counter explosions under load. First action: define a baseline window (Y seconds) and record counters per 1k frames or per minute (consistent denominators).
NORMAL
Entry: link stable with a known-good baseline; operating traffic present. Exit: none; transitions occur to TC10 or recovery paths as designed. Observables: health counters, temperature/voltage events, TC10 entry/exit events. Failure signature: periodic flaps (minutes apart) → environmental coupling, recovery aggressiveness, or wake noise. First action: correlate counter bursts with events (temp, rail noise, wake, EMC test steps) using a unified log schema.
ERROR ↔ RECOVERY
Entry: fault reason asserted (timeout, severe error burst, loss of lock, explicit remote fault). Exit: recovery completes and retraining returns to LINK_UP without storm behavior. Observables: fault reason codes; retry counters; backoff timers; retrain counts per hour. Failure signature: retry storm → recovery criteria too aggressive or wake/noise repeatedly triggers transitions. First action: log “fault reason + state transition timeline”; enforce a storm guard (rate limit) with pass criteria X.
Diagram — Bring-up State Machine (with observability hooks)
The same state machine is reused later for TC10, EMC shaping, ASIL diagnostics, and troubleshooting: every symptom must map to a state and an evidence set.
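The states above can be tracked in software so that every symptom is logged against a state and a transition timeline. The sketch below is a minimal model, not a PHY driver: the `ALLOWED` transition table is an assumption inferred from the state descriptions (TC10 paths are omitted), and `BringUpTracker` is a hypothetical name.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    CONFIG = auto()
    INIT = auto()
    TRAIN = auto()
    LINK_UP = auto()
    NORMAL = auto()
    ERROR = auto()
    RECOVERY = auto()


# Assumed transition table; adjust to the actual PHY and system design.
ALLOWED = {
    State.RESET: {State.CONFIG},
    State.CONFIG: {State.INIT, State.ERROR},
    State.INIT: {State.TRAIN, State.ERROR},
    State.TRAIN: {State.LINK_UP, State.ERROR},
    State.LINK_UP: {State.NORMAL, State.ERROR},
    State.NORMAL: {State.ERROR},
    State.ERROR: {State.RECOVERY},
    State.RECOVERY: {State.TRAIN, State.ERROR},
}


class BringUpTracker:
    """Tracks the current state and keeps a timeline of transitions."""

    def __init__(self):
        self.state = State.RESET
        self.timeline = []  # (timestamp_ms, from_state, to_state, reason)

    def transition(self, new_state, timestamp_ms, reason=""):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.timeline.append(
            (timestamp_ms, self.state.name, new_state.name, reason)
        )
        self.state = new_state


tracker = BringUpTracker()
tracker.transition(State.CONFIG, 5, "reset released")
tracker.transition(State.INIT, 12, "config readback ok")
tracker.transition(State.TRAIN, 15)
tracker.transition(State.ERROR, 215, "train timeout")
for entry in tracker.timeline:
    print(entry)
```

Rejecting illegal transitions makes a log such as "TRAIN timeout" distinguishable from a configuration or clock problem that never reached TRAIN.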
H2-3. Signal Integrity on Single-Twisted Pair (Reach, harness effects, reflections)
The in-vehicle STP harness is not a uniform transmission line: connectors, branches, and shielding paths create discontinuities that shift the training boundary.
The practical goal is to prevent reflections and mode conversion from landing inside the PHY’s effective window, which otherwise leads to
timeouts, burst errors, and retrain storms.
Key harness risks that reduce PHY stability (no theory detours)
Stub (drop line): a short branch behaves like a reflection generator; if the return timing overlaps the PHY’s effective window, training becomes fragile.
T-branch / multi-branch: every branch point adds impedance steps; multiple reflections can stack and move the link across the stability cliff.
Connector discontinuity: contact geometry and shield transitions introduce local mismatch that may be “invisible” on a bench but destabilizing in-vehicle.
Return loss (reflection strength): a practical indicator that the channel is deviating from its expected impedance profile.
Mode conversion (DM → CM leakage): asymmetry, shield bonding choices, and mismatched parasitics can convert differential energy into common-mode, hurting both EMC and margin.
100 vs 1000: what becomes tighter (engineering, not marketing)
1000BASE-T1 typically has a narrower stability margin against discontinuities (branch points, connector mismatch, port parasitics), so the same harness shift can trigger retrains sooner.
100BASE-T1 often tolerates more harness variance before training fails, but severe mode conversion can still cause burst errors and EMC peaks.
Practical implication: channel variability must be treated as a first-class input; a “golden bench cable” is not a sufficient bring-up reference.
Each rule is written as a constraint + symptom hook, so failures map back to the bring-up state machine (TRAIN / LINK_UP / NORMAL).
Rule 1 — Stub length ≤ X (placeholder)
Why: stub reflections can return inside the effective window and destabilize convergence. Quick verify: training time distribution widens; retrain/hour rises; errors appear in bursts.
Rule 2 — Avoid extra T-branches and hidden spur additions
Why: branch points create stacked reflections and shift the training boundary under temperature and layout changes. Quick verify: link works on bench but fails on vehicle harness; retrain events correlate with harness configuration.
Rule 3 — Keep connector type and stack-up consistent
Why: connector geometry changes local impedance and can turn marginal links into intermittent ones. Quick verify: only one node or one harness batch shows errors; swapping connectors moves the problem.
Rule 4 — Treat shielding and return paths as part of the channel
Why: poor shield bonding increases mode conversion and injects common-mode noise into the PHY. Quick verify: EMC steps trigger link flaps; burst errors correlate with chassis events or wiring reroutes.
Rule 5 — Keep the port environment symmetric (minimize Δparasitics)
Why: asymmetry converts differential energy into common-mode, reducing margin and raising emissions. Quick verify: changing TVS/CMC vendor worsens BER even with the same footprint; errors appear immediately after the change.
Rule 6 — Avoid routing harness near strong noisy bundles when possible
Why: injected common-mode noise can push the channel over the edge and cause periodic retrains. Quick verify: failures appear only in vehicle integration; rerouting changes the error rate without PHY config changes.
Rule 7 — Establish a “baseline harness” and log its identity
Why: without a baseline, “improvements” can be statistical artifacts from changing channels. Quick verify: compare training time + retrain/hour + errors/1k against the baseline on each test station.
Rule 8 — Validate across temperature and supply corners
Why: edge rate and thresholds drift; a reflection timing that is “outside the window” can move inside it. Quick verify: errors only at hot/cold; retrain rises when rails are noisy or slightly low.
Rule 9 — Use a consistent counter denominator (per 1k frames / per minute)
Why: inconsistent windows create false stability/instability conclusions. Quick verify: lock a baseline window Y seconds and compare errors/1k and retrain/hour across runs.
Rule 10 — Change one knob at a time (channel vs PHY tuning)
Why: channel changes and PHY tuning changes look similar in symptoms; mixing them blocks root-cause isolation. Quick verify: keep harness constant while tuning, then keep tuning constant while swapping harness variants.
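Rules 7 and 9 depend on every station computing rates the same way. A minimal sketch of the two normalizations used throughout this page follows; the function names mirror the counter names in the text, while the input counters themselves are assumed to come from whatever the PHY or MAC exposes.

```python
def errors_per_1k(error_count: int, frame_count: int) -> float:
    """Errors normalized to 1000 frames."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return 1000.0 * error_count / frame_count


def retrain_per_hour(retrain_count: int, window_s: float) -> float:
    """Retrain events normalized to one hour of observation."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return 3600.0 * retrain_count / window_s


# Two runs with different raw windows become directly comparable.
print(errors_per_1k(5, 10_000), retrain_per_hour(2, 1800))
print(errors_per_1k(40, 80_000), retrain_per_hour(4, 3600))
```

Raising on an empty window is deliberate: a run without a valid denominator should not be logged as "zero errors".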
Case 1 — Trains on the bench, times out on the vehicle harness
Risk: added branch / longer spur / different connector stack-up moves reflection timing. Symptom: TRAIN timeouts or repeated convergence attempts. Observable: train_time_ms ↑, train_attempts ↑, retrain events appear immediately after plug-in. Fix: remove hidden branches; enforce stub ≤ X; standardize harness variant and retest on the same baseline.
Case 2 — Only one node is “fragile”
Risk: local connector mismatch or a longer drop line on that node. Symptom: LINK_UP but burst errors under load; retrain spikes on that endpoint. Observable: errors/1k higher on one endpoint; swapping harness/connector moves the failure. Fix: unify connector stack-up; shorten drop; verify port symmetry and shield termination at that node.
Case 3 — Errors appear only at hot/cold
Risk: drift changes edge timing and thresholds so a reflection moves into the effective window. Symptom: NORMAL is stable at room temp but shows periodic burst errors at corners. Observable: burst_flag asserted; errors correlate with temp_c changes; retrain/hour rises at corners. Fix: tighten harness topology; increase margin via port symmetry and controlled shielding; re-baseline across corners.
Case 4 — EMC step triggers link flaps
Risk: mode conversion / common-mode injection pushes the channel over the stability cliff. Symptom: NORMAL → ERROR → RECOVERY loops during specific EMC stimuli. Observable: fault reason codes align with EMC steps; error bursts cluster in time windows. Fix: stabilize shield bonding and return paths; reduce asymmetry; then tune TX shaping with a locked harness baseline.
Case 5 — Protection vendor change makes BER worse
Risk: Δparasitics (Cdiff / mismatch) increases mode conversion and local mismatch at the port. Symptom: immediate shift in errors/1k after BOM change; training becomes slower. Observable: train_time_ms ↑; errors rise even with unchanged harness; unit-to-unit variance increases. Fix: enforce symmetry constraints; qualify parts by channel impact (not only ESD energy); re-run baseline with the same harness.
Case 6 — Retrain storms after harness reroute
Risk: increased coupled noise and altered shield return paths change common-mode behavior. Symptom: ERROR ↔ RECOVERY repeats; link never settles into a stable baseline. Observable: retrain/hour exceeds X; failures cluster when nearby loads switch. Fix: restore routing separation; improve shield bonding strategy; apply a storm guard and re-qualify across events.
Use this map to classify failures before tuning: first lock the harness topology (stubs/branches/connectors), then evaluate PHY settings against a stable baseline channel.
H2-4. Clocking & Latency Hooks inside the PHY (what matters for automotive)
Clocking affects link stability primarily through lock behavior (PLL/CDR) and the repeatability of internal boundaries.
The actionable objective is to expose measurable hooks so training time, burst errors, and latency variation can be correlated to events.
Clock inputs (what to treat as first-class risk)
REFCLK stability: marginal reference clock frequently appears as intermittent training or periodic flaps rather than a clean fail.
Noise coupling path: clock routing and return-plane integrity can inject supply/EMI noise into PLL lock behavior.
Temperature dependency: clock source and internal thresholds drift; correlation fields (temp_c, vdd_mv) must be logged alongside link metrics.
Internal clock domains (minimal model for debugging)
REFCLK → PLL → TX path: transmit shaping and symbol timing rely on PLL stability; lock transitions can change observed error patterns.
Line → CDR → RX path: the receiver CDR must remain locked under harness and noise conditions; lock instability often appears as burst errors.
PCS → MAC IF boundary: monitoring should be anchored at PHY evidence (status/counters), not inferred from higher-layer traffic behavior.
What to measure (hooks that enable repeatable correlation)
Board-level hooks
REFCLK test pad: probeable reference clock point near the PHY (correlate with training outcomes).
Clean return path: preserve reference plane continuity so clock quality reflects the source, not layout artifacts.
PHY observability (MDIO / INT / counters)
Lock states: pll_lock, cdr_lock, and first fault reason code.
Training evidence: train_time_ms, train_attempts, timeout_count.
Stability evidence: errors_per_1k, retrain_per_hour, burst_flag (threshold X placeholders).
Recommended correlation fields (log schema)
Use a stable denominator and always log the same context fields alongside errors, for example temp_c, vdd_mv, and the lock states listed above.
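One way to keep those fields consistent is a single record type that every logger serializes identically. The sketch below is an assumed schema, not a standard one: field names follow the hooks named on this page, and grouping them into one `LinkSnapshot` record (with JSON output) is an illustrative choice.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LinkSnapshot:
    """One evidence record; every field is logged on every event."""
    timestamp_ms: int
    state: str
    temp_c: float
    vdd_mv: int
    harness_id: str
    profile_id: str
    pll_lock: bool
    cdr_lock: bool
    train_time_ms: int
    train_attempts: int
    errors_per_1k: float
    retrain_per_hour: float
    burst_flag: bool
    fault_reason: str = ""


def to_log_line(snapshot: LinkSnapshot) -> str:
    """Serialize with sorted keys so logs diff cleanly across stations."""
    return json.dumps(asdict(snapshot), sort_keys=True)


snap = LinkSnapshot(
    timestamp_ms=120_000, state="NORMAL", temp_c=85.0, vdd_mv=3290,
    harness_id="H-A1", profile_id="P-03", pll_lock=True, cdr_lock=True,
    train_time_ms=42, train_attempts=1, errors_per_1k=0.0,
    retrain_per_hour=0.0, burst_flag=False,
)
print(to_log_line(snap))
```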
Diagram — Clock Domain Block (REFCLK → PLL / RX CDR → PCS → MAC IF, with test hooks)
The diagram anchors debugging to PHY evidence: lock states, training time, counters, and a probeable REFCLK point—enabling repeatable correlation across harness and environment.
H2-5. TC10 Sleep/Wake Behavior (low-power with wake robustness)
TC10 power states must be engineered as an observable three-stage contract: Entry (enter reliably),
Monitor (stay asleep without false wakes or storms), and Exit (wake, retrain, and remain stable).
The goal is to prevent false wake, wake failure, and wake oscillation
using PHY-visible hooks (state bits, interrupts, counters) and board-level signals.
TC10 Stage A — Entry (enter TC10 deterministically)
Entry prerequisites (keep it PHY-bound)
Power domain readiness: rails stable before state transition; avoid brownout-induced bounce.
Config completeness: strap/MDIO options consistent with intended TC10 mode; wake enables set.
Wake pin definition: polarity and pull network defined; avoid floating inputs.
Entry pass criteria (placeholders)
Entry success rate: ≥ X% over Y cycles (same harness + environment).
Entry time distribution: entry_time_ms P50 ≤ X and P95 ≤ X (avoid tail risk).
No bounce: tc10_state does not revert within Z ms after entry.
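The entry criteria above reduce to a success rate plus two percentiles of the entry-time distribution. The sketch below uses the nearest-rank percentile definition (an assumption; other definitions interpolate), and the limits passed to `entry_gate` are placeholders standing in for X. The no-bounce check is not modeled here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def entry_gate(entry_times_ms, attempts, p50_limit_ms, p95_limit_ms,
               min_success_pct):
    """Evaluate TC10 entry: entry_times_ms holds successful entries only."""
    if attempts <= 0 or not entry_times_ms:
        raise ValueError("need at least one attempt and one successful entry")
    success_pct = 100.0 * len(entry_times_ms) / attempts
    p50 = percentile(entry_times_ms, 50)
    p95 = percentile(entry_times_ms, 95)
    passed = (
        success_pct >= min_success_pct
        and p50 <= p50_limit_ms
        and p95 <= p95_limit_ms
    )
    return {"success_pct": success_pct, "p50_ms": p50, "p95_ms": p95,
            "passed": passed}


# Placeholder data and limits, for illustration only.
times = [4, 5, 5, 6, 5, 4, 7, 5, 6, 12]
print(entry_gate(times, attempts=10, p50_limit_ms=6, p95_limit_ms=15,
                 min_success_pct=99.0))
```

Checking P95 separately from P50 is what exposes the tail risk: a single slow entry can pass the median and still fail the gate.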
TC10 Stage B — Monitor (stay asleep; stop false wake & storms)
False wake (wake without real intent)
Typical PHY-visible triggers: common-mode injection, WAKE pin glitches, threshold jitter under harness/EMC events. Quick checks: compare wake_reason codes vs WAKE edge count; correlate with temp_c / emc_step. Fix direction: debounce/filter window (X ms), stabilize pull network, reduce asymmetry that increases mode conversion. Pass criteria: false_wake_rate ≤ X/hour in Y-hour soak under corners.
Wake storm (sleep↔wake oscillation)
Definition (placeholder): wake_events_per_minute > X or tc10_toggle_count > X within Y minutes. Observable: repeated INT reasons, retrain_count spikes, train_time_ms does not converge. Fix direction: storm guard (rate-limited re-entry), widen stability window, lock baseline harness before tuning knobs. Pass criteria: storm_events = 0 over Y hours; sleep_residency ≥ X%.
Diagram — TC10 Timeline (Sleep → Wake detect → Link re-train → Normal)
Use three windows as non-negotiable gates: debounce (X ms), training convergence (X ms), and post-wake stability observation (X s) to prevent “wake then fall back” behavior.
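The wake-storm definition (more than X wake events within a time window) maps naturally onto a sliding-window counter. The following is a minimal sketch; `StormGuard` is a hypothetical name, the thresholds are placeholders, and what the system does once a storm is flagged (for example rate-limited re-entry) is left to the integrator.

```python
from collections import deque


class StormGuard:
    """Flags a wake storm when more than max_events fall within window_s."""

    def __init__(self, max_events: int, window_s: float):
        self.max_events = max_events
        self.window_s = window_s
        self._events = deque()  # timestamps (seconds) of recent wake events

    def record_wake(self, t_s: float) -> bool:
        """Record a wake event; return True if the storm threshold is exceeded."""
        self._events.append(t_s)
        # Drop events that have aged out of the window.
        while self._events and t_s - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_events


guard = StormGuard(max_events=3, window_s=60.0)
for t in (0.0, 10.0, 20.0, 30.0, 200.0):
    print(t, guard.record_wake(t))
```

Timestamps are assumed to be monotonic; a real implementation would also log the wake_reason for each event so storms can be correlated with their triggers.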
H2-6. EMC-Shaped Transmit (shaping knobs vs link margin)
Transmit shaping is a set of PHY-level knobs that reduce spectral peaks and manage common-mode behavior, but every knob trades off against link margin.
The disciplined flow is: lock the harness baseline, tune one knob, and verify both
EMI peak and BER margin across corners.
Shaping goals (keep scope to PHY effects)
Lower EMI peaks: reduce radiated hotspots by controlling edges and spectral distribution.
Control common-mode spectrum: limit DM→CM leakage sensitivity and reduce CM-driven emissions.
Preserve link margin: ensure training and stability remain robust across temperature and harness variation.
Knobs (each with effect, side effect, validation, pass criteria placeholders)
Knob — Slew / edge rate
Effect: slower edges reduce high-frequency spectral peaks. Side effect: smaller eye opening under corner harness; training tail can widen. Validate: training_time P95, errors_per_1k, retrain/hour vs EMI peak scan. Pass criteria: EMI_peak_reduction ≥ X and errors_per_1k ≤ X (same baseline).
Knob — Drive strength
Effect: adjusts signal amplitude and spectral energy distribution. Side effect: too strong can worsen emissions; too weak can collapse margin on long/variable harness. Validate: compare errors burst rate vs harness variants; confirm stable retrain/hour. Pass criteria: retrain_per_hour ≤ X and BER margin trend stable across corners.
Knob — Pre-emphasis / De-emphasis / EQ profile
Effect: compensates channel loss; can recover margin on difficult harnesses. Side effect: over-emphasis amplifies noise; can create burst errors and worsen EMI in certain bands. Validate: correlate errors with load and temperature; check EMI peaks with fixed harness baseline. Pass criteria: errors_per_1k ≤ X and EMI_peak ≤ X (no new peak regressions).
Knob — Adaptive EQ (enable/limit/freeze)
Effect: tracks channel variation; improves robustness when harness variance is unavoidable. Side effect: adaptation can drift under noise, producing instability or longer convergence tails. Validate: train_attempts distribution and retrain/hour under EMC stimuli and corners. Pass criteria: train_time_ms P95 ≤ X and storm_events = 0 in Y-hour soak.
Knob — Common-mode control
Effect: shapes CM behavior to reduce CM-driven emissions and sensitivity. Side effect: CM changes can interact with wake detect and marginal channels if asymmetry exists. Validate: monitor false_wake_rate and errors_per_1k before/after CM tuning with identical harness. Pass criteria: CM emission trend improves and false_wake_rate ≤ X/hour.
Knob — Shaping profile selection (freeze after validation)
Effect: provides discrete, repeatable settings across units and stations. Side effect: profiles that pass in one harness may fail in another; baseline lock is mandatory. Validate: corner replay (temp + harness variants) with stable denominators (per 1k / per hour). Pass criteria: EMI and margin criteria both met across all required corner runs.
Trade-off closure (how to avoid “passes EMC but becomes fragile”)
Treat shaping as a repeatable profile: adjust a single knob at a time and freeze the profile only after EMI and margin pass criteria remain stable across corner harness and temperature runs.
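The freeze decision can be made mechanical: a profile is frozen only if every corner run meets both the EMI and the margin criteria. The sketch below assumes one record per corner run; `CornerRun`, `can_freeze_profile`, and the numeric limits are illustrative placeholders rather than values from any standard.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CornerRun:
    label: str                    # e.g. "hot / harness B"
    emi_peak_reduction_db: float  # versus the locked baseline
    errors_per_1k: float
    retrain_per_hour: float


def can_freeze_profile(
    runs: Sequence[CornerRun],
    min_emi_reduction_db: float,
    max_errors_per_1k: float,
    max_retrain_per_hour: float,
) -> Tuple[bool, List[str]]:
    """Return (ok, failing_labels); an empty run list never passes."""
    failing = [
        run.label
        for run in runs
        if not (
            run.emi_peak_reduction_db >= min_emi_reduction_db
            and run.errors_per_1k <= max_errors_per_1k
            and run.retrain_per_hour <= max_retrain_per_hour
        )
    ]
    return (len(runs) > 0 and not failing, failing)


runs = [
    CornerRun("room / harness A", 4.0, 0.01, 0.0),
    CornerRun("hot / harness B", 3.5, 0.20, 1.0),
]
print(can_freeze_profile(runs, min_emi_reduction_db=3.0,
                         max_errors_per_1k=0.05, max_retrain_per_hour=0.5))
```

Returning the failing corner labels, not just a boolean, points the next single-knob iteration at the corner that broke.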
H2-7. ASIL Hooks (diagnostics and evidence chain)
Functional safety integration at the PHY is about an auditable evidence chain: what is detected,
how it reacts, and what is logged. Diagnostics should be expressed as
items with clear windows, denominators, and thresholds (X) so system monitors can make deterministic decisions.
Diagnostic coverage map (PHY-visible)
Link integrity: link_state, flap detection, retrain tracking.
Signal quality: SQI/quality indicators and minima over windows.
Error accounting: CRC/frame errors, burst behavior, errors per 1k frames.
Remote fault: remote_fault flags and train_fail codes for correlation.
Latch & safe entry: fault latch state and controlled safe-state paths.
Keep the evidence chain auditable: every decision must have a windowed detection rule and a reproducible snapshot field set.
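A windowed detection rule with a latch and a snapshot might look like the sketch below. It is an illustration under stated assumptions: the window is counted in frames, the threshold and context fields are placeholders, and `WindowedErrorMonitor` is a hypothetical name, not a safety-qualified mechanism.

```python
class WindowedErrorMonitor:
    """Latches a fault when errors in a completed window exceed max_errors."""

    def __init__(self, window_frames: int, max_errors: int):
        self.window_frames = window_frames
        self.max_errors = max_errors
        self.frames = 0
        self.errors = 0
        self.latched = False
        self.snapshot = None  # evidence captured at the moment of latching

    def update(self, frames: int, errors: int, context: dict) -> bool:
        """Accumulate counters; evaluate the rule once a window is complete."""
        self.frames += frames
        self.errors += errors
        if self.frames >= self.window_frames:
            if self.errors > self.max_errors and not self.latched:
                self.latched = True
                self.snapshot = {"frames": self.frames,
                                 "errors": self.errors, **context}
            # Start a fresh window whether or not the rule fired.
            self.frames = 0
            self.errors = 0
        return self.latched

    def clear(self) -> None:
        """Explicit clear; the latch never releases on its own."""
        self.latched = False
        self.snapshot = None


monitor = WindowedErrorMonitor(window_frames=1000, max_errors=2)
monitor.update(500, 1, {"temp_c": 25.0})
monitor.update(500, 2, {"temp_c": 85.0, "link_state": "NORMAL"})
print(monitor.latched, monitor.snapshot)
```

Keeping the latch until an explicit clear means a transient burst still leaves a reproducible snapshot for the system monitor to read.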
H2-8. Protection & Survivability at the PHY Port (ESD/surge, CM choke, magnetics note)
Port protection must survive ESD/surge families while preserving differential integrity. The engineering focus is the
stack (CMC / TVS / connector) and the return path.
Low-capacitance TVS symmetry (Cdiff/ΔC) and CMC placement can improve EMC, but if misapplied they cost link margin and lengthen training tails.
Keep the protection stack symmetric and the return path short. Any change in TVS/CMC can shift training tails and false-wake behavior; re-validate with the same denominators and windows.
H2-9. Reference Design & Layout (minimum rules that prevent re-spins)
Layout quality determines whether the PHY has stable training margin and repeatable EMC outcomes. The rules below are the
minimum set that prevents common re-spins: differential continuity, correct port/shield handling near the connector,
and power/ground practices that avoid hidden coupling paths.
The 10 hard rules (Rule → Why → Quick verify)
Each rule is written as an executable action with a fast verification step. Thresholds are placeholders (X).
A) Differential pair integrity (routing)
Rule 1 — Keep a continuous reference plane under the pair
Why
Plane cuts force return current detours, raising common-mode energy and shrinking training margin.
Quick verify
Review the pair path for any split/slot crossings. Any unavoidable crossing must include a short, direct return bridge (X rules).
Rule 2 — Match geometry and symmetry before chasing micrometer length match
Why
Over-aggressive meanders add discontinuities and mode conversion; stable symmetry usually beats excessive serpentine length match.
Quick verify
Limit serpentine density and keep both sides mirror-symmetric. Correlate any meander region with error burst hotspots (X).
Rule 3 — Keep via transitions paired, symmetric, and short
Why
Asymmetric transitions create differential imbalance and reflections that show up as longer training tails and intermittent CRC bursts.
Quick verify
Inspect every layer swap: two signal vias with symmetric spacing and a nearby return strategy. Flag any “single-via” deviation.
Rule 4 — Avoid T-branches and minimize board-level stubs
Why
Stub reflections can land inside training/sample windows, turning “link up” into periodic flaps under temperature or harness variation.
Quick verify
Count all branching points and measure any spur length. Target spur length < X (board-level), then re-check train_time_ms P95.
B) Connector & shield (port-near best practices)
Rule 5 — Provide a short, low-impedance shield bond near the connector
Why
High-frequency common-mode energy must return locally; otherwise it couples into board ground and amplifies radiated emissions and error bursts.
Quick verify
Confirm the shield bond loop is short and wide. Any “long trace to ground” around the connector is a red flag.
Rule 6 — Place Y-cap or discharge paths as close as possible to the entry point
Why
A discharge element that is far from the connector turns into an injection path; the loop area becomes the antenna.
Quick verify
Compare loop area for the discharge return. If the return detours around keep-outs or plane gaps, treat as a re-spin candidate.
Rule 7 — Follow the protection-stack priority (TVS/CMC) and do not “push it inward”
Why
Clamping after the energy has already entered the board defeats the purpose and can worsen both survivability and link margin.
Quick verify
TVS should sit near the connector (priority P1). Any relocation requires re-baselining train_time_ms P95 and errors_per_1k (X).
C) Power & ground (noise and coupling control)
Rule 8 — Use a staged decoupling network: near-HF + mid + bulk
Why
PHY internal clocking and transmit shaping are sensitive to supply transients; poor decoupling turns current steps into common-mode disturbance.
Quick verify
Check capacitor placement by loop area (pad-to-via-to-plane). Any long stubs to decaps should be flagged for revision.
Rule 9 — Identify and cut the coupling paths (REFCLK/IO → supply → port)
Why
Many field-only CRC bursts come from internal coupling, not from the harness. Layout must prevent digital noise from modulating the port common-mode.
Quick verify
Mark three likely coupling paths on the layout (X). Validate by correlating error bursts with switching activity and supply noise snapshots.
Rule 10 — Add test hooks without breaking symmetry or return continuity
Why
Test pads, probe grounds, and fixtures can create stubs and common-mode injection, producing misleading “fails” during debug.
Quick verify
A/B compare with and without probes/fixtures: sqi_min and errors_per_1k should not shift beyond X.
Treat the port region as a controlled system: return path, symmetry, and placement priorities decide both robustness and EMC repeatability.
H2-10. Validation & Test Hooks (what to measure, how to pass)
Validation should establish a reproducible baseline, prove stability across corners, and provide traceable evidence fields.
The checklist below is organized as Design gate, Bring-up gate, and
Production gate, with placeholder thresholds (X).
Checklist (Design gate → Bring-up gate → Production gate)
Gate A — Design gate (prevent “built-in” failures)
Must-check
Port stack placement: TVS/CMC priority and short return loops.
Pair continuity: no plane-split crossings, symmetric transitions, minimal stubs.
Supply integrity: staged decoupling and identified coupling paths.
H2-11. Applications (use-cases mapped to the three mainlines)
This section maps real vehicle link use-cases to the page's three mainlines: TC10 wake robustness,
EMC-shaped transmit, and ASIL-grade observability. Only PHY-facing constraints and
pass/fail gates are included; stack and TSN topics are not expanded here.
Use-case A · Camera / ADAS edge link
Focus: TC10 + EMC + harness variability
Context
Point-to-point STP/UTP link to a remote sensor module (camera/radar/telematics). Cold-start and frequent sleep/wake cycles
often combine with strict EMI limits and harness/connector variability across trims.
PHY constraints (only PHY-facing)
Wake path sensitivity: TC10 entry/exit windows, wake filter behavior, and wake-source arbitration (local vs remote).
EMI vs margin trade: TX spectral shaping can lower peaks but reduce eye/BER margin under temperature/harness drift.
Burst errors: error bursts often correlate with retrain attempts, CM noise events, or edge-rate shifts.
Design hooks
Profile ID: freeze a per-vehicle “PHY profile” (TC10 + TX shaping + diagnostics set) and log the profile_id with every field event.
Wake de-glitch: place wake filter/debounce close to the PHY wake pin and keep wake ground return short/quiet.
Closed-loop EMC: link pre-scan “hot bands” back to a limited set of TX shaping knobs (avoid uncontrolled tuning).
Pass criteria (placeholders)
Wake success rate ≥ X% across temperature/voltage corners; false-wake ≤ X/hour.
Retrain count ≤ X/day; error_bursts_per_1k ≤ X with defined time window.
EMI peak reduction ≥ X dB versus baseline with no BER regression beyond X.
Example BOM (non-exhaustive, for grounding the discussion)
Use-case B · Zonal (spur-rich harness)
Focus: stubs + false wake + diagnostics
Context
Short branches and topology variants dominate. Field issues often present as “random drops” or “wake storms” that only occur in certain harness builds.
PHY constraints
Reflection timing: spur/stub reflections can land inside the sampling/training window and create repeatable CRC bursts.
Wake ambiguity: CM events can look like wake activity unless wake filtering and grounding are controlled.
Serviceability: without standardized counters and fields, “which branch failed” becomes unanswerable.
Design hooks
Harness rule: enforce stub length < X and document topology_variant per vehicle variant.
Counters schema: define a single window/denominator for error rates (errors_per_1k) and log P50/P95.
Pass criteria (placeholders)
train_time_ms P95 ≤ X across harness variants; retrain/hour ≤ X.
False wake ≤ X/hour in the worst-case CM event environment.
CRC bursts correlate to a single captured topology signature within X minutes of logging.
Example BOM anchors
PHY examples often used in zonal links:
NXP TJA1101B (100BASE-T1), TI DP83TC811R-Q1 (100BASE-T1), Broadcom BCM89810 (100BASE-T1), Microchip LAN8770 (100BASE-T1)
Use-case C · Domain controller (multi-port)
Focus: consistency + logs + isolation of noise paths
Context
Multi-port PHY deployment concentrates clock/power noise and increases the need for identical configuration, identical diagnostics, and repeatable production gates per port.
PHY constraints
Port-to-port drift: identical harnesses can behave differently if REFCLK routing, decoupling, or CM return is asymmetric.
Profile skew: mixed TX shaping / TC10 / counter settings destroy comparability and slow field triage.
Evidence alignment: logs must share the same field schema and time window to compare ports.
Design hooks
Per-port profile control: enforce a single “golden” configuration set and gate any deviations by profile_id.
Noise containment: isolate per-port return and avoid plane cuts under the MDI and protection parts.
Forensics-ready counters: snapshot counters on every wake/retrain/error burst with temp_c and vdd_mv.
Pass criteria (placeholders)
All ports meet EMC target with a single shaping profile (no per-port “special tuning”).
Field triage can identify failing port + harness_id + protection_rev within X minutes.
Example BOM anchors
1G multi-port deployments commonly anchor on:
TI DP83TG720S-Q1
NXP TJA1121
Marvell 88Q2110 / 88Q2112
Figure: Use-case tiles (PHY view), three scenes.
Diagram intent: quickly select the matching scene, then apply the same three mainlines (TC10 / EMC shaping / ASIL evidence) with scene-specific gates.
H2-12. IC Selection Logic (metrics → decision tree)
Selection is framed as a decision flow with explicit gates, not a vendor comparison.
The goal is to minimize re-spins by deciding topology/harness first, then EMC strategy, then TC10 wake robustness, then diagnostics/ASIL evidence.
Decision flow (4 steps, each outputs a gate)
Step 1 · Topology / harness
Decide trunk vs spur-rich, expected harness variants, and connector environment.
Gate outputs: stub_len < X, return_loss_margin ≥ X, topology_variant logged.
Step 2 · EMC strategy
Require a controllable set of TX shaping knobs and a measurable EMC closure loop.
Gate outputs: shaping_range ≥ X, CM control mode available, no uncontrolled “field tuning”.
Step 3 · TC10 sleep/wake robustness
Confirm entry/monitor/exit behavior, wake sources, and false-wake suppression.
Gate outputs: wake_success ≥ X%, false_wake ≤ X/hour, retrain_after_wake ≤ X.
Purpose: close long-tail troubleshooting without expanding scope. Each FAQ uses a fixed, auditable 4-line format and shares the same metric/log schema.
Shared metric schema (used in every Pass criteria)
Core metrics
errors_per_1k: PHY-layer error events per 1000 frames (window = W seconds, fixed across ports/vehicles).
burst_count: number of bursts where ≥ M errors occur within T ms (burst definition is fixed).
retrain_count: retrain attempts per hour, and retrain_after_wake within S seconds after wake.
train_time_ms P95: 95th percentile bring-up time across N cycles (captures tail failures).
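The two rate-style metrics above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the nearest-rank percentile is one assumed definition of P95 (other definitions interpolate and give slightly different tails). Whatever definition is chosen has to stay fixed across ports and vehicles, like the window W.

```python
import math

def errors_per_1k(error_events: int, frames: int) -> float:
    """Error events per 1000 frames observed inside one fixed window W."""
    if frames <= 0:
        raise ValueError("frames must be positive; an empty window has no rate")
    return 1000.0 * error_events / frames

def percentile_nearest_rank(samples, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Illustrative bring-up times (ms) for N = 10 cycles; one slow outlier.
train_times_ms = [82, 85, 90, 88, 84, 86, 91, 83, 87, 240]

print(errors_per_1k(error_events=3, frames=12000))   # 0.25
print(percentile_nearest_rank(train_times_ms, 50))   # 86
print(percentile_nearest_rank(train_times_ms, 95))   # 240
```

In this made-up data set the median looks healthy while P95 exposes the single 240 ms bring-up, which is the tail failure the metric is meant to capture.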
▸ TC10 enters sleep, then sporadic “wake storm” — check wake source or harness common-mode events first?
Likely cause: wake_src is truly asserted (legit remote/local wake) or CM disturbance is interpreted as wake activity (ground/shield/CMC/return path).
Quick check: correlate each wake with wake_reason + IRQ cause and the same-window burst_count / errors_per_1k; verify harness_id/batch_id pattern.
Fix: tighten wake filter/debounce (bounded change), then harden CM return (shield bond/Y-cap location/CMC placement) and freeze a single profile_id for TC10+filters.
Pass criteria: false_wake ≤ X/hour, wake storm disappears under worst CM events; burst_count ≤ X in window W.
▸ Remote wake success rate is low — wake window/debounce or power domains not ready?
Likely cause: the wake detect window is too narrow (or the wake pulse is filtered out), or VDD/REFCLK/strap readiness arrives late so the wake ends in a failed retrain.
Quick check: timestamp sequence: wake_detect → rails_ready → refclk_stable → retrain_start; flag cases where rails_ready arrives after wake window (S seconds).
Fix: adjust wake window/debounce (bounded), then gate wake exit on rails_ready/refclk_stable; keep reset/strap deterministic across all ports.
Pass criteria: wake_success ≥ X% across temp/voltage corners; retrain_after_wake ≤ X within S seconds.
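The timestamp check in this entry can be automated roughly as follows. This is a sketch under assumptions: the `WakeTrace` record, its field names, and the idea that all four timestamps share one clock in seconds are hypothetical and not taken from any PHY driver.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class WakeTrace:
    """One wake attempt; timestamps in seconds on a shared clock (assumed)."""
    wake_detect_s: float
    rails_ready_s: Optional[float]
    refclk_stable_s: Optional[float]
    retrain_start_s: Optional[float]

def check_wake_sequence(trace: WakeTrace, wake_window_s: float) -> List[str]:
    """Return findings; an empty list means the sequence looks consistent."""
    findings = []
    stages = [
        ("rails_ready", trace.rails_ready_s),
        ("refclk_stable", trace.refclk_stable_s),
        ("retrain_start", trace.retrain_start_s),
    ]
    prev_name, prev_t = "wake_detect", trace.wake_detect_s
    for name, t in stages:
        if t is None:
            findings.append(f"{name} missing")
            continue
        if t < prev_t:
            findings.append(f"{name} before {prev_name}")
        prev_name, prev_t = name, t
    # Flag rails that become ready only after the wake window has closed.
    if (trace.rails_ready_s is not None
            and trace.rails_ready_s - trace.wake_detect_s > wake_window_s):
        findings.append("rails_ready after wake window")
    return findings

good = WakeTrace(0.0, 0.010, 0.015, 0.020)
late = WakeTrace(0.0, 0.080, 0.085, 0.090)
print(check_wake_sequence(good, wake_window_s=0.05))  # []
print(check_wake_sequence(late, wake_window_s=0.05))  # ['rails_ready after wake window']
```

Running this over every logged wake gives a count of late-rail cases that can be compared directly against the wake_success figure.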
▸ Link meets BER/CRC targets but EMI fails — tune TX shaping first or check CM return/CMC placement first?
Likely cause: TX shaping profile is not aligned to harness resonance or CM return path is uncontrolled (shield bond/return discontinuity/CMC location).
Quick check: lock harness_id and environment, then A/B one shaping knob at a time; if EMI hot-band moves with shaping, start there—if not, suspect return/CMC placement.
Fix: define a bounded knob set (slew/drive/EQ/CM control) → freeze profile_id; if still failing, rework CM return (short, continuous, near connector).
Pass criteria: EMI hot-bands meet target with fixed profile_id; errors_per_1k ≤ X and no burst_count regression beyond X.
▸ Increasing drive improves errors but worsens EMI — how to balance with a single metric set?
Likely cause: objective function is undefined—drive raises eye margin but spikes spectral peaks; “good BER” and “good EMI” are not gated together.
Quick check: use a 3-metric gate: errors_per_1k (window W) + EMI_peak at hot-band + retrain_count/hour; compare profiles by profile_id only.
Fix: bind drive to shaping/EQ as one profile, set hard upper bound for drive, and require EMI+retrain gates before accepting any “BER improvement”.
Pass criteria: errors_per_1k ≤ X, EMI_peak ≤ X, retrain_count/hour ≤ X under the same harness_id and corners.
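One way to make the three-metric gate explicit is to evaluate all limits together and report which ones fail, so a BER improvement cannot be accepted on its own. The class names, field names, the dBµV unit, and the numeric limits below are illustrative assumptions standing in for the X placeholders.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ProfileResult:
    profile_id: str
    errors_per_1k: float      # fixed window W
    emi_peak_dbuv: float      # peak at the agreed hot band (unit assumed)
    retrain_per_hour: float

@dataclass(frozen=True)
class GateLimits:
    max_errors_per_1k: float
    max_emi_peak_dbuv: float
    max_retrain_per_hour: float

def failed_gates(result: ProfileResult, limits: GateLimits) -> List[str]:
    """Names of the gates this profile fails; empty list means accept."""
    failures = []
    if result.errors_per_1k > limits.max_errors_per_1k:
        failures.append("errors_per_1k")
    if result.emi_peak_dbuv > limits.max_emi_peak_dbuv:
        failures.append("EMI_peak")
    if result.retrain_per_hour > limits.max_retrain_per_hour:
        failures.append("retrain_count/hour")
    return failures

limits = GateLimits(max_errors_per_1k=0.5, max_emi_peak_dbuv=40.0,
                    max_retrain_per_hour=1.0)
baseline = ProfileResult("profile_a", 0.20, 38.0, 0.5)
high_drive = ProfileResult("profile_b", 0.05, 44.0, 0.2)

print(failed_gates(baseline, limits))    # []
print(failed_gates(high_drive, limits))  # ['EMI_peak']
```

In this invented comparison the higher-drive profile has the better error rate but is still rejected, because the EMI gate is evaluated in the same pass.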
▸ Training fails sporadically after a TVS change — suspect Cdiff or ΔC mismatch first?
Likely cause: TVS adds too much differential capacitance (Cdiff) or pair mismatch (ΔC) creates mode conversion that breaks training margin.
Quick check: A/B compare old vs new TVS on the same board/harness; flag if failures correlate with direction, temperature, or only certain batch_id.
Fix: revert to lower-C/low-ΔC TVS, keep placement symmetric and as close as possible to the connector with a controlled return path.
Pass criteria: train_success ≥ X%, train_time_ms P95 ≤ X, retrain_count/hour ≤ X across corners.
▸ Adding CMC improves EMI but sporadic link drops appear — check DM insertion loss or CM saturation first?
Likely cause: CMC adds differential-mode loss/bandwidth limitation or saturates under certain CM bias/events, causing nonlinear bursts.
Quick check: correlate drops with burst_count + temp_c/vdd_mv; if failures cluster with high-current events or temperature rise, suspect saturation; otherwise suspect insertion loss.
Fix: select CMC validated for T1 bandwidth, place per reference design, and avoid pairing “aggressive shaping + marginal CMC” (freeze safe profile_id).
Pass criteria: EMI target met and retrain_count/hour ≤ X; burst_count ≤ X in W; errors_per_1k ≤ X.
▸ Link is stable at low temp but drops at high temp — TX amplitude drift or power-noise coupling first?
Likely cause: temperature shifts TX edge/level enough to reduce margin or supply/ground noise coupling increases at high temp (decoupling/return weakness).
Quick check: plot temp_c vs train_time_ms P95 and errors_per_1k for fixed harness_id/profile_id; if VDD variation co-moves with errors, suspect power coupling.
Fix: choose a temperature-robust shaping profile, then harden local decoupling and return continuity near the PHY and MDI network.
Pass criteria: errors_per_1k ≤ X and retrain_count/hour ≤ X across the full temperature range for fixed harness_id/profile_id; train_time_ms P95 ≤ X at the high-temperature corner.
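The "co-moves" test in the quick check can be made numeric with a plain Pearson correlation between a supply-variation series and errors_per_1k over the same windows. This is a rough sketch: `vdd_ripple_mv` is a hypothetical per-window figure derived from vdd_mv samples, the data is invented, and a high correlation only prioritizes the power-coupling hypothesis; it does not prove it.

```python
import math

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Invented per-window samples at rising temperature, same harness_id/profile_id.
vdd_ripple_mv = [20, 22, 25, 40, 55, 60]
errors_per_1k = [0.1, 0.1, 0.2, 0.6, 1.1, 1.3]

r = pearson(vdd_ripple_mv, errors_per_1k)
print(round(r, 3))
```

If the coefficient stays near zero while errors still rise with temp_c, TX amplitude or edge drift becomes the more likely first suspect.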
▸ Error counters spike but scope looks “fine” — metric window/denominator mismatch or burst mechanism?
Likely cause: counter is interpreted with the wrong window/denominator or errors occur as short bursts that the scope never triggered on.
Quick check: align the counter window W to scope trigger capture; check burst_count with a fixed definition (≥M errors within T ms) and snapshot at every burst.
Fix: standardize metric definitions (errors_per_1k, burst_count) and add burst-triggered captures/log snapshots tied to profile_id and harness_id.
Pass criteria: errors_per_1k ≤ X and burst_count ≤ X in W; no unexplained spikes after definition unification.
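A fixed burst definition is easier to enforce when it exists as code. The sketch below counts non-overlapping bursts of at least M errors within T ms from a list of error timestamps. Counting bursts greedily and without overlap is an assumption made here, since the definition above does not say how overlapping windows are treated.

```python
def count_bursts(error_times_ms, min_errors: int, window_ms: float) -> int:
    """Count non-overlapping bursts: >= min_errors errors within window_ms."""
    if min_errors < 1 or window_ms < 0:
        raise ValueError("min_errors must be >= 1 and window_ms >= 0")
    times = sorted(error_times_ms)
    bursts = 0
    i = 0
    while i < len(times):
        # Extend j over every error that falls inside the window opened at times[i].
        j = i
        while j < len(times) and times[j] - times[i] <= window_ms:
            j += 1
        if j - i >= min_errors:
            bursts += 1
            i = j        # errors already assigned to this burst are not reused
        else:
            i += 1
    return bursts

# Invented error timestamps (ms): two tight clusters plus isolated errors.
error_times_ms = [0, 1, 2, 500, 900, 901, 902, 903, 2000]
print(count_bursts(error_times_ms, min_errors=3, window_ms=5))  # 2
```

Nine errors here collapse into two bursts, which is the kind of structure a free-running scope capture tends to miss and a plain error total hides.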
▸ ASIL diagnostic coverage is weak — which PHY-side evidence fields are most often missing?
Likely cause: evidence chain breaks because mandatory PHY observables are not logged consistently per event (wake/retrain/burst).
Quick check: for each event type, verify presence of: profile_id, wake_reason, train_time_ms, errors_per_1k, burst_count, retrain_count, temp_c, vdd_mv (completeness audit).
Fix: define an evidence schema + snapshot timing rules (on wake, on retrain, on burst), then connect to MCU safety monitor → DTC/log → safe state triggers.
Pass criteria: evidence completeness ≥ X% and event-to-DTC latency ≤ X ms; safe-state action taken within X when required.
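The completeness audit can be sketched as a check of each logged event against the field list from the quick check. Representing events as dictionaries and treating a `None` value as missing are assumptions for illustration, and so is applying the same list to every event type.

```python
from typing import Dict, List

REQUIRED_FIELDS = (
    "profile_id", "wake_reason", "train_time_ms", "errors_per_1k",
    "burst_count", "retrain_count", "temp_c", "vdd_mv",
)

def missing_fields(event: Dict) -> List[str]:
    """Required fields that are absent or None in one logged event."""
    return [name for name in REQUIRED_FIELDS if event.get(name) is None]

def completeness_pct(events: List[Dict]) -> float:
    """Percentage of events that carry every required field."""
    if not events:
        raise ValueError("no events to audit")
    complete = sum(1 for event in events if not missing_fields(event))
    return 100.0 * complete / len(events)

full = {
    "profile_id": "P7", "wake_reason": "remote", "train_time_ms": 86,
    "errors_per_1k": 0.2, "burst_count": 0, "retrain_count": 1,
    "temp_c": 85, "vdd_mv": 3290,
}
partial = {k: v for k, v in full.items() if k not in ("temp_c", "vdd_mv")}

events = [full, full, full, partial]
print(missing_fields(partial))   # ['temp_c', 'vdd_mv']
print(completeness_pct(events))  # 75.0
```

Note that a legitimate zero, such as `burst_count` of 0, counts as present. Only absent or `None` fields break the evidence chain in this sketch.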
▸ After wake, immediate repeated retraining occurs — retrain thresholds or over-aggressive EMC shaping?
Likely cause: retrain trigger threshold is too sensitive or EMC shaping reduces margin below what wake transient conditions require.
Quick check: inspect retrain_after_wake within S seconds and compare profiles by profile_id; if retrains cluster only on one aggressive profile, suspect shaping.
Fix: relax retrain thresholds within bounded limits and revert to a known-safe shaping profile for wake transitions; lock the wake profile_id.
Pass criteria: retrain_after_wake ≤ X, wake_success ≥ X%, errors_per_1k ≤ X in the first W seconds post-wake.
▸ Multi-port PHY: only one port is fragile — layout/return asymmetry or parameter inconsistency first?
Likely cause: the fragile port has a layout/return asymmetry (REFCLK routing, decoupling, CM return) or its TX shaping / TC10 / counter settings differ from the other ports.
Quick check: audit profile_id equality across ports first; then compare port-to-port deltas for train_time_ms P95 and errors_per_1k under the same harness_id and window W.
Fix: enforce identical profile_id across ports; if fragility remains, correct return continuity and protection symmetry (placement + shortest CM return).
Pass criteria: port-to-port delta ≤ X (train_time_ms P95, errors_per_1k, retrain_count/hour); no single-port outliers across corners.
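The audit order in this entry (profile equality first, then port-to-port deltas) can be sketched as below. The port names, the deviation-from-median outlier rule, and the numbers are illustrative assumptions. The pass criterion above only fixes that a delta limit exists, not how outliers are identified.

```python
from statistics import median
from typing import Dict, List

def profiles_match(profile_by_port: Dict[str, str]) -> bool:
    """True when every port reports the same profile_id."""
    return len(set(profile_by_port.values())) == 1

def port_to_port_delta(metric_by_port: Dict[str, float]) -> float:
    """Spread of one metric across ports (max minus min)."""
    values = list(metric_by_port.values())
    if not values:
        raise ValueError("no ports")
    return max(values) - min(values)

def outlier_ports(metric_by_port: Dict[str, float], max_dev: float) -> List[str]:
    """Ports whose metric deviates from the cross-port median by more than max_dev."""
    mid = median(metric_by_port.values())
    return sorted(p for p, v in metric_by_port.items() if abs(v - mid) > max_dev)

profile_by_port = {"port0": "P7", "port1": "P7", "port2": "P7", "port3": "P7"}
train_time_p95_ms = {"port0": 88, "port1": 90, "port2": 87, "port3": 140}

print(profiles_match(profile_by_port))         # True
print(port_to_port_delta(train_time_p95_ms))   # 53
print(outlier_ports(train_time_p95_ms, 20))    # ['port3']
```

With identical profiles confirmed, a single outlier like port3 in this invented data points at return continuity or protection symmetry rather than configuration.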