Ethernet Controllers / MAC-PHY: RGMII, SGMII, Offload & WOL
This page teaches how to select and bring up Ethernet controllers and MAC-PHY devices from the MAC side: pick the right host interface, stabilize DMA/queues, validate offloads, and make Wake-on-LAN reliable with measurable pass criteria.
It focuses on fail-fast debugging and production-ready configuration (counters, loopback isolation, strap/EEPROM/driver precedence), without expanding into analog PHY, magnetics/ESD, or TSN switching.
Scope & System View: What this MAC-PHY Controller Page Solves
This page focuses on the controller-side Ethernet endpoint: host interfaces (RGMII/SGMII and similar), DMA/buffering, feature offloads, and Wake-on-LAN—so bring-up reaches “link + packets + stable throughput” without drifting into PHY analog, magnetics, or TSN switching.
Page boundary (hard rule)
This page covers
- Host-side interfaces: RGMII/SGMII (timing, mode, status)
- Packet path: DMA rings/descriptors, buffering, interrupts/polling strategy
- Offloads: checksum, segmentation (if present), VLAN filtering hooks
- Low-power: Wake-on-LAN arming, pattern/magic handling, wake signaling
- Bring-up + field diagnostics: counters, loopback isolation (MAC/PCS-side)
This page does NOT cover
- PHY analog line performance, eye/return-loss tuning, or cable/magnetics design
- ESD/surge component selection and compliance test details
- TSN switch features (802.1AS time sync, Qbv scheduling, Qbu preemption) or industrial protocol stacks
Two common deployment patterns
1) SoC/MCU MAC + external PHY
Most failures are controller-side: RGMII/SGMII mode and timing, reset/strap correctness, DMA starvation, offload toggles, and WOL bring-up. Line-side analog and magnetics stay outside this page boundary.
2) External MAC-PHY controller (SPI/USB/PCIe ↔ Ethernet)
Critical factors shift toward host-bus throughput, driver maturity, EEPROM/strap configuration, WOL wake path, and counter-driven field isolation. The practical target is stable packets under load, not just “link up”.
Expected outcome (what “done” means)
- Bring-up: link established, packets pass bidirectionally, counters stay clean
- Performance: sustained throughput without ring overruns or CPU interrupt storms
- Low-power: WOL arms reliably and wakes the host with bounded latency
Pass criteria (placeholders)
- Link up within X ms after reset release
- Sustained throughput ≥ X with overrun/underrun = 0
- WOL wake latency ≤ X ms from packet arrival
Diagram note: the dashed scope box intentionally stops before magnetics/RJ45 to prevent cross-topic expansion.
What is a MAC-PHY Controller: Practical Block Ownership (MAC vs PCS vs PHY)
A MAC-PHY controller is best treated as a host-facing Ethernet endpoint: it exposes a host interface and implements MAC-side packet handling (DMA, filtering, statistics) with optional tightly-coupled PCS/PHY functions. The engineering value comes from clear ownership: which block explains a symptom, and which register/counter is the first probe point.
Block ownership map (use for fast isolation)
MAC (packet plane)
- Responsibilities: frame Tx/Rx, address filtering, CRC generation/check, flow control (PAUSE), statistics counters, DMA ownership
- Typical symptoms: throughput collapses, CPU interrupt storms, drops under burst, Rx overrun/Tx underrun
- First probe: descriptor/ring health + MAC counters (overrun/underrun, dropped, FCS error rate)
PCS (interface + link-status plane)
- Responsibilities: interface coding/status (e.g., SGMII PCS), auto-negotiation status reporting, link mode mapping
- Typical symptoms: link up but wrong speed/duplex, one-way traffic, unstable mode switching
- First probe: PCS status (in-band link/speed/duplex) + host interface mode pins/straps
PHY (line-side) — boundary reminder
- Responsibilities: cable/line interaction, energy detect, analog front-end behavior, line-side robustness
- When to exit this page: magnetics/ESD/surge events, analog margining, link quality vs cable/EMI
- Action: keep this page on controller-side checks; hand off to Ethernet PHY / protection pages for line-side analysis
Minimal glossary (only what is needed for this page)
- MII family: the MAC↔PHY interface family (MII, RMII, GMII, RGMII, SGMII); this page focuses on RGMII/SGMII bring-up behavior
- Offload: hardware assistance for checksum/segmentation/VLAN that changes “who computes what” in the packet path
- Counters first: prefer counter-driven isolation before tuning complex features
Diagram note: ownership is the goal—map symptoms to MAC counters, ring state, and PCS status before changing features.
Host-Side Interfaces: RGMII vs SGMII Selection Logic (Bring-up Ready)
Interface selection is a board-level decision driven by timing margin, routing complexity, and debug visibility. This section stays on the digital boundary (MAC/PCS and host interface behavior) and avoids PHY analog and magnetics depth. The goal is a deterministic bring-up path: correct mode, correct status, and clean counters before performance tuning.
Decision output (what to decide here)
- Choose RGMII when short routing and low complexity can preserve DDR timing margin with correct delay strategy
- Choose SGMII when long routes, connectors, or noise risk requires a serial PCS-based interface with status-driven bring-up
- Bring-up must confirm: speed/duplex/link-state consistency with the peer before any feature enablement
RGMII engineering model (source-synchronous DDR)
- What it is: 4-bit DDR data plus a control signal and a clock in each direction; sampling margin depends on clock-to-data skew control
- Margin driver: the effective clock delay path (internal delay, external trace, or both) must match the device expectation
- Common failure pattern: link appears up but CRC/FCS errors rise under load or temperature
Bring-up checks (RGMII)
- Confirm link partner expected mode (1G/100M and duplex) matches the controller configuration
- Verify the TX and RX clock delays are each added exactly once (MAC internal delay, PHY internal delay, or PCB trace, per datasheet); a missing or doubled delay erodes sampling margin
- Monitor frame integrity counters: CRC/FCS errors should remain near zero over a fixed window
- If errors appear: reduce rate (or force a simpler mode) as an isolation step, then re-check delay settings
Pass criteria (placeholders): CRC/FCS error rate ≤ X over X seconds at target load.
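The delay checks above can be rehearsed on paper before touching hardware. The sketch below is a minimal illustration, assuming the common case where each RGMII direction needs its clock delay added exactly once. The `RgmiiDirection` fields are hypothetical names invented for this example and do not map to any specific device's registers.

```python
from dataclasses import dataclass


@dataclass
class RgmiiDirection:
    """Where clock-to-data delay is added for one direction (hypothetical model)."""
    name: str
    mac_internal_delay: bool
    phy_internal_delay: bool
    pcb_trace_delay: bool


def check_delay_ownership(direction: RgmiiDirection) -> str:
    """Return 'ok', 'missing', or 'duplicated' for one RGMII direction."""
    sources = sum([
        direction.mac_internal_delay,
        direction.phy_internal_delay,
        direction.pcb_trace_delay,
    ])
    if sources == 0:
        return "missing"      # no delay anywhere: clock edge lands on the data transition
    if sources > 1:
        return "duplicated"   # delay added twice: margin is lost on the other side
    return "ok"


# Example: TX delay enabled on both MAC and PHY, RX delay only in the PHY.
tx = RgmiiDirection("tx", mac_internal_delay=True, phy_internal_delay=True, pcb_trace_delay=False)
rx = RgmiiDirection("rx", mac_internal_delay=False, phy_internal_delay=True, pcb_trace_delay=False)
for d in (tx, rx):
    print(d.name, check_delay_ownership(d))
```

A "duplicated" result is worth chasing even when the link comes up, because it matches the failure pattern described above: link up, with CRC/FCS errors rising under load or temperature.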
SGMII engineering model (serial PCS + status alignment)
- What it is: a serial lane to a PCS layer; routing is simpler, but mode/state must be consistent end-to-end
- Bring-up axis: decide auto-negotiation vs fixed rate, and whether in-band status is required
- Common failure pattern: link up but wrong speed/duplex, or one-way traffic due to strategy mismatch
Bring-up checks (SGMII)
- Confirm both ends agree on AN enabled or fixed 1G/100M
- If using in-band status: validate the PCS status fields reflect the negotiated speed/duplex consistently
- If fixing rate: force the same mode on the peer to avoid hidden negotiation asymmetry
- Validate counters: frame errors should not climb as throughput increases
Pass criteria (placeholders): reported speed/duplex matches peer, and frame error counters remain ≤ X.
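The SGMII checks above reduce to comparing two policy records, one per link end. The sketch below is a minimal illustration; `SgmiiPolicy` and its fields are hypothetical and would be filled from whatever readback the controller and peer actually expose.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SgmiiPolicy:
    autoneg: bool       # in-band auto-negotiation enabled
    speed_mbps: int     # forced speed when autoneg is off, reported speed otherwise
    full_duplex: bool


def policy_mismatches(local: SgmiiPolicy, peer: SgmiiPolicy) -> list[str]:
    """List every field on which the two link ends disagree."""
    issues = []
    if local.autoneg != peer.autoneg:
        issues.append("autoneg: one side negotiates, the other is fixed")
    if local.speed_mbps != peer.speed_mbps:
        issues.append(f"speed: {local.speed_mbps} vs {peer.speed_mbps}")
    if local.full_duplex != peer.full_duplex:
        issues.append("duplex mismatch")
    return issues


controller = SgmiiPolicy(autoneg=True, speed_mbps=1000, full_duplex=True)
peer = SgmiiPolicy(autoneg=False, speed_mbps=1000, full_duplex=True)
print(policy_mismatches(controller, peer))
```

An empty list is the precondition for moving on to counter validation; any entry explains a "link up but one-way traffic" symptom better than a packet capture does.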
Selection rules (fast heuristic, not marketing)
- Long routes / connectors / noisy environment: prefer SGMII to reduce multi-signal skew sensitivity
- Short routes / minimal layers / cost pressure: RGMII is viable if delay strategy is explicitly designed and validated
- Debug phase: lock configuration first (fixed mode), then introduce negotiation and power-save features later
Clocking & Reset Sequencing: Power Rails, Strap Sampling, and Link Start Order
Bring-up failures frequently originate before packet traffic exists: rails are not stable, clocks are not stable, or straps/EEPROM are sampled at the wrong time. The engineering objective is a deterministic sequence: rails stable → reference clock stable → strap sample window → reset release → link training starts.
Engineering sequence (minimum required ordering)
- Core rails reach regulation and remain stable (no brownout within the sampling window)
- I/O rails reach regulation and remain stable (interface pins are valid)
- Reference clock is present and stable (frequency stable; “clock-good” if available)
- Straps/EEPROM are sampled during the defined window (mode, default speed, optional WOL/LED modes)
- Reset is deasserted after all above conditions are met; link training should begin within a bounded time
Strap / EEPROM sampling (what it commonly decides)
- Interface mode selection (e.g., RGMII vs SGMII)
- Default speed/duplex policy (auto-negotiation enabled vs fixed mode)
- Optional behavior pins (LED/WOL mode selection) — only verify sampling here; detailed WOL is handled elsewhere
Reset pitfalls (common root causes)
- Reset deasserted too early: straps sampled incorrectly → wrong mode/speed policy
- Clock not stable at start: MAC/PCS does not initialize cleanly → unstable state or no link start
- Multiple reset sources unsynchronized: POR, external reset, watchdog reset cause partial-domain mismatch
Pass criteria (placeholders; define per datasheet)
- tCLK_STABLE > X ms (measure at clock-good or clock output)
- tRESET_DEASSERT after rails stable > X ms (measure at reset_n pin)
- tSTRAP_SAMPLE window = X µs (confirm strap pins are valid during this window)
- tLINK_START < X ms after reset (observe PCS/link status transition)
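The ordering and timing criteria above can be checked mechanically from scope or logic-analyzer timestamps. The sketch below is a minimal illustration; the event names and the three limit parameters are hypothetical stand-ins for the X placeholders and must come from the device datasheet.

```python
# Required event order, taken from the sequence described in this section.
ORDER = ["rails_stable", "clk_stable", "strap_sampled", "reset_deassert", "link_start"]


def check_reset_sequence(t_ms, min_clk_stable_ms, min_rails_to_reset_ms, max_link_start_ms):
    """Return a list of human-readable failures; an empty list means the sequence passes.

    t_ms maps each event name in ORDER to a timestamp in milliseconds.
    """
    failures = []
    for earlier, later in zip(ORDER, ORDER[1:]):
        if t_ms[earlier] > t_ms[later]:
            failures.append(f"{later} occurred before {earlier}")
    if t_ms["reset_deassert"] - t_ms["clk_stable"] < min_clk_stable_ms:
        failures.append("clock not stable long enough before reset release")
    if t_ms["reset_deassert"] - t_ms["rails_stable"] < min_rails_to_reset_ms:
        failures.append("reset released too soon after rails stable")
    if t_ms["link_start"] - t_ms["reset_deassert"] > max_link_start_ms:
        failures.append("link start exceeded bound after reset release")
    return failures


measured = {"rails_stable": 0, "clk_stable": 2, "strap_sampled": 9,
            "reset_deassert": 10, "link_start": 60}
print(check_reset_sequence(measured, min_clk_stable_ms=5,
                           min_rails_to_reset_ms=10, max_link_start_ms=100))
```

Running the same check on every cold boot in a regression loop turns "it usually comes up" into a logged pass/fail record.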
Packet Path & DMA Model: Root Causes of Low Throughput, Drops, and High CPU
Throughput, latency, and loss behavior are dominated by ownership handoff (HW↔SW control of descriptors and buffers) and backpressure points (where rings fill or starve). The fastest isolation path is counter-driven: identify whether the bottleneck is Rx ring saturation, Tx ring starvation, or interrupt/poll scheduling overhead.
Packet path (controller-side view)
- Rx: DMA writes frames into Rx buffers → flips descriptor ownership → SW consumes → returns buffers to the ring
- Tx: SW posts descriptors → DMA reads Tx buffers → transmits → returns completion to free ring slots
- Backpressure: drops happen when Rx buffers are not replenished fast enough, or when Tx descriptors cannot be posted/recycled
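The Rx ownership handoff above can be modeled in a few lines. The sketch below is a toy simulation, not a driver: `RxRing` is a hypothetical class in which each slot is owned either by hardware (free for DMA) or by software (holding a frame that has not yet been consumed).

```python
class RxRing:
    """Toy Rx descriptor ring with a per-slot ownership flag."""

    def __init__(self, depth):
        self.owner = ["hw"] * depth   # all slots start out available to DMA
        self.head = 0                 # next slot the DMA engine will fill
        self.tail = 0                 # next slot software will consume
        self.dropped = 0

    def dma_receive(self):
        """DMA side: fill the next slot, or drop if software still owns it."""
        if self.owner[self.head] != "hw":
            self.dropped += 1
            return False
        self.owner[self.head] = "sw"
        self.head = (self.head + 1) % len(self.owner)
        return True

    def sw_consume(self):
        """Software side: process one frame and return the slot to hardware."""
        if self.owner[self.tail] != "sw":
            return False
        self.owner[self.tail] = "hw"
        self.tail = (self.tail + 1) % len(self.owner)
        return True


ring = RxRing(depth=4)
for _ in range(6):          # burst of 6 frames with no software service
    ring.dma_receive()
print(ring.dropped)         # 2

ring.sw_consume()
ring.sw_consume()           # software returns two slots
print(ring.dma_receive(), ring.dma_receive(), ring.dropped)  # True True 2
```

The burst loses exactly the frames that exceed ring depth, which is why the fixes in this section come down to deeper rings or faster replenishment.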
Interrupt vs polling (engineering decision)
- Prefer interrupts for sparse traffic, low average rate, and power-first systems (avoid constant CPU wakeups)
- Prefer polling/batching for sustained high throughput, latency stability, and to avoid interrupt storms under burst load
- Symptom hint: “low throughput + high CPU” often indicates excessive interrupt rate or too-small batching thresholds
Three dominant bottlenecks (actionable patterns)
Bottleneck A — Rx ring too small (burst drop)
- Likely cause: Rx buffers/descriptors exhausted during bursts; SW cannot replenish fast enough
- Quick check: rx_dropped / overrun increments correlated with traffic spikes
- Fix: increase ring depth (or buffer pool), enable batching/polling, reduce per-packet overhead
- Pass criteria: burst test shows rx_dropped = 0 over X minutes at target load
Bottleneck B — DMA coherency / cache-line alignment mismatch
- Likely cause: incorrect cacheability attributes, missing invalidate/clean operations, misaligned buffers/descriptors
- Quick check: “random” corruption/drops without strong link counter evidence; behavior changes with CPU load
- Fix: enforce cache-line alignment, correct DMA mapping, use non-cacheable or coherent memory region as required
- Pass criteria: sustained test shows zero data integrity errors and stable counters at X throughput
Bottleneck C — interrupt storm (CPU high, rings not serviced)
- Likely cause: per-packet interrupts, too-low moderation, or insufficient batching thresholds
- Quick check: CPU spikes with high IRQ rate while throughput stays low; tx_underrun may appear during heavy Tx
- Fix: enable polling/batching, tune interrupt moderation, enlarge rings to absorb bursts
- Pass criteria: CPU utilization ≤ X% at target throughput with stable ring occupancy
Quick check (5-minute isolation checklist)
- Snapshot counters at idle, then after a fixed load window: rx_dropped, overrun, tx_underrun
- Check ring health: descriptor recycle rate and occupancy trend (flat vs saturating)
- Correlate with IRQ/poll rate: confirm whether CPU time is spent servicing events or moving payload
- Only after rings are stable: toggle offloads one-by-one (covered in the next section)
Boundary note: this section isolates controller-side packet handling. Line-side analog/magnetics issues belong to PHY/protection pages.
Offload Features: When to Enable Checksum, TSO/LRO, and VLAN
Offloads improve throughput by shifting work from software to hardware, but they also change who computes or edits packet fields and where packets become observable. A reliable strategy is to stabilize the ring model first, then enable offloads one-by-one with counter-based pass criteria.
Toggle strategy (stabilize → optimize)
- Stabilize: disable complex offloads; confirm ring health and counters remain stable under load
- Enable checksum: verify rx_csum_err == 0 over a fixed window
- Enable segmentation: TSO/GSO only after checksum passes; confirm throughput improves without new drops
- Enable aggregation/filtering: LRO/VLAN filter last; validate latency/observability trade-offs
Pass criteria (placeholders): iperf3 throughput > X and rx_csum_err == 0.
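The toggle strategy above is a loop with a rollback. The sketch below is a minimal illustration, assuming two caller-supplied hooks: `apply_stage` (toggles one feature) and `passes` (returns True when counters are clean after the test window). Both are hypothetical; on a real system they would wrap the platform's own configuration tool and counter readout.

```python
def enable_offloads_in_stages(stages, apply_stage, passes):
    """Enable offloads one at a time and stop at the first stage that fails.

    Returns (enabled_stages, failed_stage_or_None).
    """
    enabled = []
    for name in stages:
        apply_stage(name, True)
        if not passes(name):
            apply_stage(name, False)   # roll back only the failing stage
            return enabled, name
        enabled.append(name)
    return enabled, None


# Demo with fake hooks: pretend segmentation offload introduces drops.
toggles = []


def apply_stage(name, enabled):
    toggles.append((name, enabled))


def passes(name):
    return name != "tso"


print(enable_offloads_in_stages(["checksum", "tso", "lro"], apply_stage, passes))
# (['checksum'], 'tso')
print(toggles)
# [('checksum', True), ('tso', True), ('tso', False)]
```

Stopping at the first failure keeps the last known-good configuration intact, so the failing feature is the only variable left to investigate.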
Feature blocks (what each changes)
Checksum offload (Rx/Tx)
- Scope: may cover IPv4/IPv6 + TCP/UDP depending on implementation
- Common pitfall: capture points may show “bad checksum” when hardware fills fields late
- Quick check: A/B toggle checksum offload and compare rx_csum_err trend
- Pass criteria: rx_csum_err == 0 at target load for X minutes
TSO/GSO and LRO (throughput vs observability)
- TSO/GSO: TSO hands segmentation to hardware; GSO defers it to a late software stage; both reduce per-packet CPU cost
- LRO: aggregates received packets for efficiency; can affect latency profiles and fine-grain capture fidelity
- When to disable: debugging, latency-sensitive traffic, or when precise packet-level observability is required
- Pass criteria: throughput improves to ≥ X with no new drops and acceptable latency (≤ X)
VLAN tag handling and filtering
- Tag operations: insert/strip/forward VLAN tags depending on mode
- Common pitfall: filters can silently drop frames if rule ownership is misconfigured
- Quick check: start with VLAN pass-through (no filtering), then add rules incrementally
- Pass criteria: expected VLAN traffic passes with drop counters == 0 over X minutes
Wake-on-LAN & Low-Power Path: Always-On Listen → Match → WAKE Assert
Wake-on-LAN is a chain, not a checkbox. Reliable wake requires an Always-On (AON) path that remains powered and (if required) clocked in low-power state, a deterministic match engine (magic packet or pattern), and a clean PME/WAKE assertion that the host power manager accepts. Debug must prove which stage fails: Armed → Packet seen → Pattern hit → WAKE edge.
WOL chain (three segments, two must-have conditions)
- Listen (low-power Rx): minimal receive path stays alive to observe relevant frames
- Match: magic packet or pattern engine decides whether a wake event should be generated
- Assert: PME/WAKE pin or internal wake event crosses domains and triggers host wake
- Must-have #1: WOL logic must be in an AON power domain (and clocked if required)
- Must-have #2: match rule must align with actual traffic format (VLAN/IPv6/encapsulation can break offsets)
Magic packet vs pattern match (engineering differences)
Magic packet (compatibility-first)
- Simpler rule set; commonly supported across controllers
- Lower risk of offset/mask mismatch
- Best for early bring-up and cross-platform verification
Pattern match (flexibility-first)
- Precise triggers (protocol/port/payload signature) but easy to misconfigure
- Offsets can shift with VLAN tags, IPv6 headers, tunneling, or driver-side packet shaping
- Must verify with hit counters and a known-good generator toolchain
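The difference between the two match styles shows up as soon as a VLAN tag is inserted. The sketch below builds a standard magic-packet body (6 bytes of 0xFF followed by the target MAC repeated 16 times) and compares a whole-frame scan with a naive fixed-offset rule. The MAC addresses, the VLAN ID, and `fixed_offset_match` are illustrative assumptions; real pattern engines use device-specific masks and offsets.

```python
def build_magic_payload(mac: bytes) -> bytes:
    """Magic packet body: 6 bytes of 0xFF, then the target MAC repeated 16 times."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac * 16


def contains_magic(frame: bytes, mac: bytes) -> bool:
    """Scan the whole frame, so the match does not depend on a fixed offset."""
    return build_magic_payload(mac) in frame


def fixed_offset_match(frame: bytes, offset: int, pattern: bytes) -> bool:
    """Naive pattern rule: compare bytes at one fixed offset from the frame start."""
    return frame[offset:offset + len(pattern)] == pattern


mac = bytes.fromhex("020000000001")              # hypothetical target MAC
dst, src = b"\xff" * 6, bytes.fromhex("020000000002")
payload = build_magic_payload(mac)

untagged = dst + src + b"\x08\x42" + payload     # EtherType 0x0842 used as an example
tagged = dst + src + b"\x81\x00" + b"\x00\x64" + b"\x08\x42" + payload  # 802.1Q tag, VLAN 100

rule = (14, b"\xff" * 6)                         # rule written for the untagged layout
print(fixed_offset_match(untagged, *rule))       # True
print(fixed_offset_match(tagged, *rule))         # False: the tag shifted the payload by 4 bytes
print(contains_magic(tagged, mac))               # True
```

This is the offset-shift failure in miniature: the same wake payload, one extra tag, and a rule that silently stops hitting.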
Failure tree (prove which stage fails)
Step 1 — WOL armed?
- Likely cause: driver did not enable WOL, or low-power entry cleared WOL state
- Quick check: read back WOL_ARMED status (register/flag)
- Fix: apply WOL enable in the final pre-sleep stage and confirm it persists into low-power mode
Step 2 — Packet seen in low-power?
- Likely cause: low-power mode disabled the minimal Rx listen path
- Quick check: PACKET_SEEN_COUNT increments (or low-power Rx event flag)
- Fix: adjust low-power policy to keep the required listen path enabled (AON domain)
Step 3 — Pattern hit?
- Likely cause: mismatch in rule/offset/mask; VLAN or encapsulation shifts headers
- Quick check: PATTERN_HIT_COUNT increments; compare magic vs pattern behavior
- Fix: start with magic packet; then add pattern rules incrementally and validate each hit counter
Step 4 — PME/WAKE asserted?
- Likely cause: WAKE pin wiring/polarity/pull is incorrect; wake event not latched across domains
- Quick check: observe WAKE edge at pin or a host GPIO sampling point
- Fix: confirm pin mux, polarity, pull network, and wake latch configuration
Step 5 — Host wake accepted?
- Likely cause: host sleep state is too deep or wake source is not mapped/enabled
- Quick check: host power manager wake-source log (platform-specific)
- Fix: verify wake-source mapping and supported sleep state for network wake
Quick check (minimum evidence fields) + pass criteria
- Log fields: WOL_ARMED, PACKET_SEEN_COUNT, PATTERN_HIT_COUNT, PME/WAKE_EDGE_COUNT
- Pass (latency): packet → wake latency < X ms (measure from packet_seen event to WAKE edge)
- Pass (integrity): pattern_hit implies WAKE asserted within X ms (no lost wake)
- Optional pass (robustness): false wake rate < X/day
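The failure tree and evidence fields above map directly onto an ordered check. The sketch below is a minimal illustration; the dictionary keys mirror the log fields listed in this section, plus a hypothetical `host_woke` flag for Step 5. How each value is read is platform-specific and not shown.

```python
# Stages in the order the wake chain must succeed.
STAGES = [
    ("armed", lambda e: e["wol_armed"]),
    ("packet_seen", lambda e: e["packet_seen_count"] > 0),
    ("pattern_hit", lambda e: e["pattern_hit_count"] > 0),
    ("wake_asserted", lambda e: e["wake_edge_count"] > 0),
    ("host_wake_accepted", lambda e: e["host_woke"]),
]


def first_failing_stage(evidence):
    """Return the name of the first stage without evidence, or None if all passed."""
    for name, ok in STAGES:
        if not ok(evidence):
            return name
    return None


evidence = {
    "wol_armed": True,
    "packet_seen_count": 3,
    "pattern_hit_count": 0,   # frames were seen but no rule matched
    "wake_edge_count": 0,
    "host_woke": False,
}
print(first_failing_stage(evidence))  # pattern_hit
```

Reporting only the first failing stage keeps the debug effort on one link of the chain at a time.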
Driver / EEPROM / Straps Configuration: Priority, Readback, and Stable Mode
Stable bring-up depends on knowing which configuration source wins and proving it via readback. Sources are typically applied in this order, with each later source able to override the earlier ones: Strap (latched at boot) → EEPROM/NVM (loaded at init) → Driver override (runtime). Debug should treat the read-back “effective mode” as the single source of truth.
Configuration priority model (what overrides what)
- Strap: sampled at boot/reset; defines default personality (interface mode, default policy)
- EEPROM/NVM: loads persistent fields (MAC address, LED/WOL mode, interface options) if enabled
- Driver override: last layer; can override defaults and must be audited for reproducibility
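The override model above can be expressed as a layered merge that also records which source supplied each field. The sketch below is a minimal illustration, assuming the typical order described here (strap, then EEPROM/NVM, then driver); the field names and values are hypothetical.

```python
def effective_config(strap, eeprom, driver):
    """Merge the three layers in application order; later layers override earlier ones.

    Returns (effective, source), where source[key] names the layer that set the value.
    """
    effective, source = {}, {}
    for layer_name, layer in (("strap", strap), ("eeprom", eeprom), ("driver", driver)):
        for key, value in layer.items():
            effective[key] = value
            source[key] = layer_name
    return effective, source


def readback_mismatches(expected, readback):
    """Map each differing field to (expected, actual); empty dict means readback agrees."""
    return {k: (v, readback.get(k)) for k, v in expected.items() if readback.get(k) != v}


strap = {"interface": "rgmii", "autoneg": True}
eeprom = {"wol_enable": True, "led_mode": 2}
driver = {"autoneg": False, "rgmii_rx_delay": True}

expected, source = effective_config(strap, eeprom, driver)
print(source["autoneg"])   # driver

readback = dict(expected, wol_enable=False)   # e.g. low-power entry cleared WOL
print(readback_mismatches(expected, readback))
# {'wol_enable': (True, False)}
```

Logging the `source` map alongside the readback makes it clear whether a wrong value came from a board strap, a stale EEPROM image, or a driver default.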
Typical fields to persist (keep within this page boundary)
Identity (must be stable)
- MAC address source and programming policy (factory vs field)
- Per-port allocation rule (placeholder) when multiple ports exist
Bring-up critical (wrong → no link / errors)
- RGMII delay mode (TX/RX delay enable as required)
- SGMII policy (auto-negotiation vs fixed rate; in-band status enable)
Behavior (verify enablement via readback)
- LED mode selection (verify correct strap/NVM field applied)
- WOL enable (verify effective WOL state persists into low power)
Debug flow (readback-first, reproducible)
- After boot: read strap-latched register(s) to confirm sampled personality
- After NVM load: read NVM/EEPROM status (load OK, CRC/signature OK if available)
- After driver init: read effective mode registers (RGMII/SGMII policy, WOL enable, LED mode)
- Record: store readback fields in bring-up logs for regression and production correlation
Pass criteria (placeholders)
- MAC addr stable across reset == true
- Effective mode stable across cold boot == true
- Readback mismatch count ≤ X over X cycles
Reliability & Field Diagnostics: Counters → Rates → Loopback → Isolation
Field failures are solved fastest by evidence ordering: choose the right counter group, watch growth rate (delta per minute), then use loopback to isolate host-side vs link-side. This chapter stays on controller/PCS observability and avoids analog PHY deep dives.
Counter groups (map symptoms to evidence)
Group A — Frame integrity (are frames corrupted?)
- FCS/CRC errors (FCS_err)
- Alignment / malformed frame flags (if exposed)
- Error rate rule: use delta per minute, not total count
Group B — Pressure & buffering (are rings/DMA overwhelmed?)
- Missed frame / dropped
- Overrun / no-buffer / ring overflow (rx_overrun)
- Interpretation: high small-packet load + rising overrun often points to host-side pacing
Group C — Flow control (is PAUSE shaping throughput?)
- PAUSE frames rx/tx (pause_rx, pause_tx)
- Symptom mapping: sawtooth throughput + increasing pause counters → congestion policy is a prime suspect
- Action rule: check whether pause counters track load or speed/duplex mode changes before tuning offload features
Field logging (rate-first, correlation-friendly)
- Sampling: snapshot counters every X seconds and compute delta per minute
- Correlate: compare deltas across load (idle vs stress), temperature (ambient vs hot), and power policy transitions
- Minimum context: throughput, packet size mix (large vs small), current power mode (sleep/EEE/WOL armed)
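The rate-first rule above is a short calculation. The sketch below is a minimal illustration, assuming free-running counters that wrap at a fixed width (32 bits by default here). Clear-on-read counters need different handling, and the width is an assumption to confirm against the device documentation.

```python
def counter_delta(prev, curr, width_bits=32):
    """Difference between two snapshots of a free-running counter, tolerant of one wrap."""
    return (curr - prev) % (1 << width_bits)


def rate_per_minute(prev, curr, interval_s, width_bits=32):
    """Normalize a counter delta to events per minute."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev, curr, width_bits) * 60.0 / interval_s


print(rate_per_minute(100, 130, interval_s=10))    # 180.0
print(counter_delta(0xFFFFFFF0, 0x00000010))       # 32 (counter wrapped between snapshots)
```

Comparing these rates across idle, stress, and hot conditions is what makes the correlation step meaningful; raw totals mostly reflect uptime.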
Loopback as an isolation knife (host-side vs link-side)
MAC loopback (controller-focused)
- Cuts off external link while testing host ↔ MAC ↔ DMA integrity
- If errors persist in MAC loopback, prioritize driver/DMA/ring pacing and memory mapping
PCS loopback (digital link logic, no analog deep dive)
- Keeps digital link logic in path while excluding far-end traffic variability
- If PCS loopback is clean but external link errors exist, escalate to link-side topics in sibling pages
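The two loopback rules above form a small decision table. The sketch below is a minimal illustration; the middle case (MAC loopback clean, PCS loopback dirty) is not stated in this section and is an inference added for completeness.

```python
def isolate(mac_loopback_clean, pcs_loopback_clean, external_clean):
    """Suggest where to look first, given three clean/dirty observations."""
    if not mac_loopback_clean:
        return "host-side: driver, DMA, ring pacing, memory mapping"
    if not pcs_loopback_clean:
        # Inference, not from the text: host path is clean, so suspect the interface/PCS layer.
        return "interface/PCS: mode, status, digital link logic"
    if not external_clean:
        return "link-side: escalate to PHY/line-side pages"
    return "no fault reproduced"


print(isolate(mac_loopback_clean=True, pcs_loopback_clean=True, external_clean=False))
# link-side: escalate to PHY/line-side pages
```

Each observation should come from the same traffic profile and counter window, otherwise a "clean" result only means the fault was not exercised.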
Pass criteria (placeholders with windows)
- FCS_err_rate < X / hour over X hours at throughput > X% line-rate
- overrun_delta == 0 over X minutes during small-packet stress
- pause_frame_rate within expected policy bounds (no unexplained bursts)
Bring-up Checklist: Design → First Light → Performance → Production
A controller bring-up should behave like a reproducible SOP. Each phase lists actions, evidence (readbacks/counters), and pass criteria (placeholders). Change one configuration layer at a time and log results for regression.
Phase 1 — Design review (avoid non-bring-upable boards)
- Interface: lock decision (RGMII/SGMII) and required strap defaults
- Clock/reset: ensure stable reference and reset timing windows (placeholders)
- Config paths: strap/EEPROM/driver plan; define single source of truth per phase
- Debug hooks: MDIO access, strap readback point, WAKE probe point, basic test pads
Phase 2 — First light (ID → mode → link up → basic traffic)
- Read ID/version: confirm management bus access and register map sanity
- Apply mode: set RGMII delays or SGMII policy; read back effective mode
- Link up: verify speed/duplex match the peer (avoid ambiguous AN states)
- Basic traffic: ARP/ping baseline; ensure no immediate FCS/overrun deltas
Phase 3 — Performance (run full rate without dirty counters)
- Throughput: iperf stress with large packets, small packets, and bidirectional traffic
- Evidence: counter deltas stay clean (FCS, overrun, missed, pause rate is explainable)
- If failing: return to “counters → rates → loopback” isolation before changing features
Performance pass criteria (placeholders)
- iperf3 throughput ≥ X% of line rate
- FCS_err_rate < X/hour over X hours
- overrun_delta == 0 over X minutes
Phase 4 — Power/WOL + Production hooks (repeatable in factory)
- Sleep/WOL: sleep → magic/pattern → wake → link restore; log armed/hit/wake edges
- Identity: persist MAC address and serial policy; verify stable across cold boot
- Self-test: loopback + counter snapshot in < X seconds (station-friendly)
- Station output: store PASS/FAIL with counter deltas and effective-mode readback
IC Selection Logic (controller/MAC-side only)
Selection is treated as a decision path, not a parameter dump: host interface gate → throughput/CPU gate → offload set → WOL path → software ecosystem. Scope is strictly the controller/MAC-side (interfaces, DMA/queues, offload, WOL, driver/tooling). Analog PHY/magnetics/TSN switching are excluded.
Selection boundary & expected outputs
- Allowed dimensions: host interface, DMA/ring capability, offload set, WOL support path, driver/tool maturity, power states
- Excluded: analog PHY tuning, magnetics selection, ESD/surge networks, TSN switching features
- Output: one interface choice + must-have feature list + minimal bring-up/production evidence list
Gate checklist (fail-fast filters)
Gate 1 — Host interface compatibility
- RGMII/SGMII: used when host already exposes MAC-side lanes; selection is dominated by bring-up risk and debug readiness
- PCIe/USB/SPI: used when host lacks native Ethernet MAC lanes or needs an external network port module
- Evidence: confirm driver support on target OS and confirm management/register access path exists
Gate 2 — Throughput & CPU budget
- Target line-rate: 10/100 vs 1G vs 2.5G must match actual system need
- Small-packet sensitivity: high PPS workloads demand stronger DMA/queue design (not just peak Mbps)
- Evidence: iperf + counter deltas (FCS/overrun) stay clean at throughput > X% line-rate (placeholders)
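The small-packet point above is easiest to see as arithmetic. On the wire, each Ethernet frame carries 20 bytes of fixed overhead beyond the frame itself (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap), so the maximum frame rate is the link rate divided by the on-wire bits per frame. The function name below is illustrative.

```python
ETH_OVERHEAD_BYTES = 20  # preamble + SFD (8 bytes) and minimum inter-frame gap (12 bytes)


def max_frames_per_second(link_bps, frame_bytes):
    """Upper bound on frames per second; frame_bytes includes the 4-byte FCS."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


print(int(max_frames_per_second(1_000_000_000, 64)))    # 1488095
print(int(max_frames_per_second(1_000_000_000, 1518)))  # 81274
```

At 1G the descriptor and interrupt machinery must handle roughly 18 times more events per second for minimum-size frames than for full-size ones, even though the Mbps figure is the same.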
Gate 3 — Must-have offload & WOL
- Offload floor: checksum offload (Rx/Tx) is a baseline requirement for CPU containment
- Optional accelerators: TSO/LRO/VLAN filtering become score items after stability is proven
- WOL floor: confirm WOL arming + pattern hit + wake assertion observability (registers/status), not marketing labels
Scorecard (compare survivors without crossing PHY/TSN scope)
DMA / buffering strength (burst absorption)
- Max descriptors: supports X Rx and X Tx descriptors (placeholder)
- Queues: supports X queues (placeholder) if traffic separation is needed
- Interrupt mitigation: supports coalescing/polling mode to avoid interrupt storms
Software ecosystem (risk & time-to-stable)
- Driver maturity: stable on target OS/kernel versions; known issues are bounded
- Config tooling: strap/EEPROM/OTP programming flow is scriptable for production
- Observability: exposes counters + loopback + status readbacks for field diagnostics
Power & WOL readiness (placeholders)
- Active: < X mW (placeholder) at target link speed
- Idle: < X mW (placeholder) with link idle
- WOL armed: < X mW (placeholder) + wake latency < X ms (placeholder)
Concrete part numbers (examples by host interface)
Part numbers below are commonly used reference ICs for this topic. Always verify package, temperature grade, suffix, and driver support on the exact target platform.
SPI Ethernet controllers (embedded/low-pin-count ports)
- WIZnet W5500 — SPI Ethernet controller with integrated TCP/IP offload (fit for MCU-class hosts)
- WIZnet W5100S — SPI Ethernet controller family option (verify offload/feature set vs W5500)
- Microchip ENC28J60 — 10BASE-T (10 Mbps) Ethernet controller (SPI), widely used in cost-sensitive embedded designs
- Microchip KSZ8851SNL — 10/100 Ethernet controller with SPI host interface (common in industrial embedded ports)
- Davicom DM9051 — SPI Ethernet controller option (verify driver availability and feature coverage)
Selection note: SPI ports are often dominated by PPS/latency and driver overhead. Treat checksum/offload availability and buffer depth as primary score items.
USB to Ethernet controllers (external ports / dongles / gateways)
- Microchip LAN7800 — USB 3.x to Gigabit Ethernet controller (commonly supports WoL; verify platform power policy)
- Microchip LAN7850 — Hi-Speed USB 2.0 to Gigabit Ethernet controller family option (verify tooling/OTP flow)
- Realtek RTL8153B — USB 3.x to Gigabit Ethernet controller (widely deployed; WoL support depends on OS/driver policy)
- ASIX AX88179 — USB 3.0 to Gigabit Ethernet controller (commonly offers WoL features; verify driver stack behavior)
- ASIX AX88772B — USB 2.0 to 10/100 Ethernet controller option (cost/legacy fit)
Selection note: USB solutions are sensitive to small-packet PPS and host USB power management. Confirm suspend/resume stability and WoL behavior early.
PCIe Ethernet controllers (PC-class hosts / higher PPS stability)
- Intel I210-AT — PCIe Gigabit Ethernet controller (enterprise/industrial class; verify exact feature set per stepping)
- Intel I211-AT — PCIe Gigabit Ethernet controller family option (platform-dependent integration)
- Intel I225-V — PCIe 2.5GbE controller option (verify revision/driver compatibility on target OS)
- Realtek RTL8111H — PCIe Gigabit Ethernet controller (common on embedded x86 boards; verify driver choice and WoL policy)
- Realtek RTL8125B — PCIe 2.5GbE controller option (throughput headroom; verify thermal/power budget)
Selection note: PCIe controllers usually win on PPS and CPU containment. The dominant risk shifts to driver maturity, platform power states, and WoL integration.
Selection pass criteria (placeholders)
- iperf3 throughput > X with clean counter deltas (FCS_err_rate < X/hour)
- overrun_delta == 0 over X minutes under small-packet stress
- WOL: armed==true, pattern_hit==true, wake_asserted==true; latency < X ms
- Ecosystem: stable driver on target OS + scriptable configuration path (strap/EEPROM/OTP)
FAQs (MAC-PHY controller scope): actionable checks with pass thresholds
Each answer is intentionally short and executable. Scope is controller/MAC-side only: host interface (RGMII/SGMII/PCIe/USB/SPI), DMA/rings, offloads, WOL, counters, loopback isolation, and config precedence.
RGMII link is up, but throughput is poor — TX/RX delay or rings too small?
Likely cause: RGMII TX/RX internal delay/edge selection is mismatched (silent retries/corruption), or DMA rings are too shallow causing burst drops.
Quick check: Read back effective RGMII delay mode; compare counter deltas FCS_err_rate vs rx_overrun_rate (per minute); run A/B with large vs small packets.
Fix: Confirm the delay configuration is correct first (readback matches the datasheet requirement), then increase RX/TX descriptors (one variable per iteration).
Pass criteria: iperf3_throughput > X and FCS_err_rate < X/hour and rx_overrun_rate < X/min over X minutes.
SGMII link is up, but intermittent one-way traffic — in-band status or PAUSE flow control?
Likely cause: In-band status / autoneg state interpretation is inconsistent, or PAUSE is throttling one direction under congestion.
Quick check: Read back SGMII mode (AN, in_band, fixed rate); log pause_rx/pause_tx delta and compare with unidirectional throughput tests.
Fix: Force a known-good speed/duplex policy (fixed or consistent AN) for isolation; then enable PAUSE only if needed and understood.
Pass criteria: Unidirectional and bidirectional tests meet throughput > X, with pause_frames_rate stable and explainable (no unexplained bursts).
Enabling checksum offload makes captures show “bad checksum” — display artifact or real corruption?
Likely cause: Capture point observes packets before HW fills checksum (normal offload artifact), or offload coverage/mode mismatch causes real bad frames.
Quick check: A/B disable checksum offload and compare FCS_err_rate and retry symptoms; verify enabled offload types (IPv4/IPv6/TCP/UDP) match driver capability flags.
Fix: Use link evidence (FCS/counter deltas) as truth; keep checksum offload only after the A/B test proves no real corruption.
Pass criteria: FCS_err_rate < X/hour and (if exposed) rx_csum_err == 0 over X hours at throughput > X.
Jumbo frames (9k MTU) drop, but small packets are fine — buffers/descriptors or segmentation offload?
Likely cause: RX buffer sizing / descriptor chain is insufficient for MTU, or TSO/GSO/LRO configuration is incompatible with the MTU path.
Quick check: Enable 9k and watch rx_overrun/no_buffer deltas; read back effective MTU/offload toggles; A/B disable TSO/LRO while keeping MTU constant.
Fix: Stabilize jumbo path with complex offloads disabled, then increase buffers/descriptors and re-enable offloads one-by-one.
Pass criteria: With MTU=9000, drop_delta == 0 and rx_overrun_rate < X/min over X minutes.
CPU usage is high even at low traffic — interrupt storm or polling/NAPI thresholds?
Likely cause: Interrupt rate is excessive (no/poor coalescing), or polling thresholds cause frequent wake-ups with low work per wake.
Quick check: Log irq_rate (X/s) vs throughput; confirm whether rx_overrun rises (pressure) or CPU burns without drops (scheduling/IRQ).
Fix: Enable/tune interrupt coalescing first; then tune polling thresholds (one knob at a time) while keeping traffic profile fixed.
Pass criteria: CPU% < X and irq_rate < X/s with throughput > X and drop_delta == 0.
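The quick check above can be reduced to two numbers: interrupts per second and packets handled per interrupt. A minimal sketch follows; the thresholds are hypothetical parameters supplied by the caller, and the counter deltas are assumed to come from whatever statistics the driver exposes.

```python
def irq_stats(irq_delta, pkt_delta, dt_s):
    """Return (interrupts per second, packets per interrupt) for one interval."""
    irq_rate = irq_delta / dt_s
    pkts_per_irq = pkt_delta / irq_delta if irq_delta else 0.0
    return irq_rate, pkts_per_irq

def classify(irq_rate, pkts_per_irq, overrun_delta, irq_rate_limit, min_pkts_per_irq):
    """Separate ring pressure from interrupt overhead using counter evidence."""
    if overrun_delta > 0:
        return "ring pressure"        # drops present: buffers or service latency
    if irq_rate > irq_rate_limit and pkts_per_irq < min_pkts_per_irq:
        return "interrupt overhead"   # many wake-ups, little work per wake-up
    return "ok"
```

A low packets-per-interrupt figure with no overruns points at coalescing, which is why that knob is tuned first.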
Wake-on-LAN is configured but does not wake — AON power domain or pattern/magic mismatch?
Likely cause: Always-on (AON) domain is not powered/clocked in sleep, or WOL pattern/magic packet configuration does not match the sender.
Quick check: Verify wol_armed before entering sleep; send the wake packet, then read pattern_hit (if available) and confirm that PME/WAKE is asserted (pin or status).
Fix: Guarantee AON rail and wake line integrity first; then align pattern/magic settings and re-test with a known-good generator.
Pass criteria: wol_armed==true, pattern_hit==true, wake_asserted==true, and wake_latency < X ms.
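A known-good generator removes the sender from the list of suspects. The standard magic packet payload is six 0xFF bytes followed by the target MAC repeated 16 times. The sketch below builds that payload and sends it as a UDP broadcast. Port 9 and the 255.255.255.255 broadcast address are common conventions, assumed here rather than required, and the function names are illustrative.

```python
import re
import socket

def build_magic_packet(mac):
    """Six 0xFF bytes followed by the 6-byte MAC repeated 16 times."""
    digits = re.sub(r"[:.\-]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    target = bytes.fromhex(digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + target * 16

def contains_magic(payload, mac):
    """True if the payload carries the magic sequence for this MAC."""
    return build_magic_packet(mac) in payload

def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Running contains_magic against a capture of the sender's traffic confirms that the MAC in the packet matches the address the controller was armed with.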
After sleep/wake, ping works but link drops later — restore order or driver reinit missing?
Likely cause: Post-wake sequence leaves queues/descriptors partially stale, or mode/offload/flow-control is not restored to the pre-sleep effective state.
Quick check: Snapshot counters immediately after wake, then watch delta for X minutes; compare pre/post wake readbacks (mode, offloads, PAUSE policy).
Fix: Enforce a deterministic restore sequence: mode → ring/queue init → enable traffic; reapply only known-good offloads after stability is proven.
Pass criteria: sleep_wake_success_rate == 100% over X cycles and FCS_err_rate < X/hour.
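Comparing pre-sleep and post-wake readbacks is easier to audit as a diff than as two dumps. A minimal sketch, assuming the readbacks have already been collected into dictionaries; the key names in the usage note are hypothetical.

```python
def diff_readbacks(pre, post):
    """Map each setting that changed (or appeared/vanished) to (pre, post)."""
    keys = pre.keys() | post.keys()
    return {
        k: (pre.get(k), post.get(k))
        for k in sorted(keys)
        if pre.get(k) != post.get(k)
    }
```

For example, diff_readbacks({"mode": "rgmii", "tso": True}, {"mode": "rgmii", "tso": False}) returns {"tso": (True, False)}, which names the setting the restore sequence missed.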
Speed/duplex keeps renegotiating — force mode first or check peer consistency?
Likely cause: Autoneg advertisement mismatch, or driver state machine triggers repeated renegotiation under specific conditions.
Quick check: Log speed/duplex transitions with timestamps; A/B compare fixed mode vs AN while keeping the peer unchanged.
Fix: Lock a known-good speed/duplex for isolation; only re-enable AN after peer settings and driver behavior are verified stable.
Pass criteria: No unexpected renegotiation events for X hours; throughput remains within ±X%.
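The timestamped transition log can be produced from periodic samples of the link state. The sketch below assumes each sample is a (timestamp, speed, duplex) tuple; how the samples are collected is platform-specific and not shown.

```python
def renegotiation_events(samples):
    """Return (timestamp, old_state, new_state) for each speed/duplex change."""
    events = []
    for prev, curr in zip(samples, samples[1:]):
        if prev[1:] != curr[1:]:
            events.append((curr[0], prev[1:], curr[1:]))
    return events
```

An empty result over the full observation window is the evidence behind "no unexpected renegotiation events"; a non-empty result gives timestamps to correlate with driver logs.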
FCS errors increase, but “the eye looks OK” — how to isolate with MAC/PCS loopback?
Likely cause: Fault can be host-side (DMA/driver/mode) or link-side; visual inspection is not a pass criterion.
Quick check: Run MAC loopback (isolates host-side); then PCS loopback (digital link logic) and compare FCS_err_rate deltas.
Fix: If MAC loopback fails, focus on controller/driver/rings; if loopbacks pass but external link fails, escalate to sibling pages (PHY/line-side) without expanding scope here.
Pass criteria: In loopback modes, FCS_err_rate == 0 over X minutes at throughput > X.
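The isolation order described above can be written as a small decision function. This is a sketch: the inputs are FCS error deltas measured in each mode over the same duration, and the returned labels are illustrative.

```python
def localize_fault(mac_loopback_fcs, pcs_loopback_fcs, external_fcs):
    """Walk outward from the host; the first failing stage bounds the fault."""
    if mac_loopback_fcs > 0:
        return "host-side: controller/driver/rings"
    if pcs_loopback_fcs > 0:
        return "PCS / digital link logic"
    if external_fcs > 0:
        return "external link: escalate to PHY/line-side"
    return "no fault observed"
```

Checking the innermost loopback first keeps a host-side fault from being misattributed to the line side.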
MAC address changes after every reset — strap/EEPROM/driver precedence conflict?
Likely cause: Multiple sources compete (strap → EEPROM/OTP → driver override), or EEPROM programming/verification is unreliable across cold vs warm resets.
Quick check: Read back current MAC and (if available) MAC-source indicator; compare cold boot vs warm reset; verify EEPROM content and checksum/valid flag.
Fix: Define a single authoritative MAC source; disable or harmonize other override paths; add production write + readback + verify step.
Pass criteria: mac_addr_stable == true across X cold boots and X warm resets.
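Two bits in the first octet help identify where an address came from: bit 0 marks a multicast (group) address and bit 1 marks a locally administered address. The sketch below decodes them and checks stability across a list of readbacks; the function names are illustrative.

```python
def mac_flags(mac):
    """Decode the address-type bits of a 48-bit MAC given as text."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return {
        "multicast": bool(octets[0] & 0x01),
        "locally_administered": bool(octets[0] & 0x02),
        "all_zero": octets == bytes(6),
    }

def usable_unicast(mac):
    """A station address must be unicast and non-zero."""
    flags = mac_flags(mac)
    return not (flags["multicast"] or flags["all_zero"])

def mac_addr_stable(readings):
    """True if every readback is the same address, ignoring case and separators."""
    return len({r.lower().replace("-", ":") for r in readings}) == 1
```

A locally administered address that differs on every boot often suggests that the driver fell back to a generated address because the EEPROM/OTP value was missing or invalid; that interpretation is a common pattern, not a guarantee.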
VLAN enabled and some packets “disappear” — filter rules too strict or tag direction wrong?
Likely cause: VLAN filter is too strict (drops frames), or tag insert/strip direction is misconfigured so frames are not delivered to the intended path.
Quick check: A/B disable VLAN filtering while keeping tag processing constant; check filtered_drop_counter (if exposed) and drop_delta.
Fix: Stabilize VLAN as pure pass-through first, then add filter rules incrementally with A/B verification per rule.
Pass criteria: In the target VLAN scenario, drop_delta == 0 over X minutes and throughput > X.
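When a frame "disappears", it helps to confirm which VLAN ID it actually carried. The sketch below parses a single 802.1Q tag (TPID 0x8100 at byte offset 12, followed by a 16-bit TCI) from a raw Ethernet frame. The filter model, including the pass_untagged default, is an assumption for illustration; real controllers differ in how they treat untagged frames.

```python
import struct

TPID_8021Q = 0x8100

def parse_vlan(frame):
    """Return PCP/DEI/VID for an 802.1Q-tagged frame, or None if untagged."""
    if len(frame) < 18:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0x0FFF}

def would_pass_filter(frame, allowed_vids, pass_untagged=True):
    """Model a simple VID allow-list filter."""
    tag = parse_vlan(frame)
    if tag is None:
        return pass_untagged
    return tag["vid"] in allowed_vids
```

If hardware tag stripping is enabled, a host-side capture may show the frame without its tag, so the parse result should be compared against a capture point that sees the frame as it appears on the wire.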
Drops only at low/high temperature — log counter rates first or disable complex offloads for A/B?
Likely cause: Temperature shifts a marginal boundary where complex features (TSO/LRO/EEE/WOL-armed policies) become brittle, or ring/interrupt pacing fails under stress.
Quick check: Start with rate-based logging (FCS_err_rate, rx_overrun_rate, drop_delta) across temperature steps; then A/B disable complex offloads while holding traffic profile constant.
Fix: Use A/B to isolate the feature boundary first, then re-enable features one-by-one with counter-rate evidence at each temperature point.
Pass criteria: Across the X to X °C range, drop_delta == 0 and FCS_err_rate < X/hour at throughput > X.
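The A/B result across temperature steps can be summarized mechanically. This sketch assumes each test run is recorded as a dictionary with the hypothetical keys temp_c, features_on, and drop_delta.

```python
def failing_temps(records, features_on):
    """Temperatures at which drops occurred for the given feature state."""
    return sorted({
        r["temp_c"]
        for r in records
        if r["features_on"] == features_on and r["drop_delta"] > 0
    })

def feature_is_boundary(records):
    """True if drops appear only when the complex features are enabled."""
    return bool(failing_temps(records, True)) and not failing_temps(records, False)
```

If feature_is_boundary returns False while drops still occur, the drops do not track the offloads and the ring/interrupt pacing path is the next suspect.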