Ethernet Controller & MAC (RMII, GMII, SGMII)
An Ethernet Controller/MAC is the “timekeeper and traffic cop” between the CPU and the PHY: it defines how frames are queued, moved (DMA), offloaded, and timestamped.
This page shows how to choose RMII/GMII/SGMII semantics, control FIFO/descriptor behavior, avoid latency/jitter traps, and validate PTP hardware timestamps with measurable pass criteria.
H2-1 · Definition & Where It Sits (MAC vs Controller vs NIC)
An Ethernet Controller/MAC is where packet I/O becomes measurable and controllable: it defines the data path (DMA/FIFOs/queues), latency behavior under bursts, and the exact tap points for PTP hardware timestamps.
- Latency-first: prioritize shallow/controllable queues, clear drop reasons, and predictable DMA servicing.
- Timestamp-first: demand explicit PTP timestamp tap points and a measurable timestamp path budget.
- Operability-first: require rich counters (CRC/drops/overruns/underruns) and event logging fields.
H2-2 · Interface Map: RMII / GMII / RGMII / SGMII (Semantics & Clocking)
- 10/100: RMII is the common low-pin choice when a clean reference clock plan exists.
- 1G: GMII/RGMII or SGMII—choose based on clocking tolerance, latency sensitivity, and debug visibility.
- 2.5G and above: commonly SGMII-family / USXGMII-class interfaces (treated as an entry point only here).
- Clock source ownership: which side provides the reference clock (MAC/Controller, PHY, or external).
- Status/negotiation transport: how link speed/duplex/status is conveyed (explicit pins vs in-band signaling).
- Latency stability knobs: clock-domain crossing and buffering choices that change tail-latency under bursts.
First check: confirm whether TX/RX internal delays are enabled on the correct side (exactly once).
Fix: make delay ownership explicit and consistent across MAC and PHY configuration.
Pass criteria: error counters remain < X over Y minutes at sustained load.
First check: ensure both ends agree on in-band status usage and negotiation mode (forced vs AN).
Fix: standardize negotiation policy and validate partner status interpretation.
Pass criteria: negotiated mode stays stable for Z hours with repeated link cycles.
First check: verify which device is the clock master and whether the clock is present at reset/strap time.
Fix: define clock ownership, startup sequencing, and validate with a simple “clock-present” bring-up checklist.
Pass criteria: link-up success rate ≥ (100% − X%) over N cold boots.
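The cold-boot pass criterion above can be scripted directly; the boot log contents and the failure threshold here are placeholder assumptions, not measured data:

```python
# Hypothetical cold-boot log: True = link came up within the bring-up window.
boot_results = [True] * 98 + [False] * 2   # 100 cold boots, 2 failures

def linkup_pass(results, max_fail_pct):
    """Pass if link-up success rate >= (100% - max_fail_pct)."""
    success_rate = 100.0 * sum(results) / len(results)
    return success_rate >= 100.0 - max_fail_pct

print(linkup_pass(boot_results, max_fail_pct=1.0))   # 98% < 99%  -> False
print(linkup_pass(boot_results, max_fail_pct=2.0))   # 98% >= 98% -> True
```

The point of scripting it is repeatability: the same X goes into every bring-up report instead of living in someone's head.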
H2-3 · Data Path Anatomy: RX/TX Pipeline, FIFOs, DMA, Descriptors
Instability typically comes from queueing and service mismatch inside the Controller/MAC data path: FIFO depth, DMA servicing cadence, descriptor supply (ring health), and backpressure behavior. This chapter turns RX/TX into a measurable pipeline with clear observability points.
Interpretation: RX FIFO absorbs bursts; DMA drains into memory; RX ring health determines whether frames are accepted or dropped.
Interpretation: TX ring feeds DMA; TX FIFO decouples memory timing from line timing; underruns typically indicate service cadence problems.
- Drops: per-reason counters (ring full, buffer unavailable, policy drop).
- FIFO events: RX overrun, TX underrun, watermark hits.
- Backpressure: pause frames sent/received and internal throttle events.
- DMA / ring: descriptor starvation, wrap errors, ownership mismatches.
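A per-reason counter taxonomy only pays off if it feeds a fixed "symptom → first check" mapping. A minimal sketch, where the counter names are illustrative placeholders rather than any specific driver's fields:

```python
def classify_drop(counters):
    """Map counter deltas to the first area to investigate.
    Counter names are illustrative, not a specific driver's fields."""
    if counters.get("rx_fifo_overrun", 0) > 0:
        return "FIFO depth / DMA service cadence"
    if counters.get("rx_no_descriptor", 0) > 0:
        return "RX ring health / descriptor refill"
    if counters.get("tx_underrun", 0) > 0:
        return "TX service gaps / pacing"
    if counters.get("pause_rx", 0) > 0:
        return "link-partner backpressure"
    return "no controller-side evidence; look upstream"

print(classify_drop({"rx_no_descriptor": 12}))
# -> RX ring health / descriptor refill
```

The ordering encodes triage priority: overruns are checked before starvation because they implicate the same service path but fail earlier in the pipeline.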
Failure mode: overrun when service lag exceeds FIFO capacity; “no CRC errors but drops.”
Quick check: RX overrun counter + FIFO watermark events + pause/backpressure activity.
Pass criteria: RX overrun < X per hour at sustained load, with stable P99 latency.
Failure mode: jittery service (bursty DMA) causes FIFO oscillation and tail-latency spikes.
Quick check: DMA completion pacing + ring fill level over time (avoid saw-tooth extremes).
Pass criteria: ring occupancy stays within a target band; no starvation events in Y minutes.
Failure mode: descriptor starvation (ring empty) drops frames even when line is clean.
Quick check: “buffer unavailable” drop reason + starvation counter + wrap/ownership flags.
Pass criteria: starvation < X per 10^6 frames and no wrap/ownership anomalies.
Failure mode: ring overfill or ownership mismatch causes long tail latency or periodic stalls.
Quick check: ring fullness + “TX busy” duration + descriptor reclaim rate.
Pass criteria: reclaim keeps up with enqueue; TX busy never exceeds X ms at steady load.
Failure mode: underrun indicates service gaps or overly aggressive pacing/coalescing upstream.
Quick check: TX underrun counter + pause/backpressure correlation + DMA pacing.
Pass criteria: TX underrun = 0 in Y minutes at target throughput and packet size mix.
- Burst: DMA burst length must match memory behavior to avoid saw-tooth latency.
- Cache line & alignment: descriptor and buffers should align to reduce jitter from extra transactions.
- Ownership: producer/consumer flags must be monotonic and recoverable after resets.
- Wrap: ring wrap boundaries must be validated under stress (long soak, mixed packet sizes).
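The descriptor-starvation failure mode above can be reproduced with a toy producer/consumer model. Ring size, refill cadence, and burst length below are made-up numbers chosen to provoke starvation, not a real driver's behavior:

```python
class RxRing:
    """Toy RX descriptor ring: hardware consumes one descriptor per frame,
    software refills. Numbers are chosen to provoke starvation."""
    def __init__(self, size):
        self.size = size
        self.free = size        # descriptors currently owned by hardware
        self.no_desc_drops = 0  # frames dropped with a clean line

    def frame_arrives(self):
        if self.free == 0:
            self.no_desc_drops += 1  # starvation: "no CRC errors but drops"
        else:
            self.free -= 1

    def refill(self, n):
        self.free = min(self.size, self.free + n)

ring = RxRing(size=4)
for i in range(10):       # burst of 10 back-to-back frames...
    ring.frame_arrives()
    if i % 3 == 2:        # ...while software refills only 1 per 3 frames
        ring.refill(1)
print(ring.no_desc_drops)  # -> 3
```

Even a clean link drops frames here because refill cadence lags arrival rate, which is exactly why "no-descriptor" needs its own counter.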
H2-4 · Latency & Determinism: From Micro-bursts to Bufferbloat
- Throughput: average delivery rate.
- Latency: time per packet, tracked as P50/P99 (tail latency matters most).
- Jitter: latency variation (spread between P50 and P99/P999).
- End-to-end: application-to-application behavior (system-level).
- Controller-internal: FIFO/ring queueing and DMA service cadence (this page’s focus).
- Ring occupancy leaving its normal band (rapid climb, slow drain).
- FIFO watermark hits followed by drops or pause frames.
- Tail latency (P99) rising while average throughput stays stable.
- Shallow queues: keep queueing delay within a target window X.
- Fast service: maintain consistent DMA drain/reclaim cadence during bursts.
- Explicit drop policy: drops must have reasons (observable), avoiding “silent loss.”
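P50/P99 tracking is simple to standardize; a sketch using the nearest-rank definition (other tools may interpolate slightly differently), with a hypothetical latency sample set:

```python
def percentile(samples, p):
    """Nearest-rank percentile (simple definition; tools may differ slightly)."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Hypothetical per-packet latencies in microseconds: a fast baseline plus a
# 5% burst tail, exactly the case where the average hides the problem.
lat_us = [10.0] * 95 + [250.0] * 5
p50, p99 = percentile(lat_us, 50), percentile(lat_us, 99)
print(p50, p99, p99 - p50)   # P50=10.0, P99=250.0, jitter=240.0
```

The average of this set is only 22 µs, which is why the page insists on tail percentiles rather than throughput or mean latency.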
H2-5 · Offloads: Checksum / TSO / LRO / VLAN / Filtering (What Helps vs Hurts)
Offloads can reduce CPU cost and raise throughput, but may also increase tail latency, alter queue behavior, and make packet capture misleading. A latency-first approach evaluates every offload with the same four axes: determinism, observability, CPU, and throughput.
- CPU cycles saved for per-packet checksum work.
- Higher throughput when CPU is the bottleneck.
Risk: captures taken before the hardware inserts/verifies checksums can look “broken” even when on-wire frames are correct.
Pass criteria: capture interpretation matches hardware reality; no increase in P99 beyond X.
Risk: large “work items” can dominate TX ring service, increasing tail latency for small real-time flows.
Quick check: compare TX ring occupancy and P99 latency with/without TSO under mixed packet sizes.
Use when: throughput-first bulk traffic; avoid for strict cyclic real-time streams unless isolated.
Risk: merge windows add waiting time and reshape arrival timing, often increasing jitter for real-time traffic.
Quick check: P99/P999 jitter comparison and queue drift when coalescing is enabled.
Use when: non-real-time traffic or when real-time flows are isolated to different queues.
Pass criteria: real-time class P99 latency stays within X while bulk class consumes offload benefits.
- Apply VLAN tags and priority markings for traffic classes.
- Steer frames into dedicated RX queues (when supported).
- Drop/accept based on simple rules to protect critical queues.
Pass criteria: critical class P99 < X and drop reasons remain explainable.
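The steering idea can be captured as a simple priority-to-queue table. The three-queue split and the queue numbering below are assumptions for illustration, not any controller's register interface:

```python
# Illustrative steering table: VLAN PCP (priority code point) -> RX queue.
PCP_TO_QUEUE = {
    7: 0, 6: 0, 5: 0,  # critical / real-time classes -> protected queue 0
    4: 1, 3: 1,        # medium priority -> queue 1
    2: 2, 1: 2, 0: 2,  # best-effort / bulk -> queue 2
}

def steer(pcp):
    return PCP_TO_QUEUE.get(pcp, 2)  # unknown values fall back to bulk

print(steer(6), steer(0))  # -> 0 2
```

Keeping the table explicit (rather than scattered across filter rules) makes the "critical class P99" pass criterion auditable: every flow's queue assignment is one lookup away.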
H2-6 · PTP Hardware Timestamping: Where the Timestamp Is Taken
Two devices can both claim “hardware timestamping” yet show different offsets and jitter. The difference usually comes from where the timestamp is taken and how much queueing and service jitter exists between the event and the software-visible readout.
Best when: the tap point is close to MAC egress and queueing is controlled.
Best when: system integration/compatibility is prioritized.
- FIFO queueing: variable waiting time before the tap point or before readout.
- DMA delay: service cadence and descriptor availability change the visibility timing.
- Clock-domain crossing: synchronization granularity and phase uncertainty add error.
- Interrupt coalescing: reporting delay changes when software “sees” the event.
Tap → FIFO → DMA → memory → readout.
Pass criteria: total jitter < X and drift remains stable over window W.
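The readout chain above lends itself to a written jitter budget. Every value here is a placeholder to show the bookkeeping, not a measured figure for any device:

```python
# Hypothetical per-stage jitter contributions (ns) along the readout chain
# Tap -> FIFO -> DMA -> memory -> readout.
budget_ns = {
    "fifo_queueing": 40,
    "dma_service": 60,
    "cdc_quantization": 8,
    "irq_visibility": 0,  # 0 here because capture itself is hardware-side
}
total = sum(budget_ns.values())
limit_ns = 150  # the page's placeholder X
print(total, total < limit_ns)  # -> 108 True
```

A budget table like this also tells you where to spend effort: here DMA service cadence dominates, so coalescing/queue tuning buys more than a better oscillator would.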
H2-7 · Clocks & Clock Domains: Ref Clock, PLL, CDC, Sync to Timebase
Stable timestamps require a clean reference clock, predictable PLL behavior, and well-defined clock-domain crossings. This chapter focuses on the Controller/MAC interior: timebase construction, CDC risk points, and calibration fields used to keep drift and jitter observable.
- Free-running counter: the internal time counter driven by a clock source (ref/PLL-derived).
- Frequency adjust: fine rate trimming to reduce drift against an external time/frequency reference.
- Phase adjust: offset alignment for fixed biases (e.g., deterministic pipeline delay compensation).
- Capture/compare hooks: timestamp capture points and scheduled compare events for alignment tasks.
- CDC granularity: synchronization steps inject quantization-like uncertainty.
- Crossing FIFOs: occupancy and service cadence can add variable latency.
- Mixed domains: interface clock (GMII/RGMII/SGMII) and system clock drift differently.
Pass criteria: jitter remains < X and does not scale sharply with occupancy.
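Frequency trimming has a convenient unit identity: a drift of 1 ns per second equals 1 ppb of frequency error. A minimal sketch of converting an observed drift into a trim value (sign convention assumed, check your servo's definition):

```python
def freq_adjust_ppb(measured_drift_ns, window_s):
    """Drift of 1 ns/s equals 1 ppb of frequency error.
    Sign convention assumed: local clock fast (positive drift) -> trim negative."""
    return -measured_drift_ns / window_s

# 500 ns gained over a 10 s window -> slow the counter by 50 ppb
print(freq_adjust_ppb(500.0, 10.0))   # -> -50.0
```

Real servos (PI loops in PTP stacks) filter this estimate over many windows; the identity above is just the proportional term.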
H2-8 · Interrupts, Polling, Coalescing: The Hidden Latency Knobs
Tail latency is often dominated by service cadence: interrupt storms, polling budgets, and coalescing thresholds shape when packets become visible to software. This chapter focuses on adjustable knobs, their symptoms, and validation patterns — not OS parameter catalogs.
- Timer threshold: wait time window before raising an interrupt or delivering a batch.
- Packet threshold: wait until N packets accumulate before service.
- Budget: how many packets/bytes are processed per service pass.
Pass criteria: P99 stays < X, drops remain explainable, CPU stays < X%.
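The timer/packet-threshold interplay is easy to reason about with a toy model. This is a deliberately simplified simulation (arrival times and thresholds are invented), but it shows why a lone packet's worst-case visibility delay is bounded by the coalescing timer:

```python
def max_visibility_delay_us(arrivals_us, timer_us, pkt_threshold):
    """Toy coalescing model: a batch is delivered when pkt_threshold packets
    are pending, or when the timer (started at the first pending packet)
    expires. Returns the worst packet-to-visibility delay in microseconds."""
    delays, pending = [], []
    for t in arrivals_us:
        # If the open batch's timer fired before this arrival, that batch
        # was already delivered at (first pending packet + timer).
        if pending and t - pending[0] >= timer_us:
            deliver = pending[0] + timer_us
            delays += [deliver - a for a in pending]
            pending = []
        pending.append(t)
        if len(pending) >= pkt_threshold:
            delays += [t - a for a in pending]  # threshold hit: deliver now
            pending = []
    if pending:  # leftovers wait for the final timer expiry
        deliver = pending[0] + timer_us
        delays += [deliver - a for a in pending]
    return max(delays)

arrivals = [0, 5, 10, 200]  # a small burst, then one isolated packet
print(max_visibility_delay_us(arrivals, timer_us=100, pkt_threshold=4))  # 100
print(max_visibility_delay_us(arrivals, timer_us=20, pkt_threshold=4))   # 20
```

Shrinking the timer from 100 µs to 20 µs directly caps the tail, at the cost of more frequent service passes, which is the CPU-versus-latency trade the section describes.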
H2-9 · Measurement & Bring-up: What to Probe, What to Log
Link-up and throughput alone are not enough. Robust bring-up defines what to probe at Controller/MAC boundaries, what counters to log, how to build a minimal test matrix, and how to set pass criteria for latency and timestamps.
Drops + FIFO overrun → treat as queue depth/service cadence risk.
Descriptor/DMA errors → treat as memory contract/alignment risk.
- Ingress: RX entry counters and timestamp capture path.
- Queues: RX/TX FIFO watermarks, ring occupancy peaks and dwell time.
- DMA: completion rates, fault/timeout counters, burst behavior (concept-level).
- Egress: TX underrun, pause behavior, egress timestamp tap-point.
- Path: internal loopback → external loopback → normal link.
- Load: steady flow and micro-burst injections.
- Targets: throughput, P50/P99 latency, drops, timestamp sanity.
- Environment: temperature steps and voltage edges (where relevant).
- Timestamp error histogram: buckets (0–50 ns / 50–200 ns / 200 ns–1 μs / >1 μs) with X thresholds.
- Temperature / voltage: periodic samples + event-trigger snapshots.
- Link events: up/down, speed change, pause state change, lock transitions (if exposed).
- Queue peaks: FIFO/ring peak occupancy and dwell window.
- Drop reason tags: overrun / no desc / underrun / budget / pause (as available).
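The timestamp-error histogram named above reduces to a small bucketing routine; the sample errors are hypothetical:

```python
def bucketize(errors_ns):
    """Bucket timestamp error magnitudes into the page's histogram scheme:
    0-50 ns / 50-200 ns / 200 ns-1 us / >1 us."""
    buckets = {"0-50ns": 0, "50-200ns": 0, "200ns-1us": 0, ">1us": 0}
    for e in errors_ns:
        m = abs(e)
        if m < 50:
            buckets["0-50ns"] += 1
        elif m < 200:
            buckets["50-200ns"] += 1
        elif m < 1000:
            buckets["200ns-1us"] += 1
        else:
            buckets[">1us"] += 1
    return buckets

# Hypothetical per-sample timestamp errors in nanoseconds
print(bucketize([12, -48, 75, 300, 2500]))
```

Fixed bucket edges matter more than clever ones: identical edges across labs and firmware versions are what make tail comparisons meaningful.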
H2-10 · Design Hooks & Pitfalls (Controller-centric)
This section consolidates controller-centric pitfalls without drifting into PHY electricals or PCB layout. Each item maps a visible symptom to the first check that narrows the root cause quickly.
H2-11 · Engineering Checklist (Design → Bring-up → Production)
This section turns controller/MAC decisions into three execution gates. Each checklist item is written as a testable contract: Check → How → Pass (threshold placeholder X) → Evidence.
Scope guard: Controller/MAC only. PHY electrical/PCB length matching, magnetics placement, and switch/TSN scheduling are intentionally out of scope here.
Design gate — Freeze contracts & budgets before layout
- Check: Interface contract (RMII/RGMII/SGMII semantics + status reporting).
  How: Document clock source, link state path, and auto-negotiation boundaries.
  Pass: Link-up/down events are deterministic within X retries; no “phantom link” states.
  Evidence: Link-event log + state diagram snapshot.
- Check: Descriptor ring contract (alignment, ownership, wrap, cache behavior).
  How: Fix ring entry size, cache-line alignment, and DMA burst policy in a single spec page.
  Pass: Descriptor errors = 0 across stress tests; no wrap-related drops.
  Evidence: Driver debug counters + ring dump excerpt.
- Check: Queue strategy (FIFO/ring depth vs tail latency).
  How: Define burst tolerance and explicit drop reasons (overrun/underrun/timeout).
  Pass: P99 latency < X and drops are classifiable (no “unknown”).
  Evidence: Queue peak histogram + drop-reason breakdown.
- Check: Offload policy (debug-first vs latency-first vs throughput-first profiles).
  How: Decide default enable/disable for checksum/TSO/LRO/VLAN filtering and document side-effects on capture/visibility.
  Pass: Packet capture remains interpretable; latency profile does not regress beyond X.
  Evidence: A/B profile report + capture notes.
- Check: PTP timestamp path budget (tap point, FIFO/DMA/jitter contributors).
  How: Write a “timestamp chain” budget: ingress/egress tap → queueing → DMA → memory visibility.
  Pass: Timestamp error tail buckets < X of samples under load.
  Evidence: Timestamp histogram + load annotation.
- Check: Instrumentation hooks reserved (counters, logs, event triggers).
  How: Lock a must-have field list: CRC, drops, FIFO over/under, descriptor, pause, coalescing, thermal/power events.
  Pass: Field logs can explain every drop class without guessing.
  Evidence: “Black-box schema” v1 + sample export.
Example PNs (controller/MAC anchors — verify features in datasheets):
- PCIe controllers with PTP focus: Intel I210-AT, Intel I225, Intel I350
- PCIe→GbE controllers (PTP-capable variants exist): Microchip LAN7430, LAN7431
- USB→GbE controller (offload anchors): Microchip LAN7800
- Reference clock XO anchors (25 MHz examples): SiTime SiT1602AI-22-33E-25.000000, Abracon ASFL1-25.000MHZ-EC-T
Bring-up gate — Execute a repeatable test ladder
- Check: Bring-up ladder is followed (Link → Loopback → Throughput → Latency → PTP sanity).
  How: Freeze the step order and capture artifacts per step (counters + logs + plots).
  Pass: Each step meets its gate threshold X before moving forward.
  Evidence: One-page bring-up report (v1).
- Check: Must-have counters are readable and meaningful (CRC, drops, FIFO over/under, descriptor, pause, retry).
  How: Correlate counter deltas with injected events (micro-burst, CPU load, link flap).
  Pass: “Symptom → First counter to check” mapping is stable across runs.
  Evidence: Counter snapshot set + correlation notes.
- Check: Queue peak is captured (not just average throughput).
  How: Record peak ring occupancy and peak FIFO watermark in a fixed time window.
  Pass: No overflow at target burst profile; P99 remains < X.
  Evidence: Peak histogram + burst configuration.
- Check: Latency metrics are standardized (P50/P99/P999 + measurement tap point).
  How: Lock the denominator: which timestamp, where taken, and how synchronized.
  Pass: Cross-lab deltas < X with the same method.
  Evidence: “Latency definition” page + sample dataset.
- Check: PTP sanity under load (offset stability + histogram tail).
  How: Compare idle vs sustained traffic; watch for offset step events tied to queueing/coalescing.
  Pass: Offset drift and step rate within X over Y minutes.
  Evidence: Offset plot + histogram + event log.
Example PNs (bring-up-friendly ecosystems): Intel I210-AT / I225 / I350 (PTP timestamping focus), Microchip LAN7430/LAN7431 (register-level visibility), TI AM3358 (CPSW timestamping module in industrial SDK ecosystems), NXP i.MX RT10xx (1588/PTP application-note ecosystems).
Production gate — Prove long-run stability & regression
- Check: Soak stability (multi-day).
  How: Continuous traffic + periodic PTP sanity; log every event and counter drift.
  Pass: Unexplained drop/step events < X per day.
  Evidence: Soak log + daily summary.
- Check: Temperature sweep (cold/room/hot) impacts on latency & timestamp tails.
  How: Step temperature, mark steady-state windows, compare histogram tails under identical traffic.
  Pass: Tail buckets remain within budget X across temperature.
  Evidence: Temp-tagged histogram set.
- Check: Power margin sweep (voltage corners + events).
  How: Induce controlled dips/noise; correlate with FIFO/descriptor errors and offset steps.
  Pass: No systematic correlation beyond X.
  Evidence: Power-event log + counter correlation.
- Check: Regression suite is frozen (no “human memory” dependency).
  How: Lock the test ladder + key stress patterns; version every threshold.
  Pass: New firmware/driver does not break any gate threshold X.
  Evidence: CI-style regression report.
- Check: Field log schema compatibility (forensics over months/years).
  How: Version log schema; keep backward-readable exports.
  Pass: Cross-version comparison is possible without re-parsing hacks.
  Evidence: Schema changelog + samples.
Production note: Favor parts with stable counter sets, clear timestamp capabilities, and well-documented register behaviors. Example anchors: Intel I350 (multi-port deployments), Intel I225 (TSN-adjacent ecosystems), Microchip LAN7430/LAN7431 (PTP-capable controller family).
Diagram · 3-Gate Checklist Cards (controller/MAC execution flow)
H2-12 · Applications & IC Selection Logic (before FAQ)
The goal is not “pick a brand.” The goal is to select a controller/MAC capability set that can be verified against measurable outcomes: Latency, Timing (PTP), and Throughput.
Scope guard: This logic stays controller-centric. Switch/TSN scheduling parameters, PHY SI/layout rules, and compliance workflows are referenced only as “handoff points”.
A) Application buckets → primary goal → must-have capabilities
Gateway / Edge compute
Primary goal: high throughput + stable tail latency under mixed traffic.
Must-have: DMA robustness, multi-queue visibility, offload profiles that do not hide drops, clear counter taxonomy.
First validation: micro-burst stress → queue peak + drop reason + P99.
Example PNs: Intel I210-AT, Intel I225, Intel I350, Microchip LAN7430, Microchip LAN7800.
PLC / Industrial controller
Primary goal: predictable latency and debuggability across temperature and long uptimes.
Must-have: stable counter set, deterministic event logging, clear interrupt/coalescing knobs, timebase hooks for PTP or system time alignment.
First validation: soak + temp sweep → “unexplained drop” rate.
Example PNs: TI Sitara AM3358, NXP i.MX RT10xx (integrated Ethernet MAC families), Intel I210-AT (PTP-focused ecosystems).
Remote I/O / Field box
Primary goal: fast fault isolation and consistent behavior during link disturbances.
Must-have: explicit drop/overrun classification, link-event traceability, minimal “hidden buffering” features enabled by default.
First validation: link flap + burst injection → counter correlation.
Example PNs: NXP i.MX RT10xx, TI AM3358, Microchip LAN7431 (PCIe→RGMII class).
High-speed imaging / Motion control
Primary goal: timestamp integrity and bounded tail latency under load.
Must-have: hardware timestamping with well-defined tap point, histogram/step observability, queue visibility under bursts.
First validation: PTP sanity under sustained traffic → tail buckets + step events.
Example PNs: Intel I210-AT, Intel I350, Microchip LAN7430.
B) Key specs that actually matter (controller-centric)
Interface & porting
What to check: port count, host interface type (PCIe / USB / integrated), and MAC↔PHY interface contract (RMII/RGMII/SGMII).
How to validate: link event determinism + stable status reporting under flap conditions.
Pass: no ambiguous link states; recovery within X seconds.
Queues, DMA, offloads
What to check: FIFO/ring depth controls, DMA burst behavior, descriptor rules, checksum/TSO/LRO policy.
How to validate: micro-burst tests with queue peak logging + drop reason classification.
Pass: P99 bounded < X and drops remain explainable.
Timing, diagnostics, and forensics hooks
What to check: hardware timestamping capability, tap point clarity, histogram/step observability, black-box event logs.
How to validate: idle vs load PTP sanity; correlate offset steps with queueing/coalescing.
Pass: timestamp tail buckets within budget X under load.
C) Selection decision flow (Latency / Timing / Throughput)
Start with the primary objective, then lock the must-have capabilities and the first validation artifact.
Example PNs by entry (anchors only)
- Latency-first anchors: Intel I210-AT, Intel I225, Microchip LAN7431
- Timing-first (PTP) anchors: Intel I210-AT, Intel I350, Microchip LAN7430, TI AM3358, NXP i.MX RT10xx
- Throughput-first anchors: Intel I225, Intel I350, Microchip LAN7430, Microchip LAN7800
For each PN, the first pass is always the same: confirm timestamp tap point, counter availability, and queue observability in the datasheet + driver documentation.
H2-13 · FAQs (Controller/MAC Troubleshooting)
These FAQs close out long-tail troubleshooting without expanding the main body. Each answer follows a fixed 4-line, measurable format: Likely cause / Quick check / Fix / Pass criteria (threshold placeholders X, Y, Z).
Scope guard: Interface semantics, FIFO/DMA, offloads, timestamping tap points, and measurement methodology only. PHY electrical/PCB SI, switch/TSN scheduling, and protocol-stack tuning are out of scope here.
Throughput is fine but P99 latency explodes under bursts — first knob to check?
Likely cause: Queue depth/ring growth during micro-bursts (bufferbloat) or interrupt coalescing delaying service.
Quick check: Capture ring occupancy peak + FIFO watermark in a fixed window; compare P50 vs P99; note coalescing timer/packet thresholds.
Fix: Reduce effective queueing (smaller rings / lower watermarks), tighten coalescing (smaller timer or packet threshold), and ensure drops are explicit (reason-tagged) rather than silent delay.
Pass criteria: Under burst load Y pps for Z s: P99 latency < X ms and ring peak stays below X% of capacity.
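The deep-ring-versus-shallow-ring trade in this answer can be made concrete with a toy model; burst size, service cost, and ring limits below are invented for illustration:

```python
def burst_queue_delay_us(burst_pkts, service_us_per_pkt, ring_limit):
    """Toy micro-burst model: packets arrive back-to-back while service
    drains at a fixed per-packet cost. Returns (worst queueing delay in us,
    frames dropped past the ring limit). All numbers are placeholders."""
    worst_us, drops, depth = 0.0, 0, 0
    for _ in range(burst_pkts):
        if depth >= ring_limit:
            drops += 1  # explicit, reason-tagged drop instead of silent delay
        else:
            depth += 1
            worst_us = max(worst_us, depth * service_us_per_pkt)
    return worst_us, drops

# Deep ring: no drops but large tail delay. Shallow ring: bounded delay, drops.
print(burst_queue_delay_us(64, 2.0, 256))  # -> (128.0, 0)
print(burst_queue_delay_us(64, 2.0, 16))   # -> (32.0, 48)
```

Neither extreme is "right": the shallow configuration converts hidden latency into countable drops, which is exactly the bufferbloat trade the answer describes.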
Enabling TSO increases jitter — TSO segmentation point or queueing?
Likely cause: Segmentation happens late (near DMA/MAC), creating bursty small-packet emission, or TSO increases per-queue contention and amplifies tail jitter.
Quick check: Compare TX queue peak, TX underrun, and pacing intervals with TSO on/off; inspect whether real-time flows share the same queue as bulk traffic.
Fix: Disable TSO for latency-critical traffic (profile/queue split), or keep TSO but enforce smaller queueing windows (coalescing and ring limits) to prevent burst amplification.
Pass criteria: With target traffic mix: jitter (P99-P50) < X µs and TX queue peak does not exceed X% during sustained load Y.
PTP offset drifts although PHY claims timestamping — wrong tap point?
Likely cause: Timestamp is taken at a different point than assumed (MAC vs PHY vs PCS), or the driver reads the wrong timestamp source under load (queueing makes it visible as drift).
Quick check: Log timestamp histogram (tail buckets) vs load; verify the reported tap point by matching the timestamp to ingress/egress events and confirming which register/path provides it.
Fix: Force a single timestamp source (MAC-side or PHY-side) and ensure the driver uses the matching path consistently; reduce queueing variability (coalescing/queue limits) to prevent tap-point ambiguity.
Pass criteria: Under traffic load Y: offset drift < X ns over Z min and tail bucket > X ns stays below X%.
RX drops with no CRC errors — FIFO overrun or descriptor starvation?
Likely cause: RX FIFO overruns (service too slow) or the RX ring runs out of descriptors (refill starvation); both drop frames that arrived cleanly, so CRC counters stay at zero.
Quick check: Compare RX FIFO overrun vs RX no-descriptor/ring empty counters; log ring occupancy peak and ISR/coalescing thresholds during the drop window.
Fix: If FIFO overruns dominate: reduce coalescing latency and shorten service interval; if descriptor starvation dominates: increase refill headroom (ring size or refill policy) and ensure descriptors are cache-line aligned.
Pass criteria: At load Y pps: RX drops < X per 10^6 packets, and FIFO-overrun + no-descriptor counters remain at 0.
Link is stable but intermittent stalls — pause frames or backpressure?
Likely cause: Flow-control pause/backpressure stops transmission (or drains RX servicing) without dropping link; stalls appear as “traffic freezes” with no CRC faults.
Quick check: Read TX/RX pause frame counters, queue occupancy, and backpressure events during stall windows; check whether stalls correlate with queue high-watermarks.
Fix: For diagnosis, disable flow control temporarily; then tune pause thresholds/watermarks or adopt shallower queues so backpressure triggers less often and resolves faster.
Pass criteria: No stall longer than X ms over Z min at load Y, and pause/backpressure events stay below X per minute.
Different OS shows different latency — interrupt coalescing mismatch?
Likely cause: Different driver defaults for interrupt moderation/coalescing, queue mapping, or polling budget lead to different service intervals and tail latency.
Quick check: Compare coalescing timer/packet thresholds, number of queues used, and ISR rate. Measure P50/P99 under identical traffic and pin CPU affinity consistently.
Fix: Align coalescing and queue settings across OS/driver profiles; prioritize smaller coalescing windows for latency-first modes and validate with the same measurement tap point.
Pass criteria: Under the same traffic Y: P99 delta between OSes < X µs, and ISR/coalescing settings match within X%.
SGMII links up but negotiation looks wrong — in-band status mismatch?
Likely cause: PCS “in-band status” encoding/decoding mismatch, or one side forcing speed/duplex while the other expects auto-negotiation semantics.
Quick check: Read PCS status for resolved speed/duplex, check whether in-band status is enabled, and compare link partner advertisement vs forced settings across both ends.
Fix: For debug, force a known-good mode on both ends (speed/duplex + in-band status policy). Then re-enable auto-negotiation with consistent in-band status handling.
Pass criteria: Resolved mode matches expected (speed/duplex) across Z re-links, and mismatch events = 0 during Y minutes of traffic.
Checksum offload makes captures “look broken” — how to verify correctly?
Likely cause: Captures occur before hardware inserts/verifies checksums; software displays “bad checksum” although on-wire frames are correct (or checksum status is carried out-of-band).
Quick check: Use the driver’s RX checksum status flags/counters; compare captures with checksum offload enabled vs disabled; validate on-wire integrity using a known-good external capture point.
Fix: For debugging, disable checksum offload (or force verification in software) and rely on hardware-reported checksum status once validated; document the capture tap point in the test report.
Pass criteria: Captures remain interpretable (no false “broken” checksums) and checksum error counters < X per 10^6 packets over Z minutes at load Y.
Timestamp is stable in lab but shifts with temperature — timebase vs CDC?
Likely cause: Timebase frequency drifts with temperature, or clock-domain crossing (CDC) adds temperature-dependent delay/jitter visible in timestamp tails.
Quick check: Log offset vs temperature alongside timestamp histogram tails; compare idle vs load across temp steps; flag deterministic “steps” versus smooth drift.
Fix: Improve timebase stability (calibration/compensation hooks) and reduce CDC sensitivity by minimizing variable queueing and ensuring consistent clock-domain mapping for timestamp capture.
Pass criteria: Across the temperature range: drift slope < X ns/min, and tail bucket > X ns stays below X% under load Y.
DMA errors only at high load — cache line/alignment or burst length?
Likely cause: Descriptor/data buffers are not cache-line aligned or coherent, or DMA burst/outstanding settings exceed the memory subsystem’s safe envelope under peak load.
Quick check: Check DMA error counters vs traffic rate; validate descriptor alignment and buffer boundaries; log whether errors appear at a repeatable throughput threshold.
Fix: Enforce cache-line alignment for descriptors/buffers and use a conservative DMA burst/outstanding configuration; verify memory coherency handling is consistent for both RX and TX paths.
Pass criteria: DMA error counters remain at 0 up to load Y for Z minutes; throughput stays within X% of target.
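The alignment part of this check is mechanical once descriptor addresses are logged. A sketch with a hypothetical address list and a typical 64-byte cache line (confirm the line size for the target SoC):

```python
CACHE_LINE = 64  # bytes; typical, but confirm for the target SoC/CPU

def check_alignment(addrs, line=CACHE_LINE):
    """Return the addresses that are NOT cache-line aligned.
    The descriptor addresses below are hypothetical examples."""
    return [hex(a) for a in addrs if a % line != 0]

desc_addrs = [0x1000_0000, 0x1000_0040, 0x1000_0058]  # last one misaligned
print(check_alignment(desc_addrs))  # -> ['0x10000058']
```

Running this over a ring dump during bring-up catches misalignment before it surfaces as load-dependent DMA errors.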
Low-latency mode reduces throughput too much — what minimum FIFO target?
Likely cause: Latency-first settings make queues too shallow and service too frequent, increasing overhead and reducing batching efficiency (especially for small packets).
Quick check: Observe throughput vs CPU/ISR rate; check TX/RX underrun/overrun counters; log queue occupancy distribution (not just peak) to see if batching collapsed.
Fix: Set a minimum FIFO/ring headroom target (small but non-zero), and use “tight coalescing” rather than “no batching” (e.g., small timer + modest packet threshold).
Pass criteria: Throughput ≥ (100 − X)% of target while P99 latency < Y ms; underrun/overrun counters remain at 0.
Counters show drops but app sees none — where is loss hidden (driver ring)?
Likely cause: Drops occur in a layer not directly visible to the application (e.g., ring-level drop recovered by higher-layer retries, or drops counted as “queue discard” before delivery).
Quick check: Compare per-queue driver counters (ring drops, no-descriptor, FIFO overrun) against per-flow observations; confirm whether drops align with burst windows and ring occupancy peaks.
Fix: Make loss explicit and attributable: enable reason-tagged drop accounting, reduce burst-induced ring starvation (ring headroom + tighter servicing), and separate latency-critical traffic from bulk queues.
Pass criteria: Drops are fully explainable by counters (no “unknown loss”), and ring-level drops < X per 10^6 packets under burst profile Y for Z s.