
5G DU (Distributed Unit) Hardware Architecture Guide


A 5G DU is a determinism-driven baseband compute node: it must move packets, feed FEC accelerators, and keep PTP/SyncE time consistent while staying inside strict P99 latency and reliability budgets. This page maps the DU’s Ethernet/PCIe/accelerator/timing/power trees into measurable KPIs, counters, and validation steps so issues can be proven and fixed, not guessed.

H2-1 · What a 5G DU is (and is not): scope, splits, and bottlenecks

Intent
Target questions this section must answer

Clarify what a Distributed Unit (DU) does compared with the Centralized Unit (CU) and the Radio Unit (RU), and identify the engineering bottlenecks that dominate DU hardware architecture.

  • What does a 5G DU do vs CU/RU? (role boundary, data path responsibilities)
  • What are the hard bottlenecks in DU L1/baseband hardware? (throughput, determinism, timing, power)
Scope boundary
One-sentence boundary (prevents topic overlap)

This page focuses on the DU’s internal data path, timing/synchronization tree, and power/telemetry; it does not cover RU RF chains (DPD/PA/LNA/JESD) or optical transport internals (DWDM/ROADM/OTN).

Why this matters: the DU’s performance is dominated by “inside-the-box” contention (queues, DMA, fabrics), timestamp integrity, and rail stability—topics that are easy to dilute if RU RF or optical chassis details are mixed in.

Engineering
DU position in the RAN (data-path view, protocol-light)

In practical deployments, the DU sits between the RU and CU/edge compute. Of the three, the DU faces the tightest latency and jitter constraints because its workload is coupled to real-time L1 processing and fronthaul transport. The clean way to explain "splits" is not by enumerating standards, but by stating what moves where:

  • Closer to RU ⇒ tighter real-time constraints. Work placed in the DU is typically the part most sensitive to microbursts, queue depth, and timestamp integrity.
  • Closer to CU/edge ⇒ more scheduling flexibility. Work moved upward can tolerate larger buffers and longer control loops.
  • DU hardware architecture is therefore a “determinism machine”: it must keep worst-case delays bounded while sustaining very high aggregate bandwidth.
The 4 hard bottlenecks (each mapped to a physical, measurable point)
  • Throughput headroom — limited by port oversubscription, switch buffering, PCIe effective bandwidth, and accelerator feed efficiency. Evidence: sustained high queue depth, DMA backlogs, PCIe utilization at ceiling with rising latency.
  • Deterministic latency — dominated by queueing variance, memory copies (DMA/descriptor churn), clock-domain crossings, and backpressure loops. Evidence: P99/P999 latency spikes aligned with queue peaks or accelerator enqueue latency.
  • Timing/sync integrity — depends on where timestamps are taken, whether timebases stay consistent across domains, and how holdover behaves under stress. Evidence: offset steps, time-error drift in holdover, timestamp monotonicity anomalies.
  • Power/thermal reliability — burst loads from SerDes/accelerators can cause rail droop or thermal throttling, which shows up as retrains, drops, or resets. Evidence: retrain counters, PG/RESET events, WDT resets, correlated rail telemetry and throttling flags.
Verify
Three “done-right” KPIs (DU-scoped, field-verifiable)
  • Latency budget (DU contribution): define a DU-internal P99 target and track each stage’s contribution (queues, DMA, accelerator, CPU scheduling). Pass signal: bounded queue depth; no unexplained P99 cliffs at fixed load.
  • Loss / congestion indicators: monitor queue drops, buffer overflows, pause/backpressure events, and CRC/FCS error rates. Pass signal: drops only appear at clearly defined overload points, not at normal operating headroom.
  • Time error / holdover: record offset, steps, lock state, and holdover mode transitions with timestamps. Pass signal: no offset steps during link/failover events; holdover drift stays within defined policy limits.
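The three pass signals above can be expressed as explicit checks. A minimal sketch follows; the function names, metric shapes, and all thresholds (budget, overload point, holdover limit) are illustrative assumptions, not a standard API:

```python
# Hedged sketch: encode the three DU "done-right" KPI pass signals as checks.
# All names and thresholds are illustrative placeholders.

def latency_kpi_pass(stage_p99_us, p99_budget_us):
    """Pass if the summed per-stage P99 contributions stay inside the DU budget."""
    return sum(stage_p99_us.values()) <= p99_budget_us

def loss_kpi_pass(drops_by_load, overload_threshold):
    """Pass if drops appear only at loads at/above the defined overload point."""
    return all(d == 0 for load, d in drops_by_load.items()
               if load < overload_threshold)

def holdover_kpi_pass(time_error_us, holdover_limit_us):
    """Pass if time error stays within the holdover policy limit."""
    return max(abs(e) for e in time_error_us) <= holdover_limit_us
```

For example, `latency_kpi_pass({"queues": 20, "dma": 10, "accel": 30, "cpu": 15}, 100)` passes because the per-stage contributions sum below the 100 µs budget.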
Figure
Figure F1 — DU in the RAN context, with the in-scope “zoom” area highlighted
[Figure content: 5G RAN context (O-RU → 5G DU → CU/Edge); this page deep-dives only the DU internal data / timing / power architecture. O-RU / Radio Unit: RF and antenna side, out of scope. 5G DU (in scope): deterministic data path + time + power, i.e. Ethernet switch + retiming, PCIe fabric / DMA path, LDPC/Polar offload, PTP/SyncE, PMIC/rails. CU / Edge: aggregation and control. Links: fronthaul (RU↔DU), mid/backhaul (DU↔CU). Zoom-in chapters: dataflow → fabrics → timing → power.]

All figure text is at least 22 px (meeting the minimum font-size requirement); only module names are labeled, to avoid information overload.


H2-2 · DU traffic & dataflow: where packets go, where latency hides

Intent
Target questions this section must answer

Provide a DU-internal pipeline that makes latency sources explicit, and show how to turn “mystery jitter” into a measurable, tunable system.

  • Where do packets actually go inside the DU? (ingress → fabrics → memory → accelerators → egress)
  • Where does latency hide? (queues, copies, crossings, backpressure loops)
Engineering
A practical DU pipeline (6 stages, hardware-centric)

A DU should be described as a deterministic pipeline, not as a monolithic “compute box”. The most reliable mental model is to break the DU into stages that each create a specific kind of delay:

  • Ingress (parse/classify): headers parsed, traffic class selected; errors here look like drops or mis-QoS.
  • Ethernet switch/queues: buffering and scheduling; this is the #1 source of burst-amplified jitter.
  • DMA & memory path: descriptor rings, copies, mappings; hidden cost shows as ring fill and timeout drops.
  • FEC offload stage: enqueue/dequeue and batch behavior; backpressure emerges if feeding is inefficient.
  • CPU scheduling / exception path: control decisions and slow paths; small average cost but large tail risk.
  • Egress (shape/transmit): shaping, timestamp emission, uplink behavior; drops here indicate upstream congestion or mis-queues.
Latency sources (layered, with the missing “DU-critical” layer)
  • Queueing variance: same average load can produce wildly different P99 if queues are unmanaged.
  • Copies / DMA churn: extra memory touches inflate tail latency and create ring pressure.
  • Clock-domain crossings: mismatched timebases produce hidden waiting and timestamp inconsistency.
  • Accelerator enqueue/dequeue: batching trades throughput for determinism; the tail is the enemy.
  • Backpressure propagation (most diagnostic value): when an offload stage saturates, the “pressure wave” travels backward: accelerator full → DMA backlog → PCIe congestion → switch queue deepens → ingress drops and P99 spikes.
Design targets
Turn principles into controllable knobs (what to actually do)
  • Minimize black-box buffering: define queue policy (priority, watermarks) so bursts do not turn into uncontrolled tail latency.
  • Reduce unnecessary copies: keep the fast path as close to DMA/zero-copy intent as the platform allows (without protocol deep-dive).
  • Constrain crossings: keep the number of timebase boundaries small and explicitly documented (who timestamps where).
  • Shorten the backpressure loop: ensure “congestion state” becomes visible early (before deep queues form).
  • Instrument every stage: any stage without counters becomes a “latency black hole” that cannot be debugged in the field.
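The "minimize black-box buffering" and "shorten the backpressure loop" knobs can be sketched as a queue that exposes congestion at a watermark well below its drop limit, while instrumenting peak depth and drops. This is a toy model under stated assumptions (the class name, thresholds, and counters are illustrative, not from any real switch SDK):

```python
# Hedged sketch: a queue whose congestion state becomes visible at a watermark,
# long before the hard drop limit, and which counts peaks/drops for field debug.
# All names and thresholds are illustrative.

class WatermarkQueue:
    def __init__(self, watermark, capacity):
        self.watermark = watermark      # early-visibility threshold
        self.capacity = capacity        # hard drop limit
        self.depth = 0
        self.peak = 0                   # instrumented: peak depth, not average
        self.drops = 0
        self.congested = False

    def enqueue(self, n=1):
        for _ in range(n):
            if self.depth >= self.capacity:
                self.drops += 1         # drop only at the hard limit
            else:
                self.depth += 1
        self.peak = max(self.peak, self.depth)
        self.congested = self.depth >= self.watermark  # early congestion signal

    def dequeue(self, n=1):
        self.depth = max(0, self.depth - n)
        self.congested = self.depth >= self.watermark
```

The design point: `congested` flips at the watermark (depth 8 of 16 here), so upstream stages can react before the queue turns bursts into uncontrolled tail latency.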

The deeper point: this is not concept-telling. It grounds DU predictability in tunable knobs, namely queue policy, DMA behavior, enqueue batching, and timebase consistency.

Verify
Minimum observability set (enough to explain P99 spikes)

If the goal is to explain tail latency and intermittent drops, the following signals form a minimal closed loop:

  • Port queue depth + drops: capture peaks and duration (not just averages).
  • DMA ring fill + drop reason: distinguish ring-full, timeout, mapping failure, and descriptor starvation.
  • Accelerator enqueue latency + busy%: separate “busy but stable” from “busy and jittery”.
  • P99 end-to-end latency: always correlate with queue/ring/enqueue metrics in the same time window.
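The "correlate in the same time window" rule can be sketched directly: given one window of captured signals, report which of the three upstream metrics exceeded its threshold, in pipeline order. Signal names and thresholds are illustrative assumptions:

```python
# Hedged sketch: explain a P99 spike window from the minimal observability set.
# Signal names and thresholds are illustrative placeholders.

def explain_p99_spike(window, thresholds):
    """Return which minimal-observability signals exceeded their thresholds in
    the same time window as the P99 spike, in pipeline order."""
    order = ["queue_depth", "ring_fill", "enqueue_latency"]
    return [sig for sig in order if window.get(sig, 0) > thresholds[sig]]
```

If the returned list is empty while P99 spiked, the spike is unexplained by the minimal set, which itself is a useful signal (look at CPU scheduling or link events next).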
Fast triage order (when P99 jumps)
  • Step 1: check switch queue peaks and drops → confirms whether jitter is buffering-driven.
  • Step 2: check DMA ring pressure and drop reasons → confirms whether the memory path is the choke point.
  • Step 3: check accelerator enqueue latency and backpressure flags → confirms whether offload feeding is unstable.
  • Step 4: only then check CPU scheduling/exception counters → isolates slow-path or interrupt storms.
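The four-step triage order above can be encoded as a short decision function. The counter names are illustrative placeholders for whatever your platform exposes; the point is the ordering, which returns at the first stage that explains the jump:

```python
# Hedged sketch of the four-step P99 triage order. Counter names are
# illustrative; each step returns as soon as it can explain the spike.

def triage_p99(c):
    # Step 1: buffering-driven jitter?
    if c.get("switch_queue_peak", 0) > c.get("queue_watermark", 0) or c.get("queue_drops", 0):
        return "step1: buffering-driven jitter (switch queues)"
    # Step 2: memory path choke?
    if c.get("ring_full_drops", 0) or c.get("dma_timeouts", 0):
        return "step2: memory path choke (DMA rings)"
    # Step 3: unstable offload feeding?
    if (c.get("enqueue_latency_p99", 0) > c.get("enqueue_ceiling", float("inf"))
            or c.get("backpressure_flags", 0)):
        return "step3: unstable offload feeding"
    # Step 4: only then the slow path.
    return "step4: inspect CPU scheduling / exception counters"
```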
Figure
Figure F2 — Packet + compute pipeline (each stage tagged with one key metric)
[Figure content: DU internal pipeline ("where latency hides"), one key metric per stage, each to be correlated with P99 latency. Ingress: parse/classify (Rate, Drops) → Switch: queues/QoS (QueueDepth, Drop) → DMA: rings/copies (RingFill, DMA-Lat) → Offload: enqueue/FEC (EnqLat, Busy%) → CPU + Egress: schedule/Tx (SchedLat, TxDrop). Backpressure can propagate backward along the forward path. When P99 spikes: Queue → Ring → Enqueue → CPU.]

H2-3 · Throughput & determinism budgets: sizing Ethernet, PCIe, and accelerators

Intent
Target questions this section must answer

Convert “performance” into engineering budgets that prevent tail-latency cliffs: how to size Ethernet ports, PCIe lanes/fabric, and LDPC/Polar/FEC offload capacity with meaningful headroom.

  • How to size PCIe lanes for a DU? (effective bandwidth, hops, retrain risk, tail latency impact)
  • How much bandwidth for DU accelerators? (feed efficiency, enqueue latency, backpressure stability)
Engineering
Budget philosophy: “capacity × determinism” (not just peak throughput)

A DU budget is valid only if it protects P99 latency and drop-free operation under bursts. The practical method is to budget three coupled domains—Ethernet, PCIe, and offload—then apply a headroom policy so backpressure never becomes an uncontrolled latency amplifier.

1) Ethernet port budget (ports, rates, oversubscription)
  • Role-based grouping: front (ingress-sensitive), mid (internal distribution), back (uplink egress). Treat each group as a separate budget line.
  • Oversubscription is a policy, not a constant: higher oversub is acceptable only when queue policy is explicit and observable (peaks, duration, drops).
  • Microburst reality check: port sizing must be validated against queue peak and drop, not average utilization.
2) PCIe fabric budget (Gen, lanes, hops, effective bandwidth)
  • Effective bandwidth ≠ line rate: protocol overhead, payload sizes, read/write mix, and DMA transaction patterns change usable throughput.
  • Topology affects tail latency: extra switch hops and longer paths increase contention and enlarge the backpressure loop.
  • Link health is determinism: retrains, lane degradation, or error corrections show up as P99 events long before average throughput collapses.
3) Accelerator budget (LDPC/Polar/FEC throughput vs enqueue latency)
  • Two-axis sizing: (a) peak throughput headroom, and (b) enqueue latency ceiling (tail-latency protection).
  • Batch is a trade: larger batches boost throughput but increase jitter; smaller batches stabilize latency but stress capacity.
  • Feed efficiency matters: the real limiter is often how stably the accelerator is fed (DMA ring pressure, queue depth, backpressure flags).
Engineering criteria (the “headroom rule” that prevents cliffs)
  • Headroom policy: reserve 20–30% capacity margin across the coupled path (Ethernet ↔ PCIe ↔ accelerator) to absorb bursts, retries, and queue variance.
  • Cliff avoidance: at target load + headroom, P99 must remain smooth (no sudden jumps) and queue peaks must stay below critical watermarks.
  • Budget coupling: a shortfall in any one corner (ports, PCIe, offload) will propagate backward and present as tail jitter elsewhere.
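The headroom rule can be made concrete. The sketch below approximates PCIe effective bandwidth from 128b/130b encoding efficiency and TLP payload efficiency (16 GT/s per Gen4 lane and roughly 24 bytes of header/framing overhead per TLP are standard figures, but real efficiency varies with read/write mix and DMA pattern), then applies the 20-30% margin to the weakest corner of the coupled path. Function names are illustrative:

```python
# Hedged sketch: effective-bandwidth estimate + headroom rule for the coupled
# Ethernet <-> PCIe <-> accelerator path. Names are illustrative.

def pcie_effective_gbps(gts_per_lane, lanes, payload_bytes, tlp_overhead_bytes=24):
    """Approximate usable Gb/s: line rate x 128b/130b encoding x TLP payload
    efficiency. Real traffic (reads, small payloads) can be lower."""
    raw = gts_per_lane * lanes * (128 / 130)
    return raw * payload_bytes / (payload_bytes + tlp_overhead_bytes)

def path_budget_ok(eth_gbps, pcie_gbps, accel_gbps, offered_gbps, headroom=0.25):
    """The coupled path is sized by its weakest corner; require the offered
    load plus 20-30% margin to fit under that corner's capacity."""
    capacity = min(eth_gbps, pcie_gbps, accel_gbps)
    return offered_gbps * (1 + headroom) <= capacity
```

For example, a Gen4 x8 link with 256-byte payloads yields roughly 115 Gb/s usable, noticeably below the 128 Gb/s raw figure, which is exactly why "effective bandwidth ≠ line rate" belongs in the budget.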
Verify
Validation method: synthetic traffic + controlled offload load (P50/P99 closure)

A budget is proven only when load increases do not create a tail-latency cliff. The minimal proof loop correlates: (1) P50/P99 latency, (2) queue peak & drops, and (3) accelerator enqueue latency in the same time window.

  • Run synthetic bursts (not just steady load) to expose queue-watermark behavior and oversubscription limits.
  • Sweep offload load (enqueue pressure) to find the backpressure onset point and measure how far it propagates.
  • Pass signal: P99 rises gradually with load (no sudden step), and drops appear only at clearly defined overload thresholds.
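The pass signal ("P99 rises gradually, no sudden step") can be checked mechanically over a load sweep. The cliff rule below (any adjacent-point jump beyond a ratio) is an illustrative definition, not a standard:

```python
# Hedged sketch: detect a tail-latency cliff in a load sweep.
# "Cliff" is defined here as an adjacent-point P99 jump beyond cliff_ratio;
# the threshold is an illustrative policy choice.

def find_p99_cliff(load_sweep, cliff_ratio=2.0):
    """load_sweep: list of (load, p99) pairs sorted by load.
    Returns the load at which P99 first jumps by more than cliff_ratio,
    or None if P99 rises gradually (the pass signal)."""
    for (l0, p0), (l1, p1) in zip(load_sweep, load_sweep[1:]):
        if p0 > 0 and p1 / p0 > cliff_ratio:
            return l1
    return None
```

A sweep like `[(0.5, 10), (0.6, 12), (0.7, 15), (0.8, 60)]` fails at load 0.8, and that load point is where to pull queue peaks and enqueue latency from the same window.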
Figure
Figure F3 — Budget triangle: Ethernet ↔ PCIe ↔ Accelerator (throughput, latency, headroom)
[Figure content: throughput & determinism budget triangle; a shortfall at any corner amplifies tail latency elsewhere. Corners: Ethernet (ports · oversub · queues), PCIe fabric (Gen · lanes · hops), Offload (LDPC · Polar · FEC). Edge risks: burst queue peaks, DMA enqueue backpressure, tail latency. Center: headroom policy, reserve 20-30% to prevent P99 cliffs. Validate with bursts: P99 latency ↔ queue peaks ↔ enqueue latency in the same time window.]

H2-4 · Ethernet switching in DU: ports, queues, timestamps, and retiming

Intent
Target questions this section must answer

Define what makes DU switching different from generic switching: queue determinism, hardware timestamp integrity, and retimer placement for robust links under temperature and burst load.

  • Which switch capabilities matter for DU determinism? (queues/QoS/watermarks, observability)
  • Where should timestamps be taken? (PHY/MAC vs pipeline insertion points, sanity checks)
  • How to place retimers/redrivers? (margin, routing, connectors, thermal drift)
Engineering
DU port roles: front / mid / back (risk-driven design)
  • Front (ingress-sensitive): absorbs bursts; poor queue policy here immediately becomes P99 jitter and drops.
  • Mid (internal distribution): common oversubscription trap; requires explicit mapping of critical flows to controlled queues.
  • Back (uplink egress): congestion feedback affects the whole box; shaping and watermarks must be measurable.
Queues & QoS: turning “black-box buffering” into controlled determinism
  • Tail latency is the target: prioritize policies that bound peak queue depth and peak duration, not only average throughput.
  • Microburst handling: define watermarks and drop/mark behavior so bursts do not silently inflate tail latency.
  • Observability requirement: queue peaks, drops, and per-class counters must be available, or field debugging becomes guesswork.
Hardware timestamp path: integrity & sanity checks (DU-scoped)
  • Insertion point clarity: identify whether timestamps are taken at PHY/MAC or inside the switch pipeline.
  • Queue interaction awareness: confirm whether queue scheduling affects timestamp latency (critical for consistency).
  • Sanity checks: detect monotonicity breaks, offset steps, and inconsistent per-port behavior under congestion.
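The sanity checks named above (monotonicity breaks, offset steps) reduce to a scan over one port's timestamp stream. A minimal sketch, assuming nanosecond timestamps and an illustrative step limit:

```python
# Hedged sketch of per-port timestamp sanity: count monotonicity violations
# and step events in a hardware timestamp stream. step_limit_ns is an
# illustrative policy threshold.

def timestamp_sanity(ts_ns, step_limit_ns=1000):
    """Return (monotonic_violations, step_events) for one port's timestamps."""
    violations, steps = 0, 0
    for prev, cur in zip(ts_ns, ts_ns[1:]):
        delta = cur - prev
        if delta < 0:
            violations += 1            # time went backwards: integrity break
        elif delta > step_limit_ns:
            steps += 1                 # unexpected jump: offset-step candidate
    return violations, steps
```

Run the same scan per port under congestion; ports that disagree (one clean, one with steps) point at queue interaction with the timestamp path rather than the reference.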
Retimer/redriver placement: margin engineering (not “parts dumping”)
  • Channel margin: place retimers where loss/connector/backplane effects are largest (restore eye margin early enough).
  • Training stability: avoid layouts that trigger frequent retraining or downshift under temperature drift.
  • Thermal realism: keep retimers out of hotspots or ensure airflow; thermal drift can present as intermittent errors and P99 spikes.
  • Serviceability: choose placement that makes link issues diagnosable (clear counters and isolation points).
Verify
Counters that distinguish link issues vs congestion vs timestamp faults
  • FCS/CRC & alignment errors: indicate physical/link integrity problems rather than pure queue policy issues.
  • PCS lane counters: surface SerDes lane-specific degradation or training instability.
  • Queue depth & queue drops: prove whether determinism breaks are buffering-driven.
  • Timestamp sanity: detect monotonicity breaks, unexpected steps, or per-port inconsistencies under load.
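The counter families above separate three fault classes, and that separation can be written down as a classifier. Counter names are illustrative placeholders for your switch's actual telemetry:

```python
# Hedged sketch: classify a fault window using the counter families above.
# Counter names are illustrative placeholders.

def classify_fault(c):
    causes = []
    if c.get("fcs_errors", 0) or c.get("pcs_lane_errors", 0):
        causes.append("link-integrity")     # physical/SerDes problem
    if c.get("queue_drops", 0) or c.get("queue_peak", 0) > c.get("watermark", float("inf")):
        causes.append("congestion")         # buffering-driven determinism break
    if c.get("ts_monotonic_violations", 0) or c.get("ts_steps", 0):
        causes.append("timestamp-fault")    # timebase integrity
    return causes or ["no-fault-signature"]
```

Note the classes are not exclusive: congestion plus timestamp faults in the same window is exactly the "queue scheduling affects timestamp latency" case flagged earlier in this section.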
Figure
Figure F4 — DU Ethernet block: PHY/retimer → switch ASIC (queues + timestamp) → uplinks
[Figure content: DU Ethernet block, determinism + timestamps + retiming; keep queues controllable, timestamps consistent, and links stable under temperature. PHY ports (front/mid/back roles) → retimers (margin · drift · training) → switch ASIC (queues/QoS watermarks, timestamp sanity checks) → uplinks (egress shaping, congestion visibility). Counters & telemetry: FCS · alignment · PCS lane · queue drop · timestamp. If P99 spikes: check queues first, then FCS/PCS.]

H2-5 · PCIe topology in DU: switching, DMA, coherency (what matters, what doesn’t)

Intent
Target questions this section must answer

Explain DU-relevant PCIe design decisions and troubleshooting signals: how switch topology, lane allocation, and DMA behavior shape throughput and P99 determinism—without drifting into protocol textbooks.

  • PCIe switch in DU design: where contention forms and why hop count matters
  • DMA bottleneck troubleshooting: how to distinguish mapping/ring pressure from raw bandwidth limits
  • Link health vs performance: how retrains and correctable errors become tail-latency events
Engineering
Typical DU PCIe fabric (common only): Root Complex ↔ Switch ↔ endpoints

A DU commonly looks like: CPU/SoC (Root Complex) upstream into a PCIe switch, then downstream to accelerators, network interfaces, and sometimes NVMe. The key DU lesson is that PCIe is not only “bandwidth”; it is also a determinism path—contention and link events often appear first as P99 spikes.

What matters (DU-critical)
  • Lane allocation & topology affinity: keep the hottest data path short (fewer shared uplinks, fewer switch hops), and avoid placing producer/consumer on “distant” paths that amplify contention.
  • NUMA / locality effects (practical): when traffic crosses domains, memory access variability increases and tail latency grows. The goal is not perfection, but keeping critical flows away from worst-path placement.
  • DMA behavior (throughput + jitter): ring pressure, transaction fragmentation, and bursty completion patterns can create “looks fine on average” performance with unstable tails.
  • Link health as determinism: retrains, speed/width changes, or frequent correctable errors create short, sharp stalls that become system-wide backpressure under load.
What doesn’t (avoid over-optimizing)
  • Chasing absolute shortest paths for every endpoint: focus on keeping critical flows on stable, low-contention paths.
  • Protocol-layer deep dives: DU engineering value comes from topology + counters + correlation, not packet-format trivia.
  • Micro-tuning without observability: changes that cannot be verified with link/ring/latency signals rarely hold in the field.
DU-focused pitfalls (quick diagnosis mapping)
  • Cross-hop contention: utilization below 100% but P99 rises sharply → shared uplink congested or hop chain too long.
  • DMA ring pressure: bandwidth looks adequate but enqueue latency grows → descriptor/ring churn or mapping behavior dominates.
  • Thermal/link drift: intermittent errors + retrains + step-like P99 spikes → margin/training stability issue, not “need more lanes”.
Verify
PCIe observability set (enough to separate capacity vs jitter vs link events)
  • LTSSM state & retrain count: flags link instability that often manifests as tail-latency events.
  • Correctable error counters: early warning for margin/temperature drift before hard failures.
  • Bandwidth utilization: confirms whether a path is truly capacity-limited or “jitter-limited”.
  • DMA latency: the most direct signal for mapping/ring behavior turning into tail latency.
  • Speed/width changes: captures downshifts that create sudden contention.
Triage order (when P99 jumps)
  • 1) Check utilization vs P99 (capacity-limited or jitter-limited?)
  • 2) Check DMA latency + ring pressure indicators (mapping/ring churn?)
  • 3) Check LTSSM/retrain + correctable errors (link event?)
  • 4) Check speed/width changes (downshift causing contention?)
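This triage order maps cleanly onto a decision function over the observability set. Field names and the 90% utilization cutoff are illustrative assumptions:

```python
# Hedged sketch of the PCIe triage order: separate capacity-limited,
# jitter-limited, link-event, and downshift cases. Field names and
# thresholds are illustrative.

def pcie_triage(s):
    # 1) Capacity-limited or jitter-limited?
    if s.get("utilization", 0) > 0.9:
        return "capacity-limited: path is at bandwidth ceiling"
    # 2) Mapping/ring churn turning into tail latency?
    if s.get("dma_latency_p99_us", 0) > s.get("dma_ceiling_us", float("inf")):
        return "jitter-limited: DMA mapping/ring churn dominates"
    # 3) Link event?
    if s.get("retrains", 0) or s.get("correctable_errors", 0) > s.get("ce_baseline", 0):
        return "link event: margin/training instability, not lane count"
    # 4) Downshift causing sudden contention?
    if s.get("width_changes", 0) or s.get("speed_changes", 0):
        return "downshift: reduced speed/width created contention"
    return "no PCIe signature: look upstream/downstream"
```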
Figure
Figure F5 — PCIe fabric map: Root Complex → Switch → Endpoints (Gen/lane labels + observability icons)
[Figure content: DU PCIe fabric map (topology + what to observe); label links with Gen/lanes and watch link events + DMA latency. Root Complex (CPU/SoC, NUMA/locality) → PCIe switch (upstream port, uplink contention; hop cost = tail risk) → endpoints: accelerator LDPC/Polar (Gen5 x16), NIC ingress/egress (Gen5 x8), NVMe optional logging (Gen4 x4); switch uplink Gen5 x8. Observe: LTSSM state, retrain count, DMA latency, CE counters, speed/width changes. Tail-latency events often correlate with link retrains or CE spikes.]

H2-6 · FEC / LDPC / Polar accelerators: where they sit and how they are fed

Intent
Target questions this section must answer

Describe accelerators as a DU pipeline element (not algorithm theory): how data enters (DMA ring/queue), how batch size changes latency, and how backpressure propagates into PCIe, switch queues, and scheduler behavior.

  • Offload vs CPU in DU: what is placed where (queueing + determinism view)
  • Latency vs throughput tuning: batch and queue policy as controllable knobs
  • Backpressure propagation: how “accelerator full” becomes system-wide jitter
Engineering
Placement model: accelerators as a queueing stage in the DU pipeline

The DU-relevant view is: ingress traffic is staged into DMA rings, then into an accelerator queue, processed, and returned to a scheduler. The system fails determinism when any queue becomes a hidden buffer that expands tail latency without visibility.

How input enters (what to design for)
  • DMA ring is the first gate: ring fill and completion jitter directly translate into accelerator enqueue latency.
  • Accelerator queue is the second gate: queue depth and batching policy determine throughput efficiency and tail behavior.
  • Feed stability beats peak rating: a high peak throughput accelerator still produces poor P99 if feeding is bursty or ring-limited.
Batch size: the key knob (throughput ↑ vs jitter ↑)
  • Larger batch: improves throughput, but increases waiting time variance and can amplify P99 under bursts.
  • Smaller batch: stabilizes latency, but stresses capacity margin and can trigger backpressure earlier.
  • Engineering criterion: set an enqueue-latency ceiling and tune batch/queue to keep P99 below it at target load + headroom.
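The engineering criterion ("largest batch whose P99 stays under the enqueue-latency ceiling") can be sketched as a sweep. The latency model below (batch-fill wait plus per-item service time) is a deliberately crude illustrative assumption, not a vendor model:

```python
# Hedged sketch of the batch-size criterion: among candidate batch sizes,
# pick the largest whose modeled tail latency fits under the ceiling.
# The latency model and parameter names are illustrative.

def pick_batch(batches, svc_us_per_item, arrival_us_per_item, ceiling_us):
    """Crude tail model: worst-case wait to fill the batch plus service time.
    Larger batches win throughput, so keep the largest one that still fits."""
    best = None
    for b in sorted(batches):
        wait_us = b * arrival_us_per_item      # worst-case batch-fill wait
        p99_us = wait_us + b * svc_us_per_item
        if p99_us <= ceiling_us:
            best = b
    return best                                # None => no batch fits: resize
```

The sweep makes the trade explicit: with the example parameters in the test, batch 16 blows the 60 µs ceiling while batch 8 fits, so throughput headroom, not latency, becomes the next thing to verify.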
Backpressure propagation: where congestion spreads
  • Queue full → enqueue stalls
  • Enqueue stalls → DMA backlogs / ring pressure
  • DMA backlogs → PCIe contention (shared uplink becomes hot)
  • PCIe contention → switch queues deepen and ingress drops appear
  • System reaction → scheduler throttles, tail latency spikes, and retry/recirculation can worsen bursts
Selection criteria (no part dumping; DU-scoped)
  • Throughput headroom: sustained capacity with 20–30% margin at target workload.
  • End-to-end latency: enforce a tail-latency ceiling (P99) for enqueue + compute + return.
  • Concurrency: number of queues/contexts supported without cross-interference.
  • Memory bandwidth demand: ensure the DMA/memory path can feed without ring pressure cliffs.
  • Power & thermal behavior: throttling events must be observable and correlated with tail spikes.
Verify
Five metrics that close the feeding-loop proof
  • Enqueue latency: measures feeding stability and queue pressure.
  • Dequeue latency: confirms return-path stability and scheduler coupling.
  • Busy%: shows compute saturation (separate “busy but stable” from “busy and jittery”).
  • Backpressure/drop count: proves whether congestion leaks into upstream stages.
  • PCIe read/write ratio + thermal throttles: captures feeding shape and heat-triggered tail events.
Interpretation patterns (fast classification)
  • Busy% high, backpressure low → capacity tight but system stable (headroom may be small).
  • Busy% moderate, enqueue latency high → feeding/topology/ring behavior dominates, not raw compute.
  • Thermal events align with backpressure → throttling triggers the backpressure wave and P99 spikes.
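The three interpretation patterns can be folded into one classification function for fast triage. Thresholds (80% busy, 50 µs ceiling) are illustrative:

```python
# Hedged sketch of the accelerator interpretation patterns.
# Thresholds are illustrative policy choices.

def classify_accel(busy_pct, enq_lat_us, bp_count, thermal_events,
                   enq_ceiling_us=50):
    if thermal_events and bp_count:
        return "thermal-driven backpressure (expect P99 spikes)"
    if busy_pct < 80 and enq_lat_us > enq_ceiling_us:
        return "feeding-limited: ring/topology dominates, not compute"
    if busy_pct >= 80 and bp_count == 0:
        return "capacity tight but stable (small headroom)"
    return "inconclusive: widen the correlation window"
```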
Figure
Figure F6 — Accelerator feeding loop: NIC/switch → DMA → queue → compute → scheduler (backpressure loop)
[Figure content: accelerator feeding loop (determinism view); backpressure forms a loop: queue → DMA → PCIe → switch. NIC/Switch (ingress traffic) → DMA ring (ring fill) → queue (batch policy) → compute (LDPC/Polar) → scheduler (pacing · throttling · recovery) → results. Queue-full backpressure flows backward. Aim: stable P99, no cliffs. Metrics: EnqLat, DeqLat, Busy%, BP count, temp/throttle.]

H2-7 · PTP/SyncE clock tree in DU: timestamp integrity, holdover, alarms

Intent
Target questions this section must answer

Build a DU-scoped timing view: how external references (PTP/SyncE) become a stable internal timebase, how timestamps remain consistent across consumers, and how holdover/alarms prevent “link looks fine but service jitters”.

  • PTP boundary clock in DU: where the DU terminates and distributes time
  • SyncE + PTP design: how frequency lock supports consistent timestamps
  • Holdover strategy: what happens on GM loss and how to alarm/record it
Engineering
DU timing system (3 layers): reference → cleanup/distribution → timestamp consumers

A DU time system is only “correct” when all timestamp consumers share a consistent timebase and the system can detect and report integrity breaks. The DU-relevant layers are:

Layer 1 — External reference inputs (PTP / SyncE)
  • PTP reference: provides time/phase alignment, but integrity depends on how timestamps are produced/consumed internally.
  • SyncE reference: provides frequency lock; frequency stability reduces drift and makes timestamp behavior predictable.
  • Risk focus: reference loss events must be detectable and time-correlated with service quality metrics.
Layer 2 — Jitter cleaner / PLL and clock distribution (clock tree)
  • Cleanup: jitter/phase noise is conditioned into a stable internal timebase for distribution.
  • Holdover mode: when reference is lost, the PLL maintains continuity with controlled drift—until the holdover budget is exceeded.
  • Distribution: fanout ensures switch/NIC/SoC timebases remain aligned under load and temperature.
Layer 3 — Timestamp consumers (where integrity can break)
  • Switch timestamp: pipeline/queue conditions must not create inconsistent timestamp latency across ports/classes.
  • NIC timestamp: hardware timestamping must remain consistent under congestion and link events.
  • SoC timebase: system time and scheduling must stay aligned with the distributed timebase.
  • Alignment requirement: consumers across different clock domains must not “silently diverge”.
Integrity risks (DU-specific symptoms)
  • Inconsistent timestamp paths: offsets appear acceptable on average, but jitter/steps show up under load.
  • Clock-domain alignment failure: internal consumers disagree, creating intermittent service jitter without obvious link faults.
  • Holdover drift: after GM loss, service degrades gradually; the network may look “up” while timing quality is out-of-budget.
Verify
Minimum records & alarms (turn timing into a field-debuggable system)
  • PTP offset: trend + spikes + steps (not just a mean value).
  • GM loss / GM changes: event log with timestamps and duration.
  • SyncE lock status: lock/unlock transitions and stability counters.
  • PLL state: lock / holdover entry/exit and holdover duration.
  • Timestamp sanity: monotonic violations and step events across consumers.
Fast triage (when service jitters but links look normal)
  • 1) Check offset spikes/steps and timestamp sanity events in the same time window.
  • 2) Check GM loss / SyncE unlock / PLL holdover state transitions.
  • 3) Compare consumers (switch vs NIC vs SoC) for consistency; divergence is the integrity signal.
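Step 3 (comparing consumers) is just a divergence check over the same event as seen by each timebase. A minimal sketch, assuming per-consumer timestamps of one event and an illustrative divergence limit:

```python
# Hedged sketch of the consumer-consistency check: compare switch / NIC / SoC
# timestamps of the same event; divergence beyond a policy limit is the
# integrity signal. limit_ns is illustrative.

def consumers_consistent(samples_ns, limit_ns=500):
    """samples_ns: dict like {"switch": t0, "nic": t1, "soc": t2} for one event.
    Returns (consistent, max_divergence_ns)."""
    values = list(samples_ns.values())
    divergence = max(values) - min(values)
    return divergence <= limit_ns, divergence
```

Divergence that grows under load while each consumer's own offset looks fine is exactly the "silent divergence" failure named in Layer 3 above.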
Figure
Figure F7 — DU timing tree: Ref in → jitter cleaner/PLL → fanout → timestamp consumers (+ alarm points)
[Figure content: DU timing tree (integrity + holdover); ref inputs → cleanup/holdover → fanout → switch/NIC/SoC timestamps. PTP in (time/phase) and SyncE in (frequency lock), with GM-loss/unlock alarm points → jitter cleaner / PLL (lock vs holdover; record entry/exit + duration) → fanout clock tree → timestamp consumers: switch timestamp, NIC timestamp, SoC timebase (sanity: monotonic/step). Record: offset · GM loss · SyncE lock · PLL holdover · sanity events. Alarm: offset steps · frequent GM changes · unlock flaps.]

H2-8 · Power tree & PMIC sequencing: rails, transient load, and serviceability

Intent
Target questions this section must answer

Explain why DUs reboot, retrain links, or show silent errors under burst loads: how the power tree is layered, how sequencing/PG/RESET domains interact, and which telemetry makes issues reproducible and serviceable in the field.

  • DU PMIC sequencing best practice: PG/RESET dependencies that prevent intermittent boot failures
  • Why DU reboots under burst load: transient droop → retrain/errors → WDT/reset chains
  • Serviceability: the minimal telemetry/logging to avoid “it rebooted” mysteries
Engineering
Power tree layering (platform-typical, not voltage-specific)

A DU power design must be read as a dependency graph, not a list of rails. The practical layers are: input → intermediate bus → domain rails (SoC, switch, PCIe, accelerators, SerDes/PHY). Determinism failures under load typically start as a transient event in a small subset of these domains.

The three most common failure mechanisms
  • 1) Sequencing / PG timing vs reset domains
    If PG signals and reset dependencies do not match real domain readiness, symptoms appear as intermittent boot failures, partial bring-up, or “links never come up” events that are hard to reproduce.
  • 2) Burst load droop (accelerators / SerDes)
    A short droop may not trigger a full reset but can cause link retrain, silent data corruption, or sudden P99 spikes. This is why average power is a poor predictor of stability.
  • 3) Telemetry gaps (no field proof)
    Without PMBus/current/temperature + event logs, the system can only report “reboot happened”. Serviceability requires that the DU can explain why it rebooted or throttled.
Telemetry & protection (minimum set for a serviceable DU)
  • PMBus / VRM telemetry: voltage, current, temperature, and fault flags over time windows that include bursts.
  • PG / RESET logs: who asserted first, who lagged, and whether glitches occurred.
  • WDT reset cause: power-good loss vs watchdog vs thermal vs firmware-triggered recovery.
  • Thermal throttling events: correlate with throughput drops and tail-latency spikes.
Serviceability (DU chassis level; keep it actionable)
  • Field reproducibility: store short rolling windows of power + PG + reset causes around fault events.
  • Replaceability: identify which domain caused the trip (SoC vs switch vs accelerator) to reduce MTTR.
  • Operational clarity: alarms must point to a domain and a cause category (sequencing, droop, thermal).
Verify
Field-reproducible evidence chain (turn “reboot” into a root cause)
  • Power profile: state-based load steps and burst events (capture peak + duration).
  • PG log: ordering, delays, and glitches across dependent domains.
  • WDT reset cause: store last reset reason and preceding warnings.
  • VRM telemetry: droop events, current spikes, fault flags.
  • Thermal throttling: time-correlate with retrains, errors, and throughput dips.
Correlation rule (what proves causality)
  • Droop → retrain/errors in the same time window indicates a transient-driven determinism failure.
  • PG glitch → partial bring-up indicates sequencing/reset-domain mismatch.
  • Thermal event → throughput/P99 change indicates power/thermal governance driving service behavior.
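The correlation rule is a same-window test over event pairs. The sketch below encodes the first two causal patterns; event names and the window length are illustrative placeholders for whatever the PMIC/BMC logs actually emit:

```python
# Hedged sketch of the causality rule: an event pair proves a mechanism only
# if both events land in the same correlation window, cause first.
# Event names and window_ms are illustrative.

def same_window(t_cause_ms, t_effect_ms, window_ms=100):
    return 0 <= t_effect_ms - t_cause_ms <= window_ms

def classify_power_event(events, window_ms=100):
    """events: dict of event name -> timestamp in ms (absent if not seen)."""
    droop, retrain = events.get("rail_droop"), events.get("link_retrain")
    pg, partial = events.get("pg_glitch"), events.get("partial_bringup")
    if droop is not None and retrain is not None and same_window(droop, retrain, window_ms):
        return "transient-driven determinism failure (droop -> retrain)"
    if pg is not None and partial is not None and same_window(pg, partial, window_ms):
        return "sequencing/reset-domain mismatch (PG glitch -> partial bring-up)"
    return "no proven causal pair in window"
```

The "no proven causal pair" outcome matters as much as the positives: it stops a plausible-sounding story ("it was a droop") from being filed as root cause without same-window evidence.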
Figure
Figure F8 — Power + reset tree: Input → VRMs/PMIC → domains → PG/RESET dependencies (+ burst risk path)
[Figure content: DU power + reset tree (burst stability); input → VRMs/PMIC sequencer → domains → PG/RESET dependencies. Input power front-end → VRMs / PMIC sequencer (sequencing + telemetry) → domains (SoC, switch, PCIe fabric, accelerators) with a PG/RESET dependency graph (not a list). Burst risk path: droop → retrain, droop → errors, droop → WDT. Capture: power profile · PG log · WDT cause · PMBus telemetry · thermal events; prove droop aligns with retrain/errors/throttle in the same window.]

H2-9 · Control plane & telemetry: what to log so field issues are diagnosable

Intent
Target questions this section must answer

Make DU field issues diagnosable: define the minimum evidence chain to capture across links, queues, accelerators, timing, and power/thermal so intermittent drops and jitter can be explained on a single time axis.

  • DU telemetry checklist: what to log (minimum set) and why each item is load-bearing
  • Debug intermittent drops: how to correlate a symptom window with causal signals
  • OOB/BMC view: remote sensors, fans, event logs, and diagnostic bundle capture (DU device management only)
Engineering
The DU evidence chain (5 categories, one correlation window)

A DU is diagnosable only when a symptom (drops, retries, P99 spikes, retrains) can be explained by cause signals captured in the same time window. Each category should provide at least: (1) counters, (2) maxima/percentiles, and (3) events.

1) Link health (Ethernet / PCIe)
  • Record: error counters (CRC/FCS/PCS lane, correctable errors), retrain counts, utilization.
  • Trigger: sudden retrain bursts, CE spikes, error rate step-changes.
  • Correlate: queue drops, accelerator enqueue latency, timing sanity events.
2) Queues & congestion (switch / NIC / fabric)
  • Record: queue depth (high-water), queue drops, congestion indicators.
  • Trigger: depth stays high, drops appear, depth oscillates with bursts.
  • Correlate: P99 latency rise, backpressure counts, link errors/retrains.
3) Accelerators (busy / latency / backpressure)
  • Record: busy%, enqueue/dequeue latency, backpressure/drop counts, thermal throttle events.
  • Trigger: enqueue latency rises while busy% is moderate (feeding/fabric issue), BP bursts (queue full).
  • Correlate: PCIe utilization/errors, queue depth, system P99 spikes.
4) Time integrity (offset / holdover / steps)
  • Record: PTP offset trend, GM loss events, SyncE lock transitions, PLL holdover state, timestamp sanity (monotonic/step).
  • Trigger: offset steps, holdover entry, frequent GM changes, sanity violations.
  • Correlate: service jitter windows, drops/retries, queue congestion spikes.
5) Power & thermal (droop / temp / throttle)
  • Record: VRM/PMBus telemetry, droop/fault flags, temperatures, throttling events, WDT reset cause.
  • Trigger: droop/throttle events, WDT cause changes, thermal excursions.
  • Correlate: retrains/errors/offset steps and throughput dips in the same window.
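The "counters + maxima/percentiles + events" contract for each category could be captured in a record like the following. Field names are assumptions for illustration, not a standard schema:

```python
# Illustrative shape of one evidence-category sample window.
from dataclasses import dataclass, field

@dataclass
class EvidenceSample:
    category: str                # "link" | "queue" | "accel" | "time" | "power"
    t_start: float               # window start (epoch seconds)
    t_end: float                 # window end
    counters: dict = field(default_factory=dict)     # e.g. {"crc_errors": 3}
    percentiles: dict = field(default_factory=dict)  # e.g. {"p99_us": 180.0}
    events: list = field(default_factory=list)       # e.g. ["retrain_burst"]

s = EvidenceSample("link", 100.0, 101.0,
                   counters={"crc_errors": 3, "retrains": 1},
                   percentiles={"util_max": 0.92},
                   events=["retrain_burst"])
print(s.counters["retrains"])  # → 1
```

Keeping all five categories in the same window shape is what makes the later "same-window correlation" step mechanical rather than forensic.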
OOB/BMC
Device-management viewpoint (only what helps diagnosis)

OOB/BMC should make the DU diagnosable without relying on in-band services: sensors, fans, event logs, and remote diagnostic bundle capture.

  • Sensors: temps, currents, voltages, fan RPM, and board-level alarms.
  • Event logs: reset cause, throttle entry/exit, link retrains, time holdover events.
  • Remote bundle: trigger-based capture of last/next time windows (ring buffer style).
Verify
“5-minute diagnostic bundle” (minimum configuration)

The bundle is a short, trigger-driven capture that answers: what happened, where, and what changed first across the five categories.

Sampling cadence (two lanes)
  • Fast lane: queue depth, accel enqueue latency, offset/sanity (captures spikes/steps).
  • Slow lane: temps, power trends, utilization (captures drift and sustained stress).
Triggers (any one fires the capture)
  • Queue drops appear or high-water persists.
  • Retrain/CE spikes on Ethernet/PCIe.
  • Offset step, holdover entry, sanity violation.
  • WDT reset cause update, droop/throttle event.
  • P99 latency exceeds a baseline multiple (relative threshold).
Bundle contents (single time axis)
  • Event list (ordered by time): retrain, drop, step, holdover, droop, throttle, reset cause.
  • Top counters + maxima/percentiles for each category in the same window.
  • One “first-change” hint: which signal changed before the symptom peak.
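The "first-change" hint can be approximated with a simple baseline-deviation scan over the bundled signals. Signal names and the relative threshold here are illustrative:

```python
# Sketch: find which signal changed first before the symptom peak.
# Each signal is a list of (t, value); "change" = first point deviating
# from the initial baseline by more than `rel` (relative).

def first_change(signals, rel=0.5):
    """signals: {name: [(t, value), ...]} → (t, name) of earliest change,
    or None if no signal deviates by more than `rel` from its baseline."""
    hits = []
    for name, series in signals.items():
        base = series[0][1]
        for t, v in series:
            if base and abs(v - base) / abs(base) > rel:
                hits.append((t, name))
                break
    return min(hits) if hits else None

signals = {
    "queue_depth":    [(0, 10), (1, 11), (2, 40), (3, 90)],   # jumps at t=2
    "p99_latency_us": [(0, 50), (1, 52), (2, 55), (3, 200)],  # jumps at t=3
}
print(first_change(signals))  # → (2, 'queue_depth')
```

Here queue depth moved before the P99 spike, so the queue is the stronger root-cause hint.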
Figure
Figure F9 — Observability map: attach Link/Queue/Accel/Time/Power probes to the DU pipeline blocks
(Diagram: probes attached along the DU pipeline — ingress ports → switch queues → DMA rings → accelerator queues → scheduler/CPU → egress uplinks — one probe per evidence category: Link (retrain), Queue (drop), Accel (backpressure), Time (step/offset), Power (droop/throttle). Goal: same-window correlation of symptom and first-change signal; bundle events, counters, and maxima/percentiles for all five categories.)



H2-10 · Validation & production checklist: proving the DU is stable

Intent
Target questions this section must answer

Define what “done” means: a DU is stable only if link, system determinism, timing integrity, and power/thermal behavior all pass clear criteria in lab, production, and field workflows.

  • DU validation checklist: what to test and what to measure
  • Test retimers/timestamps/accelerators: prove stability under load + temperature
  • Production focus: fast screening for marginality (retimer margin, thermal, droop)
Checklist
Layer 1 — Link stability (Ethernet/PCIe)
  • Full-load ports: sustained utilization without error escalation.
  • Retrain behavior: no persistent retrains; retrain bursts must be absent under nominal conditions.
  • Temperature sweep: error counters must not trend upward with temperature.
  • Margin awareness: if link stability depends on a narrow operating window, it is not production-ready.
Pass/Fail (relative wording)
  • PASS: no sustained retrain loops; error counters remain flat and low across load/temperature.
  • FAIL: persistent retrains, step-like error growth, or temperature-correlated degradation.
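The temperature-sweep criterion ("error counters must not trend upward with temperature") can be screened with a plain correlation check. The sample data and the 0.8 threshold are illustrative, not normative:

```python
# Sketch: flag temperature-correlated error growth during a temp sweep.
# Inputs are paired samples (temp_C, error_count_delta).

def temp_correlated(samples, min_r=0.8):
    """Pearson correlation between temperature and error delta;
    returns (flagged, r)."""
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [e for _, e in samples]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in samples)
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    r = cov / (vx * vy) if vx and vy else 0.0
    return r >= min_r, r

flagged, r = temp_correlated([(40, 0), (55, 1), (70, 6), (85, 20)])
print(flagged)  # → True (errors climb with temperature → FAIL this layer)
```

A unit that passes at one temperature but correlates like this across the sweep is exactly the "narrow operating window" case called out above.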
Checklist
Layer 2 — System determinism (throughput + queues + accelerators)
  • Throughput + load: drive realistic accelerator load at target throughput.
  • P99 latency: tail latency must remain within a baseline multiple.
  • Queue behavior: no sustained high-water; no drops inside the intended envelope.
  • Backpressure containment: accelerator BP must not expand into system-wide congestion.
Pass/Fail (relative wording)
  • PASS: P99 bounded; drops absent; BP does not correlate with broad congestion.
  • FAIL: P99 cliffs, frequent drops, or BP waves propagating into queues/links.
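The "P99 within a baseline multiple" rule from Layer 2 might be scripted as follows (nearest-rank percentile; the 3× multiple and the latency numbers are examples, not requirements):

```python
# Sketch: relative P99 pass/fail against a baseline multiple.

def p99(samples):
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

def p99_within(samples, baseline_p99, max_multiple=3.0):
    return p99(samples) <= max_multiple * baseline_p99

latencies_us = [50] * 98 + [70, 120]   # healthy tail
spiky = [50] * 97 + [900, 900, 900]    # tail cliff under the same average-ish load

print(p99_within(latencies_us, baseline_p99=50))  # → True  (PASS)
print(p99_within(spiky, baseline_p99=50))         # → False (FAIL: P99 cliff)
```

Both series have similar averages; only the percentile check catches the cliff, which is why the checklist is phrased in relative P99 terms.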
Checklist
Layer 3 — Time integrity (PTP/SyncE, holdover, steps)
  • Reference loss drills: induce loss; verify holdover entry/exit is logged.
  • Holdover budget: offset remains within allowable bounds during holdover.
  • Step / sanity: no timestamp steps or monotonic breaks across consumers.
  • Consumer alignment: switch/NIC/SoC timebases show no persistent divergence.
Pass/Fail (relative wording)
  • PASS: controlled holdover + logs; sanity violations absent; alignment holds.
  • FAIL: offset steps, repeated holdover flaps, or consumer divergence without alarms.
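The step/sanity part of Layer 3 can be sketched as a scan for monotonic breaks and oversized jumps. The step budget below is an assumed value for illustration:

```python
# Sketch: timestamp sanity scan — monotonic breaks and step jumps.

def sanity_violations(ts_ns, max_step_ns=1_000_000):
    """Return indices where timestamps go backwards or jump by more
    than `max_step_ns` relative to the previous sample."""
    bad = []
    for i in range(1, len(ts_ns)):
        delta = ts_ns[i] - ts_ns[i - 1]
        if delta < 0 or delta > max_step_ns:
            bad.append(i)
    return bad

# 1 ms nominal stride with one backwards step at index 3.
ts = [0, 1_000_000, 2_000_000, 1_500_000, 2_500_000]
print(sanity_violations(ts))  # → [3]
```

Running the same scan on every consumer (switch, NIC, SoC) and diffing the results is one way to expose the "consumer divergence" failure mode.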
Production
Fast screening for marginality (catch edge degradation early)
  • Retimer margin: temperature + load stress; reject units that operate close to retrain/error onset.
  • Thermal: airflow/fans; reject units with early throttling or error coupling.
  • Power droop: burst load steps; reject droop-correlated retrains/errors.
  • Evidence reuse: keep the same key counters used in field diagnosis.
Figure
Figure F10 — Test matrix: domains × environments (Lab / Production / Field)
(Matrix — rows: environments; columns: Link · System · Time · Power · Thermal)
  • Lab: BER · P99 · holdover · droop · temp
  • Production: retrain · load · sanity · burst · fan
  • Field: counters · bundle · alarms · telemetry · throttle
  • Legend: ✓ = must, ● = recommended, — = not required


H2-11 · Failure modes & troubleshooting: symptom → evidence → root cause

How to use

This section is a DU-only troubleshooting playbook. Each failure mode is written as a closed loop: symptom → evidence (first checks) → root-cause buckets → actions (A/B experiments + fixes). The same-window correlation rule applies: the “first-change signal” in the same time window is the strongest hint.

Evidence categories (keep one time axis)

Link · Queue · PCIe/DMA · Accelerator · Time · Power/Thermal · WDT/reset cause · BMC/OOB event log

Do not start with “random tuning.” Start with first checks that separate drop vs error, queue vs fabric, and time vs transport.

Related components (by class, not SKUs): Ethernet PHY/retimer, switch ASIC, PCIe switch/retimer, NIC/DMA engine, FEC accelerator, jitter-cleaner PLL, PMBus telemetry, BMC sensors/log.

Symptom #1

Throughput stalls, but CPU utilization stays low

  • Observed: throughput plateaus early; P50 may look acceptable; P99/queueing can spike.
  • Observed: CPU appears idle; raising CPU frequency does not restore throughput.
  • Observed: issue can be workload-dependent (burst patterns, mixed flows, accelerator offload on/off).

Evidence (first checks, in this order)

  • Queue high-water & drop: ingress/egress queue depth maxima and any drops.
  • PCIe bandwidth utilization: link utilization vs expected; check read/write balance if available.
  • DMA ring health: descriptor starvation, enqueue/dequeue latency spikes, any “no-descriptor” events.
  • Accelerator feed vs busy: enqueue latency rising while busy% is moderate implies feeding/fabric bottleneck.

Root-cause buckets (what the evidence typically means)

  • Hidden queueing: oversubscription or queue policy creates sustained high-water without obvious CPU load.
  • Fabric/NUMA mismatch: traffic or DMA paths cross extra hops, creating jitter and effective bandwidth loss.
  • DMA pacing mismatch: ring depth, batching, or completion pacing creates periodic stalls despite idle CPU.

Actions (A/B experiments → fix direction)

  • Queue isolation A/B: temporarily isolate hot flows into dedicated queues; throughput gain implies queue policy issue.
  • Topology A/B: reduce hop count (alternate slot/port) for the hot endpoint; improvement implies fabric/topology issue.
  • DMA ring A/B: adjust ring depth / batching one knob at a time; reduced enqueue spikes implies DMA pacing issue.
  • Containment fix: keep headroom and avoid sustained high-water by shaping bursts or enforcing deterministic queue mapping.
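The one-knob A/B pattern used in these actions can be sketched as a small harness. `fake_run` is a synthetic stand-in (you would replace it with a real traffic/measurement run), and the 20% gain threshold is an assumed decision rule:

```python
# Sketch of a one-knob A/B drill: same workload twice, one knob changed,
# compare P99 tails.

def p99(samples):
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

def ab_p99(run_workload, knob, a, b, min_gain=0.2):
    """Return (verdict, p99_a, p99_b); verdict True if setting B improves
    P99 by at least `min_gain` relative to setting A."""
    pa = p99(run_workload(**{knob: a}))
    pb = p99(run_workload(**{knob: b}))
    return pb <= (1.0 - min_gain) * pa, pa, pb

# Synthetic stand-in: queue isolation shortens the tail (illustrative numbers).
def fake_run(queue_isolated=False):
    return [50] * 95 + ([60] * 5 if queue_isolated else [400] * 5)

verdict, pa, pb = ab_p99(fake_run, "queue_isolated", False, True)
print(verdict)  # → True: isolation cut the tail, so queue policy is the lever
```

The discipline matters more than the code: one knob per run, and the verdict is a tail comparison, never an average comparison.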
Symptom #2

Intermittent packet loss (short bursts), service jitter appears

  • Observed: loss appears in bursts; average throughput may remain high.
  • Observed: repeated micro-outages correlate with congestion windows or temperature/load transitions.
  • Observed: “loss” may actually be drop (buffers) or error (PHY/PCS).

Evidence (separate drop vs error first)

  • Queue drop counters: if drops rise while FCS/PCS errors remain flat, this is likely buffer/queue overflow.
  • FCS/PCS lane errors: if errors rise first, suspect link integrity or temperature-coupled margin.
  • Buffer/queue depth: high-water persistence vs sharp pulses (pulses often match burst behavior).
  • Backpressure markers: accelerator BP events aligning with queue spikes indicates congestion propagation.
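The drop-vs-error separation above reduces to a counter-delta decision. The bucket names are ours and the zero thresholds are deliberately simplistic:

```python
# Sketch: separate "drop" (queue overflow) from "error" (link integrity)
# using counter deltas over the symptom window.

def classify_loss(queue_drop_delta, fcs_pcs_delta):
    if fcs_pcs_delta > 0 and queue_drop_delta == 0:
        return "link-integrity"    # errors rose, drops flat
    if queue_drop_delta > 0 and fcs_pcs_delta == 0:
        return "buffer-overflow"   # drops without PHY errors
    if queue_drop_delta > 0 and fcs_pcs_delta > 0:
        return "mixed-investigate" # both moved: check time ordering
    return "no-loss-counted"

print(classify_loss(queue_drop_delta=120, fcs_pcs_delta=0))  # → buffer-overflow
```

When both counters move, the first-change signal in the same window decides which bucket leads the investigation.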

Root-cause buckets

  • Buffer overflow: burst traffic exceeds instantaneous buffering, causing drops without PHY errors.
  • Link integrity drift: temperature or load reduces margin, elevating error counters before drops appear.
  • Backpressure spillover: downstream stall pushes congestion upstream, turning localized stalls into queue drops.

Actions

  • Burst shaping A/B: reduce burstiness at ingress; if drops disappear, buffer/queue sizing or policy is the lever.
  • Flow isolation A/B: isolate high-impact flows; if loss becomes localized, queue mapping/policy needs rework.
  • Thermal sweep A/B: hold workload constant and sweep temperature; error-coupled loss indicates margin issue.
  • Fix direction: prevent sustained high-water (policy/headroom) and eliminate temperature-coupled error growth (margin).
Symptom #3

PCIe retrains or negotiates down (Gen/width drops)

  • Observed: periodic bandwidth collapse; endpoints show changed negotiated speed/width.
  • Observed: bursts of correctable errors; retrain episodes correlate with temperature or load steps.
  • Observed: system may remain “up” but latency and determinism degrade sharply.

Evidence (what to capture around retrain)

  • LTSSM state transitions and retrain counts.
  • Negotiated speed/width history (before/after).
  • Correctable error trend vs time and temperature.
  • Power events in the same window: rail droop, PG glitches, or burst load steps.

Root-cause buckets

  • Insufficient margin: layout/connector/temperature reduces eye margin, retrains become frequent at certain conditions.
  • Training sensitivity: equalization/training becomes unstable in a specific temperature band.
  • Noise coupling: burst load or rail noise degrades receiver performance and triggers retrains.

Actions

  • Load-step A/B: keep temperature fixed, vary burst load; retrain following load steps implies noise coupling.
  • Temp-sweep A/B: keep workload fixed, sweep temperature; retrain clustering implies margin/training sensitivity.
  • Hop-reduction A/B: reduce hop count or swap port/slot; improvement implies topology/margin limits.
  • Fix direction: widen margin (routing/connector/retimer placement), stabilize training, and reduce noise coupling.
Symptom #4

PTP offset jumps (step changes), while links look “healthy”

  • Observed: offset shows step jumps or monotonicity breaks; service jitter appears without obvious packet loss.
  • Observed: events often coincide with GM changes, reference loss, or holdover transitions.
  • Observed: different consumers (switch/NIC/SoC) disagree on time, creating hidden determinism faults.

Evidence (separate reference issues vs timestamp path issues)

  • GM loss/change events and their timestamps.
  • SyncE lock / holdover state transitions.
  • Timestamp sanity: step events, monotonic breaks, and consumer divergence indicators.
  • Correlation check: retrains/drops/droop in the same window—avoid blaming time if transport is unstable.
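Separating reference transitions from timestamp-path faults can be sketched by labeling each offset step as explained (a GM/holdover event is logged nearby) or unexplained. Event format, step size, and window are assumptions for illustration:

```python
# Sketch: label PTP offset steps against logged reference events.

def label_steps(offsets_ns, event_times, step_ns=500, window_s=2.0):
    """offsets_ns: [(t, offset_ns)]; event_times: [t of GM/holdover events].
    Returns [(t, 'explained' | 'unexplained')] for each detected step."""
    out = []
    for (t0, o0), (t1, o1) in zip(offsets_ns, offsets_ns[1:]):
        if abs(o1 - o0) > step_ns:
            near = any(abs(t1 - te) <= window_s for te in event_times)
            out.append((t1, "explained" if near else "unexplained"))
    return out

offsets = [(0, 20), (1, 25), (2, 900), (3, 910), (10, 30), (11, 600)]
gm_events = [2.5]  # a GM change was logged near t=2

print(label_steps(offsets, gm_events))
# → [(2, 'explained'), (10, 'unexplained'), (11, 'unexplained')]
```

Explained steps point to reference-transition policy; unexplained steps point to the timestamp path, provided transport was stable in the same window.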

Root-cause buckets

  • Reference transition: GM changes or reference loss creates a controlled/expected step—must be logged and bounded.
  • Holdover drift: offset ramps during holdover and crosses the allowed budget.
  • Timestamp path inconsistency: different clock domains or timestamp insertion points cause consumer disagreement.

Actions

  • GM lock A/B: prevent GM changes temporarily; if steps disappear, the root is reference transition handling.
  • Forced holdover drill: deliberately enter/exit holdover; verify logs, alarms, and offset budget behavior.
  • Consumer consistency A/B: compare switch vs NIC vs SoC timebase; divergence indicates timestamp path alignment issue.
  • Fix direction: tighten reference transition policy, enforce timestamp path integrity, and validate holdover budgets.
Symptom #5

Reboots or widespread link drops at full load (burst scenarios)

  • Observed: unit reboots during burst load, or many ports drop simultaneously.
  • Observed: WDT / BOR / PG-related events appear; performance may degrade before the reset.
  • Observed: thermal throttling can precede congestion and cause cascading faults.

Evidence (power/thermal first, then transport)

  • WDT reset cause and any BOR/PG glitch logs.
  • Rail telemetry: droop minima, fault flags, current spikes, and timing relative to resets.
  • Thermal throttle: entry/exit events and temperature peaks near the fault.
  • Transport coupling: retrains/errors that begin after droop/throttle indicate cascading effects.

Root-cause buckets

  • Burst-induced droop: transient load overwhelms rail response; PG/RESET glitches trigger reset or silent corruption.
  • Reset domain coupling: dependency ordering causes one domain glitch to cascade into fabric instability.
  • Thermal limit: throttling shifts determinism; congestion and retrains appear as secondary symptoms.

Actions

  • Load-step reproduction: run a repeatable burst pattern; confirm droop/PG timing is consistent.
  • Cooling policy A/B: force higher airflow; if resets become “no reset but degraded,” thermal coupling is dominant.
  • Domain isolation A/B: gate non-critical loads during burst; reduced droop implies rail headroom issue.
  • Fix direction: improve transient response & sequencing integrity, and ensure throttle behavior is observable and bounded.
Symptom #6

Accelerator throughput looks fine, but latency becomes jittery (tail spikes)

  • Observed: average throughput meets target; P99 latency spikes or oscillates.
  • Observed: busy% is not always high; jitter appears under mixed flows or bursty scheduling.
  • Observed: backpressure waves can propagate into queues and appear as system-level determinism loss.

Evidence (histograms & backpressure correlation)

  • Enqueue/dequeue latency histogram (P50/P95/P99), not just averages.
  • Backpressure count aligned with queue depth high-water events.
  • PCIe/DMA pacing: spikes in DMA latency or completion pacing indicate feeding instability.
  • Cross-domain alignment hints: periodicity can indicate domain-crossing or scheduling alignment issues.
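The histogram point, P50/P95/P99 rather than averages, is easy to demonstrate: with a 3% slow tail the average barely moves while P99 explodes (nearest-rank percentiles, synthetic numbers):

```python
# Sketch: percentile summary for enqueue latency — averages hide the tail.

def summary(samples_us):
    s = sorted(samples_us)
    def pick(p):
        return s[min(len(s) - 1, int(p * len(s)))]
    return {"p50": pick(0.50), "p95": pick(0.95), "p99": pick(0.99),
            "avg": sum(s) / len(s)}

# 100 enqueues: 97 fast, 3 slow.
lat = [10] * 97 + [400, 450, 500]
print(summary(lat))  # avg ≈ 23 µs looks fine; p99 = 500 µs tells the truth
```

This is why "average throughput meets target" and "P99 spikes" can both be true in the same run.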

Root-cause buckets

  • Batch too large: great throughput but tail latency expands; jitter increases under bursty arrivals.
  • Queue saturation: backpressure indicates queue full; stalls propagate upstream.
  • Feeding instability: DMA pacing or fabric jitter creates periodic starvation bursts.

Actions

  • Batch A/B sweep: reduce batch size; if P99 drops sharply with small throughput loss, batch tuning is the lever.
  • Concurrency limit A/B: cap queue concurrency; if BP drops, queue saturation is the dominant issue.
  • DMA pacing A/B: adjust one pacing knob; if periodic starvation disappears, feeding instability was the cause.
  • Fix direction: tune for tail latency, contain backpressure, and keep the feed path stable under bursts.

Figure
Figure F11 — Troubleshooting flow: symptom → first counters → A/B experiment
(Flow: 1) symptom → 2) first counters → 3) A/B experiment; start with the first-change signal in the same time window and pick one A/B knob at a time.
  • Throughput plateau, CPU not high → queue high-water, DMA/PCIe utilization → queue isolation, reduce hops
  • Intermittent loss, burst drops → drop vs error, FCS/PCS + queues → burst shaping, flow isolation
  • PCIe retrain, Gen downshift → LTSSM + CE trend, temp/droop window → temp sweep, load-step A/B
  • PTP offset jump, steps/sanity → GM/holdover, consumer divergence → lock GM, forced holdover
  • Reboot/drops at full load → WDT/PG/droop, throttle events → load-step repro, cooling A/B)



H2-12 · FAQs (5G DU · Distributed Unit)

FAQ usage

These FAQs are written for DU engineering decisions and field debugging. Each answer emphasizes observable indicators, evidence-first checks, and one-knob A/B experiments to isolate root causes.

1 Where is the practical DU vs CU engineering boundary, and what metrics reveal a wrong split?

A DU boundary is correct when DU-local work stays within deterministic budgets: stable P99 latency, bounded queue high-water, and consistent timing error across DU consumers. A split is likely wrong when the DU becomes a hidden queue-and-copy factory (DMA spikes, queue depth saturation) or when timing/jitter issues dominate despite “enough” average throughput.

See: H2-1, H2-2
2 Why can a DU meet average throughput at full load, yet P99 latency suddenly collapses?

Average throughput does not protect tail latency. P99 collapses when bursts push queues to high-water, when DMA rings stall (descriptor starvation or completion pacing), or when accelerators trade latency for throughput via large batches. The fastest isolation is evidence-first: queue depth/drop → DMA enqueue/dequeue latency histogram → accelerator backpressure and queue occupancy in the same time window.

See: H2-2, H2-3, H2-4
3 When selecting an Ethernet switch for a DU, which queue/QoS capabilities matter most?

The DU cares about determinism more than peak features. The most valuable capabilities are: queue isolation that avoids “black-box” queueing, predictable queue scheduling under mixed flows, and visibility (queue depth, high-water marks, drops). Timestamp-related hooks matter when time is consumed inside the DU: a clear, testable path for timestamp integrity is more important than generic data-center routing features.

See: H2-4
4 What are the most common symptoms of wrong retimer/redriver placement, and how to localize quickly?

Wrong placement often shows as (1) error counters rising without queue drops (FCS/PCS/alignment), (2) intermittent retrain or negotiated speed/width downshift, and (3) temperature-band sensitivity where failures cluster in a specific thermal range. Quick localization starts by separating drop vs error, then running A/B tests: fixed load with a temperature sweep, and fixed temperature with burst-load steps to catch margin vs noise coupling.

5 How should PCIe topology be chosen to avoid DMA jitter and “phantom bandwidth”?

“Phantom bandwidth” appears when link utilization looks high but tail latency is dominated by extra hops, poor affinity, or unstable DMA pacing. Prefer topology that minimizes switch hops on hot paths and keeps traffic locality consistent. Validate with histograms (DMA enqueue/dequeue latency, completion pacing) rather than averages, and correlate spikes with link state and queue depth in the same time window.

See: H2-5, H2-3
6 When PCIe retrains or downshifts, is it usually SI, power noise, or temperature—and what evidence comes first?

Start with evidence sequencing. First capture LTSSM transitions, retrain counts, negotiated speed/width, and the trend of correctable errors. Next check clustering by temperature (margin/training sensitivity) versus clustering by burst-load steps and rail events (noise coupling). A reliable method is two A/B drills: fixed temperature with load steps, and fixed workload with a temperature sweep.

See: H2-5, H2-11
7 Why can LDPC/Polar accelerators show high throughput but jittery latency, and how should batch/backpressure be tuned?

High throughput can be achieved by batching, but batching expands tail latency under bursty arrivals. Jitter becomes severe when backpressure propagates upstream, turning local stalls into fabric-wide queueing. Tuning should be data-driven: sweep batch size while tracking P99 latency and throughput, then cap concurrency/queue depth to reduce backpressure waves. Always validate with enqueue/dequeue latency histograms, not averages.

See: H2-6, H2-2
8 Should PTP timestamps live in the switch, NIC, or SoC—and how is DU-wide consistency guaranteed?

The best placement is the one that stays consistent across all DU time consumers. A DU is healthy when switch/NIC/SoC time bases agree within budget and timestamp sanity checks pass (monotonic progression, no unexplained steps). Consistency is guaranteed by enforcing a single reference and verifying consumer alignment under load. The practical validation is a same-window comparison: consumer divergence must not appear when queues and links remain stable.

See: H2-7, H2-4
9 When using SyncE + PTP together, how should holdover be designed to avoid “locked but drifting” service?

“Locked” does not guarantee service-grade time. Holdover must be treated as a budgeted mode with explicit entry/exit logs, bounded offset drift, and alarms that trigger before service becomes unstable. The DU should record GM loss/change, SyncE lock transitions, holdover state, and offset trend. A forced holdover drill is essential: deliberately remove reference and verify drift slope, alarm timing, and recovery behavior against the allowed budget.

See: H2-7
10 Why do burst loads cause occasional DU resets or link drops, and how should the power tree be accepted?

Burst loads stress rails serving SerDes and accelerators, creating transient droop that can trigger PG/RESET glitches, silent corruption, or link retraining. A correct acceptance strategy ties power to observability: capture WDT reset cause, PG events, rail minima and fault flags, and thermal throttle transitions in the same window as the fault. Validate by a repeatable burst step test and confirm that no sustained retrain/offset jumps are induced by power events.

11 What logs are most often missing in DU field debugging, and what is a “minimum viable telemetry” set?

The most common gap is lack of time-aligned evidence across link, queue, fabric, accelerator, time, and power. A minimum set should include: link errors (FCS/PCS, retrain events), queue depth/high-water/drop, PCIe/DMA latency or stalls, accelerator enqueue/dequeue latency and backpressure, PTP offset/GM/holdover state, and rail/thermal events (droop minima, throttle, WDT reset cause). Sampling must support “same-window correlation,” even if rates are modest.

See: H2-9
12 How can a consistent lab/production/field acceptance matrix be built to prevent “field-only” failures?

Build the matrix by layers and by environments. Layers should include link integrity (BER/retrain/error trends), system determinism (P99 latency, queue stability, loss-free operation under load), time integrity (offset/holdover/step injection and end-to-end timestamp consistency), plus power/thermal coupling (droop and throttle behavior). Each cell must have a clear pass/fail rule: “no sustained retrain,” “no queue drops under defined load,” and “time sanity holds during drills.”

See: H2-10