O-RAN Radio Unit (O-RU) Architecture & Key IC Blocks
An O-RAN Radio Unit (O-RU) is the fronthaul-facing RF endpoint. It must keep EVM, ACLR, and SEM stable by controlling the full RU loop (RF chain, DPD/PA/LNA biasing, JESD converter links, clock/jitter, and PTP/SyncE timing) and prove that stability in the field with counters, telemetry, and logs.
This page shows what a “good RU” looks like, where failures typically originate, and how to validate and troubleshoot with measurable evidence instead of guesswork.
H2-1 · Scope, assumptions & what “good” looks like for an O-RU
This section locks the page boundary (O-RU only) and defines a measurable acceptance language for RF quality, link determinism, timing integrity, and thermal stability.
Why What this page covers (and what it does not)
An O-RU converts between air-interface RF signals and deterministic digital sample streams and transports those streams over fronthaul, while keeping linearity, synchronization, and field evidence under control.
- O-RU responsibilities (in-scope): RF front-end partitioning, PA/LNA bias behavior, DPD/observation loop integration, data-converter interfaces (JESD204B/C), fronthaul PHY readiness (eCPRI over Ethernet), and timing inputs/distribution (PTP/SyncE/1PPS/ToD).
- Explicitly out-of-scope: DU/CU baseband compute and scheduling, FEC engines, PCIe switching inside compute nodes, and transport-network internals (OTN/DWDM/ROADM).
- Goal of this boundary: every metric and every troubleshooting step in this page must be provable from RU-side measurements, counters, or logs.
Define “Good RU” = four acceptance domains
“Good” is not a single number. A deployable O-RU must simultaneously meet RF emission quality, sampling determinism, time/frequency coherence, and stability across temperature and aging.
| Domain | What “good” must demonstrate | Primary RU-side evidence |
|---|---|---|
| RF emission quality | Stable EVM/ACLR/SEM across output power steps, temperature corners, and configured carriers; spurs under control; predictable DPD gain. | Spectrum/ACLR plots vs Pout; EVM vs integrated jitter; spur map; DPD on/off delta. |
| Sampling determinism | JESD links maintain lane alignment and deterministic latency; error counters remain quiet under thermal and supply noise stress. | JESD lane error counters; alignment status; deterministic-latency stability checks (LMFC/SYSREF alignment evidence). |
| Timing integrity | Time/frequency remains coherent under SyncE/PTP lock, and degrades predictably in holdover with clear alarms and state transitions. | PTP time error statistics; SyncE lock/holdover state; timebase switch logs; holdover drift trend. |
| Thermal & retention | RF quality does not collapse at realistic site temperatures; protection and derating are orderly, observable, and repeatable. | Thermal sensors + derating states; PA current/voltage; VSWR/reflection alarms; event logs with timestamps. |
Verify RU acceptance checklist (what to measure, not just what to claim)
The checklist below is written as “measurement → evidence artifact.” Values are implementation-specific; the critical point is producing comparable evidence across operating modes and environmental corners.
| Item | How to measure (RU-side) | Pass evidence artifact |
|---|---|---|
| EVM | Capture EVM across power steps and temperature bins; isolate clock/jitter sensitivity by controlled clock conditions. | EVM vs Pout and vs temperature plots; “DPD on/off” delta plots for the same carrier configuration. |
| ACLR / SEM | Spectrum measurement at representative Pout and bandwidth; repeat with DPD enabled/disabled and after warm-up. | ACLR/SEM tables + screenshots; spur map snapshots tied to timebase state and PA bias state. |
| PTP time error | Log time error statistics under lock and during holdover; verify source switch and alarm behavior. | Time error trend and histograms; “lock/holdover/source” state timeline with timestamps. |
| JESD lane health | Monitor lane errors/alignment status during temperature ramps and supply-noise injections. | Lane error counters staying flat; alignment status stable; deterministic-latency evidence around SYSREF/LMFC events. |
| Thermal derating point | Drive RU into thermal stress; observe orderly derate states and RF quality degradation profile. | Derate state ladder (Normal→Derate→Shutdown) with triggers; RF quality plotted against temperature and derate state. |
H2-2 · RF signal chain partitioning: where transceiver ends and PA/LNA/filters begin
This section makes responsibility explicit: each block “owns” a dominant set of impairments and has specific test taps that prove or falsify root causes.
Partition RU RF chain = blocks with accountable metrics
A useful partition is not “RF vs digital.” A useful partition answers: which block dominates which metric under realistic operating points, and which measurement tap proves it.
| Block | Dominant metric(s) it most often drives | Fast proof tap (RU-side) |
|---|---|---|
| PA / driver | ACLR and in-band distortion at high Pout; temperature-sensitive compression; interaction with DPD/CFR. | ACLR vs Pout curve; DPD on/off delta; PA current/thermal correlation. |
| Filters / duplexer | Out-of-band mask (SEM) and insertion-loss sensitivity; can distort emission when heated or mismatched. | Spectrum shoulders vs temperature; return loss/VSWR alarms vs emission changes. |
| LO / synthesizer | Phase-noise driven EVM degradation; spurs that appear as discrete spectral lines. | Spur map; EVM vs clock condition; correlated with timebase states. |
| LNA / RF front-end | NF and blocker tolerance; early overload signatures under strong interferers. | Blocking test response; compression/overload counters; gain-state transitions. |
| ADC/DAC interface | Sampling coherence and deterministic latency stability when using JESD; lane errors can masquerade as “RF quality” issues. | JESD lane counters; alignment status; stability across thermal ramp. |
Tx Transmit chain (DAC → upconversion → PA → duplexer → antenna)
The Tx chain must meet emission masks and in-band modulation quality across power steps. The most valuable analysis is identifying which sub-block is limiting EVM versus limiting ACLR/SEM.
- DAC/Tx baseband path (within RU): sets the baseline for in-band noise and images; becomes visible when PA is backed off and still EVM-limited.
- Upconversion + LO: phase noise raises EVM floor broadly; spurs show up as discrete spectral lines or fixed-offset tones.
- Driver/PA: dominates ACLR near compression; interacts with DPD adaptation quality and thermal headroom.
- Duplexer/filters: enforce spectral mask; insertion loss and temperature drift can change output power and shoulder behavior.
| Measurement | What it proves | Common failure signature |
|---|---|---|
| EVM vs Pout | Separates “nonlinearity-limited” (worse at high Pout) from “clock/LO/jitter-limited” (floor-like behavior). | EVM flat but poor across Pout → suspect clock/LO; EVM collapses near top power → suspect PA/DPD headroom. |
| ACLR vs Pout | Quantifies PA compression and DPD effectiveness; confirms whether linearization tracks power steps. | ACLR jumps at a repeatable Pout → PA bias/thermal or DPD mis-tracking. |
| Spur map | Distinguishes discrete spur sources (LO, switching supplies, coupling) from broadband distortion. | Fixed-offset tones that follow LO plan or timebase state changes. |
| VSWR / reflection | Detects mismatch-driven behavior that can trigger protection or distort emission under real antennas/cables. | Emission changes align with VSWR alarms and derating state transitions. |
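
The "EVM vs Pout" row can be turned into a small triage helper. The sketch below is only an illustration: the function name, the 3.5 % limit, and the 1.5x rise ratio are assumptions chosen for the example, not values from any specification or from this page.

```python
from statistics import mean

def classify_evm_curve(points, evm_limit_pct=3.5, rise_ratio=1.5):
    """Triage an EVM-vs-Pout sweep.

    points: list of (pout_dbm, evm_pct) tuples.
    evm_limit_pct and rise_ratio are illustrative thresholds only.
    """
    pts = sorted(points)                      # order by output power
    evms = [evm for _, evm in pts]
    split = max(1, len(evms) // 2)
    backed_off = mean(evms[:split])           # average EVM at the lower power steps
    top = evms[-1]                            # EVM at the highest power step

    # Poor EVM even when the PA is backed off looks like a floor, not compression.
    if backed_off > evm_limit_pct:
        return "floor-limited: suspect clock/LO/jitter"
    # Good EVM at low power that collapses at the top step looks like compression.
    if top > evm_limit_pct and top > rise_ratio * backed_off:
        return "top-power-limited: suspect PA/DPD headroom"
    return "within limit"

sweep = [(20, 1.2), (25, 1.3), (30, 1.8), (35, 5.0)]
print(classify_evm_curve(sweep))
```

The output is a hint for where to look next, not a verdict; the same sweep should be repeated per temperature bin and with DPD on and off before drawing conclusions.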
Rx Receive chain (antenna → filters/LNA → downconversion → ADC → digital)
The Rx chain is not “just sensitivity.” A deployable RU must remain stable under blockers and adjacent interferers, where overload and recovery behavior become the decisive factors.
- Front-end filtering: defines how much blocker energy reaches the LNA/mixer; insertion loss directly impacts NF.
- LNA bias and gain states: trade NF against linearity (IIP3); must avoid unstable gain toggling under real blocker dynamics.
- Mixer/IF path: can generate intermodulation products that mimic real signals; LO leakage and IQ imbalance amplify confusion.
- ADC dynamic range: overload recovery and clipping behavior determine whether digital compensation remains meaningful.
| Measurement | What it proves | Common failure signature |
|---|---|---|
| NF / sensitivity | Quantifies insertion loss + LNA bias health; establishes baseline performance when blockers are absent. | Sensitivity shifts with temperature or bias rails → suspect LNA bias / front-end losses. |
| Blocking response | Identifies the first overload point (LNA, mixer, ADC) via gain compression or recovery artifacts. | Sudden desense events aligned with AGC transitions or overload counters. |
| IIP3 / intermod | Separates linearity bottlenecks from noise bottlenecks; predicts behavior under multi-tone interferers. | Phantom products at predictable offsets → intermod source in RF/IF path. |
| LO leakage / images | Validates calibration integrity and shielding/ground coupling; explains repeatable “mystery tones.” | Fixed tones that move with LO plan or appear after thermal or power state changes. |
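
The "phantom products at predictable offsets" signature in the IIP3 row follows from the standard third-order intermodulation frequencies, 2·f1 − f2 and 2·f2 − f1. A minimal sketch for checking whether a blocker pair lands a product in the wanted band is shown below; the function names and the example frequencies are hypothetical.

```python
def third_order_products(f1_mhz, f2_mhz):
    """Return the two close-in third-order intermod frequencies of a tone pair."""
    return (2 * f1_mhz - f2_mhz, 2 * f2_mhz - f1_mhz)

def in_band_products(f1_mhz, f2_mhz, band_lo_mhz, band_hi_mhz):
    """List the third-order products that fall inside the wanted band."""
    return [f for f in third_order_products(f1_mhz, f2_mhz)
            if band_lo_mhz <= f <= band_hi_mhz]

# Hypothetical case: two blockers at 3610 and 3620 MHz, wanted band 3590-3605 MHz.
print(third_order_products(3610, 3620))
print(in_band_products(3610, 3620, 3590, 3605))
```

If a "mystery tone" sits at one of these computed offsets and moves when either blocker moves, that points to an intermod source in the RF/IF path rather than to LO leakage.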
Choice Direct sampling vs IF (RU-only decision matrix)
This page treats the choice strictly as an RU integration trade-off, not an ADC tutorial. The decision is driven by filter realizability, dynamic range pressure, LO planning complexity, and calibration burden.
- Front-end filtering feasibility: if adequate preselection is hard, direct sampling increases ADC overload risk under blockers.
- Dynamic range & blockers: stronger blocker environment requires either more analog selectivity or more ADC headroom and careful gain distribution.
- LO/image management: IF architectures shift complexity into LO planning and image rejection; direct sampling shifts complexity into sampling integrity and clock/jitter.
- Calibration & retention: whichever path adds more temperature-sensitive calibration must be paired with retention evidence and field observability.
H2-3 · DPD/CFR architecture: main path, observation path, and adaptation loops
In an O-RU, DPD is not a “nice-to-have algorithm.” It is a maintainable closed-loop system that must remain truthful (observation path), aligned (time/phase), and stable across temperature and power states.
Concept The three-piece DPD system (RU closed loop)
A practical RU implementation can be described as three cooperating subsystems that must be validated together—not in isolation.
- Main path (truth target): the emitted spectrum and in-band modulation quality that must meet ACLR/EVM/SEM across power steps.
- Observation path (truth source): a measurement chain that must be wide enough, linear enough, and quiet enough to represent the PA behavior.
- Adaptation loop (maintenance): a controlled update mechanism with explicit “when to learn / freeze / relearn” rules tied to temperature and power states.
Decisions Observation path: bandwidth, dynamic range, and coupler placement
The observation chain must not “lie.” If it compresses or filters away the information that DPD needs, the adaptation loop will learn the wrong model.
| Decision | Why it matters in a RU | RU-side proof |
|---|---|---|
| Obs bandwidth | Too narrow hides spectral regrowth and memory-related behavior; DPD appears “stable” but ACLR refuses to recover at high Pout. | ACLR improvement vs bandwidth setting; residual shoulder comparison after warm-up. |
| Obs dynamic range | If the Obs chain clips before the PA does, the learned correction is biased; convergence can be fast but wrong. | Obs chain linearity check; DPD on/off delta collapses at top power → suspect Obs clipping. |
| Coupler location | Coupling before/after filtering changes what DPD “sees” (PA only vs PA+front-end effects). Placement must match the intended correction boundary. | DPD effectiveness consistency across antenna mismatch states (VSWR) and thermal corners. |
| Obs noise floor | Observation noise becomes model noise → longer convergence and persistent residual distortion. | Convergence time vs temperature; residual EVM/ACLR stability under fixed power state. |
Alignment The non-negotiables: delay align + gain/phase align + retention rules
Alignment is the foundation of the closed loop. Misalignment often looks like “DPD does not work” even when PA behavior is correctable.
- Delay alignment: the main-path and observation-path samples must represent the same waveform time reference (group delay mismatch is a common hidden failure).
- Gain/phase alignment: amplitude and phase errors (including frequency-dependent response) must be calibrated; otherwise the learned correction is rotated or scaled incorrectly.
- Retention policy: define when alignment is trusted, when it must be refreshed (temperature bin change), and when learning is frozen (unstable conditions).
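
Delay and gain/phase alignment can be illustrated with a brute-force cross-correlation. This is a sketch under simplifying assumptions (integer-sample delay, a single complex gain, no frequency-dependent response); the function names are hypothetical, and a production loop would typically add fractional-delay estimation.

```python
import cmath
import random

def estimate_delay(main, obs, max_lag):
    """Return the lag in samples at which obs best matches main.

    A positive result means the observation path is delayed relative to the main path.
    """
    best_lag, best_mag = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0j
        for n, x in enumerate(main):
            m = n + lag
            if 0 <= m < len(obs):
                acc += obs[m] * x.conjugate()   # correlate obs against main
        if abs(acc) > best_mag:
            best_lag, best_mag = lag, abs(acc)
    return best_lag

def estimate_complex_gain(main, obs, lag):
    """Least-squares single-tap gain/phase of obs relative to main at a given lag."""
    num, den = 0j, 0.0
    for n, x in enumerate(main):
        m = n + lag
        if 0 <= m < len(obs):
            num += obs[m] * x.conjugate()
            den += abs(x) ** 2
    return num / den

# Synthetic check: observation path = main path delayed by 3 samples, scaled and rotated.
rng = random.Random(0)
main = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(64)]
true_gain = 0.5 * cmath.exp(0.7j)
obs = [0j] * 3 + [true_gain * x for x in main]

lag = estimate_delay(main, obs, max_lag=8)
gain = estimate_complex_gain(main, obs, lag)
print(lag, abs(gain), cmath.phase(gain))
```

Logging the estimated lag and gain per temperature bin gives the retention policy something concrete to compare against when deciding whether alignment is still trusted.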
CFR CFR ↔ DPD coupling: PAPR, headroom, and efficiency
CFR changes peak behavior (PAPR) and therefore shifts where the PA operates in its nonlinear region. The DPD loop must be validated with CFR enabled, not as a separate feature.
- CFR stronger: reduces peaks and can improve thermal headroom, but adds intentional distortion that can become the EVM limiter.
- CFR weaker: preserves waveform fidelity but pushes the PA deeper into compression, increasing ACLR pressure and reducing DPD margin.
- Engineering order: set a CFR target → choose PA backoff/operating point → validate DPD margin across temperature bins.
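
The PAPR/EVM trade-off can be seen with a toy example. Hard clipping is used here only because it is the simplest possible stand-in; real CFR blocks typically use filtered clipping or peak cancellation, and the function names and sample values below are illustrative assumptions.

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio of a complex sample block, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def hard_clip(samples, threshold):
    """Limit magnitude to threshold while preserving phase (simplest CFR stand-in)."""
    out = []
    for s in samples:
        mag = abs(s)
        out.append(s if mag <= threshold else s * (threshold / mag))
    return out

def clip_evm_pct(original, clipped):
    """RMS error introduced by clipping, relative to the original RMS level."""
    err = sum(abs(c - o) ** 2 for o, c in zip(original, clipped))
    ref = sum(abs(o) ** 2 for o in original)
    return 100 * math.sqrt(err / ref)

# Toy block: nine unit-magnitude samples and one peak at four times that magnitude.
block = [1 + 0j] * 9 + [4 + 0j]
clipped = hard_clip(block, threshold=2.0)

print(round(papr_db(block), 2), round(papr_db(clipped), 2))
print(round(clip_evm_pct(block, clipped), 1))
```

Lowering the threshold reduces PAPR further but raises the clipping-induced error, which is the mechanism by which a stronger CFR setting can become the EVM limiter.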
Proof Evidence set: not just “DPD on/off”
DPD must be proven with reproducible artifacts that separate power-state effects, thermal effects, and observation-path integrity.
| Artifact | What it proves | Failure signature |
|---|---|---|
| ACLR/EVM vs Pout | Separates “PA-limited at top power” from “alignment/obs-limited across all powers.” | DPD delta shrinks only at high Pout → PA deep compression or Obs chain clipping. |
| DPD on/off delta | Quantifies correction margin and verifies the correction boundary matches coupler placement. | ACLR improves but EVM worsens → CFR/DPD interaction or misalignment. |
| Convergence time vs temp bin | Validates retention policy and observation SNR; demonstrates stable field maintenance behavior. | Convergence slow in a specific temperature region → obs noise or model mismatch/retention gap. |
H2-4 · PA biasing & supply strategies: average power, peak power, ET/Doherty implications
In a RU, PA bias and PA supply behavior are the root levers that shape linearity headroom, thermal margin, and the repeatability of protection and derating. This section stays RU-local: PA rails, bias control, sensing, and state behavior.
Map Bias types and why they change linearity + heat
Bias strategy is not an isolated circuit choice; it sets the PA operating point and how the PA reacts to peaks (PAPR), temperature drift, and protection events.
| Bias approach | What it optimizes | Typical risk |
|---|---|---|
| Static bias | Predictability and simplicity; stable behavior for a fixed operating point. | Lower efficiency under varying traffic; higher thermal stress at average power. |
| Dynamic bias | Better efficiency and thermal margin across power states and carrier loading. | Bias noise / stability can leak into RF quality; requires clear state logic and validation. |
Rails PA supply strategies (RU-local): fixed, multi-rail, adjustable rail
PA supply choice determines peak capability, droop behavior, ripple coupling, and how cleanly the RU can derate while keeping emissions compliant. “Envelope tracking” and “Doherty” are mentioned only as RU integration implications—not as amplifier theory.
- Fixed rail: simplest validation; droop and ripple must be controlled to avoid modulation of the RF envelope.
- Multi-rail: supports operating-point selection and derating without extreme distortion; rail switching behavior must be observable and repeatable.
- Adjustable rail: can improve efficiency; places tighter demands on loop stability and transient response (droop/recovery shows up as RF impairment).
- ET/Doherty implication (RU view): improving efficiency changes the “shape” of PA nonlinearity and its temperature sensitivity—DPD proof must be repeated under the intended rail/bias modes.
Protect Sensing + protection + derating states (what must be logged)
A deployable RU requires protection behavior that is deterministic and auditable. Protection is not only for survival; it also preserves emission compliance by derating in a controlled manner rather than failing abruptly.
| Trigger | Expected RU behavior | Log fields (minimum set) |
|---|---|---|
| OTP (over-temp) | Stepwise derate → protect emission and hardware; clear state transitions. | temp_PA, temp_board, derate_state, power_backoff%, timestamp |
| OCP/UV events | Limit or reduce power; prevent oscillatory state flapping. | PA_V, PA_I, rail_state, fault_cause, recovery_attempts |
| VSWR / mismatch | Reduce power or mute carriers; protect PA while maintaining predictable alarms. | vswr_metric, action_taken, derate_state, duration, timestamp |
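
A stepwise derate ladder with hysteresis can be sketched as a small state machine that logs every transition. The thresholds, backoff percentages, and class name below are hypothetical placeholders; only the log field names follow the OTP row of the table above.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds in deg C; real values are implementation-specific.
DERATE_ON_C, DERATE_OFF_C = 95.0, 85.0
SHUTDOWN_ON_C, SHUTDOWN_OFF_C = 110.0, 100.0
BACKOFF_PCT = {"normal": 0, "derate": 30, "shutdown": 100}

@dataclass
class ThermalDerate:
    state: str = "normal"
    log: list = field(default_factory=list)

    def step(self, temp_pa_c, timestamp):
        """Advance the ladder by one temperature sample and log any transition."""
        new = self.state
        if self.state == "normal":
            if temp_pa_c >= SHUTDOWN_ON_C:
                new = "shutdown"
            elif temp_pa_c >= DERATE_ON_C:
                new = "derate"
        elif self.state == "derate":
            if temp_pa_c >= SHUTDOWN_ON_C:
                new = "shutdown"
            elif temp_pa_c <= DERATE_OFF_C:   # exit threshold sits below entry threshold
                new = "normal"
        else:  # shutdown
            if temp_pa_c <= SHUTDOWN_OFF_C:
                new = "derate"
        if new != self.state:
            self.state = new
            self.log.append({
                "timestamp": timestamp,
                "temp_PA": temp_pa_c,
                "derate_state": new,
                "power_backoff_pct": BACKOFF_PCT[new],
            })
        return self.state

sm = ThermalDerate()
states = [sm.step(t, ts) for ts, t in enumerate([80, 96, 94, 96, 84])]
print(states)
print(sm.log)
```

Because the exit threshold is lower than the entry threshold, the 94 °C sample does not bounce the RU back to normal, which is the flapping behavior the OCP/UV row warns against.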
Proof Evidence set for bias/rails decisions
Bias and rail strategies are validated by showing the joint relationship between efficiency, emission quality, and thermal behavior—under the same modes used in deployment.
| Artifact | What it proves | Failure signature |
|---|---|---|
| PAE vs ACLR | Quantifies the efficiency/linearity trade-off and confirms operating-point selection is consistent. | PAE improves but ACLR collapses at a specific mode → rail/bias pushing PA into deep compression. |
| Thermal rise vs state | Demonstrates stability across temperature; verifies that derating keeps RF quality within bounds. | Thermal corner produces abrupt emission failure aligned with state transition → state logic or sensor thresholds need tuning. |
| Protection event timeline | Proves repeatable behavior and auditability; prevents “mystery” field failures. | Frequent state flapping or missing logs → insufficient hysteresis / missing instrumentation. |
H2-5 · LNA biasing, gain control, and receiver linearity under blockers
A receive chain may look clean in the lab but collapse on site when strong adjacent or out-of-band signals appear. The difference is rarely “mystery RF”—it is usually gain staging, bias drift, and overload points interacting with the AGC state machine.
Goal What “good Rx” means in an O-RU (under blockers)
Receiver performance in a RU is defined by stability under realistic blocker conditions, not only by a best-case noise figure. The practical target is to keep the chain inside its linear operating region while the AGC makes repeatable decisions.
Bias LNA bias and gain staging: NF vs linearity is a RU decision
LNA bias sets transconductance and therefore influences both noise and linearity. In a RU, bias must be evaluated together with gain staging because too much early gain can force later stages into overload during blocker events.
| Decision lever | What it impacts | How it fails on site |
|---|---|---|
| LNA bias point | NF, compression margin, temperature sensitivity of gain and linearity. | Hot corner: gain/linearity drift shrinks margin → sudden desense or distortion. |
| Front-end gain | Noise contribution and headroom for mixer/IF/ADC under blockers. | Lab looks fine; site blocker causes mixer/ADC overload despite “normal RSSI.” |
| Attenuation / step gain | Protects later stages; sets the dynamic range distribution. | Too coarse steps cause AGC hunting; too slow steps allow overload bursts. |
AGC AGC must be deterministic: thresholds, hysteresis, and hold time
Under blockers, AGC is a control system. If thresholding and timing are not engineered, the chain can oscillate between gain states, creating intermittent EVM/BLER spikes that are difficult to reproduce without proper logs.
- Fast protection vs slow optimization: a fast response prevents hard overload; a slower loop stabilizes quality once the chain is safe.
- Hysteresis: prevents state flapping near a threshold; required for field stability.
- Hold time: keeps gain stable long enough to observe quality metrics and avoid “self-inflicted” variability.
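
The three ingredients above (fast protection, hysteresis, hold time) fit in a few lines of control logic. This is a minimal sketch, not a production AGC: the class name, the dBFS thresholds, and the step counts are assumptions picked for the example.

```python
class Agc:
    """Step-attenuator AGC with a hysteresis band and a hold timer (illustrative)."""

    def __init__(self, high_dbfs=-6.0, low_dbfs=-18.0, hold_samples=3, max_atten_steps=3):
        self.high = high_dbfs          # above this: add attenuation immediately
        self.low = low_dbfs            # below this: attenuation may be released
        self.hold_samples = hold_samples
        self.max_atten = max_atten_steps
        self.atten = 0                 # current attenuation step
        self.hold = 0                  # remaining hold time in samples

    def update(self, level_dbfs):
        """Process one post-attenuation level measurement; return the attenuation step."""
        if level_dbfs > self.high:
            # Fast protection path: react at once, then restart the hold timer.
            if self.atten < self.max_atten:
                self.atten += 1
            self.hold = self.hold_samples
        elif self.hold > 0:
            # Hold time: keep the gain state fixed so quality metrics can settle.
            self.hold -= 1
        elif level_dbfs < self.low and self.atten > 0:
            # Slow optimization path: release one step only below the low threshold.
            self.atten -= 1
            self.hold = self.hold_samples
        # Levels between low and high fall in the hysteresis band: no change.
        return self.atten

agc = Agc(hold_samples=2)
trace = [agc.update(level) for level in [-3, -20, -20, -20, -10]]
print(trace)
```

Logging `atten` and `hold` alongside overload counters and timestamps is what makes intermittent EVM/BLER spikes traceable to a specific gain transition.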
Blockers Where the chain saturates first: LNA, mixer/IF, or ADC
Strong blockers reveal the real limiting block. A RU should explicitly instrument overload signatures so “site collapse” becomes a diagnosable event.
| First to saturate | Typical signature | RU-side proof |
|---|---|---|
| LNA compression | Front-end distortion dominates; reducing later-stage gain does not recover receive quality. | Blocking sweep shows early collapse; LNA gain/bias state correlates with event. |
| Mixer / IF overload | Intermod products raise in-band noise-like floor; effects appear frequency-combination dependent. | Adjacent-tone combinations trigger; IF gain step changes shift the failure boundary. |
| ADC clip / overload | Hard clipping events; sudden EVM/BLER spikes with clear overload counter bursts. | ADC overload counters align with throughput drops; recovery after gain step is immediate. |
Cal RU-local calibration that matters under blockers
Under strong signals, small calibration errors become large performance problems. RU calibration must be treated as a retention problem across temperature bins, not a one-time factory adjustment.
- DC offset drift: shifts baselines and thresholds, which can bias AGC decisions and mask true sensitivity.
- IQ imbalance drift: reduces image rejection and can fold blocker energy into the wanted band.
- Retention rules: define when to refresh calibration (temperature bin change, large gain-state change, or repeated overload events).
Proof Evidence set: blocking sweeps + temperature curves + event counters
The receiver must be proven with repeatable artifacts that connect RF behavior to RU-visible counters and state transitions.
| Artifact | What it proves | Minimum instrumentation |
|---|---|---|
| Blocking test matrix | Identifies the first saturation block and validates AGC behavior under stress. | AGC_state, gain_step, overload_cnt (by stage if available), timestamp |
| NF/gain vs temperature | Shows margin erosion and the temperature bin where collapse becomes likely. | temp_bin, LNA_bias_state, gain, NF proxy/metric |
| Overload event timeline | Converts “site failure” into a diagnosable event with reproducible signatures. | overload_cnt, relock_cnt, calibration_refresh_cnt, temp |
H2-6 · JESD204B/C link planning: lanes, clocks, SYSREF, LMFC, deterministic latency
A JESD link can come up and still behave “strangely” when deterministic latency is not controlled, SYSREF capture is inconsistent, or board-level margin is eroded by routing, jitter, or supply noise. This section stays RU-local: ADC/DAC ↔ FPGA/SoC.
Objects The minimal JESD set that explains 90% of RU issues
RU design and validation can be organized around a small set of objects that directly map to bring-up behavior and timing repeatability.
Clocks Clock domains in RU: device clock, SerDes reference, SYSREF
JESD behavior is governed by relationships between three timing domains. The engineering focus is consistency and capture windows, not a textbook discussion of clock theory.
- Device clock: defines the ADC/DAC sampling timebase and the internal framing rhythm.
- SerDes reference: anchors the serial link and impacts lane margin and error behavior.
- SYSREF: aligns LMFC phase for deterministic latency (when used); capture consistency is critical.
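
The relationship between these clocks can be checked with simple arithmetic. The sketch below assumes a JESD204B-style link with 8b/10b encoding (JESD204C with 64b/66b uses different overhead and an extended multiblock clock, which is not modeled here). The function names and the example configuration are hypothetical.

```python
from fractions import Fraction

def jesd204b_link(fs_hz, converters, lanes, bits_per_sample,
                  samples_per_frame, frames_per_multiframe):
    """Derive frame clock, lane rate, and LMFC for an 8b/10b JESD204B-style link.

    Uses the usual parameter meanings: M=converters, L=lanes, N'=bits_per_sample,
    S=samples_per_frame, K=frames_per_multiframe. fs_hz must be an integer.
    """
    # F: octets per frame per lane
    octets_per_frame = Fraction(converters * samples_per_frame * bits_per_sample, 8 * lanes)
    frame_clk_hz = Fraction(fs_hz, samples_per_frame)
    lane_rate_bps = frame_clk_hz * octets_per_frame * 10   # 10 line bits per octet (8b/10b)
    lmfc_hz = frame_clk_hz / frames_per_multiframe
    return {"F": octets_per_frame, "frame_clk_hz": frame_clk_hz,
            "lane_rate_bps": lane_rate_bps, "lmfc_hz": lmfc_hz}

def sysref_is_valid(sysref_hz, lmfc_hz):
    """A periodic SYSREF should be an integer sub-multiple of the LMFC frequency."""
    return (Fraction(lmfc_hz) / Fraction(sysref_hz)).denominator == 1

# Hypothetical configuration: 245.76 MSPS, M=4, L=4, N'=16, S=1, K=32.
link = jesd204b_link(245_760_000, 4, 4, 16, 1, 32)
print(float(link["lane_rate_bps"]) / 1e9, "Gbps per lane")
print(float(link["lmfc_hz"]) / 1e6, "MHz LMFC")
print(sysref_is_valid(960_000, link["lmfc_hz"]), sysref_is_valid(1_000_000, link["lmfc_hz"]))
```

A SYSREF that is not an integer sub-multiple of LMFC will land on a different LMFC phase each time it is captured, which is one way a link can come up cleanly yet show inconsistent latency across resets.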
Board Why “strange performance” happens: routing, jitter, and supply noise
When margin is thin, the link may remain up while quality degrades intermittently. RU proof requires lane-resolved counters and a bring-up state log.
| Board-level factor | Typical symptom | RU-side evidence |
|---|---|---|
| Channel loss / reflections | Lane errors appear on specific lanes; behavior worsens at temperature. | Lane error counters by lane; BER/PRBS margin trend. |
| Jitter / reference instability | Intermittent alignment issues; deterministic latency drifts across resets. | Bring-up stage where failures occur; LMFC lock / SYSREF capture events. |
| Supply noise on SerDes/PLL | “Good then bad” behavior under load or temperature; bursts of lane errors. | Error bursts correlated with rail telemetry and temperature bin changes. |
| SYSREF distribution | Link up but phase repeatability is inconsistent; startup-to-startup changes. | SYSREF capture window violations; deterministic latency stability test fails. |
Bring-up Deterministic bring-up requires a state log (RU-local)
Field reliability depends on knowing exactly where bring-up fails and whether failures are lane-specific or systemic. A minimal RU log should record both the bring-up stage and lane-resolved errors.
| Log item | What it explains | Example fields |
|---|---|---|
| Bring-up stage | Shows whether failures occur early (link training) or late (alignment / data phase). | stage_id, fail_reason, retry_cnt |
| Lane alignment / deskew | Identifies weak lanes and boundary conditions (temperature, load). | lane_id, align_err_cnt, elastic_events |
| SYSREF / LMFC events | Explains deterministic latency repeatability across resets. | sysref_seen, capture_ok, lmfc_phase_id |
| BER / PRBS | Separates physical margin issues from alignment/latency issues. | ber, prbs_lock, time_window |
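
One way to use lane-resolved counters is to compare two snapshots and ask whether errors are confined to a subset of lanes. The helper names and the example counter values below are hypothetical; the interpretation hints follow the board-level table earlier in this section.

```python
def counter_delta(before, after):
    """Per-lane error increase between two counter snapshots (dicts keyed by lane_id)."""
    return {lane: after[lane] - before[lane] for lane in after}

def classify_lane_errors(delta_by_lane):
    """Label an error pattern as clean, lane-specific, or systemic."""
    bad = sorted(lane for lane, cnt in delta_by_lane.items() if cnt > 0)
    if not bad:
        return ("clean", [])
    if len(bad) == len(delta_by_lane):
        # Every lane degraded together: look at shared clock or supply first.
        return ("systemic", bad)
    # Only some lanes degraded: look at channel loss or reflections on those lanes.
    return ("lane-specific", bad)

before = {0: 0, 1: 5, 2: 0, 3: 0}
after = {0: 0, 1: 42, 2: 0, 3: 0}
print(classify_lane_errors(counter_delta(before, after)))
```

On a single-lane link the two non-clean labels cannot be told apart, so PRBS margin and rail telemetry are still needed to separate physical-channel issues from shared-reference issues.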
H2-7 · Clock tree and jitter budgeting (LO, converters, JESD, Ethernet)
In an O-RU, clocks are not a shared utility—they are a performance boundary. A “working” RU can still fail EVM, create spurs, or show intermittent JESD/Ethernet errors if the clock tree is not budgeted, cleaned, and instrumented per branch.
Consumers Three clock consumer classes and what each one punishes
A single root clock typically fans out into multiple domains, but the consumers care about different noise shapes and failure signatures. Jitter budgeting starts by separating these classes and giving each a branch-specific budget.
| Consumer | What it is sensitive to | RU-visible symptom |
|---|---|---|
| LO PLL / Synthesizer | Phase noise and reference contamination that lifts skirts or injects deterministic spurs. | Spur map changes, adjacent leakage shifts, EVM degrades without obvious power change. |
| ADC/DAC sampling | Integrated sampling jitter that directly reduces SNR and widens error vectors. | EVM worsens; SNR drops; performance changes with clock-cleaner mode or temperature bin. |
| SerDes / Ethernet PHY | Reference instability that shrinks eye margin and raises BER/PCS error probability. | CRC/PCS errors, link flaps under load, error bursts correlated with rail noise or heat. |
Cleaner What a jitter-cleaner PLL must do in a RU
In practice, a RU clock IC is a system function: it cleans noise, distributes a stable reference, and provides holdover behavior when external references degrade or disappear. The key is to budget additive jitter per stage and prove it at measurement points.
- Clean: shape input noise so downstream domains meet their budget (loop bandwidth and reference quality matter).
- Distribute: fanout and buffers add their own jitter—each branch must account for additive contributions.
- Holdover: keep a stable output when references are lost (validated in H2-8 as a timing requirement).
Budget A RU-friendly budgeting rule: root → cleaner → fanout → branch load
A usable budget is not a single “total jitter” number. It is a per-branch limit that includes additive noise from each distribution step. Keep the math simple in documentation: define the branches, list the contributors, then verify each measurement point.
| Branch | Budget decomposition (placeholders) | Primary outcome |
|---|---|---|
| LO reference | Budget (xx fs) = root + cleaner additive + fanout additive + local coupling | Spur / phase-noise driven EVM |
| ADC/DAC sampling | Budget (xx fs) = root + cleaner additive + buffer additive + board coupling | SNR / EVM floor |
| JESD clocks | Budget (xx ps) = ref source + distribution additive + supply/jitter coupling | Lane margin / error bursts |
| Ethernet PHY ref | Budget (xx ps) = ref source + distribution additive + EMI/rail coupling | BER → CRC/PCS errors |
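
When the contributors in a branch are uncorrelated random jitter, they are commonly combined as a root-sum-square rather than a plain sum, and the result can be related to a converter SNR ceiling with the standard relation SNR = −20·log10(2π·f_in·σ_jitter). The sketch below assumes uncorrelated contributors; the function names and the example numbers are illustrative, not budget values from this page.

```python
import math

def rss_jitter_fs(contributors_fs):
    """Combine uncorrelated RMS jitter contributors (in fs) by root-sum-square."""
    return math.sqrt(sum(j * j for j in contributors_fs))

def jitter_limited_snr_db(f_in_hz, jitter_rms_s):
    """SNR ceiling set by sampling-clock jitter for a sine input at f_in_hz."""
    return -20 * math.log10(2 * math.pi * f_in_hz * jitter_rms_s)

# Hypothetical sampling-clock branch: 60 fs from the cleaner output, 80 fs from the buffer.
total_fs = rss_jitter_fs([60, 80])
snr_db = jitter_limited_snr_db(1e9, total_fs * 1e-15)
print(total_fs, round(snr_db, 2))
```

Deterministic contributors such as supply-coupled spurs do not follow the root-sum-square rule and are better tracked separately in the spur map.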
Symptoms Bad clock symptoms: map the symptom to a measurement point
Symptoms become actionable when a RU can connect them to a branch and a measurement point. Keep the mapping short and testable.
| Symptom | Most likely branch | Minimum evidence |
|---|---|---|
| EVM degrades | ADC/DAC sampling clock or LO reference. | EVM vs integrated jitter trend (same RF power, same temperature bin). |
| Spurs appear | LO reference contamination or coupling into PLL loops. | Spur map repeatability; changes with clock-cleaner mode or rail condition. |
| JESD errors | JESD ref/device clock margin and rail noise coupling. | Lane error bursts aligned with jitter changes at the branch measurement point or with temperature transitions. |
| CRC/PCS errors | Ethernet PHY reference domain and local coupling. | CRC/PCS error counters correlated with ref quality or rail noise events. |
Proof Evidence set: phase noise/jitter points + EVM curve + spur map
RU clocking must be proven as a measurable system. The most useful artifacts are branch-level measurement points and one performance curve per sensitive domain.
- Measurement points list: root reference, cleaner outputs, fanout outputs, and each branch endpoint.
- EVM vs integrated jitter: demonstrate monotonic sensitivity for the converter path under controlled conditions.
- Spur map: show repeatable spur signatures and how they move with clock/rail conditions.
H2-8 · Timing and synchronization (SyncE/PTP/1PPS/ToD): what the RU must guarantee
RU synchronization is a contract: which inputs are supported, how the timebase is selected and disciplined, what happens during reference loss, and which evidence the RU exposes (time error, lock/holdover state, and source switching history).
Inputs Sync inputs commonly present on an O-RU
RU timing interfaces typically include frequency references, time references, and packet-based timing. The key is not protocol detail, but the RU’s guarantees and observable state.
Use How the RU uses timing internally (RU-local distribution)
Timing is consumed by the RU timebase and then distributed into internal domains that require repeatable phase and timestamps. This section stays RU-local: distribution to LO reference, JESD-related domains, and timestamp units.
| RU consumer | Why it needs disciplined timing | What to observe |
|---|---|---|
| Clock-tree root | Defines the stable reference that all performance branches inherit (H2-7). | Locked/holdover, TE, source |
| LO distribution | Maintains repeatable phase behavior and reduces time-varying spurs/quality shifts. | TE trend vs EVM/spurs |
| JESD domains | Supports consistent timing behavior and reduces error bursts due to reference drift. | Lane errors vs TE bins |
| Timestamp unit | Ensures consistent time tagging and externally visible alignment evidence. | PTP stats + TE |
Holdover Reference loss behavior: controlled degradation, not silent drift
Holdover must be specified as an RU state with alarms, numeric drift evidence, and a recovery strategy. The goal is predictable behavior during sync loss and transparent proof after recovery.
| State | What the RU does | Evidence to expose |
|---|---|---|
| Locked | Disciplines timebase to selected reference; maintains stable distribution. | sync_source, sync_state=locked, TE |
| Holdover | Free-runs on the local oscillator with defined drift behavior and explicit alarms. | sync_state=holdover, holdover_elapsed, TE drift |
| Degraded / Out-of-spec | Signals that timing contract is exceeded; optional service limitation is a system decision. | alarm_id, TE threshold crossing, switch history |
Evidence What the RU must report for validation and operations
RU timing must be auditable. The minimum evidence set should make “timing issues” diagnosable without guessing.
| Field | Purpose |
|---|---|
| sync_source | Which input is currently selected (PTP / SyncE / 1PPS-ToD / Local). |
| sync_state | locked / holdover / degraded / out_of_spec. |
| time_error (TE) | Numeric error metric and a trend over time (curve evidence in validation). |
| holdover_elapsed | How long the RU has been running without an external reference. |
| source_switch_history | Count, timestamps, and reason code for every source switch. |
| ptp_stats (names only) | Offset/delay/servo state counters as fields, without protocol exposition. |
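One possible way to hold this evidence set in software is sketched below. The class names, units (`_ns`, `_s`), and reason strings are assumptions made for illustration; only the field names in the table come from this page, and `ptp_stats` is omitted for brevity.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SourceSwitch:
    timestamp_s: float
    old_source: str
    new_source: str
    reason: str              # free-form reason code (hypothetical values)


@dataclass
class TimingEvidence:
    sync_source: str = "Local"
    sync_state: str = "holdover"
    time_error_ns: float = 0.0
    holdover_elapsed_s: float = 0.0
    source_switch_history: list = field(default_factory=list)

    def switch_source(self, now_s: float, new_source: str, reason: str) -> None:
        """Record every switch before changing the selected source."""
        self.source_switch_history.append(
            SourceSwitch(now_s, self.sync_source, new_source, reason))
        self.sync_source = new_source

    def to_json(self) -> str:
        """Serialize the whole evidence set, including the switch history."""
        return json.dumps(asdict(self), sort_keys=True)
```

Routing every source change through `switch_source` keeps the count, timestamps, and reason codes in one place, so the history cannot silently lag behind the selected source.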
Tests Validation proof: TE curve, lock time, and holdover drift
Timing validation should be curve-driven and state-aware. An RU passes when the TE curve and state transitions are consistent across resets and temperature bins.
- TE curve: locked → reference loss → holdover → recovery, with clear state markers.
- Lock time: distribution of time-to-locked across restarts.
- Holdover drift: TE growth vs time in holdover; validate threshold crossings and alarms.
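The holdover drift check can be reduced to two numbers: the average TE growth rate and the time of the first threshold crossing. The sketch below uses a plain least-squares slope. The sample format `(t_s, te_ns)` and the function name are assumptions for illustration.

```python
def holdover_drift(samples, te_limit_ns):
    """samples: time-ordered (t_s, te_ns) pairs captured while in holdover.

    Returns (slope_ns_per_s, first_crossing_t_s or None).
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_te = sum(te for _, te in samples) / n
    # Ordinary least-squares slope of TE versus time.
    num = sum((t - mean_t) * (te - mean_te) for t, te in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = num / den
    # First sample whose absolute TE exceeds the limit, if any.
    crossing = next((t for t, te in samples if abs(te) > te_limit_ns), None)
    return slope, crossing
```

The crossing time returned here can then be compared against the timestamp of the logged alarm to confirm that the alarm fired when the threshold was actually exceeded.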
H2-9 · eCPRI/Ethernet fronthaul hardware (PHY, timestamping, resilience, modules)
A fronthaul link being “up” does not prove it is stable. RU-side hardware must be partitioned into measurable segments, with explicit tap points and counters that make link quality, latency, and timestamp stability auditable.
Partition RU-side fronthaul datapath: what ends where
The goal is a segment-by-segment view that separates “traffic moves” from “margin exists” and “timestamps stay stable”. Keep the scope RU-local: MAC/PCS/PHY/SerDes and the external module/interface.
| Segment | What to verify (RU-local) | Evidence hook |
|---|---|---|
| MAC (SoC/FPGA) | Queueing and internal timing boundaries; where hardware timestamping is taken. | TS tap point; latency deltas |
| PCS | Encoding / FEC behavior and error observability under stress conditions. | PCS/FEC counters |
| PHY | Link training stability, margin sensitivity vs temperature/power events. | CRC/PCS error bursts |
| SerDes | Reference-clock cleanliness and coupling effects at interface level. | BER proxy via CRC/PCS |
| SFP/QSFP module | LOS behavior and DDM trends (temperature, Tx/Rx power drift). | DDM/LOS events |
Tap points Timestamp and delay observability: 2–3 taps that matter
Timestamp stability becomes diagnosable only when its measurement point is explicit. Use a small set of tap points that can be correlated with counters and environmental telemetry.
- Tap-A (MAC TS): hardware timestamp boundary used for stability assessment (distribution over time and temperature).
- Tap-B (PCS/PHY): counter boundary where CRC/PCS/FEC error bursts are visible (quality and margin proxy).
- Tap-C (Module/I/O): LOS/DDM boundary for optical/electrical interface symptoms (drift and flap correlation).
Resilience “Up but not stable”: failure modes and the RU-side counters that separate them
RU field issues usually look similar at first (drops, retries, missing deadlines). The RU must separate them using counters and event timing.
| Observed symptom | Likely RU-local root class | Minimum proof |
|---|---|---|
| Error bursts without link down | Margin sensitivity (temperature/power coupling) in PHY/SerDes domain. | CRC/PCS/FEC burst timing + temperature bin |
| Link flaps | LOS/LOF events, module interface instability, or hard threshold crossing. | LOS/LOF + flap statistics + DDM snapshot |
| Timestamp instability | TS tap jitter/drift due to reference quality at interface level. | TS delta distribution vs temperature/load |
| Slow drift → sudden failure | Module/connector degradation or thermal drift reducing margin. | DDM trend + rising counters prior to flap |
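The table above can be read as a small decision rule over RU-visible counters. The following is a rough sketch of that rule. The class labels, argument names, and both thresholds are hypothetical and would need to be replaced with values from the actual hardware and module datasheets.

```python
def classify_fronthaul(link_flaps, error_bursts, ts_p99_ns, rx_power_drift_db,
                       ts_limit_ns=50.0,       # placeholder threshold
                       drift_limit_db=1.0):    # placeholder threshold
    """Map counters from one observation window to likely root classes."""
    classes = []
    if link_flaps > 0:
        if abs(rx_power_drift_db) >= drift_limit_db:
            # Flaps preceded by a DDM trend: slow drift, then sudden failure.
            classes.append("module_or_thermal_degradation")
        else:
            classes.append("los_lof_or_module_instability")
    elif error_bursts > 0:
        # Errors without any link-down event point at PHY/SerDes margin.
        classes.append("phy_serdes_margin")
    if ts_p99_ns > ts_limit_ns:
        classes.append("timestamp_reference_quality")
    return classes
```

The function returns a list because timestamp instability can coexist with any of the other classes.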
Modules RU module choices: what to care about (without turning into an optics chapter)
Modules are treated as observable endpoints. The RU should rely on module diagnostics and link counters, not assumptions.
- DDM/DOM completeness: temperature + Tx/Rx power + alarm flags must be readable and logged.
- LOS behavior: verify LOS thresholds and persistence; correlate with flap events.
- CDR/retimer (mention only): used as an engineering lever to restore margin when board/channel constraints demand it.
Validation RU fronthaul acceptance checklist (hardware-focused)
A robust checklist proves stability across environmental bins and operational stress, using only RU-visible evidence.
- Counters: CRC / PCS / FEC, LOS / LOF, flap count (with timestamps).
- TS stability: delta distribution (p50/p99) vs temperature bins and traffic stress.
- Correlation: error bursts aligned with module DDM changes and RU power/thermal events.
- Recovery behavior: flap recovery time and post-recovery counter reset/continuity policy (documented).
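The TS stability item can be computed directly from logged timestamp deltas. This sketch groups samples by temperature bin and reports p50/p99 using the nearest-rank percentile definition. The input format `(temperature_bin, delta_ns)` is an assumption.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def ts_stability_by_bin(samples):
    """samples: iterable of (temperature_bin, delta_ns) pairs."""
    bins = defaultdict(list)
    for temp_bin, delta_ns in samples:
        bins[temp_bin].append(delta_ns)
    return {b: {"p50": percentile(v, 50), "p99": percentile(v, 99)}
            for b, v in bins.items()}
```

Nearest-rank is used here because it always returns a value that was actually observed, which is convenient when a p99 outlier has to be traced back to a specific log entry.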
H2-10 · Telemetry, protection, and thermal (field stability and traceability)
Field stability is achieved by closing the loop between measured stress (thermal, power, VSWR), controlled actions (derating ladder), and auditable evidence (logs and counters) that explain any RF KPI drift.
Telemetry RU signals that directly predict RF KPI drift
Telemetry must prioritize signals that move PA operating point or trigger non-linear behavior in a repeatable way. Keep the set minimal but sufficient for correlation.
Protection Derating ladder: controlled degradation instead of abrupt collapse
Protection is most effective when it is staged and hysteretic. Each level has a clear trigger, an action, and a recovery condition. This avoids oscillation and prevents silent out-of-spec operation.
| State | Trigger examples (2 signals) | Action and evidence |
|---|---|---|
| Normal | Temp within nominal band; VSWR within nominal band. | Full performance; record periodic snapshots. |
| Warm | Rising hot-spot temp; fan approaching high duty. | Increase cooling, minor bias guard; log temp bin entry. |
| Hot (Derate) | PA temp high; PA current near limit. | Reduce power; optionally disable a subset of carriers; log action_taken. |
| Critical | VSWR/reflected power high; temperature beyond safe limit. | Shutdown; raise alarm; capture fault_cause + snapshots. |
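A staged, hysteretic ladder of this kind could be sketched as a small state function. The entry thresholds, the hysteresis width, the VSWR limit, and the choice to latch the critical state until an external reset are all assumptions made for illustration, not values taken from this page.

```python
ORDER = ["normal", "warm", "hot", "critical"]
ENTER_C = {"warm": 70.0, "hot": 85.0, "critical": 100.0}  # placeholder degC
HYST_C = 5.0                                              # placeholder degC


def next_state(state, pa_temp_c, vswr, vswr_limit=3.0):
    """Return the next protection state for one telemetry sample."""
    if state == "critical":
        return "critical"  # assumed latched until an explicit reset
    if vswr >= vswr_limit or pa_temp_c >= ENTER_C["critical"]:
        return "critical"
    # Highest non-critical level whose entry threshold is met.
    target = "normal"
    for level in ("warm", "hot"):
        if pa_temp_c >= ENTER_C[level]:
            target = level
    if ORDER.index(target) >= ORDER.index(state):
        return target      # escalate immediately, or stay
    # Step down one level only after clearing the hysteresis band.
    if pa_temp_c < ENTER_C[state] - HYST_C:
        return ORDER[ORDER.index(state) - 1]
    return state
```

Escalation is immediate while recovery is one level at a time and gated by the hysteresis band, which is what prevents a temperature hovering near a threshold from toggling the state on every sample.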
Traceability Field logs: minimum schema that makes every incident reproducible
Logs should be field-replayable: what happened, why it happened, what action was taken, and what the RU looked like at that moment. Use a compact key set that ties thermal/power/VSWR to DPD/sync/link context.
| Field | Why it is non-optional |
|---|---|
| timestamp, uptime | Aligns events with counters and external site alarms; supports correlation windows. |
| temperature_bin | Explains repeatable KPI shifts across environmental conditions. |
| protection_state, action_taken | Proves controlled degradation and prevents “silent derate” ambiguity. |
| fault_cause | Distinguishes thermal, VSWR, rail OCP/OTP, and sensor faults. |
| dpd_state | Separates RF drift from linearization state changes (enabled/converging/frozen/fallback). |
| sync_state | Correlates timebase state (locked/holdover/degraded) with KPI stability and link timing evidence. |
| eth_error_burst, link_flap_count | Captures fronthaul instability context without protocol exposition. |
| pa_v, pa_i, pa_temp, board_temp, vswr | Provides the minimal snapshot needed to replay thresholds and validate protection correctness. |
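A simple completeness check helps ensure that no log record reaches storage without the full key set. The sketch below uses the field names from the table. The function name and the convention that a missing or `None` value counts as absent are assumptions.

```python
REQUIRED_FIELDS = (
    "timestamp", "uptime", "temperature_bin",
    "protection_state", "action_taken", "fault_cause",
    "dpd_state", "sync_state",
    "eth_error_burst", "link_flap_count",
    "pa_v", "pa_i", "pa_temp", "board_temp", "vswr",
)


def missing_fields(record: dict) -> list:
    """Return the required keys that are absent or None in a log record."""
    return [k for k in REQUIRED_FIELDS if record.get(k) is None]
```

A record is replayable only when `missing_fields(record)` returns an empty list. A fault-free record would still carry an explicit placeholder such as `"none"` for `fault_cause` rather than omitting the key.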
Validation Field stability proof: step-temperature + KPI hold + transition correctness
Validation is strongest when it is state-aware. Demonstrate that RF KPIs remain within limits across temperature bins, and that protection transitions occur at documented thresholds with consistent evidence.
- Step-temperature: apply discrete plateaus and record EVM/ACLR with telemetry and protection state.
- Transition correctness: verify trigger thresholds, hysteresis, and recovery conditions (no oscillation).
- Event replay: any derate/shutdown event must be explainable using the minimum log schema.
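The "no oscillation" requirement can be checked mechanically from the logged state transitions. This sketch flags any state that was entered and then left for the previous state within a minimum dwell time. The transition format `(t_s, state)` and the 60-second default are assumptions.

```python
def find_oscillations(transitions, min_dwell_s=60.0):
    """transitions: time-ordered list of (t_s, state) entries.

    Flags entries where the RU entered a state and fell back to the
    previous state before min_dwell_s had elapsed.
    """
    flagged = []
    for i in range(1, len(transitions) - 1):
        _, s_prev = transitions[i - 1]
        t_cur, s_cur = transitions[i]
        t_next, s_next = transitions[i + 1]
        if s_next == s_prev and (t_next - t_cur) < min_dwell_s:
            flagged.append((t_cur, s_cur))
    return flagged
```

The same check applies unchanged to `sync_state` transitions when looking for lock/holdover churn.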
H2-11 · Validation & troubleshooting playbook (symptom → root cause → proof)
This section provides a shortest-path field workflow for O-RU issues using RU-local evidence only: measurement points, counters, and structured logs. Each symptom maps to 2–3 likely root causes, required proof, a corrective action, and a retest criterion.
How to use The shortest-path rule (do not “theory dive” first)
- Step 1 — Classify: decide whether the dominant symptom is RF KPI, Clock/Sync, JESD, or Fronthaul.
- Step 2 — Prove: gather one hard evidence packet: spur map / jitter or TE curve / lane counters / CRC-PCS-FEC bursts.
- Step 3 — Act: apply the smallest RU-local corrective action (alignment, clock cleanup, rail/bias/thermal guard, SI constraint) and retest.
Symptom table Symptom → likely root causes → proof → fix → retest
| Symptom (RU-side) | Likely root causes (2–3) | Proof required (meas/counters/logs) | Corrective actions (RU-local) | Example MPN anchors |
|---|---|---|---|---|
| ACLR/EVM worsens or spurs jump | | | | LMX2594, Si5341A |
| DPD does not converge or is temperature-sensitive | | | | ADRV9009 |
| JESD intermittent lane errors | | | | AD9528, DS280DF810 |
| Sync unlock or frequent holdover | | | | ZL30732A, 8A34001 |
| Eth link flap or error bursts | | | | 88X3310, DS280DF810 |
Evidence scripts Proof as executable steps (copy into lab/field checklist)
Script S1 — RF KPI drift (EVM/ACLR/spurs)
- Capture spur map across the problematic condition (temperature bin or output power point).
- Measure integrated jitter / phase-noise proxy at the RU clock boundary; correlate with EVM change.
- Record log snapshot: dpd_state, temperature_bin, pa_v, pa_i, protection_state.
- Pass/Fail: after corrective action, spur lines reduce and EVM/ACLR return within acceptance limits at the same bin.
Script S2 — DPD convergence
- Run DPD on/off A/B at fixed output power; compare ACLR and EVM deltas.
- Check ORx operating region (no clipping; not near noise floor); lock the sampling reference.
- Repeat across temperature bins to confirm convergence stability (no “single-bin collapse”).
- Pass/Fail: convergence time is bounded and ACLR improvement is consistent across bins.
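The "consistent across bins" criterion in Script S2 can be made numeric by comparing the DPD on/off ACLR delta per temperature bin. In this sketch, ACLR is expressed in dBc (more negative is better), and both the minimum gain and the allowed spread are placeholder limits rather than values from this page.

```python
def dpd_gain_by_bin(aclr_by_bin, min_gain_db=10.0, max_spread_db=3.0):
    """aclr_by_bin: {bin: (aclr_dpd_off_dbc, aclr_dpd_on_dbc)}.

    Returns (gains_db, spread_db, passed).
    """
    # Improvement is positive when DPD-on ACLR is more negative than DPD-off.
    gains = {b: off - on for b, (off, on) in aclr_by_bin.items()}
    spread = max(gains.values()) - min(gains.values())
    passed = min(gains.values()) >= min_gain_db and spread <= max_spread_db
    return gains, spread, passed
```

A single bin with a much smaller gain than the others is the "single-bin collapse" pattern, and it shows up here as either a failed minimum gain or an excessive spread.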
Script S3 — JESD intermittent
- Log lane counters and bring-up state transitions around the event; identify burst timing.
- Validate SYSREF/LMFC relationship at RU clock distribution (SYSREF capture window has margin and is not sitting at an edge).
- Temperature sweep at fixed configuration; note bin where errors rise.
- Pass/Fail: lane errors do not occur over the full sweep and deterministic behavior remains stable.
Script S4 — Sync unlock / holdover churn
- Collect TE curve and sync_state transitions; annotate source changes.
- Force controlled source loss; verify deterministic fall to holdover and bounded drift.
- Restore source; verify lock time and transition hysteresis (no rapid toggling).
- Pass/Fail: churn disappears, drift stays within bound, and logs fully explain transitions.
Script S5 — Fronthaul error bursts / flaps
- Read CRC/PCS/FEC counters with timestamps; mark burst windows.
- Read module DDM/LOS snapshots aligned to bursts; correlate with temperature and optical power drift.
- Use RU tap points: MAC TS vs PCS counters vs module indicators to isolate segment.
- Pass/Fail: bursts stop under the same stress profile and flap count stays flat.
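Marking burst windows from timestamped counter reads, the first step of Script S5, could look like the sketch below. It assumes the counters are cumulative and sampled as `(t_s, count)` pairs. A decreasing count, for example after a counter reset, is treated as no new errors.

```python
def burst_windows(snapshots, min_delta=1):
    """snapshots: time-ordered (t_s, cumulative_error_count) pairs.

    Returns (t_start, t_end, errors) for each contiguous run of
    intervals in which the counter grew by at least min_delta.
    """
    windows = []
    current = None
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        delta = c1 - c0
        if delta >= min_delta:
            if current is None:
                current = [t0, t1, delta]   # open a new window
            else:
                current[1] = t1             # extend the open window
                current[2] += delta
        elif current is not None:
            windows.append(tuple(current))  # quiet interval closes it
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows
```

The resulting windows give the time ranges against which DDM/LOS snapshots and temperature telemetry are then aligned.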
H2-12 · FAQs (O-RU focused)
Each answer stays RU-side and ends with a “Proof hook” (what to measure/read/log) plus a quick next step.