
Edge Backhaul Node (Microwave/mmWave/Fiber)


An Edge Backhaul Node bridges microwave/mmWave or fiber physical links to Ethernet/OTN transport while distributing low-noise frequency/time across internal modules. In practice, success is proven by controlling LO/PLL jitter and interface stability—and by exporting counters/logs that make EVM, BER, and switchover behavior measurable in the field.

H2-1 · What it is: system role and hard boundaries

DEFINITION (for quick extraction)

An edge backhaul node bridges physical transport (microwave/mmWave RF/IF or fiber optics) to Ethernet/OTN bearer interfaces, while distributing frequency/time inside the node to keep modulation quality, link stability, and transport counters within spec.


What is visible in the field (ports → KPIs)

Start from what technicians and buyers can verify on day one: ports, counters, and pass/fail signals.

  • RF / IF ports → link quality signals (EVM, spurs), stability under temperature and power cycling.
  • Ethernet ports → throughput, link flap rate, PRBS/BER measurements, retimer/PHY margins.
  • OTN ports → pre/post-FEC BER, FEC correction counters, framing alarms, long-haul stability evidence.
  • Timing I/O (SyncE/PTP/1PPS/ToD) → in/out availability, holdover behavior, time/frequency alarms.
  • Management / OAM → logs, counters, alarm history (lock/unlock, switchover events, temperature excursions).

Boundary checks (anti-overlap, fixed wording)

  • Bearer + physical mapping only: focuses on PHY/RF/OTN/Ethernet integration and validation; does not cover MEC/UPF compute and service pipelines.
  • Time/frequency I/O and distribution: supports SyncE/PTP/1PPS/ToD interfaces and node-level holdover; does not cover GNSS disciplining and time-source internals.
  • OTN/fiber interfaces (not fiber-panel switching): covers transport framing/FEC/counters; does not cover optical power/loss AFE or relay-based panel switching.

What “done” looks like (evidence-first)

  • Every claimed capability maps to at least one counter/log/alarm that can be exported and trended.
  • Transient events (link retrain, reference switch, temperature step) are captured with pre/post windows to convert “rare” issues into repeatable evidence.
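The "every capability maps to evidence" rule can be enforced mechanically at review time. A minimal sketch, assuming a simple capability → artifact mapping; the capability and artifact names below are hypothetical, not a real node schema:

```python
# Hypothetical capability -> exported-evidence mapping (illustrative names only).
CLAIMED = {
    "EVM stability": ["evm_trend", "spur_scan"],
    "FEC margin": ["fec_corrected", "fec_uncorrected"],
    "Holdover": [],  # a claim with no exportable counter/log/alarm should fail review
}

def uncovered_claims(claims):
    """Return capabilities that map to no exportable counter/log/alarm."""
    return [name for name, artifacts in claims.items() if not artifacts]
```

Running the check against the sample mapping flags "Holdover" as a claim without trendable evidence, which is exactly the gap this section asks reviewers to close.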
Figure B1 — System boundary (ports, core blocks, timing distribution)
Boundary view: RF/IF conversion, LO/PLL quality, Ethernet/OTN bearer interfaces, and node-level timing I/O with observable evidence (counters/logs/alarms).

H2-2 · Deployment patterns and link budgets (engineering scenarios)

This section maps real deployment shapes to measurable KPIs, then connects each KPI to the internal block that usually dominates it.

Three common patterns (kept separate to avoid mixing)

PATTERN A

Microwave / mmWave backhaul (point-to-point, high capacity, often E-band)

  • Dominant KPIs: EVM margin, spurs, link stability under temperature and reference changes.
  • Most fragile block: LO/PLL phase-noise and conversion chain isolation.
  • What proves it: EVM trend vs. temperature; spur scan results; lock/unlock and switchover event logs.
PATTERN B

Fiber / OTN backhaul (long reach bearer, strong emphasis on statistics)

  • Dominant KPIs: pre/post-FEC BER, FEC correction margin, framing alarms, long-run stability.
  • Most fragile block: PHY/retimer margin and OTN framer/FEC behavior under stress.
  • What proves it: PRBS/BER reports, FEC counter baselines, link flap frequency, alarm timelines.
PATTERN C

Hybrid backhaul (wireless at the edge + fiber aggregation)

  • Dominant KPIs: availability, switchover hit, multi-interface recovery time.
  • Most fragile block: reference/time distribution and interface recovery behavior during transitions.
  • What proves it: before/after windows around interface retrain and reference switch; alarm + counter correlation.

KPI → dominant block → verification (field-friendly)

Each KPI (what users feel), the block that usually dominates it, and the evidence to capture:

  • Throughput (Gbps). Dominant block: Ethernet PHY / retimer margin, FEC overhead behavior, interface retrain frequency. Verify: traffic test + link flap count; PRBS; FEC counter baseline vs. stress.
  • BER (pre/post-FEC). Dominant block: SerDes/retimer equalization, optical/RF interface margin, framer/FEC robustness. Verify: PRBS BER report; post-FEC error trend; alarm timeline (frame loss, LOS/LOF).
  • EVM margin. Dominant block: LO/PLL phase noise, spurs and leakage paths, conversion chain isolation. Verify: EVM vs. temperature; spur scan; lock/unlock logs; switchover disturbance windows.
  • PDV / jitter (packet/time). Dominant block: timing distribution (SyncE/PTP I/O), holdover behavior, reference transitions. Verify: offset/jitter statistics (node I/O); reference change logs; alarm correlation.
  • Availability (hitless expectation). Dominant block: interface recovery and switchover handling, alarm thresholds, event logging completeness. Verify: drop/restore timestamps; counters around events (pre/post); reproducible switchover script.

Budget thinking (three parallel chains)

  • RF chain budget: phase noise / spurs → EVM margin → achievable modulation/capacity under temperature and aging.
  • Bearer chain budget: SerDes margin → BER (pre/post-FEC) → long-run stability with counters as proof.
  • Sync chain budget: time/frequency I/O quality → holdover + transitions → alarms and evidence windows for field issues.
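Budget arithmetic for the RF chain above can be sketched as a root-sum-square of independent EVM contributors, a common approximation for uncorrelated impairments. The contributor percentages and the required-EVM threshold below are illustrative, not derived from any standard:

```python
import math

def evm_rss(contributors_pct):
    """Combine independent EVM contributors (each in percent) as root-sum-square.
    A common budgeting approximation: valid only if impairments are uncorrelated."""
    return math.sqrt(sum(c * c for c in contributors_pct))

def evm_margin(required_pct, contributors_pct):
    """Positive margin means the budget closes at the target modulation order."""
    return required_pct - evm_rss(contributors_pct)
```

For example, phase noise 1.5%, residual IQ imbalance 1.0%, and PA compression 2.0% combine to about 2.69% RSS; against an illustrative 3.5% requirement the budget closes with roughly 0.8% margin.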
Figure B2 — KPI budget map (error sources tied to blocks + proof points)
KPI budget map: each external metric (EVM/BER/Throughput/PDV) is tied to the block that typically dominates it and to the proof artifacts that should be captured.

H2-3 · Reference architecture: a port-to-module profile (field-ready)

WHY THIS SECTION EXISTS

This architecture is organized as a repeatable profile from ports to internal blocks, so every KPI can be tied to a measurable counter, log, or alarm. The same block names are reused in later troubleshooting sections to keep the page consistent and searchable.

Fixed layers (do not reorder)

  • Port / Front-end → connectors, protection boundaries, and what enters the node.
  • Conversion / PHY → RF/IF conversion or optical-electrical PHY/retimers.
  • Framing / Mapping → OTN framing/FEC and mapping into Ethernet services.
  • Timing → SyncE/PTP/1PPS/ToD I/O, distribution, node-level holdover alarms.
  • Mgmt / OAM → counters/logs/alarms that prove the above layers.

Module checklist (Inputs / Outputs / KPIs / Evidence)

Microwave/mmWave front-end (RF/IF)
I/O: RF/IF ports ↔ conversion chain
KPIs: EVM sensitivity, spur levels
Evidence: EVM trend, spur scan list, temperature-linked alarms
Conversion chain (Mixer / IF filters)
I/O: IF ↔ RF (Tx/Rx)
KPIs: image rejection, LO leakage, group delay ripple
Evidence: spur signatures vs LO power, IQ calibration residuals
LO / PLL distribution
I/O: reference in → LO out to mixers / SerDes clock domains
KPIs: phase noise, lock stability, switchover disturbance
Evidence: lock/unlock logs, switchover events, jitter/offset windows
Optical module + PHY / Retimer (fiber/electrical)
I/O: optical/electrical ports ↔ SerDes lanes
KPIs: BER margin, equalization stability, retrain frequency
Evidence: PRBS BER report, link flap counters, LOS/LOF alarms
OTN framer + FEC (bearer integrity)
I/O: OTN bearer ↔ framed payload
KPIs: corrected/uncorrected errors, latency overhead budget
Evidence: pre/post-FEC BER, corrected counters, framing alarms timeline
Mapping to Ethernet services
I/O: framed payload ↔ Ethernet ports/services
KPIs: throughput stability, loss events around transitions
Evidence: traffic tests, drops around retrain/switchover, service counters
Timing I/O and distribution (node scope)
I/O: SyncE/PTP/1PPS/ToD in/out
KPIs: offset/jitter stats, holdover drift profile, alarms
Evidence: offset logs, holdover alarms, reference change records
Environmental health (only what this page needs)
I/O: temperature/voltage/current monitors → alarms
KPIs: stability under heat, repeatability of counters
Evidence: alarm history + counters correlated to temperature steps
Mgmt/OAM plane (proof artifacts)
I/O: counters/logs/alarms export
KPIs: completeness and time-correlation
Evidence: event timelines, “before/after” windows, trendable baselines
Figure B3 — Port → modules → port profile (single-page architecture map)
A field-ready map: ports and internal blocks are aligned to KPIs (EVM/Phase Noise/BER/Latency) and supported by Mgmt/OAM evidence.

H2-4 · Up/down conversion chain: where EVM and spurs are decided

ENGINEERING FOCUS

In microwave/mmWave backhaul, modulation quality is most often limited by the conversion chain and its LO behavior. The goal here is not broad RF theory, but a repeatable mapping from visible symptoms to the first measurement that reduces uncertainty.

Typical paths (minimum necessary view)

  • Tx: Baseband/IF → Mixer → RF → PA → RF out
  • Rx: RF in → LNA → Mixer → IF → baseband

Architecture choice (only what changes EVM/spurs/filtering)

For each option: what it improves, and where it becomes fragile.

  • Super-heterodyne. Improves: image filtering opportunities, spur separation, more controllable IF shaping. Fragile: more blocks and conversions, so more places for LO feedthrough and group delay ripple to enter.
  • Direct conversion. Improves: simpler frequency plan and fewer conversions; potentially lower latency. Fragile: highly sensitive to LO leakage, I/Q imbalance, and DC/near-DC impairments that show up as EVM loss.

Failure modes (symptom → likely cause → first measurement)

Symptom: EVM degrades suddenly or at a specific temperature
Likely causes: LO phase-noise change, PLL marginal lock, group delay ripple shift
First measurement: EVM vs temperature step; PLL lock/unlock log; spur scan before/after
Symptom: a strong spur appears near the channel
Likely causes: LO leakage, mixing products, reference spurs, image folding
First measurement: spectrum spur list; spur amplitude vs LO power; confirm image frequency location
Symptom: capacity drops at higher modulation orders
Likely causes: phase noise floor, IQ imbalance residual, PA/LNA compression edge
First measurement: EVM distribution histogram; IQ calibration residual; power sweep vs EVM
Symptom: RX noise floor rises and comes/goes
Likely causes: LO leakage coupling, intermodulation, front-end isolation drift
First measurement: noise floor vs LO on/off; spur scan; compare with shield/grounding states
Symptom: EVM changes when switching references or links retrain
Likely causes: LO distribution disturbance, marginal timing domains, temporary unlock
First measurement: event window capture (pre/post); lock events; EVM/BER correlation in time
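The "confirm image frequency location" step above can be sketched numerically, assuming a single real mixing stage (image at 2·f_LO − f_RF) and low-order m·f_LO ± n·f_IF products. The frequencies in the usage note are illustrative:

```python
def image_frequency(f_rf_ghz, f_lo_ghz):
    """Image lies on the opposite side of the LO at the same IF spacing:
    f_image = 2*f_LO - f_RF (single real mixing stage assumed)."""
    return 2.0 * f_lo_ghz - f_rf_ghz

def spurs_near_channel(f_lo_ghz, f_if_ghz, f_center_ghz, half_bw_ghz, max_order=3):
    """List |m*f_LO +/- n*f_IF| products of low order that land inside the channel."""
    hits = []
    for m in range(max_order + 1):
        for n in range(max_order + 1):
            if m == 0 and n == 0:
                continue
            for sign in (+1, -1):
                f = abs(m * f_lo_ghz + sign * n * f_if_ghz)
                if abs(f - f_center_ghz) <= half_bw_ghz:
                    hits.append((m, sign * n, round(f, 6)))
    return hits
```

With a 20 GHz LO and 2 GHz IF, an 18 GHz channel sees the (1, −1) product land in-band, and the image of an 18 GHz RF input sits at 22 GHz, which is where a spur scan should look first.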
Figure B4 — Mixing and spur paths (LO leak, image, I/Q, IMD, group delay)
Keep text minimal: LO leak, image folding, I/Q imbalance, IMD, and group-delay ripple are common mechanisms that manifest as EVM loss and spur violations.

H2-5 · Low phase-noise PLL/LO: from L(f) to jitter to EVM/BER

ENGINEERING SUMMARY (extractable)

Phase noise L(f) becomes integrated jitter after the loop and distribution chain, and that jitter reduces constellation separation (EVM margin) and effective demodulation thresholds—especially at high-order modulation and tight link budgets. This section focuses on node-local reference choices, loop-bandwidth trade-offs, and where jitter cleaning is most effective.


The practical chain: what to measure and what it impacts

Phase noise L(f)
What it is: noise around the carrier at different frequency offsets
Why it matters: different offset regions dominate different impairments (close-in vs far-out)
Evidence: spur/phase-noise report + lock stability logs
Integrated jitter
What it is: time-domain summary of noise over a bandwidth window
Why it matters: reduces sampling/LO stability and increases demodulation uncertainty
Evidence: jitter metrics + event-window correlation (reference switch → jitter/EVM spike)
EVM / BER margin
What it is: modulation quality and error margin under real channel conditions
Why it matters: sets achievable modulation order and capacity at a given link budget
Evidence: EVM histogram/trend, BER trend, capacity step-down events
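The L(f) → jitter → EVM chain above can be sketched with trapezoidal integration of a piecewise-linear phase-noise curve. The flat −90 dBc/Hz profile in the usage note is illustrative, and the small-angle EVM mapping is an approximation that holds only for small phase jitter:

```python
import math

def integrated_phase_jitter(offsets_hz, l_dbc_hz, carrier_hz):
    """Integrate an L(f) curve (SSB, dBc/Hz, sampled at the given offsets)
    into RMS phase jitter. Trapezoidal integration of 10^(L/10) over offset
    frequency; the factor 2 accounts for both sidebands."""
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        f0, f1 = offsets_hz[i], offsets_hz[i + 1]
        s0 = 10 ** (l_dbc_hz[i] / 10.0)
        s1 = 10 ** (l_dbc_hz[i + 1] / 10.0)
        area += 0.5 * (s0 + s1) * (f1 - f0)
    phi_rms_rad = math.sqrt(2.0 * area)
    jitter_s = phi_rms_rad / (2.0 * math.pi * carrier_hz)
    return phi_rms_rad, jitter_s

def evm_floor_pct(phi_rms_rad):
    """Small-angle approximation: phase jitter of sigma_phi radians
    contributes an EVM floor of roughly 100*sigma_phi percent."""
    return 100.0 * phi_rms_rad
```

For a flat −90 dBc/Hz over a 1 MHz integration band, the RMS phase jitter is about 0.045 rad, which on a 10 GHz carrier is under 1 ps of jitter but already a ~4.5% EVM floor: high-order modulation is lost well before the jitter number looks alarming.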

Node reference choice (TCXO vs OCXO): boundary and decision cues

This page stays within node-local behavior: short disruptions, brief reference instability, and holdover alarms—without expanding into GNSS disciplining or atomic sources.

For each reference: what it is best for (node-local), and the typical risks to verify.

  • TCXO. Best for: cost/power-sensitive nodes, moderate holdover needs, fast warm-up. Verify: temperature sensitivity shows as drift/offset during transitions; check reference switch windows + alarms.
  • OCXO. Best for: stricter stability during short reference disruptions and tighter EVM margins. Verify: power/thermal constraints; check warm-up behavior and stability under enclosure temperature ramps.

Verification anchor: capture “before/after” windows around reference changes (or induced disturbances) and correlate with EVM/BER and PLL lock status.

Loop bandwidth: the three dominance regions (how to tune and how to accept)

Close-in region (near carrier)
Dominates: reference-related noise, reference spurs, marginal lock behavior
Engineering move: improve reference quality and spur control; validate lock stability
Acceptance: lock/unlock frequency, spur list stability, “no surprise” events
Mid region (around loop BW)
Dominates: jitter transfer and disturbance recovery behavior
Engineering move: set loop bandwidth and damping to balance tracking vs noise injection
Acceptance: reference switch window shows limited EVM/BER spike and fast settle
Far-out region (higher offsets)
Dominates: VCO noise floor, distribution buffers, multiplication chain noise
Engineering move: choose lower-noise VCO/distribution and improve isolation
Acceptance: noise floor and spur baseline remain stable across load/temperature

Where to place jitter cleaning inside the node (most effective first)

RF LO path (most direct EVM impact)
Goal: reduce LO-related phase noise and spurs seen by mixers
Prove it: EVM improves without changing channel conditions; spur list stabilizes
SerDes / Retimer path (BER and stability)
Goal: improve link margin and reduce retrain/link-flap probability
Prove it: PRBS BER improves; link retrain counters drop; FEC pressure trends lower
SyncE / timing output path (I/O quality and alarms)
Goal: stabilize timing I/O under reference changes and short disruptions
Prove it: timing alarms reduce; offset/jitter logs tighten during transitions
Figure B5 — Noise contribution + loop bandwidth (dominance regions without curves)
A curve-free view: close-in/mid/far-out regions have different dominant contributors; loop bandwidth shifts which source dominates and how fast the system settles.

H2-6 · Ethernet / OTN interfaces: bearer mapping and clock recovery boundaries

BOUNDARY (anti-overlap)

This section covers interface integrity and bearer mapping evidence (PHY/retimer, framer/FEC, counters and alarms). It does not cover switching, queue behavior, or TSN scheduling, which belong to edge switching and boundary-clock switch topics.


Ethernet side (PHY / Retimer / SerDes): what breaks stability first

Training / equalization stability
Failure symptom: repeated retrain, link flaps, rate fallback
What to capture: retrain counters, link-up/down timeline, temperature correlation
Jitter and BER margin
Failure symptom: BER rises under stress, sporadic packet loss bursts
What to capture: PRBS BER reports, eye margin summaries (if available), error counters
FEC overhead awareness (where applicable)
Failure symptom: “throughput looks fine” but errors keep accumulating
What to capture: corrected/uncorrected trend and whether it climbs with temperature/load
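The "errors keep accumulating" symptom above reduces to a slope test on per-window deltas of the corrected-error counter (cumulative counters always climb; the deltas are what matter). A minimal sketch; the slope threshold is a policy choice, not a fixed value:

```python
def fec_pressure_slope(delta_samples):
    """Least-squares slope of per-window corrected-error deltas over sample
    index. A persistently positive slope means the link is 'alive but
    stressed' and drifting toward uncorrectables."""
    n = len(delta_samples)
    xbar = (n - 1) / 2.0
    ybar = sum(delta_samples) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(delta_samples))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def fec_pressure_bounded(delta_samples, max_slope):
    """True when corrected-error growth per interval stays within policy."""
    return fec_pressure_slope(delta_samples) <= max_slope
```

A flat delta series (constant FEC activity) passes; a geometrically growing one fails, which is the trend-vs-snapshot distinction this section insists on.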

OTN side (Framer / FEC / mapping): why it exists in this node

OTN is treated here as an evidence-rich bearer layer: it provides framing alarms and FEC statistics that prove long-run stability under real stress. The focus stays on what it adds inside the node (verification items), not on network-level service engineering.

Framing alarms and long-run evidence
Verification: LOS/LOF, frame loss, mapping alarms with timestamps
Use: tie alarm bursts to retrain events and environmental steps
Pre-FEC / Post-FEC BER and counters
Verification: baseline + drift over time (not a single snapshot)
Use: distinguish “margin shrinking” vs “rare transient” behavior
Mapping into Ethernet services (node scope)
Verification: stable throughput and minimal drop spikes around transitions
Use: event-window capture (retrain/switchover) to prove no hidden instability

Interface checklist: run full, run stable, prove stable

For each checklist item: the failure symptom, and the primary evidence.

  • Run full rate (sustained). Symptom: rate fallback, unstable lane bring-up. Evidence: throughput test + link training status + lane error counters.
  • Run stable BER (stress/temperature). Symptom: BER spikes, intermittent drops. Evidence: PRBS BER report; time-correlated error counter trend.
  • FEC pressure stays bounded. Symptom: corrected errors climb continuously. Evidence: corrected/uncorrected counters; pre/post-FEC BER trend.
  • No hidden instability around events. Symptom: drops during retrain/switchover. Evidence: event windows (pre/post) + drop counters + alarm timeline.
  • Alarms are actionable. Symptom: “no alarm” yet link is unstable. Evidence: alarm taxonomy + timestamped logs; cross-check with counters.
Figure B6 — Electrical/Optical → Retimer → Framer/FEC → Service (counters attached)
Interface flow is kept at bearer level. Each stage has a small set of counters that prove “run full, run stable, and stay stable during events.”

H2-7 · Precision timing in-node: SyncE + PTP landing points (node scope only)

IN-NODE FIVE-STEP LOOP (extractable)

Precision timing in this node is implemented as a closed loop: Inputs → Selection → Distribution → Monitoring → Alarms. SyncE provides frequency, PTP provides time, and 1PPS/ToD provides auxiliary alignment/verification. The goal is stable delivery to RF/clock consumers with evidence-based alarms and event-window proofs.


Inputs: three timing signal classes and what each means

SyncE frequency
Meaning: a frequency reference transported over Ethernet PHY timing
Interface focus: SyncE-capable port/clock recovery domain
Common misuse: treating “frequency OK” as “time OK” without time validation
PTP time
Meaning: time-of-day and phase alignment, validated through offset statistics
Interface focus: PTP packet path and timestamp domain (TSU boundary)
Common misuse: relying on a single snapshot instead of trends + event windows
1PPS / ToD auxiliary
Meaning: auxiliary alignment, verification, or local distribution aid
Interface focus: 1PPS pin + ToD interface (format dependent)
Common misuse: ignoring distribution path noise and treating the pin as “ideal”
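The "trends + event windows, not snapshots" rule above can be sketched as a minimal windowed summary of PTP offset samples, assuming offsets are already exported in nanoseconds:

```python
def offset_window_stats(offsets_ns):
    """Summarize a window of PTP offset samples: mean, peak excursion, and
    spread. A single snapshot hides exactly the spikes this exposes."""
    n = len(offsets_ns)
    mean = sum(offsets_ns) / n
    peak = max(abs(x) for x in offsets_ns)
    var = sum((x - mean) ** 2 for x in offsets_ns) / n
    return {"mean_ns": mean, "peak_ns": peak, "stdev_ns": var ** 0.5}
```

A window containing one 120 ns spike reports a modest mean but a large peak and spread; any single sample read at the wrong moment would have reported "time OK".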

Selection and switchover: reference priority, hitless intent, and real disturbance paths

Reference selector and priority policy
What to define: primary/backup, failure detection, recovery hysteresis (avoid flapping)
What to prove: selector decisions match alarms and measurable degradations
Switchover types
Hitless goal: minimal service impact (ideal behavior)
Realistic case: brief disturbance may show as PLL transient, PTP offset spike, or RF EVM glitch
Evidence: event-window capture around switch action
Holdover (TCXO/OCXO in-node)
Role: maintain acceptable stability during short reference loss/instability
Prove it: holdover-enter/exit logs + drift/offset trend stays bounded
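The selector policy above (priority, failure detection, recovery hysteresis to avoid flapping) can be sketched as a small state machine. This is a minimal sketch assuming per-tick boolean health flags; real selectors add quality metrics and wait-to-restore timers:

```python
class RefSelector:
    """Priority-based reference selector with a simple flap guard: a failed
    reference must stay healthy for `hold_good` consecutive ticks before the
    selector is allowed to revert to it (recovery hysteresis)."""

    def __init__(self, refs, hold_good=3):
        self.refs = list(refs)            # ordered by priority, best first
        self.hold_good = hold_good
        self.good_streak = {r: 0 for r in refs}
        self.active = refs[0]
        self.switch_log = []              # (old, new, reason) evidence trail

    def tick(self, health):
        """health: dict ref -> bool for this tick; returns the active ref."""
        for r in self.refs:
            self.good_streak[r] = self.good_streak[r] + 1 if health[r] else 0
        if not health[self.active]:
            # fail fast: fall to the best currently healthy reference
            # (if none is healthy, the node should enter holdover instead)
            for r in self.refs:
                if health[r]:
                    self._switch(r, "failover")
                    break
        else:
            # revert only after sustained health on a higher-priority ref
            for r in self.refs:
                if r == self.active:
                    break
                if self.good_streak[r] >= self.hold_good:
                    self._switch(r, "revert")
                    break
        return self.active

    def _switch(self, ref, reason):
        self.switch_log.append((self.active, ref, reason))
        self.active = ref
```

Note that the switch log doubles as the evidence trail this page keeps asking for: every selector decision is timestamped (here, by tick order) with a reason code.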

Distribution: who needs what quality (RF LO vs SerDes vs timestamp domain)

For each consumer: the primary sensitivity (what degrades first), and the fastest node-scope proof.

  • RF LO / sampling clock. Degrades first: EVM margin and demodulation threshold; micro-glitches show during disturbances. Fastest proof: EVM trend + spur stability + switch-window correlation.
  • SerDes / retimer domain. Degrades first: BER margin, retrain probability, and FEC pressure trends. Fastest proof: PRBS BER + link flap/retrain counters + corrected/uncorrected trend.
  • Timestamp unit (TSU) domain. Degrades first: PTP offset/jitter stability; spikes reveal reference and distribution events. Fastest proof: offset/jitter time series + holdover/selector logs on the same timeline.

Monitoring and alarms: the minimum evidence set (actionable, not decorative)

Reference + selector status
Capture: active ref, priority, switch count, switch reason, flapping guard actions
PLL lock + holdover state
Capture: unlock events, relock duration, holdover enter/exit, drift trend summary
Timing statistics on one timeline
Capture: PTP offset/jitter summary + windowed spikes aligned with switch events
Cross-domain correlation
Capture: overlay EVM / BER / offset around events to identify dominant disturbance paths
Figure B7 — In-node clock tree: Inputs → Selector → PLL/Cleaner → Consumers + alarm loop
A node-local timing view: SyncE/PTP/1PPS enter the node, a selector decides priority and switchover, PLL/cleaner distributes clocks to RF/SerDes/TSU, and monitoring/alarms close the evidence loop.

H2-8 · Redundancy & reliability: proving switchovers and rollbacks with evidence

KEY PRINCIPLE (extractable)

“Switched over” is not the same as “service is unaffected”. Every redundancy action must be validated with an event window and a minimal KPI set: drops, EVM micro-glitch, PLL unlock, and PTP offset spike. This section provides acceptance scripts that bind each action to counters and logs.


Redundancy dimensions (kept separate to avoid mixed conclusions)

Link redundancy
Examples: dual ports, dual media paths, alternate backhaul routes (node view)
Risk: retrain/link flap, short packet drops, FEC pressure climb
Evidence: link timeline + PRBS/BER + FEC counters + drop window
Reference redundancy
Examples: ref1/ref2 policy, SyncE/PTP/1PPS combinations (node view)
Risk: PLL transient, offset spike, holdover enter/exit oscillation
Evidence: selector logs + lock/holdover + offset/jitter windows + (optional) EVM window
Protection and rollback (in-node)
Examples: thermal throttling, safe-mode, firmware rollback with audit trail
Risk: “recovered” but margins shift (BER/EVM/offset baseline changes)
Evidence: version log + baseline comparison before/after

Acceptance method: one event window, four KPIs, one decision

For each KPI: what “bad” looks like, and the fast node-scope evidence.

  • Drops. Bad: burst drops during switch and settle phases. Evidence: drop counters + throughput trace aligned to the switch timestamp.
  • EVM micro-glitch. Bad: brief constellation expansion or spur pop during switchover. Evidence: EVM trend/histogram with event markers (if an RF chain is present).
  • PLL unlock. Bad: unlock/relock events or long settle time. Evidence: lock log + settle duration + holdover status.
  • PTP offset spike. Bad: offset exceeds thresholds or returns slowly to baseline. Evidence: offset/jitter time series with pre/post window summary.
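The one-window, four-KPI, one-decision rule above can be sketched as a single acceptance function. The key names and the limit values in the usage note are illustrative policy choices, not standardized thresholds:

```python
def accept_switchover(window, limits):
    """Evaluate one event window against the four-KPI acceptance rule.
    `window` holds measured values for the event; `limits` holds the policy
    bounds. Returns (pass/fail, per-KPI detail for the evidence record)."""
    checks = {
        "drops": window["drops"] <= limits["max_drops"],
        "evm_glitch_pct": window["evm_glitch_pct"] <= limits["max_evm_glitch_pct"],
        "pll_unlocks": window["pll_unlocks"] <= limits["max_pll_unlocks"],
        "ptp_spike_ns": window["ptp_spike_ns"] <= limits["max_ptp_spike_ns"],
    }
    return all(checks.values()), checks
```

Returning the per-KPI detail alongside the verdict matters: a failed acceptance should name the KPI that broke the window, not just report "fail".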

Acceptance scripts (each action bound to counters and logs)

Script A — Link switchover (Port A → Port B)
Pre-check
Capture baseline throughput, drop counters, link status, retrain counters, and BER/FEC trend (if available).
Action
Trigger link failover by disabling Port A (or forcing media loss) while keeping Port B ready.
Window
Record pre-window and post-window on the same timeline; mark the exact switchover timestamp.
Capture
Drops + throughput trace, link flap/retrain counters, PRBS BER (if used), corrected/uncorrected counters, temperature snapshot.
Pass
No burst drops beyond policy limits; retrain stabilizes quickly; BER/FEC pressure does not trend upward after settle.
Script B — Reference switchover (Ref1 → Ref2 / holdover)
Pre-check
Confirm active reference, selector priority, lock status, and baseline offset/jitter trend; capture RF/BER baseline if applicable.
Action
Remove or degrade Ref1 to force selector decision; optionally test holdover entry and exit.
Window
Mark the switch moment; evaluate “switch” and “stabilize” phases separately.
Capture
Selector logs (reason codes), PLL unlock/relock timeline, holdover status, PTP offset spike/settle, and (optional) EVM micro-glitch.
Pass
No unexpected unlock storms; offset spike is bounded and returns to baseline quickly; no recurring flapping.
Script C — Firmware/software rollback (vX → vY) with proof
Pre-check
Record current version, configuration hash, and baseline KPIs (drops/BER/FEC pressure/offset) under a repeatable traffic profile.
Action
Perform rollback to the target version and reboot/restore service according to standard procedure.
Window
Capture boot-to-service timeline and the first stable operating window; compare against pre-check baseline.
Capture
Version/audit logs, service readiness markers, counters baseline after restore, and any protective-mode entries.
Pass
KPIs return to the known-good baseline; no new alarm patterns; repeatability confirmed across multiple cycles.
Figure B8 — Switchover state machine: Normal → Degrade → Switch → Stabilize (what to capture)
Treat every switchover as a multi-phase process. The proof chain comes from event windows and KPI correlation, not from “it switched” status alone.

H2-9 · Observability & operations: using OAM/telemetry to debug jitter, errors, and link drops

WHY THIS MATTERS (extractable)

The most expensive field faults are intermittent: jitter bursts, transient BER/FEC events, and short link drops. The node becomes diagnosable only when it exposes a minimal evidence set and records pre/post event windows so “cannot reproduce” turns into a replayable timeline.


Minimal observability set (grouped by what it can prove)

Timing / clock evidence
Capture: PLL lock/unlock + lock time, holdover enter/exit, selector active reference + switch reason, jitter proxy (cleaner status/metrics).
Proves: whether timing disturbances align with service symptoms (EVM spikes, offset spikes, BER stress).
Link / PHY evidence
Capture: link up/down, link flap count, retrain events, training/equalization summary (if available), windowed BER (not a single snapshot).
Proves: whether errors originate in the physical link and how quickly recovery stabilizes.
FEC / framing evidence (Ethernet/OTN interface scope)
Capture: corrected/uncorrected counters, FEC pressure trend (margin proxy), mapping/framing fault flags if exposed.
Proves: whether the link is “alive but stressed” and how close it is to uncorrectables.
Thermal / power alarms (node-local)
Capture: temperature snapshots/trends, throttle/safe-mode markers (if present), node-local power alarms (UV/OV/PG) and reset reasons.
Proves: whether environment or node-local power events coincide with drops or retrains.
Audit trail
Capture: firmware version + config hash, reboot cause, major state transitions (reason codes).
Proves: whether “same symptom” is actually a different software/config regime.

The most valuable log: pre/post event windows (make intermittents replayable)

Window thinking beats snapshots
Problem: intermittent faults rarely show on a single read; they show as a short spike and a settle phase.
Approach: always capture a pre-window and post-window around a trigger and align all evidence onto one timeline.
Trigger sources (automatic + operator)
Automatic: PLL unlock, offset spike, BER jump, FEC uncorrected, link flap/retrain, thermal/power alarm.
Operator: manual “mark event” during observed degradation, then export the buffered window.
Correlation rule (one timeline)
Align: timing (lock/selector/holdover) + interface (BER/FEC/link) + environment (temp/power) to the same event timestamp.
Result: the dominant cause becomes visible through consistent co-occurrence patterns.
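The pre/post window idea maps naturally onto a ring buffer that freezes on trigger. A minimal sketch; window sizes and sample contents are illustrative:

```python
from collections import deque

class EventWindowRecorder:
    """Keep a rolling pre-window of samples; on trigger, freeze the pre-window
    and keep recording until the post-window is full. Samples can be any
    timestamped evidence tuple (counters, offsets, alarm flags)."""

    def __init__(self, pre_len, post_len):
        self.pre = deque(maxlen=pre_len)  # rolling buffer before any trigger
        self.post_len = post_len
        self.capture = None               # None until triggered

    def push(self, sample):
        if self.capture is None:
            self.pre.append(sample)
        elif len(self.capture["post"]) < self.post_len:
            self.capture["post"].append(sample)

    def trigger(self, reason):
        """Freeze the pre-window; `reason` is the trigger source (automatic
        rule or operator mark-event)."""
        self.capture = {"reason": reason, "pre": list(self.pre), "post": []}

    def complete(self):
        return (self.capture is not None
                and len(self.capture["post"]) == self.post_len)
```

Because the pre-window is always rolling, the samples leading up to an intermittent event are already in memory when the trigger fires, which is what turns "cannot reproduce" into a replayable timeline.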

Symptom → evidence mapping (what to pull first, and what often misleads)

For each symptom: the first evidence to pull, the second evidence (confirm/triage), and the common misread.

  • EVM gets worse. Pull first: PLL lock/unlock + jitter proxy + selector switches (event window). Confirm: temperature trend + any RF chain status snapshots (if exported). Misread: blaming RF hardware without checking timing disturbances and settle events.
  • Throughput drops. Pull first: drop counters + link retrain/link flap timeline. Confirm: FEC pressure trend (corrected growth) + BER window stats. Misread: assuming “traffic” is the cause while ignoring a recovering link or rising FEC stress.
  • Intermittent disconnect. Pull first: link up/down + flap counters + uncorrected events + reset reasons. Confirm: node-local power alarms (PG/UV) + thermal alarm markers. Misread: chasing a remote network issue when the node is cycling a local protection path.
  • Delay jitter spikes. Pull first: PTP offset/jitter series + selector/holdover events. Confirm: interface error windows (BER/FEC) that correlate with packet timing variance. Misread: over-attributing to switching/queues without first proving a timebase disturbance.
Figure B9 — Evidence pipeline: counters → OAM logic → event window buffer → telemetry export
Node-local evidence should flow from hardware counters into OAM logic, be preserved as pre/post windows, and be exported with enough context to replay intermittent faults remotely.

H2-10 · Validation & production test: what “done” means and how stability is proven in the field

DEFINITION OF “DONE” (extractable)

Validation is complete only when key KPIs remain stable across disturbance scenarios and the node can produce field evidence. Testing is structured in three layers: engineering validation, production screening, and field self-check (node scope only).


Three-layer structure (engineering → production → field self-check)

  • Engineering validation. Goal: characterize margins under combined disturbances (timing, interface stress, temperature). Output: repeatable procedures + windowed KPI summaries tied to firmware/config identity.
  • Production test. Goal: fast screening for outliers that would fail in the field (PHY/clock-recovery issues). Output: pass/fail records + minimal counters captured per unit.
  • Field self-check (node scope). Goal: prove stability with evidence logs (event windows) and verify readiness after resets/switchovers. Output: self-check results + aligned evidence export to operations.

Test template (test item → fixture/input → pass criteria → record fields)

Each entry: test item → fixture/input → pass criteria (windowed) → record fields (evidence).

  • EVM stability. Fixture: modulated RF link / controlled channel; optional thermal steps. Pass: no abnormal expansion or spikes during steady state or disturbance windows. Record: EVM trend + event markers + temperature snapshot + active reference.
  • PRBS / BER. Fixture: PRBS generator/checker or link test mode; cable/optical path variations. Pass: BER stays within margin; no burst errors after settle. Record: BER windows + retrain/link-flap counters + eye/eq summary (if available).
  • FEC margin / pressure. Fixture: stressed link (attenuation/noise); framing active. Pass: corrected-counter growth acceptable; uncorrected events absent or bounded. Record: corrected/uncorrected counters + margin proxy + timeline correlation.
  • PLL lock time. Fixture: power cycle / reference re-apply; repeated runs. Pass: lock time consistent; no unlock storms. Record: lock/unlock log + settle duration + holdover transitions.
  • Reference switchover disturbance. Fixture: force Ref1 loss/degrade; trigger selector decision. Pass: bounded offset spike; fast return to baseline; no flapping. Record: selector reason code + offset window stats + PLL events.
  • Holdover vs temperature. Fixture: thermal sweep (engineering) / limited hot/cold points (production). Pass: drift trend remains within operational tolerance for expected durations. Record: holdover duration + drift/offset summary + temperature trace.
  • Hot-plug / retrain recovery. Fixture: repeated plug/unplug or link-drop injection. Pass: recovery time consistent; service stabilizes quickly. Record: link timeline + retrain counts + drops window + FEC/BER after settle.
  • Evidence export completeness. Fixture: trigger windows; export to remote collector. Pass: all required fields present and aligned to one timeline. Record: event-window bundle (timing + link + FEC + thermal + audit).
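A windowed pass criterion from the template above can be sketched as a simple check: discard the settle period, then bound both the steady-state mean and the worst single sample. The thresholds and sample format are placeholders, not product specs:

```python
from statistics import mean

def windowed_kpi_pass(samples, settle_n, limit, spike_limit):
    """Windowed pass-criterion sketch (thresholds are placeholders, not specs).

    After discarding the first `settle_n` samples, the steady-state mean must
    stay at or under `limit`, and no single sample may exceed `spike_limit`."""
    steady = samples[settle_n:]
    if not steady:
        return False  # no steady-state window captured: cannot claim a pass
    return mean(steady) <= limit and max(steady) <= spike_limit
```

For example, a BER or EVM trend that settles after the first two samples passes if its remaining window stays under both bounds; an empty post-settle window is treated as a failure rather than a silent pass.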
Figure B10 — Test-point overlay (TPs) across the node profile (RF / Ref / SerDes / FEC / OAM)
(Diagram summary: test points overlay the simplified node profile of RF/IF, reference in/out, PLL/cleaner, SerDes, framer/FEC, and Mgmt/OAM blocks between Eth/OTN ports. TP bindings: TP-RF → EVM; TP-REF/TP-PLL → lock, holdover, switchover; TP-BER/TP-FEC → margin; TP-OAM → window export.)
A practical overlay: define test points (TPs) that map directly to KPIs and recorded evidence. This keeps validation, production screening, and field self-check aligned to the same proof chain.

H2-11 · BOM / IC selection checklist (criteria + example part numbers)

This section is a selection-by-evidence checklist for an edge backhaul node: each device class is judged by criteria → why it matters → how to verify, then a short list of representative IC part numbers is provided for fast sourcing and design reviews.

Figure B11 — Where BOM criteria land inside an Edge Backhaul Node
(Diagram summary: each block in the node profile carries a selection criterion that must be verified in logs, counters, or tests: RF/IF (EVM, spurs), sync I/O (SyncE, PTP), Ethernet (BER, flaps), OTN/optics (FEC, margin), PLL/jitter cleaner (phase noise and integrated jitter), LO synthesizer (lock, holdover), up/downconverters (LO leak, image), retimer (jitter tolerance), OTN framer (FEC counters), and telemetry (pre/post event windows). Rule: pick parts that expose the needed evidence (counters/logs) and stay stable across reference switches and temperature.)

1) PLL / network synchronizer / jitter cleaner (node-internal clock quality)

  • Criterion: phase noise profile + integrated jitter for the target consumer (LO vs SerDes vs TSU).
    Why: jitter directly degrades EVM/BER or creates offset spikes under switchover. Verify: jitter/phase-noise report + BER/EVM correlation; log “jitter cleaner state”.
  • Criterion: loop bandwidth controls (reference-noise vs VCO-noise dominant regions).
    Why: wrong bandwidth “looks locked” but fails under real traffic / temperature. Verify: reference step response + spur scan + stability over temperature.
  • Criterion: hitless reference switching + switch reason codes + anti-flap mechanisms.
    Why: “switched” is not equal to “service unaffected.” Verify: packet loss / EVM glitch window + lock/unlock counters + time-offset spike.
  • Criterion: holdover drift vs temperature & aging (node-level, not GNSS/atomic).
    Why: backhaul must survive short ref loss without large time/frequency error. Verify: controlled ref removal + temperature sweep; record drift trend fields.
  • Criterion: output format flexibility (LVDS/LVPECL/CML/CMOS) + fanout + skew controls.
    Why: LO/SerDes/TSU often need different signaling and deterministic skew. Verify: clock integrity at consumers + skew/phase alignment logs.
  • Criterion: observability: lock state, DPLL state, reference quality alarms exportable to telemetry.
    Why: field issues are solved by evidence, not lab assumptions. Verify: event logs include pre/post window + timestamps aligned to node time.
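Two of the verification items above, lock-time consistency and unlock-storm detection, can be sketched directly from a (timestamp, state) event log. The event format is an assumption for illustration, not a specific device's log schema:

```python
def lock_times(events):
    """Compute lock durations from (timestamp, state) events.

    `state` is 'unlock' or 'lock'; each unlock->lock pair yields one settle
    duration. Supports the 'lock time consistent' verification step."""
    durations, t_unlock = [], None
    for t, state in events:
        if state == "unlock":
            t_unlock = t
        elif state == "lock" and t_unlock is not None:
            durations.append(t - t_unlock)
            t_unlock = None
    return durations

def unlock_storm(events, window, max_unlocks):
    """True if any sliding window of `window` seconds starting at an unlock
    contains more than `max_unlocks` unlock events (an 'unlock storm')."""
    stamps = [t for t, s in events if s == "unlock"]
    return any(
        sum(1 for t in stamps if t0 <= t < t0 + window) > max_unlocks
        for t0 in stamps
    )
```

A design review would then compare `lock_times` spread across power cycles and require `unlock_storm` to stay false during reference-switch and thermal tests.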
Example part shortlist (representative)
  • TI LMK05318 / LMK05318B — network synchronizer clock with jitter cleaning + hitless switching.
  • Skyworks/Silicon Labs Si5345 — jitter attenuator / clock multiplier family.
  • Microchip ZL30772 — SyncE / IEEE 1588 network synchronizer (DPLL-based).
  • Renesas 8V97003 — wideband RF synthesizer with low phase jitter (LO-grade reference option).
  • TI LMX2594 — 15GHz wideband PLL synthesizer (LO generation, low integrated jitter).
  • Analog Devices ADF4371 — wideband microwave synthesizer (up to 32GHz with dividers/multipliers).
  • Analog Devices HMC7044 — dual-loop jitter attenuator / reference selection & distribution.
  • Analog Devices LTC6952 — ultralow jitter PLL + multi-output distribution (clock tree building block).
Note: the “right” choice depends on which consumer dominates (RF LO vs SerDes vs timestamp/Sync). Select by evidence and verification plan.

2) Mixer / upconverter / LO distribution (EVM + spur control zone)

  • Criterion: LO leakage + isolation (LO→RF/IF feedthrough).
    Why: leakage can dominate spurs and degrade EVM; it may appear only at specific temperatures. Verify: spur scan vs temperature + correlate to LO power.
  • Criterion: image / sideband suppression and calibration hooks (I/Q balance controls if applicable).
    Why: poor suppression forces heavier filtering and reduces margin. Verify: image rejection test, store calibration coefficients in logs.
  • Criterion: linearity (IIP3 / P1dB) under expected channel loading.
    Why: intermod rises at high load/strong signal and looks like “random EVM collapse.” Verify: two-tone IMD + EVM under traffic-like waveforms.
  • Criterion: group delay / amplitude flatness across band (especially wideband microwave).
    Why: group-delay ripple increases the demodulation penalty and tightens equalization requirements. Verify: swept response + EVM across channels.
  • Criterion: LO distribution sensitivity to supply noise and layout coupling.
    Why: supply ripple can upconvert to spurs. Verify: injected ripple test + spur response + supply monitor time alignment.
Example part shortlist (representative)
  • Analog Devices ADMV1013 — wideband microwave upconverter (24–44GHz class), suited for point-to-point microwave radios.
  • Analog Devices HMC7044 — can serve as low-noise reference selection + LO/clock distribution hub for RF chain blocks.
  • TI LMX2594 / ADI ADF4371 — common LO synthesizer building blocks for microwave/mmWave backhaul nodes.

3) Ethernet PHY / retimer / SerDes (throughput + BER stability)

  • Criterion: supported line rates & host interfaces (KR/KR4, CAUI, 25GAUI, etc.).
    Why: mismatched host-side modes create training failures and intermittent flaps. Verify: link bring-up matrix + stable autoneg/training.
  • Criterion: equalization + CTLE/DFE training robustness across worst-case channel loss/crosstalk.
    Why: “works in lab” fails with field cabling/backplane variance. Verify: stress channel sweep; record training status and margins.
  • Criterion: jitter tolerance (input) and jitter transfer (output) behavior.
    Why: jitter accumulation can push BER over threshold during ref switch or thermal drift. Verify: BER vs injected jitter; correlate with clock-tree state.
  • Criterion: low latency retiming and deterministic behavior under retrain/relock.
    Why: recovery time is a field KPI. Verify: scripted unplug/replug + retrain duration histogram in logs.
  • Criterion: FEC visibility: counters and error distribution accessible to telemetry.
    Why: corrected errors rising is an early warning. Verify: counters streamed with timestamps + event window snapshots.
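The "corrected errors rising is an early warning" check can be sketched as a least-squares slope over time-stamped counter samples. The sample format and threshold are assumptions for illustration:

```python
def corrected_slope(counter_samples):
    """Least-squares slope of a corrected-FEC counter over (t, count) samples.

    A rising slope signals shrinking margin before uncorrected errors appear.
    Sample format (seconds, cumulative count) is an assumption of this sketch."""
    n = len(counter_samples)
    ts = [t for t, _ in counter_samples]
    cs = [c for _, c in counter_samples]
    tbar, cbar = sum(ts) / n, sum(cs) / n
    num = sum((t - tbar) * (c - cbar) for t, c in counter_samples)
    den = sum((t - tbar) ** 2 for t in ts)
    return num / den  # corrected errors per second

def fec_early_warning(samples, slope_limit):
    """Raise a warning when the corrected-error rate exceeds a chosen limit."""
    return corrected_slope(samples) > slope_limit
```

The slope limit itself should come from the deployment profile (acceptable corrected-error rate at worst-case attenuation), not from a lab default.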
Example part shortlist (representative)
  • TI DS280DF810 — 28Gbps multi-rate 8-channel retimer (ultra-low latency class) for long/lossy links.
  • Marvell Alaska C 88X5113 — 25G×4 / 100G Ethernet PHY family with RS-FEC support and training.

4) OTN framer / FEC / mapping (transport interface boundary)

  • Criterion: supported client mapping & frame modes required by the target backhaul network.
    Why: mapping gaps turn into “works on one network only.” Verify: interoperability checklist + bring-up evidence.
  • Criterion: FEC overhead vs latency budget + margin visibility.
    Why: stronger FEC can save BER but costs latency; stability is proven by margin/counter trends. Verify: corrected/uncorrected counters + margin proxy under stress.
  • Criterion: counter quality and alarms (thresholds, latching, time-stamped export).
    Why: counters are the primary field evidence for “why throughput fell” or “why link degrades.” Verify: exported counters with timestamps and pre/post window snapshots.
  • Criterion: recovery behavior (reconfig/retrain time) and deterministic state transitions.
    Why: field KPI is downtime and recovery time. Verify: scripted failure injection + recovery-time distribution.
Example part shortlist (representative)
  • Microchip PM5990 / PM5991 — DIGI-G4 OTN processor family (OTN processing / mapping class).
  • Microchip PM6010 — DIGI-G4 ecosystem device (high-capacity transport interface building block).

5) Sensors / power monitors / supervisors (field evidence enablers)

  • Criterion: sensor placement coverage: RF chain hot spots + SerDes/optics zone + clock section.
    Why: many faults are thermal; without coverage, root cause is missed. Verify: thermal step test + event window snapshots.
  • Criterion: rail monitoring accuracy and alert latency (for brownout/overcurrent correlation).
    Why: supply events can masquerade as jitter or BER issues. Verify: alert timestamp aligned with BER/FEC/PLL state.
  • Criterion: multi-rail supervisor with programmable delays + watchdog hooks (node-local stability).
    Why: clean resets prevent silent corruption and repeated flaps. Verify: reset cause, watchdog events, and restart timeline logged.
  • Criterion: event logging format: pre/post trigger window, minimum fields, deterministic timestamps.
    Why: field issues become reproducible only with windowed evidence. Verify: forced trigger produces a complete evidence bundle.
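The "alert timestamp aligned with BER/FEC/PLL state" verification can be sketched as a nearest-neighbor match between rail alerts and error bursts on one timeline. The tolerance value is an assumption; in practice it should reflect the node's timestamp resolution:

```python
def correlate(alert_ts, burst_ts, tol):
    """For each error-burst timestamp, find the nearest power-rail alert and
    report bursts explained by a rail event within +/- tol seconds.

    Sketch only: timestamps are seconds on one shared timeline, and `tol`
    is an assumed alignment tolerance, not a standardized value."""
    explained = []
    for b in burst_ts:
        near = min(alert_ts, key=lambda a: abs(a - b), default=None)
        if near is not None and abs(near - b) <= tol:
            explained.append((b, near))
    return explained
```

Bursts left unexplained by this pass are the ones worth escalating to jitter/clock-tree analysis rather than power triage.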
Example part shortlist (representative)
  • TI INA238 — 85V digital current/voltage/power monitor with alert (rail evidence & correlation).
  • TI TMP117 — high-accuracy digital temperature sensor (thermal evidence anchor).
  • TI TPS386000 — multi-rail voltage supervisor with programmable delay + watchdog (reset integrity).

How to use this checklist in a design review

  1. Pick the dominant KPI (EVM / BER / FEC margin / ref switch disturbance) and map it to the consumer block (LO / SerDes / framer).
  2. Select candidates only if evidence is exportable (counters + lock states + windowed logs) and can be time-aligned.
  3. Write the verification script before final BOM: ref switch + temperature sweep + PRBS/BER + FEC trend + recovery time histogram.
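The three review steps can be tied together in a small harness that runs named checks and records pass/fail evidence per candidate part. The check names here are hypothetical placeholders for the scripts listed in step 3:

```python
def run_review_plan(checks):
    """Run a verification plan as named check callables and collect results.

    Sketch of a design-review harness: `checks` maps hypothetical check names
    (e.g. 'ref_switch', 'ber_window') to callables returning truthy on pass.
    The BOM decision is then tied to recorded evidence, not datasheet claims."""
    results = {name: bool(fn()) for name, fn in checks.items()}
    results["overall"] = all(results[name] for name in checks)
    return results
```

In a real flow each callable would wrap a scripted test (reference switch, thermal sweep, PRBS/BER, FEC trend, recovery-time histogram) and archive its evidence bundle alongside the boolean.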


H2-12 · FAQs (Edge Backhaul Node)

Each answer follows a field-ready pattern: likely causes → fastest evidence checks → a short verification path (counters, logs, and targeted tests).

Why can throughput stay “normal” while the radio-side EVM gets visibly worse?
Throughput mainly reflects the transport interface staying up, while EVM is dominated by RF/LO quality (phase noise, spurs, linearity, and group delay). Common causes are LO spur growth, temperature-driven drift, or internal coupling that does not trigger link-down. Verify by correlating EVM timestamps with spur snapshots, PLL/jitter-cleaner state, temperature, and any reference-switch events.
The PLL is locked, but periodic spurs still rise in the field. What are the usual root causes?
“Lock” only confirms loop closure, not spur-free operation. Periodic spur rise commonly comes from reference/divider spur tables, supply ripple coupling (DC-DC, fan PWM), or digital activity coupling (retrain bursts, management traffic) leaking into LO distribution. Check spur frequency stability versus known internal periodic sources, then confirm with power-rail ripple logs, PLL alarm flags, and event-window snapshots around each spur peak.
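Checking whether a spur sits on a harmonic of a known internal periodic source can be sketched as a small lookup. The source table, relative tolerance, and harmonic count are illustrative assumptions, not calibrated values:

```python
def match_internal_source(spur_hz, sources, rel_tol=0.01, max_harmonic=5):
    """Check whether a measured spur lands on a harmonic of a known internal
    periodic source (e.g. DC-DC switching, fan PWM, retrain cadence).

    `sources` maps a source name to its fundamental in Hz. Returns the first
    (name, harmonic) hit within a relative tolerance, else None. Tolerance
    and harmonic depth are assumptions of this sketch."""
    for name, f0 in sources.items():
        for n in range(1, max_harmonic + 1):
            if abs(spur_hz - n * f0) <= rel_tol * n * f0:
                return name, n
    return None
```

A spur that tracks an internal fundamental across temperature and load is strong evidence for coupling rather than external interference, which is exactly the distinction the answer above relies on.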
How should loop bandwidth be chosen to avoid importing reference noise while preventing VCO noise dominance?
Use a “dominant-region” approach: close-in offsets often reflect reference/loop behavior, mid offsets reflect bandwidth tradeoffs, and far offsets tend to be VCO/noise-floor limited. Too-wide bandwidth can inject reference noise; too-narrow bandwidth leaves VCO noise uncorrected and may worsen lock robustness. Validate with a small reference perturbation test, jitter-transfer observation, spur checks, and EVM/BER correlation under temperature and load changes.
Why can BER improve after changing a retimer/cable, but packet delay variation (PDV) becomes worse?
BER can drop because equalization and jitter tolerance improved, but PDV can rise when retimers add elastic buffering, when FEC depth/mode changes, or when link training/recovery becomes more frequent and introduces latency steps. Treat PDV as a distribution problem: measure latency histograms, retrain timestamps, and FEC counters over the same window. Confirm whether PDV spikes align with CDR events, FEC bursts, or clock-tree state changes.
How should OTN/FEC counters be read as “margin,” and when is speed/modulation reduction mandatory?
Interpret counters in layers: rising corrected errors indicate shrinking margin; any uncorrected events indicate service risk; alarm thresholds show how close the link is to failure. Mandatory downshift or modulation fallback is justified when corrected-error trends accelerate with temperature or attenuation, or when uncorrected events appear under repeatable stress. Require time-stamped exports of corrected/uncorrected counts, link state, and temperature, plus a defined action threshold per deployment profile.
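The layered reading above can be turned into a hedged decision rule: any uncorrected event forces action, and an accelerating corrected-error slope does too. The acceleration factor is illustrative, not a standardized threshold:

```python
def downshift_required(corrected_slopes, uncorrected_events, accel_factor=2.0):
    """Decision sketch for speed/modulation fallback (thresholds illustrative).

    `corrected_slopes` is a chronological list of corrected-error rates from
    repeated stress runs; `uncorrected_events` is the uncorrected count in the
    same window. Downshift when any uncorrected event appears, or when the
    latest slope exceeds `accel_factor` times the earliest (acceleration)."""
    if uncorrected_events > 0:
        return True  # service risk: mandatory action
    if len(corrected_slopes) >= 2 and corrected_slopes[0] > 0:
        return corrected_slopes[-1] > accel_factor * corrected_slopes[0]
    return False
```

Each deployment profile should pin its own `accel_factor` and stress conditions; the point is that the rule is evaluated on time-stamped exports, not operator intuition.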
SyncE is healthy and PTP is reachable, yet 1PPS/ToD still jumps. Where should debugging start?
SyncE stabilizes frequency and PTP distributes time, but 1PPS/ToD output can jump due to selector transitions, DPLL state changes, ToD update sequencing, or cross-domain alignment issues. “PTP reachable” is not the same as “PTP offset stable.” Start with timeError/offset logs, DPLL/selector state, 1PPS glitch counters, and ToD update events. Validate with an event window around each jump and check if it coincides with ref switching or holdover entry/exit.
Reference switchover is “service-unbroken.” How can it be proven that RF EVM was not damaged by the transient?
“No traffic drop” only proves the transport path stayed up. RF integrity requires proving no EVM spike, no spur surge, and no micro unlock/relock events during switchover. Use a switchover script: capture EVM time series, spur snapshots, PLL/jitter-cleaner transient flags, and time alignment markers in a pre/post window. Pass criteria should include peak EVM deviation, recovery time, and absence of unlock indicators during the switch.
After temperature rises, intermittent link flaps/retrains appear. How to separate SerDes margin loss from PLL unlock?
Decide by “which evidence happens first.” SerDes issues show early signs in retrain counts, FEC corrected-error growth, and eye/margin telemetry before full flap. PLL issues show DPLL/selector instability, holdover transitions, lock alarms, or phase step events that can then trigger interface errors. Align all logs to a single time base and compare sequences: temperature → margin/counters → retrain/flap versus temperature → PLL state change → offset/jitter changes → interface errors.
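The "which evidence happens first" rule can be sketched by sorting time-stamped events and reporting which family appears first. The event tags are assumptions for illustration, not real log fields:

```python
def first_cause(events):
    """Classify a flap episode by which evidence family appears first.

    `events` is a list of (timestamp, tag) pairs on one time base. Tags are
    hypothetical labels: SerDes-margin signs (retrain, FEC growth, eye-margin
    drop) versus PLL/timebase signs (DPLL change, holdover, unlock, offset
    step). Returns which family leads, or 'inconclusive'."""
    serdes = {"retrain", "fec_corrected_growth", "eye_margin_drop"}
    pll = {"dpll_state_change", "holdover_entry", "unlock", "offset_step"}
    for _, tag in sorted(events):
        if tag in serdes:
            return "serdes_margin"
        if tag in pll:
            return "pll_timebase"
    return "inconclusive"
```

This only works if all logs share one time base, which is why timestamp alignment is a selection criterion in its own right.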
In the field, “it looks like interference,” but it is actually oscillation/coupling. How is the evidence chain built?
Build evidence around three signatures: (1) frequency locking to an internal periodic source (DC-DC, fan PWM, retrain cadence), (2) repeatability under the same internal state, and (3) sensitivity to layout/power/ground changes more than external RF environment. Collect spur frequency/time stability, correlate with rail ripple and internal activity counters, then run a controlled toggling test (load steps or subsystem enable/disable). A consistent phase relationship strongly indicates self-coupling, not external interference.
How can production test cover EVM and BER quickly without exploding cycle time?
Use a two-tier strategy. Tier-1 (fast screen): PRBS/BER, basic FEC counters, PLL lock state, and a short thermal step to reveal marginal units. Tier-2 (sampled deep test): EVM/spur checks, ref switchover disturbance, and holdover drift characterization. The key is evidence: every unit must export a minimum counter/log bundle (BER/FEC/lock/retrain/temperature) aligned to timestamps, enabling rapid failure triage without long bench time.
For microwave vs mmWave backhaul, which metrics must be stricter, and which are less sensitive?
mmWave deployments typically demand stricter close-in phase noise and integrated jitter because higher carrier frequency and wider bandwidth tighten EVM limits and amplify LO impairment. Temperature sensitivity and LO distribution isolation often become more critical. Some lower-frequency spurs may be less impactful depending on the modulation bandwidth and filtering, but this is deployment-specific and must be verified. Use a profile-based spec: define EVM target, spur mask, and jitter budget per band and channel plan.
In fiber backhaul, which OTN alarms/counters are most often overlooked inside an OTN node?
The most valuable overlooked signals are the “trend early” indicators: corrected-error rate slope, near-threshold alarm counters, and recovery/retrain event frequency (not just hard LOS/LOF). Also prioritize uncorrected events, defect transitions, and any margin proxy the framer exposes, all time-stamped. Combine these with temperature and power alarms to avoid misattributing degradations. A small counter bundle, exported continuously and captured in event windows, prevents “mystery” field tickets.