Synchronous Ethernet (SyncE) for Carrier & 5G Backhaul
Synchronous Ethernet (SyncE) distributes a stable frequency across Ethernet networks so carrier backhaul stays traceable and resilient. This page shows how to design, configure, and verify the full chain—from PHY recovery and jitter filtering to QL/SSM policy, protection switching, and holdover—using measurable pass criteria.
What is SyncE (Synchronous Ethernet) — and what it is NOT
Synchronous Ethernet (SyncE) is frequency distribution over Ethernet: a node recovers a stable line-derived clock at the PHY and uses it as a disciplined frequency reference for its local clock tree. It improves network-wide frequency coherence across hops—especially valuable in carrier/5G backhaul where ppm-level drift and noisy clock recovery can cascade into timing alarms and service instability.
- A frequency base delivered hop-by-hop via PHY recovered timing (line clock → recovered clock → disciplined local outputs).
- A controllable jitter/wander profile when paired with filtering (tracker vs cleaner partitioning) and well-defined pass criteria.
- Operational traceability hooks (quality/alarms/holdover behavior) so timing degradation is visible and actionable.
- Not absolute time-of-day (no “wall-clock time” delivery).
- Not phase/time alignment by itself (phase/time convergence belongs to time protocols and external references).
- Not a substitute for full timing budget ownership (it is one layer; filtering, switching, and observability still decide outcomes).
Where SyncE sits in a carrier timing chain (EEC/SEC/SSU roles)
In carrier networks, SyncE is not “a feature on a port”—it is a system timing chain. A high-quality upstream reference is transported over Ethernet links, recovered at each node, filtered, selected against alternatives, and redistributed to local endpoints. Carriers require traceability (knowing what is locked to what), switchability (fault isolation and fallback), and observability (alarms and logs that explain degradations).
Equipment clock (EEC/SEC):
- Does: recover line frequency, filter it, and output a controlled local frequency reference.
- Verify by: lock status + frequency-quality indication + jitter/wander transfer consistency under stress.
Source selection and distribution (SSU role):
- Does: choose the best source, enforce quality policy, and distribute clocks to all local domains.
- Verify by: stable source selection (no flapping) + deterministic switching + complete logs for cause/effect.
Holdover function:
- Does: keep frequency stable enough when the reference is lost, then re-acquire without instability.
- Verify by: drift trend across temperature/time + clean re-acquisition transient + bounded alarm behavior.
- Fault isolation: upstream degradation should not silently poison many downstream nodes.
- SLA protection: switching policy and holdover behavior are part of service continuity, not optional tuning.
- Operational replay: alarms and logs must reconstruct “what changed” (source/quality/lock) when a timing event occurs.
Treat the system as a repeated unit: recover → filter → select → distribute → monitor. Each hop can add noise and can also filter noise; the engineering job is to decide who owns which noise band and to make failures observable (lock, quality, switching, and holdover traces).
- Selected source ID + current quality level (QL/quality state)
- Lock state timeline (locked → holdover → reacquire)
- Switch reason + holdoff timers + revertive policy state
- Key performance snapshots (output jitter/wander checks at defined points)
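The minimum observables above can be captured in one small status structure so every node reports the same truth. A minimal Python sketch; the class, field names, and QL strings are illustrative assumptions, not a vendor API:

```python
from dataclasses import dataclass, field
from enum import Enum


class LockState(Enum):
    LOCKED = "LOCKED"
    HOLDOVER = "HOLDOVER"
    REACQUIRE = "REACQUIRE"


@dataclass
class NodeTimingStatus:
    """Per-node observables: source, quality, lock timeline, switch-policy state."""
    selected_source: str          # e.g. "port-1/recovered" (illustrative ID)
    quality_level: str            # advertised/derived QL, e.g. "QL-PRC"
    lock_state: LockState
    lock_history: list = field(default_factory=list)  # (timestamp, LockState)
    switch_reason: str = ""       # last switch cause ("LOS", "QL degrade", ...)
    holdoff_active: bool = False
    revertive: bool = True

    def record(self, timestamp: float, new_state: LockState) -> None:
        """Append a lock-state transition so the timeline can be replayed."""
        self.lock_history.append((timestamp, new_state))
        self.lock_state = new_state


status = NodeTimingStatus("port-1/recovered", "QL-PRC", LockState.LOCKED)
status.record(0.0, LockState.LOCKED)
status.record(10.5, LockState.HOLDOVER)   # reference lost
status.record(12.0, LockState.REACQUIRE)  # reference restored, settling
```

Keeping the timeline as data (rather than free-text logs) is what makes the "locked → holdover → reacquire" sequence replayable during a postmortem.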
SyncE architecture model: Source → Link → Recovery → Distribution
A SyncE design is best treated as a repeatable chain: source quality enters the system, the Ethernet link transports line timing, the node recovers frequency at the PHY/PLL boundary, and the board distributes disciplined clocks to local endpoints. This model prevents gaps: every block must deliver both clock and truth (status) for operations.
Source (reference oscillator):
- Delivers: frequency stability and holdover potential.
- Exposes: temperature/aging behavior that becomes long-term drift.
- Primary risk: “good on bench, bad in field” due to thermal gradients and aging.
Link (Ethernet transport):
- Delivers: transport of frequency over physical-layer timing.
- Exposes: hop-to-hop noise accumulation and path dependency.
- Primary risk: upstream degradation silently propagating downstream.
Recovery (PHY/PLL boundary):
- Delivers: recovered frequency plus lock/quality state.
- Exposes: transfer behavior (what noise is tracked vs rejected).
- Primary risk: “locked but unqualified” behavior without usable status.
Distribution (board clock tree):
- Delivers: disciplined clocks (e.g., 25 / 125 / 156.25 MHz) with controlled skew.
- Exposes: board coupling (power/EMI/return paths) that can re-inject noise.
- Primary risk: the clock tree becoming a noise amplifier and skew generator.
Filtering partition (loop bandwidth):
- Decision: what noise is tracked vs attenuated.
- Bad sign: faster lock but worse output jitter, or quiet output but poor tracking.
- Verify: compare recovered clock vs cleaner output under controlled disturbances.
Switching policy:
- Decision: priority, holdoff timers, revertive behavior.
- Bad sign: source flapping or timing loops after adding redundancy.
- Verify: event logs reconstruct “source → quality → switch reason” unambiguously.
Holdover:
- Decision: drift bounds, temperature behavior, re-acquisition behavior.
- Bad sign: acceptable minutes-level drift but hours-level divergence in the field.
- Verify: long-soak drift logs + temperature sweep with repeatable entry/exit criteria.
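The holdover verification above reduces to simple arithmetic before any long soak is run: accumulated time error is the integral of fractional frequency offset. A hedged sketch using an illustrative worst-case linear model; the function name and example numbers are assumptions, not a standard:

```python
def holdover_time_error_ns(
    initial_offset_ppb: float,   # frequency offset frozen at holdover entry
    tempco_ppb_per_c: float,     # oscillator temperature coefficient
    delta_temp_c: float,         # worst-case temperature excursion in holdover
    aging_ppb_per_day: float,    # linear aging rate
    holdover_s: float,           # holdover window length (> 0)
) -> float:
    """Worst-case accumulated time error (ns) over a holdover window.

    Linear model: a constant entry offset plus a worst-case offset ramp from
    temperature and aging.  Time error is the integral of fractional
    frequency offset, so the ramp term integrates to ramp * t^2 / 2.
    """
    ramp_ppb_per_s = (
        tempco_ppb_per_c * delta_temp_c / holdover_s  # spread excursion over window
        + aging_ppb_per_day / 86_400.0
    )
    # 1 ppb of fractional offset accumulates 1 ns of time error per second.
    return initial_offset_ppb * holdover_s + ramp_ppb_per_s * holdover_s**2 / 2.0


# Illustrative numbers: 5 ppb entry offset, 0.5 ppb/°C over a 10 °C excursion,
# 1 ppb/day aging, 4-hour holdover window.
err = holdover_time_error_ns(5.0, 0.5, 10.0, 1.0, 4 * 3600.0)
```

The point of the sketch is ownership: if the budgeted window already fails on paper, no amount of field tuning will fix it, and oscillator class (tempco, aging) is the knob to revisit.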
Performance targets that matter: jitter vs wander, masks, and pass criteria
SyncE acceptance must be written as measurable behaviors, not abstract numbers. The key is separating jitter (short-term phase variations) from wander (slow frequency/phase drift), then assigning ownership across the chain (recovery vs cleaning vs distribution). Pass criteria should be derived from network budgets and equipment class, and expressed as repeatable tests.
Jitter-dominated problems (short-term):
- Typical symptom: intermittent lock instability, timing-quality alarms, endpoint tolerance issues.
- Common causes: bandwidth partitioning mistakes, power/EMI coupling into PLLs, distribution re-injection.
- Most useful check: compare recovered clock vs cleaner output under the same stimulus to locate ownership.
Wander-dominated problems (long-term):
- Typical symptom: slow degradation, holdover drift, quality downgrades after hours or temperature changes.
- Common causes: oscillator thermal gradients, aging model mismatch, policy-driven long settling after switching.
- Most useful check: long-soak drift logs correlated with temperature and source-selection events.
Output jitter test:
- Measure point: cleaner output (TP2) and one endpoint clock (TP3).
- Stimulus: normal operation + induced supply/EMI stress representative of deployment.
- Pass: jitter stays within the allocated budget for the equipment class and endpoint tolerance.
Wander / holdover test:
- Measure point: local frequency output used as timing reference (TP2/TP3).
- Stimulus: long soak + temperature sweep + reference-loss entry into holdover.
- Pass: drift stays bounded by the service budget over the required holdover window.
Lock and settling test:
- Measure point: lock-state timeline and performance snapshots at defined instants.
- Stimulus: link up/down, source change, and controlled degradations.
- Pass: predictable settling without alarm flapping, consistent across temperature and lots.
Switching-transient test:
- Measure point: the same output clock before/during/after switching (TP2/TP3).
- Stimulus: forced primary loss and controlled revertive transitions.
- Pass: transient stays within the network’s permitted window and is repeatable.
- Use placeholders tied to ownership: X = network jitter budget, Y = equipment class requirement, Z = endpoint tolerance.
- Express criteria as behavior: “After switching, output frequency error settles within ±X in Y seconds, with no quality flapping.”
- Always bind criteria to measure point (TP2/TP3) and stimulus (temperature, link loss, forced switch).
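A behavior-expressed criterion like the template above can be checked mechanically against a logged trace. A minimal sketch, assuming a list of (time, frequency error) samples and QL-change timestamps captured after the switch; X and Y remain budget-derived placeholders:

```python
def settles_without_flapping(
    samples,       # list of (t_seconds, freq_error_ppb) after the switch instant
    ql_events,     # list of t_seconds at which QL changed after the switch
    x_ppb: float,  # allowed steady-state error band (network-derived X)
    y_s: float,    # allowed settling time (equipment-class-derived Y)
) -> bool:
    """Pass if every sample after t = y_s stays within ±x_ppb and no QL
    change occurs after settling (i.e., no quality flapping)."""
    settled = all(abs(e) <= x_ppb for t, e in samples if t >= y_s)
    no_flap = all(t < y_s for t in ql_events)
    return settled and no_flap


# Illustrative trace: error converges from 40 ppb toward ~2 ppb after a switch.
trace = [(0.5, 40.0), (2.0, 12.0), (5.0, 3.0), (8.0, 2.5), (12.0, 2.0)]
ok = settles_without_flapping(trace, ql_events=[1.0], x_ppb=5.0, y_s=4.0)
```

Binding the check to measure point and stimulus happens outside the function: the same criterion is run once per (TP, stimulus) pair rather than once per board.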
Clock recovery in Ethernet PHY: what can and cannot be controlled
In SyncE systems, the Ethernet PHY is the frequency recovery boundary: line timing enters on the left, a CDR/PLL recovers frequency, and the result is either consumed internally or exported as a recovered clock accompanied by status/alarms. Engineering success depends on separating knobs (what is configurable and observable) from limits (what is dictated by PHY architecture and board coupling).
- Line timing in: frequency is embedded in data transitions; SyncE uses physical-layer timing, not packets.
- CDR/PLL recovery: the PHY recovers frequency and maintains lock under defined jitter tolerance.
- Recovered clock out: may be internal-only or exported (path and output standard matter).
- Status/alarms: lock/quality indicators are mandatory for operations, switching, and postmortems.
Recovered-clock export path:
- Decision: internal-only vs exported, and LVCMOS/LVDS/HCSL/LVPECL as applicable.
- Why: mux and pin routes can inject noise; exported clocks may differ from internal clocks.
- Verify: compare TAP (near PHY) vs system receive point under the same stimulus.
Recovery mode configuration:
- Decision: recovery behavior, lock criteria, and any jitter-optimized modes the PHY provides.
- Why: the transfer behavior can change substantially between modes.
- Verify: measure output jitter and lock robustness trends after mode changes.
Reference input and power:
- Decision: reference quality, routing, filtering, and power domain isolation for PHY timing.
- Why: reference pollution can lift the recovered-clock noise floor directly.
- Verify: isolate/swap reference and check whether recovered-clock metrics improve coherently.
Status and control surface:
- Decision: what can be read (lock/quality/freq offset) and what can be set (holdover entry/exit, priorities).
- Why: without truth signals, field failures become blind tuning.
- Verify: define minimum log schema: source ID, quality, lock state, switch reason, holdover state.
- Architecture ceiling: recovered-clock jitter floor and jitter tolerance are PHY-dependent; configuration changes have bounded impact.
- Board sensitivity: supply noise, ground bounce, and crosstalk can dominate high-frequency jitter even with a clean upstream source.
- Export path penalties: muxing and pin routing may add noise; internal clocks can differ from exported clocks.
- Black-box behavior: lock/quality alarm policies can be opaque; reliable diagnosis requires status plus controlled stimuli.
Jitter filtering strategy: tracker vs cleaner loop bandwidth partitioning
Two-loop SyncE designs work when each loop has a clear job: the tracker follows upstream frequency (wider bandwidth), while the cleaner suppresses random jitter and isolates noise (narrower bandwidth). The system must enforce ownership—who passes, who attenuates, and where noise is re-added—and must avoid loop contention that turns “locked” into “unstable in the field.”
Tracker loop:
- Purpose: follow upstream frequency and maintain SyncE continuity.
- Expected behavior: pass slow variations while avoiding unnecessary high-frequency noise transfer.
- Common misuse: using tracker output as final delivery clock without downstream cleaning.
Cleaner loop:
- Purpose: reduce random jitter and decouple upstream noise from local distribution.
- Expected behavior: avoid chasing slow disturbances that belong to tracking/selection policy.
- Common misuse: setting cleaner BW too wide, turning it into a noisy follower.
- Pass: slow frequency variations that must be tracked to stay aligned to the timing chain.
- Attenuate: random jitter that should not reach distribution and endpoints.
- Add: board-level coupling and distribution artifacts that can re-inject noise even after cleaning.
Bandwidth contention:
- Symptom: lock holds, but alarms flap; switching causes overshoot or ringing.
- Fast check: tracker output and cleaner output move in the same direction under the same stimulus.
- Fix direction: enforce bandwidth hierarchy and clear ownership boundaries.
Coupled-loop oscillation:
- Symptom: periodic modulation appears after changes or during holdover transitions.
- Fast check: lock stays true while frequency error or phase trend oscillates.
- Fix direction: reduce coupling paths (control, power, policy), and widen separation of dynamic responses.
Cleaner bandwidth too wide:
- Symptom: lock time improves but output jitter exceeds budget or endpoints lose tolerance margin.
- Fast check: cleaner output “follows” upstream disturbances too closely.
- Fix direction: narrow cleaner bandwidth and keep it as a true jitter attenuator.
Tracker bandwidth too narrow:
- Symptom: good jitter at steady state, but drops lock or downgrades quality under drift/switch events.
- Fast check: tracker output cannot accommodate wander-like changes without entering alarms/holdover.
- Fix direction: widen tracker tracking intent and align policy timers with loop dynamics.
- Wander-like stimulus: tracker should reflect tracking; cleaner should not over-chase.
- Jitter-like stimulus: cleaner should attenuate; endpoint should gain margin vs recovered clock.
- Switching stimulus: transients remain within budget-derived placeholders and do not cause alarm flapping.
- Pass template: under defined stimuli, tracker and cleaner outputs must show distinct ownership trends rather than co-moving.
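The ownership split above can be illustrated with two cascaded first-order loops: a wide tracker that passes slow wander and a narrow cleaner that strips fast jitter. A simplified simulation sketch; the first-order models and the 0.1 Hz / 500 Hz stimuli are illustrative assumptions, not a PLL design:

```python
import math


def lowpass(signal, bw_hz, dt):
    """First-order loop model: output slews toward the input with ~bw_hz cutoff."""
    alpha = 2 * math.pi * bw_hz * dt
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


dt = 1e-4                                                      # 10 kHz sampling
t = [i * dt for i in range(100_000)]                           # 10 s of stimulus
wander = [math.sin(2 * math.pi * 0.1 * ti) for ti in t]        # slow 0.1 Hz drift
jitter = [0.2 * math.sin(2 * math.pi * 500 * ti) for ti in t]  # fast 500 Hz noise
line = [w + j for w, j in zip(wander, jitter)]                 # recovered line clock

tracker_out = lowpass(line, bw_hz=50.0, dt=dt)        # wide loop: follows wander
cleaner_out = lowpass(tracker_out, bw_hz=1.0, dt=dt)  # narrow loop: strips jitter


def hf_activity(s):
    """Peak sample-to-sample step; dominated by the 500 Hz jitter term."""
    return max(abs(b - a) for a, b in zip(s, s[1:]))


raw, cleaned = hf_activity(line), hf_activity(cleaner_out)
```

The pass intent shows up directly: `cleaner_out` still carries the full wander swing (tracking preserved) while its high-frequency activity drops by orders of magnitude versus the line clock, i.e., the two outputs show distinct ownership trends rather than co-moving.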
Quality Levels (QL), SSM/ESMC, and avoiding timing loops
In SyncE networks, frequency is selected and propagated through many nodes. Quality Levels (QL) carried by SSM/ESMC exist to make source selection deterministic: every node can tell which reference is more trustworthy, when to downgrade, and when to switch. Without consistent QL policy, timing loops can form—nodes indirectly reference each other—leading to drift-like behavior, alarm flapping, and unstable switching.
- QL: a network-visible trust label for frequency references, enabling consistent “best source” decisions.
- SSM/ESMC: a transport mechanism for QL so downstream nodes see upgrades/downgrades in real time.
- Operational value: prevents “link-up but wrong source” scenarios and makes failures isolatable by policy and logs.
Input priority list:
- Setting: ordered list of eligible inputs.
- Risk: priority-only selection can choose a high-priority but low-trust source.
- Pass intent: selection must satisfy both priority and minimum QL gate.
QL gating:
- Setting: acceptable QL set for “valid reference”.
- Risk: loose gating spreads degraded sources downstream.
- Pass intent: downgrades must propagate consistently; no local “false high quality”.
Revertive policy:
- Setting: whether to return to primary after recovery.
- Risk: immediate revertive behavior can cause repeated disturbances and flapping.
- Pass intent: any return-to-primary must satisfy holdoff/WTR stability windows.
Timers (holdoff / wait-to-restore / gating):
- Setting: delay before switching, recovery wait, and alarm suppression window.
- Risk: too short → flapping; too long → slow fault isolation and recovery.
- Pass intent: switch triggers require persistent evidence, not transient events.
- Conditions: mutual reference selection, inconsistent QL propagation, revertive + short timers, or wrong advertisement of derived sources.
- Symptoms: drift-like oscillation, QL/source flapping, alarms synchronized across nodes, “stable link but unstable quality”.
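The selection policy above (minimum-QL gate first, then quality rank, then priority, with WTR gating on restored sources) can be sketched as a pure function. The QL names follow common SSM usage, but the ranking table, gate value, and data shapes are illustrative assumptions:

```python
# Lower rank = more trusted; QL-DNU ("do not use") must never be selected.
QL_RANK = {"QL-PRC": 0, "QL-SSU-A": 1, "QL-SSU-B": 2, "QL-EEC1": 3, "QL-DNU": 9}
QL_GATE = "QL-EEC1"   # worst QL still accepted as a valid reference


def select_source(candidates, wtr_expired):
    """Pick a reference: QL gate first, then QL rank, then priority.

    candidates: dicts with 'name', 'priority' (lower = preferred), 'ql', and
    a 'recently_restored' flag.  wtr_expired: set of names whose
    wait-to-restore timer has already run out.
    """
    eligible = [
        c for c in candidates
        if QL_RANK[c["ql"]] <= QL_RANK[QL_GATE]                       # QL gate
        and (not c["recently_restored"] or c["name"] in wtr_expired)  # WTR gate
    ]
    if not eligible:
        return None  # no valid reference: enter holdover
    return min(eligible, key=lambda c: (QL_RANK[c["ql"]], c["priority"]))["name"]


cands = [
    {"name": "A", "priority": 1, "ql": "QL-DNU",   "recently_restored": False},
    {"name": "B", "priority": 2, "ql": "QL-PRC",   "recently_restored": True},
    {"name": "C", "priority": 3, "ql": "QL-SSU-A", "recently_restored": False},
]
# A is gated out by QL; B has the best QL but is still inside its WTR window,
# so C is selected until B's wait-to-restore expires.
chosen = select_source(cands, wtr_expired=set())
```

Keeping selection a pure function of (candidates, timer state) is what makes the decision replayable from logs: the same inputs must always produce the same choice.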
Protection switching & holdover: hitless goals and what to log
Protection switching in SyncE is an operational requirement: references change because links fail, QL degrades, or maintenance occurs. A “hitless” goal is achieved when transients are contained within system tolerance windows and do not trigger service-impacting behaviors. Holdover is the controlled bridge between references; it must remain bounded, observable, and recoverable through disciplined timers and logging.
- Link loss (LOS): largest transient risk; must rely on predefined priorities and timers.
- QL degrade: link remains up but is untrusted; risk is overreaction and flapping without gating.
- Manual switch: maintenance action; must obey the same holdoff/WTR rules to avoid unnecessary disturbances.
- Internal fault: local PLL/thermal/supply alarm; must be logged with root reason and configuration context.
- Transient window: switching disturbance stays within tolerance derived from network budget and equipment grade.
- Alarm-gate window: short post-switch disturbance does not cause persistent alarms or repeated switching.
- Settling window: reacquisition converges within a defined window and remains stable (no flapping).
- Entry: reference loss or QL below gate; policy must define when to enter and what to freeze.
- Bounded drift: temperature and aging dominate; holdover duration is budgeted, not assumed.
- Recovery: restored references require holdoff/WTR and a settling gate to prevent “recover-then-flap”.
- timestamp (local time base)
- event type (LOS / QL degrade / manual / internal fault)
- from source → to source (source ID)
- decision inputs: QL, priority rank, timer states (holdoff/WTR/gate)
- reason code (primary + secondary)
- state (LOCKED / HOLDOVER / REACQUIRE)
- freq offset trend / drift indicator
- temperature / supply health snapshots
- PLL profile ID (configuration “mode”, not raw registers)
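The log fields above can be flattened into one replayable record per switching event, emitted as a JSON line. A minimal sketch; the field names mirror the lists above but the record shape is an illustrative assumption, not a standard format:

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SwitchEvent:
    """One replayable protection-switching record (fields mirror the schema above)."""
    timestamp: float        # local time base, seconds
    event_type: str         # "LOS" | "QL_DEGRADE" | "MANUAL" | "INTERNAL_FAULT"
    from_source: str
    to_source: str
    ql: str                 # QL seen at decision time
    priority_rank: int
    timer_states: dict      # {"holdoff": ..., "wtr": ..., "gate": ...}
    reason_primary: str
    reason_secondary: str
    state: str              # "LOCKED" | "HOLDOVER" | "REACQUIRE"
    freq_offset_ppb: float  # drift indicator at the event instant
    temperature_c: float
    supply_ok: bool
    pll_profile_id: str     # configuration "mode", not raw registers


evt = SwitchEvent(
    timestamp=1042.7, event_type="LOS", from_source="port-1", to_source="port-2",
    ql="QL-SSU-A", priority_rank=2,
    timer_states={"holdoff": "expired", "wtr": "idle", "gate": "open"},
    reason_primary="LOS", reason_secondary="persistent>holdoff",
    state="REACQUIRE", freq_offset_ppb=3.2, temperature_c=47.5,
    supply_ok=True, pll_profile_id="profile-telecom-A",
)
line = json.dumps(asdict(evt))  # one JSON line per event for replay tooling
```

One line per event, machine-parseable, is what turns "what changed?" from archaeology into a query.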
Hardware implementation: reference tree, power isolation, and distribution
A SyncE design succeeds or fails on the board. The practical goal is a clock tree that is hierarchical (clean near the root, distribute near the leaves), isolated (supply and return paths do not inject into PLL-sensitive nodes), and measurable (test points exist to separate “root quality” from “distribution damage”).
- Tree: Ref in → (selector/protection) → tracker/cleaner → fanout/buffers → endpoints (PHY / FPGA / SoC).
- Root rule: place “cleaning” early so downstream branches do not multiply noise paths.
- Leaf rule: fanout stages should focus on drive, standards, and load isolation—avoid asking them to “fix” jitter.
- Dedicated low-noise rail (LDO or filtered branch) placed close to the load.
- Short, tight decoupling loops to avoid turning filters into resonators.
- Return path kept continuous; avoid forcing sensitive currents to detour.
- Separate control rail prevents I²C/SPI activity from modulating the analog island.
- Fanout drivers can warrant their own branch to avoid switching noise back-injection.
- If rails must be shared, enforce isolation with filters and a controlled return.
- ZDB (zero-delay buffer): best for delay alignment across domains when feedback alignment is part of the requirement.
- Fanout buffers: best for multi-output low-jitter distribution and level-standard matching.
- Across connectors/backplanes: treat clock as a controlled-impedance channel with continuous reference planes and explicit return paths.
PCB layout & real-world pitfalls (EMI coupling, terminations, SSC policy)
Most field failures are not caused by “wrong frequency,” but by injection paths that turn a clean clock into an unstable reference: supply ripple modulates PLLs, return paths create common-mode injection, and poor terminations amplify reflections. SSC can help EMI, but it must be treated as a policy knob—enabled only where compatibility and timing margins are proven.
- Mistake: termination topology/placement inconsistent with the receiver channel.
- Symptom: reflections and common-mode stress → jitter rises or lock alarms become sensitive.
- Quick check: observe far-end waveform and common-mode stability around switching events.
- Mistake: impedance discontinuities or routing across split reference planes.
- Symptom: stable on bench, fragile in chassis/backplane conditions.
- Quick check: compare near-end vs far-end; if far-end degrades sharply, the channel is the culprit.
- Mistake: incorrect bias/termination network leading to shifted common-mode point.
- Symptom: reduced receiver margin and higher sensitivity to ground bounce.
- Quick check: verify common-mode stability during activity bursts and thermal changes.
- Ripple: PSU noise → PLL modulation. Evidence comes from a nearby analog-rail probe.
- Crosstalk: fanout switching or parallel routing → differential-pair disturbance. Evidence comes from near-end vs far-end comparison.
- Return: connector/backplane + discontinuous reference planes → ground bounce/common-mode injection. Evidence comes from local ground-sense observations.
Enable SSC only when:
- downstream endpoints explicitly tolerate SSC and the timing margin is proven.
- SSC does not sit inside the most sensitive synchronization branch (or can be isolated to non-critical outputs).
- logs correlate SSC state with any switching/alarms for fast rollback decisions.
Avoid (or disable) SSC when:
- the branch is timing-critical and jitter/alarms are already near the edge.
- compatibility is uncertain and symptoms appear as “lock but flaps” or “training fails”.
- switching behavior becomes unstable even when the link remains up.
Fix injection paths in this order:
- stabilize the analog island rail and its return loop (prove with power probe evidence).
- repair return-path discontinuities and ground bounce injection paths.
- fix termination topology and common-mode stability for the chosen signaling standard.
- reduce long parallel coupling (routing spacing, stitching, layer planning) and connector return weakness.
Engineering checklist (bring-up → verification → deployment)
This checklist is designed to deliver a repeatable bring-up: establish one stable timing baseline first, then introduce switching and alarms only after measurement taps prove the clock tree is clean at the root and intact at the far end. Each step below includes Action, Probe, and Pass criteria.
Action: power up with a single known-good reference and automatic switching disabled.
Probe: reference selector status + lock indicators (LOS/LOL) + event counter baseline.
Pass: stable selection state (no flapping) within a defined observation window.
Action: bring up the Ethernet link and confirm frequency transport end to end.
Probe: TP2 (far-end endpoint input) + PHY status (link/lock).
Pass: TP2 frequency stable and PHY lock indicators remain steady during normal traffic.
Action: characterize the cleaner output at the root under controlled load steps.
Probe: TP1 (cleaner output near-end) + TP_PWR_A (analog rail ripple near the cleaner).
Pass: TP1 improves versus TP2 (root is cleaner than the far end) and no lock alarms appear under controlled load steps.
Action: enable fanout branches group by group and check for regression.
Probe: one near-end output + one far-end output per group (compare degradation).
Pass: enabling additional branches does not cause a measurable regression beyond the allocated distribution margin.
Action: enable automatic switching and exercise reference loss and restoration.
Probe: state timeline (LOCKED → HOLDOVER → REACQUIRE) + switching reason codes.
Pass: switching is explainable (why/when/which) and does not trigger flapping or alarm storms.
Action: measure output jitter under representative traffic and supply conditions.
Probe: TP1 vs TP2 + analog rail ripple correlation.
Pass: RMS jitter meets the allocated budget for the equipment class; no timing alarms triggered by normal operations.
Action: run a long soak and log frequency-offset statistics.
Probe: frequency offset statistics across short/long windows.
Pass: drift remains within the network budget and does not cause QL oscillation or repeated switching.
Action: force a protection switch and capture the transient.
Probe: state timeline + TP1/TP2 transient capture.
Pass: transient stays inside the system tolerance window and settles without flapping.
Action: extend the soak across temperature in the deployment environment.
Probe: drift slope + alarm rate + switch count.
Pass: predictable behavior across temperature and stable alarm rate under sustained operation.
Switching policy:
- priority order + revertive / non-revertive selection
- holdoff timers to prevent fast oscillation
- clear loop-avoidance policy (single upstream ownership)
Alarm policy:
- debounce gates for LOS/LOL and QL drop
- event-rate limits to prevent alarm storms
- SSC is a policy switch: must be reversible and logged
Rollback levers:
- force single reference + disable auto switching
- disable SSC + fall back to conservative loop profile
- isolate non-critical branches (fanout group disable)
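The debounce gates and event-rate limits above can be combined in one small per-alarm state machine. A sketch under assumed semantics (the condition is sampled with timestamps; class and parameter names are illustrative):

```python
from collections import deque


class DebouncedAlarm:
    """Report an alarm only after its condition persists; rate-limit reports."""

    def __init__(self, persist_s: float, max_events: int, window_s: float):
        self.persist_s = persist_s     # debounce gate (e.g. LOS must persist)
        self.max_events = max_events   # event-rate limit: max reports ...
        self.window_s = window_s       # ... within this sliding window
        self._asserted_since = None
        self._reports = deque()

    def update(self, t: float, condition: bool) -> bool:
        """Return True when a debounced, rate-limited alarm should be reported."""
        if not condition:
            self._asserted_since = None    # condition cleared: gate resets
            return False
        if self._asserted_since is None:
            self._asserted_since = t
        if t - self._asserted_since < self.persist_s:
            return False                   # transient: suppressed by debounce
        while self._reports and t - self._reports[0] > self.window_s:
            self._reports.popleft()
        if len(self._reports) >= self.max_events:
            return False                   # alarm storm: suppressed by rate limit
        self._reports.append(t)
        return True


los = DebouncedAlarm(persist_s=0.5, max_events=3, window_s=60.0)
blip1 = los.update(0.0, True)    # LOS asserted
blip2 = los.update(0.1, True)    # still inside the 0.5 s debounce gate
los.update(0.2, False)           # condition clears: gate resets
armed = los.update(10.0, True)   # re-asserted; debounce restarts
report = los.update(10.6, True)  # persisted past the gate: report
```

The same gate doubles as a switch trigger qualifier: a source change should require the same "persistent evidence" that an alarm report does.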
Applications & IC selection notes (carrier / 5G backhaul patterns)
This section ties SyncE to real carrier topologies and turns “timing requirements” into a block-level bill of materials. The goal is not a brand catalog, but a repeatable selection flow that maps each deployment pattern to the minimum set of blocks (PHY + cleaner + fanout + switching + monitoring) required to meet the local jitter/wander budget.
Single-reference access node:
Pressure points: thermal drift, rail ripple, connector return-path noise.
Blocks: SyncE-capable PHY + root cleaner + fanout (grouped) + minimal monitoring.
Log: lock/holdover transitions, drift slope, alarm rate, switch counter (if dual ref).
Multi-input aggregation node:
Pressure points: QL oscillation, timing loops, policy flapping.
Blocks: multi-input DPLL/cleaner with input selection + holdoff timers + fanout + robust logs.
Log: active input, QL in/out history, switch reason codes, holdoff state, event counters.
Ring/mesh topology:
Pressure points: loop formation, unstable priority when references are close.
Blocks: explicit QL policy + conservative revertive behavior (if needed) + stable alarm gating.
Log: QL changes and loop-related alarms correlated to traffic/routing events.
Dual-reference protection node:
Pressure points: frequent switching when inputs are similar; transient spikes.
Blocks: glitch-free selection + holdoff + holdover plan + detailed event logs.
Rollback: force one input + disable auto switching + disable SSC.
- SyncE-capable endpoints? If recovered clock is usable and observable, keep architecture simple; otherwise add an external cleaner stage.
- Tight jitter budget at the far end? Put a dedicated jitter attenuator/cleaner at the root and prove TP1 vs TP2 margin.
- Need hitless / controlled switching? Require glitch-free selection, holdoff timers, and replayable logs (reason codes + counters).
- Need holdover? Select oscillator class and DPLL features based on temperature profile and expected outage duration.
- Need monitoring? Add clock monitors / frequency or phase measurement hooks to enable fast field isolation.
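The selection flow above maps naturally onto a small decision function that returns the minimum block list for a deployment. A sketch; the question flags and block names paraphrase the bullets above and are not a formal BOM:

```python
def required_blocks(
    endpoints_synce_capable: bool,
    tight_far_end_jitter: bool,
    need_hitless_switching: bool,
    need_holdover: bool,
    need_monitoring: bool,
) -> list:
    """Map the selection-flow answers to a minimum block list."""
    blocks = ["SyncE-capable PHY"]
    if not endpoints_synce_capable or tight_far_end_jitter:
        # Either the recovered clock is unusable downstream, or the far-end
        # jitter budget demands cleaning at the root.
        blocks.append("root jitter attenuator / cleaner")
    if need_hitless_switching:
        blocks += ["glitch-free input selector", "holdoff/WTR timers", "event log"]
    if need_holdover:
        blocks += ["holdover-class oscillator", "DPLL with holdover support"]
    if need_monitoring:
        blocks.append("clock/frequency monitor")
    blocks.append("fanout buffers")
    return blocks


# Worst-case deployment: every question answered "yes" except SyncE-capable
# endpoints, which forces the external cleaner stage.
blocks = required_blocks(False, True, True, True, True)
```

Encoding the flow this way keeps the BOM decision auditable: the answer set, not a datasheet preference, determines what is in the design.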
These part numbers are provided to accelerate datasheet lookup and lab prototyping. Always verify suffix/package, supported input/output standards, telecom timing features, and availability for the target BOM.
FAQs (SyncE troubleshooting, 4-line answers)
Each FAQ is intentionally constrained to the SyncE boundary and uses the same 4-line, testable format: Likely cause / Quick check / Fix / Pass criteria.