Synchronous Ethernet (SyncE): Jitter Filtering & Holdover
What is SyncE and When You Actually Need It
Synchronous Ethernet (SyncE) distributes a network-wide frequency reference by recovering line timing from the Ethernet physical layer. A node uses the PHY CDR recovered clock as an input, applies jitter filtering (DPLL / jitter attenuator), and drives a local clock tree so downstream devices track a stable frequency baseline.
Scope note: SyncE addresses frequency stability (jitter/wander/holdover). Time-of-day or phase alignment belongs to PTP / White Rabbit pages and is not expanded here.
Free-running: each node runs from its own local XO. Short tests can look fine, but long-term wander and multi-hop drift accumulate.
PTP only: aligns time/phase using timestamps. If the local frequency is noisy, the time loop must correct more often and becomes harder to keep stable.
SyncE: builds a frequency foundation via line timing recovery plus filtering. Time alignment (if needed) becomes easier on top of a stable baseline.
Practical takeaway: SyncE primarily reduces frequency error propagation; it is most valuable when the system budget is dominated by wander or when holdover is mandatory during reference loss.
SyncE is usually worth adding when:
- Multi-hop chains where long-term drift becomes visible.
- Strict jitter/wander budget with small margin at endpoints.
- Reference-loss tolerance is required (holdover for X minutes/hours).
- Frequency stability affects deterministic behavior or RF/ADC/DAC timing.
SyncE is usually unnecessary for:
- Single node or short links with relaxed long-term stability.
- Systems where timing only impacts non-critical functions.
- Short duty cycles where wander never accumulates to a limit.
Common pitfalls:
- Noise coupling from power/ground/layout dominates; SyncE cannot “filter away” board-level coupling.
- Reference switching flaps due to policy; stability requires hysteresis/cooldown.
- Link behavior (EEE, auto-neg, flaps) injects disturbances into recovered timing.
Design Targets: Jitter, Wander, and Holdover Goals (What to Budget)
A SyncE design becomes predictable only when error sources are budgeted along a timing chain. Instead of treating “jitter” as a single number, each stage should declare what it passes, what it attenuates, and what it adds.
- Reference input → input quality placeholder: X ppm / X dB
- PHY CDR (recovered clock) → sensitivity to link behavior; added fast noise: X ps
- DPLL / jitter filter → bandwidth and attenuation; switching behavior; lock/re-lock time: BW = X
- Clock tree / fanout → additive jitter and coupling from power/ground: Add = X ps
- Sink requirement → system pass criteria: Total < X over Y time window
- Observable: recovered/filtered outputs, lock flags, alarms
- Coupling: temperature, voltage, link state events
- Evidence: logs/plots with timestamped context
- Keep measurement windows consistent across stages.
- Do not compare numbers produced under different integration bands.
- Separate “fast jitter” limits from “slow wander/holdover” limits.
Dominant losses often come from filter cascade strategy and switching behavior (lock time, hitless requirements, anti-flap policy).
Dominant losses often come from holdover drift under temperature and stress, plus incomplete event logging during reference anomalies.
Dominant losses often come from power/ground coupling into the clock tree and link-state disturbances (EEE, flaps, auto-neg transitions).
Engineering rule: separate fast jitter limits from slow wander/holdover limits, and keep measurement windows and integration settings consistent across the entire timing chain.
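The per-stage budget above can be sketched as a small calculation. This is a minimal illustration, assuming every stage contributes uncorrelated random jitter expressed as RMS over the same integration band, so the terms combine by root-sum-square. The `Stage` class, the function names, and all numeric values are hypothetical placeholders, not measurements. Deterministic or correlated terms (for example supply spurs) do not combine this way and need their own budget line.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    added_rms_ps: float  # random jitter this stage adds at the sink, ps RMS (placeholder)


def total_rms_ps(stages):
    """Root-sum-square of per-stage RMS jitter; valid only for uncorrelated random terms."""
    return math.sqrt(sum(s.added_rms_ps ** 2 for s in stages))


def margin_ps(stages, sink_limit_rms_ps):
    """Positive result: the chain fits the sink limit with that much margin."""
    return sink_limit_rms_ps - total_rms_ps(stages)


# Placeholder values for illustration only (same integration band assumed for all).
chain = [
    Stage("phy_cdr_residual", 3.0),
    Stage("dpll_intrinsic", 4.0),
    Stage("clock_tree", 12.0),
]

print(f"total  = {total_rms_ps(chain):.2f} ps RMS")
print(f"margin = {margin_ps(chain, 15.0):.2f} ps against a 15 ps limit")
```

Because the largest term dominates a root-sum-square total, reducing the clock-tree entry in this example helps far more than polishing the smaller ones.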
SyncE Architecture: EEC/SEC Roles and the End-to-End Timing Chain
Provides the preferred frequency reference and declares usable quality. Stability matters more than a “high label” that flaps.
Selects among references, filters recovered timing, and enforces switching/holdover behavior to keep the chain stable.
Passes timing through without injecting disturbances; link-state behaviors can couple into recovered timing if unmanaged.
Consumes the clock and defines pass criteria (jitter/wander/holdover windows) with evidence logging for verification.
Scope note: architecture here describes frequency timing roles. Time/phase roles (PTP/WR) are out of scope for this page.
A practical SyncE system can be described by four functions that appear at key nodes: recover, filter, select, and holdover. The EEC (Ethernet Equipment Clock, specified in ITU-T G.8262) is the node-level “engine” that recovers timing from the PHY, filters it, selects among references, and provides holdover when inputs degrade. The SEC (SDH Equipment Clock, ITU-T G.813) is the SDH counterpart that EEC performance is modeled on, which is why the two names usually appear together.
- Short path, simple integration.
- Higher coupling to link-state behavior.
- Limited control range in some designs.
- Stronger control (BW, hitless, multi-input).
- Good for cascade strategy across nodes.
- Requires clean power/clock-tree hygiene.
- High integration, fewer parts.
- Clock-domain complexity can hide coupling.
- Verification needs better observability.
Placement rule: the filter location determines which disturbances are attenuated and which are imported into the downstream clock tree.
Architecture rule: map every SyncE issue to a chain position (recover/filter/select/holdover) before tuning parameters; the fix becomes measurable and repeatable.
Quality Level and Network Messaging: SSM/ESMC Without Getting Lost
Quality information carried by SSM / ESMC is a selection signal for frequency reference choices. It helps nodes converge on a consistent reference path, but it must be treated as a policy input, not a guarantee that the chosen reference is always the cleanest.
- Consistent reference selection across nodes.
- Deterministic switching behavior during faults.
- Loop-avoidance when combined with topology rules.
- A “best-labeled” input can be unstable (flapping).
- Configuration mismatch can create false-best decisions.
- Policies without timers can turn marginal changes into oscillation.
Scope note: this page uses quality messaging only for frequency reference selection. PTP time-domain selection (BMCA) is out of scope here.
- Priority: a preferred order for inputs (per node, per port).
- Quality: SSM/ESMC quality level used as eligibility and preference.
- Alarms: LOS / degradation flags that disqualify an input.
- Timers: hold-off / cooldown / stability windows that prevent flapping.
Returns to the preferred input when it becomes available again. Needs strong stability timers to avoid bouncing.
Stays on the current input after switching. Reduces bounce risk, but must track long-term quality to avoid silent degradation.
Every switch should record reason codes, QL values, alarms, timer states, and temperature for later correlation.
- Symptom: reference “chases itself” across a ring.
- Pattern: mutual preference or topology rule missing.
- First check: topology eligibility + per-port priority/QL mapping.
- Symptom: “best” input yields worse stability.
- Pattern: config mismatch or unstable link behavior.
- First check: ESMC parsing/config + link-event correlation.
- Symptom: frequent switching, every few seconds or minutes.
- Pattern: no hysteresis/hold-off/cooldown around thresholds.
- First check: timer states + reason codes + stability window.
Selection rule: use QL/alarms as eligibility signals, then enforce timers (hold-off/cooldown) so marginal changes do not become oscillation; log every switch with reason codes for forensics.
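The eligibility-then-preference idea can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any standard or vendor API: `RefInput`, `ql_rank`, and `select_reference` are hypothetical names, a lower `ql_rank` is assumed to mean a better quality level, and quality is compared before per-port priority. Timer handling is reduced to a single stability window.

```python
from dataclasses import dataclass


@dataclass
class RefInput:
    name: str
    priority: int     # lower number = more preferred (assumed convention)
    ql_rank: int      # lower number = better quality level (assumed mapping)
    los: bool         # loss-of-signal alarm
    degraded: bool    # degradation alarm
    stable_s: float   # seconds the input has been continuously alarm-free


def eligible(ref, stability_window_s):
    """Alarms disqualify an input; a recently recovered input must wait out the window."""
    return not ref.los and not ref.degraded and ref.stable_s >= stability_window_s


def select_reference(inputs, stability_window_s):
    """Return the preferred eligible input, or None when the node should enter holdover."""
    candidates = [r for r in inputs if eligible(r, stability_window_s)]
    if not candidates:
        return None
    # Quality first, then configured priority as the tie-breaker.
    return min(candidates, key=lambda r: (r.ql_rank, r.priority))


inputs = [
    RefInput("port1", priority=1, ql_rank=2, los=False, degraded=False, stable_s=100.0),
    RefInput("port2", priority=2, ql_rank=1, los=False, degraded=False, stable_s=2.0),
]
chosen = select_reference(inputs, stability_window_s=10.0)
print(chosen.name if chosen else "HOLDOVER")
```

In this example port2 carries the better label but has only been stable for 2 s, so it is not yet eligible and port1 is chosen. That is the "best-labeled input can be unstable" case handled by policy rather than by trust in the label.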
PHY Clock Recovery: CDR, Recovered Clock, and the Hidden Couplings
SyncE frequency stability starts at the PHY. Before tuning filters or policies, identify the clock source and the exact tap point used as the reference input.
- Derived from the link signal via CDR.
- Most sensitive to link-state and cable behaviors.
- Best for inheriting network frequency when the link is stable.
- Independent from link disturbances.
- Primary anchor for holdover behavior.
- Quality depends on oscillator class, power, and thermal gradients.
- Generated via PLL/dividers for distribution.
- Useful for SoC clock trees and fanout.
- Can import PLL/power noise if the clock tree is not isolated.
Mapping rule: document the chain as source → tap point → reference input → filter → distribution → sink. With the chain written down, each debugging step can be tied to a specific tap and measured there.
Power-save entry/exit can create short timing disturbances that appear as “random” phase steps unless event logs are aligned to clock alarms.
Brief down/up cycles can force re-lock paths and trigger unnecessary reference switching if hold-off and stability windows are missing.
Negotiation windows can temporarily degrade recovered timing; treating these intervals as “valid reference” causes selection oscillation.
Reflections, return-loss margin, and connector behavior can modulate the PHY’s recovery loop, especially under EMI and load steps.
Key idea: many “network features” are effectively timing injectors at the PHY recovery boundary unless guard timers and eligibility rules are defined.
Real networks introduce link-state transitions, negotiation, and device-to-device behavior that PRBS does not exercise.
EMI, temperature gradients, power noise, and ground/shield paths can convert “margin” into timing instability.
Cascaded recover/filter/select stages create system-level dynamics that are absent in bench loopbacks.
- Baseline: PRBS/loopback for link health and error floors.
- Events: force EEE, renegotiation, and controlled re-link; record timing alarms.
- Field: replay typical load/EMI events; correlate to clock quality counters and selection logs.
Root cause shortcut: when recovered clock becomes unstable, check the tap point and link-state couplings first; filters and policies cannot fully “fix” a contaminated source.
Jitter Filtering: DPLL Bandwidth, Cascading, and Hitless Behavior
Fast disturbances should be attenuated; shrinking bandwidth blindly can hide problems by slowing lock and increasing recovery time.
Too-narrow bandwidth can treat wander as “noise” and push error into long-period deviations that are harder to detect.
Filtering cannot remove oscillator drift; long outages require explicit holdover policy and oscillator quality targets.
Decision input list: recovered-clock disturbance shape, acceptable lock time, and downstream sensitivity. Bandwidth is a system constraint, not a single “better” knob.
- Useful when each hop is noisy.
- Reduces immediate jitter propagation.
- Risk: overly slow chain response when cascaded.
- Centralizes timing cleanup at key nodes.
- Other nodes keep minimal shaping.
- Needs observability to prevent silent degradation.
- Declare every tap point and bandwidth goal per stage.
- Avoid “black-box filters” in series without visibility.
- Keep reason codes and lock metrics per node for correlation.
- Input frequency delta stays within the capture and steering range.
- Inputs meet eligibility and stability windows before switching.
- Switching is protected by hold-off/cooldown/hysteresis to prevent bounce.
- DPLL mode transitions are explicit (not accidental) during switch events.
- Max phase step ≤ X
- Recovery time ≤ X
- Post-switch stability within X for Y seconds
Pitfall: a switch can “look smooth” only because measurement windows are too slow or alarms are not time-aligned to the event.
Filter rule: set DPLL bandwidth to attenuate fast jitter while tracking slow wander; cascading “strong filters everywhere” often makes the chain slow and fragile.
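The bandwidth and cascading trade-off can be illustrated with a deliberately simplified model. The sketch below assumes each filter behaves as an ideal first-order low-pass on jitter, which is an assumption made for clarity: real DPLLs are commonly second-order and show gain peaking that this model ignores. Function names and frequencies are hypothetical.

```python
import math


def lowpass_gain(f_hz, bw_hz):
    """Jitter transfer magnitude of a first-order low-pass at offset frequency f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / bw_hz) ** 2)


def cascade_gain(f_hz, bw_hz, stages):
    """Transfer magnitude of `stages` identical first-order filters in series."""
    return lowpass_gain(f_hz, bw_hz) ** stages


def cascade_bandwidth(bw_hz, stages):
    """-3 dB bandwidth of `stages` identical first-order filters in series."""
    return bw_hz * math.sqrt(2.0 ** (1.0 / stages) - 1.0)


bw = 10.0  # placeholder loop bandwidth in Hz
for n in (1, 2, 4, 8):
    print(
        f"{n} stage(s): effective BW = {cascade_bandwidth(bw, n):.2f} Hz, "
        f"gain at 1 kHz = {cascade_gain(1000.0, bw, n):.3e}, "
        f"gain at 0.1 Hz = {cascade_gain(0.1, bw, n):.6f}"
    )
```

Slow wander (0.1 Hz here) passes almost unchanged at every depth, fast jitter is attenuated more with each stage, and the effective bandwidth of the whole chain shrinks as stages are added. That shrinkage is the "slow chain" effect: the end-to-end response to a real reference change takes longer than any single node's settings suggest.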
Holdover: Oscillator Choice, Disciplining, and Recovery Strategy
Holdover is a budgeted behavior: it specifies how long frequency stays within limits and which drift terms dominate when the reference disappears.
- Ambient changes and local hot-spots drive frequency drift.
- Airflow direction and enclosure gradients matter as much as absolute temperature.
- First check: log ΔT/Δt near the oscillator, not only chassis air.
- Long-term monotonic drift that short tests rarely expose.
- Needs baseline snapshots across lifecycle milestones.
- First check: compare to archived “day-0” frequency reference.
- Supply noise can translate into phase noise and short disturbances.
- Co-rail coupling with PHY/CPU can create timing spikes.
- First check: align supply events with drift slope changes.
- Board flex, mounting torque, and encapsulation can shift frequency.
- Often shows up as “bench stable, installed drifting”.
- First check: compare before/after assembly and mounting steps.
Budget entries (placeholders): drift_rate = X, temp_sensitivity = X, max_holdover_time = X, max_recovery_step = X.
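These budget entries can be combined into a simple drift envelope. The sketch below assumes a common first-approximation model: a constant initial offset, linear aging, and a temperature term proportional to a temperature change that is applied at the start of holdover and then held. All names and numbers are hypothetical placeholders rather than datasheet values; real oscillators also show hysteresis, retrace, and nonlinear temperature curves.

```python
def freq_offset_ppb(t_s, initial_ppb, aging_ppb_per_day, temp_coeff_ppb_per_c, delta_t_c):
    """Fractional frequency offset (ppb) after t_s seconds of holdover."""
    aging = aging_ppb_per_day * (t_s / 86400.0)
    thermal = temp_coeff_ppb_per_c * delta_t_c
    return initial_ppb + aging + thermal


def time_error_ns(t_s, initial_ppb, aging_ppb_per_day, temp_coeff_ppb_per_c, delta_t_c):
    """Accumulated phase (time) error in ns: the integral of the frequency offset.

    1 ppb held for 1 s accumulates 1 ns, so ppb * seconds gives ns directly.
    """
    aging_ppb_per_s = aging_ppb_per_day / 86400.0
    constant_ppb = initial_ppb + temp_coeff_ppb_per_c * delta_t_c
    return constant_ppb * t_s + 0.5 * aging_ppb_per_s * t_s ** 2


# Placeholder budget entries (illustrative only).
budget = dict(initial_ppb=1.0, aging_ppb_per_day=2.0,
              temp_coeff_ppb_per_c=0.5, delta_t_c=4.0)

for hours in (1, 4, 24):
    t = hours * 3600.0
    print(f"{hours:>2} h: offset = {freq_offset_ppb(t, **budget):.3f} ppb, "
          f"time error = {time_error_ns(t, **budget):.0f} ns")
```

Comparing the printed offsets against the frequency limit gives a first estimate of max_holdover_time, and the accumulated time error indicates how large a step the recovery stage must absorb when the reference returns.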
On reference loss, stop following link noise and hold the last valid state to avoid chasing flapping inputs.
Apply slow, bounded corrections using temperature history or learned trends; enforce rate and step limits.
When reference returns, re-lock in stages to avoid phase steps: small capture bandwidth first, then restore normal tracking.
- MAX HOLDOVER TIME = X
- MAX RECOVERY STEP = X
- ELIGIBILITY WINDOW = X
A good holdover design is not “perfect stability”, but a predictable drift envelope and a controlled return without secondary disturbances.
Fan profile changes can create timing wander “steps” by forcing local temperature transients near the oscillator.
A stable ambient reading can still hide a persistent gradient across the board that pushes long-term drift.
Mounting torque and board constraints can shift the oscillator’s operating point and amplify temperature sensitivity.
Implementation hint: holdover performance can only be diagnosed as well as it is logged. Store reference loss/return timestamps, temperature near the oscillator, and power/fan events.
Holdover rule: select oscillator class by drift budget, enforce time/step limits, and re-lock using staged, phase-continuous recovery.
Reference Selection and Protection Switching: Avoid Loops and Flapping
A stable selector treats reference inputs as eligible only after they pass a window, then switches through controlled transitions rather than reacting to every glitch.
- A is selected and stable.
- Exit on LOS / invalid / policy trigger.
- Transitions guarded by timers.
- B is selected and stable.
- Exit on LOS / invalid / policy trigger.
- Symmetric rules reduce corner cases.
- No eligible reference is available.
- Holdover policy maintains frequency.
- Return requires eligibility + hold-off.
Require stability for X before selection; prevents glitch-driven switching.
Use separate enter/exit thresholds; reduces threshold-edge oscillation.
Delay switching on transient degradation; avoids reacting to short disturbances.
After a switch, block further switching for X; prevents bounce loops.
Combine LOS + validity + quality flags; avoids “false-best” from a single indicator.
Store reason codes and timer states for every switch; enables field forensics and tuning.
- Define a single reference direction per domain; avoid “mutual upstream” paths.
- Ensure A/B identities remain consistent across ports, nodes, and logs.
- Bind revertive behavior to eligibility windows and cooldown (no instant bounce-back).
- Do not cascade multiple unknown selector policies without visibility.
- Keep failure modes monotonic: degrade → holdover → recover (avoid rapid toggles).
- Pin policy at boundary nodes so edge glitches do not amplify network-wide switching.
- Use consistent timestamps and reason codes for loss/restore/switch events.
- Verify switching with pass criteria: max step ≤ X, recover time ≤ X.
Switching rule: selection must be gated by eligibility windows and protected by hysteresis/hold-off/cooldown to prevent loops and flapping.
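The A / B / holdover behavior with guard timers can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the `Selector` class and its fields are hypothetical, inputs are reduced to a boolean "valid" flag (so threshold hysteresis is not modeled), A is assumed to be the preferred input, and loss of the active input is always acted on immediately while cooldown only blocks optional switches.

```python
from dataclasses import dataclass, field


@dataclass
class Selector:
    eligibility_s: float       # input must be continuously valid this long
    cooldown_s: float          # minimum time between optional switches
    revertive: bool = True     # return to A once it is eligible again
    state: str = "HOLDOVER"    # "A", "B" or "HOLDOVER"
    last_switch_t: float = float("-inf")
    valid_since: dict = field(default_factory=lambda: {"A": None, "B": None})
    log: list = field(default_factory=list)

    def _eligible(self, name, now):
        since = self.valid_since[name]
        return since is not None and now - since >= self.eligibility_s

    def _switch(self, new_state, now, reason):
        if new_state != self.state:
            # Every transition is logged with a timestamp and a reason code.
            self.log.append((now, self.state, new_state, reason))
            self.state = new_state
            self.last_switch_t = now

    def update(self, now, valid_a, valid_b):
        # Track how long each input has been continuously valid.
        for name, valid in (("A", valid_a), ("B", valid_b)):
            if not valid:
                self.valid_since[name] = None
            elif self.valid_since[name] is None:
                self.valid_since[name] = now

        # Loss of the active input is handled at once; cooldown never blocks it.
        if self.state in ("A", "B") and self.valid_since[self.state] is None:
            other = "B" if self.state == "A" else "A"
            target = other if self._eligible(other, now) else "HOLDOVER"
            self._switch(target, now, "LOS")
            return self.state

        if now - self.last_switch_t < self.cooldown_s:
            return self.state  # cooldown blocks optional switches

        if self.state == "HOLDOVER":
            for name in ("A", "B"):
                if self._eligible(name, now):
                    self._switch(name, now, "REF_RESTORED")
                    break
        elif self.state == "B" and self.revertive and self._eligible("A", now):
            self._switch("A", now, "REVERT")
        return self.state


sel = Selector(eligibility_s=5.0, cooldown_s=10.0)
timeline = [(0, True, True), (5, True, True), (6, False, True),
            (7, True, True), (12, True, True), (16, True, True)]
for t, a_ok, b_ok in timeline:
    print(t, sel.update(t, a_ok, b_ok))
print(sel.log)
```

In this timeline A drops out for a single tick at t = 6. The selector moves to B at once, but the return to A waits until A has passed its eligibility window and the cooldown has expired, so a one-tick glitch produces exactly one switch away and one controlled switch back rather than a bounce.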
Implementation Blueprint: Switch/Gateway Clock Tree, Layout, Noise Hygiene
A robust SyncE implementation treats the clock path as a domain-crossing system: recovery, selection, cleaning, and distribution must be explicit and reviewable.
- Input: PHY recovered clock
- Clean: DPLL / jitter attenuator
- Distribute: fanout → SoC/FPGA/SerDes
- Key check: isolate noisy rails from the cleaner
- Inputs: recovered A + recovered B
- Select: eligibility + cooldown guarded
- Clean: single DPLL stage
- Key check: switch events must be observable
- Input: recovered + local XO
- Discipline: bounded steer model
- Recover: staged lock return
- Key check: temperature near XO must be logged
- Input: recovered into SoC PLL
- Risk: platform noise coupling
- Mitigation: re-clean before fanout
- Key check: compare filtered vs sink clocks
Supply ripple and load steps can translate into phase noise in cleaners and fanouts.
Return discontinuities and ground bounce create edge timing uncertainty.
Long clock traces and proximity to high-speed pairs inject jitter through coupling and reflections.
Parasitic capacitance across isolation and ESD current paths can pollute the clock domain.
Quick triage: if recovered looks good but filtered looks bad, suspect DPLL power/return; if filtered looks good but sink looks bad, suspect fanout/routing/keepout.
Confirms link recovery quality and PHY sensitivity to link events.
Confirms the cleaner isolates upstream disturbance and local power noise.
Confirms fanout, routing, and clock consumers do not re-contaminate the clock.
Interpretation map: Tap1 bad → link/PHY; Tap1 good & Tap2 bad → cleaner power/return/bandwidth; Tap2 good & Tap3 bad → fanout/routing/keepout.
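The interpretation map can be written as a first-failing-tap lookup, which is convenient when tap results come from automated measurements. The function name and the boolean "ok" inputs are hypothetical; what counts as "ok" at each tap depends on the pass criteria defined for the system.

```python
def localize(tap1_ok, tap2_ok, tap3_ok):
    """Return the suspect region for the first tap (in chain order) that fails."""
    if not tap1_ok:
        return "link/PHY"
    if not tap2_ok:
        return "cleaner power/return/bandwidth"
    if not tap3_ok:
        return "fanout/routing/keepout"
    return "no contamination observed at the taps"


print(localize(tap1_ok=True, tap2_ok=False, tap3_ok=False))
```

Checking the taps in chain order matters: once an upstream tap is bad, downstream readings are not independent evidence, so the sketch reports only the earliest failure.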
Blueprint rule: make recovery, cleaning, and distribution explicit; separate power/return domains; validate at Tap1/Tap2/Tap3 to localize contamination.
Verification & Monitoring: Lab Tests, Field Telemetry, and Pass Criteria
A good validation plan is scenario-driven: apply controlled impairments, change link/platform conditions, and measure at the same three taps to keep comparisons consistent.
- loss / restore / degrade (placeholders)
- validate holdover + selection behavior
- observe Tap1/Tap2/Tap3
- link flap / re-train / cable change
- check recovered sensitivity vs filtered isolation
- observe Tap1/Tap2/Tap3
- load steps / fan steps / temp ramps
- check cleaner + fanout immunity
- observe Tap2/Tap3 changes
Fixed taps: Tap1 = recovered, Tap2 = filtered, Tap3 = sink out. Keeping taps fixed prevents “instrumented optimism” in lab-only setups.
- lock_state (A/B/HOLDOVER)
- enter/exit timestamps
- lock quality (placeholder)
- switch A↔B / to holdover
- reason_code (LOS/invalid/QL/timeout)
- timers snapshot (eligibility/hold-off/cooldown)
- temp near XO
- fan PWM/RPM
- supply events / brownout
- link up/down count
- error counters (placeholder)
- reset/restart events
Field forensics rule: every switch must have a reason code and timer context; otherwise tuning becomes guesswork and cross-node correlation breaks.
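One way to enforce this rule is to give every switch event a fixed record shape and to check it for completeness before it is stored. The sketch below is an assumption-laden illustration: the `SwitchEvent` fields and the `REQUIRED` tuple are hypothetical and simply mirror the telemetry items listed above.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Fields without which a switch event cannot be correlated across nodes (assumed set).
REQUIRED = ("timestamp_s", "from_state", "to_state", "reason_code", "timers")


@dataclass
class SwitchEvent:
    timestamp_s: Optional[float] = None
    from_state: Optional[str] = None       # A / B / HOLDOVER
    to_state: Optional[str] = None
    reason_code: Optional[str] = None      # LOS / invalid / QL / timeout
    timers: Optional[dict] = None          # eligibility / hold-off / cooldown snapshot
    temp_near_xo_c: Optional[float] = None
    link_flap_count: Optional[int] = None


def missing_fields(event):
    """List required fields that were never filled in for this event."""
    record = asdict(event)
    return [name for name in REQUIRED if record[name] is None]


event = SwitchEvent(timestamp_s=1234.5, from_state="A", to_state="HOLDOVER")
print(missing_fields(event))
```

Counting events with a non-empty result gives a direct measurement for a "logging completeness" pass criterion.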
- Lock acquisition: lock time ≤ X within Y window.
- Switching stability: switch count ≤ X per Y; cooldown enforced.
- Holdover envelope: drift ≤ X over Y after ref loss.
- Recovery behavior: max step ≤ X; recover time ≤ X.
- Noise immunity: platform event causes ≤ X change at Tap2/Tap3.
- Logging completeness: missing required fields ≤ X.
Acceptance posture: criteria must be measurable from the same taps and the same log schema in both lab and field.
Verification rule: use scenario-driven impairments, fixed taps, a unified telemetry schema, and X/Y/Z pass criteria so lab results translate to field closures.
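A criterion such as "switch count ≤ X per Y" is only testable once the window definition is fixed. The sketch below assumes a sliding half-open window of length Y applied to logged switch timestamps; the function names and the example numbers are hypothetical.

```python
def max_switches_in_window(switch_times_s, window_s):
    """Largest number of switches falling inside any half-open window of window_s seconds."""
    times = sorted(switch_times_s)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the window start until it lies within window_s of the current event.
        while t - times[start] >= window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def switching_stable(switch_times_s, max_count, window_s):
    """True when the log satisfies 'switch count <= max_count per window_s'."""
    return max_switches_in_window(switch_times_s, window_s) <= max_count


log_times = [0.0, 10.0, 20.0, 100.0, 105.0]  # placeholder timestamps in seconds
print(max_switches_in_window(log_times, 30.0))
print(switching_stable(log_times, max_count=2, window_s=30.0))
```

Running the same function on lab logs and field logs keeps the criterion identical in both environments, which is the point of a unified telemetry schema.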
Engineering Checklist (Design → Bring-up → Production)
Applications & IC Selection (Keep It Near the End)
- Recovered-clock PHY: TI DP83867IR or Microchip KSZ9131RNX
- DPLL / network synchronizer: Renesas 82P33714 or Microchip ZL30722
- Jitter cleaner (if split-stage): Silicon Labs Si5341A
- Oscillator for holdover: Microchip OX-2211-EAE-3091-10M000 or OX-049
- Fanout / distribution: TI CDCM6208 + TI LMK1C1104
- Telemetry: TI TMP117AIDRVR + TI INA226AIDGST
- Recovered-clock capable nodes: Microchip switch KSZ8567 (recovered clock support) + PHY KSZ9131RNX
- DPLL / network synchronizer: Microchip ZL30722 or Renesas 82P33714
- Oscillator emphasis: Abracon OCXO AOCTQ5-X-10.000MHZ-I3-SW or Microchip OX-2211-EAE-3091-10M000
- Distribution buffer: Renesas 5PB1108 (OE control + low additive jitter)
- Forensics monitors: ADI LTC2990 + Microchip MCP9808-E/MS
- Recovered-clock PHY: TI DP83867IR or Microchip KSZ9131RNX
- Jitter cleaning (compact): Silicon Labs Si5341A
- TCXO option: SiTime SiT5356 or Abracon AST3TQ-T-30.720MHZ-28
- Clock fanout: TI LMK1C1104 or Renesas 5PB1108
- Telemetry: TI INA226AIDGSR + TI TMP117AIDRVR