
Edge Timing & Sync (PTP Timestamps, Jitter Cleaner PLL, RTC Backup)


Edge Timing & Sync is an end-device hardware time subsystem that turns a reference (1PPS/10 MHz/SyncE or local XO) into usable, consistent timestamps—by controlling jitter/holdover with a PLL, enforcing monotonic time (no rollback), and keeping time through power loss with RTC + backup energy. If any link in that chain is weak, the field symptom is almost always the same: long-tail timestamp spikes and time jumps—so the fix starts with evidence at the reference, the PLL events, and timestamp consistency.

H2-1 · Scope & Boundary

Scope & Boundary: What “time sync” means inside an edge device

“Time sync” is not a single feature. In edge hardware, it is a timing subsystem that must keep time measurable, stable, and recoverable across link changes, reference loss, and power cycles. The focus here is the hardware path that turns a reference into a usable clock and a trustworthy timestamp.

What this page covers (hardware timing subsystem)

Key blocks: reference input (1PPS / 10 MHz / SyncE) → jitter-cleaner PLL + clock tree → hardware timestamp unit (TSU) → time-of-day (ToD) interface → RTC + supercap backup rail
  • Reference to clock: reference input qualification, muxing, jitter cleaning, and distribution to device clock domains.
  • Clock to timestamp: where timestamping happens (PHY vs MAC/TSU), and how to keep timestamps consistent and monotonic.
  • Loss and recovery: holdover behavior, RTC backup power, and “no time going backwards” recovery policies.

The three outcomes to guarantee (engineering deliverables)

  • Timestamp correctness: timestamps represent the real event order; no unexpected jumps; no backward time after recovery.
  • Jitter & wander under control: short-term jitter is bounded; long-term drift during holdover is predictable and budgeted.
  • Holdover + RTC backup: time remains valid through reference loss and power cycles; recovery is repeatable and testable.

Out of scope (kept on sibling pages)

  • TSN scheduling (Qbv/Qci, traffic shaping) — only the timing hardware is covered here.
  • BMCA / grandmaster selection — treated as system-level control, not hardware implementation details.
  • GNSS anti-jam / RF front-end — belongs to GNSS Timing / Positioning Module pages.
  • Cloud / fleet time management — operational architecture, not device timing subsystem design.

Practical boundary test: if the problem can be verified by probing reference inputs, PLL/clock outputs, timestamp behavior, or RTC backup rail, it belongs here. Otherwise, it belongs to a system/network sibling page.

Figure E1 — Edge timing subsystem: reference → jitter cleaning → timestamps → RTC backup
Test points shown in the figure: TP1 reference quality · TP2 PLL output · TP3 timestamp · TP4 backup rail.
H2-2 · Requirements Decomposition

Turn “sync” into acceptance criteria: accuracy, jitter, and holdover

Successful timing designs start with testable acceptance criteria. “Better sync” is ambiguous; a device needs separate targets for timestamp accuracy, short-term jitter, and holdover drift. These targets interact, but they must be specified and validated independently to avoid false confidence.

Define the requirement in three questions (fast triage)

  • Timestamp alignment: do multiple devices stamp the same event within the required bound (ns or µs)?
  • Frequency / wander control: does the offset grow over minutes when the reference is lost (holdover)?
  • Phase noise / jitter control: is short-term jitter low enough for sampling clocks and high-speed I/O margins?
Dimensions to specify (what, why it matters, minimum verification):
  • Timestamp accuracy. Specify: max error bound plus a statistic (e.g., p99/p999) and a "no backward time" policy. Why it matters: event correlation, logs, multi-sensor fusion, audit trails; errors show up as mis-ordered or mis-timed events. Minimum verification: same-event multi-device compare; distribution over time; check for jumps after link changes or restarts.
  • Short-term jitter. Specify: allowed jitter budget at clock outputs (a qualitative threshold if no jitter analyzer is available). Why it matters: sampling clocks (ADC/AFE), SERDES margins, control loops; too much jitter causes noise, BER, or instability. Minimum verification: PPS/clock edge stability checks; compare "clean vs noisy" modes; correlate with error bursts.
  • Long-term wander. Specify: allowed drift rate over minutes/hours under holdover. Why it matters: offset accumulates; a system that starts aligned can diverge steadily without clear alarms. Minimum verification: ref-off holdover curve; measure time error vs time; identify dominant drift contributors.
  • Holdover. Specify: duration plus bound ("X minutes with ≤ Y error"), with separate steady-state and temperature-change cases. Why it matters: reference outage is common in the field; robustness requires predictable degradation, not surprises. Minimum verification: reference removal plus temperature sweep; compare against the drift budget; record recovery behavior.

Convert application language into acceptance language (examples)

  • Logging & event forensics: prioritize timestamp correctness + monotonic recovery; µs-class may be acceptable but jumps are not.
  • DAQ synchronous sampling: prioritize jitter + phase stability; “time-of-day” alone is insufficient.
  • Motion/control coordination: prioritize wander + holdover; steady divergence is the main failure pattern.
  • Multi-sensor fusion: prioritize event-alignment statistics (p99/p999), not just average offset.

Minimal acceptance pack (recommended)

  • Three numbers: max event alignment error, required holdover duration, and maximum allowed time jump on recovery.
  • Three tests: steady-state timestamp distribution, ref-off holdover drift curve, and power-cycle recovery monotonicity check.
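A minimal sketch of this acceptance pack as data plus one check, in Python. The class and field names are illustrative (nanosecond units assumed), not a standard API; the point is that the three numbers become machine-checkable inputs to the tests described later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimingAcceptance:
    """The three numbers that make 'sync' testable (nanosecond units assumed)."""
    max_alignment_error_ns: float   # worst allowed same-event alignment error (tail metric)
    holdover_minutes: float         # required ref-off duration inside the error bound
    max_recovery_jump_ns: float     # largest permitted correction step after recovery

def passes_steady_state(p999_ns, spec):
    """Steady-state gate: the measured tail must stay inside the alignment bound."""
    return p999_ns <= spec.max_alignment_error_ns

# Example: a µs-class logging device
spec = TimingAcceptance(max_alignment_error_ns=5_000.0,
                        holdover_minutes=10.0,
                        max_recovery_jump_ns=1_000.0)
print(passes_steady_state(3_200.0, spec))   # True -> steady-state check passes
```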

A device that “locks” but fails these acceptance checks is not synchronized. Lock indicators are status signals, not proof of time quality.

Figure E2 — Acceptance triangle: accuracy ↔ jitter ↔ holdover (with test hooks)
H2-3 · Core Hardware Chain

From reference to usable timestamps: the end-to-end timing chain

A high-quality time system is a chain, not a single block. Each link is responsible for a specific output: reference quality → clock conditioning → timestamp capture → time-of-day continuity. When a design fails, the fastest diagnosis is to walk the chain in order and collect evidence at defined test points.

Chain overview (four layers)

L1 Reference sources → L2 Ref mux + jitter cleaning → L3 Timestamp capture + reporting → L4 ToD/RTC hold + recovery
  • L1 (Reference): 1PPS / 10 MHz / SyncE-as-input / local XO. The job is to provide a stable time or frequency anchor and detect invalidity.
  • L2 (Clock tree): qualify → mux → PLL/jitter cleaner → fanout. The job is to produce clean, continuous clocks for all timing domains.
  • L3 (Timestamp path): capture point (PHY/MAC/TSU) → insert/report. The job is to stamp events deterministically with minimal uncertainty.
  • L4 (ToD/RTC): write → keep → restore. The job is to preserve continuity across power/reference loss and prevent backward time.

Measurement points (evidence hooks used later in debugging)

  • Reference input quality: missing pulses, glitches, edge stability, frequency offset trends.
  • PLL status + events: lock/unlock, ref switch, holdover entry/exit, relock time.
  • Clock output behavior: continuity during switchovers, jitter/wander indicators (direct or proxy).
  • Timestamp consistency: outliers, jumps, monotonicity after restarts and ref changes.
  • RTC drift + backup rail: actual retention time, drift vs temperature, recovery “jump limit”.

Engineering rule: do not treat “locked” as proof. Treat it as a hint, then validate time quality at TP points with repeatable checks.

Figure E3 — End-to-end chain: reference → clock tree → timestamp → ToD/RTC (with TP hooks)
H2-4 · Timestamp Location

PHY timestamp vs MAC/TSU timestamp: where uncertainty is created

Two boards can both claim IEEE-1588 support yet deliver very different results. The main differentiator is the timestamp capture point and the uncertainty contributors between the wire and that capture point. The closer the capture point is to the physical interface, the fewer internal variability sources exist.

High-level contrast (what changes in practice)

  • PHY timestamp: capture occurs near the wire; fewer internal path delays leak into the timestamp; better for tight tails (p99/p999).
  • MAC/TSU timestamp: capture occurs deeper in the device; easier integration; more sensitive to internal clock-domain and path effects.

Uncertainty contributors (error terms) and how they appear

Contributors (why it happens, typical symptom, fast check):
  • Clock-domain crossing (CDC). Why: timestamp capture and reporting live in different clock domains; edge capture plus transfer adds quantization and variability. Symptom: random outliers even at steady load; occasional "spikes" not correlated with traffic. Fast check: hold traffic constant; check whether outliers persist.
  • Variable latency (queue/IRQ/driver). Why: software time capture or delayed reporting introduces load-dependent delay variability. Symptom: good average but poor tails; degrades under higher CPU/traffic load. Fast check: increase load; observe tail widening (p99/p999).
  • RX/TX asymmetry. Why: transmit and receive paths do not match (different pipeline delays, different corrections). Symptom: direction-dependent bias; offset shifts when link/path changes. Fast check: compare behavior across directions and link states.
  • Clock tree discontinuity. Why: reference switching or relock causes a phase discontinuity that leaks into timestamps. Symptom: step changes ("jumps") aligned with ref switch/relock events. Fast check: correlate jumps with the PLL/ref event log.

Selection decision tree (practical)

  • Need ns-class alignment or tight tails: prioritize PHY timestamp or a tightly integrated TSU with controlled CDC and event logging.
  • Need µs-class log alignment: MAC/TSU timestamp can be acceptable if monotonicity, recovery jump limits, and tail behavior are validated.
  • Any requirement level: require a way to detect and record ref switch / relock / holdover events; timestamps must be auditable.

A common failure mode is “reasonable average, unacceptable tail.” Always validate p99/p999 and correlate outliers with CDC, load, and ref events.

Figure E4 — Capture points (PHY vs MAC/TSU) and the main uncertainty bubbles
H2-5 · Jitter-Cleaner PLL

Why “Locked” is not “Good”: loop bandwidth, jitter transfer, and switching transients

A jitter-cleaner PLL is a noise-shaping system, not a simple “clock lock” indicator. Lock status confirms a control loop is active, but it does not prove the output clock is quiet, continuous, or predictable during reference loss. Time quality depends on how the PLL is configured and how it behaves during reference switching and holdover.

The three engineering knobs (what actually determines time quality)

1) Loop bandwidth (BW) · 2) Jitter transfer / attenuation · 3) Ref switching + transient behavior
  • Loop BW: decides how much reference noise leaks to the output and how fast the loop can track ref changes.
  • Jitter transfer: describes what the PLL passes vs cleans; the output can be ref-dominated or VCO-dominated depending on frequency.
  • Switching transient: ref switch/relock events can create phase steps or frequency bumps that show up as timestamp outliers.
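To see why loop bandwidth matters, a toy first-order model is enough: reference noise is low-passed by the loop, while local-oscillator (VCO) noise is high-passed to the output. Real jitter cleaners have higher-order transfer functions, so treat this only as an intuition sketch; the function names and chosen frequencies are illustrative.

```python
import math

def ref_noise_gain(f_hz, loop_bw_hz):
    """|H(f)| of an idealized first-order loop: reference noise is low-passed."""
    return 1.0 / math.sqrt(1.0 + (f_hz / loop_bw_hz) ** 2)

def vco_noise_gain(f_hz, loop_bw_hz):
    """|1 - H(f)|: local-oscillator (VCO/XO) noise is high-passed to the output."""
    x = f_hz / loop_bw_hz
    return x / math.sqrt(1.0 + x * x)

# How much 100 Hz reference noise reaches the output for a wide vs narrow loop
for bw_hz in (1_000.0, 10.0):
    print(f"BW={bw_hz:6.0f} Hz  ref-noise gain={ref_noise_gain(100.0, bw_hz):.2f}  "
          f"vco-noise gain={vco_noise_gain(100.0, bw_hz):.2f}")
```

With a 1 kHz loop, 100 Hz reference noise passes almost unattenuated; with a 10 Hz loop it is suppressed by roughly 20 dB, but the output then leans on the local oscillator instead.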

Jitter-cleaner vs clock generator (practical boundary)

  • Jitter-cleaner PLL: prioritizes clock cleanliness, holdover, and auditable ref switching for time-sensitive subsystems.
  • Clock generator: prioritizes frequency synthesis and fanout; it can produce the right frequencies without guaranteeing time-quality tails.

Common failure patterns (symptom → likely cause → engineering action)

Failure patterns (symptom, likely cause, action, evidence to collect):
  • Locked, but tails are bad. Likely cause: loop BW too wide; reference noise is passed into the output. Action: narrow the BW or enable stronger cleaning; qualify reference inputs and record ref-quality events. Evidence: output TIE trend at the test point; correlate tail spikes with ref-quality changes.
  • Slow lock / unstable during ref switch. Likely cause: loop BW too narrow; the loop cannot track ref changes quickly enough. Action: increase BW or use staged behavior (fast reacquire, then settle); review the switch policy. Evidence: PLL event log of relock time, switch timestamps, unlock bursts.
  • Time jumps after ref loss / recovery. Likely cause: holdover mode mismatched to oscillator quality (freeze vs flywheel vs local-ref switch). Action: define holdover acceptance (X minutes ≤ Y error) and select the holdover strategy accordingly. Evidence: holdover drift curve; recovery jump limit; monotonicity check.
  • Periodic "spikes" even at steady load. Likely cause: switching-transient / CDC interactions or hidden ref toggling. Action: audit the ref mux policy; ensure the transient is bounded; log every ref switch / holdover event. Evidence: event log correlated with timestamp outliers; test points at the reference and the PLL output.

Proof strategy (status bits are insufficient)

  • Required: PLL event history (lock/unlock, ref switch, holdover entry/exit, relock time).
  • Required: output quality via proxy metrics when lab gear is limited (e.g., clock period stability, TIE trend, outlier rate).
  • Optional: phase-noise / jitter analyzer confirmation for final sign-off (same targets, higher resolution).

Acceptance mindset: a PLL can remain locked while violating jitter tails, creating transient steps, or drifting beyond holdover limits. Time quality must be proven at the output and correlated with ref/PLL events.

Figure E5 — PLL lock vs clock quality (BW · transfer · holdover · ref switching)
H2-6 · Oscillators & Reference Choice

XO vs TCXO vs OCXO vs CSAC: picking the holdover baseline without drifting into GNSS RF

Oscillator selection is best driven by holdover acceptance: when the primary reference is lost, the device must stay within a defined error budget for a defined duration. The oscillator defines the baseline for wander, temperature drift, and aging; the PLL/clock tree can only shape or distribute what the local reference can support.

Start with a holdover budget (turn requirements into a selection gate)

  • Define: “Ref lost → within Y time error for X minutes” (include expected temperature change).
  • Then choose: oscillator class that can plausibly meet the drift budget under temperature, aging, and vibration constraints.
  • Finally verify: holdover drift curve + recovery jump limit (monotonic behavior).

Practical boundaries (what each oscillator class is good at)

  • XO: lowest cost; larger temperature drift and aging. Suitable for low holdover demands or frequent re-discipline.
  • TCXO: improved temperature stability with low power. Common choice for edge devices that need practical holdover.
  • OCXO: strong short-term stability and phase-noise performance, but higher power, warm-up time, and volume.
  • CSAC (when truly needed): strong long-term stability for extended reference outages, at higher cost and integration constraints.

Key parameters (how to read them as system impacts)

Key parameters: temperature drift curve · aging (ppm/day or ppm/year) · phase noise (dBc/Hz) · g-sensitivity · warm-up time
  • Temperature drift: dominates holdover when ambient changes; focus on curve shape, not a single number.
  • Aging: sets long-holdover baseline drift; critical for long outages and long calibration intervals.
  • Phase noise: impacts jitter-sensitive domains (sampling clocks, high-speed links) and can drive tail behavior.
  • g sensitivity: matters in vibration/portable environments; frequency shifts can appear as timing noise.
  • Warm-up: can create “good only after minutes” behavior; treat as a requirement, not a surprise.

Qualitative comparison (use for early architecture choices)

Comparison (holdover drift, phase noise / jitter, power / warm-up, typical edge fit):
  • XO: holdover drift weak under temperature/aging; phase noise varies, usually moderate; best power, no warm-up; fits low requirements with frequent discipline.
  • TCXO: good practical holdover drift; phase noise good enough for many designs; low power, minimal warm-up; the common edge holdover baseline.
  • OCXO: strong short-term stability; strong (quiet) phase noise; high power, warm-up required; fits high-end sync, DAQ, and tight tails.
  • CSAC: strong long-term stability; phase noise varies, often good; higher cost and integration tradeoffs; fits extended outages with a strict drift limit.

Scope guard: GNSS anti-jamming and antenna/RF front-end design is out of scope here (covered by the GNSS Timing / Positioning Module page).

Figure E6 — Holdover-driven oscillator selection (XO/TCXO/OCXO/CSAC) and parameter mapping
H2-7 · Holdover & Drift Budget

Holdover drift budgeting: how long time stays “within spec” after reference loss

Holdover is only meaningful when written as an acceptance statement that can be calculated and verified: after reference loss, time error stays within ±E for T minutes (under a defined temperature profile). This section turns that statement into a drift budget and a validation loop.

Holdover is a sum of error contributors (what must be budgeted)

1) Oscillator stability · 2) Temperature trajectory · 3) Aging · 4) Discipline / calibration strategy
  • Oscillator stability: initial frequency offset and short-term wander define the starting slope of time error.
  • Temperature trajectory: drift follows temperature over time; curve shape matters more than a single spec number.
  • Aging: long-holdover baseline drift sets the floor for extended outages and long calibration intervals.
  • Discipline strategy: pre-loss training reduces the residual frequency error at the moment holdover starts.

Write the budget in the same unit used for acceptance: time error TE(t)

The primary evidence is the time error curve TE(t). Treat it as the scorecard: if |TE(T)| ≤ E, the system passes for the defined temperature profile.

  • Linear TE(t): constant frequency offset dominates (slope stays roughly constant).
  • Curved TE(t): temperature drift or compensation changes the slope over time.
  • Piecewise TE(t): mode transitions (holdover entry, relock, ref switching) introduce slope changes or steps.

Back-calculate the allowed frequency error from the acceptance statement

For an initial gate, convert time error to relative frequency error:

Acceptance target → derived gate → how it is used:
  • After ref loss, ±E time error within T: average |Δf/f| ≤ E/T (converted to ppm). Used to filter the oscillator class and set the holdover margin before lab tests.
  • Temperature changes during holdover: reserve margin for the temperature curve and compensation limits. Used to ensure the E/T gate is not consumed by environmental drift.
  • Extended outage or long intervals: aging budget (ppm over the time horizon). Used to define the recalibration interval and the required oscillator grade.
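As an initial gate, the conversion is a one-liner. The sketch below assumes the whole budget E is consumed by a constant frequency offset (no temperature or aging margin reserved), so the real target should be tighter; the numbers and names are illustrative.

```python
def allowed_freq_error_ppm(time_error_budget_s, holdover_s):
    """Average |Δf/f| that would consume the whole time-error budget E over T."""
    return (time_error_budget_s / holdover_s) * 1e6   # expressed in ppm

# Example: ±1 ms allowed after 30 minutes of holdover
gate = allowed_freq_error_ppm(time_error_budget_s=1e-3, holdover_s=30 * 60)
print(f"average |Δf/f| must stay well below ~{gate:.3f} ppm")   # ≈ 0.556 ppm
```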

Evidence priority rule: first confirm TE(t) stays inside the acceptance envelope; only then use deeper metrics (e.g., Allan) to explain residuals.

Verification loop (ref cut + thermal sweep)

  • Ref cut test: disconnect the reference, record TE(t) and holdover/relock events, then check |TE(T)| ≤ E.
  • Thermal sweep: repeat with a controlled temperature trajectory; compare TE(t) envelopes and compensation effectiveness.
  • Correlation: annotate TE(t) with event timestamps (ref loss, holdover entry, ref switch, relock) to attribute slope changes or steps.
Figure E7 — Holdover drift budgeting loop (acceptance → budget → back-calc → validate)
H2-8 · RTC + Supercap Backup

RTC and supercap backup: keeping time through power loss without collapsing the main rail

RTC backup is a power-domain design: it must keep the RTC domain alive through power loss while preventing charging inrush, backfeed paths, and leakage from defeating the backup-time target. A correct design is defined by an effective voltage window, a total backup current, and an auditable startup recovery that avoids time rollback.

RTC selection checklist (what matters for holdover + recovery)

  • Backup current: the dominant term in backup-time estimation; measure worst-case, not typical.
  • Temperature behavior: drift across the expected temperature range; consider calibration/trim registers.
  • Clock source: 32 kHz crystal vs integrated oscillator (power, drift curve, startup repeatability).
  • Calibration registers: enables writing measured offset back into RTC for improved holdover alignment.

Backup chain blocks (charge limit → OR-ing → domain isolation)

1) Charge limiter · 2) Ideal diode / OR-ing · 3) RTC domain isolation
  • Charge limiter: prevents cold-start supercap inrush from drooping the main rail.
  • Ideal diode / OR-ing: seamless switchover while blocking reverse current between main and backup.
  • Domain isolation: prevents backfeed through IO/ESD structures and hidden rails.

Three common field failures (symptom → likely cause → fix + evidence)

Failures (symptom, likely cause, action, evidence):
  • Backup time too short. Likely cause: supercap leakage/ESR plus an underestimated total backup current. Action: budget I_total (RTC + supercap leakage + OR-ing leakage); validate the effective voltage window. Evidence: TP-BACKUP_V discharge curve; leakage isolation test.
  • Main rail droops at cold start. Likely cause: the supercap behaves like a short; missing or weak inrush limiting. Action: add a charge limiter / soft-start; stage charging if needed. Evidence: TP-INRUSH current; rail dip waveform.
  • Weird power paths / partial power. Likely cause: backfeed through IO/ESD or the OR-ing path into the RTC domain. Action: audit isolation; ensure reverse blocking and domain separation. Evidence: reverse current check; unexpected "alive" rails.

Backup-time estimation (use the effective voltage window, not the full capacitor)

A practical first estimate uses the usable RTC voltage window: t ≈ C · (V_hi − V_lo) / I_total

  • V_hi/V_lo: RTC domain usable range (depends on RTC + OR-ing drop + isolation elements).
  • I_total: RTC backup current + supercap leakage + OR-ing leakage + board leakage (contamination can dominate).
  • Reality check: always confirm with the power-off timer test and compare to the estimate to locate hidden leakage.
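A first-pass estimate of backup time from the formula above, as a small Python helper. The component values in the example are hypothetical; real designs should add measured board leakage to I_total and re-check against the power-off timer test.

```python
def backup_time_hours(c_farads, v_hi, v_lo, i_total_ua):
    """t ≈ C · (V_hi − V_lo) / I_total, using only the usable RTC voltage window."""
    return c_farads * (v_hi - v_lo) / (i_total_ua * 1e-6) / 3600.0

# Hypothetical example: 0.47 F supercap, usable window 3.0 V → 1.8 V, 5 µA total load
print(f"estimated backup ≈ {backup_time_hours(0.47, 3.0, 1.8, 5.0):.0f} hours")   # ≈ 31 h
```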

Validation (power-off timer + cold-start recovery + monotonicity)

  • Power-off timer: remove power, measure how long RTC stays valid and how much it drifts.
  • Cold-start recovery: verify time reconstruction does not cause excessive jump on boot.
  • Monotonicity check: confirm time does not go backward; bound the allowed correction step.

Acceptance mindset: the backup domain passes only if backup time meets target and recovery preserves monotonic time behavior.

Figure E8 — RTC + supercap backup domain (inrush limiting, OR-ing, isolation, and typical pitfalls)
H2-9 · Reference Switching & Relock Recovery

Reference switching and relock recovery: ref mux, glitchless handover, and time-jump governance

Reference switching becomes unstable when three layers are mixed: reference qualification, clock-loop behavior, and time-of-day mapping. A robust design separates responsibilities: qualify inputs, execute a controlled handover, then govern time corrections with monotonic rules.

Typical switch scenarios (what triggers a handover)

1) GNSS → SyncE → local XO fallback chain · 2) primary ref drop · 3) ref quality degrade · 4) maintenance / forced switch
  • Ref loss: missing PPS pulses, missing 10 MHz, or SyncE lock loss events.
  • Ref degrade: growing jitter, phase steps, or intermittent pulses that still look “present”.
  • Anti-flap rule: apply hysteresis and minimum dwell time before switching again.

Ref qualification + ref mux (separate “decide” from “execute”)

  • Qualification: detect loss, count missing pulses, track phase stability, and produce a coarse GOOD / WARN / BAD score.
  • Decision: the policy selects a target reference using hysteresis and dwell time.
  • Execution: the ref mux performs the handover and records the event timestamp.

Design intent: ref mux should not “hunt.” It follows a policy and produces auditable switch events.

Three continuity layers (often confused, with different hardware requirements)

Continuity goals (what they mean, engineering implications):
  • Glitchless: no short pulses or missing edges during the switchover. Implication: switch on a safe boundary; gate/align the mux control; verify with the PPS/clock waveform.
  • Frequency-continuous: the output frequency does not step abruptly at handover. Implication: the DPLL slews or flywheels through the transition; "lock" alone is not proof, the settle window matters.
  • Phase-continuous: the phase does not exhibit a step; this is the hardest target. Implication: requires phase alignment / phase-accumulator continuity; stricter constraints and longer validation.

Relock recovery as a state machine (make transitions observable)

States: S0 LOCKED → S1 HOLDOVER → S2 REF_SWITCH → S3 REACQUIRE → S4 SETTLE
  • LOCKED → HOLDOVER: reference quality drops below threshold; log holdover_enter.
  • HOLDOVER → REF_SWITCH: policy selects the next best reference; log switch_event.
  • REACQUIRE → SETTLE: DPLL relocks; output quality must pass a settle window before declaring stable.
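A minimal sketch of this state machine with an auditable event log, in Python. The states and transitions follow the list above in simplified form (a failed settle window or a forced maintenance switch would add more edges); class and method names are illustrative, not a firmware API.

```python
from enum import Enum, auto
import time

class S(Enum):
    LOCKED = auto()
    HOLDOVER = auto()
    REF_SWITCH = auto()
    REACQUIRE = auto()
    SETTLE = auto()

# Simplified allowed transitions (assumption: settle failures and forced switches omitted)
ALLOWED = {
    S.LOCKED:     {S.HOLDOVER, S.REF_SWITCH},   # quality drop, or forced/maintenance switch
    S.HOLDOVER:   {S.REF_SWITCH},               # policy picked the next best reference
    S.REF_SWITCH: {S.REACQUIRE},                # mux executed; DPLL starts relocking
    S.REACQUIRE:  {S.SETTLE},                   # lock asserted, quality not yet proven
    S.SETTLE:     {S.LOCKED, S.HOLDOVER},       # settle window passed, or ref lost again
}

class TimingFsm:
    """Tracks recovery state and keeps an event log for later correlation with outliers."""
    def __init__(self):
        self.state = S.LOCKED
        self.events = []   # (monotonic seconds, "FROM->TO: reason")

    def transition(self, new_state, reason):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.events.append((time.monotonic(),
                            f"{self.state.name}->{new_state.name}: {reason}"))
        self.state = new_state

fsm = TimingFsm()
fsm.transition(S.HOLDOVER, "ref quality below threshold (holdover_enter)")
fsm.transition(S.REF_SWITCH, "policy selected local XO (switch_event)")
fsm.transition(S.REACQUIRE, "mux handover done")
fsm.transition(S.SETTLE, "DPLL lock asserted")
fsm.transition(S.LOCKED, "settle window passed")
print(len(fsm.events), "logged transitions")
```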

Time-jump governance (hardware/firmware rules only)

  • Monotonic rule: time must not go backward (no rollback), even during correction.
  • Jump limit: cap the maximum correction step; large jumps must be explicitly marked.
  • Step vs slew:
    • Step: fast alignment but produces a visible timestamp jump (must be recorded).
    • Slew: gradual convergence by controlled frequency offset (preferred for control/sampling continuity).

Boundary reminder: only time mapping and correction rules are covered here—no network selection algorithms are expanded.
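The governance rules above can be captured in a few lines. The sketch below is an illustrative policy, not a real servo: it advances local time by one update interval, steps forward only within a jump limit, and converges backward errors by bounded slew so time never rolls back. All names and numbers are assumptions.

```python
def advance_tod(local_ns, ref_offset_ns, interval_ns, jump_limit_ns, max_slew_ppm):
    """Advance local time-of-day by one update interval under monotonic governance.

    ref_offset_ns = reference_time - local_time (positive means local is behind).
    Returns (new_local_ns, action) so every correction can be logged and audited.
    """
    max_slew_ns = int(interval_ns * max_slew_ppm * 1e-6)   # bounded correction per interval

    if ref_offset_ns > jump_limit_ns:
        # Too far behind: cap the forward step and flag it explicitly.
        return local_ns + interval_ns + jump_limit_ns, "STEP_CAPPED"
    if ref_offset_ns > max_slew_ns:
        # Behind by more than one slew quantum but within the jump limit: step (recorded).
        return local_ns + interval_ns + ref_offset_ns, "STEP"
    # Small error, or local clock is ahead: slew. Never subtract more than the interval
    # adds, so time stays strictly monotonic (no rollback).
    slew = max(-max_slew_ns, min(max_slew_ns, ref_offset_ns))
    slew = max(slew, -(interval_ns - 1))
    return local_ns + interval_ns + slew, "SLEW"

# Local clock is 0.4 µs ahead; with a 1 ms interval and 500 ppm slew it converges gradually.
print(advance_tod(1_000_000_000, -400, 1_000_000, 100_000, 500.0))
```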

Test points: PPS phase step / missing pulses · PLL relock + settle time · timestamp outliers / monotonic violations
Figure E9 — Reference switching + time-jump governance (qualify → mux → DPLL → ToD mapping)
H2-10 · Field Diagnostics (Evidence-First)

Field triage: the three evidence classes to quickly isolate reference, PLL, or timestamp issues

Fast diagnostics starts with a strict evidence order. The goal is not to “tune PTP,” but to localize failure to one of three hardware-visible layers: reference input, PLL/clock tree, or timestamp consistency. Each layer has a fastest tool and a minimal proof method.

The forced order (do not swap steps)

1) Reference evidence → 2) PLL / clock-tree evidence → 3) Timestamp consistency evidence

Reason: if reference is unstable, downstream jitter and timestamp outliers are symptoms—not root causes.

Evidence class #1 — reference input (PPS / 10 MHz / SyncE)

  • 1PPS: verify missing pulses, phase steps, and widened jitter (scope or logic analyzer).
  • 10 MHz: verify continuity and gross stability (counter trend; avoid deep RF analysis here).
  • SyncE: check lock/alarm events and correlate to observed time anomalies.

If the reference is not trustworthy, stop and fix the input path before analyzing timestamps.

Evidence class #2 — PLL / clock tree (lock is not enough)

  • Must log: lock/unlock, holdover entry/exit, ref switch events, relock time, and settle window outcome.
  • Must correlate: time anomalies that align with switch or relock boundaries point to loop transition behavior.
  • Practical observation: when phase noise tools are unavailable, use time-error trends and outlier bursts as a substitute indicator.

Evidence class #3 — timestamp consistency (only after ref + PLL pass)

  • Same event, multiple timestamps: compare capture points (e.g., PPS capture vs TSU record vs software log).
  • Check monotonicity: detect any backward time step (hard failure).
  • Check outliers: bursts of spikes suggest capture/CDC boundary issues.
  • Check persistent offset: stable, repeatable offset indicates fixed path delay or capture-point mismatch.
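A small Python sketch of these lane-3 checks: it takes per-event timestamps from two taps (for example PHY vs TSU), flags rollback, reports the persistent offset (median delta, usually a fixed path delay), and counts outliers relative to that offset. The thresholds and sample values are illustrative.

```python
import statistics

def compare_taps(phy_ts_ns, tsu_ts_ns, outlier_ns=1_000):
    """Compare per-event timestamps from two taps (same events, same order)."""
    deltas = [a - b for a, b in zip(phy_ts_ns, tsu_ts_ns)]
    rollback = any(later < earlier for earlier, later in zip(tsu_ts_ns, tsu_ts_ns[1:]))
    persistent = statistics.median(deltas)                 # fixed path-delay estimate
    outliers = sum(1 for d in deltas if abs(d - persistent) > outlier_ns)
    return {"rollback": rollback,
            "persistent_offset_ns": persistent,
            "outlier_count": outliers}

# Hypothetical capture: ~60 ns fixed path delay plus one CDC-style spike on event 4
print(compare_taps([100, 1_100, 2_105, 3_100], [40, 1_041, 2_039, 9_042]))
```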

Minimal tool mapping (within this page boundary)

Tools (fastest target, what they prove):
  • Scope / logic analyzer: 1PPS stability, phase steps, missing pulses. Proves reference presence and gross quality; catches switch-induced glitches.
  • Counter / frequency trend: 10 MHz or derived-clock drift. Shows the frequency offset and slow drift that drive the TE(t) slope during holdover.
  • Software logger: events plus timestamp comparisons. Provides auditable correlation of switch/relock boundaries with timestamp outliers and monotonicity.
Figure E10 — Three-lane field triage flow (reference → PLL → timestamp)

H2-11 · Validation Test Plan

Turn “sync quality” into repeatable tests

The goal is to convert “correct timestamps, controlled jitter, and recoverable operation after reference loss/power events” into an executable test matrix: every test case has input conditions, hardware-first observation points, a data logging template, and clear pass/fail criteria—so it can be used for R&D acceptance, production sampling, and fast field attribution (reference issues / PLL issues / timestamp path issues / backup power issues).

Deliverables: timestamp correctness · jitter & wander control · holdover & RTC backup · no time rollback

1) Test matrix (T1–T5) and deliverables

For each test, produce the same “evidence bundle”: raw logs (CSV/register/event logs), statistical summaries (p50/p99/p999/min/max), waveforms/screenshots, and a final decision (Pass/Fail + root-cause tag).

Test matrix (stimulus / conditions, hardware-first observation taps, pass criteria template):
  • T1 Steady-state. Conditions: stable reference, normal lock, room temperature. Taps: 1PPS phase jitter; output clock jitter (or proxy); timestamp error distribution. Pass: p99 TS_err ≤ X and p999 ≤ Y; no outlier spikes; stable lock state.
  • T2 Holdover. Conditions: reference/link loss → enter holdover. Taps: time error curve TE(t); frequency offset / phase drift; mode switch points. Pass: |TE(T)| ≤ E over T minutes/hours with no steps; after recovery, no rollback.
  • T3 Temperature. Conditions: chamber sweep (with ramp + dwell). Taps: thermal drift and compensation; lock margin; TE vs temperature. Pass: dTE/dT ≤ K; T1/T2 thresholds met across the specified temperature range.
  • T4 Power disturbance. Conditions: brownout / hot-plug / reset disturbance. Taps: PLL unlock/relock time; TSU timestamp spikes; any time rollback. Pass: relock ≤ R; monotonic time (no rollback); complete event logs.
  • T5 RTC + supercap backup. Conditions: power-off → backup domain only → power-on. Taps: backup duration; charge inrush; post-restore time continuity (step/rollback). Pass: backup ≥ H; inrush does not droop the main rail; restore is continuous or uses controlled stepping.
How to write criteria: avoid “average only”. Prefer a defined time window plus tail metrics (p99/p999/max), then add event-style criteria (step/rollback/spike present or not).
Figure E11 — Validation plan map: stimuli → taps → metrics → pass/fail

2) Unified test architecture: fix the “stimuli + taps + logging format”

For repeatability, split the setup into two layers: the stimulus layer (reference/temperature/power/backup-off) and the observation layer (PPS/clocks/timestamps/RTC domain). When boards or components change, keep the stimulus layer unchanged and only swap the DUT.

  • Stimulus layer: programmable reference input (1PPS/10MHz or SyncE), temperature chamber, controllable brownout/hot-plug fixture, and a power-off backup fixture (supercap domain).
  • Observation layer: PPS phase (scope/counter), PLL lock state (registers + event logs), timestamp consistency (same event cross-point comparison), and RTC-domain voltage/current + backup duration.
  • Logging format: a unified CSV schema + fixed statistics windows (e.g., 1 s / 10 s / 60 s) + tail metrics (p99/p999).
Field-friendly constraint: even without a phase-noise analyzer, keep “proxy evidence”: PPS phase noise, output clock period-jitter statistics, and timestamp outlier rate.

3) T1 — Normal reference: steady-state timestamp distribution and jitter baseline

T1 does one thing: build a statistical “healthy profile”. Every anomaly later should be compared against the T1 baseline (tail degradation, more spikes, or lock-state instability).

  • Conditions: stable reference, lock complete, room temperature; fixed load/traffic (avoid uncertain queue behavior).
  • Taps: PPS phase jitter; period jitter at key clock points (or a proxy); timestamp deltas for the same event observed at multiple points.
  • Outputs: TS_err distribution (p50/p99/p999/max); outlier rate; lock/switch logs are empty or stable.
  • Criteria template: p99 ≤ X and p999 ≤ Y; spike amplitude max ≤ Z; spike rate ≤ N/hour.
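The T1 statistics can be computed with a short helper so every run uses the same definitions. Nearest-rank percentiles over |TS_err| are assumed here; names, thresholds, and the synthetic sample are illustrative only.

```python
import math
import random

def ts_err_summary(err_ns, window_hours, spike_threshold_ns):
    """T1 evidence numbers: p50/p99/p999/max of |TS_err| plus a spike rate per hour."""
    data = sorted(abs(e) for e in err_ns)
    def pct(p):
        rank = max(1, math.ceil(p / 100.0 * len(data)))   # nearest-rank percentile
        return data[min(rank, len(data)) - 1]
    spikes = sum(1 for e in data if e > spike_threshold_ns)
    return {"p50_ns": pct(50.0), "p99_ns": pct(99.0), "p999_ns": pct(99.9),
            "max_ns": data[-1], "spike_rate_per_h": spikes / window_hours}

# Synthetic baseline: Gaussian error plus two injected spikes over a one-hour window
random.seed(0)
sample = [random.gauss(0, 80) for _ in range(10_000)] + [4_000, -3_500]
print(ts_err_summary(sample, window_hours=1.0, spike_threshold_ns=1_000))
```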

4) T2 — Reference loss: holdover drift curve TE(t)

The core evidence for holdover is the time error curve TE(t). Use TE(t) to capture the dominant drift first, then decide whether deeper phase-noise/Allan analysis is necessary.

  • Stimulus: after steady lock, disconnect the reference input or simulate link loss; keep the DUT running.
  • Logging: sample TE(t) every Δt; also log temperature, PLL mode, and frequency-offset estimates.
  • Criteria template: |TE(T)| ≤ E (T minutes/hours); the curve must be continuous with no steps; after reference returns, no rollback is allowed.
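A minimal evaluator for a recorded TE(t) series against this template (budget, step detection, duration). It assumes uniformly sampled data and leaves rollback and event correlation to the log; all names and numbers are illustrative.

```python
def evaluate_holdover(te_ns, dt_s, budget_ns, step_limit_ns):
    """Check a recorded TE(t) series (one sample every dt_s seconds) against T2.

    Pass requires |TE| within budget_ns for the whole record and no sample-to-sample
    step larger than step_limit_ns (a discontinuity well above the expected drift slope).
    """
    worst = max(abs(x) for x in te_ns)
    steps = [abs(b - a) for a, b in zip(te_ns, te_ns[1:])]
    return {
        "within_budget": worst <= budget_ns,
        "worst_te_ns": worst,
        "has_step": any(s > step_limit_ns for s in steps),
        "duration_min": (len(te_ns) - 1) * dt_s / 60.0,
    }

# 30 minutes of drift at ~0.5 ppm: TE grows to ~900 µs against a 1 ms budget -> pass
drift = [i * 30 * 500 for i in range(61)]          # ns, one sample every 30 s
print(evaluate_holdover(drift, dt_s=30, budget_ns=1_000_000, step_limit_ns=50_000))
```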

5) T3 — Temperature sweep: drift/compensation, lock margin, and the degradation knee

The point is not merely “it still runs”; the goal is to find the degradation knee: the temperature or ramp rate at which lock jitter rises, timestamp tails worsen, or relock slows down.

  • Profile: dwell points + ramp sweeps; compare behavior before/after thermal stabilization.
  • Evidence: TE vs T; lock state and relock time; whether TS_err p99/p999 degrade with temperature.
  • Criteria template: dTE/dT ≤ K; T1/T2 thresholds still met across the target temperature range.

6) T4 — Power disturbance/hot-plug: unlock/relock and time monotonicity

Power events often create “random-looking” timestamp spikes and rollback risk. This test forces three checks: whether any rollback occurs, whether relock time is predictable, and whether the event logs can close the loop.

  • Stimulus: controlled dips, brief interruptions, hot-plug, reset; cover both main power and clock/RTC-only rails.
  • Taps: PLL lock→unlock→relock time; TSU timestamp spikes; whether system time rolls backward.
  • Criteria template: relock ≤ R; no rollback; every anomaly maps to a logged reference/power/PLL/timestamp-path cause.

7) T5 — RTC + supercap: backup duration, charge inrush, and restore consistency

A backup path must both “last long enough” and “not collapse the main rail during charging”. T5 therefore validates backup duration, charge inrush, backfeed paths, and time consistency after restore.

  • Power-off timing: disconnect the main rail and keep only the RTC domain; record backup duration (backup ≥ H).
  • Inrush evidence: cold-start charge peak current and main-rail droop; verify no UV/PG false triggers.
  • Restore consistency: on power return, verify no rollback/unexplained large jump; controlled stepping is acceptable, unexplained steps are not.

8) Logging template (CSV field suggestions) and final report structure

The more uniform the schema, the higher the cross-project reuse. Put “environment, reference, lock state, time error, timestamp distribution, and backup domain” on the same row so scripts can generate summary plots and decisions directly.

  • Base: test_id, dut_rev, fw_rev, timestamp_utc, run_id.
  • Environment: temp_c, vin_main_v, vin_rtc_v, load_state.
  • Reference: ref_sel (PPS/10MHz/SyncE/XO), ref_ok, ref_loss_count.
  • PLL: pll_lock, mode (lock/holdover/relock), relock_ms, alarm_flags.
  • Time error: te_ns (instant), te_ns_max, te_ns_slope.
  • Timestamp stats: ts_err_p50/p99/p999/max, outlier_rate, rollback_flag.
  • Backup: backup_elapsed_s, charge_inrush_a (peak), rtc_drift_ppm_est.
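A sketch of this schema as a fixed field list plus an append-only CSV writer, using only the Python standard library. The field names follow the list above; the example row values are hypothetical.

```python
import csv
import os
import time

# Field order follows the suggested schema (base, environment, reference, PLL,
# time error, timestamp stats, backup domain).
FIELDS = ["test_id", "dut_rev", "fw_rev", "timestamp_utc", "run_id",
          "temp_c", "vin_main_v", "vin_rtc_v", "load_state",
          "ref_sel", "ref_ok", "ref_loss_count",
          "pll_lock", "mode", "relock_ms", "alarm_flags",
          "te_ns", "te_ns_max", "te_ns_slope",
          "ts_err_p50", "ts_err_p99", "ts_err_p999", "ts_err_max",
          "outlier_rate", "rollback_flag",
          "backup_elapsed_s", "charge_inrush_a", "rtc_drift_ppm_est"]

def append_row(path, row):
    """Append one measurement row; write the header only when the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        if new_file:
            writer.writeheader()
        writer.writerow(row)

append_row("timing_run.csv", {
    "test_id": "T2", "run_id": 7, "mode": "holdover", "ref_sel": "XO",
    "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "te_ns": 1250, "rollback_flag": 0,
})
```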

Report structure: for every test case, keep four fixed sections: Condition → Evidence (waveforms/logs) → Stats (p99/p999/TE curve) → Decision (Pass/Fail + root-cause tag).

9) Part numbers (examples) — common building blocks for reusable validation fixtures

The part numbers below are for building a validation platform or a reference “control group”. Final selection must consider availability, package, and system power/cost targets; if lifecycle changes, use an equivalent-class substitute.

Module role · example part numbers · validation points covered:
  • Jitter-cleaner / DPLL: Silicon Labs / Skyworks Si5341 (with Si5341-D-EVB), Analog Devices AD9545 (with AD9545-PCBZ), Renesas 8A34001, Microchip ZL30733. Covers: T1 jitter baseline, T2 holdover, T4 unlock/relock behavior.
  • 1588 timestamp PHY / switch: Texas Instruments DP83640 (IEEE 1588 PTP PHY), Microchip KSZ9477 (IEEE 1588v2-capable switch). Covers: T1 timestamp distribution, T4 spike/rollback isolation (PHY TS vs TSU path).
  • RTC (temp-compensated / low power): Analog Devices / Maxim DS3231, NXP PCF2129. Covers: T5 backup duration and restore consistency; thermal drift comparison.
  • Supercap backup management: Analog Devices LTC3350 (supercap charger + backup control), Texas Instruments TPS61094 (ultra-low-IQ approach with supercap management). Covers: T5 backup path: duration, charge policy, power-off/power-on behavior.
  • OR-ing / ideal diode: Analog Devices LTC4412 (PowerPath / ideal-diode controller). Covers: T5 backup-domain isolation and backfeed prevention.
  • Inrush limiting (eFuse): Texas Instruments TPS2595 (adjustable current limit + adjustable soft-start). Covers: T4/T5 cold-start charge inrush, rail droop, repeatable protection behavior.
  • Supercapacitor (device): Murata DMF series (5.5 V EDLC), Panasonic EEC-F series (5.5 V Gold Cap family, e.g., EEC-F5R5U / EEC-F5R5H). Covers: T5 impact of leakage/ESR on backup duration and inrush.
Materials ↔ root-cause tags: run the same tests with a “high-performance DPLL + clean oscillator” and with a “cost-optimized stack” to quickly build a sensitivity curve for reference quality / loop bandwidth / backup path. Field issues then become much easier to reproduce and attribute.

10) Production & field rollout: make T1/T2/T5 the minimal closed loop

If cost/time must be compressed, keep three “minimal loop” cases: T1 (baseline) + T2 (holdover) + T5 (backup). These three cover most “timing feels like black magic” field failures while staying within this page’s hardware boundary.

  • T1: defines what “healthy” looks like—without it you cannot judge degradation.
  • T2: reference loss is the most common fault injection; TE(t) is the most explanatory evidence.
  • T5: power-off/cold-start is the highest-risk scenario for time jumps and rollback—must be forced-verified.

H2-12 · FAQs — Edge Timing & Sync (Hardware Time Subsystem)

These FAQs stay strictly inside the device’s timing hardware boundary: reference input quality, ref mux switching, PLL/jitter-cleaner behavior, timestamp tap placement (PHY/MAC/TSU), ToD monotonicity, and RTC+supercap backup.

Topics: p99/p999 metrics · PLL lock vs usable clocks · PHY vs MAC/TSU timestamps · holdover & backup · validation tests
Should timestamp requirements use “average error” or p99/p999? Why?

Use p99/p999 when rare spikes can break control, multi-sensor alignment, or event reconstruction. Averages hide tail events caused by ref switching, PLL relock settling, clock-domain crossings, or timestamp tap uncertainty. Keep the average as a sanity check, but accept/reject with a fixed statistics window and tail metrics (plus an outlier rate).

  • Acceptance template: p99 ≤ X, p999 ≤ Y, outlier_rate ≤ N/hour.
  • Always log the window length and exclude warm-up (e.g., first 2 minutes after lock).
PLL shows “LOCKED” but timestamps still spike—what two evidence classes should be checked first?

A lock indicator only means the loop is closed, not that the output is “production-clean.” First, check PLL/clock-tree events (holdover entry, ref switch, relock time, settle gating) and correlate them with spike timestamps. Second, check timestamp consistency for the same event across tap points (PHY vs TSU vs software-readout) to separate “clock quality” from “tap/path issues.”

  • Example jitter-cleaner/DPLL parts used in endpoints/gateways: SiLabs Si5341, ADI AD9545, Microchip ZL30733, Renesas 8A34001.
  • Fast triage: spikes aligned with relock/switch events → PLL/switching/settling; spikes without events → tap/path/measurement chain.
In the field, what is the most visible difference between PHY timestamping vs MAC/TSU timestamping?

The most visible difference is the tail behavior: PHY timestamping, taken closer to the wire, is less sensitive to internal timing uncertainty and usually produces a tighter p99/p999 distribution. MAC/TSU timestamping is easier to integrate but can inherit extra variation from internal latency drift and clock-domain boundaries.

  • Symptom pattern: similar averages, but MAC/TSU has more outliers during bursts, switching, or thermal drift.
  • Example IEEE-1588-capable devices often used for comparison: TI DP83640 (PTP PHY), Microchip KSZ9477 (PTP-aware switch).
When switching from external 1PPS to local XO, what is the most common root cause of a “time step” jump?

The most common cause is phase discontinuity at the switch boundary: the local oscillator phase is not aligned to the outgoing reference, and the PLL allows a step before the system applies a controlled slew policy. A second common cause is releasing “LOCKED” too early—output is not fully settled, so ToD mapping amplifies transient phase error into a visible time step.

  • Mitigation: phase-/frequency-continuous switching where possible, plus settle-gate before enabling timestamps.
  • Governance: enforce no rollback and a jump limit (step vs slew).
Should PLL loop bandwidth be larger or smaller? What field symptoms indicate the wrong choice?

A wider bandwidth tracks the reference faster but can import reference noise; a narrower bandwidth cleans noise better but reacts slowly. If bandwidth is too wide, timestamp tails worsen even when lock is stable (reference noise leaks through). If bandwidth is too narrow, relock takes longer and switching causes prolonged error windows or slow recovery.

  • Validate with T1/T4: p999 and spike rate during switching/relock are the first indicators.
  • Rule of thumb: choose bandwidth together with “settle time budget” and “switch frequency” constraints.
How to choose XO/TCXO/OCXO by back-calculating from a target holdover time (minutes)?

Start with an acceptance statement: “within T minutes after reference loss, time error stays within E.” Convert this into an allowable frequency error budget, then split it into temperature drift, aging, and short-term stability. XO fits short/low-risk holdover; TCXO is typical for edge devices; OCXO is used when tight holdover is needed but power/volume and warm-up are acceptable.

  • Verification is mandatory: prove the budget using holdover TE(t) tests (T2) and temperature sweep (T3).
  • Clock conditioning examples: SiLabs Si5341 or ADI AD9545 can combine holdover + switching policy control.
In a holdover budget, which term is most often underestimated: temp drift, aging, or thermal sweep behavior?

The most underestimated term is usually real thermal behavior (ramp + non-steady-state), not the “25°C ppm” value. Many designs validate only at room temperature and ignore the drift during temperature transitions and settling, which directly stretches TE(t) and increases tail events. Aging matters over long time scales, but thermal transitions dominate many edge deployments.

  • Action: include temperature ramp rates and dwell time in the test plan (T3), not just a single-point measurement.
  • Evidence: TE(t) slope changes that correlate with temperature slope are the fastest signal.
Why does RTC + supercap backup often last “much shorter in reality” than theory suggests?

Theory assumes ideal capacitance and a clean backup load. Reality is dominated by supercap leakage, effective voltage window, ESR-related droop, and hidden loads or backfeed paths in the RTC domain. If the backup rail is not isolated, unexpected current drains can dwarf the RTC’s budget and collapse the supercap early.

  • RTC examples: ADI/Maxim DS3231, NXP PCF2129 (calibration and low backup current options vary by design).
  • Measure backup domain current and rail droop curve—do not estimate from capacitance alone.
Supercap charging causes supply droop—should current limit be fixed first, or power sequencing?

Fix inrush control first because droop is usually driven by peak charge current. Power sequencing is often the second step to prevent sensitive rails from seeing the transient. A robust approach uses a controlled charge path plus isolation so the backup domain cannot pull down the main rail during cold start or brownouts.

  • Examples: eFuse/soft-start TPS2595; ideal-diode/PowerPath LTC4412; supercap manager LTC3350.
  • Acceptance: no repeated PLL unlocks during charge; no timestamp rollback after recovery.
How to quickly tell if the issue is “reference input quality” or “PLL output quality”?

Use a two-step split: verify the reference first, then verify what the PLL does with it. If 1PPS/10 MHz shows missing pulses, unstable amplitude, or phase steps, the input is suspect. If the reference is stable but output jitter proxies or spike bursts still appear, the PLL switching/settling policy or clock-tree distribution is the more likely cause.

  • Reference evidence: PPS phase stability on scope/counter.
  • PLL evidence: event timeline (switch/relock/settle) aligned to timestamp spikes.
After power restoration, how to ensure time does not go backward and logs do not reorder?

Enforce monotonic time as a hard rule: never allow rollback. After power-up, RTC provides a seed, but the system must apply a governance policy (step vs slew) with a jump limit and explicit “time-adjust” markers. For logging, keep an always-increasing sequence counter so ordering is preserved even when ToD is corrected within allowed bounds.

  • Backup chain examples used in validation: RTC DS3231/PCF2129 + supercap control LTC3350 + isolation LTC4412.
  • Pass criteria: rollback_count = 0; time adjustments are annotated and bounded.
Minimum validation loop: which three tests catch ~80% of real-world timing pitfalls?

A strong minimum loop is T1 + T2 + T5. T1 establishes steady-state tail metrics (p99/p999 and outlier rate). T2 proves holdover drift TE(t) under reference loss and validates recovery behavior. T5 validates RTC+supercap backup and ensures charging transients do not cause supply droop, repeated PLL unlocks, or time rollback during cold start.

  • T1: distribution + spikes; T2: TE(t) drift; T5: backup time + inrush + monotonicity.
  • Require a uniform record template and fixed pass/fail criteria across builds.
Figure E12 — FAQ evidence ladder: input → PLL → timestamp → ToD/backup