Q: What is the most visible field difference between PHY timestamping and MAC/TSU timestamping?

The most visible difference is tail behavior. PHY timestamping, taken closer to the wire, is less sensitive to internal latency uncertainty and typically yields tighter p99/p999 distributions. MAC/TSU timestamping is easier to integrate but can show more outliers from internal delay drift and clock-domain boundaries, even if average error looks similar.

Q: Should PLL loop bandwidth be larger or smaller, and what symptoms indicate the wrong choice?

A wider bandwidth tracks the reference faster but can import reference noise; a narrower bandwidth cleans noise better but reacts more slowly. Too wide often worsens p99/p999 tails even when lock is stable. Too narrow increases relock time and can create long error windows during switching or disturbances. Validate with tail metrics and event-correlated spikes during relock.

Q: Minimum validation loop: which three tests catch most real-world timing pitfalls?

A strong minimum loop is T1 + T2 + T5. T1 establishes steady-state tail metrics (p99/p999 and outlier rate). T2 proves holdover drift TE(t) under reference loss and validates recovery behavior. T5 validates RTC plus supercap backup and ensures charging transients do not cause supply droop, repeated PLL unlocks, or time rollback during cold start. Use a uniform record template and fixed pass/fail criteria.

Edge Timing & Sync (PTP Timestamps, Jitter Cleaner PLL, RTC Backup)

Q: Should timestamp requirements use average error or p99/p999? Why?

Use p99/p999 when rare spikes can break control loops, multi-sensor alignment, or event reconstruction. Averages hide tail events caused by reference switching, PLL relock settling, clock-domain boundaries, or timestamp tap uncertainty. Keep the average as a sanity check, but accept/reject with a fixed statistics window, tail metrics, and an outlier rate.
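The difference between an average and a tail metric can be made concrete with a small sketch. The function name `tail_summary`, the nearest-rank percentile method, the outlier threshold, and the synthetic sample values are illustrative assumptions, not part of any specific tool or the original acceptance procedure.

```python
import math

def tail_summary(errors_ns, outlier_limit_ns):
    """Summarize one fixed window of timestamp errors (values in ns)."""
    if not errors_ns:
        raise ValueError("window is empty")
    mags = sorted(abs(e) for e in errors_ns)
    n = len(mags)

    def percentile(p):
        # Nearest-rank percentile on the sorted magnitudes.
        rank = max(1, math.ceil(p / 100.0 * n))
        return mags[min(rank, n) - 1]

    return {
        "n": n,
        "mean_abs": sum(mags) / n,
        "p99": percentile(99),
        "p999": percentile(99.9),
        "max": mags[-1],
        "outlier_rate": sum(m > outlier_limit_ns for m in mags) / n,
    }

# Synthetic window: 995 quiet samples plus 5 spikes (illustrative values only).
window = [20.0] * 995 + [900.0] * 5
summary = tail_summary(window, outlier_limit_ns=500.0)
print(summary)
```

In this synthetic window the mean is 24.4 ns while p999 is 900 ns, which is the pattern an average-only requirement would miss.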

Q: PLL shows LOCKED but timestamps still spike—what two evidence classes should be checked first?

First, check PLL/clock-tree event evidence (holdover entry, reference switch, relock time, settle gating) and correlate events with spike timestamps. Second, check timestamp consistency for the same event across tap points (PHY vs TSU vs readout). Lock bits confirm loop closure, not usable output quality or phase continuity.

Q: When switching from external 1PPS to local XO, what is the most common root cause of a time step jump?

The most common root cause is phase discontinuity at the switch boundary: the local oscillator phase is not aligned to the outgoing reference, and a step occurs before a controlled slew policy is applied. Another frequent cause is releasing timestamps too early after relock—output is not fully settled, so transient phase error becomes a visible time step in ToD mapping.

Q: How to choose XO, TCXO, or OCXO by back-calculating from a target holdover time?

Define an acceptance statement: within T minutes after reference loss, time error stays within E. Convert this into an allowable frequency error budget, then split it into temperature drift, aging, and short-term stability. XO fits short/low-risk holdover, TCXO is typical for edge devices, and OCXO is used when tight holdover is needed and power, warm-up, and volume are acceptable. Prove with holdover and temperature tests.

Q: In a holdover budget, which term is most often underestimated: temp drift, aging, or thermal sweep behavior?

Thermal sweep behavior (ramp and non-steady-state) is most often underestimated. Many designs validate only at room temperature and ignore drift during temperature transitions and settling, which stretches TE(t) and increases tail events. Aging matters over longer time scales, but real temperature dynamics dominate many edge deployments. Include ramp rates and dwell time in validation.

Q: Supercap charging causes supply droop—should current limit be fixed first, or power sequencing?

Fix inrush control first because droop is usually driven by peak charge current. Power sequencing is often the second step to prevent sensitive rails from seeing the transient. A robust approach uses a controlled charge path plus isolation so the backup domain cannot pull down the main rail during cold start or brownouts. Acceptance should include no repeated PLL unlocks during charge and no time rollback after recovery.

Q: How to quickly tell if the issue is reference input quality or PLL output quality?

Verify the reference first, then verify what the PLL does with it. If 1PPS or 10 MHz shows missing pulses, unstable amplitude, or phase steps, the input is suspect. If the reference is stable but jitter proxies or spike bursts still appear, the PLL switching, settling policy, or clock distribution is more likely. Correlate PLL events (switch/relock/settle) with spike timestamps for fast attribution.


Edge Timing & Sync is an end-device hardware time subsystem that turns a reference (1PPS/10 MHz/SyncE or local XO) into usable, consistent timestamps. It does this by controlling jitter and holdover with a PLL, enforcing monotonic time (no rollback), and keeping time through power loss with an RTC plus backup energy. If any link in that chain is weak, the field symptom is almost always the same: long-tail timestamp spikes and time jumps. The fix therefore starts with evidence at the reference, the PLL events, and timestamp consistency.

H2-1 · Scope & Boundary

Scope & Boundary: What “time sync” means inside an edge device

“Time sync” is not a single feature. In edge hardware, it is a timing subsystem that must keep time measurable, stable, and recoverable across link changes, reference loss, and power cycles. The focus here is the hardware path that turns a reference into a usable clock and a trustworthy timestamp.

What this page covers (hardware timing subsystem)

Ref (1PPS / 10 MHz / SyncE as input) · PLL (jitter cleaner + clock tree) · TSU (hardware timestamp unit) · ToD (time-of-day interface) · RTC + supercap (backup rail)

Reference to clock: reference input qualification, muxing, jitter cleaning, and distribution to device clock domains.
Clock to timestamp: where timestamping happens (PHY vs MAC/TSU), and how to keep timestamps consistent and monotonic.
Loss and recovery: holdover behavior, RTC backup power, and “no time going backwards” recovery policies.

The three outcomes to guarantee (engineering deliverables)

Timestamp correctness: timestamps represent the real event order; no unexpected jumps; no backward time after recovery.
Jitter & wander under control: short-term jitter is bounded; long-term drift during holdover is predictable and budgeted.
Holdover + RTC backup: time remains valid through reference loss and power cycles; recovery is repeatable and testable.

Out of scope (kept on sibling pages)

TSN scheduling (Qbv/Qci, traffic shaping) — only the timing hardware is covered here.
BMCA / grandmaster selection — treated as system-level control, not hardware implementation details.
GNSS anti-jam / RF front-end — belongs to GNSS Timing / Positioning Module pages.
Cloud / fleet time management — operational architecture, not device timing subsystem design.

Practical boundary test: if the problem can be verified by probing reference inputs, PLL/clock outputs, timestamp behavior, or RTC backup rail, it belongs here. Otherwise, it belongs to a system/network sibling page.

Figure E1 — Edge timing subsystem: reference → jitter cleaning → timestamps → RTC backup

H2-2 · Requirements Decomposition

Turn “sync” into acceptance criteria: accuracy, jitter, and holdover

Successful timing designs start with testable acceptance criteria. “Better sync” is ambiguous; a device needs separate targets for timestamp accuracy, short-term jitter, and holdover drift. These targets interact, but they must be specified and validated independently to avoid false confidence.

Define the requirement in three questions (fast triage)

Timestamp alignment: do multiple devices stamp the same event within the required bound (ns or µs)?
Frequency / wander control: does the offset grow over minutes when the reference is lost (holdover)?
Phase noise / jitter control: is short-term jitter low enough for sampling clocks and high-speed I/O margins?

Dimension	What to specify	Why it matters (impact path)	Minimum verification
Timestamp accuracy	Max error bound + statistic (e.g., p99/p999), plus “no backward time” policy	Event correlation, logs, multi-sensor fusion, audit trails; errors show up as mis-ordered or mis-timed events	Same-event multi-device compare; distribution over time; check for jumps after link/restart
Short-term jitter	Allowed jitter budget at clock outputs (qualitative threshold if no jitter analyzer)	Sampling clocks (ADC/AFE), SERDES margins, control loops; too much jitter causes noise, BER, or instability	PPS/clock edge stability checks; compare “clean vs noisy” modes; correlate with error bursts
Long-term wander	Allowed drift rate over minutes/hours under holdover	Offset accumulates; a system that starts aligned can diverge steadily without clear alarms	Ref-off holdover curve; measure time error vs time; identify dominant drift contributors
Holdover	Duration + bound: “X minutes with ≤Y error” (separate steady vs temperature change cases)	Reference outage is common in the field; robustness requires predictable degradation, not surprises	Ref removal + temperature sweep; compare against the drift budget; record recovery behavior

Convert application language into acceptance language (examples)

Logging & event forensics: prioritize timestamp correctness + monotonic recovery; µs-class may be acceptable but jumps are not.
DAQ synchronous sampling: prioritize jitter + phase stability; “time-of-day” alone is insufficient.
Motion/control coordination: prioritize wander + holdover; steady divergence is the main failure pattern.
Multi-sensor fusion: prioritize event-alignment statistics (p99/p999), not just average offset.

Minimal acceptance pack (recommended)

Three numbers: max event alignment error, required holdover duration, and maximum allowed time jump on recovery.
Three tests: steady-state timestamp distribution, ref-off holdover drift curve, and power-cycle recovery monotonicity check.

A device that “locks” but fails these acceptance checks is not synchronized. Lock indicators are status signals, not proof of time quality.

Figure E2 — Acceptance triangle: accuracy ↔ jitter ↔ holdover (with test hooks)

H2-3 · Core Hardware Chain

From reference to usable timestamps: the end-to-end timing chain

A high-quality time system is a chain, not a single block. Every link must be responsible for a specific output: reference quality → clock conditioning → timestamp capture → time-of-day continuity. When a design fails, the fastest diagnosis is to walk the chain in order and collect evidence at defined test points.

Chain overview (four layers)

L1 Reference sources · L2 Ref mux + jitter cleaning · L3 Timestamp capture + reporting · L4 ToD/RTC hold + recovery

L1 (Reference): 1PPS / 10 MHz / SyncE-as-input / local XO. The job is to provide a stable time or frequency anchor and detect invalidity.
L2 (Clock tree): qualify → mux → PLL/jitter cleaner → fanout. The job is to produce clean, continuous clocks for all timing domains.
L3 (Timestamp path): capture point (PHY/MAC/TSU) → insert/report. The job is to stamp events deterministically with minimal uncertainty.
L4 (ToD/RTC): write → keep → restore. The job is to preserve continuity across power/reference loss and prevent backward time.

Measurement points (evidence hooks used later in debugging)

Reference input quality: missing pulses, glitches, edge stability, frequency offset trends.
PLL status + events: lock/unlock, ref switch, holdover entry/exit, relock time.
Clock output behavior: continuity during switchovers, jitter/wander indicators (direct or proxy).
Timestamp consistency: outliers, jumps, monotonicity after restarts and ref changes.
RTC drift + backup rail: actual retention time, drift vs temperature, recovery “jump limit”.

Engineering rule: do not treat “locked” as proof. Treat it as a hint, then validate time quality at the test points (TP) with repeatable checks.

Figure E3 — End-to-end chain: reference → clock tree → timestamp → ToD/RTC (with TP hooks)

H2-4 · Timestamp Location

PHY timestamp vs MAC/TSU timestamp: where uncertainty is created

Two boards can both claim IEEE-1588 support yet deliver very different results. The main differentiator is the timestamp capture point and the uncertainty contributors between the wire and that capture point. The closer the capture point is to the physical interface, the fewer internal variability sources exist.

High-level contrast (what changes in practice)

PHY timestamp: capture occurs near the wire; fewer internal path delays leak into the timestamp; better for tight tails (p99/p999).
MAC/TSU timestamp: capture occurs deeper in the device; easier integration; more sensitive to internal clock-domain and path effects.

Uncertainty contributors (error terms) and how they appear

Contributor	Why it happens	Typical symptom	Fast check
Clock-domain crossing (CDC)	Timestamp capture and reporting live in different clock domains; edge capture + transfer adds quantization and variability	Random outliers even at steady load; occasional “spikes” not correlated with traffic	Hold traffic constant; check if outliers persist
Variable latency (queue/IRQ/driver)	Software time capture or delayed reporting introduces load-dependent delay variability	Good average but poor tails; degrades under higher CPU/traffic load	Increase load; observe tail widening (p99/p999)
RX/TX asymmetry	Transmit and receive paths do not match (different pipeline delays, different corrections)	Direction-dependent bias; offset shifts when link/path changes	Compare behavior across directions and link states
Clock tree discontinuity	Reference switching or relock causes phase discontinuity that leaks into timestamps	Step changes (“jumps”) aligned with ref switch/relock events	Correlate jumps with PLL/ref event log

Selection decision tree (practical)

Need ns-class alignment or tight tails: prioritize PHY timestamp or a tightly integrated TSU with controlled CDC and event logging.
Need µs-class log alignment: MAC/TSU timestamp can be acceptable if monotonicity, recovery jump limits, and tail behavior are validated.
Any requirement level: require a way to detect and record ref switch / relock / holdover events; timestamps must be auditable.

A common failure mode is “reasonable average, unacceptable tail.” Always validate p99/p999 and correlate outliers with CDC, load, and ref events.

Figure E4 — Capture points (PHY vs MAC/TSU) and the main uncertainty bubbles

H2-5 · Jitter-Cleaner PLL

Why “Locked” is not “Good”: loop bandwidth, jitter transfer, and switching transients

A jitter-cleaner PLL is a noise-shaping system, not a simple “clock lock” indicator. Lock status confirms a control loop is active, but it does not prove the output clock is quiet, continuous, or predictable during reference loss. Time quality depends on how the PLL is configured and how it behaves during reference switching and holdover.

The three engineering knobs (what actually determines time quality)

1. Loop bandwidth (BW) · 2. Jitter transfer / attenuation · 3. Ref switching + transient behavior

Loop BW: decides how much reference noise leaks to the output and how fast the loop can track ref changes.
Jitter transfer: describes what the PLL passes vs cleans; the output can be ref-dominated or VCO-dominated depending on frequency.
Switching transient: ref switch/relock events can create phase steps or frequency bumps that show up as timestamp outliers.
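The bandwidth trade-off can be sketched with an idealized first-order loop model: reference jitter sees a low-pass response and local oscillator (VCO) noise sees the complementary high-pass response. Real jitter cleaners are higher-order and may show peaking, so the function names and the model itself are simplifying assumptions for intuition only.

```python
import math

def ref_to_output_gain(f_hz, loop_bw_hz):
    """Low-pass gain applied to reference jitter at offset frequency f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / loop_bw_hz) ** 2)

def vco_to_output_gain(f_hz, loop_bw_hz):
    """High-pass gain applied to local oscillator noise at offset frequency f_hz."""
    ratio = f_hz / loop_bw_hz
    return ratio / math.sqrt(1.0 + ratio ** 2)

# Compare a narrow and a wide loop for the same 100 Hz jitter component.
for bw in (10.0, 1000.0):
    ref_gain = ref_to_output_gain(100.0, bw)
    vco_gain = vco_to_output_gain(100.0, bw)
    print(f"BW={bw:g} Hz: ref gain {ref_gain:.3f}, VCO gain {vco_gain:.3f}")
```

With the narrow loop the 100 Hz reference component is strongly attenuated and the output is oscillator-dominated at that offset; with the wide loop the same component passes almost unchanged.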

Jitter-cleaner vs clock generator (practical boundary)

Jitter-cleaner PLL: prioritizes clock cleanliness, holdover, and auditable ref switching for time-sensitive subsystems.
Clock generator: prioritizes frequency synthesis and fanout; it can produce the right frequencies without guaranteeing time-quality tails.

Common failure patterns (symptom → likely cause → engineering action)

Symptom	Likely cause	Engineering action	Evidence to collect
Locked, but tails are bad	Loop BW too wide; reference noise is passed into the output	Narrow BW or enable stronger cleaning; qualify reference inputs and record ref-quality events	TP: output TIE trend; correlate tail spikes with ref-quality changes
Slow lock / unstable during ref switch	Loop BW too narrow; loop cannot track ref changes quickly enough	Increase BW or use staged behavior (fast reacquire then settle); review switch policy	PLL event log: relock time, switch timestamps, unlock bursts
Time jumps after ref loss / recovery	Holdover mode mismatched to oscillator quality (freeze vs flywheel vs local ref switch)	Define holdover acceptance (X minutes ≤ Y error) and select holdover strategy accordingly	Holdover drift curve; recovery jump limit; monotonicity check
Periodic “spikes” even at steady load	Switching transient / CDC interactions or hidden ref toggling	Audit ref mux policy; ensure transient is bounded; log every ref switch/holdover event	Event log + timestamp outlier correlation; TP at ref and PLL output

Proof strategy (status bits are insufficient)

Required: PLL event history (lock/unlock, ref switch, holdover entry/exit, relock time).
Required: output quality via proxy metrics when lab gear is limited (e.g., clock period stability, TIE trend, outlier rate).
Optional: phase-noise / jitter analyzer confirmation for final sign-off (same targets, higher resolution).

Acceptance mindset: a PLL can remain locked while violating jitter tails, creating transient steps, or drifting beyond holdover limits. Time quality must be proven at the output and correlated with ref/PLL events.

Figure E5 — PLL lock vs clock quality (BW · transfer · holdover · ref switching)

H2-6 · Oscillators & Reference Choice

XO vs TCXO vs OCXO vs CSAC: picking the holdover baseline without drifting into GNSS RF

Oscillator selection is best driven by holdover acceptance: when the primary reference is lost, the device must stay within a defined error budget for a defined duration. The oscillator defines the baseline for wander, temperature drift, and aging; the PLL/clock tree can only shape or distribute what the local reference can support.

Start with a holdover budget (turn requirements into a selection gate)

Define: “Ref lost → within Y time error for X minutes” (include expected temperature change).
Then choose: oscillator class that can plausibly meet the drift budget under temperature, aging, and vibration constraints.
Finally verify: holdover drift curve + recovery jump limit (monotonic behavior).

Practical boundaries (what each oscillator class is good at)

XO: lowest cost; larger temperature drift and aging. Suitable for low holdover demands or frequent re-discipline.
TCXO: improved temperature stability with low power. Common choice for edge devices that need practical holdover.
OCXO: strong short-term stability and phase-noise performance, but higher power, warm-up time, and volume.
CSAC (when truly needed): strong long-term stability for extended reference outages, at higher cost and integration constraints.

Key parameters (how to read them as system impacts)

Temp (drift curve) · Aging (ppm/day or ppm/year) · PN (phase noise, dBc/Hz) · g sensitivity · Warm-up time

Temperature drift: dominates holdover when ambient changes; focus on curve shape, not a single number.
Aging: sets long-holdover baseline drift; critical for long outages and long calibration intervals.
Phase noise: impacts jitter-sensitive domains (sampling clocks, high-speed links) and can drive tail behavior.
g sensitivity: matters in vibration/portable environments; frequency shifts can appear as timing noise.
Warm-up: can create “good only after minutes” behavior; treat as a requirement, not a surprise.

Qualitative comparison (use for early architecture choices)

Type	Holdover drift	Phase-noise / jitter	Power / warm-up	Typical edge fit
XO	weak under temp/aging	varies; usually moderate	best power; no warm-up	low requirement, frequent discipline
TCXO	good practical drift	good enough for many	low power; minimal warm-up	common edge holdover baseline
OCXO	strong short-term	strong (quiet)	high power; warm-up required	high-end sync, DAQ, tight tails
CSAC	strong long-term	varies; often good	higher cost; integration tradeoffs	extended outages with strict drift limit

Scope guard: GNSS anti-jamming and antenna/RF front-end design is out of scope here (covered by the GNSS Timing / Positioning Module page).

Figure E6 — Holdover-driven oscillator selection (XO/TCXO/OCXO/CSAC) and parameter mapping

H2-7 · Holdover & Drift Budget

Holdover drift budgeting: how long time stays “within spec” after reference loss

Holdover is only meaningful when written as an acceptance statement that can be calculated and verified: after reference loss, time error stays within ±E for T minutes (under a defined temperature profile). This section turns that statement into a drift budget and a validation loop.

Holdover is a sum of error contributors (what must be budgeted)

1. Oscillator stability · 2. Temperature trajectory · 3. Aging · 4. Discipline / calibration strategy

Oscillator stability: initial frequency offset and short-term wander define the starting slope of time error.
Temperature trajectory: drift follows temperature over time; curve shape matters more than a single spec number.
Aging: long-holdover baseline drift sets the floor for extended outages and long calibration intervals.
Discipline strategy: pre-loss training reduces the residual frequency error at the moment holdover starts.

Write the budget in the same unit used for acceptance: time error TE(t)

The primary evidence is the time error curve TE(t). Treat it as the scorecard: if |TE(T)| ≤ E, the system passes for the defined temperature profile.

Linear TE(t): constant frequency offset dominates (slope stays roughly constant).
Curved TE(t): temperature drift or compensation changes the slope over time.
Piecewise TE(t): mode transitions (holdover entry, relock, ref switching) introduce slope changes or steps.

Back-calculate the allowed frequency error from the acceptance statement

For an initial gate, convert time error to relative frequency error:

Acceptance target	Derived gate	How it is used
After ref loss: ±E time error within T	Avg. |Δf/f| ≤ E/T (convert to ppm)	Filters oscillator class and sets the holdover margin before lab tests
Temperature changes during holdover	Reserve margin for temp curve + compensation limits	Ensures the “E/T” gate is not consumed by environmental drift
Extended outage or long intervals	Aging budget (ppm over time horizon)	Defines recalibration interval and required oscillator grade

Evidence priority rule: first confirm TE(t) stays inside the acceptance envelope; only then use deeper metrics (e.g., Allan) to explain residuals.
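The E/T gate is simple arithmetic, shown here as a minimal sketch. It assumes a constant average frequency offset over the holdover interval (the linear TE(t) case); the function name and the example numbers are illustrative only.

```python
def allowed_freq_error_ppm(max_time_error_s, holdover_s):
    """Average fractional frequency error (ppm) that keeps |TE| <= E after T seconds."""
    if holdover_s <= 0:
        raise ValueError("holdover duration must be positive")
    return max_time_error_s / holdover_s * 1e6

# Example statement: stay within 100 microseconds for 10 minutes after reference loss.
gate_ppm = allowed_freq_error_ppm(100e-6, 10 * 60)
print(f"allowed average offset: {gate_ppm:.4f} ppm")
```

This is the whole budget, so temperature drift, aging, and residual discipline error all have to fit inside it with margin.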

Verification loop (ref cut + thermal sweep)

Ref cut test: disconnect the reference, record TE(t) and holdover/relock events, then check |TE(T)| ≤ E.
Thermal sweep: repeat with a controlled temperature trajectory; compare TE(t) envelopes and compensation effectiveness.
Correlation: annotate TE(t) with event timestamps (ref loss, holdover entry, ref switch, relock) to attribute slope changes or steps.
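A ref cut record can be checked against the acceptance envelope with a few lines. This sketch checks every sample in the window rather than only the endpoint, which is a stricter choice than |TE(T)| ≤ E alone; the function name, record layout, and synthetic ramp are assumptions.

```python
def check_holdover(samples, max_error_s, duration_s):
    """samples: (seconds since reference loss, time error in seconds) pairs."""
    window = sorted((t, te) for t, te in samples if 0 <= t <= duration_s)
    if not window:
        raise ValueError("no samples inside the holdover window")
    worst_t, worst_te = max(window, key=lambda s: abs(s[1]))
    # First sample time at which the envelope is exceeded, or None if never.
    first_violation = next((t for t, te in window if abs(te) > max_error_s), None)
    return {
        "passed": first_violation is None,
        "worst_t_s": worst_t,
        "worst_te_s": worst_te,
        "first_violation_s": first_violation,
    }

# Synthetic ramp: constant 0.1 ppm offset sampled once per minute for 10 minutes.
ramp = [(t, 1e-7 * t) for t in range(0, 601, 60)]
ok = check_holdover(ramp, max_error_s=100e-6, duration_s=600)
tight = check_holdover(ramp, max_error_s=50e-6, duration_s=600)
print(ok["passed"], tight["passed"], tight["first_violation_s"])
```

The time of the first violation is useful for correlation: it can be compared directly against logged holdover, switch, and relock events.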

Figure E7 — Holdover drift budgeting loop (acceptance → budget → back-calc → validate)

H2-8 · RTC + Supercap Backup

RTC and supercap backup: keeping time through power loss without collapsing the main rail

RTC backup is a power-domain design: it must keep the RTC domain alive through power loss while preventing charging inrush, backfeed paths, and leakage from defeating the backup-time target. A correct design is defined by an effective voltage window, a total backup current, and an auditable startup recovery that avoids time rollback.

RTC selection checklist (what matters for holdover + recovery)

Backup current: the dominant term in backup-time estimation; measure worst-case, not typical.
Temperature behavior: drift across the expected temperature range; consider calibration/trim registers.
Clock source: 32 kHz crystal vs integrated oscillator (power, drift curve, startup repeatability).
Calibration registers: enables writing measured offset back into RTC for improved holdover alignment.

Backup chain blocks (charge limit → OR-ing → domain isolation)

1. Charge limiter · 2. Ideal diode / OR-ing · 3. RTC domain isolation

Charge limiter: prevents cold-start supercap inrush from drooping the main rail.
Ideal diode / OR-ing: seamless switchover while blocking reverse current between main and backup.
Domain isolation: prevents backfeed through IO/ESD structures and hidden rails.

Three common field failures (symptom → likely cause → fix + evidence)

Symptom	Likely cause	Engineering action	Evidence
Backup time too short	Supercap leakage/ESR + underestimated total backup current	Budget I_total (RTC + leakage + OR-ing leakage); validate effective voltage window	TP-BACKUP_V curve; leakage isolation test
Main rail droops at cold start	Supercap behaves like a short; missing or weak inrush limiting	Add charge limiter/soft-start; stage charging if needed	TP-INRUSH current; rail dip waveform
Weird power paths / partial power	Backfeed through IO/ESD or OR-ing path into RTC domain	Audit isolation; ensure reverse blocking and domain separation	Reverse current check; unexpected “alive” rails

Backup-time estimation (use the effective voltage window, not the full capacitor)

A practical first estimate uses the usable RTC voltage window: t ≈ C · (V_hi − V_lo) / I_total

V_hi/V_lo: RTC domain usable range (depends on RTC + OR-ing drop + isolation elements).
I_total: RTC backup current + supercap leakage + OR-ing leakage + board leakage (contamination can dominate).
Reality check: always confirm with the power-off timer test and compare to the estimate to locate hidden leakage.
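The first-order estimate above can be written as a short calculation. The capacitance, voltage window, and current breakdown below are illustrative placeholders, not measured or datasheet values; a constant total current is assumed.

```python
def backup_time_hours(cap_f, v_hi, v_lo, i_total_a):
    """First-order backup time: t = C * (V_hi - V_lo) / I_total, in hours."""
    if v_hi <= v_lo:
        raise ValueError("V_hi must exceed V_lo")
    if i_total_a <= 0:
        raise ValueError("total current must be positive")
    return cap_f * (v_hi - v_lo) / i_total_a / 3600.0

# Hypothetical current budget in amperes; every term counts toward I_total.
currents_a = {
    "rtc": 1.0e-6,
    "supercap_leakage": 0.7e-6,
    "oring_leakage": 0.2e-6,
    "board_leakage": 0.1e-6,
}
hours = backup_time_hours(0.1, 3.0, 1.8, sum(currents_a.values()))
print(f"estimated backup time: {hours:.1f} h")
```

Keeping the current terms separate makes it easier to see which contributor to investigate when the power-off timer test comes in shorter than the estimate.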

Validation (power-off timer + cold-start recovery + monotonicity)

Power-off timer: remove power, measure how long RTC stays valid and how much it drifts.
Cold-start recovery: verify that time reconstruction does not cause an excessive jump on boot.
Monotonicity check: confirm time does not go backward; bound the allowed correction step.

Acceptance mindset: the backup domain passes only if backup time meets target and recovery preserves monotonic time behavior.

Figure E8 — RTC + supercap backup domain (inrush limiting, OR-ing, isolation, and typical pitfalls)

H2-9 · Reference Switching & Relock Recovery

Reference switching and relock recovery: ref mux, glitchless handover, and time-jump governance

Reference switching becomes unstable when three layers are mixed: reference qualification, clock-loop behavior, and time-of-day mapping. A robust design separates responsibilities: qualify inputs, execute a controlled handover, then govern time corrections with monotonic rules.

Typical switch scenarios (what triggers a handover)

1. GNSS → SyncE → local XO · 2. Primary ref drop · 3. Ref quality degrade · 4. Maintenance / forced switch

Ref loss: missing PPS pulses, missing 10 MHz, or SyncE lock loss events.
Ref degrade: growing jitter, phase steps, or intermittent pulses that still look “present”.
Anti-flap rule: apply hysteresis and minimum dwell time before switching again.

Ref qualification + ref mux (separate “decide” from “execute”)

Qualification: detect loss, count missing pulses, track phase stability, and produce a coarse GOOD / WARN / BAD score.
Decision: the policy selects a target reference using hysteresis and dwell time.
Execution: the ref mux performs the handover and records the event timestamp.

Design intent: ref mux should not “hunt.” It follows a policy and produces auditable switch events.
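A decision policy with hysteresis and dwell time might look like the sketch below. The specific rules are assumptions chosen for illustration: WARN never triggers a switch, a BAD active reference forces an immediate switch to the best GOOD candidate, and a voluntary switch back to a higher-priority reference waits for the dwell time. The class name and reference names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RefSelector:
    priority: list                  # reference names, most preferred first
    min_dwell_s: float              # hold time before a voluntary switch-back
    active: str = ""
    last_switch_s: float = float("-inf")
    events: list = field(default_factory=list)   # (time_s, old_ref, new_ref)

    def decide(self, now_s, quality):
        """quality maps reference name -> 'GOOD', 'WARN' or 'BAD'."""
        good = [r for r in self.priority if quality.get(r) == "GOOD"]
        target = self.active
        if quality.get(self.active, "BAD") == "BAD":
            # Forced switch: the active reference is unusable.
            if good:
                target = good[0]
        elif good and now_s - self.last_switch_s >= self.min_dwell_s:
            # Voluntary switch: only upward in priority, only after the dwell time.
            if self.priority.index(good[0]) < self.priority.index(self.active):
                target = good[0]
        if target != self.active:
            self.events.append((now_s, self.active, target))   # auditable record
            self.active, self.last_switch_s = target, now_s
        return self.active

sel = RefSelector(priority=["gnss_pps", "synce", "local_xo"], min_dwell_s=30.0)
all_good = {"gnss_pps": "GOOD", "synce": "GOOD", "local_xo": "GOOD"}
history = [
    sel.decide(0.0, all_good),
    sel.decide(10.0, {**all_good, "gnss_pps": "BAD"}),
    sel.decide(15.0, all_good),     # GNSS is back, but dwell has not elapsed
    sel.decide(45.0, all_good),     # dwell elapsed, revert to GNSS
]
print(history, sel.events)
```

The selector only decides and records; the mux execution and the PLL behavior during the handover stay in their own layers.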

Three continuity layers (often confused, with different hardware requirements)

Continuity goal	What it means	Engineering implications
Glitchless	No short pulses or missing edges during the switchover	Switch on a safe boundary; gate/align the mux control; verify with PPS/clock waveform
Frequency-continuous	Output frequency does not step abruptly at handover	DPLL slews or flywheels through transition; “lock” alone is not proof—settle window matters
Phase-continuous	Phase does not exhibit a step; hardest target	Requires phase alignment/phase accumulator continuity; stricter constraints and longer validation

Relock recovery as a state machine (make transitions observable)

S0 LOCKED · S1 HOLDOVER · S2 REF_SWITCH · S3 REACQUIRE · S4 SETTLE

LOCKED → HOLDOVER: reference quality drops below threshold; log holdover_enter.
HOLDOVER → REF_SWITCH: policy selects the next best reference; log switch_event.
REACQUIRE → SETTLE: DPLL relocks; output quality must pass a settle window before declaring stable.
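The states and transitions above can be sketched as a small table-driven state machine that logs every transition. The event names `holdover_enter` and `switch_event` come from the list above; the trigger names and the transitions marked "assumed" (REF_SWITCH → REACQUIRE, SETTLE → LOCKED, SETTLE → REACQUIRE on failure) are hypothetical additions needed to close the loop.

```python
from enum import Enum

class S(Enum):
    LOCKED = 0
    HOLDOVER = 1
    REF_SWITCH = 2
    REACQUIRE = 3
    SETTLE = 4

# (current state, trigger) -> (next state, logged event name)
TRANSITIONS = {
    (S.LOCKED, "ref_bad"): (S.HOLDOVER, "holdover_enter"),
    (S.HOLDOVER, "ref_selected"): (S.REF_SWITCH, "switch_event"),
    (S.REF_SWITCH, "mux_done"): (S.REACQUIRE, "reacquire_start"),   # assumed
    (S.REACQUIRE, "pll_locked"): (S.SETTLE, "settle_start"),
    (S.SETTLE, "settle_pass"): (S.LOCKED, "stable"),                # assumed
    (S.SETTLE, "settle_fail"): (S.REACQUIRE, "settle_fail"),        # assumed
}

class RelockFsm:
    def __init__(self):
        self.state = S.LOCKED
        self.log = []               # (time_s, state before, event name)

    def on(self, now_s, trigger):
        key = (self.state, trigger)
        if key not in TRANSITIONS:
            # Unknown triggers are recorded, not silently dropped.
            self.log.append((now_s, self.state.name, "ignored:" + trigger))
            return self.state
        next_state, event = TRANSITIONS[key]
        self.log.append((now_s, self.state.name, event))
        self.state = next_state
        return next_state

    def timestamps_usable(self):
        # A lock bit alone is not enough: the settle window must have passed.
        return self.state is S.LOCKED

fsm = RelockFsm()
for t, trig in [(1.0, "ref_bad"), (1.2, "ref_selected"), (1.3, "mux_done"), (2.0, "pll_locked")]:
    fsm.on(t, trig)
usable_before_settle = fsm.timestamps_usable()
fsm.on(2.5, "settle_pass")
print(usable_before_settle, fsm.timestamps_usable(), fsm.log)
```

Gating timestamp release on the settle outcome, not on the PLL lock trigger, is the point of keeping SETTLE as a separate state.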

Time-jump governance (hardware/firmware rules only)

Monotonic rule: time must not go backward (no rollback), even during correction.
Jump limit: cap the maximum correction step; large jumps must be explicitly marked.
Step vs slew:
- Step: fast alignment but produces a visible timestamp jump (must be recorded).
- Slew: gradual convergence by controlled frequency offset (preferred for control/sampling continuity).
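One way to encode these rules is a small planning function. The policy here is an assumption for illustration: backward corrections are always slewed (the clock runs slightly slow but never steps back), forward corrections within the jump limit are stepped and recorded, and forward corrections beyond the limit are stepped with an explicit flag. Names and units are hypothetical.

```python
def plan_correction(offset_ns, jump_limit_ns, max_slew_ppm):
    """offset_ns = reference time minus local time (positive: local clock is behind)."""
    if max_slew_ppm <= 0 or jump_limit_ns < 0:
        raise ValueError("limits must be positive")
    if offset_ns == 0:
        return {"action": "none", "flagged": False, "slew_seconds": 0.0}
    if offset_ns < 0:
        # Monotonic rule: never step backward; slow the clock until it converges.
        seconds = abs(offset_ns) / (max_slew_ppm * 1000.0)
        return {"action": "slew", "flagged": False, "slew_seconds": seconds}
    # Forward step: always recorded, flagged when it exceeds the jump limit.
    return {"action": "step", "flagged": offset_ns > jump_limit_ns, "slew_seconds": 0.0}

behind_small = plan_correction(500, jump_limit_ns=1_000, max_slew_ppm=100)
behind_large = plan_correction(5_000_000, jump_limit_ns=1_000, max_slew_ppm=100)
ahead = plan_correction(-50_000, jump_limit_ns=1_000, max_slew_ppm=100)
print(behind_small, behind_large, ahead)
```

The slew duration follows from the offset divided by the rate adjustment: a 50 µs offset at 100 ppm converges in 0.5 s, and the clock keeps increasing throughout because its rate stays positive.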

Boundary reminder: only time mapping and correction rules are covered here—no network selection algorithms are expanded.

TP PPS: phase step / missing pulses · TP PLL: relock + settle time · TP TS: outliers / monotonic violations

Figure E9 — Reference switching + time-jump governance (qualify → mux → DPLL → ToD mapping)

H2-10 · Field Diagnostics (Evidence-First)

Field triage: the three evidence classes to quickly isolate reference, PLL, or timestamp issues

Fast diagnosis starts with a strict evidence order. The goal is not to “tune PTP,” but to localize the failure to one of three hardware-visible layers: reference input, PLL/clock tree, or timestamp consistency. Each layer has a fastest tool and a minimal proof method.

The forced order (do not swap steps)

1. Reference evidence · 2. PLL / clock-tree evidence · 3. Timestamp consistency evidence

Reason: if reference is unstable, downstream jitter and timestamp outliers are symptoms—not root causes.

Evidence class #1 — reference input (PPS / 10 MHz / SyncE)

1PPS: verify missing pulses, phase steps, and widened jitter (scope or logic analyzer).
10 MHz: verify continuity and gross stability (counter trend; avoid deep RF analysis here).
SyncE: check lock/alarm events and correlate to observed time anomalies.

If the reference is not trustworthy, stop and fix the input path before analyzing timestamps.

Evidence class #2 — PLL / clock tree (lock is not enough)

Must log: lock/unlock, holdover entry/exit, ref switch events, relock time, and settle window outcome.
Must correlate: time anomalies that align with switch or relock boundaries point to loop transition behavior.
Practical observation: when phase noise tools are unavailable, use time-error trends and outlier bursts as a substitute indicator.
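The correlation step can be sketched as a window match between spike times and logged PLL events. The symmetric window, the function name, and the sample times are assumptions; both logs are assumed to share one time base.

```python
import bisect

def attribute_spikes(spike_times_s, event_times_s, window_s):
    """Split spikes into those within window_s of a PLL event and the rest."""
    events = sorted(event_times_s)
    attributed, unexplained = [], []
    for t in spike_times_s:
        # First event at or after the start of the window around this spike.
        i = bisect.bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            attributed.append((t, events[i]))
        else:
            unexplained.append(t)
    return attributed, unexplained

# Hypothetical logs: two switch/relock events and three timestamp spikes.
pll_events = [100.0, 250.0]
spikes = [100.4, 180.0, 249.8]
attributed, unexplained = attribute_spikes(spikes, pll_events, window_s=1.0)
print(attributed, unexplained)
```

Spikes that match an event point to loop transition behavior; spikes left unexplained are candidates for the timestamp-path checks in the next evidence class.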

Evidence class #3 — timestamp consistency (only after ref + PLL pass)

Same event, multiple timestamps: compare capture points (e.g., PPS capture vs TSU record vs software log).
Check monotonicity: detect any backward time step (hard failure).
Check outliers: bursts of spikes suggest capture/CDC boundary issues.
Check persistent offset: stable, repeatable offset indicates fixed path delay or capture-point mismatch.
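These checks can be run on paired records of the same events from two capture points. In this sketch the median difference stands in for the persistent offset, deviations from it beyond a threshold count as outliers, and any decrease within one tap is a rollback. The function name, threshold, and sample values are illustrative assumptions.

```python
import statistics

def compare_taps(tap_a_ns, tap_b_ns, outlier_ns):
    """Compare timestamps of the same events from two capture points, in event order."""
    if len(tap_a_ns) != len(tap_b_ns) or len(tap_a_ns) < 2:
        raise ValueError("need two equal-length series with at least two events")
    diffs = [b - a for a, b in zip(tap_a_ns, tap_b_ns)]
    offset = statistics.median(diffs)   # robust against a few spikes

    def rollbacks(series):
        # Indices where a timestamp is earlier than the previous one.
        return [i for i in range(1, len(series)) if series[i] < series[i - 1]]

    return {
        "persistent_offset_ns": offset,
        "outlier_indices": [i for i, d in enumerate(diffs) if abs(d - offset) > outlier_ns],
        "rollbacks_a": rollbacks(tap_a_ns),
        "rollbacks_b": rollbacks(tap_b_ns),
    }

# Hypothetical capture: tap B carries a fixed 40 ns delay, one spike, one rollback.
tap_a = [1000, 2000, 3000, 4000, 5000]
tap_b = [1040, 2040, 3900, 4040, 3990]
report = compare_taps(tap_a, tap_b, outlier_ns=100)
print(report)
```

A stable median with few outliers suggests a fixed path delay that can be calibrated out, while rollbacks in either tap are hard failures regardless of the offset.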

Minimal tool mapping (within this page boundary)

Tool	Fastest target	What it proves
Scope / logic analyzer	1PPS stability, phase steps, missing pulses	Confirms reference presence and gross quality; catches switch-induced glitches
Counter / frequency trend	10 MHz or derived clock drift	Shows frequency offset and slow drift that drives TE(t) slope during holdover
Software logger	events + timestamp comparisons	Auditable correlation: switch/relock boundaries vs timestamp outliers and monotonicity

Figure E10 — Three-lane field triage flow (reference → PLL → timestamp)

H2-11 · Validation Test Plan

Validation test plan: turn “sync quality” into repeatable tests

The goal is to convert “correct timestamps, controlled jitter, and recoverable operation after reference loss or power events” into an executable test matrix. Every test case has input conditions, hardware-first observation points, a data logging template, and clear pass/fail criteria, so the same matrix serves R&D acceptance, production sampling, and fast field attribution (reference, PLL, timestamp-path, or backup-power issues).

Timestamp correctness · Jitter & wander control · Holdover & RTC backup · No time rollback

1) Test matrix (T1–T5) and deliverables

For each test, produce the same “evidence bundle”: raw logs (CSV/register/event logs), statistical summaries (p50/p99/p999/min/max), waveforms/screenshots, and a final decision (Pass/Fail + root-cause tag).

Test	Stimulus / conditions	Observation taps (hardware-first)	Pass criteria (template)
T1 Steady-state	Stable reference, normal lock, room temperature	1PPS phase jitter; output clock jitter (or proxy); timestamp error distribution	p99 TS_err ≤ X and p999 ≤ Y; no outlier spikes; stable lock state
T2 Holdover	Reference/link loss → enter holdover	Time error curve TE(t); frequency offset/phase drift; mode switch points	|TE(T)| ≤ E (T minutes/hours) and no steps; after recovery, no rollback
T3 Temperature	Chamber sweep (with ramp + dwell)	Thermal drift and compensation; lock margin; TE vs temperature	dTE/dT ≤ K; T1/T2 thresholds met across the specified temperature range
T4 Power disturbance	Brownout / hot-plug / reset disturbance	PLL unlock/relock time; TSU timestamp spikes; any time rollback	relock ≤ R; monotonic time (no rollback); complete event logs
T5 RTC+Supercap backup	Power-off → backup domain only → power-on	Backup duration; charge inrush; post-restore time continuity (step/rollback)	backup ≥ H; inrush does not droop the main rail; restore is continuous or controlled stepping

How to write criteria: avoid “average only”. Prefer a defined time window plus tail metrics (p99/p999/max), then add event-style criteria (step/rollback/spike present or not).

Figure E11 — Validation plan map: stimuli → taps → metrics → pass/fail


2) Unified test architecture: fix the “stimuli + taps + logging format”

For repeatability, split the setup into two layers: the stimulus layer (reference/temperature/power/backup-off) and the observation layer (PPS/clocks/timestamps/RTC domain). When boards or components change, keep the stimulus layer unchanged and only swap the DUT.

Stimulus layer: programmable reference input (1PPS/10MHz or SyncE), temperature chamber, controllable brownout/hot-plug fixture, and a power-off backup fixture (supercap domain).
Observation layer: PPS phase (scope/counter), PLL lock state (registers + event logs), timestamp consistency (same event cross-point comparison), and RTC-domain voltage/current + backup duration.
Logging format: a unified CSV schema + fixed statistics windows (e.g., 1 s / 10 s / 60 s) + tail metrics (p99/p999).

Field-friendly constraint: even without a phase-noise analyzer, keep “proxy evidence”: PPS phase noise, output clock period-jitter statistics, and timestamp outlier rate.

3) T1 — Normal reference: steady-state timestamp distribution and jitter baseline

T1 does one thing: build a statistical “healthy profile”. Every anomaly later should be compared against the T1 baseline (tail degradation, more spikes, or lock-state instability).

Conditions: stable reference, lock complete, room temperature; fixed load/traffic (avoid uncertain queue behavior).
Taps: PPS phase jitter; period jitter at key clock points (or a proxy); timestamp deltas for the same event observed at multiple points.
Outputs: TS_err distribution (p50/p99/p999/max); outlier rate; lock/switch logs are empty or stable.
Criteria template: p99 ≤ X and p999 ≤ Y; spike amplitude max ≤ Z; spike rate ≤ N/hour.
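The criteria template above can be evaluated with a few lines of standard-library Python. The sketch below assumes a nearest-rank percentile (other percentile definitions give slightly different values on small samples) and uses placeholder thresholds; `t1_summary`, `t1_pass`, and `spike_limit_ns` are illustrative names.

```python
def nearest_rank(sorted_vals, per_mille):
    """Nearest-rank percentile; per_mille=990 means p99, 999 means p999."""
    n = len(sorted_vals)
    k = (per_mille * n + 999) // 1000  # integer ceil(per_mille * n / 1000)
    return sorted_vals[k - 1]


def t1_summary(ts_err_ns, window_hours, spike_limit_ns):
    """Tail metrics over one fixed statistics window (warm-up already excluded)."""
    if not ts_err_ns or window_hours <= 0:
        raise ValueError("need samples and a positive window length")
    abs_err = sorted(abs(e) for e in ts_err_ns)
    spikes = sum(1 for e in abs_err if e > spike_limit_ns)
    return {
        "p50": nearest_rank(abs_err, 500),
        "p99": nearest_rank(abs_err, 990),
        "p999": nearest_rank(abs_err, 999),
        "max": abs_err[-1],
        "outlier_rate_per_hour": spikes / window_hours,
    }


def t1_pass(s, x_ns, y_ns, z_ns, n_per_hour):
    """p99 <= X, p999 <= Y, max <= Z, spike rate <= N/hour."""
    return (s["p99"] <= x_ns and s["p999"] <= y_ns
            and s["max"] <= z_ns and s["outlier_rate_per_hour"] <= n_per_hour)
```

Note that p999 is only meaningful when the window holds well over a thousand samples; with fewer, the nearest-rank value collapses to the maximum.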

4) T2 — Reference loss: holdover drift curve TE(t)

The core evidence for holdover is the time error curve TE(t). Use TE(t) to capture the dominant drift first, then decide whether deeper phase-noise/Allan analysis is necessary.

Stimulus: after steady lock, disconnect the reference input or simulate link loss; keep the DUT running.
Logging: sample TE(t) every Δt; also log temperature, PLL mode, and frequency-offset estimates.
Criteria template: |TE(T)| ≤ E (T minutes/hours); the curve must be continuous with no steps; after reference returns, no rollback is allowed.
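A logged TE(t) series can be checked against this template with a short script. The sketch below is one possible interpretation: a "step" is any sample-to-sample change larger than a hypothetical `step_limit_ns`, and the dominant drift is estimated with an ordinary least-squares slope. A slope of 1 ns/s corresponds to a fractional frequency offset of 1e-9 (1 ppb).

```python
def holdover_check(t_s, te_ns, e_limit_ns, step_limit_ns):
    """Evaluate one holdover run.

    t_s: sample times in seconds since reference loss
    te_ns: time error at each sample, in ns
    e_limit_ns: acceptance limit E for |TE| over the whole run
    step_limit_ns: hypothetical largest allowed change between adjacent samples
    """
    n = len(t_s)
    if n < 2 or n != len(te_ns):
        raise ValueError("need at least two (t, TE) samples of equal length")
    worst_ns = max(abs(v) for v in te_ns)
    step_idx = [i for i in range(1, n)
                if abs(te_ns[i] - te_ns[i - 1]) > step_limit_ns]
    # Least-squares slope of TE versus time (ns per second).
    mean_t = sum(t_s) / n
    mean_te = sum(te_ns) / n
    sxx = sum((t - mean_t) ** 2 for t in t_s)
    if sxx == 0:
        raise ValueError("time samples must not all be identical")
    slope = sum((t - mean_t) * (v - mean_te) for t, v in zip(t_s, te_ns)) / sxx
    return {
        "worst_ns": worst_ns,
        "step_idx": step_idx,
        "slope_ns_per_s": slope,
        "passed": worst_ns <= e_limit_ns and not step_idx,
    }
```

The step limit should be set relative to Δt and the expected drift rate; otherwise normal drift between widely spaced samples will be misread as a step.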

5) T3 — Temperature sweep: drift/compensation, lock margin, and the degradation knee

The point is not merely “it still runs”, but to find the degradation knee: the temperature or ramp rate at which lock jitter rises, timestamp tails worsen, or relock slows down.

Profile: dwell points + ramp sweeps; compare behavior before/after thermal stabilization.
Evidence: TE vs T; lock state and relock time; whether TS_err p99/p999 degrade with temperature.
Criteria template: dTE/dT ≤ K; T1/T2 thresholds still met across the target temperature range.

6) T4 — Power disturbance/hot-plug: unlock/relock and time monotonicity

Power events often create “random-looking” timestamp spikes and rollback risk. This test forces checks for: any rollback, predictable relock, and whether event logs can close the loop.

Stimulus: controlled dips, brief interruptions, hot-plug, reset; cover both main power and clock/RTC-only rails.
Taps: PLL lock→unlock→relock time; TSU timestamp spikes; whether system time rolls backward.
Criteria template: relock ≤ R; no rollback; every anomaly maps to a logged reference/power/PLL/timestamp-path cause.
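Relock time can be extracted from the PLL event log rather than read off a scope. The sketch below assumes a time-ordered log of `(t_ms, state)` tuples with the state strings `"lock"` and `"unlock"`; both the log shape and the state names are hypothetical and should be mapped to whatever the DUT firmware actually records.

```python
def relock_times_ms(events):
    """Return (durations, still_unlocked) from a time-ordered PLL event log.

    events: iterable of (t_ms, state); only "unlock" and "lock" are interpreted,
    other states (e.g. "holdover") are ignored for this measurement.
    """
    durations = []
    unlock_at = None
    for t_ms, state in events:
        if state == "unlock" and unlock_at is None:
            unlock_at = t_ms
        elif state == "lock" and unlock_at is not None:
            durations.append(t_ms - unlock_at)
            unlock_at = None
    return durations, unlock_at is not None


def t4_relock_pass(events, r_limit_ms):
    """relock <= R for every disturbance, and the run must end locked."""
    durations, still_unlocked = relock_times_ms(events)
    return (not still_unlocked) and all(d <= r_limit_ms for d in durations)


log = [(0, "lock"), (1000, "unlock"), (1005, "holdover"), (1350, "lock"),
       (5000, "unlock"), (5100, "lock")]
durations, still_unlocked = relock_times_ms(log)
```

This covers only the relock criterion; the no-rollback and event-attribution criteria still need the timestamp and event-correlation checks.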

7) T5 — RTC + supercap: backup duration, charge inrush, and restore consistency

A backup path must both “last long enough” and “not collapse the main rail during charging”. T5 therefore validates backup duration, charge inrush, backfeed paths, and time consistency after restore.

Power-off timing: disconnect the main rail and keep only the RTC domain; record backup duration (backup ≥ H).
Inrush evidence: cold-start charge peak current and main-rail droop; verify no UV/PG false triggers.
Restore consistency: on power return, verify no rollback/unexplained large jump; controlled stepping is acceptable, unexplained steps are not.

8) Logging template (CSV field suggestions) and final report structure

The more uniform the schema, the higher the cross-project reuse. Put “environment, reference, lock state, time error, timestamp distribution, and backup domain” on the same row so scripts can generate summary plots and decisions directly.

Base: test_id, dut_rev, fw_rev, timestamp_utc, run_id.
Environment: temp_c, vin_main_v, vin_rtc_v, load_state.
Reference: ref_sel (PPS/10MHz/SyncE/XO), ref_ok, ref_loss_count.
PLL: pll_lock, mode (lock/holdover/relock), relock_ms, alarm_flags.
Time error: te_ns (instant), te_ns_max, te_ns_slope.
Timestamp stats: ts_err_p50/p99/p999/max, outlier_rate, rollback_flag.
Backup: backup_elapsed_s, charge_inrush_a (peak), rtc_drift_ppm_est.
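A fixed column list keeps every run on the same schema. The sketch below uses Python's standard `csv.DictWriter` with the field names suggested above; expanding `ts_err_p50/p99/p999/max` into four `ts_err_*` columns is an assumption about how the shorthand is meant to be read.

```python
import csv
import io

FIELDS = [
    "test_id", "dut_rev", "fw_rev", "timestamp_utc", "run_id",
    "temp_c", "vin_main_v", "vin_rtc_v", "load_state",
    "ref_sel", "ref_ok", "ref_loss_count",
    "pll_lock", "mode", "relock_ms", "alarm_flags",
    "te_ns", "te_ns_max", "te_ns_slope",
    "ts_err_p50", "ts_err_p99", "ts_err_p999", "ts_err_max",
    "outlier_rate", "rollback_flag",
    "backup_elapsed_s", "charge_inrush_a", "rtc_drift_ppm_est",
]


def write_rows(rows, out):
    """Write dict rows with a fixed header; unknown keys raise ValueError."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buf = io.StringIO()
write_rows([{"test_id": "T1", "pll_lock": 1, "ts_err_p99": 42}], buf)
```

Missing fields are written as empty cells, so a T1 row and a T5 row can share one file, and a misspelled column name fails loudly instead of silently creating a new column.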

Report structure: for every test case, keep four fixed sections: Condition → Evidence (waveforms/logs) → Stats (p99/p999/TE curve) → Decision (Pass/Fail + root-cause tag).

9) Part numbers (examples) — common building blocks for reusable validation fixtures

The part numbers below are for building a validation platform or a reference “control group”. Final selection must consider availability, package, and system power/cost targets; if lifecycle changes, use an equivalent-class substitute.

Module role	Example part numbers	Which validation points
Jitter-cleaner / DPLL	Silicon Labs / Skyworks Si5341 (with Si5341-D-EVB); Analog Devices AD9545 (with AD9545-PCBZ); Renesas 8A34001; Microchip ZL30733	T1 jitter baseline, T2 holdover, T4 unlock/relock behavior
1588 timestamp PHY / switch	Texas Instruments DP83640 (IEEE 1588 PTP PHY); Microchip KSZ9477 (IEEE 1588v2-capable switch)	T1 timestamp distribution, T4 spike/rollback isolation (PHY TS vs TSU path)
RTC (temp-comp / low power)	Analog Devices / Maxim DS3231; NXP PCF2129	T5 backup duration and restore consistency; thermal drift comparison
Supercap backup management	Analog Devices LTC3350 (supercap charger + backup control); Texas Instruments TPS61094 (ultra-low IQ approach with supercap management)	T5 backup path: duration, charge policy, power-off/power-on behavior
OR-ing / ideal diode	Analog Devices LTC4412 (PowerPath/ideal-diode controller)	T5 backup-domain isolation and backfeed prevention
Inrush limiting (eFuse)	Texas Instruments TPS2595 (adjustable current limit + adjustable soft-start)	T4/T5: cold-start charge inrush, rail droop, repeatable protection behavior
Supercapacitor (device)	Murata DMF series (5.5V EDLC); Panasonic EEC-F series (5.5V Gold Cap family, e.g., EEC-F5R5U / EEC-F5R5H families)	T5: impact of leakage/ESR on backup duration and inrush

Materials ↔ root-cause tags: run the same tests with a “high-performance DPLL + clean oscillator” and with a “cost-optimized stack” to quickly build a sensitivity curve for reference quality / loop bandwidth / backup path. Field issues then become much easier to reproduce and attribute.

10) Production & field rollout: make T1/T2/T5 the minimal closed loop

If cost/time must be compressed, keep three “minimal loop” cases: T1 (baseline) + T2 (holdover) + T5 (backup). These three cover most “timing feels like black magic” field failures while staying within this page’s hardware boundary.

T1: defines what “healthy” looks like—without it you cannot judge degradation.
T2: reference loss is the most common fault injection; TE(t) is the most explanatory evidence.
T5: power-off/cold-start is the highest-risk scenario for time jumps and rollback, so it must always be verified explicitly.


FAQs: Edge Timing & Sync (Hardware Time Subsystem)

These FAQs stay strictly inside the device’s timing hardware boundary: reference input quality, ref mux switching, PLL/jitter-cleaner behavior, timestamp tap placement (PHY/MAC/TSU), ToD monotonicity, and RTC+supercap backup.

p99/p999 metrics · PLL lock vs usable clocks · PHY vs MAC/TSU timestamps · holdover & backup · validation tests

Should timestamp requirements use “average error” or p99/p999? Why?

Use p99/p999 when rare spikes can break control, multi-sensor alignment, or event reconstruction. Averages hide tail events caused by ref switching, PLL relock settling, clock-domain crossings, or timestamp tap uncertainty. Keep the average as a sanity check, but accept/reject with a fixed statistics window and tail metrics (plus an outlier rate).

Acceptance template: p99 ≤ X, p999 ≤ Y, outlier_rate ≤ N/hour.
Always log the window length and exclude warm-up (e.g., first 2 minutes after lock).

PLL shows “LOCKED” but timestamps still spike—what two evidence classes should be checked first?

A lock indicator only means the loop is closed, not that the output is “production-clean.” First, check PLL/clock-tree events (holdover entry, ref switch, relock time, settle gating) and correlate them with spike timestamps. Second, check timestamp consistency for the same event across tap points (PHY vs TSU vs software-readout) to separate “clock quality” from “tap/path issues.”

Example jitter-cleaner/DPLL parts used in endpoints/gateways: SiLabs Si5341, ADI AD9545, Microchip ZL30733, Renesas 8A34001.
Fast triage: spikes aligned with relock/switch events → PLL/switching/settling; spikes without events → tap/path/measurement chain.

In the field, what is the most visible difference between PHY timestamping vs MAC/TSU timestamping?

The most visible difference is the tail behavior: PHY timestamping, taken closer to the wire, is less sensitive to internal timing uncertainty and usually produces a tighter p99/p999 distribution. MAC/TSU timestamping is easier to integrate but can inherit extra variation from internal latency drift and clock-domain boundaries.

Symptom pattern: similar averages, but MAC/TSU has more outliers during bursts, switching, or thermal drift.
Example IEEE-1588-capable devices often used for comparison: TI DP83640 (PTP PHY), Microchip KSZ9477 (PTP-aware switch).

When switching from external 1PPS to local XO, what is the most common root cause of a “time step” jump?

The most common cause is phase discontinuity at the switch boundary: the local oscillator phase is not aligned to the outgoing reference, and the PLL allows a step before the system applies a controlled slew policy. A second common cause is releasing “LOCKED” too early—output is not fully settled, so ToD mapping amplifies transient phase error into a visible time step.

Mitigation: phase-/frequency-continuous switching where possible, plus settle-gate before enabling timestamps.
Governance: enforce no rollback and a jump limit (step vs slew).

Should PLL loop bandwidth be larger or smaller? What field symptoms indicate the wrong choice?

A wider bandwidth tracks the reference faster but can import reference noise; a narrower bandwidth cleans noise better but reacts slowly. If bandwidth is too wide, timestamp tails worsen even when lock is stable (reference noise leaks through). If bandwidth is too narrow, relock takes longer and switching causes prolonged error windows or slow recovery.

Validate with T1/T4: p999 and spike rate during switching/relock are the first indicators.
Rule of thumb: choose bandwidth together with “settle time budget” and “switch frequency” constraints.

How to choose XO/TCXO/OCXO by back-calculating from a target holdover time (minutes)?

Start with an acceptance statement: “within T minutes after reference loss, time error stays within E.” Convert this into an allowable frequency error budget, then split it into temperature drift, aging, and short-term stability. XO fits short/low-risk holdover; TCXO is typical for edge devices; OCXO is used when tight holdover is needed but power/volume and warm-up are acceptable.

Verification is mandatory: prove the budget using holdover TE(t) tests (T2) and temperature sweep (T3).
Clock conditioning examples: SiLabs Si5341 or ADI AD9545 can combine holdover + switching policy control.
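The back-calculation can be written out as simple arithmetic. The sketch below assumes the simplest model, a constant fractional frequency offset y during holdover, so TE(t) = y · t and the allowed offset is E / T. Real oscillators also drift, which makes TE grow faster than linearly, so treat the result as an upper bound. The function names and the 60/10/30 split are illustrative assumptions.

```python
def allowed_fractional_offset(te_limit_s, holdover_s):
    """Largest constant fractional frequency offset that keeps |TE| <= E at time T.

    Assumes TE grows linearly: TE(t) = y * t, so y_max = E / T.
    """
    if holdover_s <= 0:
        raise ValueError("holdover_s must be positive")
    return te_limit_s / holdover_s


def split_budget(total, shares):
    """Split the total budget by hypothetical shares (must sum to 1.0).

    A linear split treats the terms as adding worst-case, which is conservative.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return {name: total * share for name, share in shares.items()}


# Example: stay within 1 us for 10 minutes of holdover.
y_max = allowed_fractional_offset(1e-6, 10 * 60)  # about 1.67e-9, i.e. 1.67 ppb
budget = split_budget(y_max, {"temperature": 0.6, "aging": 0.1, "short_term": 0.3})
```

Comparing each share with the oscillator datasheet figures for the deployment's temperature range gives a first-pass XO/TCXO/OCXO gate, which T2 and T3 then confirm or reject.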

In a holdover budget, which term is most often underestimated: temp drift, aging, or thermal sweep behavior?

The most underestimated term is usually real thermal behavior (ramp + non-steady-state), not the “25°C ppm” value. Many designs validate only at room temperature and ignore the drift during temperature transitions and settling, which directly stretches TE(t) and increases tail events. Aging matters over long time scales, but thermal transitions dominate many edge deployments.

Action: include temperature ramp rates and dwell time in the test plan (T3), not just a single-point measurement.
Evidence: TE(t) slope changes that correlate with temperature slope are the fastest signal.

Why does RTC + supercap backup often last “much shorter in reality” than theory suggests?

Theory assumes ideal capacitance and a clean backup load. Reality is dominated by supercap leakage, effective voltage window, ESR-related droop, and hidden loads or backfeed paths in the RTC domain. If the backup rail is not isolated, unexpected current drains can dwarf the RTC’s budget and collapse the supercap early.

RTC examples: ADI/Maxim DS3231, NXP PCF2129 (calibration and low backup current options vary by design).
Measure backup domain current and rail droop curve—do not estimate from capacitance alone.
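The gap between theory and reality shows up even in a first-order estimate. The sketch below uses a constant-current model, t = C · (V_start − V_min) / I_total, and ignores ESR droop and the voltage dependence of leakage. All component values are made-up placeholders, not datasheet figures.

```python
def backup_seconds(c_farads, v_start, v_min, load_currents_a):
    """Constant-current backup estimate over the effective voltage window.

    v_start: supercap voltage when main power is lost
    v_min: lowest voltage at which the RTC still keeps time
    load_currents_a: every current drawn from the backup rail, in amps
    """
    i_total = sum(load_currents_a)
    if i_total <= 0 or v_start <= v_min:
        raise ValueError("need positive load current and v_start > v_min")
    return c_farads * (v_start - v_min) / i_total


# "Theory": 0.47 F, 3.3 V down to 1.8 V, RTC current only (2 uA).
ideal_s = backup_seconds(0.47, 3.3, 1.8, [2e-6])
# "Reality": add 3 uA of leakage and a 50 uA hidden load on the backup rail.
realistic_s = backup_seconds(0.47, 3.3, 1.8, [2e-6, 3e-6, 50e-6])
```

With these placeholder numbers the estimate falls from roughly 98 hours to about 3.6 hours, which is why measuring the backup-domain current matters more than the nominal capacitance.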

Supercap charging causes supply droop—should current limit be fixed first, or power sequencing?

Fix inrush control first because droop is usually driven by peak charge current. Power sequencing is often the second step to prevent sensitive rails from seeing the transient. A robust approach uses a controlled charge path plus isolation so the backup domain cannot pull down the main rail during cold start or brownouts.

Examples: eFuse/soft-start TPS2595; ideal-diode/PowerPath LTC4412; supercap manager LTC3350.
Acceptance: no repeated PLL unlocks during charge; no timestamp rollback after recovery.

How to quickly tell if the issue is “reference input quality” or “PLL output quality”?

Use a two-step split: verify the reference first, then verify what the PLL does with it. If 1PPS/10 MHz shows missing pulses, unstable amplitude, or phase steps, the input is suspect. If the reference is stable but output jitter proxies or spike bursts still appear, the PLL switching/settling policy or clock-tree distribution is the more likely cause.

Reference evidence: PPS phase stability on scope/counter.
PLL evidence: event timeline (switch/relock/settle) aligned to timestamp spikes.

After power restoration, how to ensure time does not go backward and logs do not reorder?

Enforce monotonic time as a hard rule: never allow rollback. After power-up, RTC provides a seed, but the system must apply a governance policy (step vs slew) with a jump limit and explicit “time-adjust” markers. For logging, keep an always-increasing sequence counter so ordering is preserved even when ToD is corrected within allowed bounds.

Backup chain examples used in validation: RTC DS3231/PCF2129 + supercap control LTC3350 + isolation LTC4412.
Pass criteria: rollback_count = 0; time adjustments are annotated and bounded.
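The sequence-counter idea can be sketched as a tiny log wrapper. This is an illustrative structure only: the class name, record layout, and `time_adjust` flag are assumptions, and a real implementation would persist the counter across resets.

```python
class EventLog:
    """Log records ordered by a strictly increasing sequence number.

    ToD is stored for reference, but ordering never depends on it, so a
    bounded time correction cannot reorder records.
    """

    def __init__(self):
        self.seq = 0
        self.last_tod_ns = None
        self.rollback_count = 0
        self.records = []

    def append(self, tod_ns, msg, time_adjust=False):
        self.seq += 1
        # Count any backward ToD step so it can be checked against the pass criteria.
        if self.last_tod_ns is not None and tod_ns < self.last_tod_ns:
            self.rollback_count += 1
        self.last_tod_ns = tod_ns
        self.records.append((self.seq, tod_ns, time_adjust, msg))


log = EventLog()
log.append(1_000, "power restored, RTC seed applied", time_adjust=True)
log.append(2_000, "PLL locked")
log.append(2_500, "forward ToD correction applied", time_adjust=True)
```

A run passes when `log.rollback_count == 0` and every record that changes ToD carries the `time_adjust` marker.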

Minimum validation loop: which three tests catch most real-world timing pitfalls?

A strong minimum loop is T1 + T2 + T5. T1 establishes steady-state tail metrics (p99/p999 and outlier rate). T2 proves holdover drift TE(t) under reference loss and validates recovery behavior. T5 validates RTC+supercap backup and ensures charging transients do not cause supply droop, repeated PLL unlocks, or time rollback during cold start.

T1: distribution + spikes; T2: TE(t) drift; T5: backup time + inrush + monotonicity.
Require a uniform record template and fixed pass/fail criteria across builds.

Figure E12 — FAQ evidence ladder: input → PLL → timestamp → ToD/backup

