White-Rabbit-Style Timing for Sub-ns Ethernet Synchronization

White-Rabbit-style timing achieves sub-nanosecond alignment by combining two-way delay measurement, asymmetry/fixed-delay calibration, and distributed frequency lock.

The practical outcome is a measurable, maintainable time base: calibrate what two-way cannot cancel, lock frequency before phase, and verify with per-hop budgets, monitoring, and field-service evidence.

H2-1 · Definition & Scope (WR-style, not “generic PTP/SyncE”)

Intent

Establish a clear and testable definition of “White-Rabbit-style timing,” explain why it can reach sub-nanosecond synchronization, and lock the page boundary so that PTP/SyncE/TSN details do not leak into this topic.

30-second definition (engineering view)
  • White-Rabbit-style timing is an end-to-end control system that combines two-way delay measurement, link asymmetry calibration, and distributed frequency locking to keep time/phase aligned across a network link.
  • Sub-ns performance is achieved by closing the loop on what cannot be cancelled by simple round-trip math: fixed device delays, asymmetry drift, and frequency wander.
  • The deliverable of this page is a measurable path from architecture → calibration → verification, rather than protocol message-field walkthroughs.
The 3 closed loops (inputs → controls → outputs → failure modes → pass criteria)
Loop A · Two-way delay measurement
  • Inputs: hardware timestamp events; stable measurement window (X ms).
  • Controls: timestamp tap-point consistency; filtering/averaging policy; queue-noise isolation.
  • Outputs: RTT; one-way delay estimate; delay jitter distribution.
  • Failure modes: “looks stable but biased”; step changes after relock; queue-induced noise.
  • Pass criteria: delay jitter ≤ X (p95) over Y minutes (placeholder).
Loop B · Asymmetry calibration
  • Inputs: measured delay; fixed device delay components; temperature/profiles.
  • Controls: calibration table; module/port binding; compensation triggers (temp ΔT, relock).
  • Outputs: asymmetry estimate; corrected one-way delay; residual bias.
  • Failure modes: module swap invalidates table; temperature drift unmodeled; path mismatch.
  • Pass criteria: offset bias ≤ X over Y minutes across Z °C swing (placeholder).
Loop C · Distributed frequency lock (syntonization + holdover)
  • Inputs: reference frequency; phase/frequency error measurements; lock-state signals.
  • Controls: loop bandwidth; filter design; holdover policy on link loss.
  • Outputs: frequency alignment; reduced wander; controlled relock behavior.
  • Failure modes: bandwidth too wide passes measurement noise into the disciplined clock; too narrow slows response; fast drift in holdover.
  • Pass criteria: wander ≤ X and holdover drift ≤ Y over Z seconds (placeholder).
What this page will answer (and where)
How accurate can it get? → H2-2 (metrics) + later verification chapter (field acceptance).
What hardware is required? → hardware building blocks chapter (clock/timestamp/phase path).
How is asymmetry calibrated? → calibration chapter (tables, binding, temperature compensation).
How is it verified and monitored? → verification/monitoring chapter (KPIs, logs, failure isolation).
How should engineering decisions be made? → engineering checklist chapter (design → bring-up → production).
Stop line (not covered here)

Protocol message formats, standard clauses, and TSN scheduling tables are intentionally excluded here. This page focuses on the closed-loop timing architecture, calibration, and measurable verification.

Diagram · Scope map (what WR-style owns vs link-outs)
[Diagram placeholder: White-Rabbit-style timing (sub-ns via closed-loop control: two-way delay measurement, asymmetry calibration, distributed frequency lock) sits at the center. Five neighboring topics are linked out: PTP (packet timestamping, message mechanics), SyncE (frequency distribution, jitter templates), TSN (deterministic forwarding, time windows and shaping), PoE / PoDL (power path, thermal and limits), and Protection (ESD / surge, grounding / return). WR-style owns closed-loop timing (delay + asymmetry + frequency lock); standards and schedules are linked out.]
The scope map prevents cross-page overlap: WR-style timing is treated as a closed-loop control system; PTP/SyncE/TSN/PoE/Protection are referenced only as dependencies.

H2-2 · Sub-ns Requirements & Success Criteria

Intent

Convert “sub-nanosecond” from a slogan into measurable KPIs, consistent definitions, and acceptance criteria that can be applied across lab, rack, and outdoor deployments.

Engineering definitions (no theory, only what can be verified)
  • Accuracy: long-window mean time offset versus the reference.
  • Precision: short-window spread (jitter) of time offset under steady conditions.
  • Stability: how time offset changes with time and environment (wander, temperature drift, power events).
Mapping to the 3 loops: two-way delay primarily improves precision; asymmetry calibration sets accuracy; distributed frequency lock and holdover sustain stability.
KPI stack (each KPI = what it means → how to observe → common traps → pass criteria)
KPI 1 · Time offset
  • Meaning: instantaneous timing difference between a node and the reference.
  • Observe: offset time-series from hardware timestamp events (consistent window definition).
  • Trap: mismatched window lengths, sampling cadences, or normalization produce “better” plots that are not comparable.
  • Pass criteria: |offset| ≤ X (p95) over Y minutes (placeholder).
KPI 2 · Time wander (low-frequency drift)
  • Meaning: slow changes in offset due to temperature, frequency error, or control-loop design.
  • Observe: offset slope over long windows; compare before/after environmental steps.
  • Trap: aggressive filtering hides wander while real systems still fail deterministic triggers.
  • Pass criteria: |drift| ≤ X ns/min under defined ΔT and power events (placeholder).
KPI 3 · Phase noise / jitter (system-level)
  • Meaning: short-term random variation that limits precision and trigger repeatability.
  • Observe: phase-error distribution or equivalent short-window statistics from the timing loop.
  • Trap: measurement settings or logging resolution creates “too clean” plots that miss real jitter.
  • Pass criteria: jitter ≤ X (p95) in the target bandwidth (placeholder).
KPI 4 · Holdover drift (during link loss)
  • Meaning: offset drift when the link is unavailable and the node must ride its local clock.
  • Observe: offset trajectory in holdover state, tagged with temperature and power conditions.
  • Trap: mixing “relock transient” samples with holdover samples breaks comparability.
  • Pass criteria: drift ≤ X over Y seconds of holdover (placeholder).
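The wander and holdover criteria above both reduce to a slope measured over a tagged window. A minimal sketch, assuming offset samples arrive as (time in seconds, offset in ns) pairs; the function name `drift_slope_ns_per_min` is hypothetical, and ordinary least squares is one reasonable estimator rather than a method this page prescribes.

```python
from typing import Sequence, Tuple


def drift_slope_ns_per_min(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of offset (ns) versus time (s), returned in ns/min."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0.0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    return (sxy / sxx) * 60.0  # convert ns/s to ns/min


# Synthetic holdover trace (illustrative values): 0.002 ns/s drift, 1 Hz sampling.
trace = [(float(t), 0.5 + 0.002 * t) for t in range(600)]
slope = drift_slope_ns_per_min(trace)
```

Feeding only holdover-tagged samples into this function keeps relock transients out of the slope, which is the comparability trap named under KPI 4.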
Scenario layers (dominant error sources → first KPI to check)
Layer 1 · Bench / Lab
Dominant sources: reference quality, timestamp tap-point consistency, window/filter configuration.
First KPI: precision (short-window jitter) + delay jitter distribution.
Layer 2 · Rack / Chassis
Dominant sources: thermal gradients, airflow changes, power noise coupling, EMI-induced relock events.
First KPI: stability (wander slope) and relock step behavior.
Layer 3 · Outdoor / Long fiber
Dominant sources: medium delay temperature coefficient, asymmetry drift, maintenance and module replacement.
First KPI: accuracy (bias) + temperature-correlated drift (offset vs ΔT).
Metric definition guardrails (to avoid false “improvements”)
  • Window lock: always compare KPIs under the same window length and sampling cadence.
  • Percentile lock: use p95/p99 (not only mean) to protect deterministic triggering.
  • State tagging: separate steady-state, relock transient, and holdover samples.
  • Environment tagging: log temperature/power events alongside offset to validate compensation.
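The guardrails can be sketched as a small grouping step: split samples by state tag first, then take the percentile inside each group. This is an illustration only; the tag strings, the nearest-rank percentile definition, and the function names are assumptions, not part of any WR tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("empty sample set")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def p95_abs_offset_by_state(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group |offset| by state tag, then compute p95 per state."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for state, offset_ns in samples:
        groups[state].append(abs(offset_ns))
    return {state: percentile(vals, 95) for state, vals in groups.items()}


# Illustrative data: 20 steady samples (0.1 .. 2.0 ns) plus two relock transients.
samples = [("steady", 0.1 * i) for i in range(1, 21)]
samples += [("relock", 5.0), ("relock", -7.0)]
result = p95_abs_offset_by_state(samples)
```

Mixing the two relock samples into the steady group would move the steady p95 from about 1.9 ns to 5 ns, which is the kind of false regression (or, in reverse, false improvement) that state tagging prevents.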
Diagram · Requirements ladder (use-case → KPIs → pass criteria)
[Diagram placeholder: three-tier ladder. Use-case target: sub-ns sync, deterministic trigger alignment. Measurable KPIs (stack): time offset, time wander, phase jitter, holdover drift. Acceptance / pass criteria (placeholders): Offset ≤ X, Wander ≤ Y, Jitter ≤ Z, T ≤ …. Keep KPI windows, percentiles, and state tags consistent to avoid false improvements.]
This ladder forces measurable definitions: a sub-ns goal becomes a KPI stack and explicit acceptance criteria with consistent windows, percentiles, and state tagging.

H2-3 · Timing Model: Delay Decomposition & Asymmetry Sources

Intent

Explain why timing accuracy degrades when the link gets longer, temperature changes, or optics/PHY modules are swapped. Build a practical delay budget that separates cancelable terms from terms that require calibration and compensation.

Why “longer / hotter / module swap” breaks sub-ns timing
  • Longer: medium delay and asymmetry drift increase, and small non-determinism becomes visible.
  • Hotter/colder: temperature coefficients change propagation and group delay; compensation becomes mandatory.
  • Module swap: fixed delays and group delay shift, invalidating previously learned calibration tables.
Delay decomposition ledger (what it is → how it shows up → cancelability → control lever)
1) TX fixed delay
  • What: deterministic latency from local clock domain to the TX timestamp tap.
  • Shows up as: stable bias that steps after hardware/firmware changes.
  • Two-way: not fully cancelable if tap differs between directions.
  • Control: fixed-delay calibration + strict tap definition + version/port binding.
2) RX fixed delay
  • What: deterministic latency from RX tap to the local timebase/servo measurement point.
  • Shows up as: stable bias; often appears as a constant offset error.
  • Two-way: not cancelable when TX/RX paths are not symmetric.
  • Control: per-port calibration + thermal characterization where needed.
3) Timestamp tap offset
  • What: physical location where timestamps are taken (MAC/PHY/SerDes boundary).
  • Shows up as: “stable but wrong” measurements; relock creates repeatable steps.
  • Two-way: cannot be cancelled if taps are inconsistent.
  • Control: lock tap definition + verify tap-to-wire determinism in bring-up.
4) Medium propagation delay
  • What: delay through fiber/copper, strongly tied to length and temperature.
  • Shows up as: slow drift correlated with ambient changes on long links.
  • Two-way: mostly cancelable if forward/reverse paths match.
  • Control: asymmetry modeling + temperature tagging + periodic recalibration triggers.
5) Partner group delay
  • What: TX/RX group delay of optics/PHY partner path (device-specific).
  • Shows up as: offset steps after module replacement; temperature-dependent drift.
  • Two-way: not cancelable when group delays differ by direction.
  • Control: module/serial binding + calibration table invalidation rules.
6) Switch/bridge forwarding delay
  • What: fixed + variable latency inside forwarding elements.
  • Shows up as: jitter bursts under traffic load; multi-hop drift accumulation.
  • Two-way: not cancelable when queueing dominates.
  • Control: deterministic datapath + isolation of timing traffic (architecture constraint).
7) Queueing / contention noise
  • What: load-dependent delay variation, often direction-dependent.
  • Shows up as: widened offset distribution even when mean looks acceptable.
  • Two-way: cannot be fixed by calibration if non-deterministic.
  • Control: isolate timing traffic + limit burstiness + enforce deterministic forwarding rules.
8) Clock domain crossing effects
  • What: synchronization and latency uncertainty between clock domains.
  • Shows up as: repeatable “two-level” offset modes after reset or relock.
  • Two-way: not cancelable when CDC latency is not fixed.
  • Control: fixed-latency CDC design + measurement resolution matched to target.
Asymmetry sources (physical vs logical) and what can actually control them
Physical asymmetry
  • Forward/reverse medium differences (connectors, splices, pair imbalance).
  • Optics/PHY group delay mismatch by direction.
  • Temperature coefficient mismatch (drift slope differs by direction).
Primary control: calibration tables + temperature-tagged compensation.
Logical asymmetry
  • Direction-dependent queueing or shaping in forwarding elements.
  • Timestamp tap-point not identical across TX/RX or across devices.
  • Different CDC/synchronization paths per direction.
Primary control: deterministic datapath + strict tap definition + traffic isolation.
What two-way can cancel vs what must be calibrated (decision rules)
Cancelable (by symmetry)
  • Common-mode medium changes when forward/reverse paths match.
  • Stable fixed delays only if tap definitions are identical and deterministic.
  • Random noise, and only through averaging when its distribution is symmetric and bounded; the two-way arithmetic itself does not remove it.
Must be calibrated / constrained
  • Tap-point mismatch (cannot be removed by any two-way math).
  • Asymmetry drift with temperature or module replacement.
  • Queueing/forwarding non-determinism (calibration cannot chase randomness).
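The split between cancelable and calibrated terms can be checked numerically. The sketch below is a toy model with made-up delays in integer picoseconds: it builds four timestamps from a known true offset, applies the symmetric two-way formula, and shows that a matched path delay drops out while a forward/reverse mismatch leaves a bias of half the asymmetry. All names and values are illustrative assumptions.

```python
from typing import Tuple


def two_way_offset_ps(t1: int, t2: int, t3: int, t4: int) -> int:
    """Offset estimate that assumes symmetric paths: ((t2 - t1) - (t4 - t3)) / 2."""
    return ((t2 - t1) - (t4 - t3)) // 2


def simulate(true_offset_ps: int, fwd_delay_ps: int, rev_delay_ps: int) -> Tuple[int, int, int, int]:
    """Timestamps for a remote clock that leads the reference by true_offset_ps."""
    t1 = 1_000_000                              # reference clock: send
    t2 = t1 + fwd_delay_ps + true_offset_ps     # remote clock: receive
    t3 = t2 + 50_000                            # remote clock: send after turnaround
    t4 = t3 - true_offset_ps + rev_delay_ps     # reference clock: receive
    return t1, t2, t3, t4


TRUE_OFFSET_PS = 1_234

# Matched paths: the 5 us medium delay cancels completely.
symmetric_error = two_way_offset_ps(*simulate(TRUE_OFFSET_PS, 5_000_000, 5_000_000)) - TRUE_OFFSET_PS

# 400 ps of forward/reverse mismatch: a 200 ps bias survives any amount of averaging.
asymmetric_error = two_way_offset_ps(*simulate(TRUE_OFFSET_PS, 5_000_400, 5_000_000)) - TRUE_OFFSET_PS
```

The residual in the second case is deterministic, so filtering cannot reduce it; it has to be measured once and subtracted, which is what the calibration tables are for.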
First triage order (fastest to prove/disprove)
  1. Verify timestamp tap-point definition and determinism (TX/RX consistent).
  2. Confirm fixed delay tables are bound to port + module/serial and versioned.
  3. Check whether queueing noise dominates (traffic load correlates with offset spread).
  4. Validate temperature tagging and compensation slope (offset vs ΔT).
Pass criteria (placeholder): under defined load and ΔT, offset distribution remains within X (p95) and drift slope within Y ns/min.
Diagram · Delay budget block (cancel vs calibrate vs drift risk)
[Diagram placeholder: forward and reverse paths drawn as matching block chains (TX fixed → tap offset → medium → switch queue → RX fixed + CDC), with a legend of four classes: cancel, calibrate, drift risk, non-deterministic. The two paths share blocks; any mismatch creates asymmetry. Asymmetry appears when the tap differs, group delay differs, temperature slope differs, or queue noise differs by direction. Two-way cancels only symmetric terms; calibration is required for stable bias; randomness must be removed by architecture.]
The delay budget separates terms that two-way can cancel (symmetry), terms that require calibration (stable bias), and terms that cannot be corrected by calibration (non-deterministic queueing).

H2-4 · Reference Architecture & Roles (Node / Switch / Grandmaster-like Source)

Intent

Assemble a practical end-to-end system: define the roles, separate the clock path from the packet/time-correction path, and set clear hardware boundaries for timestamps, phase measurement, and servo loops.

Roles and responsibilities (architecture view)
Time source (grandmaster-like)
  • Provides the reference timebase and a stable frequency anchor.
  • Exposes lock/holdover state for system-level acceptance decisions.
  • Defines the reference epoch used by endpoints (architecture-level agreement).
WR-capable switch / bridge
  • Maintains a deterministic datapath for timing traffic (controls queue noise).
  • Supports hardware timestamp and/or phase measurement primitives where required.
  • Preserves a consistent timing model across multi-hop deployments.
End node (synchronized endpoint)
  • Runs the frequency and phase/time servo loops against the reference.
  • Applies fixed-delay and asymmetry calibration tables (with temperature tagging).
  • Exports measurable KPIs (offset, wander, drift, lock state) for monitoring and field service.
Two parallel paths (clock distribution vs time/phase correction)
Clock path (thick line)
  • Goal: distribute and lock frequency (syntonization) to suppress wander.
  • Sensitive to: jitter injection, PLL bandwidth choices, and holdover behavior.
  • Observable via: lock state, frequency error, holdover drift metrics.
Packet/time path (thin line)
  • Goal: measure two-way delay and apply calibrated asymmetry compensation.
  • Sensitive to: timestamp tap definition, queue noise, and multi-hop forwarding determinism.
  • Observable via: delay jitter distribution and offset percentile stability.
Hardware boundaries (must be hardware-backed to reach sub-ns)
Boundary 1 · Hardware timestamp (fixed tap)
Timestamp events must be generated at a deterministic physical tap point with consistent TX/RX definitions.
Failure symptom: “stable but biased” offset; repeatable steps after relock.
Boundary 2 · Phase/frequency measurement
Phase/frequency error must be measured with resolution aligned to the target, feeding the servo loops.
Failure symptom: wander dominates; deterministic triggering fails despite “lock.”
Boundary 3 · Deterministic datapath (remove randomness)
Timing traffic must avoid queue-dominated latency; calibration cannot correct non-deterministic jitter.
Failure symptom: offset distribution widens under load; p95/p99 explode.
Pass criteria (placeholder): across defined load and temperature bands, the system remains locked with bounded p95 offset and bounded drift slope.
Diagram · End-to-end architecture (clock path vs packet/time path)
[Diagram placeholder: three role blocks in a row. Time source (reference time + frequency): clock engine, HW timestamp, phase measurement. WR switch / bridge (deterministic datapath): timing lane, queue control, HW timestamp. End node (servo + calibration): servo loops, cal tables, HW timestamp. A thick line marks the clock path (frequency lock); a thin line marks the packet/time path (two-way delay). Sub-ns requires fixed-tap timestamps, phase/frequency measurement, and a deterministic datapath (no queue-dominated jitter).]
The thick clock path supports distributed frequency lock and holdover; the thin bidirectional path supports two-way delay measurement and calibrated asymmetry compensation.

H2-5 · Bi-directional Delay Measurement Workflow (Two-way time transfer)

Intent

Clarify how two-way measurement uses four timestamps to estimate one-way delay and offset, and how to keep switch/queue noise from polluting sub-ns results.

Workflow overview (inputs → outputs → hard preconditions)
  • Inputs: four timestamp events (t1, t2, t3, t4) captured at deterministic tap points.
  • Outputs: round-trip delay (RTT), one-way delay estimate, and offset estimate used by the servo loops.
  • Hard preconditions: fixed tap definitions, deterministic datapath for timing traffic, and bounded measurement jitter.
Acceptance gate (placeholder): offset distribution remains within X (p95) under defined load; measurement jitter does not scale with throughput.
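The four-timestamp arithmetic can be written out explicitly. This is the standard two-way time-transfer form, given as a sketch under stated assumptions: t1 and t4 are read on the reference side, t2 and t3 on the remote side, θ is the remote-minus-reference clock offset, d_f and d_r are the forward and reverse one-way delays, and Δ is the calibrated asymmetry. The symbols θ, d_f, d_r, and Δ are notation introduced here for illustration.

```latex
\begin{aligned}
t_2 &= t_1 + d_f + \theta, \qquad t_4 = t_3 + d_r - \theta \\
\mathrm{RTT} &= (t_4 - t_1) - (t_3 - t_2) = d_f + d_r \\
\hat{d} &= \tfrac{1}{2}\,\mathrm{RTT}
  \quad \text{(one-way estimate, exact only if } d_f = d_r\text{)} \\
\hat{\theta} &= \tfrac{1}{2}\bigl[(t_2 - t_1) - (t_4 - t_3)\bigr] - \tfrac{1}{2}\,\Delta,
  \qquad \Delta = d_f - d_r
\end{aligned}
```

With Δ set to zero the last line is the plain symmetric estimate; any uncalibrated asymmetry then appears in the offset at half its size, and any tap mismatch enters through d_f and d_r in exactly the same way.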
Timestamp tap consistency (non-negotiable quality gate)
Rule 1 · Fixed tap definition
Timestamps must be taken at a deterministic physical boundary (MAC/PHY/SerDes boundary) with identical TX/RX tap semantics.
Rule 2 · Tap repeatability
After reset, relock, or retrain, the tap-to-wire latency must not jump between discrete “modes.”
Rule 3 · Tap-path alignment
Timestamp generation must follow the same datapath characteristics as the payload path; avoid “fast timestamp / slow payload” splits.
Fast validation (placeholder): under controlled traffic, offset width stays within X and remains insensitive to throughput steps.
Handling switch/queue noise (isolation + windowing + filtering)
A · Isolation
  • Keep timing measurement traffic on a deterministic lane (avoid burst-driven queues).
  • Prevent timing packets from competing with telemetry bursts or bulk transfers.
  • Confirm that load changes do not widen the delay/offset distribution.
B · Windowing
  • Measure during known steady phases; avoid burst boundaries.
  • Use a defined sampling cadence and enforce a minimum sample count per window.
  • Reject windows where delay jitter exceeds a threshold.
C · Filtering
  • Filtering reduces random noise only; it does not remove systematic bias.
  • Track both mean and percentile spread (p50/p95) to detect hidden jitter.
  • Separate short-term jitter from long-term drift (temperature/aging).
Acceptance gate (placeholder): delay jitter p95 ≤ X and remains stable across throughput steps; offset p95 ≤ Y in steady windows.
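The windowing rules (minimum sample count, reject on excessive delay jitter) can be sketched as a simple gate applied before a window is allowed to feed the servo. Peak-to-peak spread is used here as the jitter measure purely for brevity; the function name, the thresholds, and the choice of peak-to-peak over a percentile are assumptions.

```python
from typing import List, Tuple


def accept_window(delays_ps: List[int], min_samples: int, max_spread_ps: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for one measurement window of one-way delay estimates."""
    if len(delays_ps) < min_samples:
        return False, "too few samples"
    spread = max(delays_ps) - min(delays_ps)  # peak-to-peak delay jitter in the window
    if spread > max_spread_ps:
        return False, "delay jitter above threshold"
    return True, "ok"


# Illustrative windows: a quiet one, a short one, and one hit by a queueing burst.
quiet = accept_window([100, 102, 101, 103], min_samples=4, max_spread_ps=5)
short = accept_window([100, 102], min_samples=4, max_spread_ps=5)
bursty = accept_window([100, 140, 101, 102], min_samples=4, max_spread_ps=5)
```

Logging the rejection reason alongside the traffic state makes it possible to tell a sampling-cadence problem from a queue-isolation problem later.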
Common failure modes (symptom → first check → corrective direction)
Mode 1 · Stable-but-wrong (looks steady, absolute value is off)
  • First check: tap definition mismatch or fixed delay table not bound to the correct port/module/version.
  • Corrective direction: re-validate tap-to-wire determinism, then re-establish fixed-delay calibration and binding rules.
  • Pass criteria: offset mean and p95 both fall within X after applying the correct table.
Mode 2 · Periodic wander (offset drifts in cycles)
  • First check: measurement windows coupling to traffic cycles or servo/PLL interaction causing oscillation.
  • Corrective direction: isolate timing traffic, adjust window selection, then revisit servo bandwidth / filtering policy.
  • Pass criteria: wander slope and cycle amplitude remain below X across ΔT and load bands.
Diagram · Two-way timeline (t1 / t2 / t3 / t4 + computation blocks)
[Diagram placeholder: two time axes. Lane A sends at t1 and receives at t4; Lane B receives at t2 and sends at t3. The forward arrow runs t1 → t2 and the reverse arrow runs t3 → t4. Computation blocks: RTT from (t4 – t1) and (t3 – t2); offset from the one-way estimate plus calibration. Watch for tap mismatch and queue noise. Four timestamps provide the skeleton; determinism and calibration determine absolute correctness.]
The timeline shows the four timestamp events and the two-way exchange; correctness depends on tap consistency and isolation from queue-dominated jitter.

H2-6 · Calibration: Fixed Delays, Link Asymmetry & Temperature Compensation

Intent

Make sub-ns possible by defining what must be calibrated, how calibration is executed (factory / field / in-service), and how calibration tables remain valid across temperature and module swaps.

Calibration targets (table-item view: parameter → binding scope → drift driver → update policy)
1) Fixed delays (TX/RX)
  • Binding: port + device version (+ module if applicable).
  • Drift driver: firmware changes, ref-clock changes, module swap.
  • Update policy: factory baseline + revalidate on version or module change.
2) Link asymmetry term
  • Binding: link (fiber/cable path) + endpoint pair.
  • Drift driver: temperature slope mismatch, path changes, connector aging.
  • Update policy: field calibration + periodic/triggered refresh on drift indicators.
3) Group delay delta (module/PHY)
  • Binding: module/serial (or PHY instance) + port.
  • Drift driver: module replacement, temperature, vendor lot variation.
  • Update policy: invalidate and rebuild on ID change unless equivalence is proven.
Acceptance gate (placeholder): after applying the correct table bindings, offset p95 ≤ X and drift slope ≤ Y within the qualified temperature band.
Calibration stages (factory → field → in-service)
Stage A · Factory calibration
  • Goal: establish baseline fixed delays and stable bias terms.
  • Output: baseline table + binding fields (port/module/version) + checksum.
  • Gate: bias within X under controlled conditions and repeated power cycles.
Stage B · Field calibration
  • Goal: capture installation-induced link asymmetry and site-specific deltas.
  • Output: site delta + environment tags (path ID, length class, temperature band).
  • Gate: stable percentiles across load windows; drift slope below X in qualified band.
Stage C · In-service recalibration
  • Goal: maintain correctness through temperature swings, aging, and maintenance swaps.
  • Triggers: ΔT > X, drift slope > Y, ID/version change, distribution widens (p95 jump).
  • Gate: after update, offset p95 returns to ≤ X and remains stable across windows.
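The Stage C triggers can be sketched as a pure function that compares current health against the values recorded at the last calibration. Everything named here (`Health`, `Limits`, `recal_triggers`, the single-string binding key, and the numeric limits standing in for the X and Y placeholders) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Health:
    temp_c: float
    drift_ns_per_min: float
    offset_p95_ns: float
    binding_id: str  # hypothetical key: port + module serial + version joined together


@dataclass(frozen=True)
class Limits:
    max_delta_t_c: float         # stands in for X in "ΔT > X"
    max_drift_ns_per_min: float  # stands in for Y in "drift slope > Y"
    max_p95_ratio: float         # growth factor that counts as a "p95 jump"


def recal_triggers(baseline: Health, now: Health, limits: Limits) -> List[str]:
    """List every trigger that fires; an empty list means the table stays in service."""
    reasons: List[str] = []
    if abs(now.temp_c - baseline.temp_c) > limits.max_delta_t_c:
        reasons.append("delta_t")
    if abs(now.drift_ns_per_min) > limits.max_drift_ns_per_min:
        reasons.append("drift_slope")
    if now.binding_id != baseline.binding_id:
        reasons.append("id_change")
    if now.offset_p95_ns > baseline.offset_p95_ns * limits.max_p95_ratio:
        reasons.append("p95_jump")
    return reasons


limits = Limits(max_delta_t_c=10.0, max_drift_ns_per_min=0.05, max_p95_ratio=1.5)
baseline = Health(25.0, 0.01, 0.30, "p1/SN001/fw1")
warm = recal_triggers(baseline, Health(38.0, 0.02, 0.35, "p1/SN001/fw1"), limits)
swapped = recal_triggers(baseline, Health(26.0, 0.02, 0.60, "p1/SN002/fw1"), limits)
```

Returning all fired reasons rather than the first one keeps the audit trail useful when a module swap and a p95 jump arrive together.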
Temperature compensation (model + fit + online correction)
Model (engineering form)
  • Use a low-order model for delay(T) and asym(T) (linear or piecewise-linear).
  • Keep the model tied to binding scope: port/module/link as required.
  • Track residual error, not just fitted slope.
Fit (data discipline)
  • Collect samples across the qualified temperature band (not room-only).
  • Record module ID, port ID, version, and traffic window state per sample.
  • Reject samples captured during queue-dominated jitter windows.
Online correction
  • Apply model-based compensation and monitor drift slope continuously.
  • Trigger recalibration when residual error exceeds thresholds.
  • Keep a rollback path (previous table) with audit fields.
Acceptance gate (placeholder): across ΔT band, compensated drift slope ≤ X and residual error p95 ≤ Y.
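The linear form of the model and the "track residual error, not just fitted slope" rule can be sketched as follows. The data are synthetic, the units (ps and °C) are chosen for illustration, and `fit_linear` and `residuals_ps` are hypothetical helper names; a piecewise-linear model would apply the same fit per temperature segment.

```python
from typing import List, Tuple


def fit_linear(temps_c: List[float], delays_ps: List[float]) -> Tuple[float, float]:
    """Least-squares fit of delay(T) = intercept + slope * T; returns (slope, intercept)."""
    n = len(temps_c)
    if n != len(delays_ps) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_t = sum(temps_c) / n
    mean_d = sum(delays_ps) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    if sxx == 0.0:
        raise ValueError("samples must span more than one temperature")
    slope = sum((t - mean_t) * (d - mean_d) for t, d in zip(temps_c, delays_ps)) / sxx
    return slope, mean_d - slope * mean_t


def residuals_ps(temps_c: List[float], delays_ps: List[float],
                 slope: float, intercept: float) -> List[float]:
    """What the model fails to explain; this is the quantity to gate on."""
    return [d - (intercept + slope * t) for t, d in zip(temps_c, delays_ps)]


# Synthetic sweep across a -10..40 °C band with a 3 ps/°C coefficient.
temps = [-10.0, 0.0, 10.0, 20.0, 30.0, 40.0]
delays = [1000.0 + 3.0 * t for t in temps]
slope, intercept = fit_linear(temps, delays)
worst_residual = max(abs(r) for r in residuals_ps(temps, delays, slope, intercept))
```

A fit collected only near room temperature trips the `sxx == 0` guard or yields a slope with little support, which is the numerical form of the "not room-only" rule.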
Calibration table lifecycle (swap / version change / invalidation rules)
Invalidate immediately
  • Module/serial ID change where group delay equivalence is not proven.
  • Tap definition change (firmware/hardware revision changes the timestamp boundary).
  • Link path change (fiber reroute, connector replacement affecting asymmetry).
Reuse with fast revalidation
  • Same module type but different serial: reuse only after passing quick offset/jitter gates.
  • Minor firmware update: reuse only if tap semantics and CDC latency remain fixed.
  • Temperature band expansion: reuse only after adding samples and refitting compensation.
Audit fields (must be stored)
  • Port ID, module ID/serial, firmware/hardware version, and table checksum.
  • Qualified temperature band and drift slope limits.
  • Last validation timestamp and the acceptance results summary.
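The lifecycle rules can be sketched as a decision over the stored binding versus what is currently installed. The mapping below is one interpretation of the rules above, not a definitive policy: a module type change stands in for "equivalence not proven", `tap_revision` is a hypothetical tag that changes whenever the timestamp boundary moves, and temperature-band expansion is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Binding:
    port_id: str
    module_type: str
    module_serial: str
    firmware: str
    tap_revision: str  # hypothetical: bumps when tap semantics or CDC latency change
    path_id: str


def table_action(stored: Binding, current: Binding) -> str:
    """Return 'invalidate', 'revalidate', or 'reuse' for a stored calibration table."""
    if stored.port_id != current.port_id:
        return "invalidate"   # table belongs to a different port
    if stored.tap_revision != current.tap_revision:
        return "invalidate"   # timestamp boundary moved
    if stored.path_id != current.path_id:
        return "invalidate"   # reroute or connector work changed the asymmetry
    if stored.module_type != current.module_type:
        return "invalidate"   # group-delay equivalence not proven
    if stored.module_serial != current.module_serial:
        return "revalidate"   # same type, new unit: quick offset/jitter gates
    if stored.firmware != current.firmware:
        return "revalidate"   # minor update with unchanged tap revision
    return "reuse"


stored = Binding("p1", "SFP-A", "SN001", "fw1.0", "tap-r1", "path-7")
same = table_action(stored, stored)
new_unit = table_action(stored, replace(stored, module_serial="SN002"))
new_fw = table_action(stored, replace(stored, firmware="fw1.1"))
moved_tap = table_action(stored, replace(stored, firmware="fw2.0", tap_revision="tap-r2"))
rerouted = table_action(stored, replace(stored, path_id="path-9"))
```

Checking the invalidating conditions first means a firmware update that also moves the tap is never downgraded to a quick revalidation.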
Diagram · Calibration pipeline (measure → extract → write table → runtime compensate → monitor & trigger)
[Diagram placeholder: pipeline of raw measurements → parameter extraction → write cal table → runtime compensation (fixed + asym + temp) → health monitor (offset percentiles, drift slope, temperature, ID/version). Four triggers feed a recalibrate path back to the start: ΔT > X, drift > Y, ID change, p95 jump. Calibration tables are measurement assets: bind to port/module/version and define invalidation rules.]
The pipeline emphasizes table binding, temperature-aware compensation, and trigger-driven recalibration to preserve sub-ns correctness over time.

H2-7 · Distributed Frequency Lock (Syntonization) & Servo Loop Design

Intent

Explain why frequency must be locked first to prevent wander from dominating sub-ns timing, and how nested servo loops balance noise, response speed, and holdover stability without control-theory derivations.

Frequency lock vs phase alignment (what each fixes)
  • Frequency lock (syntony): suppresses long-term slope (drift) so offset does not grow into wander.
  • Phase/time alignment: corrects instantaneous residual offset after the slope is controlled.
  • Engineering symptom: offset ramps linearly → treat as frequency/temperature/holdover problem first, not as a phase-only tuning issue.
Acceptance gate (placeholder): drift slope ≤ X over Y minutes before tightening phase-loop residual targets.
Servo structure (outer phase/time loop + inner frequency loop)
Inner loop · Frequency
  • Measure: frequency error estimate from timing measurements and local clock observables.
  • Filter: reject queue-dominated jitter; preserve drift information.
  • Actuate: discipline DPLL/NCO/VCXO to control long-term slope.
Outer loop · Phase/Time
  • Measure: offset/phase residual (after calibration and isolation policy).
  • Filter: trade response speed against injecting noise into the disciplined clock.
  • Actuate: apply fine phase/time correction to drive residual down.
Operational rule: stabilize the frequency slope first; then tighten outer-loop residual targets (p95) under defined measurement windows.
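The "slope first, residual second" rule can be illustrated with a toy discrete servo. This is a deliberately simplified stand-in, not a WR servo: a single PI loop in which the integrator plays the role of the slow frequency estimate (the inner-loop job) and the proportional term plays the role of the phase correction (the outer-loop job). Gains, units (ns and ns/s), the 1 s update period, and the noise-free measurement are all assumptions.

```python
from typing import List, Tuple


def run_servo(freq_error_ns_per_s: float, initial_offset_ns: float,
              kp: float, ki: float, steps: int, dt_s: float = 1.0) -> Tuple[List[float], float]:
    """Simulate offset (local minus reference) under a PI correction."""
    offset = initial_offset_ns
    freq_estimate = 0.0  # integrator state: learned frequency error, ns/s
    history: List[float] = []
    for _ in range(steps):
        # Total steering rate: learned slope plus a proportional phase pull.
        correction = kp * offset + freq_estimate
        freq_estimate += ki * offset
        # The uncorrected part of the frequency error keeps accumulating as offset.
        offset += (freq_error_ns_per_s - correction) * dt_s
        history.append(offset)
    return history, freq_estimate


# Illustrative case: 100 ns/s (100 ppb) frequency error, 50 ns initial offset.
history, freq_estimate = run_servo(100.0, 50.0, kp=0.5, ki=0.1, steps=200)

# Same plant with the integrator disabled: the offset settles at a constant bias.
p_only_history, _ = run_servo(100.0, 50.0, kp=0.5, ki=0.0, steps=200)
```

With the integrator enabled the frequency estimate converges to the true 100 ns/s and the offset goes to zero; with it disabled the offset parks at 200 ns, the steady error a phase-only correction leaves when the slope is not controlled.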
Filtering & bandwidth (noise vs response vs holdover)
If bandwidth is too wide
  • Fast response, but measurement noise and queue jitter leak into the disciplined clock.
  • Short-term jitter worsens; offset may look busy even in steady windows.
  • Recovery can overshoot and oscillate after disturbances.
If bandwidth is too narrow
  • Clean short-term behavior, but slow correction of temperature/aging drift.
  • Wander slope increases during environmental changes.
  • Reacquire time grows after link disturbances.
Tune against measurable KPIs
  • Short-term: jitter / p95 spread in steady windows.
  • Long-term: wander slope under ΔT and load changes.
  • Resilience: holdover drift and reacquire time.
Acceptance gate (placeholder): in steady windows, jitter/p95 ≤ X; across ΔT, wander slope ≤ Y; after disturbance, reacquire ≤ Z.
Holdover (link loss / packet loss: short-term stability maintenance)
State machine
Lock → Tracking → Holdover → Reacquire. Each state must define what is frozen, what continues updating, and what triggers its exit.
Engineering policy
  • Preserve the best recent frequency estimate; avoid aggressive phase stepping during holdover.
  • Monitor temperature and apply known compensation models (if qualified).
  • On reacquire, ramp corrections to avoid overshoot and oscillation.
Acceptance gate (placeholder): holdover drift ≤ X over Y seconds; reacquire returns offset p95 ≤ Z within W seconds.
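The state sequence can be sketched as an explicit transition function. The exit conditions below are assumptions chosen for illustration: `residual_ok` stands for "offset p95 is inside the gate", Lock is treated as initial acquisition, and link loss before first lock stays in Lock because there is no trusted frequency estimate to hold yet.

```python
from enum import Enum
from typing import List


class State(Enum):
    LOCK = "lock"
    TRACKING = "tracking"
    HOLDOVER = "holdover"
    REACQUIRE = "reacquire"


def next_state(state: State, link_up: bool, residual_ok: bool) -> State:
    """One step of the Lock -> Tracking -> Holdover -> Reacquire machine."""
    if not link_up:
        # Holdover is only meaningful once a frequency estimate has been earned.
        return State.LOCK if state is State.LOCK else State.HOLDOVER
    if state is State.LOCK:
        return State.TRACKING if residual_ok else State.LOCK
    if state is State.HOLDOVER:
        return State.REACQUIRE  # link is back: ramp corrections, do not step
    if state is State.REACQUIRE:
        return State.TRACKING if residual_ok else State.REACQUIRE
    return State.TRACKING


def frequency_estimate_frozen(state: State) -> bool:
    """Policy hook: hold the best recent frequency estimate while in holdover."""
    return state is State.HOLDOVER


# Illustrative event stream: acquire, lose the link for two ticks, then recover.
events = [(True, False), (True, True), (False, True), (False, True),
          (True, False), (True, False), (True, True)]
state = State.LOCK
trace: List[str] = []
for link_up, residual_ok in events:
    state = next_state(state, link_up, residual_ok)
    trace.append(state.value)
```

Tagging every logged offset sample with the current state value is what makes the holdover and relock populations separable in later analysis.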
Diagram · Nested servo loops (frequency inner / phase outer)
[Diagram placeholder: nested loops. Outer loop (phase/time residual): measure offset/phase → filter (windowed stats) → actuate (fine correction). Inner loop (frequency discipline): measure frequency error → filter (drift focus) → actuate (DPLL / NCO). The inner loop sets the slope. Trade-offs: noise injection, response speed, holdover drift. Tune against measurable KPIs: p95 residual, drift slope, holdover drift, and reacquire time.]
The nested structure keeps long-term slope under control (inner loop) while driving residual phase/time error down (outer loop).

H2-8 · Hardware Building Blocks (Clock Tree / Phase Measurement / Timestamp Path)

Intent

Provide a hardware-first checklist for sub-ns timing: required clock-tree elements, phase measurement capability beyond basic timestamping, and deterministic timestamp paths with repeatable latency across resets, training, and load changes.

Clock tree essentials (reference → discipline → distribute → consume)
Reference source
Stable reference input with defined noise/jitter envelope and known temperature behavior.
PLL / DPLL discipline
The actuation point for syntony: resolution, tuning range, and stability determine drift slope and holdover behavior.
Distribution & consumers
Fanout to PHY/FPGA/SoC and to timestamp/phase blocks; manage noise coupling and preserve repeatable latency.
Selection hooks (placeholder): phase-noise/jitter budget, tuning resolution, and temperature sensitivity aligned to sub-ns KPIs.
Phase measurement (beyond basic timestamp resolution)
Why basic timestamps hit a limit
At sub-ns targets, the effective resolution and jitter of ordinary timestamp capture can become the bottleneck unless finer-grain phase observability exists.
Engineering requirements
  • Phase observables must share the same timebase as timestamp capture.
  • Outputs must be calibratable and temperature-aware (residual error tracked).
  • Repeatability matters more than raw “spec” resolution.
Acceptance gate (placeholder): phase/timestamp residual distribution stays within X in steady windows and remains stable across resets.
Timestamp path gates (tap point · CDC · FIFO · repeatable latency)
Tap point
Define where timestamps are taken (MAC/PHY/SerDes boundary) and keep TX/RX semantics consistent.
Clock domain crossing (CDC)
CDC must preserve ordering and determinism; timestamp capture must not become load-dependent.
FIFO / buffering
FIFO depth and arbitration must not alter timestamp-to-wire latency; avoid queue-state-dependent timing paths.
Repeatability
Latency must remain stable across resets, link training, temperature changes, and throughput steps.
Fast validation (placeholder): run reset/retrain cycles + throughput steps + ΔT sweep; confirm latency shows no discrete clusters and p95 ≤ X.
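The "no discrete clusters" check can be sketched by sorting the latency measured after each reset or retrain cycle and counting gaps larger than a threshold. The gap threshold and the function name are assumptions; in practice the threshold would be set well above the steady-window jitter and below one clock period of the suspect domain.

```python
from typing import List


def count_latency_clusters(latencies_ps: List[int], gap_ps: int) -> int:
    """Count groups of latency values separated by more than gap_ps."""
    if not latencies_ps:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ps)
    clusters = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap_ps:
            clusters += 1  # a jump this large marks a new discrete mode
    return clusters


# Illustrative results from repeated reset cycles.
repeatable = count_latency_clusters([100, 102, 101, 99], gap_ps=20)
two_mode = count_latency_clusters([100, 101, 900, 902, 99], gap_ps=20)
```

A result above one points at the CDC, FIFO, or link-training path rather than at calibration, since a table can hold only one fixed delay per binding.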
Hardware boundary (must be in HW vs can be in SW)
Must be hardware-supported
  • Deterministic timestamp capture at a defined tap point.
  • Clock discipline actuation (DPLL/NCO/VCXO control) with adequate resolution.
  • Deterministic latency through CDC/FIFO paths (repeatable across states).
  • Calibratable observables and stable timebase distribution.
Can be software-managed
  • Measurement window selection and outlier rejection policy.
  • Health monitoring, drift alarms, trigger-driven recalibration orchestration.
  • Audit logs, binding checks, table lifecycle enforcement.
  • Conservative ramping policies for reacquire stability.
Acceptance gate (placeholder): hardware provides deterministic capture and actuation; software policies keep windows stable and enforce table binding.
Diagram · Clock tree + timestamp datapath (with domain crossing)
Figure (not reproduced). Clock tree: Reference source, PLL / DPLL discipline, Fanout distribution, PHY, FPGA/SoC. Clock domain crossing: determinism · ordering · repeatability. Timestamp path: Packet in/out, Tap point, CDC bridge, FIFO latency, Servo in. Gates: determinism · repeatability.
The upper half shows clock discipline and distribution; the lower half shows timestamp capture and transport through CDC/FIFO. Sub-ns correctness depends on deterministic boundaries and repeatable latency across operating states.

H2-9 · Network Topologies & Redundancy (and how to coexist with TSN)

Intent

Translate deployment reality into measurable design rules: topology choice, redundancy behavior, and TSN coexistence boundaries. TSN handles deterministic forwarding windows; WR-style timing maintains the timebase. TSN gate control list (GCL) details are out of scope here.

Topology selection (control vs calibration complexity)
Tree / Star
  • Paths are stable and easy to segment into per-link budgets.
  • Calibration table lifecycle is simpler (clear bindings and fewer path flips).
  • Fault isolation is faster (branch-local diagnosis).
Ring
  • Strong redundancy, but path flips can invalidate timing assumptions.
  • Failover must define when to trust new asymmetry estimates.
  • Reacquire policies must prevent thrash on marginal links.
Multi-segment chains
  • Each added segment adds fixed delay + temperature drift + asymmetry terms.
  • Segment-level calibration and binding become mandatory, not optional.
  • End-to-end verification must include path-change scenarios.
Acceptance gate (placeholder): steady-topology offset p95 ≤ X; after path change, reacquire ≤ Y; peak transient ≤ Z.
Redundancy strategies (paths, sources, and failover triggers)
Dual path
  • Define main vs backup timing overlay paths.
  • On switch, enter a protection window before trusting new offsets.
  • Bind calibration tables to path identity (avoid stale compensation).
Dual time source
  • Trigger on lock-status loss, drift slope breach, or offset instability.
  • Use hysteresis to prevent rapid flapping between sources.
  • After switch, ramp corrections to avoid overshoot.
Failover triggers
  • Link-down / error spike / lock-status drop.
  • Asymmetry estimate jump beyond threshold (table likely invalid).
  • Offset p95 breach over a defined window (not a single sample).
Acceptance gate (placeholder): failover triggers are stable (no thrash); source/path switch returns to lock within X; asymmetry stays bounded within Y.
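The "windowed p95, not a single sample" and hysteresis rules above can be sketched as a small trigger class. This is an illustrative sketch only: the class name, the window length, and the trip/clear thresholds are assumptions, and a real implementation would combine this with lock-status and asymmetry-jump triggers.

```python
from collections import deque


class WindowedBreachTrigger:
    """Trips on a windowed p95 breach and clears only below a lower level."""

    def __init__(self, window, trip_ns, clear_ns):
        if clear_ns >= trip_ns:
            raise ValueError("clear_ns must be below trip_ns for hysteresis")
        self.samples = deque(maxlen=window)
        self.trip_ns = trip_ns
        self.clear_ns = clear_ns
        self.tripped = False

    def _p95(self):
        ordered = sorted(self.samples)
        rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
        return ordered[rank - 1]

    def update(self, offset_ns):
        """Add one offset sample and return the current trigger state."""
        self.samples.append(abs(offset_ns))
        if len(self.samples) < self.samples.maxlen:
            return self.tripped  # window not full yet: no decision change
        p95 = self._p95()
        if not self.tripped and p95 > self.trip_ns:
            self.tripped = True
        elif self.tripped and p95 < self.clear_ns:
            self.tripped = False
        return self.tripped
```

With a 20-sample window, one isolated spike never moves the nearest-rank p95, so it cannot trip the trigger; the gap between `trip_ns` and `clear_ns` keeps a marginal link from flapping between states.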
Coexistence with TSN (strict boundary, no GCL detail)
Responsibility split
  • TSN: deterministic forwarding windows and controlled queuing behavior.
  • WR-style timing: timebase maintenance (frequency lock, phase/time residual control).
Minimal coexistence constraints
  • Timing measurement windows must avoid queue-dominated latency regimes.
  • Timestamp paths must remain deterministic across TSN load patterns.
  • Path changes must be treated as calibration-binding events.
Acceptance gate (placeholder): under TSN high-load windows, offset p95 degradation ≤ X and no periodic wander amplification.
Common deployment pitfalls (avoid silent loss of sub-ns)
  • Path changes without recalibration binding (stale tables applied to new routes).
  • Failover triggers based on single samples (thrash and oscillation).
  • “Stable but biased” offsets due to measuring at congested/edge windows.
  • Using averages only; ignoring p95/peak/reacquire behaviors.
Diagram · Topology map (line · star · ring with timing overlay)
Figure (not reproduced). Line, star, and ring topologies with main path, backup path, and timing overlay. Line: Src, Sw1, Sw2, Node. Star: Core, N1 to N4. Ring: S1 to S4. Annotation: path flips → recalibration binding.
Use the same timing-overlay lens across line, star, and ring: path changes are calibration-binding events; redundancy must be engineered to avoid thrash and biased “stable” offsets.

H2-10 · Verification, Monitoring & Field Service (make it measurable)

Intent

Turn “it works” into measurable acceptance, diagnosis, and reproducibility: verification ladder, minimal KPI set, black-box evidence, and field-service mechanisms (loopback/self-test/remote update and rollback) without management-protocol details.

Verification ladder (bench → system → environmental)
Bench
  • Establish noise floor and baseline lock/reacquire behaviors.
  • Validate timestamp path determinism under controlled traffic.
  • Record baseline distributions (p50/p95/peak), not just averages.
System
  • Stress under realistic switching and throughput steps.
  • Detect “stable but biased” offsets caused by queuing regimes.
  • Verify redundancy events (path/source switch) and recovery gates.
Environmental
  • Temperature sweeps/steps, power disturbances, and mechanical vibration scenarios.
  • Measure wander slope, holdover drift, and reacquire stability across ΔT.
  • Confirm calibration binding rules remain correct under component swaps.
Acceptance gate (placeholder): offset p95 ≤ X; wander slope ≤ Y; holdover drift ≤ Z; reacquire ≤ W.
Measurement windows & reproducibility (avoid “stable but biased”)
Window rules
  • Define sampling window length, denominators (sample counts), and reported statistics (p50/p95/peak).
  • Align windows to events (failover, retrain, temperature step).
  • Separate steady windows from transition windows.
Minimal reproducibility sequence
  • Traffic step: low → high → low (capture distributions each phase).
  • Environmental step: T1 → T2 (track wander slope and bias drift).
  • Link event: holdover → reacquire (confirm ramping and no overshoot).
Acceptance gate (placeholder): consistent results across repeated runs with the same windows; no discrete “cluster jumps” after resets/retrain.
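The window rules above (event alignment, steady versus transition separation, and distribution statistics) can be sketched as follows. The function names, the `(time_s, offset_ns)` sample layout, and the fixed guard interval after each event are assumptions made for illustration.

```python
def split_by_events(samples, event_times_s, guard_s):
    """Separate (time_s, offset_ns) samples into steady and transition sets.

    A sample counts as 'transition' if it falls within guard_s after any
    event (failover, retrain, temperature step).
    """
    steady, transition = [], []
    for time_s, offset_ns in samples:
        near_event = any(e <= time_s < e + guard_s for e in event_times_s)
        (transition if near_event else steady).append(offset_ns)
    return steady, transition


def summarize(offsets_ns):
    """Report sample count plus nearest-rank p50/p95 and peak of |offset|."""
    if not offsets_ns:
        raise ValueError("window contains no samples")
    ordered = sorted(abs(v) for v in offsets_ns)
    n = len(ordered)

    def rank(percent):
        return ordered[(percent * n + 99) // 100 - 1]

    return {"n": n, "p50": rank(50), "p95": rank(95), "peak": ordered[-1]}
```

Reporting `n` with every summary keeps the denominator explicit, so two tools comparing p95 values can confirm that they used the same window.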
Minimal monitoring KPIs (only what is needed here)
offset
Use distribution (p95/peak) to detect noise injection and biased “stable” states.
wander
Track slope/trend across ΔT and load changes; diagnose frequency discipline weakness.
lock status
State snapshot (Lock/Tracking/Holdover/Reacquire) for event-aligned diagnosis.
asymmetry estimate
Watch for jumps and slow drift; treat jumps as table/path binding alarms.
holdover state
Identify protection windows and reacquire ramping; avoid thrash on marginal links.
Acceptance gate (placeholder): KPI stream remains consistent across resets; anomalies are event-aligned and explainable via state transitions.
Black-box evidence bundle (event + timestamp + environment)
Required fields
  • Event type: link flap, failover, lock change, calibration update.
  • Event timestamp + state snapshot timestamp (explicit).
  • Environment: temperature, supply, airflow/fan state (as available).
  • Window metadata: start/end, denominators, and thresholds in force.
Why it matters
Without event-aligned evidence, a clean-looking offset plot cannot be reproduced or used to isolate path changes, table invalidation, or measurement-window artifacts.
Acceptance gate (placeholder): every anomaly includes a matching event record and state snapshot; evidence bundle supports replay-style analysis.
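One way to make the required fields and the acceptance gate concrete is a fixed record type plus a check that every anomaly has a nearby event record. The field names, the example state strings, and the matching tolerance are hypothetical; the record mirrors the field list above rather than any existing log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    event_type: str                 # e.g. "link_flap", "failover", "lock_change"
    event_time_s: float             # when the event occurred
    snapshot_time_s: float          # when the state snapshot was taken
    lock_state: str                 # e.g. "Lock", "Holdover", "Reacquire"
    temperature_c: Optional[float]  # None when the sensor is unavailable
    supply_v: Optional[float]
    window_start_s: float
    window_end_s: float
    threshold_ns: float             # threshold in force for this window


def unexplained_anomalies(anomaly_times_s, records, tolerance_s):
    """Return anomaly times that have no event record within tolerance_s."""
    return [
        t for t in anomaly_times_s
        if not any(abs(t - r.event_time_s) <= tolerance_s for r in records)
    ]
```

An empty result from `unexplained_anomalies` corresponds to the gate above: every anomaly is matched by an event record that carries its own state snapshot.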
Field service mechanisms (loopback · self-test · remote update & rollback)
Loopback / self-test
  • Fast fault split: separate link-path, timebase, and calibration-binding issues.
  • Run with known windows and export KPI snapshots and event markers.
Remote update & rollback
  • Version binding: calibration tables and servo policies must match firmware versions.
  • Rollback must restore prior known-good behavior with recorded gates.
  • Change log must capture timing-relevant parameter diffs.
Acceptance gate (placeholder): self-test returns KPIs to thresholds within X; rollback restores reacquire ≤ Y and offset p95 ≤ Z.
Diagram · Verification flow (inputs → plan → gates → evidence bundle)
Figure (not reproduced). Make it reproducible: event-aligned windows · measurable gates · evidence bundle. Inputs: Instruments (scope / analyzer), Traffic (load steps), Environment (ΔT / power). Plan: Define windows, Execute steps, Analyze KPIs, Gate pass. Outputs: KPIs (offset/wander), Events (state marks), Bundle (black-box).
A reproducible flow ties defined measurement windows to event markers and an evidence bundle. Acceptance is based on distributions and state-aligned gates, not on averages.

H2-11 · Engineering Checklist (Design → Bring-up → Production)

Purpose: convert “sub-ns timing” into auditable gates. Each gate is defined by measurable checks, evidence fields, and pass criteria placeholders (X/Y/Z) to prevent silent drift and non-repeatable latency.

Gate A · Design
Lock the architecture first

Goal

Freeze the clock tree, timestamp tap points, and “repeatable latency boundaries” so that later calibration/servo work is not forced to compensate for unstable hardware paths.

Checklist (tick-box actions)

  • Clock tree contract: define reference source → jitter cleaner/DPLL → distribution → PHY/FPGA/SoC domains; mark which nodes are “must-follow” vs “free-run”.
  • Timestamp tap invariance: ensure the timestamp capture point does not change across firmware paths, offloads, or switch forwarding modes.
  • Deterministic latency budget: identify every FIFO/CDC/queue that can introduce non-determinism; require fixed-depth or bounded behavior.
  • Thermal drift entry: reserve sensors/telemetry fields (temperature, supply, airflow states) and define where drift coefficients will be stored.
  • Redundancy boundary: define which links/modules invalidate calibration (e.g., swapping SFP/PHY), and what the safe degrade mode is.

Evidence to capture

  • Clock-tree block diagram version + net names + domain IDs.
  • Timestamp tap-point description (per port) + CDC/FIFO depth constraints.
  • Calibration table schema draft: fields, units, and validity rules.
  • Telemetry field list: temp/supply/load/lock-state/event stamps.

Pass criteria (placeholders)

  • Timestamp determinism (same stimulus, repeated) ≤ X ns peak-to-peak.
  • Clock-domain crossing adds bounded delay ≤ Y ns (no unbounded queue growth).
  • Thermal coefficient fields are defined and stored with units (°C, ppb/°C, ns/°C); the set is complete and auditable.

Example parts (BOM-level, non-exhaustive)

  • Renesas 8A34001 (PTP/SyncE SMU/DPLL)
  • Microchip ZL30732A (DPLL jitter attenuator)
  • Skyworks/Silicon Labs Si5341 (jitter attenuator)
  • Microchip LAN8840 (GbE PHY w/ IEEE 1588 support)
  • Microchip KSZ9477 (7-port switch, IEEE 1588v2 PTP)
  • TI DP83640 (10/100 PHY, IEEE 1588 PTP)
  • TI TMP117AIDRVR (high-accuracy temp sensor)
  • Winbond W25Q128JV (SPI NOR for calibration/records)
  • Abracon AOCJY-10.000MHZ-F-T (10 MHz OCXO)
  • SiTime SiT5356AE-FQ-33E0-40.000000X (Super-TCXO example)

Notes: choose temperature grade, package, and reference frequency (10 MHz / 25 MHz / 125 MHz) to match the servo bandwidth and distribution constraints.

Gate B · Bring-up
Make it stable under stress

Goal

Prove that calibration, two-way measurement, and servo lock remain measurable and repeatable across load steps, temperature steps, and link events (drop/reacquire/failover).

Checklist (stress-driven)

  • Calibration runbook: run factory/field/online recalibration paths; verify “validity rules” trigger correctly (module swap, path change, temp delta).
  • Load-step isolation: run low→high traffic; confirm queueing noise is excluded from the measurement window or bounded by filtering rules.
  • Lock/relock state machine: enforce holdover entry/exit criteria; prevent oscillation (thrash) when link flaps.
  • Injected disturbances: temperature step (T1→T2), supply ripple step, link down/up, redundant path switch; record event-stamped KPI traces.
  • Window + denominator consistency: ensure offset/wander metrics keep the same time window and denominator across tools and builds.

Evidence to capture

  • KPI streams: offset, wander, lock status, asymmetry estimate, holdover state (all with timestamps).
  • Event markers: load-step, temperature-step, link flap, failover switch, firmware revision.
  • Calibration snapshots: table version/hash before/after, plus applied coefficients.

Pass criteria (placeholders)

  • Reacquire time (link down→stable lock) ≤ X s.
  • Offset peak during load step ≤ Y ns (measured over window W).
  • Asymmetry estimate jump after path switch ≤ Z ns (otherwise force recalibration).

Example parts (bring-up enablers)

  • Microchip KSZ9477 (PTP-capable switch for test topologies)
  • Microchip LAN8840 (PHY timestamp signals + GPIO)
  • TI TMP117AIDRVR (temp step correlation)
  • Renesas 8A34001 (DCO/DPLL modes for servo experiments)
  • Winbond W25Q128JV (black-box traces + rollbacks)
Gate C · Production
Make it scalable and traceable

Goal

Ensure unit-to-unit consistency by binding calibration data to hardware identity and software revisions, with fast production tests that catch drift-sensitive failure modes.

Checklist (manufacturing control)

  • Calibration table governance: version, timestamp, units, and validity rules; store a hash and protect against mismatch.
  • Identity binding: bind module serial + port/path ID + firmware build ID + calibration hash.
  • Fast tests that matter: timestamp determinism quick-test, short holdover drift test, reacquire test, and a 2-point thermal spot-check.
  • Sampling plan: define lot sampling and escalation rules (rework/stop-ship) when drift-sensitive metrics shift.
  • Field forensics readiness: black-box logs must include event + KPI + environment; support rollback and “known-good” calibration restore.
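The governance and identity-binding items above can be sketched with a canonical hash over the calibration table plus a binding record. This is a minimal sketch: the field names, the JSON canonicalization, and the use of SHA-256 are assumptions chosen for illustration, not a prescribed storage format.

```python
import hashlib
import json


def table_hash(table):
    """SHA-256 over a canonical JSON encoding of the calibration table."""
    payload = json.dumps(table, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_binding(module_serial, path_id, firmware_build, table):
    """Bind a calibration table to hardware identity and firmware build."""
    return {
        "module_serial": module_serial,
        "path_id": path_id,
        "firmware_build": firmware_build,
        "calibration_hash": table_hash(table),
    }


def binding_mismatches(binding, module_serial, path_id, firmware_build, table):
    """List the binding fields that no longer match the running system."""
    current = make_binding(module_serial, path_id, firmware_build, table)
    return [key for key in binding if binding[key] != current[key]]
```

A non-empty mismatch list is the signal to invalidate the table and either recalibrate or enter the safe degrade mode; sorting keys before hashing keeps the hash independent of field order.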

Evidence to capture

  • Per-unit record: serials, calibration hash, firmware build ID, date codes.
  • Production KPI summary: p50/p95/peak for offset, lock time, holdover drift.
  • Failure artifacts: raw KPI traces + environment + link events (time-stamped).

Pass criteria (placeholders)

  • Unit-to-unit KPI spread (same fixture) ≤ X ns.
  • Lot drift (weekly/monthly) ≤ Y ns after normalization.
  • Forensics completeness: ≥ Z% of field incidents reproducible with logs + calibration restore.

Example parts (traceability + storage)

  • Winbond W25Q128JV (records/rollback images)
  • Microchip 24AA256T-I/OT (I²C EEPROM for calibration table)
  • TI TMP117AIDRVR (per-unit thermal spot-check)
Diagram · 3-Gate Checklist Pipeline
Three gates with tick-box items and a shared “Pass criteria” block (X/Y/Z placeholders).
Figure (not reproduced). Design → Bring-up → Production (Quality Gates). Gate A · Design: Clock tree, Tap point, Return path, Drift entry, Budget. Gate B · Bring-up: Calibration, Inject stress, Lock / relock, Failover, Windows. Gate C · Production: Table mgmt, Bind IDs, Fast tests, Audit, Rollback. Pass criteria: X (offset/peak) · Y (wander/drift) · Z (reacquire/failover stability).
Use this gate structure to keep WR-style timing measurable from schematic to field incidents.

H2-12 · Applications (WR-style timing as a timebase, no stack deep-dive)

This section stays on “why sub-ns is mandatory” and how to validate it with a small KPI set. Industrial stacks and TSN configuration tables are intentionally out of scope.

Use case A
Sub-ns distributed triggering / synchronous sampling

Why sub-ns

Trigger skew directly maps into measurement error when multiple nodes sample the same event. “Stable but biased” offsets are unacceptable because calibration must survive temperature and link events.

KPIs (only two)

Offset p95 ≤ X ns · Wander slope ≤ Y (placeholder)

Design hooks

  • Event-stamped measurement windows: exclude queue bursts from the estimator.
  • Calibration validity rules: module/path swap must force recalibration or safe degrade.
  • Holdover policy: define how long the trigger system may trust time during outages.

Example parts

Renesas 8A34001 · Microchip ZL30732A · Microchip LAN8840 · TI TMP117AIDRVR
Use case B
Phase-coherent sensing / phased arrays

Why sub-ns

Phase coherence is limited by frequency stability first. If frequency is not locked tightly, wander dominates even when packet timing looks “clean”.

KPIs (only two)

Phase residual ≤ X (placeholder) · Holdover drift ≤ Y (placeholder)

Design hooks

  • Nested servo intent: inner frequency lock must be stable before phase/time outer loop is trusted.
  • Thermal model coverage: record temperature and airflow to explain phase jumps in the field.
  • Identity binding: phase performance must be tied to oscillator + module identity to avoid silent swaps.

Example parts

Skyworks/Silicon Labs Si5341 · Abracon AOCJY-10.000MHZ-F-T · SiTime SiT5356AE-FQ-33E0-40.000000X · TI TMP117AIDRVR
Use case C
Large facilities: racks / distributed labs / power-grid measurement

Why sub-ns

Multi-segment links introduce temperature-dependent path changes and asymmetry. The system must detect when the estimate is no longer valid and enforce recalibration or safe operation.

KPIs (only two)

Reacquire time ≤ X s · Asymmetry jump ≤ Y ns

Design hooks

  • Path ID and calibration validity: multi-segment topology requires explicit binding and invalidation rules.
  • Redundant sources and failover: define a protection window before “trusted time” is re-enabled.
  • Black-box readiness: store event + KPI + environment for incident reconstruction.

Example parts

Renesas 8A34001 · Microchip KSZ9477 · Winbond W25Q128JV
Use case D
Field service: measurable, recoverable, and traceable timing

Why sub-ns

Field failures are often intermittent. Sub-ns systems must expose internal states (lock, asymmetry, holdover) and support fast isolation without requiring protocol deep dives.

KPIs (only two)

Lock stability ≥ X% uptime · Offset peak after event ≤ Y ns

Design hooks

  • Self-test hooks: loopback/PRBS for link sanity + timestamp path health checks.
  • Immutable records: event logs with environment fields for reproducibility.
  • Safe recovery: calibration restore + firmware rollback to last known good set.

Example parts

Winbond W25Q128JV · Microchip 24AA256T-I/OT · Microchip LAN8840
Diagram · Use-case Map (Applications + KPI tags)
Four application cards connected to a single WR-style timebase, each with only two KPI tags.
Figure (not reproduced). WR-style timebase (two-way + calibration + syntonization) connected to four cards. A · Distributed triggering: Offset p95 ≤ X, Wander ≤ Y. B · Phase-coherent sensing: Phase ≤ X, Holdover ≤ Y. C · Large facilities: Reacquire ≤ X, Asym ≤ Y. D · Field service: Uptime ≥ X, Peak ≤ Y.
Application section stays KPI-driven: each use case is validated by two metrics only, preventing scope creep into stacks or TSN configuration tables.

H2-13 · FAQs (Troubleshooting, fixed 4-line answers)

Scope rule: these FAQs only close long-tail troubleshooting within this page’s boundaries (two-way measurement, calibration, asymmetry, servo/holdover, timestamp/tap-path, topology/verification). No new protocol domains are introduced.

Offset is small but wander is large — is it loop bandwidth/filtering or missing thermal modeling?
Likely cause: a frequency/phase loop bandwidth that passes low-frequency drift, or an estimator window/filter that aliases temperature-driven delay into wander.
Quick check: correlate wander with temperature (ΔT) and traffic/load events; compare wander under a longer vs shorter measurement window.
Fix: tighten the inner frequency lock and re-tune filter/window to reject slow drift; enable drift coefficients in the calibration model for the active path.
Pass criteria: wander p95 ≤ X over Y minutes across ΔT ≤ Z°C, while offset p95 stays ≤ A ns.
Calibrated and OK at room temperature, but degrades at high temperature — thermal compensation model or group-delay drift?
Likely cause: thermal coefficients not applied (or applied to the wrong path), or hardware group delay changes exceeding the modeled range.
Quick check: run a 2–3 point temperature sweep and compare offset slope (ns/°C) against the stored coefficient; verify the “calibration validity” rule still marks the table as valid.
Fix: re-fit the temperature model for the deployed link, store coefficients per module/path, and enforce recalibration triggers when ΔT exceeds the qualified range.
Pass criteria: |offset slope| ≤ X ns/°C over T1–T2; offset peak ≤ Y ns after a temperature step ΔT ≤ Z°C.
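The quick check for this FAQ (compare the measured offset slope against the stored coefficient) can be sketched with an ordinary least-squares fit. The function names are hypothetical, and the sketch assumes the offsets were logged with thermal compensation disabled, so that the measured slope is directly comparable to the stored coefficient.

```python
def fit_slope(temps_c, offsets_ns):
    """Ordinary least-squares slope of offset versus temperature (ns/°C)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_o = sum(offsets_ns) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(temps_c, offsets_ns))
    return sxy / sxx  # requires at least two distinct temperatures


def coefficient_check(temps_c, offsets_ns, stored_ns_per_c, limit_ns_per_c):
    """Compare the measured slope with the stored thermal coefficient."""
    measured = fit_slope(temps_c, offsets_ns)
    residual = measured - stored_ns_per_c
    return {
        "measured_ns_per_c": measured,
        "residual_ns_per_c": residual,
        "passed": abs(residual) <= limit_ns_per_c,
    }
```

A large residual with a clean linear fit points to a wrong or misapplied coefficient; a poor fit across the 2–3 points suggests group-delay behavior outside the modeled range.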
After swapping an optical/RJ45 module, the whole system shifts by a constant step — was the fixed-delay table invalidated or serial binding missed?
Likely cause: fixed TX/RX delays and/or asymmetry parameters are path-specific; the calibration table was reused after a module change, or identity binding did not trigger invalidation.
Quick check: compare module identifiers (serial/part) and calibration-table hash/version; verify the “validity rule” flags the table as invalid when module ID changes.
Fix: enforce module/path/firmware binding for calibration tables; require recalibration (or safe degrade mode) after a swap.
Pass criteria: module swap triggers recalibration within X minutes; post-swap offset step ≤ Y ns after calibration is applied.
After relock, the error shows a step change — relock state machine or calibration-parameter load timing?
Likely cause: calibration parameters applied late/early relative to timestamp domain alignment, or relock logic exits holdover before coefficients are stable.
Quick check: time-stamp the sequence: link-up → measurement-ready → coefficients-applied → “trusted-time”; check if the step aligns with a state transition.
Fix: gate “trusted-time” on (a) stable frequency lock, (b) coefficient application completion, and (c) measurement window stabilization; add a post-relock settling window.
Pass criteria: relock step magnitude ≤ X ns; time-to-trust ≤ Y s with no secondary steps over Z reboots/relocks.
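The three-condition gate with a post-relock settling window can be sketched as a small state holder. The class name, argument names, and the rule that any lapse restarts the settling timer are assumptions for illustration.

```python
class TrustedTimeGate:
    """Declares time trusted only after all conditions hold for settle_s."""

    def __init__(self, settle_s):
        self.settle_s = settle_s
        self._all_ok_since = None  # time at which all conditions became true

    def update(self, now_s, freq_locked, coeffs_applied, window_stable):
        """Feed the current conditions; return True once time is trusted."""
        if freq_locked and coeffs_applied and window_stable:
            if self._all_ok_since is None:
                self._all_ok_since = now_s
        else:
            self._all_ok_since = None  # any lapse restarts the settling window
        return (
            self._all_ok_since is not None
            and now_s - self._all_ok_since >= self.settle_s
        )
```

Logging each call's inputs alongside `now_s` yields the link-up → measurement-ready → coefficients-applied → trusted-time timeline that the quick check asks for.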
Short links are excellent, but long links suddenly degrade — asymmetry estimate or medium thermal coefficient?
Likely cause: long links amplify temperature-driven propagation changes and asymmetry; an estimate that is “stable” may be stable-but-wrong when conditions shift.
Quick check: compare offset vs temperature for short vs long link; look for slope change; verify asymmetry estimate stability (jump size) during load/temperature transitions.
Fix: recalibrate asymmetry for the deployed length, add temperature coefficients per link class, and enforce revalidation triggers when link length/class changes.
Pass criteria: long-link offset p95 ≤ X ns over Y minutes; asymmetry jump ≤ Z ns under ΔT ≤ A°C.
It looks “locked”, but triggers are still not synchronous — phase measurement resolution or inconsistent timestamp tap points?
Likely cause: the lock indicator reflects frequency/phase loop convergence, but trigger alignment is limited by timestamp/phase measurement granularity or a mismatched tap location.
Quick check: measure trigger skew at multiple repetition rates; if skew quantizes in steps, resolution is the limit; compare tap-path configuration hashes across nodes/ports.
Fix: unify tap points (same capture boundary and domain crossing), and ensure fine phase measurement is used where required; re-baseline trigger path after tap changes.
Pass criteria: trigger skew p95 ≤ X ns across Y nodes, with tap-path hashes identical and no quantization steps > Z ns.
Adding one switch/bridge segment makes the system unstable — queue jitter leakage or non-repeatable forwarding delay?
Likely cause: measurement windows include queueing bursts, or the added segment introduces variable forwarding latency that breaks repeatability assumptions.
Quick check: repeat the same test under low traffic and high traffic; if instability scales with load, queue jitter is leaking; check if forwarding latency distribution widens after insertion.
Fix: isolate timing measurement from traffic-induced queues (windowing/filtering), and require bounded/characterized forwarding latency on timing-critical paths.
Pass criteria: added segment increases offset p95 by ≤ X ns and does not increase wander p95 by > Y under load ≤ Z%.
Monitoring says aligned, but field data does not match — clock-domain crossing or timestamp-domain mapping inconsistency?
Likely cause: KPIs are computed in a different domain/epoch than the application data timestamps, or a CDC/mapping layer applies the wrong reference when converting.
Quick check: trace one event end-to-end and verify domain identifiers (source clock ID, timebase ID, conversion stage); ensure the same window and denominator are used across tools.
Fix: standardize the timestamp-domain mapping contract (IDs, units, epoch), and enforce a single “source of truth” for conversions; record conversion metadata in logs.
Pass criteria: event-to-event alignment error ≤ X ns over Y trials, with domain ID consistency = 100%.
Holdover drifts too fast after a link outage — oscillator/PLL holdover strategy or environmental disturbance?
Likely cause: holdover mode does not preserve frequency stability (insufficient oscillator quality or poor holdover tuning), or environment (temperature/supply) shifts during outage.
Quick check: measure drift vs outage duration under controlled temperature and then under realistic airflow/supply; compare drift slope (ns/s or ppb equivalent).
Fix: improve holdover tuning (freeze/track strategy), validate oscillator class for the required outage window, and include temperature/supply compensation during holdover.
Pass criteria: holdover drift ≤ X ns over outage duration Y s at ΔT ≤ Z°C; reacquire time ≤ A s.
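The two drift-slope units in the quick check are interchangeable: a fractional frequency error of 1 ppb accumulates 1 ns of time error per second. A simple worked model follows, assuming a constant fractional frequency offset $y_0$ at holdover entry and a constant linear frequency drift rate $a$; both symbols and the 10 s outage are introduced here purely for illustration.

```latex
% Accumulated time error after an outage of duration T
x(T) = y_0\,T + \tfrac{1}{2}\,a\,T^{2}

% Example with a = 0, y_0 = 1~\text{ppb} = 10^{-9}, T = 10~\text{s}:
x(10~\text{s}) = 10^{-9} \times 10~\text{s} = 10~\text{ns}
```

Under this model, keeping the accumulated error below 1 ns across a 10 s outage requires the residual frequency error at holdover entry to stay below roughly 0.1 ppb, before any temperature-driven or supply-driven drift is added.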
Sub-ns cannot be reached in a multi-hop topology — per-hop calibration strategy or the error budget is exhausted?
Likely cause: each hop adds fixed delay uncertainty and variable components; without per-hop calibration/validation, the end-to-end budget is consumed quickly.
Quick check: measure end-to-end and then measure hop-by-hop; identify which hop has the largest p95/peak contribution and whether it is stable or load/temperature dependent.
Fix: apply calibration and validity rules per hop/path, enforce bounded forwarding latency where required, and allocate an explicit budget table with acceptance thresholds.
Pass criteria: sum of per-hop p95 contributions ≤ X ns, and no single hop exceeds Y ns p95 under load ≤ Z%.
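The explicit budget table in the fix can be checked mechanically. The sketch below follows the pass criterion as written (a linear sum of per-hop p95 contributions plus a per-hop cap); the function name, hop labels, and limits are hypothetical.

```python
def check_hop_budget(hop_p95_ns, total_limit_ns, per_hop_limit_ns):
    """Evaluate per-hop p95 contributions against total and per-hop limits.

    hop_p95_ns maps a hop label to its measured p95 contribution in ns.
    """
    total = sum(hop_p95_ns.values())
    over = sorted(h for h, v in hop_p95_ns.items() if v > per_hop_limit_ns)
    worst = max(hop_p95_ns, key=hop_p95_ns.get)
    return {
        "total_ns": total,
        "worst_hop": worst,
        "hops_over_limit": over,
        "passed": total <= total_limit_ns and not over,
    }
```

Summing p95 values linearly is a conservative choice when per-hop errors are independent; the point of the table is that the worst hop is named, so calibration effort goes to the segment that consumes the budget.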
Measurements look “stable”, but the absolute offset is consistently biased — stable-but-wrong fixed delays or asymmetry?
Likely cause: a fixed-delay baseline or asymmetry term is incorrect but consistent, producing a stable bias that does not show as wander.
Quick check: validate against a known reference path or swap direction (where possible) to see if bias changes sign/magnitude; verify calibration table units and applied coefficients.
Fix: re-run fixed-delay/asymmetry calibration with strict validity rules; lock table identity to module/path and re-verify at two temperatures.
Pass criteria: absolute offset ≤ X ns over Y minutes and remains within ±Z ns across ΔT ≤ A°C.
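Why an uncalibrated asymmetry produces a stable bias rather than wander follows from the standard two-way exchange. The symbols are assumptions introduced here: $t_1$ and $t_4$ are master-side transmit and receive timestamps, $t_2$ and $t_3$ are the far-end receive and transmit timestamps, $\theta$ is the true clock offset, and $d_{ms}$, $d_{sm}$ are the one-way delays in each direction.

```latex
t_2 - t_1 = d_{ms} + \theta, \qquad t_4 - t_3 = d_{sm} - \theta

\hat{\theta} = \frac{(t_2 - t_1) - (t_4 - t_3)}{2}
             = \theta + \frac{d_{ms} - d_{sm}}{2}
```

The round-trip sum $(t_2 - t_1) + (t_4 - t_3) = d_{ms} + d_{sm}$ contains no information about the difference between the two directions, so the estimator cannot see the error: for example, 200 ps of uncalibrated asymmetry yields a constant 100 ps offset bias with no visible wander.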
Timing shifts after a firmware update even though hardware is unchanged — timestamp path/tap change or calibration schema mismatch?
Likely cause: the update changed the timestamp capture boundary, CDC/FIFO behavior, or coefficient load order; or it reads an older calibration table with mismatched schema/units.
Quick check: compare pre/post update: tap-path hash, FIFO depth constraints, calibration table version/hash, and the state transition timeline around “trusted-time”.
Fix: pin the tap-path contract across releases, migrate calibration schema with explicit unit checks, and block trusted-time until coefficients are applied and stable.
Pass criteria: no update-induced offset step > X ns across Y reboots; calibration table validation success = 100%.
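The "explicit unit checks" in the fix can be sketched as a validation step that runs before any coefficient is applied. The schema layout, field names, and version number below are hypothetical and exist only to show the shape of the check.

```python
# Hypothetical schema expected by the running firmware build.
EXPECTED_SCHEMA = {
    "schema_version": 2,
    "fields": {"tx_delay": "ns", "rx_delay": "ns", "temp_coeff": "ns/degC"},
}


def validate_table(table, expected=EXPECTED_SCHEMA):
    """Return a list of problems; an empty list means safe to apply."""
    problems = []
    if table.get("schema_version") != expected["schema_version"]:
        problems.append("schema_version mismatch")
    stored = table.get("fields", {})
    for name, unit in expected["fields"].items():
        entry = stored.get(name)
        if entry is None:
            problems.append(f"missing field: {name}")
        elif entry.get("unit") != unit:
            problems.append(f"unit mismatch for {name}")
    return problems
```

Blocking trusted-time while `validate_table` reports any problem prevents a table written in one unit from being silently applied in another after a firmware update.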