White-Rabbit-Style Timing: ps-Level Phase Alignment

Q: Lock looks “OK” but phase slowly walks over hours—what should be checked first?

Likely cause: Unmodeled temperature-driven delay/asymmetry drift (SFP + fiber + chassis gradients) slowly biases one-way delay.
Quick check: Log phase_error(t) together with SFP case temperature and node board temperature, then compute phase slope (ps/°C) and hysteresis over ≥2 thermal cycles.
Fix: Add runtime temperature compensation (linear/LUT + guardband) and enforce a thermal management/placement rule to reduce gradients across transceivers.
Pass criteria: Steady-state RMS phase error ≤ X ps over window T (e.g., 100–1000 s) and temperature slope ≤ Y ps/°C across the specified ambient range.
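The quick check above can be scripted. The following is a minimal sketch, assuming the log has already been reduced to settled phase error readings (ps) at a few SFP case temperatures on one heating ramp and one cooling ramp. The function names and sample numbers are hypothetical and only illustrate the slope and hysteresis calculation.

```python
def fit_slope(temps_c, phase_ps):
    """Ordinary least-squares fit: returns (slope in ps/°C, intercept in ps)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_p = sum(phase_ps) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(temps_c, phase_ps))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_t


def hysteresis_ps(heating, cooling):
    """Largest heating-vs-cooling phase gap at temperatures present in both ramps.

    `heating` and `cooling` map temperature (°C) to phase error (ps).
    """
    shared = set(heating) & set(cooling)
    return max(abs(heating[t] - cooling[t]) for t in shared)


# Hypothetical settled readings from one thermal cycle.
heating = {20.0: 0.0, 30.0: 12.0, 40.0: 24.0}
cooling = {40.0: 24.0, 30.0: 15.0, 20.0: 3.0}

temps = list(heating) + list(cooling)
phases = list(heating.values()) + list(cooling.values())
slope, intercept = fit_slope(temps, phases)
hyst = hysteresis_ps(heating, cooling)
print(f"slope = {slope:.3f} ps/°C, hysteresis = {hyst:.1f} ps")
```

Compare `slope` against Y and `hyst` against the guardband. Repeating the calculation on a second thermal cycle shows whether both numbers are repeatable.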

Q: After reboot, phase offset changes by a fixed step—what is the most common non-fixed-latency culprit?

Likely cause: A state-dependent latency element (FIFO depth, CDC elastic buffer, serdes deskew, or queueing) changes its internal state across boots.
Quick check: Run N reboot cycles and record phase landing points; multiple discrete clusters (not noise) indicate hidden variable-latency states.
Fix: Force deterministic pipeline configuration (fixed FIFO depth / bypass elastic buffers / lock timestamp point) and add a boot-time self-test that rejects multi-cluster landing.
Pass criteria: Across N reboots, landing spread ≤ X ps and the distribution remains single-cluster (no discrete steps larger than X ps).

Q: Fiber link replaced with the same length, yet offset shifts—what parameter must be re-bound?

Likely cause: Calibration identity changed (SFP Tx/Rx latency deltas, connector pair, patch-cord dispersion), so “same length” does not preserve one-way delay.
Quick check: Compare pre/post replacement: SFP serials, stored ΔTx/ΔRx, and baseline fiberDelay0; if serials differ, assume re-binding is required.
Fix: Re-run the install calibration step and bind calibration parameters to the exact SFP serial + port + patch set; reject “unknown identity” links at runtime.
Pass criteria: After re-binding, one-way offset shift ≤ X ps compared to baseline and remains within X ps across a re-plug/restart sanity loop.

Q: Calibration passes at room temperature but fails in a thermal sweep—what should be logged first?

Likely cause: Temperature-dependent latency/asymmetry drift exceeds the guardband because the calibration model does not include the dominant thermal driver.
Quick check: Log phase error, SFP temperature, board temperature, and ambient temperature at a fixed sampling interval, then fit phase vs temperature with slope + hysteresis.
Fix: Promote the strongest temperature proxy (often SFP case) into the runtime compensation model and enlarge guardband for unmodeled gradient terms.
Pass criteria: Thermal sweep slope ≤ Y ps/°C and max deviation from model ≤ X ps across the full sweep rate and stabilization rules used in production.

Q: Bidirectional delay looks stable, but one-way is wrong—how can asymmetry be detected quickly?

Likely cause: RTT stability hides a constant one-way bias because Tx and Rx latencies differ and do not cancel without a valid asymmetry model.
Quick check: Swap A/B transceivers between ends (or reverse the fiber pair if available) and observe whether one-way offset changes by ~2× the suspected asymmetry term.
Fix: Calibrate and store ΔTx/ΔRx per module identity, and treat any identity change as invalidating one-way accuracy until re-calibrated.
Pass criteria: Across swap/reverse tests, one-way offset remains within ±X ps of the modeled value and does not jump by more than X ps per identity-preserving reconnect.

Q: Failover causes a time step—how can switching be made bounded?

Likely cause: Standby path is not pre-calibrated/phase-tracked, so switchover applies an unknown delay/asymmetry state as an instantaneous step.
Quick check: Measure failover_step under controlled A→B→A cycles while logging standby calibration validity, holdover window, and phase continuity flags.
Fix: Continuously calibrate and phase-track standby, then switch only inside a verified holdover window and apply a bounded step acceptance policy (reject/rollback if exceeded).
Pass criteria: Failover step ≤ Z ps (or ≤ Z ns by system spec) and recovery to steady-state RMS ≤ X ps occurs within the specified lock time budget.

Q: “ps-level” claim is not reproducible across instruments—what is the measurement trap?

Likely cause: Instrument noise floor, trigger jitter, or timebase discipline differences dominate the observed error instead of the system under test.
Quick check: Run a loopback/control measurement to estimate the instrument floor, and force common settings (same reference, same bandwidth limit, same gating window and statistics).
Fix: Standardize a metrology recipe (reference distribution, trigger method, cabling symmetry, statistics window) and publish the instrument floor alongside system results.
Pass criteria: Two independent instruments agree within ±X ps after subtracting/confirming floors, and the measured floor is ≤ (X/2) ps over the same gating window.

Q: Cleaner/PLL change improves jitter but worsens alignment—what bandwidth interaction is likely?

Likely cause: Loop bandwidth shifts the jitter/wander trade-off: narrowing reduces random jitter but can increase low-frequency phase wander or slow temperature tracking.
Quick check: Measure phase error PSD or time-domain wander with two BW settings and compare: (a) short-window RMS jitter and (b) long-window drift/slope.
Fix: Set bandwidths by requirement split (random jitter vs wander) and add guardband so temperature tracking does not exceed alignment error budget.
Pass criteria: Random jitter meets the endpoint limit while long-window phase wander stays ≤ X ps over T and thermal slope stays ≤ Y ps/°C under the expected gradient.

Q: Two nodes align well on the bench, but drift in the chassis—what mechanical/thermal gradient path dominates?

Likely cause: Chassis airflow and placement create persistent temperature gradients across transceivers and local clocking, producing repeatable phase drift not seen on an open bench.
Quick check: Map temperature at SFPs, nearby regulators, and FPGA/clock areas while logging phase; correlate drift with ΔT between ends, not just absolute ambient.
Fix: Reduce gradient (placement, airflow baffles, thermal coupling) and promote gradient-aware compensation (use the most correlated temperature sensors).
Pass criteria: Under worst-case airflow/heat load, drift stays within X ps over T and the slope vs ΔT stays within Y ps/°C with no hysteresis-induced runaway.

Q: Packet load changes alignment—what does that imply about the timestamp point?

Likely cause: Timestamping is occurring after a variable-latency stage (queue, store-and-forward, DMA scheduling), so traffic modulates the effective timing reference.
Quick check: Sweep traffic load (idle → max) while logging phase error; if phase tracks load with repeatable slope, the timestamp point is not at a fixed-latency boundary.
Fix: Move timestamping to PHY/near-PHY fixed-latency points, isolate timing traffic from congested queues, and lock forwarding to deterministic paths.
Pass criteria: Across the traffic sweep, phase shift ≤ X ps and any load-correlated component ≤ (X/2) ps after the timestamp point is fixed.


White-Rabbit-style timing achieves ps-level phase alignment by combining a deterministic (fixed-latency) data path with bidirectional link delay calibration and a closed-loop servo that tracks temperature and aging.

This page turns the concept into an engineerable workflow: delay/asymmetry modeling, factory→field→runtime calibration, verification criteria, and failover rules that keep time steps bounded.

What “White-Rabbit-Style Timing” Means (Definition + When It’s Worth It)

White-Rabbit-style timing targets phase alignment (ps-level) across distributed nodes, not just time synchronization (ns–µs-level) of timestamps. The core engineering problem is turning link delay—especially asymmetry and state-dependent latency—into quantities that are measurable, calibratable, and verifiable.

“Style” = a set of hard engineering constraints (not a brand)

Bidirectional link calibration: isolate and compensate one-way delay components and asymmetry (ΔTx/ΔRx, fiber effects).
Fixed-latency data path: prevent queue/FIFO/state changes from moving the timestamp point and introducing variable delay.
Phase measurement + servo: measure phase error with a low noise floor and close the loop to keep nodes phase-aligned.

Three targets that must not be mixed up

Time offset (ns)

Observable: 1PPS / timestamp offset. Action: hardware timestamping + delay compensation. Typical outcome: ns-class alignment.

Phase alignment (ps)

Observable: phase error / phase wander of continuous clocks. Action: fixed-latency path + calibrated asymmetry + phase servo.

Frequency syntonization (ppb)

Observable: frequency error and short-term stability. Action: disciplined local oscillator or recovered frequency as the servo base.

When it’s worth paying the complexity cost

Coherent / phased measurements (distributed digitizers, arrays): phase error directly reduces coherent gain and spectral purity.
Distributed triggering and time tagging: uncalibrated asymmetry becomes a systematic error that does not average out.
Remote sampling clocks: for high-frequency inputs, ps-level phase error behaves like sampling-time uncertainty.

Scope guardrails (prevents cross-page overlap)

This page focuses on

bidirectional calibration, fixed-latency paths, phase measurement/servo closure, and verification hooks for ps-level alignment.

This page avoids

full PTP protocol stack history/details and SyncE class deep dives; only hardware-loop implications are referenced.

Diagram #1. “Time sync” aligns timestamps (offset), while WR-style timing enforces calibrated, fixed-latency behavior to close a phase servo loop for ps-level alignment.

System Architecture: Grandmaster, WR Switch, WR Node, and the Calibrated Link

A White-Rabbit-style system is a calibrated timing chain. It treats the link as a measurable object (with known fixed components and tracked drift), then uses a phase-aware servo to keep all nodes aligned. The key is to define where timestamps are taken and which parts of the data path are fixed-latency.

The four architectural primitives

Grandmaster (GM)

Defines the reference time/frequency domain. Quality is ultimately limited by its stability and noise, but ps-level success depends on calibrated links and deterministic paths.

WR Switch

Provides a timing-aware forwarding domain. The timing-critical function is maintaining a fixed-latency behavior and enabling propagation of calibration parameters and status.

WR Node

Converts calibrated link measurements into a controlled local clock output (continuous clock phase, plus optional 1PPS/10 MHz). Contains the measurement engine and servo.

Calibrated Link

The link is treated as a delay model: measurable round-trip terms plus calibrated one-way components (ΔTx/ΔRx, asymmetry). Temperature and replacement events must be tracked.

“Two channels” concept: not necessarily two physical cables

Data + timestamp channel: packet exchange with timestamps taken at defined hardware points (TS points).
Delay/phase calibration channel: mechanisms and parameters that make one-way delay observable/compensable (asymmetry, ΔTx/ΔRx, drift).

Where ps-level alignment is won or lost (architecture-level checks)

Timestamp points must be stable: a timestamp that moves due to buffering/queue state converts network load into timing error.
Fixed-latency path must be defined: any hidden FIFO / variable retiming becomes “random delay,” which no calibration can fully remove.
Calibration parameters must be governed: binding parameters to modules (SFP/PHY), versioning, and re-calibration rules for replacements/temperature.

Diagram #2. The system map highlights fixed timestamp points (TS), fixed-latency behavior, and calibration parameters—the anchors required for ps-level alignment.

Link Delay Model: One-Way Delay, Asymmetry, and What Must Be Calibrated

Bidirectional measurements reveal a round-trip combination of delays. Ps-level alignment requires turning that combination into a one-way model with explicit components: fiber propagation, endpoint Tx/Rx chains, and an asymmetry term that does not average out.

Practical component model (engineering language)

One-way delay (A → B)

fiber propagation + Tx chain (A) + Rx chain (B) + asymmetry contribution

Round-trip (A ↔ B)

measurable as a stable combination, but not sufficient to uniquely determine one-way terms without calibration priors
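To make the role of the calibration priors concrete, here is a minimal sketch of turning a round-trip measurement into a one-way estimate. It assumes the fiber asymmetry is expressed as a coefficient `alpha` with fiber_ab = (1 + alpha) × fiber_ba, a convention borrowed from the commonly published White Rabbit link model rather than from this page. All parameter names and numbers are hypothetical.

```python
def one_way_delay_ps(round_trip_ps, d_tx_a, d_rx_a, d_tx_b, d_rx_b, alpha=0.0):
    """Estimate the A->B one-way delay (ps) from a round-trip measurement.

    round_trip_ps  : measured A->B->A delay between timestamp points
    d_tx_*, d_rx_* : calibrated endpoint chain latencies (ps)
    alpha          : fiber asymmetry, fiber_ab = (1 + alpha) * fiber_ba
    """
    fixed = d_tx_a + d_rx_b + d_tx_b + d_rx_a
    fiber_total = round_trip_ps - fixed              # fiber_ab + fiber_ba
    fiber_ab = fiber_total * (1.0 + alpha) / (2.0 + alpha)
    return d_tx_a + fiber_ab + d_rx_b


# Hypothetical link: about 1 km of fiber, unequal endpoint chains.
rtt = 10_047_000.0
estimate = one_way_delay_ps(rtt, d_tx_a=10_000, d_rx_a=12_000,
                            d_tx_b=11_000, d_rx_b=13_000, alpha=0.0002)
naive = rtt / 2.0                                     # ignores all asymmetry
naive_error = estimate - naive
print(f"one-way = {estimate:.1f} ps, RTT/2 is off by {naive_error:.1f} ps")
```

In this made-up case the plain RTT/2 estimate is wrong by about 500 ps, and that error stays constant no matter how long the round trip is averaged.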

Why “no calibration” gets stuck at ns–tens of ns

Asymmetry is a systematic error: Tx ≠ Rx latencies, wavelength/module differences, and thermal gradients create a one-way bias that does not vanish with averaging.
State-dependent latency is a random error: variable buffering/retiming (FIFO, queue state, clock-domain crossings) converts traffic and internal state into timing wander and non-repeatable offsets.
Ps-level demands explicit ownership of error terms: each major delay contributor must be either measured online, calibrated offline, or bounded with guardband.

Actionable breakdown: what can be measured online vs what must be calibrated

Online measurable (runtime observables)

round-trip delay trend (RTT) and stability
phase error / phase wander slope over time
repeatability after restart (step detection)

Factory calibrated (needs binding)

ΔTx / ΔRx endpoint latencies (SFP/PHY/MAC/PCB chain)
module/port-specific constants (serial-bound)
baseline offsets for fixed, deterministic paths

Thermally sensitive (model + guardband)

SFP temperature and drift coupling
fiber environmental gradients (route / airflow / enclosure)
mechanical and thermal changes after replacement events

Diagram #3. Round-trip measurements are not enough for ps-level one-way accuracy without calibrated Tx/Rx chain terms and controlled asymmetry.

Bidirectional Link Calibration Workflow (Factory + Field + Temperature)

Ps-level alignment requires a calibration lifecycle, not a one-time measurement. A practical workflow separates what must be learned at the factory (endpoint chain deltas), what must be established on installation (fiber baseline), and what must be tracked in operation (thermal drift and replacement events).

Three-layer calibration (minimum viable process)

Factory (build-time)

Measure ΔTx/ΔRx for endpoint chains and bind them to module/port identity (serial-bound). Store with version and traceability.

Install (deployment-time)

Establish fiber delay baseline for the installed path (fiber + connectors + jumpers). Record link identity and baseline timestamp.

Runtime (operation-time)

Track thermal drift, detect non-repeatable steps, and raise alarms. Apply a temperature model (linear/LUT) and guardband; trigger re-calibration or failover when needed.

Temperature strategy: drift tracking must be explicit

SFP temperature is often the fastest-changing contributor and should be logged with phase wander.
Enclosure gradients create direction-dependent drift (asymmetry) and must be bounded with guardband when not fully modelable.
Model levels: threshold-only alarms → linear tempCoef → LUT (segmented). Choose based on ps budget and environment dynamics.
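The linear and LUT model levels can be sketched as follows. This is an illustrative example only: the table values, the reference temperature, and the function names are assumptions, and a real table would come from the logged thermal sweep.

```python
from bisect import bisect_right


def linear_correction_ps(temp_c, temp_ref_c, temp_coef_ps_per_c):
    """Linear model: correction grows with distance from the reference temperature."""
    return temp_coef_ps_per_c * (temp_c - temp_ref_c)


def lut_correction_ps(temp_c, lut):
    """Segmented model: piecewise-linear interpolation in a sorted (°C, ps) table.

    Temperatures outside the table are clamped to the nearest end point.
    """
    temps = [t for t, _ in lut]
    if temp_c <= temps[0]:
        return lut[0][1]
    if temp_c >= temps[-1]:
        return lut[-1][1]
    i = bisect_right(temps, temp_c)
    (t0, p0), (t1, p1) = lut[i - 1], lut[i]
    return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)


# Hypothetical table: phase offset (ps) relative to the 25 °C calibration point.
LUT = [(0.0, -30.0), (25.0, 0.0), (50.0, 20.0), (70.0, 50.0)]

raw_phase_ps = 14.0
compensated_ps = raw_phase_ps - lut_correction_ps(37.5, LUT)
print(compensated_ps)
```

Whatever remains after subtraction (here the residual between the raw reading and the table) is what the guardband has to cover.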

Timing parameter governance (keeps calibration valid in the field)

Version + traceability

Store calVersion, fixture identity, timestamp, and measurement repeatability. Enable rollback when a new model increases drift.

Binding rules

Bind ΔTx/ΔRx and related constants to module/port serial IDs. Replacement of a bound item is a calibration invalidation event.

Re-cal triggers

Trigger re-calibration or failover on: module change, fiber path change, non-repeatable offset step, or drift slope exceeding guardband.
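One way to picture the binding rules and re-cal triggers together is a calibration record that carries its own identity and limits. The sketch below is hypothetical: the field names, units, and limit values are assumptions chosen for illustration, not a defined record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalRecord:
    """Calibration constants bound to the identity they were measured on."""
    cal_version: str
    sfp_serial: str
    port_id: str
    fiber_path_id: str
    delta_tx_ps: float
    delta_rx_ps: float
    fiber_delay0_ps: float
    step_limit_ps: float             # largest offset step still treated as repeatable
    slope_guardband_ps_per_h: float  # largest tolerated drift slope


def recal_reasons(cal, sfp_serial, port_id, fiber_path_id,
                  offset_step_ps, drift_slope_ps_per_h):
    """Return every trigger that invalidates `cal`; an empty list means still valid."""
    reasons = []
    if (sfp_serial, port_id) != (cal.sfp_serial, cal.port_id):
        reasons.append("module change")
    if fiber_path_id != cal.fiber_path_id:
        reasons.append("fiber path change")
    if abs(offset_step_ps) > cal.step_limit_ps:
        reasons.append("non-repeatable offset step")
    if abs(drift_slope_ps_per_h) > cal.slope_guardband_ps_per_h:
        reasons.append("drift slope exceeds guardband")
    return reasons


cal = CalRecord("v3", "SFP-0001", "port2", "rackA-rackB", 10_000.0, 12_000.0,
                5_000_000.0, step_limit_ps=5.0, slope_guardband_ps_per_h=2.0)

ok = recal_reasons(cal, "SFP-0001", "port2", "rackA-rackB", 1.0, 0.5)
swapped = recal_reasons(cal, "SFP-0002", "port2", "rackA-rackB", 40.0, 0.5)
print(ok, swapped)
```

Returning all reasons rather than the first one keeps the event log useful when a module swap and an offset step show up together.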

Diagram #4. A three-stage calibration lifecycle separates factory constants, installation baselines, and runtime drift governance to keep ps-level alignment valid over time.

Fixed-Latency Data Path: Where Determinism Is Won or Lost

Ps-level alignment fails most often when the timestamp boundary crosses a state-dependent path. “Fixed-latency” means the effective delay between timestamp points remains repeatable across load, congestion, reset, and temperature—rather than only looking stable on average.

Fixed-latency boundary (practical definition)

TS1 → TS2 delay must not change with buffer depth, arbitration outcomes, or recovered-clock state.
The requirement is repeatability (single-cluster distribution), not merely a low average offset.
Any variable stage inside TS1–TS2 converts traffic/implementation state into timing error that calibration cannot remove.

Common sources of state-dependent delay (what breaks determinism)

SerDes / PCS / clock recovery

elastic buffers / rate matching / gearbox state
recovered-clock relock events causing step-like delay changes
retiming paths that depend on alignment state

PHY/MAC boundary and timestamp placement

MAC-side timestamps exposed to DMA, caching, or arbitration
timestamp points that shift with pipeline depth or mode
ambiguous TS definition (ingress vs egress stage not fixed)

FIFO / CDC / buffering

asynchronous FIFO fill-level changes → delay modulation
clock-domain crossings inside TS boundary
store-and-forward stages hidden behind “transparent” interfaces

Switch forwarding (phenomena + countermeasures only)

queueing and congestion → multi-modal delay distribution
priority arbitration → delay depends on traffic mix
countermeasure: isolate timing flows; avoid shared congested domains

Verification hooks: how to detect a non-deterministic path quickly

Load sweep

Sweep background traffic from idle to stressed and inspect whether offset/phase error distribution widens or becomes multi-cluster.

Priority injection

Introduce bursty traffic and check for step-like timing changes correlated with queue/arb events.

Restart repeatability

Repeat link bring-up and verify the resulting offset forms a single stable cluster; discrete “bands” indicate hidden state-dependent stages.
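The single-cluster test used by these hooks can be approximated with a simple gap rule. The sketch below is one possible check, assuming the landing offsets have been collected in ps and that a gap threshold well above the noise spread is known; the threshold and the sample values are made up.

```python
def split_clusters(landings_ps, gap_ps):
    """Group landing points; a new cluster starts when the gap to the
    previous sorted point exceeds gap_ps."""
    points = sorted(landings_ps)
    clusters = [[points[0]]]
    for p in points[1:]:
        if p - clusters[-1][-1] > gap_ps:
            clusters.append([p])
        else:
            clusters[-1].append(p)
    return clusters


def spread_ps(cluster):
    """Peak-to-peak spread of one cluster."""
    return max(cluster) - min(cluster)


# Hypothetical offsets after eight restarts: two land about 800 ps away.
landings = [3.1, -2.0, 1.5, 801.2, 0.4, 798.9, -1.1, 2.2]
clusters = split_clusters(landings, gap_ps=50.0)
single_cluster = len(clusters) == 1
print(len(clusters), [round(spread_ps(c), 1) for c in clusters])
```

Two clusters separated by a fixed step, as in this example, point to a discrete hidden state rather than noise, which is exactly the signature described above.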

Scope guardrail: this section focuses only on fixed-latency behavior and timestamp boundaries (not a full Ethernet QoS design guide).

Diagram #5. Any state-dependent stage inside TS1–TS2 converts traffic and internal state into timing error. Ps-level alignment requires a deterministic boundary.

Phase Measurement Engine: DDMTD/TDC and Why ps-Level Is Hard

Ps-level alignment requires a phase measurement engine with a noise floor below the target. The measurement must separate true system phase error from measurement chain noise, using repeatable statistics rather than relying on a visually “clean” 1PPS trace.

Two engineering routes to ps-resolution (selection view)

DDMTD (beat-frequency phase amplification)

uses a low-frequency beat to make tiny phase differences observable
excellent for long-window statistics and drift slope tracking
requires clean clock-domain management and repeatable sampling
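A short worked example shows where the phase amplification comes from. It assumes the usual DDMTD arrangement in which a helper clock runs at N/(N+1) times the measured clock; the 62.5 MHz clock and N = 2^14 are illustrative values, not figures taken from this page.

```python
def ddmtd_numbers(f_clk_hz, n):
    """Return (helper frequency, beat frequency, time step per helper sample in ps)."""
    f_helper_hz = f_clk_hz * n / (n + 1)
    f_beat_hz = f_clk_hz - f_helper_hz     # equals f_clk / (n + 1)
    # Each helper sample slides by T_helper - T_clk = T_clk / n along the input clock.
    step_ps = 1e12 / (f_clk_hz * n)
    return f_helper_hz, f_beat_hz, step_ps


def phase_offset_ps(helper_cycles_between_edges, f_clk_hz, n):
    """Convert a count of helper cycles between two beat edges into input-side time."""
    return helper_cycles_between_edges * 1e12 / (f_clk_hz * n)


f_helper, f_beat, step = ddmtd_numbers(62.5e6, 2 ** 14)
print(f"beat = {f_beat:.1f} Hz, step = {step:.4f} ps")
print(f"100 helper cycles = {phase_offset_ps(100, 62.5e6, 2 ** 14):.2f} ps")
```

The step computed here is only the quantization of the counter. The effective noise floor is set by clock jitter and sampling behavior, and has to be measured.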

TDC (direct time interval conversion)

quantizes time intervals into digital codes (event/edge friendly)
must manage quantization noise and nonlinearity across temperature
verification focuses on effective noise floor, not nominal resolution

Why ps-level is hard: measurement floor vs system jitter

Observed phase noise is the sum of true system phase error and measurement chain noise.
A clean 1PPS waveform does not prove ps-repeatability; ps validation requires windowed statistics and repeatable experiments.
If the measurement floor dominates, improvements in the actual link/servo will not appear in the results, masking real engineering progress.
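If the system error and the measurement noise are uncorrelated, their variances add, so the RMS values combine in quadrature. That independence is an assumption, and the sketch below only applies under it; the function name and numbers are illustrative.

```python
import math


def system_rms_ps(observed_rms_ps, floor_rms_ps):
    """Estimate system RMS by removing the measurement floor in quadrature.

    Returns None when the floor is at or above the observation, because the
    measurement then says nothing reliable about the system.
    """
    if floor_rms_ps >= observed_rms_ps:
        return None
    return math.sqrt(observed_rms_ps ** 2 - floor_rms_ps ** 2)


print(system_rms_ps(5.0, 3.0))   # floor is visible but not dominant
print(system_rms_ps(3.0, 3.2))   # floor dominates: no usable estimate
```

When the function returns None, the useful next step is to lower the floor rather than to keep tuning the link or servo.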

Verification hooks (to confirm the measurement reflects the system)

Noise-floor check

Feed both inputs from the same source to estimate measurement floor and confirm the estimator produces a tight, single-cluster distribution.

Windowed statistics

Use short windows to characterize jitter and longer windows to characterize drift/wander; compare slope changes under controlled perturbations.

A/B isolation

Change one variable (reference quality, fixed-latency stage, or traffic isolation) and confirm the phase distribution responds in the expected direction.

Load/Temp perturbation

Apply small load or temperature perturbations and verify the response matches the deterministic boundary assumptions defined in the Fixed-Latency Data Path section.
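The windowed-statistics hook can be sketched in a few lines: RMS inside each short window characterizes jitter, and the change in window means across the record characterizes drift. The sample data, window length, and function names below are hypothetical.

```python
import math


def window_stats(samples, window):
    """Return (mean, RMS about the mean) for each non-overlapping window."""
    stats = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        mean = sum(chunk) / window
        rms = math.sqrt(sum((x - mean) ** 2 for x in chunk) / window)
        stats.append((mean, rms))
    return stats


def drift_slope(stats, window, dt_s):
    """Slope between the first and last window means, in ps/s."""
    elapsed_s = (len(stats) - 1) * window * dt_s
    return (stats[-1][0] - stats[0][0]) / elapsed_s


# Hypothetical record: 0.5 ps/sample ramp plus a +/-1 ps disturbance, 1 s spacing.
disturbance = [-1, 1, 1, -1, -1, 1, 1, -1]
samples = [0.5 * k + n for k, n in enumerate(disturbance)]

stats = window_stats(samples, window=4)
slope = drift_slope(stats, window=4, dt_s=1.0)
print([round(r, 3) for _, r in stats], slope)
```

Note that the in-window RMS still contains a small contribution from the ramp itself, so the window should be short compared with the drift time scale.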

Scope guardrail: this section focuses on engineering selection and validation hooks (not circuit-level DDMTD/TDC implementation details).

Diagram #6. DDMTD makes tiny phase differences observable via a low-frequency beat, then estimates phase using filtering and windowed statistics.

Servo Design: Frequency Syntonization + Phase Alignment as Two Coupled Loops

Ps-level alignment is achieved by separating error sources into two control objectives: frequency syntonization suppresses long-term drift, while phase alignment corrects residual phase error without importing link or measurement noise. The most common failure mode is a loop split that forces the phase loop to fight drift, or a bandwidth choice that injects noise into the actuator.

Two objectives (keep them distinct)

Frequency syntonization (inner loop)

tracks the recovered clock / SyncE / local OCXO to control drift slope
defines holdover behavior and long-window stability
should reject fast link noise and measurement noise

Phase alignment (outer loop)

uses phase estimate (DDMTD/TDC) to remove residual phase error
must avoid “pulling in” link jitter or estimator floor
should converge to a single-cluster phase distribution

Bandwidth selection: actionable tuning rules (no math)

Inner loop BW_f (frequency)

too wide: imports link noise → short-window phase noise rises
too narrow: cannot follow thermal drift → drift slope increases
tune to control long-window drift while keeping short-window jitter near the oscillator floor

Outer loop BW_p (phase)

too wide: injects estimator floor/link jitter → phase distribution broadens
too narrow: slow convergence → lock time grows, temp tracking lags
increase BW_p only until lock time meets requirements without raising the phase noise floor
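The too-wide versus too-narrow trade-off can be seen in a toy first-order tracking loop, where the loop gain stands in for bandwidth. This is a deliberately simplified illustration and not the servo of any real WR implementation: the update rule, gain values, and noise level are all assumptions.

```python
def ramp_lag_ps(drift_ps_per_step, gain, steps):
    """Simulate x += gain * (target - x) on a ramp; return the final tracking error."""
    x = 0.0
    err = 0.0
    for k in range(steps):
        err = drift_ps_per_step * k - x    # error seen by the loop at step k
        x += gain * err
    return err


def noise_passed_rms_ps(noise_rms_ps, gain):
    """RMS of white measurement noise that reaches the loop output.

    For this first-order loop the output variance is gain / (2 - gain)
    times the input variance.
    """
    return noise_rms_ps * (gain / (2.0 - gain)) ** 0.5


for gain in (0.1, 0.01):
    lag = ramp_lag_ps(0.2, gain, steps=5000)
    noise = noise_passed_rms_ps(10.0, gain)
    print(f"gain={gain}: drift lag = {lag:.2f} ps, passed noise = {noise:.2f} ps RMS")
```

Lowering the gain by 10× cuts the passed noise by roughly 3× but increases the lag behind a steady drift by 10×, which is the same tension the tuning rules above describe.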

Guardband and lock criteria (engineering-facing)

Guardband

Budget the residual uncertainty that is not fully calibratable (thermal gradients, aging, module swaps, environment changes). Guardband defines how much phase change is acceptable during tuning, holdover, and failover.

Lock criteria

A valid lock is indicated by a single-cluster phase distribution, stable drift slope under steady conditions, and repeatable convergence after restart—rather than a state flag alone.

Scope guardrail: this section provides practical loop-splitting and tuning principles (not PLL stability math or circuit-level actuator design).

Diagram #7. Separate the objectives: the frequency loop controls drift (BW_f), while the phase loop removes residual phase error (BW_p) without importing measurement or link noise.

Network Design Patterns: Topology, Redundancy, and Failover Without Time Steps

Timing-grade redundancy is not achieved by merely adding a second path. The goal is hitless or bounded-step switching: both paths must be calibrated, the failover state machine must be stable, and the holdover window must prevent transient link artifacts from becoming phase steps.

Topology patterns (timing view, not throughput view)

Star / shallow tree

Clear calibration boundaries and smaller error accumulation. Prefer fewer hops when ps-level repeatability is the top requirement.

Cascaded switches

Each hop must preserve fixed-latency behavior and propagate calibration parameters consistently. Verify determinism hop-by-hop under traffic.

Long chains

Higher risk of drift accumulation and multi-fault propagation. Use only when boundaries, monitoring, and failover constraints are strongly enforced.

Redundancy requirements for bounded-step switching

Calibrate both paths: standby must be pre-calibrated and continuously validated.
Stable state machine: avoid oscillatory switching; use explicit enter/exit criteria.
Holdover window: during transition, prevent transient artifacts from becoming phase corrections.
Guardband-aware policy: switching acceptance is defined by guardband and repeatability, not a status bit.

Switching criteria (timing observables)

Trigger

loss-of-lock alarms, phase error trend exceeding guardband, missing calibration validity, or drift slope outside the allowed envelope.

Acceptance

post-switch phase distribution returns to a single cluster and remains within guardband under load and temperature perturbations.
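A bounded-step policy separates the decision to switch from the decision to keep the result. The following sketch shows that split under assumed inputs; the flag names, the step limit, and the verdict labels are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"        # step within bounds, stay on the new path
    ROLLBACK = "rollback"    # step too large, return to the previous path


def may_switch(standby_cal_valid, standby_phase_tracked, in_holdover_window):
    """Precondition: never switch to a path whose delay state is unknown."""
    return standby_cal_valid and standby_phase_tracked and in_holdover_window


def judge_step(phase_before_ps, phase_after_ps, step_limit_ps):
    """Post-switch check: accept only a bounded phase step."""
    step_ps = abs(phase_after_ps - phase_before_ps)
    return Verdict.ACCEPT if step_ps <= step_limit_ps else Verdict.ROLLBACK


if may_switch(standby_cal_valid=True, standby_phase_tracked=True,
              in_holdover_window=True):
    print(judge_step(phase_before_ps=2.0, phase_after_ps=9.5, step_limit_ps=20.0))
```

Logging both the precondition flags and the measured step gives the acceptance record that the switching criteria call for.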

Scope guardrail: this section defines timing-layer switching logic and criteria (not general network redundancy protocols).

Diagram #8. Timing-grade failover requires both paths to be calibrated and validated. Switching must be governed by a stable state machine and accepted only if the resulting phase remains bounded and repeatable.

Verification & Metrology: How to Prove ps-Level Alignment (and Not Fool Yourself)

Ps-level claims must be backed by statistics and controlled comparisons, not a clean-looking waveform or a lock flag. A valid verification plan spans time, temperature, restart repeatability, and failover behavior with logged conditions and a reportable pass template.

What to measure (the minimum set)

Steady-state phase error

RMS / percentile phase error within a defined time window
single-cluster distribution (no multi-step modes)

Phase wander & drift slope

hour/day-scale trends and low-frequency excursions
correlation against temperature and configuration changes

Holdover drift

phase/time error growth during loss of link / reference
re-acquisition behavior after restoring the path

Temperature sensitivity

phase vs temperature slope and hysteresis
SFP swap / airflow change before-after comparisons

Restart repeatability

N restarts should converge to the same phase cluster
no mode-hopping or step-like re-landing behavior

Toolchain: use each instrument for what it can actually prove

Time interval counter (TIC)

Best for long-window drift, holdover growth, and event-to-event stability. Ensure fixed trigger conditions and report the time base reference used.

Phase noise / phase comparator

Best for phase noise floor and integration-window controlled jitter metrics. Always state integration bandwidth and averaging settings in the report.

DSO (oscilloscope)

Best for transient diagnosis (failover step timing, event correlation, glitches). Not sufficient to prove ps-level repeatability without statistics and controlled comparisons.

Scope guardrail: this section describes how to prove performance (not an instrument purchasing guide).

Scripted validation and portable pass templates

Logging dimensions

Log time (hours/days), temperature, and configuration (SFP/port/firmware/servo params/cal version). Require before-after comparisons for SFP swap, restart cycles, and failover events.

Pass criteria templates

steady-state RMS phase error ≤ X ps (window T, conditions C)
temperature slope |d(phase)/dT| ≤ Y ps/°C (range, soak rule)
failover step ≤ Z (ps or ns), recovery ≤ R, single-cluster afterward
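These templates translate directly into a scripted check. The sketch below is a hypothetical evaluator: X, Y, Z, and R are the template placeholders, and the numeric limits and measurement values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassTemplate:
    rms_limit_ps: float            # X
    slope_limit_ps_per_c: float    # Y
    failover_step_limit_ps: float  # Z
    recovery_limit_s: float        # R


def evaluate(template, rms_ps, slope_ps_per_c, failover_step_ps,
             recovery_s, single_cluster_after):
    """Return a per-criterion pass/fail map plus an overall verdict."""
    result = {
        "steady_state_rms": rms_ps <= template.rms_limit_ps,
        "temperature_slope": abs(slope_ps_per_c) <= template.slope_limit_ps_per_c,
        "failover": (abs(failover_step_ps) <= template.failover_step_limit_ps
                     and recovery_s <= template.recovery_limit_s
                     and single_cluster_after),
    }
    result["overall"] = all(result.values())
    return result


template = PassTemplate(rms_limit_ps=10.0, slope_limit_ps_per_c=2.0,
                        failover_step_limit_ps=50.0, recovery_limit_s=30.0)
report = evaluate(template, rms_ps=6.2, slope_ps_per_c=-2.4,
                  failover_step_ps=18.0, recovery_s=12.0,
                  single_cluster_after=True)
print(report)
```

Keeping the per-criterion results in the report makes it clear which condition failed, here the temperature slope, instead of collapsing everything into one flag.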

Diagram #9. A ps-grade verification plan must cover time, temperature, restart repeatability, and failover. Each cell pairs a metric with a method and a reportable pass template.

Engineering Checklist: Bring-up → Calibration → Production → Field Ops

A White-Rabbit-style timing system is deliverable only when it can be built, calibrated, screened, and operated with repeatable gates. Each stage must produce logs and parameters tied to device identity and configuration version, then pass a defined stamp before proceeding.

Gate 1 — Bring-up

Fixed-latency sanity: verify timestamp points and remove variable-delay stages (FIFO/CDC/queue)
Baseline lock: confirm frequency lock first, then phase loop convergence
Baseline logs: record steady-state phase distribution and drift slope under known conditions

Outputs

logs (baseline), firmware/config versions, timestamp-point map

Pass stamp

stable lock, single-cluster phase, baseline reproducible after restart

Gate 2 — Calibration

Write & bind params: tie calibration to serial/SFP/port/link identity
Thermal model: perform temperature logging with a defined soak rule; extract slope and hysteresis
Guardband policy: budget residual uncertainty (module swap, gradients, aging)

Outputs

params (ΔTx/ΔRx, fiberDelay0, tempCoef, guardband), cal version record

Pass stamp

temperature slope within template, repeatable convergence with recorded cal version

Gate 3 — Production

Fixture consistency: keep cable/port/trigger conditions constant
Fast screen: short-window metrics plus restart repeatability within a fixed time limit
Sampling strategy: tighten checks for temperature corners and rework units

Outputs

quick report template, batch linkage, pass/fail stamps

Pass stamp

meets X/T template, repeatability holds across N restarts

Gate 4 — Field Ops

Monitoring: lock/alarm/temp proxies, drift slope, asymmetry indicators
Fault grading: separate measurement-floor issues from link/cal/version issues
Failover policy: switch only when standby parameters are valid; accept only bounded-step outcomes

Outputs

long logs (time/temp/config), event records (failover/loss), rollback notes

Pass stamp

stable operation under temperature/traffic, bounded-step failover with acceptance logs

Diagram #10. Gate the lifecycle. Each stage produces logs/params and a pass stamp so ps-level performance is deliverable, auditable, and repeatable across build, calibration, production screening, and field operation.

Applications (WR-Style): Where ps-Level Actually Changes the System Design

Ps-level alignment is not “nicer timestamps.” It changes system constraints: fixed-latency paths become mandatory, calibration data becomes a lifecycle asset, and redundancy must be designed for bounded steps rather than best-effort continuity.

A) Coherent sampling / distributed ADC

Why ps-level matters

Channel-to-channel phase consistency decides whether a multi-node measurement system can behave like one instrument.

Design impact

fixed-latency timestamp boundary must be stable across restart cycles
phase measurement floor must be lower than the claimed system error
calibration parameters must be versioned and tied to module identity

Common failure mode

“Lock = aligned” assumption: phase re-lands into multiple clusters after reboot due to hidden variable-latency stages.

B) Distributed trigger / event time-marking

Why ps-level matters

Tight event ordering and deterministic time-marking reduce pressure on the time-walk budget and simplify downstream timing margins.

Design impact

define a consistent “event reference point” (timestamp location and signal edge)
failover must be bounded-step (measure and accept, not guess)
holdover policy must match outage window and event criticality

Common failure mode

Redundancy designed for link uptime only; a switchover creates unbounded time steps that break event correlation.

C) Multi-chassis / multi-card synchronization

Why ps-level matters

Maintainability becomes the real win: repeatable phase alignment across chassis without manual re-tuning.

Design impact

cable/SFP replacement must trigger re-calibration rules (governance)
temperature gradients must be measured and guarded (not assumed away)
restart must return to the same phase cluster (production-grade repeatability)

Common failure mode

Calibration treated as a one-time factory step; field changes (airflow, fiber routing, module swap) silently invalidate phase alignment.

Scope guardrail

This section describes timing-driven system constraints. It does not expand into full domain architectures (radar/accelerators/broadcast stacks) or algorithm details.

Diagram #11. Three WR-style application patterns. Ps-level alignment “pays off” only when the system is designed for fixed latency, calibrated links, and lifecycle governance.

IC Selection Notes (WR-Style): What to Choose and What to Measure First

WR-style selection is a decision tree driven by targets and measurable risk. Start from the targets (ps error, holdover, topology), then choose the oscillator class, phase measurement engine, fixed-latency path elements (PHY/SFP), and switch redundancy strategy. Run the “measure-first” hooks before committing to a BOM.

Measure-first hooks (before selecting parts)

1) SFP temperature-driven phase drift

log phase vs SFP temperature (slope + hysteresis)
verify repeatability after airflow and module reseat
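A minimal Python sketch of this hook, assuming the log has already been reduced to paired samples of SFP temperature (°C) and phase error (ps) across one heating and one cooling ramp. The function names, the 1 °C binning, and the sample values are illustrative assumptions, not part of any White Rabbit toolchain.

```python
def fit_slope(temps_c, phase_ps):
    """Least-squares slope (ps/°C) and intercept (ps) of phase vs temperature."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_p = sum(phase_ps) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(temps_c, phase_ps))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_t


def hysteresis_ps(temps_c, phase_ps):
    """Largest heating-vs-cooling phase gap at matching (1 °C binned) temperatures."""
    heating, cooling = {}, {}
    for i in range(1, len(temps_c)):
        # A sample belongs to the heating branch if temperature did not fall.
        branch = heating if temps_c[i] >= temps_c[i - 1] else cooling
        branch[round(temps_c[i])] = phase_ps[i]
    shared = heating.keys() & cooling.keys()
    return max((abs(heating[t] - cooling[t]) for t in shared), default=0.0)


# Illustrative log: one ramp 25 -> 45 -> 25 °C; the cooling branch sits 3 ps higher.
temps_c = [25, 30, 35, 40, 45, 40, 35, 30, 25]
phase_ps = [0, 10, 20, 30, 40, 33, 23, 13, 3]

slope, _ = fit_slope(temps_c, phase_ps)
print(f"slope = {slope:.2f} ps/°C, hysteresis = {hysteresis_ps(temps_c, phase_ps)} ps")
```

Running the same two functions on logs taken before and after an airflow change or module reseat gives a direct repeatability comparison.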

2) Link asymmetry repeatability

cycle link up/down and compare re-landing phase (single cluster expectation)
compare A/B transceivers of the same model to expose systematic offsets

3) Restart return-to-cluster

run N restart cycles and confirm phase returns to the same cluster
if multi-cluster appears, search for hidden variable-latency stages (FIFO/CDC/queues)
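The restart check can be sketched as a one-dimensional gap split, assuming each restart yields a single phase landing value in ps. The `gap_ps` threshold is a hypothetical parameter that should sit well above the measurement noise and below the smallest latency step of interest.

```python
def landing_clusters(landings_ps, gap_ps):
    """Group sorted landing points; a gap larger than gap_ps starts a new cluster."""
    ordered = sorted(landings_ps)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > gap_ps:
            clusters.append([value])
        else:
            clusters[-1].append(value)
    return clusters


def spread_ps(cluster):
    """Peak-to-peak spread of one cluster."""
    return max(cluster) - min(cluster)


# Illustrative landings from 7 restarts: two of them land about 800 ps away.
landings = [12, 15, 9, 811, 14, 808, 11]
clusters = landing_clusters(landings, gap_ps=100)
print(f"{len(clusters)} cluster(s), spreads: {[spread_ps(c) for c in clusters]}")
```

More than one cluster is the signal to go looking for a state-dependent stage; a single cluster with a small spread is the expected healthy result.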

Selection logic (by functional blocks)

Node oscillator class (holdover & dynamics)

need for OCXO/TCXO is set by outage window and temperature gradients
need for VCXO is set by servo bandwidth, lock time, and tuning range
verify with long-window drift and temperature slope (not only short-window jitter)

Phase measurement engine (DDMTD/TDC floor)

ensure measurement noise floor is below the target ps error
ensure stable internal clocking and isolation (avoid self-injection)
use controlled comparisons to separate measurement floor vs system jitter

PHY/SFP & fixed-latency path (determinism)

prefer deterministic latency behavior (avoid state-dependent buffering)
require readable temperatures and traceable calibration identity
test asymmetry repeatability across swaps and re-plugs

Switch capability (calibration & bounded failover)

deterministic forwarding path is required (avoid queue-dependent timing)
support calibration parameter propagation and identity binding
failover requires verified bounded step (acceptance logs)

Example material numbers (for datasheet lookup & first measurements)

These part numbers are provided to accelerate verification and datasheet discovery. Final selection must be driven by the decision tree and the measure-first hooks above. Always verify package/suffix, required features, and availability for the target build.

A) DPLL / jitter attenuators (clean + track)

Silicon Labs: Si5345A, Si5341
Analog Devices: AD9545, AD9548
Texas Instruments: LMK04828, LMK05318

B) Clock distribution / fanout

Analog Devices: ADCLK948, ADCLK954
Texas Instruments: LMK01020, CDCLVP1216
Analog Devices: LTC6952 (low jitter distribution)

C) Programmable XO / clock source

Silicon Labs: Si570 (programmable XO family)
Epson: SG-8002 (XO family)
NDK: NZ2520SD (XO family)

D) Time / phase measurement (TDC)

Texas Instruments: TDC7200, TDC7201
ams OSRAM: TDC-GPX2

E) FPGA (implementation anchor examples)

AMD/Xilinx Artix-7: XC7A35T-1FTG256C
AMD/Xilinx Kintex-7: XC7K325T-2FFG900C

F) SFP transceivers (starting points for thermal/asymmetry tests)

Finisar: FTLF1318P3BTL (1G LX-type)
Broadcom/Avago: AFBR-5710PZ (1G optical)

Diagram #12. Start from targets, then select oscillator/measurement/path/redundancy. Use measure-first hooks to expose temperature drift, asymmetry repeatability, and restart clustering before locking the BOM.

FAQs (WR-Style): Calibration, Asymmetry, Fixed Latency, Metrology, Failover, Temperature

Each answer follows a strict 4-line, measurable structure: Likely cause / Quick check / Fix / Pass criteria. Placeholders X, Y, and Z are acceptance thresholds to be set by the system timing budget; the window T and the cycle count N are likewise set per system.

Lock looks “OK” but phase slowly walks over hours—what should be checked first?

Likely cause

Unmodeled temperature-driven delay/asymmetry drift (SFP + fiber + chassis gradients) slowly biases one-way delay.

Quick check

Log phase_error(t) together with SFP case temperature and node board temperature, then compute phase slope (ps/°C) and hysteresis over ≥2 thermal cycles.

Fix

Add runtime temperature compensation (linear/LUT + guardband) and enforce a thermal management/placement rule to reduce gradients across transceivers.

Pass criteria

Steady-state RMS phase error ≤ X ps over window T (e.g., 100–1000 s) and temperature slope ≤ Y ps/°C across the specified ambient range.
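As a rough illustration of the fix and the pass criterion, the sketch below applies a linear temperature compensation and checks the residual RMS against X. The slope, reference temperature, threshold, and log values are hypothetical; a real deployment might use a LUT and an explicit guardband instead of a single coefficient.

```python
import math


def compensate(phase_ps, temp_c, slope_ps_per_c, ref_temp_c):
    """Remove the modeled linear thermal term from one phase sample."""
    return phase_ps - slope_ps_per_c * (temp_c - ref_temp_c)


def rms(values):
    """Root-mean-square of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


SLOPE_PS_PER_C = 2.0  # hypothetical coefficient from a prior thermal characterization
REF_TEMP_C = 25.0     # hypothetical calibration temperature
X_PS = 1.5            # hypothetical acceptance threshold

# (SFP temperature in °C, raw phase error in ps), illustrative values only.
log = [(25.0, 1.0), (27.0, 3.0), (29.0, 9.0), (31.0, 11.0)]

raw = [p for _, p in log]
residual = [compensate(p, t, SLOPE_PS_PER_C, REF_TEMP_C) for t, p in log]
print(f"raw RMS = {rms(raw):.2f} ps, compensated RMS = {rms(residual):.2f} ps")
print("pass" if rms(residual) <= X_PS else "fail")
```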

After reboot, phase offset changes by a fixed step—what is the most common non-fixed-latency culprit?

Likely cause

A state-dependent latency element (FIFO depth, CDC elastic buffer, serdes deskew, or queueing) changes its internal state across boots.

Quick check

Run N reboot cycles and record phase landing points; multiple discrete clusters (not noise) indicate hidden variable-latency states.

Fix

Force deterministic pipeline configuration (fixed FIFO depth / bypass elastic buffers / lock timestamp point) and add a boot-time self-test that rejects multi-cluster landing.

Pass criteria

Across N reboots, landing spread ≤ X ps and the distribution remains single-cluster (no discrete steps larger than X ps).

Fiber link replaced with the same length, yet offset shifts—what parameter must be re-bound?

Likely cause

Calibration identity changed (SFP Tx/Rx latency deltas, connector pair, patch-cord dispersion), so “same length” does not preserve one-way delay.

Quick check

Compare pre/post replacement: SFP serials, stored ΔTx/ΔRx, and baseline fiberDelay0; if serials differ, assume re-binding is required.

Fix

Re-run the install calibration step and bind calibration parameters to the exact SFP serial + port + patch set; reject “unknown identity” links in runtime.

Pass criteria

After re-binding, one-way offset shift ≤ X ps compared to baseline and remains within X ps across a re-plug/restart sanity loop.

Calibration passes at room temperature but fails in a thermal sweep—what should be logged first?

Likely cause

Temperature-dependent latency/asymmetry drift exceeds the guardband because the calibration model does not include the dominant thermal driver.

Quick check

Log phase error, SFP temperature, board temperature, and ambient temperature at a fixed sampling interval, then fit phase vs temperature with slope + hysteresis.

Fix

Promote the strongest temperature proxy (often SFP case) into the runtime compensation model and enlarge guardband for unmodeled gradient terms.

Pass criteria

Thermal sweep slope ≤ Y ps/°C and max deviation from model ≤ X ps across the full sweep, run with the same sweep rate and stabilization rules used in production.

Bidirectional delay looks stable, but one-way is wrong—how can asymmetry be detected quickly?

Likely cause

RTT stability hides a constant one-way bias because Tx and Rx latencies differ and do not cancel without a valid asymmetry model.

Quick check

Swap A/B transceivers between ends (or reverse the fiber pair if available) and observe whether the one-way offset changes by about 2× the suspected asymmetry-induced bias.

Fix

Calibrate and store ΔTx/ΔRx per module identity, and treat any identity change as invalidating one-way accuracy until re-calibrated.

Pass criteria

Across swap/reverse tests, one-way offset remains within ±X ps of the modeled value and does not jump by more than X ps per identity-preserving reconnect.
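The swap test can be reasoned through with a standard two-way timestamp exchange. The sketch below assumes the usual symmetric-path offset estimate and hypothetical path delays (in ps); it shows that the round-trip time is unchanged by a swap while the offset estimate moves by twice the bias.

```python
def exchange(true_offset, d_ms, d_sm, t1=0.0, turnaround=1000.0):
    """Timestamps t1..t4 of one two-way exchange (slave clock = master + true_offset)."""
    t2 = t1 + d_ms + true_offset   # arrival at the slave, read on the slave clock
    t3 = t2 + turnaround           # reply leaves the slave, slave clock
    t4 = t3 - true_offset + d_sm   # reply arrives at the master, master clock
    return t1, t2, t3, t4


def offset_estimate(t1, t2, t3, t4):
    """Offset estimate that assumes d_ms == d_sm."""
    return ((t2 - t1) - (t4 - t3)) / 2


def round_trip(t1, t2, t3, t4):
    """Round-trip delay with the slave turnaround removed."""
    return (t4 - t1) - (t3 - t2)


# Hypothetical link: clocks perfectly aligned, 300 ps more delay master->slave.
normal = exchange(true_offset=0.0, d_ms=5_000_300.0, d_sm=5_000_000.0)
swapped = exchange(true_offset=0.0, d_ms=5_000_000.0, d_sm=5_000_300.0)

print("RTT normal/swapped:", round_trip(*normal), round_trip(*swapped))
print("offset estimate normal/swapped:", offset_estimate(*normal), offset_estimate(*swapped))
```

With the clocks truly aligned, the estimate is biased by half the delay difference, and the swap flips the sign of that bias, so the observed change equals the full delay difference.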

Failover causes a time step—how can switching be made bounded?

Likely cause

Standby path is not pre-calibrated/phase-tracked, so switchover applies an unknown delay/asymmetry state as an instantaneous step.

Quick check

Measure failover_step under controlled A→B→A cycles while logging standby calibration validity, holdover window, and phase continuity flags.

Fix

Continuously calibrate and phase-track standby, then switch only inside a verified holdover window and apply a bounded step acceptance policy (reject/rollback if exceeded).

Pass criteria

Failover step ≤ Z ps (or ≤ Z ns by system spec) and recovery to steady-state RMS ≤ X ps occurs within the specified lock time budget.
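One possible shape for the bounded-step acceptance policy is sketched below. The `FailoverEvent` fields and the three outcome labels are assumptions made for illustration; they mirror the flags named in the quick check rather than any particular switch firmware.

```python
from dataclasses import dataclass


@dataclass
class FailoverEvent:
    phase_before_ps: float    # phase error just before switchover
    phase_after_ps: float     # phase error just after switchover
    standby_cal_valid: bool   # standby path calibration was current
    within_holdover: bool     # switch happened inside the verified holdover window


def failover_decision(event, max_step_ps):
    """Return (decision, measured step in ps) for one switchover."""
    step_ps = abs(event.phase_after_ps - event.phase_before_ps)
    if not (event.standby_cal_valid and event.within_holdover):
        return "reject", step_ps      # preconditions not met, step is not trusted
    if step_ps > max_step_ps:
        return "rollback", step_ps    # measured step exceeds the bound Z
    return "accept", step_ps


# Illustrative A->B->A cycle log with a hypothetical bound Z = 50 ps.
cycle = [
    FailoverEvent(10.0, 45.0, True, True),
    FailoverEvent(45.0, 935.0, True, True),
    FailoverEvent(12.0, 14.0, False, True),
]
for event in cycle:
    print(failover_decision(event, max_step_ps=50.0))
```

Logging the decision together with the measured step gives the acceptance record needed to show that switching stays bounded.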

“ps-level” claim is not reproducible across instruments—what is the measurement trap?

Likely cause

Instrument noise floor, trigger jitter, or timebase discipline differences dominate the observed error instead of the system under test.

Quick check

Run a loopback/control measurement to estimate the instrument floor, and force common settings (same reference, same bandwidth limit, same gating window and statistics).

Fix

Standardize a metrology recipe (reference distribution, trigger method, cabling symmetry, statistics window) and publish the instrument floor alongside system results.

Pass criteria

Two independent instruments agree within ±X ps once each instrument floor has been confirmed and accounted for, and the measured floor is ≤ (X/2) ps over the same gating window.
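A small sketch of the floor bookkeeping, under the assumption that instrument noise and system jitter are independent so their RMS values add in quadrature. The function names and numbers are illustrative.

```python
import math


def system_rms_ps(measured_rms_ps, floor_rms_ps):
    """Estimate system RMS by removing the instrument floor in quadrature.

    Returns None when the measurement is floor-limited and nothing can be claimed.
    """
    if floor_rms_ps >= measured_rms_ps:
        return None
    return math.sqrt(measured_rms_ps ** 2 - floor_rms_ps ** 2)


def metrology_pass(result_a_ps, result_b_ps, floor_ps, x_ps):
    """Both conditions of the pass criterion: agreement within X and floor <= X/2."""
    return abs(result_a_ps - result_b_ps) <= x_ps and floor_ps <= x_ps / 2


# Illustrative: instrument reads 5 ps RMS, loopback floor is 3 ps RMS.
print(system_rms_ps(5.0, 3.0))
print(system_rms_ps(2.0, 3.0))
print(metrology_pass(4.0, 4.6, floor_ps=3.0, x_ps=8.0))
```

A `None` result is itself useful: it says the instrument, not the system, sets the number being reported.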

Cleaner/PLL change improves jitter but worsens alignment—what bandwidth interaction is likely?

Likely cause

Loop bandwidth shifts the jitter/wander trade-off: narrowing reduces random jitter but can increase low-frequency phase wander or slow temperature tracking.

Quick check

Measure phase error PSD or time-domain wander with two BW settings and compare: (a) short-window RMS jitter and (b) long-window drift/slope.

Fix

Set bandwidths by requirement split (random jitter vs wander) and add guardband so temperature tracking does not exceed alignment error budget.

Pass criteria

Random jitter meets the endpoint limit while long-window phase wander stays ≤ X ps over T and thermal slope stays ≤ Y ps/°C under the expected gradient.
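The two-bandwidth comparison can be sketched with two simple statistics over the same phase log: a short-window RMS that mostly sees fast jitter, and a long-window drift that mostly sees wander. The window length and the synthetic traces are assumptions chosen only to make the trade-off visible.

```python
import statistics


def short_window_rms(phase_ps, window):
    """Mean of per-window standard deviations (largely insensitive to slow drift)."""
    chunks = [phase_ps[i:i + window] for i in range(0, len(phase_ps) - window + 1, window)]
    return statistics.fmean(statistics.pstdev(chunk) for chunk in chunks)


def long_window_drift(phase_ps, window):
    """Mean of the last window minus mean of the first window."""
    return statistics.fmean(phase_ps[-window:]) - statistics.fmean(phase_ps[:window])


# Synthetic traces (ps): wide BW has more jitter and no drift,
# narrow BW has less jitter but fails to track a slow thermal ramp.
wide_bw = [4.0 if i % 2 == 0 else -4.0 for i in range(200)]
narrow_bw = [0.1 * i + (1.0 if i % 2 == 0 else -1.0) for i in range(200)]

for name, trace in (("wide", wide_bw), ("narrow", narrow_bw)):
    print(name, round(short_window_rms(trace, 20), 2), round(long_window_drift(trace, 20), 2))
```

Reporting both numbers for each bandwidth setting makes it clear when a jitter improvement has been paid for with alignment drift.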

Two nodes align well on the bench, but drift in the chassis—what mechanical/thermal gradient path dominates?

Likely cause

Chassis airflow and placement create persistent temperature gradients across transceivers and local clocking, producing repeatable phase drift not seen on an open bench.

Quick check

Map temperature at SFPs, nearby regulators, and FPGA/clock areas while logging phase; correlate drift with ΔT between ends, not just absolute ambient.

Fix

Reduce gradient (placement, airflow baffles, thermal coupling) and promote gradient-aware compensation (use the most correlated temperature sensors).

Pass criteria

Under worst-case airflow/heat load, drift stays within X ps over T and the slope vs ΔT stays within Y ps/°C with no hysteresis-induced runaway.

Packet load changes alignment—what does that imply about the timestamp point?

Likely cause

Timestamping is occurring after a variable-latency stage (queue, store-and-forward, DMA scheduling), so traffic modulates the effective timing reference.

Quick check

Sweep traffic load (idle → max) while logging phase error; if phase tracks load with repeatable slope, the timestamp point is not at a fixed-latency boundary.

Fix

Move timestamping to PHY/near-PHY fixed-latency points, isolate timing traffic from congested queues, and lock forwarding to deterministic paths.

Pass criteria

Across the traffic sweep, phase shift ≤ X ps and any load-correlated component ≤ (X/2) ps once the timestamp point has been moved to a fixed-latency boundary.
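A minimal sketch of the traffic sweep analysis, assuming each log entry is a (load in percent, phase error in ps) pair. Averaging per load level separates the load-correlated shift from sample-to-sample noise; the data values are illustrative.

```python
from collections import defaultdict


def phase_by_load(samples):
    """Mean phase error (ps) for each traffic load level (%)."""
    buckets = defaultdict(list)
    for load_pct, phase_ps in samples:
        buckets[load_pct].append(phase_ps)
    return {load: sum(vals) / len(vals) for load, vals in sorted(buckets.items())}


def load_induced_shift_ps(samples):
    """Largest difference between per-load mean phases across the sweep."""
    means = phase_by_load(samples)
    return max(means.values()) - min(means.values())


# Illustrative sweep: phase follows load, which points at a variable-latency stage.
sweep = [(0, 10.0), (0, 12.0), (50, 30.0), (50, 32.0), (100, 51.0), (100, 53.0)]
print(phase_by_load(sweep))
print(load_induced_shift_ps(sweep), "ps shift from idle to max load")
```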

SFP module change breaks calibration—what must be version-controlled?

Likely cause

Calibration parameters (ΔTx/ΔRx, temp coefficients, baseline fiberDelay0) are not bound to module identity, so a swap silently invalidates one-way accuracy.

Quick check

Compare SFP serial/vendor/DOM fields to the calibration record; if identity mismatch exists, treat the link as “uncalibrated” until re-certified.

Fix

Version-control calibration datasets (device serial, port, firmware, date, model) and enforce a replacement workflow that triggers recalibration and acceptance tests.

Pass criteria

After swap + recalibration, one-way error ≤ X ps and repeatability ≤ X ps across replug/restart, with calibration record traceable to the new identity.
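Identity binding can be sketched as a record that carries the calibration values together with the identity they were measured for. The field names and serial numbers below are hypothetical; in practice the live serial would come from the module's diagnostic fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalRecord:
    sfp_serial: str          # identity the calibration was measured for
    port: str
    delta_tx_ps: float
    delta_rx_ps: float
    fiber_delay0_ps: float
    dataset_version: str     # version tag of the calibration dataset


def link_state(record: Optional[CalRecord], live_serial: str, live_port: str) -> str:
    """Treat any missing record or identity mismatch as an uncalibrated link."""
    if record is None:
        return "uncalibrated"
    if record.sfp_serial != live_serial or record.port != live_port:
        return "uncalibrated"
    return "calibrated"


# Hypothetical record and a module swap on the same port.
record = CalRecord("SN-0001", "port1", 120.0, 95.0, 5_000_000.0, "cal-v3")
print(link_state(record, "SN-0001", "port1"))
print(link_state(record, "SN-0002", "port1"))
```

Making the record immutable and versioned means a recalibration produces a new record rather than silently editing the old one, which keeps the history traceable.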

Phase error is periodic—what is the first “beat frequency / DMTD” sanity check?

Likely cause

A small frequency offset or mixing/aliasing artifact produces a beat note that appears as periodic phase modulation.

Quick check

Measure the period P of the phase ripple and compare 1/P to known offsets (reference differences, DMTD beat frequency, PLL update rate, or servo sampling rate).

Fix

Eliminate the offset source (reference mismatch, wrong divisor, sampling alias) or move the beat out of band (adjust sampling/loop rates) without breaking stability margins.

Pass criteria

Periodic component amplitude ≤ X ps (peak or RMS as specified) and does not reappear across temperature, traffic load, and restart acceptance tests.
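The period check can be sketched as follows, assuming a uniformly sampled phase log. The period is estimated from rising zero crossings of the mean-removed signal and 1/P is compared against a table of candidate rates; the candidate names and frequencies are hypothetical placeholders for the rates of a real system.

```python
import math


def ripple_period_s(phase_ps, sample_interval_s):
    """Average spacing of rising zero crossings, in seconds (None if fewer than two)."""
    mean = sum(phase_ps) / len(phase_ps)
    centered = [p - mean for p in phase_ps]
    rising = [i for i in range(1, len(centered)) if centered[i - 1] < 0 <= centered[i]]
    if len(rising) < 2:
        return None
    return (rising[-1] - rising[0]) / (len(rising) - 1) * sample_interval_s


def matching_sources(period_s, candidates_hz, tolerance=0.05):
    """Names of candidate rates within a relative tolerance of 1/P."""
    freq_hz = 1.0 / period_s
    return [name for name, hz in candidates_hz.items() if abs(freq_hz - hz) <= tolerance * hz]


# Synthetic log: 5 ps ripple at 10 Hz, sampled at 1 kHz for 1 s.
dt = 0.001
phase = [5.0 * math.sin(2 * math.pi * 10.0 * i * dt + 0.3) for i in range(1000)]

candidates = {"servo update rate": 10.0, "PLL update rate": 1000.0, "reference offset": 0.5}
period = ripple_period_s(phase, dt)
print(f"P = {period:.3f} s, matches: {matching_sources(period, candidates)}")
```

Zero-crossing timing is only a first sanity check; a noisy or multi-tone ripple would call for a spectral estimate instead.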
