Q: During leap-second or ToD updates, how can applications avoid time rollback or reordering?

Avoid injecting large time steps into the system timebase. Prefer bounded slew policies for ToD/PHC alignment, record update moments as timestamped events, and segment statistics into steady-state vs update/recovery windows. Validate by forcing leap-second-like transitions and checking that logs and event ordering remain monotonic and that time error stays bounded.

Time Card (PTP/SyncE/GPSDO) for Data Center Servers

Q: If a NIC already has a PHC, why add a Time Card?

A Time Card is justified when the system needs multi-reference discipline, strong holdover, and auditable bounded behavior. A NIC PHC may timestamp packets well, but it often lacks external reference diversity (GNSS/SyncE/10MHz/1PPS), higher-grade oscillators, and validation-grade event logs for reference switching and holdover transitions.

Q: GNSS is locked—why can offset still jump intermittently?

“Locked” does not guarantee a healthy reference. Offset jumps commonly come from reference quality fluctuation (multipath/interference), reference switching events, or a disciplining loop that tracks GNSS noise too aggressively. Check whether each jump aligns with a timestamped event (switch/holdover/alarm), and inspect C/N0, satellite count, and reference-valid flags.

Q: After jitter cleaning, spurs increase—what are the most common root causes?

Spurs often grow due to unsuitable loop bandwidth/profile, reference switching transients leaking through, or power/ground coupling modulating the PLL. Confirm with before/after phase-noise snapshots and a controlled reference switch test. If spur families are deterministic, configuration or synthesis settings are the likely cause; if they correlate with events, switching or coupling is the likely cause.

Q: How to turn phase noise L(f) and integrated jitter into an acceptance-ready spec?

An acceptance spec must declare measurement point and integration limits. L(f) is comparable only with fixed offset-frequency range, detector method, and bandwidth. Integrated jitter must state RMS integration band and whether spurs are included. Report an L(f) mask plus RMS jitter and spur list, and verify repeatability across temperature and switching states.

Q: How long should holdover be tested, and which statistics avoid short-test traps?

Holdover testing must match outage risk and oscillator time constants. Short tests miss temperature gradients and drift behavior. Measure time error growth long enough to reveal drift slope, capture temperature context, and report slope plus worst segments and tail risk (p99/p999) in declared windows, separating steady-state from recovery.

A data-center Time Card provides a provable timing foundation by combining disciplined frequency (SyncE/GPSDO) and precise time (PTP/ToD/1PPS) into a single, auditable hardware reference. It exists to keep timestamps deterministic through reference loss/switching and to make holdover, jitter, and time-step behavior measurable and acceptance-testable.

H2-1 · Scope & Boundary — What this page solves

This page focuses on time-card hardware behavior and how to measure, validate, and debug it. It treats timing as an engineering deliverable: a stable frequency reference, a controlled time-of-day output, and a deterministic hardware timestamp base.

What this page provides (verifiable deliverables)

Stable frequency (SyncE/GPSDO) with explicit holdover behavior: expected drift over time, re-lock behavior, and measurement methods (phase error / time error / ADEV-style stability views).
Accurate time (PTP / ToD / 1PPS) with controlled transitions: when to step vs slew, how to avoid time-jumps during reference loss/recovery, and what to validate before deployment.
Auditable timestamp base (PHC + HW timestamp): where the capture point sits in the data path, what creates non-determinism, and how to prove tail performance (p99/p999 stability).

Where it matters (typical failure modes it prevents)

Distributed event ordering: avoids cross-node log mis-ordering and “false causality” when time alignment degrades.
Compliance timestamps: provides traceable, verifiable timing with explicit reference quality and state transitions.
Cluster observability: reduces long-tail timestamp variance that breaks correlation across systems under load.

Out of scope (kept as one-line pointers only)

PTP network algorithm deep dives (BMCA / transparent-clock internals) — use a dedicated timing-network page.
OOB/BMC/KVM subsystems — see the Management pages for IPMI/Redfish/KVM topics.
High-speed SI retimer design — see the PCIe Retimer/Switch pages for signal integrity depth.

Holdover drift · PHC determinism · HW timestamp path · Jitter / phase noise · Reference quality state

Figure F1 — Server timing chain and the Time Card boundary

Minimal mental model: reference signals enter the card, a disciplined oscillator plus DPLL/PLL creates a stable timebase, and the card exports PHC/time-of-day and deterministic hardware timestamps with observable states.

H2-2 · 1-Minute Answer — When a Time Card is truly needed

A time card becomes a necessity when time must be deterministic, traceable, and resilient to reference loss. It moves timing from “best-effort behavior” into an engineered, measurable subsystem—with explicit holdover, jitter control, and hardware timestamp guarantees.

Definition (optimized for snippet extraction)

A Time Card is a server timing module that disciplines an OCXO/TCXO using GNSS and/or SyncE, then exposes a stable PHC, ToD/1PPS/10MHz outputs, and IEEE 1588 hardware timestamps. It is built for verifiable accuracy, low jitter, and predictable holdover during reference loss.

How it works (5 steps)

1. Acquire references: GNSS (1PPS/ToD), SyncE recovered clock, or external 10MHz/1PPS, plus quality/status signals.
2. Discipline the oscillator: a DPLL/servo controls OCXO/TCXO frequency and phase; lock/holdover states are explicitly managed.
3. Clean jitter and phase noise: a jitter-cleaner PLL shapes noise across offset frequencies to stabilize outputs and the timestamp base.
4. Drive PHC and time outputs: PHC/ToD/1PPS are aligned to the disciplined timebase with controlled step/slew behavior.
5. Timestamp with determinism: IEEE 1588 hardware timestamps are captured at a defined point, enabling stable tail performance (p99/p999).

Decision triggers (any one suggests a Time Card)

Tail stability matters: p99/p999 timestamp error is more important than average offset.
Reference loss is realistic: GNSS can be blocked/jammed, or upstream timing can be disrupted—holdover must be bounded.
Time must be auditable: compliance logs or forensic timelines require traceable states (lock/holdover/quality) and repeatable behavior.
System correlation breaks today: cross-node logs, traces, or events fail correlation during load spikes or topology changes.
Mixed timing inputs exist: SyncE for frequency plus PTP for time needs a clean, well-defined convergence point.

Four traits to verify (expressed as measurable acceptance language)

Holdover: maximum time error after X minutes without GNSS/SyncE; drift curve should be bounded and repeatable.
Determinism: timestamp stability across load (tail metrics), avoiding long-tail spikes caused by uncertain capture points.
Jitter / phase noise: integrated jitter (10MHz) and phase-error behavior (1PPS) meet system budget.
Observability: clear state reporting (lock/holdover/quality/alarms) plus event timestamps to support debugging and audits.

Figure F2 — Capability map: Software clock vs NIC PHC vs Time Card

The comparison is intentionally “engineering-first”: it highlights bounded behavior, tail stability, and observable states—rather than average offset claims.

H2-3 · System Context — How a Time Card fits into a data-center server

A time card is the timing boundary inside a server: it takes one or more external references (GNSS, SyncE, or an external 10MHz/1PPS), disciplines an on-card oscillator, and then exposes a stable local timebase (PHC/ToD/1PPS/10MHz) plus deterministic hardware timestamps. This section explains practical integration—placement, I/O, redundancy, and the failure points that most often break real deployments.

Form factors and placement (interfaces and constraints)

PCIe add-in card (AIC): easy to service and swap, typically offers front-panel connectors (SMA/MCX). Watch for panel cabling, grounding, and vibration/airflow sensitivity around the oscillator zone.
OCP mezzanine: board-level integration is cleaner and repeatable at scale; management sideband is convenient. External reference connectors may be limited, so reference distribution strategy must be decided early.
On-board module: best for fixed platforms that prioritize mechanical stability. Field replacement is harder, so reference redundancy and observability become even more important.

Reference inputs (what exists + the practical risk points)

GNSS antenna (outdoor) / GNSS distribution (indoor)

Common risks: lightning/surge on the feeder, cable loss budget errors, multipath reflections near metal structures, intermittent obstruction/coverage holes.
Minimum engineering actions: surge protection + compliant bonding, feeder loss budget, placement validation with GNSS quality indicators (C/N0, satellites, lock stability).

1PPS · ToD · C/N0 · lock stability

SyncE recovered clock (from the network)

Common risks: upstream quality level changes, reference switching that introduces wander, “looks locked” but tail timing degrades.
Minimum engineering actions: monitor quality/alarms, define reference priority, verify behavior during source switching (no uncontrolled time jumps).

SyncE · QL · wander · switching

External 10MHz / 1PPS reference

Common risks: ground loops, poor termination/impedance match, noisy distribution that injects jitter into a “clean” reference.
Minimum engineering actions: define cabling/connector standards, termination rules, and validation using phase-error measurements (not only average offset).

10MHz · 1PPS · termination · phase error

Outputs and consumers (what uses the card inside and outside the server)

Local PHC: the server’s auditable hardware timebase for time services and correlation.
IEEE 1588 hardware timestamps: deterministic packet timestamping at a defined capture point for stable tail behavior.
ToD / 1PPS / 10MHz: exported for measurement ports, lab validation, or cross-card timing relationships.
Status / alarms / event logs: essential for proving reference quality and diagnosing holdover and switching events.

Practical redundancy model (minimal but effective)

N+1 references: combine at least two independent inputs (e.g., GNSS + SyncE, or GNSS + external 10MHz/1PPS) to avoid single-point dependency.
Failure isolation: distinguish “reference quality degraded” vs “cable/connector fault” vs “surge/ground event” using explicit states and alarms.
Observable transitions: lock → holdover → re-lock must be visible and bounded; uncontrolled time steps are treated as failures.

Figure F3 — Integration view: references → time card → consumers (with redundancy & sidebands)

The diagram emphasizes the real deployment boundary: reference distribution and redundancy on the left, disciplined timing and observable states in the middle, and the server/measurement consumers on the right.

H2-4 · Reference & Oscillator — What OCXO/TCXO/GNSS specs actually mean

Timing products often advertise “excellent stability,” yet real systems fail on holdover, tail behavior, or uncontrolled re-lock steps. The practical goal is to translate oscillator and reference claims into acceptance language: measurable limits on time error, repeatable drift curves, and visible state transitions.

OCXO vs TCXO (selection boundary)

OCXO: chosen when long, bounded holdover is required and thermal stability dominates error. It is typically more resilient to airflow/ambient swings, but demands careful thermal placement.
TCXO: chosen when space/power/cost dominate and holdover requirements are shorter. It is more sensitive to gradients and board-level temperature dynamics.
Practical boundary: select by required holdover time-error bound and environmental variability, not by a single ppm line item.

Metric → Impact → How to measure (engineering-style acceptance)

Stability across time windows (ADEV meaning without formulas)

Metric: short / mid / long windows (τ ≈ 1–10 s, 100–1000 s, and around 10,000 s).
Impact: short windows shape jitter-like behavior; mid windows reveal control and thermal dynamics; long windows dominate holdover drift and aging.
Measure: record phase/time error vs a known reference; derive stability views per window; verify repeatability across runs.

phase error · time error · repeatability
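The per-window stability view described above can be sketched with an overlapping Allan deviation computed from logged phase (time-error) samples. This is a minimal illustration, not a replacement for a validated analysis tool; the function name `overlapping_adev` and the synthetic drift value are assumptions made for the example.

```python
import math


def overlapping_adev(phase_s, tau0_s, m):
    """Overlapping Allan deviation at tau = m * tau0_s.

    phase_s: time-error samples in seconds, evenly spaced by tau0_s.
    """
    n = len(phase_s)
    if m < 1 or n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 phase samples")
    tau = m * tau0_s
    total = 0.0
    for i in range(n - 2 * m):
        # Second difference of phase over the window length tau.
        second_diff = phase_s[i + 2 * m] - 2.0 * phase_s[i + m] + phase_s[i]
        total += second_diff * second_diff
    return math.sqrt(total / (2.0 * tau * tau * (n - 2 * m)))


# Synthetic record (assumed values): pure linear frequency drift of 1e-12 per second,
# sampled once per second. For this case ADEV grows in proportion to tau.
drift = 1e-12
phase = [0.5 * drift * t * t for t in range(1001)]
for m in (1, 10, 100):
    print(m, overlapping_adev(phase, 1.0, m))
```

Running the same τ set on every build, as the acceptance language suggests, makes run-to-run repeatability directly comparable.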

Drift sources (what actually moves the clock)

Metric: temperature coefficient, aging rate, supply sensitivity, mechanical/airflow sensitivity.
Impact: changes the drift slope during holdover and can alter re-lock behavior after transients.
Measure: controlled temperature sweep, long-duration holdover tests, and “airflow/placement A/B” checks in a representative chassis.

temp gradient · aging · supply noise

GNSS is accurate, but can be “jumpy”

Metric: lock stability, C/N0 trends, satellite visibility changes, and event frequency of reference loss.
Impact: repeated micro-loss or multipath can inject steps or force frequent transitions into holdover.
Measure: log quality indicators alongside time error; validate that loss/recovery transitions remain bounded.

C/N0 · multipath · bounded transitions

Holdover “anti-hype” checklist (questions that force real specs)

Bounded time error: after losing references, what is the maximum time error at 5/15/30/60 minutes (explicit curve, not a single adjective)?
Test conditions: under what temperature and airflow profile were those bounds verified (stable lab vs chassis reality)?
Re-lock behavior: does the card step or slew when references return, and what is the maximum step limit?
Proof: are lock/holdover/quality states and event timestamps exported for audits and debugging?
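To make the "explicit curve" in the checklist concrete, the sketch below projects holdover time error from a simple deterministic model: an initial error, a residual fractional frequency offset, and a linear frequency drift. The model deliberately ignores temperature effects and random noise, and the numeric values are illustrative assumptions, not figures from any oscillator datasheet.

```python
def holdover_time_error_s(t_s, freq_offset, drift_per_s, initial_error_s=0.0):
    """Deterministic holdover model: x(t) = x0 + y0*t + 0.5*D*t^2.

    freq_offset: residual fractional frequency offset at reference loss (dimensionless).
    drift_per_s: linear frequency drift (aging-like term), fractional per second.
    """
    return initial_error_s + freq_offset * t_s + 0.5 * drift_per_s * t_s * t_s


# Assumed example values: 1e-10 residual offset, 1e-15 per second drift.
for minutes in (5, 15, 30, 60):
    err = holdover_time_error_s(minutes * 60.0, freq_offset=1e-10, drift_per_s=1e-15)
    print(f"{minutes:>2} min: {err * 1e9:.1f} ns")
```

A measured holdover curve can be compared against such a projection at the same 5/15/30/60-minute checkpoints; large deviations point to drift sources the simple model leaves out, such as temperature gradients.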

Figure F4 — Holdover budget: lock → loss → holdover → re-lock (acceptance-language view)

This schematic is intentionally acceptance-oriented: it highlights bounded holdover and controlled re-lock behavior, which are the most common gaps in marketing-style specifications.

H2-5 · Disciplining Loop — How GNSS/SyncE/external references are “tamed” by a DPLL

A time card is not “accurate” simply because a reference exists. Accuracy becomes usable only after the reference is translated into a bounded, observable, and controlled local timebase. This section explains the practical disciplining loop: the state machine, the phase-error → filter → frequency-correction core, and the failure modes that typically cause time steps, jitter growth, or unstable holdover behavior.

State machine (acceptance-oriented, not theory-oriented)

Acquire → Lock → Disciplined

Acquire: reference is detected and qualified. Entry/exit should be governed by quality thresholds plus a time window (debounce).
Lock: phase error is pulled into a bounded range. The system must expose lock indicators and “quality stable” timers.
Disciplined: the oscillator is continuously corrected. The key requirement is bounded short-term noise injection and bounded long-term drift.

Holdover → Re-lock

Holdover: reference is lost or rejected; the local oscillator provides continuity. Acceptance must be stated as a time-error bound vs time (not a vague “good holdover”).
Re-lock: reference returns. The critical requirement is a controlled transition: define whether re-lock may step or must slew, and define the maximum allowed step/slew.
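The state descriptions above can be sketched as a small state machine with a debounce window and logged transition reasons. This is a toy model under stated assumptions: the class name `DiscipliningFsm`, the `debounce_ticks` and `lock_threshold_s` parameters, and the 100 ns default are hypothetical and do not describe any particular card's firmware.

```python
from enum import Enum


class State(Enum):
    ACQUIRE = "acquire"
    LOCK = "lock"
    DISCIPLINED = "disciplined"
    HOLDOVER = "holdover"
    RELOCK = "relock"


class DiscipliningFsm:
    def __init__(self, debounce_ticks=3, lock_threshold_s=100e-9):
        self.state = State.ACQUIRE
        self.debounce_ticks = debounce_ticks
        self.lock_threshold_s = lock_threshold_s
        self._streak = 0   # consecutive ticks with a valid reference
        self.events = []   # (tick, from_state, to_state, reason)

    def _go(self, tick, new_state, reason):
        self.events.append((tick, self.state.value, new_state.value, reason))
        self.state = new_state
        self._streak = 0

    def update(self, tick, ref_valid, phase_error_s):
        if not ref_valid:
            if self.state in (State.LOCK, State.DISCIPLINED, State.RELOCK):
                self._go(tick, State.HOLDOVER, "reference lost or rejected")
            else:
                self._streak = 0  # still acquiring or holding over
            return self.state
        self._streak += 1
        if self._streak < self.debounce_ticks:
            return self.state  # reference not yet stable for long enough
        in_bound = abs(phase_error_s) <= self.lock_threshold_s
        if self.state is State.ACQUIRE:
            self._go(tick, State.LOCK, "reference qualified")
        elif self.state is State.HOLDOVER:
            self._go(tick, State.RELOCK, "reference returned")
        elif self.state in (State.LOCK, State.RELOCK) and in_bound:
            self._go(tick, State.DISCIPLINED, "phase error within bound")
        return self.state
```

Every transition lands in `events` with a tick and a reason, which mirrors the requirement that state changes be observable and auditable rather than inferred after the fact.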

Loop core: phase error → filtering → frequency correction (the bandwidth trade-off)

Phase-error estimation: converts reference-vs-local differences into a controllable error signal (observable and logged).
Filtering: decides which variations are trusted. Too much trust in a noisy reference injects noise; too little trust slows convergence.
Frequency correction: adjusts the oscillator so the local timebase stays close to the reference without amplifying short-term noise.
Engineering trade-off: a faster loop tracks short-term changes but can increase jitter; a slower loop reduces injected noise but can drift more during transients and take longer to settle.
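The settling side of this trade-off can be illustrated with a toy proportional-integral loop that steers out a constant oscillator frequency offset. This is a simplified sketch: it models no reference noise, so it shows only how gain choice affects convergence, and the function name and gain values are assumptions chosen for the example.

```python
def simulate_pi_servo(osc_offset, kp, ki, steps, dt_s=1.0):
    """Toy PI disciplining loop.

    osc_offset: free-running fractional frequency offset of the oscillator.
    kp, ki: proportional and integral gains (assumed, illustrative values).
    Returns (final phase error in seconds, final frequency correction).
    """
    phase_err = 0.0
    integ = 0.0
    correction = 0.0
    for _ in range(steps):
        integ += phase_err * dt_s
        correction = kp * phase_err + ki * integ
        # Phase error accumulates whatever frequency offset is not yet corrected.
        phase_err += (osc_offset - correction) * dt_s
    return phase_err, correction


# Faster loop vs slower loop after the same 100 one-second updates.
fast = simulate_pi_servo(1e-7, kp=0.1, ki=0.0025, steps=100)
slow = simulate_pi_servo(1e-7, kp=0.02, ki=0.0001, steps=100)
print("fast loop residual phase error:", fast[0])
print("slow loop residual phase error:", slow[0])
```

In a real loop the faster setting would also pass more reference noise into the output, which is why the comparison has to be made on tail metrics under realistic reference conditions and not on settling time alone.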

Typical failure modes (what the system looks like when it fails)

Over-trusting GNSS

Symptom: occasional time steps or tail spikes even when “locked” appears true.
Why it happens: intermittent quality changes (obstruction/multipath) drive the loop as if they were real time shifts.
Acceptance: inject reference-quality disturbances and verify bounded transitions and logged state reasons.

Loop too fast (noise injection)

Symptom: short-term jitter rises; timestamp tails become worse (p99/p999 deteriorate).
Why it happens: the loop passes reference noise into the local oscillator and timebase.
Acceptance: compare configurations by tail metrics, not only average offset.

Re-lock causes a step

Symptom: when the reference returns, time suddenly jumps; ordering/correlation breaks.
Why it happens: uncontrolled phase correction during re-lock.
Acceptance: specify step limit (or require slew) and verify during repeated loss/recovery cycles.

Acceptance checklist (forces “real” specs)

Holdover bound: maximum time error at 5/15/30/60 minutes after reference loss (curve + limit).
Re-lock bound: step vs slew policy and the maximum allowed step / maximum slew rate.
Source switching: bounded transient error during reference priority changes; reason codes must be logged.
Tail behavior: validate p99/p999 time error/timestamp error, not only average offset.
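The tail-behavior item can be made concrete with a nearest-rank percentile over logged time-error magnitudes. The sketch below is one simple way to compute it; the helper names and the synthetic sample set are assumptions for illustration.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[rank - 1]


def tail_report(errors_s):
    """Summarize absolute time error so the mean cannot hide rare spikes."""
    mags = [abs(e) for e in errors_s]
    return {
        "mean": sum(mags) / len(mags),
        "p99": nearest_rank_percentile(mags, 99.0),
        "p999": nearest_rank_percentile(mags, 99.9),
        "max": max(mags),
    }


# Synthetic log (assumed values): mostly 10 ns errors with a few 500 ns spikes.
samples = [10e-9] * 995 + [-500e-9] * 5
print(tail_report(samples))
```

With this sample set the mean stays close to the steady-state error while p99.9 lands on the spike magnitude, which is the gap that average-offset reporting conceals.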

Figure F5 — DPLL state machine + loop bandwidth trade-off (engineering view)

Left: define observable entry/exit conditions for each state. Right: bandwidth is an engineering choice; validate using tail metrics and bounded transitions.

H2-6 · IEEE 1588 on a Time Card — Hardware timestamps and the PHC datapath

Deterministic timing in a server depends on two things: a disciplined local hardware clock (PHC) and hardware timestamp capture at a well-defined point. Many systems look fine on “average offset” yet fail on ordering and correlation because tail errors and variable internal delays dominate real workloads. This section maps the interface-level datapath: where timestamps are captured, how PHC aligns with ToD/1PPS, and where determinism is gained or lost.

Timebase: PHC as the local, auditable hardware clock

PHC is a continuously running hardware counter used as the server’s local time reference.
It is disciplined by the time card’s loop (frequency/phase corrections), with explicit constraints on step vs slew behavior.
Acceptance requires observable states: lock/holdover/re-lock and event timestamps for each transition.

Timestamp capture location: PHY vs MAC (engineering impact)

Capture closer to the line-side (PHY-side)

Benefit: fewer variable delays between the physical arrival/departure and the capture point.
Outcome: better determinism; tail behavior tends to be tighter.

Capture higher in the datapath (MAC-side or above)

Risk: queueing, clock-domain crossings, and internal scheduling can add variable latency.
Outcome: average offset can still look fine while p99/p999 timestamp error worsens.

Why ordering can break even when offset looks small

Tail spikes: rare timestamp error spikes can exceed the event spacing that applications rely on.
Variable delay: nondeterministic internal latency corrupts correlation even when mean offset remains bounded.
Transition steps: re-lock steps (or aggressive corrections) can invert ordering within short windows.

ToD / 1PPS alignment: making outputs consistent with PHC

1PPS provides an edge reference for measurement and alignment checks.
ToD formatting converts the internal timebase into exported time-of-day while maintaining bounded update behavior.
Engineering boundary: define the policy for step vs slew so exported signals remain auditable and transitions remain bounded.
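A step-versus-slew policy can be reasoned about with two small checks: how long a bounded slew needs to absorb a given offset, and whether a timestamp log ever goes backwards. The sketch below is illustrative; the function names and the 500 ppm limit are assumptions, not values from any specific time service.

```python
def slew_duration_s(offset_s, max_slew_ppm):
    """Seconds needed to absorb offset_s at a bounded slew rate (1 ppm = 1 us per second)."""
    if max_slew_ppm <= 0:
        raise ValueError("max_slew_ppm must be positive")
    return abs(offset_s) / (max_slew_ppm * 1e-6)


def first_rollback(timestamps):
    """Return the index of the first timestamp that goes backwards, or None if monotonic."""
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            return i
    return None


# A 1 s leap-second-like correction slewed at 500 ppm takes about 2000 s to absorb.
print(slew_duration_s(1.0, 500.0))
# A stepped log rolls back; a slewed log stays monotonic.
print(first_rollback([10.0, 10.5, 9.6, 10.1]))
print(first_rollback([10.0, 10.5, 10.9, 11.3]))
```

The slew duration makes the cost of avoiding a step explicit: the tighter the slew limit, the longer exported time stays offset, so both the limit and the resulting recovery window belong in the acceptance record.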

Determinism checklist (what to validate)

Tail limits: p99/p999 timestamp error bounds (not only average offset).
Fixed capture point: capture location must be explicit and consistent (deterministic path vs variable path).
Transition bounds: maximum step / maximum slew rate during re-lock and reference switching.
Consistency checks: PHC vs ToD vs 1PPS alignment logged over time.

Figure F6 — Timestamp datapath: capture point → PHC → ToD/1PPS outputs (interface level)

The capture point defines determinism. When variable delay is introduced upstream of the capture, tail timestamp errors rise even if average offset stays small.

H2-7 · SyncE Integration — Why many data centers use “SyncE for frequency, PTP for time”

SyncE and PTP solve different parts of the same timing problem. SyncE distributes a low-wander frequency foundation, while PTP aligns time/phase using packet-based measurements. In practice, separating “frequency stability” from “time alignment” reduces tail error, improves robustness under link quality changes, and makes acceptance criteria clearer for operators.

What SyncE adds (even when PTP already exists)

Lower wander at the local timebase: a stable recovered clock reduces how hard the timing servo must work to maintain frequency continuity.
Less noise injection into timing: when frequency is already “quiet,” PTP can focus on time/phase alignment rather than compensating for oscillator wander.
Clear separation of concerns: frequency quality can be validated independently from time alignment, which improves troubleshooting and acceptance.

EEC / ESMC / SSM / QL (quality-level language for reference selection)

Quality level (QL) is a “reference grade label”

QL communicates the expected reference quality in operational terms: “which reference is better” and “when to switch.”
SSM/ESMC carry and coordinate these quality labels so equipment can choose a reference consistently.
Stable selection requires policy: thresholds, hold-off, and hysteresis to prevent frequent flapping.

QL mismatch · hold-off too short · unstable upstream · priority ambiguity

Why frequent switching happens on-site

Policy too sensitive: aggressive thresholds cause frequent reference changes for marginal quality variations.
Quality propagation breaks: QL labels are not aligned end-to-end, leading to inconsistent selection.
Unclear priority rules: multiple “best” references appear equal, so the system oscillates between them.

The acceptance goal is not “never switch,” but bounded switching: stable selection under normal conditions and controlled transitions with traceable reason codes.
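Bounded switching can be sketched as a selector that changes reference only when a strictly better candidate persists for a hold-off period, and that logs a reason for each switch. This is a toy model: the class name `ReferenceSelector`, the integer quality ranks (lower is better), and the tick-based hold-off are hypothetical simplifications and do not reproduce real SSM/QL codes.

```python
class ReferenceSelector:
    """Toy selector: switch only if a strictly better reference holds for hold_off_ticks."""

    def __init__(self, active, hold_off_ticks=5):
        self.active = active
        self.hold_off_ticks = hold_off_ticks
        self._candidate = None
        self._streak = 0
        self.events = []  # (tick, from_ref, to_ref, reason)

    def update(self, tick, quality):
        """quality maps reference name -> rank; it must include the active reference."""
        best = min(quality, key=lambda name: (quality[name], name))
        # Ties never trigger a switch, which avoids oscillating between equal references.
        if best == self.active or quality[best] >= quality[self.active]:
            self._candidate, self._streak = None, 0
            return self.active
        if best != self._candidate:
            self._candidate, self._streak = best, 0
        self._streak += 1
        if self._streak >= self.hold_off_ticks:
            self.events.append((tick, self.active, best, "better quality held for hold-off"))
            self.active = best
            self._candidate, self._streak = None, 0
        return self.active


selector = ReferenceSelector("gnss", hold_off_ticks=3)
# Marginal flapping: the candidate is better only on alternate ticks, so no switch occurs.
for tick in range(6):
    selector.update(tick, {"gnss": 2 if tick % 2 == 0 else 1, "synce": 1})
print(selector.active, selector.events)
```

The hold-off absorbs marginal quality variations, and the tie rule covers the case where several references look equally good, which are two of the flapping causes listed above.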

Integration with a time card (the chain order matters)

Frequency path: SyncE recovered clock → jitter cleaner → PHC discipline (reduces wander and stabilizes the local frequency foundation).
Time path: PTP packets → hardware timestamps → PHC alignment (aligns time/phase using deterministic capture points).
Operational requirement: the time card must expose reference state, QL/selection events, and bounded transition behavior during loss/recovery.

What to specify & verify (turn words into acceptance)

Frequency foundation: low-wander behavior under normal operation; verify by time-error vs observation window (trend + bounds).
Switching stability: switching rate limits, hold-off/hysteresis policy, and reason codes for each event.
Time alignment tails: p99/p999 time error and timestamp error bounds (not only mean offset).
Loss/recovery: bounded transitions during reference loss/recovery (step vs slew policy + maximum allowed step/slew).

Figure F7 — Dual-path timing: SyncE (frequency) + PTP (time) converge on the time card

Frequency is stabilized by SyncE → jitter cleaning → discipline, while time/phase is aligned using hardware timestamps. Acceptance focuses on bounded switching and tail metrics.

H2-8 · Jitter Cleaner PLL — Phase noise, jitter, and why “cleaning bandwidth” decides success

A jitter cleaner is not a magic box that always improves timing. Its loop bandwidth decides how much input-reference noise is passed through versus how much local-oscillator noise dominates the output. Selecting the right device and configuration requires measurable acceptance: phase noise, integrated jitter, spur behavior, and bounded switching under real reference disturbances.

Key parameters (each must map to an observable failure mode)

Loop bandwidth

Too wide: passes reference noise → output short-term jitter worsens.
Too narrow: slow tracking and longer recovery during switching or disturbances.
Acceptance: compare tail jitter/time-error under the same disturbance profile.

Output phase noise & integrated jitter

Phase noise L(f) must be stated at offset regions that matter (near vs far).
Integrated jitter must declare the integration band, or the number is not comparable.
Acceptance: verify both noise floor and tail behavior; do not rely on a single “typical” value.

Spurs & switching behavior

Spurs may appear from reference coupling, fractional synthesis artifacts, or power-domain interactions.
Hitless/bounded switching requires controlled phase continuity and limited transient error.
Acceptance: test switching events and degraded references; confirm bounded transients and logged reasons.

Measurement & acceptance (no shopping list, only test logic)

Phase noise: measure L(f) and identify discrete spurs; evaluate near-offset and far-offset regions separately.
Jitter integration: integrate the phase-noise curve over a declared band to obtain RMS jitter; compare against limits.
10MHz / 1PPS phase comparison: measure time error/TIE trends and tail behavior during disturbances and switching.
Allan deviation (ADEV): use time-scale-dependent stability to validate holdover-like behavior at relevant τ windows (conceptual, not derivation).
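The jitter-integration step can be sketched as follows, using the common small-angle relation in which RMS phase is the square root of twice the integrated single-sideband phase noise. This is a minimal illustration: it uses plain trapezoidal integration on linear values, which tends to overestimate on sparse, steeply sloped points, and it assumes spurs have been handled separately. The function name and the example curve are assumptions.

```python
import math


def rms_jitter_s(offsets_hz, l_dbc_hz, carrier_hz):
    """RMS jitter in seconds from SSB phase noise L(f) over the band spanned by offsets_hz."""
    if len(offsets_hz) != len(l_dbc_hz) or len(offsets_hz) < 2:
        raise ValueError("need matching lists with at least two points")
    linear = [10.0 ** (l / 10.0) for l in l_dbc_hz]  # dBc/Hz -> 1/Hz
    area = 0.0
    for i in range(1, len(offsets_hz)):
        df = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * df
    phase_rms_rad = math.sqrt(2.0 * area)  # factor 2 accounts for both sidebands
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


# Assumed example: flat -140 dBc/Hz from 10 Hz to 100 kHz on a 10 MHz output.
jitter = rms_jitter_s([10.0, 1e5], [-140.0, -140.0], 10e6)
print(f"{jitter * 1e12:.3f} ps RMS over 10 Hz to 100 kHz")
```

Changing only the integration limits changes the result, which is why a jitter number without its declared band cannot be compared against another.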

Why “cleaning” can get worse (engineering summary)

Bandwidth mismatch: the loop passes the wrong noise region; short-term jitter rises.
Reference disturbances: switching or marginal references inject spurs or transient phase errors.
System coupling: periodic spurs appear due to reference/power interactions; verify with controlled stimulus and repeatability.

Figure F8 — Concept view: what a jitter cleaner “passes” vs “suppresses” across frequency offsets

A cleaner improves output only when the bandwidth is chosen to suppress the right noise region without passing the wrong one. Validate with declared integration bands, spur checks, and bounded switching tests.

H2-9 · Interfaces & Form Factors — PCIe cards, OCP modules, connectors, and electrical constraints

Time cards succeed or fail at the interfaces. The same oscillator and disciplining design can deliver very different results depending on cabling, grounding, shielding continuity, port direction, and how redundant references are routed. This section turns “it connects” into “it is stable, measurable, and maintainable.”

Form factors: where constraints come from (interface-level only)

PCIe AIC (add-in card)

Chassis grounding & shielding continuity become part of the timing path (front-panel openings, seams, and cable shields matter).
EMI coupling risk increases with crowded slots and mixed high-speed cards; verify tail behavior under realistic load profiles.
Serviceability is strong, but stability depends on disciplined cable routing and consistent port labeling (IN/OUT).

OCP mezzanine / module

Reference ground coupling to the baseboard is tighter; sideband signals are easier to consolidate.
Front-panel ports (if present) must maintain shield-to-chassis continuity; avoid creating unintended return paths through the module.
Redundancy strategy should be planned at the rack level (physical separation of reference feeds).

On-board timing block

Fewer external ports can reduce exposure, but system-level grounding and zoning become decisive.
Validation points must be planned (where to observe 1PPS/10MHz/ToD behavior without disturbing the system).
Maintainability tradeoff: less flexible field replacement requires stronger monitoring and event logging.

Server-side interfaces (what each is used for, not implementation)

PCIe (host attachment)

Purpose: exposes the timing device to the host and provides the control/visibility surface.
Constraint: platform reset/power behavior affects timing bring-up; acceptance must include realistic reboot and recovery scenarios.
Operational need: maintain a clear mapping between physical card identity and observed timing state (slot/serial/event logs).

SMBus / I²C (telemetry path)

Purpose: read reference state, temperature, alarms, and event counters that explain timing behavior.
Constraint: prioritize stable, low-noise routing; avoid placing long, noisy sideband runs adjacent to sensitive reference wiring.
Acceptance: telemetry must remain available during reference loss/recovery so the root cause is traceable.

GPIO (alarms / status)

Purpose: simple “state truth” for lock/holdover/reference switching and critical alarms.
Constraint: define direction and logic clearly; treat long GPIO runs as noise antennas unless properly referenced and shielded.
Acceptance: status transitions must match logged events (no “silent switching”).

Front/rear panel ports: 1PPS / 10MHz / ToD / IRIG (list + constraints + common pitfalls)

1PPS (edge timing)

Common roles: output to validation equipment or downstream distribution; input from an external reference when used.
Direction discipline: label IN vs OUT explicitly; avoid accidental “loop feed” when redundant ports exist.
Pitfalls: inconsistent shield/ground reference can show up as tail spikes during load changes or switching events.

10MHz (frequency reference)

Common roles: input for external frequency foundation; output for distribution or measurement.
Constraint: define termination expectations and preserve coax shield continuity to avoid common-mode pickup.
Pitfalls: “works on average” but worsens integrated jitter due to imperfect shield return or mixed grounding.

ToD / IRIG (time-of-day encoding)

Common roles: output for systems that consume absolute time encoding outside of packet timing.
Constraint: ensure a consistent reference ground model; mixed-ground attachments can cause intermittent decode instability.
Pitfalls: long unshielded runs or unclear directionality increase the chance of sporadic time jumps.

Antenna feeds & protection (engineering practice, not RF internals)

Feed routing: treat the feed as an external disturbance path; separate it physically from power bundles and noisy harnesses.
Surge / lightning protection: protection must have a defined discharge path; avoid turning the shield into a ground-loop driver.
Grounding & isolation: keep shield termination consistent across the installation; inconsistent shield bonding is a repeatable source of tail instability.

Multi-reference and redundancy: prevent redundancy from becoming a coupling source

Physical separation: route redundant reference cables separately; avoid long parallel runs that create a shared coupling path.
Role clarity: label each port (IN/OUT, primary/secondary) and keep direction consistent with configuration.
Protection symmetry: apply the same protection and grounding strategy on both paths, or switching behavior becomes asymmetric.
Acceptance: verify bounded transitions and event traceability during reference switching under realistic disturbance profiles.

Installation & bring-up checklist (interfaces)

All front-panel ports clearly labeled with signal type and direction (IN/OUT).
Coax shield continuity verified end-to-end (no “floating shield” segments).
10MHz and 1PPS cabling separated from noisy harnesses and power bundles.
Surge protection has a defined discharge path; no unintended ground loops created through shields.
Redundant reference paths are physically separated and share consistent grounding/protection policy.
Switching events produce bounded transient behavior and are traceable via alarms/telemetry.

Figure F9 — Port map: panel connectors, signal types, and directions (concept)

Use explicit IN/OUT labeling, preserve shield continuity, and keep redundant reference routes physically separated. The diagram is conceptual—port naming and availability depend on the specific card.

Bring-up & Validation: a repeatable acceptance + production test plan

The goal is to turn “looks synchronized” into “provably within spec” with clear measurement points, statistics that expose tail risk (p99/p999), and logs that make every field incident reproducible.

How to structure the validation (3 layers)

Rule of thumb: validate the local timebase and disciplining behavior in the lab first, then validate switching/tails in a rack, then shrink it into a fast, deterministic production test.

Lab verification: phase noise / integrated jitter, 1PPS–10MHz phase compare, ADEV windows, loop response and lock/holdover behavior.

Rack validation: reference switching, bounded time steps, p99/p999 offset/phase-error statistics, re-lock recovery signatures.

Production test: fast functional pass/fail (lock/holdover/alarms, output presence, config hash and traceability records).

Lab verification (what to measure, how to state pass/fail)

Lock behavior: define start state and end state (e.g., “from cold boot to disciplined”), record time-to-lock and first-stable window (avoid averaging away transients).
1PPS phase & 10MHz phase compare: treat short-term noise and long-term drift as different outputs; use TIE trend plus distribution rather than a single “typical” number.
ADEV windows: declare which τ values are acceptance-critical (e.g., τ=1s for short-term, τ=100s+ for holdover tendency), and keep the same τ set across builds.
Loop response sanity: apply a controlled disturbance on the reference (small step or frequency offset), then verify bounded response (no large time step, no excessive settling time, no jitter blow-up).
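The ADEV item can be made concrete with a small calculation. The following is a minimal sketch, assuming evenly spaced time-error (phase) samples in seconds from a time-interval counter; the function names and the `ACCEPTANCE_TAUS_S` list are hypothetical placeholders, not values from any specific card or tool. It uses the standard overlapping Allan deviation estimator.

```python
import math

def overlapping_adev(phase_s, tau0_s, m):
    """Overlapping Allan deviation at tau = m * tau0_s.

    phase_s: time-error samples in seconds, evenly spaced by tau0_s.
    """
    n = len(phase_s)
    if m < 1 or n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 phase samples")
    tau = m * tau0_s
    acc = 0.0
    for i in range(n - 2 * m):
        # Second difference of phase over two adjacent tau-long intervals.
        second_diff = phase_s[i + 2 * m] - 2.0 * phase_s[i + m] + phase_s[i]
        acc += second_diff * second_diff
    return math.sqrt(acc / (2.0 * tau * tau * (n - 2 * m)))

# Hypothetical acceptance set: keep the same tau list across builds.
ACCEPTANCE_TAUS_S = (1, 10, 100)

def adev_report(phase_s, tau0_s=1.0):
    """Return {tau_s: ADEV} for every acceptance-critical tau."""
    return {
        tau: overlapping_adev(phase_s, tau0_s, int(round(tau / tau0_s)))
        for tau in ACCEPTANCE_TAUS_S
    }
```

A constant frequency offset (a pure phase ramp) yields an ADEV of zero, while a linear frequency drift D yields D·τ/√2, which is a convenient sanity check before trusting results from real captures.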

Example tool models (MPN/Model):

Phase noise / jitter: Keysight E5052B, R&S FSWP, Microchip 53100A

Time interval / phase: Keysight 53230A, Pendulum CNT-91

Example time-card timing ICs (MPN):

DPLL / sync: ADI AD9545, Microchip ZL30772, Renesas 8A34001, Skyworks/Silicon Labs Si5345

GNSS timing module/board: u-blox ZED-F9T-10B / ZED-F9T-20B, u-blox RCB-F9T-1 timing board

Rack validation (prove tails + switching are bounded)

Reference switching: test main→backup→main (and GNSS loss/recovery). The acceptance item is not just “recovers” but bounded step and repeatable signature.
Tail-aware statistics: report min / typical / p99 (optionally p999). Split measurements into steady-state vs recovery windows; never mix them into one histogram.
Evidence-first logging: every switch/holdover entry must emit a timestamped event record and a reason code; this allows one-to-one correlation to offset jumps.
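The steady-state vs recovery split can be sketched as follows. This is an illustrative example, assuming offset samples as `(time_s, offset_ns)` pairs and a list of switch/holdover event times taken from the event log; the function names, the fixed recovery window length, and the use of the median as the "typical" value are assumptions made for the sketch.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def split_by_window(samples, event_times_s, recovery_s):
    """Split |offset| samples into steady-state and recovery sets.

    A sample is 'recovery' if it falls within recovery_s after any event.
    """
    steady, recovery = [], []
    for t, offset_ns in samples:
        in_recovery = any(ev <= t < ev + recovery_s for ev in event_times_s)
        (recovery if in_recovery else steady).append(abs(offset_ns))
    return steady, recovery

def summarize(values):
    """Report min / typical (median) / p99 / p999 for one window type."""
    return {
        "min": min(values),
        "typical": percentile(values, 50),
        "p99": percentile(values, 99),
        "p999": percentile(values, 99.9),
    }
```

Calling `summarize` separately on the two lists returned by `split_by_window` keeps recovery transients out of the steady-state histogram, so each window type gets its own pass/fail limit.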

Key acceptance terms: bounded step/slew, p99/p999 tails, switch reason codes, recovery signatures.

Example SyncE/PTP test instrument (Model): Calnex Paragon-neo (PTP + SyncE).

Production test (fast pass/fail + traceability)

Fast functional: lock/holdover/alarm paths, output presence (1PPS/10MHz/ToD), and basic frequency sanity. Keep it deterministic.
Config integrity: record a configuration hash (loop BW profile, reference priority list, switch policy) per serial number. Prevent “same hardware, different behavior” incidents.
Calibration binding: tie any factory calibration (e.g., oscillator trim tables) to the serial number and the firmware/config set; store in a tamper-evident log if required.
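One way to implement the configuration hash is to serialize the settings canonically and hash the result. The sketch below assumes the configuration can be represented as a JSON-compatible dictionary; the field names (`loop_bw_hz`, `ref_priority`, `switch_policy`) and the record layout are hypothetical and stand in for whatever the card's firmware actually exposes.

```python
import hashlib
import json

def config_hash(config):
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def traceability_record(serial, firmware, config):
    """Bind the configuration hash to a serial number and firmware version."""
    return {
        "serial": serial,
        "firmware": firmware,
        "config_sha256": config_hash(config),
    }

def matches_golden(record, golden_config):
    """True if the unit was tested with the approved configuration."""
    return record["config_sha256"] == config_hash(golden_config)

# Hypothetical configuration fields, for illustration only.
golden = {
    "loop_bw_hz": 0.1,
    "ref_priority": ["GNSS", "SYNCE", "EXT_10MHZ"],
    "switch_policy": "slew",
}
record = traceability_record("SN0001", "1.2.3", golden)
```

List order is preserved by the hash, which is intentional here: swapping two entries in the reference priority list changes behavior and should change the hash.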

Typical production pitfall: passing only “average offset” hides rare time steps and long-tail jitter. A quick “tail probe” (short p99 window after forced switching) catches the majority of escapes.

Figure F10 — Acceptance test matrix (Item × Tool Model × Pass Focus × Common Fail Clue)

Use the matrix as a living “contract”: every row must define measurement point, window length, and a pass/fail rule that exposes tails and switching transients.

Field Debug Playbook: symptom-first triage and root-cause paths

Field failures are rarely “PTP is bad.” Treat them as evidence problems: isolate whether the trigger is reference health, loop behavior, jitter-cleaner artifacts, or timestamp/ToD alignment.

Symptom → first checks (fast, high-signal)

Sudden time jump / offset spikes: confirm a reference switch or holdover entry occurred at the same timestamp; if no event exists, treat logging/config integrity as a primary suspect.
Holdover drift “way too big”: compare drift slope against declared ADEV/holdover limits; check temperature context and whether the loop incorrectly “chased” noisy GNSS before holdover.
ToD wrong by a day / leap-second chaos: validate ToD formatting source and 1PPS alignment; confirm time-scale flags and the last update moment around the incident.
SyncE flaps / jitter worsens: look for frequent QL/priority changes, switching transients, and new spurs on the output clock(s).

Non-negotiable for fast debugging: keep timestamped event logs for reference selection, holdover transitions, alarm asserts, and config hash.
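The first check for a sudden time jump (does an event exist at the same timestamp?) can be automated. This is a minimal sketch, assuming offset samples as `(time_s, offset_ns)` pairs and an event log as `(time_s, reason_code)` pairs on the same timebase; the threshold, tolerance, and reason-code strings are hypothetical.

```python
def find_jumps(samples, threshold_ns):
    """Return sample times where the offset changes by more than threshold_ns."""
    jumps = []
    for (_, prev_off), (t, off) in zip(samples, samples[1:]):
        if abs(off - prev_off) > threshold_ns:
            jumps.append(t)
    return jumps

def correlate(jump_times_s, events, tolerance_s):
    """Pair each jump with the first logged event within tolerance_s.

    Returns a list of (jump_time_s, reason_code or None). A None entry is a
    jump with no matching event, which points at logging/config integrity.
    """
    result = []
    for jt in jump_times_s:
        match = None
        for et, reason in events:
            if abs(jt - et) <= tolerance_s:
                match = reason
                break
        result.append((jt, match))
    return result

samples = [(0, 5), (1, 6), (2, 300), (3, 302), (4, 10), (5, 11)]
events = [(1.8, "REF_SWITCH")]  # hypothetical reason code
report = correlate(find_jumps(samples, 100), events, 0.5)
```

In this example the jump at t=2 is explained by the logged switch, while the jump at t=4 has no matching event and would be escalated as an evidence gap.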

Root-cause bucket #1 — GNSS/reference health (engineering evidence)

Evidence to capture: satellite count/CN0 trend, lock/unlock counters, holdover entries per hour, antenna power/current anomalies, and “reference valid” flags.
Actions: verify feedline continuity & grounding, surge protection path consistency, and whether an indoor distribution/re-radiation point is the true intermittent trigger.
Relevant MPN examples: u-blox ZED-F9T-10B / ZED-F9T-20B, u-blox RCB-F9T-1 timing board.

Root-cause bucket #2 — Disciplining loop configuration (bounded steps vs noisy chasing)

Evidence to capture: offset step size at re-lock, frequency correction saturation, settle time, and whether the recovery is “slew” or “step.”
Actions: verify configuration profile/version consistency (loop BW, filter profiles, reference priority). Force a controlled switch to see if the signature repeats.
Relevant MPN examples: ADI AD9545, Microchip ZL30772, Renesas 8A34001.

Root-cause bucket #3 — Jitter cleaner artifacts (spurs, switching transients)

Evidence to capture: phase-noise curve snapshots before/after the incident, discrete spur appearance, integrated jitter increase within the declared band.
Actions: correlate spur onset to reference switching or PSU/EMI changes; validate input jitter tolerance and “hitless switching” behavior if present.
Relevant MPN examples: Skyworks/Silicon Labs Si5345 (jitter-attenuating clock family).

Measurement model examples: Keysight E5052B, R&S FSWP, Microchip 53100A.

Root-cause bucket #4 — Timestamp/PHC/ToD alignment (tails and asymmetry)

Evidence to capture: Rx/Tx asymmetry, long-tail offset distribution (p99/p999), ToD vs 1PPS edge alignment, and whether tails only occur during recovery windows.
Actions: split statistics into steady vs recovery; validate ToD output policy (step vs slew boundaries) and port wiring reference for ToD/IRIG/1PPS.
Useful models for time-interval/phase checks: Keysight 53230A, Pendulum CNT-91.

Figure F11 — Field debug decision tree (symptom → evidence → action)

Keep the tree short: identify the evidence bucket first (reference / loop / cleaner / timestamp), then apply one controlled experiment to confirm.

FAQs (12): common selection, validation, and field-debug questions

Each answer is written for engineering decisions: clear boundary, what to verify, and which measurements/logs make the conclusion defensible.

FAQ answers

1) If a NIC already has a PHC, why add a Time Card?

A Time Card is justified when the system needs multi-reference discipline, strong holdover, and auditable, bounded behavior. A NIC PHC may timestamp packets well, but it typically lacks external reference diversity (GNSS/SyncE/10MHz/1PPS), higher-grade oscillators, and a validation-grade event trail for reference switching and holdover transitions.

Need signal: long-tail offset excursions (p99/p999) or intermittent time jumps during reference disturbances.
Capability gap: no deterministic “bounded step/slew” policy across re-lock and reference switching.
Operational gap: insufficient evidence (reason codes, timestamps, config hash) to reproduce incidents.

Example silicon (MPN): DPLL/synchronizer ADI AD9545, Microchip ZL30772, Renesas 8A34001; GNSS timing module u-blox ZED-F9T-10B/20B.

2) OCXO vs TCXO: where is the boundary, and which specs best predict holdover?

OCXO is usually chosen when holdover must remain tight across temperature and time; TCXO fits when cost/power/space dominate and holdover requirements are relaxed or the outage window is short. Predictive specs are those that track time error growth under loss of reference, not only “ppm at 25°C.”

Most predictive: Allan deviation (ADEV) at relevant τ windows, aging rate, and temperature sensitivity (including gradients).
Also important: close-in phase noise (for short-term stability) and g-sensitivity (mechanical stress/vibration).
Verification: measure drift slope over a realistic holdover duration; split steady vs recovery windows.

Example measurement models: phase noise/jitter Keysight E5052B or R&S FSWP; time interval/phase Keysight 53230A or Pendulum CNT-91.

3) GNSS is locked—why can offset still jump intermittently?

“Locked” can still be unhealthy. Intermittent offset jumps commonly come from reference quality fluctuation (multipath/interference), reference switching events, or a disciplining loop that tracks GNSS noise too aggressively. In practice, the fastest path is to prove whether the jump aligns with an event timestamp.

Reference health: CN0/satellite count changes, antenna feed anomalies, or interference bursts.
Switching transient: reference priority changes or “valid” flag flapping triggers a hit.
Loop behavior: too-wide loop bandwidth converts GNSS phase noise into PHC/ToD noise.

Example GNSS timing parts (MPN): u-blox ZED-F9T-10B/20B; timing board (Model): u-blox RCB-F9T-1.

4) If GPSDO loop bandwidth is wrong, what “typical field symptoms” appear?

Wrong bandwidth shows up as a tradeoff failure: too wide makes the clock “follow noise,” too narrow makes it “recover too slowly.” The symptoms that matter are those visible in offset distribution, time steps during re-lock, and jitter/phase-noise snapshots before/after events.

Too wide: higher short-term jitter, more offset spikes, time steps on reacquire, spurs coupling into outputs.
Too narrow: slow convergence, long recovery windows, larger residual drift after disturbances.
Evidence: repeated signature under a controlled reference loss/recovery test.

Example DPLL MPNs: ADI AD9545, Microchip ZL30772, Renesas 8A34001.

5) Why do many data centers use “SyncE for frequency, PTP for time”?

SyncE delivers a low-wander frequency foundation, while PTP distributes time/phase alignment. Combining them reduces the burden on the time servo: the PHC is disciplined by a cleaner, steadier frequency input, and PTP does not need to correct as much frequency error over the network.

Frequency path: SyncE recovered clock → jitter cleaner/DPLL → stable local frequency.
Time path: PTP packets → hardware timestamps → PHC/ToD phase alignment.
Outcome: fewer tails and less sensitivity to packet timing noise.

6) If SSM/QL switches frequently, what happens—and how to tell upstream vs local card fault?

Frequent QL switching causes repeated phase hits, wander growth, and long-tail offset degradation—often without obvious “average offset” changes. To separate upstream instability from local policy, correlate QL/priority changes and card switch events on a single timeline.

Upstream suspect: QL messages flap; multiple devices see the same reference instability window.
Local suspect: local thresholds/policies trigger unnecessary switching; event counters spike without upstream evidence.
Confirm: pin reference selection temporarily and replay the disturbance to see if tails vanish.

Example synchronizer class (MPN): Microchip ZL30772 (network sync / SyncE-aware designs commonly use this class).

7) After jitter cleaning, spurs increase—what are the most common root causes?

Spurs often grow when the cleaner is configured with an unsuitable bandwidth/profile, when reference switching transients leak through, or when power/ground coupling modulates the PLL. The fastest proof is a “before/after” phase-noise snapshot plus a controlled switch experiment.

Config-driven: wrong loop BW or fractional synthesis settings create deterministic spur families.
Switch-driven: hitless switching not actually hitless under real reference quality.
Coupling-driven: PSU ripple/ground bounce injects modulation into the PLL/VCO path.

Example jitter-attenuator family (MPN): Skyworks/Silicon Labs Si5345. Measurement models: Keysight E5052B, R&S FSWP.

8) How to turn phase noise L(f) and integrated jitter into an acceptance-ready spec?

An acceptance spec must declare measurement point and integration limits. L(f) is only comparable when the offset-frequency range, detector method, and bandwidth are fixed. Integrated jitter must state the RMS integration band (e.g., 12 kHz–20 MHz) and whether spurs are included.

Define: output node, termination/load, integration band, averaging/measurement time, and pass/fail threshold.
Report: L(f) mask + RMS jitter + spur list (offset + amplitude) when relevant.
Validate: repeatability across temperature and across reference switching states.
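The relationship between an L(f) curve and RMS jitter can be illustrated numerically. The sketch below assumes L(f) is given as a few `(offset_hz, dBc/Hz)` points joined by straight lines on a log-log plot, and uses the standard relation between single-sideband phase noise and RMS phase, σ_φ² = 2∫L(f)df. The function names and the optional spur list are illustrative assumptions; a real instrument applies its own interpolation and spur handling.

```python
import math

def integrate_ssb_noise(points):
    """Integrate L(f) (linear power ratio) across the given offset points.

    points: list of (offset_hz, l_dbc_hz), sorted by offset frequency.
    Each segment is treated as a power law (a straight line on log-log axes).
    """
    total = 0.0
    for (f1, l1), (f2, l2) in zip(points, points[1:]):
        s1 = 10.0 ** (l1 / 10.0)
        slope = (l2 - l1) / (10.0 * math.log10(f2 / f1))  # power-law exponent
        if abs(slope + 1.0) < 1e-9:
            total += s1 * f1 * math.log(f2 / f1)
        else:
            total += s1 * f1 / (slope + 1.0) * ((f2 / f1) ** (slope + 1.0) - 1.0)
    return total

def rms_jitter_s(points, carrier_hz, spurs_dbc=()):
    """RMS jitter in seconds over the band spanned by points.

    spurs_dbc: discrete spur levels in dBc inside the band; pass an empty
    tuple to report jitter with spurs excluded.
    """
    noise = integrate_ssb_noise(points)
    spur_power = sum(10.0 ** (s / 10.0) for s in spurs_dbc)
    phase_rms_rad = math.sqrt(2.0 * (noise + spur_power))
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)

# Flat -150 dBc/Hz floor over 12 kHz to 20 MHz on a 10 MHz carrier.
band = [(12e3, -150.0), (20e6, -150.0)]
without_spurs = rms_jitter_s(band, 10e6)
with_spur = rms_jitter_s(band, 10e6, spurs_dbc=(-80.0,))
```

Reporting both numbers side by side makes the "spurs included or not" declaration explicit, because a single discrete spur can move the integrated result noticeably.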

Example instrument models: Keysight E5052B, R&S FSWP, Microchip 53100A.

9) During leap second or ToD updates, how to avoid “time rollback / reordering” in applications?

Avoid injecting large time steps into the system timebase. Prefer bounded slew policies for ToD/PHC alignment, and isolate “update moments” from critical sequencing logic. In validation, force leap-second-like ToD transitions and verify that monotonic ordering is preserved in logs and event timestamps.

Policy: define when step is allowed vs forced slew; bound the maximum correction rate.
Evidence: record update timestamps, offset distributions, and any step events with reason codes.
Mitigation: segment statistics into steady-state vs update/recovery windows; never mix them.
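The difference between a step and a bounded slew can be shown with a small simulation. This is a sketch under stated assumptions: a one-second backward correction, a hypothetical maximum slew rate of 500 ppm, and an idealized clock read every 0.5 s. The function names are illustrative and do not correspond to any particular servo or driver API.

```python
def slew_duration_s(offset_s, max_slew_ppm):
    """Time needed to absorb offset_s at a bounded slew rate."""
    return abs(offset_s) / (max_slew_ppm * 1e-6)

def apply_step(readings_s, offset_s, start_s):
    """Apply the whole correction at once from start_s onward."""
    return [t + (offset_s if t >= start_s else 0.0) for t in readings_s]

def apply_slew(readings_s, offset_s, start_s, max_slew_ppm):
    """Spread the correction over time, never faster than max_slew_ppm."""
    rate = max_slew_ppm * 1e-6
    sign = 1.0 if offset_s >= 0 else -1.0
    out = []
    for t in readings_s:
        elapsed = max(0.0, t - start_s)
        applied = sign * min(abs(offset_s), rate * elapsed)
        out.append(t + applied)
    return out

def is_strictly_increasing(timestamps):
    """True if no timestamp repeats or goes backward."""
    return all(b > a for a, b in zip(timestamps, timestamps[1:]))

raw = [0.5 * i for i in range(6001)]          # 0 to 3000 s in 0.5 s ticks
stepped = apply_step(raw, -1.0, 10.0)         # rollback at t = 10 s
slewed = apply_slew(raw, -1.0, 10.0, 500.0)   # same correction, bounded rate
```

The stepped series goes backward at the update moment, while the slewed series stays strictly increasing and still converges on the full correction. The cost is recovery time: at 500 ppm, a one-second correction takes about 2000 s, which is why the update/recovery window has to be declared and measured separately.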

10) How long should holdover be tested, and which statistics avoid “short test looks good” traps?

Holdover testing must match outage risk and the oscillator’s relevant time constants. Short tests often miss temperature gradients and aging-like drift behavior. Use a duration long enough to reveal drift slope, then report both slope and tail risk (worst segments, not just averages).

Measure: time error growth vs time; capture temperature context; repeat across thermal conditions.
Stats: slope + max segment drift + p99/p999 within declared windows; split steady vs recovery.
Interpret: connect results to τ windows (ADEV) that matter for the target outage length.
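The slope and worst-segment statistics can be computed directly from a holdover capture. The following is a minimal sketch, assuming time-error samples in nanoseconds with timestamps in seconds; the function names and the 600 s segment length are hypothetical choices for illustration.

```python
def drift_slope(times_s, err_ns):
    """Least-squares slope of time error vs time, in ns per second."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_e = sum(err_ns) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_s, err_ns))
    return sxy / sxx

def max_segment_drift(times_s, err_ns, segment_s):
    """Largest |time-error change| over any window up to segment_s long."""
    worst = 0.0
    j = 0
    for i in range(len(times_s)):
        j = max(j, i)
        # Advance j to the last sample still inside the window starting at i.
        while j + 1 < len(times_s) and times_s[j + 1] - times_s[i] <= segment_s:
            j += 1
        worst = max(worst, abs(err_ns[j] - err_ns[i]))
    return worst

# One hour of samples every 60 s; drift rate quadruples after 30 minutes,
# loosely imitating a thermal change during holdover.
times = [60.0 * k for k in range(61)]
kinked = [0.5 * t if t <= 1800 else 900.0 + 2.0 * (t - 1800.0) for t in times]
overall_slope = drift_slope(times, kinked)
worst_10min = max_segment_drift(times, kinked, 600.0)
```

Here the overall slope averages over both regimes, while the worst 10-minute segment shows the full 1200 ns excursion from the faster regime. That gap is the "short test looks good" trap in miniature: a test that ended before the 30-minute mark would only have seen the slow drift.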

Example time-interval/phase models: Keysight 53230A, Pendulum CNT-91.

11) What are the most common wiring/grounding pitfalls for 1PPS/10MHz/ToD?

The most frequent failures are not “bad clocks” but ground loops, shield discontinuities, and protection paths that inject noise. Treat 1PPS/10MHz/ToD as precision signals: control return paths, keep shielding continuous, and ensure surge currents do not share sensitive grounds.

Ground loops: multiple chassis bonds or mixed return paths create low-frequency wander and spur-like modulation.
Shield continuity: broken coax shield or adapters introduce pickup and edge timing noise.
Protection path: surge/ESD devices must dump to the intended reference (chassis/earth) without polluting signal ground.

Example interface parts (representative, not a BOM): coax connectors SMA/MCX; ESD clamp family examples: Semtech RClamp series (interface ESD class); coax surge protector example model: PolyPhaser class units.

12) What is the minimum pre-deploy test set to quickly screen unstable cards?

A minimal screen should force the two failure modes that surface fastest: switching transients and tail growth. Combine a controlled reference loss/recovery, a short p99 window measurement, a quick spur snapshot, and a traceability check (config hash + event log). This catches the majority of “looks fine until field” escapes.

Test 1: forced reference loss/re-lock; verify bounded step/slew and repeatable signature.
Test 2: p99 offset/phase-error in a defined window (steady vs recovery split).
Test 3: spur check / integrated jitter band snapshot; verify no new discrete spurs.
Test 4: outputs present (1PPS/10MHz/ToD) + alarms + event logs + config hash record.

Example rack tester model: Calnex Paragon-neo. Phase noise model: Keysight E5052B or R&S FSWP.

Note: MPN/Model references are examples to anchor acceptance/debug vocabulary, not a purchasing recommendation.

Figure F12 — FAQ coverage map (Q groups → chapter clusters)

Clean grouping keeps the diagram readable on mobile while still showing how FAQs map to the main chapters.
