
Timing Cards & Modules: Integrated PLL, Cleaner, Fanout & Alarms


A timing card/module is a system-level “time backbone” that turns uncertain time sources into verified, maintainable, multi-domain clocks—delivering controlled jitter, repeatable phase alignment, disciplined holdover, and actionable alarms for production and field operations.

This page explains how to design, integrate, validate, and operate timing cards/modules using measurable checks, acceptance criteria, and selection logic—so timing performance stays predictable across temperature, power events, and failovers.

Definition: What is a Timing Card / Module?

A timing card/module is a deployable clock subsystem that turns one or more time references into controlled clock outputs with repeatable alignment, auditable alarms, and maintainable holdover. It is designed to be integrated, validated, and operated as a system capability (not a single IC).

The system pain it fixes (why it exists)

Controlled jitter profile
Symptom: “Looks OK on one bench” but fails elsewhere.
Impact: downstream lock/quality becomes unpredictable.
Repeatable alignment
Symptom: reboot/reseat changes relative phase.
Impact: multi-card systems lose deterministic timing.
Observable alarms (audit-ready)
Symptom: “Something drifted” with no traceable evidence.
Impact: field debugging turns into guesswork.
Maintainable holdover
Symptom: reference degrades → time “jumps” or “walks away”.
Impact: no stable behavior can be guaranteed during outages.

Timing card vs timing module vs “clock tree board”

Clock tree board
Focus: distribution, levels, terminations, skew.
Often missing: disciplined holdover + auditable alarms as a closed loop.
Timing module
Focus: embeddable subsystem with defined I/O and control.
Typical fit: constrained space, moderate telemetry, integrated platforms.
Timing card
Focus: deployable + maintainable (upgrade, logs, alarms, redundancy).
Typical fit: systems that require operational SLAs and audit trails.

Typical inputs/outputs (card-level view)

Inputs (time sources)
  • GNSS / ToD + 1PPS (absolute time anchor)
  • PTP hardware-timestamped port (network time feed)
  • SyncE recovered clock (transport-grade frequency)
  • 1PPS / 10 MHz (lab or system reference)
Outputs (clock domains)
  • Ref clocks (multi-output fanout to endpoints)
  • SYSREF / sync pulses (deterministic alignment hooks)
  • 1PPS out (system time marker)
  • ToD distribution (time-of-day delivery to systems)
Scope lock for this page
Focus is on system integration, validation, disciplining/holdover behavior, and alarms. PLL math and protocol stack details are intentionally kept out to avoid cross-page overlap.
Figure: Timing Card/Module in the System. Time sources on the left (GNSS, PTP, 1PPS, 10 MHz) feed an integrated timing card/module (PLL, cleaner, fanout, holdover discipline, alarms for lock/phase/temp), producing multiple output domains (RefClk, SYSREF, 1PPS out, ToD). Key idea: deliver a controlled clock + alignment + alarms + holdover as a subsystem.
System position: a timing card/module sits between time sources and multi-domain endpoints, making clock quality and state observable and repeatable.

When to Use It: Discrete vs Card/Module (Decision Triggers)

Choose a timing card/module when clock alignment, time stability, alarms, and failover must behave like a system-level SLA. If the system can tolerate manual tuning and limited observability, a discrete clock tree can be more cost-effective.

Decision triggers (engineer-first)

Must-have triggers
  • Multi-chassis / multi-card alignment must be repeatable (ps–ns class), including after reboot or reseat.
  • Time/clock health must be auditable: alarms, timestamps, counters, and clear state transitions are required for operations.
High-ROI triggers
  • Redundancy and failover are required (A/B references, hitless switching goals, defined recovery behavior).
  • Production consistency matters: calibration parameters must be fixed, and acceptance tests must be repeatable at scale.
Practical “quantifiable” framing (without going into math)
  • Alignment need: is the requirement “repeatable after reboot” or “stable over hours across temperature”?
  • Observability need: are time events required to be logged with timestamps and state transitions?
  • Failover need: is a defined maximum phase transient required during switching?
  • Production need: is there a fixed acceptance workflow with stored calibration data and audit trails?
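The trigger questions above reduce to a two-step decision: does the system need a system-level SLA, and if so, does it also need deployable operations (logs, upgrades, audit)? A minimal sketch, with hypothetical parameter names mirroring this page's decision flow (not any product API):

```python
def select_integration_level(*, needs_sla: bool, needs_deployable_ops: bool) -> str:
    """Map decision triggers to an integration level.

    needs_sla: alignment/alarms/holdover/failover must behave like an SLA.
    needs_deployable_ops: logs, upgrades, and audit trails are required.
    """
    if not needs_sla:
        return "discrete clock tree"   # single-domain, low ops burden
    if not needs_deployable_ops:
        return "timing module"         # embeddable subsystem, moderate telemetry
    return "timing card"               # deployable + maintainable, audit-ready
```

For example, a system needing repeatable alignment but no field audit trail would land on `select_integration_level(needs_sla=True, needs_deployable_ops=False)`, i.e. a timing module.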

Typical fits (fast sanity check)

Discrete clock tree
Best when: single board/domain, low operational burden, manual tuning acceptable, limited telemetry required.
Timing module
Best when: embedded integration is needed, defined I/O is preferred, moderate alarms/logs, controlled deployment footprint.
Timing card
Best when: system-level SLAs, auditability, redundancy, remote operations, and repeatable acceptance tests are required.

Common cost of picking the wrong level

Too light (stays discrete)
  • Field faults become non-reproducible and hard to audit.
  • Alignment drifts or resets are difficult to bound.
  • Production variability increases without a fixed validation template.
Too heavy (over-spec card)
  • Unnecessary BOM/power/complexity and longer bring-up time.
  • More configuration states to validate and operate.
  • The real bottleneck may be elsewhere (layout, power noise, or endpoint constraints).
Figure: Discrete vs Module vs Card Decision Flow. Decision triggers map to an integration level: no system-level SLA need (alignment, alarms, holdover, failover) → discrete clock tree (single-domain, low ops); SLA need without deployable operations (logs, upgrades, audit) → timing module; SLA need plus deployable operations → timing card.
Use triggers (SLA, auditability, redundancy, and production repeatability) to select the appropriate integration level instead of chasing single-component specs.

Internal Architecture: The “Timing Stack” Inside

A timing card/module behaves like a small timing subsystem. The fastest way to understand it is to separate three parallel planes: Clock plane (what time flows through), Control plane (how behavior is configured), and Telemetry plane (what can be measured, alarmed, and audited).

The three-plane mental model (subsystem view)

Clock plane
Reference → synth/clean → fanout/levels → output domains.
Control plane
Mode selection, loop profiles, output mapping, thresholds, and stored calibration.
Telemetry plane
Lock/phase/frequency/temperature/rail status → alarms + logs + audit trail.

Five functional bricks (role → interfaces → failure signature)

1. Reference sources
Role: provide predictable short/mid-term stability for holdover and tracking.
Interfaces: local osc, (optional) tuning input, temperature sensing.
Signature: temperature-correlated drift, warm-up behavior changes.
2. Synth / Cleaner
Role: shape the jitter profile and manage modes (track vs clean).
Interfaces: reference input, loop profile, lock detect.
Signature: abnormal lock time, phase steps on mode changes.
3. Fanout / Levels
Role: deliver the conditioned clock to many endpoints with controlled skew.
Interfaces: per-output enable, level select, delay trim.
Signature: one output degrades due to loading/termination mismatch.
4. Monitor
Role: convert health into alarms and audit signals.
Interfaces: taps, thresholds, debounce, event timestamps.
Signature: alarm storms (too sensitive) or missed drift (too loose).
5. Control plane
Role: configuration, stored calibration, logging, remote operations.
Interfaces: MCU/FPGA, EEPROM, mgmt links, firmware control.
Signature: version drift or config mismatch causing behavior changes.
Engineering takeaway
Clock quality issues are rarely “one chip” problems on a card. The correct debug axis is: which plane failed (clock vs control vs telemetry) and which brick is responsible (reference / cleaner / fanout / monitor / control).
Figure: Internal Architecture (Timing Stack). Five functional bricks (Reference: XO/VCXO/TCXO/OCXO; Synth/Cleaner with track/clean modes; Fanout/Levels: LVDS/HCSL/LVPECL/CMOS; Monitor: phase/freq/temp/missing pulse/rails; Control plane: MCU/FPGA/EEPROM/logs/remote management) connected by three link types: clock (solid), control (dashed), telemetry (dotted).
Internal stack: five bricks connected by three planes. Debugging becomes faster when a symptom is mapped to a plane (clock/control/telemetry) and then to a brick.

Inputs & References: Time Sources and Isolation Strategy

Cards/modules rarely rely on a single input. Multiple time sources are classified, then passed through health gates, and finally selected by priority + switching policy. The goal is predictable behavior during degraded inputs, not maximum sensitivity.

Input types (by timing meaning, not by connector)

Absolute time anchor
GNSS RF / ToD + 1PPS (time-of-day and a stable epoch marker).
Network time feed
PTP via a hardware-timestamped port (time updates plus path variability).
Frequency transport
SyncE recovered clock (frequency reference delivered by transport).
Local/lab reference
10 MHz and/or 1PPS from a system backplane or lab source.

Health gates (sanity checks that prevent “bad-but-preferred” inputs)

Freq offset gate
Reject inputs with frequency error beyond the allowed capture/hold window.
Phase step gate
Detect sudden phase jumps that would produce time “bumps” after selection.
Noise / stability gate
Prefer inputs with stable short-term behavior; avoid “flapping” between good/bad.
Continuity gate
Missing pulses/packets or unstable link states are treated as degraded even if averages look OK.
Key rule
Quality is not the same as priority. A high-priority input still must pass health gates. This prevents “preferred but unhealthy” references from dominating the system.
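The "gates before priority" rule can be sketched as a pure predicate over an input's health metrics. This is a minimal illustration; the field names and threshold parameters are hypothetical placeholders, and real gates would add dwell times and per-input policies:

```python
def passes_health_gates(ref, *, max_freq_offset_ppb, max_phase_step_ns,
                        max_short_term_dev):
    """Every input must pass every gate before priority is even considered.

    ref: dict of measured health metrics for one candidate reference.
    Thresholds are placeholders set by the system's capture/hold windows.
    """
    return (abs(ref["freq_offset_ppb"]) <= max_freq_offset_ppb   # freq offset gate
            and abs(ref["phase_step_ns"]) <= max_phase_step_ns   # phase step gate
            and ref["short_term_dev"] <= max_short_term_dev      # noise/stability gate
            and ref["continuity_ok"])                            # continuity gate
```

A high-priority input that fails any single gate is simply ineligible; priority only ranks the survivors.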

Isolation strategy (minimum set that prevents cross-domain contamination)

Power domain hygiene
Separate quiet rails for sensitive timing blocks; filter and control sequencing to avoid “healthy input, noisy output” surprises.
Ground/return control
Manage return paths across connectors and shields; avoid unintended current loops that convert cable motion into phase events.
Signal isolation
Use appropriate coupling/isolation on inputs; keep noisy digital edges from polluting reference-sensitive nodes.
Figure: Input Arbitration and Reference Selection. Inputs (GNSS, PTP, SyncE, 10 MHz) are classified, checked by health gates (frequency, phase, noise), then selected by a priority selector whose switch policy uses hysteresis and stable windows. Rule: inputs must pass gates before priority can select; switching uses hysteresis to avoid flapping.
Input arbitration: classify inputs, reject unhealthy references via gates, then select by priority with hysteresis/stable windows for predictable behavior.
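The anti-flap switching policy can be sketched as a small selector: a new best candidate must remain best for a full stable window before it becomes the active reference. Names and the counter-based window are illustrative assumptions, not a specific device's behavior:

```python
class RefSelector:
    """Priority selection with a stable-window requirement (anti-flap)."""

    def __init__(self, stable_window: int):
        self.stable_window = stable_window  # consecutive polls required
        self.active = None
        self._candidate = None
        self._count = 0

    def step(self, healthy_refs_by_priority):
        """healthy_refs_by_priority: ref names, best first, already
        filtered by health gates. Returns the active reference."""
        best = healthy_refs_by_priority[0] if healthy_refs_by_priority else None
        if best == self.active:
            self._candidate, self._count = None, 0  # no change pending
            return self.active
        # A different best ref must persist for the whole stable window.
        if best == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = best, 1
        if self._count >= self.stable_window:
            self.active = best
            self._candidate, self._count = None, 0
        return self.active
```

A one-poll transient toward another reference never triggers a switch; only sustained eligibility does, which is the hysteresis behavior the arbitration figure describes.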

Disciplining & Holdover: Control Modes and What “Good” Looks Like

The real value of a timing card/module is not “having clocks,” but bounded time behavior when references degrade or disappear. This section defines control modes as observable states, focuses on logs and measurable curves, and provides a reusable acceptance template (placeholders must be set by system requirements).

Three modes (defined by behavior, not by control theory)

A. Free-run
Intent: keep outputs running from the local oscillator only.
Entry: no valid external reference passes health gates.
Observable: phase error drifts according to local stability.
Log: mode, temp, tune word (if any), drift metrics.
B. Discipline (track)
Intent: steer local time/frequency to a selected reference.
Entry: reference selected + stable window satisfied.
Observable: phase error converges into a stable band.
Log: active_ref, loop profile, lock time, phase/freq error.
C. Holdover
Intent: maintain bounded time without the external reference.
Entry: reference fails gates; holdover policy asserted.
Observable: phase error grows within a defined envelope.
Log: last-good ref stats, temp, predicted drift, alarms.
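The three modes can be expressed as a gated transition table. This is a behavioral sketch using only the transitions this page defines (ref_ok, ref_fail, recover_stable); unknown (mode, event) pairs keep the current mode, which is the safe default:

```python
# Mode and event names follow this page's definitions; guard details
# (health gates, stable windows, hysteresis) are set by system policy.
TRANSITIONS = {
    ("FREE_RUN",   "ref_ok"):         "DISCIPLINE",  # ref selected + stable window
    ("DISCIPLINE", "ref_fail"):       "HOLDOVER",    # ref fails gates; holdover policy
    ("HOLDOVER",   "recover_stable"): "DISCIPLINE",  # recovery, gated by stability
}

def next_mode(mode: str, event: str) -> str:
    """Gated transition: unrecognized (mode, event) pairs are ignored."""
    return TRANSITIONS.get((mode, event), mode)
```

Each call site would log the transition with a timestamp and the configuration identity, so the mode timeline stays auditable.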

What “good” looks like (curves and signatures)

Discipline: stable convergence
  • Phase error moves into a steady band and stays there.
  • No periodic “time bumps” tied to mode updates.
  • Recovery does not produce a visible step beyond policy limits.
Holdover: bounded envelope
  • Phase error grows predictably (mostly smooth slope).
  • Temperature changes shift slope, but remain bounded.
  • Alarms reflect genuine degradation, not noise flapping.
Common bad signatures
  • Phase steps during switching or recovery (“time jump”).
  • Holdover slope changes abruptly with minor temperature swings.
  • Alarm storms caused by missing hysteresis/stable window.

Acceptance templates (placeholders; set by system requirements)

Holdover phase drift
Test: enter holdover and observe for X hours.
Metric: peak/percentile of phase_error(t).
Pass: |phase_error| ≤ Y within X hours.
Frequency error bound
Metric: freq_error(t) and slope stability.
Pass: |freq_error| ≤ Z (Z depends on wander budget).
Mode transition behavior
Events: track↔holdover and recovery.
Pass: no phase step > A, alarms clear within B.
Minimum log set for auditability
mode • active_ref • health_gate_state • loop_profile • phase_error • freq_error • tune_word (or DAC) • temp • event_timestamp • firmware_version • config_hash
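The holdover template can be sketched as a check over logged samples. X and Y remain placeholders bound to the SLA; the peak metric is shown here, and a percentile variant would follow the same shape:

```python
def holdover_acceptance(phase_error_ns, limit_Y_ns):
    """Pass if |phase_error(t)| stays within ±Y over the observed window.

    phase_error_ns: logged samples assumed to cover the full X-hour
    holdover observation; limit_Y_ns: the SLA-defined bound Y.
    Returns (pass, peak) so the margin is visible in the report.
    """
    peak = max(abs(e) for e in phase_error_ns)
    return peak <= limit_Y_ns, peak
```

Reporting the peak alongside the pass/fail keeps the acceptance record useful for trending, not just gating.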
Figure: Control Modes State Machine (behavioral view). Three modes (Free-run: local osc only; Discipline: track selected ref; Holdover: bounded drift) with transitions driven by reference health gates, stable windows, hysteresis, and recovery policy (edges: ref_ok, ref_fail, recover_stable, manual_force, no_valid_ref). Logs should capture mode transitions with timestamps and a config hash for reproducible acceptance.
Modes are operational states. Transitions must be gated (health/stable window/hysteresis) and logged (event timeline + configuration identity).
Figure: Holdover Phase Error vs Time (Conceptual). A target envelope band (phase error vs time) with a good curve staying inside and a bad curve drifting out after temperature influence. Acceptance template: within X hours of holdover, |phase_error| ≤ Y (X/Y set by system SLA). Holdover behavior is evaluated as a curve inside an envelope.
Holdover acceptance is defined by an envelope over time. A “good” system stays in band across temperature changes; a “bad” system exits the band due to drift slope changes.

Output Clocking: Domains, Alignment, and Distribution Rules

Output clocking is domain management. A timing card/module must feed multiple endpoints while keeping skew, phase continuity, and configuration traceability under control. This section describes a practical approach without relying on interface-specific standards.

Output domains (organized by meaning)

System clock
The platform-wide timebase used as the common root for other domains.
RefClk
Continuous reference clocks delivered to endpoints that require jitter control.
SYSREF / sync pulse
Event-like alignment markers; managed separately from continuous clocks.
1PPS / ToD
Epoch marker (1PPS) and time-of-day distribution (data), often used for audit and coordination.

Alignment strategy (repeatable after reboot/reseat)

Fixed delay
Use when topology is stable and paths are repeatable; minimizes configuration states.
Programmable delay
Compensate assembly and path variation; store per-channel trim values for field reproducibility.
Phase trim
Use for fine alignment; treat as a closed-loop adjustment with a measurable before/after delta.
Skew budget template (structure only)
total_skew_budget = source_variation + fanout_variation + trace/connector + endpoint_variation (set each term and guardband for temperature and restart repeatability).
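The budget template can be written as a plain sum plus guardband. Term names follow the template above; every value comes from the system's own measurements, and the guardband covers temperature and restart repeatability:

```python
def total_skew_budget_ps(source_ps, fanout_ps, trace_connector_ps,
                         endpoint_ps, guardband_ps):
    """Sum of contributor terms plus guardband, in picoseconds.

    Compare the result against the endpoint skew limit; if the sum
    exceeds the limit, one of the terms must be bought back (e.g.,
    with programmable delay trim or tighter cable-length rules).
    """
    return (source_ps + fanout_ps + trace_connector_ps
            + endpoint_ps + guardband_ps)
```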

Termination & levels (card-level rules, no protocol dependence)

HCSL
Keep return paths clean across connectors; ensure output drive and termination strategy are consistent per channel.
LVDS
Maintain differential impedance continuity; avoid common-mode injection from noisy domains.
LVPECL
Treat supply noise as a jitter contributor; enforce consistent termination and avoid long stubs.
LVCMOS
Fast edges amplify coupling risk; keep routes short, manage series damping, and avoid crossing split returns.

Output acceptance templates (placeholders)

Intra-domain skew
Pass: |skew| ≤ S across channels, verified after reboot/reseat and across temperature.
Deterministic alignment
Pass: phase relationship returns within R after restart, with stored trim map and config hash.
Traceable output map
Export: enable/level/delay status per output (output_id) for field comparison and audit.
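A traceable output map can be exported as a sorted, comparable snapshot. Field names follow the acceptance template above; the dict shape is an illustrative assumption:

```python
def export_output_map(outputs):
    """Per-output enable/level/delay snapshot, sorted by output_id so
    two exports (e.g., bench vs field) can be diffed line by line."""
    return [{"output_id": o["output_id"], "enable": o["enable"],
             "level": o["level"], "delay_ps": o["delay_ps"]}
            for o in sorted(outputs, key=lambda o: o["output_id"])]
```

Stable ordering is the point: a field unit's export should differ from the sealed production export only where something actually changed.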
Figure: Output Tree (Cleaner → Fanout → Endpoints). Clock distribution tree with per-branch delay trim, output domains (RefClk, SYSREF, system clock, 1PPS, ToD), and per-output configuration traceability. Rule: one clock tree, per-branch trim, and a traceable output_id map (enable/level/delay/status).
Output management: keep a clear hierarchy (cleaner→fanout→endpoints), add per-branch delay trim where variability exists, and make per-output configuration exportable for field audit.

Monitoring & Alarms: What to Measure, What to Log, How to Act

A timing card/module is operated as a closed-loop, observable system. Alarms are not “strings”; they are events with context, confidence (debounce/confirm), and a bounded action policy. This section focuses on on-card monitoring points and operational logic (not external NMS).

Alarm classes (organized by impact)

T. Time integrity
Examples: Loss-of-lock, phase step, wander out-of-band.
Impact: alignment/SLA risk.
Typical action: degrade or switch (policy-gated).
R. Reference health
Examples: freq offset, ref quality drop, GNSS degraded.
Impact: increased risk of drift and switching.
Typical action: tighten gates, change profile, prepare failover.
H. Hardware health
Examples: temp out-of-range, Vrail droop, sensor missing.
Impact: performance collapse or false positives if unhandled.
Typical action: protective degrade + strong logging.

Event chain (Detection → Debounce → Confirm → Report → Act)

Detection
Measure phase/freq error, lock state, ref quality, temperature, and rails. Treat each as a signal with a sampling policy.
Debounce
Use time windows/counters to avoid flapping. A single transient should not trigger a switch or storm.
Confirm
Require correlated evidence (e.g., phase step + lock transition) to raise confidence and reduce false positives.
Report
Emit an event object with context: active_ref, mode/profile, and before/after metrics around the trigger.
Act
Apply policy: log-only, degrade, or switch. Every action must be traceable to an event and reason code.
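The debounce and confirm stages can be sketched as a counter plus a correlated-witness requirement. N and the witness rule are policy placeholders; a real pipeline would also timestamp events and rate-limit reporting:

```python
class Debouncer:
    """Detect → debounce → confirm: raise a confirmed event only after
    N consecutive detections AND one correlated witness signal."""

    def __init__(self, n_consecutive: int):
        self.n = n_consecutive
        self.count = 0

    def feed(self, detected: bool, witness: bool) -> bool:
        # A single transient resets the counter instead of firing.
        self.count = self.count + 1 if detected else 0
        # Confirm = debounce window satisfied AND correlated evidence
        # (e.g., phase step plus a lock transition) present.
        return self.count >= self.n and witness
```

With `n_consecutive=3`, two noisy samples never confirm; three in a row with a witness do, which is the storm-avoidance behavior described above.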

Action policy (bounded, auditable)

Log-only
Informational events and early warnings. Used for trending and root-cause correlation.
Degrade
Change operating profile or tighten gates; keep outputs stable while risk is rising (e.g., GNSS degraded).
Switch / failover
Trigger only on high-confidence events; log switch points and post-check results to prove correctness.

Minimum log fields (to reproduce and audit decisions)

Event header
event_id • timestamp • severity • reason_code • state_before • state_after
Timing context
active_ref • ref_quality • loop_mode • loop_profile • phase_error • freq_error
Hardware context
temp • Vrail(s) • sensor_status • lock_state • switch_events • firmware_version • config_hash
Operational rule
Any switch must include: trigger event + before/after window metrics + post-check result + rollback path (if fail).
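The minimum log fields can be captured as one event record. This is an illustrative subset of the fields listed above; the types and the dataclass shape are assumptions, not a defined schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimingEvent:
    """One auditable event: header + timing context + hardware context."""
    # Event header
    event_id: str
    timestamp: float
    severity: str
    reason_code: str
    state_before: str
    state_after: str
    # Timing context
    active_ref: str
    phase_error_ns: float
    freq_error_ppb: float
    # Hardware / configuration context
    temp_c: float
    firmware_version: str
    config_hash: str
```

Freezing the record (immutability) supports the audit requirement: an emitted event is evidence, not a mutable note.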
Figure: Alarm Pipeline (On-card). Detection points (lock detect, phase monitor, frequency offset, temperature, Vrail, GNSS quality) flow through Detect → Debounce → Confirm → Report (event object with context) → Act, with action branches Log-only, Degrade, and Switch/Failover. Key rule: alarms must be confidence-scored (debounce + confirm) before policy actions.
Treat alarms as event objects: detect, debounce, confirm, report, then act by policy (log-only / degrade / switch), with traceable before/after windows.

Redundancy & Failover: Hitless Switching and Guard Paths

Redundancy is not “having a backup.” It is a measurable switching policy that keeps critical domains stable under reference faults. This section describes on-card A/B references, guard paths, and acceptance templates for hitless behavior (placeholders set by system requirements).

Redundancy targets (what is actually duplicated)

Reference redundancy
A/B references (e.g., two independent sources). Health gates decide eligibility and priority.
Path redundancy
Main/backup routing, including guard/bypass paths. The backup is monitored continuously (not cold).
Module redundancy
Dual modules/cards are supported by consistent configuration identity and comparable telemetry.

Guard paths (keep the backup “ready and comparable”)

Continuous health
The guard path evaluates ref quality and lock readiness so switching is not a cold-start gamble.
Compare & pre-align
Maintain a comparable phase/freq view to reduce phase steps at the switch point.
Traceable readiness
Readiness is logged as a state: eligible/not eligible, with reason codes and thresholds used.

Hitless switching (defined by allowed transients)

Phase transient
Pass: |Δphase| ≤ P at switch (P set by system window).
Frequency transient
Pass: |Δfreq| ≤ F, settles within T.
No glitch behavior
Critical domains must avoid missing/double pulses at the switch point (domain-specific rules).
Exercise template (black-box + rollback)
Routine: force main→backup→main; record before/after windows; pass if Δphase ≤ P and Δfreq ≤ F within T, alarms clear within B; if post-check fails, auto rollback and log reason_code.
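The exercise template can be scored with a single predicate over the measured transients. P/F/T/B stay as placeholders bound to the system switching window; units are whatever the budget uses (e.g., ns, ppb, s):

```python
def hitless_switch_pass(d_phase, d_freq, settle_time, alarm_clear_time,
                        *, P, F, T, B):
    """Pass iff |Δphase| ≤ P, |Δfreq| ≤ F, settling ≤ T, and alarms
    clear within B. A failing post-check triggers rollback and a
    logged reason_code (handled by the caller, per the routine above)."""
    return (abs(d_phase) <= P and abs(d_freq) <= F
            and settle_time <= T and alarm_clear_time <= B)
```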
Figure: Redundancy (Main/Backup + Guard Path + Switch). Dual reference inputs (Ref A, Ref B) feed main and guard paths, each behind a health gate; the guard path continuously monitors backup readiness (phase compare, pre-align, eligible state, reason_code), and the switch point selects the output within a hitless acceptance window feeding RefClk, SYSREF/1PPS, and ToD. On post-check failure, roll back to the last stable source and log the reason_code with before/after metric windows.
Guard paths keep the backup ready and comparable. Hitless switching is defined by allowed phase/frequency transients (P/F/T), verified by post-check windows and rollback rules.

System Integration: Power, Thermal, EMC, and Backplane Reality

Timing cards/modules are unusually sensitive to supply noise, return paths, connector/backplane behavior, and thermal gradients. This section focuses on the integration-specific failure modes that typically turn “bench-good” into “system-bad”.

Power: low-noise rails, domain partitioning, filters, and boot sequencing

Define “quiet” rails
Treat the reference/cleaner/VCXO supply domain as a performance limiter. Track rail noise at the same time as jitter/phase error to prove causality.
Partition domains
Separate timing-analog from control/telemetry digital. Keep high di/dt loads out of the analog island return path and regulator headroom.
Filter with intent
Filters must target the dominant noise bands (switching fundamentals/harmonics and coupling points). Over-filtering can destabilize rails or increase droop during load steps.
Power-up sequencing
A repeatable sequence avoids false lock and alarm storms: stabilize rails → load config → enable outputs → enable alarm actions (policy gates last).

Thermal: gradients, airflow, and sensor placement

Gradient is the enemy
For OCXO/TCXO, a stable average temperature may still drift if the device sees a moving gradient. Monitor both temperature and its rate-of-change.
Airflow realism
Place oscillators away from pulsed airflow and adjacent hot spots. A “cold” location near a fan can create periodic thermal modulation.
Sensor placement
Sensors must represent the true drift source, not a distant board average. A “good-looking” sensor can hide a local gradient near the oscillator.

Backplane & chassis: returns, common-mode noise, reflections, and cable length

Return paths matter
Backplane and chassis returns can inject common-mode noise into clock paths, showing up as elevated jitter or slow alignment drift. Treat return topology as a first-class design input.
Connector reflections
Connectors/backplanes can create reflections that don’t “break” a scope check but do degrade phase stability. Validate at the output port and at the endpoint, not only at a nearby test pad.
Cable length discipline
For cross-card alignment, “controlled and repeatable” length beats “short.” Any mismatch becomes deterministic delay error and reduces margin.

Integration checklist (risk → quick check → fix → pass)

Rail noise couples into jitter
Risk: switching rail artifacts raise random jitter / create spurs.
Quick check: compare jitter/phase_error with management traffic OFF vs ON; log Vrail ripple at the same time.
Fix: separate rails, add targeted filtering, re-route returns to keep digital currents out of analog island.
Pass: incremental jitter/phase_error change ≤ ΔJ / ΔP (placeholders set by system budget).
Thermal gradient creates slow drift
Risk: stable average temperature but moving gradient causes wander and false trend alarms.
Quick check: correlate phase_error slope with temp slope and fan/airflow states; look for periodic modulation.
Fix: move oscillator away from hot spots/pulsed airflow; relocate/duplicate sensors near the drift source.
Pass: drift slope vs temperature stays within the holdover envelope (system-defined).
Backplane common-mode & reflections
Risk: connector/backplane behavior degrades phase stability even when waveforms look “ok”.
Quick check: measure at port and at endpoint; compare skew/jitter with direct-cable vs backplane path (one-variable change).
Fix: tighten terminations/levels, add common-mode control where required, enforce cable length rules for aligned domains.
Pass: endpoint jitter/skew meets budget with backplane installed (not only on bench).
Figure: Power Domains + Isolation Domains (integration view). Backplane power feeds DC/DC converters; the analog island (reference, PLL, VCXO, cleaner) and digital island (MCU, FPGA, telemetry/logs) are separated, both referenced to the chassis/backplane with controlled return paths, connectors, common-mode control, LC filters, and isolation bridges at the ports. Integration rule: separate domains, control returns, and validate at endpoints (not only at local pads).
A timing card’s performance is dominated by domain separation and controlled bridges (filters/isolation/returns). Backplane and chassis behavior must be treated as part of the timing system.

Validation & Acceptance: Bench Tests That Actually De-risk Deployment

Acceptance should be a repeatable engineering flow: define baselines, lock measurement windows, prove one-variable deltas, and record context. The goal is not “pretty plots” but de-risking deployment by isolating power/backplane/thermal effects and verifying modes (holdover and failover) with auditable evidence.

Test setup rules (so results remain comparable)

Baseline first
Measure the reference source and the DUT under a “golden” setup before stressing power/thermal/backplane variables.
One-variable deltas
Change a single variable per run (rail noise, airflow, backplane path, loop profile). Record configuration identity for every run.
Window discipline
Keep jitter bandwidth/integration time and alignment observation windows fixed. Report both absolute results and incremental deltas.

Output phase noise / jitter (measure points + windows + baselines)

Where to probe
Compare internal-cleaner output vs port output vs endpoint. This isolates fanout/connectors/backplane contributions.
Window definition
Use a fixed RMS jitter window (placeholder BW) and/or offset PN points that match system sensitivity.
Acceptance style
Prefer “delta to baseline” acceptance: jitter ≤ J or incremental increase ≤ ΔJ (placeholders set by budget).
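Delta-to-baseline acceptance can be expressed as one check with optional absolute and incremental limits. J and ΔJ are placeholders from the jitter budget; supply either or both:

```python
def jitter_accept(measured, baseline, *, J=None, dJ=None):
    """Pass if measured ≤ J (absolute) and/or measured − baseline ≤ ΔJ
    (incremental). Units must match the fixed measurement window
    (same jitter bandwidth / integration time for both values)."""
    ok = True
    if J is not None:
        ok = ok and measured <= J       # absolute budget limit
    if dJ is not None:
        ok = ok and (measured - baseline) <= dJ  # delta to golden baseline
    return ok
```

Preferring the incremental form makes unit-to-unit comparisons robust against instrument and fixture offsets that shift absolute numbers.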

Phase alignment (multi-channel + cross-card + temperature deltas)

Skew budget
Treat end-to-end delay as a budget: routing + connectors + cables + programmable delay trim. Verify budget at endpoints.
Cross-card proof
Validate with the intended backplane/cabling. A bench-only result may hide connector reflections and return-path coupling.
Temp sweep delta
Compare skew before/after temperature changes. Record Δskew vs ΔT and ensure drift stays within alignment margin.

Holdover (loss-of-reference, thermal change, and aging trend)

Reference cut test
Remove the external reference and log phase_error(t) and freq_error(t). Compare the envelope to the system budget.
Thermal change
Apply a controlled temperature step/ramp. Validate drift slope and compensation behavior under realistic gradients.
Aging trend
Use accelerated comparison or long-run deltas to confirm drift is predictable and consistent across units and time.
Acceptance placeholders (bind to SLA)
Pass examples: within X hours, |phase_error| ≤ Y; or |freq_error| ≤ Z. Values X/Y/Z depend on system alignment and service requirements.

Failover (transients, alarm correctness, recovery time)

Switch transient
Measure Δphase and Δfreq around the switch point using fixed observation windows. Validate “no glitch” rules for critical domains.
Alarm correctness
Confirm the full chain: detection → debounce → confirm → report → action. A switch without a traceable trigger is not acceptable.
Recovery time
Measure time from fault injection to stable outputs and cleared alarms. Define a rollback path and verify it in the same test plan.
Figure: Acceptance Bench Setup (Measurement Chain). A reference source (10 MHz / 1PPS) feeds the DUT (timing card: cleaner/fanout, ports/domains); instruments (phase-noise analyzer, time interval analyzer, scope, logger) measure phase noise, time interval, and glitches at fixed test points (TP1/TP2/TP3), while thermal and power stimulus blocks enable one-variable comparisons. Rule: fixed windows + baseline + one-variable deltas; record config_hash and environment for every run.
A de-risking acceptance flow measures at fixed test points (TP1/TP2/TP3), compares against baselines, and uses one-variable deltas (power/thermal/backplane) to isolate root causes.

Engineering Checklist: Bring-up → Production → Field

This section turns the timing-card “capabilities” into executable stage-gates. Each gate contains only actions, required evidence, and measurable pass criteria (placeholders such as X/Y/Z/T must be set by the system timing budget and SLA).

Gate G1: Bring-up (Lock → Mode transitions → Output sanity)

Goal
Prove the card locks, switches modes deterministically, and drives endpoints with correct electrical levels and domain mapping.
Actions
  • Lock check: verify reference selection, lock indicators, and “healthy” state under nominal input.
  • Mode walk: execute Free-run → Discipline → Holdover transitions; record the exact trigger used.
  • Output electrical: validate standard + termination at both the port and the endpoint (HCSL/LVDS/LVPECL/LVCMOS).
  • Domain mapping: confirm each output domain is routed to the intended consumer (system clock / refclk / sysref / pps / ToD).
  • Alarm sanity: inject a controlled fault (reference removed / degraded) and confirm alarm + log closure.
Evidence to capture
  • Config snapshot (register dump / profile ID), firmware version, and build hash.
  • Lock timeline and mode transition timestamps (with input ref quality label).
  • Endpoint measurement screenshots (level + termination + jitter/phase delta vs baseline).
  • Alarm event entries for each injected fault + recovery.
Pass criteria (placeholders)
  • Lock time ≤ T_lock and remains locked for ≥ T_stable under nominal conditions.
  • Mode transitions produce no endpoint faults; phase transient ≤ Δφ_switch.
  • Additive jitter at endpoints ≤ ΔJ_budget (window defined by the system spec).
  • Alarm injection produces: detect → debounce → confirm → report → action, all within ≤ T_alarm.
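The G1 pass criteria above can be encoded as a machine-checkable gate. A minimal Python sketch, assuming hypothetical threshold values and measurement field names that you would map to your own test harness and timing budget:

```python
from dataclasses import dataclass

@dataclass
class G1Limits:
    # Placeholder thresholds (T_lock, Δφ_switch, ΔJ_budget, T_alarm):
    # illustrative values only; set from the system timing budget and SLA.
    t_lock_s: float = 10.0
    dphi_switch_ps: float = 50.0
    dj_budget_fs: float = 150.0
    t_alarm_s: float = 1.0

def g1_pass(meas: dict, lim: G1Limits = G1Limits()) -> dict:
    """Evaluate bring-up gate G1; returns per-check pass/fail flags."""
    return {
        "lock":      meas["lock_time_s"]        <= lim.t_lock_s,
        "mode_walk": meas["phase_transient_ps"] <= lim.dphi_switch_ps,
        "jitter":    meas["additive_jitter_fs"] <= lim.dj_budget_fs,
        "alarm":     meas["alarm_chain_s"]      <= lim.t_alarm_s,
    }
```

A per-check result (rather than a single boolean) keeps the evidence trail: each failed key maps directly to one G1 action item.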
Gate G2

Production (Calibration → Sealing → Sampling plan)

Goal
Ensure repeatable timing behavior across units and lots by freezing calibration parameters and enforcing traceable acceptance records.
Actions
  • Calibration items: temperature compensation coefficients, frequency offset trim, delay table (channel alignment), holdover model parameters.
  • Sealing: write EEPROM/flash, verify CRC/signature, lock critical fields; bind to serial number.
  • Golden baseline: compare each unit against a golden reference (relative deltas preferred over absolute).
  • Sampling strategy: define lot sampling rate and re-test triggers (process change / firmware change / component swap).
Evidence to capture
  • Calibration record: coefficients + delay table + firmware/build ID.
  • Acceptance summary: jitter/phase delta vs golden; holdover short test snapshot.
  • Configuration hash + EEPROM CRC report (pass/fail).
Pass criteria (placeholders)
  • Config sealing succeeds (CRC/signature valid) with version match.
  • Relative deltas vs golden: jitter ≤ ΔJ_golden, skew ≤ ΔSkew_golden.
  • Lot statistics meet thresholds: out-of-family rate ≤ R_oof, drift shift ≤ ΔDrift_lot.
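The out-of-family screen in the G2 criteria reduces to a simple relative-delta test per lot. A sketch, assuming hypothetical placeholder limits ΔJ_golden and R_oof:

```python
def lot_stats(unit_jitter_fs, golden_fs, dj_golden_fs=100.0, r_oof=0.05):
    """Out-of-family screen against a golden unit.
    dj_golden_fs (ΔJ_golden) and r_oof (R_oof) are placeholders to be
    set by the lot acceptance plan; units here are illustrative (fs)."""
    oof = [u for u in unit_jitter_fs if abs(u - golden_fs) > dj_golden_fs]
    rate = len(oof) / len(unit_jitter_fs)
    return {"oof_rate": rate, "pass": rate <= r_oof}
```

Relative deltas versus the golden unit are preferred over absolute limits because they cancel fixture and instrument offsets shared by every unit on the same bench.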
Gate G3

Field (Alarm policy → Log rotation → OTA upgrade + rollback → Drills)

Goal
Make timing behavior auditable and recoverable: actionable alarms, complete logs, safe remote updates, and repeatable drills.
Actions
  • Alarm policy: define warning/degrade/failover thresholds and the exact action taken for each state.
  • Log rotation: set retention, roll size, and “must-keep” fields (timestamp, ref quality, loop mode, phase/freq error, temp, rail status, switch events).
  • Remote upgrade: enforce pre/post checks; require rollback trigger conditions and a validated recovery path.
  • Periodic drills: scheduled failover drill, holdover drill, and alarm chain drill (black-box success criteria).
Evidence to capture
  • Drill reports: trigger → detected → action → recovered timeline.
  • Upgrade reports: pre-check snapshot, post-check snapshot, and rollback record (if used).
  • Degrade/failover logs with root-cause tags (ref degraded, temp excursion, rail anomaly).
Pass criteria (placeholders)
  • Alarm-to-action latency ≤ T_action; false-trigger rate ≤ R_false.
  • Failover drill: phase transient ≤ Δφ_hitless; service impact = “none” by system definition.
  • Holdover drill: phase error envelope ≤ E_holdover(t) over X hours.
  • Rollback completes within ≤ T_rollback and restores last-known-good timing profile.
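The drill timeline above (trigger → detected → action → recovered) can be reduced to latencies automatically. A sketch, assuming illustrative event labels that you would map to your device's log fields; units follow whatever timestamp base the logger uses (ms in the test below):

```python
def drill_latencies(events):
    """Latencies from a failover-drill timeline.
    events: list of (label, timestamp) pairs. Labels 'trigger',
    'detected', 'action', 'recovered' are assumed names, not a spec."""
    t = dict(events)
    return {
        "detect":  t["detected"]  - t["trigger"],   # compare to T_alarm
        "action":  t["action"]    - t["detected"],  # compare to T_action
        "recover": t["recovered"] - t["trigger"],   # end-to-end drill time
    }
```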
Diagram: Stage-gates (G1/G2/G3) with Actions / Evidence / Pass criteria (conceptual)
Three stage-gates in sequence — G1 Bring-up (actions: lock/modes/levels), G2 Production (actions: calibrate/seal/sample), G3 Field (actions: alarms/logs/drills) — each with its Evidence and Pass-criteria blocks.

Applications & IC Selection Notes (Card-Level Selection Logic)

This is not a shopping list. It is a card-level selection method: required capabilities → system constraints → spec-writing rules. The material numbers below are starting points for datasheet lookup and lab validation; package, lifecycle, and availability must be verified.

A) Selection dimensions (capabilities to specify)

  • Inputs: GNSS / PTP (hardware timestamp) / SyncE recovered clock / 1PPS / 10 MHz; multi-source arbitration needed or not.
  • Outputs: count + standards (HCSL/LVDS/LVPECL/LVCMOS) + special domains (SYSREF / PPS / ToD).
  • Alignment: on-card channel skew budget and cross-card phase alignment target (ps–ns class).
  • Holdover: define an error envelope over time E_holdover(t) (phase vs time), not a single number.
  • Alarms: required signals/telemetry (GPIO/I²C/host) and a graded policy (warning/degrade/failover).
  • Redundancy: A/B ref, hitless definition (Δφ_hitless, Δf_hitless), and drill requirements.
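The selection dimensions above can be written as a machine-checkable requirement record, so "does candidate X cover the spec" becomes a set/budget comparison instead of a datasheet eyeball. A minimal sketch; field names are illustrative, not a vendor schema:

```python
from dataclasses import dataclass

@dataclass
class CardSpec:
    inputs: set            # e.g. {"GNSS", "1PPS", "10MHz", "PTP", "SyncE"}
    output_standards: set  # e.g. {"HCSL", "LVDS", "LVPECL", "LVCMOS"}
    skew_budget_ps: float  # on-card channel skew
    hitless_dphi_ps: float # max phase transient on A/B reference switch

def meets(candidate: CardSpec, required: CardSpec) -> bool:
    """True if the candidate covers the required inputs/outputs and its
    skew/hitless numbers fit inside the required budgets."""
    return (required.inputs <= candidate.inputs
            and required.output_standards <= candidate.output_standards
            and candidate.skew_budget_ps <= required.skew_budget_ps
            and candidate.hitless_dphi_ps <= required.hitless_dphi_ps)
```

A real spec would also carry the holdover envelope E_holdover(t) and alarm policy; they are omitted here to keep the comparison shape visible.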

B) System constraints (what silently breaks timing)

  • Thermal reality: airflow stability, gradients across OCXO/TCXO/MEMS area, and sensor placement for control decisions.
  • Power noise: rail noise density, load steps, and isolation between analog timing island vs digital control plane.
  • Backplane/cabling: reflections, common-mode injection, and ground potential differences across chassis.
  • Operations: remote-only vs local serviceability, allowed downtime for updates, required log retention and audit trail.

C) Risk notes (how to write specs that are testable)

  • Typical vs worst-case: require worst-case across temperature, rails, and chosen reference inputs; typ-only specs are not deployable.
  • Bind every number to a window: RMS jitter must state integration limits; phase/ToD must state averaging time and measurement method.
  • Budget alignment: card targets must roll up to a system budget (converter SNR, SerDes tolerance, network SLA).
  • Auditability: every “degrade/failover” decision must be provable by logs (fields + timestamps + thresholds).
  • Lifecycle reality: check PCNs, NRND/obsolete status, and second-source plan for long-life programs.
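The "bind every number to a window" rule has a concrete form for jitter: RMS jitter is the integral of single-sideband phase noise L(f) over stated limits, so the same hardware yields different numbers for different windows. A self-contained sketch using trapezoidal integration (the input arrays are illustrative):

```python
import math

def rms_jitter_s(freqs_hz, L_dbc_hz, f0_hz):
    """RMS jitter from SSB phase noise L(f) in dBc/Hz:
        RJ = sqrt(2 * integral of 10^(L/10) df) / (2*pi*f0)
    Integrated over [freqs_hz[0], freqs_hz[-1]] only; the window must be
    stated alongside the result."""
    area = 0.0
    pts = list(zip(freqs_hz, L_dbc_hz))
    for (fa, la), (fb, lb) in zip(pts, pts[1:]):
        area += 0.5 * (10 ** (la / 10) + 10 ** (lb / 10)) * (fb - fa)
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f0_hz)
```

For example, a flat -150 dBc/Hz floor integrated from 12 kHz to 20 MHz on a 100 MHz carrier gives roughly 0.3 ps RMS; narrowing the window shrinks the number without changing the oscillator, which is exactly why typ-only, window-free specs are not deployable.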

D) Reference material numbers (starting points only)

Grouped by function blocks commonly found in timing cards/modules. Verify package suffix, grade, lifecycle, and timing performance in the intended measurement window.

Clock synchronizer / DPLL (time & frequency)
  • AD9545 (ADI) — clock synchronizer / DPLL platform
  • ZL30772 (Microchip) — packet/SyncE DPLL class device
Jitter cleaner / clock generator (converter & SerDes trees)
  • Si5345 (Skyworks/Silicon Labs line) — jitter attenuator family
  • LMK04828 (TI) — jitter cleaner + distribution class device
  • AD9528 (ADI) — JESD clock generator class device
  • HMC7044 (ADI) — dual-loop jitter attenuator class device
Fanout buffer / level translation (endpoint driving)
  • ADCLK948 (ADI) — low-jitter fanout buffer family
  • LMK00334 (TI) — clock buffer / level translator class device
Low-noise power (timing island rails)
  • ADM7150 (ADI) — ultralow-noise LDO class device
Isolation + sensors (control plane robustness)
  • ADuM1250 (ADI) — I²C isolator class device
  • TMP117 (TI) — digital temperature sensor class device
GNSS timing receiver modules (if GNSS disciplining is required)
  • ZED-F9T (u-blox) — timing GNSS module
  • LEA-M8T (u-blox) — timing GNSS module family
  • mosaic-T (Septentrio) — GNSS timing receiver module
  • LC29H (Quectel) — dual-band GNSS module series
Practical note: material numbers above are intended for “block matching” (DPLL / cleaner / fanout / rails / sensors / GNSS). Final selection must be driven by the measurable acceptance criteria (acceptance bench) and the stage-gates (engineering checklist) above.
Diagram: Scenario × Capability matrix (✓ required / ! high risk / – optional)
Conceptual matrix: rows are deployment scenarios (multi-card alignment, carrier timing, lab instrument, data center); columns are capabilities (alignment, holdover, outputs, alarms, redundancy, manageability); cells mark ✓ required, ! high risk, – optional.


FAQs (Troubleshooting Only)

These FAQs only close long-tail troubleshooting within the timing card/module boundary. Each answer is a data-driven 4-line checklist with measurable probes and pass criteria placeholders (X/Y/Z/T) that must be set by the system timing budget and SLA.

Recommended log fields (map to your device registers/telemetry)
ref_selected, ref_quality, loop_mode, phase_error, freq_error, switch_event, switch_reason, temp_osc, temp_board, rail_event, config_hash, fw_version
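Enforcing these fields at ingest keeps every record comparable across firmware versions and sites. A minimal validator sketch (the required set mirrors the recommended list above; trim or extend it to match your telemetry):

```python
# Required-field set taken from the recommended log fields above;
# adjust to your device's actual register/telemetry names.
REQUIRED_FIELDS = {
    "ref_selected", "ref_quality", "loop_mode", "phase_error", "freq_error",
    "temp_osc", "temp_board", "config_hash", "fw_version",
}

def missing_fields(entry: dict) -> list:
    """Return the required fields absent from one telemetry record,
    sorted for stable log output."""
    return sorted(REQUIRED_FIELDS - entry.keys())
```

A record failing this check should be flagged at write time, not discovered during a field incident.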
GNSS says “locked” but 1PPS phase still slowly drifts—first log which two counters?

Likely cause: GNSS lock indicates tracking, but timing quality is degraded so the disciplining integrator accumulates slow phase error.

Quick check: Trend phase_error_slope (ps/s) and freq_steer_word (or equivalent DAC/FCW) over ≥ T hours while recording ref_quality.

Fix: Tighten reference validation to reject “noisy-lock” and/or increase averaging/hysteresis; if GNSS receiver is marginal, validate with a timing-grade module (e.g., u-blox ZED-F9T / LEA-M8T) before changing loop targets.

Pass criteria: Over T hours the absolute phase drift rate stays ≤ X ps/s and the steer word remains within ±Y% of its nominal range without repeated ref-quality drops.
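The phase_error_slope trend in the quick check is just a least-squares fit over the logged window. A self-contained sketch (field names and units are illustrative):

```python
def phase_slope_ps_per_s(t_s, phase_ps):
    """Least-squares slope of phase_error vs time (ps/s).
    Compare the result against the X ps/s pass threshold."""
    n = len(t_s)
    mt = sum(t_s) / n
    mp = sum(phase_ps) / n
    num = sum((t - mt) * (p - mp) for t, p in zip(t_s, phase_ps))
    den = sum((t - mt) ** 2 for t in t_s)
    return num / den
```

Fitting a slope rather than differencing endpoints rejects the sample-to-sample jitter that would otherwise dominate a short log.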

Holdover is fine at room temp but fails across temperature—what trend plot reveals it fastest?

Likely cause: Holdover model is under-calibrated versus temperature (or sensor placement misses the actual oscillator gradient), so prediction error spikes during thermal transitions.

Quick check: Plot phase_error(t) together with temp_gradient = temp_osc - temp_board and holdover_residual during a controlled temp sweep.

Fix: Re-run temperature calibration and update coefficients/EEPROM; if the platform requires stronger holdover, validate DPLL/holdover devices (e.g., ADI AD9545 or Microchip ZL30772) with correct sensor placement and airflow constraints.

Pass criteria: Across the specified temperature range, holdover phase error remains inside the envelope E_holdover(t) ≤ X for at least T hours after reference loss.
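The envelope test in the pass criteria is naturally expressed with E_holdover(t) as a function, not a single number. A sketch, modeling the envelope as a callable (the linear form in the test is purely illustrative):

```python
def holdover_ok(t_hours, phase_err_ps, envelope):
    """True if |phase error| stays inside the envelope at every sample.
    envelope: callable t_hours -> max allowed |phase error| in ps,
    i.e. E_holdover(t) from the holdover spec."""
    return all(abs(p) <= envelope(t) for t, p in zip(t_hours, phase_err_ps))
```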

Cleaner output jitter is great, yet downstream FPGA occasionally loses lock—probe what at the connector?

Likely cause: The endpoint is failing on electrical integrity (swing/common-mode/termination/reflections) even though the source jitter is low.

Quick check: At the card connector measure differential swing, common-mode level, and reflection/ringing (overshoot/undershoot) with the intended termination populated at the endpoint.

Fix: Correct the output standard and termination, reduce stub length/return discontinuities, and if loading is heavy use a dedicated fanout/buffer stage (e.g., ADI ADCLK948 or TI LMK00334) per domain.

Pass criteria: FPGA lock drop count equals 0 over T hours and connector waveform meets limits (e.g., overshoot/undershoot ≤ X mV and stable common-mode within ±Y mV).

After failover, alignment is off by a fixed offset—what does that imply about delay table vs phase trim?

Likely cause: A fixed post-switch offset typically indicates an unaccounted fixed path latency (delay table mismatch) rather than random phase noise or lock instability.

Quick check: Compare delay_table_id and phase_trim_value pre/post failover and confirm the measured offset is constant (±X ps) across repeated switches.

Fix: Calibrate and store separate delay tables for main/backup paths (and for each output domain) and ensure the switch sequence applies the correct table before declaring “in-service.”

Pass criteria: After any failover event, residual fixed offset ≤ X ps and channel-to-channel skew remains within the budget ≤ Y ps without manual re-trim.
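The "constant offset across repeated switches" test from the quick check can be automated: if the post-failover offsets cluster tightly around a nonzero mean, suspect the delay table rather than noise. A sketch with an illustrative tolerance parameter:

```python
def fixed_offset(offsets_ps, tol_ps):
    """Check whether post-failover offsets are constant within ±tol_ps.
    Returns (is_fixed, mean_offset_ps); a fixed nonzero mean points at a
    delay-table mismatch rather than random phase noise."""
    mean = sum(offsets_ps) / len(offsets_ps)
    is_fixed = all(abs(o - mean) <= tol_ps for o in offsets_ps)
    return is_fixed, mean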

Periodic “time bump” every N minutes—how to tell disciplining step vs software timestamp jump?

Likely cause: The bump is either a deliberate phase step from disciplining policy or a discontinuity introduced by the timestamp/ToD distribution path.

Quick check: Correlate the bump timestamps with phase_step_event_count/discipline_step_log and the host/ToD event log; if only the host log jumps, the source is software.

Fix: If it is disciplining, switch to continuous steering or reduce step magnitude and increase smoothing; if it is software, enforce monotonic timestamp handling and audit the ToD update transaction.

Pass criteria: No phase step exceeds X ps in magnitude and ToD/timestamps remain monotonic with max discontinuity ≤ Y ns over T hours.

PTP input looks stable but card switches ref anyway—what health gate threshold is likely too tight?

Likely cause: Health gating is rejecting PTP on transient metrics (delay variation, offset spikes, or missing-stamp bursts) due to insufficient debounce/hysteresis.

Quick check: Inspect the last 60–300 s before switch: switch_reason, ptp_offset_peak, and missing_stamp_count versus the configured thresholds.

Fix: Add hysteresis and increase confirmation window for PTP degrade, and align thresholds to the system wander budget rather than instant jitter snapshots.

Pass criteria: With stable PTP, ref switching does not occur for ≥ T days and any switch is preceded by metrics exceeding thresholds continuously for ≥ X seconds.
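The hysteresis/confirmation-window fix can be sketched as a debounce rule: a switch is allowed only after the metric exceeds its threshold for N consecutive samples. Names and values below are illustrative:

```python
def confirmed_degrade(samples, threshold, confirm_n):
    """Return True only if the metric (e.g. ptp_offset_peak) exceeds
    threshold for confirm_n consecutive samples; isolated transients
    reset the run and must not trigger a reference switch."""
    run = 0
    for s in samples:
        run = run + 1 if s > threshold else 0
        if run >= confirm_n:
            return True
    return False
```

This is the X-seconds-continuous condition from the pass criteria made explicit: alternating spikes never accumulate a qualifying run.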

Why does enabling SSC reduce EMI but break one output domain—what compatibility check first?

Likely cause: The affected endpoint PLL/CDR does not tolerate the applied spread depth/rate, even if other domains remain fine.

Quick check: Verify SSC is enabled on the failing domain only, then measure modulation depth (ppm) and modulation rate at that output and compare to the endpoint tolerance spec.

Fix: Disable SSC on sensitive domains while keeping it on EMI-critical ones, or route the sensitive domain through a non-spread path (typical clock-tree uses jitter attenuators like Si5345-class or conditioners like LMK04828-class with per-domain policy).

Pass criteria: EMI peak reduction meets target while the sensitive endpoint shows 0 lock-loss events over T hours and phase/frequency excursions remain ≤ X/Y.

Multi-output skew is good at boot but degrades over hours—what thermal gradient check?

Likely cause: Channel delay elements and routing experience drift under thermal gradients, so skew slowly walks even if the source remains locked.

Quick check: Log per-channel skew_error alongside temp_osc and temp_board, then compute correlation with temp_gradient.

Fix: Improve airflow/heat spreading, relocate/duplicate sensors, and enable periodic phase re-trim if supported (ensure trims are logged and bounded).

Pass criteria: Over T hours and across operating temperatures, skew drift stays ≤ X ps (p-p) and does not correlate strongly with temperature (|r| ≤ Y).
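The |r| ≤ Y criterion is a plain Pearson correlation between logged skew and temperature gradient. A self-contained sketch:

```python
def pearson_r(xs, ys):
    """Pearson correlation between, e.g., per-channel skew_error and
    temp_gradient = temp_osc - temp_board. |r| near 1 implicates the
    thermal gradient as the drift driver."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```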

Alarm storms appear during power events—what to filter vs what must be immediate?

Likely cause: A rail transient triggers many dependent alarms simultaneously, and missing policy separation causes repeated debounce/retry loops.

Quick check: Align timestamps of rail_uv/ov_event (or brownout) with the alarm burst rate (alarms/min) and verify whether resets coincide with switch_event.

Fix: Debounce and rate-limit “secondary” alarms during known power-sequencing windows, but keep “hard” timing integrity alarms (loss-of-lock, missing pulse) immediate with clear single-shot actions.

Pass criteria: During power events, alarm rate ≤ X alarms/min with no repeated oscillation, and critical alarms still assert within ≤ Y ms when truly violated.
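The "filter secondary, keep hard alarms immediate" policy can be sketched as a small gate: hard timing-integrity alarms always pass, while secondary alarms are budget-limited inside a known power-sequencing window. Alarm names and the budget value are illustrative:

```python
class AlarmGate:
    """Policy sketch: rate-limit secondary alarms during power-sequencing
    windows; hard alarms (loss-of-lock, missing pulse) pass immediately."""
    HARD = {"loss_of_lock", "missing_pulse"}  # illustrative hard set

    def __init__(self, max_secondary_per_window=10):
        self.budget = max_secondary_per_window

    def admit(self, name, in_power_window):
        if name in self.HARD:
            return True               # never delay timing-integrity alarms
        if in_power_window:
            if self.budget <= 0:
                return False          # suppress storm overflow
            self.budget -= 1
        return True
```

Suppressed alarms should still be counted in the log so the storm itself remains auditable.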

One channel shows higher jitter than others—how to isolate fanout loading/termination issue quickly?

Likely cause: The “bad” channel is seeing different loading/termination or crosstalk, increasing deterministic jitter and edge distortion.

Quick check: Swap endpoint loads between two outputs and see whether the higher jitter follows the load, and measure connector reflections (ringing amplitude) on the affected path.

Fix: Normalize termination and loading, reduce stubs, and use a robust per-output buffer if needed (e.g., ADCLK948 / LMK00334 class fanout) to isolate domains.

Pass criteria: Channel-to-channel RMS jitter delta ≤ X fs (in the defined integration window) and reflection/ringing at the connector is ≤ Y mV (p-p).

Phase monitor shows noise but system works—what measurement bandwidth/window mistake is common?

Likely cause: The monitor is integrating the wrong band/timebase (mixing jitter with wander or using inconsistent averaging), producing “noise” that is not relevant to the system budget.

Quick check: Record the analyzer integration limits (f1..f2) and averaging time, then re-run using the exact window defined in acceptance (same reference path and trigger).

Fix: Standardize a single measurement recipe (window + averaging + reference) and validate it against a known-good baseline trace before concluding a hardware issue.

Pass criteria: With the correct window, measured RMS jitter/phase stats fall within the system budget ≤ X and correlate with observable system behavior (no false-fail alerts).
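One way to enforce a single measurement recipe is to derive a stable ID from its parameters and stamp it into every run's log; two results are only comparable if their IDs match. A sketch with illustrative field names:

```python
import hashlib
import json

def recipe_id(window_hz, avg_s, ref_path):
    """Short stable identifier for a measurement recipe
    (integration window f1..f2, averaging time, reference path)."""
    blob = json.dumps(
        {"f1_f2": window_hz, "avg_s": avg_s, "ref": ref_path},
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode()).hexdigest()[:12]
```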

Firmware update changed timing behavior—what “golden log snapshot” should you compare?

Likely cause: Default profiles or calibration mappings changed (loop bandwidth, thresholds, delay tables), shifting behavior even if hardware is unchanged.

Quick check: Compare a golden snapshot set: fw_version, profile_id, config_hash, plus loop_mode, ref_quality, and summary stats of phase_error/freq_error under the same input conditions.

Fix: Restore the prior timing profile, migrate EEPROM calibration fields explicitly, and re-run a short acceptance suite; if the design uses DPLL/cleaner blocks, validate config equivalence for devices like AD9545, ZL30772, Si5345, LMK04828 class parts.

Pass criteria: Post-update deltas versus golden remain within limits (jitter ≤ X, skew ≤ Y, holdover envelope unchanged) and no new unexpected switch events occur over T hours.
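The golden-snapshot comparison reduces to a field-by-field diff of the two records. A sketch (snapshot keys are illustrative and should follow the golden set listed in the quick check):

```python
def snapshot_diff(golden: dict, current: dict) -> dict:
    """Fields that differ between the golden and post-update snapshots,
    mapped to (golden_value, current_value); missing keys show as None."""
    keys = golden.keys() | current.keys()
    return {k: (golden.get(k), current.get(k))
            for k in keys if golden.get(k) != current.get(k)}
```

An empty diff plus in-limit jitter/skew deltas closes the gate; any changed field (profile_id, config_hash, loop parameters) is the first suspect for a behavior shift.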