Phase & Frequency Monitors for Channel Alignment
What Phase/Frequency Monitors Are — and When They Are Needed
Phase/frequency monitors quantify the relationship between clock channels (phase offset and frequency offset) so that alignment, health alarms, and switch qualification can be driven by measurable numbers and clear pass criteria.
- In-scope: observability and decision logic for phase/frequency relationships (offsets, trends, steps, thresholds, qualification).
- Out-of-scope: phase noise / RMS jitter theory and integration windows. Those belong on the canonical page “Phase Noise & Jitter”.
- Alignment never converges: delay trims move, but the measured offset does not settle into a stable band.
- Rare “slip” after hours: offset drifts slowly and then a sudden phase step appears (often a threshold crossing or mechanical/thermal event).
- “Hitless” switching still looks like a jump: the endpoint seems fine, yet the monitor flags a step (tap path sees a transient).
- Temperature sweep breaks alignment: phase error changes with temperature slope (thermal gradients in routes/buffers/connectors).
- Multi-board sync is inconsistent: identical firmware, different boards show different baseline offsets (manufacturing skew + tap topology differences).
- Numbers change with averaging window: telemetry “improves” or “worsens” when filters/windowing change (estimator bias).
- Phase readings look noisy despite “good clocks”: edge-slew and trigger threshold noise dominate time measurement.
- Channel-to-channel phase offset (ps/deg) or frequency offset (ppm/ppb) must be measured and bounded.
- Alignment requires repeatable pass/fail criteria before enabling a mode, lane, or module.
- Switching needs qualification (stable window + persistence) before selecting a new clock path.
- Health monitoring needs step / drift / trend alarms with low false-trigger rate.
- The goal is random jitter / PN integration / converter SNR budgeting → see Phase Noise & Jitter.
- The failure is missing pulses / stuck clocks / loss-of-lock switchover → see Clock Monitor / Missing-Pulse.
- The topic is PTP/SyncE/GNSS disciplining, protocol timestamps, or system time recovery → see Timing & Synchronization.
- The task is tuning PLL loop bandwidth/stability/jitter transfer → see Loop Bandwidth.
Metrics That Actually Matter: Phase Offset, Frequency Offset, Skew, TIE, Drift
Phase/frequency monitoring becomes reliable only after the “metric vocabulary” is fixed. Each number must map to a decision: alignment, alarms, or switch qualification. Misreading a metric is a common root cause of false alarms and non-converging trims.
- State: phase offset, skew (what the system “is” right now).
- Trend: frequency offset, TIE series, drift (how the system “moves” over time/temperature).
- Events: phase steps (sudden changes) must not be averaged away.
Common misuse: interpreting phase offset as a standalone “jitter quality” score.
Pass example: |phase| < X ps continuously for T seconds before enabling a mode/switch.
Common misuse: relying on Δf alone and missing sudden phase steps or transient events.
Pass example: |Δf| < X ppm averaged over gate/window W, and remains within band across temperature.
Common misuse: treating drift as skew (temperature and aging are dynamic; skew is the static baseline).
Pass example: channel-to-channel skew < X ps under known-good conditions before trimming.
Common misuse: comparing a TIE series directly to a datasheet jitter spec without matching window/filter/estimator settings.
Pass example: baseline TIE p99 < X ps with fixed measurement settings; injected step must be detected within Y ms.
Common misuse: estimating drift from too-short windows (measurement noise dominates, and slope becomes meaningless).
Pass example: |d(phase)/dT| < X ps/°C and |d(phase)/dt| < Y ps/s over the specified operating range.
- Short gates respond faster but have higher estimator noise; long gates improve resolution but can hide short events.
- Strong averaging can mask phase steps; event detection must run on a separate path that preserves step edges.
- Comparisons across boards/builds/temperatures require identical settings (sampling, gate/window, filters, and thresholds).
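The link between the trend metrics above can be made concrete with a small sketch. It assumes phase is logged as a time error in picoseconds, so the slope of time error versus time is the fractional frequency offset (1 ps/s corresponds to 1e-12, or 1e-6 ppm), and the slope versus temperature is the drift coefficient in ps/°C. The function names and sample data are hypothetical.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def freq_offset_ppm(t_s, phase_ps):
    """Fractional frequency offset implied by a time-error slope.

    A slope of 1 ps/s equals 1e-12 fractional offset, i.e. 1e-6 ppm.
    """
    return slope(t_s, phase_ps) * 1e-12 * 1e6


# Hypothetical log: time error ramps by 50 ps every second.
t = [float(i) for i in range(10)]
phase = [100.0 + 50.0 * ti for ti in t]
offset_ppm = freq_offset_ppm(t, phase)

# Hypothetical temperature sweep: phase moves 2 ps per 10 degC.
temps_c = [25.0, 35.0, 45.0, 55.0]
phase_vs_temp = [10.0, 12.0, 14.0, 16.0]
drift_ps_per_c = slope(temps_c, phase_vs_temp)
```

With a short window the fitted slope is dominated by measurement noise, which is why the pass examples tie every slope limit to a stated gate or window.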
Measurement Methods Overview: Counter, Phase Detector, TDC, and Digital Sampling
Different measurement methods excel at different questions: slow drift and frequency offset, absolute phase offset for alignment, step/event visibility, and multi-channel scaling. Reliable selection starts with fixing the measurement target and required decision latency.
- In-scope: system-level method selection and dominant error sources that affect observability and decisions.
- Out-of-scope: deep circuit-design tutorials (e.g., how to build a TDC IC). Implementation details are covered only as they impact accuracy, latency, and comparability.
Weakness: short events and small phase steps can be blurred by long gates.
Dominant errors: gate/window selection, timebase reference, SSC/wander bias.
Typical decisions: “will channels diverge over time?” and “is drift within guardband?”
Weakness: limited linear range; not a calibrated time ruler across wide offsets.
Dominant errors: duty-cycle dependence, threshold drift, dead-zone/saturation.
Typical decisions: “is phase within a safe window?” and “is lock behavior stable?”
Weakness: practical resolution is limited by edge slew and threshold noise at the tap.
Dominant errors: quantization (LSB), bin nonlinearity, threshold-to-time conversion, CDC/metastability.
Typical decisions: “is skew trimmed to budget?” and “is pre-switch phase inside window?”
Weakness: resolution depends on sampling clock, interpolation, and calibration discipline.
Dominant errors: sampling clock quality, metastability, estimator bias, per-channel offset drift.
Typical decisions: “which channel drifts?” and “is health stable across N channels?”
- Frequency offset / slow drift → Counter (long gate) or Digital sampling (trend logging).
- Phase offset / channel skew → TDC or Digital sampling (short-window timestamp + calibration).
- Continuous trend → prefer longer windows and stable estimators.
- Sudden phase step → keep a dedicated event detector path (avoid averaging it away).
- 1–2 channels → dedicated monitor IC or focused TDC path is efficient.
- Many channels → FPGA timestamping plus per-channel calibration is often more scalable.
- High-speed differential standards → prefer clean taps + conditioning before capture.
- LVCMOS/low-speed → sampling-based timestamping can work if threshold and edge-slew are controlled.
TDC Hooks in Real Systems: Where to Tap, How to Condition, How to Capture
A practical TDC hook is not “just add a tap.” The tap point, edge conditioning, and capture path determine whether measurements represent real system phase or self-inflicted artifacts. The goal is to observe phase without loading or polluting the clock path.
Threshold noise becomes time noise when edge slew is slow. A practical rule is σt ≈ σv / (dV/dt), where σv is the voltage noise at the threshold and dV/dt is the edge slew rate at the crossing. Conditioning aims to keep threshold crossings stable and repeatable without injecting noise back into the clock path.
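A quick numeric check of the threshold-to-time rule, as a sketch. The noise and slew values are illustrative assumptions, not figures from any particular part.

```python
def time_noise_ps(sigma_v_mv, slew_v_per_ns):
    """sigma_t = sigma_v / (dV/dt), returned in picoseconds."""
    sigma_v = sigma_v_mv * 1e-3      # volts
    slew = slew_v_per_ns * 1e9       # volts per second
    return sigma_v / slew * 1e12     # seconds -> picoseconds


# Assumed 1 mV rms of threshold noise on a fast and on a slow edge.
fast_edge_ps = time_noise_ps(sigma_v_mv=1.0, slew_v_per_ns=1.0)
slow_edge_ps = time_noise_ps(sigma_v_mv=1.0, slew_v_per_ns=0.1)
```

Under these assumptions the fast edge yields about 1 ps of time noise and the slow edge about 10 ps: the same voltage noise costs ten times more timing error when the slew rate drops by ten.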
- Prefer a dedicated fanout output to feed the monitor path (minimize loading and repeatability loss).
- Use a high-Z buffer/limiter/comparator chain that preserves edge slew and common-mode compliance.
- Keep an event detector path separate from smoothed telemetry (phase steps must remain visible).
- Lock measurement settings (window/gate/filter/threshold) before comparing boards or temperatures.
- Log phase/freq together with temperature, supply, and switch/lock flags using coherent timestamps.
- Do not use resistive pick-off taps on sensitive high-speed differential clock trunks.
- Do not allow monitor return currents to share the main clock return path across splits/slots.
- Do not rely on heavy averaging for alarm decisions (steps and bursts will be masked).
- Do not compare absolute offsets across platforms without baseline calibration and fixed settings.
- Tap topology: dedicated output or high-Z buffer; no trunk loading.
- Edge conditioning: stable threshold crossings; verified slew at the capture input.
- Capture integrity: CDC/metastability handled; timestamps are coherent across channels.
- Fixed measurement mouth (the locked set of measurement settings): window/gate/filter/threshold locked for comparability.
- Context logging: phase/freq + temperature + supply + lock/switch flags stored with the same time base.
Error Budget: What Limits Phase Measurement Accuracy (and How to Bound It)
Phase numbers are only useful when they come with a bounded confidence band. An error budget turns “unstable readings” into a set of dominant contributors, a repeatable measurement mouth (window/filter/threshold), and pass/fail criteria that survive temperature and platform changes.
- In-scope: quantization, threshold-to-time conversion, edge/comparator noise, CDC/metastability, and reference-sharing correlation.
- Out-of-scope: phase-noise curve integration details. Treat upstream jitter/noise as an input term and reference the canonical jitter/PN page.
Before comparing boards, temperatures, or firmware builds, lock window/gate, filters/statistics, threshold/hysteresis, and logging cadence. Otherwise, the budget is not comparable and the confidence band is meaningless.
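Quick bound: hold a fixed phase condition and capture N samples; check the histogram width and whether the σ of the averaged reading shrinks roughly as 1/√N. If it does not, drift or correlated noise is present and averaging will not buy the expected resolution.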
Mitigation: apply bin/offset calibration where available; separate event detection from averaging; keep identical sampling cadence across builds.
Pass criteria: baseline σ and p99 (with fixed mouth) remain within the alignment window guardband.
Quick bound: change conditioning (limiter/comparator threshold/hysteresis) while keeping the tap fixed; if σ changes strongly, threshold-to-time conversion dominates.
Mitigation: increase edge slew at the capture input and stabilize threshold crossings; avoid common-mode induced crossing shifts.
Pass criteria: σt sensitivity to temperature/supply stays bounded across the intended operating range.
Quick bound: A/B compare the same capture path with different upstream sources or tap points; track whether σ correlates with edge quality.
Mitigation: prefer a dedicated fanout output for the monitor; isolate monitor supply/return from the main clock path; keep conditioning consistent.
Pass criteria: baseline σ and p99 remain stable across the permitted source configurations.
Quick bound: measure outlier rate and max jump with fixed mouth; change synchronizer depth/capture strategy and observe if outliers drop sharply.
Mitigation: design capture logic to bound metastability impact; use persistence and robust statistics for alarms (separate telemetry vs decision).
Pass criteria: outlier rate and max jump are bounded under worst-case operating conditions.
Quick bound: compare absolute phase to differential phase (chA − chB) under shared reference; reduced noise indicates common-mode cancellation.
Mitigation: prefer differential metrics for alignment decisions; include reference distribution skew and stability in the budget template.
Pass criteria: differential σ and p99 stay within alignment guardband across temperature and configuration.
σ_total ≈ √(σ_q² + σ_th² + σ_edge² + σ_cdc² + σ_ref² + …)
- Shared reference can create common-mode terms; differential phase can cancel part of σref.
- CDC/metastability often produces rare outliers; track outlier rate + max jump separately from σ.
- Confidence bands should reference a fixed statistic: σ, p95/p99, and a fixed observation time window.
- Baseline σ / p99 with fixed mouth.
- Outlier rate and max jump bound.
- Temperature sensitivity (ps/°C) and the required guardband.
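The root-sum-square combination can be sketched as follows. The term values are placeholders, and the quantization term uses the standard LSB/√12 result for an ideal uniform quantizer. The RSS form assumes the contributors are independent, so correlated terms such as a shared reference need separate treatment.

```python
import math


def quantization_sigma_ps(lsb_ps):
    """RMS error of an ideal uniform quantizer: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12)


def rss(terms_ps):
    """Root-sum-square of independent contributors (all in ps)."""
    return math.sqrt(sum(v * v for v in terms_ps.values()))


# Placeholder budget; every number here is an assumption for illustration.
budget_ps = {
    "q": quantization_sigma_ps(10.0),  # assumed 10 ps LSB
    "th": 3.0,                         # threshold-to-time conversion
    "edge": 4.0,                       # edge/comparator noise
    "cdc": 1.0,                        # CDC/metastability (sigma part only)
    "ref": 2.0,                        # reference distribution
}
sigma_total = rss(budget_ps)
dominant = max(budget_ps, key=budget_ps.get)
```

Identifying the dominant term first is usually more productive than shaving the small ones, because the RSS total is driven by the largest contributor.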
Channel Alignment Workflow: Measure → Adjust → Verify (Bring-up to Production)
Alignment becomes repeatable when it is treated as a closed-loop workflow with fixed measurement mouth, explicit inputs/outputs, and pass criteria that include statistics, drift slopes, and pre-switch qualification.
- Baseline capture with fixed mouth (median / p99 / outlier rate).
- Coarse alignment to bring phase into a safe correction window.
- Fine alignment with iterative measure → adjust → re-measure.
- Verification by statistics (not eyeballing).
- Temperature sweep to bound drift slope and guardband.
- Pre-switch qualification (window + persistence + outlier control).
- Freeze into production: config + trims + minimal test set.
Action: capture N seconds; compute median, p95/p99, and outlier rate.
Output: baseline fingerprint for later comparisons.
Pass criteria: baseline σ/p99 and outlier rate are within the intended alignment guardband.
Action: apply coarse programmable delay steps to bring phase into the fine-control window.
Output: coarse trim value and residual offset range.
Pass criteria: offset is inside the fine alignment capture window without saturating the actuator.
Action: iterate: measure → compute offset → adjust (delay/phase interpolator/NCO step) → re-measure.
Output: final trim value and convergence trace.
Pass criteria: |phase| stays within window for a persistence time (avoid passing on a transient).
Action: compute median/p99/outlier rate; validate that step/event detector still triggers on real steps.
Output: verification report and bounds (σ/p99 + outlier limits).
Pass criteria: statistics remain within guardband and event detection is not masked by averaging.
Action: sweep temperature; log phase vs temperature; estimate drift slope and curvature.
Output: drift model and a trim/guardband plan (table or slope-based margin).
Pass criteria: alignment window holds across temperature, or compensation brings it back within bounds.
Action: require “in-window + persistence + controlled outliers” before asserting qualify; then test switch behavior.
Output: qualify flag behavior and switch-step bound.
Pass criteria: post-switch phase step stays within the system’s allowed band and returns to the steady band quickly.
Action: freeze configuration (tap/mouth) and store trims; define the minimal production test set.
Output: a repeatable bring-up script and a production pass/fail checklist.
Pass criteria: cross-board and cross-build results remain consistent under the same mouth and test time.
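The fine-alignment step (measure → compute offset → adjust → re-measure, with a persistence requirement) can be sketched as a small loop. The `measure`/`adjust` callables and the `SimulatedChannel` class are hypothetical stand-ins for the real monitor and delay actuator, and the noiseless plant is a deliberate simplification.

```python
def fine_align(measure, adjust, window_ps, step_ps, persistence, max_iter=100):
    """Iterate until |phase| stays inside window_ps for `persistence`
    consecutive measurements. Returns (ok, total_trim_steps, trace)."""
    trim = 0
    in_window = 0
    trace = []
    for _ in range(max_iter):
        phase = measure()
        trace.append(phase)
        if abs(phase) <= window_ps:
            in_window += 1
            if in_window >= persistence:
                return True, trim, trace
            continue
        in_window = 0                      # a violation restarts persistence
        steps = round(-phase / step_ps)    # nearest actuator step count
        adjust(steps)
        trim += steps
    return False, trim, trace              # did not converge


class SimulatedChannel:
    """Hypothetical plant: a static offset plus a programmable delay."""

    def __init__(self, offset_ps, step_ps):
        self.offset_ps = offset_ps
        self.step_ps = step_ps

    def measure(self):
        return self.offset_ps

    def adjust(self, steps):
        self.offset_ps += steps * self.step_ps


ch = SimulatedChannel(offset_ps=137.0, step_ps=5.0)
ok, trim, trace = fine_align(ch.measure, ch.adjust,
                             window_ps=3.0, step_ps=5.0, persistence=3)
```

Returning the trace alongside the trim value gives the "convergence trace" output named in the workflow, and the `max_iter` bound turns a non-converging alignment into an explicit failure instead of an endless loop.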
Alarms & Decision Logic: Phase Step, Frequency Offset, Holdover Qualification, Switching Guard
Monitoring becomes operational only when measurements are converted into alarms, qualification flags, and switch guards that resist false triggers. This section focuses on observable phase/frequency behaviors and robust thresholding (hysteresis, blanking, and persistence).
- In-scope: phase step, frequency offset/slope, wander-like drift, and phase-noise bursts (phenomenology and detection logic).
- Out-of-scope: missing-pulse hard detection mechanisms (handled in the Clock Monitor / Missing-Pulse sibling page). Treat it as a separate alarm source.
Use a decision path that preserves event visibility (steps/bursts) and a telemetry path that can be smoothed for dashboards. Do not gate switching from heavily averaged traces.
- Freeze measurement mouth: keep window/gate, filters, threshold/hysteresis, and cadence fixed before any comparison.
- Collect baseline: capture enough time to estimate median, p95/p99, and a robust spread metric (σ or MAD).
- Set thresholds by multiples: define alarm limits as baseline + K×spread (K is a policy knob tuned to false-alarm tolerance).
- Add guards: hysteresis (Th_high/Th_low), persistence (N consecutive violations), and blanking (ignore known transients).
- Soft alarm: log + raise attention; keep service running if system margin allows.
- Hard alarm: block switching / trigger re-alignment / enter safe mode, depending on system policy.
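The threshold recipe above (baseline + K×spread, then hysteresis, persistence, and blanking) might look like the sketch below. The spread is taken as MAD scaled by 1.4826, which matches σ for Gaussian data. The class name, the 0.8 hysteresis ratio, and the sample values are assumptions for illustration.

```python
import statistics


def robust_baseline(samples):
    """Return (center, spread): median and MAD scaled by 1.4826."""
    center = statistics.median(samples)
    mad = statistics.median([abs(x - center) for x in samples])
    return center, 1.4826 * mad


class GuardedAlarm:
    """Threshold alarm with hysteresis, persistence, and blanking."""

    def __init__(self, center, th_high, th_low, persistence):
        self.center = center
        self.th_high = th_high
        self.th_low = th_low
        self.persistence = persistence
        self.active = False
        self._count = 0
        self._blank = 0

    def blank(self, n_samples):
        """Ignore the next n_samples (known transient)."""
        self._blank = n_samples

    def update(self, value):
        if self._blank > 0:
            self._blank -= 1
            self._count = 0
            return self.active
        dev = abs(value - self.center)
        if not self.active:
            # Assert only after N consecutive violations of the high limit.
            self._count = self._count + 1 if dev > self.th_high else 0
            if self._count >= self.persistence:
                self.active = True
        elif dev < self.th_low:
            # Release only once the deviation falls below the low limit.
            self.active = False
            self._count = 0
        return self.active


baseline = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # hypothetical ps readings
center, spread = robust_baseline(baseline)
K = 6.0                                             # policy knob
alarm = GuardedAlarm(center, th_high=K * spread,
                     th_low=0.8 * K * spread, persistence=3)
states = [alarm.update(v) for v in [1, 20, 1, 20, 20, 20, 9, 8, 5]]
```

In this run the isolated reading of 20 is rejected, the alarm asserts on the third consecutive violation, and it releases only when the deviation drops below the lower threshold.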
Mitigation: run step detection on a light-processing path; use persistence to reject single-sample spikes while preserving genuine steps.
Pass criteria: step magnitude stays below the system limit, and the signal returns to the steady band within the allowed recovery time.
Mitigation: use hysteresis and persistence around slope thresholds; consider dual-window logic (short window detects change, long window estimates steady slope).
Pass criteria: slope remains inside limits for a defined observation time and does not exceed limits across temperature or configuration.
Mitigation: apply guardband based on measured drift slope/curvature; optionally use a trim table updated from temperature sweep characterization.
Pass criteria: alignment window holds across the operating temperature range, or compensation keeps phase inside bounds.
Mitigation: use short-window “quality metrics” with persistence; use blanking to ignore known transient periods (startup, planned switching, controlled warm-up).
Pass criteria: burst rate stays below policy limits, and bursts do not enable unsafe switching (qualify must drop during bursts).
Mitigation: qualify only when (a) in-window, (b) stable for persistence time, and (c) outlier/burst metrics are controlled.
Pass criteria: qualify remains asserted through the required observation period and drops quickly on boundary violations.
Mitigation: guard switching with (in-window AND persistence AND no burst/outlier spike); apply blanking after switching before re-arming alarms.
Pass criteria: switching does not introduce a step beyond allowed bounds, and recovery returns to the steady band within policy time.
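The qualification rule (in-window AND stable for a persistence time AND no burst/outlier spike) can be sketched as a small time-based gate. The class name and the `burst` input flag are hypothetical, and the immediate drop on any violation is one possible policy rather than a requirement from the text.

```python
class SwitchQualifier:
    """Assert qualify only after the phase has stayed in-window, with no
    burst flag, for hold_s seconds; drop immediately on any violation."""

    def __init__(self, window_ps, hold_s):
        self.window_ps = window_ps
        self.hold_s = hold_s
        self._since = None   # time at which the current good run started

    def update(self, t_s, phase_ps, burst=False):
        good = abs(phase_ps) <= self.window_ps and not burst
        if not good:
            self._since = None
            return False
        if self._since is None:
            self._since = t_s
        return (t_s - self._since) >= self.hold_s


q = SwitchQualifier(window_ps=5.0, hold_s=2.0)
samples = [
    (0.0, 1.0, False),
    (1.0, 2.0, False),
    (2.0, 1.5, False),   # in-window for 2 s: qualify asserts
    (3.0, 1.0, True),    # burst flag: qualify drops at once
    (4.0, 1.0, False),
    (5.0, 0.5, False),
    (6.0, 0.5, False),   # in-window for 2 s again
]
flags = [q.update(t, p, b) for t, p, b in samples]
```

The asymmetry is intentional: asserting takes the full hold time, while dropping takes a single bad sample, which matches the pass criterion that qualify "drops quickly on boundary violations".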
Filtering, Statistics, and Logging: Making Numbers Stable Without Hiding Real Faults
Filters make traces readable but can also hide real events. Robust monitoring keeps a decision path that preserves steps and bursts, while a telemetry path can be smoothed for human consumption. Logging must be minimal but sufficient to diagnose root causes.
- Maintain two outputs: decision metrics and telemetry metrics.
- Use robust spread metrics (p99, MAD) for bursts and outliers; keep window definitions fixed.
- Log enough context (temp, rails, lock flags) to explain drifts and bursts.
- When references are shared, record both absolute and differential metrics (common-mode cancellation).
- Using heavy averaging for alarm decisions (steps get attenuated and delayed).
- Changing windows/filters between builds and then comparing “numbers” as if they were comparable.
- Dropping all outliers blindly (CDC outliers and real steps must be distinguished).
- Logging phase without temperature/supply/state context (root cause becomes untraceable).
Failure mode: attenuates and delays phase steps; hides short bursts.
Use safely: keep MA on telemetry path; keep step/burst decisions on a light-processing path.
Failure mode: large windows can “flatten” legitimate short events.
Use safely: keep windows matched to event time scales; track outlier rate separately.
Failure mode: poor α choice can mask bursts while still looking “stable”.
Use safely: pair EWMA with a short-window quality metric (p99/MAD) for burst detection.
- Decision path: raw (or lightly conditioned) → step/burst detector → alarm/qualify/switch guard.
- Telemetry path: raw → smoothing (MA/Median/EWMA) → dashboards and long-term trends.
- Track outlier rate and max jump as separate counters (do not “average them away”).
- Treat genuine steps as events (marker + state transition), not as disposable spikes.
- Apply persistence and hysteresis at the decision layer to suppress one-off glitches.
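The difference between the two paths can be illustrated with a short sketch: a moving average feeds telemetry, while a separate detector works on raw samples and uses persistence to tell a one-sample spike from a genuine step. The function names, limits, and data are illustrative assumptions.

```python
from collections import deque


def moving_average(samples, n):
    """Telemetry path: simple moving average over the last n samples."""
    out, win = [], deque(maxlen=n)
    for s in samples:
        win.append(s)
        out.append(sum(win) / len(win))
    return out


def detect_steps(samples, step_limit, persistence):
    """Decision path: report (index, magnitude) when the level shifts by
    more than step_limit and stays shifted for `persistence` samples."""
    events = []
    level = samples[0]
    run = 0
    for i, s in enumerate(samples):
        if abs(s - level) > step_limit:
            run += 1
            if run >= persistence:
                events.append((i - persistence + 1, s - level))
                level = s          # treat the step as a state transition
                run = 0
        else:
            run = 0                # single-sample spikes are rejected
    return events


# Hypothetical trace (ps): one spike at index 10, a real 30 ps step at 20.
raw = [0.0] * 10 + [40.0] + [0.0] * 9 + [30.0] * 10
smooth = moving_average(raw, n=8)
events = detect_steps(raw, step_limit=20.0, persistence=3)
```

When the detector confirms the step at index 22, the 8-sample average has reached only 11.25 ps, still under the 20 ps limit. This is the attenuation-and-delay effect that makes smoothed traces unsuitable for gating decisions.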
- timestamp (monotonic; wall time optional)
- phase (raw + decision statistic)
- frequency offset or phase slope
- lock / qualify flags + alarm state
- temperature (local sensor)
- supply snapshot (key rails or brownout flags)
- measurement mouth hash (window/filter/threshold/persistence)
- channel ID / tap ID
- switch events (reason + before/after states)
- outlier counters (rate, max jump), burst counters (short-window p99/MAD)
Use a circular buffer and freeze a pre/post window around each event (step/burst/switch). This preserves the “before/after” context without turning monitoring into a data-platform project.
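A minimal sketch of the circular-buffer idea, assuming one capture at a time and samples pushed at a fixed cadence. The class and method names are hypothetical.

```python
from collections import deque


class EventRecorder:
    """Keep the last `pre` samples; on an event, freeze them together
    with the event sample and the next `post` samples."""

    def __init__(self, pre, post):
        self.history = deque(maxlen=pre)   # rolling pre-event context
        self.post = post
        self.snapshots = []                # completed captures
        self._open = None                  # capture in progress, if any
        self._remaining = 0

    def push(self, sample, event=False):
        if self._open is None and event:
            self._open = list(self.history)    # frozen pre-event window
            self._remaining = self.post + 1    # event sample + post window
        if self._open is not None:
            self._open.append(sample)
            self._remaining -= 1
            if self._remaining == 0:
                self.snapshots.append(self._open)
                self._open = None
        self.history.append(sample)


rec = EventRecorder(pre=3, post=2)
for i in range(10):
    rec.push(i, event=(i == 5))
```

Events that arrive while a capture is still open are simply recorded inside it in this sketch; a real design would need a policy for overlapping events.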
When multiple channels share a reference, part of the drift can be common-mode. Recording both absolute phase and differential phase (chA − chB) helps separate “global drift” from “alignment error”, and improves alarm stability.
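The common-mode cancellation can be demonstrated with synthetic data. In this sketch both channels carry the same reference drift plus independent noise; the drift rate, static offsets, and noise level are all assumed values.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Assumed shared reference drift (ps) seen by both channels.
common = [0.5 * i for i in range(200)]
ch_a = [c + 10.0 + random.gauss(0, 1.0) for c in common]
ch_b = [c + 4.0 + random.gauss(0, 1.0) for c in common]

# Differential phase removes the shared term and keeps the 6 ps skew.
diff = [a - b for a, b in zip(ch_a, ch_b)]

spread_abs = statistics.pstdev(ch_a)    # dominated by the common drift
spread_diff = statistics.pstdev(diff)   # only the independent noise remains
skew_estimate = statistics.mean(diff)
```

The absolute trace looks like a large drift, while the differential trace stays tight around the static skew, which is why alignment decisions are usually better taken on chA − chB.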
Hardware Integration Details: Input Standards, Termination, Isolation, and Tap Loading
A monitor tap must observe phase/frequency behavior without disturbing the clock trunk. This section focuses on how input standards, termination, common-mode constraints, isolation boundaries, and tap loading translate into measurement error or clock-chain degradation.
- In-scope: tap risks (loading, termination, common-mode), noise injection via monitor ground/supply, and “how it shows up” as time error or instability.
- Out-of-scope: full signal-integrity standards and exhaustive routing rules. Use the Output Standards canonical page for complete SI treatment; keep this chapter limited to monitor-induced error and trunk disturbance.
Extra Cin/Rin/leakage reduces amplitude or slows dV/dt, so a given threshold noise converts into more time error. The result is higher σt and more sensitivity to supply/ground noise.
A tap that alters termination networks or bias points can move common-mode, create mismatch, and amplify reflection or mode conversion. Measurement drift and occasional steps can be self-inflicted by the tap itself.
Monitor supply ripple, digital activity, or ground bounce can couple into the clock via the tap front-end, then re-appear as extra time error. Isolation and return-path discipline define whether monitoring is “observe-only” or “self-disturbing”.
- Prefer high-Z differential receiver/buffer taps; avoid adding parallel termination that reduces effective impedance.
- Keep tap input symmetric (matched loading) to prevent differential-to-common-mode conversion that shows up as extra jitter.
- Ensure monitor front-end supports the common-mode window; otherwise threshold shift becomes time error.
- Bound stub impact: short, controlled tap branch; avoid “probe-like” long branches that create reflection and phase variability.
- What shows up in measurements: rising σt/p99, channel-dependent offset drift, occasional step-like artifacts after board activity.
- Do not disturb the specified termination/bias; extra loads can reshape edges and shift the crossing.
- Treat tap input capacitance as a primary risk: slower edges increase susceptibility to ground bounce and supply ripple.
- Keep monitor bias networks from “participating” in trunk biasing (avoid unintended bias sharing).
- Recommended integration: duplicate fanout output for monitoring whenever available (observe without trunk disturbance).
- What shows up in measurements: good on bench, worse in-system; burst-like spreads synchronized with digital activity or switching.
- Keep bias points stable: tap networks that shift bias can move the crossing and create apparent phase drift.
- Resistive pick-off commonly reduces amplitude and dV/dt, so threshold noise converts into more time error and noise sensitivity rises.
- Differential-to-single-ended conversion must be bounded: comparator threshold noise and supply noise appear directly as time error.
- Prefer dedicated fanout outputs and a quiet receiver stage (clean supply/ground reference).
- What shows up in measurements: σt grows with temperature or supply ripple; step-like artifacts appear when bias networks are disturbed.
- Tap capacitance is often the largest error source: slowed edges magnify threshold noise and increase timing dispersion.
- Ground noise is timing noise: any ground bounce shifts the effective threshold crossing and appears as extra jitter.
- Prefer buffering before monitoring; keep monitor return currents away from the clock reference return.
- Avoid long stubs that create reflection and phase variability; keep tap branches short and controlled.
- What shows up in measurements: edge-dependent jitter that tracks digital activity; improved readings after better ground/supply isolation.
- Define a clear boundary: trunk → high-Z buffer/receiver → monitor domain (do not tap “raw” into a noisy digital region).
- Keep monitor supplies quiet: use a low-noise regulator or local filtering for the front-end; treat it as part of the time-error budget.
- Separate returns: avoid routing large digital return currents through the monitor front-end reference region.
- Schedule digital activity: if firmware controls the monitor, avoid bursty bus activity during critical observation windows when possible.
The same voltage noise produces more timing error when edges are slower. Any mechanism that reduces amplitude or dV/dt (extra loading, bias disturbance, or poor return paths) makes the monitor front-end convert more threshold noise into time error.
- Fast bounding check: if measured spread increases when edge rate decreases (or when supply ripple increases), coupling is likely dominating.
- Practical mitigation: restore edge integrity (reduce tap load) and quiet the monitor reference (supply/ground), then re-baseline thresholds.
- Lock the trunk termination/bias scheme; avoid adding parallel termination through the monitor path.
- Estimate tap front-end loading (Cin/Rin/leakage) and verify edge-rate margin at the tap point.
- Keep tap branches short; avoid “probe-like” long stubs that introduce reflection and phase variability.
- For differential standards, preserve symmetry: matched loading and balanced routing into the monitor front-end.
- Verify common-mode compatibility of the monitor input stage; prevent threshold shift outside the valid window.
- Prefer isolation topologies: high-Z buffer taps or dedicated fanout outputs for monitoring.
- Treat monitor supply/ground as a time-error source; add local filtering and keep noisy returns away.
- Baseline A/B: compare measurements with the tap disconnected vs connected to confirm the tap does not degrade the trunk.
- Re-arm logic: apply blanking after planned switching or known transients before re-qualifying alarms.
PCB Layout & Debug Traps: Why Your Phase Readings Lie
Phase monitors and TDC hooks can report “events” that are not real clock behavior. Layout return-path detours, plane splits, via asymmetry, connector micro-motion, and measurement setups can all fabricate skew, steps, or drift-like patterns. This chapter focuses on isolating measurement and integration artifacts before concluding a system-level timing fault.
- In-scope: return-path and reference discontinuities, plane-split crossings, via/transition asymmetry, connector/cable motion artifacts, and probe/trigger/bandwidth traps that masquerade as phase events.
- Out-of-scope: full EMI and SI textbooks. Only the mechanism of “how it fakes a phase event” is covered here.
A clock edge is measured against its reference. If the return path detours, crosses a plane split, or the local reference shifts with current, the reported skew can move even when the source clock is stable.
Via count, layer transitions, and mismatch between P/N create differential imbalance and path delay differences. These appear as static offset, frequency-dependent skew, or drift-like curvature under temperature and supply changes.
Probe ground leads, triggering choices, bandwidth limits, and sampling windows can fabricate steps and noise bursts. If a conclusion changes when only the measurement method changes, it is an artifact until proven otherwise.
- Freeze the measurement mouth: lock trigger, threshold, bandwidth, window length, and any smoothing; document the profile.
- Change only the measurement method: short return (ground spring), coax/active probe, or differential probing. If the “fault” disappears, it is an artifact.
- Reproduce mechanical sensitivity: restrain cables/connectors, apply controlled motion, swap connectors, and watch event counters.
- Validate return continuity: check plane splits and detours; add temporary stitching/bridges to see if Δt moves instantly.
- Audit symmetry: verify P/N transitions and via patterns are mirrored; re-tap at an earlier symmetric stage if needed.
- Only then conclude system behavior: if artifacts are ruled out, escalate to true clock-tree drift/noise budgets and alarm logic.
Production Test & Field Diagnostics: Correlating Monitor Numbers to Real Outcomes
The goal is not “nice phase plots” but repeatable pass/fail decisions in factory and actionable evidence in the field. This section standardizes the measurement mouth (window, bandwidth, trigger, filtering) so correlation does not collapse when tools or conditions change.
Factory: Make the Monitor a Production-Grade Decision Tool
- Reference model: define what “phase” is relative to (shared ref, paired refs, or a designated golden).
- Tap location: lock the physical tap point (fanout output, endpoint input, or board-to-board ingress).
- Mouth parameters: window length, trigger threshold, edge qualification, and filtering parameters are versioned and logged.
- Stimulus capability: provide a controllable “known delta” path (delay/phase step) to validate measurement direction and scale.
- Pre-check: capture config snapshot (window/filter/threshold/trigger) + firmware build ID.
- Baseline: collect a stable segment and compute mean / p99 / event-rate (no tuning allowed during this segment).
- Loopback sanity: measure a short, controlled path to confirm “near-zero / fixed offset” behavior.
- Known delta injection: apply a controlled phase/delay step and verify sign + magnitude consistency.
- Repeatability: repeat inject/release cycles; verify return-to-baseline and absence of drift with cycle count.
- Calibration store: write offset/scale (and optional temp index) with CRC + version tag.
- Guardband verify: re-check at one additional PVT corner (at least one extra temp or supply point).
- Decision: pass/fail + a failure code that separates “measurement chain” vs “real timing fault.”
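The calibration-store step (offset/scale with CRC + version tag) could be sketched like this. The record layout, field names, and the use of JSON with CRC-32 are assumptions chosen for illustration; a production format would follow the product's own storage conventions.

```python
import json
import zlib


def pack_calibration(offset_ps, scale, version, temp_index=None):
    """Serialize a calibration record and append a CRC-32 of the payload."""
    payload = json.dumps(
        {"offset_ps": offset_ps, "scale": scale,
         "temp_index": temp_index, "version": version},
        sort_keys=True,
    ).encode("utf-8")
    crc = format(zlib.crc32(payload), "08x").encode("ascii")
    return payload + b"|" + crc


def unpack_calibration(blob):
    """Verify the CRC and return the record; raise on corruption."""
    payload, _, crc_hex = blob.rpartition(b"|")
    if zlib.crc32(payload) != int(crc_hex.decode("ascii"), 16):
        raise ValueError("calibration CRC mismatch")
    return json.loads(payload)


blob = pack_calibration(offset_ps=12.5, scale=1.002, version="cal-v1")
record = unpack_calibration(blob)
```

Checking the CRC at load time lets the failure code distinguish a corrupted calibration (a measurement-chain problem) from a genuine timing fault.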
- Accuracy vs known delta: measured Δt tracks injected Δt within the system’s own budget.
- Repeatability: mean and p99 stay stable across N cycles; event-rate does not grow with time-on-fixture.
- Cross-unit robustness: same mouth definition produces the same decision across units/fixtures (no “lab-only” behavior).
- Config snapshot: input standard, threshold, window length, filter type/params, trigger mode.
- Stats: phase mean, phase p99, freq offset estimate, drift slope (if enabled), event counts.
- Environment: temperature, key rails, and a simple supply-noise summary.
- Calibration: versioned offset/scale/temp-index + CRC.
- Failure code: measurement-chain vs real-timing-fault classification.
Field: Turn Trends + Event Replay into Triage-Ready Evidence
- Decision mouth is immutable: alarms use a fixed mouth definition (window/BW/trigger/filter).
- UI mouth may differ: display smoothing is allowed, but must never affect alarm logic.
- Correlation first: if conclusions flip after changing measurement setup, treat it as a mouth mismatch before blaming the clock chain.
- Read health summary (mean/p99, alarm counts, last event timestamp, lock flags).
- Pull ring buffer around the event (pre/post window) and preserve the raw + alarm-path values.
- Compare against historical trend on the same unit (slope drift, rising p99, growing event-rate).
- Run a short fixed-mouth re-measure to confirm the current state without “operator tuning.”
- Correlate events with context (temperature change, rail disturbances, switching, connector handling).
- Output actionable split: measurement artifact vs integration fault vs true timing degradation.
- Timestamp: absolute + uptime; include mouth version ID.
- Phase/freq: raw sample, alarm-path filtered value, and event marker.
- Decision state: persistence counter, hysteresis state, blanking active flag.
- Context: temperature, key rails summary, ref select/mux state, lock flags.
- Aftermath: recovery time, re-lock decision, and whether switching occurred.
Aligning with External Instruments (without turning into an instrument guide)
- Same point: measure the same tap and the same reference definition.
- Same mouth: align bandwidth, trigger threshold, window length, and filtering rules.
- Same calibration: account for probe/cable delay and diff-to-single conversions consistently.
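As a worked illustration of the calibration point, the sketch below removes the probe/cable delay difference from an instrument reading and converts the result to degrees at the clock frequency before comparing it with the monitor value. The function names, the sign convention (offset = channel B arrival minus channel A arrival), and the tolerance are assumptions for the example.

```python
def corrected_offset_ps(measured_ps, path_a_delay_ps, path_b_delay_ps):
    """Remove the delay difference between the two measurement paths.

    Assumed convention: measured = t_B - t_A at the instrument inputs, so the
    offset at the tap is measured - (delay_B - delay_A).
    """
    return measured_ps - (path_b_delay_ps - path_a_delay_ps)


def ps_to_deg(offset_ps, freq_hz):
    """Convert a time offset to phase in degrees at the given clock frequency."""
    return offset_ps * 1e-12 * freq_hz * 360.0


def agrees(monitor_ps, instrument_ps, tol_ps):
    """True when the two readings match within the stated tolerance."""
    return abs(monitor_ps - instrument_ps) <= tol_ps


# Example: the instrument reads 150 ps, but path B's cable is 100 ps longer.
tap_offset_ps = corrected_offset_ps(150.0, 1000.0, 1100.0)
tap_offset_deg = ps_to_deg(tap_offset_ps, 100e6)  # at a 100 MHz clock
```

The comparison is only meaningful once both readings refer to the same tap and the same mouth, which is why the correction is applied before any tolerance check.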
Example Material Numbers (starting points; verify package/suffix/availability)
- TI: TDC7200 (TDC), TDC7201 (obsolete; avoid in new designs).
- ScioSense: TDC-GPX2 (multi-channel TDC), TDC-GP30 (TDC family; use as concept reference).
- Integrated phase tools in synchronizers: Microchip ZL30792 / ZL30364; Renesas 8A34001.
- High-speed comparators: TI TLV3501, TI LMV7219, ADI ADCMP601, ADI/LTC LTC6752.
- LVDS receivers (diff → logic capture): TI SN65LVDS1, TI SN65LVDS2.
- Digital isolators (keep monitor noise off trunk): TI ISO7741, ADI ADuM1401, ADI ADuM1250 (I²C).
- Universal counter / time interval: Keysight 53230A / 53220A class.
- Oscilloscope timing mode: high-bandwidth scope with stable trigger + time-interval measurement.
- Key rule: align window/BW/trigger and the tap point before comparing numbers.
Applications & IC Selection Notes (Keep It Practical)
Selection must start from what needs to be observed (phase step vs frequency drift), then map to a measurement approach (TDC vs timestamp vs counter), and only then to parts. The part numbers below are starting points, not an exhaustive catalog.
A) Multi-ADC/DAC channel alignment (skew + drift you can close the loop on)
- Need: stable inter-channel phase offset + repeatable verification during temp sweep.
- Risk: tap inconsistency and edge conditioning dominate the time error.
- Best-fit method: TDC/time-interval measurement with consistent mouth + p99 verification.
- Example parts: ScioSense TDC-GPX2 (TDC), TI SN65LVDS2 (LVDS receiver), ADI ADCMP601 (comparator).
B) Multi-board sync + redundant clock systems (qualification before switching)
- Need: “in-window” qualification (phase + stability duration) before any hitless switch.
- Risk: connectors/cables create step-like artifacts; shared reference can hide common drift.
- Best-fit method: phase monitor + state machine (persistence/hysteresis/blanking) + event replay.
- Example parts: Microchip ZL30792 (phase measurement/monitoring), Renesas 8A34001 (sync/DPLL with phase tools), TI ISO7741 (digital isolation for monitor domain).
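The persistence/hysteresis/blanking behavior named in the best-fit method can be sketched as a small state machine. This is an assumed structure, not the logic of any listed part: `Qualifier`, its parameters, and the choice to hold the last decision during blanking are hypothetical.

```python
class Qualifier:
    """In-window qualification with hysteresis, persistence, and blanking."""

    def __init__(self, enter_ps, exit_ps, persist_n, fail_n, blank_n):
        if enter_ps > exit_ps:
            raise ValueError("enter_ps must not exceed exit_ps")
        self.enter_ps = enter_ps    # must be inside this band to qualify
        self.exit_ps = exit_ps      # must leave this wider band to disqualify
        self.persist_n = persist_n  # consecutive in-window samples to qualify
        self.fail_n = fail_n        # consecutive out-of-band samples to drop
        self.blank_n = blank_n      # samples ignored after a known switch
        self.qualified = False
        self._count = 0
        self._blank = 0

    def notify_switch(self):
        """Call when a mux/ref switch occurs so the transient is ignored."""
        self._blank = self.blank_n

    def update(self, offset_ps):
        if self._blank > 0:
            self._blank -= 1
            return self.qualified   # decision is held while blanking
        mag = abs(offset_ps)
        if self.qualified:
            self._count = self._count + 1 if mag > self.exit_ps else 0
            if self._count >= self.fail_n:
                self.qualified, self._count = False, 0
        else:
            self._count = self._count + 1 if mag <= self.enter_ps else 0
            if self._count >= self.persist_n:
                self.qualified, self._count = True, 0
        return self.qualified
```

With `enter_ps=50` and `exit_ps=80`, a path qualifies only after `persist_n` consecutive samples within 50 ps, and a reading of 70 ps does not drop it again because it is still inside the wider exit band.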
C) SerDes/FPGA multi-domain clock health (trend + event correlation)
- Need: frequency offset and slow drift trending, plus ring-buffer replay for rare events.
- Risk: over-smoothing hides steps; under-filtering causes false alarms.
- Best-fit method: reciprocal counter + robust logging fields + differential statistics when refs are shared.
- Example parts: TI TDC7200 (time interval building block), TI SN65LVDS1 (receiver), ADI ADuM1250 (isolated I²C for register access).
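For the trending case, the arithmetic behind a reciprocal-counter estimate and a drift slope is short enough to sketch. The function names and the example numbers below are illustrative assumptions: the counter is modeled as counting whole signal cycles while timing the gate in reference-clock ticks, and the slope is an ordinary least-squares fit of offset against time.

```python
def reciprocal_freq_hz(signal_cycles, ref_ticks, ref_hz):
    """Frequency = counted signal cycles / gate time, with gate time = ref_ticks / ref_hz."""
    return signal_cycles * ref_hz / ref_ticks


def offset_ppm(freq_hz, nominal_hz):
    """Fractional frequency offset in parts per million."""
    return (freq_hz - nominal_hz) / nominal_hz * 1e6


def drift_slope(times_s, values):
    """Least-squares slope of values versus time (value units per second)."""
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = sum(times_s) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_s, values))
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("time values must not all be identical")
    return num / den


# Example: 10,000,000 cycles of a nominal 10 MHz clock, timed with a 100 MHz
# reference, take 99,999,900 reference ticks instead of 100,000,000.
f_est = reciprocal_freq_hz(10_000_000, 99_999_900, 100e6)
ppm = offset_ppm(f_est, 10e6)
```

When two channels share a reference, trending the difference of their offsets rather than each offset alone keeps common reference drift from masking or mimicking a channel fault.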
Engineering Selection Checklist (10–15 fields)
- Target: phase offset / phase step / freq offset / drift slope.
- Resolution needed: ps-class vs coarse; confirm the window and edge conditioning needed to reach it.
- Dynamic range: smallest detectable step and the largest measurable offset without ambiguity.
- Channels: number of concurrent measurements; shared vs independent references.
- Input standard: LVDS/HCSL/LVPECL/LVCMOS compatibility and common-mode constraints.
- Front-end: limiter/comparator/receiver choices; time error sensitivity to dV/dt and threshold noise.
- Alarm engine: hysteresis, blanking, persistence (N consecutive fails), and state behavior.
- Logging: ring buffer support, raw + alarm-path values, and “mouth version” tagging.
- Calibration: self-test hooks, known delta injection path, CRC/versioned storage.
- Isolation & coupling: prevent monitor rails/grounds from injecting noise into the clock trunk.
- Interfaces: SPI/I²C, interrupt pins, and register snapshot support for triage.
- Dedicated TDC IC: ScioSense TDC-GPX2; TI TDC7200.
- Synchronizer with phase tools: Microchip ZL30792 / ZL30364; Renesas 8A34001.
- Condition + capture: TI SN65LVDS2 / SN65LVDS1 + TI TLV3501 or ADI ADCMP601.
- Keep noise contained: TI ISO7741; ADI ADuM1401; ADI ADuM1250 (I²C).