A glitch-free clock mux keeps critical clock trees running through main/backup failover without illegal pulses (runt/double/missing), while making phase steps and jitter impact measurable and bounded.
The core engineering task is to define switching policy, qualification/hold-off parameters, and acceptance windows so failover is predictable, testable, and diagnosable in production and the field.
What is a Glitch-Free Clock Mux (and where it sits in a redundant clock tree)
A glitch-free clock mux switches between two clock sources without generating runt pulses, double clocks, or missing cycles at the output. In redundant systems it enables main/backup failover so downstream endpoints continue to see a valid clock during faults—provided the sources are compatible and the switching decision is properly qualified.
Terminology: three practical acceptance levels
Glitch-free (no invalid pulses)
Output switching produces no runt pulses, no double edges, and no missing cycles.
Hitless (no clock outage)
Output remains continuous (no “dead time”). Phase steps may still occur depending on source alignment.
Seamless / near phase-continuous (controlled phase transient)
Phase transient is bounded and small enough for the endpoint’s tolerance window. This typically requires tight frequency match, qualified switching windows, and a defined policy.
Where it sits in the clock tree (typical)
Reference sources (main + backup) feed the system clock chain.
Cleaning / conditioning (if used) ensures both inputs meet level, jitter, and stability requirements.
Glitch-free mux performs failover switching.
Fanout / distribution replicates the selected clock to multiple endpoints (FPGA/SerDes/Converters/PHY/SoC).
This page focuses on hitless switching mechanics, decision logic, and validation. Details of PLL loop design, crosspoint routing, or fanout buffer architectures are intentionally not expanded here.
Key prerequisites for true hitless behavior
Same frequency (or tightly bounded Δf): if the inputs drift apart, phase difference will walk, and “seamless” is not feasible.
Same signaling standard and valid levels: mismatched common-mode, swing, or termination can break edge qualification and create false switching.
Comparable signal quality and reliable indicators: both inputs should be of similar quality, and LOS/LOL/frequency-window/phase-window monitors must be reliable enough to drive a deterministic decision.
Warm standby (recommended): the backup path should be stable before it is needed, otherwise failover becomes recovery-with-transient.
The mux is only one element of hitless redundancy: qualification, policy, and measurement close the loop.
What “glitch-free / hitless” really means: failure modes and pass criteria
“Glitch-free” and “hitless” only matter if they are tied to observable metrics. A failover can look acceptable on a slow timebase yet still break endpoints due to rare runt pulses, one-in-a-thousand double clocks, or a phase step that exceeds tolerance. This section defines the failure modes and a measurement-friendly acceptance template.
Failure modes to look for (grouped by impact)
Waveform validity (logic-level fatal)
Runt pulse: a pulse too narrow or too low in amplitude to be a valid clock, yet still able to cross an input threshold.
Double clock: two valid edges occur within one expected period.
Missing cycle: output period stretches beyond the allowed limit.
Abnormal duty: duty-cycle distortion shifts edge timing or triggers mis-detection.
Timing transients (endpoint tolerance dependent)
Phase step: a sudden time offset Δt at switchover (not necessarily a glitch).
Period / frequency step: short-term period error during decision or gating windows.
Temporary wander: low-frequency drift that accumulates phase error over time.
Clock quality degradation (budget-limited)
Additive jitter increase: RMS jitter rises beyond remaining system budget.
New spurs / transients: switching control injects discrete tones or wideband noise.
Acceptance template (pass criteria)
Waveform: no runt pulses, no double edges, no missing cycles over N switches.
Timing: phase step (TIE) < X; max period error < Y.
Quality: additive RMS jitter increase < remaining budget; no new spur violates system mask.
Critical detail: RMS jitter numbers are only comparable if the integration window and measurement method are identical.
Why “glitch-free” ≠ “phase-continuous”
A mux can be perfectly glitch-free while still producing a measurable phase step at switchover. Phase continuity depends on source frequency match, the allowed switching window, and endpoint tolerance. For engineering clarity, success should be declared as Level 1 (glitch-free), Level 2 (hitless), or Level 3 (near phase-continuous) before tuning policies or thresholds.
A scope screenshot alone can be misleading—pair waveform validity checks with timing (TIE/phase step) and jitter-budget verification using consistent measurement windows.
Switching policies: revertive vs non-revertive, priority, manual override, warm-standby
A glitch-free mux becomes hitless in the field only when switching is driven by a deterministic policy: clear triggers, stable qualification windows, anti-flap guardrails, and a defined return strategy. The goal is to switch fast on real faults while avoiding “thrash” on marginal conditions.
Policy matrix (choose per availability vs stability needs)
Revertive vs Non-revertive
Revertive (auto return): returns to main when main is stable and qualified; best when main has a clear quality advantage.
Non-revertive (stay on backup): remains on backup until manual action or a strict return condition; reduces repeated switching risk.
Priority vs “Best clock”
Priority: main is preferred unless it is declared bad; simpler and easier to qualify.
Best-clock selection: chooses the better source based on quality metrics; requires stable measurements and strong anti-flap controls.
Warm-standby vs Cold-standby
Warm-standby: backup path is already stable (frequency/level/lock) before it is needed; enables faster failover.
Cold-standby: backup starts on demand; often cannot meet tight hitless requirements due to start/lock time.
The flapping problem (why systems switch back and forth)
“Flapping” is usually caused by unstable decision signals (borderline LOS/LOL, noisy frequency checks, marginal levels) rather than the mux core. Prevent repeated switching with guardrails that convert noisy observations into stable decisions.
Debounce: require N consecutive good/bad windows before asserting a state.
Hysteresis: use asymmetric thresholds for entering vs exiting a fault condition.
Hold-off: after any switch, block additional switching for T_holdoff.
Soak time: require main to be good for T_soak before any revertive return.
Recommended default policy
Failover: if LOS/LOL is confirmed, switch quickly (availability first).
Return: never return immediately; require “main present” + quality OK + T_soak, then apply T_holdoff after the switch.
Manual override: allow forcing MAIN/BACKUP for commissioning, but always log the reason and block auto actions if policy requires.
A stable failover system switches quickly on confirmed faults and returns only after sustained “good” qualification (debounce + hysteresis + soak + hold-off).
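The guardrails above can be expressed as a small state machine. The Python sketch below is a minimal model under stated assumptions: time advances in fixed monitor windows, each input supplies one raw good/bad flag per window (for example a combined LOS/LOL/frequency-window verdict), and the names (`Debouncer`, `FailoverPolicy`, `n_bad`, `n_good`, `t_holdoff`, `t_soak`) are hypothetical, not taken from any device's register map. Asymmetric `n_bad`/`n_good` counts stand in for hysteresis here.

```python
from dataclasses import dataclass, field


@dataclass
class Debouncer:
    """Debounced GOOD/BAD state for one input (asymmetric counts act as hysteresis)."""
    n_bad: int
    n_good: int
    good: bool = True
    run: int = 0          # consecutive raw windows disagreeing with the current state
    good_streak: int = 0  # consecutive raw-good windows, used for the soak timer

    def update(self, raw_ok: bool) -> bool:
        self.good_streak = self.good_streak + 1 if raw_ok else 0
        if raw_ok != self.good:
            self.run += 1
            if self.run >= (self.n_good if raw_ok else self.n_bad):
                self.good, self.run = raw_ok, 0
        else:
            self.run = 0
        return self.good


@dataclass
class FailoverPolicy:
    n_bad: int = 2          # BAD windows before a fault is declared (fast failover)
    n_good: int = 4         # GOOD windows before a fault is cleared
    t_holdoff: int = 5      # windows after a switch during which no auto switch is allowed
    t_soak: int = 20        # windows MAIN must stay raw-good before a revert
    revertive: bool = True
    selected: str = "MAIN"
    since_switch: int = 10**9
    switch_log: list = field(default_factory=list)

    def __post_init__(self):
        self.mon = {src: Debouncer(self.n_bad, self.n_good) for src in ("MAIN", "BACKUP")}

    def step(self, t: int, main_ok: bool, backup_ok: bool) -> str:
        good = {"MAIN": self.mon["MAIN"].update(main_ok),
                "BACKUP": self.mon["BACKUP"].update(backup_ok)}
        self.since_switch += 1
        if self.since_switch < self.t_holdoff:
            return self.selected                     # hold-off: ignore alarms
        other = "BACKUP" if self.selected == "MAIN" else "MAIN"
        if not good[self.selected] and good[other]:
            self._switch(t, other, "fault")          # confirmed fault: switch now
        elif (self.revertive and self.selected == "BACKUP" and good["MAIN"]
              and self.mon["MAIN"].good_streak >= self.t_soak):
            self._switch(t, "MAIN", "revert")        # soaked, qualified return
        return self.selected

    def _switch(self, t: int, target: str, reason: str) -> None:
        self.selected, self.since_switch = target, 0
        self.switch_log.append((t, target, reason))


def run_scenario(policy: FailoverPolicy) -> list:
    """MAIN is bad for windows 10..14; BACKUP is always good."""
    for t in range(60):
        policy.step(t, main_ok=not (10 <= t < 15), backup_ok=True)
    return policy.switch_log


print(run_scenario(FailoverPolicy()))                 # [(11, 'BACKUP', 'fault'), (34, 'MAIN', 'revert')]
print(run_scenario(FailoverPolicy(revertive=False)))  # [(11, 'BACKUP', 'fault')]
```

In this scenario the fault is confirmed after `n_bad` = 2 windows (switch at window 11), and the return waits until MAIN has been continuously good for `t_soak` = 20 windows (revert at window 34). Manual override and a rule for both inputs failing are left out to keep the sketch short.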
Inside the box: architectures that make switching glitch-free
“Glitch-free” is achieved by controlling the switching instant. The mux core must ensure that output gating never produces illegal pulse widths, even when inputs are noisy or slightly misaligned. Different internal architectures implement this with different trade-offs in phase transient, latency, and tolerance to input imperfections.
Mechanism: switches in a buffered or lower-frequency domain where safe timing margins are larger.
Strength: simplifies glitch prevention for low-frequency or divided clocks.
Risk: adds latency and may introduce phase uncertainty that must be budgeted.
Why it is glitch-free (unified principle) + key input constraints
The core rule is simple: switch only in a safe window where the output cannot form an illegal pulse width. In practice, the safe window depends on edge detection quality, logic thresholds, noise, and duty-cycle.
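To make the safe-window rule concrete, here is a tick-level Python sketch of one common textbook glitch-free 2:1 structure: each input has an enable flip-flop clocked on that input's falling edge, and the two enables are cross-coupled so one side can turn on only after the other has turned off. It is a simplified model (one flop per side instead of a synchronizer chain, no metastability, ideal 50% duty inputs) and is not the internal design of any particular device; the periods and switch time are arbitrary illustration values.

```python
def clk(t: int, period: int) -> bool:
    """Ideal 50%-duty square wave sampled at integer tick t (high in the first half)."""
    return (t % period) < period // 2


def simulate(period_a: int, period_b: int, t_switch: int, n_ticks: int,
             glitch_free: bool = True) -> list[bool]:
    """Output of a 2:1 mux asked to move from input A to input B at t_switch."""
    en_a, en_b = True, False                 # start on input A
    prev_a, prev_b = clk(-1, period_a), clk(-1, period_b)
    out = []
    for t in range(n_ticks):
        sel_b = t >= t_switch                # select request (True = use B)
        a, b = clk(t, period_a), clk(t, period_b)
        if glitch_free:
            # Each enable changes only on its own clock's falling edge (clock is low),
            # and a side may turn on only if the other side's enable is already off.
            new_en_a = (not sel_b and not en_b) if (prev_a and not a) else en_a
            new_en_b = (sel_b and not en_a) if (prev_b and not b) else en_b
            en_a, en_b = new_en_a, new_en_b
            y = (a and en_a) or (b and en_b)
        else:
            y = b if sel_b else a            # naive combinational mux
        out.append(y)
        prev_a, prev_b = a, b
    return out


def high_widths(wave: list[bool]) -> list[int]:
    """Widths of complete high pulses (first and last runs dropped as truncated)."""
    runs, level, count = [], wave[0], 0
    for v in wave:
        if v == level:
            count += 1
        else:
            runs.append((level, count))
            level, count = v, 1
    runs.append((level, count))
    return [n for lvl, n in runs[1:-1] if lvl]


gf = simulate(10, 14, t_switch=23, n_ticks=200, glitch_free=True)
naive = simulate(10, 14, t_switch=23, n_ticks=200, glitch_free=False)
print(min(high_widths(gf)), min(high_widths(naive)))  # 5 3
```

The naive mux truncates A's high phase to 3 ticks (a runt), while the gated version only ever passes full-width pulses. The price is a stretched low interval during the handover, which is why glitch-free (L1) behavior does not by itself imply gapless or phase-continuous behavior.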
Constraint → common failure symptom mapping
Low swing / slow edge: unstable edge detection → runt pulses or sporadic double clocks.
High input jitter: safe window erodes → non-deterministic switch instant and tolerance violations.
Mechanism-level checks (quick validation before deep tuning)
Window stress: repeat switching while reducing input swing and observing whether failures appear as runt/double/missing.
Duty sensitivity: introduce controlled duty distortion and confirm the pass criteria remain satisfied.
Repeatability: measure the distribution of switching instants (phase step histogram) to detect non-deterministic behavior.
Conceptual blocks only: real devices vary, but glitch-free behavior always depends on qualification and a safe switching window that prevents illegal pulse widths.
Hitless levels, phase alignment & cycle slip
“Hitless” is not a single promise. Switching can be glitch-free yet still introduce a phase step. For engineering clarity, define the target level (L1/L2/L3), then design the alignment window and switching rules to keep phase transients within endpoint tolerance.
Three acceptance levels (use as a system requirement)
L1 — Glitch-free
No runt pulses, no double clocks, no missing cycles. Phase continuity is not guaranteed.
L2 — Hitless (gapless)
No clock outage and no missing periods. A bounded phase step (Δt) may occur.
L3 — Near phase-continuous
Switch timing is controlled to keep phase transient within a narrow window. Typically requires same-frequency inputs and a defined alignment strategy.
Alignment conditions (when L3 is meaningful)
Same frequency (or tightly bounded Δf): otherwise phase error drifts and “always continuous” is not realistic.
Switch window: switch only when the relative phase falls within a permitted window.
Controlled phase step: if perfect continuity is impossible, cap the step size and select the best switching instant.
Cycle slip (why phase error becomes a sawtooth with small Δf)
With a small frequency offset between sources, relative phase error accumulates over time and wraps, producing a sawtooth-like drift. In this case, the practical control lever is not “forcing phase continuity forever,” but choosing the switching instant and bounding the phase step.
Windowed switching: wait until phase error enters the allowed band.
Step cap: enforce |Δt| ≤ X (system-defined), otherwise delay switching or downgrade to L2 behavior.
Bounded waiting: set a maximum wait time to avoid excessive failover delay under fault conditions.
With small Δf, phase error drifts and wraps. “Near-continuous” switching is achieved by choosing a switch instant inside the allowed window and bounding the phase step.
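The windowed-switching rules (allowed band, step cap, bounded wait) can be checked numerically. This Python sketch assumes the controller samples the relative phase error at a fixed interval and that the error follows the ideal sawtooth described above (constant Δf, no jitter); the function names and example numbers (100 MHz, 1 ppm offset, ±100 ps window) are illustrative assumptions, not device parameters.

```python
def wrap(dt: float, period: float) -> float:
    """Wrap a time offset into [-period/2, period/2): the sawtooth seen with small Δf."""
    return (dt + period / 2) % period - period / 2


def find_switch_instant(f0_hz, y_ppm, dt0_s, window_s, max_wait_s, check_s):
    """First check time at which |relative phase error| <= window_s.

    Returns (wait_s, phase_step_s), or None if max_wait_s expires first; in that case
    the policy would switch anyway and accept an L2-style (larger, bounded) step.
    """
    period = 1.0 / f0_hz
    y = y_ppm * 1e-6                          # fractional frequency offset between A and B
    for k in range(round(max_wait_s / check_s) + 1):
        t = k * check_s
        err = wrap(dt0_s + y * t, period)     # offset grows by y seconds per second
        if abs(err) <= window_s:
            return t, err
    return None


print(find_switch_instant(100e6, 1.0, 3e-9, 100e-12, max_wait_s=10e-3, check_s=100e-6))
print(find_switch_instant(100e6, 1.0, 3e-9, 100e-12, max_wait_s=5e-3, check_s=100e-6))  # None
```

With a 1 ppm offset the phase error walks only 1 ps per µs, so reaching a ±100 ps window from a 3 ns starting offset takes about 7 ms. A 5 ms limit expires first, which is the case where the policy should fall back to L2 behavior rather than delay failover indefinitely.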
Fault detection & decision logic: LOS/LOL, frequency windowing, hysteresis, debounce
Field reliability depends on stable decisions, not raw indicators. A robust design converts raw monitors (LOS/LOL, edge counting, frequency windows, phase drift, lock pins) into a qualified state with debounce, hysteresis, timers, and return gating.
Detection inputs (use more than one when possible)
LOS (loss of signal): amplitude/edge disappearance indicates a hard failure.
LOL / lock pins: fast but may chatter near boundary; always filter.
Frequency windowing: counts edges over a fixed gate time and compares the resulting frequency offset against a Δf window.
Phase drift trend: detects gradual degradation; useful for “quality” decisions.
Anti-false-trigger controls (turn noisy signals into stable states)
Debounce: require N consecutive windows before asserting GOOD/BAD.
Hysteresis: use different thresholds for enter vs exit (Δf_in vs Δf_out).
Hold-off: after switching, ignore transient alarms for T_holdoff.
Quality-gate (return): return to main only after lock is stable for T_qual and quality checks pass.
Executable parameter set (typical configuration knobs)
N_bad / N_good: debounce counts for BAD and GOOD qualification.
Δf_window_in / Δf_window_out: frequency window thresholds (hysteresis pair).
T_holdoff: post-switch quiet time to suppress transient mis-detection.
T_qual: lock/quality qualification time before allowing a return.
LOS threshold: amplitude/edge criteria for declaring LOS (if supported).
Lock filter: delay/filter applied to lock pins to avoid chatter.
A pipeline approach prevents false triggers: debounce and hysteresis stabilize raw monitors, timers qualify returns, and hold-off suppresses post-switch transients.
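The frequency-window monitor and its hysteresis pair can be sketched in a few lines of Python. This is a minimal model, assuming an edge counter read once per gate time; mapping `window_out_ppm` to the threshold for leaving the window and the tighter `window_in_ppm` to the threshold for re-entering it is one interpretation of the Δf_window_in / Δf_window_out pair, and all names are hypothetical. The resolution helper shows why gate time matters: a ±1 count quantization error limits how tight a Δf window can be.

```python
def ppm_offset(edge_count: int, gate_s: float, f_nominal_hz: float) -> float:
    """Frequency offset in ppm from an edge count over a gate time."""
    f_meas = edge_count / gate_s
    return (f_meas - f_nominal_hz) / f_nominal_hz * 1e6


def resolution_ppm(gate_s: float, f_nominal_hz: float) -> float:
    """One count of quantization expressed in ppm."""
    return 1e6 / (f_nominal_hz * gate_s)


class FreqWindowMonitor:
    """In-window flag with hysteresis: leave above window_out, re-enter at or below window_in."""

    def __init__(self, f_nominal_hz: float, window_in_ppm: float, window_out_ppm: float):
        if not window_in_ppm < window_out_ppm:
            raise ValueError("hysteresis needs window_in_ppm < window_out_ppm")
        self.f_nominal_hz = f_nominal_hz
        self.window_in_ppm = window_in_ppm
        self.window_out_ppm = window_out_ppm
        self.in_window = True

    def update(self, edge_count: int, gate_s: float) -> bool:
        off = abs(ppm_offset(edge_count, gate_s, self.f_nominal_hz))
        if self.in_window and off > self.window_out_ppm:
            self.in_window = False
        elif not self.in_window and off <= self.window_in_ppm:
            self.in_window = True
        return self.in_window


print(resolution_ppm(1e-3, 100e6))  # 10.0
mon = FreqWindowMonitor(100e6, window_in_ppm=50.0, window_out_ppm=100.0)
# Counts over a 10 ms gate: 0, +80, +120, +80, +40 ppm
print([mon.update(c, 10e-3) for c in (1_000_000, 1_000_080, 1_000_120, 1_000_080, 1_000_040)])
```

At 100 MHz a 1 ms gate resolves only 10 ppm per count, so a 20 ppm window would be dominated by quantization; a 10 ms gate gives 1 ppm per count at the cost of slower detection, which feeds directly into t_detect. The in-window flag would typically feed a debouncer rather than drive the mux directly.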
Clock-quality budgeting around a mux: additive jitter, phase noise, duty-cycle distortion, spurs
A mux should be treated as a budgeted impairment element: it can add random jitter (RMS), shape phase noise (close-in vs floor), distort duty cycle (especially LVCMOS), and introduce spurs correlated with switching or control activity. Robust budgeting separates these contributions and verifies them with consistent measurement conditions.
What the mux can change (focus on incremental impairment)
Additive RMS jitter (with defined integration limits)
“Additive”: the mux contribution beyond the input source.
Bandwidth matters: compare numbers only when the integration range is the same.
Use-case: captures broadband random effects that accumulate across stages.
Phase noise (close-in vs floor)
Close-in: slow phase wander and low-offset noise sensitivity.
Floor / far-out: wideband noise contributing to RMS jitter.
Practical rule: track both, because endpoints weigh them differently.
Duty-cycle distortion (DCD)
Most visible on LVCMOS: edge-rate and threshold effects shift duty.
Why it matters: can break downstream edge-based timing assumptions.
Report: min/typ/max duty under defined load/termination.
Spurs (event-correlated impurities)
Source: switching transients, control coupling, supply/ground bounce.
Risk: may not inflate RMS jitter much but can violate masks.
Verify: spur offsets and dBc, and correlation to switch events.
A practical budgeting framework (keep measurement conditions consistent)
For random jitter-like terms, treat each stage as an RMS contributor under the same integration bandwidth, then combine by RSS:
J_total ≈ √(J_src² + J_cleaner² + J_mux² + J_fanout² + …).
Track spurs separately as a mask/peak metric rather than folding them into RMS.
RMS line items: use additive jitter terms for “what the stage adds.”
Spur checklist: top offsets + dBc + event correlation (switching/control).
Budget random jitter terms by RSS under consistent bandwidth, and track spurs separately as peak/mask items (often correlated with switching/control activity).
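The RSS rule, and the usual way of extracting a stage's additive term from input and output measurements, can be written as small helpers. This Python sketch assumes all terms are RMS values in femtoseconds integrated over the same band and that the stage contributions are uncorrelated (the condition under which RSS is valid); the example numbers are placeholders, not datasheet values.

```python
import math


def rss(*terms_fs: float) -> float:
    """Root-sum-square of RMS jitter terms measured over the same integration band."""
    return math.sqrt(sum(t * t for t in terms_fs))


def additive_jitter(j_out_fs: float, j_in_fs: float) -> float:
    """Estimate a stage's additive RMS jitter as sqrt(J_out^2 - J_in^2)."""
    if j_out_fs < j_in_fs:
        raise ValueError("output below input: check integration band or instrument floor")
    return math.sqrt(j_out_fs ** 2 - j_in_fs ** 2)


def remaining_budget(total_budget_fs: float, *other_terms_fs: float) -> float:
    """RMS jitter left for one stage after the other stages are allocated."""
    used = rss(*other_terms_fs)
    return math.sqrt(total_budget_fs ** 2 - used ** 2) if used < total_budget_fs else 0.0


print(rss(100, 50, 60, 40))               # ≈ 133.04 (source, cleaner, mux, fanout)
print(additive_jitter(130, 120))          # 50.0
print(remaining_budget(150, 100, 50, 40)) # ≈ 91.65 left for the mux
```

`additive_jitter` subtracts two nearly equal squares when the stage is quiet, so instrument noise floor and repeatability dominate the result; this is one more reason to keep the integration band and instrument settings identical for input and output measurements. Spurs are deliberately absent because they are tracked against a mask, not folded into the RSS.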
Interface & layout discipline: standards, termination, return path, skew
Switching reliability is often limited by interface discipline, not the mux core. Keep sources comparable (same standard and frequency), terminate correctly, maintain a continuous return path, and apply skew control only where it matters for the chosen hitless level.
Compatibility first (avoid cross-standard switching)
Same standard in: LVCMOS↔LVCMOS or LVDS↔LVDS is the default safe assumption.
Cross-standard risk: level translation can change edge integrity, duty, and threshold behavior.
Comparable conditions: identical termination and biasing strategy on both inputs improves deterministic switching.
Termination & common-mode (why some outputs “fail” while others pass)
HCSL / LVDS
Place Rt near RX: long stubs make reflections and threshold mis-detect more likely.
Return path continuity: crossing splits/voids increases mode conversion and overshoot.
LVPECL / LVCMOS
Bias/termination completeness: missing biasing can appear as “clipped” or shifted waveforms.
Edge rate control: too-fast edges can worsen overshoot and false edge detection.
Duty sensitivity: LVCMOS duty can shift with loading and threshold behavior.
Skew control + mux/fanout placement trade (keep it requirement-driven)
Where skew matters: mux output to fanout input (and any parallel endpoints requiring alignment).
Where it often does not: branches with no phase-relationship requirement beyond legal pulses.
Mux before cleaner: unified cleanup after switching, but recovery/qualify timing becomes critical.
Mux after cleaner: both paths are already “clean,” but layout symmetry and coupling become more sensitive.
Keep switching comparable (same standard), place termination near receivers, preserve return continuity, and apply skew control where alignment requirements demand it.
Failover timing: switchover time, holdover, and downstream tolerance windows
“Gapless output” does not automatically mean “zero phase disturbance.” A practical failover spec is a time-budgeted sequence: detection, decision, switching, then post-switch settling. Downstream tolerance should be expressed as fillable windows (phase/period/jitter/spur) instead of a single number.
What “switchover time” means in engineering terms
t_detect: raw fault becomes a qualified alarm (LOS/LOL, frequency window, phase drift).
t_decide: decision logic and policy gating (debounce, hysteresis, timers, priority).
t_switch: the switching actuation (may include bounded waiting for a safe/phase window).
t_settle: downstream stabilization / re-lock observation window after the switch.
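Adding the four phases gives a worst-case switchover estimate and makes the cost of debounce visible. In this Python sketch the monitor-window model, the extra partial window in t_detect, and the example values (10 µs windows, N_bad = 3, 5 µs decision latency, 1 µs actuation, 100 µs settling) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SwitchoverBudget:
    window_s: float    # length of one monitor / qualification window
    n_bad: int         # debounce count before a fault is declared
    t_decide_s: float  # policy / firmware decision latency
    t_switch_s: float  # actuation, including any bounded wait for a safe window
    t_settle_s: float  # downstream settling / re-lock observation window

    def t_detect_worst(self) -> float:
        # Assumption: the fault may begin just after a window opens, so that partial
        # window may not be flagged and N_bad full windows are still required.
        return (self.n_bad + 1) * self.window_s

    def total_worst(self) -> float:
        return self.t_detect_worst() + self.t_decide_s + self.t_switch_s + self.t_settle_s


b = SwitchoverBudget(window_s=10e-6, n_bad=3, t_decide_s=5e-6, t_switch_s=1e-6, t_settle_s=100e-6)
print(f"t_detect <= {b.t_detect_worst() * 1e6:.1f} us, total <= {b.total_worst() * 1e6:.1f} us")
# t_detect <= 40.0 us, total <= 146.0 us
```

Each extra debounce window adds one window length to failover time, so N_bad and the window length should be chosen together against the downstream outage tolerance.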
Downstream tolerance focus (use to choose acceptance windows)
SerDes / high-speed links
Primary sensitivity to continuity and event-correlated phase/jitter spectrum.
Acceptance windows should emphasize phase step, TIE peaks, and spur emergence around switch events.
Converters / sampling-critical endpoints
Primary sensitivity to random jitter budget and spurs.
Acceptance windows should lock integration bandwidth and track top spur offsets/dBc before vs after switching.
FPGA / SoC edge-based logic
Primary sensitivity to illegal pulses (runt/double/missing) and abnormal period/duty.
Acceptance windows should prioritize morphology pass/fail with aggressive glitch-trigger coverage.
Fillable tolerance windows (use as a requirement template)
Continuity
No missing cycles: YES
Allowed outage < ____
Timing
Max phase step |Δt| < ____
TIE peak < ____ (window ____)
Period error |ΔT| < ____ (N cycles ____)
Quality
ΔJ_RMS(additive) < ____ (f1–f2: ____)
Top spurs < ____ dBc @ offsets ____
Return qualify time T_qual > ____
Recovery
Downstream re-lock < ____
No flapping for ____ after switch
A usable failover specification decomposes time into detection, decision, switching, and settling—then assigns tolerance windows per downstream sensitivity.
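Once the blanks are filled in, the template can be checked automatically against measured results. This Python sketch is one possible encoding: the field names and measurement keys are hypothetical, a blank window (None) is skipped rather than treated as a pass, spur limits are compared as signed dBc values (a less negative spur fails), and the other metrics are compared by magnitude.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToleranceWindows:
    """A filled-in copy of the template; None means the blank is not specified yet."""
    max_outage_s: Optional[float] = None
    max_phase_step_s: Optional[float] = None
    max_tie_peak_s: Optional[float] = None
    max_period_err_s: Optional[float] = None
    max_add_jitter_fs: Optional[float] = None
    max_spur_dbc: Optional[float] = None  # e.g. -70.0; a higher (less negative) spur fails


# (limit field, measured key, compare by magnitude?)
CHECKS = [
    ("max_outage_s", "outage_s", True),
    ("max_phase_step_s", "phase_step_s", True),
    ("max_tie_peak_s", "tie_peak_s", True),
    ("max_period_err_s", "period_err_s", True),
    ("max_add_jitter_fs", "add_jitter_fs", True),
    ("max_spur_dbc", "top_spur_dbc", False),
]


def evaluate(w: ToleranceWindows, measured: dict) -> list[str]:
    """Return a list of violations; an empty list means every specified window passed."""
    failures = []
    # Continuity: missing cycles must be 0 (an absent key is treated as 0 in this sketch).
    if measured.get("missing_cycles", 0) != 0:
        failures.append("missing_cycles: must be 0")
    for limit_name, key, by_magnitude in CHECKS:
        limit = getattr(w, limit_name)
        if limit is None:
            continue                            # blank window: not evaluated
        if key not in measured:
            failures.append(f"{key}: not measured")
            continue
        value = abs(measured[key]) if by_magnitude else measured[key]
        if value > limit:
            failures.append(f"{key}: {measured[key]} exceeds {limit}")
    return failures


windows = ToleranceWindows(max_phase_step_s=200e-12, max_spur_dbc=-70.0)
result = {"phase_step_s": -150e-12, "top_spur_dbc": -65.0, "missing_cycles": 0}
print(evaluate(windows, result))  # ['top_spur_dbc: -65.0 exceeds -70.0']
```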
Validation & measurement traps: how to prove it is truly glitch-free
A single “clean-looking” scope capture is not proof. Reliable validation is layered: morphology (illegal pulses), timing (TIE/phase step/period error), and quality (jitter/phase-noise bandwidth consistency and spurs). Each layer has common traps that can hide rare failures or create false glitches.
Quality-layer pass criterion: no new switch-correlated spurs emerge after switching events.
Proving “glitch-free” requires a measurement chain that can catch rare illegal pulses, quantify phase transients, and compare jitter/PN under identical bandwidth settings.
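The timing layer can be computed from a list of rising-edge timestamps exported from a scope or time-interval analyzer. This Python sketch is a simplified estimator under stated assumptions: TIE is taken against an ideal clock at the nominal period aligned to the first edge, the phase step is the difference in mean TIE after vs before the switch edge, and intervals outside ±25% of nominal are only flagged as double-clock or missing-cycle suspects. Instruments and datasheets may define these metrics differently, so the same definition should be used for every candidate compared.

```python
from statistics import mean


def analyze_edges(edges_s, t_nom_s, switch_index, tol_frac=0.25):
    """Timing metrics from rising-edge timestamps captured around one switch event.

    edges_s: rising-edge times in seconds; switch_index: index of the first edge
    produced after the switch; tol_frac: period deviation treated as suspect.
    """
    periods = [b - a for a, b in zip(edges_s, edges_s[1:])]
    short = sum(p < (1 - tol_frac) * t_nom_s for p in periods)  # double-clock suspects
    long_ = sum(p > (1 + tol_frac) * t_nom_s for p in periods)  # missing-cycle suspects
    # TIE against an ideal clock aligned to the first edge. Only meaningful when no
    # edges were added or dropped, otherwise edge k no longer maps to ideal cycle k.
    tie = [e - (edges_s[0] + k * t_nom_s) for k, e in enumerate(edges_s)]
    return {
        "short_intervals": short,
        "long_intervals": long_,
        "period_err_s": max(abs(p - t_nom_s) for p in periods),
        "tie_peak_s": max(abs(x) for x in tie),
        "phase_step_s": mean(tie[switch_index:]) - mean(tie[:switch_index]),
    }


# Synthetic capture: 10 ns clock with a +0.5 ns phase step starting at edge 10.
T = 10e-9
edges = [k * T for k in range(10)] + [k * T + 0.5e-9 for k in range(10, 20)]
print(analyze_edges(edges, T, switch_index=10))
```

In practice the morphology counts (short/long intervals) should be checked first, and TIE or phase-step numbers reported only for captures where both are zero.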
Engineering checklist + Applications & IC selection logic
This section turns “glitch-free / hitless” requirements into an executable bring-up and production plan, then maps those requirements to concrete selection filters and representative IC part numbers (by candidate class, not as a universal BOM).
A) Engineering checklist (design → bring-up → validation → production)
A1) Input readiness (make A/B comparable before expecting “hitless”)
Standard & termination match: keep A/B in the same electrical standard (LVCMOS/LVDS/HCSL/LVPECL) and termination style; avoid “cross-standard switching” unless the mux explicitly supports it.
Frequency window: verify Δf ≤ ____ at the mux input pins (not only at the source connector).
Amplitude/CM/duty: confirm swing, common-mode, and duty-cycle are within the mux input qualification limits (duty in ____ to ____).
Power-up sequencing: define default path (MAIN/BACKUP), input-valid timing, and reset/enable ordering to prevent startup false-switching and “first-switch” artifacts.
Warm-standby option: if available, keep the backup path qualified/locked to reduce switching transient risk.
A2) Decision parameters (prevent flapping and false triggers)
Treat failover and revert decisions as a parameterized filter chain. Default engineering posture: fast switch on hard failure (LOS/LOL), and delayed/qualified return (soak + quality OK).
Frequency window: Δf_window = ____ ppm (or ____ Hz)
(Optional) Phase drift gate: Δφ_window = ____ (or TIE ____)
A3) Output health (unify waveform + timing + quality acceptance)
Waveform (scope)
Runt pulses: 0 events in persistence
Double clocks / extra edges: 0 events
Missing cycles: 0 events
Duty anomaly beyond ____
Timing (TIE / phase step)
Max period error: |ΔT| ≤ ____
Max phase step (time): |Δt| ≤ ____
TIE_peak within window (____): ≤ ____
Quality (jitter / spurs)
Additive RMS jitter ΔJ ≤ ____ (integration: ____ to ____)
Spurs: top spur ≤ ____ dBc @ offsets ____
No event-correlated spur bursts during switching
A4) Control & observability (make failures diagnosable in the field)
Control plane: pin-strap vs I²C/SPI; manual override; priority; revertive/non-revertive configuration.
Alarms: LOS/LOL/frequency window/phase monitor as pin or status bits (and how they map to decisions).
Event counters: switch_count, fail_count, alarm_count to correlate intermittent issues.
Timestamp hook: capture switching events (edge or interrupt) for lab correlation with spurs/phase steps.
A5) Production minimum test set (small but decisive)
Toggle MAIN↔BACKUP for X = ____ cycles/events; waveform failures must remain 0.
Corner sweep: temperature ____, voltage ____, input disturbance ____.
Record alarms + counters; reject lots with abnormal switch_rate or fail_rate.
Reuse the same three-layer acceptance (waveform / timing / quality) for consistency across teams.
B) Applications (strictly within this page boundary)
Redundant reference clock trees
Keep critical endpoints alive during MAIN reference failure without creating illegal pulses that can lock up digital logic.
Maintenance / test bypass
Switch to test sources or alternate references for on-line validation and service, with manual override and event logging.
High availability (HA) platforms
Combine dual refs + automatic failover + alarms so the system shifts from “hard failure” to “recoverable event”.
Field diagnosability
Alarm pins and counters make intermittent switching explainable, enabling faster root-cause closure and production screening.
C) IC selection logic (decision filters → candidate class → example part numbers)
Selection should be layered: hard constraints first, then switching behavior, then clock quality, then monitoring, then control/integration.
Treat part numbers below as representatives per class; always verify package, suffix, and measurement conditions.
C1) Hard constraints (filter)
Input count: 2:1 vs n:1 (and whether multiple independent channels are required).
C2) Switching behavior (match the target level)
L3: Near phase-continuous — requires alignment/holdover mechanics and tighter input comparability.
C3) Clock quality & monitors (rank + qualify)
Additive RMS jitter must match the same integration window used in the system budget.
Duty-cycle distortion matters most for LVCMOS paths.
Event-correlated spurs must be checked around switching events.
Prefer devices with LOS/LOL/frequency window + counters when field diagnosis is required.
Concrete example part numbers (by candidate class)
Use these as starting points for datasheet lookup and bench verification. Final selection must be driven by the filters above (standard, frequency, jitter window, monitors, control, and power/EMI constraints).
Class L1 — Glitch-free 2:1 clock mux
Renesas 580-01 — glitch-free switching, clock detect; for redundant clock trees.
Class L2 — Glitch-free mux with “zero-delay” style regeneration / multi outputs
Renesas 581G-02LF (ICS581-02) — PLL-based glitch-free mux, zero delay input-to-output, multi low-skew outputs.
Class L2/L3 — Hitless input switching + monitor + distribution outputs
Renesas (IDT) 873996 — dynamic clock switch monitors both inputs, automatic switch to good clock, LVPECL outputs.
Class L3 — DPLL/DSPLL devices with hitless reference selection + holdover mechanics
Texas Instruments LMK05028 — DPLL-based network synchronizer with hitless switching + digital holdover options.
Microchip ZL30105 — DPLL with hitless reference switching behavior and holdover-related mechanics.
Skyworks Si5345 / Si5344 / Si5342 — DSPLL family with hitless input clock switching (manual/automatic) and monitoring.
Skyworks Si5348 — DSPLL family option when higher output flexibility is required (verify switching mode constraints).
Skyworks Si5386 — DSPLL family option for advanced timing trees (use when L3 mechanics and monitoring are required).
Verification reminder (avoid false comparisons)
For any candidate, align the measurement definition: switching transient metric (phase step/TIE), additive jitter integration window, and the exact I/O standard termination used on the PCB.
FAQs (glitch-free mux & hitless switching)
Each answer is intentionally short and executable. Use the four-line format to keep decisions measurable:
Likely cause → Quick check → Fix → Pass criteria.
1
Why does a “glitch-free” mux still cause a noticeable phase step at switchover?
Likely cause: Glitch-free prevents illegal pulses, but does not guarantee phase continuity; A/B have Δf or drifting phase.
Quick check: Measure phase step |Δt| and TIE_peak around the event; verify Δf at mux pins ≤ ____.
Fix: Use/enable windowed switching or “wait-for-safe-window”; downgrade requirement to L2 (allow bounded phase step) if inputs cannot be aligned.
Pass criteria: Over N_switch=____ events: |Δt|≤____, TIE_peak≤____, and runt/double/missing=0.
2
Why do I see occasional double clocks during failover only at cold temperature?
Likely cause: Cold changes edge rate/swing/duty or worsens reflections, causing qualifier/gating mis-detection during switching.
Quick check: At cold, probe at mux input pins: swing/overshoot/duty; check alarm/counter for rapid LOS/qualify toggling.
Fix: Correct termination/return path and reduce stubs; add debounce (N_bad) and hold-off to prevent borderline chatter.
Pass criteria: Cold corner: toggle X=____ times; double=0, runt=0, missing=0, and counters show no repeated back-to-back switches.
3
Switching looks fine on the scope, but the FPGA sometimes miscounts—what trigger should be used?
Likely cause: Rare narrow pulses are missed by normal triggering; probing method creates false confidence.
Quick check: Use pulse-width / runt / dropout triggers + persistence; measure at the receiver/termination point, not a stub.
Fix: Use proper differential probing and short return; add an event counter (or FPGA edge counter) to correlate with failover events.
Pass criteria: Over N_switch=____ events: trigger hits=0, FPGA miscounts=0, and waveform failure counters remain 0.
4
Why does revertive switching “flap” between main and backup even though both clocks look present?
Likely cause: No hysteresis/soak on return; quality gate is too permissive, so marginal inputs cause oscillation at thresholds.
Quick check: Inspect alarm/counter logs for frequent “good/bad” toggles; confirm T_holdoff and T_soak are non-zero.
Fix: Add hysteresis + hold-off after each switch; require “good for T_soak + N_good” before revert; use non-revertive if needed.
Pass criteria: Under disturbance/temperature sweep: switch_rate ≤ ____, no back-to-back switches within T_holdoff=____.
5
What is the first check when only some outputs fail after switching (same mux, same source)?
Likely cause: Branch-level differences (termination, stub, load, routing) after the mux; not the mux core.
Quick check: Compare “pass vs fail” branch: termination location/value, stubs, return path discontinuities; probe at each receiver.
Fix: Normalize termination and remove stubs; relocate mux/fanout hierarchy so switching happens before divergent routing where possible.
Pass criteria: All outputs pass the same morphology + timing windows (runt/double/missing=0; |ΔT|≤____; |Δt|≤____).
6
Can clocks of different standards (e.g., LVCMOS ↔ LVDS) be switched without glitches?
Likely cause: Cross-standard switching violates threshold/termination/common-mode assumptions and often breaks qualification logic.
Quick check: Confirm the device explicitly supports dual-standard inputs/translation; verify both paths meet the same input-qualify limits at the pins.
Fix: Convert to a single standard before the mux (recommended); or choose a mux explicitly designed for that standard combination and re-validate.
Pass criteria: At mux pins both inputs meet qualify limits; switching shows 0 illegal pulses and bounded |Δt|≤____ across corners.
7
Why does additive jitter look worse after inserting a mux even when the datasheet seems small?
Likely cause: Different jitter integration windows or measurement modes; switch-correlated spurs inflate apparent jitter.
Quick check: Re-measure using the same f1–f2 window and same instrument settings; separately check top spurs around switch events.
Fix: Align measurement definitions; improve control/power isolation to the mux; choose a lower-additive class if ΔJ remains over budget.
Pass criteria: ΔJ_RMS(additive)≤____ with identical f1–f2; no new event-correlated spur above ____ dBc.
8
How should debounce/qualification time be set to avoid false failover but still meet availability targets?
Likely cause: One set of timers is incorrectly used for both “switch away” and “switch back,” causing either false failover or slow recovery.
Quick check: Measure the duration distribution of real disturbances; log N_bad/N_good events and compare to timer settings.
Fix: Use fast qualify for hard faults (LOS/LOL) and slower soak+quality gate for revert; tune N_bad, N_good, T_soak, T_holdoff independently.
Pass criteria: False failover rate ≤____, max outage ≤____, and flapping=0 under the defined disturbance profile.
9
Why does the backup clock pass frequency checks but still cause downstream link errors after switchover?
Likely cause: Frequency windowing is necessary but insufficient; phase transient, jitter spectrum, or event spurs exceed downstream tolerance.
Quick check: Measure |Δt|/TIE_peak at switchover and compare spurs/jitter with identical settings; check if revert is gated by “quality OK.”
Fix: Add quality-gate before using/reverting to a source; keep backup in warm-standby if supported; tighten termination/routing for sensitive endpoints.
Pass criteria: Link errors=0 and all defined windows pass: |Δt|≤____, TIE_peak≤____, ΔJ≤____, top spur≤____ dBc.
10
What is the simplest production test to prove “no runt pulse / no missing cycle” at scale?
Likely cause: Production tries to “scope screenshot” instead of using event-driven triggers and counters, missing rare failures.
Quick check: Use a pulse-width/runt trigger with persistence and log hit_count while toggling MAIN↔BACKUP X times.
Fix: Standardize a minimal script: X toggles + corner sweep (temp/voltage) + automated pass/fail counters; keep deeper jitter/PN as audit sampling.
Pass criteria: Over X=____ toggles: runt=0, double=0, missing=0, and event counters match expected totals.
11
Why does enabling SSC on one source break hitless switching?
Likely cause: SSC introduces intentional FM so A/B no longer stay within a stable Δf/phase window, defeating “safe-window” assumptions.
Quick check: Confirm SSC depth/rate and observe phase difference trend (rapid drift/sawtooth); verify Δf_window is still satisfied during modulation.
Fix: Disable SSC on hitless paths; or apply matched SSC to both sources and re-validate with updated Δf/Δt windows.
Pass criteria: During SSC operation: runt/double/missing=0 and phase/TIE windows remain within limits (|Δt|≤____, TIE≤____).
12
How can rare field events (brownout, intermittent LOS) be logged and diagnosed without a lab scope?
Likely cause: Lack of observability (no sticky status, no counters, no timestamped events) makes intermittent failures look “random.”
Quick check: Verify availability of alarm pins/status bits and counters (switch_count/fail_count); confirm MCU can timestamp interrupts/events.