
TSN Switch / Bridge for Deterministic Ethernet (802.1AS/Qbv/Qbu)


A TSN switch/bridge turns Ethernet into a measurable, schedulable network by aligning time across nodes (802.1AS), enforcing transmission windows (802.1Qbv), and, optionally, reducing blocking with frame preemption (802.1Qbu).

The result is deterministic behavior you can verify: bounded end-to-end latency, low jitter, and zero out-of-window events under defined stress profiles and acceptance budgets.

Definition & Scope: What a TSN Switch/Bridge guarantees

A TSN switch/bridge provides predictable Ethernet delivery by combining a shared time base, time-aware transmission windows, and reduced blocking for critical traffic.

Guarantees (when configured correctly)

  • Bounded latency for scheduled/critical classes (measurable as worst-case E2E delay ≤ X, under load Y%).
  • Low jitter by limiting queueing variability inside time windows (residence time variation ≤ X, window-to-window).
  • Time alignment of transmission opportunities across ports/nodes via gPTP time (time error ≤ X ns at observation point).
  • Isolation of critical traffic from best-effort congestion using classification, queues, gates, and shaping (out-of-window frames = 0 over Z hours).
Conditions: correct classification · synchronized time base · valid schedule & guard band

Not guaranteed (common misconceptions)

  • Application execution latency (OS scheduling, software pipelines, compute load). Diagnostic clue: frames arrive on-time, but the application still times out.
  • End-to-end determinism across non-TSN segments (wireless links, non-TSN islands). Diagnostic clue: determinism breaks at the boundary where gPTP/schedule is not enforced.
  • PHY/SI root causes (BER, link retrain, eye closure, analog jitter sources). Diagnostic clue: CRC errors, link drops, or retraining events dominate.

Scope guard: this page focuses on switch/bridge L2 determinism mechanisms (802.1AS/Qbv/Qbu, HW timestamps, queues, gating, shaping, and verification hooks).

Key terms used on this page

  • Determinism: predictable bounds on latency/jitter for selected traffic classes.
  • Residence time: time a frame spends inside the switch/bridge (ingress → egress).
  • Time-aware scheduling: transmission windows controlled by time-based gate lists (Qbv).
  • Preemption: interrupting best-effort transmission to reduce blocking for urgent traffic (Qbu/802.3br).

Diagram: Best-effort jitter vs scheduled windows

Best-effort vs Scheduled Windows: two timelines compare random queueing jitter in best-effort Ethernet (before) to time-windowed transmission in TSN with a guard band (after).

Interpretation: determinism comes from time-windowed transmission (not from making every frame faster). Verification focuses on worst-case bounds and out-of-window events, not only averages.

Standards map: 802.1AS / 802.1Qbv / 802.1Qbu and the “determinism pipeline”

Determinism is a chain: a shared time base enables time windows; reduced blocking keeps windows usable; shaping and policing prevent best-effort bursts from breaking bounds; measurement closes the loop.

802.1AS (gPTP): shared time base

  • Provides: a common notion of time across ports/nodes, enabling aligned schedules.
  • Switch must supply: consistent timestamp points and stable residence-time behavior (time error and role changes become observable).
  • Typical failure symptom: sporadic time-error spikes or frequent master/role churn (even before Qbv/Qbu is enabled).

802.1Qbv (TAS): time-aware transmission windows

  • Provides: gated windows (GCL) for selected queues, turning latency into a bounded budget.
  • Needs: correct classification (PCP→queue), GCL timing, and guard-band sizing.
  • Typical failure symptom: “Qbv enabled but still unstable” (caused by wrong queue map, wrong cycle alignment, or guard band mismatch).

802.1Qbu + 802.3br: reduce blocking via frame preemption

  • Provides: lower blocking latency by interrupting preemptable traffic for express traffic.
  • Impact: smaller guard bands and higher schedule efficiency (when interoperable).
  • Typical failure symptom: unexpected drops/counters after enabling preemption (misclassification, fragment settings, or link-partner interoperability).

Optional TSN modules (shown as “add-ons”)

Qci (per-stream policing) · Qav (credit-based shaping) · Qcc (configuration) · CB (FRER)

Navigation note: these appear as extensions around the core pipeline; this page focuses on AS/Qbv/Qbu and switch-side verification hooks.

Diagram: Determinism pipeline (core chain + verification)

TSN Determinism Pipeline: a five-stage pipeline shows how time sync (802.1AS) enables windows (802.1Qbv), preemption (802.1Qbu) reduces blocking, shaping/policing controls bursts, and measurement verifies bounds. Switch-side hooks: HW timestamps · GCL/gates · express/preemptable queues · shapers.

Reading guide: if time sync is unstable, window scheduling cannot be trusted. If preemption is misconfigured, blocking grows and guard bands expand. If shaping/policing is missing, best-effort bursts can violate deterministic bounds. Verification must monitor both time error and out-of-window behavior.

Switch architecture (TSN-capable): data path, timestamp points, and queues

Determinism depends on three concrete implementation choices: where timestamps are taken, where gating/shaping is applied, and how queues isolate traffic classes. The same feature name can behave differently across implementations unless these points are explicit.

Key data path (one frame’s journey)

  1. Ingress: frame enters the port; ingress timestamp may be taken near the MAC boundary. Impact: where “time” starts for residence-time accounting.
  2. Classifier: PCP (and internal policy) selects a traffic class and queue. Impact: the only way a scheduled stream reaches the gated queue.
  3. Queues: per-port priority queues (and optional per-stream resources) buffer contention. Impact: queueing variability becomes jitter unless bounded by gates/windows.
  4. Shaper / Gate: gating (Qbv) controls when a queue may transmit; shaping/policing controls burst behavior. Impact: converts “average QoS” into enforceable time windows and bounds.
  5. Egress: frame exits the port; the egress timestamp may be taken near the MAC/PCS boundary. Impact: where “time” ends for residence time and path delay.

Verification hint: deterministic behavior should be evaluated using worst-case latency/jitter under defined load, not only an average throughput or a “looks good” idle measurement.
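As a minimal sketch of that hint, residence time can be computed per frame from the two timestamp taps and then summarized by its worst case rather than its mean. The function and field names below are illustrative, and both taps are assumed to share one time base:

```python
def residence_stats_ns(ts_in_ns, ts_out_ns):
    """Per-frame residence time between the ingress and egress taps.
    Assumes both taps share one time base and the lists are frame-aligned."""
    res = [out - inn for inn, out in zip(ts_in_ns, ts_out_ns)]
    # Report worst-case evidence, not only the mean: max and peak-to-peak jitter.
    return {"max": max(res), "min": min(res), "jitter_pp": max(res) - min(res)}

# Hypothetical capture of three frames (timestamps in ns).
stats = residence_stats_ns([0, 100, 200], [1500, 1700, 1650])
```

The peak-to-peak value ("jitter_pp") is what a window-to-window jitter budget constrains; the max value feeds the per-hop residence bound.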

Store-and-forward vs cut-through (only the actionable conclusions)

Store-and-forward (SAF)

  • Residence time often scales with frame size and buffering policy.
  • Bounds can be easier to reason about when queues/gates are explicit.
  • Practical check: sweep frame sizes (64B→MTU) and confirm the tail latency remains within the budget X.
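That sweep check reduces to a one-line acceptance rule; the sample latencies and the budget below are illustrative placeholders, not measured values:

```python
def saf_sweep_ok(worst_latency_by_size_ns, budget_ns):
    """Frame-size sweep acceptance: the worst-case (tail) latency observed
    at every swept size must stay within the budget X (here budget_ns)."""
    return all(lat <= budget_ns for lat in worst_latency_by_size_ns.values())

# Hypothetical sweep results (ns), 64B → MTU; the budget is a placeholder.
ok = saf_sweep_ok({64: 9000, 512: 10000, 1518: 11800}, budget_ns=12000)
```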

Cut-through (CT)

  • Lower average forwarding delay, but bounds depend heavily on contention and policy.
  • Misclassification or missing gating can re-introduce blocking and jitter under load.
  • Practical check: step best-effort load (0%→high %) and verify scheduled-class jitter does not rise with queue depth.

Scope guard: PHY-level jitter, eye closure, BER, and link retraining are not expanded here; treat them as separate root-cause classes.

Typical pitfalls (why “TSN enabled” still behaves like best-effort)

  • Wrong PCP→queue map: critical traffic lands in a non-gated queue, so Qbv has no practical effect.
  • Gate applied to the wrong queue: schedule exists, but the targeted class is not the one carrying the stream.
  • Timestamp point inconsistency: different ports or paths use different tap points (pre/post queue), producing offset “jumps”.
  • Shaper vs gate order confusion: windows open, but shaping limits or burst behavior still violates the intended bounds.
  • Measurement artifacts: SPAN/mirror ports and capture devices add their own delay/jitter and can mask real residence time.
Rules: verify queue mapping first · verify timestamp tap points · verify out-of-window events

Counters & logs to record (for verification and production correlation)

Time & sync

time error, role changes, sync loss events, timestamp tap mode/version tag.

Gates & schedule

gate open/close events (or cycle counters), out-of-window frames, guard band configuration hash.

Queues & drops

per-queue depth watermark, drops, starvation indicators, scheduled-class occupancy under load.

Preemption

express blocked count, fragment counters, preemption errors, interoperability flags.

Port health (triage only)

CRC/alignment errors, link flaps, retrain events (treat as separate root-cause class).

Pass/fail framing example (placeholders): scheduled-class out-of-window frames = 0 over Z hours, and worst-case residence time jitter ≤ X.

Diagram: TSN-capable port data path (timestamp taps, queues, gating)

TSN Port Data Path: a block diagram shows ingress, a classifier (PCP → queue), per-port priority queues (Q7..Q0), a shaper and gate (Qbv), and egress. Timestamp taps at ingress (TS_in) and egress (TS_out) define the residence time across the pipeline.

Interpretation: deterministic bounds depend on whether timestamps are taken at consistent tap points and whether the scheduled traffic class actually traverses the gated queue. Treat “PCP→queue mapping” and “timestamp tap mode” as first-order configuration artifacts that must be logged.

802.1AS (gPTP) in switches: time domains, BMCA, and residence time

In TSN networks, gPTP is not a “nice-to-have”; it is the reference clock that makes time-aware scheduling meaningful. Switch/bridge behavior matters because residence time and timestamp consistency directly shape time error and schedule alignment across ports.

Time domains and bridge roles (switch-side view)

  • Single domain: one gPTP time base aligns all TAS windows. This is the simplest path to deterministic scheduling.
  • Multiple domains: different time bases can coexist; schedules cannot be assumed aligned across domains without explicit boundary strategy. Failure signature: devices show “locked”, but cross-domain alignment drifts or is offset.
  • Bridge role impact: whether the bridge behaves like a time boundary or passes timing transparently affects how delay variation propagates. This page focuses on what must be observable and verifiable in the switch.

Residence time (why switches influence time error)

  • Residence time is the forwarding time inside the bridge (ingress → egress). It includes queueing and any gating/shaping delays.
  • Delay variation inside the bridge becomes time-error variation unless it is measured and accounted consistently.
  • Operational takeaway: schedule alignment degrades when time error spikes correlate with congestion, queue depth, or role changes.

Minimum evidence to collect: time error trend + role-change events + a congestion proxy (queue watermark or out-of-window events).

BMCA (Best Master selection): why role churn breaks determinism

  • gPTP continuously selects the best time source; role changes can occur due to link events, quality changes, or configuration.
  • Role churn (frequent master/role changes) often shows up as time error steps and schedule phase disturbance.
  • Practical rule: do not diagnose Qbv/Qbu behavior until gPTP role stability is confirmed (stable GM identity and bounded time error).

Hardware vs software timestamps (switch-side decision criteria)

Hardware timestamps are required when

  • alignment targets are tight (time error budget X ns–sub-µs),
  • traffic load is variable and queueing is non-trivial,
  • time error must remain stable during load steps and role transitions.

Software timestamps are commonly limited by

  • interrupt/CPU scheduling variability,
  • queue-dependent timing that is not captured at the tap point,
  • non-deterministic latency between capture and timestamping.

Practical verification: apply a best-effort load step and check whether time error spikes correlate with queue depth or CPU activity. Strong correlation indicates insufficient timestamp determinism for tight alignment.
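A rough way to automate that correlation check, assuming synchronized sample series of time error and queue depth are available from the run. The 0.8 threshold is an illustrative placeholder, not a normative limit:

```python
def pearson(xs, ys):
    # Plain Pearson correlation coefficient (stdlib-only).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def timestamp_determinism_suspect(time_error_ns, queue_depth, threshold=0.8):
    """Flag runs where time error tracks queue depth during a load step:
    strong correlation hints the timestamp path is not load-independent."""
    return pearson(time_error_ns, queue_depth) > threshold
```

If the flag is raised repeatedly across load steps, treat software timestamping (or an inconsistent tap point) as the leading suspect before touching the schedule.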

Diagram: gPTP topology and the residence-time injection point

gPTP Topology (Switch View): a grandmaster (GM) connects to a TSN switch/bridge that forwards to end stations (ES). The switch marks ingress (TS in) and egress (TS out) timestamp taps and accounts for residence time between them, across its queues and gates. Note: role stability + bounded time error must be confirmed before diagnosing TAS/preemption behavior.

Interpretation: gPTP quality is shaped by how the bridge measures and accounts for forwarding delay. If time error is unstable or roles churn, TAS windows cannot stay aligned and deterministic bounds degrade regardless of the schedule configuration.

802.1Qbv TAS: GCL, gate states, and guard band design

Time-Aware Shaper (TAS) turns QoS into a verifiable time model: a cycle repeats, gates open/close per queue, and a guard band protects window edges from non-interruptible transmissions. The outcome must be validated using worst-case bounds and out-of-window evidence.

Minimal TAS model (what must be explicit)

  • Cycle time: scheduling repeats every T_cycle (placeholder).
  • Windows: each queue has one or more open intervals [t_start, t_end).
  • Queue mapping: traffic classification (PCP/policy) must land the stream into the intended gated queue.

Configuration asset (recommended): export and version-control GCL + PCP→queue map + guard band parameters. Determinism cannot be audited if these are implicit.

Guard band (why it exists and how to size it)

  • Guard band prevents a window edge from being “polluted” by a non-interruptible transmission already on the wire.
  • Size is dominated by maximum blocking time, which depends on max frame size, line rate, and whether preemption is enabled.

No preemption (conservative skeleton)

T_guard ≈ (L_max · 8) / R_line + T_margin

L_max: maximum non-preemptable frame length (bytes, placeholder). R_line: line rate (bps, placeholder). T_margin: implementation/interop/time-error margin (placeholder).

With preemption (effective skeleton)

T_guard ≈ (L_frag_max · 8) / R_line + T_margin

L_frag_max: maximum blocking fragment length (bytes, placeholder). Guard band shrinks only if interop is clean and classification is correct.

Engineering trade-off: too small → out-of-window leakage; too large → wasted windows and increased waiting for other classes. Determinism prioritizes provable bounds over average utilization.
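Both guard-band skeletons above reduce to one helper that differs only in the blocking length fed in. The link rate, frame/fragment lengths, and margin below are illustrative placeholders, not recommended values:

```python
def guard_band_ns(l_block_bytes, line_rate_bps, margin_ns):
    """Guard band skeleton: serialization time of the worst-case blocking
    transmission plus an implementation/interop/time-error margin."""
    return (l_block_bytes * 8 / line_rate_bps) * 1e9 + margin_ns

# Illustrative placeholders: 1 GbE link, 500 ns margin.
no_preemption = guard_band_ns(1522, 1e9, 500.0)    # L_max: full max frame blocks
with_preemption = guard_band_ns(160, 1e9, 500.0)   # L_frag_max: placeholder fragment
```

The comparison makes the trade-off concrete: preemption shrinks the serialization term from a full frame to a fragment, but the margin term (interop, time error) stays.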

Worst-case latency/jitter thinking (budget skeleton, not a full derivation)

End-to-end bound (placeholder): D_worst ≈ D_path + Σ D_switch(i)

Single-switch bound (placeholder): D_switch ≈ T_res_base + T_gate_wait + T_guard + T_queue_bound + T_shaper

  • T_gate_wait is structurally bounded by the schedule (worst case: wait until the next open window).
  • T_queue_bound depends on how much non-critical traffic can accumulate ahead of the stream’s class. A queue model that cannot isolate the class cannot guarantee a tight bound.
  • T_guard is the edge protection overhead required to make window boundaries true in practice.
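The two budget skeletons above can be written down directly; all terms are in nanoseconds and the example numbers are placeholders, not derived bounds:

```python
def switch_bound_ns(t_res_base, t_gate_wait, t_guard, t_queue_bound, t_shaper):
    # Single-switch worst-case skeleton: sum of the bounded delay terms (ns).
    return t_res_base + t_gate_wait + t_guard + t_queue_bound + t_shaper

def e2e_bound_ns(d_path_ns, per_switch_bounds_ns):
    # End-to-end skeleton: path delay plus every hop's worst-case bound (ns).
    return d_path_ns + sum(per_switch_bounds_ns)

# Placeholder numbers for a three-hop path.
hop = switch_bound_ns(2000, 50000, 12700, 8000, 1000)
e2e = e2e_bound_ns(5000, [hop, hop, hop])
```

Keeping the terms separate (rather than one opaque number per hop) is what makes violations localizable against per-hop residence evidence.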

How to verify windows (evidence-based checklist)

  1. Configuration evidence: export GCL + PCP→queue map + guard band parameters (hash/version tag).
  2. Behavior evidence: out-of-window frames for the scheduled class = 0 over Z hours (placeholder).
  3. Worst-case evidence: apply best-effort load steps and confirm worst-case latency ≤ X (placeholder).
  4. Stability evidence: window-to-window arrival variation/jitter ≤ X (placeholder) and no role churn events during the run.

Common measurement trap: mirror/SPAN and capture devices can add delay or re-ordering artifacts; treat capture point latency as a separate uncertainty term.

Diagram: TAS cycle, gate states (Q7/Q5/Q0), and guard band

TAS Gate Waveforms: a timeline shows one cycle (T_cycle). Three rows display gate open/closed intervals for Q7, Q5, and Q0. A guard band region is highlighted near a window boundary.

Reading guide: windows are per-queue and repeat every cycle. Guard band is a deliberate “edge protection” region sized by maximum blocking time. Verification must include out-of-window evidence and worst-case bounds under load steps.

802.1Qbu Frame Preemption: what it fixes, what it breaks

Preemption reduces the “non-interruptible blocking length” on a link by allowing express traffic to interrupt preemptable traffic. It can shrink guard bands and improve schedule efficiency, but it introduces a new failure class: interop and classification mistakes that show up as drops and error counters.

What it fixes (measurable benefits)

  • Lower blocking latency: express frames do not wait for a full best-effort large frame to complete.
  • Smaller guard band: the worst-case “edge pollution” is bounded by a fragment length rather than a full frame length.
  • Higher window efficiency: less wasted time at window edges enables tighter schedules or more capacity for critical flows.

Link to TAS: preemption reduces the blocking term in the guard band skeleton by replacing L_max with L_frag_max (placeholders).

Express vs preemptable (classification principles)

Express

time-critical scheduled classes that must hit windows with bounded latency/jitter.

Preemptable

background/best-effort bulk traffic that can tolerate being fragmented and resumed.

Rule of thumb: preemption is not “enable everywhere”. It is a targeted tool to keep large best-effort frames from violating deterministic windows.

What it breaks (new failure class to expect)

  • Interoperability mismatch: link partner does not support preemption or uses incompatible settings. Evidence: preemption error/discard counters rise.
  • Misclassification: critical traffic is marked preemptable or mapped incorrectly. Evidence: scheduled-class jitter or missed windows increase after enabling preemption.
  • Fragment overhead surprises: small fragments add overhead and can change throughput/latency budgets. Evidence: increased occupancy/watermarks despite similar offered load.
  • Measurement confusion: capture points may not reconstruct fragments reliably. Evidence: capture shows “drops” without matching switch discard counters.

Verification & trade-off (how to adopt safely)

  1. Start from TAS-only: compute a conservative guard band and prove out-of-window = 0 under load.
  2. Enable preemption only if guard band cost is excessive (placeholder threshold X% of window).
  3. Validate interop: preemption errors/discards stay at 0 (or within threshold X) during long runs.
  4. Re-evaluate bounds: worst-case latency ≤ X and jitter ≤ X (placeholders) with best-effort load steps.

If preemption counters indicate mismatch, revert to TAS-only and enlarge guard bands. Determinism favors provable bounds over peak efficiency.

Diagram: best-effort fragmentation and express insertion (preemption)

Frame Preemption (Fragment + Express): a large best-effort frame is split into fragments. An express frame interrupts between fragments, and the best-effort transmission resumes afterward.

Reading guide: preemption reduces the maximum blocking time seen by express traffic by splitting a large best-effort frame into fragments. The benefit appears only when classification and interop are correct; otherwise, error/discard counters become the dominant evidence.

QoS & traffic classes for determinism: PCP/DSCP mapping, queueing, isolation

Determinism starts at classification. A TSN schedule can only protect traffic that enters the intended queue and follows the intended gate/shaper path. Use PCP as the primary control plane, treat DSCP as an optional mapping input, and make the mapping auditable.

Classification inputs (keep the decision deterministic)

  • PCP (primary): the L2 priority field is the most direct and portable way to drive TSN queue/gate behavior.
  • DSCP (mapping only): if traffic originates in an IP domain, DSCP can map into PCP or an internal traffic class, but the final decision must be consistent at the switch ingress.
  • Non-negotiable rule: for a given stream class, the same ingress conditions must yield the same queue assignment on every relevant port.

Recommended practice: export and version-control PCP/DSCP→internal class→queue mapping as a configuration asset (hash + change log).
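One way to produce the recommended hash/version tag, assuming the mapping tables are available as plain dictionaries; the field names and mapping values are illustrative:

```python
import hashlib
import json

def mapping_hash(pcp_to_queue, dscp_to_pcp=None):
    """Stable version tag over the classification tables: serialize with
    sorted keys so the same mapping always yields the same hash."""
    blob = json.dumps({"pcp_to_queue": pcp_to_queue,
                       "dscp_to_pcp": dscp_to_pcp or {}}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

# Hypothetical mapping: PCP 7/5 land in gated queues, PCP 0 is best-effort.
tag = mapping_hash({7: 7, 5: 5, 0: 0})
```

Log the tag per port/VLAN alongside each run so that any later drift in the mapping is visible as a tag change in the evidence bundle.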

Queue mapping template (a practical four-class model)

Scheduled (TAS-protected)

Must land in a gated queue controlled by Qbv windows. Typical PCP set: X (placeholder).

Express (low-blocking priority)

Used to reduce blocking and protect schedule edges. Avoid overuse; an oversized express class can starve others and break fairness assumptions.

Best-effort (bulk / background)

Default low queues. Control burst and rate so best-effort cannot create uncontrolled peaks that leak into deterministic windows.

Management (control / ops)

Must remain reachable but should be rate-limited and isolated from scheduled timing. Keep it auditable and predictable under stress.

Key concept: TSN does not understand “applications”. It enforces behavior through queue assignment and gate/shaper behavior. If a stream lands in the wrong queue, schedules and shaping cannot rescue determinism.

Isolation model (what can be isolated, what cannot)

  • Logical isolation: separate queues per class reduce direct contention inside the egress scheduler.
  • Time isolation: Qbv windows create deterministic send opportunities for gated queues.
  • Rate/burst isolation: shaping and policing cap best-effort peaks that would otherwise inflate shared egress occupancy.
  • Shared constraint: egress is still one physical transmitter per port; isolation must therefore be proven using worst-case evidence, not averages.

Congestion & HOL blocking (symptoms and switch-side mitigations)

Typical symptoms

  • Scheduled windows are configured, but arrival jitter still grows under best-effort load.
  • Tail latency rises sharply during bursty background traffic.
  • Queue watermarks spike even when average utilization is moderate.

Mitigations

  • Move scheduled streams into dedicated gated queues; avoid shared “mixed” queues.
  • Cap best-effort burst/rate so occupancy peaks cannot smear schedule edges.
  • Use preemption only when interoperability is validated and counters remain clean.

Evidence-first approach: validate mapping to the intended queue, then validate out-of-window behavior, then validate worst-case bounds under best-effort load steps.

Fields to log (for correlation and production acceptance)

  • PCP/DSCP→internal class mapping version (hash) per port / VLAN (placeholders).
  • Queue assignment counters per class and per port.
  • Queue occupancy watermarks and drop counters (by queue).
  • Out-of-window evidence for scheduled classes (threshold X placeholder).
  • Preemption-related counters (if enabled): errors/discards/blocked (placeholders).

Diagram: PCP classifier → queue map (Q7..Q0) → gate/shaper path

PCP Classification and Queue Mapping: ingress frames are classified by PCP (DSCP→PCP mapping optional), mapped into per-port queues Q7..Q0, then pass through gate and shaper blocks to egress. Determinism starts at classification.

Reading guide: classification must resolve to a stable queue assignment per ingress port. Gates and shaping enforce behavior only after the frame is in the correct queue.

Shaping & policing that actually matter to TSN (CBS / rate limiting / per-stream policing)

Shaping in TSN is not for “nice averages”. It is a boundary-protection tool: cap burst peaks so best-effort cannot inflate egress occupancy and smear deterministic windows. Keep the toolbox small and evidence-driven.

Why shaping matters to determinism (cause → effect chain)

  • Without shaping, best-effort bursts create high instantaneous egress occupancy.
  • High occupancy increases the probability of schedule-edge pollution (guard band pressure) and raises tail latency for critical classes.
  • With shaping, burst peaks are flattened, making worst-case bounds and window evidence easier to prove.

Acceptance should be based on worst-case evidence: tail latency, queue watermark peaks, and out-of-window events (placeholders), not only averages.

Toolbox (keep only what impacts determinism directly)

CBS (Credit-Based Shaper)

Controls burstiness for specific classes (commonly AVB-like streams) to stabilize queue occupancy and reduce tail spikes.

Evidence: reduced queue watermark peaks for the shaped class and fewer schedule-edge anomalies under best-effort load steps.

Rate limiting

Caps average throughput for best-effort/management so background traffic cannot dominate a port under stress.

Evidence: stable critical-class worst-case latency when best-effort offered load is increased.

Burst control

Limits instantaneous peak burst length. This is often the most direct control for protecting schedule edges.

Evidence: reduced peak occupancy and reduced tail latency spread (P99.9 / max placeholders).

Optional: per-stream filtering/policing (insurance against “bad streams”)

  • Purpose: prevent misconfigured or abnormal streams from injecting bursts that can destroy deterministic assumptions.
  • Adoption rule: apply only where a single stream can measurably threaten determinism; avoid blanket enablement that increases complexity.
  • Evidence: policer hit/drop counters (placeholders) correlate with improved worst-case evidence for critical classes.

Verification (before/after shaping; worst-case focused)

  1. Create best-effort stress: apply controlled burst parameters (burst size X, duty Y, placeholders).
  2. Record baseline evidence: critical-class tail latency (P99.9/max placeholders), queue watermarks, and out-of-window events.
  3. Enable shaping (CBS or rate/burst) and repeat the stress profile.
  4. Accept only if worst-case evidence improves: lower peaks, fewer window-edge anomalies, and stable deterministic bounds under stress.

Do not rely on averages. A stable average with an unstable tail is still a determinism failure.
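Steps 2–4 above can be sketched as a worst-case-focused acceptance check. The nearest-rank quantile is a deliberate simplification, and the quantile level is a placeholder:

```python
def p_quantile(samples, q):
    # Nearest-rank quantile (0 < q <= 1); adequate for an acceptance sketch.
    s = sorted(samples)
    return s[max(0, min(len(s) - 1, int(round(q * len(s))) - 1))]

def shaping_improves_tail(before_ns, after_ns, q=0.999):
    """Accept shaping only if worst-case evidence improves: a lower max AND
    a tail quantile that is no worse than the baseline."""
    return (max(after_ns) < max(before_ns)
            and p_quantile(after_ns, q) <= p_quantile(before_ns, q))
```

Comparing only the two means would pass a run whose tail got worse; this check fails such a run by construction.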

Diagram: shaping reduces burst peaks (queue occupancy) and tail spread (latency)

Before/After Shaping (Occupancy + Tail): the left panel (before shaping) shows best-effort bursts creating high queue-occupancy peaks and a long latency tail on the shared egress port. The right panel (after shaping) shows reduced peaks and a shortened tail for the same critical/best-effort mix.

Reading guide: shaping targets peak occupancy and tail risk. Acceptance should compare before/after evidence under best-effort burst stress, not only average throughput.

Determinism metrics & budgets: what to measure, where to log, pass criteria templates

“Deterministic” must be stated in acceptance language: which metrics, at which measurement hooks, over which time window, with explicit pass/fail thresholds. Evidence should include worst-case behavior (peaks, tail, and out-of-window events), not only averages.

Metric set (define determinism as a bundle, not a single number)

  • E2E latency: report max / P99.9 (placeholders) over a defined run window.
  • Jitter: specify the type (arrival jitter / residence-time jitter / time-error jitter) and the statistics (peak/P99.9 placeholders).
  • Time error: gPTP domain time quality evidence (peak, rate of steps, placeholders).
  • Residence time: per-hop time-in-switch (mean + worst-case + jitter), used for localization and hop budgeting.
  • Loss: drops/errors broken down by port and (ideally) by queue/class.
  • Out-of-window: scheduled traffic observed outside its intended transmission windows (hard determinism evidence).

Budget model (engineering skeleton; fill with project-specific placeholders)

E2E latency budget (skeleton):

E2E_max ≈ Σ D_res_max(hop) + Σ D_wait_max(window) + D_endstation (placeholder; out of scope for switch guarantees)

Guidance: log per-hop residence evidence to localize violations; use out-of-window evidence to validate schedule integrity.

Jitter budget (skeleton): separate time-quality variation (time error) from load/schedule variation (queue/wait). Do not mix measurement points.

Where to measure (hooks that should exist in a TSN-capable switch)

  • Port statistics: per-port rx/tx counters, errors, drops (prefer per-queue breakdown if supported).
  • PTP/gPTP servo health: servo state, offset, rate ratio, path delay (field names are platform-specific placeholders).
  • Gate events: gate open/close events, schedule id/hash, out-of-window counters for scheduled classes.
  • Shaper counters: shaping active time, shaped drops, credit/level indicators (placeholders).
  • Preemption events: fragments, errors/discards, blocked indicators (placeholders) when Qbu/3br is enabled.
  • Queue depth: average and peak watermarks, drop-threshold hits (per queue).

Logging pack (minimum fields for correlation)

Config version evidence

  • GCL schedule id/hash (placeholder)
  • PCP→queue map id/hash (placeholder)
  • Shaper/policer configuration id/hash (placeholder)

Operating context

  • Temperature / voltage / fan / load (placeholders)
  • Best-effort offered load profile (placeholder)
  • Run duration Z and sampling cadence (placeholders)

Pass criteria templates (copy-ready; placeholders X/Y/Z)

Time quality (gPTP)

  • time_error_peak < X ns over Z hours
  • time_error_P99.9 < X2 ns over Z hours (optional)

Per-hop determinism (localization-ready)

  • residence_time_jitter_peak < Y ns per hop over Z hours
  • residence_time_max < Y2 ns per hop (optional)

Schedule integrity (hard evidence)

  • out_of_window_frames = 0 for scheduled classes per Z hours
  • drops_scheduled = 0 per Z hours (recommended)

Each criterion should declare: statistic type (max/peak/P99.9), measurement hook location, sampling cadence, and run duration.
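The templates above can be bundled into one evaluator; every key below is a placeholder to be mapped onto platform-specific counters and fields, and the limits carry the X/Y values declared per project:

```python
def evaluate_run(evidence, limits):
    """Pass/fail over the metric bundle; every check must hold at once.
    Keys are placeholders to map onto platform-specific counters."""
    checks = {
        "time_error_peak_ok":
            evidence["time_error_peak_ns"] < limits["time_error_peak_ns"],
        "residence_jitter_ok":
            evidence["residence_jitter_peak_ns"] < limits["residence_jitter_peak_ns"],
        "out_of_window_ok": evidence["out_of_window_frames"] == 0,
        "scheduled_drops_ok": evidence["drops_scheduled"] == 0,
    }
    return all(checks.values()), checks
```

Returning the per-check dictionary (not just the verdict) keeps the result localization-ready: a failed run immediately names which criterion broke.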

Measurement traps (avoid “passing” with fake evidence)

  • SPAN/mirroring can distort timing evidence (buffering, re-ordering). Prefer switch-side counters for primary acceptance evidence.
  • Mixed timestamp points (ingress vs egress) create artificial jitter. Fix the tap point definition per metric.
  • Averages hide tail risk. Always collect watermark peaks, max/peak latency, and out-of-window evidence.
  • Time bases must be declared (domain, reference). Cross-domain comparisons without normalization are invalid.

Diagram: measurement hooks map (timestamps, gate events, queue depth, counters)

Measurement Hooks Map: shows where to measure and log determinism evidence in a TSN switch: timestamp taps (TS), queue depth/watermarks (Q), gate events (GE), shaper counters (SC), preemption events (PE), and port statistics (PS), plus a logging pack for config hashes/ids and environment context (temperature/voltage/fan/load).

Reading guide: acceptance metrics should map to a concrete hook (TS/GE/Q/SC/PE/port stats) and a log pack that includes config hashes and operating context.

Design hooks: clocking, TC/temperature, EMI, and layout-related failure modes (switch-side)

Field failures often present as time error spikes, residence jitter growth, or sudden out-of-window events. Switch-side diagnosis should follow a fixed ladder: confirm timestamp point consistency, then confirm clock/time health, then correlate with load/queue behavior, and only then suspect EMI/ground-return coupling paths.

Clocking hooks (switch-side only; verification actions)

  • Track clock/reference selection events and any switchover flags (placeholders).
  • Correlate time error spikes with servo state transitions and offset steps (placeholders).
  • Compare before/after load steps: stable clocks should not show event-like time error “jumps”.

Practical hint: clock-related issues frequently show as event-like steps/spikes in time error, not smooth drift.

Temperature & supply-noise correlation (what to log)

  • Temperature: inlet/outlet or board zones + silicon temp if available (placeholders).
  • Voltage: key rails that can affect time quality and I/O activity (placeholders).
  • Fan: PWM/RPM changes (placeholders) to detect airflow-driven gradients.
  • Load: per-port offered load + queue watermark peaks (placeholders).

If environmental fields are missing, station-to-station correlation becomes guesswork and root cause cannot be proven.

Load & queue interactions (time-looking failures that are actually queue failures)

Symptoms

  • Time error appears “noisy” exactly when queue watermarks spike.
  • Residence time jitter grows only under best-effort load steps.
  • Out-of-window events appear after mapping or shaping changes.

Switch-side checks

  • Confirm queue assignment for critical classes (PCP→queue evidence).
  • Confirm gate/shaper runtime events match the expected schedule (hash + counters).
  • Confirm peaks (watermarks) and tail evidence improve after shaping (before/after comparison).

EMI / ground-return coupling (switch-side evidence chain only)

  1. First, rule out tap-point inconsistency (timestamp points must be defined and stable).
  2. Then, compare event timing: time error spikes vs preemption/port error counters (placeholders).
  3. If correlation is port/cable/zone-specific, tag as suspected coupling path and preserve the evidence bundle.

Boundary note: deeper EMI mechanism and layout remediation belong to EMC/layout pages; here the focus is switch-side observability and decision flow.

Diagram: root-cause ladder for time error spikes (switch-side)

Root-cause ladder: starting from the event (time error spikes, residence jitter growth, or out-of-window increase), climb in order with evidence at each step: 1) timestamp tap consistency (tap definition), 2) switch-side clock stability (servo states/offset steps), 3) load and queue correlation (queue watermarks), 4) gate/shaper correctness (schedule hash and gate events), 5) EMI coupling path suspicion (port/zone statistics). Every conclusion carries an evidence bundle: config hash, counters, environment fields.

Reading guide: climb the ladder in order. Fix tap definition and evidence integrity first; only suspect EMI coupling after clock and load correlation are falsified.


Engineering checklist (design → bring-up → production)

A TSN switch/bridge becomes “deterministic” only when configuration artifacts (queues, schedules, preemption rules) are paired with measurable pass criteria and stable logging. The checklist below is organized as reusable project assets.

Design: configuration assets that must exist before bring-up

1) Queue / traffic-class map (must be explicit and versioned)

  • Scheduled (TAS): one or more queues reserved for gate-controlled windows (example: Q7).
  • Express: latency-critical but not fully time-windowed (example: Q5) — often paired with preemption.
  • Best-effort: default traffic (example: Q0–Q2) — shaped/limited so it cannot destroy deterministic budgets.
  • Management: control/telemetry (example: Q3–Q4) — keep it observable and bounded.

2) TAS schedule artifacts (GCL) + guard-band assumptions (document the math inputs)

  • Cycle time: Tcycle = X µs (placeholder) + rationale (control loop / camera frame / audio period).
  • Base time: define epoch alignment source (gPTP time domain) and update behavior on re-sync.
  • GCL length limit: record max entries supported and keep margin for future revisions.
  • Guard band inputs: line rate, max interfering frame size, and whether preemption is enabled.
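The guard-band input math above can be sketched as follows. The 64-byte minimum fragment and the preamble/IFG constants reflect common Ethernet framing assumptions; real designs should add vendor-specified overhead and a safety margin:

```python
def guard_band_ns(line_rate_bps: float, max_frame_bytes: int,
                  preemption: bool, margin_ns: float = 0.0) -> float:
    """Worst-case blocking time the guard band must cover.

    Without preemption: a maximum-length interfering frame (plus preamble/IFG)
    can seize the line just before the gate opens.
    With preemption: only a minimum final fragment (~64 B) plus overhead blocks.
    """
    PREAMBLE_IFG = 8 + 12   # bytes: preamble+SFD and inter-frame gap
    MIN_FRAGMENT = 64       # bytes: minimum non-preemptable remainder
    blocking_bytes = (MIN_FRAGMENT if preemption else max_frame_bytes) + PREAMBLE_IFG
    return blocking_bytes * 8 / line_rate_bps * 1e9 + margin_ns

# 1 Gb/s, 1522 B interferer: ~12.3 µs without preemption vs ~0.67 µs with it
no_pre   = guard_band_ns(1e9, 1522, preemption=False)
with_pre = guard_band_ns(1e9, 1522, preemption=True)
```

This is exactly why the "preemption enabled" input changes the schedule math, not just the error counters.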

3) Preemption interoperability checklist (Qbu / 802.3br) (treat as a link contract)

  • Explicitly mark Express vs Preemptable classes (avoid “everything preemptable”).
  • Record min fragment / overhead assumptions and expected counter behavior under stress.
  • Define a “fallback mode” for peers that do not support preemption (larger guard band, stricter BE shaping).

4) Observability field list (the minimum logs to make determinism debuggable)

  • gPTP: GM identity, BMCA role per port, asCapable, link delay estimate, time error/offset, servo state.
  • TAS: gate state transitions, schedule change events, out-of-window drop/mark counters (if available).
  • Queues: per-queue occupancy peaks, tail-drop counters, HOL indicators (vendor-specific), egress utilization.
  • Preemption: fragment/hold events, verify mismatch/denied events, CRC/error counters correlated to preemption enablement.
  • Versioning: firmware build ID, config CRC/hash, schedule ID, topology hash (port↔peer mapping).
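A hedged sketch of the config-hash idea from the versioning bullet: hashing a canonical (sorted-key) serialization of the schedule, mapping, and preemption artifacts yields an ID that is stable across stations regardless of field order. Names and structure here are illustrative:

```python
import hashlib
import json

def config_hash(gcl: list, pcp_map: dict, preemption: dict) -> str:
    """Stable hash over the artifacts that define deterministic behavior.
    Canonical JSON (sorted keys) makes the hash independent of dict order."""
    blob = json.dumps({"gcl": gcl, "pcp_map": pcp_map, "preemption": preemption},
                      sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

h1 = config_hash([{"gate": 0x80, "ns": 50000}], {7: 7, 0: 0}, {"express": [7]})
h2 = config_hash([{"gate": 0x80, "ns": 50000}], {0: 0, 7: 7}, {"express": [7]})
assert h1 == h2  # key order does not change the hash
```

Logging this hash next to every measurement is what later lets "same config" claims be proven rather than asserted.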

Bring-up: minimal closed-loop verification (AS → Qbv → Qbu → stress)

Step 0 · Baseline sanity (no TAS, no preemption)

  • Link stable across temperature/voltage; verify packet loss = 0 under BE load.
  • Confirm consistent timestamp point (ingress vs egress) used by tools and logs.

Step 1 · 802.1AS (gPTP) only (make time boring before scheduling)

  • BMCA role stable; asCapable = true on intended TSN links; link delay estimates converge.
  • Pass criteria template: time error < X ns (steady), spike < Y ns (peak) over Z minutes.
  • If spikes exist: correlate to port role changes, link flaps, or CPU/interrupt bursts (do not touch TAS yet).
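The Step-1 pass template could be checked with a helper like this (taking "steady" as the mean of absolute error is an assumption; a project may prefer a percentile):

```python
def gptp_pass(time_error_ns: list[float], steady_limit_ns: float,
              peak_limit_ns: float) -> bool:
    """Step-1 template: steady-state error within X ns, peak within Y ns,
    evaluated over the logged window (Z minutes of samples)."""
    abs_err = [abs(e) for e in time_error_ns]
    steady = sum(abs_err) / len(abs_err)   # mean |error| as the steady metric
    peak = max(abs_err)
    return steady < steady_limit_ns and peak < peak_limit_ns

assert gptp_pass([5, -8, 12, -6, 9], steady_limit_ns=20, peak_limit_ns=50)
assert not gptp_pass([5, -8, 120, -6, 9], steady_limit_ns=20, peak_limit_ns=50)
```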

Step 2 · Enable Qbv on a single egress port + single scheduled queue (prove the window)

  • Start with a wide window and conservative guard band (preemption off).
  • Measure out-of-window frames = 0 per Z hours (placeholder).
  • Pass criteria template: worst-case egress latency < X µs, egress jitter < Y ns under a defined BE stress profile.
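A sketch of the out-of-window count, assuming egress timestamps share the gPTP-aligned epoch and a single scheduled window per cycle:

```python
def out_of_window(egress_ts_ns: list[int], cycle_ns: int,
                  win_start_ns: int, win_end_ns: int) -> int:
    """Count frames whose egress timestamp falls outside the scheduled window.
    Timestamps are folded into the cycle relative to base time 0."""
    return sum(1 for t in egress_ts_ns
               if not (win_start_ns <= t % cycle_ns < win_end_ns))

# 100 µs cycle, window [10 µs, 30 µs): third frame lands at 45 µs -> 1 violation
assert out_of_window([110_000, 215_000, 345_000], 100_000, 10_000, 30_000) == 1
```

In practice the count comes from switch counters where available; the fold-into-cycle logic is still useful for validating mirrored captures against the schedule.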

Step 3 · Enable Qbu / 802.3br (reduce blocking, then shrink guard band)

  • Enable preemption on the intended BE class first; keep critical traffic as Express.
  • Verify expected counter movement (fragment/hold) under BE bursts; verify no new CRC/error bursts.
  • Guard band tightening rule: reduce only after the link partner’s preemption capability is confirmed and counters remain stable.

Step 4 · Stress & regression (turn determinism into a repeatable test)

  • Repeat across: temperature corners, worst-case BE load, topology changes (one link down), and schedule updates.
  • Regression must compare: time error distribution, residence time jitter (if available), out-of-window = 0, and counter deltas.
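Comparing distributions rather than single numbers can be sketched as a tail-focused regression gate (the nearest-rank percentile and the 10% growth allowance are placeholders):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (simple placeholder implementation)."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(round(p / 100 * (len(s) - 1))))
    return s[idx]

def regression_ok(baseline: list[float], candidate: list[float],
                  max_growth: float = 1.10) -> bool:
    """Candidate run passes only if both the P99 and the peak stay within
    the allowed growth over the baseline run: compare tails, not averages."""
    return (percentile(candidate, 99) <= percentile(baseline, 99) * max_growth
            and max(candidate) <= max(baseline) * max_growth)
```

Averages can improve while the tail regresses; gating on P99 and peak catches exactly the failures that matter for determinism.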

Production: station-to-station correlation and “no-surprises” logging

Correlation rules (avoid “passes on ATE but fails in system”)

  • Define a golden topology + golden traffic profile used in every station (same frame sizes, rates, PCP map).
  • Record: config CRC, firmware ID, schedule ID, link partner ID, and the exact timestamp mode used by the station.
  • Any measurement must be paired with its sampling method (window, averaging, sync state) to prevent false “improvements”.

Production pass templates (placeholders)

  • time error < X ns (steady), peak < Y ns (over Z minutes)
  • out-of-window frames = 0 (per Z hours under BE stress profile “Profile-A”)
  • preemption verify mismatch events = 0 (per Z hours), CRC/error bursts = 0 (correlated)
  • queue tail-drop = 0 for critical queues; BE drops allowed only within defined policy limits
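The templates above can be folded into one evaluation helper; thresholds are the same X/Y placeholders and must come from the project's budgets:

```python
def production_pass(kpis: dict) -> list[str]:
    """Evaluate the production pass templates. Returns the list of failed
    criteria (empty list == pass). Limits are project placeholders."""
    limits = {"time_error_steady_ns": 100, "time_error_peak_ns": 500}  # X, Y
    zero_events = ["out_of_window_frames", "verify_mismatch",
                   "crc_error_bursts", "critical_tail_drops"]
    # Threshold criteria: fail when the KPI is missing or at/over the limit.
    fails = [k for k, lim in limits.items() if kpis.get(k, float("inf")) >= lim]
    # Zero-event criteria: fail when the counter is missing or nonzero.
    fails += [k for k in zero_events if kpis.get(k, 1) != 0]
    return fails

ok = production_pass({"time_error_steady_ns": 40, "time_error_peak_ns": 200,
                      "out_of_window_frames": 0, "verify_mismatch": 0,
                      "crc_error_bursts": 0, "critical_tail_drops": 0})
assert ok == []
```

Treating a missing field as a failure (rather than silently passing) is deliberate: absent evidence is not passing evidence.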
Diagram · Bring-up flow (AS → Qbv → Qbu → stress). Each step has an explicit pass criterion and a required log set.
TSN bring-up flow: make time stable → make windows correct → reduce blocking → prove budgets. Step 1 · 802.1AS: pass when time error < X ns; log BMCA/offset/delay. Step 2 · 802.1Qbv: pass when out-of-window = 0; log gate events/drops. Step 3 · 802.1Qbu: pass when no new error bursts; log fragment/hold counters. Step 4 · Stress: pass when latency/jitter < budget; log queues/drops/environment. Regression rule: any change in schedule, mapping, or preemption must re-run the same stress profile and compare distributions, not single numbers.

Applications & IC selection notes (TSN switch/bridge)

This section avoids “product catalogs” and focuses on selection logic: which deterministic requirement forces which TSN feature, which switch architecture class fits, and what must be verified to avoid schedule/time illusions.

Application buckets (switch/bridge-side requirements only)

Robotics / motion control cells

  • Hard requirement: bounded latency + bounded jitter across a known cycle time.
  • Switch focus: Qbv schedule stability (gate resolution + GCL depth) + solid observability (out-of-window detection).
  • Typical risk: “time is fine” in isolation, but schedule alignment drifts with load or temperature → log time error + gate events together.

Automotive domain / zonal networks (TSN inside the vehicle)

  • Hard requirement: deterministic control traffic coexisting with high-bandwidth flows.
  • Switch focus: Qbu preemption interoperability + safety/security features + stable gPTP behavior under topology changes.
  • Typical risk: preemption introduces counter anomalies or bursty errors on marginal links → correlate fragment events with CRC/error spikes.

Industrial camera / machine vision aggregation

  • Hard requirement: avoid queue bursts that smear deterministic windows.
  • Switch focus: QoS mapping + shaping/policing to contain BE/video bursts from collapsing scheduled traffic.
  • Typical risk: HOL blocking in shared resources → separate critical queues and cap burst sizes with shaping.

Converged AV / control (mixed periodic + best-effort)

  • Hard requirement: consistent delivery within windows while tolerating long BE transfers.
  • Switch focus: stable Qbv windows and predictable BE behavior (shaping + optional per-stream policing).
  • Typical risk: good averages hide rare out-of-window events → acceptance must be “out-of-window = 0 per Z hours”.

Selection dimensions (keep it deterministic, not “feature-rich”)

A) Must-haves that bind determinism

  • 802.1AS time base: stable role behavior + measurable time error.
  • 802.1Qbv TAS: gate resolution, max GCL entries, schedule update semantics.
  • 802.1Qbu / 802.3br: preemption compatibility and counter transparency (to debug “it breaks only under load”).
  • Optional but high-value (depending on risk): per-stream policing/filtering (e.g., Qci) to stop misbehaving flows from destroying windows.

B) Architecture fit (what class of silicon is actually needed)

  • Switch IC + external host: good when schedule is static and host already exists.
  • Switch with integrated CPU: simplifies management + diagnostics + protocol glue (common in industrial TSN gateways).
  • MPU/SoC with integrated TSN switch: chosen when TSN switching and application control must share one chip with tight integration.

C) Verification-driven selection (do not buy features that cannot be proven)

  • Require visibility for: time error/offset, gate events/out-of-window, queue occupancy, and preemption counters.
  • Define acceptance as distributions and “zero-event” conditions (out-of-window = 0), not averages.
  • Confirm schedule update behavior (what happens on gPTP re-sync, link flap, warm reboot).

Concrete part-number examples (reference shortlist, not a buying list)

The items below are common TSN-capable switch/bridge silicon families. Exact TSN standard coverage, port configurations, security features, and lifecycle vary by variant and ordering suffix; always confirm datasheets, package, temperature grade, and licensing.

Industrial TSN switch with integrated CPU (often simplifies bring-up + diagnostics)

  • Microchip LAN9662 — example ordering codes: LAN9662-I/9MX, LAN9662/9MX
  • Microchip LAN9668 — example ordering codes: LAN9668-I/9MX, LAN9668/9MX

Higher bandwidth / higher port-count TSN switch families (selection depends on ports/SerDes/line-rate)

  • Microchip LAN9694 — example ordering code: LAN9694-V/3KW
  • Microchip LAN9696 — example ordering code: LAN9696-V/3KW
  • Microchip LAN9698 — example ordering code: LAN9698-V/3KW

Automotive TSN switch SoCs (common in zonal/domain networks; safety/security often a driver)

  • NXP SJA1105 family — examples: SJA1105TEL, SJA1105TELY, SJA1105EL, SJA1105PQRS
  • NXP SJA1110 family — examples: SJA1110BEL, SJA1110CEL, SJA1110DEL (ordering suffixes vary, e.g. /1Y)
  • Marvell/Infineon BRIGHTLANE™ — examples: 88Q5050, 88Q5072, 88Q6113

Integrated TSN switch MPUs (when switching + control must live on one chip)

  • Renesas RZ/N2L — example orderable part numbers: R9A07G084M04GBG#AC0, R9A07G084M08GBG#AC0, R9A07G084M08GBA#AC0

Tip: for MPU-class devices, selection is typically gated by software ecosystem, real-time behavior, and the ability to expose gate/time/preemption diagnostics to production logs.

Diagram · Selection decision tree (requirements → must-have TSN features → architecture class → verification hooks).
Selection decision tree: start from the determinism requirement (time error/latency/jitter/ports), derive the must-have TSN set (802.1AS + Qbv, plus Qbu/802.3br if needed), then check capacity and resources (queues/buffers, gate resolution, GCL depth) and observability (time error, gate events, queue and preemption counters). Architecture classes: switch IC + host (static schedules, existing CPU), switch with integrated CPU (diagnostics-heavy gateway/aggregator), or MPU/SoC with TSN switch (tight switching + control integration). Finalize only after a verification plan exists: define budgets → define hooks → run stress → accept by distributions and zero-event rules.

Scope guard: the selection list above is limited to TSN switch/bridge silicon. PHY root-cause jitter, magnetics/EMC component selection, and protocol endpoint stacks belong to sibling pages.


FAQs (TSN switch/bridge) — troubleshooting closures

Each answer is intentionally short and executable. Format is fixed: Likely cause / Quick check / Fix / Pass criteria. Thresholds use placeholders (X/Y/Z/…) to be filled by the project’s acceptance budgets.

Time sync occasionally “jumps” — timestamp point mismatch or queue congestion (residence-time jitter)? 802.1AS · Metrics

Likely cause: the measurement uses inconsistent timestamp tap points (ingress vs egress), or congestion bursts increase residence time variance on one egress port.

Quick check: correlate time_error_peak timestamps with queue_watermark / egress_utilization; if spikes align with watermark surges, it is queueing; if spikes occur with flat watermark, suspect timestamp-point inconsistency.

Fix: enforce one timestamp mode across tools and ports (document tap point), cap BE bursts (rate limit/CBS where applicable), and ensure scheduled traffic cannot be delayed by BE in the same egress resource path.

Pass criteria: time_error_peak < Y ns over Z minutes, and queue_watermark < W% during the same interval (plus out_of_window_frames = 0 if Qbv is enabled).
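The quick check (spikes aligning with watermark surges vs. spikes over a flat watermark) can be sketched as a simple event-correlation tagger; the ±1 ms window is an illustrative choice:

```python
def classify_spikes(spike_ts_ns: list[int], watermark_surge_ts_ns: list[int],
                    window_ns: int = 1_000_000) -> dict:
    """Tag each time-error spike: 'queueing' if a queue-watermark surge falls
    within +/- window_ns of it, otherwise 'tap_suspect' (timestamp-point
    inconsistency), following the quick check above."""
    tags = {"queueing": 0, "tap_suspect": 0}
    for t in spike_ts_ns:
        if any(abs(t - w) <= window_ns for w in watermark_surge_ts_ns):
            tags["queueing"] += 1
        else:
            tags["tap_suspect"] += 1
    return tags

# One spike coincides with a surge, one does not.
assert classify_spikes([10_000_000, 50_000_000], [10_400_000]) == \
       {"queueing": 1, "tap_suspect": 1}
```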

Qbv is configured but latency is still unstable — what are the first three items to verify? Qbv · TAS

Likely cause: GCL base-time is not aligned to the active gPTP time domain, the PCP→queue map sends the critical stream into the wrong queue, or guard band is insufficient (blocking by a long frame).

Quick check: (1) confirm schedule_id/hash matches the intended GCL, (2) verify PCP→queue by mirroring or per-queue counters, (3) check out_of_window_frames and gate-event counters (must not increment).

Fix: align base-time to the correct domain (and document update behavior), correct classification/mapping, then increase guard band or enable Qbu to reduce blocking before shrinking guard band.

Pass criteria: out_of_window_frames = 0 per Z hours, and p99 egress latency < X µs with jitter < Y ns under stress profile “Profile-A”.

After enabling Qbu, occasional packet loss appears — interoperability, fragment config, or counter misread? Qbu · 802.3br

Likely cause: link partner does not support preemption (or mismatch in configuration), Express/Preemptable classification is wrong, or fragment-related behavior triggers error bursts on marginal links.

Quick check: read preemption_verify_status and verify_mismatch; correlate timestamps of fragment/hold events with CRC/error counters (burst correlation is the key signal).

Fix: ensure both ends support and agree on preemption; keep critical traffic as Express; make BE Preemptable; if errors correlate with preemption, disable it on that link and compensate with larger guard band + stricter BE shaping.

Pass criteria: verify_mismatch = 0 per Z hours, CRC/error_bursts = 0 correlated with fragment events, and out_of_window_frames = 0 under Profile-A.

Same network, different switch → much worse time accuracy — what is the first end-to-end correlation check? Correlation

Likely cause: the “new” switch differs in time domain, GM selection behavior, timestamp tap point, or schedule/mapping artifacts (making measurements non-comparable).

Quick check: confirm the same GM identity + domainNumber, the same timestamp mode/tap point, and the same config hashes (PCP→queue + GCL + preemption settings) under the same traffic profile.

Fix: normalize domain and GM priority, replicate schedule/mapping exactly (by hash), and re-run the same stress profile before comparing accuracy results.

Pass criteria: after normalization, Δ(time_error_peak) < ΔX ns and out_of_window_frames = 0 per Z hours under Profile-A.

Jitter shows up only under stress — how to separate “queueing under load” vs “clock drift / timestamp jitter”? Metrics · Stress

Likely cause: either queue occupancy rises (load-induced waiting time), or time base quality degrades under load (servo disturbance, CPU contention, or timestamp-path inconsistency).

Quick check: during stress, log queue_watermark, out_of_window_frames, and time_error_peak on the same timeline; queue-driven jitter tracks watermark, clock-driven jitter tracks time-error spikes.

Fix: if queue-driven: tighten BE shaping, enable Qbu, and adjust TAS windows/guard band; if clock-driven: reduce CPU load on timing path, verify HW timestamps are used, and validate stable clock input/role behavior.

Pass criteria: p99 jitter < J ns, time_error_peak < Y ns over Z minutes, and out_of_window_frames = 0 per Z hours under Profile-A.
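Separating the two causes amounts to asking which logged series the jitter tracks on the shared timeline; a plain Pearson correlation is enough for a sketch (illustrative data, not a substitute for inspecting the actual logs):

```python
def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Jitter that tracks the watermark on the same timeline points to queueing;
# jitter that tracks time-error spikes over a flat watermark points to the clock.
jitter    = [10.0, 12.0, 40.0, 11.0, 38.0, 10.0]
watermark = [20.0, 22.0, 85.0, 21.0, 80.0, 20.0]
assert pearson(jitter, watermark) > 0.9
```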

Scheduled stream is occasionally squeezed out by Best-Effort — PCP mapping error or gate/shaper ordering? QoS · Qbv

Likely cause: critical frames are classified into the wrong queue (PCP/DSCP mapping), or the scheduled queue is not truly isolated in the egress pipeline (gate/shaper mis-order or shared bottleneck).

Quick check: mirror ingress to confirm PCP on the wire, then confirm the frame increments the intended per-queue counter; if scheduled frames land in BE queues, it is mapping; if mapping is correct, check out_of_window_frames and queue drops for the scheduled queue.

Fix: correct PCP→queue mapping, ensure TAS gate controls the scheduled queue, and rate-limit BE to prevent burst occupancy from dominating shared resources; apply per-stream policing only if misbehaving flows exist.

Pass criteria: scheduled_queue_drops = 0, out_of_window_frames = 0 per Z hours, and BE utilization capped to < U% under Profile-A.

Guard band becomes large and throughput drops — how to shrink guard band using preemption? Qbv · Qbu

Likely cause: guard band is sized for worst-case blocking by a maximum-length BE frame because preemption is disabled or not supported on that link.

Quick check: compute blocking time: t_block = (max_frame_bits / line_rate); verify preemption_verify_status is supported and stable for the link partner.

Fix: enable Qbu/802.3br, keep critical traffic Express, set BE as Preemptable, then reduce guard band to the preemption-safe margin (fragment + overhead + safety). Validate with stress before further shrinking.

Pass criteria: throughput ≥ T% of line rate under Profile-A while out_of_window_frames = 0 per Z hours and verify_mismatch = 0.
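A worked sketch of the t_block math and its effect on usable bandwidth (single window per cycle, 1 Gb/s line rate, illustrative numbers):

```python
def t_block_ns(frame_bytes: int, rate_bps: float) -> float:
    """Worst-case blocking time of one interfering frame on the line."""
    return frame_bytes * 8 / rate_bps * 1e9

def schedulable_fraction(cycle_ns: float, guard_ns: float,
                         window_ns: float) -> float:
    """Fraction of the cycle left for non-scheduled traffic once the scheduled
    window and its guard band are reserved (single-window model)."""
    return max(0.0, 1.0 - (guard_ns + window_ns) / cycle_ns)

# 100 µs cycle, 20 µs scheduled window:
# guard sized for a 1522 B frame (~12.2 µs) vs a 64 B fragment (~0.5 µs)
before = schedulable_fraction(100_000, t_block_ns(1522, 1e9), 20_000)
after  = schedulable_fraction(100_000, t_block_ns(64, 1e9), 20_000)
assert after > before  # preemption-safe guard band recovers throughput
```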

Multiple time domains are mixed — what is the first config / message signature to check? 802.1AS

Likely cause: different ports follow different gPTP domains (domainNumber mismatch), or BMCA selects different GMs due to priorities/announce visibility, breaking deterministic schedule assumptions.

Quick check: capture gPTP messages and confirm domainNumber and GM identity are consistent across TSN ports; check whether BMCA role changes occur around the time errors.

Fix: enforce a single domain for the deterministic segment, isolate non-TSN timing via VLAN/port segmentation, and explicitly manage GM priorities so deterministic ports converge to the intended GM.

Pass criteria: GM identity and domainNumber remain unchanged for Z hours, with time_error_steady < X ns and time_error_peak < Y ns.

Port statistics look fine, but the application times out — what counters/mirroring should be added? Hooks · Verify

Likely cause: rare out-of-window events or tail-latency bursts occur without obvious drops; plain port RX/TX counters miss gate/shaper/preemption events that break determinism.

Quick check: enable or export (1) out_of_window_frames, (2) gate-event mismatch/schedule-change counters, (3) per-queue watermark and tail-drop counters; add egress mirroring for the scheduled queue to observe timing/spacing.

Fix: treat determinism as event-driven: keep “zero-event” counters in production logs, adjust windows/guard band based on observed tail events, and cap BE burstiness so tail events do not accumulate.

Pass criteria: out_of_window_frames = 0 per Z hours, and p99.9 latency < X µs under Profile-A (not just average).

After temperature changes, accuracy degrades — which environment fields must be logged to reproduce it? Production

Likely cause: timing quality is temperature-sensitive (clock tree/source/servo interactions), or load/airflow changes create time-error spikes that appear as determinism loss.

Quick check: log these fields on the same timeline as time error: board_temp, VDD, fan_rpm, port_utilization, plus BMCA_role_changes if any.

Fix: qualify determinism across temperature corners with the same stress profile; if time error correlates with temp/VDD, improve clock source stability/compensation and isolate timing paths from noisy load transitions.

Pass criteria: across Tmin..Tmax, time_error_steady shift < X ns and time_error_peak < Y ns, with out_of_window_frames = 0 per Z hours under Profile-A.

Switch shows “sync locked” but end-to-end offset is large — check residence time first or GM selection? 802.1AS · Topology

Likely cause: “locked” indicates local convergence, not correctness; large E2E offset often comes from GM selection changes, domain mix, or incorrect correction/residence handling on the path.

Quick check: first confirm GM identity + domainNumber stability; then inspect per-hop timing contribution signals (correction/residence reporting if available) and directional link delay estimates for asymmetry.

Fix: stabilize GM priority and domain boundaries; ensure the bridge mode (boundary/transparent behavior) matches the intended deployment; re-verify with the same timestamp mode and a fixed traffic profile.

Pass criteria: GM identity stable for Z hours and E2E offset < X ns, with time_error_peak < Y ns under Profile-A.

Configuration is identical but one board batch is unstable — which three production log fields are missing? Production · Correlation

Likely cause: “same config” hides non-config differences: firmware build, timestamp tap/mode, topology/peer identity, or environment; without these fields, station-to-station correlation fails.

Quick check: add and compare these three fields for pass vs fail units: (1) firmware_build_id, (2) config_hash (GCL + mapping + preemption), (3) timestamp_mode + peer/topology_id.

Fix: lock down straps/modes in production, enforce a single timestamp tap point, run a station-to-station correlation test using the same stress profile, and quarantine the batch until fields match and KPIs stabilize.

Pass criteria: correlation fields match across stations (100%), and KPI failures fall below P PPM while out_of_window_frames = 0 per Z hours under Profile-A.
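A sketch of the field check itself: the three required keys are exactly the ones named in the quick check above, and everything else is illustrative:

```python
def missing_correlation_fields(unit_log: dict) -> list[str]:
    """Return the required correlation fields that are absent or empty;
    a unit log missing any of them cannot be correlated station-to-station."""
    required = ["firmware_build_id", "config_hash", "timestamp_mode"]
    return [f for f in required if not unit_log.get(f)]

good = {"firmware_build_id": "fw-2.4.1", "config_hash": "a9f3",
        "timestamp_mode": "egress-MAC"}
bad  = {"firmware_build_id": "fw-2.4.1"}
assert missing_correlation_fields(good) == []
assert missing_correlation_fields(bad) == ["config_hash", "timestamp_mode"]
```

Running this gate at log ingest (rather than at failure analysis time) is what keeps the quarantine decision fast when a batch misbehaves.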