TSN Switch / Bridge for Deterministic Industrial Ethernet
A TSN switch/bridge makes Ethernet deterministic by turning traffic into scheduled, admitted, and measurable streams—so latency, jitter, and loss stay within provable bounds under worst-case load.
This page explains the mechanisms (time-aware gating, shaping, admission control, hardware timestamping, and observability) and how to configure and verify them so “works in the lab” becomes repeatable in production and field deployments.
Definition & Scope Guard: What “TSN Switch/Bridge” Covers
Working definition
A TSN switch/bridge is a forwarding node that enforces deterministic behavior using hardware time windows, hardware timestamps, traffic shaping, and admission control, so latency/jitter/loss have explicit bounds and are verifiable.
Role in the system
- Isolation: separates time-aware flows from best-effort traffic to remove queueing uncertainty.
- Bounded behavior: turns “average performance” into a worst-case bound via gates/shapers/admission.
- Proof hooks: exposes measurable points (timestamps/counters/events) to validate deterministic guarantees.
“Deterministic” constrains three outcomes (plus time error)
1) Latency bound
One-way end-to-end worst-case delay is bounded (not just mean/P95).
2) Jitter bound
Packet delay variation is bounded (queue jitter + gate timing + timestamp noise).
3) Loss bound
Drop/late-drop/police-drop are kept below a defined limit under admitted load.
+ Time error
Offset/drift/holdover events shift window alignment and directly affect worst-case behavior.
When a TSN switch/bridge is needed (vs. a managed switch)
- A hard deadline exists (missed deadline is a functional failure, not a “slower UI”).
- Background load changes over time, so queueing jitter must be removed or bounded.
- Multiple endpoints require time coordination (scheduled windows / deterministic triggering).
If traffic is best-effort only, deadlines are soft, or critical flows run on a dedicated link, a managed switch with VLAN/QoS is often sufficient.
Scope Guard (strict)
In-scope (covered on this page)
- Hardware time windows: gating concepts, schedules, guard bands (concept-to-budget).
- Hardware timestamps: tap points, error terms, validation hooks.
- Shaping: controlling burstiness to keep worst-case bounds intact.
- Admission control: ensuring new flows cannot break existing guarantees.
- Verification metrics: counters/events/tests to prove bounds (X placeholders).
Out-of-scope (link only, no deep dive)
- PHY / magnetics / ESD / surge / layout — physical-layer integrity and protection. PHY Co-Design & Protection
- PoE / PoDL — power delivery and thermal/protection co-design. PoE / PoDL
- PROFINET / EtherCAT / CIP stack details & certification — endpoint protocol/stack domain. Industrial Ethernet Stacks
- Ring redundancy (MRP/HSR/PRP) — topology redundancy and switchover mechanics. Ring Redundancy
System map: time-aware vs best-effort flows across a TSN switch/bridge
The diagram highlights the only job of this page: enforce and verify bounds inside the TSN switch/bridge (gates, shaping, timestamps, admission). Physical layer, power, stacks, and redundancy are referenced but not expanded.
Determinism Goals & Key Specs: Latency, Jitter, Loss, and Time Error
What this section locks down
- A measurable definition of deterministic outcomes (bounds, not averages).
- A common dictionary so all later chapters reuse the same metric meanings.
- A budget template that decomposes one-way E2E delay into controllable blocks.
One-way E2E latency decomposition (switch/bridge-centric)
For deterministic design, the worst-case bound is driven by a small set of blocks; the dominant term is most often gate waiting (a missed window) or queueing under burst.
- Ingress processing: parse/classify; becomes a fixed per-hop constant.
- Gate waiting: time until the next open window; worst-case can approach one cycle if a window is missed.
- Queueing + shaping delay: depends on burstiness and shaper parameters; bounded by admission + isolation.
- Fabric + egress scheduling: forwarding latency inside the switch and port scheduling effects.
- Serialization + propagation: hard lower bounds set by frame length, link rate, and cable length.
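The last two blocks are hard lower bounds and can be computed directly. A minimal sketch, assuming a 1518-byte frame, a 1 Gbit/s link, and a copper velocity factor of roughly 0.66c (all illustrative values, not system requirements):

```python
def serialization_time_us(frame_bytes: int, link_rate_bps: float) -> float:
    """Time to clock one frame onto the wire (preamble/IFG ignored for simplicity)."""
    return frame_bytes * 8 / link_rate_bps * 1e6

def propagation_time_us(cable_m: float, velocity_factor: float = 0.66) -> float:
    """Propagation delay for copper cabling; ~0.66c is a common velocity factor."""
    c = 299_792_458.0  # speed of light, m/s
    return cable_m / (velocity_factor * c) * 1e6

# 1518-byte frame at 1 Gbit/s -> ~12.1 us serialization per hop
ser = serialization_time_us(1518, 1e9)
# 100 m of cable -> ~0.5 us propagation
prop = propagation_time_us(100.0)
```

These two terms set the floor of the budget; everything above them (gate waiting, queueing) is what the TSN mechanisms must bound.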
Jitter taxonomy (useful for verification and triage)
Queueing jitter
Variation caused by competing traffic and transient bursts; addressed by isolation, shaping, and admission.
Gate timing jitter
Window boundary uncertainty (schedule alignment, guard band, internal switching granularity).
Timestamp noise
Measurement noise from timestamp tap points and quantization; fixed by consistent tap definition and calibration.
Clock/time error
Offset/drift/holdover events shift schedules and can cause sporadic worst-case violations even when averages look good.
Metric Dictionary (standardized terms)
| Metric | Definition | Obs. point | Common misuse | Pass criteria |
|---|---|---|---|---|
| One-way E2E latency (worst-case) | Maximum one-way delay under admitted load | Endpoint timestamps | Using mean/P95 as “bound” | ≤ X |
| Per-hop residence time | Ingress-to-egress time inside a switch/bridge | Switch HW timestamp/counter | Mixing store-forward and cut-through paths | ≤ X |
| Gate waiting time | Time from arrival to next open window | Ingress + schedule timeline | Ignoring missed-window worst-case | ≤ X |
| PDV (delay variation) | Peak-to-peak or percentile spread of one-way delay | Endpoint timestamps | Comparing different timestamp tap points | ≤ X |
| Window miss count | Packets arriving too late/early for a configured window | Switch gate counters | Assuming “no drops” implies “no misses” | ≤ X / hour |
| Drop breakdown | Drops by queue / policer / buffer watermark | Per-queue counters | Only tracking CRC errors | ≤ X |
| Timestamp error | Tap-to-wire (or wire-to-tap) uncertainty | Calibrated test | Comparing uncalibrated devices | ≤ X ns |
| Time base events | Lock transitions / holdover enters / time step | Time-sync status/event log | Treating “locked” as “always stable” | 0 critical events |
All later chapters should reference these metrics by name. If a test cannot map to a metric above, it is likely not proving determinism.
E2E latency budget template (per-hop block view)
| Block | Worst-case bound | How to measure | Owner | Risk flag |
|---|---|---|---|---|
| Ingress processing | ≤ X | Switch counters / profiling | Switch config | Unexpected parsing path |
| Gate waiting | ≤ X (can approach one cycle) | Schedule timeline + gate counters | TSN schedule | Missed window / drift |
| Queueing + shaping | ≤ X | Per-queue occupancy / shaper stats | Shaping + admission | Microburst / oversubscription |
| Fabric + egress scheduling | ≤ X | Ingress/egress timestamps | Switch architecture | Unexpected store-forward path |
| Serialization | Frame_len / link_rate | Known constants | System spec | MTU growth |
| Propagation | Cable_len / velocity | Cable model / measurement | System design | Topology change |
Any deterministic claim should identify the dominant bound term and show how it is measured (or proven) under admitted load.
Latency budget waterfall: ingress-to-egress blocks and measurement points
The diagram is a repeatable template: each block should map to a measurement method and an owner. If a block cannot be measured or bounded, the design is not deterministic.
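The dominant-term rule above can be sketched numerically: sum the per-hop block bounds and flag the largest contributor. All values below are illustrative placeholders for the X entries, not recommendations.

```python
# Per-hop worst-case bounds (microseconds), one entry per budget block.
budget_us = {
    "ingress_processing": 2.0,
    "gate_waiting": 250.0,      # can approach one cycle if a window is missed
    "queueing_shaping": 40.0,
    "fabric_egress": 5.0,
    "serialization": 12.1,      # frame_len / link_rate
    "propagation": 0.5,         # cable_len / velocity
}

worst_case = sum(budget_us.values())
dominant = max(budget_us, key=budget_us.get)
print(f"worst-case bound: {worst_case:.1f} us, dominant term: {dominant}")
```

With these example numbers, gate waiting dominates, which is exactly the case where a schedule (and its verification counters) matters most.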
TSN Feature Map: Which Mechanism Solves Which Problem
Purpose (mechanism-first, not standards-first)
Determinism is achieved by choosing mechanisms that directly control the dominant term in the worst-case budget. This section maps each mechanism to the outcomes it can guarantee, the cost it introduces, and the scenarios where it is typically used. IEEE clause-level details and certification workflows are intentionally excluded to avoid overlapping with standards/certification pages.
Fast triage: pick the mechanism by the symptom
Hard deadline misses
Use Time-aware gating to bound waiting; add guard band / preemption when long frames overlap window edges.
PDV / jitter grows under load
Use Shaping (burst control) and admission control (resource cap); add policers to isolate misbehaving sources.
Looks “fine” but cannot be proven
Require hardware timestamping and per-queue counters/events; without observability, deterministic claims are not verifiable.
Mechanism-to-Outcome mapping (what each block actually guarantees)
| Mechanism | Primary outcome | Dominant term controlled | Cost / overhead | Typical use | Misuse symptom |
|---|---|---|---|---|---|
| Time-aware gating (Gate) | Bounded latency (windowed service), deterministic scheduling | Gate waiting (missed window dominates worst-case) | Schedule management (GCL), guard bands, time alignment dependency | Motion control, PLC cyclic traffic, deterministic triggers | Sporadic deadline misses; “great average, bad worst-case” |
| Traffic shaping (Shaper) | Bounded jitter under burst; predictable queue growth | Burstiness → queueing (microbursts / overshoot) | Added controllable latency; parameter tuning and validation effort | Imaging streams, mixed cyclic/acyclic networks, gateways | Jitter spikes when background load changes |
| Policing (Policer) | Isolation from misbehaving sources; protects deterministic domain | Ingress overload (rate/burst violations) | Drops/marking; requires clear violation definitions and counters | Multi-vendor integration, field expansion, mixed trust domains | “Random” drops without per-flow accountability |
| Admission control (Admission) | No starvation under admitted set; prevents future bound breakage | Resource cap (bandwidth/slots/queues/buffers) | Requires resource model + workflow (approve/rollback/versioning) | Scalable cells, factories with add-on nodes, shared infrastructures | Works until “one more device” is added, then bounds collapse |
| Hardware timestamp (Timestamp) | Proof and calibration; enables per-hop residence time measurement | Measurement credibility (tap-point consistency) | Requires clear tap definition + error budget + event logging | Any deterministic validation, multi-hop timing, commissioning | Conflicting measurements across tools/devices |
The mapping avoids standard clause references and focuses on engineering outcomes: which dominant term is controlled and how misconfiguration typically manifests.
Typical “mechanism recipes” by application (quick selection)
| Application | Gate | Shaper | Guard/Preempt | Admission | Timestamp |
|---|---|---|---|---|---|
| Motion control | ✓ | ✓ | ✓ | ✓ | ✓ |
| Imaging / vision | ✓ | ✓ | — | ✓ | ✓ |
| PLC cyclic + diagnostics | ✓ | ✓ | — | ✓ | ✓ |
| Robot cell (multi-axis) | ✓ | ✓ | ✓ | ✓ | ✓ |
The recipe matrix is a starting point. The final choice should be driven by the dominant worst-case term and the verification hooks available on the chosen switch/bridge.
TSN toolbox: five blocks and the outcomes they enable
The toolbox view emphasizes mechanism selection by outcome. Each block must map to measurable metrics and counters; otherwise deterministic claims cannot be validated.
Hardware Architecture: Ingress → Classify → Queue → Shape → Gate → Egress
Why the internal pipeline matters
Determinism is implemented inside the switch/bridge pipeline. The location of classification, queueing, shaping, gating, and timestamp taps determines what can be bounded and what can be measured. This section describes the data path and the time path without expanding into PHY signal integrity.
Typical data path (conceptual pipeline)
- Ingress parse/classify: maps frames to traffic class/queue/stream state.
- Per-stream state: eligibility, policing, accounting, and shaping context.
- Queues (per port / per TC): isolation boundary; occupancy drives worst-case delay.
- Shapers/policers: bound burst input and protect deterministic domain.
- Gate scheduler: time-aware service windows and guard-band handling.
- Egress scheduling: final arbitration and transmit timing.
Cut-through vs store-and-forward (mechanism-level implications)
Cut-through
- Lower typical latency, but worst-case must account for internal stalls and path variability.
- Determinism requires measurable per-hop residence time and clear tap-point definition.
Store-and-forward
- Adds a frame-length-dependent delay, but the forwarding model is usually easier to bound.
- When counters are available, the worst case is dominated by queueing/gating rather than by hidden internal variability.
Hardware knobs checklist (selection and bring-up)
| Knob | Why it matters | How to verify | Red flag |
|---|---|---|---|
| Queue / TC count | Sets isolation granularity and scheduling expressiveness | Map flows to queues; confirm no unintended sharing | Key flows share a queue with best-effort |
| Per-port buffer + watermarks | Microburst tolerance and drop behavior; impacts worst-case delay | Stress with bursts; watch occupancy and drop reason counters | Drops occur without accounting detail |
| Gate list depth + granularity | Limits schedule complexity and smallest achievable windows | Implement a representative GCL; verify window miss counters | Cannot express required slots/guards |
| Timestamp tap points + resolution | Determines credibility of residence time and E2E measurements | Confirm ingress/egress timestamp availability and calibration hooks | Tap points are undocumented/inconsistent |
| Per-queue counters/events | Enables proof: drops, late-to-window, shaper eligibility, policing | Check counter coverage; validate with controlled faults | Only global counters exist |
| Telemetry / mirror / event log | Supports field forensics and configuration drift detection | Verify event logs (time base step/lock change) and export path | No reliable black-box signals |
Where to measure (conceptual observation points)
| Observation point | Typical signal | Proves | Common pitfall |
|---|---|---|---|
| Ingress timestamp (T_in) | Hardware timestamp / ingress marker | Per-hop residence time (with T_out) | Tap point mismatch across devices |
| Queue occupancy | Per-queue depth / watermark | Queueing/jitter dominance under load | Only global buffer metrics available |
| Gate miss / late-to-window | Window miss counters/events | Schedule correctness and guard band adequacy | Assuming “no drops” means “no misses” |
| Shaper eligibility | Credit/state counters | Burst control and stability under background load | Misinterpreting shaped delay as failure |
| Egress timestamp (T_out) | Hardware timestamp / egress marker | Residence time + output scheduling effects | Using software timestamps for proof |
Observation points should map to the metric dictionary and the E2E budget blocks. If a critical bound term cannot be measured, deterministic validation is incomplete.
Switch pipeline block diagram: data path and time path (where determinism is enforced)
The diagram separates the data path (frames) from the time path (schedule/sync). Determinism is enforced at queues, shapers/policers, and gates, and proven via consistent timestamp tap points and per-queue counters.
Time-Aware Windows: Gate Control, Schedules, Guard Bands, and Preemption
What this section guarantees (mechanism-level)
Time-aware gating bounds worst-case waiting by turning contention into a scheduled service window. The engineering focus is: how to derive a gate schedule from periodic traffic, why guard bands are required, when preemption becomes necessary, and what multi-hop alignment must ensure. PTP algorithms and standard clause-level explanations are intentionally excluded to avoid overlapping timing/sync pages.
Bound model: what a gate window actually caps
- Window waiting cap: if a queue is closed, the worst-case waiting is bounded by one cycle plus guard-band effects (conceptually).
- Queueing vs scheduling: shaping bounds burst-driven queue growth; gating bounds service timing by design.
- Proof hooks: “late-to-window” / “gate-miss” counters and consistent timestamp tap points are required to verify the bound.
Deriving a gate schedule from periodic traffic (practical workflow)
Inputs required (no protocol details)
- Cycle time: X (control period / sync epoch).
- Per-flow envelope: period, max frame size, deadline, queue/TC mapping.
- Port constraints: link rate, queue count, gate switching overhead X, GCL depth X.
- Policy: how much best-effort bandwidth is allowed without harming key windows.
Step-by-step schedule build
- Group critical flows by deadline and isolation needs; assign them to dedicated queues/TCs where possible.
- Choose a shared epoch (cycle start) that aligns with the system’s control rhythm and commissioning reference.
- Allocate time slots per critical queue so that the slot capacity covers the per-cycle payload at line rate with margin X.
- Insert guard bands at window edges to prevent boundary infringement by non-critical frames.
- Place best-effort windows in remaining space and restrict them by shaping/policing as needed.
- Validate the schedule with “late-to-window / gate-miss” counters and per-hop residence time measurements.
Gate Control List (GCL) template (commissioning-ready)
| Cycle | Slot | Start (offset) | Duration | Gate state (Q0..Qn) | Guard band | Notes |
|---|---|---|---|---|---|---|
| T = X | S0 | 0 | X | Q0:OPEN, others:CLOSED | GB = X | Critical window |
| T = X | S1 | X | X | Q1:OPEN, others:CLOSED | GB = X | Secondary window |
| T = X | S2 | X | X | BE:OPEN (shaped), Q0/Q1:as needed | GB = X | Best-effort window |
| Repeat | S3.. | … | … | … | … | Version / rollback id |
Recommended commissioning practice: version the GCL, define a rollback plan, and validate per-hop residence time and window-miss counters before production rollout.
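The build steps above can be sketched as a single-port schedule generator. Cycle time, flow envelopes, the slot margin, and the guard-band value are illustrative stand-ins for the X placeholders; a real GCL must also respect the device's list depth and granularity limits.

```python
LINK_RATE_BPS = 1e9
CYCLE_US = 250.0
GUARD_BAND_US = 13.0   # must cover the largest best-effort frame near a window edge

flows = [  # (queue, max_frame_bytes, frames_per_cycle) -- example envelopes
    ("Q0", 256, 2),    # critical cyclic traffic
    ("Q1", 512, 1),    # secondary stream
]

def slot_us(frame_bytes, frames, margin=1.2):
    """Slot length: per-cycle payload at line rate, plus margin."""
    return frame_bytes * 8 * frames / LINK_RATE_BPS * 1e6 * margin

gcl, offset = [], 0.0
for queue, frame, n in flows:
    dur = slot_us(frame, n)
    gcl.append({"slot": len(gcl), "start": round(offset, 3),
                "duration": round(dur, 3), "open": queue, "guard": GUARD_BAND_US})
    offset += dur + GUARD_BAND_US

# Remaining cycle time becomes the (shaped) best-effort window.
gcl.append({"slot": len(gcl), "start": round(offset, 3),
            "duration": round(CYCLE_US - offset, 3), "open": "BE", "guard": 0.0})

assert offset < CYCLE_US, "critical slots + guard bands exceed the cycle"
```

Each emitted entry maps onto one row of the GCL template; validation against late-to-window counters happens after deployment, not here.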
Guard-band quick estimation (rule block)
Minimum guard band (structural form)
GB ≥ (MaxFrameTime at line-rate) + (Gate switching overhead X) + (Margin X)
- MaxFrameTime depends on maximum non-critical frame size X and link speed X.
- Switching overhead includes implementation latency when changing gate state (X).
- Margin covers clock error, timestamp uncertainty, and measurement bias (X).
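The structural rule becomes numeric once the X terms are filled in. A minimal sketch, using example values (1518-byte best-effort frame, 1 Gbit/s link, 0.5 us switching overhead, 0.4 us margin):

```python
def guard_band_us(max_be_frame_bytes, link_rate_bps, switch_overhead_us, margin_us):
    """GB >= max-frame time at line rate + gate switching overhead + margin."""
    max_frame_time = max_be_frame_bytes * 8 / link_rate_bps * 1e6
    return max_frame_time + switch_overhead_us + margin_us

# ~12.1 us frame time + 0.9 us overhead/margin -> ~13.0 us guard band
gb = guard_band_us(1518, 1e9, switch_overhead_us=0.5, margin_us=0.4)
```

Note how the frame-time term dominates: this is why shrinking best-effort MTU, or adding preemption, shrinks the required guard band almost one-for-one.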
Fast symptom check (guard band too small)
- Late-to-window / gate-miss counters increase in a periodic pattern.
- Failures correlate with large best-effort frames near window edges.
- Reducing best-effort MTU or adding preemption reduces misses immediately.
When preemption is necessary (decision logic + trade-offs)
| Trigger | Why guard band alone is inefficient | Expected gain | Cost / risk |
|---|---|---|---|
| Narrow windows + large BE frames | Guard band must cover near-max frame time, wasting slot capacity | Smaller guard band, better window utilization | Higher configuration/validation complexity |
| High utilization deterministic schedule | Static gaps accumulate and reduce usable bandwidth | More payload per cycle without violating window edges | Debugging and forensics become harder |
| Strict window-edge integrity must be proven | Without preemption, BE overlap risk must be eliminated by oversized guard bands | Cleaner proof of “no overlap” with proper counters | Requires consistent capability across the path |
Multi-hop note (no sync algorithm details): window phase alignment must ensure the flow’s arrival time lands inside the intended window at every hop with margin X.
Time wheel + gate timeline: cycle, slots, guard band, and preemption boundary
The timeline view highlights the non-obvious constraint: window edges must be protected from overlap. Guard bands and preemption are the primary tools to preserve edge integrity without sacrificing excessive slot capacity.
Not covered here (scope guard)
- PTP/SyncE algorithm internals and topology selection logic.
- Clause-by-clause IEEE interpretations and certification procedures.
Shaping for Determinism: CBS / ATS / Rate-Limit and Queue Discipline
When gating is not required
Many systems achieve sufficient determinism without explicit time windows by bounding burstiness and isolating queues. This section explains how shaping reduces queue growth and packet delay variation, where stability traps occur, and how to decide when a gate schedule becomes mandatory.
Mechanism comparison (what each shaper controls)
CBS
- Best for: steady or near-periodic streams needing bounded PDV.
- Controls: output smoothness via eligibility/credit dynamics.
- Typical pitfall: credit reset/initialization causes periodic “mystery jitter”.
ATS
- Best for: mixed or bursty traffic requiring per-flow timing discipline.
- Controls: burst spreading via time-based eligibility scheduling.
- Typical pitfall: queue coupling hides microbursts unless per-queue telemetry exists.
Rate-limit
- Best for: ingress protection and simple fairness, not strict deadlines.
- Controls: peak rate/burst envelope to prevent overload.
- Typical pitfall: “average looks OK” while microbursts still hit watermarks.
Shaper stability traps (symptom → quick check → fix)
Microburst dominance
Quick check: correlate PDV spikes with queue watermarks/occupancy peaks (not averages).
Fix: tighten ingress policing, isolate queues, and set buffer watermarks X with telemetry.
Credit/state discontinuities
Quick check: jitter spikes repeat with a fixed cadence tied to state resets or link events.
Fix: validate shaper initialization, avoid hidden resets, and log eligibility state transitions.
Queue coupling
Quick check: key flows share a queue/TC with best-effort or diagnostics traffic.
Fix: dedicate queues for key classes, separate buffer pools where supported, and enforce admission rules.
Shaper selection decision tree (gate vs CBS vs ATS vs rate-limit)
- If the requirement is a hard deadline bound at specific cycle phases → prefer Gate (and guard/preempt as needed).
- If traffic is near-periodic and the target is bounded PDV under load → choose CBS with strict queue isolation.
- If traffic is bursty/mixed and needs controlled eligibility timing → choose ATS and verify per-flow/queue telemetry.
- If the goal is ingress protection and preventing overload rather than strict determinism → apply rate-limit/policer plus admission.
- If adding one more stream risks breaking existing bounds → enforce admission control regardless of shaper choice.
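The decision tree above can be written as a function. The boolean requirement flags are an illustrative encoding, not a standard API; admission applies regardless of the branch taken.

```python
def pick_mechanism(hard_deadline: bool, near_periodic: bool,
                   bursty: bool, ingress_protection_only: bool) -> str:
    """Map the dominant requirement to a shaping/gating mechanism."""
    if hard_deadline:
        return "gate (+ guard band / preemption as needed)"
    if near_periodic:
        return "CBS with strict queue isolation"
    if bursty:
        return "ATS with per-flow/queue telemetry"
    if ingress_protection_only:
        return "rate-limit/policer + admission"
    return "review requirements: no dominant term identified"
```

For example, `pick_mechanism(False, True, False, False)` selects CBS, matching the near-periodic branch of the tree.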
Microburst control checklist (field-stable determinism)
- Ingress rate guard: define burst envelope and enforce with policing.
- Queue isolation: keep key flows away from best-effort and diagnostics queues.
- Buffer watermarks: set early warning thresholds X and log occupancy peaks.
- Telemetry: require per-queue occupancy peak, drop reason, eligibility/credit state.
- Admission: prevent incremental additions from silently breaking bounds.
Queue + shaper behaviors: no shaping vs CBS vs ATS (occupancy over time)
The panels focus on occupancy peaks rather than averages. In practice, microburst control must be validated with per-queue occupancy peaks, watermarks, and eligibility/credit telemetry.
Not covered here (scope guard)
- Protocol stack specifics, parameter tables for particular industrial profiles, and certification workflow details.
- PHY/magnetics/layout/EMI topics (handled in the PHY co-design and protection pages).
Admission Control & Isolation: Keep the Network from Becoming a Tuning Lottery
Why admission exists (determinism needs contracts)
Gates and shapers control forwarding behavior, but without admission the guarantees are eventually broken by new flows, new tenants, or untracked diagnostics traffic. Admission turns determinism into an enforceable contract by making bandwidth, queues, gate slots, buffers, and time budget explicit and versioned.
Admission input → decision → contract output
Inputs (flow descriptor)
- Period: X
- Max frame: X
- Priority / TC: X (and queue mapping)
- Path: ingress port → … → egress port(s)
- Bound required: latency X / jitter X / loss X
- Burst policy: allowed / disallowed (X)
Outputs (decision + reservation contract)
- Decision: Admit / Reject / Admit-with-conditions
- Reserved resources per hop: slot time X, shaper rate X, buffer quota X
- Isolation mapping: dedicated queue/TC + counters scope
- Monitoring hooks: thresholds X + violation actions
- Versioning: config id + rollback pack
Resource dimensions admission must account for
- Bandwidth: per-cycle service capacity and sustained load margins (X).
- Queues & priorities: queue count, TC mapping, and coupling risk from shared queues.
- Gate slots: available slot time, phase margin (X), and GCL depth (X).
- Buffers: per-port buffers, shared pools, watermarks (X), and drop reasons.
- Time budget: per-hop residence time tail + window waiting cap + guard band margin (X).
Common failure pattern: “average bandwidth looks fine” while slot occupancy, buffer peaks, or window-miss counters silently cross the determinism contract.
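A per-hop admission check can be sketched by comparing a flow descriptor against remaining slot time, bandwidth, and buffer quota. The dict schema and all numbers are illustrative placeholders for the worksheet's X values, not a vendor API.

```python
def admit(flow, hop):
    """Return (decision, reasons) for one hop; flow/hop are plain dicts."""
    reasons = []
    # Gate slot: the frame must fit in the remaining slot time at line rate.
    slot_need_us = flow["max_frame_bytes"] * 8 / hop["link_rate_bps"] * 1e6
    if slot_need_us > hop["free_slot_us"]:
        reasons.append("insufficient gate slot time")
    # Bandwidth: sustained rate implied by one max frame per period.
    rate_need_bps = flow["max_frame_bytes"] * 8 / (flow["period_us"] * 1e-6)
    if rate_need_bps > hop["free_rate_bps"]:
        reasons.append("insufficient shaper/bandwidth reservation")
    # Buffer: at least one max frame of quota must remain.
    if flow["max_frame_bytes"] > hop["free_buffer_bytes"]:
        reasons.append("insufficient buffer quota")
    return ("Admit" if not reasons else "Reject", reasons)

flow = {"max_frame_bytes": 512, "period_us": 250.0}
hop = {"link_rate_bps": 1e9, "free_slot_us": 20.0,
       "free_rate_bps": 50e6, "free_buffer_bytes": 4096}
decision, why = admit(flow, hop)
```

A full implementation would iterate this over every hop on the path and emit a versioned reservation contract on Admit; the point here is that the check is against slot/buffer peaks, not average bandwidth.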
Admission worksheet (template)
| Flow ID | Owner | Period | Max frame | Priority/TC | Path | Bound target | Reservation per hop | Monitor & thresholds | Admit? |
|---|---|---|---|---|---|---|---|---|---|
| F-001 | Line-A | X | X | TC0 / Q0 | P1→P3→P7 | Latency X, Jitter X | slot X, shaper X, buf X | wm X, miss X, drop X | Admit |
| F-002 | Diagnostics | X | X | BE / Qx | P2→P6 | Loss X | policer X, buf X | police-hit X, drop X | Conditional |
| … | … | … | … | … | … | … | … | … | Version |
Operational rule: any new flow must be recorded, evaluated, and admitted with a versioned contract; direct “add it and see” changes convert determinism into a tuning lottery.
Isolation rules (principles that prevent configuration collisions)
Principles
- Critical classes must have dedicated queue/TC mapping and independent bounds.
- Best-effort and diagnostics must never share a critical queue.
- Every admitted class must have scoped counters and thresholds (X).
Enforcement points
- Namespace configs by tenant / line / cell to avoid cross-edits.
- Lock “critical zones” (queues/slots) behind approval and rollback packaging.
- Assign hard quotas per tenant: bandwidth/slot/buffer/counter budgets (X).
Runtime monitoring & violation handling (contract must be provable)
- Monitor: queue occupancy peak, watermark hits, window-miss/late counters, drop reasons, policer hits.
- Detect: threshold breach (X) tied to a specific flow/tenant and port.
- Act: rate-limit, downgrade to BE, isolate to a sacrificial queue, or reject renewal.
- Audit: log config version + violation context for rollback and root-cause.
Admission pipeline: request → resource calc → policy → deploy → monitor → handle violations
The key requirement is traceability: admission decisions, reserved resources, and violation actions must all be tied to a versioned configuration and an auditable owner/tenant boundary.
Not covered here (scope guard)
- Industrial protocol stack parameters and certification procedures.
- Timing/sync algorithm internals (handled in Timing & Sync pages).
Hardware Timestamping & Time Base: Accuracy, Two-Step, and Residence Time
Why timestamps must be trustworthy
Deterministic forwarding still fails closed-loop control if timestamp tap points drift or time base states are not observable. This section focuses on hardware timestamp insertion points, residence time observability, one-step vs two-step engineering trade-offs, and a practical error budget structure.
Time base sources and observable states (no algorithm details)
- Local oscillator: simple deployment; requires drift monitoring and defined holdover triggers (X).
- Synchronized reference: better long-term accuracy; requires lock/offset/drift observability.
- Recovered/servo time: transitions into holdover on loss; must define re-lock behavior and alarm thresholds (X).
Minimum observables (sanity signals)
- Lock state (stable for X time)
- Offset (within X) and drift rate (within X)
- Holdover entry conditions and maximum allowed duration (X)
Timestamp tap points: where error is created
- Ingress timestamp: captures arrival into the switch pipeline; determines start of residence time.
- Egress timestamp: captures departure; determines the tail behavior under congestion and shaping.
- MAC vs PHY boundary: a more line-side tap reduces unmodeled delay; an inner tap is easier but adds tap uncertainty (X).
Residence time and one-step vs two-step (engineering view)
Residence time
The measurable quantity is the time spent inside the device from ingress tap to egress tap. Verification should focus on tail percentiles (P99/P999) with thresholds X, not only averages.
One-step vs two-step
- One-step: in-line update at line rate; demands tight implementation and validation.
- Two-step: follow-up event simplifies implementation; requires robust event matching and consistency checks.
- Debug impact: two-step can improve forensics by separating transmission and correction paths.
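The tail-percentile check on residence time can be sketched as follows. The timestamp data here is synthetic (a rare queueing tail injected every 100th frame), and the 25 us threshold stands in for the X value.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile on a pre-sorted list."""
    k = max(0, min(len(sorted_vals) - 1, int(round(p / 100 * len(sorted_vals))) - 1))
    return sorted_vals[k]

# Synthetic ingress/egress timestamps (us): one frame per 250 us cycle,
# 8 us nominal residence, +12 us tail on every 100th frame.
t_in = [i * 250.0 for i in range(1000)]
t_out = [t + 8.0 + (12.0 if i % 100 == 99 else 0.0)
         for i, t in enumerate(t_in)]
residence = sorted(o - i for i, o in zip(t_in, t_out))

p99 = percentile(residence, 99.0)
p999 = percentile(residence, 99.9)
tail_ok = p999 <= 25.0   # threshold X, example only
```

Note that the P99 here looks clean while the P999 exposes the tail, which is why averages and low percentiles cannot substitute for a bound.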
Timestamp error budget (structure, thresholds as X)
| Error term | What it represents | How to observe | Threshold |
|---|---|---|---|
| Time base | offset/drift/holdover effects | lock/offset/drift counters | X |
| Quantization | timestamp resolution limit | timestamp LSB and distribution | X |
| Tap uncertainty | MAC/PHY boundary ambiguity | calibration + loopback compare | X |
| Queueing tail | residence time percentile tail under load | P99/P999 residence time (X) | X |
Pass criteria suggestion: treat time base and tap uncertainty as bounded terms, and validate queueing tail with a load profile representative of production (thresholds X).
Sync sanity checklist (commissioning & field ops)
- Lock: stable for X time; no flapping.
- Offset: within X; verify after temperature steps.
- Drift: within X per unit time; flag trend increases.
- Holdover: trigger conditions defined (X), and maximum allowed duration (X).
- Recovery: post re-lock behavior verified; re-calibrate window phase if required.
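The checklist reduces to a monitoring predicate over the time-base observables. Field names and thresholds below mirror the X placeholders and are examples only.

```python
def time_base_ok(status, max_offset_ns=100, max_drift_ppb=50,
                 max_holdover_s=10, min_lock_s=60):
    """status: hypothetical dict of observables from the time-sync status log."""
    checks = {
        "lock_stable": status["locked"] and status["lock_age_s"] >= min_lock_s,
        "offset":      abs(status["offset_ns"]) <= max_offset_ns,
        "drift":       abs(status["drift_ppb"]) <= max_drift_ppb,
        "holdover":    status["holdover_s"] <= max_holdover_s,
    }
    return all(checks.values()), [k for k, ok in checks.items() if not ok]

ok, failed = time_base_ok({"locked": True, "lock_age_s": 300,
                           "offset_ns": 40, "drift_ppb": 12, "holdover_s": 0})
```

The returned failure list is what should feed alarms and, per the configuration chapter, gate any schedule commit.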
Timestamp tap points: ingress/egress (MAC vs PHY boundary) and error sources
Practical verification: document the exact tap definition, treat residence time as a percentile distribution, and require time base lock/offset/drift observability with clear holdover triggers (X).
Not covered here (scope guard)
- PTP/SyncE/White-Rabbit protocol algorithm details and topology selection.
- PHY signal integrity and magnetics/layout considerations.
Configuration & Parameterization: GCL Tables, Profiles, and Guardrails
Why configuration must be engineered (not “tuned”)
TSN failures often come from table drift, partial updates, and unmanaged profiles rather than incorrect theory. A deterministic network needs configuration assets that are packaged, validated, staged, committed atomically, and rolled back reliably.
Configuration inventory: the tables a TSN switch/bridge must manage
Data-plane tables
- GCL: cycle, slots, gate masks, guard band (X).
- Stream/flow table: period, max frame, path, class, burst policy.
- Queue/TC mapping: priority → TC → queue; no holes or critical sharing.
- Shaper params: CBS / ATS / rate-limit knobs aligned to the chosen profile.
- Admission quotas: per-hop reservations and violation actions.
Time & timestamp configuration
- Time-base policy: lock/offset/drift/holdover thresholds (X).
- Timestamp mode: one-step/two-step, tap definition, residency measurement enable.
- Update gating: no schedule commit when time base is not locked.
Ops guardrails
- Thresholds & alarms: gate miss, late/early window, watermark, timestamp jump, holdover events.
- Rollback strategy: shadow bank + atomic commit + complete rollback pack.
- Audit: every applied profile ties to a version id and change record.
Parameter Set package: directory rules for a versioned config bundle
- /manifest.json — version, target device, port-rate constraints, prerequisites, checksums.
- /tables/gcl.csv — cycle, slot list, gate masks, guard band (X).
- /tables/queue_map.json — priority/TC/queue mapping rules.
- /tables/streams.csv — flow descriptors, path and bound targets.
- /tables/shapers.json — CBS/ATS/rate-limit parameters.
- /tables/quotas.json — admission reservations and violation actions.
- /ops/thresholds.json — alarms and thresholds (X).
- /ops/rollback/ — complete rollback pack (all dependent tables).
- /checks/ — offline guardrails: slot min, GCL depth, map integrity, time-lock gates.
Design rule: configuration is not a scattered set of register writes; it is a deployable asset with dependencies, validation, and rollback semantics.

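The manifest's checksum list is what makes the bundle a closed dependency set. A sketch of the offline integrity check, assuming the manifest carries a `checksums` map of path to SHA-256 hex digest (the structure is hypothetical, matching the `/manifest.json` description above):

```python
import hashlib

def validate_bundle(manifest: dict, files: dict) -> list:
    """Offline guardrail: every table listed in the manifest must be present
    and match its recorded SHA-256; files outside the manifest are flagged
    so the bundle stays a closed, auditable dependency set."""
    problems = []
    for path, expected in manifest["checksums"].items():
        data = files.get(path)
        if data is None:
            problems.append(f"missing: {path}")
        elif hashlib.sha256(data).hexdigest() != expected:
            problems.append(f"checksum mismatch: {path}")
    problems += [f"unlisted file: {p}" for p in files
                 if p not in manifest["checksums"]]
    return problems

gcl_bytes = b"cycle,slot,gate_mask\n1000000,200000,0x01\n"
manifest = {"checksums": {"tables/gcl.csv": hashlib.sha256(gcl_bytes).hexdigest()}}
assert validate_bundle(manifest, {"tables/gcl.csv": gcl_bytes}) == []
```

Rejecting the bundle here, before staging, is far cheaper than discovering a tampered or stale table after an atomic commit.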
Safe change process: avoid partial updates that “hang” the network
Recommended sequence
- Offline validate: manifest integrity + dependency checks.
- Stage: load into shadow bank (not active).
- Pre-check: time base lock/offset/drift within thresholds (X).
- Atomic commit: switch at a defined boundary (e.g., cycle edge).
- Canary rollout: a subset of nodes first; watch gate/window/queue tails.
- Rollback-ready: auto revert if key counters exceed thresholds (X).
- Audit: bind events and snapshots to config version.
Typical failure mode
A schedule update becomes unsafe when only part of the dependency set is applied (GCL updated, but queue map or shaper parameters remain old). Guardrails should enforce “all-or-nothing” activation or reject the commit.
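The "all-or-nothing" rule and the time-lock gate can be sketched as a single commit decision. The table names and return shape below are illustrative, not a device API:

```python
REQUIRED_TABLES = {"gcl", "queue_map", "shapers", "quotas", "thresholds"}

def commit_decision(staged: set, time_locked: bool, in_holdover: bool):
    """All-or-nothing activation: reject the commit unless the full
    dependency set is staged AND the time base is usable."""
    missing = REQUIRED_TABLES - staged
    if missing:
        return False, "reject: partial bundle, missing " + ", ".join(sorted(missing))
    if not time_locked or in_holdover:
        return False, "reject: time base unlocked or in holdover (time-lock gate)"
    return True, "commit at next cycle edge"

# GCL staged alone must be rejected -- this is the classic partial-update hang
ok, reason = commit_decision({"gcl"}, time_locked=True, in_holdover=False)
assert not ok
```

Note that the two rejections are independent: a complete bundle is still refused while the time base is in holdover, and a locked time base does not excuse a partial bundle.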
Guardrails: rules that prevent unsafe parameter sets
- Slot minimum: slot ≥ X (depends on max frame, link rate, and gate switch latency).
- Guard band presence: critical window transitions reserve guard band X.
- GCL depth: entry count ≤ X; reject truncated lists.
- Map integrity: TC→queue mapping has no holes; critical queues are not shared.
- Time-lock gate: no commit when time base is unlocked or in holdover.
- Rollback completeness: rollback must include all dependent tables (GCL + map + shapers + thresholds).
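The slot-minimum rule above is straightforward arithmetic: the window must fit the worst-case frame as it appears on the wire, plus the silicon's gate switch latency. A sketch (the 500 ns gate latency is an assumed example value, not a datasheet figure):

```python
def min_slot_ns(max_frame_bytes: int, link_rate_bps: int, gate_switch_ns: int) -> float:
    """Slot-minimum guardrail: a window must fit the worst-case frame on the
    wire (frame + 8 B preamble/SFD + 12 B inter-frame gap) plus the gate
    switch latency; any GCL slot shorter than this should be rejected."""
    wire_bits = (max_frame_bytes + 8 + 12) * 8
    return wire_bits * 1e9 / link_rate_bps + gate_switch_ns

# 1522 B VLAN-tagged frame at 1 Gbit/s with an assumed 500 ns gate latency
assert abs(min_slot_ns(1522, 1_000_000_000, 500) - 12_836.0) < 1e-6
```

The same serialization term sizes the guard band: a closed window must open no earlier than one worst-case serialization time after the last possible best-effort transmission start.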
When changes require re-computation (which tables must be re-validated)
| Change event | Tables impacted | Must re-check |
|---|---|---|
| Time topology change | time policy, timestamp mode, guard band margin | lock/offset/drift (X), window phase margin (X) |
| Path / hop change | streams, quotas, queue map | per-hop reservation, isolation integrity, tail percentiles (X) |
| Port rate change | GCL slot lengths, guard band, shapers | slot minimum (X), serialization budget, window boundary tests |
| Queue resource change | queue map, GCL masks, shaper policies | no holes, no critical sharing, watermark caps (X) |
Config bundle: a versioned Parameter Set staged to shadow banks and committed atomically across switches
A safe rollout requires dependency completeness (tables move together), pre-commit time-lock checks, and a rollback pack that restores all coupled parameters.
Not covered here (scope guard)
- Timing/sync algorithm details and protocol parameter internals.
- Physical layer SI/layout/magnetics and EMC design details.
Verification & Observability: Prove Bounds and Catch Field Drift
Determinism is only real when bounds are provable
Verification must demonstrate time stability, schedule correctness, and queue/shaper contract compliance under realistic worst-case load. Field observability must correlate errors with configuration versions and environmental conditions to enable fast forensics.
What must be proven (three bound categories)
- Time stability: lock/offset/drift are stable; holdover events are controlled (X).
- Schedule correctness: GCL is active; no late/early windows; gate misses near zero (X).
- Contract compliance: queue watermarks and drop reasons remain within limits; tail percentiles meet bounds (X).
Counters and events: organize by diagnostic value
Schedule & gate
- gate miss, late window, early window (threshold X)
- window transition counts and anomalies
Queue & buffer
- per-queue drops with drop reasons (overflow/policer/gate closed)
- watermark peaks and microburst indicators (X)
Timestamp & time base
- timestamp jump, lock state changes, holdover enter/exit events
- offset/drift trends (X) and stability windows
Port health (switch view)
- link flap events and transitions
- per-port error counters (concept-level)
Operational requirement: every snapshot must bind to port + queue + (optional) flow id and the config version.
Correlate failures to environment + version + events (field forensics)
- Error: gate miss spike / watermark over / tail percentile jump / drop reason change.
- Environment: temperature, voltage, power events.
- Version: config_version + change record.
- Link events: flap transitions as a frequent trigger source.
Bring-up test plan (minimum closed loop)
| Test | Method | Observables | Pass criteria |
|---|---|---|---|
| Sync lock test | stabilize and step temperature / load | lock, offset, drift, holdover events | stable within X |
| GCL boundary | probe near window edges | late/early window, gate miss | ≤ X |
| Worst-case background | stress BE + burst patterns | watermark, drops, tail percentiles | P99/P999 ≤ X |
| E2E bound proof | measure per-hop + end-to-end distribution | latency/jitter/loss distributions | upper bound ≤ X |
Field black-box schema (flight-recorder style)
- timestamp: local time + sync state
- config_version: active Parameter Set id
- event_type: holdover_enter, gate_miss_spike, watermark_over, link_flap, …
- context: port, queue, (optional) flow_id
- counters_snapshot: gate/window/queue/drop/time-base counters
- env_snapshot: temperature, voltage, power-event flags
- action_taken: rate-limit / downgrade / rollback / alert
Trigger strategy: periodic snapshots every X seconds plus event-triggered snapshots on threshold crossing, stored in a ring buffer of size X.
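The schema and ring-buffer trigger strategy above can be sketched directly; the field names follow the schema list, while the class and method names are illustrative:

```python
from collections import deque
from dataclasses import dataclass, field
import time

@dataclass
class Snapshot:
    timestamp: float
    config_version: str
    event_type: str                               # "periodic", "gate_miss_spike", ...
    context: dict = field(default_factory=dict)   # port, queue, (optional) flow_id
    counters: dict = field(default_factory=dict)  # gate/window/queue/drop/time-base
    env: dict = field(default_factory=dict)       # temperature, voltage, power flags
    action_taken: str = "none"

class BlackBox:
    """Flight-recorder ring buffer: old snapshots fall off automatically,
    so the most recent history around an event is always retained."""
    def __init__(self, depth: int):
        self.ring = deque(maxlen=depth)
    def record(self, snap: Snapshot):
        self.ring.append(snap)
    def around(self, event_type: str):
        return [s for s in self.ring if s.event_type == event_type]

bb = BlackBox(depth=4)
for i in range(6):   # 6 snapshots into a depth-4 ring: the first two drop off
    bb.record(Snapshot(time.time(), "v1.2.0", "periodic", counters={"seq": i}))
assert len(bb.ring) == 4 and bb.ring[0].counters["seq"] == 2
```

Because every snapshot carries `config_version`, a field dump can answer "what changed" without correlating against a separate change log.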
Observe & correlate: traffic → counters → event log → diagnosis (with env + version)
A useful black box ties counters and events to the active config version and environment snapshot, enabling fast “what changed” answers and safe rollback decisions.
Not covered here (scope guard)
- Industrial stack certification steps and protocol parameter deep-dives.
- Timing/sync algorithm internals and PHY SI/layout details.
Engineering Checklist: Design → Bring-up → Production Gates
Engineering gates that turn mechanisms into repeatable outcomes
Each gate enforces resource sufficiency, time stability, configuration integrity, and proof-quality evidence so determinism does not depend on manual tuning.
Resources & pipeline capacity
- Check: queue/TC count covers isolation (no critical sharing). Pass: isolation plan has no overlaps (X). Evidence: queue map + profile sheet.
- Check: per-port buffer covers worst-case microburst. Pass: watermark budget ≥ X. Evidence: buffer sizing worksheet.
- Check: GCL depth/slot count supports cycle composition. Pass: entry count ≤ X, slot min ≥ X. Evidence: GCL template + guardrail report.
- Check: shaper instances and granularity match profile. Pass: all required shapers available (X). Evidence: shaper allocation table.
Time base & commit safety
- Check: lock/offset/drift/holdover policy is defined. Pass: thresholds documented (X). Evidence: time policy card + manifest prereqs.
- Check: “time-lock gate” blocks schedule commits when unlocked. Pass: commit is rejected under unlock/holdover. Evidence: guardrail rule + test record.
Config asset & observability readiness
- Check: Parameter Set package is versioned and complete. Pass: manifest + dependent tables present. Evidence: config bundle tree + checksums.
- Check: counters/events cover schedule + queue + time base. Pass: must-have signals enabled (X). Evidence: counter map + black-box schema.
Time stability & schedule correctness
- Check: sync lock remains stable for X minutes. Pass: offset/drift within X; holdover events within X. Evidence: lock trend + event log.
- Check: GCL activation is correct. Pass: late/early window ≤ X; gate miss ≤ X. Evidence: counter snapshots tied to version.
- Check: window boundary probe is clean. Pass: no boundary violations (X). Evidence: boundary test record.
Worst-case background stress
- Check: key queues stay below watermark limits. Pass: watermark ≤ X; no overflow drops. Evidence: per-queue watermark log.
- Check: tail percentiles meet bounds. Pass: P99/P999 ≤ X. Evidence: latency distribution export.
- Check: drop reasons are explainable. Pass: expected-only drop reasons. Evidence: drop-reason breakdown.
Change safety rehearsal
- Check: shadow load + atomic commit works at cycle boundary. Pass: no partial activation. Evidence: commit audit with version ids.
- Check: rollback triggers on threshold crossing. Pass: rollback completes and restores bounds (X). Evidence: rollback event + post-rollback counters.
Config governance
- Check: every device boots with a known config_version. Pass: version is readable and logged. Evidence: boot log + inventory report.
- Check: dependency completeness is enforced. Pass: reject incomplete bundles. Evidence: guardrail rejection record.
Rollback readiness
- Check: rollback drill is performed and timed. Pass: restore within X; bounds recover. Evidence: drill report + counters before/after.
- Check: no-commit under unlock/holdover is enforced. Pass: commit blocked. Evidence: audit + event log.
Field black-box completeness
- Check: snapshot includes version + env + counters. Pass: required fields present. Evidence: schema validation report.
- Check: data capture completeness exceeds X%. Pass: completeness > X%. Evidence: periodic audit summary.
3-gate pipeline: design → bring-up → production → field forensics (pass criteria at each gate)
Each gate ties pass criteria to measurable counters, snapshots, and the active configuration version, enabling consistent outcomes across design, lab, production, and field.
Scope guard (not expanded here)
- PHY/layout/magnetics/EMC implementation details (refer to PHY co-design & protection pages).
- Timing algorithm internals and industrial stack certification procedures.
Applications: Where TSN Switch/Bridge Actually Pays Off
Use cases should map to mechanism recipes (not standard clause lists)
Each scenario is defined by workload patterns and bound targets. The mechanism recipe selects Gate/Shaper/Admission/Timestamp/Observe as a coherent set.
Use-case → mechanism recipe (concept-level)
| Use case | Bound target (X) | Gate | Shaper | Admission | Timestamp | Observe |
|---|---|---|---|---|---|---|
| Motion control | bounded latency + jitter (X) | on (windowed cyclic) | optional (BE smoothing) | required (no new flow breaks bounds) | optional (alignment proof) | gate/window counters + tail P99/P999 |
| Machine vision / imaging | burst control + trigger determinism (X) | sometimes (trigger windows) | required (microburst suppression) | recommended (capacity reservation) | recommended (event alignment) | watermark + drop reason + tail shift |
| PLC + distributed I/O | cyclic + acyclic coexistence (X) | optional (hard partitions) | recommended (class-based shaping) | recommended (resource isolation) | optional | queue isolation + BE starvation watch |
| Robot cell / multi-axis | multi-flow coordination bounds (X) | on when cyclic is strict | optional (tail control) | required (prevent “tuning lottery”) | recommended | per-flow quotas + violation events |
| Power / rail / utility | time integrity + auditability (X) | optional | recommended (traffic smoothing) | recommended | required (trusted stamps) | timestamp jump + holdover + event log |
| Edge gateway / bridging | mixed domains + safe updates (X) | optional | recommended | required (policy isolation) | recommended | versioned config + correlation black-box |
Mechanism recipes should be validated using the proof hooks defined in verification and observability, then locked into a versioned Parameter Set.
Recipe patterns (quick reference)
Cyclic hard bounds
Gate + Admission + Window counters. Shaper is secondary for smoothing non-critical traffic.
Burst-heavy data paths
Shaper + Watermark telemetry + Drop-reason breakdown. Gate is used only when trigger windows are strict.
Multi-tenant and expansion-safe
Admission quotas + strict isolation + versioned config bundles prevent new flows from breaking existing guarantees.
Recipe cards: each use case enables a different mechanism set (Gate / Shaper / Admission / Timestamp / Observe)
Recipes should be exported as versioned profiles and validated with schedule counters, watermark telemetry, and time-base stability checks.
Scope guard (not expanded here)
- Protocol stack specifics and certification checklists for PROFINET/EtherCAT/CIP.
- PHY/magnetics/EMC implementation detail and connector-level design.
IC Selection Logic (TSN Switch / Bridge)
This section avoids “product dumping” and instead defines a repeatable selection method: requirements → TSN mechanisms → resource sufficiency → verifiable observability → operable configuration governance. The goal is to filter out parts that claim TSN support but cannot prove or sustain deterministic bounds in production and field operation.
Selection funnel (5 steps that prevent “tuning lottery”)
- Hard constraints — port count / port speed / host interface / thermal & package / industrial temp grade. Output: shortlist that physically fits the design.
- Determinism profile — target bounds for latency, jitter, loss, and time error, plus traffic shape (periodic / bursty / mixed). Output: which mechanisms are mandatory.
- Mechanism coverage — gate windows, shaping, admission control, and hardware timestamps. Output: “must-have list” per use-case.
- Resource sufficiency — queue count, per-port buffer, GCL depth, shaper instances, policing granularity. Output: proof that worst-case traffic still fits.
- Operability — counters, events, mirror/trace aids, configuration bundle + rollback + safe commit. Output: ability to keep bounds stable over the lifecycle.
Reference material examples (non-exhaustive): Microchip LAN9662 / LAN9668; NXP SJA1105P/Q/R/S / SJA1110; Renesas RZ/N2L / RZ/T2M. Use these part numbers as anchors for capability checklists, not as an implied “best choice”.
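The funnel can be closed with a weighted scorecard in which "must-have" mechanisms act as hard gates rather than weighted points. A sketch, with hypothetical dimension names and weights:

```python
def score_part(weights: dict, results: dict, must_have: set):
    """Weighted scorecard with hard gates: any failed must-have dimension
    rejects the part outright; otherwise return the weighted pass ratio."""
    for dim in must_have:
        if not results.get(dim, False):
            return False, 0.0, f"rejected: must-have '{dim}' failed"
    total = sum(weights.values())
    earned = sum(w for dim, w in weights.items() if results.get(dim, False))
    return True, earned / total, "candidate"

weights = {"ports": 2, "gcl_update": 3, "timestamps": 3, "observability": 2}
results = {"ports": True, "gcl_update": True, "timestamps": True,
           "observability": False}
ok, score, verdict = score_part(weights, results,
                                must_have={"timestamps", "gcl_update"})
assert ok and abs(score - 0.8) < 1e-9
```

The hard-gate behavior is the point: a part that scores well overall but lacks verifiable timestamps never reaches the ranked list, which is exactly what "must-have rows pass with measurable evidence" means below.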
Example part buckets (how to read “fit”, not a shopping list)
- Industrial TSN switch with integrated CPU: Microchip LAN9668 (orderable examples: LAN9668-9MX, LAN9668-I/9MX). Fit: multi-port gateways / remote IO / TSN edge switches.
- Compact TSN end-point / small switch role: Microchip LAN9662 (orderable example: LAN9662/9MX). Fit: TSN-capable endpoints or small-port bridge designs.
- Automotive TSN switch family (AVB/TSN focus): NXP SJA1105P / SJA1105Q / SJA1105R / SJA1105S. Fit: deterministic multi-domain aggregation when safety/security hooks matter.
- Multi-gig safe & secure TSN Ethernet switch SoC: NXP SJA1110 family. Fit: TSN switching with security/safety features and strong ecosystem.
- MCU/MPU with integrated TSN-compliant small switch: Renesas RZ/N2L, RZ/T2M. Fit: TSN bridge / controller designs where compute + small-port TSN is sufficient.
Tip: selection should be driven by mechanism + resource + observability, not by a single “TSN supported” checkbox.
Selection scorecard (dimension → why → how to verify → pass (X) → red flags)
| Dimension | Why it matters | How to verify (engineering) | Pass criteria (X) | Red flags / reject fast |
|---|---|---|---|---|
| Ports & speeds (example materials: LAN9662, LAN9668, SJA1105P/Q/R/S, SJA1110) | Determines how many deterministic flows can be isolated without queue sharing. Port speed impacts guard band sizing and worst-case serialization time per hop. | Confirm port modes, link partners, and CPU/host port bandwidth. Validate that worst-case frame + background traffic still meets the end-to-end bound. | Ports ≥ X; required speeds supported; CPU/host port not a bottleneck (utilization < X%). | CPU/host port saturates under mirror/telemetry. “TSN supported” but only on a subset of ports or queue classes. |
| Switching mode (store-and-forward vs cut-through) | Impacts per-hop latency shape and how errors are contained. Determinism needs bounded delay, not just average. | Measure per-hop forwarding delay (ingress timestamp → egress timestamp) under load and with gate windows enabled. | Per-hop delay upper bound ≤ X; mode is stable across MTU and VLAN/priority mixes. | Cut-through path lacks deterministic gating integration (gate applies “after” unpredictable pipeline stages). |
| TSN mechanism coverage | Different traffic types require different tools: time windows for strict periodic, shaping for mixed loads, admission to keep bounds valid when new streams appear. | Map each required flow class to one of: Gate / Shaper / Policer / Admission / Timestamp. Verify all required blocks are hardware-backed and have counters. | Required blocks present; each block has measurable state & events; no “software-only” critical path. | “Supported” but missing gate-miss / timestamp-jump / policing-drop reasons. |
| Queue count, per-class isolation & mapping | If critical traffic shares queues with best-effort, tail jitter and microbursts become unbounded and hard to debug. | Confirm independent queues per traffic class, queue depth controls, and queue-level watermark counters. | Dedicated queues for critical classes; mapping has no “holes”; watermark visibility available. | Queue mapping cannot be audited; only port-level drops exist (no per-queue attribution). |
| GCL depth, time resolution & safe update | Determinism relies on the schedule being representable without compressing slots or merging classes. Updates must not “tear” live traffic. | Check: max entries, minimum slot time, and whether shadow/atomic commit exists for GCL. Validate “late/early window” counters during schedule changes. | GCL depth ≥ X; min slot ≤ X; schedule switch without packet loss beyond X ppm. | No shadow table; updates require “stop traffic” maintenance windows. |
| Guard band & preemption support (when needed) | Guard bands protect time windows from being invaded by long frames. Preemption reduces wasted guard time at higher utilization. | Validate calculated guard time vs measured gate edge behavior under maximum MTU background traffic. Confirm counters exist for “late/blocked due to guard”. | No gate-edge violations; guard time margin ≥ X; preemption behavior is deterministic when enabled. | Preemption “supported” but cannot be validated (no counters / no clear enable scope). |
| Shaping instances & stability under microbursts | Many designs do not need strict gating everywhere. Shapers must remain stable under burst, credit resets, and queue coupling. | Stress test with bursty best-effort plus periodic flows. Observe per-queue occupancy, drop reasons, and shaping state transitions. | Tail latency bound holds under worst-case background load; no unexplained credit “jumps”. | No per-queue occupancy; only aggregate port counters exist (debug becomes guesswork). |
| Admission control & resource accounting | Without admission, determinism can be invalidated the day a new stream is added. Admission is what turns “tunable” into “guaranteed”. | Confirm: per-stream descriptors, resource model (bandwidth / slot / queue / buffer), and enforcement. Validate rejection behavior and “violation events”. | Admission decision is explainable; violations are logged; rejection occurs before bounds are harmed. | “Admission” exists only as a software convention (no hardware enforcement / no event logs). |
| Hardware timestamping & tap-point clarity | Timestamp error couples into scheduling and closed-loop control. If the tap point is unclear, residence time and correction become unverifiable. | Confirm ingress/egress timestamp capture paths, correction reporting, and “timestamp jump / holdover” events. | Timestamp error budget ≤ X; tap point documented; residence time observable per hop. | Hardware timestamps exist, but no access to raw capture records or no error/step detection. |
| Counters, events, mirror/trace aids | Field failures are mostly diagnosability failures. The minimum set must isolate: queue drops, gate misses, timing anomalies, and configuration version. | Require per-port + per-queue counters, gate-miss/late/early flags, timestamp events, plus mirror support for capture. | Black-box completeness > X%; event-to-counter correlation works with config version tagging. | Only link-level counters exist; no queue-level attribution; no gate miss observability. |
| Configuration governance (bundle, shadow, rollback) | TSN failures are often configuration failures. Without safe updates, networks freeze and field drift becomes unmanageable. | Verify: config package versioning, dependency checks, atomic commit (shadow → swap), and rollback rehearsal. | Rollback success rate ≥ X%; shadow swap time ≤ X; “time base unlocked” prevents schedule activation. | Schedule/config updates require reboot or uncontrolled transient behavior; no “safe commit” path. |
Scorecard usage: assign weights according to the determinism profile, then require that every “must-have” row passes verification with measurable evidence.
Bring-up verification hooks (must exist before committing a part)
1) Traffic generation hooks
- Ability to inject periodic flows + bursty best-effort simultaneously (worst-case background load).
- Repeatable stress profiles: microburst, long-frame invasion attempts, mixed priority classes.
- Pass criteria: P99.999 latency ≤ X, jitter ≤ X, and zero unexplained drops in the critical class.
2) Data-plane self-test hooks
- Port/pipeline loopback modes (to separate “configuration vs environment” quickly).
- PRBS / built-in traffic test (if available) to validate datapath stability without external complexity.
- Pass criteria: self-test completes with error counters stable within X / hour.
3) Time & timestamp hooks
- Ingress/egress timestamp capture with clear tap-point definition.
- Residence time / correction evidence path (concept-level requirement).
- Events: timestamp jump, timebase unlock, holdover enter/exit.
- Pass criteria: time error ≤ X; no timestamp step under stress; holdover behavior matches guardrails.
4) Gate/shaper/queue enforcement hooks
- Counters: per-queue drop, per-queue watermark, gate miss, late/early window, policing drops (with reasons).
- Schedule update safety: shadow table + atomic swap, plus “deny activation if timebase not locked”.
- Pass criteria: zero gate-edge violations; all drops are attributable and bounded within X.
Reject-fast red flags (high probability of “un-debuggable determinism”)
- No gate miss / late/early window counters → schedule failures cannot be proven or localized.
- Hardware timestamps exist but tap-point is unclear or inaccessible → time error budget cannot be validated.
- Only port-level drops, no per-queue attribution → microburst and starvation become guesswork.
- No shadow/atomic commit for GCL and mapping tables → updates introduce uncontrolled transient behavior.
- CPU/host port becomes the bottleneck under observability → deterministic network collapses when debugging is needed most.
Decision output template (copy/paste into a design review)
Diagram — Selection funnel + scorecard + bring-up hooks (concept map)
The diagram is intentionally mechanism- and evidence-oriented: the part number is only useful if it passes the scorecard with measurable proofs.
FAQs (TSN Switch / Bridge)
Scope: long-tail troubleshooting only. Each answer follows a fixed, measurable format — Likely cause / Quick check / Fix / Pass criteria — with threshold placeholders X/Y.
Q1. Enabling Qbv causes sporadic packet loss — guard band underestimated or gate switch overhead ignored?
- Likely cause: T_guard smaller than worst-case frame serialization + internal gate edge latency; the gate schedule allows a long BE frame to overlap a critical window.
- Quick check: watch gate_miss_cnt, late_window_cnt, per_queue_drop_cnt during max-MTU background traffic; correlate drops with gate edge timestamps.
- Fix: recompute T_guard using worst-case MTU and link rate; move the BE queue to a fully closed window around critical slots; enable/validate preemption only if the utilization loss is unacceptable.
- Pass criteria: gate_miss_cnt=0 and late_window_cnt=0 over Y cycles; critical-class drops ≤ X ppm under worst-case background load.
Q2. End-to-end latency percentiles look great, but rare spikes violate the hard bound — microburst or admission leak?
- Quick check: compare p99_999_e2e_us vs max_e2e_us; inspect per_queue_watermark near spikes; check admission_violation_evt and policer_drop_reason.
- Pass criteria: max_e2e_us ≤ X over Y minutes of worst-case stress; per_queue_watermark never exceeds X% of capacity; admission_violation_evt=0.
Q3. Time sync shows “locked”, yet windows are shifted — timestamp tap point mismatch or time-base step?
- Quick check: inspect timebase_step_evt, holdover_evt, ts_jump_cnt; measure window edge error win_edge_err_ns at ingress/egress stamps per hop.
- Pass criteria: timebase_step_evt=0 and ts_jump_cnt=0 over Y minutes; absolute win_edge_err_ns ≤ X on every hop.
Q4. Preemption enabled, but throughput drops or retransmissions spike — fragment handling or queue mapping issue?
- Quick check: inspect preempt_frag_cnt, preempt_abort_cnt, reassembly_err_cnt, and per-queue watermark; confirm mapping table prio_to_queue_map matches design intent.
- Pass criteria: reassembly_err_cnt=0; retransmission indicators (CRC/retry counters) remain within X ppm.
Q5. One traffic class is always starving — CBS/ATS parameters wrong or gate schedule conflict?
- Quick check: inspect per_queue_tx_bytes (starved queue flat), cbs_credit_min/cbs_credit_reset_cnt, and gate open ratio gate_open_time_us per cycle.
Q6. Determinism breaks after topology change — multi-hop schedule alignment or residence-time update missing?
- Quick check: compare hop_count and per-hop res_time_ns before/after; check win_edge_err_ns growth across hops; verify config bundle version consistency across switches.
- Pass criteria: win_edge_err_ns ≤ X; end-to-end max_e2e_us ≤ X; no schedule/bundle mismatch events over Y minutes.
Q7. Temperature change triggers window miss — oscillator drift or aggressive holdover strategy?
- Likely cause: drift accumulates offset_ppb until schedule edges misalign; holdover enter/exit introduces a step or higher wander.
- Quick check: correlate temp_c vs offset_ns/drift_ppb; inspect holdover_evt and timebase_step_evt; count late_window_cnt.
- Pass criteria: drift_ppb ≤ X across operating temperature; timebase_step_evt=0; late_window_cnt=0 over Y thermal cycles.
Q8. Measured latency is significantly higher than theory — store-and-forward path or buffer watermark throttling?
- Quick check: measure per-hop delay ingress_ts→egress_ts; inspect store_fwd_active state; check per_queue_watermark and throttle_evt.
- Pass criteria: throttle_evt=0 during bounded-load tests; max_e2e_us ≤ X.
Q9. Background traffic increases critical jitter — queue isolation missing or ingress policing absent?
- Quick check: audit prio_to_queue_map, per_queue_watermark, policer_drop_cnt; validate that the critical queue has a dedicated gate/shaper.
- Pass criteria: jitter_us ≤ X under worst-case BE load; critical queue watermark stays below X%; critical drops ≤ X ppm.
Q10. After configuration update, the network sporadically flaps — bundle inconsistency/rollback gap or time-base relock?
- Quick check: compare config_bundle_ver across nodes; inspect atomic_swap_evt, rollback_evt, timebase_unlock_evt; correlate with link_flap_evt.
- Pass criteria: config_bundle_ver identical on all nodes; link_flap_evt ≤ X per Y hours; timebase_unlock_evt=0 during the swap window.
Q11. Mirroring/capture shows frame order “scrambled” — true multi-queue reordering or preemption visualization artifact?
- Quick check: track seq_id; inspect mirror_port_util, preempt_frag_cnt, and per-queue drain counters.
- Pass criteria: seq_id monotonic at egress; mirror overhead mirror_port_util ≤ X%; no unexplained reorder events over Y minutes.
Q12. Admission passes, yet congestion still happens — which resource dimension is missing (slot/queue/buffer)?
- Quick check: recompute slot_us, frames_per_cycle, burst_bytes, buffer_bytes, hop_res_time_ns; compare predicted vs observed watermark and gate misses.
- Pass criteria: gate_miss_cnt=0; congestion indicators (drops/watermarks) remain within X ppm under validation load.
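The Q12 resource comparison is worth making concrete: bandwidth admission alone does not prove a queue fits, because the buffer must absorb whatever part of a burst the open gate cannot drain within one slot. A sketch of that prediction (the function name and example figures are illustrative):

```python
def queue_fits(burst_bytes: float, slot_us: float, link_rate_bps: float,
               buffer_bytes: float):
    """Q12-style resource check: the buffer must absorb the part of the
    burst the open gate cannot drain within one slot; the leftover is the
    predicted peak watermark to compare against the observed one."""
    drain_bytes = slot_us * 1e-6 * link_rate_bps / 8
    peak = max(0.0, burst_bytes - drain_bytes)
    return peak <= buffer_bytes, peak

# 20 kB burst, 100 us slot at 1 Gbit/s drains 12.5 kB -> 7.5 kB peak occupancy
fits, peak = queue_fits(20_000, 100, 1_000_000_000, buffer_bytes=16_384)
assert fits and abs(peak - 7_500.0) < 1e-6
```

If the observed per_queue_watermark exceeds this predicted peak, the admission model is missing a resource dimension (slot length, queue depth, or buffer), which is exactly the failure mode Q12 describes.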