Industrial Ethernet Switch with TSN
An Industrial Ethernet TSN switch delivers deterministic transport by combining a synchronized timebase (802.1AS), time-aware scheduling (802.1Qbv), per-stream protection and admission control (802.1Qci/Qcc), and redundancy (802.1CB) so that critical traffic stays within bounded latency and jitter. In practice, “deterministic” must be proven by evidence—window hit-rate, gate-miss counters, drops by reason, and stable timing metrics under load and faults.
H2-1 · What makes a TSN industrial switch “deterministic”
A TSN industrial Ethernet switch is designed for bounded behavior rather than best-effort throughput. “Deterministic” means time-critical streams can be delivered with a known worst-case delay, a bounded jitter, and a predictable loss/failover behavior even when background traffic, topology changes, or electrical noise are present.
Boundary (this page): TSN = Ethernet time synchronization + gated/shaped forwarding + per-stream protection + redundancy. The focus is an industrial multi-port switch (rings, harsh EMC, isolated I/O, mixed real-time and IT flows). Topics such as routers, carrier NAT/BNG, PoE power systems, optical transport, 5G RAN hardware, or GPSDO/atomic clocks are intentionally out of scope.
A non-TSN switch typically relies on priority queues and traffic shaping that improves averages but cannot guarantee tails. Under congestion or bursts, queueing variation dominates and the “rare worst case” becomes the real failure mode (missed control cycle, motion jitter, safety interlock latency, or time-aligned sampling drift). TSN addresses this by turning network resources into time-scheduled and stream-budgeted behavior.
| Capability | Typical TSN feature | Engineering payoff | Common pitfall |
|---|---|---|---|
| Time sync | 802.1AS (gPTP), HW timestamps | Shared timebase; bounded residence-time error across hops | Timestamp point too “late/early” → offset looks OK but jitter grows |
| Scheduled forwarding | 802.1Qbv (TAS), GCL | Time windows for critical queues; bounded egress contention | Gate schedule mismatch / drift → gate miss, overruns, unintended blocking |
| Contention control | 802.1Qbu + 802.3br (preemption) | Limits blocking by large frames; smaller guard bands | Wrong configuration → fragmentation errors or unpredictable blocking |
| Per-stream protection | 802.1Qci (PSFP), Qcc (admission) | Abnormal flows cannot flood buffers or steal time windows | Policing thresholds mis-set → “mysterious drops” on only one stream |
| Redundancy | 802.1CB (FRER) | Failover without long re-convergence; predictable recovery | Duplicate elimination window too small/large → drops or memory pressure |
| Observability | Counters, alarms, logs | Prove determinism; isolate root cause (sync vs schedule vs drops) | Only “port stats” → no visibility into gate misses / stream policing |
H2-2 · System architecture: silicon blocks & data path inside the switch
A TSN-capable switch ASIC is best understood as a pipeline where determinism is enforced at specific checkpoints. Instead of treating traffic as “port level,” TSN features are typically applied at stream level (classification, policing), then translated into queue/time behavior (gating, shaping) at egress.
Architecture rule of thumb: determinism is preserved only if (a) timestamps are taken at the right boundary and (b) gate/shaper decisions are applied before the final transmit decision. If either is implemented too late, queueing variation leaks into the “real-time” stream.
Data plane (packet path) — the “must-pass checkpoints”
- Ingress parse & stream identification: VLAN/PCP, stream handle, and forwarding context are derived early so per-stream policies can be applied consistently.
- Per-stream filtering/policing (PSFP boundary): abnormal bursts and misbehaving talkers are contained before they inflate queue tail latency.
- Queueing & buffer management: queue depth and buffer allocation define worst-case queueing delay; tail latency is often a buffer story, not a link-rate story.
- Time-aware gating (TAS): queue gates enforce time windows; missed windows become measurable failure signatures (gate-miss / overrun).
- Traffic shaping: shapers smooth egress behavior and prevent background traffic from “reforming” bursts right before transmit.
- Egress scheduling & transmit: the final arbiter merges gating, priorities, and shapers into a deterministic transmit decision per port.
Control plane (TSN logic tightly coupled to the pipeline)
- Schedule manager: loads, versions, and activates the Gate Control List (GCL) and exposes “which schedule is live” for diagnostics.
- Time engine + timestamp unit: maintains the switch time domain and produces/consumes hardware timestamps; residence-time accounting lives here.
- Redundancy functions (optional): FRER replicate/eliminate may sit at ingress/egress; placement impacts buffer needs and skew tolerance.
Management plane (what makes it deployable in industrial networks)
- Management MCU/CPU: configuration control, health monitoring, and secure configuration storage (schedule versions, stream policies).
- Control buses: MDIO (PHY), I²C/SPI (sensors/EEPROM), and OOB interfaces feed telemetry and maintain traceability.
- Event logs & alarms: determinism issues should be distinguishable (sync vs schedule vs policing vs buffer overflow).
Deliverable: “checkpoint → evidence” mapping (what to observe)
- Classification: stream hit/miss, unknown stream counters, VLAN/PCP rewrite events.
- Policing: per-stream drops (rate/burst violations), gate-closed drops, out-of-profile frames.
- Queues/Buffers: per-queue occupancy highs, overflow drops, tail-drop vs WRED (if used), head-of-line indicators.
- Gating/Schedule: gate miss/overrun, schedule version active, late/early gate transitions.
- Time/Timestamps: sync state, offset history, path-delay trends, residence-time stats anomalies.
H2-3 · Time synchronization in practice: IEEE 802.1AS (gPTP) + hardware timestamps
In a TSN switch, time synchronization is not “nice-to-have.” It is the foundation that makes schedules meaningful. The goal is a shared timebase across ports so that time windows, residence time, and path delay are measurable and bounded within a TSN domain. When time is unstable, scheduling degenerates into best-effort behavior with hidden jitter.
Boundary: this section explains gPTP inside a TSN network domain and inside the switch pipeline (timestamps, delay, residence time). Upstream time-source design (GPSDO/atomic) is intentionally out of scope.
Engineering checkpoints (what must be true in a deployed TSN switch)
- Time domain consistency: ports participating in the same TSN domain must share a coherent notion of time for scheduling and measurement.
- BMCA (boundary-level): the switch must follow domain best-master selection and handle transitions without unpredictable step changes in schedules.
- Path delay & asymmetry awareness: delay estimation must remain stable; persistent asymmetry appears as a stable bias that breaks timing alignment.
- Residence time accounting: the switch must treat “time inside the device” as a measurable quantity; queueing and gating influence it.
- Hardware timestamps: timestamps must be taken close enough to the wire to avoid software/queueing artifacts being mistaken as sync error.
Dominant sources of timing error inside the switch:
- Timestamp quantization: resolution and the physical point of timestamp insertion.
- Oscillator wander: temperature/power sensitivity and holdover behavior inside the switch time engine.
- Path asymmetry: cable/PHY/module directionality and link-layer processing differences.
- Queueing variation: contention and internal arbitration that leaks into timestamps when insertion is too “late.”
Symptom → likely cause:
- Offset looks stable, but jitter grows under load → timestamp point too late / queueing variation.
- Stable one-direction bias across a link → path asymmetry (media/PHY/module) or wrong delay model.
- Sudden offset steps during topology or link events → BMCA transitions / re-sync / link renegotiation.
- Offset correlates with CPU activity → software timestamping or non-deterministic timestamp path.
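The four signatures above can be folded into a small triage helper. The symptom flags and the returned labels below are illustrative assumptions, not names from any standard or vendor tool:

```python
# Minimal gPTP triage sketch (symptom names and labels are assumptions).

def triage_sync(offset_stable: bool, jitter_grows_under_load: bool,
                one_direction_bias: bool, steps_on_topology_event: bool,
                correlates_with_cpu: bool) -> str:
    """Map coarse sync symptoms to the most likely root-cause domain."""
    if correlates_with_cpu:
        return "software-timestamping-path"   # timestamp taken too far from the wire
    if steps_on_topology_event:
        return "bmca-or-relock"               # master change / link renegotiation
    if one_direction_bias:
        return "path-asymmetry"               # media/PHY/module or wrong delay model
    if offset_stable and jitter_grows_under_load:
        return "late-timestamp-point"         # queueing variation leaking into stamps
    return "no-dominant-signature"

print(triage_sync(True, True, False, False, False))  # → late-timestamp-point
```

The ordering matters: CPU correlation and topology events are checked first because they mask the subtler load-dependent signatures.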
| Timestamp point | Error characteristics | Load sensitivity | Complexity / cost | Best fit |
|---|---|---|---|---|
| PHY | Minimizes MAC/queue artifacts; sensitive to link/media asymmetry and PHY pipeline variation | Low | Higher integration effort; tighter coupling to PHY implementation | High contention environments; tight jitter budgets; long/variable links |
| PCS | Less exposed to MAC arbitration; must account for encoding/decoding pipeline delays | Medium | Moderate; depends on SerDes/PCS architecture | High-speed links where PCS stage is accessible and stable |
| MAC | Most exposed to internal contention and arbitration timing; queueing can leak into timestamps | High | Lower cost; common implementation | Light to moderate load, relaxed jitter budgets, strong observability/counters |
H2-4 · Scheduling & shaping: TAS (802.1Qbv) and the Gate Control List (GCL)
Time-Aware Shaping (TAS) turns “bandwidth” into time windows. Instead of hoping priority is enough, the switch explicitly decides which queue is allowed to transmit during each scheduled interval. This isolates critical streams from background bursts by bounding egress contention.
Practical rule: TAS primarily controls what happens right before transmit. If timestamps are stable (H2-3) but TAS is mis-scheduled, determinism fails as gate misses, overruns, or drift.
How a schedule is built (mapping chain)
- Traffic class → Queue: group streams by criticality and latency/jitter target; keep critical traffic in a dedicated queue when possible.
- Queue → Gate state: define open/closed states for each queue over time; ensure critical queue has protected windows.
- Gate state → Cycle: choose cycle time so worst-case waiting fits the application budget across hops.
Engineering parameters that determine worst-case behavior
- Cycle time: sets the maximum waiting bound; too small stresses timing accuracy, too large increases worst-case delay.
- Window length: must cover critical traffic volume plus margin; undersizing causes spillover into the next cycle.
- Guard band: protects windows from blocking by non-critical frames; can be reduced when preemption is used (covered later).
- Gate transition accuracy: depends on time sync quality and internal implementation; poor accuracy looks like random jitter.
- Queue depth: tail latency and overflow risk; depth must match burstiness outside protected windows.
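The way these parameters trade off can be checked with back-of-envelope arithmetic. A minimal sketch, assuming a 1 GbE link, a 1522-byte maximum background frame, and an illustrative 250 µs control cycle (all assumptions, not recommendations):

```python
# Back-of-envelope Qbv sizing sketch (illustrative numbers, not a design recipe).

LINK_BPS = 1_000_000_000          # assumed 1 GbE link
MAX_BG_FRAME_B = 1522             # largest non-preemptable background frame (assumed)
CRIT_BYTES_PER_CYCLE = 3 * 128    # assumed critical payload per cycle
MARGIN = 1.25                     # assumed safety margin on the window

def tx_time_s(n_bytes: int) -> float:
    """Serialization time of n bytes on the link (preamble/IFG ignored for brevity)."""
    return n_bytes * 8 / LINK_BPS

guard_band = tx_time_s(MAX_BG_FRAME_B)              # worst-case blocking without preemption
window = tx_time_s(CRIT_BYTES_PER_CYCLE) * MARGIN   # protected window incl. margin
cycle = 250e-6                                      # assumed 250 µs control cycle

overhead = (guard_band + window) / cycle            # share of each cycle reserved
print(f"guard band ≈ {guard_band*1e6:.1f} µs, window ≈ {window*1e6:.1f} µs, "
      f"reserved ≈ {overhead:.1%} of the cycle")
```

Shrinking the cycle time tightens the waiting bound but makes the same guard band and window a proportionally larger share of each cycle, which is exactly the tension the bullet list describes.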
Failure signatures → likely cause:
- Gate miss (window opens but frames do not leave) → late gate transition, scheduling mismatch, or upstream queue starvation.
- Overrun (frames transmit after window closes) → wrong gate mapping, implementation timing, or measurement point mismatch.
- Cycle drift (behavior shifts over minutes/hours) → time domain drift, schedule activation issues, or unstable path delay.
Evidence to collect:
- Gate miss / overrun counters per port/queue; active schedule version and activation timestamp.
- Queue occupancy highs and drop reasons (overflow vs gate-closed vs shaping-limited).
- Critical stream drops separated from best-effort drops (avoid “port-only” statistics).
- Sync state around the event window (offset spikes correlate with schedule anomalies).
Deliverable: GCL design checklist (inputs → outputs → acceptance)
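The checklist can be sketched as structured data so it is scriptable in reviews; the items below are drawn from this section's parameters and counters, and the grouping itself is an assumption:

```python
# GCL design checklist sketch: inputs → outputs → acceptance (grouping assumed).

GCL_CHECKLIST = {
    "inputs": [
        "per-class latency/jitter budget across hops",
        "critical traffic volume and burstiness per cycle",
        "max background frame size (guard band vs preemption choice)",
        "sync quality bound (gate transition accuracy)",
    ],
    "outputs": [
        "cycle time", "window length per queue", "guard bands",
        "queue-to-gate mapping", "schedule version to activate",
    ],
    "acceptance": [
        "gate miss / overrun counters at threshold (often zero)",
        "critical window hit-rate meets requirement under mixed load",
        "no critical-stream drops attributable to gate-closed",
    ],
}

for phase in ("inputs", "outputs", "acceptance"):
    print(phase, "→", len(GCL_CHECKLIST[phase]), "items")
```

Keeping the checklist as data makes it easy to diff between schedule versions and attach to commissioning records.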
H2-5 · Frame preemption & guard bands: 802.1Qbu + 802.3br (when TAS meets big packets)
TAS (time windows) can still fail when the link is already occupied by a large, non-preemptable frame. Even if the gate “opens” on time, the critical frame cannot transmit until the ongoing frame completes. This is a physical blocking problem, not a schedule math problem.
Two strategies (same goal, different trade-offs)
- Guard band: reserve an empty interval before the critical window so a big background frame is not allowed to start. Simple and robust, but wastes capacity.
- Frame preemption: allow background traffic to be interrupted (preempt/resume) so a critical frame can cut in. Higher efficiency, but adds implementation complexity and fragment/reassembly edge cases.
Choose guard-band-only when:
- Determinism is prioritized over utilization.
- Interoperability across mixed endpoints is uncertain.
- Critical windows are not extremely narrow relative to max-frame blocking.
- Operations prefer minimal moving parts (easier validation and field service).
Choose frame preemption when:
- Critical windows are narrow and frequent, so guard band overhead becomes large.
- Max background frame size is large relative to link rate (blocking dominates).
- High utilization is required without sacrificing critical jitter bounds.
- Hardware and firmware can provide clean preempt/resume statistics and alarms.
Debug signals (turn “jitter” into evidence)
- Preempt/resume counters per port/queue: confirms preemption triggers where expected.
- Fragment / reassembly errors: points to interoperability or implementation faults (not schedule tuning).
- Window miss correlation: if misses happen only when big frames are present, blocking is the root cause.
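The break-even between the two strategies can be estimated numerically. A sketch assuming a 1 GbE link, 1522-byte background frames, and a 64-byte residual fragment once preemption is active (all illustrative figures, not values from 802.3br):

```python
# Guard band vs preemption: rough overhead comparison (assumed figures).

LINK_BPS = 1_000_000_000    # assumed 1 GbE
MAX_FRAME_B = 1522          # largest background frame (assumed)
MIN_FRAG_B = 64             # assumed residual blocking once preemption kicks in

def tx_us(n_bytes: int) -> float:
    """Serialization time in microseconds on the assumed link."""
    return n_bytes * 8 / LINK_BPS * 1e6

def guard_overhead(windows_per_ms: int) -> float:
    """Capacity lost to guard bands, as a fraction of each millisecond."""
    return windows_per_ms * tx_us(MAX_FRAME_B) / 1000

for w in (1, 4, 16):
    # With preemption, worst-case blocking shrinks to roughly one minimum fragment.
    print(f"{w:>2} windows/ms: guard-band loss ≈ {guard_overhead(w):.1%}, "
          f"preemption residual blocking ≈ {tx_us(MIN_FRAG_B):.2f} µs/window")
```

At 16 narrow windows per millisecond the guard-band loss approaches 20% of link capacity, which is the "narrow and frequent windows" case where preemption earns its complexity.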
H2-6 · Flow protection: per-stream filtering/policing (802.1Qci) + admission control (802.1Qcc)
Determinism fails when an abnormal stream (misconfigured, faulty, or bursty) consumes buffers and scheduling capacity. Per-stream protection prevents one “bad” flow from degrading all other flows by enforcing stream-level behavior at ingress, with clear counters that separate policing drops from congestion drops.
802.1Qci (PSFP): where enforcement happens
- Stream classifier: identify frames as a specific stream (stream handle) before they touch critical queues.
- Metering: enforce rate and burst limits so bursts do not create tail latency or buffer overflow.
- Stream gate: allow/deny behavior per stream (policy), keeping critical traffic protected from unexpected timing.
- Drop actions + counters: drops must be counted per stream and by reason (policing vs gate-closed vs other).
802.1Qcc (admission control): how budgets stay valid over time
Admission control prevents the system from accepting a stream that would violate existing latency/jitter budgets. In practice, it defines the boundary between what is allowed to be installed (registered/reserved) and what must be rejected or constrained (unknown or budget-breaking streams).
- Unknown streams: do not allow into critical queues by default; constrain early, relax with evidence.
- Critical streams: conservative burst limits first, then expand only if no policing drops occur under load.
- Background streams: cap burst to protect buffer headroom and reduce tail latency effects on shared resources.
- Observability: enable per-stream counters for policing, gate-closed, and queue overflow symptoms.
Diagnosis: attributing drops correctly (policing vs congestion)
- If a stream drops: check per-stream policing drop counters first (Qci metering/gate policy).
- If policing drops are near zero: check gate-closed drops (stream gate policy or configuration mismatch).
- If gate-closed is near zero: check queue overflow / buffer drops and occupancy highs (congestion evidence).
- If congestion correlates with background bursts: tighten background burst limits or revisit admission (Qcc).
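The diagnosis order above can be encoded as a small helper; the counter names and the returned strings are assumptions for illustration, not a vendor API:

```python
# Drop-attribution sketch following the Qci/Qcc diagnosis chain (names assumed).

def attribute_drops(policing_drops: int, gate_closed_drops: int,
                    overflow_drops: int, bg_burst_correlated: bool) -> str:
    """Walk the chain in order: policing → gate policy → congestion."""
    if policing_drops > 0:
        return "policing (Qci metering/gate policy)"
    if gate_closed_drops > 0:
        return "gate-closed (stream gate policy or config mismatch)"
    if overflow_drops > 0:
        if bg_burst_correlated:
            return "congestion from background bursts (revisit Qcc admission)"
        return "congestion (queue overflow / buffer evidence)"
    return "no drops attributed"

print(attribute_drops(0, 0, 120, True))
```

The fixed ordering is the point: checking congestion counters first invites misattributing a policing problem to buffering.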
H2-7 · Redundancy for industrial rings: FRER (802.1CB) + HSR/PRP boundary
Redundancy in TSN is not just “two paths.” The hard part is keeping deterministic behavior when duplicates, path delay mismatch, and out-of-order arrivals collide with time windows and finite buffers. FRER (802.1CB) addresses this at the stream level by replicating, sequencing, and eliminating frames.
FRER in one pipeline: replicate → sequence → eliminate
- Replicate: after stream classification, the switch creates two (or more) copies for independent paths.
- Sequence: each frame carries a sequence number so late/duplicate copies can be recognized reliably.
- Eliminate: at the merge point, duplicates are removed and (optionally) limited reordering is applied inside a defined window.
HSR/PRP boundary (interface view, not a ring tutorial)
- Redundancy may be implemented outside the TSN schedule domain (end-station or edge node).
- The TSN switch may “see” duplicate frames as normal traffic unless a defined eliminate point exists.
- Key requirement: duplicates must not overrun critical queues or window capacity.
FRER inside the TSN switch, by contrast:
- Replication and elimination are stream-aware and can be tied to deterministic scheduling policy.
- Sequence-aware elimination enables tight diagnostics (late duplicate, window miss, out-of-order).
- Engineering focus shifts to duplicate window and path skew budgeting.
Engineering criteria (make redundancy deterministic)
- Duplicate window: too small drops “late-but-valid” copies; too large increases buffer residence time and memory pressure.
- Latency skew (Δpath): larger mismatch requires a larger elimination/reorder window, raising worst-case jitter if unbounded.
- Out-of-order handling: define what happens when sequence gaps appear (forward-first vs wait-within-window).
- Buffer impact: elimination/reorder is a hidden queue. It must have counters for occupancy highs and window-related drops.
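The window-sizing tension can be expressed as a simple lower bound; the skew and jitter figures and the margin below are illustrative assumptions:

```python
# Duplicate-elimination window sizing sketch (illustrative numbers).

def min_window_us(path_skew_us: float, per_path_jitter_us: float,
                  margin: float = 1.2) -> float:
    """The window must cover worst-case path skew plus jitter on both paths,
    with margin. Too small drops late-but-valid copies; too large inflates
    residence time and memory pressure."""
    return (path_skew_us + 2 * per_path_jitter_us) * margin

# Assumed ring: 80 µs skew between paths, 10 µs jitter per path.
w = min_window_us(80.0, 10.0)
print(f"minimum duplicate window ≈ {w:.0f} µs")
```

Because the bound grows with measured skew, the window is not a set-and-forget constant: rising late-duplicate counters are the field signal that skew has outgrown the budgeted value.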
Field symptoms and evidence
- Drops only on protected streams → check late-duplicate and elimination-window counters; the window may be too small for the actual path skew.
- Jitter growth during failover/switching → check duplicate rate and out-of-order indicators against the event timeline.
- Buffer pressure at the merge point → the duplicate window may be oversized relative to measured skew, inflating residence time.
Deliverable — redundancy strategy selection (TSN switch viewpoint)
| Option | Availability | Worst-case latency predictability | Bandwidth overhead | Operational complexity |
|---|---|---|---|---|
| FRER (802.1CB) | High (duplicate delivery per stream) | High when window/skew are budgeted and observable | Medium–High (replication per protected stream) | Medium (sequence + elimination window + counters) |
| PRP (boundary) | High (two independent networks) | Depends on where elimination occurs; duplicates can stress queues if untreated | High (full duplication) | Medium (interop strong, but TSN domain must tolerate duplicates) |
| HSR (boundary) | High (ring duplicate circulation) | Depends on ring behavior and elimination; duplicates may amplify load | High (duplication on ring) | Higher (traffic behavior in ring needs careful containment at the TSN edge) |
H2-8 · Latency/jitter budget: store-and-forward vs cut-through, queues, buffers, and congestion
Deterministic networks are engineered around upper bounds. The relevant numbers are worst-case latency and bounded jitter, not average delay. A TSN switch budget is credible only when each contribution is mapped to an observable metric (timestamp deltas, queue occupancy highs, and drop reasons).
Forwarding mode changes the bound (not just the mean)
Store-and-forward:
- Frames are forwarded after full reception (stable behavior, higher base latency).
- Worst-case forwarding delay scales with frame size and internal pipeline stages.
- Often easier to validate for strict bounds when error handling is conservative.
Cut-through:
- Forwarding begins before full reception (lower base latency).
- Worst-case becomes more sensitive to arbitration, contention, and rare corner cases.
- Strong observability is required to prove bounded behavior under congestion.
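The base-latency difference between the two modes is easy to quantify for the serialization term alone (pipeline stages ignored; the 64-byte cut-through decision point and the 1 GbE rate are assumptions):

```python
# Store-and-forward vs cut-through base latency per hop (serialization term only).

LINK_BPS = 1_000_000_000   # assumed 1 GbE
HEADER_B = 64              # assumed bytes received before a cut-through decision

def sf_us(frame_b: int) -> float:
    """Store-and-forward waits for the full frame before forwarding."""
    return frame_b * 8 / LINK_BPS * 1e6

def ct_us() -> float:
    """Cut-through forwards after the header; frame size drops out of this term."""
    return HEADER_B * 8 / LINK_BPS * 1e6

for frame in (64, 512, 1522):
    print(f"{frame:>4} B frame: S&F ≈ {sf_us(frame):.2f} µs, "
          f"cut-through ≈ {ct_us():.2f} µs per hop")
```

The frame-size dependence is why store-and-forward bounds are easy to state but grow per hop, while cut-through bounds are small yet must be defended against contention corner cases with counters.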
Queues and buffers: tail latency is the real enemy
- Depth is not free: deeper buffers can reduce drops yet increase residence time and tail latency.
- Occupancy highs matter: worst-case jitter tracks peak occupancy, not average occupancy.
- Scheduling is not universal protection: time windows help only for traffic that is explicitly protected and budgeted.
Deliverable — budget decomposition per hop (with measurable metrics)
| Budget term | Meaning (worst-case) | Typical driver | Observable metric |
|---|---|---|---|
| Forwarding | Pipeline + mode impact (S&F or cut-through) across a hop | Frame size, pipeline stages | Ingress/egress timestamp delta for the hop |
| Queueing | Residence time due to contention and burst absorption | Occupancy highs, burstiness | Queue occupancy high-watermark, per-queue latency histogram (if available) |
| Schedule alignment | Waiting until the allowed window / gate state | Window placement, guard time | Gate-closed counters, “miss/overrun” indicators, window timing logs |
| Sync error | Timebase mismatch that shifts effective windows and timestamps | Path delay variation, residence time error | gPTP offset, path delay, time error alarms (bounded thresholds) |
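The four budget terms can be rolled up into an end-to-end bound per path; the per-hop numbers below are illustrative assumptions, not targets:

```python
# Per-hop worst-case budget roll-up matching the table's four terms (numbers assumed).

HOP_BUDGET_US = {
    "forwarding": 13.0,           # pipeline + mode impact (S&F of a max frame)
    "queueing": 20.0,             # backed by occupancy high-watermark evidence
    "schedule_alignment": 250.0,  # worst case: wait one full cycle for the window
    "sync_error": 1.0,            # bounded gPTP offset contribution
}

def path_bound_us(hops: int) -> float:
    """Worst-case end-to-end bound = per-hop terms summed over all hops."""
    return hops * sum(HOP_BUDGET_US.values())

print(f"4-hop worst-case bound ≈ {path_bound_us(4):.0f} µs")
```

The roll-up makes the dominant term obvious: with these assumed numbers, schedule alignment (one cycle of waiting) dwarfs everything else, so cycle time is the first knob to revisit when the bound does not fit the application budget.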
Acceptance tests (evidence-first, not guesswork)
- Saturate the path with mixed traffic and measure per-hop timestamp deltas; the observed worst case must fit the declared forwarding + queueing terms.
- Capture queue occupancy high-watermarks and drops by reason during bursts; every critical-class loss must be attributable.
- Confirm gPTP offset stays within its bound for the whole run, so the schedule-alignment term remains valid.
H2-9 · Industrial hardening: isolated I/O, EMC/ESD/surge, and power integrity (within switch box)
Industrial hardening is the set of design choices that keep links stable, timestamps trustworthy, and control logic quiet when the switch is surrounded by noisy cables, fast transients, and imperfect grounding. This section stays inside the switch enclosure: isolation boundaries, port-level protection strategy, and power integrity under wide input and short disturbances (not a PoE system discussion).
Isolation boundaries: keep field noise out of the logic/time domain
- Chassis/Field domain: external cables and reference shifts drive common-mode currents and surge injection.
- PHY/Port domain: the entry point for fast transients; protection and return paths must be explicit.
- Logic/Timing domain: switch ASIC, timestamp units, schedule manager, and management CPU must see a clean reference.
Digital isolator selection criteria (focus: determinism and robustness)
- CMTI (common-mode transient immunity): determines whether fast common-mode edges cause false toggles; directly correlates with “rare resets” and sporadic link events.
- Propagation delay and skew: time-sensitive I/O needs bounded delay and low variation; non-critical management GPIO may tolerate wider delay.
- Channel count and layout: more channels increase coupling and power-domain complexity.
- Isolated supply quality: isolation supply noise must not leak into timing logic.
Port-level protection (ESD / EFT / surge): engineering points inside a switch
- Return-path control: protection is only effective when the surge/ESD current has a short, predictable route to its reference.
- Keep the PHY calm: uncontrolled clamp reference can inject noise into the PHY domain and cause CRC bursts or link flaps.
- Different signatures: ESD often appears as sharp error spikes; EFT tends to cause repeated micro-outages; surge may trigger brownouts or persistent instability.
Power integrity inside the box (wide input + short disturbances)
- Wide-input resilience: define ride-through expectations for short input drops and fast transients.
- Hold-up with intent: size for continuity of critical domains (switch core, timing) rather than maximizing bulk energy blindly.
- Thermal drift awareness: temperature-driven timing drift becomes visible as offset wander and window-margin erosion.
Field symptoms and how to correlate evidence
| Symptom | Strong evidence to collect | Most likely root domain |
|---|---|---|
| Link flaps / reconnects | Port event timeline, retrain count, rail dip markers, CRC bursts around the event | Power integrity or port-domain transient injection |
| Packet loss under stress | Drops by reason (congestion vs gate vs policing), queue occupancy highs | Queue/buffer pressure (often triggered by disturbances) |
| Time drift / offset jumps | PTP offset + meanPathDelay trend, residence-time stats, event alignment with resets/flaps | Timing domain upset or path asymmetry introduced by disturbances |
Deliverable — industrial reliability checklist (design + debug)
- Design: isolation boundaries and return paths documented explicitly; protection parts referenced to a defined, low-impedance path.
- Power: ride-through and hold-up specified for the critical domains (switch core, timing) under short input drops.
- Debug: port events, rail dips, CRC bursts, and resets logged with timestamps so field symptoms can be correlated to a root domain.
H2-10 · Management & observability: what to log and which counters prove determinism
Determinism cannot be claimed by configuration alone. It must be proven continuously using a minimal, consistent set of counters and logs: time error stays bounded, schedules execute as intended, and drops are explainable. This section organizes observability into a practical “field dashboard” that supports remote diagnosis.
Must-observe signals (grouped by the questions they answer)
Time (is the timebase bounded?)
- PTP offset trend and threshold crossings
- meanPathDelay stability
- Residence-time statistics (distribution widening is a warning)
Schedule (is the GCL executing as intended?)
- GCL version + active status
- Gate miss / overrun indicators
- Window-related transmit blocking counters (if available)
Queues and drops (is every loss explainable?)
- Drops by reason (congestion vs gate-closed vs policing)
- Queue occupancy high-watermarks
- Latency histograms per class (when supported)
Redundancy (does FRER stay bounded?)
- FRER duplicate rate and late-duplicate counters
- Out-of-order indicators
- Elimination window overflow / sequence errors
Alarm tiers: determinism first, throughput second
| Tier | Meaning | Typical triggers | First action |
|---|---|---|---|
| P0 Determinism broken | Time or schedule bounds violated; critical streams cannot be trusted | PTP offset out of bound; gate miss rising; unexplained critical drops | Freeze config snapshot; correlate with event timeline; inspect time/schedule counters first |
| P1 Determinism at risk | Trends indicate shrinking margin; failure likely under stress | meanPathDelay drift; residence-time widening; occupancy highs near limit | Reduce background bursts; verify policing/admission margins; confirm schedule capacity |
| P2 Throughput / health | General performance/health issue without evidence of time-bound failure | Port errors; thermal warning; best-effort congestion | Check port/thermal/power health; confirm critical counters remain clean |
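The tiering logic is strictly priority-ordered and can be sketched as a classifier; the trigger names are assumptions mirroring the table, not real alarm identifiers:

```python
# Alarm-tier sketch matching the P0/P1/P2 table (trigger names assumed).

def alarm_tier(offset_out_of_bound: bool, gate_miss_rising: bool,
               unexplained_critical_drops: bool, margin_shrinking: bool,
               port_or_thermal_issue: bool) -> str:
    """Determinism first, throughput second: P0 triggers always win."""
    if offset_out_of_bound or gate_miss_rising or unexplained_critical_drops:
        return "P0"  # determinism broken: freeze config snapshot, correlate timeline
    if margin_shrinking:
        return "P1"  # determinism at risk: reduce bursts, verify admission margins
    if port_or_thermal_issue:
        return "P2"  # health issue without evidence of time-bound failure
    return "OK"

print(alarm_tier(False, True, False, False, True))  # gate misses dominate → P0
```

Note that a unit with both a thermal warning and rising gate misses classifies as P0, which is the whole point of the ordering: a throughput symptom never masks a determinism symptom.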
Remote diagnosis: minimum evidence set (so incidents are reproducible)
- Config snapshot plus active config/GCL versions at the moment of the incident.
- Event timeline (link events, resets, schedule activations) aligned to counter deltas.
- Time state around the event window: PTP offset, meanPathDelay, residence-time stats.
- Drops by reason and queue occupancy highs for the affected traffic classes.
Deliverable — field dashboard data dictionary (one page)
| Field | Why it matters | Correlation hint |
|---|---|---|
| PTP offset, meanPathDelay, residence stats | Proves bounded time error and stable delay model | Jumps after link flap or reset often indicate power/EMI or topology events |
| GCL active, gate miss/overrun | Proves schedule execution and window integrity | Gate misses rising may correlate with time drift or load spikes |
| Drops by reason, occupancy highs | Explains delay/loss without guessing | Occupancy highs align with background bursts; reason codes isolate policy vs congestion |
| FRER duplicate rate, late dup, out-of-order | Proves redundancy helps without adding unbounded jitter | Late dup rising indicates path skew growth or window too tight |
| Config version, GCL version, event timeline | Makes incidents repeatable and debuggable remotely | Always correlate counter changes to config/time and event stamps |
H2-11 · Validation & conformance: how to test TSN features (lab + factory + field)
TSN conformance is not proven by configuration screenshots. It is proven by evidence that time stays bounded, schedules execute as intended, and loss/latency remain explainable under realistic stress (mixed traffic, bursts, failures, and redundancy events). This section provides a three-layer validation plan with concrete stimuli, observables, pass/fail criteria, and traceable artifacts.
1) Lab validation (R&D): prove bounds under stress
Time sync (802.1AS)
- Stimulus: step load (idle → high), bursty background, link flap, temperature ramp, master changeover.
- Observe: PTP offset trend, meanPathDelay trend, residence-time stats, BMCA role changes, re-lock time.
- Pass/Fail: offset remains within declared bound; recovery time after disturbances stays within requirement; no unexplained jumps.
- Evidence: pcap with timestamps + DUT counters snapshot at event boundaries.
Scheduling (802.1Qbv)
- Stimulus: mixed critical flows + heavy best-effort; vary cycle time, guard band, and queue depth near limits.
- Observe: gate miss/overrun indicators, per-class egress timing vs gPTP time, queue occupancy highs, drops by reason.
- Pass/Fail: critical flow window hit-rate meets requirement; gate misses below threshold (often zero); bounded egress jitter.
- Evidence: per-class timing histogram (if available) + pcap + GCL version and active status.
Preemption (802.1Qbu + 802.3br)
- Stimulus: long frames occupying the link while critical windows approach; compare “guard band only” vs “preemption enabled”.
- Observe: preempted-frame stats, fragment/merge errors, critical flow timing, throughput impact on best-effort.
- Pass/Fail: critical windows protected without protocol errors; fragment counters remain consistent; no unexpected loss.
- Evidence: analyzer report + counters (preemption, fragments) + pcap.
Flow protection (802.1Qci/Qcc)
- Stimulus: inject abnormal streams (burst/overrate/malformed class) alongside valid streams.
- Observe: policing drops vs congestion drops, per-stream counters (when supported), queue pressure response.
- Pass/Fail: abnormal traffic is contained without collateral damage; critical streams remain within bounds.
- Evidence: drop-by-reason counters + stream identifiers + config snapshot.
Redundancy (802.1CB)
- Stimulus: dual-path replication with controlled latency skew and reordering; include failover events and load spikes.
- Observe: duplicate rate, late duplicates, elimination window overflow, out-of-order indicators, jitter growth during events.
- Pass/Fail: elimination does not introduce unbounded jitter; no drops due to window mis-sizing; recovery stays within requirement.
- Evidence: per-path pcap + FRER counters + event timeline of failover/switching.
2) Factory test (Production): compress TSN checks into fast, high-yield screening
Production cannot run a full standards matrix. The goal is to retain a small set of tests that detect silicon/assembly issues that would later appear as “random” field failures: timestamp sanity, gate execution sanity, and basic queue behavior under a short stress.
- Basic connectivity: port up/down, forwarding sanity, VLAN/priority baseline.
- Timestamp sanity: verify hardware timestamp path is alive and consistent (no missing stamps, no path-dependent drift in a short run).
- GCL self-check: apply a short known GCL template; verify active status + expected counter deltas (no unexpected gate misses).
- Queue health: short burst test to confirm drop-by-reason works and buffer behavior is stable.
- Optional FRER smoke: basic replicate/eliminate correctness without complex skew sweeps.
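The GCL self-check and timestamp-sanity screens reduce to a before/after counters comparison; the counter names and pass criteria below are assumptions, not a vendor API:

```python
# Factory GCL self-check sketch: apply a known template, compare counter deltas
# (counter names and pass criteria are illustrative assumptions).

def gcl_self_check(before: dict, after: dict) -> bool:
    """Pass if the schedule went active, no new gate misses appeared, and the
    hardware timestamp path produced no missing stamps during the short run."""
    went_active = after.get("gcl_active", False)
    new_misses = after.get("gate_miss", 0) - before.get("gate_miss", 0)
    stamps_ok = after.get("missing_timestamps", 0) == before.get("missing_timestamps", 0)
    return went_active and new_misses == 0 and stamps_ok

before = {"gcl_active": False, "gate_miss": 2, "missing_timestamps": 0}
after = {"gcl_active": True, "gate_miss": 2, "missing_timestamps": 0}
print("PASS" if gcl_self_check(before, after) else "FAIL")
```

Snapshotting counters before and after the template run, rather than checking absolute values, is what keeps the screen fast and robust to whatever state the unit booted into.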
3) Field acceptance (Commissioning): convert determinism into operational KPIs
- Critical-flow KPIs: window hit-rate, worst-case latency/jitter bound, and loss bound for each declared critical class.
- Redundancy KPIs: jitter growth during switching, duplicate/late-duplicate behavior, and recovery time after failover.
- Evidence closure: counters + logs must explain every loss/jitter event (congestion vs gate-closed vs policing vs topology event).
- Traceability: capture config version + GCL version/active flag + time state, aligned to event timestamps.
Deliverable — three-layer validation checklist (R&D / Production / Field)
| Layer | What must be proven | Minimum evidence | Output artifact |
|---|---|---|---|
| R&D Bounds | Offset/jitter and schedule correctness under stress and disturbances | pcap + time error trends + gate/queue/FRER counters + config snapshot | Validation report + test matrix |
| Factory Screening | Timestamp path alive; GCL executes; queues behave; no assembly-induced “randomness” | before/after counters, short pcap (optional), versions, pass/fail flags | Per-unit test record |
| Field KPIs | Critical flows meet acceptance KPIs; redundancy does not break determinism | dashboard snapshot, event timeline, targeted pcap, config/GCL/time state | Commissioning acceptance sheet |
Deliverable — TSN feature test matrix (stimulus → observe → pass/fail → evidence)
| Feature | Stimulus | Observe | Pass/Fail |
|---|---|---|---|
| 802.1AS | Load step, burst background, link flap, master changeover | offset/meanPathDelay/residence stats + event timestamps | Offset bound + recovery time bound |
| 802.1Qbv | Critical + best-effort saturation; cycle/guard variations | gate miss + egress timing vs gPTP + occupancy highs | Window hit-rate + bounded jitter; miss below threshold |
| Qbu/3br | Long frames near critical windows; compare modes | preemption/fragments stats + critical timing | No protocol errors; protected windows; expected throughput impact |
| Qci/Qcc | Abnormal stream injection (burst/overrate) | policing drops vs congestion drops; collateral impact | Abnormal contained; critical bounds preserved |
| 802.1CB | Dual path skew + failover + load spikes | duplicate/late dup/out-of-order + jitter growth | No elimination-window-induced loss; bounded jitter during events |
Concrete materials (example models / ordering references)
The following are widely used examples for building a TSN validation bench. Exact SKUs vary by port speed, license options, and interface modules; procurement should match the target line rate and TSN feature set.
| Role | Example models / material references | Used for |
|---|---|---|
| Time reference / GM | Meinberg LANTIME class systems (e.g., M-series); Safran SecureSync class appliances | PTP grandmaster, GNSS holdover, 1PPS/10MHz distribution (when required) |
| PTP/Sync validation | Calnex Sentinel / Paragon-class timing test platforms | PTP offset/TE measurement, SyncE/clock performance correlation, network timing monitoring |
| Traffic gen/analyzer | Keysight IxNetwork (TSN-capable configurations); Spirent TestCenter + TSN solution packages; VIAVI TSN solutions (market-dependent) | Multi-port TSN streams, Qbv/Qbu/FRER scenarios, congestion and anomaly injection, KPI export |
| Tap / capture | Line-rate TAP/probe equipment for the target speed (1G/2.5G/10G/25G) + capture workstation | Evidence retention (pcap), mirror/tap visibility for event correlation |
| Controller PC | Linux PC (NICs matching test speeds) + automation scripts | GCL deployment, counters polling, log collection, test orchestration |
H2-12 · FAQs (Industrial Ethernet Switch with TSN)
These answers stay within TSN switch scope: time sync, scheduled forwarding, flow protection, redundancy behavior, observability counters, industrial hardening symptoms, and acceptance testing.