
TSN Switch / Bridge for Deterministic Industrial Ethernet


A TSN switch/bridge makes Ethernet deterministic by turning traffic into scheduled, admitted, and measurable streams—so latency, jitter, and loss stay within provable bounds under worst-case load.

This page explains the mechanisms (time-aware gating, shaping, admission control, hardware timestamping, and observability) and how to configure and verify them so “works in the lab” becomes repeatable in production and field deployments.

Definition & Scope Guard: What “TSN Switch/Bridge” Covers

Working definition

A TSN switch/bridge is a forwarding node that enforces deterministic behavior using hardware time windows, hardware timestamps, traffic shaping, and admission control, so latency/jitter/loss have explicit bounds and are verifiable.

Role in the system

  • Isolation: separates time-aware flows from best-effort traffic to remove queueing uncertainty.
  • Bounded behavior: turns “average performance” into a worst-case bound via gates/shapers/admission.
  • Proof hooks: exposes measurable points (timestamps/counters/events) to validate deterministic guarantees.

“Deterministic” constrains three outcomes (plus time error)

1) Latency bound

One-way end-to-end worst-case delay is bounded (not just mean/P95).

2) Jitter bound

Packet delay variation is bounded (queue jitter + gate timing + timestamp noise).

3) Loss bound

Drop/late-drop/police-drop are kept below a defined limit under admitted load.

+ Time error

Offset/drift/holdover events shift window alignment and directly affect worst-case behavior.

When a TSN switch/bridge is needed (vs. a managed switch)

  1. A hard deadline exists (missed deadline is a functional failure, not a “slower UI”).
  2. Background load changes over time, so queueing jitter must be removed or bounded.
  3. Multiple endpoints require time coordination (scheduled windows / deterministic triggering).

If traffic is best-effort only, deadlines are soft, or critical flows run on a dedicated link, a managed switch with VLAN/QoS is often sufficient.

Scope Guard (strict)

In-scope (covered on this page)

  • Hardware time windows: gating concepts, schedules, guard bands (concept-to-budget).
  • Hardware timestamps: tap points, error terms, validation hooks.
  • Shaping: controlling burstiness to keep worst-case bounds intact.
  • Admission control: ensuring new flows cannot break existing guarantees.
  • Verification metrics: counters/events/tests to prove bounds (X placeholders).

Out-of-scope (link only, no deep dive)

  • PHY / magnetics / ESD / surge / layout — physical-layer integrity and protection. PHY Co-Design & Protection
  • PoE / PoDL — power delivery and thermal/protection co-design. PoE / PoDL
  • PROFINET / EtherCAT / CIP stack details & certification — endpoint protocol/stack domain. Industrial Ethernet Stacks
  • Ring redundancy (MRP/HSR/PRP) — topology redundancy and switchover mechanics. Ring Redundancy

System map: time-aware vs best-effort flows across a TSN switch/bridge

[Diagram: end stations (PLC/drive cyclic control, camera triggered frames, remote I/O sensors/actuators) connect through the TSN switch/bridge (time windows/gate, traffic shaping, timestamp HW tap, admission guardrails) to a controller/gateway (policy + config, monitoring); time-aware and best-effort flows are shown separately; out-of-scope: PHY / PoE / stack / ring]

The diagram highlights the only job of this page: enforce and verify bounds inside the TSN switch/bridge (gates, shaping, timestamps, admission). Physical layer, power, stacks, and redundancy are referenced but not expanded.

Determinism Goals & Key Specs: Latency, Jitter, Loss, and Time Error

What this section locks down

  • A measurable definition of deterministic outcomes (bounds, not averages).
  • A common dictionary so all later chapters reuse the same metric meanings.
  • A budget template that decomposes one-way E2E delay into controllable blocks.

One-way E2E latency decomposition (switch/bridge-centric)

For deterministic design, the worst-case bound is driven by a small set of blocks. The most common dominating term is gate waiting (missing a window) or queueing under burst.

  • Ingress processing: parse/classify; becomes a fixed per-hop constant.
  • Gate waiting: time until the next open window; worst-case can approach one cycle if a window is missed.
  • Queueing + shaping delay: depends on burstiness and shaper parameters; bounded by admission + isolation.
  • Fabric + egress scheduling: forwarding latency inside the switch and port scheduling effects.
  • Serialization + propagation: hard lower bounds set by frame length, link rate, and cable length.

Jitter taxonomy (useful for verification and triage)

Queueing jitter

Variation caused by competing traffic and transient bursts; addressed by isolation, shaping, and admission.

Gate timing jitter

Window boundary uncertainty (schedule alignment, guard band, internal switching granularity).

Timestamp noise

Measurement noise from timestamp tap points and quantization; fixed by consistent tap definition and calibration.

Clock/time error

Offset/drift/holdover events shift schedules and can cause sporadic worst-case violations even when averages look good.

Metric Dictionary (standardized terms)

| Metric | Definition | Obs. point | Common misuse | Pass criteria |
| --- | --- | --- | --- | --- |
| One-way E2E latency (worst-case) | Maximum one-way delay under admitted load | Endpoint timestamps | Using mean/P95 as "bound" | ≤ X |
| Per-hop residence time | Ingress-to-egress time inside a switch/bridge | Switch HW timestamp/counter | Mixing store-and-forward and cut-through paths | ≤ X |
| Gate waiting time | Time from arrival to next open window | Ingress + schedule timeline | Ignoring missed-window worst-case | ≤ X |
| PDV (delay variation) | Peak-to-peak or percentile spread of one-way delay | Endpoint timestamps | Comparing different timestamp tap points | ≤ X |
| Window miss count | Packets arriving too late/early for a configured window | Switch gate counters | Assuming "no drops" implies "no misses" | ≤ X / hour |
| Drop breakdown | Drops by queue / policer / buffer watermark | Per-queue counters | Only tracking CRC errors | ≤ X |
| Timestamp error | Tap-to-wire (or wire-to-tap) uncertainty | Calibrated test | Comparing uncalibrated devices | ≤ X ns |
| Time base events | Lock transitions / holdover entries / time steps | Time-sync status/event log | Treating "locked" as "always stable" | 0 critical events |

All later chapters should reference these metrics by name. If a test cannot map to a metric above, it is likely not proving determinism.
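As an illustration of two dictionary entries, worst-case one-way latency and PDV can be computed from paired endpoint hardware timestamps. The sample timestamps below are invented for the sketch; in practice both ends must use the same tap definition, as the misuse column warns.

```python
# Sketch: "one-way E2E latency (worst-case)" and "PDV" from endpoint
# timestamps. Sample data is invented for illustration.

tx_ts = [0.000000, 0.000250, 0.000500, 0.000750]   # sender HW timestamps (s)
rx_ts = [0.000052, 0.000301, 0.000560, 0.000803]   # receiver HW timestamps (s)

delays = [rx - tx for tx, rx in zip(tx_ts, rx_ts)]
worst_case = max(delays)                # the bound metric, not mean/P95
pdv_pk_pk = max(delays) - min(delays)   # peak-to-peak delay variation

print(f"worst-case one-way delay: {worst_case * 1e6:.1f} us")
print(f"PDV (pk-pk): {pdv_pk_pk * 1e6:.1f} us")
```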

E2E latency budget template (per-hop block view)

| Block | Worst-case bound | How to measure | Owner | Risk flag |
| --- | --- | --- | --- | --- |
| Ingress processing | ≤ X | Switch counters / profiling | Switch config | Unexpected parsing path |
| Gate waiting | ≤ X (can approach one cycle) | Schedule timeline + gate counters | TSN schedule | Missed window / drift |
| Queueing + shaping | ≤ X | Per-queue occupancy / shaper stats | Shaping + admission | Microburst / oversubscription |
| Fabric + egress scheduling | ≤ X | Ingress/egress timestamps | Switch architecture | Unexpected store-and-forward path |
| Serialization | Frame_len / link_rate | Known constants | System spec | MTU growth |
| Propagation | Cable_len / velocity | Cable model / measurement | System design | Topology change |

Any deterministic claim should identify the dominant bound term and show how it is measured (or proven) under admitted load.

Latency budget waterfall: ingress-to-egress blocks and measurement points

[Diagram: latency waterfall from ingress parse/classify through gate wait (next window), queue contention, shaper (rate/burst), fabric forward, egress schedule, serialization (frame length / link rate), and propagation (cable length / velocity), with HW timestamp and counter/event measurement points; the worst-case bound is often dominated by gate wait (missed window) or queueing under burst]

The diagram is a repeatable template: each block should map to a measurement method and an owner. If a block cannot be measured or bounded, the design is not deterministic.

TSN Feature Map: Which Mechanism Solves Which Problem

Purpose (mechanism-first, not standards-first)

Determinism is achieved by choosing mechanisms that directly control the dominant term in the worst-case budget. This section maps each mechanism to the outcomes it can guarantee, the cost it introduces, and the scenarios where it is typically used. IEEE clause-level details and certification workflows are intentionally excluded to avoid overlapping with standards/certification pages.

Fast triage: pick the mechanism by the symptom

Hard deadline misses

Use Time-aware gating to bound waiting; add guard band / preemption when long frames overlap window edges.

PDV / jitter grows under load

Use Shaping (burst control) and admission control (resource cap); add policers to isolate misbehaving sources.

Looks “fine” but cannot be proven

Require hardware timestamping and per-queue counters/events; without observability, deterministic claims are not verifiable.

Mechanism-to-Outcome mapping (what each block actually guarantees)

| Mechanism | Primary outcome | Dominant term controlled | Cost / overhead | Typical use | Misuse symptom |
| --- | --- | --- | --- | --- | --- |
| Time-aware gating (Gate) | Bounded latency (windowed service), deterministic scheduling | Gate waiting (missed window dominates worst-case) | Schedule management (GCL), guard bands, time alignment dependency | Motion control, PLC cyclic traffic, deterministic triggers | Sporadic deadline misses; "great average, bad worst-case" |
| Traffic shaping (Shaper) | Bounded jitter under burst; predictable queue growth | Burstiness → queueing (microbursts / overshoot) | Added controllable latency; parameter tuning and validation effort | Imaging streams, mixed cyclic/acyclic networks, gateways | Jitter spikes when background load changes |
| Policing (Policer) | Isolation from misbehaving sources; protects deterministic domain | Ingress overload (rate/burst violations) | Drops/marking; requires clear violation definitions and counters | Multi-vendor integration, field expansion, mixed trust domains | "Random" drops without per-flow accountability |
| Admission control (Admission) | No starvation under admitted set; prevents future bound breakage | Resource cap (bandwidth/slots/queues/buffers) | Requires resource model + workflow (approve/rollback/versioning) | Scalable cells, factories with add-on nodes, shared infrastructures | Works until "one more device" is added, then bounds collapse |
| Hardware timestamp (Timestamp) | Proof and calibration; enables per-hop residence time measurement | Measurement credibility (tap-point consistency) | Requires clear tap definition + error budget + event logging | Any deterministic validation, multi-hop timing, commissioning | Conflicting measurements across tools/devices |

The mapping avoids standard clause references and focuses on engineering outcomes: which dominant term is controlled and how misconfiguration typically manifests.

Typical “mechanism recipes” by application (quick selection)

| Application | Gate | Shaper | Guard/Preempt | Admission | Timestamp |
| --- | --- | --- | --- | --- | --- |
| Motion control | | | | | |
| Imaging / vision | | | | | |
| PLC cyclic + diagnostics | | | | | |
| Robot cell (multi-axis) | | | | | |

The recipe matrix is a starting point. The final choice should be driven by the dominant worst-case term and the verification hooks available on the chosen switch/bridge.

TSN toolbox: five blocks and the outcomes they enable

[Diagram: toolbox mapping mechanisms to outcomes; mechanisms: Gate (time windows, deadline bound), Shaper (burst control, queue bound), Policer (rate guard, domain protection), Admission (resource cap, no starvation), Timestamp (tap definition, proof & calibration); outcomes: bounded latency, bounded jitter, isolation, proof hooks]

The toolbox view emphasizes mechanism selection by outcome. Each block must map to measurable metrics and counters; otherwise deterministic claims cannot be validated.

Hardware Architecture: Ingress → Classify → Queue → Shape → Gate → Egress

Why the internal pipeline matters

Determinism is implemented inside the switch/bridge pipeline. The location of classification, queueing, shaping, gating, and timestamp taps determines what can be bounded and what can be measured. This section describes the data path and the time path without expanding into PHY signal integrity.

Typical data path (conceptual pipeline)

  • Ingress parse/classify: maps frames to traffic class/queue/stream state.
  • Per-stream state: eligibility, policing, accounting, and shaping context.
  • Queues (per port / per TC): isolation boundary; occupancy drives worst-case delay.
  • Shapers/policers: bound burst input and protect deterministic domain.
  • Gate scheduler: time-aware service windows and guard-band handling.
  • Egress scheduling: final arbitration and transmit timing.

Cut-through vs store-and-forward (mechanism-level implications)

Cut-through

  • Lower typical latency, but worst-case must account for internal stalls and path variability.
  • Determinism requires measurable per-hop residence time and clear tap-point definition.

Store-and-forward

  • Includes frame-length dependent delay, but the forwarding model is often easier to bound.
  • Worst-case is dominated by queueing/gating rather than hidden internal variability when counters are available.

Hardware knobs checklist (selection and bring-up)

| Knob | Why it matters | How to verify | Red flag |
| --- | --- | --- | --- |
| Queue / TC count | Sets isolation granularity and scheduling expressiveness | Map flows to queues; confirm no unintended sharing | Key flows share a queue with best-effort |
| Per-port buffer + watermarks | Microburst tolerance and drop behavior; impacts worst-case delay | Stress with bursts; watch occupancy and drop-reason counters | Drops occur without accounting detail |
| Gate list depth + granularity | Limits schedule complexity and smallest achievable windows | Implement a representative GCL; verify window-miss counters | Cannot express required slots/guards |
| Timestamp tap points + resolution | Determines credibility of residence time and E2E measurements | Confirm ingress/egress timestamp availability and calibration hooks | Tap points are undocumented/inconsistent |
| Per-queue counters/events | Enables proof: drops, late-to-window, shaper eligibility, policing | Check counter coverage; validate with controlled faults | Only global counters exist |
| Telemetry / mirror / event log | Supports field forensics and configuration drift detection | Verify event logs (time base step/lock change) and export path | No reliable black-box signals |

Where to measure (conceptual observation points)

| Observation point | Typical signal | Proves | Common pitfall |
| --- | --- | --- | --- |
| Ingress timestamp (T_in) | Hardware timestamp / ingress marker | Per-hop residence time (with T_out) | Tap point mismatch across devices |
| Queue occupancy | Per-queue depth / watermark | Queueing/jitter dominance under load | Only global buffer metrics available |
| Gate miss / late-to-window | Window miss counters/events | Schedule correctness and guard band adequacy | Assuming "no drops" means "no misses" |
| Shaper eligibility | Credit/state counters | Burst control and stability under background load | Misinterpreting shaped delay as failure |
| Egress timestamp (T_out) | Hardware timestamp / egress marker | Residence time + output scheduling effects | Using software timestamps for proof |

Observation points should map to the metric dictionary and the E2E budget blocks. If a critical bound term cannot be measured, deterministic validation is incomplete.

Switch pipeline block diagram: data path and time path (where determinism is enforced)

[Diagram: switch pipeline separating the data path (ingress parse → classify TC/queue → stream state → per-port/per-TC queues Q0..Q2 → shaper eligibility / policer rate guard → gate time windows → egress arbitration + transmit) from the time path (time base sync + schedule driving the gate scheduler and the T_in/T_out timestamp unit)]

The diagram separates the data path (frames) from the time path (schedule/sync). Determinism is enforced at queues, shapers/policers, and gates, and proven via consistent timestamp tap points and per-queue counters.

Time-Aware Windows: Gate Control, Schedules, Guard Bands, and Preemption

What this section guarantees (mechanism-level)

Time-aware gating bounds worst-case waiting by turning contention into a scheduled service window. The engineering focus is: how to derive a gate schedule from periodic traffic, why guard bands are required, when preemption becomes necessary, and what multi-hop alignment must ensure. PTP algorithms and standard clause-level explanations are intentionally excluded to avoid overlapping timing/sync pages.

Bound model: what a gate window actually caps

  • Window waiting cap: if a queue is closed, the worst-case waiting is bounded by one cycle plus guard-band effects (conceptually).
  • Queueing vs scheduling: shaping bounds burst-driven queue growth; gating bounds service timing by design.
  • Proof hooks: “late-to-window” / “gate-miss” counters and consistent timestamp tap points are required to verify the bound.

Deriving a gate schedule from periodic traffic (practical workflow)

Inputs required (no protocol details)

  • Cycle time: X (control period / sync epoch).
  • Per-flow envelope: period, max frame size, deadline, queue/TC mapping.
  • Port constraints: link rate, queue count, gate switching overhead X, GCL depth X.
  • Policy: how much best-effort bandwidth is allowed without harming key windows.

Step-by-step schedule build

  1. Group critical flows by deadline and isolation needs; assign them to dedicated queues/TCs where possible.
  2. Choose a shared epoch (cycle start) that aligns with the system’s control rhythm and commissioning reference.
  3. Allocate time slots per critical queue so that the slot capacity covers the per-cycle payload at line rate with margin X.
  4. Insert guard bands at window edges to prevent boundary infringement by non-critical frames.
  5. Place best-effort windows in remaining space and restrict them by shaping/policing as needed.
  6. Validate the schedule with “late-to-window / gate-miss” counters and per-hop residence time measurements.
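Steps 3 and 5 above can be sketched as a slot-allocation pass. The cycle time, margin, and flow set below are placeholder assumptions standing in for the "X" values; real schedules must also respect gate switching overhead and GCL depth limits.

```python
# Sketch: size each critical queue's slot to cover per-cycle payload at line
# rate plus margin (step 3), leaving the remainder for best-effort (step 5).
# All numbers are illustrative placeholders.

LINK_RATE_BPS = 1_000_000_000
CYCLE_S = 250e-6          # control cycle (assumed)
MARGIN = 0.2              # 20% slot margin (the "margin X" above)

flows = [  # (queue, frames_per_cycle, max_frame_bytes)
    ("Q0", 2, 256),
    ("Q1", 1, 1024),
]

slots, offset = [], 0.0
for queue, n_frames, frame_bytes in flows:
    payload_s = n_frames * frame_bytes * 8 / LINK_RATE_BPS
    slot_s = payload_s * (1 + MARGIN)
    slots.append((queue, offset, slot_s))
    offset += slot_s

best_effort_s = CYCLE_S - offset   # remaining space for the BE window
assert best_effort_s > 0, "schedule infeasible: critical slots exceed cycle"
for queue, start, dur in slots:
    print(f"{queue}: start {start * 1e6:.2f} us, duration {dur * 1e6:.2f} us")
```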

Gate Control List (GCL) template (commissioning-ready)

| Cycle | Slot | Start (offset) | Duration | Gate state (Q0..Qn) | Guard band | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| T = X | S0 | 0 | X | Q0:OPEN, others:CLOSED | GB = X | Critical window |
| T = X | S1 | X | X | Q1:OPEN, others:CLOSED | GB = X | Secondary window |
| T = X | S2 | X | X | BE:OPEN (shaped), Q0/Q1:as needed | GB = X | Best-effort window |
| Repeat | S3.. | | | | | Version / rollback id |

Recommended commissioning practice: version the GCL, define a rollback plan, and validate per-hop residence time and window-miss counters before production rollout.

Guard-band quick estimation (rule block)

Minimum guard band (structural form)

GB ≥ (MaxFrameTime at line rate) + (Gate switching overhead X) + (Margin X)

  • MaxFrameTime depends on maximum non-critical frame size X and link speed X.
  • Switching overhead includes implementation latency when changing gate state (X).
  • Margin covers clock error, timestamp uncertainty, and measurement bias (X).
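The rule above can be evaluated numerically. The overhead and margin values below are placeholder assumptions for the "X" terms; the link rate is chosen deliberately low to show why the guard band can become painful.

```python
# Sketch of the guard-band rule: GB >= max non-critical frame time at line
# rate + gate switching overhead + margin. All "X" values are assumptions.

LINK_RATE_BPS = 100_000_000      # 100 Mb/s link (assumed)
MAX_BE_FRAME_BYTES = 1522        # largest best-effort frame (assumed)
GATE_SWITCH_OVERHEAD_S = 1e-6    # implementation-specific (assumed)
MARGIN_S = 2e-6                  # clock error + timestamp uncertainty (assumed)

max_frame_time_s = MAX_BE_FRAME_BYTES * 8 / LINK_RATE_BPS
guard_band_s = max_frame_time_s + GATE_SWITCH_OVERHEAD_S + MARGIN_S
print(f"minimum guard band: {guard_band_s * 1e6:.1f} us")
```

At 100 Mb/s a full-size best-effort frame alone forces roughly 122 µs of guard band per window edge, which is exactly the waste that preemption (next subsection) is meant to reclaim.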

Fast symptom check (guard band too small)

  • Late-to-window / gate-miss counters increase in a periodic pattern.
  • Failures correlate with large best-effort frames near window edges.
  • Reducing best-effort MTU or adding preemption reduces misses immediately.

When preemption is necessary (decision logic + trade-offs)

| Trigger | Why guard band alone is inefficient | Expected gain | Cost / risk |
| --- | --- | --- | --- |
| Narrow windows + large BE frames | Guard band must cover near-max frame time, wasting slot capacity | Smaller guard band, better window utilization | Higher configuration/validation complexity |
| High-utilization deterministic schedule | Static gaps accumulate and reduce usable bandwidth | More payload per cycle without violating window edges | Debugging and forensics become harder |
| Strict window-edge integrity must be proven | Without preemption, BE overlap risk must be eliminated by oversized guard bands | Cleaner proof of "no overlap" with proper counters | Requires consistent capability across the path |

Multi-hop note (no sync algorithm details): window phase alignment must ensure the flow’s arrival time lands inside the intended window at every hop with margin X.

Time wheel + gate timeline: cycle, slots, guard band, and preemption boundary

[Diagram: cycle wheel (T = X) with Slot A, Slot B, BE, and idle segments around the epoch, beside a per-queue gate open/close timeline (Q0, Q1, BE) showing open windows, guard bands, and the preemption boundary at window edges]

The timeline view highlights the non-obvious constraint: window edges must be protected from overlap. Guard bands and preemption are the primary tools to preserve edge integrity without sacrificing excessive slot capacity.

Not covered here (scope guard)

  • PTP/SyncE algorithm internals and topology selection logic.
  • Clause-by-clause IEEE interpretations and certification procedures.

Shaping for Determinism: CBS / ATS / Rate-Limit and Queue Discipline

When gating is not required

Many systems achieve sufficient determinism without explicit time windows by bounding burstiness and isolating queues. This section explains how shaping reduces queue growth and packet delay variation, where stability traps occur, and how to decide when a gate schedule becomes mandatory.

Mechanism comparison (what each shaper controls)

CBS

  • Best for: steady or near-periodic streams needing bounded PDV.
  • Controls: output smoothness via eligibility/credit dynamics.
  • Typical pitfall: credit reset/initialization causes periodic “mystery jitter”.

ATS

  • Best for: mixed or bursty traffic requiring per-flow timing discipline.
  • Controls: burst spreading via time-based eligibility scheduling.
  • Typical pitfall: queue coupling hides microbursts unless per-queue telemetry exists.

Rate-limit

  • Best for: ingress protection and simple fairness, not strict deadlines.
  • Controls: peak rate/burst envelope to prevent overload.
  • Typical pitfall: “average looks OK” while microbursts still hit watermarks.
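The rate-limit/policer behavior described above can be sketched as a token bucket. The class name and parameters below are invented for this example; note how the third frame of a microburst is flagged even though the average rate "looks OK".

```python
# Sketch: a token-bucket rate limiter of the kind described above. It caps
# the peak rate/burst envelope but does not bound per-packet deadlines.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0      # tokens measured in bytes
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last_t = 0.0

    def conforms(self, t: float, frame_bytes: int) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last_t) * self.rate)
        self.last_t = t
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False                    # violation: drop or mark

tb = TokenBucket(rate_bps=10_000_000, burst_bytes=3000)     # placeholders
arrivals = [(0.0000, 1500), (0.0001, 1500), (0.0002, 1500)] # a microburst
print([tb.conforms(t, size) for t, size in arrivals])       # third frame fails
```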

Shaper stability traps (symptom → quick check → fix)

Microburst dominance

Quick check: correlate PDV spikes with queue watermarks/occupancy peaks (not averages).
Fix: tighten ingress policing, isolate queues, and set buffer watermarks X with telemetry.

Credit/state discontinuities

Quick check: jitter spikes repeat with a fixed cadence tied to state resets or link events.
Fix: validate shaper initialization, avoid hidden resets, and log eligibility state transitions.

Queue coupling

Quick check: key flows share a queue/TC with best-effort or diagnostics traffic.
Fix: dedicate queues for key classes, separate buffer pools where supported, and enforce admission rules.

Shaper selection decision tree (gate vs CBS vs ATS vs rate-limit)

  1. If the requirement is a hard deadline bound at specific cycle phases → prefer Gate (and guard/preempt as needed).
  2. If traffic is near-periodic and the target is bounded PDV under load → choose CBS with strict queue isolation.
  3. If traffic is bursty/mixed and needs controlled eligibility timing → choose ATS and verify per-flow/queue telemetry.
  4. If the goal is ingress protection and preventing overload rather than strict determinism → apply rate-limit/policer plus admission.
  5. If adding one more stream risks breaking existing bounds → enforce admission control regardless of shaper choice.
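The decision tree above can be written down as a function, which makes the precedence explicit (hard deadline first, admission always). The flag names are invented labels for this sketch, not a real API.

```python
# Sketch: the shaper selection decision tree as code. Flag names are
# invented for illustration.

def pick_mechanisms(hard_deadline: bool, near_periodic: bool,
                    bursty_mixed: bool, ingress_protection_only: bool) -> list:
    chosen = []
    if hard_deadline:                          # step 1
        chosen.append("gate (+guard band / preemption as needed)")
    elif near_periodic:                        # step 2
        chosen.append("CBS with strict queue isolation")
    elif bursty_mixed:                         # step 3
        chosen.append("ATS with per-flow/queue telemetry")
    elif ingress_protection_only:              # step 4
        chosen.append("rate-limit/policer")
    chosen.append("admission control")         # step 5: always enforced
    return chosen

print(pick_mechanisms(hard_deadline=False, near_periodic=True,
                      bursty_mixed=False, ingress_protection_only=False))
```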

Microburst control checklist (field-stable determinism)

  • Ingress rate guard: define burst envelope and enforce with policing.
  • Queue isolation: keep key flows away from best-effort and diagnostics queues.
  • Buffer watermarks: set early warning thresholds X and log occupancy peaks.
  • Telemetry: require per-queue occupancy peak, drop reason, eligibility/credit state.
  • Admission: prevent incremental additions from silently breaking bounds.

Queue + shaper behaviors: no shaping vs CBS vs ATS (occupancy over time)

[Diagram: three queue-occupancy-over-time panels with watermark lines: no shaping (higher peaks), CBS (smoothed), ATS (bounded)]

The panels focus on occupancy peaks rather than averages. In practice, microburst control must be validated with per-queue occupancy peaks, watermarks, and eligibility/credit telemetry.

Not covered here (scope guard)

  • Protocol stack specifics, parameter tables for particular industrial profiles, and certification workflow details.
  • PHY/magnetics/layout/EMI topics (handled in the PHY co-design and protection pages).

Admission Control & Isolation: Keep the Network from Becoming a Tuning Lottery

Why admission exists (determinism needs contracts)

Gates and shapers control forwarding behavior, but without admission the guarantees are eventually broken by new flows, new tenants, or untracked diagnostics traffic. Admission turns determinism into an enforceable contract by making bandwidth, queues, gate slots, buffers, and time budget explicit and versioned.

Admission input → decision → contract output

Inputs (flow descriptor)

  • Period: X
  • Max frame: X
  • Priority / TC: X (and queue mapping)
  • Path: ingress port → … → egress port(s)
  • Bound required: latency X / jitter X / loss X
  • Burst policy: allowed / disallowed (X)

Outputs (decision + reservation contract)

  • Decision: Admit / Reject / Admit-with-conditions
  • Reserved resources per hop: slot time X, shaper rate X, buffer quota X
  • Isolation mapping: dedicated queue/TC + counters scope
  • Monitoring hooks: thresholds X + violation actions
  • Versioning: config id + rollback pack

Resource dimensions admission must account for

  • Bandwidth: per-cycle service capacity and sustained load margins (X).
  • Queues & priorities: queue count, TC mapping, and coupling risk from shared queues.
  • Gate slots: available slot time, phase margin (X), and GCL depth (X).
  • Buffers: per-port buffers, shared pools, watermarks (X), and drop reasons.
  • Time budget: per-hop residence time tail + window waiting cap + guard band margin (X).

Common failure pattern: “average bandwidth looks fine” while slot occupancy, buffer peaks, or window-miss counters silently cross the determinism contract.
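A minimal version of the slot-time dimension of this check can be sketched as follows. The resource model is deliberately reduced to one dimension (slot time per cycle); a real admission pipeline would also check bandwidth, queues, buffers, and the time budget, and all numbers here are placeholders.

```python
# Sketch: per-hop admission over reserved gate-slot time. A new flow is
# rejected if it would push reserved slot time past the per-cycle budget.
# Resource model and figures are illustrative assumptions.

LINK_RATE_BPS = 1_000_000_000
SLOT_BUDGET_S = 100e-6      # critical slot time available per cycle (assumed)

admitted = []               # (flow_id, frames_per_cycle, max_frame_bytes)

def slot_time_s(frames: int, frame_bytes: int) -> float:
    return frames * frame_bytes * 8 / LINK_RATE_BPS

def try_admit(flow_id: str, frames: int, frame_bytes: int) -> str:
    used = sum(slot_time_s(f, b) for _, f, b in admitted)
    if used + slot_time_s(frames, frame_bytes) > SLOT_BUDGET_S:
        return "Reject"     # would break existing contracts
    admitted.append((flow_id, frames, frame_bytes))
    return "Admit"

print(try_admit("F-001", 4, 1522))   # ~48.7 us of slot time
print(try_admit("F-002", 4, 1522))   # ~97.4 us cumulative
print(try_admit("F-003", 1, 1522))   # would exceed the budget
```

The point of the sketch is the contract behavior: the third request is rejected even though "average bandwidth" on the link is still far from saturated.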

Admission worksheet (template)

| Flow ID | Owner | Period | Max frame | Priority/TC | Path | Bound target | Reservation per hop | Monitor & thresholds | Admit? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F-001 | Line-A | X | X | TC0 / Q0 | P1→P3→P7 | Latency X, Jitter X | slot X, shaper X, buf X | wm X, miss X, drop X | Admit |
| F-002 | Diagnostics | X | X | BE / Qx | P2→P6 | Loss X | policer X, buf X | police-hit X, drop X | Conditional |
| Version | | | | | | | | | |

Operational rule: any new flow must be recorded, evaluated, and admitted with a versioned contract; direct “add it and see” changes convert determinism into a tuning lottery.

Isolation rules (principles that prevent configuration collisions)

Principles

  • Critical classes must have dedicated queue/TC mapping and independent bounds.
  • Best-effort and diagnostics must never share a critical queue.
  • Every admitted class must have scoped counters and thresholds (X).

Enforcement points

  • Namespace configs by tenant / line / cell to avoid cross-edits.
  • Lock “critical zones” (queues/slots) behind approval and rollback packaging.
  • Assign hard quotas per tenant: bandwidth/slot/buffer/counter budgets (X).

Runtime monitoring & violation handling (contract must be provable)

  • Monitor: queue occupancy peak, watermark hits, window-miss/late counters, drop reasons, policer hits.
  • Detect: threshold breach (X) tied to a specific flow/tenant and port.
  • Act: rate-limit, downgrade to BE, isolate to a sacrificial queue, or reject renewal.
  • Audit: log config version + violation context for rollback and root-cause.

Admission pipeline: request → resource calc → policy → deploy → monitor → handle violations

[Diagram: admission pipeline as a closed loop across the contract boundary: flow request (period, max frame, path) → resource calc (slot, queue, buffer) → policy & quota (namespace, quota, approve) → deploy config (GCL, shaper, version) → runtime monitor (counters, alerts, audit) → violation handler (rate-limit, downgrade, isolate, reject) with feedback]

The key requirement is traceability: admission decisions, reserved resources, and violation actions must all be tied to a versioned configuration and an auditable owner/tenant boundary.

Not covered here (scope guard)

  • Industrial protocol stack parameters and certification procedures.
  • Timing/sync algorithm internals (handled in Timing & Sync pages).

Hardware Timestamping & Time Base: Accuracy, Two-Step, and Residence Time

Why timestamps must be trustworthy

Deterministic forwarding still fails closed-loop control if timestamp tap points drift or time base states are not observable. This section focuses on hardware timestamp insertion points, residence time observability, one-step vs two-step engineering trade-offs, and a practical error budget structure.

Time base sources and observable states (no algorithm details)

  • Local oscillator: simple deployment; requires drift monitoring and defined holdover triggers (X).
  • Synchronized reference: better long-term accuracy; requires lock/offset/drift observability.
  • Recovered/servo time: transitions into holdover on loss; must define re-lock behavior and alarm thresholds (X).

Minimum observables (sanity signals)

  • Lock state (stable for X time)
  • Offset (within X) and drift rate (within X)
  • Holdover entry conditions and maximum allowed duration (X)

Timestamp tap points: where error is created

  • Ingress timestamp: captures arrival into the switch pipeline; determines start of residence time.
  • Egress timestamp: captures departure; determines the tail behavior under congestion and shaping.
  • MAC vs PHY boundary: a more line-side tap reduces unmodeled delay; an inner tap is easier but adds tap uncertainty (X).

Residence time and one-step vs two-step (engineering view)

Residence time

The measurable quantity is the time spent inside the device from ingress tap to egress tap. Verification should focus on tail percentiles (P99/P999) with thresholds X, not only averages.

One-step vs two-step

  • One-step: in-line update at line rate; demands tight implementation and validation.
  • Two-step: follow-up event simplifies implementation; requires robust event matching and consistency checks.
  • Debug impact: two-step can improve forensics by separating transmission and correction paths.

Timestamp error budget (structure, thresholds as X)

| Error term | What it represents | How to observe | Threshold |
| --- | --- | --- | --- |
| Time base | Offset/drift/holdover effects | Lock/offset/drift counters | X |
| Quantization | Timestamp resolution limit | Timestamp LSB and distribution | X |
| Tap uncertainty | MAC/PHY boundary ambiguity | Calibration + loopback compare | X |
| Queueing tail | Residence time percentile tail under load | P99/P999 residence time (X) | X |

Pass criteria suggestion: treat time base and tap uncertainty as bounded terms, and validate queueing tail with a load profile representative of production (thresholds X).
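Validating the queueing-tail term reduces to computing percentiles of measured residence times and comparing them to the threshold. The samples below are simulated (a base delay plus an exponential congestion tail) and the 30 µs threshold is an assumed stand-in for "X".

```python
# Sketch: checking the "queueing tail" row by computing P99/P99.9 of
# residence-time samples. Samples and threshold are illustrative.

import random
random.seed(7)

# Simulated residence times (s): fixed base plus an exponential tail.
samples = [5e-6 + random.expovariate(1 / 2e-6) for _ in range(10_000)]

def percentile(data, p):
    s = sorted(data)
    idx = min(len(s) - 1, int(p / 100 * len(s)))
    return s[idx]

p99 = percentile(samples, 99)
p999 = percentile(samples, 99.9)
print(f"P99 residence: {p99 * 1e6:.2f} us, P99.9: {p999 * 1e6:.2f} us")

THRESHOLD_S = 30e-6          # the "X" pass criterion (assumed)
assert p999 <= THRESHOLD_S, "queueing tail exceeds budget"
```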

Sync sanity checklist (commissioning & field ops)

  • Lock: stable for X time; no flapping.
  • Offset: within X; verify after temperature steps.
  • Drift: within X per unit time; flag trend increases.
  • Holdover: trigger conditions defined (X), and maximum allowed duration (X).
  • Recovery: post re-lock behavior verified; re-calibrate window phase if required.

Timestamp tap points: ingress/egress (MAC vs PHY boundary) and error sources

[Diagram: switch pipeline (ingress port → parser → classify → queue → shaper → egress port) with timestamp taps TS_in/TS_out at both the MAC and PHY boundaries, residence time spanning the pipeline; error sources: time base (offset/drift), quantization (resolution LSB), tap uncertainty (MAC vs PHY delta), queueing tail (P99/P999 X); one-step (in-line update, tight validation) vs two-step (follow-up event, event matching)]

Practical verification: document the exact tap definition, treat residence time as a percentile distribution, and require time base lock/offset/drift observability with clear holdover triggers (X).

Not covered here (scope guard)

  • PTP/SyncE/White-Rabbit protocol algorithm details and topology selection.
  • PHY signal integrity and magnetics/layout considerations.

Configuration & Parameterization: GCL Tables, Profiles, and Guardrails

Why configuration must be engineered (not “tuned”)

TSN failures often come from table drift, partial updates, and unmanaged profiles rather than incorrect theory. A deterministic network needs configuration assets that are packaged, validated, staged, committed atomically, and rolled back reliably.

Configuration inventory: the tables a TSN switch/bridge must manage

Data-plane tables

  • GCL: cycle, slots, gate masks, guard band (X).
  • Stream/flow table: period, max frame, path, class, burst policy.
  • Queue/TC mapping: priority → TC → queue; no holes or critical sharing.
  • Shaper params: CBS / ATS / rate-limit knobs aligned to the chosen profile.
  • Admission quotas: per-hop reservations and violation actions.

Time & timestamp configuration

  • Time-base policy: lock/offset/drift/holdover thresholds (X).
  • Timestamp mode: one-step/two-step, tap definition, residency measurement enable.
  • Update gating: no schedule commit when time base is not locked.

Ops guardrails

  • Thresholds & alarms: gate miss, late/early window, watermark, timestamp jump, holdover events.
  • Rollback strategy: shadow bank + atomic commit + complete rollback pack.
  • Audit: every applied profile ties to a version id and change record.

Parameter Set package: directory rules for a versioned config bundle

  • /manifest.json — version, target device, port-rate constraints, prerequisites, checksums.
  • /tables/gcl.csv — cycle, slot list, gate masks, guard band (X).
  • /tables/queue_map.json — priority/TC/queue mapping rules.
  • /tables/streams.csv — flow descriptors, path and bound targets.
  • /tables/shapers.json — CBS/ATS/rate-limit parameters.
  • /tables/quotas.json — admission reservations and violation actions.
  • /ops/thresholds.json — alarms and thresholds (X).
  • /ops/rollback/ — complete rollback pack (all dependent tables).
  • /checks/ — offline guardrails: slot min, GCL depth, map integrity, time-lock gates.

Design rule: configuration is not a scattered set of register writes; it is a deployable asset with dependencies, validation, and rollback semantics.
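The offline-validate step for such a bundle can be sketched as follows. The manifest layout (a "files" map of relative path to SHA-256) is an assumed convention, not a standard format.

```python
import hashlib
import json
import pathlib

def validate_bundle(bundle_dir: str) -> list:
    """Offline guardrail: verify manifest checksums and dependency completeness.

    Assumes manifest.json carries {"version": ..., "files": {relpath: sha256}}.
    Returns a list of problems; an empty list means the bundle may be staged.
    """
    root = pathlib.Path(bundle_dir)
    try:
        manifest = json.loads((root / "manifest.json").read_text())
    except FileNotFoundError:
        return ["manifest.json missing"]
    problems = []
    for relpath, expected in manifest.get("files", {}).items():
        f = root / relpath
        if not f.exists():
            problems.append(f"missing dependent table: {relpath}")
            continue
        actual = hashlib.sha256(f.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"checksum mismatch: {relpath}")
    return problems
```

Running this before staging enforces "configuration is a deployable asset": an incomplete or tampered bundle is rejected before it can reach a shadow bank.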

Safe change process: avoid partial updates that “hang” the network

Recommended sequence

  1. Offline validate: manifest integrity + dependency checks.
  2. Stage: load into shadow bank (not active).
  3. Pre-check: time base lock/offset/drift within thresholds (X).
  4. Atomic commit: switch at a defined boundary (e.g., cycle edge).
  5. Canary rollout: a subset of nodes first; watch gate/window/queue tails.
  6. Rollback-ready: auto revert if key counters exceed thresholds (X).
  7. Audit: bind events and snapshots to config version.
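The sequence above can be sketched as a small orchestrator. All device interactions are injected callables, since the actual commit and counter APIs are device-specific; every name here is hypothetical.

```python
def safe_rollout(devices, bundle, validate, time_locked, commit, healthy, rollback):
    """Staged, rollback-ready rollout following the recommended sequence.

    Injected callables (all assumed interfaces):
      validate(bundle) -> list of problems (step 1)
      time_locked(dev) -> bool            (step 3 pre-check)
      commit(dev, bundle)                 (steps 2+4: shadow load + atomic swap)
      healthy(dev) -> bool                (step 5: gate/window/queue tails OK)
      rollback(dev)                       (step 6: auto revert)
    """
    problems = validate(bundle)              # 1. offline validation
    if problems:
        return ("rejected", problems)
    canary, rest = devices[:1], devices[1:]  # 5. canary subset first
    for wave in (canary, rest):
        for dev in wave:
            if not time_locked(dev):         # 3. no commit without time lock
                return ("blocked", [f"{dev}: time base not locked"])
            commit(dev, bundle)              # 2+4. stage + atomic commit
        if not all(healthy(d) for d in wave):
            for d in wave:                   # 6. revert on threshold crossing
                rollback(d)
            return ("rolled_back", [])
    return ("committed", [])                 # 7. audit binding left to caller
```

The key property is that the fleet never sees a partial dependency set: a wave either commits completely and stays healthy, or is reverted as a unit.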

Typical failure mode

A schedule update becomes unsafe when only part of the dependency set is applied (GCL updated, but queue map or shaper parameters remain old). Guardrails should enforce “all-or-nothing” activation or reject the commit.

Guardrails: rules that prevent unsafe parameter sets

  • Slot minimum: slot ≥ X (depends on max frame, link rate, and gate switch latency).
  • Guard band presence: critical window transitions reserve guard band X.
  • GCL depth: entry count ≤ X; reject truncated lists.
  • Map integrity: TC→queue mapping has no holes; critical queues are not shared.
  • Time-lock gate: no commit when time base is unlocked or in holdover.
  • Rollback completeness: rollback must include all dependent tables (GCL + map + shapers + thresholds).
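The slot-minimum and guard-band rules come from worst-case frame serialization. A worked sketch (the gate-switch latency value is device-specific and assumed here):

```python
def guard_band_ns(max_frame_bytes: int, link_rate_bps: int) -> int:
    """Guard band: time for a just-started worst-case frame to finish.

    Wire footprint of one frame = preamble+SFD (8 B) + frame + inter-frame
    gap (12 B), serialized at the link rate.
    """
    wire_bits = (8 + max_frame_bytes + 12) * 8
    return wire_bits * 1_000_000_000 // link_rate_bps

def min_slot_ns(max_frame_bytes: int, link_rate_bps: int,
                gate_switch_ns: int) -> int:
    """Slot minimum guardrail: worst-case serialization + gate switch latency."""
    return guard_band_ns(max_frame_bytes, link_rate_bps) + gate_switch_ns
```

For a 1518-byte frame at 1 Gb/s the serialization term is 12304 ns, so any slot shorter than that (plus the gate-switch latency) should be rejected by the offline checks.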

When changes require re-computation (which tables must be re-validated)

Change event | Tables impacted | Must re-check
Time topology change | time policy, timestamp mode, guard band margin | lock/offset/drift (X), window phase margin (X)
Path / hop change | streams, quotas, queue map | per-hop reservation, isolation integrity, tail percentiles (X)
Port rate change | GCL slot lengths, guard band, shapers | slot minimum (X), serialization budget, window boundary tests
Queue resource change | queue map, GCL masks, shaper policies | no holes, no critical sharing, watermark caps (X)
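This matrix can be encoded as a lookup so tooling refuses a commit that skipped a required re-check. The event and check names below mirror the table but are hypothetical identifiers.

```python
# Mirrors the re-validation table: change event -> (tables impacted, checks to re-run).
REVALIDATION = {
    "time_topology": (["time_policy", "timestamp_mode", "guard_band_margin"],
                      ["lock_offset_drift", "window_phase_margin"]),
    "path_hop": (["streams", "quotas", "queue_map"],
                 ["per_hop_reservation", "isolation_integrity", "tail_percentiles"]),
    "port_rate": (["gcl_slots", "guard_band", "shapers"],
                  ["slot_minimum", "serialization_budget", "window_boundary_tests"]),
    "queue_resource": (["queue_map", "gcl_masks", "shaper_policies"],
                       ["no_holes", "no_critical_sharing", "watermark_caps"]),
}

def checks_required(events: list) -> set:
    """Union of re-checks required for a batch of change events."""
    required = set()
    for ev in events:
        _tables, checks = REVALIDATION[ev]
        required.update(checks)
    return required
```

A commit pipeline can then compare `checks_required(changes)` against the set of checks that actually ran, and reject the bundle on any gap.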

Config bundle: a versioned Parameter Set staged to shadow banks and committed atomically across switches

[Diagram: a versioned Parameter Set (version + manifest; GCL, streams, queue map, shapers, quotas, thresholds, checks, rollback) flows through an orchestrator (stage, commit, canary, audit) past guardrails (slot min, depth, lock gate) into a switch fleet with shadow + active banks per switch; the atomic commit happens at a cycle boundary, all-or-nothing and rollback-ready.]

A safe rollout requires dependency completeness (tables move together), pre-commit time-lock checks, and a rollback pack that restores all coupled parameters.

Not covered here (scope guard)

  • Timing/sync algorithm details and protocol parameter internals.
  • Physical layer SI/layout/magnetics and EMC design details.

Verification & Observability: Prove Bounds and Catch Field Drift

Determinism is only real when bounds are provable

Verification must demonstrate time stability, schedule correctness, and queue/shaper contract compliance under realistic worst-case load. Field observability must correlate errors with configuration versions and environmental conditions to enable fast forensics.

What must be proven (three bound categories)

  • Time stability: lock/offset/drift are stable; holdover events are controlled (X).
  • Schedule correctness: GCL is active; no late/early windows; gate misses near zero (X).
  • Contract compliance: queue watermarks and drop reasons remain within limits; tail percentiles meet bounds (X).

Counters and events: organize by diagnostic value

Schedule & gate

  • gate miss, late window, early window (threshold X)
  • window transition counts and anomalies

Queue & buffer

  • per-queue drops with drop reasons (overflow/policer/gate closed)
  • watermark peaks and microburst indicators (X)

Timestamp & time base

  • timestamp jump, lock state changes, holdover enter/exit events
  • offset/drift trends (X) and stability windows

Port health (switch view)

  • link flap events and transitions
  • per-port error counters (concept-level)

Operational requirement: every snapshot must bind to port + queue + (optional) flow id and the config version.

Correlate failures to environment + version + events (field forensics)

  • Error: gate miss spike / watermark over / tail percentile jump / drop reason change.
  • Environment: temperature, voltage, power events.
  • Version: config_version + change record.
  • Link events: flap transitions as a frequent trigger source.

Bring-up test plan (minimum closed loop)

Test | Method | Observables | Pass criteria
Sync lock test | stabilize and step temperature / load | lock, offset, drift, holdover events | stable within X
GCL boundary | probe near window edges | late/early window, gate miss | ≤ X
Worst-case background | stress with BE + burst patterns | watermark, drops, tail percentiles | P99/P999 ≤ X
E2E bound proof | measure per-hop + end-to-end distribution | latency/jitter/loss distributions | upper bound ≤ X

Field black-box schema (flight-recorder style)

  • timestamp: local time + sync state
  • config_version: active Parameter Set id
  • event_type: holdover_enter, gate_miss_spike, watermark_over, link_flap, …
  • context: port, queue, (optional) flow_id
  • counters_snapshot: gate/window/queue/drop/time-base counters
  • env_snapshot: temperature, voltage, power-event flags
  • action_taken: rate-limit / downgrade / rollback / alert

Trigger strategy: periodic snapshots every X seconds plus event-triggered snapshots on threshold crossing, stored in a ring buffer of size X.
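The schema and ring-buffer trigger strategy can be sketched as a small recorder class. Field names follow the schema above; the class itself and its depth default are assumptions.

```python
from collections import deque

class BlackBox:
    """Flight-recorder sketch: fixed-size ring buffer of snapshots.

    Snapshots follow the schema above (timestamp, config_version, event_type,
    context, counters, env, action). Periodic scheduling (every X seconds) is
    left to the caller; this class only handles storage and capture.
    """
    def __init__(self, depth: int = 256):
        self.ring = deque(maxlen=depth)   # ring buffer of size X

    def snapshot(self, timestamp, config_version, event_type, context,
                 counters, env, action_taken=None):
        self.ring.append({
            "timestamp": timestamp,
            "config_version": config_version,   # active Parameter Set id
            "event_type": event_type,           # e.g. holdover_enter, gate_miss_spike
            "context": context,                 # port, queue, optional flow_id
            "counters_snapshot": counters,
            "env_snapshot": env,                # temperature, voltage, power flags
            "action_taken": action_taken,       # rate-limit / downgrade / rollback / alert
        })

    def since(self, t0):
        """Forensics helper: all snapshots at or after t0, oldest first."""
        return [s for s in self.ring if s["timestamp"] >= t0]
```

Because every snapshot carries the config version and environment, a field report can answer "what changed" by diffing snapshots around the triggering event.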

Observe & correlate: traffic → counters → event log → diagnosis (with env + version)

[Diagram: traffic (critical flows + best-effort) feeds counters (gate/window, queue, drops, time); counters feed an event log (gate miss spike, holdover enter, watermark over); context adds environment (temp/volt) and config version; correlation (spike vs event, version vs env) yields a diagnosis bucket and action.]

A useful black box ties counters and events to the active config version and environment snapshot, enabling fast “what changed” answers and safe rollback decisions.

Not covered here (scope guard)

  • Industrial stack certification steps and protocol parameter deep-dives.
  • Timing/sync algorithm internals and PHY SI/layout details.

Engineering Checklist: Design → Bring-up → Production Gates

Engineering gates that turn mechanisms into repeatable outcomes

Each gate enforces resource sufficiency, time stability, configuration integrity, and proof-quality evidence so determinism does not depend on manual tuning.

Design Gate Goal: resource sufficiency + evidence readiness

Resources & pipeline capacity

  • Check: queue/TC count covers isolation (no critical sharing). Pass: isolation plan has no overlaps (X). Evidence: queue map + profile sheet.
  • Check: per-port buffer covers worst-case microburst. Pass: watermark budget ≥ X. Evidence: buffer sizing worksheet.
  • Check: GCL depth/slot count supports cycle composition. Pass: entry count ≤ X, slot min ≥ X. Evidence: GCL template + guardrail report.
  • Check: shaper instances and granularity match profile. Pass: all required shapers available (X). Evidence: shaper allocation table.

Time base & commit safety

  • Check: lock/offset/drift/holdover policy is defined. Pass: thresholds documented (X). Evidence: time policy card + manifest prereqs.
  • Check: “time-lock gate” blocks schedule commits when unlocked. Pass: commit is rejected under unlock/holdover. Evidence: guardrail rule + test record.

Config asset & observability readiness

  • Check: Parameter Set package is versioned and complete. Pass: manifest + dependent tables present. Evidence: config bundle tree + checksums.
  • Check: counters/events cover schedule + queue + time base. Pass: must-have signals enabled (X). Evidence: counter map + black-box schema.
Bring-up Gate Goal: prove bounds under worst-case load

Time stability & schedule correctness

  • Check: sync lock remains stable for X minutes. Pass: offset/drift within X; holdover events within X. Evidence: lock trend + event log.
  • Check: GCL activation is correct. Pass: late/early window ≤ X; gate miss ≤ X. Evidence: counter snapshots tied to version.
  • Check: window boundary probe is clean. Pass: no boundary violations (X). Evidence: boundary test record.

Worst-case background stress

  • Check: key queues stay below watermark limits. Pass: watermark ≤ X; no overflow drops. Evidence: per-queue watermark log.
  • Check: tail percentiles meet bounds. Pass: P99/P999 ≤ X. Evidence: latency distribution export.
  • Check: drop reasons are explainable. Pass: expected-only drop reasons. Evidence: drop-reason breakdown.

Change safety rehearsal

  • Check: shadow load + atomic commit works at cycle boundary. Pass: no partial activation. Evidence: commit audit with version ids.
  • Check: rollback triggers on threshold crossing. Pass: rollback completes and restores bounds (X). Evidence: rollback event + post-rollback counters.
Production Gate Goal: repeatability + rollback + forensics

Config governance

  • Check: every device boots with a known config_version. Pass: version is readable and logged. Evidence: boot log + inventory report.
  • Check: dependency completeness is enforced. Pass: reject incomplete bundles. Evidence: guardrail rejection record.

Rollback readiness

  • Check: rollback drill is performed and timed. Pass: restore within X; bounds recover. Evidence: drill report + counters before/after.
  • Check: no-commit under unlock/holdover is enforced. Pass: commit blocked. Evidence: audit + event log.

Field black-box completeness

  • Check: snapshot includes version + env + counters. Pass: required fields present. Evidence: schema validation report.
  • Check: data capture completeness exceeds X%. Pass: completeness > X%. Evidence: periodic audit summary.

3-gate pipeline: design → bring-up → production → field forensics (pass criteria at each gate)

[Diagram: Design Gate (resources OK, guardrails set, evidence ready) leads to Bring-up Gate (lock stable, windows clean, P99/P999 ≤ X), then Production Gate (versioned config, rollback drill, black-box > X%), then field forensics (event + counters, config version + env, diagnosis leading to action: limit / rollback / fix), with feedback to design.]

Each gate ties pass criteria to measurable counters, snapshots, and the active configuration version, enabling consistent outcomes across design, lab, production, and field.

Scope guard (not expanded here)

  • PHY/layout/magnetics/EMC implementation details (refer to PHY co-design & protection pages).
  • Timing algorithm internals and industrial stack certification procedures.

Applications: Where TSN Switch/Bridge Actually Pays Off

Use cases should map to mechanism recipes (not standard clause lists)

Each scenario is defined by workload patterns and bound targets. The mechanism recipe selects Gate/Shaper/Admission/Timestamp/Observe as a coherent set.

Use-case → mechanism recipe (concept-level)

Use case | Bound target (X) | Gate | Shaper | Admission | Timestamp | Observe
Motion control | bounded latency + jitter (X) | on (windowed cyclic) | optional (BE smoothing) | required (no new flow breaks bounds) | optional (alignment proof) | gate/window counters + tail P99/P999
Machine vision / imaging | burst control + trigger determinism (X) | sometimes (trigger windows) | required (microburst suppression) | recommended (capacity reservation) | recommended (event alignment) | watermark + drop reason + tail shift
PLC + distributed I/O | cyclic + acyclic coexistence (X) | optional (hard partitions) | recommended (class-based shaping) | recommended (resource isolation) | optional | queue isolation + BE starvation watch
Robot cell / multi-axis | multi-flow coordination bounds (X) | on when cyclic is strict | optional (tail control) | required (prevent "tuning lottery") | recommended | per-flow quotas + violation events
Power / rail / utility | time integrity + auditability (X) | optional | recommended (traffic smoothing) | recommended | required (trusted stamps) | timestamp jump + holdover + event log
Edge gateway / bridging | mixed domains + safe updates (X) | optional | recommended | required (policy isolation) | recommended | versioned config + correlation black-box

Mechanism recipes should be validated using the proof hooks defined in verification and observability, then locked into a versioned Parameter Set.

Recipe patterns (quick reference)

Cyclic hard bounds

Gate + Admission + Window counters. Shaper is secondary for smoothing non-critical traffic.

Burst-heavy data paths

Shaper + Watermark telemetry + Drop-reason breakdown. Gate is used only when trigger windows are strict.

Multi-tenant and expansion-safe

Admission quotas + strict isolation + versioned config bundles prevent new flows from breaking existing guarantees.

Recipe cards: each use case enables a different mechanism set (Gate / Shaper / Admission / Timestamp / Observe)

[Diagram: recipe cards for Motion control (bounded latency), Machine vision (burst control), PLC + I/O (coexistence), Robot cell (quotas, expansion-safe), and Power/utility (audit, time integrity), each toggling a different subset of Gate / Shaper / Admission / Timestamp / Observe.]

Recipes should be exported as versioned profiles and validated with schedule counters, watermark telemetry, and time-base stability checks.

Scope guard (not expanded here)

  • Protocol stack specifics and certification checklists for PROFINET/EtherCAT/CIP.
  • PHY/magnetics/EMC implementation detail and connector-level design.

IC Selection Logic (TSN Switch / Bridge)

This section avoids “product dumping” and instead defines a repeatable selection method: requirements → TSN mechanisms → resource sufficiency → verifiable observability → operable configuration governance. The goal is to filter out parts that claim TSN support but cannot prove or sustain deterministic bounds in production and field operation.

Selection funnel (5 steps that prevent “tuning lottery”)

  1. Hard constraints — port count / port speed / host interface / thermal & package / industrial temp grade. Output: shortlist that physically fits the design.
  2. Determinism profile — target bounds for latency, jitter, loss, and time error, plus traffic shape (periodic / bursty / mixed). Output: which mechanisms are mandatory.
  3. Mechanism coverage — gate windows, shaping, admission control, and hardware timestamps. Output: “must-have list” per use-case.
  4. Resource sufficiency — queue count, per-port buffer, GCL depth, shaper instances, policing granularity. Output: proof that worst-case traffic still fits.
  5. Operability — counters, events, mirror/trace aids, configuration bundle + rollback + safe commit. Output: ability to keep bounds stable over the lifecycle.

Reference material examples (non-exhaustive): Microchip LAN9662 / LAN9668; NXP SJA1105P/Q/R/S / SJA1110; Renesas RZ/N2L / RZ/T2M. Use these part numbers as anchors for capability checklists, not as an implied “best choice”.

Example part buckets (how to read “fit”, not a shopping list)

  • Industrial TSN switch with integrated CPU: Microchip LAN9668 (orderable examples: LAN9668-9MX, LAN9668-I/9MX). Fit: multi-port gateways / remote IO / TSN edge switches.
  • Compact TSN end-point / small switch role: Microchip LAN9662 (orderable example: LAN9662/9MX). Fit: TSN-capable endpoints or small-port bridge designs.
  • Automotive TSN switch family (AVB/TSN focus): NXP SJA1105P / SJA1105Q / SJA1105R / SJA1105S. Fit: deterministic multi-domain aggregation when safety/security hooks matter.
  • Multi-gig safe & secure TSN Ethernet switch SoC: NXP SJA1110 family. Fit: TSN switching with security/safety features and strong ecosystem.
  • MCU/MPU with integrated TSN-compliant small switch: Renesas RZ/N2L, RZ/T2M. Fit: TSN bridge / controller designs where compute + small-port TSN is sufficient.

Tip: selection should be driven by mechanism + resource + observability, not by a single “TSN supported” checkbox.

Selection scorecard (dimension → why → how to verify → pass (X) → red flags)

  • Ports & speeds (example materials: LAN9662, LAN9668, SJA1105P/Q/R/S, SJA1110). Why: determines how many deterministic flows can be isolated without queue sharing; port speed drives guard-band sizing and worst-case serialization time per hop. Verify: confirm port modes, link partners, and CPU/host port bandwidth; validate that worst-case frame + background traffic still meets the end-to-end bound. Pass: ports ≥ X; required speeds supported; CPU/host port not a bottleneck (utilization < X%). Red flags: CPU/host port saturates under mirror/telemetry; "TSN supported" but only on a subset of ports or queue classes.
  • Switching mode (store-and-forward vs cut-through). Why: shapes per-hop latency and error containment; determinism needs a bounded delay, not just a good average. Verify: measure per-hop forwarding delay (ingress timestamp to egress timestamp) under load with gate windows enabled. Pass: per-hop delay upper bound ≤ X; mode stable across MTU and VLAN/priority mixes. Red flags: cut-through path lacks deterministic gating integration (gate applies after unpredictable pipeline stages).
  • TSN mechanism coverage. Why: different traffic types require different tools: time windows for strict periodic, shaping for mixed loads, admission to keep bounds valid when new streams appear. Verify: map each required flow class to Gate / Shaper / Policer / Admission / Timestamp; verify all required blocks are hardware-backed with counters. Pass: required blocks present; each block has measurable state & events; no software-only critical path. Red flags: "supported" but missing gate-miss / timestamp-jump / policing-drop reasons.
  • Queue count, per-class isolation & mapping. Why: if critical traffic shares queues with best-effort, tail jitter and microbursts become unbounded and hard to debug. Verify: confirm independent queues per traffic class, queue-depth controls, and queue-level watermark counters. Pass: dedicated queues for critical classes; mapping has no holes; watermark visibility available. Red flags: queue mapping cannot be audited; only port-level drops exist (no per-queue attribution).
  • GCL depth, time resolution & safe update. Why: determinism relies on the schedule being representable without compressing slots or merging classes, and updates must not tear live traffic. Verify: check max entries, minimum slot time, and whether shadow/atomic commit exists for the GCL; validate late/early window counters during schedule changes. Pass: GCL depth ≥ X; min slot ≤ X; schedule switch without packet loss beyond X ppm. Red flags: no shadow table; updates require "stop traffic" maintenance windows.
  • Guard band & preemption support (when needed). Why: guard bands protect time windows from invasion by long frames; preemption reduces wasted guard time at higher utilization. Verify: validate calculated guard time vs measured gate-edge behavior under maximum-MTU background traffic; confirm counters for "late/blocked due to guard". Pass: no gate-edge violations; guard-time margin ≥ X; preemption behavior deterministic when enabled. Red flags: preemption "supported" but cannot be validated (no counters, no clear enable scope).
  • Shaping instances & stability under microbursts. Why: many designs do not need strict gating everywhere, but shapers must remain stable under bursts, credit resets, and queue coupling. Verify: stress test with bursty best-effort plus periodic flows; observe per-queue occupancy, drop reasons, and shaping state transitions. Pass: tail latency bound holds under worst-case background load; no unexplained credit jumps. Red flags: no per-queue occupancy; only aggregate port counters exist (debug becomes guesswork).
  • Admission control & resource accounting. Why: without admission, determinism can be invalidated the day a new stream is added; admission is what turns "tunable" into "guaranteed". Verify: confirm per-stream descriptors, a resource model (bandwidth / slot / queue / buffer), and enforcement; validate rejection behavior and violation events. Pass: admission decisions are explainable; violations are logged; rejection occurs before bounds are harmed. Red flags: admission exists only as a software convention (no hardware enforcement, no event logs).
  • Hardware timestamping & tap-point clarity. Why: timestamp error couples into scheduling and closed-loop control; if the tap point is unclear, residence time and correction become unverifiable. Verify: confirm ingress/egress timestamp capture paths, correction reporting, and timestamp-jump/holdover events. Pass: timestamp error budget ≤ X; tap point documented; residence time observable per hop. Red flags: hardware timestamps exist, but no access to raw capture records or no error/step detection.
  • Counters, events, mirror/trace aids. Why: field failures are mostly diagnosability failures; the minimum set must isolate queue drops, gate misses, timing anomalies, and configuration version. Verify: require per-port + per-queue counters, gate-miss/late/early flags, timestamp events, plus mirror support for capture. Pass: black-box completeness > X%; event-to-counter correlation works with config-version tagging. Red flags: only link-level counters exist; no queue-level attribution; no gate-miss observability.
  • Configuration governance (bundle, shadow, rollback). Why: TSN failures are often configuration failures; without safe updates, networks freeze and field drift becomes unmanageable. Verify: check config package versioning, dependency checks, atomic commit (shadow then swap), and rollback rehearsal. Pass: rollback success rate ≥ X%; shadow swap time ≤ X; "time base unlocked" prevents schedule activation. Red flags: schedule/config updates require reboot or show uncontrolled transient behavior; no safe-commit path.

Scorecard usage: assign weights according to the determinism profile, then require that every “must-have” row passes verification with measurable evidence.
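Weight-and-gate scoring can be sketched as below. Must-have dimensions act as hard gates; the rest contribute a weighted total. The function, its row scale, and the dimension names are assumptions for illustration.

```python
def score_candidate(rows: dict, weights: dict, must_have: set):
    """Evaluate one candidate part against the scorecard.

    rows: dimension -> pass fraction in [0, 1], each backed by measured
    evidence. Must-have dimensions are hard gates: anything below 1.0
    rejects the part regardless of the weighted total.
    Returns (accepted, normalized_score).
    """
    for dim in must_have:
        if rows.get(dim, 0.0) < 1.0:
            return (False, 0.0)
    total = sum(weights[d] * rows.get(d, 0.0) for d in weights)
    return (True, total / sum(weights.values()))
```

A design review can then rank only the candidates that survive the hard gates, which prevents a high score on soft dimensions from masking a missing must-have mechanism.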

Bring-up verification hooks (must exist before committing a part)

1) Traffic generation hooks

  • Ability to inject periodic flows + bursty best-effort simultaneously (worst-case background load).
  • Repeatable stress profiles: microburst, long-frame invasion attempts, mixed priority classes.
  • Pass criteria: P99.999 latency ≤ X, jitter ≤ X, and zero unexplained drops in the critical class.

2) Data-plane self-test hooks

  • Port/pipeline loopback modes (to separate “configuration vs environment” quickly).
  • PRBS / built-in traffic test (if available) to validate datapath stability without external complexity.
  • Pass criteria: self-test completes with error counters stable within X / hour.

3) Time & timestamp hooks

  • Ingress/egress timestamp capture with clear tap-point definition.
  • Residence time / correction evidence path (concept-level requirement).
  • Events: timestamp jump, timebase unlock, holdover enter/exit.
  • Pass criteria: time error ≤ X; no timestamp step under stress; holdover behavior matches guardrails.

4) Gate/shaper/queue enforcement hooks

  • Counters: per-queue drop, per-queue watermark, gate miss, late/early window, policing drops (with reasons).
  • Schedule update safety: shadow table + atomic swap, plus “deny activation if timebase not locked”.
  • Pass criteria: zero gate-edge violations; all drops are attributable and bounded within X.

Reject-fast red flags (high probability of “un-debuggable determinism”)

  • No gate miss / late/early window counters → schedule failures cannot be proven or localized.
  • Hardware timestamps exist but tap-point is unclear or inaccessible → time error budget cannot be validated.
  • Only port-level drops, no per-queue attribution → microburst and starvation become guesswork.
  • No shadow/atomic commit for GCL and mapping tables → updates introduce uncontrolled transient behavior.
  • CPU/host port becomes the bottleneck under observability → deterministic network collapses when debugging is needed most.

Decision output template (copy/paste into a design review)

Chosen material: PN = ________ (examples: LAN9668 / SJA1110 / SJA1105P/Q/R/S / RZ/N2L)
Hard constraints: ports = X, speeds = X, host = X, temp = X
Determinism profile: latency ≤ X, jitter ≤ X, loss ≤ X, time error ≤ X
Mechanism recipe: Gate (Y/N), Shaper (CBS/ATS/Rate), Admission (Y/N), Timestamp (Y/N)
Resource proof: queues = X, buffer = X, GCL depth = X, min slot = X
Observability minimum set: per-queue drop + watermark + gate miss + timestamp events
Config governance: bundle versioning + shadow swap + rollback rehearsal
Bring-up plan: lock stability X min; gate-edge violations = 0; P99.999 bound validated
Known risks: ________ | Mitigations: ________

Diagram — Selection funnel + scorecard + bring-up hooks (concept map)

[Diagram: selection funnel from requirements to mechanisms, resources, observability, and configuration governance: 1) constraints (ports/speed, thermal/package), 2) profile (latency bound, jitter/time), 3) mechanisms (gate/shaper, admission/timestamp), 4) resources (queues/buffer, GCL depth), 5) operate (counters/events, bundle + rollback); feeding an evidence-driven scorecard (e.g. queues/buffer: verify watermarks + drops, red flag: no per-queue view; GCL/gate: verify late/early + miss, red flag: no safe update; timestamp: verify tap + jump events, red flag: unclear tap) and bring-up hooks (traffic stress: periodic + burst; loopback/PRBS: fast isolation; time & stamp: tap + holdover; gate/queue counters: attribution). Material anchors: LAN9662 / LAN9668, SJA1105P/Q/R/S, SJA1110, RZ/N2L / RZ/T2M.]

The diagram is intentionally mechanism- and evidence-oriented: the part number is only useful if it passes the scorecard with measurable proofs.


FAQs (TSN Switch / Bridge)

Scope: long-tail troubleshooting only. Each answer is a fixed, measurable 4-line format: Likely cause / Quick check / Fix / Pass criteria (threshold placeholders X/Y).

Q1. Enabling Qbv causes sporadic packet loss — guard band underestimated or gate switch overhead ignored?
Likely cause: Guard band T_guard smaller than worst-case frame serialization + internal gate edge latency; gate schedule allows long BE frame to overlap a critical window.
Quick check: Read/plot gate_miss_cnt, late_window_cnt, per_queue_drop_cnt during max-MTU background traffic; correlate drops with gate edge timestamps.
Fix: Increase T_guard using worst-case MTU and link rate; move BE queue to a fully closed window around critical slots; enable/validate preemption only if utilization loss is unacceptable.
Pass criteria: gate_miss_cnt=0 and late_window_cnt=0 over Y cycles; critical-class drops ≤ X ppm under worst-case background load.
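The quick check above (correlating drops with gate-edge timestamps) can be sketched as follows; the function, the shared-time-base assumption, and the default window are illustrative, not a device API.

```python
import bisect

def drops_near_gate_edges(drop_ts_ns, gate_edge_ts_ns, window_ns=2000):
    """Count drops that land within window_ns of any gate edge.

    A high hit ratio points at guard-band underestimation (drops clustered at
    window transitions) rather than random congestion. All timestamps are
    assumed to share one time base.
    """
    edges = sorted(gate_edge_ts_ns)
    hits = 0
    for t in drop_ts_ns:
        i = bisect.bisect_left(edges, t)
        candidates = []
        if i < len(edges):
            candidates.append(abs(edges[i] - t))
        if i > 0:
            candidates.append(abs(edges[i - 1] - t))
        if candidates and min(candidates) <= window_ns:
            hits += 1
    return hits
```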
Q2. End-to-end latency percentiles look great, but rare spikes violate the hard bound — microburst or admission leak?
Likely cause: Microburst drives queue above planned watermark; or admission accounting missed a resource dimension (slot/queue/buffer), allowing occasional overload.
Quick check: Compare p99_999_e2e_us vs max_e2e_us; inspect per_queue_watermark near spikes; check admission_violation_evt and policer_drop_reason.
Fix: Add ingress policing for burst sources; increase isolation (dedicated queue) for critical traffic; tighten admission worksheet using worst-case burst + multi-hop residence time; add gate windows for strict bound if shaping-only is insufficient.
Pass criteria: max_e2e_us ≤ X over Y minutes of worst-case stress; per_queue_watermark never exceeds X% of capacity; admission_violation_evt=0.
Q3. Time sync shows “locked”, yet windows are shifted — timestamp tap point mismatch or time-base step?
Likely cause: Hidden time-base step (holdover enter/exit) or timestamp tap point differs across hops, causing schedule-to-time mapping error.
Quick check: Inspect timebase_step_evt, holdover_evt, ts_jump_cnt; measure window edge error win_edge_err_ns at ingress/egress stamps per hop.
Fix: Enforce guardrail: block schedule activation unless time base is stable; standardize timestamp tap point usage across devices; recompute schedule alignment after any sync topology or clock source change.
Pass criteria: timebase_step_evt=0 and ts_jump_cnt=0 over Y minutes; absolute win_edge_err_ns ≤ X on every hop.
Q4. Preemption enabled, but throughput drops or retransmissions spike — fragment handling or queue mapping issue?
Likely cause: Fragmentation overhead counted incorrectly; preemptable class mapped to a queue with conflicting shaping/gating; fragment reassembly pressure increases buffer churn.
Quick check: Track preempt_frag_cnt, preempt_abort_cnt, reassembly_err_cnt, and per-queue watermark; confirm mapping table prio_to_queue_map matches design intent.
Fix: Restrict preemption to the minimal BE class; verify non-preemptable critical queues remain isolated; increase buffer headroom for reassembly or reduce BE burst with policing.
Pass criteria: Throughput loss ≤ X% at target load; reassembly_err_cnt=0; retransmission indicators (CRC/retry counters) remain within X ppm.
Q5. One traffic class is always starving — CBS/ATS parameters wrong or gate schedule conflict?
Likely cause: Queue shares bandwidth with higher priority without guaranteed service; CBS credit never recovers due to burst pattern; gate closes the class during peak arrivals.
Quick check: Inspect per_queue_tx_bytes (starved queue flat), cbs_credit_min/cbs_credit_reset_cnt, and gate open ratio gate_open_time_us per cycle.
Fix: Assign a dedicated queue; adjust CBS/ATS to match required rate and burst tolerance; align gate slots with expected arrivals or remove gating for that class if shaping is sufficient.
Pass criteria: Starved class achieves ≥ X% of requested rate over Y cycles; queue drop ≤ X ppm; no sustained negative credit beyond planned floor.
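Deriving consistent credit-based shaper parameters avoids the "credit never recovers" pattern. A sketch using the standard 802.1Qav-style relations (the function name and dict keys are illustrative):

```python
def cbs_params(reserved_bps: int, port_rate_bps: int,
               max_frame_bits: int, max_interference_bits: int) -> dict:
    """Derive credit-based shaper parameters (802.1Qav-style relations).

    idleSlope is the reserved rate; sendSlope = idleSlope - portTransmitRate
    (negative while transmitting). hiCredit/loCredit bound how far the queue
    may run ahead/behind, which prevents sustained starvation or runaway
    credit under bursty arrivals.
    """
    idle_slope = reserved_bps
    send_slope = idle_slope - port_rate_bps
    hi_credit = max_interference_bits * idle_slope / port_rate_bps
    lo_credit = max_frame_bits * send_slope / port_rate_bps
    return {"idleSlope_bps": idle_slope, "sendSlope_bps": send_slope,
            "hiCredit_bits": hi_credit, "loCredit_bits": lo_credit}
```

If the starved class's required rate plus burst tolerance exceeds what these parameters admit, the fix is a larger idleSlope (and re-running admission), not manual credit tweaking.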
Q6. Determinism breaks after topology change — multi-hop schedule alignment or residence-time update missing?
Likely cause: Path length/hop count changed but GCL offsets and residence-time corrections stayed old; time-domain calibration no longer matches the forwarding path.
Quick check: Compare hop_count and per-hop res_time_ns before/after; check win_edge_err_ns growth across hops; verify config bundle version consistency across switches.
Fix: Recompute multi-hop schedule offsets; update residence-time/correction terms; enforce bundle deployment rule: all nodes switch schedules atomically with rollback-ready staging.
Pass criteria: For each hop: win_edge_err_ns ≤ X; end-to-end max_e2e_us ≤ X; no schedule/bundle mismatch events over Y minutes.
Q7. Temperature change triggers window miss — oscillator drift or aggressive holdover strategy?
Likely cause: Time-base drift (drift_ppb) accumulates offset until schedule edges misalign; holdover enter/exit introduces a phase step or higher wander.
Quick check: Log temperature temp_c vs offset_ns/drift_ppb; inspect holdover_evt and timebase_step_evt; count late_window_cnt.
Fix: Tighten clock quality or compensation; adjust holdover thresholds to avoid chattering; add guard margin to window edges if drift budget requires it.
Pass criteria: drift_ppb ≤ X across operating temperature; timebase_step_evt=0; late_window_cnt=0 over Y thermal cycles.
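The guard margin mentioned in the fix can be sized directly from the drift budget: an uncorrected drift of d ppb over a resync interval t accumulates d·t nanoseconds of edge error. A minimal sketch (the safety factor is an assumed design choice):

```python
def guard_margin_ns(drift_ppb, resync_interval_s, safety_factor=2.0):
    """Window guard band needed to absorb time-base drift between
    sync corrections. 1 ppb over 1 s accumulates 1 ns of error."""
    accumulated_ns = drift_ppb * resync_interval_s
    return accumulated_ns * safety_factor

margin = guard_margin_ns(drift_ppb=50, resync_interval_s=1.0)  # 100 ns guard band
```

During holdover the effective resync interval grows, so the same formula with the holdover duration tells you how long the schedule survives before late_window_cnt starts incrementing.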
Q8. Measured latency is significantly higher than theory — store-and-forward path or buffer watermark throttling?
Likely cause: Forwarding mode or pipeline adds extra serialization/queuing; hidden buffering triggers high watermark, causing additional delay or shaping backpressure.
Quick check: Measure per-hop components from ingress_ts to egress_ts; inspect store_fwd_active state; check per_queue_watermark and throttle_evt.
Fix: Align design assumptions with actual forwarding mode; increase buffer headroom or reduce burstiness via policing; ensure critical queues do not share with BE bursts.
Pass criteria: Per-hop latency decomposition matches budget within ±X%; throttle_evt=0 during bounded-load tests; max_e2e_us ≤ X.
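The gap between theory and measurement often comes down to forwarding mode: store-and-forward pays full serialization at every hop, cut-through (simplified) pays it once. A minimal decomposition sketch, with frame size, link rate, and processing delay as assumptions:

```python
def e2e_latency_us(frame_bytes, link_bps, hops, proc_us_per_hop, store_and_forward=True):
    """Simplified per-path latency model ignoring queueing and propagation."""
    ser_us = frame_bytes * 8 / link_bps * 1e6  # full-frame serialization time
    if store_and_forward:
        return hops * (ser_us + proc_us_per_hop)
    # cut-through: serialization counted once, processing still per hop
    return ser_us + hops * proc_us_per_hop

sf = e2e_latency_us(1500, 1e9, hops=3, proc_us_per_hop=2.0)                         # 42 us
ct = e2e_latency_us(1500, 1e9, hops=3, proc_us_per_hop=2.0, store_and_forward=False)  # 18 us
```

If the measured value exceeds even the store-and-forward estimate, the remainder is queueing or watermark throttling, which is where per_queue_watermark and throttle_evt come in.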
Q9. Background traffic increases critical jitter — queue isolation missing or ingress policing absent?
Likely cause: Critical and BE flows share a queue or buffer headroom; bursts enter unpoliced and create microbursts that push critical frames off schedule.
Quick check: Compare critical jitter with/without BE load; read prio_to_queue_map, per_queue_watermark, policer_drop_cnt; validate that critical queue has dedicated gate/shaper.
Fix: Enforce dedicated queue for critical class; add ingress policing for BE sources; apply shaping to reduce burst; use gate windows if strict jitter bound is required.
Pass criteria: Critical jitter_us ≤ X under worst-case BE load; critical queue watermark stays below X%; critical drops ≤ X ppm.
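Ingress policing for BE sources is essentially a token-bucket conformance test: a frame is admitted only if the bucket holds enough tokens. A minimal sketch (rate and bucket depth are illustrative):

```python
def police(arrivals, rate_bps, bucket_bytes):
    """arrivals: list of (time_s, frame_bytes), time-ordered. Returns drop count.
    Tokens refill at rate_bps/8 bytes per second, capped at bucket_bytes."""
    tokens, last_t, drops = bucket_bytes, 0.0, 0
    for t, size in arrivals:
        tokens = min(bucket_bytes, tokens + (t - last_t) * rate_bps / 8)
        last_t = t
        if size <= tokens:
            tokens -= size  # conforming frame passes
        else:
            drops += 1      # non-conforming frame dropped (or remarked)
    return drops

# 10 back-to-back 1500 B frames at t=0 against a 2 kB bucket (assumed)
drops = police([(0.0, 1500)] * 10, rate_bps=10e6, bucket_bytes=2000)  # 9 dropped
```

The bucket depth is your declared burst tolerance: it is the knob that converts an unbounded microburst into the bounded arrival pattern the critical queue's jitter budget assumes.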
Q10. After configuration update, the network sporadically flaps — bundle inconsistency/rollback gap or time-base relock?
Likely cause: Partial deployment causes mismatched schedules/mappings; lack of atomic commit introduces transient invalid states; time base relock triggers schedule misalignment.
Quick check: Compare config_bundle_ver across nodes; inspect atomic_swap_evt, rollback_evt, timebase_unlock_evt; correlate with link_flap_evt.
Fix: Enforce staged rollout with pre-check and post-check; require shadow tables + atomic swap; block schedule activation if time base is not locked; rehearse rollback for each bundle version.
Pass criteria: config_bundle_ver identical on all nodes; link_flap_evt ≤ X per Y hours; timebase_unlock_evt=0 during swap window.
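The pre-check/post-check steps can be automated: collect config_bundle_ver from every node and refuse activation unless all values match. A minimal sketch (the node/version map is hypothetical):

```python
from collections import Counter

def bundle_mismatches(versions):
    """versions: {node_name: config_bundle_ver}. Returns the nodes that
    disagree with the majority version; empty dict means safe to activate."""
    expected, _ = Counter(versions.values()).most_common(1)[0]
    return {n: v for n, v in versions.items() if v != expected}

bad = bundle_mismatches({"sw1": "v42", "sw2": "v42", "sw3": "v41"})  # {'sw3': 'v41'}
```

Gating the atomic swap on this check returning empty, plus a locked time base, removes both failure modes Q10 names.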
Q11. Mirroring/capture shows frame order “scrambled” — true multi-queue reordering or preemption visualization artifact?
Likely cause: Observation point merges queues and timestamps differently than the forwarding path; preemption fragments distort capture order; true reordering can occur if multiple egress queues drain without a strict ordering policy.
Quick check: Capture at multiple points (ingress vs egress); compare sequence fields seq_id; inspect mirror_port_util, preempt_frag_cnt, and per-queue drain counters.
Fix: Use consistent capture tap points; annotate captures with ingress/egress timestamps; if true reordering is confirmed, enforce per-flow queue mapping and deterministic scheduling for that flow class.
Pass criteria: For critical flows: observed seq_id monotonic at egress; mirror overhead mirror_port_util ≤ X%; no unexplained reorder events over Y minutes.
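Distinguishing true reordering from capture artifacts starts with a monotonicity check on seq_id at the egress tap. A minimal sketch:

```python
def reorder_events(seq_ids):
    """Count frames whose seq_id is lower than the highest seen so far
    (a simple indicator; duplicate suppression is out of scope here)."""
    high, events = None, 0
    for s in seq_ids:
        if high is not None and s < high:
            events += 1
        else:
            high = s
    return events

n = reorder_events([1, 2, 4, 3, 5])  # one reorder event
```

If the same capture shows zero events at the egress tap but events at the mirror port, the "scrambling" is an observation artifact, not a forwarding-path problem.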
Q12. Admission passes, yet congestion still happens — which resource dimension is missing (slot/queue/buffer)?
Likely cause: The admission model accounts for bandwidth but ignores at least one of: slot occupancy, queue contention, buffer headroom, burst size, or hop-by-hop residence time.
Quick check: Recompute per-flow resource terms: slot_us, frames_per_cycle, burst_bytes, buffer_bytes, hop_res_time_ns; compare predicted vs observed watermark and gate misses.
Fix: Extend worksheet to include slot + queue + buffer constraints; enforce per-class dedicated queues; add policing for burst-limited terms; reject streams that violate any single dimension.
Pass criteria: For each admitted stream: predicted vs measured queue occupancy error ≤ X%; gate_miss_cnt=0; congestion indicators (drops/watermarks) remain within X ppm under validation load.
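The extended worksheet amounts to checking every resource dimension independently and rejecting a stream if any single one is exhausted. A minimal sketch; the dimension names follow the counters above, and the capacities are hypothetical:

```python
def admit(stream, remaining):
    """stream/remaining: dicts over resource dimensions, e.g. bandwidth_bps,
    slot_us, buffer_bytes. Returns (admitted, list of violated dimensions)."""
    violated = [dim for dim, need in stream.items() if need > remaining.get(dim, 0)]
    return (not violated, violated)

ok, why = admit(
    {"bandwidth_bps": 5e6, "slot_us": 20, "buffer_bytes": 4096},
    {"bandwidth_bps": 100e6, "slot_us": 15, "buffer_bytes": 65536},
)  # rejected: slot_us exceeded even though bandwidth and buffer fit
```

This is the single-dimension-veto rule from the fix: a stream that fits the bandwidth budget but not the slot budget still gets rejected, which is exactly the case a bandwidth-only admission model misses.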