
Latency & Determinism for TSN: Jitter Budgeting & Shaping


Determinism is not “low average latency” — it is a provable bound on the tail (p99/p999/max). This page turns switch delay + Qbv/Qci shaping into an end-to-end jitter budget that can be measured, versioned, and accepted with clear pass criteria.

H2-1 · Scope & Non-Scope (Boundary Contract)

This page is a strict anti-overlap gate: it standardizes the latency/jitter “accounting” and focuses only on deterministic delay control via switch delay models, Qbv/Qci shaping, and end-to-end jitter budgeting with measurable pass criteria.

What this page solves (symptom-level)
  • Average latency looks fine, but sporadic “late” events or p99/p999 spikes still occur.
  • Load is “only 20%”, yet real-time traffic still stalls during bursts or contention windows.
  • TSN features are enabled, but determinism does not improve because the tail is not budgeted, bounded, or verified.
In-scope (covered here)
  • Switch delays: per-hop model = fixed pipeline + queueing + shaping / gate-wait.
  • Qbv / Qci: time-aware gating (Qbv) and per-stream policing / admission (Qci) as tail-control levers.
  • End-to-end jitter budgeting: bound each hop, sum bounds, define test matrix, and write pass/fail criteria.
Concrete deliverables (what to carry into design reviews)
  • Latency/Jitter budget sheet template: hop-by-hop fixed delay + queue bound + gate-wait bound + timebase error input (ε) + notes.
  • Qbv GCL design record: cycle time, slot plan, guard-band logic, and “max gate-wait” bound per class.
  • Qci policing record: per-stream burst/rate parameters, violation actions, and counters to audit “bad flows”.
  • Verification matrix: empty-load / full-load / burst / mixed-class / fault-injection, with p99/p999 and late-rate pass criteria.
Out-of-scope (explicit pointers to avoid cross-page overlap)
  • PTP / BMCA / topology calibration → go to PTP Hardware Timestamping. (Timebase is treated here as an input error ε only.)
  • SyncE / White-Rabbit frequency lock & holdover → go to the SyncE / White-Rabbit-Style Timing pages.
  • PROFINET / EtherCAT / CIP business models & certification → go to Industrial Ethernet Stacks pages.
  • PHY SI / EMC / TVS / CMC / grounding → go to PHY Co-Design & Protection pages. (This page focuses on switch scheduling/policing bounds.)
Keyword gates (fast routing)
If searching for “offset drift / BMCA / asymmetry…”
Route to PTP Hardware Timestamping (timebase calibration & correction).
If searching for “SyncE jitter templates / holdover…”
Route to Synchronous Ethernet (SyncE) (frequency distribution & filtering).
If searching for “TVS / CMC / surge return path…”
Route to PHY Co-Design & Protection (EMC/ESD/surge/layout stability).
Diagram: Page map (Budget → Parameterize → Validate → Accept) with explicit out-of-scope jump pointers.

H2-2 · Determinism = Latency Distribution (Beyond the Average)

Determinism is not “lower average latency”. Determinism is a tight and bounded latency distribution under defined load patterns. The engineering goal is tail control: convert unpredictable spikes into bounded wait times that can be budgeted and verified.

Mental model correction
Common mistake
  • Optimizing mean latency while ignoring p99/p999.
  • Using “average utilization” as a proxy for real-time readiness.
  • Declaring success because throughput is high, even when control traffic misses deadlines.
Correct target
  • Bounded tail: p99/p999 and max latency stay within a known upper bound.
  • Defined load patterns: empty-load, full-load, burst, and mixed-class cases are explicitly tested.
  • Deadline integrity: late-rate (deadline miss events) is controlled and audited.
Engineering definitions used on this page (to keep scope tight)
  • Latency: end-to-end time from transmit event to receive event under a defined measurement tap-point.
  • Jitter (this page): short-term spread of the latency distribution within the same configuration and test window.
  • Wander: long-term drift (not expanded here). If wander dominates, route to PTP/SyncE pages.
Metrics that actually reflect determinism
  • Percentiles: p50 (median), p95, p99, p999 — prioritize p99/p999 for tail control.
  • Max–min: highlights rare spikes that percentiles may hide in short windows.
  • Late-rate: “deadline miss events per N packets / per minute” (requires a clear denominator).
  • Burst sensitivity: tail increase under burst injection and mixed traffic classes.
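As a concrete anchor for these metrics, here is a minimal sketch (plain Python; all names illustrative) that computes them from one capture window, keeping the window and frame count attached to the result so the denominator is never implicit:
```python
import math

def tail_metrics(latencies_us, deadline_us, window_s):
    """Tail-focused metrics for ONE declared capture window."""
    s = sorted(latencies_us)
    n = len(s)

    def pct(p):  # nearest-rank percentile
        return s[min(n - 1, math.ceil(p * n) - 1)]

    late = sum(1 for x in s if x > deadline_us)
    return {
        "window_s": window_s,           # denominator metadata travels with the data
        "n_frames": n,
        "p50": pct(0.50), "p99": pct(0.99), "p999": pct(0.999),
        "max_minus_min": s[-1] - s[0],  # exposes rare spikes percentiles can hide
        "late_rate_per_1e6": 1e6 * late / n,
    }
```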
Measurement hygiene (prevents “false stability”)
  • Keep window length explicit (e.g., Y minutes or N frames) and do not mix windows across runs.
  • Do not compare percentiles across different tap points or timestamp definitions.
  • Always log load pattern metadata (burst size, mix ratio, class mapping) together with results.
Tail amplifiers (kept strictly in-scope)
  • Queueing under contention: bursts and shared resources create non-linear tail growth.
  • Gate-wait (Qbv): bounded but non-zero waiting until the next open window; the bound must be budgeted.
  • Unregulated flows (Qci missing/weak): abnormal traffic can reintroduce tail spikes.
  • Observability mismatch: counters/timestamps look clean because the accounting definition is inconsistent (tap-point/denominator/window).
Diagram: same mean latency, different tails. Determinism improves when the tail is tightened and bounded.

H2-3 · Latency Taxonomy (End-to-End Ledger Canon)

A deterministic design requires a single accounting canon. This section defines the end-to-end segments, the only three allowed delay categories, and the minimum timestamp tap-point rules so every budget line uses the same meaning.

Canonical end-to-end segment chain

Use this chain as the fixed “row order” for budgets and reports:

Endpoint Tx → NIC/Driver → Switch ingress → Queue → Shaper/Gate → Egress → Cable → Endpoint Rx
Only three allowed delay categories (use everywhere on this page)
  • Fixed latency: deterministic baseline from pipeline/serialization; it varies with mode and frame length but not with load.
  • Load-dependent queueing: contention/bursts create non-linear tail growth; must be bounded with assumptions.
  • Time-dependent gate-wait: Qbv windows convert random contention into bounded waiting; the bound must be budgeted.
Ledger rule
Tail control is primarily shaped by queueing and gate-wait. Fixed latency defines the baseline and must remain traceable.
Minimum timestamp tap-point rules (no implementation details)
  • Tap-point must be declared for each measurement run (endpoint tap vs switch ingress/egress tap).
  • Do not mix tap-points when comparing p99/p999; otherwise the distribution is not comparable.
  • Window/denominator must be logged (Y minutes or N frames) together with load pattern metadata.
  • Timebase error is an input ε; if drift/asymmetry dominates, route to timing pages (PTP/SyncE).
Budget sheet fields (mobile-friendly list)
  • Segment / hop name
  • Fixed (min/typ) and notes (mode, frame length dependency)
  • Queue bound (max) with stated burst assumptions
  • Gate-wait bound (max) from GCL + guard-band
  • Source: Measured / Datasheet / Calibrated
  • Tap-point + window length + load pattern ID
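A minimal, hypothetical shape for one ledger row (Python; the field names mirror the list above and can be adapted freely):
```python
from dataclasses import dataclass

@dataclass
class BudgetRow:
    """One line of the hop-by-hop latency/jitter budget sheet."""
    segment: str           # e.g. "SW1 egress"; row order follows the canonical chain
    fixed_min_us: float    # deterministic baseline (note mode / frame-length deps)
    fixed_typ_us: float
    queue_bound_us: float  # max queueing under the stated burst assumptions
    gate_wait_us: float    # max gate-wait from GCL + guard band
    source: str            # "measured" | "datasheet" | "calibrated"
    tap_point: str         # declared timestamp tap for this row
    window: str            # e.g. "5 min" or "1e6 frames"
    load_pattern_id: str   # ties the bound to the scenario it was derived under
```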
Diagram: end-to-end chain with tap-points and region types (measured / inferred / opaque). Opaque zones such as driver scheduling and external path variance are not directly measurable and must be carried as declared assumptions.

H2-4 · Switch Internal Delays (Fixed-Latency Model)

Per-hop fixed latency becomes traceable when it is decomposed into pipeline stages. This section explains store-and-forward vs cut-through at a modeling level, identifies common fixed-delay contributors, and classifies each contributor by the most reliable source: measured, datasheet, or bench-calibrated.

Store-and-forward vs cut-through (determinism-oriented model)
  • Store-and-forward: forwarding starts after full frame buffering; fixed latency includes frame-length-dependent serialization terms.
  • Cut-through: forwarding starts earlier; fixed latency can be lower, but mode constraints and feature hooks may add deterministic stages.
  • Ledger requirement: fixed latency must be recorded with mode + port speed + representative frame-length assumptions.
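The frame-length dependence is easy to make concrete. A small sketch, assuming only that serialization time is frame bits divided by link rate:
```python
def serialization_delay_us(frame_bytes, link_bps):
    """Time to clock one frame onto (or off) the wire at a given link rate."""
    return frame_bytes * 8 / link_bps * 1e6

# Example: receiving a 1518-byte frame at 1 Gb/s takes ~12.14 µs.
# Store-and-forward pays this full term per hop before forwarding can start;
# cut-through starts after the header, so its term is mostly length-independent.
print(round(serialization_delay_us(1518, 1e9), 2))  # 12.14
```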
Common fixed-delay contributors (pipeline-level)
  • Ingress parse and classification
  • Lookup and forwarding decision
  • Buffer write/read and internal fabric traversal
  • Rewrite / mirroring hooks (deterministic adders)
  • Egress scheduling baseline and MAC serialization
Feature toggles (impact statement only)
Enabling ACL, TSN, Telemetry, or Metrology typically inserts deterministic processing stages. Treat the delta as fixed latency adders and validate by a controlled bench run.
Source classification (Measured / Datasheet / Calibrated)
Measured
Use controlled traffic (fixed frame length, stable load pattern) to extract the fixed baseline and confirm mode dependencies.
Datasheet
Use vendor latency numbers as the budget seed; track the exact conditions (mode, features, speed) as ledger notes.
Bench-calibrated
When feature combinations or topology interactions dominate, calibrate fixed adders by bench and freeze them with configuration versioning.
Diagram: switch pipeline segmentation (fixed-delay contributors) and feature toggles that insert deterministic stages. Note: the queue appears here only as a pipeline stage with deterministic overhead; queueing delay itself is budgeted separately as a bound.

H2-5 · Queueing Delay (Why the Tail Explodes)

Determinism fails most often because queueing delay is not linear: bursts and contention create tail growth that is invisible in long-window averages. This section maps the top three triggers to observable symptoms and to the minimum accounting fields required for budgets and logs.

The three most common queueing triggers
1) Bursts (microbursts)
  • Observable: average utilization looks safe, but p99/p999 latency spikes in short windows.
  • Accounting fields: burst size (B), burst interval, peak rate, service rate (μ), peak queue depth.
  • First check: compare 1 ms vs 1 s windows; confirm whether peaks are hidden by averaging.
2) Shared queues / shared buffers
  • Observable: a “noisy” class inflates tail latency of a “well-behaved” class.
  • Accounting fields: class→queue mapping, per-queue depth/occupancy, per-class counters.
  • First check: inspect per-queue counters, not port-wide totals; identify which queue hits saturation.
3) Head-of-line blocking / priority inversion
  • Observable: higher-priority traffic still experiences stalls because it is blocked by a shared egress dependency.
  • Accounting fields: priority→class mapping, scheduler policy, egress congestion markers.
  • First check: verify mapping consistency end-to-end; confirm which egress is the true bottleneck.
“Average utilization is low, but it feels stuck” — the accounting explanation
  • Window mismatch: a safe 1 s average can hide 1–10 ms peaks that dominate tail latency.
  • Peak vs mean: queue growth is driven by short-term peak arrival exceeding service rate.
  • Effective service loss: drops/retries or backpressure events reduce effective throughput (treat as a queueing amplifier).
Ledger rule
Any “utilization” metric must be reported with the window length and broken down to the queue level.
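The window-mismatch claim is directly checkable from per-millisecond byte counters. A sketch (illustrative; counters must be per egress queue, per the ledger rule above):
```python
def peak_window_utilization(bytes_per_ms, link_bps, window_ms):
    """Worst sliding-window utilization over `window_ms` for one egress queue.

    bytes_per_ms: byte counts in 1 ms bins (per queue, not port-wide totals)
    """
    capacity_bits = link_bps * window_ms * 1e-3
    peak = 0.0
    for i in range(len(bytes_per_ms) - window_ms + 1):
        peak = max(peak, 8 * sum(bytes_per_ms[i:i + window_ms]) / capacity_bits)
    return peak

# The same trace can report 0.20 over a 1000 ms window and 1.0 over a 1 ms
# window — the averaged view hides exactly the peaks that drive p99/p999.
```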
Minimum accounting fields for queueing (budget + logs)
  • Q(t): queue depth over time (peak, threshold crossings, recovery time).
  • μ: service rate (configured vs observed) per egress queue.
  • Burst: size (B), peak rate, burst period / inter-burst gap.
  • Mapping: class→queue and priority→class consistency across endpoints and switches.
  • Counters: drops, backpressure/pause events, and per-queue occupancy.
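To turn these fields into a bound, a slot-based sketch of Q(t) under a declared arrival trace (hypothetical units; one slot = one service interval):
```python
def peak_backlog(arrival_bits_per_slot, mu_bits_per_slot):
    """Step-wise queue depth: Q[k] = max(0, Q[k-1] + A[k] - mu).

    Returns (peak backlog, busy slots); peak / mu approximates the
    worst-case queueing delay under this arrival trace.
    """
    q = peak = busy = 0
    for a in arrival_bits_per_slot:
        q = max(0, q + a - mu_bits_per_slot)
        peak = max(peak, q)
        busy += q > 0
    return peak, busy
```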
Diagram: queue depth over time shown as step blocks (burst-in → service → tail risk). No curves.

H2-6 · Qbv Time-Aware Shaping (Turn Uncertainty into a Schedule)

Qbv replaces a portion of random contention with a time table. A gate-control list (GCL) defines repeating windows, and the resulting delay becomes deterministic but bounded: traffic waits until the next open window. This section describes the minimum GCL fields, the role of guard bands, and how timebase error enters the budget without expanding into configuration ecosystems.

Qbv in one line (budget-centric)

A repeating schedule opens/closes gates per traffic class so the worst-case waiting time becomes a bounded gate-wait term in the latency ledger.

GCL (gate-control list): minimum fields that must be written down
  • Cycle time: schedule period that repeats.
  • Slots: windows with fixed start/length inside the cycle.
  • Gate state: open/closed per traffic class per slot.
  • Repeat: deterministic repetition used to compute bounded wait.
  • Class mapping: which class uses which window (keep mapping stable).
Guard band and budget impact (bounded wait)
  • Guard band: a protective slice near window boundaries to prevent boundary-crossing interference.
  • Gate-wait term: traffic may wait until the next open slot; this is deterministic but bounded.
  • Timebase input: treat timebase misalignment as an input error ε; if it dominates, route to timing pages (PTP).
Ledger fields (Qbv)
cycle time, slot lengths, guard band, class mapping, and max gate-wait bound (per hop) → carried into end-to-end budget.
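The “max gate-wait” field can be derived mechanically from the GCL. A sketch, under the simplifying assumption that a frame arriving while its gate is closed (or inside the guard band) waits for the next open window:
```python
def max_gate_wait_us(cycle_us, open_slots, guard_band_us=0.0):
    """Worst-case wait until the next open window for one traffic class.

    open_slots: list of (start_us, length_us) windows within one cycle
                during which this class's gate is open.
    """
    slots = sorted(open_slots)
    worst = 0.0
    for i, (start, length) in enumerate(slots):
        next_start = slots[(i + 1) % len(slots)][0]
        gap = (next_start - (start + length)) % cycle_us  # closed time to next open
        worst = max(worst, gap + guard_band_us)
    return worst

# One 200 µs cycle, gate open at [0, 50) and [120, 160):
print(max_gate_wait_us(200, [(0, 50), (120, 40)], guard_band_us=5))  # 75.0
```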
Diagram: one-cycle GCL stripe with minimal labels (Class A / Class B / BE) and guard band.

H2-7 · Qci Per-Stream Policing (Keep “Bad Traffic” Out)

Deterministic networking collapses when input traffic has no bound. Qci-style per-stream policing enforces rate and burst constraints at admission, isolating abnormal flows so tail latency remains budgetable. This section focuses on engineering goals, ledger fields, and counters — without expanding into configuration ecosystems.

Engineering target (what policing is supposed to guarantee)
  • Bound the arrival process: cap burst size and sustained rate per stream.
  • Protect determinism: prevent a single abnormal stream from inflating p99/p999 for others.
  • Make budgets credible: queue and gate-wait bounds assume input is already constrained.
Actions, tail impact, and trade-offs
  • Conformance decision: each stream is judged as conforming or violating against rate/burst limits.
  • Actions: allow, drop, or mark (policy-driven) to block “bad traffic” from consuming shared resources.
  • Tail reduction: fewer microburst-driven queue spikes → tighter latency distribution.
  • Cost: policing can reduce throughput or increase drop/mark events (must be included in acceptance).
Ledger rule
Tail improvements must be reported together with violation counters and drop/mark rates to avoid “fake determinism.”
Minimum fields to record (per stream)
  • Stream ID: a stable flow identifier (source/destination/port/class tuple).
  • Meter parameters: rate limit + burst limit (token-bucket parameter set).
  • Action policy: allow / drop / mark (and any severity level).
  • Violation counters: count, duration, peak violation intensity.
  • Mapping reference: class→queue reference used by the stream (for audit and drift checks).
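A minimal token-bucket meter matching these fields (a sketch of the accounting model only; real Qci/PSFP hardware meters are richer and hardware-timed):
```python
class TokenBucketMeter:
    """Per-stream meter: sustained rate (bytes/s) + burst limit (bucket depth)."""

    def __init__(self, rate_Bps, burst_B):
        self.rate, self.depth = rate_Bps, burst_B
        self.tokens, self.t_last = float(burst_B), 0.0

    def conforms(self, t_s, frame_B):
        """True if the frame conforms; a violation should also bump a counter."""
        self.tokens = min(self.depth, self.tokens + (t_s - self.t_last) * self.rate)
        self.t_last = t_s
        if frame_B <= self.tokens:
            self.tokens -= frame_B
            return True
        return False  # action per policy: drop or mark, plus violation counters
```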
Diagram: per-stream token bucket → action (gate/drop/mark) → counters (violations/drops). Minimal text, ledger-centric.

H2-8 · End-to-End Jitter Budgeting (From “Feeling” to Acceptance)

This section is the core deliverable: a copyable budgeting method that turns end-to-end latency and jitter into a ledger with explicit bounds, assumptions, and verification hooks. Every row must be traceable to a source (measured, datasheet, or calibrated), and the acceptance criteria must include tail metrics, not only averages.

Canonical budget form (ledger canon for this page)
E2E latency = Σ fixed + Σ queue bound + Σ gate-wait bound + ε_timebase + ε_impl
  • fixed: per-hop deterministic pipeline/serialization baseline (feature/mode dependent).
  • queue bound: worst-case queueing term under declared burst/service assumptions.
  • gate-wait bound: worst-case waiting until next open window (plus guard band).
  • ε_timebase: timebase input error (treated as a parameter; details belong to timing pages).
  • ε_impl: implementation residual (tap-point limitations, host scheduling variance, unmodeled adders).
How to write bounds (engineering templates)
Queue bound
Write the worst-case queueing term using declared maximum burst (B) and service rate (μ). Attach assumptions: burst window, class→queue mapping, and load pattern ID.
Gate-wait bound (Qbv)
Bound the waiting term by the maximum closed-window length plus guard band. Record cycle time, slot lengths, and the exact GCL version.
Timebase input
Treat ε_timebase as an input parameter carried into the budget sheet; if it dominates, route to timing/sync pages.
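A sketch tying the bound templates to the canonical sum (the B/μ queue term is a deliberate simplification that only holds when arrivals are already policed; record that assumption in the Notes field):
```python
def queue_bound_us(burst_bytes, service_Bps):
    """Worst-case queueing for a policed stream: a burst B drains at rate mu."""
    return burst_bytes / service_Bps * 1e6

def e2e_bound_us(hops, eps_timebase_us, eps_impl_us):
    """E2E = sum(fixed) + sum(queue bound) + sum(gate-wait bound) + eps terms.

    hops: per-hop dicts, e.g. {"fixed": 2.0, "queue": 12.1, "gate_wait": 75.0}
    """
    per_hop = sum(h["fixed"] + h["queue"] + h["gate_wait"] for h in hops)
    return per_hop + eps_timebase_us + eps_impl_us
```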
Budget sheet fields (copyable template)
  • Hop / Segment: EP, SW1, SW2…
  • Item: fixed / queue bound / gate-wait bound / ε
  • Min / Max (or typ/max): explicit bounds, not only typical values
  • Notes: mode, features, frame length, mapping, burst window, load pattern ID
  • Source: measured / datasheet / calibrated
  • Tap-point: measurement tap declaration aligned with the page’s taxonomy
  • Verification: plan + counters used to validate this row in bring-up
Acceptance criteria (must include tail)
  • Tail metrics: p99/p999 (and a max bound where applicable) under declared load patterns.
  • Bound compliance: verify that fixed + bounded terms explain measured results within ε_impl.
  • Counter sanity: violation/drop/mark counters remain within declared limits (no hidden trade-offs).
  • Version lock: budget must bind to configuration versions (GCL/mapping/policing).
Diagram: budget “spreadsheet” structure (columns + per-hop grouped rows + Σ summary + ε inputs). Rendered as a box diagram for mobile safety.

H2-9 · Parameterization Workflow (Requirements → Config → Feedback)

Determinism becomes repeatable only when parameters are managed as a closed loop. This workflow turns requirements into a budget, produces shaping/policing parameters, verifies results with consistent metrics, and feeds deviations back into the ledger. The focus is on reusable artifacts and version control — not vendor tool details.

Step 1 — Requirements as ledger inputs
  • Cycle / control loop: cycle time, deadline, update rate.
  • Tail constraint: E2E p99/p999 limit (not only average).
  • Topology: hop count, critical paths, fan-out / aggregation points.
  • Load pattern ID: idle / full / burst / mixed / fault-injection profile.
Step 2 — Budget allocation and hard boundaries
  • Per-hop allocation: assign fixed, queue bound, and gate-wait bound per hop.
  • Hard boundaries: identify terms that cannot be “optimized away” (e.g., gate-wait bound, ε_timebase).
  • Assumption lock: freeze mapping, burst window, and frame-size assumptions in the ledger.
Artifact
A versioned budget ledger that names every hop and bound term, plus the load pattern ID used for acceptance.
Step 3 — Generate parameters (GCL + mapping + policing)
  • GCL (Qbv): cycle time, slot lengths, guard band, and a stable class→window plan.
  • Queue mapping: class→queue mapping aligned across endpoints and switches.
  • Policing (Qci-style): per-stream rate/burst limits and violation actions.
  • Config identity: attach a config version tag to each parameter set.
Step 4 — Deploy, verify, and feed deviations back
  • Verify with consistent metrics: per-hop delay, E2E p99/p999, queue watermark, gate misses, drops/late.
  • Deviation attribution: map every failure to a ledger term (fixed / queue / gate-wait / ε).
  • Closed-loop rule: if acceptance fails, return to Step 2 (reallocate bounds) before tuning ad-hoc.
Step 5 — Versioning and change management
  • Rule: parameter sheet version = system configuration source-of-truth.
  • Bind: ledger version ↔ GCL version ↔ mapping version ↔ policing version ↔ test matrix version.
  • Rollback path: every release must have a tested rollback tag with known acceptance results.
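Version binding can be as simple as one content hash over the whole parameter bundle (a sketch; artifact names follow this page's ledger, not any vendor tool):
```python
import hashlib, json

def config_version_tag(gcl, mapping, policing):
    """Content-addressed tag binding GCL + mapping + policing tables.

    A budget or test result is only comparable to a run with the same tag.
    """
    blob = json.dumps({"gcl": gcl, "map": mapping, "qci": policing},
                      sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]
```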
Diagram: closed-loop workflow (Plan → Configure → Measure → Adjust) with version/rollback branch.

H2-10 · Validation & Measurement (Make Determinism Measurable)

Determinism cannot be accepted without consistent measurement contracts. This section defines a minimum metric set, a scenario matrix that exposes tail failures, a tap-point consistency gate, and a pass-criteria template with threshold placeholders. It avoids tool-specific details and focuses on repeatable acceptance.

Minimum metrics (ledger-aligned)
  • Per-hop delay: hop baseline + deviations (use declared tap-points).
  • E2E tail: p99 and p999 latency under declared scenario IDs.
  • Drops / late: drop counters and late-arrival events for bounded windows.
  • Queue watermark: per-queue peak occupancy (not port-wide only).
  • Gate miss: gate misses / window violations (per class/queue where applicable).
  • Policing health: violation counters (per stream) and action rates.
Scenario matrix (minimum coverage set)
  • Idle: baseline fixed + measurement noise floor.
  • Full load: sustained saturation risks and scheduling drift.
  • Burst: microburst tail exposure and queue bound validation.
  • Mixed: class interaction and mapping correctness under concurrency.
  • Fault injection: abnormal traffic / violations to test policing and isolation.
Contract
Any reported p99/p999 must include scenario ID and window length; otherwise results are not comparable.
Tap-point consistency gate (measurement contract)
  • Declare tap-points: every per-hop delay must name the timestamp tap position.
  • Reject mixed baselines: do not attribute differences to queue/gate if tap-points differ.
  • Timebase note: if timebase dominates, treat it as ε_timebase input (route to timing pages).
Pass criteria template (threshold placeholders)
  • Metric: E2E p99 / E2E p999 / per-hop delay / queue watermark / gate miss
  • Scenario ID: idle / full / burst / mixed / fault
  • Window: X ms / X s (must be declared)
  • Threshold: ≤ X (units) + counters within X
  • Evidence: counters + percentile report + config version tag
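The template maps directly onto a small acceptance check (field names hypothetical; the limits are the “≤ X” placeholders filled in per project):
```python
def accept(run, limits):
    """Evaluate one run (one scenario ID + window) against pass criteria."""
    checks = {
        "p99":       run["p99_us"]        <= limits["p99_us"],
        "p999":      run["p999_us"]       <= limits["p999_us"],
        "gate_miss": run["gate_miss"]     <= limits["gate_miss"],
        "late":      run["late_per_1e6"]  <= limits["late_per_1e6"],
        "watermark": run["watermark_pct"] <= limits["watermark_pct"],
    }
    # Evidence must name scenario + window + config version, otherwise the
    # result is not comparable under this page's measurement contract.
    evidence = (run["scenario_id"], run["window"], run["config_tag"])
    return all(checks.values()), checks, evidence
```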
Diagram: scenario matrix (left) + metrics dashboard blocks (right). Designed for acceptance traceability.

H2-11 · Design Hooks & Pitfalls (Where Tail Latency Explodes)

This section captures only determinism-relevant pitfalls: time windows (Qbv), queue/mapping mistakes, policing drift, and measurement contract mismatches. Each pitfall is expressed as a repeatable triage path: trigger → symptom → first counter check. Topics such as PHY/EMC/SI or protocol stack business models are intentionally excluded.

A) Time-table pitfalls (Qbv / GCL / guard band)
Pitfall: GCL cycle ≠ application cycle
  • Trigger: mismatched cycle lengths or phase drift.
  • Symptom: periodic p999 spikes (beating / phase-locked bursts).
  • First check: p99/p999 sliced by time phase + GCL cycle/slot audit.
  • Fix direction: align cycles/phase, then re-balance slot allocation (a beat-period sketch follows this list).
Pitfall: guard band underestimated
  • Trigger: guard band computed with optimistic frame/serialization assumptions.
  • Symptom: late events or “window miss” bursts under real payloads.
  • First check: gate-miss / late counters aligned to the same window length.
  • Fix direction: widen guard band or adjust slot boundaries and queue service.
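For the cycle-mismatch pitfall above, the spike period is predictable: it is the least common multiple of the two cycles. A sketch, assuming integer-microsecond cycle times:
```python
from math import gcd

def beat_period_us(gcl_cycle_us: int, app_cycle_us: int) -> int:
    """Period after which the GCL and the application cycle realign in phase."""
    return gcl_cycle_us * app_cycle_us // gcd(gcl_cycle_us, app_cycle_us)

# Example: a 250 µs GCL against a 256 µs application cycle realigns only
# every 32 000 µs — p999 spikes every 32 ms that look random in short windows.
print(beat_period_us(250, 256))  # 32000
```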
B) Mapping & queue pitfalls (tail grows “while utilization looks low”)
Pitfall: priority / class-to-queue mapping error
  • Trigger: high-priority class mapped into shared BE queue.
  • Symptom: high-priority tail rises (HOL blocking), even at “low average load”.
  • First check: per-queue watermark + mapping table version audit.
  • Fix direction: isolate queues and re-validate allocation with burst scenario ID.
C) Policing & metric contract pitfalls (false calm vs real congestion)
Pitfall: policing too tight / too loose
  • Trigger: token bucket set without matching the declared burst window.
  • Symptom: drops (too tight) or tail returns (too loose) under burst stress.
  • First check: violations/drops counters aligned with p99/p999 in the same scenario window.
  • Fix direction: re-derive rate/burst from the ledger assumptions, then re-run matrix.
Pitfall: counters use inconsistent windows / denominators
  • Trigger: mixing per-port, per-queue, and per-flow metrics without stating scope.
  • Symptom: “utilization looks fine” while p999 and watermarks explode.
  • First check: declare window length + denominator + object (port/queue/flow) before comparing.
  • Fix direction: standardize a metric contract, then re-baseline acceptance.
Diagram: triage paths — Trigger → Symptom → First counter check (with a small fix-direction tag).

H2-12 · Engineering Checklist (Design → Bring-up → Production)

This checklist is determinism-specific. It binds the budget ledger, shaping/policing parameters, measurement contracts, and the regression matrix into three gates: Design, Bring-up, and Production. Each gate requires versioned artifacts, consistency checks, and acceptance-ready evidence.

Design gate
  • Ledger done: per-hop fixed + queue bound + gate-wait bound + ε terms recorded.
  • Worst-case defined: scenario IDs + burst assumptions + window length locked.
  • Params frozen: GCL + mapping + policing tables frozen with version tag.
  • Hard bounds tagged: identify non-negotiable terms and margin ownership.
  • Pass template ready: thresholds as “≤ X” with evidence fields defined.
Bring-up gate
  • Per-hop calibration: measured vs inferred vs datasheet terms labeled.
  • Tap contract: tap-points declared and consistent across all measurements.
  • GCL readback: downloaded schedule matches readback and version tag.
  • Counters aligned: window/denominator/object scopes standardized.
  • Tail validated: p99/p999 verified under burst & mixed scenario IDs.
Production gate
  • Version binding: ledger ver ↔ config ver ↔ test ver are linked.
  • Observability: watermark, gate miss, violations, drops, late, event fields logged.
  • Regression set: minimum matrix (idle/full/burst/mixed/fault) is automated.
  • Change triggers: topology/mapping/window changes force re-budget + re-accept.
  • Rollback tag: a tested rollback version with known acceptance evidence exists.
Diagram: three gates (Design / Bring-up / Production) with 5 checkboxes each (short labels only).

H2-13 · Applications & IC Selection (Determinism-First)

This section converts latency/jitter modeling into a practical selection method: define determinism targets, map them to TSN mechanisms (Qbv/Qci/observability), then shortlist devices by measurable evidence.

Deliverables: application targets, a determinism-first decision tree, and example part numbers (BOM-ready).
  • Applications: determinism targets expressed as p99/p999 + cycle + hop (no protocol deep-dive).
  • Selection axes: fixed per-hop latency, tail control, policing protection, observability, and resource boundaries.
  • Shortlist examples: TSN switches / TSN MPUs / industrial switch silicon with concrete orderable PNs.

A) Applications (expressed as determinism targets)

Keep the application description strictly measurable: cycle, percentile bounds, “late” rate, and protection against abnormal traffic.

Motion Control Cell · Qbv-bound tail
  • Targets: cycle = X µs, E2E p99 ≤ Y µs, p999 ≤ Z µs, late ≤ A / 10^6 frames.
  • Dominant risks: mixed-load bursts, gate schedule drift, priority mapping mistakes.
  • Selection must-have: TAS/Qbv, per-queue watermark + gate-miss/late counters.
Controller-to-Remote I/O · Qci domain protection
  • Targets: bounded jitter under contention; abnormal traffic must not inflate tail.
  • Dominant risks: “bad” flows (misconfigured burst/rate), queue hogging, retry storms.
  • Selection must-have: PSFP/Qci (token bucket + violation counters) and stable policing behavior.
Deterministic Triggering / Imaging · tight p999
  • Targets: p999 bound dominates; small periodic tail spikes are unacceptable.
  • Dominant risks: schedule beat frequency, guard-band underestimation, egress serialization.
  • Selection must-have: deterministic schedule update, strong time-window tooling, per-hop calibration hooks.
Edge Gateway (TSN island to cloud) · observability-first
  • Targets: deterministic forwarding inside TSN domain + measurable degradation outside.
  • Dominant risks: mixed policies, counter definition drift, silent tail inflation.
  • Selection must-have: rich counters (per-queue watermark, late/drop, policing violations) + black-box logging hooks.

B) IC Selection Axes (determinism-first, measurable)

Axis 1 — Fixed per-hop latency (predictable baseline)

  • Prefer architectures that keep forwarding path stable under feature enablement.
  • Require a clear method to obtain fixed latency: datasheet value, bench calibration, or per-hop measurement.

Axis 2 — Tail control (queue bound + gate-wait bound)

  • Qbv/TAS converts uncertainty into a bounded, schedulable wait term (deterministic but bounded).
  • Queue tail must be bounded by design: burst assumptions, service rate, queue isolation, and shaping policy.

Axis 3 — Domain protection (Qci / per-stream policing)

  • PSFP/Qci prevents abnormal flows from inflating tail for everyone else.
  • Must-have evidence: token bucket parameters + violation counters + deterministic drop/mark policy.

Axis 4 — Observability (field-proof determinism)

  • Require counters that directly map to the budget terms: queue watermark, gate-miss/late, drops, policing violations.
  • Prefer designs that can run PRBS/loopback/traffic tests without changing the determinism path.

Axis 5 — Resource boundaries (scale without “schedule jitter”)

  • Check table sizes, number of queues per port, GCL length, and schedule update behavior.
  • Operational rule: configuration table version is a controlled artifact (diffable, rollbackable).

C) Example Part Numbers (shortlist by determinism needs)

The part numbers below are orderable examples. Feature subsets vary by variant and software stack; verify Qbv/Qci/observability in the official documentation.

Bucket 1 — Compact TSN switch (fast path + integrated CPU)

  • Microchip: LAN9662-I/9MX (TSN switch + CPU), LAN9668/9MX (8-port TSN switch + CPU).
  • Why (determinism): TSN scheduling/policing options + rich counters are typically available for field validation.
  • Best fit: remote I/O, compact cells, gateway-class TSN islands.

Bucket 2 — Higher port-count / higher bandwidth industrial switching silicon

  • Microchip (SparX-5i family): VSC7546TSN-V/5CC (industrial switch class).
  • Why (determinism): scale (ports/bandwidth) + TSN mechanisms enable tight p999 targets across more hops.
  • Best fit: TSN backbone switches, multi-line aggregation, high-throughput deterministic cells.

Bucket 3 — TSN-capable MPU (integrated L2 switch for endpoints/gateways)

  • Renesas: R9A07G084M04GBG#AC0 (RZ/N2L group), R9A07G084M04GBG#BC0 (variant).
  • Why (determinism): integrated switching + TSN support is useful when endpoint control and deterministic forwarding are packaged together.
  • Best fit: drives, gateways, remote I/O, industrial endpoints requiring tight timing behavior.

Bucket 4 — Automotive-grade TSN switches (when safety/security constraints dominate)

  • NXP: SJA1105QELY (AVB/TSN switch family), SJA1110CEL/0Y (SJA1110 family variant).
  • Why (determinism): TSN scheduling features + strong platform support can help stabilize per-hop behavior at scale.
  • Best fit: automotive/industrial cross-over gateways, harsh environment edge boxes.

Selection rule (determinism-first): choose by measurable evidence — per-hop delay method, queue/gate counters, policing violation counters, and a repeatable test matrix. If any of these are missing, determinism becomes un-auditable in the field.

Diagram — Determinism-First Selection Tree

Inputs: p99/p999 target, cycle time, hop count, mixed-load risk. Outputs: required TSN mechanisms and shortlist buckets.

Practical usage: fill X/Y/Z/N with project targets, then lock Qbv/Qci/observability requirements and shortlist by measurable evidence.


H2-14 · FAQs (Field Troubleshooting + Acceptance Criteria)

Scope: only long-tail troubleshooting and acceptance wording for latency/determinism (p99/p999/tail), Qbv/Qci, and budget/measurement consistency. Format is fixed per question: Likely cause / Quick check / Fix / Pass criteria.

Structured fields used in every answer: p99 / p999 / max · gate_miss / late · queue_watermark / violations / drops
Average latency looks OK, but p99 blows up — burst or gate miss first?

Likely cause: micro-bursts inflate queue tail, or Qbv gate miss creates periodic late frames.

Quick check: correlate p99/p999 spikes with queue_watermark vs gate_miss/late in the same window (same denominator).

Fix: cap burst (Qci token bucket) and/or adjust GCL slots + guard band until gate misses disappear.

Pass criteria: E2E p99 ≤ X, p999 ≤ X over X minutes; gate_miss = 0; late ≤ X / 10^6 frames.

Qbv enabled but jitter gets worse — GCL beat frequency or guard band too small?

Likely cause: GCL period misaligned with traffic cycle (beat), or guard band underestimates worst-case serialization.

Quick check: look for periodic p999 spikes; check gate_miss clustered at window edges; validate max-frame serialization assumption.

Fix: align GCL period to cycle; increase guard band by X (worst-case serialize + margin).

Pass criteria: no periodic p999 spikes over X minutes; gate_miss = 0; window-edge late = 0.

Low load still “stalls” — window closed too long or priority mapping wrong?

Likely cause: gate-closed duration blocks a critical class, or QoS mapping reset misroutes traffic to BE queue.

Quick check: compare stall timing vs gate state; verify class→queue counters (per-class increments must match expectation).

Fix: shorten closed windows; restore mapping matrix; pin configuration by version + readback diff.

Pass criteria: worst-case wait ≤ X; E2E p99 ≤ X; mapping counters stable within X%.

Same config, different switch silicon worsens tail — fixed latency delta or queue behavior?

Likely cause: different per-hop fixed pipeline latency and/or buffer sharing inflates queue tail under burst.

Quick check: measure per-hop idle baseline vs mixed-load tail; compare queue_watermark vs dequeue stability at same input burst.

Fix: re-calibrate fixed term; re-budget queue bound; adjust queue isolation/shaping assumptions for this silicon.

Pass criteria: updated budget closes with ≥ X margin; per-hop fixed repeatability within ±X; p999 ≤ X.

Field shows occasional “late”, but counters look clean — window definition or tap mismatch?

Likely cause: counter window/denominator hides events, or measurement tap points differ across devices/tools.

Quick check: force same observation window + denominator; run tap-consistency check (same event, same tap definition).

Fix: standardize KPI definition (window/denominator/tap); log raw late events until stable.

Pass criteria: tool-to-tool late rate within ±X; late ≤ X / 10^6 frames under standardized window.

One abnormal node slows the whole network — Qci not blocking or shared-queue HOL?

Likely cause: policing missing/weak for that stream, or shared buffering causes head-of-line blocking across classes.

Quick check: isolate offender via per-stream counters; check policing_violations and cross-queue watermark coupling.

Fix: tighten Qci token bucket; increase isolation (dedicated queue/class) for critical traffic.

Pass criteria: violations ≤ X per X minutes; critical class p99/p999 stays within X; watermark ≤ X%.

Worse only after maintenance — GCL version drift or readback mismatch?

Likely cause: schedule/config changed silently, or readback differs from intended GCL/mapping bundle.

Quick check: compare config version hash vs golden; read back GCL + mapping and diff.

Fix: enforce config-as-code; block deployment if readback diff ≠ 0; add rollback path.

Pass criteria: version hash matches; readback diff = 0; KPIs stable within X over X reboots.

Preemption still appears inside a window — guard band missed serialization time?

Likely cause: guard band omitted worst-case serialization/egress drain time, or edge behavior adds extra fixed delay.

Quick check: compute max-frame serialization; check edge-aligned gate_miss/late and egress occupancy at window open.

Fix: extend guard band by X; if needed, cap max frame size for interfering class.

Pass criteria: window-edge late events = 0; gate_miss = 0; guard-band margin ≥ X.

Port watermark is not high, but tail is bad — congestion is downstream?

Likely cause: true bottleneck is downstream (next hop/egress), so local watermark does not reflect final queueing.

Quick check: compare per-hop percentiles; check downstream utilization bursts + counters; identify where queueing accumulates.

Fix: apply shaping at the true bottleneck hop; re-budget per-hop queue bound; adjust schedule or split traffic.

Pass criteria: bottleneck hop confirmed; downstream watermark ≤ X%; E2E p999 ≤ X.

p99 meets spec, but max occasionally violates — beat frequency or burst injection?

Likely cause: periodic beat creates rare peaks, or occasional out-of-model bursts appear (maintenance scans/retries).

Quick check: test max-violation periodicity; correlate to event logs + burst counters; validate burst assumptions used in bounds.

Fix: align periods to remove beat; tighten admission (Qci) for out-of-model bursts; add regression scenario.

Pass criteria: max ≤ X over X minutes; no periodic max spikes; blocked bursts ≤ X.

After VLAN/QoS changes, determinism collapses — mapping matrix reset?

Likely cause: priority→queue mapping or policer bindings reset, breaking class isolation and bounds.

Quick check: verify mapping + policer binding tables post-change; compare per-class counters before/after.

Fix: apply mapping/policers as an atomic versioned bundle; enforce readback validation after any VLAN/QoS change.

Pass criteria: mapping diff = 0; per-class p99/p999 stable within X; no queue cross-talk events > X.

Enabling mirroring/telemetry makes tail worse — fixed path increase or queue contention?

Likely cause: extra pipeline stages add fixed latency, or telemetry shares resources and increases contention.

Quick check: A/B test telemetry on/off; compare per-hop idle baseline and queue_watermark; see whether p99 shifts or tail expands.

Fix: move telemetry off the critical class, reduce sampling, or isolate it in a dedicated queue; re-budget fixed term if unavoidable.

Pass criteria: telemetry-on still meets p99 ≤ X & p999 ≤ X; baseline delta ≤ X; critical watermark change ≤ X%.