Latency & Determinism for TSN: Jitter Budgeting & Shaping
Determinism is not “low average latency” — it is a provable bound on the tail (p99/p999/max). This page turns switch delay models and Qbv/Qci shaping into an end-to-end jitter budget that can be measured, versioned, and accepted against clear pass criteria.
H2-1 · Scope & Non-Scope (Boundary Contract)
This page is strictly scoped to avoid overlap with neighboring pages: it standardizes the latency/jitter “accounting” and focuses only on deterministic delay control via switch delay models, Qbv/Qci shaping, and end-to-end jitter budgeting with measurable pass criteria.
Typical symptoms this page addresses:
- Average latency looks fine, but sporadic “late” events or p99/p999 spikes still occur.
- Load is “only 20%”, yet real-time traffic still stalls during bursts or contention windows.
- TSN features are enabled, but determinism does not improve because the tail is not budgeted, bounded, or verified.
In scope:
- Switch delays: per-hop model = fixed pipeline + queueing + shaping / gate-wait.
- Qbv / Qci: time-aware gating (Qbv) and per-stream policing / admission (Qci) as tail-control levers.
- End-to-end jitter budgeting: bound each hop, sum bounds, define test matrix, and write pass/fail criteria.
Deliverables:
- Latency/Jitter budget sheet template: hop-by-hop fixed delay + queue bound + gate-wait bound + timebase error input (ε) + notes.
- Qbv GCL design record: cycle time, slot plan, guard-band logic, and “max gate-wait” bound per class.
- Qci policing record: per-stream burst/rate parameters, violation actions, and counters to audit “bad flows”.
- Verification matrix: empty-load / full-load / burst / mixed-class / fault-injection, with p99/p999 and late-rate pass criteria.
Out of scope (routed to neighboring pages):
- PTP / BMCA / topology calibration → go to PTP Hardware Timestamping. (Timebase is treated here as an input error ε only.)
- SyncE / White-Rabbit frequency lock & holdover → go to the SyncE / White-Rabbit-Style Timing pages.
- PROFINET / EtherCAT / CIP business models & certification → go to Industrial Ethernet Stacks pages.
- PHY SI / EMC / TVS / CMC / grounding → go to PHY Co-Design & Protection pages. (This page focuses on switch scheduling/policing bounds.)
H2-2 · Determinism = Latency Distribution (Beyond the Average)
Determinism is not “lower average latency”. Determinism is a tight and bounded latency distribution under defined load patterns. The engineering goal is tail control: convert unpredictable spikes into bounded wait times that can be budgeted and verified.
Common anti-patterns:
- Optimizing mean latency while ignoring p99/p999.
- Using “average utilization” as a proxy for real-time readiness.
- Declaring success because throughput is high, even when control traffic misses deadlines.
What a deterministic design requires:
- Bounded tail: p99/p999 and max latency stay within a known upper bound.
- Defined load patterns: empty-load, full-load, burst, and mixed-class cases are explicitly tested.
- Deadline integrity: late-rate (deadline miss events) is controlled and audited.
Working definitions and tail metrics:
- Latency: end-to-end time from transmit event to receive event under a defined measurement tap-point.
- Jitter (this page): short-term spread of the latency distribution within the same configuration and test window.
- Wander: long-term drift (not expanded here). If wander dominates, route to PTP/SyncE pages.
- Percentiles: p50 (median), p95, p99, p999 — prioritize p99/p999 for tail control.
- Max–min: highlights rare spikes that percentiles may hide in short windows.
- Late-rate: “deadline miss events per N packets / per minute” (requires a clear denominator).
- Burst sensitivity: tail increase under burst injection and mixed traffic classes.
Measurement rules:
- Keep window length explicit (e.g., Y minutes or N frames) and do not mix windows across runs.
- Do not compare percentiles across different tap points or timestamp definitions.
- Always log load pattern metadata (burst size, mix ratio, class mapping) together with results.
Where the tail comes from:
- Queueing under contention: bursts and shared resources create non-linear tail growth.
- Gate-wait (Qbv): bounded but non-zero waiting until the next open window; the bound must be budgeted.
- Unregulated flows (Qci missing/weak): abnormal traffic can reintroduce tail spikes.
- Observability mismatch: counters/timestamps look clean because the accounting definition is inconsistent (tap-point/denominator/window).
H2-3 · Latency Taxonomy (End-to-End Ledger Canon)
A deterministic design requires a single accounting canon. This section defines the end-to-end segments, the only three allowed delay categories, and the minimum timestamp tap-point rules so every budget line uses the same meaning.
Use this chain as the fixed “row order” for budgets and reports: TX endpoint (EP) → SW1 → SW2 → … → RX endpoint (EP).
The only three allowed delay categories:
- Fixed latency: deterministic baseline from pipeline/serialization; weakly dependent on mode and frame length.
- Load-dependent queueing: contention/bursts create non-linear tail growth; must be bounded with assumptions.
- Time-dependent gate-wait: Qbv windows convert random contention into bounded waiting; the bound must be budgeted.
Timestamp tap-point rules (minimum):
- Tap-point must be declared for each measurement run (endpoint tap vs switch ingress/egress tap).
- Do not mix tap-points when comparing p99/p999; otherwise the distribution is not comparable.
- Window/denominator must be logged (Y minutes or N frames) together with load pattern metadata.
- Timebase error is an input ε; if drift/asymmetry dominates, route to timing pages (PTP/SyncE).
Minimum budget-sheet columns:
- Segment / hop name
- Fixed (min/typ) and notes (mode, frame length dependency)
- Queue bound (max) with stated burst assumptions
- Gate-wait bound (max) from GCL + guard-band
- Source: Measured / Datasheet / Calibrated
- Tap-point + window length + load pattern ID
H2-4 · Switch Internal Delays (Fixed-Latency Model)
Per-hop fixed latency becomes traceable when it is decomposed into pipeline stages. This section explains store-and-forward vs cut-through at a modeling level, identifies common fixed-delay contributors, and classifies each contributor by the most reliable source: measured, datasheet, or bench-calibrated.
Forwarding modes (modeling level):
- Store-and-forward: forwarding starts after full frame buffering; fixed latency includes frame-length-dependent serialization terms.
- Cut-through: forwarding starts earlier; fixed latency can be lower, but mode constraints and feature hooks may add deterministic stages.
- Ledger requirement: fixed latency must be recorded with mode + port speed + representative frame-length assumptions.
Common fixed-delay contributors:
- Ingress parse and classification
- Lookup and forwarding decision
- Buffer write/read and internal fabric traversal
- Rewrite / mirroring hooks (deterministic adders)
- Egress scheduling baseline and MAC serialization
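A back-of-envelope model of the per-hop fixed term, assuming standard Ethernet framing overhead (preamble/SFD + inter-frame gap) and a single lumped `pipeline_us` value obtained from bench calibration. The min-frame proxy for cut-through buffering is an assumption, since real devices differ in how early forwarding starts:

```python
def serialization_us(frame_bytes, link_mbps):
    """Time to clock a frame onto the wire, in microseconds.
    Adds preamble/SFD (8 B) and inter-frame gap (12 B) per Ethernet framing."""
    wire_bytes = frame_bytes + 8 + 12
    return wire_bytes * 8 / link_mbps  # bits / Mbps == microseconds

def fixed_hop_latency_us(frame_bytes, link_mbps, pipeline_us, mode):
    """Per-hop fixed term: lumped pipeline stages + mode-dependent buffering.
    pipeline_us lumps parse/lookup/fabric/egress stages (bench-calibrated)."""
    if mode == "store_and_forward":
        # the whole frame must be buffered before forwarding starts
        return pipeline_us + serialization_us(frame_bytes, link_mbps)
    if mode == "cut_through":
        # forwarding may start after the header; modeled here with a
        # min-frame proxy (assumption, not a device spec)
        return pipeline_us + serialization_us(64, link_mbps)
    raise ValueError(mode)
```

Record the mode, port speed, and representative frame length with every fixed-latency value, as the ledger requirement above demands.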
H2-5 · Queueing Delay (Why the Tail Explodes)
Determinism fails most often because queueing delay is not linear: bursts and contention create tail growth that is invisible in long-window averages. This section maps the top three triggers to observable symptoms and to the minimum accounting fields required for budgets and logs.
Trigger 1 — microbursts hidden by long-window averaging:
- Observable: average utilization looks safe, but p99/p999 latency spikes in short windows.
- Accounting fields: burst size (B), burst interval, peak rate, service rate (μ), peak queue depth.
- First check: compare 1 ms vs 1 s windows; confirm whether peaks are hidden by averaging.
Trigger 2 — cross-class interference:
- Observable: a “noisy” class inflates tail latency of a “well-behaved” class.
- Accounting fields: class→queue mapping, per-queue depth/occupancy, per-class counters.
- First check: inspect per-queue counters, not port-wide totals; identify which queue hits saturation.
Trigger 3 — shared egress blocking (priority inversion):
- Observable: higher-priority traffic still experiences stalls because it is blocked by a shared egress dependency.
- Accounting fields: priority→class mapping, scheduler policy, egress congestion markers.
- First check: verify mapping consistency end-to-end; confirm which egress is the true bottleneck.
Why averages mislead:
- Window mismatch: a safe 1 s average can hide 1–10 ms peaks that dominate tail latency.
- Peak vs mean: queue growth is driven by short-term peak arrival exceeding service rate.
- Effective service loss: drops/retries or backpressure events reduce effective throughput (treat as a queueing amplifier).
Minimum accounting fields:
- Q(t): queue depth over time (peak, threshold crossings, recovery time).
- μ: service rate (configured vs observed) per egress queue.
- Burst: size (B), peak rate, burst period / inter-burst gap.
- Mapping: class→queue and priority→class consistency across endpoints and switches.
- Counters: drops, backpressure/pause events, and per-queue occupancy.
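The window-mismatch effect is easy to reproduce with a discrete-time fluid queue, Q[t+1] = max(0, Q[t] + A[t] − S): the example below runs at a “safe” 20% average utilization yet builds a large burst backlog. All values are illustrative:

```python
def queue_trace(arrivals_bits, service_bits_per_slot):
    """Discrete-time fluid queue; returns (peak depth, Q(t) trace).
    Illustrative model, not device-accurate buffering behavior."""
    q, peak, trace = 0.0, 0.0, []
    for a in arrivals_bits:
        q = max(0.0, q + a - service_bits_per_slot)
        peak = max(peak, q)
        trace.append(q)
    return peak, trace

# 1000 slots: all traffic arrives as one 2000-bit burst every 100 slots,
# service rate mu = 100 bits/slot
service = 100.0
arrivals = [2000.0 if t % 100 == 0 else 0.0 for t in range(1000)]

avg_util = sum(arrivals) / (service * len(arrivals))  # 0.2 -> "looks safe"
peak_depth, _ = queue_trace(arrivals, service)        # burst backlog in bits
```

The long-window average says 20% load; the per-slot trace shows a 1900-bit backlog after each burst — exactly the peak-vs-mean gap the accounting fields above are meant to capture.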
H2-6 · Qbv Time-Aware Shaping (Turn Uncertainty into a Schedule)
Qbv replaces a portion of random contention with a time table. A gate-control list (GCL) defines repeating windows, and the resulting delay becomes deterministic but bounded: traffic waits until the next open window. This section describes the minimum GCL fields, the role of guard bands, and how timebase error enters the budget without expanding into configuration ecosystems.
A repeating schedule opens/closes gates per traffic class so the worst-case waiting time becomes a bounded gate-wait term in the latency ledger.
Minimum GCL fields:
- Cycle time: schedule period that repeats.
- Slots: windows with fixed start/length inside the cycle.
- Gate state: open/closed per traffic class per slot.
- Repeat: deterministic repetition used to compute bounded wait.
- Class mapping: which class uses which window (keep mapping stable).
- Guard band: a protective slice near window boundaries to prevent boundary-crossing interference.
- Gate-wait term: traffic may wait until the next open slot; the wait is bounded and schedulable.
- Timebase input: treat timebase misalignment as an input error ε; if it dominates, route to timing pages (PTP).
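Under the assumption of a sorted, non-overlapping slot list per class, the worst-case gate-wait is the largest gap between a window close and the next window open (wrapping into the next cycle). A minimal sketch:

```python
def max_gate_wait_us(cycle_us, open_slots):
    """Worst-case wait until the next open gate for ONE traffic class.
    open_slots: sorted [(start_us, length_us), ...] within one cycle.
    A frame arriving the instant its window closes waits until the next
    open start. Guard bands should already be subtracted from usable
    window length by the caller (assumption of this sketch)."""
    worst = 0.0
    for i, (start, length) in enumerate(open_slots):
        close = start + length
        nxt = open_slots[(i + 1) % len(open_slots)][0]
        if i + 1 == len(open_slots):
            nxt += cycle_us  # wrap into the next cycle
        worst = max(worst, nxt - close)
    return worst
```

For example, one 200 µs window in a 1000 µs cycle yields an 800 µs gate-wait bound; splitting the same capacity into two windows halves the bound — the kind of trade-off the GCL design record should capture.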
H2-7 · Qci Per-Stream Policing (Keep “Bad Traffic” Out)
Deterministic networking collapses when input traffic has no bound. Qci-style per-stream policing enforces rate and burst constraints at admission, isolating abnormal flows so tail latency remains budgetable. This section focuses on engineering goals, ledger fields, and counters — without expanding into configuration ecosystems.
Engineering goals:
- Bound the arrival process: cap burst size and sustained rate per stream.
- Protect determinism: prevent a single abnormal stream from inflating p99/p999 for others.
- Make budgets credible: queue and gate-wait bounds assume input is already constrained.
Behavior and cost:
- Conformance decision: each stream is judged as conforming or violating against rate/burst limits.
- Actions: allow, drop, or mark (policy-driven) to block “bad traffic” from consuming shared resources.
- Tail reduction: fewer microburst-driven queue spikes → tighter latency distribution.
- Cost: policing can reduce throughput or increase drop/mark events (must be included in acceptance).
Ledger fields (per stream):
- Stream ID: a stable flow identifier (source/destination/port/class tuple).
- Meter parameters: rate limit + burst limit (token-bucket parameter set).
- Action policy: allow / drop / mark (and any severity level).
- Violation counters: count, duration, peak violation intensity.
- Mapping reference: class→queue reference used by the stream (for audit and drift checks).
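A single-rate token-bucket meter captures the conformance decision and the violation counter described above. The class layout and action strings are illustrative; real Qci/PSFP implementations add per-stream filtering and gating on top:

```python
class TokenBucketPolicer:
    """Single-rate token-bucket meter (Qci-style, illustrative sketch).
    rate_bps:   sustained rate limit for the stream
    burst_bits: bucket depth, i.e. the burst limit"""

    def __init__(self, rate_bps, burst_bits):
        self.rate, self.depth = rate_bps, burst_bits
        self.tokens, self.last_t = burst_bits, 0.0
        self.violations = 0  # ledger counter used to audit "bad flows"

    def admit(self, t, frame_bits):
        # refill proportionally to elapsed time, capped at bucket depth
        self.tokens = min(self.depth,
                          self.tokens + (t - self.last_t) * self.rate)
        self.last_t = t
        if frame_bits <= self.tokens:
            self.tokens -= frame_bits
            return "allow"
        self.violations += 1
        return "drop"  # or "mark", depending on the action policy
```

The rate/burst pair must match the burst assumptions declared in the budget ledger; otherwise policing is either too tight (drops) or too loose (tail returns).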
H2-8 · End-to-End Jitter Budgeting (From “Feeling” to Acceptance)
This section is the core deliverable: a copyable budgeting method that turns end-to-end latency and jitter into a ledger with explicit bounds, assumptions, and verification hooks. Every row must be traceable to a source (measured, datasheet, or calibrated), and the acceptance criteria must include tail metrics, not only averages.
Budget terms:
- fixed: per-hop deterministic pipeline/serialization baseline (feature/mode dependent).
- queue bound: worst-case queueing term under declared burst/service assumptions.
- gate-wait bound: worst-case waiting until next open window (plus guard band).
- ε_timebase: timebase input error (treated as a parameter; details belong to timing pages).
- ε_impl: implementation residual (tap-point limitations, host scheduling variance, unmodeled adders).
Ledger columns:
- Hop / Segment: EP, SW1, SW2…
- Item: fixed / queue bound / gate-wait bound / ε
- Min / Max: or typ/max (explicit bounds, not only typical)
- Notes: mode, features, frame length, mapping, burst window, load pattern ID
- Source: measured / datasheet / calibrated
- Tap-point: measurement tap declaration aligned with the page’s taxonomy
- Verification: plan + counters used to validate this row in bring-up
Acceptance criteria:
- Tail metrics: p99/p999 (and a max bound where applicable) under declared load patterns.
- Bound compliance: verify that fixed + bounded terms explain measured results within ε_impl.
- Counter sanity: violation/drop/mark counters remain within declared limits (no hidden trade-offs).
- Version lock: budget must bind to configuration versions (GCL/mapping/policing).
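The ledger arithmetic itself is simple; what matters is that each row carries its source and assumptions. A minimal sketch of the row structure and the acceptance check (upper bound = Σ per-hop bounded terms + ε terms), using hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class HopRow:
    name: str            # EP, SW1, SW2, ...
    fixed_us: float      # pipeline + serialization baseline
    queue_us: float      # bound under declared burst assumptions
    gate_wait_us: float  # bound from GCL + guard band
    source: str          # "measured" / "datasheet" / "calibrated"

def e2e_bound_us(rows, eps_timebase_us, eps_impl_us):
    """Upper bound = sum of per-hop bounded terms + input error terms."""
    return (sum(r.fixed_us + r.queue_us + r.gate_wait_us for r in rows)
            + eps_timebase_us + eps_impl_us)

def accept(rows, eps_tb, eps_impl, measured_p999_us, target_p999_us):
    """Acceptance: measured tail meets the target AND is explained
    by the budget (fixed + bounded terms + epsilon)."""
    bound = e2e_bound_us(rows, eps_tb, eps_impl)
    return measured_p999_us <= target_p999_us and measured_p999_us <= bound
```

A measured p999 that exceeds the computed bound means a ledger term is wrong or an assumption was violated — attribute the deviation to a term before tuning anything.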
H2-9 · Parameterization Workflow (Requirements → Config → Feedback)
Determinism becomes repeatable only when parameters are managed as a closed loop. This workflow turns requirements into a budget, produces shaping/policing parameters, verifies results with consistent metrics, and feeds deviations back into the ledger. The focus is on reusable artifacts and version control — not vendor tool details.
Step 1 — Capture requirements:
- Cycle / control loop: cycle time, deadline, update rate.
- Tail constraint: E2E p99/p999 limit (not only average).
- Topology: hop count, critical paths, fan-out / aggregation points.
- Load pattern ID: idle / full / burst / mixed / fault-injection profile.
Step 2 — Allocate the budget:
- Per-hop allocation: assign fixed, queue bound, and gate-wait bound per hop.
- Hard boundaries: identify terms that cannot be “optimized away” (e.g., gate-wait bound, ε_timebase).
- Assumption lock: freeze mapping, burst window, and frame-size assumptions in the ledger.
Step 3 — Generate parameters:
- GCL (Qbv): cycle time, slot lengths, guard band, and a stable class→window plan.
- Queue mapping: class→queue mapping aligned across endpoints and switches.
- Policing (Qci-style): per-stream rate/burst limits and violation actions.
- Config identity: attach a config version tag to each parameter set.
Step 4 — Verify and feed back:
- Verify with consistent metrics: per-hop delay, E2E p99/p999, queue watermark, gate misses, drops/late.
- Deviation attribution: map every failure to a ledger term (fixed / queue / gate-wait / ε).
- Closed-loop rule: if acceptance fails, return to Step 2 (reallocate bounds) before tuning ad-hoc.
Version control:
- Rule: parameter sheet version = system configuration source-of-truth.
- Bind: ledger version ↔ GCL version ↔ mapping version ↔ policing version ↔ test matrix version.
- Rollback path: every release must have a tested rollback tag with known acceptance results.
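One way to make the version binding mechanical is to hash the whole parameter bundle and gate deployment on a zero readback diff. The bundle layout below is a hypothetical example:

```python
import hashlib
import json

def bundle_hash(gcl, mapping, policing):
    """Hash the config bundle so ledger <-> GCL <-> mapping <-> policing
    versions can be bound and diffed. Canonical JSON (sorted keys) keeps
    the hash stable across serializations."""
    blob = json.dumps({"gcl": gcl, "map": mapping, "qci": policing},
                      sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

def readback_ok(intended, readback):
    """Deployment gate: block release unless readback diff == 0."""
    return bundle_hash(*intended) == bundle_hash(*readback)
```

Attaching the resulting tag to every test report also gives the rollback path a concrete identity: a rollback target is a bundle hash with known acceptance results.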
H2-10 · Validation & Measurement (Make Determinism Measurable)
Determinism cannot be accepted without consistent measurement contracts. This section defines a minimum metric set, a scenario matrix that exposes tail failures, a tap-point consistency gate, and a pass-criteria template with threshold placeholders. It avoids tool-specific details and focuses on repeatable acceptance.
Minimum metric set:
- Per-hop delay: hop baseline + deviations (use declared tap-points).
- E2E tail: p99 and p999 latency under declared scenario IDs.
- Drops / late: drop counters and late-arrival events for bounded windows.
- Queue watermark: per-queue peak occupancy (not port-wide only).
- Gate miss: gate misses / window violations (per class/queue where applicable).
- Policing health: violation counters (per stream) and action rates.
Scenario matrix:
- Idle: baseline fixed + measurement noise floor.
- Full load: sustained saturation risks and scheduling drift.
- Burst: microburst tail exposure and queue bound validation.
- Mixed: class interaction and mapping correctness under concurrency.
- Fault injection: abnormal traffic / violations to test policing and isolation.
Tap-point consistency gate:
- Declare tap-points: every per-hop delay must name the timestamp tap position.
- Reject mixed baselines: do not attribute differences to queue/gate if tap-points differ.
- Timebase note: if timebase dominates, treat it as ε_timebase input (route to timing pages).
Pass-criteria template fields:
- Metric: E2E p99 / E2E p999 / per-hop delay / queue watermark / gate miss
- Scenario ID: idle / full / burst / mixed / fault
- Window: X ms / X s (must be declared)
- Threshold: ≤ X (units) + counters within X
- Evidence: counters + percentile report + config version tag
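A pass-criteria template can be evaluated mechanically once real thresholds replace the `X` placeholders. The metric keys below are hypothetical stand-ins for the template fields above:

```python
def evaluate_run(results, criteria):
    """Compare one scenario run against upper-bound pass criteria.
    results/criteria keys mirror the template fields (illustrative names);
    every threshold is a '<= limit' check."""
    failures = []
    for metric, limit in criteria.items():
        value = results[metric]
        if value > limit:
            failures.append(f"{metric}: {value} > {limit}")
    return (len(failures) == 0, failures)

# example thresholds (placeholders filled with made-up numbers)
criteria = {"e2e_p99_us": 150, "e2e_p999_us": 250,
            "gate_miss": 0, "late_per_million": 1}
ok, why = evaluate_run({"e2e_p99_us": 120, "e2e_p999_us": 240,
                        "gate_miss": 0, "late_per_million": 0}, criteria)
```

Store the evidence (counters, percentile report, config version tag) next to the verdict so every pass/fail is auditable.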
H2-11 · Design Hooks & Pitfalls (Where Tail Latency Explodes)
This section captures only determinism-relevant pitfalls: time windows (Qbv), queue/mapping mistakes, policing drift, and measurement contract mismatches. Each pitfall is expressed as a repeatable triage path: trigger → symptom → first counter check. Topics such as PHY/EMC/SI or protocol stack business models are intentionally excluded.
Pitfall 1 — Qbv cycle misalignment:
- Trigger: mismatched cycle lengths or phase drift.
- Symptom: periodic p999 spikes (beating / phase-locked bursts).
- First check: p99/p999 sliced by time phase + GCL cycle/slot audit.
- Fix direction: align cycles/phase, then re-balance slot allocation.
Pitfall 2 — Guard band underestimated:
- Trigger: guard band computed with optimistic frame/serialization assumptions.
- Symptom: late events or “window miss” bursts under real payloads.
- First check: gate-miss / late counters aligned to the same window length.
- Fix direction: widen guard band or adjust slot boundaries and queue service.
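Guard-band sizing follows directly from worst-case serialization of an interfering frame that starts just before the window boundary; the margin term is an explicit assumption to be recorded in the ledger:

```python
def guard_band_us(max_frame_bytes, link_mbps, margin_us=0.5):
    """Guard band must cover worst-case serialization of an interfering
    frame that begins transmission just before the window boundary.
    Includes preamble/SFD (8 B) + inter-frame gap (12 B); margin_us is
    an assumption, not a derived value."""
    worst_serialize_us = (max_frame_bytes + 8 + 12) * 8 / link_mbps
    return worst_serialize_us + margin_us
```

For a 1522-byte VLAN-tagged max frame at 1 Gbps this gives roughly 12.3 µs plus margin — an "optimistic" 64-byte assumption would undersize the band by an order of magnitude, producing exactly the window-edge late bursts described above.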
Pitfall 3 — Priority class in a shared queue:
- Trigger: high-priority class mapped into shared BE queue.
- Symptom: high-priority tail rises (HOL blocking), even at “low average load”.
- First check: per-queue watermark + mapping table version audit.
- Fix direction: isolate queues and re-validate allocation with burst scenario ID.
Pitfall 4 — Policing mismatched to the burst window:
- Trigger: token bucket set without matching the declared burst window.
- Symptom: drops (too tight) or tail returns (too loose) under burst stress.
- First check: violations/drops counters aligned with p99/p999 in the same scenario window.
- Fix direction: re-derive rate/burst from the ledger assumptions, then re-run matrix.
Pitfall 5 — Metric scope mixing:
- Trigger: mixing per-port, per-queue, and per-flow metrics without stating scope.
- Symptom: “utilization looks fine” while p999 and watermarks explode.
- First check: declare window length + denominator + object (port/queue/flow) before comparing.
- Fix direction: standardize a metric contract, then re-baseline acceptance.
H2-12 · Engineering Checklist (Design → Bring-up → Production)
This checklist is determinism-specific. It binds the budget ledger, shaping/policing parameters, measurement contracts, and the regression matrix into three gates: Design, Bring-up, and Production. Each gate requires versioned artifacts, consistency checks, and acceptance-ready evidence.
Design gate:
- Ledger done: per-hop fixed + queue bound + gate-wait bound + ε terms recorded.
- Worst-case defined: scenario IDs + burst assumptions + window length locked.
- Params frozen: GCL + mapping + policing tables frozen with version tag.
- Hard bounds tagged: identify non-negotiable terms and margin ownership.
- Pass template ready: thresholds as “≤ X” with evidence fields defined.
Bring-up gate:
- Per-hop calibration: measured vs inferred vs datasheet terms labeled.
- Tap contract: tap-points declared and consistent across all measurements.
- GCL readback: downloaded schedule matches readback and version tag.
- Counters aligned: window/denominator/object scopes standardized.
- Tail validated: p99/p999 verified under burst & mixed scenario IDs.
- Version binding: ledger ver ↔ config ver ↔ test ver are linked.
Production gate:
- Observability: watermark, gate miss, violations, drops, late, event fields logged.
- Regression set: minimum matrix (idle/full/burst/mixed/fault) is automated.
- Change triggers: topology/mapping/window changes force re-budget + re-accept.
- Rollback tag: a tested rollback version with known acceptance evidence exists.
H2-13 · Applications & IC Selection (Determinism-First)
This section converts latency/jitter modeling into a practical selection method: define determinism targets, map them to TSN mechanisms (Qbv/Qci/observability), then shortlist devices by measurable evidence.
- Applications: determinism targets expressed as p99/p999 + cycle + hop (no protocol deep-dive).
- Selection axes: fixed per-hop latency, tail control, policing protection, observability, and resource boundaries.
- Shortlist examples: TSN switches / TSN MPUs / industrial switch silicon with concrete orderable PNs.
A) Applications (expressed as determinism targets)
Keep the application description strictly measurable: cycle, percentile bounds, “late” rate, and protection against abnormal traffic.
Profile 1 — cyclic control traffic:
- Targets: cycle = X µs, E2E p99 ≤ Y µs, p999 ≤ Z µs, late ≤ A / 10^6 frames.
- Dominant risks: mixed-load bursts, gate schedule drift, priority mapping mistakes.
- Selection must-have: TAS/Qbv, per-queue watermark + gate-miss/late counters.
Profile 2 — shared networks needing per-stream protection:
- Targets: bounded jitter under contention; abnormal traffic must not inflate tail.
- Dominant risks: “bad” flows (misconfigured burst/rate), queue hogging, retry storms.
- Selection must-have: PSFP/Qci (token bucket + violation counters) and stable policing behavior.
Profile 3 — hard p999-bounded scheduling:
- Targets: p999 bound dominates; small periodic tail spikes are unacceptable.
- Dominant risks: schedule beat frequency, guard-band underestimation, egress serialization.
- Selection must-have: deterministic schedule update, strong time-window tooling, per-hop calibration hooks.
Profile 4 — mixed TSN / non-TSN domains:
- Targets: deterministic forwarding inside TSN domain + measurable degradation outside.
- Dominant risks: mixed policies, counter definition drift, silent tail inflation.
- Selection must-have: rich counters (per-queue watermark, late/drop, policing violations) + black-box logging hooks.
B) IC Selection Axes (determinism-first, measurable)
Axis 1 — Fixed per-hop latency (predictable baseline)
- Prefer architectures that keep forwarding path stable under feature enablement.
- Require a clear method to obtain fixed latency: datasheet value, bench calibration, or per-hop measurement.
Axis 2 — Tail control (queue bound + gate-wait bound)
- Qbv/TAS converts uncertainty into a bounded, schedulable wait term.
- Queue tail must be bounded by design: burst assumptions, service rate, queue isolation, and shaping policy.
Axis 3 — Domain protection (Qci / per-stream policing)
- PSFP/Qci prevents abnormal flows from inflating tail for everyone else.
- Must-have evidence: token bucket parameters + violation counters + deterministic drop/mark policy.
Axis 4 — Observability (field-proof determinism)
- Require counters that directly map to the budget terms: queue watermark, gate-miss/late, drops, policing violations.
- Prefer designs that can run PRBS/loopback/traffic tests without changing the determinism path.
Axis 5 — Resource boundaries (scale without “schedule jitter”)
- Check table sizes, number of queues per port, GCL length, and schedule update behavior.
- Operational rule: configuration table version is a controlled artifact (diffable, rollbackable).
C) Example Part Numbers (shortlist by determinism needs)
The part numbers below are orderable examples. Feature subsets vary by variant and software stack; verify Qbv/Qci/observability in the official documentation.
Bucket 1 — Compact TSN switch (fast path + integrated CPU)
- Microchip: LAN9662-I/9MX (TSN switch + CPU), LAN9668/9MX (8-port TSN switch + CPU).
- Why (determinism): TSN scheduling/policing options + rich counters are typically available for field validation.
- Best fit: remote I/O, compact cells, gateway-class TSN islands.
Bucket 2 — Higher port-count / higher bandwidth industrial switching silicon
- Microchip (SparX-5i family): VSC7546TSN-V/5CC (industrial switch class).
- Why (determinism): scale (ports/bandwidth) + TSN mechanisms enable tight p999 targets across more hops.
- Best fit: TSN backbone switches, multi-line aggregation, high-throughput deterministic cells.
Bucket 3 — TSN-capable MPU (integrated L2 switch for endpoints/gateways)
- Renesas: R9A07G084M04GBG#AC0 (RZ/N2L group), R9A07G084M04GBG#BC0 (variant).
- Why (determinism): integrated switching + TSN support is useful when endpoint control and deterministic forwarding are packaged together.
- Best fit: drives, gateways, remote I/O, industrial endpoints requiring tight timing behavior.
Bucket 4 — Automotive-grade TSN switches (when safety/security constraints dominate)
- NXP: SJA1105QELY (AVB/TSN switch family), SJA1110CEL/0Y (SJA1110 family variant).
- Why (determinism): TSN scheduling features + strong platform support can help stabilize per-hop behavior at scale.
- Best fit: automotive/industrial cross-over gateways, harsh environment edge boxes.
Selection rule (determinism-first): choose by measurable evidence — per-hop delay method, queue/gate counters, policing violation counters, and a repeatable test matrix. If any of these are missing, determinism becomes un-auditable in the field.
Diagram — Determinism-First Selection Tree
Inputs: p99/p999 target, cycle time, hop count, mixed-load risk. Outputs: required TSN mechanisms and shortlist buckets.
H2-14 · FAQs (Field Troubleshooting + Acceptance Criteria)
Scope: only long-tail troubleshooting and acceptance wording for latency/determinism (p99/p999/tail), Qbv/Qci, and budget/measurement consistency. Format is fixed per question: Likely cause / Quick check / Fix / Pass criteria.
Average latency looks OK, but p99 blows up — burst or gate miss first?
Likely cause: micro-bursts inflate queue tail, or Qbv gate miss creates periodic late frames.
Quick check: correlate p99/p999 spikes with queue_watermark vs gate_miss/late in the same window (same denominator).
Fix: cap burst (Qci token bucket) and/or adjust GCL slots + guard band until gate misses disappear.
Pass criteria: E2E p99 ≤ X, p999 ≤ X over X minutes; gate_miss = 0; late ≤ X / 10^6 frames.
Qbv enabled but jitter gets worse — GCL beat frequency or guard band too small?
Likely cause: GCL period misaligned with traffic cycle (beat), or guard band underestimates worst-case serialization.
Quick check: look for periodic p999 spikes; check gate_miss clustered at window edges; validate max-frame serialization assumption.
Fix: align GCL period to cycle; increase guard band by X (worst-case serialize + margin).
Pass criteria: no periodic p999 spikes over X minutes; gate_miss = 0; window-edge late = 0.
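The beat check above can be sketched numerically: two periodic processes with periods T1 and T2 realign every T1·T2 / |T1 − T2|, so if the observed p999 spike period matches that value, period alignment (not a wider guard band) is the fix. A minimal sketch with illustrative units:

```python
def beat_period_us(gcl_cycle_us, traffic_cycle_us):
    """Period at which the phase between the GCL and the traffic cycle
    repeats; a slow beat shows up as periodic p999 spikes."""
    if gcl_cycle_us == traffic_cycle_us:
        # phase-locked: no beat; align phases instead of periods
        return float("inf")
    return (gcl_cycle_us * traffic_cycle_us
            / abs(gcl_cycle_us - traffic_cycle_us))
```

A 1 µs mismatch between a 1000 µs GCL cycle and a 1001 µs traffic cycle produces a spike roughly every second — easy to miss in short windows, obvious once the test window exceeds the beat period.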
Low load still “stalls” — window closed too long or priority mapping wrong?
Likely cause: gate-closed duration blocks a critical class, or QoS mapping reset misroutes traffic to BE queue.
Quick check: compare stall timing vs gate state; verify class→queue counters (per-class increments must match expectation).
Fix: shorten closed windows; restore mapping matrix; pin configuration by version + readback diff.
Pass criteria: worst-case wait ≤ X; E2E p99 ≤ X; mapping counters stable within X%.
Same config, different switch silicon worsens tail — fixed latency delta or queue behavior?
Likely cause: different per-hop fixed pipeline latency and/or buffer sharing inflates queue tail under burst.
Quick check: measure per-hop idle baseline vs mixed-load tail; compare queue_watermark vs dequeue stability at same input burst.
Fix: re-calibrate fixed term; re-budget queue bound; adjust queue isolation/shaping assumptions for this silicon.
Pass criteria: updated budget closes with ≥ X margin; per-hop fixed repeatability within ±X; p999 ≤ X.
Field shows occasional “late”, but counters look clean — window definition or tap mismatch?
Likely cause: counter window/denominator hides events, or measurement tap points differ across devices/tools.
Quick check: force same observation window + denominator; run tap-consistency check (same event, same tap definition).
Fix: standardize KPI definition (window/denominator/tap); log raw late events until stable.
Pass criteria: tool-to-tool late rate within ±X; late ≤ X / 10^6 frames under standardized window.
One abnormal node slows the whole network — Qci not blocking or shared-queue HOL?
Likely cause: policing missing/weak for that stream, or shared buffering causes head-of-line blocking across classes.
Quick check: isolate offender via per-stream counters; check policing_violations and cross-queue watermark coupling.
Fix: tighten Qci token bucket; increase isolation (dedicated queue/class) for critical traffic.
Pass criteria: violations ≤ X per X minutes; critical class p99/p999 stays within X; watermark ≤ X%.
Worse only after maintenance — GCL version drift or readback mismatch?
Likely cause: schedule/config changed silently, or readback differs from intended GCL/mapping bundle.
Quick check: compare config version hash vs golden; read back GCL + mapping and diff.
Fix: enforce config-as-code; block deployment if readback diff ≠ 0; add rollback path.
Pass criteria: version hash matches; readback diff = 0; KPIs stable within X over X reboots.
Preemption still appears inside a window — guard band missed serialization time?
Likely cause: guard band omitted worst-case serialization/egress drain time, or edge behavior adds extra fixed delay.
Quick check: compute max-frame serialization; check edge-aligned gate_miss/late and egress occupancy at window open.
Fix: extend guard band by X; if needed, cap max frame size for interfering class.
Pass criteria: window-edge late events = 0; gate_miss = 0; guard-band margin ≥ X.
Port watermark is not high, but tail is bad — congestion is downstream?
Likely cause: true bottleneck is downstream (next hop/egress), so local watermark does not reflect final queueing.
Quick check: compare per-hop percentiles; check downstream utilization bursts + counters; identify where queueing accumulates.
Fix: apply shaping at the true bottleneck hop; re-budget per-hop queue bound; adjust schedule or split traffic.
Pass criteria: bottleneck hop confirmed; downstream watermark ≤ X%; E2E p999 ≤ X.
p99 meets spec, but max occasionally violates — beat frequency or burst injection?
Likely cause: periodic beat creates rare peaks, or occasional out-of-model bursts appear (maintenance scans/retries).
Quick check: test max-violation periodicity; correlate to event logs + burst counters; validate burst assumptions used in bounds.
Fix: align periods to remove beat; tighten admission (Qci) for out-of-model bursts; add regression scenario.
Pass criteria: max ≤ X over X minutes; no periodic max spikes; blocked bursts ≤ X.
After VLAN/QoS changes, determinism collapses — mapping matrix reset?
Likely cause: priority→queue mapping or policer bindings reset, breaking class isolation and bounds.
Quick check: verify mapping + policer binding tables post-change; compare per-class counters before/after.
Fix: apply mapping/policers as an atomic versioned bundle; enforce readback validation after any VLAN/QoS change.
Pass criteria: mapping diff = 0; per-class p99/p999 stable within X; no queue cross-talk events > X.
Enabling mirroring/telemetry makes tail worse — fixed path increase or queue contention?
Likely cause: extra pipeline stages add fixed latency, or telemetry shares resources and increases contention.
Quick check: A/B test telemetry on/off; compare per-hop idle baseline and queue_watermark; see whether p99 shifts or tail expands.
Fix: move telemetry off the critical class, reduce sampling, or isolate it in a dedicated queue; re-budget fixed term if unavoidable.
Pass criteria: telemetry-on still meets p99 ≤ X & p999 ≤ X; baseline delta ≤ X; critical watermark change ≤ X%.