POWERLINK / SERCOS III → TSN: Bridging & Dual-Stack Migration
This page is a migration playbook for moving POWERLINK / SERCOS III to a TSN backbone without losing determinism.
It explains how to bridge or dual-stack safely, align time domains, map cyclic/acyclic traffic into guarded TSN streams, and validate cutover with measurable pass/fail criteria.
H2-1 · Scope & Stop-Lines (What this page covers / excludes)
This page focuses on migration methodology and rollout paths—how to move POWERLINK or SERCOS III systems toward TSN using bridging and dual-stack strategies—without losing determinism, sync, diagnostics, or uptime.
- Bridging gateway and dual-stack architectures
- Coexistence rules (cyclic vs acyclic, burst isolation)
- Cutover playbooks: shadow mode → partial cutover → rollback
- Validation plan: determinism, sync, recovery, diagnostics (thresholds as X placeholders)
- Operational hooks: counters, black-box logging, field triage, safe upgrades
- TSN parameter deep-dive (Qbv/Qci/Qav/GCL/time-slot tables): only capability flags and migration risk points live here. Go to TSN Switch / Bridge
- PTP servo / timing algorithm deep-dive (servo loops, filter tuning): only time-domain alignment accounting lives here. Go to Timing & Sync
- PHY / EMC / TVS / CMC layout deep-dive: only interface constraints that impact migration are referenced here. Go to Protection / PHY Co-Design
- Determinism: cycle jitter (pk-pk) ≤ X, deadline misses = 0 within Y minutes
- Sync: time offset ≤ X, drift ≤ X over Y minutes
- Recovery: failover / reconvergence ≤ X
- Diagnostics: required counters & root-cause coverage ≥ X%
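The four stop-line criteria above can be captured as a machine-checkable gate table. A minimal Python sketch, with illustrative values standing in for the X/Y placeholders (metric names are hypothetical):

```python
# Hypothetical acceptance gates; fill limits per project.
# "max": measured value must be <= limit; "min": must be >= limit.
GATES = {
    "cycle_jitter_pkpk_ns":    (2_000, "max"),  # determinism
    "deadline_misses":         (0,     "max"),
    "sync_offset_ns":          (500,   "max"),  # sync
    "drift_ns_per_min":        (50,    "max"),
    "failover_recovery_ms":    (50,    "max"),  # recovery
    "root_cause_coverage_pct": (95.0,  "min"),  # diagnostics
}

def check_gates(measured: dict) -> dict:
    """Return per-metric pass/fail; cutover proceeds only if all pass."""
    result = {}
    for metric, (limit, kind) in GATES.items():
        value = measured[metric]
        result[metric] = value <= limit if kind == "max" else value >= limit
    return result
```

Keeping the thresholds in one table makes the pass/fail record auditable across the shadow, partial-cutover, and TSN-native phases.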
H2-2 · Why Migrate: What TSN adds (and what must NOT change)
Migration should be justified by engineering outcomes, not buzzwords. TSN enables convergence and predictable behavior, but a migration is only successful if legacy guarantees remain intact: cycle jitter, sync accuracy, recovery time, and diagnostic coverage.
- Traffic isolation by class (cyclic traffic protected from bursty flows)
- Predictable forwarding behavior across a converged Ethernet backbone
- Scalable segmentation (VLAN/QoS) without rebuilding application logic
- Cycle deadline: misses = 0 within Y minutes
- Cycle jitter (pk-pk) ≤ X (same or better than legacy)
- P99 end-to-end latency ≤ X under mixed load
- A unified time domain that scales across cells and lines
- Cleaner coordination between control, motion, and diagnostics timelines
- Time-aware behavior remains consistent when topology grows
- Offset ≤ X, drift ≤ X over Y minutes
- Time alignment is consistent across load and temperature (no step shifts)
- Timestamp accounting is stable (tap point and queue delay model match reality)
- Unified monitoring and field forensics across a single backbone
- Consistent discovery and segmentation for gradual expansion
- Clear separation between deterministic data and service/maintenance traffic
- Diagnostic availability: required root-cause signals ≥ X%
- Recovery time after link/event ≤ X (no surprise reconvergence)
- Cutover safety: rollback remains possible within the defined window
H2-3 · Legacy Determinism Model: POWERLINK vs SERCOS III essentials you must preserve
Migration succeeds only when the legacy determinism contract remains intact. This section extracts the minimum set of invariants that must hold during coexistence, cutover, and TSN-native phases—without expanding into protocol encyclopedias.
- Cyclic vs acyclic channels: hard real-time updates must be isolated from service bursts
- Slot/phase model: the cycle is partitioned into bounded windows
- Master scheduling skeleton: a single authority defines cycle cadence and update timing
- Cyclic communication backbone: fixed cadence and bounded per-cycle behavior
- Sync & phase alignment: consistent actuation/measurement timing across nodes
- Device timing constraints: update points must not drift under load or topology growth
Why: motion and closed-loop stability depend on predictable update timing.
Fail symptom: deadline misses, step-like jitter increases, or stability changes only under mixed load.
Pass criteria: deadline misses = 0 within Y minutes; jitter (pk-pk) ≤ X.
Why: coexistence phases fail when diagnostics bursts interfere with cyclic windows.
Fail symptom: cyclic latency spikes correlate with parameter download or service storms.
Pass criteria: cyclic margin ≥ X; jitter remains within X under burst tests.
Why: phase-aligned actuation requires consistent offset and drift bounds.
Fail symptom: drift increases with load/temperature; step offsets appear after topology changes.
Pass criteria: offset ≤ X; drift ≤ X over Y minutes; step shifts = 0.
Why: field failures become unfixable when evidence disappears after migration.
Fail symptom: “more fragile” behavior with no counter/log correlation.
Pass criteria: required signal coverage ≥ X%; first-level cause found within Z minutes.
Why: migration adds boundaries; uncontrolled reconvergence breaks production lines.
Fail symptom: link flaps, long partial outages, or repeated recovery storms.
Pass criteria: recovery ≤ X; flap rate ≤ X per hour under injected faults.
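The deadline-miss and pk-pk jitter criteria can be checked directly from raw cycle-start timestamps. A minimal sketch; the "one jitter budget late" rule for counting a deadline miss is an assumption, not a normative definition:

```python
def evaluate_cycle(timestamps_us, period_us, jitter_limit_us):
    """Compute pk-pk period jitter and deadline misses from cycle-start
    timestamps (microseconds).  A cycle counts as a deadline miss when
    its period exceeds the nominal period by more than the jitter budget
    (an illustrative rule; real contracts define this explicitly)."""
    periods = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    jitter_pkpk = max(periods) - min(periods)
    misses = sum(1 for p in periods if p > period_us + jitter_limit_us)
    return jitter_pkpk, misses
```

Run this over fixed observation windows (the Y minutes above) and archive the results with the counter snapshots.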
H2-4 · Migration Strategy Selector: Bridge vs Dual-Stack vs Hybrid
A strategy is correct only if it fits constraints on device modifiability, downtime, time/sync accuracy, certification pressure, and operations maturity. The selector below combines a 5-question decision tree with a score matrix to make trade-offs explicit.
Primary risk: coexistence bursts and accounting mismatch create jitter or drift at the boundary.
First checkpoint: traffic class isolation and time-domain accounting stability.
Primary risk: version/config drift and inconsistent accounting between stacks.
First checkpoint: shadow mode equivalence and rollback gates.
Primary risk: governance complexity (more phases, more boundaries).
First checkpoint: stage gates that map back to Contract A–E acceptance criteria.
- Devices modifiable? (firmware/stack update feasible at scale)
- Downtime window available? (cell-by-cell cutover possible)
- Time/sync accuracy tight? (phase alignment sensitive to drift/step offsets)
- Certification pressure high? (minimize changes to certified paths)
- Operations maturity strong? (version governance, logging, rollback rehearsed)
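The five questions above can be folded into a simple score matrix. The weights below are illustrative assumptions, not normative rules; the point is to make the trade-offs explicit and reviewable:

```python
def select_strategy(answers: dict) -> str:
    """Score the three strategies from the 5-question decision tree.
    Weights are assumptions for illustration; adjust per project."""
    score = {"bridge": 0, "dual_stack": 0, "hybrid": 0}
    if answers["devices_modifiable"]:
        score["dual_stack"] += 2        # firmware/stack update feasible
    else:
        score["bridge"] += 2            # cannot touch endpoints
    if answers["downtime_window"]:
        score["dual_stack"] += 1        # cell-by-cell cutover possible
    else:
        score["bridge"] += 1
    if answers["tight_sync"]:
        score["hybrid"] += 1            # staged boundaries limit time-domain risk
    if answers["cert_pressure"]:
        score["bridge"] += 2            # keep certified paths untouched
    if answers["ops_mature"]:
        score["dual_stack"] += 1        # governance/rollback rehearsed
    else:
        score["bridge"] += 1
    return max(score, key=score.get)
```

Record the filled-in answers and the resulting score in the strategy document so the selection survives review and rollback discussions.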
H2-5 · Architecture A: Bridging Gateway (Legacy island ↔ TSN backbone)
A bridging gateway must preserve the determinism contracts while translating traffic between a legacy island and a TSN backbone. This section describes internal building blocks—interfaces, classification, per-class queues, shaping hooks, timestamp tap points, diagnostics mapping, and failure modes—without expanding into TSN scheduling parameter deep-dives.
TSN port: stream recognition hooks (VLAN / QoS / stream label) for deterministic handling.
Why: prevents service bursts from stealing cyclic margin (Contract A/B).
Measure: per-class counters, drops by reason, per-class latency histograms.
Pass: cyclic jitter pk-pk ≤ X under worst-case acyclic bursts.
Why: queue coupling is a common source of new jitter and tail latency.
Measure: watermark peaks, queue delay, head-of-line blocking counters.
Pass: cyclic queue delay ≤ X; no starvation events over Y minutes.
Why: deterministic latency requires bounded interference at egress (Contract A/B).
Measure: egress service curve conformance, per-class egress latency.
Pass: cyclic egress jitter remains within X after enabling service traffic.
Why: inconsistent tap placement creates drift/step offsets after topology changes (Contract C).
Fail symptom: stable link but offset shifts after configuration changes or load variation.
Pass criteria: step shifts = 0; drift ≤ X over Y minutes.
Why: hidden queue delay becomes the new determinism bottleneck (Contract A/B).
Fail symptom: cyclic jitter increases only when service traffic is enabled.
Pass criteria: per-class delay budget stays within X under stress tests.
Why: migration fails in the field when evidence disappears (Contract D).
Measure: event coverage, drop reason taxonomy, time-correlated logs.
Pass: first-level cause found within Z minutes using counters + logs.
Why: sporadic issues require forensic context beyond counters.
Measure: snapshot completeness and correlation to failure timestamps.
Pass: snapshots reconstruct the interference chain for ≥ X% of incidents.
Quick check: cyclic FIFO watermark peaks and queue delay histograms.
Fix direction: strengthen per-class isolation and bound burst admission.
Pass: cyclic jitter pk-pk ≤ X under burst injection tests.
Quick check: tap placement sanity, step offset after changes, drift vs load.
Fix direction: align accounting boundaries and keep tap semantics consistent.
Pass: step shifts = 0; drift ≤ X.
Quick check: drop reason breakdown, policing hits, class utilization.
Fix direction: enforce admission control and reserve cyclic resources.
Pass: cyclic deadline misses = 0 within Y minutes.
- No per-class FIFO: cyclic jitter rises whenever service traffic increases.
- Hidden queue delay: pass on bench, fail under mixed load.
- Tap ambiguity: offset “steps” after topology/config changes.
- No drop taxonomy: failures are visible but not explainable in logs.
- Unbounded bursts: field diagnostics collapses deterministic margin.
- No snapshot: sporadic incidents cannot be reconstructed.
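The quick-check/fix-direction pairs above amount to a first-level triage routine over black-box counters. A sketch with hypothetical counter names (real drop-reason taxonomies vary by gateway):

```python
from collections import Counter

def first_level_triage(counters: Counter) -> str:
    """Map black-box counter evidence to a fix direction.  Counter names
    and the 1% policing heuristic are illustrative assumptions."""
    if counters["cyclic_deadline_miss"] > 0 and counters["service_burst_bytes"] > 0:
        return "isolation: strengthen per-class FIFO, bound burst admission"
    if counters["time_step_events"] > 0:
        return "timestamping: check tap placement and accounting boundary"
    if counters["policing_hits"] > counters["acyclic_frames"] * 0.01:
        return "admission: enforce admission control, reserve cyclic resources"
    return "no first-level cause; pull a snapshot for forensic correlation"
```

The value of encoding triage like this is repeatability: the same evidence produces the same first-level verdict in the lab and in the field.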
H2-6 · Architecture B: Dual-Stack Device/Controller (coexist, then cut over)
Dual-stack keeps a legacy interface available while introducing TSN semantics. A safe rollout requires clear internal layering, a staged cutover plan, and drift-resistant version/config governance. This section focuses on organization and operational gates, not on protocol deep-dives.
Best when: long-term maintenance demands consistency; object behavior must match across transports.
Primary risk: subtle semantic divergence (same name, different timing/ordering).
Pass gate: shadow-mode equivalence holds for ≥ X hours (Contract A/C/D).
Best when: TSN value is needed quickly but legacy compatibility cannot be dropped.
Primary risk: dual-write/dual-read races and inconsistent state exposure.
Pass gate: role switching does not change object behavior beyond allowed jitter (Contract A/E).
Best when: a clean long-term target is required and shim scope is well-defined.
Primary risk: shim boundary leaks create rare, hard-to-debug field issues.
Pass gate: shim behavior is fully covered by acceptance tests and logs (Contract D/E).
Actions: mirror key objects/traffic; compute equivalence metrics.
Observability: jitter, deadline miss, offset/drift, drop reason, queue watermark.
Rollback: disable feature flag if equivalence breaks beyond X.
Actions: cut over by cell/role/traffic class; keep rollback channel armed.
Observability: contract gates A–E with stress bursts and fault injections.
Rollback: revert roles if recovery time exceeds X or jitter exceeds X.
Actions: promote TSN roles; keep legacy for compatibility/rollback per policy.
Observability: drift over temperature/load; configuration change impact; incident reconstruction.
Rollback: defined rollback window + rehearsed procedure (must be operationally feasible).
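The shadow-mode equivalence gate can be reduced to a diff check over mirrored samples. A minimal sketch, assuming scalar object samples and an absolute tolerance standing in for X:

```python
def shadow_equivalent(legacy_samples, mirror_samples, abs_tol):
    """Compare mirrored object samples from the authoritative legacy path
    against the TSN shadow path.  Returns (ok, worst_delta); the rollout
    gate disables the TSN feature flag when ok is False."""
    deltas = [abs(a - b) for a, b in zip(legacy_samples, mirror_samples)]
    worst = max(deltas) if deltas else 0.0
    return worst <= abs_tol, worst
```

In practice the comparison also covers ordering and timing (same name, different timing/ordering is the primary risk named above); this sketch shows only the value-equivalence part.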
H2-7 · Time & Sync Bridging: mapping legacy sync to gPTP/TSN time domain
Migration success depends on time-domain alignment: a clear authority model, explicit timestamp tap semantics, bounded queue delay accounting, and an evidence-driven error budget. This section focuses on engineering mapping and acceptance thresholds, without expanding into servo algorithm details.
- Covered: time domains, authority selection, correction points, error budgeting, and measurable gates.
- Excluded: servo filter/tuning details and protocol deep-dives (handled in Timing & Sync pages).
- Excluded: TSN scheduling parameterization deep-dive (handled in TSN switch/bridge pages).
Why: simplifies multi-domain scaling and keeps a single time reference for operations.
Fail symptom: step offsets after topology changes; cyclic jitter increases under mixed load.
Pass criteria: step shifts = 0; drift ≤ X; cycle jitter ≤ X.
Why: preserves certified behavior and minimizes changes in established production lines.
Fail symptom: drift vs temperature/load increases; measurements disagree across endpoints.
Pass criteria: offset ≤ X; drift ≤ X; no offset step after configuration changes.
Why: dual masters create hard-to-debug step events and time-domain loops.
Fail symptom: intermittent “perfect on bench, unstable in field” timing behavior.
Pass criteria: a single authority is observable; time-loop indicators remain 0.
Measure: repeatability under load steps; compare ingress vs egress deltas.
Mitigate: lock tap semantics to a stable boundary; guard against “silent tap relocation.”
Accept: tap repeatability ≤ X; no step after configuration changes.
Measure: queue watermark + per-class delay histograms (p95/p99).
Mitigate: enforce per-class isolation; bound admission for non-cyclic bursts.
Accept: queue delay variation ≤ X; cyclic deadline misses = 0.
Measure: step test across topology changes; validate consistent tap semantics.
Mitigate: preserve tap point across firmware/config changes; document as part of the contract.
Accept: step shifts = 0; drift ≤ X.
Measure: baseline calibration + temp sweep + load sweep with fixed observation windows.
Mitigate: maintain a calibration table; trigger recalibration on qualified change events.
Accept: asymmetry residual ≤ X; drift ≤ X across range.
| Error source | How to measure | How to mitigate | Acceptance (X) |
|---|---|---|---|
| Tap semantics ambiguity | Step test after config/topology change; compare ingress/egress deltas | Pin tap boundary; document contract; guard changes | step = 0; drift ≤ X |
| Queue delay variation | Watermark peaks + delay histograms (p95/p99) under burst injection | Per-class isolation; bound bursts; admission guards | Δdelay ≤ X; deadline miss = 0 |
| Asymmetry residual | Baseline calibration + periodic verification under stable topology | Calibration table + change-triggered recalibration | residual ≤ X |
| Temperature/load drift | Temp sweep + load sweep; fixed window stats | Guard max load; ensure bounded queue variation; re-calibrate on qualified changes | drift ≤ X across range |
- Lock and document ingress/egress tap semantics; treat changes as a controlled event.
- Run burst injection to validate queue-delay variation; record p95/p99 and watermark peaks.
- Perform topology/config step tests; confirm step shifts are 0.
- Execute temperature and load sweeps; verify drift remains ≤ X.
- Archive logs + snapshots for incident reconstruction with time correlation.
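The error-budget table closes only if the four sources sum within the offset target. A sketch using conservative linear-sum accounting (an RSS combination is a common alternative when the sources are known to be independent):

```python
def sync_error_budget(components_ns: dict, offset_limit_ns: float):
    """Worst-case (linear-sum) accounting of the table's error sources:
    tap repeatability + queue delay variation + asymmetry residual +
    temperature/load drift.  Returns (total, margin, ok)."""
    total = sum(components_ns.values())
    margin = offset_limit_ns - total
    return total, margin, margin >= 0
```

Keeping the per-source numbers (not just the total) in the acceptance record makes it obvious which source to attack when the budget stops closing after a change.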
H2-8 · Traffic Engineering for Coexistence: mapping cyclic/acyclic to TSN streams
Coexistence fails when acyclic bursts consume cyclic margin. The mapping must be explicit: classify traffic, map to priorities/queues/stream identifiers, enforce guard rails, and verify with counters and pass criteria. This section describes capability-level mapping rules without diving into TSN formula derivations.
Identify: known cyclic endpoints/objects; periodic cadence.
Risk: starvation by bursts; queue coupling.
Identify: diagnostics/parameters; bursty patterns.
Risk: burst spikes and retry storms.
Identify: discovery, topology, ops messages.
Risk: broadcast/multicast storms.
Identify: large payloads, sustained throughput, long sessions.
Risk: buffer pressure and tail latency inflation.
| Legacy traffic class | TSN class / priority | Stream / queue (concept) | Shaping policy (capability) | Guards | Counters | Pass criteria (X) |
|---|---|---|---|---|---|---|
| Cyclic | High priority (PCP concept) | Dedicated deterministic queue | Reserved service + bounded interference | Queue guard + admission guard | deadline miss, jitter stats, queue watermark | deadline miss = 0; jitter ≤ X |
| Acyclic | Medium priority (PCP concept) | Service queue | Rate cap + burst cap | Policing + burst guard | policing hits, drops by reason, p99 latency | no cyclic impact; p99 ≤ X |
| Management | Low/medium priority (PCP concept) | Ops queue | Storm-resistant shaping | Storm guard + unknown flood guard | broadcast/mcast counts, storm triggers | storm triggers = 0; cyclic stable |
| OTA / bulk | Low priority (PCP concept) | Bulk queue | Hard rate cap + session guard | Admission + queue watermark guard | watermarks, tail latency, drops by reason | cyclic unaffected; tail ≤ X |
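The "rate cap + burst cap" shaping policy in the table is, at capability level, a token bucket. A sketch with illustrative parameters, not derived from any TSN formula:

```python
class TokenBucket:
    """Capability-level policer for acyclic/OTA admission: frames are
    admitted only while tokens remain, so a service burst cannot consume
    cyclic margin.  Rejections map to the policing-hit counter."""
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s       # sustained rate cap
        self.capacity = burst_bytes        # burst cap
        self.tokens = burst_bytes
        self.last_t = 0.0

    def admit(self, t: float, frame_bytes: float) -> bool:
        # Refill proportionally to elapsed time, clamped at the burst cap.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last_t) * self.rate)
        self.last_t = t
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False  # policing hit: count it under drops-by-reason
```

Pairing the admit/reject decision with a drop-reason counter is what turns policing from a silent limiter into evidence for the pass criteria above.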
Mitigate: cap unknown floods and management bursts.
Pass: storm triggers = 0; cyclic jitter stays ≤ X.
Mitigate: enforce rate caps for acyclic/OTA; cap bursts.
Pass: cyclic deadline miss = 0 during service bursts.
Mitigate: bound non-cyclic admission when watermarks exceed thresholds.
Pass: watermark peaks ≤ X; cyclic margin stable.
Mitigate: staged enablement and rollback-ready procedures.
Pass: no regressions beyond X after changes.
Quick check: cyclic queue watermarks + jitter histograms + policing hits.
Fix direction: strengthen isolation; tighten burst caps; reserve cyclic service.
Pass: deadline miss = 0; jitter ≤ X.
Quick check: drops-by-reason + queue delay p99 + retry counters.
Fix direction: cap retries via admission and policing; keep cyclic protected.
Pass: p99 service latency ≤ X; cyclic stable.
Quick check: broadcast/mcast rates + storm guard triggers.
Fix direction: enforce storm guard; segregate ops queues; cap unknown floods.
Pass: storm triggers = 0; cyclic unaffected.
H2-9 · Redundancy during Migration: ring, dual paths, and failover without surprises
Migration is considered successful only when redundancy is maintained or strengthened. This section focuses on coexistence strategies, measurable failover targets, fault-injection validation, and acceptance thresholds—without expanding into protocol mechanism deep-dives.
- Covered: redundancy strategy combinations during coexistence, failover target tiers, fault injection, and acceptance metrics.
- Excluded: protocol internals for ring redundancy mechanisms (handled in the Ring Redundancy page).
- Excluded: TSN scheduling parameter deep-dive (handled in TSN switch/bridge pages).
Key risks: boundary node becomes a single point; ring events may be invisible to TSN ops tools; failover triggers differ.
Evidence needed: ring event counters + boundary logs + end-to-end recovery timing.
Key risks: legacy island may see “unexpected” switchovers; queue coupling may inflate recovery time; time-domain disturbances can mask failover quality.
Evidence needed: per-path counters + queue watermarks + time health markers during failover.
Key risks: simultaneous switching causes oscillation; loop/black-hole conditions; “stable on bench, fragile in field.”
Evidence needed: oscillation indicators = 0; deterministic traffic remains within jitter budget X.
Validate: recovery time ≤ X; jitter remains ≤ X.
Watch: oscillation and repeated switchovers.
Validate: loss ≤ X; reorder ≤ X; recovery time ≤ X.
Watch: tail-latency inflation during congestion.
Validate: zero loss; zero oscillation; deterministic counters remain stable.
Watch: boundary-node load and time-domain stability under dual-active paths.
Mitigate: stabilize trigger thresholds; prevent repeated toggling.
Accept: RTO ≤ X; oscillation = 0.
Mitigate: ensure boundary node recovery is bounded; protect deterministic queues on restart.
Accept: recovery ≤ X; deterministic behavior returns within X.
Mitigate: police non-cyclic; enforce admission gates; preserve deterministic lane.
Accept: cyclic deadline miss = 0; jitter ≤ X.
Mitigate: preserve stable configs; keep a safe fallback path.
Accept: bounded outage ≤ X; post-reboot drift ≤ X.
- Target tier: Tier-1 / Tier-2 / Tier-3 (selected based on downtime tolerance and safety impact).
- Topology: legacy ring + TSN backbone; dual-active at migration boundary; bounded single points.
- Tests: link cut, flap, power dip, congestion burst, boundary reboot (repeatable scripts).
- Acceptance: RTO ≤ X, loss ≤ X, reorder ≤ X, jitter ≤ X, oscillation = 0.
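Failover RTO from the test matrix above can be estimated from the largest inter-arrival gap of a steady probe stream crossing the redundant path while a fault is injected. A sketch; loss and reorder need separate sequence-number checks:

```python
def recovery_time_ms(arrival_times_ms, nominal_gap_ms):
    """Estimate failover recovery time as the worst inter-arrival gap of a
    fixed-rate probe stream, minus the nominal gap.  Assumes the probe rate
    is fast relative to the RTO target; repeat per injected fault."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    worst = max(gaps)
    return max(0.0, worst - nominal_gap_ms)
```

Running this per fault script (link cut, flap, power dip, congestion burst, boundary reboot) gives a repeatable RTO number to compare against the tier target, and repeated large gaps in one run are the oscillation signature to watch for.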
H2-10 · Diagnostics, Manageability & Security in a mixed network
A mixed network must remain operable under stress: issues must be observable, changes must be controlled, and upgrades must be trustworthy with rollback safety. This section focuses on capability-level requirements and evidence chains, not protocol deep-dives.
- Covered: minimum black-box evidence set, remote operations gating, and security guardrails for migration.
- Excluded: detailed management protocol mechanics (handled in Remote Management pages).
- Excluded: cryptographic protocol specifics (handled in Security pages).
- Per-class counters: cyclic/acyclic/mgmt/OTA counts, drops, and per-queue watermarks.
- Drop reason: policing, buffer overflow, guard triggers, and rule matches.
- Time health: timestamp continuity, offset-step detector, drift trend (metric-only).
- Context events: temperature/power/reset causes, link flap counts, and configuration change log.
- Window discipline: fixed windows (X seconds / Y minutes) and percentile statistics (p95/p99).
- Check drop reason distribution first.
- Check queue watermarks and p99 delay next.
- Correlate with time health (step/drift) markers.
- Correlate with power/temperature events and link flap history.
- Verify configuration/version truth and change logs.
- Discovery capability: topology/port state, key versions, and capability summary.
- Configuration gating: staged enablement, explicit change IDs, and rollback points.
- Version truth: build ID + config hash + capability hash to prevent “same version, different behavior.”
- Post-change evidence: counter deltas and time health baseline comparison.
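Version truth can be made concrete by hashing the build ID together with a canonical serialization of the config and capability flags. A sketch; the field names are illustrative:

```python
import hashlib
import json

def version_truth(build_id: str, config: dict, capabilities: dict) -> str:
    """One guard against 'same version, different behavior': a short
    fingerprint over build id + canonical (sorted-key) config and
    capability flags.  Any semantic change produces a new fingerprint."""
    blob = json.dumps(
        {"build": build_id, "config": config, "caps": capabilities},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode()).hexdigest()[:16]
```

Devices report this fingerprint in logs and inventory; two nodes with matching fingerprints are claimed-identical, and a mismatch is triaged before the TSN path is enabled.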
- Pre-check: capture baseline counters/time health for window Y.
- Apply: enable in stages (feature flags) with bounded blast radius.
- Post-check: compare deltas; reject if regressions exceed X.
- Rollback: pre-defined trigger thresholds and safe fallback path.
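The pre-check/post-check flow above reduces to a counter-delta comparison against per-metric allowances. A sketch, with `max_regression` standing in for the X thresholds:

```python
def post_change_gate(baseline: dict, post: dict, max_regression: dict):
    """Compare counters captured before and after a staged change.
    Returns (ok, regressions); rollback triggers when ok is False.
    max_regression maps metric name -> allowed increase."""
    regressions = {
        k: post[k] - baseline[k]
        for k in max_regression
        if post[k] - baseline[k] > max_regression[k]
    }
    return len(regressions) == 0, regressions
```

Returning the offending deltas, not just a boolean, gives the rollback decision its evidence trail.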
- Key/cert lifecycle: defined rotation window, expiry policy, and controlled overlap.
- Trusted upgrade: signed packages, explicit version monotonicity, and failure observability.
- Safe rollback: rollback must preserve remote access and evidence retrieval.
- Change gating: security policy changes require a monitoring self-check before activation.
- At least one remote recovery channel remains available at all times.
- Upgrade failures emit a durable error code and log excerpt.
- Policy changes are staged and revert automatically if evidence gates fail.
H2-11 · Engineering Checklist (Design → Bring-up → Production)
This chapter turns migration strategy into executable gates. Each item defines measurable pass criteria (X placeholders) and produces audit-friendly evidence (logs, counters, captures, config hashes) for regression and field service.
Design Gate — architecture, budgets, and failure containment
- Strategy locked (Bridge / Dual-stack / Hybrid)
  - Tool: decision worksheet + stop-line checklist
  - Method: map constraints (downtime window, cert pressure, changeability, ops maturity) → strategy selection
  - Pass criteria: strategy doc signed; rollback path defined; cutover window ≤ X hours
- Time-domain ownership defined (legacy time vs gPTP time)
  - Tool: time-domain diagram + role table (GM/BC/TC)
  - Method: pick master/follower; define correction points (ingress/egress tap + queue delay)
  - Pass criteria: alignment plan exists; measurable offset/jitter targets recorded (offset ≤ X ns, jitter ≤ X ns)
- Hardware timestamp availability confirmed (endpoint + bridge + test NIC)
  - Tool: datasheet feature check + bring-up NIC selection
  - Method: ensure at least one reference capture node supports HW timestamping and stable PHC
  - Example parts: Intel I210-AT (timestamp NIC), NXP i.MX RT1170 (IEEE 1588 timestamp MAC), TI AM6442 (industrial comms + TSN-capable subsystem)
  - Pass criteria: TAP points enumerated; timestamp path documented; PHC stability within X ppb over Y minutes
- Traffic classes defined (cyclic / acyclic / mgmt / update)
  - Tool: class mapping table (legacy → TSN)
  - Method: assign VLAN PCP and per-class queues; reserve deterministic lane headroom
  - Pass criteria: cyclic lane protected; acyclic burst cannot exceed policing rate X; deadline miss = 0
- Queue / buffer budget closed (FIFO, CPU, DMA)
  - Tool: backlog/jitter budget sheet
  - Method: worst-case burst + fault recovery traffic considered; buffer watermark alerts defined
  - Pass criteria: p99 queue delay ≤ X µs; no overflow under stress profile Y
- Redundancy target tier selected (ms failover vs zero-loss)
  - Tool: redundancy playbook (fault injection list)
  - Method: define dual paths, boundary nodes, and reconvergence behavior under cut/flap/brownout
  - Pass criteria: RTO ≤ X ms; packet loss ≤ X; oscillation events = 0
- Clock / holdover component chosen (for drift + outage tolerance)
  - Tool: jitter/holdover requirement sheet
  - Method: select sync + jitter attenuator; define monitoring counters and alarms
  - Example parts: Microchip ZL30772 (1588/SyncE sync family), Skyworks/Silicon Labs Si5341B-D-GM (jitter attenuator)
  - Pass criteria: holdover meets X seconds at Y ppb; jitter masks met for interface Z
- Security root-of-trust planned (keys, identity, rollback)
  - Tool: key lifecycle diagram + firmware policy
  - Method: secure boot + signed updates + monotonic rollback prevention
  - Example parts: Microchip ATECC608B-SSHDA-B, NXP SE050E2HQ1
  - Pass criteria: key rotation plan exists; update/rollback drill defined; recovery channel preserved
Bring-up Gate — prove alignment, mapping, and stability (with evidence)
- Capability sanity (version truth)
  - Tool: config hash + build id + capability bitmap
  - Method: freeze "golden" config; verify endpoint/bridge/switch capability exchange matches
  - Pass criteria: hashes match across nodes; mismatch alarm triggers within X seconds
- Timestamp tap validation (ingress/egress)
  - Tool: ptp4l/phc2sys (or equivalent) + packet capture with HW timestamps
  - Method: move load from idle → peak; confirm offset/jitter does not step due to queueing
  - Pass criteria: offset ≤ X ns; p99 jitter ≤ X ns; no step events > X ns
- Traffic classification verification (cyclic stays deterministic)
  - Tool: per-class counters + queue watermark + drop reason counters
  - Method: inject acyclic bursts (params/diag/update) while cyclic is running; observe cyclic latency margin
  - Pass criteria: cyclic deadline miss = 0; p99 cyclic latency ≤ X µs; policing blocks bursts above X
- Bridge pipeline stress (queue delay & head-of-line risk)
  - Tool: replay traffic profiles + bridge telemetry
  - Method: validate deterministic lane is not blocked by mgmt/OTA queues; verify shaper hooks enabled
  - Example parts: Microchip LAN9662 (TSN switch family), NXP SJA1105 (TSN switch family)
  - Pass criteria: deterministic queue max depth ≤ X; egress schedule invariance holds under load
- Protocol coexistence proof (legacy + TSN in shadow mode)
  - Tool: mirrored object/cycle telemetry + compare scripts
  - Method: run legacy as authoritative; TSN mirrors and diff-checks for drift/ordering errors
  - Pass criteria: diffs remain within X across Y hours; mismatch causes safe fallback
- Fault injection set (cut/flap/brownout/congestion)
  - Tool: link interrupter + traffic generator + power glitch profile
  - Method: apply single faults and compound faults; verify reconvergence without oscillation
  - Pass criteria: RTO ≤ X ms; loss ≤ X; oscillation = 0; recovery counters consistent
- Soak test (catch rare field failures)
  - Tool: black-box logging + periodic counter snapshots
  - Method: run 24–72 h; include temperature/power events; correlate with timestamp health
  - Pass criteria: drift slope stable; counter deltas within X; no unexplained resets/time steps
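The "no step events" gate from timestamp tap validation can be checked offline against an offset series, for example one scraped from ptp4l logs. A minimal sketch:

```python
def detect_steps(offsets_ns, step_limit_ns):
    """Return indices of step events in a PTP offset series: samples whose
    change from the previous sample exceeds step_limit_ns.  The bring-up
    gate requires this list to be empty under load and topology changes."""
    return [i + 1 for i in range(len(offsets_ns) - 1)
            if abs(offsets_ns[i + 1] - offsets_ns[i]) > step_limit_ns]
```

Correlating the returned indices with configuration-change and load-step timestamps is what separates a tap-semantics problem from ordinary servo noise.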
Production Gate — consistency, regression, and safe field ops
- Golden config & regression suite
  - Tool: CI regression + config artifact storage
  - Method: every release runs time health + queue health + failover scripts
  - Pass criteria: no metric regression beyond X; build rejected on drift/latency increase
- Version governance (same version ≠ same behavior)
  - Tool: capability hash printed in logs + remote inventory
  - Method: field devices report config hash; mismatches triaged before enabling TSN path
  - Pass criteria: 100% nodes report hash; mismatch remediation time ≤ X
- Secure update + rollback drills
  - Tool: signed firmware + staged rollout + rollback triggers
  - Method: simulate failed update; verify safe fallback path remains accessible
  - Example parts: Microchip ATECC608B-SSHDA-B / ATECC608B-TCSM, NXP SE050E2HQ1
  - Pass criteria: rollback completes ≤ X minutes; device remains manageable; keys not reused
- Field forensics readiness (black-box evidence)
  - Tool: ring buffer logs + counter snapshot schema
  - Method: record drop reasons, queue peaks, time steps, temperature/power events
  - Pass criteria: incident package downloadable in ≤ X seconds; evidence covers last Y minutes
The result is a gate-based flow that produces evidence artifacts at each step, reducing “works in lab, fails in field” risk.
H2-12 · Applications & IC Selection Logic (bundles for gradual migration)
Goal: protect cyclic determinism while enabling TSN backbone convergence and staged cutover.
- Step 1: Insert a bridging gateway at island boundary; lock time-domain ownership and correction points.
- Step 2: Map cyclic/acyclic classes to TSN queues; enforce policing for bursts; validate p99 latency/jitter budgets.
- Step 3: Introduce dual-stack endpoints in shadow mode; partial cutover with rollback; then full TSN-native.
Example BOM anchors: TSN switch IC LAN9662 (Microchip) / SJA1105 (NXP), sync/jitter ZL30772 (Microchip) / Si5341B-D-GM (Skyworks/Silicon Labs), secure element SE050E2HQ1 (NXP) / ATECC608B-SSHDA-B (Microchip).
Pass criteria (X): cyclic deadline miss = 0; p99 cyclic latency ≤ X µs; offset ≤ X ns; failover RTO ≤ X ms.
Goal: build a deterministic backbone that improves observability and reduces protocol silos before touching every cell.
- Step 1: Deploy TSN switching core with unified monitoring (per-class counters, queue health, timestamp health).
- Step 2: Add boundary bridges per cell; enforce traffic isolation and guard rails (storm/policing).
- Step 3: Replace cell controllers with TSN-capable platforms; convert endpoints gradually via dual-stack.
Example BOM anchors: managed/TSN switches KSZ9477 (Microchip) / LAN9662, controller/comm platform AM6442 (TI), test NIC I210-AT (Intel) for timestamp correlation.
Pass criteria (X): backbone jitter budget stable; counters consistent across ports; incident package retrievable ≤ X seconds.
Goal: achieve coexistence and diagnostics uplift without widespread endpoint changes.
- Step 1: Use gateway bridging at choke points; keep legacy islands intact; stabilize time mapping with measured correction.
- Step 2: Introduce monitoring + remote ops first (inventory, config hash, black-box evidence).
- Step 3: Plan cutover only after soak + fault injection prove no added brittleness.
Example BOM anchors: gateway class platform i.MX RT1170 (NXP) or AM6442 (TI), TSN switch LAN9662, secure element ATECC608B-SSHDA-B / SE050E2HQ1.
Pass criteria (X): no increase in cycle jitter; failover behavior predictable; rollback tested within X minutes.
- Must-have bits: HW timestamp tap visibility; per-class queues; shaping hooks; drop-reason counters; remote evidence export.
- Example parts: TI AM6442 (industrial comms + TSN-capable subsystem), NXP i.MX RT1170 (IEEE 1588 timestamp MAC), Intel I210-AT (timestamp reference NIC for validation).
- Pass linkage: maps to H2-11 time alignment + traffic classification + fault injection evidence.
- Must-have bits: queue visibility + policing/storm guard; port mirroring; telemetry counters; deterministic lane isolation.
- Example parts: Microchip LAN9662 (TSN switch family), Microchip KSZ9477 (managed switch family), NXP SJA1105 (TSN switch family).
- Pass linkage: maps to H2-11 p99 queue delay, cyclic protection, failover predictability.
- Must-have bits: stable latency; robust link; EEE behavior understood; diagnostics counters exposed (where available).
- Example parts: Analog Devices ADIN1300 (Gigabit PHY family), TI DP83822I (10/100 PHY family), TI DP83867IR (Gigabit PHY family).
- Boundary note: PHY SI/EMC/layout deep-dive belongs to the PHY/Protection sub-pages; only capability selection is handled here.
- Must-have bits: holdover target; jitter attenuation; monitoring hooks for drift and lock events.
- Example parts: Microchip ZL30772 (SyncE/1588 sync family), Skyworks/Silicon Labs Si5341B-D-GM (jitter attenuator family).
- Pass linkage: maps to H2-11 offset/jitter stability and “no time steps” requirement.
- Must-have bits: secure boot support; signed updates; measured identity; key rotation readiness.
- Example parts: Microchip ATECC608B-SSHDA-B / ATECC608B-TCSM, NXP SE050E2HQ1.
- Pass linkage: maps to H2-11 production gate: safe update + rollback drills.
For mixed networks, boundary nodes may require dedicated industrial communication silicon to keep legacy islands stable during coexistence.
- EtherCAT slave controllers: Microchip LAN9252, Beckhoff ET1100, TI AMIC110.
- Multiprotocol SoC (industrial): Hilscher netX 90 (protocol-enabled boundary designs).
- Pass linkage: keep legacy determinism contract intact while TSN backbone is introduced.
Deliver migration as bundles (bridge → dual-stack → TSN backbone) with a measurable evidence trail at every stage.
H2-13 · FAQs (Migration troubleshooting, no scope expansion)
These FAQs close long-tail troubleshooting during coexistence and cutover (bridge / dual-stack / time domain / traffic mapping / redundancy / diagnostics). Each answer is constrained to a first-principles check, minimal fix, and measurable acceptance criteria (X).
Legacy looks stable, but jitter appears after TSN cutover — time alignment or queue mapping?
Likely cause: time-domain correction is incomplete (tap points or queue delay not accounted), OR cyclic traffic lost deterministic priority after mapping.
Quick check: compare pre/post-cutover offset/jitter trend + step events; check per-class queue watermarks and cyclic deadline-miss counters under the same load.
Fix: lock correction points (ingress/egress tap + queue delay accounting) and re-apply cyclic class protection (priority + policing for bursts) before enabling full cutover.
Pass criteria: offset ≤ X ns; jitter(p99) ≤ X ns; cyclic latency(p99) ≤ X µs; deadline miss = 0 over Y minutes.
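As a minimal sketch of the pre/post comparison, the check below computes a nearest-rank p99 of the captured offsets and the peak-to-peak jitter, then tests them against the placeholder limits (X). Function names and sample values are illustrative, not any vendor API.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest sample >= q% of the distribution."""
    s = sorted(samples)
    k = max(0, math.ceil(q / 100.0 * len(s)) - 1)
    return s[k]

def sync_criteria_pass(offsets_ns, jitter_ns, offset_limit_ns, jitter_limit_ns):
    """offset(p99) and peak-to-peak jitter must both stay within limits."""
    p99_offset = percentile([abs(o) for o in offsets_ns], 99)
    pkpk_jitter = max(jitter_ns) - min(jitter_ns)
    return p99_offset <= offset_limit_ns and pkpk_jitter <= jitter_limit_ns
```

Run the same computation on pre- and post-cutover captures under identical load; a pass on one side only localizes the problem to the cutover boundary.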
Dual-stack shows the “same version”, but interoperability fails — capability exchange or config drift?
Likely cause: same software tag but different capability bits or runtime configuration (feature flags, queue policy, time role) causing incompatible behavior.
Quick check: compare build-id + config-hash + capability bitmap between nodes; verify the negotiated roles (master/follower) are consistent across cutover boundary.
Fix: enforce “version truth” (hash-based gating) and deploy a golden config artifact; block cutover unless capability+config are identical for the required profile.
Pass criteria: 100% nodes report matching config-hash/capability-hash; interoperability error rate ≤ X per Y hours; rollback success ≤ X minutes.
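Hash-based "version truth" can be sketched as below: each node's effective configuration is canonicalized before hashing so that semantically identical configs compare equal, and cutover is blocked unless build, config hash, and capability bitmap all match. The field names are hypothetical.

```python
import hashlib
import json

def config_hash(cfg: dict) -> str:
    """Canonical JSON -> SHA-256, so key order cannot cause false drift."""
    blob = json.dumps(cfg, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def cutover_allowed(nodes) -> bool:
    """nodes: list of (build_id, config_dict, capability_bitmap).

    Allow cutover only if every node reports the same fingerprint.
    """
    fingerprints = {(b, config_hash(c), cap) for b, c, cap in nodes}
    return len(fingerprints) == 1
```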
Acyclic diagnostics bursts cause cyclic deadline misses — missing policing/shaping guard?
Likely cause: acyclic traffic shares a queue/priority path with cyclic, or burst policing is absent so diagnostics steals deterministic headroom.
Quick check: correlate deadline-miss timestamps with acyclic burst counters; inspect queue watermark spikes and drop-reason distribution by class.
Fix: separate classes (cyclic vs acyclic) into distinct queues; enable burst policing/storm-guard for acyclic; reserve deterministic lane headroom.
Pass criteria: cyclic deadline miss = 0; acyclic policing triggers for bursts > X; cyclic latency(p99) ≤ X µs under stress profile Y.
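Burst policing for the acyclic class is essentially a token bucket: diagnostics traffic above a sustained rate plus a burst allowance is marked non-conformant (to be dropped or deferred) instead of stealing cyclic headroom. A minimal sketch, with illustrative rate and burst values:

```python
class TokenBucketPolicer:
    """Police acyclic bursts so diagnostics cannot starve the cyclic class."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0       # refill rate in bytes per second
        self.burst = float(burst_bytes)  # bucket depth = allowed burst
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def conformant(self, now_s, frame_bytes):
        """True if the frame fits the contract; False = police it."""
        self.tokens = min(self.burst,
                          self.tokens + (now_s - self.last) * self.rate)
        self.last = now_s
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False
```

In hardware this maps to per-stream policing (e.g. 802.1Qci-style gates/meters); the sketch only shows the accounting that decides conformance.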
Bridge works in the lab, but fails in the plant — asymmetry, topology calibration, or event storms?
Likely cause: real deployment adds path asymmetry or uncalibrated topology; field events (link flaps, management storms) starve deterministic processing.
Quick check: compare plant vs lab: offset step frequency, queue delay distribution, and event/log rate; validate boundary path assumptions (active path count, link state changes/hour).
Fix: apply topology calibration for the active boundary; rate-limit management/diagnostics; isolate deterministic lane from log/telemetry spikes.
Pass criteria: event storm rate ≤ X/min; offset steps > X ns = 0; cyclic latency(p99) ≤ X µs in plant for Y hours.
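The "offset steps = 0" and "event storm rate ≤ X/min" criteria can be screened directly from plant captures. This sketch counts sample-to-sample offset jumps above a step limit and finds the worst-case event rate over any sliding one-minute window (all thresholds are the X placeholders):

```python
def offset_steps(offsets_ns, step_limit_ns):
    """Count sample-to-sample jumps larger than step_limit_ns."""
    return sum(1 for a, b in zip(offsets_ns, offsets_ns[1:])
               if abs(b - a) > step_limit_ns)

def storm_rate_per_min(event_timestamps_s, window_s=60.0):
    """Worst-case number of events inside any sliding window."""
    ts = sorted(event_timestamps_s)
    worst, j = 0, 0
    for i, t in enumerate(ts):
        while ts[j] <= t - window_s:  # drop events older than the window
            j += 1
        worst = max(worst, i - j + 1)
    return worst
```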
Failover after a link break is slower than expected — redundancy path or cutover state machine?
Likely cause: the intended redundant path is not “hot” during coexistence, or cutover logic waits on stale health/role conditions before switching.
Quick check: inject a single link cut and measure: detection time, decision time, and forwarding re-enable time; verify both paths are active/eligible at the boundary.
Fix: keep both paths pre-qualified (health + roles + counters) and simplify the cutover gate conditions; enforce deterministic failover rules with evidence logging.
Pass criteria: recovery time (RTO) ≤ X ms; loss ≤ X packets; oscillation events = 0 across N fault injections.
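Decomposing the recovery time per the quick check above makes it obvious which stage is slow. A sketch of the trace record, assuming the four timestamps are captured by your injection harness (names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class FailoverTrace:
    """Timestamps (seconds) captured during one injected link cut."""
    link_down: float        # fault injected
    fault_detected: float   # loss-of-link / health condition observed
    switch_decided: float   # cutover state machine committed the switch
    forwarding_up: float    # traffic flowing on the redundant path again

    def detection_ms(self):
        return (self.fault_detected - self.link_down) * 1e3

    def decision_ms(self):
        return (self.switch_decided - self.fault_detected) * 1e3

    def reenable_ms(self):
        return (self.forwarding_up - self.switch_decided) * 1e3

    def rto_ms(self):
        """Total recovery time = detection + decision + re-enable."""
        return (self.forwarding_up - self.link_down) * 1e3
```

Compare the three stage times across N injections: a slow decision stage points at stale gate conditions, a slow re-enable stage at a path that was never "hot".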
Timestamp health looks OK, but motion still drifts — tap point or latency accounting mismatch?
Likely cause: health metrics observe the wrong tap, or latency components (queueing, buffering, forwarding) are not included consistently end-to-end.
Quick check: compare “reported health” vs “measured phase”: run controlled load steps and see if drift correlates with queue delay, FIFO watermarks, or role transitions.
Fix: align the measurement tap points (ingress/egress) and standardize latency accounting; bind motion triggering to the verified time domain.
Pass criteria: drift slope ≤ X ns/min; offset(p99) ≤ X ns; no load-correlated phase shift > X ns.
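The drift-slope criterion is an ordinary least-squares fit of offset against time; a slope near zero means the tap points and latency accounting agree end to end. A minimal sketch:

```python
def drift_slope_ns_per_min(t_minutes, offset_ns):
    """Ordinary least-squares slope of measured offset vs time.

    A non-zero slope under steady conditions indicates a systematic
    accounting error (wrong tap point or a missing latency component),
    not random noise.
    """
    n = len(t_minutes)
    mt = sum(t_minutes) / n
    mo = sum(offset_ns) / n
    num = sum((t - mt) * (o - mo) for t, o in zip(t_minutes, offset_ns))
    den = sum((t - mt) ** 2 for t in t_minutes)
    return num / den
```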
Cutover succeeds, but rollback does not restore stability — partial state left behind?
Likely cause: rollback reverts the “mode” but not the dependent state (queue policy, policing limits, time role, cached calibration).
Quick check: diff pre-cutover vs post-rollback config-hash; compare per-class queue policies and time-role flags; check whether calibration values remain changed.
Fix: implement atomic rollback (mode + config + calibration) and require a post-rollback bring-up gate (time align + traffic map sanity) before resuming operation.
Pass criteria: rollback completes ≤ X minutes; config-hash matches baseline; cyclic deadline miss = 0 over Y minutes after rollback.
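"Atomic rollback" here means reverting every dependent field together, not just the mode flag. A sketch of the snapshot/restore pattern, with hypothetical field names standing in for queue policy, policing limits, time role, and cached calibration:

```python
import copy

class MigrationState:
    """Mode plus all dependent state that must revert as one unit."""

    def __init__(self, mode, queue_policy, policing, time_role, calibration):
        self.mode = mode
        self.queue_policy = queue_policy
        self.policing = policing
        self.time_role = time_role
        self.calibration = calibration

def snapshot(state):
    """Deep-copy every field so later mutation cannot corrupt the baseline."""
    return copy.deepcopy(vars(state))

def rollback(state, snap):
    """Restore every field; reverting only `mode` leaves partial state."""
    for key, value in copy.deepcopy(snap).items():
        setattr(state, key, value)
```

After `rollback`, the post-rollback bring-up gate (time align + traffic map sanity) should re-verify the restored state rather than trust the restore.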
Cyclic looks fine at low load, but breaks at peak — queue watermark or head-of-line blocking?
Likely cause: at peak load, shared resources (FIFO/DMA/CPU) introduce head-of-line blocking, pushing cyclic traffic beyond its jitter budget.
Quick check: capture p95/p99 queue delay and max watermarks under peak; verify deterministic queue is not served behind non-deterministic bursts.
Fix: isolate deterministic queue servicing, reserve bandwidth/headroom, and set hard watermarks + alerts to prevent silent saturation.
Pass criteria: queue delay(p99) ≤ X µs; deterministic queue max watermark ≤ X% for Y minutes at peak load; deadline miss = 0.
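The peak-load criteria can be combined into one gate: p99 queue delay, maximum watermark, and deadline misses all checked from the same capture. A sketch with placeholder limits (the X values):

```python
import math

def peak_load_gate(queue_delays_us, watermarks_pct,
                   delay_p99_limit_us, watermark_limit_pct, deadline_misses):
    """Pass/fail for the deterministic queue under peak load.

    Uses nearest-rank p99 for delay; watermarks are percent-of-depth
    samples so silent saturation (100%) is visible.
    """
    s = sorted(queue_delays_us)
    p99 = s[max(0, math.ceil(0.99 * len(s)) - 1)]
    return (p99 <= delay_p99_limit_us
            and max(watermarks_pct) <= watermark_limit_pct
            and deadline_misses == 0)
```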
gPTP offset is stable, but end-to-end phase shifts with temperature — unmodeled delay budget?
Likely cause: time-domain offset is stable, but path delay components drift (buffering, scheduling, or boundary calibration) beyond the application’s phase tolerance.
Quick check: trend phase error vs temperature/power events; compare queue delay and forwarding latency statistics across temperature steps.
Fix: add explicit drift budget to acceptance criteria and re-calibrate boundary timing under representative thermal/load conditions; alarm on slope violations.
Pass criteria: phase drift slope ≤ X ns/°C; phase(p99) ≤ X ns across temp range; no temperature-correlated steps > X ns.
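A quick screen for the temperature correlation above is a Pearson coefficient between temperature and phase error: |r| near 1 over a wide span suggests an unmodeled thermal delay term rather than random wander. A minimal sketch:

```python
def temp_phase_correlation(temps_c, phase_ns):
    """Pearson r between temperature and phase error.

    Returns 0.0 when either series is constant (correlation undefined).
    """
    n = len(temps_c)
    mt = sum(temps_c) / n
    mp = sum(phase_ns) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(temps_c, phase_ns))
    st = sum((t - mt) ** 2 for t in temps_c) ** 0.5
    sp = sum((p - mp) ** 2 for p in phase_ns) ** 0.5
    return cov / (st * sp) if st and sp else 0.0
```

A high |r| justifies the fix above: add a drift budget in ns/°C and re-calibrate under representative thermal conditions.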
Counters show “low utilization”, but the plant feels stuck — window/denominator mismatch?
Likely cause: metrics are computed with an inconsistent time window or denominator, hiding burstiness and short blocking events that break determinism.
Quick check: re-sample with a shorter fixed window (X ms / X s) and segment by traffic class; compare p99 latency and peak queue watermark rather than averages.
Fix: standardize metric definitions (window, denominator, class split) and trigger alarms on peak/burst indicators, not only average utilization.
Pass criteria: metric windows consistent across nodes; burst indicator (peak watermark) ≤ X; p99 cyclic latency ≤ X µs during Y hours operation.
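The window/denominator problem is easy to demonstrate: the global average of a sample stream can look harmless while fixed-window peaks expose the bursts that break determinism. A sketch of the re-sampling step:

```python
def windowed_peaks(samples, window):
    """Split the stream into fixed windows and return (global_mean, peaks).

    Comparing the mean against per-window peaks shows whether averaging
    is hiding short blocking events.
    """
    peaks = [max(samples[i:i + window])
             for i in range(0, len(samples), window)]
    mean = sum(samples) / len(samples)
    return mean, peaks

# Example: a single 100 us spike among 1 us samples barely moves the mean
# but dominates the peak of its window.
mean, peaks = windowed_peaks([1] * 9 + [100], window=5)
```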
Bridge introduces sporadic “micro-pauses” — event storm starving deterministic processing?
Likely cause: logging/telemetry/management bursts consume CPU/FIFO or share queues, temporarily delaying cyclic forwarding even if average load is low.
Quick check: correlate pause timestamps with log rate spikes, management counters, and queue watermark peaks; inspect drop-reason codes during pauses.
Fix: rate-limit non-deterministic events, move logs to a dedicated queue/buffer, and hard-isolate deterministic lane service from diagnostics.
Pass criteria: pause events > X µs = 0 over Y hours; event/log rate ≤ X/min; cyclic deadline miss = 0.
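The correlation step of the quick check can be automated: compute the fraction of micro-pauses whose timestamps fall inside known log-burst windows. A fraction near 1.0 points at event-storm starvation rather than a data-path fault. A minimal sketch, assuming the harness exports both series:

```python
def pauses_during_spikes(pause_ts, spike_windows):
    """Fraction of pause timestamps inside any (start_s, end_s) burst window."""
    if not pause_ts:
        return 0.0
    hits = sum(1 for t in pause_ts
               if any(start <= t <= end for start, end in spike_windows))
    return hits / len(pause_ts)
```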
Redundancy recovers, but produces transient reorder/duplicate — inconsistent boundary forwarding rules?
Likely cause: during coexistence, boundary nodes apply mismatched forwarding/filter rules on the two paths, causing brief duplicate or reorder when roles switch.
Quick check: inject a single link cut and capture per-path counters; compare duplicate/reorder indicators and boundary state transitions (role + forwarding enable).
Fix: align boundary forwarding policies across both paths; enforce deterministic “switch-over sequence” with explicit state logging and post-switch sanity checks.
Pass criteria: duplicate/reorder count ≤ X per event; RTO ≤ X ms; oscillation events = 0 across N injections.
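Duplicate and reorder indicators can be derived from per-stream sequence numbers captured at the boundary egress during each injection. A minimal counting sketch:

```python
def dup_reorder_counts(seq_numbers):
    """Count duplicates and reorders in one captured sequence-number stream.

    A duplicate is a sequence number seen twice; a reorder is a new number
    arriving below the highest number already delivered.
    """
    seen = set()
    dup = reorder = 0
    highest = None
    for s in seq_numbers:
        if s in seen:
            dup += 1
            continue
        seen.add(s)
        if highest is not None and s < highest:
            reorder += 1
        if highest is None or s > highest:
            highest = s
    return dup, reorder
```

Run it per path and per injection; mismatched counts between the two paths confirm inconsistent boundary forwarding rules rather than a single faulty link.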