
Industrial Ethernet / TSN Endpoint Hardware Design Guide


A TSN endpoint is “qualified” only when its sync accuracy, latency tails, and error behavior are measurable and explainable from endpoint-side evidence (timestamps, queues, PHY/PLL states). Build determinism by controlling timestamp/clock domains and queue shaping, then prove it with a repeatable field evidence chain that links symptoms to counters, logs, and event correlation.

H2-1 · Boundary & Ownership

TSN Endpoint Engineering Boundary: What It Owns vs What It Doesn’t

A TSN endpoint is not “a device that supports a TSN feature list.” In practice, it is the port-side determinism and time-truth module of an end station. Its value is measured by whether the endpoint’s contribution to error and latency is controllable, measurable, and explainable.

Owns (this page): PHY/MAC path, queues & shaping/gating, hardware timestamps, clock tree & jitter, isolation partition, power/reset integrity, endpoint port ruggedness.
Does not own: network-wide scheduling tutorials, Grandmaster selection details, whole-topology design, gateway/cloud aggregation, PLC application logic.
Success standard: endpoint error is budgeted and verified; timestamp bias is calibratable, determinism sources are separable, failures leave evidence.

In an industrial TSN deployment, the endpoint must deliver two outcomes at the port boundary:

  • Deterministic forwarding behavior at the device port — critical traffic is mapped to the intended queues, shaped/gated predictably, and not derailed by best-effort bursts.
  • Time truth that survives real-world noise — hardware timestamps are taken at a known point (ingress/egress path), clock domains stay coherent, and timestamp drift/jitter can be traced to specific causes (clock, queueing, software disturbance, or EMC events).

Acceptance checklist (endpoint-owned, measurable):

  • Timestamp path is defined and testable: PHY-vs-MAC location is known, ingress/egress capture points are documented, and a repeatable method exists to validate bias/offset under controlled load.
  • Determinism sources are separable: queueing effects can be distinguished from software effects (ISR/thread jitter) using counters, traces, and controlled experiments.
  • Failures leave evidence: queue drops, MAC/PHY errors, PLL/lock status changes, and reset events are captured with durable logs or monotonic timestamps.
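The "failures leave evidence" requirement can be made concrete with a minimal sketch: an event recorder that pairs every entry with a monotonic timestamp, so post-mortem ordering survives wall-clock steps. All names here are illustrative, not a specific driver API.

```python
import time
from collections import deque

class EvidenceLog:
    """Minimal evidence recorder: every event carries a monotonic
    timestamp so post-mortem ordering survives wall-clock steps."""
    def __init__(self, capacity=1024):
        self.events = deque(maxlen=capacity)   # ring buffer: oldest evidence drops first

    def record(self, source, detail):
        # time.monotonic_ns() never steps backwards, unlike wall-clock time
        self.events.append((time.monotonic_ns(), source, detail))

    def ordered(self):
        return list(self.events)

log = EvidenceLog()
log.record("PLL", "lock lost")
log.record("MAC", "CRC burst: 42 frames")
log.record("PLL", "lock regained")
stamps = [t for t, _, _ in log.ordered()]
```

The ring buffer mirrors typical on-device logging constraints: durable within a bounded window, with ordering guaranteed by the timestamp source rather than log position alone.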
Figure E1 — TSN endpoint boundary (what this page owns)
[Figure E1 diagram: network domain (Grandmaster clock source and TSN switch; network scheduling outside scope) → endpoint-owned blocks (PHY/MAC path, queues & shaping, HW timestamps, clock/PLL, isolation & power) → application side (control/IO cycles, deadlines, jitter pain; sensors/actuators with real-world disturbances). Bold border = endpoint-owned engineering surface, which must be measurable and explainable.]
Diagram goal: make scope unmistakable. The endpoint box is where determinism, timestamps, clocking, isolation, and power integrity must be engineered and proven.
H2-2 · Determinism = Controlled Variables

Where Determinism Comes From: Endpoint Latency Components and Control Levers

Determinism is not a single feature. It is the outcome of a latency stack whose variance is dominated by a few controllable layers. The endpoint must turn each layer into a named variable with a measurement point and a control lever.

A practical endpoint latency decomposition can be expressed as: PCS/MAC → Queue → DMA/Memory → ISR/Thread → App. Each stage can introduce “unknown delay” unless its ownership and evidence are defined.

Hardware-controllable first: queue mapping, queue depth, shaping/gating policies, preemption support, timestamp capture point, clock source selection.
Software risk to isolate: interrupt storms, priority inversion, cache/memory contention, DMA burst timing, logging overhead, background tasks.
Evidence that ends debate: queue occupancy/drops, MAC/PHY error counters, timestamp deltas (ingress↔egress), ISR frequency, task latency traces.

Common endpoint “loss of determinism” patterns (symptom → likely layer → strongest evidence):

  • p99/p999 spikes during best-effort bursts → Queue layer → queue depth/occupancy trend + drops on the intended priority queue.
  • Critical flow appears in the “wrong” queue → Mapping/config layer → per-priority counters show unexpected distribution across queues.
  • Periodic jitter aligned with CPU load → ISR/Thread layer → ISR rate, task scheduling latency trace, lock contention markers.
  • Offset/jitter worsens with temperature or EMC events → Clock/port robustness → PLL/lock status changes + error bursts + timestamp variance growth.

A rigorous debug sequence avoids mixing variables:

  • Freeze software disturbance: run a minimal workload, lock CPU frequency policies if possible, reduce background interrupts, and keep traffic patterns stable.
  • Validate the hardware path: confirm timestamp stability under controlled load; then validate queue behavior (mapping, shaping/gating) using counters and deterministic test patterns.
  • Re-introduce software complexity gradually: add application tasks one by one and watch which layer begins to widen the latency distribution.
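The "re-introduce complexity gradually" step becomes quantitative with a simple tail-latency comparison between stages. The sketch below uses a nearest-rank percentile helper and synthetic sample data; the threshold is an illustrative choice, not a standard.

```python
def percentile(samples, p):
    """Nearest-rank percentile: coarse, but stable enough for triage."""
    s = sorted(samples)
    k = int(round(p / 100.0 * (len(s) - 1)))
    return s[max(0, min(len(s) - 1, k))]

def tail_report(latencies_us):
    return {"p50": percentile(latencies_us, 50),
            "p99": percentile(latencies_us, 99),
            "p999": percentile(latencies_us, 99.9)}

# stage 1: minimal workload -> tight distribution (synthetic samples)
base = [10.0 + 0.1 * (i % 5) for i in range(1000)]
# stage 2: one app task added -> a few long-tail outliers appear
loaded = base[:-3] + [80.0, 95.0, 120.0]

r0, r1 = tail_report(base), tail_report(loaded)
tail_widened = r1["p999"] > 2 * r0["p999"]   # this stage widened the tail
```

Note that p50 and p99 barely move while p999 jumps: this is exactly the signature that separates "average looks fine" from "determinism lost", and it names the stage that introduced the variance.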
Figure E2 — Endpoint latency stack (layers, levers, and evidence)
[Figure E2 diagram: latency stack with test points TP1–TP4 — PCS/MAC (~ns to low µs, path fixed) → Queues/Shaping (~µs, where variance lives) → DMA/Memory (~µs, contention-driven) → ISR/Thread (~µs to ms, software risk) → Application timing (control loop / IO deadlines). Each layer pairs a control lever (queues, gates) with an evidence signal (counters, timestamps, traces) to avoid "unknown delay". Tip: stabilize software first, then prove hardware timestamp and queue behavior, then scale complexity.]
Diagram goal: convert “determinism” into a debuggable model—each layer has a lever and a measurable signal. This prevents chasing symptoms across layers.
H2-3 · Reference HW Architecture

Reference Hardware Architecture: SoC/MCU + TSN PHY/Switch + Timestamp Unit

A TSN endpoint becomes “deterministic” only when three paths are engineered together: data path (PHY/MAC/queues), time path (timestamp unit + timebase), and robustness path (isolation + power/reset). A reference diagram should make these paths explicit—otherwise timestamp drift and p99 spikes become unexplainable.

Architecture A: SoC/MCU + external TSN PHY. Common for 1–2 ports; simpler BOM and a shorter internal pipeline. Determinism often depends on queue mapping and software-disturbance control.
Architecture B: SoC + small switch silicon / multi-port bridge. Common for multi-port devices and dual-port redundancy; stronger port-side queueing, but more internal buffering/mapping variables to validate.

Regardless of topology, a TSN endpoint design review must answer three “ownership” questions with concrete artifacts:

  • Timebase ownership: which oscillator/PLL feeds the timestamp counter, and which status flags prove lock/health under temperature and noise.
  • Timestamp insertion point: where ingress/egress timestamps are captured (PHY-side vs MAC-side), and how bias changes under load can be measured.
  • Isolation boundary: what crosses isolation (data/management/time), how grounds are referenced, and how port events (ESD/EFT) are prevented from corrupting time truth.

Board-level interfaces (pointers only):

RGMII / SGMII / USXGMII decisions should be treated as time-path and debug-visibility decisions, not as “protocol lessons.” The chosen interface must preserve a clear clock-domain story and allow reliable diagnostics (counters, timestamp deltas, lock/reset flags).

Figure E3 — TSN endpoint block diagram (data / time / power with isolation)
[Figure E3 diagram: two reference architectures with data path (blue), time path (dark), power/reset path (light), and a dashed isolation boundary. A) SoC/MCU + external TSN PHY (1–2 ports): industrial port (RJ45/magnetics) → isolation → TSN PHY (PCS/MAC edge) → SoC/MCU (queues, DMA, ISR), with TSU/HW timestamps fed by an XO/PLL timebase (lock status) and isolated power rails with reset sequencing. B) SoC + small switch silicon / multi-port bridge: multi-port/dual-link front end → isolation → switch core (queues, gates) → SoC/MCU (app, DMA, logs), with TSU, XO/PLL timebase, and isolated DC/DC rails plus reset/lock status.]
Diagram usage: use it as a review template. If the timebase, timestamp insertion point, and isolation boundary are not explicitly marked, endpoint drift and long-tail latency will be hard to diagnose in the field.
H2-4 · TSN Feature Checklist

Which TSN Features an Endpoint Needs: Must-have vs Optional (with Field Evidence)

TSN features should be selected as control levers tied to field symptoms and endpoint evidence. A feature list without counters, timestamps, and lock/reset visibility does not reduce risk.

Must-have (baseline determinism): 802.1AS/gPTP (end-station timing), hardware timestamps, multi-queue priority handling.
Common (pull down p99): CBS (Qav), TAS gating (Qbv), frame preemption (Qbu/802.3br) when large frames create long-tail blocking.
Application-dependent (name only): redundancy features (FRER/CB) and related counters/interfaces—avoid network-level configuration details here.

The table below is written for field bring-up: each feature is mapped to a typical symptom and the first “evidence signal” to check. Evidence is intentionally endpoint-local (queue counters, timestamp deltas, lock/reset flags).

Baseline timing + timestamps

Feature: 802.1AS/gPTP (end station)
Field symptom if missing/weak: offset drift, unstable sync under load/temperature
First evidence to check: timestamp deltas, lock status flags, reset events

Hardware timestamping

Feature: HW TSU (ingress/egress capture)
Field symptom if missing/weak: sync “looks OK” but control still jitters
First evidence to check: ingress↔egress TS bias vs load, repeatability

Queueing and priority control

Feature: Multi-queue + priority mapping
Field symptom if missing/weak: p99 spikes during best-effort bursts
First evidence to check: per-queue occupancy/drops, class-to-queue counters

Traffic shaping

Feature: CBS (Qav)
Field symptom if missing/weak: microbursts inflate jitter and tail latency
First evidence to check: queue-level oscillation, shaped-class rate vs expectation
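The CBS expectation above can be checked against a toy credit model: credit accumulates at idleSlope while a frame waits and drains at sendSlope during transmission, which spaces back-to-back frames to the shaped rate. This is a simplified single-queue sketch (no interfering traffic, saturated arrivals), not silicon behavior.

```python
def cbs_start_times(frame_bits, idle_slope, send_slope, port_rate):
    """Toy 802.1Qav credit model, single queue, saturated arrivals.
    Credit (bits) grows at idle_slope while a frame waits and drains at
    send_slope (< 0) during transmission; a frame may start only when
    credit >= 0. Rates in bit/s; returns start times in seconds."""
    t, credit, starts = 0.0, 0.0, []
    for bits in frame_bits:
        if credit < 0:
            t += -credit / idle_slope    # wait until credit recovers to 0
            credit = 0.0
        starts.append(t)
        tx = bits / port_rate            # time on the wire
        credit += send_slope * tx        # send_slope < 0 => credit drains
        t += tx
    return starts

# 100 Mbit/s port, idleSlope 20 Mbit/s, sendSlope = idleSlope - portRate
starts = cbs_start_times([1000] * 4, 20e6, -80e6, 100e6)
gaps = [b - a for a, b in zip(starts, starts[1:])]
```

With these numbers each 1000-bit frame effectively occupies 50 µs of the shaped class, so the long-run rate converges to idleSlope (20 Mbit/s): the "shaped class rate vs expectation" evidence signal in the table above.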

Time-aware scheduling at port

Feature: TAS gating (Qbv)
Field symptom if misapplied: periodic stalls, regular p99 “comb” pattern
First evidence to check: queue counters vs window, egress TS shows periodic steps

Frame blocking control

Feature: Preemption (Qbu / 802.3br)
Field symptom if needed but absent: large frames cause long-tail blocking of critical frames
First evidence to check: tail latency correlates with MTU/big-frame bursts

Application-dependent redundancy (name only)

Feature: FRER / CB (endpoint counters & interfaces)
Field symptom if under-instrumented: duplicate/drop behavior cannot be proven
First evidence to check: per-stream counters, duplicate/drop stats, event logs

Deployment order (prevents chasing mixed variables):

  • Prove time truth first: stable timebase + repeatable hardware timestamps under controlled load.
  • Prove queue truth next: correct class-to-queue mapping + counters that match expectations.
  • Then add shaping/gating: CBS/TAS/preemption one by one, validating symptoms against evidence signals.
  • Only then scale application load: re-check tail latency and sync stability after software complexity increases.
Figure E4 — Feature → symptom → evidence map (endpoint-local)
[Figure E4 diagram: endpoint-local feature → field symptom → first evidence map. 802.1AS/gPTP → offset drift / unstable sync → timestamp deltas, PLL lock/resets. HW timestamp (TSU) → sync “OK” but jitter remains → ingress↔egress bias vs load. Multi-queue → p99 spikes on bursts → queue occupancy, per-queue drops. CBS (Qav) → microbursts inflate jitter → rate vs expectation. TAS (Qbv) → periodic stalls / comb p99 → window vs counters, egress TS steps. Preemption → big frames block critical traffic → tail vs MTU bursts. Rule: add features one by one, verifying time truth → queue truth → shaping/gating → application load.]
Diagram goal: keep decisions evidence-driven. If evidence signals are unavailable, feature claims cannot be verified during bring-up or field failures.
H2-5 · Timestamp Placement

Where Hardware Timestamps Are Taken: PHY vs MAC, 1-Step vs 2-Step, Ingress vs Egress

Timestamp accuracy is determined by the measurement boundary. The capture point decides which variables are included: PHY/MAC pipeline delays, queueing/gating variability, and software disturbance. Sub-µs stability requires a timestamp path that is measurable, calibratable, and compensatable.

PHY timestamp: closer to the cable boundary. Reduces MAC/queue uncertainty in the measurement. Requires correct cross-domain alignment and link/interface delay compensation.
MAC timestamp: convenient integration. Risk: queueing, shaping, and gating can become part of the timestamp boundary—tail latency and load can directly distort apparent timing.
Ingress/egress meaning: ingress observes “arrival truth”; egress observes “departure truth.” Using both helps separate clock quality from internal endpoint variability.

A practical endpoint decision rule:

  • If load changes alter timestamp bias, the capture point is likely including queueing/gating variability (or the timebase is unstable).
  • If temperature/EMC events alter timestamp noise, the timebase/PLL/CDC path is likely contributing jitter or readout inconsistency.
  • Sub-µs stability demands a known capture point plus a repeatable method to measure fixed offsets and validate compensation under controlled traffic.
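The first decision rule can be automated: sweep offered load and correlate it with measured timestamp bias. The sketch below uses illustrative numbers; a strong positive load/bias correlation implicates queueing inside the capture boundary.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# offered load (% of line rate) vs measured egress-timestamp bias (ns)
load     = [10, 20, 40, 60, 80]
bias_mac = [120, 180, 310, 520, 790]   # MAC-side capture: bias tracks load
bias_phy = [120, 119, 121, 118, 120]   # PHY-side capture: bias stays flat

mac_r = pearson(load, bias_mac)
phy_r = pearson(load, bias_phy)
queue_in_boundary = mac_r > 0.9   # load-dependent bias => queueing inside the capture boundary
```

The same sweep run over temperature instead of load separates the second rule (timebase/PLL contribution) from queueing effects.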

1-step vs 2-step (engineering meaning only):

  • 1-step: hardware inserts/corrects timing at the transmit boundary. Requires stable transmit pipeline timing and strong HW support at the insertion point.
  • 2-step: hardware/software provides a correction based on the actual transmit instant. Integration is flexible, but the correction must remain consistent with the true egress timing and be verifiable.
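For the 2-step case, "verifiable" can be as simple as checking that the preliminary timestamp plus the follow-up correction lands within a characterized tolerance of the true egress instant. All numbers below are hypothetical.

```python
def two_step_residual(preliminary_ts_ns, followup_correction_ns, true_egress_ts_ns):
    """2-step sanity: preliminary timestamp + follow-up correction must
    agree with the actual transmit instant within a fixed tolerance."""
    return (preliminary_ts_ns + followup_correction_ns) - true_egress_ts_ns

# hypothetical frames: driver stamps early, hardware reports the real egress
TOLERANCE_NS = 20   # illustrative budget from a characterization run
residuals = [two_step_residual(1_000_000, 1_850, 1_001_850 + jitter)
             for jitter in (-12, 5, 9, -3)]
correction_consistent = all(abs(r) <= TOLERANCE_NS for r in residuals)
```

A residual distribution that widens with load is the 2-step analogue of the capture-point rule above: the correction path is picking up variable delay.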
Figure E5 — Timestamp pipeline and capture-point boundaries
[Figure E5 diagram: timestamp pipeline Wire → PCS → PHY → MAC → Queue/Gate → DMA/Memory → CPU, with capture points T_in (PHY), T_in (MAC), T_out (MAC), T_out (PHY) and a marked variable-delay zone (queueing, gating, DMA contention, ISR). Transmit timing options: 1-step (insert at egress boundary, needs a stable pipeline) vs 2-step (send then correct, prove consistency). Requirement for sub-µs stability: measurable offset, calibratable bias, compensatable delays. Field evidence signals: timestamp deltas vs load, queue counters, lock/reset flags.]
Use this diagram to define measurement boundaries. If the capture point includes the variable-delay zone, apparent “time error” will track load and gating behavior.
H2-6 · Clock Tree & Jitter

Clock Tree and Jitter Budget: XO/TCXO, PLL/Jitter Cleaner, SyncE, and Timestamp Domains

A timestamp is only as good as its timebase. Clock noise turns into timing noise when the timestamp counter’s edges are unstable, when lock transitions create steps, or when clock-domain crossings corrupt readout consistency. Endpoint engineering must make the clock tree and domains auditable with clear “health evidence.”

Timebase sources: XO/TCXO direct, PLL/jitter-cleaned, optional SyncE-recovered input path. The endpoint must state which one feeds the timestamp counter.
Where jitter becomes time error: phase noise, supply noise, and EMC events perturb clock edges—seen as wider offset/jitter distributions and occasional step changes on lock transitions.
CDC (domain-crossing) risk: PHY/TSU capture runs in one domain; the CPU reads in another. Readout must preserve monotonicity and consistency, especially under load.

Endpoint holdover (short-term resilience):

When reference quality drops or lock is lost, the endpoint should keep a short window of timing stability, record the event, and avoid silent “time truth collapse.” The goal is not network algorithm design—only local stability and evidence.

  • Make the clock tree explicit: identify the timestamp counter’s clock source and the lock/health status signals.
  • Separate domains: MAC/PHY domain, TSU domain, CPU/RTOS domain—then define the CDC bridge and readout guarantees.
  • Instrument health: lock state changes, frequency/phase alarms (if available), reset reasons, and event logs with monotonic time.
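One common CDC readout guarantee is the HI/LO/HI double-read for a timestamp counter exposed as two 32-bit words: if the high word changed across the read, the low word wrapped mid-sample and the read must be retried. The simulation below mocks the register access; a real driver would read device registers, and the register layout is an assumption for illustration.

```python
def read_split_counter(read_hi, read_lo):
    """HI/LO/HI readout for a 64-bit counter exposed as two 32-bit words:
    if HI changed across the read, LO wrapped mid-sample and the read is
    retried, so returned values are never torn."""
    while True:
        hi1 = read_hi()
        lo = read_lo()
        hi2 = read_hi()
        if hi1 == hi2:
            return (hi1 << 32) | lo

# mock the hardware: the counter advances concurrently and is about to wrap
state = {"value": (1 << 32) - 1}
def read_hi():
    return state["value"] >> 32
def read_lo():
    lo = state["value"] & 0xFFFFFFFF
    state["value"] += 2          # the other clock domain keeps counting
    return lo

sample = read_split_counter(read_hi, read_lo)   # first attempt tears and is retried
```

Without the retry, the torn first attempt would have combined the old high word with the wrapped low word and returned a timestamp roughly 2^32 ticks in the past: exactly the kind of non-monotonic readout the instrumentation bullet above is meant to catch.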
Figure E6 — Clock tree & domains (error injection points, CDC, holdover)
[Figure E6 diagram: clock sources (XO/TCXO, optional SyncE input) → PLL/jitter cleaner with LOCK/ALARM flags → three domains (TSU domain with the timestamp counter; MAC/PHY domain with pipeline and queues; CPU/RTOS domain for readout and logs) connected through a CDC bridge that must guarantee monotonic readout, plus a local holdover block for short-term stability with evidence. Error injection points: supply noise, EMC events, lock transitions. Evidence to log: lock state changes, alarms, resets, timestamp monotonicity checks.]
Diagram usage: treat the clock tree as an auditable subsystem. If lock/alarms/CDC guarantees are not visible, timing failures become “unprovable” in the field.
H2-7 · Embedded Switching

Embedded Switching in TSN Endpoints: Two-Port Redundancy and Multi-Port Determinism

A multi-port endpoint is not a “network switch” by role, but it inherits switch-like variables: forwarding mode, internal buffering/queue contention, and priority mapping consistency across ports. Determinism improves only when these added delay terms are measurable, explainable, and repeatable.

When an endpoint needs embedded switching (endpoint-only scenarios):

  • Two-port redundancy: dual uplinks or path diversity where the device must maintain deterministic behavior across either path.
  • Daisy-chain devices: the unit forwards traffic onward while still producing/consuming time-sensitive flows.
  • Multi-port equipment: multiple downstream sub-devices or segments, requiring consistent class/queue treatment at each port.

Determinism impact: the “extra variables” introduced by internal forwarding

Forwarding mode (store-and-forward vs cut-through): the forwarding choice can introduce frame-length dependence and new latency “steps.” If the delay distribution shifts with MTU/frame size, forwarding behavior is likely part of the boundary.
Internal buffering & contention: multi-port implies more places where traffic competes: ingress buffers, egress queues, and shared fabric resources. Background bursts on one port can leak into another port’s tail latency if isolation is weak.
Priority mapping consistency (per-port → internal → per-port): more ports mean more mapping tables and more ways to accidentally diverge. Determinism fails when identical traffic classes are treated differently on different ports or paths.

Engineering conclusion (no switch tutorial):

Multi-port endpoints add internal delay terms. Determinism requires an evidence chain that can separate port entry → internal forwarding/queues → port exit, and confirm consistent class/queue behavior across paths.

Mode → field symptom → first evidence (endpoint-facing triage cues):

  • Store-and-forward effects: tail latency changes with frame length → compare latency distribution across MTU/profile changes.
  • Contention/leakage: one port’s background burst degrades another port’s p99/p999 → correlate queue occupancy/counters (if available) with latency spikes.
  • Mapping divergence: same class behaves differently by port/path → audit per-port classification and internal-to-egress queue mapping consistency.
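The mapping-divergence audit reduces to a table diff: dump each port's PCP→queue map and compare it against a reference port. A sketch with hypothetical maps (a real dump would come from the switch-core driver):

```python
def audit_mapping_consistency(per_port_maps):
    """Diff per-port PCP->queue maps against the first port; any
    divergence is a multi-port determinism bug to chase."""
    ports = sorted(per_port_maps)
    ref = per_port_maps[ports[0]]
    diffs = []
    for port in ports[1:]:
        for pcp, queue in sorted(per_port_maps[port].items()):
            if ref.get(pcp) != queue:
                diffs.append((port, pcp, ref.get(pcp), queue))
    return diffs

maps = {
    "portA": {7: 3, 6: 3, 5: 2, 0: 0},
    "portB": {7: 3, 6: 2, 5: 2, 0: 0},   # PCP 6 lands in a different queue
}
divergences = audit_mapping_consistency(maps)
```

An empty diff is the artifact that closes the "same class behaves differently by port/path" triage branch; a non-empty diff names the port and priority to fix.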
Figure E7 — Two-port path comparison: dual PHY vs embedded switch core
[Figure E7 diagram: two-port endpoint architectures. A) Dual PHY → CPU with no internal forwarding fabric: per-port MAC/queues (per-port shaping, consistent mapping) feeding an SoC/MCU + TSU (timestamp, queues, app). B) PHYs + embedded switch core → CPU: the switch core adds forwarding (store-and-forward vs cut-through), buffer contention, and priority-mapping variables before the SoC/MCU + TSU (readout, logs, app). Evidence chain for multi-port determinism: port entry → internal queues/buffers → port exit → logs/counters correlated with p99/p999.]
Diagram usage: architecture B adds forwarding/buffer/mapping variables. Determinism requires measuring these new terms rather than treating them as “invisible switch behavior.”
H2-8 · Isolation Strategy

Isolation Strategy: Data Isolation vs Power Isolation vs Shield/Ground—Avoiding Conflicts

Industrial Ethernet failures often originate from ground potential differences and common-mode disturbance. Isolation must be designed as a system of partitions: data barrier, power barrier, and shield/return paths. Mixing these goals causes “good isolation on paper” but unstable link quality and timing truth in the field.

Three goals (do not mix the intent):

  • Safety isolation: protect people/equipment. This page names the partition; detailed safety standards remain out of scope.
  • Common-mode/GPD resilience: tolerate large ground shifts without injecting noise into PHY/clock/timestamp domains.
  • Shield/return-path control: guide disturbance currents to chassis/earth paths instead of signal reference nodes.

Board-level isolation placement (endpoint-only):

Interface side (near RJ45/magnetics): controls how external disturbance couples into the front end. The shield/chassis strategy must be explicit to prevent noise returning through signal ground.
PHY/digital partition: protects PHY/TSU/clock domains from common-mode injection. Ensure the isolated reference is well defined to avoid “floating reference” surprises.
Management side (MDIO/I²C/diagnostics): often overlooked. A management path can bypass the isolation wall and reintroduce noise unless its reference and routing are controlled.

Common pitfalls (symptom-oriented):

  • Shield termination ambiguity (one-end vs both-ends): creates disturbance current paths that can couple into sensitive references → observe link errors or timing instability after ESD/EMC events.
  • Post-isolation reference mismatch: isolated domains drift or are “pulled” by unintended coupling → see offset/jitter worsen without obvious high BER.
  • Protection/CMC return paths unclear: surge energy couples into signal ground → see event-aligned link retraining or lock transitions.

Boundary reminder: deeper EMC surge path design and lightning/impulse event logging details belong to the sibling page “EMC / Surge for IoT”. This page focuses on endpoint partitions and the minimum evidence needed to avoid unprovable failures.

Figure E8 — Isolation partitions: interface → data wall → digital domain → power wall → system ground
[Figure E8 diagram: board-level isolation partitions, separating three goals (safety, common-mode/GPD resilience, shield/return-path control). Interface domain (RJ45, shield to chassis, magnetics, TVS, CMC return path) → data isolation wall → digital domain (PHY/MAC, TSU/timestamp, PLL/clock, MCU/logs) → power isolation wall (isolated DC/DC, system rails), with chassis/earth kept separate from controlled signal GND so disturbance currents are not returned into signal ground.]
Diagram usage: define partitions and return paths first. Many “mysterious” TSN/PTP instabilities are return-path conflicts that inject disturbance into clock/timestamp domains.
H2-9 · Power & Sequencing

Endpoint Power Rails & Bring-Up Sequencing: Isolation Supply, Domains, Brownout, and Recovery

Intermittent sync loss or link instability often originates from power/reset/clock state machines. TSN determinism depends on a stable time base: PLL lock, a valid timestamp counter, and a clean link-up window. When these prerequisites drift under brownout or ripple, failures look like “configuration problems” but resist reproduction.

Typical rail domains to treat as separate engineering objects (domain → why it matters):

CORE: switch/PHY/SoC core logic. Undervoltage or noise can trigger internal resets, fabric stalls, or hidden retry loops.
I/O: straps, management pins, and digital interfaces. Marginal I/O ramps can cause wrong mode selection or inconsistent bring-up paths.
ANALOG / SERDES: PHY analog front end and SerDes bias. Ripple can raise BER/CRC or cause retraining without obvious “hard faults.”
PLL / CLOCK: jitter-sensitive domain. Supply ripple and ground bounce translate into phase noise → timing instability and timestamp-truth degradation.
Isolated-side DC/DC: isolation-supply ripple may couple into clock/timestamp domains through reference and return paths; treat ripple and transient response as part of the timing budget.

Bring-up prerequisites for trustworthy time:

  • PG valid across required rails → reset release after rails settle → PLL lock stable → timestamp counter reset/enable → link up stable.
  • Timestamp must be monotonic under load; “mostly correct” time is not deterministic time.
  • Record at least: reset cause, brownout flag, PLL lock transitions, link flap count.
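The prerequisite ordering can be checked mechanically from the recorded event log itself. The sketch below validates that the milestones appear in the required sequence; the event names are illustrative, not a specific driver's log format.

```python
REQUIRED_ORDER = ["PG_VALID", "RESET_RELEASE", "PLL_LOCK", "TS_ENABLE", "LINK_UP"]

def bringup_order_ok(event_log):
    """event_log: (monotonic_ns, event_name) tuples. True only if every
    required milestone occurs, in the required order."""
    names = [name for _, name in sorted(event_log)]
    milestones = [n for n in names if n in REQUIRED_ORDER]
    return milestones == REQUIRED_ORDER

good = [(10, "PG_VALID"), (25, "RESET_RELEASE"), (60, "PLL_LOCK"),
        (61, "TS_ENABLE"), (90, "LINK_UP")]
bad = [(10, "PG_VALID"), (12, "PLL_LOCK"),       # lock reported before reset release
       (25, "RESET_RELEASE"), (61, "TS_ENABLE"), (90, "LINK_UP")]
```

Run against logs from repeated power cycles, this turns "reboot is sometimes good, sometimes not" into a concrete list of boots whose ordering violated the state machine.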

Symptom → first evidence → most likely boundary (power/state-machine first):

  • Offset/jitter suddenly steps up without obvious link down: check PLL_LOCK transitions and timestamp monotonicity → suspect PLL/clock rail ripple or brownout recovery path.
  • CRC increases and retraining repeats: correlate event time with error counters → suspect analog/SerDes rail transient response and port-side disturbances coupling into rails.
  • Reboot is “sometimes good, sometimes not”: compare power-up ordering and reset timing windows → suspect I/O/strap sampling window and inconsistent reset release.

PoE note (boundary-safe): PoE may change ramp rate and hold-up behavior. This page only uses PoE as a reminder to verify the endpoint’s PG/reset/PLL lock/link-up timeline rather than expanding PoE system design.

Figure E9 — Power & reset timing: rails → PG → reset → PLL lock → timestamp enable → link up
[Figure E9 timing diagram: rails (CORE, ANALOG, PLL/CLOCK) settle → PG valid → RESETn release → PLL_LOCK stable → TS_EN (timestamp counter enable) → LINK_UP, with brownout and recovery windows marked on the timeline. Evidence to log: reset cause, brownout flag, PLL lock edges, link flaps.]
Diagram usage: treat PG/reset/lock as a state machine. Brownout often produces “soft failures” (timing truth degraded) before a full link-down is visible.
H2-10 · Port Protection

Industrial Port EMC/ESD/Surge: The Minimum Viable Protection Stack for Endpoints

Port-side protection is an engineering trade-off: protection strength vs signal integrity margin vs timestamp/clock sensitivity. A practical endpoint design starts with a minimum series stack from the connector to the PHY, and adds test points (TP) so failures become measurable events rather than guesswork.

Minimum viable protection stack (RJ45 → PHY) (series path, endpoint-only):

RJ45 / shield handling: define shield termination and the return path to chassis/earth so disturbance currents do not flow through signal reference nodes.
TVS / ESD element: clamp fast transients at the boundary. Overly aggressive capacitance can reduce SI margin and amplify “marginal link” behavior.
Common-mode choke (CMC): suppress common-mode energy. Placement and selection must avoid unintended degradation of differential-path margin.
Magnetics / filtering: manage isolation and frequency behavior. Changes here can shift link-training behavior and noise-coupling paths.
PHY (sensitive endpoint): where protection and SI decisions become real: error counters, retraining, and timing truth can all be impacted.

Trade-off framing (endpoint viewpoint):

  • Stronger protection can add parasitics → lower SI margin → more retraining and higher CRC under stress.
  • Cleaner SI with weak clamping can yield event-driven link flaps after ESD/EFT/surge exposure.
  • Timing truth can degrade without catastrophic BER if disturbance couples into clock/PLL/timestamp domains.

Evidence-driven triage after events (symptom → first evidence → likely stack segment):

  • After ESD: sync degrades or offset steps → check PLL lock and timing logs → suspect shield/return-path conflicts and coupling into clock domain.
  • After EFT: CRC spikes → correlate error counter burst with event timing → suspect CMC/return path and PHY analog resilience.
  • After surge: link flaps → check link down/up timeline and retrain counters → suspect energy dissipation path and protection stack stress points.
Figure E10 — Port protection stack with test points (TP): RJ45 → TVS/ESD → CMC → magnetics → PHY
[Figure E10 diagram: minimum viable protection stack as a series path Cable → RJ45 (shield to chassis) → TVS/ESD clamp → CMC → magnetics (isolation) → PHY (TSU/clock), with test points TP1–TP5 along the path. Trade-off triangle: protection strength ↔ SI margin ↔ timestamp/clock sensitivity.]
Diagram usage: keep the stack minimal and measurable. Add TP markers so ESD/EFT/surge symptoms can be correlated with counters and timing truth.

H2-11 · Bring-up & Integration Checklist (Endpoint View)

This chapter turns “it links” into “it is measurable, explainable, and reproducible”: a fixed bring-up order, mandatory configuration traceability, and an evidence map that survives PHY/firmware/platform changes.

Bring-up order (do not reorder)

The bring-up is staged so each step eliminates one failure class before TSN features are enabled. Every step includes a pass criterion and a minimal verification method.

Step 1 · Link baseline first (PHY stability)
Goal: eliminate flaps, symbol/CRC bursts, and negotiation churn before time sync is trusted.
  • Pass: link stays up; no repeated renegotiation; error counters do not ramp abnormally.
  • Verify: PHY link status + CRC/symbol/error counters + “link-down reason” if available.
  • Fail signature: intermittent link flap → gPTP appears to “randomly lose lock”.
Step 2 · Timestamp sanity (before gPTP)
Goal: prove the timestamp path is monotonic, stable, and uses the intended clock domain.
  • Pass: timestamps are monotonic; ingress/egress deltas are consistent and explainable; source selection is explicit.
  • Verify: TSU/PHC readout + raw ingress/egress timestamps + “timestamp source” and “domain/clock” status.
  • Fail signature: non-monotonic reads / drift jumps → “sync looks OK” but app jitter remains high.
Checks: monotonic · stable Δ · correct source · CDC-consistent.
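A minimal pre-gPTP timestamp sanity check, assuming looped test frames that yield raw ingress/egress pairs; the nanosecond values and the spread budget are illustrative.

```python
def ts_sanity(pairs, max_spread_ns=50):
    """pairs: (ingress_ns, egress_ns) per looped test frame. Checks both
    streams are strictly monotonic and the ingress->egress delta is tight."""
    ing = [i for i, _ in pairs]
    egr = [e for _, e in pairs]
    monotonic = (all(a < b for a, b in zip(ing, ing[1:])) and
                 all(a < b for a, b in zip(egr, egr[1:])))
    deltas = [e - i for i, e in pairs]
    spread = max(deltas) - min(deltas)
    return monotonic, spread, monotonic and spread <= max_spread_ns

pairs = [(1_000, 1_410), (2_000, 2_405), (3_000, 3_418), (4_000, 4_412)]
mono, spread, passed = ts_sanity(pairs)
```

The delta mean is the fixed pipeline offset (to be calibrated out later); only the spread and monotonicity gate this bring-up step.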
Step 3 · gPTP synchronization (endpoint time receiver)
Goal: ensure the sync state machine reaches a stable state (not “occasionally locked”).
  • Pass: stable sync state; offset does not show step changes; loss-of-sync transitions are logged and recover.
  • Verify: gPTP state + offset/jitter logs + PHY/PLL lock status around transitions.
  • Fail signature: frequent state toggles → periodic latency spikes even with light traffic.
Step 4 · Queue mapping & shaping (TSN determinism at the port)
Goal: make classification → priority → queue → gate/shaper behavior deterministic and traceable.
  • Pass: priority mapping is correct; queues are bounded; gate/shaper enabled states match the intended profile.
  • Verify: queue drop/occupancy counters + “PCP/DSCP→queue” mapping dump + gate/CBS parameter dump.
  • Fail signature: wrong mapping → head-of-line blocking or p99/p999 tail inflation.
Step 5 · Application periodic load (final acceptance)
Goal: verify the real cycle load meets jitter/latency targets, with evidence that points to the endpoint contribution.
  • Pass: cycle jitter and tail latency remain within budget under representative traffic mixes.
  • Verify: app cycle timestamp logs aligned to PHC/TSU timebase + queue counters during load.
  • Fail signature: good sync but bad tails → queue/shaper or CPU/ISR contention evidence will show.
Bring-up rule: a TSN endpoint is considered “ready” only when each stage can be reproduced after cold boot, firmware update, and cable re-plug—while producing the same evidence artifacts (dumps + counters + logs).
Figure E11 — Bring-up pipeline & evidence taps (endpoint-only)
[Diagram: bring-up pipeline LINK (PHY stable) → TS SANITY (monotonic) → gPTP SYNC (stable state) → QUEUES (map + shaper) → APP LOAD (cycle jitter). Evidence taps per stage: PHY stats/CRC counters, TSU/PHC in/out timestamps, gPTP state + offset log, queue map + drop counters. A mandatory trace log records the priority map, gate table (GCL), CBS parameters, TS source + PLL state, and build ID. All readouts share one timebase: PHC/TSU time + logged events + counter snapshots from the same boot/session.]
A TSN endpoint bring-up is “done” only when evidence taps (PHY/TSU/queues/logs) can reproduce the same story after cold boot, firmware update, and cable re-plug.

Configuration traceability (must record)

TSN failures often look “network-related” but are caused by silent endpoint configuration drift. The items below must be versioned and logged as a single “TSN profile fingerprint”.

  • Priority mapping table: PCP/DSCP → internal priority → queue ID → shaper/gate assignment.
  • Gate control list (GCL): cycle time, phase/offset, per-queue open/close windows, active schedule ID.
  • CBS parameters: idleSlope / sendSlope / hiCredit / loCredit (or equivalent driver representation).
  • Timestamp path selection: PHY vs MAC insertion point; ingress/egress enable; timestamp format/units.
  • Clock/PLL state: reference source, lock status, holdover triggers, any ref switch events.
  • Build identity: firmware/driver commit ID, PHY firmware/strap config, device tree/profile hash.
Minimum logging rule: every profile dump must include a timestamp (PHC time), so configuration snapshots can be correlated with counter changes and event logs.
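One practical way to implement the "TSN profile fingerprint" is to hash a canonicalized dump of all the items above and log it with a PHC timestamp. A sketch under stated assumptions: the profile dict layout and the 16-hex-digit truncation are illustrative choices, not a standard.

```python
import hashlib
import json

def profile_fingerprint(profile):
    """Hash the full TSN profile (priority map, GCL, CBS, timestamp path,
    clock state, build IDs) into one fingerprint.

    json.dumps with sort_keys canonicalizes key order, so the same
    configuration always yields the same digest regardless of how the
    dump was assembled.
    """
    blob = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

def snapshot(profile, phc_time_ns):
    """Log-ready record: fingerprint + PHC time, per the logging rule."""
    return {"phc_ns": phc_time_ns, "fingerprint": profile_fingerprint(profile)}
```

With this in place, "same firmware, different behavior" becomes checkable: two sessions with different fingerprints have drifted configuration by definition.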

Debug evidence sources (endpoint-only)

The goal is to avoid “guessing”: each evidence source is tied to a claim it can prove or falsify.

PHY evidence
Read: link status, error counters, link-down reason, clock/recovery status. Proves: whether the physical layer is stable enough for time sync to be meaningful.
TSU/PHC evidence
Read: PHC/TSU counter, raw ingress/egress timestamps, domain/source selection flags. Proves: timestamp monotonicity, stability, and clock-domain consistency.
Queue evidence
Read: per-queue drops, occupancy/credit stats, gate/shaper enable states, mapping dumps. Proves: whether determinism is broken by misclassification, congestion, or shaper misconfiguration.
CPU/ISR evidence
Read: interrupt rate, latency spikes around ISR storms, scheduling/softirq metrics (concept-level). Proves: whether software contention is injecting tail latency after the TSN pipeline.
Event logs
Read: time-stamped state transitions (gPTP state, PLL lock, link up/down, profile apply). Proves: causal order between a state change and a jitter/latency symptom.

Reference BOM (example material numbers)

The parts below are commonly used building blocks for TSN endpoints. Exact selection depends on port count, speed (100M/1G/2.5G), isolation rating, EMC class, and clocking strategy.

TSN-capable processor / MCU
TI Sitara AM64x (TSN-capable ports; gPTP endpoint examples available)
TSN switch silicon (embedded multi-port endpoint)
Microchip LAN9662 (4-port TSN Gigabit Ethernet switch with integrated CPU)
TSN/AVB switch (compact add-on switch family)
NXP SJA1105 family (e.g., SJA1105TEL, SJA1105TELY)
Industrial Ethernet PHY (Gigabit)
TI DP83869HM (robust Gigabit PHY family; widely used in industrial Ethernet designs)
Industrial Ethernet PHY (Gigabit, low latency)
Analog Devices ADIN1300 (industrial Gigabit Ethernet transceiver; supports IEEE 1588 timestamping via MAC indications)
Industrial TSN MPU with integrated switch (companion / standalone)
Renesas RZ/N2L (integrated TSN-compliant multi-port Gigabit switch)
Clock generator / jitter conditioning
Skyworks/Silicon Labs Si5341A (ultra-low jitter clock generator family)
Isolated DC/DC (logic-side isolation power)
Murata NXE1S0505MC-R7 (1 W isolated DC/DC example)
Ethernet line ESD/TVS (low capacitance arrays)
Semtech RClamp0524P (TVS array family)
ESD diode array (multi-channel, ultra-low capacitance)
Littelfuse SP3012-06UTG
Common-mode choke (example part number)
Würth Elektronik 744232090 (WE-CNSW series example)
Magnetics module (example part number)
Pulse Electronics H5007NL (single-port Gigabit LAN magnetics module example)
Selection note: if the bring-up repeatedly fails at Step 2/3 (timestamp/gPTP), prioritize a design where the timestamp clock domain is explicit and the PHY/MAC timestamp handshake is well-documented—then lock the “TSN profile fingerprint” into version control.


H2-12 · Validation & Field Evidence Chain

A TSN endpoint is “qualified” only when three metrics are measurable, repeatable, and explainable: sync error (offset/jitter), deterministic latency tails (p99/p999), and error/event correlation (CRC/drop vs ESD/EFT/load/temperature).

Acceptance metrics (the required triad)

1) Sync error: offset & jitter
Measure the endpoint’s time alignment stability, not just “sync state”. Track steady jitter vs step changes. A step change usually points to clock/PLL events, brownout recovery, or timestamp source/domain switching on the endpoint.
2) Deterministic latency: p99 / p999
Throughput can look fine while tails explode. Validate latency distribution under representative periodic loads. If sync is stable but tails are poor, prioritize queue mapping, gate/shaper states, and CPU/ISR contention evidence before protocol speculation.
3) Error/event correlation: CRC/drop vs stressors
Correlate CRC/symbol errors, link flaps, and queue drops with event timestamps (ESD/EFT, load steps, temperature points). A strong correlation is more valuable than a single snapshot, because it reveals whether uncertainty is injected by EMC, power/reset, or software contention.
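The correlation claim in metric 3 can be made quantitative: what fraction of CRC growth lands inside a window after a recorded stressor event? A sketch, assuming periodic counter snapshots `(phc_ns, crc_count)` and event times on the same PHC timebase; the 100 ms window is an arbitrary example, not a standard.

```python
def correlate_errors(counter_snaps, event_times_ns, window_ns=100_000_000):
    """Fraction of CRC-counter growth that falls within `window_ns` after
    any recorded stressor event (ESD/EFT, load step, temperature point).

    counter_snaps: chronological list of (phc_ns, crc_count) snapshots.
    Returns a value in [0, 1]; values near 1 suggest event-driven errors,
    values near 0 suggest a background (e.g. signal-integrity) cause.
    """
    total = correlated = 0
    for (t0, c0), (t1, c1) in zip(counter_snaps, counter_snaps[1:]):
        growth = c1 - c0
        if growth <= 0:
            continue
        total += growth
        if any(0 <= t1 - ev <= window_ns for ev in event_times_ns):
            correlated += growth
    return correlated / total if total else 0.0
```

This is why the triad insists on event timestamps: without them, the same CRC surge supports both the EMC hypothesis and the software hypothesis.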

Minimal bench (capabilities, not brands)

  • Time/packet domain: capture hardware timestamps or time-related packets and export latency distributions.
  • Electrical/clock domain: observe rails, reset/PG timing, and clock/PLL lock behavior; confirm short-term frequency stability.
  • Load/event domain: apply controlled load steps and record event times (ESD/EFT occurrences, thermal points, link transitions) for correlation.

Evidence becomes valid only when all artifacts share the same timebase (PHC/TSU time) and the same boot/session context.
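The "same timebase" rule implies a concrete artifact: one merged timeline. A minimal sketch, assuming each evidence stream has already been captured as `(phc_ns, source, payload)` records from the same boot/session; the tuple layout is an illustrative convention.

```python
def merge_timeline(*streams):
    """Merge evidence streams (counters, snapshots, log events) into one
    list ordered by PHC time.

    Each stream is a list of (phc_ns, source, payload) records from the
    same boot/session; the merged list reads as a single causal story.
    """
    merged = [record for stream in streams for record in stream]
    return sorted(merged, key=lambda record: record[0])
```

Usage: merge the PHY counter snapshots, the gPTP offset log, and the queue counters from one session, then read around each symptom time.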

Field evidence chain (symptom → evidence → exclusion order)

Use a hard-first exclusion order: clock/power/EMC evidence before software assumptions. Each symptom must map to counters/logs that can prove or falsify a hypothesis.

Symptom A: Sync drift / step change
Evidence: PLL lock/ref-switch logs, TSU/PHC monotonicity, timestamp source/domain flags, brownout/reset markers. Exclude: clock tree events and reset sequencing first; only then consider protocol stack behavior.
Symptom B: Latency tail inflation (p99/p999)
Evidence: queue mapping dump, gate/shaper enable + params snapshot, per-queue drops/occupancy, ISR rate spikes. Exclude: mapping/shaping mistakes first; then CPU/ISR contention; last, application scheduling.
Symptom C: CRC surge / short link flap
Evidence: PHY error counters + link-down reason + event timestamp, port-side protection segmentation tests. Exclude: EMC/grounding/isolation reference issues first; then transceiver/clock sensitivity; last, software retries.
Figure E12 — Debug flow tree (endpoint-only evidence)
[Diagram: debug flow tree. From a field symptom, pick a branch, collect evidence, and exclude hard causes first. Branch 1, sync drift/step: clock/TS-domain evidence (PLL lock? TSU monotonic? brownout/reset markers?). Branch 2, latency tail p99/p999: queues/shaping/ISR evidence (queue map, GCL/CBS, ISR rate/CPU contention). Branch 3, CRC surge/link flap: port/EMC/ground evidence (PHY error counters, shield/return path, port protection segments). All branches produce evidence artifacts: counters, snapshots, logs + event time. Rule: same timebase + same boot/session + event timestamps; correlation beats single-point snapshots.]
Use the branch-specific evidence taps to localize uncertainty sources. Hard causes (clock/power/EMC) should be excluded before software-level conclusions.

H2-13 · FAQs (with answers)

Each answer stays within the endpoint boundary and points to concrete evidence (counters, dumps, logs) that can be captured during bring-up and field validation.

1) What is the practical boundary between a TSN endpoint and a TSN switch/gateway, and when is hardware timestamping mandatory?
Mapped: H2-1 / H2-5
A TSN endpoint generates/consumes time-sensitive flows and must make its own egress/ingress timestamps and queue behavior measurable. A switch/gateway focuses on forwarding and network-wide policy. Hardware timestamping becomes mandatory when sub-µs stability is required, because it anchors time at a calibrated insertion point and reduces queue/software uncertainty.
2) Why can throughput look fine while p99/p999 latency is terrible, and what should be checked first on the endpoint?
Mapped: H2-2 / H2-11 / H2-12
Throughput averages hide tail behavior. p99/p999 usually degrades due to mis-mapped priorities, shaping/gating mismatches, or CPU/ISR contention that bursts at unlucky times. Start with (1) queue mapping dump, (2) gate/CBS enable + parameter snapshot, and (3) per-queue drop/occupancy counters aligned to the same timebase.
3) Qbv (gating) is enabled but jitter gets worse—how to tell whether priority mapping or the gate table is wrong, and what evidence should be captured?
Mapped: H2-4 / H2-11
If mapping is wrong, latency spikes appear across classes because traffic lands in the wrong queue. If the gate table is wrong, spikes are often periodic with the cycle/phase. Capture three artifacts together: the priority map dump, the GCL (cycle/phase/windows) dump, and queue counters (drops/occupancy) with timestamps. Correlation beats guesswork.
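The "periodic with the cycle/phase" test from this answer can be automated: fold spike timestamps modulo the GCL cycle time and check whether they cluster at one phase. A sketch, assuming spike times and cycle time in nanoseconds on the PHC timebase; the tolerance value is illustrative.

```python
def spikes_phase_locked(spike_times_ns, cycle_ns, tolerance_ns=10_000):
    """True if latency spikes cluster at one phase of the gate cycle.

    Phase-locked spikes implicate the gate table (cycle/phase/windows);
    scattered phases point at priority mapping or software contention.
    Wrap-around near the cycle boundary is folded back before comparing.
    """
    phases = sorted(t % cycle_ns for t in spike_times_ns)
    spread = phases[-1] - phases[0]
    wrapped_spread = min(spread, cycle_ns - spread)
    return wrapped_spread <= tolerance_ns
```

Run this over the spike times from the queue-counter log and keep the result next to the priority-map and GCL dumps, so the verdict and its evidence travel together.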
4) In the field, what differences are typically visible between PHY and MAC timestamping (jitter, bias, temperature drift)?
Mapped: H2-5 / H2-6 / H2-12
PHY timestamping usually shows lower jitter under load because it is closer to the wire and less sensitive to MAC/queue variability, but it demands correct domain calibration and interface delay compensation. MAC timestamping is easier to integrate yet can inherit queue and gate timing variability. Temperature drift is dominated by the timestamp clock path and PLL stability, not the timestamp label alone.
5) What happens when 1-step vs 2-step is chosen incorrectly—what “looks synced but is actually off” symptoms appear?
Mapped: H2-5
A common failure mode is “stable state, wrong truth”: sync state appears locked but the measured offset shows a persistent bias or occasional step corrections. This happens when the effective insertion point and correction handling do not match the hardware capability. The tell is a consistent offset bias that does not track load, plus logs indicating mismatched timestamp mode or correction path.
6) What is the practical criterion for adding a PLL/jitter cleaner on an endpoint, and where does skipping it show up?
Mapped: H2-6 / H2-12
Add jitter conditioning when the timestamp clock domain cannot maintain stable short-term phase under temperature, supply ripple, or reference disturbances, and the target offset/jitter budget is tight. Skipping it often shows up as temperature-sensitive jitter growth, step changes after ESD/EFT events, or frequent lock/relock transitions. Evidence is PLL lock/ref-switch logs aligned with offset/jitter excursions.
7) In dual-port redundancy or daisy-chain devices, why can an embedded switch introduce “unexplainable” latency compared with two PHYs directly connected to the CPU?
Mapped: H2-7 / H2-2
An embedded switch adds forwarding mode choices, internal buffering, and priority translation points, each introducing extra delay terms and variability. Two PHYs directly connected to the CPU keep the path simpler and often easier to model. With an embedded switch, determinism depends on consistent priority mapping end-to-end and measurable buffer behavior; otherwise tails inflate without obvious symptoms.
8) After isolation is added, CRC still surges or sync drifts—what is more often at fault: shield grounding strategy or reference-ground mismatch across the isolation boundary?
Mapped: H2-8 / H2-10
Both occur, but the fastest discriminator is evidence shape and correlation. If CRC surges correlate with touch/ESD and link events, shield/return path strategy is suspect. If sync drifts or shows steps without CRC growth, reference-ground mismatch affecting the clock/timestamp domain is more likely. Capture PHY error counters, PLL/TSU status, and event timestamps to separate the two.
9) The endpoint “occasionally loses sync” but link never drops—what power/reset/PLL states and timing should be checked first?
Mapped: H2-9 / H2-6
Check for silent clock-domain disturbances: PLL lock transitions, reference switch events, timestamp source changes, and brownout markers that do not force a link drop. Then confirm reset/PG sequencing around those events: rail dips, delayed releases, or partial resets that restart the timestamp counter. The key is aligning these state changes to offset/jitter logs on the same timebase.
10) After ESD/EFT a short link flap appears—within the port protection stack, which segment should be suspected first?
Mapped: H2-10
Start from the segment that directly clamps or filters common-mode energy: TVS/ESD arrays and the common-mode choke region, then the magnetics and connector return paths. A degraded segment often leaves a signature: error counters spike, link-down reason appears, and recovery time changes. Isolate by logging PHY counters and repeating controlled events while probing test points along the stack.
11) What are the three most commonly missed bring-up trace items that make later incidents impossible to reproduce?
Mapped: H2-11
The top misses are (1) the priority mapping table (PCP/DSCP→queue), (2) the gate/CBS configuration snapshot (GCL + shaper parameters), and (3) the timestamp path selection and clock/PLL state (source, domain, lock/ref switch). Without these, field symptoms cannot be tied to configuration drift, and “same firmware” does not mean same behavior.
12) How can a minimal setup prove the endpoint meets spec for sync error, deterministic latency, and error rate?
Mapped: H2-12
For sync error, record offset/jitter over time and mark step events with PLL/TSU status logs. For deterministic latency, measure p99/p999 under periodic load and correlate tails with queue/shaper counters. For error rate, log CRC/symbol errors and link events while applying controlled stressors (load/temperature/ESD/EFT), ensuring all artifacts share one timebase and session.