AFDX / ARINC 664 Switch (Deterministic Avionics Ethernet)

An AFDX/ARINC 664 switch is built for provable determinism: it controls traffic injection (VL/BAG policing), isolates and shapes queues to bound latency/jitter, and exposes counters/mirroring/logs so drops and spikes can be explained and verified. In practice, the “right” switch is the one that can keep worst-case behavior inside limits under load and faults, and still provide the evidence needed for fast field diagnosis and safe configuration changes.

H2-1 · What this page solves (AFDX switch boundary & outcomes)

This page focuses on the switch viewpoint: how an AFDX/ARINC 664 switch enforces deterministic forwarding with VL policing/shaping, supports A/B redundancy without fault spread, and exposes diagnostics that can be verified in test and in the field.

What a reader should get from this page

  • Bounded latency and jitter: understand what sets the worst-case delay budget (serialization + switching + queueing) and how the switch constrains the variable part.
  • Virtual Link governance: know how VL/BAG policing and shaping turn bursty traffic into predictable behavior that can be proven.
  • A/B redundancy with fault containment: identify failure signals and isolation tools so a single bad port or stream cannot contaminate the whole network plane.
  • Evidence-based diagnostics: counters, mirroring, and event records that create a traceable “proof chain” for acceptance testing and troubleshooting.

Typical reasons people land here

  • “Latency looks fine on average, but jitter spikes appear under load.”
  • “Drops occur only in certain modes—suspected policing/shaping or queue mapping issues.”
  • “A/B redundancy exists, yet failover behavior is unclear or hard to verify.”
  • “Field troubleshooting lacks evidence: no counters/mirroring workflow.”

What this page is NOT

  • Not a MIL-STD-1553B / ARINC 429 / CAN (ARINC 825) tutorial.
  • Not a mission-computer PCIe/NVMe architecture guide.
  • Not a GPSDO/atomic-clock deep dive (timing sources are only referenced as needed for switch behavior).
  • Not a full compliance handbook; it stays on engineering mechanisms and verifiable outcomes.

Engineering rule of thumb: determinism is proven at the worst case, not by averages. A switch must provide both traffic governance (policing/shaping) and observable evidence (counters/mirroring) to make that proof possible.

[Diagram: AFDX / ARINC 664 Network Overview (Switch View). Focus: governance + determinism + redundancy + observable diagnostics (switch viewpoint).]
Figure F1. End-to-end AFDX view: dual A/B planes, VL/BAG governance in the switch, and a dedicated diagnostics path for evidence-based validation.

H2-2 · ARINC 664 / AFDX essentials in 90 seconds (VL, BAG, policing)

Extractable answer block

In AFDX (ARINC 664), a Virtual Link (VL) defines a controlled traffic flow. The Bandwidth Allocation Gap (BAG) limits how often frames may be sent, and the switch enforces policing and shaping so bursts cannot create unbounded queueing. This is the foundation of deterministic latency and jitter.

The practical goal is simple: convert “unpredictable bursts” into behavior that remains predictable under worst-case contention. Three control points work together—ingress policing, queue mapping, and egress shaping.

Term | What it controls | Engineering failure mode if wrong
Virtual Link (VL) | Traffic identity: how frames are classified into policies, queues, and limits. | Wrong queue/policy; unintended drops or jitter spikes that only appear under load.
BAG | Pacing ceiling: the minimum interval between transmissions for a VL. | Burstiness pushes queueing delay upward; “average OK” but worst-case jitter breaks determinism.
Policing (ingress) | Compliance enforcement: what happens when a VL exceeds its allowed envelope. | Unbounded contention if absent; silent drops if too strict; hard-to-debug intermittent behavior.
Shaping (egress) | Scheduling discipline: how competing VLs share an output port in a predictable way. | Queueing becomes the dominant variable; jitter spikes emerge when multiple streams collide.

Quick intuition: for a given VL, the average-rate ceiling scales roughly with frame size / BAG. Smaller BAG allows higher average throughput, but also raises the risk that burst alignment across VLs inflates queueing delay. Policing and shaping keep that “alignment risk” inside a verifiable bound.
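That average-rate ceiling can be made concrete with a few lines. The sketch below computes the per-VL bandwidth ceiling from the maximum frame size and the BAG; the example values are illustrative, not from any particular VL table.

```python
# Maximum average bandwidth of a Virtual Link: one max-size frame per BAG.
# Example values are hypothetical; real VL parameters come from the system ICD.

def vl_max_bandwidth_bps(max_frame_bytes: int, bag_ms: float) -> float:
    """Average-rate ceiling for a VL: frame size / BAG, in bits per second."""
    return (max_frame_bytes * 8) / (bag_ms / 1000.0)

# Example: 1518-byte max frames with an 8 ms BAG
bw = vl_max_bandwidth_bps(1518, 8.0)
print(f"{bw / 1e6:.3f} Mbit/s")  # 1.518 Mbit/s
```

Halving the BAG doubles this ceiling, which is exactly why smaller BAG values increase the burst-alignment risk discussed above.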

Three determinism control points (switch-centric)

  1. Ingress policing: constrains input bursts so worst-case contention is provable.
  2. Queue mapping: isolates classes/VLs so one stream cannot steal latency budget from another.
  3. Egress shaping: enforces a predictable transmit schedule at the port, reducing jitter spikes.
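The BAG check in control point 1 can be sketched as a minimal per-VL policer that enforces a minimum spacing between accepted frames and counts violations as observable evidence. The jitter tolerance handling and field names here are illustrative assumptions, not ARINC 664 text.

```python
# Minimal sketch of ingress BAG policing: accept a frame only if at least
# (BAG - jitter tolerance) has elapsed since the last accepted frame on the VL.
# Tolerance handling and names are illustrative, not normative ARINC 664 behavior.

class BagPolicer:
    def __init__(self, bag_ms: float, jitter_tol_ms: float = 0.5):
        self.bag_ms = bag_ms
        self.jitter_tol_ms = jitter_tol_ms
        self.last_accept_ms = None
        self.violations = 0  # observable evidence counter

    def accept(self, t_ms: float) -> bool:
        if (self.last_accept_ms is None
                or t_ms - self.last_accept_ms >= self.bag_ms - self.jitter_tol_ms):
            self.last_accept_ms = t_ms
            return True
        self.violations += 1  # policed: drop and count
        return False

p = BagPolicer(bag_ms=8.0)
arrivals = [0.0, 2.0, 8.0, 9.0, 16.0]   # ms; 2.0 and 9.0 violate the BAG
accepted = [t for t in arrivals if p.accept(t)]
print(accepted, p.violations)  # [0.0, 8.0, 16.0] 2
```

The key property to notice: violations are not silent; they leave a counter that test and field evidence can reference.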
[Diagram: VL/BAG Governance Pipeline (Inside the Switch). Key idea: policing constrains the input envelope; shaping constrains contention at the output.]
Figure F2. Switch ingress pipeline: classify by VL, enforce BAG/burst envelope, apply policing actions, map to queues, then shape at egress for predictable timing.

H2-3 · Switch architecture that actually matters (pipeline, buffers, fabric)

Deterministic performance is set by what happens inside the switch under contention. The fixed part of delay comes from serialization and the forwarding path; the variable part (jitter spikes) is driven by queueing and resource arbitration. This section highlights architectural details that directly change worst-case latency and how to verify them.

Store-and-forward vs cut-through

Impact: Store-and-forward adds a larger fixed delay (full frame reception + checks), while cut-through reduces average delay by forwarding early. However, cut-through designs can fall back to store-and-forward under certain conditions (congestion, filtering, mirroring, error handling), creating discontinuous worst-case latency.

  • Verify: measure min/avg/max latency across (a) light load, (b) heavy contention, (c) mirroring enabled, (d) error injection.
  • Pass signal: max latency remains bounded and does not “jump” when features toggle.

Buffers and head-of-line blocking

Impact: Shared buffers and shared queue banks can cause head-of-line blocking: a bursty stream occupying a shared resource delays unrelated traffic, inflating worst-case queueing. Large buffers may hide congestion while increasing worst-case delay.

  • Verify: run a burst stress stream alongside a critical VL; check whether critical max latency inflates without proportional throughput change.
  • Observe: per-port drops, congestion counters, and (if available) queue watermarks or queue-drop counters.

Fabric bandwidth and oversubscription

Impact: Internal fabric arbitration and oversubscription can introduce hidden contention even when external links look underutilized. Under specific port combinations, fabric congestion manifests as queueing jitter spikes or selective drops.

  • Verify: multi-port “fan-in” tests (many ingress ports converging on one egress) and “fan-out” tests (one ingress feeding many egress).
  • Pass signal: worst-case latency bound stays stable across port-combination stress, not just average throughput.

Common misconception

  • High throughput does not imply bounded worst-case latency.
  • Average latency can look excellent while max latency fails determinism requirements.
  • Testing only at light load misses mode changes and hidden contention paths.
[Diagram: Switch Internals That Set Worst-Case Latency. Determinism depends on bounded queueing: shared buffers + arbitration decide worst-case behavior.]
Figure F3. The variable part of delay is dominated by queueing and resource arbitration. Architecture determines when hidden contention appears and how large worst-case spikes can be.

H2-4 · Determinism toolkit: QoS, shaping, policing, and bounded latency

The objective is not “low average delay.” The objective is a provable upper bound on latency and jitter under worst-case contention. That bound is set by what can and cannot vary inside the switch.

Worst-case latency decomposition (engineering view)

Worst-case latency = Serialization (frame length ÷ line rate) + Switching path (forwarding/pipeline) + Queueing (contention).
Only the queueing term can “explode” without governance—so determinism is primarily achieved by policing, queue isolation, and egress shaping.
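The decomposition can be sketched numerically. The queueing bound below uses a deliberately simple conservative model (at most one max-size frame from each competing VL queued ahead of ours); the exact bound depends on the scheduler, so treat the numbers as illustrative.

```python
# Worst-case per-hop latency sketch: serialization + switching path + queueing.
# Queueing model is a simplified conservative assumption (one max-size frame
# per competitor ahead of ours); real bounds depend on the egress scheduler.

LINK_BPS = 100_000_000  # 100 Mbit/s link, illustrative

def serialization_us(frame_bytes: int) -> float:
    """Time to clock a frame onto the wire, in microseconds."""
    return frame_bytes * 8 / LINK_BPS * 1e6

def worst_case_us(frame_bytes, switch_path_us, competing_frames_bytes):
    queueing = sum(serialization_us(b) for b in competing_frames_bytes)
    return serialization_us(frame_bytes) + switch_path_us + queueing

# Example: 200-byte critical frame, 10 µs pipeline, three 1518-byte competitors
print(round(worst_case_us(200, 10.0, [1518, 1518, 1518]), 1))  # 390.3
```

Note that the queueing term (here ~364 µs) dwarfs serialization and pipeline delay, which is why governance of queueing is where determinism is won or lost.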

QoS classification → queue mapping

VLAN/PCP (and similar tags) are only useful if they deterministically map traffic into the intended queues and policies. The label is not the guarantee; the queue isolation and policy binding behind the label are.

Policing (ingress) → bounded input envelope

Policing ensures a VL or class cannot inject bursts that invalidate worst-case proof. A good setup provides observable evidence (drops/violations counters) so configuration and field behavior match.

Shaping (egress) → predictable schedule

Shaping turns competition into a predictable transmit schedule. Two practical scopes are common: per-port shaping to control total output behavior and per-class shaping to protect timing-critical traffic windows.
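One way to picture per-class shaping is as a reserved transmit window at the egress port. The toy gate below assumes a 1 ms cycle with the first 200 µs reserved for the critical class; the cycle length and split are invented for illustration, not a standard.

```python
# Toy sketch of per-class egress shaping as a reserved transmit window:
# inside the window only the critical queue may send; best-effort uses the rest.
# The 1 ms cycle and 200 µs window are illustrative values, not from a standard.

CYCLE_US = 1000
CRITICAL_WINDOW_US = 200  # start of each cycle reserved for the critical class

def may_transmit(queue: str, t_us: float) -> bool:
    in_window = (t_us % CYCLE_US) < CRITICAL_WINDOW_US
    if queue == "critical":
        return True  # critical may always send; the window guarantees it a slot
    return not in_window  # best-effort is held off during the reserved window

print(may_transmit("best_effort", 100))   # False: inside reserved window
print(may_transmit("best_effort", 500))   # True
print(may_transmit("critical", 100))      # True
```

This is the mechanism behind Figure F4's "reserved window": contention is converted into a predictable schedule rather than eliminated.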

Configuration checklist (lock these to make bounds provable)

  1. Classification rules fixed: VLAN/PCP mapping is deterministic with a safe default for unknown traffic.
  2. Queue mapping explicit: critical traffic maps to protected queues; avoid mixing bursty streams into the same queue bank.
  3. Ingress policing enabled: define what happens on violation (drop/mark/limit) and ensure per-policy counters exist.
  4. Shaping scope chosen: per-port shaping for overall stability; per-class shaping for timing-critical windows.
  5. Scheduling behavior known: understand priority handling under congestion (avoid surprises that starve a class).
  6. Mirroring plan: port mirroring supports capture without overloading the mirror destination (rate limits as needed).
  7. Counter set captured: per-port CRC/errors, per-policy drops, and link up/down events are collected and trended.
  8. Worst-case test profile: verify max latency/jitter under fan-in/fan-out and burst alignment scenarios.
  9. Feature toggles validated: check max latency when mirroring/diagnostics are enabled to avoid hidden mode changes.
[Diagram: Where Jitter Comes From (Contention) — and How Shaping Helps. Shaping reduces contention-driven variability by creating predictable transmit opportunities for timing-critical traffic.]
Figure F4. Time-axis illustration: without shaping, burst alignment inflates queueing and jitter; with shaping, reserved windows keep critical traffic timing predictable.

H2-5 · Redundancy: dual network A/B, fault containment, and failover behavior

Dual-plane A/B redundancy is not “two cables.” It is two independent fault-containment domains designed to keep a localized issue (a bad port, a flapping link, or a bursty stream) from degrading deterministic behavior across the network. This section stays strictly on what can be enforced and observed at the switch.

A/B isolation principles (switch viewpoint)

  • Independent PHY/port domains: Plane A link faults must not toggle Plane B link state or counters.
  • Independent power/reset domains (principle): a brown-out or reset event in one plane should not restart the other plane.
  • Configuration mirrored but controlled (principle): policies should match across A/B, but mirroring must avoid “copying a mistake everywhere.”
  • Evidence-first operation: every containment action should have measurable signals (events/counters) that prove why it occurred.

Fault → observable signal → containment action

Fault pattern | Observable signals (switch-side) | Containment strategy (switch-side)
Single-port jitter / micro-bursts | Queue drops rise, latency spikes on affected egress, policing violations (if enabled) | Enable/tighten policing + shaping, isolate mapping to protected queues, rate-limit offender
Link flap (up/down oscillation) | Rapid link events, error spikes around transitions, intermittent drops | Port isolate / shut, hold-down policy (principle), capture events + counters for root cause
Plane-specific frame loss | CRC/symbol errors, per-port error counters, plane-A-only drops (plane B clean) | Keep fault inside the plane: isolate bad port, verify independent PHY domain, alert with plane tag
Policing violations | Policing drop/violation counters climb; other traffic sees jitter relief | Treat as misbehaving source: keep policy strict, export evidence, isolate if recurring
Storm / flooding pattern | Storm-control hits, port utilization spikes, broad performance degradation without single-VL signature | Storm control / rate limiting (principle), port isolate, mirror for evidence capture
Diagnostics side-effects | Mirror destination congestion; analysis port drops; “measurement changes behavior” | Rate-limit mirroring (principle), schedule capture windows, keep diagnostics out of critical paths

How to verify A/B containment (switch-only evidence)

  1. Inject a controlled fault into Plane A (flap a link, burst a port, or violate policing).
  2. Confirm Plane B counters and link state remain stable (no mirrored error signatures).
  3. Validate containment actions are local: the switch isolates the offender and the blast radius is limited to the plane/port domain.
  4. Record evidence: link events + CRC/error counters + policing drops + queue drops (as supported).
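The containment check in steps 2 and 3 reduces to comparing counter snapshots on the healthy plane before and after the fault. The sketch below assumes hypothetical counter names and a dict-based snapshot; the real source would be whatever counter export the switch provides.

```python
# Sketch of the A/B containment check: snapshot Plane B counters before and
# after a fault is injected into Plane A, then confirm Plane B stayed quiet.
# Counter names and the snapshot mechanism are hypothetical.

def plane_is_stable(before: dict, after: dict, tolerances: dict) -> bool:
    """True if every monitored counter on the plane stayed within tolerance."""
    return all(after[k] - before[k] <= tolerances.get(k, 0) for k in before)

plane_b_before = {"crc_errors": 3, "link_flaps": 0, "queue_drops": 120}
# ... inject fault into Plane A here, then re-read Plane B counters ...
plane_b_after = {"crc_errors": 3, "link_flaps": 0, "queue_drops": 121}

ok = plane_is_stable(plane_b_before, plane_b_after,
                     tolerances={"queue_drops": 5})  # allow minor noise
print("Plane B contained:", ok)  # Plane B contained: True
```

The tolerance map matters: some counters (drops under load) tick normally, while others (link flaps, CRC on a clean plane) should stay at exactly zero.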

Boundary note: this page covers switch-side containment and evidence. End-system or application-level fusion/voting/duplicate handling belongs to system pages, not this switch explainer.

[Diagram: Dual Network A/B — Fault Containment Zones (Switch View). Goal: keep faults local to a plane/port domain with isolation + rate limiting + evidence counters.]
Figure F5. Dual-plane redundancy is two separate containment zones. The switch enforces isolation locally and exposes counters/events to prove separation.

H2-6 · Timestamping in an AFDX switch: IEEE 1588 PTP (what to do and what to avoid)

PTP accuracy depends less on the protocol name and more on where timestamps are taken and how load-dependent delay is handled. A switch must keep timestamping close to the physical boundary (MAC/PHY), prevent queueing from contaminating timestamps, and avoid asymmetry that turns into systematic offset.

Three must-check points (switch-side)

  1. Timestamp point: hardware timestamps taken at the MAC/PHY boundary (not in software paths).
  2. Queue influence: verify whether congestion changes timestamp stability; queueing variability must not masquerade as timing drift.
  3. Link symmetry: asymmetric paths or rate mismatches introduce systematic errors that cannot be “averaged away.”

Residence time (concept)

Residence time is the time a PTP event frame spends inside the switch (ingress to egress). If this internal delay varies with load, it appears as jitter in timing. The core engineering goal is to keep this behavior stable and observable.

Correction field (concept)

Some switch designs account for internal residence effects by applying a correction concept. The key practical test is simple: does timing error stay stable when the switch is stressed, or does it drift with queueing and arbitration?

Common pitfalls

  • Software timestamping: acceptable at light load, but error grows sharply under contention.
  • Measuring via a mirror port as “ground truth”: mirroring can reorder or rate-limit and alter timing observability.
  • Congestion-sensitive timestamps: queueing contaminates the apparent timing and looks like “clock drift.”
  • Uncontrolled asymmetry: mismatched link paths or rate conversion introduces systematic offset.
  • PHY delay drift: temperature/conditions shift PHY delays; trending counters and stability checks matter.

Practical verification (no tool dependency)

  1. Establish a baseline offset/jitter at light load.
  2. Apply a controlled contention profile (fan-in or burst traffic) while keeping links stable.
  3. Compare offset/jitter before vs during stress; large changes indicate queue influence or timestamp point issues.
  4. Correlate with switch evidence: congestion counters, drops, and link events around the error window.
  5. If errors scale with load, prioritize verifying hardware timestamp position and the delay path through queues.
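Steps 1 through 3 amount to comparing offset statistics between a light-load baseline and a stress window. The sketch below uses invented sample data and a hedged heuristic threshold; the actual pass criterion should come from the system timing budget.

```python
# Sketch of the baseline-vs-stress PTP comparison: if offset jitter grows
# sharply under contention, suspect queue influence on the timestamp path.
# Sample data and the sensitivity threshold are illustrative.

import statistics

def offset_stats(samples_ns):
    return statistics.mean(samples_ns), statistics.pstdev(samples_ns)

baseline = [40, 42, 38, 41, 39, 40]       # ns offsets at light load
stressed = [40, 95, 37, 160, 42, 210]     # ns offsets under fan-in stress

m0, s0 = offset_stats(baseline)
m1, s1 = offset_stats(stressed)
load_sensitive = s1 > 3 * s0 + 5  # hedged heuristic, not a standard criterion
print(f"baseline sigma={s0:.1f} ns, stress sigma={s1:.1f} ns, "
      f"load-sensitive={load_sensitive}")
```

A load-sensitive result points at software timestamping or a queue-contaminated timestamp path rather than genuine clock drift.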

SyncE may help reduce wander/jitter in some designs, but it is treated as optional support here—not a standalone tutorial.

[Diagram: IEEE 1588 PTP Timestamp Path (Inside the Switch). Key checks: hardware timestamp at MAC/PHY, stable residence behavior under load, and controlled link symmetry.]
Figure F6. Timestamping quality is set by the timestamp point and load sensitivity. Queueing variability and path asymmetry are the most common sources of large timing error.

H2-7 · Ethernet PHY & interfaces: what impacts integrity (EMI, BER, link stability)

Link integrity is part of determinism: if the PHY layer is marginal, the network can look “random” even with perfect shaping and policing. This section focuses on practical PHY and interface boundaries that affect link stability, and on switch-side evidence that narrows root cause without turning into a full EMC tutorial.

PHY families: practical selection boundaries

100BASE-TX

  • Use when: a stable, well-understood copper link is needed with simpler signal conditioning.
  • Failure signature: CRC errors and occasional drops often appear before frequent link flaps.
  • Switch evidence: CRC trend + link event rate distinguish margin loss vs intermittent connection.

1000BASE-T

  • Use when: higher throughput is required and cabling/connector quality is controlled.
  • Engineering reality: more sensitive to cable/connector changes and environment-driven margin shrink.
  • Switch evidence: symbol/alignment/PCS-type errors (if exposed) + retrain / flap events matter.

1000BASE-X

  • Use when: the medium is not classic copper, or a serial link boundary is preferred.
  • Trade: moves sensitivity away from magnetics/cable toward module/serial link conditions.
  • Switch evidence: link stability + error counters still determine whether issues are physical or policy-driven.

MAC-PHY interfaces: boundaries that affect stability (not a layout tutorial)

RGMII

Practical boundary: parallel timing sensitivity can turn board-level variation into intermittent errors. If link issues correlate with temperature or vibration windows, verify interface stability assumptions.

SGMII

Practical boundary: a serial interface often improves consistency, but relies on stable clocking and correct negotiation/config. Confirm that the link does not “retrain” under stress.

USXGMII

Practical boundary: multi-rate flexibility increases configuration surface. In deterministic networks, avoid silent mode changes and ensure the switch exposes clear state and counters for link behavior.

Engineering indicators that matter (switch-side interpretation)

  • BER (practical): manifests as rising CRC/symbol errors and eventually as drops or retrains; trends are more valuable than single snapshots.
  • Jitter tolerance (practical): poor tolerance often shows as intermittent errors under specific load/temperature/vibration windows.
  • Temperature drift (practical): error counters grow with time/temperature even when link stays “up.”
  • Cable/connector sensitivity: link flap + error spikes align with mechanical events or environmental transitions.

Symptoms → most likely cause → switch-side evidence

Symptom | Most likely cause | What to check on the switch
CRC errors climb, link stays up | Margin shrinking (noise/temperature/cable quality) | CRC trend vs temperature/time, symbol/alignment errors (if available), drops staying low
Link flap with error spikes | Intermittent connection or negotiation boundary instability | Link up/down event rate, retrain counts, error bursts aligned to events
Alignment / symbol errors appear | PHY-level integrity loss under specific conditions | Symbol/alignment counters (if exposed), correlation to load/temperature windows
Drops rise without PHY errors | Congestion/policy governance rather than physical integrity | Port drops vs policing drops vs shaping stats; link events remain quiet
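The symptom table above is effectively a decision on counter deltas over an observation window. The sketch below expresses that decision directly; counter names and the zero/nonzero thresholds are illustrative placeholders for whatever the switch exposes.

```python
# Sketch of the symptom-to-fault-domain decision from the table above,
# expressed as counter deltas over an observation window.
# Names and thresholds are illustrative, not from any real switch API.

def classify(delta: dict) -> str:
    crc, flaps, drops = delta["crc"], delta["link_flaps"], delta["drops"]
    if flaps > 0 and crc > 0:
        return "intermittent connection / negotiation boundary"
    if crc > 0 and flaps == 0:
        return "margin loss (noise/temperature/cable)"
    if drops > 0 and crc == 0 and flaps == 0:
        return "congestion or policy governance"
    return "no clear signature; widen the window"

print(classify({"crc": 14, "link_flaps": 0, "drops": 2}))
# margin loss (noise/temperature/cable)
```

The value of encoding this is repeatability: field teams apply the same separation of margin loss, intermittent connection, and congestion every time.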
[Diagram: PHY Integrity Chain — Where Errors Appear (Switch Evidence). Use counters + event timing to separate margin loss, intermittent connection, and congestion-driven symptoms.]
Figure F7. Integrity problems surface as error counters and link events. The switch can provide evidence that narrows the fault domain without expanding into a full EMC course.

H2-8 · Diagnostics & health monitoring: counters, mirroring, built-in tests, event logs

A deterministic switch should not only forward traffic—it should prove health and shorten field triage. The most valuable capability is a structured evidence chain: counters → health decisions → event logs → maintenance export, so intermittent faults can be reproduced and contained.

Observability checklist (what a good switch exposes)

Per-port

  • CRC / symbol errors
  • Link up/down (with timestamps)
  • Drops (port/queue, if exposed)
  • Queue indicators (if available)

Per-class / policy

  • Policing drops (violations)
  • Shaping statistics (if supported)
  • Priority/class counters
  • Policy hit-rate evidence

Evidence capture

  • Mirroring / SPAN strategy
  • Rate limits (avoid new bottleneck)
  • Windowed capture (principle)
  • Export path to maintenance

BIT / BIST + logs

  • Power-up self-test (BIT) with record
  • Periodic self-test (BIST) (concept)
  • Loopback / signature checks (concept)
  • Event logs with snapshots

“Diagnostic dashboard” cards (metric → trigger → action)

Link stability

Metric: link up/down + retrain

Trigger: burst of events in a short window or repeatable periodic flaps

Action: isolate the port/plane, capture counters snapshot, verify cable/connector domain

Integrity errors

Metric: CRC / symbol / alignment

Trigger: monotonic growth trend or temperature-correlated escalation

Action: flag as margin risk, trend over time, correlate to link events and load windows

Congestion evidence

Metric: drops / queue indicators

Trigger: drops rising without PHY error growth

Action: separate port drops vs policy drops; tighten shaping/policing where needed

Policy violations

Metric: policing drops / shaping stats

Trigger: sustained violations on specific classes/flows

Action: treat as misbehavior evidence; keep containment local and export proof

Mirroring strategy

Metric: mirror enable state + mirror port utilization

Trigger: mirror port becomes congested or capture changes traffic behavior

Action: rate-limit or window captures; keep diagnostics out of critical paths

BIT/BIST evidence

Metric: self-test status + last-pass timestamp

Trigger: any self-test failure or repeated marginal warnings

Action: log snapshot, isolate domain, re-run targeted tests, export event package

Fastest field triage (5 steps, switch-side)

  1. Check link events first: flap/retrain indicates a physical or negotiation boundary.
  2. Check integrity counters: CRC/symbol/alignment growth pattern reveals margin vs intermittent faults.
  3. Separate drops: port/queue drops vs policing drops to distinguish congestion from integrity.
  4. Capture evidence only when needed: enable mirroring in windows and watch mirror port load.
  5. Commit an event package: logs + counter snapshots around the incident enable traceable maintenance.
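The five steps above can be sketched as a fall-through sequence: each step inspects switch evidence and either names the fault domain or passes to the next. Counter names are hypothetical placeholders for whatever the switch exposes.

```python
# The 5-step field triage as a minimal fall-through sketch.
# Evidence keys are hypothetical; map them to the switch's real counter set.

def triage(ev: dict) -> str:
    if ev.get("link_events", 0) > 0:
        return "step 1: physical / negotiation boundary (link events)"
    if ev.get("crc", 0) + ev.get("symbol", 0) > 0:
        return "step 2: integrity margin or intermittent fault (error growth)"
    if ev.get("policing_drops", 0) > 0:
        return "step 3: policy violation (misbehaving source)"
    if ev.get("queue_drops", 0) > 0:
        return "step 3: congestion (port/queue drops, PHY quiet)"
    return "steps 4-5: capture a mirroring window and commit an event package"

print(triage({"link_events": 0, "crc": 0, "symbol": 0,
              "policing_drops": 7, "queue_drops": 0}))
# step 3: policy violation (misbehaving source)
```

The ordering is the point: physical-layer evidence is checked before policy evidence, so congestion symptoms are never blamed on a healthy PHY.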
[Diagram: Diagnostics Data Flow — From Counters to Maintenance Evidence. Design goal: counters and self-tests feed health decisions; events capture snapshots; maintenance exports a traceable package.]
Figure F8. A switch becomes field-ready when it can produce an evidence chain: metrics → triggers → snapshots → logs → maintenance export.

H2-9 · Configuration & verification: VL tables, policing rules, and change control

Configuration is where “latent faults” are born: a small BAG, MTU, queue mapping, or mirror setting change can silently shift worst-case behavior without breaking day-to-day operation. This section treats the switch as a governed artifact: configuration layers are explicit, validation is repeatable, and change control is designed for predictable rollback.

Configuration layers (what each layer controls)

Port-level

  • Link behavior baseline (events, stability, counters exposure)
  • Mirror / SPAN policy placement and rate discipline
  • Local containment knobs (storm-limiting principles, if supported)

Queue / QoS-level

  • Queue count and class-to-queue mapping
  • Congestion behavior boundaries (what happens under load)
  • Egress shaping granularity (port/class where applicable)

VL / Policing-level

  • VL table completeness (binding + parameters)
  • BAG and frame size boundary (MTU alignment)
  • Policing action + visibility (drops must be measurable)

Time sync (switch config points)

  • PTP mode selection (as configured on the switch)
  • Timestamp behavior settings exposed by the device
  • Monitoring thresholds for drift symptoms (principle)

Verification rules (repeatable, switch-side)

  • Schema & references: every port/class/VL reference is resolvable; no orphan entries.
  • Consistency across planes: A/B plane configurations match on all “must-equal” items.
  • Visibility is mandatory: any policing/shaping decision must leave counters or events.
  • Invariants must hold: critical classes cannot be pushed into uncontrollable congestion by mapping.
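The "schema & references" and "visibility" rules above lend themselves to an automated pre-deploy check. The sketch below validates a toy configuration: every VL must reference an existing port and queue, and any policing decision must bind an observable counter. Field names are an invented schema, not a real switch format.

```python
# Sketch of pre-deploy verification: resolvable references and mandatory
# visibility for policing. The config schema here is invented for illustration.

def verify(config: dict) -> list:
    errors = []
    ports, queues = set(config["ports"]), set(config["queues"])
    for vl in config["vls"]:
        if vl["port"] not in ports:
            errors.append(f"VL {vl['id']}: unresolved port {vl['port']}")
        if vl["queue"] not in queues:
            errors.append(f"VL {vl['id']}: unresolved queue {vl['queue']}")
        if vl.get("policing") and not vl.get("violation_counter"):
            errors.append(f"VL {vl['id']}: policing without observable counter")
    return errors

cfg = {"ports": ["p1", "p2"], "queues": ["q0", "q1"],
       "vls": [{"id": 101, "port": "p1", "queue": "q0",
                "policing": True, "violation_counter": "vl101_drops"},
               {"id": 102, "port": "p9", "queue": "q1", "policing": True}]}
for e in verify(cfg):
    print(e)
```

Running this over every candidate configuration version turns "visibility is mandatory" from a review guideline into a hard gate.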

Copyable configuration checklist (layered)

Port-level (4)

  1. Port role and plane (A/B) are explicitly labeled and audited.
  2. Link events are recorded (up/down, retrain if available).
  3. Error counters are readable/exportable (CRC, symbol/alignment if exposed).
  4. Mirror/SPAN ports are isolated from critical forwarding and have a discipline policy.

Queue / QoS-level (4)

  1. Class-to-queue mapping is explicit and reviewable (no “default ambiguity”).
  2. Critical traffic is not co-located with best-effort in a way that amplifies tail latency.
  3. Congestion behavior is defined (what drops, what remains bounded).
  4. Shaping granularity matches the intended jitter control boundary (port/class as supported).

VL / policing-level (6)

  1. VL table entries are complete: ID, binding, and key parameters are present.
  2. BAG intent is documented (what behavior it enforces, not just the value).
  3. Frame size boundary is aligned: MTU assumptions match the link domain (avoid silent drops).
  4. Policing action is defined (drop/mark) and produces measurable stats.
  5. Per-VL or per-class stats exist for violations (policing drops are observable).
  6. Stats reset and observation windows are defined for comparisons (trend-friendly).

Time sync (2)

  1. PTP mode is explicitly configured and traceable per plane/port where applicable.
  2. Drift symptoms are monitored with thresholds (principle: detect before it becomes operationally “random”).

The 5 most commonly missed items (latent fault creators)

  • BAG / MTU boundary: misalignment leads to silent drops or hidden tail-latency shifts.
  • Queue mapping: critical traffic accidentally shares congestion with noncritical flows.
  • Mirror port discipline: no rate/window control turns diagnostics into a new bottleneck.
  • PTP mode drift risk: mode/config mismatch creates “good in lab, bad in field” timing error.
  • Alarm thresholds: without triggers, faults remain invisible until they become incidents.

Change control (governance without security scope creep)

Version & diff

Every change produces a version ID and a human-auditable diff grouped by layer (port / queue / VL / time sync). The goal is clarity: what changed, where, and why.

Pre-deploy validation

Run the verification rules: schema/consistency, visibility, and invariants. If a policing/shaping decision cannot be observed, it cannot be safely governed.

Rollback readiness

Define “known-good” versions and monitoring triggers. Rollback is a planned step with clear conditions, not an emergency improvisation.

[Diagram: Configuration Governance Flow — Preventing Latent Faults. Keep the flow simple: version + validate + deploy + monitor. Rollback is planned and trigger-driven.]
Figure F9. A change-control flow designed for deterministic behavior: validation and monitoring define rollback conditions before deployment.

H2-10 · Validation & worst-case testing: proving bounded latency and robustness

“Deterministic” must be proven under stress. Validation should separate (1) functional correctness, (2) worst-case performance boundaries, and (3) robustness to faults. Acceptance criteria should be expressed as bounds, percentiles, and hold-time windows—not averages.

How to express acceptance (avoid “average-only”)

  • Upper bound: the measurable worst-case (max) must stay below a defined ceiling.
  • Tail percentiles: P99 / P99.9 reveal whether the tail explodes under contention.
  • Hold-time window: bounds must remain valid for a sustained duration under a defined stress profile.
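These three criteria can be sketched as a single acceptance function over a latency trace: the trace passes only if its max and tail percentiles stay under their ceilings for the whole window. The nearest-rank percentile method and the ceilings are illustrative choices.

```python
# Sketch of bounds-plus-percentiles acceptance over a latency trace.
# Percentile method (nearest rank) and ceiling values are illustrative.

def percentile(sorted_xs, p):
    """Nearest-rank percentile on an already-sorted sample."""
    idx = max(0, int(round(p / 100.0 * len(sorted_xs))) - 1)
    return sorted_xs[idx]

def accept(latencies_us, max_ceiling, p99_ceiling):
    xs = sorted(latencies_us)
    return xs[-1] <= max_ceiling and percentile(xs, 99) <= p99_ceiling

trace = [120.0] * 990 + [180.0] * 9 + [240.0]   # mostly 120 µs, rare spikes
print(accept(trace, max_ceiling=250.0, p99_ceiling=200.0))  # True
print(accept(trace, max_ceiling=200.0, p99_ceiling=200.0))  # False: max is 240 µs
```

Note how the second call fails on the single 240 µs outlier even though the average is excellent: this is exactly the "average-only" trap the criteria are designed to avoid.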

Validation checklist (test → expected behavior → fail criteria)

  • Policing violation injection
    Expected: violations are contained; policing stats rise; critical traffic remains stable.
    Fail: critical class exceeds the latency ceiling or shows unexpected drops.
  • Class/queue mapping verification
    Expected: critical class stays isolated from best-effort contention under load.
    Fail: tail latency inflates when noncritical traffic is added.
  • Worst-case fan-in to a single egress
    Expected: bounded latency holds; jitter remains within the defined ceiling.
    Fail: max or P99.9 exceeds limits; sustained instability over the hold-time window.
  • Micro-burst stress (short bursts)
    Expected: queues absorb bursts as designed; drops remain bounded and observable.
    Fail: unexpected drops; queue behavior cannot be explained via counters.
  • Sustained congestion storm (noncritical)
    Expected: containment is local; critical class remains bounded and measurable.
    Fail: congestion spreads across planes/ports; critical latency breaks.
  • Link flap injection (single port / plane)
    Expected: fault is observable; effects do not cascade; recovery is explainable.
    Fail: fault cascades to other ports/planes or leaves no evidence trail.
  • Error frame injection (CRC/error bursts)
    Expected: error counters rise; alarms trigger as configured; critical forwarding remains bounded.
    Fail: counters/alarms fail to capture the event, or bounded behavior breaks.
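For the worst-case fan-in test, it helps to have an analytical ceiling to compare the bench against. The sketch below uses a deliberately crude model, assumed for illustration only: each contending VL contributes one max-size frame queued ahead at a FIFO egress, switching latency is ignored, and real analysis would use network calculus instead.

```python
# Sketch: crude per-hop fan-in delay bound. Assumes one max-size frame per
# contending VL ahead in a FIFO egress queue; switching latency ignored.

def serialization_us(frame_bytes, link_mbps):
    """Time to clock one frame (plus 20 bytes preamble + IFG) onto the wire,
    in microseconds when link_mbps is in Mbit/s."""
    return (frame_bytes + 20) * 8 / link_mbps

def fanin_bound_us(vl_max_frame_bytes, link_mbps):
    """Upper bound: every contending VL contributes one max-size frame."""
    return sum(serialization_us(b, link_mbps) for b in vl_max_frame_bytes)

# Example: 8 VLs with 1518-byte max frames fanning into a 100 Mbit/s egress
bound = fanin_bound_us([1518] * 8, 100)   # ≈ 984 µs
```

If the bench measures a max or P99.9 above a bound like this, either the model is missing a contention source or the queue mapping is not what the configuration claims — both are findings worth an evidence trail.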

Context note

Validation is typically executed within the target compliance and environmental constraints, but the focus here is strictly on switch-side bounded latency, robustness, and evidence generation.

[Figure: Worst-Case Test Bench — Proving Bounded Latency & Robustness. A traffic generator (load, bursts) drives Plane A / Plane B of the AFDX switch into worst-case fan-in; an analyzer records latency, jitter, and drops; fault injection covers link flaps and error frames. Focus on worst-case: fan-in contention + bursts + injected faults; report bounds, tail percentiles, and hold-time.]
Figure F10. A practical bench separates correctness, worst-case boundaries, and robustness while keeping evidence traceable via counters and event logs.

H2-11 · BOM / IC selection checklist (switch ASIC, PHY, clock, PMIC) — criteria + concrete part numbers

This section turns “determinism + evidence” into a procurement-ready BOM shortlist. Selection is driven by measurable capabilities (policing/shaping/queues, timestamp behavior, counters/diagnostics, and power sequencing). Part numbers below are practical candidate pools—final fit must be confirmed against the latest datasheet feature tables and temperature / lifecycle requirements.

1) Switch ASIC / platform (what directly impacts bounded latency and evidence)

Selection criteria (use as pass/fail bullets)

  • Port count + speed mix: 100M / 1G / mixed, plus spare ports for monitoring and growth.
  • Queueing resources: queue count, queue isolation, and scheduler options (priority/WRR/etc. as exposed).
  • Shaping capabilities: ability to shape per port and/or per traffic class (jitter control boundary).
  • Ingress policing support: rate enforcement per flow/class; defined violation actions (drop/mark) with stats.
  • Fabric headroom: oversubscription behavior and worst-case queue build-up conditions.
  • Store-and-forward vs cut-through behavior: understand tail-latency implications (verify on bench).
  • Observability granularity: per-port / per-queue / per-class counters; drops must be explainable.
  • Mirroring/SPAN: filter capability and discipline policy to prevent diagnostics from causing congestion.
  • Timestamp support: hardware timestamp behavior and the path points that remain stable under load.
  • Operating envelope: temperature range, supply rails, package/manufacturability, and lifecycle constraints.

Concrete candidate part numbers (shortlist pool)

  • Microchip LAN9662 (TSN-capable managed switch platform; verify shaping/queue stats and timestamp features)
  • NXP SJA1105 family (e.g., SJA1105EL/QL/TEL variants; verify exact TSN/PTP feature set by variant)
  • Microchip KSZ9567 / KSZ9477 (managed switch families; verify time-sync and shaping capabilities for the target design)
  • Marvell / Infineon 88Q6113 (high-port automotive-class switch; verify feature set and availability for program needs)
  • Broadcom BCM53xx / BCM56xx families (managed Ethernet switch families; verify industrial availability and feature licensing)

Tip: keep at least two switch candidates in the RFQ to avoid “single-vendor lock” during supply events.

2) Ethernet PHY (link integrity, BER symptoms, and diagnosability)

Selection criteria (what prevents “mystery link flap”)

  • Speed support: 100BASE-TX vs 1000BASE-T vs 1000BASE-X matched to the architecture.
  • MAC–PHY interface: MII/RMII/RGMII/SGMII (timing + layout risk boundary).
  • Diagnostic registers/counters: CRC/alignment/symbol errors and link up/down events.
  • Cable/connector sensitivity: margin against return loss, crosstalk, and harness variability.
  • Temperature behavior: stability of link and error rate across the operating range.
  • EMI headroom: enough margin so small layout/harness changes do not cause field dropouts.
  • Loopback features: local/remote loopback to shrink the field debug path.
  • Power rails: rail count, IO voltages, and susceptibility to supply noise.
  • Second-source plan: two PHY candidates with equivalent interface strategy.

Concrete PHY part numbers (grouped by common use)

  • 10/100 copper (classic control networks): TI DP83848, Microchip KSZ8081 / KSZ8091
  • Gigabit copper (growth headroom): TI DP83867 / DP83869, Microchip KSZ9031 / KSZ9131, Marvell 88E1512
  • SGMII / 1000BASE-X (optical/backplane style): Microchip VSC85xx family (variant-dependent)

Debug mapping rule: symptom → switch counter. For example: CRC spikes → error counters; repeated link up/down → link event log; sporadic bursts → queue drop stats (if exposed).
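The symptom-to-counter rule can be kept as a small triage table so field procedures stay consistent. The counter names below are illustrative placeholders, not any vendor's register map; they would be mapped to the actual platform MIB or CLI.

```python
# Sketch: first-pass triage map from field symptom to the switch evidence to
# read first. Counter names are illustrative; map them to your platform's MIB.

TRIAGE = {
    "crc_spike":      ["phy.crc_errors", "phy.symbol_errors", "link.events"],
    "link_flap":      ["link.events", "phy.signal_quality", "psu.rail_status"],
    "sporadic_drops": ["queue.drops_per_class", "policer.violations"],
    "jitter_spike":   ["queue.occupancy_max", "queue.drops_per_class"],
}

def triage(symptom):
    """Return the ordered evidence list for a symptom, or a safe default."""
    return TRIAGE.get(symptom, ["counters.snapshot_all"])
```

Keeping the order explicit matters: the first counter read should be the cheapest one that can rule the symptom in or out before any mirror session is enabled.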

3) Clocking (jitter budget → timestamp stability and link behavior)

Selection criteria (board-level clock only)

  • Clock tree needs: number of outputs and required frequencies for switch/PHY/management domains.
  • Jitter relevance: edge uncertainty can show up as timestamp instability or PHY recovery sensitivity.
  • Jitter cleaning boundary: use a jitter attenuator when the reference quality is variable or multi-domain.
  • Temperature stability: drift and aging behavior across the operating range.
  • Input strategy: single reference vs multiple references (switching must not introduce discontinuities).
  • Supply sensitivity: vulnerability to rail noise (keep rails clean and monitored).
  • Integration: package, output standards, and layout constraints.
  • Evidence hooks: ability to expose lock status / fault pins (if used) for event logging.

Concrete clock / jitter parts (candidate pool)

  • Silicon Labs / Skyworks Si5345 (jitter attenuator / clock generator family)
  • Silicon Labs / Skyworks Si5341 (same class, output/feature variant)
  • TI LMK05318 (clock generator / jitter cleaner class; verify IO standards needed)
  • Renesas (IDT) 8A34xxx family (timing / jitter solutions; variant-dependent)

Practical rule: if timestamp error grows under congestion or temperature swings, check clock stability + queueing effects together. Clocking alone rarely explains everything, but weak clocking makes the tail worse.
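A quick sanity check when separating clock effects from queueing effects is the free-run drift accumulated between sync corrections. The figures below are illustrative, not requirements; conveniently, ppm multiplied by milliseconds yields nanoseconds directly.

```python
# Sketch: back-of-envelope timestamp error from oscillator drift between two
# sync events. Congestion-induced residence-time variance is budgeted
# separately; values here are illustrative.

def drift_error_ns(drift_ppm, sync_interval_ms):
    """Worst-case free-run error between sync corrections (ppm x ms = ns)."""
    return drift_ppm * sync_interval_ms

# Example: a 20 ppm oscillator free-running over a 125 ms sync interval
err = drift_error_ns(20, 125)   # 2500 ns of drift alone
```

If the measured timestamp error under congestion is much larger than this drift term, the remainder is almost certainly queueing/residence-time variance, not the oscillator.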

4) PMIC / supervisor / sequencer (power-up order, reset evidence, power-fail logging)

Selection criteria (stay at board-level governance)

  • Sequencing: multi-rail ordering, delays, dependencies (avoid “random boot behavior”).
  • Voltage monitoring: UV/OV thresholds and accuracy; clear fault signaling.
  • Reset strategy: reset outputs and hold times; clean deglitch to prevent false resets.
  • Power-fail signaling: early warning to support event logging and controlled shutdown behavior.
  • PG aggregation: combining multiple rails into a deterministic “system ready” condition.
  • Fault latching: retain reset/power-fail reasons for post-event diagnosis.
  • Telemetry (if used): rail status visibility that supports health monitoring (do not overcomplicate).
  • Supply architecture: rail count, transient response needs, and manufacturability.
  • Second-source plan: at least one alternative supervisor/sequencer path.
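PG aggregation and fault latching, as listed above, amount to a deterministic readiness check with per-rail deadlines. This is a board-bring-up sketch with invented rail names and deadlines; a real design would drive this from the sequencer's actual PG pins and timing budget.

```python
# Sketch: deterministic "system ready" from per-rail power-good signals with
# deadlines, plus fault latching for post-event diagnosis. Rail names and
# deadlines are illustrative, not from any datasheet.

RAIL_DEADLINE_MS = {"1V0_core": 10, "1V8_io": 20, "3V3_phy": 30}

def system_ready(pg_time_ms):
    """pg_time_ms maps rail -> time PG asserted (None = never asserted).
    Returns (ready, latched_faults); faults record which rail missed its slot."""
    faults = [f"{rail}: PG missed {deadline} ms deadline"
              for rail, deadline in RAIL_DEADLINE_MS.items()
              if pg_time_ms.get(rail) is None or pg_time_ms[rail] > deadline]
    return (not faults, faults)
```

The latched fault strings are exactly the kind of reset-cause evidence that turns a "rare field reset" into an actionable record.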

Concrete PMIC / supervisor parts (candidate pool)

  • ADI / Linear LTC2937 (multi-rail supervisor / monitoring class)
  • ADI / Linear LTC2974 / LTC2977 (power system manager class; use when telemetry/control is required)
  • TI TPS3808 (supervisor / reset generator class)
  • TI TPS386000 (supervisor class; variant-dependent thresholds/features)
  • Maxim / Analog Devices MAX16052 (supervisor class; check availability/lifecycle)

Evidence rule: without power-fail and reset-cause visibility, “rare field resets” become unprovable and non-actionable.

5) Supporting parts (NVM + management MCU) — keep governance practical

NVM (config versions + event logs)

  • SPI NOR flash: Winbond W25Q128JV, Macronix MX25L128xx, Micron MT25Q series
  • High-endurance option (if frequent writes): Infineon/Cypress FM25Vxx FRAM family

Keep log format simple: version ID, event type, timestamp, port/class counters snapshot.
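The simple log format above can be pinned down as a fixed record layout so field logs stay diff-able across firmware versions. The field and type names here are illustrative; one JSON line per event is only one reasonable encoding for NOR/FRAM-backed logs.

```python
# Sketch: minimal event-log record matching the fields listed above
# (version ID, event type, timestamp, counters snapshot). Names illustrative.

import json
from dataclasses import dataclass, asdict

@dataclass
class EventRecord:
    config_version: str   # version ID of the configuration active at the event
    event_type: str       # e.g. "policing_drop", "link_flap", "reset"
    timestamp_ns: int     # hardware timestamp if available
    counters: dict        # port/class counter snapshot at event time

    def to_line(self):
        """One JSON line per event: trivially appendable and diff-able."""
        return json.dumps(asdict(self), sort_keys=True)
```

Sorting keys on serialization keeps byte-for-byte comparisons meaningful when two log extracts are diffed during an investigation.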

Management MCU (control-plane only)

  • ST STM32H7 (high-performance management/control)
  • ST STM32F7 (mid-class management/control)
  • Microchip SAM E70 (industrial-grade MCU option)
  • NXP LPC55Sxx (MCU family; keep security features out of scope here)

The MCU role stays narrow: configuration load, health readout, event log shipping, and safe rollback triggers.

RFQ “minimum information” (copy/paste checklist)

Target: AFDX/ARINC 664 switching platform (switch-side determinism + evidence)

Ports: ____ total   Speed mix: [100M] [1G] [mixed]   Spare for monitoring: [yes/no]
Queues/classes: ____ queues (or ____ priority levels)   Class-to-queue mapping: [explicit required]
Shaping: [per-port] [per-class]  Granularity requirement: ____________________________
Policing: [required/optional]  Violation action: [drop/mark]  Violation stats: [required]
Counters: [per-port] [per-queue] [per-class]  Drop attribution required: [yes/no]
Mirroring/SPAN: [required/optional]  Filter: [yes/no]  Mirror discipline: [rate/window]
Timestamp/PTP (switch config points only): mode: __________  HW timestamp: [required/optional]
Operating range: temperature: ________  lifecycle: ________  second-source: [required/optional]
Power sequencing: rails: ____  order constraints: _________________________________
Supervisor: reset outputs: ____  power-fail signal: [required/optional]  reset-cause latch: [yes/no]
NVM: type: [SPI NOR/FRAM]  capacity: ____  endurance requirement: __________________
Deliverables: datasheet + feature table confirmation + availability/lifecycle statement
[Figure: BOM Layering — What Must Exist to Prove Determinism. Switch ASIC (queues/shaping, policing/counters) → PHY blocks (MAC↔PHY, magnetics/cable) → clock tree (XO / jitter cleaner) → PMIC/sequencer (reset/PG, power-fail) → management MCU (config, logs) → NVM (config/events). The goal is BOM clarity: ASIC capabilities + diagnosable PHY + stable clock + controlled power + versioned config/logs.]
Figure F11. BOM layering view: the switch ASIC depends on PHYs, a stable clock tree, sequenced power/reset, and a management path with NVM for versioned configuration and event evidence.


H2-12 · FAQs (AFDX / ARINC 664 Switch)

These FAQs focus on switch-side determinism, governance, and evidence: bounded latency/jitter, VL/BAG policing, queue/shaping behavior, A/B fault containment, PTP timestamp stability, PHY integrity symptoms, and verification after configuration changes.

1) What is the practical “determinism difference” between an AFDX switch and a normal Ethernet switch?
An AFDX-oriented design is judged by provable worst-case behavior, not just average throughput. It relies on controlled injection and forwarding (VL governance with policing and shaping), queue isolation that prevents microburst tail spikes, and evidence hooks (drop/counter attribution, mirroring, logs). The outcome is bounded latency/jitter with explainable loss events.
2) How do VL and BAG work together to limit congestion and jitter?
A Virtual Link (VL) defines a controlled logical flow, and BAG enforces a minimum inter-frame spacing for that flow. Together, they cap injection rate so bursts cannot endlessly pile into queues. When the switch also applies consistent queue mapping and egress shaping, queue build-up becomes predictable and jitter is constrained to a measurable upper bound.
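The BAG check described above reduces to a gap test on arrival times. This is a conceptual sketch, not the ARINC 664 policing algorithm as specified: a fixed jitter tolerance and millisecond units are assumed for illustration.

```python
# Sketch: BAG conformance check on one VL's frame arrival times. A frame
# violates if its gap to the previous ACCEPTED frame is under the BAG minus
# a small jitter tolerance. Units (ms) and tolerance are illustrative.

def police_bag(arrivals_ms, bag_ms, jitter_ms=0.0):
    """Return (accepted, violations) lists of arrival times."""
    accepted, violations, last = [], [], None
    for t in arrivals_ms:
        if last is None or (t - last) >= (bag_ms - jitter_ms):
            accepted.append(t)
            last = t              # spacing is measured from accepted frames
        else:
            violations.append(t)  # drop/mark and bump the policing counter
    return accepted, violations
```

Example: with an 8 ms BAG, arrivals at 0, 2, 9, and 16 ms split into accepted {0, 9} and violations {2, 16} — exactly the attribution a per-VL policing counter should expose.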
3) What happens when policing triggers (drop/mark), and how can it be located quickly?
Policing triggers when traffic violates the configured limits (rate/BAG intent, class rules, or other enforcement boundaries exposed by the platform). The usual actions are drop or a policy mark, and the key is attribution. Check policing-drop counters by port/class/VL (if available), correlate timestamps with the traffic generator window, and use a filtered, rate-limited mirror session to confirm the violating frames without creating extra congestion.
4) How do store-and-forward and cut-through affect worst-case latency?
Store-and-forward adds a more consistent “frame buffering” component and can reject errored frames before forwarding, often making behavior easier to bound. Cut-through can reduce average latency, but tail behavior becomes more sensitive to congestion, error handling, and internal contention. The practical test is worst-case fan-in under load: measure max and high percentiles (e.g., P99.9), not only the mean.
5) Why can a poor queue configuration create “sporadic jitter spikes” even when average load looks fine?
Jitter spikes usually come from tail queueing: critical traffic shares a queue with bursty or poorly-shaped traffic, so short microbursts momentarily block the critical frames. The average load may stay low while the tail explodes. Fixes are switch-side: enforce deterministic class-to-queue mapping, reserve bandwidth via shaping, avoid head-of-line blocking patterns, and monitor per-queue drops/occupancy signals when the silicon exposes them.
6) What are common A/B dual-network fault modes, and how can the switch contain them?
Common modes include link flaps on one plane, localized congestion storms, or a single port generating errored frames that flood retry/processing. Switch-side containment is built on strict plane separation (independent PHY/power/config where possible), fast port isolation policies, storm control, and fault counters that trigger alarms. Verification should inject each fault while confirming that only one plane is affected and evidence is captured.
7) Why can PTP timestamp accuracy degrade under high load, and what should the switch avoid?
Under load, queueing increases and becomes more variable, so residence time through the switch fluctuates and adds timing uncertainty—especially if the timestamp path is not strictly hardware-based. Avoid software timestamps, avoid timestamping on mirror/SPAN paths, and avoid letting PTP packets fight in heavily congested queues. Prefer hardware timestamping with controlled queue priority and stable egress shaping.
8) Is hardware timestamping at the MAC or the PHY “more correct” for a switch design?
PHY-level timestamping is closer to the physical medium boundary and can reduce ambiguity from internal interface delays, but it increases integration and calibration complexity. MAC-level timestamping is simpler and widely supported, but it may include more variability if PHY delays drift with temperature. The right choice is the one with a documented timestamp point, stable behavior under congestion, and validated error across temperature/load sweeps.
9) What does a surge in PHY CRC error counters usually indicate?
A CRC spike almost always points to physical-layer integrity stress: cable/connector issues, magnetics margin, EMI coupling, return-loss problems, or supply noise that affects the PHY analog front-end. Correlate CRC with link up/down events and alignment/symbol errors, compare both link ends if available, and check whether errors rise with temperature, vibration, or specific harness routing. Then validate with a controlled swap test.
10) What is the most effective 5-step field packet-capture and fault isolation workflow?
Step 1: pin down the symptom type and time window (drops, jitter spikes, or link flap). Step 2: read switch counters (CRC/link/drops/policing). Step 3: enable a filtered, rate-limited mirror on the smallest set of ports. Step 4: reproduce with a worst-case traffic pattern or fault injection. Step 5: close the loop with a config change plus regression evidence and a recorded version ID.
11) After a configuration change, how can determinism be proven quickly without a full re-qualification?
Start with a strict diff: identify whether the change touched port/queue/VL/time-sync settings. Then run a minimal worst-case regression focused on the affected path: fan-in contention, burst shaping boundaries, and policing violations. Report determinism using max and high percentiles (not only average), over a defined duration window, while confirming drop counters remain attributable. Keep a rollback-ready configuration snapshot.
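The diff-to-regression mapping can be made mechanical so the minimal test scope is never a judgment call under time pressure. The key prefixes and suite names below are illustrative; real diffs would come from the versioned configuration store.

```python
# Sketch: map a config diff to the minimal worst-case regression scope.
# Key prefixes and test names are illustrative placeholders.

SCOPE = {
    "port.":  ["link_flap_injection", "error_frame_injection"],
    "queue.": ["fanin_worst_case", "microburst_stress"],
    "vl.":    ["policing_violation", "bag_conformance"],
    "ptp.":   ["timestamp_under_load"],
}

def regression_scope(changed_keys):
    """Union of test suites for every touched config area, order-stable."""
    tests = []
    for key in changed_keys:
        for prefix, suite in SCOPE.items():
            if key.startswith(prefix):
                tests += [t for t in suite if t not in tests]
    return tests
```

A change touching only `vl.*` keys then yields the policing and BAG tests, while an untouched `ptp.*` area contributes nothing — keeping the regression as small as the diff honestly allows.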
12) Which selection metrics are most often overlooked but can destroy verifiability?
The biggest misses are tail-focused and evidence-focused: worst-case latency/jitter (not just throughput), queue/shaping granularity, and counter visibility that can attribute drops by port/class/VL. Mirroring must support filtering and rate discipline, timestamps must remain stable under congestion, and power-fail / reset-cause visibility must exist for field events. Without these, failures become unprovable and non-actionable.