AFDX / ARINC 664 Switch (Deterministic Avionics Ethernet)
An AFDX/ARINC 664 switch is built for provable determinism: it controls traffic injection (VL/BAG policing), isolates and shapes queues to bound latency/jitter, and exposes counters/mirroring/logs so drops and spikes can be explained and verified. In practice, the “right” switch is the one that can keep worst-case behavior inside limits under load and faults, and still provide the evidence needed for fast field diagnosis and safe configuration changes.
H2-1 · What this page solves (AFDX switch boundary & outcomes)
This page focuses on the switch viewpoint: how an AFDX/ARINC 664 switch enforces deterministic forwarding with VL policing/shaping, supports A/B redundancy without fault spread, and exposes diagnostics that can be verified in test and in the field.
What a reader should get from this page
- Bounded latency and jitter: understand what sets the worst-case delay budget (serialization + switching + queueing) and how the switch constrains the variable part.
- Virtual Link governance: know how VL/BAG policing and shaping turn bursty traffic into predictable behavior that can be proven.
- A/B redundancy with fault containment: identify failure signals and isolation tools so a single bad port or stream cannot contaminate the whole network plane.
- Evidence-based diagnostics: counters, mirroring, and event records that create a traceable “proof chain” for acceptance testing and troubleshooting.
Typical reasons people land here
- “Latency looks fine on average, but jitter spikes appear under load.”
- “Drops occur only in certain modes—suspected policing/shaping or queue mapping issues.”
- “A/B redundancy exists, yet failover behavior is unclear or hard to verify.”
- “Field troubleshooting lacks evidence: no counters/mirroring workflow.”
What this page is NOT
- Not a MIL-STD-1553B / ARINC 429 / CAN (ARINC 825) tutorial.
- Not a mission-computer PCIe/NVMe architecture guide.
- Not a GPSDO/atomic-clock deep dive (timing sources are only referenced as needed for switch behavior).
- Not a full compliance handbook; it stays on engineering mechanisms and verifiable outcomes.
Engineering rule of thumb: determinism is proven at the worst case, not by averages. A switch must provide both traffic governance (policing/shaping) and observable evidence (counters/mirroring) to make that proof possible.
H2-2 · ARINC 664 / AFDX essentials in 90 seconds (VL, BAG, policing)
In AFDX (ARINC 664), a Virtual Link (VL) defines a controlled traffic flow. The Bandwidth Allocation Gap (BAG) limits how often frames may be sent, and the switch enforces policing and shaping so bursts cannot create unbounded queueing. This is the foundation of deterministic latency and jitter.
The practical goal is simple: convert “unpredictable bursts” into behavior that remains predictable under worst-case contention. Three control points work together—ingress policing, queue mapping, and egress shaping.
| Term | What it controls | Engineering failure mode if wrong |
|---|---|---|
| Virtual Link (VL) | Traffic identity: how frames are classified into policies, queues, and limits. | Wrong queue/policy; unintended drops or jitter spikes that only appear under load. |
| BAG | Pacing ceiling: the minimum interval between transmissions for a VL. | Burstiness pushes queueing delay upward; “average OK” but worst-case jitter breaks determinism. |
| Policing (ingress) | Compliance enforcement: what happens when a VL exceeds its allowed envelope. | Unbounded contention if absent; silent drops if too strict; hard-to-debug intermittent behavior. |
| Shaping (egress) | Scheduling discipline: how competing VLs share an output port in a predictable way. | Queueing becomes the dominant variable; jitter spikes emerge when multiple streams collide. |
Quick intuition: for a given VL, the average-rate ceiling scales roughly with frame size / BAG. Smaller BAG allows higher average throughput, but also raises the risk that burst alignment across VLs inflates queueing delay. Policing and shaping keep that “alignment risk” inside a verifiable bound.
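The intuition above can be sketched in a few lines. This is a minimal calculation, not a configuration tool; the 1518-byte frame size is an illustrative maximum, and real VL budgets must come from the actual VL table.

```python
def vl_avg_bandwidth_bps(max_frame_bytes: int, bag_ms: float) -> float:
    """Average-rate ceiling for one VL: at most one max-size frame per BAG."""
    return (max_frame_bytes * 8) / (bag_ms / 1000.0)

# AFDX BAG values are powers of two from 1 ms to 128 ms.
for bag in (1, 2, 4, 8, 16, 32, 64, 128):
    bw = vl_avg_bandwidth_bps(max_frame_bytes=1518, bag_ms=bag)
    print(f"BAG {bag:>3} ms, 1518-byte frames -> {bw / 1e6:.3f} Mbit/s ceiling")
```

A smaller BAG raises the ceiling linearly, which is exactly why BAG choices must be checked against the egress port's total budget, not picked per VL in isolation.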
Three determinism control points (switch-centric)
- Ingress policing: constrains input bursts so worst-case contention is provable.
- Queue mapping: isolates classes/VLs so one stream cannot steal latency budget from another.
- Egress shaping: enforces a predictable transmit schedule at the port, reducing jitter spikes.
H2-3 · Switch architecture that actually matters (pipeline, buffers, fabric)
Deterministic performance is set by what happens inside the switch under contention. The fixed part of delay comes from serialization and the forwarding path; the variable part (jitter spikes) is driven by queueing and resource arbitration. This section highlights architectural details that directly change worst-case latency and how to verify them.
Store-and-forward vs cut-through
Impact: Store-and-forward adds a larger fixed delay (full frame reception + checks), while cut-through reduces average delay by forwarding early. However, cut-through designs can fall back to store-and-forward under certain conditions (congestion, filtering, mirroring, error handling), creating discontinuous worst-case latency.
- Verify: measure min/avg/max latency across (a) light load, (b) heavy contention, (c) mirroring enabled, (d) error injection.
- Pass signal: max latency remains bounded and does not “jump” when features toggle.
Buffers and head-of-line blocking
Impact: Shared buffers and shared queue banks can cause head-of-line blocking: a bursty stream occupying a shared resource delays unrelated traffic, inflating worst-case queueing. Large buffers may hide congestion while increasing worst-case delay.
- Verify: run a burst stress stream alongside a critical VL; check whether critical max latency inflates without proportional throughput change.
- Observe: per-port drops, congestion counters, and (if available) queue watermarks or queue-drop counters.
Fabric bandwidth and oversubscription
Impact: Internal fabric arbitration and oversubscription can introduce hidden contention even when external links look underutilized. Under specific port combinations, fabric congestion manifests as queueing jitter spikes or selective drops.
- Verify: multi-port “fan-in” tests (many ingress ports converging on one egress) and “fan-out” tests (one ingress feeding many egress).
- Pass signal: worst-case latency bound stays stable across port-combination stress, not just average throughput.
Common misconception
- High throughput does not imply bounded worst-case latency.
- Average latency can look excellent while max latency fails determinism requirements.
- Testing only at light load misses mode changes and hidden contention paths.
H2-4 · Determinism toolkit: QoS, shaping, policing, and bounded latency
The objective is not “low average delay.” The objective is a provable upper bound on latency and jitter under worst-case contention. That bound is set by what can and cannot vary inside the switch.
Worst-case latency decomposition (engineering view)
Worst-case latency = Serialization (frame length ÷ line rate) + Switching path (forwarding/pipeline) + Queueing (contention).
Only the queueing term can “explode” without governance—so determinism is primarily achieved by policing, queue isolation, and egress shaping.
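The decomposition can be made concrete as a back-of-envelope sketch. The switching delay and queue depth below are illustrative assumptions; the point is that once policing/shaping caps how many max-size frames can sit ahead of a critical frame at the egress port, the queueing term becomes a computable bound instead of an open variable.

```python
def serialization_us(frame_bytes: int, line_rate_bps: float) -> float:
    """Time to clock one frame onto the wire: frame length / line rate."""
    return frame_bytes * 8 / line_rate_bps * 1e6

def worst_case_latency_us(frame_bytes: int, line_rate_bps: float,
                          switching_us: float, max_queued_frames: int) -> float:
    """Serialization + fixed switching path + bounded queueing.
    max_queued_frames is the cap that policing/shaping must guarantee."""
    ser = serialization_us(frame_bytes, line_rate_bps)
    queueing = max_queued_frames * ser  # each queued frame re-serializes ahead of us
    return ser + switching_us + queueing

# 1518-byte frame on 100 Mbit/s: ~121.4 us serialization; assumed 10 us
# switching path and at most 4 max-size frames ahead in the egress queue.
print(f"{worst_case_latency_us(1518, 100e6, switching_us=10.0, max_queued_frames=4):.1f} us")
```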
QoS classification → queue mapping
VLAN/PCP (and similar tags) are only useful if they deterministically map traffic into the intended queues and policies. The label is not the guarantee; the queue isolation and policy binding behind the label are.
Policing (ingress) → bounded input envelope
Policing ensures a VL or class cannot inject bursts that invalidate worst-case proof. A good setup provides observable evidence (drops/violations counters) so configuration and field behavior match.
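The shape of that enforcement can be sketched as a token bucket. Real ARINC 664 switches use a per-VL frame/byte account tied to BAG and allowed jitter, so this is only the mechanism's outline, with invented parameter names (`bag_s`, `burst_frames`); note the violation counter, which is the "observable evidence" the paragraph above calls for.

```python
class VlPolicer:
    """Token-bucket sketch of per-VL ingress policing (illustrative only)."""

    def __init__(self, bag_s: float, burst_frames: int):
        self.bag_s = bag_s                 # one frame credit accrues per BAG
        self.burst = float(burst_frames)   # allowed burst depth
        self.tokens = float(burst_frames)
        self.last_t = 0.0
        self.violations = 0                # evidence: observable drop counter

    def accept(self, t: float) -> bool:
        # Credit tokens for elapsed time, capped at the burst depth.
        self.tokens = min(self.burst, self.tokens + (t - self.last_t) / self.bag_s)
        self.last_t = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.violations += 1               # non-compliant frame -> drop and count
        return False

p = VlPolicer(bag_s=0.002, burst_frames=1)   # 2 ms BAG, no burst allowance
arrivals = [0.000, 0.001, 0.002, 0.004]      # the 1 ms gap violates the BAG
print([p.accept(t) for t in arrivals], "violations:", p.violations)
# -> [True, False, True, True] violations: 1
```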
Shaping (egress) → predictable schedule
Shaping turns competition into a predictable transmit schedule. Two practical scopes are common: per-port shaping to control total output behavior and per-class shaping to protect timing-critical traffic windows.
Configuration checklist (lock these to make bounds provable)
- Classification rules fixed: VLAN/PCP mapping is deterministic with a safe default for unknown traffic.
- Queue mapping explicit: critical traffic maps to protected queues; avoid mixing bursty streams into the same queue bank.
- Ingress policing enabled: define what happens on violation (drop/mark/limit) and ensure per-policy counters exist.
- Shaping scope chosen: per-port shaping for overall stability; per-class shaping for timing-critical windows.
- Scheduling behavior known: understand priority handling under congestion (avoid surprises that starve a class).
- Mirroring plan: port mirroring supports capture without overloading the mirror destination (rate limits as needed).
- Counter set captured: per-port CRC/errors, per-policy drops, and link up/down events are collected and trended.
- Worst-case test profile: verify max latency/jitter under fan-in/fan-out and burst alignment scenarios.
- Feature toggles validated: check max latency when mirroring/diagnostics are enabled to avoid hidden mode changes.
H2-5 · Redundancy: dual network A/B, fault containment, and failover behavior
Dual-plane A/B redundancy is not “two cables.” It is two independent fault-containment domains designed to keep a localized issue (a bad port, a flapping link, or a bursty stream) from degrading deterministic behavior across the network. This section stays strictly on what can be enforced and observed at the switch.
A/B isolation principles (switch viewpoint)
- Independent PHY/port domains: Plane A link faults must not toggle Plane B link state or counters.
- Independent power/reset domains (principle): a brown-out or reset event in one plane should not restart the other plane.
- Configuration mirrored but controlled (principle): policies should match across A/B, but mirroring must avoid “copying a mistake everywhere.”
- Evidence-first operation: every containment action should have measurable signals (events/counters) that prove why it occurred.
Fault → observable signal → containment action
| Fault pattern | Observable signals (switch-side) | Containment strategy (switch-side) |
|---|---|---|
| Single-port jitter / micro-bursts | Queue drops rise, latency spikes on affected egress, policing violations (if enabled) | Enable/tighten policing + shaping, isolate mapping to protected queues, rate-limit offender |
| Link flap (up/down oscillation) | Rapid link events, error spikes around transitions, intermittent drops | Port isolate / shut, hold-down policy (principle), capture events + counters for root-cause |
| Plane-specific frame loss | CRC/symbol errors, per-port error counters, plane-A only drops (plane-B clean) | Keep fault inside the plane: isolate bad port, verify independent PHY domain, alert with plane tag |
| Policing violations | Policing drop/violation counters climb; other traffic sees jitter relief | Treat as misbehaving source: keep policy strict, export evidence, isolate if recurring |
| Storm / flooding pattern | Storm-control hits, port utilization spikes, broad performance degradation without single-VL signature | Storm control / rate limiting (principle), port isolate, mirror for evidence capture |
| Diagnostics side-effects | Mirror destination congestion; analysis port drops; “measurement changes behavior” | Rate-limit mirroring (principle), schedule capture windows, keep diagnostics out of critical paths |
How to verify A/B containment (switch-only evidence)
- Inject a controlled fault into Plane A (flap a link, burst a port, or violate policing).
- Confirm Plane B counters and link state remain stable (no mirrored error signatures).
- Validate containment actions are local: the switch isolates the offender and the blast radius is limited to the plane/port domain.
- Record evidence: link events + CRC/error counters + policing drops + queue drops (as supported).
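The "Plane B stays clean" step above reduces to a snapshot comparison. A minimal sketch, assuming the counter names are placeholders for whatever the switch actually exposes:

```python
def plane_b_contained(before: dict, after: dict, tolerance: int = 0) -> bool:
    """True if no plane-B error/drop counter moved more than `tolerance`
    across the plane-A fault-injection window."""
    return all(after[k] - before[k] <= tolerance for k in before)

# Snapshot plane-B counters before and after injecting the plane-A fault.
before = {"crc_errors": 12, "queue_drops": 3, "link_flaps": 0}
after  = {"crc_errors": 12, "queue_drops": 3, "link_flaps": 0}
print(plane_b_contained(before, after))  # plane B stayed clean -> True
```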
Boundary note: this page covers switch-side containment and evidence. End-system or application-level fusion/voting/duplicate handling belongs to system pages, not this switch explainer.
H2-6 · Timestamping in an AFDX switch: IEEE 1588 PTP (what to do and what to avoid)
PTP accuracy depends less on the protocol name and more on where timestamps are taken and how load-dependent delay is handled. A switch must keep timestamping close to the physical boundary (MAC/PHY), prevent queueing from contaminating timestamps, and avoid asymmetry that turns into systematic offset.
Three must-check points (switch-side)
- Timestamp point: hardware timestamps taken at the MAC/PHY boundary (not in software paths).
- Queue influence: verify whether congestion changes timestamp stability; queueing variability must not masquerade as timing drift.
- Link symmetry: asymmetric paths or rate mismatches introduce systematic errors that cannot be “averaged away.”
Residence time (concept)
Residence time is the time a PTP event frame spends inside the switch (ingress to egress). If this internal delay varies with load, it appears as jitter in timing. The core engineering goal is to keep this behavior stable and observable.
Correction field (concept)
Some switch designs account for internal residence effects by applying a correction concept. The key practical test is simple: does timing error stay stable when the switch is stressed, or does it drift with queueing and arbitration?
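The arithmetic behind the correction concept is simple: in a transparent-clock style design, each switch adds its measured residence time to the PTP correction field, and the endpoint subtracts the accumulated value so internal queueing does not masquerade as path delay. A sketch with illustrative values:

```python
def corrected_path_delay_ns(measured_delay_ns: float,
                            residence_times_ns: list[float]) -> float:
    """Subtract accumulated switch residence time from the raw measurement,
    leaving only the actual link/path delay."""
    return measured_delay_ns - sum(residence_times_ns)

raw = 5200.0                 # raw one-way measurement, ns
residence = [1500.0, 900.0]  # residence times reported by two switches on the path
print(corrected_path_delay_ns(raw, residence))  # -> 2800.0 ns of real path delay
```

The bench question follows directly: if residence times are reported accurately, the corrected delay should stay flat when the switch is stressed; if it drifts with load, the residence measurement (or the timestamp point) is contaminated by queueing.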
Common pitfalls
- Software timestamping: acceptable at light load, but error grows sharply under contention.
- Measuring via a mirror port as “ground truth”: mirroring can reorder or rate-limit and alter timing observability.
- Congestion-sensitive timestamps: queueing contaminates the apparent timing and looks like “clock drift.”
- Uncontrolled asymmetry: mismatched link paths or rate conversion introduces systematic offset.
- PHY delay drift: temperature/conditions shift PHY delays; trending counters and stability checks matter.
Practical verification (no tool dependency)
- Establish a baseline offset/jitter at light load.
- Apply a controlled contention profile (fan-in or burst traffic) while keeping links stable.
- Compare offset/jitter before vs during stress; large changes indicate queue influence or timestamp point issues.
- Correlate with switch evidence: congestion counters, drops, and link events around the error window.
- If errors scale with load, prioritize verifying hardware timestamp position and the delay path through queues.
SyncE may help reduce wander/jitter in some designs, but it is treated as optional support here—not a standalone tutorial.
H2-7 · Ethernet PHY & interfaces: what impacts integrity (EMI, BER, link stability)
Link integrity is part of determinism: if the PHY layer is marginal, the network can look “random” even with perfect shaping and policing. This section focuses on practical PHY and interface boundaries that affect link stability, and on switch-side evidence that narrows root cause without turning into a full EMC tutorial.
PHY families: practical selection boundaries
100BASE-TX
- Use when: a stable, well-understood copper link is needed with simpler signal conditioning.
- Failure signature: CRC errors and occasional drops often appear before frequent link flaps.
- Switch evidence: CRC trend + link event rate distinguish margin loss vs intermittent connection.
1000BASE-T
- Use when: higher throughput is required and cabling/connector quality is controlled.
- Engineering reality: more sensitive to cable/connector changes and environment-driven margin shrink.
- Switch evidence: symbol/alignment/PCS-type errors (if exposed) + retrain / flap events matter.
1000BASE-X
- Use when: the medium is not classic copper, or a serial link boundary is preferred.
- Trade: moves sensitivity away from magnetics/cable toward module/serial link conditions.
- Switch evidence: link stability + error counters still determine whether issues are physical or policy-driven.
MAC-PHY interfaces: boundaries that affect stability (not a layout tutorial)
RGMII
Practical boundary: parallel timing sensitivity can turn board-level variation into intermittent errors. If link issues correlate with temperature or vibration windows, verify interface stability assumptions.
SGMII
Practical boundary: a serial interface often improves consistency, but relies on stable clocking and correct negotiation/config. Confirm that the link does not “retrain” under stress.
USXGMII
Practical boundary: multi-rate flexibility increases configuration surface. In deterministic networks, avoid silent mode changes and ensure the switch exposes clear state and counters for link behavior.
Engineering indicators that matter (switch-side interpretation)
- BER (practical): manifests as rising CRC/symbol errors and eventually as drops or retrains; trends are more valuable than single snapshots.
- Jitter tolerance (practical): poor tolerance often shows as intermittent errors under specific load/temperature/vibration windows.
- Temperature drift (practical): error counters grow with time/temperature even when link stays “up.”
- Cable/connector sensitivity: link flap + error spikes align with mechanical events or environmental transitions.
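Because trends matter more than snapshots, a margin-risk check should look for sustained growth across observation windows rather than alarming on a single value. A sketch, with the window count as a tunable assumption:

```python
def is_margin_risk(samples: list[int], min_growth_windows: int = 3) -> bool:
    """True if the error counter increased in `min_growth_windows`
    consecutive observation windows (monotonic-growth pattern)."""
    growth = [b > a for a, b in zip(samples, samples[1:])]
    run = 0
    for grew in growth:
        run = run + 1 if grew else 0
        if run >= min_growth_windows:
            return True
    return False

print(is_margin_risk([10, 10, 11, 13, 16]))  # steady growth -> True
print(is_margin_risk([10, 12, 12, 12, 13]))  # isolated bumps -> False
```

The same pattern extends to temperature correlation: keep the per-window temperature alongside each counter sample and flag runs that align with thermal transitions.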
Symptoms → most likely cause → switch-side evidence
| Symptom | Most likely cause | What to check on the switch |
|---|---|---|
| CRC errors climb, link stays up | Margin shrinking (noise/temperature/cable quality) | CRC trend vs temperature/time, symbol/alignment errors (if available), drops staying low |
| Link flap with error spikes | Intermittent connection or negotiation boundary instability | Link up/down event rate, retrain counts, error bursts aligned to events |
| Alignment / symbol errors appear | PHY-level integrity loss under conditions | Symbol/alignment counters (if exposed), correlation to load/temperature windows |
| Drops rise without PHY errors | Congestion/policy governance rather than physical integrity | Port drops vs policing drops vs shaping stats; link events remain quiet |
H2-8 · Diagnostics & health monitoring: counters, mirroring, built-in tests, event logs
A deterministic switch should not only forward traffic—it should prove health and shorten field triage. The most valuable capability is a structured evidence chain: counters → health decisions → event logs → maintenance export, so intermittent faults can be reproduced and contained.
Observability checklist (what a good switch exposes)
Per-port
- CRC / symbol errors
- Link up/down (with timestamps)
- Drops (port/queue, if exposed)
- Queue indicators (if available)
Per-class / policy
- Policing drops (violations)
- Shaping statistics (if supported)
- Priority/class counters
- Policy hit-rate evidence
Evidence capture
- Mirroring / SPAN strategy
- Rate limits (avoid new bottleneck)
- Windowed capture (principle)
- Export path to maintenance
BIT / BIST + logs
- Power-up self-test (BIT) with record
- Periodic self-test (BIST) (concept)
- Loopback / signature checks (concept)
- Event logs with snapshots
“Diagnostic dashboard” cards (metric → trigger → action)
Link stability
Metric: link up/down + retrain
Trigger: burst of events in a short window or repeatable periodic flaps
Action: isolate the port/plane, capture counters snapshot, verify cable/connector domain
Integrity errors
Metric: CRC / symbol / alignment
Trigger: monotonic growth trend or temperature-correlated escalation
Action: flag as margin risk, trend over time, correlate to link events and load windows
Congestion evidence
Metric: drops / queue indicators
Trigger: drops rising without PHY error growth
Action: separate port drops vs policy drops; tighten shaping/policing where needed
Policy violations
Metric: policing drops / shaping stats
Trigger: sustained violations on specific classes/flows
Action: treat as misbehavior evidence; keep containment local and export proof
Mirroring strategy
Metric: mirror enable state + mirror port utilization
Trigger: mirror port becomes congested or capture changes traffic behavior
Action: rate-limit or window captures; keep diagnostics out of critical paths
BIT/BIST evidence
Metric: self-test status + last-pass timestamp
Trigger: any self-test failure or repeated marginal warnings
Action: log snapshot, isolate domain, re-run targeted tests, export event package
Fastest field triage (5 steps, switch-side)
- Check link events first: flap/retrain indicates a physical or negotiation boundary.
- Check integrity counters: CRC/symbol/alignment growth pattern reveals margin vs intermittent faults.
- Separate drops: port/queue drops vs policing drops to distinguish congestion from integrity.
- Capture evidence only when needed: enable mirroring in windows and watch mirror port load.
- Commit an event package: logs + counter snapshots around the incident enable traceable maintenance.
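The five steps above encode an ordered decision: physical signals first, then integrity, then policy vs congestion drops. A sketch of that ordering, with the counter-delta names invented for illustration:

```python
def triage(ev: dict) -> str:
    """Ordered field triage over counter deltas captured around the incident."""
    if ev["link_flaps"] > 0:
        return "physical/negotiation boundary: inspect port, cable, plane domain"
    if ev["crc_errors_delta"] > 0:
        return "integrity margin: trend CRC/symbol errors vs load and temperature"
    if ev["policing_drops_delta"] > 0:
        return "policy violation: misbehaving source, keep containment local"
    if ev["queue_drops_delta"] > 0:
        return "congestion: review queue mapping and shaping"
    return "no switch-side signature: widen capture window, mirror if needed"

print(triage({"link_flaps": 0, "crc_errors_delta": 0,
              "policing_drops_delta": 4, "queue_drops_delta": 9}))
```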
H2-9 · Configuration & verification: VL tables, policing rules, and change control
Configuration is where “latent faults” are born: a small BAG, MTU, queue mapping, or mirror setting change can silently shift worst-case behavior without breaking day-to-day operation. This section treats the switch as a governed artifact: configuration layers are explicit, validation is repeatable, and change control is designed for predictable rollback.
Configuration layers (what each layer controls)
Port-level
- Link behavior baseline (events, stability, counters exposure)
- Mirror / SPAN policy placement and rate discipline
- Local containment knobs (storm-limiting principles, if supported)
Queue / QoS-level
- Queue count and class-to-queue mapping
- Congestion behavior boundaries (what happens under load)
- Egress shaping granularity (port/class where applicable)
VL / Policing-level
- VL table completeness (binding + parameters)
- BAG and frame size boundary (MTU alignment)
- Policing action + visibility (drops must be measurable)
Time sync (switch config points)
- PTP mode selection (as configured on the switch)
- Timestamp behavior settings exposed by the device
- Monitoring thresholds for drift symptoms (principle)
Verification rules (repeatable, switch-side)
- Schema & references: every port/class/VL reference is resolvable; no orphan entries.
- Consistency across planes: A/B plane configurations match on all “must-equal” items.
- Visibility is mandatory: any policing/shaping decision must leave counters or events.
- Invariants must hold: critical classes cannot be pushed into uncontrollable congestion by mapping.
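The first two rules are mechanically checkable. A minimal sketch, with the configuration shape (`ports`, `queues`, `vls`, per-VL `port`/`queue` bindings) invented for illustration:

```python
def verify(cfg_a: dict, cfg_b: dict, must_equal: list[str]) -> list[str]:
    """Return a list of findings; empty means the checked rules pass."""
    findings = []
    # Rule 1 (schema & references): every VL binds to an existing port/queue.
    for plane, cfg in (("A", cfg_a), ("B", cfg_b)):
        for vl, entry in cfg["vls"].items():
            if entry["port"] not in cfg["ports"]:
                findings.append(f"plane {plane}: {vl} references unknown port")
            if entry["queue"] not in cfg["queues"]:
                findings.append(f"plane {plane}: {vl} references unknown queue")
    # Rule 2 (consistency across planes): must-equal items match on A and B.
    for key in must_equal:
        if cfg_a[key] != cfg_b[key]:
            findings.append(f"A/B mismatch on '{key}'")
    return findings

cfg_a = {"ports": {"p1"}, "queues": {"q0"},
         "vls": {"VL10": {"port": "p1", "queue": "q0"}}, "mtu": 1518}
cfg_b = {"ports": {"p1"}, "queues": {"q0"},
         "vls": {"VL10": {"port": "p1", "queue": "q0"}}, "mtu": 1500}
print(verify(cfg_a, cfg_b, must_equal=["vls", "mtu"]))  # -> ["A/B mismatch on 'mtu'"]
```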
Copyable configuration checklist (layered)
Port-level (4)
- Port role and plane (A/B) are explicitly labeled and audited.
- Link events are recorded (up/down, retrain if available).
- Error counters are readable/exportable (CRC, symbol/alignment if exposed).
- Mirror/SPAN ports are isolated from critical forwarding and have a discipline policy.
Queue / QoS-level (4)
- Class-to-queue mapping is explicit and reviewable (no “default ambiguity”).
- Critical traffic is not co-located with best-effort in a way that amplifies tail latency.
- Congestion behavior is defined (what drops, what remains bounded).
- Shaping granularity matches the intended jitter control boundary (port/class as supported).
VL / policing-level (6)
- VL table entries are complete: ID, binding, and key parameters are present.
- BAG intent is documented (what behavior it enforces, not just the value).
- Frame size boundary is aligned: MTU assumptions match the link domain (avoid silent drops).
- Policing action is defined (drop/mark) and produces measurable stats.
- Per-VL or per-class stats exist for violations (policing drops are observable).
- Stats reset and observation windows are defined for comparisons (trend-friendly).
Time sync (2)
- PTP mode is explicitly configured and traceable per plane/port where applicable.
- Drift symptoms are monitored with thresholds (principle: detect before it becomes operationally “random”).
The 5 most commonly missed items (latent fault creators)
- BAG / MTU boundary: misalignment leads to silent drops or hidden tail-latency shifts.
- Queue mapping: critical traffic accidentally shares congestion with noncritical flows.
- Mirror port discipline: no rate/window control turns diagnostics into a new bottleneck.
- PTP mode drift risk: mode/config mismatch creates “good in lab, bad in field” timing error.
- Alarm thresholds: without triggers, faults remain invisible until they become incidents.
Change control (governance without security scope creep)
Version & diff
Every change produces a version ID and a human-auditable diff grouped by layer (port / queue / VL / time sync). The goal is clarity: what changed, where, and why.
Pre-deploy validation
Run the verification rules: schema/consistency, visibility, and invariants. If a policing/shaping decision cannot be observed, it cannot be safely governed.
Rollback readiness
Define “known-good” versions and monitoring triggers. Rollback is a planned step with clear conditions, not an emergency improvisation.
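A layer-grouped diff of the kind described above can be sketched in a few lines. The layer names follow this page; the flat key/value records are illustrative, not a real switch's config schema:

```python
def layered_diff(old: dict, new: dict) -> dict:
    """Return {layer: {key: (old_value, new_value)}} for changed keys only."""
    out = {}
    for layer in ("port", "queue", "vl", "time_sync"):
        changed = {k: (old[layer].get(k), new[layer].get(k))
                   for k in set(old[layer]) | set(new[layer])
                   if old[layer].get(k) != new[layer].get(k)}
        if changed:
            out[layer] = changed
    return out

old = {"port": {"p1.mirror": False}, "queue": {},
       "vl": {"VL10.bag_ms": 4}, "time_sync": {}}
new = {"port": {"p1.mirror": False}, "queue": {},
       "vl": {"VL10.bag_ms": 2}, "time_sync": {}}
print(layered_diff(old, new))  # -> {'vl': {'VL10.bag_ms': (4, 2)}}
```

Attaching such a diff to each version ID gives the reviewer exactly the "what changed, where" view; the "why" still needs a human-written change note.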
H2-10 · Validation & worst-case testing: proving bounded latency and robustness
“Deterministic” must be proven under stress. Validation should separate (1) functional correctness, (2) worst-case performance boundaries, and (3) robustness to faults. Acceptance criteria should be expressed as bounds, percentiles, and hold-time windows—not averages.
How to express acceptance (avoid “average-only”)
- Upper bound: the measurable worst-case (max) must stay below a defined ceiling.
- Tail percentiles: P99 / P99.9 reveal whether the tail explodes under contention.
- Hold-time window: bounds must remain valid for a sustained duration under a defined stress profile.
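The three acceptance expressions reduce to a bounds check over a sample window. A sketch with a nearest-rank percentile and placeholder ceilings; a real campaign would repeat this check per hold-time window under the defined stress profile:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank style percentile (sufficient for an acceptance sketch)."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(round(p / 100 * (len(s) - 1))))
    return s[idx]

def accept(latencies_us: list[float], max_ceiling: float,
           p999_ceiling: float) -> bool:
    """Pass only if both the absolute max and the P99.9 tail stay bounded."""
    return (max(latencies_us) <= max_ceiling
            and percentile(latencies_us, 99.9) <= p999_ceiling)

# Healthy body with a bounded tail: passes; a single 260 us outlier would not.
samples = [110.0] * 990 + [180.0] * 9 + [240.0]
print(accept(samples, max_ceiling=250.0, p999_ceiling=200.0))
```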
Validation checklist (test → expected behavior → fail criteria)
| Test | Expected behavior | Fail criteria |
|---|---|---|
| Policing violation injection | Violations are contained; policing stats rise; critical traffic remains stable | Critical class exceeds latency ceiling or shows unexpected drops |
| Class/queue mapping verification | Critical class stays isolated from best-effort contention under load | Tail latency inflates when noncritical traffic is added |
| Worst-case fan-in to a single egress | Bounded latency holds; jitter remains within defined ceiling | Max or P99.9 exceeds limits; sustained instability over hold-time |
| Micro-burst stress (short bursts) | Queues absorb bursts as designed; drops remain bounded and observable | Unexpected drops; queue behavior cannot be explained via counters |
| Sustained congestion storm (noncritical) | Containment is local; critical class remains bounded and measurable | Congestion spreads across planes/ports; critical latency breaks |
| Link flap injection (single port / plane) | Fault is observable; effects do not cascade; recovery is explainable | Fault cascades to other ports/planes or leaves no evidence trail |
| Error frame injection (CRC/error bursts) | Error counters rise; alarms trigger as configured; critical forwarding remains bounded | Counters/alarms fail to capture the event or bounded behavior breaks |
Context note
Validation is typically executed within the target compliance and environmental constraints, but the focus here is strictly on switch-side bounded latency, robustness, and evidence generation.
H2-11 · BOM / IC selection checklist (switch ASIC, PHY, clock, PMIC) — criteria + concrete part numbers
This section turns “determinism + evidence” into a procurement-ready BOM shortlist. Selection is driven by measurable capabilities (policing/shaping/queues, timestamp behavior, counters/diagnostics, and power sequencing). Part numbers below are practical candidate pools—final fit must be confirmed against the latest datasheet feature tables and temperature / lifecycle requirements.
1) Switch ASIC / platform (what directly impacts bounded latency and evidence)
Selection criteria (use as pass/fail bullets)
- Port count + speed mix: 100M / 1G / mixed, plus spare ports for monitoring and growth.
- Queueing resources: queue count, queue isolation, and scheduler options (priority/WRR/etc. as exposed).
- Shaping capabilities: ability to shape per port and/or per traffic class (jitter control boundary).
- Ingress policing support: rate enforcement per flow/class; defined violation actions (drop/mark) with stats.
- Fabric headroom: oversubscription behavior and worst-case queue build-up conditions.
- Store-and-forward vs cut-through behavior: understand tail-latency implications (verify on bench).
- Observability granularity: per-port / per-queue / per-class counters; drops must be explainable.
- Mirroring/SPAN: filter capability and discipline policy to prevent diagnostics from causing congestion.
- Timestamp support: hardware timestamp behavior and the path points that remain stable under load.
- Operating envelope: temperature range, supply rails, package/manufacturability, and lifecycle constraints.
Concrete candidate part numbers (shortlist pool)
- Microchip LAN9662 (TSN-capable managed switch platform; verify shaping/queue stats and timestamp features)
- NXP SJA1105 family (e.g., SJA1105EL/QL/TEL variants; verify exact TSN/PTP feature set by variant)
- Microchip KSZ9567 / KSZ9477 (managed switch families; verify time-sync and shaping capabilities for the target design)
- Marvell 88Q6113 (high-port automotive-class switch; verify feature set and availability for program needs)
- Broadcom BCM53xx / BCM56xx families (managed Ethernet switch families; verify industrial availability and feature licensing)
Tip: keep at least two switch candidates in the RFQ to avoid “single-vendor lock” during supply events.
2) Ethernet PHY (link integrity, BER symptoms, and diagnosability)
Selection criteria (what prevents “mystery link flap”)
- Speed support: 100BASE-TX vs 1000BASE-T vs 1000BASE-X matched to the architecture.
- MAC–PHY interface: MII/RMII/RGMII/SGMII (timing + layout risk boundary).
- Diagnostic registers/counters: CRC/alignment/symbol errors and link up/down events.
- Cable/connector sensitivity: margin against return loss, crosstalk, and harness variability.
- Temperature behavior: stability of link and error rate across the operating range.
- EMI headroom: enough margin so small layout/harness changes do not cause field dropouts.
- Loopback features: local/remote loopback to shrink the field debug path.
- Power rails: rail count, IO voltages, and susceptibility to supply noise.
- Second-source plan: two PHY candidates with equivalent interface strategy.
Concrete PHY part numbers (grouped by common use)
- 10/100 copper (classic control networks): TI DP83848, Microchip KSZ8081 / KSZ8091
- Gigabit copper (growth headroom): TI DP83867 / DP83869, Microchip KSZ9031 / KSZ9131, Marvell 88E1512
- SGMII / 1000BASE-X (optical/backplane style): Microchip VSC85xx family (variant-dependent)
Debug mapping rule: symptom → switch counter. For example: CRC spikes → error counters; repeated link up/down → link event log; sporadic bursts → queue drop stats (if exposed).
3) Clocking (jitter budget → timestamp stability and link behavior)
Selection criteria (board-level clock only)
- Clock tree needs: number of outputs and required frequencies for switch/PHY/management domains.
- Jitter relevance: edge uncertainty can show up as timestamp instability or PHY recovery sensitivity.
- Jitter cleaning boundary: use a jitter attenuator when the reference quality is variable or multi-domain.
- Temperature stability: drift and aging behavior across the operating range.
- Input strategy: single reference vs multiple references (switching must not introduce discontinuities).
- Supply sensitivity: vulnerability to rail noise (keep rails clean and monitored).
- Integration: package, output standards, and layout constraints.
- Evidence hooks: ability to expose lock status / fault pins (if used) for event logging.
Concrete clock / jitter parts (candidate pool)
- Silicon Labs / Skyworks Si5345 (jitter attenuator / clock generator family)
- Silicon Labs / Skyworks Si5341 (same class, output/feature variant)
- TI LMK05318 (clock generator / jitter cleaner class; verify it covers the IO standards you need)
- Renesas (IDT) 8A34xxx family (timing / jitter solutions; variant-dependent)
Practical rule: if timestamp error grows under congestion or temperature swings, check clock stability + queueing effects together. Clocking alone rarely explains everything, but weak clocking makes the tail worse.
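The "check clock stability + queueing effects together" rule can be applied as a crude first-cut attribution: log timestamp error alongside queue depth and temperature, then see which factor it tracks more closely. A sketch under the assumption that you already have these three series sampled together (all names illustrative):

```python
# Crude attribution sketch: does timestamp error track queueing or
# clocking/thermal behavior more closely? Pure-stdlib Pearson correlation.
def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def dominant_factor(ts_err, queue_depth, temperature):
    """Compare |corr| of timestamp error against queue depth vs temperature."""
    cq = abs(correlation(ts_err, queue_depth))
    ct = abs(correlation(ts_err, temperature))
    if cq > ct:
        return "queueing"
    if ct > cq:
        return "clocking/thermal"
    return "inconclusive"
```

This is only a triage aid, not a proof: correlation on short field captures can mislead, so treat the result as a pointer to which evidence chain (queue drop stats vs clock lock/fault log) to pull first.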
4) PMIC / supervisor / sequencer (power-up order, reset evidence, power-fail logging)
Selection criteria (stay at board-level governance)
- Sequencing: multi-rail ordering, delays, dependencies (avoid “random boot behavior”).
- Voltage monitoring: UV/OV thresholds and accuracy; clear fault signaling.
- Reset strategy: reset outputs and hold times; clean deglitch to prevent false resets.
- Power-fail signaling: early warning to support event logging and controlled shutdown behavior.
- PG aggregation: combining multiple rails into a deterministic “system ready” condition.
- Fault latching: retain reset/power-fail reasons for post-event diagnosis.
- Telemetry (if used): rail status visibility that supports health monitoring (do not overcomplicate).
- Supply architecture: rail count, transient response needs, and manufacturability.
- Second-source plan: at least one alternative supervisor/sequencer path.
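The governance items above (ordered bring-up, PG aggregation, fault latching) can be sketched as a tiny state model to make the intended behavior reviewable before it is mapped onto a real supervisor/sequencer part. All names are illustrative, and this models intent, not any specific device:

```python
# Minimal sketch of sequencer governance: rails enable strictly in order,
# "system ready" asserts only when every PG is good, and the FIRST fault
# cause is latched for post-event diagnosis. Illustrative only.
class PowerSequencer:
    def __init__(self, rail_order):
        self.rail_order = rail_order                  # e.g. ["1V0_core", "1V8_io", "3V3_phy"]
        self.power_good = {r: False for r in rail_order}
        self.fault_latch = None                       # retained until explicitly cleared

    def report_pg(self, rail, good):
        self.power_good[rail] = good
        if not good and self.fault_latch is None:
            self.fault_latch = f"pg_lost:{rail}"      # latch only the first cause

    def next_rail_to_enable(self):
        """Return the first rail (in order) not yet good; None when all are up."""
        for rail in self.rail_order:
            if not self.power_good[rail]:
                return rail
        return None

    def system_ready(self):
        # Deterministic "system ready": all PGs aggregated AND no latched fault.
        return all(self.power_good.values()) and self.fault_latch is None
```

The point of the latch is the evidence rule that follows: a rail that droops and recovers still leaves a cause behind, so a "rare field reset" stops being unprovable.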
Concrete PMIC / supervisor parts (candidate pool)
- ADI / Linear LTC2937 (multi-rail supervisor / monitoring class)
- ADI / Linear LTC2974 / LTC2977 (power system manager class; use when telemetry/control is required)
- TI TPS3808 (supervisor / reset generator class)
- TI TPS386000 (supervisor class; variant-dependent thresholds/features)
- Maxim / Analog Devices MAX16052 (supervisor class; check availability/lifecycle)
Evidence rule: without power-fail and reset-cause visibility, “rare field resets” become unprovable and non-actionable.
5) Supporting parts (NVM + management MCU) — keep governance practical
NVM (config versions + event logs)
- SPI NOR flash: Winbond W25Q128JV, Macronix MX25L128xx, Micron MT25Q series
- High-endurance option (if frequent writes): Infineon/Cypress FM25Vxx FRAM family
Keep log format simple: version ID, event type, timestamp, port/class counters snapshot.
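A minimal sketch of such a record, assuming a hypothetical fixed-size layout (the field widths, the 4-counter snapshot, and the microsecond timestamp are all assumptions, not a defined format):

```python
import struct
import time

# Hypothetical fixed-size event record: version ID, event type, timestamp,
# and a small counter snapshot. Little-endian, no padding, so the size is
# stable across toolchains: u32 + u16 + u64 + 4*u32 = 30 bytes.
RECORD_FMT = "<IHQ4I"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def pack_event(version, event_type, counters, ts_us=None):
    """Serialize one log record; counters is a 4-tuple snapshot."""
    ts_us = int(time.time() * 1e6) if ts_us is None else ts_us
    return struct.pack(RECORD_FMT, version, event_type, ts_us, *counters)

def unpack_event(blob):
    v, et, ts, c0, c1, c2, c3 = struct.unpack(RECORD_FMT, blob)
    return {"version": v, "event_type": et, "timestamp_us": ts,
            "counters": (c0, c1, c2, c3)}
```

Fixed-size records keep the NVM wear math trivial (records per erase block is a constant) and let a field tool recover a partially written log by simple offset arithmetic.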
Management MCU (control-plane only)
- ST STM32H7 (high-performance management/control)
- ST STM32F7 (mid-class management/control)
- Microchip SAM E70 (industrial-grade MCU option)
- NXP LPC55Sxx (MCU family; keep security features out of scope here)
The MCU role stays narrow: configuration load, health readout, event log shipping, and safe rollback triggers.
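The configuration-load-with-safe-rollback part of that narrow role can be sketched as follows; the checksum scheme, function names, and return values are assumptions for illustration, not a defined interface:

```python
import hashlib

# Sketch of the narrow management-MCU role: accept a candidate config only
# if its integrity check AND a post-apply health check both pass; otherwise
# keep (or return to) the last-known-good version. Illustrative only.
class ConfigManager:
    def __init__(self, known_good):
        self.active = known_good            # last-known-good config (bytes)

    @staticmethod
    def checksum(blob):
        return hashlib.sha256(blob).hexdigest()

    def load(self, candidate, expected_sum, health_check):
        """Return (status, active_config) after attempting the update."""
        if self.checksum(candidate) != expected_sum:
            return ("rejected", self.active)          # corrupt/wrong image: never applied
        if not health_check(candidate):
            return ("rolled_back", self.active)       # applied check failed: keep known-good
        self.active = candidate
        return ("applied", self.active)
```

The design choice worth copying is that rollback is the default path: the known-good config is only replaced after both checks pass, so a failed update can never leave the switch without a valid configuration.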
RFQ “minimum information” (copy/paste checklist)
Target: AFDX/ARINC 664 switching platform (switch-side determinism + evidence)
- Ports: ____ total · Speed mix: [100M] [1G] [mixed] · Spare for monitoring: [yes/no]
- Queues/classes: ____ queues (or ____ priority levels) · Class-to-queue mapping: [explicit required]
- Shaping: [per-port] [per-class] · Granularity requirement: ____________________________
- Policing: [required/optional] · Violation action: [drop/mark] · Violation stats: [required]
- Counters: [per-port] [per-queue] [per-class] · Drop attribution required: [yes/no]
- Mirroring/SPAN: [required/optional] · Filter: [yes/no] · Mirror discipline: [rate/window]
- Timestamp/PTP (switch config points only): mode: __________ · HW timestamp: [required/optional]
- Operating range: temperature: ________ · lifecycle: ________ · second-source: [required/optional]
- Power sequencing: rails: ____ · order constraints: _________________________________
- Supervisor: reset outputs: ____ · power-fail signal: [required/optional] · reset-cause latch: [yes/no]
- NVM: type: [SPI NOR/FRAM] · capacity: ____ · endurance requirement: __________________
- Deliverables: datasheet + feature table confirmation + availability/lifecycle statement
H2-12 · FAQs (AFDX / ARINC 664 Switch)
These FAQs focus on switch-side determinism, governance, and evidence: bounded latency/jitter, VL/BAG policing, queue/shaping behavior, A/B fault containment, PTP timestamp stability, PHY integrity symptoms, and verification after configuration changes.