
Managed / Enterprise Switch IC: L2/L3, ACL, QoS & Telemetry


Core idea
A managed/enterprise switch IC is a deterministic packet-processing pipeline with finite tables, queues, and counters. The winning design turns features into measurable gates—resource budgets, observability, and acceptance targets—so field issues can be diagnosed and fixed fast.

H2-1 · What is a Managed / Enterprise Switch IC

Section intent
  • Define the enterprise-class switch IC boundary versus unmanaged/smart/TSN classes.
  • Turn “feature lists” into selection triggers and engineering expectations.
  • Keep TSN/PTP/PoE/stack mechanisms out of scope (only capability-level pointers).
Definition (one sentence)

A managed / enterprise switch IC is a multi-port switching silicon platform that delivers hardware forwarding plus scalable policy, traffic management, and observability hooks (L2/L3, ACL/TCAM, multicast control, QoS shaping, and line-rate telemetry) for networks that must be diagnosable and maintainable at scale.

Stop line: TSN time-window scheduling, PTP servo algorithms, and PoE power-path design are not expanded here.

The boundary is not “one feature” — it is a system capability level
  • Policy at scale: VLAN/LAG/STP + QoS + ACL + multicast control form a coherent pipeline, not isolated toggles.
  • Measurable hardware resources: TCAM entries, MAC/route/neighbor tables, queues, and buffers have visible limits and counters.
  • Operations & diagnostics: line-rate counters, mirroring, event hooks, and field logs enable fast root-cause analysis in real deployments.
Decision triggers (choose enterprise-class when…)
  • Multiple VLANs and segmentation policies must remain predictable after growth and maintenance cycles.
  • Traffic classes require explicit queue mapping, shaping, and congestion behavior (not “best effort”).
  • ACL rules must be enforced in hardware (TCAM) with counters and deterministic priority order.
  • Multicast must be controlled (IGMP/MLD snooping) to avoid flooding, jitter spikes, and CPU punt storms.
  • Field diagnosis must be fast: telemetry counters, mirroring, and “black-box” event trails are required.
  • Microbursts/oversubscription are real: buffer visibility and queue-depth instrumentation matter.
What this page will deliver
  • Switch-class feature matrix (unmanaged vs smart vs enterprise vs TSN).
  • Enterprise silicon block map: data-plane pipeline, TCAM/ACL, queues/shapers, multicast control, telemetry.
  • Engineering hooks: bring-up checks, counter baselines, and field-debug instrumentation.
  • Selection scorecard: resources (TCAM/buffers/queues/tables) + performance + manageability.
Switch Class Feature Matrix (unmanaged vs smart vs enterprise vs TSN)
  • Unmanaged: basic L2 forwarding, fixed configuration straps, minimal QoS, limited counters, no TCAM/ACL.
  • Smart: VLAN/trunk, limited QoS, IGMP snooping, basic SPAN, small tables and light management.
  • Enterprise (this page): L2 at scale, L3 offload, ACL via TCAM, QoS shaping, multicast control, line-rate telemetry.
  • TSN: time-aware shaping (GCL), deterministic latency, admission control, timing coupling (capability-level).
Diagram: A capability-level map. The “Enterprise” column is the scope of this page; TSN timing windows and sync algorithms are routed to dedicated pages.

H2-2 · Scope Guard & Page Routing

Section intent
  • Lock the page boundary (what is covered vs not covered) to prevent topic overlap.
  • Provide keyword-based routing rules so readers reach the correct deep-dive page fast.
  • Use “stop-line examples” to show how out-of-scope questions are handled without expanding this page.
In-scope (covered on this page)
  • L2 forwarding at scale: VLAN/trunk, LAG, MAC learning/aging/move behavior.
  • L3 offload at capability level: routing tables, neighbor tables (ARP/ND), ECMP behaviors and limits.
  • ACL policy in hardware: TCAM matching, rule priority, policers, per-rule counters.
  • Multicast control: IGMP/MLD snooping, querier presence, flooding guards.
  • QoS without TSN windows: class mapping, queues, schedulers, shapers, WRED/ECN hooks.
  • Line-rate telemetry: counters, mirroring, events, field logs, and “black-box” diagnostics hooks.
  • Engineering hooks: bring-up loopback/PRBS usage and counter baselining.
Out-of-scope (routed to dedicated pages)
Keyword routing rules (fast navigation)
If the question contains TSN window terms…
Keywords: Qbv, Qci, Qav, GCL, gate-control list, time-slot, admission control → TSN Switch / Bridge
If the question contains time-sync terms…
Keywords: PTP one-step/two-step, E2E/P2P, SyncE, holdover, WR → Timing & Sync
If the question contains power-over-cable terms…
Keywords: 802.3af/at/bt, PSE, PD, PoDL class, center-tap, power allocation → PoE / PoDL
If the question contains stack/certification terms…
Keywords: PROFINET IRT, EtherCAT DC, CIP Sync, certification, conformance → Industrial Ethernet Stacks
If the question contains EMC/protection terms…
Keywords: TVS, CMC, ESD, surge, shield bond, return path, creepage, intrinsic safety → PHY Co-Design & Protection
Stop-line examples (how out-of-scope questions are handled)
Question: “How to build the TSN gate-control list (Qbv)?”
Handling: This page only checks whether the switch silicon exposes TSN window capabilities; mechanism and parameter tables are routed to TSN Switch / Bridge.
Question: “Why does PTP offset drift over temperature?”
Handling: This page only covers timestamp exposure and switch-side observability counters; drift analysis is routed to Timing & Sync.
Question: “How to size PoE surge protection on the center-tap?”
Handling: Power-path and protection sizing are routed to PoE / PoDL and PHY Co-Design & Protection.
Scope Guard Routing Map: a central node for this page (Managed / Enterprise Switch IC: L2/L3 · ACL/TCAM · IGMP · QoS · Telemetry) with keyword-labeled arrows to TSN Switch / Bridge (Qbv/Qci/Qav, GCL), Timing & Sync (PTP, SyncE, WR), PoE / PoDL (PSE/PD, classes), Industrial Stacks (PROFINET, EtherCAT, CIP), PHY Protection (TVS, CMC, shield/return), and Security (MACsec, DTLS/TLS).
Diagram: A routing map. When keywords match TSN timing windows, sync, PoE, stacks, protection, or security, the deep-dive belongs to the linked page.

H2-3 · Silicon Block Diagram: Data Plane vs Control Plane vs Management

Section intent
  • Split switch silicon into data plane, control plane, and management plane to form a stable “map” for all later chapters.
  • Bind each block to an observability bucket (counters / status / table resources) so failures can be localized quickly.
  • Keep TSN/PTP/PoE mechanisms out of scope; only capability-level routing is allowed.
Three-plane engineering definition
Data plane (fast path)
Line-rate parsing, lookup, policy, buffering, scheduling, and egress. Primary risks are drop/latency behaviors driven by table misses, ACL actions, queue buildup, and buffer limits.
Control plane (slow path)
CPU/assist logic that maintains tables and handles exceptions (learn/age, route/neighbor maintenance, punts). Primary risk is CPU punt storms that turn line-rate traffic into a bottleneck.
Management plane (operations & telemetry)
Configuration, telemetry export, event logs, and remote access paths (PCIe/MDIO/I²C/EEPROM). Primary risk is incomplete visibility that prevents root-cause analysis and repeatable recovery.
Module inventory (what exists, where it belongs, what to observe)
Port MAC / PCS / SerDes
Data
Role: capture frames at line rate and surface link-quality signals. Observability: link flap, CRC/FCS, symbol/code errors, pause/PFC, EEE entry/exit.
Parser / Classifier
Data
Role: extract L2/L3/L4 fields and build metadata (VLAN/DSCP/flow key). Observability: parse errors, unknown types, punt reasons.
Lookup: FDB / LPM / Neighbor
Data
Role: decide forwarding domain and next hop using tables. Observability: table utilization, hit/miss, collision/rehash, route/ARP/ND pressure.
ACL / Policy (TCAM + actions)
Data
Role: enforce allow/deny/remark/police/mirror decisions. Observability: per-rule hits, policer drops, remark counts, mirror triggers.
Queue / Buffer Manager
Data
Role: enqueue traffic classes, manage shared buffers, mark or drop under congestion. Observability: queue depth peak, WRED/tail drops, pause/PFC events.
Scheduler / Shaper
Data
Role: determine service order and rate limits (SP/WRR/WFQ, token bucket). Observability: serviced bytes, shaper throttles, queue starvation indicators.
CPU / Exception Engine
Control
Role: maintain tables and handle punt traffic. Observability: punt rate by reason, CPU load, table update latency, event timestamps.
Config / Telemetry / Logs (PCIe/MDIO/I²C/EEPROM)
Mgmt
Role: configure, export counters, and preserve field evidence. Observability: config change log, export health, black-box ring buffer fields.
Observability taxonomy (counter/register buckets)
  • Port quality: link flap, CRC/FCS, symbol/code errors, pause/PFC, EEE.
  • Parse/classify: parse errors, unknown types, punt-to-CPU reasons.
  • Table resources: FDB/LPM/neighbor/TCAM utilization, hit/miss, collision/rehash.
  • Policy actions: ACL hit per rule, policer drops, remark/mirror counters.
  • Queues/buffers: depth peaks, WRED/tail drops, congestion marks, pause events.
  • Events/logs: config changes, thermal/power, port flap timeline, black-box fields.
Switch Silicon Blocks (Data vs Control vs Management): the data-plane fast path (SerDes/port MAC → parser building VLAN/DSCP/flow-key metadata → lookup: FDB (L2), LPM, ND/ARP → ACL/policy with TCAM and policer → queue/buffer manager with shared buffer → scheduler/shaper with SP/WRR, token bucket, and WRED/ECN hooks → egress rewrite (L2/L3), mirror/SPAN, transmit), with side blocks for the control plane (CPU, table maintenance, punt handling), the management plane (PCIe, MDIO, I²C, EEPROM, telemetry export, events/logs), and TSN shown as optional, capability-level only.
Diagram: The map used by later chapters. Each block binds to a counter/resource bucket so failures can be localized without mixing TSN/PTP/PoE mechanisms.

H2-4 · Switching Pipeline Deep Dive (Ingress→Decision→Egress)

Section intent
  • Explain the end-to-end packet journey using a fixed 8-stage pipeline.
  • Highlight the decision points (lookup, ACL, classify, queue/buffer) that dominate loss and latency.
  • Attach typical counters, failure modes, and pass-criteria placeholders (X) per stage.
The fixed 8-stage pipeline
  1. Ingress receive: Port MAC/PCS/SerDes accepts a frame and records link-layer health.
  2. Parse & metadata: Header fields become metadata (VLAN/DSCP/flow key).
  3. Lookup (L2/L3): FDB/LPM/neighbor tables decide destination and next hop. Decision
  4. ACL/Policy: TCAM match triggers allow/deny/remark/police/mirror. Decision
  5. Classify to QoS: Map to traffic class/queue/color. Decision
  6. Enqueue & buffer: Shared buffer/queue behavior determines drops/marks. Decision
  7. Schedule & shape: Service order and rate limiting decide timing.
  8. Egress transmit: Rewrite, mirror, and transmit on the destination port.
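The eight stages can be sketched as a chain of checks with per-stage counters. Everything below is illustrative, not a real switch SDK: the counter names, the FDB/ACL table shapes, and the frame fields are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Counters:
    crc_errors: int = 0
    parse_errors: int = 0
    fdb_miss: int = 0
    acl_deny: int = 0
    transmitted: int = 0

FDB = {("aa:bb", 10): 2}    # (dst MAC, VLAN) -> egress port (hypothetical)
ACL = [("deny", "ee:ff")]   # ordered first-hit rules keyed on src MAC

def forward(frame, ctr):
    # 1 · Ingress receive: bad FCS is counted and dropped at the port
    if frame.get("bad_fcs"):
        ctr.crc_errors += 1
        return None
    # 2 · Parse & metadata: frames that cannot be classified are counted
    if "vlan" not in frame:
        ctr.parse_errors += 1
        return None
    # 3 · Lookup (decision): FDB hit picks a port, miss floods the VLAN
    port = FDB.get((frame["dst"], frame["vlan"]))
    if port is None:
        ctr.fdb_miss += 1
        port = "flood"
    # 4 · ACL/policy (decision): first-hit deny drops the frame
    for action, src in ACL:
        if src == frame["src"] and action == "deny":
            ctr.acl_deny += 1
            return None
    # 5-6 · Classify + enqueue (decision): PCP selects the queue
    frame["queue"] = frame.get("pcp", 0)
    # 7-8 · Schedule + egress transmit
    ctr.transmitted += 1
    return port
```

The point of the sketch is the counter placement: each stage that can lose a frame owns a counter, which is what makes the later stage table verifiable.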
Pipeline stage table (function → counters → failure → pass criteria)
Stage 1 · Ingress receive
Typical counters: link flap, CRC/FCS, code errors, pause/PFC events.
Common failure: physical errors masquerading as higher-layer drops.
Pass criteria: CRC/FCS error rate < X over Y minutes; link up stability > X hours.
Stage 2 · Parse & metadata
Typical counters: parse errors, unknown types, punt reasons.
Common failure: wrong VLAN/DSCP interpretation causes incorrect class or policy path.
Pass criteria: parse error count < X; metadata classification matches test vectors.
Stage 3 · Lookup (L2/L3) — Decision point
Typical counters: FDB/LPM/neighbor hit/miss, table utilization, collision/rehash.
Common failure: table miss or resource exhaustion leads to flooding or CPU punt.
Pass criteria: utilization < X%; miss rate < X per 1k frames during steady state.
Stage 4 · ACL/Policy — Decision point
Typical counters: per-rule hit, deny drops, policer drops, remark/mirror counts.
Common failure: rule priority/order mismatch causes unintended drops or remarking.
Pass criteria: expected rule-hit distribution; deny drops < X for allowed flows.
Stage 5 · Classify to QoS — Decision point
Typical counters: class-to-queue mapping stats, remark counters, per-queue byte counters.
Common failure: wrong mapping sends critical traffic into a congested queue.
Pass criteria: critical class always maps to target queue; mismatch rate < X.
Stage 6 · Enqueue & buffer — Decision point
Typical counters: queue depth peak, tail drops, WRED drops, pause/PFC events.
Common failure: microbursts build queues faster than drain rate; drops appear despite low average utilization.
Pass criteria: peak depth < X; drop rate < X%; recovery time < X ms.
Stage 7 · Schedule & shape
Typical counters: served bytes, starvation indicators, shaper throttles.
Common failure: strict priority starvation or shaper parameters too aggressive.
Pass criteria: starvation events < X; shaped rate within ±X% of target.
Stage 8 · Egress transmit
Typical counters: egress drops, mirror oversubscription, rewrite errors (if exposed).
Common failure: egress congestion or mirror path drops hide the root signal.
Pass criteria: egress drop rate < X; mirror capture completeness > X%.
Switching Pipeline (Ingress to Egress): a numbered 1-to-8 path with decision points highlighted: 1 ingress MAC/SerDes; 2 parse and metadata build; 3 lookup (FDB, LPM, ND); 4 ACL/policy (TCAM, police); 5 classify (DSCP/PCP to queue); 6 enqueue (queues, buffer); 7 schedule (SP/WRR, shaper); 8 egress rewrite and TX. Localization rule: observe at the decision points (lookup miss/resources, ACL order/action, QoS mapping, queue/buffer peaks) before tuning rate limits. TSN windows and PTP algorithms are routed to dedicated pages; this pipeline stays capability-level.
Diagram: Stages 3–6 dominate most “mystery drops” and latency spikes. Use counters at those decision points before changing shaping parameters.

H2-5 · L2 Feature Set: VLAN, MAC Learning, STP, LAG

Section intent & scope
  • Turn L2 terms into repeatable engineering checks: configuration entry points, observability, and failure patterns.
  • Cover capability-level interfaces and pitfalls on enterprise switches.
  • Not covered: ring redundancy protocol implementation details (MRP/HSR/PRP) and deep protocol internals.
VLAN & trunking (define the forwarding domain)
VLAN behavior is an L2 boundary contract. Most “mystery cross-talk” comes from boundary collapse: a wrong PVID/native VLAN pairing, an overly wide allowed list, or inconsistent tag/untag behavior across ports.
  • Ingress decision: untagged frames map into a VLAN (PVID / port VLAN).
  • Egress contract: access ports untag; trunk ports tag (plus native VLAN exception).
  • Allowed list: the trunk must explicitly permit the VLANs that should cross.
  • Common pitfall: VLAN leakage when native VLAN and PVID assumptions differ across links.
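The boundary contract above can be sketched as two small decision functions: ingress VLAN assignment (PVID/native) and egress tag/untag against the allowed list. Port names and the configuration shape are hypothetical, not from any real device.

```python
# Hypothetical port configuration for the sketch
PORTS = {
    "access1": {"mode": "access", "pvid": 10},
    "trunk1":  {"mode": "trunk", "native": 1, "allowed": {1, 10, 20}},
}

def ingress_vlan(port, tag):
    """Map an arriving frame into a VLAN; None means dropped at the boundary."""
    cfg = PORTS[port]
    if tag is None:                               # untagged frame
        return cfg["pvid"] if cfg["mode"] == "access" else cfg["native"]
    if cfg["mode"] == "trunk" and tag in cfg["allowed"]:
        return tag
    return None                                   # tag not permitted here

def egress_tag(port, vlan):
    """Return the tag to emit, None for untagged, or 'drop'."""
    cfg = PORTS[port]
    if cfg["mode"] == "access":
        return None if vlan == cfg["pvid"] else "drop"
    if vlan not in cfg["allowed"]:
        return "drop"
    return None if vlan == cfg["native"] else vlan  # native leaves untagged
```

The native-VLAN branch is exactly where the "VLAN leakage" pitfall lives: if two ends of a link disagree on native/PVID, untagged frames silently cross domains.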
MAC learning / FDB (learn → age → move)
Forwarding behavior depends on FDB health. When learning is polluted or aging is too aggressive, the switch falls back to flooding or exception handling, which can look like random loss or bursts of latency.
  • Learn: source MAC + VLAN binds to an ingress port in the FDB.
  • Age: inactive entries expire; traffic to unknown destinations floods.
  • Move: the same MAC appears on a different port; repeated moves become MAC flapping.
  • Common pitfall: flapping indicates loops, wiring errors, dual-homing mistakes, or mis-scoped VLANs.
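The learn/age/move cycle can be sketched as a small table with a flap counter; the aging timer and table shape are illustrative, and real silicon differs in how it counts moves on aged entries.

```python
class Fdb:
    def __init__(self, aging=300):
        self.table = {}           # (mac, vlan) -> {"port": p, "seen": t}
        self.aging = aging        # seconds before an entry expires
        self.moves = 0            # MAC-move (flap) evidence counter

    def learn(self, mac, vlan, port, now):
        key = (mac, vlan)
        entry = self.table.get(key)
        if entry and entry["port"] != port:
            self.moves += 1       # same MAC seen on a new port
        self.table[key] = {"port": port, "seen": now}

    def lookup(self, mac, vlan, now):
        entry = self.table.get((mac, vlan))
        if entry is None or now - entry["seen"] > self.aging:
            return "flood"        # miss or aged out: flood inside the VLAN
        return entry["port"]
```

A rising `moves` rate is the cheapest first signal for loops, wiring errors, or dual-homing mistakes, before any protocol-level capture is needed.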
STP (interface-level behavior, not protocol internals)
STP changes forwarding eligibility on ports. During state transitions, MAC movement and transient flooding can increase. The most actionable checks are port state and state-transition timing, not protocol packet details.
  • State visibility: blocking / learning / forwarding per port.
  • Failure symptom: “only one uplink works” when a port is blocked unexpectedly.
  • Interaction hotspot: STP + trunk + LAG can disguise the true failing link.
LAG (aggregation) — hash, symmetry, and imbalance evidence
LAG scales bandwidth by distributing flows across member links. The distribution is governed by a hash key (L2/L3/L4 fields). Imbalance is often traffic-shape driven (few large flows), not a silicon defect.
  • Hash key: choose fields that reflect flow diversity (e.g., include L3/L4 when possible).
  • Symmetry: ensure bidirectional flows keep a stable member mapping if required by the system.
  • Imbalance proof: compare per-member byte/packet counters and peak queue depth per egress.
  • Common pitfall: changing hash keys without verifying upstream NAT/encapsulation effects.
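The hash-distribution point can be demonstrated with a minimal sketch: a stable hash of the flow key picks a member, and per-member byte counters make imbalance visible. The member names and the CRC32-based hash are assumptions for illustration; real silicon uses proprietary hash functions.

```python
import zlib

MEMBERS = ["m1", "m2", "m3"]

def member_for(flow_key):
    # Stable hash of (src, dst, proto, sport, dport) -> one member link
    digest = zlib.crc32(repr(flow_key).encode())
    return MEMBERS[digest % len(MEMBERS)]

def distribute(flows):
    """flows: list of (flow_key, byte_count); returns per-member byte counters."""
    counters = {m: 0 for m in MEMBERS}
    for key, size in flows:
        counters[member_for(key)] += size
    return counters
```

A single elephant flow always lands on one member no matter how many links exist, which is why per-member counters (not average link utilization) are the right imbalance evidence.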
L2 checklist (configuration → verify → failure → pass criteria)
VLAN boundary
What to set: port mode (access/trunk), PVID/native VLAN, allowed VLAN list.
Quick verify: per-port VLAN membership, tagged/untagged egress behavior.
Failure symptom: VLAN leakage or silent blackholes on specific VLANs.
Pass criteria: only intended VLANs traverse trunks; leakage events = 0 (X).
FDB stability
What to set: aging time, learning rules (per VLAN if supported), storm guards.
Quick verify: learn/age/move counters, FDB utilization, MAC move rate.
Failure symptom: MAC flapping, flooding bursts, CPU punt increases.
Pass criteria: move rate < X/min; utilization < X%; flooding stable.
STP operational sanity
What to set: STP mode enablement, port roles, edge/portfast policy (if applicable).
Quick verify: per-port STP state, state-transition timing, topology-change indicators.
Failure symptom: unexpected blocking, intermittent reachability during convergence.
Pass criteria: expected ports in forwarding; convergence < X seconds.
LAG distribution
What to set: member links, LACP mode, hash key selection.
Quick verify: per-member counters, hash distribution, per-queue depth on egress.
Failure symptom: one member saturates while others idle; microburst drops on one path.
Pass criteria: member utilization within X%; drops < X under expected load.
L2 Engineering Flow (VLAN + FDB + LAG): lane A shows VLAN tag/untag decisions (ingress untagged frames take the PVID into the VLAN domain; egress access ports untag, trunk ports tag against the allowed VLAN list); lane B shows FDB learning (source MAC + VLAN), timer-based aging to expire-and-flood, and MAC moves feeding a flap counter with CPU punt risk; lane C shows LAG hashing of the flow key (L2/L3/L4) across members M1/M2/M3 with per-member counters as imbalance evidence.
Diagram: VLAN boundary decisions, FDB learning/aging/move, and LAG hashing are the three L2 pillars that most often explain “unexpected” reachability and drops.

H2-6 · L3 Feature Set: Routing, ARP/ND, ECMP, NAT (Capability-level)

Section intent & scope
  • Explain what L3 hardware forwarding actually does in silicon: lookup → next-hop → rewrite → egress.
  • Focus on resource bottlenecks that cause punts and performance cliffs (route, neighbor, ECMP, ACL).
  • Not covered: dynamic routing protocol courses (OSPF/BGP) and deep control-plane protocol design.
L3 hardware fast path (what happens on a hit)
On a route hit, silicon performs deterministic actions: longest-prefix match (or host route), next-hop selection, header rewrite, and handoff into the egress pipeline. When lookups miss or neighbors are unresolved, traffic is punted or flooded, producing sudden latency and throughput collapse.
  • Lookup: LPM and host routes decide the forwarding action.
  • Next-hop: bind to a resolved neighbor (ARP/ND) and output port.
  • Rewrite: MAC DA/SA, TTL decrement, IP checksum update.
  • Egress: enqueue/schedule applies the configured QoS policy (no algorithm deep dive here).
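The hit path can be sketched end to end: longest-prefix match, neighbor resolution, and header rewrite, with misses punting to the slow path. The route and neighbor tables are made up for the sketch, and the checksum update is omitted.

```python
import ipaddress

ROUTES = {                       # prefix -> next-hop IP (hypothetical)
    "10.0.0.0/8": "10.0.0.1",
    "10.1.0.0/16": "10.1.0.1",
}
NEIGHBORS = {"10.1.0.1": ("02:00:00:00:00:01", "eth1")}  # resolved ARP/ND

def lpm(dst_ip):
    """Longest-prefix match over ROUTES; returns the next-hop IP or None."""
    best = None
    for prefix, nh in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if ipaddress.ip_address(dst_ip) in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, nh)
    return best[1] if best else None

def route(pkt):
    nh = lpm(pkt["dst"])
    if nh is None:
        return "punt:no-route"                 # lookup miss -> slow path
    resolved = NEIGHBORS.get(nh)
    if resolved is None:
        return "punt:unresolved-neighbor"      # triggers ARP/ND resolution
    mac, port = resolved
    pkt.update(ttl=pkt["ttl"] - 1, dmac=mac)   # rewrite (checksum omitted)
    return port
```

Note that 10.1.2.3 matches both prefixes and the /16 wins, which is the whole point of LPM; the two punt strings mark exactly where the "cliff" behaviors in this section begin.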
ARP/ND (neighbor table) — the frequent hidden bottleneck
Many real systems fail due to neighbor pressure, not route count. Neighbor misses trigger resolution traffic and punts. Aging and churn can create intermittent “it works, then stalls” patterns.
  • Resource: ARP/ND entries and unresolved (incomplete) entries.
  • Symptoms: punt spikes, burst latency, periodic stalls as neighbors expire.
  • Observability: neighbor utilization, incomplete count, ARP/ND request rate (X).
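Neighbor pressure can be sketched with a tiny table that distinguishes resolved entries from incomplete ones and counts punts; capacity, method names, and the resolution model are all illustrative.

```python
class NeighborTable:
    def __init__(self, capacity=4):
        self.entries = {}         # ip -> mac, or None while incomplete
        self.capacity = capacity
        self.punts = 0            # punt counter: the key observability signal

    def resolve(self, ip, mac):
        if ip in self.entries:
            self.entries[ip] = mac       # ARP/ND reply fills the entry

    def next_hop(self, ip):
        mac = self.entries.get(ip)
        if mac:
            return mac                   # fast path: resolved neighbor
        self.punts += 1                  # miss or incomplete -> punt + request
        if ip not in self.entries and len(self.entries) < self.capacity:
            self.entries[ip] = None      # incomplete entry awaiting reply
        return None

    def incomplete(self):
        return sum(1 for m in self.entries.values() if m is None)
```

The `incomplete()` count and the punt rate together reproduce the "works, then stalls" pattern: every expiry reopens a window of misses until resolution completes.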
ECMP (multi-path) — throughput vs. explainability
ECMP spreads flows across multiple next-hops using a hash key. It improves aggregate throughput but introduces a distribution model that must be observable: per-path counters and stable hashing assumptions matter.
  • Hash key: field selection determines balance and flow stability.
  • Resource: ECMP groups and paths; exhaustion forces fallback behavior.
  • Evidence: per-path utilization and queue depth peaks, not average link usage.
NAT (capability-level only)
If NAT exists in an enterprise switch IC, the practical constraints are session-table capacity and timeout behavior. When session resources saturate, new-flow setup fails or punts to slow paths. This section records capability and resource impact only.
  • Resource: session entries, port allocation, per-session timers (X).
  • Symptoms: new connections fail, intermittent stalls, CPU punt storms.
Resource budget placeholders (capacity planning)
Route entries
X
Consumed by prefixes/VRFs. Exhaustion causes lookup miss and punt behavior. Pass criteria: utilization < X%.
ARP/ND entries
X
Consumed by active neighbors. Exhaustion increases unresolved entries and punts. Pass criteria: incomplete < X; utilization < X%.
ECMP groups / paths
X
Consumed by multi-path routes. Exhaustion forces reduced path diversity or software fallback. Pass criteria: utilization < X%.
ACL entries (TCAM)
X
Consumed by security and policy filters. Exhaustion leads to rule compromises or unexpected default behavior. Pass criteria: headroom > X%.
L3 Hardware Forwarding Path (capability-level): parser builds the IPv4/IPv6 key → lookup (LPM, host routes) → next-hop resolution (ARP/ND, output port) → rewrite (MAC DA/SA, TTL decrement, IP checksum update) → egress queue and transmit, with per-path counters. Misses punt to the control plane, which resolves ARP/ND and updates tables. Cliff behavior: route miss, neighbor miss, or ECMP exhaustion leads to punt and latency/throughput collapse. Watch utilization and incomplete entries before tuning QoS parameters.
Diagram: L3 offload is deterministic on hits. Most production cliffs come from route/neighbor/ECMP resource misses that punt traffic to slow paths.

H2-7 · QoS, Scheduling & Shaping (Non-TSN)

Section intent & boundary (Non-TSN determinism)
  • Convert QoS from “settings” into an explainable queue/scheduler/shaper model.
  • Define the non-TSN boundary: statistical service guarantees, not time-slot determinism.
  • Not covered: TSN time-aware shaping (Qbv) and time-slot scheduling. CBS is mentioned only as “capability supported or not”.
QoS truth chain: DSCP/PCP → class → queue → scheduler
All QoS outcomes depend on a single internal truth: which packets map into which traffic class and egress queue. If mapping is ambiguous, scheduler and shaping tuning becomes non-repeatable.
  • Inputs: DSCP (L3), PCP (VLAN), port-based policy, ACL remark.
  • Outputs: traffic class and queue ID with fixed counters per queue.
  • First verification: per-class and per-queue counters align with expected traffic.
Queues & buffers: microburst → depth → tail latency → drop/mark
Many industrial “P99 spikes” happen under moderate average load due to microbursts. The correct evidence is queue depth and watermarks, not average utilization.
  • Depth/occupancy: rising depth predicts latency growth before drops begin.
  • Watermarks: peak depth reveals brief congestion that averages hide.
  • Drop/mark: thresholds trigger WRED/ECN or tail drop depending on policy.
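The microburst claim is easy to reproduce numerically: a queue with a finite capacity and fixed drain rate drops during a burst even when average load sits below the drain rate. All numbers are made up for the sketch.

```python
def simulate_queue(arrivals, drain=2, capacity=10):
    """arrivals[i] = packets in slot i; drain = packets served per slot.
    Returns (peak depth, tail drops)."""
    depth, peak, drops = 0, 0, 0
    for a in arrivals:
        depth += a
        if depth > capacity:
            drops += depth - capacity    # tail drop above the watermark
            depth = capacity
        peak = max(peak, depth)          # watermark: what averages hide
        depth = max(0, depth - drain)
    return peak, drops

# 20 slots averaging 1.5 pkt/slot (below the drain rate of 2), but bursty:
burst = [15, 0, 0, 0, 15, 0, 0, 0] + [0] * 12
```

Running `simulate_queue(burst)` shows a saturated watermark and nonzero drops at 75% average load, which is exactly why depth peaks, not utilization averages, are the right evidence.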
Scheduling: SP vs WRR/WFQ (service guarantees and failure modes)
Scheduler choice defines who gets served first under contention. SP protects critical flows but can starve lower classes. WRR/WFQ improves fairness, but does not provide strict time-slot determinism.
  • SP (Strict Priority): predictable for high class; starvation risk for low class.
  • WRR/WFQ: controlled sharing; tail latency still depends on burstiness and buffers.
  • Evidence: per-queue service counters and per-queue drop/mark rates.
Shaping: token bucket (rate + burst) and pacing stability
Token bucket shaping enforces a long-term rate limit while allowing short bursts. A too-small burst creates frequent throttling; a too-large burst propagates congestion into downstream hops.
  • Rate: average bandwidth ceiling for the class.
  • Burst: short-term tolerance window; directly impacts jitter and microburst smoothing.
  • Evidence: shaper drop/mark counters and “sawtooth” throughput patterns.
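The rate-plus-burst behavior can be sketched with a textbook token bucket; the rate/burst values and the `allow` interface are illustrative, not tied to any chip's shaper registers.

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate          # tokens (e.g. bytes) refilled per second
        self.burst = burst        # bucket cap = short-term burst tolerance
        self.tokens = burst
        self.last = 0.0

    def allow(self, size, now):
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True           # conforming: transmit
        return False              # exceeds rate + burst: drop or mark
```

The burst parameter is the jitter knob the section describes: too small and conforming traffic is throttled between refills, too large and bursts pass through to congest the next hop.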
Congestion control: WRED / ECN (trigger window, not algorithm theory)
WRED/ECN is best treated as a trigger window. Below the lower threshold, no action. Between thresholds, probabilistic mark/drop. Above the upper threshold, forced action. ECN requires end-to-end support to be effective.
  • Early action: mark/drop begins as depth enters a configured window.
  • Forced action: tail drop or forced mark at high depth.
  • Evidence: ECN mark counters, early drop counters, and depth correlation.
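The trigger window reads naturally as a three-branch function: pass below the window, forced drop above it, and a linear mark/drop probability inside it. Thresholds and the maximum probability are placeholders.

```python
import random

def wred_action(depth, lo=20, hi=60, max_p=0.5, rng=random.random):
    """Return 'pass', 'mark', or 'drop' for the current queue depth."""
    if depth < lo:
        return "pass"                          # below the window: no action
    if depth >= hi:
        return "drop"                          # above the window: forced action
    p = max_p * (depth - lo) / (hi - lo)       # linear ramp inside the window
    return "mark" if rng() < p else "pass"     # ECN mark needs end-to-end support
```

Passing a fixed `rng` makes the probabilistic middle branch testable, which mirrors the field practice of correlating mark counters with depth rather than trusting the configuration alone.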
Deliverable · QoS mapping (DSCP/PCP → class → queue → scheduler)
The mapping is presented as a list rather than a wide table. Each entry is a policy row that can be verified with per-queue counters.
Row A · Critical control
Example
Input: DSCP = X, PCP = X
Class: TC = X
Queue: Q = X
Scheduler: SP / WRR (X)
Verify: Q counters increase only for intended traffic.
Row B · Best-effort
Example
Input: DSCP = X, PCP = X
Class: TC = X
Queue: Q = X
Scheduler: WRR / WFQ (X)
Verify: depth watermarks and drop rate remain within targets.
Pass criteria placeholders (measured at defined points)
Loss rate
Target: drop rate < X per 1M packets on queue Q under load profile Y.
Tail latency
Target: P99 < X ms for class TC measured at egress port P.
Congestion recovery
Target: queue depth returns below threshold within X seconds after burst ends.
Non-TSN QoS Model (classify → queue → schedule → shape → egress): classify (DSCP/PCP, policy/ACL) → queues Q0–Q3 with observability points (depth, drop, mark) → scheduler (SP, WRR/WFQ) → token-bucket shaper (rate, burst) → egress port TX counters. Boundary: non-TSN, no time-slot determinism.
Diagram: QoS becomes explainable when mapping, queue depth, scheduler choice, shaping parameters, and drop/mark triggers are observed together.

H2-8 · Multicast Control: IGMP/MLD Snooping, Querier, Storm Guard

Section intent: first-principles troubleshooting
  • Explain multicast as a table-driven forwarding system (member ports vs. VLAN flooding).
  • Provide the shortest troubleshooting path for: flooding, subscription loss, and latency jitter.
  • Keep the focus on IGMP/MLD snooping and operational controls (querier, aging, storm guard), not protocol theory.
Core model: group table present → member-only forwarding
Snooping turns multicast from “flood inside VLAN” into “forward only to member ports”. If the group table is missing, stale, or not refreshed, behavior degrades to flooding or intermittent delivery.
  • IGMP: IPv4 multicast control messages.
  • MLD: IPv6 multicast control messages.
  • One shared principle: maintain group membership and refresh timers.
Deliverable · Snooping state machine (join / leave / aging)
Join
Trigger: host sends join (report).
Table action: add/update group entry + member port.
Observe: group entry count increases; member port list updates.
Leave
Trigger: host sends leave.
Table action: start confirmation (query/timeout) then remove member.
Observe: leave counters; member removal after confirmation window (X).
Aging
Trigger: refresh timeout.
Table action: expire group/member entries.
Observe: aging counters; sudden flooding if table empties unexpectedly.
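The join/leave/aging state machine above can be sketched as a group table whose lookup degrades to VLAN flooding when no live members remain; the aging value and method names are illustrative.

```python
class SnoopTable:
    def __init__(self, age_after=260):
        self.groups = {}              # group -> {member port: last refresh}
        self.age_after = age_after    # refresh timeout (illustrative)

    def join(self, group, port, now):
        self.groups.setdefault(group, {})[port] = now   # add/refresh member

    def leave(self, group, port):
        # Modeled as already past the confirmation (query/timeout) window
        self.groups.get(group, {}).pop(port, None)

    def egress_ports(self, group, vlan_ports, now):
        members = {p for p, t in self.groups.get(group, {}).items()
                   if now - t <= self.age_after}
        # Empty or fully aged table degrades to VLAN-wide flooding
        return members if members else set(vlan_ports)
```

The fallback branch is the diagnostic core of this section: the same lookup that gives member-only forwarding silently becomes flooding when the querier stops refreshing state.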
Shortest troubleshooting path: flooding
  • Step 1 — Querier: verify a querier exists on the VLAN; missing querier often triggers table decay and flooding.
  • Step 2 — Aging: check whether group entries age out faster than refresh; correlate with entry count drops.
  • Step 3 — VLAN & ACL: confirm IGMP/MLD control messages are not blocked or isolated.
  • Step 4 — CPU punt: if control-plane load spikes, snooping updates may lag and degrade to flood.
Shortest troubleshooting path: subscription loss / latency jitter
  • Subscription loss: confirm join arrives → confirm member port added → confirm forwarding only to members.
  • Latency jitter: verify whether flooding is occurring (unexpected fan-out) → check storm guard thresholds → check queue depth.
  • Common trap: storm control silently drops BUM traffic when snooping degrades to flood.
Storm guard (protection vs. false positives)
Storm control protects the fabric by limiting broadcast/unknown-unicast/multicast (BUM). When snooping fails and multicast turns into VLAN-wide flooding, storm guard can become the dominant loss mechanism. Thresholds must be validated for degraded modes.
  • Threshold placeholder: BUM rate limit = X (per port / per VLAN depending on chip).
  • Verify: storm-drop counters correlate with jitter/loss events.
Multicast Control (snooping vs flooding degradation): hosts join via report, the snooping switch builds a group table (members, aging timer), and forwards only to member ports while a querier refreshes state with periodic queries. When the querier is missing, the table decays and forwarding degrades to VLAN-wide flooding, where storm guard can dominate loss.
Diagram: With snooping, multicast forwards only to member ports. If querier/refresh fails, behavior degrades toward VLAN flooding and storm control may become the primary loss mechanism.

H2-9 · ACL & Security Hooks: TCAM, Policer, Isolation

Section intent & boundary (capability-level security hooks)
  • Explain ACL as a resource system: TCAM width/slices/priority and how rule overlap causes “false drops”.
  • Turn ACL from “can it be configured” into “can it be verified and scaled without conflicts”.
  • Not covered: deep MACsec/TLS/key management. Only “integrated or not” and interface hooks are discussed.
ACL insertion point defines what can be matched
ACL behavior is not only about “fields supported” but also about where ACL is evaluated in the forwarding pipeline. Parser availability, metadata visibility, and stage ordering define match fidelity.
  • Ingress (early): fast filtering, may have limited L4 visibility.
  • Post-lookup: can combine L2/L3 decision metadata with policy.
  • Egress (late): can police/shape/mirror with queue context.
TCAM resource model: entries ≠ capacity
Rule capacity depends on key width and slicing. A single “complex” rule can consume multiple slices/banks. Shared TCAM pools across features can also reduce effective ACL capacity.
  • Width expansion: IPv6 + 5-tuple + inner headers increase key width and slice usage.
  • Sharing: ACL/Policy/Filters may share TCAM slices (implementation-specific).
  • Practical check: track TCAM utilization and rule install failures at scale.
Priority & conflicts: overlap + default action causes false drops
ACL issues often present as “random” failures because overlapping rules and ambiguous priority resolve differently than expected. A misaligned default action can turn misses into widespread drops.
  • Overlap trap: a broad deny rule can shadow a narrow permit rule.
  • Stop condition: first-hit behavior vs. multi-hit behavior determines resolution.
  • Default action: permit/deny on miss must be explicitly defined and tested.
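The shadowing trap can be shown with a minimal first-hit evaluator (a sketch of the resolution semantics, not any vendor's rule engine; field names are hypothetical): a broad deny installed at higher priority silently absorbs traffic a narrower permit was meant to pass.

```python
def first_hit(rules, pkt):
    """First-hit resolution: rules checked in priority order; miss -> default."""
    for rule in rules:
        if all(pkt.get(f) == v for f, v in rule["match"].items()):
            return rule["action"], rule["name"]
    return "deny", "default"  # the miss action must be explicit and tested

# Broad deny above a narrow permit shadows it: the classic "false drop".
rules = [
    {"name": "deny_subnet", "match": {"dst_net": "10.0.0.0/8"}, "action": "deny"},
    {"name": "permit_nms",  "match": {"dst_net": "10.0.0.0/8", "dport": 161},
     "action": "permit"},
]
pkt = {"dst_net": "10.0.0.0/8", "dport": 161}
assert first_hit(rules, pkt) == ("deny", "deny_subnet")   # permit never hits
assert first_hit(list(reversed(rules)), pkt) == ("permit", "permit_nms")
```

Per-rule hit counters expose exactly this: a permit rule whose counter never increments while an overlapping deny counts up is shadowed, not broken.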
Action engine: permit/deny/remark/police/mirror
ACL actions can reshape traffic behavior, not just allow or drop. Policing and mirroring are common sources of “intermittent” symptoms when thresholds or overhead are underestimated.
  • Remark: DSCP/PCP rewrite can change QoS mapping downstream.
  • Police: exceeding CIR/PIR or burst limits triggers drops (often mistaken for link instability).
  • Mirror: SPAN overhead can amplify congestion and distort latency measurements.
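Why policing looks like "link instability" follows from the token-bucket mechanics; a minimal single-rate sketch (CIR plus a burst allowance CBS, both values illustrative) shows that back-to-back frames can be dropped even though the long-term rate is under CIR.

```python
def police(events, cir_bps, cbs_bytes):
    """Single-rate token bucket: conform if tokens cover the frame, else drop.
    events: list of (timestamp_s, frame_bytes), timestamps non-decreasing."""
    tokens, last_t = float(cbs_bytes), 0.0
    verdicts = []
    for t, size in events:
        tokens = min(cbs_bytes, tokens + (t - last_t) * cir_bps / 8)  # refill
        last_t = t
        if tokens >= size:
            tokens -= size
            verdicts.append("conform")
        else:
            verdicts.append("drop")  # often misread as physical-link trouble
    return verdicts

# 1 Mbit/s CIR with a 1500 B burst allowance: the second back-to-back frame
# exceeds CBS and drops, yet the average rate here is well under CIR.
v = police([(0.0, 1500), (0.0001, 1500), (1.0, 1500)], 1_000_000, 1500)
assert v == ["conform", "drop", "conform"]
```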
Verification loop: counters determine whether rules are provable
A scalable ACL design requires measurable evidence. Per-rule counters may be limited by a shared pool; policer and mirror counters are often the minimum proof required for acceptance and field debugging.
  • Per-rule hit: validate match correctness and detect shadowing.
  • Policer drops: confirm threshold behavior under load.
  • Mirror bytes: quantify overhead and potential congestion impact.
Deliverables · ACL design checklist & resource budget placeholders
ACL rule design checklist
Match fields: L2/L3/L4/metadata (capability-defined)
Priority model: rule ordering plan (avoid overlap ambiguity)
Default action: explicit permit/deny on miss
Action set: permit/deny/remark/police/mirror
Counters: hit + drops + mirror bytes (minimum proof)
Resource budget (placeholders)
TCAM slices: total X, reserved for ACL X
Effective entries: X (depends on key width)
Per-rule counters: X (shared pool size X)
Policer instances: X
Hitless update: supported? X
Figure · ACL/TCAM pipeline: parser (L2/L3/L4 fields) → TCAM lookup (rule match, priority resolve, hit/miss) → action engine (permit/deny/remark/police/mirror), with counter hooks (rule hit, police drop, mirror bytes) and TCAM resource placeholders (slices/entries/key width: X).
Diagram: The practical ACL design problem is TCAM resource consumption, rule overlap, and verifiable counters—not just “supported fields”.

H2-10 · Line-rate Telemetry & Manageability

Section intent: observability as a primary selection value
  • Prioritize line-rate counters and event logs that shorten field debug cycles.
  • Cover mirroring concepts, sampling concepts, in-band telemetry capability, and black-box hooks.
  • Provide a minimal counter map and a field log schema that closes the troubleshooting loop.
Observability layers: Port / Queue / Table / CPU
A reliable debug loop requires evidence across multiple layers. Without queue depth/watermarks, tail latency cannot be explained. Without punt-by-reason, control-plane degradations look like random data-plane failures.
  • Port: link flap, CRC/FCS, error bursts.
  • Queue: depth, watermark, drops, ECN marks.
  • Table: MAC moves, multicast group entries, aging.
  • CPU: punt rate and exception reasons.
Mirror, sampling, and in-band capabilities (concept-level)
Mirroring provides payload visibility but can add overhead. Sampling is low-impact but probabilistic. In-band telemetry can embed hop metadata if supported, but should be treated as a capability flag rather than a protocol deep dive.
  • SPAN/RSPAN/ERSPAN: concept only (payload visibility vs overhead).
  • sFlow: concept only (trend monitoring vs microburst reconstruction limits).
  • INT/IOAM: capability flag only (where overhead is paid).
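The sampling limitation above has a simple probabilistic form. A sketch (standard 1-in-N sampling math, not any specific sFlow agent implementation) shows why microsecond-scale bursts are effectively invisible to trend-oriented sampling:

```python
def p_burst_sampled(burst_pkts, sample_rate_n):
    """Probability a 1-in-N sampler catches at least one packet of a burst."""
    return 1 - (1 - 1 / sample_rate_n) ** burst_pkts

# A 100-packet microburst under a common 1-in-4096 sampling rate is almost
# always missed -- sampling shows trends, not microburst reconstruction.
assert p_burst_sampled(100, 4096) < 0.03          # ~2.4% chance of any sample
assert p_burst_sampled(100_000, 4096) > 0.99      # sustained traffic is visible
```

This is why the counter map below pairs sampling with queue watermarks: the watermark records the burst that the sampler statistically cannot see.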
Deliverable · Counter map (minimum set for closed-loop debugging)
Use grouped counter sets to answer root-cause questions quickly without requiring full packet capture.
Link / Port
CRC/FCS, link flaps, error bursts (X)
QoS / Queues
Per-queue depth, watermarks, drops, ECN marks (X)
Control / Exceptions
CPU punts by reason, exception drops, trap counters (X)
Policy / Security
ACL hits, policer drops, mirror bytes (X)
Deliverable · Field log schema (forensics-ready)
Time & identity
timestamp (X), device id (X), port id (X), VLAN/BD (X)
Link & traffic events
link up/down, flap count, CRC surge, storm trigger
System context
temperature, rails, reset reason, boot count, FW version, config revision
Policy and changes
ACL update, QoS change, mirror enable/disable, admin actions (X)
Figure · Line-rate telemetry: data-plane counters/events (port counters, queue depth/drops, table stats, punts/exceptions) → management CPU/SDK (aggregation, event log, black-box buffer) → export (SNMP, NETCONF, streaming telemetry). Field service reads the black-box plus counters to isolate root cause faster.
Diagram: Line-rate counters plus event logs reduce reliance on full packet capture and accelerate field forensics using export paths and a black-box buffer.

H2-11 · Performance Reality: Buffers, Microbursts, Latency, Head-of-Line Blocking

Section intent: convert marketing numbers into engineering budgets and verifiable tests
  • Explain why average utilization can look fine while microbursts still drop packets.
  • Separate throughput, P50/P99 latency, loss, and recovery time into measurable acceptance targets.
  • Address bufferbloat and head-of-line blocking as the common root causes of “mysterious” tail latency.
  • Not covered: TSN time-aware scheduling (Qbv/Qci) details.
Four metrics that must be separated: throughput / latency / loss / recovery
Line-rate throughput does not imply low tail latency. Zero loss does not imply predictable latency. Acceptance targets must explicitly include P50 and P99 latency plus recovery time after congestion events.
  • Throughput: line-rate under defined frame sizes and feature enablement.
  • Latency: P50 and P99 (tail) under defined load and queue policy.
  • Loss: microburst drops vs sustained congestion drops.
  • Recovery: time to return to baseline after burst or oversubscription.
Microbursts: why “20% average” can still drop
Microbursts occur when multiple ingress sources align toward the same egress in a short window. The average rate can remain low while the instantaneous arrival rate exceeds egress capacity.
  • Window matters: burst duration (μs–ms) determines required buffering.
  • Rate gap: ingress aggregate > egress rate causes queue buildup.
  • Evidence: queue watermark spikes + short drop bursts.
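The "rate gap × window" relationship above can be written as a one-line buffer budget (a first-order estimate ignoring per-packet overhead and shared-pool policy; the port counts and window are example numbers):

```python
def burst_buffer_bytes(ingress_gbps, egress_gbps, window_us):
    """Queue growth during a burst: (arrival rate - drain rate) * window."""
    delta_bps = (ingress_gbps - egress_gbps) * 1e9
    return max(0.0, delta_bps / 8 * window_us * 1e-6)

# Four 10G access ports bursting into one 10G uplink for a 50 us window:
need = burst_buffer_bytes(ingress_gbps=40, egress_gbps=10, window_us=50)
assert round(need) == 187_500   # ~183 KiB needed for this single alignment
```

This is the arithmetic behind the buffer budgeting card later in this section: pick the worst plausible fan-in alignment and window, and compare the result against the effective per-queue buffer, not the headline total.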
Buffer architecture: shared pool vs per-queue limits
Total buffer size is not the same as effective buffer per queue. Shared pools can be exhausted by a subset of flows, while per-queue caps can cause earlier drops. Headroom reservation protects priority traffic but reduces general capacity.
  • Shared: flexible but contention-driven (whoever arrives first can consume).
  • Per-queue cap: prevents starvation but may amplify microburst loss.
  • Headroom: reserved buffer for critical traffic (policy-defined).
Head-of-line blocking: one bad actor inflates others’ tail latency
HOL blocking appears when different traffic types are forced to share the same queue, or when shared buffering lets one class crowd out others. Tail latency grows even while throughput remains high.
  • Queue coupling: coarse classification maps too many flows into one queue.
  • Shared pool coupling: one class consumes shared buffer, others drop or stall.
  • Evidence: queue depth + watermark + drop counters correlate with P99 spikes.
Bufferbloat: no drops, but latency becomes uncontrollable
Deep queues can hide congestion by absorbing bursts, but the queueing delay becomes the dominant latency term. For industrial systems, the correct target is typically P99 latency, not only drop rate.
  • Queue limits: enforce an upper bound on worst-case queueing delay.
  • WRED/ECN: if supported, can reduce standing queues (capability-level).
  • Proof: queue watermark distribution + latency histogram.
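The "queue limits bound worst-case delay" claim is direct arithmetic: queueing delay is the standing queue drained at line rate. A small sketch (numbers illustrative) shows why a deep buffer with no limit can add milliseconds without a single drop:

```python
def worst_case_queue_delay_us(queue_limit_bytes, drain_gbps):
    """Upper bound on queueing delay once a queue limit is enforced."""
    return queue_limit_bytes * 8 / (drain_gbps * 1e9) * 1e6

# A 1 MB standing queue on a 1G port adds ~8 ms of delay -- zero drops needed.
assert round(worst_case_queue_delay_us(1_000_000, drain_gbps=1)) == 8000
# Capping the same queue at 125 kB bounds its contribution to 1 ms.
assert round(worst_case_queue_delay_us(125_000, drain_gbps=1)) == 1000
```

Inverting the formula gives the queue limit needed for a P99 latency target, which is how the "Target P99 latency" and "Pool policy" fields of the budgeting card connect.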
Deliverables · Buffer budgeting cards & pass criteria placeholders
Buffer budgeting card (template)
Port speed: X Gbps
Oversubscription: X:1 (ingress aggregate vs egress)
Burst size: X bytes / X μs window
Target P99 latency: X μs
Required buffer: X KB / MB
Pool policy: shared / per-queue cap / headroom (X)
Pass criteria (placeholders)
Latency: P50 ≤ X, P99 ≤ X
Microburst drops: drop rate ≤ X (define window)
Recovery: congestion recovery time ≤ X ms
Figure · Microburst reality: burst input above egress capacity → queue depth rises through the WRED start (optional) and drop thresholds → drops/WRED marking → P99 tail latency grows with queue depth.
Diagram: Microbursts build queues fast; WRED or hard drops can occur, and tail latency grows with queue depth even when average load looks low.

H2-12 · Engineering Checklist (Design → Bring-up → Production)

Section intent: move from “works” to “manufacturable and serviceable”
  • Use checklists that are evidence-driven: each item must have a proof point and a pass threshold placeholder.
  • Organize by project phase: Design, Bring-up, Production.
  • Make the result repeatable for teams and factories with versioning and black-box fields.
How to use this checklist (evidence + threshold + traceability)
Each checkbox item is written as an action with a required evidence field and a pass criterion placeholder. This format enables consistent bring-up, stable production release, and faster RMA forensics.
Design checklist (interface / clock / power / thermal / straps / EMC hooks)
  • ☐ Select host interface (SGMII/QSGMII/USXGMII/PCIe) → Evidence: lane map + pin plan → Pass: X
  • ☐ Define ref clock source and routing constraints → Evidence: jitter/route notes → Pass: X
  • ☐ Build power tree (rails, sequencing, PDN targets) → Evidence: rail list + probe points → Pass: X
  • ☐ Plan thermal path (heatsink/airflow/copper) for peak modes → Evidence: thermal model → Pass: X
  • ☐ Define strap/EEPROM policy and recovery path → Evidence: config map + versioning → Pass: X
  • ☐ Reserve EMC/ESD hooks (layout keepouts, grounding strategy) → Evidence: layout rules → Pass: X
Bring-up checklist (self-test / loopback / PRBS / FDB+ACL+IGMP / baselines / fault injection)
  • ☐ Port self-test and baseline counters → Evidence: CRC/drop/flap = X → Pass: X
  • ☐ PHY/MAC loopback per port and speed → Evidence: loopback log → Pass: X
  • ☐ PRBS patterns (if available) under load → Evidence: error counters → Pass: X
  • ☐ Validate FDB learning/aging and moves → Evidence: MAC table stats → Pass: X
  • ☐ Validate ACL core use cases and counters → Evidence: hit/drop proof → Pass: X
  • ☐ Validate IGMP snooping behavior (join/leave/aging) → Evidence: group table stats → Pass: X
  • ☐ Record counter baselines (idle + typical load) → Evidence: baseline snapshot → Pass: X
  • ☐ Fault injection (link flap, overload, mirror enable) → Evidence: recovery time → Pass: X
Production checklist (version lock / config backup / thresholds / black-box fields / aging)
  • ☐ Lock FW/SDK/config revision → Evidence: version stamp → Pass: X
  • ☐ Backup “golden config” + rollback config → Evidence: hash + storage location → Pass: X
  • ☐ Set telemetry thresholds (P99, watermarks, punts) → Evidence: threshold table → Pass: X
  • ☐ Define RMA black-box fields (reset, temp, rails, flap, key counters) → Evidence: log schema → Pass: X
  • ☐ Perform burn-in / thermal cycling and validate drift → Evidence: test report → Pass: X
Figure · Engineering checklist flow, Design → Bring-up → Production (evidence-driven): Design (interface, clock, power tree, thermal, straps/EEPROM, EMC/ESD hooks) → Bring-up (self-test, loopback, PRBS, FDB, ACL, IGMP, fault injection) → Production (version lock, config backup, telemetry thresholds, black-box fields, burn-in, thermal cycle).
Diagram: A phase-based checklist converts switch integration from “it works” into evidence-driven, repeatable bring-up and production readiness.

H2-13 · Applications + IC Selection Logic

Section intent: convert features into application gates, resource budgets, and verification-ready acceptance targets
In-scope (this page)
  • Applications framed as switch roles + failure patterns + measurable pass targets (X placeholders).
  • Selection gates: non-negotiables → resource budgets → field operability (maintainability).
  • Capability-to-verification mapping that closes the loop with counters, logs, and bring-up tests.
  • Reference material numbers (part numbers) for silicon shortlisting (not a recommendation).
Out-of-scope (route to sibling pages)
  • TSN time-aware scheduling (Qbv/Qci) deep details.
  • PTP/SyncE/WR timing math and calibration procedures.
  • PoE/PoDL classification, thermal, surge energy sizing.
  • Industrial stacks (PROFINET/EtherCAT/CIP) implementation details.
  • ESD/surge/magnetics component-by-component design deep dive.

Application buckets (role → pain → required capabilities → pass targets)

A1 · Industrial cabinet aggregation switch
System role: aggregate PLC/IO/drive ports into a deterministic, segmented plant network.
Top pain: VLAN leakage, MAC flapping, microburst drops during synchronized cycles.
Key capabilities: VLAN/LAG/STP basics, QoS mapping, robust counters (drops, watermarks).
Pass targets: P99 latency ≤ X, microburst drop ≤ X, recovery time ≤ X ms.
A2 · Edge switch + gateway uplink (segmentation first)
System role: enforce network zones at the edge and forward to an uplink/gateway.
Top pain: ACL rule conflicts, unexpected punts to CPU, hard-to-debug intermittent drops.
Key capabilities: TCAM ACL + per-rule counters, policer, mirroring, event logs.
Pass targets: ACL hit counters consistent, CPU punt rate ≤ X, drop hotspots identified within X minutes.
A3 · Vision / imaging multicast distribution
System role: distribute multicast streams to subscribers without flooding the plant.
Top pain: multicast flooding, missing querier, group aging issues, jittery delivery.
Key capabilities: IGMP/MLD snooping + querier option, storm guard, group-table observability.
Pass targets: no-flood guarantee under X joins/sec, group aging stable within X seconds, loss ≤ X.
A4 · Oversubscribed uplink (microbursts are the real enemy)
System role: fan-in many access ports into fewer uplinks (oversubscription).
Top pain: “average looks fine” but tail latency and burst drops appear randomly.
Key capabilities: queue watermarks, WRED/ECN (if available), per-queue limits, shaping.
Pass targets: watermark distribution controlled, P99 ≤ X, drop bursts ≤ X per hour.
A5 · Remote maintenance / fast triage is the #1 KPI
System role: prioritize observability and auditability over raw port density.
Top pain: slow root-cause, missing data, non-reproducible failures.
Key capabilities: line-rate counters, event logs, black-box snapshots, mirroring, sampling hooks.
Pass targets: RCA time ≤ X, black-box fields complete, config change trace ≤ X minutes.

Selection gates (cut the candidate list in minutes, not weeks)

Gate 1 · Non-negotiables
  • Port count + speeds + host interfaces (SGMII/QSGMII/USXGMII/PCIe) match the system design.
  • VLAN + trunking + LAG + STP primitives meet the topology needs.
  • IGMP/MLD snooping present if multicast bucket is in scope.
  • ACL/TCAM present if zone isolation is required.
  • Observability exists: queue watermarks, drops, punts, and mirroring hooks.
Gate 2 · Resource budgets (X placeholders)
  • FDB / MAC entries: X
  • VLANs / translation rules: X
  • ACL / TCAM entries (and slices): X
  • Multicast group entries: X
  • Queues per port + scheduling options: X
  • Total buffer + per-queue policy + headroom: X
  • Route / ARP/ND / ECMP (if L3 offload is required): X
Gate 3 · Field operability (maintainability)
  • Counter coverage: CRC/drop, queue depth/watermarks, ECN marks (if any), CPU punts.
  • Event logs: temperature, rails, link flap, config change, watchdog/reset reason.
  • Bring-up proof: loopback/PRBS (if available), baseline snapshots, fault-injection recovery.
  • Release controls: version lock, config backup/rollback, black-box schema for RMA.
  • Acceptance targets: P50/P99 latency ≤ X, microburst drop ≤ X, recovery ≤ X ms.

Capability → verification map (engineering closure)

QoS mapping
Verify: DSCP/PCP → class → queue mapping is deterministic.
Evidence: per-queue counters + watermark distribution.
Pass: P99 ≤ X, drops ≤ X, recovery ≤ X ms.
IGMP/MLD snooping
Verify: join/leave/aging → correct group forwarding only to members.
Evidence: group table stats + flood counters.
Pass: flood = 0 under X joins/sec, aging stable within X seconds.
ACL / TCAM rules
Verify: priority + default action + conflict rules behave as designed.
Evidence: per-rule hit/drop counters + mirror samples.
Pass: no false drops in X-hour soak, punt rate ≤ X.
Telemetry + black-box fields
Verify: counters + events enable fast root-cause.
Evidence: exported counters + change logs + snapshots on fault.
Pass: RCA time ≤ X, field schema completeness ≥ X%.
Buffers + microbursts
Verify: burst tests produce controlled watermarks and acceptable tail latency.
Evidence: queue depth (watermark), burst drop counters, latency histogram.
Pass: P99 ≤ X, microburst drops ≤ X per hour, recovery ≤ X ms.
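Several of the pass criteria above gate on P50/P99 rather than the mean; a minimal nearest-rank percentile sketch (sufficient for acceptance gating, though production tooling may use interpolated percentiles) shows why the mean hides exactly the behavior these gates exist to catch:

```python
def percentile(samples, p):
    """Nearest-rank percentile: value at ceil(p*n/100) in the sorted samples."""
    s = sorted(samples)
    k = max(0, -(-p * len(s) // 100) - 1)  # ceil(p*n/100) - 1, clamped to 0
    return s[int(k)]

# 98 fast forwards plus two queue-delayed outliers: the mean looks healthy,
# the P99 does not -- which is why acceptance must gate on the tail.
lat_us = [10.0] * 98 + [900.0, 950.0]
assert sum(lat_us) / len(lat_us) < 30      # mean ~28 us: "looks fine"
assert percentile(lat_us, 50) == 10.0      # P50 is fine too
assert percentile(lat_us, 99) == 900.0     # the tail tells the real story
```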

Shortlist template + reference material numbers (part numbers)

The part numbers below are provided as reference examples for shortlisting. Always confirm the exact feature set (L2/L3, TCAM, queues, counters), temperature grade, and current datasheets.
Managed L2 switch SoC (industrial/embedded)
  • Microchip: VSC7426, VSC7427 (SparX-III family examples)
  • Microchip: VSC7514 (10-port L2 switch example)
Embedded/SMB switch IC with integrated PHY + SerDes
  • Marvell Link Street: 88E6390X (11-port switch example)
  • Marvell family example: 88E6190 (model reference)
High port-count managed MAC switch controller
  • Realtek: RTL8393M-VC (52-port managed MAC switch controller example)
Enterprise / multilayer switch SoC examples
  • Broadcom: BCM56150 (family example)
  • Broadcom: BCM56160 (family example)
  • Broadcom: BCM53156XUB1KFBG (Robo family example)
Shortlist card (copy/paste template)
Candidate: X
Gate 1: pass / fail (interfaces, VLAN/LAG/STP, IGMP, ACL, counters)
Gate 2 budgets: FDB X · VLAN X · ACL X · Multicast X · Queues X · Buffer X
Gate 3 operability: logs + black-box fields + baseline + fault injection
Risks: X
Acceptance: P99 ≤ X · drops ≤ X · recovery ≤ X ms
Figure · Applications → selection gates → shortlist → verify (engineering closure): application buckets (A1 aggregation, A2 segmentation, A3 multicast, A4 microbursts, A5 fast triage) pass through Gate 1 (interfaces, L2 basics, ACL, IGMP, counters, mirroring), Gate 2 (FDB, VLAN, TCAM, multicast, queues, buffer policy, L3 budgets), and Gate 3 (logs, black-box, baselines, fault injection, P99 acceptance), producing a candidate shortlist; the verify loop (counters, watermarks, bring-up checklist) refines the gates and locks the release.
Diagram: application needs flow through three selection gates, producing a shortlist; verification closes the loop with counters, logs, and bring-up evidence.
Page boundary reminder
When the requirement becomes time-aware determinism, timing calibration, PoE power negotiation, or protocol-stack certification, route to the corresponding sibling pages to avoid scope overlap.


H2-14 · FAQs (Troubleshooting, data-driven)

Scope: only L2/L3 (capability-level), QoS (non-TSN), IGMP/MLD, ACL/TCAM, telemetry, buffers/microbursts
Each answer follows a strict 4-line format and ends with measurable pass criteria placeholders (X).
Average utilization is low, but packet drops appear in short bursts
Likely cause: microbursts from fan-in oversubscription; per-queue limit/headroom too small; tail-drop triggers before congestion becomes visible in averages.
Quick check: correlate burst drops with per-queue watermark(max); compare ingress aggregate vs egress line-rate in the same time window; verify which queue/class is dropping (per-queue drop counters).
Fix: split traffic into more queues; adjust buffer sharing/headroom; cap or reshape bursty sources (token-bucket shaping); validate LAG hashing if fan-in is via LAG.
Pass criteria: burst-drop rate ≤ X events/hour at Y% load; queue watermark peak ≤ X% of limit; recovery to baseline ≤ X ms.
No drops, but P99 latency explodes under load (bufferbloat)
Likely cause: deep standing queues hide congestion (bufferbloat); strict-priority or uneven scheduling keeps queues persistently filled, inflating tail latency.
Quick check: observe sustained queue depth/watermark plateau; plot latency histogram vs watermark; check ECN marks (if supported) and per-queue service rates.
Fix: enforce queue limits to bound worst-case delay; enable/tune WRED/ECN if available; adjust scheduler weights (WRR/WFQ) and apply shaping to dominant sources.
Pass criteria: P99 latency ≤ X µs at Y% load; standing queue ≤ X% for > Z ms; drop rate ≤ X.
One traffic class starves others (head-of-line blocking symptoms)
Likely cause: too many flows mapped into the same queue; strict-priority dominance; shared buffer coupling causes one class to crowd out others.
Quick check: verify DSCP/PCP→class→queue mapping; compare per-queue occupancy and drops; check strict-priority queue service share vs WRR/WFQ queues.
Fix: separate critical and bulk flows into distinct queues; cap strict-priority or add minimum service via WRR/WFQ; apply policers on noisy classes.
Pass criteria: no starvation longer than X ms; low-priority P99 ≤ X µs; high-priority loss ≤ X/1e6 packets.
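The "minimum service via WRR" fix can be illustrated with a packet-granularity weighted round-robin sketch (a simplification: real schedulers are byte- or quantum-based deficit round-robin, and the backlogs here are toy numbers). Even with the bulk queue permanently backlogged, the critical queue keeps its configured share:

```python
def wrr_schedule(queues, weights, rounds):
    """Packet-based WRR: each round, queue i sends up to weights[i] packets
    from its backlog. A nonzero weight guarantees a minimum service share."""
    sent = [0] * len(queues)
    for _ in range(rounds):
        for i, q in enumerate(queues):
            take = min(weights[i], q[0])  # q is a one-element backlog list
            q[0] -= take
            sent[i] += take
    return sent

# Bulk traffic is always backlogged; critical still gets its 3:1 share,
# which is the property a pure strict-priority scheme cannot guarantee
# in the opposite direction (low priority would starve entirely).
critical, bulk = [10_000], [10_000]
assert wrr_schedule([critical, bulk], [3, 1], rounds=100) == [300, 100]
```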
VLAN leakage: traffic appears in the wrong VLAN
Likely cause: trunk allow-list missing; wrong PVID/default VLAN; tag/untag action mismatch; ingress VLAN filtering disabled; stale FDB entries across VLANs.
Quick check: validate per-port allowed VLAN list and PVID; confirm tag/untag rules; check VLAN filter-drop counters; confirm MAC table entries are VLAN-scoped (VLAN+MAC).
Fix: enable ingress VLAN filtering; explicitly configure allow-lists; correct PVID and tag actions; clear and relearn FDB after fixing topology/config.
Pass criteria: cross-VLAN frames observed = 0 over X minutes; VLAN filter-drop counter stable within ≤ X/minute under steady state.
MAC flapping / frequent MAC moves cause instability
Likely cause: L2 loop, misconfigured LAG, or topology changes causing the same source MAC to be learned on multiple ports; aggressive aging amplifies churn.
Quick check: read MAC move/learn counters; track one flapping MAC across ports and VLANs; check STP state changes and link flap logs; verify LAG membership consistency.
Fix: eliminate loops (STP/loop-guard); correct LAG configuration; tune aging only after topology is stable; consider MAC pinning for fixed endpoints.
Pass criteria: MAC move events ≤ X/hour; topology-change events ≤ X/day; CPU utilization for control tasks ≤ X%.
LAG is enabled, but load is heavily imbalanced
Likely cause: hash key too narrow (e.g., only L2/L3); elephant flows dominate a single member; asymmetry from source traffic patterns.
Quick check: compare per-member byte/packet counters; identify top flows; check configured hash fields (L2/L3/L4) and whether symmetric hashing is enabled.
Fix: expand hash to include L4 where possible; split elephant flows at the source; add more LAG members or adjust traffic engineering policies.
Pass criteria: member utilization within ± X% over Y minutes; LAG member drops ≤ X/1e6 packets.
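The "hash key too narrow" failure mode can be demonstrated with a toy flow hash (CRC32 modulo member count stands in for the silicon's hash; field selection, not the hash function itself, is the point): many L4 flows between the same two hosts collapse onto one LAG member when only L3 fields are keyed.

```python
import zlib
from collections import Counter

def lag_member(flow_key: bytes, n_members: int) -> int:
    """Toy LAG/ECMP hash: CRC32 of the selected key fields, modulo members."""
    return zlib.crc32(flow_key) % n_members

def spread(flows, n_members, use_l4):
    """Count flows per member for an L3-only vs an L3+L4 hash key."""
    buckets = Counter()
    for src, dst, sport, dport in flows:
        key = f"{src}|{dst}" + (f"|{sport}|{dport}" if use_l4 else "")
        buckets[lag_member(key.encode(), n_members)] += 1
    return buckets

# 1000 TCP flows between the same host pair: L3-only hashing pins them all
# to a single member; including L4 ports spreads them across the LAG.
flows = [("10.0.0.1", "10.0.0.2", 49152 + i, 443) for i in range(1000)]
l3 = spread(flows, n_members=4, use_l4=False)
l4 = spread(flows, n_members=4, use_l4=True)
assert len(l3) == 1                               # everything on one member
assert len(l4) > 1 and max(l4.values()) < len(flows)
```

Note the elephant-flow caveat from the FAQ still applies: a single fat flow is one key, so no hash-field widening can split it across members.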
Multicast floods the whole network unexpectedly
Likely cause: IGMP/MLD snooping off or not active on the VLAN; querier missing; group aging collapses; control packets blocked by ACL so the table never builds.
Quick check: confirm snooping enabled per VLAN; verify querier presence; inspect group-table entries and aging; read unknown-multicast and multicast-flood counters.
Fix: enable querier on the correct VLAN; adjust aging/robustness parameters; ensure IGMP/MLD reports are permitted; apply storm guard for unknown multicast.
Pass criteria: multicast flood counter = 0 in steady state; unknown-multicast rate ≤ X pps; group table stable with ≥ X% expected members.
Multicast subscribers intermittently miss streams
Likely cause: join/leave timing vs aging mismatch; IGMP reports dropped under congestion; control packets punted then delayed; VLAN/ACL interaction blocks membership maintenance.
Quick check: confirm reports are seen on ingress (mirror); compare group entry aging vs report intervals; check control-queue drops and CPU punt rates; verify ACL permits IGMP/MLD.
Fix: tune IGMP/MLD timers to match endpoints; protect IGMP reports with QoS mapping; reduce punts; ensure VLAN membership and ACL rules allow report/querier traffic.
Pass criteria: join-to-forwarding latency ≤ X ms; stream loss ≤ X packets/1e6; group table matches membership with ≥ X% accuracy.
ACL blocks legitimate traffic (false drops)
Likely cause: TCAM rule priority conflict; default action too aggressive; match fields too broad; slice/entry exhaustion changes rule placement or disables intended rules.
Quick check: read per-rule hit/drop counters; verify rule order and default action; mirror packets before the drop stage; check TCAM utilization and slice allocation.
Fix: reorder by specificity; narrow matches; add explicit permits; align default action with policy; reserve TCAM slices for critical rules and disable noisy logging.
Pass criteria: false-drop counter = 0 over X-hour soak; intended-drop rate within ± X%; CPU punt rate ≤ X pps.
Enabling mirroring/telemetry makes performance worse
Likely cause: SPAN oversubscription (mirror egress slower than mirrored traffic); sampling/export overload; CPU interrupts and punts increase when telemetry is misconfigured.
Quick check: compare mirrored traffic rate vs mirror port speed; check mirror-port drop counters; observe CPU punt/IRQ rate; confirm export queue occupancy if streaming telemetry is enabled.
Fix: mirror only necessary subsets; rate-limit sampling; use a high-speed dedicated mirror port; shift telemetry export off the critical path; reduce per-packet logging.
Pass criteria: enabling telemetry increases P99 latency by ≤ X%; mirror-port drops ≤ X/minute; CPU utilization increase ≤ X%.
Counters don’t match: hosts see loss, but switch stats look “clean”
Likely cause: counter scope mismatch (ingress vs egress); different sampling windows; loss happens outside the measured stage (e.g., mirror port, control queue, or offloaded path not included).
Quick check: align measurement windows; compare ingress/egress/queue-level counters simultaneously; run a controlled traffic burst and reconcile deltas; confirm if drops occur on mirror/control/export paths.
Fix: standardize counter definitions and collection cadence; record snapshot bundles (counters + watermarks + punts) on anomaly triggers; add correlation IDs in logs if supported.
Pass criteria: reconciliation error ≤ X% over Y minutes; timestamp skew ≤ X ms; anomaly snapshots captured within X seconds.
High CPU punt rate causes random loss or jitter
Likely cause: exception paths punt too much traffic (unknown L2/L3, TTL, ACL log, multicast control); CPU queue starvation and interrupt storms create unpredictable control-plane delays and drops.
Quick check: read punt reason counters/registers; check CPU queue depth/watermarks and IRQ load; correlate loss with punt bursts; verify which flows are hitting exception rules.
Fix: enable hardware handling for common exceptions; narrow ACL logging and reduce punt-generating rules; isolate control traffic into protected queues; rate-limit punt sources.
Pass criteria: punt rate ≤ X pps; CPU queue depth ≤ X% steady; loss under Y% load ≤ X/1e6 packets.