
Network TAP / Probe: Lossless Packet Capture, Timestamps & Replay


A Network TAP/Probe is a non-intrusive observation platform: it copies traffic (and optionally records and replays it) without making forwarding or security blocking decisions. “Production-ready” means provably lossless under a declared profile (Gbps, 64B Mpps, microbursts, replication, storage), backed by time-aligned evidence: drop counters, buffer watermarks, timestamp stability, and IO tail latency.

H2-1 · What a Network TAP / Probe is, and where the boundary lies

A Network TAP / Probe is a visibility system: it copies traffic for capture, analysis, and replay without participating in forwarding or security enforcement decisions. The fastest way to avoid design and procurement confusion is to separate three roles—TAP, NPB, and Probe—by what each one does to packets.

Scope Guard (Allowed / Banned keywords)
  • Allowed: TAP · Packet Broker (NPB) · lossless capture · hardware timestamp · buffer/burst · replication/aggregation · replay · PCAP · 10G/25G/100G/400G PHY
  • Banned: firewall policy · IPS/DDoS blocking · BNG/AAA · CGNAT allocation · SD-WAN tunnels · BGP/OSPF · OTN/ROADM system design

Role TAP — “Copy traffic” (do not interpret)

  • Produces a duplicate stream from a live link (passive split or electronic mirror).
  • Goal is visibility fidelity: “what was on the wire” as closely as the deployment allows.
  • Does not decide whether packets are good/bad, allowed/blocked, or routed.

Role Network Packet Broker (NPB) — “Organize traffic”

  • Aggregates multiple inputs, filters/masks, replicates 1→N, and load-balances to tools.
  • Operates as a visibility fabric (not a security gateway): the output is “what tools should observe”.
  • Where “lossless” becomes conditional: fan-out, burst, and egress congestion determine drops.

Role Probe / Recorder / Analyzer — “Prove & replay”

  • Captures packets to storage, adds timestamps + metadata, indexes for search, and supports replay.
  • Focus is evidence quality: timestamp accuracy, integrity (checksums/hashes), and reproducibility.
  • Often the bottleneck is write jitter (tail latency), not peak throughput.

Boundary What it is NOT

  • Not a firewall/IPS/DDoS mitigator (no policy enforcement, no blocking responsibility).
  • Not a router/switch for production forwarding decisions (no control plane, no routing roles).
  • Inline products exist, but their purpose is still visibility; resilience comes from bypass design.
Fast self-check: which “plane” is the requirement?
  • Wire truth required? (microburst, forensic evidence, consistent packet counts)
  • Time-order matters? (hardware timestamps, multi-device time alignment)
  • Long retention + query? (indexing, storage control, replay with shaping)

If the answer is “yes” to any of the above, the architecture must be treated as a pipeline (copy → organize → prove), not as a single “box with ports”.

Figure F1 — Deployment boundary: production forwarding vs observation plane
(Diagram: the production network — switch/router/live link — keeps every forwarding decision; a TAP/NPB plane copies, filters, and replicates traffic to a probe/recorder that timestamps, stores, and replays. The observation plane must not disturb forwarding.)

H2-2 · Deployment patterns: passive optical TAP, SPAN, inline bypass—when each breaks

Deployment choice is less about marketing labels and more about failure modes. Each method can “look fine” during light load and quietly fail when burst, contention, or timing precision becomes critical. The goal is to choose a pattern whose breakpoints are understood, measurable, and acceptable for the use case.

SPAN Port mirroring — quick to enable, easiest to lie

  • What it delivers: convenient mirrored frames for troubleshooting and coarse visibility.
  • Where it breaks: mirror congestion (drops), prioritization differences, and timing skew.
  • How to detect: compare ingress counters vs captured counts; test with 64B-heavy traffic and burst.
  • Choose when: temporary triage, cost-sensitive visibility, and lossless is not a hard requirement.

Passive Optical TAP — highest fidelity, physical budgets matter

  • What it delivers: traffic copy closest to wire behavior (independent of switch mirroring resources).
  • Where it breaks: insertion loss / optical power margin, connector hygiene, module/port compatibility.
  • How to detect: pre-check link margin; verify stable BER on production + identical packet counts on copy.
  • Choose when: long-term monitoring, microburst sensitivity, forensics-grade evidence needs.

Inline Inline with bypass — visibility inline, resilience via fail-open

  • What it delivers: inline insertion when physical constraints demand it, with controlled observation outputs.
  • Where it breaks: bypass design (relay/optical switch) and unexpected link flap during failover.
  • How to detect: power-cut and reboot tests; measure outage time; validate alarms/logs match switchovers.
  • Choose when: inline insertion is unavoidable and production continuity must be protected by hardware bypass.

Decision A practical selection rule (keeps teams aligned)

  • Need “wire truth” (microburst, strict counts, evidence chain) → prioritize Passive TAP.
  • Need lowest operational risk with “good enough” visibility → SPAN with explicit limitations.
  • Must be inline (physical path constraint) → Inline + bypass with rigorous fail-open tests.

“Lossless” should be treated as a tested condition: traffic profile + burst + replication + egress write stability. A deployment method is “correct” only if its breakpoints are measurable and controlled.

Verification mini-checklist (before trusting any capture)
  • Run a known traffic pattern and confirm packet counts match on source vs capture (not just throughput).
  • Repeat with 64B-heavy traffic and with controlled bursts to expose Mpps/buffer limits.
  • If timing matters, validate hardware timestamp stability across load and across reboot/holdover scenarios.
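The count-match check in the first bullet can be expressed as a tiny comparison routine — a sketch only, since counter names and how they are retrieved vary by device:

```python
def capture_is_lossless(source_counts, captured_counts, keys=("packets", "bytes")):
    """Counters from the live link must match the capture exactly -- per packet,
    not per Gbps. Counter names/structure are illustrative; real devices differ."""
    return all(source_counts.get(k) == captured_counts.get(k) for k in keys)

src = {"packets": 1_000_000, "bytes": 84_000_000}
assert capture_is_lossless(src, {"packets": 1_000_000, "bytes": 84_000_000})
# A throughput match with a packet deficit still fails the lossless test:
assert not capture_is_lossless(src, {"packets": 999_874, "bytes": 84_000_000})
```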
Figure F2 — SPAN vs Passive TAP vs Inline+BYPASS (what breaks first)
(Diagram: SPAN/mirror, passive optical TAP, and inline+bypass compared on fidelity, production impact, timestamp credibility, and what breaks first — mirror congestion, optical insertion-loss margin, and bypass/fail-open behavior respectively.)

H2-3 · What “lossless” really means: throughput vs Mpps vs burst, and where drops are born

“Lossless capture” is not a single checkbox. It is a tested condition defined by a traffic model (packet-size mix, burstiness, replication factor, and output constraints). A platform can meet line-rate in Gbps yet still drop packets when Mpps (small packets) or microbursts exceed the observation pipeline’s service budget.

Framework Three knobs that decide “lossless”

  • Gbps (line-rate): sustained bandwidth capacity of ports and fabric.
  • Mpps (per-packet budget): parser/match/replicate/schedule must finish within a fixed cycle budget—64B frames are the hardest case.
  • Burst absorption: when instantaneous arrival rate exceeds service rate, buffer depth and arbitration decide whether drops occur.

Practical takeaway: a “100G/400G” label is a throughput statement, not a proof of lossless capture across packet sizes and bursts.

Mpps Why 64B traffic sets the hard floor

  • On the wire, minimum Ethernet frames carry overhead; a common engineering approximation is 84B per packet (64B frame + 8B preamble/SFD + 12B IFG).
  • Packets-per-second estimate: pps ≈ R / (84 × 8), with R the line rate in bit/s.
  • Rule-of-thumb reference points (approx.): 10G ≈ 14.88 Mpps, 25G ≈ 37.2 Mpps, 100G ≈ 148.8 Mpps, 400G ≈ 595.2 Mpps (64B-heavy).

If a capture path is validated only with large packets, the real bottleneck may remain hidden until a 64B-heavy workload appears (telemetry storms, ACK-heavy flows, scanning, or bursty control traffic).
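The rule-of-thumb figures above fall straight out of the 84-byte approximation; a minimal sketch (the function name is illustrative):

```python
# A minimum-size packet occupies 64B frame + 8B preamble/SFD + 12B IFG = 84B on the wire.
WIRE_BYTES_64B = 64 + 8 + 12

def max_pps(line_rate_bps, wire_bytes=WIRE_BYTES_64B):
    """Upper bound on packets/second at a given line rate (bit/s)."""
    return line_rate_bps / (wire_bytes * 8)

for gbps in (10, 25, 100, 400):
    print(f"{gbps:>3}G -> {max_pps(gbps * 1e9) / 1e6:6.2f} Mpps worst case")
# 10G -> 14.88, 25G -> 37.20, 100G -> 148.81, 400G -> 595.24
```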

Burst Microburst math that explains “average is low, yet it drops”

  • When a short burst arrives faster than the observation path can drain, buffer fills by the rate gap.
  • Useful sizing intuition: B_needed ≈ (R_in − R_out) × t_burst × Fanout.
  • Fanout is a traffic multiplier (replication 1→N, oversubscription, or “one input feeds many tools”).

Microbursts punish systems that look stable at 5–10% average utilization; watermark-driven evidence is more reliable than averages.
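The sizing intuition above is directly computable; a sketch with hypothetical numbers (a 100G burst drained at 40G for 200 µs, replicated 1→2):

```python
def buffer_needed_bytes(r_in_gbps, r_out_gbps, t_burst_us, fanout=1):
    """B_needed ~ (R_in - R_out) x t_burst x Fanout, converted to bytes."""
    gap_bps = (r_in_gbps - r_out_gbps) * 1e9 * fanout   # effective rate gap
    return gap_bps * t_burst_us * 1e-6 / 8              # bits -> bytes

# Hypothetical: a 100G burst drained at 40G for 200 us, replicated 1->2.
need = buffer_needed_bytes(100, 40, 200, fanout=2)      # 3.0 MB
```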

Drop taxonomy Where drops are born (and what to instrument)

  • Ingress overflow: front-end FIFO/PCS/MAC cannot absorb burst → ingress overflow counters rise.
  • Pipeline stall: parser/lookup/match exceeds per-packet budget → stage-stall/utilization counters spike (often worst on small packets).
  • Replication & egress arbitration: fan-out creates congestion → egress queue watermarks hit max, arbitration drops appear.
  • DMA/PCIe backpressure: host path cannot keep up → DMA ring overflow/backpressure flags asserted.
  • Storage write jitter: tail latency (GC/flush/RAID events) causes capture backpressure → drops correlate with I/O latency spikes.
Decision hints (symptom → most likely choke point)
  • Drops explode with 64B-heavy traffic → parser/lookup/Mpps bottleneck (pipeline stall evidence).
  • Drops occur only during bursts → buffer depth/watermark + arbitration (burst absorption evidence).
  • Drops start after enabling extra tool outputs → replication fan-out + egress oversubscription.
  • Drops correlate with host load → DMA/PCIe backpressure.
  • Drops correlate with I/O latency spikes → storage tail-latency jitter (write-path evidence).
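The hints above can be collapsed into a first-guess lookup; the signal names here are invented labels for the correlations listed, not real telemetry fields:

```python
def likely_choke_point(signals):
    """First-guess mapping from observed drop correlations to a choke point.
    Signal names are invented labels for the hints above, not real fields."""
    rules = [
        ("64b_heavy",    "parser/lookup Mpps budget (pipeline-stall evidence)"),
        ("burst_only",   "buffer depth / arbitration (burst-absorption evidence)"),
        ("after_fanout", "replication fan-out + egress oversubscription"),
        ("host_load",    "DMA/PCIe backpressure"),
        ("io_latency",   "storage tail-latency jitter (write-path evidence)"),
    ]
    for signal, verdict in rules:
        if signal in signals:
            return verdict
    return "insufficient evidence: collect drop-reason counters first"

print(likely_choke_point({"burst_only"}))
```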
Figure F3 — Drop-origin map across the observation pipeline
(Diagram: drop origins mapped across PHY/MAC → parser → filter/hash → replicate → buffer → egress → storage, with the minimal evidence set: drop-reason counters, queue watermarks, write-path P99/P999 latency, and the traffic model used in the test. “Lossless” means counters stay at zero under the defined conditions.)

H2-4 · Data plane architecture: parse → classify → filter/mask → replicate → load-balance

High-speed TAP/NPB/probe platforms rely on a hardware pipeline (ASIC/FPGA or equivalent) because the observation plane must handle line-rate plus worst-case per-packet timing. The key is not “more throughput” but deterministic actions per packet: extracting keys, deciding actions, replicating, and delivering consistent tool feeds without hidden drops.

Why HW Why ASIC/FPGA-class pipelines are used

  • 64B Mpps budget: header parsing + match + action must complete within a fixed cycle window.
  • Deterministic replication: 1→N fan-out and egress arbitration require predictable scheduling and watermark evidence.
  • Instrumentable correctness: stage counters, queue watermarks, and drop-reason telemetry provide proof of “lossless under conditions”.

Pipeline Stage-by-stage actions (visibility-focused, not enforcement)

  • Parse: extract headers and tunnel context needed for observation decisions (outer/inner keys where applicable).
  • Classify: compute flow key and select an action profile (fast match + counters).
  • Filter: decide whether a packet is forwarded to tools (visibility selection, not security blocking).
  • Mask / Slice: header-only, payload truncation, or field masking to reduce tool load and storage pressure.
  • Replicate: copy traffic to multiple tools or recorders; fan-out becomes a multiplier on burst and queueing.
  • Load-balance: hash-based distribution with flow consistency (same flow goes to the same tool node).

A reliable architecture makes every action measurable: rule-hit counters, per-output utilization, queue watermarks, and explicit “drop reason” telemetry.

Hashing Load-balance without breaking analysis

  • Flow consistency is the default goal: splitting one conversation across multiple tools can break stateful analytics and distort timelines.
  • Hash key selection determines outcome: 5-tuple vs inner-5-tuple (for encapsulated traffic) vs custom fields.
  • Skew detection matters: report distribution across outputs (utilization + flow counts) to catch hot spots.
Practical rule (keeps designs honest)
  • Every “feature” (filter, slice, replicate, hash) must map to a measurable counter or watermark; otherwise it cannot be proven lossless.
  • Replicate and slice decisions must be evaluated under 64B-heavy traffic and burst, not only under large packets and average load.
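Flow-consistent hashing can be sketched in a few lines; CRC32 stands in for whatever hash function the hardware actually implements:

```python
import zlib

def tool_index(flow_key, n_tools):
    """Flow-consistent load balance: hash the 5-tuple so every packet of a
    conversation lands on the same tool node. CRC32 is a software stand-in."""
    key = "|".join(map(str, flow_key)).encode()
    return zlib.crc32(key) % n_tools

# Same flow -> same recorder, every time; that is what keeps stateful tools sane.
flow = ("10.0.0.1", "10.0.0.2", 51512, 443, "tcp")
node = tool_index(flow, 4)
```

A bidirectional variant would sort the two (address, port) endpoints before hashing so both directions of a conversation land on the same node — one way to read the 5-tuple vs inner-5-tuple vs custom-field choice above.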
Figure F4 — Visibility data plane pipeline + replication/load-balance tree
(Diagram: parse → classify → filter/select → mask/slice → replicate → hash load-balance, each stage with its counter — exceptions, hit counters, select rate, bytes/packet, watermarks, skew — plus a replication tree in which one ingress feeds multiple tools, with flow-consistent 5-tuple hashing across a recorder pool.)

H2-5 · Hardware timestamps: where accuracy is won or lost (PTP/SyncE only for marking)

For a probe or recorder, timestamps are the spine of usability: event ordering, burst correlation, and cross-box alignment all depend on them. Accuracy is “won” when timestamping happens early (near ingress) and the local timebase is stable; it is “lost” when queues, clock-domain crossings, or write-path backpressure inject variable delay. PTP/SyncE/GNSS are treated here only as timebase inputs for the probe (marking), not as a full network timing design topic.

Where to stamp Timestamp point decides how much queue delay leaks into time

  • Best case: stamp at PHY/MAC ingress before shared queues—lowest sensitivity to traffic load.
  • Later stamping: after parsing, replication, or buffering—timestamp includes variable queue wait and arbitration delay.
  • A practical rule: if timestamp variance grows with utilization or fan-out, the stamp point is too far downstream.

Timebase chain Time source → cleaner/PLL → distribution → timestamp unit

  • Time source: GNSS / PTP / SyncE provide a reference; stability during loss-of-source depends on holdover.
  • Clock cleaner / PLL: reduces short-term jitter and isolates noise; bad settings convert phase noise into timestamp jitter.
  • Clock distribution: introduces skew and potential clock-domain crossings; consistency matters across all timestamp domains.
  • Timestamp unit (TSU): stamps packets and writes metadata; its resolution and domain crossing define quantization and uncertainty.

The most useful mental model is to track three quantities separately: offset (fixed bias), jitter (short-term variance), and drift (slow change over time).

Error sources Common ways accuracy is lost (and what to log)

  • Queue-induced jitter: variable FIFO/queue wait (worse under burst and replication) → correlate timestamp spread with queue watermarks.
  • Domain crossing uncertainty: clock-domain crossings add non-determinism → track CDC events/exceptions where available.
  • Holdover drift: losing the external source increases drift → log holdover state and drift alarms.
  • Cleaner/PLL noise: insufficient jitter cleaning inflates short-term variance → monitor phase-noise/jitter status indicators if exposed.
Cross-box consistency (what makes multi-probe correlation reliable)
  • Same epoch and time scale: devices must reference the same notion of time before comparing capture timelines.
  • Calibrate fixed offset: measure and compensate deterministic offsets introduced by distribution and stamp placement.
  • Monitor drift: track drift over time and during holdover; alert when drift rate changes.
  • Prefer ingress stamping: keep stamping upstream so traffic load does not rewrite time ordering.
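Offset, jitter, and drift can be separated from a series of cross-box timestamp deltas for the same packets — a sketch assuming the deltas are already paired and in nanoseconds:

```python
from statistics import mean, pstdev

def time_error_stats(deltas_ns):
    """Split probe-A minus probe-B timestamp deltas (same packets, ns) into
    offset (fixed bias), jitter (short-term spread), and drift (slope/sample)."""
    offset = mean(deltas_ns)
    jitter = pstdev(deltas_ns)
    xs = range(len(deltas_ns))
    x_m = mean(xs)
    drift = sum((x - x_m) * (d - offset) for x, d in zip(xs, deltas_ns)) \
            / sum((x - x_m) ** 2 for x in xs)   # least-squares slope
    return offset, jitter, drift

# A steadily growing delta is drift, not jitter: here 10 ns gained per sample.
offset, jitter, drift = time_error_stats([0, 10, 20, 30, 40])
```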
Figure F5 — Timestamp chain: where offset, jitter, and drift enter
(Diagram: the timebase chain — time source (GNSS/PTP/SyncE) → clock cleaner/PLL → clock distribution → timestamp unit — with the error each stage can inject: epoch/holdover, jitter/wander, skew/CDC, offset/quantization. Later stamping points are queue-sensitive; cross-box alignment needs a shared epoch, a calibrated offset, and holdover-aware drift monitoring.)

H2-6 · Buffering & burst absorption: sizing rules and the hidden cost of replication

Buffering is not a mystery feature. It is a capacity to absorb the gap between instantaneous arrival rate and the observation pipeline’s drain rate. The non-obvious part is that replication and write-path variability can turn a “localized slowdown” into a system-wide backpressure event. This section turns burst absorption into a small set of sizing variables and evidence signals.

Burst model A usable sizing intuition (with fan-out)

  • When a burst arrives faster than packets can be drained, occupancy grows by the rate gap.
  • Conservative intuition: B_needed ≈ (R_in − R_out) × t_burst × Fanout.
  • Fanout is a multiplier: a 1→N replication tree increases effective egress demand and raises watermark peaks.

The correct question is not “how much buffer exists”, but “what burst profile and replication configuration can be absorbed with drop counters staying at zero”.
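A small occupancy simulation makes the watermark story concrete; the burst profile and buffer depth below are invented for illustration:

```python
def simulate_occupancy(arrivals_bps, drain_bps, dt_us, depth_bytes, fanout=1):
    """Step a shared buffer through a burst profile; return (peak_watermark_bytes,
    dropped_bytes). Effective arrival rate is multiplied by replication fan-out."""
    occ = peak = dropped = 0.0
    for r_in in arrivals_bps:
        delta = (r_in * fanout - drain_bps) * dt_us * 1e-6 / 8  # bytes this step
        occ = max(0.0, occ + delta)
        if occ > depth_bytes:               # watermark hits the ceiling -> drops
            dropped += occ - depth_bytes
            occ = depth_bytes
        peak = max(peak, occ)
    return peak, dropped

# 5% average load, but a 300 us microburst at line rate overwhelms a 1 MB buffer:
profile = [5e9] * 50 + [100e9] * 3 + [5e9] * 50    # 100 us steps
peak, lost = simulate_occupancy(profile, 40e9, 100, 1_000_000)
```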

Buffer tiers FIFO vs DRAM vs host memory (what each is good at)

  • On-chip FIFO/SRAM: fastest and most deterministic; limited depth; best for short microbursts and per-stage smoothing.
  • External DRAM: deeper absorption for longer bursts; performance depends on arbitration patterns and contention.
  • Host memory (DMA): seemingly deep but least deterministic; susceptible to PCIe scheduling, CPU pressure, and driver noise.

Arbitration Policies matter only if “drop reasons” are visible

  • Queue arbitration chooses who drains first during contention (per-port or per-class priority).
  • Drop behaviors are only diagnosable if the platform exports drop reason counters and watermark telemetry.
  • Two common names: tail drop (drop at full) and early drop (drop before full under policy); the key is visibility.

Hidden costs Replication and write jitter amplify bursts

  • Replication amplifies demand: each additional tool output consumes egress scheduling and buffer budget.
  • One slow sink can backpressure many: a recorder with tail-latency write spikes can push back into shared queues.
  • Rule changes can amplify bursts: changing filters or slicing can unintentionally increase fan-out or shift traffic to a single hot output.
What to measure (minimal proof of burst absorption)
  • Peak watermarks: per-stage and per-egress queue peaks, with timestamps for correlation.
  • Drop reasons: ingress overflow vs egress queue drop vs DMA/storage backpressure.
  • Output utilization: per-output bandwidth and skew; replication changes should move these numbers predictably.
  • Write-path tail latency: capture backpressure aligns with P99/P999 latency spikes, not average throughput.
Figure F6 — Buffer watermark & backpressure: OK → rising → near-full → drop
(Diagram: watermark states OK → rising → near-full → drop, and the simplified backpressure path: a slow recorder/storage sink pushes back into shared queues, fan-out amplifies load, and drop-reason counters record the result.)

H2-7 · Capture formats & metadata: PCAP at scale, indexing, and “wire data vs derived data”

Storing packets is not the same as making them usable. At scale, capture must be split into two deliverables: wire data for faithful replay, and derived data (metadata) for fast search and triage. The engineering goal is simple: keep the write path sequential and predictable, while making queries hit an index first.

What is captured Wire data vs derived data (why both are needed)

  • Wire data: full packets or truncated packets. Truncation saves write bandwidth and capacity, but reduces forensic depth and replay fidelity.
  • Derived data (metadata): flow records, counters, events, labels, and file pointers. Metadata answers “where to look” in seconds.
  • A scalable system avoids the trap of “only wire data” (slow search) and “only metadata” (no payload replay).

File format PCAP vs PCAPNG (engineering choice, not a spec lecture)

  • PCAP: broad tool compatibility and simple pipelines—often the default when interoperability is the priority.
  • PCAPNG: better suited for carrying extra capture context (interfaces, annotations/options) when scale and multi-port capture matter.
  • The decisive question is: which format preserves the metadata needed for operations without forcing random writes.

Format choice should be paired with a chunk strategy and indexing strategy; otherwise, “better format” still produces an unsearchable archive.

Indexing Minimal index dimensions that make archives searchable

  • Time: enables fast “minute/second window” retrieval and aligns with alarms and incident timelines.
  • 5-tuple / flow key: enables “show this session” queries without scanning entire files.
  • Interface / direction: makes multi-port, multi-link capture debuggable.
  • Tags: connects capture to events (burst windows, anomaly IDs, maintenance windows) without touching payload.

Write path Chunking + sequential append (avoid random-write amplification)

  • Split capture into chunks (by time window or size). Each chunk becomes an immutable object with a chunk ID and time range.
  • Write wire data as sequential append with alignment; avoid in-place updates during capture.
  • Build the index as a separate append-only stream that references chunk IDs and byte ranges.
Data integrity (engineering controls, not legal advice)
  • Per-chunk hash detects silent corruption and verifies replay fidelity.
  • An optional hash chain makes tampering detectable by linking each chunk to the previous one.
  • Integrity metadata should be stored alongside the index so audits do not require scanning payload files.
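The chunk-plus-index write path above can be sketched as an append-only routine; the layout and field names are illustrative, not a real capture format:

```python
import hashlib, io

def write_chunk(store, index, chunk_id, packets, t_start, t_end, prev_hash=""):
    """Append one immutable chunk of wire data sequentially and record an
    index entry (time range, byte range, hash chain). Layout is illustrative."""
    payload = b"".join(packets)
    offset = store.tell()
    store.write(payload)  # sequential append; no in-place updates during capture
    digest = hashlib.sha256(prev_hash.encode() + payload).hexdigest()
    index.append({"chunk": chunk_id, "t": [t_start, t_end],
                  "bytes": [offset, offset + len(payload)], "sha256": digest})
    return digest  # feed into the next call to form an optional hash chain

buf, idx = io.BytesIO(), []   # stand-ins for the chunk store and the index log
h1 = write_chunk(buf, idx, 0, [b"pkt1", b"pkt2"], 0.0, 1.0)
h2 = write_chunk(buf, idx, 1, [b"pkt3"], 1.0, 2.0, prev_hash=h1)
```

Because the index stores byte ranges and hashes separately from the payload, an audit can verify integrity without scanning wire-data files.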
Figure F7 — Write & index pipeline: capture → chunk → write → index → query
(Diagram: capture ingress → chunking → optional compression → sequential aligned writes to a chunk store, with a parallel metadata path building an append-only index over time, flow key, interface, and tags; queries hit the index first, then locate the chunk and replay. Per-chunk hashes and an optional hash chain are stored beside the index.)

H2-8 · Storage control: NVMe/RAID throughput, write jitter, and keeping capture truly lossless

“Lossless capture” is usually lost in storage, not in headline bandwidth. The dominant failure mode is tail-latency spikes (P99/P999) caused by internal housekeeping (GC), cache transitions, or rebuild activity. Those spikes create backpressure, watermarks climb, and drops appear upstream. This section provides a diagnosis-first view: prove causality by aligning timelines.

Core idea Peak throughput is not the metric; tail latency is

  • Average write bandwidth can look fine while P99/P999 write latency stalls the write path long enough to overflow buffers.
  • Capture systems fail on variance: a rare long stall is enough to trigger backpressure and drops during microbursts.
  • Use latency distribution and queue depth, not only “GB/s”, to judge whether storage is safe for line-rate capture.
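A nearest-rank percentile over synthetic write latencies shows why the average hides the stall; the 20 ms GC stalls below are invented numbers:

```python
def percentile(samples, p):
    """Nearest-rank percentile (p in percent, e.g. 99 or 99.9)."""
    s = sorted(samples)
    rank = max(1, -(-len(s) * p // 100))  # ceil(n * p / 100), clamped to >= 1
    return s[int(rank) - 1]

# 1000 writes: most take 50 us, but 1.5% hit a 20 ms stall (GC, flush, rebuild).
lat_us = [50] * 985 + [20_000] * 15
avg = sum(lat_us) / len(lat_us)                  # ~349 us: looks survivable
p99, p999 = percentile(lat_us, 99), percentile(lat_us, 99.9)  # both 20,000 us
```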

Causes Where write jitter comes from (capture-relevant only)

  • NVMe GC / write amplification: background cleanup can steal service time and inflate tail latency.
  • Cache transitions: when fast caching is exhausted, latency rises and becomes bursty.
  • RAID rebuild or scrubbing: background IO changes latency distribution even if bandwidth remains high.

Controls Make the write path predictable

  • Chunked, aligned, sequential writes: avoid random IO amplification and reduce metadata churn under load.
  • Queue depth discipline: keep write queues within a stable operating region; persistent growth indicates backpressure.
  • PLP (power-loss protection): stabilizes write semantics and reduces “partial write” risk for data + index consistency.

The goal is not maximal speed; the goal is minimal tail latency while sustaining the capture’s sequential append pattern.

RAID tradeoffs Bandwidth vs rebuild risk vs CPU overhead

  • RAID can increase sustained throughput and parallelism, but rebuild/scrub windows can inject latency spikes.
  • For capture, the decisive question is: what happens to P99/P999 latency during rebuild or background maintenance?
  • Monitor CPU overhead and controller contention; storage “success” still fails if host-side scheduling adds jitter.
Prove storage-caused drops (timeline alignment)
  • IO latency distribution: log P50/P99/P999 and mark spike windows.
  • IO queue depth: correlate sustained depth growth with spike windows.
  • Backpressure events: writer backlog / DMA stalls / flush stalls must rise before drops.
  • Drop reason counters: storage/DMA-related reasons should align with the same windows.
Figure F8 — Write jitter → backpressure → watermark → drops (causal chain)
(Diagram: the causal chain write jitter (NVMe/RAID GC, rebuild, cache transitions) → backpressure (writer backlog, queue depth growth, stalls) → watermark climb → drops, with the evidence that must align in time: IO P99/P999 spikes, queue depth, backpressure flags, drop counters, watermark peaks.)
H2-9 · Failure modes & troubleshooting: symptom → root cause → fix

Effective CGNAT troubleshooting starts with a short decision path: classify the symptom, confirm with minimal counters, then apply a targeted fix.

This chapter stays “CGNAT-local”: sessions/ports/table/drops/log pipeline signals are enough to narrow the failure class without pulling in external protocol detail.

Use the cards below like a field playbook: each is four lines—Symptom, Fast check, Likely root cause, Fix.

Fault cards (field-usable) — four fixed lines per card

1) Port exhaustion / block hot spot

Symptom: new flows fail; only a subset of users/services degrade; failures cluster in time.

Fast check: port block utilization distribution becomes highly skewed; CPS spikes; drops show allocation/exhaustion reasons.

Likely root cause: hot blocks/pools saturate while averages look acceptable (skew hides risk).

Fix: reduce skew (rebalance blocks/pools), increase headroom where the skew concentrates, and verify skew flattening plus drop-reason recovery.

Key counters: port block p95/p99 utilization, pool headroom, CPS, drops-by-reason.
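The distribution-first check in this card can be sketched directly; the thresholds, data, and counter shape are illustrative:

```python
from statistics import mean

def skew_report(block_util):
    """Distribution-first view of per-block utilization (0.0-1.0): the average
    hides hot blocks; the p99 block exposes them. Thresholds are illustrative."""
    s = sorted(block_util)
    avg = mean(s)
    p99 = s[min(len(s) - 1, int(len(s) * 0.99))]
    return {"avg": avg, "p99": p99, "hot": p99 > 0.9 and avg < 0.6}

# 1000 port blocks: the average (~31%) looks healthy, but 2% are saturated.
util = [0.3] * 980 + [0.98] * 20
r = skew_report(util)
```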

2) “Gbps is fine” but setup collapses

Symptom: throughput remains high, yet new sessions time out; setup rate falls off a cliff.

Fast check: CPS falls while sessions plateau; create-path drops increase; table occupancy stays high or churn spikes.

Likely root cause: create/update path is saturated (inserts, updates, or reclaim pressure), not the steady-state forwarding.

Fix: cut create-path cost (reduce churn drivers), keep occupancy below jitter threshold, and confirm CPS recovery during burst tests.

Key counters: CPS, create-path drops, occupancy, aging/churn, collision/chain depth.

3) Table jitter / early reclaim

Symptom: sessions are reclaimed too early; retransmissions rise; tail latency spikes periodically.

Fast check: aging/churn peaks align with latency spikes; collision/chain depth increases; occupancy remains high.

Likely root cause: aging/reclaim cycle becomes expensive and bursty; hot buckets amplify tail behavior.

Fix: tune aging to reduce churn, rebalance buckets to reduce collisions, and validate that churn peaks no longer trigger tail spikes.

Key counters: churn/aging rate, occupancy, collision/chain depth, tail indicators (if available).

4) Asymmetric path → return flow state miss

Symptom: one-way connectivity; intermittent “works then breaks”; failures are direction-dependent.

Fast check: state-miss drops rise for return-direction traffic; hit/miss balance becomes asymmetric during the incident window.

Likely root cause: forward and return packets do not hit the same state domain/shard, so return lookups miss.

Fix: enforce flow-to-state consistency (same flow lands in the same shard/state domain) and confirm miss drops disappear after change.

Key counters: state-miss drops by reason, per-direction hit/miss (or equivalent), shard imbalance indicators.

5) “Random” loss that is packet-size dependent (PMTU)

Symptom: small packets succeed but larger payloads fail; issues correlate with specific size ranges.

Fast check: drops spike in certain size bins; oversize/fragment-related counters increase during failures.

Likely root cause: path MTU constraints or size-dependent handling triggers drops that look random at flow level.

Fix: make size-dependent handling consistent and validate with controlled size sweeps until the spike disappears.

Key counters: packet-size histogram (if available), oversize/fragment counters, drops by reason.

6) Fragmentation / checksum inconsistency

Symptom: intermittent loss with no clear CPU spike; failures show weak correlation to throughput.

Fast check: fragment-related drops rise; checksum-related drops appear; issue reproduces only under specific packet patterns.

Likely root cause: fragmentation path or checksum update path diverges from the main translation path.

Fix: unify translation behavior for all packet paths and verify checksum/fragment drops return to baseline.

Key counters: fragment drops, checksum drops, drops by reason, packet pattern correlation.

7) Logging backpressure spillover

Symptom: throughput falls but CPU is not high; queue/watermark signals look abnormal.

Fast check: log backlog watermark rises first; export latency rises; log drops may appear; data-plane drops follow later.

Likely root cause: log pipeline cannot drain; backlog feeds back into data plane (backpressure).

Fix: reduce log pressure (record size/event rate), strengthen decoupling (buffers/batching), and confirm backlog leads no longer precede drops.

Key counters: backlog watermark, export latency, log drops, drops by reason, CPS over time.

8) Drops surge with no clear single “big” metric change

Symptom: drops increase suddenly; no single aggregate metric explains it; impact is uneven.

Fast check: drops by reason show one class dominating; distributions (blocks/buckets) worsen even if averages stay flat.

Likely root cause: localized hot spots (port blocks or hash buckets) create tail failures that aggregates mask.

Fix: switch to distribution-first view, mitigate hot spots, and verify the dominating drop-reason class returns to baseline.

Key counters: drops by reason, port block distribution, collision/chain depth distribution, occupancy.
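The "distribution-first view" can be made concrete with a percentile summary over per-block utilization. A minimal sketch, assuming a hypothetical counter export of one utilization sample (0.0–1.0) per port block.

```python
def hot_spot_report(block_util):
    """Distribution-first view of per-block utilization.

    Averages hide localized hot spots; compare the mean against the
    worst blocks (nearest-rank percentiles are enough here).
    """
    s = sorted(block_util)
    n = len(s)
    mean = sum(s) / n
    p99 = s[min(n - 1, int(0.99 * n))]
    return {"mean": round(mean, 3), "p99": round(p99, 3), "max": s[-1]}

# 99 cool blocks plus one saturated block: the mean still looks healthy
report = hot_spot_report([0.2] * 99 + [0.98])
```

When `max`/`p99` diverge sharply from `mean`, the aggregate is masking a hot spot; that gap, tracked over time, is the acceptance evidence for the mitigation.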

Figure F8 — Troubleshooting decision tree (3-level field workflow)
[Diagram: three-level workflow — entry symptom → minimal checks → fix direction. Entry symptoms: new flows failing / CPS cliff; intermittent / one-way / subset; drops or performance down with CPU not high. Minimal checks draw on CPS, create drops, port skew (blocks/pools), table occupancy and collisions, directional state misses, packet-size dependence, and log backlog. Fix directions: reduce create-path cost, fix port blocks/hot spots, tune aging, enforce state consistency, validate PMTU/fragments, reduce log pressure. Port hot spots and logging backpressure are highlighted in red.]
Keep the tree shallow: 3 entry symptoms, minimal checks, then a fix direction. Red highlights the two high-impact chains: port hot spots and log backpressure.
H2-10 · High availability (HA) — Risk: state sync load hurts CPS

High availability: state sync, failover, and keeping mappings consistent

HA for CGNAT is hard because state is large: replication must preserve enough mapping state for continuity without turning synchronization into a second data-plane bottleneck.

The practical trade-off is straightforward: stronger session continuity requires more replication load, which can reduce CPS and worsen tail behavior if the sync path is not isolated from the data plane.

Success criteria: after failover, mapping consistency holds (no mass state misses) and replication load does not push CPS into a cliff during bursts.

What state must be replicated (minimal set vs optional)

Must replicate (minimal set)

Active session mapping identity (inside/outside address+port mapping) and enough lifecycle info to keep lookups consistent after takeover.

Goal: prevent mass state misses immediately after failover.

Optional replicate (only if justified)

Non-essential metadata that improves investigation or reporting but is not required for mapping continuity.

Rule: if it can be rebuilt, avoid replicating it under load.

Replication load vs data-plane health (how to avoid a CPS cliff)

Replication frequency and bandwidth

More frequent updates reduce continuity gaps but increase write amplification and contention risk.

Practical readout: CPS and create-path stability under burst should not degrade when replication becomes busy.

How to detect “sync is hurting the data plane”

If replication queue/backlog rises first and CPS drops next, synchronization load is likely spilling into the packet path.

Correlate replication backlog (if available) with CPS, drops by reason, and churn peaks.
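One simple way to quantify that correlation is a plain Pearson coefficient between replication backlog and CPS over the same window. A minimal sketch over hypothetical, time-aligned samples; a strongly negative value supports the spillover hypothesis but does not prove causation on its own.

```python
def pearson(x, y):
    """Plain Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Backlog climbs while CPS collapses: expect r close to -1
backlog = [5, 10, 40, 80, 120]
cps     = [50000, 49500, 42000, 30000, 18000]
r = pearson(backlog, cps)
```

Pair the correlation with the lead-lag check (backlog rising first) before concluding that sync load is the cause; correlation alone can also reflect a shared upstream burst.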

Failover mapping consistency (avoid mass state misses)

Design goal: after takeover, existing flows should still resolve to the expected mapping state domain.

Operational test: during controlled failover, verify state-miss drops do not surge and that session continuity is preserved within expected limits.

Use the same “drops by reason + distribution view” approach: a short surge may be acceptable; a sustained state-miss plateau indicates broken consistency.
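The surge-vs-plateau distinction can be encoded as a simple classifier over the post-failover state-miss series. A minimal sketch; the baseline and window values are illustrative and should come from your declared acceptance profile.

```python
def classify_state_miss(misses, baseline, surge_window):
    """Classify a post-failover state-miss series (per-interval counts).

    A brief spike that returns to `baseline` within `surge_window`
    samples is an acceptable surge; misses still elevated after the
    window indicate a sustained plateau (broken consistency).
    """
    tail = misses[surge_window:]
    if all(m <= baseline for m in tail):
        return "surge-ok"
    return "plateau-broken"

# Spike at takeover, then quick recovery to baseline
verdict = classify_state_miss([900, 400, 50, 10, 8], baseline=20, surge_window=3)
```

Run this against several controlled failovers: a consistent "surge-ok" with stable CPS is the acceptance evidence; any "plateau-broken" result points back at the replicated minimal set.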

Figure F9 — Active/Standby with state replication (where sync load becomes a risk)
[Diagram: Active and standby nodes joined by a replication link. The active node carries the packet path and the state table (mappings + lifecycle); the replication queue watermark is flagged as the congestion risk. The standby node holds the replicated minimal state and a takeover-readiness / mapping-consistency check; its packet path is idle until failover. After takeover: avoid sustained state-miss drops; verify mapping consistency and CPS stability.]
Replicate only the minimal state needed for mapping continuity. Treat replication queue/backlog as a risk signal: if it rises first and CPS drops next, sync load is likely impacting the data plane.

H2-11 · Validation & acceptance checklist: how to certify a TAP/Probe is production-ready

“Production-ready” is not a datasheet claim. It is a repeatable acceptance report: under a declared traffic profile (packet mix, bursts, replication, timestamp mode, and storage mode), drops stay at zero and the evidence (counters, watermarks, backpressure, utilization, IO tail latency, timestamp stability) aligns in the same time window.

Acceptance package What must be delivered with the test results

  • Test profile: port rates, packet-size distribution (64B / mixed / IMIX), flow count, replication factor, timestamp mode, storage mode.
  • Evidence bundle (time-aligned): drop counters (per-port/per-reason), key queue/FIFO watermarks, backpressure/DMA stall signals, per-port bps+pps utilization, and (if writing) IO p99/p999 + queue depth.
  • Runbook: minimal reproducible steps (1→1 first, then enable replication, then enable filtering/masking, then enable storage last).
  • Config snapshot: filter rules, masking/slicing, replication map, hash key, output gating state, timestamp source selection.

A clean acceptance result is reproducible by a second engineer without hidden knobs.
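The declared test profile is easiest to keep reproducible when it is a structured object rather than prose. A minimal sketch, assuming illustrative field names; map them onto your traffic generator and device counters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcceptanceProfile:
    """Declared profile for one zero-drop acceptance run.

    The runbook order matters: 1-to-1 first, storage last, so the
    first step that introduces drops identifies the choke stage.
    """
    port_rate_gbps: int
    packet_mix: str            # "64B", "IMIX", ...
    flow_count: int
    replication_factor: int
    timestamp_mode: str        # e.g. "hw-phy"
    storage_mode: str          # e.g. "off", "pcap-full"
    runbook: tuple = ("1to1", "replication", "filter/mask", "storage")

profile = AcceptanceProfile(100, "IMIX", 1_000_000, 4, "hw-phy", "pcap-full")
```

Freezing the profile (and serializing it into the evidence bundle) is what lets a second engineer rerun the exact same acceptance without hidden knobs.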

A · Performance Line-rate + 64B Mpps + mixed packet sizes + multi-port concurrent

Test: Run line-rate per port, then min-size 64B Mpps, then mixed/IMIX, then all ports concurrently.
Evidence: Drop counters = 0; watermarks stay below near-full; per-port bps+pps matches the expected load.
Pass criteria: No drops in the declared profile window (e.g., 30–60 minutes per profile step), with stable headroom.

A bps-only claim can hide failures at 64B where Mpps and arbitration dominate.
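The gap between a bps claim and a pps requirement is simple arithmetic: on Ethernet each frame carries 20 extra bytes on the wire (8B preamble + 12B inter-frame gap), so 64B frames at 10G already mean ~14.88 Mpps.

```python
def line_rate_mpps(rate_gbps, frame_bytes=64):
    """Maximum packet rate at a given line rate, including the 20
    bytes of per-frame wire overhead (preamble 8B + IFG 12B)."""
    bits_per_frame = (frame_bytes + 20) * 8
    return rate_gbps * 1e9 / bits_per_frame / 1e6  # Mpps

# 10G at 64B -> ~14.88 Mpps; 100G at 64B -> ~148.81 Mpps
rate_10g = line_rate_mpps(10)
rate_100g = line_rate_mpps(100)
```

This is why the acceptance profile must declare both Gbps and 64B Mpps: a device comfortable at 100G of 1500B frames (~8.1 Mpps) may be an order of magnitude short of the 64B packet rate.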

B · Microburst Burst injection validates buffer headroom and “lossless under spikes”

Test: Inject controlled microbursts on top of background traffic; repeat at multiple burst intensities and durations.
Evidence: Watermark peaks correlate to burst windows; no backpressure accumulation; drop reasons remain zero.
Pass criteria: Zero drops across burst windows; watermarks rise and recover (no sustained near-full region).
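The "rise and recover" pass criterion can be expressed as a check over watermark samples. A minimal sketch; `near_full` and `recover_level` are per-device thresholds, and the values below are illustrative.

```python
def watermark_recovers(samples, near_full, recover_level):
    """Check the rise-and-recover pattern: the watermark may peak
    during a burst but must not sit at near-full for consecutive
    samples, and must drain back below `recover_level`."""
    sustained = any(a >= near_full and b >= near_full
                    for a, b in zip(samples, samples[1:]))
    return (not sustained) and samples[-1] < recover_level

burst = [10, 15, 70, 88, 60, 20, 12]   # peaks under the burst, then drains
stuck = [10, 15, 92, 95, 96, 94, 93]   # parks near full: should fail
```

Applying it to both traces shows the intent: `watermark_recovers(burst, 90, 30)` passes, while the parked trace fails even though neither series reports a drop yet.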

C · Replication & load-balance 1→N fan-out, hash consistency, and explainable output balance

  • Fan-out ladder: validate 1→2→4→8 (or declared maximum) with fixed traffic profile at each step.
  • Hash consistency: the same flow must map to the same tool/output when “session consistency” is enabled.
  • Output balance: verify no single “hot” port becomes the hidden choke (check per-port bps+pps + watermark).
Pass criteria: Zero drops at the declared fan-out; stable mapping policy (no unexplained flow migration); output imbalance is within declared limits.
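Session consistency is testable without vendor internals: hash the flow key deterministically and verify the same flow always lands on the same output. A minimal sketch using an illustrative hash; real devices select their own header fields and hash function.

```python
import hashlib

def output_port(flow_key, n_outputs):
    """Deterministic flow -> output mapping (illustrative, not any
    vendor's actual field selection). flow_key is the 5-tuple."""
    digest = hashlib.sha256(repr(flow_key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_outputs

flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
# Session consistency: repeated lookups of the same flow must agree
ports = {output_port(flow, 8) for _ in range(100)}
```

The same harness doubles as a balance check: map a large sample of distinct flows and compare the per-output counts against the declared imbalance limit.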

D · Timebase & hardware timestamps Lock, holdover, temperature drift, and reboot convergence

  • Lock: record offset/jitter baseline after locking to the declared time source.
  • Holdover: disconnect time source and record drift vs time; define the acceptable holdover window.
  • Thermal drift: observe drift under temperature variation in the declared operating envelope.
  • Reboot convergence: measure time-to-stable timestamps after restart (cold/warm where applicable).
Pass criteria: Offset/jitter/drift stay within declared limits, and their changes are observable (logged) and correlated to lock/holdover state.
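Holdover drift reduces to the slope of offset versus time. A minimal sketch of a least-squares fit over hypothetical offset samples; conveniently, a slope in microseconds per second is numerically the frequency error in parts per million.

```python
def drift_ppm(times_s, offsets_us):
    """Least-squares slope of offset vs time during holdover.

    Slope in us/s equals parts-per-million of frequency error,
    so the fit directly yields drift in ppm.
    """
    n = len(times_s)
    mt = sum(times_s) / n
    mo = sum(offsets_us) / n
    num = sum((t - mt) * (o - mo) for t, o in zip(times_s, offsets_us))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den  # us per s == ppm

# 0.5 us of offset gained per second -> 0.5 ppm drift
ppm = drift_ppm([0, 10, 20, 30], [0.0, 5.0, 10.0, 15.0])
```

Logging the fitted drift per holdover episode, alongside lock state and temperature, is what makes the declared holdover window defensible in the acceptance report.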

E · Storage Sustained write, tail latency, power-loss behavior, and index searchability

  • Sustained write: run continuous capture for N hours (e.g., 4/8/24h steps) under the declared packet profile.
  • Tail latency: monitor IO p99/p999 and queue depth; prove that spikes do not create capture backpressure.
  • Power-loss behavior: validate recovery (where supported) and confirm index consistency for post-event search.
  • Indexing: confirm queries by time window / interface / flow key return expected results (not just “files exist”).
Pass criteria: Zero drops for the full sustained window; no persistent writer backlog; index remains usable and consistent after restarts.

Peak bandwidth alone is not sufficient; sustained behavior and IO tail latency decide whether capture stays lossless.
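Tail percentiles over raw write-latency samples make that point measurable. A minimal sketch using nearest-rank percentiles, which is sufficient for setting acceptance thresholds.

```python
def tail_latencies(samples_us):
    """p50/p99/p999 from raw write-latency samples (microseconds)."""
    s = sorted(samples_us)

    def pct(p):
        # nearest-rank: index into the sorted samples
        return s[min(len(s) - 1, int(p * len(s)))]

    return {"p50": pct(0.50), "p99": pct(0.99), "p999": pct(0.999)}

# 990 fast writes plus a few GC-like stalls hiding in the tail
lat = tail_latencies([100] * 990 + [5000] * 10)
```

Note how the median stays flat while p99/p999 expose the stalls; those are the spikes to correlate with writer backlog, backpressure flags, and drop reasons.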

F · Reliability Fail-open behavior, upgrade rollback, and complete event logging

  • Fail-open / bypass: validate that device faults (or loss of power where designed) do not disrupt the production link.
  • Upgrade & rollback: prove a safe return to a known-good version without losing critical configuration.
  • Logs: confirm that drop reasons, watermark events, backpressure states, and timebase status changes are recorded and exportable.
Pass criteria: Production link continuity matches the declared bypass semantics; upgrade/rollback is repeatable; evidence logs are complete for postmortems.

Reference BOM Example MPNs commonly used in TAP/Probe platforms (verify per SKU)

These are examples to anchor acceptance criteria to concrete building blocks. Exact fit depends on port speeds, timestamp needs, and storage architecture.

Hardware timestamp / capture NIC

Intel Ethernet Controller E810 — PTP-capable family often used when ingress timestamp quality matters.

Clock cleaner / DPLL (timebase conditioning)

Skyworks/SiLabs Si5345 — jitter-attenuating clock device family.

Analog Devices AD9545 — DPLL-centric time synchronization device family.

PCIe switch (multi-NVMe / multi-endpoint fabrics)

Broadcom PEX88096 — PCIe switch example SKU.

Microchip Switchtec PFX (e.g., PM4162) — PCIe switch family example SKU.

BMC / management controller

ASPEED AST2600 — widely used BMC SoC for OOB management, sensors, and lifecycle control.

NVMe SSD (capture storage)

Solidigm D7-P5520 — data center NVMe SSD family often evaluated for sustained write behavior and PLP options (SKU-dependent).

Switch silicon (merchant switching, when used)

Broadcom StrataXGS Tomahawk family — example class of switch silicon used in high-density port platforms (exact SKU varies).

Acceptance tests must be written against observables (drops, watermarks, tail latency, timestamp drift), not brand promises. Use the exact SKU datasheets to set the final thresholds.

Figure F11 — Acceptance checklist groups (clean, report-friendly)
[Diagram: acceptance checklist groups. Evidence bundle: counters, watermarks, backpressure, utilization, IO tail latency, timestamp drift. A · Performance (Gbps + Mpps + IMIX); B · Microburst (watermark headroom); C · Replication (fan-out + hash consistency); D · Timestamps (lock + holdover + drift); E · Storage (sustained + tail latency); F · Reliability (fail-open + rollback + logs). Pass = drop counters stay at 0 while evidence remains consistent in the same time window; if drops appear, rerun the minimal repro and isolate the first feature that triggers the choke point.]


H2-12 · FAQs (Network TAP / Probe)

Each answer is kept concise (roughly 40–70 words) and stays within this page’s scope: observation, capture, replay, timestamps, buffering, and storage.

1 TAP vs SPAN: why is SPAN often “unreliable” for packet capture? Deployment
SPAN traffic is copied by a switch after normal forwarding, so the mirror stream can be oversubscribed and silently dropped. Different platforms apply different mirroring priorities, truncation, and queue behavior. Under bursts, mirror packets may arrive late or reordered. For “wire-like” evidence, a physical TAP (or dedicated bypass device) is typically more faithful.
2 What does “lossless” really mean, and what is a practical acceptance standard? Lossless
“Lossless” means zero drops under a declared profile: port rate (Gbps), 64B minimum-size rate (Mpps), burst behavior, and replication factor. Proof requires aligned evidence: drop counters stay at 0 while watermarks remain below near-full and backpressure does not accumulate. A line-rate claim without Mpps and burst conditions is not an acceptance result.
3 Why can throughput look fine, yet 64B packets start dropping? Datapath
Small packets are a Mpps problem, not a Gbps problem. Parser/classifier arbitration, replication logic, and queue scheduling can saturate at high packet rates even when bandwidth is modest. Verification should compare pps vs bps, and check which stage reports congestion (ingress overflow, pipeline stall, or egress arbitration). Fixes usually reduce fan-out, simplify parsing, or rebalance outputs.
4 Why do drops worsen after replicating to multiple tool ports, and how is the root cause isolated? Replication
Replication multiplies load: burst × fan-out increases buffer demand and can create egress hot spots. Isolation is done by stepwise enablement: validate 1→1 first, then add replication levels, then add filtering/masking, and only then enable storage. At each step, correlate drop reasons with watermarks and per-port utilization to locate the first stage that collapses.
5 Where should timestamps be taken, and why are software timestamps often insufficient? Timestamp
The most accurate timestamps are taken as close to PHY/MAC ingress as possible. The later the timestamp is applied, the more queueing and arbitration jitter contaminates it. Software timestamps are affected by OS scheduling, interrupt moderation, and host-side buffering, which expands jitter and offsets under load. Hardware timestamps plus a stable timebase are the usual foundation for reliable timelines.
6 Why do multiple probes fail to align on the same timeline? Timebase
Timeline mismatch usually comes from inconsistent timebases: different lock states, different epochs, or different holdover behavior. Temperature drift and domain crossings (clock-to-clock handoff, FIFO queueing) add offset and jitter differences between boxes. Validation should log lock/holdover state, track offset/jitter/drift over time, and confirm that all devices use the same reference and calibration method.
7 What distortions can ERSPAN/encapsulated mirroring introduce, and when is a physical TAP preferred? Mirror
Encapsulated mirroring adds overhead and depends on transport paths that can congest, causing drops or reordering in the mirror stream. Timestamp fidelity often degrades because copying happens later and passes through additional queues. When the goal is “closest to wire data” evidence—especially under bursts—a physical TAP or dedicated bypass path is typically preferred, provided the optical/electrical budget and insertion constraints are acceptable.
8 PCAP full capture vs flow metadata: how to choose without regret? Data
Full packets (or truncated packets) preserve forensics and replay value, but drive storage and indexing cost. Flow/metadata records are lighter and faster to query, but may lose payload and fine timing details. The choice should be tied to the use case: reproduction and deep inspection favor packet data, while fleet monitoring and trend analysis often favor metadata—sometimes with selective packet sampling.
9 Why can NVMe/RAID show high peak bandwidth but still drop during sustained capture? Storage
Sustained lossless capture is limited by tail latency, not peak throughput. Garbage collection, write amplification, cache flushes, and queue buildup can create long-latency spikes that trigger backpressure. Validation should monitor IO p99/p999 latency and queue depth, and correlate spikes to writer backlog, backpressure flags, and drop reasons. Mitigations focus on sequential write patterns, buffering policy, and avoiding random-write indexing in the hot path.
10 How can replay reproduce original timing instead of “blasting packets out”? Replay
Timing-correct replay uses a scheduler driven by capture timestamps to preserve inter-packet gaps and microbursts. A shaper then enforces rate limits and burst constraints when running stress tests or time scaling. For multi-port output, per-flow ordering must be protected while distributing load. Replay output should always be gated to an isolated lab network to prevent accidental injection into production paths.
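The scheduler idea can be sketched in a few lines: derive each packet's wall-clock deadline from its capture timestamp and sleep until then. A minimal sketch assuming `(timestamp_s, payload)` pairs and a stand-in `send` callable; real probes use hardware scheduling or busy-wait loops for sub-microsecond gaps, and `speed` implements time scaling.

```python
import time

def replay(packets, send, speed=1.0):
    """Replay (timestamp_s, payload) pairs, preserving inter-packet
    gaps scaled by `speed`. Gate output to an isolated lab network."""
    if not packets:
        return
    start_wall = time.monotonic()
    start_ts = packets[0][0]
    for ts, payload in packets:
        target = start_wall + (ts - start_ts) / speed
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # coarse; hw schedulers replace this
        send(payload)

sent = []
replay([(0.00, b"a"), (0.05, b"b"), (0.10, b"c")], sent.append, speed=10.0)
```

`speed=10.0` compresses the timeline ten-fold while keeping relative gaps, which is the basis of time-scaled stress replay.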
11 Inline + bypass: how are fail-open vs fail-close risks evaluated? Bypass
Fail-open means the production link remains connected if the monitoring device fails or loses power; fail-close blocks traffic in certain fault states. Risk evaluation focuses on link continuity, switchover time, mis-trigger probability, and how the bypass element behaves under power loss and reboot. Acceptance should include repeatable failover tests plus logs that prove the device does not become a single point of failure for the production path.
12 What is a reproducible “zero-drop” stress-test script strategy? Acceptance
Start with a fixed packet profile and fixed flow count in a 1→1 setup to establish a baseline. Then increase one knob at a time: add replication levels, enable filtering/masking, and enable storage last. At each step, record a time-aligned evidence bundle (drop counters, watermarks, backpressure, per-port bps+pps, and IO tail latency when writing). The first step that introduces drops defines the choke region.
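That one-knob-at-a-time ladder is itself scriptable. A minimal sketch, assuming a hypothetical `measure_drops(enabled_steps)` callback that runs the profile with the given features enabled and returns the drop count.

```python
def run_ladder(steps, measure_drops):
    """Enable one knob at a time and return the first step that
    introduces drops (the choke region), or None if clean.

    measure_drops(enabled_steps) -> int is a stand-in for a real
    run with those features enabled.
    """
    enabled = []
    for step in steps:
        enabled.append(step)
        if measure_drops(tuple(enabled)) > 0:
            return step  # first offender
    return None  # zero drops across the whole ladder

ladder = ("baseline-1to1", "replication-x4", "filter-mask", "storage")
# Fake device for illustration: drops only once storage is enabled
choke = run_ladder(ladder, lambda on: 42 if "storage" in on else 0)
```

In a real harness, `measure_drops` would also archive the time-aligned evidence bundle per step, so the report shows exactly which feature first broke zero-drop.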