Edge Observability / TAP / Probe Architecture & Validation
An Edge Observability / TAP / Probe is a non-intrusive appliance that mirrors traffic at line rate and turns packets into provable evidence—with lossless accounting (counters + reason codes), precise hardware timestamps, and sustained NVMe capture that stays consistent even under bursts and power events.
Its engineering value is not “more throughput,” but a measurable chain from ingress → replication → buffering → storage, so operators can trust what was captured, when it happened, and why any packet would ever be dropped.
H2-1 · What it is & boundary: the engineering definition of TAP / Packet Broker / Probe
Role (what it does)
An edge observability TAP/probe is a side-path capture system that replicates live traffic and produces analysis-grade evidence without participating in forwarding decisions. Typical outputs include selective copies (filtered/sliced), precise timestamps, deep buffering for bursts, and capture-to-storage for audit and incident reconstruction.
Boundary rule: this page stays on observability (capture, timestamp, buffer, store, manage). It does not cover user-plane forwarding appliances, network slicing gateways, or security policy enforcement engines.
“Lossless” (what it means in engineering terms)
“Lossless” is not a marketing word; it is a verifiable contract that must be tied to explicit conditions and measurable evidence.
- Traffic model is declared: port rates, minimum packet size (and the resulting worst-case packet rate in Mpps), and microburst profile.
- Replication expansion is bounded: fan-out, filter rules, and hotspot scenarios are specified.
- Buffering and backpressure behavior is defined: how bursts are absorbed, and what happens when sinks are slower.
- Proof is produced: aligned counters and reason codes that explain any drop (if it can happen).
Common field pattern: “average bandwidth is low but drops still occur” is usually caused by microbursts + hotspot rules + replication fan-out, not by sustained throughput limits.
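As a quick sanity check, the worst-case packet rate implied by a declared traffic model follows directly from port rate and minimum packet size. A minimal Python sketch (the function name and 64-byte default are illustrative, not from any vendor API):

```python
def worst_case_mpps(port_gbps: float, min_pkt_bytes: int = 64) -> float:
    """Worst-case packet rate (Mpps) implied by a declared traffic model.

    On Ethernet, every frame also occupies 20 bytes on the wire:
    8-byte preamble/SFD plus the 12-byte inter-frame gap.
    """
    wire_bits_per_pkt = (min_pkt_bytes + 20) * 8
    return port_gbps * 1e9 / wire_bits_per_pkt / 1e6

# A 100G port carrying 64-byte packets must sustain ~148.8 Mpps,
# even though "100 Gbps" sounds comfortable as an average.
```

This is why the contract must declare minimum packet size: the same port rate implies an order-of-magnitude harder Mpps target as packets shrink.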
Boundary comparisons (decision-oriented)
| Comparison | When it is acceptable | When it breaks (why “lossless” matters) |
|---|---|---|
| TAP/Probe vs SPAN/ERSPAN | SPAN/ERSPAN can be acceptable for quick, best-effort troubleshooting. | SPAN/ERSPAN often fails for evidence-grade capture because mirror resources are typically oversubscribed and not engineered for microbursts. |
| Packet broker/probe vs IDS/Firewall | A probe is designed for capture, timestamping, buffering, and export. | IDS/Firewall systems add policy evaluation and blocking paths; mixing enforcement with evidence capture introduces different bottlenecks and responsibilities. This page keeps the boundary: observability does not decide or block traffic. |
Four threads run through the rest of this page and close the loop: Mirror ASIC → Timestamp & Buffer → Storage → MCU/OOB Management.
H2-2 · Reference architecture: from ingress ports to capture storage (data plane + management plane)
Why this architecture matters
A probe only becomes “trustworthy” when the full pipeline is treated as a measurable chain: ingress → replication → queues/buffers → timestamp + metadata → tool export and/or capture-to-storage, with a separate management plane that never steals data-path resources.
Port rates (1/10/25/100G) are only the starting point. Line-rate “Gbps” can still fail on small packets (Mpps), microbursts, or replication fan-out. The architecture must expose the right counters and reason codes to prove where limits occur.
Data plane (capture fidelity)
- Ingress ports: bi-directional or uni-directional capture; aggregation outputs are potential chokepoints.
- Parser + classification: enough for observation slicing; avoid turning into a generic policy engine.
- Replication engine: fan-out, rule hotspots, and per-rule counters are mandatory for proof.
- Queues & buffers: SRAM/DRAM hierarchy for bursts; high-water marks and drop reasons must be observable.
- Timestamp + metadata: consistent insertion point, calibrated error budget, port-to-port skew monitoring.
- Egress: tool export and/or NVMe capture with indexing and power-loss safe commits.
Management plane (operability)
- Management MCU: configuration, health telemetry, event logs, safe upgrade + rollback.
- OOB interface: resilient remote access; management must continue even when data outputs are saturated.
- Evidence outputs: aligned counters (in/replicate/out/drop), time health (offset/skew alarms), and storage health (stall/latency).
- Field maintainability: deterministic boot, self-test hooks, reset causes, and persistent logs.
Module-to-proof map (responsibility → KPI → validation)
| Module | Primary responsibility | KPIs that must exist | How it is validated |
|---|---|---|---|
| Lossless mirror ASIC | Replication, filtering/slicing, per-rule accounting, deterministic behavior under fan-out and hotspots. | Fan-out capacity, rule counters, drop reasons, ingress/egress packet counters. | Line-rate + hotspot rules + burst replay; verify counter alignment and reason codes. |
| Queues & buffers | Absorb microbursts and tool/storage backpressure while preserving capture fidelity. | High-water marks, queue residency indicators, overflow reasons, per-port drop counters. | Synthetic microbursts with controlled fan-out; check “no silent loss” and explainable behavior. |
| Timestamp + metadata | Produce consistent event ordering and multi-point correlation across ports and outputs. | Timestamp accuracy, port-to-port skew alarms, calibration status, ToD lock state (as a slave input). | Compare against external reference; validate skew and stability over temperature/long runs. |
| NVMe capture & indexing | Sustained write without stalls; power-loss safe commits; searchable evidence. | Sustained write telemetry, stall events, commit points, storage health and latency distribution. | Long-duration capture + concurrent indexing; inject power loss and verify consistency and replay. |
| Management MCU / OOB | Configuration lifecycle, telemetry export, event logs, safe upgrades with rollback. | Upgrade success/rollback evidence, log persistence, watchdog/reset causes, management availability. | Failure-injection drills (bad firmware, link loss, reboot loops); verify remote recovery and audit logs. |
A reference architecture becomes “production-ready” when it exposes measurable proof points for capture fidelity, time trustworthiness, and storage integrity.
H2-3 · Lossless mirror ASIC: line-rate replication, filtering, and accountable outputs
What this block must guarantee (beyond “it mirrors packets”)
A lossless mirror ASIC is defined by deterministic replication under fan-out expansion, with explainable behavior when downstream sinks are slower. The output is only “trustworthy” when the replication pipeline exposes per-port and per-rule counters plus drop reason codes.
Engineering boundary: replication and observation slicing are in-scope. Deep security inspection, policy enforcement, and user-plane forwarding engines are out-of-scope for this page.
Replication points (where copies are created)
- Ingress port mirroring: simplest, but often used beyond its safe envelope (burst + contention create silent loss).
- Session / rule mirroring: the practical default for observability—repeatable, auditable, and controllable.
- Rule-based multicast fan-out: required for multi-tool outputs; the main source of bandwidth expansion and hotspot queues.
Parsing & matching (only what observation needs)
Matching is used to route traffic into observable slices and to generate accountable counters—not to become a generic policy engine.
- L2/L3/L4 classification: MAC/VLAN, IP, TCP/UDP ports, and a minimal set of encapsulations (e.g., VLAN/MPLS/VXLAN) for slicing.
- Rule identity: every match should map to a stable rule ID so counters, hotspots, and drops are explainable.
Fan-out bandwidth expansion (capacity planning that prevents “surprise drops”)
Treat “Overhead” as a real budget: metadata sideband, internal bus pressure from features, and worst-case alignment of multi-port microbursts.
- Worst-case beats average: plan using smallest packets (Mpps), peak fan-out, and hotspot rule scenarios.
- Hotspot queues are predictable: a single rule ID feeding multiple tools can concentrate load into one queue class.
- Proof requirement: per-rule counters must show whether drops correlate with a specific replication branch.
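The fan-out budget above can be checked with a small sketch. The rule tuple layout and the 50% hotspot threshold are illustrative assumptions, not a vendor API:

```python
def replication_budget(rules, egress_capacity_gbps, hotspot_frac=0.5):
    """Worst-case replicated bandwidth plus hotspot-branch detection.

    `rules` is a list of (rule_id, peak_gbps, fanout) tuples; field
    names and the hotspot threshold are illustrative placeholders.
    """
    total_gbps = 0.0
    hotspots = []
    for rule_id, peak_gbps, fanout in rules:
        branch_gbps = peak_gbps * fanout  # bandwidth after fan-out expansion
        total_gbps += branch_gbps
        if branch_gbps >= hotspot_frac * egress_capacity_gbps:
            hotspots.append(rule_id)      # one rule concentrating load
    return total_gbps, hotspots
```

Planning with peak rates and full fan-out, rather than averages, is what surfaces the hotspot branch before it shows up as field drops.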
Accountability: counters and reason codes (what makes “lossless” auditable)
A credible probe exposes a counter chain that can be aligned across the replication pipeline: Ingress (seen) → Replicated (expected) → Delivered (exported or captured). Any mismatch must be explained by a drop reason code rather than silent loss.
- Per-port: identifies physical ingress/egress constraints and oversubscription.
- Per-rule: identifies hotspot branches and filter-induced overload.
- Reason codes: at minimum distinguish queue overflow, rule overload, downstream stall, and self-protection.
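A hedged sketch of the counter-chain reconciliation (field names are illustrative): any gap between replicated and delivered that reason codes cannot explain is silent loss, and silent loss fails the lossless contract.

```python
def reconcile(seen, replicated, delivered, drops):
    """Align the counter chain: Ingress (seen) → Replicated (expected)
    → Delivered. `drops` maps reason code → count."""
    explained = sum(drops.values())
    silent_loss = replicated - delivered - explained
    return {
        "ingress_seen": seen,
        "explained_drops": explained,
        "silent_loss": silent_loss,
        "lossless_contract_holds": silent_loss == 0,
    }
```

Note the asymmetry: drops are acceptable if reasoned; only an unexplained residual disqualifies the evidence.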
Output formats (features change risk, not just convenience)
| Feature toggle | What it buys | What it costs / risks | Default for evidence-grade capture |
|---|---|---|---|
| Raw packets | Best fidelity and reconstruction accuracy. | Highest egress and storage pressure; worst-case bursts are harder. | Use when audits/forensics require full payload context. |
| Truncation | Reduces bandwidth and storage; keeps headers for many analyses. | May break application-layer reconstruction and some detections; requires clear documentation. | Enable only with explicit “what is safe to truncate” policy. |
| Metadata sideband | Fast indexing and correlation (rule ID, port, time, reason codes). | Consumes additional bandwidth/compute; must remain consistent with packet stream. | Enable for long captures; require commit points and integrity checks. |
| Reassembly / heavy transforms | Convenience for certain tools and higher-level views. | Increases variable latency, internal pressure, and can create new drop modes. | Prefer minimal transforms; keep evidence path deterministic. |
| Dedup | Reduces repeated packets in multi-path mirror scenarios. | Requires state, can become a hotspot, and must be explainable to avoid “missing evidence”. | Use only with tight scope and full accounting metrics. |
Practical rule: every feature that “does more” must also expose more telemetry; otherwise it turns drops into unexplainable gaps.
H2-4 · Buffering & burst: why “average is low” still drops packets, and how to prove it does not
Root causes (what actually creates loss)
Packet loss in observability pipelines is usually driven by short time-scale overload, not by average utilization. The dominant triggers are microbursts, head-of-line blocking, and fan-out that concentrates traffic into a hotspot queue.
- Microbursts: instantaneous arrival rate far exceeds service rate for milliseconds.
- Hotspot queues: one rule/output accumulates backlog while other resources appear idle.
- Downstream stalls: tool export or NVMe capture pauses briefly, pushing queues to overflow unless buffering/backpressure is engineered.
Buffer hierarchy (SRAM vs DRAM, and what each is for)
- On-chip SRAM: low latency; absorbs sharp spikes and feeds queue scheduling with stable timing.
- External DRAM: deep capacity; survives longer bursts and temporary sink slowdown.
The design goal is not “more memory,” but “enough absorb time” under declared burst profiles and fan-out configuration.
Queue organization (per-port / per-flow / per-class)
- Per-port: simple, but hotspots can starve unrelated traffic if resources are shared.
- Per-flow: fairer under hotspots; higher implementation cost and state pressure.
- Per-class: protects critical evidence paths (e.g., capture-to-storage) from best-effort tool outputs.
Backpressure (support vs no support—what changes)
Backpressure determines whether overload is converted into added latency or into dropped evidence. The key is not the mechanism details, but the operational contract:
- If backpressure is supported: bursts can be absorbed longer by slowing sources; evidence is preserved but timing may include added residency.
- If backpressure is not supported: buffering must absorb all overload; when exhausted, drops must be counted and reasoned.
A system that cannot apply backpressure must provide stronger proof telemetry: high-water marks, overflow points, and reason codes.
Burst model → buffer depth (short engineering derivation)
Service_rate must be evaluated after replication (fan-out) and after feature overhead. This is why “average bandwidth” is a misleading planning input.
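Under those declared inputs, the absorb requirement follows directly. A minimal sketch, assuming a single sink and a fractional feature-overhead term (both deliberate simplifications):

```python
def required_buffer_bytes(burst_gbps, sink_gbps, fanout,
                          overhead_frac, burst_ms):
    """Buffer depth needed to absorb a declared microburst without loss.

    Service rate is evaluated *after* replication fan-out and feature
    overhead, as required above — average bandwidth never appears.
    """
    service_gbps = (sink_gbps / fanout) * (1.0 - overhead_frac)
    excess_gbps = max(0.0, burst_gbps - service_gbps)
    return excess_gbps * 1e9 / 8.0 * (burst_ms / 1e3)

# e.g. a 2 ms burst at 25 Gbps into a 40 Gbps sink with 4-way fan-out
# and 10% overhead needs ~4 MB of absorb capacity.
```

The example shows why averages mislead: the sink looks "bigger than the burst" until fan-out and overhead shrink the effective service rate.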
Provability: how “no loss” is demonstrated (not claimed)
- Aligned counters: ingress seen ↔ replication expected ↔ delivered/exported/captured.
- Queue evidence: high-water marks and residency indicators for the exact hotspot branch.
- Drop reasons: at minimum, buffer overflow, rule overload, storage stall, thermal throttle.
- Reproducible validation: microburst replay + hotspot rules + sink slowdowns must be part of acceptance tests.
H2-5 · Precision timestamping: insertion point choices and an error budget you can accept
Why timestamps matter in observability (three non-negotiable uses)
Timestamps are the backbone of cross-point correlation, SLA and jitter analysis, and evidence ordering. A probe timestamp is only credible when its definition is explicit (arrival vs departure) and when its uncertainty is decomposed into measurable terms.
Scope boundary: this section only covers external Time-of-Day inputs (PTP slave / PPS / 10 MHz) and local distribution. Grandmaster selection, GNSS reception, and oscillator disciplining details are out-of-scope.
Where to stamp: PHY vs MAC vs ingress queue vs egress (what each adds to uncertainty)
The stamping point defines the meaning of time. Queue-based stamping can silently turn queuing residency into “time error”.
| Stamp point | Best for | Primary error terms introduced | Common failure mode |
|---|---|---|---|
| PHY | Arrival timing closest to the wire; tight correlation across ports. | Calibration / alignment of internal path; port-to-port skew monitoring required. | Looks “accurate” but drifts across ports without skew calibration and alarms. |
| MAC | Hardware timestamping with good repeatability and practical integration. | Clock-domain crossing, internal pipeline latency variation, interface alignment. | Correlation errors under load when CDC and pipeline variability are not accounted. |
| Ingress queue | Defining “arrival at scheduler”; useful for some internal latency accounting. | Queuing residency variability becomes part of the timestamp definition. | Jitter analysis becomes misleading because bursts inflate timestamp variation. |
| Egress | Departure timing and export timing; aligning “when it left the box”. | Scheduling / contention variability dominates; can be unrelated to wire arrival. | Incorrect event ordering when egress congestion reshapes timing. |
Define the timestamp explicitly
Arrival TS · Departure TS · Scheduler-entry TS
Correlation and ordering are only valid when all tools use the same definition.
Expose the uncertainty terms
Quantization · Residency · Port skew · Holdover
Each term must map to a measurable method and a threshold for acceptance.
Time-of-Day distribution (external ToD input as a slave, not a GM)
- PTP ToD input (slave): provides time alignment; the probe must expose lock state and offset alarms.
- PPS / 10 MHz (optional): supports tighter phase alignment when available; treat as external references.
- Time counter discipline (in-box): distribute ToD to timestamp units and port blocks; record the active input and status.
A timestamp without time-status metadata is incomplete evidence. Record: lock/holdover state, active ToD source, and skew alarms.
Error budget table (source → typical scale → how to verify)
| Error source | Typical scale | How to verify (field-usable) | Mitigation / control |
|---|---|---|---|
| Quantization (clock resolution) | Bounded by timestamp clock period. | Measure distribution of repeated events; confirm minimum step equals resolution. | Use higher-rate time counter; keep conversion paths deterministic. |
| Pipeline variability (CDC / internal stages) | Load-dependent variation. | Replay traffic patterns under controlled load; compare percentile spread. | Hardware path calibration; reduce variable stages before stamp point. |
| Queuing residency (if stamp occurs after queue) | Can dominate during bursts. | Trigger microbursts and correlate high-water with timestamp spread. | Prefer stamping before queue; record residency metrics if unavoidable. |
| Port-to-port skew | Static offset + drift. | Same-event injection across ports; track relative offsets over temperature/time. | Skew calibration, continuous monitoring, alarms on drift beyond threshold. |
| Holdover drift (ToD loss) | Grows over time without ToD. | Remove ToD input and record offset growth vs elapsed holdover time. | Holdover timer + thresholds; mark evidence with holdover state. |
Acceptance metrics (must be testable)
Accuracy · Jitter · Port skew · Holdover
Express targets using percentiles and maximum bounds; store results with run configuration.
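One way to make those targets testable is nearest-rank percentiles plus a hard maximum. This sketch is a simplification; the function names and bounds are placeholders, not recommended values:

```python
def percentile(samples, p):
    """Nearest-rank percentile — sufficient for acceptance reporting."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[k]

def timestamp_accuracy_pass(err_ns, p99_bound_ns, max_bound_ns):
    """Express targets as a percentile plus a maximum bound, per the text."""
    return (percentile(err_ns, 99) <= p99_bound_ns
            and max(err_ns) <= max_bound_ns)
```

Storing the error samples alongside the run configuration makes the pass/fail result reproducible rather than a one-off claim.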
Evidence-grade requirements
Timestamp records should include time status (lock/holdover) and source ID to preserve interpretability.
H2-6 · Smart capture: triggers, slicing, sampling, and pre/post event evidence with a ring buffer
Why “capture everything forever” fails (even when bandwidth looks fine)
Continuous full capture is limited by write IOPS, indexing overhead, thermal throttling, and long-term retrieval cost. Smart capture treats storage as an evidence system: record the right windows, label them, and keep them searchable.
Evidence-grade captures require metadata alignment: time status (lock/holdover), port, rule ID, trigger reason, and segment ID.
Trigger types (signal sources that start a capture action)
Flow / 5-tuple / port / threshold
Targets specific traffic slices. Best when the suspected offender is known; can miss the “lead-up” without pre-cache.
Telemetry anomalies
Drop spikes, buffer high-water, storage stalls. Best for unknown root causes; pairs naturally with reason codes.
Time windows / cyclic capture
Baseline sampling and periodic evidence. Must be bounded to avoid storage pressure and “unsearchable bulk”.
Multi-condition gating
Reduces false triggers: e.g., high-water AND drop spike within a short interval; emits a single event ID.
Capture actions (what the probe does when a trigger fires)
- Slice selection: bind the event to a rule ID / port set / tenant ID so the evidence window is attributable.
- Format control: choose raw vs truncation; optionally attach metadata sideband for indexing.
- Dual-path export: tool output for real-time analysis and NVMe capture for evidence retention.
- Event identity: emit an event ID that ties counters, reason codes, and capture segments together.
Ring buffer with pre/post-trigger windows (how to avoid “only the tail”)
Triggers have detection latency. A pre-capture ring buffer ensures the window includes the lead-up to the event. A post window captures propagation and recovery (retries, re-ordering, queue drain).
Pre-window (before event)
Sized to cover the typical lead-up: bursts, queue build-up, or prior packets that define context.
Post-window (after event)
Sized to capture stabilization: queue drain, export recovery, retransmission patterns, and reason code transitions.
Operational rule: pre/post windows must be set in bytes/time and validated using replay tests; otherwise evidence windows are not repeatable.
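The pre/post mechanism can be sketched with a bounded deque. Window sizes are in packets here for brevity, whereas the operational rule above requires bytes/time bounds; class and field names are illustrative:

```python
from collections import deque

class PrePostCapture:
    """Pre/post-trigger evidence window built on a ring buffer."""

    def __init__(self, pre_pkts, post_pkts):
        self._ring = deque(maxlen=pre_pkts)  # lead-up (pre-trigger) context
        self._post_pkts = post_pkts
        self._post_seen = None               # None until a trigger fires
        self.segment = None                  # finished evidence window

    def on_packet(self, pkt):
        if self._post_seen is None:
            self._ring.append(pkt)           # keep overwriting the ring
            return
        self.segment.append(pkt)             # post-window collection
        self._post_seen += 1
        if self._post_seen >= self._post_pkts:
            self._post_seen = None           # window complete

    def on_trigger(self):
        self.segment = list(self._ring)      # snapshot the lead-up
        self._post_seen = 0
```

Because the ring already holds the lead-up when the trigger fires, detection latency no longer decides whether context survives.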
Metadata and indexing (minimum fields for searchable evidence)
Captures become evidence only when each segment can be queried and aligned to telemetry.
| Trigger | Action | On-disk format | Index fields (minimum) |
|---|---|---|---|
| Flow / rule match | Slice to rule ID; optional raw payload window. | PCAP/PCAPNG segments + optional metadata stream. | time + time-status, port, rule ID, segment ID, truncate flag. |
| Drop spike | Increase capture fidelity; attach counter snapshot. | Event-tagged segments (pre/post) with snapshot record. | trigger reason, drop reason code, counters, high-water mark, event ID. |
| High-water | Capture hotspot branch only; annotate queue metrics. | Segments + queue telemetry frames. | queue ID/class, high-water, residency, port/rule, event ID. |
| Storage stall | Mark stall intervals; preserve ordering evidence. | Segments + stall markers + system status. | stall begin/end, thermal state, write backlog, reason code, event ID. |
| Periodic window | Baseline slice for trend comparison. | Rolling segments with retention policy. | time-range, slice ID, compression flag, summary counters. |
Acceptance tests (smart capture is only useful when it is reproducible)
- Microburst replay: verify pre-window contains lead-up and post-window contains drain; correlate with high-water evidence.
- Sink slowdown: force tool/export or NVMe slowdown; ensure reason codes and segment tags remain aligned.
- Hotspot rule: create a single rule ID hotspot; verify event ID ties per-rule counters to captured segments.
H2-7 · Storage pipeline: from captured packets to NVMe/RAID evidence without becoming the bottleneck
What “capture-to-storage” must guarantee (evidence-grade outcomes)
A probe storage pipeline must deliver predictable sustained write, searchable segments, and crash/blackout consistency between packet data and its index. Peak benchmarks are not sufficient: the real objective is preventing storage stalls from turning into capture gaps.
Scope boundary: this section focuses on engineering requirements (sustained write, segmentation, commit consistency, PLP behavior). Software ecosystems and deep NVMe protocol details are intentionally out-of-scope.
Write-path layers (why the “commit point” matters)
Split the path into observable stages so bottlenecks can be located and reason-coded (e.g., storage stall, thermal throttle).
DMA → staging buffer
Absorbs short bursts and NVMe latency tails. Track high-water and backlog growth.
Chunk/segment build
Defines search granularity. Too small inflates metadata; too large hurts replay and targeted retrieval.
NVMe write + index commit
Data is not evidence until index/metadata commits atomically with the segment.
Define a single commit point: after commit, the segment must be discoverable by index and replayable without gaps. Before commit, it is transient.
Sustained vs peak write (why “high benchmark” still drops in the field)
- Cache exhaustion: burst-friendly cache can hide a sustained-write cliff; the staging buffer then fills and stalls.
- Write amplification: segmentation + indexing + redundancy can reduce effective throughput far below device peak.
- Latency tail growth: background work and garbage collection inflate the tail, not the average, breaking lossless capture.
- Thermal throttling: sustained workloads can trigger periodic slowdowns that appear as recurring capture gaps.
Practical rule: design for the worst-case write latency tail, not just average bandwidth, because staging overflow is driven by tails.
RAID and redundancy (trade write amplification for evidence reliability)
Redundancy improves evidence survivability but can increase write amplification and degrade performance during degraded mode or rebuild. For an observability node, the key requirement is not “maximum speed” but predictable minimum capture capability during failure and recovery.
Normal mode
Full segmentation + indexing; rich metadata; optional higher fidelity capture profiles.
Degraded / rebuild mode
Prioritize trigger-based windows; keep minimum index fields; reduce expensive post-processing.
PLP and consistency (prevent index/data tearing on power loss)
Power-loss protection (PLP) and journaled commits must ensure that a sudden blackout does not produce “search hits” that point to missing or partial data. The pipeline should provide a fast recovery check that validates segment boundaries, timestamps, and index references before marking evidence as usable.
| Factor | Why it causes risk | Field symptom | Control / mitigation |
|---|---|---|---|
| Throughput | Insufficient sustained bandwidth creates backlog that can overrun staging. | Capture gaps when workload becomes steady. | Provision sustained headroom; monitor backlog and throttle capture profiles. |
| IOPS / latency tail | Tail latency bursts fill staging even when average looks fine. | Periodic stalls; reason code “storage stall”. | Tail-aware sizing; segment batching; NVMe queue tuning and thermal margin. |
| Write amplification | Indexing + redundancy + metadata multiply write work. | Drop in effective capture capacity vs expectation. | Right-size segment granularity; minimal index fields for baseline mode. |
| Commit consistency | Index and data can diverge on crash/power loss. | Search finds segments that cannot replay. | Atomic commit; journal; recovery validation before exposing segments. |
Storage acceptance steps (short, repeatable)
- Write stress: sustained capture + burst replay; verify staging high-water stays bounded and stalls are reason-coded.
- Consistency test: power-loss/crash injection; after reboot, index must not reference missing data; segments must replay.
- Replay alignment: query by event ID/segment ID; verify pre/post windows and timestamps align with telemetry.
H2-8 · Management MCU & OOB: configuration, telemetry, upgrade rollback, and field maintainability
Control-plane boundary (MCU does not compete with the data plane)
The management MCU is responsible for control-plane reliability: rule deployment, health monitoring, telemetry/log uploads, and safe firmware operations. Data-plane throughput and losslessness must remain deterministic even during upgrades and maintenance workflows.
Scope boundary: this section focuses on maintainability mechanics (A/B images, rollback, versioned config, self-tests). Security product features and deep root-of-trust details are intentionally out-of-scope.
OOB management: dedicated vs shared interface (reliability trade-offs)
Dedicated OOB port
Best for remote recovery and “last-resort” access when the data network is congested or misconfigured.
Shared port (in-band)
Saves ports, but can become unreachable under congestion or failure modes; requires explicit recovery design.
Operational requirement: document the expected failure mode and recovery path for the chosen OOB approach, including “OOB unreachable”.
Upgrade strategy: A/B images, failure rollback, and versioned configuration
- A/B firmware images: keep a known-good image available for automatic rollback.
- Health-gated commit: only commit after storage, timestamp, and port self-tests pass.
- Config versioning: deploy rules as versioned artifacts; record which config version matches which firmware.
- Rollback triggers: boot failures, critical self-test failures, persistent time-status faults, or storage commit errors.
“Rollback” should be a deterministic state transition, not a manual procedure. Emit state events into logs for auditability.
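Treating rollback as a deterministic state transition can look like a small transition table; the states, events, and names here are hypothetical:

```python
# Allowed transitions for a health-gated A/B upgrade; anything else is a bug.
TRANSITIONS = {
    ("running_a", "upgrade"):          "boot_b_trial",
    ("boot_b_trial", "selftest_ok"):   "committed_b",
    ("boot_b_trial", "selftest_fail"): "rollback_a",
    ("boot_b_trial", "watchdog"):      "rollback_a",
    ("rollback_a", "booted"):          "running_a",
}

def step(state, event, log):
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        raise ValueError(f"illegal transition: {state} + {event}")
    log.append((state, event, nxt))  # emit state events for auditability
    return nxt
```

An explicit table makes "rollback" auditable: the log records exactly which health gate fired, not a manual procedure's side effects.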
Field self-tests (what must be verified before evidence is trusted)
| Self-test | What it validates | Failure symptom | Operator action |
|---|---|---|---|
| Boot integrity | Firmware image selection, version match, basic service readiness. | Boot loop or unstable startup. | Rollback to prior image; retain logs and state timeline. |
| Storage quick check | Write/read, segment boundary, index commit sanity. | Search hits without replay, or commit errors. | Enter degraded capture profile; schedule deeper storage test. |
| Timestamp path check | Time status reporting, skew alarms, counter monotonicity. | Holdover stuck, skew beyond threshold. | Verify ToD input; gate evidence with time-status tagging. |
| Port/link check | Link state, mirror path readiness, counter increments. | No traffic growth or asymmetric counters. | Validate cabling/optics; ensure rules map to correct ports. |
Operator playbook (day-0, change, rollback, troubleshooting)
- Day-0 commissioning: run self-tests → verify OOB reachability → enable baseline telemetry and reason codes.
- Rule/config change: publish config version → validate counters and capture windows → record activation timestamp.
- Firmware upgrade: download → verify → switch → health check → commit; otherwise auto-rollback and preserve logs.
- Troubleshooting: query by event ID → check time-status and storage stall markers → correlate with counters.
H2-9 · Telemetry & evidence: KPI system that proves “no drops, accurate time, no write stalls”
Why telemetry is part of the evidence chain
“Lossless” and “accurate timestamps” are not marketing statements—they are properties that must be measurable, cross-checkable, and auditable. A credible observability node exposes a small set of KPIs that align across the full path: Ingress → Replication/Buffer → Persisted evidence.
Scope boundary: protocol deep-dives (SNMP MIB/YANG models) are out-of-scope. This chapter defines what must be measured and how it ties to proof.
Three KPI pillars (minimum set to make proof hard)
Lossless proof chain
Ingress/egress/persisted counters must reconcile with drop reason codes and buffer high-water events.
Time health
Offset, drift trend, port-to-port skew, and holdover state determine whether timestamps are evidence-grade.
Storage health
Write-latency distribution and queue depth expose stalls. PLP events and media health protect consistency.
Lossless proof chain (counter alignment + reason codes)
The objective is not “drop = 0” in isolation. The objective is reconciliation across stages with explicit exceptions.
Stage alignment
Compare Ingress vs Replication/Queue vs Persisted. Differences must be explained by reason codes or policy exceptions (e.g., truncation/sampling).
Minimum reason codes
Expose a small but sufficient set: buffer overflow, rule overload, storage stall, thermal throttle, link flap.
Best practice: log high-water events with duration. Microbursts often appear as “average utilization is low” but high-water spikes explain drops.
Time health (evidence usability states)
Timestamp quality should be reported as a clear operational state rather than a single “locked” flag. This enables downstream systems to decide when evidence is admissible for multi-point correlation.
- Offset: instantaneous deviation vs ToD input (external reference).
- Drift trend: slope that predicts degraded accuracy over time.
- Port skew: inter-port mismatch that breaks event ordering across taps.
- Holdover state: duration and severity of time-source absence.
Suggest reporting a timestamp usability state: Locked / Degraded / Holdover / Invalid, and writing that state into capture metadata for every segment.
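A minimal sketch of such a classifier, with placeholder thresholds that should come from the accepted error budget:

```python
def time_state(locked, offset_ns, skew_ns, holdover_s,
               offset_max_ns=1000, skew_max_ns=500, holdover_budget_s=3600):
    """Map time-health telemetry to an evidence usability state.

    Thresholds are illustrative placeholders, not recommendations.
    """
    if not locked:
        # Holdover remains admissible only within the declared budget.
        return "Holdover" if holdover_s <= holdover_budget_s else "Invalid"
    if offset_ns > offset_max_ns or skew_ns > skew_max_ns:
        return "Degraded"
    return "Locked"
```

Writing the returned state into every capture segment's metadata lets downstream tools gate multi-point correlation automatically.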
Storage health (latency tails, stall evidence, and PLP events)
- Write-latency distribution: focus on tail (e.g., P95/P99/P99.9), not only average bandwidth.
- Queue depth & backlog: NVMe queue depth plus staging backlog reveal stall onset.
- PLP events: record the event, recovery validation result, and any segment/index consistency repairs.
- Media health: track lifetime/health indicators to predict stall risk and evidence loss.
A storage stall should always correlate: latency tail ↑ + backlog/high-water ↑ + reason code = storage stall. If correlation is missing, telemetry is incomplete for proof.
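That three-way correlation rule can be enforced mechanically; the bounds and the reason-code string below are illustrative:

```python
def storage_stall_confirmed(p99_latency_us, backlog_frac, reason_codes,
                            latency_bound_us=500, backlog_bound=0.8):
    """A stall claim is proof-grade only when all three signals agree.

    Partial agreement means the telemetry is incomplete for proof,
    exactly as the correlation rule above states.
    """
    tail_hot = p99_latency_us > latency_bound_us
    backlog_hot = backlog_frac > backlog_bound
    reasoned = "storage_stall" in reason_codes
    if tail_hot and backlog_hot and reasoned:
        return "confirmed"
    if tail_hot or backlog_hot or reasoned:
        return "telemetry_incomplete"  # correlation missing → not proof
    return "no_stall"
```

Treating "one signal fired" as incomplete rather than confirmed is what keeps stall evidence honest.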
Event log timeline (turn failures into auditable evidence)
A minimal event timeline should cover time-source transitions, buffer overflow conditions, storage stalls, thermal throttling, and reboot causes. Each event must include timestamp, duration (if applicable), and a snapshot of the relevant counters.
Time events
lock↔holdover transitions, skew alarms, ToD source changes, degraded accuracy gates.
Data/storage events
high-water spikes, overflow, storage stall, commit errors, PLP recovery checks, thermal throttle.
Alerts → likely causes → first troubleshooting path (field-first)
| Alert | Likely causes | First checks | Next action |
|---|---|---|---|
| Persisted < Ingress | Microburst overflow, rule hotspot, storage stall, thermal throttle | Check high-water spikes + drop reasons + staging backlog + temperature flags | Switch to degraded capture profile and re-run burst test |
| High-water frequent | Burst profile too aggressive, fan-out amplification, NVMe latency tail | Observe backlog slope; correlate with write-latency tail and queue depth | Increase staging headroom or reduce capture features |
| Time Degraded/Holdover | ToD input loss, unstable reference, holdover budget exceeded | Check time state transitions + skew alarms + offset/drift trend | Tag evidence as degraded; restore reference before forensic claims |
| Storage Stall | Thermal throttling, cache cliff, write amplification, media wear | Check latency tail + queue depth + PLP events + media health indicators | Lower write amplification; validate commit consistency under load |
| Thermal Throttle | Insufficient cooling, fanless enclosure limits, sustained write load | Correlate temperature with drop/stall windows and reason codes | Raise thermal margin or enforce capture rate limits |
| Unexpected reboot | Watchdog, brownout, firmware fault, storage error escalation | Reboot reason + pre-reboot event log + last-known time state | Confirm rollback/commit behavior and replay segment integrity |
Export surfaces (names only; keep proof consistent)
Telemetry must share a consistent time base with evidence segments and event logs.
SNMP · gNMI · REST
H2-10 · Validation checklist: proving true line-rate, true lossless capture, and evidence-grade replay
What “done” means (vendor-signable acceptance)
Validation must cover Gbps and Mpps, microburst behavior, three-way counter reconciliation, timestamp accuracy and port skew, sustained storage + indexing under load, and thermal/power stress. A pass result should be supported by exported counters and an event timeline.
Scope boundary: this checklist defines test intent, required records, and pass/fail signatures. It does not prescribe specific generators or protocol stacks.
Test groups (cover worst cases, not only averages)
- Line-rate + small packets
- Microbursts
- Counter reconciliation
- Timestamp validation
- Storage stress
- Thermal & power
Always record (minimum deliverables)
- Raw counters with timestamps: ingress, replication/buffer, persisted evidence (segment/index commits).
- Event timeline: time state changes, high-water spikes, stalls, thermal throttles, reboot causes.
- Replay sample: a segment set that is searchable, replayable, and aligned with the timeline.
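The three-way reconciliation named in the checklist can be sketched as a single check: the ingress-to-persisted gap must equal the sum of coded drops, and replication must never produce fewer copies than were persisted. Counter and reason-code names below are illustrative assumptions.

```python
# Hedged sketch of three-way counter reconciliation:
# ingress vs replicated vs persisted, any gap explained by reason codes.

def reconcile(ingress, replicated, persisted, reasons):
    """Return (ok, unexplained): gap must equal the sum of coded drops."""
    explained = sum(reasons.values())      # e.g. {"overflow": 3, ...}
    gap = ingress - persisted
    expansion_ok = replicated >= persisted  # fan-out can only add copies
    return (gap == explained and expansion_ok, gap - explained)
```

A nonzero `unexplained` value is the fail signature from the table below: persisted < ingress with missing reason evidence.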
Acceptance checklist (Pass/Fail format)
| Test item | Setup | Metrics to record | Pass criteria | Fail signature |
|---|---|---|---|---|
| Line-rate + 64B | Target port rate with 64B / mixed sizes; worst-case Mpps focus | Ingress/persisted counters; high-water; reason codes | No unexplained gap; counters reconcile | Gbps OK but Mpps drops; high-water saturates |
| Microburst profile | Synthetic bursts with varied fan-out and rule hit rates | High-water duration; backlog slope; drop reasons | Bounded high-water; no overflow/stall beyond spec | Average low but burst triggers overflow without correlation |
| 3-way reconciliation | Repeat under multiple rulesets; include truncation/sampling modes | Ingress vs replication vs persisted; exceptions audit log | Differences explained by policy or reason codes | Persisted < ingress with missing reason evidence |
| Timestamp accuracy | Align to external reference; sweep load and port combinations | Offset; drift trend; port skew; time state timeline | Within target limits; no unstable state toggling | Skew drift or holdover events without metadata tagging |
| Storage stress | Sustained write + indexing + concurrent search/replay | Latency tail; queue depth; backlog; commit errors | No stall causing evidence gaps; replay aligns | Periodic stall; index/data tearing after crash test |
| Thermal & power | Warm-up to steady-state; inject load steps and supply perturbations | Thermal throttle events; drop/stall correlation; reboot reasons | No lossless break under throttling; clear evidence marking | Throttling introduces drops/stalls without logging clarity |
H2-11 · BOM / IC selection checklist (with concrete part numbers)
This checklist converts “lossless + precise time + sustained capture” into procurement-ready criteria. It prioritizes measurable proof points (counters, reason codes, QoS under burst, timestamp accuracy, sustained write QoS, and rollback-safe management), then maps them to representative BOM options.
A) Mirror / Replication ASIC (line-rate copy + filter)
Key criteria: replication fan-out headroom · rule scale & hit hot-spots · drop counters + reason codes · cut-through latency
- Capacity reality check: the data plane must survive worst-case fan-out amplification, not average traffic. Any “lossless” claim must be tied to a burst profile + buffer depth + egress shaping rules.
- Line-rate filtering: choose match resources (ACL / TCAM / exact match) to avoid “rule hit hot-spots” that collapse a single queue under bursty flows.
- Accountability: require per-port / per-rule counters and drop reason codes (buffer overflow, rule overload, egress blocked, etc.) so field evidence can be audited.
| Vendor | Part number | Best-fit scenarios | What to verify (must-pass) |
|---|---|---|---|
| Broadcom | BCM56880 (Trident 4 family) | Complex packet broker feature mix (mirroring + filtering + shaping) with predictable latency. | Rule scale at line rate; per-rule/per-port counters; loss behavior under microburst + fan-out; truncation/metadata modes do not create hidden drops. |
| Broadcom | BCM56990 (Tomahawk 4 family) | High-throughput, high-port-count designs where bandwidth headroom is the primary constraint. | Worst-case 64B Mpps + replication stress; queue occupancy observability; deterministic behavior when a single rule becomes a hot spot. |
| Marvell | 98DX7308 / 98DX7312 (Prestera DX 73xx) | Cost/power sensitive observability nodes; moderate port-count with practical rule/counter requirements. | Ingress/egress counters alignment; queue drops labeled by reason; sustained performance under mixed packet sizes and bursty flows. |
Procurement wording tip: specify “lossless” as a testable guarantee bound by (1) port speed(s), (2) burst profile, (3) replication factor ceiling, (4) enabled features (truncate/dedupe/reassembly), and (5) required evidence counters & reason codes.
B) Precision timestamping (where to timestamp + ToD input)
Key criteria: timestamp insertion point · port-to-port skew · ToD input (PTP slave / PPS) · calibration hooks
- Decide the timestamp boundary: PHY/MAC/ingress-queue timestamps produce different error terms. Require a written error budget (quantization + queue residence variation + port skew).
- ToD distribution: the capture plane needs a stable Time-of-Day input (PTP slave, PPS, optional 10MHz). Avoid expanding into grandmaster/GNSS disciplines on this page.
- Acceptance metrics: timestamp accuracy, jitter, port-to-port skew, and holdover status reporting (without drilling into oscillator-level holdover design).
| Function | Vendor | Part number | Selection notes (what it enables) |
|---|---|---|---|
| Hardware timestamp NIC/controller | Intel | Ethernet Controller E810 (e.g., E810-CAM2) | NIC-based HW timestamping path for capture/probe designs; validate timestamp API support, queueing behavior, and port skew consistency under load. |
| SyncE / 1588 clocking & ToD distribution | Renesas | 8A34001 (ClockMatrix family) | Distribute PPS/ToD-derived timebase into the timestamp domain; require lock/holdover state telemetry and alarm outputs for evidence logs. |
| SyncE / 1588 clocking & ToD distribution | Renesas | RC38612 (ClockMatrix family) | Alternative ClockMatrix option for ToD/SyncE distribution; verify output phase alignment, redundancy hooks, and field-readable status. |
Acceptance test must include: external reference comparison, port-to-port skew sweep, temperature drift observation, and “under-burst” behavior (timestamp quality must not degrade when buffers are stressed).
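The three acceptance metrics (offset, drift trend, port-to-port skew) can be computed from reference-comparison samples with a sketch like the one below. The data layout and the simple first-to-last drift slope are assumptions for illustration; a real acceptance harness would fit over many samples.

```python
# Illustrative sketch: per-port offsets vs an external time reference,
# reduced to the acceptance metrics named above.

def time_metrics(samples):
    """samples: {port: [(t_s, offset_ns), ...]} against the reference."""
    last = {p: s[-1][1] for p, s in samples.items()}
    # Drift trend: crude first-to-last slope per port (ns per second).
    drift = {p: (s[-1][1] - s[0][1]) / max(s[-1][0] - s[0][0], 1e-9)
             for p, s in samples.items()}
    skew = max(last.values()) - min(last.values())  # port-to-port spread
    return last, drift, skew
```

Sweeping this over load steps and temperature ramps gives the “under-burst” evidence the acceptance test calls for.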
C) Buffering (SRAM/DRAM) for burst survival + accountability
Key criteria: SRAM for low-latency queues · DRAM for deep burst absorption · high-watermark visibility · queue residence stats
- SRAM role: absorb microbursts with low latency and predictable behavior; use it for queues and per-port/per-rule bookkeeping.
- DRAM role: provide depth when burst windows exceed on-chip/SRAM. DRAM must be paired with measurable queue high-watermarks and residency statistics (evidence, not guesses).
| Type | Part number | Capacity class | Why it’s used in probes |
|---|---|---|---|
| QDR II+ SRAM | CY7C1565KV18-450BZI | 72 Mbit | Low-latency queueing/stat counters under burst; good fit for deterministic “buffer then forward” stages. |
| QDR-IV SRAM | CY7C4122KV13 (QDR-IV XP family) | 144 Mbit | Higher-speed SRAM class for heavy queueing and metadata buffering when transaction rate is the limiter. |
| DDR4 SDRAM | Samsung K4A8G165WB (DDR4 8Gb class) | Deep buffer | External deep buffering and staging; must be paired with telemetry (occupancy, stall time) to keep “lossless” provable. |
Must-have observability hooks: per-queue high-watermark, drop counters with reason codes, and (if available) queue residence time statistics.
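A per-queue high-watermark hook could be sketched as below: it records peak occupancy and the start/duration of every spike above a threshold, which are exactly the signals the burst tests need. Class and method names are assumptions.

```python
# Hypothetical per-queue high-watermark tracker: peak occupancy plus
# (start, duration) of every spike above a threshold.

class HighWater:
    def __init__(self, threshold_pct=90.0):
        self.threshold = threshold_pct
        self.peak = 0.0
        self.above_since = None
        self.events = []             # list of (start_t, duration) spikes

    def sample(self, t, occupancy_pct):
        self.peak = max(self.peak, occupancy_pct)
        if occupancy_pct >= self.threshold and self.above_since is None:
            self.above_since = t     # spike begins
        elif occupancy_pct < self.threshold and self.above_since is not None:
            self.events.append((self.above_since, t - self.above_since))
            self.above_since = None  # spike ends; record its duration
```

Spike durations, not averages, are what correlate with drop reason codes during microburst validation.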
D) NVMe storage (sustained capture + power-loss safety)
Key criteria: sustained write QoS · PLP (power-loss protection) · thermal throttling behavior · index/data consistency
- Sustained write beats peak: require steady-state write throughput under concurrent metadata/index commits (not synthetic burst benchmarks).
- PLP is non-negotiable for evidence: capture evidence must survive sudden power loss without “index/data tear”.
- Thermal policy matters: throttling should raise an explicit event and provide clear storage-stall counters, not silent packet drops upstream.
| Vendor | Model / part family | Selection notes (probe-specific) |
|---|---|---|
| Samsung | PM9A3 (enterprise NVMe) | Use when stable sustained write and PLP-backed data safety are required; validate QoS under long runs + mixed IO sizes. |
| Micron | 7450 (enterprise NVMe) | Focus on sustained write consistency and telemetry (latency distribution, queue depth) during capture + indexing. |
| Kioxia | CD6 (data center NVMe) | Prefer where evidence retention and predictable performance are needed; require explicit PLP/flush behavior verification. |
| Solidigm | D7-P5520 (data center NVMe) | Verify sustained write under capture workloads and confirm how throttling is surfaced to logs/telemetry. |
Capture acceptance test: (1) sustained write with index commits, (2) concurrent replay/readback spot checks, (3) forced power-cut drills to confirm no corruption, (4) temperature ramp to confirm throttling yields explicit “storage stall” evidence rather than upstream packet loss.
E) Management MCU / BMC (OOB, rollback, and field maintainability)
Key criteria: A/B firmware + rollback · watchdog + reset cause · telemetry throughput · OOB reliability
- Hard boundary: management silicon must not steal data-plane determinism. It owns the control plane (rules, logs, upgrades) and exposes health evidence.
- Rollback safety: A/B images, verified boot chain, and “configuration versioning” are required to avoid bricking field probes.
- Evidence logs: reset causes, thermal events, storage stall events, time lock/holdover transitions, and buffer overflow incidents must be persistently logged.
| Role | Vendor | Part number | When to pick |
|---|---|---|---|
| Full-feature BMC | ASPEED | AST2600 | Dedicated OOB management, rich sensor/telemetry integration, and server-style remote management workflows. |
| Legacy/low-cost BMC | ASPEED | AST2500 | When remote management features are needed but performance targets are modest; verify interface needs and lifecycle. |
| High-end MCU | NXP / ST | i.MX RT1170 / STM32H743 | Lean management plane designs where a full BMC stack is unnecessary; still require robust rollback + persistent logs. |
F) PCIe fanout switch (common when NVMe/backplane scales)
Key criteria: lane/port scaling · NTB / partitions · sideband manageability
- Use-case: scale NVMe endpoints and isolate domains while keeping capture write paths non-blocking.
- Acceptance: non-blocking under sustained write, clean error containment, and readable error counters for evidence logs.
| Vendor | Family | Part numbers (examples) | Notes |
|---|---|---|---|
| Microchip | Switchtec PFX Gen4 | PM40100A-FEIP, PM40084A-FEIP, PM40052A-F3EIP… | Choose by lanes/ports; require per-port counters and robust containment for field diagnostics. |
Selection scorecard template (copy/paste for vendor comparison)
Fill weight as 1–5 and record pass/fail evidence references (counter screenshots, test logs, thermal traces, power-cut reports).
| Block | Metric / criterion | Weight | Candidate part # | Verification method / evidence |
|---|---|---|---|---|
| Mirror ASIC | Lossless under burst + fan-out ceiling | __ | __ | Traffic gen profile + in/out/drop counters aligned; reason codes captured |
| Timestamp | Accuracy/jitter + port skew under load | __ | __ | External reference compare; skew sweep; temperature drift log |
| Buffer | Queue high-watermark visibility + no silent drops | __ | __ | Occupancy traces; reason codes; residency stats if available |
| NVMe | Sustained write QoS + PLP consistency | __ | __ | Long-run write with index commits; forced power-cut drill; readback |
| MCU/BMC | A/B rollback + reset-cause logging | __ | __ | Upgrade failure injection; rollback proof; persistent event log export |
H2-12 · FAQs (Edge Observability / TAP / Probe)
These answers are symptom-first and evidence-driven: each one points to the fastest counters/logs to check, then maps the likely root cause back to the relevant sections (H2-1…H2-10).
Topics: lossless proof · microburst & buffers · HW timestamps · NVMe sustained write · PLP & consistency · OOB reliability
Why can packet loss happen even when average traffic is low? Where is it dropped?
Average utilization hides microbursts, queue hot spots, and fan-out amplification. Loss may occur in the mirror/queue stage (buffer overflow), during replication (rule hot-spot), or later when storage stalls push backlog upstream. “Lossless” must be proven by aligning ingress → replicated → persisted counters and checking per-queue high-watermarks plus drop reason codes.
TAP vs SPAN: when is SPAN acceptable, and when will it inevitably drop?
SPAN is best-effort: mirrored packets compete with normal switching resources and are often deprioritized under congestion. It can be acceptable for low burstiness, low replication, and non-forensic troubleshooting where gaps are tolerable. It will inevitably drop under small-packet Mpps stress, fan-out, or when “complete evidence” is required—because the switch rarely provides provable loss accounting for the mirror path.
Fan-out causes drops immediately—reduce rules first or add buffer first?
Start by separating “total egress saturation” from “single-queue hot spot.” If replication makes required egress bandwidth exceed physical output, buffer only delays the inevitable—reduce fan-out, split outputs, or lower capture fidelity (truncate/sampling). If drops concentrate on one rule/class, reduce rule hit rate or isolate queues before adding memory. Use reason codes + per-rule counters to decide which failure mode dominates.
64B small packets fail in Mpps, but Gbps looks fine—what is the bottleneck?
Gbps hides per-packet work. Small packets stress parser/match, replication bookkeeping, metadata insertion, queue scheduling, and DMA/PCIe transfer rates long before bandwidth is “full.” Another common limiter is storage tail latency that turns into backpressure and queue growth. Validation must include 64B line-rate Mpps with rule hit hot-spots and fan-out enabled, not only large-packet throughput.
Timestamp at PHY or queue egress—where do the errors show up?
PHY/MAC stamping minimizes queue-related variability but may have its own quantization and interface latency. Queue-egress stamping inherits variable residence time, so the same packet can receive different timestamps depending on congestion. Error budget typically includes quantization, asymmetric pipeline delay, port-to-port skew, and temperature drift. The “right” point is the one whose error terms can be measured and reported as health telemetry.
PTP slave is locked—why does cross-port alignment still drift?
PTP lock ensures a shared timebase, not identical per-port latency. Drift usually comes from port-to-port skew changes caused by unequal pipelines, per-port queue pressure, different timestamp insertion points, or missing per-port calibration. Temperature and traffic mix can also change internal delays. Require telemetry that exposes lock state, offset/drift, and port skew, then correlate skew excursions with queue high-watermarks and drop/stall events.
NVMe specs look fast—why does capture still see “write stall”?
Capture workloads demand sustained writes with tight tail latency while metadata/index commits happen continuously. Peak benchmarks do not represent garbage collection, write amplification, thermal throttling, or queue-depth saturation under long runs. A stall becomes dangerous when staging buffers fill and upstream queues overflow. Monitor P99/P99.9 write latency, device temperature/throttle events, and the staging backlog to separate storage QoS from dataplane issues.
After power loss, files open but indexes are corrupted—how to make PLP/consistency reliable?
Power-loss protection prevents incomplete writes, but it does not automatically guarantee that data segments and indexes commit atomically. Reliable designs define a clear commit point (segment + metadata) and use journaling/transaction semantics so recovery can replay or roll back cleanly. Validation should include forced power-cut drills during peak write plus index commits, followed by integrity checks and replay alignment sampling.
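The journaling/commit-point idea can be sketched as intent/commit records replayed at recovery: segments with a commit record are durable, segments with only an intent are torn and rolled back. All record names are illustrative, not a real capture format.

```python
# Hedged sketch of commit-point recovery after power loss: replay
# complete commits, roll back torn ones. Names are assumptions.

def recover(journal):
    """journal: ordered list of ("intent", seg_id) / ("commit", seg_id)."""
    intents, commits = set(), set()
    for op, seg in journal:
        (intents if op == "intent" else commits).add(seg)
    durable = commits              # segment + index both reached the disk
    rollback = intents - commits   # torn: discard segment and index entry
    return durable, rollback
```

The power-cut drill then becomes an assertion: every segment exposed to replay is in `durable`, and nothing in `rollback` is visible to search.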
How to set pre/post-trigger so captures don’t “only catch the tail”?
The pre-trigger window must cover detection latency: counters need time to cross thresholds, and the trigger path adds delay. Use a ring buffer sized by “worst expected detection + action latency,” not by a guess in seconds. Start with a conservative pre-window, verify that the causal packets appear (not only the aftermath), then tune post-window based on how long the failure signatures persist. Always stamp triggers with rule IDs and reason codes for fast indexing.
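The sizing rule above (worst-case detection plus action latency, not a guessed number of seconds) reduces to a short calculation. The safety factor and inputs below are illustrative assumptions.

```python
# Illustrative sketch: size the pre-trigger ring from worst-case
# detection + action latency at the port rate.

def pre_trigger_bytes(rate_gbps, detect_ms, action_ms, safety=2.0):
    """Ring capacity needed so causal packets precede the trigger."""
    window_s = (detect_ms + action_ms) / 1000.0 * safety
    return int(rate_gbps * 1e9 / 8 * window_s)
```

For example, a 10 Gbit/s port with 40 ms worst-case detection and 10 ms trigger-path delay, doubled for safety, needs on the order of 125 MB of pre-trigger ring.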
How to prove “truly lossless”—which counters must align one-by-one?
“Lossless” is an evidence chain: (1) ingress received packets/bytes, (2) post-replication produced packets/bytes per rule/output, and (3) persisted packets/bytes (or egress output) per segment. Any mismatch must be explained by explicit reason codes (overflow, rule overload, storage stall, throttle). Require high-watermarks and timestamps for each anomaly so an auditor can replay the timeline and verify that no silent loss occurred.
When temperature rises, intermittent write/drop appears—how to tell throttle vs IO bottleneck?
Build a time-aligned timeline: temperature sensors → throttle flags (ASIC/NVMe/system) → write latency distribution → staging backlog → queue high-watermarks → drop reason codes. Throttle typically creates step-like performance degradation paired with explicit thermal events; pure IO bottlenecks often show rising tail latency without a throttle flag. The key is correlation: if drops follow storage-stall events, focus on NVMe QoS/thermal. If drops occur before stalls, focus on buffer/replication hot spots.
Shared vs dedicated management port: how to choose, and why “ping works but config times out”?
ICMP reachability is not management-plane health. Timeouts often come from shared-port contention (NCSI/shared NIC), control-plane CPU overload, TLS/session exhaustion, MTU/ACL mismatches, or a stalled upgrade/rollback state. A dedicated OOB port improves isolation and predictability; a shared port reduces BOM but increases failure coupling with dataplane congestion. Always monitor management CPU, connection counts, and persistent event logs to diagnose “reachable but unusable.”