Near-RT RIC Controller Hardware Design & Root-of-Trust
A Near-RT RIC controller is built to execute near-real-time control decisions with predictable P99 latency and low jitter, using hardware acceleration and a trusted boot/update chain to keep closed-loop actions stable in the field. Its engineering value is proven by measurable latency budgets, non-intrusive observability, and replayable evidence—rather than by generic MEC compute capacity.
H2-1 · Definition & Boundary (What Near-RT RIC owns)
The engineering purpose of a Near-RT RIC is not “general edge compute.” It is the near-real-time control execution point: ingest control events, apply xApp policies, optionally run bounded-latency inference, and emit control actions—while preserving tail-latency stability and producing correlation-grade evidence.
Definition in engineering terms
- Near-real-time is defined by closed-loop actionability on the order of ~10–500 ms (use-case dependent), where tail behavior matters more than averages.
- The core deliverable is a bounded-latency policy chain plus time-correlated evidence that proves the chain stayed within budget.
- Primary success metrics: P99 latency, jitter (variance of stage delays), and robustness to burst/degenerate inputs.
Boundary vs DU/CU
- The RIC does not implement internal DU scheduling algorithms here. It owns the policy execution path and the interface/traffic contract.
- The correct “boundary description” is a measurable contract: event rates, burst envelopes, loss/ordering sensitivity, and action emission deadlines.
- When instability is observed, the RIC must be able to show whether the violation happened in ingest, policy, infer, or emit—without blaming the DU by default.
Boundary vs Non-RT RIC / SMO
- Non-RT RIC / SMO is treated as an upstream supplier of policies/models and governance, not the focus of this page.
- Only the on-device implications are in-scope: signed update intake, admission checks, rollback safety, and attestation gating.
Checkable criterion: stage-budget decomposition
Near-real-time must be expressed as a measurable sum of stages, each with a timestamp definition and correlation ID:
- T_ingest: NIC receive → queueing → parse start (ingress overhead + queue depth).
- T_policy: normalized event ready → policy decision ready (compute + synchronization effects).
- T_infer: inference start → inference output (must be bounded and optionally bypassable).
- T_emit: action build → NIC transmit (queueing + IRQ scheduling sensitivity).
- T_network: external propagation/peer processing (observed, not "explained away").
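The stage decomposition above can be sketched as a small helper that turns boundary timestamps into per-stage deltas. This is a minimal illustration, not the product's schema; the field names (`t_rx`, `t_parse`, etc.) and the `StageTimestamps` type are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class StageTimestamps:
    """Timestamps (ms) captured at the stage boundaries of one event."""
    correlation_id: str
    t_rx: float         # NIC receive
    t_parse: float      # parse start (end of ingest)
    t_decision: float   # policy decision ready
    t_infer_out: float  # inference output (== t_decision if bypassed)
    t_tx: float         # NIC transmit

def stage_budget(ts: StageTimestamps) -> dict:
    """Decompose one event into the per-stage deltas; every record keeps
    the correlation ID so tail outliers can be traced back to raw evidence."""
    return {
        "correlation_id": ts.correlation_id,
        "T_ingest": ts.t_parse - ts.t_rx,
        "T_policy": ts.t_decision - ts.t_parse,
        "T_infer": ts.t_infer_out - ts.t_decision,
        "T_emit": ts.t_tx - ts.t_infer_out,
        "T_total": ts.t_tx - ts.t_rx,
    }
```

Because each delta is computed from two named boundary timestamps, a violated budget points directly at one stage instead of a single opaque end-to-end number.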
H2-2 · Workload & Latency Budget (make “near-real-time” measurable)
Near-real-time performance must be framed as a burst-aware latency distribution, not as average throughput. The practical objective is simple: keep P99 stage latency bounded under event storms, and ensure every stage is observable with correlation-grade timestamps.
Workload model (control pipeline, not DU internals)
- Ingress: receive control events (often bursty) and normalize into a stable internal schema.
- Decision: xApp policy evaluation, priority arbitration, and conflict resolution.
- Optional inference: bounded-latency scoring/detection that must be bypassable during overload.
- Egress: action packaging, emission, and evidence tagging for later replay/correlation.
Why bursts break systems (and what must be controlled)
- Event storms create queue growth, which inflates latency even when CPU looks “not fully utilized.”
- Tail latency typically comes from scheduling and locality: IRQ migration, NUMA remote access, cache cold paths, and lock contention.
- Degradation is mandatory: overload must trigger backpressure, down-sampling, or inference bypass—rather than random timeouts.
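The mandatory-degradation rule above can be sketched as a bounded ingress queue with deterministic behavior at each threshold. The class name, limits, and 1-in-N sampling factor are illustrative assumptions, not the product's implementation.

```python
from collections import deque

class BoundedIngress:
    """Ingress queue that degrades deterministically under bursts:
    below soft_limit every event is accepted; between soft and hard limits
    low-priority events are down-sampled (keep 1-in-N); at the hard limit
    they are shed, which backpressures the producer. Critical control
    events are always queued (protected class)."""

    def __init__(self, soft_limit=100, hard_limit=200, sample_every=4):
        self.q = deque()
        self.soft, self.hard = soft_limit, hard_limit
        self.sample_every = sample_every
        self._low_seen = 0
        self.shed = 0

    def offer(self, event, critical=False):
        depth = len(self.q)
        if critical or depth < self.soft:
            self.q.append(event)
            return True
        if depth < self.hard:                     # down-sampling band
            self._low_seen += 1
            if self._low_seen % self.sample_every == 0:
                self.q.append(event)
                return True
        self.shed += 1                            # explicit shed, not a timeout
        return False
```

The point of the sketch is that every outcome (accept, sample, shed) is a deliberate policy decision with a counter, never a random timeout.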
Acceptance definition: distribution + burst envelope
- Define completion by P50/P95/P99 per stage and for T_total—never by averages alone.
- Specify a burst envelope: peak event rate, burst duration, and allowed backlog recovery time.
- Require a reproducible harness: recorded input (pcap/event log) → deterministic replay → decision output latency distribution.
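A replay harness ultimately reduces to computing percentiles over per-stage latency samples. The sketch below uses a simple nearest-rank definition; real harnesses may use interpolated percentiles or HDR histograms, so treat this as an assumed minimal form.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[k]

def latency_report(samples):
    """The acceptance triple for one stage: P50 / P95 / P99."""
    return {p: percentile(samples, p) for p in (50, 95, 99)}
```

Run the same recorded input twice: if the two reports diverge materially, the harness (or the platform) is not deterministic enough to certify a budget.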
Minimal stage budget template (fill with targets)
| Stage | Timestamp definition | Primary tail-latency sources | Evidence/metrics to capture |
|---|---|---|---|
| T_ingest | NIC RX → parse start | RX queue depth, IRQ migration, cache cold parse path | Queue depth histogram, IRQ affinity, ingest span timing |
| T_policy | event ready → decision ready | lock contention, NUMA remote hits, core migration | runqueue/steal time, numa_miss/remote, policy span timing |
| T_infer (optional) | infer start → infer output | cold start, batching side-effects, PCIe/DMA arbitration | infer P99, accelerator queue depth, bypass trigger logs |
| T_emit | action build → NIC TX | TX queueing, congestion, buffer pressure | TX queue depth, drops/ECN marks, emit span timing |
| T_network | TX → peer accept | external variable (observe, do not ignore) | one-way delay estimate, loss/ordering, time-quality tags |
Note: each stage must emit correlation ID + timestamps; otherwise, tail-latency root cause cannot be proven.
H2-3 · Controller Hardware Topology (turn P99 into a board-level problem)
A Near-RT RIC controller is built for determinism: stable stage latency under bursts, plus replayable evidence. The topology must minimize tail-latency amplification from IRQ migration, NUMA remote access, PCIe/DMA arbitration, and NVMe write tails.
CPU & memory (determinism > raw core count)
- Core isolation: dedicate “control-loop cores” for ingest/policy/emit and keep non-critical work off the hot path.
- NUMA locality: keep NIC queues, processing cores, and memory allocations on the same NUMA node to avoid remote jitter.
- Evidence-ready timing: stage timestamps must be consistent even when CPU frequency or load changes.
Ethernet/NIC (queues define tail behavior)
- Multi-queue + RSS: separate event streams to reduce head-of-line blocking during storms.
- Queue depth is latency: a shallow backlog can inflate P99 even if average CPU utilization looks safe.
- Hardware timestamping is used for measurement and correlation (not as a time-source design topic).
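"Queue depth is latency" follows directly from queueing arithmetic: every event already in the backlog must be serviced before a new arrival. A one-line sketch (assumed fixed service rate, no batching effects) makes the point:

```python
def queueing_delay_ms(queue_depth, service_rate_per_s):
    """Extra wait seen by a new arrival: everything queued ahead of it
    must be serviced first (depth / service rate), converted to ms."""
    return queue_depth / service_rate_per_s * 1000.0
```

For example, a backlog of only 50 events at 10,000 events/s already adds 5 ms of delay, which can consume most of a tight stage budget even while average CPU utilization looks safe.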
PCIe (why switch/retimer shows up in near-real-time appliances)
- Topological pressure: multiple NICs + accelerators + NVMe often exceed root-port and lane planning constraints.
- Arbitration creates tails: shared DMA and contention can produce rare-but-long stalls that dominate P99.
- Retimers extend reach and SI margin, but require monitoring of link errors/retrain events to avoid hidden jitter.
NVMe (state, logs, replay — written like an evidence system)
- Write strategy: separate high-frequency evidence tags from bulky traces to avoid write amplification under bursts.
- Consistency: define what must be durable (commit) vs buffered, especially across power events.
- Tail control: measure write latency histograms; “rare long writes” are a common root cause of pipeline stalls.
Checkable criterion: “one-hop” P99 budget (measured, not assumed)
| Hop | What it measures | How to measure | Common P99 drivers |
|---|---|---|---|
| NIC → CPU | RX → parse start (ingest overhead) | stage spans (t0→t1) + queue depth | IRQ migration, RX queue buildup, cache-cold parse |
| CPU → Accelerator | enqueue → result return | span timing + accelerator queue counters | PCIe arbitration, DMA contention, cold start |
| CPU → NVMe | append → durable commit (evidence/log) | write latency histograms + fsync/commit spans | write amplification, GC/merge tails, PLP policy |
A topology is “done” only when hop-level P99 aligns with the stage budget and explains total tail behavior.
H2-4 · eBPF Acceleration Strategy (where it helps, and how to keep it safe)
In a Near-RT RIC, eBPF is valuable only when it improves tail-latency stability or burst survivability. The design goal is to place eBPF at high-leverage points with strict guardrails: bounded complexity, controlled updates, and resource isolation so observability never steals cycles from the control loop.
High-leverage attachment points (mapped to the control pipeline)
- Ingest filter: early drop/merge/sampling of noisy events during storms to protect parse/policy stages.
- Fast rules: small, deterministic checks that remove low-value work before full policy evaluation.
- Light features: lightweight extraction to reduce policy/inference work per event.
- Hot-path observability: minimal, bounded probes that feed correlation evidence without adding jitter.
Guardrails (avoid “cool but unstable”)
- Bounded complexity: limit program size and worst-case path length; avoid unbounded loops and heavy map operations.
- Safe updates: versioned rollout + rollback; changes must be traceable in the evidence stream.
- Fail behavior: if verification/load fails or overhead exceeds a cap, automatically revert to a safe baseline path.
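The fail-behavior guardrail can be sketched as a small watchdog around the probe overhead cap. This is an illustrative user-space sketch, not kernel eBPF code; the class name and the cycles-ratio proxy for overhead are assumptions.

```python
class ProbeGuard:
    """Disable an eBPF-style probe when its measured overhead exceeds
    the configured cap, reverting to the safe baseline path."""

    def __init__(self, overhead_cap=0.02):
        self.cap = overhead_cap   # e.g. probes may cost at most 2% of cycles
        self.active = True

    def observe(self, probe_cycles, total_cycles):
        """Feed periodic cycle accounting; returns whether probes stay on."""
        if total_cycles > 0 and probe_cycles / total_cycles > self.cap:
            self.active = False   # automatic, one-way revert to baseline
        return self.active
```

A one-way revert (re-enable only via a traceable, versioned rollout) keeps the decision auditable in the evidence stream rather than oscillating silently.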
Isolation (control loop stays clean)
- Separate domains: keep “control-loop cores” dedicated; place telemetry aggregation and heavier probes on telemetry cores.
- Overhead cap: define a maximum telemetry cost under steady state; reduce probe density before harming P99.
- Backpressure integration: eBPF can assist burst handling by enforcing rate limits and sampling policies at ingress.
Checkable criterion: before/after regression set
- Per-event CPU cost trend (cycles proxy) + cache-miss trend on ingest/policy stages.
- P50/P95/P99 of T_ingest, T_policy, and T_total under a defined burst envelope.
- Backlog recovery time after a storm (time to return to baseline queue depth).
- Telemetry overhead cap honored in steady state (no “observer-induced jitter”).
H2-5 · P4 / Programmable Pipeline Hooks (helper, not the main actor)
In a Near-RT RIC controller, P4 is not a “whitebox switch topic.” It is a deterministic helper on the NIC/SmartNIC side: classify and mark control-plane messages, offload sampling/mirroring, and export queue signals—so the CPU hot path spends cycles only on high-value decisions and keeps P99 stable under congestion.
Boundary (what belongs here)
- In scope: control-plane message classification, priority marking, lightweight match-action, mirror/sampling offload, queue telemetry tags.
- Out of scope: switch architecture tutorials, routing/forwarding pipelines, TSN switch design, whitebox ecosystem deep dives.
- RIC value frame: P4 reduces low-value CPU work and increases observability without adding jitter to the control loop.
Where P4 sits (NIC-side hooks)
- Before CPU parse: classify and mark events so the CPU sees a cleaner, prioritized stream under storms.
- Split lanes: send mirrored/sampled evidence traffic to telemetry cores, protecting the control cores.
- Queue signals: export queue depth/drops/marks as tags that explain tail latency (evidence-friendly).
Minimal action set (deterministic and bounded)
- Classify: stable message grouping by header/fields to avoid expensive host-side parsing.
- Mark: priority tags for critical control actions when queues build up.
- Sample / Mirror: bounded observability that does not compete with the hot path.
- Queue telemetry: counters and watermark tags that correlate with P99 inflation.
- Light match-action: only predictable, bounded operations—no feature creep.
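The minimal action set behaves like a bounded, first-match-wins match-action table. The sketch below models that semantics in Python for illustration only; the header fields, priorities, and sampling factor are assumptions, and a real P4 program would express the same table in P4_16 on the NIC.

```python
# Hypothetical match-action table: (match predicate, bounded action).
RULES = [
    (lambda h: h.get("msg_type") == "control" and h.get("prio", 0) >= 5,
     {"mark": "critical", "queue": 0}),                  # protect under storms
    (lambda h: h.get("msg_type") == "control",
     {"mark": "normal", "queue": 1}),
    (lambda h: True,                                     # default entry
     {"mark": "telemetry", "queue": 2, "sample": 8}),    # 1-in-8 to telemetry lane
]

def classify(header):
    """First-match wins, like a P4 match-action table: a fixed number of
    bounded rules, no loops, no per-packet allocation."""
    for pred, action in RULES:
        if pred(header):
            return action
```

Because the table always terminates at the default entry, per-event cost is bounded by the rule count, which is exactly the determinism property the CPU hot path needs.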
Checkable criterion: prove it is worth it
- Lower CPU P99: under the same replayed burst, T_ingest and/or T_policy P99 decreases or jitter narrows.
- Lower loss under congestion: critical control messages show fewer drops/late arrivals at the defined burst envelope.
- Better observability without disturbance: telemetry lane overhead stays capped and does not worsen T_total P99.
H2-6 · AI Inference Acceleration (in-loop, deterministic, and degradable)
In a Near-RT control loop, inference is not "edge AI compute." It is a bounded-latency decision stage that must meet an explicit SLA and remain safe under overload. The correct design is deterministic: measurable P99_infer, degradable (normal → reduced → bypass), and auditable via signed model updates and rollback.
Where inference sits in the control loop
- Inline (synchronous): inference directly influences the current decision; requires the strictest P99.
- Assist (advisory): inference produces a score/signal that policy can accept or ignore; still must be observable and bounded.
- Bypass is mandatory: if SLA is at risk, policy must continue using a safe baseline path.
Use cases (kept intentionally narrow)
- Anomaly detection: suppress event noise and reduce storm amplification.
- Policy scoring: rank candidate actions under uncertainty.
- Load prediction: provide trend signals that stabilize policy decisions.
Acceleration choice (CPU vs GPU/NPU/FPGA) — judged by determinism
- P99 & jitter: prefer the option with stable tails, not the highest peak throughput.
- Cold-start risk: initialization and model load must not create rare long stalls.
- Batch tradeoff: batching can lower average latency but may raise tails; use only if P99_infer stays inside budget.
- Update frequency: frequent model changes require predictable rollout/rollback without loop instability.
- PCIe contention: DMA arbitration can create tails; measure and cap queueing delay.
Determinism patterns (normal → reduced → bypass)
- Normal: inference on, meeting the target P99 under the defined burst envelope.
- Reduced: lower-rate inference, smaller model, or partial features to keep P99 bounded.
- Bypass: policy proceeds with rules/cache/last-known-good signal when inference budget is threatened.
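The normal → reduced → bypass pattern is essentially a small state machine driven by observed P99_infer against the stage budget. A minimal sketch, assuming a hypothetical 80% "threatened" threshold and the mode names used above:

```python
class InferGate:
    """Select the inference mode from observed tail latency vs budget:
    normal (full inference) → reduced (smaller model / lower rate) →
    bypass (policy falls back to rules / last-known-good signal)."""

    def __init__(self, budget_ms):
        self.budget = budget_ms
        self.mode = "normal"

    def update(self, p99_infer_ms):
        if p99_infer_ms > self.budget:
            self.mode = "bypass"             # budget broken: safe baseline
        elif p99_infer_ms > 0.8 * self.budget:
            self.mode = "reduced"            # budget threatened: shed work
        else:
            self.mode = "normal"
        return self.mode
```

In practice the transitions would also be hysteretic (so the gate does not flap at the threshold) and every mode change would be tagged into the evidence stream with the active model version.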
Checkable criterion: inference SLA + safe updates
- SLA: P99_infer < X ms, with a clearly stated share of the total budget.
- Overload behavior: queue-depth or overhead-cap triggers Reduced/Bypass automatically.
- Model lifecycle: signed model intake, staged rollout, and rollback window; every decision is tagged with model version.
H2-7 · Interconnect & Determinism (Ethernet/PCIe as a P99 problem)
In a Near-RT RIC controller, “fast links” are not the goal. The goal is determinism: control-plane events remain prioritized and predictable under storms, multi-tenant isolation, and DMA contention. Ethernet queues, PCIe topology, NUMA locality, and IRQ behavior must be designed as one system—because P99 is where failures hide.
Ethernet side (control-plane priority, not switching theory)
- Multi-queue + RSS: split event classes so a burst in one stream does not block another.
- Priority and drop policy: critical control messages are protected; low-value traffic is sampled or dropped first under congestion.
- Evidence-safe observability: mirrored/sampled traffic is routed to telemetry lanes so the control loop stays clean.
PCIe side (topology + isolation + predictable arbitration)
- Topology matters: shared root ports and switches can create rare-but-long stalls that dominate tail latency.
- Bandwidth budget ≠ determinism budget: even when average bandwidth is sufficient, DMA arbitration can inflate P99.
- Isolation (only as it affects determinism): IOMMU and virtualization features reduce unsafe sharing and help keep jitter bounded.
NUMA & IRQ (prevent core jitter)
- IRQ affinity: keep RX processing stable on the intended cores to avoid cache-cold migrations.
- NUMA locality: NIC queues, hot-path cores, and memory allocations must stay within the same domain.
- Busy-poll (if used): applied only to reduce interrupt jitter, never to “chase throughput.”
Checkable criterion: the “three culprits” for tail latency
- IRQ migration: latency spikes align with interrupt/core drift and softirq load movement.
- NUMA cross-domain: spikes align with remote memory access increases and cross-node allocation.
- Queue congestion: spikes align with RX queue depth, drops, and late-arrival counts for critical classes.
“Done” means P99 changes can be explained by these three signals and reduced under the defined burst envelope.
H2-8 · Timing for Measurement & Correlation (time as evidence, not time source)
This page uses timing only to measure and correlate the control loop: event ordering across nodes, action-to-effect attribution, and replayable evidence. It does not design the time source. The core requirement is to attach timestamps plus time quality to the evidence chain so diagnostics remain trustworthy under drift or jumps.
Why timing is needed (kept intentionally narrow)
- Cross-node ordering: stable ordering of multi-source events during storms and failovers.
- Action-to-effect correlation: determine whether a control action produced the expected effect within the budget window.
- Replay and root cause: align evidence so the same inputs reproduce comparable timelines.
Time input → timestamp policy → log alignment
- Time input: accept PTP or NTP as an input; record the source and quality level.
- Timestamp policy: stamp at stage boundaries (ingest / policy / emit) and preserve correlation IDs.
- Log alignment: align multi-stream logs onto one evidence timeline, then replay with the same alignment rules.
Mismatch handling (drift / jump)
- Drift shifts ordering and correlation windows; detect and mark reduced confidence.
- Jump can create false “negative latency” and wrong ordering; detect and tag aggressively.
- Degrade safely: keep the control loop running, but downgrade cross-node correlation and highlight uncertainty in evidence.
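Drift/jump detection over a timestamp stream can be sketched as a per-sample tagging pass: a negative delta or a step larger than a configured bound downgrades the sample's time quality. The threshold value and tag names are illustrative assumptions.

```python
def tag_time_quality(timestamps_ms, max_step_ms=50.0):
    """Tag each timestamp: 'ok', or 'jump' when the clock steps backwards
    (would yield false negative latency) or forwards by more than
    max_step_ms relative to the previous sample (suspected clock step)."""
    tags = ["ok"]                      # first sample has no predecessor
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        delta = cur - prev
        tags.append("jump" if delta < 0 or delta > max_step_ms else "ok")
    return tags
```

Downstream analytics then exclude or de-weight 'jump'-tagged windows when correlating actions to effects, instead of silently producing false causality.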
Checkable criterion: “time quality” becomes a real field
- Offset threshold: exceeding a configured threshold downgrades time quality and triggers an alarm.
- Jump detection: jump events are tagged so downstream analytics do not trust broken ordering.
- Evidence tags: critical records carry timestamp + time_quality + time_source.
H2-9 · Hardware Root-of-Trust & Supply-Chain Security (boot → xApp, auditable)
Hardware root-of-trust (RoT) is valuable only when it becomes auditable and testable. The target is a verifiable chain from power-on to xApp execution: boot integrity, remote attestation, signed artifacts (including policy/model updates), controlled admission, and evidence logs that support field investigations without ambiguity.
Trust chain: from power-on to runtime
- Secure Boot: only a signed boot path is allowed to execute (enforced start-of-trust).
- Measured Boot: components are hashed into PCRs to form a measurable baseline (audit trail).
- Remote attestation: the platform proves its identity + PCR state before accepting sensitive workloads.
- Signed images: OS, drivers, container images, and xApp bundles are verified before launch.
- Runtime integrity: drift is detected via periodic checks and integrity signals (not “trust once”).
xApp supply chain: signed, declared, and gated
- Signing: xApps and dependencies ship with a signature chain that maps to approved publishers.
- SBOM: software bill of materials is attached to the artifact to expose components and versions.
- Admission control: only approved signature + SBOM + policy compliance can start or update.
- Non-repudiation logs: every install/update/action is recorded with identity + hash + timestamps.
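The admission-control gate above can be sketched as a single ordered check: approved signer, valid signature, SBOM present, policy compliance. The sketch uses an HMAC as a stand-in for a real asymmetric signature chain, and the trust-store and SBOM shapes are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical trust store: approved publisher -> verification key.
APPROVED_KEYS = {"vendorA": b"vendorA-secret"}

def admit_xapp(artifact: bytes, signer: str, signature: bytes,
               sbom: dict, policy_ok: bool):
    """Gate an xApp start/update; returns (admitted, reason) and is meant
    to be logged verbatim into the non-repudiation evidence stream."""
    key = APPROVED_KEYS.get(signer)
    if key is None:
        return False, "unapproved signer"
    expected = hmac.new(key, artifact, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):   # constant-time compare
        return False, "signature mismatch"
    if not sbom.get("components"):
        return False, "missing SBOM"
    if not policy_ok:
        return False, "policy violation"
    return True, "admitted"
```

Returning an explicit reason (rather than a bare boolean) is what makes the decision log audit-ready: every rejection names the failed check.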
Critical rule: policy/model updates belong to the same chain
- Policy updates: treated as signed artifacts (not “config files”).
- Model updates: signed, versioned, staged rollout, and rollback window are mandatory.
- Anti-poisoning posture: updates are rejected or forced into safe mode when attestation is unhealthy.
Checkable acceptance checklist (audit-ready)
- PCR baselines: known-good measurements exist and are versioned for each platform/firmware release.
- Signature chain: every runnable artifact has a verifiable chain and an allowlist policy.
- Attestation failure policy: explicit behavior is defined: deny / degrade / read-only.
- Evidence continuity: logs include artifact hash + signer identity + decision result (accept/reject).
H2-10 · Observability & Closed-Loop Evidence (low overhead, replayable)
Near-RT failures often hide in tail latency and “cannot reproduce” field reports. The objective here is a non-perturbing evidence system: always-on low overhead in steady state, automatic escalation under anomalies, and replay that turns logs into a repeatable diagnosis process.
Metric layers (what must exist, not a wish list)
- System: CPU, IRQ, NUMA, NIC queue depth/drops, PCIe contention signals.
- Stage: per-stage latency distribution (ingest → decide → emit) with P50/P95/P99.
- Protocol: message loss, re-ordering, late-arrival counts for critical classes.
- Security: attestation status, signature verification outcomes, update decisions.
Sampling policy (observe without injecting jitter)
- Steady mode: low-cost counters and coarse histograms, bounded overhead.
- Anomaly mode: escalate selectively (targeted spans, short windows, per-class detail).
- Guardrail: observability traffic stays on telemetry lanes; control cores remain protected.
Replay workflow (field issue → reproducible diagnosis)
- Capture: evidence logs + correlation IDs + time tags + time quality.
- Reconstruct: rebuild the stage timeline and identify where P99 inflated.
- Replay: feed recorded inputs back into the pipeline to reproduce latency distributions.
- Locate: map spikes to culprits (queue / IRQ drift / NUMA remote / PCIe stalls / security rejects).
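The reconstruct/locate steps above can be sketched as a pass over span records that ranks stages by how far their worst observed latency overshoots the stage budget. The tuple shape `(correlation_id, stage, latency_ms)` is an assumed minimal span record, not the product's schema.

```python
def worst_stage(spans, budgets):
    """spans: iterable of (correlation_id, stage, latency_ms).
    budgets: {stage: budget_ms}. Returns the stage whose worst observed
    latency overshoots its budget by the largest factor -- the first
    place to look during triage."""
    worst = {}
    for _cid, stage, ms in spans:
        worst[stage] = max(worst.get(stage, 0.0), ms)
    return max(budgets, key=lambda s: worst.get(s, 0.0) / budgets[s])
```

Because the input spans carry correlation IDs, the triage result can be traced back to the exact offending events and replayed against the recorded capture.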
Checkable criterion: spans exist end-to-end
- Every critical request produces spans for ingest → decide → emit.
- Each span carries timestamp, correlation_id, and time_quality.
- Security decisions (attestation/signature/admission) are attached to the same correlation chain.
H2-11 · Validation & Failure-Mode Playbook
This section defines what “done” means for a Near-RT RIC controller by turning architecture claims into measurable thresholds, repeatable injections, and auditable evidence artifacts. The goal is to prove three outcomes under stable load, burst conditions, and degraded environments: near-real-time (tail-latency stays inside budget), trusted (boot/update/xApp chain is attestable), and stable (self-protection + replayable root-cause evidence).
A) Validation Matrix (measurable + sign-off)
Use a fixed replay input (pcap/message recordings) and run each scenario long enough to expose drift and tail behavior. Each test row defines: what it proves, threshold, method, evidence artifact, expected alarm, and expected system behavior.
| Test ID | Scenario / Injection | Metric & Threshold | Method | Evidence Artifact | Expected Alarm & Behavior |
|---|---|---|---|---|---|
| RT-PERF-01 | Stable load (baseline replay), long-run drift check. | P99_total ≤ B_total; jitter within J_max; late-action rate ≤ R_late | Deterministic replay → capture per-stage spans; pin IRQ + CPU sets. | Percentile report + span histograms; per-stage breakdown; CPU/IRQ/NUMA snapshots. | No protection triggers; any drift must correlate to a measurable resource signal (queue/NUMA/IO). |
| RT-PERF-02 | Burst event storm (alert spike / mobility surge model). | P99_total stays ≤ B_total; drop ≤ D_max; reorder ≤ O_max | Burst generator + replay; apply priority rules (control-plane first). | Queue-depth vs P99 curve; drop counters; span samples during burst crest. | Congestion alarm at threshold; self-protection mode triggers within T_protect and preserves critical messages. |
| RT-NET-01 | Degraded network (delay/jitter/loss/reorder) on ingress or egress. | P99_total ≤ B_total (or controlled degrade); control continuity ≥ C_min | Netem impairment profile; compare "impairment on/off" A/B runs. | Impairment profile + before/after latency; reorder evidence; replayable capture bundle. | "Quality degraded" alarm; degrade behavior is deterministic (e.g., safe-action policy). |
| SEC-BOOT-01 | Measured boot + remote attestation pass/fail paths. | PCR baseline match rate ≥ P_pass; fail path = deny/degrade with audit proof | TPM-backed measured boot; force mismatch; verify admission policy. | PCR quote record; attestation decision log (accept/reject + reason); immutable audit trail. | Attestation-fail alarm; system enters configured mode: deny, read-only, or degraded. |
| SEC-UPD-01 | Signed image / xApp / policy/model update (valid/invalid signatures). | Invalid signature = block + log; rollback window ≤ T_rb | Inject bad signature; test rollback; verify "no untrusted window". | Signature chain evidence; SBOM/admission records; rollback proof pack. | Signature-fail alarm; update is rejected; rollback restores previous trusted state. |
| FI-TIME-01 | Time jump/drift injection (measurement alignment stress). | Jump detection ≤ T_det; time-quality tagged on all spans | Force offset step; compare event-order consistency. | time_quality timeline; span alignment before/after; root-cause replay proof. | Time-quality alarm; cross-node correlation degrades safely (no false causality). |
| FI-NIC-01 | NIC queue congestion + IRQ migration (tail-latency killer #1). | P99_total increase ≤ ΔP99_max; critical-class drop ≤ D_crit | Saturate RX/TX queues; disturb IRQ affinity; observe queue depth + P99. | Queue counters + IRQ trace; P99 vs queue plot; span evidence at congestion peak. | Congestion alarm; priority rules preserve control-plane; IRQ pinning prevents jitter amplification. |
| FI-IO-01 | NVMe write amplification (logging/replay pressure). | IO latency bounded; control P99 remains ≤ B_total (or controlled degrade) | Stress log writes; switch write policy (batch/limit); verify isolation. | IO latency stats; write amplification evidence; control-span correlation. | Storage-pressure alarm; log rate limiting engages; closed-loop path stays protected. |
| FI-CPU-01 | CPU frequency / scheduling jitter (tail-latency killer #2). | P99 variance ≤ V_max; context-switch spikes bounded | Induce frequency swings; disturb scheduler; validate CPU-set isolation. | Frequency + scheduler traces; before/after percentile comparison; reproducible replay seed. | "Determinism degraded" alarm; isolation policy keeps control cores stable. |
B) Failure-Mode Playbook (fast triage + deterministic recovery)
Each playbook below is designed for field reproducibility. The shortest path is always: Queue → IRQ → NUMA → PCIe/IO, while keeping the correlation-id intact for replay.
1) “P99 cliff” during burst
- Trigger: burst crest causes P99_total jump; critical actions arrive late.
- Primary evidence: NIC queue depth, drop counters, IRQ migration events, span histograms for ingest/emit stages.
- Fast isolation: lock IRQ affinity → validate queue priority → confirm NUMA locality of RX queues → verify PCIe contention.
- Expected behavior: self-protection mode engages (rate-limits non-critical telemetry, preserves control class), alarm is raised.
- Replay recipe: store burst window capture + exact generator profile + config snapshot; re-run until percentile curve matches.
2) “Attestation fail” after update
- Trigger: measured boot quote mismatch or remote attestation reject after a new image/xApp/policy/model is deployed.
- Primary evidence: PCR quote + decision log (accept/reject + reason), signed artifact chain, SBOM/admission records.
- Fast isolation: confirm signing chain → verify PCR baseline vs expected → confirm update package integrity → check rollback proof.
- Expected behavior: deny / read-only / degraded mode (configured), with immutable audit record; rollback restores trusted baseline.
3) “Root-cause cannot be reproduced” in the field
- Trigger: intermittent failures; average metrics look fine; only tail events break the loop.
- Primary evidence: per-stage spans with correlation-id, time_quality tags, abnormal-mode telemetry snapshots.
- Fast isolation: confirm time jump/drift events → align logs by time_quality → replay the exact slice → validate deterministic drift source.
- Expected behavior: observability escalates only under anomaly and does not perturb control cores (sampling is controlled).
C) Reference Materials (example part numbers / SKUs)
The list below anchors validation to concrete hardware. It is not a procurement recommendation; it is a repeatable test reference set for attestation, determinism, queue stress, IO stress, and timing quality.
| Function | Material No. (PN/SKU) | Why it matters for validation | Used in |
|---|---|---|---|
| TPM 2.0 (RoT) | Infineon SLB-9670VQ2-0 | Measured boot PCR baselines, quotes, attestation fail-path sign-off. | SEC-BOOT-01 |
| Secure Element | NXP SE050C2HQ1/Z01SDZ | Key storage, signed artifact validation, admission control evidence. | SEC-UPD-01 |
| Secure Element | Microchip ATECC608B-SSHDA-B | Alternate secure element reference for signing/verification workflows. | SEC-UPD-01 |
| NIC (PTP-capable) | Intel Ethernet Adapter E810-XXVDA2 | Hardware timestamping support, queue/IRQ stress tests, drop/reorder counters. | RT-PERF-02, FI-NIC-01 |
| DPU/SmartNIC | NVIDIA BlueField-2 MBF2H332A-AENOT | Offload baseline for deterministic pipeline, queue handling, and security posture A/B runs. | FI-NIC-01 |
| PCIe Switch (Gen3) | Broadcom PEX8747 (e.g., PEX8747-AA80BC G) | Multi-endpoint contention reproduction; isolates PCIe topology impacts on P99. | FI-CPU-01, FI-IO-01 |
| PCIe Switch (Gen4) | Broadcom PEX88096 | High-lane-count topology reference for Gen4 stress and peer traffic paths. | FI-CPU-01 |
| PCIe Redriver | TI DS80PCI810 | Signal-integrity related variance isolation (link training/recovery-induced jitter). | FI-CPU-01 |
| Enterprise NVMe SSD | Samsung PM9A3 MZQL2960HCJR-00A07 | Write-amplification and log-pressure reproducibility with consistent IO characteristics. | FI-IO-01 |
| PTP/SyncE Timing IC | Renesas 8A34001C-000AJG | Time-quality and jump/drift experiments with a PTP/SyncE-focused timing source. | FI-TIME-01 |
| Jitter Attenuator | Si5345A-D-GM | Clock-noise sensitivity isolation: separates timing quality issues from software latency. | FI-TIME-01 |
| Edge SoC (example) | Intel Xeon D-2796TE | Determinism baseline: frequency/scheduler sensitivity runs with a known edge-class SoC. | FI-CPU-01 |
The matrix above keeps the validation deliverable compact: scenarios/injections define inputs, metrics define thresholds, evidence artifacts guarantee reproducibility, and expected behaviors enforce safe determinism. All stages must preserve correlation-id and time_quality.
H2-12 · FAQs (Near-RT RIC Controller)
These FAQs lock the engineering boundary, focus on determinism (tail latency), and define evidence-driven validation. Answers intentionally avoid expanding into DU scheduling details, full P4 tutorials, or time-source (grandmaster) design.
1) Near-RT RIC vs DU scheduler—what is the real engineering boundary?
2) What latency metric matters most: average, P95, or P99—and why?
3) Why does adding more cores sometimes worsen tail latency?
4) Where should eBPF sit to help without destabilizing the control loop?
5) When is P4 helpful in RIC, and when is it the wrong tool?
6) How to make AI inference deterministic enough for near-real-time control?
7) PCIe switch/retimer: when is it required, and what failures does it introduce?
8) Do we need PTP in RIC if we're not a time source?
No grandmaster design is required here: accept PTP or NTP as an input, tag evidence with time_quality, and degrade cross-node correlation when time is untrusted so analysis does not create false causality.