Train-to-Ground (T2G) Gateway for Rail Backhaul
A T2G gateway is the boundary device between onboard Ethernet/TSN domains and public/private cellular backhaul. It stabilizes connectivity under motion by multi-link aggregation, protects deterministic traffic with QoS, preserves time observability with PTP/GNSS + holdover, and proves integrity through a hardware root-of-trust and signed evidence logs.
H2-1. Page Promise: What “Good T2G” Guarantees
“Good T2G” is not a feature checklist. It is a set of testable guarantees that remain valid across RF fading, cell handovers, tunnels, power transients, and passenger load spikes. This page defines three guarantees and the evidence fields required to prove them in service.
Connectivity
Sessions survive motion: predictable cutover, controlled drop rate, and clear root-cause attribution.
Determinism
Critical flows get bounded p95 RTT, p95 jitter, and loss even during passenger peaks.
Trust
Software and policy integrity are provable; incident logs are timestamped, signed, and auditable.
| Guarantee | Acceptance metrics (field-verifiable) | Evidence to collect (must be logged) |
|---|---|---|
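| Connectivity | Cutover time; session drop rate; root cause attributed per incident. | Link timeline; switch reason codes; bearer up/down and IP-change events. |
| Determinism | p95 RTT, p95 jitter, and loss for critical flows during passenger peaks. | Per-class queue depth and drops; DSCP/class mapping; ops p95 RTT series. |
| Trust | Software and policy integrity verified; incident logs timestamped and signed. | Boot measurement hash; attestation pass/fail; policy signature checks; time confidence level. |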
H2-2. System Context & Interfaces (Onboard ↔ Backhaul ↔ Ground)
The T2G gateway is defined by its system boundary. It terminates onboard domains (ops, passenger, maintenance), attaches to external backhaul bearers (public cellular and/or private networks), and anchors ground-side services (NOC, identity, policy delivery, and evidence ingestion). Clear interfaces prevent scope creep and make later design choices measurable.
Onboard side
Domain separation (VLAN/VRF) + QoS ingress classification + local switch/uplink constraints.
Backhaul side
Multi-link bearers (public/private) with roaming, APN policy, and make-before-break cutover.
Ground side
Policy and identity services, attestation checks, and incident-bundle upload/retention.
Evidence-first interface rule: every boundary must emit counters that explain failures without guesswork: link timeline, QoS proof, power/reset reasons, time confidence, and integrity status.
| Interface group | What to specify (review-ready) | Evidence fields to log |
|---|---|---|
| Onboard ports | Ethernet count/speed; domain mapping (ops/passenger/maintenance); QoS trust boundary; optional PoE role (PD/PSE/pass-through). | Per-domain ingress/egress bytes; DSCP/class mapping; per-class drops/queues; admin access attempts by domain. |
| External bearers | Public cellular + private network attachment; SIM/eSIM policy; roaming/APN rules; link preference by domain; cutover method. | Bearer up/down; handover count; IP changes; DNS reachability; score trend + switch reason codes. |
| Power | EN 50155 wide-input expectations; brownout thresholds; holdup target; safe shutdown constraints for storage and updates. | Reset reason; brownout count; min input voltage; thermal throttle state; storage I/O errors. |
| Time inputs | GNSS availability assumptions (tunnels); PTP role (boundary/relay); holdover behavior; time confidence output. | Offset/drift; servo state; GNSS health; time confidence level; timestamp validity flags in logs. |
| Security boundary | Root-of-trust presence; secure/measured boot; signed policy delivery; remote attestation gating before privileged services. | Boot measurement hash; attestation pass/fail; policy signature checks; admin audit trail; incident bundle hash. |
Implementation hint: keep “what to specify” and “what to log” together. If a field cannot be observed in logs, it cannot be guaranteed in service, and it should not be claimed as a capability.
H2-3. Network Segmentation Model (Safety/Ops/Passenger/Maintenance)
Multi-link aggregation amplifies both good and bad behavior. Without hard segmentation, passenger bursts can starve operations, and a compromised endpoint can pivot across domains. A rail-grade T2G gateway must enforce VLAN/VRF separation and stateful firewall policy at the boundary, then bind each domain to explicit QoS budgets and audit trails.
Safety / Control-adjacent
Only allow strictly scoped telemetry and monitoring; default-deny cross-domain access; highest QoS protection.
Operations
Fleet health, logs, software delivery; allowlisted services to ground; protected under congestion.
Passenger
Best-effort with hard caps; isolated from ops/safety; shaped to prevent bufferbloat and tail latency spikes.
Maintenance
Time-limited privileged access; MFA + session logging; per-action accountability and least privilege.
Hard-cut rule: domain separation is not “best practice”; it is a reliability requirement. The gateway boundary must be the enforcement point: VLAN/VRF mapping, stateful firewall rules, and QoS classification. Downstream devices may differ across fleets, but the boundary contract must remain stable.
| Domain | Allowed flows (examples) | Security policy | QoS level | Evidence fields |
|---|---|---|---|---|
| Safety | Heartbeat telemetry, alarms, time confidence status | Default-deny; explicit allowlists; no inbound from passenger | Highest priority + reserved bandwidth | DSCP→queue map, ACL hit counts, cross-domain deny audit |
| Ops | Fleet health, logs upload, policy/OTA fetch | Ground endpoints allowlisted; signed policy required | Protected class with minimum rate | Queue depth/drops, top talkers, policy signature checks |
| Passenger | Portal, browsing, infotainment updates | Isolated VRF; no lateral access to ops/safety | Best-effort with hard cap + shaping | Shaping rate, drops, p95 RTT during peaks |
| Maintenance | Remote service sessions, diagnostics pulls | MFA; per-session time limits; full command audit | Controlled; never starves safety/ops | Login/audit trail, session duration, rule change events |
- DSCP trust boundary: re-mark at ingress unless the source is trusted and managed.
- Cross-domain audit: log both allow and deny decisions with domain IDs and rule IDs.
- No shared fate: passenger queue growth must not increase ops/safety p95 latency.
- Policy drift control: config and firewall bundles must be signed and versioned.
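The default-deny and cross-domain audit rules above can be sketched as a small evaluator. This is a minimal illustration, not a firewall implementation: the rule table, the rule IDs, and the `ground` destination label are hypothetical names chosen for the example.

```python
# Hypothetical allowlist: (source domain, destination domain) -> rule ID.
# Anything not listed is denied by default.
ALLOW_RULES = {
    ("safety", "ground"): "R-SAFETY-TELEMETRY",
    ("ops", "ground"): "R-OPS-GROUND",
    ("maintenance", "ops"): "R-MAINT-OPS",
}


def evaluate(src_domain, dst_domain, audit_log):
    """Return True if the flow is allowed; always append an audit entry.

    Both allow and deny decisions are logged with domain IDs and a rule ID,
    so a later review can explain every cross-domain decision.
    """
    rule_id = ALLOW_RULES.get((src_domain, dst_domain))
    allowed = rule_id is not None
    audit_log.append({
        "src": src_domain,
        "dst": dst_domain,
        "decision": "allow" if allowed else "deny",
        "rule_id": rule_id if allowed else "DEFAULT-DENY",
    })
    return allowed


audit = []
evaluate("ops", "ground", audit)       # allowed by R-OPS-GROUND
evaluate("passenger", "ops", audit)    # denied, logged as DEFAULT-DENY
```

Keeping the audit append outside any conditional is the point of the sketch: a deny that is not logged cannot be used as evidence later.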
H2-4. Multi-Link Strategies: Bonding vs Steering vs Failover
“Multi-link” is not one mechanism. It is a choice among bonding, steering, and failover. Each optimizes a different objective and introduces a different failure mode. The strategy should be selected per traffic class and constrained by anti-flap controls: hysteresis, hold-time, and make-before-break.
| Strategy | Primary goal | Upside | Risk / failure mode | Best fit |
|---|---|---|---|---|
| Bonding | Max throughput | Higher aggregate bandwidth for bulk transfers | Reordering & jitter amplification; harms time/interactive flows | Bulk uploads, non-real-time sync |
| Steering | Policy control | Per-domain/per-flow path selection; protects critical classes | Policy complexity; weak observability leads to “unexplainable” incidents | Mixed traffic: ops + passenger + bulk |
| Failover | Determinism | Lowest jitter under normal operation; simplest tail behavior | Bandwidth underused; session breaks without overlay + MBB | Critical domains pinned to one best link |
- Hysteresis: switch only when the candidate link is meaningfully better, and switch back only when meaningfully worse.
- Hold-time: minimum dwell time after a switch to prevent oscillation in marginal coverage.
- Make-before-break: establish the next bearer (IP + overlay warm-up) before moving critical flows.
Selection rule: bonding is a bulk tool; steering is a policy tool; failover is a determinism tool. Anti-flap controls are mandatory in rail mobility; without them, multi-link increases incident frequency and reduces repeatability.
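As a rough illustration of the hysteresis and hold-time controls, the sketch below selects a primary link from per-link scores. The score scale, the default margin, and the dwell time are assumed values for the example, and make-before-break warm-up is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkSelector:
    """Primary-link selection with hysteresis and hold-time (illustrative)."""
    hysteresis: float = 10.0    # candidate must beat the active score by this margin
    hold_time_s: float = 30.0   # minimum dwell time after a switch
    active: Optional[str] = None
    last_switch_ts: float = float("-inf")

    def update(self, now: float, scores: Dict[str, float]) -> str:
        best = max(scores, key=scores.get)
        if self.active is None or self.active not in scores:
            # No usable active link: take the best candidate immediately.
            self.active, self.last_switch_ts = best, now
            return self.active
        in_hold = (now - self.last_switch_ts) < self.hold_time_s
        clearly_better = scores[best] - scores[self.active] >= self.hysteresis
        if best != self.active and clearly_better and not in_hold:
            self.active, self.last_switch_ts = best, now
        return self.active


sel = LinkSelector()
sel.update(0.0, {"lte_a": 50.0, "lte_b": 40.0})    # picks lte_a
sel.update(40.0, {"lte_a": 50.0, "lte_b": 55.0})   # stays: margin below hysteresis
sel.update(41.0, {"lte_a": 50.0, "lte_b": 65.0})   # switches to lte_b
```

The same margin applies in both directions, so switching back also requires the original link to be clearly better again.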
H2-5. Link Scoring & Decision Engine (What to Measure, How to React)
The decision engine is the core of a T2G gateway. It converts noisy mobility signals into auditable actions: warm up a candidate link, steer a domain, or switch a primary path. A robust engine must (1) measure across three layers, (2) prefer tail behavior over averages, and (3) encode every action as a Decision Record with reason codes.
RF layer (early warning)
RSRP/RSRQ/SINR windows, MIMO rank distribution, BLER trend. RF signals predict degradation but should not trigger switching alone.
Transport layer (experience)
p50/p95 RTT, jitter, loss, reorder, and throughput under load. Tail metrics detect bufferbloat and reordering harm.
Service layer (final gate)
DNS/TLS failures, portal detection, and IP-change frequency. Service probes prevent “RF looks good but apps fail”.
Reaction ladder (light → heavy): raise probe rate and warm up candidates first; switch only when sustained evidence exceeds hysteresis thresholds and the hold-time budget allows. This prevents oscillation at tunnel entrances and marginal coverage.
| Decision Record element | What it captures | Example proof fields |
|---|---|---|
| Score time series | Component scores over time (RF risk / transport quality / service gate). | RF_risk, T_score, S_gate, window stats, timestamps |
| Reason codes | Why a change happened; makes incidents explainable. | DNS_FAIL, TLS_FAIL, TAIL_RTT_SPIKE, BLER_TREND, IP_FLAP |
| Switch counters | How often decisions occur; detects flapping. | switches/hour, flap counter, hold-time violations |
| Before/after validation | Whether service recovered; measures user-perceived continuity. | time-to-service, probe success rate, tunnel state |
| Policy snapshot | Which weights/thresholds were active when the decision happened. | hysteresis, hold-time, weights, domain pinning state |
- RF is predictive, not decisive: RF trends should trigger candidate warm-up and increased probing, not immediate switching.
- Tail-first transport: p95/p99 RTT and jitter dominate averages; reorder is a hard penalty for critical flows.
- Service gates stop false confidence: DNS/TLS/portal probes prevent “link looks fine” failures.
- Every action must be explainable: reason codes + score history are mandatory for field triage.
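One way to picture the reaction ladder and the Decision Record is the sketch below. It assumes a simplified sample with one RF risk value, one p95 RTT value, and two service probes; the thresholds in `POLICY` are hypothetical, and hold-time enforcement is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    rf_risk: float      # 0..1, higher means RF is degrading (e.g. BLER trend)
    p95_rtt_ms: float   # transport tail latency over the sample window
    dns_ok: bool        # service probe results
    tls_ok: bool


@dataclass
class DecisionRecord:
    ts: float
    action: str                                   # "NONE", "WARM_UP" or "SWITCH"
    reasons: List[str] = field(default_factory=list)
    policy: dict = field(default_factory=dict)    # snapshot of active thresholds


# Hypothetical thresholds, not recommended values.
POLICY = {"rf_risk_warn": 0.6, "p95_rtt_budget_ms": 150.0, "sustain_n": 3}


def decide(ts, window, policy=POLICY):
    """window: recent samples, oldest first. Returns a DecisionRecord."""
    n = policy["sustain_n"]
    recent = window[-n:]
    reasons = []
    if len(recent) == n:  # only sustained evidence may justify a switch
        if all(not s.dns_ok for s in recent):
            reasons.append("DNS_FAIL")
        if all(not s.tls_ok for s in recent):
            reasons.append("TLS_FAIL")
        if all(s.p95_rtt_ms > policy["p95_rtt_budget_ms"] for s in recent):
            reasons.append("TAIL_RTT_SPIKE")
    if reasons:
        action = "SWITCH"
    elif window and window[-1].rf_risk >= policy["rf_risk_warn"]:
        # RF is predictive, not decisive: warm up a candidate, do not switch.
        action, reasons = "WARM_UP", ["BLER_TREND"]
    else:
        action = "NONE"
    return DecisionRecord(ts, action, reasons, dict(policy))
```

Copying the policy into each record is what makes a later incident explainable even after thresholds have been retuned.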
H2-6. Session Continuity & Overlays (VPN, NAT, CGNAT Reality)
Multi-link availability does not guarantee user-perceived continuity. Underlay changes—IP reassignment, NAT mapping expiry, and CGNAT behavior—can terminate sessions even when a backup bearer is available. Overlays bind connectivity to a stable tunnel identity so sessions recover faster across mobility events.
Underlay changes
IP change, NAT timeout, CGNAT policy shifts. Result: sessions break, retries increase, and tail latency spikes.
Overlay tunnel
IPsec / WireGuard / TLS tunnels create a stable identity; underlay may change while the tunnel re-establishes.
Multipath overlay (optional)
Maintains multiple sub-paths to reduce interruption, at the cost of complexity and reorder management.
| Continuity claim | What must be true | Evidence fields |
|---|---|---|
| Fast recovery after switch | Candidate tunnel is warmed up; rekey/reconnect is bounded; service probes confirm availability. | tunnel reconnect count, rekey time, time-to-service |
| NAT resilience | Keepalives maintain mappings; detection triggers re-establish before apps fail. | NAT keepalive hits, keepalive RTT, mapping expiry events |
| CGNAT variability handling | Policies tolerate carrier-specific timeouts; overlays reduce dependence on stable public identity. | IP flap frequency, handshake failures, retry bursts |
- Measure continuity as time-to-service: recovery time after a switch is more meaningful than link-up time.
- Keepalive is not optional: NAT mapping expiry is a common root cause of “link is up, app is down”.
- Warm up before switching: establish IP + tunnel + service probes, then move critical flows (make-before-break).
- Log what users feel: include DNS/TLS probe outcomes and session recovery time in decision records.
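The time-to-service metric and the keepalive rule can be sketched as two small helpers. The probe format, the two-success requirement, and the safety factor are assumptions made for this example; carrier NAT timeouts vary and would need to be measured per network.

```python
def time_to_service(switch_ts, probes, required_successes=2):
    """Seconds from a switch until service probes succeed consecutively.

    probes: iterable of (timestamp_s, ok) sorted by time.
    Returns None if service never recovered within the probe log.
    """
    streak = 0
    streak_start = None
    for ts, ok in probes:
        if ts < switch_ts:
            continue  # ignore probes taken before the switch
        if ok:
            if streak == 0:
                streak_start = ts
            streak += 1
            if streak >= required_successes:
                return streak_start - switch_ts
        else:
            streak = 0
    return None


def keepalive_interval_s(nat_timeout_s, safety_factor=3.0, floor_s=5.0):
    """Send several keepalives per NAT timeout, but never faster than floor_s."""
    return max(floor_s, nat_timeout_s / safety_factor)


probes = [(9.0, True), (10.5, False), (11.0, True), (11.5, False),
          (12.0, True), (12.5, True)]
recovery = time_to_service(10.0, probes)   # first stable success starts at t=12.0
```

Requiring consecutive successes avoids counting a single lucky probe as recovery, which matches the idea of logging what users actually feel.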
H2-7. QoS & Traffic Shaping (Protect Ops From Passenger Peaks)
QoS is only valuable when it is provable. The goal is not “configured queues” but a measurable contract: passenger peaks must not inflate operations tail latency. This requires a trusted ingress classification boundary, explicit queue budgets and shaping, and egress proof that links queue behavior to p95 RTT and bandwidth attribution.
Ingress classification
Define a trust boundary: re-mark DSCP at the gateway unless the source is managed. Map each domain to a fixed class.
Queues & shaping
Protect ops with reserved capacity; cap passenger with shaping; penalize reorder-sensitive classes when needed.
Egress proof
Show queue depth/drops, p95 RTT correlation, and top talkers during bursts. Prove that policy—not luck—kept ops stable.
| Proof item | What should be observed | Evidence fields |
|---|---|---|
| Queue depth & drops | Passenger queue grows and drops under peaks; ops queue remains shallow with low drops. | per-class depth, per-class drops, shaped rate |
| Tail latency stability | Ops p95 RTT does not track passenger queue depth spikes; tail remains within budget. | ops p95/p99 RTT, depth vs RTT correlation |
| Bandwidth attribution | During peaks, specific passenger sources can be identified and governed. | top talkers, per-VRF egress, per-client rate |
| Classification integrity | Ops traffic hits the correct class; untrusted DSCP is re-marked at the boundary. | class hit counters, DSCP rewrite counters |
Common pitfalls (and how to detect them): bufferbloat appears as high throughput with exploding p95 RTT and sustained queue depth; misclassification appears as abnormal class hit ratios; VRF bypass appears as missing/abnormal egress counters for the expected policy point.
- Confirm classification first: verify DSCP remarking and class hit counters before tuning queues.
- Use tail metrics: optimize ops p95/p99 RTT, not average throughput.
- Correlate evidence: queue depth spikes should explain latency spikes; if not, check bypass paths and power resets.
- Attribute peaks: top talkers during bursts must be visible to enforce caps and governance.
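A simple way to test the "no shared fate" claim is to check that ops tail latency stays within budget and does not track passenger queue depth. The sketch below uses a nearest-rank percentile and a plain Pearson correlation; the budget and the correlation limit are hypothetical values, and a real check would run over much longer series.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def ops_protected(passenger_depth, ops_p95_rtt_ms,
                  rtt_budget_ms=100.0, max_corr=0.5):
    """True if ops tail latency stayed in budget and decoupled from passenger load."""
    within_budget = max(ops_p95_rtt_ms) <= rtt_budget_ms
    decoupled = pearson(passenger_depth, ops_p95_rtt_ms) < max_corr
    return within_budget and decoupled


# Passenger queue spikes while ops p95 RTT stays flat: protection held.
ops_protected([10, 200, 400, 50], [20.0, 20.0, 20.0, 20.0])
```

If the check fails while class hit counters look normal, the bullets above suggest looking at bypass paths and power resets next.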
H2-8. Ethernet & PoE Integration (Budget, Inrush, Brownout Immunity)
In rail deployments, “random reboots” often trace back to power transients and PoE events: inrush current, input droop, brownout, and reset cascades that look like networking issues. A robust T2G gateway treats PoE as a power system: define roles and budgets, control inrush and sequencing, and keep audit-grade evidence (brownout counters, reset reasons, and minimum input voltage).
PoE roles & budget
PD / PSE / pass-through must be explicit. Port budgets and priorities prevent overload when multiple endpoints attach.
Transient failure chain
PoE enable → inrush → VIN sag → UVLO/brownout → MCU reset → link flap → session loss.
Protection & holdup
Limit inrush, enforce sequencing, and preserve critical state during short drops with holdup and graceful load shedding.
| Incident proof field | Why it matters | Examples |
|---|---|---|
| Brownout counter | Separates power instability from “mysterious network flaps”. Tracks frequency and severity of droops. | brownout_count, brownout_flag |
| Reset reason | Shows whether the reboot came from UVLO/brownout, watchdog, or software paths. | reset_reason enum |
| Minimum VIN | Quantifies droop during PoE enable or load steps; correlates with UVLO thresholds. | VIN_min (window) |
| PoE enable timing | Proves causality: enable sequence aligns with droop and resets. | port_id, enable_ts, duration |
| Thermal derate state | Explains reduced headroom; derating can turn a safe transient into a reset. | derate_state, temp |
Compliance touchpoints (mention-only): rail power environments require wide input range, temperature resilience, and transient tolerance (EN 50155). These constraints should be reflected in budgets, sequencing, and evidence logs.
- Budget before enabling: enforce per-port power budgets and priorities; allow load shedding on passenger ports first.
- Control inrush: sequence PoE enables and avoid simultaneous port start-up that collapses VIN.
- Prove causality: align VIN_min, PoE enable timing, brownout counters, and reset reason in one incident record.
- Protect continuity: prevent resets that cascade into link flaps and session recovery storms.
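The budgeting and sequencing rules can be illustrated with a small planner that admits ports by priority and staggers their enable times. The priority convention, the stagger interval, and the wattages are assumptions for the example; real inrush limits depend on the power stage and the attached devices.

```python
from dataclasses import dataclass


@dataclass
class PortRequest:
    port_id: int
    watts: float
    priority: int  # lower number = more critical (hypothetical convention)


def plan_poe_enable(requests, budget_w, stagger_ms=250):
    """Admit ports by priority within the budget and stagger their enables.

    Returns (schedule, shed):
      schedule: list of (port_id, enable_offset_ms), never simultaneous
      shed:     list of port_ids refused because the budget is exhausted
    """
    schedule, shed = [], []
    remaining = budget_w
    offset_ms = 0
    for req in sorted(requests, key=lambda r: (r.priority, r.port_id)):
        if req.watts <= remaining:
            schedule.append((req.port_id, offset_ms))
            remaining -= req.watts
            offset_ms += stagger_ms  # avoid stacking inrush events
        else:
            shed.append(req.port_id)
    return schedule, shed


requests = [
    PortRequest(1, 15.0, 2),   # passenger access point
    PortRequest(2, 25.0, 0),   # ops camera
    PortRequest(3, 15.0, 1),
    PortRequest(4, 10.0, 2),
]
schedule, shed = plan_poe_enable(requests, budget_w=50.0)
```

Logging each `(port_id, enable_offset_ms)` alongside `VIN_min` gives the causality evidence described in the table above.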
H2-9. Time Sync Across Backhaul (PTP/GNSS/Holdover + Confidence)
A T2G gateway must not become a “time black hole”. The objective is a resilient time service that survives mobility and backhaul variability: multiple sources are arbitrated, the gateway role is selected by deployment constraints, loss-of-lock triggers holdover with bounded drift, and every timestamp is accompanied by a Time Confidence Level that downstream systems can consume.
Time sources (arbiter)
GNSS and network/PTP are inputs—not guarantees. Arbitration must react to health, stability windows, and source transitions.
Gateway role logic
Select boundary/transparent/relay based on whether the gateway must terminate instability and re-distribute time onboard.
Holdover + confidence
When GNSS is lost (tunnels) or backhaul becomes unstable, holdover maintains continuity while confidence degrades automatically.
Time Confidence Level turns “time quality” into a contract. Downstream consumers (logging, signatures, event correlation, monitoring) should branch behavior by confidence (e.g., trusted / degraded / not-trusted) and avoid treating all timestamps as equal.
| Evidence field | What it proves | Examples |
|---|---|---|
| Offset / drift | Quantifies alignment and slope; supports drift budgeting during holdover. | offset p50/p95, drift slope |
| Servo state | Shows locked/holdover/freerun transitions and stability windows. | servo=LOCKED/HOLDOVER |
| GNSS health | Explains tunnel loss-of-lock and antenna faults without guesswork. | gnss_lock, health summary |
| Source selection + reason | Records which source was active and why switching occurred. | source=GNSS/PTP, reason code |
| Time confidence log | Makes time quality machine-consumable and auditable over time. | confidence=L1→L3, transition log |
- Tunnel loss-of-GNSS: switch to network/PTP if stable; otherwise enter holdover and degrade confidence immediately.
- Backhaul instability: avoid oscillation with stability windows; degrade confidence if offset variance exceeds budget.
- Holdover limits: enforce maximum holdover time or drift budget; beyond limits, mark time as not-trusted.
- Recovery anti-flap: require sustained health before upgrading confidence (no instant “green” on brief reacquisition).
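The holdover limit and recovery anti-flap rules can be sketched as a small state tracker. The three level names follow the trusted / degraded / not-trusted example above; the holdover limit and recovery window are hypothetical, and the sketch conservatively keeps the holdover clock running through brief reacquisitions.

```python
from dataclasses import dataclass
from typing import Optional

TRUSTED, DEGRADED, NOT_TRUSTED = "trusted", "degraded", "not-trusted"


@dataclass
class TimeConfidence:
    max_holdover_s: float = 600.0     # hypothetical holdover limit
    recovery_window_s: float = 60.0   # sustained health required before upgrade
    level: str = TRUSTED
    holdover_since: Optional[float] = None
    healthy_since: Optional[float] = None

    def update(self, now: float, source_locked: bool) -> str:
        if source_locked:
            if self.healthy_since is None:
                self.healthy_since = now
            # Upgrade only after sustained health, never on a brief reacquisition.
            if now - self.healthy_since >= self.recovery_window_s:
                self.level = TRUSTED
                self.holdover_since = None
        else:
            self.healthy_since = None
            if self.holdover_since is None:
                self.holdover_since = now
            expired = (now - self.holdover_since) > self.max_holdover_s
            self.level = NOT_TRUSTED if expired else DEGRADED
        return self.level


tc = TimeConfidence()
tc.update(0.0, True)      # trusted
tc.update(10.0, False)    # tunnel entry: degraded immediately
tc.update(20.0, True)     # brief reacquisition: still degraded
```

A drift-budget variant would replace the elapsed-time test with an estimated accumulated offset, using the same transitions.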
H2-10. Hardware Root-of-Trust & Remote Attestation
A T2G gateway is not just a connected box—it is an edge node that must be provably trustworthy. Trust requires a boot chain that cannot be silently altered, keys bound to device identity, measured boot that produces verifiable measurements, and remote attestation that is evaluated automatically (not by humans). Crucially, configuration and policy must be signed as policy-as-code.
Secure boot chain
Boot stages validate the next stage. Failure handling must be explicit: block, safe mode, or constrained maintenance mode.
Measured boot
Critical components are measured into hashes. Measurements become evidence, not just version strings.
Attestation automation
Reports are verified by a ground-side verifier that returns machine-actionable outcomes: PASS/FAIL/QUARANTINE.
Policy-as-code: signing firmware is not enough. Routing rules, ACLs, QoS maps, tunnel policies, and time-sync thresholds must be bundled as versioned policy artifacts with signature verification on-device. If policy verification fails, the system should refuse to apply changes and emit a high-severity audit event.
| Evidence / audit | What it enables | Fields |
|---|---|---|
| Measurement hashes | Detects tampering in boot/OS/services/policy bundle; supports deterministic verification. | hash_boot, hash_os, hash_policy |
| Attestation result | Machine decision for access control and fleet governance. | PASS/FAIL, reason code |
| Policy signature verify | Ensures configuration changes are authorized and traceable. | policy_version, sig_ok |
| Admin login audit | Human accountability: who changed what, when, and how. | who/when/method/change_id |
- Trust extends to runtime policy: critical configuration is inside the trust boundary, not outside it.
- Automate decisions: attestation must drive allow/deny/quarantine actions without manual review.
- Fail predictably: define safe-mode behavior for verification failures to preserve maintainability.
- Audit everything: measurement, policy verification, and admin access logs must align for incident response.
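To make measured boot and automated attestation concrete, the sketch below chains component measurements in the style of a TPM extend operation and maps the result to PASS/FAIL/QUARANTINE. It runs entirely in software with `hashlib`; the component blobs, the reason codes, and the choice to quarantine on a policy signature failure are assumptions for illustration, not the behavior of any specific verifier.

```python
import hashlib


def extend(register: bytes, component: bytes) -> bytes:
    """TPM-style extend: new = SHA-256(old || SHA-256(component))."""
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()


def measure_chain(components):
    """Fold an ordered list of component blobs into one hex measurement."""
    register = bytes(32)  # start from an all-zero register
    for blob in components:
        register = extend(register, blob)
    return register.hex()


def attest(reported_hex, expected_hex, policy_sig_ok):
    """Return (outcome, reason_code) without human review."""
    if reported_hex != expected_hex:
        return "FAIL", "MEASUREMENT_MISMATCH"
    if not policy_sig_ok:
        return "QUARANTINE", "POLICY_SIG_INVALID"
    return "PASS", "OK"


expected = measure_chain([b"bootloader", b"os-image", b"policy-bundle-v1"])
reported = measure_chain([b"bootloader", b"os-image-tampered", b"policy-bundle-v1"])
outcome = attest(reported, expected, policy_sig_ok=True)   # measurement mismatch
```

Because each step hashes the previous register, both the content and the order of components affect the final value, which is why a measurement carries more evidence than a version string.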
H2-11. OTA Updates & Safe Rollback Under Motion/Power Risk
Rail OTA fails for predictable reasons: backhaul variability, motion-driven link changes, power transients, and short maintenance windows. The solution is not “reliable download”, but a gated state machine: downloads can pause and resume, staging is verifiable, commit is only allowed when Power-Good AND Time-Confidence are satisfied, and failures must revert safely without disrupting critical operational domains.
A/B partitions
Run from A (known-good). Stage new image into B, verify, then switch only under commit gate conditions.
Resumable transfer
Chunked download with per-chunk verification and retry budgeting; tolerate link loss without corrupting state.
Rollback safety
Boot/health failures trigger rollback to A with counters and cool-down windows to prevent oscillation loops.
| Stage | What must be true | Evidence fields |
|---|---|---|
| Download | Chunked transfer with bounded retries; progress survives link loss and session changes. | chunk_id, retry_count, bytes_ok |
| Verify | Per-chunk integrity passes; full image integrity matches manifest; signature is valid. | chunk_hash_ok, image_hash_ok, manifest_sig_ok |
| Stage (B) | Written image is re-verified on target partition; storage health supports commit. | stage_verify_ok, storage_health |
| Commit Gate | Power-Good AND Time-Confidence are stable; system is quiescent; temperature not derating. | VIN_min, brownout_delta, derate_state, time_conf_level |
| Activate/Boot | New partition boots and reaches service readiness within window; no critical domain regressions. | boot_ok, ready_ms, domain_ok |
| Rollback | Triggered by boot/health faults; limited by rollback counter and cool-down to prevent loops. | rollback_count, rollback_reason |
Rail-specific constraint: commit must never be attempted during uncertain power or time conditions. Power-Good should be derived from input minima and brownout counters (not a single instant reading). Time-Confidence must be a stable level (e.g., L1/L2) over a window, not a momentary reacquisition.
- Commit gate is an AND rule: Power-Good AND Time-Confidence AND Thermal-OK AND Quiescent.
- Critical domain isolation: OTA must not starve ops/safety traffic; use domain separation and resource limits for updater tasks.
- Pause vs rollback: link failures pause download; integrity failures abort staging; boot/health failures trigger rollback.
- Rollback loop protection: increment rollback_count; apply cool-down; quarantine updates that repeatedly fail with same reason code.
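The commit gate and the pause/abort/rollback split can be sketched as follows. Field names mirror the evidence fields in the table above; the voltage floor, the accepted confidence levels, and the failure-kind labels are hypothetical values picked for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GateInputs:
    vin_min_v: float             # minimum input voltage over the window
    brownout_delta: int          # new brownouts counted during the window
    time_conf_levels: List[str]  # confidence samples over the window
    derate_state: bool           # True while thermally derating
    quiescent: bool              # True when no critical activity is in flight


def commit_allowed(g: GateInputs, vin_floor_v: float = 18.0,
                   ok_levels=("L1", "L2")) -> Tuple[bool, List[str]]:
    """AND of all gate conditions; also returns the names of failed checks."""
    checks = {
        "POWER_GOOD": g.vin_min_v >= vin_floor_v and g.brownout_delta == 0,
        "TIME_CONFIDENCE": bool(g.time_conf_levels)
        and all(level in ok_levels for level in g.time_conf_levels),
        "THERMAL_OK": not g.derate_state,
        "QUIESCENT": g.quiescent,
    }
    blocked = [name for name, ok in checks.items() if not ok]
    return (not blocked, blocked)


def failure_action(kind: str) -> str:
    """Map a failure kind to the response described in the list above."""
    return {
        "link": "PAUSE",
        "integrity": "ABORT_STAGING",
        "boot": "ROLLBACK",
        "health": "ROLLBACK",
    }[kind]


ok, blocked = commit_allowed(GateInputs(24.0, 0, ["L1", "L1", "L2"], False, True))
```

Returning the list of failed checks alongside the boolean makes a deferred commit explainable in the evidence log.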
Material Numbers (MPNs) — reference building blocks
The following MPNs are common “lego bricks” used to implement OTA safety gates (storage integrity, secure boot evidence, power-good/brownout monitoring, hold-up, watchdog) in rugged gateways. Final selection depends on input range, temperature class, and system architecture.
| Function | Suggested MPNs | Why used in OTA safety |
|---|---|---|
| eMMC (A/B) | Micron MTFC16GAPALBH (eMMC) | Non-volatile A/B partitions; supports robust staging and verification. |
| SPI NOR (boot) | Winbond W25Q128JV (SPI NOR) | Bootloader/manifest storage; predictable read behavior for measured boot evidence. |
| TPM / RoT | Infineon SLB9670 (TPM 2.0 family) | Hardware-rooted keys + measurement anchoring for policy signing/verification and attestation evidence. |
| Secure element | Microchip ATECC608B | Device identity and signing/verification primitives for policy bundles and update manifests. |
| eFuse / hot-swap | TI TPS25947 (eFuse) | Inrush limiting and fault protection reduce brownout-induced “commit bricks”. |
| Surge stopper | Analog Devices LTC4368 | Overvoltage/undervoltage protection supports stable Power-Good envelope for commit gating. |
| Ideal diode OR | TI LM74610 | Input ORing / reverse protection; improves resilience during transient events and supply switchover. |
| Supervisor (reset) | TI TPS386000 | Deterministic reset behavior + monitoring supports reliable “power-good window” determination. |
| Watchdog | Analog Devices/Maxim MAX6369 | Forces recovery from updater deadlocks without indefinite partial-update states. |
| RTC | Analog Devices/Maxim DS3231M | Stable local time base for logs when time confidence degrades; improves audit continuity. |
| Temp sensor | TI TMP117 | Thermal-OK gating and derate evidence at commit timestamp. |
| Hold-up / backup | Analog Devices LTC4041 | Helps bridge short power sags to complete critical commit steps or cleanly abort before flash corruption. |
| Oscillator (holdover) | SiTime SiT5356 | Supports time-holdover quality, enabling a meaningful Time-Confidence gate during GNSS loss. |
Tip: tie each MPN-backed mechanism to an evidence field. Example: supervisor/reset and eFuse/hot-swap should feed VIN_min, brownout_delta, and reset_reason; TPM/secure element should feed manifest_sig_ok and policy_sig_ok.
H2-12. Diagnostics & “Incident Bundle” (Make Field Failures Fixable)
Field failures become fixable only when “a network problem” is converted into a bounded, explainable incident. A T2G gateway should generate one signed Incident Bundle per event: a time-aligned link timeline, transport tail metrics, QoS evidence, system health, and security proof. Bundles are stored locally and can be uploaded later when connectivity is stable.
Connectivity triggers
Bearer down, tunnel down, repeated DNS/TLS failures, frequent IP/NAT changes, link score collapse.
Performance triggers
p99 RTT breach over window, loss/jitter/reorder spikes, tail latency correlated with queue depth.
System triggers
Brownout delta, watchdog/reset reason, thermal derating, time-confidence downgrade transitions.
A practical default is a fixed evidence window around the trigger (example: T-60s to T+120s). Apply de-duplication (merge repeated triggers within a short interval) to avoid “log storms” during tunnels or brief coverage gaps.
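The evidence window and de-duplication rule can be sketched as a single pass over trigger timestamps. The 60 s / 120 s window follows the example above; the 30 s de-duplication interval is an assumed value.

```python
def merge_triggers(trigger_ts, pre_s=60.0, post_s=120.0, dedup_s=30.0):
    """Group triggers into evidence windows.

    A trigger within dedup_s of the previous trigger in a window is merged
    into it and extends the window end. Returns a list of
    (window_start, window_end, trigger_count).
    """
    windows = []
    for ts in sorted(trigger_ts):
        if windows and ts - windows[-1]["last"] <= dedup_s:
            current = windows[-1]
            current["last"] = ts
            current["end"] = ts + post_s
            current["count"] += 1
        else:
            windows.append({"start": ts - pre_s, "last": ts,
                            "end": ts + post_s, "count": 1})
    return [(w["start"], w["end"], w["count"]) for w in windows]


# Three triggers during a tunnel collapse into one bundle; a later one stands alone.
bundles = merge_triggers([1000, 1010, 1025, 1200])
```

Windows from separate bundles may still overlap in time; the sketch only prevents repeated triggers from producing repeated bundles.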
| Basket | What it explains | Minimum fields to pack |
|---|---|---|
| Link timeline | What changed first (bearer, handover, IP/NAT, tunnel). Align decisions with outcomes. | bearer up/down, handover start/end + reason, IP change points, tunnel reconnect, link score series |
| Transport tails | Turns “bad network” into measurable tail behavior. | RTT p50/p95/p99, jitter, loss, reorder, throughput-under-load snapshot |
| QoS evidence | Proves whether ops traffic was protected or starved by passenger peaks. | queue depth peaks, drops per queue/class, DSCP/ACL hit counts, top talkers |
| System health | Separates connectivity issues from power/thermal/reset root causes. | temperature, derate_state, VIN_min, brownout_delta, reset_reason, watchdog events |
| Security proof | Answers “was the device/config trusted” and detects policy drift. | attestation PASS/FAIL + reason, measurement hash summary, policy signature OK/FAIL + version, admin login audit |
Bundle contents
manifest + evidence JSON + hashes + signature. Store-and-forward for unstable backhaul.
Signing rules
Sign the manifest and the hashes of evidence files to make the bundle tamper-evident.
Deferred upload
Upload only when stable: rate-limited, non-critical window, retry with backoff.
| File | Purpose | Notes |
|---|---|---|
| manifest.json | Incident id, trigger, time window, software/policy versions, hash list. | Small, always present |
| timeline.json | Bearer/handover/IP/tunnel timeline aligned to time-confidence. | Link-first causality |
| tail_metrics.json | RTT/jitter/loss/reorder percentiles and snapshots. | Prefer tails p95/p99 |
| qos_evidence.json | Queue depths, drops, classification hits, top talkers. | Proof of protection |
| system_health.json | Thermal/power/reset/watchdog evidence around incident. | Power-good context |
| security_evidence.json | Attestation result + config signature verification + admin audit. | Trust + drift |
| signature.sig | Signature covering manifest + evidence hashes. | Tamper-evident |
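A minimal sketch of the bundle layout above: evidence files are hashed into `manifest.json`, and the manifest is then authenticated. HMAC-SHA256 from the standard library stands in for the device signature here purely so the example is runnable; a real gateway would presumably sign with an asymmetric key held in the TPM or secure element. The function names and the in-memory file map are hypothetical.

```python
import hashlib
import hmac
import json


def build_bundle(incident_id, trigger, evidence, key):
    """evidence: {filename: dict}. Returns {filename: bytes} for the bundle."""
    files = {name: json.dumps(body, sort_keys=True).encode()
             for name, body in evidence.items()}
    manifest = {
        "incident_id": incident_id,
        "trigger": trigger,
        "hashes": {name: hashlib.sha256(data).hexdigest()
                   for name, data in files.items()},
    }
    files["manifest.json"] = json.dumps(manifest, sort_keys=True).encode()
    # Stand-in for a hardware-backed signature over the manifest.
    tag = hmac.new(key, files["manifest.json"], hashlib.sha256).hexdigest()
    files["signature.sig"] = tag.encode()
    return files


def verify_bundle(files, key):
    """Check the manifest tag first, then every evidence hash it lists."""
    expected = hmac.new(key, files["manifest.json"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), files["signature.sig"]):
        return False
    manifest = json.loads(files["manifest.json"])
    return all(hashlib.sha256(files.get(name, b"")).hexdigest() == digest
               for name, digest in manifest["hashes"].items())


bundle = build_bundle("inc-0001", "TUNNEL_DOWN",
                      {"timeline.json": {"events": []}}, key=b"demo-key")
```

Because the signature covers the manifest and the manifest covers each evidence hash, changing any evidence file after the fact makes verification fail.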
A simple, high-value triage pattern is: (1) timeline first, then check whether (2) tails align with (3) queue evidence, and finally eliminate (4) power/thermal resets and (5) trust/policy drift.
Material Numbers (MPNs) — reference parts that support “signed incident bundles”
These MPNs commonly appear in rugged gateways to make incident capture reliable: secure signing anchors, durable local storage, watchdog/reset determinism, power-good/brownout evidence, and stable timestamps for audit continuity.
| Need | Suggested MPNs | How it helps incident bundles |
|---|---|---|
| TPM / RoT | Infineon SLB9670 (TPM 2.0 family) | Anchors signing keys and measurement evidence; supports attestation results included in bundles. |
| Secure element | Microchip ATECC608B | Device identity + signing/verification for bundle signatures and policy signature checks. |
| eMMC (local store) | Micron MTFC16GAPALBH (eMMC) | Durable local store for ring-buffer bundles; supports staging and retention policies. |
| SPI NOR (boot logs) | Winbond W25Q128JV | Stable storage for minimal boot/audit artifacts or fallback evidence markers. |
| Watchdog | Analog Devices/Maxim MAX6369 | Prevents collector deadlocks; ensures incidents are captured or recovered deterministically. |
| Supervisor / reset | TI TPS386000 | Captures clean reset behavior and power-fail context (evidence for brownout vs network issues). |
| eFuse / inrush | TI TPS25947 (eFuse) | Reduces transient-induced resets; also enables meaningful “Power-Good” gating evidence. |
| Surge stopper | Analog Devices LTC4368 | Protects supply envelope; improves reliability of VIN_min and brownout evidence logging. |
| RTC (audit time) | Analog Devices/Maxim DS3231M | Maintains audit timestamps when backhaul time confidence degrades; keeps bundles alignable. |
| Temp sensor | TI TMP117 | Thermal/derate evidence for incident windows; separates RF issues from thermal throttling. |
| Hold-up / backup | Analog Devices LTC4041 | Bridges brief sags so evidence can be flushed and signed rather than lost on sudden power drop. |
| Oscillator (holdover) | SiTime SiT5356 | Supports time-holdover quality so time-confidence and event alignment remain meaningful in tunnels. |
Implementation hint: map each MPN-backed function to a bundle field. Example: TPS386000 + TPS25947 should feed VIN_min, brownout_delta, reset_reason. SLB9670/ATECC608B should feed signature.sig, attestation_result, policy_sig_ok.
H2-13. FAQs
Each answer follows the same field-ready pattern: 1 conclusion + 2 evidence checks + 1 first fix, mapped back to the relevant chapters for quick verification.