Diagnostics / Gateway / TCU: Multi-bus, DoIP, OTA, Security
A diagnostics/gateway/TCU is the vehicle's “traffic + policy” core: it bridges multi-bus in-vehicle networks to Ethernet and external services while keeping diagnostics and OTA stable, recoverable, and secure.
This page provides an engineering path from architecture and bridging rules to serviceability logs, OTA state machines, security enforcement, and measurable pass criteria—so failures are contained and systems remain operable in real vehicles.
Definition & Boundary: What “Diagnostics / Gateway / TCU” Means Here
This page defines a system-level gateway/TCU as a network boundary node that bridges in-vehicle buses to Automotive Ethernet and external services, while enforcing diagnostic/OTA/security policies and producing service-grade observability.
- Traffic boundary: filtering, rate limiting, prioritization, and fault containment.
- Policy execution point: access control, routing rules, and session gates.
- Serviceability anchor: consistent logging, counters, and traceability IDs.
- Domain compute: coordinates domain functions (body/chassis/powertrain/infotainment).
- May host partial gateway functions, but priority is domain feature execution.
- Interfaces are treated as abstract ports; physical-layer tuning belongs to bus-specific pages.
- External termination: cellular/Wi-Fi/VPN/TLS endpoints and cloud connectivity.
- OTA & remote diagnostics orchestration with recoverable state machines.
- Often doubles as the secure gateway boundary for access and update authorization.
- Inputs: in-vehicle frames (CAN/LIN/FlexRay/Ethernet), diagnostic sessions (DoIP/service tool), OTA campaigns, security policies, timing base.
- Outputs: policy-compliant forwarding/bridging, bounded latency/throughput behavior, auditable security events, traceable diagnostic logs, OTA state transitions with recovery.
- Deliverables: architecture boundary map, rule tables (filter/limit/route), minimal observability schema (fields + counters), verification hooks.
- Multi-bus to Ethernet bridging logic (filter/limit/queue/fault containment).
- DoIP/diagnostic path engineering (session gates, address mapping, serviceability logging).
- OTA lifecycle reliability (state machine, rollback, dependency control, power-loss recovery).
- Secure gateway integration (trust chain, keys, policy enforcement, auditability).
- CAN/CAN-FD/SIC/XL waveform timing, sample-point tuning, termination values → see CAN FD Transceiver, SIC/SIC-XL, CAN XL PHY.
- LIN physical-layer slew/auto-baud electrical details → see LIN Transceiver.
- FlexRay port electrical tuning and topology specifics → see FlexRay Transceiver.
- CMC/TVS placement and exact protection parasitics → see EMC / Protection & Co-Design.
Diagram intent: show the gateway/TCU as the boundary control point between zonal buses, Ethernet backbone, and external diagnostics/OTA services.
System Architecture: Data Plane / Control Plane / Management Plane
A robust gateway/TCU architecture separates fast-path forwarding (data plane), decision-making and policy (control plane), and long-horizon operations (management plane). This prevents diagnostic/OTA/security features from destabilizing real-time traffic.
- Ingress parsing → classification → policy match → queueing → scheduling → egress shaping.
- Hard requirements: bounded latency, bounded queue growth, and controlled loss under congestion.
- Fault containment: isolate noisy ports/sessions before they impact the rest of the vehicle network.
- Diagnostic sessions: admission control, authorization levels, and timeout policies.
- Routing & filtering rules: versioned distribution, rollback, and safe defaults on mismatch.
- Error handling: degrade modes, reset boundaries, and safe recovery sequencing.
- Observability: structured logs, counters, traces, and health snapshots for field triage.
- Configuration & versions: rule packs, certificates, OTA campaigns, and policy baselines.
- Timebase consistency: unified timestamp source for auditability and cross-ECU correlation.
- Classification keys: bus type, source ECU, service class (control / diagnostics / OTA / logging), and safety criticality.
- Queue model: dedicate queues per service class; protect real-time control from diagnostic bursts via strict priority or minimum service.
- Rate limiting: enforce per-session/per-source caps; apply backoff to misbehaving testers to avoid starvation and watchdog resets.
- Congestion policy: define drop order (e.g., bulk logs before control), and record drops with reasons for field analysis.
- Fault containment: circuit-breaker rules for repeated timeouts/resets; isolate the port rather than rebooting the entire gateway.
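The per-session caps and tester backoff described above can be sketched as a token bucket whose effective rate shrinks when repeated timeouts indicate a retry storm. This is a minimal illustration; the class name, default cost, and backoff bounds are assumptions, not values from any real stack.

```python
class SessionRateLimiter:
    """Token-bucket cap for one session/source, with exponential backoff
    when repeated timeouts suggest a retry storm.
    Names, rates, and backoff bounds are illustrative placeholders."""

    def __init__(self, rate_per_s: float, burst: float):
        self.rate = rate_per_s      # sustained refill rate (tokens/s)
        self.burst = burst          # bucket capacity (burst allowance)
        self.tokens = burst
        self.last = 0.0
        self.backoff = 1.0          # >1 shrinks the effective rate

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Charge one frame/request; False means drop/shape (emit a reason code)."""
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate / self.backoff)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def report_timeout(self) -> None:
        # Misbehaving tester: halve the effective rate, bounded at 1/16.
        self.backoff = min(self.backoff * 2.0, 16.0)

    def report_success(self) -> None:
        # Healthy traffic gradually restores the full rate.
        self.backoff = max(self.backoff / 2.0, 1.0)
```

A per-source instance of this limiter sits in front of the diagnostic queue, so a flooding tester exhausts only its own bucket instead of starving control traffic.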
Diagnostic/OTA incidents are rarely reproducible without a consistent schema. Define a minimal field set and enforce it across sessions.
- Session: session_id, tester_id, auth_level, start/stop timestamp, timeout reason.
- Routing: rule_pack_version, mapping_id, source_port, dest_port, action (forward/drop/shape).
- Performance: queue_id, queue_depth_peak, drop_count, p99_latency (X ms), throughput (X Mbps).
- Security: cert_version, key_id, policy_decision, violation_type, audit_id.
- Reliability: reboot_cause, watchdog_stage, power_event_flag, recovery_state.
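One lightweight way to enforce such a schema is to validate every emitted record against a required-field set per category. The snake_case names below are illustrative renderings of the fields listed above (e.g. the start/stop timestamps appear here as hypothetical `ts_start`/`ts_stop`).

```python
# Minimal mandatory field sets per log category, mirroring the schema above.
# Exact field names (ts_start, ts_stop, ...) are illustrative assumptions.
REQUIRED_FIELDS = {
    "session":     {"session_id", "tester_id", "auth_level", "ts_start", "ts_stop", "timeout_reason"},
    "routing":     {"rule_pack_version", "mapping_id", "source_port", "dest_port", "action"},
    "performance": {"queue_id", "queue_depth_peak", "drop_count", "p99_latency_ms", "throughput_mbps"},
    "security":    {"cert_version", "key_id", "policy_decision", "violation_type", "audit_id"},
    "reliability": {"reboot_cause", "watchdog_stage", "power_event_flag", "recovery_state"},
}

def missing_fields(category: str, record: dict) -> set:
    """Return the mandatory fields a record lacks; an empty set means valid."""
    return REQUIRED_FIELDS[category] - record.keys()
```

Running this check at log-emit time (or in CI against sample records) keeps the schema enforced across teams rather than merely documented.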
- Allowed: “port type, throughput class, error model, wake/sleep impact, diagnostics session behavior”.
- Forbidden: “sample point, termination components, SIC waveform symmetry, TVS/CMC parasitic tuning”.
- Action: when forbidden terms appear, provide a one-line context and link to the dedicated PHY/EMC page.
Diagram intent: enforce a clean split—data plane stays deterministic; control plane decides; management plane records and operates.
Multi-bus ↔ Ethernet Bridging Fundamentals
Bridging is a data-path engineering problem: classify traffic, apply policy, protect control flows with queueing and limits, and contain faults so one noisy endpoint cannot destabilize the entire vehicle network.
- Fit: Ethernet-to-Ethernet segments where L2 boundaries are explicitly controlled.
- Risk: broadcast/unknown-unicast storms and uncontrolled fan-out under misconfiguration.
- Rule: require storm control and strict isolation policies; do not rely on “best effort” forwarding.
- Fit: CAN/LIN/FlexRay ↔ Ethernet where traffic is mapped by ID/address/service class.
- Risk: rule-table growth, rule conflicts, and “works on bench, fails in field” version drift.
- Rule: version rule packs and default-safe behavior on mismatch (deny/shape + audit).
- Fit: diagnostics/OTA/security where the gateway must enforce authorization and produce audit trails.
- Risk: state explosion and resource exhaustion if proxy logic leaks into the fast path.
- Rule: keep proxy decisions in the control plane; keep the data plane deterministic.
- Filter keys: port, source ECU, service class (control / diagnostics / OTA / logs), and session identity.
- Admission control: bound concurrent diagnostic sessions; reject excess sessions with a reason code and audit log.
- Rate caps: apply per-session and per-source limits; add backoff when repeated timeouts indicate retry storms.
- Congestion policy: define drop order that protects control traffic; record drop reason for field triage.
- Safe defaults: unknown traffic is shaped or denied (never broadcasted) and always audited.
- Service-class queues: Control / Diagnostics / OTA / Logs as the primary split.
- Fairness knobs: per-ECU or per-session sub-queues to prevent single-source starvation.
- Scheduling: strict priority or minimum-service guarantees for control; diagnostics are shaped, not “randomly dropped”.
- Explainability: every throttle/drop should map to a policy rule and emit a reason code.
- Isolation domains: by port, by session, and by service class (preferred for service stability).
- Circuit breaker states: Closed → Open → Half-open; transitions require explicit reasons and timers.
- Degrade modes: keep control + restrict diagnostics + pause OTA; or service-only mode for recovery.
- Auditability: every isolation event produces an audit_id and correlates with queue and session metrics.
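The Closed → Open → Half-open breaker with explicit reasons and timers can be sketched as below. The failure threshold, open timer, and reason strings are illustrative assumptions; a real implementation would emit the audit tuples into the structured log pipeline.

```python
class CircuitBreaker:
    """Per-port/session/service-class isolation: Closed -> Open -> Half-open.
    Every transition records an explicit reason for auditability.
    Thresholds and timer values are illustrative placeholders."""

    def __init__(self, fail_threshold: int = 3, open_for_s: float = 5.0):
        self.state = "closed"
        self.failures = 0
        self.fail_threshold = fail_threshold
        self.open_for_s = open_for_s
        self.opened_at = 0.0
        self.audit = []  # list of (timestamp, new_state, reason)

    def _transition(self, now: float, new_state: str, reason: str) -> None:
        self.state = new_state
        self.audit.append((now, new_state, reason))

    def on_failure(self, now: float) -> None:
        self.failures += 1
        if self.state == "half_open":
            self.opened_at = now
            self._transition(now, "open", "probe_failed")
        elif self.state == "closed" and self.failures >= self.fail_threshold:
            self.opened_at = now
            self._transition(now, "open", "fail_threshold_reached")

    def on_success(self, now: float) -> None:
        if self.state == "half_open":
            self.failures = 0
            self._transition(now, "closed", "probe_ok")

    def allows_traffic(self, now: float) -> bool:
        # The open timer, not a manual reset, moves the breaker to half-open.
        if self.state == "open" and now - self.opened_at >= self.open_for_s:
            self._transition(now, "half_open", "open_timer_elapsed")
        return self.state != "open"
```

Isolating at this granularity keeps a single noisy port contained while the rest of the gateway continues forwarding.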
| Ingress | Class | Match Keys | Action | Queue & Rate | Exception | Audit |
|---|---|---|---|---|---|---|
| CAN Port A | Control | ECU / ID range | Forward | Q1 · min service | Service mode | rule_pack_version |
| Tester / DoIP | Diagnostics | session_id / auth_level | Proxy / Shape | Q2 · cap X | Factory mode | reason_code |
| Cloud / OTA | OTA | campaign_id / version | Shape / Pause | Q3 · cap X | Degrade mode | audit_id |
Implementation rule: every forward/drop/shape decision must be attributable to a single policy row and emit a stable reason code.
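The single-row attribution rule can be sketched as a first-match lookup that falls through to a deny-by-default row. Ingress/class keys and reason-code strings are illustrative assumptions modeled on the table above.

```python
# First-match policy table with a safe default row. Keys mirror the table
# above; the concrete ingress names and reason codes are illustrative.
POLICY_ROWS = [
    {"ingress": "can_a", "cls": "control",     "action": "forward", "queue": "Q1", "reason": "RC_CTRL_FWD"},
    {"ingress": "doip",  "cls": "diagnostics", "action": "shape",   "queue": "Q2", "reason": "RC_DIAG_SHAPE"},
    {"ingress": "cloud", "cls": "ota",         "action": "shape",   "queue": "Q3", "reason": "RC_OTA_SHAPE"},
]
DEFAULT_ROW = {"action": "deny", "queue": None, "reason": "RC_DEFAULT_DENY"}

def classify(ingress: str, cls: str) -> dict:
    """Return the single policy row a frame maps to; unknown traffic is denied
    (never broadcast) and the default row's reason code is audited."""
    for row in POLICY_ROWS:
        if row["ingress"] == ingress and row["cls"] == cls:
            return row
    return DEFAULT_ROW
```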
Diagram intent: show where decisions happen (policy), where protection happens (queues/limits), and where containment happens (circuit breaker).
Diagnostics Path: DoIP Session, Addressing, and Serviceability
Stable diagnostics requires three things: a deterministic admission gate, a versioned address-mapping contract, and a minimal logging schema that makes field failures explainable without reproducing the exact harness setup.
- Session overload: too many concurrent sessions or retries can starve control traffic; check admission counters and queue watermarks first.
- Mapping drift: address-table version mismatch can look like random timeouts; verify mapping_id and rule_pack_version alignment.
- Policy mismatch: an auth-level downgrade may cause silent rejects; require explicit reason codes and audit IDs.
- Resource collapse: CPU/memory spikes or encryption overhead can trigger watchdog resets; correlate session events with health logs.
- Goal: translate DoIP-side logical addressing into stable target identities without ambiguity.
- Versioning: every mapping must carry mapping_id and rule_pack_version for field correlation and rollback.
- Conflicts: duplicates and gaps must resolve to safe defaults (reject/shape + audit) rather than “best-effort forward”.
- Self-check: validate coverage, uniqueness, and default actions before enabling service mode in production.
- Read-only: lowest risk, still requires session identity and rate limits.
- Write: requires elevated auth_level and strict per-target quotas; reject is never silent.
- Programming/flash: highest risk; enforce strong authorization, maintenance mode constraints, and mandatory audit trails.
- Reject behavior: always return a stable reason_code and record an audit_id with timestamps and latency.
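The auth-level gate with never-silent rejects can be sketched as follows. The minimum levels, session cap, and reason/audit formats are illustrative assumptions, not values from a real diagnostic stack.

```python
import itertools

# Minimum auth_level per operation class (illustrative values).
MIN_AUTH = {"read": 1, "write": 2, "flash": 3}
_audit_seq = itertools.count(1)  # stand-in for a durable audit-id source

def admit(op: str, auth_level: int, active_sessions: int,
          max_sessions: int = 4) -> tuple:
    """Return (admitted, reason_code, audit_id). Reject is never silent:
    every decision carries a stable reason code and an audit id."""
    audit_id = f"AUD-{next(_audit_seq):06d}"
    if active_sessions >= max_sessions:
        return False, "RC_SESSION_LIMIT", audit_id
    if auth_level < MIN_AUTH[op]:
        return False, "RC_AUTH_TOO_LOW", audit_id
    return True, "RC_OK", audit_id
```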
- Time: timestamp (unified timebase), duration/latency_ms.
- Session: session_id, tester_id, auth_level, start/stop reason.
- Target: target_ecu, logical_addr, physical_addr (or stable target ID).
- Operation: service_id, payload_len, status (ok/fail/timeout/reject), reason_code.
- Resources: queue_id, queue_depth, drop_count_delta, throttle_events.
- Versions: rule_pack_version, mapping_id, cert_version (if applicable).
Diagram intent: show the auth gate as a mandatory step and mark audit points that enable field debugging without reproducing the entire setup.
OTA Lifecycle: State Machine, Rollback, and Dependency Control
Automotive-grade OTA requires recoverability. The lifecycle must define durable state per step, strict activation gates, and rollback behavior that keeps the vehicle serviceable under power loss, weak links, or reboots.
- Campaign: define target set, allowed windows, rollout groups, and dependency rules as a versioned contract.
- Download: chunked transfer with resume; rate caps protect control traffic under weak links.
- Verify: signature and hash checks; reject on mismatch with explicit reason codes and audit IDs.
- Install: write to staging/A-B slot; keep the current image untouched until activation is safe.
- Activate: switch pointers/slots via an atomic flag; ensure a deterministic boot path.
- Confirm: commit only after health signals pass; otherwise trigger rollback or safe service mode.
- Durable progress: each step persists minimal fields to resume or roll back without guessing.
- Atomic boundaries: define where interruption is allowed (download, staged install) vs guarded (activate switch).
- Restart rules: on reboot, recover from the last durable state and follow deterministic transitions.
- Failure semantics: verification failures never activate; install failures keep the old image bootable.
- Triggers: boot-loop counters, health-check failure, missing critical services, or explicit negative confirmation.
- Confirm window: commit only after stable operation across defined cycles/time; otherwise rollback automatically.
- Non-rollback cases: if rollback is disallowed (e.g., mandatory security update), fail into a restricted safe mode with full diagnostics.
- Auditability: every rollback records audit_id, reason_code, and the last known durable state.
- Version matrix: define compatible sets and minimum versions; reject activation when dependencies are not satisfied.
- Ordering: enforce explicit sequences per domain role (gateway services, targets, then optional modules) with rollback points.
- Stop-loss: if any critical ECU fails, pause the campaign and keep the vehicle in a known serviceable mode.
- Confirm scope: confirmation checks both ECU health and dependency satisfaction as a whole.
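A version-matrix check of this kind can be sketched as below; representing versions as comparable tuples and the ECU names used are illustrative assumptions.

```python
def deps_satisfied(installed: dict, min_versions: dict) -> list:
    """Return the list of unmet dependencies; an empty list allows activation.
    Versions are (major, minor) tuples so ordinary tuple ordering applies."""
    unmet = []
    for ecu, min_v in min_versions.items():
        have = installed.get(ecu)
        if have is None or tuple(have) < tuple(min_v):
            unmet.append(ecu)
    return unmet
```

Activation is rejected (with the unmet list logged) whenever this returns a non-empty result, which realizes the "reject activation when dependencies are not satisfied" rule.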
| State | Entry | Do | Durable Fields | Exit | Fail → | Safety Note |
|---|---|---|---|---|---|---|
| Download | Campaign accepted | Chunked fetch + resume | package_id, received_ranges, bytes_done, hash_state | All chunks received | Retry / Pause | Old image untouched |
| Verify | Download complete | Signature + hash checks | signature_ok, hash_ok, verified_version, audit_id | Verified | Abort | Never activate on fail |
| Install | Verify ok | Write to staging slot | staging_slot, write_offset, progress, result | Installed | Retry / Abort | Old boot slot preserved |
| Activate | Install ok | Atomic slot switch | next_boot_slot, activation_flag, time | Boot new image | Rollback | Switch must be atomic |
| Confirm | Boot ok | Health checks + commit | confirm_deadline, signals, result, reason_code | Committed | Rollback / Safe | Vehicle stays serviceable |
Implementation rule: durable fields must be sufficient to determine the next state after reboot without heuristic guesses.
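The reboot-recovery rule can be expressed as a pure function of the persisted fields: the durable record alone selects the next state, with no heuristics. Field names follow the table above; the flattened flag `received_ranges_complete` and the decision order are illustrative assumptions.

```python
def recover_state(durable: dict) -> str:
    """Map persisted durable fields to the next OTA state after reboot.
    Checks run from latest step backwards so a partially completed later
    step always wins over earlier ones."""
    if durable.get("confirm_result") == "committed":
        return "Done"
    if durable.get("activation_flag"):          # atomic switch already armed
        return "Confirm"                        # boot new image, run health checks
    if durable.get("install_result") == "ok":
        return "Activate"
    if durable.get("signature_ok") and durable.get("hash_ok"):
        return "Install"
    if durable.get("received_ranges_complete"):
        return "Verify"
    return "Download"                           # resume from received_ranges
```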
- Campaign: campaign_id, target_set_hash, rollout_group, policy_window
- Download: package_id, received_ranges, bytes_done, chunk_hash_state
- Verify: signature_ok, hash_ok, verified_version, audit_id
- Install: staging_slot, write_offset, install_progress, install_result
- Activate: next_boot_slot, activation_flag, activation_time
- Confirm: confirm_deadline, health_signals, confirm_result, reason_code
- Audit: audit_id, mapping_id, rule_pack_version (for correlation with gateway policies)
Diagram intent: highlight durable persistence points and show rollback/safe-mode paths without dense text.
Secure Gateway Integration: Trust Chain, Keys, and Policy Enforcement
Secure gateways are operational systems: a trust chain anchors identity, key management keeps credentials alive across the vehicle lifecycle, and policy enforcement produces auditable decisions for diagnostics and OTA.
- Secure boot: establishes a trusted software identity for the gateway/TCU runtime.
- HSM/root key: anchors cryptographic operations; private roots remain non-exportable.
- Runtime identity: produces stable device_id/cert_id used by sessions and policy decisions.
- Policy tie-in: identity + session attributes map to allow/deny/shape decisions with audit IDs.
- Lifecycle: issue → activate → rotate → revoke/expire with explicit ownership and audit trails.
- Rotation: time-based and event-based rotation; support rollback of configuration but not of root identity.
- Revocation: define behavior on expired/revoked credentials (restricted mode vs deny-all) by policy.
- Factory injection: bind identity to hardware roots; record injection batch and provisioning version.
- Service updates: update credentials in service mode without breaking diagnostics access.
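The lifecycle above can be pinned down as an explicit transition table in which anything not listed is a forbidden (and auditable) transition. State and event names are illustrative; real programs add provisioning and restricted-mode states.

```python
# Allowed credential lifecycle transitions: issue -> activate -> rotate ->
# revoke/expire. Rotation re-enters "active"; nothing re-activates a revoked
# credential, matching "no rollback of root identity".
ALLOWED = {
    ("issued", "activate"): "active",
    ("active", "rotate"):   "active",
    ("active", "revoke"):   "revoked",
    ("active", "expire"):   "expired",
}

def transition(state: str, event: str) -> str:
    """Return the new state, or raise on a forbidden transition so the
    caller can log an audit event instead of silently continuing."""
    try:
        return ALLOWED[(state, event)]
    except KeyError:
        raise ValueError(f"forbidden credential transition: {state} -> {event}")
```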
- Termination point: terminate at the gateway for fine-grained audit/policy, or upstream for simplified roles; make it explicit.
- Certificate ownership: define who rotates and who revokes; enforce a single source of truth for cert versions.
- Failure semantics: handshake failure routes to restricted mode or deny by policy; never fall back silently.
- Audit: record peer_id, cert_id, channel, and reason_code for every deny or downgrade.
- Network allowlist: permitted peers, ports, and services; default deny for unknown traffic.
- Diagnostics privilege: read/write/flash mapped to auth_level; enforce quotas per session and per target.
- OTA authorization: campaign must be signed/approved; activation depends on dependency satisfaction and policy windows.
- Domain isolation: cross-domain flows require explicit permits; log every cross-boundary decision.
- Rate anomalies: sudden spikes per peer/service; link to throttling and circuit-breaker triggers.
- Session anomalies: abnormal failures, timeouts, or concurrency patterns; tie to admission gates.
- Replay-like patterns: repeated identical requests in short windows; enforce policy-based rejection and auditing.
- Actionability: anomaly signals must trigger degrade/isolation/audit escalation instead of being passive dashboards.
Diagram intent: show a closed loop from trust anchoring to policy enforcement and auditability, with anomaly inputs driving action.
Functional Safety & Reliability: Fail-Operational vs Fail-Silent
A gateway/TCU failure must have a defined outcome. Safety objectives grade impacts by function class, health monitoring closes the loop from detection to action, and degraded modes preserve serviceability without allowing fault propagation.
- Fail-Operational: preserve a minimal set of essential functions under fault, typically via controlled degradation.
- Fail-Silent: stop emitting potentially harmful effects, isolate external connectivity, and block cross-domain propagation.
- Per-function decision: control, diagnostics, OTA, and external connectivity do not share the same allowed outcome.
- Evidence requirement: every degrade/isolate decision must be explainable with audit_id and reason_code.
| Function Class | Failure Impact | Required Outcome | Recovery Target | Audit Minimum |
|---|---|---|---|---|
| Control (critical) | High | Fail-operational (minimal set) or isolate to local domain | ≤ X s | mode_id, detector_id, action_id, audit_id |
| Diagnostics | Medium | Degraded (service-only) with strict privileges | ≤ X s | session_id, reason_code, audit_id |
| OTA | Medium | Fail-safe pause + recover/rollback (no partial activation) | ≤ X min | campaign_id, state, audit_id |
| External connectivity | High | Fail-silent by default (isolate) unless explicitly allowed | ≤ X s | peer_id, channel, rule_id, reason_code |
Implementation note: safety objectives must map to explicit degraded modes; “undefined behavior” is treated as a design failure.
- Deadlock / stalls: heartbeat gaps, scheduling delay spikes, and unresponsive service endpoints.
- Memory pressure: heap high-watermark, allocation failures, handle growth, and leak-rate estimates.
- Queue blockage: queue depth saturation, tail latency, drops, and backpressure trigger counts.
- Session health: abnormal timeouts, failed handshakes, and runaway concurrency.
- Soft recovery: restart a service, clear a stuck queue, re-load rule packs, re-open sessions.
- Containment: circuit breaker, rate limiting, deny-by-policy, cross-domain blocking.
- Hard recovery: controlled reboot, revert to last known-good config, enter service-only mode.
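The three recovery tiers form an escalation ladder: repeated failures at one level escalate to the next. The repeat-count thresholds below are illustrative placeholders, not values from a real program.

```python
def next_action(repeat_count: int) -> str:
    """Pick the recovery tier for a detector that has fired repeat_count
    times since the last healthy interval (thresholds are placeholders)."""
    if repeat_count <= 2:
        return "soft_recovery"   # restart service, clear queue, reload rule pack
    if repeat_count <= 4:
        return "containment"     # circuit breaker, rate limit, deny-by-policy
    return "hard_recovery"       # controlled reboot, last known-good, service-only
```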
| Mode | Allowed | Denied | Enter | Exit | Required Logs |
|---|---|---|---|---|---|
| Mode 1 Control-preserve | Minimal control flows, critical routing, bounded queues | External access, non-essential cross-domain traffic | Resource stress, repeated stalls | Health passes Y cycles | mode_id, detector_id, action_id |
| Mode 2 Service-only | Diagnostics + logging, strict auth_level gates | OTA activation, broad routing, external sessions | Policy failure, rule-pack mismatch | Service confirmation | session_id, rule_id, reason_code |
| Mode 3 Silent / Isolated | Local safe logging only, bounded watchdog recovery | External connectivity, cross-domain flows | Untrusted state, repeated failed recoveries | Manual service reset | audit_id, fault_id, last_state |
| Injection | Expected Detector | Expected Action | Pass Criteria |
|---|---|---|---|
| CPU saturation (X%) | sched-delay / heartbeat timeout | rate limit + degrade to Mode 1 | detect ≤ X ms; action ≤ X ms |
| Memory allocation failures | heap watermark / alloc-fail counter | restart service; if repeated → Mode 2 | recovery ≤ X s; logs complete |
| Queue blockage / flood | queue depth + tail latency | circuit breaker + drop policy | no cross-domain collapse |
| Policy engine stalls | policy heartbeat + rule-pack integrity | Mode 2 or Mode 3 depending on trust | deny-by-default is enforced |
Diagram intent: present a complete reliability loop with explicit actions and audit evidence, without protocol-level details.
Physical Integration Envelope: EMC & Protection Boundaries for Gateways
Gateway/TCU ports define the system boundary. Protection must be layered from the connector inward, return paths must be planned, and configurable drive/slew must follow a system policy that balances emission, margin, and robustness.
- Connector / shield: define the physical entry and shield bonding point(s).
- Surge layer: manage energy and return paths; keep surge loops out of sensitive signal ground.
- ESD layer: clamp fast events close to the entry; minimize inductive distance to the return path.
- Common-mode layer: suppress radiated/common-mode energy before the clean IC domain.
- Clean domain: keep protocol ICs inside a clearly defined “quiet zone” behind the protection stack.
- Keep “dirty return” away: ESD/surge return must not traverse sensitive digital/analog reference regions.
- Shield continuity: bonding must be explicit; floating or intermittent shield connections often amplify emissions.
- Single controlled tie: define where clean reference and chassis/body ground connect (if required), and keep it deterministic.
- Black-box logging: record port, trigger type, and the resulting action (reset/isolate) for service correlation.
- Slower edges: reduce emissions but shrink timing margin and increase sensitivity to noise and loading.
- Stronger drive: improves robustness but can increase crosstalk and radiated energy.
- Policy table: define profiles by harness class (length, node count, environment) and validate against a fixed checklist.
- Harness A: low emission profile (slew low, drive medium)
- Harness B: balanced profile (slew medium, drive medium)
- Harness C: high robustness profile (slew high, drive high with strict containment)
- Uncontrolled ground potential differences: domains with unpredictable reference offsets across operating conditions.
- High disturbance interfaces: external-facing ports or long harness segments with strong coupling risk.
- Security partitioning: boundaries where external connectivity must not influence safety-critical domains.
- Fault containment: when a single-port event must not impact the rest of the network.
| Port Type | Layer Stack | Placement | Return Target | Risk Notes |
|---|---|---|---|---|
| Ethernet / External | Surge → ESD → CM → Clean | PCB edge / connector-side | Chassis / body ground | Reset, false wake, session drops |
| Diagnostics port | ESD → CM → Clean | Connector-side | Controlled tie point | False diagnostics triggers |
| Power entry | Surge → ESD → Filtering | At entry + tight loop | Chassis/body + power return | Brownout / reset storms |
- “Dirty return” paths do not cross clean reference regions.
- Shield bonding is explicit and mechanically reliable.
- Protection components are connector-close with short return loops.
- Clean/dirty zone boundary is drawn and enforced in layout review.
Diagram intent: visualize protection layering at the connector and emphasize return-path direction without waveform-level details.
Performance Budgeting: Latency, Throughput, CPU/Memory, and Congestion
Budgeting turns performance into engineering contracts: an end-to-end latency model with measurable boundaries, a throughput-to-resource accounting view, congestion protections that preserve serviceability, and pass criteria that can be accepted in production.
- Ingress: first observable entry point until classification begins (driver scheduling included).
- Classify: policy match (routing/ACL/session) and rule-pack lookup decision.
- Queue: waiting time under contention; often the dominant term in P99 latency.
- Process: forwarding/encapsulation, copy count, crypto checks, and log field assembly.
- Egress: shaping, rate limiting, and transmission scheduling to the next hop.
- CPU budget: packet/session management, policy evaluation, crypto checks, and logging pipeline overhead.
- Memory budget: session state, buffers, queue depths, retry windows, and log staging buffers.
- Copy budget: copy count is a primary multiplier for CPU and memory bandwidth under OTA payloads.
- Storage budget: write/flush policies can dominate tail latency; measure it explicitly as a stage.
- Peak throughput: ≥ X Mbps for Y seconds
- Sustained throughput: ≥ X Mbps for Y minutes
- CPU headroom: peak CPU ≤ X% with P99 latency within budget
- Memory headroom: peak memory ≤ X% with stable queue depths
- Traffic classes: Control / Diagnostics / OTA / Telemetry must not share a single “best-effort” queue.
- Watermarks: define low/high thresholds per queue to trigger shaping and admission control.
- Backpressure: reject new sessions, delay non-critical transfers, and rate-limit at the boundary.
- Drop rules: drop-by-class and drop-by-session to prevent one client from starving critical services.
- Containment: circuit breakers for floods; avoid retry/log storms that amplify congestion.
| Traffic Class | Queue | Priority | Rate Limit | Drop Rule | Protection |
|---|---|---|---|---|---|
| Control | Q0 | Highest | Min guarantee + cap | Never drop by default | Admission control |
| Diagnostics | Q1 | High | Cap per session | Drop by session on flood | Breaker + audit |
| OTA | Q2 | Medium | Windowed throttle | Drop non-critical chunks | Resume + persist |
| Telemetry | Q3 | Low | Aggressive cap | Drop-first | No starvation |
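The watermark mechanics behind these protections can be sketched as a queue monitor with hysteresis: crossing the high watermark asserts backpressure, and only falling below the low watermark releases it, which prevents oscillation near a single threshold. Watermark values are illustrative.

```python
class WatermarkQueue:
    """Per-queue low/high watermarks driving backpressure with hysteresis.
    Threshold values are illustrative placeholders."""

    def __init__(self, low: int, high: int):
        self.low, self.high = low, high
        self.backpressure = False

    def update(self, depth: int) -> bool:
        """Feed the current queue depth; returns whether backpressure is on
        (reject new sessions, delay non-critical transfers, rate-limit)."""
        if depth >= self.high:
            self.backpressure = True
        elif depth <= self.low:
            self.backpressure = False
        # Between the watermarks the previous state is kept (hysteresis).
        return self.backpressure
```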
- Latency: P50/P95/P99 per stage and end-to-end, plus tail-spike counts (above X ms).
- Throughput: sustained (Y minutes) and peak (Z seconds) with resource headroom recorded.
- Sessions: max concurrent sessions with handshake success and timeout rate thresholds.
- Congestion: queue depth distribution, drop rate, and backpressure trigger counts.
- Stability: degrade/recover counts and recovery time after congestion or stress.
- Latency: end-to-end P99 ≤ X ms; stage-level P99 ≤ X ms (Queue must remain bounded).
- Sessions: max concurrent sessions ≥ X with timeout rate ≤ Y% over Z minutes.
- Throughput: peak ≥ X Mbps and sustained ≥ X Mbps while CPU ≤ Y% and memory ≤ Z%.
- Loss/timeout: drop ≤ X/1k and timeout ≤ X/1k per traffic class.
- Recovery: congestion recovery ≤ X s and no mode oscillation above X/hour.
| Stage | Metric | Target | Measurement Method | Margin |
|---|---|---|---|---|
| Ingress | P99 latency | ≤ X ms | timestamp t0→t1 (fixed window) | X% |
| Classify | P99 latency | ≤ X ms | timestamp t1→t2 (rule eval) | X% |
| Queue | P99 wait | ≤ X ms | queue timestamp t2→t3 | X% |
| Process | CPU / copies | ≤ X% / ≤ X | profiling + counters | X% |
| Egress | P99 latency / drop | ≤ X ms / ≤ X | timestamp t4→t5 + stats | X% |
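Stage-level P99 in this table can be computed from the boundary timestamps t0..t5. The sketch below uses the nearest-rank percentile convention, which is one common choice (assumption, not mandated by the text).

```python
import math

def p99(samples: list) -> float:
    """Nearest-rank P99 over a non-empty list of latencies (ms)."""
    s = sorted(samples)
    return s[math.ceil(0.99 * len(s)) - 1]

def stage_latencies(timestamps: list) -> dict:
    """timestamps: per-frame (t0, t1, t2, t3, t4, t5) boundary tuples.
    Returns one latency list per stage, ready for p99()."""
    stages = ["ingress", "classify", "queue", "process", "egress"]
    out = {name: [] for name in stages}
    for ts in timestamps:
        for i, name in enumerate(stages):
            out[name].append(ts[i + 1] - ts[i])
    return out
```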
Engineering Checklist: Design → Bring-up → Production
Gate-based checklists turn complex gateway/TCU programs into repeatable execution. Each gate defines required inputs, checks, evidence artifacts, and pass criteria placeholders so cross-team delivery remains consistent from first bring-up to production.
- Inputs: architecture boundary, policy tables, minimum log fields, key/cert policy, state machines, performance budget.
- Checks: deny-by-default rules, explicit exceptions, audit fields completeness, persistence points, budget measurability.
- Outputs: frozen config_version / rule_pack_version, acceptance draft (pass criteria placeholders).
- Evidence: review record, signed tables, baseline workload definition (workload_id).
- Inputs: test tools, logging pipeline, workload profiles, congestion tests, fault-injection matrix.
- Checks: session stability, pressure and congestion behavior, P99 within budget, recovery actions verified end-to-end.
- Outputs: baseline performance report (run_id), known-good config, validated degraded-mode behavior.
- Evidence: P99 report, queue stats report, fault-injection report, recovery timing report.
- Inputs: release bundle, cert injection flow, station self-test, traceability schema.
- Checks: config/cert version alignment, self-test coverage, consistent pass criteria validation, audit attribution fields.
- Outputs: trace bundle (device_id, config_version, cert_version), factory pass report.
- Evidence: station logs, regression comparison report, sampling audit report.
| Gate | Item | Owner | Method | Pass Criteria | Evidence |
|---|---|---|---|---|---|
| Design | Boundary + policy tables complete | System / Security | Review + diff | Coverage = 100% | Signed tables |
| Bring-up | P99 within budget under stress | Test | Load test | P99 ≤ X ms | run_id report |
| Bring-up | Congestion protections verified | System | Queue stats | No starvation | queue report |
| Production | Version + cert alignment | Factory | Station self-test | Match = 100% | station logs |
H2-11 · Applications: Diagnostics / Gateway / TCU Patterns
This section maps common system shapes to practical constraints, required modules, and serviceability expectations. Each pattern keeps boundaries clear: it focuses on gateway/TCU system integration (bridging, DoIP/OTA, security policy, logging), and avoids PHY-level deep dives that belong to sibling pages.
A) Central Gateway (centralized)
- Typical scene: Multiple CAN/LIN domains converge to one gateway; DoIP diagnostics and OTA coordination are centralized.
- Key constraints: High session concurrency, queue isolation (diagnostics vs control vs logging), strict fault containment, predictable P99 latency.
- Serviceability minimum: session_id, tester_id, target_ecu, service_id, result code, duration, drop reason, policy decision.
- Related sibling pages to link: CAN FD transceiver / Selective wake / Ethernet PHY & switch / EMC & port protection.
Example BOM candidates (material numbers)
- Gateway compute / SoC: NXP S32G274AABK0CUCT, Renesas R8A779F0, Infineon SAK-TC397XX-256F300S-BD
- Automotive Ethernet switch: NXP SJA1105TEL, NXP SJA1105EL
- Automotive Ethernet PHY: NXP TJA1100, TI DP83TC811R-Q1
- CAN FD transceiver / controller: TI TCAN1044-Q1, TI TCAN4550
- LIN transceiver: TI TLIN1029-Q1, TI TLIN1021-Q1
- Secure element / TPM: NXP SE050A2HQ1/Z01SHZ, Infineon OPTIGA TPM SLB 9672 FW16
- Safety PMIC / SBC: NXP MFS2633HMBA0AD, Infineon TLF35584QVVS1
- Port ESD (recommended-for-new): Nexperia PESD2ETH100T-Q, Nexperia PESD2CANFD24LT-Q
B) Zonal Gateway + Ethernet Backbone
- Typical scene: Many LIN/CAN nodes aggregated per zone; zonal gateways uplink to an Ethernet backbone; DoIP/OTA policy is shared or centralized.
- Key constraints: Broadcast storm containment, per-zone rate limiting, deterministic forwarding under congestion, isolation of “noisy” zones.
- Pitfall to guard: Unbounded retries across zones turning into global queue collapse; missing per-zone “circuit breaker”.
- Related sibling pages to link: Selective wake / SBC with CAN/LIN / CAN FD transceiver / Ethernet PHY & switch.
Example BOM candidates (material numbers)
- Zonal SBC / CAN: NXP UJA1169A, TI TCAN4550
- CAN FD transceiver: TI TCAN1044-Q1
- Ethernet backbone switch: NXP SJA1105TEL
- Ethernet PHY: NXP TJA1100, TI DP83TC811R-Q1
- Power & safety monitor: NXP MFS2633HMBA0AD, Infineon TLF35584QVVS1
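The per-zone rate limiting and "circuit breaker" called out above can be sketched as a token bucket that trips open after a sustained drop streak, so one noisy zone is isolated instead of collapsing global queues. A minimal sketch with illustrative thresholds (`ZoneGuard` and `trip_after_drops` are assumptions — derive real values from the program's congestion budget):

```python
import time

class ZoneGuard:
    """Per-zone token-bucket rate limiter with a simple circuit breaker.

    Thresholds are illustrative placeholders, not program-validated values.
    """
    def __init__(self, rate_per_s, burst, trip_after_drops=100, now=None):
        self.rate = float(rate_per_s)    # refill rate (frames/s)
        self.burst = float(burst)        # bucket capacity
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now
        self.drops = 0                   # consecutive drops in this zone
        self.trip_after = trip_after_drops
        self.open = False                # breaker open = zone isolated

    def allow(self, now=None):
        """Return True to forward a frame from this zone, False to drop it."""
        if self.open:
            return False
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            self.drops = 0               # healthy traffic resets the streak
            return True
        self.drops += 1
        if self.drops >= self.trip_after:
            self.open = True             # isolate the zone, not the backbone
        return False
```

A real design would also add a half-open probe to re-admit the zone; that is omitted here to keep the containment idea visible.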
C) TCU-as-Gateway (TCU terminates external link + policies)
- Typical scene: TCU is the external termination point (TLS/VPN), and also enforces gateway policies for DoIP/OTA.
- Key constraints: Trust chain clarity (boot→keys→runtime policy), strong logging/forensics, robust rollback, strict separation between external and in-vehicle domains.
- Pitfall to guard: “Security termination” placed too deep (policy after bridging), causing untrusted traffic to consume internal queues.
- Related sibling pages to link: Secure gateway / selective wake / DoIP diagnostics / Ethernet PHY.
Example BOM candidates (material numbers)
- Compute: NXP S32G274AABK0CUCT, Renesas R8A779F0
- External trust anchor: Infineon OPTIGA TPM SLB 9672 FW16, NXP SE050A2HQ1/Z01SHZ
- In-vehicle networking: NXP SJA1105TEL, TI TCAN1044-Q1, TI TLIN1029-Q1
- Safety PMIC: Infineon TLF35584QVVS1
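The pitfall above — security termination placed after bridging — is avoided by gating external traffic before it earns any internal queue slot. A minimal sketch, assuming a hypothetical frame dict and whitelist (`ALLOWED_EXTERNAL`, `gate_external` are illustrative names; a production gateway would classify on authenticated session state and real protocol metadata):

```python
# Policy-before-bridging sketch: drop at the external boundary so
# untrusted traffic never consumes in-vehicle queue capacity.
ALLOWED_EXTERNAL = {("doip", 13400), ("ota", 443)}  # (protocol, port) whitelist

def gate_external(frame):
    """Return the internal queue name for this frame, or None to drop it."""
    key = (frame.get("proto"), frame.get("port"))
    if key not in ALLOWED_EXTERNAL:
        return None   # unknown service: dropped before any bridging work
    if not frame.get("session_authenticated", False):
        return None   # unauthenticated traffic never reaches internal queues
    # Only now does the frame earn a bounded internal queue slot.
    return "q_diag" if frame["proto"] == "doip" else "q_ota"

print(gate_external({"proto": "doip", "port": 13400, "session_authenticated": True}))
```

The design point is ordering: classification and authentication checks run in the external domain, and the return value is the first moment the in-vehicle side is involved at all.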
D) Service Tool / Factory Mode (diagnostics throughput + traceability)
- Typical scene: Factory flashing, EOL test, service station diagnostics, controlled bypass policies, strong traceability fields.
- Key constraints: High throughput, strict access levels, deterministic test time, audit-ready logs (who/what/when/result).
- Pitfall to guard: “Test-only” backdoors leaking into field images; missing policy attestation tags.
- Related sibling pages to link: DoIP diagnostics / OTA lifecycle / security policy enforcement.
Example BOM candidates (material numbers)
- Compute / safety MCU option: Infineon SAK-TC397XX-256F300S-BD
- DoIP-facing Ethernet PHY: TI DP83TC811R-Q1
- CAN FD access: TI TCAN4550
- Trust anchor for station auth: Infineon OPTIGA TPM SLB 9672 FW16
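The audit-ready (who/what/when/result) logging above can be given basic tamper evidence with an HMAC tag per entry. A minimal sketch using Python's standard `hmac` module; the hard-coded key is a placeholder only — a real station would hold the key inside its trust anchor (TPM/SE) and never in the image:

```python
import hmac
import hashlib
import json

AUDIT_KEY = b"station-demo-key"  # PLACEHOLDER: real keys live in the TPM/SE

def audit_entry(who, what, when, result):
    """Build a who/what/when/result record with an integrity tag."""
    body = json.dumps(
        {"who": who, "what": what, "when": when, "result": result},
        sort_keys=True,  # deterministic serialization so tags are reproducible
    )
    tag = hmac.new(AUDIT_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def verify(entry):
    """Return True if the entry body still matches its tag."""
    expect = hmac.new(AUDIT_KEY, entry["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, entry["tag"])

e = audit_entry("op-17", "flash ECU 0x0E80", "2024-01-01T00:00:00Z", "pass")
assert verify(e)
```

Per-entry tags catch edited records; chaining each tag over the previous one (not shown) additionally catches deleted records.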
Use the pattern choice to drive: (1) queue isolation boundaries, (2) rate-limit placement, (3) trust termination location, and (4) minimum logs for field diagnosis.
H2-12 · IC Selection Logic: What to Choose and Why (with material numbers)
Selection is driven by policy boundaries and serviceability requirements first, then by compute/network/security/power modules. Material numbers below are illustrative references; always confirm AEC grade, package, longevity, and safety documentation for the target program.
Compute / SoC
- Choose by: session concurrency, copy budget (DMA/zero-copy), security acceleration, real-time partitioning, safety concept (ASIL targets), and I/O count.
- Validation hook: CPU headroom at peak DoIP + OTA + logging; verify P99 latency with congestion and encryption enabled.
- Example parts: NXP S32G274AABK0CUCT, Renesas R8A779F0, Renesas R8A779G0, Infineon SAK-TC397XX-256F300S-BD, TI TDA4VM-Q1
Networking (Ethernet + classic buses)
- Ethernet topology: port count, QoS/TSN needs, mirroring for diagnostics, and storm control placement.
- Bus access strategy: discrete transceivers vs integrated SBC/controller; decide by wake policy, SPI bandwidth, and failure isolation.
- Example parts:
  Switch: NXP SJA1105TEL, NXP SJA1105EL
  Ethernet PHY: NXP TJA1100, TI DP83TC811R-Q1
  CAN FD transceiver/controller: TI TCAN1044-Q1, TI TCAN4550
  LIN transceiver: TI TLIN1029-Q1, TI TLIN1021-Q1
  FlexRay (if required): NXP TJA1080A, Infineon TLE9221SX
Security (trust anchor)
- Choose by: root-of-trust availability, key storage capacity, crypto throughput, update/rotation method, and auditability.
- Validation hook: attested boot chain + policy versioning + log integrity (tamper evidence).
- Example parts: NXP SE050A2HQ1/Z01SHZ, Infineon OPTIGA TPM SLB 9672 FW16
Power & safety (PMIC / SBC)
- Choose by: rail count, fail-safe outputs, watchdog concept, wake sources, and required safety diagnostics coverage.
- Validation hook: brownout/ignition-cranking recovery + OTA power-loss recovery + watchdog-induced safe state.
- Example parts: NXP MFS2633HMBA0AD, Infineon TLF35584QVVS1
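The OTA power-loss recovery hook implies a state machine whose persisted state drives resume or rollback after reboot. A minimal sketch with illustrative state names (`OTA_NEXT`, `ROLLBACK_FROM` are assumptions); a real implementation would write the state to NVM atomically on every transition so a crash at any point lands in a defined state:

```python
# Recoverable OTA state machine sketch: every state has a defined
# successor on success and a defined safe fallback on failure.
OTA_NEXT = {
    "IDLE": "DOWNLOADING",
    "DOWNLOADING": "VERIFYING",
    "VERIFYING": "ACTIVATING",
    "ACTIVATING": "COMMITTED",
}
ROLLBACK_FROM = {
    "DOWNLOADING": "IDLE",        # nothing installed yet: just restart
    "VERIFYING": "IDLE",          # bad image: discard and restart
    "ACTIVATING": "ROLLED_BACK",  # switch back to the previous slot
}

def step(state, ok):
    """Advance on success; on failure, fall back to a known-safe state."""
    if ok:
        return OTA_NEXT.get(state, state)
    return ROLLBACK_FROM.get(state, state)

# Power loss during activation: the persisted state drives rollback on reboot.
s = "IDLE"
for result in (True, True, True, False):  # fails while ACTIVATING
    s = step(s, result)
assert s == "ROLLED_BACK"
```

The validation hook above then becomes a table walk: cut power in every state, reboot, and assert the resulting state matches this map.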
Port protection (ESD)
- Choose by: capacitance budget, surge model assumptions, and placement feasibility (return path length dominates).
- Validation hook: post-ESD link stability + insertion loss / reflection sanity checks on the real harness.
- Example parts (recommended-for-new): Nexperia PESD2ETH100T-Q, Nexperia PESD2CANFD24LT-Q, Nexperia PESD2CANFD36UU-Q
Practical acceptance placeholders (replace X/Y with program targets): P99 latency, max concurrent sessions, peak OTA throughput, drop/timeout rate, and post-ESD stability.
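The P99 latency placeholder becomes a concrete acceptance check once latency samples are collected. A minimal nearest-rank percentile sketch (`p99` is an illustrative helper; the target threshold X remains a program-specific input):

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile: rank = ceil(0.99 * n), 1-based."""
    s = sorted(samples_ms)
    idx = max(0, -(-99 * len(s) // 100) - 1)  # ceiling division, 0-based index
    return s[idx]

# Tail-sensitive by construction: one congested outlier in 100 requests
# moves P99 while barely moving the mean.
samples = [3.0] * 98 + [50.0, 60.0]
assert p99(samples) == 50.0  # acceptance gates on the tail, not the average
```

Gating on P99 (rather than mean) matches the "predictable P99 latency" constraint in the patterns above: queue-isolation failures show up almost exclusively in the tail.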
H2-13 · FAQs (Diagnostics / Gateway / TCU)
Long-tail troubleshooting only. Each answer is a fixed 4-line engineering path: Likely cause → Quick check → Fix → Pass criteria (threshold placeholder X).