CAN Controller & Bridge (TTCAN): Filtering, Remap & Gateway
A CAN controller/bridge is the traffic policy engine of in-vehicle networks: it filters, schedules, and routes messages so critical control frames stay deterministic while diagnostics and logging are shaped and audited.
Done right, it turns bus load, rule updates, and burst events into measurable budgets (latency/jitter/queue) with traceable rules (version/checksum) and serviceable logs (reason codes) instead of mystery failures.
Definition & Scope Guard
A CAN controller/bridge turns raw bus traffic into deterministic, policy-driven flows: receive → classify (filters) → optionally transform (remap) → queue/shape → forward across domains (CAN↔CAN, CAN↔Ethernet/DoIP), with counters and logs that make failures diagnosable.
- Typical outcomes: stable latency under burst load, controlled forwarding (no storms), explainable drops/timeouts, serviceable field logs.
- Primary value: forwarding becomes strategy (filtering, shaping, isolation, audit), not blind relay.
In scope:
- Controller datapath: mailboxes/FIFOs, acceptance filtering, timestamps, error states (bus-off policy).
- Gateway/bridge policy: filter → remap → rate-shape → queue isolation → forward, with loop prevention.
- Time-triggered messaging (TTCAN): schedule concepts, windows, determinism checks (no PHY deep-dive).
- Latency & buffer budgets: stage breakdown, p95/p99 targets, backpressure and drop policies.
- Diagnostics & serviceability: counters, drop reasons, black-box logging fields, audit hooks.
Out of scope:
- PHY/transceiver electrical behavior (waveforms, termination, EMC, TVS/CMC placement). See sibling pages: HS CAN / CAN FD / SIC / CAN XL PHY.
- Selective wake / partial networking (ISO 11898-6) filter tables and standby false-wake tuning. See: Selective Wake / PN.
- Ethernet/TSN PHY details. Only bridge touchpoints are discussed (mapping, latency, isolation).
When to use this page:
- Symptoms: drops with low bus load, timeouts only during diagnostics bursts, latency jitter, mis-forwarding, or bus-off recovery storms.
- Questions: “Which rules forward what?”, “Where is the queue peak?”, “What is the p99 latency?”, “Which domain caused backpressure?”, “Can the event be proven by counters/logs?”
Audience:
- System architects: choose bridge patterns and isolation boundaries; plan determinism and serviceability.
- Firmware/network stack: implement rule tables, queues, timestamps, bus-off recovery, and logging schemas.
- Diagnostics & test: define evidence fields (counters, drop reasons) and pass criteria for bursts and corner cases.
- Production/service: rely on black-box logs for fast root-cause attribution in the field.
System Architecture Patterns
All bridging can be expressed as a single pipeline: Ingress → Classify → Transform → Shape → Egress + Observe. Each architecture pattern is the same pipeline under different constraints (loop risk, jitter risk, burst risk, audit requirements).
- Ingress: identify source domain, message class, and burst behavior (steady vs diagnostics).
- Classify: hardware filters (coarse) + software rules (fine) + security/diagnostic allowlists (strict).
- Transform: ID remap and payload rewrite only when required; treat “modify” as a safety boundary decision.
- Shape: isolate queues per class; reserve bandwidth for control traffic; rate-limit diagnostics/logging.
- Egress: prevent loops and storms; enforce per-destination budgets; apply backpressure policies.
- Observe: counters + drop reasons + rule-id + timestamps; enable field reproduction (“black-box”).
Pattern: multi-CAN domain gateway (CAN↔CAN)
- Use when: multiple CAN domains require isolation plus controlled exchange (ID translation, domain boundaries).
- Design focus: rule table ownership, remap correctness, loop prevention, congestion containment.
- Common pitfall: “mirror forwarding” creates a silent loop; storms amplify across domains.
- Verify: no-forward-back rule, loop detection counters, and bounded latency at high load.
Pattern: CAN-to-Ethernet backbone aggregation
- Use when: CAN traffic is aggregated into a higher-bandwidth backbone for centralized compute.
- Design focus: latency/jitter budgets, queue isolation, prioritization of control vs bulk flows.
- Common pitfall: Ethernet-side bursts turn into CAN-side jitter (queue coupling).
- Verify: p95/p99 latency bounds with burst injection; queue peak and drop reason coverage.
Pattern: remote diagnostics gateway (DoIP)
- Use when: service tools reach CAN via IP tunnels and a secure gateway.
- Design focus: control-plane priority, diagnostics rate shaping, allowlist/audit, timeout attribution.
- Common pitfall: diagnostic bursts starve periodic control frames or trigger false “network unstable” symptoms.
- Verify: channel quotas, audit logs, and consistent timeouts correlated to queue and rule-id.
Pattern: black-box logging bridge
- Use when: field failures require post-event reconstruction (who sent what, when, and why it dropped).
- Design focus: ring buffers, minimal-but-sufficient fields, drop reason taxonomy, non-intrusive shaping.
- Common pitfall: logging traffic competes with critical traffic; missing drop reasons make logs useless.
- Verify: log cannot destabilize control traffic; events are reproducible from counters + timestamps.
Evidence fields (all patterns):
- Rule evidence: rule-id, rule-version/hash, hit-count, last-hit timestamp.
- Queue evidence: per-class queue peak, drop-count, drop-reason, backpressure time.
- Timing evidence: ingress timestamp, egress timestamp, derived latency histogram (p50/p95/p99).
- Fault evidence: bus-off events, recovery attempts, quarantine triggers, storm counters.
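As a minimal illustration of the timing evidence, p50/p95/p99 can be derived from per-frame ingress/egress timestamp pairs with a nearest-rank percentile. The function name and microsecond units are assumptions, not part of any specific tooling.

```python
def latency_percentiles(samples, percentiles=(50, 95, 99)):
    """samples: list of (ts_ingress_us, ts_egress_us) pairs; returns {'p50': ...}."""
    lat = sorted(te - ti for ti, te in samples)
    out = {}
    for p in percentiles:
        # nearest-rank: ceil(p * n / 100), converted to a 0-based index
        k = -(-p * len(lat) // 100) - 1
        out["p%d" % p] = lat[max(0, k)]
    return out
```

Nearest-rank is deliberate: it always returns an observed latency value, so a p99 figure in a field report corresponds to a real frame that can be looked up in the black-box log.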
Filtering / Remapping / Routing Strategy
Bridging succeeds when forwarding becomes policy: a versioned rule pipeline that is measurable, auditable, and safe under burst load. The goal is to define what is allowed, how it is shaped, and how it is proven by counters and logs.
- Hardware filter (coarse): reduce RX load early (banked masks/lists/ranges). Treat as a performance asset. Evidence: per-bank hit count, reject count (if available), RX FIFO watermark.
- Software rules (fine): domain isolation, forwarding decisions, remap, and shaping. Treat as a business/architecture asset. Evidence: rule-id hits, rule version/hash, drop reasons.
- Security & diagnostics allowlist (strict): minimal exposure surface (default deny, audited allow). Treat as a security asset. Evidence: allow/deny audit logs, quota usage, timeout attribution.
Keep hardware filters simple and stable. Put frequent-change logic into software rules with versioning and observability.
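The three coarse filter forms mentioned above (banked masks, lists, ranges) can be sketched as a single acceptance check. The bank layout here is illustrative and not tied to any specific controller's register model.

```python
def accept(can_id, banks):
    """banks: ("mask", code, mask) | ("list", id_set) | ("range", lo, hi)."""
    for bank in banks:
        kind = bank[0]
        if kind == "mask" and (can_id & bank[2]) == (bank[1] & bank[2]):
            return True
        if kind == "list" and can_id in bank[1]:
            return True
        if kind == "range" and bank[1] <= can_id <= bank[2]:
            return True
    return False   # rejected frames never reach the software rule stage
```

In hardware this check runs per frame before any interrupt fires, which is why keeping the banks coarse and stable pays off: the software rule stage only sees pre-thinned traffic.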
Prefer forward or drop. Modifying control/safety-critical semantics is not the default. If modification is required, it must be explicit, versioned, and auditable.
- ID translation: domain-specific ID plans require mapping across a boundary.
- Payload rewrite: only for strictly defined gateway transformations with audit fields.
- Signal-level mapping: only when virtualization/aggregation defines a clear “semantic owner”.
- rule-id + rule version/hash recorded at decision time.
- before/after tag (compact summary) for remap events.
- drop/deny reason taxonomy for blocked or shaped traffic.
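A minimal sketch of how those audit fields can be captured at a remap decision point. The function `remap_with_audit` and its field names are hypothetical; the point is that the rule hash and before/after tag are recorded at decision time, not reconstructed later.

```python
import hashlib
import time

def remap_with_audit(can_id, payload, rule_id, rule_text, new_id, audit_log):
    """Remap an ID and append the required audit record at decision time."""
    rule_hash = hashlib.sha256(rule_text.encode()).hexdigest()[:8]
    audit_log.append({
        "ts": time.time(),                               # decision timestamp
        "rule_id": rule_id,
        "rule_hash": rule_hash,                          # rule version/hash
        "before_after": "0x%X->0x%X" % (can_id, new_id), # compact summary
    })
    return new_id, payload
```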
Use class-based isolation and quotas. Priorities alone can fail under contention; independent queues reduce coupling and stabilize p99 latency.
- Control (periodic and safety-relevant flows): dedicated queue, guaranteed service, and a measurable jitter budget.
- Diagnostics (bursty traffic): enforce quotas (token bucket), allowlist, and audit logs. Timeouts must be attributable.
- Logging (lowest priority): drops are allowed, but every drop must carry an explainable reason and counters.
- per-class queue peak + drop count + drop reason
- token-bucket empty time + shaped count
- end-to-end latency histogram (p50/p95/p99)
- Static guard: forbid mirror rules that create unconditional A→B and B→A forwarding; require an explicit “return path” policy.
- Runtime guard: storm counters and short-window detection trigger quarantine; preserve control allowlist, throttle diagnostics/logging.
- Observability: loop-detected count, quarantine reason, burst histogram, per-rule drop reasons.
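The static guard can be approximated as a rule-table lint that flags mirrored forward rules with overlapping ID ranges, run at rule-review time before deployment. The rule tuple shape here is an assumption for illustration.

```python
def find_mirror_pairs(rules):
    """rules: list of (rule_id, src_bus, dst_bus, id_lo, id_hi) forward rules."""
    mirrors = []
    for i, (ra, sa, da, lo_a, hi_a) in enumerate(rules):
        for rb, sb, db, lo_b, hi_b in rules[i + 1:]:
            reversed_path = (sa, da) == (db, sb)          # A->B paired with B->A
            overlap = lo_a <= hi_b and lo_b <= hi_a       # ID ranges intersect
            if reversed_path and overlap:
                mirrors.append((ra, rb))
    return mirrors   # a non-empty result should fail the rule-table review gate
```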
Rule table fields:
- rule-id (unique)
- version/hash (rollback-proof)
- owner tag (control/diag/log/security)
- src bus / dst bus
- ID match (mask/list/range)
- direction (ingress/egress)
- action (drop / forward / remap)
- priority class (control/diag/log)
- logging flag (audit on/off)
- rate limit (token bucket/quota)
- burst allowance (peak control)
- drop policy (oldest/newest per class)
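The fields above can be carried as one record, with a checksum over the whole ruleset for rollback and field-report traceability. The record layout and checksum scheme here are illustrative assumptions.

```python
import hashlib
import json

def ruleset_checksum(rules):
    """Stable checksum of the whole rule table (for rollback and field reports)."""
    canonical = json.dumps(rules, sort_keys=True)   # stable serialization
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

rule = {
    "rule_id": "GW-017", "owner": "diag",                 # identity
    "src_bus": "CAN2", "dst_bus": "CAN0",                 # match
    "match": {"kind": "range", "lo": 0x700, "hi": 0x7FF},
    "direction": "ingress",
    "action": "forward", "class": "diag", "audit": True,  # action
    "rate_limit": {"tokens_per_s": 200, "burst": 50},     # shaping
    "drop_policy": "oldest",
}
```

Any field change produces a different checksum, so a field report that carries the checksum pins down exactly which ruleset was running.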
Pass criteria:
- Under diagnostic burst: control p99 forwarding latency < X ms (placeholder)
- No uncontrolled loop/storm: storm counters remain < X within Y seconds (placeholder)
- Every allow/deny decision is attributable: rule-id + reason + timestamps available
Bridging to Ethernet / DoIP
Only bridge touchpoints are covered: mapping points, isolation, shaping, audit, and latency evidence. Protocol tutorials (DoIP/UDS details, Ethernet PHY/TSN specifics) are intentionally out of scope.
Backbone aggregation:
- Why: domain controllers and centralized compute aggregate multiple CAN domains into a higher-bandwidth backbone.
- Primary risk: Ethernet-side bursts couple into CAN queues, producing p99 latency spikes and unexplained timeouts.
- Bridge focus: class isolation, rate shaping, and evidence fields that attribute delays to specific queues and rules.
Diagnostics/DoIP channel:
- Treat as an attack & congestion source: diagnostics is bursty and must not destabilize control traffic.
- Minimum controls: default deny, strict allowlist, quota/token bucket, and full audit logs.
- Timeout attribution: every timeout should map to a reason (deny, quota, queue peak, backpressure).
Separate planes with independent queues and quotas. Control traffic requires reserved service; diagnostics/logging are shaped and audited.
- Control: reserved queue + determinism targets (p99 latency/jitter placeholders).
- Diagnostics: token bucket + allowlist gate + audit logs (timeouts must be attributable).
- Logging: drop-tolerant; always record drop reasons and counters.
- Default deny: no rule means drop; drop includes reason + timestamps.
- Allowlist only: each allow has rule-id, version/hash, and owner tag.
- Rate limiting: per-channel quotas; record shaped events and token-empty durations.
- Audit: allow/deny decisions are logged and attributable (rule-id + timestamps + counters).
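A minimal token-bucket sketch for the rate-limiting gate above, including a shaped-event counter so that timeouts remain attributable. Rates, names, and units are placeholders.

```python
class TokenBucket:
    """Per-channel quota: refill at rate_per_s, cap at burst, spend 1 per frame."""
    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0
        self.shaped_count = 0        # evidence: how many frames were shaped away

    def allow(self, now):
        # refill proportionally to elapsed time, capped at the burst allowance
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.shaped_count += 1       # timeout attribution: reason = RATE
        return False
```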
Determinism, Latency & Buffer Budget
Determinism is a measurable contract: define a latency path, break it into stages, monitor each stage, and size buffers for burst worst-cases. When congestion appears, enter controlled degradation instead of unpredictable instability.
- Latency path: ingress timestamp → rule decision complete → enqueue → dequeue → TX submit (or egress).
- Jitter: p99 (or p99–p50 spread) on the same path; thresholds are placeholders until system targets are set.
- Backlog: queue depth peaks and time-above-watermark over a defined window.
- t_ISR (interrupt entry time): impacted by interrupt masking, priority inversions, and CPU contention. Monitor: IRQ latency histogram (or closest proxy).
- t_proc (processing time): rule matching, remap, and bookkeeping. Monitor: sampled per-message compute time; rule-eval time budget.
- t_queue (queueing time): growth under burst, shaping constraints, and backpressure. Monitor: queue depth peak, time-in-queue distribution, drop count.
- t_wait (send-wait time): arbitration opportunities and TX backlog effects. Monitor: TX backlog peak and time-above-watermark.
Determinism requires both isolation (queues/classes) and observability (timestamps/counters). Faster hardware alone cannot fix coupling-induced p99 spikes.
Common jitter sources:
- interrupt masking windows
- task preemption and priority inversions
- non-preemptive critical sections
- DMA contention and arbitration
- cache misses and memory bandwidth pressure
- shared bus contention across peripherals
- queue backpressure and coupling
- insufficient shaping for diagnostics/logging bursts
- output TX backlog under high bus utilization
1. Baseline: define the average ingress rate from bus utilization (placeholder X%) and average frame rate. Record the typical processing service rate (t_proc_typ).
2. Burst model: specify burst size N and burst duration T for diagnostic/OTA/logging sources. Identify whether the burst is gated by allowlist/quota or arrives unbounded.
3. Sizing: queue growth ≈ ingress_rate − service_rate during burst windows. Size FIFO/queue depth for the peak growth plus a safety factor (placeholder X). Add watermarks to trigger shaping and controlled degradation before overflow.
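The sizing rule can be made concrete with a small worked example. All rates below and the 1.5× safety factor are illustrative placeholders, not recommendations.

```python
import math

def required_depth(ingress_rate, service_rate, burst_duration_s, safety_factor=1.5):
    """Frames of depth needed to absorb a burst: (ingress - service) * T * margin."""
    growth = max(0.0, ingress_rate - service_rate) * burst_duration_s
    return math.ceil(growth * safety_factor)

# Example: a 2000 fps diagnostic burst for 0.5 s against a 1200 fps service rate
# grows the queue by ~400 frames; with a 1.5x margin, size the queue for 600.
```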
- Control: isolate and reserve service; avoid drops whenever possible.
- Diagnostics: allowlist + quota; shaping is mandatory under burst.
- Logging: drop-tolerant; prioritize recent-window observability (ring buffer).
- Drop-oldest typically preserves a “recent window” for black-box replay.
- Drop-newest can protect existing queued control sequences when bursts arrive.
- Every drop/deny must include drop reason and counters.
Controlled degradation (degraded mode):
- Entry triggers: queue depth over watermark, drop burst, error-rate spike.
- Behavior: keep control allowlist; throttle diagnostics/logging aggressively.
- Evidence: mode entry/exit logs with reason + duration.
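A sketch of that entry/exit logic with the required evidence log: watermark or drop-burst breaches enter the mode, and a stability window gates the exit so relapses restart the timer. Thresholds, the stability window, and reason codes are assumptions.

```python
class DegradedMode:
    def __init__(self, depth_watermark, drop_burst_limit, exit_stable_s):
        self.wm = depth_watermark
        self.drop_limit = drop_burst_limit
        self.exit_stable_s = exit_stable_s
        self.active = False
        self.stable_since = None
        self.log = []                        # evidence: entry/exit with reason

    def update(self, now, queue_depth, drops_in_window):
        if not self.active:
            if queue_depth > self.wm or drops_in_window > self.drop_limit:
                reason = "QDEPTH" if queue_depth > self.wm else "DROP_BURST"
                self.active, self.stable_since = True, None
                self.log.append((now, "enter", reason))
        else:
            if queue_depth <= self.wm and drops_in_window == 0:
                if self.stable_since is None:
                    self.stable_since = now   # start the stability gate
                elif now - self.stable_since >= self.exit_stable_s:
                    self.active = False
                    self.log.append((now, "exit", "STABLE"))
            else:
                self.stable_since = None      # relapse: restart the stability gate
        return self.active
```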
Diagnostics, Fault Handling & Logging
Serviceability requires evidence: counters, state transitions, audit fields, and a minimal black-box that can reproduce the last seconds before failure. Fault isolation must prevent a single-node storm from collapsing the entire domain.
Minimum counter set:
- error active / passive
- bus-off entry count + duration
- recovery attempts
- RX overflow + watermark peaks
- TX abort + backlog peaks
- drop burst histogram
- deny count by reason
- rule-id hits + version/hash
- quarantine entry/exit counts
- Detect: error-rate spikes, drop bursts, bus-off, loop/storm counters.
- Suspect: temporarily throttle diagnostics/logging, preserve control allowlist.
- Quarantine: isolate the suspected source; forward only the safe subset; keep full evidence logs.
- Recover: cooldown + stability gate; gradual re-enable prevents relapse storms.
Mode-transition log fields: state (Normal/Suspect/Quarantine/Recover) · entry reason · exit reason · duration · counters snapshot · rule-id context
Black-box record fields:
- timestamp (aligned to the latency contract)
- src bus / dst bus
- ID / DLC / flags
- action (forward/drop/remap)
- rule-id + rule version/hash
- queue depth snapshot (per class)
- drop/deny reason taxonomy
- quarantine state (if any)
Every deny/drop must be explainable by reason + counters + timestamps, enabling field triage without re-instrumentation.
- Ring buffer: retain the most recent N seconds or M events (placeholders).
- Tiered retention: control is preserved; diagnostics is limited; logging may be sampled or compressed.
- Trigger densification: bus-off, quarantine entry, and drop bursts increase capture density automatically.
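A minimal ring-buffer sketch with trigger densification: sampled capture in normal operation, full capture for a window after a trigger (bus-off, quarantine entry, drop burst). Capacity and the sampling ratio are placeholders.

```python
from collections import deque

class BlackBox:
    def __init__(self, capacity, sample_every=4):
        self.ring = deque(maxlen=capacity)   # oldest events fall out automatically
        self.sample_every = sample_every     # keep every Nth event when quiet
        self.seq = 0
        self.dense_until = -1.0

    def trigger(self, now, window_s):
        """Called on bus-off / quarantine entry / drop burst: densify capture."""
        self.dense_until = now + window_s

    def record(self, now, event):
        self.seq += 1
        if now <= self.dense_until or self.seq % self.sample_every == 0:
            self.ring.append((now, event))
```

The `deque(maxlen=...)` gives drop-oldest semantics for free, which matches the "recent window for replay" retention policy above.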
Safety & Security Hooks
A gateway becomes trustworthy through explicit hooks: safety-class protection (reserved service and no-modify), security policy gates (default deny, freshness, rate-limit), and auditable evidence (rule-id, reason, counters, timestamps).
- Covers: safety-class path contract, security policy gates, fault injection hooks, audit evidence fields, and measurable trade-offs.
- Not covered: ISO 26262 handbook content, cryptography deep dives, and full DoIP/UDS protocol tutorials.
Safety-class flows use isolated service resources: dedicated queue, reserved scheduling, and predictable p99 behavior under bursts. Evidence: safety queue peak, time-above-watermark, p99 latency.
Safety-class frames default to forward-only: filtering and protection are allowed; payload rewriting is prohibited unless explicitly justified and audited. Evidence: rule tags for “no-remap”, remap count = 0 for safety class.
When congestion triggers degraded mode, the allowed set is restricted to the safety allowlist, while diagnostics/logging are throttled or paused. Evidence: mode entry/exit logs with reason and duration.
- Drop: simulate loss to validate recovery and isolation.
- Delay: simulate scheduling/backpressure to validate p99 budgets.
- Replay: simulate re-injection to validate freshness and anti-replay policy.
- by traffic class (safety/diagnostic/logging)
- by rule-id / bus / time window
- bounded rate to avoid accidental “always broken” states
Pass criteria: under injected delay/drop, safety p99 latency < X and safety drop = 0; under replay injection, frames are denied with a freshness reason and audited.
Unruled traffic is denied by default. Allowed paths must be explicit and owned (rule-id + version/hash + owner tag). Evidence: deny reason taxonomy; rule version hash recorded.
Anti-replay is enforced via time or counter concepts: stale frames are denied and logged. The policy is implemented at the gate, not scattered in application code. Evidence: replay-detected count; stale-deny count.
Every allow/deny/drop must carry rule-id, reason, counters snapshot, and timestamp to enable field triage.
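A counter-based freshness gate can be sketched as follows, with the replay/stale evidence counters named above. The per-(bus, ID) monotonic counter scheme is an assumption; time-based freshness works analogously.

```python
class FreshnessGate:
    """Deny frames whose per-stream counter does not advance (replay or stale)."""
    def __init__(self):
        self.last_counter = {}          # per (bus, can_id): highest counter seen
        self.replay_detected = 0        # evidence counters
        self.stale_denied = 0

    def check(self, bus, can_id, counter):
        key = (bus, can_id)
        last = self.last_counter.get(key, -1)
        if counter <= last:
            if counter == last:
                self.replay_detected += 1   # exact re-injection
            else:
                self.stale_denied += 1      # older than an already-accepted frame
            return False                    # deny + audit at the gate
        self.last_counter[key] = counter
        return True
```

Placing this check at the gateway gate, rather than in each receiving application, keeps the policy centralized and the deny counters in one place.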
- False deny (mis-block): keep below X in validated diagnostic scenarios; all denies must be explainable.
- False allow (mis-pass): target 0 across restricted domain boundaries.
- Latency ceiling: safety p99 < X even under X% load + bursts; avoid heavy policy checks that inflate t_proc.
Engineering Checklist (Design → Bring-up → Production)
This checklist converts the architecture into executable gates. Each gate produces artifacts, counters, and pass criteria so that rule changes and load bursts remain explainable and repeatable.
Design:
- Latency contract: define t_ISR/t_proc/t_queue/t_wait measurement points and p99 targets (placeholders).
- Rule asset: src/dst/match/action/priority/rate/log + rule-id + version/hash.
- Safety class: isolated queue, reserved service, “no-remap” principle.
- Loop guard: block mirror rules; storm counter + quarantine logic defined.
- Watermarks: shaping thresholds + degraded mode behavior defined and auditable.
- Security minimum: default deny + allowlist + freshness + audit reason taxonomy.
Artifacts: rule table spec · rule version hash · latency budget sheet · mode transition spec · deny reason taxonomy
Bring-up:
- Observability: timestamps align to the latency contract; rule-id and queue snapshots are present.
- Baseline: establish p50/p95/p99 under idle, typical load, and diagnostic load.
- Burst reproduction: inject diagnostic/logging bursts; confirm shaping and isolation prevent coupling.
- Extreme load: run X% bus load + burst; verify safety p99 and safety drops remain within gate.
- Fault injection: drop/delay/replay tests produce expected deny reasons and state transitions.
Evidence: latency histograms · queue peak · token empty time · deny/drop reasons · quarantine entry/exit logs
Production:
- Audit completeness: allow/deny/drop coverage with reason + rule-id + version/hash.
- Black box: last N seconds ring buffer exportable; trigger densification on bus-off/quarantine/drop bursts.
- Storm statistics: quarantine rates, drop bursts, and false-deny metrics are tracked and reviewed.
- Regression gates: rule changes must pass typical, burst, and injection suites before rollout.
- Traceability: deployed ruleset checksum recorded for every build and for field reports.
Overall gate: under X% load + burst, safety drop = 0 and safety p99 < X. Non-safety drops are explainable. Rule changes are fully traceable (version + checksum) and auditable in the field.
Applications (Pattern Library — Bridge/Controller View)
Each bucket is described only by bridge/controller differences: determinism, filtering/routing, logging/diagnostics, and security hooks. PHY, EMC, and protocol tutorials are intentionally out of scope.
Examples below are representative. Always verify automotive grade (AEC-Q/ASIL context), CAN/CAN FD features (and TTCAN/time-trigger support if required), package/suffix, and long-term availability.
Safety-critical control domains (powertrain/chassis):
The dominant constraint is deterministic behavior under bursts: periodic control traffic must remain stable while diagnostics/logging is shaped and audited.
- Determinism: safety-class isolated queue + reserved service; p99 latency gate under diagnostic bursts.
- Filtering: allowlist-by-default across domains; loop/storm prevention with quarantine transitions.
- Logging: black-box ring buffer with reason codes (rule-id, drop/deny reason, queue snapshot).
- Security hooks: default deny + freshness concept + audit coverage (100% allow/deny/drop).
Representative parts:
- Infineon AURIX TC3xx (e.g., TC397) — multi-CAN, strong determinism ecosystem.
- NXP S32K3 (e.g., S32K344) — CAN FD heavy MCU class for ECUs.
- Renesas RH850 (e.g., RH850/U2A) — automotive MCU with robust comms/peripherals.
- Texas Instruments TMS570 (e.g., TMS570LS1224) — safety-oriented MCU class (check CAN feature set per variant).
Note: TTCAN/time-trigger capability is variant-dependent; confirm in the controller module feature table.
Body/comfort networks:
Node count is high and traffic is heterogeneous. The gateway value is rule scalability (filter/remap), shaped diagnostics, and serviceable logging under low-power policies.
- Determinism: explainable congestion behavior is more important than ultra-low jitter.
- Filtering: layered filtering (coarse → fine → policy) to avoid rule explosion and loops.
- Logging: event-trigger snapshots (wake/storm/drop bursts) to reproduce intermittent field issues.
- Security hooks: default deny + quotas for diagnostics/OTA/logging to control attack surface.
Representative parts:
- NXP S32K1 (e.g., S32K144) — body ECU MCU class (feature set varies).
- ST SPC58 (e.g., SPC58EC) — automotive MCU family used in body domains.
- Microchip MCP2517FD — external CAN FD controller (SPI) for channel expansion/offload.
- Microchip MCP2518FD — external CAN FD controller (SPI) alternative/variant class.
External controllers are often used to add channels or isolate timing/IRQ load; host interface buffering still needs budget gates.
Central gateway / domain controller:
The main problem is not forwarding, but isolation and auditability: diagnostics/OTA traffic must be rate-limited and fully attributable without degrading control traffic.
- Plane split: control plane vs diagnostic/logging plane with hard queue isolation.
- Rules: explicit src/dst/match/action with rule-id + version hash; deny reasons are mandatory.
- Rate shaping: token bucket/quota to prevent diagnostic bursts from starving control flows.
- Audit: 100% allow/deny/drop coverage with counters + timestamps + queue snapshots.
Representative parts:
- NXP S32G2 (e.g., S32G274A) — automotive gateway processor class.
- NXP S32G3 (e.g., S32G399A) — higher gateway performance class.
- Texas Instruments Jacinto (e.g., TDA4VM) — domain controller class (verify gateway feature mix).
- Renesas R-Car (e.g., R-Car V3H) — high-integration domain controller class (verify network peripheral mix).
DoIP/OTA details are intentionally not expanded here; only the bridge isolation/audit requirements are modeled.
Small distributed nodes (sensor/actuator subnets):
Small nodes require predictable shaping at the gateway. The gateway must provide service logs with stable reason codes to make intermittent failures diagnosable.
- Shaping: smooth bursts from many small nodes; keep control flows stable.
- Isolation: quarantine stormy endpoints to protect the rest of the bus.
- Service logs: drop/deny must carry reasons, rule-id, and queue snapshots.
- Security hooks: minimal policy gates at the bridge rather than scattered node firmware logic.
Representative parts:
- NXP S32K118 — compact MCU class (verify CAN feature set per SKU).
- ST SPC560 (e.g., SPC560B family) — compact automotive MCU family class.
- TI TCAN4550 — external CAN FD controller + integrated transceiver (PHY details out of scope; useful for integration density).
IC Selection Logic (Controller/Bridge View)
Select among three architectures: MCU-only, MCU + external CAN controller, or gateway SoC/domain controller. The decision is driven by throughput, determinism, filtering power, diagnostics evidence, and security hooks.
- Multi-bus scale: number of CAN/CAN FD channels and concurrent traffic (peak bursts included).
- Determinism: required safety p99 latency/jitter gate (placeholder threshold).
- Diagnostics strength: whether field triage needs black-box + reason codes + full audit.
- Security boundary: default deny + allowlist + rate-limit + freshness concepts required.
Throughput is end-to-end service capacity under bursts: RX/TX FIFO depth, DMA support, IRQ load, and queue isolation determine whether drop/overflow happens. Evidence: RX overflow, TX abort, queue peak, drop burst count.
Determinism depends on predictable scheduling and timestamps: p99 latency gates, time semantics (if required), and isolated service for safety-class flows. Evidence: p99/p95 histograms, time-in-queue, mode entry/exit logs.
Filtering power is not only “how many filters”, but how maintainable the rule asset is: layered filters (coarse → fine → policy), update cost, and loop prevention. Evidence: per-rule hit counts, deny/drop reason taxonomy, rule version hash.
Field serviceability requires explainable events: every allow/deny/drop must be attributable (ts, rule-id, reason, queue snapshot), backed by a black-box ring buffer. Evidence: audit coverage, ring-buffer export, quarantine transitions with reasons.
Security hooks are enforceable gates: default deny, allowlist, rate-limit, freshness concept, and mandatory audit. Prefer centralized gates over scattered application logic. Evidence: deny reasons, replay/stale counters, quotas triggered, ruleset checksum.
MCU-only: best when channel count and rule complexity are moderate and p99 gates can be met with careful queue isolation and observability.
MCU + external CAN controller: used to expand channels or offload message objects/filtering/timestamping. Budget the host interface (SPI) and ensure timestamp/queue evidence remains consistent.
TCAN4550 includes an integrated transceiver; PHY/EMC details remain out of scope for this page.
Gateway SoC / domain controller: preferred for multi-domain aggregation with strong audit/security boundaries. Requires strict plane split (control vs diag/log), rule asset management, and p99 budgets.
FAQs (Controller/Bridge Debug — Fixed 4-line Answers)
Scope: controller/bridge/scheduling/filtering/latency/diagnostics only. Each answer is data-driven and anchored to counters, logs, and pass gates.
Bus load is low but frames drop — check RX FIFO or ISR jitter first?
Likely cause: RX FIFO / host queue overflow triggered by ISR entry jitter, IRQ masking, or insufficient service rate under bursts.
Quick check: Correlate rx_overflow, queue_depth_peak, and isr_latency_p99 within the drop time window.
Fix: Increase FIFO depth/watermarks, enable/optimize DMA, reduce IRQ masking, and isolate control vs diagnostic queues.
Pass criteria: rx_overflow=0 and isr_latency_p99 < X µs for Y minutes under the same traffic replay.
Periodic messages are delayed only during diagnostic/OTA bursts — priority issue or missing shaping?
Likely cause: Diagnostic bursts share the same queue/service budget and starve periodic control flows (no plane split or quota).
Quick check: Compare control-flow p99_latency and queue_depth_peak when diag_burst_rate spikes; inspect token_empty_time if shaping exists.
Fix: Hard-split control vs diagnostic planes, enforce token bucket/quotas for diagnostics, reserve service for safety-class flows.
Pass criteria: Control p99_latency < X ms while diagnostics runs at Z msgs/s, and drop_count(control)=0.
After bridging, frames occasionally arrive out of order — how to validate merge ordering?
Likely cause: Multi-queue merge without stable ordering (timestamp domain mismatch or per-queue dequeue bias).
Quick check: Log ts_ingress and ts_egress per (src_bus, ID); compute reorder_count for the same stream.
Fix: Enforce per-stream FIFO ordering, unify timestamp source, or add sequence tags through the bridge pipeline.
Pass criteria: reorder_count=0 for Y frames at X% bus load.
Same rule table, new firmware makes latency drift — what to align first?
Likely cause: Timestamp capture point changed, rule-eval cost shifted, or counters/window definitions differ between builds.
Quick check: Verify ts_domain (ingress vs post-filter), compare t_proc proxy (rule-eval time), and ensure identical replay/time-window settings.
Fix: Standardize measurement points, log rule_ver + checksum, and run regression against the same traffic capture.
Pass criteria: New build matches baseline within ±X% on p99_latency and queue_depth_peak under identical replay.
CAN↔CAN bridging creates a storm — how to prove it’s a loop (not a noisy node)?
Likely cause: Forwarding loop (A→B and B→A) or mirrored rules reflecting traffic back, creating a storm signature.
Quick check: Detect repeated (ID + payload hash) crossing both buses with near-zero inter-arrival; track loop_detect_count and top rule_id hits.
Fix: Add loop-prevention tags/TTL, enforce one-way rules, and quarantine endpoints on storm signatures.
Pass criteria: loop_detect_count=0 and storm-induced drop_burst < X per Y minutes.
TTCAN startup jitter is high for a few seconds — sync settling or guard time too small?
Likely cause: Reference time acquisition is still settling, or schedule guard time is under-budgeted during the start phase.
Quick check: Plot jitter vs time since power-up; check window_miss_count and slot-boundary violations during the first T seconds.
Fix: Increase startup guard time, delay enabling non-critical traffic, and require sync lock before full schedule activation.
Pass criteria: After T seconds, window_miss_count=0 and periodic jitter p99 < X µs.
Diagnostic timeouts happen but CAN waveforms look fine — check DoIP→CAN backpressure first?
Likely cause: Diagnostic plane backpressure/queue coupling (DoIP ingress > CAN service), not a PHY-layer problem.
Quick check: Inspect diag queue_depth_peak, token_empty_time, and timeout-aligned drop_reason logs (reason + rule_id + q_depth snapshot).
Fix: Rate-limit DoIP requests, enforce per-target quotas, and hard-split control/diag queues with caps and priority.
Pass criteria: Timeout rate < X/hour and diag queue_depth_peak < Y under the same tester script.
After a rule update, legitimate traffic is blocked — how to do canary/version check/rollback?
Likely cause: Rule mismatch (mask/action), inconsistent ruleset distribution, or non-atomic update without version pinning.
Quick check: Verify rule_ver and ruleset_checksum on all nodes; compare deny hits grouped by rule_id before/after rollout.
Fix: Canary rollout + atomic swap, maintain last-known-good ruleset, and enable one-click rollback keyed by checksum.
Pass criteria: False-deny rate < X% during canary, and rollback restores baseline within T minutes.
After bus-off recovery, the gateway “self-excites” — how to decouple recovery from forwarding?
Likely cause: Recovery triggers burst retransmissions or rule-state resets, feeding back into forwarding queues and amplifying traffic.
Quick check: Align bus_off_count/recovery_events with tx_rate, tx_abort, and queue spikes (queue_depth_peak).
Fix: Gate forwarding during recovery, ramp TX rate, and keep ruleset stable across recovery transitions.
Pass criteria: Post-recovery tx_rate < X msgs/s and control p99_latency < Y ms with no sustained queue saturation.
Logs show drops, but field capture can’t reproduce — which “drop reason” field is missing?
Likely cause: Drops are recorded without the causal dimension (queue full vs rate-limit vs policy deny vs quarantine/loop guard).
Quick check: Ensure every drop has reason + rule_id + q_depth + state captured at decision time.
Fix: Standardize a reason taxonomy (QFULL/RATE/POLICY/QUAR/LOOP) and log it consistently across all drop paths.
Pass criteria: Drop attribution coverage = 100% (no “unknown”), and replay using the same reasons reproduces the drop signature.
Multi-channel CAN FD pushes CPU high — how to tell DMA vs IRQ bottleneck fast?
Likely cause: IRQ storm (small batch size) or DMA/memory contention causing long service gaps and FIFO pressure.
Quick check: Compare irq_rate, isr_latency_p99, DMA completion latency, and RX watermark behavior (rx_fifo_highwater).
Fix: Increase batching, prioritize DMA paths, reduce per-frame IRQs, and isolate bridge tasks to avoid cache/priority inversion.
Pass criteria: CPU headroom > X% and rx_overflow=0 at Y% bus load with Z channels concurrent.
Allowlist enabled, functions fail intermittently — allowlist miss or rate-limit collateral damage?
Likely cause: Legitimate frames are blocked by missing allowlist entries or shaped away by quotas/rate-limit (false deny).
Quick check: Group denies by reason (POLICY vs RATE) and top-hit rule_id; validate rule coverage for required IDs and rates.
Fix: Patch allowlist coverage, add per-function quotas, and reserve service/priority for safety-class/control messages.
Pass criteria: False-deny rate < X% and control drop=0 under worst-case diagnostic burst + normal operation.