SBC with CAN FD: ECU Power, Wake Management & Safety Hooks
An SBC with CAN FD turns ECU power, safety, and wake management into a verified state machine, with the CAN FD interface integrated so that mode transitions stay quiet, measurable, and serviceable. The goal is fewer wiring/BOM pitfalls and faster root-cause analysis of wake, reset, and thermal issues through consistent logging and pass/fail criteria.
H2-1 · What is an SBC with CAN FD
An SBC (System Basis Chip) with CAN FD is an ECU “foundation layer” that consolidates power rails, supervision/reset, watchdog, low-power/wake policy, and a CAN FD PHY (often with optional LIN) into one coherent system block.
Definition (system view)
It is not just a transceiver. It is the ECU’s power + safety + wake orchestration point, with the CAN FD interface integrated so that power states and network behavior stay aligned.
What “integrated CAN FD (optional LIN)” really means
- Fewer coupling mistakes: sleep current, wake routing, and reset causes are controlled by one state machine.
- Cleaner power-state alignment: the CAN interface can be gated/biased consistently across Normal/Standby/Sleep.
- Optional LIN is a system convenience: useful when the ECU also hosts a small-node domain; electrical details belong to the LIN PHY page.
Where it sits in a typical ECU
- Near VBAT entry: to manage transients/thermal and stabilize rails before MCU domains ramp.
- Near the MCU: to keep reset/watchdog/interrupt paths short and unambiguous.
- Near the bus connector region: to coordinate power-state gating with port protection/EMC layout constraints.
Scope guard (to prevent overlap)
This page treats CAN FD as a system interface (modes, fail-safe behavior, wake integration, EMC knobs). Detailed physical-layer timing and waveform topics are intentionally excluded.
See also (internal): CAN FD Transceiver, Selective Wake / Partial Networking.
H2-2 · Why integrate: Problems SBC solves (vs discrete)
The strongest reason to integrate is not only fewer parts. Integration turns scattered “wires and defaults” into a single, testable state machine for power, safety, and wake behavior—reducing field escapes and improving serviceability.
What goes wrong in discrete builds (common failure patterns)
- Sleep current surprises: back-powering through I/O domains, missing rail gating, or wake pin defaults.
- Untraceable resets: POR/BOR/watchdog/thermal causes are fragmented across devices and logs.
- Wake chaos: bus/local/timed wake sources are not attributed consistently, increasing false wakes and missed events.
- EMC trade-offs become accidental: slew and return paths are tuned late, after the architecture is already brittle.
Integration benefits that show up in design reviews
- Complexity reduction: fewer rails-to-enables-to-wake chains; fewer ambiguous “glue” connections.
- Consistent low-power behavior: a defined mode machine (Normal/Standby/Sleep) controls rails and network gating coherently.
- Better diagnostics: unified flags/interrupts enable clear fault attribution (reset cause, wake reason, thermal events).
- Production repeatability: standardized bring-up checks and measurable pass criteria (sleep Iq, wake rate, reset statistics).
Integration trade-offs (what must be planned early)
- Earlier system decisions: rail policy, wake sources, and watchdog strategy must be defined before layout.
- Tighter EMC & thermal coupling: a single chip concentrates power, safety, and bus behavior—layout and return paths matter more.
- Bring-up discipline: validate mode transitions and attribution (wake/reset causes) as first-class acceptance tests.
Scope guard (what stays out of this section)
This section focuses on system integration outcomes (power states, safety hooks, wake attribution, diagnosability). Detailed CAN FD physical-layer timing and waveform optimization remain out of scope.
H2-5 · Power modes & state machine (Normal / Standby / Sleep)
Low power must be implemented as a state machine, not a set of scattered defaults. A clear mode contract prevents false wakes, missed wakes, and reboot loops by making rails, CAN behavior, MCU expectations, and allowed wake sources explicit and testable.
Mode contract (what each mode guarantees)
Normal
- Rails: all required rails ON (core + VIO + comm domains).
- MCU: run mode; reset cause & fault flags periodically sampled.
- CAN: normal operation; fail-safe rules enforced (timeout/thermal).
- Wake: not applicable (already awake), but attribution remains enabled for diagnostics.
Standby
- Rails: standby/always-on domains kept; high-load rails gated.
- MCU: deep sleep or halted; wake reason is latched for attribution.
- CAN: standby/silent behavior aligned with wake policy (no waveform tuning here).
- Wake allowed: bus wake, local wake, timer wake, ignition wake (as configured).
Sleep
- Rails: only always-on / minimal standby rails remain (target sleep Iq).
- MCU: off or deepest sleep; I/O must not back-power domains.
- CAN: lowest-power biasing; wake path is qualified and attributed.
- Wake allowed: only the explicitly enabled sources (avoid “anything wakes”).
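The mode contract above can be captured as data instead of prose, so it stays testable. A minimal Python sketch follows; the mode names, rail names, and wake-source labels are illustrative placeholders, not any vendor's register map:

```python
# Illustrative mode-contract table: each mode declares which rails stay on
# and which wake sources are permitted. All names are placeholders, not a
# specific SBC's register fields.
MODE_CONTRACT = {
    "NORMAL":  {"rails": {"core", "vio", "comm", "always_on"},
                "wake_allowed": set()},  # already awake; attribution stays on
    "STANDBY": {"rails": {"always_on", "standby"},
                "wake_allowed": {"bus", "local", "timer", "ignition"}},
    "SLEEP":   {"rails": {"always_on"},
                "wake_allowed": {"bus", "ignition"}},  # explicit sources only
}

def wake_permitted(mode: str, source: str) -> bool:
    """A wake source is valid only if the mode contract explicitly lists it."""
    return source in MODE_CONTRACT[mode]["wake_allowed"]
```

Binding the contract to a table like this makes "avoid anything wakes" checkable in bring-up tests rather than a review comment.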
Entry/exit conditions (sealed transitions)
Enter Standby / Sleep
- Bus silent window: wait for a defined quiet period before gating comm domains.
- MCU handshake: confirm the application is ready to sleep (no pending critical tasks).
- Rail policy applied: VIO/back-power checks must pass before deep gating.
Exit (wake paths)
- Ignition wake: highest priority; raise rails first, then software policy.
- Bus wake: qualified by policy; latch wake reason before enabling full comm.
- Local/timer wake: debounce/qualify; attribute source to avoid “mystery wake”.
Policy priority & first actions (prevent reboot loops)
Priority (who can wake what)
- Hard wake: ignition and safety-critical events.
- Qualified wake: bus/local/timer sources enabled by policy and validated by attribution.
- Blocked wake: noise-like triggers and unauthorized wake sources.
First actions after wake (minimum sequence)
- Latch wake reason: lock the source (bus/local/timer/ignition) before enabling full features.
- Raise rails + wait power-good: avoid brownout-triggered reset chatter during ramp.
- Early boot posture: keep loads minimal until rail stability is confirmed.
- Comm gating order: silent/standby → normal after policy checks (details belong to PHY pages).
- Log context: VBAT/temperature/mode counters to enable serviceability.
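The minimum post-wake sequence above can be sketched as one ordered routine. `read_power_good` and `log` are hypothetical platform callables, and the comm-gating rule is a sample policy, not a prescribed one:

```python
def handle_wake(wake_source, read_power_good, log, max_polls=1000):
    """Minimum post-wake sequence sketch: latch first, rails second,
    comm gating after policy checks, context logging last."""
    reason = wake_source                      # 1. latch wake reason first
    for _ in range(max_polls):                # 2. raise rails, wait power-good
        if read_power_good():
            break
    else:
        return reason, "fault"                # rails never stabilized
    comm = "silent"                           # 3. early boot: loads minimal,
    if reason in ("bus", "ignition"):         #    comm gated until policy check
        comm = "normal"                       # 4. silent/standby -> normal
    log({"wake_reason": reason, "comm": comm})  # 5. log context
    return reason, comm
```

The point of the fixed ordering is that the wake reason is captured before any later step (reset, flag clear, mode change) can destroy it.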
Pass criteria placeholders (measurable acceptance)
- Sleep Iq ≤ X µA (define VBAT/temperature/bus-connected conditions).
- False wake ≤ X / day (define window + what counts as “unauthorized wake”).
- Wake latency ≤ X ms (wake event → rails stable → MCU run).
- Reboot loops = 0 within a defined observation window.
Scope guard (to prevent overlap)
This section defines the mode contract (rails/MCU/CAN/wake policy), transitions, and verification metrics. Detailed CAN FD waveform/timing and selective-wake filter-table mechanics are intentionally excluded.
H2-6 · Watchdog, reset, and safety hooks (ASIL-friendly thinking)
A watchdog is only valuable when it is verifiable: it must detect real faults, be attributable in logs, and integrate with MCU safety handling via clear hooks (fault/interrupt/flags). Reset causes must be distinguishable to avoid “mystery resets” in production and field operation.
Window watchdog vs timeout watchdog (correct use)
Window watchdog
- Best for: detecting “too fast / too slow” servicing (runaway, timing faults).
- Common misuse: windows too tight cause false resets; servicing placed in the wrong task hides faults.
- Verify: inject CPU-load/IRQ storms and confirm predictable fault attribution.
Timeout watchdog
- Best for: baseline “stuck” protection with simpler software scheduling.
- Common misuse: a timeout set too long lets faults persist undetected; one set too short resets the MCU during boot ramps.
- Verify: separate boot-phase and run-phase servicing policies (avoid boot chatter).
Reset taxonomy (POR/BOR/WD/THERM) and how to separate in logs
- POR: power-on reset; correlate with rail rise and initial power-good events.
- BOR: brownout reset; correlate with VBAT/rail dips and load transients.
- WD reset: watchdog violation; confirm with WDG status/flag and service history.
- THERM: thermal shutdown/reset; confirm thermal flag and recovery behavior.
Minimum trace fields (serviceability baseline)
reset_cause · last_wake_reason · mode_before_reset · VBAT_min/VBAT_now · temperature · fault_flags snapshot · watchdog_config_id
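A sketch of how the taxonomy and trace fields combine into one decode-and-log step. The priority order and field names are assumptions for illustration, not a device datasheet:

```python
def classify_reset(flags):
    """Pick one primary reset cause from latched status flags. The priority
    order is an assumption: report thermal/watchdog evidence before generic
    power causes so it is not masked by a simultaneous POR."""
    for cause in ("THERM", "WD", "BOR", "POR"):
        if cause in flags:
            return cause
    return "UNKNOWN"

def reset_trace(flags, wake_reason, mode, vbat_min, vbat_now, temp_c, wd_cfg):
    """Pack the minimum trace fields listed above into one log record."""
    return {
        "reset_cause": classify_reset(flags),
        "last_wake_reason": wake_reason,
        "mode_before_reset": mode,
        "VBAT_min": vbat_min, "VBAT_now": vbat_now,
        "temperature": temp_c,
        "fault_flags": sorted(flags),
        "watchdog_config_id": wd_cfg,
    }
```

Keeping the decode table in one place gives the "consistent decoding table" the production section asks for.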
Safety hooks (fault pin / interrupt / fault-injection readiness)
- Fault pin / INT: map critical vs warning events; avoid ambiguous “one pin means everything”.
- Attribution first: latch cause codes before clearing flags or changing modes.
- Fault injection: validate reactions by triggering WDG violations, brownout scenarios, and thermal thresholds under controlled conditions.
- MCU integration: route safety events into a consistent handler that logs context and enforces a recovery policy.
Production focus (reset statistics and root-cause traceability)
- False reset rate ≤ X / 1k hours (define conditions and duty cycle).
- Reset cause distribution is observable (POR/BOR/WD/THERM) with a consistent decoding table.
- Reboot loop prevention is enforced (no repeated resets within a defined time window).
- Field correlation is enabled by “minimum trace fields” logged at each reset boundary.
Scope guard (to prevent overlap)
This section focuses on watchdog strategy, reset attribution, and safety hooks for diagnosability. It does not expand protocol timing or physical-layer waveform tuning.
H2-7 · Wake management (bus/local/timed) without overlap
Wake must be managed as a system policy: which sources are allowed in each low-power mode, how false wakes are suppressed, and how the wake source is attributed for serviceability. Protocol-level filter-table mechanics are intentionally delegated to the ISO 11898-6 selective-wake page.
Wake source map (what can wake the ECU)
- Bus wake: wake request derived from in-vehicle network activity, qualified by policy.
- Local pin wake: external pins (switch/sensor) with debounce/threshold management.
- Timer wake: periodic wake to maintain freshness, diagnostics, or keep-alive tasks.
- Ignition wake: highest-priority power-domain signal; typically overrides other gating.
Mode vs allowed wake (policy matrix)
Standby: bus ✓ · local ✓ · timer ✓ · ign ✓
Sleep: only policy-enabled sources ✓ (never "anything wakes")
Recommendation: bind each wake source to a mode contract and a measurable acceptance target.
False-wake control (manage noise without protocol overlap)
- Debounce windows: require stability before accepting local/ignition transitions.
- Threshold + hysteresis: prevent small excursions from triggering wake (temperature/noise aware).
- Quiet/observe windows: do not accept wake during entry transients (avoid wake loops).
- Rate limiting: enforce cool-down time to prevent repeated wake storms.
- Policy versioning: track configuration revision to correlate field events with settings.
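Two of the knobs above, debounce and rate limiting, can be sketched as a single qualifier. The default thresholds are placeholders to be bound to the acceptance matrix:

```python
class WakeQualifier:
    """False-wake control sketch: a debounce window plus a cool-down
    rate limit. Threshold defaults are placeholders, not recommendations."""
    def __init__(self, debounce_ms=5, cooldown_ms=1000):
        self.debounce_ms = debounce_ms
        self.cooldown_ms = cooldown_ms
        self.last_accept_ms = None

    def accept(self, stable_ms, now_ms):
        """Accept a wake only if the trigger was stable long enough and the
        cool-down window since the last accepted wake has elapsed."""
        if stable_ms < self.debounce_ms:          # not stable long enough
            return False
        if (self.last_accept_ms is not None and
                now_ms - self.last_accept_ms < self.cooldown_ms):
            return False                          # still inside cool-down
        self.last_accept_ms = now_ms
        return True
```

Rejections should still be counted and logged (with policy_id) so field events can be correlated with settings.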
Protocol-level frame matching and filter-table details are intentionally handled by the ISO 11898-6 selective-wake page.
Wake attribution (serviceability baseline)
Attribution pipeline (minimum)
- Latch: lock wake reason before clearing flags or switching modes.
- Classify: bus / local / timer / ignition (concept-level categories).
- Expose: reason register + INT/FLAG + optional output pin (if available).
- Log: wake_reason + mode_before_wake + VBAT/temperature + policy_id.
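The latch / classify / expose / log pipeline can be sketched over a hypothetical raw status snapshot; the priority order follows the policy section (ignition first):

```python
def attribute_wake(raw_status, log):
    """Attribution pipeline sketch: snapshot (latch) the status before
    anything can clear it, classify by priority, then expose and log."""
    latched = dict(raw_status)                  # latch: copy before clearing
    reason = "unknown"
    for src in ("ignition", "bus", "local", "timer"):
        if latched.get(src):                    # classify: first match wins
            reason = src
            break
    log({"wake_reason": reason, "latched": latched})  # expose + log context
    return reason
```

An "unknown" result is itself a useful metric: it feeds directly into the attribution-accuracy acceptance check below.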
Measurable checks (placeholders)
- False wake ≤ X/day (define “unauthorized wake”).
- Missed wake = 0 (a wake trigger occurred but no boot entry followed).
- Attribution accuracy ≥ X% (source matches stimulus).
- Wake latency ≤ X ms (event → rails stable → handler).
Responsibility split (SBC vs gateway/PN)
- SBC: power rails + wake enabling + qualification windows + attribution outputs.
- Gateway / PN logic: protocol filtering tables and network-level wake authorization (handled in ISO 11898-6 page).
Scope guard (to prevent overlap)
This section describes wake sources, qualification, and attribution at the power/wake layer. It does not expand ISO 11898-6 filter-table specifics or physical-layer timing details.
H2-8 · CAN FD inside an SBC: what matters (system view)
When CAN FD is integrated into an SBC, the main pitfalls are rarely “waveform details”. The system-level risks come from configuration knobs (slew/timeout/mode), fail-safe behaviors, and reset/sleep sequencing that couple into EMC, false wake exposure, bus errors, and sleep current.
Integrated PHY pitfalls (system layer)
- I/O domain coupling: TXD/RXD level-domain and VIO sequencing mistakes can cause back-powering and sleep-Iq budget overruns.
- Mode transition visibility: reset and wake transitions can appear as bus activity if silent windows are not enforced.
- Fail-safe consequences: timeout and dominant protection protect the network, but can drive system-level retries and wake storms.
Programmable slew/drive (EMC vs robustness trade-off)
- Slower edges: lower emissions tendency, but reduced margin on heavy harness/loads and across temperature.
- Faster edges: stronger robustness margin, but higher emissions risk and more stringent layout/return requirements.
- System method: manage settings as discrete profiles with a validation matrix (harness/temperature/load).
Fail-safe behavior (TXD stuck dominant, dominant timeout)
- TXD stuck dominant: can hold the bus, forcing network-level isolation and loss of communication availability.
- Dominant timeout: prevents persistent dominance, but may shift failures into retries, bus-off events, and wake storms.
- System countermeasure: attribute the event, gate re-entry, and log the context before re-enabling normal mode.
Reset/sleep interaction (silent windows and recovery posture)
- Power-up silent window: stabilize rails and mode policy before enabling normal bus participation.
- Sleep entry order: ensure TXD/RXD are not driving while VIO domains transition to prevent back-power.
- Bus-off recovery policy: enforce cool-down windows and logging to avoid oscillation loops.
MCU interface and I/O domain policy (prevent back-power)
- TXD/RXD level domain must match VIO policy; undefined domains often create leakage paths.
- Sequencing rule: when VIO is OFF, TXD/RXD must not drive; enforce high-Z posture and clear defaults.
- Field symptom: sleep Iq spikes and unpredictable wakes often trace back to I/O domain back-powering.
Scope guard (to prevent overlap)
This section covers configuration knobs and system outcomes (EMC, false wakes, bus errors, sleep current). Detailed sampling-point, loop-delay, and waveform symmetry topics are delegated to the CAN FD transceiver page.
H2-9 · Robustness vs automotive transients (load dump, reverse battery, shorts)
The SBC power entry defines ECU survivability. This section frames power transients as a review-ready chain: event → system risk → protective posture → recovery policy → measurable acceptance. Component encyclopedias are intentionally avoided; only system hooks and verification posture are covered.
Transient map (events → typical system risks)
- Load dump / jump start (over-voltage): rail overstress, clamp saturation, mode churn.
- Cold crank (deep droop): brownout chatter, reboot loops, bus visibility glitches.
- Reverse battery (polarity error): reverse current paths, leakage drift, latent failures.
- Shorts + thermal (abuse): current limit posture, thermal shutdown, recovery oscillation.
Review cue: define which rails must stay alive, which may drop, and how wake/reset attribution is preserved across events.
Load dump & jump start (over-voltage posture)
- System objective: protect downstream rails while keeping the ECU in a controlled posture.
- Protection posture: clamp/limit/shutdown decisions should map to a defined mode contract (Normal/Standby/Sleep).
- Recovery order: stabilize rails → latch fault cause → enforce silent window → re-enable communication.
Acceptance placeholders: no rail overstress beyond X; no reset chatter; recovery ≤ X ms; fault counters and context are traceable.
Cold crank (deep droop without reboot loops)
- Failure mode: BOR/POR oscillation creates repeated boots and unstable bus participation.
- Rail policy: define always-on/standby rails that preserve attribution and avoid I/O back-power.
- Silent window: prevent entry transients from being interpreted as legitimate bus activity.
Verification placeholders: reset chatter ≤ X within Y seconds; wake attribution remains consistent; sleep Iq returns to target after recovery.
Reverse battery (polarity error → controlled protection and recovery)
- System concern: reverse paths can leak through supply pins and I/O structures, causing latent drift.
- Posture: force safe reset/quiet state, preserve fault context, avoid partial powering of logic domains.
- Recovery: after polarity returns, enforce a stabilization window before re-entering network participation.
Field signature to avoid: persistent sleep-Iq inflation and unpredictable wakes caused by back-powering after a polarity event.
Shorts & thermal (limit, shutdown, and recovery policy)
- Current limit posture: define whether rails droop, fold back, or transition into a restricted mode.
- Thermal shutdown: treat as a system event; capture context and avoid uncontrolled retry storms.
- Recovery strategy: choose retry vs latch-off posture and bind it to measurable gates (cool-down window, max retries).
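The retry-vs-latch-off choice above can be sketched as a small policy object. The cool-down and retry limits are placeholders for the measurable gates:

```python
class RecoveryPolicy:
    """Recovery posture sketch: allow at most max_retries re-entries, each
    gated by a cool-down window, then latch off until service intervention."""
    def __init__(self, cooldown_ms=500, max_retries=3):
        self.cooldown_ms = cooldown_ms
        self.max_retries = max_retries
        self.faults = 0
        self.last_fault_ms = None

    def on_fault(self, now_ms):
        self.faults += 1
        self.last_fault_ms = now_ms

    def may_retry(self, now_ms):
        if self.faults > self.max_retries:
            return False                        # latched off: no retry storm
        return now_ms - self.last_fault_ms >= self.cooldown_ms
```

The same shape applies to thermal-shutdown recovery and bus-off re-entry; only the window and limit values change per event class.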
Bus short robustness (system behavior, not PHY electrical detail)
- Short-to-VBAT/GND: ensure the ECU remains in a controlled posture (quiet, alarm, and logging) during the fault.
- Domain isolation: prevent bus pins from partially powering logic domains and inflating sleep current.
- Recovery: after fault removal, enforce a recovery window before normal participation to avoid oscillation loops.
Design review checklist (ready-to-use)
- Event coverage: load dump, crank, reverse, jump start, shorts (inputs are explicitly tested).
- Protective posture: clamp/limit/shutdown behaviors are mapped to mode contracts.
- Recovery gates: silent windows, cool-down windows, and retry limits are defined.
- Traceability: fault flags + event counters + VBAT/temperature snapshots are captured.
- No loops: no reboot storms and no repeated wake storms under any transient scenario.
Scope guard (to prevent overlap)
This section focuses on system posture and verification under power transients. Detailed TVS/CMC/termination parts selection and CAN PHY electrical deep-dives are delegated to the EMC/Protection and CAN Transceiver pages.
H2-10 · EMC co-design & layout hooks (system-level, not component bible)
System EMC is dominated by partitioning and return paths. This section focuses on zones, ground/return planning, connector-side protection placement principles, and a measurable workflow for configuring slew profiles. Detailed TVS/CMC/termination part encyclopedias are intentionally delegated.
PCB partitioning (power noisy vs digital vs bus interface)
- Power noisy zone: input protection, high dI/dt loops, and rail generation; keep loop areas tight.
- Digital zone: MCU/clock/reference; protect the quiet reference and prevent return disruptions.
- Bus interface zone: CAN pins and connector adjacency; manage ESD/CM return paths deliberately.
Return path & ground planning (make current return predictable)
- Harness return awareness: do not force return currents to detour through sensitive areas.
- Split planes with intent: avoid cutting the natural return path; use stitches at boundaries.
- Star-ground only when justified: uncontrolled “star” wiring can increase loop areas and radiation.
Connector-side protection placement (principles only)
- Near-connector: place protection close to the connector to intercept fast events early.
- Short return: the protection return path must be short and direct to the intended reference.
- Keep-out discipline: reserve space to prevent sensitive traces from crossing noisy return regions.
Detailed TVS/CMC/split termination device selection is delegated to the EMC/Protection page.
Slew configuration as a workflow (profiles + measurable gates)
- Use profiles: EMC-friendly / robust / cold-temp (treat as discrete settings sets).
- Bind a matrix: EMI margin, bus error counters, false wake, and recovery time across harness/load/temp.
- Avoid single-point tuning: validate in representative wiring and mode transitions (boot, sleep, wake).
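The profile-plus-matrix workflow can be sketched as a gate that accepts a profile only when every matrix cell passes. Profile names, limit names, and values are placeholders:

```python
# Placeholder acceptance limits bound to each slew profile.
PROFILE_LIMITS = {
    "emc_friendly": {"min_emi_margin_db": 6, "max_errors_per_h": 5},
    "robust":       {"min_emi_margin_db": 3, "max_errors_per_h": 1},
}

def profile_passes(profile, matrix_results):
    """A profile is accepted only if every harness/load/temperature cell
    meets both the EMI-margin floor and the error-rate ceiling."""
    lim = PROFILE_LIMITS[profile]
    return all(cell["emi_margin_db"] >= lim["min_emi_margin_db"] and
               cell["errors_per_h"] <= lim["max_errors_per_h"]
               for cell in matrix_results)
```

Because the gate quantifies over all cells, a profile that was "tuned once" on one bench setup fails visibly instead of silently.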
Validation hooks (system-level checks)
- Layout sanity: verify return continuity and that protection returns do not detour through digital zones.
- Mode transitions: check boot/sleep/wake for emission spikes and false wake exposure.
- Profile deltas: quantify how each profile shifts EMI margin and system robustness metrics.
Common pitfalls (fast diagnosis cues)
- Protection too far: “installed” but ineffective because the event reaches the board first.
- Return cut: split ground breaks natural return, increasing loop area and radiation.
- Unplanned harness return: current finds a path through sensitive zones.
- Slew tuned once: passes in lab but fails across harness/load/temp without a matrix approach.
Scope guard (to prevent overlap)
This section explains system partitioning, return paths, and a profile-based EMC workflow. Detailed TVS/CMC/split termination part encyclopedias are delegated to the EMC/Protection page.
Applications (Body/Comfort domains & gateway patterns)
Partial networking is most effective when the vehicle is partitioned into domains and wake is targeted to only the ECUs that are needed. This section provides three reusable application patterns without expanding into device-level EMC or full gateway implementation details.
Scope lock (to avoid cross-page overlap)
- In scope: domain roles, wake boundaries (remote/timed/diagnostic), node-level PN templates, and attribution fields.
- Out of scope: device-level EMC components, CAN FD waveform tuning, and full gateway/bridge internal implementation.
Pattern 1 — Body/Comfort domain: many small ECUs, targeted wake
Why PN fits: large node count amplifies standby cost; targeted wake avoids powering the whole body domain for one function.
Policy skeleton: keep a minimal whitelist of wake-up frames; treat every policy as versioned (filter_table_version + policy_version).
Attribution requirement: every wake must be explainable with source, timestamp, and (when applicable) filter_hit_id.
Common pitfall: a “convenience” diagnostic bypass that becomes permanent and silently destroys standby targets.
Pattern 2 — Gateway/TCU: remote/timed/diagnostic wake boundaries
Remote wake: define when the gateway may wake a domain directly vs when it must emit a filtered wake-up frame to target nodes.
Timed wake: treat periodic jobs as a budgeted resource; enforce cooldown windows to avoid “chatter wakes”.
Diagnostic wake: require time-bounded service mode + audit logging; record who requested bypass and for how long.
Minimum boundary fields (placeholders)
wake_source · request_origin · policy_version · duration_limit · outcome
Pattern 3 — Sensor/actuator node: minimal PN template (reusable)
Use a node template to prevent configuration drift across many small ECUs. Keep the template small, versioned, and measurable.
Template fields
- node_role: sleep-only / PN-capable / always-on
- wake_sources: bus / local / timed (plus remote via gateway policy)
- filter_set: minimal whitelist (ID/mask + optional DLC/payload match)
- logging_hooks: source + timestamp + version (+ filter_hit_id when present)
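The node template can be sketched as an immutable record plus a drift check across the fleet. Field names mirror the list above; the values are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeTemplate:
    """Minimal PN node template sketch mirroring the fields above;
    all example values are illustrative placeholders."""
    node_role: str            # "sleep-only" | "PN-capable" | "always-on"
    wake_sources: frozenset   # e.g. frozenset({"bus", "timed"})
    filter_set: tuple         # ((can_id, mask), ...) whitelist entries
    policy_version: str

def detect_drift(fleet):
    """Flag nodes whose template differs from the first (baseline) node,
    so configuration drift across many small ECUs becomes visible."""
    baseline = fleet[0]
    return [i for i, node in enumerate(fleet) if node != baseline]
```

Freezing the record means any change forces a new template instance (and, in practice, a new policy_version), which is the versioning discipline the pattern asks for.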
Pass criteria (placeholders)
- standby Iq: within X µA over Y minutes (stable window defined)
- false wake rate: within X/day over Y days, with attribution completeness ≥ Z%
- wake latency: P95 within X ms from bus activity to host-ready
IC Selection Notes (Selective-wake transceiver / SBC pairing)
This section provides PN-specific selection logic and concrete material-number examples. It does not replace HS CAN / CAN FD transceiver selection pages. The goal is to choose the correct PN capability class, interface fit, and serviceability hooks for the system policy.
PN-only scope (avoid overlap with transceiver families pages)
- In scope: standby Iq definition, wake filtering capability, wake outputs, false-wake suppression knobs, versioning + logging hooks.
- Out of scope: detailed CAN FD physical-layer tuning, waveform shaping deep dives, and component-level EMC design.
Step 1 — Freeze selection inputs (as fields, not prose)
Inputs
node_role · battery_budget · wake_sources · update_policy · logging_level
Why it matters
These inputs determine filter-table size, wake-evidence needs, and whether an SBC-class device simplifies power and wake policy enforcement.
Step 2 — Must-check PN specs (the “four-piece set”)
- Standby Iq (PN listening definition): measure in the exact low-power mode that still monitors for wake frames.
- Filtering capability: ID/mask + optional DLC/payload match + table capacity + rule priority behavior.
- Wake outputs / evidence: wake pin behavior plus readable hit evidence (e.g., hit_id / status) for attribution.
- False-wake suppression knobs: debounce window, second-confirm option, cooldown windows, and rate-limited wake counters.
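The "four-piece set" can be sketched as one selection gate over a candidate's datasheet facts. Every field name and limit is a placeholder for the project's real budget, not a real device's parameter list:

```python
def pn_spec_gate(candidate):
    """Pass/fail sketch of the four must-check PN specs. Returns the
    overall verdict plus per-check results for the review record."""
    checks = {
        "standby_iq":  candidate["standby_iq_ua"] <= 50,   # PN-listening budget
        "filtering":   candidate["filter_entries"] >= 4
                       and candidate["id_mask_match"],
        "wake_output": candidate["wake_pin"] and candidate["hit_evidence"],
        "suppression": candidate["debounce"] or candidate["second_confirm"],
    }
    return all(checks.values()), checks
```

Keeping the per-check detail (not just the verdict) makes the selection auditable when a candidate is rejected.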
Step 3 — Match to SBC/MCU interfaces (policy meets pins)
- Wake pin compatibility: voltage domain, pull behavior, and wake-to-boot timing compatibility with MCU/SBC policy.
- Reset strategy alignment: avoid losing wake evidence due to immediate resets; preserve attribution across boot.
- Supply domain reality: define what must remain powered in PN listening (and record that as part of the budget).
Step 4 — Risk radar (PN projects fail here)
Update & governance
filter table versioning + rollback; service bypass must be time-bounded and auditable.
Post-stress behavior
after ESD/surge events, false-wake behavior must remain consistent (track false_wake_rate and suspects).
Serviceability
require wake attribution fields: source · timestamp · version · (hit_id when available) and a ring buffer replay path.
Example material numbers (PN-capable classes) — verify package/suffix/availability
PN-capable CAN transceivers (selective wake / ISO 11898-6 related)
- NXP: TJA1145A (HS CAN transceiver for partial networking)
- TI: TCAN1145-Q1 (CAN FD transceiver with partial networking via selective wake)
- TI: TCAN1146-Q1 (selective wake + watchdog/diagnostics variant class)
- Infineon: TLE9255W (HS CAN transceiver with partial networking)
- Microchip: ATA6570 (HS CAN transceiver with partial networking)
SBC pairing examples (SBC with CAN + partial networking / selective wake)
- NXP: UJA1168 (mini HS CAN SBC for partial networking)
- NXP: UJA1169A family (mini HS CAN SBC with selective wake and CAN FD-passive variants)
- Infineon: TLE9471-3ES (Lite CAN SBC family with CAN partial networking / selective wake feature)
Selection rule: treat the material number as the starting point; the project must still validate PN behavior under the system’s harness, wake sources, and policy versioning requirements.
FAQs (SBC system troubleshooting; 4-line answers)
These FAQs close long-tail troubleshooting without expanding new topics. Each answer is a fixed 4-line structure: Likely cause / Quick check / Fix / Pass criteria (with measurable placeholders).
Sleep current is 10× higher than expected — what is the first rail/isolation sanity check?
Likely cause: a “switched-off” domain is being back-powered through I/O or an always-on load is unintentionally left enabled.
Quick check: measure V(domain) while VIO is off; read SBC mode/policy bits; split Iq by disabling suspected rails/loads one at a time.
Fix: enforce pin posture (high-Z where needed), disable unused blocks (e.g., LIN, wake comparators), and gate always-on loads by policy/state entry.
Pass criteria: Sleep Iq ≤ X µA over Y min, VBAT=[Vmin..Vmax], Temp=[Tmin..Tmax]; ΔV(switched rail) ≤ X mV when “OFF”.
Random resets in the field but not on bench — how to separate BOR vs watchdog vs thermal?
Likely cause: environment-triggered reset sources (VBAT dips, watchdog window violations, or thermal events) are not captured with a consistent reset_cause taxonomy.
Quick check: read/reset_cause + counters; log VBAT_min and temperature snapshot near reset; correlate with watchdog service timestamps and mode transitions.
Fix: implement a small retained “reset ring buffer” (reset_cause + VBAT_min + T_max + policy_id); tune BOR/WD windows and add crank/thermal-aware gating.
Pass criteria: reset attribution accuracy ≥ X% in stress tests; reset rate ≤ X / 100 h under defined profile; reset chatter = 0 over N cycles.
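The retained "reset ring buffer" suggested in the fix can be sketched in a few lines; the depth and field names are placeholders:

```python
from collections import deque

class ResetRingBuffer:
    """Retained reset black-box sketch: keep the last N reset records
    (reset_cause + VBAT_min + T_max + policy_id) for field correlation."""
    def __init__(self, depth=8):
        self.buf = deque(maxlen=depth)   # oldest records drop automatically

    def record(self, reset_cause, vbat_min, t_max, policy_id):
        self.buf.append({"reset_cause": reset_cause, "VBAT_min": vbat_min,
                         "T_max": t_max, "policy_id": policy_id})

    def cause_histogram(self):
        """Distribution of causes, e.g. to spot a dominant WD or BOR trend."""
        hist = {}
        for rec in self.buf:
            hist[rec["reset_cause"]] = hist.get(rec["reset_cause"], 0) + 1
        return hist
```

On a real MCU the buffer lives in retained RAM or a non-volatile snapshot area so the records survive the reset they describe.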
Wake-ups happen at night with no bus traffic — how to triage false-wake sources fast?
Likely cause: an enabled wake source (local pin, timer, ignition sense, or bus wake) is noisy or misconfigured, causing false triggers in Sleep/Standby.
Quick check: read wake_reason + latched pin status immediately after wake; confirm timer schedule; probe the wake pin(s) for bounce/noise vs debounce settings.
Fix: disable unused wake sources in Sleep, tighten debounce/thresholds, add rate-limiting, and validate with representative harness disturbance and supply ripple profiles.
Pass criteria: false wake ≤ X / day (defined test method); wake_reason match ≥ X% across N trials; wake storm limit ≤ X wakes/hour.
CAN works until you enter standby, then bus errors spike — what mode-transition check is most common?
Likely cause: the transition sequence disables the transceiver or IO domain at the wrong time, creating a noisy edge case (missing silent window, premature TX enable/disable).
Quick check: capture timing of STBY request, transceiver enable, RST, and VIO rail; read error counters just before/after transition and check for bus-off oscillation.
Fix: enforce a gated sequence (silent window → disable TX → enter standby), keep required rails alive for wake-only receive, and validate transitions on representative harness/load.
Pass criteria: errors ≤ X / transition; bus-off count = 0 over N transitions; wake latency ≤ X ms after exit.
After cold crank, ECU boots but CAN stays silent — first “power-good vs transceiver enable” check?
Likely cause: rails recover but the transceiver enable/gating remains inhibited (power-good gating, IO domain not ready, or SBC still in a restricted mode).
Quick check: measure rail stability vs the CAN_EN signal; read SBC mode/status bits; verify TXD/RXD levels are valid for the IO domain.
Fix: add a post-crank recovery step (re-apply policy, re-enable transceiver after rails settle), and ensure IO domain is powered before enabling CAN.
Pass criteria: CAN activity begins ≤ X ms after rails stable; no “silent-after-crank” occurrence over N crank profiles; mode/status flags consistent with policy_id.
TXD stuck dominant triggers a cascade reset — what timeout/fail-safe policy prevents reboot loops?
Likely cause: dominant timeout/fail-safe is disabled or not integrated with the mode/state machine, causing repeated bus errors and watchdog-driven reboots.
Quick check: confirm TXD is stuck low; read timeout/fault flags; watch reset counters to detect a reboot loop (reset_cause repeating with short intervals).
Fix: enable dominant timeout, force a safe silent mode on timeout, and add a lockout/backoff after N rapid resets to prevent reboot storms.
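The "lockout/backoff after N rapid resets" part of the fix can be sketched as a sliding-window detector; the window and threshold are placeholders:

```python
def reboot_loop_detected(reset_times_ms, window_ms=10_000, max_resets=3):
    """Flag a reboot loop when more than max_resets resets land inside the
    sliding window ending at the most recent reset. Thresholds are
    placeholders to be bound to the project's acceptance criteria."""
    if not reset_times_ms:
        return False
    newest = reset_times_ms[-1]
    recent = [t for t in reset_times_ms if newest - t <= window_ms]
    return len(recent) > max_resets
```

When the detector trips, the recovery posture should switch to latch-off or long-backoff rather than another immediate reset.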
Pass criteria: stuck-dominant event leads to silent-safe posture within X ms; reboot loop count ≤ X per event; recovery occurs only after fault clears and policy gate passes.
EMC passes at one slew setting but fails with another — what is the quickest knob-to-symptom mapping?
Likely cause: slew/drive knobs trade emissions vs robustness; changing them can shift both radiated peaks and error/false-wake susceptibility.
Quick check: run A/B with the same harness: record EMI margin, bus error counters, false-wake rate, and sleep Iq drift; keep everything else fixed (policy_id/profile_id).
Fix: define validated profiles (EMC-friendly / robust / cold-temp) with a verification matrix; select a profile that meets both EMC and system reliability targets.
Pass criteria: EMI margin ≥ X dB AND errors ≤ X / h AND false wake ≤ X / day; profile_id fixed across N builds.
LIN option present but unused; sleep Iq is still high — what pin/default-mode mistake is typical?
Likely cause: LIN pins/blocks remain in an active default state (pull-ups, wake comparators, or supply path enabled) even when LIN is not used.
Quick check: read LIN mode bits; measure Iq delta when forcing LIN pins to known safe states; check for external pull-ups that keep the block biased.
Fix: explicitly disable the LIN block in Sleep/Standby policy, set unused pins to high-Z or defined levels, and ensure the LIN-related rail is not left on.
Pass criteria: LIN-disabled Iq delta ≤ X µA; no unintended LIN-related wake over Y h; policy_id confirms LIN block = OFF in Sleep.
ESD test passes once, later nodes become “fragile” — what degradation logging is fastest for SBC + port?
Likely cause: latent ESD stress shifts leakage/thresholds or weakens return paths, causing gradual increases in Iq, wake noise sensitivity, or bus error rates.
Quick check: log pre/post ESD: Sleep Iq, wake count, error counters, fault flags; compare pin leakage/idle voltage levels and any thermal drift signature.
Fix: improve return paths and connector-near protection placement; add degradation counters + quarantine rules; include a post-ESD self-test to catch drifting units early.
Pass criteria: post-ESD drift ≤ X% (Iq and error rate) over Y cycles; no monotonic “gets worse” trend; fault flags stable and explainable.
Thermal shutdown happens only with high bus utilization — what power-path measurement catches it quickest?
Likely cause: transceiver/LDO dissipation rises with dominant duty cycle and rail dropout, pushing junction temperature over the limit only during heavy traffic.
Quick check: measure rail current and VBAT-to-rail dropout under utilization sweep; read thermal flags; correlate with bus utilization and mode (normal vs standby).
Fix: reduce dissipation (lower dropout, switcher rail, validated slew profile), improve thermal copper/vias, and enforce traffic/duty limits in worst-case thermal scenarios.
Pass criteria: no thermal shutdown over Y min at utilization U%, Temp=[Tmin..Tmax]; TJ estimate remains below X°C; fault flags match observed thermal events.
Production has intermittent “no wake” units — what fixture/ground reference check is usually missing?
Likely cause: test stimulus does not reach the DUT pin with a consistent reference (fixture ground, contact resistance, or marginal amplitude), or policy/config is not programmed/read back.
Quick check: probe the wake stimulus at the DUT pin (not at the fixture output); verify fixture ground reference; read wake_reason latch + policy_id readback on the same cycle.
Fix: add fixture self-check (continuity + ground reference), widen stimulus margin, enforce contact cleaning, and require policy/config readback before pass.
Pass criteria: wake success ≥ X% across N cycles; wake_reason match ≥ X%; “unknown wake/no-wake” category = 0.
Diagnostics can’t tell why ECU woke — what minimum wake attribution signals should be exposed?
Likely cause: wake sources are not latched across reset/mode transitions, or the system clears logs before capturing the wake event context.
Quick check: confirm wake_reason and pin-latch survive until the application reads them; verify capture order: wake_reason → timestamp → VBAT/T snapshot → counters → policy_id.
Fix: expose a minimal “wake black box” bundle: wake_reason, pin_latch, reset_cause, VBAT_min, T_max, bus error counters, and policy_id; store in retained RAM or non-volatile snapshot.
Pass criteria: wake root-cause resolvable ≥ X% from service logs; required fields present in ≥ X% units; timestamp skew ≤ X ms to the wake event.