
Wake/Sleep Strategy: Frame Filters, False Wakes, Timed Wakes


Wake/Sleep Strategy is an engineering policy that makes every wake explainable, suppresses false-wake storms, and protects critical events across sleep cycles.
It standardizes gates, attribution records, filter tables, timed wake windows, and black-box evidence so power targets are met without event loss (pass criteria use X placeholders).

H2-1 · Definition & Scope Guard: What “Wake/Sleep Strategy” Owns

This page defines a system-level wake/sleep policy that keeps standby power low without losing critical events. The focus is strategy engineering: power-state gating, frame-filter governance, wake attribution, timed wake windows, and “no event loss” verification.

Why this chapter exists
  • Prevent scope confusion: wake/sleep strategy is not an EMC or PHY waveform tutorial.
  • Fix vocabulary drift: terms and measurement “accounting” must be consistent across teams and releases.
  • Make success measurable: every wake has a reason, and every sleep is safe.
Scope Guard (anti-overlap rules)
Owned by this page (system strategy)
  • Power-state machine and sleep-entry gates (network + software + safety readiness).
  • Wake source taxonomy and wake attribution (evidence chain for every wake).
  • Frame-filter table design, versioning, and safe updates (governance, not PHY internals).
  • False-wake control (debounce/vote/cooldown/rate-limit) and wake-storm containment.
  • Gateway timed wakes (window scheduling), plus “no event loss” mechanisms and their verification.
Referenced only (link out; do not re-teach here)
  • Selective Wake / Partial Networking (ISO 11898-6): PHY-specific filtering internals belong on the Selective Wake page.
  • SBC power tree & rails: LDO sequencing, watchdog/reset hardware belong on SBC pages.
  • Controller/bridge specifics: protocol stack, FIFO/register-level behavior belong on Controller/Bridge pages.
  • EMC/Protection: CMC, split termination RC networks, TVS parasitics, layout return paths belong on EMC/Protection pages.
Forbidden inside this page (avoid cross-page duplication)
  • Termination/TVS/CMC value derivations and waveform tuning recipes (handled elsewhere).
  • PHY register-by-register walkthroughs; vendor feature comparisons (handled on PHY/SBC pages).
  • Security protocol details (crypto/PKI); only interface-level policy hooks are allowed here.
Glossary (measurement-grade definitions)
Term Definition (what to measure) Evidence
Sleep Lowest-power state where network participation is paused; only wake monitors remain active. Power-state flag + wake monitor armed + bus quiet.
Standby Reduced-power state with partial logic alive (e.g., timers/filters); not fully active but not deepest sleep. State machine shows Standby; periodic tasks allowed by policy.
Partial networking Only selected traffic can wake the node; non-relevant frames are ignored to save power. Filter rules loaded + filter hit counters.
Wake source Category of wake trigger: bus frame, local I/O, timer, diagnostics, safety fault, or policy event. Wake record with source_id + timestamp.
Wake attribution Evidence chain that explains each wake: what matched, what policy allowed, and what task consumed it. RuleID + counters snapshot + task_id.
No event loss Required events are neither missed (not captured) nor dropped (captured but not delivered) across sleep/wake cycles. Event counters + timestamps + delivery acknowledgments.
Measurement guardrails (avoid “accounting bugs”)
  • Standby IQ: specify rails included, sample window length, and ambient/voltage conditions (placeholders: V=X, T=X, window=X s).
  • False wake: define “no business value” precisely (e.g., no task executed, no event delivered, or filtered as noise).
  • Event loss: separate “capture loss” vs “delivery loss” and log both counters.
Success criteria (targets; fill with system thresholds)
  • Standby (Sleep) IQ ≤ X µA (defined accounting window).
  • False wake rate ≤ X / day (vehicle-level).
  • Wake attribution coverage ≥ X% (unknown wake ≤ X / week).
  • Event loss ≤ X ppm (capture + delivery combined).
  • Wake-to-service latency ≤ X ms (policy-dependent).
  • Wake storm rate ≤ X wakes/min (containment enabled).
Deliverables produced by this page
  • Policy-ready definitions, targets, and measurement accounting.
  • State-machine gates (what must be true to sleep, and what must be restored to serve).
  • Filter governance model (rules, priorities, versioning, and safe updates).
  • Wake evidence chain requirements (for serviceability and fleet debugging).
Diagram — Strategy Triangle (Power State · Wake Source · Event Integrity)

H2-2 · System Power-State Machine: Active → PrepareSleep → Sleep → Wake → Restore

A wake/sleep strategy fails most often due to missing gate conditions, recovery races, or “accounting gaps” in evidence. This chapter provides a policy-ready state machine where each transition has entry gates, exit criteria, actions, timeouts, and failure handling.

Design principles (make sleep safe and wake serviceable)
  • Gate by evidence: sleep is permitted only when readiness checks prove stable conditions.
  • Restore deterministically: wake is complete only when services are available and logging is armed.
  • Time-bound everything: each state has a timeout and a defined fallback path.
  • Never lose causality: wake reason must be recorded before any cleanup that could erase evidence.
Sleep-entry gates (grouped to avoid blind spots)
Gate A — Network readiness
  • Bus idle window ≥ X ms (avoid sleeping into active traffic).
  • TX/RX queues empty and no retry backlog (prevent “wake-and-immediate-transmit” churn).
  • Error counters stable over X seconds (avoid hiding a deteriorating network condition).
Quick checks
Bus utilization (%), queue depth, TEC/REC snapshot, and bus-idle timer.
Gate B — System/software readiness
  • Diagnostic session state permits sleep (no active keep-alive lock).
  • NVM flush complete (no pending commits that could be interrupted).
  • Background jobs paused or scheduled into timed-wake windows (OTA/log upload aligned to policy).
Quick checks
“Stay-awake tokens” list, job scheduler state, last NVM commit status, and watchdog mode.
Gate C — Safety/environment readiness
  • Thermal state stable (no oscillation near shutdown thresholds).
  • Brownout risk controlled (supply droop counters not increasing).
  • Critical fault actions complete (e.g., safe-state latched if required).
Quick checks
Rail monitor logs, brownout counter, thermal flags, and last reset reason.
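The three gate groups can be folded into a single sleep-admission check that returns the list of blockers (useful evidence when sleep is refused). A minimal sketch; all field names and thresholds are illustrative placeholders, not a production API:

```python
def evaluate_sleep_gates(net, sw, safety):
    """Return the list of gate blockers; sleep is permitted only if it is empty."""
    blockers = []
    # Gate A - network readiness
    if net["bus_idle_ms"] < net["bus_idle_required_ms"]:
        blockers.append("A:bus_not_idle")
    if net["tx_queue_depth"] > 0 or net["rx_queue_depth"] > 0:
        blockers.append("A:queues_not_empty")
    if not net["error_counters_stable"]:
        blockers.append("A:error_counters_unstable")
    # Gate B - system/software readiness
    if sw["stay_awake_tokens"]:
        blockers.append("B:stay_awake_tokens_held")
    if not sw["nvm_flush_complete"]:
        blockers.append("B:nvm_flush_pending")
    # Gate C - safety/environment readiness
    if not safety["thermal_stable"]:
        blockers.append("C:thermal_unstable")
    if safety["brownout_counter_rising"]:
        blockers.append("C:brownout_risk")
    return blockers
```

Logging the blocker list (rather than a bare yes/no) is what makes "remain Active and log gate blocker" in the state machine below actionable.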
Policy-ready state machine (with gates, actions, timeouts)
State Entry criteria Actions Exit criteria Timeout X Evidence Failure handling
Active Normal operation; wake policy monitors armed. Update counters; maintain attribution log; prepare sleep token evaluation. Sleep request + gates A/B/C satisfied. X Bus idle timer; token list; counter snapshots. If gates fail: remain Active and log gate blocker.
PrepareSleep Active with gates trending stable; final drain allowed. Drain queues; commit NVM; arm wake monitors; persist “sleep intent” record. All drains complete + gates A/B/C satisfied. X Queue depth=0; NVM ok; monitors armed flag. If timeout: abort sleep → Active; log which drain/gate blocked.
Sleep PrepareSleep completed; deepest allowed policy state. Enable filters; limit timers; capture baseline IQ; hold minimal retention log. Wake trigger detected (bus/local/timer/diag/fault). IQ sample window; filter hit counters. If repeated false wakes: apply cooldown/rate-limit policy (see H2-5).
Wake Wake trigger latched; evidence capture must run first. Record wake reason; snapshot counters; start clocks; switch transceiver mode; open logging. Services ready + policy allows full restore. X Wake record persisted before cleanup. If timeout: escalate to safe minimal Active; keep evidence for service.
Restore Wake stage completed; full rejoin and buffers restored. Reload policy; resubscribe; restore buffers; reconcile counters; schedule next timed wake window. System operational OR policy chooses return to Sleep. X Service-ready marker + buffer integrity checks. If restore fails: remain Active; degrade features; raise diagnostic event.
Anti-race invariants (prevent “wake then sleep” flapping)
  • Single authority: only one policy owner decides state transitions; other modules request via tokens.
  • Sticky wake reason: once latched, the wake record is immutable for X seconds.
  • Minimum awake dwell: after a wake, remain awake for at least X ms unless safety demands otherwise.
  • Gate hysteresis: require gates to remain satisfied for X ms before entering PrepareSleep.
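The minimum-dwell and gate-hysteresis invariants can be enforced by the single policy authority; a hypothetical sketch (class name and parameters are illustrative; the X values come from the policy):

```python
class SleepArbiter:
    """Single policy owner enforcing minimum awake dwell and gate hysteresis."""
    def __init__(self, min_dwell_ms, hysteresis_ms):
        self.min_dwell_ms = min_dwell_ms
        self.hysteresis_ms = hysteresis_ms
        self.wake_at_ms = 0            # time of last wake
        self.gates_ok_since_ms = None  # start of continuous gates-satisfied run

    def on_wake(self, now_ms):
        self.wake_at_ms = now_ms
        self.gates_ok_since_ms = None

    def may_enter_prepare_sleep(self, now_ms, gates_ok):
        # Invariant: minimum awake dwell after every wake.
        if now_ms - self.wake_at_ms < self.min_dwell_ms:
            return False
        # Invariant: gates must stay satisfied for the hysteresis window.
        if not gates_ok:
            self.gates_ok_since_ms = None
            return False
        if self.gates_ok_since_ms is None:
            self.gates_ok_since_ms = now_ms
        return now_ms - self.gates_ok_since_ms >= self.hysteresis_ms
```

Resetting the hysteresis timer on any gate failure is what prevents "wake then sleep" flapping: a single gate blip forces the full stable window again.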
Non-owned reminder
Power-tree sequencing and rail-level behaviors belong on SBC pages. EMC/termination/TVS value tuning belongs on the EMC/Protection page.
Diagram — 5-State Power Policy Machine (gated, time-bounded)

H2-3 · Wake Sources & Attribution: Make Every Wake Explainable

A low-power system becomes unserviceable when wakes are not explainable. A wake attribution model turns “it woke up” into an evidence chain: what triggered, what matched, what ran, and whether value was delivered.

What this chapter delivers
  • Wake source taxonomy with measurement-grade evidence fields per source.
  • Wake Record schema (black-box) that preserves causality through sleep/wake cycles.
  • Conflict resolution rules for multi-source triggers within a single wake episode.
  • Metrics that quantify attribution quality and unknown wake rate.
Wake source taxonomy (evidence + expected value)
Source class Trigger evidence (minimum) Expected value (minimum) False-wake signal
Bus wake rule_id + frame_id + match flags + filter hit counter delta task executed and/or required event delivered within X ms wake with no task/action and immediate return-to-sleep
Local I/O io_id + edge + debounce window + input snapshot local function enabled or event delivered to gateway/app repeated edges within cooldown window (rate-limit violation)
Timer timer_id + window_id + schedule_version + jitter margin maintenance tasks completed inside window; next wake scheduled window wake with no executed tasks (empty window)
Network management nm_state + token reason + peer status snapshot network consistency restored or policy gate updated nm wake loops (oscillation) across X minutes
Diagnostics session_id + requester + service class + gate decision service window opened and request handled within X ms diagnostic keep-alive holds system awake unexpectedly
Fault fault_id + thermal/brownout snapshot + last reset reason safe-state action executed and evidence preserved re-trigger storms without environment changes
Scope guard
Security authentication details are out of scope. This page records only policy-relevant gates and outcomes.
Wake Record (black-box) — fixed schema

A wake record must be written before any cleanup that can erase causality (queue draining, counter resets, logging rotation). The schema below is structured for fleet analytics and service diagnostics.

Field group Fields (examples) Why it exists
Identity record_id, boot_id, uptime_ms, vehicle_time Enables correlation across reboots, ECUs, and fleet logs.
Source & trigger source_class, source_id, trigger_detail, rule_id, frame_id Answers “what triggered” in measurement-grade terms.
Context snapshot state_before, bus_util, queue_depth, TEC/REC, filter_hit_deltas, rail/thermal flags Explains “why it happened then” and exposes gating gaps.
Consequence task_id, action_taken, event_delivered, awake_dwell_ms, result_code Separates value wakes from false wakes (no task/no delivery).
Attribution quality confidence, unknown_reason_code, conflict_resolved_by, episode_id Makes “unknown wake” measurable and fixable across the fleet.
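One way to honor "write before cleanup" is to serialize the record into an append-only sink as the very first wake action; a sketch under the schema above (the helper name and the subset of fields are illustrative):

```python
import json

def latch_wake_record(store, source_class, source_id, context, boot_id):
    """Persist an immutable wake record BEFORE queue drains or counter resets.
    `store` is any append-only sink (here a list of serialized records)."""
    record = {
        "record_id": len(store) + 1,
        "boot_id": boot_id,
        "uptime_ms": context["uptime_ms"],
        "source_class": source_class,
        "source_id": source_id,
        "state_before": context["state_before"],
        "queue_depth": context["queue_depth"],
        "confidence": context.get("confidence", "high"),
    }
    store.append(json.dumps(record))  # serialized copy: later cleanup cannot mutate it
    return record["record_id"]
```

Serializing immediately gives the "sticky wake reason" property from H2-2: the persisted copy stays intact even if the in-memory context is reset during Restore.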
Pass criteria (placeholders)
  • Attribution completeness ≥ X% over the last Y days.
  • Unknown wakes ≤ X / week per vehicle (report by ECU/domain, not only average).
  • Value wake ratio ≥ X% (wakes with executed task and/or delivered event).
Multi-source conflicts (single episode attribution rules)

Multiple triggers can occur within a short time window. Attribution must group triggers into a single wake episode and resolve conflicts deterministically.

Rule set (policy template)
  1. Episode grouping: triggers inside Δt = X ms become one episode_id.
  2. Priority order: Fault > Diagnostics > Local I/O > Timer > Bus wake > Network management.
  3. Evidence check: if top-priority lacks required evidence fields, downgrade to the next source class.
  4. Outcome binding: record the executed task_id and delivery result under the chosen source class.
Diagram — Wake Evidence Pipeline (Sources → Filters → Attribution → Policy → Sleep Decision)

H2-4 · Frame-Filter Tables: Design, Versioning, and Safe Updates

Frame filters must be treated as a config asset, not a one-off tweak. A maintainable filter table requires: a stable match schema, explicit actions, ownership, test cases, version scope, and observability (hit-rate and leak signals).

Design rules (keep filters safe and debuggable)
  • Match keys must explain risk: wider masks increase leak risk; strict payload patterns increase miss risk.
  • Actions must be graded: IGNORE, WAKE_MIN, WAKE_FULL (tie into the power-state machine).
  • Every rule must be owned: owner + testcase + rollback strategy.
  • Observability is mandatory: rule_id and hit counters must be exportable for attribution.
Default deny vs default allow (risk model)
Policy Upside Risk Typical use
Default deny Lower false wakes; predictable standby power. Miss risk if new required event is not whitelisted. Production low-power strategy with timed-wake fallback.
Default allow Lower miss risk; easier bring-up and diagnostics. Higher false wakes; wake storms; standby IQ increases. Time-bounded debug/diagnostic windows with strict expiry.
Recommended production posture
Default deny with explicit whitelists, plus gateway timed wakes as a safety net for periodic maintenance and “no event loss” requirements.
Filter Table template (maintainable config asset)
RuleID Match Action Priority Owner Testcase Version scope Telemetry
R-CRIT-001 ID/mask + DLC + payload pattern + cycle constraint + domain tag WAKE_FULL High System policy TC-001 SW vX.Y, vehicle A hit_count, wake_count, leak_flag
R-MAINT-010 ID/mask + cycle constraint (window aligned) + domain tag WAKE_MIN Med Gateway TC-010 SW vX.Y, domain B hit_count, dwell_ms
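A rule table with graded actions and mandatory telemetry might look like the following sketch (default-deny posture; ID/mask matching only, omitting DLC/payload/cycle constraints for brevity; all names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class FilterRule:
    rule_id: str
    can_id: int
    mask: int          # bits set in mask must match between rule and frame
    action: str        # IGNORE / WAKE_MIN / WAKE_FULL (graded actions)
    priority: int      # higher wins when several rules match
    hit_count: int = 0 # mandatory observability for attribution

def classify_frame(frame_id, rules):
    """Default deny: return the highest-priority matching action, else IGNORE."""
    best = None
    for r in rules:
        if (frame_id & r.mask) == (r.can_id & r.mask):
            r.hit_count += 1  # exported later as rule-level telemetry
            if best is None or r.priority > best.priority:
                best = r
    return (best.action, best.rule_id) if best else ("IGNORE", None)
```

Returning the rule_id together with the action is what lets the wake record answer "what matched" without re-deriving the table state.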
Scope guard
No discussion of ISO 11898-6 PHY internal filter implementation. This chapter covers system governance, schema, and safe lifecycle.
Versioning & safe updates (OTA-ready lifecycle)
  1. Schema compatibility: new table fields remain parseable; unknown fields fail closed.
  2. Shadow deploy: download rules but do not wake; record potential matches and hit-rate.
  3. Canary enable: enable on a small fleet slice; watch false wakes, unknown wakes, and event loss.
  4. Rollback plan: define trigger thresholds (X) and revert to last known-good version.
Pass criteria (placeholders)
  • Filter-leak false wakes ≤ X / day (attributed by rule_id).
  • Top-N hit-rate stable within X% across Y days after rollout.
  • Rollback trigger: event loss > X ppm or unknown wakes > X/week in canary.
Diagram — Filter Table Layers (Critical / Maintenance / Diagnostic) with Priority

H2-5 · False-Wake Control: Reduce Noise, Debounce Events, Rate-Limit Wakes

False wakes waste standby energy and can escalate into a wake storm. A robust control layer must make noisy triggers hard to satisfy (debounce/vote), keep damage bounded (cooldown/rate limit), and preserve an evidence trail for fleet debugging.

What this chapter delivers
  • Three-gate control: Debounce → Vote → Rate limit (deterministic, parameterized).
  • Wake storm detection: wake rate, source skew, and bus-activity correlation signals.
  • Mitigation actions: degrade modes, temporary bans, and alert/report hooks.
  • Strategy table with side-effects and observability requirements.
Core mechanisms (parameterized controls)
Debounce (time windows)
  • Stable-for window: require trigger persistence for X ms before admitting wake.
  • Integrate window: accumulate evidence for X ms; wake only if score ≥ X.
  • Max-gap window: require repeated confirmations with gaps ≤ X ms to avoid single spikes.
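The stable-for variant is the simplest of the three windows; a parameterized sketch (class name illustrative):

```python
class StableForDebouncer:
    """Admit a trigger only after it has been continuously asserted for window_ms."""
    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.asserted_since = None

    def sample(self, now_ms, asserted):
        if not asserted:
            self.asserted_since = None  # any gap resets the window
            return False
        if self.asserted_since is None:
            self.asserted_since = now_ms
        return now_ms - self.asserted_since >= self.window_ms
```

The same shape extends to the integrate and max-gap variants by replacing the reset rule with a score accumulator or a gap check.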
Vote (multi-condition gating)
  • AND gate: require two independent signals (e.g., rule hit + cycle constraint) before waking.
  • OR gate with priorities: allow high-priority sources while bounding low-priority noise.
  • Episode binding: vote decisions reference a single episode_id for consistent attribution.
Cooldown & rate limit (damage bounding)
  • Wake rate threshold: storm if ≥ X wakes/min (per ECU/domain).
  • Cooldown: enforce minimum X sec between admitted wakes for low-priority sources.
  • Token bucket: allow bursts up to X tokens then throttle to X/min.
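Cooldown and token bucket compose naturally into one admission gate; a sketch with illustrative parameters (the X values from the policy):

```python
class WakeRateLimiter:
    """Token bucket (bursts up to `capacity`, refilled at `rate_per_min`)
    plus a cooldown between admitted wakes for this source class."""
    def __init__(self, capacity, rate_per_min, cooldown_s):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.rate_per_min = rate_per_min
        self.cooldown_s = cooldown_s
        self.last_admit_s = None
        self.last_refill_s = 0.0

    def admit(self, now_s):
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now_s - self.last_refill_s)
                          * self.rate_per_min / 60.0)
        self.last_refill_s = now_s
        # Cooldown gate first, then token gate.
        if self.last_admit_s is not None and now_s - self.last_admit_s < self.cooldown_s:
            return False
        if self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.last_admit_s = now_s
        return True
```

Applying one limiter per source class (rather than globally) keeps the miss risk bounded to the noisy source, matching the tiered-ban philosophy below.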
Wake storm detection (observability signals)
  • Wake rate: episodes/min over a sliding window X sec.
  • Source skew: top source_id/rule_id share ≥ X% (suggests a single noisy origin).
  • Bus-activity correlation: compare wake timestamps with bus utilization and filter hit deltas.
  • Cooldown violations: repeated attempts inside cooldown (attempt_count ≥ X).
Pass criteria (placeholders)
  • Wake storm threshold = X wakes/min; storm duration ≤ X min before mitigation succeeds.
  • Cooldown = X sec; cooldown violation attempts ≤ X/hour.
  • False wake rate ≤ X/day and unknown wakes ≤ X/week (per vehicle).
Mitigation actions (degrade, ban, alert)
Degrade strategy (bounded service)
  • WAKE_MIN: run only the minimal task set; cap awake dwell to X ms.
  • Task shedding: disable non-critical tasks while storm mode is active.
  • Adaptive windows: increase debounce window and cooldown during storm.
Temporary bans (source-scoped)
  • Blocklist: ban source_id/rule_id for X minutes (auto-expire).
  • Tiered bans: ban only low-priority layers; keep critical wakes admissible.
  • Evidence required: write episode_id + top sources + counters before applying bans.
Alerts & reporting hooks
  • storm_start / storm_end events with top sources and parameter snapshots.
  • Serviceability payload: source skew histogram + correlation hints + cooldown/bans applied.
  • Black-box alignment: link to Wake Record fields (episode_id, source_id, result_code).
False-wake control strategy table (mechanism → parameters → side-effects → observability)
Mechanism When Parameter X Side-effect Observability
Debounce Noisy triggers; unstable edges; short spikes window = X ms Added latency; miss risk if too strict attempt_count, stable_score, episode_id
Vote Reduce single-point noise; require independent confirmation k-of-n = X Complexity; needs clear priority rules vote_inputs, chosen_source, confidence
Cooldown Repeated wakes from same source in short windows X sec May delay legitimate low-priority events cooldown_hits, violation_attempts
Rate limit Wake storms or bursty attempts X wakes/min May increase miss risk if applied globally wake_rate, source_skew, storm_state
Temporary ban Single source dominates; repeated violations ban TTL = X min Potential miss for that source; must avoid critical bans banned_source_id, ban_reason_code
Scope guard
No deep EMC mechanisms are covered here. This chapter specifies strategy hooks, parameters, and measurable signals only.
Diagram — Three-Gate False-Wake Control (Trigger → Debounce/Vote → Rate limit → Wake/Ignore)

H2-6 · Gateway Timed Wakes: Scheduling Windows to Avoid Event Loss

Passive wakes alone cannot guarantee “no event loss” when gateways sleep. Timed wakes create controlled service windows for periodic tasks and improve event capture probability while bounding duty cycle and overlap across domains.

Three timed-wake types (task-driven windows)
  • Patrol: health checks, counter snapshots, policy synchronization.
  • Aggregation & upload: batch logs/telemetry and serviceability payloads.
  • Critical domain sync: cross-domain alignment and short maintenance windows.
Window design parameters (period, on-time, guard, jitter budget)
  • Period (P): repeat interval between windows.
  • On-time (T_on): window open duration to run required tasks.
  • Guard time (T_guard): buffer before/after to absorb drift and scheduling delays.
  • Jitter budget (J): allowed wake-time deviation (placeholder).
  • Duty cycle: T_on / P, directly tied to standby power.
Pass criteria (placeholders)
  • Event capture probability ≥ X% over the last Y days.
  • Window overlap ratio ≤ X% across Domain A/B/C schedules.
  • Jitter within budget for ≥ X% of windows (late/early windows bounded).
Multi-domain coordination (staggered scheduling)
  • Define cadence: each domain owns a base Period and required On-time.
  • Stagger windows: avoid overlap for high-rate windows; align only when necessary.
  • Resolve conflicts: preserve critical windows; degrade others to WAKE_MIN if overlap is unavoidable.
  • Record evidence: log window_id, actual_on_time, and overlap flags for optimization.
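Staggering can be checked offline by expanding each domain's (Period, On-time, offset) cadence into concrete windows and measuring the overlap ratio; a sketch with illustrative helper names:

```python
def expand_schedule(period_ms, on_ms, offset_ms, horizon_ms):
    """Expand a (Period, On-time, offset) cadence into concrete (start, end) windows."""
    return [(t, t + on_ms) for t in range(offset_ms, horizon_ms, period_ms)]

def overlap_ratio(windows_a, windows_b):
    """Fraction of domain A on-time that collides with domain B windows."""
    total = sum(e - s for s, e in windows_a)
    overlap = 0
    for sa, ea in windows_a:
        for sb, eb in windows_b:
            overlap += max(0, min(ea, eb) - max(sa, sb))
    return overlap / total if total else 0.0
```

Running this over all domain pairs before rollout gives the overlap-ratio evidence the pass criteria ask for, without waiting for fleet data.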
Scope guard
DoIP/OTA session flows are out of scope. This chapter covers timed windows, scheduling, and measurable capture/overlap metrics only.
Wake Window plan table (Domain / Period / On-time / Guard / Purpose / Pass criteria)
Domain Period (P) On-time (T_on) Guard (T_guard) Purpose Pass criteria
Domain A X s X ms X ms Patrol + counters capture ≥ X%
Domain B X s X ms X ms Aggregation/upload overlap ≤ X%
Domain C X s X ms X ms Critical sync jitter ok ≥ X%
Diagram — Multi-domain Timed Wake Windows (Domain A/B/C) with “event arrives”

H2-7 · “No Event Loss” Design: Buffering, Holdover, and Resync Rules

“No event loss” is a contract, not a slogan. It must be enforced by explicit rules across three phases: sleep entry (drain/commit), sleep holdover (retain), and wake restore (replay/resync).

What this chapter delivers
  • Buffering rules for sleep entry drain, wake restore replay, and queue sizing.
  • Holdover contract for critical events (retain/retry with bounds).
  • Resync sequence to restore subscriptions, filter tiers, and counters safely.
  • Event Integrity Checklist (evidence fields + pass criteria placeholders).
Event integrity contract (minimal, measurable)
  • Event evidence: event_id · seq · timestamp · producer_id · domain_id.
  • Event classes: Critical · Maintenance · Telemetry (drop policy depends on class).
  • Delivery states: delivered · buffered · dropped · re-sent (every transition has a reason code).
Pass criteria (placeholders)
  • Max acceptable event gap = X ms (delivery gap across sleep/wake boundaries).
  • Buffer depth = X events (must cover peak burst + window coverage).
  • Drop rate ≤ X ppm (by class and by producer).
Buffering rules (entry drain → hold → restore replay)
Sleep entry drain (gate to sleep)
  • Drain targets: pending tx/rx events, local processing queue, persistence queue.
  • Exit conditions: pending_events = 0 OR drain_timeout_hit = true (timeout = X ms).
  • Evidence: last_seq_snapshot, drain_result_code, pending_by_class.
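The drain loop and its evidence record can be sketched as follows (function and field names are illustrative; `try_flush` stands in for whatever delivers one pending event):

```python
def drain_before_sleep(queues, clock_ms, deadline_ms, try_flush):
    """Drain pending events until all queues are empty or the drain timeout hits.
    `try_flush(queue)` delivers one event and returns the remaining queue;
    the returned dict is the evidence for the sleep-intent record."""
    timeout_hit = False
    while any(queues.values()):
        if clock_ms() >= deadline_ms:
            timeout_hit = True
            break
        for name in queues:
            if queues[name]:
                queues[name] = try_flush(queues[name])
    return {
        "drain_timeout_hit": timeout_hit,
        "pending_by_class": {k: len(v) for k, v in queues.items()},
        "drain_result_code": "TIMEOUT" if timeout_hit else "OK",
    }
```

A TIMEOUT result feeds the state machine's "abort sleep → Active" path; the per-class pending counts explain which drain blocked.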
Queue bounds & drop policy (explainable loss)
  • Critical: no drops; retain and retry or escalate wake mode.
  • Maintenance: allow bounded drops with drop_reason_code and counters.
  • Telemetry: allow sampling/aggregation to protect storage and CPU budget.
  • Every drop is logged: drop_count_by_class, top_producer, and peak_depth.
Wake restore replay (bounded recovery)
  • Replay order: by seq/timestamp (monotonic delivery).
  • Replay bounds: replay_count ≤ X events OR replay_time ≤ X ms.
  • Fail-safe: if bounds exceeded, enter WAKE_MIN and raise integrity_risk flag.
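Bounded replay with the fail-safe path might be sketched like this (names illustrative; `deliver` is whatever consumer-facing call the gateway uses):

```python
def replay_after_wake(buffered, deliver, max_count, max_time_ms, clock_ms):
    """Replay buffered events in seq order, bounded by count and wall time.
    Returns (replayed, integrity_risk); risk=True means bounds were exceeded
    and the policy should enter WAKE_MIN with the integrity_risk flag raised."""
    start = clock_ms()
    replayed = 0
    for ev in sorted(buffered, key=lambda e: e["seq"]):  # monotonic delivery
        if replayed >= max_count or clock_ms() - start > max_time_ms:
            return replayed, True
        deliver(ev)
        replayed += 1
    return replayed, False
```

Sorting by seq before delivery is the concrete form of the "monotonic delivery" rule; consumers never see out-of-order replayed events.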
Holdover & resync (retain critical events, then restore consistency)
Holdover (critical event retention)
  • Retain-until-delivered: critical events stay in holdover until delivered or escalated.
  • Retry rules: retry_count ≤ X with backoff/cooldown (avoid wake storms).
  • Evidence: holdover_queue_depth, oldest_age_ms, last_retry_reason.
Resync sequence (after wake)
  1. Resubscribe: restore consumer subscriptions (topic/domain list).
  2. Enable filters by tier: Critical → Maintenance → Diagnostic (avoid early noise).
  3. Counter resync: align seq/timestamp and validate jumps (jump_threshold = X).
If resync_timeout_hit = true (timeout X ms), enter WAKE_MIN and log resync_fail_reason_code.
Event Integrity Checklist (evidence fields + pass criteria)
Phase Check Evidence fields Pass criteria
Before sleep Drain complete or bounded by timeout pending_by_class · drain_timeout_hit · last_seq_snapshot drain_time ≤ X ms
During sleep Holdover queue bounded, no critical drops holdover_depth · critical_drop_count · oldest_age_ms critical_drop_count = 0
After wake Replay bounded, resync succeeds replay_count · resync_fail_reason · max_gap_ms max_gap ≤ X ms
Fleet view Drop rate and unknown integrity risks bounded drop_ppm_by_class · integrity_risk_count drop ≤ X ppm
Scope guard
No controller FIFO register details are included. This chapter specifies system-level event integrity rules and evidence only.
Diagram — Event Funnel (Producers → Buffer → Gateway awake window → Consumer)

H2-8 · Diagnostics & Logging: Black-Box for Wake/Sleep Failures

Intermittent wake/sleep failures cannot be fixed without a black-box record. The black-box must capture counters, snapshots, and triggered traces with bounded storage and clear export rules.

Must-have counters (minimal, high value)
  • Wake layer: wake_count · reason_histogram · unknown_wake_count · storm_state transitions.
  • Policy layer: rule_version · policy_version · filter_hit_topN · window_id and overlap flags.
  • Bus health: bus_utilization · error_counters_snapshot · bus_off_count (statistics only).
  • Integrity: buffer_depth_peak · drop_count_by_class · max_gap_ms · replay_count · resync_fail_count.
Retention model (ring buffer + triggered capture)
  • Ring buffer: always-on counters and compact episodes to guarantee retention ≥ X hours.
  • Sampling rules: telemetry can be downsampled; critical events are not downsampled.
  • Triggered capture: on storm_start, resync_fail, drop_rate_exceed, bus_off — capture a bounded bundle.
  • Export bundle: packaged records with versions and integrity checks for service tooling.
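The ring-plus-trigger structure can be sketched with a bounded deque (class and field names are illustrative; real storage would persist across resets):

```python
from collections import deque

class BlackBox:
    """Ring buffer for always-on counters plus bounded triggered-capture bundles."""
    def __init__(self, ring_size, max_bundles):
        self.ring = deque(maxlen=ring_size)       # oldest entries roll off
        self.bundles = deque(maxlen=max_bundles)  # triggered captures, bounded too

    def record(self, entry):
        self.ring.append(entry)

    def trigger_capture(self, reason):
        # On storm_start / resync_fail / drop_rate_exceed / bus_off:
        # freeze a snapshot of the current ring into a bounded bundle.
        self.bundles.append({"reason": reason, "snapshot": list(self.ring)})
```

Bounding the bundle store as well as the ring keeps storage deterministic even during repeated storms, which is what makes the retention ≥ X hours guarantee honest.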
Pass criteria (placeholders)
  • Retention ≥ X hours for counters + episode summaries.
  • Unknown wake MTBF ≥ X days (mean interval between unknown wakes).
  • Export time ≤ X seconds for a standard time range.
Black-box field table (fields + update mode + retention)
Field Type Update mode Rate Retention
episode_id u32 event-driven per wake ≥ X hours
source_id / rule_id u16/u16 event-driven per episode ≥ X hours
reason_histogram fixed bins periodic + event X sec ≥ X hours
filter_hit_topN u32[N] periodic X sec ≥ X hours
buffer_depth_peak u16 periodic + event X sec ≥ X hours
export_bundle_version u16 on export per export N/A
Triggered capture table (Trigger / Data captured / Retention)
Trigger Data captured Retention
storm_start top sources · cooldown/bans · wake_rate window · counters snapshot ≥ X hours
resync_fail policy/table versions · replay summary · fail_reason_code ≥ X hours
drop_rate_exceed drop counts by class · peak buffer depth · top producers ≥ X hours
bus_off error counters snapshot · utilization window · policy versions ≥ X hours
Scope guard
No physical-layer waveform measurement is included. This chapter covers counters, snapshots, triggered summaries, and export packaging only.
Diagram — Black-Box Stack (Counters / Snapshots / Triggered traces / Export)

H2-7 · “No Event Loss” Design: Buffering, Holdover, and Resync Rules

“No event loss” is a contract, not a slogan. It must be enforced by explicit rules across three phases: sleep entry (drain/commit), sleep holdover (retain), and wake restore (replay/resync).

What this chapter delivers
  • Buffering rules for sleep entry drain, wake restore replay, and queue sizing.
  • Holdover contract for critical events (retain/retry with bounds).
  • Resync sequence to restore subscriptions, filter tiers, and counters safely.
  • Event Integrity Checklist (evidence fields + pass criteria placeholders).
Event integrity contract (minimal, measurable)
  • Event evidence: event_id · seq · timestamp · producer_id · domain_id.
  • Event classes: Critical · Maintenance · Telemetry (drop policy depends on class).
  • Delivery states: delivered · buffered · dropped · re-sent (every transition has a reason code).
Pass criteria (placeholders)
  • Max acceptable event gap = X ms (delivery gap across sleep/wake boundaries).
  • Buffer depth = X events (must cover peak burst + window coverage).
  • Drop rateX ppm (by class and by producer).
Buffering rules (entry drain → hold → restore replay)
Sleep entry drain (gate to sleep)
  • Drain targets: pending tx/rx events, local processing queue, persistence queue.
  • Exit conditions: pending_events = 0 OR drain_timeout_hit = true (timeout = X ms).
  • Evidence: last_seq_snapshot, drain_result_code, pending_by_class.
Queue bounds & drop policy (explainable loss)
  • Critical: no drops; retain and retry or escalate wake mode.
  • Maintenance: allow bounded drops with drop_reason_code and counters.
  • Telemetry: allow sampling/aggregation to protect storage and CPU budget.
  • Every drop is logged: drop_count_by_class, top_producer, and peak_depth.
Wake restore replay (bounded recovery)
  • Replay order: by seq/timestamp (monotonic delivery).
  • Replay bounds: replay_count ≤ X events OR replay_time ≤ X ms.
  • Fail-safe: if bounds exceeded, enter WAKE_MIN and raise integrity_risk flag.
Holdover & resync (retain critical events, then restore consistency)
Holdover (critical event retention)
  • Retain-until-delivered: critical events stay in holdover until delivered or escalated.
  • Retry rules: retry_count ≤ X with backoff/cooldown (avoid wake storms).
  • Evidence: holdover_queue_depth, oldest_age_ms, last_retry_reason.
Resync sequence (after wake)
  1. Resubscribe: restore consumer subscriptions (topic/domain list).
  2. Enable filters by tier: Critical → Maintenance → Diagnostic (avoid early noise).
  3. Counter resync: align seq/timestamp and validate jumps (jump_threshold = X).
If resync_timeout_hit = true (timeout X ms), enter WAKE_MIN and log resync_fail_reason_code.
Event Integrity Checklist (evidence fields + pass criteria)
Phase | Check | Evidence fields | Pass criteria
Before sleep | Drain complete or bounded by timeout | pending_by_class · drain_timeout_hit · last_seq_snapshot | drain_time ≤ X ms
During sleep | Holdover queue bounded, no critical drops | holdover_depth · critical_drop_count · oldest_age_ms | critical_drop_count = 0
After wake | Replay bounded, resync succeeds | replay_count · resync_fail_reason · max_gap_ms | max_gap ≤ X ms
Fleet view | Drop rate and unknown integrity risks bounded | drop_ppm_by_class · integrity_risk_count | drop ≤ X ppm
Scope guard
No controller FIFO register details are included. This chapter specifies system-level event integrity rules and evidence only.
Diagram — Event Funnel (Producers → Buffer → Gateway awake window → Consumer)
[Flow: Producers (Sensors · Body · Chassis · Diag · NM) → Classifier (Critical / Maintenance / Telemetry) → Buffer (ring · snapshot · drop policy) → Gateway awake window → Consumer. Metrics: gap_ms · depth_events · drop_ppm]

H2-8 · Diagnostics & Logging: Black-Box for Wake/Sleep Failures

Intermittent wake/sleep failures cannot be fixed without a black-box record. The black-box must capture counters, snapshots, and triggered traces with bounded storage and clear export rules.

Must-have counters (minimal, high value)
  • Wake layer: wake_count · reason_histogram · unknown_wake_count · storm_state transitions.
  • Policy layer: rule_version · policy_version · filter_hit_topN · window_id and overlap flags.
  • Bus health: bus_utilization · error_counters_snapshot · bus_off_count (statistics only).
  • Integrity: buffer_depth_peak · drop_count_by_class · max_gap_ms · replay_count · resync_fail_count.
Retention model (ring buffer + triggered capture)
  • Ring buffer: always-on counters and compact episodes to guarantee retention ≥ X hours.
  • Sampling rules: telemetry can be downsampled; critical events are not downsampled.
  • Triggered capture: on storm_start, resync_fail, drop_rate_exceed, bus_off — capture a bounded bundle.
  • Export bundle: packaged records with versions and integrity checks for service tooling.
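The retention model above (always-on ring plus bounded triggered capture) maps directly onto two fixed-size deques. A minimal sketch; the class name `BlackBox` and the four-entry ring tail captured per bundle are assumptions.

```python
from collections import deque

class BlackBox:
    """Ring buffer for counters plus bounded triggered-capture bundles."""
    TRIGGERS = {"storm_start", "resync_fail", "drop_rate_exceed", "bus_off"}

    def __init__(self, ring_capacity, bundle_capacity):
        self.ring = deque(maxlen=ring_capacity)    # always-on episodes
        self.bundles = deque(maxlen=bundle_capacity)

    def record(self, episode):
        self.ring.append(episode)  # oldest episode ages out automatically

    def on_trigger(self, trigger, snapshot):
        """On a recognized trigger, capture a bounded bundle."""
        if trigger in self.TRIGGERS:
            self.bundles.append({"trigger": trigger, "snapshot": snapshot,
                                 "recent": list(self.ring)[-4:]})
            return True
        return False
```

Bounded `maxlen` on both deques is what guarantees the storage budget: retention is traded for age, never for an unbounded allocation.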
Pass criteria (placeholders)
  • Retention ≥ X hours for counters + episode summaries.
  • Unknown wake MTBF ≥ X days (unknown wake interval).
  • Export time ≤ X seconds for a standard time range.
Black-box field table (fields + update mode + retention)
Field | Type | Update mode | Rate | Retention
episode_id | u32 | event-driven | per wake | ≥ X hours
source_id / rule_id | u16/u16 | event-driven | per episode | ≥ X hours
reason_histogram | fixed bins | periodic + event | X sec | ≥ X hours
filter_hit_topN | u32[N] | periodic | X sec | ≥ X hours
buffer_depth_peak | u16 | periodic + event | X sec | ≥ X hours
export_bundle_version | u16 | on export | per export | N/A
Triggered capture table (Trigger / Data captured / Retention)
Trigger | Data captured | Retention
storm_start | top sources · cooldown/bans · wake_rate window · counters snapshot | ≥ X hours
resync_fail | policy/table versions · replay summary · fail_reason_code | ≥ X hours
drop_rate_exceed | drop counts by class · peak buffer depth · top producers | ≥ X hours
bus_off | error counters snapshot · utilization window · policy versions | ≥ X hours
Scope guard
No physical-layer waveform measurement is included. This chapter covers counters, snapshots, triggered summaries, and export packaging only.
Diagram — Black-Box Stack (Counters / Snapshots / Triggered traces / Export)
[Stack: Counters → Snapshots → Triggered traces → Export bundle, each with its own retention bound; triggers promote ring-buffer content into storage and the export path.]

H2-11 · Applications: Where Wake/Sleep Strategy Dominates

Intent: map wake/sleep strategy to real vehicle domains using repeatable policy patterns and measurable outcomes. This section avoids hardware implementation details and avoids bridge/protocol deep dives; it focuses on policy hooks, evidence fields, and pass criteria placeholders (X).
Domain A · Body / Comfort (LIN swarm + CAN gateway night mode)
Strategy dominates when node count is high: false-wake probability scales with population, and serviceability depends on explainable wake attribution.
Example material numbers (reference)
  • LIN transceiver: TI TLIN1029-Q1; NXP TJA1021
  • Partial-networking CAN transceiver: TI TCAN1145-Q1; NXP TJA1145
  • Mini CAN SBC (MCU supply + watchdog + CAN): NXP UJA1169
Pattern: LIN swarm night policy
Goal: Sleep Iq ≤ X µA and unknown wake ≤ X/week while keeping wake latency p95 ≤ X ms.
Pitfall: “bus activity wake” becomes unexplainable without source_id/rule_id and hysteresis; service teams cannot isolate the culprit node.
Measurement: reason histogram, unknown_wake_count, wake_rate/min, per-source cooldown_applied ratio, wake latency distribution by source.
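The attribution measurement for this pattern reduces to a histogram over wake records. A minimal sketch assuming each wake record carries `source_id`/`rule_id` fields as described above; the function name is illustrative.

```python
from collections import Counter

def wake_report(wake_records):
    """Summarize wake attribution: reason histogram + unknown-wake count.

    A record missing source_id or rule_id counts as an unknown wake,
    which is exactly the serviceability failure this pattern targets.
    """
    histogram = Counter()
    unknown = 0
    for rec in wake_records:
        if rec.get("source_id") is None or rec.get("rule_id") is None:
            unknown += 1
            histogram["unknown"] += 1
        else:
            histogram[rec["source_id"]] += 1
    return {"reason_histogram": dict(histogram), "unknown_wake_count": unknown}
```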
Pattern: Gateway rendezvous windows
Goal: event capture probability ≥ X% with window overlap ≤ X% and bounded wake duty cycle ≤ X%.
Pitfall: wake windows drift or overlap unintentionally, causing either event loss or battery drain.
Measurement: window schedule logs, overlap rate, captured/expected counters, max_event_gap_ms, drop_ppm_by_class.
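The window-overlap metric in the measurement list can be computed from the schedule log alone. A minimal sketch, assuming windows are (start, end) pairs in ms sorted by start time; the function name is illustrative.

```python
def window_overlap_pct(windows):
    """Percent of total on-time that overlaps between adjacent wake windows.

    windows: list of (start_ms, end_ms) tuples, sorted by start.
    """
    total = sum(end - start for start, end in windows)
    overlap = 0
    for (s1, e1), (s2, e2) in zip(windows, windows[1:]):
        overlap += max(0, min(e1, e2) - s2)  # adjacent-pair overlap only
    return 100.0 * overlap / total if total else 0.0
```

Trending this value against the "overlap ≤ X%" target catches schedule drift before it turns into event loss or battery drain.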
Domain B · Powertrain / Chassis (cross-domain wake gating)
Strategy dominates when “wake ≠ usable”: restore/resync gates and critical-event integrity rules define safety and reliability.
Example material numbers (reference)
  • Isolated CAN transceiver: TI ISO1042-Q1; NXP TJA1052i
  • CAN FD transceiver (non-isolated): Microchip MCP2562FD; TI TCAN1042-Q1
  • MCU/SoC (policy + logging): Infineon AURIX TC3xx; NXP S32K3
Pattern: Gated wake with restore criteria
Goal: attribution completeness ≥ X% and resync success ≥ X% within X ms.
Pitfall: accepting “wake” before filters/queues/counters are consistent causes phantom loss and intermittent faults.
Measurement: resync_fail_count, restore_time_ms (p95), unknown_wake_rate, critical_event_drop_ppm.
Pattern: Critical event holdover contract
Goal: max acceptable event gap ≤ X ms; buffer depth ≥ X events; drop rate ≤ X ppm (critical class).
Pitfall: bounded buffers without class-aware policy lead to “silent loss” during repeated sleep cycles.
Measurement: max_gap_ms, buffer_high_watermark, replay_count, drop_ppm_by_class.
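The holdover contract is checkable mechanically from the measured fields. A minimal sketch; `check_holdover_contract` and the metrics-dict layout are assumptions chosen to mirror the measurement list above.

```python
def check_holdover_contract(metrics, max_gap_ms, min_depth, max_drop_ppm):
    """Evaluate the critical-event holdover contract; return failed checks."""
    failures = []
    if metrics["max_gap_ms"] > max_gap_ms:          # gap ≤ X ms
        failures.append("event_gap")
    if metrics["buffer_depth"] < min_depth:         # depth ≥ X events
        failures.append("buffer_depth")
    if metrics["drop_ppm_by_class"].get("critical", 0) > max_drop_ppm:
        failures.append("critical_drops")           # drop ≤ X ppm (critical)
    return failures
```

An empty return list is the pass condition; any named failure points directly at the contract clause that was violated.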
Domain C · TCU / Diagnostics (timed wakes + background jobs)
Strategy dominates when scheduled wake windows must coexist with low-power budgets and serviceability exports.
Example material numbers (reference)
  • Gateway SoC: NXP S32G2; Renesas R-Car
  • External CAN FD controller (SPI): Microchip MCP2517FD
  • CAN FD transceiver: Microchip MCP2562FD; NXP TJA1044
Pattern: Window budget + job scheduling
Goal: wake duty cycle ≤ X% while meeting capture probability ≥ X% and export latency ≤ X s.
Pitfall: background jobs overrun wake windows, forcing uncontrolled “stay-awake” behavior or losing events at the boundary.
Measurement: window_on_time_ms, job_overrun_count, export_bundle_size, window_capture_probability.
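The window-budget rule is a simple fit-or-defer scheduler: jobs that do not fit the remaining on-time are sliced into the next window instead of overrunning. A minimal sketch with illustrative names.

```python
def schedule_jobs(window_budget_ms, jobs):
    """Fit background jobs into one wake window; overruns are deferred.

    jobs: list of (job_id, cost_ms) tuples.
    Returns (run, deferred, overrun_count).
    """
    run, deferred, spent = [], [], 0
    for job_id, cost in jobs:
        if spent + cost <= window_budget_ms:
            run.append(job_id)
            spent += cost
        else:
            deferred.append(job_id)  # sliced into the next window
    return run, deferred, len(deferred)
```

Deferral rather than overrun is what keeps `job_overrun_count = 0` and prevents the uncontrolled "stay-awake" behavior named in the pitfall.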
Pattern: Serviceability export bundle
Goal: retention ≥ X hours with reproducible wake evidence for intermittent failures.
Pitfall: logging is either too sparse (no proof) or too heavy (power/flash wear).
Measurement: ring_buffer_fill, trigger_hit_rate, export_success_rate, unknown_wake_MTBF ≥ X days.
Diagram: Domain → Gateway → Sleep Policy (pattern view)
[Pattern view: Body/Comfort (wake sources + filter tiers), Powertrain/Chassis (gated restore + integrity), and TCU/Diagnostics (timed windows + budgets) feed a gateway policy hub — attribution (source_id / rule_id), window scheduler (period / guard), black-box (counters / snapshots) — which produces bounded sleep decisions: entry gate (drain / flush), holdover + replay rules, resync (subscribe / counters). Shared metrics: Sleep Iq ≤ X · false wake ≤ X/day · unknown wake ≤ X/week · drop ≤ X ppm · max gap ≤ X ms]

H2-12 · IC Selection Logic: What Features Matter for Wake/Sleep Success

Intent: select capabilities that make wake/sleep policy measurable and reliable (not a shopping list). This section provides example material numbers as references, but the decision criteria stay capability-first, with verification hooks and targets (X).
Decision tree (If…then…)
  1. If partial networking / selective wake is required, then require a transceiver/SBC that supports selective wake identification and exposes wake reason fields. Verify: unknown wake ≤ X/week; attribution completeness ≥ X%.
  2. If timed wakes are required (rendezvous windows), then require a reliable low-power timer/clock path and bounded wake scheduling hooks. Verify: capture probability ≥ X%; duty cycle ≤ X%; window overlap ≤ X%.
  3. If “no event loss” is a hard contract, then require host-side buffering/holdover/replay support and fail-safe receive behavior during transitions. Verify: drop ≤ X ppm; max gap ≤ X ms; replay_count bounded.
  4. If serviceability is mandatory, then require black-box export hooks (counters + snapshots + retention) and stable table/version identifiers. Verify: retention ≥ X hours; export success ≥ X%.
Capability metrics table (Spec → Why → Target → Verify)
Spec | Why it matters | Target (X) | Verify
Standby / sleep Iq | Defines the battery-drain floor and constrains window duty cycle. | Sleep Iq ≤ X µA | Unified Iq method; report median/p95/max over an X s window.
Selective wake / partial networking | Reduces false wakes and enables domain-level power saving. | false wake ≤ X/day | Measure reason histogram + filter-hit top-N; unknown wake ≤ X/week.
Wake reason reporting | Makes every wake explainable and serviceable. | coverage ≥ X% | Wake record fields: source_id / rule_id / timestamp / counters.
Timer + low-power clock path | Enables rendezvous windows without staying awake. | capture ≥ X% | Window overlap ≤ X%; duty cycle ≤ X%; guard time = X ms.
Fail-safe behavior during transitions | Prevents event loss and phantom faults at wake/restore boundaries. | drop ≤ X ppm | Max gap ≤ X ms; resync success ≥ X% within X ms.
System role split (capability ownership)
SBC
Owns low-power policy hooks (rails/enable), watchdog/reset policy, and stable standby modes. Example: NXP UJA1169.
Transceiver
Owns selective wake entry points, wake reason signals, and bus/local wake observables in low-power modes. Examples: TI TCAN1145-Q1, NXP TJA1145, TI TLIN1029-Q1.
MCU / SoC
Owns the state machine, filter-table versioning, window scheduling, black-box logging, and integrity rules (buffer/holdover/replay/resync). Examples: NXP S32K3, NXP S32G2, Infineon AURIX TC3xx.
Example material numbers by capability (reference set)
This reference set is not exhaustive. Variants/suffixes differ by package, temperature grade, and OEM constraints. Use the verification hooks above to validate suitability.
Capability / role | Example parts | Typical verification focus
Selective wake / partial networking (CAN) | TI TCAN1145-Q1 · NXP TJA1145 | unknown wake ≤ X/week · false wake ≤ X/day · wake reason coverage ≥ X%
CAN FD transceiver (general) | Microchip MCP2562FD · TI TCAN1042-Q1 · NXP TJA1044 | wake latency p95 ≤ X ms · drop ≤ X ppm at boundaries
Isolated CAN (cross-domain) | TI ISO1042-Q1 · NXP TJA1052i | resync success ≥ X% · integrity-risk counters stable under domain transitions
LIN transceiver (low-power + wake) | TI TLIN1029-Q1 · NXP TJA1021 | Sleep Iq ≤ X µA · false wake/day on LIN ≤ X · bus/local wake attribution
CAN SBC (MCU supply + watchdog + CAN) | NXP UJA1169 | repeatable standby behavior · stable wake-source export · readable version identifiers
External CAN FD controller (SPI) | Microchip MCP2517FD | sleep-entry drain behavior · FIFO/queue boundary logging · drop_ppm accounting
Diagram: Selection flow (Requirements → Capabilities → Verification)
[Selection flow: requirements (selective wake / PN, timed wake windows, no-event-loss contract, serviceability evidence) map to capabilities (wake reason reporting, filter depth + version hooks, low-power timer/clock, fail-safe transition behavior, black-box export with retention), which map to verification points (unknown wake ≤ X/week; capture ≥ X% · overlap ≤ X%; drop ≤ X ppm · max gap ≤ X; retention ≥ X hours).]


H2-13 · FAQs (10) — fixed 4-line answers + JSON-LD

These FAQs address only the troubleshooting long tail within the Wake/Sleep Strategy scope: power-state policy, attribution, filter-table governance, false-wake control, timed wakes, and "no event loss" integrity. Each answer is strictly 4 lines: Likely cause / Quick check / Fix / Pass criteria (X placeholders).
Sleep Iq occasionally spikes — wake storm or a module not entering low power?
Likely cause: wake-rate bursts keep the system in short active loops, or a subsystem never transitions into its low-power state.
Quick check: correlate Sleep Iq with wake_rate/min and state residency; verify whether any module shows "active residency" during nominal sleep.
Fix: enable storm detection + cooldown (X sec) and add explicit per-module sleep-entry gates with evidence fields.
Pass criteria: Sleep Iq ≤ X µA over Y hours; wake_rate ≤ X/min; unknown_wake ≤ X/week.
A node false-wakes but logs show no frames — filter-table version or debounce/cooldown?
Likely cause: selective-wake rules are missing/incorrect after an update (version mismatch), or local/timer wakes bypass debounce/cooldown.
Quick check: compare policy_version + filter_table_hash against expected; review rule_hit_topN and local-wake counters in the same time window.
Fix: enforce rule versioning + rollback, and apply local-wake debounce/vote/cooldown with a recorded source_id.
Pass criteria: false wake ≤ X/day; unknown_wake ≤ X/week; attribution coverage ≥ X%.
Wakes up then immediately goes back to sleep — gate-condition race or window too short?
Likely cause: entry/exit gates evaluate in the wrong order (race), or the timed window on-time is shorter than restore/resync time.
Quick check: inspect transition_trace and gate blockers; compare restore_time_ms (p95) against window_on_time_ms minus guard.
Fix: make gates monotonic (single owner, explicit ordering) and extend on-time or guard to cover p95 restore/resync.
Pass criteria: no “wake→sleep” oscillation within X min; restore p95 ≤ (on-time − guard); wake latency p95 ≤ X ms.
Gateway timed wakes still lose events — buffer drain/restore or insufficient guard time?
Likely cause: sleep-entry drain or wake-restore replay is incomplete, or the window guard does not cover scheduling jitter.
Quick check: confirm drain_status before sleep and replay_count after wake; compare observed jitter to guard_time_ms and max_gap_ms spikes.
Fix: make drain a hard gate, bound replay with class-aware priority, and increase guard time to cover jitter budget X.
Pass criteria: drop ≤ X ppm over Y cycles; max_gap ≤ X ms; drain success ≥ X%; window capture ≥ X%.
False wakes spike in winter only — missing attribution fields or threshold drift?
Likely cause: attribution payload is incomplete (no proof to separate sources), or thresholds are temperature-sensitive and drift across seasons.
Quick check: compare reason histograms by temperature bucket; verify attribution_fields_present and thresholds_version consistency fleet-wide.
Fix: enforce minimum attribution fields for every wake and calibrate thresholds per temperature band with bounded hysteresis.
Pass criteria: false wake ≤ X/day across temp range; unknown_wake ≤ X/week; seasonal delta ≤ X%.
Diagnostic sessions prevent vehicle sleep — who holds the “stay-awake token”?
Likely cause: a token/lease is not released, or the session gate blocks sleep without a bounded timeout.
Quick check: read stay_awake_token_owner and token_lease_age; confirm session_state and the specific entry_gate_blockers.
Fix: implement lease-based tokens (auto-expire at X s) and explicit session-to-sleep policy with safe teardown actions.
Pass criteria: sleep entry succeeds within X min after session ends; token_lease_age ≤ X s; no stuck blockers in Y cycles.
A single domain wake pulls the whole vehicle awake — is propagation policy too broad?
Likely cause: propagation rules are overly permissive, or hidden dependencies force other domains to wake for shared services.
Quick check: compare domain_wake_counts against wake_source_id; confirm propagation_policy_version and inspect wake fan-out evidence.
Fix: scope propagation by domain + event class, and require explicit dependency declarations with bounded wake budgets.
Pass criteria: fan-out limited to X domains per trigger; vehicle-wide wakes ≤ X/day; wake scope matches policy in Y trials.
OTA/background tasks cause wake storms — how to trim conflicts with wake windows?
Likely cause: background jobs overrun windows and retrigger wakes, or retry loops are not rate-limited under partial connectivity.
Quick check: correlate wake_rate/min with job_overrun_count and window boundaries; verify whether cooldown was applied and respected.
Fix: enforce per-window job budgets (quota X) and slice long tasks across windows; add rate-limit + backoff for retries.
Pass criteria: wake storm ≤ X wakes/min; duty cycle ≤ X%; job_overrun_count = 0 for Y runs.
Filter hit-rate is low but power is high — local wakes or timer too frequent?
Likely cause: power drain is driven by local/timer wakes rather than bus-filter hits, or window schedule is over-aggressive.
Quick check: compare filter_hit_rate to local_wake_count and timer_wake_count; inspect wake_duty_cycle and window period.
Fix: reduce timer frequency, tighten local-wake debounce/vote, and align window cadence with real event arrival statistics.
Pass criteria: duty cycle ≤ X%; timer_wake_count ≤ X/day; Sleep Iq ≤ X µA over Y hours.
“Sporadic event loss” cannot be reproduced — do black-box triggers cover the critical path?
Likely cause: triggers miss boundary conditions (sleep entry, restore, resync, buffer overflow), or retention is insufficient to capture rare episodes.
Quick check: review trigger_table_version and trigger_hit_rate; confirm whether snapshots exist around the suspected boundary window.
Fix: add triggers on drain failure, replay overflow, resync fail, and storm threshold crossing; increase retention or compress snapshots.
Pass criteria: trigger coverage ≥ X% of boundary events; retention ≥ X hours; reproducible capture in ≤ X days MTBF.