Wake/Sleep Strategy: Frame Filters, False Wakes, Timed Wakes
H2-1 · Definition & Scope Guard: What “Wake/Sleep Strategy” Owns
This page defines a system-level wake/sleep policy that keeps standby power low without losing critical events. The focus is strategy engineering: power-state gating, frame-filter governance, wake attribution, timed wake windows, and “no event loss” verification.
- Prevent scope confusion: wake/sleep strategy is not an EMC or PHY waveform tutorial.
- Fix vocabulary drift: terms and measurement “accounting” must be consistent across teams and releases.
- Make success measurable: every wake has a reason, and every sleep is safe.
- Power-state machine and sleep-entry gates (network + software + safety readiness).
- Wake source taxonomy and wake attribution (evidence chain for every wake).
- Frame-filter table design, versioning, and safe updates (governance, not PHY internals).
- False-wake control (debounce/vote/cooldown/rate-limit) and wake-storm containment.
- Gateway timed wakes (window scheduling) and “no event loss” mechanisms with verification.
- Selective Wake / Partial Networking (ISO 11898-6): PHY-specific filtering internals belong on the Selective Wake page.
- SBC power tree & rails: LDO sequencing, watchdog/reset hardware belong on SBC pages.
- Controller/bridge specifics: protocol stack, FIFO/register-level behavior belong on Controller/Bridge pages.
- EMC/Protection: CMC, split termination RC networks, TVS parasitics, layout return paths belong on EMC/Protection pages.
- Termination/TVS/CMC value derivations and waveform tuning recipes (handled elsewhere).
- PHY register-by-register walkthroughs; vendor feature comparisons (handled on PHY/SBC pages).
- Security protocol details (crypto/PKI); only interface-level policy hooks are allowed here.
| Term | Definition (what to measure) | Evidence |
|---|---|---|
| Sleep | Lowest-power state where network participation is paused; only wake monitors remain active. | Power-state flag + wake monitor armed + bus quiet. |
| Standby | Reduced-power state with partial logic alive (e.g., timers/filters); not fully active but not deepest sleep. | State machine shows Standby; periodic tasks allowed by policy. |
| Partial networking | Only selected traffic can wake the node; non-relevant frames are ignored to save power. | Filter rules loaded + filter hit counters. |
| Wake source | Category of wake trigger: bus frame, local I/O, timer, diagnostics, safety fault, or policy event. | Wake record with source_id + timestamp. |
| Wake attribution | Evidence chain that explains each wake: what matched, what policy allowed, and what task consumed it. | RuleID + counters snapshot + task_id. |
| No event loss | Required events are neither missed (not captured) nor dropped (captured but not delivered) across sleep/wake cycles. | Event counters + timestamps + delivery acknowledgments. |
- Standby IQ: specify rails included, sample window length, and ambient/voltage conditions (placeholders: V=X, T=X, window=X s).
- False wake: define “no business value” precisely (e.g., no task executed, no event delivered, or filtered as noise).
- Event loss: separate “capture loss” vs “delivery loss” and log both counters.
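The capture/delivery split above can be sketched as two derived counters. This is an illustrative sketch only; the names (`capture_loss`, `delivery_loss`) are examples, not a normative schema.

```python
# Hypothetical accounting sketch: separate "capture loss" (event never
# latched on this node) from "delivery loss" (latched but never delivered).
from dataclasses import dataclass

@dataclass
class EventLossAccounting:
    produced: int = 0    # events the producer reports it emitted
    captured: int = 0    # events latched into a buffer on this node
    delivered: int = 0   # events acknowledged by the consumer

    @property
    def capture_loss(self) -> int:
        # "missed": produced but never captured
        return self.produced - self.captured

    @property
    def delivery_loss(self) -> int:
        # "dropped": captured but never delivered
        return self.captured - self.delivered

acc = EventLossAccounting(produced=1000, captured=998, delivered=995)
assert acc.capture_loss == 2
assert acc.delivery_loss == 3
```

Logging both counters separately is what makes the combined "event loss ≤ X ppm" target auditable per phase.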
- Standby (Sleep) IQ ≤ X µA (defined accounting window).
- False wake rate ≤ X / day (vehicle-level).
- Wake attribution coverage ≥ X% (unknown wake ≤ X / week).
- Event loss ≤ X ppm (capture + delivery combined).
- Wake-to-service latency ≤ X ms (policy-dependent).
- Wake storm rate ≤ X wakes/min (containment enabled).
- Policy-ready definitions, targets, and measurement accounting.
- State-machine gates (what must be true to sleep, and what must be restored to serve).
- Filter governance model (rules, priorities, versioning, and safe updates).
- Wake evidence chain requirements (for serviceability and fleet debugging).
H2-2 · System Power-State Machine: Active → PrepareSleep → Sleep → Wake → Restore
A wake/sleep strategy fails most often due to missing gate conditions, recovery races, or “accounting gaps” in evidence. This section provides a policy-ready state machine where each transition has entry gates, exit criteria, actions, timeouts, and failure handling.
- Gate by evidence: sleep is permitted only when readiness checks prove stable conditions.
- Restore deterministically: wake is complete only when services are available and logging is armed.
- Time-bound everything: each state has a timeout and a defined fallback path.
- Never lose causality: wake reason must be recorded before any cleanup that could erase evidence.
- Bus idle window ≥ X ms (avoid sleeping into active traffic).
- TX/RX queues empty and no retry backlog (prevent “wake-and-immediate-transmit” churn).
- Error counters stable over X seconds (avoid hiding a deteriorating network condition).
- Diagnostic session state permits sleep (no active keep-alive lock).
- NVM flush complete (no pending commits that could be interrupted).
- Background jobs paused or scheduled into timed-wake windows (OTA/log upload aligned to policy).
- Thermal state stable (no oscillation near shutdown thresholds).
- Brownout risk controlled (supply droop counters not increasing).
- Critical fault actions complete (e.g., safe-state latched if required).
| State | Entry criteria | Actions | Exit criteria | Timeout X | Evidence | Failure handling |
|---|---|---|---|---|---|---|
| Active | Normal operation; wake policy monitors armed. | Update counters; maintain attribution log; prepare sleep token evaluation. | Sleep request + gates A/B/C satisfied. | X | Bus idle timer; token list; counter snapshots. | If gates fail: remain Active and log gate blocker. |
| PrepareSleep | Active with gates trending stable; final drain allowed. | Drain queues; commit NVM; arm wake monitors; persist “sleep intent” record. | All drains complete + gates A/B/C satisfied. | X | Queue depth=0; NVM ok; monitors armed flag. | If timeout: abort sleep → Active; log which drain/gate blocked. |
| Sleep | PrepareSleep completed; deepest allowed policy state. | Enable filters; limit timers; capture baseline IQ; hold minimal retention log. | Wake trigger detected (bus/local/timer/diag/fault). | — | IQ sample window; filter hit counters. | If repeated false wakes: apply cooldown/rate-limit policy (see H2-5). |
| Wake | Wake trigger latched; evidence capture must run first. | Record wake reason; snapshot counters; start clocks; switch transceiver mode; open logging. | Services ready + policy allows full restore. | X | Wake record persisted before cleanup. | If timeout: escalate to safe minimal Active; keep evidence for service. |
| Restore | Wake stage completed; full rejoin and buffers restored. | Reload policy; resubscribe; restore buffers; reconcile counters; schedule next timed wake window. | System operational OR policy chooses return to Sleep. | X | Service-ready marker + buffer integrity checks. | If restore fails: remain Active; degrade features; raise diagnostic event. |
- Single authority: only one policy owner decides state transitions; other modules request via tokens.
- Sticky wake reason: once latched, the wake record is immutable for X seconds.
- Minimum awake dwell: after a wake, remain awake for at least X ms unless safety demands otherwise.
- Gate hysteresis: require gates to remain satisfied for X ms before entering PrepareSleep.
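The gated transitions above can be sketched in code. Gate names, thresholds, and the reduced gate set below are illustrative placeholders (a real policy owner evaluates all gates from the A/B/C lists), not a normative implementation.

```python
# Minimal sketch of the Active -> PrepareSleep -> Sleep -> Wake -> Restore
# machine with evidence-gated sleep entry and timeout fallback.
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto(); PREPARE_SLEEP = auto(); SLEEP = auto()
    WAKE = auto(); RESTORE = auto()

def evaluate_sleep_gates(bus_idle_ms, queues_empty, errors_stable_s,
                         diag_allows, nvm_flushed,
                         idle_gate_ms=500, error_gate_s=5):
    """Return (ok, blockers): sleep is permitted only if every gate holds."""
    blockers = []
    if bus_idle_ms < idle_gate_ms:     blockers.append("bus_idle")
    if not queues_empty:               blockers.append("queues")
    if errors_stable_s < error_gate_s: blockers.append("error_counters")
    if not diag_allows:                blockers.append("diag_session")
    if not nvm_flushed:                blockers.append("nvm_flush")
    return (len(blockers) == 0, blockers)

def next_state(state, gates_ok, timeout_hit, wake_trigger, services_ready):
    if state is State.ACTIVE and gates_ok:
        return State.PREPARE_SLEEP
    if state is State.PREPARE_SLEEP:
        if timeout_hit or not gates_ok:
            return State.ACTIVE          # abort sleep; log the blocker
        return State.SLEEP
    if state is State.SLEEP and wake_trigger:
        return State.WAKE                # latch wake reason before any cleanup
    if state is State.WAKE and services_ready:
        return State.RESTORE
    if state is State.RESTORE:
        return State.ACTIVE
    return state                         # no transition: stay put
```

Note the single-authority rule: only this function decides transitions; other modules only contribute gate inputs.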
H2-3 · Wake Sources & Attribution: Make Every Wake Explainable
A low-power system becomes unserviceable when wakes are not explainable. A wake attribution model turns “it woke up” into an evidence chain: what triggered, what matched, what ran, and whether value was delivered.
- Wake source taxonomy with measurement-grade evidence fields per source.
- Wake Record schema (black-box) that preserves causality through sleep/wake cycles.
- Conflict resolution rules for multi-source triggers within a single wake episode.
- Metrics that quantify attribution quality and unknown wake rate.
| Source class | Trigger evidence (minimum) | Expected value (minimum) | False-wake signal |
|---|---|---|---|
| Bus wake | rule_id + frame_id + match flags + filter hit counter delta | task executed and/or required event delivered within X ms | wake with no task/action and immediate return-to-sleep |
| Local I/O | io_id + edge + debounce window + input snapshot | local function enabled or event delivered to gateway/app | repeated edges within cooldown window (rate-limit violation) |
| Timer | timer_id + window_id + schedule_version + jitter margin | maintenance tasks completed inside window; next wake scheduled | window wake with no executed tasks (empty window) |
| Network management | nm_state + token reason + peer status snapshot | network consistency restored or policy gate updated | nm wake loops (oscillation) across X minutes |
| Diagnostics | session_id + requester + service class + gate decision | service window opened and request handled within X ms | diagnostic keep-alive holds system awake unexpectedly |
| Fault | fault_id + thermal/brownout snapshot + last reset reason | safe-state action executed and evidence preserved | re-trigger storms without environment changes |
A wake record must be written before any cleanup that can erase causality (queue draining, counter resets, logging rotation). The schema below is structured for fleet analytics and service diagnostics.
| Field group | Fields (examples) | Why it exists |
|---|---|---|
| Identity | record_id, boot_id, uptime_ms, vehicle_time | Enables correlation across reboots, ECUs, and fleet logs. |
| Source & trigger | source_class, source_id, trigger_detail, rule_id, frame_id | Answers “what triggered” in measurement-grade terms. |
| Context snapshot | state_before, bus_util, queue_depth, TEC/REC, filter_hit_deltas, rail/thermal flags | Explains “why it happened then” and exposes gating gaps. |
| Consequence | task_id, action_taken, event_delivered, awake_dwell_ms, result_code | Separates value wakes from false wakes (no task/no delivery). |
| Attribution quality | confidence, unknown_reason_code, conflict_resolved_by, episode_id | Makes “unknown wake” measurable and fixable across the fleet. |
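The field groups above can be made concrete as a record type. This is a sketch mirroring the table, with example field names only; the frozen dataclass illustrates the "sticky wake reason" rule (immutable once latched).

```python
# Illustrative Wake Record shape; field names are examples, not a frozen schema.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)   # frozen: the record is immutable once latched
class WakeRecord:
    record_id: int
    boot_id: int
    uptime_ms: int
    source_class: str          # "bus" | "io" | "timer" | "nm" | "diag" | "fault"
    source_id: int
    rule_id: Optional[int]     # filter rule that matched, if a bus wake
    state_before: str
    task_id: Optional[int]     # None => no task ran: candidate false wake
    event_delivered: bool
    awake_dwell_ms: int

def is_value_wake(r: WakeRecord) -> bool:
    """A wake has business value only if a task ran or an event was delivered."""
    return r.task_id is not None or r.event_delivered
```

The `is_value_wake` predicate is what feeds the value wake ratio KPI below.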
- Attribution completeness ≥ X% over the last Y days.
- Unknown wakes ≤ X / week per vehicle (report by ECU/domain, not only average).
- Value wake ratio ≥ X% (wakes with executed task and/or delivered event).
Multiple triggers can occur within a short time window. Attribution must group triggers into a single wake episode and resolve conflicts deterministically.
- Episode grouping: triggers inside Δt = X ms become one episode_id.
- Priority order: Fault > Diagnostics > Local I/O > Timer > Bus wake > Network management.
- Evidence check: if top-priority lacks required evidence fields, downgrade to the next source class.
- Outcome binding: record the executed task_id and delivery result under the chosen source class.
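The grouping and conflict-resolution rules above can be sketched as follows; the Δt value and the trigger tuple shape are illustrative assumptions.

```python
# Sketch of episode grouping and deterministic conflict resolution.
# Priority order follows the list above (Fault highest, NM lowest).
PRIORITY = ["fault", "diag", "io", "timer", "bus", "nm"]

def group_episodes(triggers, delta_ms=50):
    """triggers: list of (timestamp_ms, source_class, evidence_ok).
    Triggers within delta_ms of the episode start share one episode_id."""
    episodes, current, start = [], [], None
    for t in sorted(triggers):
        if start is None or t[0] - start > delta_ms:
            if current:
                episodes.append(current)
            current, start = [t], t[0]
        else:
            current.append(t)
    if current:
        episodes.append(current)
    return episodes

def attribute(episode):
    """Pick the highest-priority source with complete evidence;
    downgrade to the next class when required fields are missing."""
    ranked = sorted(episode, key=lambda t: PRIORITY.index(t[1]))
    for ts, source, evidence_ok in ranked:
        if evidence_ok:
            return source
    return "unknown"
```

Note the downgrade behavior: a fault trigger without its required evidence fields loses attribution to a fully evidenced bus wake in the same episode.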
H2-4 · Frame-Filter Tables: Design, Versioning, and Safe Updates
Frame filters must be treated as a config asset, not a one-off tweak. A maintainable filter table requires: a stable match schema, explicit actions, ownership, test cases, version scope, and observability (hit-rate and leak signals).
- Match keys must explain risk: wider masks increase leak risk; strict payload patterns increase miss risk.
- Actions must be graded: IGNORE, WAKE_MIN, WAKE_FULL (tie into the power-state machine).
- Every rule must be owned: owner + testcase + rollback strategy.
- Observability is mandatory: rule_id and hit counters must be exportable for attribution.
| Policy | Upside | Risk | Typical use |
|---|---|---|---|
| Default deny | Lower false wakes; predictable standby power. | Miss risk if new required event is not whitelisted. | Production low-power strategy with timed-wake fallback. |
| Default allow | Lower miss risk; easier bring-up and diagnostics. | Higher false wakes; wake storms; standby IQ increases. | Time-bounded debug/diagnostic windows with strict expiry. |
| RuleID | Match | Action | Priority | Owner | Testcase | Version scope | Telemetry |
|---|---|---|---|---|---|---|---|
| R-CRIT-001 | ID/mask + DLC + payload pattern + cycle constraint + domain tag | WAKE_FULL | High | System policy | TC-001 | SW vX.Y, vehicle A | hit_count, wake_count, leak_flag |
| R-MAINT-010 | ID/mask + cycle constraint (window aligned) + domain tag | WAKE_MIN | Med | Gateway | TC-010 | SW vX.Y, domain B | hit_count, dwell_ms |
- Schema compatibility: new table fields remain parseable; unknown fields fail closed.
- Shadow deploy: download rules but do not wake; record potential matches and hit-rate.
- Canary enable: enable on a small fleet slice; watch false wakes, unknown wakes, and event loss.
- Rollback plan: define trigger thresholds (X) and revert to last known-good version.
- Filter leak false wakes ≤ X / day (attributed by rule_id).
- Top-N hit-rate stable within X% across Y days after rollout.
- Rollback trigger: event loss > X ppm or unknown wakes > X/week in canary.
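The rule schema and the shadow-deploy stage can be sketched as below. This is a simplified illustration: matching is ID/mask only (real tables also match DLC, payload pattern, and cycle constraints), and the rule names are placeholders.

```python
# Hypothetical filter-rule record with graded actions and shadow-deploy
# hit counting (shadow rules record matches but never wake).
from dataclasses import dataclass

@dataclass
class FilterRule:
    rule_id: str
    can_id: int
    mask: int              # wider masks match more IDs => higher leak risk
    action: str            # "IGNORE" | "WAKE_MIN" | "WAKE_FULL"
    shadow: bool = False   # shadow deploy: count potential matches only
    hit_count: int = 0

def evaluate(rules, frame_id):
    """First matching rule in table order wins; default deny (IGNORE)."""
    for r in rules:
        if (frame_id & r.mask) == (r.can_id & r.mask):
            r.hit_count += 1           # exported for attribution telemetry
            return "IGNORE" if r.shadow else r.action
    return "IGNORE"                    # default-deny policy
```

During canary rollout, comparing shadow `hit_count` against expected traffic is what exposes both leak risk (unexpected hits) and miss risk (expected hits absent).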
H2-5 · False-Wake Control: Reduce Noise, Debounce Events, Rate-Limit Wakes
False wakes waste standby energy and can escalate into a wake storm. A robust control layer must make noisy triggers hard to satisfy (debounce/vote), keep damage bounded (cooldown/rate limit), and preserve an evidence trail for fleet debugging.
- Three-gate control: Debounce → Vote → Rate limit (deterministic, parameterized).
- Wake storm detection: wake rate, source skew, and bus-activity correlation signals.
- Mitigation actions: degrade modes, temporary bans, and alert/report hooks.
- Strategy table with side-effects and observability requirements.
- Stable-for window: require trigger persistence for X ms before admitting wake.
- Integrate window: accumulate evidence for X ms; wake only if score ≥ X.
- Max-gap window: require repeated confirmations with gaps ≤ X ms to avoid single spikes.
- AND gate: require two independent signals (e.g., rule hit + cycle constraint) before waking.
- OR gate with priorities: allow high-priority sources while bounding low-priority noise.
- Episode binding: vote decisions reference a single episode_id for consistent attribution.
- Wake rate threshold: storm if ≥ X wakes/min (per ECU/domain).
- Cooldown: enforce minimum X sec between admitted wakes for low-priority sources.
- Token bucket: allow bursts up to X tokens then throttle to X/min.
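The admission chain (debounce, then cooldown, then token bucket) can be sketched in one gate object. All windows and rates below are illustrative placeholders for the X parameters.

```python
# Sketch of sequential wake-admission gates for a low-priority source.
class WakeAdmission:
    def __init__(self, stable_for_ms=20, cooldown_ms=5000,
                 bucket_size=5, refill_per_min=2):
        self.stable_for_ms = stable_for_ms
        self.cooldown_ms = cooldown_ms
        self.tokens = bucket_size
        self.bucket_size = bucket_size
        self.refill_ms = 60_000 / refill_per_min
        self.last_admit_ms = None
        self.last_refill_ms = 0

    def admit(self, now_ms, trigger_stable_ms):
        # Gate 1: debounce - trigger must persist for the stable-for window
        if trigger_stable_ms < self.stable_for_ms:
            return False
        # Gate 2: cooldown - minimum spacing between admitted wakes
        if self.last_admit_ms is not None and \
           now_ms - self.last_admit_ms < self.cooldown_ms:
            return False
        # Gate 3: token bucket - bounded burst, throttled steady rate
        refills = int((now_ms - self.last_refill_ms) / self.refill_ms)
        if refills:
            self.tokens = min(self.bucket_size, self.tokens + refills)
            self.last_refill_ms += refills * self.refill_ms
        if self.tokens == 0:
            return False
        self.tokens -= 1
        self.last_admit_ms = now_ms
        return True
```

Each rejected attempt should still increment an attempt counter (cooldown violations are themselves a storm signal, per the detection list below).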
- Wake rate: episodes/min over a sliding window X sec.
- Source skew: top source_id/rule_id share ≥ X% (suggests a single noisy origin).
- Bus-activity correlation: compare wake timestamps with bus utilization and filter hit deltas.
- Cooldown violations: repeated attempts inside cooldown (attempt_count ≥ X).
- Wake storm threshold = X wakes/min; storm duration ≤ X min before mitigation succeeds.
- Cooldown = X sec; cooldown violation attempts ≤ X/hour.
- False wake rate ≤ X/day and unknown wakes ≤ X/week (per vehicle).
- WAKE_MIN: run only the minimal task set; cap awake dwell to X ms.
- Task shedding: disable non-critical tasks while storm mode is active.
- Adaptive windows: increase debounce window and cooldown during storm.
- Blocklist: ban source_id/rule_id for X minutes (auto-expire).
- Tiered bans: ban only low-priority layers; keep critical wakes admissible.
- Evidence required: write episode_id + top sources + counters before applying bans.
- storm_start / storm_end events with top sources and parameter snapshots.
- Serviceability payload: source skew histogram + correlation hints + cooldown/bans applied.
- Black-box alignment: link to Wake Record fields (episode_id, source_id, result_code).
| Mechanism | When | Parameter X | Side-effect | Observability |
|---|---|---|---|---|
| Debounce | Noisy triggers; unstable edges; short spikes | window = X ms | Added latency; miss risk if too strict | attempt_count, stable_score, episode_id |
| Vote | Reduce single-point noise; require independent confirmation | k-of-n = X | Complexity; needs clear priority rules | vote_inputs, chosen_source, confidence |
| Cooldown | Repeated wakes from same source in short windows | X sec | May delay legitimate low-priority events | cooldown_hits, violation_attempts |
| Rate limit | Wake storms or bursty attempts | X wakes/min | May increase miss risk if applied globally | wake_rate, source_skew, storm_state |
| Temporary ban | Single source dominates; repeated violations | ban TTL = X min | Potential miss for that source; must avoid critical bans | banned_source_id, ban_reason_code |
H2-6 · Gateway Timed Wakes: Scheduling Windows to Avoid Event Loss
Passive wakes alone cannot guarantee “no event loss” when gateways sleep. Timed wakes create controlled service windows for periodic tasks and improve event capture probability while bounding duty cycle and overlap across domains.
- Patrol: health checks, counter snapshots, policy synchronization.
- Aggregation & upload: batch logs/telemetry and serviceability payloads.
- Critical domain sync: cross-domain alignment and short maintenance windows.
- Period (P): repeat interval between windows.
- On-time (T_on): window open duration to run required tasks.
- Guard time (T_guard): buffer before/after to absorb drift and scheduling delays.
- Jitter budget (J): allowed wake-time deviation (placeholder).
- Duty cycle: T_on / P, directly tied to standby power.
- Event capture probability ≥ X% over the last Y days.
- Window overlap ratio ≤ X% across Domain A/B/C schedules.
- Jitter within budget ≥ X% of windows (late/early windows bounded).
- Define cadence: each domain owns a base Period and required On-time.
- Stagger windows: avoid overlap for high-rate windows; align only when necessary.
- Resolve conflicts: preserve critical windows; degrade others to WAKE_MIN if overlap is unavoidable.
- Record evidence: log window_id, actual_on_time, and overlap flags for optimization.
| Domain | Period (P) | On-time (T_on) | Guard (T_guard) | Purpose | Pass criteria |
|---|---|---|---|---|---|
| Domain A | X s | X ms | X ms | Patrol + counters | capture ≥ X% |
| Domain B | X s | X ms | X ms | Aggregation/upload | overlap ≤ X% |
| Domain C | X s | X ms | X ms | Critical sync | jitter ok ≥ X% |
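The duty-cycle and overlap checks above can be worked through numerically. The periods and on-times below are illustrative, not recommendations.

```python
# Worked sketch: duty cycle and pairwise overlap check for timed-wake
# windows, all times in milliseconds.
def duty_cycle(t_on_ms, period_ms):
    return t_on_ms / period_ms

def windows(offset_ms, period_ms, t_on_ms, horizon_ms):
    """Window intervals [start, start + t_on) over the scheduling horizon."""
    return [(t, t + t_on_ms)
            for t in range(offset_ms, horizon_ms, period_ms)]

def overlaps(a, b):
    """True if any window in schedule a intersects any window in b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)

# Domain A: every 10 s for 100 ms; Domain B staggered by 5 s.
dom_a = windows(0,     10_000, 100, 60_000)
dom_b = windows(5_000, 10_000, 100, 60_000)
assert duty_cycle(100, 10_000) == 0.01   # 1% duty cycle
assert not overlaps(dom_a, dom_b)        # staggering removes the overlap
```

A guard time extends each interval on both sides before the overlap check, absorbing the jitter budget J.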
H2-7 · “No Event Loss” Design: Buffering, Holdover, and Resync Rules
“No event loss” is a contract, not a slogan. It must be enforced by explicit rules across three phases: sleep entry (drain/commit), sleep holdover (retain), and wake restore (replay/resync).
- Buffering rules for sleep entry drain, wake restore replay, and queue sizing.
- Holdover contract for critical events (retain/retry with bounds).
- Resync sequence to restore subscriptions, filter tiers, and counters safely.
- Event Integrity Checklist (evidence fields + pass criteria placeholders).
- Event evidence: event_id · seq · timestamp · producer_id · domain_id.
- Event classes: Critical · Maintenance · Telemetry (drop policy depends on class).
- Delivery states: delivered · buffered · dropped · re-sent (every transition has a reason code).
- Max acceptable event gap = X ms (delivery gap across sleep/wake boundaries).
- Buffer depth = X events (must cover peak burst + window coverage).
- Drop rate ≤ X ppm (by class and by producer).
- Drain targets: pending tx/rx events, local processing queue, persistence queue.
- Exit conditions: pending_events = 0 OR drain_timeout_hit = true (timeout = X ms).
- Evidence: last_seq_snapshot, drain_result_code, pending_by_class.
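The bounded drain at sleep entry can be sketched as a simple loop. Queue and timing primitives here are illustrative; a real ECU drains hardware FIFOs and persistence queues, not Python lists.

```python
# Sketch of the sleep-entry drain with a bounded timeout.
import time

def drain_before_sleep(pending, deliver, timeout_ms=200):
    """Deliver pending events until empty or the drain timeout hits.
    Returns (drained_ok, drain_timeout_hit, still_pending)."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while pending and time.monotonic() < deadline:
        deliver(pending.pop(0))
    timeout_hit = bool(pending)       # anything left => timeout path taken
    return (not timeout_hit, timeout_hit, list(pending))
```

On the timeout path, the policy owner must log `drain_timeout_hit` and `pending_by_class` before either aborting sleep or accepting bounded loss per the class policy below.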
- Critical: no drops; retain and retry or escalate wake mode.
- Maintenance: allow bounded drops with drop_reason_code and counters.
- Telemetry: allow sampling/aggregation to protect storage and CPU budget.
- Every drop is logged: drop_count_by_class, top_producer, and peak_depth.
- Replay order: by seq/timestamp (monotonic delivery).
- Replay bounds: replay_count ≤ X events OR replay_time ≤ X ms.
- Fail-safe: if bounds exceeded, enter WAKE_MIN and raise integrity_risk flag.
- Retain-until-delivered: critical events stay in holdover until delivered or escalated.
- Retry rules: retry_count ≤ X with backoff/cooldown (avoid wake storms).
- Evidence: holdover_queue_depth, oldest_age_ms, last_retry_reason.
- Resubscribe: restore consumer subscriptions (topic/domain list).
- Enable filters by tier: Critical → Maintenance → Diagnostic (avoid early noise).
- Counter resync: align seq/timestamp and validate jumps (jump_threshold = X).
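The replay bounds and seq-jump validation can be sketched together; `max_replay` and `jump_threshold` stand in for the X placeholders.

```python
# Sketch of bounded replay with monotonic delivery and seq-jump validation.
def replay_after_wake(holdover, deliver, max_replay=100, jump_threshold=10):
    """Replay holdover events in seq order; flag integrity risk when the
    replay bound is exceeded or a seq jump exceeds the threshold."""
    integrity_risk = False
    last_seq = None
    for n, ev in enumerate(sorted(holdover, key=lambda e: e["seq"])):
        if n >= max_replay:
            integrity_risk = True      # bound exceeded -> fall back to WAKE_MIN
            break
        if last_seq is not None and ev["seq"] - last_seq > jump_threshold:
            integrity_risk = True      # unexplained gap in the sequence
        deliver(ev)
        last_seq = ev["seq"]
    return integrity_risk
```

Sorting by `seq` before delivery is what enforces the monotonic-delivery rule even if the holdover queue was filled out of order.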
| Phase | Check | Evidence fields | Pass criteria |
|---|---|---|---|
| Before sleep | Drain complete or bounded by timeout | pending_by_class · drain_timeout_hit · last_seq_snapshot | drain_time ≤ X ms |
| During sleep | Holdover queue bounded, no critical drops | holdover_depth · critical_drop_count · oldest_age_ms | critical_drop_count = 0 |
| After wake | Replay bounded, resync succeeds | replay_count · resync_fail_reason · max_gap_ms | max_gap ≤ X ms |
| Fleet view | Drop rate and unknown integrity risks bounded | drop_ppm_by_class · integrity_risk_count | drop ≤ X ppm |
H2-8 · Diagnostics & Logging: Black-Box for Wake/Sleep Failures
Intermittent wake/sleep failures cannot be fixed without a black-box record. The black-box must capture counters, snapshots, and triggered traces with bounded storage and clear export rules.
- Wake layer: wake_count · reason_histogram · unknown_wake_count · storm_state transitions.
- Policy layer: rule_version · policy_version · filter_hit_topN · window_id and overlap flags.
- Bus health: bus_utilization · error_counters_snapshot · bus_off_count (statistics only).
- Integrity: buffer_depth_peak · drop_count_by_class · max_gap_ms · replay_count · resync_fail_count.
- Ring buffer: always-on counters and compact episodes to guarantee retention ≥ X hours.
- Sampling rules: telemetry can be downsampled; critical events are not downsampled.
- Triggered capture: on storm_start, resync_fail, drop_rate_exceed, bus_off — capture a bounded bundle.
- Export bundle: packaged records with versions and integrity checks for service tooling.
- Retention ≥ X hours for counters + episode summaries.
- Unknown wake MTBF ≥ X days (unknown wake interval).
- Export time ≤ X seconds for a standard time range.
| Field | Type | Update mode | Rate | Retention |
|---|---|---|---|---|
| episode_id | u32 | event-driven | per wake | ≥ X hours |
| source_id / rule_id | u16/u16 | event-driven | per episode | ≥ X hours |
| reason_histogram | fixed bins | periodic + event | X sec | ≥ X hours |
| filter_hit_topN | u32[N] | periodic | X sec | ≥ X hours |
| buffer_depth_peak | u16 | periodic + event | X sec | ≥ X hours |
| export_bundle_version | u16 | on export | per export | N/A |
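The bounded-retention idea behind the table can be sketched with a ring buffer plus always-on counters; the capacity and field names are illustrative.

```python
# Sketch of a bounded black-box store: ring buffer for episode summaries
# plus always-on counters, with a bounded export bundle.
from collections import deque, Counter

class BlackBox:
    def __init__(self, capacity=256):
        self.episodes = deque(maxlen=capacity)  # oldest entries overwritten
        self.counters = Counter()               # wake_count, unknown_wake_count...

    def record_episode(self, episode_id, source_class):
        self.episodes.append({"episode_id": episode_id,
                              "source_class": source_class})
        self.counters["wake_count"] += 1
        if source_class == "unknown":
            self.counters["unknown_wake_count"] += 1

    def export_bundle(self):
        """Bounded export: counters plus the retained episode window."""
        return {"counters": dict(self.counters),
                "episodes": list(self.episodes)}
```

Counters survive longer than individual episodes, which is why both layers are needed: the ring buffer proves recent causality, the counters prove fleet-level rates.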
| Trigger | Data captured | Retention |
|---|---|---|
| storm_start | top sources · cooldown/bans · wake_rate window · counters snapshot | ≥ X hours |
| resync_fail | policy/table versions · replay summary · fail_reason_code | ≥ X hours |
| drop_rate_exceed | drop counts by class · peak buffer depth · top producers | ≥ X hours |
| bus_off | error counters snapshot · utilization window · policy versions | ≥ X hours |
H2-11 · Applications: Where Wake/Sleep Strategy Dominates
- LIN transceiver: TI TLIN1029-Q1; NXP TJA1021
- Partial-networking CAN transceiver: TI TCAN1145-Q1; NXP TJA1145
- Mini CAN SBC (MCU supply + watchdog + CAN): NXP UJA1169
Pitfall: “bus activity wake” becomes unexplainable without source_id/rule_id and hysteresis; service teams cannot isolate the culprit node.
Measurement: reason histogram, unknown_wake_count, wake_rate/min, per-source cooldown_applied ratio, wake latency distribution by source.
Pitfall: wake windows drift or overlap unintentionally, causing either event loss or battery drain.
Measurement: window schedule logs, overlap rate, captured/expected counters, max_event_gap_ms, drop_ppm_by_class.
- Isolated CAN transceiver: TI ISO1042-Q1; NXP TJA1052i
- CAN FD transceiver (non-isolated): Microchip MCP2562FD; TI TCAN1042-Q1
- MCU/SoC (policy + logging): Infineon AURIX TC3xx; NXP S32K3
Pitfall: accepting “wake” before filters/queues/counters are consistent causes phantom loss and intermittent faults.
Measurement: resync_fail_count, restore_time_ms (p95), unknown_wake_rate, critical_event_drop_ppm.
Pitfall: bounded buffers without class-aware policy lead to “silent loss” during repeated sleep cycles.
Measurement: max_gap_ms, buffer_high_watermark, replay_count, drop_ppm_by_class.
- Gateway SoC: NXP S32G2; Renesas R-Car
- External CAN FD controller (SPI): Microchip MCP2517FD
- CAN FD transceiver: Microchip MCP2562FD; NXP TJA1044
Pitfall: background jobs overrun wake windows, forcing uncontrolled “stay-awake” behavior or losing events at the boundary.
Measurement: window_on_time_ms, job_overrun_count, export_bundle_size, window_capture_probability.
Pitfall: logging is either too sparse (no proof) or too heavy (power/flash wear).
Measurement: ring_buffer_fill, trigger_hit_rate, export_success_rate, unknown_wake_MTBF ≥ X days.
H2-12 · IC Selection Logic: What Features Matter for Wake/Sleep Success
- If partial networking / selective wake is required, then require a transceiver/SBC that supports selective wake identification and exposes wake reason fields. Verify: unknown wake ≤ X/week; attribution completeness ≥ X%.
- If timed wakes are required (rendezvous windows), then require a reliable low-power timer/clock path and bounded wake scheduling hooks. Verify: capture probability ≥ X%; duty cycle ≤ X%; window overlap ≤ X%.
- If “no event loss” is a hard contract, then require host-side buffering/holdover/replay support and fail-safe receive behavior during transitions. Verify: drop ≤ X ppm; max gap ≤ X ms; replay_count bounded.
- If serviceability is mandatory, then require black-box export hooks (counters + snapshots + retention) and stable table/version identifiers. Verify: retention ≥ X hours; export success ≥ X%.
| Spec | Why it matters | Target (X) | Verify |
|---|---|---|---|
| Standby / Sleep IQ | Defines battery drain floor and constrains window duty cycle. | Sleep IQ ≤ X µA | Unified IQ method; report median/p95/max over X s window. |
| Selective wake / partial networking | Reduces false wakes and enables domain-level power saving. | false wake ≤ X/day | Measure reason histogram + filter hit top-N; unknown wake ≤ X/week. |
| Wake reason reporting | Makes every wake explainable and serviceable. | coverage ≥ X% | Wake record fields: source_id / rule_id / timestamp / counters. |
| Timer + low-power clock path | Enables rendezvous windows without staying awake. | capture ≥ X% | Window overlap ≤ X%; duty cycle ≤ X%; guard time = X ms. |
| Fail-safe behavior during transitions | Prevents event loss and phantom faults at wake/restore boundaries. | drop ≤ X ppm | Max gap ≤ X ms; resync success ≥ X% within X ms. |
| Capability / role | Example parts | Typical verification focus |
|---|---|---|
| Selective wake / partial networking CAN | TI TCAN1145-Q1 · NXP TJA1145 | unknown wake ≤ X/week · false wake ≤ X/day · wake reason coverage ≥ X% |
| CAN FD transceiver (general) | Microchip MCP2562FD · TI TCAN1042-Q1 · NXP TJA1044 | wake latency p95 ≤ X ms · drop ≤ X ppm at boundaries |
| Isolated CAN (cross-domain) | TI ISO1042-Q1 · NXP TJA1052i | resync success ≥ X% · integrity risk counters stable under domain transitions |
| LIN transceiver (low-power + wake) | TI TLIN1029-Q1 · NXP TJA1021 | Sleep IQ ≤ X µA · false wake/day on LIN ≤ X · bus/local wake attribution |
| CAN SBC (MCU supply + watchdog + CAN) | NXP UJA1169 | standby behavior repeatable · wake source export stable · version identifiers readable |
| External CAN FD controller (SPI) | Microchip MCP2517FD | sleep entry drain behavior · FIFO/queue boundary logging · drop_ppm accounting |