Safety Watchdog for ASIL/SIL

What It Solves

Independent WDT clock + window constraints + auditable logs = a safety loop that detects deadlocks, timing faults, and clock stalls, then enforces fail-safe actions with evidence for audits.

Jitter & ISR latency

Size T_min/T_max with guard bands so task jitter and interrupt latency do not cause false trips.

Main clock stalls

An independent clock/power domain with stall detection exposes failures even if the SoC PLL halts.

Audit-ready evidence

Structured logs capture timestamp, violation type, detect latency, action, and evidence hash for coverage reports.

  • Application deadlock/disorder → abnormal feeding (early/late/missing/too-often)
  • Window violation detected in watchdog’s own clock domain
  • Fail-safe action: RESET_SAFE pulse/hold or graded degrade
  • Evidence logged → auditable coverage report

Figure: Scope vs. excluded topics for the ASIL/SIL safety watchdog. Covered: independent clock/power domain, window WDT (T_min/T_max), fault classes (early, late, missing, too-often, clock-stall), fail-safe (reset/degrade), audit logs and coverage. Excluded: simple watchdog, window supervisor, POR/BOD, reset buffer, RTC/switchover, push-button debounce. Flow: deadlock/disorder → abnormal feed → window violation → RESET_SAFE/logs.

Safety Watchdog Architecture

Independent clock/rail feeds a window comparator that classifies early/late/missing, too-often, and clock-stall faults, driving RESET_SAFE, FAULT_LATCH, and DIAG channels. Key traps: t_pw(min), polarity, re-arm and debounce conditions.

Clock domain

XTAL/independent RC with stall detection; do not derive from main PLL.

Window logic

Compare T_feed ∈ [T_min, T_max] with guard bands; classify early/late/missing/too-often.
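The window comparison can be sketched as a small classifier. This is a minimal illustration, not a device model: the function name, the `None` convention for a missing feed, and the `burst_limit` rule (a run of consecutive early feeds is reported as too-often) are assumptions, since the text does not define where early ends and too-often begins.

```python
def classify_stream(intervals, t_min, t_max, burst_limit=3):
    """Classify feed intervals measured in the watchdog's own clock domain.

    intervals: time since the previous feed, or None when no feed arrived.
    burst_limit: hypothetical threshold; this many consecutive early feeds
                 are reported as "too-often".
    """
    results = []
    early_run = 0
    for interval in intervals:
        if interval is None:
            fault, early_run = "missing", 0
        elif interval < t_min:
            early_run += 1
            fault = "too-often" if early_run >= burst_limit else "early"
        elif interval > t_max:
            fault, early_run = "late", 0
        else:
            fault, early_run = "ok", 0
        results.append(fault)
    return results
```

For example, with T_min = 8 and T_max = 12 (any consistent time unit), the stream `[10, 4, 4, 4, 20, None]` is classified as ok, early, early, too-often, late, missing.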

Outputs & diagnostics

RESET_SAFE pulse/hold, FAULT_LATCH sticky, and DIAG/I²C/PMBus/SPI logging.

Figure: Safety watchdog block diagram. Inputs: WDI, WEN, and optional PG. An independent XTAL/RC clock with stall detect drives a window comparator (valid feed window between T_min and T_max), which feeds a fault classifier (early, late, missing, too-often, clock-stall). Outputs and diagnostics: RESET_SAFE (pulse/hold), FAULT_LATCH (sticky), DIAG/I²C/PMBus/SPI. Checklist: t_pw(min), re-arm, debounce, polarity, I_Q, AEC-Q100.

Timing Windows & Jitter Budget

Under task jitter, ISR latency, and DVFS transitions, size the watchdog as a valid feed band with an auditable derivation, not just two hard limits.

Copyable formula

Given baseline period T_task, max jitter ±J, worst ISR/scheduler latency L, and safety margin Δguard:
T_min = T_task − J − L − Δguard
T_max = T_task + J + L + Δguard
Valid band: [T_min + Δguard, T_max − Δguard]
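As a worked instance of the formula, the sketch below computes the window and the inner valid band. The function name and the use of integer microseconds are illustrative assumptions; the arithmetic follows the three lines above term by term.

```python
def size_window(t_task, jitter, latency, guard):
    """Return (T_min, T_max, valid_band) from the jitter/latency budget.

    All arguments share one time unit (integer microseconds in the example).
    """
    t_min = t_task - jitter - latency - guard
    t_max = t_task + jitter + latency + guard
    if t_min <= 0:
        raise ValueError("budget exceeds the task period; T_min must be positive")
    valid_band = (t_min + guard, t_max - guard)
    return t_min, t_max, valid_band


# Example values only: 10 ms task, ±0.5 ms jitter, 0.3 ms latency, 1 ms guard.
t_min, t_max, band = size_window(10_000, 500, 300, 1_000)
print(t_min, t_max, band)  # 8200 11800 (9200, 10800)
```

The valid band is where well-behaved feeds are expected to land; the guard on each side is the slack between that band and the limits the watchdog actually enforces.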

Multi-task feed: release a token only after critical tasks complete to prevent “empty feeds”.

1) Sample task period distribution (≥10k events across temp/voltage/load).

2) Estimate J and L (extreme + P95/P99).

3) Set Δguard = 5–15% per risk and aging margin.

4) Sweep parameters to measure false-trip and miss rates.

5) Freeze to registers/OTP with versioned evidence.
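Step 2 can be illustrated with a small helper that turns sampled task periods into extreme, P95, and P99 jitter figures. This is a sketch under assumptions: jitter is taken as the absolute deviation from the nominal T_task, the percentile uses the nearest-rank method, and ISR/scheduler latency L would need its own instrumentation, which is not shown.

```python
def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    rank = (pct * len(sorted_vals) + 99) // 100  # integer ceiling
    return sorted_vals[max(rank, 1) - 1]


def jitter_stats(periods, t_task):
    """Summarize |period - T_task| over the sampled events."""
    if not periods:
        raise ValueError("no samples")
    devs = sorted(abs(p - t_task) for p in periods)
    return {
        "max": devs[-1],
        "p95": nearest_rank(devs, 95),
        "p99": nearest_rank(devs, 99),
    }
```

Comparing the extreme value against P95/P99 shows how much of the budget is spent on rare outliers, which can inform the Δguard choice in step 3.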

  • Mask window during DVFS transitions and enable a first-feed delay after sleep.
  • At low-frequency modes, relax T_max and log configuration changes.
  • All feed validation must occur in the watchdog’s own clock domain.

Figure: Valid feed region under the jitter/latency budget. A timeline with red Early and Late bands outside T_min and T_max and a green valid feed band between them; dots mark observed feed times.

Fault-Injection & Self-Test

Build injection scripts and measure diagnostic coverage C_diag%. Separate power-on self-test, online periodic, and maintenance mode. Define pass criteria and log fields for audits.

Types: Early feed / Late feed / Missing feed / Too-often

Clock: Stall / frequency skew

Boundary: Window drift / threshold perturbation

Pass criteria

  • Late/Missing: trigger RESET_SAFE ≥ t_pw within T_max + Δdetect.
  • Early/Too-often: set FAULT_LATCH within N violations and degrade.
  • Clock-stall: detect within T_stall_max and enter safe state.

C_diag% = 1 − (undetected / total injections)
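The coverage figure can be computed directly from injection results. The sketch below assumes each result is an `(injection_type, detected)` pair, a format chosen here for illustration, and also reports a per-class ratio so that a weak class is not hidden by a good overall number.

```python
from collections import defaultdict


def coverage(results):
    """Return (overall, per_class) coverage as fractions in [0, 1].

    results: list of (injection_type, detected) pairs, detected being a bool.
    """
    if not results:
        raise ValueError("no injections recorded")
    totals = defaultdict(int)
    missed = defaultdict(int)
    for kind, detected in results:
        totals[kind] += 1
        if not detected:
            missed[kind] += 1
    per_class = {k: 1 - missed[k] / totals[k] for k in totals}
    overall = 1 - sum(missed.values()) / len(results)
    return overall, per_class
```

Multiply by 100 to express the result as C_diag%.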

Log fields: timestamp, injection_type, detect_latency, action_result, evidence_hash / log_ptr, config_version, tester_id
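One way to make a log entry hash-verifiable is to store a digest of the raw evidence alongside the listed fields. The sketch below is an assumption-laden illustration: SHA-256 and JSON with sorted keys are choices made here, not something the text specifies, and `make_record` / `verify_record` are hypothetical helper names.

```python
import hashlib
import json

REQUIRED = ("timestamp", "injection_type", "detect_latency",
            "action_result", "config_version", "tester_id")


def make_record(fields, raw_evidence):
    """Build a JSON log line with an evidence_hash over the raw capture bytes."""
    missing = [k for k in REQUIRED if k not in fields]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    record = dict(fields)
    record["evidence_hash"] = hashlib.sha256(raw_evidence).hexdigest()
    return json.dumps(record, sort_keys=True)


def verify_record(record_json, raw_evidence):
    """Recompute the digest and compare it with the stored evidence_hash."""
    record = json.loads(record_json)
    return record["evidence_hash"] == hashlib.sha256(raw_evidence).hexdigest()
```

Sorting keys keeps the exported text stable between runs, which helps when a text view and a JSON export have to match.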

  • Power-on self-test: cover primary failures at boot.
  • Online periodic: low-duty rolling injections with time-window avoidance.
  • Maintenance mode: factory/service full matrix + exportable report.

Figure: Injection types → expected actions → required logs.
  • Early feed → Degrade / FAULT_LATCH
  • Late feed → RESET_SAFE ≥ t_pw
  • Missing feed → RESET_SAFE ≥ t_pw
  • Too-often → FAULT_LATCH within N
  • Clock stall → Safe state within T_stall_max
  • Window drift → Re-evaluate T_min/T_max
Required logs: timestamp; type; latency; action_result; t_pw; evidence_hash / log_ptr; config_version; tester_id. C_diag% = 1 − (undetected violations / total injections). The text view must match the JSON export; evidence must be hash-verifiable.

Fail-Safe Policies

Policies must be auditable tables: Reset (pulse ≥ t_pw(min)), Degrade (limit power/disable non-critical/read-only), and Hold-off (manual/remote unlock). Guard against mis-trips during power ramp, DVFS, and sleep-to-wake transitions.

Level 1 · Reset

Pulse width ≥ t_pw(min); for Late/Missing or clock-stall criticals.

Level 2 · Degrade

Disable non-critical blocks, limit power, or enter read-only after Early/Too-often or boundary drift.

Level 3 · Hold-off

Keep power removed until authorized; for persistent/severe violations or trust failures.

  • Power-ramp mask of the window counter.
  • DVFS suppression during frequency transitions; restore with delay.
  • Wake-up compensation for the first feed after sleep.
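
The three masks above can be sketched as a gate in front of the window check. The field names, the tick-based restore delay, and the rule that wake compensation covers exactly one feed are assumptions for illustration; a real device would expose its own registers for these conditions.

```python
from dataclasses import dataclass


@dataclass
class MaskState:
    power_ramping: bool = False            # supply still ramping
    dvfs_transition: bool = False          # frequency/voltage change in progress
    restore_delay_ticks: int = 0           # ticks left after a DVFS change ends
    wake_first_feed_pending: bool = False  # first feed after sleep not yet seen


def active_mask(mask):
    """Return the name of the mask that currently applies, or None."""
    if mask.power_ramping:
        return "ramp"
    if mask.dvfs_transition or mask.restore_delay_ticks > 0:
        return "dvfs"
    if mask.wake_first_feed_pending:
        return "wake"
    return None


def check_feed(interval, t_min, t_max, mask, log):
    """Evaluate one feed interval unless a mask applies; log every mask event."""
    reason = active_mask(mask)
    if reason is not None:
        log.append({"event": "mask", "reason": reason})
        if reason == "wake":
            mask.wake_first_feed_pending = False  # covers one feed only
        return "masked"
    if interval < t_min:
        return "early"
    if interval > t_max:
        return "late"
    return "ok"
```

Logging each masked evaluation keeps the record complete, so an auditor can see when the window was not being enforced and why.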

Table columns to freeze in BOM/validation: fault_class · preconditions · action · parameters (t_pw, N, T_stall_max, re-arm, polarity) · logging (timestamp, latency, action_result, evidence_hash, config_version).
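
A frozen policy table with those columns might look like the sketch below. The fault-to-action mapping follows the three levels described above; every numeric value is a placeholder, and the `persist_limit` escalation rule is an assumption, since the text does not say how many repeats count as persistent.

```python
LOG_FIELDS = ("timestamp", "latency", "action_result",
              "evidence_hash", "config_version")

# Placeholder parameters; replace with values frozen in your validation plan.
POLICY_TABLE = {
    "late":        {"preconditions": "window armed", "action": "reset",
                    "parameters": {"t_pw_ms": 10}, "logging": LOG_FIELDS},
    "missing":     {"preconditions": "window armed", "action": "reset",
                    "parameters": {"t_pw_ms": 10}, "logging": LOG_FIELDS},
    "clock_stall": {"preconditions": "stall detect enabled", "action": "reset",
                    "parameters": {"t_stall_max_ms": 5}, "logging": LOG_FIELDS},
    "early":       {"preconditions": "window armed", "action": "degrade",
                    "parameters": {"n_violations": 3}, "logging": LOG_FIELDS},
    "too_often":   {"preconditions": "window armed", "action": "degrade",
                    "parameters": {"n_violations": 3}, "logging": LOG_FIELDS},
}


def select_action(fault_class, repeat_count=0, persist_limit=5):
    """Look up the base action; escalate persistent violations to hold-off."""
    entry = POLICY_TABLE[fault_class]  # KeyError for an unlisted fault class
    if repeat_count >= persist_limit:
        return "hold_off"
    return entry["action"]
```

Keeping the table as data rather than branching logic makes it easier to version and to diff between configuration releases.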

Figure: Reset/degrade/hold-off ladder with masks for ramp/DVFS/sleep. Condition gates (ramp mask, DVFS suppress, wake compensation) feed a three-step ladder: Level 1 Reset (pulse ≥ t_pw(min)), Level 2 Degrade (limit power, disable non-critical, read-only), Level 3 Hold-off (manual/remote unlock required). Audit fields: timestamp, latency, action_result, evidence_hash, config_version.

Integration Patterns

Minimal interlocks among lock-step MCU, PMIC safety state machine, and reset tree. When multiple sources fault simultaneously, choose the most conservative path.

Lock-Step MCU

Dual-token feeds only after each core completes critical tasks; out-of-phase feeds to avoid common-mode misses; any-side violation triggers policy.
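
The dual-token rule can be sketched as a gate that releases a feed only when both cores have reported completion. The class and core names are hypothetical, and out-of-phase scheduling of the two feeds is not modeled here.

```python
class DualTokenGate:
    """Allow a watchdog feed only after every core has completed its slice."""

    def __init__(self, cores=("core_a", "core_b")):
        self._done = {core: False for core in cores}

    def task_done(self, core):
        """Called by a core when its critical tasks have completed."""
        if core not in self._done:
            raise KeyError(f"unknown core: {core}")
        self._done[core] = True

    def try_feed(self):
        """Return True and consume the tokens if all cores reported; else False."""
        if all(self._done.values()):
            for core in self._done:
                self._done[core] = False
            return True
        return False
```

Because the tokens are consumed on each successful feed, a stalled core blocks the next feed even if the other core keeps running.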

PMIC Interlock

WDT FAIL → PMIC safety state (limit/hold-off). PMIC PG → enables WDT window to prevent early mis-trips. Reverse path elevates policy level when PMIC faults.

Reset Tree

Fan-out buffers and level compatibility; avoid back-powering and glitches; ensure RESET_SAFE pulse width/hold-time survives the tree.

Arbitration: when MCU and PMIC report faults together, select Hold-off > Reset > Degrade. Log reason codes and source IDs.
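
The arbitration rule can be sketched as a severity ranking over simultaneous requests. The tuple layout `(source_id, action, reason_code)` and the returned dictionary are assumptions for illustration.

```python
SEVERITY = {"degrade": 1, "reset": 2, "hold_off": 3}


def arbitrate(requests):
    """Pick the most conservative action and keep all sources for the log.

    requests: list of (source_id, action, reason_code) tuples.
    """
    if not requests:
        raise ValueError("no fault requests to arbitrate")
    winner = max(requests, key=lambda r: SEVERITY[r[1]])
    return {
        "action": winner[1],
        "sources": [r[0] for r in requests],
        "reasons": [r[2] for r in requests],
    }
```

For example, `arbitrate([("mcu", "reset", "LATE"), ("pmic", "hold_off", "UV")])` selects `hold_off` while still recording both source IDs and reason codes.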

Figure: Lock-step dual-feed, PMIC interlock, and reset-tree fan-out with level compatibility. Lock-step MCU: dual-token feed, out-of-phase feeds, any-side violation → action. WDT and PMIC: WDT FAIL → PMIC safety state; PMIC PG → enable WDT window. Reset tree: fan-out buffers, level compatibility, no back-power, no glitches, pulse/hold-time survives the tree. Arbitration: Hold-off > Reset > Degrade; log reason codes and source IDs for audits.

Selection Guide

Focus on differences & availability rather than long catalogs. Prioritize independent clock, window watchdog, and auditable diagnostics. Use the axes below as hard selection criteria; freeze decisions into BOM notes.

Clock independence: XTAL / independent RC / isolated divider; include clock-stall detect.

Window granularity & interface: T_min/T_max step, via OTP / pins / I²C / PMBus / SPI / one-wire.

Diagnostics: DIAG/TLOG/I²C readouts for fault_class, latency, config_version; exportable JSON parity with UI text.

Safety grade & temp: AEC-Q100 level, ASIL/SIL collateral (Safety Manual/FMEDA), −40…125/150 °C range.

RESET/FAULT polarity & pulse: selectable polarity, t_pw(min), re-arm, debounce; verify at reset tree endpoints.

IQ & standby: sleep keep-alive strategy, first-feed compensation after wake.

Package & pins: SOT-23/DFN/QFN pinouts; EN/SET/WDI position and power-up order.

Representative parts & why (engineering-oriented)

TI: TPS3430-Q1 — independent window WDT, programmable reset delay; good for tight window + external feed path.

TPS3813-Q1 — UV/OV supervise + WDT + reset; when power and WDT supervision must be combined.

ST: STWD100 — simple external WDI, low IQ; fits cost/area-constrained basic window/timeout monitoring.

NXP: FS26 Safety SBC — integrates challenge-response watchdog + multi-rail safety outputs; suits ASIL B–D partitioning.

MC33FS6526 — fail-safe outputs + challenger watchdog; tight PMIC + WDT coupling.

Renesas: RAA271000 safety PMIC — challenge-response WDT, reset generator, safety shut-off; pairs with high-compute SoC.

onsemi: NCV8668 — LDO + window WDT + reset, low IQ; compact supply+WDT merge.

NCV97400 — multi-output PMIC with WDT/monitoring for ADAS power trees.

Microchip: MCP1317 / MCP1320 (AEC-Q) — supervisor with WDI & configurable reset polarity; pair with MCU internal WDT for dual-channel.

Melexis: MLX81124 / MLX80051 (LIN SBIC) — does not include an independent window WDT; typical strategy is Melexis SBIC + external window WDT for dual-channel safety.

  • Polarity/pulse/order: RESET/FAULT polarity, t_pw(min), re-arm vary across brands; verify at reset-tree endpoints.
  • Interface semantics: SBC/PMIC families (NXP FS / Renesas RAA / onsemi NCV97xxx) differ in challenge-response and safety state machines.
  • Safety collateral: require Safety Manual/FMEDA and diagnostic coverage guidance.
  • Pin/footprint: SOT-23/DFN/QFN pin swaps on EN/SET/WDI can cause empty-feed or false reset at power-up.

Validation & Coverage Report

Acceptance by C_diag%. Pass criteria: Late/Missing → trigger RESET_SAFE ≥ t_pw within T_max + Δ_detect; Early/Too-often → set FAULT_LATCH within N violations; Clock-stall → detect within T_stall_max and enter safe state.
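
The pass criteria can be expressed as a per-record check. In this sketch the record keys `reset_pulse`, `violations_to_latch`, and `safe_state` are hypothetical, and detect latency is assumed to be measured from the last valid feed (or from the stall onset for clock-stall), which the text does not state explicitly.

```python
def meets_pass_criteria(record, limits):
    """Return True if one injection record satisfies its class's criterion."""
    kind = record["injection_type"]
    if kind in ("late", "missing"):
        in_time = record["detect_latency"] <= limits["t_max"] + limits["delta_detect"]
        wide_enough = record["reset_pulse"] >= limits["t_pw_min"]
        return in_time and wide_enough
    if kind in ("early", "too_often"):
        return record["violations_to_latch"] <= limits["n_violations"]
    if kind == "clock_stall":
        return (record["detect_latency"] <= limits["t_stall_max"]
                and record["safe_state"])
    raise ValueError(f"unknown injection type: {kind}")
```

Records that fail this check count as undetected (or incorrectly handled) injections when the coverage figure is computed.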

Laboratory

  • Window boundary sweep (T_min/T_max) and jitter/latency scanning (J/L/Δguard).
  • Feed storm / no-feed; period drift; clock skew/stall injection.
  • Temp/voltage drift; cold/hot start; power-ramp + DVFS + first-feed compensation.

In-vehicle / System

  • Supply disturbance (crank/load-dump emulation).
  • Task crash injections (early/late/miss/too-often).
  • Wake scenarios (RTC / PG-OR / WDT-IRQ) and mis-trip rate statistics.

Device-level cues (align with the parts above)

TI: TPS3430-Q1 / TPS3813-Q1 — verify window step/tolerance and reset delay; record program version.

ST: STWD100 — confirm timeout & reset pulse variation across temp; check AEC-Q100 grade.

NXP: FS26 / MC33FS6526 — run challenge-response + fail-safe linkage; log escalation paths.

Renesas: RAA271000 — check reset generator transparency through reset tree (polarity/hold-time).

onsemi: NCV8668 / NCV97400 — LDO+WDT bench (low IQ standby) and multi-rail monitor regression for ADAS trees.

Microchip: MCP1317 / MCP1320 — dual-channel with MCU internal WDT; verify WDI polarity/pulse.

Melexis: MLX81124 / MLX80051 — pair with external window WDT; align LIN/SBIC diagnostics with WDT logs.

Report fields (minimum): timestamp, injection_type, expected_action, detect_latency, action_result, evidence_hash/log_ptr, tester_id, config_version.

Figure: Auditable coverage report fields and traceability chain. Injection types (early, late, missing, too-often, clock-stall) map to expected actions with a detect_latency bar; report fields link to raw evidence through log_ptr / evidence_hash; the footer restates C_diag% = 1 − (undetected violations / total injections). The text view must match the JSON export; evidence must be hash-verifiable.

Small-Batch Procurement Hooks

Ship the first prototype without rework. Freeze watchdog math, reset timing, diagnostics export, and safety collateral before PO. Use the copy-paste BOM note and submit only the minimum required fields so sourcing can deliver second-source options within 48 hours.

BOM Notes (copy & paste)

Safety WDT: independent clock domain; window feed = T_task ± J (incl. ISR latency L); T_min/T_max frozen; RESET_SAFE ≥ X ms; DIAG log export required; AEC-Q100 Grade X; pin/polarity must match reset tree; provide Safety Manual & FIT.

Replace variables (X, T_min/T_max, J, L, Grade) with your values before you paste into PLM.

Task timing: T_task, jitter ±J, ISR/scheduler latency L

Safety target: ASIL/SIL level; operating temp range

Reset semantics: polarity, t_pw(min), re-arm requirement

Diagnostics preference: I²C / PMBus / SPI / DIAG

Second source: mandatory? acceptable package/pin swaps?

Second-source policy — Provide a pin/polarity/timing delta sheet; if polarity or power-up order differs, run reset-tree passthrough tests and log results. Include Safety Manual/FMEDA excerpt and one short-lead-time supply proof per candidate.

FAQs

Why must the watchdog clock be independent for ASIL/SIL?

Independence avoids common-cause failures tied to the main PLL or shared oscillators. If the system clock stalls, the watchdog must still advance, detect lateness, and trigger a safe action. Standards expect clear independence plus an audit trail: timestamps, configuration version, and results linked to evidence.

How to size T_min/T_max with jitter and ISR latency?

Start from the task period T_task and characterize jitter ±J and worst-case latency L. Pick a guard Δ_guard, then compute T_min = T_task − J − L − Δ_guard and T_max = T_task + J + L + Δ_guard. Validate with boundary sweeps and freeze values in OTP or registers with change control.

Early vs late feeds—reset or degrade?

Early or too-often feeds usually indicate logic disorder or feed spoofing, so prefer degrade or locked modes that restrict non-critical functions. Late or missing feeds often reflect task stalls or timebase faults, so prioritize a deterministic reset with guaranteed pulse width and audited action results.

How to run online self-tests without service hits?

Use low-duty rolling injections that avoid service windows, shifting test timing with workload. Escalate only on confirmed violations, falling back to degrade rather than immediate resets when continuity matters. Log detect latency, classification, and evidence hashes so audits can verify both coverage and impact boundaries.

What belongs to an audit-ready coverage report?

Include timestamp, injection type, expected action, detect latency, action result, and an evidence hash or log pointer. Add tester ID and configuration version to anchor repeatability. Summaries should present per-class detection ratios and boundary cases, with raw logs retained for traceable re-analysis when required.

How to avoid false trips during power ramp/DVFS/sleep?

Mask the window counter during ramps, suppress window checks through DVFS transitions, and delay the first post-sleep feed. Add threshold hysteresis and a restore delay so transient frequency or voltage shifts do not contaminate timing. Every mask event must be logged to preserve a complete and auditable record.

Lock-step MCUs: single or dual tokens?

Dual tokens are harder to spoof because each core must complete its critical slice before a feed is allowed. Out-of-phase feeding reduces common-mode misses. Single-token schemes require extra anti-spoof checks and arbitration rules to ensure a single compromised path cannot maintain a healthy watchdog illusion.

Can PMIC-integrated WDT meet independence?

It can, if its clock and safety paths are demonstrably isolated from the main timebase and regulators they supervise. Review the safety manual for independence claims and coupling analysis. Where doubt remains, add an external window watchdog to provide a second, separately powered timing domain.

Safe policy when the watchdog clock stalls?

Detect stall quickly using a reference or window comparator and promote to the most conservative policy. Prefer hold-off when the timebase trust is lost, with optional degrade for limited diagnosis. Redundant sources and stall thresholds should be validated across temperature and voltage corners with logs retained.

Map faults to PG/DIAG for fast RCA?

Use a unified codebook and define edge-versus-level semantics. Time-align PG, DIAG, and reset causes with a common timestamp base, and filter glitches at the reset-tree fan-out. Provide short, machine-readable records that link to raw logs so root-cause analysis is fast, repeatable, and reviewable.

Second-source swaps—pin/polarity/timing traps?

Reset polarity reversals, shorter t_pw(min), or different WDI sampling edges often break compatibility. Power-up order mismatches can also create false resets or empty feeds. Always run a reset-tree passthrough test at endpoints and update the BOM notes and validation plan before approving any substitution.

Minimal BOM notes for first-time pass?

State window parameters with jitter and latency, reset pulse width and polarity, diagnostics export capability, temperature grade, safety level, and second-source policy. Keep wording identical between the visible note and any JSON exports so audits, sourcing portals, and PLM workflows remain perfectly synchronized.