Windowed Watchdog ICs
What It Solves
Early Feed
Feed occurs before Tmin (Tserv < Tmin − Margin), often due to ISR misuse, task disorder, or API abuse—basic WDTs miss this.
Late Feed
Feed happens after Tmax (Tserv > Tmax + Margin) from blocking I/O or priority inversion; windowed WDT flags the slip deterministically.
Missing Feed
No feed inside the window; once past Tmax it’s a deadline failure. Common in sleep/wake mis-sequencing or paused clocks.
Timing Window
[Tmin, Tmax] is an allowed interval; the actual feed is Tserv. Add Margin/Guard band to absorb jitter and clock drift.
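The three failure classes and the window definition above can be sketched as a small classifier. This is a minimal model, not any vendor's logic: the names `FeedResult` and `classify_feed` are hypothetical, times are in milliseconds measured from the window start, and a missing feed is represented by `None`.

```python
from enum import Enum
from typing import Optional


class FeedResult(Enum):
    OK = "OK"
    EARLY = "EARLY"
    LATE = "LATE"
    MISSING = "MISSING"


def classify_feed(t_serv_ms: Optional[float], t_min_ms: float,
                  t_max_ms: float, margin_ms: float = 0.0) -> FeedResult:
    """Classify one watchdog window.

    t_serv_ms is the feed time measured from the window start,
    or None when no feed arrived before the deadline.
    """
    if t_serv_ms is None:
        return FeedResult.MISSING
    if t_serv_ms < t_min_ms - margin_ms:   # Tserv < Tmin - Margin
        return FeedResult.EARLY
    if t_serv_ms > t_max_ms + margin_ms:   # Tserv > Tmax + Margin
        return FeedResult.LATE
    return FeedResult.OK
```

For example, with a 15–26 ms window and a 1 ms margin, a feed at 14.5 ms is still accepted, while a feed at 10 ms is flagged as early.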
Latch-before-Reset
On violation, latch cause bits (FAULT/IRQ or register) before reset, preserving evidence for post-mortem and cloud analytics.
Added Value
Detects fake/early feeds, improves root-cause visibility, and reduces reset storms via policy staging (IRQ → power-limit → reset).
- Jitter/clock drift → misclassification without Margin/Guard band.
- No latch-before-reset → production failures become unreproducible.
- Sleep/wake first-window not widened/frozen → false Early/Missing.
How Windowed WDT Works
Core Variables
Tmin, Tmax (window limits), Tserv (actual feed), tw_rst (violation→reset delay), tLOCK (latch hold).
Margin & Guard band
Margin = M_jitter + M_clock + M_impl where M_jitter = k·σ (k=2–3), M_clock = ppm_total × μ, and M_impl covers synchronizers/deglitch/quantization.
Window Tuning
Tmin = max(μ − k1·σ − M, Tmin_hw_min)
Tmax = min(μ + k2·σ + M, Tmax_hw_max)
Then add a 5–15% guard band to both sides, and provide a wide-window profile for high temperature.
Rules & Options
- Early: Tserv < Tmin − Margin
- Late: Tserv > Tmax + Margin
- Missing: no feed within window → past Tmax
- Feed semantics: single/dual-edge, min pulse, optional “signature”.
Example (μ=20 ms, σ=1.2 ms, ppm_total≈200, M_impl=0.3 ms): M_jitter≈3.0 ms (k=2.5); M_clock is held at a 0.1 ms lower bound (the computed drift, 200 ppm × 20 ms, is only ≈0.004 ms); M≈3.4 ms. With a 10% guard band (~2 ms per side): Tmin≈15 ms, Tmax≈26 ms. High-temp: widen to ~14–28 ms.
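The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical. Two points are interpretations rather than statements from the text: the 0.1 ms clock term is treated as a floor on the computed drift, and the guard band is taken as a fraction of μ applied on each side. Because M already contains the k·σ jitter term, the sketch computes the window as μ ∓ (M + guard).

```python
def margin_ms(mu_ms, sigma_ms, ppm_total, m_impl_ms, k=2.5, m_clock_floor_ms=0.0):
    """Margin = M_jitter + M_clock + M_impl, all in milliseconds."""
    m_jitter = k * sigma_ms
    # Drift converted to time over one period, never below the floor.
    m_clock = max(mu_ms * ppm_total * 1e-6, m_clock_floor_ms)
    return m_jitter + m_clock + m_impl_ms


def window_ms(mu_ms, margin, guard_frac=0.10,
              hw_min_ms=0.0, hw_max_ms=float("inf")):
    """Return (Tmin, Tmax), clamped to the hardware range."""
    guard = guard_frac * mu_ms
    t_min = max(mu_ms - margin - guard, hw_min_ms)
    t_max = min(mu_ms + margin + guard, hw_max_ms)
    return t_min, t_max


m = margin_ms(20.0, 1.2, 200, 0.3, k=2.5, m_clock_floor_ms=0.1)
t_min, t_max = window_ms(20.0, m, guard_frac=0.10)
print(f"M={m:.1f} ms, Tmin={t_min:.1f} ms, Tmax={t_max:.1f} ms")
```

Under these assumptions the result is a margin of about 3.4 ms and a window of roughly 14.6 ms to 25.4 ms, in line with the rounded figures quoted in the example.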
Reset & IRQ Policies
Light: IRQ
First-time Early/Late or sparse Missing. Latch FAIL_CAUSE, increment counters, raise IRQ, then degrade (rate-limit, pause non-critical tasks).
Medium: Power-limit / Limp
Repeated violations within cooldown (≥K). Enter Power-limit/Limp, apply current/duty caps, start Tcool. Exit only after M healthy windows.
Severe: Reset
Continuous Missing or hard Late. Strict order: Latch → (optional) IRQ → Reset. Preserve RST_REASON=WDT_WINDOW for post-boot readout.
Interlocks & Priorities
- Chain: Latch → IRQ → Reset (never invert).
- Cooldown gates upgrades; count once to avoid chattering.
- PG/FAULT & PM linkage: only upgrade when PG_OR=true.
Traceability set: FAIL_CAUSE, RST_REASON, COUNTERS, COOLDOWN, WINDOW_STATS, PM_STATE. Latch must be readable before reset propagation.
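The staged policy described in this section can be sketched as a small state machine that is called once per window. The class name, the action strings, and the parameters `k_repeat`, `m_healthy`, `cooldown_windows`, and `missing_limit` are illustrative assumptions; the "hard Late" case and the PG_OR gate are left out to keep the sketch short.

```python
class WdtPolicy:
    """Staged response: IRQ first, then limp mode, then reset."""

    def __init__(self, k_repeat=3, m_healthy=5, cooldown_windows=10, missing_limit=2):
        self.k_repeat = k_repeat                  # violations within cooldown before limp
        self.m_healthy = m_healthy                # healthy windows needed to leave limp
        self.cooldown_windows = cooldown_windows
        self.missing_limit = missing_limit        # consecutive MISSING before reset
        self.state = "NORMAL"
        self.fail_cause = None
        self.recent = 0
        self.cooldown_left = 0
        self.healthy = 0
        self.missing_run = 0

    def on_window(self, result):
        """result is 'OK', 'EARLY', 'LATE' or 'MISSING'; returns ordered actions."""
        if result == "OK":
            self.missing_run = 0
            if self.cooldown_left > 0:
                self.cooldown_left -= 1
                if self.cooldown_left == 0:
                    self.recent = 0               # cooldown expired, forget old violations
            if self.state == "LIMP":
                self.healthy += 1
                if self.healthy >= self.m_healthy:
                    self.state, self.healthy = "NORMAL", 0
            return []
        self.fail_cause = result                  # latch the cause before anything else
        actions = ["LATCH", "IRQ"]
        self.healthy = 0
        self.recent += 1
        self.cooldown_left = self.cooldown_windows
        self.missing_run = self.missing_run + 1 if result == "MISSING" else 0
        if self.missing_run >= self.missing_limit:
            self.state = "RESET"
            actions.append("RESET")
        elif self.recent >= self.k_repeat and self.state != "LIMP":
            self.state = "LIMP"                   # emitted once on entry, no chattering
            actions.append("LIMP")
        return actions
```

Every violation path starts with `LATCH`, so the returned action list never places `IRQ` or `RESET` ahead of the latch.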
Sequencing & Power Edges
Cold-start
Hold WDT_EN=0 until CLK_STABLE & PG_OR. Use wide first window then revert to normal after the first OK cycle.
Sleep / Wake
Before sleep set WDT_FREEZE=1. On wake, clear marks, enable wide-first-window, and reset decision state to avoid stale violations.
PG-OR Aggregation
De-freeze windows only when PG_OR=true; on any rail drop (PG_OR↓) immediately freeze and discard that cycle’s decision.
Register Hints
WDT_EN, WDT_FREEZE, FIRST_WINDOW_MODE, PG_OR, CLK_STABLE, DECISION_STATE. Synchronize control lines across domains.
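The sequencing rules above (hold off at cold-start, freeze on sleep or rail drop, wide first window afterwards) can be modelled as follows. This is a behavioural sketch only: the class and its mode strings are hypothetical, and the attributes merely mirror the register hints WDT_EN, WDT_FREEZE, and FIRST_WINDOW_MODE.

```python
class WdtSequencer:
    """Tracks which window mode applies on each evaluation cycle."""

    def __init__(self):
        self.wdt_en = False        # mirrors WDT_EN
        self.freeze = False        # mirrors WDT_FREEZE
        self.first_window = True   # mirrors FIRST_WINDOW_MODE

    def update(self, clk_stable: bool, pg_or: bool, sleeping: bool) -> str:
        if not self.wdt_en:
            # Cold-start: stay disabled until clock and rails are both valid.
            if not (clk_stable and pg_or):
                return "DISABLED"
            self.wdt_en = True
            self.first_window = True
        if sleeping or not pg_or:
            # Freeze, and make sure the next live window is the wide one.
            self.freeze = True
            self.first_window = True
            return "FROZEN"
        self.freeze = False
        return "WIDE" if self.first_window else "NORMAL"

    def on_window_ok(self) -> None:
        """Call after the first successfully serviced window."""
        self.first_window = False
```

A decision produced in a cycle that returns `FROZEN` would be discarded by the caller, which matches the rule that a rail drop invalidates that cycle's result.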
Tuning & Jitter Immunity
Inputs to Measure
- Task period stats: μ, σ, Jpp
- RTOS tick quantization: Δtick = 0.5×Tick
- Clock drift to time: Δt_clock = μ×ppm×1e−6
- Implementation margin: M_impl
Margins & Window
M = k·σ + Δt_clock + Δtick + M_impl, k=2–3
Tmin = max( μ − k1·σ − M , Tmin_hw_min )
Tmax = min( μ + k2·σ + M , Tmax_hw_max )
Guard Band
Apply 5–15% guard band on both sides, or add G = α·σ (α=0.5–1) to absorb system-level jitter and tails.
Room/Hot Profiles
Room: ppm≈100–200, g=8–10%. Hot: ppm×2, σ×1.1–1.3, g=12–15%. Allow one-time switch to a wider window at temperature boundaries.
Worked Example
μ=20 ms, σ=1.2 ms, Tick=1 ms ⇒ Δtick=0.5 ms, ppm=300 ⇒ Δt_clock = 20 ms × 300×10⁻⁶ ≈ 0.006 ms (budgeted here at a conservative 0.1 ms), M_impl=0.3 ms.
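k=2.5 ⇒ M≈3.9 ms. Because M already includes the k·σ jitter term, the window before guard band is μ ∓ M: Tmin≈16.1 ms, Tmax≈23.9 ms. A 10% guard band (≈2 ms per side) widens this to Tmin≈14 ms, Tmax≈26 ms.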
Acceptance
- Jitter sweep 1 h: false-positive < 10⁻⁶/h
- Tick 0.5/1/2 ms: same acceptance
- −40/25/125 °C: zero first-window misfire
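Before committing bench time to a jitter sweep, a candidate window can be screened offline with a simple Monte Carlo run. The sketch below assumes feed times are Gaussian around μ, which real schedulers only approximate; the function name is hypothetical. One hour at a 20 ms period is 180,000 windows.

```python
import random


def jitter_sweep(mu_ms, sigma_ms, t_min_ms, t_max_ms, windows, seed=0):
    """Return the fraction of simulated windows that would be flagged."""
    rng = random.Random(seed)          # fixed seed keeps runs reproducible
    violations = 0
    for _ in range(windows):
        t_serv = rng.gauss(mu_ms, sigma_ms)
        if t_serv < t_min_ms or t_serv > t_max_ms:
            violations += 1
    return violations / windows


# One simulated hour of 20 ms windows against the 14–26 ms window.
rate = jitter_sweep(20.0, 1.2, 14.0, 26.0, windows=180_000)
print(f"simulated false-positive fraction: {rate:.2e}")
```

A single simulated hour cannot demonstrate a rate as low as 10⁻⁶/h; it only catches windows that are clearly too tight. Heavy-tailed jitter from interrupts or bus contention still needs a measurement on hardware.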
Diagnostics & Telemetry
Minimal Observable Set
- FAIL_CAUSE: EARLY/LATE/MISSING
- RST_REASON: WDT_WINDOW/BOD/POR/EXT…
- COUNTERS: per-cause and total
- TIMESTAMPS: last violation/reset
- SNAPSHOT: minimal pre-reset context (≤16B)
Register Map (I²C/PMBus)
0x00 STAT (CAUSE/REASON)
0x02 CNT_E/L/M (16-bit)
0x08 TS_LAST_VIO (32-bit)
0x0C SNAP0..3 (≤16B)
0x10 CTRL (clear/freeze/first-window/cooldown)
Pins & Paths
FAULT mirrors CAUSE≠0; IRQ is maskable/clearable. Pre-reset latch; post-boot read via I²C/PMBus; optional NVM ring buffer.
Cloud Schema (Minimal)
Device ID, FW ver, wdt object (cause, reason, counters, ts, snapshot), window profile (room/hot), and environment (temp, VDD).
Clear & Integrity
On boot: read STAT/CNT/TS; if RST_REASON=WDT_WINDOW, log CAUSE. Clear via CTRL.CLEAR. Counters are monotonic; snapshot may carry CRC/MAC.
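The boot-time readout can be sketched as a decoder over a raw dump of registers 0x00 to 0x0B. The map above gives only offsets and widths, so the encoding here is an assumption made for illustration: STAT is treated as one byte with the reason in the high nibble and the cause in the low nibble, the numeric codes are invented, and multi-byte fields are taken as little-endian.

```python
import struct

# Hypothetical code assignments; a real device defines its own.
CAUSES = {0: "NONE", 1: "EARLY", 2: "LATE", 3: "MISSING"}
REASONS = {0: "POR", 1: "BOD", 2: "EXT", 3: "WDT_WINDOW"}


def decode_boot_record(regs: bytes) -> dict:
    """Decode STAT (0x00), CNT_E/L/M (0x02) and TS_LAST_VIO (0x08)."""
    stat = regs[0x00]
    cnt_e, cnt_l, cnt_m = struct.unpack_from("<HHH", regs, 0x02)
    (ts_last,) = struct.unpack_from("<I", regs, 0x08)
    reason = REASONS.get(stat >> 4, "UNKNOWN")
    record = {
        "reason": reason,
        "counters": {"early": cnt_e, "late": cnt_l, "missing": cnt_m},
        "ts_last_violation": ts_last,
    }
    # The cause is only meaningful when the window watchdog caused the reset.
    if reason == "WDT_WINDOW":
        record["cause"] = CAUSES.get(stat & 0x0F, "UNKNOWN")
    return record
```

The returned dictionary is shaped so it can be placed under the `wdt` object of the cloud schema; the CTRL.CLEAR write would follow only after the record has been stored.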
Acceptance
- Injected E/L/M ×1000: evidence chain = 100%
- A/B brand: identical semantics (mapping allowed)
- Cloud vs local counters: no loss/dup
Design-in Checklist
Electrical / Schematic
RESET integrity, WDI quality, timing & EMC.
- RESET type (push-pull / open-drain), min pulse ≥ MCU spec; short, clean route.
- External pull-ups sized for noise/RC; add clamp/RC if long trace or cross-board.
- WDI deglitch: meet min high/low pulse; use Schmitt or RC on noisy domains.
- Sequence: enable window only after PG_OR & CLK_STABLE.
- Local decoupling (0.1 µF + 1 µF) near VDD; controlled return path.
Firmware / Interface
Single API, signature feed, low-power policy.
- Single wdt_feed() entry; optional feed signature (sequence/double-hit).
- Convert μ/σ/ppm → Tmin/Tmax; apply margin & guard band (5–15%).
- Freeze on sleep; widen first window on wake; clear stale decision state.
- DFU/Factory mode: temporary bypass or wide-timeout; privilege-gated.
- IRQ→Limp→Reset chain with cooldown to prevent reset storms.
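The "single entry plus signature" rule from the list above can be illustrated with a two-step feed: the hardware kick happens only when the expected tokens arrive in order. The token values 0x55 and 0xAA, the class name, and the `kick` callback are assumptions for this sketch; real devices define their own sequence, if they support one at all.

```python
class SignatureFeeder:
    """Single feed entry point that requires a fixed token sequence."""

    SEQUENCE = (0x55, 0xAA)   # hypothetical double-hit signature

    def __init__(self, kick):
        self._kick = kick     # callable that performs the real hardware feed
        self._idx = 0

    def wdt_feed(self, token: int) -> bool:
        """Return True only when this call completed the signature."""
        if token != self.SEQUENCE[self._idx]:
            self._idx = 0     # wrong token: start over, no kick
            return False
        self._idx += 1
        if self._idx == len(self.SEQUENCE):
            self._idx = 0
            self._kick()
            return True
        return False
```

A stray call with the wrong token, or the right tokens in the wrong order, never reaches the hardware, which is the property that makes accidental feeds less likely. Both calls still have to land inside the valid window.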
Production / Testability
Scripts, logging, cooldown & persistence.
- Run boundary scripts (E/L/M) and print FAIL_CAUSE, RST_REASON.
- Cooldown timer verified: repeated violations elevate once, not chatter.
- Pre-reset snapshot (task/sp/status) persisted to NVM or gateway.
- Room/Hot profiles verified across −40/25/125 °C with jitter sweep.
- Field log schema: counters, last timestamps, window profile, temp/VDD.
Cross-Brand Alternatives & Migration
Semantic Dimensions
Feed semantics (edge/pulse/signature), Tmin/Tmax ranges & steps, tolerance (±%), RESET polarity & driver, latch-before-reset support, readout path, AEC-Q100 grade.
Migration Tips
From timeout-only → windowed: start wide, enforce latch-first. From SBC → discrete: re-check RESET pulse/logic; re-verify Early/Late/Missing boundaries.
Texas Instruments
TPS3430-Q1 — Discrete window WDT; resistor-set period/window; AEC-Q100.
Why pick: clean latch-before-reset flow; easy to add as safety redundancy.
TPS3852 — Supervisor + WDI; programmable delays & thresholds.
Why pick: power threshold + window in one, fewer BOM lines.
STMicroelectronics
L99PM62GXP / L99PM72GXP — Automotive SBCs with configurable window WDT (SPI).
Why pick: start-up wide window → windowed run; solid for body/door ECUs.
NXP
UJA1169 / UJA1069 — LIN/CAN SBCs with window/timeout modes; predividers.
Why pick: flexible modes; easy migration from legacy SBCs.
FS6500 — Safety SBC (ASIL-D capable), watchdog + fail-safe power.
Why pick: high safety domains; Grade 0/1 options.
Renesas
RL78 WWDG (MCU) — Native window watchdog peripheral with early-warning.
Why pick: cost-optimized; keep window control in MCU.
RAA271000 — Automotive PMIC; pairs well with MCU WWDG for layered reset.
Why pick: power & reset tree coherence, SPI programmability.
onsemi
NCV7450 / NCV7471 / NCV7462 — LIN/CAN SBCs with window WDT semantics (e.g., first-service → window mode).
Why pick: well-documented window phase; body domain staple.
NCV8768C — LDO + window WDT + reset in one small device.
Why pick: distributed nodes; BOM & area saver.
Microchip
ATA663232 / ATA6632xx / ATA6612C — LIN SBC families with window watchdog capability.
Why pick: low-cost LIN nodes, integrated diagnostics.
Melexis
MLX80051 — LIN SBC with configurable window WDT and wake features.
Why pick: actuator-side compactness; automotive qualified.
| Brand | Series/PN | Window Semantics | RESET | Readout | AEC-Q100 | Why Pick |
|---|---|---|---|---|---|---|
| TI | TPS3430-Q1 / TPS3852 | Discrete window; resistor-set / supervisor+WDI | PP/OD, latch-first | FAULT/IRQ + I²C | Yes | Redundant safety path / fewer BOM lines |
| ST | L99PM62GXP / 72GXP | Windowed WDT (SPI) | Configurable pulse | SPI regs + pins | Yes | Body/door ECUs, start-up wide window |
| NXP | UJA1169 / UJA1069; FS6500 | Window/timeout modes | Polarity options | SPI/I²C + pins | Yes | Flexible legacy migration; safety-capable |
| Renesas | RL78 WWDG; RAA271000 | MCU native window | MCU reset tree | Regs + pins | MCU-dep. | Cost-optimized layering |
| onsemi | NCV7450/7471/7462; NCV8768C | First-service→window; LDO+WDT | Configurable | SPI + pins | Yes | Body domain staple; area saver |
| Microchip | ATA663232/6632xx/6612C | Window watchdog (LIN SBC) | Configurable | Regs + pins | Yes | Low-cost LIN nodes |
| Melexis | MLX80051 | Configurable window | LIN SBC reset | Regs + pins | Yes | Compact actuator-side |
BOM Hooks & A/B Validation
BOM remark (copy & fill placeholders)
Keep one line per requirement; remove the placeholders you don’t use.
Windowed WDT; AEC-Q100; VDD=__V; Iq ≤ __ µA; Tmin=__ ms; Tmax=__ ms; Guard=__%; RESET=OD/PP, Pol=__; Latch-before-Reset=Y; Fault Readout=Pin/I²C; Pkg=__; Primary=[Brand A PN], Secondary=[Brand B PN]; Samples=__ pcs; A/B boundary test required.
Seq-1 Early
Tserv = Tmin − ε for 3 consecutive feeds → EXPECT: FAIL_CAUSE=EARLY; counter+1; policy (IRQ/Limp/Reset) activated.
Seq-2 Late
Tserv = Tmax + ε for 3 times → EXPECT: LATE; counter increments; policy chain runs.
Seq-3 Missing
No feed within window → EXPECT: MISSING; latch-first then reset per policy.
Seq-4 Jitter Sweep
Apply ±Jpp Gaussian perturbation to feed time; false-positive < threshold across 1 h run.
Seq-5 Sleep/Wake
Freeze on sleep; first window on wake is widened and state reset → EXPECT: 0 false events.
Seq-6 Cold-start
Delay window enable until PG_OR & CLK stable; verify first-window behavior and RESET pulse width.
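Seq-1 to Seq-3 lend themselves to a table-driven runner that can be pointed at either the primary or the secondary device. In the sketch below the device under test is abstracted as a callable that takes a feed time (or `None` for no feed) and returns the reported cause; that interface, `reference_model`, and `run_boundary_scripts` are all hypothetical stand-ins for a real bench adapter.

```python
def reference_model(t_min_ms, t_max_ms):
    """Ideal windowed watchdog, used here as a stand-in device under test."""
    def dut(t_serv_ms):
        if t_serv_ms is None:
            return "MISSING"
        if t_serv_ms < t_min_ms:
            return "EARLY"
        if t_serv_ms > t_max_ms:
            return "LATE"
        return "OK"
    return dut


def run_boundary_scripts(dut, t_min_ms, t_max_ms, eps_ms=0.1, repeats=3):
    """Run Seq-1..3 against dut and report pass/fail plus cause counts."""
    scripts = {
        "EARLY": [t_min_ms - eps_ms] * repeats,    # Seq-1: Tmin - eps
        "LATE": [t_max_ms + eps_ms] * repeats,     # Seq-2: Tmax + eps
        "MISSING": [None],                         # Seq-3: no feed
    }
    report = {}
    for expected, feeds in scripts.items():
        observed = [dut(t) for t in feeds]
        report[expected] = {
            "passed": all(cause == expected for cause in observed),
            "count": observed.count(expected),
        }
    return report


print(run_boundary_scripts(reference_model(15.0, 26.0), 15.0, 26.0))
```

Running the same function against both A and B adapters yields two reports with identical keys, which makes differences in early-feed handling easy to spot.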
Worked Example
Observed Conditions
- Task period μ=20 ms; jitter Jpp=±2 ms
- Δclock=±200 ppm @25 °C (≈±0.004 ms)
- High-temp doubles ppm/jitter envelope
Tuning (Room/Hot)
Room: Tmin≈15 ms, Tmax≈27 ms, Guard=10%
Hot: Tmin≈14 ms, Tmax≈29 ms
Firmware Rules
- wdt_feed() only in scheduler heartbeat
- Freeze window before sleep; widen first window on wake
- Latch-before-Reset; post-boot read & log CAUSE/REASON
Frequently Asked Questions
What does a windowed watchdog prevent that a basic watchdog does not?
A windowed watchdog detects three failure classes—early feeds, late feeds, and missing feeds—whereas a basic watchdog only notices timeouts. It also supports latch-before-reset, creating a traceable evidence chain for post-mortem. This combination reduces false resets, surfaces scheduler anomalies and fake feeds, and enables tiered policies instead of blunt resets.
How do I choose the window width?
Measure the task period mean μ, deviation σ, and peak-to-peak jitter Jpp; convert clock drift (ppm) and temperature effects into time error Δt. Set Tmin = μ − k₁σ − Δt and Tmax = μ + k₂σ + Δt with k≈2–3, then add a 5–15% guard band. Start slightly wide for bring-up, log field data, and iteratively tighten.
Why is “latch-before-reset” mandatory?
Latching cause bits, counters, timestamps, and a small pre-reset snapshot before asserting RESET preserves the root-cause evidence. It satisfies production traceability, enables cloud visualization, and prevents post-reset blindness. Ensure tLOCK exceeds reset propagation and logging paths so evidence survives brownouts and quick reboots.
How should I trade off IRQ versus hard reset?
Use a tiered policy: light violations trigger IRQ, logging, and graceful degradation; repeated or hazardous cases enter limp/power-limit with a cooldown; only severe or persistent faults escalate to reset—after latching evidence. A cooldown timer prevents reset storms and protects availability.
How do I avoid false triggers at cold-start, sleep, and wake?
Delay enabling the window until rails are valid (PG-OR) and the clock is stable. Freeze the window during sleep; on wake, clear stale state and widen the first window. Align WDT_EN with power sequencing so early edges and oscillator settling do not create spurious early/late decisions.
How do I production-test boundary feeding?
Run three boundary scripts—Early at Tmin−ε, Late at Tmax+ε, and Missing with no feed—and a jitter sweep using ±Jpp Gaussian noise. Each test should update cause bits, counters, and timestamps. Export a unified log so A/B device comparisons are reproducible and audit-friendly.
Should the feed API use a signature or double-edge scheme?
In higher safety domains, a signature (sequence/double-hit) reduces accidental feeds and misuse. Balance it against added latency and CPU time; ensure the signature fits inside the valid window and aligns with edge semantics. Keep a single public wdt_feed() entry for auditable control.
When should I tighten or relax the window?
Start wide to stabilize, then tighten as field data shows low false positives. Relax Tmax under light-load jitter or high-temperature drift; tighten Tmin for critical timing integrity. Consider safety level, nuisance-reset cost, and aging trends. Support dual profiles (Room/Hot) and OTA updates.
How should I persist counters and timestamps?
Use an NVM ring buffer with monotonic counters and compact timestamps. Capture cause bits and a minimal pre-reset snapshot (task ID, stack pointer, last feed result). For connected devices, batch upload to a signed cloud schema. Ensure graceful handling during power loss and verify integrity on boot.
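A ring buffer with a monotonic sequence number and a per-record CRC might look like the sketch below. It models NVM as a `bytearray`, and the record layout (sequence, timestamp, cause, CRC-32) and the class name are assumptions chosen for illustration; wear levelling, page erase, and write atomicity on real flash are not covered.

```python
import struct
import zlib

REC = struct.Struct("<IIB")            # seq, timestamp, cause code
SLOT = REC.size + 4                    # record body plus CRC-32
ERASED = b"\xff" * SLOT


class NvmRing:
    """Fixed-size ring of CRC-protected violation records."""

    def __init__(self, slots=8):
        self.slots = slots
        self.mem = bytearray(ERASED * slots)   # stands in for erased flash
        self.seq = 0

    def append(self, ts: int, cause: int) -> None:
        self.seq += 1                          # monotonic counter
        body = REC.pack(self.seq, ts, cause)
        off = ((self.seq - 1) % self.slots) * SLOT
        self.mem[off:off + SLOT] = body + struct.pack("<I", zlib.crc32(body))

    def recover(self):
        """Return valid (seq, ts, cause) tuples, oldest first."""
        valid = []
        for i in range(self.slots):
            raw = bytes(self.mem[i * SLOT:(i + 1) * SLOT])
            if raw == ERASED:
                continue                       # never written
            body = raw[:REC.size]
            (crc,) = struct.unpack("<I", raw[REC.size:])
            if zlib.crc32(body) == crc:        # drop torn or corrupted slots
                valid.append(REC.unpack(body))
        valid.sort()
        if valid:
            self.seq = valid[-1][0]            # resume numbering after reboot
        return valid
```

On boot, `recover()` discards any slot whose CRC does not match, so a record interrupted by power loss is skipped rather than misread, and the highest surviving sequence number restores the counter.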
What cross-brand semantic gaps cause migration pitfalls?
Vendors differ in feed semantics (edge vs pulse vs signature), Tmin/Tmax range and step granularity, tolerance, reset polarity/driver, and whether latch-before-reset is hardware-guaranteed. Mitigate with a mapping table and A/B boundary scripts; re-verify first-window behavior, pulse width, and readout paths.
Why are late feeds more likely at light load or high temperature?
Light loads can increase scheduler jitter relative to the period, while high temperature amplifies clock drift and execution latency. Combined, these push Tserv toward the late boundary. Countermeasures include relaxing Tmax, improving task priority, stabilizing clocks, and adopting hotter profile parameters.
What’s a robust second-source strategy for automotive projects?
Compare semantics, window step sizes, reset attributes, and readout paths first. Qualify both suppliers with identical Early/Late/Missing boundary scripts and jitter sweeps, preserving a common evidence chain. Run parallel samples, align documentation (PPAP), and encode primary/secondary plus guard parameters directly in the BOM remarks.