Reset Cause & Event Logger
What This Page Solves
We solve the full loop from symptom to root cause with a practical, small-batch-friendly reset logger: standardize reset semantics across brands (BOR/POR/WDT/EXT/SOFT/THERM/PG_FAIL/LOCKUP), tag each event with either a reliable timestamp or a monotonic seq_id plus Δt ticks, and guarantee “write-before-collapse” so the last event survives brownouts. This removes “intermittent reboot” guesswork, eliminates cause ambiguity (single flag vs concurrent bits), controls duplicate entries from button bounce or slow ramps with a programmable coalescing window, and enables consistent factory checklists (commit success rate ≥99% at rated load/holdup), field diagnosis (minimal export in time/seq order), and RMA analytics (CSV/JSON parity with the on-device schema), while budgeting NVM wear with log-structured writes (FRAM preferred; small-page EEPROM acceptable with double-buffer commit flags). We also map PG/FAULT to IRQ latency targets and define BOM notes so cross-brand migrations don’t break your diagnostic toolchain.
Reset Causes: Taxonomy & Semantics
Use a versioned bitmap/enum so concurrent sources are preserved and summarized predictably: BIT0 POR, BIT1 BOR, BIT2 WDT, BIT3 EXT, BIT4 SOFT, BIT5 THERM, BIT6 PG_FAIL, BIT7 LOCKUP, BIT8..11 rail-specific, BIT12..14 vendor-mapped, BIT15 custom. Define a short concurrency window (e.g., 10–20 ms) that sets all arriving bits in one event while the summary view orders priority as LOCKUP > THERM > BOR > PG_FAIL > WDT > EXT > SOFT > POR. Specify exposure semantics per device family (pins: OD/PP, polarity, built-in debounce; registers: latch/clear on read or write-1-to-clear, sticky across VBAT domains) and guard against false positives on power ramps. Pair timestamps with a monotonic seq_id so logs remain comparable when RTC is absent or backup power is depleted. Ensure SOFT never masks higher-priority hardware causes, and treat PG_FAIL→BOR sequences as one event with both bits set while preserving arrival order using Δt or subseq. Keep schema_ver in exports so field tools stay stable as brands and PMICs change.
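As a rough host-side illustration of the bitmap and summary priority described above, the sketch below models BIT0..BIT7 with Python's `enum.IntFlag`. The helper names `summarize` and `decode` are hypothetical and not part of any vendor API; the rail-specific, vendor-mapped, and custom bits (BIT8..15) are left out for brevity.

```python
from enum import IntFlag


class Cause(IntFlag):
    """Standard cause bits BIT0..BIT7 as listed in the text."""
    POR = 1 << 0
    BOR = 1 << 1
    WDT = 1 << 2
    EXT = 1 << 3
    SOFT = 1 << 4
    THERM = 1 << 5
    PG_FAIL = 1 << 6
    LOCKUP = 1 << 7


# Summary priority from the text, highest first.
PRIORITY = [Cause.LOCKUP, Cause.THERM, Cause.BOR, Cause.PG_FAIL,
            Cause.WDT, Cause.EXT, Cause.SOFT, Cause.POR]


def summarize(cause_bits: int) -> str:
    """Return the name of the highest-priority cause set in cause_bits."""
    for cause in PRIORITY:
        if cause_bits & cause:
            return cause.name
    return "UNKNOWN"


def decode(cause_bits: int) -> list[str]:
    """List every standard cause that is set, in priority order."""
    return [c.name for c in PRIORITY if cause_bits & c]


# SOFT never masks a hardware cause in the summary view.
print(summarize(Cause.SOFT | Cause.BOR))
# A PG_FAIL -> BOR sequence keeps both bits in the detailed view.
print(decode(Cause.PG_FAIL | Cause.BOR))
```

The summary view reports a single name, while `decode` keeps every concurrent bit visible, which is the split between "summary" and "preserved" that the taxonomy calls for.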
Event Record Schema (Logger Minimal Set)
Use a fixed, small event frame that survives brownouts and remains parsable across brands. The fields are: schema_ver for compatibility; cause_bits (BOR/POR/WDT/EXT/SOFT/THERM/PG_FAIL/LOCKUP with concurrency window) as the primary meaning; seq_id as the monotonic anchor; ts holding absolute time or Δt ticks, with ts_src to declare provenance; vbat_min to capture the lowest rail voltage during the event; pf_flag for power-fail/fast-path markers; a compact aux nibble/byte for rail_id or zone; and crc8/16 to validate integrity. Write a pre-mark header on the PG_FAIL IRQ, then finalize via double-buffer + commit_flag in a log-structured region so incomplete frames are skipped on read. Target 16–20 B/frame (≤24 B with digest), little-endian fields, and JSON/CSV exports that mirror the exact on-device names.
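One possible byte layout for such a frame is sketched below with Python's `struct` module, as a host-side parser might see it. The field widths, the field order, and the CRC variant (CRC-CCITT via `binascii.crc_hqx`, seeded with 0xFFFF) are assumptions chosen to land at 18 bytes; the text fixes only the field names, the size target, and little-endian encoding. The commit flag is assumed to live outside the frame.

```python
import binascii
import struct

# Assumed layout (little-endian, no padding):
# schema_ver u8, ts_src u8, cause_bits u16, seq_id u32, ts u32,
# vbat_min u16 (mV), pf_flag u8, aux u8  -> 16 bytes, then crc16 u16.
BODY = struct.Struct("<BBHIIHBB")
FIELDS = ("schema_ver", "ts_src", "cause_bits", "seq_id",
          "ts", "vbat_min", "pf_flag", "aux")
CRC_SEED = 0xFFFF  # assumed seed


def pack_frame(**fields) -> bytes:
    """Serialize the named fields and append a CRC16 over the body."""
    body = BODY.pack(*(fields[name] for name in FIELDS))
    return body + struct.pack("<H", binascii.crc_hqx(body, CRC_SEED))


def unpack_frame(raw: bytes):
    """Return a dict of fields, or None for a short or torn frame."""
    if len(raw) != BODY.size + 2:
        return None
    body = raw[:BODY.size]
    (crc,) = struct.unpack("<H", raw[BODY.size:])
    if binascii.crc_hqx(body, CRC_SEED) != crc:
        return None  # reader skips frames that fail the CRC
    return dict(zip(FIELDS, BODY.unpack(body)))


frame = pack_frame(schema_ver=1, ts_src=0, cause_bits=0x42, seq_id=7,
                   ts=1234, vbat_min=2950, pf_flag=1, aux=0)
print(len(frame), unpack_frame(frame))
```

Because the dict keys are the on-device field names, the same dict can be passed straight to a JSON or CSV writer to keep export parity.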
Timebase & Monotonic Counters
Prefer absolute RTC time when backup power exists; otherwise record Δt ticks. In both cases, anchor ordering with a monotonic seq_id that never rolls back, and declare provenance via ts_src. When the RTC is absent or uncertain, set ts to the Δt since the last event and expose an optional boot_count. Coalesce near-simultaneous causes with a small concurrency window. Note the RTC ppm drift over temperature so exports can be calibrated. The UI sorts by seq_id and shows Δt and RTC time side by side.
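A viewer that follows this rule might look roughly like the sketch below: it orders frames by seq_id only and reports RTC time and Δt in separate columns depending on ts_src. The ts_src encoding (0 = RTC, 1 = Δt ticks) and the 1 kHz tick rate are assumptions made for illustration.

```python
TS_SRC_RTC = 0    # assumed encoding: ts is absolute RTC seconds
TS_SRC_DELTA = 1  # assumed encoding: ts is ticks since the last event


def build_timeline(frames, tick_hz=1000):
    """Sort by seq_id and split ts into 'rtc' and 'dt_s' columns."""
    rows = []
    for f in sorted(frames, key=lambda f: f["seq_id"]):
        if f["ts_src"] == TS_SRC_RTC:
            rows.append({"seq_id": f["seq_id"], "rtc": f["ts"], "dt_s": None})
        else:
            rows.append({"seq_id": f["seq_id"], "rtc": None,
                         "dt_s": f["ts"] / tick_hz})
    return rows


frames = [
    {"seq_id": 2, "ts_src": TS_SRC_DELTA, "ts": 500},
    {"seq_id": 1, "ts_src": TS_SRC_RTC, "ts": 1700000000},
]
for row in build_timeline(frames):
    print(row)
```

Ordering never consults ts, so a depleted backup cell or a swapped RTC changes only what the human-readable columns show.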
Write-Before-Collapse: How to Not Lose the Last Event
Guarantee the last reset event is safely stored between PG_FAIL and rail collapse by using a priority write path and holdup budgeting. Enter an ISR/NMI within tens of microseconds, pre-mark a minimal frame {cause_bits, seq_id, ts_src}, then fill ts/vbat_min/pf_flag/aux and commit via double-buffer + commit_flag. Size holdup with C·ΔV/I for a constant-current load (or with the energy form ½·C·(V_hi²−V_min²)/P for a constant-power load) so the rail stays above V_min for at least t_commit, where t_commit ≥ t_ISR + t_pre + t_prog + margin. Prefer FRAM for µs-class writes, or align EEPROM page writes and reserve a P0 slot for ultra-short commits. Use a log-structured region with block rotation for wear, CRC8/16 to reject torn frames, and an optional digest8 for tamper hints. Enforce ≥99% commit success at rated load across −40 to +85 °C, with JSON/CSV fields mirroring on-device names for factory, field, and RMA tooling.
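The holdup budget can be checked with a few lines of arithmetic. The sketch below evaluates both forms from the text (C·ΔV/I and ½·C·(V_hi²−V_min²)/P) and compares them with the commit budget. Every number in the example (470 µF, 3.3 V to 2.7 V, 50 mA, a 3 ms program time) is an illustrative assumption, not a recommendation for any particular part.

```python
def holdup_const_current(c_farads, v_hi, v_min, i_amps):
    """Holdup time in seconds for a constant-current load: C*dV/I."""
    return c_farads * (v_hi - v_min) / i_amps


def holdup_const_power(c_farads, v_hi, v_min, p_watts):
    """Holdup time in seconds for a constant-power load."""
    return 0.5 * c_farads * (v_hi ** 2 - v_min ** 2) / p_watts


def commit_budget(t_isr, t_pre, t_prog, margin):
    """Minimum t_commit: ISR entry + pre-mark + NVM program + margin."""
    return t_isr + t_pre + t_prog + margin


def min_capacitance(t_required, v_hi, v_min, i_amps):
    """Smallest C (farads) that holds the rail for t_required seconds."""
    return t_required * i_amps / (v_hi - v_min)


# Illustrative values only.
t_hold = holdup_const_current(470e-6, 3.3, 2.7, 0.050)
t_need = commit_budget(t_isr=20e-6, t_pre=50e-6, t_prog=3e-3, margin=1e-3)
print(f"holdup {t_hold * 1e3:.2f} ms, budget {t_need * 1e3:.2f} ms, "
      f"ok={t_hold >= t_need}")
print(f"min C {min_capacitance(t_need, 3.3, 2.7, 0.050) * 1e6:.0f} uF")
```

Running the same check at worst-case load current and minimum capacitance (tolerance, temperature, aging) is what ties the calculation to the ≥99% commit target.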
Debounce, De-dup & Coalescing
Stabilize noisy edges and concurrent sources into one readable stream. Debounce EXT and mechanical keys (10–30 ms) with long-press detect (≥1 s). Apply enter/exit thresholds and hysteresis to PG/UV to prevent ping-pong on slow ramps. Require a minimum dwell for thermal trips (≥100 ms), with optional dT/dt gating. De-dup identical causes within a 50–100 ms suppression window by incrementing a repeat counter bound to the same frame. Use a 10–20 ms concurrency window to merge multi-source arrivals into one event with multiple cause_bits while preserving order via Δt/subseq. Prefer seq_id for ordering when ts_src≠RTC. Verify with button-bounce injection, programmable-supply slow sweeps, and concurrent WDT+PG_FAIL tests across temperature so summary views retain low noise yet never hide lower-priority bits.
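The interaction between the concurrency window and the suppression window can be modeled offline before committing to firmware. The sketch below is one interpretation: both windows are measured from the first arrival in the frame, a repeated cause bumps a `repeat` counter, and a new cause inside the concurrency window is OR-ed into `cause_bits` with its Δt recorded. The function name and the frame dict layout are hypothetical.

```python
def coalesce(events, concurrency_ms=20, suppress_ms=100):
    """Merge (t_ms, cause_bit) arrivals, sorted by time, into frames."""
    frames = []
    for t_ms, bit in events:
        last = frames[-1] if frames else None
        age = t_ms - last["t0_ms"] if last else None
        if last and (last["cause_bits"] & bit) and age <= suppress_ms:
            # Same cause again inside the suppression window: count it.
            last["repeat"] += 1
        elif last and age <= concurrency_ms:
            # New cause inside the concurrency window: merge, keep order.
            last["cause_bits"] |= bit
            last["arrivals"].append((age, bit))
        else:
            frames.append({"t0_ms": t_ms, "cause_bits": bit,
                           "repeat": 0, "arrivals": [(0, bit)]})
    return frames


PG_FAIL, BOR, EXT = 0x40, 0x02, 0x08
stream = [(0, PG_FAIL), (5, BOR), (30, BOR), (300, EXT), (312, EXT)]
for frame in coalesce(stream):
    print(frame)
```

In this trace the PG_FAIL then BOR sequence becomes one frame with both bits set and arrival order preserved, while the bouncing EXT edge becomes a second frame with a repeat count of one.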
Diagnostics Flow for Factory & Field
Build a repeatable loop from Factory bring-up → Prototype validation → Field swap/reset policy → RMA analysis using reset-event logs. In factory, snapshot cause_bits before clear and stamp a factory anchor (seq_id=0); inject EXT/PG edges to verify debounce, hysteresis, and same-second coalescing. In prototypes, combine WDT/THERM/PG_FAIL to measure P0→P1→commit coverage ≥99% and capture vbat_min/bus-error rates. In the field, prefer SOFT reset for minor faults, escalate to limited-current/derate after ≥3 repeats, and always export the log before replacing Supervisors/PMIC/RTC. For RMA, sort by seq_id, align Δt/RTC, bucket same-second arrivals, and triage root causes (threshold drift, pre-thermal signatures by dT/dt, slow supply ramps).
Recommended parts: TI TPS3850-Q1, TPS3890-Q1, BQ32002; ST STWD100, STM706/708, M41T82; NXP FS6523/FS6500, PCF8523; Renesas ISL88014, RAA271000, ISL12022M; onsemi NCP301/NCP302, NCP308; Micro Crystal RV-3028-C7 (supply note); Microchip MCP1316/18/19, MIC841/842, MCP79410; Melexis MLX81116/81113.
Procurement Hooks (Small-Batch Ready)
Unify vendor semantics to accelerate small-batch swaps: map each PN to {vendor,family,part,aec_grade,cause_bitmap_reg,clear(W1C/RC),wdt_type,reset_out(type/pol/delay),pg_mask,i2c_addr,ts_capability}; annotate BOM lines with threshold/delay and ALT choices; keep at least two cross-brand alternates per function so 48-hour validation remains on schedule.
BOM REMARK (COPY & ADAPT)
Reset/WDG : BRAND/PN
Vth / Tol : VTH_VOLTS (±TOL_PCT %)
Delay : DELAY_MS ms
Output : OUTPUT_TYPE (OD|PP), POLARITY
AEC Grade : AEC_GRADE
Alternates : BRAND1/PN1 ; BRAND2/PN2
Holdup Cap : C_HOLDUP_mF (t_commit ≥ T_COMMIT_ms)
Logger Schema : SCHEMA_VER
Cause Map : { BOR:BIT , POR:BIT , WDT:BIT , EXT:BIT , THERM:BIT , PG_FAIL:BIT , LOCKUP:BIT }
Clear Method : CLEAR_MODE (RC | W1C)
I²C Address : 0xADDR
Supply Notes : CUT-TAPE OK ; PARTIAL-REEL OK ; TRACEABILITY=ON
Security & Tamper-Evidence (Minimal)
Use a log frame that is verifiable after power loss without heavy crypto. Provide cheap integrity, anti-replay, and export parity so factory/field/RMA tools trust the data across brands and RTC states.
- Cheap integrity: crc8/16 + digest8 (LFSR over device_id ⊕ seq_id ⊕ cause_bits ⊕ ts) to flag edits/bit flips.
- Anti-replay: accept only strictly increasing (boot_count, seq_id); store boot_count in a domain that can’t roll back.
- Write-before-collapse: pre-mark then commit; torn frames are skipped by CRC; exports mirror on-device names.
- Low-cost signing (optional): append 4–8-byte HMAC tag computed offboard during factory provisioning; devices verify tag with a shared secret if available.
- Privacy: avoid PII; keep only cause/timing/voltage flags; mask vendor-unique test registers in exports.
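The digest8 in the first bullet could be realized along the lines below: XOR the four fields into one 32-bit word and clock it bit by bit through an 8-bit Galois LFSR. The tap mask 0xB8, the seed 0xA5, and the LSB-first bit order are assumptions; the text does not specify them, and the device and the import tool simply need to agree.

```python
def digest8(device_id, seq_id, cause_bits, ts, taps=0xB8, seed=0xA5):
    """8-bit LFSR digest over device_id ^ seq_id ^ cause_bits ^ ts.

    taps and seed are assumed values; this is a tamper hint, not a MAC.
    """
    word = (device_id ^ seq_id ^ cause_bits ^ ts) & 0xFFFFFFFF
    state = seed
    for i in range(32):
        in_bit = (word >> i) & 1       # feed the word LSB first
        feedback = (state & 1) ^ in_bit
        state >>= 1
        if feedback:
            state ^= taps
    return state


good = digest8(0x1234ABCD, seq_id=7, cause_bits=0x42, ts=1000)
edited = digest8(0x1234ABCD, seq_id=7, cause_bits=0x43, ts=1000)
print(hex(good), hex(edited), good != edited)
```

Because the fields are XOR-folded before the LFSR, two coordinated edits that flip the same bit position in different fields cancel out, which is one reason this stays a cheap hint and the optional HMAC bullet exists.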
FAQs
How do I prove a log wasn’t edited after power loss?
Each frame carries crc8/16 plus a lightweight digest8 derived from device_id, seq_id, cause_bits, and ts. Tools recompute both and reject frames that fail either check. Because seq_id is monotonic, any removal or re-order becomes detectable during import and timeline reconstruction.
What prevents replaying an old, “good” event sequence?
Import logic enforces strict growth of the tuple (boot_count, seq_id). If either value does not increase, the frame is dropped and a replay is recorded. Optional offboard HMAC on batch exports adds another barrier if logs are moved between devices or sites.
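A sketch of such an import filter is shown below. It reads the rule as: boot_count must never go backward and seq_id must strictly advance relative to the last accepted frame. That reading, and the dict-based frame representation, are assumptions for illustration.

```python
def import_frames(frames):
    """Split frames into (accepted, replays) in arrival order."""
    accepted, replays = [], []
    last_boot, last_seq = None, None
    for f in frames:
        fresh = last_seq is None or (
            f["boot_count"] >= last_boot and f["seq_id"] > last_seq
        )
        if fresh:
            accepted.append(f)
            last_boot, last_seq = f["boot_count"], f["seq_id"]
        else:
            replays.append(f)  # dropped and recorded as a replay
    return accepted, replays


log = [
    {"boot_count": 4, "seq_id": 10},
    {"boot_count": 5, "seq_id": 11},
    {"boot_count": 4, "seq_id": 10},  # old "good" frame replayed
    {"boot_count": 5, "seq_id": 12},
]
ok, bad = import_frames(log)
print(len(ok), "accepted,", len(bad), "replayed")
```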
Can I keep authenticity without adding a crypto chip?
Yes. Use CRC + digest for cheap integrity and add an export-time HMAC over the file with a backend key. Devices verify structure; your server signs the bundle before it leaves the factory. Field tools verify the HMAC, giving authenticity without per-device secure elements.
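The export-time HMAC needs nothing beyond Python's standard `hmac` and `hashlib` modules on the backend. The sketch below signs the exported bytes with HMAC-SHA-256 and verifies them with a constant-time comparison; the choice of SHA-256, the function names, and the placeholder key are assumptions, and key storage and distribution are out of scope here.

```python
import hashlib
import hmac


def sign_export(payload: bytes, key: bytes) -> bytes:
    """Return an HMAC-SHA-256 tag over the exported file bytes."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_export(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Constant-time check that the tag matches the payload."""
    return hmac.compare_digest(sign_export(payload, key), tag)


backend_key = b"\x00" * 32  # placeholder; a real key stays on the server
export = b'[{"seq_id": 7, "cause_bits": 66}]'
tag = sign_export(export, backend_key)

print(verify_export(export, tag, backend_key))
print(verify_export(export.replace(b"66", b"64"), tag, backend_key))
```

Since HMAC is symmetric, any field tool that verifies the tag also holds a key capable of producing one, so the verification key deserves the same care as the signing key.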
How do I stop logs from going backward after brownouts?
Store boot_count in a domain that increments early in boot and never decrements. Pair it with seq_id that always increases before commit. If a commit tears, CRC fails and the reader skips it, preserving a strictly forward timeline across resets and brownouts.
Do I need real timestamps if RTC is unreliable?
No. Use ts_src to declare provenance. If RTC is absent, set ts to Δt ticks and always anchor order with seq_id. Exporters display both Δt and any available wall time, so factory, field, and RMA views stay comparable without trusting absolute time.
What’s the smallest frame that still catches tampering?
Keep 16–20 bytes: {schema_ver,cause_bits,seq_id,ts,ts_src,vbat_min,pf_flag,aux,crc8,digest8}. CRC catches tears; digest flags edits. It’s not cryptography, but it’s enough to make silent modifications detectable and to keep wear within budget on EEPROM/FRAM.
How do I safely clear cause bits during production?
Snapshot to the logger first, then clear with the vendor’s method (W1C or read-clear). The CRC makes any later change to the logged frame detectable. Tools record the factory anchor (seq_id=0) so later frames are always greater, preventing resets from erasing audit history.
What if someone swaps the RTC module to spoof dates?
Ordering never relies solely on RTC. Importers sort primarily by seq_id, with RTC/Δt shown for human context. When RTC provenance changes, ts_src indicates the switch, and digest/CRC still protect integrity even if human-readable dates are misleading.
How are multi-source, same-second resets handled?
Use a 10–20 ms concurrency window. All causes within the window set multiple cause_bits in a single frame, preserving arrival hints with Δt/subsequence if present. This avoids duplicate entries while preventing low-priority causes from masking higher-priority ones.
Can logs be validated offline after a total power loss?
Yes. The combination of CRC, digest, and strictly increasing (boot_count,seq_id) permits full offline validation. Readers skip torn frames, reject replays, and rebuild a monotonic sequence without needing connectivity, secrets, or wall power on the device.
How do I keep privacy while exporting diagnostics?
Export only event semantics and timing (cause_bits, seq_id, ts/Δt, vbat_min, flags). Strip manufacturing IDs unless traceability is contractually required. If needed, hash device identifiers in exports while keeping raw values on secure factory servers.
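An export filter along these lines might keep an allow-list of fields and replace the raw device identifier with a keyed hash. In the sketch below, the allow-list, the `device_ref` column name, the 16-hex-character truncation, and the use of HMAC-SHA-256 as the hash are all assumptions; the raw identifier and the key are presumed to stay on the factory server.

```python
import hashlib
import hmac

# Assumed allow-list: event semantics and timing only.
EXPORT_FIELDS = ("cause_bits", "seq_id", "ts", "ts_src", "vbat_min", "pf_flag")


def redact(frame: dict, device_id: bytes, site_key: bytes) -> dict:
    """Keep allow-listed fields and add a pseudonymous device reference."""
    row = {name: frame[name] for name in EXPORT_FIELDS if name in frame}
    row["device_ref"] = hmac.new(site_key, device_id,
                                 hashlib.sha256).hexdigest()[:16]
    return row


raw = {"seq_id": 7, "cause_bits": 0x42, "ts": 1234, "ts_src": 1,
       "vbat_min": 2950, "pf_flag": 1, "serial": "SN-0001",
       "vendor_test_reg": 0xBEEF}
print(redact(raw, device_id=b"SN-0001", site_key=b"factory-secret"))
```

A keyed hash is used instead of a bare hash so that short, guessable serial numbers cannot be recovered by brute force from the export alone.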
When should I move from digest to real cryptography?
Adopt HMAC or signature when adversaries are motivated and have physical access, or when regulatory evidence is required. Start with digest+CRC for cost and wear, then add per-batch HMAC at export. For highest assurance, add a secure element and per-frame MAC keys.