Modern buck/boost rails should measure and remember how they live: voltage/current/temperature snapshots, fault black-box logs, and lifetime/stress counters. This page explains what to log, how to wire/scale it, how to store/stream it, and how to act on it (derating, service flags, RMA triage).
Introduction & Scope
Telemetry measures and remembers how rails live (V/I/T, events, counters). Protection makes fast threshold actions (OC/OV/OT). Diagnostics turns logs into service decisions. The outcome is fewer RMAs, faster isolation, and healthier buck/boost systems.
Field Returns Triage
Pre/post snapshots restore context so engineers can isolate the root cause instead of closing the return as “no fault found”.
Warranty Thresholds
Objective counters (A·s, °C·h, fault hours) support fair, data-backed warranty decisions.
Adaptive Derating
Policies act on health state (warn → derate → conservative mode) to prevent repeat failures.
- % faults with pre-event context (target ≥ 95%)
- % RMAs with actionable root cause (↑ by 30–50%)
- Time-to-isolation for suspected faults (< 30 min)
Telemetry Signals & Accuracy Targets
Start with V/I/T plus PG. Add mode, duty and fSW where available. Hit the DC targets: V ±1%, I ±3%, T ±2 °C. Use min/max windows and hardware latches to avoid missing fast edges.
| Signal | Typical Range | Interface | Accuracy Target | Min/Max? | Notes |
|---|---|---|---|---|---|
| Vout | 0.6–24 V (rail) | Divider → ADC / PMBus | ±1% dc | Yes | RC + clamp; ref decoupling; alias control |
| Iout | mA–10s A | Shunt/DCR + CSA | ±3% dc | Yes | Kelvin sense; drift/offset budget; bi-directional? |
| Temperature | −40–125 °C | NTC / IC sensor / estimate | ±2 °C typ. | Yes | Placement vs thermal lag; airflow bias |
| PG / RESET | 0/1 | GPIO / PMBus bit | N/A | Edge | Don’t load PG node; separate from comparator path |
Voltage
Set divider for mid-range ADC code at nominal; add RC and OV clamp. Budget resistor tolerance, ADC ENOB/INL and reference drift.
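As a rough illustration of this sizing rule, the sketch below picks a divider ratio that places the nominal rail at mid-scale and adds up a first-order worst-case error budget. The 3.3 V reference, 12-bit ADC and tolerance figures in the usage lines are assumed values chosen for illustration, and the function names are hypothetical.

```python
def divider_ratio(v_nom, v_ref, target_fraction=0.5):
    """Ratio Vadc/Vrail that puts the nominal rail at target_fraction of full scale."""
    return target_fraction * v_ref / v_nom

def nominal_code(v_nom, ratio, v_ref, bits):
    """ADC code expected at the nominal rail voltage."""
    return round(v_nom * ratio / v_ref * (2**bits - 1))

def worst_case_error_pct(r_tol_pct, ratio, ref_tol_pct, adc_err_pct):
    """First-order worst case: both resistors off in opposite directions,
    plus reference tolerance and ADC error, all summed linearly."""
    divider_err = 2.0 * r_tol_pct * (1.0 - ratio)
    return divider_err + ref_tol_pct + adc_err_pct

# Assumed example: 5 V rail, 3.3 V reference, 12-bit ADC, 0.1% resistors
ratio = divider_ratio(5.0, 3.3)
print(ratio, nominal_code(5.0, ratio, 3.3, 12))
print(worst_case_error_pct(0.1, ratio, 0.2, 0.1))
```

A linear sum is pessimistic; a root-sum-square budget is a common alternative when the error sources are independent.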
Current
Shunt for accuracy; DCR for low loss at high current with temperature comp. Kelvin pick-off; short symmetric routes into CSA.
Temperature
NTC (cheap, needs curve), digital sensor (linear/I²C), or on-die estimate. Place near hottest copper; account for lag to junction.
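For the NTC option, "needs curve" usually means a lookup table or a model such as the Beta equation. The sketch below assumes the NTC sits on the low side of a pull-up divider and uses a 10 kΩ, Beta = 3950 part; both the wiring and the part values are assumptions, and a real design would take them from the thermistor datasheet.

```python
import math

T25_K = 298.15  # 25 °C expressed in kelvin

def ntc_resistance(code, bits, r_pullup_ohm):
    """NTC on the low side of a pull-up divider: Vadc/Vref = Rntc / (Rpu + Rntc).
    The code must be below full scale, otherwise the division fails."""
    x = code / (2**bits - 1)
    return r_pullup_ohm * x / (1.0 - x)

def ntc_temp_c(r_ohm, r25_ohm=10_000.0, beta=3950.0):
    """Beta model: 1/T = 1/T25 + ln(R/R25)/beta, with T in kelvin."""
    inv_t = 1.0 / T25_K + math.log(r_ohm / r25_ohm) / beta
    return 1.0 / inv_t - 273.15

r = ntc_resistance(1365, 12, 10_000.0)   # one third of full scale
print(round(r), round(ntc_temp_c(r), 1))
```

The Beta model is only accurate over a limited span; a table or Steinhart-Hart fit is the usual choice when the full −40 to 125 °C range matters.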
Accuracy Targets
- Voltage: ±1% (dc)
- Current: ±3% (dc)
- Temperature: ±2 °C typical
Noise vs Bandwidth
| Bandwidth | V-RMS (norm.) | I-RMS (norm.) |
|---|---|---|
| 10 Hz | 1.0 | 1.0 |
| 100 Hz | 3.2 | 3.1 |
| 1 kHz | 10.0 | 9.8 |
Numbers are normalized for illustration; shape shows why decimation/EWMA matter.
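The shape of the table is what a flat (white) noise density would predict: RMS noise grows with the square root of bandwidth. A minimal check, assuming white noise and a 10 Hz reference bandwidth:

```python
import math

def relative_rms(bw_hz, bw_ref_hz=10.0):
    """RMS noise relative to the reference bandwidth, assuming flat noise density."""
    return math.sqrt(bw_hz / bw_ref_hz)

for bw in (10, 100, 1000):
    print(f"{bw:>5} Hz -> {relative_rms(bw):.1f}x")
```

Each decade of bandwidth removed by filtering or decimation cuts the noise by roughly a factor of 3.2 under this assumption.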
Common Pitfalls
- Shared ground impedance causing load-dependent sense error
- Aliasing of switching ripple into ADC; fix with RC and sync/decimation
- Poor reference decoupling or long divider legs driving drift/noise
Sensing Hardware Topologies
Voltage uses a divider with RC and optional high-Z buffer plus OV clamp. Current uses a low-ohm shunt with differential amplifier, or inductor DCR with temperature compensation; support bidirectional flow and range switching. Temperature uses NTC networks or digital sensors, and in some ICs an on-die estimator from RDS(on). Keep protection comparators on a separate fast path and do not load PG nodes.
V-sense (Divider + RC)
- Divider ratio → nominal ADC code at 50–70% FS
- RC anti-alias; optional op-amp buffer (high-Z)
- OV clamp to protect ADC/reference
I-sense (Shunt / DCR)
- Shunt + CSA (gain/offset/drift); Kelvin pick-off
- DCR sense for low loss; add temp comp / calibration
- Bidirectional CSA; range switch via PGAs / parallel shunts
T-sense (NTC / Digital / On-die)
- NTC ladder (needs curve/table) or digital I²C/SPI sensor
- On-die estimator from RDS(on)/OT flags
- Placement vs thermal lag; avoid airflow bias
Protection coexistence: keep the fast comparator path (OV/OC/OT) separate from the slow ADC telemetry; do not hang loads on PG/FAULT nodes; share only a synchronized reference/ground.
Sampling, Filtering & Compression
Use a dual-rate model: 1–10 Hz health logs plus fast min/max latching by comparators in the switching domain. Apply MA/EWMA filtering and decimation; track droop during load steps. Compress by storing deltas and run-lengths; quantize to 12–14 bits; always keep both a wall-clock timestamp and a monotonic tick.
Rates
- Health log: 1–10 Hz periodic records
- Fast facts: hardware min/max or comparator edges
- Sample >10× target health bandwidth; anti-alias RC
Filters
- Moving average vs EWMA (α = 1 − exp(−Ts/τ), with sample period Ts and time constant τ)
- Low-pass then decimate by N; align to controller sync
- Droop tracking baseline vs min-window during load steps
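The filter and decimation steps above can be sketched as follows. This is a minimal illustration that assumes a fixed sample period and plain Python lists; a firmware version would run the same recurrence in fixed point.

```python
import math

def ewma_alpha(ts_s, tau_s):
    """Smoothing factor for sample period ts_s and time constant tau_s."""
    return 1.0 - math.exp(-ts_s / tau_s)

def ewma(samples, alpha):
    """Exponentially weighted moving average, seeded with the first sample."""
    out = []
    if not samples:
        return out
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def decimate(samples, n):
    """Keep every n-th sample; apply only after low-pass filtering."""
    return samples[n - 1::n]

alpha = ewma_alpha(ts_s=0.001, tau_s=0.1)   # assumed: 1 kHz sampling, 100 ms time constant
smoothed = ewma([5.0, 5.0, 4.6, 5.0, 5.0, 5.0], alpha)
print(decimate(smoothed, 3))
```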
Compression
- Delta + run-length for quasi-steady segments
- Quantize to 12–14 bits, unit-consistent fields
- Keep wall-clock (RTC) and monotonic tick for gaps
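Delta plus run-length encoding works well here because a quasi-steady rail produces long runs of zero deltas. The sketch below operates on integer ADC codes; the tuple layout is an assumption for illustration, not a defined wire format.

```python
def delta_rle_encode(codes):
    """Return [(delta, run_length), ...] for a list of integer ADC codes."""
    runs = []
    prev = 0
    for c in codes:
        d = c - prev
        prev = c
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((d, 1))               # start a new run
    return runs

def delta_rle_decode(runs):
    """Inverse of delta_rle_encode."""
    out, value = [], 0
    for d, n in runs:
        for _ in range(n):
            value += d
            out.append(value)
    return out

codes = [2048, 2048, 2048, 2048, 2050, 2050]
runs = delta_rle_encode(codes)
print(runs)
```

The first tuple carries the absolute starting code, so a decoder needs no side channel for the baseline.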
Footprint Estimate
| Record | Bytes | Rate | Per-day |
|---|---|---|---|
| Health (V/I/T + PG) | 24 B | 5 Hz | ~10.4 MB (432,000 records) |
| Event snapshot | 96–256 B | on event | depends |
| After delta/RLE | ×0.2–0.5 | 5 Hz | ~2.1–5.2 MB |
Event Model & Black-Box Schema
Define a precise event record: what fired (type, cause bits), when it happened (monotonic tick + RTC), and context (pre/post buffers, min/max). Default pre-trigger ≈ 200 ms and post-trigger ≈ 800 ms; tune per product class.
Voltage
UV/OV; brown-in/out; PG drop
Current & Thermal
OC (limit/retry), OT (derate/latch)
Control State
Foldback trigger, hiccup start/stop, PFM↔CCM swap
{
"ts_ms": 1730102456123,
"tick": 987654321,
"event_type": "UV",
"cause_bits": {"uv":1,"pg_drop":1,"mode_swap":0,"hiccup":0},
"pre_ms": 200, "post_ms": 800,
"stats": {"v_min": 4.58, "v_max": 5.12, "i_max": 3.4, "t_max_c": 84.2},
"counters": {"retries": 2, "duration_ms": 120, "sequence": 41},
"flags": {"crc_ok": true, "synced": true, "drift_ppm": 12},
"blob_ref": "rle:...",
"fw_rev": "1.3.2"
}
- Missed-event rate < 0.1%
- Timestamp skew < 2 ms vs external sync
- Window min/max error < 2% FS
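One way to capture the pre/post context is a ring buffer that is frozen on trigger and then topped up with post-trigger samples. The sketch below counts in samples rather than milliseconds, so at an assumed 1 kHz capture rate the defaults would be pre=200 and post=800. The class and field names are hypothetical.

```python
from collections import deque

class SnapshotBuffer:
    """Keeps the last `pre` samples and, after a trigger, collects `post` more."""

    def __init__(self, pre, post):
        if post < 1:
            raise ValueError("post must be at least 1 sample")
        self.history = deque(maxlen=pre)
        self.post = post
        self.pending = None  # snapshot currently collecting post-trigger samples

    def push(self, sample, trigger=False):
        """Feed one sample; returns a completed snapshot dict, else None.
        Triggers that arrive while a snapshot is pending are ignored here."""
        done = None
        if self.pending is not None:
            self.pending["post"].append(sample)
            if len(self.pending["post"]) == self.post:
                window = (self.pending["pre"] + [self.pending["trigger"]]
                          + self.pending["post"])
                self.pending["v_min"] = min(window)
                self.pending["v_max"] = max(window)
                done, self.pending = self.pending, None
        elif trigger:
            self.pending = {"pre": list(self.history), "trigger": sample, "post": []}
        self.history.append(sample)
        return done

buf = SnapshotBuffer(pre=2, post=2)
for v in [5.0, 5.1, 5.0, 4.5, 4.7, 5.0]:
    snap = buf.push(v, trigger=(v < 4.6))   # assumed UV threshold of 4.6 V
print(snap)
```

Software min/max over the window only sees what the ADC sampled; hardware latches remain necessary for edges faster than the capture rate.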
Lifetime Counters & Health Indices
Track cumulative exposure and stress, then map to a stable SOH score with hysteresis. Use bins (not lifetime promises) for a defensible health view.
Cumulative Counters
- Qabs = ∑|I|·Δt (A·s / A·h)
- Θ = ∑(T − Tref)·Δt (°C·s / °C·h)
- On-time & Fault-hours
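The counters above are simple running sums. The sketch below shows one possible update step; clamping the thermal term at zero is an assumption (so that time spent below Tref does not cancel accumulated stress), as are the 25 °C reference and the class name.

```python
class LifetimeCounters:
    def __init__(self, t_ref_c=25.0):
        self.t_ref_c = t_ref_c
        self.q_as = 0.0        # cumulative |I|*dt in A*s
        self.theta_c_s = 0.0   # cumulative (T - Tref)*dt in degC*s, clamped at zero
        self.on_time_s = 0.0
        self.fault_s = 0.0

    def update(self, i_a, t_c, dt_s, fault=False):
        """Accumulate one sample interval of dt_s seconds."""
        self.q_as += abs(i_a) * dt_s
        self.theta_c_s += max(t_c - self.t_ref_c, 0.0) * dt_s
        self.on_time_s += dt_s
        if fault:
            self.fault_s += dt_s

c = LifetimeCounters()
c.update(i_a=-2.0, t_c=65.0, dt_s=10.0)
c.update(i_a=1.0, t_c=20.0, dt_s=10.0, fault=True)
print(c.q_as, c.theta_c_s, c.on_time_s, c.fault_s)
```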
Stress Bins
- Ripple-current: <10%, 10–20%, >20% (Iripple/Iavg)
- Thermal: <60 °C, 60–85 °C, 85–105 °C, >105 °C
- Arrhenius-like buckets; no overclaiming
SOH Score
- Weights: ripple 0.35, thermal 0.45, fault 0.20
- Traffic-light: Green ≥ 80, Amber 60–79, Red < 60
- Hysteresis: ±3 points to avoid flapping
| Metric | Bin 1 | Bin 2 | Bin 3 | Bin 4 |
|---|---|---|---|---|
| Ripple (%) | <10 | 10–20 | >20 | — |
| Thermal (°C) | <60 | 60–85 | 85–105 | >105 |
{
"counters": {"Q_as": 2.3e6, "Theta_c_s": 7.1e7, "on_time_h": 912.0, "fault_h": 2.4},
"bins": {
"ripple": {"lt10": 42.1, "b10_20": 31.4, "gt20": 26.5},
"thermal": {"lt60": 55.0, "b60_85": 28.0, "b85_105": 14.0, "gt105": 3.0}
},
"weights": {"ripple": 0.35, "thermal": 0.45, "fault": 0.20},
"soh": {"score": 82, "band": "Green", "hysteresis": {"up": 3, "down": 3}}
}
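The weighting and traffic-light banding can be sketched as below. How each 0–100 sub-score is derived from the bin percentages is not specified here, so the sub-scores are treated as inputs; the hysteresis rule (a band change requires crossing the threshold by 3 points) is one plausible reading of "±3 points".

```python
WEIGHTS = {"ripple": 0.35, "thermal": 0.45, "fault": 0.20}

def soh_score(sub_scores, weights=WEIGHTS):
    """Weighted sum of 0-100 sub-scores (sub-score derivation is an assumption)."""
    return sum(weights[k] * sub_scores[k] for k in weights)

def next_band(score, band, hyst=3.0):
    """Green >= 80, Amber 60-79, Red < 60, with hysteresis around both thresholds."""
    if band == "Green":
        if score < 60 - hyst:
            return "Red"
        return "Amber" if score < 80 - hyst else "Green"
    if band == "Amber":
        if score >= 80 + hyst:
            return "Green"
        return "Red" if score < 60 - hyst else "Amber"
    # band == "Red"
    if score >= 80 + hyst:
        return "Green"
    return "Amber" if score >= 60 + hyst else "Red"

score = soh_score({"ripple": 80, "thermal": 85, "fault": 78})
print(round(score, 2), next_band(score, "Amber"))
```

With these rules a unit hovering around 79–81 keeps its current band instead of flapping between Green and Amber.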
Automated Policies & Reactions
Policies escalate from soft (warn → derate → reduce fSW → raise dead-time → conservative mode) to hard (latch-off or hiccup with retry/cooldown). A human loop raises a service flag when SOH < θ or black-box faults exceed N within M hours.
Soft actions
- Warn (log + throttle hint)
- Derate current limit
- Reduce switching frequency
- Raise dead-time
- Enter conservative mode
Hard actions
- Latch-off vs hiccup
- Retry counts with cooldown timers
- Persist cause bits + sequence ID
Human loop
- Raise service flag when SOH < θ
- Or when events ≥ N in M hours
- Attach recent black-box snapshot
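The "events ≥ N in M hours" rule is a sliding-window count. A minimal sketch, assuming event times come from a monotonic clock in seconds and that θ, N and M are configuration values (the numbers in the usage lines are placeholders):

```python
from collections import deque

class ServiceFlag:
    def __init__(self, n_events, window_h, soh_threshold):
        self.n_events = n_events
        self.window_s = window_h * 3600.0
        self.soh_threshold = soh_threshold
        self.times = deque()   # monotonic timestamps of recent black-box events

    def on_event(self, t_s):
        self.times.append(t_s)

    def should_flag(self, now_s, soh):
        """True when SOH is below threshold or N events fall inside the window."""
        while self.times and now_s - self.times[0] > self.window_s:
            self.times.popleft()   # drop events older than the window
        return soh < self.soh_threshold or len(self.times) >= self.n_events

flag = ServiceFlag(n_events=3, window_h=1.0, soh_threshold=60)
for t in (0.0, 100.0, 300.0):
    flag.on_event(t)
print(flag.should_flag(now_s=300.0, soh=82))
```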
| Condition | Policy step | Action | Exit rule | Debounce / Hysteresis | Log fields |
|---|---|---|---|---|---|
| OC_WARN (≥90% Ilim for >10 ms) | DERATE_20 | −20% current limit | Clear when <70% for 5 s | Enter: 10 ms, Exit: 5 s | policy_id, step, reason, ts_ms |
| OT_WARN (T > Tth−5 °C) | REDUCE_FSW | −25% fSW | T < Tth−10 °C | Temp avg over 1 s EWMA | T_max_c, fsw_new |
| UV_LATCH (UV event > 50 ms) | LATCH_OFF | Disable gate; wait cooldown | Manual or timed retry | Cooldown 30–120 s | cause_bits, retries.used/limit |
| Repeated faults (≥N in M h) | SERVICE_FLAG | Notify host; pin flag | Ack from host | Min interval 10 min | rma_hint, blackbox_ref |
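The OC_WARN row can be read as a two-state machine with separate enter and exit timers. The sketch below is one possible reading: both thresholds are taken relative to the nominal (non-derated) limit, which is an assumption, and the class name is hypothetical.

```python
class OcDerate:
    ENTER_FRAC, ENTER_S = 0.90, 0.010  # enter: >= 90% of Ilim for more than 10 ms
    EXIT_FRAC, EXIT_S = 0.70, 5.0      # exit: below 70% of Ilim for 5 s
    DERATE = 0.80                      # DERATE_20: current limit reduced by 20%

    def __init__(self, i_lim_a):
        self.i_lim_a = i_lim_a
        self.derated = False
        self._timer_s = 0.0

    @property
    def active_limit_a(self):
        return self.i_lim_a * (self.DERATE if self.derated else 1.0)

    def step(self, i_a, dt_s):
        """Advance by dt_s seconds with measured current i_a; returns derated state."""
        frac = abs(i_a) / self.i_lim_a
        if not self.derated:
            # Any sample below the enter threshold restarts the debounce timer
            self._timer_s = self._timer_s + dt_s if frac >= self.ENTER_FRAC else 0.0
            if self._timer_s > self.ENTER_S:
                self.derated, self._timer_s = True, 0.0
        else:
            self._timer_s = self._timer_s + dt_s if frac < self.EXIT_FRAC else 0.0
            if self._timer_s >= self.EXIT_S:
                self.derated, self._timer_s = False, 0.0
        return self.derated

policy = OcDerate(i_lim_a=10.0)
for _ in range(6):
    policy.step(i_a=9.5, dt_s=0.002)
print(policy.derated, policy.active_limit_a)
```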
{"policy_id":"POL-DRV-01","step":"DERATE_20","reason":"OC_WARN",
"ts_ms":1730102456000,"cooldown_s":30,"retries":{"used":1,"limit":3}}
Interfaces & Export (PMBus / SMBus / CSV / CBOR)
Provide live telemetry over PMBus/SMBus and compact exports for audits. Support a custom I²C command for min/max since last read. For dumps, use newline-delimited CSV (NDCSV) or a compact CBOR/MessagePack stream. Protect records with a CRC for integrity and sign them with an HMAC to avoid tamper disputes.
Live (PMBus/SMBus)
- Standard pages for V/I/T registers
- GET_MINMAX_SINCE_READ (auto-clear)
- GET_COUNTERS (Q_abs, Θ, fault-hours)
Export (CSV / CBOR)
- NDCSV: append-friendly, human-auditable
- CBOR/MessagePack: burst dumps, small size
- Field names/units: vout_V, iout_A, temp_C, ts_ms, tick, event_type
Security
- Per-packet crc32 integrity
- Optional HMAC-SHA256(device_id, fw_rev)
- Anti-replay: timestamp + counter
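A minimal sketch of the three mechanisms using only the Python standard library. The record layout, the 8-byte counter prefix, and the way the key is provisioned are assumptions; here device_id and fw_rev are simply part of the signed payload.

```python
import hashlib
import hmac
import zlib

def sign_record(payload: bytes, key: bytes, counter: int) -> dict:
    """Attach a CRC32 (transport integrity) and an HMAC-SHA256 (authenticity)."""
    msg = counter.to_bytes(8, "big") + payload   # counter is covered by the HMAC
    return {
        "counter": counter,
        "crc32": zlib.crc32(payload) & 0xFFFFFFFF,
        "hmac": hmac.new(key, msg, hashlib.sha256).hexdigest(),
    }

def verify_record(payload: bytes, key: bytes, record: dict, last_counter: int) -> bool:
    if zlib.crc32(payload) & 0xFFFFFFFF != record["crc32"]:
        return False                              # corrupted in transit
    if record["counter"] <= last_counter:
        return False                              # replayed or out of order
    msg = record["counter"].to_bytes(8, "big") + payload
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["hmac"])

key = b"demo-key-not-for-production"              # placeholder key
payload = b"device_id=42,fw_rev=1.3.2,ts_ms=1730102456001,vout_V=5.02"
rec = sign_record(payload, key, counter=7)
print(verify_record(payload, key, rec, last_counter=6))
```

The CRC alone proves nothing about authorship, since anyone can recompute it; only the keyed HMAC supports a tamper claim.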
NDCSV sample
ts_ms,vout_V,iout_A,temp_C,pg,event
1730102456001,5.02,1.23,54.1,1,none
1730102458002,4.88,2.90,62.7,0,UV
One record per line; easy append & spreadsheet import.
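Because each record is a plain CSV line under a single header, a host-side parser needs nothing beyond the standard library. A small sketch, with the column types assumed from the field names:

```python
import csv
import io

SAMPLE = (
    "ts_ms,vout_V,iout_A,temp_C,pg,event\n"
    "1730102456001,5.02,1.23,54.1,1,none\n"
    "1730102458002,4.88,2.90,62.7,0,UV\n"
)

def parse_ndcsv(text):
    """Parse an NDCSV dump into a list of typed dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "ts_ms": int(row["ts_ms"]),
            "vout_V": float(row["vout_V"]),
            "iout_A": float(row["iout_A"]),
            "temp_C": float(row["temp_C"]),
            "pg": row["pg"] == "1",
            "event": row["event"],
        })
    return rows

for rec in parse_ndcsv(SAMPLE):
    print(rec["ts_ms"], rec["vout_V"], rec["event"])
```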
CBOR keys (illustrative)
{"ts_ms":1730102456000,"vout_V":5.02,"iout_A":1.23,"temp_C":54.1,"pg":1,
"minmax":{"v_min":4.58,"v_max":5.12},"sig":"hmac:BASE64..."}
Binary payload on the wire; JSON here only shows field names.
Field naming & units
- vout_V, vin_V, iout_A, temp_C, pg, mode, fsw_khz
- ts_ms, tick, event_type, cause_bits, seq
- Min/Max windows: v_min, v_max, i_max, t_max_c
Storage: Retention, Wear & Power Gaps
Use layered storage with wear-aware commits and brown-out safety: RAM ring for live samples → periodic FRAM/EEPROM journal commits → optional external SPI/NOR flash for bulk dumps. Cap records and retain by severity.
Layers
- RAM ring (seconds to minutes of context)
- Periodic FRAM/EEPROM commit (journal pages)
- Optional external flash (bulk export / long-term)
Wear & Retention
- Rolling journal + round-robin wear leveling
- Record cap: last 64 events per class
- Severity policy: Critical > Major > Warning
Power Gaps
- Supercap “last-gasp” commit on brown-out IRQ
- Integrity marker: pre-commit header + post-commit CRC
- Boot scan recovers only valid tuples
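The header, payload and CRC tuple plus the boot scan can be sketched as below. The 7-byte header layout and the magic byte are hypothetical; the point is that a torn write fails the length or CRC check and is never returned.

```python
import struct
import zlib

MAGIC = 0xA5
HDR = struct.Struct("<BIH")   # magic, sequence number, payload length (assumed layout)

def pack_record(seq, payload):
    """Serialize header + payload, then append a CRC32 over both."""
    hdr = HDR.pack(MAGIC, seq, len(payload))
    crc = zlib.crc32(hdr + payload) & 0xFFFFFFFF
    return hdr + payload + struct.pack("<I", crc)

def scan_journal(blob):
    """Return [(seq, payload)] for valid tuples; stop at the first invalid one."""
    out, pos, last_seq = [], 0, -1
    while pos + HDR.size + 4 <= len(blob):
        magic, seq, length = HDR.unpack_from(blob, pos)
        end = pos + HDR.size + length + 4
        if magic != MAGIC or end > len(blob):
            break                                  # erased area or torn write
        (crc,) = struct.unpack_from("<I", blob, end - 4)
        if zlib.crc32(blob[pos:end - 4]) & 0xFFFFFFFF != crc or seq <= last_seq:
            break                                  # corrupt or non-monotonic record
        out.append((seq, blob[pos + HDR.size:end - 4]))
        last_seq, pos = seq, end
    return out

journal = pack_record(1, b"UV") + pack_record(2, b"OC") + pack_record(3, b"OT")[:-2]
print(scan_journal(journal))   # the truncated third record is dropped
```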
Record & page metadata
- Per-record: ts_ms, tick, class, severity, len, crc32
- Per-page: page_id, write_ptr, erase_cnt, crc32_page, epoch, seq_base
- Retention: critical (∞), major (90 days), warning (30 days); example policy
{"hdr":{"page_id":17,"seq":8921,"len":128,"epoch":7},
"payload":{"event_type":"UV","ts_ms":1730102456000,"v_min":4.58,"i_max":3.4},
"crc32":"0x7B3A91C2"}
Validation: Calibration & Proof
Calibrate V/I/T paths, then prove behavior with scripted injections and exports. Store calibration revision & date; accept only if the metrics meet targets.
| Item | Method | Stored fields | Target |
|---|---|---|---|
| V divider ratio | 2-point vs DMM / ref supply | v_ratio, v_offset, cal_rev, date | ±1% FS (dc) |
| CSA gain/offset | 0 / Inom / Imax points | csa_gain, csa_offset | ±3% FS (dc) |
| T curve | Ice / ambient / hot fixture | t_curve[], t_rev, date | ±2 °C typ |
| Reference | Meter traceability check | fixture_id, operator, hash(cal) | doc only |
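A two-point calibration reduces to solving for a gain and an offset from two reference readings, then checking a third point against the target. The sketch below is generic; the mapping of its outputs onto stored fields such as v_ratio and v_offset is an assumption.

```python
def two_point_cal(raw_lo, ref_lo, raw_hi, ref_hi):
    """Solve ref = gain * raw + offset from two (raw, reference) pairs."""
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)
    offset = ref_lo - gain * raw_lo
    return gain, offset

def apply_cal(raw, gain, offset):
    return gain * raw + offset

def within_target(measured, reference, full_scale, pct):
    """True when the error is within pct percent of full scale."""
    return abs(measured - reference) <= full_scale * pct / 100.0

# Assumed example: ADC-side volts versus DMM readings at two supply settings
gain, offset = two_point_cal(raw_lo=0.5, ref_lo=1.0, raw_hi=2.5, ref_hi=5.0)
v = apply_cal(1.5, gain, offset)
print(gain, offset, v, within_target(v, 3.0, full_scale=5.0, pct=1.0))
```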
Scripted injections
- UV/OV/OC/OT pulses (duration & amplitude)
- Load steps with defined slew
- Thermal soaks (plate temp + dwell)
Verify
- Pre/post buffers present & in range
- Window min/max & cause bits correct
- Timestamps vs sync; drift corrections stored
Accept
- Missed-event rate < 0.1%
- Time-skew < 2 ms
- Counter error < 2%
- Export parse success = 100%
{"cal_rev":"v3.1","date":"2025-10-28","fixture_id":"FX-42A","operator":"QA07",
"v_ratio":2.0031,"v_offset":0.002,"csa_gain":50.12,"csa_offset":0.0013,
"t_curve":[-10,25,85],"crc32":"0x9AD0C4B1"}
name,target,measured,error,pass
MissedEventRate,<0.1%,0.03%,-,TRUE
TimeSkew_ms,<2,1.2,-,TRUE
CounterErr_As,<2%,1.1%,-,TRUE
ExportParse_OK,100%,100%,-,TRUE
Internal Telemetry — PMICs & Power Modules
Parts that expose V/I/T/PG/fault via PMBus/I²C. Pick by rails, accuracy class, AEC-Q, and evaluation support.
| Brand | Part | Rails / Range | Telemetry & Interface | Accuracy / Class | AEC-Q | Eval tools | Notes |
|---|---|---|---|---|---|---|---|
| TI | TPS544C25 | Single buck, ~30 A | PMBus (V/I/T/PG/fault) | Digital telemetry, device-grade | Variants | EVM available | Data center/SoC rails; sequencing friendly |
| TI | TPS546C23 | Single buck, ~35 A | PMBus telemetry | Digital; parallelable | Variants | EVM available | Higher current; current share |
| TI | TPS549D22 | Single buck, ~40 A | PMBus + telemetry | Device-grade | — | EVM available | Popular for FPGA/ASIC rails |
| TI | TPSM846C23 | Module, ~35 A | PMBus (V/I/T/PG) | Module-class | — | EVM/GUI | Integrated inductor; fast bring-up |
| TI | UCD90120A | Monitor/Sequencer (12 rails) | PMBus (monitor + logging) | Supervisor-class | — | EVM/PMBus GUI | Use as multi-rail telemetry gateway |
| ST | STPMIC1A | Multi-rail (STM32MP1) | I²C status/PG/fault | PMIC-class | — | STEVAL boards | Validated with STM32MP1 ecosystems |
| ST | STPMIC2 | Multi-rail (STM32MP2) | I²C telemetry/status | PMIC-class | — | EVAL available | Updated rails and protections |
| ST | L5965 | Automotive PMIC | I²C/SPI diagnostics, PG | PMIC-class | AEC-Q100 | EVB | Vehicle body/infotainment rails |
| NXP | PCA9450 | PMIC (i.MX8M) | I²C status/PG/fault | PMIC-class | — | i.MX EVK | Optimized for i.MX8M power tree |
| NXP | PF8100 / PF8200 | PMIC (i.MX8) | I²C monitors + thresholds | PMIC-class | Variants | EVK | Programmable sequencing/limits |
| NXP | PF5020 | PMIC (S32) | I²C status/PG | PMIC-class | AEC-Q100 | EVB | Automotive MCU families |
| Renesas | ISL68220 / ISL68223 | Digital multiphase | PMBus telemetry (V/I/T) | Controller-class | — | Eval kits | High-current core rails |
| Renesas | RAA229131 | Digital multiphase | PMBus + fault log | Controller-class | — | Eval kits | Telemetry + advanced protections |
| Renesas | ISL70321SEH | Radiation-tolerant monitor | Telemetry/sequence | Supervisor-class | — | Eval | Specialty (space/hi-rel) |
| onsemi | NCP4206 / NCP4208 | Digital multiphase | PMBus telemetry | Controller-class | — | Eval | CPU/GPU/DDR rails |
| Microchip | MCP16502 | PMIC (MPU/SoC) | I²C status/monitors | PMIC-class | — | EVB | Compact multi-rail solution |
| Microchip | MCP19118 / MCP19119 | Digitally enhanced controller | I²C readable params | Controller-class | — | Eval | Mixes analog power with MCU |
| Melexis | — | — | — | — | — | — | Use discrete sensing path (see below) |
Discrete Sensing Path — Pair with Buck/Boost Controllers
Build a flexible telemetry chain around your existing regulator: current sense / power monitor + ADC + temperature sensor.
| Brand | Part | Function | Key specs | Rail pairing tips | AEC-Q | Eval tools |
|---|---|---|---|---|---|---|
| TI | INA240 | CSA (PWM-rejection) | Bidirectional, high CMRR | Low-ohm shunt; Kelvin routing | Variants | EVM |
| TI | INA226 | I²C power monitor | V/I/Power calc; alerts | Good for system power budget | — | EVM |
| TI | INA228 | High-res power monitor | 20-bit, wide range | Precision logging | — | EVM |
| ST | TSC2011 / TSC2010 / TSC2012 | CSA | High-side, low offset | Match gain to shunt FS | Car-grade options | EVAL |
| Renesas | ISL28022 / ISL28025 / ISL28034 | I²C power monitors | Multi-range, alerts | Good with multiphase Vcore | — | Eval |
| onsemi | NCS213 / NCS214 | CSA | High-side, low drift | Short traces; RC anti-alias | Variants | EVB |
| onsemi | LC709204F | Fuel gauge | I²C voltage/temperature | Handy for battery rails | — | EVB |
| Microchip | MCP6C02 / MCP6C04 | CSA | Low offset, fixed gains | Keep shunt self-heat low | — | EVM |
| Microchip | PAC1934 | 4-ch power monitor | I²C, accumulators | Multi-rail logging | — | EVB |
| Melexis | MLX91220 / MLX91221 / MLX91208 | Hall current sensor | Isolated, fast | Use when shunt loss is critical | AEC-Q options | EVB |
| TI | ADS1115 / ADS1015 | I²C ΔΣ ADC | 16/12-bit, PGA, 4-ch | Map V/I/T into separate channels | — | EVM |
| TI | ADS131M02 | ΔΣ ADC | High-precision dual-ch | Noise-limited voltage sense | — | Eval |
| Renesas | ISL26102 / ISL26104 | ΔΣ ADC | 24-bit, low-speed | Accurate temperature/voltage | — | Eval |
| Microchip | MCP33131D / MCP33111D | SAR ADC | High-speed, 16/12-bit | Fast transients capture | — | EVB |
| Microchip | MCP3561/2/4 | ΔΣ ADC | 24-bit, low noise | High-resolution logging | — | EVB |
| NXP | PCF8591 | I²C ADC | 8-bit (basic) | Non-critical channels | — | Eval |
| TI | TMP117 | I²C temp sensor | ±0.1 °C typ | Log case/board temperature | — | EVM |
| TI | TMP102 | I²C temp sensor | Popular, low power | Ambient/board corners | — | EVM |
| ST | STTS75 / STTS22H | I²C temp sensor | Low drift | Close to inductor/case | — | EVAL |
| NXP | PCT2075 / LM75B | I²C temp sensor | Industry standard | Quick integration | — | EVB |
| Renesas | HS3001 | Temp/Humidity | ±0.2 °C typ (T) | Environment health | — | Eval |
| Microchip | MCP9808 | I²C temp sensor | ±0.25 °C typ | Good default pick | — | EVM |
| Melexis | MLX90614 / MLX90632 | IR temp (non-contact) | Board/CASE IR sensing | When probes are intrusive | Variants | EVB |
Integration tips
Route differential pair tightly; avoid shared ground impedance with gate drivers.
Use 0.1%/10 ppm resistors where accuracy matters; RC anti-alias tuned to ADC bandwidth.
Prefer PWM-immune CSAs (e.g., INA240) or add input RC; increase sample points per burst.
Frequently Asked Questions
What’s the minimal telemetry set for a single 5 V rail?
Capture Vout (±1% dc), Iout (±3% dc), and case/board temperature (±2 °C typical) with a power-good status and a min/max window. Add a 200–400 ms pre-trigger buffer and an 800 ms post window for events. This covers brown-outs, overloads, thermal drift, and user-visible resets.
How to size a shunt for low loss yet measurable current at light load?
Pick Rsh so Vfs = Imax × Rsh meets your monitor’s full-scale (typically 50–100 mV for low loss). Ensure light-load resolution ≥ 5–10 LSBs after gain. Check self-heating: P = I² × Rsh at worst case. Use Kelvin routing and consider PWM-immune CSAs to avoid ripple bias.
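The three checks in this answer are quick to run numerically. The values below (10 A maximum, 50 mV full scale, gain of 50, 3.3 V 12-bit ADC, 50 mA light load) are assumptions chosen only to show the arithmetic.

```python
def size_shunt(i_max_a, v_fs_v):
    """Shunt value so that Imax produces the chosen full-scale sense voltage."""
    return v_fs_v / i_max_a

def shunt_checks(r_sh_ohm, i_max_a, i_light_a, gain, v_ref_v, bits):
    """Worst-case dissipation and light-load resolution in ADC LSBs."""
    lsb_v = v_ref_v / 2**bits
    return {
        "p_max_w": i_max_a**2 * r_sh_ohm,
        "light_load_lsbs": i_light_a * r_sh_ohm * gain / lsb_v,
    }

r_sh = size_shunt(i_max_a=10.0, v_fs_v=0.050)
checks = shunt_checks(r_sh, i_max_a=10.0, i_light_a=0.050,
                      gain=50.0, v_ref_v=3.3, bits=12)
print(r_sh, checks)
```

With these assumed numbers the shunt is 5 mΩ, dissipates about 0.5 W at full load, and a 50 mA load still resolves to roughly 15 LSBs.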
How fast should I sample to catch brown-outs without false trips?
Use a fast analog comparator for threshold qualification (tens of microseconds) and log via ADC at the 1–10 Hz health rate. Latch min/max in hardware with a debounce of 1–5 ms. The comparator catches fast dips that the slow ADC stream would miss or alias; the slow stream preserves history without flooding storage.
Best way to separate protection comparator from slow ADC telemetry?
Split paths at the sense node: fast comparator with its own RC and reference handles protection; buffered, bandwidth-limited path feeds the ADC. Do not load PG nodes. Keep grounds independent until a star point; match input impedances to minimize cross-coupling and delay skew.
How do I log events across brown-out gaps?
Use a supercap “last-gasp” path: on brown-out IRQ, shrink the snapshot, commit an atomic tuple (header→payload→CRC), and set an integrity marker. On boot, scan for valid tuples by CRC and monotonic sequence. Keep the commit within a fixed energy budget and time bound.
How to convert V/I/T logs into a single SOH score?
Bin cumulative stress—A·s, °C·s above a reference, ripple exposure—then compute a weighted score with hysteresis (Green/Amber/Red). Include fault-hours and retried events. Calibrate thresholds from fleet data so flags predict RMAs with low false positives. Log the mapping and revision.
What’s the right buffer length for pre-trigger history?
For field triage, 200–400 ms pre-trigger typically captures the causal load or VIN sag; pair with ~800 ms post window. Ensure fixed-rate ticks and a monotonic counter. For faster rails, keep hardware min/max in parallel so you don’t miss sub-buffer transients.
Can inductor DCR sense replace a shunt for accuracy targets?
DCR works for trend and protection but struggles with ±3% dc accuracy over temperature and tolerance unless you calibrate R and L and control thermal tracking. Use matched networks and temperature compensation; keep a shunt for acceptance tests or absolute energy accounting.
How to prevent min/max latches from being spammed by switching spikes?
Place RC anti-aliasing at the sense buffer, add a short digital debounce (1–3 switching periods), and use a programmable blanking window around mode transitions. Prefer CSAs with PWM rejection and route the sense pair tightly to suppress capacitive pickup from the SW node.
What retention policy avoids EEPROM wear while keeping useful history?
Adopt a rolling journal with round-robin pages and severity-aware caps (e.g., last 64 critical, 32 major, 16 warnings). Commit summaries periodically, events on demand. Keep erase count balance within 10%. Store page headers and CRC so recovery is deterministic after power loss.
PMBus vs simple I²C register map for small MCUs—trade-offs?
PMBus gives standardized pages, scaling, and tooling; it adds complexity and code size. A minimal I²C map is lighter and can expose “min/max since last read” with auto-clear. For one rail and tiny MCUs, I²C is efficient; for fleets and tooling, PMBus scales better.
How to HMAC-sign a dump for warranty disputes?
Append a device-scoped HMAC-SHA256 over the normalized payload (CSV line or CBOR bytes) including ts_ms, monotonic tick, and firmware revision. Keep the key in protected storage; verify server-side. Include a CRC for transport errors and a counter to prevent replay.
Does spread-spectrum impact min/max capture accuracy?
It can redistribute ripple energy and slightly change peak timing. Hardware min/max still works if your anti-alias and blanking are set for the broadened spectrum. For precision, increase measurement dwell or use EWMA windows large enough to smooth frequency dither effects.
How to sync timestamps across multiple rails?
Use a shared monotonic tick sourced from a common oscillator and mark events with both ts_ms and tick. Periodically discipline drift using a sync pulse or host-broadcast checkpoint. On merge, align by tick, then refine by wall clock to handle wrap and resets.
What counters matter most for field reliability?
Prioritize cumulative charge (A·s), thermal exposure above reference (°C·s), fault-hours, and retried-event counts. Add ripple-stress bins per mode. These correlate with connector fatigue, capacitor aging, and MOSFET stress. Track firmware revisions so counter trends map cleanly to design changes.