Asset Tracking Tag: ULP GNSS/Cell/LoRa/BLE Design
An asset tracking tag succeeds when it turns real-world events (motion, temperature, location) into traceable, low-power reports using a disciplined pipeline: wake gating → positioning → radio scheduling → reliable buffering. Most field failures come from ignoring the same fundamentals—energy-per-event budgeting, peak-current power paths, and recoverable design (logs, watchdog, and safe OTA).
H2-1 · Definition & Boundary: What an Asset Tracking Tag Is (and Is Not)
Engineering definition, ownership boundaries, and where to route adjacent topics without expanding scope.
Engineering definition: An asset tracking tag is a battery-powered, deploy-at-scale device that stays in deep sleep most of the time, wakes on a small set of events, packages “state + time + (optional) position,” reports using a constrained radio budget, and maintains a minimal local record so events remain traceable even under coverage gaps.
What this page “owns”: the tag-side closed loop. That means duty-cycled sensing, wake gating, energy-aware scheduling for GNSS and radios (BLE / LPWAN / cellular), peak-current resilience in the power path, and traceability primitives (event queue + durable counters + minimal diagnostics).
Three non-negotiable boundaries (used to prevent scope creep while keeping the content actionable):
- Energy boundary: lifetime depends on state occupancy + peak bursts + retry/coverage behavior, not just “average current.” Design is driven by an event-driven energy budget.
- Link boundary: only tag-side decisions are covered (when to wake, how long to scan/attach, failover rules). Gateway/backhaul and carrier-core topics are routed out.
- Positioning boundary: only tag-controllable contributors are covered (TTFF strategy, antenna zone/keep-out, noise coupling, fallback behavior). RTLS infrastructure and UWB/AoA algorithms are routed out.
Common “not this” confusion (fast routing)
A BLE beacon focuses on broadcast and proximity without guaranteed traceability. A cellular terminal focuses on network residency and protocol-stack depth. A UWB RTLS node focuses on infrastructure-assisted precision localization. This page focuses on ultra-low-power tag engineering built around a closed event loop.
H2-2 · Use Cases & SKU Matrix: Why “One Tag” Becomes Many Versions
A decision framework to split SKUs by constraints (not by marketing labels), keeping cost and battery risk under control.
SKU planning for tags should start from irreversible constraints rather than feature wishlists. The practical split is driven by: (1) whether absolute position is required, (2) whether relay is acceptable, (3) event frequency and the cost of missed events, and (4) battery/temperature limits that cap peak bursts.
Four-axis SKU split: Positioning (none / proximity / absolute), Uplink (relay / LPWAN / cellular direct), Event (motion / tamper / temperature / door), Energy envelope (coin-cell class / primary lithium / rechargeable + buffer).
- Positioning axis: choose based on energy profile and failure modes (indoor/obstructed → fallback), not “spec sheet accuracy.”
- Uplink axis: direct cellular changes the power path and retry budget; relay shifts risk to discovery windows and handoff probability.
- Event axis: wake sources define false-trigger cost and queue pressure; state machine design is a battery feature.
- Energy axis: battery type and temperature range determine whether burst radios are sustainable without brownout or capacity collapse.
Consistent 3-line format for each SKU (keeps pages scalable)
Best for: deployment constraint that this SKU matches. · Not for: the failure-prone scenario to avoid. · Cost drivers: the real drivers (peak power path, antenna layout, certification/test, retries).
H2-3 · Power Budget First: Lifetime Is Not “Battery mAh ÷ Average Current”
A tag’s lifetime is governed by state occupancy, burst energy, retries under weak coverage, and battery reality (temperature, internal resistance, aging).
Design rule: Start with an event-driven energy ledger (energy per motion event, energy per daily heartbeat, energy per failed attempt), then integrate over time. Average current is an output, not an input.
A tracking tag typically spends most of its life in deep sleep, but lifetime collapses when short bursts become frequent or unstable: a GNSS fix turns into repeated attempts under sky obstruction, a relay scan window stretches longer than planned, or a cellular attach retries under marginal signal. The energy budget must therefore be built around states + actions + failure probability.
Budget the tag in four layers (keeps the worksheet reusable across SKUs):
- Layer A — State occupancy: Sleep, Sense-only, Confirm, Locate, Uplink, Retry/Backoff. Each state contributes time per day.
- Layer B — Energy per action: “One motion event,” “one tamper event,” “one daily heartbeat,” “one GNSS fix attempt,” “one uplink session.”
- Layer C — Failure probability: fix success rate, discovery success rate, uplink success rate, average retries per event (bounded by policy).
- Layer D — Battery reality: usable capacity after derating, temperature-dependent internal resistance, aging/self-discharge reserve.
Why “2 years” becomes “3 months” in the field
The dominant drain is rarely the sleep current. The usual collapse comes from extended scan windows, repeated attach / retries, GNSS re-fix loops under sky obstruction, or brownout → reboot → retry storms. Each has an observable counter that the design should log.
| Worksheet Block | Fields to Record (no numbers required here) | Computation Logic (what the fields are used for) |
|---|---|---|
| Battery & Environment | Battery type, nominal V, usable capacity (derated), internal resistance proxy, temperature profile, reserve % | Cap the available energy and flag burst droop risk under low temp / aged cells |
| State Occupancy | Time/day in Sleep, Sense-only, Confirm, Locate, Uplink, Retry/Backoff | Daily baseline energy: Σ(P_state × t_state) |
| Per-Action Energy | Energy per GNSS attempt, per scan window, per uplink session, per MCU confirm | Event energy: E_event = Σ(E_action) inside the event pipeline |
| Retry / Coverage | Success rates, expected retries per event, hard retry cap, backoff policy | E_event_total = E_event × (1 + retries_expected), bounded by policy |
| Conversion Loss | DC/DC efficiency (light load & burst), LDO IQ, switch leakage | Translate load energy to battery energy: E_batt ≈ Σ(E_load / η) + overhead |
| Outputs | E_day, E_event, worst-case E_day (retries), lifetime estimate, margin flags | Life ≈ E_batt_usable / E_day (after reserves). Check worst-case tail, not only typical. |
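The worksheet's computation logic can be sketched as a small ledger. This is a minimal illustration, not a calibrated model: the class names, the function names, and every number in the example (sleep power, per-event energy, retry rate, cell capacity, derating, reserve) are hypothetical placeholders to be replaced with measured values.

```python
from dataclasses import dataclass

@dataclass
class StateLoad:
    name: str
    power_mw: float         # average load power in this state
    seconds_per_day: float  # Layer A: state occupancy

@dataclass
class EventLoad:
    name: str
    energy_mj: float         # Layer B: load energy for one clean pass
    per_day: float
    retries_expected: float  # Layer C: average retries per event
    retry_cap: int           # hard policy cap on retries

def daily_energy_mj(states, events, efficiency, worst_case=False):
    """Battery-side energy per day in mJ (mW x s = mJ)."""
    total = sum(s.power_mw * s.seconds_per_day for s in states)
    for e in events:
        # Typical case uses the expected retries, bounded by policy;
        # worst case assumes every event runs into the retry cap.
        retries = e.retry_cap if worst_case else min(e.retries_expected, e.retry_cap)
        total += e.energy_mj * (1 + retries) * e.per_day
    return total / efficiency  # conversion loss: E_batt ~ E_load / eta

def lifetime_days(capacity_mah, nominal_v, derating, reserve, e_day_mj):
    """Layer D: derate the cell, hold back a reserve, then divide by E_day."""
    usable_mj = capacity_mah * nominal_v * 3600.0 * derating * (1.0 - reserve)
    return usable_mj / e_day_mj

# Hypothetical inputs, for illustration only.
states = [StateLoad("sleep", 0.01, 86_000.0)]
events = [EventLoad("motion", 3000.0, 10, 0.5, 3)]
typical = daily_energy_mj(states, events, efficiency=0.85)
worst = daily_energy_mj(states, events, efficiency=0.85, worst_case=True)
print(round(lifetime_days(2400, 3.6, 0.7, 0.1, typical)),
      round(lifetime_days(2400, 3.6, 0.7, 0.1, worst)))
```

Printing the typical and worst-case lifetimes side by side makes the retry tail visible: the sleep term barely moves the result, while the retry term dominates it.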
Practical requirement: log minimal counters (scan count, attach attempts, GNSS fix attempts, reset reason). They turn “battery died early” into a measurable root cause.
H2-4 · Power Path & Energy Management: Peak Handling Drives Stability and Lifetime
A robust tag is built around peak-safe power delivery, clean domain isolation, controlled buffer charging, and a brownout policy that prevents reboot storms.
System objective: Keep the always-on domain stable and low-leakage, while enabling burst domains (GNSS / RF / cellular) to turn on/off predictably without pulling the battery below safe voltage thresholds.
Peak current is not a corner case in tracking tags—radios and GNSS introduce short, high-demand bursts that interact with battery internal resistance and cold temperature behavior. When the power path is not designed for burst delivery, the result is not only reduced lifetime, but also unstable sessions (attach/fix failures), and in the worst case, brownout → reboot → retry storms.
Write the power path as a controlled chain rather than a single rail:
- Energy source: battery (derated capacity + temperature constraints).
- Protection: reverse protection / transient clamps / inrush shaping (tag-side only).
- Buffer (optional): local storage to smooth bursts, with controlled charging to avoid wasting energy.
- Conversion: DC/DC and/or LDO selection by light-load overhead, burst response, and quiescent/leakage.
- Domain gating: separate always-on vs burst domains using load switches and sequencing.
- PG/reset + brownout policy: enforce safe turn-on and safe turn-off behavior under droop.
Problem → Root cause → Action (tag-side)
TX starts and resets → battery droop + simultaneous domain turn-on → stagger sequencing, add controlled buffer, tighten brownout rules.
RF sessions fail without reset → noisy/weak rail under burst → isolate RF domain, improve filtering and conversion transient response.
Lifetime drops unexpectedly → buffer charging loss + leakage in “off” domains → audit leakage, choose low-IQ switches, gate cleanly.
Reliability requirement: record minimum evidence for power-related failures (reset reason, PG drop events, minimum battery voltage, attach/fix failure counters). Without these, “battery died early” remains un-actionable.
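A first-order way to reason about burst droop is V_load ≈ V_open − I_peak × R_int. The sketch below applies that estimate before a burst domain is enabled. It assumes a purely resistive battery model, and the function names, the guard band, and the three decision labels are hypothetical; real cells also show time-dependent sag that this ignores.

```python
def loaded_voltage(v_open, i_peak_a, r_int_ohm):
    """Estimated battery voltage during a burst (resistive model only)."""
    return v_open - i_peak_a * r_int_ohm

def burst_decision(v_open, i_peak_a, r_int_ohm, v_brownout, guard_v=0.2):
    """Decide whether a burst domain may turn on right now.

    'run'           : estimated voltage clears brownout plus the guard band
    'run_staggered' : above brownout but inside the guard band, so run
                      this burst alone with other burst domains off
    'defer'         : estimated voltage would fall below brownout
    """
    v = loaded_voltage(v_open, i_peak_a, r_int_ohm)
    if v >= v_brownout + guard_v:
        return "run"
    if v >= v_brownout:
        return "run_staggered"
    return "defer"

# Hypothetical cell: same burst, rising internal resistance as it gets colder.
for r_int in (1.0, 2.5, 3.0):
    print(r_int, burst_decision(3.6, 0.3, r_int, v_brownout=2.8))
```

Deferring on the estimate costs one delayed report; ignoring it risks the brownout → reboot → retry storm described above.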
H2-5 · Wake-up & Event Pipeline: Accel-Wake Needs a System, Not Just a Threshold
A reliable tag uses a multi-stage pipeline: low-power sensing → wake gating → MCU confirmation → locate/report → cooldown/backoff, with counters for field tuning.
Engineering goal: Reduce false wakes (transport vibration) without missing slow motion, by combining prefilter gating, debounce/hysteresis, and event merge + cooldown.
An accelerometer wake trigger is often treated as a single threshold, but real deployments face vibration profiles, mounting variation, and temperature drift. A tag must therefore separate “possible motion” from “report-worthy motion” using a staged event pipeline. This keeps the always-on domain cheap while reserving expensive actions (GNSS, radio sessions) for confirmed events.
Common failure modes and why a single threshold fails:
- False triggers: transport vibration, handling shocks, mechanical resonance, periodic motion in vehicles.
- Missed triggers: slow motion, gentle tilt, soft packaging damping, low-amplitude movement when the tag is tightly mounted.
- Drift and variation: temperature drift, orientation differences, mounting stiffness, sensor offset and tolerance spread.
Practical strategies (tag-side):
- Multi-stage gating: keep the accelerometer in a low-power prefilter mode; wake MCU only when a windowed rule is met.
- Debounce + hysteresis: require consecutive hits (N-of-M), integrate over a time window, and use different enter/exit thresholds.
- Event merge + cooldown: merge bursts into one report and enforce a cooldown to prevent repeated wake/report storms.
- Bounded retries: if locate/report fails, enter backoff and cap attempts to protect the battery tail.
| State | Entry / Exit Conditions | Config Fields (turn into code/config) |
|---|---|---|
| S_SLEEP | Enter after cooldown or success. Exit on prefilter hit or RTC heartbeat. | sleep_domain_mask, rtc_heartbeat_s, wake_sources |
| S_SENSE_ONLY | Low-power monitoring window; exit to CONFIRM when prefilter rule meets N-of-M. | prefilter_threshold, prefilter_window_ms, prefilter_hits_required |
| S_CONFIRM | MCU validates motion vs vibration; pass → LOCATE/REPORT; fail → COOLDOWN. | confirm_sample_rate, confirm_duration_ms, confirm_features, confirm_pass_rule |
| S_LOCATE | Attempt GNSS or a degraded source; timeout/early-stop if signal is weak. | locate_timeout_s, early_stop_rules, max_fix_attempts, degrade_policy |
| S_TRANSMIT | Report immediately or buffer/merge. Success → COOLDOWN; failure → BACKOFF. | tx_min_interval_s, merge_window_s, queue_limit, tx_trigger_rules |
| S_BACKOFF | Delay and reduce attempts; exit back to TRANSMIT/LOCATE or stop at retry cap. | backoff_schedule, retry_cap, failure_counters |
| S_COOLDOWN | Suppress repeated triggers after handling; exit to SLEEP when timer ends. | cooldown_s, cooldown_escalation_rules |
Minimum evidence to log (for tuning without guesswork)
wake_source, prefilter_hits, confirm_pass/fail, cooldown_active, event_queue_depth, locate_attempts, tx_attempts, last_failure_reason.
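The gating strategies above (N-of-M hits, separate enter/exit thresholds, event merge, cooldown) can be combined in one small class. This is a sketch under stated assumptions: `MotionGate`, its parameters, and the milli-g values are hypothetical, samples arrive at a fixed rate, and the cooldown is counted in samples rather than seconds.

```python
from collections import deque

class MotionGate:
    def __init__(self, enter_mg, exit_mg, hits_required, window, cooldown_samples):
        self.enter_mg = enter_mg            # threshold to start an event
        self.exit_mg = exit_mg              # lower threshold to stay in it
        self.hits_required = hits_required  # N of the N-of-M rule
        self.history = deque(maxlen=window) # M most recent hit flags
        self.cooldown_samples = cooldown_samples
        self.cooldown_left = 0
        self.active = False

    def update(self, magnitude_mg):
        """Feed one sample; return True only when a new event starts."""
        if self.cooldown_left > 0:
            self.cooldown_left -= 1   # triggers are suppressed during cooldown
            return False
        # Hysteresis: a higher bar to enter, a lower bar to remain active.
        threshold = self.exit_mg if self.active else self.enter_mg
        self.history.append(magnitude_mg >= threshold)
        hits = sum(self.history)
        if not self.active and hits >= self.hits_required:
            self.active = True
            return True               # one report for the whole burst
        if self.active and hits == 0:
            # Motion has ended: close the merged event and start the cooldown.
            self.active = False
            self.history.clear()
            self.cooldown_left = self.cooldown_samples
        return False

gate = MotionGate(enter_mg=100, exit_mg=60, hits_required=3, window=5,
                  cooldown_samples=4)
shock = [gate.update(x) for x in [300, 0, 0, 0, 0, 0]]  # single shock
sustained = [gate.update(x) for x in [120, 130, 110, 80]]
print(shock, sustained)
```

A single shock never reaches three hits inside the window, so it costs no MCU confirmation; sustained motion produces exactly one trigger, and later samples merge into the same event until the window goes quiet.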
H2-6 · Positioning Stack on a Tag: GNSS Success Rate Dominates Both Accuracy and Energy
For tags, GNSS is mainly a decision problem: when to attempt, when to stop early, how to degrade under sky obstruction, and how antenna/noise constraints affect success rate.
Decision-level view: Manage GNSS with start mode (cold/warm/hot), attempt budget (time/tries), and a degrade policy when obstruction or multipath makes fixes unlikely.
Indoor or metal environments (warehouses, trailers, containers) often produce low satellite visibility and severe multipath. In this regime, the main enemy is not “GNSS spec accuracy,” but low fix success rate and the resulting retry energy. Improving the probability of a valid fix reduces both time-to-first-fix (TTFF) tail and battery drain.
GNSS concepts to include only at decision granularity:
- TTFF distribution: treat TTFF as a tail risk, not a single number; cold starts dominate when assistance is missing.
- Start modes: hot/warm/cold categories map directly to how often the tag can afford re-fixes.
- Fix cadence policy: event-driven (motion) vs time-driven (heartbeat), with explicit attempt budgets.
- Stop conditions: timeouts, low-signal early stop, and bounded attempts to prevent “fix storms.”
Obstruction reality: degrade on the tag side
When GNSS becomes unreliable, the tag should label the position source and quality tier and fall back to a coarse method (e.g., cellular-grade or proximity-grade), rather than looping GNSS attempts. The report remains traceable even without a fresh fix.
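The stop conditions and the degrade rule can be expressed as a bounded attempt loop. The sketch below simulates the receiver with a per-second C/N0 trace, so it is a policy illustration, not a driver: `FixPolicy`, its default values, the trace format, and the choice to abandon further attempts after an early stop are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class FixPolicy:
    timeout_s: int = 60          # hard cap per attempt
    early_stop_s: int = 10       # give up if signal stays weak this long
    min_cn0_dbhz: float = 25.0   # below this counts as "weak"
    max_attempts: int = 2        # bounded attempts per event

def attempt_fix(cn0_trace, fix_ready_s, policy):
    """cn0_trace: one simulated C/N0 reading per second.
    fix_ready_s: second at which a fix would arrive, or None if never.
    Returns (result, gnss_on_seconds)."""
    weak_run = 0
    elapsed = 0
    for elapsed, cn0 in enumerate(cn0_trace[:policy.timeout_s], start=1):
        if fix_ready_s is not None and elapsed >= fix_ready_s:
            return "fix", elapsed
        weak_run = weak_run + 1 if cn0 < policy.min_cn0_dbhz else 0
        if weak_run >= policy.early_stop_s:
            return "early_stop", elapsed
    return "timeout", elapsed

def locate(attempts, policy, coarse_source="cell"):
    """Try GNSS within budget; otherwise label a coarse fallback source."""
    used, on_time = 0, 0
    for trace, fix_ready_s in attempts[:policy.max_attempts]:
        used += 1
        result, seconds = attempt_fix(trace, fix_ready_s, policy)
        on_time += seconds
        if result == "fix":
            return {"pos_source": "gnss", "pos_quality": "fresh",
                    "fix_attempts": used, "gnss_on_time_s": on_time}
        if result == "early_stop":
            break  # weak sky: do not spend the remaining attempts
    return {"pos_source": coarse_source, "pos_quality": "coarse",
            "fix_attempts": used, "gnss_on_time_s": on_time}

policy = FixPolicy()
print(locate([([40.0] * 60, 8)], policy))     # open sky
print(locate([([15.0] * 60, None)], policy))  # obstructed
```

The report always carries `pos_source` and `pos_quality`, so an obstructed event stays traceable while the GNSS on-time stays bounded by the early-stop window instead of the full timeout.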
Field debug template: Symptom → possible causes → top evidence to check:
- TTFF too slow: cold starts too frequent / weak signal → check start_mode bucket, C/N0 bucket, TTFF distribution.
- Indoor no fix: sky obstruction / antenna zone compromised / noise coupling → check fix_success_rate, sat_count bucket, low-signal ratio.
- Fix jumps: multipath / edge-of-coverage solution → check position jump flag, quality tier, consecutive bad fixes.
- Energy too high: retry loops / no early-stop → check fix_attempts per event, gnss_on_time, early_stop_count.
- Worse during radio sessions: shared droop/noise → check min_vbat during GNSS, overlap counter.
Antenna and layout (abstract principles) affect success more than minor algorithm tweaks. Keep the antenna zone clean, preserve a stable reference/ground area, and separate it from switching converters and burst transmitters. This is not PCB-level detail; it is a placement and isolation rule that protects the fix probability.
H2-7 · Multi-Radio Strategy: A Scheduler Beats “Stuffing BLE + LoRa + Cellular”
Tag-side only: role separation, time-slicing, window mutual exclusion, bounded retries, and abstract coexistence principles (no protocol tutorials).
Core idea: Multi-radio success depends on a Radio Manager that selects one link at a time, enforces scan/TX mutual exclusion, and applies fallback + cooldown to prevent “self-inflicted packet loss.”
A tag can carry multiple radios for flexibility, but it cannot run them as independent features. In battery-powered tags, the dominant failure patterns are scheduling collisions (scan vs transmit), peak-current droop during bursts, and RF path coupling when antenna resources are shared. A practical design starts by assigning a single “best job” to each radio and then expressing everything as a deterministic schedule with explicit time budgets and fallback behavior.
Tag-side radio scheduling rules (keep them explicit):
- Time-slicing: only one radio window may be active at a time on shared resources (antenna, power, clocks).
- Priority: critical events outrank periodic heartbeats; queue pressure may override low-priority scans.
- Mutual exclusion: scanning windows (e.g., BLE scan) must not overlap with high-current TX bursts (cell/LoRa).
- Bounded attempts: registration/attach and repeated TX retries must be capped; failures trigger backoff/cooldown.
- Fallback: when a preferred link fails, degrade to the next-best link for the current event class.
| Trigger | Preferred Link | Preconditions | Time Budget & Caps | Fallback |
|---|---|---|---|---|
| Motion event (confirmed) | BLE (proximity/relay) | cooldown off; queue not critical-high; scan allowed | scan_window_s; max_scan_cycles; no overlap with TX | LoRa uplink → Cell burst |
| Critical tamper (must deliver) | Cell (coverage first) | Vbat above guard; temperature within safe band | attach_timeout_s; tx_retry_cap; backoff schedule | LoRa uplink → BLE relay |
| Heartbeat (periodic) | LoRa (cheap uplink) | queue depth normal; duty limits respected | tx_window_s; payload cap; merge window | BLE relay → Cell if required |
| Queue high (pressure) | Any (drain queue) | choose by Vbat/temp and last failures | drain_budget_s; min_tx_interval; retry caps | switch link tier |
Why packets drop (tag-side categories)
(1) scheduling collision (scan vs TX overlap), (2) RF path coupling (shared antenna/filters/switch), (3) power/clock noise during bursts (droop/spurs). Fix order: evidence → schedule → RF path → power/noise.
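A Radio Manager of this kind can be reduced to three rules: an ordered link list per event class, a single shared window, and a retry cap before fallback. The following is a hypothetical sketch; `RadioManager`, `LINK_ORDER`, the Vbat guard value, and the `try_link` callback are illustrative names, and the callback stands in for whatever radio driver the tag actually uses.

```python
# Preferred link first, then fallbacks, per event class.
LINK_ORDER = {
    "motion": ["ble", "lora", "cell"],
    "tamper": ["cell", "lora", "ble"],
    "heartbeat": ["lora", "ble", "cell"],
}

class RadioManager:
    def __init__(self, vbat_guard_v=3.0):
        self.busy = None              # the one radio holding the shared window
        self.vbat_guard_v = vbat_guard_v

    def candidates(self, event_class, vbat_v):
        links = list(LINK_ORDER[event_class])
        if vbat_v < self.vbat_guard_v:
            # Precondition from the schedule: no cellular burst below the guard.
            links = [link for link in links if link != "cell"]
        return links

    def send(self, event_class, vbat_v, try_link, retry_cap=2):
        """try_link(link) -> bool is supplied by the caller.
        Returns (link_used or None, attempt log)."""
        log = []
        for link in self.candidates(event_class, vbat_v):
            for _ in range(retry_cap):
                if self.busy is not None:
                    raise RuntimeError("radio window overlap")
                self.busy = link      # mutual exclusion: one window at a time
                try:
                    ok = try_link(link)
                finally:
                    self.busy = None
                log.append((link, ok))
                if ok:
                    return link, log
        return None, log              # caller should enter backoff/cooldown

manager = RadioManager()
print(manager.send("motion", 3.3, lambda link: link == "lora"))
```

Because every attempt passes through `send`, the attempt log doubles as the evidence needed to tell a scheduling collision from an RF or power problem.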
H2-8 · Data Model & Reliability: Traceable, Replayable, and “No Critical Event Lost”
Tag-side only: event IDs, queue semantics, retry/ack/commit, offline buffering, minimal power-loss protection, and minimum diagnostic fields.
Reliability loop: event → queue → encode → transmit → ack → commit. The key is to keep each event stable under retries using a unique event_id, and to persist enough state to survive resets.
Connectivity is intermittent, and tags reboot or brown out under peak load. Reliability therefore cannot be “best effort.” A tag needs a small, deterministic data model that makes events traceable end-to-end, repeatable without duplication, and recoverable after power loss. The simplest robust approach is a bounded event queue backed by a ring log, where an event is only removed after a verified acknowledgment triggers a commit marker.
Tag-side semantics that make replay safe:
- Stable identifiers: each event carries event_id so retries resend the same identity (no new IDs on retry).
- Two-phase lifecycle: record then commit on ack; uncommitted events are replay candidates after reboot.
- Timestamp quality: record ts_source (RTC/GNSS/other) and a simple ts_quality tier for traceability.
- De-dup friendliness: include event_type, pos_source, and fail_code buckets so upstream can reason without large payloads.
| Minimal payload fields (must-have) | Why it matters | Optional (nice-to-have) |
|---|---|---|
| event_id, event_type | Safe replay and de-dup across retries/reboots | boot_id, event_seq (if exposed) |
| event_time, ts_source, ts_quality | Traceability even without perfect sync | ts_uncertainty_bucket |
| pos_source, pos_quality | Explains degraded positioning (GNSS vs coarse) | ttff_bucket, fix_attempts |
| battery_bucket, temp_bucket | Correlates failures with power/thermal constraints | min_vbat_during_tx |
| radio_used, radio_fail_code | Fast root-cause grouping without stack deep-dive | rssi_bucket, overlap_counter |
Retry / commit pseudo rules (convert into firmware policy)
Critical events never drop. Retries resend the same event_id. Commit only after ack(event_id). Backoff on coverage/registration failures; delay on low-vbat; cap attempts; merge normal events in a window; on reboot, replay uncommitted events by priority.
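These rules map onto a small in-memory queue. The sketch below is illustrative only: `EventQueue` and its methods are hypothetical, the `event_id` is a plain counter (a real tag would likely combine it with a boot identifier), and persistence is left out. It also assumes critical events are always accepted, which means the reserved storage must be sized for the worst case.

```python
import itertools

class EventQueue:
    def __init__(self, limit=8):
        self.limit = limit
        self.pending = {}             # event_id -> record, insertion ordered
        self._ids = itertools.count(1)

    def record(self, event_type, critical=False):
        """Phase 1: store the event. Returns its event_id, or None if refused."""
        if len(self.pending) >= self.limit:
            victim = next((eid for eid, rec in self.pending.items()
                           if not rec["critical"]), None)
            if victim is not None:
                del self.pending[victim]   # oldest normal event is overwritten
            elif not critical:
                return None                # only critical events left: refuse
            # A critical event is still accepted when the queue is all critical.
        event_id = next(self._ids)
        self.pending[event_id] = {"event_type": event_type,
                                  "critical": critical, "attempts": 0}
        return event_id

    def next_to_send(self):
        """Critical events first, then oldest. Retries reuse the same ID."""
        if not self.pending:
            return None
        event_id = min(self.pending,
                       key=lambda eid: (not self.pending[eid]["critical"], eid))
        self.pending[event_id]["attempts"] += 1
        return event_id, self.pending[event_id]

    def ack(self, event_id):
        """Phase 2: commit. A duplicate ack is harmless and returns False."""
        return self.pending.pop(event_id, None) is not None

queue = EventQueue(limit=2)
motion_id = queue.record("motion")
tamper_id = queue.record("tamper", critical=True)
print(queue.next_to_send()[0] == tamper_id)  # critical event goes first
```

An event leaves the queue only through `ack`, so a lost acknowledgment results in a resend of the same `event_id` that upstream can de-duplicate, never in a silent drop.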
Minimal offline buffering and power-loss tolerance (tag-side):
- Ring log: fixed-size circular storage; normal events may be overwritten, critical events are protected by priority or reserved pages.
- CRC + sequence: each record includes a small integrity check so reboot recovery can skip torn writes.
- Brownout-aware path: on early low-voltage detection, write only minimal metadata (event_id + reason + counter) and exit gracefully.
- Recovery scan: at boot, scan for records missing commit markers and re-enqueue them with their original IDs.
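The record integrity check and the boot-time recovery scan can be illustrated with fixed-size records and a CRC32 trailer. The record layout here (kind, event_id, event_type, CRC) and the use of separate commit records are assumptions made for the sketch; a real ring log also needs wear handling and a policy for erased pages, which are not modeled.

```python
import struct
import zlib

BODY = struct.Struct("<BIB")       # kind, event_id, event_type (6 bytes)
REC_SIZE = BODY.size + 4           # body + CRC32 trailer
KIND_EVENT, KIND_COMMIT = 1, 2

def encode(kind, event_id, event_type=0):
    """Build one log record: packed body followed by its CRC32."""
    body = BODY.pack(kind, event_id, event_type)
    return body + struct.pack("<I", zlib.crc32(body))

def recover(log_bytes):
    """Boot-time scan: return (event_id, event_type) pairs that were
    recorded but never committed, skipping records whose CRC fails."""
    events, committed = {}, set()
    for offset in range(0, len(log_bytes) - REC_SIZE + 1, REC_SIZE):
        chunk = log_bytes[offset:offset + REC_SIZE]
        body = chunk[:BODY.size]
        (stored_crc,) = struct.unpack("<I", chunk[BODY.size:])
        if zlib.crc32(body) != stored_crc:
            continue               # torn write: ignore this record
        kind, event_id, event_type = BODY.unpack(body)
        if kind == KIND_EVENT:
            events[event_id] = event_type
        elif kind == KIND_COMMIT:
            committed.add(event_id)
    return [(eid, etype) for eid, etype in sorted(events.items())
            if eid not in committed]

torn = bytearray(encode(KIND_EVENT, 3, 7))
torn[1] ^= 0xFF                    # simulate a write interrupted by brownout
log = (encode(KIND_EVENT, 1, 7) + encode(KIND_EVENT, 2, 3)
       + encode(KIND_COMMIT, 1) + bytes(torn))
print(recover(log))
```

Event 1 is dropped from the replay list because its commit record is present, the torn record is skipped, and event 2 is re-enqueued with its original ID.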
Minimum observability (do not skip): vbat_min, temp, reset_reason, queue_depth, retry_count, gnss_fail_code, radio_fail_code, last_tx_radio.
H2-9 · Security & Provisioning: “Encryption” Alone Does Not Stop Cloning
Tag-side only: identity, secure boot, key storage, signed payload fields, provisioning states, and audit-friendly evidence logs.
Engineering target: Prevent cloning, reduce replay, and keep changes auditable using a small set of control points: root of trust, secure boot, non-exportable keys, and signed payloads with freshness fields.
1) Root of trust and secure boot (action-level)
A tag must start from an immutable first stage (boot ROM or equivalent). Each stage verifies the next stage before execution. If verification fails, the device enters a restricted mode that preserves evidence and blocks sensitive operations that would amplify risk.
- Chain verification: ROM → verified bootloader → verified application.
- Anti-rollback: enforce a monotonic version/counter so older images cannot be booted after an update.
- Failure policy: on verify failure, deny OTA activation and emit a minimal diagnostic record.
2) Identity and key storage (unclonable in practice)
Device identity must separate what can be public (a device identifier) from what must remain secret (private keys). The most robust “anti-clone” pattern for tags is to keep private keys non-exportable and perform signing inside a protected boundary such as a secure element or TPM-like block.
- Key injection: provision keys into fixed slots at manufacturing or controlled enrollment.
- Key usage: sign critical payloads and attest boot state using the protected key slots.
- Key rotation/revoke: support replacement of operational keys without exposing old keys.
3) Provisioning states (factory → bound → transfer → retire)
“Binding” should be modeled as a state machine. Each state must define what actions are allowed, what is denied, and what evidence must be logged for audit and incident response.
| State | Allowed actions | Denied actions | Must-log evidence |
|---|---|---|---|
| Factory | identity + key slot injection, basic self-test, enter enrollment | field operation, unrestricted debug, OTA activation | key_slot_ver, boot_state, inject_reason |
| Unclaimed | controlled enrollment / claim window, limited comms | re-key without authorization, permanent mode switches | claim_attempts, last_claim_code |
| Bound | normal operation, signed telemetry, controlled OTA | key export, downgrade boot, open debug paths | attest_code, rollback_counter, tamper_flag |
| Transfer | time-limited rebind, rotate operational keys | permanent unlock, bypass of secure boot | transfer_token_state, rotate_result |
| Retire | erase sensitive materials, disable identities | re-activation without factory procedures | erase_proof, retire_reason |
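One way to make the state table enforceable is a default-deny lookup that also writes an audit entry for every request. The action names below are shorthand invented for this sketch from the "Allowed actions" column; the transitions between states are deliberately not modeled.

```python
# Allowed actions per provisioning state; anything not listed is denied.
ALLOWED = {
    "factory": {"inject_keys", "self_test", "enter_enrollment"},
    "unclaimed": {"claim", "limited_comms"},
    "bound": {"operate", "signed_telemetry", "controlled_ota"},
    "transfer": {"rebind", "rotate_keys"},
    "retire": {"erase_secrets", "disable_identity"},
}

def authorize(state, action, audit_log):
    """Default-deny check that records every decision for later audit."""
    allowed = action in ALLOWED.get(state, set())
    audit_log.append({"state": state, "action": action, "allowed": allowed})
    return allowed

audit = []
print(authorize("factory", "controlled_ota", audit))  # denied in Factory
print(authorize("bound", "controlled_ota", audit))    # allowed once Bound
```

Logging denials as well as approvals is what turns the table into evidence: a burst of denied requests in the Unclaimed state is itself a signal worth reporting.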
4) Threat → control point → evidence (audit-friendly)
| Threat | Tag-side control points | Evidence / logs (minimum) |
|---|---|---|
| Clone | non-exportable keys; device-bound signing; provisioning state lock | key_slot_ver, attest_code, sig_fail_count |
| Replay | stable event_id; monotonic counter; freshness fields in payload | last_counter, replay_drop_count, last_event_id |
| Tamper | tamper latch/flags; restricted mode; signed “tamper event” | tamper_flag, enclosure_open_count, tamper_time_bucket |
| Storage readout | encrypted storage; keys not stored in plaintext; debug lock | storage_enc_state, debug_lock_state, boot_state |
| Downgrade | anti-rollback counter; secure boot enforcement | rollback_counter, boot_fail_reason, last_fail_stage |
Signed payload minimum fields: event_id, counter, event_type, event_time + ts_source/ts_quality, pos_source/pos_quality, fw_version, boot_state bucket.
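The role of the freshness fields can be shown with a signed payload and a receiver that rejects stale counters. Important assumption: this sketch uses HMAC-SHA256 with a shared key purely as a runnable stand-in. The design described above keeps a private key non-exportable inside a protected boundary and signs there, which this code does not do. `sign_payload` and `Verifier` are hypothetical names, and the verifier represents an upstream check, not tag firmware.

```python
import hashlib
import hmac
import json

def sign_payload(key, fields):
    """Serialize deterministically, then compute an authentication tag."""
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, tag

class Verifier:
    def __init__(self, key):
        self.key = key
        self.last_counter = 0
        self.sig_fail_count = 0
        self.replay_drop_count = 0

    def accept(self, body, tag):
        expected = hmac.new(self.key, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            self.sig_fail_count += 1      # forged or corrupted payload
            return False
        counter = json.loads(body)["counter"]
        if counter <= self.last_counter:
            self.replay_drop_count += 1   # valid signature, stale counter
            return False
        self.last_counter = counter
        return True

key = b"demo-key-not-for-production"
body, tag = sign_payload(key, {"event_id": 1, "counter": 1,
                               "event_type": "tamper", "fw_version": "1.0"})
verifier = Verifier(key)
print(verifier.accept(body, tag), verifier.accept(body, tag))
```

The second call fails even though the signature is valid, which is the point: encryption or signing alone does not stop a captured message from being replayed, while the monotonic counter does.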
H2-10 · OTA & Fleet Operations (Tag-side): Scale Requires Gating + A/B + Rollback
Tag-side only: eligibility gating, resumable download, verify, stage to inactive slot, atomic swap, health confirm, and rollback with metrics.
Non-negotiables: A scalable tag OTA needs eligibility gating (battery/temperature/motion), A/B slots (inactive staging), and automatic rollback on boot/health failure.
Eligibility gating (upgrade allowed ≠ upgrade possible)
The primary brick-risk drivers are low battery, brownouts during staging/swap, and unstable radio windows during download. Eligibility checks should be evaluated before any high-cost step and re-evaluated before swap.
OTA flow in 6 steps (each step has a failure handler)
| Step | Goal (tag-side) | Common failures | Required action + logs |
|---|---|---|---|
| 1) Gate | check battery/temp/motion/storage; reserve swap budget | low vbat, high temp, moving | deny & reschedule; log last_fail_step=1 + gating_code |
| 2) Download | resumable fetch into staging buffer | link down, timeout | backoff; save progress; log dl_time_bucket + signal_bucket |
| 3) Verify | signature + hash + manifest consistency | sig fail, hash mismatch | discard; increment verify_fail_count; block risky activation |
| 4) Stage | write to inactive slot + local integrity checks | write error, CRC fail | bounded retries; log stage_fail_code; stop if repeated |
| 5) Swap/Boot | atomic swap marker + reboot into new slot | boot loop, early crash | auto rollback to old slot; log reset_reason + boot_fail_code |
| 6) Health Confirm | self-check then commit new slot as “good” | health fail, watchdog resets | rollback + blacklist bucket; log confirm_fail_count |
Minimum fleet hooks: fw_current, fw_candidate, last_success_time, last_fail_step, last_fail_code, attempt_count, download_time_bucket, verify_time_bucket, swap_count, reset_reason.
Brownout / weak-battery resilience (tag-side):
- Separate budgets: long-tail download vs short critical staging/swap window.
- Re-check before swap: gating is evaluated again immediately before the atomic swap and reboot.
- Early low-voltage detection: exit staging cleanly; preserve progress and logs; avoid torn writes.
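The gate → verify → stage → swap → confirm sequence, including the re-check before swap and the rollback path, can be sketched as follows. This is a simplified model: the `Tag` class, the eligibility thresholds, and the callbacks are hypothetical, the download step is assumed to have completed already, and a bare SHA-256 comparison stands in for the full signature and manifest verification.

```python
import hashlib

class Tag:
    def __init__(self):
        self.slots = {"A": b"fw-1.0", "B": b""}
        self.active = "A"
        self.log = {"last_fail_step": 0, "swap_count": 0}

def eligible(vbat_v, temp_c, moving):
    # Hypothetical thresholds; real limits come from the battery and flash specs.
    return vbat_v >= 3.2 and 0 <= temp_c <= 45 and not moving

def ota(tag, image, expected_sha256, read_env, health_check):
    """read_env() -> (vbat_v, temp_c, moving); health_check(image) -> bool."""
    if not eligible(*read_env()):                 # step 1: gate
        tag.log["last_fail_step"] = 1
        return False
    # Step 2 (resumable download) is assumed done: image is already in hand.
    if hashlib.sha256(image).hexdigest() != expected_sha256:
        tag.log["last_fail_step"] = 3             # step 3: verify
        return False
    inactive = "B" if tag.active == "A" else "A"
    tag.slots[inactive] = image                   # step 4: stage to inactive slot
    if not eligible(*read_env()):                 # re-check right before swap
        tag.log["last_fail_step"] = 5
        return False
    previous, tag.active = tag.active, inactive   # step 5: swap
    tag.log["swap_count"] += 1
    if not health_check(tag.slots[tag.active]):   # step 6: health confirm
        tag.active = previous                     # rollback to the old slot
        tag.log["last_fail_step"] = 6
        return False
    return True

image = b"fw-1.1"
digest = hashlib.sha256(image).hexdigest()
tag = Tag()
print(ota(tag, image, digest, lambda: (3.6, 20, False), lambda img: True),
      tag.active)
```

The old image is never overwritten, so every failure path leaves a bootable slot, and `last_fail_step` tells fleet operations which stage to investigate.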
H2-11 · EMC, Ruggedness & Compliance (Tag): Where Compact Wireless Tags Fail Most Often
Practical tag-side guidance: stress → symptoms → suspects → verification → board fixes, plus “minimum evidence logs” to stop guesswork.
Reality check: Most field “mystery failures” are not random. They are repeatable outcomes of uncontrolled transient current paths, insufficient reset/recovery, and poor isolation between enclosure/entry points and sensitive RF/MCU domains.
Fast triage: symptoms that usually indicate a “path problem”
- ESD touch → freeze: MCU lockup, I²C stuck-low, or radio never recovers until battery pull.
- After drop → sleep current increases: leakage in protection parts, cracked solder joints, or power-domain instability.
- Low temperature → boot loops: battery droop + POR threshold mismatch + peak load (TX/GNSS/cold-start).
- Strong RF nearby → random resets / GNSS fails: coupling into power rails/clock/antenna matching and “false brownouts.”
Minimum evidence logs (tag-side): reset_reason_bucket (POR/BOR/WDT/lockup), brownout_count, min_vbat_bucket, rf_fail_code, gnss_fail_code, last_panic_step, uptime_bucket.
Use the same 4-step pattern for every field failure
Format: Symptom → Suspects → How to Verify → Board Fixes. This keeps EMC/ruggedness work actionable and avoids vague “add a TVS” advice.
Case A — “After ESD, the tag freezes or reboots.”
Symptom: touch ESD causes immediate freeze, reboot storm, radio stuck state, or sensors disappear until power-cycle.
Suspects (highest probability first):
- Bad return path: ESD current flows through sensitive ground/power instead of a short clamp-to-chassis path.
- TVS placement: clamp is too far from the entry point; the transient enters the board before clamping.
- Domain coupling: RF/GNSS/clock/power rails share a noisy return route with enclosure/IO entry.
- Insufficient recovery: no robust POR/WDT; lockup persists after the transient.
How to verify:
- Capture reset_reason_bucket and brownout_count around events; “freeze without reset” usually needs WDT strategy.
- Probe VBAT and primary rail during ESD; watch for droop and ringing that correlates with resets.
- Repeat ESD at multiple points (enclosure, connector, battery door) and compare which points are “worst.”
Board fixes:
- Make a short, direct clamp loop: entry → TVS → chassis/reference (avoid routing through RF/MCU return).
- Enforce RF/GNSS keepout: keep transient return currents away from antenna matching and LNA regions.
- Add/strengthen POR + watchdog so the device always returns to a known state after transients.
Case B — “After drop/vibration, sleep current becomes higher.”
Symptom: after impact, average current rises; battery life collapses; failures can be intermittent.
Suspects:
- Micro-cracks: solder joint cracks on battery contacts, passives, shielding cans, or protection diodes.
- Damaged protection parts: ESD diode arrays can become leaky after repeated stress.
- Contact instability: battery spring/contact bounce causes repeated brownouts and partial wake cycles.
- Antenna/matching shift: PA sees worse mismatch → higher TX current for the same link outcome.
How to verify:
- Measure current in modes: sleep → scan → position → transmit. Identify which state changed.
- Apply gentle flex/press at battery area/shield/connector while monitoring current.
- Thermal/cold spray can reveal crack-sensitive leakage changes.
Board fixes:
- Improve mechanical retention of battery contacts and shielding; reduce micro-motion paths.
- Place protection parts so failure modes do not leak into sensitive rails; consider higher-robustness ESD arrays.
- Partition rails with load switches so damaged subsystems can be isolated and diagnosed.
Case C — “Low temperature: boot fails or repeats reset.”
Symptom: at low temperature, startup fails; repeated resets; GNSS/cellular registration never completes.
Suspects:
- Battery internal resistance rises → VBAT droops during peak loads (cold-start, RF TX bursts).
- POR/BOR thresholds misaligned with real droop; false resets cascade.
- Peak scheduling: GNSS/cellular/radio all start at once; rails collapse.
How to verify:
- Log min_vbat_bucket and reset_reason_bucket at each boot attempt.
- Scope VBAT and primary rails during boot; confirm if the failure aligns with peak events.
Board fixes:
- Implement staged power-up: MCU minimal domain first → evaluate VBAT → then enable GNSS/cellular/radio sequentially.
- Use supervisor + BOR strategy to avoid ambiguous partial boot states.
- Add local energy buffering for short peaks (within tag constraints), plus “defer TX” policy at cold start.
Case D — “Strong RF nearby causes drops, GNSS fails, or random resets.”
Symptom: near radios/motors/chargers, link degrades; GNSS fails; sometimes the MCU resets unexpectedly.
Suspects:
- Coupling into rails: RF energy couples into power lines; causes false brownouts or digital upset.
- Antenna region contamination: noisy DCDC/clock currents near antenna matching or GNSS LNA.
- Insufficient filtering: no ferrite/RC partitioning between noisy and sensitive domains.
How to verify:
- Correlate rf_fail_code / gnss_fail_code with proximity scenarios.
- Measure rail noise under TX and in the presence of interferers; look for repeatable “failure windows.”
Board fixes:
- Strengthen domain filters (ferrite + local decoupling) and keep antenna region clean.
- Separate and control return currents so RF coupling does not modulate MCU reference.
Parts selection pointers (with example MPNs)
Example MPNs below are commonly used building blocks. Final selection must match the tag’s voltage, interface count, package constraints, and target stress level.
| Function | Where used (tag-side) | Example MPNs | Selection notes (actionable) |
|---|---|---|---|
| ESD diode (single line) | button lines, sensor I/O, low-speed GPIO near entry points | Nexperia PESD5V0S1UL; onsemi ESD9B5.0ST5G | Place at the entry; minimize loop area to reference/chassis; check leakage vs sleep current budget. |
| ESD array (multi-line) | I²C/SPI/UART headers, test pads, small connectors | TI TPD4E05U06; Littelfuse SP0503BAHTG; Semtech RClamp0524P | Prefer low capacitance for RF-adjacent lines; ensure return path does not traverse RF/MCU ground. |
| Power TVS (surge clamp) | battery input or charging input (if present) | Littelfuse SMAJ5.0A / SMBJ5.0A; Vishay SMBJ5.0A-E3/52 | Use only if the power entry is exposed; keep clamp loop short; verify leakage against the sleep-current budget, since the clamp sits on the always-on battery rail. |
| Ferrite bead (rail partition) | between noisy DCDC and RF/GNSS/AFE rails | Murata BLM18AG102SN1D; TDK MPZ1608S221A | Select impedance at the noise band; validate DC resistance vs peak current; pair with local decoupling. |
| Common-mode choke | USB/charging/data lines (if present), external cable interfaces | TDK ACM2012-900-2P; Murata DLW21SN900SQ2 | Use when cables exist; place close to connector; avoid adding series loss on critical low-voltage signaling. |
| Reset supervisor (POR/BOR) | MCU reset integrity under droop/ESD disturbances | TI TPS3839; Microchip MCP1316; Analog Devices/Maxim MAX809 | Pick threshold that matches real VBAT/rail droop; ensure clean reset deassert timing; log reset reason if possible. |
| Watchdog timer | recovery from ESD-induced lockup and stuck peripheral states | TI TPS3436; Analog Devices/Maxim MAX6369 | Hardware watchdog prevents “freeze until battery pull”; tune window to avoid nuisance resets during RF bursts. |
| Load switch | isolate RF/GNSS/sensors during faults; reduce leakage after damage | TI TPS22910A; TI TPS22916 | Use for domain gating and post-fault isolation; verify RON vs peak current and droop at cold temperature. |
| Low-C RF ESD | antenna feed protection (only if required by exposure) | Infineon ESD0P4RFL; Semtech RClamp0502B | Keep capacitance ultra-low; place at antenna entry; validate RF insertion loss and GNSS sensitivity impact. |
| Conformal coating (optional) | humidity/condensation robustness in harsh environments | HumiSeal 1B73; MG Chemicals 422B | Only where moisture is a real driver; avoid coating antenna keepout; verify reworkability and outgassing constraints. |
Compliance boundary (tag-side): focus on design constraints and repeatable self-checks. Avoid writing regulation clauses; instead document the tag’s recovery behavior, evidence logs, and “known-good” EMC layout rules.
H2-12 · FAQs (Asset Tracking Tag)
Tag-side answers only: power budget & power path, wake/event pipeline, GNSS and multi-radio strategy, data model reliability, security/provisioning, OTA robustness, and EMC/ruggedness recovery.