RTLS Security Tag & Anchor Hardware Design (UWB/BLE AoA)
RTLS security tags and anchors are only trustworthy when positioning accuracy is backed by measurable evidence—timestamp quality, RF multipath indicators, scheduling health, and power stability. This page shows how to design, test, and debug UWB/BLE AoA RTLS so results are repeatable, tamper-resistant, and auditable in the field.
H2-1. Definition & System Boundary: Tag vs Anchor vs Gateway
Intent: align every reader to the same “map” in under a minute—what counts as a Tag, what counts as an Anchor, what a Gateway/Controller does, and what is explicitly out of scope for this page.
Boundary rules (fast audit)
- Tag = mobile asset/person device. Primary constraint: battery life & stealth. Provides UWB packets / BLE advertisements; may initiate or respond to ranging.
- Anchor = fixed infrastructure node. Primary constraint: time integrity & coverage geometry. Produces trusted time stamps (UWB RX/TX events) and/or IQ samples (BLE AoA).
- Gateway/Controller = transport concentrator. This page only references “ingest & forward” (no cloud/platform architecture, no algorithm deep dive).
Owned responsibilities (what this page explains deeply)
- Tag hardware behaviors: duty-cycled radio activity (UWB/BLE), wake reasons, peak current events, anti-tamper triggers, secure identity.
- Anchor trust primitives: where the time stamp is taken, sync input/holdover path, timestamp jitter budget, anchor identity & anti-replacement signals.
- Inputs to positioning: ranging frames, time stamps, IQ samples—treated as engineering artifacts with measurable quality.
Referenced only (no deep expansion)
- Location engine: shown as an output consumer (it turns evidence into coordinates), but internal math/filters are out of scope.
- Network/PoE infrastructure: anchor may be PoE-powered, but only PD-side power-tree impact and measurement points appear here.
Typical topologies (what moves, what is trusted, what is measured)
Why boundary clarity matters in security RTLS: if identity, time stamps, or anti-tamper signals are not device-owned and verifiable, the final “position” can be spoofed or made non-repeatable. The page therefore treats time and trust as first-class engineering signals, not as software conveniences.
Evidence checklist (deployment inputs that must exist before design freeze)
These items become the “ground truth contract” used later by time-sync, calibration, validation, and field debug chapters.
| Evidence item | What it controls | How to quantify (device-side) | Common failure signature |
|---|---|---|---|
| Anchor geometry (density / height / spacing) | NLOS probability, dilution of precision, AoA baseline | Anchor map + "visible anchors per tag event" stats | Good in one zone, bad in another; errors cluster near metal/doors |
| Update-rate target (Hz per tag × tag count) | Air-time occupancy, scheduling pressure, peak current | Duty-cycle logs; packet collision/retry counters | Accuracy collapses only at high traffic; IQ buffer overruns appear |
| Error budget (time / phase / multipath) | Whether TDoA/AoA can be trusted | Timestamp jitter histogram; CIR quality; AoA phase residual | Distance stable but position drifts; angle jumps in reflective scenes |
| Threat model (spoof / replay / tamper) | Key policy, zeroize logic, trust flags | Replay counters; boot-state flags; tamper event count | "Correct" packets observed but identity cannot be proven |
H2-2. Positioning Methods: UWB TWR/TDoA vs BLE AoA (When & Why)
Intent: convert method names into engineering choices. Each method is judged by (1) who pays the energy cost, (2) who pays the synchronization/calibration cost, (3) the dominant error sources, and (4) the device-side evidence that proves or falsifies performance.
Selection checklist (use the same 4 questions for every method)
- Energy owner: does the Tag spend most power (TWR), or do Anchors spend most complexity (TDoA/AoA)?
- Trust owner: does the system rely on anchor time stamps (TDoA) or array phase calibration (AoA)?
- Dominant errors: time offset/jitter, multipath/NLOS, CFO/clock drift, or phase bias.
- Evidence to log: one counter/field per error class (time histograms, CIR quality, phase residual, retry/overrun).
UWB Two-Way Ranging (TWR): trade power for lower sync dependence
- Why it works: distance is derived from two-way packet timing; anchors do not need a tight shared time base to produce a valid range.
- Cost: Tag power increases (extra RX/TX windows, crypto per exchange); air-time scales with Tag count and update rate.
- Best fit: fewer Tags, stronger anti-spoof constraints, environments where infrastructure sync cannot be guaranteed.
- Evidence to capture: per-exchange retry count, ToF consistency, CIR quality vs range, peak current events during ranging bursts.
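The ToF consistency evidence above can be sanity-checked on the host with the standard asymmetric double-sided TWR formula. A minimal sketch, assuming host-side analysis of logged intervals; the timestamp names are illustrative, not any specific chip's register map:

```python
# Hedged sketch: asymmetric double-sided TWR (DS-TWR) time of flight
# from two round/reply interval pairs. Interval names are illustrative.

C = 299_702_547.0  # approx. speed of light in air, m/s

def ds_twr_tof(round1_s, reply1_s, round2_s, reply2_s):
    """Time of flight (s) from the standard DS-TWR combination."""
    ra, da = round1_s, reply1_s
    rb, db = round2_s, reply2_s
    return (ra * rb - da * db) / (ra + rb + da + db)

def ds_twr_distance_m(round1_s, reply1_s, round2_s, reply2_s):
    return ds_twr_tof(round1_s, reply1_s, round2_s, reply2_s) * C

# Example: construct intervals consistent with a 10 m link.
tof = 10.0 / C                 # ~33.4 ns one-way
reply = 300e-6                 # device turnaround time (illustrative)
round1 = reply + 2 * tof       # initiator-side round trip
round2 = reply + 2 * tof       # responder-side round trip
d = ds_twr_distance_m(round1, reply, round2, reply)  # ~10.0 m
```

The same function, run over logged per-exchange intervals, gives the ToF consistency series this bullet list asks you to capture.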
UWB Time Difference of Arrival (TDoA): trade sync complexity for Tag battery life
- Why it works: Tag can transmit short beacons; multiple anchors time-stamp the same packet; time differences yield geometry constraints.
- Cost: anchors must maintain a trusted time base; small time-offset errors directly map into large position errors.
- Best fit: large Tag populations, long battery targets, deployments where anchors can be powered and periodically verified.
- Evidence to capture: anchor time-offset & jitter histograms, holdover drift vs temperature, “visible anchors per event” stats.
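A back-of-envelope sketch of why anchor time integrity dominates TDoA: every nanosecond of pairwise sync offset is roughly 30 cm of range-difference error, before geometry (dilution of precision) amplifies it further:

```python
# Hedged sketch: anchor sync offset -> TDoA range-difference error.
# This is the error floor; DOP can make the position error larger.

C = 299_702_547.0  # approx. speed of light in air, m/s

def tdoa_range_error_m(sync_offset_s):
    """Range-difference error caused by a pairwise anchor time offset."""
    return sync_offset_s * C

err_1ns = tdoa_range_error_m(1e-9)  # ~0.30 m per 1 ns of offset
```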
BLE AoA (Angle of Arrival): trade array calibration for cost-effective directionality
- Why it works: AoA anchors sample IQ across an antenna array and infer direction from phase differences.
- Cost: requires stable array geometry and calibration; reflective scenes create phase ambiguity (multipath dominates).
- Best fit: “direction of approach” use-cases, doorway/zone determination, cost-sensitive anchors with controlled installation geometry.
- Evidence to capture: phase residual metrics, per-antenna gain/phase health, IQ buffer overrun counters, angle variance vs RSSI.
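The phase-to-angle relation behind AoA can be sketched for a two-element array; half-wavelength spacing keeps the arcsin unambiguous. A minimal sketch with illustrative values (real arrays need per-element calibration and ambiguity handling):

```python
import math

# Hedged sketch: two-element AoA, textbook relation
# theta = asin(dphi * lambda / (2*pi*d)). Names are illustrative.

def aoa_deg(phase_diff_rad, wavelength_m, spacing_m):
    s = phase_diff_rad * wavelength_m / (2 * math.pi * spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against noise-induced overflow
    return math.degrees(math.asin(s))

LAMBDA_24G = 0.125        # ~2.4 GHz wavelength, m
D = LAMBDA_24G / 2        # half-wavelength spacing avoids aliasing

theta0 = aoa_deg(0.0, LAMBDA_24G, D)  # broadside arrival -> 0 deg
# A 30-degree arrival produces dphi = 2*pi*d*sin(30deg)/lambda:
dphi = 2 * math.pi * D * math.sin(math.radians(30)) / LAMBDA_24G
theta30 = aoa_deg(dphi, LAMBDA_24G, D)
```

Any uncalibrated per-element phase offset enters `phase_diff_rad` directly, which is why phase residual is the first evidence field to log.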
Hybrid (UWB distance + BLE AoA angle): constrain AoA multipath with a range sanity check
- Why it helps: UWB range provides an independent constraint; AoA provides direction. When multipath inflates AoA variance, range gating rejects implausible angles.
- Device-side implication: both evidence streams must be time-aligned and tagged with trust quality (time stamp quality + IQ quality).
- Evidence to capture: matched time stamps between UWB events and AoA captures, range/angle consistency counters, NLOS flags.
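One hedged way to implement the range gating described above: reject an angle whose implied position would require implausible motion since the last accepted update. The 3 m/s walking-speed threshold is a placeholder a site would tune:

```python
import math

# Hedged sketch: gate an AoA angle with an independent UWB range by
# checking the speed implied by consecutive (range, angle) fixes.
# Threshold and names are illustrative.

def polar_to_xy(range_m, angle_deg):
    a = math.radians(angle_deg)
    return (range_m * math.cos(a), range_m * math.sin(a))

def gate_angle(prev_xy, range_m, angle_deg, dt_s, max_speed_mps=3.0):
    """Return (accept, xy). Reject if the implied speed is implausible."""
    xy = polar_to_xy(range_m, angle_deg)
    if prev_xy is None:
        return True, xy
    speed = math.dist(prev_xy, xy) / dt_s
    return speed <= max_speed_mps, xy

ok1, p1 = gate_angle(None, 5.0, 10.0, 0.1)
ok2, p2 = gate_angle(p1, 5.0, 12.0, 0.1)   # small angle step: plausible
ok3, p3 = gate_angle(p1, 5.0, 80.0, 0.1)   # multipath jump: rejected
```

Each rejection should increment a range/angle consistency counter so the gating itself becomes auditable evidence.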
Error sources → device-side evidence (turn “mystery drift” into measurable causes)
| Error source | Hits methods | What to measure (device-side) | What it looks like |
|---|---|---|---|
| Multipath / NLOS | TWR / TDoA / AoA | CIR quality, RSSI variance, “angle variance”, packet SNR flags | Ranges look “stable” but wrong; angles jump near metal & crowds |
| Clock drift / CFO | TWR / AoA (and TDoA via anchors) | Frequency-offset estimate, drift vs temperature, retry spikes | Errors correlate with temperature or long idle-to-wake intervals |
| Anchor sync offset | TDoA (primary) | Offset/jitter histogram, holdover drift, time-stamp capture latency | Whole zone shifts together; errors “move” when an anchor reboots |
| Array phase bias | AoA (primary) | Phase residual, per-antenna phase health, calibration version/CRC | Angle is consistently biased even with good RSSI (systematic error) |
| Scheduling contention | All (device-side) | IRQ latency, DMA overrun, IQ buffer overrun, crypto time budget | Only fails at high update rate; “good hardware” becomes inconsistent |
H2-3. RF Front-End & Antenna Architecture (UWB + BLE AoA Array)
Intent: explain why “bad positioning” is often not an algorithm problem. If the RF evidence (packets, CIR, IQ phase) is unstable, software can only amplify the instability. This chapter treats antennas, coupling, and the capture chain as the accuracy ceiling.
RF-first principle: Positioning math consumes evidence quality (CIR shape, phase consistency, timestamp stability). When evidence quality collapses (coupling, detune, multipath), accuracy collapses even if the algorithm is unchanged.
UWB chain (ranging evidence depends on spectrum + antenna + CIR)
- TRX + matching network: a poor match reduces link margin and makes CIR “noisy” (less reliable ToF peaks).
- Filter / SAW: sets out-of-band rejection; prevents adjacent radios and digital noise from lifting the noise floor.
- Antenna options (chip / PCB / external): choose by repeatability and detune risk (plastic, metal, human proximity).
- Spectral constraints: keep margin for mask/shape changes caused by layout, enclosure, and temperature (avoid “works in lab only”).
BLE AoA chain (angle evidence depends on multi-channel phase consistency)
- Antenna array geometry: spacing and polarization dominate systematic angle bias; enclosure metal creates strong reflections.
- RF switch + IQ sampling: channel-to-channel insertion loss and switching timing directly impact phase residual.
- Shielding & keep-outs: array needs stable near-field environment; ground return and nearby cables can create “moving reflectors.”
Shared antenna vs separated antennas (use measurable pass/fail rules)
Evidence to collect (turn “it drifts” into measurable RF causes)
| Evidence | Applies to | How to measure | Failure signature |
|---|---|---|---|
| S11 (antenna match) | UWB / BLE | VNA sweep in final enclosure; compare “hand/metal near” cases | RSSI margin collapses; CIR becomes inconsistent; AoA variance increases |
| S21 (coupling / isolation) | Shared / adjacent antennas | Measure coupling between UWB and BLE paths / array elements | Noise floor rises during bursts; phase residual grows with traffic |
| CIR statistics | UWB | Log CIR quality / peak stability across zones and time | Peak “walks” near reflective objects; range stable-but-wrong |
| AoA phase consistency | BLE AoA | Phase residual vs RSSI; per-channel gain/phase health checks | Systematic bias (constant offset) or jumpy angles (multipath + mismatch) |
| IQ overrun / DMA misses | BLE AoA | Runtime counters for buffer overrun, switching timing warnings | Works at low rate; fails at high update-rate due to contention |
H2-4. Time Sync & Timestamp Quality (Anchor-Side “Truth Clock”)
Intent: make time sync an engineering object—not a networking slogan. For TDoA and any fusion that depends on anchor time stamps, the anchor clock is the measurement ruler. If the ruler jitters, offsets, or drifts, the final position becomes noisy, biased, or time-dependent.
Three clock metrics that decide positioning quality
- Timestamp jitter (short-term): increases scatter/noise—results “jump” even when the tag is still.
- Sync offset (systematic): creates stable-but-wrong bias—whole zones can shift together.
- Holdover drift (lost-sync): errors accumulate with time/temperature—often appears after reboot or link loss.
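The three metrics add up as a simple timestamp error model. The 50 ppb / 60 s example below is illustrative, but it shows why holdover drift dwarfs a sub-nanosecond UWB timestamp budget:

```python
# Hedged sketch: the three clock metrics as additive timestamp error
# terms. Offset is constant bias, jitter is zero-mean scatter, and
# holdover drift grows with time since the last sync. Numbers illustrative.

def timestamp_error_s(offset_s, jitter_sample_s, drift_ppb, holdover_s):
    """Total timestamp error for one capture event."""
    drift_term = drift_ppb * 1e-9 * holdover_s
    return offset_s + jitter_sample_s + drift_term

# A 50 ppb TCXO in holdover for 60 s contributes 3 us of drift --
# orders of magnitude above a sub-ns UWB timestamp budget:
e = timestamp_error_s(offset_s=0.0, jitter_sample_s=0.0,
                      drift_ppb=50, holdover_s=60)
```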
Anchor internal clock chain (where errors are introduced)
- Clock source: TCXO for cost/power; OCXO for higher stability in critical anchors. Selection is driven by required holdover drift and allowed jitter.
- PLL / jitter cleaning: shapes and distributes the clock; poor configuration can trade spur/jitter for “looks stable” but harms timestamp quality.
- Divider / clock tree: routes clocks to timestamp hardware; gating and domain crossings can introduce capture latency variation.
- Timestamp unit: the capture point must be explicit (e.g., UWB RX/TX event edges). If capture point is ambiguous, root cause cannot be proven.
Sync sources (reference only; keep focus inside the anchor)
- Wired sync input (e.g., from local controller): preferred when physical infrastructure allows periodic verification.
- Over-air sync (UWB): can reduce wiring but still requires anchor-side integrity monitoring and holdover behavior.
- Boundary: this page does not cover network switch timing architecture—only the anchor’s sync ingest, monitoring, and holdover.
Evidence to log (make time integrity auditable in the field)
| Evidence field | Why it matters | Recommended view | Typical clue |
|---|---|---|---|
| offset (sync error) | Determines systematic bias in TDoA | Histogram + percentiles | All tags shift in same direction after anchor resync |
| jitter (short-term) | Determines noise floor / scatter | Histogram (spike detection) | Rare spikes cause sudden jumps during high traffic |
| holdover drift | Determines behavior when sync is lost | Drift vs temperature curve | Slowly worsening errors after reboot/temporary link loss |
| sync_state (locked/holdover) | Explains “why it changed” | Timeline events | Zone instability correlates with state transitions |
| capture_latency_var | Detects hidden timing path issues | Time-series trend | Latency variance rises with CPU load or clock reconfiguration |
H2-5. Security Model: Identity, Keys, Encryption, Replay Protection
Intent: in security RTLS, the goal is not only “can locate,” but “the location evidence cannot be forged.” Each ranging/angle/timestamp record must be bound to a verified device identity, a valid key generation, and a trusted firmware state.
Security RTLS requires:
- Authentic identity: tag/anchor identity is verifiable (not just a broadcast address).
- Protected evidence: UWB/BLE messages are integrity-protected and optionally encrypted.
- Replay resistance: captured packets cannot be reused to fake motion or presence.
- Trusted boot chain: firmware cannot be replaced or rolled back silently.
Identity model (device-side only)
- Device identity should be verifiable and stable across resets; avoid relying on public radio identifiers as the trust anchor.
- Anchor identity is higher impact (it defines the “reference ruler”), so it must expose a verifiable identity plus a trust state (e.g., boot verified, key generation, sync state).
- Boundary: certificate issuance and cloud PKI workflows are out of scope; only device-side storage, verification, and audit signals are covered.
Key strategy (what matters is lifecycle, not jargon)
Encryption & integrity for UWB and BLE payloads
- UWB ranging exchanges: protect the fields that define the measurement (challenge/response, timing-critical data). The critical requirement is tamper-evidence (integrity) and optional confidentiality.
- BLE advertising payload: treat it as public and recordable. If advertising carries identity or session evidence, bind it to a fresh nonce/counter and authenticate it (so it cannot be replayed).
- Binding: cryptographic protection must bind identity + message type + nonce/counter + timestamp (if present) to prevent cut-and-paste forgeries.
Replay protection (make it power-loss safe)
- Rolling counter / nonce: tag increments a counter (or derives a nonce) per message; anchors reject repeats or out-of-window values.
- Power-loss behavior: replay defense fails if counters reset. Use a protected monotonic counter (typically inside Secure Element/OTP) or a robust “resume” rule that cannot be exploited.
- Window + reason codes: use a replay window with explicit reject reasons (duplicate, stale, counter rollback, nonce reuse).
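A minimal sketch of the rolling-counter replay window with explicit reason codes like those suggested above. Persisting `last_ctr` across power loss (e.g., in the SE's monotonic counter) is assumed, not shown:

```python
# Hedged sketch: anchor-side replay filter. Window size and reason
# strings are illustrative; real deployments must persist last_ctr
# in tamper-resistant storage to survive power loss.

WINDOW = 64  # accept counters in (last_ctr, last_ctr + WINDOW]

class ReplayFilter:
    def __init__(self):
        self.last_ctr = 0

    def check(self, ctr):
        if ctr <= self.last_ctr:
            return "reject:duplicate_or_rollback"
        if ctr > self.last_ctr + WINDOW:
            return "reject:out_of_window"  # too far ahead: suspicious
        self.last_ctr = ctr
        return "accept"

rf = ReplayFilter()
r1 = rf.check(1)         # fresh counter -> accepted
r2 = rf.check(1)         # replayed packet -> rejected
r3 = rf.check(5)         # in-window skip (lost packets) -> accepted
r4 = rf.check(1005)      # far outside window -> rejected
```

Logging the reason string per reject is what later distinguishes a real replay attack from a counter reset after power loss.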
Secure boot, firmware signing, rollback protection
- Secure boot: verify the boot chain before radios produce security evidence. If verification fails, enter a fail-secure mode (no trusted positioning evidence).
- Firmware signing: signature verification must occur in a trusted stage (ROM/immutable bootloader). Log verification outcomes.
- Rollback protection: prevent loading older vulnerable images by enforcing a monotonic version counter.
Evidence to log (auditable security, not “trust me”)
| Evidence field | Collected by | Why it matters | Common failure clue |
|---|---|---|---|
| boot_state (verified/failed/recovery) | Tag / Anchor | Proves firmware authenticity for evidence production | “Works” but should be untrusted when verification fails |
| fw_version + rollback_counter | Tag / Anchor | Prevents silent downgrade to vulnerable firmware | Unexpected version regression after service/repair |
| key_version / key_id | Tag / Anchor | Audits key lifecycle and rotation | Key mismatch correlates with auth failures/rejects |
| replay_reject_count + reason | Anchor | Detects real replay attacks and counter rollback | A spike in rejects after power loss points to counter-reset risk |
| se_status (locked/error) | Tag / Anchor | Explains why crypto/zeroize/counters failed | Intermittent SE errors during bursts or brownouts |
H2-6. Anti-Tamper & Physical Attack Surfaces (Tag vs Anchor)
Intent: close real-world threats (open, block, replace, spoof, jam) with hardware-visible evidence and policy actions. Anti-tamper is a closed loop: trigger → detect → decide → zeroize/degrade → log. The system must fail-secure rather than silently producing forged “trusted” location evidence.
Practical threat classes (device-side closure):
- Open / probe: case open, port probing, sensor bypass attempts.
- Block / shield: intentional attenuation, enclosure manipulation, antenna detune.
- Replace: swapping anchors/tags to fake presence or move trust.
- Spoof / replay / jam: RF injection, replays, interference bursts.
Tag anti-tamper (mobile, near-field attack is common)
- Case-open switch: high-confidence trigger; requires mechanical integration and clear “open → untrusted” policy.
- Multi-sensor tamper: accelerometer/light/magnetic reed can cover bypass attempts; use thresholds to avoid transport/maintenance false alarms.
- Encapsulation / mesh: goal is not “never open,” but “opening is detectable and leaves evidence.”
- Zeroize policy: on tamper, erase secrets or lock them inside SE; then enter fail-secure mode (alarm-only or stop evidence).
Anchor anti-tamper (replace & port attacks are higher impact)
- Anti-replacement: anchor identity must not be clonable. Treat a new anchor as untrusted until it proves identity + trusted boot + key generation.
- Port protection: debug/maintenance interfaces should trigger state changes (port open → higher scrutiny / log events / limited function).
- Fail-secure behavior: if anchor trust is uncertain (tamper, boot fail, key error), mark outputs untrusted or stop producing trusted timestamps.
RF jamming/spoofing closure (evidence-based flags, device-side)
- RF anomaly indicators: sudden RSSI floor rise, CIR flattening, AoA variance explosion, or retry/CRC error spikes.
- Policy actions: mark measurement windows untrusted, throttle updates, and raise alarm events rather than outputting “normal-looking” but forged data.
- Audit requirement: record anomaly flags with timestamps to correlate with site incidents.
Evidence to record (prove the loop executed)
| Evidence field | Source | What it proves | Use in investigation |
|---|---|---|---|
| tamper_irq_count + tamper_reason | Tag / Anchor | Tamper detection actually fired | Differentiate real attack vs RF noise |
| zeroize_done (success flag) | Tag | Secrets were erased/locked as intended | Proves fail-secure closure, not just a warning |
| device_lock_state (locked/degraded/alarm-only) | Tag / Anchor | Post-tamper behavior is controlled | Explains missing updates or reduced trust |
| rf_anomaly_flags (RSSI/CIR/AoA) | Anchor | RF conditions indicate jam/spoof attempts | Correlate with site alarms and CCTV footage |
| port_state_events | Anchor | Physical interface access occurred | Identify probe attempts or maintenance windows |
H2-7. Power Architecture: Battery Tag vs PoE Anchor (Budget First)
Intent: design from a power budget, not from a feature wish-list. For tags, update rate and security features must fit an energy envelope. For anchors, supply transients must not distort timestamps or RF waveforms, otherwise “trust clocks” and ranging evidence degrade.
Power-first rules:
- Tag: positioning capability = allowed duty-cycle under battery + peak current limits.
- Anchor: positioning quality = rail stability during radio bursts and timestamp capture.
- Supply transients translate into evidence errors: rail dip → clock jitter/offset → timestamp drift; rail ripple → RF distortion → CIR/AoA quality loss.
Battery tag: build an “energy ledger” before choosing rails
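A ledger is just per-state current × dwell time over one update period. The states and numbers below are placeholders, not measured values; the point is that update rate and security features must fit this arithmetic before rails are chosen:

```python
# Hedged sketch: an "energy ledger" converting per-state current and
# dwell time into average current and battery life. All numbers are
# illustrative placeholders, not measurements.

def avg_current_ma(states, period_s):
    """states: list of (current_mA, dwell_s per period)."""
    charge_mAs = sum(i * t for i, t in states)
    return charge_mAs / period_s

def battery_life_days(capacity_mAh, avg_mA):
    return capacity_mAh / avg_mA / 24.0

# One 1 Hz update cycle: sleep, wake, UWB burst, crypto/log, BLE adv
cycle = [
    (0.005, 0.985),   # deep sleep
    (2.0,   0.005),   # wake + housekeeping
    (120.0, 0.004),   # UWB TX/RX burst (peak also stresses rails)
    (15.0,  0.004),   # crypto + log buffering
    (8.0,   0.002),   # BLE advertisement
]
avg = avg_current_ma(cycle, period_s=1.0)        # ~0.57 mA average
days = battery_life_days(220.0, avg)             # coin-cell-class capacity
```

Note how the UWB burst dominates the ledger even at 0.4% duty cycle; doubling the update rate roughly doubles that term.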
Tag rail partition: domain split is for transients + noise + controllable shutdown
- Always-on domain: Secure Element / monotonic counter / minimal RTC or wake logic (must survive short dips without state corruption).
- Radio burst domain: UWB/BLE rails sized for peak current and short load steps; isolation reduces burst noise coupling into clocks and ADC paths.
- MCU domain: can be gated aggressively; keep wake latency predictable if it participates in timestamping or AoA capture coordination.
- Buck vs LDO: the selection is driven by peak droop margin, PSRR/noise into timing/RF blocks, and efficiency under the real duty cycle.
PoE anchor: treat load steps as a timestamp/RF integrity risk
- PoE PD → DC/DC rails: define which rail supplies clock/PLL, which supplies radio front-end, and which supplies MCU/log. Integrity problems often occur when these rails share impedance.
- Hold-up (supercap / small battery): not for uptime. It is for power-loss logging and time/state hold so the anchor does not reboot into an untrustworthy timestamp regime.
- Transient chain: burst current → PD/load-step → rail droop/ripple → timestamp capture quality and RF pulse shape distortion.
Evidence to capture (prove the budget is real)
| Evidence | Where | What it reveals | Typical correlation |
|---|---|---|---|
| Current profile: sleep / scan / ranging / AoA / crypto / log | Tag | Energy ledger and peak load steps | Update-rate increases drive average & peak current up |
| Brownout / reset reason counters | Tag / Anchor | Supply margin failures under peaks | Timestamp drift or missing evidence after dips |
| PoE load-step waveforms: PD input + key rails | Anchor | Rail stability during bursts | Rail droop aligns with higher jitter/offset |
| Integrity proxy: jitter/offset stats + CIR quality | Anchor | Whether supply noise corrupts evidence | Bad rails → CIR quality drop and AoA variance rise |
| Hold-up status: time-kept + log-flushed flags | Anchor | Power-loss closure executed | Site outages still produce auditable logs |
H2-8. Scheduling & Coexistence: UWB + BLE AoA + Crypto + Logging
Intent: explain why higher update rates often “drift” even with the same RF hardware. The bottleneck is frequently scheduling: timestamp windows and IQ captures are real-time evidence. Interrupt latency, DMA contention, and storage writes can break evidence integrity and appear as positioning error.
Scheduling invariants:
- Hard real-time: UWB RX/TX event timing and BLE AoA IQ capture cannot be delayed without bias.
- Contention sources: interrupts, DMA, bus bandwidth, and flash writes collide at higher rates.
- Auditability: the system must expose counters/timing stats that prove where evidence was lost.
Timeslot model (queue the evidence producers)
- UWB ranging window: schedule TX/RX events so timestamp capture occurs with minimal jitter and stable ISR latency.
- BLE advertising + AoA IQ capture: AoA capture is more fragile than plain advertising because DMA and sample timing must not slip.
- Crypto: message authentication/encryption must fit inside slack time, otherwise it steals time from capture windows.
- Logging: log writes should be buffered and flushed outside capture windows; avoid flash writes during AoA or timestamp-critical edges.
Two dominant failure modes (and what to measure)
- Interrupt latency → timestamp bias: delayed ISR/service time shifts capture points. Measure worst-case IRQ latency and missed-event counters, not just average CPU load.
- DMA contention → IQ overrun: bus/DMA conflicts drop or delay IQ samples. Measure IQ buffer overrun, DMA error counts, and AoA variance spikes correlated to load.
Storage atomicity (device-side): calibration, key version, event logs
- Calibration data: must be written atomically (CRC + swap), but never at the expense of real-time capture. Prefer deferred commit after windows.
- Key versions/counters: atomic update protects replay defenses. Counter rollback due to partial writes creates a replay window.
- Event logs: use a RAM ring buffer + deferred flush. Log flush bursts should be measurable and rate-limited.
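The ring-buffer-plus-deferred-flush pattern can be sketched as follows; the "flash write" is a list append standing in for a real driver, and the drop counter makes evidence loss countable rather than silent:

```python
from collections import deque

# Hedged sketch: RAM ring buffer with deferred, rate-limited flush.
# flush() is meant to run only outside capture windows. Depth and
# flush limit are illustrative.

class EventLog:
    def __init__(self, depth=256, max_flush=32):
        self.ring = deque(maxlen=depth)  # oldest entries drop on overflow
        self.flash = []                  # stand-in for the flash driver
        self.max_flush = max_flush       # rate limit per flush call
        self.dropped = 0

    def log(self, event):
        if len(self.ring) == self.ring.maxlen:
            self.dropped += 1            # evidence loss becomes auditable
        self.ring.append(event)

    def flush(self):
        n = min(len(self.ring), self.max_flush)
        for _ in range(n):
            self.flash.append(self.ring.popleft())
        return n

evlog = EventLog()
for i in range(300):                     # burst past the ring depth
    evlog.log(i)
flushed = evlog.flush()                  # bounded flash work per call
```

The bounded `max_flush` is what keeps flush duration measurable and prevents a log burst from stalling the next AoA or timestamp window.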
Evidence counters (prove the scheduler is the root cause)
| Counter / metric | What it catches | Why it causes “drift” | First isolation step |
|---|---|---|---|
| IRQ latency (histogram + worst-case) | Timestamp service delays | Late timestamp capture looks like range error | Raise priority for timestamp path; reduce preemption |
| IQ overrun / dropped samples | AoA capture loss | Angle jumps and high variance under load | Reserve DMA/bus time for AoA; separate buffers |
| UWB missed response / retry counts | Timeslot miss and airtime collisions | Lower effective update rate and inconsistent timing | Increase guard time; reduce concurrent tasks |
| Crypto time (peak per packet) | CPU spikes | Steals slack from capture windows | Batch crypto in non-critical slots |
| Log flush duration + deferred writes | Flash write bursts | Bus/CPU stalls cause capture slip | Use ring buffer; flush outside AoA/UWB edges |
H2-9. Calibration & Production Test (AoA Phase / Ranging Bias / Time Offset)
Intent: turn “factory consistency” into a repeatable SOP. Without calibration, stable positioning is not achievable. This chapter defines what must be measured, how calibration is sealed and audited on-device, and how runtime applies parameters without breaking security or consistency.
Calibration principles:
- AoA: phase and gain mismatches (plus array geometry error) dominate angle stability.
- UWB: ranging bias and antenna delay dominate systematic distance error; CIR gating rejects bad evidence.
- Anchor time: offset and drift model dominate timestamp consistency across anchors.
- Device-side audit: every calibration set must be versioned, CRC-verified, and write-protected.
AoA calibration: phase, gain, and array geometry
- What to measure: reference angle sweeps (or fixed known angles) to estimate per-path phase and gain mismatch; validate residual phase error after calibration.
- What to store: `phase_offset[]`, `gain_trim[]`, and a compact `geometry_model` (or factory-measured spacing parameters). Add `band_id` and `temp_model_id` for runtime selection.
- Runtime apply: apply phase/gain corrections before AoA estimation; expose counters such as `aoa_apply_ok`, `aoa_phase_residual_p95`, `aoa_variance`.
UWB calibration: ranging bias, antenna delay, and CIR quality gates
- Ranging bias SOP: measure at known distances; compute a per-config bias (single offset) or a small fitted curve. Store as `uwb_bias_model`.
- Antenna delay SOP: measure board-specific TX/RX delays; store `tx_ant_delay` and `rx_ant_delay`.
- CIR gating: define "good evidence" thresholds from factory statistics (e.g., minimum CIR quality score). Expose `cir_quality` and `reject_reason` counters at runtime.
Anchor time calibration: offset initialization and temperature drift compensation
- Offset initialization: measure each anchor's timestamp offset to a reference during production; store `time_offset_init`.
- Drift model: store a compact temperature drift coefficient or LUT identifier as `drift_model`; expose `offset_hist_p95` and `jitter_hist` to prove stability.
- Apply point: apply corrections at the timestamp unit boundary (before evidence is emitted to the gateway boundary) and log apply status.
Calibration storage contract (sealed, CRC, and write-protected)
- Structured payload: header + payload + CRC. Header should include `cal_type`, `version`, `device_id_hash`, `created_ts`, and `valid_flag`.
- Atomic update: write new block → verify CRC → flip `valid_flag` (or A/B slots). Never partially overwrite the only valid set.
- Write-protect: runtime prevents accidental overwrite; service mode requires explicit authorization (device-side only).
- Audit fields: `cal_version`, `cal_crc`, `cal_apply_ok`, `cal_apply_fail_reason`, and `cal_update_count`.
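A hedged sketch of the sealed block: fixed header + payload + CRC32, with the verify step runtime runs before apply. The field layout only mirrors the header fields named above and is not a real product format:

```python
import binascii
import struct

# Hedged sketch: sealed calibration block = header + payload + CRC32.
# Layout is illustrative: cal_type (4 bytes), version, device_id_hash,
# created_ts, valid_flag, then payload, then trailing CRC.

HDR = struct.Struct("<4sHIIB")

def seal(cal_type, version, dev_hash, created_ts, payload):
    body = HDR.pack(cal_type, version, dev_hash, created_ts, 1) + payload
    return body + struct.pack("<I", binascii.crc32(body))

def verify(block):
    body, (crc,) = block[:-4], struct.unpack("<I", block[-4:])
    return binascii.crc32(body) == crc

blk = seal(b"AOA\x00", 3, 0xDEADBEEF, 1_700_000_000, b"\x01\x02\x03")
ok = verify(blk)                                    # intact block
bad = verify(blk[:-1] + bytes([blk[-1] ^ 0xFF]))    # corrupted CRC byte
```

In an A/B scheme, `seal` writes the inactive slot and `verify` must pass before the valid flag flips, so a power loss mid-write never destroys the only valid set.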
H2-10. Validation Metrics & Test Plan (Security RTLS-Focused)
Intent: define acceptance KPIs that are testable and auditable. Validation should align scenarios × metrics × thresholds × evidence fields so accuracy, latency, robustness, power, and security claims can be verified up to the gateway boundary.
Validation philosophy:
- Percentiles over averages: p50/p90/p95 define user-visible stability.
- NLOS and density are mandatory: multipath and collisions are the real operating regime.
- Security must be auditable: pass/fail requires logged evidence (reject reasons, boot states, tamper flags).
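To keep p50/p90/p95 comparable across sites, pin down the percentile definition itself; a nearest-rank sketch (sample values illustrative):

```python
import math

# Hedged sketch: nearest-rank percentile, so acceptance numbers are
# computed identically on every site and every capture.

def percentile(samples, p):
    """Nearest-rank percentile; p in (0, 100]."""
    s = sorted(samples)
    rank = math.ceil(p / 100.0 * len(s))
    return s[max(rank - 1, 0)]

errors_m = [0.12, 0.10, 0.09, 0.35, 0.11, 0.14, 0.13, 0.90, 0.10, 0.12]
p50 = percentile(errors_m, 50)   # median of the run
p95 = percentile(errors_m, 95)   # exposes the NLOS tail the mean hides
```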
Core KPI groups (gateway-boundary)
Acceptance test matrix (scenario × metric × threshold × evidence)
| Scenario | Metric | Acceptance threshold (example) | Evidence fields / counters | Notes (device-side) |
|---|---|---|---|---|
| Static LOS | Position error p95 | ≤ target spec (site-defined) | `pos_err_p95`, `cir_quality`, `aoa_variance` | Use consistent anchor geometry; record `cal_version` |
| Static NLOS (metal/occlusion) | Reject rate + bounded error | Reject bad evidence; error within limit when accepted | `reject_reason`, `cir_quality`, `range_retry` | Proof is "reject with reason", not silent drift |
| Dynamic walk | Track lag + p95 error | Lag ≤ budget; error within limit | `processing_ms`, `queue_depth`, `pos_err_p95` | Correlate with scheduler counters |
| High density (many tags) | Sustained update rate | ≥ required rate without collapse | `missed_slot`, `packet_loss`, `retry_count` | Check timeslot contention and airtime budget |
| Interference (busy 2.4 GHz) | AoA stability + IQ loss | `IQ_overrun` remains below threshold | `IQ_overrun`, `DMA_err`, `aoa_variance` | DMA/IRQ isolation should be visible |
| Power stress (tag burst / anchor load-step) | Brownout-free evidence | No brownout; stable jitter/offset | `brownout_count`, `reset_reason`, `jitter_hist` | Capture rail waveforms at key points |
| Replay attack | Reject with reason | 100% of replays rejected | `replay_reject_count`, `replay_reason`, `key_version` | Nonce/rolling window must be logged |
| Spoofed identity | Auth-fail behavior | Unauthenticated evidence rejected | `auth_fail`, `device_id`, `key_version` | Reject must not collapse the scheduler |
| Tamper (case open / probe) | Zeroize + alarm path | Tamper triggers policy + logs | `tamper_reason`, `zeroize_done`, `tamper_count` | Verify device-side fail-secure state |
| Firmware attack (rollback / unsigned) | Secure boot enforcement | Boot blocked or degraded, with logs | `boot_state`, `rollback_counter`, `fw_sig_fail` | Pass requires an auditable boot state |
Key measurement points (power/clock/RF/log) for repeatable validation
- Power points: tag PMIC output rails; anchor PD input and clock/radio rails during bursts.
- Clock/timestamp points: timestamp capture boundary (track jitter/offset histograms).
- RF evidence points: CIR quality, AoA IQ overrun/late capture, retry/missed response counters.
- Log points: diagnostic port or structured counters for `reject_reason`, `apply_ok`, and security events.
H2-11. Field Debug Playbook: Symptom → Evidence → Isolate → Fix
Intent: this is the page's most hands-on module—engineers should be able to diagnose fast using a strict evidence chain. Each symptom starts with the first 2 measurements, then a discriminator, then a first fix that can be verified immediately.
- Only two “first measurements” per symptom. Everything else is optional.
- Discriminator = proof: use histograms/counters, not guesses.
- First fix: one action + one expected improvement signal.
- Log and keep: every branch must name the evidence fields to save.
Symptom 1 — Large drift / unstable position
First 2 measurements (capture 60–120 s, static tag):
- Anchor-side timestamp jitter/offset histogram: `ts_jitter_hist`, `ts_offset_hist`, `ts_capture_late`
- UWB evidence quality: `cir_quality`, `reject_reason`, `range_retry`
Discriminator (pick the dominating pattern):
- Time-dominated drift: jitter/offset histogram widens, but CIR stays healthy → timestamp capture path or clock stability is suspect.
- RF-dominated drift: CIR quality drops, rejects increase → NLOS/multipath or antenna detune dominates.
- Scheduling-dominated “pseudo drift”: occasional jitter spikes align with missed_slot / DMA_err / IQ_overrun → contention/IRQ latency dominates.
Isolate (fast):
- Cut update rate by 50% for 2 minutes → if jitter spikes disappear, this is scheduling/CPU/DMA pressure.
- Repeat in a low-metal area + fixed tag orientation → if CIR improves sharply, this is RF/multipath dominated.
- Check whether ts_capture_late correlates with log_write_busy or crypto time counters.
First fix (choose one, then verify immediately):
- Time path: reduce timestamp capture latency (raise priority, shorten ISR, avoid blocking logs in critical slots). Expect ts_jitter_hist to narrow.
- RF path: tighten CIR gating (reject bad evidence with reasons). Expect reject_reason to rise but pos_err_p95 to shrink.
- Scheduling path: move log writes/crypto out of tight ranging windows. Expect missed_slot to drop and jitter spikes to vanish.
Log and keep: ts_jitter_hist, ts_offset_hist, ts_capture_late, cir_quality, reject_reason, range_retry, missed_slot, DMA_err, IQ_overrun
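The discriminator above can be reduced to a small decision rule. This sketch classifies a capture as time-, RF-, or scheduling-dominated from the saved counters; the thresholds are placeholders to tune per design, not recommendations.

```python
# Illustrative discriminator for Symptom 1; thresholds are placeholders.
def classify_drift(ts_jitter_p95_ns: float, cir_quality_mean: float,
                   missed_slot: int, iq_overrun: int) -> str:
    if missed_slot > 0 or iq_overrun > 0:
        return "scheduling"        # contention/IRQ/DMA pressure dominates
    if ts_jitter_p95_ns > 500 and cir_quality_mean > 0.8:
        return "time"              # jitter widens while CIR stays healthy
    if cir_quality_mean < 0.5:
        return "rf"                # NLOS/multipath or antenna detune
    return "inconclusive"          # gather more evidence before fixing
```

Checking scheduling counters first mirrors the playbook: a single missed slot is proof, whereas jitter and CIR statistics need a full capture window to separate.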
Symptom 2 — Distortion only in specific zones
First 2 measurements (repeat a fixed route 10×, label zones):
- NLOS / UWB quality: zone-wise cir_quality distribution and reject_rate (+ top reject_reason)
- AoA residual: aoa_phase_residual_p95 and its change vs orientation
Discriminator:
- NLOS-dominated: CIR degrades and rejects spike only in that zone → multipath/occlusion dominates distance evidence.
- AoA-array-dominated: AoA residual jumps and is highly orientation-sensitive → array shading/metal reflections dominate angle evidence.
- Both degrade: gating + calibration confidence are missing; the system outputs “bad evidence” instead of rejecting.
Isolate (fast):
- Change tag orientation (front/side/back) in the bad zone → if residual follows orientation strongly, AoA shading is primary.
- Shift anchor height or angle slightly (no topology change) → if the bad zone “moves”, multipath geometry dominates.
First fix:
- NLOS path: enforce CIR quality gate and confidence labeling. Expect lower accepted rate but tighter error percentiles.
- AoA path: redo AoA phase calibration (per channel group) and verify residual distribution recovery.
Log and keep: zone_id, cir_quality, reject_reason, reject_rate, aoa_phase_residual_p95, rssi_var
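A minimal sketch of the zone-wise aggregation: compute per-zone reject rate and AoA residual p95 from labeled route samples. The sample layout (one dict per measurement) is an assumption; the field names follow this page.

```python
from collections import defaultdict

def per_zone_stats(samples):
    """samples: iterable of {'zone_id', 'accepted': bool, 'aoa_residual_deg'}."""
    acc = defaultdict(lambda: {"n": 0, "rej": 0, "res": []})
    for s in samples:
        z = acc[s["zone_id"]]
        z["n"] += 1
        z["rej"] += 0 if s["accepted"] else 1
        z["res"].append(s["aoa_residual_deg"])
    # nearest-rank p95 over the per-zone residual samples
    return {zid: {"reject_rate": z["rej"] / z["n"],
                  "residual_p95": sorted(z["res"])[int(0.95 * (len(z["res"]) - 1))]}
            for zid, z in acc.items()}
```

Sorting zones by reject_rate vs residual_p95 separates NLOS-dominated zones (rejects spike) from AoA-array-dominated zones (residual spikes) at a glance.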
Symptom 3 — Battery drains too fast
First 2 measurements (run 1 hour, export summary):
- Wake reason histogram: wake_reason, wake_count, avg_awake_ms
- Duty cycle ledger: time-in-state for scan/ranging/adv/crypto/log → duty_scan, duty_ranging, duty_adv, duty_crypto, duty_log
Discriminator:
- Unexpected wakeups: wake reasons dominated by IRQ noise/tamper bounce/IO chatter → power loss is “event driven”.
- Over-aggressive schedule: duty cycle is higher than budget (update rate, retries, scan windows too dense).
- Crypto/log tax: awake time grows with security/logging tasks rather than RF airtime.
Isolate (fast):
- Disable one feature for 20 minutes (AoA capture or high-rate ranging) → confirm which duty bucket collapses.
- Compare avg_awake_ms before/after batching logs/crypto → confirm CPU tax.
First fix:
- Unexpected wakeups: debounce/tighten thresholds; collapse chatty interrupts. Expect wake histogram to shift toward “scheduled”.
- Schedule tax: enforce an explicit power budget (max retries, min adv interval, capped scan windows). Expect duty totals to meet target.
- Crypto/log tax: batch and defer heavy tasks outside critical RF windows. Expect avg_awake_ms to shrink.
Log and keep: wake_reason, wake_count, avg_awake_ms, duty_scan, duty_ranging, duty_adv, duty_crypto, duty_log
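To turn the duty-cycle ledger into a pass/fail power budget, a sketch like the following converts time-in-state into average current and battery life. The per-state currents and sleep current are placeholder assumptions, not part specifications.

```python
# Placeholder active currents per state (mA); replace with bench measurements.
STATE_CURRENT_MA = {
    "scan": 6.0, "ranging": 40.0, "adv": 5.0, "crypto": 8.0, "log": 3.0,
}
SLEEP_CURRENT_MA = 0.002  # placeholder deep-sleep floor

def avg_current_ma(duty: dict) -> float:
    """duty: fraction of time in each state, e.g. {'ranging': 0.01, ...}."""
    active = sum(STATE_CURRENT_MA[s] * d for s, d in duty.items())
    return active + SLEEP_CURRENT_MA * (1.0 - sum(duty.values()))

def battery_life_days(capacity_mah: float, duty: dict) -> float:
    return capacity_mah / avg_current_ma(duty) / 24.0
```

Running the exported duty ledger through this check makes "over-aggressive schedule" a numeric verdict: if battery_life_days falls below target, the duty buckets show exactly which state to cut.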
Symptom 4 — Location evidence is suspected to be forgeable
First 2 measurements:
- Key state: key_version, key_slot, key_state
- Replay protection: replay_reject_count, replay_reason, nonce_window
Discriminator:
- Replay protection not enforced: replay attempts do not increase reject counters → nonce/window is missing or not applied to all evidence types.
- Key rollback / mismatch: unexpected key version or slot used → evidence can be downgraded silently.
- Boot chain not verifiable (optional check): boot_state indicates non-verified, or rollback counter not monotonic.
Isolate (fast, auditable):
- Replay a captured valid message → must produce replay_reject_count++ with a specific replay_reason.
- Attempt key version downgrade (service mode) → device must reject or degrade with logs; key_version must remain monotonic.
- Attempt firmware rollback → verify boot_state + rollback_counter behavior is logged.
First fix:
- Make rejection mandatory and auditable: every untrusted evidence must be rejected with reason + counters. Expect security review to pass by logs, not by assumptions.
Log and keep: key_version, key_slot, key_state, replay_reject_count, replay_reason, nonce_window, boot_state, rollback_counter
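The replay-window behavior demanded above can be sketched as a monotonic counter with a small acceptance window; every rejection increments a counter with an explicit reason so the test is auditable. Window size and reason strings are assumptions; on a real device the high-water mark must live in non-rollback storage.

```python
class ReplayGuard:
    """Illustrative sliding-window replay check over a message counter."""
    def __init__(self, window: int = 16):
        self.high = -1          # highest counter accepted (persist in
        self.window = window    # non-rollback NVM/secure element on device)
        self.seen = set()       # counters already accepted inside the window
        self.replay_reject_count = 0

    def check(self, counter: int):
        if counter > self.high:
            # advance the window, dropping counters that fall out of it
            self.seen = {c for c in self.seen if c > counter - self.window}
            self.seen.add(counter)
            self.high = counter
            return (True, None)
        if counter <= self.high - self.window:
            self.replay_reject_count += 1
            return (False, "outside_window")
        if counter in self.seen:
            self.replay_reject_count += 1
            return (False, "duplicate_counter")
        self.seen.add(counter)  # late but in-window, first sighting
        return (True, None)
```

Replaying any accepted message must hit the "duplicate_counter" branch, which is exactly the replay_reject_count++ plus replay_reason evidence the isolate step requires.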
Concrete BOM / MPN examples (use as reference options)
These are commonly used parts for RTLS tags/anchors. Select based on region/certification, power, and RF front-end requirements.
- UWB transceiver: Qorvo/Decawave DW3000 (UWB), DW3110 module family (design-dependent)
- BLE (AoA capable, IQ sampling support depends on SoC): Nordic nRF52833, nRF52840 (direction finding support in Nordic DF stack)
- Secure element (key storage / attestation): Microchip ATECC608B, NXP SE050
- MCU (anchor/tag control): ST STM32U5 (low-power + security features), ST STM32H5 (performance/security)
- Clock / TCXO: Abracon ASTX-H11 (TCXO family), Epson TG-2520 (XO family; select ppm/temp spec as needed)
- RF switch (AoA antenna array): pSemi PE4259 (example RF switch), Skyworks SKY13418 family (band dependent)
- LNA (if external is needed): Qorvo QPL9057 (example LNA family; verify band/noise figure needs)
- Tag PMIC / buck: TI TPS62740 (ultra-low-power buck), TI TPS62840 (high-efficiency buck family)
- Battery charger (tag): TI BQ25120A (small Li-Ion charger family), Analog Devices LTC4054 (charger family)
- eFuse / load switch (anchor rails): TI TPS2595 eFuse family, Analog Devices LTC4365 surge stopper family (use-case dependent)
- Nonvolatile for counters/logs (device-side): Cypress/Infineon FM24CL64B FRAM, Winbond W25Q64 QSPI NOR flash
- Tamper sensors: Omron D2F series micro-switch (case open), Littelfuse 590xx reed switch family (magnetic), Vishay VEML6030 (ambient light sensor)
- ESD/TVS (I/O protection): Nexperia PESD5V family, Littelfuse SMF series TVS (select for interface)
Tip: tie each “first fix” to a measurable outcome (jitter histogram narrows, reject_reason rises but p95 error shrinks, wake histogram shifts, replay rejects become 100% with reasons).
H2-12. FAQs (Evidence-based, no scope creep)
UWB ranging looks stable but position still drifts — suspect time sync or multipath first?
Start with two signals: anchor timestamp stability (ts_jitter_hist / ts_offset_hist) and UWB evidence quality (cir_quality with reject_reason). If jitter/offset widens while CIR stays healthy, treat it as time capture/sync holdover pressure. If CIR degrades or rejects spike, treat it as multipath/NLOS and gate low-quality ranges first.
AoA angle jumps wildly — is it array calibration or metal reflections?
Measure AoA phase residual (aoa_phase_residual_p95) and its dependence on tag orientation or zone. If residual stays high everywhere and does not correlate with specific geometry, suspect calibration version/apply integrity (phase/gain mismatch). If residual spikes only near metal or with certain orientations, suspect reflections/shadowing. First fix is re-calibration with version lock, then confidence gating in reflective zones.
It only gets worse at high foot traffic — scheduling conflict or over-the-air collisions?
Check device-side pressure counters (missed_slot, IQ_overrun, DMA_err) and air-interface stress (range_retry or packet error rate). If overruns rise with the error, it is scheduling/IRQ/DMA contention; move crypto/log writes out of tight slots and give IQ capture priority. If retries rise without overruns, it is airtime contention; cap retries and randomize timing.
Tag battery life is far below expectation — look at wake reasons or current spikes first?
Start with the wake reason histogram (wake_reason + counts) and a duty-cycle ledger (time in scan/ranging/adv/crypto/log). If “unexpected” wakes dominate, fix debounce/thresholds and reduce chatty interrupts. If duty is simply too high, reduce update rate and retries. If awake time is long, batch crypto/log work outside RF windows and verify avg_awake_ms drops.
PoE anchor occasionally reboots and the whole area drifts — which two waveforms first?
Capture (1) the PoE-PD downstream rail at the radio/MCU domain with PG/RESET and brownout counters, and (2) the clock/timestamp health around the event (PLL_lock if available, plus a step in ts_offset). If rails dip before reset, treat it as power transient/hold-up. If rails stay solid but offset jumps, treat it as clock/timestamp disruption and isolate capture timing.
Payload is “encrypted” but still replayable — where should nonce/counters live?
Verify replay enforcement by observing replay_reject_count growth and a specific replay_reason during a replay test. Then validate counter monotonicity across reset/power loss (no rollback). Nonce/window state must be bound to a monotonic counter stored in a non-rollback device-side trust anchor (secure element or protected NVM). First fix: mandatory reject + reason + counter increment for every evidence type.
After opening the enclosure, the device still “works” — how should tamper policy behave?
Confirm the tamper chain is real: tamper_latch_state must change and remain latched, and the security state must reflect it (key_state, zeroize_success, and optionally boot_state). If tamper triggers but keys remain usable, the policy is not closed-loop. First fix: fail-secure behavior—zeroize or lock keys, stop producing trusted ranging results, and mark evidence as untrusted with auditable logs.
Calibration sometimes “fails” after writing — CRC passes but behavior is wrong; what next?
Check the calibration version/commit record (cal_version, cal_crc, and an atomic “commit flag”) and correlate with runtime apply timing (cal_apply_fail or slot-conflict counters). CRC can pass on a stale or partially switched record if the commit is not atomic. First fix: double-buffer calibration, write-verify, then pointer-swap + lock; apply only in a non-critical timeslot and log the active version.
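The double-buffer + pointer-swap commit described in this answer can be sketched as follows, with a dict standing in for NVM. The record layout and key names are illustrative; the point is that only the active-slot pointer flips, and only after write-verify passes.

```python
import zlib

def write_calibration(nvm: dict, blob: bytes, version: int) -> bool:
    """Write to the inactive slot, verify, then atomically swap the pointer."""
    inactive = 1 - nvm.get("active_slot", 0)
    nvm[f"slot{inactive}"] = {"blob": blob, "cal_version": version,
                              "cal_crc": zlib.crc32(blob)}
    # write-verify: re-read the inactive slot before touching the pointer
    back = nvm[f"slot{inactive}"]
    if zlib.crc32(back["blob"]) != back["cal_crc"]:
        return False               # old slot stays active; no partial state
    nvm["active_slot"] = inactive  # the only "commit" is this pointer swap
    return True

def active_calibration(nvm: dict) -> dict:
    return nvm[f"slot{nvm['active_slot']}"]
```

Because the previous record survives in the other slot, a power loss mid-write leaves the device on a known-good calibration rather than a CRC-valid but half-switched one.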
After replacing an anchor, the entire network shifts — is it location binding or certificate binding?
Inspect identity continuity (anchor_id, cert_fingerprint, key_version) and the time baseline (ts_offset convergence after swap). If identity or key version differs from policy, treat it as a trust-binding issue and prevent “silent replacement.” If identity is correct but offset baseline changes or fails to settle, treat it as time sync/holdover quality. First fix: enforce auditable replacement + release only after offset/jitter passes thresholds.
UWB distance looks OK, but fused position gets worse — what evidence should drive weights/confidence?
Base confidence on measurable evidence, not distance alone: cir_quality, NLOS flags, and retry/fail rates should drive down-weighting or rejection. Validate the rule against the acceptance matrix (scenario × KPI × threshold) so it is testable. First fix: implement a minimal evidence-to-confidence mapping (good CIR → accept; low CIR/NLOS → down-weight or reject) and log the confidence used per update to debug regressions.
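A minimal evidence-to-confidence mapping of the kind suggested here can look like the sketch below; the thresholds and weight values are placeholders to be validated against the acceptance matrix, not recommendations.

```python
def range_confidence(cir_quality: float, nlos: bool, retries: int):
    """Return (weight, label) for one range; weight 0.0 means rejected."""
    if cir_quality < 0.3 or retries >= 3:
        return (0.0, "reject")        # log reject_reason alongside this
    if nlos or cir_quality < 0.6:
        return (0.3, "down_weight")   # usable but low-trust evidence
    return (1.0, "accept")
```

Logging the (weight, label) pair per update is what makes fusion regressions debuggable: a worsening fused position with unchanged labels points at the engine, while shifting labels point at the evidence.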
Accuracy worsened after firmware update — did RF parameters change or scheduling change?
Compare two deltas: RF evidence distribution (cir_quality, reject_rate, aoa_phase_residual_p95) and scheduling pressure (missed_slot, IQ_overrun, crypto/log busy time). If RF distributions shift while scheduling stays clean, suspect RF config or calibration apply changes. If overruns/latency rises with similar RF stats, suspect scheduling priorities/DMA conflicts. First fix is to restore the stable slot plan and then re-validate RF baselines.
How to validate “forgeable” risk with minimal equipment?
Use three auditable tests: replay (must increment replay_reject_count with a clear replay_reason), key downgrade attempt (must keep key_version monotonic), and firmware rollback attempt (must reflect in boot_state/rollback_counter). Save logs using the H2-11 template (symptom → 2 measurements → discriminator → first fix). First fix for any failure is mandatory reject + reason + counter increments, then retest.