Visitor Kiosk & Turnstile Hardware Architecture
Core idea: A visitor kiosk + turnstile works only when the full chain is deterministic and auditable—capture → verify → policy decision → I/O actuation → sensor feedback → trusted logs. Most “it recognizes but won’t open” failures reduce to one of four measurable causes: policy/cache mismatch, timing/QoS jitter, power brownout, or safety interlock feedback.
H2-1. Definition & Boundary (What this page is / is not)
Working definition (engineering-grade)
A visitor kiosk + turnstile controller is an edge security device that fuses multi-modal identity inputs (face / ID scan / ticket / NFC), applies local policy, drives an actuator with safety interlocks and feedback, and emits auditable event records that can be validated in the field.
What this page solves (3 measurable outcomes)
- Input aggregation clarity: what data enters the device, where it is preprocessed, and what minimal “identity claim” exits the recognition stack.
- Actuation closed-loop reliability: the open/deny command is not enough—this page defines interlocks, feedback signals, and “final state” confirmation.
- Evidence chain you can debug: every stage has observable counters/fields (timestamps, latency, reason codes, interlock states) to isolate failures quickly.
Not in scope (to prevent cross-page overlap)
- VMS/NVR / recording platforms: multi-stream ingest, recording integrity, and compliance belong to the Recording & Video Platforms cluster.
- Whole-building access system design: panel topology and building-wide wiring belong to Access Control Panel.
- Protocol deep dives: OSDP/EMV/TLS stacks are referenced only as interface boundaries (no walkthroughs).
- ISP/algorithm tuning: this page stays at hardware evidence level (sync, saturation flags, counters), not image-quality recipes.
Internal links (examples): Access Control Panel · Prox/IC/QR/NFC Reader · Face Access Controller · Power & Backup for Security
Source: ICNavigator / Security & Surveillance
H2-2. System Architecture Overview (Data path + control path)
The fastest way to prevent “scope drift” in kiosk/turnstile design is to split the system into two observable pipelines: an Identity Pipeline (who is this?) and an Actuation Pipeline (what physical state did we actually achieve?). Both pipelines must converge at a policy decision point and end with an auditable event record.
Identity pipeline (capture → claim)
- Capture: RGB/IR/depth frames, ID image, QR/ticket scan, NFC tap.
- Preprocess: normalization, ROI extraction, basic quality gates (blur/glare/IR saturation flags).
- Infer/Decode: NPU inference (face) and decode engines (QR/MRZ/barcode).
- Verify/Match: compare to local templates/caches; compute confidence and failure reason codes.
- Output: an identity claim (credential_id / user_id / ticket_id + confidence + reason_code).
Actuation pipeline (decision → confirmed physical state)
- Policy decision: offline/online state, cache freshness, schedule rules, deny/allow modes.
- Safety interlocks: emergency override, fire input, tamper, anti-tailgating beam, rotation sensor readiness.
- Drive: relay/driver command, timing window, retry policy, timeout policy.
- Feedback: door contact / rotation pulses / jam detect / E-stop state.
- Output: a final state (opened / denied / timeout / reversed / forced-open) with timestamps.
Evidence card (minimum measurable checkpoints)
A kiosk/turnstile system is only debuggable if each stage emits small, stable counters. The goal is not “more logs”; it is two or three decisive signals per stage that isolate root cause.
- Capture: frame_timestamp, dropped_frame_count, exposure/IR_saturation_flag
- Inference/Decode: inference_latency_ms, npu_util_pct, model_version
- Decision: policy_version, offline_flag, cache_hit, reason_code
- Actuation: cmd_timestamp, interlock_state_mask, timeout_count
- Feedback: rotation_pulses, door_contact_state, jam_flag
- Audit: event_id, monotonic_seq, final_state, local_log_write_ok
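As an illustration of the evidence card, the checkpoints can be folded into one compact record per access attempt. This is a minimal sketch, not a defined log format: the `AuditEvent` class, the `_seq` counter, the `to_log_line` helper, and the choice of JSON are all assumptions; only the field names come from the list above.

```python
import itertools
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical process-wide counter; a real device would persist it across reboots.
_seq = itertools.count(1)


@dataclass
class AuditEvent:
    event_id: str
    final_state: str                  # opened / denied / timeout / reversed / forced-open
    policy_version: str
    offline_flag: bool
    cache_hit: bool
    reason_code: str
    interlock_state_mask: int
    cmd_timestamp_ms: Optional[int]   # None when no actuator command was issued
    rotation_pulses: int
    monotonic_seq: int = field(default_factory=lambda: next(_seq))


def to_log_line(ev: AuditEvent) -> str:
    """Serialize with sorted keys so log lines stay stable and easy to diff."""
    return json.dumps(asdict(ev), sort_keys=True)
```

A gap in `monotonic_seq` between two stored records is then direct evidence of a lost log write.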
H2-3. Recognition Compute (SoC/NPU sizing & latency budgeting)
The kiosk must keep access decisions deterministic while running multi-modal perception. This chapter focuses on edge compute sizing and a latency budget that can be verified with counters and timestamps—without drifting into cloud workflows or ML theory.
3.1 Input streams and peak concurrency
- Face stream: RGB camera (optionally paired with IR/depth). Define target FPS and resolution for the capture distance.
- Document stream: ID/document camera optimized for glare and MRZ/QR readability.
- Code scan: QR/barcode imager stream (continuous or triggered).
- Optional biometric: fingerprint AFE or other sensor paths should be treated as separate real-time lanes.
3.2 Latency budget (capture → decode → infer → decide → actuate)
Budget by stages so each failure mode maps to a measurable checkpoint. A practical design goal is: fast “first decision” under normal load and bounded worst-case under contention.
| Stage | Target (typ.) | What to log / measure | Common failure signature |
|---|---|---|---|
| Capture | 1–2 frames | frame_ts, exposure_ts, dropped_frames | dropped_frames spike under load |
| Decode / Preprocess | single-digit ms | decode_ms, resize_ms, memory_bw | DRAM bandwidth saturation |
| Infer | tens of ms | npu_ms, thermal_state, throttling_count | thermal throttle increases tail latency |
| Decision | few ms | decision_code, policy_version, offline_flag | policy mismatch or stale cache |
| I/O Command | few ms | io_cmd_ts, ack_ts, retry_count | bus retries or isolator brownout |
| Actuator Response | bounded by mechanics | sensor_feedback_ts, rotation_counts | interlock triggers / tailgating beam events |
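The budget table can be checked mechanically once every stage logs an end timestamp. The sketch below is one possible way to do it; the millisecond budgets in `STAGE_BUDGET_MS` are illustrative assumptions (for example, 66 ms stands in for two frames at 30 FPS), and the stage keys and function names are hypothetical.

```python
# Illustrative per-stage budgets in milliseconds (assumed, not taken from the table).
STAGE_BUDGET_MS = {
    "capture": 66.0,
    "decode": 9.0,
    "infer": 80.0,
    "decision": 5.0,
    "io_command": 5.0,
}
STAGE_ORDER = ["capture", "decode", "infer", "decision", "io_command"]


def stage_latencies(ts_ms):
    """ts_ms holds a 'start' timestamp plus one end timestamp per stage (ms)."""
    latencies = {}
    prev = ts_ms["start"]
    for stage in STAGE_ORDER:
        latencies[stage] = ts_ms[stage] - prev
        prev = ts_ms[stage]
    return latencies


def over_budget(ts_ms):
    """Return the stages whose measured latency exceeds the assumed budget."""
    latencies = stage_latencies(ts_ms)
    return [s for s in STAGE_ORDER if latencies[s] > STAGE_BUDGET_MS[s]]
```

For an attempt with end timestamps of 33, 40, 160, 163 and 166 ms, only the inference stage (120 ms) exceeds its assumed budget, which points at the NPU or thermal state rather than at I/O.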
3.3 Memory and storage sizing (models, templates, local caches, logs)
- Model footprint: size affects DRAM pressure and sustained thermal load; track model_version and memory headroom.
- Template/identifier store: enforce versioning and bounded growth; store only what is required for offline decisions.
- Audit log retention: ring buffer size must cover expected offline duration and burst events.
3.4 Determinism under load (watchdogs + QoS separation)
- Real-time lane: decision + actuator I/O must remain predictable even if inference slows.
- Watchdog strategy: detect deadlocks (NPU stall, bus hang) and recover without corrupting logs.
- Backpressure policy: cap inference concurrency and degrade gracefully (e.g., drop to lower FPS) rather than destabilizing actuation.
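One way to express the backpressure policy is a bounded frame queue that drops the oldest frame instead of growing without limit. This is a sketch under assumptions: the `InferenceQueue` class and the default depth of 2 are hypothetical, and a real design would run it alongside a separate high-priority actuation task.

```python
from collections import deque


class InferenceQueue:
    """Bounded queue: the newest frame wins, the oldest frame is dropped and counted."""

    def __init__(self, max_depth=2):
        self._q = deque()
        self.max_depth = max_depth
        self.dropped_frame_count = 0

    def push(self, frame):
        if len(self._q) >= self.max_depth:
            self._q.popleft()               # discard the stalest frame
            self.dropped_frame_count += 1   # keep the drop visible as evidence
        self._q.append(frame)

    def pop(self):
        """Return the oldest queued frame, or None when the queue is empty."""
        return self._q.popleft() if self._q else None
```

Dropping stale frames keeps recognition latency bounded, and `dropped_frame_count` records that degradation happened instead of hiding it.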
H2-4. Imaging & Liveness Front-End (RGB/IR/depth + lighting drivers)
Reliable face access depends less on algorithm theory and more on a stable capture chain: sensor choice, lighting synchronization, geometry, and observable saturation/quality indicators. This chapter describes the signal chain and timing coordination required for liveness and repeatable capture.
4.1 Sensor set (RGB + IR, optional depth)
- RGB camera: primary identity capture; define minimum SNR at target distance and typical lobby lighting.
- IR camera / IR band: improves low-light consistency and supports active illumination.
- Optional depth module: structured-light or ToF used as a system-level block; the key is timing and power domains, not ISP internals.
4.2 Liveness lighting drivers (IR flood / white LED flash)
- IR flood driver: gated illumination requires sync with exposure and rolling shutter behavior.
- White LED flash: can improve document/QR capture; must manage flicker, inrush, and EMI coupling into sensors.
- Trigger path: a single “capture window” should coordinate LED on/off, exposure start/end, and inference scheduling.
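The single capture window can be reduced to one check: the illuminator must be on for the whole exposure. The helper below is a hedged sketch; the function name mirrors the illum_sync_ok flag used later on this page, while the parameters and the margin are assumptions. For a rolling-shutter sensor, the exposure window should span from the first row's exposure start to the last row's exposure end.

```python
def illum_sync_ok(led_on_us, led_off_us, exp_start_us, exp_end_us, margin_us=0):
    """True when the LED is on for the entire exposure window, with margin on both sides.

    All arguments are timestamps in microseconds on the same clock.
    """
    led_on_before_exposure = led_on_us + margin_us <= exp_start_us
    led_off_after_exposure = exp_end_us + margin_us <= led_off_us
    return led_on_before_exposure and led_off_after_exposure
```

With a 0 to 1000 µs LED pulse and a 50 µs margin, an exposure of 100 to 900 µs passes, while one that ends at 980 µs fails because the LED turns off inside the margin.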
4.3 Mechanical constraints (FOV, stand-off distance, anti-spoof angles)
- Placement: choose FOV for typical standing distance and height variation; avoid extreme angles that increase spoof risk.
- Glare and ambient IR: lobby sunlight and glossy surfaces can saturate IR; include a saturation indicator in logs.
- EMI/grounding proximity: keep lighting current loops away from sensor and high-speed lanes.
4.4 Evidence counters (capture quality is measurable)
H2-5. ID / Ticket Scan Subsystem (document, QR, barcode, printer)
A kiosk is “complete” only when it can robustly ingest tickets/IDs and output receipts or visitor passes. The design target is not feature richness—it is failure-proof I/O with measurable success rates and jam/paper-out evidence.
5.1 Document scan (camera + illumination at block level)
- Document camera module: choose optics and distance for ID cards and passports; prioritize glare control.
- Illumination: controlled white LED improves decode; keep the pulse power isolated from sensor rails.
- Decode evidence: log decode retries and failure class (glare, blur, low light).
5.2 QR / barcode (imager vs laser, glare management)
- Imager-based scan: handles phone screens well but needs exposure tuning and anti-flicker awareness.
- Glare handling: mechanical hooding and controlled lighting reduce retries and queueing delays.
- Evidence: scan_success_rate, decode_retry_count, avg_decode_ms.
5.3 Ticket/receipt printer (power pulse + jam detect)
- Thermal head pulse: introduces inrush and supply sag; separate printer rail and log UVLO events.
- Sensors: paper-out and jam detect must map to stable reason codes.
- Interfaces: keep USB/serial paths observable (timeouts, retries) and isolate noisy grounds.
5.4 Evidence (make failures diagnosable)
H2-6. Access Linkage & Control I/O (turnstile, door, safety interlocks)
The core security function is reliable actuation with safety overrides and feedback loops. This chapter describes output drivers, input sensing, and a closed-loop control path that remains deterministic under noise, brownouts, and link outages.
6.1 Outputs (relay, high-side, motor driver at conceptual level)
- Relay / high-side: simple door strike control; design for flyback, contact wear, and brownout-safe states.
- Turnstile motor drive: H-bridge or BLDC stage at a block level; ensure feedback sensors close the loop.
- Isolated I/O: for long cable runs and noisy grounds; log retries and isolator faults.
6.2 Inputs (door contact, rotation sensor, anti-tailgating beam, tamper)
- Door/turnstile state: rotation counts, lock position, and beam breaks define whether access truly occurred.
- Safety interlocks: emergency release, fire-alarm force-open, and e-stop must override policy decisions.
- Tamper: enclosure switch and cable disconnect should map to stable alarms and log records.
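The rule that safety interlocks override policy can be sketched as a small resolver over an interlock bit mask. Everything named here is hypothetical: the `Interlock` bit layout, the action strings, and the priority order between fire input and E-stop, which in practice depends on the site's safety requirements. Tamper is carried in the mask for logging only and does not change the action in this sketch.

```python
from enum import IntFlag


class Interlock(IntFlag):
    NONE = 0
    E_STOP = 1
    FIRE_INPUT = 2
    EMERGENCY_RELEASE = 4
    TAMPER = 8   # raised as an alarm elsewhere; does not change the action here


def resolve_action(policy_allow, mask):
    """Return (action, reason_code); interlocks are evaluated before policy."""
    if mask & Interlock.FIRE_INPUT or mask & Interlock.EMERGENCY_RELEASE:
        return "FORCE_OPEN", "interlock_override"
    if mask & Interlock.E_STOP:
        return "HALT", "e_stop"
    if policy_allow:
        return "OPEN", "policy_allow"
    return "DENY", "policy_deny"
```

Because the interlock branches run first, a DENY from policy can never block a fire-alarm release, and the returned reason code records which path decided the outcome.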
6.3 Interface boundary (Wiegand/OSDP named only, RS-485/GPIO)
- Where protocols sit: treat them as link layers between controller and readers/panels; protocol deep dives are out of scope.
- Evidence: link_error_count, retry_count, and response_timeout define whether the issue is wiring, ground, or interface power.
6.4 Fail-safe states (deterministic behavior on faults)
- Power loss: define default lock/unlock behavior and guarantee a consistent event log record.
- Bus faults: fail closed/open should be a policy decision, but the mechanism must be deterministic and observable.
H2-7. Payment / Crypto Modules (module-level security boundary)
Payment capability turns a kiosk into a risk-bearing edge device. The engineering goal is to keep keys and trust anchors inside a dedicated security boundary (SE/SAM), while exposing only non-sensitive tokens, results, and reason codes to the main compute and logs.
7.1 Modules (reader + EMV module + SE/SAM)
- NFC payment reader: external capture point; treat as a high-attack-surface peripheral.
- EMV contact/contactless module: transaction execution stays module-contained; protocol derivations are out of scope.
- SE/SAM: root boundary for long-lived secrets and authenticated operations.
7.2 Key boundary (what stays inside SE)
- Inside SE: non-exportable keys, anti-replay counters, authenticated ops (sign/attest).
- Main SoC allowed: txn_id, result_code, reason_code, timestamps, upload state.
- Never in logs: any sensitive account data or anything that can reconstruct secrets.
7.3 Transaction logging template (non-sensitive, diagnosable)
| Field | Meaning | Common failure signature | First mitigation |
|---|---|---|---|
| txn_id | Local unique ID for correlation (not derived from sensitive data) | Duplicates indicate replay/queue bugs | Use random/monotonic + wrap protection; reject repeats at ingest. |
| ts_local + ts_quality | Timestamp + quality indicator (RTC vs network-corrected) | Time jumps when link flaps or RTC drifts | Persist last_sync_ts; log ts_quality; bound drift while offline. |
| pay_module | Which module path handled the request | Errors cluster on one module path | Separate counters per module; verify interface power and reset sequence. |
| result_code + reason_code | Outcome + standardized failure reason | “Unknown reason” prevents field triage | Enumerate stable reasons: timeout, boundary reject, offline deny, queue overflow. |
| offline_flag + upload_state | Whether offline policy applied and current sync state | Offline success but missing upload proof | Record upload ACK or terminal failure; never lose transitions. |
| queue_depth + queue_drop_count | Pressure indicators for offline queuing | Spikes precede deny/timeout events | Set max depth; enforce TTL; surface queue pressure in health metrics. |
| se_status | SE/SAM boundary health and response class | Boundary failures correlate with thermal/power events | Log retries/lockouts; stabilize SE rail; verify reset ordering. |
7.4 Offline policy choices (queue vs deny)
- Queue transactions: requires bounded depth, TTL, dedup markers, and clear overflow/expiry reason codes.
- Deny when offline: reduces risk exposure and simplifies audit but must emit stable “offline deny” codes for support.
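If the queue option is chosen, its requirements (bounded depth, TTL, dedup, explicit overflow codes) fit in a small structure. The sketch below is an assumption-laden illustration: the `OfflineTxnQueue` class, its return strings, and the use of seconds are hypothetical, and only non-sensitive txn_id values are stored.

```python
from collections import OrderedDict


class OfflineTxnQueue:
    """Bounded offline queue with TTL expiry and duplicate rejection."""

    def __init__(self, max_depth, ttl_s):
        self.max_depth = max_depth
        self.ttl_s = ttl_s
        self._items = OrderedDict()   # txn_id -> enqueue time in seconds
        self.queue_drop_count = 0

    def expire(self, now_s):
        """Remove entries older than the TTL and return their txn_ids."""
        expired = [t for t, ts in self._items.items() if now_s - ts > self.ttl_s]
        for txn_id in expired:
            del self._items[txn_id]
        return expired

    def enqueue(self, txn_id, now_s):
        """Return a stable reason string: queued, duplicate, or queue_overflow."""
        self.expire(now_s)
        if txn_id in self._items:
            return "duplicate"
        if len(self._items) >= self.max_depth:
            self.queue_drop_count += 1
            return "queue_overflow"
        self._items[txn_id] = now_s
        return "queued"

    @property
    def queue_depth(self):
        return len(self._items)
```

Note that this sketch forgets a txn_id once it expires; a design that must reject replays after expiry would need a separate, longer-lived dedup marker.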
H2-8. Connectivity & Local Data (Ethernet/PoE, Wi-Fi, local storage, offline mode)
Deployability depends on predictable links, bounded offline behavior, and durable local records. The objective is deterministic access decisions during link degradation, while preserving audit integrity via caches, ring-buffer logs, and controlled re-sync.
8.1 Links (primary + optional)
- Ethernet + PoE PD (primary): preferred for uptime and single-cable install.
- Wi-Fi (optional): treat congestion/roaming as expected failure modes; log link_health transitions.
- RS-485 uplink (optional): building integration boundary (no protocol deep dive).
8.2 Local data types (what must exist locally)
- Credential cache: whitelist/blacklist/tickets with TTL; cache hit/miss becomes operational evidence.
- Identifiers/templates (if used): versioned and bounded records; traceability without oversharing sensitive content.
- Audit logs: sequence-numbered events in a ring buffer; retention sized for offline durations and burst events.
8.3 Time base (RTC + time-quality)
- RTC with hold-up: keep time through brief interruptions.
- ts_quality: indicate network-corrected vs RTC-only vs uncertain time.
- Drift policy: bound acceptable drift for offline decisions and record drift state in logs.
8.4 Offline state machine (ONLINE → DEGRADED → OFFLINE → RESYNC)
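The four states named in the heading can be sketched as a pure transition function. Only the state names come from this page; the inputs (`link_up`, `missed_heartbeats`, `resync_done`), the threshold `offline_after`, and every transition rule below are assumptions chosen for illustration.

```python
from enum import Enum


class LinkState(Enum):
    ONLINE = "ONLINE"
    DEGRADED = "DEGRADED"
    OFFLINE = "OFFLINE"
    RESYNC = "RESYNC"


def next_state(state, link_up, missed_heartbeats, resync_done=False, offline_after=3):
    """Return the next state for one evaluation tick (assumed rules)."""
    if state is LinkState.ONLINE:
        if link_up and missed_heartbeats == 0:
            return LinkState.ONLINE
        return LinkState.DEGRADED
    if state is LinkState.DEGRADED:
        if not link_up or missed_heartbeats >= offline_after:
            return LinkState.OFFLINE
        if missed_heartbeats == 0:
            return LinkState.ONLINE
        return LinkState.DEGRADED
    if state is LinkState.OFFLINE:
        if link_up and missed_heartbeats == 0:
            return LinkState.RESYNC
        return LinkState.OFFLINE
    # RESYNC: fall back to OFFLINE if the link drops, otherwise wait for completion.
    if not link_up:
        return LinkState.OFFLINE
    return LinkState.ONLINE if resync_done else LinkState.RESYNC
```

Keeping the function free of side effects makes every transition easy to log with its inputs, which is the evidence needed when an unexpected deny occurs in offline mode.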
H2-9. Power Tree, Thermal & Ruggedization (field reliability)
“Works in the lab but fails in the lobby” is usually a power, thermal, or ruggedization mismatch. The design goal is to partition power domains, bound inrush/UVLO events, and keep logging and decision paths stable through printer pulses, motor starts, ESD, and cable surges.
9.1 Power tree (PoE PD → rails by domain)
- Primary input: PoE PD (or DC input) feeding a structured rail tree with clear priority.
- Quiet domains: SoC core/DRAM, sensors, clocking, secure element.
- Noisy domains: lighting drivers, printer pulse rail, motor/relay rail, external I/O.
9.2 Inrush / UVLO events (printer heat pulse, motor start)
- Printer pulse sag: record UVLO_count and rail_min_mV during thermal head pulses.
- Motor start dip: capture inrush and ground bounce; separate motor rail from logic rail with clear return paths.
- Hold-up concept: keep brownout-sensitive writes (logs, counters) within a safe write window.
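The safe write window can be estimated from the energy a hold-up capacitor releases while the rail falls from its starting voltage to the minimum usable voltage, E = ½·C·(V_start² − V_min²). The sketch below applies that relation under a constant-power load; the function names, the efficiency term, and the 2× margin are assumptions.

```python
def holdup_time_ms(cap_f, v_start, v_min, load_w, efficiency=1.0):
    """Hold-up time in ms for a constant-power load fed from a capacitor.

    cap_f is capacitance in farads; v_start and v_min are volts; load_w is watts.
    """
    energy_j = 0.5 * cap_f * (v_start ** 2 - v_min ** 2)
    return 1000.0 * energy_j * efficiency / load_w


def write_fits(cap_f, v_start, v_min, load_w, write_ms, margin=2.0):
    """True when the hold-up time covers the log write with the given margin."""
    return holdup_time_ms(cap_f, v_start, v_min, load_w) >= margin * write_ms
```

As a worked example, 1000 µF falling from 12 V to 9 V releases 0.0315 J, which sustains a 3 W load for about 10.5 ms: enough for a 5 ms write at a 2× margin, but not for an 8 ms one.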
9.3 Thermal (SoC/NPU + enclosure) and throttling evidence
- Thermal bottleneck: kiosk enclosure and passive airflow create sustained hot spots.
- Throttling signature: tail latency increases and retry rates rise before visible failures.
- Evidence: thermal_state, throttle_count, p95_latency, dropped_frames.
9.4 Protection (device-level ESD/surge strategy, high-level grounding)
- ESD at touch ports: USB, card reader, metal bezel, UI ports need a defined discharge path.
- Surge for long runs: outdoor/long cables require a robust clamp strategy at the device entry.
- Ground strategy: separate noisy returns and keep sensor/SE rails stable; log ground-fault-like symptoms as repeatable reason codes where applicable.
H2-10. Validation Plan (acceptance tests that prove it’s stable)
Validation must prove stability across functional, performance, power, and environmental stress. The key deliverable is a repeatable test matrix with explicit pass/fail signals and required logs/counters.
10.1 Functional acceptance (does it do the right thing)
- Identity success rate: face/ID/QR decode success across expected user heights and lighting.
- Anti-tailgating: beam break + rotation feedback must reflect real passage.
- Emergency overrides: fire input / emergency release must preempt policy decisions.
10.2 Performance acceptance (latency distributions, throughput)
- Latency distribution: p50/p95 of capture→decision and decision→actuate paths.
- Throughput: people/min under peak queueing (including scan and printer scenarios).
- Stability markers: dropped_frames, retries, queue depth, throttle_count.
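The latency acceptance criteria can be computed directly from logged samples. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions; the function names and the pass/fail dictionary layout are assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms, p95_bound_ms):
    """Summarize a latency run and compare p95 against the acceptance bound."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    return {"p50": p50, "p95": p95, "pass": p95 <= p95_bound_ms}
```

Running the same summary separately on the capture→decision and decision→actuate paths keeps a slow inference stage from being masked by a fast I/O stage, or the reverse.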
10.3 Power robustness (brownout, PoE drop, motor start, printer pulse)
- Brownout immunity: no corrupted logs; safe write window honored.
- Transient events: motor start and printer pulse must not cause UI hangs or policy resets.
- Evidence: UVLO_count, rail_min_mV, reboot_count, watchdog_reset_count.
10.4 Environmental stress (thermal, glare/IR sunlight, vibration, ESD)
- Thermal soak: sustained operation until steady-state temperature; verify throttle behavior and p95 latency.
- Glare / sunlight IR: verify ambient_ir_sat behavior and liveness retries.
- ESD at ports: UI ports and metal bezel; no persistent failures after discharge events.
10.5 Test matrix template (required evidence fields)
| Test Category | Test Case | Pass/Fail Evidence | Required Logs/Counters |
|---|---|---|---|
| Functional | Face + QR + ID flow (normal lobby lighting) | success_rate meets target; stable reason_code on failures | decision_code, decode_retry_count, liveness_retry_count |
| Performance | Peak throughput (queue) with mixed inputs | p95 latency within bound; no backlog collapse | p50/p95 latency, dropped_frames, queue_depth, throttle_count |
| Power | Printer pulse + motor start transient | no reboot/hang; logs consistent | rail_min_mV, UVLO_count, reboot_count, watchdog_reset_count |
| Environmental | Thermal soak + sunlight glare + ESD at ports | no persistent faults; recovery within policy | thermal_state, ambient_ir_sat, port_fault_count, recovery_events |
H2-11. Field Debug Playbook (Symptom → Evidence → Isolate → First Fix)
Goal: Diagnose kiosk + turnstile failures with minimum tools by forcing every issue into a short evidence chain: decision → command → feedback → power/time. Each symptom below lists the first two measurements, a discriminator (A vs B), and a first fix you can apply immediately.
Evidence fields (suggested): decision_code, reason_code, policy_version, offline_flag, actuator_cmd_ts, sensor_fb_ts, io_timeout_cnt, reset_reason, uvlo_cnt, rail_min_mV, thermal_state, throttle_cnt, ambient_ir_sat, illum_sync_ok, dropped_frames.
BOM/MPN examples used in this chapter (reference only): pick equivalents based on availability, voltage/current, isolation, and temperature grade.
| Subsystem | Common parts (MPN) | Why they matter for debug |
|---|---|---|
| PoE PD + input | TI TPS2372 / TPS2373, Microchip PD70224, Silvertel AG9700 (module) | Brownouts during motor/printer events often originate at PD UVLO, inrush, or hold-up sizing. |
| eFuse / power switch | TI TPS25947, TPS25940, TPS2663, Analog Devices LTC4365, LTC4412 (ideal diode) | Captures inrush/overcurrent protection behavior; helps segment “power fault” vs “software fault”. |
| DC/DC rails | TI TPS546D24A (buck), TPS62130, ADI LT8609S, MPS MPQ8633 | Rail droop/overshoot and load-step response determine reset immunity under pulses. |
| Reset / supervisor | TI TPS3899, Maxim/ADI MAX16054, Microchip MCP1316 | Turn “random reboot” into a measurable reset-cause + timing event. |
| Relay / DO drivers | TI ULN2003A, ST L6206, TI DRV8871 (DC), DRV8313 (3-phase), Infineon BTS500xx (high-side) | Actuator command may be correct while driver/current path fails; isolate by probing driver enables/currents. |
| Isolated I/O | TI ISO7741 (digital isolator), ADuM141E, TI ISO1050 (CAN), ISO3082 (RS-485) | Ground shifts and long cables cause phantom faults; isolation boundaries give clean evidence points. |
| RS-485 / UART | TI THVD1429, MAX3485, ST ST485 | For access linkage modules: timeouts/retries often indicate wiring, biasing, or common-mode issues. |
| Secure element | Microchip ATECC608B, NXP SE050, Infineon OPTIGA™ Trust M (SLS32AIA) | Separates “policy/crypto state” from SoC storage; helps debug denial due to keys/counters/time. |
| RTC + hold-up | Maxim/ADI DS3231M, NXP PCF85263A, Epson RX8130 | Timestamp quality drives audit + offline decisions; drift or reset breaks policy binding. |
| USB/port ESD | TI TPD4E05U06, Nexperia PESD5V, Littelfuse SP050x | Intermittent UI/scan failures and sudden resets often correlate with ESD at exposed ports. |
Tip: keep a field kit with an oscilloscope, a current probe, and a USB serial dongle; every symptom below relies only on these tools and the device logs.
SYMPTOM 1: Recognizes face but the turnstile doesn’t open
First 2 measurements
- decision_code + reason_code (ALLOW/DENY and why) at the exact moment UI shows “recognized”.
- actuator_cmd_ts vs sensor_fb_ts (command issued? feedback returned?) plus io_timeout_cnt.
Discriminator
- If decision_code=ALLOW but actuator_cmd_ts is missing → scheduling/QoS issue (I/O lane starved by inference load).
- If actuator_cmd_ts exists but sensor_fb_ts never returns → driver/current path or wiring/isolator boundary issue.
- If decision_code=DENY while UI says “success” → UI/state-machine mismatch or stale policy/cache (check policy_version, offline_flag).
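The three discriminator rules above map onto a short triage function that can run over a single log record. This is a sketch: the function name and the returned labels are hypothetical, and a missing timestamp is represented as `None`.

```python
def triage_no_open(decision_code, actuator_cmd_ts, sensor_fb_ts, ui_shows_success=True):
    """Classify a 'recognized but did not open' event from three logged fields."""
    if decision_code == "DENY":
        # UI and decision disagree: check policy_version and offline_flag next.
        return "ui_or_policy_mismatch" if ui_shows_success else "policy_deny"
    if actuator_cmd_ts is None:
        return "io_lane_starved"              # ALLOW logged, command never issued
    if sensor_fb_ts is None:
        return "driver_wiring_or_isolator"    # command issued, no feedback returned
    return "loop_closed"
```

Running this over a day of events and counting each label usually shows one dominant cause, which tells the technician whether to bring a scope or a firmware build.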
First fix
- Force a deterministic I/O lane: move actuation task to high priority; add a hard timeout and always emit actuation_attempt_id.
- Add a self-test: toggle DO → expect DI feedback within N ms; isolate driver vs mechanics.
- If using relay/driver modules: verify DO driver enable and coil/motor current path. Example parts to inspect: ULN2003A (relay), DRV8871 (DC), BTS500xx (high-side), isolators ISO7741/ADuM141E.
SYMPTOM 2: Only fails under bright sunlight / lobby glass reflections
First 2 measurements
- ambient_ir_sat (IR saturation flag) + exposure stats (exposure_us, gain_db) on both RGB and IR channels.
- illum_sync_ok + dropped_frames + liveness_retry_cnt (is timing breaking or is the sensor saturating?).
Discriminator
- If ambient_ir_sat=1 + exposure clamps to extreme values → environment IR overwhelms the liveness channel (not “AI got worse”).
- If illum_sync_ok drops + dropped_frames rises → lighting trigger/exposure window misaligned (scheduler jitter, GPIO/driver timing).
- If failure is angle-specific → mechanical placement/hooding/reflection path issue (system-level).
First fix
- Enable a “sunlight mode”: cap IR flood duty, adjust exposure bounds, and require stable illum_sync_ok before decision.
- Add/verify hardware sync path for illuminator: a GPIO-timed strobe driver, not a best-effort userspace toggle.
- Parts and paths often involved: the LED driver (conceptual), port ESD protection such as TPD4E05U06, and the camera/IR trigger line. For mechanical fixes, add a hood or tilt to avoid specular reflections.
SYMPTOM 3: Payment succeeds but access is denied
First 2 measurements
- Payment event record: txn_id, result_code, ts_local, ts_quality (RTC valid? time jumped?).
- Policy binding record for the same txn_id: decision_code, reason_code, policy_version, offline_flag.
Discriminator
- If payment says success but kiosk has no matching txn_id in the access decision log → event association/queueing issue (local commit not atomic).
- If txn_id exists but is “expired” → TTL/clock drift issue (check RTC reset, battery/hold-up).
- If reason_code indicates policy mismatch → config sync problem (policy_version difference).
First fix
- Make payment→access token binding atomic: only show “paid” after local durable record is committed (brownout-safe).
- Promote timestamp quality to a first-class input: deny-with-clear-reason when ts_quality is bad (prevents silent mismatch).
- MPN anchors: secure element for key boundary ATECC608B/SE050/OPTIGA Trust M; RTC DS3231M/PCF85263A to stabilize audit time; eFuse TPS25947 to prevent brownout during transaction logging.
SYMPTOM 4: Random reboot when the printer prints or the motor starts
First 2 measurements
- reset_reason + uvlo_cnt + rail_min_mV (capture with a supervisor log or scope on the main rail).
- Event correlation: align print heat pulse / motor start time with reboot time (event_ts vs reset_ts).
Discriminator
- If reset_reason indicates brownout/UVLO and rail droops during pulse → power integrity / hold-up / inrush.
- If brownout is absent but watchdog fires → scheduler deadlock or driver blocking (still verify supply first).
- If only certain sites fail (long cables / outdoor runs) → surge/ground shift/ESD coupling.
First fix
- Mutual-exclusion scheduling: never allow motor-start and printer-heat pulse in the same window; log a power_pulse_conflict_cnt.
- Segment noisy domains: dedicate a rail for printer/motor (buck + eFuse), keep SoC/RTC/log rail isolated.
- MPN anchors: PoE PD TPS2372/TPS2373; eFuse TPS25947/TPS2663; supervisor TPS3899/MAX16054; fast buck examples LT8609S/TPS546D24A; port ESD TPD4E05U06.
SYMPTOM 5: Throughput drops after 30 minutes (latency rises, more retries)
First 2 measurements
- thermal_state + throttle_cnt + latency histogram (p50/p95) for inference and decision.
- Backpressure evidence: queue_depth, dropped_frames, retry_cnt, io_timeout_cnt.
Discriminator
- If thermal rises and throttle_cnt climbs with p95 latency → thermal throttling is the primary driver.
- If thermal is stable but queue_depth grows → QoS isolation failure (I/O lane starved by compute or logging).
- If performance decays with log size growth → retention/flush policy issue (ring buffer not bounded).
First fix
- Introduce a deterministic “degrade mode”: cap FPS/resolution, limit concurrent recognition, preserve actuation deadlines.
- Tie policy to thermal: when thermal_state crosses a threshold, automatically switch to lower compute cost path.
- Stabilize time + logs: RTC DS3231M and bounded log ring; ensure brownout-safe flush via eFuse supervision (e.g., TPS25947).
One rule prevents most field chaos: validate the closed-loop chain (decision_code → actuator_cmd → sensor_feedback → power/reset_reason) before optimizing anything. If the loop is broken, “better AI” will not fix it.
What this playbook intentionally avoids: backend workflow assumptions, protocol-level details (OSDP/Wiegand/EMV), or ML algorithm theory. The focus is device-side signals, logs, and boundaries that are testable on the bench or in the lobby.
H2-12. FAQs (×12) — Field-Proven, Evidence-Based
Each answer stays inside the device boundary and points back to measurable evidence (logs, counters, timestamps, rails, and I/O feedback). Parts listed are example MPN anchors (choose equivalents by voltage/current/isolation/temp grade).
01. Face is detected, but the gate stays locked: I/O mapping or safety interlock?
Start by proving the chain: decision_code=ALLOW must exist before chasing hardware. Next, compare actuator_cmd_ts with sensor_fb_ts and check the interlock inputs (E-stop, fire input, tamper, anti-tailgating beam). If ALLOW is logged but no command is issued, the cause is scheduling/QoS. If the command exists but no feedback returns, the cause is in the driver, the wiring, or an active interlock.
02. Works indoors, fails at glass lobby entrance: glare, IR saturation, or liveness lighting timing?
Use evidence instead of guessing. Check ambient_ir_sat plus exposure/gain limits for both RGB and IR. If IR saturates in sunlight or reflections, liveness fails even when detection is fine. If illum_sync_ok drops and dropped_frames rises, the strobe/exposure window is drifting (scheduler jitter or trigger integrity). A fast first fix is a “sunlight mode” with capped IR duty and stricter sync gating.
03. Recognition is accurate but slow: NPU load, memory bandwidth, or thermal throttle?
Measure tail latency, not averages: log p50/p95 for capture→infer→decision and track queue_depth. If thermal_state and throttle_cnt climb over time, heat is the primary cause. If thermal is stable but p95 spikes during I/O, DRAM/CPU contention or QoS isolation is failing. First fix: cap FPS/concurrency and pin actuation tasks to a deterministic lane.
04. ID scan succeeds, but the user is still denied: policy cache vs upstream sync (device-side view)?
Avoid assumptions about external systems; stay device-side. Confirm policy_version, offline_flag, cache_hit, and cache_ttl_remaining at the denial moment, plus last_sync_ts and sync_state. If TTL is expired or time quality is poor, valid IDs can be denied locally. If policy version changed but cache did not update atomically, a version split occurs. First fix: atomic cache swap + explicit reason_code on deny.
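The atomic cache swap can be sketched as a single reference replacement: the new policy set is built completely and then published in one step, so a decision never sees a half-updated cache or a version that does not match its contents. The `PolicyCache` class and its reason codes are hypothetical.

```python
import threading


class PolicyCache:
    """Holds one immutable (policy_version, allowed_ids) snapshot at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = ("none", frozenset())

    def swap(self, policy_version, allowed_ids):
        # Build the full snapshot first, then publish it in one assignment.
        new_snapshot = (policy_version, frozenset(allowed_ids))
        with self._lock:
            self._snapshot = new_snapshot

    def decide(self, credential_id):
        with self._lock:
            version, allowed = self._snapshot
        if credential_id in allowed:
            return {"decision_code": "ALLOW", "reason_code": "cache_hit",
                    "policy_version": version}
        return {"decision_code": "DENY", "reason_code": "not_in_cache",
                "policy_version": version}
```

Because every decision record carries the policy_version it was made under, a deny issued just before a sync can be told apart from a deny under the new policy.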
05. QR decodes on phone screen but not on printed tickets: imager optics or illumination?
Phone screens are bright and high contrast; printed tickets are reflection-heavy and low contrast. Compare decode_retry_cnt with exposure/gain limits and focus distance. If failures correlate with shiny paper or angle, it is glare/illumination geometry. If failures correlate with distance and tilt, it is optics/DoF. First fix: add diffuse illumination, tighten the allowed scan distance, and enforce a “stable focus/exposure” window before decode attempts.
06. Payment approved but access denied: transaction linkage ID mismatch?
Most “approved but denied” cases are correlation failures, not payment failures. Verify the same txn_id (or access token ID) exists in both the payment event record and the access decision record, and check ts_quality to rule out time jumps that invalidate TTL. If the kiosk logs approval but no durable binding record exists, it is a non-atomic commit (often during brownouts). First fix: generate txn_id in one place, commit durably, then allow.
07. Random reboot when printing: inrush/UVLO or ground bounce?
Read reset_reason and uvlo_cnt, then scope the main rail during the print heat pulse and during the printer's internal motor-step events. If rail_min_mV dips and reset_reason indicates brownout/UVLO, the cause is power integrity, hold-up, or inrush. If rails remain stable but resets correlate with cable movement or port activity, suspect ground shift or ESD coupling. First fix: split the printer rail via an eFuse and stagger pulses.
08. Random reboot when turnstile starts: motor surge or driver fault feedback?
Correlate turnstile start with rail_min_mV and reset cause first. If droop aligns with motion start, it is motor surge, inrush, or insufficient separation between motor and SoC rails. If rails are stable but the driver reports a fault (overcurrent/thermal) and the controller hangs until watchdog resets, it is a driver/current-path or feedback wiring issue. First fix: limit motor start ramp, isolate motor rail, and log driver fault pins with timestamps.
09. Tailgating false alarms increase at peak hours: sensor placement or latency jitter?
Log the timing chain: sensor_trigger_ts → decision_ts → actuator_cmd_ts, plus p95 decision latency. If p95 rises and alarms correlate with backlog (queue_depth), it is jitter/QoS, not placement. If timing is stable but triggers occur in unexpected sequences, it is geometry/occlusion/beam placement. First fix: prioritize I/O processing over recognition, then re-aim sensors using recorded trigger order as evidence.
10. Offline mode causes unexpected denies: cache TTL, clock drift, or log overflow?
Check three device-side facts: cache_ttl_remaining, ts_quality/clock_drift, and log_ring_used_pct. If TTL is valid but the clock jumped, tokens appear “expired”. If TTL and time are fine but log/cache storage is full, updates and commits fail, causing conservative denies. First fix: an RTC with hold-up, bounded ring buffers with drop counters, and explicit deny reason codes for offline cases.
11. Logs show events but no uploads: storage ring buffer or connectivity retry policy?
Separate “recording” from “shipping”. Confirm local persistence: ring_write_ptr, ring_drop_cnt, and storage health flags. Then confirm network delivery: link_up, upload_queue_depth, and backoff_state. If drop counters rise, the ring is too small or commits are blocked by brownouts. If ring is stable but retries stall, the retry policy is stuck (backoff too large or a state machine wedge). First fix: cap backoff and bound queues with clear counters.
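Capping the backoff is a one-line rule: grow the delay exponentially per failed attempt, but never beyond a fixed ceiling. The sketch below assumes a 1 s base and a 60 s cap; both values and the function name are illustrative.

```python
def backoff_delay_s(attempt, base_s=1.0, cap_s=60.0):
    """Exponential backoff delay in seconds, clamped to cap_s.

    attempt is the number of consecutive failed uploads (0 for the first retry).
    """
    return min(cap_s, base_s * (2 ** attempt))
```

With these values the delay runs 1, 2, 4, 8 s and so on, then holds at 60 s, so a link that recovers is retried within a minute instead of waiting on an ever-growing timer.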
12. After a firmware update, recognition changes: model/version mismatch or camera timing?
First prove what changed. Compare model_version and pipeline_config_hash (capture→preprocess→infer) before/after the update, then verify camera timing evidence (illum_sync_ok, exposure bounds, dropped frames). If versions differ unexpectedly, it is an update packaging/rollback issue. If versions match but timing counters degrade, it is input quality (trigger drift or new scheduling). First fix: sign and pin model/config versions, and add a post-update timing validation gate.