Video Intercom Door Station Hardware Design & Debug Playbook
Core idea: A video intercom door station is a tightly coupled edge device where video, audio, PoE power, networking, and door-lock I/O must stay stable together. Most field failures are best solved by an evidence chain—probe a few key test points and read the right logs—then apply the first hardware-level fix (power integrity, isolation, or relay/noise control) before touching platform features.
H2-1. Definition & Boundary: What problem does a Door Station solve?
One-sentence definition (three verbs: capture, transport, actuate)
A Video Intercom Door Station is an outdoor edge endpoint that captures audio/video (and optional identity signals), transports sessions to an indoor monitor or controller, and actuates door I/O (lock relay + door state) with audit-grade event evidence.
This page covers (locked scope)
- Edge hardware coupling: imaging chain, audio chain (AEC/AGC), compute, uplink, power tree, and door I/O in one enclosure.
- Uplink choices at the endpoint: Ethernet/PoE vs Wi-Fi tradeoffs only as they affect the door station’s stability and observability.
- Evidence-first design: define what to log/measure so field issues map back to a specific block and step.
What this page does NOT cover (Banned) + where it belongs
- Access policy engine / credential database / multi-door authorization logic: belongs to Access Control Panel.
- Face liveness algorithms / deep biometric decision stack: belongs to Face Access Controller.
- VMS/NVR ingestion, recording integrity, compliance workflows: belongs to NVR / VMS Ingest & AI Box / Recording Integrity.
- PTP/IEEE-1588 timing architecture or PoE switch/PSE design: belongs to Timing & Sync for CCTV / PoE Switch for CCTV.
Boundary test (quick self-check)
If a paragraph can be answered without referencing any endpoint block (sensor/codec/rail/PHY/relay) or any observable evidence (counter/log/ts), it likely drifts into platform or algorithm territory and should be removed or moved to a sibling page.
Minimum “observability contract” for a door station
To prevent “works in lab, fails in field” ambiguity, the endpoint should expose a small, stable set of evidence signals:
- Identity of the unit: serial number, hardware revision, sensor/module IDs.
- Power evidence: brownout/reset reason code, PoE classification/power status (as seen by the PD), rail-good flags.
- Link evidence: Ethernet link up/down, negotiated speed/duplex changes, Wi-Fi RSSI + retransmit counters (endpoint view).
- Media evidence: frame drop counters, encoder queue depth, audio saturation/AEC convergence flags.
- I/O evidence: relay actuation count + last actuation timestamp, door-contact bounce count, tamper events.
Outcome: every field issue becomes “which block + which step + which evidence diverged”, not a vague symptom report.
H2-2. End-to-End Event Path: from “Ring” to “Unlock”
Why this chapter exists
Door-station failures look similar in the field (black screen, one-way audio, delayed unlock), but the root causes live in different blocks. A reliable design starts with a single, explicit event path that every later chapter can reference and verify with evidence.
Split the system into two primary paths (avoid confusion)
- Call path (talkback): Trigger → media capture → encode → transport → session up → bidirectional A/V stability.
- Unlock path (actuation): Authorization result → unlock command → relay actuation → door-contact state change → audit log commit.
Minimum evidence fields (stable, small, field-friendly)
The endpoint should log a compact set of timestamps and counters per interaction. This turns “it failed” into “which step diverged”.
| Field | Meaning (what it proves) | Used to isolate |
|---|---|---|
| event_id | Unique interaction identifier (one ring/trigger session) | Correlate media + I/O + resets to one attempt |
| ts_trigger | Time of trigger (button/PIR/door-contact change) | Input debounce issues vs real triggers |
| ts_first_frame | First captured frame enters encode pipeline | Sensor/ISP/DDR stalls, cold-start latency |
| ts_session_up | Talkback session established (endpoint view) | Link instability, retransmits, negotiation timeouts |
| link_state / rssi | Ethernet up/down or Wi-Fi RSSI snapshot + retry count | Wired vs wireless root cause separation |
| media_drop_cnt | Frame/audio drop counters during session | Compute overload vs transport loss |
| ts_unlock_cmd | Unlock command received/issued (endpoint view) | Authorization path vs actuation path |
| ts_relay_on | Relay driver asserted / relay energized | Driver/rail/noise issues in I/O block |
| ts_door_open | Door-contact indicates open after unlock | Mechanical lock vs sensor wiring vs bounce |
| reset_reason | Brownout/WDT/thermal reset code | Power tree vs software hangs (evidence-first) |
| reason_code | Final outcome enum (OK / NET_DOWN / TIMEOUT / RELAY_FAULT / …) | Fast triage without full logs |
How to use these fields (practical)
- Latency decomposition: (ts_first_frame – ts_trigger) vs (ts_session_up – ts_first_frame) vs (ts_relay_on – ts_unlock_cmd).
- Regression control: keep these fields stable across firmware revisions so field comparisons remain valid.
- Minimal bandwidth mode: if remote logs are constrained, transmit only (event_id + reason_code + 3 key deltas + reset_reason).
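The latency decomposition and the minimal bandwidth record above can be sketched in a few lines. This is only an illustration: it assumes every `ts_*` field is an integer millisecond count from one monotonic clock, and the names `capture_ms`, `session_ms`, and `actuation_ms` are hypothetical labels for the three deltas, not fields defined by this page.

```python
from typing import Optional


def latency_deltas(ev: dict) -> dict:
    """Split one interaction into the three deltas (ms); None if a timestamp is missing."""
    def delta(end: str, start: str) -> Optional[int]:
        a, b = ev.get(end), ev.get(start)
        return None if a is None or b is None else a - b

    return {
        "capture_ms": delta("ts_first_frame", "ts_trigger"),
        "session_ms": delta("ts_session_up", "ts_first_frame"),
        "actuation_ms": delta("ts_relay_on", "ts_unlock_cmd"),
    }


def minimal_record(ev: dict) -> dict:
    """Compact record: event_id + reason_code + 3 key deltas + reset_reason."""
    rec = {
        "event_id": ev["event_id"],
        "reason_code": ev["reason_code"],
        "reset_reason": ev.get("reset_reason"),
    }
    rec.update(latency_deltas(ev))
    return rec


# Example interaction (values are made up for illustration).
event = {
    "event_id": "e-0001",
    "ts_trigger": 1000,
    "ts_first_frame": 1180,
    "ts_session_up": 1650,
    "ts_unlock_cmd": 9000,
    "ts_relay_on": 9035,
    "reset_reason": None,
    "reason_code": "OK",
}
record = minimal_record(event)
```

A missing timestamp yields `None` for that delta rather than a misleading number, so a failed step stays visible in the compact record.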
Failure modes mapped to the event path (so debug stays vertical)
- No session / long setup: link flaps, DHCP/negotiation delays, Wi-Fi retries → prove with link_state/rssi + ts_session_up.
- Media stutter: encoder queue overflow, CPU throttling, transport loss → prove with media_drop_cnt + thermal/reset evidence.
- Unlock delayed: decision arrives but actuation is slow → prove with (ts_relay_on – ts_unlock_cmd) and correlate with rail dips.
- Unlock fails but relay toggled: contact wiring/bounce/mechanical lock → prove with ts_door_open missing + bounce counter.
- Random reboot near relay action: inductive kickback / ground bounce → prove with reset_reason + scope rail at relay edges (later chapters).
H2-3. Hardware Architecture: “Partitioned BOM Islands” to prevent interference
Why partitioning matters (multi-domain coexistence)
A door station is a tightly packed coexistence system: high-speed imaging, low-noise audio, 48V PoE power conversion, RF (Wi-Fi), and relay/long-cable I/O share one enclosure and often one cable harness. Most “random” field failures are repeatable coupling paths: relay kickback, PoE common-mode noise, and audio return pollution.
Partition rules (actionable, evidence-first)
- Return-path control: keep sensitive analog/audio returns away from relay and high di/dt power loops; verify with rail ripple + audio noise during relay edges.
- Dirty vs clean power domains: separate “dirty” loads (relay driver, speaker amp, IR illuminator) from “clean” rails (sensor/AFE/RF) using domain boundaries and local decoupling; verify with reset_reason and rail-good flags.
- RF quiet zone: keep Wi-Fi antenna/feed and its reference plane away from PoE magnetics, switching nodes, and relay loops; verify with retry_cnt and RSSI stability during load events.
- I/O hardening island: treat lock/door-contact/exit-button/tamper as an “outdoor cable” subsystem: ESD/surge entry, optional isolation, and debounce; verify with bounce counters and tamper logs.
Goal: every coupling path is either blocked structurally, or becomes measurable (counters + waveforms) instead of “mystery failures”.
Key output: Module → Risk → Isolation → Evidence table
| Island | Sensitive to | Typical injector | Isolation lever | Evidence anchor | Fast A/B check |
|---|---|---|---|---|---|
| Imaging | SI margin, rail noise, thermal throttle | PoE switching node, DDR bursts, ground bounce | Short high-speed path, clean rails, DDR locality | frame_drop_cnt, mipi_err_cnt, temp/clock_state | Lower resolution/lane rate and watch error counters |
| Audio | Analog return pollution, amp current pulses | Relay edge, DC/DC ripple, speaker amp di/dt | Return-path separation, amp rail isolation | clip_cnt, AEC residual flag, noise snapshots | Mute amp / disable relay and compare noise floor |
| Power | Load steps, cable drop, UVLO, inrush | Relay + amp + IR loads, long PoE cable | Domain rails, UVLO margin, local bulk caps | reset_reason, rail_pg, uvlo_cnt | Short cable / known-good PSE and see resets disappear |
| RF (Wi-Fi) | Common-mode noise, antenna detune | PoE magnetics, switching harmonics, relay loop | RF quiet zone, keepouts, clean RF rail | RSSI, retry_cnt, throughput log | Operate on Ethernet only; compare retry counters |
| I/O & Relay | ESD/surge entry, inductive kickback | Lock coil, long cable, lightning transients | Clamp/absorb at entry, small loop actuation | relay_cnt, door_contact_bounce, tamper logs | Swap to dummy load; compare reset_reason + pops |
| Compute/Memory | Thermal throttle, overload | High bitrate + AI tasks, background logging spikes | Budgeted pipelines, watchdog, queue backpressure | queue depth, drop counters, clock_state | Lock bitrate/frame rate and observe stability |
H2-4. Imaging Chain: Sensor → ISP → H.26x Encode (interfaces + evidence only)
Scope of this chapter (what we do and do not do)
This chapter treats video as an engineering pipeline, not an ISP “tuning tutorial”. The goal is to locate faults to a pipeline stage using interfaces and evidence counters: sensor control vs high-speed link margin vs processing/thermal limits vs transport loss.
Stage-by-stage: failure → evidence → first action
- Sensor & front-end control: low-light noise, motion blur, backlight clipping.
  Evidence: stable frame pacing + exposure/gain state snapshots.
  First action: lock exposure/frame rate briefly and observe whether artifacts persist (control-loop vs transport/processing).
- MIPI/parallel link margin: intermittent stripes, green frames, sudden frame drops.
  Evidence: mipi_err_cnt, CRC/ECC indicators (if available), correlated frame_drop_cnt.
  First action: reduce lane rate/resolution and check if error counters scale with throughput (SI margin vs downstream overload).
- ISP & encoder load: macroblocking, bitrate spikes, long first-frame latency.
  Evidence: encoder queue depth, frame_drop_cnt, bitrate stats, temp/clock_state (throttle).
  First action: lock target bitrate/fps and compare queue/drop behavior (compute overload vs network loss).
- Output/transport: looks like “bad encode” but is actually packet loss/jitter.
  Evidence: endpoint retries/throughput, link up/down, session renegotiations.
  First action: short-cable Ethernet test vs Wi-Fi; if the issue disappears, keep debugging in uplink rather than ISP/encoder.
Three-piece evidence kit (covers most field video issues)
- Drop evidence: frame_drop_cnt (proves real pipeline starvation/drop).
- Link-layer evidence: mipi_err_cnt (proves interface margin errors before ISP/encoder).
- Capacity evidence: bitrate stats + temp/clock_state (proves overload or thermal throttle).
Use them together: if mipi_err_cnt rises with lane rate, fix the physical margin; if drops rise with no MIPI errors, check queues/DDR/thermal; if both are clean but video still breaks, isolate the uplink path (Ethernet/Wi-Fi).
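The three-way rule above can be written as a small triage function. This is a sketch under assumptions: the inputs are counter increases over one observation window, `video_broken` is a hypothetical flag set by whoever observed the symptom, and the returned labels are illustrative, not a defined enum.

```python
def classify_video_fault(mipi_err_delta: int, frame_drop_delta: int, video_broken: bool) -> str:
    """Map the three evidence pieces to the block to debug first."""
    if mipi_err_delta > 0:
        # Interface errors appear before the ISP/encoder: fix physical margin first.
        return "PHYSICAL_MARGIN"
    if frame_drop_delta > 0:
        # Drops without MIPI errors: look at queues, DDR, and thermal state.
        return "QUEUES_DDR_THERMAL"
    if video_broken:
        # Both counters clean but video still breaks: isolate Ethernet/Wi-Fi.
        return "UPLINK"
    return "NO_FAULT_EVIDENCE"
```

To test the "rises with lane rate" condition, run the same window at two lane rates and compare `mipi_err_delta` between the runs before trusting the first branch.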
H2-5. Audio Chain: Duplex talk, AEC/anti-howling, amp & speaker coupling
What this chapter locks down (door-end only)
Duplex intercom quality is dominated by the door-end audio chain and its coupling to power and relay events. The engineering target is stable duplex voice with measurable evidence anchors, so “echo / howl / pop / muffled voice” can be mapped to a stage: capture, DSP, playback, or injection.
Stage map: failure → evidence → first action (minimal tooling)
- Capture (Mic/AFE/Codec ADC). Symptoms: hiss, wind rumble, harsh clipping, “pop” on lock actuation.
  Evidence: input saturation flag, quiet-floor spectrum snapshot, mic bias/rail ripple (if measurable).
  First action: build a baseline (speaker muted + relay disabled) then repeat with relay/lock events to prove electrical injection vs environment.
- DSP (AEC/NS/AGC/anti-howling). Symptoms: residual echo, pumping volume, sudden “muffled” voice after howl detection.
  Evidence: AEC residual / ERLE (if available), AGC gain state, howling detector flag.
  First action: A/B toggles (lock AGC / relax NS) to isolate algorithm-state interactions (not a tuning tutorial).
- Playback (DAC/Class-D/Speaker). Symptoms: distortion at high volume, thermal drop, click during relay edge.
  Evidence: amp limiter/clip, amp OTP/UVP, rail dip correlated with audio artifacts.
  First action: enforce a short mute window around relay actuation (tens of ms) to confirm pop is caused by injection, not AEC.
- Echo path (acoustic cavity). Symptoms: echo differs by hallway/doorframe, degrades with wind/rain.
  Evidence: AEC residual trend correlates with environment changes while electrical logs remain stable.
  First action: run a fixed playback probe sequence (tone/chirp) and compare residual before/after mechanical/environment change.
Evidence kit (four anchors that cover most field audio issues)
| Anchor | What it proves | Typical culprit | Fast discriminator |
|---|---|---|---|
| AEC residual / ERLE | Echo is not being canceled (or path changed) | Acoustic path shift, DSP state interaction | Residual changes with environment, not with relay/power events |
| Input saturation flag | Capture path is clipping (or injected pop is hitting ADC) | Gain too high, rail/ground injection during events | Clips coincide with lock/relay edges or loud playback bursts |
| Quiet-floor spectrum snapshot | Noise source signature (broadband vs tonal switching) | Power ripple, RF hash, wind rumble | Noise signature matches load events or changes with speaker mute |
| Amp limiter / OTP / UVP | Playback path is self-protecting or power-limited | Undersized rail, thermal coupling, load step | Limiter/OTP/UVP rises with volume/temperature; rail dip correlates |
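The "clips coincide with lock/relay edges" discriminator in the table can be turned into a number. The sketch below assumes clip events and relay edges are logged as millisecond timestamps on the same clock; the function name and the 50 ms window are hypothetical choices, not values from this page.

```python
from bisect import bisect_left


def fraction_near_edges(event_ts, edge_ts, window_ms=50):
    """Fraction of events that fall within +/- window_ms of any edge timestamp."""
    if not event_ts:
        return 0.0
    edges = sorted(edge_ts)
    hits = 0
    for t in event_ts:
        # First edge at or after the start of the window around this event.
        i = bisect_left(edges, t - window_ms)
        if i < len(edges) and edges[i] <= t + window_ms:
            hits += 1
    return hits / len(event_ts)


# Made-up example: three of four clip events sit next to a relay edge.
clip_ts = [1005, 2010, 5000, 7990]
relay_ts = [1000, 2000, 8000]
ratio = fraction_near_edges(clip_ts, relay_ts)
```

A ratio near 1 points at electrical injection during actuation; a ratio near the chance level points back at gain or environment.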
H2-6. Uplink & Connectivity: Ethernet/PoE vs Wi-Fi (door-end implementation + evidence)
Engineering goal: stable link + observable link health
Many “video/audio quality” complaints are uplink instability in disguise. This chapter focuses on the door-end connectivity paths and the smallest evidence set that separates: physical link, power coupling (PoE), RF conditions (Wi-Fi), and thermal power states.
Ethernet path (MAC/PHY + magnetics + entry hardening)
- Common failures: intermittent link-up/down, only fails on long cable, worse during storms.
- Door-end evidence: link_up_down_log, negotiated speed/duplex changes, error counters (if PHY exposes them).
- Fast discriminator: short-cable direct test + compare link event frequency; if link events vanish, root cause is physical path/cable/entry stress, not encoder/DSP.
Treat lock wiring and Ethernet as separate “outdoor cable” subsystems: coupling between long I/O and PHY reference is a frequent cause of false uplink blame.
PoE coupling (power events that look like network problems)
- Common failures: negotiates then reboots, drops during lock actuation or IR/amp bursts.
- Door-end evidence: PD event log, uvlo_cnt, rail_pg drop, reset_reason timestamp aligned with link events.
- Fast discriminator: correlate link drop to rail dip; if rail dip precedes link loss, treat as power integrity first.
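The "rail dip precedes link loss" check can be sketched as an ordering test on two timestamp lists. It assumes both logs share one millisecond clock; `lead_ms` (how long before the link drop a dip still counts as the cause) is a hypothetical tuning value.

```python
from bisect import bisect_right


def power_led_link_drops(link_down_ts, rail_dip_ts, lead_ms=200):
    """Return the link-down timestamps that had a rail dip at most lead_ms before them."""
    dips = sorted(rail_dip_ts)
    led = []
    for t in link_down_ts:
        i = bisect_right(dips, t) - 1  # index of the latest dip at or before t
        if i >= 0 and t - dips[i] <= lead_ms:
            led.append(t)
    return led


# Made-up example: the drop at 5000 ms has no preceding dip within the window.
link_down = [1000, 5000, 9000]
rail_dips = [950, 5100, 8990]
power_first = power_led_link_drops(link_down, rail_dips)
```

Link drops that appear in the result should be treated as power integrity problems first; the remainder stay on the physical-link path.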
Wi-Fi path (doorframe attenuation + coexistence with relay/power noise)
- Common failures: works near the AP but fails at distance, retries spike during relay events, throughput falls with temperature.
- Door-end evidence: RSSI trend, retry_cnt, throughput stats, temp/tx power state (if available).
- Fast discriminator: Ethernet A/B: if wired is stable while Wi-Fi degrades, keep debugging on RF/noise coexistence rather than codec/encoder.
Unified evidence kit (smallest set to avoid misattribution)
| Signal | Why it matters | Interpretation | Pairs well with |
|---|---|---|---|
| link_up_down_log | Proves physical link churn | Frequent events → PHY/cable/entry stress | negotiated_speed/duplex |
| RSSI | RF margin baseline | Low/variable RSSI → attenuation/multipath | retry_cnt, throughput |
| retry_cnt / loss | Transport instability | Spikes during events → noise coupling | relay_cnt, uvlo_cnt |
| throughput | Effective uplink capacity | Falls with temp → power state/throttle | temp / clock_state |
| uvlo_cnt + reset_reason | Power collapse signature | Power event masquerading as network drop | link_up_down_log timestamps |
Minimal isolation flow: (1) short-cable Ethernet baseline, (2) correlate link events vs rail events, (3) Wi-Fi RSSI/retry trend under the same load actions (lock + audio + IR).
H2-7. PoE PD & Power Tree: 48V → multi-rail bring-up, brownout, and shock immunity
What this chapter locks down (root cause class)
A large fraction of “black screen / reboot / audio pop / relay misfire” incidents originate in the power domain. This chapter defines a door-end evidence path from PoE input to rail sequencing and reset reasons, so symptoms map to measurable rails and counters rather than guesswork.
PoE input (door-end view): classify, inrush, and cable-drop behaviors
- Key failure signature: input sag during hot-plug, long cable, or high-load events (lock actuation, IR/amp bursts).
- Evidence anchors: pd_event_log, pd_class_detected, uvlo_cnt, inrush_limited/ocp_event (if available).
- Fast discriminator: correlate uvlo / pd events with the exact timestamp of lock/relay actions and video/audio load steps. If power events lead, treat uplink/codec issues as secondary.
Treat “PoE instability” as a measurable combination: sag amplitude × sag duration × correlation to load events.
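Sag amplitude and duration can be extracted from a sampled input-voltage capture with a short routine. The sketch assumes evenly spaced samples in volts; the 42.0 V threshold in the example is an arbitrary illustration, not a PD specification value.

```python
def measure_sags(samples, dt_ms, threshold_v):
    """Return a list of (start_ms, duration_ms, depth_v) for each run below threshold_v."""
    sags = []
    start = None
    vmin = None
    for i, v in enumerate(samples):
        if v < threshold_v:
            if start is None:
                start, vmin = i, v  # a new sag begins
            else:
                vmin = min(vmin, v)
        elif start is not None:
            sags.append((start * dt_ms, (i - start) * dt_ms, threshold_v - vmin))
            start = None
    if start is not None:
        # Capture ended while still below threshold.
        sags.append((start * dt_ms, (len(samples) - start) * dt_ms, threshold_v - vmin))
    return sags


# Made-up capture at 2 ms per sample.
capture = [48.0, 48.0, 41.0, 39.0, 40.0, 48.0, 48.0]
sags = measure_sags(capture, dt_ms=2, threshold_v=42.0)
```

Each tuple's start time can then be compared against relay, amp, and IR timestamps to get the third factor, correlation to load events.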
Multi-rail domain plan: keep dirty loads from collapsing core evidence
The power tree should be readable as domains, each with its own margin and evidence. A practical grouping is:
- Core/Compute: SoC + DDR (highest priority; brownout here causes resets and corrupted logs).
- Imaging: sensor/ISP domain (drops show as frame loss, black flashes).
- Audio: codec/AFE domain (sensitive to ripple/ground bounce → hiss/pop).
- Amp: Class-D domain (large pulsed current source; isolate from codec reference).
- Relay/Lock: coil/driver domain (dirty inductive load; isolate and time-gate).
- RF: Wi-Fi domain (drops cause retry spikes and Tx power state changes).
The intent is not a topology deep dive, but to ensure each domain has PG visibility and can be correlated to system symptoms.
Bring-up / sequencing: PG chain and “mute-before-ready” principles
- Phase 1: PD stable → primary DC/DC establishes a safe intermediate rail (verify VIN margin above UVLO).
- Phase 2: core + DDR rails first (PG_core, PG_ddr) to avoid undefined boot states.
- Phase 3: imaging + uplink (PG_cam, PG_phy / PG_rf) to ensure early link observability.
- Phase 4: audio/amp last; keep amp muted until codec reference is stable to prevent startup pop.
- Phase 5: enable relay/lock domain only after system-ready and logging is live.
This sequence ensures power disturbances from amp/relay do not masquerade as codec/encoder failures.
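A bring-up log can be checked against the phase order with a small validator. This sketch assumes each power-good signal is logged with the time it first asserts. `PG_core`, `PG_ddr`, `PG_cam`, `PG_phy`, and `PG_rf` come from the phases above; `PG_audio`, `PG_amp`, and `PG_relay` are hypothetical names for the later domains.

```python
# Phases 2 to 5 from the sequence above, in required order.
PHASES = [
    ("PG_core", "PG_ddr"),
    ("PG_cam", "PG_phy", "PG_rf"),
    ("PG_audio", "PG_amp"),   # hypothetical names
    ("PG_relay",),            # hypothetical name
]


def sequencing_violations(pg_ts: dict) -> list:
    """Return (late_phase_rail, earlier_phase_rail) pairs where order was violated."""
    violations = []
    latest_ts, latest_name = None, None  # latest assertion seen in earlier phases
    for phase in PHASES:
        present = [(name, pg_ts[name]) for name in phase if name in pg_ts]
        for name, ts in present:
            if latest_ts is not None and ts < latest_ts:
                violations.append((name, latest_name))
        for name, ts in present:
            if latest_ts is None or ts > latest_ts:
                latest_ts, latest_name = ts, name
    return violations


good = {"PG_core": 10, "PG_ddr": 12, "PG_cam": 20, "PG_phy": 22,
        "PG_audio": 30, "PG_amp": 35, "PG_relay": 50}
bad = dict(good, PG_amp=15)  # amp rail came up before imaging/uplink
```

Rails missing from the log are skipped rather than flagged, so the same check works on units without Wi-Fi or without a separate amp rail.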
Brownout / power-down evidence: make the “last moment” searchable
| Evidence | What it proves | Typical symptom | Correlate with |
|---|---|---|---|
| Vcore / Vddr dip | Core margin collapse → reset or silent corruption risk | Reboot, black screen, stuck boot | reset_reason_code, wdt_flag |
| Vaudio ripple | Reference contamination → hiss/pop and AEC instability | Audio pop, noisy duplex | clip_flag, amp limiter/OTP |
| Vamp / Vrelay step | Dirty load step magnitude and recovery | Pop on unlock, relay misfire | relay_actuation_cnt, uvlo_cnt |
| UVLO / PD event log | PoE input sag drives system instability | Random resets on long cable | link up/down timestamps, throughput drop |
| reset_reason_code | Classify cause: brownout vs watchdog vs thermal | “Unexplained reboot” | rail min values, temp/clock state |
H2-8. Door Lock Relay & Access I/O: relay, contact, exit button, tamper, and isolation
What this chapter locks down (the “dirty zone”)
This is the high-risk boundary where inductive loads and long outdoor wiring can inject noise and transients back into audio, RF, uplink, and even the core rail. The goal is to define an I/O partition, protect entry points, and make misfires searchable via counters and correlation snapshots.
I/O partition: three categories with different noise paths
- Inductive output (relay/strike/solenoid): large di/dt and kickback; must be isolated and time-gated.
- Long-wire inputs (door contact / exit button): ESD/EFT/common-mode injection; needs entry protection and debounce.
- Tamper switch: mechanical bounce and nuisance triggers; requires filtering and evidence snapshotting.
The purpose is not a feature list, but to prevent dirty I/O behavior from being misdiagnosed as codec/uplink issues.
Relay/lock output: coil kickback, arc transients, and controlled enable
- Kickback path: coil energy returns through the driver loop; without a nearby clamp, it spreads into ground/reference.
- Arc path: contact switching can create spikes; treat it as an outdoor transient source, not a “logic signal”.
- Placement rule: clamps (TVS/RC) should sit close to the noisy loop/entry, not far away on the logic island.
- Evidence anchors: relay_actuation_cnt, relay_edge_timestamp, misfire_cnt, and correlation to uvlo_cnt / link_up_down_log / audio pop markers.
Door contact / exit button inputs: debounce for evidence, plus entry hardening
- Dry-contact inputs: treat as noisy edges; debounce is required to make state-change logs trustworthy.
- Long cable reality: ESD/EFT and common-mode spikes can look like rapid toggles without entry protection.
- Optional isolation: when wiring is long/outdoor and transient-heavy, isolation can prevent back-injection to logic reference.
- Evidence anchors: contact_state_change_log, contact_bounce_cnt, false_open_alarm_cnt.
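Debounce that also produces evidence can be sketched as a small state machine: a raw level must hold for a stable interval before the logged state changes, and every abandoned excursion increments a bounce counter. The class name, the 20 ms interval, and the attribute names are hypothetical.

```python
class Debouncer:
    """Time-based debounce for a dry-contact input, with a bounce counter."""

    def __init__(self, stable_ms=20, initial=False):
        self.stable_ms = stable_ms
        self.state = initial        # debounced (trusted) state
        self._cand = initial        # last raw level seen
        self._cand_since = 0        # when the raw level last changed
        self.bounce_cnt = 0         # excursions rejected before becoming stable
        self.changes = []           # accepted changes: (ts_ms, new_state)

    def sample(self, ts_ms, raw):
        if raw != self._cand:
            if self._cand != self.state:
                # The pending candidate flipped back before it was accepted.
                self.bounce_cnt += 1
            self._cand, self._cand_since = raw, ts_ms
        elif raw != self.state and ts_ms - self._cand_since >= self.stable_ms:
            self.state = raw
            self.changes.append((ts_ms, raw))
        return self.state


# Made-up trace: the contact chatters for a few ms, then settles closed.
d = Debouncer(stable_ms=20)
for ts, level in [(0, False), (100, True), (102, False), (104, True),
                  (106, False), (108, True), (130, True), (200, True)]:
    d.sample(ts, level)
```

The accepted changes feed a state-change log and the counter feeds `contact_bounce_cnt`, so a noisy harness shows up as a rising counter instead of a stream of false door events.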
Tamper input: bounce filtering + snapshot at trigger time
Tamper events should include a compact snapshot so “false alarms” can be separated from true enclosure opening and from power/network disturbances.
- Filtering: bounce filtering prevents nuisance triggers from mechanical chatter.
- Snapshot fields: reset_reason, uvlo_cnt delta, link state, RSSI/retry, relay activity within a short window.
- Evidence anchors: tamper_open_cnt, tamper_bounce_cnt, tamper_snapshot.
Correlation rules (avoid misattribution)
- If relay actuation aligns with uvlo/reset → prioritize power tree (H2-7) before blaming codec/uplink.
- If relay actuation aligns with clip_flag/amp limiter → prioritize audio injection isolation (H2-5).
- If contact bounce spikes align with entry transient windows → prioritize entry protection / isolation (this chapter).
H2-9. Optional biometrics integration: fingerprint/face at the door-station integration boundary
Integration-level scope (what this page does, and what it refuses)
Biometrics inside a door station should be treated as plug-in peripheral integration: stable power/wake control, robust interface, predictable ESD behavior, and audit-ready logs. Identity system design, algorithms, and controller-level policy belong to dedicated access controller pages.
Fingerprint module: interface + power/wake + “log every update”
- Interfaces: SPI / UART / I²C are acceptable if the integration includes timeout + retry behavior and error counters that survive reboot.
- Power & wake: keep the module in a controlled power domain (power switch or enable rail), and record ready_time_ms and enumeration_result.
- ESD: treat the sensor surface and its cable/connector as an outdoor entry point. The goal is not only “no damage” but also “no silent interface lock-up”.
- Update logging (no algorithm talk): template/DB updates must be event-logged with version and result, so field issues become searchable.
Recommended door-end evidence fields
- fp_pwr_on_cnt, fp_pwr_fail_cnt, fp_ready_time_ms
- fp_if_err_cnt, fp_if_reset_cnt, fp_last_err
- fp_match_ok_cnt, fp_match_fail_cnt, fp_fail_reason
- tpl_version, tpl_update_seq, tpl_write_crc, tpl_write_result
The minimum success metric is not “recognition rate” alone; it is recognition rate split by fail reason (timeout / interface error / not-ready / poor capture / strobe miss), with a stable denominator.
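Splitting the rate by fail reason over one shared denominator can be sketched as follows. It assumes each failed attempt is logged with a single reason string; the reason labels in the example are illustrative.

```python
from collections import Counter


def match_breakdown(ok_cnt: int, fail_reasons: list) -> dict:
    """Share of all attempts per outcome; every share uses the same denominator."""
    total = ok_cnt + len(fail_reasons)
    if total == 0:
        return {}
    shares = {"ok": ok_cnt / total}
    for reason, n in Counter(fail_reasons).items():
        shares[reason] = n / total
    return shares


# Made-up counts: 90 matches and 10 failures across three reasons.
fails = ["timeout"] * 4 + ["poor_capture"] * 5 + ["interface_error"]
shares = match_breakdown(90, fails)
```

Because attempts that never reached the matcher (timeout, interface error, not-ready) stay in the denominator, an interface problem cannot hide behind a healthy-looking recognition rate.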
Face/IR integration: strobe timing, install risks, and counters (no model/algorithm)
- IR illumination: treat IR LED/VCSEL as a controlled load domain with a clear trigger edge and a recorded timestamp.
- Frame-level sync: bind strobe to camera exposure windows and count misses. If strobe timing drifts, “recognition failures” will be misattributed to software.
- Dual-camera/depth (integration view): focus on connector/interface robustness and installation coupling (metal door frames, reflections, window contamination).
- Evidence anchors: ir_strobe_cnt, ir_strobe_miss_cnt, ir_phase_err, frame_drop_cnt (correlated window), sensor_status_code.
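Counting strobe misses against exposure windows can be sketched as an interval-overlap test. The sketch assumes exposure windows and strobe pulses are available as (start, end) pairs in microseconds on one clock, and it defines a miss as a frame whose exposure overlaps no strobe pulse; a real design may require a minimum overlap instead.

```python
def count_strobe_misses(exposures, strobes):
    """Count frames whose exposure window overlaps no strobe pulse."""
    misses = 0
    for e_start, e_end in exposures:
        lit = any(on < e_end and off > e_start for on, off in strobes)
        if not lit:
            misses += 1
    return misses


# Made-up timing: the second strobe fires after its frame's exposure has closed.
exposures = [(0, 1000), (33000, 34000), (66000, 67000)]
strobes = [(100, 900), (34500, 35300), (66200, 66800)]
misses = count_strobe_misses(exposures, strobes)
```

The result maps to `ir_strobe_miss_cnt`; logging the offset between strobe and exposure start for each frame would give a value for `ir_phase_err` as well.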
Privacy & storage (integration boundary): minimal data + traceability fields
Door stations should keep biometric storage minimal and traceable. The objective is to answer “what changed and when” without implementing a controller-level trust architecture here.
- Minimize stored content: store only what is required for operation and audit (versions, counters, checksums).
- Versioning: every update increments a monotonic sequence and writes a CRC to detect partial writes.
- Traceability: log module ID / firmware version so failure clusters can be grouped.
- bio_module_id, bio_fw_version, policy_version
- tpl_update_seq (monotonic), tpl_write_crc, tpl_write_result
- bio_session_id, bio_fail_reason, bio_timeout_cnt
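The monotonic sequence plus CRC scheme can be sketched with the standard library. CRC-32 via `zlib.crc32` is an assumption here, since the page does not name a CRC variant, and the result strings are illustrative.

```python
import zlib


def make_update_record(prev_seq: int, tpl_version: str, payload: bytes) -> dict:
    """Build the log record written alongside a template update."""
    return {
        "tpl_update_seq": prev_seq + 1,          # monotonic sequence
        "tpl_version": tpl_version,
        "tpl_write_crc": zlib.crc32(payload),    # CRC of the intended content
    }


def verify_write(record: dict, stored: bytes, last_seq: int) -> str:
    """Check a stored template against its record after a write or a reboot."""
    if record["tpl_update_seq"] <= last_seq:
        return "SEQ_NOT_MONOTONIC"
    if zlib.crc32(stored) != record["tpl_write_crc"]:
        return "CRC_MISMATCH"   # e.g. a partial write caused by a brownout
    return "OK"


payload = b"template-v7-bytes"
rec = make_update_record(prev_seq=41, tpl_version="v7", payload=payload)
result = verify_write(rec, payload, last_seq=41)
```

The outcome of `verify_write` is what `tpl_write_result` would carry, which makes a partial write during a power event searchable after the fact.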
H2-10. Outdoor reliability: ESD/surge/lightning proximity, condensation, and acoustic/optical environment coupling
Real-world objective
Outdoor door stations fail in ways that look “random” unless protection and verification are tied to evidence. This chapter organizes reliability by ports and coupling paths, then defines verification checkpoints and counters to correlate field incidents with environment.
Port-based protection map: port → path → symptom
- RJ45 (data + PoE): common-mode injection → link drop, PD events, reset after ESD.
- Relay/lock wiring: inductive transients → misfire, audio pop, brownout correlation.
- Door contact / exit button: long-wire ESD/EFT → bounce spikes, false triggers.
- Button / mic hole: direct touch/water paths → spurious events, noise floor swings.
- Antenna (if Wi-Fi): detune/metal frame coupling → RSSI drop and retry spikes.
The pass criterion is not only “survives ESD”, but “restores service with stable counters and predictable recovery time”.
RJ45 entry: isolation boundary + common-mode control (door-end view)
- Isolation boundary: treat the magnetics/isolation barrier as the dividing line; keep noisy return paths out of logic reference.
- Common-mode control: CMC and shield/earth strategy are used to reduce link-drop and silent PHY lockups.
- Evidence anchors: link_down_cnt, phy_err_cnt, pd_event_log, reset_reason_after_esd, recovery_time_ms.
Condensation & weather: short-lived distortions must be correlatable
Condensation and humidity can create transient failures (fogged image, mic bias drift, leakage paths, RF detune) that disappear before a technician arrives. The fix is to log environment context and symptom counters within the same time window.
- Context: temperature (at minimum), plus an optional humidity/dew proxy if present.
- Image/audio symptoms: frame_drop_cnt, sensor_err_code, noise_floor_metric, aec_residual_metric.
- Uplink/RF symptoms: rssi, retry_rate, tx_power_state.
“It recovers by itself” is a signature of environment coupling. Without correlation logs, it will be misdiagnosed as software instability.
Acoustic & optical coupling: environment changes the input, not only the settings
- Acoustic: wind/rain and enclosure resonance alter echo path → AEC residual grows, duplex feels unstable.
- Optical: water droplets/film and IR reflections change exposure dynamics → strobe misses and frame perturbations.
- Evidence anchors: aec_residual_metric, agc_state, ir_strobe_miss_cnt, exposure_swing_metric.
Verification checkpoints (minimal, repeatable)
- After ESD event: record reset_reason, link recovery time, and whether counters jump abnormally.
- After surge/near-lightning stress: look for PD/UVLO events, link drops, and false I/O triggers in the same window.
- During humidity/condensation exposure: correlate temp/humidity with sensor_err_code, noise floor, and retry rate.
H2-11. Validation plan: a deliverable test matrix (not just “it works”)
This section converts a door-station build into a repeatable deliverable: each test item includes condition → pass criteria → required logs/counters → first 2 probes. Failures must map back to evidence fields so field incidents are diagnosable.
Reference BOM MPN examples (drop-in candidates)
The following part numbers are examples to anchor validation measurement points and protection design. Select exact variants per power budget, packages, and compliance needs.
- PoE PD (802.3af/at/bt, depending on part): TI TPS2372 / TPS2373 / TPS23753A, ADI (LT) LT4275A, Microchip PD70224.
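- Hot-swap / inrush / eFuse: TI TPS25940 / TPS2660, ADI LTC4368 / LTC4367 (check each part's voltage rating against the rail it protects; not every part listed is rated for the 48V input).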
- DC/DC (buck, system rails): TI TPS54331, MPS MP1584EN, ADI LTC3633, Richtek RT8299.
- PHY (10/100/1G Ethernet): Microchip KSZ9031RNX, TI DP83867IR, Realtek RTL8211F.
- ESD/Surge TVS (Ethernet / I/O): Semtech RClamp0524P, Littelfuse SP3012, Nexperia PESD2ETH series.
- Common-mode choke (Ethernet): Würth Elektronik WE-CMC families (select by impedance/current), TDK ACT45B families (example family).
- Audio codec: TI TLV320AIC3104, Cirrus Logic CS42L42, Realtek ALC5651 (platform-dependent).
- Class-D speaker amp: TI TPA2016D2, TI TPA3110D2, NXP TFA9890 (smart amp class).
- Relay driver (low-side or high-side): TI ULN2003A (low-side array), TI TPS274160 (multi-channel high-side), Infineon BTS700x (high-side family example).
- Digital isolator (I/O or RS-485 paths when used): TI ISO7721, ADI ADuM1201.
For outdoor wiring, prefer parts with clear surge/ESD ratings and known leakage behavior; validate leakage/false-trigger with long-cable fixtures.
Evidence fields required across tests (log schema starter)
- Power: reset_reason, brownout_cnt, uvlo_event_cnt, pd_event_log, recovery_time_ms
- Video: frame_drop_cnt, encoder_err_cnt, bitrate_avg, bitrate_peak, thermal_throttle_state
- Audio: aec_residual_metric, mic_clip_cnt, noise_floor_metric, amp_limit_state
- Network: link_up_cnt, link_down_cnt, phy_err_cnt, reconnect_cnt, rssi, retry_rate
- I/O: relay_act_cnt, relay_misfire_cnt, contact_bounce_cnt, false_trigger_cnt, esd_event_cnt
- Environment (if available): temp_c, humidity_pct, condensation_flag
Test matrix template (condition → criteria → logs → first 2 probes)
| Test item | Setup / condition | Stimulus | Pass criteria | Logs / counters | First 2 probes (TP) |
|---|---|---|---|---|---|
| Video: Low-light + motion stability | Lux down to target, moving subject, IR off/on window | Run 10–30 min streams, toggle IR load and WDR mode (if supported) | Frame drops within limit; bitrate peaks bounded; no unrecoverable freeze | frame_drop_cnt, encoder_err_cnt, bitrate_peak, thermal_throttle_state | TP1: SYS rail; TP2: sensor/ISP rail (or core rail) |
| Video: Thermal rise → throttling behavior | Chamber or controlled heating; airflow restricted scenario | Continuous encode at max profile for ≥60 min | Graceful degrade (fps/bitrate) with logs; no reboot loops | temp_c, thermal_throttle_state, reset_reason, bitrate_avg | TP1: CORE rail; TP2: DDR rail (if accessible) |
| Audio: Duplex AEC robustness | Speaker playback at multiple SPL, mic distance fixed | 2-way call; sweep volume; introduce reflective surfaces | AEC residual within limit; no howling; mic clip rare | aec_residual_metric, mic_clip_cnt, noise_floor_metric | TP3: codec mic-in node; TP4: class-D PVDD |
| Audio: Relay actuation noise injection | Relay wired to representative lock load; audio active | Trigger relay at call start/mid/end; repeat cycles | No audible pop beyond limit; call remains stable | relay_act_cnt, noise_floor_metric, reset_reason | TP4: class-D PVDD; TP5: audio analog rail |
| Network: PoE cable drop stability | Long cable / worst-case gauge; varying PSE ports | Stream video + call audio; toggle relay periodically | No brownout reset; PD events within spec; recovery time bounded | pd_event_log, uvlo_event_cnt, brownout_cnt, reset_reason, recovery_time_ms | TP6: 48V at PD input; TP1: SYS rail |
| Network: Link flap & reconnect | Switch port toggles / cable wiggle fixture | Force link down/up repeatedly while streaming | Auto-reconnect; no stuck state; logs show clear event chain | link_down_cnt, reconnect_cnt, phy_err_cnt, recovery_time_ms | TP7: PHY supply; TP1: SYS rail |
| Wi-Fi: Weak-signal retry stress | Attenuation / metal frame surrogate, high 2.4 GHz congestion | Maintain call + stream; roam or reduce RSSI stepwise | Retry bounded; no reboot; reconnect works after dropout | rssi, retry_rate, reconnect_cnt, reset_reason, temp_c | TP8: RF/PMIC rail; TP1: SYS rail |
| I/O: Door contact bounce immunity | Long-wire harness + bounce generator fixture | Inject bounce patterns; vary debounce settings (if any) | No false unlock; bounce counted and bounded | contact_bounce_cnt, false_trigger_cnt, io_state_snapshot | TP9: contact input node; TP1: SYS rail |
| I/O: External-wire ESD robustness | ESD gun / coupling clamp at contact/relay wiring | Apply stress; verify immediate service restoration | No latch-up; if reset happens, reason logged and recovery bounded | esd_event_cnt, reset_reason_after_esd, link_down_cnt, false_trigger_cnt | TP6: 48V; TP10: relay drive node |
| Env: Temperature cycle | Low/high temp cycle with dwell | Encode + call + relay cycles during ramps | No persistent failures; drift is visible in logs | temp_c, frame_drop_cnt, aec_residual_metric, reset_reason | TP1: SYS rail; TP4: class-D PVDD |
| Env: Condensation recovery | Condensation exposure window, then dry-out | Observe distortions; track recovery timing | Recovers without manual intervention; distortion correlates to env logs | humidity_pct/condensation_flag, sensor_err_code, noise_floor_metric, retry_rate | TP3: mic-in; TP9: contact input |
Rule: every test row must name its first 2 probes. If a test needs more than two probes to reach a first verdict, it is not “field-friendly”.
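The two-probe rule can be checked mechanically when the matrix is maintained as text. This is a minimal sketch, assuming probe cells keep the `TPn:` labelling used in the table; the function names are hypothetical.

```python
import re
from typing import List


def probes_in(cell: str) -> List[str]:
    """Extract test-point labels such as 'TP1' from a 'First 2 probes' cell."""
    return re.findall(r"\bTP\d+\b", cell)


def is_field_friendly(cell: str) -> bool:
    """A row passes the rule when it names exactly two distinct test points."""
    return len(set(probes_in(cell))) == 2
```

A lint step like this catches rows that quietly grow a third probe or lose one during edits.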
H2-12. Field debug SOP: Symptom → Evidence → Isolate → Fix (minimal tools)
This SOP is designed for field engineers with limited tools: a multimeter, a scope (bench or portable), a known-good cable, and a PoE injector or supply. Each symptom starts with two measurements that split the decision tree quickly.
Reusable 4-step template
- Symptom: describe what the user sees (black screen, echo, reboot, misfire).
- Evidence (first 2 checks): 2 probes + 2 log fields (fast discriminators).
- Isolate: remove one coupling path (disconnect relay wiring, swap cable, disable IR load, etc.).
- Fix (first fix): apply the highest-yield correction first (protection/rail isolation/return path/threshold).
Symptom A — Black screen / mosaic / frozen video
- Evidence (first 2 probes): TP1 SYS rail + TP2 CORE
- Evidence (logs): reset_reason, frame_drop_cnt, link_down_cnt, encoder_err_cnt
- Discriminator: brownout/UVLO → power path; link flaps → RJ45/PHY/ESD; stable rails + rising encoder_err → video chain/thermal.
- Isolate: swap to known-good short cable; disable IR load window; remove relay wiring; retest.
- First fix (hardware-first): verify PD/UVLO margin (e.g., a TPS23753A-class PD controller with adequate hold-up), add or upgrade the Ethernet ESD clamp (e.g., RClamp0524P) and review the CMC selection, and confirm the SoC/DDR rails do not sag under peak encode load (buck choice such as TPS54331 / MP1584EN).
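The Symptom A discriminator can be written as a small triage function over two log snapshots taken before and after the fault. This is a sketch under stated assumptions: the `reset_reason` strings (`"brownout"`, `"uvlo"`) are hypothetical placeholders for whatever the firmware reports, and `rails_stable` is the engineer's verdict from the TP1/TP2 scope captures.

```python
from typing import Any, Mapping

# Hypothetical reset_reason values; substitute the strings your firmware logs.
POWER_RESET_REASONS = {"brownout", "uvlo"}


def classify_video_fault(before: Mapping[str, Any],
                         after: Mapping[str, Any],
                         rails_stable: bool) -> str:
    """Map Symptom A evidence to the block to inspect first."""

    def rose(field: str) -> bool:
        # A counter "rose" if it is higher in the later snapshot.
        return int(after.get(field, 0)) > int(before.get(field, 0))

    if str(after.get("reset_reason", "")).lower() in POWER_RESET_REASONS:
        return "power path"
    if rose("link_down_cnt"):
        return "RJ45/PHY/ESD"
    if rails_stable and rose("encoder_err_cnt"):
        return "video chain/thermal"
    return "inconclusive"
```

The order of the checks mirrors the discriminator line: power evidence is tested first because a brownout can also produce link flaps and encoder errors as side effects.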
Symptom B — Echo / howling / unstable duplex talk
- Evidence (first 2 probes): TP3 codec mic-in + TP4 class-D PVDD
- Evidence (logs): aec_residual_metric, mic_clip_cnt, noise_floor_metric, relay_act_cnt
- Discriminator: mic clip or elevated noise floor → front-end/grounding; PVDD ripple or relay-correlated spikes → power/dirty-zone coupling.
- Isolate: disconnect external lock wiring; run speaker load substitute; retest duplex; compare AEC residual windows.
- First fix: separate amp PVDD (Class-D like TPA3110D2) return from mic/codec analog ground, ensure codec analog rail is clean (codec e.g., TLV320AIC3104), and clamp relay kickback with diode/TVS/RC near the coil and connector.
Symptom C — Random reboot under PoE (especially during relay or IR load)
- Evidence (first 2 probes): TP6 48V (PD in) + TP1 SYS rail
- Evidence (logs): pd_event_log, uvlo_event_cnt, brownout_cnt, reset_reason
- Discriminator: 48V droop + PD event → cable drop / PSE power budget; SYS-only droop → DC/DC transient/hold-up.
- Isolate: shorten cable; lock PSE port power class; disable relay/IR and re-introduce one load at a time.
- First fix: improve inrush/hold-up and UVLO margins (PD e.g., TPS2372/TPS23753A), consider hot-swap/eFuse (e.g., TPS25940 or LTC4368), and tune buck compensation / output caps for load steps.
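The Symptom C split between a 48V droop and a SYS-only droop can be sketched as a comparison of scope minima against floors chosen by the engineer. The floor values are deliberately parameters, not defaults, because they depend on the PD controller and rail design; the function name and return strings are illustrative.

```python
def classify_poe_reboot(v48_min: float, sys_min: float,
                        v48_floor: float, sys_floor: float,
                        pd_event_seen: bool) -> str:
    """Split a PoE reboot between the cable/PSE side and the local DC/DC side.

    v48_min / sys_min: lowest voltages captured at TP6 and TP1 during the event.
    v48_floor / sys_floor: caller-chosen limits below which a droop is declared.
    pd_event_seen: True if pd_event_log recorded an event in the same window.
    """
    v48_droop = v48_min < v48_floor
    sys_droop = sys_min < sys_floor
    if v48_droop and pd_event_seen:
        return "cable drop / PSE power budget"
    if sys_droop and not v48_droop:
        return "DC/DC transient / hold-up"
    return "inconclusive"
```

For example, with hypothetical floors of 40.0 V and 4.5 V, a capture of 47.5 V at TP6 and 4.2 V at TP1 points at the DC/DC stage rather than the cable.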
Symptom D — Door lock misfire / false unlock / contact flicker alarms
- Evidence (first 2 probes): TP10 relay drive + TP9 contact in
- Evidence (logs): relay_misfire_cnt, contact_bounce_cnt, false_trigger_cnt, esd_event_cnt
- Discriminator: drive node glitches → driver/return-path; contact node spikes → long-wire ESD/EFT + debounce/threshold.
- Isolate: disconnect field wiring and use a short harness; swap lock load; apply controlled ESD to reproduce.
- First fix: add driver with known behavior (e.g., ULN2003A for low-side relay), clamp and snub at the connector (TVS/RC), and harden contact inputs (TVS + debounce + optional isolation such as ISO7721 / ADuM1201 in harsh wiring).
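On the debounce part of the fix, a common software approach is to accept a contact state change only after it has held for a fixed number of consecutive samples, and to count the abandoned attempts. The sketch below assumes binary samples (0/1) taken at a fixed interval; `stable_n` and the function name are illustrative, and the bounce count is one plausible way to feed a counter such as `contact_bounce_cnt`.

```python
from typing import Iterable, List, Tuple


def debounce(samples: Iterable[int], stable_n: int) -> Tuple[List[int], int]:
    """Return (debounced state per sample, number of rejected bounces).

    A change is accepted only after `stable_n` consecutive samples differ
    from the current state. A shorter excursion is counted as one bounce.
    """
    if stable_n < 1:
        raise ValueError("stable_n must be >= 1")
    out: List[int] = []
    state = None
    run = 0        # length of the current excursion away from `state`
    bounces = 0
    for s in samples:
        if state is None:
            state = s              # first sample defines the initial state
        if s == state:
            if run > 0:
                bounces += 1       # excursion ended before reaching stable_n
            run = 0
        else:
            run += 1
            if run >= stable_n:
                state = s          # excursion held long enough: accept it
                run = 0
        out.append(state)
    return out, bounces
```

With `stable_n=3`, the input `[0, 0, 1, 0, 0, 1, 1, 1, 0]` yields the states `[0, 0, 0, 0, 0, 0, 0, 1, 1]` and one counted bounce: the single-sample spike is rejected, while the three-sample run is accepted.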
Symptom E — Link drops only in one venue (outdoor storms / long runs)
- Evidence (first 2 probes): TP7 PHY supply + TP6 48V
- Evidence (logs): link_down_cnt, phy_err_cnt, reset_reason_after_esd, recovery_time_ms
- Isolate: swap cable and switch port; test with shield/earth variations if the design supports it.
- First fix: improve Ethernet ESD clamp (e.g., RClamp0524P / SP3012), verify PHY choice/strapping (e.g., DP83867IR, KSZ9031RNX), and ensure CMC/magnetics/isolation boundary is implemented with controlled return paths.
Symptom F — Night failures around IR illumination (intermittent recognition or image swings)
- Evidence (first 2 probes): TP1 SYS rail + TP2 CORE (or sensor rail)
- Evidence (logs): ir_strobe_miss_cnt, frame_drop_cnt, bitrate_peak, reset_reason
- Isolate: disable IR; reduce strobe current; retest; then re-enable stepwise to find the load threshold.
- First fix: move IR load to its own controlled rail and ensure load-step stability (buck/eFuse choices as above), and log strobe-miss counters to avoid “software blame” loops.
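The stepwise re-enable in the Isolate step amounts to a linear search for the first load level that fails. A minimal sketch follows, assuming the engineer supplies a `run_trial` callable that applies one IR current level and returns True if the unit stays healthy; both the callable and the milliamp units are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_load_threshold(
    levels_ma: Iterable[int],
    run_trial: Callable[[int], bool],
) -> Tuple[Optional[int], Optional[int]]:
    """Step the IR load upward; return (last passing level, first failing level).

    Either element is None if no level passed or no level failed.
    """
    last_ok: Optional[int] = None
    for level in sorted(levels_ma):
        if not run_trial(level):
            return last_ok, level
        last_ok = level
    return last_ok, None
```

Recording both values brackets the threshold, which is more useful for sizing the IR rail than a single pass/fail verdict at full load.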
Decision tree (fast path): symptom → first 2 probes → next action
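The fast path can also be kept as a lookup table so a field tool or checklist app returns the first two probes and log fields for a given symptom. This sketch copies the probe and log names from Symptoms A to F above; the `FirstChecks` type and `first_checks` helper are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class FirstChecks(NamedTuple):
    probes: Tuple[str, str]
    logs: Tuple[str, ...]


# Keys are the symptom letters used in this SOP.
FAST_PATH: Dict[str, FirstChecks] = {
    "A": FirstChecks(("TP1 SYS rail", "TP2 CORE"),
                     ("reset_reason", "frame_drop_cnt", "link_down_cnt", "encoder_err_cnt")),
    "B": FirstChecks(("TP3 codec mic-in", "TP4 class-D PVDD"),
                     ("aec_residual_metric", "mic_clip_cnt", "noise_floor_metric", "relay_act_cnt")),
    "C": FirstChecks(("TP6 48V (PD in)", "TP1 SYS rail"),
                     ("pd_event_log", "uvlo_event_cnt", "brownout_cnt", "reset_reason")),
    "D": FirstChecks(("TP10 relay drive", "TP9 contact in"),
                     ("relay_misfire_cnt", "contact_bounce_cnt", "false_trigger_cnt", "esd_event_cnt")),
    "E": FirstChecks(("TP7 PHY supply", "TP6 48V"),
                     ("link_down_cnt", "phy_err_cnt", "reset_reason_after_esd", "recovery_time_ms")),
    "F": FirstChecks(("TP1 SYS rail", "TP2 CORE (or sensor rail)"),
                     ("ir_strobe_miss_cnt", "frame_drop_cnt", "bitrate_peak", "reset_reason")),
}


def first_checks(symptom: str) -> FirstChecks:
    """Look up the first two probes and log fields for a symptom letter."""
    key = symptom.strip().upper()
    if key not in FAST_PATH:
        raise KeyError(f"unknown symptom: {symptom!r}")
    return FAST_PATH[key]
```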
H2-13. FAQs ×12 (evidence-based, no scope creep)
Each answer gives a short conclusion, the first 2 checks (TP probes + log fields), a first fix, and a chapter mapping back to the main evidence chain.