Smart Plug / Switch Hardware Design & Debug Guide
Core idea: A Smart Plug/Switch is not “just an AC switch”—it is a tightly coupled system of mains switching + accurate energy metering + low-power Wi-Fi/BLE + protected AC/DC that must remain stable under surge/EFT/ESD, switching dv/dt, and radio TX bursts.
Good designs are proven by an evidence chain: with a few key waveforms and logs, engineers can quickly separate leakage vs welded contacts, supply sag vs 3.3V droop, noise injection vs drift, and false triggers vs real commands—making the product manufacturable and field-debuggable.
What this page solves: hardware boundary and key variants
This page focuses on making an AC mains switch (relay/SSR) + energy metering (V/I sensing + metering engine) + low-power Wi-Fi/BLE hardware coexistence + protected power supply stable for mass production and field debugging. Topics outside this boundary (app/cloud/protocol stack) are intentionally excluded.
Hard boundary: four hardware pillars
(1) Switching element and driver (relay / triac / SSR) · (2) Metering signal chain (shunt/CT → AFE/ADC → metering SoC) · (3) Connectivity hardware coexistence (radio peak current + EMI immunity) · (4) Protection and power integrity (surge/EFT/ESD + brownout behavior).
Variants: listed only, no cross-topic expansion
Smart Plug vs In-wall Switch (isolation/EMI sensitivity) · Single-wire vs Neutral supply behavior (power availability and leakage constraints) · Relay vs Solid-State (leakage, dv/dt robustness, thermal profile).
- Smart Plug vs In-wall Switch: plug form factors often expose contact heating and connector resistance; in-wall designs amplify isolation, creepage/clearance, EMI coupling, and false-trigger risks.
- Single-wire vs Neutral: single-wire designs face light-load stability, off-state supply constraints, and leakage-driven ghost behaviors; neutral designs shift emphasis toward surge/EMI robustness and thermal derating.
- Relay vs Solid-State: relay risks include arcing, bounce EMI, and contact welding; solid-state risks include leakage current, dv/dt false triggering, snubber trade-offs, and continuous conduction heating.
System architecture: power tree, sense chain, actuation chain, and coexistence
The entire design can be audited through four chains. Every later section maps back to one (or more) chains, enabling a repeatable debug workflow: identify symptom → capture minimum evidence → apply a binary decision rule → tune one hardware knob.
1) Power tree (stability and survivability)
AC input protection → EMI filter → rectifier/HV bus → AC/DC → 5V/3.3V rails → LDO/DC-DC for MCU/radio/metering.
2) Sense chain (accuracy and drift control)
Voltage divider + current sensing (shunt/CT) → AFE/ADC → metering engine → calibration + temperature drift compensation.
3) Actuation chain (clean and deterministic switching)
MCU control → driver/isolation → relay/triac/SSR → snubber/arc control → load. Emphasis: no false trigger, no welding, controlled EMI.
4) Coexistence (radio peaks + dv/dt + metering integrity)
Wi-Fi/BLE burst current, switching dv/dt, and SMPS noise must not corrupt rails, trigger the actuator, or bias the metering path.
- Evidence anchor points (recommended): TP1 HV bus, TP2 3.3V rail, TP3 current-sense output, TP4 coil/gate drive, TP5 RF burst marker (TX activity/current).
- Binary decisions that keep debugging fast: “rail droop vs logic fault” (TP2 + reset reason), “welded contact vs leakage” (TP4 + load voltage/current), “noise injection vs calibration drift” (TP3 vs temperature/time correlation).
- Design knobs that directly map to symptoms: UVLO threshold/hysteresis, bulk capacitance/ESR, snubber RC, dv/dt rating + gate resistor, AFE anti-aliasing, ground partitioning, and surge clamp sizing.
Switching element deep dive: Relay vs Triac/SSR — selection and field failures
Switching selection should be driven by a load-stress profile and verified by field evidence. The most valuable outcome is a binary workflow: identify the failure symptom → capture two signals/measurements → apply a decision rule → tune one hardware knob.
Load stress profile (what matters)
Focus on inrush peak and duration, inductive kickback, dv/dt at turn-off/turn-on, light-load sensitivity to leakage, and continuous conduction heating. These directly map to welding, false triggering, ghost conduction, and thermal runaway risks.
Evidence anchors used in this section
TP4 (coil/gate drive), Load V/I at the output, and a timing marker for switching events. When coexistence is suspected, correlate with TP2 rail droop and the RF burst marker.
- Relay strengths & failure patterns: low leakage and simple off-state behavior; risks include contact welding under inrush/inductive loads, contact bounce spikes (EMI), and coil-drive flyback path side effects.
- Triac/SSR strengths & failure patterns: no moving contacts; risks include off-state leakage (ghost conduction), dv/dt false triggering, snubber heat, and continuous conduction thermal limits.
- Zero-cross vs random turn-on: zero-cross typically reduces conducted EMI; random turn-on improves control flexibility but increases EMI stress and requires stronger filtering/coexistence design.
| Decision axis | Relay | Triac / SSR | MOSFET SSR |
|---|---|---|---|
| Off-state behavior | Near-zero leakage; clean off-state for light loads. | Non-zero leakage & parasitic capacitance; can create “ghost” conduction under light loads. | Can be low leakage (device-dependent); reverse blocking must be engineered. |
| Inrush & inductive stress | Risk of contact weld under high inrush; arc energy must be clamped. | Trigger strategy and snubber define stress; dv/dt immunity is critical. | Rds(on) loss and SOA/series stacking define stress margin. |
| EMI signature | Bounce/arc spikes can be dominant without suppression. | Zero-cross reduces conducted EMI; random turn-on increases EMI requirements. | Fast edges can inject noise; layout + gate control is decisive. |
| Thermal ceiling | Usually low conduction loss; coil power exists. | Conduction loss is continuous; heatsinking often limits maximum current. | Conduction loss ∝ Rds(on); thermal path defines real rating. |
| Best-fit triggers | When leakage is unacceptable and simple off-state is required. | When silent switching and contactless operation are required, with leakage tolerated. | When leakage must be minimized but solid-state control is needed, accepting higher complexity. |
Table: Use constraints (leakage, dv/dt, thermal) to decide; do not decide by “features” alone.
| Field symptom | Most likely causes | Capture two signals / measurements | Decision rule + knob |
|---|---|---|---|
| OFF command but load still ON | Relay contact weld or SSR leakage | 1) TP4 drive 2) Load current | If TP4 is OFF and load current is sustained (not mA-level), welding is likely → add inrush limiting/arc suppression/derating. |
| No command but periodic half-wave at load | dv/dt false trigger (triac/SSR) | 1) Gate waveform 2) Load voltage | If gate noise aligns with dv/dt events → tune snubber RC, gate resistor, isolation CMTI, and HV/LV partitioning. |
| Reboot / disconnect during switching | Arc/bounce EMI injection or rail droop | 1) TP2 rail 2) Switch event marker | If TP2 dips below brownout threshold at the same timestamp → add local bulk capacitance, control ESR, increase UVLO hysteresis, and improve the return path. |
| Runs for minutes then turns off | Thermal limit (SSR/triac/MOSFET) or contact heating | 1) Case temp 2) On-drop (V) | If on-drop rises with temperature → thermal runaway margin is insufficient → improve heatsinking/derate current/use lower-loss switch. |
| OFF still shows small voltage on load | SSR leakage + parasitic capacitance | 1) Load leakage current 2) Waveform shape | mA-level leakage + capacitive waveform indicates leakage, not welding → change topology or add bleed path (within safety limits). |
Table: A two-signal evidence rule keeps field debugging deterministic and prevents misdiagnosis.
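The "OFF command but load still ON" rows above can be written as a small decision helper. This is a minimal sketch: the function name and the 10 mA leakage limit are illustrative assumptions, not values from this guide, and the real limit depends on the switch device and the lightest supported load.

```python
def classify_off_state(drive_active: bool, load_current_a: float,
                       leakage_limit_a: float = 0.010) -> str:
    """Separate welded contact from leakage using TP4 state and load current.

    leakage_limit_a is a hypothetical boundary between "mA-level" leakage
    and sustained load current; set it per product.
    """
    if drive_active:
        # TP4 still shows drive: the switch is being commanded on.
        return "drive still asserted: check control path first"
    if load_current_a > leakage_limit_a:
        # Drive is off but real load current flows.
        return "welded contact or shorted switch likely"
    if load_current_a > 0.0:
        # Drive is off and only mA-level current flows.
        return "leakage likely"
    return "off-state normal"


print(classify_off_state(False, 2.5))     # sustained current with drive off
print(classify_off_state(False, 0.003))   # mA-level current with drive off
```

A waveform-shape check (capacitive vs resistive current) is still needed to confirm the leakage branch; the helper only encodes the two-signal rule.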
Metering front-end: shunt/CT + AFE/ADC + sampling strategy
Metering design is not “reading a number.” It is an accuracy system with a traceable error budget, temperature stability, phase integrity (especially at low power factor), and immunity to switching/RF noise injection.
Signal chain (what must be controlled)
Voltage divider (surge-tolerant) + current sensing (shunt or CT) → anti-alias filtering → AFE/ADC with stable reference → digital metering engine → calibration parameters + drift compensation.
Why low PF exposes phase error
When real power is small relative to apparent power, a small phase error produces a disproportionate power error. Phase integrity depends on synchronous sampling, CT phase behavior, and timing skew between V and I paths.
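A short numeric sketch makes the low-PF sensitivity concrete. It assumes a sinusoidal load with phase angle φ, where real power is proportional to cos φ, and a measurement chain that adds a fixed phase error δ between the V and I paths. The function name and the 0.5° error are illustrative choices.

```python
import math


def power_error_from_phase(pf: float, phase_err_deg: float) -> float:
    """Relative real-power error caused by a V/I phase error.

    True power ~ cos(phi); measured power ~ cos(phi + delta),
    so the relative error is cos(phi + delta) / cos(phi) - 1.
    """
    phi = math.acos(pf)                   # load phase angle
    delta = math.radians(phase_err_deg)   # phase error of the measurement chain
    return math.cos(phi + delta) / math.cos(phi) - 1.0


for pf in (1.0, 0.8, 0.5, 0.2):
    err = power_error_from_phase(pf, 0.5)
    print(f"PF={pf:.1f}: power error = {err * 100:+.3f} %")
```

With the same 0.5° error, the power error is a few thousandths of a percent at unity PF, about 1.5 % at PF 0.5, and over 4 % at PF 0.2, because the error scales roughly with tan φ.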
Noise injection paths (what corrupts accuracy)
Switching dv/dt, ground bounce, SMPS switching-node coupling, and Wi-Fi/BLE burst current can modulate ADC reference or inject noise into sense inputs. The strongest proof is timestamp correlation with switch events or RF bursts.
Calibration strategy (no cloud dependency)
Single-point or two-point calibration to correct gain/offset; phase calibration if the metering engine supports it; temperature characterization for drift control; parameter integrity via CRC and versioning.
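The two-point gain/offset correction and the CRC-plus-version check can be sketched as follows. The record layout (16-bit version, two 32-bit floats, CRC-32 trailer) and all function names are assumptions made for illustration; a real metering IC or firmware will define its own parameter format.

```python
import struct
import zlib


def two_point_cal(raw_lo, ref_lo, raw_hi, ref_hi):
    """Return (gain, offset) so that reference = gain * raw + offset."""
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)
    offset = ref_lo - gain * raw_lo
    return gain, offset


def pack_params(version: int, gain: float, offset: float) -> bytes:
    """Serialize parameters and append a CRC-32 over the payload."""
    payload = struct.pack("<Hff", version, gain, offset)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_params(blob: bytes):
    """Verify the CRC, then return (version, gain, offset)."""
    payload = blob[:-4]
    (stored_crc,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("calibration record CRC mismatch")
    return struct.unpack("<Hff", payload)


gain, offset = two_point_cal(raw_lo=1000, ref_lo=1.0, raw_hi=9000, ref_hi=9.2)
blob = pack_params(3, gain, offset)
print(unpack_params(blob))
```

Rejecting a record with a bad CRC and falling back to defaults (or a previous version) is usually safer than metering with corrupted gain values.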
| Error source | Mechanism (what shifts) | Observable evidence (what to measure) | Mitigation knob |
|---|---|---|---|
| Voltage divider tolerance / drift | Gain error; temperature coefficient shifts ratio. | Compare measured Vrms vs reference across temperature points. | Low-TC resistors, ratio planning, thermal symmetry, periodic calibration. |
| Shunt self-heating | Resistance rises with temperature → current gain error. | Sense reading changes with sustained load and shunt temperature. | Lower burden, better copper heat spread, temperature model, derating. |
| CT phase / saturation | Phase error and nonlinearity under certain currents/waveforms. | Error increases at low PF or high crest factor; phase-sensitive tests. | CT selection margin, burden tuning, phase calibration, avoid saturation. |
| ADC reference drift/noise | Reference ripple modulates conversion result. | TP2 rail ripple or ref ripple correlates with reading jitter. | Reference filtering, decoupling, ground partition, rail impedance control. |
| Sampling skew / phase mismatch | V/I misalignment → power error amplified at low PF. | Error increases sharply as PF decreases; compare synchronous vs non-synchronous capture. | Synchronous sampling, matched filters, phase calibration, timing discipline. |
| Anti-aliasing insufficient | High-frequency noise folds into baseband. | Jitter or bias appears when switching noise increases; spectrum evidence if available. | Adjust filter corner, improve layout, reduce coupling from SW nodes. |
| Switching / RF injection | dv/dt coupling, ground bounce, burst current droop. | Reading spikes align with TP4 events or TP5 RF bursts. | Partitioning, return-path control, sampling window strategy, bulk/ESR tuning. |
Table: A practical error budget links each error term to an observable measurement and a concrete hardware knob.
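As one worked entry from the budget, shunt self-heating can be estimated from dissipated power, thermal resistance, and temperature coefficient. The sketch below uses a first-order steady-state model; the 1 mΩ shunt, 50 K/W thermal resistance, and 50 ppm/K coefficient are hypothetical example values.

```python
def shunt_self_heating(i_rms_a: float, r_shunt_ohm: float,
                       theta_k_per_w: float, tcr_ppm_per_k: float):
    """Return (power_w, temp_rise_k, gain_error_fraction) for a shunt.

    Steady-state, first-order model: P = I^2 * R, dT = P * theta,
    relative resistance change = TCR * dT.
    """
    power_w = i_rms_a ** 2 * r_shunt_ohm
    temp_rise_k = power_w * theta_k_per_w
    gain_error = tcr_ppm_per_k * 1e-6 * temp_rise_k
    return power_w, temp_rise_k, gain_error


for current in (5.0, 10.0, 16.0):
    p, dt, err = shunt_self_heating(current, 0.001, 50.0, 50.0)
    print(f"{current:4.1f} A: P={p:.3f} W, dT={dt:.1f} K, gain error={err * 1e6:.0f} ppm")
```

Because dissipation grows with the square of current, the gain error is load-dependent, which is why a single room-temperature calibration point cannot remove it.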
Protection & safety front-end: surge/EFT/ESD, fuse/MOV/NTC, creepage & clearance
Protection is a coordinated energy system: define the threat (surge/EFT/ESD), control the current path (series impedance + clamp), and enforce a clear isolation barrier so the high-voltage return loop never overlaps the low-voltage control and metering domain.
Protection stack roles (who does what)
Fuse / thermal fuse interrupts sustained faults and thermal runaway. MOV absorbs surge energy at the mains edge. NTC limits inrush into rectifier and HV bulk. TVS is typically used on low-voltage or targeted nodes for fast clamping.
Common pitfalls (why boards still burn)
MOV placed with a large loop area, no upstream interruption (fuse mismatch), NTC ineffective under hot-restart, and isolation/keepout violations that let surge return currents couple into metering reference or MCU ground.
Isolation design intent (no standards text)
Maintain a visible boundary between HV danger zone and LV control zone with spacing, slots/keepouts, and strict return-path control. Keep isolation components centered on the barrier and prevent copper/silkscreen intrusion into clearance corridors.
Field evidence mindset
Burn signatures reveal energy flow: MOV thermal runaway, bridge short, or primary switch failure. A deterministic diagnosis starts with “where did the heat concentrate” and “what component became a low-impedance path.”
| Element | Primary purpose | Common pitfall | Field evidence (what it looks like) |
|---|---|---|---|
| Fuse | Interrupt sustained overcurrent and prevent downstream thermal runaway. | I²t mismatch: nuisance open during inrush or fails to open during repeated surge stress. | Open fuse with upstream parts intact, or fuse intact with downstream carbonization (mismatch clue). |
| Thermal fuse | Fail-safe cut-off when local hotspot exceeds a safe temperature. | Placed away from the hotspot or thermally isolated from the real heat source. | Thermal fuse never opens while MOV/primary area is visibly overheated. |
| MOV | Absorb surge energy and clamp mains spikes at the entry. | Thermal runaway from repeated surges; large loop area increases stress elsewhere. | Cracked/charred MOV, reduced resistance, localized board carbonization around MOV pads. |
| NTC | Limit inrush into rectifier and HV bulk at cold start. | Hot-restart: NTC is already hot → resistance low → inrush is not limited. | Failures occur after quick power cycling; rectifier/HV bulk show stress marks. |
| TVS (targeted) | Fast clamping on specific nodes (often low-voltage or sensitive rails). | Used as a “main surge absorber” beyond its energy rating; poor placement with long return path. | TVS shorted after transient events; local pad overheating; rail pulled down permanently. |
Table: Each element must have a single job; multi-role assumptions are a common cause of burn events.
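The NTC hot-restart pitfall can be illustrated with the standard Beta-parameter thermistor model, R(T) = R25 · exp(B · (1/T − 1/T25)). The part values below (10 Ω at 25 °C, B = 3000 K, 1 Ω of other series resistance) are hypothetical, and the inrush figure is a worst-case bound that assumes turn-on at the mains peak into a fully discharged bulk capacitor.

```python
import math


def ntc_resistance(r25_ohm: float, beta_k: float, temp_c: float) -> float:
    """NTC resistance at temp_c using the Beta-parameter model."""
    t_k = temp_c + 273.15
    return r25_ohm * math.exp(beta_k * (1.0 / t_k - 1.0 / 298.15))


def peak_inrush(v_rms: float, r_ntc_ohm: float, r_series_ohm: float = 0.0) -> float:
    """Upper bound on inrush current: mains peak divided by series resistance."""
    return math.sqrt(2.0) * v_rms / (r_ntc_ohm + r_series_ohm)


for temp_c in (25.0, 60.0, 100.0):
    r = ntc_resistance(10.0, 3000.0, temp_c)
    i_pk = peak_inrush(230.0, r, r_series_ohm=1.0)
    print(f"NTC at {temp_c:5.1f} C: R={r:5.2f} ohm, inrush bound={i_pk:6.1f} A")
```

The resistance falls steeply with body temperature, so a quick power cycle with a hot NTC sees several times the cold-start inrush.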
| Burn signature | Likely root cause | Fast checks (2 measurements) | Next hardware knob |
|---|---|---|---|
| MOV overheats / cracks | Repeated surges exceed MOV energy; poor fuse coordination; large loop area. | 1) MOV resistance trend 2) Fuse coordination vs event history. | Improve fuse/MOV coordination, reduce loop, add thermal cut-off near MOV. |
| Bridge rectifier short | Inrush or surge overstress, often amplified by hot-restart NTC behavior. | 1) Bridge diode short test 2) HV bulk inrush profile. | Re-evaluate NTC/hot-restart, bulk capacitance, and surge path layout. |
| Primary switch failure | Overvoltage or thermal overstress; surge energy leaks into primary switching node. | 1) Primary FET short test 2) Snubber/clamp health check. | Strengthen primary clamp, improve thermal path, tighten HV loop and partition. |
Power supply design: AC/DC choices, brownout behavior, inrush, and always-on efficiency
The “protected supply” requirement is defined by burst loads and fault events: relay/SSR transitions, Wi-Fi/BLE TX peaks, and brownout conditions must not collapse the 3.3 V domain, corrupt metering accuracy, or create false switching.
AC/DC choice boundary (risk-focused)
Isolated flyback simplifies safety partition and reduces touch-risk coupling into the LV domain. Non-isolated options can be compact and efficient but impose stricter insulation, return-path control, and fault containment design.
Always-on stability under burst load
Low standby power must still tolerate short, high di/dt bursts from radio transmit and actuation. The decisive parameters are output impedance, decoupling placement, and rail sequencing—more than “power-save mode” settings.
Brownout behavior (why switching causes resets)
Relay coil or gate drive transitions create abrupt load steps. If rail impedance and UVLO/reset thresholds are not tuned, a brief 3.3 V dip triggers brownout reset even when the average rail voltage seems normal.
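A first-order estimate shows why a short dip can cross the brownout threshold. The sketch assumes the regulator does not respond during the load step, so local capacitance alone supplies the current: ΔV ≈ I·ESR + I·Δt/C. The 3.3 V nominal rail, 2.7 V brownout threshold, and component values are hypothetical.

```python
def rail_dip(i_step_a: float, t_step_s: float,
             c_bulk_f: float, esr_ohm: float) -> float:
    """Estimated rail dip for a load step served only by local capacitance."""
    resistive = i_step_a * esr_ohm             # instantaneous ESR drop
    discharge = i_step_a * t_step_s / c_bulk_f  # capacitor discharge over the step
    return resistive + discharge


def survives(v_nominal: float, v_brownout: float, dip_v: float) -> bool:
    """True if the rail minimum stays above the brownout threshold."""
    return v_nominal - dip_v > v_brownout


for c_bulk in (22e-6, 100e-6):
    dip = rail_dip(i_step_a=0.3, t_step_s=50e-6, c_bulk_f=c_bulk, esr_ohm=0.05)
    ok = survives(3.3, 2.7, dip)
    print(f"C={c_bulk * 1e6:.0f} uF: dip={dip:.3f} V, above brownout: {ok}")
```

The estimate is pessimistic for a fast regulator and optimistic if ESL or trace inductance is significant, so it only guides the first bulk-capacitance choice before a TP2 capture confirms it.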
Inrush (two sources)
Cold-start inrush charges HV bulk through the rectifier/NTC. Load-side inrush occurs at switching. Both can overstress the power stage and the switching element unless current is shaped or timed appropriately.
Debug Card — “Reboot when switching” (evidence-first)
- Capture CH1: TP2 3.3 V rail at ≥100 MS/s if available; confirm minimum voltage vs brownout threshold.
- Capture CH2: primary current (or HV bulk charge current proxy) to detect inrush spikes during events.
- Capture CH3: TP4 coil/gate drive to timestamp the actuation edge.
- Decision rule: if TP2 dip aligns with TP4 edge → supply transient/impedance problem; if TP2 stable but metering jumps → injection into sensing/reference.
- Hardware knobs: move/resize local bulk at 3.3 V, control ESR/ESL, split rails for radio/MCU/metering, tune UVLO/reset hysteresis, reduce loop coupling from actuation return paths.
| Problem pattern | Most likely mechanism | Minimum evidence set | Primary knob |
|---|---|---|---|
| Reset on relay/SSR transition | Rail impedance + load step causes TP2 dip below brownout threshold. | TP2 waveform + TP4 timing + reset reason flag (if available). | Local bulk placement, rail splitting, return-path control, UVLO/reset tuning. |
| Wi-Fi TX causes metering jitter | RF burst current modulates reference/ground (injection). | TP2 ripple + metering jitter timestamps + RF activity marker. | Reference decoupling, ground partition, sampling window strategy, rail impedance control. |
| Failures after quick power cycling | Hot-restart defeats NTC; inrush overstresses bridge/HV bulk/primary. | Primary inrush profile + component temperature/state + failure location. | Hot-restart mitigation, NTC sizing/placement, controlled start timing, derating. |
| Random brownout under heavy load | Thermal drift increases losses; transient margin collapses at temperature. | TP2 dip vs temperature + converter thermal measurement. | Thermal path improvement, derating, lower-loss conversion stage, layout optimization. |
Relay/SSR driver & isolation: coil/gate drive, flyback paths, isolators, and dv/dt control
False switching and random resets are often driven by the drive path and return path, not by the switch element itself. A robust design constrains flyback energy, prevents isolator output glitches under fast common-mode transients, and controls dv/dt coupling into the gate/coil loop.
Relay coil drive: low-side vs high-side (return-path first)
Low-side coil drive is simple but must confine coil return current to a “dirty” loop. High-side drive can isolate the return path but increases component and isolation complexity. The decisive factor is whether the coil current shares ground/reference with metering or MCU.
Flyback clamp: diode vs TVS (release time vs EMI)
A flyback diode slows coil current decay and reduces high-frequency emission but increases release time. A TVS (or elevated clamp) accelerates release yet increases dv/dt and demands tighter loop control and driver voltage margin.
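The release-time trade-off follows from a first-order coil model. With coil inductance L, resistance R, supply Vs, and a constant clamp voltage Vc opposing the current, the current reaches zero after t = (L/R) · ln(1 + Vs/Vc). The sketch below assumes fixed inductance (a real relay's inductance changes as the armature moves) and hypothetical coil values.

```python
import math


def coil_decay_time(l_h: float, r_ohm: float,
                    v_supply: float, v_clamp: float) -> float:
    """Time for coil current to decay to zero after the driver turns off.

    v_clamp is the voltage across the clamp element while it conducts
    (about 0.7 V for a plain diode, higher for a TVS/Zener clamp).
    """
    return (l_h / r_ohm) * math.log(1.0 + v_supply / v_clamp)


L_COIL, R_COIL, V_SUPPLY = 0.1, 70.0, 5.0   # hypothetical 5 V relay coil
for name, v_clamp in (("diode", 0.7), ("24 V clamp", 24.0)):
    t = coil_decay_time(L_COIL, R_COIL, V_SUPPLY, v_clamp)
    print(f"{name:>10}: current decay time = {t * 1e3:.2f} ms")
```

In this example the elevated clamp shortens the decay by roughly a factor of ten, at the cost of the driver seeing approximately the supply plus the clamp voltage during release.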
Triac/SSR gate drive: margin and dv/dt immunity
Ensure gate current margin across temperature and line variation. Gate loop area and isolator parasitic capacitance determine how easily a fast dv/dt event becomes an unintended gate pulse.
Isolation hard specs (impact only)
CMTI predicts whether the isolator output stays quiet during fast common-mode edges. Propagation delay affects timing alignment with noise events and can influence observed glitches and sampling overlap.
| Choice | What improves | What gets worse | Evidence to confirm (2 captures) |
|---|---|---|---|
| Flyback diode | Lower HF emission; less stress on driver voltage rating. | Slower coil release; longer time in ambiguous states for contacts. | 1) Coil current decay time 2) Contact release timing vs load behavior. |
| TVS / elevated clamp | Faster release; reduced mechanical bounce window. | Higher dv/dt; tighter layout needed; driver must tolerate higher clamp voltage. | 1) Coil voltage peak 2) Radiated/near-field spike correlation during release. |
Evidence Card — “False trigger / unexpected turn-on”
- Capture 1 (Gate/Coil): gate/coil drive waveform (TP4-class) to detect any pulse without a command edge.
- Capture 2 (Isolator output): isolator output node during switching dv/dt events; check for narrow glitches.
- Capture 3 (Reference shift): differential measurement between control reference and actuation return to reveal ground movement.
- Decision rule: isolator glitches aligned to dv/dt → CMTI/layout/return-path; gate pulse without isolator glitch → gate loop coupling; gate quiet but output turns on → switch-side dv/dt / snubber domain.
- Primary knobs: shrink loop area, move clamp components to minimize return path, strengthen isolation CMTI margin, add gate damping (R/C where appropriate), and enforce a single controlled return route.
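The decision rule in this card maps three captures to three domains, which a small helper can make explicit. The function name and the boolean inputs are illustrative; each flag stands for "a pulse or glitch was seen in that capture without a command edge".

```python
def classify_false_trigger(isolator_glitch: bool, gate_pulse: bool,
                           output_turned_on: bool) -> str:
    """Map the three captures to the domain that most likely needs work."""
    if isolator_glitch:
        # Glitch at the isolator output aligned to dv/dt.
        return "isolator CMTI / layout / return path"
    if gate_pulse:
        # Gate pulse present although the isolator output stayed quiet.
        return "gate loop coupling"
    if output_turned_on:
        # Gate quiet, yet the switch conducted.
        return "switch-side dv/dt / snubber"
    return "no false trigger captured"


print(classify_false_trigger(False, True, True))
```

The order of the checks matters: an isolator glitch also produces a gate pulse, so the upstream capture is evaluated first.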
EMI/EMC & coexistence: keep metering accurate while switching and transmitting
Coexistence is a correlation problem: identify noise sources (switching, SMPS, radio bursts), map coupling paths, locate metering victims, and prove causality by time-alignment between TX activity and reading spikes.
Primary noise sources (conducted + near-field)
Switching dv/dt edges, relay arcing/release, SMPS switching nodes, and radio TX burst current are the dominant sources. Each source has a characteristic time signature that can be used for correlation.
Metering victims (high sensitivity nodes)
High-impedance divider nodes, sense ground, and ADC reference are the top victims. If these nodes move with the event, the reading moves.
Typical coupling paths
Capacitive dv/dt injection, inductive loop coupling, and shared-return ground/reference shifts.
Hardware-first mitigations
Partition domains, enforce a single-point reference, apply RC/CM suppression at victim nodes, and avoid sampling windows that overlap known burst events (TX edges, actuation transitions).
Correlation Debug Card — “Reading jumps during TX / switching”
- Pick a marker: TX duty/RF activity proxy (or radio supply current burst marker) with a timestamp.
- Align timelines: log metering readings and compare with marker edges; look for repeatable alignment.
- Classify coupling: alignment to dv/dt edges → capacitive injection; alignment to loop currents → inductive; alignment to load steps → shared return.
- Apply one knob at a time: domain partition + single-point sense ground, RC/CM suppression at victim nodes, or sampling window avoidance of burst edges.
- Close the loop: correlation strength and spike count must drop after mitigation (same test, same marker).
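The "align timelines" and "close the loop" steps can be reduced to one number: the fraction of reading spikes that fall within a short window of a marker edge. This is a minimal sketch; the function name, the 2 ms window, and the sample timestamps are assumptions, and spike detection itself is left to the logging side.

```python
from bisect import bisect_left


def aligned_fraction(spike_times, marker_edges, window_s):
    """Fraction of spikes that occur within window_s of any marker edge."""
    if not spike_times:
        return 0.0
    edges = sorted(marker_edges)
    hits = 0
    for t in spike_times:
        i = bisect_left(edges, t)
        # Only the neighbours on each side of the insertion point can be nearest.
        candidates = [abs(t - edges[j]) for j in (i - 1, i) if 0 <= j < len(edges)]
        if candidates and min(candidates) <= window_s:
            hits += 1
    return hits / len(spike_times)


spikes = [1.001, 2.0005, 3.5]   # seconds, hypothetical metering spike timestamps
tx_edges = [1.0, 2.0, 3.0]      # seconds, hypothetical TX marker edges
print(f"aligned fraction: {aligned_fraction(spikes, tx_edges, 0.002):.2f}")
```

Running the same calculation before and after a mitigation, with the same test and marker, gives a comparable figure for whether the coupling was actually reduced.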
| Problem | Evidence to prove | Primary coupling path | Hardware-first mitigation |
|---|---|---|---|
| Reading spikes during TX burst | Time alignment between TX marker and spike; TP2/REF ripple during TX. | Shared return + rail impedance modulation of ADC REF/sense ground. | Split rails, isolate ADC reference decoupling, enforce single-point sense ground, shorten burst-current loops. |
| Reading spikes during switching edges | Spikes align to actuation edge; near-field probe peak near gate/coil loop. | Inductive coupling from loop area; capacitive dv/dt injection into high-Z nodes. | Reduce loop area, shield/route away high-Z divider, add RC where bandwidth allows, tighten return paths. |
| Random drift that tracks load changes | Sense ground delta changes with load current; divider node moves with load. | Shared return path shifts the measurement reference. | Re-architect ground: metering reference separated from power return until a single merge point. |
| False triggers + metering glitches together | ISO OUT glitches and reading spikes share the same dv/dt timestamp. | Common-mode transient couples through isolator capacitance. | Increase CMTI margin, reduce dv/dt at source (snubber/domain control), place isolator away from HV edge loops. |
Thermal & reliability: derating, enclosure constraints, contact heating, and long-run drift
Thermal behavior is a system-level root cause that links contact aging, case temperature, and metering drift. A robust product treats heat as an evidence-driven budget: hotspots are identified, temperature rise curves are captured, and drift/trip events are correlated to temperature milestones.
Contact and terminal heating (hidden I²R)
Relay contact resistance, terminal/plug contact resistance, and copper bottlenecks often dominate hotspot formation. Small resistance changes create large local power dissipation at higher currents.
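The scale of the effect follows directly from P = I²R. The resistance values below are hypothetical examples of a healthy and a degraded contact path.

```python
def contact_power(i_rms_a: float, r_contact_ohm: float) -> float:
    """Power dissipated in a contact or terminal resistance (P = I^2 * R)."""
    return i_rms_a ** 2 * r_contact_ohm


for r_mohm in (5.0, 30.0):
    for current in (10.0, 16.0):
        p = contact_power(current, r_mohm / 1000.0)
        print(f"{r_mohm:4.0f} mohm at {current:4.1f} A: {p:.2f} W")
```

A rise from 5 mΩ to 30 mΩ at 10 A moves the local dissipation from 0.5 W to 3 W, concentrated in a very small volume inside a plastic enclosure.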
SSR/Triac conduction loss (steady case temperature)
Solid-state switching turns conduction loss into sustained temperature rise. Case temperature becomes a proxy for long-run reliability, and thermal path quality determines margin.
Enclosure constraints (thermal path, not aesthetics)
Plastic enclosures limit heat spreading; copper and localized thermal paths become critical. Different form factors create different time constants and hotspot locations even with identical circuitry.
Long-run drift: shunt, divider, ADC reference
Shunt temperature coefficient shifts current reading; divider drift shifts voltage reading; ADC reference drift biases all readings. Heat cycling can produce irreversible offset beyond instantaneous temperature coefficients.
| Hotspot | Location | Driver | Instrument | Risk | Mitigation knob |
|---|---|---|---|---|---|
| H1 | Relay contact / contact path | I²R from contact resistance | IR cam + contact-path ΔT | Weld/aging, drift trigger chain | Contact margin, reduce bottlenecks, derate continuous current |
| H2 | Terminal / plug interface | Contact resistance rise | IR cam + TC at terminal | Localized burn, intermittent drop | Improve contact integrity, widen copper, reduce local current density |
| H3 | Triac/SSR case | Conduction loss (Vdrop·I) | TC on case + ΔT curve | High case temp, long-run reliability | Thermal path, copper spreading, derate duty/continuous load |
| H4 | Shunt / sense resistor | TC + self-heating | Error vs temperature sweep | Current reading drift | Lower TC, thermal placement, calibration vs temperature strategy |
| H5 | Divider / high-Z nodes | Aging + humidity/heat stress | Voltage error vs time/T | Voltage reading drift | Component stability, layout guarding, thermal exposure reduction |
| H6 | ADC reference domain | Reference drift / rail coupling | REF noise + error correlation | Global measurement bias | Quiet REF rail, decoupling, separate return, reduce burst coupling |
Derating workflow (thermal budget driven)
- Map hotspots: IR + TC points on contacts, terminal, SSR case, and sense components.
- Capture ΔT curves: run 30–120 minutes per condition until the curve approaches steady state.
- Link to behavior: overlay metering error and fault/trip timestamps on the same timeline.
- Set derating: choose continuous current/duty to keep hotspot margin across worst ambient and enclosure constraints.
- Re-verify: repeat with the same setup; hotspot and drift/trip correlation must reduce with margin.
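The "until the curve approaches steady state" criterion in the workflow above can be made repeatable with a simple check on the last part of the ΔT log. The 10-minute window and 0.5 K limit are assumed thresholds, not values from this guide.

```python
import math


def reached_steady_state(samples, window_s=600.0, max_change_k=0.5):
    """samples: list of (time_s, temp_c) in ascending time order.

    Returns True when the temperature spread over the last window_s
    seconds is at most max_change_k.
    """
    if not samples:
        return False
    t_end = samples[-1][0]
    if t_end - samples[0][0] < window_s:
        return False   # not enough history to judge
    recent = [temp for t, temp in samples if t >= t_end - window_s]
    return max(recent) - min(recent) <= max_change_k


# Synthetic hotspot curve: 30 K rise with a 15-minute time constant.
curve = [(float(t), 25.0 + 30.0 * (1.0 - math.exp(-t / 900.0)))
         for t in range(0, 7201, 60)]
print(reached_steady_state(curve[:31]))   # first 30 minutes
print(reached_steady_state(curve))        # full 120 minutes
```

Using the same criterion for every condition keeps the 30–120 minute runs comparable between form factors with different thermal time constants.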
Validation test plan: what to test, what to log, and pass/fail thresholds
A production-ready validation plan is defined by three outputs: what is stressed, what is measured, and what constitutes pass/fail. This section provides a test matrix that keeps every test tied to concrete logs (TP2/TP4/metering/fault flags/time) and to failure modes seen in the field (reset, false trigger, reading jump).
Validation pipeline (order matters)
Baseline calibration → switching stress → immunity events → radio coexistence checks → long-run soak and thermal drift re-check. The same logging schema is reused across all stages to preserve causality.
Minimum logging set (portable across tests)
TP2 (3.3 V rail) · TP4 (coil/gate drive) · V/I/P/PF · fault flags · reset reason · TX marker · temperature · timestamp
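One way to keep the logging set portable is to fix it as a single record type that every test stage writes. The field names, types, and units below are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LogRecord:
    """One sample of the minimum logging set (hypothetical field names)."""
    timestamp_s: float
    tp2_v: float                 # 3.3 V rail, minimum over the sample interval
    tp4_active: bool             # coil/gate drive state
    v_rms: float
    i_rms: float
    p_w: float
    pf: float
    fault_flags: int             # bit field, meaning defined per product
    reset_reason: Optional[str]  # None when no reset occurred
    tx_active: bool              # TX marker
    temp_c: float


rec = LogRecord(12.5, 3.28, True, 230.1, 4.35, 950.0, 0.95, 0, None, False, 41.2)
print(asdict(rec))
```

Reusing one schema across calibration, stress, immunity, coexistence, and soak stages is what allows events from different tests to be overlaid on a common timeline.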
Pass/Fail principles (threshold forms)
- Functional stability: no reset, no false trigger, no stuck state during events and stress.
- Metering stability: reading deviation must stay within a predefined limit during and after stress/events.
- Recoverability: after each event, the system must return to normal operation within a predefined time window.
- Correlation reduction: after mitigation, reading spikes must lose time alignment to known activity markers (TX edges, switching edges).
- Drift closure: after thermal soak/cycles, accuracy and offsets must remain within the specification window.
| Test | Setup (concept-level) | Metrics | Pass / Fail (form) | Logs (minimum) |
|---|---|---|---|---|
| Electrical safety (hi-pot / leakage) | Apply insulation stress across isolation boundary; monitor leakage and abnormal heating at critical boundaries. | Leakage behavior, abnormal temperature rise, non-recoverable damage signatures. | No breakdown; leakage within product-defined limit; no persistent damage. | Timestamp, IR/TC notes, event result summary. |
| Switching stress (surge + repetitive) | Controlled repetitive switching and surge exposure; include worst-case transitions and duty patterns relevant to the product spec. | False trigger count, stuck-on/off, contact/case temperature rise, output consistency. | No weld/stuck; no unintended turn-on; temperature within derating budget. | TP4 (coil/gate), output state, fault flags, temperature points. |
| Inductive simulation (stress method) | Use a controlled inductive-like stress fixture to reproduce fast current changes and switching transients (method-focused). | dv/dt sensitivity, false trigger rate, overshoot signatures. | No unintended switching; no reset; overshoot signatures within product-defined margin. | TP4, TP2, event timestamp, output state. |
| Metering accuracy (P / PF / T sweep) | Compare against a reference instrument across multiple power points, multiple PF points, and multiple temperatures. | Error vs power, PF, temperature; hysteresis after heat soak. | Error within accuracy target window across the defined sweep; drift bounded after soak. | V/I/P/PF, temperature, reference readings, calibration state. |
| Immunity events (EFT / ESD / Surge) | Inject disturbances at defined entry points; observe system stability during injection and recovery behavior afterward. | Reset count, false trigger count, reading jump amplitude and duration, recovery time. | No reset, no false trigger; reading jump bounded and recoverable within time limit. | TP2 (3.3V), TP4, fault flags, reset reason, metering stream. |
| Radio coexistence (TX/RX window) | Force repeatable RF activity windows (TX marker proxy); compare readings during activity vs quiet windows. | Reading deviation during activity; correlation of spikes to marker edges; post-activity recovery. | Deviation within limit; correlation strength reduced after mitigations; stable recovery. | TX marker, metering stream, TP2 ripple proxy, timestamps. |
| Long-run soak (thermal drift) | Run extended operation at representative ambient and enclosure conditions; repeat metering checks and hotspot scans. | ΔT steady-state, drift over time, trip/reset events over time. | No unexpected trip/reset; drift remains within spec window; hotspots stay within budget. | Temperature points, metering stream, fault flags, event timestamps. |
IC & BOM selection map: metering, MCU+radio, drivers, AC/DC, protections
This chapter is a hardware-first BOM map for Smart Plug / Smart Switch designs where metering accuracy, switching robustness, radio coexistence, and protected supplies must hold up in production and in the field.
How to use the checklist (production-style)
- Start from worst case: inductive load inrush, low line, high temperature, TX burst during switching.
- Pick by failure mode: select each IC so the most likely failure can be proven or ruled out with a measurable signature.
- Prefer parts that expose hooks: metering status, brownout flags, driver fault, radio telemetry, temperature sensing.
- Lock the HV/LV boundary early: isolation strategy drives the driver choice and the layout partition.
Example MPNs below are commonly used reference points; availability, approvals, and regional mains requirements change over time and must be verified per design.
BOM selection checklist (rules + evidence + example MPNs)
| Block | Selection rules that matter | Evidence hooks / quick tests | Example MPNs (reference shortlist) |
|---|---|---|---|
| Metering IC (SoC / AFE / ADC) | | | |
| MCU + Radio (Wi-Fi/BLE) | | | |
| Driver & Isolation (relay / triac / SSR) | | | |
| Offline AC/DC (protected supply) | | | |
| Protection parts (fuse/MOV/TVS/NTC) | | | |
Common BOM pitfalls (field symptoms → evidence → knob)
- Radio brownout loop: device “reboots only during TX/association” → measure 3.3V droop aligned with TX bursts → increase transient headroom (caps/DC-DC loop/UVLO margin).
- False triggering (triac/SSR): load flickers with no command → observe periodic half-wave at load + gate noise synced to dv/dt → improve dv/dt immunity (driver choice, snubber, partition, reference control).
- Meter jumps during switching: energy reading steps at relay open/close → capture V/I waveform + ADC reference ripple → tame injection path (sense routing, RC, sampling avoidance window).
- Fuse nuisance trips: plug-in or capacitive load trips randomly → log inrush peak and duration → coordinate fuse time-lag vs NTC/capacitance.
- MOV overheating drift: after repeated surges, leakage rises / heat spot at MOV → thermal image + leakage check → derate MOV Vrms/energy, add thermal spacing or fuse coordination.
These are selection-time issues: the fastest fix is typically changing one block’s robustness class (driver CMTI, metering IC noise immunity, AC/DC protection behavior) instead of endless layout iteration.
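For the fuse nuisance-trip pitfall, "coordinate fuse time-lag vs NTC/capacitance" can be made concrete by comparing the inrush I²t with the fuse's melting I²t. The sketch below assumes the inrush is dominated by charging a bulk capacitor through a series resistance (cold NTC plus line impedance and ESR), with switch-on at the mains crest. Under that assumption the resistance dissipates ½CV², so I²t = CV²/(2R). The function names, component values, fuse rating, and the 0.5 pulse derating factor are all illustrative assumptions; real numbers come from the fuse datasheet and the logged inrush waveform.

```python
import math

def inrush_i2t(c_bulk_f, v_peak, r_series_ohm):
    """I^2*t (A^2*s) of an RC charging pulse.

    The series resistance dissipates C*V^2/2 while charging the capacitor,
    and energy = R * integral(i^2 dt), so I2t = C*V^2 / (2*R).
    """
    return c_bulk_f * v_peak ** 2 / (2.0 * r_series_ohm)

def fuse_margin(i2t_pulse, i2t_fuse_melt, pulse_derating):
    """Derated fuse melting I2t divided by pulse I2t; > 1 means headroom."""
    return (i2t_fuse_melt * pulse_derating) / i2t_pulse

# Illustrative values only (assumptions, not recommendations).
V_PEAK = 264.0 * math.sqrt(2.0)   # switch-on at the crest of 264 Vrms high line
C_BULK = 10e-6                    # bulk capacitor, farads
R_SERIES = 10.0                   # cold NTC + line + ESR, ohms
I2T_FUSE = 1.0                    # hypothetical datasheet melting I2t, A^2*s

i2t = inrush_i2t(C_BULK, V_PEAK, R_SERIES)
margin = fuse_margin(i2t, I2T_FUSE, pulse_derating=0.5)
print(f"peak inrush ~ {V_PEAK / R_SERIES:.1f} A, "
      f"I2t ~ {i2t:.4f} A^2*s, margin {margin:.1f}x")
```

Re-running the same calculation with the hot NTC resistance (rapid unplug/replug) shows how much of the margin depends on the NTC being cold.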
H2-12|FAQs ×12 (hardware-evidence based)
These FAQs are written as evidence-first decisions. Each answer provides a minimum two-measurement set and a short decision tree that maps back to the relevant H2 chapters (switching, driver/isolation, metering, protection, power, EMC, thermal).
Rule of thumb: If a suspected root cause cannot be proven or ruled out with two scope captures (or one capture + one log/status), it is not yet a diagnosis—only a guess.
1) “Switched OFF but the load still glows / still has voltage”: leakage current or welded relay contact?
Goal: separate leakage-type conduction (SSR/triac/snubber paths) from stuck-on conduction (welded relay contact).
- Minimum evidence (2): (A) load-side voltage waveform shape, (B) load current magnitude.
- Leakage signature: periodic half-wave / zero-cross-related conduction with small current that depends strongly on load impedance.
- Weld signature: near-normal current capability; output remains “fully on” even when coil/gate drive indicates OFF.
See H2-3 (relay failure modes) and H2-7 (SSR/triac drive & leakage paths).
Mapped: H2-3 / H2-7
2) “Reboots when the load turns ON”: AC/DC sag or a 3.3V transient droop?
Goal: determine whether the reset is driven by secondary supply collapse or local LV transient at the MCU/meter domain.
- Minimum evidence (2): (A) TP2 3.3V rail droop/ripple, (B) AC/DC secondary output (5V/12V) or primary current spike at the load-on edge.
- LV transient: TP2 dips sharply and recovers fast, aligned with relay/SSR switching or TX burst.
- Supply sag: AC/DC output collapses first (or enters restart/UVLO behavior), then TP2 follows.
See H2-6 (brownout behavior, inrush, transient headroom).
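The "which rail falls first" decision above can be automated on exported scope data. This is a minimal sketch, assuming both channels are exported as (time, volts) pairs on a shared time base. The function names and the undervoltage thresholds (3.0 V for TP2, 4.5 V for a 5 V secondary) are hypothetical and should be replaced with the actual brownout and UVLO levels of the design.

```python
def first_crossing(samples, threshold):
    """Return the time of the first sample below threshold, or None."""
    for t, v in samples:
        if v < threshold:
            return t
    return None

def classify_reset(tp2_samples, acdc_samples, tp2_uv=3.0, acdc_uv=4.5):
    """Label a load-on reset by which rail crosses its undervoltage level first."""
    t_tp2 = first_crossing(tp2_samples, tp2_uv)
    t_acdc = first_crossing(acdc_samples, acdc_uv)
    if t_tp2 is None:
        return "no 3.3V undervoltage captured"
    if t_acdc is not None and t_acdc <= t_tp2:
        return "supply sag"      # secondary collapses first, TP2 follows
    return "LV transient"        # TP2 dips while the secondary holds up

# Synthetic captures (time in microseconds, volts); illustrative only.
tp2_fast_dip = [(0, 3.30), (10, 3.28), (20, 2.85), (30, 3.25)]
acdc_steady = [(0, 5.00), (10, 4.98), (20, 4.95), (30, 4.97)]
tp2_follows = [(0, 3.30), (10, 3.30), (20, 3.10), (30, 2.90)]
acdc_sagging = [(0, 5.00), (10, 4.20), (20, 3.60), (30, 3.40)]

print(classify_reset(tp2_fast_dip, acdc_steady))
print(classify_reset(tp2_follows, acdc_sagging))
```

Ordering is only meaningful if both probes are captured on the same trigger, so the two traces should come from one acquisition rather than two separate runs.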
Mapped: H2-6
3) “Meter readings jump / drift”: noise injection or calibration drift?
Goal: split edge-/TX-correlated jumps from time/temperature drift.
- Minimum evidence (2): (A) metering error vs switching edge/TX marker correlation, (B) metering error vs temperature (or time-at-load) curve.
- Noise injection: step-like error synchronized to relay/SSR edges or TX activity; often follows TP2/ADC reference ripple patterns.
- Calibration drift: monotonic or slow-changing error that tracks board/hotspot temperature or long-run aging behavior.
See H2-4 (metering front-end & sampling strategy) and H2-9 (thermal drift mechanisms).
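The error-vs-temperature curve in (B) can be reduced to two numbers: a slope and a goodness of fit. The sketch below is an ordinary least-squares line fit, assuming error is logged in percent at known board temperatures under constant load. The function name and the sample data are made up for illustration. A high R² with a stable slope is consistent with drift; a poor fit with large residuals suggests checking the residuals against switching edges or TX markers instead.

```python
def linear_fit(x, y):
    """Least-squares line y = slope*x + intercept, plus R^2 of the fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2

# Synthetic log: board temperature (deg C) vs metering error (%), constant load.
temps_c = [25.0, 35.0, 45.0, 55.0, 65.0]
error_pct = [0.00, 0.05, 0.10, 0.15, 0.20]

slope, intercept, r2 = linear_fit(temps_c, error_pct)
print(f"drift slope = {slope:.4f} %/degC, R^2 = {r2:.3f}")
```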
Mapped: H2-4 / H2-9
4) “Wi-Fi TX makes metering worse”: how to prove it is coexistence (correlation proof)?
Goal: show metering degradation follows radio activity, not random noise or “bad calibration”.
- Minimum evidence (2): (A) TX marker (or RF active indicator) vs metering error time series, (B) TP2 ripple (or ADC reference noise) vs TX activity.
- Strong proof: change TX duty/throughput → error amplitude changes accordingly while load stays constant.
- Root path hint: if TP2 ripple spikes align with TX, power/return coupling is likely; if TP2 is clean, sense/REF coupling or layout is more likely.
See H2-8 (EMI/EMC & coexistence evidence workflow).
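The correlation proof above can be scripted from a metering log that carries a TX marker per sample. This is a minimal sketch under two assumptions: the load is held constant, and each metering sample is tagged with whether TX was active. Function names and all data values are hypothetical. The first helper compares mean absolute error in TX-active vs quiet windows; the second computes a Pearson coefficient for the duty-sweep check.

```python
import math

def window_means(errors, tx_active):
    """Mean |error| for samples taken with TX active vs quiet."""
    active = [abs(e) for e, a in zip(errors, tx_active) if a]
    quiet = [abs(e) for e, a in zip(errors, tx_active) if not a]
    return sum(active) / len(active), sum(quiet) / len(quiet)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic log: metering error (%) and TX marker per sample, constant load.
errors = [0.02, 0.30, 0.28, 0.03, 0.01, 0.33, 0.02, 0.29]
tx_marker = [0, 1, 1, 0, 0, 1, 0, 1]
active_mean, quiet_mean = window_means(errors, tx_marker)

# Synthetic duty sweep: forced TX duty vs observed error amplitude (%).
tx_duty = [0.1, 0.3, 0.5, 0.7]
err_amp = [0.05, 0.14, 0.26, 0.35]
r = pearson(tx_duty, err_amp)

print(f"active {active_mean:.3f} %, quiet {quiet_mean:.3f} %, duty-sweep r = {r:.3f}")
```

A large active/quiet ratio alone shows association; the duty sweep is what makes the case, because error amplitude that scales with forced TX duty is hard to explain by calibration.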
Mapped: H2-8
5) “Relay clicks but the load does nothing”: coil drive issue or contact/path failure?
Goal: confirm whether the relay is being driven correctly and whether the power path actually changes state.
- Minimum evidence (2): (A) coil drive waveform (voltage/current) at the driver, (B) load-side voltage (or current) change at the output terminals.
- Contact/path failure: coil drive looks correct but output stays unchanged → suspect welded/burnt contacts, high contact resistance, or an open power path.
- Drive/power issue: coil drive collapses or the drive pulse is too short → suspect driver headroom, flyback path, or supply droop during actuation.
See H2-3 (relay field failures) and H2-7 (driver & flyback/isolation behavior).
Mapped: H2-3 / H2-7
6) “Triac/SSR occasionally turns on by itself”: dv/dt false-trigger—measure gate first or load first?
Goal: detect a false conduction signature and determine whether it is driven by gate/driver jitter or by inherent leakage paths.
- Minimum evidence (2): (A) load-side voltage waveform (look for periodic half-wave conduction), (B) gate/driver output waveform (look for jitter aligned to switching dv/dt events).
- Gate-driven false trigger: driver output shows spurious pulses/noise that aligns with high dv/dt edges.
- Leakage-driven: load-side conduction exists without gate activity; amplitude often changes with snubber/EMI paths.
See H2-7 (dv/dt control, isolation) and H2-8 (coupling paths & mitigation evidence).
Mapped: H2-7 / H2-8
7) “After a surge it still boots, but mis-triggers more easily”: MOV aging or isolation leakage damage?
Goal: determine whether the surge changed the system by protection-device leakage/aging or by HV/LV boundary leakage.
- Minimum evidence (2): (A) MOV area temperature / leakage trend (before vs after stress), (B) isolation boundary symptom: increased false-trigger rate or abnormal bias on gate/driver reference.
- MOV aging hint: new hotspots near MOV, rising standby loss, or leakage increasing after stress events.
- Isolation leakage hint: false triggers worsen and correlate more strongly with dv/dt or EFT/ESD events even at normal load.
See H2-5 (surge/EFT/ESD & protection stack) and H2-8 (noise coupling paths).
Mapped: H2-5 / H2-8
8) “Disconnects after running high load for a while”: thermal protection or contact heating voltage drop?
Goal: separate a deliberate protection-driven shutoff from a power-path degradation (contact/terminal heating) that collapses the output.
- Minimum evidence (2): (A) hotspot temperature rise curve (IR/TC), (B) state evidence at shutoff: gate/coil command or protection/log flag vs output voltage drop.
- Thermal protection: temperature approaches a threshold and a commanded OFF/OTP event is visible.
- Contact heating: voltage drop across relay/SSR increases with temperature; output collapses without an intentional OFF command.
See H2-9 (thermal & reliability evidence chain).
Mapped: H2-9
9) “Unstable standby / occasional offline”: light-load supply jitter or MCU brownout threshold?
Goal: identify whether standby failures come from light-load mode behavior in the offline supply or from MCU brownout sensitivity under bursts.
- Minimum evidence (2): (A) TP2 ripple shape in standby (pulse-skipping / low-frequency ripple), (B) reset reason / disconnect timestamp aligned to TP2 valleys or TX activity.
- Light-load jitter: large low-frequency ripple or periodic dips under tiny load.
- Brownout sensitivity: resets align with short bursts (TX, relay actuation) even when average standby looks normal.
See H2-6 (always-on efficiency, brownout, transient headroom).
Mapped: H2-6
10) “Large metering error under low power factor (PF) loads”: where does phase error come from?
Goal: locate phase error contributors (front-end filters, sensor choice, sampling alignment) that become dominant at low PF.
- Minimum evidence (2): (A) measured V–I phase alignment at AFE/ADC inputs (or derived phase error), (B) error vs PF sweep curve (multi-point PF test).
- Front-end/filter cause: mismatched RC/anti-aliasing paths shift phase between voltage and current channels.
- Sampling cause: timing jitter or windowing changes under switching/TX creates dynamic phase perturbations.
See H2-4 (phase error, sampling strategy, noise immunity).
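Why phase error dominates at low PF follows directly from the active-power expression. The short derivation below assumes sinusoidal voltage and current, a true phase angle φ between them, and a small uncompensated inter-channel phase error δ; the symbols φ, δ, and ε are introduced here for illustration and are not defined elsewhere in this guide.

```latex
P_{\text{true}} = V I \cos\varphi, \qquad
P_{\text{meas}} = V I \cos(\varphi + \delta)

\varepsilon = \frac{P_{\text{meas}}}{P_{\text{true}}} - 1
            = \cos\delta - \tan\varphi \, \sin\delta - 1
            \approx -\delta \, \tan\varphi
            \qquad (\delta \ll 1\ \text{rad})
```

As a worked case, take δ = 0.1° ≈ 1.745 × 10⁻³ rad. At PF = 1 (tan φ = 0) the error is negligible. At PF = 0.5 (φ = 60°, tan φ ≈ 1.73) the magnitude is about 0.30 %. At PF = 0.2 (tan φ ≈ 4.90) it is about 0.86 %. The same fixed phase mismatch therefore produces a PF-dependent error, which is why the multi-point PF sweep in (B) exposes it. A first-order RC filter contributes a phase of −arctan(2πfRC) at frequency f, so unequal RC products on the voltage and current channels are one direct source of δ.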
Mapped: H2-4
11) “EFT/ESD causes reset or false-trigger”: which return path to validate first, and which two points are most sensitive?
Goal: determine whether immunity failures are dominated by LV rail disturbance (reset) or by driver/isolation disturbance (false-trigger).
- Minimum evidence (2): (A) TP2 transient (dip/spike) during EFT/ESD, (B) driver output / gate/coil disturbance during the same event.
- Reset path: TP2 shows a dip/spike sufficient to trip brownout; the load path may remain correctly OFF.
- False-trigger path: driver output jitters or gate sees spikes aligned to dv/dt; output toggles without valid command.
See H2-5 (protection & safety front-end) and H2-8 (coexistence, coupling, return control).
Mapped: H2-5 / H2-8
12) “Same design behaves differently across countries / mains environments”: input-range stress, surge level, or isolation/filter differences?
Goal: convert “different grid” into measurable stress categories: sag/brownout, surge, and EMI aggressiveness.
- Minimum evidence (2): (A) event-rate logging (reset/false-trigger/meter jump) with timestamps vs mains conditions (low-line tests/surge events), (B) symptom classification: reset-dominant vs false-trigger/metering-dominant.
- Reset-dominant: points to input-range and UVLO/AC/DC behavior under low-line or load steps.
- False-trigger/meter-dominant: points to surge/EMI coupling paths and HV/LV boundary implementation (partition/filtering/isolation margin).
See H2-5 (surge/clearance) and H2-6 (input range, UVLO, transient robustness).
Mapped: H2-5 / H2-6