HEMS Panel: Branch Metering, Load Switching & Secure Gateway
A HEMS panel turns a home’s breaker box into a measurable, switchable, and secure energy node by combining branch-level metering, verified load actuation (sense-back), DER interfaces (PV/ESS/backup), and tamper-evident event logging. It is built around evidence—phase/sign truth, rail/isolation discipline, and repeatable field validation—rather than cloud rules or protocol tutorials.
H2-1 — Page Intent & Boundary (What this page is / isn’t)
This page defines a buildable, panel-side hardware subsystem. It focuses on branch metering, branch switching, DER interfaces, and local security evidence — not cloud architecture and not appliance-side control.
Scope Guard (mechanically checkable)
- In-scope: breaker box integration, branch sensing (CT/shunt/Rogowski), phase mapping, branch switching (relay/contactor/SSR/eFuse), safety taps (RCD/MCB/AFCI/leakage/temp), PV/storage interfaces, secure boot + monotonic counters + event logging, isolated power domains, verification evidence.
- Out-of-scope: cloud/dashboard architecture, automation rule engines, thermostat loops, appliance motor drives, lighting dimming curves, voice assistant UX, protocol tutorials, revenue-meter certification procedures.
What this page delivers
- Buildable architecture: a panel-side block breakdown that maps to schematics and PCB partitions.
- Proof-driven validation: a minimal measurement set (two signals first) to isolate phase errors, false trips, comms drops, and log-integrity faults.
- BOM-oriented selection map: functional blocks → key specs → part families, with example MPNs in H2-11.
Why the boundary matters
Panel-side designs mix high-energy switching, safety constraints, and metering accuracy. Content is structured to prevent cross-page overlap: general EMC textbooks belong to the EMC/Safety page, while this page only keeps HEMS-specific coupling paths and validation evidence.
Cite this figure: HEMS Panel Boundary Diagram (F1) — use as the page scope reference.
H2-2 — System Block Breakdown (Panel architecture you can actually build)
The HEMS panel is a system of six subsystems that must work together under safety and interference constraints. Each block below includes a concrete proof target, a typical field failure mode, and the first evidence point to capture.
1) Branch sensing front-end
CT/shunt/Rogowski → AFE/ADC → energy, PF, phase mapping.
- Must prove: low-load accuracy + phase consistency + channel matching.
- Typical failure: negative power, phase swap, drift at light loads.
- First evidence: CT polarity/phase map check + synchronous sampling timing.
2) Branch switching / actuation
Relay/contactor/SSR/eFuse with driver and “sense-back” confirmation.
- Must prove: real open/close state + inrush resilience + thermal margin.
- Typical failure: false trip, welded contact, SSR overheating/leakage.
- First evidence: switch-edge current waveform + state feedback (contact or current-to-zero).
3) Protection & safety taps
RCD/MCB/RCBO/AFCI taps, leakage/temperature points, fault flags.
- Must prove: fault cause is traceable and time-aligned across rails and branches.
- Typical failure: “trip happened” without root cause, intermittent nuisance events.
- First evidence: event record with timestamp + branch_id + rail/reset state.
4) DER interfaces (PV / storage)
Signals that make PV/ESS status verifiable at the panel boundary.
- Must prove: interface states are deterministic (not UI-only) and captured in logs.
- Typical failure: delayed/mismatched status, broken interlocks during transfer events.
- First evidence: interface sampling cadence + queue/backpressure + time alignment.
5) Local compute + time base
MCU/MPU + RTC + power-fail handling for durable, ordered records.
- Must prove: monotonic counters + anti-rollback + graceful power-fail commit.
- Typical failure: duplicated/unsorted logs after outages, missing fault windows.
- First evidence: power-fail IRQ timing + RTC backup rail waveform + counter increments.
6) Secure gateway & comms
Ethernet/Wi-Fi/Thread/Zigbee/PLC edge connectivity with device-side security.
- Must prove: secure boot chain + key protection + comms robustness during switching events.
- Typical failure: comms drops when loads switch, unexpected resets during updates.
- First evidence: reset-reason + rail droop at switch events + boot attestation result.
Cite this figure: HEMS Panel Buildable Architecture (F2) — use as the reference map for all later chapters and FAQs.
H2-3 — Branch Metering Physics Choices (CT vs Shunt vs Rogowski, and when)
Branch-level metering succeeds or fails at the sensor physics and installation boundary. The goal is a choice that matches current range, low-load requirements, phase accuracy, isolation strategy, and panel constraints.
3-line conclusion (fast decision)
- CT fits clamp-on installation and strong isolation, but demands phase/polarity control to avoid PF bias and negative power.
- Shunt fits high accuracy and stable phase, but requires a safe isolation/common-mode plan and thermal drift control.
- Rogowski fits wideband/high-current waveforms, but needs a stable integrator and careful low-load noise handling.
Engineering comparison (panel reality)
| Option | Where it wins | Typical pitfalls |
|---|---|---|
| CT (clamp-on isolation) | Non-invasive install, natural high-voltage isolation, low power loss. | Phase error biases PF; saturation under inrush/harmonics; polarity/phase mapping mistakes cause negative power. |
| Shunt (precision current sense) | Strong linearity, stable phase for PF, good at low currents if noise is managed. | Isolation/common-mode constraints; self-heating drift; creepage/clearance and layout become first-order concerns. |
| Rogowski (wideband coil) | High peak-current headroom, wideband waveforms (harmonics/inrush) with low insertion impact. | Integrator offset/drift; low-frequency noise; low-load energy can be noise-floor limited. |
Common pitfall A: “Power is negative” on a normal load
Most often caused by CT polarity or phase mapping mistakes, not real reverse flow.
- First 2 checks: (1) verify CT direction marker vs conductor; (2) verify branch-to-phase map.
- Discriminator: if the sign follows the sensor/channel even when the load is unchanged, mapping/polarity is the root cause.
- First fix: enforce a polarity test step + store phase map as a versioned table (logged).
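The polarity test step and versioned phase map can be sketched as follows. This is a minimal Python sketch, assuming synchronously sampled whole cycles of V and I and a known resistive (consuming) commissioning load; the `PhaseMap` layout, the `polarity_check` helper, and the verdict strings are hypothetical names for illustration, not a defined format.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PhaseMap:
    """Versioned branch -> (phase_id, polarity) table; layout is hypothetical."""
    version: int
    entries: dict = field(default_factory=dict)  # branch_id -> (phase_id, +1 or -1)

def active_power(v, i, polarity=1):
    """Mean of v*i over synchronously sampled whole cycles (W)."""
    return polarity * sum(a * b for a, b in zip(v, i)) / len(v)

def polarity_check(phase_map, branch_id, v, i):
    """Commissioning step with a known resistive (consuming) load: P must be > 0."""
    phase_id, polarity = phase_map.entries[branch_id]
    p = active_power(v, i, polarity)
    return {
        "branch_id": branch_id,
        "phase_id": phase_id,
        "map_version": phase_map.version,  # log which map version produced the sign
        "p_w": p,
        "verdict": "ok" if p > 0 else "suspect_polarity_or_map",
    }
```

Keeping polarity in the versioned map, rather than flipping signs ad hoc in firmware, means every correction shows up as a logged version change.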
Common pitfall B: PF is consistently low across loads
A stable PF bias indicates phase shift in the sensor/AFE path or sampling misalignment.
- First 2 checks: (1) PF under a known resistive load; (2) compare V/I zero-cross time offset.
- Discriminator: if PF stays low even on a resistive load, phase chain is wrong (not the load).
- First fix: add phase calibration or align sampling timing; re-check anti-alias filter phase.
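A quick way to see why a resistive load exposes the phase chain: on a pure resistor PF should read 1.0, and any time skew Δt between the V and I paths at line frequency f shows up as PF = cos(2π·f·Δt). The sketch below assumes 50 Hz, 64 samples per cycle, and an illustrative 200 µs skew (about 3.6°); the helper names are hypothetical.

```python
import math

def sample_sine(amp, f_hz, fs_hz, n, delay_s=0.0):
    """n samples of amp*sin(2*pi*f*(t - delay_s)) taken at fs_hz."""
    return [amp * math.sin(2 * math.pi * f_hz * (k / fs_hz - delay_s)) for k in range(n)]

def power_factor(v, i):
    """PF = P / (Vrms * Irms) over whole cycles of synchronous samples."""
    n = len(v)
    p = sum(a * b for a, b in zip(v, i)) / n
    v_rms = math.sqrt(sum(a * a for a in v) / n)
    i_rms = math.sqrt(sum(b * b for b in i) / n)
    return p / (v_rms * i_rms)

# Resistive load: the real current is in phase with voltage.
# A 200 us skew in the current path (sensor, filter, or sampling) at 50 Hz is
# a 3.6 degree phase error, so PF reads cos(3.6 deg), about 0.998, not 1.0.
FS, F, N = 3200.0, 50.0, 64  # exactly one cycle
v = sample_sine(325.0, F, FS, N)
i_aligned = sample_sine(10.0, F, FS, N)
i_skewed = sample_sine(10.0, F, FS, N, delay_s=200e-6)
```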
Common pitfall C: Light-load readings drift or jump
Usually driven by noise floor, gain-range transitions, or integrator drift (Rogowski).
- First 2 checks: (1) raw ADC noise floor at idle; (2) look for gain/PGA switching steps.
- Discriminator: if readings change when gain thresholds toggle, the range strategy is the culprit.
- First fix: add hysteresis for range switching + lengthen accumulation window for low-load energy.
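Range-switching hysteresis can be sketched as a two-threshold selector. This is a minimal Python sketch assuming a two-range PGA; the 2.0 A / 1.6 A thresholds and the `RangeSelector` name are hypothetical placeholders, and the return value exists so each range change can be logged and correlated with reading jumps.

```python
from dataclasses import dataclass

@dataclass
class RangeSelector:
    """Two-range PGA selection with hysteresis; thresholds are hypothetical (A rms)."""
    up_a: float = 2.0      # above this, move to the low-gain (high-current) range
    down_a: float = 1.6    # return to the high-gain range only below this
    high_gain: bool = True

    def update(self, i_rms_a):
        """Return True when the range changes, so the caller can log the transition."""
        if self.high_gain and i_rms_a > self.up_a:
            self.high_gain = False
            return True
        if not self.high_gain and i_rms_a < self.down_a:
            self.high_gain = True
            return True
        return False
```

Fed a current hovering around the threshold (1.9, 2.05, 1.95, 2.02, 1.98, 1.5 A), this selector changes range twice, while a single-threshold version (`up_a == down_a == 2.0`) toggles four times.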
Cite this figure: Branch Metering Sensor Choice Map (F3) — use as the branch sensing selection reference.
H2-4 — Metering AFE & ADC Chain (accuracy, phase, and noise you can prove)
After the sensor choice, accuracy depends on the analog front-end and sampling chain. The key is to control phase alignment, noise floor, and common-mode/isolation behavior — and to diagnose issues using two waveforms and two statistics.
Hard constraints that determine correctness
- Phase/PF: synchronous sampling, consistent phase delay across channels, and known anti-alias filter phase.
- Dynamic range: stable readings from standby watts to kW peaks without gain-range artifacts.
- Common-mode & isolation: shunt-based paths need robust CM headroom and isolation delay awareness.
Symptom 1: PF is consistently low
Stable PF bias usually means a phase problem in the sensor/AFE chain or sampling misalignment.
- First 2 waveforms: (1) V(t) and I(t) zero-cross alignment; (2) I(t) phase vs load step.
- 2 statistics: PF error on resistive load; phase offset trend vs frequency/load.
- Discriminator: PF stays low on a resistive load → the phase chain is wrong (not the load).
- First fix: align sample timing; calibrate phase; verify anti-alias filter phase and isolation delay.
Symptom 2: Light-load readings drift or jump
Low-load instability is typically noise-floor limited or caused by gain-range switching and short accumulation windows.
- First 2 waveforms: (1) raw ADC codes/noise floor at idle; (2) PGA/range switch step response.
- 2 statistics: σ(P)/mean(P) at low load; reading sensitivity vs window length.
- Discriminator: jumps correlate with range transitions → range strategy dominates the error.
- First fix: add hysteresis to range switching; lengthen accumulation window; tighten reference/ground noise.
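The second statistic (reading stability vs window length) can be computed directly from logged power readings. This sketch assumes non-overlapping accumulation windows and synthetic data (a 5 W standby load with 2 W of white noise, seeded for repeatability). For white noise the spread of window means should shrink roughly as 1/sqrt(window), so a curve that stops improving suggests a non-white contributor such as drift or range steps.

```python
import random
import statistics

def windowed_cv(power_w, window):
    """sigma(P)/mean(P) of non-overlapping accumulation windows of `window` samples."""
    means = [statistics.fmean(power_w[k:k + window])
             for k in range(0, len(power_w) - window + 1, window)]
    return statistics.pstdev(means) / statistics.fmean(means)

# Hypothetical standby branch: 5 W true load, 2 W white noise per reading.
rng = random.Random(1)
readings = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
cv = {w: windowed_cv(readings, w) for w in (1, 10, 100)}
```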
Symptom 3: Some branches show negative power
Most often caused by polarity/phase mapping or channel labeling issues; occasionally by sampling time skew.
- First 2 waveforms: (1) branch current sign vs known load direction; (2) V/I time alignment for that channel.
- 2 statistics: sign stability across time; correlation of sign flips with installation/channel changes.
- Discriminator: sign follows channel/sensor even when load is unchanged → mapping/polarity is the root cause.
- First fix: enforce polarity test + store phase map in versioned config; log updates with timestamps.
Minimal evidence checklist (bring-up ready)
- Waveform A: V(t) vs I(t) alignment around zero-cross (time offset).
- Waveform B: switching/inrush snapshot to expose saturation or common-mode overload.
- Statistic A: PF bias under resistive load (phase chain fingerprint).
- Statistic B: low-load stability (σP or Allan-like stability window) vs accumulation length.
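Waveform A can be reduced to a single number by interpolating rising zero crossings of V and I and averaging the offset between each V crossing and its nearest I crossing. A minimal sketch, assuming clean, synchronously sampled waveforms (a real capture would need noise hysteresis around zero); the function names are hypothetical. A positive result means I crosses after V.

```python
import math

def rising_zero_crossings(x, fs_hz):
    """Times (s) of negative-to-non-negative transitions, linearly interpolated."""
    times = []
    for k in range(1, len(x)):
        a, b = x[k - 1], x[k]
        if a < 0 <= b:
            times.append((k - 1 + (-a) / (b - a)) / fs_hz)
    return times

def zero_cross_offset_s(v, i, fs_hz):
    """Mean offset of I's rising zero-cross relative to each V rising zero-cross."""
    tv = rising_zero_crossings(v, fs_hz)
    ti = rising_zero_crossings(i, fs_hz)
    if not tv or not ti:
        raise ValueError("need at least one rising zero-cross in each waveform")
    # Pair each V crossing with the nearest I crossing (not by index), so a
    # crossing that falls at the start of the capture does not shift the pairing.
    offsets = [min(ti, key=lambda t: abs(t - t_v)) - t_v for t_v in tv]
    return sum(offsets) / len(offsets)
```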
Cite this figure: Metering Chain Evidence Fingerprints (F4) — use as the validation checklist for PF bias, light-load drift, and negative power.
H2-5 — Branch Switching & Actuation (relay/contactor/SSR/eFuse) with safe control
Branch control inside a breaker box is defined by inrush stress, arcing and lifetime, and fail-safe behavior. Correctness requires a proof loop: command → actuation → sense-back → event log.
Actuation proof loop (non-negotiable)
- Command: MCU issues the intended state with an interlock check.
- Actuate: driver energizes coil or gate with defined timing and brownout rules.
- Sense-back: auxiliary contact / voltage-across-switch / current-zero proof confirms the real state.
- Log: command + sensed state + reason + rails are recorded as an event (versioned firmware).
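The four-step loop can be sketched as a small class. This is a hypothetical Python model, not a driver: `drive`, `sense`, `rail_ok`, and `interlock_ok` are assumed hooks standing in for the coil/gate driver, the independent sense-back input, the driver-rail monitor, and the interlock check; the reason strings are illustrative, and settle-time waiting before sense-back is omitted.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class ActuationEvent:
    monotonic_counter: int
    branch_id: str
    commanded_state: bool
    sensed_state: bool
    reason: str
    rail_ok: bool
    firmware_version: str

class BranchActuator:
    """Command -> actuate -> sense-back -> log, with hypothetical hardware hooks."""

    def __init__(self, branch_id, drive, sense, rail_ok,
                 interlock_ok=lambda on: True, firmware_version="1.0.0"):
        self.branch_id, self.drive, self.sense = branch_id, drive, sense
        self.rail_ok, self.interlock_ok = rail_ok, interlock_ok
        self.firmware_version = firmware_version
        self._counter = count(1)
        self.log = []

    def command(self, on, reason):
        rail = self.rail_ok()
        driven = False
        if not self.interlock_ok(on):
            reason = "interlock_reject"      # command refused, nothing driven
        elif not rail:
            reason = "brownout_inhibit"      # do not actuate on a sagging driver rail
        else:
            self.drive(on)
            driven = True
        sensed = self.sense()                # real hardware: sample after settle time
        if driven and sensed != on:
            reason = "sense_mismatch"        # e.g. welded contact or off-state leakage
        event = ActuationEvent(next(self._counter), self.branch_id, on, sensed,
                               reason, rail, self.firmware_version)
        self.log.append(event)
        return event
```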
Two families, different failure signatures
| Family | Strengths | Risks |
|---|---|---|
| Mechanical (relay / contactor) | Natural isolation, low conduction loss, clear “open/close” behavior. | Inrush stress, arcing, welded contacts, coil brownout chatter. |
| Solid-state (SSR / triac / MOSFET / eFuse) | Fast control, diagnostics, programmable protection (eFuse). | Thermal limits, SOA under inrush, leakage current, dv/dt false triggers. |
Field symptom: “Commanded OFF, but load still behaves ON”
Most common causes are welded contacts (mechanical) or off-state leakage (solid-state) interpreted as “still on”.
- First 2 measurements: voltage across switch (open proof) + branch current residual.
- Discriminator: high current with “open” command → welded contact; tiny residual with voltage present → leakage signature.
- First fix: add independent sense-back and log both commanded_state and sensed_state.
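The discriminator above can be expressed as a rule over the two first measurements. The thresholds (0.5 A load current, 1 mA residual, 80 % of line voltage across the switch) are hypothetical placeholders that would need to be set per switch family and branch rating.

```python
def classify_off_state(v_across_v, i_residual_a, v_line_v,
                       i_load_min_a=0.5, i_zero_a=0.001):
    """Classify a branch commanded OFF from RMS voltage across the switch and
    residual branch current. Thresholds are hypothetical placeholders."""
    if i_residual_a >= i_load_min_a:
        return "welded_or_stuck_closed"   # real load current still flows
    if v_across_v >= 0.8 * v_line_v:
        # The switch blocks most of the line voltage, so it is open.
        return "open_with_leakage" if i_residual_a > i_zero_a else "open"
    return "indeterminate"                # neither signature: check wiring and sense path
```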
Field symptom: “Random trips during motor/SMPS startup”
Inrush can exceed contact ratings or solid-state SOA, or trigger protection thresholds without proper timing logic.
- First 2 measurements: inrush current waveform + device temperature/rail droop during actuation.
- Discriminator: a high, short-duration current peak (high crest factor) → inrush tuning issue; sustained overheating → thermal headroom issue.
- First fix: implement inrush-aware profiles (delay/ramp) and record peak/crest in the event log.
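Peak and crest factor for the event log, plus a time-above-threshold test that separates a short inrush from sustained overcurrent (where the thermal headroom check applies), might look like this. A sketch assuming one current capture per switching action; `trip_a` and the 20 ms `blanking_s` window are hypothetical values.

```python
import math

def inrush_signature(i_samples, fs_hz, trip_a):
    """Peak, RMS, crest factor and time above trip_a for one switching capture."""
    peak = max(abs(x) for x in i_samples)
    rms = math.sqrt(sum(x * x for x in i_samples) / len(i_samples))
    above_s = sum(1 for x in i_samples if abs(x) > trip_a) / fs_hz
    return {"peak_a": peak, "rms_a": rms, "crest": peak / rms, "above_trip_s": above_s}

def classify_trip_context(sig, blanking_s=0.02):
    """Short excursion above trip_a -> inrush; long excursion -> sustained overload.
    blanking_s is a hypothetical inrush blanking window."""
    return "inrush" if sig["above_trip_s"] <= blanking_s else "sustained_overload"
```

For a clean sine the crest factor is √2; a capacitive or motor inrush shows a much higher crest factor with only a few milliseconds above the trip level.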
Fail-safe selection: fail-open vs fail-closed
Fail-safe is a design choice that must match branch criticality and the protection/logging strategy.
- Fail-open: safer for unknown faults; requires restore logic + clear user/service evidence.
- Fail-closed: continuity for critical circuits; requires stronger fault detection and independent protective cutoff paths.
- Proof requirement: default state, restore attempts, and interlock rejects must be logged with monotonic ordering.
Cite this figure: Branch Actuation & Proof Loop (F5) — use as the required control + diagnostics reference.
H2-6 — Safety & Protection Integration (what you tap, what you log)
Safety integration is not an EMC textbook. The goal is panel-local observability: tap protection states and capture event context that can be replayed in the field with timestamps, branch identity, and power-rail truth.
RCD / RCBO: trip state as a primary truth source
Use trip indication contacts (when available) as an input to the event recorder. Pair with branch current snapshots to classify context.
- First evidence: trip contact state + timestamp + rail_status.
- Context capture: pre-trigger RMS/peak/crest factor for the affected branch.
MCB: position or auxiliary contact, not guesswork
MCB handle position can disagree with assumptions after a fault. Prefer auxiliary contacts or position sensing where feasible.
- First evidence: MCB state + branch_id mapping + monotonic_counter.
- Discriminator: “commanded OFF” with MCB still ON indicates the switch is not the safety cutoff device.
Temperature: hotspots that explain nuisance trips
Place temperature taps at predictable stress points (busbar, solid-state devices, enclosure hotspots). Use trends, not single readings.
- First evidence: temperature rise rate + trip timestamp correlation.
- Discriminator: repeatable thermal signature → thermal headroom or airflow constraint.
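Using the trend rather than a single reading can be as simple as a least-squares slope over the window before a trip. A minimal sketch, assuming (time, temperature) samples and a hypothetical 300 s pre-trip window; function names are illustrative.

```python
def rise_rate_c_per_min(times_s, temps_c):
    """Least-squares slope of temperature vs time, in degC per minute."""
    n = len(times_s)
    t_mean = sum(times_s) / n
    c_mean = sum(temps_c) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in zip(times_s, temps_c))
    den = sum((t - t_mean) ** 2 for t in times_s)
    return 60.0 * num / den

def pre_trip_rate(samples, trip_t_s, window_s=300.0):
    """Rise rate over the window_s seconds ending at a trip timestamp.
    samples: list of (t_s, temp_c); needs at least two points in the window."""
    pts = [(t, c) for t, c in samples if trip_t_s - window_s <= t <= trip_t_s]
    if len(pts) < 2:
        raise ValueError("not enough samples in the pre-trip window")
    ts, cs = zip(*pts)
    return rise_rate_c_per_min(ts, cs)
```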
Leakage / insulation: treat as an event with context
If leakage/insulation monitoring exists, log threshold crossings with branch context and rail truth to prevent “unknown cause” outcomes.
- First evidence: leakage level + branch_id + rail_status.
- Context capture: recent switching actions and inrush peaks from the proof loop.
AFCI: do not deep-dive algorithms—log what matters
AFCI needs a defined tap path (bandwidth and trigger). Record trigger time and a compact pre-trigger fingerprint rather than full waveforms.
- First evidence: trigger timestamp + band-energy indicator + branch_id.
- Context capture: short pre-trigger buffer (feature snapshot) to support field replay.
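A compact pre-trigger buffer can be a fixed-depth ring of feature snapshots that is copied into the event when the trigger fires. A sketch under assumptions: the `FeatureSnapshot` fields (including a scalar `band_energy`) and the depth of 16 are hypothetical, and the AFCI detection itself is not modeled.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSnapshot:
    t_ms: int
    branch_id: str
    band_energy: float   # hypothetical AFCI band-energy indicator
    i_rms_a: float

class PreTriggerBuffer:
    """Fixed-depth ring of compact feature snapshots; the oldest entries fall off."""

    def __init__(self, depth=16):
        self._buf = deque(maxlen=depth)

    def push(self, snap):
        self._buf.append(snap)

    def freeze(self, trigger_t_ms, branch_id):
        """Copy one branch's pre-trigger history into an event payload."""
        pre = [s for s in self._buf if s.branch_id == branch_id and s.t_ms <= trigger_t_ms]
        return {"trigger_t_ms": trigger_t_ms, "branch_id": branch_id, "pre_trigger": pre}
```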
Event Record Schema (panel-local, replayable)
Keep the schema stable and versioned. A minimal record supports root-cause replay even under brownout conditions.
| Field | Meaning / why it matters |
|---|---|
| timestamp | Absolute time or secure timebase reference for event ordering and correlation. |
| branch_id | Physical branch mapping key (must be stable and versioned). |
| trip_reason | RCD/RCBO/MCB/AFCI/leakage/thermal classification label. |
| current_rms, peak | Compact pre-trigger signature that distinguishes overload vs inrush vs abnormal behavior. |
| commanded_state, sensed_state | Separates “issued command” from “proven physical state” (diagnostics anchor). |
| rail_status | Power integrity truth at the time of the event (brownout often fakes root causes). |
| reset_reason | If a reset occurs, it must be visible; otherwise events appear random and unrepeatable. |
| firmware_version | Reproducibility and field rollback control rely on version identification. |
| monotonic_counter | Prevents log reorder/rollback and supports tamper-resistant sequencing. |
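To keep the schema stable and versioned in storage, one option is a fixed binary layout with a schema-version byte and a CRC, so a torn write under brownout is detectable on read-back. The field widths, enum encodings, and 36-byte size below are hypothetical choices for illustration, not a defined format.

```python
import struct
import zlib
from dataclasses import dataclass, astuple

RECORD_FMT = "<BIQHBff??BBI"   # hypothetical v1 layout: little-endian, no padding
SCHEMA_VERSION = 1

@dataclass
class EventRecord:
    monotonic_counter: int
    timestamp: int           # e.g. seconds from the secure timebase
    branch_id: int
    trip_reason: int         # enum code: RCD/RCBO/MCB/AFCI/leakage/thermal
    current_rms: float
    peak: float
    commanded_state: bool
    sensed_state: bool
    rail_status: int
    reset_reason: int
    firmware_version: int    # e.g. major << 16 | minor << 8 | patch

def pack_record(rec):
    """Schema byte + fields, followed by a CRC32 over everything before it."""
    body = struct.pack(RECORD_FMT, SCHEMA_VERSION, *astuple(rec))
    return body + struct.pack("<I", zlib.crc32(body))

def unpack_record(blob):
    body = blob[:-4]
    (crc,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: torn or corrupted record")
    version, *fields = struct.unpack(RECORD_FMT, body)
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version {version}")
    return EventRecord(*fields)
```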
Cite this figure: Safety Signals → Evidence Capture → Event Record (F6) — use as the panel-local logging and replay reference.
H2-7 — Solar & Storage Interfaces (PV inverter, ESS/PCS, backup transfer)
A HEMS panel acts as an interface hub: it normalizes DER state signals, enforces interlocks, and aligns timestamps so PV/ESS/transfer actions can be proven and replayed without protocol-stack deep dives.
PV inverter side
Focus on panel-relevant truth sources: generation power, grid-tied state, and fault indication—captured as time-aligned events.
- Signals needed: pv_power, grid_tied_state, fault_indication, (optional) curtail_enable.
- Common failures: PV “present” but no export seen; state flapping; fault appears without correlated context.
- First measurements: timestamp alignment check + a single “state truth” tap (contact/line status vs sampled state).
- Discriminator: flapping synchronized to grid-present changes → grid event; flapping with stable grid → interface noise or sampling power-domain coupling.
ESS / PCS side
Make “charge/discharge” and “allowed to run” provable. Log both the commanded intent and the permissive/interlock truth.
- Signals needed: soc, charge_discharge_state, alarm, permit_to_run or emergency_disconnect.
- Common failures: SOC looks normal but no discharge; alarms not time-correlated with switching; “E-stop asserted” but behavior unclear.
- First measurements: permissive/interlock inputs + panel-side import/export trend to cross-check state truth.
- Discriminator: state says “discharge” but panel power does not move → mapping/timebase fault or stale status; permissive rejects → interlock chain or contactor feedback issue.
Backup / transfer (ATS / contactor)
Only the interlock proof matters: demonstrate “no backfeed” and capture transfer state by independent sensing.
- Signals needed: grid_present, transfer_state (sensed), interlock_ok, (optional) critical_load_state.
- Common failures: chatter on restore; critical loads drop without a clear cause; inability to prove isolation at the time of an incident.
- First measurements: two independent proofs: transfer contact feedback + panel metering signature (unexpected reverse power / abnormal voltage relation).
- Discriminator: chatter plus reset_reason/rail_status events → panel brownout; chatter with stable rails → interlock logic or external input instability.
Minimum event correlation fields (DER interface)
DER interfaces become diagnosable when every state transition is time-aligned and mapped to a stable branch or source identifier.
| Field | Why it matters |
|---|---|
| timestamp + secure_time_ref | Correlation across PV/ESS/transfer without “ghost causality”. |
| source_id | Stable identity (PV/ESS/ATS) and wiring version control. |
| state + reason_code | State changes without a reason are not replayable. |
| interlock_ok + sensed_transfer_state | Proof of isolation and interlock enforcement. |
| rail_status + reset_reason | Prevents mistaking a brownout for a DER fault. |
Cite this figure: DER Interface Hub & Interlock Proof (F7) — use as the PV/ESS/transfer interface and logging reference.
H2-8 — Local Gateway & Security (secure boot, keys, and tamper-evident energy logs)
Panel security is device-side and testable: verified boot prevents unauthorized firmware, protected keys prevent cloning, and tamper-evident logs prevent replay/rollback—without requiring cloud architecture discussions.
Secure boot + anti-rollback (boot chain)
Only authenticated firmware should execute. Rollback protection prevents old vulnerable images from being loaded.
- Required behaviors: verified image check, version policy, and a deny path that is logged.
- Field signature: “mysterious behavior changes” often map to unauthorized image changes without a verifiable boot chain.
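The version policy and logged deny path can be sketched as follows. This models only the decision, assuming `signature_ok` comes from the real boot-ROM or SE verifier and that `min_allowed_version` is backed by a hardware monotonic counter; the class name and reason strings are hypothetical, while the log fields follow the rollback check listed in this chapter.

```python
from dataclasses import dataclass, field

@dataclass
class BootPolicy:
    """Anti-rollback decision for a candidate image (sketch only)."""
    min_allowed_version: int          # stands in for an OTP/SE monotonic counter
    counter: int = 0
    log: list = field(default_factory=list)

    def evaluate(self, image_version, signature_ok, timestamp):
        self.counter += 1
        if not signature_ok:
            reason = "signature_invalid"
        elif image_version < self.min_allowed_version:
            reason = "rollback_reject"
        else:
            # Simplification: many designs raise the floor only after the new
            # image has booted and confirmed itself healthy.
            self.min_allowed_version = max(self.min_allowed_version, image_version)
            return True
        # The deny path is always logged.
        self.log.append({"fw_version": image_version,
                         "rollback_reject_reason": reason,
                         "timestamp": timestamp,
                         "monotonic_counter": self.counter})
        return False
```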
Key storage (TPM/HSM/SE)
Keys are valuable only when they cannot be exported. Prefer hardware-protected operations for signing and sealed storage.
- Required behaviors: non-exportable private keys, hardware-backed signing, sealed secrets bound to device state.
- Evidence anchor: log every cryptographic operation outcome with a monotonic counter.
Anti-tamper inputs (panel-specific)
Tamper signals should be treated as events with context—not as alarms without traceability.
- Examples: enclosure open, CT removal, phase swap suspicion, abnormal metering signatures.
- Minimum capture: timestamp, tamper_source, affected branch/source_id, rail_status.
Tamper-evident energy logs
Logs are credible only when ordering cannot be rewritten and power-fail writes preserve a minimal truth record.
- Required behaviors: monotonic counter, power-fail commit order, secure time reference, replay detection.
- Evidence anchor: minimal tuple must survive brownouts: timestamp, id, reason, rail_status, counter.
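One common way to make ordering tamper-evident is an HMAC chain: each entry's tag covers the previous tag plus the entry, so reordering, gaps, and edits fail verification. A sketch, assuming the key would live inside the SE/TPM (here it is a plain bytes object) and that tail truncation is caught by comparing against a hardware monotonic counter passed in as `expected_last`; the names are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32

def _tag(key, prev_tag, rec):
    """HMAC-SHA256 over the previous tag plus a canonical encoding of this record."""
    msg = prev_tag + json.dumps(rec, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

class ChainedLog:
    """Append-only log; each entry carries a monotonic counter and a chained tag."""

    def __init__(self, key):
        self._key, self._prev, self._counter = key, GENESIS, 0
        self.entries = []

    def append(self, timestamp, event_id, reason, rail_status):
        self._counter += 1
        rec = {"counter": self._counter, "timestamp": timestamp,
               "id": event_id, "reason": reason, "rail_status": rail_status}
        self._prev = _tag(self._key, self._prev, rec)
        self.entries.append((rec, self._prev))

def verify(key, entries, expected_last=None):
    """Detect reorder, gaps, and edits; with expected_last, also tail truncation."""
    prev, last = GENESIS, 0
    for rec, tag in entries:
        if rec["counter"] != last + 1:
            return False
        if not hmac.compare_digest(tag, _tag(key, prev, rec)):
            return False
        prev, last = tag, rec["counter"]
    return expected_last is None or last == expected_last
```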
Three verification checks (must-pass)
Security is only real when it is verifiable. These checks can be executed during validation and referenced in field audits.
| Check | Expected result | Required log evidence |
|---|---|---|
| Rollback image is rejected | Device denies the older image or enters a restricted mode. | fw_version, rollback_reject_reason, timestamp, monotonic_counter |
| Key is non-exportable | Private key material cannot be read; signing occurs only inside the TPM/HSM/SE. | key_slot_id, op=sign, result, monotonic_counter |
| Event log is monotonic and survives power-fail | Counters never decrease, and the minimal tuple persists across abrupt power loss. | timestamp, id, reason, rail_status, monotonic_counter |
Cite this figure: Secure Gateway Minimal Set & Verification Checks (F8) — use as the security validation reference.
H2-9 — Power Tree + Isolation Strategy (panel-grade power, not an EMC textbook)
A panel-grade design is defined by rails truth and boundaries: an isolated auxiliary supply feeds partitioned rails, isolation domains prevent noise injection, and entry protection is placed to control return paths and common-mode coupling.
Power tree reality (90–264Vac → isolated aux → rails)
Keep rails intentional: actions and comms should never share the same “truth” rail as low-noise metering.
- Recommended rail partition: DIG (MCU/MPU), ANA (AFE/ADC), RF (radio/PHY), DRV (actuation drivers).
- Minimum truth capture: log rail_status and reset_reason for every critical event and for any transfer/switching action.
- Panel rule: if switching coincides with comms drops or metering noise, prove or eliminate rail_dip before blaming protocol or algorithm.
Isolation domains (HV sensing vs logic vs comms)
Cross isolation only with necessary data. Avoid crossing noise loops via shared references or uncontrolled return paths.
- Domains: HV sensing (CT/shunt front-end), logic compute, comms (Ethernet/Wi-Fi/RS-485/PLC), actuation (relay/SSR drivers).
- Cross-domain signals: metering data, sensed feedback, interlock inputs—each mapped to a stable ID and timestamped.
- Failure signature: “drops only during switching” frequently indicates domain coupling through rails or return-path injection, not a link-layer defect.
Entry protection (surge/ESD/EFT) — panel-relevant only
Protection is about where energy returns. Place entry clamps and filters to control common-mode paths and protect sensitive front-ends.
- What to keep here: entry protection placement, domain return separation, and the common-mode path that hits AFE references.
- Field signature: after storms, “only some channels drift” often maps to a shared protection/return segment in one domain.
- First proof: correlate drift to a domain boundary and a shared return path before adjusting calibration.
Metering front-end sensitivity (without repeating AFE theory)
Low-load noise and phase bias often originate from rail noise, reference coupling, or protection leakage/offset—visible as a domain-level pattern.
- Low-load noise jump: compare ANA rail noise floor vs AFE output noise.
- Phase/PF bias: check synchronous sampling alignment and reference stability before changing sensor type.
- Storm drift: confirm whether offset appears in one isolation group—this points to protection/reference damage rather than “random” sensor behavior.
Power-related symptom shortlist (what to prove first)
Use symptoms to pick the first two measurements. Avoid turning the investigation into a generic EMC exercise.
| Symptom | First 2 measurements | Likely cause class |
|---|---|---|
| Comms drops during branch switching | RF/COMMS rail + reset_reason. | Usually rail dip or ground-bounce injection across domains. |
| Low-load metering noise suddenly increases | ANA rail noise + AFE output noise floor. | Often analog reference coupling or a protection leakage shift. |
| After storms, some channels show persistent offset | Offset compared by isolation group + evidence from the shared protection/return segment. | Frequently entry-protection or reference-domain damage. |
Cite this figure: Power Tree + Isolation Domains Map (F9) — use as the panel power/isolation reference.
H2-10 — Validation & Field Debug Playbook (symptom → evidence → isolate → fix)
A repeatable playbook minimizes tools: every entry uses the same four blocks (Symptom, First 2 Measurements, Discriminator, First Fix) plus a short list of log fields to confirm.
Decision entries (6–8 repeatable items)
Use the entries below as validation tests and as field triage. Each entry is designed to identify the dominant cause class quickly.
1) Negative power / PF looks wrong on some branches
Symptom: selected branches report negative power or consistently low PF under known resistive loads.
First 2 Measurements: (1) CT polarity check; (2) phase mapping sync.
Discriminator: if rail/reset truth is stable and only specific branches show sign inversions, the dominant cause is CT polarity or phase-to-branch mapping.
First Fix: enforce a mapping procedure (branch_id ↔ CT orientation) and lock it to a wiring version; add a quick “known load” self-check for commissioning.
Log fields: branch_id, phase_id, sign, pf, timestamp.
2) Inrush causes mis-trip or unstable switching behavior
Symptom: switching a branch triggers false protection, chatter, or unexpected off/on sequences.
First 2 Measurements: (1) switch current waveform; (2) trip reason/timing.
Discriminator: if the peak/width matches motor or capacitive inrush and the trip aligns with the transient, the dominant cause is inrush handling, not steady-state overcurrent.
First Fix: implement staged turn-on (or inrush limiting strategy), adjust trip blanking windows, and require sense-back confirmation after actuation.
Log fields: branch_id, peak_current, trip_reason, sensed_state, timestamp.
3) SSR runs hot or shows unexpected residual voltage
Symptom: SSR area heats up or a “fully off” branch still shows residual voltage/current.
First 2 Measurements: (1) residual voltage/leakage; (2) hot-spot temperature.
Discriminator: if residual behavior correlates with temperature rise and does not track rail dips, the dominant cause is SSR leakage/SOA/thermal design.
First Fix: verify heat path, derate switching devices, and add a residual-voltage expectation label and sensed feedback where needed.
Log fields: branch_id, commanded_state, sensed_state, temp, timestamp.
4) Event log missing, duplicated, or out of order after power events
Symptom: after brownouts or abrupt power loss, logs contain gaps, duplicates, or counter decreases.
First 2 Measurements: (1) monotonic counter continuity; (2) rail/reset truth.
Discriminator: counter discontinuities aligned with rail_status/reset_reason indicate power-fail commit ordering or hold-up insufficiency.
First Fix: enforce minimal tuple commit first, add power-fail interrupt path, and validate hold-up time for the log write budget.
Log fields: monotonic_counter, timestamp, rail_status, reset_reason, commit_state.
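The hold-up check reduces to capacitor energy: E = ½·C·(V_start² − V_min²), and hold-up time ≈ E / P for a constant-power load. A sketch with illustrative numbers (470 µF on a 12 V rail, 7 V minimum, 2 W load, giving about 11.2 ms); the 2× margin and the detect/commit times are hypothetical, not recommendations.

```python
def holdup_time_s(c_f, v_start_v, v_min_v, p_load_w, efficiency=1.0):
    """Time a hold-up capacitor can feed p_load_w while discharging v_start -> v_min.
    E = 0.5 * C * (v_start^2 - v_min^2); constant-power load assumed."""
    usable_j = 0.5 * c_f * (v_start_v ** 2 - v_min_v ** 2) * efficiency
    return usable_j / p_load_w

def commit_fits(c_f, v_start_v, v_min_v, p_load_w, t_detect_s, t_commit_s,
                efficiency=1.0, margin=2.0):
    """True if hold-up covers power-fail detection plus minimal-tuple commit, with margin."""
    t_hold = holdup_time_s(c_f, v_start_v, v_min_v, p_load_w, efficiency)
    return t_hold >= margin * (t_detect_s + t_commit_s)
```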
5) Comms drop only when switching
Symptom: Ethernet/Wi-Fi/RS-485 drops or retries spike only during switching actions.
First 2 Measurements: (1) COMMS/RF rail dip; (2) reset reason or error counter.
Discriminator: comms failures synchronized to rail dips or resets indicate coupling through rails/returns; stable rails with rising error counters suggest EMI injection at the interface.
First Fix: separate return paths, isolate comms rail from actuation transients, and add event correlation between switching timestamps and comms counters.
Log fields: timestamp, switch_action_id, comms_err_cnt, rail_status, reset_reason.
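The correlation can be a nearest-timestamp match between switching actions, comms error increments, and rail dips. A minimal sketch, assuming all timestamps share one timebase; the 200 ms window, the 80 % threshold, and the label strings are hypothetical, and reset timestamps could be merged into `rail_dip_ts`.

```python
import bisect

def fraction_near(ref_ts, event_ts, window_s):
    """Fraction of event timestamps within window_s of the nearest reference timestamp."""
    ref = sorted(ref_ts)
    if not ref or not event_ts:
        return 0.0
    hits = 0
    for t in event_ts:
        k = bisect.bisect_left(ref, t)
        nearest = min(abs(ref[j] - t) for j in (k - 1, k) if 0 <= j < len(ref))
        if nearest <= window_s:
            hits += 1
    return hits / len(event_ts)

def classify_comms_drops(switch_ts, comms_err_ts, rail_dip_ts, window_s=0.2, threshold=0.8):
    """The switching-drop discriminator as a rule; parameters are hypothetical."""
    if fraction_near(switch_ts, comms_err_ts, window_s) < threshold:
        return "not_switching_correlated"
    if fraction_near(rail_dip_ts, comms_err_ts, window_s) >= threshold:
        return "rail_or_return_coupling"
    return "interface_emi_injection"
```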
6) Metering drifts (worse at low load or after storms)
Symptom: offset/gain drift appears, especially at low load, or after surge/storm exposure.
First 2 Measurements: (1) ANA rail noise/offset; (2) AFE zero/gain stability.
Discriminator: drift clustered by isolation/protection group indicates reference/protection damage or leakage shift; random per-branch drift suggests sensor installation or mapping issues.
First Fix: verify domain-level references, replace suspect protection/return segment components, and re-run a controlled “known load” verification set.
Log fields: timestamp, branch_id, offset, noise_floor, isolation_group.
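The clustering test can be run on the known-load verification results. A sketch, assuming each reading is tagged with its isolation group; the rule "every branch in a multi-branch group exceeds the limit" is a hypothetical simplification, and `offset` is in whatever unit is logged.

```python
from collections import defaultdict

def classify_drift(readings, limit):
    """readings: (branch_id, isolation_group, offset) tuples from a known-load check.
    Returns ('group_level', groups), ('per_branch', []) or ('in_spec', [])."""
    flags = defaultdict(list)
    for _branch_id, group, offset in readings:
        flags[group].append(abs(offset) > limit)
    # Shared reference/protection suspect: every branch of a multi-branch group is out.
    clustered = sorted(g for g, f in flags.items() if len(f) > 1 and all(f))
    if clustered:
        return "group_level", clustered
    if any(any(f) for f in flags.values()):
        return "per_branch", []      # scattered: installation or mapping suspect
    return "in_spec", []
```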
7) Transfer/backup chatter or inconsistent interlock behavior
Symptom: transfer state flaps on restore or interlocks appear inconsistent across events.
First 2 Measurements: (1) sensed transfer state; (2) rail/reset truth.
Discriminator: chatter with resets/rail dips points to panel power; chatter with stable rails points to external input stability or interlock logic mapping.
First Fix: require independent sensed feedback, debounce state taps, and attach every transition to a reason code and monotonic log entry.
Log fields: timestamp, transfer_state_sensed, interlock_ok, rail_status, reason_code.
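Debouncing the sensed transfer state while attaching a reason code and monotonic counter to each accepted transition can be sketched as follows. `stable_n` and the "grid"/"backup" state names are hypothetical; counting rejected glitches gives a chatter metric that can be compared against rail/reset truth.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DebouncedTransferTap:
    """Accept a new sensed transfer state only after stable_n identical samples."""
    stable_n: int = 5
    state: str = "grid"
    counter: int = 0
    rejected_glitches: int = 0
    log: list = field(default_factory=list)
    _candidate: Optional[str] = None
    _run: int = 0

    def sample(self, raw, timestamp, interlock_ok, rail_status):
        if raw == self.state:
            if self._candidate is not None:
                self.rejected_glitches += 1   # candidate vanished before it was stable
            self._candidate, self._run = None, 0
            return
        if raw != self._candidate:
            self._candidate, self._run = raw, 0
        self._run += 1
        if self._run >= self.stable_n:
            self.counter += 1
            self.log.append({
                "monotonic_counter": self.counter,
                "timestamp": timestamp,
                "transfer_state_sensed": raw,
                "interlock_ok": interlock_ok,
                "rail_status": rail_status,
                "reason_code": f"{self.state}_to_{raw}",
            })
            self.state, self._candidate, self._run = raw, None, 0
```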
Cite this figure: Field Debug Decision Tree (F10) — use as the validation and field triage reference.
H2-11 — IC / BOM Selection Map (function blocks + example MPNs)
This BOM map is built around evidence: branch metering must prove phase/sign and low-load noise; switching must prove “opened/closed” with sense-back; logs must remain tamper-evident across power loss. Example MPNs below are reference parts to anchor procurement shortlists (availability and compliance vary by region).
How to use this selection map
- Step 1: lock sensing approach per branch (CT vs shunt vs Rogowski) and required isolation domains.
- Step 2: pick actuation strategy (relay/contactor vs SSR vs eFuse) based on inrush and thermal limits.
- Step 3: choose security root + event storage so logs survive brownouts and resist rollback.
- Step 4: validate with two measurements per symptom (rail truth + signal truth), then tighten BOM.
BOM map (function block → key metrics → example MPNs → fit)
| Function block | Key metrics (what matters) | Example MPNs (reference shortlist) | Best fit |
|---|---|---|---|
| Energy metering IC (AFE/SoC) | Phase accuracy, sign correctness, dynamic range (1 W → kW), calibration hooks, multi-channel scalability. Goal: PF/active power stays consistent under known loads. | ADI ADE9000, ADI ADE9153A, ST STPM34, ST STPM33 | Multi-branch metering cores |
| Isolated ADC / isolated modulator | Isolation rating, CMTI, noise and linearity, sampling alignment, interface robustness. Goal: shunt-based measurement without cross-domain noise loops. | ADI ADuM7701, ADI ADuM7703, TI AMC1306M25, TI AMC1311 | Shunt / HV sensing domains |
| Shunt monitor (bus current/voltage) | High common-mode, low offset drift, fast transient capture, digital telemetry, alert pins. Goal: prove inrush vs steady overcurrent in logs. | TI INA238, TI INA228, TI INA226, ADI LTC2949 | DC branches, rail truth |
| CT interface / PGA / op-amp chain | Phase error budget, anti-alias filtering, gain programmability, low noise at low current, overload recovery. Goal: prevent negative power from polarity/map errors. | ADI AD8250 (PGA), TI PGA280 (PGA), TI OPA2188 (low-drift op-amp), TI OPA320 (low-noise op-amp) | CT-heavy branch metering |
| Relay / contactor coil driver | Coil current control, kickback handling, diagnostics/sense-back input, fail-safe behavior, thermal headroom. Goal: every actuation has a sensed confirmation. | TI DRV110, TI DRV103, Infineon BTS500xx (PROFET family), ST VNQ/VND series (smart switches) | Mechanical actuation |
| SSR / triac drive (AC) | Gate drive robustness, dv/dt immunity, thermal loss, off-state leakage expectations, zero-cross constraints. Goal: avoid “OFF but still hot” surprises. | onsemi MOC3063 (opto-triac), Vishay VO14642A (SSR), TI UCC27517 (gate driver), Infineon 2EDi family (gate drivers) | Silent switching / high cycles |
| High-side switch (DC) / diagnostics | Inrush control, short protection, current limit, fault reporting, thermal shutdown behavior. Goal: reason-coded protection, not “mystery trips”. | TI TPS1H200A, TI TPS1H100, Infineon BTS7008-2EPA, Infineon BTS500xx | DC loads / auxiliary circuits |
| eFuse / hot-swap (DC) | Programmable limit, fast fault response, telemetry/fault flags, SOA, reverse current behavior. Goal: inrush handling with logged trip reasons. | TI TPS25982, TI TPS25947, TI TPS2660, ADI LTC4368 | Protected DC branch switching |
| Isolated power (aux rails per domain) | Isolation rating, output noise, load step response, efficiency vs heat, hold-up options. Goal: keep the ANA domain clean during switching. | TI SN6505 (isolated supply driver), Murata NXE1/NXE2 (isolated DC-DC), RECOM RxxP/RAC modules, Mean Well IRM modules | Per-domain isolated rails |
| Digital isolators (SPI/UART/I²C) | CMTI, data rate, channel count, default failsafe states, power domain compatibility. Goal: cross only necessary data, not noise loops. | TI ISO7741, TI ISO7842, ADI ADuM141E, ADI ADuM1250 | AFE↔MCU crossings |
| Security root (SE/TPM) | Non-exportable keys, secure boot support, monotonic counter/anti-rollback, tamper evidence hooks. Goal: logs are tamper-evident after power loss. | Microchip ATECC608B, NXP SE050, Infineon OPTIGA TPM SLB 9670, Infineon OPTIGA Trust family | Trusted logs & updates |
| RTC + event storage |
Timestamp stability, brownout behavior, fast writes, endurance, atomic record strategy.
Goal: no missing/duplicated events across resets.
|
Maxim DS3231 (RTC), Microchip MCP79410 (RTC), Infineon FM24CL64B (FRAM), Winbond W25Q64JV (NOR flash) | Tamper-evident event logs |
| MCU/MPU + Ethernet PHY + radio |
Compute headroom, secure boot chain, stable time base, Ethernet reliability, multi-protocol coexistence power.
Goal: switching events never crash comms.
|
STM32U5 / STM32H7 (MCU), NXP i.MX RT1062 (MCU/MPU class), TI DP83825 / Microchip KSZ8081 (Ethernet PHY), Nordic nRF5340 / TI CC2652R7 / ESP32-C6 (radio SoCs) | Local gateway hardware |
Note: Some items above are families (e.g., isolated DC-DC modules, smart switches). Use the “key metrics” column to narrow exact variants by voltage/current/isolation class.
Selection reasons (avoid “part-number dumping”)
Metering chain (AFE / isolated ADC / shunt monitor)
- Why: branch-level energy control depends on phase/sign truth and low-load stability, not just “RMS looks plausible.”
- Hard constraint: the phase error budget must be defensible under temperature and across branches.
- Evidence hook: store pf, sign, noise_floor per branch and correlate with rail_status.
- Minimum proof: known resistive load test → PF ≈ 1 and consistent sign across mapped branches.
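The minimum proof above reduces to a few sums over synchronized samples. A minimal Python sketch, assuming each branch provides simultaneously sampled voltage and current arrays, already scaled to volts and amps and covering an integer number of line cycles; `power_metrics`, `resistive_load_check`, and the 0.98 PF threshold are illustrative choices, not part of any metering IC's API.

```python
import math

def power_metrics(v, i):
    """Active power, RMS values and power factor from synchronized V/I samples.

    Assumes v and i are sampled simultaneously, scaled to volts/amps, and
    cover an integer number of line cycles.
    """
    n = len(v)
    p = sum(vk * ik for vk, ik in zip(v, i)) / n   # active power (W)
    v_rms = math.sqrt(sum(vk * vk for vk in v) / n)
    i_rms = math.sqrt(sum(ik * ik for ik in i) / n)
    s = v_rms * i_rms                               # apparent power (VA)
    pf = p / s if s > 0 else 0.0                    # signed power factor
    return p, v_rms, i_rms, pf

def resistive_load_check(branches, pf_min=0.98):
    """Minimum proof: under a known resistive load every mapped branch should
    read positive power with PF close to +1. Returns {branch: pf} for failures."""
    failures = {}
    for name, (v, i) in branches.items():
        p, _, _, pf = power_metrics(v, i)
        if p <= 0 or pf < pf_min:
            failures[name] = round(pf, 3)
    return failures
```

Metering ICs typically compute these values internally; a host-side recomputation like this is mainly useful as an independent cross-check during bring-up, and the per-branch `pf` it returns is what the evidence hook stores.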
Actuation chain (relay/SSR/high-side/eFuse)
- Why: the panel must prove “opened/closed,” especially under inrush and aging.
- Hard constraint: inrush waveform type (motor vs capacitive input) dominates driver sizing and trip strategy.
- Evidence hook: log commanded_state + sensed_state + trip_reason.
- Minimum proof: switching timestamp aligns with sensed feedback, no unexplained chatter under controlled inrush.
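One way to turn “commanded_state + sensed_state” into a pass/fail check is to pair every command with the first matching sense-back transition inside a time window and count the leftover sensed transitions as chatter. A minimal sketch, assuming both streams share one time base; the `Event` type, `check_sense_back`, and the 50 ms window are hypothetical and should be set from the relay or contactor's rated operate/release time.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t_ms: int      # timestamp on the shared time base
    closed: bool   # True = closed/on, False = open/off

def check_sense_back(commands, sensed, max_delay_ms=50):
    """Pair each command with the first unused sensed transition to the same
    state within max_delay_ms. Returns (unconfirmed_commands, chatter_count)."""
    used = set()
    unconfirmed = []
    for cmd in commands:
        match = next(
            (idx for idx, s in enumerate(sensed)
             if idx not in used
             and s.closed == cmd.closed
             and 0 <= s.t_ms - cmd.t_ms <= max_delay_ms),
            None,
        )
        if match is None:
            unconfirmed.append(cmd)  # no confirmation: welded, stuck, or broken feedback
        else:
            used.add(match)
    chatter = len(sensed) - len(used)  # sensed transitions no command explains
    return unconfirmed, chatter
```

A passing run returns `([], 0)`; a nonzero chatter count under controlled inrush is the “unexplained chatter” the proof is meant to catch.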
Isolation + security + logs
- Why: isolation domains prevent noise loops; security root makes logs and updates tamper-evident.
- Hard constraint: brownouts must not produce log rollbacks or silent gaps.
- Evidence hook: monotonic counter continuity + power-fail commit ordering.
- Minimum proof: forced power-cut test → no counter decrease and no missing critical events.
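The forced power-cut proof can be scripted against the log read back after each cut. A minimal sketch, assuming the hypothetical convention that the monotonic counter advances by exactly one per committed record; `check_counter_continuity` and the issue labels are illustrative.

```python
def check_counter_continuity(counters):
    """Scan monotonic counter values in log order and report rollbacks,
    duplicates, and gaps between consecutive records."""
    issues = []
    for prev, cur in zip(counters, counters[1:]):
        if cur < prev:
            issues.append(("rollback", prev, cur))   # counter decreased: fails the proof
        elif cur == prev:
            issues.append(("duplicate", prev, cur))  # same record committed twice
        elif cur != prev + 1:
            issues.append(("gap", prev, cur))        # records lost between commits
    return issues
```

A clean run returns an empty list. Repeating the cut at different points in the write cycle makes it more likely to land inside the commit window.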
Common BOM routes (choose by constraints)
| Route | Why it wins | Validation focus |
|---|---|---|
| CT + relay | Natural isolation for sensing; low conduction loss; clear physical open state with auxiliary contacts (sense-back friendly). | CT polarity/phase mapping; relay life/arcing under inrush; switching transient → comms rail stability. |
| Shunt + isolated ADC + eFuse (DC) | High accuracy at low load; reason-coded faults; fast protection with telemetry; best for protected DC auxiliaries. | Isolation domain correctness; rail truth correlation; trip reason alignment with waveform peaks. |
| CT + SSR | Silent, high-cycle switching; flexible control; better for frequent automation where mechanical wear is unacceptable. | Thermal hotspot + leakage behavior; “OFF residual” expectations; dv/dt immunity during switching. |
| Mixed (relay for heavy loads + eFuse for DC) | Optimizes heat, wear, and diagnostics across different branch types; reduces worst-case constraints per technology. | Consistent logging schema across device types; sense-back uniformity; domain return-path separation. |
Cite this figure: IC / BOM Selection Map (F11) — use as the procurement-to-validation cross-reference.
H2-12 — FAQs (evidence-based, no scope creep)
Each answer is engineered to “snap back” to this page’s evidence chain (metering, switching, DER interfaces, local security, power/isolation, and validation). No cloud/rule engines, no protocol tutorials, no appliance controls.
- Allowed: CT/shunt/Rogowski evidence, phase/sign, low-load noise, inrush/thermal, sense-back, event logs, timestamps/monotonic counter, rails/reset reasons, isolation domains/return paths, PV/ESS signal freshness.
- Banned: HVAC/fridge/washer control logic, cloud automation/rule engines, protocol-stack walkthroughs, inverter/UPS topology deep dives, compliance procedures.
One branch shows negative power often—CT polarity or phase mapping?
Start with two proofs: (1) a known resistive load on that branch while capturing synchronized V/I samples, and (2) the phase/branch mapping table used by firmware. If reversing the CT on the same conductor restores positive power with PF ≈ 1, polarity is wrong; if the sign error follows the phase labels across branches, the mapping is wrong. First fix: correct the CT direction or the mapping table, then recalibrate.
Mapped: H2-3 / H2-4 / H2-10
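With a purely resistive load, the measured power factor itself helps separate the two causes. A minimal sketch, assuming a three-phase supply with 120° spacing: a current paired with the wrong phase voltage sits 120° away, so PF lands near -0.5 (or +0.5 if the CT is also reversed), while a reversed CT gives PF near -1. In a split-phase panel the two legs are 180° apart, so pairing with the wrong leg also reads near -1 and cannot be told apart from reversed polarity by PF alone. `classify_resistive_reading` and the ±0.1 tolerance are hypothetical.

```python
def classify_resistive_reading(pf, tol=0.1):
    """Rough triage of one branch's PF under a known resistive load."""
    if abs(pf - 1.0) <= tol:
        return "ok"                        # polarity and mapping consistent
    if abs(pf + 1.0) <= tol:
        return "polarity_or_opposite_leg"  # 180 degrees: reversed CT (or wrong split-phase leg)
    if abs(abs(pf) - 0.5) <= tol:
        return "phase_mapping_120deg"      # cos(120 deg) = -0.5: paired with another phase
    return "unexplained"                   # load not resistive, or another fault
```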
Low-load (<5 W) reading jumps—noise floor, bandwidth, or sampling sync?
Measure (1) analog-rail noise at the metering front end and (2) AFE output noise (or raw ADC codes) during a steady low load. If the noise rises only when radios or switching events occur, it is injection/return-path coupling. If it persists in quiet mode, the limit is bandwidth, quantization, or anti-aliasing. First fix: tighten anti-alias filtering, increase averaging at low power, and isolate analog returns.
Mapped: H2-4 / H2-9 / H2-10
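The quiet-versus-active comparison can be reduced to two standard deviations of raw ADC codes captured at the same steady low load. A minimal sketch; `classify_low_load_noise`, the `quiet_limit` budget, and the 2× ratio are hypothetical parameters to be set from the metering error budget.

```python
import statistics

def classify_low_load_noise(quiet_codes, active_codes, quiet_limit, ratio=2.0):
    """Compare ADC-code spread with radios/switching idle (quiet) and active.

    Both findings can apply at once; an empty list means neither limit is hit.
    """
    quiet = statistics.pstdev(quiet_codes)
    active = statistics.pstdev(active_codes)
    findings = []
    if quiet > quiet_limit:
        findings.append("bandwidth/quantization/anti-aliasing")  # noisy even when quiet
    if active > ratio * quiet:
        findings.append("injection/return-path coupling")        # noise tracks activity
    return findings
```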
It trips immediately on closing—check inrush first or driver timing?
Capture (1) the inrush current waveform at the moment of closing and (2) the logged trip_reason with timestamp alignment. A short, high peak consistent with capacitive input or motor start indicates inrush dominance; a trip that precedes current rise suggests timing/window errors. First fix: add pre-charge/soft-start or delay the protection window and use reason-coded thresholds tuned to waveform class.
Mapped: H2-5 / H2-10
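Once the current capture and the logged trip_reason share a time base, the ordering test is mechanical. A minimal sketch, assuming both are referenced to the same clock; `classify_close_trip`, `rise_threshold_a`, and `trip_limit_a` are hypothetical names.

```python
def classify_close_trip(t_ms, i_amps, trip_t_ms, rise_threshold_a, trip_limit_a):
    """Classify a trip at closing from the captured current waveform.

    t_ms, i_amps: samples captured from the closing edge.
    trip_t_ms: logged trip timestamp on the same time base.
    """
    # First sample where current clearly starts to rise.
    rise_t = next((t for t, i in zip(t_ms, i_amps) if abs(i) >= rise_threshold_a), None)
    if rise_t is None or trip_t_ms < rise_t:
        return "timing_window"  # protection fired before any real current flowed
    # Peak current seen up to the trip instant.
    peak = max(abs(i) for t, i in zip(t_ms, i_amps) if t <= trip_t_ms)
    if peak >= trip_limit_a:
        return "inrush"         # a real peak crossed the limit
    return "other"              # trip after rise but below limit: check thresholds
```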
SSR temperature rises at “small” current—leakage loss or thermal path?
Check (1) on-state voltage drop (or conduction loss estimate) and (2) case temperature rise over time under the same RMS current. If temperature tracks voltage drop, conduction/heat sinking is the limiter; if heating appears even near OFF state, investigate leakage/residual conduction and dv/dt stress. First fix: improve thermal path, select lower-loss SSR/MOSFET stage, and validate snubber requirements for the specific load class.
Mapped: H2-5 / H2-10
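A quick conduction-loss estimate shows why a few amps can already heat an SSR. A minimal sketch using the common two-parameter on-state model for triac/thyristor outputs, P = V_T0·I_avg + r_T·I_rms², with I_avg = (2√2/π)·I_rms for a sinusoidal current; a MOSFET-output SSR dissipates I_rms²·R_on instead. Function names are hypothetical and the numbers in the usage note are placeholders, not datasheet figures.

```python
import math

def triac_ssr_loss_w(i_rms, v_t0, r_t):
    """On-state loss of a triac/thyristor-output SSR with sinusoidal current.

    v_t0: threshold voltage (V), r_t: slope resistance (ohm), from the datasheet.
    """
    i_avg = 2 * math.sqrt(2) / math.pi * i_rms  # full-wave average of |i|
    return v_t0 * i_avg + r_t * i_rms ** 2

def mosfet_ssr_loss_w(i_rms, r_on):
    """On-state loss of a MOSFET-output SSR: purely resistive conduction."""
    return i_rms ** 2 * r_on

def temp_rise_c(loss_w, r_th_c_per_w):
    """Steady-state temperature rise through a thermal resistance (°C/W)."""
    return loss_w * r_th_c_per_w
```

With placeholder values V_T0 = 0.9 V and r_T = 20 mΩ, `triac_ssr_loss_w(5, 0.9, 0.02)` comes to about 4.55 W, roughly 45 °C of rise through a 10 °C/W path. If measured case temperature tracks this estimate, the thermal path is the limiter, as the answer above suggests.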
Wi-Fi/Thread drops only when switching branches—what two waveforms first?
Measure (1) the comms-domain rail (e.g., 3V3 at the radio/PHY) and (2) a ground reference delta (ground bounce) during the exact switching edge. If link drop aligns with rail dip or bounce and reset_reason changes, the cause is power/return-path coupling, not the stack. First fix: add hold-up on comms rail, separate driver return paths from comms/AFE ground, and tighten domain isolation.
Mapped: H2-9 / H2-10
Event records occasionally missing—power loss or storage commit strategy?
Inspect (1) monotonic_counter continuity and (2) power-fail markers (rail_status, brownout/reset logs). If gaps correlate with brownouts, the commit window is too short; if counters continue but records vanish, the storage atomicity/endurance path is wrong. First fix: add supervisor + hold-up, use FRAM or two-phase commit, and log a commit-id so duplicates/holes are detectable.
Mapped: H2-6 / H2-8 / H2-10
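A CRC-framed record that carries a commit-id is one common way to make a torn write (power cut mid-record) detectable instead of silently accepted; it is not a full two-phase commit. A minimal sketch; the record layout (commit_id u32, payload length u16, payload, CRC-32) and the function names are hypothetical.

```python
import struct
import zlib

# Hypothetical layout: commit_id (u32) | payload_len (u16) | payload | crc32 (u32)
HDR = struct.Struct("<IH")

def encode_record(commit_id, payload):
    body = HDR.pack(commit_id, len(payload)) + payload
    return body + struct.pack("<I", zlib.crc32(body))

def scan_log(blob):
    """Walk a log region; return (valid commit_ids, offset where parsing stopped).

    Parsing stops at the first truncated or CRC-failing record, so a torn
    final write shows up as a stop offset short of the region's end."""
    ids, off = [], 0
    while off + HDR.size + 4 <= len(blob):
        commit_id, n = HDR.unpack_from(blob, off)
        end = off + HDR.size + n
        if end + 4 > len(blob):
            break                                   # truncated record
        (crc,) = struct.unpack_from("<I", blob, end)
        if crc != zlib.crc32(blob[off:end]):
            break                                   # torn or corrupted record
        ids.append(commit_id)
        off = end + 4
    return ids, off
```

Gaps in the returned commit-ids, or a stop offset that lines up with a logged brownout, separate the “commit window too short” case from a storage atomicity problem.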
PV/ESS status feels delayed—is interface refresh slow or local queue blocked?
Compare (1) source timestamps from PV/ESS signals and (2) local processing latency (queue depth or cycle time counters). If delay spikes during heavy switching or logging bursts, local scheduling/queue contention is dominant; if delay is constant, the refresh source is slow. First fix: prioritize interface tasks, decouple logging (buffer + batch commit), and keep “last-good” state with freshness flags.
Mapped: H2-7 / H2-10
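The “last-good state with freshness flags” fix can be as small as the sketch below. It assumes the PV/ESS source stamps each sample and that the local clock is aligned to that stamp (if not, local receive time is the fallback); `DerSample`, `LastGood`, and `max_age_ms` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DerSample:
    value_w: float     # e.g. PV output or ESS charge/discharge power
    source_ts_ms: int  # timestamp applied by the PV/ESS source

class LastGood:
    """Keep the last valid PV/ESS sample and report whether it is still fresh."""

    def __init__(self, max_age_ms: int):
        self.max_age_ms = max_age_ms
        self.sample: Optional[DerSample] = None

    def update(self, sample: DerSample) -> None:
        # Drop repeated or out-of-order samples so a late queue flush
        # cannot overwrite newer data.
        if self.sample is None or sample.source_ts_ms > self.sample.source_ts_ms:
            self.sample = sample

    def read(self, now_ms: int) -> Tuple[Optional[float], bool]:
        if self.sample is None:
            return None, False
        fresh = now_ms - self.sample.source_ts_ms <= self.max_age_ms
        return self.sample.value_w, fresh
```

Logging `now_ms - source_ts_ms` at each read also yields the latency series the answer compares: spikes during switching bursts point at local contention, a constant offset at the source refresh rate.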
After cover open, metering drifts—sensor shift or tamper-triggered derating?
Correlate (1) tamper_flag timestamp and mode changes with (2) metering offset/gain drift markers. If drift begins exactly at tamper detection and a protection mode toggles, the system is derating by design; if drift follows mechanical disturbance without mode change, sensor placement is suspect. First fix: tighten mechanical retention, tune tamper thresholds/service mode, and re-run a quick offset/phase validation.
Mapped: H2-8 / H2-10
Some circuits read too small—CT saturation or external magnetic interference?
Capture (1) the current waveform under high load and (2) readings after repositioning the CT relative to busbars/neighbor conductors. Saturation shows flattened peaks, phase distortion, and slow recovery at high current; external interference shows strong location/orientation sensitivity even at moderate current. First fix: increase CT headroom (core size/burden strategy), enforce installation spacing/orientation, and validate linearity at the highest expected branch current.
Mapped: H2-3 / H2-10
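The linearity validation can be scripted against a reference meter or calibrated clamp. A minimal sketch, assuming test points are recorded as (reference, measured) current pairs sorted by reference current; it takes the lowest point as the gain baseline and flags points where the gain sags, which is how a CT running out of headroom would show up. `linearity_check` and the 2 % tolerance are hypothetical. Position sensitivity (the interference case) still needs the repeat-after-repositioning comparison described above.

```python
def linearity_check(points, tol=0.02):
    """points: (reference_a, measured_a) pairs sorted by reference current.

    Returns the reference currents whose gain fell more than tol (fractional)
    below the gain at the lowest test point."""
    base = points[0][1] / points[0][0]    # baseline gain at the lowest current
    flagged = []
    for ref, meas in points[1:]:
        if meas / ref < base * (1 - tol):
            flagged.append(ref)           # reading sags at high current
    return flagged
```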
Same home type, different install: EMI passes/fails wildly—check which return path first?
Measure (1) common-mode/ground-bounce indicators during switching edges and (2) the coupling between driver return and analog/comms reference. If the failure follows protective-earth bonding and cable routing changes, return-path partitioning is the driver. If it follows only one domain, isolation boundaries are likely breached. First fix: enforce domain return separation, keep high-current loops tight, and validate rail stability at the comms module during worst-case switching.
Mapped: H2-9 / H2-10
After power restore, data repeats/out-of-order—RTC, counter, or commit is most suspicious?
Compare (1) RTC continuity (time jump size) and (2) monotonic_counter continuity plus commit-id behavior. If the counter resets or rolls back, anti-rollback/storage chain is failing; if the counter is monotonic but timestamps jump, the time base/backup supply is failing. First fix: back up RTC properly, anchor ordering on monotonic counters, and make log ingestion idempotent using commit-id.
Mapped: H2-8 / H2-10
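Making ingestion idempotent and ordering on the counter can be sketched in a few lines. A minimal sketch, assuming each record is a dict with hypothetical keys `commit_id`, `counter` (monotonic), and `rtc_ts`; the `ingest` function is illustrative.

```python
def ingest(existing, incoming):
    """Idempotent merge of log records.

    Re-delivered records are dropped by commit_id; order comes from the
    monotonic counter, so an RTC jump after power restore cannot reorder them.
    """
    merged = {r["commit_id"]: r for r in existing}
    for r in incoming:
        merged.setdefault(r["commit_id"], r)  # first copy wins; repeats are ignored
    return sorted(merged.values(), key=lambda r: r["counter"])
```

Because re-ingesting the same batch leaves the result unchanged, replays after a power restore produce neither duplicates nor reordering, even when `rtc_ts` has jumped backwards.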
UI says “opened” but the circuit is still live—how to confirm with sense-back evidence?
Validate (1) commanded_state vs sensed_state (aux contact/current/voltage presence) and (2) residual voltage at the load terminals. If sense-back indicates closed, the contacts may be welded or the driver stuck; if sense-back indicates open but voltage remains, suspect backfeed, a neutral miswire, or shared return paths. First fix: add voltage presence sensing, enforce interlocks, and log both command and sense-back with timestamps.
Mapped: H2-5 / H2-6 / H2-10
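The three evidence items combine into a small decision table. A minimal sketch; `diagnose_open` and the 30 V “live” threshold are hypothetical illustration values, not a safety limit, and an SSR's off-state leakage can also leave voltage across a high-impedance load.

```python
def diagnose_open(commanded_open, sensed_open, load_voltage_v, live_threshold_v=30.0):
    """Triage an 'opened but still live' report from commanded state,
    sense-back (aux contact/current), and residual load-terminal voltage."""
    if not commanded_open:
        return "not commanded open"
    if not sensed_open:
        return "contact welded or driver stuck"  # sense-back still reports closed
    if load_voltage_v >= live_threshold_v:
        return "backfeed, neutral miswire, or shared return path"
    return "open confirmed"
```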
Cite this figure: FAQ Evidence Chain Map (F12)