Hot Water Recirculation Controller Design & Debug Guide
Hot water recirculation is a local control problem: a pump, a few sensors, and an evidence-based schedule/demand strategy to cut wait time without wasting heat, adding noise, or creating false starts. A robust design proves decisions with two signals first (Treturn trend and pump current), then hardens power/EMC and fault logic so comfort improves without reboots or high daily energy.
H2-1. Definition & System Boundary — What “Hot Water Recirculation” Means (and what it doesn’t)
Hot water recirculation is a closed-loop control approach that reduces “time-to-hot” at the tap by circulating water until a temperature evidence threshold is met. The engineering cost is a three-way trade: heat loss (loop stays warm), pump energy, and noise/lifetime (speed and start/stop stress). This page focuses only on the pump + sensors + controller and excludes water-heater internals and whole-home energy platforms.
In scope (controller-level, measurable)
- Pump control: BLDC/ECM drive, soft start, speed limiting, quiet bands.
- Evidence chain: temperature sensing + optional flow/proxy signals to decide “hot arrived”.
- Local logic: schedule, demand trigger, occupancy input (local), hysteresis, anti-short-cycle.
- Device ruggedness: power rail stability and motor EMI evidence points (scope-first debug).
Out of scope (avoid cross-page overlap)
- Water heater / tankless / boiler internal control and power topology.
- HEMS, cloud dashboards, Matter platform stack and backend orchestration.
- Plumbing installation tutorials, building-code certification walkthroughs.
Figure F1 The boundary diagram below separates “controller-in-scope” blocks from out-of-scope system blocks to keep design decisions and debug evidence focused.
ICNavigator, “Hot Water Recirculation,” Fig. F1 (System Boundary), 2026.
H2-2. Success Metrics & Constraints — Comfort, Energy, Noise, Reliability
A recirculation controller is “good” only when it can be verified with repeatable evidence. The practical definition of success is a balanced outcome: faster time-to-hot with bounded energy, acceptable noise, and predictable reliability under always-on constraints.
Primary metrics (what to measure and what counts as evidence)
| Metric | How to measure | Evidence signal | Typical range (use bands) |
|---|---|---|---|
| Time-to-hot (s) | Start from trigger → until temperature evidence crosses threshold | Return temp curve; valve/trigger timestamp | Application-dependent; define a target band per loop size |
| Loop stability | Hold threshold with hysteresis and minimal cycling | Temp ripple; duty cycle; cycle count | Stable within a bounded ripple band (avoid hunting) |
| Energy (Wh/day) | Integrate pump power over day; compare modes | Motor current; run-time counter | Mode-dependent; keep within an explicitly budgeted band |
| Noise (dBA / banded) | Measure at fixed distance; capture worst-speed region | Speed vs noise; current ripple | Define acceptable band per install environment |
| Reliability | Track starts/day, fault retries, lockout rate | Event counters; fault logs | Bound start/stop frequency; minimize false trips |
Tip: if only two measurements are available, prioritize return temperature and pump current. Together they separate “heat arrival” from “pump doing work” and quickly expose false triggers.
Constraints that shape real designs (always-on + harsh coupling)
- 24/7 standby: low quiescent power and deterministic wake-up behavior (no random boot modes).
- Wet/humid environments: connector integrity and sensor drift must be expected, not treated as rare events.
- Large ΔT / thermal inertia: temperature evidence reacts slower than user expectation; control must avoid “hunting.”
- Low-flow uncertainty: direct flow sensing may be noisy at low rates; proxy evidence needs sanity checks.
- Motor EMI coupling: switching edges can induce rail dips and false sensing; debug must start with scope evidence.
Design trade-offs (with actionable tuning knobs)
- More frequent circulation vs higher heat loss: tune threshold + hysteresis + minimum-off time.
- Higher speed vs noise/lifetime: tune soft-start slope + speed caps + avoid resonance bands.
- More sensitive triggering vs false starts: fuse evidence (temp + flow/proxy) and raise confidence gates.
Figure F2 The metric map below shows the four success pillars and the minimal evidence signals that connect them to measurable debug points (used later in validation).
ICNavigator, “Hot Water Recirculation,” Fig. F2 (Success Metrics Map), 2026.
H2-3. Control Modes — Timer, Demand, Learning, and Hybrid (without platform creep)
A hot water recirculation controller succeeds when it turns “comfort” into repeatable local decisions: when to start, how long to run, and when to stop—all driven by evidence signals available on the device. The safest approach is to keep one shared state machine and allow different trigger sources (timer/demand/learning) to enter the same controlled sequence with identical anti-chatter rules.
Core stability rules (anti “flip-flop”): define Thigh/Tlow (temperature hysteresis), Ton_min/Toff_min (minimum on/off time), a preheat timeout, and a retry budget → lockout path. These four knobs prevent the most common failures: hunting on noisy temperature, short-cycling, and endless running after sensor faults.
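The hysteresis and minimum on/off rules can be sketched as a small guard object. This is a minimal illustration, not the guide's reference implementation: the class name `HysteresisGuard`, its field names, and the convention of passing a monotonic `now_s` timestamp are all assumptions. The preheat timeout and retry budget are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class HysteresisGuard:
    """Start/stop decision with Thigh/Tlow and Ton_min/Toff_min (names assumed)."""
    t_high_c: float    # stop threshold (Thigh)
    t_low_c: float     # start threshold (Tlow)
    ton_min_s: float   # minimum run time before a stop is allowed
    toff_min_s: float  # minimum rest time before a start is allowed
    pump_on: bool = False
    last_change_s: float = float("-inf")

    def update(self, now_s: float, t_return_c: float) -> bool:
        """Return the pump command for this sample."""
        elapsed = now_s - self.last_change_s
        if self.pump_on:
            # Stop only when the loop is hot AND the minimum on-time has passed.
            if t_return_c >= self.t_high_c and elapsed >= self.ton_min_s:
                self.pump_on = False
                self.last_change_s = now_s
        else:
            # Start only when the loop is cold AND the minimum off-time has passed.
            if t_return_c <= self.t_low_c and elapsed >= self.toff_min_s:
                self.pump_on = True
                self.last_change_s = now_s
        return self.pump_on
```

Because every trigger source calls the same `update`, a timer slot and a demand event cannot bypass each other's anti-chatter timing.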
Timer (Fixed windows)
- Best for: predictable daily usage (morning/evening) with minimal sensors.
- Key tuning: quiet hours, speed cap at night, Toff_min to avoid frequent starts.
- Typical pitfall: runs when nobody is home → energy loss; solved by adding a local “occupancy gate.”
Demand (Event-driven)
- Best for: reducing wasted runtime while keeping responsiveness.
- Key tuning: trigger debouncing, confidence gates (temp trend + pump work evidence).
- Typical pitfall: low-flow or noisy triggers cause false starts; solved by fusing temp rise rate and pump current.
Learning (Local pattern)
- Best for: homes with repeating patterns but irregular exact times.
- Key tuning: decay/forgetting factor, minimum confidence before preheat, maximum daily energy budget.
- Typical pitfall: pattern drift (weekends/season) → short cycling; solved by confidence gating and Toff_min.
Hybrid (Preheat + Demand)
- Best for: high comfort at peak hours, minimal waste during daytime.
- Key tuning: arbitration (which trigger wins), cooldown window after a run, speed caps by time-of-day.
- Typical pitfall: trigger conflicts → repeated starts; solved by a single shared state machine + Toff_min.
Minimal evidence set: a controller can remain robust with only return temperature and pump current. Return temperature defines “heat arrived,” while pump current proves “the pump is doing work” and helps detect dry-run/stall early.
Figure F2 The state machine below keeps all modes safe by sharing the same start/stop rules and anti-chatter guards.
ICNavigator, “Hot Water Recirculation,” Fig. F2 (Control State Machine), 2026.
H2-4. Temperature Sensing Chain — Where to Measure, How to Filter, How to Calibrate
Temperature is the primary “heat arrival” evidence, but it is also the most common source of false decisions when noise, thermal inertia, and motor switching interference are ignored. A robust chain needs three layers: correct measurement point, noise-aware sampling/filtering, and calibration + drift checks that keep thresholds meaningful over time.
Where to measure (control impact, not plumbing tutorial)
- Return-side evidence is usually the most reliable stop condition because it reflects loop closure.
- Pump-adjacent sensors may see local heating or coupling; require sanity checks (trend + pump work).
- Single-point designs must use a second evidence signal (pump current or runtime) to avoid “fake hot.”
Two measurements that solve most ambiguity
- Temperature ADC code: stability, ripple, and step response (hysteresis suitability).
- Pump current waveform: proves work, reveals dry-run/stall, shows EMI-coupled artifacts.
- Decision benefit: separates “real loop warm-up” from “local heat / noise injection.”
Sensor choice matrix (practical selection)
| Option | Strength | Risk / coupling | Calibration burden |
|---|---|---|---|
| NTC | Low cost, simple divider, wide availability | Non-linear; line noise + motor EMI can modulate ADC codes if routing is weak | 1-point often acceptable; 2-point improves threshold accuracy |
| RTD | Better linearity, stable over time, strong repeatability | Needs excitation/current source; wiring resistance matters; requires clean analog ground | 1-point may work; 2-point preferred when tight threshold bands are required |
Noise-aware sampling rules: sample away from motor switching edges when possible; use a short analog RC to reduce high-frequency injection; apply robust digital filtering (median/trimmed mean) and limit physically impossible temperature slopes to catch coupling artifacts.
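A minimal sketch of the digital part of these rules, assuming a burst of temperature samples already converted to °C. The function name `filtered_temperature`, its parameters, and the choice to clamp (rather than discard) an implausible step are illustrative assumptions.

```python
from statistics import median


def filtered_temperature(burst_c, prev_c, dt_s, max_slope_c_per_s):
    """Median-filter one sample burst, then limit the slope against the
    previous accepted value. Returns (value_c, suspect)."""
    value = median(burst_c)  # rejects isolated spikes inside the burst
    if prev_c is None or dt_s <= 0:
        return value, False  # nothing to compare against yet
    max_step = max_slope_c_per_s * dt_s
    step = value - prev_c
    if abs(step) > max_step:
        # Physically implausible jump: clamp it and flag it as a possible
        # coupling artifact for the caller to count or log.
        clamped = prev_c + (max_step if step > 0 else -max_step)
        return clamped, True
    return value, False
```

The `suspect` flag is worth counting over time: a rising count while the pump runs points at coupling from the power stage rather than at the water.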
Calibration & drift checks (keep thresholds meaningful)
- Boot self-check: detect open/short sensor conditions and reject out-of-range codes before enabling control.
- 1-point vs 2-point: use 2-point if tight hysteresis bands are required or if sensor spread is large.
- Drift monitor: track long-term offset trends; if bias grows, widen confidence gates or request service.
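The difference between 1-point and 2-point calibration can be illustrated with a linear correction applied to an uncalibrated reading. This is a sketch under assumptions: the helper names are hypothetical, the raw value is taken to be already linearized to °C, and the reference temperatures come from an external reference instrument.

```python
def one_point_cal(raw_c, ref_c):
    """Offset-only correction from a single reference point."""
    return 1.0, ref_c - raw_c  # (gain, offset)


def two_point_cal(raw_lo_c, ref_lo_c, raw_hi_c, ref_hi_c):
    """Gain and offset from two reference points."""
    if raw_hi_c == raw_lo_c:
        raise ValueError("calibration points must differ")
    gain = (ref_hi_c - ref_lo_c) / (raw_hi_c - raw_lo_c)
    offset = ref_lo_c - gain * raw_lo_c
    return gain, offset


def apply_cal(raw_c, gain, offset):
    """Apply the stored correction to a new reading."""
    return gain * raw_c + offset


def code_in_range(adc_code, code_min, code_max):
    """Boot self-check: codes pinned near either rail suggest an open or
    shorted sensor (the limits are assumptions set per front-end)."""
    return code_min <= adc_code <= code_max
```

A 1-point correction removes a constant bias only. If the error at the high threshold differs from the error at the low threshold, the 2-point form is the one that keeps both ends of the hysteresis band accurate.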
Figure F3 The front-end below shows the temperature evidence chain and the most common noise injection paths from the motor power stage.
ICNavigator, “Hot Water Recirculation,” Fig. F3 (Temperature Evidence Front-End), 2026.
H2-5. Flow/Presence Evidence — Flow Sensing Options & Proxy Signals
A recirculation controller does not need a dedicated flow meter to function, but it does need a reliable way to answer two questions: (1) Is the pump actually moving water? and (2) Is there a real hot-water intent right now? Direct flow sensing provides the strongest “loop is moving” evidence, while proxy signals can be made robust by using confidence gates that fuse pump-work evidence with temperature dynamics.
Direct flow sensing (strongest loop-closure evidence)
- Hall turbine: low cost; sensitivity can degrade at very low flow; affected by debris and orientation.
- Ultrasonic: wide dynamic range; higher BOM and power; careful mounting needed for repeatability.
- dP inference: uses pressure drop across a restriction; low-flow resolution can be limited; needs stable reference.
Proxy evidence (when flow meter is unavailable)
- Pump current / BEMF proxy: proves the motor is loaded; helps detect dry-run or stall patterns.
- Temperature slope (dT/dt): confirms heat is propagating through the return path, not just local heating.
- Optional acoustic/vibration: may help detect cavitation/no-flow, but must be treated as low-confidence.
Direct vs Proxy — selection matrix (cost, power, robustness, failure modes)
| Method | Cost / Power | Strength | Typical misread sources | Low-flow note |
|---|---|---|---|---|
| Hall turbine | Low / Low | Direct loop motion proof | Debris, orientation, magnetic noise | Resolution drops near threshold |
| Ultrasonic | Medium–High / Medium | Wide range, good repeatability | Mounting variance, bubbles, coupling | Usually better than turbine |
| dP inference | Low–Medium / Low | Works with stable restriction | Baseline drift, valve position changes | Weak at very low dP |
| Pump current proxy | Low / Low | Detects stall/dry-run patterns | Supply ripple, PWM artifacts | Does not measure flow directly |
| dT/dt (temp slope) | Low / Low | Confirms heat propagation | Thermal inertia, sensor placement | Slow response at low ΔT |
| Acoustic/vibration | Medium / Low–Medium | Can indicate cavitation/no-flow | Mounting noise, environmental vibration | Treat as optional/low confidence |
Confidence gate (proxy-only builds): require pump-work evidence (current above a minimum for Ton_min) and a return-temperature trend (dT/dt above a minimum within a window). Add a physical sanity limit on dT/dt to reject EMI-induced “fake heat.” If gates fail repeatedly, enforce Toff_min and escalate to lockout to prevent short-cycling.
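The gate described above can be sketched as a pure function over equally spaced samples collected since pump start. The function name, the parameter names, and the simple end-to-end slope estimate are assumptions chosen for illustration; a real build might fit the slope over a sliding window instead.

```python
import math


def confidence_gate(currents_a, temps_c, period_s,
                    i_min_a, ton_min_s, slope_min_c_s, slope_max_c_s):
    """Return True when pump-work evidence and return-temperature trend agree."""
    n_needed = math.ceil(ton_min_s / period_s)
    if n_needed < 1 or len(currents_a) < n_needed or len(temps_c) < 2:
        return False  # not enough evidence yet
    # Pump-work evidence: current stayed above the minimum for Ton_min.
    pump_working = all(i >= i_min_a for i in currents_a[-n_needed:])
    # Trend evidence: average slope across the temperature window.
    window_s = (len(temps_c) - 1) * period_s
    slope = (temps_c[-1] - temps_c[0]) / window_s
    # Upper bound is the physical sanity limit that rejects "fake heat".
    heat_arriving = slope_min_c_s <= slope <= slope_max_c_s
    return pump_working and heat_arriving
```

Repeated `False` results are the input to the Toff_min and lockout escalation, so the caller should count them rather than retry immediately.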
Presence / hot-water intent inputs (local signals only)
- Button: strongest intent signal; simple debounce + minimum cooldown prevents repeated triggers.
- PIR: best used as an enable gate (allow preheat when motion is recent), not as a continuous demand source.
- Door contact: can bias intent near bathrooms/kitchens; treat as low-confidence and time-limited.
These inputs should only decide whether to enter the start sequence. They must not replace temperature stop rules or become a platform orchestration path.
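For the button input, debounce plus minimum cooldown might look like the sketch below. The class name `ButtonIntent` and its timing parameters are hypothetical. The object only reports a start request; stop decisions stay with the temperature rules.

```python
class ButtonIntent:
    """Emit one start request per stable press, no more often than cooldown_s."""

    def __init__(self, debounce_s, cooldown_s):
        self.debounce_s = debounce_s
        self.cooldown_s = cooldown_s
        self._pressed_since = None            # time the current press began
        self._fired = False                   # this press already produced a request
        self._last_trigger_s = float("-inf")  # time of the last accepted request

    def update(self, now_s, pressed):
        """Call on every poll; returns True exactly when a request is accepted."""
        if not pressed:
            self._pressed_since = None
            self._fired = False
            return False
        if self._pressed_since is None:
            self._pressed_since = now_s
        stable = now_s - self._pressed_since >= self.debounce_s
        cooled = now_s - self._last_trigger_s >= self.cooldown_s
        if stable and cooled and not self._fired:
            self._fired = True
            self._last_trigger_s = now_s
            return True
        return False
```

A PIR or door contact would not use this path directly; per the list above they act as time-limited enable gates in front of it.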
Figure F4 Evidence fusion: direct flow (optional) + proxy signals + intent inputs, combined through confidence gates to avoid false starts.
ICNavigator, “Hot Water Recirculation,” Fig. F4 (Flow & Presence Evidence Fusion), 2026.
H2-6. BLDC Pump Control — Commutation, Speed Control, and Quiet Operation
BLDC/ECM recirculation pumps behave like small motor systems: stable comfort requires stable torque, and quiet operation requires minimizing torque ripple and control hunting. The most reliable approach is to implement a minimum viable control loop: a speed target from the state machine, a current limit for abnormal loads, fault detection for stall/dry-run, and thermal derating to protect the driver and motor.
Power-stage building blocks (what must exist)
- 3-phase bridge: integrated driver IC or external MOSFETs with a gate driver.
- Current sense: shunt (low-cost) or sensor; used for limiting and fault signatures.
- Temperature sense: driver/board temperature for derating and lockout thresholds.
- Control MCU: commutation + speed loop + protection policy (retry/lockout).
Hall 6-step (sensor-based)
- Fit: pumps with Hall sensors; reliable start under load.
- Risk: Hall noise/glitches can cause speed ripple → acoustic noise; apply debounce and consistency checks.
- Quiet lever: soft-start + speed ramp to avoid resonant bands.
Sensorless BEMF 6-step
- Fit: when sensors are not available; lower BOM/connector count.
- Risk: weak BEMF at low speed or with air in the loop; requires open-loop start then handover to closed-loop.
- Quiet lever: avoid aggressive handover; enforce current limits during start.
FOC (fit-based mention)
- Fit: when low noise and efficiency matter across a wide speed range.
- Risk: needs good current sensing and tuning; under-resourced implementations can be less stable than 6-step.
- Quiet lever: lower torque ripple, but only if sensing and control bandwidth are adequate.
Quiet operation knobs (most common wins)
- PWM frequency: avoid audible bands and structural resonance zones; keep margins for losses and EMC.
- Slew control: reduce dv/dt to limit EMI and acoustic excitation (at the cost of switching loss).
- Soft-start: ramp speed/current to avoid sudden hydraulic noise and torque steps.
- Loop stability: cap speed-loop aggressiveness to avoid hunting near thresholds.
Minimum viable control loop checklist: speed target → current limit → stall/dry-run detection → thermal derating → retry budget + lockout. This sequence is sufficient to keep the pump stable and quiet even when the system only has temperature as the primary comfort evidence.
Fault signatures (evidence-driven)
- Stall: current rises but speed estimate does not; action: limit, retry, then lockout.
- Dry-run / air: current/load pattern changes without the expected return-temperature trend; action: reduce speed, cooldown, retry budget.
- Over-temp: temperature crosses derating threshold; action: speed cap or stop, then cooldown window.
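The three signatures above can be expressed as a small classifier. This is a sketch under assumptions: the `Limits` fields, their meaning as fixed thresholds, and the priority order (over-temp first) are illustrative choices, and the dry-run rule uses low current plus weak return-temperature slope as one possible reading of "load pattern changes without the expected trend".

```python
from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    NONE = "none"
    STALL = "stall"
    DRY_RUN = "dry_run"
    OVER_TEMP = "over_temp"


@dataclass(frozen=True)
class Limits:
    """Hypothetical thresholds; real values come from bench measurements."""
    i_stall_a: float     # current at or above this suggests a blocked rotor
    rpm_min: float       # speed estimate below this counts as "not rotating"
    i_dry_a: float       # current at or below this suggests an unloaded pump
    dtdt_min_c_s: float  # minimum return-temperature slope expected when pumping
    t_derate_c: float    # driver/board derating threshold


def classify(current_a, speed_rpm, dtdt_c_s, board_temp_c, lim):
    """Map one evidence snapshot to a fault signature."""
    if board_temp_c >= lim.t_derate_c:
        return Fault.OVER_TEMP
    if current_a >= lim.i_stall_a and speed_rpm < lim.rpm_min:
        return Fault.STALL      # current rises but speed does not
    if current_a <= lim.i_dry_a and dtdt_c_s < lim.dtdt_min_c_s:
        return Fault.DRY_RUN    # light load and no heat arriving
    return Fault.NONE
```

A single snapshot is only a candidate signature; it should be confirmed over several windows before any action is taken.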
Figure F5 Power-stage block diagram: control MCU, driver/bridge, current and temperature feedback, and the protection policy loop.
ICNavigator, “Hot Water Recirculation,” Fig. F5 (BLDC Power Stage & Control Loop), 2026.
H2-7. Protection & Fault Detection — Dry-Run, Stall, Cavitation, Overheat
Protection in a recirculation controller should be evidence-driven and designed to avoid nuisance trips. The most reliable pattern is to require evidence pairs (two independent signals) within a time window, then apply a tiered action ladder: derate → retry → lockout → local alert.
Nuisance-trip avoidance rules: use a startup grace window for thermal inertia, confirm faults with N-of-M windows (multi-window confirmation), and enforce Ton_min / Toff_min to prevent short-cycling. Single-sample decisions are not robust in the presence of PWM noise and hydraulic transients.
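N-of-M confirmation is straightforward to sketch with a fixed-length window. The class name `NofMConfirm` is an assumption; each `update` call represents one evaluation window, not one raw sample.

```python
from collections import deque


class NofMConfirm:
    """Confirm a fault only when at least n of the last m windows flagged it."""

    def __init__(self, n, m):
        if not 1 <= n <= m:
            raise ValueError("require 1 <= n <= m")
        self.n = n
        self._window = deque(maxlen=m)  # oldest result drops out automatically

    def update(self, flagged):
        """Record one window result and return the confirmed state."""
        self._window.append(bool(flagged))
        return sum(self._window) >= self.n
```

With `n=3, m=5`, one noisy window cannot trip the fault, and a confirmed fault clears on its own once enough clean windows push the old flags out.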
Symptom → Evidence → Action (field-debug ready)
| Symptom (what is observed) | Evidence (use pairs + window) | Action ladder (avoid false stop) |
|---|---|---|
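| Dry-run / no circulation (pump sounds light, heat does not arrive) | Pump current/load pattern changes without the expected return-temperature trend | Reduce speed, cool down, retry within budget |
| Stall / jam (pump stops or hums, no rotation) | Current rises but speed estimate does not | Limit current, retry, then lockout |
| Cavitation / air (raspy noise, unstable flow) | Optional acoustic/vibration signature (low confidence), confirmed by a second signal | Derate first; escalate per the tiered action ladder |
| Overheat (driver/board temperature rises) | Temperature crosses the derating threshold | Speed cap or stop, then cooldown window |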
Tiered action ladder (recommended default)
- Derate first: reduce speed / limit current to keep comfort while observing evidence longer.
- Retry second: cooldown then retry with a slower ramp; track retry budget.
- Lockout last: only for persistent strong signatures (stall, repeated overheat) or retry budget exceeded.
- Local alert: LED/buzzer conveys state without platform dependency.
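One way to encode the ladder is a small object that is called once per confirmed fault. This is a sketch, not a definitive policy: the class and method names are hypothetical, and the immediate lockout path for persistent strong signatures is omitted so that only the derate, retry, and budget-exceeded steps are shown.

```python
class ActionLadder:
    """Derate first, retry within a budget, lock out last."""

    def __init__(self, retry_budget):
        self.retry_budget = retry_budget
        self.derated = False
        self.retries = 0
        self.locked = False

    def on_confirmed_fault(self):
        """Return the action to take for this confirmed fault."""
        if self.locked:
            return "lockout"
        if not self.derated:
            self.derated = True
            return "derate"
        if self.retries < self.retry_budget:
            self.retries += 1
            return "retry"  # caller applies the cooldown and slower ramp
        self.locked = True
        return "lockout"

    def on_clean_run(self):
        """A completed normal run restores the full ladder (unless locked)."""
        if not self.locked:
            self.derated = False
            self.retries = 0

    def clear_lockout(self):
        """Manual clear, for example from a local button."""
        self.locked = False
        self.derated = False
        self.retries = 0
```

The returned string can drive the local LED/buzzer directly, which keeps the alert path independent of any platform.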
Figure F6 Fault branches integrated into the state machine: derate/retry paths and lockout conditions with guard rails.
ICNavigator, “Hot Water Recirculation,” Fig. F6 (Fault Branches into State Machine), 2026.
H2-8. Power Tree & Low-Power Design — Always-On Controller Done Right
A hot water recirculation controller is typically always-on: it must keep time, sample sensors, and wake predictably while surviving input dips and surges. A robust implementation separates power domains (motor supply vs always-on logic vs sensors), uses UVLO/BOR for consistent reset behavior, and budgets energy across sleep, sampling bursts, run-time, and optional radio events.
Board-level scope (what matters here)
- Input: 12/24V DC (or adapter output) with board-level protection and UVLO.
- Domains: motor supply, always-on 3.3V, sensor rails (switchable).
- Consistency: brownout reset and safe restart policy prevent “resume pumping” after a dip.
Always-on power budget template (fill with measured currents)
| State | Typical duration/day | Current range | Energy/day (formula) | Reduction levers |
|---|---|---|---|---|
| Standby sleep | dominant (hours) | µA–low mA | E ≈ I_sleep · V_ao · t | deep sleep + RTC, disable sensor rails |
| Sampling burst | minutes total | mA–tens mA | E ≈ Σ(I_samp · V_ao · Δt) | short window, lower rate, median/trimmed mean |
| Pump running | event-driven | controller mA + motor W | E ≈ V_in · I_motor · t_run | quiet bands, derate, stop on confidence |
| Optional radio event | rare / short | tens–hundreds mA (peaks) | E ≈ Σ(I_radio · V_ao · Δt) | batch updates, short connect window |
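The template formulas can be exercised with placeholder numbers to see which state dominates. Every current, voltage, and duration below is an assumed example value, not a measurement or a recommendation; replace them with bench data.

```python
def energy_wh(voltage_v, current_a, hours):
    """E = V * I * t, in watt-hours."""
    return voltage_v * current_a * hours


V_AO = 3.3   # always-on rail (assumed)
V_IN = 12.0  # motor supply (assumed)

# (voltage, average current in A, hours per day): placeholder values only.
states = {
    "standby_sleep":  (V_AO, 50e-6, 23.0),
    "sampling_burst": (V_AO, 10e-3, 0.2),
    "pump_running":   (V_IN, 0.5,   0.75),
    "radio_event":    (V_AO, 100e-3, 0.05),
}

budget_wh = {name: energy_wh(v, i, t) for name, (v, i, t) in states.items()}
total_wh = sum(budget_wh.values())
dominant = max(budget_wh, key=budget_wh.get)
```

With numbers in this range the pump run-time dominates by orders of magnitude, which is why stopping on confidence usually saves more energy than shaving sleep current.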
Safe restart policy: on UVLO/BOR events, stop the motor drive first, then restore into a safe IDLE/cooldown state. Re-enter PREHEAT only after sensors re-validate temperature and intent evidence. This prevents repeated dips from creating rapid on/off cycles.
Low-power tactics (always-on friendly)
- MCU sleep: RTC wake for schedule slots and periodic sensor sampling.
- Intermittent sensing: power-gate sensor rails; sample in bursts, then compute stable statistics.
- Brownout consistency: BOR thresholds chosen so firmware never runs in an undefined voltage region.
- Write discipline: store configuration and counters with rate limiting to avoid frequent NVM writes.
Figure F7 Power tree with domains: motor supply, always-on 3.3V, and switchable sensor rails, plus UVLO/BOR and safe restart hooks.
ICNavigator, “Hot Water Recirculation,” Fig. F7 (Power Tree & Domains), 2026.
H2-9. EMC & Ruggedness (Device-Level) — Motor EMI, ESD/Surge Evidence Points
When the pump starts, the power stage can inject noise through three dominant paths: dv/dt coupling from the switching node, ground bounce from high-current return loops, and conducted ripple on the input rail. The fastest route to a fix is to measure a small set of probe points and map the symptom to a coupling path.
Device-level boundary: this chapter focuses on board-level evidence points and first hardware fixes that stop resets and RF dropouts. It does not expand into compliance standards or full certification workflows.
Noise paths → what is seen → first fixes
- dv/dt (SW node ringing) → radiated/capacitive coupling: RF dropouts worsen at pump PWM edges; near-field noise rises near antenna/XTAL. First fixes: gate resistor / slew control, snubber, reduce loop area around bridge, keep SW copper tight.
- Ground bounce → reference shift: ADC codes jump, false faults, MCU brownouts correlate with pump current. First fixes: split motor return vs logic ground, single controlled join, relocate sense returns, add local decoupling at MCU/RF.
- Conducted ripple on VIN → rail dips: MCU resets at start/stop events; 3.3V droops track VIN spikes. First fixes: TVS + π filter, common-mode choke (as needed), improve input bulk and placement, tighten UVLO/BOR behavior.
Top 6 probes (start here)
- P1 — VIN at board entry: ripple spikes at pump start/stop; use edge-trigger on pump enable.
- P2 — MCU 3.3V (or 1.8V) at MCU pins: dips / ringing; correlate with BOR/reset cause.
- P3 — Driver supply rail: sag or oscillation during commutation and PWM updates.
- P4 — SW/phase node: dv/dt level and ringing; compare “pump on” vs “pump off”.
- P5 — Current sense + its return: ground bounce contaminating thresholds and fault logic.
- P6 — RF near-field (antenna/XTAL zone): noise rise when pump runs; track with RSSI dropouts.
Minimal rugged hardware stack (board-level)
- Input protection: TVS sized for the input environment; clamp path kept short to return.
- Input filtering: π filter or CM choke where cable-borne noise dominates; place close to entry.
- Switch-edge control: gate resistor and/or snubber to reduce ringing and radiated coupling.
- Domain separation: motor power/return separated from always-on logic; controlled join point.
- Sensitive rails: RF/MCU local LDO + tight decoupling; keep noisy returns away from references.
Figure F8 Device-level coupling map with noise arrows and probe points P1–P6.
ICNavigator, “Hot Water Recirculation,” Fig. F8 (EMI Coupling Map & Probe Points), 2026.
H2-10. Validation Plan — Test Matrix for Comfort, Energy, Noise, and Faults
Validation should prove that the controller behaves correctly across modes, meets comfort/energy/noise targets, survives repeated edge cases (dry-run, stall, cavitation-like conditions), and remains stable under device-level ESD/EFT pre-checks. The matrix below is organized by Test item → Setup → Pass/Fail evidence.
Coverage categories
- Functional: mode switching, hysteresis, fault ladder, brownout recovery.
- Performance: wait time, return temperature stability, energy estimate, acoustic noise.
- Reliability: start-cycle endurance, thermal cycling, moisture stress, repeated fault recovery.
- EMC pre-check: device-level ESD/EFT spot checks with evidence of no resets or safe recovery.
Test Matrix — Test item → Setup → Pass/Fail evidence
| Test item | Setup | Pass/Fail evidence |
|---|---|---|
| Mode switching (timer/demand/hybrid) | Inject local triggers; vary schedule slots; log state transitions. | State machine follows expected path; no short-cycling; transitions timestamped and repeatable. |
| Hysteresis & anti-short-cycle | Sweep Treturn around thresholds; enforce Ton_min/Toff_min settings. | No chatter near thresholds; run/stop decisions stable across multiple sweeps. |
| Brownout recovery | Create controlled VIN dips; monitor reset cause and rails. | Motor stops first; controller returns to IDLE/cooldown; no unintended “resume pumping”. |
| Fault ladder execution (derate → retry → lockout) | Force dry-run/stall signatures; observe action timing and retry budget. | Derate occurs before lockout; retry backoff honored; lockout only after budget/strong signature. |
| Comfort wait-time | Start from cold loop; record time until Treturn reaches target. | Wait-time meets target band; Treturn curve matches expected ramp profile. |
| Return temperature stability | Run maintain mode; vary ambient and draw events; log Treturn. | Treturn stays within band; no oscillation; hysteresis behaves as designed. |
| Energy estimate (Wh/day) | Measure VIN current over day profile; integrate sleep/sample/run periods. | Energy model matches measured integration within acceptable error; dominant states identified. |
| Acoustic noise | Measure at fixed distance; sweep speeds; note “quiet bands”. | Noise stays under threshold in intended operating band; no prominent tonal peaks at target speeds. |
| Start-cycle endurance | Automate repeated start/stop cycles; track failures and temperature. | Failure rate below limit; no progressive drift in current/temperature signatures. |
| Thermal cycling | Cycle ambient/board temperature; repeat functional tests at extremes. | Thresholds and protections remain stable; no unexpected resets; calibration remains valid. |
| Moisture stress (device-level) | Operate in humid environment; observe leakage/false triggers. | No false faults; insulation-related anomalies are logged and handled safely. |
| Repeated fault recovery | Alternate dry-run/stall-like events with normal runs; track lockout triggers. | Recoveries follow ladder; lockout occurs only on persistent strong signatures; manual clear works. |
| ESD spot check (device-level) | Apply ESD to user-accessible points; monitor P2 (MCU rail) and reset cause. | No uncontrolled latch-up; no unsafe motor resume; safe recovery or controlled lockout + log. |
| EFT/burst spot check (device-level) | Inject burst on input cable; observe P1/P2/P4 simultaneously. | Controller stays stable or recovers safely; no repeated reset loop; evidence captured in event log. |
Evidence-first mindset: log state transitions, retry counters, and reset causes. During validation, capture at least VIN ripple (P1), MCU rail (P2), and SW node (P4) together to correlate symptoms with coupling paths.
Figure F9 Test wiring and measurement map: programmable supply, thermal/NTC simulation, motor load, and probe points.
ICNavigator, “Hot Water Recirculation,” Fig. F9 (Validation Setup & Measurement Map), 2026.
H2-9. EMC & Ruggedness (Device-Level) — Motor EMI, ESD/Surge Evidence Points
When the pump starts, the power stage can inject noise through three dominant paths: dv/dt coupling from the switching node, ground bounce from high-current return loops, and conducted ripple on the input rail. The fastest route to a fix is to measure a small set of probe points and map the symptom to a coupling path.
Device-level boundary: this chapter focuses on board-level evidence points and first hardware fixes that stop resets and RF dropouts. It does not expand into compliance standards or full certification workflows.
Noise paths → what is seen → first fixes
- dv/dt (SW node ringing) → radiated/capacitive coupling: RF dropouts worsen at pump PWM edges; near-field noise rises near antenna/XTAL. First fixes: gate resistor / slew control, snubber, reduce loop area around bridge, keep SW copper tight.
- Ground bounce → reference shift: ADC codes jump, false faults, MCU brownouts correlate with pump current. First fixes: split motor return vs logic ground, single controlled join, relocate sense returns, add local decoupling at MCU/RF.
- Conducted ripple on VIN → rail dips: MCU resets at start/stop events; 3.3V droops track VIN spikes. First fixes: TVS + π filter, common-mode choke (as needed), improve input bulk and placement, tighten UVLO/BOR behavior.
Top 6 probes (start here)
- P1 — VIN at board entry: ripple spikes at pump start/stop; use edge-trigger on pump enable.
- P2 — MCU 3.3V (or 1.8V) at MCU pins: dips / ringing; correlate with BOR/reset cause.
- P3 — Driver supply rail: sag or oscillation during commutation and PWM updates.
- P4 — SW/phase node: dv/dt level and ringing; compare “pump on” vs “pump off”.
- P5 — Current sense + its return: ground bounce contaminating thresholds and fault logic.
- P6 — RF near-field (antenna/XTAL zone): noise rise when pump runs; track with RSSI dropouts.
Minimal rugged hardware stack (board-level)
- Input protection: TVS sized for the input environment; clamp path kept short to return.
- Input filtering: π filter or CM choke where cable-borne noise dominates; place close to entry.
- Switch-edge control: gate resistor and/or snubber to reduce ringing and radiated coupling.
- Domain separation: motor power/return separated from always-on logic; controlled join point.
- Sensitive rails: RF/MCU local LDO + tight decoupling; keep noisy returns away from references.
Figure F8 Device-level coupling map with noise arrows and probe points P1–P6.
ICNavigator, “Hot Water Recirculation,” Fig. F8 (EMI Coupling Map & Probe Points), 2026.
H2-10. Validation Plan — Test Matrix for Comfort, Energy, Noise, and Faults
Validation should prove that the controller behaves correctly across modes, meets comfort/energy/noise targets, survives repeated edge cases (dry-run, stall, cavitation-like conditions), and remains stable under device-level ESD/EFT pre-checks. The matrix below is organized by Test item → Setup → Pass/Fail evidence.
Coverage categories
- Functional: mode switching, hysteresis, fault ladder, brownout recovery.
- Performance: wait time, return temperature stability, energy estimate, acoustic noise.
- Reliability: start-cycle endurance, thermal cycling, moisture stress, repeated fault recovery.
- EMC pre-check: device-level ESD/EFT spot checks with evidence of no resets or safe recovery.
Test Matrix — Test item → Setup → Pass/Fail evidence
| Test item | Setup | Pass/Fail evidence |
|---|---|---|
| Mode switching (timer/demand/hybrid) | Inject local triggers; vary schedule slots; log state transitions. | State machine follows expected path; no short-cycling; transitions timestamped and repeatable. |
| Hysteresis & anti-short-cycle | Sweep Treturn around thresholds; enforce Ton_min/Toff_min settings. | No chatter near thresholds; run/stop decisions stable across multiple sweeps. |
| Brownout recovery | Create controlled VIN dips; monitor reset cause and rails. | Motor stops first; controller returns to IDLE/cooldown; no unintended “resume pumping”. |
| Fault ladder execution (derate → retry → lockout) | Force dry-run/stall signatures; observe action timing and retry budget. | Derate occurs before lockout; retry backoff honored; lockout only after budget/strong signature. |
| Comfort wait-time | Start from cold loop; record time until Treturn reaches target. | Wait-time meets target band; Treturn curve matches expected ramp profile. |
| Return temperature stability | Run maintain mode; vary ambient and draw events; log Treturn. | Treturn stays within band; no oscillation; hysteresis behaves as designed. |
| Energy estimate (Wh/day) | Measure VIN current over day profile; integrate sleep/sample/run periods. | Energy model matches measured integration within acceptable error; dominates identified states. |
| Acoustic noise | Measure at fixed distance; sweep speeds; note “quiet bands”. | Noise stays under threshold in intended operating band; no prominent tonal peaks at target speeds. |
| Start-cycle endurance | Automate repeated start/stop cycles; track failures and temperature. | Failure rate below limit; no progressive drift in current/temperature signatures. |
| Thermal cycling | Cycle ambient/board temperature; repeat functional tests at extremes. | Thresholds and protections remain stable; no unexpected resets; calibration remains valid. |
| Moisture stress (device-level) | Operate in humid environment; observe leakage/false triggers. | No false faults; insulation-related anomalies are logged and handled safely. |
| Repeated fault recovery | Alternate dry-run/stall-like events with normal runs; track lockout triggers. | Recoveries follow ladder; lockout occurs only on persistent strong signatures; manual clear works. |
| ESD spot check (device-level) | Apply ESD to user-accessible points; monitor P2 (MCU rail) and reset cause. | No uncontrolled latch-up; no unsafe motor resume; safe recovery or controlled lockout + log. |
| EFT/burst spot check (device-level) | Inject burst on input cable; observe P1/P2/P4 simultaneously. | Controller stays stable or recovers safely; no repeated reset loop; evidence captured in event log. |
Evidence-first mindset: log state transitions, retry counters, and reset causes. During validation, capture at least VIN ripple (P1), MCU rail (P2), and SW node (P4) together to correlate symptoms with coupling paths.
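The fault ladder row in the table above (derate → retry → lockout) can be sketched as a small state holder that also records the evidence the paragraph asks for. This is a minimal sketch, not the guide's implementation: the class name `FaultLadder`, the retry budget, the base backoff, and the doubling backoff rule are all assumptions chosen for illustration.

```python
class FaultLadder:
    """Derate first, then retry with backoff, then lock out (hypothetical sketch)."""

    def __init__(self, retry_budget=3, base_backoff_s=30.0):
        self.retry_budget = retry_budget      # assumed budget, tune per product
        self.base_backoff_s = base_backoff_s  # assumed base delay before a retry
        self.retries = 0
        self.derated = False
        self.locked_out = False
        self.log = []                         # (action, retry_count) per fault event

    def on_fault(self, strong_signature=False):
        """Return (action, backoff_seconds) for one dry-run/stall-like event."""
        if self.locked_out:
            action, backoff = "lockout", None
        elif not self.derated:
            # The first response is always a derate, never a lockout.
            self.derated = True
            action, backoff = "derate", 0.0
        elif self.retries < self.retry_budget and not strong_signature:
            # Assumed policy: backoff doubles with each retry.
            backoff = self.base_backoff_s * (2 ** self.retries)
            self.retries += 1
            action = "retry"
        else:
            # Budget exhausted or a strong signature after derating.
            self.locked_out = True
            action, backoff = "lockout", None
        self.log.append((action, self.retries))
        return action, backoff

    def manual_clear(self):
        """Manual clear returns the ladder to its initial state."""
        self.retries = 0
        self.derated = False
        self.locked_out = False
```

During validation, the `log` list gives a direct check that derate preceded any lockout and that the retry count never exceeded the budget.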
Figure F9 Test wiring and measurement map: programmable supply, thermal/NTC simulation, motor load, and probe points.
ICNavigator, “Hot Water Recirculation,” Fig. F9 (Validation Setup & Measurement Map), 2026.
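The energy-estimate test (integrate sleep/sample/run periods) can be sketched as a per-state budget. This is a simplified model assuming a constant supply voltage and an average current per state; the `StateBudget` type and every number in `profile` are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class StateBudget:
    name: str
    current_a: float      # average VIN current in this state (A)
    hours_per_day: float  # time spent in this state per day (h)


def daily_energy_wh(vin_v, states):
    """Return (total Wh/day, per-state Wh/day) for a fixed supply voltage."""
    total_h = sum(s.hours_per_day for s in states)
    if abs(total_h - 24.0) > 1e-6:
        raise ValueError(f"state hours sum to {total_h}, expected 24")
    per_state = {s.name: vin_v * s.current_a * s.hours_per_day for s in states}
    return sum(per_state.values()), per_state


# Hypothetical day profile, for illustration only.
profile = [
    StateBudget("sleep", 0.002, 21.0),
    StateBudget("sample", 0.010, 1.0),
    StateBudget("run", 0.500, 2.0),
]
total, per_state = daily_energy_wh(24.0, profile)
dominant = max(per_state, key=per_state.get)  # state that dominates the budget
```

Comparing `total` against the measured integration, and checking which state dominates, covers both halves of the pass criterion.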
H2-11. Field Debug Playbook — Symptom → Evidence → Isolate → Fix
This playbook is designed for fast field isolation using a small, repeatable evidence set: VIN ripple (P1), MCU rail dips (P2), pump current (Ipump), and return temperature slope (dT/dt). Each symptom below uses the same four-block SOP format to prevent guess-based troubleshooting.
Common evidence set (recommended)
- P1: VIN at board entry (ripple spikes at start/stop).
- P2: MCU 3.3V/1.8V at pins (rail dips / BOR signature).
- Ipump: pump phase/bus current (startup peak + steady pattern).
- Treturn: return temperature ADC code vs time (dT/dt).
- Optional: SW node ringing (P4), current-sense return bounce (P5), RF near-field (P6).
SOP No recirculation / very short runs
First 2 measurements
- Ipump: capture startup peak and 2–3 seconds of steady current.
- Treturn dT/dt: log temperature ADC code for 30–120 seconds.
Discriminator
- High Ipump + no speed build → likely stall / overcurrent protection action.
- Normal Ipump + very weak dT/dt → likely threshold/lag window too aggressive or temperature sensing placement/filtering issue.
- Low Ipump + no meaningful dT/dt → likely dry-run signature or air/prime not established (treat as dry-run evidence).
First fix (minimum cost)
- Enforce Ton_min and thermal-lag window before judging “no heat movement”.
- Raise hysteresis and add N-of-M confirmation to avoid single-sample aborts.
- If stall evidence is present: add/verify current limit slope and retry backoff.
MPN examples (hardware levers)
TI INA240A1, TI INA181A1, Vishay WSL2512 0.01Ω
TI DRV10983, TI DRV8323RS, ST STSPIN233
Murata BLM31PG600SN1L, Würth 74438357006
Prevent
- Log: run attempts/day, abort reason, retry counters; flag repeated short-run patterns.
- Lockout only after persistent evidence (avoid “single spike” lockouts).
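The N-of-M confirmation and the Ton_min / thermal-lag gate from the first fix above can be sketched as follows. This is a minimal sketch: the names `NOfMConfirm` and `should_abort`, and the idea of gating on the larger of the two timers, are assumptions rather than details given in this guide.

```python
from collections import deque


class NOfMConfirm:
    """Report True only when at least n of the last m samples were True."""

    def __init__(self, n, m):
        if not 0 < n <= m:
            raise ValueError("require 0 < n <= m")
        self.n = n
        self.window = deque(maxlen=m)  # oldest samples fall out automatically

    def update(self, flag):
        self.window.append(bool(flag))
        return sum(self.window) >= self.n


def should_abort(elapsed_s, no_heat_confirmed, ton_min_s, lag_window_s):
    """Allow a 'no heat movement' abort only after both timers have elapsed."""
    return elapsed_s >= max(ton_min_s, lag_window_s) and no_heat_confirmed
```

A single "no heat" sample never reaches the n-of-m threshold, so one noisy reading cannot abort a run on its own.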
SOP Very noisy / resonance at certain speeds
First 2 measurements
- Ipump ripple: compare RMS/peak-to-peak at “quiet speed” vs “noisy speed”.
- SW ringing (P4) or driver PWM edge behavior: capture at the same operating points.
Discriminator
- Noisy band + Ipump ripple rises sharply → likely resonance / cavitation-like operating region; prefer speed avoidance.
- Noisy across many speeds + large SW ringing → edge too aggressive; likely EMI/mechanical excitation from dv/dt.
First fix (minimum cost)
- Create “quiet bands”: avoid narrow speed regions that consistently spike ripple/noise.
- Add soft-start slope and limit acceleration (reduces excitation).
- If SW ringing dominates: increase gate resistor or add RC snubber near bridge.
MPN examples (edge & drive)
TI DRV8323RS, TI DRV8301, ST L6234
Nexperia PSMN2R8-30YLC, Infineon BSC009NE2LS5I
Vishay CRCW1206 10Ω, TDK C3216X7R1H104K
Prevent
- Maintain a speed-to-noise map in validation; keep it stable across temperature and supply variation.
- Do not treat “noise-only” as a fault unless paired with current/temperature evidence.
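Speed avoidance can be sketched as a remap that pushes a requested speed out of any region flagged in the speed-to-noise map. This sketch assumes the avoided regions are stored as non-overlapping (low, high) RPM pairs; the function name and the nearest-edge rule are illustrative assumptions.

```python
def snap_out_of_avoided_bands(target_rpm, avoided_bands):
    """Move a target speed to the nearest edge of any avoided band it falls in.

    avoided_bands: list of (low_rpm, high_rpm) tuples, assumed non-overlapping.
    Speeds exactly on a band edge are treated as allowed.
    """
    for low, high in avoided_bands:
        if low < target_rpm < high:
            # Pick whichever edge is closer; ties go to the lower (quieter) edge.
            return low if (target_rpm - low) <= (high - target_rpm) else high
    return target_rpm
```

For example, with a hypothetical avoided band of 1800 to 2000 RPM, a request for 1850 RPM would run at 1800 RPM, while requests outside the band pass through unchanged.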
SOP Reboots when the pump starts
First 2 measurements
- P2: MCU 3.3V/1.8V dip magnitude and duration (look for BOR/POR signature).
- P1: VIN ripple spikes synchronized with pump enable.
Discriminator
- P2 dips first → power tree / decoupling / UVLO/BOR issue (conducted event).
- P2 stable but RF drops / logic glitches → dv/dt coupling / ground bounce / return-path contamination.
First fix (minimum cost)
- Harden always-on rail: add local bulk + low-ESR decoupling at MCU/RF; confirm BOR threshold behavior.
- Strengthen input entry: TVS + π filter near connector; shorten clamp/return path.
- Reduce source noise: gate resistor / snubber and keep SW copper compact; separate power return from logic return.
MPN examples (PI & input)
TI LM5163A, TI TPS54202, MPS MP1584EN
TI TPS7A2033, Microchip MCP1700-3302E
Littelfuse SMBJ33A, Diodes Inc. SMBJ33A
Würth 744232101, Murata BLM41PG600SN1L
Panasonic EEE-FK1V470P
Prevent
- Record reset cause and rail min/max during pump start events (field log).
- Use controlled join for grounds and keep current-sense return away from MCU reference.
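Recording the rail minimum during pump start can be sketched as a post-processing step on a captured P2 trace. This assumes the capture is a time-sorted list of (seconds, volts) pairs; the function name, the 50 ms window, and the BOR threshold passed in are hypothetical values for illustration.

```python
def rail_dip_summary(samples, pump_enable_t, bor_threshold_v, window_s=0.05):
    """Summarize the MCU rail inside a window that starts at pump enable.

    samples: list of (t_seconds, volts), sorted by time.
    Time below threshold is approximated by holding each sample until the next.
    """
    end_t = pump_enable_t + window_s
    in_win = [(t, v) for t, v in samples if pump_enable_t <= t <= end_t]
    if not in_win:
        return None  # nothing captured around this start event
    v_min = min(v for _, v in in_win)
    below_s = 0.0
    for (t0, v0), (t1, _) in zip(in_win, in_win[1:]):
        if v0 < bor_threshold_v:
            below_s += t1 - t0
    return {"v_min": v_min, "below_bor_s": below_s,
            "bor_risk": v_min < bor_threshold_v}
```

Logged alongside the reset cause, `v_min` and `below_bor_s` help separate a conducted rail dip from a coupling problem where the rail itself stays stable.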
SOP Temperature reading jumps / false triggers
First 2 measurements
- Treturn ADC raw vs filtered: capture with pump off and pump on.
- Ipump: check if temperature spikes are time-aligned with commutation/PWM edges.
Discriminator
- Spikes only when pump runs → EMI injection / sampling phase problem / ground reference shift.
- Spikes even when pump is off → sensor wiring/contact, divider/reference instability, moisture leakage paths.
First fix (minimum cost)
- Move ADC sampling away from switching edges; add digital median/outlier reject.
- Add/verify RC at sensor input and ensure sensor return is routed to a quiet reference point.
- Increase hysteresis and require multi-window confirmation before mode switching.
MPN examples (sensing chain)
Murata NCP18XH103F03RB, Vishay NTCLE100E3103JB0
Vishay PTS0805 1k, TI OPA333AIDBVR
Yageo RC0603FR-0710KL, TDK C1608X7R1H104K
Prevent
- Self-check on boot: detect open/short sensor; log drift rate and flag abnormal patterns.
- Keep sensor harness and ADC reference separated from motor power loops.
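The "digital median/outlier reject" step can be sketched as a causal sliding median over raw Treturn ADC codes. This is one possible filter, not necessarily the one intended by the guide; the window length of 5 and the use of `median_low` (to keep outputs as real integer codes) are assumptions.

```python
from statistics import median_low


def median_filter(codes, window=5):
    """Causal sliding median: each output uses only the last `window` samples."""
    out = []
    for i in range(len(codes)):
        lo = max(0, i - window + 1)
        # median_low returns an actual sample value, so codes stay integers.
        out.append(median_low(codes[lo:i + 1]))
    return out


# Hypothetical raw codes with one EMI-like spike at index 3.
raw = [2000, 2001, 2002, 3500, 2003, 2004, 2005]
filtered = median_filter(raw)
```

A single-sample spike never becomes the median of a 5-sample window, so it is rejected without smearing into neighboring samples the way a moving average would.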
SOP Energy use unexpectedly high
First 2 measurements
- Run-time counters: minutes/day in PREHEAT/MAINTAIN and number of starts/day.
- Trigger attribution: count timer vs demand vs learning decisions.
Discriminator
- Too many starts/day → thresholds too sensitive or anti-short-cycle missing; likely false “need heat” triggers.
- Long maintain time → temperature band too tight; hysteresis too small; learning schedule too aggressive.
First fix (minimum cost)
- Increase Toff_min, widen temperature band, and require confirmed dT/dt before entering MAINTAIN.
- Cap daily run-time and add a local “energy guard” rule to prompt retuning.
- If always-on current is high: audit sleep state and sensor duty cycle (budget by states).
MPN examples (always-on efficiency)
ST STM32L072KZ, ST STM32L052K8, NXP MKL03Z32
TI TPS62743, TI TPS7A0233
Microchip MCP7940N, ABLIC S-1335A33
Prevent
- Persist daily counters and expose a simple field report: starts/day, run-min/day, abort reasons/day.
- Keep learning local and bounded (no platform dependency); clamp schedule aggressiveness.
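The combination of a temperature band, Ton_min/Toff_min, and a daily run-time cap can be sketched as one small controller step. All thresholds and timings below are hypothetical defaults, and the class names are illustrative; a real design would take them from tuning and validation data.

```python
from dataclasses import dataclass


@dataclass
class RecircPolicy:
    t_on_c: float = 35.0        # start when Treturn is at or below this (assumed)
    t_off_c: float = 40.0       # stop when Treturn is at or above this (assumed)
    ton_min_s: float = 60.0     # minimum run time
    toff_min_s: float = 600.0   # minimum rest time (anti-short-cycle)
    daily_cap_s: float = 3600.0 # "energy guard" cap on run time per day


class RecircController:
    def __init__(self, policy):
        self.p = policy
        self.running = False
        self.since_change_s = policy.toff_min_s  # permit the first start
        self.run_today_s = 0.0
        self.starts_today = 0

    def step(self, treturn_c, dt_s):
        """Advance by dt_s seconds and return True while the pump should run."""
        self.since_change_s += dt_s
        if self.running:
            self.run_today_s += dt_s
            capped = self.run_today_s >= self.p.daily_cap_s
            hot = (treturn_c >= self.p.t_off_c
                   and self.since_change_s >= self.p.ton_min_s)
            if capped or hot:
                self.running = False
                self.since_change_s = 0.0
        else:
            can_start = (self.since_change_s >= self.p.toff_min_s
                         and self.run_today_s < self.p.daily_cap_s)
            if treturn_c <= self.p.t_on_c and can_start:
                self.running = True
                self.since_change_s = 0.0
                self.starts_today += 1
        return self.running
```

The `starts_today` and `run_today_s` counters are the same values the field report above would persist; a daily reset of both is left out of this sketch.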
Figure F6 Decision tree for fast isolation using “Pump spins → Current normal → dT/dt present”.
ICNavigator, “Hot Water Recirculation,” Fig. F6 (Field Debug Decision Tree), 2026.
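The decision order "Pump spins → Current normal → dT/dt present" can be sketched as a small classifier. The figure's exact branches are not reproduced here; the outcomes below are taken from the discriminator list in the first SOP, and the function name, the string labels, and the three-level current input are assumptions.

```python
def isolate(pump_spins, ipump_level, dtdt_present):
    """Map three field observations to a likely cause (illustrative labels).

    ipump_level: one of "low", "normal", "high" relative to the expected pattern.
    """
    if ipump_level not in ("low", "normal", "high"):
        raise ValueError("ipump_level must be 'low', 'normal', or 'high'")
    if not pump_spins and ipump_level == "high":
        return "stall_or_overcurrent"      # high current, no speed build
    if ipump_level == "low" and not dtdt_present:
        return "dry_run_or_air"            # treat as dry-run evidence
    if ipump_level == "normal" and not dtdt_present:
        return "threshold_or_sensing"      # lag window or sensor placement
    if pump_spins and ipump_level == "normal" and dtdt_present:
        return "ok"
    return "inconclusive"                  # capture more evidence first
```

Combinations the SOPs do not cover return "inconclusive" rather than a guessed cause, in keeping with the evidence-first approach.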
H2-12. FAQs (Accordion ×12) — Evidence-First, No Scope Creep
Rules
- Each answer starts with the first 2 checks, then a discriminator, then a first fix.
- Each answer ends with a strict chapter backlink: “→ Back to H2-x”.
- No plumbing tutorial, no water-heater internals, no cloud/platform deep dive.
Q1 Timer is ON but hot water is still slow — check temperature point or pump speed first?
TI DRV10983, TI DRV8323RS, TI INA240A1
Q2 Wi-Fi/Thread drops as soon as the pump starts — which two waveforms first?
TI LM5163A, TI TPS7A2033, Littelfuse SMBJ33A, Murata BLM41PG600SN1L
Q3 False starts at night — overly sensitive threshold or drifting occupancy input?
Q4 Pump is very hot but there is no effect — dry-run or stall, and how to prove it?
Q5 Hot water arrives but cools quickly — hysteresis tuning or recirculation strategy?
Q6 Noise happens only at one speed band — PWM setting or mechanical resonance first?
Vishay CRCW1206 10Ω, TDK C3216X7R1H104K, TI DRV8323RS
Q7 Temperature display jitters — ADC sampling problem or motor EMI injection?
Murata NCP18XH103F03RB, TI OPA333AIDBVR, TDK C1608X7R1H104K
Q8 Power consumption is high — check recirculation duty cycle or pump efficiency point?
Q9 Frequent reboots — UVLO first or ground bounce first?
TI TPS54202, Würth 744232101, Littelfuse SMBJ33A
Q10 Button press does nothing — debounce issue or wake domain issue?
Q11 Flow reading is not trustworthy — low-flow resolution or bubbles/noise?
Q12 Worse behavior in winter — temperature-delta policy or supply sag first?
Figure F10 FAQ-to-chapter map (each Q maps back to evidence chapters).
ICNavigator, “Hot Water Recirculation,” Fig. F10 (FAQ → Chapter Map), 2026.