Smart Floor Heating: Temperature Sensing & SSR Power Control
Center Idea: Smart floor heating is a slow, high-inertia control system—stable comfort comes from trusted temperature sensing and measured power switching, not fast “on/off.” Use an evidence chain (T_floor/T_air curves + SSR_GATE/Vcc signals) to prevent overshoot, keep zones consistent, and diagnose reboots, alarms, and wireless dropouts quickly.
H2-1 — Set the Page Boundary & Center Idea (Make the Promise Clear)
Center idea: Smart floor heating is a slow thermal system. Stable comfort comes from trusted temperature sensing and predictable power delivery, validated by two evidence streams: temperature curves and power switching / load behavior.
This page stays device-side and focuses on the complete engineering loop: floor/air temperature sensing → SSR/relay power control → anti-overshoot control strategy → zoning & schedules → wireless coexistence under heating noise → validation & field debug evidence.
- A device boundary map with numbered test points that can be reused in validation and field SOP.
- A consistent way to separate sensor vs power-stage vs safety failures using minimal measurements.
- Control strategy guidance for slow thermal inertia: window, deadband, floor cap, and how to validate with curves.
- Field-debug mindset: symptom → evidence → isolate → first fix (no “app tutorial” content).
Out of scope: hub-level architecture, server-side services, whole-home energy panels, deep utility programs, and building-code walkthroughs. (This page remains device-side and evidence-based.)
Evidence rule: Any comfort claim must tie back to at least one measurable artifact: T-floor / T-air curve and/or power switching / load behavior. If no evidence is available, the section belongs in troubleshooting—not in “design conclusions.”
H2-2 — Architecture: Separate Sense Chain vs Power Chain (Plus Safety)
Most field failures become solvable once the system is split into three chains: Sense chain (temperature credibility), Power chain (actual heating delivery), and Safety chain (hardware cutoffs that remain effective under MCU faults). The goal is not “more features,” but minimum observability: a small set of signals that prove what is happening.
Engineering rule: Comfort issues should never be diagnosed from UI text alone. Prove “what the floor is doing” with TP1/TP2 curves, and prove “what power is delivered” with TP3/TP4 timing (and TP5 if available). If stability breaks during heating, correlate with TP6 rail behavior.
- Floor probe + ambient sensor → ADC front-end → filtering / plausibility checks.
- MCU control → schedule / zone state machine → SSR/relay drive decision.
- Power stage (zero-cross / triac / relay / SSR) → heating mat/cable load.
- Aux PSU rails (3V3/5V) → brownout / watchdog reset handling.
- Safety chain (over-temp policy + independent cutoff elements) → fault latch behavior.
- Device-side wireless → coexistence under switching noise (evidence: retries/RSSI vs heating).
What (sense chain): Floor/air temperature → ADC → filter → validated temperature.
Failure mode: probe placement error, noise-injected readings, drift, open/short.
Evidence: TP1/TP2 jitter or implausible slope vs heating state; stable UI but unstable raw codes.
First fix: placement + wiring separation; RC/median filter; sample away from switching edges.
What (power chain): SSR/relay timing → AC switching → load heating power.
Failure mode: false triggering (dv/dt), overheating, partial conduction, wiring faults.
Evidence: TP3/TP4 mismatch; TP5 shows missing power under “ON” state; abnormal temperature rise rate.
First fix: zero-cross timing; snubber strategy; thermal path; verify load current presence.
What (safety chain): Hardware cutoffs + fault latch keep the system safe during MCU/PSU faults.
Failure mode: stuck-on power stage, missing cutoff, unsafe restart state.
Evidence: TP1 rises while control indicates “OFF”; FAULT state repeats after restart; cutoff not effective.
First fix: independent cutoff element; fail-safe default OFF; fault latch until manual clear.
| TP# | Signal | Typical tool | What it proves | Common pitfall |
|---|---|---|---|---|
| TP1 | T_floor (floor probe) | ADC log / DMM | Floor thermal state & slope | Placement error looks like “control bug” |
| TP2 | T_air (ambient) | ADC log | Room response vs floor cap behavior | Slow air dynamics misread as overshoot |
| TP3 | SSR_GATE / relay drive | Oscilloscope | Commanded switching intent | Looks correct even if power stage misbehaves |
| TP4 | AC_ZC (zero-cross) | Oscilloscope | Switching timing reference | Wrong reference causes extra EMI / heating jitter |
| TP5 (opt.) | I_load / power presence | Clamp / current sense | Actual delivered heating power | Partial conduction can hide as “some current” |
| TP6 | Vcc rails (3V3/5V) | Oscilloscope | Brownout / reboot correlation to heating | Probe ground loop can lie—keep leads short |
H2-3 — Temperature Sensing: Probe Type, Placement, ADC Front-End & Noise Immunity
Core principle: Temperature “accuracy” in floor heating is a system property. Credible control depends on thermal coupling (where the probe actually measures), electrical integrity (how the signal survives switching noise), and sampling discipline (not turning transient spikes into false temperature).
This section covers only floor/air temperature inputs used by a floor-heating controller. It intentionally avoids IAQ sensors (PM/VOC/CO₂) and any cloud-side analytics.
Use a floor probe position that represents the controlled mass. Prefer a protective sleeve/tube that enables replacement and stabilizes coupling. Avoid direct proximity to heating wire paths.
Keep probe wiring away from mains/SSR switching loops. Use a simple RC input network and stable divider resistors; treat long cable runs as noise antennas.
Align sampling away from switching edges when possible. Prefer median/clamp for impulsive spikes, then use light low-pass smoothing to avoid adding control lag.
Implement open/short detection thresholds and plausibility checks (slope limits, impossible jumps). Fault states should drive fail-safe heating behavior and clear operator messaging.
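The median/clamp-then-smooth order described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `ProbeFilter`, the window of 5 samples, the 0.5 °C per-sample clamp, and the EMA weight of 0.2 are all assumed placeholder values that would need tuning against real TP1/TP2 logs.

```python
from collections import deque
from statistics import median


class ProbeFilter:
    """Median-of-N spike rejection, step clamp, then light EMA smoothing.

    All default values are illustrative assumptions, not recommendations.
    """

    def __init__(self, window=5, max_step_c=0.5, alpha=0.2):
        self.raw = deque(maxlen=window)  # recent raw samples for the median
        self.max_step_c = max_step_c     # largest plausible change per sample
        self.alpha = alpha               # EMA weight: smaller = smoother, more lag
        self.value = None                # last filtered temperature

    def update(self, sample_c):
        """Feed one raw temperature sample and return the filtered value."""
        self.raw.append(sample_c)
        med = median(self.raw)           # rejects isolated impulsive spikes
        if self.value is None:
            self.value = med
            return self.value
        # Clamp the step: a floor cannot jump by several degrees between samples.
        step = max(-self.max_step_c, min(self.max_step_c, med - self.value))
        self.value += self.alpha * step  # light low-pass on the clamped step
        return self.value
```

A single spike inside an otherwise steady window leaves the output unchanged, while a genuine temperature change passes through with a few samples of delay. Keeping `alpha` relatively large limits the extra control lag the text warns about.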
| Decision axis | NTC (e.g., 10k / 50k) | RTD (PT100 / PT1000) | What to validate (evidence) |
|---|---|---|---|
| Long cable runs | Often practical with divider + ADC; noise immunity must be engineered. | Lead resistance can become a dominant error source (especially PT100). | Raw code stability vs SSR switching; compare reading drift vs cable length. |
| Linearity & calibration | Nonlinear; needs curve/segment mapping. | More linear; measurement can be more complex. | Two-point or segmented fit vs reference; verify across operating range. |
| Noise & sampling | Susceptible to injected spikes; median/clamp often effective. | Susceptible to pickup in lead wires; requires robust measurement method. | Correlate spikes with SSR edges (TP3/TP4); quantify jitter in steady state. |
| Fault detect | Open/short produces saturation codes; easy to threshold. | Open/short behavior depends on excitation/meas scheme. | Inject open/short and confirm safe state + clear diagnosis. |
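As an illustration of the NTC column above, the sketch below converts a divider ADC code to temperature with the Beta equation and thresholds the saturation codes for open/short detection. Every constant is an assumption for the example: a 12-bit ADC, a 10k NTC with B = 3950 on the low side of a 10k divider, and arbitrary open/short code limits. A real design would take these from the probe datasheet and the actual front-end.

```python
import math

# Assumed example values; replace with the real front-end and probe data.
ADC_FULL = 4095        # 12-bit ADC full-scale code
R_FIXED = 10_000.0     # divider resistor from Vref to the ADC node (ohms)
R0 = 10_000.0          # NTC resistance at T0 (ohms)
T0_K = 298.15          # 25 degC in kelvin
BETA = 3950.0          # NTC Beta constant (kelvin)
OPEN_CODE = 4000       # at or above this code: probe treated as open
SHORT_CODE = 95        # at or below this code: probe treated as shorted


def ntc_code_to_celsius(code):
    """Return (temperature_c, status) for an NTC wired from ADC node to ground."""
    if code >= OPEN_CODE:
        return None, "open"
    if code <= SHORT_CODE:
        return None, "short"
    # Divider: code / ADC_FULL = R_ntc / (R_ntc + R_FIXED)
    r_ntc = R_FIXED * code / (ADC_FULL - code)
    # Beta equation: 1/T = 1/T0 + ln(R/R0) / B
    inv_t = 1.0 / T0_K + math.log(r_ntc / R0) / BETA
    return 1.0 / inv_t - 273.15, "ok"
```

Returning an explicit status alongside the temperature lets the caller drive fail-safe behavior on `"open"` or `"short"` instead of treating a saturated code as a very cold or very hot floor.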
Sensor tolerance • divider tolerance • Vref drift • ADC INL/DNL • lead resistance (RTD).
Typical symptom: smooth curve, consistent bias.
Switching noise injection (SSR dv/dt) • sampling at edges • cable pickup • ground return coupling.
Typical symptom: spikes/jitter that trigger control oscillation.
Common installation pitfalls: probe not in a sleeve/tube • probe placed too close to heating wires • incorrect depth • probe cable routed with mains/SSR wiring. These often present as “control bugs” but are proven by temperature spikes that correlate with switching edges.
H2-4 — Control Strategy: Slow Thermal Inertia Without Hot-Foot, Oscillation, or Overshoot
Control framing: Floor heating is dominated by thermal inertia and time delay. Aggressive switching rarely creates faster comfort; it more often creates overshoot, switching stress, and measurement-noise-driven chatter. The objective is stable comfort with predictable evidence.
Floor-limited mode: Primary regulation uses T_floor. This protects foot comfort and floor materials. Verification: T_floor reaches target with controlled slope; T_air may lag in high heat-loss rooms.
Air-controlled mode with floor cap: Primary regulation uses T_air with a floor temperature cap. Verification: T_air converges without driving T_floor beyond cap; cap prevents hot-foot during long calls.
Hysteresis (on/off with deadband): Simple and robust. Risk: larger temperature ripple. Evidence: sawtooth curve amplitude tracks deadband width.
Time-proportional: Uses a fixed window (e.g., minutes) and modulates duty to smooth delivered heat. Evidence: reduced ripple without frequent relay chatter.
PI (integral action): Eliminates steady-state error; must prevent integral windup. Evidence: no persistent heating after crossing setpoint; overshoot stays bounded.
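The integral-windup concern can be illustrated with a small PI duty controller that uses conditional integration and a floor cap override. This is a sketch under assumptions: the class name `PiDutyController`, the gains, and the 29 °C cap are placeholders, and conditional integration is only one of several common anti-windup methods.

```python
class PiDutyController:
    """PI controller producing a duty in [0, 1], with anti-windup and floor cap.

    Gains and cap are illustrative assumptions, not tuned values.
    """

    def __init__(self, kp=0.2, ki=0.001, floor_cap_c=29.0):
        self.kp = kp
        self.ki = ki
        self.floor_cap_c = floor_cap_c
        self.integral = 0.0

    def update(self, setpoint_c, t_air_c, t_floor_c, dt_s):
        """Return the duty for the next window; dt_s is the update interval."""
        if t_floor_c >= self.floor_cap_c:
            return 0.0  # floor cap overrides; integral is left frozen
        error = setpoint_c - t_air_c
        candidate = self.integral + self.ki * error * dt_s
        unclamped = self.kp * error + candidate
        duty = min(1.0, max(0.0, unclamped))
        # Anti-windup: accept the new integral only when the output is not
        # saturated, or when the error pushes it back toward the valid range.
        pushing_back = (unclamped > 1.0 and error < 0) or (unclamped < 0.0 and error > 0)
        if duty == unclamped or pushing_back:
            self.integral = candidate
        return duty
```

During a long cold start the output saturates at full duty and the integral stays put, so there is no stored-up demand that keeps heating after the setpoint is crossed, which is the evidence the text asks for.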
Tuning flow: choose a window that avoids chatter → set deadband/cap to prevent hot-foot → add min on/off and ramp limits to reduce stress → apply conservative cold-start behavior to avoid first-rise overshoot.
| Parameter | What it controls | Too small / too short | Too large / too long | Evidence to watch |
|---|---|---|---|---|
| Window length | How often duty can change | Chatter, EMI, visible ripple | Sluggish response | T curve ripple frequency vs duty steps |
| Deadband | On/off sensitivity | Switching too frequent | Large temperature ripple | Sawtooth amplitude |
| Floor cap | Foot comfort & protection | May limit air comfort | Hot-foot risk | T_floor plateau vs T_air convergence |
| Min on/off | Switching stress reduction | Wear, audible relay noise | Delayed correction | Switch count per hour |
| Ramp time (slope limit) | How fast duty rises | Overshoot on cold start | Slow warmup | Cold-start overshoot and settling time |
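How the window length and min on/off parameters interact can be sketched with a small planner that turns a duty value into on/off times for one window. The function name `plan_window` and the default times (10-minute window, 60 s minimums) are assumptions for illustration. Dropping too-short pulses is one possible policy; another is to carry the remainder into the next window.

```python
def plan_window(duty, window_s=600.0, min_on_s=60.0, min_off_s=60.0):
    """Return (on_seconds, off_seconds) for one time-proportional window.

    Pulses shorter than the minimum on time are skipped, and gaps shorter
    than the minimum off time are absorbed into a full-on window.
    """
    duty = min(1.0, max(0.0, duty))
    on_s = duty * window_s
    if on_s < min_on_s:
        return 0.0, window_s          # too short to be worth a switching event
    if window_s - on_s < min_off_s:
        return window_s, 0.0          # remaining gap too short: stay on
    return on_s, window_s - on_s
```

With these defaults a 50% duty yields 300 s on and 300 s off, while 5% duty produces no switching at all. The rounding introduces a small duty error at the extremes, which shows up as a slight bias in the temperature curve rather than as chatter.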
H2-5 — Power Actuation: Choosing SSR / Triac / Relay Without Field Failures
Objective: A power stage is “good” only when it stays predictable in real wiring: it must switch when commanded, stay off when commanded, manage heat, and minimize dv/dt-triggered false turn-on and EMI. The most common field failures cluster into three buckets: false triggering, overheating, and EMI coupling into low-voltage domains.
| Actuator | Switching frequency | Heat & thermal design | Noise / comfort | False trigger sensitivity | EMI profile | Best fit (typical) |
|---|---|---|---|---|---|---|
| Relay | Low (avoid frequent time-proportional toggling) | Low conduction loss; watch contact heating under stress | Audible clicks | Low dv/dt sensitivity; contacts can bounce or weld under abuse | Moderate; contact arcing can be noisy during abuse | Simple on/off with long min on/off; cost-focused designs |
| Triac + driver | Medium; depends on trigger strategy | Moderate; conduction drop causes heating | Silent | dv/dt + holding-current behaviors can cause unexpected conduction patterns | Can be high for random turn-on/phase methods | When controlled turn-on is engineered and verified by waveforms |
| Zero-cross SSR | Medium; still avoid unnecessary fast toggling | Often the dominant concern; requires heat path planning | Silent | Still subject to off-state leakage and dv/dt limits | Lower than random turn-on; friendlier under proper layout and wiring | Time-proportional control with comfort stability focus |
Symptom: heating appears when command is OFF, or intermittent warm patches without schedule.
Evidence: correlate unexpected conduction with switching edges (AC_ZC / drive timing) and wiring proximity.
First fixes: reduce dv/dt injection loops, verify gate drive network, add snubber only when waveforms justify it.
Symptom: SSR/triac case runs hot; drift and intermittent faults appear after warm-up.
Evidence: estimate P ≈ V_on × I_rms, then measure case temperature rise by thermocouple/IR.
First fixes: improve heat path (copper area, thermal interface, airflow), reduce unnecessary toggling.
Symptom: sensor spikes, resets, wireless dropouts only during heating transitions.
Evidence: supply dip/ground bounce aligned to switching edges; temperature ADC spikes aligned to SSR gate events.
First fixes: enforce wiring separation, shorten hot loops, strengthen low-voltage decoupling and reset immunity.
Driver chain: MCU output → isolation/optocoupler (as required) → gate/input resistor → actuator input. Device-side surge control: use MOV/TVS only where they protect the product entry and are backed by layout discipline. Snubber: apply only when needed, proven by waveform ringing or dv/dt misbehavior—avoid adding components blindly.
Use actuator conduction drop at operating current. Treat it as continuous loss during ON windows.
Measure case temperature after steady operation. Use consistent placement and time-to-steady-state.
If rise is high, upgrade heat path or derate; if only under edge events, prioritize EMI/trigger debugging.
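The three thermal-check steps above can be turned into a quick estimate to compare against the measured case temperature. The sketch uses the P ≈ V_on × I_rms approximation from this section. The case-to-ambient thermal resistance `r_theta_ca` is an extra assumption not given in the text, and the numbers in the usage note are purely illustrative.

```python
def ssr_thermal_estimate(v_on, i_rms, duty, r_theta_ca, t_ambient_c):
    """Estimate conduction loss and steady-state case temperature.

    v_on        conduction drop at operating current (V)
    i_rms       load current while ON (A)
    duty        fraction of time the actuator conducts (0..1)
    r_theta_ca  assumed case-to-ambient thermal resistance (K/W)
    t_ambient_c ambient temperature inside the enclosure (degC)
    """
    p_on = v_on * i_rms                        # loss while conducting
    p_avg = p_on * duty                        # averaged over the duty cycle
    t_case_c = t_ambient_c + p_avg * r_theta_ca
    return p_on, p_avg, t_case_c
```

For example, assuming a 1.2 V drop at 10 A, 50% duty, 8 K/W, and 40 °C ambient, the estimate is about 12 W while conducting, 6 W average, and a case temperature near 88 °C. If the measured case temperature is far above the estimate, the thermal path or unintended conduction deserves attention before the control parameters do.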
H2-6 — Safety Chain: Over-Temperature, Leakage Symptoms, and Hardware Fail-Safes
Safety principle: Safety is not a feature; it is a stack. The chain must remain effective even when the MCU is wedged, the supply browns out, or the power device fails. Build protection so that the default fault outcome is heating OFF, and confirm each layer by evidence.
Over-temperature policy (firmware)
Trigger: T_floor approaches cap or rises faster than allowed.
Action: reduce duty / switch OFF.
Evidence: duty decreases while temperature curve forms a controlled plateau near the cap.
Independent cutoff element (hardware)
Trigger: local temperature exceeds independent threshold.
Action: physically opens the power path in series.
Evidence: load power stops even if MCU drive remains asserted.
Runaway detection and fault latch
Trigger: command OFF but temperature continues to climb abnormally, or load current persists with duty = 0.
Action: latch fault state; require manual intervention to resume.
Evidence: “state vs physics” contradiction in logs and traces.
What to cover here: external RCD/GFCI behavior is observed as power loss events. Typical symptoms include trips on heater enable, intermittent trips during switching, and resets aligned with heating transitions. Device evidence should focus on brownout counters, reset reasons, and fault logs—without turning this page into a regulatory tutorial.
Keep mains and low-voltage domains physically and electrically separated. Maintain clear routing zones and controlled return paths.
On reset, watchdog, or sensor fault, the power output must move to OFF. Avoid ambiguous transitional states.
Over-temperature or runaway detection should latch to prevent oscillating failures. Recovery should require explicit operator confirmation.
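The "state vs physics" check and the latch-until-manual-clear rule can be sketched as a small firmware-side monitor. It complements the independent hardware cutoff and does not replace it. The class name `SafetyLatch`, the thresholds, and the settle time that allows for normal thermal inertia after switch-off are all assumptions for illustration.

```python
class SafetyLatch:
    """Latches a fault when the commanded state and measured physics disagree.

    Thresholds are illustrative assumptions, not validated limits.
    """

    def __init__(self, max_rise_c_per_min=0.1, i_min_a=0.5, settle_s=900.0):
        self.max_rise_c_per_min = max_rise_c_per_min  # allowed rise while OFF
        self.i_min_a = i_min_a                        # current that counts as "power present"
        self.settle_s = settle_s                      # inertia allowance after switch-off
        self.latched = False
        self.reason = None

    def check(self, commanded_on, off_duration_s, rise_c_per_min, i_load_a=None):
        """Return True if the heating output is allowed after this check."""
        if not commanded_on and not self.latched:
            if i_load_a is not None and i_load_a > self.i_min_a:
                self._latch("load current present with duty = 0")
            elif off_duration_s > self.settle_s and rise_c_per_min > self.max_rise_c_per_min:
                self._latch("T_floor still climbing long after OFF")
        return not self.latched

    def _latch(self, reason):
        self.latched = True
        self.reason = reason

    def manual_clear(self):
        """Explicit operator confirmation; the only way out of the latch."""
        self.latched = False
        self.reason = None
```

The latch has no timeout and no automatic retry, so a stuck-on power stage cannot produce an oscillating fault that clears itself and re-energizes the load.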
H2-7 — Power Integrity & Immunity: Why Heating Causes Reboots or Dropouts
Core idea: Many “software-like” failures during heating are actually power-tree evidence. Switching edges and mains disturbances can cause rail sag, ground bounce, or brownout resets. The fix starts by proving causality: rail waveform aligned to SSR switching timing, plus reset reason and radio reconnect counters.
- M1 — Vcc rail at the load: probe 3V3/5V close to MCU or radio power pins (not only PSU output).
- M2 — Switching reference: capture SSR_GATE or AC_ZC on the same timebase to correlate edge events.
Strong evidence: a repeatable Vcc dip or spike that occurs at each switching edge, with resets or dropouts clustered at the same timestamps.
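One way to quantify "clustered at the same timestamps" is to compute the fraction of resets or dropouts that fall within a short tolerance of a switching edge. The sketch below assumes both lists share one timebase in seconds. The function name `edge_aligned_fraction` and the 50 ms tolerance are hypothetical choices.

```python
from bisect import bisect_left


def edge_aligned_fraction(event_ts, edge_ts, tol_s=0.05):
    """Fraction of events that occur within tol_s of the nearest switching edge."""
    edges = sorted(edge_ts)
    if not event_ts or not edges:
        return 0.0
    hits = 0
    for t in event_ts:
        i = bisect_left(edges, t)
        # Only the neighbours on either side can be the nearest edge.
        near = [edges[j] for j in (i - 1, i) if 0 <= j < len(edges)]
        if min(abs(t - e) for e in near) <= tol_s:
            hits += 1
    return hits / len(event_ts)
```

A fraction close to 1 over many events supports a switching-related cause. A fraction near the value expected from chance (roughly edge density times the tolerance window) suggests looking elsewhere.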
Wireless modules often have tighter transient limits. A small sag can cause silent link loss and reconnect storms before a full reset.
Brownout can look like random firmware instability. Confirm with reset reason (BOR) and watchdog counters, then fix rail margin.
Ground bounce and switching injection can corrupt ADC readings and touch/UI states, triggering control oscillations and user-visible glitches.
Ensure rails stay above UVLO during switching edges. Log brownout/reset reasons and reconnect counters to close the evidence loop.
Bulk helps only if placed where transient current is demanded. Validate by reduced dip depth and shorter recovery time on Vcc.
Separate noisy power loops from sensitive domains. Ground bounce is proven when ADC noise and radio drops align with switching edges.
Use device-side protection and layout discipline to limit injected disturbances. Confirm with fewer rail excursions under edge events.
Pass condition: Heating transitions do not cause measurable Vcc excursions at MCU/radio pins, reset reasons remain clean, and reconnect counters do not spike during switching.
H2-8 — Zoning & Scheduling: Consistency Across Rooms (Beyond “App Features”)
Core idea: Zoning is a control and evidence problem. The same setpoint can feel different across rooms because thermal resistance, thermal mass, and probe coupling differ. Use a single evidence metric—dT/dt at the same duty—to explain inconsistency and to justify per-zone parameters.
Best controllability. Each zone has direct floor feedback; consistency tuning is straightforward. Evidence: compare dT/dt distribution across zones at equal duty windows.
Lower sensor cost but weaker floor insight. Use conservative caps and longer windows to avoid overshoot. Evidence: stability vs comfort tradeoff appears in settling time and ripple amplitude.
Requires strict safety limits and slow control. Consistency is maintained by conservative windows and min on/off rules. Evidence: use rise-time and switching-count constraints to avoid chatter and hotspots.
- Preheat lead time: choose based on measured rise time (time-to-comfort), not guesswork.
- Hold strategy: long window + min on/off reduces switching stress and improves comfort stability.
- Night comfort: cap floor temperature and avoid rapid toggling to reduce noise and interference.
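The preheat rule in the list above reduces to simple arithmetic once a rise rate has been measured for the zone. A minimal sketch, assuming a roughly linear rise over the range of interest; the function name and units are illustrative.

```python
def preheat_lead_minutes(t_now_c, t_target_c, rise_c_per_hour):
    """Lead time in minutes to reach the target at the measured rise rate."""
    if t_now_c >= t_target_c:
        return 0.0                      # already at or above target
    if rise_c_per_hour <= 0:
        raise ValueError("measured rise rate must be positive")
    return 60.0 * (t_target_c - t_now_c) / rise_c_per_hour
```

For a zone that measured 1.5 °C per hour, going from 19 °C to 22 °C needs a lead time of 120 minutes. Because the rise rate differs per zone, the lead time should be stored per zone rather than as one global setting.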
Metric: compare dT/dt across zones under the same duty window.
Interpretation: different slopes imply different coupling/installation/thermal paths.
Action: tune per-zone parameters (window, deadband, cap, min on/off) instead of forcing one global setting.
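The dT/dt metric can be extracted from logged curves with a least-squares slope over the same duty window in each zone. The sketch below is one possible way to compute it; the function names and the log layout (a dict of zone name to time and temperature lists) are assumptions.

```python
def slope_c_per_min(times_s, temps_c):
    """Least-squares slope of a temperature log, in degC per minute."""
    n = len(times_s)
    if n < 2 or n != len(temps_c):
        raise ValueError("need at least two matching samples")
    mean_t = sum(times_s) / n
    mean_y = sum(temps_c) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, temps_c))
    return 60.0 * sxy / sxx


def zone_slopes(zone_logs):
    """Map each zone to its dT/dt; logs must come from the same duty window."""
    return {zone: slope_c_per_min(t, y) for zone, (t, y) in zone_logs.items()}
```

Fitting a slope over the whole window is less sensitive to single noisy samples than differencing the first and last points, which matters when the total rise within one window is only a few tenths of a degree.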
Goal: comfort at wake-up.
Strategy: preheat lead time + stable hold window.
Parameters: longer window, conservative cap, clear min on/off.
Validation: time-to-comfort, overshoot size, switching count per hour.
Goal: quick recovery without hotspots.
Strategy: controlled boost then settle into hold.
Parameters: ramp limits, avoid short windows that chatter.
Validation: settle time, ripple amplitude, zone-to-zone slope alignment.
Goal: steady comfort, low disturbance.
Strategy: stable hold + strict cap.
Parameters: longer window, reduced transitions, strong min on/off.
Validation: no chatter, no radio drops, consistent floor curve.
H2-9 — Connectivity Coexistence: Why Wireless Gets Worse Only During Heating
Core idea: “Heater on → wireless worse” is usually device-side coupling, not a cloud or gateway problem. The correct approach is to prove correlation: RSSI/PER/retry counters versus SSR duty and switching edges, then trace the coupling path (high di/dt loop → rails/ground → RF front-end).
- Sync reference: capture SSR_GATE or AC_ZC timestamps.
- Wireless metrics: RSSI, PER / retry rate, disconnect reason, reconnect counter.
- Power noise hints: radio rail ripple or short rail dips during switching transitions.
High confidence diagnosis: retries or dropouts cluster at specific switching edges or at certain duty bands (low/medium/high).
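To check for clustering in duty bands, logged retry counts can be binned by the duty in effect when they were recorded. A minimal sketch: the band boundaries (0.33 and 0.66) and the sample layout of `(duty, retries, packets)` tuples are assumptions, since the counters actually available depend on the radio stack.

```python
def retry_rate_by_duty_band(samples, low=0.33, high=0.66):
    """Return retry rate per duty band from (duty, retries, packets) samples."""
    totals = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}
    for duty, retries, packets in samples:
        band = "low" if duty < low else "medium" if duty < high else "high"
        totals[band][0] += retries
        totals[band][1] += packets
    # None marks a band with no traffic, so it is not mistaken for a clean band.
    return {band: (r / p if p else None) for band, (r, p) in totals.items()}
```

A rate that climbs from the low to the high band points at load-dependent coupling, while a flat profile with edge-aligned bursts points at the switching transitions themselves.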
Switching noise raises ripple on the radio rail or 3V3. The RF front-end loses margin and retry rate climbs. Evidence: rail ripple rises with duty, retries rise with duty.
High-current returns inject noise into the RF/baseband reference. Evidence: ADC noise and wireless errors increase together at edges.
Noisy nodes or long harnesses near the antenna cause receiver desense. Evidence: RSSI looks stable but PER/retries spike.
| Pattern | Symptom | Evidence | Likely cause | First fix |
|---|---|---|---|---|
| Only heating | Link drops occur only when duty window toggles. | Reconnect spikes align with SSR_GATE edges; radio rail shows short dips. | Rail margin + switching injection into radio supply. | Stiffen radio rail locally; reduce edge injection; validate ripple reduction and fewer retries. |
| Specific power band | Worst at medium or high duty, fine at low duty. | PER/retry rate rises monotonically with duty; Vcc ripple increases with load. | Load-dependent coupling (rails or returns) rather than random RF interference. | Improve partitioning/returns and rail decoupling; confirm retry-vs-duty curve flattens. |
| RSSI ok | RSSI looks normal but latency and retries spike. | PER rises at switching edges; near-field probe shows noise near antenna keepout. | Receiver desense from near-field coupling; RF front-end loses effective SNR. | Enforce antenna keepout and move noisy nodes/loops away; re-test PER under heating. |
| Edge-specific | Drops happen only at certain switching transitions. | Errors cluster at edge timestamps; changing window phase shifts the cluster. | Switching event collides with RF critical timing window. | Apply switching synchronization (avoid RF critical slots); validate error clusters disappear. |
Scope note: If the issue is proven to correlate with switching duty/edges, prioritize device-side coupling paths. Avoid jumping to gateway/cloud assumptions until the correlation evidence fails.
H2-10 — Validation Test Plan: Maximum Coverage with Minimal Instruments
Core idea: A compact validation plan should cover the highest risks—thermal behavior, power stage stress, immunity to disturbances, and safety under sensor faults—while producing consistent logs that enable fast field triage.
Thermocouple or contact probe; optional thermal camera for SSR and enclosure hotspots.
Oscilloscope/recorder for Vcc rails + switching reference (SSR_GATE or AC_ZC).
Serial/event logs for reset reason, reconnect counter, fault flags, and control state.
Control: T_floor, T_air, duty, window length, min on/off, cap, zone ID
Power: Vcc_3V3/5V, (optional) radio rail, switching edge marker
Events: reset reason (BOR/WDT), reconnect counter, fault flags, safety state (fail-safe off / latched)
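The three field groups above map naturally onto one flat log record per sample. The sketch below shows one possible layout written as CSV. The field names, units, and types are assumptions chosen to mirror the list, not an existing log format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class LogRecord:
    """One log row; names and units are illustrative assumptions."""
    ts_s: float            # timestamp in seconds on the shared timebase
    zone: int              # zone ID
    t_floor_c: float
    t_air_c: float
    duty: float            # 0..1
    window_s: float
    min_on_off_s: float
    floor_cap_c: float
    vcc_3v3: float         # volts, measured near the MCU/radio pins
    vcc_5v: float
    edge_marker: int       # 1 if a switching edge occurred in this sample
    reset_reason: str      # e.g. "BOR", "WDT", or "" when none
    reconnect_count: int
    fault_flags: int       # bit field
    safety_state: str      # e.g. "normal", "failsafe_off", "latched"


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(LogRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()
```

Keeping control, power, and event fields in the same row on one timebase is what makes later correlation (edges versus resets, duty versus retries) a simple column operation.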
| Test item | Method | Pass | Log fields |
|---|---|---|---|
| Step response: cold start → target comfort (floor-limited and/or air-controlled with cap) | Apply a setpoint step; record T_floor/T_air curve and duty over time; repeat for at least two zones or two simulated thermal paths. | Overshoot bounded; stable settling without chatter; consistent rise-time behavior per zone after tuning. | T_floor, T_air, duty, window, zone |
| Steady-state hold: ripple and switching count | Hold near setpoint for extended period; count switching transitions per hour; track ripple amplitude. | No oscillation; acceptable ripple amplitude; switching not excessive for the chosen actuator. | duty, min_on_off, switch_count |
| SSR temperature rise across duty bands | Run low/medium/high duty profiles; measure SSR case temperature trend (thermocouple or IR). | No runaway rise; temperature stabilizes under steady conditions; margins remain under worst-case ambient. | duty, switch_count, T_case |
| Brownout / rail sag immunity during heating transitions | Induce controlled rail stress (edge-heavy transitions); capture Vcc + SSR reference; observe reset reasons and reconnect counters. | No resets; no reconnect storms; if reset occurs, recovery enters fail-safe off and does not re-energize unexpectedly. | Vcc, SSR_GATE/AC_ZC, reset_reason, reconnect, safety_state |
| ESD / EFT / surge (device-side injection points) | Exercise entry points: AC entry, enclosure, probe cable; track resets and safe recovery behavior. | No unsafe output; faults are captured; system returns to known safe state after disturbance. | fault_flags, reset_reason, safety_state |
| Sensor open / short / drift injection | Inject open/short; simulate drift with offset; verify detection thresholds and safe-state behavior. | Fault is detected quickly; output goes fail-safe off or enters strict limit mode; user-visible alarm is triggered. | sensor_status, fault_flags, safety_state |
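For the step-response test item, "overshoot bounded" and "stable settling" can be read directly off the logged curve. The sketch below computes both from a temperature log. The function name `step_metrics` and the ±0.3 °C settling band are assumptions, and the pass limits themselves remain a product decision.

```python
def step_metrics(times_s, temps_c, setpoint_c, band_c=0.3):
    """Return (overshoot_c, settling_time_s) for a logged setpoint step.

    settling_time_s is the earliest logged time after which every sample
    stays within +/- band_c of the setpoint, or None if the log never settles.
    """
    overshoot = max(0.0, max(temps_c) - setpoint_c)
    settle = None
    # Walk backwards from the end while samples stay inside the band.
    for t, y in zip(reversed(times_s), reversed(temps_c)):
        if abs(y - setpoint_c) > band_c:
            break
        settle = t
    return overshoot, settle
```

Running the same function on logs from two zones makes the "consistent rise-time behavior per zone after tuning" criterion a numeric comparison instead of a visual one.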
Critical pass condition: After any reset or disturbance, the system must recover into a safe output state (fail-safe off or strict limited mode), never energizing heating unexpectedly.
H2-11 — Field Debug SOP: Symptom → Evidence → Isolate → Fix
Field failures often look like “software bugs” but resolve faster by proving correlation between temperature curves, switching timing, and rail integrity. This SOP uses a repeatable 4-step template with two measurements first, then a single discriminator to isolate root cause.
Template (use for every symptom): First 2 measurements → Discriminator → Likely root cause (ranked) → First fix (fast)
T_floor, T_air, dT/dt (rise rate), overshoot, settling time
SSR_GATE timing, AC_ZC reference, duty/window length, min on/off, switch_count
Vcc_3V3/5V (and optional radio rail), reset_reason (BOR/WDT), fault flags, reconnect/retry counters
Symptom A — Temperature Overshoot / “Too Hot” Floor
Applies when: electric floor heating with time-proportional control (SSR/relay/triac). Not for water loops or boiler systems.
First 2 measurements
- T_floor curve (and T_air if available): log at 1–5 s interval.
- Duty + window length (or SSR_GATE timing) aligned to the same timeline.
Discriminator (one-shot)
- If overshoot grows when window is short or deadband is small, the control loop is “too eager” for a high-inertia system.
- If overshoot happens even with long windows, suspect probe thermal coupling (probe does not represent actual floor surface temperature).
Likely root cause (ranked)
- Probe placement / thermal coupling error: probe too close to heating wire, wrong embed depth, no sleeve/tube, or local hotspot.
- Control parameters too aggressive: short windows, tiny deadband, missing min on/off, no slope limiting.
- Wrong control mode: air-controlled without a strict floor cap, or cap set too high for flooring type.
First fix (fast)
- Containment: enforce a floor cap, increase deadband, add min on/off, and lengthen the time window.
- Permanent: rework probe placement (sleeve/tube, avoid heating wire adjacency), then re-tune using step response.
MPN examples (verify ratings & approvals)
Escalate/RFQ trigger: repeated overshoot after correct probe placement and tuning indicates thermal model mismatch or hardware limitations (actuator constraints, safety cap strategy, or required re-layout of sensing lines).
Symptom B — Temperature Display Jumps / Drifts
Goal: distinguish real thermal changes from sampling noise, coupling, or probe faults.
First 2 measurements
- Raw ADC code (pre-filter) and filtered temperature output in parallel.
- SSR edge reference (SSR_GATE or AC_ZC) to check edge-aligned noise bursts.
Discriminator (one-shot)
- If spikes cluster at switching edges, the issue is coupled noise (layout/grounding/timing).
- If drift persists without edge correlation, suspect probe aging, divider resistor drift, or reference drift.
Likely root cause (ranked)
- Probe cable picks up switching noise: probe routed with mains/heater lines; large loop area; no RC/guarding.
- Sampling timing issue: sampling during high dv/dt transitions; insufficient settling after switching.
- Probe connection intermittency: loose terminals; micro-cracks; moisture ingress causing leakage.
- Resistor/reference drift: divider resistor tempco or reference instability changes conversion gain.
First fix (fast)
- Containment: move sampling away from edges, add median filtering and debounced fault thresholds.
- Hardware quick-fix: add RC low-pass near ADC pin, shorten probe routing, separate from mains bundle.
- Permanent: improve routing (twist/route away from SSR loop), tighten grounding, adjust front-end impedance.
MPN examples (front-end & protection)
Symptom C — SSR Overheats / Fails
Focus: prove whether the dominant loss is conduction loss, triggering/EMI loss, or thermal path failure.
First 2 measurements
- SSR case temperature trend (T_case) during low/med/high duty.
- Load current presence (clamp meter) or power level estimate + switching count/hour.
Discriminator (one-shot)
- If T_case rises roughly with current and on-time, conduction loss + thermal path dominates.
- If T_case rises sharply with frequent switching rather than with on-time, switching or triggering losses dominate; reduce switching (window/min on/off) and re-check.
Likely root cause (ranked)
- Thermal design insufficiency: no heatsink margin, poor mounting, enclosure hot spots.
- Underrated SSR/triac: current/ambient derating ignored; repetitive surge events.
- dv/dt false triggering: inadequate snubber/MOV leading to unintended conduction heating.
First fix (fast)
- Containment: increase time window, enforce min on/off, and limit maximum duty under hot ambient.
- Hardware quick-fix: add snubber and MOV where appropriate; improve heatsinking contact.
- Permanent: re-select SSR/triac with correct derating; redesign thermal path.
MPN examples (actuator & drive)
Safety note: Any suspected “stuck-on” behavior must default to fail-safe off via independent hardware cutoff (thermal fuse / safety thermostat chain).
Symptom D — Reboots When Heating Turns On/Off
This symptom is often a power integrity problem triggered by switching transients.
First 2 measurements
- Vcc rail at the MCU/radio pins (3V3/5V and optional radio rail), captured with edge timing.
- SSR_GATE or AC_ZC aligned to Vcc to prove cause/effect.
Discriminator (one-shot)
- If Vcc dips align with switching edges and reset_reason = BOR, the root cause is rail margin/return path.
- If no Vcc dip is visible but reset_reason = WDT, suspect firmware lockup triggered by EMI or brownout side-effects.
Likely root cause (ranked)
- Insufficient bulk + high di/dt return: switching transient injects ground bounce or rail droop.
- UVLO margin too tight: PSU collapses briefly under mains disturbance or load steps.
- Entry disturbance: surge/EFT coupling into PSU, causing short rail interruptions.
First fix (fast)
- Containment: reduce edge aggressiveness (switching rate), avoid rapid toggling, and log brownout events.
- Hardware quick-fix: increase local bulk near MCU/radio; improve return routing; add input suppression.
- Permanent: separate noisy power/returns from logic/radio domains; validate across worst-case mains events.
MPN examples (power integrity)
Symptom E — Wireless Drops Only During Heating
Objective: prove correlation between retries/PER and switching duty/edges; then isolate rail vs near-field coupling.
First 2 measurements
- Retries/PER/reconnect counter vs duty (log per minute or per window).
- SSR edge timestamps (SSR_GATE or AC_ZC) to detect edge-clustered failures.
Discriminator (one-shot)
- If failures cluster at edges, apply switching synchronization (phase shift). If clusters move or disappear, coupling is confirmed.
- If RSSI is stable but PER spikes, suspect receiver desense from near-field coupling (antenna keepout violation).
Likely root cause (ranked)
- Rail ripple into radio domain (radio rail decoupling/partitioning insufficient).
- High di/dt loop near antenna (near-field coupling, harness antenna effect).
- Timing collision (switching edge overlaps RF critical window).
First fix (fast)
- Containment: shift switching windows away from RF critical activity; reduce switching edge density.
- Hardware quick-fix: strengthen radio rail decoupling; enforce antenna keepout; route noisy loops away.
- Permanent: redesign return paths and partitioning; validate PER vs duty becomes flat.
MPN examples (EMI/ESD and radio rail)
Symptom F — One Zone Never Heats / Intermittent Heating
Goal: separate “control not commanding” vs “command present but no power delivery”.
First 2 measurements
- Zone SSR_GATE presence (or relay coil drive) for the affected zone.
- T_floor response (rise slope) within a controlled on-window.
Discriminator (one-shot)
- If drive is present but T_floor does not respond, suspect power path / wiring / load.
- If drive is absent, check sensor fault flags or cap/lockout conditions for that zone.
Likely root cause (ranked)
- Wiring/terminal issue: loose terminal, swapped zone wiring, neutral/line error.
- Actuator channel failure: SSR open, relay contact damage.
- Sensor fault lockout: probe open/short triggers fail-safe off for that zone.
First fix (fast)
- Containment: force a short diagnostic on-window and confirm command + response.
- Hardware quick-fix: verify terminals and actuator channel; replace suspect actuator; clear latched faults only after evidence.
- Permanent: add per-zone current presence sensing or actuator health check hooks.
MPN examples (zone actuation)
Symptom G — Intermittent False Alarms (Overtemp / Probe Fault)
Target: confirm whether alarms are triggered by real thermal conditions or by edge-coupled sensing noise.
First 2 measurements
- Fault flags with timestamp (and latch state if implemented).
- Raw sensor evidence around the alarm: ADC code, open/short thresholds, and edge reference.
Discriminator (one-shot)
- If alarms align with switching edges, apply timing/RC changes. If alarms disappear, the root cause is sensing coupling.
- If alarms occur without edge correlation, suspect real hotspots, probe intermittency, or moisture leakage.
Likely root cause (ranked)
- Thresholds too tight + insufficient debounce for a noisy environment.
- Probe intermittency: connector oxidation, micro-movement, moisture ingress.
- True thermal event: localized hotspot due to installation or insulation changes.
First fix (fast)
- Containment: widen fault debounce and ensure safe output behavior (fail-safe off or strict limit mode).
- Hardware quick-fix: improve probe ESD/EMI protection; add RC and routing separation.
- Permanent: add probe integrity checks (open/short + plausibility) and alarm latching rules.
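Debounce and latching can be combined in one small state holder. The sketch below is an assumption about how such a rule might look, not a description of any existing firmware: `trip_count` consecutive faulty samples set the latch, and the latch clears only when the caller supplies confirming evidence and the raw fault is currently absent.

```python
class DebouncedFault:
    """Latch a fault only after trip_count consecutive faulty samples."""

    def __init__(self, trip_count=5):
        self.trip_count = trip_count
        self._streak = 0
        self.latched = False

    def update(self, raw_fault):
        """Feed one sample; returns True while heat must stay off."""
        self._streak = self._streak + 1 if raw_fault else 0
        if self._streak >= self.trip_count:
            self.latched = True
        return self.latched

    def clear(self, evidence_ok):
        """Clear the latch only with evidence and no active raw fault."""
        if evidence_ok and self._streak == 0:
            self.latched = False
        return not self.latched
```

A single edge-coupled glitch resets the streak instead of tripping the alarm, while a persistent fault still forces the safe state.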
MPN examples (fault robustness)
Quick MPN Shortlist (Common Fix Parts)
Examples only. Always verify voltage/current/creepage approvals, thermal derating, and safety requirements.
Vishay H11AA1, onsemi MOC3063, onsemi MOC3023
ST BTA16-600B, Littelfuse Q6008, Omron G2RL, Crydom D2425
TDK EPCOS B722 MOV, KEMET R46 X2 cap, Nexperia PESD5V0S1BA, Littelfuse SMBJ TVS
TI TPS3823, Microchip MCP1316, MPS MP1584, TI TPS62130, TI TLV75533
H2-12 — FAQs (Evidence-First, Device-Side Only)
Each answer stays inside the device boundary and closes the loop with two measurements, a single discriminator, and the fastest fix. No HEMS/cloud/gateway deep-dive.
1. “It’s scheduled ON but still not warm”: probe placement or no real power?
First prove command vs response: capture SSR_GATE (duty/window) and T_floor rise rate (dT/dt). If SSR_GATE is active but dT/dt stays near zero, isolate wiring/load/actuator (open relay, failed triac/SSR, wrong terminal). If SSR_GATE is absent, check sensor plausibility or safety lockout. Fast fix: force a short diagnostic ON window and verify current presence.
2. Temperature looks stable, but comfort feels hot/cold: window too short or filtering too heavy?
Log duty + window length and compare raw ADC vs filtered T_floor. If comfort oscillation repeats with a period close to the control window, the window/deadband is too aggressive—lengthen the window and enforce min on/off. If the display lags real changes (raw moves but filtered barely moves), filtering/settling is too heavy—reduce smoothing and avoid sampling near switching edges.
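Lengthening the window and enforcing min on/off can be illustrated with a time-proportioning helper. This is a minimal sketch; the 20-minute window and 2-minute minimums are assumed example values, and the rounding rule (skip or stretch short pulses) is one possible policy among several.

```python
def on_time_in_window(duty, window_s=1200.0, min_on_s=120.0, min_off_s=120.0):
    """On-time (s) for one control window, honoring min on/off times."""
    if min_on_s + min_off_s > window_s:
        raise ValueError("window too short for the min on/off times")
    duty = min(max(duty, 0.0), 1.0)
    on_s = duty * window_s
    if on_s < min_on_s:
        # Short pulse: drop it if under half the minimum, else stretch it.
        return 0.0 if on_s < min_on_s / 2 else min_on_s
    off_s = window_s - on_s
    if off_s < min_off_s:
        # Short gap: stay on if under half the minimum, else stretch the gap.
        return window_s if off_s < min_off_s / 2 else window_s - min_off_s
    return on_s
```

With this shape there is at most one on/off pair per window, so switching density is bounded by the window length rather than by sensor noise.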
3. Over-temp alarms happen often: real overheating or noisy probe wiring/contacts?
Correlate fault timestamps with SSR edges (SSR_GATE/AC_ZC) and inspect the raw ADC around the event. If alarms cluster at switching edges, it is usually coupling or intermittency—add RC near the ADC, separate probe routing from mains, and improve contact reliability. If alarms occur without edge correlation and dT/dt remains high, treat as a real thermal event and force fail-safe behavior.
4. Is a very hot SSR “normal”? How to judge risk quickly?
Quantify, then decide: measure I_load (or power level) and SSR case temperature trend during low/med/high duty. If temperature rises roughly with on-time and current, conduction loss plus thermal path dominates; reduce switching density and improve heatsinking or derate maximum duty. If it heats abnormally during frequent toggling, increase the window and enforce min on/off. Re-select actuator only after evidence.
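A first-order estimate helps decide whether a measured case temperature is plausible. The sketch below assumes conduction loss can be approximated as a constant on-state voltage drop times load current, averaged over duty; the 1.2 V default and the thermal resistance argument are placeholders that must come from the actual SSR datasheet and heatsink.

```python
def ssr_avg_loss_w(i_load_a, duty, v_on=1.2):
    """Average conduction loss (W); v_on is an assumed on-state drop."""
    return v_on * i_load_a * duty

def est_case_temp_c(t_amb_c, i_load_a, duty, rth_ca_c_per_w, v_on=1.2):
    """Steady-state case temperature estimate from case-to-ambient Rth."""
    return t_amb_c + ssr_avg_loss_w(i_load_a, duty, v_on) * rth_ca_c_per_w
```

If the measured T_case tracks this estimate across low/med/high duty, conduction loss and the thermal path explain the heating. A large excess during frequent toggling points toward switching behavior instead.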
5. Only at high power the link drops: EMI coupling or power droop? Which two waveforms?
Capture Vcc at MCU/radio and retries/PER while aligning with SSR edges. If Vcc dips or resets show BOR, it is rail margin/return-path—add local bulk, tighten UVLO margin, and separate noisy returns. If Vcc is stable but PER spikes at edges, it is EMI/near-field coupling—reduce edge density, improve snubber/MOV strategy, and enforce antenna keepout plus a clean radio rail.
6. Set 26°C but it keeps hitting 29°C: tune parameters first or add a floor cap?
Apply safety-first layering: first verify that a floor cap exists and that T_floor is actually limited by it. If there is no cap (or it is set too high), add or lower the cap before tuning. If a cap exists and overshoot still happens, tune for the slow system: lengthen the time window, increase the deadband, and enforce min on/off. Validate by step response and measured overshoot reduction, not by “feel” alone.
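The layering order can be shown as a single decision function. This sketch assumes the setpoint applies to T_air and the cap applies to T_floor, which the page does not state explicitly; the function name and the 0.5 °C deadband are illustrative.

```python
def heat_request(t_air_c, t_floor_c, setpoint_c, floor_cap_c,
                 deadband_c=0.5, currently_on=False):
    """Return True if heat may be requested for the next window."""
    # Layer 1: the floor cap always wins, regardless of air temperature.
    if t_floor_c >= floor_cap_c:
        return False
    # Layer 2: hysteresis around the setpoint to avoid rapid toggling.
    if t_air_c <= setpoint_c - deadband_c:
        return True
    if t_air_c >= setpoint_c + deadband_c:
        return False
    # Inside the deadband: keep the previous state.
    return currently_on
```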
7. One room is always colder: thermal resistance difference or zone parameters not independent?
Compare zones under the same commanded duty: log duty and each zone’s dT/dt. If dT/dt differs strongly, the dominant factor is installation/thermal resistance or probe coupling—treat it as a physical delta and calibrate expectations. If zones share one probe or share parameters, the controller cannot correct per-room behavior—enable per-zone caps, offsets, and window settings, or add a per-zone probe where required.
8. After power loss, the state feels wrong: restore last duty or default OFF?
Default to fail-safe: use reset_reason (PowerOn/BOR/WDT) and latched faults to decide behavior. For BOR/WDT or any uncertain state, force outputs OFF and re-validate sensors and caps before re-enabling heat. Only restore a gentle ramp (not the last duty) when the system is clean and stable. Fast fix: add a supervisor and log brownout events; avoid “instant resume” on unstable mains.
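The restart policy reduces to a short decision table. The sketch below assumes the platform exposes a reset reason with the three values named above; the enum, the function name, and the returned action strings are hypothetical.

```python
from enum import Enum

class ResetReason(Enum):
    POWER_ON = "PowerOn"
    BOR = "BOR"
    WDT = "WDT"

def startup_action(reason, latched_fault, sensors_valid):
    """Choose the output behavior after a reset."""
    uncertain = reason in (ResetReason.BOR, ResetReason.WDT)
    if uncertain or latched_fault or not sensors_valid:
        # Outputs OFF until sensors and caps are re-validated.
        return "OFF_REVALIDATE"
    # Clean start: ramp gently instead of restoring the last duty.
    return "GENTLE_RAMP"
```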
9. Probe open/short: what is the safest expected behavior?
A safe design detects open/short by ADC thresholds with debounce, then forces a predictable output state. The minimum safe behavior is fail-safe OFF plus a visible error and a logged fault flag. Avoid “keep heating at last duty” under sensor uncertainty. Add plausibility checks (rate-of-change and range) and require a stable sensor window before clearing the fault. This prevents runaway heating from a broken probe or connector.
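Threshold, range, and rate-of-change checks can be combined with a stable-window rule for clearing. This is a sketch under assumptions: a 12-bit ADC where an open probe reads near full scale and a short reads near zero (the mapping depends on the divider topology), and all thresholds shown are placeholders.

```python
def probe_status(adc_code, prev_temp_c, temp_c, dt_s,
                 open_code=4000, short_code=100,
                 temp_range_c=(-10.0, 60.0), max_rate_c_per_s=0.5):
    """Classify one probe sample; anything except "ok" means heat OFF."""
    if adc_code >= open_code:
        return "open"
    if adc_code <= short_code:
        return "short"
    if not (temp_range_c[0] <= temp_c <= temp_range_c[1]):
        return "out_of_range"
    if dt_s > 0 and abs(temp_c - prev_temp_c) / dt_s > max_rate_c_per_s:
        return "implausible_rate"
    return "ok"

def heat_allowed(statuses, stable_samples=10):
    """Allow heat only after stable_samples consecutive "ok" readings."""
    recent = statuses[-stable_samples:]
    return len(recent) == stable_samples and all(s == "ok" for s in recent)
```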
10. Relay clicking complaints: is switching to SSR enough? What are the trade-offs?
Start with control strategy: if clicking comes from frequent toggling, increase the window and add min on/off first. Switching to SSR reduces audible noise but introduces heat dissipation, possible leakage current, and dv/dt sensitivity. A zero-cross drive can reduce EMI, but thermal design still matters. Decide with evidence: switch_count/hour, T_case, and EMI symptoms. Then choose relay/triac/SSR accordingly.
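The switch_count/hour figure can be derived from a sampled drive log. The sketch assumes the relay or SSR_GATE state is logged at a fixed period; the function name and log format are hypothetical.

```python
def switches_per_hour(gate_samples, sample_period_s):
    """Count state transitions in a sampled drive log, normalized per hour."""
    if len(gate_samples) < 2 or sample_period_s <= 0:
        return 0.0
    transitions = sum(1 for a, b in zip(gate_samples, gate_samples[1:]) if a != b)
    duration_h = (len(gate_samples) - 1) * sample_period_s / 3600.0
    return transitions / duration_h
```

Comparing this number before and after a window or min on/off change shows whether the control strategy alone removes the clicking complaint.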
11. Floor temperature looks fine, but room air won’t warm: strategy issue or heat loss?
Use evidence to separate “control” from “capacity”: observe whether T_floor plateaus near the cap while T_air stays below target. If yes, the controller is doing what it is allowed to do; the limiting factor is heat transfer/heat loss (or too strict a cap). If T_floor is not reaching target and duty is constrained, tune window/deadband and verify sensor coupling. Confirm with step tests and steady-state error logs.
12. Which two test categories are most often missed and cause painful field failures?
Two gaps dominate: (1) disturbance + recovery and (2) fault injection. Disturbance tests include brownout during switching, EFT/ESD at probe lines and enclosure, and verifying safe restart states. Fault injection includes probe open/short, stuck-on actuator detection, and alarm latching/clearing rules. A related “comfort gap” is skipping long-duration steady-state tests across different thermal resistances and with multiple zones heating simultaneously.