Boiler / Furnace Control: Flame Detection, Drives & Comms
Core idea: A boiler/furnace control board is a safety-first controller that proves flame, verifies airflow, and authorizes the gas valve through a hard lockout path—while using rails, sensor chains, and event logs as the evidence to diagnose lockouts, false flame, and intermittent resets.
In practice: Most “mystery faults” become deterministic once flame/air/valve timing is correlated with rail droop and timestamped logs, enabling fast isolation and the smallest safe first fix.
Featured Answer (Engineering Boundary)
A boiler/furnace control board safely manages ignition and steady combustion by proving flame, enforcing interlocks, and driving the inducer/blower, gas valve, and igniter while monitoring temperature/pressure. This page focuses on board-level hardware evidence (signals, power rails, lockout paths, and event logs), not thermostat UI, cloud apps, or protocol/register-map business logic.
The content is organized around five evidence anchors (flame, air/DP, valve, rails, and log) that recur in every chapter: prove what happened, isolate why, then apply the smallest safe fix.
System Block & Safety Domains
A boiler/furnace controller is best understood as five coupled domains: a safety interlock chain that must fail-safe, high-energy actuator drives (fans/valves/igniter), low-level sensing (flame/temperature/pressure), logic & event recording (MCU/WD/log), and rugged communications (RS-485/Ethernet). Clear domain boundaries prevent false lockouts and speed field isolation.
The flame decision must directly gate the gas valve enable path, and the lockout cause must be recoverable from an event record (fault code + timestamp + reset reason). This is the backbone for separating true hazards from false trips caused by EMI or rail droop.
What this chapter establishes (so later chapters stay vertical)
- Safety interlock chain: flame detect → valve enable gating → lockout action (fail-safe).
- Actuators: inducer/blower/pump/valves/igniter are treated as noise sources and load-step drivers.
- Sensing: flame, temperature, and pressure/DP are handled as low-level evidence sources that require shielding from switching noise.
- Low-voltage logic: MCU + AFE + watchdog/brownout + event log (diagnosis starts here).
- Mains/high-voltage: relay/triac/SSR drive + isolation boundary (not a certification walkthrough).
- Comms/diagnostics: RS-485/Ethernet hardware robustness (no protocol-stack deep dive).
First measurement entry points (minimal-tool, high-discrimination)
| Domain | What it proves | First 2 measurements |
|---|---|---|
| Safety chain | Whether lockout is caused by true interlock failure vs false decision | Flame raw + flame_ok decision timing; valve_enable gating vs lockout flag/event |
| Actuators | Whether load steps inject droop/noise that cascades into sensing or resets | Fan/igniter/valve drive waveform; 3V3/5V rail droop during load events |
| Sensing | Whether sensor chain is stable and consistent with physics | Sensor supply/reference ripple; ADC/comp output before filtering/decision |
| Logic & record | Whether resets/WD/brownout are the root cause of “random” failures | Reset pin + reset reason; watchdog behavior + event log fields |
| RS-485/Ethernet HW | Whether comm activity correlates with flame false trips or dropouts | RS-485 A/B differential + common-mode; PHY reset/link vs rail noise |
Flame Detection AFE Deep Dive
Flame detection becomes reliable when it is treated as a measurable signal chain rather than a binary “has flame / no flame” assumption. The controller must reject ignition noise, ground bounce, and leakage paths while still proving weak flames within a defined time window.
Two sensing routes (choose the correct evidence model)
- Ionization current / flame rectification: most common; a high-impedance electrode signal is conditioned into a stable “flame present” decision.
- UV flame sensing: used in specific burners; a phototube/photodiode current is converted by a TIA and filtered to reject ambient-light triggers.
Ionization / rectification AFE: pull a weak signal out of ignition noise
Engineering intent: ignition generates large dv/dt and di/dt events; the AFE must settle fast enough to meet flame-proving time limits while rejecting synchronous spikes and common-mode jumps.
UV AFE: TIA + filtering to avoid ambient-light false triggers
- TIA stability: maintain a clean baseline (dark current + leakage) and controlled bandwidth.
- Filter selection: reject slow ambient changes and supply ripple without delaying flame prove.
- False triggers: reflections, sunlight, hot surfaces, and sensor contamination must not create a stable “flame_ok”.
High-value failure mode: false flame from leakage or contamination
- Leakage path raises the integrated node even with no flame (humidity, residues, polluted surfaces).
- Rectifier/integrator can turn leakage into DC bias that sits inside the comparator window.
- Verification: compare “no flame” baseline drift across humidity/temperature changes and observe whether the decision flips without any ignition event.
Measure TP3 (flame raw) and TP4 (flame_ok decision) together. If TP4 flips while TP3 shows only baseline drift (no ignition/no flame evidence), suspect leakage/thresholding. If TP3 shows large synchronous spikes and TP4 chatters during ignition, suspect coupling/ground bounce/filtering.
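The TP3/TP4 rule above can be expressed as a small bench-analysis helper. This is a minimal sketch for post-processing a scope or logger capture, not firmware; the function name, the `spike_step` threshold, and the "two or more flips counts as chatter" rule are assumptions chosen for illustration and would need tuning against real captures.

```python
def classify_flame_capture(tp3_raw, tp4_ok, igniter_on, spike_step):
    """Classify one capture of time-aligned samples.

    tp3_raw    : list of float, flame raw signal (TP3)
    tp4_ok     : list of bool, flame_ok decision (TP4)
    igniter_on : list of bool, igniter drive state
    spike_step : sample-to-sample TP3 change treated as a spike (assumed value)
    """
    # Indices where the flame_ok decision changes state.
    flips = [i for i in range(1, len(tp4_ok)) if tp4_ok[i] != tp4_ok[i - 1]]
    if not flips:
        return "no decision change"

    # Largest sample-to-sample step separates spikes from slow baseline drift.
    max_step = max(abs(tp3_raw[i] - tp3_raw[i - 1]) for i in range(1, len(tp3_raw)))

    flips_during_ignition = [i for i in flips if igniter_on[i]]
    if len(flips_during_ignition) >= 2 and max_step >= spike_step:
        return "suspect coupling / ground bounce / filtering"
    if not any(igniter_on) and max_step < spike_step:
        return "suspect leakage / thresholding"
    return "inconclusive"
```

A result of "inconclusive" is intentional: it signals that the capture does not match either pattern cleanly and that more evidence (rails, log) is needed before choosing a fix.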
Ignition & Flame-Proving Sequence (Evidence-Timed)
The ignition flow is valuable only when it produces discriminating evidence. Each phase is defined by what must be true (interlocks), what is driven (igniter/valve/fan), and what the controller must record to make field failures recoverable.
Phases and what they must prove
| Phase | Purpose | Required interlock | Failure evidence |
|---|---|---|---|
| Purge | Clear chamber; establish airflow | Air-proving true (DP/switch + tach consistency) | DP not reached; fan starts then rail droops; interlock toggles |
| Ignition | Create ignitable condition | Air-proving stays true; valve still gated | Igniter drive present but TP3 baseline never settles; TP4 chatters |
| Flame Prove | Prove flame within time window | Valve enabled only in allowed window; flame_ok must become stable | TP3 shows evidence but TP4 never asserts; or asserts then drops |
| Run | Maintain stable combustion | Flame_ok maintained; protections valid | Flame loss vs false trip separated by TP3+TP4 and event log |
| Retry / Lockout | Bounded recovery; fail-safe stop | Retry counter; lockout gate must cut valve enable | Log must preserve phase, cause code, and last-sample snapshot |
Igniter drive proves the noise event occurred; valve enable proves the safety gate behavior; flame_ok proves decision stability. If flame_ok appears before valve enable, suspect false flame (leakage/threshold). If valve enable is present but flame_ok never asserts, suspect true no-flame or AFE suppression. If flame_ok chatters synchronized with ignition, suspect coupling/ground bounce/filtering.
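The three ordering rules above can be applied to timestamps pulled from an event log or a logic-analyzer capture. The sketch below is illustrative only: the argument names, the `chatter_limit` default, and the order in which the rules are evaluated are assumptions, not something the sequence definition prescribes.

```python
def classify_ignition_attempt(valve_enable_t, flame_ok_t, flame_ok_flips, chatter_limit=2):
    """Classify one ignition attempt from event timing.

    valve_enable_t : time valve_enable first asserted, or None if it never did
    flame_ok_t     : time flame_ok first asserted, or None if it never did
    flame_ok_flips : number of flame_ok transitions while the igniter was driven
    chatter_limit  : flips treated as chatter (assumed threshold)
    """
    # Chatter synchronized with ignition points at the coupling path.
    if flame_ok_flips >= chatter_limit:
        return "coupling / ground bounce / filtering"
    # Flame reported with no gas path open cannot be a real flame.
    if flame_ok_t is not None and (valve_enable_t is None or flame_ok_t < valve_enable_t):
        return "false flame (leakage / threshold)"
    # Gas path opened but flame never proved.
    if valve_enable_t is not None and flame_ok_t is None:
        return "true no-flame or AFE suppression"
    return "sequence consistent"
```

For example, `classify_ignition_attempt(valve_enable_t=5.0, flame_ok_t=3.2, flame_ok_flips=1)` lands on the false-flame branch because flame_ok precedes valve enable.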
Fan / Blower Drive + Air-Proving (DP Interlock)
Blower and draft-inducer issues are the fastest way to trigger no-ignition, unstable flame, or “high-speed only” lockouts. The reliable approach is a closed evidence loop: command → fan rail → tach → DP response → interlock decision, with coupling checks into the flame AFE.
Board-level interfaces (what to measure first)
Control boards often drive an external fan module or a power stage through PWM/EN and then rely on tach + DP as the “air-proving” proof. The goal is not perfect speed control here; the goal is stable interlock evidence.
Two mandatory evidence chains (non-negotiable)
- Chain A — Fan rail droop + tach build: capture TP-FAN_V during start and speed step; verify TP-TACH reaches a stable frequency without missing pulses.
- Chain B — DP response curve: DP switch/state must change at expected airflow; DP analog must be monotonic with speed steps (not necessarily linear, but consistent).
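Both chains lend themselves to simple automated checks on logged data. The following sketch assumes tach pulse timestamps and per-speed-step DP readings have already been exported from a capture; the function names, the `max_gap_ratio` of 1.5, and the `tolerance` parameter are hypothetical choices for illustration.

```python
from statistics import median

def tach_missing_pulses(pulse_times, max_gap_ratio=1.5):
    """Return indices of inter-pulse gaps longer than max_gap_ratio x the median gap.

    Assumes the capture covers a steady-speed interval (Chain A).
    """
    gaps = [t1 - t0 for t0, t1 in zip(pulse_times, pulse_times[1:])]
    typical = median(gaps)
    return [i for i, g in enumerate(gaps) if g > max_gap_ratio * typical]

def dp_is_monotonic(speed_steps, dp_values, tolerance=0.0):
    """True if DP never decreases (beyond tolerance) as fan speed increases (Chain B)."""
    pairs = sorted(zip(speed_steps, dp_values))
    return all(later[1] >= earlier[1] - tolerance
               for earlier, later in zip(pairs, pairs[1:]))
```

The monotonic check deliberately ignores linearity, matching the requirement that DP be consistent with speed steps rather than proportional to them.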
Symptoms → discriminators (minimal measurements)
| Symptom | First 2 measurements | Discriminator (what it proves) |
|---|---|---|
| Fan does not spin | TP-PWM/TP-EN + TP-FAN_V | If command exists but TP-FAN_V collapses, suspect inrush or rail limitation; if rail is stable but tach absent, suspect harness/driver/fan module. |
| Speed unstable / tach jitter | TP-TACH + low-voltage rail (3V3/5V) | Tach dropouts aligned with low-voltage ripple indicate common return/power integrity; tach noise with stable rails points to input conditioning or harness coupling. |
| DP never proves (air-proving fail) | TP-DP + TP-TACH | Normal tach but DP flat indicates DP chain issue; low tach and DP fail indicates fan rail/drive or excessive load (use evidence to separate). |
| Only fails at high speed | TP-FAN_V droop at speed step + TP3 flame raw | If high-speed step causes rail droop and TP3 noise rises together, suspect shared return or injection into flame AFE; if droop dominates without TP3 noise, suspect rail capacity/inrush. |
When fan speed changes, observe whether TP3 (flame raw) noise increases in sync with TP-PWM edges or with TP-FAN_V droop. PWM-synchronous noise suggests harness/edge coupling; droop-synchronous noise suggests common return or supply injection that shifts the AFE reference.
Gas Valve / Damper / Pump Drivers (Protection + Hard Cut)
Actuation faults must be separated into logic (valve_enable), energy delivery (coil/motor voltage or current), and safe turn-off (clamp and hard-cut behavior). This chapter focuses on board-level drive topologies, protection, and measurable proof that lockout removes valve energy even if firmware fails.
Load map (classify by measurable evidence)
- AC gas valve: relay / triac / SSR. Evidence is coil voltage waveform and guaranteed de-energize under lockout.
- DC solenoid valve: low-side MOSFET or high-side switch. Evidence is coil current ramp + flyback clamp at turn-off.
- Damper / 3-way valve: DC motor (H-bridge) or stepper. Evidence is drive phase waveforms + enable gating.
- Pump: relay/triac/driver. Evidence is supply step response and return-path stress during start/stop.
Two mandatory evidence chains
- Chain A — valve_enable + coil energy: capture valve_enable together with coil voltage or coil current; both must collapse when lockout asserts.
- Chain B — turn-off transient: observe the turn-off spike and verify the clamp path (diode/TVS/snubber) keeps stress bounded.
Common failures → discriminators (minimal measurements)
| Failure | First 2 measurements | Discriminator (what it proves) |
|---|---|---|
| Valve does not actuate | valve_enable + coil V/I | If enable is present but coil energy is absent, suspect drive path (relay/triac/MOSFET/harness). If enable is absent, interlock/sequence is blocking (refer to the Ignition & Flame-Proving Sequence timing). |
| Actuates then drops out (chatter/bounce) | coil current vs supply droop | Current that ramps and then collapses together with rail droop or enable gating indicates a supply/drive limitation; current that stays stable while the valve still drops out suggests a non-electrical cause (use evidence to exclude the board path). |
| Coil overheats | steady-state current + duty/hold control | Overcurrent or excessive hold energy points to gating/drive strategy; verify the commanded hold state matches measured current. |
| Driver device fails (blows) | turn-off spike + clamp node | Unclamped spikes or long loops indicate insufficient clamp or poor return path; verify clamp node behavior (diode/TVS/snubber actually conducts). |
Lockout must remove valve energy through a hardware gating path that does not rely on firmware timing. Verification requires capturing lockout/valve_enable together with coil V/I and confirming that coil energy collapses immediately when lockout is asserted.
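One way to turn this verification into a repeatable number is to measure the delay between lockout asserting and coil energy collapsing in a capture. The sketch below assumes the capture is exported as time-ordered tuples; the function name, the tuple layout, and `off_threshold` are illustrative assumptions, and the acceptable latency itself has to come from the applicable safety requirement, not from this code.

```python
def cutoff_latency(samples, off_threshold):
    """Time from the first lockout assertion to coil energy collapse.

    samples       : time-ordered list of (t, lockout, coil_value) tuples,
                    where coil_value is coil voltage or current
    off_threshold : coil_value below this counts as de-energized (assumed)

    Returns the latency, or None if lockout never asserts or the coil
    never de-energizes within the capture (a must-prove-off failure).
    """
    t_lock = None
    for t, lockout, coil_value in samples:
        if t_lock is None:
            if not lockout:
                continue
            t_lock = t  # first sample where lockout is asserted
        if coil_value < off_threshold:
            return t - t_lock
    return None
```

A `None` result after lockout has asserted is the important case: it means the capture shows no proof that the hardware gate removed valve energy.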
Temperature & Pressure Sensing Chain (Robust Diagnostics)
The sensing chain should be engineered for robust field behavior rather than peak accuracy. Most “random over-temp” and “pressure false alarm” cases are traceable to raw ADC artifacts, sensor-supply/Vref ripple, and missing consistency gates against flame/airflow state.
Board-level measurement points (start here)
The minimum evidence pair for any jumpy sensor complaint is: pre-filter samples plus Vref / sensor-supply ripple. Without both, “filter tuning” becomes guesswork.
Temperature chain: multi-point sensors, noise, and open/short resilience
- Multi-point role separation: supply/return water, flue, and heat-exchanger sensors should enable cross-checks (a single channel jumping alone is a strong wiring/chain signal).
- Filter strategy (field-first): combine a light low-pass for ripple with outlier suppression for contact bounce; diagnostics should observe pre-filter behavior in parallel.
- Open/short detection: detect rail-saturated codes early (before the protection logic interprets them as real over-temp).
- Harness and return-path coupling: correlate sensor jumps with fan speed steps or ignition events to separate real thermal change from injected noise.
Pressure chain: ratiometric pitfalls and reference contamination
- Sensor supply ripple: pressure sensors often track supply; a noisy TP-SENS_V can appear as “pressure oscillation”.
- Vref ripple: a moving TP-VREF turns every ADC channel into a power-noise probe; pressure false alarms frequently align with load steps.
- MUX/settling artifacts: fast channel scanning without adequate settle time can create step-like jumps in raw codes (distinct from true pressure dynamics).
Three-gate diagnostics: range + rate + consistency
| Gate | What it blocks | Evidence to verify |
|---|---|---|
| Range check | Open/short or impossible sensor states being treated as real conditions | TP-ADC_T / TP-ADC_P saturations + explicit fault code (do not rely on filtered value) |
| Rate check | Unphysical jumps driving over-temp / pressure trips | Pre-filter sample-to-sample delta (raw slope), correlated with load-step timestamps |
| Consistency check | False trips that contradict combustion/airflow state | Temperature rise should align with flame_ok; DP changes should align with tach and commanded fan state |
If raw ADC steps track TP-VREF or TP-SENS_V ripple, the root cause is reference/supply injection. If only a single channel jumps while others remain stable, suspect wiring/contact/chain. If temperature/pressure alarms contradict flame_ok or DP/tach state, tighten the consistency gate before adjusting thresholds.
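The range, rate, and consistency gates can be chained on pre-filter ADC codes in that order. This is a deliberately crude sketch for a temperature channel: it assumes a higher code means hotter (many NTC dividers are inverted), and every parameter name and limit here is hypothetical.

```python
def gate_sample(prev_code, code, lo_code, hi_code, max_delta, flame_ok, rise_no_flame=20):
    """Run one raw ADC code through range, rate, and consistency gates.

    lo_code / hi_code : codes at or beyond these are treated as open/short
    max_delta         : largest physically plausible sample-to-sample change
    rise_no_flame     : rise allowed without flame before flagging (assumed)
    """
    # Range gate: rail-saturated codes are sensor faults, not temperatures.
    if code <= lo_code or code >= hi_code:
        return "range_fault"
    # Rate gate: unphysical jumps must not reach the trip logic.
    if abs(code - prev_code) > max_delta:
        return "rate_reject"
    # Consistency gate: a clear rise with no flame contradicts combustion state.
    if code - prev_code > rise_no_flame and not flame_ok:
        return "consistency_flag"
    return "accept"
```

Running the gates on raw codes rather than filtered values keeps the open/short and spike evidence visible, which is the point of the range and rate rows in the table above.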
Power Tree & Brownout / Surge Robustness (Board-Level Consequences)
This chapter avoids power-topology derivations and focuses on how rail events corrupt safety evidence. Load steps from ignition, fans, and valves can produce rail droop, threshold shift, and partial resets, leading to false trips or confusing lockouts unless reset/windowing and logs are designed for proof.
Power domains (what each rail “hosts”)
- 12V (actuation domain): fans, relays/triacs/valves, ignition-related loads; primary source of load-step stress.
- 5V (intermediate/sensor domain): sensor supply, comms, auxiliary logic; vulnerable to injection from 12V events.
- 3V3 (logic/AFE domain): MCU, comparators/AFE, state tracking; most sensitive to brownout and reference drift.
Brownout classes (why “no reset” can still fail)
| Class | What happens | Typical board symptom | Evidence to capture |
|---|---|---|---|
| Threshold shift (no reset) | Rails sag but do not cross POR/BOD; analog thresholds and references move | false flame/over-temp/DP trip without a reset record | TP-3V3/TP-5V ripple + TP-VREF + trip timestamp |
| Hard reset | POR/BOD triggers; state machine restarts | unexpected lockout or retry loops | TP-RESET low pulse + event log “reset cause” |
| Partial init / timing window | rails recover during boot; sampling begins before AFE/refs settle | early false codes right after power-on | boot timeline + first-sample snapshot vs rail settling |
Power-on & reset windowing (safety default + valid sampling window)
- Safety default: during boot/reset, valve energy must remain gated off by hardware (lockout/hard gate), not by firmware timing.
- Valid sampling window: flame/temperature/pressure evidence should be accepted only after rails and references settle; pre-window samples should be logged as “invalid,” not interpreted as faults.
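The valid sampling window can be modeled as a tag applied to every sample rather than a filter that drops data. The sketch below marks samples as invalid until the rail has stayed inside its band for a settle time; the band limits, `settle_time`, and the tuple layout are assumptions for illustration.

```python
def tag_samples(samples, rail_min, rail_max, settle_time):
    """Tag each sample as valid only after the rail has settled.

    samples : time-ordered list of (t, rail_v, value) tuples
    Returns a list of (t, value, valid) tuples; invalid samples are kept
    so they can be logged as "invalid" instead of interpreted as faults.
    """
    in_band_since = None
    tagged = []
    for t, rail_v, value in samples:
        if rail_min <= rail_v <= rail_max:
            if in_band_since is None:
                in_band_since = t
        else:
            in_band_since = None  # any excursion restarts the settle timer
        valid = in_band_since is not None and (t - in_band_since) >= settle_time
        tagged.append((t, value, valid))
    return tagged
```

Restarting the timer on every excursion means a droop during a load step also reopens the invalid window, which covers the "threshold shift, no reset" brownout class as well as boot.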
MOV/TVS/fuse/inrush limiting should be treated as “rail event shapers.” Detailed implementation belongs to the EMC/Safety subsystem page; here the focus remains on rail consequences, reset records, and proof.
Two mandatory evidence chains
- Chain A — rail waveform + reset/WD: capture TP-3V3 and TP-5V during fan/ignition/valve steps, together with TP-RESET (or reset cause) and watchdog status.
- Chain B — event log timeline: timestamps for load-step events and fault codes must align with the rail behavior (otherwise the logging chain is not robust under stress).
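Chain B can be checked mechanically by pairing logged event timestamps with rail events seen on the scope. This sketch assumes both timelines have already been converted to a common time base, which in practice usually needs a shared trigger or marker; the function name and `tolerance` are hypothetical.

```python
def match_events(log_times, scope_times, tolerance):
    """Pair logged events with measured rail events.

    Returns (unmatched_log, unmatched_scope):
      unmatched_log   : logged events with no rail event within tolerance
      unmatched_scope : rail events the log never recorded
    """
    unmatched_log = [t for t in log_times
                     if all(abs(t - s) > tolerance for s in scope_times)]
    unmatched_scope = [s for s in scope_times
                       if all(abs(s - t) > tolerance for t in log_times)]
    return unmatched_log, unmatched_scope
```

Entries in `unmatched_scope` are the more worrying result: they indicate rail events that happened under stress without leaving a record.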
RS-485 / Ethernet Hardware Integration (Evidence-Driven)
Intermittent comms, random dropouts, and post-surge port failures are usually rooted in common-mode stress, ESD/surge paths, and PHY/transceiver supply or reset integrity. This section stays at the hardware layer: measure the physical evidence first, then decide whether termination, biasing, isolation, shielding reference, or rail windowing is the true limiter.
Board-level test points (minimum set)
For “CRC errors” or “sometimes works”: capture A/B differential and common-mode together. For “Ethernet drops”: capture PHY rail and PHY reset together.
RS-485 transceiver: what matters for field faults
- ESD and surge tolerance: prevents latent port damage that manifests as rare errors after a discharge event.
- Fault protection: survives A/B shorts to ground or supply and bus contention without burning the interface.
- Common-mode range (CMR): directly determines whether ground potential differences become CRC errors and random framing loss.
- Failsafe behavior: avoids floating-bus noise being interpreted as data during idle or wiring faults.
Termination & biasing (principles, proven by measurements)
- Biasing goal: idle A–B should settle to a stable polarity; a floating idle that wanders typically increases error bursts.
- Termination goal: reduce reflections at the ends of the line; incorrect placement often shows up as edge ringing and timing margin loss.
- Evidence pair: measure A/B differential plus TP-485_CM. Large common-mode jumps frequently track the error bursts more than the differential amplitude does.
Ethernet: PHY + magnetics + common-mode management
- PHY rail and reset integrity: transient rail droop or reset glitches can create “link flaps” without any protocol involvement.
- Magnetics isolation: transformer isolation helps, but common-mode transients can still couple through parasitics and shield paths.
- CMC/ESD at the connector: treat these as the port’s “stress steering” elements; a damaged protection part can cause leakage that drags rails or distorts common-mode.
Isolation decision (when it is justified)
| Interface | Trigger condition (principle) | Hardware evidence | What isolation changes |
|---|---|---|---|
| RS-485 | Uncontrolled ground potential difference, long cables, repeated surge/ESD exposure | TP-485_CM frequently approaches/exceeds transceiver CMR; errors correlate with CM jumps | Breaks the ground loop; improves CM headroom; shifts stress to isolated barrier + port protection |
| Ethernet | Shield/common-mode is the dominant coupling path; repeated port damage events | Link flaps correlate with TP-ETH_RAIL / TP-ETH_RST; shield reference behavior correlates with failures | Magnetics already isolate data; emphasis becomes shield/chassis strategy and rail/reset robustness |
If RS-485 errors rise while TP-485_CM swings widely, prioritize common-mode/ground reference and isolation decisions. If Ethernet drops align with TP-ETH_RAIL ripple or TP-ETH_RST glitches, prioritize PHY supply windowing and reset conditioning. If a surge event precedes persistent failure, check whether the port protection now leaks or clamps abnormally (evidence first; no protocol assumptions).
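To test the first rule quantitatively, error timestamps can be compared against common-mode excursions from a TP-485_CM capture. The sketch below is an assumption-laden illustration: `cm_limit` must come from the transceiver datasheet, and `window` and the function name are hypothetical.

```python
def cm_error_overlap(cm_samples, error_times, cm_limit, window):
    """Fraction of comm errors that fall near a common-mode excursion.

    cm_samples  : list of (t, volts) from TP-485_CM
    error_times : timestamps of CRC/framing errors on the same time base
    cm_limit    : |volts| at or above this counts as an excursion
    window      : max time distance between an error and an excursion
    """
    if not error_times:
        return 0.0
    excursions = [t for t, volts in cm_samples if abs(volts) >= cm_limit]
    hits = sum(1 for e in error_times
               if any(abs(e - x) <= window for x in excursions))
    return hits / len(error_times)
```

A high fraction supports prioritizing ground reference and isolation decisions; a low fraction suggests looking at termination, biasing, or rails instead.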
Functional Safety Hooks: Lockout, Self-Test, Event Recording
Safety must be expressed as verifiable engineering hooks: a lockout matrix tied to evidence, self-tests that detect stuck inputs and actuator faults, and an event record that reconstructs failures with minimal ambiguity. The goal is a proof-friendly chain: event log + reset reason + flame/air/valve triad.
Lockout trigger matrix (evidence-bound)
| Trigger | Safety action | Required evidence snapshot | Notes (anti-false-trip) |
|---|---|---|---|
| Flame fail / flame loss | Immediate valve energy off + lockout/retry policy | flame_ok + valve_enable + timestamp | use debounce window for momentary drop only if valve energy is still safe |
| Air-proving fail (DP/pressure) | Block ignition/valve enable; escalate to lockout if persistent | DP state/value + fan command/tach + timestamp | consistency: DP should track tach and commanded fan state |
| Over-temp | Reduce/stop heat; lockout if hard limit is crossed | temp raw/filtered + flame_ok + stage + timestamp | reject single-sample spikes that contradict flame/air state |
| Pressure fault | Block ignition/valve; lockout if unsafe region reached | pressure raw + Vref/sensor supply status + timestamp | differentiate sensor-chain noise vs real pressure change (pre-filter evidence) |
| Stuck relay/triac / valve drive fault | Hard gate off + record fault + lockout | command vs feedback/energy signature + timestamp | treat as “must-prove-off” fault class |
Hard cut-off path (survives MCU failure)
- Lockout latch / gate: a hardware gate must collapse valve energy even if firmware is stalled.
- Proof requirement: when lockout asserts, valve_enable and the valve energy signature must drop consistently.
- Capture: log the lockout reason together with the last known flame_ok, DP, and valve_enable states.
Self-tests (signal-verifiable)
Each self-test should produce a recordable contradiction (command vs evidence), not a “silent fix.” The key is to capture the raw state at the moment the test fails.
Event recording: minimum fields that reconstruct failure
- Timestamp (at least monotonic ordering)
- Stage (purge / ignite / prove / run / retry / lockout)
- Lockout reason (single enumerated code)
- Evidence snapshot: flame_ok, DP state/value, valve_enable, fan command/tach, temp/pressure status flags
- Retry counters (attempt index and remaining budget)
- Reset reason (POR/BOD/WD/other)
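The minimum field list above maps naturally onto a fixed record type. The sketch below models it for log-decoding tools on the host side; field widths, the `0 = none` convention for `lockout_reason`, and the contradiction rules are assumptions, not a defined log format.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    PURGE = "purge"
    IGNITE = "ignite"
    PROVE = "prove"
    RUN = "run"
    RETRY = "retry"
    LOCKOUT = "lockout"

class ResetReason(Enum):
    POR = "por"
    BOD = "bod"
    WD = "wd"
    OTHER = "other"

@dataclass(frozen=True)
class EventRecord:
    timestamp: int           # monotonic tick count (ordering is the minimum)
    stage: Stage
    lockout_reason: int      # single enumerated code; 0 = none (assumed)
    flame_ok: bool
    dp_ok: bool
    valve_enable: bool
    fan_cmd_pct: int
    tach_hz: int
    status_flags: int        # temp/pressure status bits (layout assumed)
    retry_attempt: int
    retries_remaining: int
    reset_reason: ResetReason

def contradictions(rec):
    """List command-vs-evidence contradictions visible in one record."""
    found = []
    if rec.stage is Stage.LOCKOUT and rec.valve_enable:
        found.append("valve_enable set during lockout")
    if rec.flame_ok and not rec.valve_enable:
        found.append("flame_ok without valve_enable (possible false flame)")
    if rec.fan_cmd_pct > 0 and rec.tach_hz == 0:
        found.append("fan commanded but no tach")
    return found
```

Keeping the snapshot in the same record as the reason code is what lets a single log entry be checked for contradictions without cross-referencing other entries.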
Watchdog gating (prevents “fake running”)
- Feed condition: watchdog feed should be allowed only when critical safety checks have executed and evidence is internally consistent.
- WD record: on WD reset, store reset cause and the last stage + last lockout reason so the next boot can report a coherent story.
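The feed condition can be stated as a predicate that the main loop must satisfy before it is allowed to service the watchdog. This is a logic sketch only (real firmware would implement it in the target language); the check names and function name are hypothetical.

```python
def may_feed_watchdog(checks_executed, required_checks, evidence_consistent):
    """Allow a watchdog feed only when every required safety check ran this
    cycle and the evidence snapshot is internally consistent.

    checks_executed : set of check names completed in the current cycle
    required_checks : set of check names that must run every cycle
    """
    return required_checks <= checks_executed and evidence_consistent

REQUIRED = {"flame", "air_proving", "valve_feedback"}  # assumed check names

# A cycle that skipped the valve feedback check must not feed the watchdog.
print(may_feed_watchdog({"flame", "air_proving"}, REQUIRED, True))
```

Tying the feed to executed checks rather than to a timer tick is what prevents a loop that is alive but no longer evaluating safety inputs from appearing healthy.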
Collect event log (timestamp + stage + reason), reset reason, and the triad waveform/states: flame_ok + air-proving (DP) + valve_enable/energy signature. If lockout occurs without reset, suspect threshold shift/noise injection; if WD reset aligns with load steps, suspect rail windowing and boot sampling timing.
Validation & Field Debug Playbook (Symptom → Evidence → Isolate → Fix)
This playbook is designed for fast isolation with minimal tools. Every symptom is forced to land on one (or more) evidence anchors: flame, air (DP), valve, rails, log. Use the same 5-line template repeatedly to avoid “maybe causes” and converge on a discriminator.
Required triad for safety-related incidents: event log + reset reason + flame/air/valve. If any of these are missing, add the missing hook before deep analysis.
1) Ignition fails (no flame proving). Evidence anchors: flame • valve • log
- Symptom: Purge/ignite runs, but no flame_prove within the proving window.
- First 2 measurements: (1) flame sense raw (ionization waveform); (2) valve_enable + igniter_drive timing.
- Discriminator: If valve_enable/igniter_drive are present yet the flame waveform never crosses the expected window → flame AFE threshold/injection is dominant. If valve_enable is blocked → interlock/lockout path dominates (air/limit chain).
- First fix: Shift the flame sampling/proving window away from the highest ignition noise interval; tighten input current limiting and reduce common-mode injection into the flame front end.
- Prevent: Record proving-window start/stop timestamps + flame_ok decision + last flame_raw snapshot in the event log.
2) Flame drops after 2–10 s. Evidence anchors: flame • valve • log
- Symptom: Flame is proven, then flame loss triggers retry/lockout shortly after.
- First 2 measurements: (1) flame_ok and/or flame_raw; (2) valve_enable (or valve energy signature if available).
- Discriminator: If flame_ok drops before valve_enable drops → either true flame loss or flame signal corruption. If valve_enable drops first → hard gate, drive fault, or rail window dominates.
- First fix: Add a short flame-loss debounce only when valve energy is confirmed stable; otherwise cut immediately and diagnose injection paths.
- Prevent: Log a “3-signal snapshot” at flame loss: flame_ok + DP state + valve_enable.
3) Fan start causes reset or fault. Evidence anchors: rails • log • air
- Symptom: Draft inducer/blower command triggers MCU reset, brownout, or immediate fault.
- First 2 measurements: (1) 3V3/5V rail waveform during fan start; (2) reset reason (POR/BOD/WD) + event timestamp.
- Discriminator: If the rail crosses the BOR/BOD threshold and reset asserts → brownout is confirmed. If rails stay inside the window but a fault occurs → sensor/AFE injection or timing windowing dominates.
- First fix: Separate the high-di/dt fan domain return from the logic/AFE return; adjust reset/boot sampling windows to occur after rail stabilization.
- Prevent: Add “load step validation”: fan start/stop at cold/hot line while logging rails + reset reason.
4) Lockout only at high power stage. Evidence anchors: air • rails • flame
- Symptom: Low stage runs, but high stage triggers lockout (air fail, flame fail, or mixed).
- First 2 measurements: (1) DP/air-proving vs fan command/tach; (2) rail ripple when the power stage changes.
- Discriminator: If DP fails to track fan state → airflow/DP chain dominates. If DP is consistent but flame misreports during high noise → flame AFE common-mode injection dominates.
- First fix: Add a DP↔tach consistency gate; harden flame windowing and reduce common-mode coupling from high-power switching/commutation events.
- Prevent: Validate at boundary conditions: high stage, maximum fan, cold line, and wet/humid stress.
5) Enabling comms causes false flame. Evidence anchors: flame • rails • log
- Symptom: RS-485/Ethernet activity correlates with flame false-positive or false-negative.
- First 2 measurements: (1) TP-485_CM (common-mode) or PHY rail ripple; (2) flame_raw + comparator/flame_ok output.
- Discriminator: If false flame aligns with common-mode jumps → comms ground reference/CM injection dominates. If it aligns with rail ripple/reset behavior → rail windowing dominates.
- First fix: Control common-mode at the port entry (biasing/ESD path/shield reference) and harden the flame input against CM injection; isolate only when evidence supports it.
- Prevent: Include a “comms on/off transient” in validation while logging TP-485_CM (or PHY rail) + flame decision.
6) Temperature spikes cause over-temp trip. Evidence anchors: rails • log
- Symptom: Heating appears normal, yet a sudden temp jump triggers over-temp.
- First 2 measurements: (1) pre-filter ADC codes (raw); (2) Vref or sensor-supply ripple at the same timestamp.
- Discriminator: If raw codes track Vref/supply ripple → reference/supply injection dominates. If single-sample spikes correlate with switching events → wiring/return coupling dominates.
- First fix: Add range + rate gates, plus a “consistency gate” with flame/air state; reject impossible temp transitions that contradict the operating stage.
- Prevent: Record a short pre-trigger ring buffer (N samples) of raw codes around over-temp events.
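A pre-trigger ring buffer of this kind is small enough to sketch directly. The example uses Python's `collections.deque` with `maxlen` to model the behavior; the class name and depth are illustrative assumptions, and firmware would typically use a fixed array with a wrapping index instead.

```python
from collections import deque

class PreTriggerBuffer:
    """Keep the last `depth` raw ADC codes so a trip can log what preceded it."""

    def __init__(self, depth):
        self._codes = deque(maxlen=depth)  # oldest entries fall off automatically

    def push(self, raw_code):
        self._codes.append(raw_code)

    def snapshot(self):
        """Return the buffered codes, oldest first, for the event record."""
        return list(self._codes)

buf = PreTriggerBuffer(depth=4)
for code in [100, 101, 102, 103, 900]:
    buf.push(code)
print(buf.snapshot())
```

Capturing the snapshot at the moment of the trip shows whether the over-temp was a single-sample spike or the end of a real ramp.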
7) Valve actuates but combustion is unstable. Evidence anchors: valve • flame • log
- Symptom: Valve command is present, but flame signal is unstable or feedback is inconsistent.
- First 2 measurements: (1) valve_enable + energy signature (current/voltage proxy); (2) flame_raw (or flame_ok).
- Discriminator: Valve energy stable but flame unstable → flame chain robustness/threshold/windowing dominates. Flame stable but valve energy fluctuates → drive protection, kickback clamp, or rail coupling dominates.
- First fix: Verify kickback clamp behavior and drive timing; ensure the event log captures both command and energy proxy at transitions.
- Prevent: Add a drive self-test: command-off must prove energy collapse within a fixed timeout (stuck relay/triac class).
8) Humid/rainy conditions cause false flame or missed flame. Evidence anchors: flame • log
- Symptom: After humidity exposure, false flame is detected (no flame) or flame is missed (real flame).
- First 2 measurements: (1) no-flame baseline of the flame input (idle); (2) flame input bias/leakage indicators (baseline shift, CM drift, or comparator bias drift).
- Discriminator: “Flame-like” rectification signature present with valve disabled → leakage/contamination path dominates. Flame present but amplitude is reduced with higher noise → window/threshold is no longer robust under leakage/CM shift.
- First fix: Increase leakage tolerance (input impedance strategy, bias control, cleaning/guarding) and retune proving windows based on measured wet baselines.
- Prevent: Add wet/leakage stress validation and store baseline + threshold margin in the event record.
IC / BOM Selection (MPN Classes) + RFQ-Ready Checklist
This section lists IC classes and concrete MPN examples that match the control-board evidence chain: flame sensing robustness, air-proving consistency, valve/fan drive survivability, rail/reset determinism, and comms common-mode tolerance. Select by “field symptom consequence” first, then map to specs.
A) Flame detect front-end (ionization/rectification) — MPN examples
| Class | What it solves | Key specs to watch | Example MPNs (common families) |
|---|---|---|---|
| Low-bias op-amp / integrator | Stable rectified-signal integration under leakage/humidity; reduces false prove | Input bias current, offset drift, EMI robustness, supply range | TI OPA333 / OPA376 • ADI ADA4528-1 • Microchip MCP6V01 |
| Low-power comparator (window building block) | Hard flame_ok thresholding after integration; supports proving windows | Input protection, hysteresis control, propagation vs noise immunity | TI TLV1701/TLV1702 • TI TLV3701 • ST TS881 • onsemi LMV331 |
| Precision reference (for stable thresholds) | Prevents “works in lab, trips in field” due to Vref wander | Tempco, noise, load regulation, startup behavior | TI REF3330/REF5030 • ADI ADR4525/ADR3450 • onsemi LM4040 |
Practical selection hook: if “humidity → false flame/miss” is frequent, prioritize ultra-low input bias and leakage tolerance (guarding + bias strategy), then use a window comparator stage with controlled hysteresis.
B) ADC / Comparator / Reference for temperature & pressure chains — MPN examples
| Class | What it solves | Key specs to watch | Example MPNs |
|---|---|---|---|
| ΔΣ ADC (sensor chains) | Stable readings with filtering; supports “raw code” evidence for spike triage | Noise, programmable data rate, input mux behavior, ref pin options | TI ADS1120 / ADS1220 • ADI AD7793 • Microchip MCP3561 |
| I²C ADC (quick integration) | Fast bring-up for NTC/pressure monitoring; easy to log raw codes | Gain options, input range, conversion time, I²C robustness | TI ADS1115 • ADI AD7997 • Microchip MCP3421 |
| Supervisor reference / shunt ref | Enables stable thresholding and open/short diagnostics under ripple | Dynamic impedance, temp drift, startup behavior | TI TL431 • onsemi NCP431 • ST TL431A |
C) Drivers (relay/triac/SSR, solenoid, fan PWM) — MPN examples
| Load | Selection focus | Key specs to watch | Example MPNs |
|---|---|---|---|
| Relay coil drive | Robust coil drive + clamp; avoids rail droop back-injecting into logic | Clamp strategy, current capability, thermal headroom | TI ULN2003A/ULN2803A • ST ULN2003A • onsemi ULN2803A |
| AC valve via triac/SSR | Predictable triggering; dv/dt tolerance; safe “prove-off” behavior | Isolation class (opto), zero-cross vs random-phase, dv/dt immunity | onsemi MOC3063 (zero-cross) • Vishay VO3063 • onsemi MOC3023 (random-phase) |
| DC solenoid / valve driver IC | Kickback control + current shaping; enables “energy signature” evidence | Peak/hold control, diagnostics, clamp behavior | TI DRV103 • TI DRV110 • onsemi NCV7751 (H-bridge/actuator class) |
| Fan PWM gate driver | Clean switching edges without logic rail upset | Gate drive current, UVLO, dV/dt immunity, supply range | TI TPS28225 • Microchip MCP1402 • onsemi NCP81071 |
Practical selection hook: if “fan start → reset” appears, treat driver choice + clamp/return path as a rail-integrity problem, not just a drive-current problem.
D) Supervisors (BOR / watchdog / sequencing) — MPN examples
| Function | What it solves | Key specs to watch | Example MPNs |
|---|---|---|---|
| Voltage supervisor (BOR/reset) | Eliminates “half-boot” and undefined sampling during droop | Threshold accuracy, hysteresis, reset delay, reset output type | TI TPS3808 • TI TPS3823 • Analog Devices MAX809/810 family |
| Window watchdog | Prevents “fake running” and enforces safety-check cadence | Window settings, reset behavior, supply range | TI TPS3430 • Analog Devices MAX6369 • Microchip MCP1316/1416 family |
| Sequencing / power-good | Ensures AFE/ADC sampling only after rails are valid | PG thresholding, timing, open-drain behavior | TI TPS3890 • TI TPS229xx (load switch class) • ADI ADM6315 family |
E) RS-485 transceiver (fault-protected, wide CMR) + isolation — MPN examples
| Need | Selection focus | Example transceiver MPNs | Example isolated RS-485 MPNs |
|---|---|---|---|
| General robust 485 | ESD, fault protection, wide common-mode range, failsafe | TI THVD1500 • TI SN65HVD1781 • ADI ADM3485E • Maxim MAX13487E | ADI ADM2682E/ADM2687E • TI ISO1452 (single-chip isolated transceiver) |
| Port ESD clamp (board-level) | Steer ESD/surge away from transceiver; avoid leakage after hits | Semtech SM712 (RS-485 TVS) | — |
Isolation decision must be evidence-based: if TP-485_CM repeatedly approaches the transceiver’s CMR and correlates with errors, isolation is justified.
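The evidence rule above can be turned into a small offline check over logged samples. This is a minimal sketch: the `CmSample` fields, the 1 V margin, the 80% fraction, and the minimum event count are illustrative assumptions, and the CMR limits must come from the datasheet of the transceiver actually used.

```python
from dataclasses import dataclass

@dataclass
class CmSample:
    t_ms: int          # timestamp of the capture interval (hypothetical log field)
    cm_volts: float    # common-mode voltage measured at TP-485_CM
    comm_error: bool   # True if a framing/CRC error was logged in this interval

def isolation_justified(samples, cmr_min_v, cmr_max_v,
                        margin_v=1.0, min_fraction=0.8, min_events=3):
    """True when comm errors repeatedly coincide with CM near the CMR limits."""
    errors = [s for s in samples if s.comm_error]
    if len(errors) < min_events:
        return False  # not enough evidence to call it "repeated"
    near_limit = [s for s in errors
                  if s.cm_volts >= cmr_max_v - margin_v
                  or s.cm_volts <= cmr_min_v + margin_v]
    return len(near_limit) / len(errors) >= min_fraction
```

A result of `False` does not prove isolation is unnecessary; it only says the captured data does not yet support the common-mode explanation.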
F) Ethernet PHY + port protection hooks — MPN examples
| Class | What it solves | Key hooks to verify | Example MPNs |
|---|---|---|---|
| 10/100 PHY | Stable link under rail ripple and reset windowing | Rail quality, reset timing, strap configuration stability | Microchip LAN8720A • TI DP83825/DP83848 • Microchip KSZ8081/KSZ8051 |
| Ethernet ESD array | Improves port survival; reduces latent leakage faults | Low capacitance, placement at connector entry | TI TPD4E1U06 • Nexperia PESD1ETH series • Littelfuse SP305x series |
RFQ-ready selection checklist (copy/paste)
- Appliance type: Boiler / Furnace (gas / electric / hybrid)
- Flame method: ionization/rectification (default) or UV sensor (if applicable)
- Actuators: valve type (AC triac/SSR or DC solenoid), igniter type, fan type (PWM/ECM interface), pump (if present)
- Sensors: temp points (HX/flue/supply/return), pressure types (water/gas/DP)
- Comms: RS-485 and/or Ethernet; isolation requirement (evidence: TP-485_CM behavior)
- Rails: logic rails (3.3V/5V), actuator rails (12V/24V), brownout symptoms (yes/no)
- Top symptom (from H2-11): pick 1–2 and provide available evidence: flame/air/valve/rails/log
Notes on MPN usage: verify voltage ratings, isolation requirements, safety approvals, and thermal margins against the target market and appliance class. MPNs above are example families; final selection should follow the evidence anchors and the RFQ checklist.
FAQs (Evidence-Locked, Accordion-Ready)
Each question is answerable using only in-page evidence: flame, air (DP), valve, rails, log, or temp/pressure. Every answer follows the same rule: pick two measurements, apply a single discriminator, then do the smallest first fix.
Q1
“Ignition is active” but lockout always happens—check flame first or valve_enable first?
Answer: Start with valve_enable, because it proves whether the safety gate actually authorizes fuel; then confirm the flame evidence.
First 2 measurements: valve_enable timing and flame_raw/flame_ok during the proving window (evidence: valve + flame).
Discriminator: If valve_enable is missing or drops first, the lockout path is upstream (interlock/limit). If valve_enable is stable but flame never crosses the window, the flame AFE threshold/noise window dominates.
First fix: Align proving windows away from peak ignition noise and log timestamps for valve_enable, flame decision, and retries.
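The Q1 discriminator can be sketched as an offline classifier over a logged proving window. The function name, the return labels, and the assumption that both inputs are equal-rate boolean samples covering only the proving window are illustrative, not taken from any specific firmware.

```python
def classify_lockout(valve_enable, flame_ok):
    """Classify a lockout from boolean samples taken across one proving window."""
    if True not in valve_enable:
        return "upstream_interlock"      # fuel was never authorized
    start = valve_enable.index(True)
    if False in valve_enable[start:]:
        return "upstream_interlock"      # authorization dropped inside the window
    if True not in flame_ok[start:]:
        return "flame_afe"               # valve stable, flame never proved
    return "inconclusive"                # both look healthy; widen the capture
```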
Q2
Flame drops 3–5 seconds after lighting—true flame loss or flame AFE chatter? What two proofs decide?
Answer: Prove the order of events: flame decision versus valve authorization. The first signal to fail defines the root path.
First 2 measurements: flame_ok (or comparator output) and valve_enable edge timing at the moment of the drop (evidence: flame + valve).
Discriminator: If flame_ok falls while valve_enable stays high, suspect AFE injection/windowing or leakage drift. If valve_enable falls first, suspect a hard gate/drive fault or rail windowing that forces shutdown.
First fix: Add a short, evidence-gated flame-loss debounce only when valve energy is confirmed stable; otherwise cut and log a 3-signal snapshot.
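One way to express the evidence-gated debounce from Q2 is shown below. It is a sketch under stated assumptions: `debounce_samples` is a hypothetical tuning value, the total debounce time must stay within the flame-failure response time allowed by the applicable safety standard, and the hard gas-valve gating path in hardware is assumed to remain independent of this logic.

```python
def should_cut_fuel(flame_ok_history, valve_energy_stable, debounce_samples=3):
    """Return True when fuel should be cut based on recent flame_ok samples."""
    if not flame_ok_history:
        return True                      # no evidence at all: fail safe
    if flame_ok_history[-1]:
        return False                     # flame currently proved
    if not valve_energy_stable:
        return True                      # no debounce without stable valve energy
    recent = flame_ok_history[-debounce_samples:]
    # Cut when history is too short to debounce, or every recent sample is False.
    return len(flame_ok_history) < debounce_samples or not any(recent)
```

Whenever the function returns `True`, the 3-signal snapshot (flame, valve, rails) should be logged alongside the cut.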
Q3
After rain/humidity, “false flame” appears—where is the most common leakage path and how to verify?
Answer: The highest-yield suspect is leakage/contamination around the high-impedance flame input, creating a rectification-like bias even with no flame.
First 2 measurements: Capture a no-flame baseline (valve disabled) of flame_raw and of the flame comparator output (evidence: flame).
Discriminator: If “flame-like” signatures exist with the valve disabled, the cause is leakage/PCB surface contamination. If real flame exists but amplitude collapses under noise, thresholds/windows lack margin under wet baselines.
First fix: Improve leakage tolerance (guarding/cleaning/bias strategy) and store the wet baseline and threshold margin in the event record.
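The wet-baseline margin from Q3 can be computed directly from a no-flame capture. This sketch assumes `flame_raw` is logged in integer ADC counts with the valve disabled; `threshold` and `min_margin` are placeholder values to be set from the actual design.

```python
def baseline_check(flame_raw_no_flame, threshold, min_margin):
    """Return (verdict, margin) for a no-flame capture taken with the valve disabled."""
    baseline = max(flame_raw_no_flame)   # worst-case leakage-induced reading
    margin = threshold - baseline
    if margin <= 0:
        return ("leakage_false_flame", margin)  # baseline alone crosses the threshold
    if margin < min_margin:
        return ("low_margin", margin)           # works dry, at risk when wet
    return ("ok", margin)
```

Storing the returned margin in the event record makes dry and wet captures directly comparable.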
Q4
Fan start causes reset—rail droop from inrush or ground-bounce injection? What two points to measure first?
Answer: Treat it as a rail-integrity problem first. Whether or not a reset was actually recorded points to different root causes.
First 2 measurements: 3V3/5V waveform during the fan command, and reset_reason/WD flag with timestamp (evidence: rails + log).
Discriminator: If rails cross the BOR/BOD threshold and reset asserts, brownout is confirmed. If rails stay inside the window but symptoms correlate with switching edges, ground bounce/common-mode injection is more likely.
First fix: Separate high di/dt returns from logic/AFE returns and delay sensitive sampling until rails are stable; validate with repeated fan start/stop logs.
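The Q4 split reduces to two comparisons once the rail capture and reset flag are available. The names and the example threshold below are assumptions; the real BOR/BOD threshold comes from the MCU or supervisor datasheet.

```python
def classify_fan_start_fault(rail_samples_v, bor_threshold_v,
                             reset_asserted, symptom_near_edge):
    """Split brownout from ground-bounce using the rail minimum during fan start."""
    rail_min = min(rail_samples_v)
    if rail_min <= bor_threshold_v and reset_asserted:
        return "brownout"
    if rail_min > bor_threshold_v and symptom_near_edge:
        return "ground_bounce_or_cm_injection"
    return "inconclusive"
```

Note that a scope with insufficient bandwidth or a long ground lead can hide a short droop, so an "inconclusive" result is a prompt to improve the capture, not a clean bill of health.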
Q5
Air-proving intermittently fails—DP switch/sensor issue or weak fan? Which curve distinguishes them?
Answer: Compare airflow evidence against fan evidence. A DP chain that does not track fan state is the fastest split.
First 2 measurements: DP state/DP analog value versus tach/fan command over the same time window (evidence: air + tach).
Discriminator: Tach rises but DP does not move → DP sensor/switch chain or tubing path dominates. Tach is low or unstable with rail ripple during start → fan supply/inrush or rail windowing dominates.
First fix: Add DP↔tach consistency gating and log DP transitions with timestamps at each proving attempt.
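A DP↔tach consistency check of the kind described in Q5 might look like the following. The RPM and pressure limits are hypothetical defaults, and the sketch only covers the "tach low" case, not tach instability.

```python
def air_proving_split(tach_rpm, dp_pa, rpm_min=1500.0, dp_min_pa=50.0):
    """Compare peak fan speed against peak differential pressure for one attempt."""
    fan_ok = max(tach_rpm) >= rpm_min
    dp_ok = max(dp_pa) >= dp_min_pa
    if fan_ok and not dp_ok:
        return "dp_chain_or_tubing"       # fan spins, pressure does not follow
    if not fan_ok:
        return "fan_supply_or_rail"       # fan never reached proving speed
    return "consistent"
```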
Q6
RS-485 enabled → flame misreports—common-mode injection or power noise? How to pick evidence?
Answer: Decide whether the disturbance enters through port common-mode or through rail ripple; the timing correlation identifies the path.
First 2 measurements: Measure RS-485 common-mode (port reference jump) and capture flame_raw and the comparator output during comms bursts (evidence: CM + flame).
Discriminator: If flame errors align with common-mode steps, port reference/ESD return paths inject into the flame AFE. If they align with 3V3/5V ripple or resets, rail windowing and sequencing dominate.
First fix: Control common-mode at the port entry and harden the flame input against CM injection; isolate only when TP evidence supports it.
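The alignment test in Q6 can be approximated by counting how many flame errors fall within a short window of each aggressor type. The 5 ms window and the event-list inputs are assumptions for illustration.

```python
def count_coincident(error_times_ms, aggressor_times_ms, window_ms=5):
    """Count errors that occur within window_ms of any aggressor event."""
    return sum(
        any(abs(e - a) <= window_ms for a in aggressor_times_ms)
        for e in error_times_ms
    )

def dominant_path(flame_err_ms, cm_step_ms, rail_event_ms, window_ms=5):
    """Pick the aggressor that coincides with more flame errors."""
    cm = count_coincident(flame_err_ms, cm_step_ms, window_ms)
    rail = count_coincident(flame_err_ms, rail_event_ms, window_ms)
    if cm > rail:
        return "port_common_mode"
    if rail > cm:
        return "rail_ripple"
    return "inconclusive"
```

Coincidence counting only ranks suspects; a handful of events is weak evidence, so the counts themselves are worth logging with the verdict.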
Q7
Temperature jumps trigger over-temp—sensor open/short first, or ADC reference drift first?
Answer: Check raw evidence before filters: true sensor faults and reference drift look different in pre-filter codes.
First 2 measurements: Capture pre-filter ADC codes and record Vref (or sensor-supply) ripple at the same timestamp (evidence: raw ADC + Vref).
Discriminator: Codes track Vref ripple → reference/supply injection dominates. Saturation to rail-like extremes → open/short is likely. Single-sample spikes aligned with switching events → wiring/return coupling dominates.
First fix: Add range + rate gates plus a consistency gate with flame/air stage; store a short raw-code ring buffer around trips.
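The range and rate gates from the Q7 first fix can be sketched on raw ADC codes as below. The limits are placeholders that depend on the ADC resolution, the sensor, and the sampling rate; the consistency gate against flame/air stage is not shown.

```python
def gate_sample(code, prev_code, code_min, code_max, max_step):
    """Apply a range gate, then a rate gate, to one pre-filter ADC code."""
    if code <= code_min or code >= code_max:
        return "range_fault"   # rail-like extreme: open or short is likely
    if prev_code is not None and abs(code - prev_code) > max_step:
        return "rate_fault"    # physically implausible jump: suspect coupling
    return "ok"
```

For example, with a 16-bit ADC one might start from `code_min=100`, `code_max=65400`, and a `max_step` derived from the fastest credible temperature slope.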
Q8
Valve “has voltage” but won’t open—driver protection, coil open, or mechanical jam? How to tell?
Answer: Voltage alone is not proof of actuation; the decision needs an energy signature (current/decay behavior) plus the enable timeline.
First 2 measurements: Measure coil current (or an equivalent energy proxy) and capture valve_enable edge timing (evidence: valve + log).
Discriminator: Voltage present but near-zero current → open circuit/connector. Current present but no mechanical response → jam/return spring issue. Driver enters protection or supply droops → short/overtemp/undervoltage dominates.
First fix: Verify kickback clamp behavior and log command + energy proxy at transitions for future root-cause retention.
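The Q8 decision tree can be written out as follows. The `opened` flag stands for any independent proof of mechanical response (for example downstream pressure or flame evidence) and is an assumption, as are the default voltage and current limits, which suit no particular valve.

```python
def classify_valve_fault(coil_v, coil_a, driver_fault, opened,
                         v_min=18.0, i_min=0.05):
    """Classify a 'valve won't open' report from voltage, current, and driver status."""
    if driver_fault or coil_v < v_min:
        return "driver_protection_or_supply"   # short/overtemp/undervoltage path
    if coil_a < i_min:
        return "open_circuit_or_connector"     # voltage present, no current
    if not opened:
        return "mechanical_jam"                # energized but no response
    return "ok"
```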
Q9
Lockout only at high power stage—ignition noise or unstable valve/air/combustion? Which two proofs first?
Answer: Start with the two state proofs that change with staging: airflow proof and flame proof. High-stage faults usually reveal which chain loses margin first.
First 2 measurements: Log DP/tach consistency and capture flame_raw (or flame_ok) during the stage transition (evidence: air + flame).
Discriminator: DP stops tracking tach → airflow/DP chain dominates. Flame becomes noisy or flips near commutation events → AFE injection/windowing dominates. If both are stable but lockout still happens, check rails + reset/WD timing around the transition.
First fix: Add stage-transition validation: timestamped DP, flame decision, and rail minima logged for every high-stage entry.
Q10
After reboot the fault code “disappears”—what minimum event fields prevent “no evidence” investigations?
Answer: Store the last actionable story, not just a code. Without reset cause and timestamps, root cause becomes unrecoverable.
First 2 measurements: Confirm that reset_reason (POR/BOD/WD) and the event timestamp + last_state are stored in non-volatile memory (evidence: log).
Discriminator: If reset_reason is missing, brownout and watchdog faults look identical after reboot. If last_state/retry_count is missing, ignition failures and interlock failures are indistinguishable.
First fix: Minimum fields: timestamp, state, decision flags (flame/air/valve), retry count, and reset_reason; add a short snapshot of key analog minima if possible.
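The minimum field set from Q10 fits in a small fixed-size record. The layout below is a sketch only: field widths, the `ResetReason` numbering, the optional `rail_min_mv` field, and the trailing CRC-32 are illustrative choices, not a format used by any particular board.

```python
import struct
import zlib
from dataclasses import dataclass
from enum import IntEnum

class ResetReason(IntEnum):
    POR = 0
    BOD = 1
    WD = 2

# timestamp (u32), state, flags, retry_count, reset_reason (u8 each), rail_min_mv (u16)
_BODY = struct.Struct("<IBBBBH")

@dataclass
class EventRecord:
    timestamp_s: int
    state: int
    flame_ok: bool
    air_ok: bool
    valve_enable: bool
    retry_count: int
    reset_reason: ResetReason
    rail_min_mv: int = 0   # optional analog-minimum snapshot

def pack_event(rec):
    """Serialize a record and append a CRC-32 so torn writes are detectable."""
    flags = int(rec.flame_ok) | (int(rec.air_ok) << 1) | (int(rec.valve_enable) << 2)
    body = _BODY.pack(rec.timestamp_s, rec.state, flags, rec.retry_count,
                      int(rec.reset_reason), rec.rail_min_mv)
    return body + struct.pack("<I", zlib.crc32(body))

def unpack_event(blob):
    """Parse a record, raising ValueError on a length or CRC mismatch."""
    if len(blob) != _BODY.size + 4:
        raise ValueError("bad record length")
    body, crc = blob[:_BODY.size], struct.unpack("<I", blob[_BODY.size:])[0]
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    ts, state, flags, retry, reason, rail = _BODY.unpack(body)
    return EventRecord(ts, state, bool(flags & 1), bool(flags & 2),
                       bool(flags & 4), retry, ResetReason(reason), rail)
```

The integrity check matters here because the record is most likely to be written during the same brownout it is meant to document.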
Q11
Flame signal is weak but can be maintained—adjust thresholds or fix return paths first?
Answer: Do not tune thresholds blindly. Decide whether the weakness is a stable low-amplitude condition or a noise-dominated injection problem.
First 2 measurements: Capture flame_raw amplitude/noise and correlate it with a known aggressor event such as ignition pulses, fan start, or comms bursts (evidence: flame).
Discriminator: Weak but stable and not correlated to aggressors → threshold/window margin can be adjusted. Weak and strongly correlated to common-mode jumps or rail ripple → return paths/CM injection must be fixed first.
First fix: Improve CM/return control, then re-evaluate flame window thresholds with logged baselines to avoid trading false positives for false negatives.
Q12
Field issue is intermittent and cannot be reproduced in the lab—how to design a “minimal reproduction jig” and log fields?
Answer: The jig should reproduce disturbances, not the whole appliance. Make the injection source controllable and the evidence capture deterministic.
First 2 measurements: Use a repeatable aggressor (fan start load step or comms on/off burst) and log event timestamps + reset_reason + a flame/air/valve snapshot (evidence: log + rails).
Discriminator: If faults align with load steps, rail windowing/return coupling dominates. If they align with port bursts, common-mode injection dominates. If neither aligns, expand capture to include raw ADC minima and comparator edges.
First fix: Add a short ring buffer (N samples) for rails/Vref/flame decision around triggers so a “one-off” becomes evidence-backed and debuggable.
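The ring buffer from the Q12 first fix can be prototyped in a few lines. This sketch keeps only pre-trigger samples; the sample tuple layout and the depth are assumptions, and a firmware version would use a fixed array rather than a `deque`.

```python
from collections import deque

class SnapshotRing:
    """Keep the most recent N samples so a trigger can freeze the lead-up."""

    def __init__(self, depth):
        self._buf = deque(maxlen=depth)   # oldest samples drop off automatically

    def push(self, t_ms, rail_mv, vref_mv, flame_ok):
        self._buf.append((t_ms, rail_mv, vref_mv, flame_ok))

    def snapshot(self):
        """Return a copy of the buffered samples, oldest first."""
        return list(self._buf)
```

On a trigger (lockout, reset request, or gate fault), the jig calls `snapshot()` and stores the result with the event record.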