Ventilation / ERV Controller: Fan Drive, CO₂/VOC, EMC
Core idea: A robust ERV controller is built around testable evidence: stable fan/valve actuation, clean CO₂/VOC/ΔP sensing, and on-device filter-life decisions that are backed by timestamps and logs. When airflow or readings look wrong, the fastest path is to correlate drive commands, feedback signals, and power/EMC events to isolate the real fault before changing hardware.
What a Ventilation / ERV Controller Covers
This page defines the hardware boundary of an ERV controller as a local closed-loop system: power entry → sensing → control → actuation → fault logging, plus a gateway interface that must not break local safety or stability.
Hardware boundary
- Inputs: AC mains (100–240Vac) or 24Vac or 24Vdc (chosen by installation; impacts surge/EFT coupling and brownout risk).
- Protection: line/terminal ESD & surge clamp, inrush/OV/UV handling, and a reset-safe startup sequence.
- Power conversion: SMPS to a low-voltage domain (e.g., 3V3/1V8) with defined ripple/noise budgets for sensors and radio.
- Control core: MCU + watchdog + RTC/timebase + nonvolatile event log (root-cause evidence, not only “error codes”).
- Fan actuation: ECM/BLDC command (PWM or 0–10V) and/or AC fan switching (relay/SSR/triac), with a measurable feedback path when needed.
- Damper/valve actuation: 24Vac on/off, DC motor, or stepper-based actuator; optional limit/position feedback.
- Sensing: CO₂ (NDIR), VOC (MOX), and pressure/airflow evidence (ΔP/static pressure); temperature/RH only as compensation/protection (not platform analytics).
- Connectivity: Wi-Fi/Thread/Zigbee/RS-485/Ethernet as interface layer for setpoints/status/updates—without pushing real-time control off-device.
Out of scope (hard exclusions to prevent overlap)
- HRV-specific: defrost logic, ΔT/heat-exchanger energy recovery deep dive, and seasonal efficiency control.
- IAQ Hub platform: multi-sensor fusion dashboards, whole-home air scoring, and cloud-first analytics pipelines.
- Protocol deep dives: Matter/Thread/Zigbee spec details, routing, clusters, and cloud backend architecture.
Evidence in scope (what to capture per domain)
- Power: LV-rail ripple and droop during fan/actuator transitions; brownout/reset counters.
- Actuation: commanded PWM/0–10V vs tach/FG continuity (or ΔP response) to prove drive vs mechanical loading.
- Sensing: CO₂/VOC status bytes, sampling period, and noise floor (before/after radio bursts or motor PWM edges).
- Comms: command sequence IDs/timestamps and reconnect counts to prove “duplicate/late commands” vs genuine control instability.
Power Tree + Sensing Chain + Control Loops (All Tied to Measurable Signals)
ERV controller architecture is best understood as three evidence-driven lanes. Each lane defines what to measure, what it proves, and what breaks first under field stress (surge/EFT, motor transients, radio bursts).
Lane 1 — Power tree (domain separation + stress points)
- Input domain: terminal/mains protection (surge/ESD/EFT). Evidence: clamp heating marks, post-event leakage, fuse/PTC state, UV/OV flags.
- Conversion domain: SMPS stability under load steps. Evidence: LV droop at fan start/actuator movement; ripple at sensor sampling frequency bands.
- Digital domain (MCU/AFE/Radio): noise budget and reset immunity. Evidence: brownout counter, watchdog resets, rail ripple correlated to radio TX bursts.
- Drive domain (fan/actuator): high di/dt returns and coupling. Evidence: current spikes aligned with sensor noise or comms dropouts.
Lane 2 — Sensing chain (interfaces + sampling discipline)
- CO₂ (NDIR): module status + sampling cadence are part of the evidence. Evidence: status byte, error counters, response time vs known ventilation changes.
- VOC (MOX): heater drive injects noise into the measurement unless sampling is scheduled. Evidence: ADC noise floor changes when heater PWM edges move.
- ΔP / static pressure: useful as a mechanical discriminator and filter-life input. Evidence: monotonic relationship between fan command/FG and pressure response.
- Sampling discipline: choose “quiet windows” (avoid motor PWM edges and radio TX bursts). Evidence: repeatable noise reduction when sampling phase shifts.
Lane 3 — Actuation loop (what closes locally vs via gateway)
- Local loop must remain stable offline: fan speed/airflow control, actuator interlocks, and fault handling cannot depend on network availability.
- Gateway can be delayed: schedules, user setpoints, telemetry uploads, and firmware updates—only if commands are sequenced and acknowledged.
- Evidence-first comms: sequence ID + timestamp + retry count prevent “late/duplicate command” confusion during reconnection.
Minimum necessary feedback — Tach/FG vs ΔP (choose by requirement)
- Tach/FG is sufficient when the requirement is stable speed, duct resistance is relatively fixed, and some airflow loss can be handled by maintenance alerts.
- ΔP is required when the requirement is stable airflow/exchange, duct resistance changes widely (filter clog, damper moves, long duct), or filter-life must be explainable.
- Combined (Tach + ΔP) gives a strong discriminator: command is constant but ΔP collapses → mechanical/duct issue; tach collapses → drive/power issue.
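The combined Tach + ΔP discriminator can be sketched as a small classifier. This is a minimal illustration, not a reference implementation: the function name, the ratio inputs (measured value divided by the expected value at the same constant command), and the 0.5 collapse threshold are all assumptions chosen for the example.

```python
def classify_airflow_fault(tach_ratio: float, dp_ratio: float,
                           collapse_below: float = 0.5) -> str:
    """Classify a fault while the fan command is held constant.

    tach_ratio / dp_ratio are measured/expected values (1.0 = nominal).
    collapse_below is an assumed threshold; tune it per product.
    """
    # Tach is checked first: if the rotor slows, dP falls with it,
    # so a tach collapse points at drive/power regardless of dP.
    if tach_ratio < collapse_below:
        return "drive_or_power"
    # Rotor speed is healthy but pressure is gone: look at ducts/mechanics.
    if dp_ratio < collapse_below:
        return "mechanical_or_duct"
    return "nominal"
```

Checking tach before ΔP matters: a stalled or starved fan also drops ΔP, so the pressure signal alone cannot separate the two cases.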
Fan Drive Options: ECM/BLDC vs AC (Triac/Relay/SSR)
Fan-drive selection starts with what fan is being driven and ends with measurable evidence paths. A drive choice that cannot be validated in the field (tach/ΔP response, command waveform integrity, rail droop correlation) will cause repeated “works in lab, fails in homes” cycles.
Quick identification (what to check first)
- Wiring/label: 2-wire / 3-wire / 4-wire; presence of FG/tach; dedicated PWM or 0–10V control pin.
- Control ownership: ECM fan (internal controller) vs raw BLDC/PMSM (external 3-phase drive required).
- Supply type: AC mains / 24Vac / 24Vdc determines switching method and dominant EMI coupling paths.
Bucket A — ECM / BLDC (commanded speed, optional feedback)
- Control variables: PWM frequency/window (avoid audible and sensor aliasing), 0–10V output impedance and reference, and FG/tach input conditioning (Schmitt, RC, or time-window debouncing for low-speed pulse dropout).
- Common failures (must map to evidence): low-speed whine (PWM band vs mechanical resonance), start failure (UVLO / rail droop), speed hunting (loop instability or noise injection), EMI limit exceedance (fast edges / return path).
- Evidence points (minimum set): Command (PWM or 0–10V) + Feedback (FG/tach or ΔP response) + one power integrity trace (LV rail droop/ripple).
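Time-window debouncing of FG/tach pulses can be illustrated in a few lines. This is a sketch under stated assumptions: edge timestamps are already captured in seconds, the function names are hypothetical, and 2 pulses per revolution is assumed (check the fan datasheet).

```python
def debounce_fg_edges(edge_times_s, min_interval_s):
    """Drop edges that arrive sooner than min_interval_s after the
    last accepted edge (simple time-window debounce)."""
    accepted = []
    for t in edge_times_s:
        if not accepted or t - accepted[-1] >= min_interval_s:
            accepted.append(t)
    return accepted


def rpm_from_edges(edge_times_s, pulses_per_rev=2):
    """Average RPM over the captured span; None if there is too little data.
    pulses_per_rev=2 is an assumption, not a universal value."""
    if len(edge_times_s) < 2:
        return None
    span_s = edge_times_s[-1] - edge_times_s[0]
    if span_s <= 0:
        return None
    revolutions = (len(edge_times_s) - 1) / pulses_per_rev
    return 60.0 * revolutions / span_s
```

The debounce window should stay well below the shortest valid pulse interval at maximum speed, otherwise real pulses are discarded and speed reads low.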
Bucket B — AC fan (relay steps / triac phase / SSR)
- Relay steps: simple and robust, but contact arcing can couple into sensor/MCU rails; evidence is reset counters aligned to switching events.
- Triac phase control: enables speed shaping, but creates sharp current edges; evidence is EMI peaks aligned to firing angle and sensor noise correlated to phase edges.
- SSR: quieter switching, but check leakage and thermal rise; evidence is residual current (off-state) and temperature drift during continuous operation.
- Failure signatures: low-speed buzz, intermittent start (insufficient torque at small conduction angles), EMI-driven dropouts, and nuisance resets.
Valve / Damper / Actuator Drive: 24Vac, DC + Limits, Stepper
Damper/valve actuation faults are usually mechanical (stiction/backlash) but are diagnosed by electrical evidence: current shape, back-EMF behavior, and position/limit consistency. The goal is to prove whether the failure is drive/power or mechanics/installation within minutes.
Actuator families (and the best evidence per family)
- 24Vac two-wire on/off: evidence = coil current presence + action timing + interlock contact (if provided).
- DC motor + limit/Hall: evidence = current ramp + back-EMF signature + limit transitions consistency.
- Stepper: evidence = phase current + travel time consistency + end-stop/limit confirmation (missed-step shows drift).
Failure chain A — Jam / stiction (won’t move)
- Symptom: command issued but position evidence does not change.
- Electrical proof: drive current rises quickly and hits limit, while back-EMF stays low.
- Discriminator: current high + no movement → mechanical jam; current near zero → wiring/drive/power missing.
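The jam discriminator above maps directly to a decision function. The sketch below is illustrative only: the function name, the 20 mA "near zero" floor, and the 90 % of current limit criterion are assumptions, not values from any specific actuator.

```python
def classify_no_movement(current_a, current_limit_a, position_changed,
                         near_zero_a=0.02, at_limit_fraction=0.9):
    """Classify a 'command issued, nothing moved' event from drive current.

    near_zero_a and at_limit_fraction are assumed thresholds.
    """
    if position_changed:
        return "moving"
    if current_a <= near_zero_a:
        # No current flowing: the actuator never received energy.
        return "wiring_drive_or_power_missing"
    if current_a >= at_limit_fraction * current_limit_a:
        # Current pinned at the limit with no motion: stalled rotor.
        return "mechanical_jam"
    return "inconclusive"
```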
Failure chain B — Backlash / rebound (moves then slips back)
- Symptom: end-position reached but then “unreaches”; airflow/ΔP becomes unstable.
- Electrical proof: limit signal toggles twice or ΔP drops after a stable command.
- Discriminator: command steady + evidence retreats → mechanical rebound/seal force, not firmware logic.
Failure chain C — Stepper missed steps (position drifts over cycles)
- Symptom: repeated moves gradually shift the effective end-point or travel time increases.
- Electrical proof: phase current remains nominal but end-stop timing drifts; occasional “late limit” events.
- Discriminator: normal current + drifting timing → torque margin/installation friction; abnormal current → drive limit/rail droop.
Minimum measurement template (fast field proof)
- First 2 measurements: (1) actuator drive output (voltage or current), (2) position evidence (limit/Hall or ΔP response).
- Third proof when needed: LV rail droop during actuation (separates power weakness from mechanics).
- First fixes: add supply margin/return path cleanup, tune current limit/soft-start, improve harness contacts, reduce seal friction or adjust end-stops.
CO₂ Sensing Chain (NDIR): Interface, Drift, Field Evidence
An NDIR CO₂ “number” is only trustworthy when the chain is traceable: interface correctness, power integrity, and drift explanations must each have a measurable proof. This section stays strictly on hardware and field-verifiable evidence (no IAQ cloud analytics).
Interface map (what must be nailed first)
- UART: define voltage domain and framing; log freshness (timestamp per sample) and retry counters.
- I²C: verify pull-ups, bus idle level, and noise margin; log NACK/timeout events and recovery attempts.
- PWM output: measure frequency and duty stability; decode duty over a fixed window to avoid jitter artifacts.
- Power sensitivity points: verify CO₂ module VDD ripple during fan/actuator events; keep a local ground return.
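Decoding a PWM output over a fixed window, rather than from a single period, can be sketched as follows. The input format (a list of captured level transitions) and the function name are assumptions for illustration; the duty-to-ppm conversion is module-specific and deliberately left out.

```python
def duty_over_window(edges, t_start_s, t_end_s, level_at_start):
    """Fraction of time the signal is high within [t_start_s, t_end_s].

    edges: time-sorted list of (time_s, new_level) transitions.
    level_at_start: signal level (0 or 1) at t_start_s.
    """
    if t_end_s <= t_start_s:
        raise ValueError("window must have positive length")
    high_s = 0.0
    level = level_at_start
    prev_s = t_start_s
    for t, new_level in edges:
        if t < t_start_s or t > t_end_s:
            continue  # ignore transitions outside the window
        if level:
            high_s += t - prev_s
        prev_s, level = t, new_level
    if level:
        high_s += t_end_s - prev_s  # tail segment up to the window end
    return high_s / (t_end_s - t_start_s)
```

Choosing a window that spans a whole number of PWM periods avoids a bias from a partially captured cycle.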
Drift & bias sources (hardware-verifiable)
- Temperature-related drift: bias changes with enclosure temperature; verify with local temperature reference and consistent drift direction.
- Optics aging / dust: slow long-term offset; verify via status/health flags (if exposed) and increasing deviation vs a reference point.
- Update rate / averaging: “slow response” can be an output cadence artifact; verify by reading sample interval and moving-average window behavior from logs.
Field symptoms → discriminators (prove the root cause)
- Always high / always low: check reference point deviation (outdoor or well-ventilated baseline) + long-term trend.
- Slow response: compare sensor update interval vs the expected ventilation time constant; confirm cadence before suspecting “bad sensor”.
- Step jumps / spikes: correlate spikes with fan PWM edges, actuator switching, or radio bursts; then check CO₂ VDD ripple alignment.
Evidence pack (minimum set for field triage)
- Status word / error code: self-test, “measurement ready”, weak-signal flags, sensor fault flags.
- Sample interval + timestamp: detect stale samples and buffering artifacts (freshness must be explicit).
- Power proof: CO₂ VDD ripple/rail droop during fan/actuator switching; log brownout/reset counters.
- Environmental reference: a stable baseline point (outdoor / high-ventilation) used only to validate bias direction and magnitude.
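Explicit freshness can be enforced with a timestamp and a sequence counter per sample. The sketch below assumes a hypothetical `Co2Sample` record and a timeout of two missed update intervals; both are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Co2Sample:
    ppm: int
    sample_ts_s: float  # time the module produced the value
    seq: int            # increments once per new measurement


def freshness(sample, now_s, expected_interval_s, last_seq=None, max_missed=2):
    """Return 'fresh', 'stale_repeat', or 'stale_timeout'.

    max_missed is an assumed tolerance (in update intervals).
    """
    if last_seq is not None and sample.seq == last_seq:
        return "stale_repeat"   # same measurement read twice (buffering)
    if now_s - sample.sample_ts_s > max_missed * expected_interval_s:
        return "stale_timeout"  # value is older than the allowed age
    return "fresh"
```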
First fixes (evidence-driven)
- If spikes align to VDD ripple: improve local decoupling, separate return path, and avoid reading during switching bursts.
- If freshness is wrong: add timestamps/sequence counters, enforce update timeout, and record retries/errors.
- If bias grows over months: confirm optics/dust suspicion with a reference check; add a maintenance flag rather than “blindly recalibrating”.
- If slow response is cadence-driven: reduce averaging window or expose update interval to logs and UI for correct interpretation.
VOC Sensing (MOX): AFE, Heater-Drive Noise, Cross-Sensitivity
MOX VOC sensing is dominated by a coexistence problem: heater drive injects large disturbances while the sensor readout is a small signal. Stability requires a clean AFE chain, disciplined sampling windows, and evidence that separates true chemistry changes from electrical coupling.
Heater drive (the main disturbance source)
- PWM heater: edges inject ground bounce and supply ripple; verify by aligning ADC noise with PWM edges.
- Constant-current heater: reduces ripple but still requires thermal headroom; verify ripple reduction and consistent warm-up behavior.
- Return-path separation: keep heater current loop away from AFE/ADC reference return; verify by reduced code jitter under identical PWM.
AFE chain (make the small signal measurable)
- Divider / resistive readout: simplest, but sensitive to ground/reference noise; keep source impedance and ADC sampling consistent.
- Bridge / differential readout: improves immunity to common-mode noise; verify by lower variance under the same heater drive.
- Instrumentation / low-noise stage + RC anti-alias: pushes PWM/radio energy out of band; verify via reduced high-frequency code components.
- RC anti-alias: set to attenuate PWM harmonics before ADC; confirm with FFT or time-domain jitter reduction.
Sampling discipline (avoid edges and bursts)
- Sampling window: sample only in a stable region between PWM edges; keep a fixed timing relationship.
- Radio burst coupling: if VOC noise increases during Wi-Fi/Thread activity, avoid sampling during bursts or isolate rails/returns.
- Fan PWM coupling: if VOC noise rises when fan speed changes, correlate VOC code variance with fan PWM edges and supply ripple.
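One way to keep a fixed timing relationship is to sample at the midpoint of the longer flat segment of the PWM period, provided that segment is wider than a guard band around each edge. The helper below is a sketch; its name and the guard-band parameter are assumptions.

```python
def quiet_sample_delay(period_s, duty, guard_s):
    """Delay after the PWM rising edge at which to trigger the ADC.

    Picks the midpoint of the longer flat segment (high or low).
    Returns None when no segment is wider than two guard bands.
    """
    high_s = duty * period_s
    low_s = period_s - high_s
    if high_s >= low_s:
        start_s, length_s = 0.0, high_s      # sample during the high phase
    else:
        start_s, length_s = high_s, low_s    # sample during the low phase
    if length_s <= 2 * guard_s:
        return None  # edges are too close together for a quiet window
    return start_s + length_s / 2
```

A `None` result is itself useful evidence: at that frequency and duty there is no quiet window, so the PWM frequency or the sampling strategy has to change.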
Field symptoms → evidence discriminators
- VOC “random jumping”: check if jumps are periodic and phase-locked to heater PWM or fan PWM edges.
- Worse during wireless activity: check if jitter increases during radio bursts; confirm via timestamp correlation.
- Humidity/temperature driven misreads: if VOC follows RH/T trends without electrical correlation, treat as cross-sensitivity (do not blame the ADC first).
Pressure/Airflow Evidence: Differential Pressure, Fan Curve, Duct Issues
Differential pressure (ΔP) turns “duct/mechanics” into measurable evidence. The goal is to separate mechanical restrictions/leaks/condensation from electrical control faults using a minimum trio: command (PWM), speed (FG/tach), and ΔP.
ΔP sensor practical boundary (what matters in field)
- Interface: analog / I²C / SPI — prioritize stable readings and error visibility over rich features.
- Range: low range improves sensitivity; avoid saturation during boost mode or high duct resistance.
- Zero drift: temperature and stress shift baseline; require a logged “zero check” event to keep bias visible.
- Hose/condensation: water in tubes causes lag, hysteresis, and false offsets; treat as a primary field failure mode.
Fan-curve evidence (turn symptoms into discriminators)
- Same PWM, higher ΔP → likely restriction increased (filter clog / duct resistance).
- Same PWM, lower speed → load increased or power margin reduced; validate with current/ripple snapshot.
- Same speed, lower ΔP → leakage/short-circuit airflow or damper position mismatch.
- Non-monotonic trio (PWM↑ but speed/ΔP not trending) → verify sensor validity and mechanical consistency before blaming firmware.
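The fan-curve rules can be expressed as a comparison against a stored baseline at the same PWM. This is a minimal sketch: the function name, the ratio inputs (current value divided by baseline value), and the 10 % tolerance are assumptions.

```python
def classify_against_baseline(speed_ratio, dp_ratio, tol=0.10):
    """Return the list of findings at an unchanged PWM command.

    speed_ratio, dp_ratio: current/baseline values (1.0 = unchanged).
    tol is an assumed tolerance band around the baseline.
    """
    findings = []
    speed_low = speed_ratio < 1 - tol
    if dp_ratio > 1 + tol:
        findings.append("restriction_increased")
    if speed_low:
        findings.append("load_increase_or_power_margin")
    if dp_ratio < 1 - tol and not speed_low:
        # Pressure fell while speed held: air is escaping or bypassing.
        findings.append("leak_or_damper_mismatch")
    return findings
```

Returning a list rather than a single label keeps combined signatures visible, for example higher ΔP together with lower speed on a clogged filter.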
Physical consistency rules (fast validity checks)
- Monotonic trend: in a fixed duct configuration, PWM↑ should not repeatedly produce ΔP↓ over steady windows.
- Lag: ΔP should settle within a reasonable time constant; excessive lag points to hose water or blockage.
- Hysteresis: strongly differing up/down ramps suggest condensation or mechanical sticking, not “random noise”.
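The monotonic and hysteresis checks are easy to automate from a PWM ramp. The sketch below assumes steady-state (PWM %, ΔP in Pa) points have already been collected; the function names and the 1 Pa tolerance are illustrative.

```python
def is_monotonic(points, tolerance_pa=1.0):
    """points: iterable of (pwm_pct, dp_pa) at steady state in one duct
    configuration. True if dP never drops by more than tolerance_pa
    as PWM increases."""
    ordered = sorted(points)
    return all(b[1] >= a[1] - tolerance_pa
               for a, b in zip(ordered, ordered[1:]))


def max_hysteresis_pa(up_ramp, down_ramp):
    """up_ramp / down_ramp: dict of pwm_pct -> dp_pa measured while
    ramping up and down. Returns the largest gap at shared PWM points."""
    shared = set(up_ramp) & set(down_ramp)
    if not shared:
        return None
    return max(abs(up_ramp[p] - down_ramp[p]) for p in shared)
```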
Common duct issue signatures (evidence-first)
- Filter clog: ΔP baseline rises over time; speed drops or current rises at the same PWM.
- Restriction: boost mode fails to increase airflow; ΔP saturates high while speed cannot climb.
- Condensation: ΔP shows slow response + large hysteresis, worse in humid/cold conditions.
- Leak: speed normal but ΔP low; improves after sealing or damper correction.
Filter-Life Algorithm (On-device): Inputs, Model, Calibration, Logs
This page’s unique depth is an on-device filter-life system that remains verifiable in the field. A percentage alone is not actionable; the device must also log why it warned or escalated (trigger reason + snapshot).
Inputs (only what the device can measure)
- ΔP: strongest direct evidence, but must guard against condensation and hose artifacts.
- Speed vs PWM deviation: same PWM producing lower speed indicates higher load/resistance; requires reliable FG/tach.
- Power/current: supportive evidence; rising current at comparable operating points suggests increasing resistance or mechanical drag.
- Run hours: stable fallback; never used as the sole trigger when strong evidence exists.
Simplified model options (field-verifiable)
- Threshold + slope: detect baseline shift (ΔP↑) and trend (ΔP slope↑) at stable operating points.
- State machine: Clean → Warning → Replace, with clear entry/exit conditions and a recorded trigger reason.
- 2-of-3 evidence gate: reduce false positives by requiring agreement (ΔP + speed deviation + power).
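The 2-of-3 gate and the Clean → Warning → Replace state machine can be combined as sketched below. Everything numeric here is an assumption for illustration: the 30 % ΔP rise, 10 % speed drop, 15 % current rise, and the persistence count. Choosing the first matching reason as the recorded trigger is also an arbitrary convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    dp_pa: float
    speed_rpm: float
    current_ma: float


def evidence_reasons(dp_pa, speed_rpm, current_ma, base,
                     dp_rise=0.30, speed_drop=0.10, power_rise=0.15):
    """List which of the three signals deviate from baseline (assumed limits)."""
    reasons = []
    if dp_pa > base.dp_pa * (1 + dp_rise):
        reasons.append("DP_HIGH")
    if speed_rpm < base.speed_rpm * (1 - speed_drop):
        reasons.append("SPEED_DROP")
    if current_ma > base.current_ma * (1 + power_rise):
        reasons.append("POWER_RISE")
    return reasons


class FilterLife:
    ORDER = ["Clean", "Warning", "Replace"]

    def __init__(self, persist_n=3):
        self.state = "Clean"
        self.persist_n = persist_n  # consecutive gate passes needed to escalate
        self._hits = 0

    def update(self, reasons):
        """Feed one evaluation; returns a log event on escalation, else None."""
        self._hits = self._hits + 1 if len(reasons) >= 2 else 0
        if self._hits >= self.persist_n and self.state != "Replace":
            self.state = self.ORDER[self.ORDER.index(self.state) + 1]
            self._hits = 0
            return {"state": self.state,
                    "trigger_reason": reasons[0],
                    "evidence_count": len(reasons)}
        return None
```

The sketch only escalates; returning to Clean after a filter change would be handled by a re-baseline step, which is omitted here.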
Calibration baseline (the most common deployment failure)
- Post-install baseline: capture ΔP₀, (PWM→Speed)₀, and I₀ after a stable warm-up window.
- Exclude transients: ignore actuation changes and mode switches during baseline capture.
- Light seasonal handling: allow a maintenance or re-baseline hint when long-term offset shifts are detected (keep it device-side).
Logs (must record reason, not only %)
- State: Clean / Warning / Replace
- Trigger reason (enum): DP_HIGH, SPEED_DROP, POWER_RISE, HOURS
- Snapshot: ΔP, PWM, speed, current, timestamp (and optional temp/RH as context)
- Evidence count: how many signals agreed at trigger time (1/2/3)
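A log record carrying the fields above might look like the following. The field names, units, and the JSON-lines encoding are assumptions for illustration; an embedded target would more likely use a packed binary record.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class TriggerReason(Enum):
    DP_HIGH = "DP_HIGH"
    SPEED_DROP = "SPEED_DROP"
    POWER_RISE = "POWER_RISE"
    HOURS = "HOURS"


@dataclass
class FilterLogRecord:
    state: str                    # Clean / Warning / Replace
    trigger_reason: TriggerReason
    evidence_count: int           # how many signals agreed (1..3)
    dp_pa: float
    pwm_duty_pct: float
    speed_rpm: float
    current_ma: float
    timestamp_s: int

    def to_json_line(self) -> str:
        record = asdict(self)
        # Store the enum by name so the log stays human-readable.
        record["trigger_reason"] = self.trigger_reason.value
        return json.dumps(record, sort_keys=True)
```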
Validation of false-positive / false-negative
- False-positive test: after installing a new filter, ΔP and speed deviation should return near baseline; if not, suspect duct/condensation, not filter-life logic.
- False-negative test: simulate restriction (controlled airflow block); ΔP↑ + speed deviation↑ should escalate state within defined persistence time.
- Proof pattern: escalation must be reproducible and reversible with evidence changes.
Gateway Comms for ERV: What Must Be Real-time, What Can Be Deferred
Communications design for ERV should be validated by control requirements and field evidence, not by protocol depth. The key requirement is simple: when links drop, the system must remain safe, stable, and provable.
Must be local (real-time, never depend on gateway)
- Fan control loop: PWM/command → speed/ΔP feedback → local regulation and sanity checks.
- Safety interlocks: cover/door, damper limits, over-current/over-temp, sensor fault flags.
- Fail-safe mode: a deterministic offline behavior (hold / degrade / safe) with a logged reason.
Can be deferred via gateway (loss-tolerant)
- Status reporting: modes, setpoints, filter-life state, sensor health summary.
- Alerts: Replace warning, sensor weak-signal, EMC-related reconnect statistics.
- Firmware update: execute only in a safe window, with basic rollback/failure protection (details are out of scope here).
Three classic field failures → required evidence fields
- Mode fallback on drop: offline_enter_ts, offline_duration, mode_before/after, fallback_reason.
- Duplicate commands: cmd_id/cmd_seq, ack_seq, dedup_drop_count, retry_count.
- State desync: state_version, state_gen_ts, last_applied_cmd_seq, resync_success_count.
Offline buffering (device-side, concept-level but testable)
- Command side: accept only sequence-tagged commands; discard stale/expired sequences.
- Status side: buffer snapshots in a ring; flush in timestamp order after reconnection.
- Resync step: align current state version before applying new commands after reconnect.
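The command-side and status-side rules can be sketched with a sequence gate and a bounded ring. This is an illustration under assumptions: the class names, `expired_drop_count`, and `max_age_s` are hypothetical, and sequence-number wraparound is not handled.

```python
from collections import deque


class CommandGate:
    """Accept only new, unexpired, sequence-tagged commands."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self.last_applied_cmd_seq = -1
        self.dedup_drop_count = 0
        self.expired_drop_count = 0  # hypothetical counter name

    def accept(self, cmd_seq, cmd_ts_s, now_s):
        if cmd_seq <= self.last_applied_cmd_seq:
            self.dedup_drop_count += 1    # duplicate or out-of-order replay
            return False
        if now_s - cmd_ts_s > self.max_age_s:
            self.expired_drop_count += 1  # too old to act on safely
            return False
        self.last_applied_cmd_seq = cmd_seq
        return True


class StatusRing:
    """Bounded snapshot buffer; the oldest entries are overwritten when full."""

    def __init__(self, capacity):
        self._ring = deque(maxlen=capacity)

    def push(self, ts_s, snapshot):
        self._ring.append((ts_s, snapshot))

    def flush(self):
        """Return buffered snapshots in timestamp order and clear the ring."""
        items = sorted(self._ring, key=lambda item: item[0])
        self._ring.clear()
        return items
```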
First fixes (evidence-driven)
- Frequent duplicates: enforce dedup by cmd_id/cmd_seq and make command execution idempotent.
- Frequent desync: add state_version + state_gen_ts, and require resync before new actuation.
- Drop-triggered regressions: define offline policy explicitly and log fallback_reason for service proof.
Rugged Power & EMC Checklist (ERV-specific)
This checklist stays ERV-specific: motor/valve switching noise, NDIR/MOX sensitivity, and wireless coexistence. The structure is always: Risk → Where to probe → Pass gate → Typical fix. General EMC theory belongs on the shared subsystem page.
Risk map (ERV coupling paths)
- Motor PWM → supply ripple / radiated noise → RF throughput drop + sensor readout jitter.
- Valve/relay actuation → EFT-like spikes → I²C/UART errors, brownout resets, mode fallback.
- Long wires/terminals → ESD injection → false alerts, desync, random reconnect loops.
Surge / ESD (power entry & terminals)
- Risk: transients at mains/24V entry and on long I/O terminals.
- Probe: entry-before/after protection, low-voltage rails, reset/brownout counters.
- Pass gate: no uncontrolled reset loops; no persistent sensor fault flags; no corrupted logs.
- Fix: improve entry clamping/return, add staged filtering, isolate sensitive sensor rails.
EFT (valve harness coupling)
- Risk: actuation edges couple into low-voltage domains and comm lines.
- Probe: valve drive waveform + LV rail ripple + interface error counters.
- Pass gate: no burst of NACK/timeouts; no mode fallback triggered by actuation.
- Fix: reduce loop area, add snubbers/return management, schedule sensor sampling away from edges.
EMI (motor PWM harmonics ↔ RF drop evidence)
- Risk: PWM harmonics reduce RF throughput or cause reconnect storms.
- Probe: reconnect_count, retry_count, packet_loss proxy + PWM duty/frequency logs (time-aligned).
- Pass gate: RF counters do not correlate with PWM changes; mode does not revert on RF events.
- Fix: adjust PWM edge control, add filtering, improve ground partitioning, antenna separation.
Sensor integrity under noise (NDIR/MOX coexistence)
- Risk: sensor rails and ADC windows polluted by motor/valve switching or radio bursts.
- Probe: sensor VDD ripple + CO₂/VOC value jumps + sampling timestamp relative to switching events.
- Pass gate: moving the sampling window away from edges reduces jitter; no false VOC/CO₂ spikes during actuation.
- Fix: local rail filtering/LDO, controlled sampling windows, return-path discipline.
General EMC theory (shared subsystem link)
- For cross-product EMC/safety patterns and protection building blocks, jump to the shared page: EMC / Safety & Metering Subsystem (general methods, components, and test taxonomy).
Validation & Field Debug Playbook (Symptom → Evidence → Isolate → Fix)
This SOP is built for minimum tools and maximum proof. Each symptom follows the same four blocks: First 2 measurements (fastest evidence), Discriminator (power/drive/sense/comms split), First fix (highest ROI action), and Log to keep (service-grade evidence).
Minimum toolset (field-ready)
- DMM (supply and continuity), basic scope (PWM/FG/ripple), USB-UART (sensor status/logs), optional clamp meter.
- Time alignment is mandatory: logs should include timestamps so events can be correlated with PWM, valve actuation, and RF bursts.
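With timestamps on both sides, correlation reduces to counting how many symptom events fall within a small window of an aggressor event. The helper below is a sketch; the function name and the symmetric window are assumptions.

```python
import bisect


def aligned_fraction(symptom_ts_s, aggressor_ts_s, window_s):
    """Fraction of symptom timestamps that lie within +/- window_s of
    any aggressor timestamp (PWM change, valve actuation, RF burst)."""
    if not symptom_ts_s:
        return 0.0
    events = sorted(aggressor_ts_s)
    hits = 0
    for t in symptom_ts_s:
        # First aggressor event at or after the start of the window.
        i = bisect.bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            hits += 1
    return hits / len(symptom_ts_s)
```

A fraction near 1.0 supports a coupling hypothesis, while a fraction close to what random timing would give suggests looking elsewhere. The window should reflect the clock uncertainty between the two log sources.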
Quick MPN shortlist (commonly used building blocks)
- CO₂ (NDIR): Sensirion SCD41 / SCD40, Senseair S8, Sensirion SCD30.
- VOC (MOX): Sensirion SGP40 / SGP30, ams CCS811, Bosch BME688.
- ΔP / airflow evidence: Sensirion SDP810 / SDP31 (diff pressure family examples).
- Current sense: TI INA180 / INA240 (motor/valve current evidence).
- ESD / IO protection: TI TPD2E001, Nexperia PESD5V0S1UL, Littelfuse SMF05C (array).
- Logic / conditioning: TI SN74LVC1G17 (Schmitt), TI TLV3201 (comparator).
- Power (example rails): TI TPS62130 (buck), TI TPS7A02 (LDO).
- RS-485 (if used): TI THVD1450, Maxim MAX3485.
Symptom 1 — Fan won’t start / intermittent start
First 2 measurements (minimum tools)
- TP-PWM/DRV: confirm a real command exists (PWM / 0–10V / triac gate presence).
- TP-VDD: check rail sag or brownout evidence (VDD droop, reset/brownout counter).
Discriminator (power vs drive vs interlock)
- Command present + no FG/speed → drive stage / interlock / stalled load more likely.
- Command missing or resets observed → supply integrity / EFT coupling / entry protection more likely.
- Start attempts repeat → look for “start_fail_reason” and correlate with valve/relay actuation timestamps.
First fix (highest ROI)
- Interlock proof: verify limit/cover signals are consistent (no bouncing during start).
- Drive edge control: reduce coupling from actuation edges; add snubber/return management if a coil/relay fires at start.
- Evidence-grade current sensing: add/validate motor/valve current measurement to distinguish stall vs command loss.
- Example MPNs (as applicable): TI INA180/INA240 (current sense), TI SN74LVC1G17 (FG/limit deglitch), TI TPD2E001 (I/O ESD), TI TPS7A02 (clean sensor/control rail).
Log to keep (service proof)
- start_attempt_count, start_success_count, start_fail_reason
- brownout_count, reset_reason, vdd_min_mv
- drive_cmd_ts, fg_detect_ts (or speed_rpm_at_ts)
Symptom 2 — Low-speed whine / speed hunting
First 2 measurements (minimum tools)
- TP-FG: speed jitter / missing pulses (rpm variance vs time).
- TP-PWM: PWM frequency + duty stability (does jitter track PWM edges?).
Discriminator (mechanical vs electrical)
- FG jitter time-locked to PWM edge → coupling/grounding/drive edge control issue.
- FG jitter while PWM stable → duct pulsation / mechanical load / resonance more likely.
- Whine band: if noise appears only in a narrow speed band, suspect resonance (keep a “rpm_band” log).
First fix (highest ROI)
- PWM strategy: move PWM frequency out of audible/mechanical resonance zones; avoid aggressive low-speed edge rates.
- FG conditioning: ensure clean thresholding and deglitching before control uses FG.
- Example MPNs: TI SN74LVC1G17 (Schmitt cleanup), TI TLV3201 (fast comparator), TI INA180 (current evidence).
Log to keep
- pwm_freq_hz, pwm_duty_pct, speed_rpm, speed_jitter_rpm
- dp_pa (if present), mode, rpm_band_id
Symptom 3 — CO₂/VOC drift or random spikes
First 2 measurements (minimum tools)
- Sensor status: read NDIR/MOX status words, error codes, and sampling interval counters.
- Rail + timing: measure VDD_SENS ripple (or proxy) and align sample_ts to PWM/valve/RF events.
Discriminator (sensor health vs coupling)
- Status errors present → sensor health/rail integrity issue more likely than “algorithm”.
- Status OK but spikes align to events → sampling window / ground bounce / RF burst coupling.
- Slow response → NDIR sampling interval or airflow path issue; confirm ΔP/flow evidence before replacing sensors.
First fix (highest ROI)
- Sampling window: schedule CO₂/VOC ADC reads away from PWM edges, valve actuation, and RF bursts.
- Clean sensor rail: add/verify local LDO + decoupling for sensor/AFE domain.
- Example MPNs: CO₂ Sensirion SCD41/SCD40, Senseair S8; VOC Sensirion SGP40/SGP30, ams CCS811, Bosch BME688; LDO TI TPS7A02; ESD TI TPD2E001.
Log to keep
- co2_ppm, voc_raw/idx, sensor_status, sample_interval_ms
- sample_ts, rf_tx_event_ts, valve_act_ts, pwm_change_ts
- vdd_sens_min_mv / ripple_proxy, i2c_nack_count
Symptom 4 — Filter replaced but alarm persists / false alarm
First 2 measurements (minimum tools)
- Trigger reason: confirm which evidence fired (DP_HIGH / SPEED_DROP / POWER_RISE / HOURS).
- Post-replace snapshot: compare ΔP and PWM→speed deviation against baseline after a stable run window.
Discriminator (duct issue vs baseline/state issue)
- ΔP does not drop after replacement → duct restriction, hose water, or pressure plumbing fault more likely than filter-life logic.
- ΔP drops but alarm remains → baseline not refreshed, thresholds too tight, or state/log ordering issue.
- Non-monotonic evidence (ΔP↓ but SPEED_DROP still triggers) → check FG integrity and sampling windows.
First fix (highest ROI)
- Re-baseline: run a defined stable window after installation and store baseline_version with timestamp.
- Pressure plumbing audit: check hose orientation and condensation traps; confirm zero drift behavior.
- Example MPNs: ΔP Sensirion SDP810/SDP31; FRAM Fujitsu MB85RC256V (baseline/log storage example); RTC Micro Crystal RV-3028-C7 (timestamp stability example).
Log to keep
- filter_state, trigger_reason, baseline_version, rebaseline_ts
- dp_pa, pwm_duty_pct, speed_rpm, current_ma (snapshot)
- hose_fault_flag (if used), dp_zero_offset
Symptom 5 — Gateway is online but control is unresponsive / high latency
First 2 measurements (minimum tools)
- Command application: cmd_seq vs last_applied_cmd_seq (is the command executed locally?).
- Link stability: retry_count and reconnect_count time-aligned to the symptom.
Discriminator (dedup/resync vs link noise)
- cmd_seq increases but not applied → dedup/expiry/resync gating issue more likely than “radio”.
- reconnect storms → potential EMI correlation (align with PWM/actuation timing before blaming firmware).
- State mismatch → state_version/state_gen_ts missing or resync not enforced.
First fix (highest ROI)
- Force resync: require state alignment before accepting new actuation commands after reconnect.
- Dedup + expiry: discard stale sequences; ensure idempotent command execution.
- Example MPNs (if hardwired gateway used): TI THVD1450 or Maxim MAX3485 (RS-485); ESD Littelfuse SMF05C (I/O array), Nexperia PESD5V0S1UL (single-line ESD).
Log to keep
- cmd_id, cmd_seq, ack_seq, last_applied_cmd_seq
- state_version, state_gen_ts, resync_success_count
- retry_count, reconnect_count, offline_duration
Symptom 6 — RF enabled → sensor noise increases / readings become unstable
First 2 measurements (minimum tools)
- Correlation: align sensor spikes with rf_tx_event_ts (or RSSI/tx windows) using timestamps.
- Rail integrity: check VDD_SENS ripple (or ADC jitter proxy) during RF bursts.
Discriminator (shared rail/return vs radiated coupling)
- Spikes align + rail ripple increases → shared rail/return path coupling more likely.
- Spikes align but rail stable → radiated coupling / layout proximity more likely.
- No alignment → revisit PWM/valve edges as the dominant aggressor (use the risk map in the Rugged Power & EMC Checklist).
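The three-way split above can be written as a small decision function. The sketch assumes the alignment result is already known and compares sensor-rail ripple during RF bursts against idle ripple; the function name and the 1.5x rise factor are illustrative assumptions.

```python
def classify_rf_coupling(spikes_aligned, ripple_idle_mv, ripple_rf_mv,
                         ripple_rise_factor=1.5):
    """Split RF-related sensor noise into conducted vs radiated causes.

    ripple_rise_factor is an assumed threshold for 'rail ripple increases'.
    """
    if not spikes_aligned:
        return "check_pwm_or_valve_edges"
    if ripple_rf_mv > ripple_rise_factor * ripple_idle_mv:
        return "shared_rail_or_return"
    return "radiated_or_layout"
```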
First fix (highest ROI)
- Domain isolation: give sensors an LDO + local filtering; separate return paths where possible.
- Sampling windows: avoid RF burst windows for sensitive reads (CO₂/VOC/ΔP snapshots).
- Example MPNs: LDO TI TPS7A02; ESD TI TPD2E001; RF+MCU module example ESP32-C6 (Wi-Fi + Thread class), BLE/Thread class example nRF52840 (use-case dependent).
Log to keep
- rf_tx_event_ts, reconnect_count, retry_count
- co2_ppm/voc_raw snapshot + sample_ts
- vdd_sens_ripple_proxy, adc_jitter_metric, pwm_change_ts
FAQs (Evidence-first, ERV-specific)
Each answer stays inside this page boundary: fan/valve drive evidence, CO₂/VOC/ΔP evidence, filter-life baseline/logs, gateway disconnect evidence (seq/ts/retry), and rugged power/EMC correlation. Each answer is short, actionable, and log-driven.