Smart HVAC Terminal Hardware Design Guide
Core Idea
A Smart HVAC Terminal is a field-grade sensor-and-actuator endpoint: it turns real-world temperature/humidity/pressure signals into stable control of dampers and fans while staying reliable on noisy 24V power and long cables. The winning design is evidence-first: manage error sources, protect I/O and power, and keep calibration, self-test, and logs so faults can be located and serviced quickly.
What a Smart HVAC Terminal Is (and Is Not)
A Smart HVAC Terminal is a room/zone endpoint that measures local temperature, humidity, and pressure, drives dampers/valves/fans, and exposes rugged field I/O and 24V power handling with device-level diagnostics. It is not a gateway, cloud controller, or building management platform.
Typical deployments (hardware commonality focus): VAV terminals, FCU controllers, fresh-air terminals, radiant floor valve nodes—each uses the same core hardware loop: sense → control → actuate, under noisy power and long-cable constraints.
System position (endpoint responsibility): local sensors + local actuation + cable-robust interfaces + brownout-tolerant power + actionable field evidence. Any higher-level orchestration belongs elsewhere and should remain out of scope.
In-scope hardware responsibilities (the “must-not-fail” set):
- Sensing chain integrity: T/RH/pressure accuracy is dominated by placement, environment, and filtering choices—not only the IC.
- Actuation without surprises: damper/valve/fan outputs must survive inductive kick, miswiring, and stalls while providing clear diagnostics.
- Field I/O robustness: RS-485/Ethernet reliability depends on common-mode control, grounding strategy, isolation boundaries, and transient paths.
- Rugged 24V power entry: brownouts and transients must not cause random resets, latchups, or “ghost” faults.
- Device-level evidence: logs and health flags should distinguish sensor faults, actuator stalls, link instability, and power events.
Sensor Chain Reality: Why “Bad Readings” Rarely Mean a Bad IC
In HVAC terminals, measurement quality is dominated by an error chain: installation/environment + sensor physics + analog front-end + sampling/filter strategy + power/ground coupling. A stable number on screen can still be wrong if the chain is biased or delayed.
First-pass triage (fast and evidence-driven):
- Classify the symptom shape (offset / drift / noise / slow / spikes). This immediately narrows the bucket.
- Break one coupling first (reduce airflow, remove condensation risk, bypass pressure tubing, or isolate power noise). If the symptom changes, the root cause is likely not the IC.
- Only then touch electronics: verify reference stability, RC corner choices, sampling/anti-aliasing, and ground return paths.
Error buckets and the most common mechanisms (kept compact here; deeper chapters will expand):
- Temperature: self-heating, thermal coupling to enclosure/PCB, airflow-dependent dynamics.
- Humidity: condensation, contamination, slow response that looks “stable,” temperature-correction sensitivity.
- Pressure: tubing/port resonance, pulsation tied to fan/damper events, zero drift and mechanical bias.
- ADC / Reference / Filtering: aliasing risk, RC vs digital filter delay, step-response distortion.
- Power / Ground: ripple-to-code coupling, inductive kick injection, ground potential differences over long cables.
Temperature Measurement Hardware: Choosing NTC, RTD, or Digital Sensors
Temperature accuracy in HVAC terminals is usually limited by thermal placement and excitation strategy, not ADC resolution. The practical goal is a design that stays stable across airflow changes, enclosure heat coupling, and long-cable installation variability.
Card A — When to use which (engineering decision rules)
- NTC (thermistor): best for cost-sensitive room sensors when cable runs are short to moderate and a small offset is acceptable.
- RTD (Pt100/Pt1000): preferred when higher repeatability is required or long cables/terminal contact resistance cannot be ignored.
- Digital temperature IC: best when noise immunity and calibration handling are prioritized, and I²C wiring constraints are manageable.
Core boundary: if temperature error changes when airflow or nearby heat sources change, the dominant cause is likely thermal coupling, not the sensor IC.
NTC: divider, excitation current, self-heating, and lead resistance
- Divider & ADC interaction: high divider impedance reduces power but increases sensitivity to sampling transients and leakage paths.
- Excitation current: stronger excitation improves noise immunity but increases self-heating (a controllable bias source).
- Lead resistance: long cables add series R and terminal contact variability; the effect can look like drift or offset between installations.
- First proofs: reduce excitation or duty-cycle excitation and check if readings drop; compare short-lead reference vs installed cable.
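The excitation versus self-heating trade-off can be put into rough numbers. The sketch below assumes a 10 kΩ NTC in a plain resistive divider and a dissipation constant of 1.5 mW/°C; both values, and the function name `ntc_self_heating_c`, are illustrative assumptions. The real dissipation constant depends on the part, its mounting, and airflow, so take it from the datasheet.

```python
def ntc_self_heating_c(v_excite, r_series_ohm, r_ntc_ohm, dissipation_mw_per_c):
    """Estimate steady-state self-heating (deg C) of an NTC in a resistive divider.

    Power in the NTC is I^2 * R. The temperature rise is that power divided by
    the dissipation constant (mW per deg C), which depends on mounting/airflow.
    """
    current_a = v_excite / (r_series_ohm + r_ntc_ohm)
    power_mw = current_a ** 2 * r_ntc_ohm * 1000.0
    return power_mw / dissipation_mw_per_c


# Assumed example: 3.3 V excitation, 10 k series resistor, 10 k NTC (about 25 C).
continuous = ntc_self_heating_c(3.3, 10_000, 10_000, 1.5)

# Duty-cycled excitation (assumed 5 % on-time) scales the average power by the
# duty, provided the on-time is short compared with the thermal time constant.
duty_cycled = continuous * 0.05
```

If the installed reading drops by roughly the difference between `continuous` and `duty_cycled` when excitation is gated, self-heating is a plausible contributor; a much larger shift points elsewhere in the chain.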
RTD: 2/3/4-wire boundary (practical, not theoretical)
- 2-wire: simplest but lead/terminal resistance directly becomes measurement error; acceptable mainly for short runs or loose accuracy targets.
- 3-wire: common field compromise; assumes two leads are matched—mismatch or contact changes can create intermittent offsets.
- 4-wire: best when cable variability must be removed from the measurement; usually justified for longer runs or tight repeatability needs.
- First proofs: wiggle/torque terminal connections and watch for step changes; compare measured lead resistance vs allowable error budget.
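To compare measured lead resistance against an error budget, it helps to convert ohms into degrees. The sketch below is a rough estimate only: it assumes copper leads, the common 0.00385 /°C platinum coefficient, and a linear sensitivity near 0 °C. The cable length and cross-section are made-up example values, and the function names are hypothetical.

```python
COPPER_RESISTIVITY_OHM_M = 1.72e-8  # approximate value at 20 C


def lead_resistance_ohm(length_m, area_mm2):
    """Resistance of one copper conductor of the given length and cross-section."""
    return COPPER_RESISTIVITY_OHM_M * length_m / (area_mm2 * 1e-6)


def two_wire_error_c(length_m, area_mm2, r0_ohm, alpha_per_c=0.00385):
    """Approximate temperature error from both leads of a 2-wire RTD connection.

    Sensitivity near 0 C is taken as r0 * alpha (ohm per deg C); the out-and-back
    lead resistance adds directly to the measured RTD resistance.
    """
    loop_r = 2.0 * lead_resistance_ohm(length_m, area_mm2)
    return loop_r / (r0_ohm * alpha_per_c)


# Assumed example: 20 m cable run, 0.5 mm^2 conductors.
pt100_err = two_wire_error_c(20.0, 0.5, 100.0)    # a few degrees C
pt1000_err = two_wire_error_c(20.0, 0.5, 1000.0)  # ten times smaller
```

The same cable costs a Pt1000 one tenth of the error it costs a Pt100, which is one reason 2-wire Pt1000 is often tolerable where 2-wire Pt100 is not. Terminal contact resistance adds to `loop_r` and is not modeled here.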
Digital temperature IC: I²C wiring, pull-ups, and isolation boundary
- Pull-up strength: too strong increases emissions and coupling; too weak makes edges slow and increases susceptibility to noise and “stuck” behavior.
- Routing: long I²C runs behave like antennas; keep wiring short or treat as a boundary requiring buffering/isolation (only as needed).
- Failure pattern: sporadic “stale” values or jumpy readings often correlate with bus retries/timeouts rather than true temperature change.
- First proofs: log read retries/timeouts; reduce bus speed/pull-up strength and verify stability improves without changing placement.
Card B — Placement & cabling checklist (stability-first)
- Avoid local heat: keep sensors away from regulators, drivers, relays, and high-current copper pours.
- Control thermal paths: minimize enclosure-to-sensor conduction; wall contact can bias readings toward wall temperature.
- Airflow awareness: airflow changes affect both true temperature and sensor dynamics; decide whether to measure “air” or “surface” and place accordingly.
- Condensation adjacency: if the sensor sits near cold surfaces, plan for condensation risk and cross-check with humidity handling.
- Cabling discipline: strain-relief and stable terminal contact reduce “pseudo-drift” caused by contact resistance changes (especially RTD).
Humidity Measurement Hardware: Condensation, Contamination, Response Time, and Drift
Humidity problems in HVAC terminals are often caused by condensation and contamination, not electronics. A humidity number can look stable while being wrong due to a large time constant or an unrecognized dew-point event. The practical goal is predictable behavior: known failure modes, fast evidence, and recoverable maintenance actions.
Card A — Symptom → Proof → Fix (maintenance-oriented)
- Stuck / Saturated (e.g., reads ~100% or a fixed value): likely condensation or surface wetting.
  - First proof: correlate with temperature approaching dew point; drying/airflow change partially restores behavior.
  - Fix direction: adjust placement away from cold surfaces, add moisture-aware shielding, plan recovery behavior.
- Slow response (environment changes but reading lags strongly): likely protective cap + contamination or heavy filtering.
  - First proof: step-change test and measure time-to-63% (time constant increases over life if contaminated).
  - Fix direction: balance protection vs response, reduce excessive filtering, define acceptable time constant for control.
- Spikes / jumps (brief excursions): often caused by airflow bursts, thermal coupling, or intermittent wetting.
  - First proof: spikes coincide with fan/damper events or rapid temperature swings.
  - Fix direction: improve mechanical shielding, reduce thermal coupling, set sensible spike rejection without hiding real changes.
- Long-term bias (drift): typically due to contamination/aging rather than ADC error.
  - First proof: drift correlates with exposure history; cleaning/drying improves partially but not fully.
  - Fix direction: use contamination-aware placement, define recalibration/replace policy, store drift indicators in logs.
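The time-to-63% proof used for the slow-response case can be pulled from a logged step test with a few lines of code. This is a minimal sketch that assumes the baseline and final values are known from the test setup; the function name and the synthetic data are illustrative, not taken from any particular sensor.

```python
import math


def time_to_63_percent(times_s, values, baseline, final):
    """Return the first time at which the response covers 63.2 % of the step.

    Linear interpolation is used between samples. Returns None if the
    threshold is never reached within the record.
    """
    target = baseline + 0.632 * (final - baseline)
    rising = final > baseline
    prev_t, prev_v = times_s[0], values[0]
    for t, v in zip(times_s[1:], values[1:]):
        crossed = v >= target if rising else v <= target
        if crossed:
            if v == prev_v:
                return t
            frac = (target - prev_v) / (v - prev_v)
            return prev_t + frac * (t - prev_t)
        prev_t, prev_v = t, v
    return None


# Synthetic example: RH steps from 40 % to 60 % with an assumed 8 s time constant.
times = [float(i) for i in range(61)]
rh = [40.0 + 20.0 * (1.0 - math.exp(-t / 8.0)) for t in times]
tau_est = time_to_63_percent(times, rh, baseline=40.0, final=60.0)
```

Storing `tau_est` at commissioning and again at each service visit turns "it feels slower" into a trend that can be compared against the acceptable time constant for control.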
Card B — Design levers that prevent “stable but wrong” humidity
- Protection strategy: dust/moisture protection reduces contamination but increases response time; choose based on “control vs monitor” priority.
- Filtering strategy: avoid a single heavy low-pass that creates lag; if needed, separate a fast control view and a slow reporting view (behavior-level, not algorithm-level).
- Temperature compensation boundary: compensation is only as good as local temperature validity; poor thermal placement creates compensation error that looks like RH drift.
- Calibration boundary: factory calibration is scalable; field single-point correction can fix offset but not nonlinearity; multi-point field calibration is rarely practical unless controlled fixtures exist.
- Evidence hooks: log dew-point risk flags, time constant estimates (from step tests), and anomaly counters (stuck/spike events).
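A dew-point risk flag can be derived from the local T/RH reading plus an estimate of the coldest nearby surface. The sketch below uses the Magnus approximation with one commonly used coefficient set for saturation over water. The 2 °C margin, the function names, and the idea that a surface temperature estimate is available are all assumptions for illustration.

```python
import math

MAGNUS_A = 17.62
MAGNUS_B = 243.12  # deg C


def dew_point_c(temp_c, rh_percent):
    """Dew point (deg C) from air temperature and relative humidity (Magnus)."""
    rh = min(max(rh_percent, 0.1), 100.0)  # clamp to keep log() defined
    gamma = math.log(rh / 100.0) + MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def condensation_risk(air_temp_c, rh_percent, surface_temp_c, margin_c=2.0):
    """True when the surface is within margin_c of the dew point (or below it)."""
    return surface_temp_c - dew_point_c(air_temp_c, rh_percent) < margin_c
```

For example, `condensation_risk(25.0, 50.0, 14.5)` flags a risk because the dew point at 25 °C and 50 % RH is a little under 14 °C. Logging the flag with a timestamp lets later "stuck" or "spike" events be checked against it.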
Pressure & Differential Pressure Measurement: Ports, Pulsation, Filtering, and Zero Management
In real installations, unstable ΔP is most often caused by the pneumatic path (take-off ports, tubing, condensation, partial blockage), while electronics typically comes second. A robust design treats the pressure chain as a system: sampling location → tubing dynamics → sensor/AFE → filtering delay → zero-state validity.
Card A — Pneumatic & installation checklist (most issues start here)
- Take-off location: avoid ports placed near elbows, fans, dampers, or strong turbulence zones; these inject dynamic pressure into ΔP.
- Take-off geometry: burrs, angled probes, and directionality can bias the sampled pressure; keep port shape consistent across builds.
- Tubing length: long/uneven tubes add delay and can create resonance; symptoms often include periodic oscillation and phase lag after an actuator step.
- Kinks and compression: “soft blockage” can appear only at certain flow conditions; strain-relief and routing discipline prevent intermittent faults.
- Condensation: water traps and cold surfaces can cause sticking, lag, and hysteresis; treat dew-point proximity as an operating mode, not a rare event.
- Dust/partial clog: gradual drift and slower step response often come from contamination at the port or inside tubing.
Card B — Filtering and zeroing strategy (do not “slow it down until it looks stable”)
- Pulsation handling: analog RC helps prevent aliasing and reduces high-frequency ripple before sampling; digital filtering then shapes the final behavior. The boundary is delay vs stability: too much low-pass creates a “stable but late” signal that misleads control logic.
- Analog RC vs digital filter: RC is deterministic and protects the ADC front end; digital filtering is adjustable and can support different views (fast control vs slow reporting). If only one heavy filter is used, it may hide condensation/clog events by flattening the evidence.
- Zero drift management should be treated as a state machine:
  - Boot zero: only valid when a known “zero ΔP” condition exists (e.g., ports equalized, no strong pulsation).
  - Periodic re-zero: compensates slow drift, but must be blocked during high dynamics (fan ramps, damper movements).
  - Event-triggered re-zero: useful after maintenance actions or after detecting long-term bias during low-dynamic windows.
  - Validity checks: if the computed zero offset is outside expected bounds, mark it “untrusted,” keep last known-good zero, and log the event.
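The zero-management rules above can be sketched as a small gatekeeper object. This is one possible shape, not a reference implementation: the class name, the ±5 Pa plausibility bound, and the caller-supplied `quiet` flag (meaning "ports equalized, no fan ramp or damper movement in progress") are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroManager:
    max_offset_pa: float = 5.0   # assumed plausibility bound for a zero offset
    zero_pa: float = 0.0         # last known-good zero
    events: list = field(default_factory=list)

    def try_rezero(self, samples_pa, quiet, reason):
        """Attempt a re-zero (reason: 'boot', 'periodic', 'event').

        Returns True if the new zero was accepted. Rejected attempts leave the
        last known-good zero untouched and are logged.
        """
        if not quiet:
            self.events.append((reason, "blocked_dynamic", None))
            return False
        candidate = sum(samples_pa) / len(samples_pa)
        if abs(candidate) > self.max_offset_pa:
            self.events.append((reason, "untrusted", candidate))
            return False
        self.zero_pa = candidate
        self.events.append((reason, "accepted", candidate))
        return True

    def corrected(self, raw_pa):
        """Apply the current zero to a raw reading."""
        return raw_pa - self.zero_pa
```

Keeping boot, periodic, and event-triggered re-zero on the same code path means every attempt leaves the same kind of log entry, which makes a drifting or blocked sensor visible as a run of "untrusted" events.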
Evidence hook: record step response (time-to-63%) after a known actuator step. If the dominant change tracks tube length/routing, the pneumatic path is leading; if it tracks filter parameters, the signal processing is leading.
Actuator Outputs: Damper/Valve/Fan Drives, Protection Paths, and Stall/Limit Evidence
HVAC terminal outputs must be selected around the actuator type (24VAC/24VDC, spring return, proportional control) and validated as a closed loop: drive topology, protection paths, and evidence that distinguishes stall vs normal end-stop vs power/cabling issues.
Card A — Output type selection (and what goes wrong if chosen incorrectly)
- Relay (mechanical): best for 24VAC or higher-current on/off loads.
  - Common wrong-choice symptom: using it for frequent modulation results in chatter, premature wear, and supply disturbances during switching.
- SSR (solid-state): good for silent operation and frequent switching.
  - Common wrong-choice symptom: AC SSR can leak or fail to turn off cleanly for certain loads; thermal buildup can shift behavior over time.
- Low-side MOSFET (DC): simple and cost-effective for 24VDC loads.
  - Common wrong-choice symptom: shared ground return causes measurement/logic upsets during actuation, seen as sensor glitches or sporadic resets.
- High-side switch (DC): improves supply-domain control and supports short-to-ground detection.
  - Common wrong-choice symptom: missing backfeed handling allows regenerative energy to lift the bus, triggering resets or faults.
- Half-bridge / H-bridge: required for reversible motors or “floating” control (open/stop/close).
  - Common wrong-choice symptom: poor freewheel/recirculation paths or missing timing margins lead to heating and repeated fault trips.
- Proportional control (0–10V / PWM): common for fans or modulating valves.
  - Common wrong-choice symptom: overly slow output updates or excessive filtering produces stable-looking signals that cannot track real demand changes.
Field reality: many “actuator faults” are actually supply events (brownout or backfeed). Always correlate faults with bus voltage during switching.
Protection paths (engineering-level, not an EMC deep dive)
- Inductive kickback: provide a defined path (flyback/TVS/snubber depending on AC/DC and switching method) to avoid overshoot and repeated protection trips.
- Backfeed / regeneration: spring return and motor inertia can push energy back into the supply; clamp or absorb energy to prevent bus lift and resets.
- Miswire / short: plan for short-to-ground, short-to-supply, and intermittent cable faults; define safe shutdown and retrial policies.
- Brownout: actuation current can pull down the 24V rail; separate sensitive domains and log undervoltage events for evidence.
Card B — Stall vs end-stop vs supply/cabling: evidence checklist
- Time-window evidence: measure “command-to-stop” time and enforce a window; repeated over-window events are strong indicators of mechanical issues or supply limits.
- Current/voltage signatures:
  - Stall: current rises and stays high (high plateau) while motion does not complete within the time window.
  - Normal end-stop: current peaks then drops sharply at completion; limit feedback often changes near the drop.
  - Supply/cabling issue: bus voltage sags heavily while current fails to reach expected levels; time window is exceeded without a true stall plateau.
- Limit/position evidence (if available): limit switch or position feedback should align with current change; mismatch indicates sensor/feedback faults or mechanical slip.
- Logging evidence: keep counters for stall, end-stop, brownout, and backfeed clamp events; correlation reduces “no-fault-found” returns.
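The signatures in this checklist can be turned into a first-pass classifier over sampled current and bus voltage. The sketch below is deliberately simple. All thresholds (idle, expected run, and stall current, plus the sag voltage) are placeholder assumptions to be set per actuator, and `classify_move` is a hypothetical name.

```python
def classify_move(current_a, bus_v, dt_s, window_s,
                  idle_a=0.05, expected_run_a=0.20, stall_a=0.45, sag_v=20.0):
    """Classify one actuator move from equally spaced current/bus samples.

    Returns 'end_stop', 'stall', 'supply_or_cabling', or 'timeout_unclassified'.
    """
    n_window = min(len(current_a), int(round(window_s / dt_s)))
    if n_window == 0:
        raise ValueError("no samples inside the time window")
    in_window = current_a[:n_window]

    # Normal completion: current rises above idle, then drops back inside the window.
    started = False
    for amps in in_window:
        if amps > idle_a:
            started = True
        elif started:
            return "end_stop"

    # Stall: current is still on a high plateau over the last quarter of the window.
    tail = in_window[-max(1, n_window // 4):]
    if min(tail) >= stall_a:
        return "stall"

    # Supply/cabling: bus sagged and current never reached the expected run level.
    if min(bus_v[:n_window]) < sag_v and max(in_window) < expected_run_a:
        return "supply_or_cabling"
    return "timeout_unclassified"
```

Incrementing a per-class counter from the returned label gives the stall, end-stop, and brownout counts that the logging item asks for. The "timeout_unclassified" bucket is worth keeping, since it shows how often the thresholds fail to separate the cases.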
Fan / EC Fan Interfaces: 0–10V, PWM, Tach/FG, and Noise Immunity
Fan instability in the field is often misdiagnosed as a “bad fan.” In practice, most issues come from reference ground shifts, common-mode injection, long-cable coupling, and the fact that analog control is amplitude-sensitive. A robust terminal design closes the loop with evidence: command signal → cable/ground → fan-side decode → Tach/FG reality.
Card A — Interface selection (and typical wrong-choice symptoms)
- 0–10V (analog): simplest and widely compatible, but highly sensitive to ground reference and coupled noise on long runs.
  - Wrong-choice symptom: low-speed hunting, speed wobble synchronized with relay/SSR switching or nearby power converters.
  - Hardware focus: output impedance, RC shaping, reference/return path, isolation boundary.
- PWM (duty control): more immune to amplitude drift but can become an interference source due to fast edges.
  - Wrong-choice symptom: duty “pollution” from edge coupling, ringing/overshoot causing mis-decode at the fan input.
  - Hardware focus: frequency/duty window, edge control, cable coupling, buffering/isolation boundary.
- Tach/FG (speed feedback): the “truth channel” for accountability; it separates control jitter from mechanical/electrical fan-side issues.
  - Wrong-choice symptom: false pulses or missing pulses due to insufficient input protection and poor debounce.
  - Hardware focus: input clamps, thresholding/pull-ups, debounce, stall decision evidence.
Noise-immunity essentials (engineering boundaries)
- 0–10V output impedance and filtering: if output impedance is too high, cable coupling and fan-input leakage can visibly move the voltage. Use RC shaping to suppress spikes, but avoid excessive low-pass that creates a “stable but late” command.
- Ground reference and common-mode: 0–10V is always “relative to a return.” A poor return path turns actuation currents into control error. When ground is not trustworthy, treat isolation as a boundary choice rather than an optional upgrade.
- PWM edge behavior: fast edges radiate and couple; edge control (buffer + series resistance / controlled slew) often reduces field jitter more than changing PWM frequency.
- Tach/FG conditioning: protect against ESD and cable transients, and debounce in a way that preserves real speed dynamics. A stall decision should not rely on a single missing pulse; use time windows and corroborating evidence when available.
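The tach conditioning rule (reject glitches, but never call a stall on one missing pulse) can be sketched on captured edge timestamps. The minimum pulse interval, the two pulses per revolution, and the 2 s stall window below are assumptions; check the fan datasheet for the real pulse count and pick the window from the slowest valid speed.

```python
def filter_tach_edges(edge_times_s, min_interval_s):
    """Drop edges that follow the previous accepted edge too closely (glitches)."""
    accepted = []
    for t in edge_times_s:
        if not accepted or t - accepted[-1] >= min_interval_s:
            accepted.append(t)
    return accepted


def rpm_from_edges(edge_times_s, pulses_per_rev=2):
    """Average RPM over the span of the given (already filtered) edges."""
    if len(edge_times_s) < 2:
        return 0.0
    span = edge_times_s[-1] - edge_times_s[0]
    if span <= 0.0:
        return 0.0
    revs = (len(edge_times_s) - 1) / pulses_per_rev
    return 60.0 * revs / span


def is_stalled(edge_times_s, now_s, commanded_on, stall_window_s=2.0):
    """Stall only if the fan is commanded on and no edge arrived for a full window."""
    if not commanded_on:
        return False
    last = edge_times_s[-1] if edge_times_s else float("-inf")
    return now_s - last > stall_window_s
```

Setting `min_interval_s` from the maximum plausible fan speed keeps the filter from hiding real speed changes: anything faster than that speed cannot be a genuine pulse.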
Three “measure first” evidence checks (fast field triage)
- Time-align command vs Tach/FG: does the command jitter first, or does Tach/FG jitter first?
- Check reference/return: does the control reference move during switching events (ground shift/common-mode)?
- Change routing: separate control wires from power/relay/motor wiring; if jitter changes immediately, coupling is the driver.
Communication Hardware Robustness: RS-485 vs Ethernet (Physical Layer Only)
Many “random disconnect” complaints are caused by physical robustness differences between devices: common-mode headroom, grounding, termination/biasing, supply noise to PHYs, and ESD exposure. This section provides a hardware-side evidence path. Protocol details (Modbus/BACnet registers, retries, timeouts) belong to the dedicated protocol pages.
Card A — RS-485 dropouts: three evidence types to check first
- Evidence #1 — Common-mode / ground potential difference: RS-485 is not only differential; if common-mode exceeds receiver headroom, errors become random.
  - What to capture: A/B relative to local ground and any ground offset that changes with fan/actuator switching.
  - Fix direction: isolation boundary, return-path discipline, and surge current routing.
- Evidence #2 — Waveform integrity (termination / bias): reflections and ringing indicate wrong termination or stubs; idle drifting indicates weak biasing.
  - What to capture: differential waveform for overshoot/ringing, and idle stability under cable changes.
  - Fix direction: correct termination placement, biasing strategy, and controlled topology choices.
- Evidence #3 — Error counters and event correlation: CRC/frame errors that cluster around switching events strongly suggest hardware injection.
  - What to capture: error counters time-aligned to fan/relay/valve events and power rail disturbances.
  - Fix direction: improve isolation, protectors, return currents, and local supply stability.
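Evidence #3 depends on time-aligning error timestamps with switching events. The sketch below assumes both are logged against the same clock; the function names and the ±0.2 s window are illustrative. It reports the fraction of errors that fall near an event, alongside the fraction that would be expected if errors were spread evenly over the observation period.

```python
from bisect import bisect_left


def fraction_near_events(error_times_s, event_times_s, window_s):
    """Fraction of errors that fall within +/- window_s of any switching event."""
    if not error_times_s:
        return 0.0
    events = sorted(event_times_s)
    near = 0
    for t in error_times_s:
        i = bisect_left(events, t)
        neighbors = events[max(0, i - 1):i + 1]  # closest event on each side
        if any(abs(t - e) <= window_s for e in neighbors):
            near += 1
    return near / len(error_times_s)


def chance_fraction(event_count, window_s, duration_s):
    """Share of the timeline covered by event windows (uniform-error baseline)."""
    return min(1.0, event_count * 2.0 * window_s / duration_s)


# Made-up example: three relay events and four CRC errors in a 100 s capture.
observed = fraction_near_events([10.02, 49.95, 70.0, 90.1], [10.0, 50.0, 90.0], 0.2)
baseline = chance_fraction(3, 0.2, 100.0)
```

When `observed` is far above `baseline`, the errors are clustering around switching and the hardware injection path deserves attention before any protocol-level tuning.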
Card B — Ethernet link instability: three evidence types to check first
- Evidence #1 — Link flaps (up/down cycles): frequent renegotiation events indicate a physical robustness issue rather than an application-layer timeout.
  - What to capture: link status transitions and when they occur.
- Evidence #2 — PHY supply noise during events: PHY rails can be sensitive to short spikes or dips; unstable rails correlate strongly with link flaps.
  - What to capture: supply dips/spikes aligned to actuation switching or PoE/24V bus disturbances.
  - Fix direction: stronger local decoupling, cleaner rail partitioning, and reduced injection through returns.
- Evidence #3 — ESD / shield / grounding interactions: different devices implement shield and chassis references differently; ESD exposure can create intermittent faults.
  - What to capture: ESD marks or “after-plugging” degradation, plus immediate behavior change after swapping port/cable.
  - Fix direction: port protection, shield termination consistency, and controlled chassis/earth handling.
Rugged 24V Power: Entry Protection, Brownout, Transients, and No-Reboot Behavior
Field 24V rails are often “dirty”: long harnesses, shared supplies, motor/relay switching, and surge exposure can introduce brief dips and spikes that look harmless on average but still reset logic and comms. A robust Smart HVAC Terminal focuses on energy routing (where transients go), deterministic reset behavior (UV threshold + hysteresis), and minimal hold-up for the domains that must stay alive.
Card A — 24V field transient types → engineering countermeasures
| Type (field reality) | Typical symptom | Capture first evidence | Fix direction (engineering-level) |
|---|---|---|---|
| Reverse wiring | No boot, hot parts, fuse opens. | Polarity at the terminal; fuse/entry part temperature. | Reverse protection boundary (ideal-diode/bridge/high-side) + clear terminal labeling. |
| Surge / induced spikes | Failures cluster with storms/outdoor events; entry protector discoloration. | Entry TVS condition; event correlation; spike presence at the terminal. | Fast clamp near the terminal + controlled surge return path to chassis/earth (do not inject into signal ground). |
| EFT / switching spikes | Random resets when relays/fans/valves switch. | Reset reason counter time-aligned to switching events. | Entry filtering + domain decoupling + driver flyback loops closed locally (keep switching currents away from logic returns). |
| Brownout dips | Supply returns to 24V quickly, yet MCU/comms reboot. | Minimum voltage at the device during the event; UV threshold proximity. | UV threshold + hysteresis + PG/reset blanking; minimal hold-up for logic/comms across short dips. |
| Long-cable droop | Device-end 24V is consistently lower; failures worsen under load. | Measure 24V at the device terminal versus supply source; check connector heating. | Wire/terminal sizing, contact resistance control, and local energy storage for critical domains. |
Card B — No-reboot reset/PG strategy checklist (brownout done right)
- Set a real UV threshold with hysteresis: avoid “threshold hovering” that triggers repetitive resets. Hysteresis is a stability requirement, not a luxury.
- PG/reset blanking and debounce: short spikes should not reset the system; real brownout should lead to a clean, deterministic reset. Separate short-glitch handling from long-dip handling.
- Domain partitioning: keep the high di/dt driver domain from pulling down logic/comms rails through shared impedance. Partition rails and returns: Analog, Digital/Comms, Driver.
- Minimal hold-up for critical domains: target a short “ride-through” window for MCU + comms so brief dips do not force a reboot. Driver power may be allowed to drop first if the system fails safe.
- Record reset reasons: keep brownout counters and reset-cause flags so “rare” problems become measurable and fixable.
Measure first (fast triage): (1) device-end minimum 24V during the event, (2) PG/reset behavior around the dip, (3) driver switching return injection into logic rails.
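The UV threshold, hysteresis, and blanking items in the checklist above can be checked against a captured rail waveform before committing to supervisor settings. This is a behavioral sketch only. The 18.0 V falling threshold, 19.5 V rising threshold, and five-sample blanking are assumed example values, not recommendations, and `supervisor_trace` is a hypothetical name.

```python
def supervisor_trace(v_samples, uv_fall_v=18.0, uv_rise_v=19.5, blank_samples=5):
    """Simulate a UV supervisor with hysteresis and glitch blanking.

    Returns one boolean per sample (True = reset asserted). Reset asserts only
    after the rail stays below uv_fall_v for blank_samples consecutive samples,
    and releases only when the rail recovers to uv_rise_v or above.
    """
    in_reset = False
    below_count = 0
    trace = []
    for v in v_samples:
        if in_reset:
            if v >= uv_rise_v:
                in_reset = False
                below_count = 0
        elif v < uv_fall_v:
            below_count += 1
            if below_count >= blank_samples:
                in_reset = True
        else:
            below_count = 0
        trace.append(in_reset)
    return trace


# Short glitch (3 samples low): ridden through. Long dip (10 samples low): clean reset.
glitch = supervisor_trace([24.0] * 5 + [15.0] * 3 + [24.0] * 5)
dip = supervisor_trace([24.0] * 3 + [15.0] * 10 + [18.5] * 3 + [24.0] * 3)
```

In the `dip` case the reset stays asserted while the rail sits at 18.5 V, between the two thresholds; that gap is what prevents the "threshold hovering" resets described in the checklist.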
Interference Immunity and Wiring: Terminals, Long Cables, Grounding, Shielding, and Input Protection
Immunity is rarely “mystical.” Most field issues become predictable once layout and wiring are reviewed as zones, return paths, and port-protection priority. This section turns common failure patterns into checklists that can be applied during schematic review, PCB layout review, and harness installation.
Card A — Layout review checklist (zoning and returns)
- Terminal zoning: separate power entry, high-energy outputs (valves/fans), comm ports, and sensor inputs at the connector level.
- Return paths: keep driver flyback and switching currents local; avoid routing these returns through analog or comm reference regions.
- Analog vs digital boundary: place input RC and reference points close to the sensing/ADC boundary; keep the “quiet island” physically quiet.
- First landing at the port: clamps and filters must sit near the terminal so harsh energy is handled before it reaches internal zones.
- Harness choices: use twisted pairs for differential/long runs; prefer shield where needed, but do not let shield currents flow through signal ground.
- Ground loop awareness: avoid forming unintended loops between remote equipment grounds and local signal reference.
Review order: Terminals → Protection placement → Return paths → Zone separation → Harness/shield termination → Sensitive inputs.
Card B — Port protection priority (ESD / EFT / Surge placement)
A practical rule: clamp fast at the port, limit energy next, then buffer/isolate as the boundary. Keep the energy return away from quiet references.
| Port | Priority #1 (at terminal) | Priority #2 (energy control) | Priority #3 (boundary) |
|---|---|---|---|
| 24V IN | TVS clamp + correct return path | Fuse/eFuse limit + entry filter | Domain partitioning + supervisor UV/PG |
| Driver outputs | Flyback clamp near the switch/load | Snubber / series impedance where needed | Keep driver return local; isolate if required |
| 0–10V / PWM / Tach | ESD clamp near terminal | RC / series resist for filtering and edge control | Buffer / isolation boundary when reference is not trusted |
| Sensor inputs | ESD clamp close to terminal | RC anti-alias and input limiting near ADC boundary | Analog “quiet zone” separation |
| RS-485 | ESD/surge clamp at connector | Termination/bias placement discipline | Isolation boundary for ground offsets |
| Ethernet | ESD at RJ45/shield handling | PHY rail cleanliness and local decoupling | Magnetics boundary and controlled chassis reference |
Calibration, Service, and Self-Test: An Operable Strategy from Factory to Field
Long-lived Smart HVAC terminals need an “operable” plan: drift must be controlled, faults must be diagnosable, and field service must be predictable. This section provides (1) calibration paths that are realistic in production and in the field, (2) a minimum self-test set that catches wiring/aging/contamination failures, and (3) a minimum log schema that turns intermittent issues into actionable evidence.
Card A — How to choose calibration strategy (cost vs accuracy vs maintenance)
Calibration should match failure physics and field reality. Temperature errors often come from installation and self-heating, humidity errors are dominated by contamination/condensation and slow response, and differential pressure errors are frequently caused by tubing/condensate rather than electronics. Use factory calibration for consistency; use field checks and zero-management for operability.
| Quantity | Practical calibration path | When to trigger service | Example BOM (part numbers) |
|---|---|---|---|
| Temperature (T) | Factory: sensor-level calibration or system offset trim. Field: quick reference check + offset update; avoid “calibrating away” bad mounting. | Persistent bias beyond limits; self-heating indicators; inconsistent reading across zones; sudden step change after wiring/placement changes. | Digital T sensor: TI TMP117, ADI ADT7420, Maxim MAX31875. RTD interface (Pt100/Pt1000): Maxim MAX31865. ULP NTC ADC reference (system trim): ADI ADR4525 (reference) + MCU ADC. |
| Humidity (RH) | Factory: preferred for initial accuracy and compensation. Field: “health check” + replace protective cap/sensor if response is slow or contaminated; avoid frequent re-trim. | Response time becomes slow; RH gets “stuck”, jumps, or shows persistent bias after cleaning/ventilation checks; condensation events flagged. | RH/T sensor modules: Sensirion SHT31, SHT41, TI HDC2080. Combined RH/T/P (compact): Bosch BME280 (use with field health checks). |
| Pressure / ΔP | Factory: zero + span (or multi-point) for sensor consistency. Field: strong focus on zero management (startup re-zero / periodic re-zero / event-triggered re-zero) after tubing checks. | Zero drift persists after tubing inspection; slow settling; sensitivity loss after condensate/blocked port events; readings no longer correlate with fan state. | Digital differential ΔP: Sensirion SDP31 / SDP810 series. Board-mount pressure: TE Connectivity MS4525DO. Analog ΔP (requires ADC + health checks): NXP MPXV7002DP. |
Card B — Minimum self-test + minimum log fields (make faults locatable)
A self-test without logs is not serviceable. The minimum set below catches most real faults: wiring opens/shorts, out-of-range, drift trends, response slow-down (RH/ΔP), plus power/reset and comm error evidence.
Minimum self-test set
- Open/short detection: detect ADC saturation, I²C sensor missing/CRC failures, impossible codes, and stuck-at readings.
- Sanity range: clamp impossible values; prevent control from acting on corrupted input.
- Drift trend: track long-term bias (rolling mean/median); alert when drift accumulates beyond a threshold.
- Response health: flag “too slow” response (RH / ΔP); often indicates contamination, condensation, or tubing blockage.
- Cross-check (lightweight): compare sensor trends with actuator states (fan command vs ΔP response) to identify non-electrical faults.
Minimum log fields (recommended)
| Category | Minimum fields | Example BOM (part numbers) |
|---|---|---|
| Calibration & Versions | calibration mode (factory/field), calibration version ID, timestamp, coefficient set ID (index/hash), runtime since last calibration. | EEPROM for coefficients: Microchip 24LC256, 24AA256. FRAM (high endurance): Fujitsu MB85RC256V, Infineon FM24CL64B. |
| Sensor Health | open/short flags, out-of-range count + last time, drift metric, response-health flag (slow/normal), “condensation suspected” marker (RH). | RH/T with diagnostics: Sensirion SHT31/SHT41 (CRC/status). ΔP digital: Sensirion SDP31 (status framing + stable digital output). |
| Power & Reset | reset reason code (POR/BOR/WD), brownout counter, supply-low event timestamp, last “minimum voltage” if measured. | Supervisor/reset: TI TPS3839, Microchip MCP1316, Maxim MAX809. Rail monitor (optional evidence): TI INA226 (bus voltage/current telemetry). |
| Actuator Evidence | last command snapshot (0–10V or PWM duty), tach/FG statistics (dropouts/jitter), stall/timeout events (count + last time). | Current-sense amplifier (stall evidence): TI INA180 / INA199. Watchdog timer for “hang” detection: TI TPS3431. |
| Comms Evidence | link drop count (Ethernet), RS-485 error counter snapshot, “noise burst” timestamps, correlation with resets or switching events (same time window). | RS-485 transceiver (rugged): TI THVD1550, ADI LTC2862. Isolated RS-485 (when ground offsets exist): ADI ADM2587E, TI ISO1410 (isolated transceiver). |
| Event Storage | circular buffer index, record CRC, monotonic counter, power-loss marker. | Low-power SPI NOR (logs): Winbond W25Q32JV, Macronix MX25R6435F. RTC for timestamp: NXP PCF8523, Microchip MCP7940N. |
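The Event Storage row (circular buffer index, record CRC, monotonic counter) can be prototyped on a host before it is ported to firmware. The record layout below is an assumption made for illustration: a 32-bit counter, a 32-bit timestamp, a 16-bit event code, a signed 16-bit value, and a CRC-32 over those fields. The power-loss marker is not modeled.

```python
import struct
import zlib

RECORD_FMT = "<IIHh"  # counter, timestamp (s), event code, signed value
RECORD_SIZE = struct.calcsize(RECORD_FMT) + 4  # payload plus CRC-32


def pack_record(counter, timestamp_s, event_code, value):
    payload = struct.pack(RECORD_FMT, counter, timestamp_s, event_code, value)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_record(blob):
    """Return the record tuple, or None if the CRC does not match."""
    payload = blob[:-4]
    (crc,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) != crc:
        return None
    return struct.unpack(RECORD_FMT, payload)


class RingLog:
    def __init__(self, slots):
        self.slots = slots
        self.counter = 0  # monotonic; slot index is counter modulo slots
        self.buf = bytearray(b"\xff" * (RECORD_SIZE * slots))  # erased pattern

    def append(self, timestamp_s, event_code, value):
        start = (self.counter % self.slots) * RECORD_SIZE
        record = pack_record(self.counter, timestamp_s, event_code, value)
        self.buf[start:start + RECORD_SIZE] = record
        self.counter += 1

    def valid_records(self):
        """All CRC-valid records, oldest first (ordered by monotonic counter)."""
        records = []
        for slot in range(self.slots):
            start = slot * RECORD_SIZE
            rec = unpack_record(bytes(self.buf[start:start + RECORD_SIZE]))
            if rec is not None:
                records.append(rec)
        return sorted(records)
```

Because ordering comes from the monotonic counter and not from slot position, a reader can reconstruct the sequence after wrap-around, and a record torn by power loss simply fails its CRC and is skipped.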
Field triage (decision): If response is slow (RH/ΔP) after cleaning/tubing checks → replace sensor/protection parts; if bias is stable and repeatable → apply field offset; if resets correlate with events → fix power/reset evidence path first.
FAQs: Evidence-First Debug for Smart HVAC Terminals
Each answer follows the same rule: pick the first evidence that separates two root-cause classes, then apply the smallest hardware-side fix. Protocol stacks, gateways, cloud services, and TSN/PTP details are intentionally out of scope.
1 Why is the temperature reading “one beat late”? Check self-heating first or filter delay?
Separate thermal lag/self-heating from signal-chain delay. First evidence: (1) step response time with airflow change, (2) sensor excitation/power (self-heating correlates with current), and (3) digital filter window vs update rate. If lag remains with filtering reduced, fix mounting/airflow coupling before “recalibrating.”
- Quick test: reduce averaging; compare rise/fall time.
- Parts: TI TMP117, ADI ADT7420, Maxim MAX31875.
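To judge how much of the lag could come from the digital filter alone, the delay of an averaging filter can be estimated from its window and update rate. The sketch below assumes a plain moving-average (boxcar) filter, which the guide does not specify; the function names, window size, and sample period are illustrative.

```python
def moving_average_delay_s(window, sample_period_s):
    """Group delay of an N-sample moving average: (N - 1) / 2 sample periods."""
    return (window - 1) / 2 * sample_period_s


def moving_average(samples, window):
    """Causal moving average; the first outputs average the samples seen so far."""
    out = []
    acc = 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= window:
            acc -= samples[i - window]   # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def samples_to_reach(series, fraction, final_value):
    """Index of the first sample at or above fraction * final_value, else None."""
    target = fraction * final_value
    for i, v in enumerate(series):
        if v >= target:
            return i
    return None


# Assumed example: ideal step at sample 20, 8-sample window, 2 s update period.
step = [0.0] * 20 + [1.0] * 20
filtered = moving_average(step, 8)
half_rise_delay = samples_to_reach(filtered, 0.5, 1.0) - 20   # delay in samples
predicted_delay_s = moving_average_delay_s(8, 2.0)            # delay in seconds
```

With these assumed values the filter alone accounts for about 7 s of delay. If the measured lag is much longer than the predicted filter delay, the remainder points to thermal coupling or self-heating rather than the signal chain.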
2 In rainy season RH drifts or gets stuck—condensation or contamination? What field evidence helps?
Condensation often causes sudden anomalies and recovery artifacts; contamination typically causes slow response and persistent bias. First evidence: (1) response-time health (rise/fall time), (2) sensor status/CRC errors, (3) protective cap/filter condition and airflow path. If response is slow, service/replace protection parts before applying offsets.
- Quick test: compare RH response to a controlled humidity change (bag/vent swap).
- Parts: Sensirion SHT31/SHT41, TI HDC2080.
3 Differential pressure reading is very jittery—check pneumatic path first or electronics first?
Start with the pneumatic path. Tubing length, port blockage, condensate, and resonance can create jitter that no filter can “fix” without adding unusable delay. First evidence: (1) short/straight tube A/B test, (2) water/contamination check at ports, (3) correlation with fan state. Then tune analog RC and digital filtering.
- Quick test: temporarily shorten tubing and compare variance.
- Parts: Sensirion SDP31 / SDP810, TE MS4525DO.
4 After extending NTC wires the reading shifts—line resistance or common-mode noise?
Line resistance creates a stable, repeatable bias; common-mode noise creates time-varying error tied to switching (fan/valves). First evidence: (1) measure NTC resistance at the terminal vs at the sensor, (2) observe ADC node ripple during actuator switching, (3) A/B test with twisted pair and shielding. Use input RC close to the ADC boundary.
- Quick test: log raw ADC codes during fan speed changes.
- Parts: Reference for stable ADC scaling: ADI ADR4525 (system-level); input protection + RC at port.
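The size of the stable bias caused by lead resistance can be estimated before going on site. The sketch below uses the common Beta-parameter NTC model; the 10 kΩ / 25 °C nominal value, the Beta of 3950 K, and the function names are assumptions for illustration, so substitute the datasheet values of the actual sensor.

```python
import math


def ntc_temp_c(resistance_ohm, r0_ohm=10_000.0, beta_k=3950.0, t0_c=25.0):
    """Beta model: 1/T = 1/T0 + ln(R/R0)/B, with T in kelvin."""
    t0_k = t0_c + 273.15
    inv_t = 1.0 / t0_k + math.log(resistance_ohm / r0_ohm) / beta_k
    return 1.0 / inv_t - 273.15


def lead_error_c(ntc_resistance_ohm, loop_resistance_ohm, **ntc_params):
    """Temperature error when the cable's loop resistance adds in series with the NTC."""
    measured = ntc_temp_c(ntc_resistance_ohm + loop_resistance_ohm, **ntc_params)
    actual = ntc_temp_c(ntc_resistance_ohm, **ntc_params)
    return measured - actual


# Assumed example: 10 ohm loop resistance, NTC at its nominal 10 kOhm (25 degC).
error_at_25c = lead_error_c(10_000.0, 10.0)
```

Under these assumptions the reading is roughly 0.02 °C low at 25 °C, and the same loop resistance produces a larger error at higher temperatures, where the NTC resistance is lower. A shift much larger than this estimate, or one that varies with actuator switching, points toward noise rather than line resistance.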
5 0–10V fan control becomes “fast/slow”—is it output noise or a fan-side issue?
Decide whether the command is unstable or the fan response is unstable. First evidence: (1) 0–10V ripple and ground reference shift, (2) tach/FG dropout or jitter statistics, (3) correlation with cable routing/shield termination. If 0–10V is clean but FG is unstable, focus on fan wiring or input protection at the fan interface.
- Quick test: scope 0–10V at fan end while logging FG pulse counts.
- Parts: Add input protection and a series resistor; capture current evidence with TI INA180.
6 Damper actuator does not move but power is OK—check relay/driver first or limit/stall first?
Start at the boundary: verify the output truly reaches the load, then validate mechanical endpoints. First evidence: (1) output terminal voltage/state under command, (2) current signature or run-time window, (3) limit switch feedback change (if available). If the output toggles correctly but current/position does not change, suspect limit, stall, or wiring at the actuator.
- Quick test: measure output at terminal while forcing a known command.
- Parts: Window watchdog for stuck control: TI TPS3431; current evidence: TI INA199.
7 How to set stall-detect thresholds without false alarms? Which evidence calibrates thresholds?
Thresholds must be tied to a time window and operating state, not a single current number. First evidence: (1) distribution of normal startup current and duration, (2) stall current/plateau behavior, (3) limit-reached time distribution across units and temperature. Use logging to tune: false alarms cluster around cold start, supply dips, or wiring resistance.
- Quick test: record peak current + time-to-limit for 30–50 cycles.
- Parts: TI INA180 / INA199 (sense), log storage: Fujitsu MB85RC256V (FRAM).
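One way to turn the logged cycles into thresholds is to derive a current limit and a time limit from the recorded distributions and to require both kinds of evidence before declaring a stall. The sketch below is an illustration only: the mean-plus-k-sigma rule, the `k_sigma` default, and the class labels are assumptions, not values from this guide.

```python
import statistics


def derive_limits(peak_currents_a, times_to_limit_s, k_sigma=4.0):
    """Limits as mean + k * sample standard deviation of the logged normal cycles."""
    i_limit = statistics.mean(peak_currents_a) + k_sigma * statistics.stdev(peak_currents_a)
    t_limit = statistics.mean(times_to_limit_s) + k_sigma * statistics.stdev(times_to_limit_s)
    return i_limit, t_limit


def classify_cycle(peak_current_a, time_to_limit_s, i_limit, t_limit):
    """Classify one actuation; time_to_limit_s is None if the limit was never reported."""
    timed_out = time_to_limit_s is None or time_to_limit_s > t_limit
    over_current = peak_current_a > i_limit
    if timed_out and over_current:
        return "stall"          # both time-window and current evidence agree
    if timed_out:
        return "timeout"        # no limit reached, but no extra load either
    if over_current:
        return "high-current"   # limit reached with unusual current: log, do not alarm
    return "ok"


# Assumed log of six normal cycles (amps, seconds).
peaks = [0.40, 0.42, 0.41, 0.39, 0.43, 0.40]
times = [28.0, 30.0, 29.0, 31.0, 30.0, 29.0]
i_limit, t_limit = derive_limits(peaks, times)
```

Because false alarms cluster around cold start and supply dips, it may be worth deriving separate limits per operating state instead of one global pair.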
8 Reboots happen on 24V transients—fix entry protection first or reset/PG strategy first?
Use evidence to choose. If device-end 24V dips below UV, start at entry protection and hold-up. If 24V stays above UV but resets still occur, fix PG/reset hysteresis and debounce. First evidence: (1) minimum device-end 24V during the event, (2) reset reason (POR/BOR/WD), (3) PG waveform relative to the transient.
- Quick test: log reset reason and timestamp; correlate with actuator switching events.
- Parts: Supervisor TI TPS3839, Microchip MCP1316, Maxim MAX809; telemetry TI INA226.
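The decision rule above can be applied to a batch of logged reset events to see which fix class dominates. This is a minimal sketch; the function names, the event tuple format, and the 18 V undervoltage threshold in the example are assumptions.

```python
from collections import Counter


def triage_reset(min_device_24v, uv_threshold_v):
    """Pick the fix class from the minimum device-end 24V seen during the event."""
    if min_device_24v < uv_threshold_v:
        return "entry protection and hold-up"
    return "PG/reset hysteresis and debounce"


def tally_resets(events, uv_threshold_v):
    """events: iterable of (min_device_24v, reset_reason) tuples from the log."""
    return Counter(
        (triage_reset(min_v, uv_threshold_v), reason) for min_v, reason in events
    )


# Assumed example log: two deep dips with BOR, one reset with the rail still healthy.
tally = tally_resets([(16.2, "BOR"), (15.9, "BOR"), (21.4, "POR")], uv_threshold_v=18.0)
```

Keeping the reset reason in the tally preserves the second piece of evidence, so a cluster of watchdog resets with a healthy rail stands out instead of being folded into the power classes.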
9 RS-485 becomes less stable when only the peer device changes—what three evidence classes come first?
Use the “three-evidence set”: (1) common-mode/ground offset evidence (peer grounding difference), (2) waveform evidence (termination/bias, reflections, overshoot), (3) error/time-window evidence (dropouts correlate with switching or ESD events). If ground offsets are large, add isolation; if reflections dominate, fix termination placement and stubs.
- Quick test: measure A/B common-mode vs local ground and compare between peers.
- Parts: TI THVD1550, ADI LTC2862; isolated option ADI ADM2587E or TI ISO1410 + transceiver.
10 Ethernet shows occasional link flap—more often supply noise, or ESD/grounding/shield issues?
Supply-noise flaps correlate with load switching and 24V dips; ESD/grounding flaps correlate with cable handling, dry air, and shield termination. First evidence: (1) link up/down counters with timestamps, (2) PHY rail ripple or droop during events, (3) RJ45 shield-to-chassis strategy and visible ESD marks near the connector. Fix the correlated class first.
- Quick test: correlate link flaps with 24V transient logs and actuator switching.
- Parts: Rail monitor TI INA226; reset supervisor TI TPS3839; ESD near port (interface-dependent).
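The correlation in the quick test can be done offline on exported timestamps. The sketch below counts what fraction of link flaps have a logged event of another class within a time window; the function name, the ±1 s window, and the sample timestamps are illustrative assumptions.

```python
from bisect import bisect_left


def correlated_fraction(events_s, reference_s, window_s):
    """Fraction of events with at least one reference timestamp within +/- window_s."""
    if not events_s:
        return 0.0
    ref = sorted(reference_s)
    hits = 0
    for t in events_s:
        i = bisect_left(ref, t - window_s)       # first reference at or after t - window
        if i < len(ref) and ref[i] <= t + window_s:
            hits += 1
    return hits / len(events_s)


# Assumed timestamps in seconds since boot.
link_flaps = [100.0, 250.0, 400.0, 900.0]
supply_dips = [99.5, 249.8, 600.0]
share_with_dips = correlated_fraction(link_flaps, supply_dips, 1.0)
```

Running the same function against actuator switching timestamps and against service or cable-handling records gives comparable fractions, and the class with the highest share is the one to fix first.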
11 Should calibration be time-based, event-based, or drift-monitoring triggered?
Time-based calibration fits predictable drift and scheduled service. Event-based calibration fits condensation, filter replacement, tubing service, and wiring changes. Drift-monitoring triggers are best when logs exist: track bias and response health, then calibrate only when thresholds are exceeded. For RH/ΔP, slow response usually indicates maintenance/replacement rather than recalibration.
- Quick test: trend drift metrics + response-health flags over weeks.
- Parts: FRAM for durable logs: Fujitsu MB85RC256V, Infineon FM24CL64B; RTC: NXP PCF8523.
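A drift-monitoring trigger can be expressed as a small decision function over the logged bias and response-health evidence. This is a sketch under stated assumptions: the limits, the use of sample standard deviation as the repeatability measure, and the extra "investigate" outcome for large but unstable bias are illustrative additions.

```python
import statistics


def calibration_decision(bias_samples, bias_limit, spread_limit, response_slow):
    """Decide what to do with a sensor from its trended evidence."""
    if response_slow:
        return "service"        # slow response: maintain or replace, do not recalibrate
    if len(bias_samples) < 2:
        return "keep"           # not enough evidence to act on
    mean_bias = statistics.mean(bias_samples)
    spread = statistics.stdev(bias_samples)
    if abs(mean_bias) <= bias_limit:
        return "keep"           # within threshold: no calibration needed
    if spread <= spread_limit:
        return "calibrate"      # bias exceeds the limit and is stable and repeatable
    return "investigate"        # large but unstable bias: an offset would not hold


# Assumed example: RH bias in %RH against a reference, trended over several checks.
decision = calibration_decision([2.1, 2.0, 2.2, 2.1], bias_limit=1.5,
                                spread_limit=0.3, response_slow=False)
```

Checking response health before bias keeps a contaminated or wet sensor from being masked by an offset.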
12 What is the minimum commissioning test checklist to quickly assign responsibility (sensor/actuator/comms/power)?
Use a minimum evidence set: device-end 24V min value + reset reason; PG behavior; raw sensor status/CRC and raw codes; actuator output terminal verification; stall/limit event counters; FG pulse statistics; RS-485 common-mode and waveform snapshot; Ethernet link counters with timestamps. This isolates whether the issue is measurement, actuation, comm physical layer, or power integrity.
- Quick test: capture one “bad event” window with timestamps across power/comms/actuation.
- Parts: Supervisor MAX809; telemetry INA226; RS-485 THVD1550; storage W25Q32JV.