Edge PID / Loop Controller (Precision ADC/DAC, Safety & HMI)
Edge PID / Loop Controller is a deterministic “last-mile” control core that closes real-world loops locally by budgeting latency and jitter end-to-end, while keeping measurement/output paths precise and fail-safe under noise, brownouts, and software faults.
It focuses on evidence-driven design: measurable timing/ADC/output/PID states and a safe-state supervision policy so tuning and field debugging stay predictable and low-cost.
What This Page Covers (and What It Doesn’t)
An Edge PID / Loop Controller is the last-mile controller that closes a physical loop locally with deterministic timing. It combines precision ADC/DAC, a low-jitter time base, and robust safety supervision (watchdog + hold-up + safe outputs), plus a local HMI for commissioning, diagnostics, and recovery.
-
What readers should be able to design after this page
A loop-controller architecture that remains stable, low-noise, and deterministic under real scheduling, conversion delay, and output non-idealities.
-
What readers should be able to budget and verify
End-to-end latency + jitter from sensor → ADC → compute → DAC/PWM → actuator, with measurable timing checkpoints and acceptance limits tied to loop bandwidth and stability margins.
-
What readers should be able to harden for field reality
Fail-safe behavior that survives brownouts and software faults: watchdog policy, hold-up behavior, deterministic safe outputs, and post-fault evidence (reason codes + event logs).
System Architecture Blueprint (Signal, Time, Safety)
This blueprint decomposes an edge loop controller into three planes that must align: Signal (what is measured/commanded), Time (when sampling and actuation happen), and Safety (what happens when power or software becomes untrustworthy). The goal is a design that can be mapped to hardware and firmware in minutes, and verified using measurable checkpoints.
-
Signal plane (sensor → compute → actuator)
Sensor front-end and anti-aliasing define in-band noise and phase. ADC conversion + digital filtering define group delay. Output DAC/PWM defines settling, update boundary, and safe clamps.
-
Time plane (low-jitter clock → schedule → timestamp)
A low-jitter time base and deterministic schedule define sampling instant, compute window, and output update instant. Timestamping turns timing into evidence and enables repeatable debugging.
-
Safety plane (watchdog + brownout + hold-up + safe outputs)
Supervision must define safe output values and transitions under faults. Hold-up time is budgeted to preserve safe actuation and capture reason codes and last-gasp logs during brownouts.
Deliverable checklist: For every block below, pin down (1) a spec that matters for loop stability or safety, (2) how it is measured, and (3) the failure signature that will appear in logs or waveforms when it is wrong.
Deterministic Timing Model (Latency + Jitter Budget)
Many field issues that look like “bad tuning” are actually time-domain failures: sampling instants drift, compute deadlines slip, or outputs update at inconsistent moments. A deterministic timing model converts the full loop path into a budget that can be measured, validated, and enforced.
Sampling instant uncertainty
ADC aperture + clock jitter define when “now” is. Excess jitter behaves like phase noise and can destabilize high-bandwidth loops.
Conversion + filter group delay
ADC conversion and digital filtering add fixed delay. Group delay directly consumes phase margin near crossover.
Compute time (worst-case)
Worst-case execution time (not average) determines whether the loop ever misses its update boundary under load.
Output update time
DAC settling and PWM edge timing define when the actuator command becomes physically real.
Plant + sensor delay
Plant dead time and sensor dynamics are often the largest hidden delays; they must be identified and budgeted.
Deterministic update boundary
A single, repeatable “update instant” prevents non-uniform sampling and enables meaningful stability margins.
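The sampling-instant and update-boundary items above can be checked from logged strobe timestamps. The sketch below is a minimal illustration, assuming timestamps are captured in a single consistent unit (for example microsecond timer ticks); the function name `period_jitter` is hypothetical. It reports period jitter (variation of successive intervals), which is related to, but not the same as, time-interval error against an ideal clock.

```python
import math

def period_jitter(timestamps):
    """Return (mean_period, rms_jitter, pkpk_jitter) for a list of strobe timestamps.

    Units follow the input (e.g. microseconds). Jitter here is the deviation of
    each sample-to-sample interval from the mean interval.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    periods = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_period = sum(periods) / len(periods)
    deviations = [p - mean_period for p in periods]
    rms = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    pkpk = max(periods) - min(periods)
    return mean_period, rms, pkpk

# Example: nominal 1000 us period with one early and one late strobe.
mean_us, rms_us, pkpk_us = period_jitter([0, 1000, 2010, 3000, 4000])
```

Running the same measurement under maximum comms and UI load, not only at idle, is what makes the result usable as an acceptance limit.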
Budget fields to compute (must-have)
| Budget item | What it represents | How to measure (typical) | Failure signature |
|---|---|---|---|
| Tsample, Tsample_jitter | Nominal sampling period and sampling instant variation (RMS / p-p). | Timestamp sampling strobes; measure time-interval error on a scope/LA. | Limit-cycle noise, sensitivity to derivative term, unstable high-bandwidth behavior. |
| Tadc_conv | ADC conversion latency from sample to available code. | Toggle a GPIO at start/end of conversion; histogram the delay. | Unexpected phase loss; stability degrades without obvious tuning changes. |
| Tfilter_gd | Digital filter group delay (configuration dependent). | Measure step response delay; or derive from filter specs at relevant frequency. | Bandwidth cannot be raised; oscillation appears near crossover. |
| Tcompute_wc | Worst-case compute time including ISR latency, DMA contention, and cache effects. | GPIO start/end markers around control update; capture worst-case under max traffic. | Occasional output jumps; instability only under load; missed update boundaries. |
| Tupdate | Time from “compute finished” to actuator command physically settling (DAC/PWM). | Measure DAC settle-to-error threshold; PWM edge placement jitter via TIE. | Residual ripple, impulse-like disturbances, inconsistent actuation timing. |
| Tplant_deadtime | Equivalent plant dead time + sensor delay visible to the controller. | Step test / correlation fit; identify dominant delay and sensor dynamics. | Large overshoot, slow recovery, instability despite seemingly conservative gains. |
| Teffective_delay, Jtotal | Total effective delay and total jitter (RMS / conservative bound). | Sum the delay components; combine independent jitter sources as root-sum-square (RMS), or add peak values for a worst-case bound. | Margins do not match analysis; loop behaves differently than simulated tuning. |
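As a worked illustration of the last table row, the sketch below sums the controller-side delay items, combines jitter as root-sum-square, and converts the delay into phase lost at a chosen crossover using the standard pure-delay relation (phase lag in degrees = 360 × f × delay). The numeric values and the function names are assumptions for illustration, not measurements from any particular design, and the RSS combination assumes the jitter sources are independent.

```python
import math

def effective_delay_s(delays_s):
    """Sum named delay contributions (seconds) into Teffective_delay."""
    return sum(delays_s.values())

def total_jitter_rms_s(jitters_rms_s):
    """Root-sum-square of independent RMS jitter contributions (seconds)."""
    return math.sqrt(sum(j * j for j in jitters_rms_s))

def phase_loss_deg(delay_s, crossover_hz):
    """Phase lag of a pure delay at the crossover frequency, in degrees."""
    return 360.0 * crossover_hz * delay_s

# Hypothetical budget values (seconds).
budget = {
    "Tadc_conv": 10e-6,
    "Tfilter_gd": 200e-6,
    "Tcompute_wc": 50e-6,
    "Tupdate": 40e-6,
}
t_eff = effective_delay_s(budget)              # about 300 us
j_total = total_jitter_rms_s([3e-6, 4e-6])     # about 5 us RMS
loss = phase_loss_deg(t_eff, 100.0)            # about 10.8 degrees at 100 Hz
```

Plant dead time can be added to the same dictionary once it has been identified; keeping it as a separate entry makes it obvious which part of the phase loss the controller hardware can actually influence.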
Precision ADC Design for Control Loops (Accuracy vs Speed vs Noise)
ADC selection defines what can be measured in the loop bandwidth that matters. The key trade is not only resolution versus sample rate, but also in-band noise, group delay, and drift—all of which directly shape stability margins and steady-state performance.
SAR ADC
Low latency and fast settling support higher loop bandwidths, but require careful anti-aliasing, reference noise control, and layout hygiene.
Low delay · Fast response · Noise-managed
Sigma-Delta ADC
Excellent low-frequency noise performance via noise shaping, but digital filtering can introduce significant group delay that consumes phase margin.
Low in-band noise · Filter delay · Bandwidth-limited
-
Input conditioning: anti-alias corner vs loop bandwidth
Anti-alias filtering must suppress out-of-band noise while keeping phase loss small near crossover. Every filter adds phase lag and must be included in the timing budget from the Deterministic Timing Model section.
-
Reference strategy: external vs ratiometric
Reference noise appears as measurement noise; reference drift appears as slow PV bias that the integrator must carry. Choose ratiometric sensing when the sensor excitation and the ADC reference derive from the same source, so that reference drift cancels in the ratio.
-
Calibration policy: offset/gain + temperature compensation
Define when calibration runs (factory, commissioning, maintenance), how data is validated (CRC/version), and how temperature compensation is applied without destabilizing the loop.
-
Over-sampling & decimation: noise benefit vs group delay
Over-sampling can improve in-band resolution, but decimation filters can add large group delay. If phase margin degrades, noise improvement becomes counterproductive.
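The oversampling trade can be put into numbers for the simplest decimator, an N-sample moving average. The sketch below assumes white, uncorrelated input noise (so RMS noise falls as 1/√N) and uses the linear-phase FIR group delay of (N − 1)/2 input samples. The function name and example values are hypothetical; sinc-type sigma-delta filters typically add more delay than this simple case.

```python
import math

def boxcar_tradeoff(n_avg, fs_hz, crossover_hz):
    """Noise benefit versus phase cost of an n_avg-sample moving average.

    Returns (group_delay_s, noise_factor, extra_bits, phase_loss_deg).
    Assumes white noise; correlated noise or interference will improve less.
    """
    if n_avg < 1:
        raise ValueError("n_avg must be >= 1")
    group_delay_s = (n_avg - 1) / (2.0 * fs_hz)
    noise_factor = 1.0 / math.sqrt(n_avg)        # RMS noise multiplier
    extra_bits = 0.5 * math.log2(n_avg)          # effective resolution gain
    phase_loss_deg = 360.0 * crossover_hz * group_delay_s
    return group_delay_s, noise_factor, extra_bits, phase_loss_deg

# Example: average 16 samples taken at 16 kHz, loop crossover at 100 Hz.
gd, nf, bits, loss = boxcar_tradeoff(16, 16_000.0, 100.0)
```

In this example the average buys about two bits of resolution at a cost of roughly 17 degrees of phase at crossover, which is the kind of comparison the timing budget needs before the filter is accepted.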
DAC / PWM Output Stage (Settling, Glitch Energy, and Safe Clamps)
Output-stage non-idealities often drive “mystery” overshoot, limit cycles, and hard-to-reproduce artifacts. A robust output stage must treat the command as a time-aligned physical signal: it must settle quickly, avoid impulse-like glitches, remain phase-transparent (no hidden delay), and enforce safe bounds under both normal operation and faults.
Settling vs update rate
When DAC/PWM settles slower than the update boundary, the loop sees extra delay and residual error that consume phase margin.
Tdac_settle · Tupdate · delay
Glitch energy
Small switching glitches can inject impulse-like disturbances that excite plant resonances and create unexplained ringing.
glitch area · ringing · resonance
Filtering without hiding phase
Output filters must not smuggle in unbudgeted group delay; phase impact must be accounted for in the timing model (see Deterministic Timing Model).
group delay · phase · crossover
Slew/rate limits and clamps
Slew limiting and hard/soft clamps protect actuators and avoid exciting resonances, but require integrator-aware handling.
du/dt · u_min/u_max · anti-windup
-
DAC settling and update boundary alignment
Define an update boundary and verify that the DAC output reaches the required error band before the next update. Record settle-to-threshold time (e.g., 0.1% / 0.01%) and treat residual settle as effective delay in the timing budget.
-
Glitch impulse control
Capture the output around update instants to quantify impulse-like glitches. If the plant rings after every update, prioritize glitch reduction and synchronous updating before changing PID gains.
-
Output filtering with explicit phase accounting
Any RC/active filter must publish its group delay (or phase at crossover) as a first-class budget item. Filtering that “looks good on noise” but erodes phase margin will amplify overshoot and instability risk.
-
Slew-rate limiting / rate clamp
Apply rate limits to avoid exciting mechanical or thermal resonances and to protect actuators. Rate limiting is a nonlinearity; its interaction with the integrator must be defined (freeze, back-calc, or conditional integration).
-
Clamps and safe bounds (hard + soft)
Define hard limits (absolute safety) and optional soft limits (comfort zone). A clamp strategy must specify integrator behavior to prevent windup, limit cycles, and slow recovery after saturation.
-
PWM as DAC: resolution, dither, edge jitter, synchronous update
PWM output quality is set by effective resolution, edge placement jitter, and whether updates are synchronized to a fixed boundary. Dither can spread quantization artifacts; unsynchronized edge timing behaves like output phase noise.
-
Failsafe output definition (fault/brownout)
Specify the exact safe output value (voltage/duty), the transition profile (hard cut vs ramp), and the time-to-safe requirement. Safe output behavior must remain deterministic even when software is untrusted.
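The slew-limit and clamp items above can be sketched as a small output conditioner that runs once per update boundary. This is an illustrative sketch, not a reference implementation: `OutputLimiter`, `max_step` (the du/dt limit multiplied by Tsample), and the returned `limited` flag are hypothetical names. The flag is what an anti-windup policy would consume.

```python
from dataclasses import dataclass

@dataclass
class OutputLimiter:
    u_min: float           # hard lower bound
    u_max: float           # hard upper bound
    max_step: float        # largest allowed change per update (du/dt * Tsample)
    u_prev: float = 0.0    # last value actually applied

    def apply(self, u_cmd):
        """Rate-limit, then clamp. Returns (u_applied, limited)."""
        step = u_cmd - self.u_prev
        step = max(-self.max_step, min(self.max_step, step))
        u = self.u_prev + step
        u = max(self.u_min, min(self.u_max, u))
        limited = u != u_cmd   # True when either the rate limit or the clamp acted
        self.u_prev = u
        return u, limited

lim = OutputLimiter(u_min=0.0, u_max=10.0, max_step=1.0)
first = lim.apply(5.0)    # rate-limited toward 5.0
second = lim.apply(1.5)   # within the rate limit, passes through
```

Applying the rate limit before the clamp keeps the applied value inside the hard bounds even when the previous output was already at a limit.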
Discrete-Time PID Done Right (Not Just Kp/Ki/Kd)
A PID controller is an embedded component with state, timing, and nonlinear boundaries. Correct behavior requires discretization consistent with the sample period, noise-aware derivative design, saturation-aware integration, and deterministic mode transitions that prevent output jumps.
Discretization method
Tustin/bilinear, backward Euler, and matched pole-zero produce different closed-loop behavior. The chosen method must match Tsample and delay budget.
Tsample · Teffective_delay
Band-limited derivative
Derivative action must be filtered to avoid amplifying measurement noise and sampling jitter; treat it as a bandwidth-limited feature.
D filter · noise
Setpoint weighting (β, γ)
Weighting reduces overshoot on setpoint steps without sacrificing disturbance rejection, separating tracking from regulation behavior.
β · γ
Anti-windup + saturation policies
Back-calculation or conditional integration prevents integrator windup during clamps and slew limits, improving recovery and avoiding limit cycles.
anti-windup · sat_time%
-
Discretization choices: Tustin vs backward Euler vs matched pole-zero
Select a discretization aligned to the expected operating bandwidth and timing constraints. If delay/jitter is non-negligible, use the latency and jitter budget from the Deterministic Timing Model section to determine the safe crossover region before selecting the discrete form.
-
Derivative filtering: band-limited derivative
Implement derivative action with an explicit low-pass limit. Without band-limiting, measurement noise and sampling jitter translate into output chatter and resonant excitation.
-
Setpoint weighting (β, γ)
Apply setpoint weighting to reduce overshoot on setpoint changes while keeping disturbance rejection aggressive. Weighting should be logged alongside step-response tests for reproducibility.
-
Anti-windup: back-calculation vs conditional integration
Back-calculation provides smoother recovery but requires a well-chosen tracking time constant; conditional integration is simpler but needs strict saturation and direction rules. Choose based on clamp behavior and recovery requirements.
-
Bumpless transfer: manual↔auto, mode changes, setpoint steps
Ensure mode transitions do not introduce output discontinuities. Align integrator state and output command at the transition boundary, and apply setpoint ramping when required by actuator constraints.
-
Output saturation handling and integrator freeze policies
Define integrator behavior under saturation and slew limits (freeze, back-calc, or conditional). Record saturation duration and integrator state to diagnose slow recovery and limit cycles.
| Artifact | What to compare | What it proves | Minimum fields to log |
|---|---|---|---|
| Step response A/B | Baseline vs +weighting vs +anti-windup | Overshoot reduction and recovery behavior under saturation | Tsample, overshoot%, settling time, sat_time%, integrator_state |
| Disturbance response | With/without weighting (disturbance path) | Disturbance rejection preserved while tracking is improved | error(t), control output u(t), bandwidth notes |
| Stability margins | Margins at chosen sample rate and delay budget | Defensible crossover region and robustness reserve | Tsample, Teffective_delay, phase margin, gain margin |
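The pieces discussed in this section can be combined in one small controller. The sketch below is one possible arrangement under stated assumptions, not the only valid discretization: the derivative uses a backward-Euler first-order filter, the integrator is advanced after the output is computed (as in common textbook implementations), anti-windup is back-calculation with tracking time constant `tt`, and `track_manual` preloads state for a bumpless switch to auto. All names and gains are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    h: float            # sample period Tsample, seconds
    tf: float           # derivative filter time constant, seconds
    tt: float           # anti-windup tracking time constant, seconds
    beta: float = 1.0   # setpoint weight on the proportional term
    gamma: float = 0.0  # setpoint weight on the derivative term
    u_min: float = 0.0
    u_max: float = 1.0
    integ: float = 0.0
    deriv: float = 0.0
    ed_prev: float = 0.0

    def update(self, r, y):
        """One control step: setpoint r, measurement y, returns clamped output."""
        p = self.kp * (self.beta * r - y)
        ed = self.gamma * r - y
        # Band-limited derivative: kd*s / (tf*s + 1), backward Euler.
        self.deriv = (self.tf * self.deriv + self.kd * (ed - self.ed_prev)) / (self.tf + self.h)
        self.ed_prev = ed
        v = p + self.integ + self.deriv            # unsaturated command
        u = min(self.u_max, max(self.u_min, v))    # clamp
        # Integrate the error plus the back-calculation term (u - v).
        self.integ += self.ki * self.h * (r - y) + (self.h / self.tt) * (u - v)
        return u

    def track_manual(self, u_manual, r, y):
        """Preload state so the next auto output equals the manual output."""
        self.deriv = 0.0
        self.ed_prev = self.gamma * r - y
        self.integ = u_manual - self.kp * (self.beta * r - y)

pid = PID(kp=2.0, ki=1.0, kd=0.0, h=0.01, tf=0.01, tt=0.1)
pid.track_manual(0.4, r=1.0, y=0.9)
u_first_auto = pid.update(1.0, 0.9)   # stays at the manual value, no bump
```

With a sustained large error the back-calculation term drives the integrator to a bounded value instead of letting it ramp, which is the behavior the step-response A/B artifact in the table is meant to demonstrate.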
Loop Shaping Toolkit (Bode, Margins, and Compensation Patterns)
This chapter turns tuning into a repeatable workflow: measure the plant response, set explicit margin targets, and select compensation patterns that trade phase, noise, and delay in a controlled way. The goal is not perfect modeling; it is a defensible bandwidth and robustness reserve backed by evidence.
Measure first (FRF)
Identify dominant poles, resonances, and effective delay using measured frequency response rather than ideal assumptions.
FRF · dominant poles · delay
Target margins
Set phase/gain margin targets and a crossover range consistent with latency budget and noise constraints.
PM target · GM target · crossover
Choose patterns
Use lead/lag, notch, feedforward, and cascaded-loop separation patterns as a practical toolbox with known tradeoffs.
lead/lag · notch · FF
Validate & document
Verify margins, noise amplification, and delay consumption; record filter values and evidence fields for reproducibility.
PM/GM · noise · Teffective
Practical toolbox (what to do and when)
-
Plant identification: dominant poles and effective delay
Use measured response to locate slope changes and phase roll-off that reveal dominant dynamics. Extract effective delay and treat it as a hard constraint on crossover. Avoid relying on ideal plant order assumptions when fixtures, sensors, or computation introduce hidden dynamics.
-
Lead compensation (add phase near crossover)
Use lead when phase margin is short near the desired crossover. Place the lead action around the crossover region to boost phase while minimizing excessive high-frequency gain that can amplify measurement noise.
-
Lag compensation (increase low-frequency gain)
Use lag to reduce steady-state error and lower integrator burden when the plant allows it. Explicitly account for the phase penalty and ensure the added low-frequency gain does not increase saturation time or limit-cycle risk.
-
Notch filters (resonance suppression with a latency trade)
Use a notch when a resonance peak dominates the response near or below the intended crossover. Tune notch center frequency from measured resonance; keep bandwidth only as wide as required. Document the added phase/group delay impact.
-
Feedforward (reduce integrator burden for predictable loads)
Use feedforward when disturbance or load is measurable or predictable. Validate that feedforward reduces residual error and integrator activity without degrading stability margins.
-
Cascaded loops (inner/outer bandwidth separation)
Use an inner loop to linearize and speed up a fast dynamic, then wrap an outer loop for slower regulation. Maintain clear bandwidth separation so the outer loop does not fight the inner loop or inject delay into the fast path.
-
Deadband / hysteresis (nonlinear effects on limit cycles)
Deadband and hysteresis can create limit cycles and bias. Treat them as nonlinear elements: identify signatures in residual error and output histograms, then apply linearization or redesign boundaries rather than “tuning harder.”
| Pattern | Primary goal | When to use | Tradeoff to document | Evidence fields |
|---|---|---|---|---|
| Lead | Increase PM near crossover | PM shortfall at desired bandwidth | High-frequency noise gain | PM_target, lead_zero/pole, FRF_phase(f) |
| Lag | Increase low-frequency gain | Steady-state error/integrator burden high | Phase penalty / slower response | LF gain, lag_zero/pole, sat_time% |
| Notch | Suppress resonance peak | Resonance near/under crossover | Added group delay / sensitivity | f_res, Q_res, notch_f0/Q, group_delay |
| Feedforward | Reduce residual error & I load | Predictable load/disturbance available | Model mismatch / bias | ff_gain, residual_error_RMS, integrator_state |
| Cascade | Decouple fast/slow dynamics | Two distinct time constants exist | Bandwidth separation requirement | inner_bw, outer_bw, separation_ratio |
| Deadband | Reduce chatter / handle stiction | Actuator deadzone/stiction visible | Limit cycles / bias risk | deadband_width, u_histogram, limit_cycle_amp |
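For the lead row of the table, the zero and pole can be placed from the desired phase boost using the standard first-order lead relations: with α = (1 − sin φ)/(1 + sin φ), the maximum boost φ occurs at the geometric mean of zero and pole. The sketch below centers that maximum on the crossover. Function names and the example numbers are hypothetical.

```python
import cmath
import math

def design_lead(phase_boost_deg, crossover_hz):
    """Return (zero_rad_s, pole_rad_s) for C(s) = (1 + s/zero) / (1 + s/pole)."""
    if not 0.0 < phase_boost_deg < 90.0:
        raise ValueError("a single lead stage gives between 0 and 90 degrees")
    phi = math.radians(phase_boost_deg)
    alpha = (1.0 - math.sin(phi)) / (1.0 + math.sin(phi))
    wc = 2.0 * math.pi * crossover_hz
    return wc * math.sqrt(alpha), wc / math.sqrt(alpha)

def lead_response(zero, pole, freq_hz):
    """Complex frequency response of the lead stage at freq_hz."""
    s = 1j * 2.0 * math.pi * freq_hz
    return (1.0 + s / zero) / (1.0 + s / pole)

zero, pole = design_lead(45.0, 100.0)
h_c = lead_response(zero, pole, 100.0)
boost_deg = math.degrees(cmath.phase(h_c))   # phase added at crossover
hf_gain = pole / zero                        # high-frequency gain, the noise cost
```

The `hf_gain` value is the tradeoff to document: a 45 degree boost costs a high-frequency gain of roughly 5.8, which multiplies measurement noise above crossover.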
Robustness in the Real World (Noise, Drift, Quantization, and Nonlinearities)
Field stability is often lost to “small” effects that are invisible in ideal models: quantization creates limit cycles, drift biases the integrator, deadzones introduce stick-slip behavior, and temperature shifts move gain and phase. Robust designs treat these effects as first-class elements with clear signatures and logging evidence.
Quantization & limit cycles
Finite resolution can prevent convergence and create periodic residual error. Dithering can spread energy into a noise floor.
LSB · limit cycle · dither
Drift & bias
Sensor bias drift drives integrator load and can cause saturation or slow recovery. Bias estimation must be slow and verifiable.
offset · temp · I state
Deadzone / stiction
Actuator deadzone makes small outputs ineffective, then motion occurs abruptly. Linearization must avoid relay-feedback traps.
deadzone · stick-slip
Nonlinear limits
Rate limits and saturation change loop dynamics and can create limit cycles and slow recovery unless anti-windup is aligned.
du/dt · sat_time%
Signatures and first fixes
-
Quantization limit cycles
Signature: periodic residual error and output toggling near steady state. First fix: add controlled dither or increase effective resolution; reduce derivative sensitivity to quantized PV noise.
-
Sensor bias drift
Signature: slowly growing DC residual error and integrator state drift. First fix: implement offset estimation/compensation with a slow time constant; validate via temperature-tagged logs.
-
Actuator stiction / deadzone
Signature: output increases without plant response until a threshold, then motion jumps. First fix: deadzone linearization and hysteresis rules; avoid aggressive relay-like probing that destabilizes the loop.
-
Temperature gain/phase drift
Signature: performance changes with temperature, including reduced phase margin and new oscillation bands. First fix: gain scheduling with smooth interpolation; log PM proxies vs temperature to prevent discontinuities.
-
Rate limits and saturation as nonlinear elements
Signature: slow recovery after saturation, persistent offset, or limit cycles near clamps. First fix: align anti-windup policy with clamp/slew behavior; verify saturation time percentage decreases.
| Effect | Typical symptom | What to log | Why it matters |
|---|---|---|---|
| Quantization | Limit cycle near steady state | residual error waveform, u histogram, LSB estimates | Distinguishes true oscillation from resolution-induced chatter |
| Drift | DC bias grows over time | residual error DC, integrator state, temperature | Shows whether integrator is carrying sensor bias or plant bias |
| Deadzone | Threshold then jump response | u histogram, stick-slip events, deadzone estimate | Separates stiction effects from poor PID gains |
| Temp effects | Stability changes with temperature | temperature, performance metrics, PM proxy | Enables scheduling decisions and prevents abrupt mode changes |
| Nonlinear limits | Slow recovery / clamp chatter | sat_time%, integrator state, u histogram | Verifies anti-windup effectiveness and boundary policy alignment |
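The quantization row can be turned into a quick screening check on logged residual error. The sketch below is a heuristic under stated assumptions: it flags a record as resolution-induced chatter when the peak-to-peak error is within a couple of LSBs and the error keeps crossing its own mean. The thresholds `max_lsb` and `min_sign_changes` and both function names are hypothetical and would need tuning per application.

```python
def lsb_size(full_scale_span, bits):
    """Size of one LSB in engineering units for an ideal converter."""
    return full_scale_span / (2 ** bits)

def looks_like_quantization_chatter(residual, lsb, max_lsb=2.0, min_sign_changes=4):
    """Heuristic: small, repeatedly toggling residual error near steady state."""
    if len(residual) < 2:
        return False
    pkpk = max(residual) - min(residual)
    mean = sum(residual) / len(residual)
    centered = [e - mean for e in residual]
    sign_changes = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return pkpk <= max_lsb * lsb and sign_changes >= min_sign_changes

lsb = lsb_size(10.0, 12)                 # 10 V span, 12-bit converter
toggling = [lsb / 2, -lsb / 2] * 5       # error flipping by one LSB
large = [0.1, -0.1] * 5                  # oscillation far above resolution
```

A record that fails this check but still oscillates points back to the timing and margin evidence rather than to resolution.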
Safety Supervision (Watchdog, Brownout, Hold-Up, Safe State)
A loop controller must fail predictably. Safety supervision defines health criteria, supply-fault behavior, hold-up obligations, and safe outputs in measurable terms—then proves them with reset reasons, brownout flags, timing evidence, and event logs.
Health is more than “kick the watchdog”
Define health as heartbeat + deterministic timing + data-path liveness (sampling and output updates), not just a periodic feed.
heartbeat · deadline · data-path
Two brownout lines: warn vs reset
Separate early-warning actions from hard reset thresholds so the system can enter a safe state before control becomes unreliable.
warn threshold · reset threshold
Hold-up is a budget with duties
Hold-up time must be sized to maintain safe outputs and write minimum evidence logs (“last-gasp”) within a verified time-to-safe.
hold-up ms · time-to-safe · last-gasp
Safe state must be measurable
Specify the exact clamp/ramp/latch behavior and the maximum time allowed to reach it under brownout or watchdog faults.
clamp · ramp · latch
Design decisions to pin down (and how to prove them)
-
Watchdog policy: simple vs windowed + “healthy” definition
Choose simple watchdogs for basic hang detection, or windowed watchdogs to also catch runaway timing. Define “healthy” as a combination of heartbeat presence and deterministic execution evidence (e.g., control loop deadlines met, sampling/output tasks observed). A feed should be allowed only when health evidence is valid.
-
Brownout detection: early warning vs hard reset thresholds
Set an early-warning threshold that triggers safe-state entry and log preparation, and a lower hard-reset threshold that forces a reset when reliable control is no longer possible. Validate both thresholds against real supply dip waveforms and detection latency.
-
Hold-up time budget: what must remain correct during supply collapse
Define the hold-up obligation explicitly: maintain safe outputs for a guaranteed minimum time and complete the minimum evidence logging path (event sequence counter + timestamp + reset/brownout cause). Measure time-to-safe and last-gasp completion under worst-case dips.
-
Safe state definition: clamp/ramp/latch and timing
Safe state is an output behavior contract. Specify the safe output value (or duty), whether the transition is a bounded ramp or a hard clamp, and the maximum time-to-safe. Ensure the safe output is maintained through the hold-up window and after reset (if required).
-
Fault taxonomy: recoverable vs latched vs degraded
Classify faults so recovery is predictable: recoverable faults can retry with bounded attempts; latched faults require explicit user/service action; degraded faults reduce bandwidth/output limits while preserving safety. Define retry counts and backoff time to avoid rapid oscillation between states.
-
Event log integrity: monotonic counters, timestamps, and last-gasp strategy
Logs must survive power events: use a monotonic event sequence counter to prevent ambiguity, store timestamps from the same timebase used by the control loop, and implement a “last-gasp” write path that commits the final cause codes before brownout reset. Record commit success as part of the evidence.
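The watchdog policy above, where a feed is allowed only when health evidence is valid, can be sketched as a gate function. This is an illustration with hypothetical names and fields; on real hardware the window is enforced by the watchdog peripheral itself, and this logic only decides whether firmware attempts the feed.

```python
from dataclasses import dataclass

@dataclass
class HealthEvidence:
    heartbeat_seen: bool             # supervisor task observed since last feed
    deadline_miss_count: int         # control-loop deadline misses since last feed
    adc_samples_since_feed: int      # data-path liveness: sampling
    output_updates_since_feed: int   # data-path liveness: actuation

def may_feed_watchdog(ev, elapsed_ms, window_open_ms, window_close_ms):
    """Allow a feed only inside the window and only with valid health evidence."""
    in_window = window_open_ms <= elapsed_ms <= window_close_ms
    healthy = (
        ev.heartbeat_seen
        and ev.deadline_miss_count == 0
        and ev.adc_samples_since_feed > 0
        and ev.output_updates_since_feed > 0
    )
    return in_window and healthy

good = HealthEvidence(True, 0, 10, 10)
stalled_adc = HealthEvidence(True, 0, 0, 10)
```

Withholding the feed when the ADC path has stalled lets the watchdog catch a loop that is still running on stale data, which a plain periodic feed would miss.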
| Decision | What must be specified | What to measure (evidence) | Pass criteria example |
|---|---|---|---|
| Watchdog | Mode (simple/windowed), health gates | watchdog_cause, deadline_miss_count, heartbeat_miss_count | No deadline misses at worst case; watchdog triggers only on defined unhealthy conditions |
| Brownout | Warn/reset thresholds, detection latency | brownout_warn_flag, brownout_reset_flag, supply dip waveform | Warn asserted before unsafe execution; reset only below hard threshold |
| Hold-up | Hold-up duration and duties | hold_up_ms_measured, time_to_safe_ms, last_gasp_commit_ok | Safe output maintained ≥ target; last-gasp commit success in worst-case dip |
| Safe state | Clamp/ramp/latch behavior | u_safe_value, safe_ramp_rate, time_to_safe_ms | Output reaches safe state within bound and stays there until recovery policy allows exit |
| Fault policy | Recoverable/latched/degraded + retry/backoff | fault_code, fault_class, retry_count, backoff_ms | No rapid flapping; repeated faults transition to latched or degraded states predictably |
| Log integrity | Monotonic counter, timestamp source, commit rules | event_seq monotonicity, event_ts validity, log_commit_ok | No counter rollback; timestamps consistent; final cause persisted before reset |
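For the hold-up row, a first-order size check follows from the energy stored in the bulk capacitor between the starting voltage and the minimum voltage at which the safe output and log write still work: E = ½·C·(V_start² − V_min²). The sketch below assumes a constant-power load and, conservatively, that reaching the safe state and writing the last-gasp log happen one after the other. All names and numbers are hypothetical.

```python
def holdup_time_s(capacitance_f, v_start, v_min, load_w, efficiency=1.0):
    """Hold-up time for a constant-power load fed from a capacitor."""
    if v_min >= v_start or load_w <= 0:
        raise ValueError("need v_start > v_min and a positive load")
    usable_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_j * efficiency / load_w

def holdup_margin_s(holdup_s, time_to_safe_s, last_gasp_s):
    """Remaining margin after the safe-state transition and last-gasp write."""
    return holdup_s - (time_to_safe_s + last_gasp_s)

# Example: 1000 uF discharging from 24 V to 12 V into a 2 W load.
t_hold = holdup_time_s(1000e-6, 24.0, 12.0, 2.0)   # about 0.108 s
margin = holdup_margin_s(t_hold, 0.020, 0.030)     # about 0.058 s
```

The calculation only sets an expectation; the pass criterion in the table still has to be proven with measured hold_up_ms_measured and time_to_safe_ms under worst-case dips.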
Firmware Execution Model (Determinism First)
Determinism is an architecture property. The control loop must run at a fixed rate with bounded jitter while comms and UI execute in lower-priority lanes. Parameter updates must be atomic at defined boundaries, sampling must be timestamped consistently, and commits must support rollback to prevent half-written states.
Priority separation
Fixed-rate control ISR/task is highest priority; comms and HMI are lower priority and must never steal deadlines.
fixed-rate · deadlines
Synchronous update boundaries
Setpoints and parameters are double-buffered and swap only at a safe boundary so the control math never sees half updates.
double buffer · atomic swap
Sampling path determinism
ADC sampling uses DMA and consistent timestamping; ISR latency and overruns are measured, not assumed.
ADC DMA · timestamp
Commit & rollback
Calibration and configuration writes use an atomic commit protocol with integrity checks and rollback on failure.
commit · rollback · CRC
Architecture patterns (determinism-first)
-
Fixed-rate control lane + non-real-time lanes
The control loop runs on a fixed-rate ISR or highest-priority task with bounded worst-case execution time. Communication, UI rendering, and logging run in lower-priority lanes and must be rate-limited to protect deadlines.
-
Double-buffered setpoints and synchronous parameter swaps
Use two copies of setpoints/parameters: one “active” used by the control loop and one “staging” written by comms/UI. Swap pointers only at a defined boundary (e.g., loop tick) so updates are atomic and deterministic.
-
ADC DMA + timestamping strategy (sampling instant is a contract)
Trigger sampling in a deterministic way (timer-driven), use DMA to move samples, and attach timestamps using the same timebase as the control loop. Record ISR latency distribution and DMA overruns to prove stability under load.
-
Calibration updates and commits: atomicity + rollback
Treat calibration/configuration as transactional: write to a staging slot, verify integrity (CRC), then commit with a single state flip. On failure, keep the previous known-good slot and record rollback evidence.
-
Timebase health monitoring and jitter monitoring
Monitor timebase health continuously: detect abnormal jitter growth and, if applicable, clock source faults. Trigger safe degradation policies when timing quality cannot be guaranteed.
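The double-buffered parameter pattern in the list above can be sketched as follows. This is a logic illustration only, with hypothetical names: on an MCU the swap would be a single pointer or index write inside the control tick (or a short critical section), while here a Python object stands in for that mechanism.

```python
class DoubleBufferedParams:
    """Two parameter slots: 'active' is read by the loop, 'staging' by comms/UI."""

    def __init__(self, initial):
        self._slots = [dict(initial), dict(initial)]
        self._active = 0
        self._pending = False
        self.version = 0     # param_version evidence field

    @property
    def active(self):
        return self._slots[self._active]

    def stage(self, **updates):
        """Called from the non-real-time lane; never touches the active slot."""
        staging = self._slots[1 - self._active]
        if not self._pending:
            # Start from the current active set so unstaged fields carry over.
            staging.clear()
            staging.update(self.active)
        staging.update(updates)
        self._pending = True

    def swap_at_boundary(self):
        """Called once per loop tick; applies a pending update as one flip."""
        if self._pending:
            self._active = 1 - self._active
            self._pending = False
            self.version += 1
        return self.active

params = DoubleBufferedParams({"kp": 1.0, "ki": 0.5})
params.stage(kp=2.0)
before = params.active["kp"]       # loop still sees the old gain
after = params.swap_at_boundary()  # new set becomes visible at the tick
```

Because several staged changes are applied by a single flip, the control math never runs a cycle with a mix of old and new coefficients.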
| Subsystem | Determinism risk | Evidence fields | Pass criteria example |
|---|---|---|---|
| Control lane | Missed deadlines from contention | jitter_pkpk, deadline_miss_count, isr_latency_hist | Zero deadline misses; jitter within bound at worst-case load |
| Params | Half-updated setpoints/coefficients | param_version, param_switch_ts, boundary_ok | Param changes only at boundary; no mixed-version cycles observed |
| Sampling | ADC jitter/overrun and timestamp mismatch | sample_ts, adc_dma_overrun, isr_latency_hist | No overruns; sampling instant consistent; timestamps monotonic |
| Comm/UI | Priority inversion / burst traffic | cpu_load_worstcase, queue_depth, rate_limit_hits | Control deadlines protected even during burst comms/UI activity |
| Commit | Half-written calibration/config | commit_ok, rollback_count, crc_ok, cfg_slot | Commit is atomic; rollback preserves previous slot; failures are logged |
| Timebase | Clock instability raises jitter | clock_ok, jitter_rms, clock_switch_event | Jitter stays within bound; faults trigger degrade policy and evidence logs |
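The commit row can be illustrated with a two-slot store: write the new configuration to the inactive slot, verify a CRC over what was actually stored, and only then flip the active index. The sketch below uses an in-memory list in place of flash or FRAM and a `write` hook to simulate a corrupted write; the class and field names are hypothetical.

```python
import json
import zlib

class TwoSlotStore:
    def __init__(self):
        self.slots = [None, None]   # each entry: (stored_bytes, expected_crc)
        self.active = None          # cfg_slot evidence field
        self.rollback_count = 0

    def commit(self, config, write=lambda data: data):
        """Stage, verify, then flip. Returns commit_ok."""
        payload = json.dumps(config, sort_keys=True).encode("utf-8")
        crc = zlib.crc32(payload)
        target = 1 if self.active == 0 else 0
        self.slots[target] = (write(payload), crc)   # staging write
        stored, expected = self.slots[target]
        if zlib.crc32(stored) != expected:           # integrity check failed
            self.rollback_count += 1                 # active slot is untouched
            return False
        self.active = target                         # single state flip
        return True

    def load(self):
        if self.active is None:
            return None
        stored, _ = self.slots[self.active]
        return json.loads(stored.decode("utf-8"))

store = TwoSlotStore()
ok_first = store.commit({"gain": 1.0})
ok_bad = store.commit({"gain": 2.0}, write=lambda d: d[:-1] + b"X")  # simulated corruption
```

After the failed commit the previous known-good slot is still the one returned by `load`, and the rollback counter provides the evidence field the table asks for.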
Local HMI for Commissioning & Operations (Make Field Debug Cheap)
A local HMI should answer decisive questions quickly: current mode, target, PV/SV behavior, alarms, and I/O health. The goal is not “more screens”—it is a minimum set of screens and guided flows that reduce mis-tuning, speed up triage, and keep changes auditable even when the network is unavailable.
Minimal but decisive screens
Mode, setpoint/output, PV/SV trend, alarms, and I/O state—each with the minimum evidence fields required for triage.
status · trend · alarms · I/O
Commissioning flows
Guided steps for zero/span, actuator direction check, and auto/manual transitions with bumpless behavior and safe limits.
zero/span · direction · auto/manual
Guided diagnostics rules
“If oscillating → check X/Y” style rules that point to concrete evidence fields, then recommend the smallest safe action.
if-then · evidence
Roles, audit, offline
Operator vs technician permissions, change audit trails, and offline-safe operation so field work is possible without network.
RBAC · audit · offline
Screen set (minimum) and required fields
| Screen | Purpose | Minimum fields (evidence-first) | Fast decision it enables |
|---|---|---|---|
| Status / Mode | Establish current operating state | mode, state, timebase_ok, last_reset_reason, brownout flags | Is the system in control, manual, safe, or degraded? |
| Setpoint & Output | Confirm commanded vs applied | SP, PV, output_u, clamp/ramp state, saturation % | Is output limited or saturating? |
| PV/SV Trend | Observe dynamics cheaply | PV, SV, error, output_u (short window + long window) | Oscillating, slow, drifting, or healthy? |
| Alarms | Explain why behavior changed | fault_code, fault_class, timestamp, recommended checks | Recoverable vs latched vs degraded path? |
| I/O State | Validate signal path | adc_valid, adc_overrun, sample_ts, output_update_strobe, io_range flags | Is the loop failing due to I/O or timing? |
Commissioning flows (guided, safe by design)
-
Flow A — Sensor zero/span (with commit/rollback)
Step 1: ensure mode=Manual and safe output bounds are active. Step 2: capture zero reference, then span reference; validate adc_valid and range flags. Step 3: write calibration to staging and verify integrity; then atomically commit (or rollback on failure).
-
Flow B — Actuator direction check (small step + bounded ramp)
Apply a small output step through a bounded ramp. Confirm PV changes in the expected direction. If direction is wrong, fix mapping before enabling auto control. Log the action with an audit record.
-
Flow C — Auto/manual transition (bumpless enable)
Preload the integrator state so that the computed controller output equals the current manual output at the present PV. Switch to auto using bumpless transfer rules. The HMI should display time-to-stable and saturation percentage during the first seconds after enable.
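The arithmetic behind Flow A is a two-point linear fit from raw ADC codes to engineering units. The sketch below is a minimal version with hypothetical names; the `min_code_delta` guard is an assumed validity check that rejects a span capture too close to the zero capture, which would otherwise produce an unusable gain.

```python
def two_point_cal(code_zero, code_span, ref_zero, ref_span, min_code_delta=1):
    """Return (gain, offset) such that value = gain * code + offset."""
    delta = code_span - code_zero
    if abs(delta) < min_code_delta:
        raise ValueError("zero and span captures are too close to calibrate")
    gain = (ref_span - ref_zero) / delta
    offset = ref_zero - gain * code_zero
    return gain, offset

def apply_cal(code, gain, offset):
    """Convert a raw code to engineering units."""
    return gain * code + offset

# Example: code 1000 captured at 0.0 units, code 61000 captured at 100.0 units.
gain, offset = two_point_cal(1000, 61000, 0.0, 100.0)
mid_value = apply_cal(31000, gain, offset)   # about 50.0
```

The resulting pair is what would be written to the staging slot and committed only after its integrity check passes.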
Guided diagnostics (symptom → evidence → first safe action)
-
If oscillating → check jitter and delay first
Evidence: jitter_pkpk (or output update timing), deadline_miss_count. First action: reduce loop bandwidth by lowering loop gain (added filtering or a lower update rate also adds delay, so budget it before relying on it) and lock parameter updates to boundaries.
-
If slow → check total delay and output settling
Evidence: latency budget counters, DAC/PWM update-to-settle time, and PV trend slope. First action: remove hidden delay (excessive filtering), then retune for the effective delay.
-
If no response → check I/O path liveness before tuning
Evidence: adc_valid, overrun flags, I/O range flags, and output clamp state. First action: force a safe manual test step and validate direction and sensor range.
-
If saturating → treat it as a nonlinear event, not a tuning event
Evidence: saturation %, integrator state snapshot, output clamp/rate limit active. First action: enable/adjust anti-windup policy and confirm safe bounds are correct for the actuator.
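For the saturating case, one common anti-windup policy is to stop integrating while the output is clamped and the error would push it further into the limit. The sketch below assumes a PI controller and this clamping policy; the function name `pi_step` and its signature are hypothetical, and other policies (such as back-calculation) are equally valid.

```python
def pi_step(error, integral, kp, ki, dt, u_min, u_max):
    """One PI step with clamping anti-windup.

    Returns (output, new_integral, saturated).
    """
    candidate = integral + ki * error * dt
    u_unclamped = kp * error + candidate
    u = min(max(u_unclamped, u_min), u_max)
    saturated = u != u_unclamped

    # Freeze the integrator only when the error drives the output
    # deeper into the active limit; otherwise let it integrate.
    pushing_high = u_unclamped > u_max and error > 0
    pushing_low = u_unclamped < u_min and error < 0
    if not (pushing_high or pushing_low):
        integral = candidate
    return u, integral, saturated


# A sustained large error against a 0..5 output range.
integral = 0.0
for _ in range(100):
    u, integral, saturated = pi_step(10.0, integral, 1.0, 1.0, 0.1, 0.0, 5.0)
# The output stays at the upper clamp and the integrator has not wound up.
```

Without the freeze, the integrator would have accumulated 100 samples of error and the loop would overshoot badly once the disturbance cleared.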
- Operator: change setpoint, switch mode (Auto/Manual/Safe), acknowledge alarms.
- Technician: calibration and PID/limits, but only via guided flows and with change records.
- Audit: record param_version, who/when/what, and commit/rollback outcomes.
MPN examples (local HMI + field debug BOM)
MCU / compute (HMI + control-class)
STMicroelectronics STM32H743 · NXP MIMXRT1062 · Microchip SAM E70
Safety supervision helpers: Texas Instruments TPS3839 (supervisor) · Maxim/ADI MAX809 (reset supervisor)
Display / touch (local HMI)
Bridgetek/FTDI BT815 / FT813 (graphics controller) · Solomon Systech SSD1963 (TFT controller)
Touch controller: FocalTech FT5336 · Goodix GT911
Time & nonvolatile (audit + offline)
RTC: Micro Crystal RV-3028-C7 · Analog Devices/Maxim DS3231M
FRAM: Fujitsu MB85RS64V · SPI NOR: Winbond W25Q64JV
Field I/O and isolation (service ports)
Isolated RS-485: Analog Devices ADM2587E · CAN/CAN-FD: TI TCAN1042
Digital isolator: ADI ADuM1401 · Silicon Labs Si8642 · eFuse/high-side: TI TPS2595 / TPS2660
Figures & “Cite this figure” Plan (3:2 SVG set)
This subpage uses three reusable 3:2 block-diagram figures. Each figure is designed to be understandable in seconds, with minimal text (≥18px) and high diagram density. The “Cite this figure” block is intentionally visible to support reuse and referencing.
F1 — Architecture map
Signal chain + timebase + supervision + HMI, with callouts for latency/jitter points, safe-state path, and log path.
F2 — Latency & jitter waterfall
Stacked timeline from sampling aperture to output settling, with jitter bars on sampling and output update edges.
F3 — Discrete PID essentials
Block diagram for setpoint weighting, band-limited derivative, saturation, anti-windup feedback, and bumpless transfer.
MPN callouts (optional)
Small MPN labels can be used for typical building blocks (ADC/DAC/MCU/supervisor), without turning the figure into a datasheet.
FAQs
Each answer follows a fixed field-debug structure: 1 conclusion + 2 evidence checks + 1 first fix, with chapter back-references. Keep evidence fields measurable (scope/log/counters) and keep the first fix minimal and reversible.
Q1 Oscillation started after a firmware update—tuning changed or timing changed?
Conclusion: Treat post-update oscillation as a determinism regression until proven otherwise; timing changes can mimic “bad tuning.”
- Evidence check #1: Compare jitter_pkpk/deadline_miss_count before vs after the update (control task and ADC sampling edge).
- Evidence check #2: Verify param_version and a hash/summary of PID + filters + clamps matches the intended release configuration.
First fix: Lock the loop to a fixed-rate ISR/high-priority task and freeze parameter updates to synchronous boundaries (double-buffered commit).
Use a supervisor to ensure reset/brownout behavior is consistent across firmware revisions and test runs.
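The two timing counters in evidence check #1 can be derived from captured update timestamps. This is a minimal sketch that assumes timestamps in microseconds from a timer capture or logic analyzer, and that a "deadline miss" means an update interval exceeding a chosen limit; the function name `timing_stats` is hypothetical.

```python
def timing_stats(timestamps_us, deadline_us):
    """Return (jitter_pkpk, deadline_miss_count) from update timestamps."""
    intervals = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    # Peak-to-peak jitter: spread between the longest and shortest interval.
    jitter_pkpk = max(intervals) - min(intervals)
    # Assumed definition: an interval longer than the deadline is a miss.
    deadline_miss_count = sum(1 for dt in intervals if dt > deadline_us)
    return jitter_pkpk, deadline_miss_count


# Nominal 1000 us loop with one late update followed by a catch-up.
captured = [0, 1000, 2003, 2998, 4250, 5000]
jitter_pkpk, deadline_miss_count = timing_stats(captured, deadline_us=1100)
# jitter_pkpk == 502, deadline_miss_count == 1
```

Running the same capture on the previous and the updated firmware gives a direct before/after comparison that does not depend on tuning.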
Q2 Looks stable in lab, unstable on the machine—noise coupling or resonance?
Conclusion: If the instability is machine-only, suspect a narrowband resonance or injected noise before retuning gains.
- Evidence check #1: Check whether oscillation frequency is consistent across runs (resonance signature) using PV/SV trend and an FFT snapshot if available.
- Evidence check #2: Compare in-band measurement noise (inband_noise_rms) and ground/reference flags between lab and machine.
First fix: Reduce loop bandwidth slightly and add a targeted notch only if a stable resonance peak is confirmed; otherwise harden the measurement path (shielding/grounding/reference).
Isolation on RS-485 helps break ground loops that often appear only when installed on the machine.
Q3 Overshoot only on setpoint steps—setpoint weighting missing or bumpless transfer broken?
Conclusion: Step-only overshoot usually points to setpoint-path implementation (weighting) or a broken bumpless enable—not plant changes.
- Evidence check #1: Confirm setpoint weighting mode/state (sp_weight_mode) and whether P/D respond directly to SV steps.
- Evidence check #2: During manual↔auto transition, observe integrator_state and output jump; bumpless systems keep the output continuous.
First fix: Enable setpoint weighting and preload/track the integrator during mode changes (manual tracking + bumpless switch), then revalidate step response.
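The effect of setpoint weighting on a step can be shown with a few lines of arithmetic. The sketch below assumes the usual two-degree-of-freedom form in which a weight `b` scales the setpoint in the proportional term only, while the integrator still acts on the full error; the function names are hypothetical.

```python
def pi_output(sp, pv, integral, kp, b):
    """PI output with setpoint weight b applied to the proportional term."""
    return kp * (b * sp - pv) + integral


def setpoint_kick(sp_old, sp_new, pv, integral, kp, b):
    """Instantaneous output jump caused by a setpoint step.

    PV and the integrator are assumed unchanged at the instant of the step.
    """
    after = pi_output(sp_new, pv, integral, kp, b)
    before = pi_output(sp_old, pv, integral, kp, b)
    return after - before


# Setpoint step from 10 to 20 with PV at 10 and kp = 2.
kick_full = setpoint_kick(10.0, 20.0, 10.0, 0.0, 2.0, b=1.0)   # 20.0
kick_half = setpoint_kick(10.0, 20.0, 10.0, 0.0, 2.0, b=0.5)   # 10.0
kick_none = setpoint_kick(10.0, 20.0, 10.0, 0.0, 2.0, b=0.0)   # 0.0
```

The kick scales as `kp * b * (sp_new - sp_old)`, which is why a missing or reset weight shows up as overshoot on setpoint steps only, with disturbance rejection unchanged.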
Q4 Slow recovery from disturbance—integrator limit, output saturation, or hidden delay?
Conclusion: Slow recovery is often saturation or hidden delay disguised as “needs more Ki.”
- Evidence check #1: Inspect sat_percent and clamp_active during the disturbance and recovery window.
- Evidence check #2: Measure effective delay (Tdelay) and filter group delay (ADC digital filter/decimation) relative to the sample rate.
First fix: Remove hidden delay first (reduce excessive filtering/decimation) and ensure anti-windup is active; only then adjust Ki.
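To make "hidden delay" concrete, the sketch below estimates the group delay of a moving-average filter and the phase lag it costs at a given frequency. It assumes an N-tap moving average, whose group delay is (N - 1) / 2 samples; a real ADC's sinc or FIR filter will differ, so the datasheet figure should be used where available. Function names are hypothetical.

```python
def moving_average_group_delay_s(taps, sample_rate_hz):
    """Group delay of an N-tap moving average, in seconds."""
    return (taps - 1) / 2 / sample_rate_hz


def phase_lag_deg(delay_s, freq_hz):
    """Phase lag contributed by a pure delay at a given frequency."""
    return 360.0 * freq_hz * delay_s


# A 17-tap average at 1 kHz sampling, evaluated at a 10 Hz crossover.
delay_s = moving_average_group_delay_s(taps=17, sample_rate_hz=1000.0)
lag_deg = phase_lag_deg(delay_s, freq_hz=10.0)
# delay_s is 8 ms; lag_deg is about 28.8 degrees of lost phase margin
```

A loss of this size is often enough to explain sluggish or ringing recovery on its own, which is why the delay is checked before Ki is touched.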
Q5 Limit cycle at steady state—quantization, deadband, or derivative noise?
Conclusion: Steady-state limit cycles are usually quantization/nonlinearity effects; changing Kp often makes them worse.
- Evidence check #1: Look at output_histogram and PV ripple: a few discrete output levels indicate quantization-dominated cycling.
- Evidence check #2: Check deadband_active and D-term noise (or derivative filter state) for high-frequency chatter.
First fix: Band-limit or reduce D, add a small dither when quantization dominates, and avoid deadband unless explicitly required by the actuator.
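Evidence check #1 can be automated with a simple histogram over logged DAC codes. This is a minimal sketch: the input is assumed to be a list of integer output codes, and the threshold of three distinct levels is an illustrative assumption, not a figure from this page.

```python
from collections import Counter


def output_histogram(dac_codes):
    """Count how often each output code occurs in the window."""
    return Counter(dac_codes)


def quantization_dominated(dac_codes, max_levels=3):
    """True when the output cycles among only a few discrete codes."""
    levels = len(output_histogram(dac_codes))
    # One level means a steady output, not a limit cycle.
    return 2 <= levels <= max_levels


cycling = [2048, 2049] * 50          # toggling between two adjacent codes
healthy = list(range(2000, 2100))    # output spread over many codes
# quantization_dominated(cycling) is True; quantization_dominated(healthy) is False
```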
Q6 Random spikes in control output—ADC reference instability or clock jitter?
Conclusion: Random spikes usually originate from the measurement/timebase path; treat them as evidence problems first, not tuning problems.
- Evidence check #1: Correlate spikes with reference health (vref_mon or ref-good flag) and supply dip events.
- Evidence check #2: Correlate spikes with jitter_pkpk on sampling or output update edges (logic analyzer or timer capture).
First fix: Freeze the control schedule (fixed-rate) and enable reference monitoring; isolate by disabling noncritical tasks to see whether jitter coupling disappears.
Use a precision ADC with well-understood digital filter delay and stable reference strategy when spikes are measurement-path driven.
Q7 After brownout, output jumps briefly—hold-up policy or DAC reset state?
Conclusion: A post-brownout “output blip” is usually a missing safe clamp during the early-warning window or an unsafe DAC/PWM default state.
- Evidence check #1: Verify brownout_flag, supply_dip_min, and hold_up_ms around the event.
- Evidence check #2: Capture the first output state after reset (dac_reset_state or PWM default duty) and whether clamp_active asserted immediately.
First fix: Define and enforce a safe state: clamp/ramp-down on early warning, and hardware-default the output to a safe value during reset.
A proper supervisor can provide predictable reset thresholds and timing to prevent undefined output windows.
Q8 Watchdog resets but logs show nothing—logging not last-gasp safe or reset cause not captured?
Conclusion: If resets are real but logs are empty, either reset cause capture is late/non-atomic, or the log write path cannot survive the final milliseconds.
- Evidence check #1: Ensure last_reset_reason/wdog_cause is captured at boot very early and stored in a retained/backup register.
- Evidence check #2: Verify last_gasp_write_ok and monotonic_counter continuity (missing increments imply failed atomic commits).
First fix: Implement a minimal last-gasp record (reset cause + timestamp/counter + a few key fields) with atomic commit and a detectable “write complete” marker.
FRAM enables fast, low-wear writes for tiny last-gasp records that must survive resets and brownouts.
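A last-gasp record with a detectable "write complete" marker might be laid out as below. This is a sketch under stated assumptions: the field layout, the marker value `0xA5`, and the use of CRC-32 are illustrative choices, and on a real target the marker byte would be written last to nonvolatile memory.

```python
import struct
import zlib

RECORD_FMT = "<BIH"  # reset_cause (u8), monotonic_counter (u32), fault_code (u16)
MARKER = 0xA5        # hypothetical "write complete" marker, written last


def encode_record(reset_cause, counter, fault_code):
    """Pack payload, then CRC, then the completion marker."""
    payload = struct.pack(RECORD_FMT, reset_cause, counter, fault_code)
    return payload + struct.pack("<IB", zlib.crc32(payload), MARKER)


def decode_record(blob):
    """Return the fields, or None if the record is incomplete or corrupt."""
    size = struct.calcsize(RECORD_FMT)
    if len(blob) != size + 5 or blob[-1] != MARKER:
        return None  # write did not finish
    payload = blob[:size]
    (crc,) = struct.unpack("<I", blob[size:size + 4])
    if zlib.crc32(payload) != crc:
        return None  # payload damaged
    return struct.unpack(RECORD_FMT, payload)


blob = encode_record(reset_cause=3, counter=1234, fault_code=0x0042)
fields = decode_record(blob)          # (3, 1234, 66)
torn = decode_record(blob[:-1])       # None: marker never written
```

Keeping the record to a dozen bytes is what makes it plausible to finish inside the hold-up window; the counter lets boot code detect gaps left by records that never completed.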
Q9 Tuning changes don’t “stick”—parameter commit atomicity or version mismatch?
Conclusion: “Doesn’t stick” is usually a commit/rollback problem or a UI-to-runtime version mismatch, not “bad gains.”
- Evidence check #1: Compare the runtime control task’s param_version against the HMI-displayed version (they must match after commit).
- Evidence check #2: Inspect commit_state/rollback_state and the last commit error code under power-cycle and brownout tests.
First fix: Use a double-buffer parameter bank and boundary switch-over; make the HMI show “active version + activation timestamp” explicitly.
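The double-buffer bank with boundary switch-over can be sketched as follows. The class name `ParamBank` and its methods are hypothetical; the point is that the HMI writes only to the inactive bank and the control task swaps banks at a sample boundary, so it never sees a half-updated set.

```python
class ParamBank:
    """Two parameter banks; the control task swaps at a sample boundary."""

    def __init__(self, params):
        self.banks = [dict(params), dict(params)]
        self.active = 0
        self.param_version = 1
        self.pending = False

    def stage(self, **changes):
        # HMI side: build the new set in the inactive bank only.
        inactive = 1 - self.active
        self.banks[inactive] = {**self.banks[self.active], **changes}
        self.pending = True

    def on_sample_boundary(self):
        # Control side: swap banks atomically, then bump the version.
        if self.pending:
            self.active = 1 - self.active
            self.param_version += 1
            self.pending = False
        return self.banks[self.active]


bank = ParamBank({"kp": 1.0, "ki": 0.1})
bank.stage(kp=2.0)
before = dict(bank.banks[bank.active])   # still kp = 1.0
after = bank.on_sample_boundary()        # kp = 2.0, param_version = 2
```

Displaying `param_version` from the control side, not the HMI's own copy, is what exposes a commit that never activated.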
Q10 HMI shows correct PV, but actuator behaves wrong—output scaling/clamp or direction inversion?
Conclusion: If PV is correct but the actuator response is wrong, the fault is usually in the output path: scaling, polarity, clamps, or rate limits.
- Evidence check #1: Check clamp_active/rate_limit_active and whether output is stuck near a bound despite controller effort.
- Evidence check #2: Run the commissioning direction test: a small manual step should move PV in the expected direction; otherwise polarity/inversion is present.
First fix: Re-run the actuator direction check with safe bounds, then correct output scaling/polarity and revalidate clamps before retuning gains.
A known, monotonic DAC with predictable reset behavior reduces “mystery” scaling and glitch issues in the output path.
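Scaling, polarity, and clamps are easiest to audit when they live in one small function. The sketch below assumes a controller output expressed in percent and a 16-bit unipolar DAC; the function name and the `reverse_acting` flag are hypothetical.

```python
def to_dac_code(u_percent, full_scale_code=65535, reverse_acting=False,
                lo=0.0, hi=100.0):
    """Map controller output in percent to a DAC code.

    Clamp first, then apply polarity, then scale.
    """
    u = min(max(u_percent, lo), hi)
    if reverse_acting:
        u = 100.0 - u  # actuator closes as the code rises
    return round(u / 100.0 * full_scale_code)


direct = to_dac_code(25.0)                          # 16384
reverse = to_dac_code(25.0, reverse_acting=True)    # 49151
clamped = to_dac_code(120.0)                        # 65535
```

If the direction test moves PV the wrong way, flipping `reverse_acting` here is the fix; changing the sign of the gains to compensate hides the inversion from every other screen.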
Q11 Works until comms traffic increases—deadline miss or priority inversion?
Conclusion: Load-dependent instability is typically determinism loss (deadline misses) caused by IRQ storms, DMA contention, or priority inversion.
- Evidence check #1: Track deadline_miss_count and max execution time of the control task under worst-case comms.
- Evidence check #2: Watch max_isr_latency or comms IRQ occupancy; correlate with output jitter or PV ripple.
First fix: Make control fixed-rate and highest priority, throttle comms/UI, and schedule parameter updates only at safe boundaries.
Digital isolation helps reduce ground noise and transients that worsen under heavy comms activity in the field.
Q12 Stable at room temp, unstable when hot—gain drift, sensor bias, or rate limits?
Conclusion: Temperature-triggered instability is usually a drift or nonlinearity problem; it changes the effective plant or measurement scale, not just the gains.
- Evidence check #1: Correlate temp with sensor bias estimate (offset) and any gain schedule state; look for monotonic drift.
- Evidence check #2: Check whether rate_limit_active or sat_percent increases at high temperature (actuator capability drops).
First fix: Enable temperature-aware calibration/gain scheduling and revalidate clamps/rate limits at hot conditions; only then retune bandwidth.
Timestamp parameter changes and events so that drift or instability can be correlated with temperature and operating history.