Edge PID / Loop Controller (Precision ADC/DAC, Safety & HMI)

Edge PID / Loop Controller is a deterministic “last-mile” control core that closes real-world loops locally by budgeting latency and jitter end-to-end, while keeping measurement/output paths precise and fail-safe under noise, brownouts, and software faults.

It focuses on evidence-driven design: measurable timing/ADC/output/PID states and a safe-state supervision policy so tuning and field debugging stay predictable and low-cost.

H2-1 · What This Page Covers (and What It Doesn’t)

An Edge PID / Loop Controller is the last-mile controller that closes a physical loop locally with deterministic timing. It combines precision ADC/DAC, a low-jitter time base, and robust safety supervision (watchdog + hold-up + safe outputs), plus a local HMI for commissioning, diagnostics, and recovery.

  • What readers should be able to design after this page

    A loop-controller architecture that remains stable, low-noise, and deterministic under real scheduling, conversion delay, and output non-idealities.

  • What readers should be able to budget and verify

    End-to-end latency + jitter from sensor → ADC → compute → DAC/PWM → actuator, with measurable timing checkpoints and acceptance limits tied to loop bandwidth and stability margins.

  • What readers should be able to harden for field reality

    Fail-safe behavior that survives brownouts and software faults: watchdog policy, hold-up behavior, deterministic safe outputs, and post-fault evidence (reason codes + event logs).

Not covered (by design): This page focuses on the edge controller side of the loop (timing, conversion, supervision, HMI). It avoids deep dives into plant-specific modeling, high-power actuator stages, and cloud/SCADA platform topics.
[Diagram: Edge PID / Loop Controller scope boundary and design promises. Three pillars: Deterministic Timing (latency + jitter budget: fixed update boundary, measured worst-case timing, timestamped evidence); Precision ADC / DAC (noise + delay + drift: in-band ENOB / noise, group delay tracked, calibrated drift policy); Safety Supervision (watchdog + hold-up: defined safe outputs, brownout survival, reason codes + logs).]
Figure H2-1 — Scope boundary: the controller is defined by deterministic timing, precision conversion, and survivable safety supervision.
H2-2 · System Architecture Blueprint (Signal, Time, Safety)

This blueprint decomposes an edge loop controller into three planes that must align: Signal (what is measured/commanded), Time (when sampling and actuation happen), and Safety (what happens when power or software becomes untrustworthy). The goal is a design that can be mapped to hardware and firmware in minutes, and verified using measurable checkpoints.

  • Signal plane (sensor → compute → actuator)

    Sensor front-end and anti-aliasing define in-band noise and phase. ADC conversion + digital filtering define group delay. Output DAC/PWM defines settling, update boundary, and safe clamps.

  • Time plane (low-jitter clock → schedule → timestamp)

    A low-jitter time base and deterministic schedule define sampling instant, compute window, and output update instant. Timestamping turns timing into evidence and enables repeatable debugging.

  • Safety plane (watchdog + brownout + hold-up + safe outputs)

    Supervision must define safe output values and transitions under faults. Hold-up time is budgeted to preserve safe actuation and capture reason codes and last-gasp logs during brownouts.

Deliverable checklist: For every block below, pin down (1) a spec that matters for loop stability or safety, (2) how it is measured, and (3) the failure signature that will appear in logs or waveforms when it is wrong.

[Diagram: architecture blueprint with three planes. Signal plane: sensor AFE (anti-alias + protect) → ADC (noise + delay) → control compute (PID + filters + modes) → DAC / PWM (settle + clamp), with feedback. Time plane: low-jitter clock → deterministic schedule → timestamp evidence. Safety plane: watchdog + reason code, brownout + hold-up, safe outputs + ramp policy.]
Figure H2-2 — The blueprint aligns three planes: signal (loop math + conversion), time (sampling/update determinism), and safety (survivable outputs + evidence).
H2-3 · Deterministic Timing Model (Latency + Jitter Budget)

Many field issues that look like “bad tuning” are actually time-domain failures: sampling instants drift, compute deadlines slip, or outputs update at inconsistent moments. A deterministic timing model converts the full loop path into a budget that can be measured, validated, and enforced.

Sampling instant uncertainty

ADC aperture + clock jitter define when “now” is. Excess jitter behaves like phase noise and can destabilize high-bandwidth loops.

Conversion + filter group delay

ADC conversion and digital filtering add fixed delay. Group delay directly consumes phase margin near crossover.

Compute time (worst-case)

Worst-case execution time (not average) determines whether the loop ever misses its update boundary under load.

Output update time

DAC settling and PWM edge timing define when the actuator command becomes physically real.

Plant + sensor delay

Plant dead time and sensor dynamics are often the largest hidden delays; they must be identified and budgeted.

Deterministic update boundary

A single, repeatable “update instant” prevents non-uniform sampling and enables meaningful stability margins.

Key separation: Delay is a systematic phase loss; jitter is time uncertainty that behaves like noise/phase modulation. Both must be measured and treated differently.
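To make the delay/jitter separation concrete, the following minimal sketch computes time-interval error (TIE) statistics from a list of sampling-strobe timestamps. The function name `tie_stats` and the example capture are hypothetical, and the sketch assumes the timestamps come from a single timebase and that the first strobe is taken as the ideal-grid origin.

```python
import math


def tie_stats(timestamps_s, nominal_period_s):
    """Return (rms, peak_to_peak) time-interval error in seconds.

    TIE for strobe k is its measured time minus the ideal grid time
    t0 + k * nominal_period, with t0 taken from the first strobe.
    """
    t0 = timestamps_s[0]
    tie = [t - (t0 + k * nominal_period_s) for k, t in enumerate(timestamps_s)]
    rms = math.sqrt(sum(e * e for e in tie) / len(tie))
    p2p = max(tie) - min(tie)
    return rms, p2p


# Hypothetical capture of a 1 kHz sampling strobe (values in seconds).
stamps = [0.0, 0.001002, 0.002000, 0.003001, 0.003999]
rms, p2p = tie_stats(stamps, 1e-3)
print(f"TIE rms = {rms * 1e6:.2f} us, p-p = {p2p * 1e6:.2f} us")
```

A constant offset in every timestamp would count as delay and belongs in the delay budget; only the variation around the ideal grid is reported here as jitter.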

Budget fields to compute (must-have)

| Budget item | What it represents | How to measure (typical) | Failure signature |
|---|---|---|---|
| Tsample, Tsample_jitter | Nominal sampling period and sampling instant variation (RMS / p-p). | Timestamp sampling strobes; measure time-interval error on a scope/LA. | Limit-cycle noise, sensitivity to derivative term, unstable high-bandwidth behavior. |
| Tadc_conv | ADC conversion latency from sample to available code. | Toggle a GPIO at start/end of conversion; histogram the delay. | Unexpected phase loss; stability degrades without obvious tuning changes. |
| Tfilter_gd | Digital filter group delay (configuration dependent). | Measure step response delay; or derive from filter specs at relevant frequency. | Bandwidth cannot be raised; oscillation appears near crossover. |
| Tcompute_wc | Worst-case compute time including ISR latency, DMA contention, and cache effects. | GPIO start/end markers around control update; capture worst-case under max traffic. | Occasional output jumps; instability only under load; missed update boundaries. |
| Tupdate | Time from “compute finished” to actuator command physically settling (DAC/PWM). | Measure DAC settle-to-error threshold; PWM edge placement jitter via TIE. | Residual ripple, impulse-like disturbances, inconsistent actuation timing. |
| Tplant_deadtime | Equivalent plant dead time + sensor delay visible to the controller. | Step test / correlation fit; identify dominant delay and sensor dynamics. | Large overshoot, slow recovery, instability despite seemingly conservative gains. |
| Teffective_delay, Jtotal | Total effective delay and total jitter (RMS / conservative bound). | Sum budget components (delay) and aggregate jitter carefully (RMS or worst-case). | Margins do not match analysis; loop behaves differently than simulated tuning. |

Guidance (not dogma): keep effective delay well below the inverse of the desired crossover frequency; keep jitter small relative to the sample period, otherwise treat it as noise/phase modulation and adjust filtering, derivative action, and scheduling determinism.
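As a rough illustration of how the budget rows roll up, the sketch below sums the delay items, combines jitter sources as a root-sum-square, and converts the total delay into phase loss at a chosen crossover using the pure-delay relation (phase loss in degrees = 360 · f · T). All numeric values are hypothetical, and the RSS combination assumes the jitter sources are independent; a worst-case sum is the conservative alternative.

```python
import math


def effective_delay_s(delays_s):
    """Sum of the systematic delay items in the budget, in seconds."""
    return sum(delays_s.values())


def total_jitter_rms_s(jitters_rms_s):
    """Root-sum-square of RMS jitter terms (assumes independent sources)."""
    return math.sqrt(sum(j * j for j in jitters_rms_s))


def phase_loss_deg(delay_s, crossover_hz):
    """Phase consumed by a pure delay at the crossover frequency."""
    return 360.0 * crossover_hz * delay_s


# Hypothetical budget, using the item names from the table above.
budget = {
    "Tadc_conv": 20e-6,
    "Tfilter_gd": 100e-6,
    "Tcompute_wc": 50e-6,
    "Tupdate": 30e-6,
    "Tplant_deadtime": 300e-6,
}
t_eff = effective_delay_s(budget)
j_total = total_jitter_rms_s([3e-6, 4e-6])  # sampling jitter, output edge jitter
loss = phase_loss_deg(t_eff, 50.0)          # 50 Hz crossover is an assumption
print(f"Teffective_delay = {t_eff * 1e6:.0f} us, Jtotal = {j_total * 1e6:.1f} us rms")
print(f"phase loss at crossover = {loss:.1f} deg")
```

Comparing the resulting phase loss against the phase-margin target gives a quick check on whether the desired crossover is realistic before any gain tuning starts.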
[Diagram: timing waterfall with jitter bars. Stages along the time axis: aperture jitter → ADC conversion (Tadc_conv) → filter (Tfilter_gd) → compute (Tcompute_wc) → update (Tupdate) → plant + sensor delay (Tplant_deadtime), with sampling jitter and output edge jitter marked. Budget outputs: Tsample / Tloop, Teffective_delay, Jtotal (RMS / p-p). Checkpoints: sample strobe · ADC-ready · compute end · output settled.]
Figure H2-3 — A timing waterfall turns the loop into measurable delays and jitter sources, enabling a defensible stability budget.
H2-4 · Precision ADC Design for Control Loops (Accuracy vs Speed vs Noise)

ADC selection defines what can be measured in the loop bandwidth that matters. The key trade is not only resolution versus sample rate, but also in-band noise, group delay, and drift—all of which directly shape stability margins and steady-state performance.

SAR ADC

Low latency and fast settling support higher loop bandwidths, but require careful anti-aliasing, reference noise control, and layout hygiene.

Low delay · Fast response · Noise-managed

Sigma-Delta ADC

Excellent low-frequency noise performance via noise shaping, but digital filtering can introduce significant group delay that consumes phase margin.

Low in-band noise · Filter delay · Bandwidth-limited

  • Input conditioning: anti-alias corner vs loop bandwidth

    Anti-alias filtering must suppress out-of-band noise while keeping phase loss small near crossover. Every filter adds phase and must be included in the H2-3 timing budget.

  • Reference strategy: external vs ratiometric

    Reference noise appears as measurement noise; reference drift appears as slow PV bias that the integrator must carry. Choose ratiometric sensing when the sensor excitation and the ADC share the same reference, so that reference drift cancels to first order.

  • Calibration policy: offset/gain + temperature compensation

    Define when calibration runs (factory, commissioning, maintenance), how data is validated (CRC/version), and how temperature compensation is applied without destabilizing the loop.

  • Over-sampling & decimation: noise benefit vs group delay

    Over-sampling can improve in-band resolution, but decimation filters can add large group delay. If phase margin degrades, noise improvement becomes counterproductive.
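The over-sampling trade can be put in numbers with a small sketch. It assumes the simplest decimator, a boxcar (moving-average) filter over N input samples, and white, uncorrelated input noise; real sigma-delta decimation filters (for example sinc3 or FIR stages) have different and usually larger delay. The function name and sample values are hypothetical.

```python
import math


def boxcar_tradeoff(n_avg, input_rate_hz):
    """Noise benefit and delay cost of averaging n_avg samples.

    Returns (group_delay_s, noise_reduction_factor, extra_bits).
    A length-N boxcar has a group delay of (N - 1) / 2 input samples;
    white noise RMS drops by sqrt(N), i.e. 0.5 * log2(N) extra bits.
    """
    group_delay_s = (n_avg - 1) / (2.0 * input_rate_hz)
    noise_reduction = math.sqrt(n_avg)
    extra_bits = 0.5 * math.log2(n_avg)
    return group_delay_s, noise_reduction, extra_bits


# Hypothetical case: 64 kS/s ADC averaged by 16, loop crossover at 200 Hz.
gd, nr, bits = boxcar_tradeoff(16, 64_000.0)
phase_cost_deg = 360.0 * 200.0 * gd
print(f"group delay = {gd * 1e6:.1f} us, noise / {nr:.0f}, +{bits:.1f} bits")
print(f"phase cost at 200 Hz = {phase_cost_deg:.2f} deg")
```

If the phase cost is a meaningful fraction of the available phase margin, a shorter average (or a lower-latency filter) may serve the loop better than the extra resolution.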

Evidence fields (debug-ready): ENOB (in-band) · input-referred noise density · step response · group delay · Vref noise · drift (ppm/°C) · long-term stability
[Diagram: precision ADC trade map. Signal chain: sensor + AFE (anti-alias) → ADC core (SAR / ΣΔ) → digital filter (group delay) → codes (PV samples). Closed-loop trade rails: in-band noise (lower is better), group delay (lower is better), drift budget (ppm/°C). SAR ADC: low delay, fast loops. Sigma-Delta ADC: low noise, filter delay.]
Figure H2-4 — ADC selection must be evaluated in-band (noise) and in-loop (group delay + drift), then validated using evidence fields that match debug workflows.
H2-5 · DAC / PWM Output Stage (Settling, Glitch Energy, and Safe Clamps)

Output-stage non-idealities often drive “mystery” overshoot, limit cycles, and hard-to-reproduce artifacts. A robust output stage must treat the command as a time-aligned physical signal: it must settle quickly, avoid impulse-like glitches, remain phase-transparent (no hidden delay), and enforce safe bounds under both normal operation and faults.

Settling vs update rate

When DAC/PWM settles slower than the update boundary, the loop sees extra delay and residual error that consume phase margin.

Tdac_settle · Tupdate · delay

Glitch energy

Small switching glitches can inject impulse-like disturbances that excite plant resonances and create unexplained ringing.

glitch area · ringing · resonance

Filtering without hiding phase

Output filters must not smuggle in unbudgeted group delay; phase impact must be accounted for in the H2-3 timing model.

group delay · phase · crossover

Slew/rate limits and clamps

Slew limiting and hard/soft clamps protect actuators and avoid exciting resonances, but require integrator-aware handling.

du/dt · u_min/u_max · anti-windup

  • DAC settling and update boundary alignment

    Define an update boundary and verify that the DAC output reaches the required error band before the next update. Record settle-to-threshold time (e.g., 0.1% / 0.01%) and treat residual settle as effective delay in the timing budget.

  • Glitch impulse control

    Capture the output around update instants to quantify impulse-like glitches. If the plant rings after every update, prioritize glitch reduction and synchronous updating before changing PID gains.

  • Output filtering with explicit phase accounting

    Any RC/active filter must publish its group delay (or phase at crossover) as a first-class budget item. Filtering that “looks good on noise” but erodes phase margin will amplify overshoot and instability risk.

  • Slew-rate limiting / rate clamp

    Apply rate limits to avoid exciting mechanical or thermal resonances and to protect actuators. Rate limiting is a nonlinearity; its interaction with the integrator must be defined (freeze, back-calc, or conditional integration).

  • Clamps and safe bounds (hard + soft)

    Define hard limits (absolute safety) and optional soft limits (comfort zone). A clamp strategy must specify integrator behavior to prevent windup, limit cycles, and slow recovery after saturation.

  • PWM as DAC: resolution, dither, edge jitter, synchronous update

    PWM output quality is set by effective resolution, edge placement jitter, and whether updates are synchronized to a fixed boundary. Dither can spread quantization artifacts; unsynchronized edge timing behaves like output phase noise.

  • Failsafe output definition (fault/brownout)

    Specify the exact safe output value (voltage/duty), the transition profile (hard cut vs ramp), and the time-to-safe requirement. Safe output behavior must remain deterministic even when software is untrusted.
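For the settling and phase-accounting items above, a first-order estimate is often enough to start a budget. The sketch below assumes the output path is dominated by a single-pole RC filter with time constant tau, which is a simplification; measured settling and phase should replace these estimates once hardware exists. The function names and values are hypothetical.

```python
import math


def rc_settle_time_s(tau_s, tolerance):
    """Time for a first-order step response to come within `tolerance`
    (as a fraction of the step) of its final value: t = -tau * ln(tol)."""
    return -tau_s * math.log(tolerance)


def rc_phase_lag_deg(tau_s, freq_hz):
    """Phase lag of a single-pole low-pass at freq_hz, in degrees."""
    return math.degrees(math.atan(2.0 * math.pi * freq_hz * tau_s))


tau = 100e-6                               # hypothetical output filter, 100 us
t_settle_0p1 = rc_settle_time_s(tau, 1e-3)   # 0.1 % band, about 6.9 tau
t_settle_0p01 = rc_settle_time_s(tau, 1e-4)  # 0.01 % band, about 9.2 tau
lag_at_fc = rc_phase_lag_deg(tau, 100.0)     # lag at an assumed 100 Hz crossover
print(f"settle to 0.1 %: {t_settle_0p1 * 1e6:.0f} us")
print(f"settle to 0.01 %: {t_settle_0p01 * 1e6:.0f} us")
print(f"phase lag at 100 Hz: {lag_at_fc:.2f} deg")
```

If the settle time exceeds the update period, the output never reaches its error band between updates, and the shortfall should be carried into the timing budget as effective delay.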

Evidence fields (debug-ready): Tdac_settle · glitch area · edge_jitter_rms · filter_group_delay · u_min/u_max · du/dt_limit · time-to-safe
[Diagram: output stage map. Chain: compute (update boundary) → DAC / PWM (settle + edge timing) → output filter (phase / delay) → actuator (plant input). Non-idealities: settling, glitch, edge jitter. Bounds + failsafe: clamp (hard/soft, u_min / u_max), slew / rate limit (du/dt limit). Fault path: fault / brownout → safe output (value + time).]
Figure H2-5 — Output quality depends on settling, glitch energy, edge jitter, and phase-transparent filtering; safety requires explicit clamps and deterministic safe outputs.
H2-6 · Discrete-Time PID Done Right (Not Just Kp/Ki/Kd)

A PID controller is an embedded component with state, timing, and nonlinear boundaries. Correct behavior requires discretization consistent with the sample period, noise-aware derivative design, saturation-aware integration, and deterministic mode transitions that prevent output jumps.

Discretization method

Tustin/bilinear, backward Euler, and matched pole-zero produce different closed-loop behavior. The chosen method must match Tsample and delay budget.

Tsample · Teffective_delay

Band-limited derivative

Derivative action must be filtered to avoid amplifying measurement noise and sampling jitter; treat it as a bandwidth-limited feature.

D filter · noise

Setpoint weighting (β, γ)

Weighting reduces overshoot on setpoint steps without sacrificing disturbance rejection, separating tracking from regulation behavior.

β · γ

Anti-windup + saturation policies

Back-calculation or conditional integration prevents integrator windup during clamps and slew limits, improving recovery and avoiding limit cycles.

anti-windup · sat_time%

  • Discretization choices: Tustin vs backward Euler vs matched pole-zero

    Select a discretization aligned to the expected operating bandwidth and timing constraints. If delay/jitter is non-negligible, use the H2-3 budget to determine the safe crossover region before selecting the discrete form.

  • Derivative filtering: band-limited derivative

    Implement derivative action with an explicit low-pass limit. Without band-limiting, measurement noise and sampling jitter translate into output chatter and resonant excitation.

  • Setpoint weighting (β, γ)

    Apply setpoint weighting to reduce overshoot on setpoint changes while keeping disturbance rejection aggressive. Weighting should be logged alongside step-response tests for reproducibility.

  • Anti-windup: back-calculation vs conditional integration

    Back-calculation provides smoother recovery but requires a well-chosen tracking time constant; conditional integration is simpler but needs strict saturation and direction rules. Choose based on clamp behavior and recovery requirements.

  • Bumpless transfer: manual↔auto, mode changes, setpoint steps

    Ensure mode transitions do not introduce output discontinuities. Align integrator state and output command at the transition boundary, and apply setpoint ramping when required by actuator constraints.

  • Output saturation handling and integrator freeze policies

    Define integrator behavior under saturation and slew limits (freeze, back-calc, or conditional). Record saturation duration and integrator state to diagnose slow recovery and limit cycles.
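The items in this list can be combined in one small controller. The sketch below is one possible arrangement, not a reference implementation: the derivative is a backward-Euler discretization of kd·s / (tf·s + 1), the integrator is advanced after the output is formed so that back-calculation can use the clamped value, and `track()` aligns the states for a bumpless manual-to-auto transfer. The class name, field names, and gains are hypothetical, and slew limiting is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Pid:
    kp: float
    ki: float
    kd: float
    ts: float             # sample period Tsample, seconds
    tf: float             # derivative low-pass time constant, seconds
    tt: float             # anti-windup tracking time constant, seconds
    beta: float = 1.0     # setpoint weight on the proportional term
    gamma: float = 0.0    # setpoint weight on the derivative term
    u_min: float = -1.0
    u_max: float = 1.0
    integ: float = 0.0    # integrator state
    d_state: float = 0.0  # filtered derivative state
    prev_ed: float = 0.0  # previous derivative input

    def track(self, sp, pv, u_manual):
        """Bumpless transfer: align states so the next update() returns
        u_manual (assumes u_manual lies inside [u_min, u_max])."""
        self.prev_ed = self.gamma * sp - pv
        self.d_state = 0.0
        self.integ = u_manual - self.kp * (self.beta * sp - pv)

    def update(self, sp, pv):
        e = sp - pv
        p = self.kp * (self.beta * sp - pv)
        # Band-limited derivative (backward Euler) on the weighted error.
        ed = self.gamma * sp - pv
        self.d_state = (self.tf * self.d_state
                        + self.kd * (ed - self.prev_ed)) / (self.tf + self.ts)
        self.prev_ed = ed
        u_raw = p + self.integ + self.d_state
        u = min(self.u_max, max(self.u_min, u_raw))
        # Integrate the error plus the back-calculation term, which pulls
        # the integrator back whenever the clamp is active (u != u_raw).
        self.integ += self.ts * (self.ki * e + (u - u_raw) / self.tt)
        return u


pid = Pid(kp=2.0, ki=1.0, kd=0.0, ts=0.001, tf=0.01, tt=0.1)
pid.track(sp=1.0, pv=0.4, u_manual=0.3)
u_first = pid.update(1.0, 0.4)  # continues from the manual output
print(f"first auto output = {u_first:.3f}")
```

With `gamma = 0.0` the derivative acts on the measurement only, which avoids derivative kick on setpoint steps. The discrete back-calculation term is stable in this form only while `ts` is well below `2 * tt`.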

Proof artifacts (repeatable): step response A/B · overshoot% · settling time · phase margin · gain margin · Tsample

| Artifact | What to compare | What it proves | Minimum fields to log |
|---|---|---|---|
| Step response A/B | Baseline vs +weighting vs +anti-windup | Overshoot reduction and recovery behavior under saturation | Tsample, overshoot%, settling time, sat_time%, integrator_state |
| Disturbance response | With/without weighting (disturbance path) | Disturbance rejection preserved while tracking is improved | error(t), control output u(t), bandwidth notes |
| Stability margins | Margins at chosen sample rate and delay budget | Defensible crossover region and robustness reserve | Tsample, Teffective_delay, phase margin, gain margin |

[Diagram: discrete PID component map. Setpoint (SP) and process value (PV) → weighting (β, γ) → error (e = SP - PV) → discrete PID (discretization method, state + Tsample; P, I, D with band-limited D filter) → limiter (clamp + slew) → control output (u), with anti-windup (back-calc / conditional) and bumpless transfer (manual ↔ auto, mode change).]
Figure H2-6 — A stable embedded PID requires discretization consistent with Tsample, a band-limited derivative, saturation-aware integration, and bumpless transitions.
H2-7 · Loop Shaping Toolkit (Bode, Margins, and Compensation Patterns)

This chapter turns tuning into a repeatable workflow: measure the plant response, set explicit margin targets, and select compensation patterns that trade phase, noise, and delay in a controlled way. The goal is not perfect modeling; it is a defensible bandwidth and robustness reserve backed by evidence.

Measure first (FRF)

Identify dominant poles, resonances, and effective delay using measured frequency response rather than ideal assumptions.

FRF · dominant poles · delay

Target margins

Set phase/gain margin targets and a crossover range consistent with latency budget and noise constraints.

PM target · GM target · crossover

Choose patterns

Use lead/lag, notch, feedforward, and cascaded-loop separation patterns as a practical toolbox with known tradeoffs.

lead/lag · notch · FF

Validate & document

Verify margins, noise amplification, and delay consumption; record filter values and evidence fields for reproducibility.

PM/GM · noise · Teffective

Practical toolbox (what to do and when)

  • Plant identification: dominant poles and effective delay

    Use measured response to locate slope changes and phase roll-off that reveal dominant dynamics. Extract effective delay and treat it as a hard constraint on crossover. Avoid relying on ideal plant order assumptions when fixtures, sensors, or computation introduce hidden dynamics.

  • Lead compensation (add phase near crossover)

    Use lead when phase margin is short near the desired crossover. Place the lead action around the crossover region to boost phase while minimizing excessive high-frequency gain that can amplify measurement noise.

  • Lag compensation (increase low-frequency gain)

    Use lag to reduce steady-state error and lower integrator burden when the plant allows it. Explicitly account for the phase penalty and ensure the added low-frequency gain does not increase saturation time or limit-cycle risk.

  • Notch filters (resonance suppression with a latency trade)

    Use a notch when a resonance peak dominates the response near or below the intended crossover. Tune notch center frequency from measured resonance; keep bandwidth only as wide as required. Document the added phase/group delay impact.

  • Feedforward (reduce integrator burden for predictable loads)

    Use feedforward when disturbance or load is measurable or predictable. Validate that feedforward reduces residual error and integrator activity without degrading stability margins.

  • Cascaded loops (inner/outer bandwidth separation)

    Use an inner loop to linearize and speed up a fast dynamic, then wrap an outer loop for slower regulation. Maintain clear bandwidth separation so the outer loop does not fight the inner loop or inject delay into the fast path.

  • Deadband / hysteresis (nonlinear effects on limit cycles)

    Deadband and hysteresis can create limit cycles and bias. Treat them as nonlinear elements: identify signatures in residual error and output histograms, then apply linearization or redesign boundaries rather than “tuning harder.”

Translate a phase-margin target into filter values: start from measured crossover and phase, compute the phase shortfall, then select lead/lag/notch parameters that add the required phase where it matters. Re-validate PM/GM and confirm the result does not violate the delay budget.

| Pattern | Primary goal | When to use | Tradeoff to document | Evidence fields |
|---|---|---|---|---|
| Lead | Increase PM near crossover | PM shortfall at desired bandwidth | High-frequency noise gain | PM_target, lead_zero/pole, FRF_phase(f) |
| Lag | Increase low-frequency gain | Steady-state error/integrator burden high | Phase penalty / slower response | LF gain, lag_zero/pole, sat_time% |
| Notch | Suppress resonance peak | Resonance near/under crossover | Added group delay / sensitivity | f_res, Q_res, notch_f0/Q, group_delay |
| Feedforward | Reduce residual error & I load | Predictable load/disturbance available | Model mismatch / bias | ff_gain, residual_error_RMS, integrator_state |
| Cascade | Decouple fast/slow dynamics | Two distinct time constants exist | Bandwidth separation requirement | inner_bw, outer_bw, separation_ratio |
| Deadband | Reduce chatter / handle stiction | Actuator deadzone/stiction visible | Limit cycles / bias risk | deadband_width, u_histogram, limit_cycle_amp |

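For the lead pattern, the step from phase shortfall to filter values follows the standard first-order lead relations: with alpha = zero/pole frequency ratio, the maximum phase boost satisfies sin(phi) = (1 - alpha) / (1 + alpha) and occurs at the geometric mean of zero and pole. The sketch below centers that peak on the crossover; the function names and the 55°/25° example are hypothetical.

```python
import math


def design_lead(phase_boost_deg, crossover_hz):
    """Place a first-order lead so its peak phase lands on the crossover.

    Returns (zero_hz, pole_hz, hf_gain), where hf_gain = 1 / alpha is the
    high-frequency gain relative to DC (the noise-gain tradeoff).
    """
    phi = math.radians(phase_boost_deg)
    alpha = (1.0 - math.sin(phi)) / (1.0 + math.sin(phi))
    zero_hz = crossover_hz * math.sqrt(alpha)
    pole_hz = crossover_hz / math.sqrt(alpha)
    return zero_hz, pole_hz, 1.0 / alpha


def lead_phase_deg(zero_hz, pole_hz, f_hz):
    """Phase contributed by the lead at f_hz, in degrees."""
    return math.degrees(math.atan(f_hz / zero_hz) - math.atan(f_hz / pole_hz))


# Hypothetical case: PM target 55 deg, measured 25 deg at a 100 Hz crossover.
shortfall = 55.0 - 25.0
zero_hz, pole_hz, hf_gain = design_lead(shortfall, 100.0)
check = lead_phase_deg(zero_hz, pole_hz, 100.0)
print(f"lead zero = {zero_hz:.1f} Hz, pole = {pole_hz:.1f} Hz, HF gain = {hf_gain:.1f}x")
print(f"phase added at crossover = {check:.1f} deg")
```

The lead also raises gain at the crossover (by 1/sqrt(alpha) relative to DC), so the crossover shifts upward unless loop gain is reduced; re-measuring PM/GM after applying the filter covers this.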
[Diagram: loop shaping toolbox. Flow: measured FRF (magnitude + phase) → targets (PM target, GM target, crossover range) → patterns chosen with tradeoffs (lead, lag, notch, FF, cascade, deadband) → validate (margins, noise, delay budget).]
Figure H2-7 — A practical loop-shaping workflow connects measured FRF to explicit margin targets, then to compensation patterns and validation checks.
H2-8 · Robustness in the Real World (Noise, Drift, Quantization, and Nonlinearities)

Field stability is often lost to “small” effects that are invisible in ideal models: quantization creates limit cycles, drift biases the integrator, deadzones introduce stick-slip behavior, and temperature shifts move gain and phase. Robust designs treat these effects as first-class elements with clear signatures and logging evidence.

Quantization & limit cycles

Finite resolution can prevent convergence and create periodic residual error. Dithering can spread energy into a noise floor.

LSB · limit cycle · dither

Drift & bias

Sensor bias drift drives integrator load and can cause saturation or slow recovery. Bias estimation must be slow and verifiable.

offset · temp · I state

Deadzone / stiction

Actuator deadzone makes small outputs ineffective, then motion occurs abruptly. Linearization must avoid relay-feedback traps.

deadzone · stick-slip

Nonlinear limits

Rate limits and saturation change loop dynamics and can create limit cycles and slow recovery unless anti-windup is aligned.

du/dt · sat_time%

Signatures and first fixes

  • Quantization limit cycles

    Signature: periodic residual error and output toggling near steady state. First fix: add controlled dither or increase effective resolution; reduce derivative sensitivity to quantized PV noise.

  • Sensor bias drift

    Signature: slowly growing DC residual error and integrator state drift. First fix: implement offset estimation/compensation with a slow time constant; validate via temperature-tagged logs.

  • Actuator stiction / deadzone

    Signature: output increases without plant response until a threshold, then motion jumps. First fix: deadzone linearization and hysteresis rules; avoid aggressive relay-like probing that destabilizes the loop.

  • Temperature gain/phase drift

    Signature: performance changes with temperature, including reduced phase margin and new oscillation bands. First fix: gain scheduling with smooth interpolation; log PM proxies vs temperature to prevent discontinuities.

  • Rate limits and saturation as nonlinear elements

    Signature: slow recovery after saturation, persistent offset, or limit cycles near clamps. First fix: align anti-windup policy with clamp/slew behavior; verify saturation time percentage decreases.

Minimum logging pack: residual error (RMS + DC) · control output histogram · saturation time % · integrator state · temperature

| Effect | Typical symptom | What to log | Why it matters |
|---|---|---|---|
| Quantization | Limit cycle near steady state | residual error waveform, u histogram, LSB estimates | Distinguishes true oscillation from resolution-induced chatter |
| Drift | DC bias grows over time | residual error DC, integrator state, temperature | Shows whether integrator is carrying sensor bias or plant bias |
| Deadzone | Threshold then jump response | u histogram, stick-slip events, deadzone estimate | Separates stiction effects from poor PID gains |
| Temp effects | Stability changes with temperature | temperature, performance metrics, PM proxy | Enables scheduling decisions and prevents abrupt mode changes |
| Nonlinear limits | Slow recovery / clamp chatter | sat_time%, integrator state, u histogram | Verifies anti-windup effectiveness and boundary policy alignment |

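The numeric parts of the logging pack are cheap to compute from a captured window of samples. The sketch below is a minimal, offline illustration; the function names, dictionary keys, and the four-bin histogram are hypothetical choices, and "saturated" is assumed to mean the output sits at or beyond either clamp.

```python
import math


def logging_pack(errors, outputs, u_min, u_max):
    """Residual error (DC and RMS) and saturation time % for one window."""
    n = len(errors)
    dc = sum(errors) / n
    rms = math.sqrt(sum(e * e for e in errors) / n)
    saturated = sum(1 for u in outputs if u <= u_min or u >= u_max)
    return {
        "residual_dc": dc,
        "residual_rms": rms,
        "sat_time_pct": 100.0 * saturated / len(outputs),
    }


def output_histogram(outputs, u_min, u_max, n_bins):
    """Count control outputs per equal-width bin across [u_min, u_max]."""
    counts = [0] * n_bins
    span = u_max - u_min
    for u in outputs:
        idx = int((u - u_min) / span * n_bins)
        counts[min(n_bins - 1, max(0, idx))] += 1  # clip edge values
    return counts


# Hypothetical window: small alternating residual, output hitting the clamp.
errs = [0.1, -0.1, 0.1, -0.1]
outs = [1.0, 0.5, 0.2, 1.0]
pack = logging_pack(errs, outs, u_min=0.0, u_max=1.0)
hist = output_histogram(outs, 0.0, 1.0, 4)
print(pack, hist)
```

An alternating residual with near-zero DC and a histogram piled into the end bins points toward quantization or clamp chatter rather than drift, which would show up as a growing DC term instead.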
[Diagram: field robustness map. Effects (quantization, drift, deadzone, temperature, sat / du/dt limits) → symptoms (limit cycle, DC residual, jump response, PM loss, slow recovery) → logs (residual error RMS/DC, output histogram, saturation time %, integrator state, temperature).]
Figure H2-8 — Robustness comes from treating quantization, drift, deadzones, temperature, and nonlinear limits as first-class effects with measurable signatures and logs.
H2-9 · Safety Supervision (Watchdog, Brownout, Hold-Up, Safe State)

A loop controller must fail predictably. Safety supervision defines health criteria, supply-fault behavior, hold-up obligations, and safe outputs in measurable terms—then proves them with reset reasons, brownout flags, timing evidence, and event logs.

Health is more than “kick the watchdog”

Define health as heartbeat + deterministic timing + data-path liveness (sampling and output updates), not just a periodic feed.

heartbeat · deadline · data-path

Two brownout lines: warn vs reset

Separate early-warning actions from hard reset thresholds so the system can enter a safe state before control becomes unreliable.

warn threshold · reset threshold

Hold-up is a budget with duties

Hold-up time must be sized to maintain safe outputs and write minimum evidence logs (“last-gasp”) within a verified time-to-safe.

hold-up ms · time-to-safe · last-gasp

Safe state must be measurable

Specify the exact clamp/ramp/latch behavior and the maximum time allowed to reach it under brownout or watchdog faults.

clamp · ramp · latch

Design decisions to pin down (and how to prove them)

  • Watchdog policy: simple vs windowed + “healthy” definition

    Choose simple watchdogs for basic hang detection, or windowed watchdogs to also catch runaway timing. Define “healthy” as a combination of heartbeat presence and deterministic execution evidence (e.g., control loop deadlines met, sampling/output tasks observed). A feed should be allowed only when health evidence is valid.

  • Brownout detection: early warning vs hard reset thresholds

    Set an early-warning threshold that triggers safe-state entry and log preparation, and a lower hard-reset threshold that forces a reset when reliable control is no longer possible. Validate both thresholds against real supply dip waveforms and detection latency.

  • Hold-up time budget: what must remain correct during supply collapse

    Define the hold-up obligation explicitly: maintain safe outputs for a guaranteed minimum time and complete the minimum evidence logging path (event sequence counter + timestamp + reset/brownout cause). Measure time-to-safe and last-gasp completion under worst-case dips.

  • Safe state definition: clamp/ramp/latch and timing

    Safe state is an output behavior contract. Specify the safe output value (or duty), whether the transition is a bounded ramp or a hard clamp, and the maximum time-to-safe. Ensure the safe output is maintained through the hold-up window and after reset (if required).

  • Fault taxonomy: recoverable vs latched vs degraded

    Classify faults so recovery is predictable: recoverable faults can retry with bounded attempts; latched faults require explicit user/service action; degraded faults reduce bandwidth/output limits while preserving safety. Define retry counts and backoff time to avoid rapid oscillation between states.

  • Event log integrity: monotonic counters, timestamps, and last-gasp strategy

    Logs must survive power events: use a monotonic event sequence counter to prevent ambiguity, store timestamps from the same timebase used by the control loop, and implement a “last-gasp” write path that commits the final cause codes before brownout reset. Record commit success as part of the evidence.
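The "feed only when healthy" rule can be sketched as a pure decision function that returns both the verdict and a reason code for the log. This is a host-side model of the policy, not driver code for any particular watchdog peripheral; the type, field names, and reason strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthEvidence:
    heartbeat_seen: bool      # supervisor heartbeat observed since last feed
    deadline_misses: int      # control-loop deadlines missed since last feed
    samples_since_feed: int   # ADC samples consumed since last feed
    outputs_since_feed: int   # output updates issued since last feed


def feed_decision(ev, elapsed_ms, window_min_ms, window_max_ms):
    """Return (allowed, reason). A windowed watchdog rejects feeds that
    arrive too early (runaway timing) as well as too late (hang)."""
    if elapsed_ms < window_min_ms:
        return False, "FEED_TOO_EARLY"
    if elapsed_ms > window_max_ms:
        return False, "FEED_TOO_LATE"
    if not ev.heartbeat_seen:
        return False, "HEARTBEAT_MISSING"
    if ev.deadline_misses > 0:
        return False, "DEADLINE_MISSED"
    if ev.samples_since_feed == 0 or ev.outputs_since_feed == 0:
        return False, "DATA_PATH_STALLED"
    return True, "OK"


good = HealthEvidence(True, 0, 10, 10)
stalled = HealthEvidence(True, 0, 10, 0)
print(feed_decision(good, 10, 8, 12))
print(feed_decision(stalled, 10, 8, 12))
print(feed_decision(good, 3, 8, 12))
```

Recording the returned reason before the watchdog expires gives the `watchdog_cause` field something more specific than "timeout" to report after reset.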

Evidence fields (minimum set): last_reset_reason brownout_warn_flag brownout_reset_flag watchdog_cause event_seq event_ts last_gasp_commit_ok supply_dip_waveform

| Decision | What must be specified | What to measure (evidence) | Pass criteria example |
| --- | --- | --- | --- |
| Watchdog | Mode (simple/windowed), health gates | watchdog_cause, deadline_miss_count, heartbeat_miss_count | No deadline misses at worst case; watchdog triggers only on defined unhealthy conditions |
| Brownout | Warn/reset thresholds, detection latency | brownout_warn_flag, brownout_reset_flag, supply dip waveform | Warn asserted before unsafe execution; reset only below hard threshold |
| Hold-up | Hold-up duration and duties | hold_up_ms_measured, time_to_safe_ms, last_gasp_commit_ok | Safe output maintained ≥ target; last-gasp commit success in worst-case dip |
| Safe state | Clamp/ramp/latch behavior | u_safe_value, safe_ramp_rate, time_to_safe_ms | Output reaches safe state within bound and stays there until recovery policy allows exit |
| Fault policy | Recoverable/latched/degraded + retry/backoff | fault_code, fault_class, retry_count, backoff_ms | No rapid flapping; repeated faults transition to latched or degraded states predictably |
| Log integrity | Monotonic counter, timestamp source, commit rules | event_seq monotonicity, event_ts validity, log_commit_ok | No counter rollback; timestamps consistent; final cause persisted before reset |
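
The fault-policy row (bounded retries with backoff, then a latched state) can be sketched as a small state holder. The retry limit, the exponential backoff shape, and the names `RetryPolicy`, `on_fault`, `on_recovered`, and `service_clear` are illustrative assumptions; a real policy would take its numbers from the safety analysis.

```python
from dataclasses import dataclass
from enum import Enum


class FaultClass(Enum):
    RECOVERABLE = "recoverable"
    LATCHED = "latched"


@dataclass
class RetryPolicy:
    max_retries: int = 3         # assumed bound on automatic retries
    base_backoff_ms: int = 100   # assumed first backoff
    max_backoff_ms: int = 5000   # assumed backoff ceiling
    retry_count: int = 0
    latched: bool = False

    def on_fault(self):
        """Classify the next occurrence: (RECOVERABLE, backoff_ms) or (LATCHED, None)."""
        if self.latched or self.retry_count >= self.max_retries:
            self.latched = True
            return FaultClass.LATCHED, None
        backoff_ms = min(self.base_backoff_ms * 2 ** self.retry_count,
                         self.max_backoff_ms)
        self.retry_count += 1
        return FaultClass.RECOVERABLE, backoff_ms

    def on_recovered(self):
        """Clear the retry counter after sustained healthy operation (never clears a latch)."""
        if not self.latched:
            self.retry_count = 0

    def service_clear(self):
        """Explicit user/service action is the only way out of the latched state."""
        self.latched = False
        self.retry_count = 0
```

Growing the backoff on each retry is one way to avoid rapid flapping; retry_count and backoff_ms are the values the evidence column expects to see logged.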
[Figure H2-9: Safety Supervision. Blocks: supply rail (brownout warn threshold for early actions vs reset threshold for hard reset); watchdog + health gate (feed allowed only if healthy: heartbeat, deadlines met); safe state (clamp / ramp / latch + time-to-safe); hold-up (maintain safe outputs within a hold-up ms budget); event log integrity (event_seq, event_ts, last-gasp commit).]
Figure H2-9: Separate brownout warn/reset lines, gate watchdog feeds with health evidence, enter safe state within a measured time-to-safe, and persist last-gasp logs with monotonic counters.
Cite this figure: Use this diagram to document safety supervision decisions, evidence fields, and the safe-state + logging obligations.
H2-10 · Firmware Execution Model (Determinism First)

Determinism is an architecture property. The control loop must run at a fixed rate with bounded jitter while comms and UI execute in lower-priority lanes. Parameter updates must be atomic at defined boundaries, sampling must be timestamped consistently, and commits must support rollback to prevent half-written states.

Priority separation

Fixed-rate control ISR/task is highest priority; comms and HMI are lower priority and must never steal deadlines.

fixed-rate · deadlines

Synchronous update boundaries

Setpoints and parameters are double-buffered and swap only at a safe boundary so the control math never sees half updates.

double buffer · atomic swap

Sampling path determinism

ADC sampling uses DMA and consistent timestamping; ISR latency and overruns are measured, not assumed.

ADC DMA · timestamp

Commit & rollback

Calibration and configuration writes use an atomic commit protocol with integrity checks and rollback on failure.

commit · rollback · CRC

Architecture patterns (determinism-first)

  • Fixed-rate control lane + non-real-time lanes

    The control loop runs on a fixed-rate ISR or highest-priority task with bounded worst-case execution time. Communication, UI rendering, and logging run in lower-priority lanes and must be rate-limited to protect deadlines.

  • Double-buffered setpoints and synchronous parameter swaps

    Use two copies of setpoints/parameters: one “active” used by the control loop and one “staging” written by comms/UI. Swap pointers only at a defined boundary (e.g., loop tick) so updates are atomic and deterministic.

  • ADC DMA + timestamping strategy (sampling instant is a contract)

    Trigger sampling in a deterministic way (timer-driven), use DMA to move samples, and attach timestamps using the same timebase as the control loop. Record ISR latency distribution and DMA overruns to prove stability under load.

  • Calibration updates and commits: atomicity + rollback

    Treat calibration/configuration as transactional: write to a staging slot, verify integrity (CRC), then commit with a single state flip. On failure, keep the previous known-good slot and record rollback evidence.

  • Timebase health monitoring and jitter monitoring

    Monitor timebase health continuously: detect abnormal jitter growth and, if applicable, clock source faults. Trigger safe degradation policies when timing quality cannot be guaranteed.
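
The double-buffered parameter pattern from the list above can be sketched as follows. This is a single-threaded model of the idea, not firmware: `Params`, `ParamBank`, `stage`, and `swap_at_boundary` are hypothetical names, and on a real target the swap would be a pointer exchange performed by the control lane itself (or with interrupts masked).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Immutable parameter set, so the control math can never see a half-written copy."""
    version: int
    kp: float
    ki: float
    setpoint: float


class ParamBank:
    def __init__(self, initial):
        self._active = initial
        self._staging = None

    def stage(self, new):
        """Called from the comms/UI lane; never touches the active copy."""
        self._staging = new

    def swap_at_boundary(self):
        """Called only by the control lane at the loop tick; returns True if a swap happened."""
        if self._staging is None:
            return False
        self._active, self._staging = self._staging, None
        return True

    @property
    def active(self):
        return self._active
```

Logging `active.version` together with the tick timestamp at each successful swap would give the param_version and param_switch_ts evidence used for acceptance.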

Acceptance criteria (instrumentable): jitter_pkpk, deadline_miss_count, cpu_load_worstcase, isr_latency_hist, adc_dma_overrun
Prove deterministic behavior with a scope/logic analyzer on a loop-tick or output-update strobe. Under worst-case comms + UI + logging activity, deadline_miss_count must remain zero and measured jitter must stay within the defined bound.
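
As a rough illustration of how the acceptance numbers might be derived from a logic-analyzer capture, the sketch below computes peak-to-peak period jitter and a deadline-miss count from loop-tick timestamps. The function names and the definition of a miss (tick-to-tick interval longer than `max_interval_us`) are assumptions made for this example.

```python
def timing_evidence(tick_ts_us, period_us, max_interval_us):
    """Derive jitter_pkpk and deadline_miss_count from captured loop-tick timestamps (us)."""
    if len(tick_ts_us) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(tick_ts_us, tick_ts_us[1:])]
    errors = [iv - period_us for iv in intervals]
    return {
        "jitter_pkpk_us": max(errors) - min(errors),
        "deadline_miss_count": sum(1 for iv in intervals if iv > max_interval_us),
    }


def accept(evidence, jitter_bound_us):
    """Pass only with zero deadline misses and jitter within the defined bound."""
    return (evidence["deadline_miss_count"] == 0
            and evidence["jitter_pkpk_us"] <= jitter_bound_us)
```

For example, ticks captured at 0, 1000, 2003, 2998, and 4300 µs against a 1000 µs period and a 1100 µs limit contain one late interval, so the run fails regardless of the jitter bound.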

| Subsystem | Determinism risk | Evidence fields | Pass criteria example |
| --- | --- | --- | --- |
| Control lane | Missed deadlines from contention | jitter_pkpk, deadline_miss_count, isr_latency_hist | Zero deadline misses; jitter within bound at worst-case load |
| Params | Half-updated setpoints/coefficients | param_version, param_switch_ts, boundary_ok | Param changes only at boundary; no mixed-version cycles observed |
| Sampling | ADC jitter/overrun and timestamp mismatch | sample_ts, adc_dma_overrun, isr_latency_hist | No overruns; sampling instant consistent; timestamps monotonic |
| Comm/UI | Priority inversion / burst traffic | cpu_load_worstcase, queue_depth, rate_limit_hits | Control deadlines protected even during burst comms/UI activity |
| Commit | Half-written calibration/config | commit_ok, rollback_count, crc_ok, cfg_slot | Commit is atomic; rollback preserves previous slot; failures are logged |
| Timebase | Clock instability raises jitter | clock_ok, jitter_rms, clock_switch_event | Jitter stays within bound; faults trigger degrade policy and evidence logs |
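
The Commit row (atomic commit, rollback to the previous slot) can be illustrated with a two-slot store in which the active-slot index is the single commit point. This is a minimal in-memory sketch: `TwoSlotStore`, the JSON encoding, and the `corrupt_write` test hook are assumptions, and real firmware would write to FRAM or flash and read back from the device.

```python
import json
import zlib


class TwoSlotStore:
    """Two config slots plus an active-slot index; flipping the index is the commit."""

    def __init__(self, initial):
        self.slots = [self._encode(initial), None]
        self.cfg_slot = 0
        self.rollback_count = 0

    @staticmethod
    def _encode(cfg):
        payload = json.dumps(cfg, sort_keys=True).encode()
        return payload + zlib.crc32(payload).to_bytes(4, "little")

    @staticmethod
    def _decode(blob):
        if blob is None or len(blob) < 4:
            return None
        payload, crc = blob[:-4], int.from_bytes(blob[-4:], "little")
        if zlib.crc32(payload) != crc:
            return None
        return json.loads(payload)

    def commit(self, cfg, corrupt_write=False):
        """Write to the staging slot, verify by read-back, then flip the active index."""
        staging = 1 - self.cfg_slot
        blob = self._encode(cfg)
        if corrupt_write:  # test hook that simulates a torn write
            blob = blob[:-1] + bytes([blob[-1] ^ 0xFF])
        self.slots[staging] = blob
        if self._decode(self.slots[staging]) is None:
            self.rollback_count += 1  # previous slot stays active
            return False
        self.cfg_slot = staging
        return True

    def load(self):
        return self._decode(self.slots[self.cfg_slot])
```

Because the known-good slot is never overwritten before the new one verifies, a failed commit leaves the running configuration untouched and only increments rollback_count.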
[Figure H2-10: Deterministic Execution Model. Blocks: control lane, highest priority (timer tick → sample → compute → output: timebase, ADC DMA, timestamp, compute, output); comms / UI lane, lower priority and rate-limited (comms, local HMI, bounded logging); double-buffered params (active, staging, swap only at boundary); commit / rollback (transactional updates, CRC); acceptance by scope/LA proof (jitter_pkpk, deadline_miss).]
Figure H2-10: Determinism-first execution isolates a fixed-rate control lane, uses DMA + consistent timestamping, swaps parameters only at boundaries, and applies transactional commits with rollback; acceptance is proven with measured jitter and zero deadline misses.
Cite this figure: Use this diagram to document the scheduling model, atomic update boundaries, and acceptance evidence for deterministic execution.
H2-11 · Local HMI for Commissioning & Operations (Make Field Debug Cheap)

A local HMI should answer decisive questions quickly: current mode, target, PV/SV behavior, alarms, and I/O health. The goal is not “more screens”—it is a minimum set of screens and guided flows that reduce mis-tuning, speed up triage, and keep changes auditable even when the network is unavailable.

Minimal but decisive screens

Mode, setpoint/output, PV/SV trend, alarms, and I/O state—each with the minimum evidence fields required for triage.

status · trend · alarms · I/O

Commissioning flows

Guided steps for zero/span, actuator direction check, and auto/manual transitions with bumpless behavior and safe limits.

zero/span · direction · auto/manual

Guided diagnostics rules

“If oscillating → check X/Y” style rules that point to concrete evidence fields, then recommend the smallest safe action.

if-then · evidence

Roles, audit, offline

Operator vs technician permissions, change audit trails, and offline-safe operation so field work is possible without network.

RBAC · audit · offline

Screen set (minimum) and required fields

| Screen | Purpose | Minimum fields (evidence-first) | Fast decision it enables |
| --- | --- | --- | --- |
| Status / Mode | Establish current operating state | mode, state, timebase_ok, last_reset_reason, brownout flags | Is the system in control, manual, safe, or degraded? |
| Setpoint & Output | Confirm commanded vs applied | SP, PV, output_u, clamp/ramp state, saturation % | Is output limited or saturating? |
| PV/SV Trend | Observe dynamics cheaply | PV, SV, error, output_u (short window + long window) | Oscillating, slow, drifting, or healthy? |
| Alarms | Explain why behavior changed | fault_code, fault_class, timestamp, recommended checks | Recoverable vs latched vs degraded path? |
| I/O State | Validate signal path | adc_valid, adc_overrun, sample_ts, output_update_strobe, io_range flags | Is the loop failing due to I/O or timing? |

Commissioning flows (guided, safe by design)

  • Flow A — Sensor zero/span (with commit/rollback)

    Step 1: ensure mode=Manual and safe output bounds are active. Step 2: capture zero reference, then span reference; validate adc_valid and range flags. Step 3: write calibration to staging and verify integrity; then atomically commit (or rollback on failure).

  • Flow B — Actuator direction check (small step + bounded ramp)

    Apply a small output step through a bounded ramp. Confirm PV changes in the expected direction. If direction is wrong, fix mapping before enabling auto control. Log the action with an audit record.

  • Flow C — Auto/manual transition (bumpless enable)

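    Preload the integrator so that the controller output at the moment of switching equals the current manual output (optionally tracking the setpoint to the current PV). Switch to auto using bumpless transfer rules. The HMI should display time-to-stable and saturation percentage during the first seconds after enable.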
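
Flow C can be illustrated with a PI controller whose integrator is preloaded so that the first auto-mode output equals the last manual output. This is a minimal sketch under assumed names (`PIController`, `preload_for_auto`); it leaves out saturation and the derivative term.

```python
class PIController:
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integrator = 0.0

    def preload_for_auto(self, u_manual, sp, pv):
        """Choose the integrator so that kp*error + integrator equals the manual output."""
        self.integrator = u_manual - self.kp * (sp - pv)

    def update(self, sp, pv):
        """One loop tick: emit the output, then advance the integrator."""
        error = sp - pv
        u = self.kp * error + self.integrator
        self.integrator += self.ki * error * self.dt
        return u


# Usage: manual output was 40.0 when the operator enabled auto.
pi = PIController(kp=2.0, ki=0.5, dt=0.01)
pi.preload_for_auto(u_manual=40.0, sp=50.0, pv=47.0)
first_auto_output = pi.update(sp=50.0, pv=47.0)  # continues from 40.0, no jump
```

Without the preload, the first output would be kp times the error plus whatever stale value the integrator held, which is the output jump a bumpless transfer is meant to prevent.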

Guided diagnostics (symptom → evidence → first safe action)

  • If oscillating → check jitter and delay first

    Evidence: jitter_pkpk (or output update timing), deadline_miss_count. First action: reduce loop bandwidth by lowering the loop gain (adding filtering or slowing the update rate adds delay and can make the oscillation worse), and lock parameter updates to boundaries.

  • If slow → check total delay and output settling

    Evidence: latency budget counters, DAC/PWM update-to-settle time, and PV trend slope. First action: remove hidden delay (excessive filtering), then retune for the effective delay.

  • If no response → check I/O path liveness before tuning

    Evidence: adc_valid, overrun flags, I/O range flags, and output clamp state. First action: force a safe manual test step and validate direction and sensor range.

  • If saturating → treat it as a nonlinear event, not a tuning event

    Evidence: saturation %, integrator state snapshot, output clamp/rate limit active. First action: enable/adjust anti-windup policy and confirm safe bounds are correct for the actuator.
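
The symptom → evidence → first safe action rules above lend themselves to a small lookup table that the HMI can walk. The sketch below is one possible encoding: the dictionary layout, the `triage` function, and several field names (`latency_budget`, `update_to_settle_time`, `pv_trend_slope`, `io_range_flags`) are hypothetical stand-ins for whatever the firmware actually exposes.

```python
# Rule table transcribed from the guided-diagnostics list; layout and some field names are assumed.
RULES = {
    "oscillating": {
        "evidence": ["jitter_pkpk", "deadline_miss_count"],
        "first_action": "reduce loop bandwidth; lock parameter updates to boundaries",
    },
    "slow": {
        "evidence": ["latency_budget", "update_to_settle_time", "pv_trend_slope"],
        "first_action": "remove hidden delay, then retune for the effective delay",
    },
    "no_response": {
        "evidence": ["adc_valid", "adc_overrun", "io_range_flags", "clamp_active"],
        "first_action": "force a safe manual test step; validate direction and sensor range",
    },
    "saturating": {
        "evidence": ["sat_percent", "integrator_state", "clamp_active", "rate_limit_active"],
        "first_action": "enable/adjust anti-windup; confirm safe bounds for the actuator",
    },
}


def triage(symptom, snapshot):
    """Recommend the first action only once every required evidence field is captured."""
    rule = RULES[symptom]
    missing = [name for name in rule["evidence"] if name not in snapshot]
    if missing:
        return {"status": "collect_evidence", "missing": missing}
    return {"status": "act", "first_action": rule["first_action"]}
```

Refusing to recommend an action until the evidence is present keeps the guided flow from turning into a tuning shortcut.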

Role-based access + audit (minimum):
  • Operator: change setpoint, switch mode (Auto/Manual/Safe), acknowledge alarms.
  • Technician: calibration and PID/limits, but only via guided flows and with change records.
  • Audit: record param_version, who/when/what, and commit/rollback outcomes.
Offline requirement: the above screens and flows must function safely without network connectivity.

MPN examples (local HMI + field debug BOM)

MCU / compute (HMI + control-class)

STMicroelectronics STM32H743 · NXP MIMXRT1062 · Microchip SAM E70

Safety supervision helpers: Texas Instruments TPS3839 (supervisor) · Maxim/ADI MAX809 (reset supervisor)

Display / touch (local HMI)

Bridgetek/FTDI BT815 / FT813 (graphics controller) · Solomon Systech SSD1963 (TFT controller)

Touch controller: FocalTech FT5336 · Goodix GT911

Time & nonvolatile (audit + offline)

RTC: Micro Crystal RV-3028-C7 · Analog Devices/Maxim DS3231M

FRAM: Fujitsu MB85RS64V · SPI NOR: Winbond W25Q64JV

Field I/O and isolation (service ports)

Isolated RS-485: Analog Devices ADM2587E · CAN/CAN-FD: TI TCAN1042

Digital isolator: ADI ADuM1401 · Silicon Labs Si8642 · eFuse/high-side: TI TPS2595 / TPS2660

[Figure H2-11: Local HMI for Field Debug. Blocks: minimal screens with decisive evidence fields (Status / Mode, Setpoint & Output, PV/SV Trend, Alarms, I/O State); commissioning flows, guided with safe limits (Zero/Span, Direction, Auto/Manual); guided diagnostics, symptom → evidence → first safe action (if oscillating, if slow, if no response, if saturating); roles & audit (Operator, Technician, Audit); offline-capable commissioning and triage.]
Figure H2-11: A “debug-cheap” local HMI is a minimum screen set plus guided flows and diagnostics rules, governed by role-based access and an audit trail, and usable offline.
Cite this figure: Use this diagram to document the local HMI information architecture and the field triage workflow.
H2-12 · Figures & “Cite this figure” Plan (3:2 SVG set)

This subpage uses three reusable 3:2 block-diagram figures. Each figure is designed to be understandable in seconds, with minimal text (≥18px) and high diagram density. The “Cite this figure” block is intentionally visible to support reuse and referencing.

F1 — Architecture map

Signal chain + timebase + supervision + HMI, with callouts for latency/jitter points, safe-state path, and log path.

F2 — Latency & jitter waterfall

Stacked timeline from sampling aperture to output settling, with jitter bars on sampling and output update edges.

F3 — Discrete PID essentials

Block diagram for setpoint weighting, band-limited derivative, saturation, anti-windup feedback, and bumpless transfer.

MPN callouts (optional)

Small MPN labels can be used for typical building blocks (ADC/DAC/MCU/supervisor), without turning the figure into a datasheet.

[Figure F1: Edge Loop Controller Map. Signal chain: sensor → ADC → control → DAC/PWM → actuator, with lat/jit markers; timebase (sampling + update timing: low-jitter clock, timestamps); supervision (watchdog, brownout, hold-up, logs); local HMI (commissioning + guided diagnostics: trends, alarms, I/O). MPN callouts: ADS131M04, DAC8562, TPS3839.]
F1 (Architecture map): signal chain plus timebase, supervision, and local HMI. The “lat/jit” markers identify where timing uncertainty matters most.
Cite this figure: Use this figure to explain the loop controller blocks, the timing-critical points, and the safety/logging paths.
[Figure F2: Latency & Jitter Waterfall. Stacked timing segments along a time axis (aperture, conversion, filtering, compute, settle) with jitter bars and a max-allowed-latency marker. Annotations: ADC group delay, scheduler latency, output update timing, settling tail. Note: sampling jitter and output edge jitter behave like phase/noise in real loops.]
F2 (Latency & jitter waterfall): a visual budget from sampling aperture to output settling, highlighting where jitter and hidden delay accumulate.
Cite this figure: Use this figure to document the end-to-end timing chain and where to instrument jitter and delay in the field.
[Figure F3: Discrete PID Essentials. Blocks: SV and PV inputs, SP weight, summing junctions, P, I, D filter, saturation, output u, anti-windup feedback, bumpless switch, manual track.]
F3 (Discrete PID essentials): setpoint weighting, band-limited derivative, saturation, anti-windup feedback, and bumpless transfer hooks.
Cite this figure: Use this figure to explain why anti-windup and bumpless transfer are implementation necessities, not tuning “extras.”
MPN examples referenced in figures (minimal callouts): ADS131M04, DAC8562, TPS3839
These callouts are intentionally minimal: they illustrate typical building blocks (precision ADC, DAC, supervisor) without turning the figure into a parts list.

H2-13 · FAQs

Each answer follows a fixed field-debug structure: 1 conclusion + 2 evidence checks + 1 first fix, with chapter back-references. Keep evidence fields measurable (scope/log/counters) and keep the first fix minimal and reversible.

Q1 Oscillation started after a firmware update—tuning changed or timing changed?

Conclusion: Treat post-update oscillation as a determinism regression until proven otherwise; timing changes can mimic “bad tuning.”

  • Evidence check #1: Compare jitter_pkpk/deadline_miss_count before vs after the update (control task and ADC sampling edge).
  • Evidence check #2: Verify param_version and a hash/summary of PID + filters + clamps matches the intended release configuration.

First fix: Lock the loop to a fixed-rate ISR/high-priority task and freeze parameter updates to synchronous boundaries (double-buffered commit).

Refs: H2-3, H2-10, H2-11
MPN example (reset/supervision): TI TPS3839

Use a supervisor to ensure reset/brownout behavior is consistent across firmware revisions and test runs.
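
Evidence check #2 in Q1 asks for a hash or summary of the active configuration. One way to sketch that on the host side is a fingerprint over a canonical serialization; the function name, the 16-character truncation, and the example keys are assumptions made for illustration.

```python
import hashlib
import json


def config_fingerprint(cfg):
    """Short, order-independent fingerprint of a parameter set (PID gains, filters, clamps)."""
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Usage: compare the release configuration against what the controller reports.
release = {"kp": 2.0, "ki": 0.5, "d_filter_hz": 40.0, "u_max": 100.0}
reported = {"u_max": 100.0, "d_filter_hz": 40.0, "ki": 0.5, "kp": 2.0}
matches = config_fingerprint(release) == config_fingerprint(reported)
```

If the fingerprints match but behavior changed after the update, the timing evidence becomes the stronger suspect.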

Q2 Looks stable in lab, unstable on the machine—noise coupling or resonance?

Conclusion: If the instability is machine-only, suspect a narrowband resonance or injected noise before retuning gains.

  • Evidence check #1: Check whether oscillation frequency is consistent across runs (resonance signature) using PV/SV trend and an FFT snapshot if available.
  • Evidence check #2: Compare in-band measurement noise (inband_noise_rms) and ground/reference flags between lab and machine.

First fix: Reduce loop bandwidth slightly and add a targeted notch only if a stable resonance peak is confirmed; otherwise harden the measurement path (shielding/grounding/reference).

Refs: H2-4, H2-7, H2-8
MPN example (isolated field bus): ADI ADM2587E

Isolation on RS-485 helps break ground loops that often appear only when installed on the machine.

Q3 Overshoot only on setpoint steps—setpoint weighting missing or bumpless transfer broken?

Conclusion: Step-only overshoot usually points to setpoint-path implementation (weighting) or a broken bumpless enable—not plant changes.

  • Evidence check #1: Confirm setpoint weighting mode/state (sp_weight_mode) and whether P/D respond directly to SV steps.
  • Evidence check #2: During manual↔auto transition, observe integrator_state and output jump; bumpless systems keep the output continuous.

First fix: Enable setpoint weighting and preload/track the integrator during mode changes (manual tracking + bumpless switch), then revalidate step response.

Refs: H2-6, H2-11
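
To make the setpoint-weighting point in Q3 concrete, the sketch below simulates a PI loop on a first-order plant and compares step overshoot for two weights b, where the proportional term acts on (b·SV − PV) while the integral term still acts on the full error. The plant, gains, and the function name `step_overshoot` are assumptions chosen only to make the effect visible.

```python
def step_overshoot(b, kp=2.0, ki=5.0, tau=1.0, dt=0.001, t_end=10.0, sp=1.0):
    """Fractional overshoot of a unit setpoint step for setpoint weight b (Euler simulation)."""
    y = 0.0          # plant output (PV); the assumed plant is tau * dy/dt = -y + u
    integ = 0.0      # integrator state
    peak = 0.0
    for _ in range(int(t_end / dt)):
        integ += ki * (sp - y) * dt          # integral acts on the full error
        u = kp * (b * sp - y) + integ        # proportional acts on the weighted setpoint
        y += dt * (u - y) / tau
        peak = max(peak, y)
    return max(0.0, peak - sp) / sp


full_kick = step_overshoot(b=1.0)   # classic PI: P term sees the whole setpoint step
weighted = step_overshoot(b=0.3)    # weighted: smaller proportional kick on SV steps
```

In this toy loop the weighted controller overshoots less on the same setpoint step. Disturbance rejection is unaffected because b only scales the setpoint path.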
Q4 Slow recovery from disturbance—integrator limit, output saturation, or hidden delay?

Conclusion: Slow recovery is often saturation or hidden delay disguised as “needs more Ki.”

  • Evidence check #1: Inspect sat_percent and clamp_active during the disturbance and recovery window.
  • Evidence check #2: Measure effective delay (Tdelay) and filter group delay (ADC digital filter/decimation) relative to the sample rate.

First fix: Remove hidden delay first (reduce excessive filtering/decimation) and ensure anti-windup is active; only then adjust Ki.

Refs: H2-3, H2-4, H2-5, H2-6
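
The anti-windup part of the Q4 first fix can be sketched with conditional integration: the integrator is frozen while the output is saturated and the error would push it further into the limit. This is one of several anti-windup schemes; the function name `pi_step` and the state dictionary are assumptions for the example.

```python
def pi_step(state, sp, pv, kp, ki, dt, u_min, u_max, anti_windup=True):
    """One PI update with output clamping; returns (u, saturated)."""
    error = sp - pv
    u_unsat = kp * error + state["integrator"]
    u = min(max(u_unsat, u_min), u_max)
    saturated = u != u_unsat
    # Freeze integration only when saturated AND the error drives deeper into the limit.
    pushing = (u_unsat > u_max and error > 0) or (u_unsat < u_min and error < 0)
    if not (anti_windup and pushing):
        state["integrator"] += ki * error * dt
    return u, saturated


# Usage: a sustained disturbance holds the output at its upper clamp for 1000 ticks.
with_aw = {"integrator": 0.0}
without_aw = {"integrator": 0.0}
for _ in range(1000):
    u_aw, sat_aw = pi_step(with_aw, 10.0, 0.0, 1.0, 10.0, 0.01, 0.0, 5.0, True)
    u_no, sat_no = pi_step(without_aw, 10.0, 0.0, 1.0, 10.0, 0.01, 0.0, 5.0, False)
```

Both controllers output the same clamped value during the event, but the unprotected integrator has accumulated a large state that must unwind before the loop can recover, which shows up as slow recovery.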
Q5 Limit cycle at steady state—quantization, deadband, or derivative noise?

Conclusion: Steady-state limit cycles are usually quantization/nonlinearity effects; changing Kp often makes them worse.

  • Evidence check #1: Look at output_histogram and PV ripple: a few discrete output levels indicate quantization-dominated cycling.
  • Evidence check #2: Check deadband_active and D-term noise (or derivative filter state) for high-frequency chatter.

First fix: Band-limit or reduce D, add a small dither when quantization dominates, and avoid deadband unless explicitly required by the actuator.

Refs: H2-6, H2-8
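
The "band-limit D" fix in Q5 is commonly implemented as a finite difference followed by a first-order low-pass. The sketch below assumes that form; the class name `FilteredDerivative` and the filter time constant `tau_f` are illustrative choices.

```python
class FilteredDerivative:
    """Finite-difference derivative followed by a first-order low-pass (time constant tau_f)."""

    def __init__(self, tau_f, dt):
        self.dt = dt
        self.alpha = tau_f / (tau_f + dt)   # closer to 1.0 means heavier filtering
        self.prev_x = None
        self.d = 0.0

    def update(self, x):
        if self.prev_x is None:             # first sample: no difference available yet
            self.prev_x = x
            return self.d
        raw = (x - self.prev_x) / self.dt
        self.prev_x = x
        self.d = self.alpha * self.d + (1.0 - self.alpha) * raw
        return self.d
```

On a steady ramp the filtered value converges to the true slope, while sample-to-sample noise (such as quantization toggling between two codes) is strongly attenuated compared with the raw difference. The cost is added phase lag, so `tau_f` has to be budgeted like any other delay.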
Q6 Random spikes in control output—ADC reference instability or clock jitter?

Conclusion: Random spikes usually originate from the measurement/timebase path; treat them as evidence problems first, not tuning problems.

  • Evidence check #1: Correlate spikes with reference health (vref_mon or ref-good flag) and supply dip events.
  • Evidence check #2: Correlate spikes with jitter_pkpk on sampling or output update edges (logic analyzer or timer capture).

First fix: Freeze the control schedule (fixed-rate) and enable reference monitoring; isolate by disabling noncritical tasks to see whether jitter coupling disappears.

Refs: H2-3, H2-4, H2-10
MPN example (precision ADC): TI ADS131M04

Use a precision ADC with well-understood digital filter delay and stable reference strategy when spikes are measurement-path driven.

Q7 After brownout, output jumps briefly—hold-up policy or DAC reset state?

Conclusion: A post-brownout “output blip” is usually a missing safe clamp during the early-warning window or an unsafe DAC/PWM default state.

  • Evidence check #1: Verify brownout_warn_flag/brownout_reset_flag, supply_dip_min, and hold_up_ms_measured around the event.
  • Evidence check #2: Capture the first output state after reset (dac_reset_state or PWM default duty) and whether clamp_active asserted immediately.

First fix: Define and enforce a safe state: clamp/ramp-down on early warning, and hardware-default the output to a safe value during reset.

Refs: H2-5, H2-9
MPN example (supervisor): Microchip MCP1316

A proper supervisor can provide predictable reset thresholds and timing to prevent undefined output windows.

Q8 Watchdog resets but logs show nothing—logging not last-gasp safe or reset cause not captured?

Conclusion: If resets are real but logs are empty, either reset cause capture is late/non-atomic, or the log write path cannot survive the final milliseconds.

  • Evidence check #1: Ensure last_reset_reason/watchdog_cause is captured very early at boot and stored in a retained/backup register.
  • Evidence check #2: Verify last_gasp_commit_ok and event_seq continuity (missing increments imply failed atomic commits).

First fix: Implement a minimal last-gasp record (reset cause + timestamp/counter + a few key fields) with atomic commit and a detectable “write complete” marker.

Refs: H2-9, H2-10
MPN example (FRAM for last-gasp): Fujitsu MB85RS64V

FRAM enables fast, low-wear writes for tiny last-gasp records that must survive resets and brownouts.

Q9 Tuning changes don’t “stick”—parameter commit atomicity or version mismatch?

Conclusion: “Doesn’t stick” is usually a commit/rollback problem or a UI-to-runtime version mismatch, not “bad gains.”

  • Evidence check #1: Compare the runtime control task’s param_version against the HMI-displayed version (they must match after commit).
  • Evidence check #2: Inspect commit_state/rollback_state and the last commit error code under power-cycle and brownout tests.

First fix: Use a double-buffer parameter bank and boundary switch-over; make the HMI show “active version + activation timestamp” explicitly.

Refs: H2-10, H2-11
Q10 HMI shows correct PV, but actuator behaves wrong—output scaling/clamp or direction inversion?

Conclusion: If PV is correct but the actuator response is wrong, the fault is usually in the output path: scaling, polarity, clamps, or rate limits.

  • Evidence check #1: Check clamp_active/rate_limit_active and whether output is stuck near a bound despite controller effort.
  • Evidence check #2: Run the commissioning direction test: a small manual step should move PV in the expected direction; otherwise polarity/inversion is present.

First fix: Re-run the actuator direction check with safe bounds, then correct output scaling/polarity and revalidate clamps before retuning gains.

Refs: H2-5, H2-11
MPN example (precision DAC): TI DAC8562

A known, monotonic DAC with predictable reset behavior reduces “mystery” scaling and glitch issues in the output path.

Q11 Works until comms traffic increases—deadline miss or priority inversion?

Conclusion: Load-dependent instability is typically determinism loss (deadline misses) caused by IRQ storms, DMA contention, or priority inversion.

  • Evidence check #1: Track deadline_miss_count and max execution time of the control task under worst-case comms.
  • Evidence check #2: Watch max_isr_latency or comms IRQ occupancy; correlate with output jitter or PV ripple.

First fix: Make control fixed-rate and highest priority, throttle comms/UI, and schedule parameter updates only at safe boundaries.

Refs: H2-3, H2-10
MPN example (isolator for noisy comms wiring): ADI ADuM1401

Digital isolation helps reduce ground noise and transients that worsen under heavy comms activity in the field.

Q12 Stable at room temp, unstable when hot—gain drift, sensor bias, or rate limits?

Conclusion: Temperature-triggered instability is usually a drift or nonlinearity problem; it changes the effective plant or measurement scale, not just the gains.

  • Evidence check #1: Correlate temp with sensor bias estimate (offset) and any gain schedule state; look for monotonic drift.
  • Evidence check #2: Check whether rate_limit_active or sat_percent increases at high temperature (actuator capability drops).

First fix: Enable temperature-aware calibration/gain scheduling and revalidate clamps/rate limits at hot conditions; only then retune bandwidth.

Refs: H2-4, H2-5, H2-8
MPN example (RTC for audit timestamping): Micro Crystal RV-3028-C7

Timestamp changes and events to correlate drift/instability with temperature and operating history.