PdM Edge Node Design: IEPE Accelerometer AFE to Edge Features
A PdM edge node is only trustworthy when it can prove its measurements end-to-end—IEPE compliance and low-noise AFE feed synchronous sampling and explainable features, and every alarm is backed by node-side evidence logs that make field issues reproducible on the bench.
H2-1|PdM Edge Node Engineering Boundary & System Decomposition
This page focuses on the device-side condition-monitoring node: sensor → IEPE/AFE → synchronous ADC → edge features → uplink/log evidence. Cloud analytics, gateway aggregation, and network planning are out of scope.
What “PdM Edge Node” means (device-side only)
Two field profiles (design priorities differ)
- Industrial / always-powered (24 V / PoE / wired): prioritize robustness against ground shift, supply noise, and field transients; keep strong evidence logs for troubleshooting.
- Remote / duty-cycled (battery + short captures): prioritize repeatable wake-up timing, capture window discipline (pre/post), and configuration traceability across firmware updates.
Rule of thumb: Always-powered nodes fail “quietly” (noise/ground coupling). Duty-cycled nodes fail “silently” (missed events due to settling/trigger/capture window).
Three questions every node must answer (turn into measurable outputs)
- What to measure? Define frequency band, amplitude range, and event shape (steady trend vs shock/impulse).
- How to trust sampling? Define sampling rate & anti-alias margin, phase alignment targets, and minimum self-check evidence (bias/clip/drop counters).
- How to make uplink usable? Separate trend vs event payloads; attach sequence, configuration hash, and error counters for auditability.
Module responsibilities (each with “field evidence”)
- Sensing & mounting: mechanical coupling changes show up as band-dependent amplitude shifts; track mounting state as metadata when possible.
- IEPE excitation: constant-current stability and compliance headroom; field evidence: bias voltage, clip/saturation counters, cable drop checks.
- AFE conditioning: coupling corner, gain/DR, anti-alias; field evidence: sweep response and alias signatures.
- Synchronous sampling: no drop/overrun and known timebase; field evidence: sample-seq continuity and timestamp monotonicity.
- Edge features: explainable metrics (RMS/peak/crest/kurtosis/envelope/FFT bands); field evidence: repeatability under controlled stimulation.
- Uplink + evidence: payload structured for diagnosis; field evidence: seq gaps, retry stats, and config version alignment.
H2-2|Sensor & Measurement Targets: Define Band First, Then Build the Chain
The measurement band and event shape determine sampling rate, anti-alias strategy, noise floor, and dynamic range. Sensor type and mounting are chosen to satisfy that chain, not the other way around.
Start with a measurable target (band + amplitude + event shape)
- Low-frequency trend (structural vibration, imbalance): longer windows, stable RMS/peak trends, tolerance to minor phase variation.
- High-frequency shock/impulse (bearing/gear early fault): short windows, strong need for headroom (no clipping) and disciplined pre/post capture.
Common pitfall: “Looks fine in steady state” while short impulses clip or alias, making envelope/crest-based indicators unreliable.
Translate sensor specs into node-side consequences (with evidence hooks)
- Sensitivity: sets ADC code utilization; too high clips on shocks, too low buries early faults in noise.
- Noise density: defines feature jitter floor; verify via quiet-run spectrum and metric repeatability.
- Max g / overload: protects event integrity; verify by clip counters and waveform flat-top signatures.
- Mounting coupling: mechanically filters high-frequency content; verify by controlled repeat tests after remounting.
When IEPE is the practical choice (clear boundary, device-side reasons)
- Long cables / harsh noise: constant-current excitation + defined bias point supports stable operation and easier field diagnostics.
- Need for evidence: bias voltage and compliance headroom are measurable; clipping risk can be monitored and logged.
- High-frequency reliability: common PdM ecosystem for industrial vibration; broad sensor options with known behavior under shock.
Scope note: This section stays at the node interface level (excitation + signal integrity). Network architecture and cloud analytics remain out of scope.
Design mapping: band → sampling/AA → DR/noise → features
- Sampling rate: choose with anti-alias margin (filter roll-off + transition band), not only Nyquist.
- Anti-alias: align cutoff to the target band; avoid “alias spikes” that mimic fault signatures.
- Dynamic range: reserve headroom for impulses; avoid “feature distortion” from rare events.
- Feature set: trend (RMS/peak/FFT bands) vs shock (crest/kurtosis/envelope + short FFT windows).
H2-3|IEPE Constant-Current & the “Compliance Voltage” Window
An IEPE link behaves reliably only when (1) the excitation current is stable and (2) the compliance voltage still has headroom under the largest shock. If compliance collapses, the failure mode is not “slightly worse”—it becomes clipping, spectrum distortion, and missed impulsive events.
IEPE, simplified (what must be guaranteed)
Design must guarantee: excitation current in the intended range (application-dependent) and enough compliance voltage under max event.
- Excitation current affects noise robustness and cable drop (more current improves some noise margins but increases I × R drop).
- Compliance voltage must cover bias + signal swing + cable drop + protection drop + margin.
Compliance budget (turn “it works” into a measurable window)
V_needed ≈ V_bias + V_signal(peak) + V_cable_drop + V_protect_drop + Margin
- V_bias: sensor bias point after settling; measurable at the node input.
- V_signal(peak): peak swing during worst-case shock/impulse (the event that PdM cannot afford to lose).
- V_cable_drop: proportional to excitation current and cable/connector resistance; increases quickly with long runs or poor contacts.
- V_protect_drop: protection elements can add drop during large excursions; this directly steals headroom.
- Margin: required so the largest event remains linear (no flat-topping, no slow recovery).
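To make the window measurable at bring-up, the budget can be evaluated directly. A minimal C sketch, assuming illustrative rail, cable, and margin values (none taken from a specific datasheet):

```c
/* Compliance-budget check: V_needed = V_bias + V_signal(peak) + V_cable_drop
 * + V_protect_drop + Margin. All values below are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const double v_supply   = 24.0;   /* available compliance rail (assumption) */
    const double v_bias     = 11.0;   /* sensor bias point measured after settling */
    const double v_sig_peak = 5.0;    /* peak swing at the worst-case shock */
    const double i_exc      = 0.004;  /* 4 mA constant-current excitation */
    const double r_cable    = 6.0;    /* cable + connector resistance, ohms */
    const double v_protect  = 0.4;    /* protection-element drop at peak */
    const double margin     = 1.0;    /* linearity margin for the rare event */

    double v_cable  = i_exc * r_cable;  /* I x R drop grows with run length */
    double v_needed = v_bias + v_sig_peak + v_cable + v_protect + margin;
    double headroom = v_supply - v_needed;

    printf("V_needed = %.2f V, rail = %.2f V, headroom = %+.2f V\n",
           v_needed, v_supply, headroom);
    printf(headroom >= 0.0 ? "PASS\n" : "FAIL: compliance window too small\n");
    return 0;
}
```

Sweeping r_cable and v_sig_peak in this check is the cheap desk version of the compliance margin sweep described later in the validation chapter.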
Field symptoms → evidence (two common failure trees)
Tree 1 (compliance/clipping). Symptoms: flat-topped waveform, distorted impulse, “strange” harmonics, missed shocks, unstable envelope/crest metrics.
Evidence: waveform shows hard clipping at peaks; clipping/overrange counter increments; behavior improves with shorter cable, lower excitation current, or reduced gain.
Tree 2 (ripple/ground coupling). Symptoms: low-frequency lift, false peaks, envelope mis-triggers, trend drift without mechanical cause.
Evidence: elevated noise floor during quiet operation; artifacts correlate with supply changes; improvement after filtering/decoupling or cleaner excitation source.
Engineering check trio (a minimal, repeatable workflow)
- 1) Measure static bias voltage: confirm it settles within the expected window and remains stable across temperature and cable changes.
- 2) Stress the max-event swing: apply or capture the largest expected shock; verify no clipping and fast recovery.
- 3) Estimate cable drop: compute/measure cable resistance and connectors; verify I × R drop does not consume margin at the worst event.
Practical rule: PdM credibility is decided by the rare event. A chain that is linear for steady vibration but clips on impulses will produce confident-looking, wrong features.
H2-4|Low-Noise AFE Chain: HPF → Gain → Anti-Alias → Protect → Diff/Vcm → ADC
The path from IEPE output to the ADC is a sequence, not a parts list. Each stage exists to preserve useful mechanical information while preventing aliasing, clipping, and protection-induced distortion.
Canonical chain (device-side): IEPE input → AC coupling/HPF → gain (TIA/INA/PGA) → anti-alias LPF → input protection → diff driver/Vcm → ADC.
Key trade-offs (write as “decision + field evidence”)
- HPF corner: too high removes real low-frequency vibration (trend becomes artificially flat); too low allows drift/bias wander into metrics (trend drifts without mechanics).
- Anti-alias filtering: not “stronger is always better” — aggressive filters can add phase/group-delay behavior that harms impulse timing; insufficient filtering creates alias peaks that mimic faults.
- Input protection: protection parasitics can steal headroom and add distortion under large events; symptoms look like “hard bends” and slow recovery during shocks.
Stage-by-stage: what each block must prove
- AC coupling / HPF: proves low-frequency integrity; verify with low-band excitation and long-window stability checks.
- Gain (TIA/INA/PGA): proves noise floor vs headroom balance; verify that rare impulses never clip while quiet-run noise stays low.
- AA LPF: proves alias control; verify via sweep near Nyquist and by changing sample rate to see whether suspicious peaks “move” (alias signature).
- Protect: proves survivability without measurement damage; verify large-event behavior (no premature conduction causing flat-tops).
- Diff driver / Vcm: proves ADC compatibility; verify input common-mode and full-scale swing alignment under load.
Minimum validation set (fast acceptance tests)
- Sine (single tone): confirm linearity, no clipping, and reasonable harmonic content (gain + ADC range check).
- Sweep (band + near Nyquist): confirm amplitude/phase sanity and absence of alias peaks (AA + Fs check).
- Impact / impulse: confirm peak capture without flat-tops and fast recovery; confirm trigger and pre/post buffer produce stable features.
Acceptance goal: features remain stable across repeats, and any failure mode leaves a clear signature (clip counter, alias behavior, recovery time).
H2-5|Synchronous ADC & Sampling Plan: From Rules of Thumb to Computable KPIs
Sampling decisions must be defensible with device-side evidence. A practical plan converts sample rate, dynamic range, and sync error into KPIs that can be measured during bring-up and verified in the field.
Sample rate is not only Nyquist
- Anti-alias margin: leave a transition band between the highest useful band and Fs/2 so the AA filter can roll off without folding energy back as false peaks.
- Impulse time resolution: Δt = 1/Fs. If Δt is coarse, short shocks look “rounded,” peak/crest features drift, and triggers become inconsistent.
- Window discipline: feature windows are T = N/Fs. Trend windows and event windows must both remain meaningful and repeatable (a numeric sketch follows this list).
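A minimal C sketch of the three checks above, with illustrative sample rate, band edge, and window length:

```c
/* Sampling-plan sanity numbers: dt = 1/Fs, T = N/Fs, AA transition band.
 * Parameters are illustrative, not recommendations. */
#include <stdio.h>

int main(void) {
    const double fs     = 51200.0;  /* sample rate, Hz (assumption) */
    const double f_band = 10000.0;  /* highest useful band edge, Hz */
    const unsigned n    = 4096;     /* feature window length, samples */

    double dt      = 1.0 / fs;          /* impulse time resolution */
    double t_win   = (double)n / fs;    /* feature window duration T = N/Fs */
    double nyquist = fs / 2.0;
    double trans   = nyquist - f_band;  /* transition band left for the AA filter */

    printf("dt = %.1f us, T = %.1f ms\n", dt * 1e6, t_win * 1e3);
    printf("AA transition band = %.0f Hz (%.0f%% of the band edge)\n",
           trans, 100.0 * trans / f_band);
    return 0;
}
```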
Dynamic range: noise floor vs maximum shock (avoid “steady looks fine, shock clips”)
- Noise-floor KPI: quiet-run spectrum floor and short-term feature variance (repeatability) indicate whether the chain is dominated by electrical noise.
- Headroom KPI: maximum event must stay below full-scale with margin; clipping and slow recovery contaminate the post-event window and break envelope/crest metrics.
- Bring-up check: step through gain/scale options and record: noise floor, clip counter, and recovery time under large impulses.
Why synchronous ADC matters: channel skew becomes phase error
Phase error ≈ 2π · f · Δt_skew (higher frequency → more sensitive).
- Multi-point comparability: two-end bearing measurements and multi-axis sensing rely on stable cross-channel phase relationships.
- Skew KPI: measure channel-to-channel time alignment and stability across temperature and run time (device-internal).
- Evidence: apply the same input to multiple channels and check correlation lag / phase difference repeatability.
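A worked example of this sensitivity (numbers are illustrative):

```latex
\varphi \approx 2\pi f\,\Delta t_{\mathrm{skew}},\qquad
f = 5~\mathrm{kHz},\ \Delta t_{\mathrm{skew}} = 1~\mu\mathrm{s}
\;\Rightarrow\;
\varphi \approx 2\pi \cdot 5000 \cdot 10^{-6} \approx 0.031~\mathrm{rad} \approx 1.8^{\circ}
```

At 10 kHz the same 1 µs skew costs about 3.6°, which is why skew KPIs matter most in the high-frequency bands.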
Interface & sampling clock: practical conclusions + checks
- Clock jitter hurts high-frequency credibility: a “clean looking” low-frequency spectrum can hide high-frequency SNR loss if the sampling clock is noisy.
- Check 1 (HF sine): inject a near-top-band sine and observe SNR / noise floor; degradation that grows with frequency is a classic jitter signature.
- Check 2 (peak shape): as input frequency increases, watch for “fatter peaks” and rising skirts/noise floor around the tone.
Boundary reminder: this section covers device-internal ADC clock/reference integrity only (no network time algorithms).
H2-6|Node-Internal Timebase & Phase Consistency: From Jitter/Drift to Diagnostic Evidence
This section covers node-internal time only: XO/PLL/RTC behavior, warm-up stability, and evidence that separates clock drift from gain drift and mounting changes. Network time algorithms are out of scope.
What “timebase” means inside a PdM node
- XO/TCXO: sets the sampling cadence; temperature and aging shift frequency and phase behavior.
- PLL (optional): can multiply/clean clocks but adds its own lock/warm-up behavior that must be observable.
- RTC: anchors long-term scheduling and timestamps; must remain monotonic across sleep cycles and reboots.
- Warm-up stability: define a settle window after power-up before comparing “day-to-day” trend features.
When “same machine, different day” data is not comparable: split it into three causes
Cause 1 (timebase drift). Symptoms: peak frequency slowly shifts; phase relationships drift; event alignment slides over time.
Evidence: peak-position drift vs temperature/run-time; timestamp rate error; early-run features unstable until warm-up settles.
Cause 2 (gain/reference drift). Symptoms: amplitude changes while the frequency structure stays similar.
Evidence: quiet-run floor and calibration tone amplitude shift; RMS scales without matching peak-position drift.
Cause 3 (mounting/coupling change). Symptoms: high-band energy changes strongly; multi-axis ratios change after re-mounting.
Evidence: band-energy step changes after installation events; coherence changes across axes/points.
Actionable rule: log installation events as metadata; otherwise electrical drift and mechanical coupling changes become indistinguishable.
Multi-point phase relationships drift: three device-side chains to check
- Sampling alignment chain: channel skew changes with temperature/supply → phase error grows at higher frequency.
- Trigger alignment chain: trigger latency/debounce varies → t0 alignment drifts inside pre/post buffers.
- Path/connection chain: cable/connector changes create delay steps and amplitude changes → correlation lag steps.
Evidence chain table (device-side only)
- Waveform phase / correlation: use correlation lag and phase difference repeatability to detect skew and trigger drift.
- Spectral peak stability: track peak position and peak width; frequency drift and “peak fattening” point to timebase issues.
- Timestamp continuity: enforce monotonic timestamps and sequence continuity; gaps are direct proof of capture/logging faults.
- Warm-up/temperature tags: correlate metrics with temperature and time-since-boot to separate warm-up effects from real machine changes.
Explicit boundary: do not attribute drift to network sync; this section is strictly internal (XO/PLL/RTC + timestamps).
H2-7|Edge Feature Extraction: Build Explainable Features Before “AI”
A PdM edge node earns trust with explainable, repeatable features and device-side evidence. Start with deterministic metrics (time/frequency/event logic), then add learning later if needed.
Quality gate first: do not compute features on untrustworthy samples
Gate every capture as OK / Degraded / Bad before computing features, based on three integrity classes:
- Signal integrity: clip/overrange, saturation count, recovery time indicators.
- Capture integrity: sample drops, FIFO overflow, timestamp/sequence gaps.
- Context integrity: gain/range state, temperature and supply summary for the record.
Rule: if quality is Bad, emit evidence (counters + snippet) but suppress “confident” conclusions.
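A minimal gate sketch in C; counter names and the zero-tolerance thresholds are assumptions, not a fixed policy:

```c
/* Quality gate sketch: capture integrity overrides signal integrity.
 * Field names and thresholds are illustrative. */
typedef enum { QUALITY_OK, QUALITY_DEGRADED, QUALITY_BAD } quality_t;

typedef struct {
    unsigned clip_count;     /* overrange/saturation events in this capture */
    unsigned drop_count;     /* sample drops / FIFO overflows */
    unsigned seq_gap_count;  /* timestamp or sequence discontinuities */
} integrity_counters_t;

quality_t quality_gate(const integrity_counters_t *c) {
    if (c->drop_count > 0 || c->seq_gap_count > 0)
        return QUALITY_BAD;       /* capture broken: emit evidence, no conclusions */
    if (c->clip_count > 0)
        return QUALITY_DEGRADED;  /* features computed but flagged */
    return QUALITY_OK;
}
```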
Explainable time-domain features: what each one is good at
- RMS: stable energy trend for steady vibration; robust for long-run monitoring.
- Peak: highlights shocks and impacts; sensitive to clipping and bandwidth limits.
- Crest Factor (Peak/RMS): amplifies rare impulses riding on a steady baseline; collapses if peaks clip.
- Kurtosis: emphasizes sparse spikes; requires consistent windowing and installation metadata to stay comparable.
Common failure mode: clipping can make crest/kurtosis look “healthier” by flattening real spikes.
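A minimal C sketch of the four metrics over one capture window (population moments; naming is illustrative):

```c
/* Time-domain features for one window: RMS, peak, crest factor, kurtosis. */
#include <math.h>
#include <stddef.h>

typedef struct { double rms, peak, crest, kurtosis; } td_features_t;

td_features_t td_features(const float *x, size_t n) {
    double sum = 0.0, sum2 = 0.0, peak = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i];
        sum  += v;
        sum2 += v * v;
        double a = fabs(v);
        if (a > peak) peak = a;
    }
    double mean = sum / (double)n;
    double var  = sum2 / (double)n - mean * mean;  /* population variance */

    double m4 = 0.0;                               /* fourth central moment */
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        m4 += d * d * d * d;
    }
    m4 /= (double)n;

    td_features_t f;
    f.rms      = sqrt(sum2 / (double)n);
    f.peak     = peak;
    f.crest    = (f.rms > 0.0) ? peak / f.rms : 0.0;
    f.kurtosis = (var > 0.0) ? m4 / (var * var) : 0.0;  /* ~3 for Gaussian input */
    return f;
}
```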
Frequency-domain features: node-side implementation mindset
- FFT band energy: map spectrum into a few fixed bands (low/mid/high or application bands) and trend band deltas.
- Peak tracking: track top peaks (frequency + amplitude) plus peak stability (drift/width) as credibility evidence.
- Envelope / sideband (minimal): band-limit → rectification or simple envelope → low-rate FFT/bands; keep compute bounded.
Credibility hint: suspicious peaks that shift when Fs changes are often alias artifacts (device-side check).
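A minimal band-mapping sketch in C, assuming a precomputed real-FFT power spectrum; the band edges are placeholders for application bands:

```c
/* Sum power-spectrum bins into fixed bands. Assumes a real FFT of length
 * n_fft at sample rate fs, mag2[k] = |X[k]|^2 for k = 0 .. n_bins-1. */
#include <stddef.h>

#define N_BANDS 3

void band_energy(const float *mag2, size_t n_bins, double fs, size_t n_fft,
                 double out[N_BANDS]) {
    const double edges[N_BANDS + 1] = { 10.0, 1000.0, 5000.0, 10000.0 }; /* Hz */
    double df = fs / (double)n_fft;  /* frequency bin width */

    for (int b = 0; b < N_BANDS; b++) out[b] = 0.0;
    for (size_t k = 0; k < n_bins; k++) {
        double f = (double)k * df;
        for (int b = 0; b < N_BANDS; b++) {
            if (f >= edges[b] && f < edges[b + 1]) { out[b] += mag2[k]; break; }
        }
    }
}
```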
Event triggering: threshold + sliding window + debounce (with pre/post capture)
State machine: Idle → Candidate → Confirmed → Cooldown.
- Sliding window: short window for detection, longer window for confirmation to suppress noise spikes.
- Debounce: separate enter/exit thresholds or minimum duration to prevent “chatter.”
- Pre/Post capture: store a short snippet around t0 so the alert can be audited later.
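A minimal C sketch of the Idle → Candidate → Confirmed → Cooldown logic with separate enter/exit thresholds; thresholds and durations are placeholders:

```c
/* Trigger state machine: confirmation window plus hysteresis debounce. */
typedef enum { TRIG_IDLE, TRIG_CANDIDATE, TRIG_CONFIRMED, TRIG_COOLDOWN } trig_state_t;

typedef struct {
    trig_state_t state;
    unsigned     count;  /* samples spent in the current state */
} trigger_t;

#define TH_ENTER   0.50f  /* enter threshold (fraction of full scale) */
#define TH_EXIT    0.30f  /* lower exit threshold prevents chatter */
#define N_CONFIRM  8      /* detection-window hits needed to confirm */
#define N_COOLDOWN 4096   /* samples before re-arming */

/* Feed one short-window detection metric (e.g., short-window RMS) per call. */
trig_state_t trigger_step(trigger_t *t, float metric) {
    switch (t->state) {
    case TRIG_IDLE:
        if (metric > TH_ENTER) { t->state = TRIG_CANDIDATE; t->count = 1; }
        break;
    case TRIG_CANDIDATE:
        if (metric > TH_ENTER) {
            if (++t->count >= N_CONFIRM) { t->state = TRIG_CONFIRMED; t->count = 0; }
        } else if (metric < TH_EXIT) {
            t->state = TRIG_IDLE;  /* noise spike rejected by debounce */
        }                           /* between thresholds: hold and wait */
        break;
    case TRIG_CONFIRMED:
        /* Caller freezes the pre/post snippet around t0 here, then cool down. */
        t->state = TRIG_COOLDOWN; t->count = 0;
        break;
    case TRIG_COOLDOWN:
        if (++t->count >= N_COOLDOWN) t->state = TRIG_IDLE;
        break;
    }
    return t->state;
}
```

The separate, lower exit threshold is the debounce: a metric hovering near a single threshold cannot chatter the state machine.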
Output recommendation: feature + confidence + minimal raw evidence
- Features: fixed fields (time + frequency) to keep results comparable across firmware versions.
- Confidence: derived from quality-gate status + repeatability checks (not an opaque probability).
- Evidence snippet: small raw segment (pre/post) for forensic review and model improvement later.
- Context: timestamp, config version, gain/range, temperature/supply summary, counter snapshot.
Goal: the same parameter set remains stable across temperature, supply variation, and installation changes (tracked as metadata).
H2-8|Local Data & Logs: Every Alert Must Answer “Why Trust This?”
Alerts become actionable only when the edge node can present evidence, context, and health. The core design goal is auditability under real field constraints (offline periods, reboots, transient faults).
Two-layer local storage: short raw snippets + long feature trends
- Ring buffer (snippets): store short raw waveform segments around events (pre/post) for forensic review.
- Trend store: store low-rate feature trends for long-term baselining and drift detection.
- Link key: join by event_id, timestamp, and a sequence index so nothing is ambiguous.
Benefit: an alert can be re-checked even when backhaul is unreliable or the device reboots.
Minimum required event record fields (standardized, compact, reproducible)
- Time & ordering: monotonic timestamp + continuous sequence number (gap is direct evidence of capture/logging faults).
- Config identity: config version or hash, plus sampling/feature parameter set identifier.
- Measurement state: gain/range state, full-scale mapping, temperature and supply summary.
- Integrity snapshot: clip count, drop/overflow counters, timestamp gap count, sync/skew status (if available).
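One way to freeze these fields is a fixed-layout record; the struct below is an illustrative layout, not a defined format:

```c
/* Minimum event-record sketch; field names and widths are assumptions. */
#include <stdint.h>

typedef struct {
    /* time & ordering */
    uint64_t timestamp_us;  /* monotonic timestamp */
    uint32_t seq;           /* continuous sequence number */
    /* config identity */
    uint32_t config_hash;   /* firmware/parameter-set hash */
    uint16_t param_set_id;
    /* measurement state */
    uint8_t  gain_state;
    uint8_t  range_state;
    int16_t  temp_c_x10;    /* temperature, 0.1 degC steps */
    uint16_t supply_mv;
    /* integrity snapshot */
    uint16_t clip_count;
    uint16_t drop_count;
    uint16_t ts_gap_count;
    uint8_t  quality_flag;  /* OK / Degraded / Bad */
    uint8_t  sync_status;   /* skew/sync status, if available */
} event_record_t;
```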
The three most useful log families in the field
- Sensor-chain health: bias/compliance indicators, saturation counters, sensor disconnect hints.
- Sampling-chain health: drop/overflow counters, clock anomaly hints, alignment/skew stability flags.
- Reporting-chain health: retry/latency summaries, sequence breaks, payload integrity checks (device-side).
Evidence Pack: make every alert “question-proof”
- Conclusion: severity + short feature-change summary (what changed, how much).
- Evidence: raw snippet reference + feature vector + confidence derived from quality gate.
- Context: config version, gain/range, temperature/supply summary, install metadata tag if available.
- Integrity: timestamp/sequence continuity + counter snapshot (clip/drop/overflow/gaps).
Non-negotiable: without evidence + integrity, an alert is not trustworthy, regardless of back-end analytics.
H2-9|LoRa / Ethernet Uplink: Bandwidth + Reliability + Diagnosability (Node-Side)
Node-side uplink is not “send more data.” It is a disciplined contract: keep bandwidth predictable, make failures observable, and preserve evidence for later verification.
Two uplink modes: Trend vs Event (with built-in rate limits)
- Periodic trend uplink: low bandwidth and highly robust. Send compact feature summaries and deltas.
- Event uplink: alert + evidence. Send short pre/post snippets only under a strict quota.
- Degrade ladder: event+snippet → event-only → trend-only when quota/queue pressure rises.
Principle: when the field gets noisy (bursts of events), the node must stay alive and keep auditable records locally.
Payload schema: optimize for audit and reconciliation
Minimum fields: seq, timestamp, config_hash, quality_flag, counter_snapshot.
- seq (continuity): exposes loss, duplication, or reordering; makes “what was missed” measurable.
- timestamp: aligns trend and event timelines; enables time-window audits.
- config_hash: binds the measurement to firmware and parameter set (repeatability).
- quality_flag + reason codes: indicates whether features should be trusted.
- counter snapshot: clip/drop/overflow/gap/retry counts to explain anomalies.
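A header sketch in C covering these fields; the layout and widths are assumptions, not a wire protocol:

```c
/* Uplink payload header sketch for trend and event modes. */
#include <stdint.h>

typedef enum { PAYLOAD_TREND = 0, PAYLOAD_EVENT = 1 } payload_kind_t;

typedef struct {
    uint8_t  kind;           /* PAYLOAD_TREND or PAYLOAD_EVENT */
    uint32_t seq;            /* exposes loss, duplication, reordering */
    uint64_t timestamp_us;   /* aligns trend and event timelines */
    uint32_t config_hash;    /* binds data to firmware + parameter set */
    uint8_t  quality_flag;   /* OK / Degraded / Bad */
    uint8_t  reason_code;    /* why quality is not OK */
    /* compact counter snapshot */
    uint16_t clip_count;
    uint16_t drop_count;
    uint16_t gap_count;
    uint16_t retry_count;
} uplink_header_t;
```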
Reliability as diagnosable behavior (not a vague promise)
- Queue observability: queue depth, overflow count, last-success timestamp.
- Retry observability: retry count, backoff state, consecutive-fail counter.
- State code: emit a compact uplink state (Up / Degraded / Down) with reason codes.
Outcome: a failed uplink is still informative if it produces consistent state + counters.
Ethernet-side evidence (strictly node-side)
- Link evidence: link up/down, link-flap count, PHY error counters (when available).
- Power coupling evidence: reboots/resets correlated with high-current uplink activity (e.g., PoE rail droop).
- Local timing: last-success timestamp + sequence continuity (no TSN/PTP algorithm discussion).
H2-10|Power & Robustness: When “Waveforms Look Fine” but Features Jitter
Feature jitter often comes from power and reference instability. The waveform may look acceptable, yet the feature pipeline is sensitive to noise-floor shifts, reference drift, and ground/common-mode injection.
Three coupling paths that create “pseudo vibration”
- Current-source ripple: modulates IEPE bias/output, raising low-frequency floor and triggering false events.
- AFE reference / common-mode drift: appears as gain/bias drift, making day-to-day data incomparable.
- ADC Vref jitter / ground bounce: broadens peaks and lifts high-frequency noise floor, destabilizing band energy.
Key idea: “normal looking” time traces can still produce unstable crest/kurtosis/band energy because those metrics amplify subtle shifts.
Why features are more fragile than visual waveform checks
- Crest / kurtosis: highly sensitive to clipping, tiny spikes, and window inconsistency.
- Band energy / peak tracking: sensitive to noise-floor shifts and reference jitter (peaks widen or drift).
- Event state machine: sensitive to low-frequency drift; debounce can fail if the baseline moves.
Ground & shielding in long IEPE cabling: node-side symptoms only
- Ground loop / common-mode swing: injects mains components and slow baseline movement.
- Observable symptoms: low-frequency lift, 50/60 Hz + harmonics, unstable triggers, phase inconsistency across channels.
- Node-side evidence: rising quality flags, clip counts, baseline drift indicators, and repeatability failure across sessions.
Boundary: focus on symptoms and evidence at the node; avoid turning this into a general EMC guide.
Three must-measure points in the field (and what instability looks like)
The three probes: current-source ripple, AFE reference/common-mode, ADC Vref/GND.
- Probe 1 — current-source ripple: ripple correlates with low-frequency lift and false triggers.
- Probe 2 — AFE ref/common-mode: drift correlates with gain-equivalent drift and day-to-day mismatch.
- Probe 3 — ADC Vref/ground bounce: activity-dependent noise correlates with HF band jitter and peak broadening.
Minimal mitigations (node essentials only)
- Partition & timing: reduce coupling by separating analog/reference rails and avoiding heavy uplink bursts during capture windows.
- Reference stability: prioritize low-impedance reference return and stable common-mode biasing for the ADC driver.
- Verify by evidence: expect counter/quality improvements and reduced feature jitter under the same parameter set.
H2-11|Validation & Debug: A Bench-to-Field Closed Loop
This chapter turns IEPE/AFE/ADC/feature/logging knowledge into a repeatable engineering loop. Every test produces a consistent evidence package (bias/compliance, clip/recovery, sampling health, feature stability, seq continuity), so field failures can be reproduced on the bench and fixed with confidence.
1) What “done” looks like (node-side definition)
- Inputs are controlled: sine/sweep/impact excitation can be repeated with the same mounting and the same limits.
- Evidence is explicit: bias/compliance margin, clip counters, anti-alias indicators, drop/overflow flags, and timebase/sequence continuity are always logged.
- Features are repeatable: the same stimulus produces tightly bounded RMS/Peak/Band-energy trends across temperature and supply variation.
- Field triage is deterministic: sensor-chain health → sampling health → uplink continuity, in that order.
2) The closed-loop recipe (copyable method)
- Step A — Bench input Inject repeatable stimuli (electrical sine/sweep, mechanical shaker, impact hammer) and keep mounting consistent.
- Step B — Node evidence Record bias/compliance margin, saturation/clip & recovery, sampling drop/overflow, quality flags, and config hash.
- Step C — Feature repeatability Repeat N times and compare spread (mean + variability) for RMS/Peak/Crest/Kurtosis and band energies.
- Step D — Field triage + regression Use the evidence priority to localize root cause, then convert the field scenario into a bench regression case.
Figure H2-11 Closed-loop: Bench Inputs → Node Evidence → Repeatability → Field Triage → Regression Case
3) Bench setup (three injection modes + a golden reference)
The bench should support three repeatable stimulus types and one comparison reference. The goal is not “pretty waveforms”; the goal is stable evidence counters and repeatable features under controlled inputs.
- Electrical sine/sweep injection (front-end verification): validates gain/phase, filter corners, anti-alias behavior, and clipping margin without mechanical uncertainty.
- Shaker / handheld calibrator (steady mechanical excitation): validates band energy tracking and amplitude stability with controlled frequency and acceleration.
- Impact hammer (impulse events): validates trigger, pre/post capture, and saturation recovery behavior.
- Golden reference channel: compare against a known-good IEPE accelerometer path to detect mounting or sensor-chain anomalies early.
4) Test checklist with pass/fail evidence (make each item measurable)
(a) Compliance margin sweep — vary cable length, temperature, and supply voltage. The evidence target is not only “no clipping”, but also stable bias/compliance margin and consistent event capture.
- Record: bias voltage, compliance headroom, clip counters, quality flags, and a short pre/post snippet reference.
- Fail pattern: waveform flattening, spectrum anomalies, missing impacts, and unstable trigger behavior.
(b) Saturation & recovery time — inject an over-range impulse and measure how fast the chain returns to valid measurements.
- Record: clip onset timestamp, recovery time, post-event baseline settling, and any feature “aftershocks”.
- Fail pattern: slow recovery makes the next real event look “smaller” or “noisier”, creating false negatives/positives.
(c) Anti-alias foldback check — sweep beyond Nyquist and verify that out-of-band energy does not reappear as in-band peaks.
- Record: in-band/out-of-band energy ratio, peak location drift, and band-energy stability at the stopband edge.
- Fail pattern: “ghost peaks” or band-energy inflation that looks like a bearing defect but is pure foldback.
(d) Feature repeatability — repeat the same input N times. Pass criteria must be numerical (spread bound), not visual.
- Record: mean + variability for RMS/Peak/Crest/Kurtosis and band energies; keep the config hash constant.
- Fail pattern: waveforms look similar but features drift → often caused by supply/ground/reference instability.
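A minimal pass/fail sketch in C; the relative-spread bound is an illustrative acceptance value, not a standard:

```c
/* Repeatability check: relative standard deviation of one feature across N runs. */
#include <math.h>
#include <stdio.h>
#include <stddef.h>

int spread_pass(const double *runs, size_t n, double rel_bound) {
    double sum = 0.0, sum2 = 0.0;
    for (size_t i = 0; i < n; i++) { sum += runs[i]; sum2 += runs[i] * runs[i]; }
    double mean = sum / (double)n;
    double var  = sum2 / (double)n - mean * mean;  /* population variance */
    double rel  = (mean != 0.0) ? sqrt(var) / fabs(mean) : INFINITY;

    printf("mean=%.4g rel_std=%.2f%% bound=%.2f%% -> %s\n",
           mean, 100.0 * rel, 100.0 * rel_bound, rel <= rel_bound ? "PASS" : "FAIL");
    return rel <= rel_bound;
}
```

Run it once per feature (RMS, peak, crest, kurtosis, each band energy) with the config hash held constant, e.g. `spread_pass(rms_runs, 10, 0.05)` for a 5% bound.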
5) The “minimum record schema” (so every result is regression-ready)
Every run should emit a consistent, parseable record. This is the difference between “a lab demo” and “an engineering product”.
- Identity: test_id, timestamp, node_id, firmware/config hash, parameter set ID.
- Input conditions: stimulus type, frequency/amplitude (or impact count), mounting method, cable length, temperature, supply voltage.
- Acquisition conditions: Fs, window length, filter profile ID, trigger thresholds/debounce, pre/post buffer lengths.
- Evidence: bias/compliance headroom, clip counters, drop/overflow flags, timebase status, uplink seq continuity, snippet references.
6) Field triage priority (deterministic, node-side)
Field debugging should follow evidence priority. This avoids “feature chasing” when the sensor chain is already compromised.
- Sensor-chain first: bias/compliance margin + clip/saturation counters. If these are unhealthy, features are not trustworthy.
- Sampling next: drop/overflow/timebase anomalies. If the capture is unhealthy, repeatability will collapse.
- Uplink last: seq/timestamp gaps. If continuity breaks, the field history becomes incomplete and alarms lose context.
7) Regression rule (turn every field failure into a bench replay)
- Capture the trigger envelope: supply/temperature/cable/mounting conditions + the exact config hash that produced the failure.
- Define expected evidence: which counters rise, which quality flags trip, what the snippet should look like.
- Fix target: after the fix, the same bench replay must keep evidence healthy and features within the repeatability bound.
8) Example reference BOM (specific part numbers for a practical validation setup)
The following part numbers are commonly used to build a practical PdM node validation bench and a “golden reference” chain. They are examples to make the checklist actionable; final selection should match the target frequency band and mounting constraints.
| Category | Example P/N / Model | Why it helps in H2-11 |
|---|---|---|
| Golden reference sensor | PCB Piezotronics 352C33 | IEPE accelerometer reference channel for bias/compliance sanity checks and repeatability comparisons. |
| Handheld calibration shaker | PCB Piezotronics 394C06 | Repeatable steady excitation to validate amplitude/band-energy tracking and feature spread across repeated runs. |
| Impact / impulse source | PCB Piezotronics 086C03 | Controlled impulse events to validate trigger, pre/post capture, and saturation recovery behavior. |
| Portable ICP/IEPE conditioner | PCB Piezotronics 480C02 | Battery-powered signal conditioning for quick field/bench checks of IEPE turn-on, bias, and basic signal integrity. |
| IEPE data acquisition | NI 9234 | Simultaneous-sampling dynamic acquisition module widely used with IEPE sensors—useful as a bench reference capture path. |
| Sync ADC options | ADI AD7768-1, TI ADS131M04 | Representative sigma-delta ADCs often chosen for simultaneous sampling / precision capture; useful as “known-good” comparator designs. |
| LoRa radio (node-side) | Semtech SX1262 | Representative LoRa transceiver for building a deterministic seq/timestamp continuity test (periodic vs event payload). |
| Ethernet (node-side) | WIZnet W5500, Microchip LAN8720A | Representative embedded Ethernet controller + PHY choices to implement link-up evidence, drop counters, and payload continuity tests. |
| IEPE constant-current building block | LT3092 | Programmable current-source IC option used as a building block for IEPE excitation experiments (compliance margin sweeps). |
| Edge DSP MCU example | ST STM32H743 | Representative high-performance MCU family for feature extraction + logging + evidence counters under repeated test workloads. |
H2-12|FAQs (Node-side, evidence-first)
Each answer follows an evidence-first flow: symptoms → two measurement points → quick branching → node-side action → deep-dive chapter. No gateway/cloud/network-planning content.
1) IEPE compliance voltage is insufficient: what are the most common waveform/spectrum symptoms, and which two points should be measured first?
Insufficient compliance most often shows as flat-topped peaks, asymmetric clipping, and “clean” tones turning into odd harmonics or sudden broadband rise near impacts. Measure (1) IEPE bias at the node input and (2) headroom to the clipping limit at the largest expected swing. If headroom collapses under load or long cable, treat it as a compliance budget issue first.
- Branching: bias stable but peaks clip → compliance headroom too small; bias moves with supply/radio activity → ripple/ground coupling dominates.
- Node-side action: lower gain before clipping, reduce series drop, confirm protection-device drop under peak conditions.
2) After extending the cable, alarms increase: is it the sensor/cable or constant-current ripple, and how to tell?
Longer cables increase both DC drop (reducing compliance headroom) and susceptibility to common-mode pickup. First check whether bias/headroom degrades monotonically with cable length—that points to compliance loss. Then correlate alarms with current-source ripple or radio/PoE switching events; if alarms cluster with ripple bursts, ripple/ground coupling is the driver.
- Branching: headroom shrinks with length → cable drop/compliance; headroom OK but features jump with supply noise → ripple/EMI coupling.
- Node-side action: add ripple measurement in logs, enforce quality flags when bias/headroom unstable.
3) Low-frequency vibration exists, but RMS/trend barely changes: is the high-pass corner mis-set?
Yes—an overly high AC-coupling corner is a classic reason low-frequency energy disappears from RMS. Confirm by comparing band energy below the corner versus above the corner (or run a slow sweep). If RMS is insensitive while peaks still appear, the chain is filtering out the low-frequency content before feature extraction. The fix is a corner frequency aligned to the true mechanical band, not a generic “noise cleanup” value.
- Branching: low-band energy collapses → high-pass too high; low-band energy exists but RMS unchanged → windowing/feature definition mismatch.
- Node-side action: log the configured corner ID/hash; gate trend validity if corner changes.
4) Impacts are often “missed”: change sampling rate first, or change trigger + buffer first?
Start with trigger and buffer. Many missed impacts are not bandwidth-limited; they are logic-limited: threshold too high, debounce too aggressive, or pre/post windows too short. Validate by logging trigger decisions (armed/triggered/rejected reason) and keeping a short rolling snippet buffer. Only raise sampling rate if the captured waveform lacks the expected rise time or high-frequency content after trigger settings are correct.
- Branching: no trigger events logged → trigger tuning; trigger logged but waveform looks smeared → sampling/filter bandwidth.
- Node-side action: add pre/post capture IDs; include a “trigger reject reason” counter.
5) Steady-state is small, but rare peaks saturate the ADC: how to choose range/gain more robustly?
Treat peaks as a separate design constraint: use p99/p999 peak statistics (not average) and enforce a minimum headroom target. If the ADC clips even rarely, early bearing features become untrustworthy because crest/kurtosis inflate and recovery distorts the next window. Stabilize by lowering analog gain, adding a “high-peak mode” profile, and logging clip counters + recovery time. Example ADC classes: AD7768-1 / ADS131M04 (simultaneous sampling) as typical references.
- Branching: clip counters spike during impacts → gain/range mismatch; clip without impacts → compliance/ripple coupling.
- Node-side action: choose deterministic gain profiles; include gain/range state in every event record.
6) Multi-channel phase relation drifts: check sampling sync first or mounting coupling first, and how to order evidence?
Check sampling synchronization first using a bench same-source test (electrical injection or controlled shaker) to eliminate mounting uncertainty. If phase is stable on the bench but drifts in the field, the likely driver is mounting stiffness/placement or cable-induced coupling differences. Log channel-to-channel time skew, timebase status, and temperature so drift can be separated into timing versus mechanical coupling evidence.
- Branching: drift on bench → sync/timebase; drift only in field → mounting/cable coupling.
- Node-side action: keep a phase sanity metric (cross-correlation peak shift) per capture.
7) Crest factor vs Kurtosis for early bearing faults: which is better, and why do false positives happen?
Kurtosis is highly sensitive to rare impulsive spikes, making it useful for early defect impacts—but also vulnerable to false positives from clipping, ESD bursts, or switching noise. Crest factor depends on the peak-to-RMS ratio, so it can also inflate when RMS is suppressed by filtering or window choices. Reduce false alarms by validating clip counters, enforcing quality flags, and keeping a short snippet for post-hoc confirmation.
- Branching: kurtosis rises with clip counters → saturation artifact; crest rises with corner/filter changes → configuration artifact.
- Node-side action: pair “feature value” with “confidence/quality” computed from evidence counters.
8) Envelope/band features are unstable on-node: is it window/bandwidth or the noise floor?
Separate algorithm settings from analog noise by a two-step check: keep window/band parameters fixed (log the config hash), then observe whether spectral floor and peak width change with supply or radio activity. If the floor rises and peaks broaden, the cause is often analog/reference noise. If stability changes mostly with window length or band edges, the issue is parameterization (windowing/filters), not physics.
- Branching: floor correlates with supply/ripple → noise floor; feature changes only when window changes → parameterization.
- Node-side action: freeze parameters per deployment; log both “band definition” and “effective noise floor” metric.
9) LoRa bandwidth is small: how to layer trend vs event reporting without losing critical evidence?
Use a two-layer payload: trend packets at low duty cycle (band energies, RMS, temperature, health counters) and event packets with rate limiting (alarm type, top features, evidence snapshot, and a short snippet reference ID). Keep raw snippets in a local ring buffer and report only indexes/hashes over LoRa. Example node radios often use SX1262-class transceivers; the layering logic remains node-side.
- Branching: trend stable but events missing → event throttling/buffer sizing; both unstable → sensor/power evidence first.
- Node-side action: enforce event budget per hour/day; include a “dropped-event due to budget” counter.
10) Uplink occasionally drops packets: how to design sequence/retry/reconciliation fields so data remains traceable?
Make loss visible and reconstructable using a minimal field set: seq (monotonic), timestamp, config_hash, last_success_seq, and a compact health counter snapshot. With these, a receiver can detect gaps, confirm device configuration at the time of the event, and correlate anomalies to evidence counters. Ethernet nodes can apply the same idea (e.g., W5500/LAN8720A-class links) without discussing network planning.
- Branching: seq gaps with healthy sampling counters → link issue; seq continuous but features wrong → node signal chain issue.
- Node-side action: store-and-forward critical events locally; retry with bounded policy and log retry counts.
11) After temperature changes, all features drift: sensor sensitivity drift or AFE/reference drift, and how to verify?
Use a controlled comparison to isolate drift: first, run a bench repeatability check with a known excitation and a golden reference channel; if drift persists under controlled input, it is likely AFE/reference/timebase. Then check whether drift correlates with ADC Vref/common-mode/bias stability and supply noise. If drift appears only with different mounting or cable routing, treat it as mechanical coupling or installation variability.
- Branching: drift on bench → analog/reference/timebase; drift only in field → mounting/cable/ground coupling.
- Node-side action: log temperature + evidence counters; flag “trend invalid” when reference stability degrades.
12) Measurement “looks normal” but alarms bounce: which three interference classes should be checked first?
Prioritize node-side interference that creates “fake vibration” signatures: (1) constant-current ripple or supply switching noise coupling into the IEPE chain, (2) AFE reference/common-mode movement that warps gain/baseline, and (3) ADC Vref/GND bounce that lifts the noise floor and destabilizes band features. Confirm by correlating alarms with ripple metrics, bias/common-mode drift, and clip/overflow counters before chasing feature tuning.
- Branching: alarms align with supply/radio/PoE events → power coupling; alarms align with clip/recovery → range/compliance; alarms align with floor rise → reference/GND noise.
- Node-side action: add three “must-log” points: current-source ripple, AFE ref/common-mode, ADC Vref/GND noise proxy.
Figure H2-12 Evidence-first triage flow (symptom → measure → branch → action → deep dive)