PdM Edge Node Design: IEPE Accelerometer AFE to Edge Features
A PdM edge node is only trustworthy when it can prove its measurements end-to-end—IEPE compliance and low-noise AFE feed synchronous sampling and explainable features, and every alarm is backed by node-side evidence logs that make field issues reproducible on the bench.
H2-1|PdM Edge Node Engineering Boundary & System Decomposition
This page focuses on the device-side condition-monitoring node: sensor → IEPE/AFE → synchronous ADC → edge features → uplink/log evidence. Cloud analytics, gateway aggregation, and network planning are out of scope.
What “PdM Edge Node” means (device-side only)
Two field profiles (design priorities differ)
- Industrial / always-powered (24 V / PoE / wired): prioritize robustness against ground shift, supply noise, and field transients; keep strong evidence logs for troubleshooting.
- Remote / duty-cycled (battery + short captures): prioritize repeatable wake-up timing, capture window discipline (pre/post), and configuration traceability across firmware updates.
Rule of thumb: Always-powered nodes fail “quietly” (noise/ground coupling). Duty-cycled nodes fail “silently” (missed events due to settling/trigger/capture window).
Three questions every node must answer (turn into measurable outputs)
- What to measure? Define frequency band, amplitude range, and event shape (steady trend vs shock/impulse).
- How to trust sampling? Define sampling rate & anti-alias margin, phase alignment targets, and minimum self-check evidence (bias/clip/drop counters).
- How to make uplink usable? Separate trend vs event payloads; attach sequence, configuration hash, and error counters for auditability.
Module responsibilities (each with “field evidence”)
- Sensing & mounting: mechanical coupling changes show up as band-dependent amplitude shifts; track mounting state as metadata when possible.
- IEPE excitation: constant-current stability and compliance headroom; field evidence: bias voltage, clip/saturation counters, cable drop checks.
- AFE conditioning: coupling corner, gain/DR, anti-alias; field evidence: sweep response and alias signatures.
- Synchronous sampling: no drop/overrun and known timebase; field evidence: sample-seq continuity and timestamp monotonicity.
- Edge features: explainable metrics (RMS/peak/crest/kurtosis/envelope/FFT bands); field evidence: repeatability under controlled stimulation.
- Uplink + evidence: payload structured for diagnosis; field evidence: seq gaps, retry stats, and config version alignment.
H2-2|Sensor & Measurement Targets: Define Band First, Then Build the Chain
The measurement band and event shape determine sampling rate, anti-alias strategy, noise floor, and dynamic range. Sensor type and mounting are chosen to satisfy that chain, not the other way around.
Start with a measurable target (band + amplitude + event shape)
- Low-frequency trend (structural vibration, imbalance): longer windows, stable RMS/peak trends, tolerance to minor phase variation.
- High-frequency shock/impulse (bearing/gear early fault): short windows, strong need for headroom (no clipping) and disciplined pre/post capture.
Common pitfall: “Looks fine in steady state” while short impulses clip or alias, making envelope/crest-based indicators unreliable.
Translate sensor specs into node-side consequences (with evidence hooks)
- Sensitivity: sets ADC code utilization; too high clips on shocks, too low buries early faults in noise.
- Noise density: defines feature jitter floor; verify via quiet-run spectrum and metric repeatability.
- Max g / overload: protects event integrity; verify by clip counters and waveform flat-top signatures.
- Mounting coupling: mechanically filters high-frequency content; verify by controlled repeat tests after remounting.
When IEPE is the practical choice (clear boundary, device-side reasons)
- Long cables / harsh noise: constant-current excitation + defined bias point supports stable operation and easier field diagnostics.
- Need for evidence: bias voltage and compliance headroom are measurable; clipping risk can be monitored and logged.
- High-frequency reliability: common PdM ecosystem for industrial vibration; broad sensor options with known behavior under shock.
Scope note: This section stays at the node interface level (excitation + signal integrity). Network architecture and cloud analytics remain out of scope.
Design mapping: band → sampling/AA → DR/noise → features
- Sampling rate: choose with anti-alias margin (filter roll-off + transition band), not only Nyquist.
- Anti-alias: align cutoff to the target band; avoid “alias spikes” that mimic fault signatures.
- Dynamic range: reserve headroom for impulses; avoid “feature distortion” from rare events.
- Feature set: trend (RMS/peak/FFT bands) vs shock (crest/kurtosis/envelope + short FFT windows).
H2-3|IEPE Constant-Current & the “Compliance Voltage” Window
An IEPE link behaves reliably only when (1) the excitation current is stable and (2) the compliance voltage still has headroom under the largest shock. If compliance collapses, the failure mode is not “slightly worse”—it becomes clipping, spectrum distortion, and missed impulsive events.
IEPE, simplified (what must be guaranteed)
Design must guarantee: excitation current in the intended range (application-dependent) and enough compliance voltage under max event.
- Excitation current affects noise robustness and cable drop (more current improves some noise margins but increases I × R drop).
- Compliance voltage must cover bias + signal swing + cable drop + protection drop + margin.
Compliance budget (turn “it works” into a measurable window)
V_needed ≈ V_bias + V_signal(peak) + V_cable_drop + V_protect_drop + Margin
- V_bias: sensor bias point after settling; measurable at the node input.
- V_signal(peak): peak swing during worst-case shock/impulse (the event that PdM cannot afford to lose).
- V_cable_drop: proportional to excitation current and cable/connector resistance; increases quickly with long runs or poor contacts.
- V_protect_drop: protection elements can add drop during large excursions; this directly steals headroom.
- Margin: required so the largest event remains linear (no flat-topping, no slow recovery).
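To make the window measurable at bring-up, the budget can be evaluated directly. A minimal C sketch, assuming illustrative rail, cable, and margin values (none taken from a specific datasheet):

```c
/* Compliance-budget check: V_needed = V_bias + V_signal(peak) + V_cable_drop
 * + V_protect_drop + Margin. All values below are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const double v_supply   = 24.0;   /* available compliance rail (assumption) */
    const double v_bias     = 11.0;   /* sensor bias point measured after settling */
    const double v_sig_peak = 5.0;    /* peak swing at the worst-case shock */
    const double i_exc      = 0.004;  /* 4 mA constant-current excitation */
    const double r_cable    = 6.0;    /* cable + connector resistance, ohms */
    const double v_protect  = 0.4;    /* protection-element drop at peak */
    const double margin     = 1.0;    /* linearity margin for the rare event */

    double v_cable  = i_exc * r_cable;  /* I x R drop grows with run length */
    double v_needed = v_bias + v_sig_peak + v_cable + v_protect + margin;
    double headroom = v_supply - v_needed;

    printf("V_needed = %.2f V, rail = %.2f V, headroom = %+.2f V\n",
           v_needed, v_supply, headroom);
    printf(headroom >= 0.0 ? "PASS\n" : "FAIL: compliance window too small\n");
    return 0;
}
```

Sweeping r_cable and v_sig_peak in this check is the cheap desk version of the compliance margin sweep described later in the validation chapter.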
Field symptoms → evidence (two common failure trees)
Tree 1 (compliance/clipping). Symptoms: flat-topped waveform, distorted impulse, “strange” harmonics, missed shocks, unstable envelope/crest metrics.
Evidence: waveform shows hard clipping at peaks; clipping/overrange counter increments; behavior improves with shorter cable, lower excitation current, or reduced gain.
Tree 2 (ripple/ground coupling). Symptoms: low-frequency lift, false peaks, envelope mis-triggers, trend drift without mechanical cause.
Evidence: elevated noise floor during quiet operation; artifacts correlate with supply changes; improvement after filtering/decoupling or cleaner excitation source.
Engineering check trio (a minimal, repeatable workflow)
- 1) Measure static bias voltage: confirm it settles within the expected window and remains stable across temperature and cable changes.
- 2) Stress the max-event swing: apply or capture the largest expected shock; verify no clipping and fast recovery.
- 3) Estimate cable drop: compute/measure cable resistance and connectors; verify I × R drop does not consume margin at the worst event.
Practical rule: PdM credibility is decided by the rare event. A chain that is linear for steady vibration but clips on impulses will produce confident-looking, wrong features.
H2-4|Low-Noise AFE Chain: HPF → Gain → Anti-Alias → Protect → Diff/Vcm → ADC
The path from IEPE output to the ADC is a sequence, not a parts list. Each stage exists to preserve useful mechanical information while preventing aliasing, clipping, and protection-induced distortion.
Canonical chain (device-side): IEPE input → AC coupling/HPF → gain (TIA/INA/PGA) → anti-alias LPF → input protection → diff driver/Vcm → ADC.
Key trade-offs (write as “decision + field evidence”)
- HPF corner: too high removes real low-frequency vibration (trend becomes artificially flat); too low allows drift/bias wander into metrics (trend drifts without mechanics).
- Anti-alias filtering: not “stronger is always better” — aggressive filters can add phase/group-delay behavior that harms impulse timing; insufficient filtering creates alias peaks that mimic faults.
- Input protection: protection parasitics can steal headroom and add distortion under large events; symptoms look like “hard bends” and slow recovery during shocks.
Stage-by-stage: what each block must prove
- AC coupling / HPF: proves low-frequency integrity; verify with low-band excitation and long-window stability checks.
- Gain (TIA/INA/PGA): proves noise floor vs headroom balance; verify that rare impulses never clip while quiet-run noise stays low.
- AA LPF: proves alias control; verify via sweep near Nyquist and by changing sample rate to see whether suspicious peaks “move” (alias signature).
- Protect: proves survivability without measurement damage; verify large-event behavior (no premature conduction causing flat-tops).
- Diff driver / Vcm: proves ADC compatibility; verify input common-mode and full-scale swing alignment under load.
Minimum validation set (fast acceptance tests)
- Sine (single tone): confirm linearity, no clipping, and reasonable harmonic content (gain + ADC range check).
- Sweep (band + near Nyquist): confirm amplitude/phase sanity and absence of alias peaks (AA + Fs check).
- Impact / impulse: confirm peak capture without flat-tops and fast recovery; confirm trigger and pre/post buffer produce stable features.
Acceptance goal: features remain stable across repeats, and any failure mode leaves a clear signature (clip counter, alias behavior, recovery time).
H2-5|Synchronous ADC & Sampling Plan: From Rules of Thumb to Computable KPIs
Sampling decisions must be defensible with device-side evidence. A practical plan converts sample rate, dynamic range, and sync error into KPIs that can be measured during bring-up and verified in the field.
Sample rate is not only Nyquist
- Anti-alias margin: leave a transition band between the highest useful band and Fs/2 so the AA filter can roll off without folding energy back as false peaks.
- Impulse time resolution: Δt = 1/Fs. If Δt is coarse, short shocks look “rounded,” peak/crest features drift, and triggers become inconsistent.
- Window discipline: feature windows are T = N/Fs. Trend windows and event windows must both remain meaningful and repeatable (a numeric sketch follows this list).
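A minimal C sketch of the three checks above, with illustrative sample rate, band edge, and window length:

```c
/* Sampling-plan sanity numbers: dt = 1/Fs, T = N/Fs, AA transition band.
 * Parameters are illustrative, not recommendations. */
#include <stdio.h>

int main(void) {
    const double fs     = 51200.0;  /* sample rate, Hz (assumption) */
    const double f_band = 10000.0;  /* highest useful band edge, Hz */
    const unsigned n    = 4096;     /* feature window length, samples */

    double dt      = 1.0 / fs;          /* impulse time resolution */
    double t_win   = (double)n / fs;    /* feature window duration T = N/Fs */
    double nyquist = fs / 2.0;
    double trans   = nyquist - f_band;  /* transition band left for the AA filter */

    printf("dt = %.1f us, T = %.1f ms\n", dt * 1e6, t_win * 1e3);
    printf("AA transition band = %.0f Hz (%.0f%% of the band edge)\n",
           trans, 100.0 * trans / f_band);
    return 0;
}
```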
Dynamic range: noise floor vs maximum shock (avoid “steady looks fine, shock clips”)
- Noise-floor KPI: quiet-run spectrum floor and short-term feature variance (repeatability) indicate whether the chain is dominated by electrical noise.
- Headroom KPI: maximum event must stay below full-scale with margin; clipping and slow recovery contaminate the post-event window and break envelope/crest metrics.
- Bring-up check: step through gain/scale options and record: noise floor, clip counter, and recovery time under large impulses.
Why synchronous ADC matters: channel skew becomes phase error
Phase error ≈ 2π · f · Δt_skew (higher frequency → more sensitive).
- Multi-point comparability: two-end bearing measurements and multi-axis sensing rely on stable cross-channel phase relationships.
- Skew KPI: measure channel-to-channel time alignment and stability across temperature and run time (device-internal).
- Evidence: apply the same input to multiple channels and check correlation lag / phase difference repeatability.
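A worked example of this sensitivity (numbers are illustrative):

```latex
\varphi \approx 2\pi f\,\Delta t_{\mathrm{skew}},\qquad
f = 5~\mathrm{kHz},\ \Delta t_{\mathrm{skew}} = 1~\mu\mathrm{s}
\;\Rightarrow\;
\varphi \approx 2\pi \cdot 5000 \cdot 10^{-6} \approx 0.031~\mathrm{rad} \approx 1.8^{\circ}
```

At 10 kHz the same 1 µs skew costs about 3.6°, which is why skew KPIs matter most in the high-frequency bands.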
Interface & sampling clock: practical conclusions + checks
- Clock jitter hurts high-frequency credibility: a “clean looking” low-frequency spectrum can hide high-frequency SNR loss if the sampling clock is noisy.
- Check 1 (HF sine): inject a near-top-band sine and observe SNR / noise floor; degradation that grows with frequency is a classic jitter signature.
- Check 2 (peak shape): as input frequency increases, watch for “fatter peaks” and rising skirts/noise floor around the tone.
Boundary reminder: this section covers device-internal ADC clock/reference integrity only (no network time algorithms).
H2-6|Node-Internal Timebase & Phase Consistency: From Jitter/Drift to Diagnostic Evidence
This section covers node-internal time only: XO/PLL/RTC behavior, warm-up stability, and evidence that separates clock drift from gain drift and mounting changes. Network time algorithms are out of scope.
What “timebase” means inside a PdM node
- XO/TCXO: sets the sampling cadence; temperature and aging shift frequency and phase behavior.
- PLL (optional): can multiply/clean clocks but adds its own lock/warm-up behavior that must be observable.
- RTC: anchors long-term scheduling and timestamps; must remain monotonic across sleep cycles and reboots.
- Warm-up stability: define a settle window after power-up before comparing “day-to-day” trend features.
When “same machine, different day” data is not comparable: split it into three causes
Cause 1 (timebase drift). Symptoms: peak frequency slowly shifts; phase relationships drift; event alignment slides over time.
Evidence: peak-position drift vs temperature/run-time; timestamp rate error; early-run features unstable until warm-up settles.
Cause 2 (gain/reference drift). Symptoms: amplitude changes while the frequency structure stays similar.
Evidence: quiet-run floor and calibration tone amplitude shift; RMS scales without matching peak-position drift.
Cause 3 (mounting/coupling change). Symptoms: high-band energy changes strongly; multi-axis ratios change after re-mounting.
Evidence: band-energy step changes after installation events; coherence changes across axes/points.
Actionable rule: log installation events as metadata; otherwise electrical drift and mechanical coupling changes become indistinguishable.
Multi-point phase relationships drift: three device-side chains to check
- Sampling alignment chain: channel skew changes with temperature/supply → phase error grows at higher frequency.
- Trigger alignment chain: trigger latency/debounce varies → t0 alignment drifts inside pre/post buffers.
- Path/connection chain: cable/connector changes create delay steps and amplitude changes → correlation lag steps.
Evidence chain table (device-side only)
- Waveform phase / correlation: use correlation lag and phase difference repeatability to detect skew and trigger drift.
- Spectral peak stability: track peak position and peak width; frequency drift and “peak fattening” point to timebase issues.
- Timestamp continuity: enforce monotonic timestamps and sequence continuity; gaps are direct proof of capture/logging faults.
- Warm-up/temperature tags: correlate metrics with temperature and time-since-boot to separate warm-up effects from real machine changes.
Explicit boundary: do not attribute drift to network sync; this section is strictly internal (XO/PLL/RTC + timestamps).
H2-7|Edge Feature Extraction: Build Explainable Features Before “AI”
A PdM edge node earns trust with explainable, repeatable features and device-side evidence. Start with deterministic metrics (time/frequency/event logic), then add learning later if needed.
Quality gate first: do not compute features on untrustworthy samples
Gate every capture as OK / Degraded / Bad before computing features, based on three integrity classes:
- Signal integrity: clip/overrange, saturation count, recovery time indicators.
- Capture integrity: sample drops, FIFO overflow, timestamp/sequence gaps.
- Context integrity: gain/range state, temperature and supply summary for the record.
Rule: if quality is Bad, emit evidence (counters + snippet) but suppress “confident” conclusions.
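A minimal gate sketch in C; counter names and the zero-tolerance thresholds are assumptions, not a fixed policy:

```c
/* Quality gate sketch: capture integrity overrides signal integrity.
 * Field names and thresholds are illustrative. */
typedef enum { QUALITY_OK, QUALITY_DEGRADED, QUALITY_BAD } quality_t;

typedef struct {
    unsigned clip_count;     /* overrange/saturation events in this capture */
    unsigned drop_count;     /* sample drops / FIFO overflows */
    unsigned seq_gap_count;  /* timestamp or sequence discontinuities */
} integrity_counters_t;

quality_t quality_gate(const integrity_counters_t *c) {
    if (c->drop_count > 0 || c->seq_gap_count > 0)
        return QUALITY_BAD;       /* capture broken: emit evidence, no conclusions */
    if (c->clip_count > 0)
        return QUALITY_DEGRADED;  /* features computed but flagged */
    return QUALITY_OK;
}
```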
Explainable time-domain features: what each one is good at
- RMS: stable energy trend for steady vibration; robust for long-run monitoring.
- Peak: highlights shocks and impacts; sensitive to clipping and bandwidth limits.
- Crest Factor (Peak/RMS): amplifies rare impulses riding on a steady baseline; collapses if peaks clip.
- Kurtosis: emphasizes sparse spikes; requires consistent windowing and installation metadata to stay comparable.
Common failure mode: clipping can make crest/kurtosis look “healthier” by flattening real spikes.
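A minimal C sketch of the four metrics over one capture window (population moments; naming is illustrative):

```c
/* Time-domain features for one window: RMS, peak, crest factor, kurtosis. */
#include <math.h>
#include <stddef.h>

typedef struct { double rms, peak, crest, kurtosis; } td_features_t;

td_features_t td_features(const float *x, size_t n) {
    double sum = 0.0, sum2 = 0.0, peak = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i];
        sum  += v;
        sum2 += v * v;
        double a = fabs(v);
        if (a > peak) peak = a;
    }
    double mean = sum / (double)n;
    double var  = sum2 / (double)n - mean * mean;  /* population variance */

    double m4 = 0.0;                               /* fourth central moment */
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        m4 += d * d * d * d;
    }
    m4 /= (double)n;

    td_features_t f;
    f.rms      = sqrt(sum2 / (double)n);
    f.peak     = peak;
    f.crest    = (f.rms > 0.0) ? peak / f.rms : 0.0;
    f.kurtosis = (var > 0.0) ? m4 / (var * var) : 0.0;  /* ~3 for Gaussian input */
    return f;
}
```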
Frequency-domain features: node-side implementation mindset
- FFT band energy: map spectrum into a few fixed bands (low/mid/high or application bands) and trend band deltas.
- Peak tracking: track top peaks (frequency + amplitude) plus peak stability (drift/width) as credibility evidence.
- Envelope / sideband (minimal): band-limit → rectification or simple envelope → low-rate FFT/bands; keep compute bounded.
Credibility hint: suspicious peaks that shift when Fs changes are often alias artifacts (device-side check).
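A minimal band-mapping sketch in C, assuming a precomputed real-FFT power spectrum; the band edges are placeholders for application bands:

```c
/* Sum power-spectrum bins into fixed bands. Assumes a real FFT of length
 * n_fft at sample rate fs, mag2[k] = |X[k]|^2 for k = 0 .. n_bins-1. */
#include <stddef.h>

#define N_BANDS 3

void band_energy(const float *mag2, size_t n_bins, double fs, size_t n_fft,
                 double out[N_BANDS]) {
    const double edges[N_BANDS + 1] = { 10.0, 1000.0, 5000.0, 10000.0 }; /* Hz */
    double df = fs / (double)n_fft;  /* frequency bin width */

    for (int b = 0; b < N_BANDS; b++) out[b] = 0.0;
    for (size_t k = 0; k < n_bins; k++) {
        double f = (double)k * df;
        for (int b = 0; b < N_BANDS; b++) {
            if (f >= edges[b] && f < edges[b + 1]) { out[b] += mag2[k]; break; }
        }
    }
}
```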
Event triggering: threshold + sliding window + debounce (with pre/post capture)
State machine: Idle → Candidate → Confirmed → Cooldown.
- Sliding window: short window for detection, longer window for confirmation to suppress noise spikes.
- Debounce: separate enter/exit thresholds or minimum duration to prevent “chatter.”
- Pre/Post capture: store a short snippet around t0 so the alert can be audited later.
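A minimal C sketch of the Idle → Candidate → Confirmed → Cooldown logic with separate enter/exit thresholds; thresholds and durations are placeholders:

```c
/* Trigger state machine: confirmation window plus hysteresis debounce. */
typedef enum { TRIG_IDLE, TRIG_CANDIDATE, TRIG_CONFIRMED, TRIG_COOLDOWN } trig_state_t;

typedef struct {
    trig_state_t state;
    unsigned     count;  /* samples spent in the current state */
} trigger_t;

#define TH_ENTER   0.50f  /* enter threshold (fraction of full scale) */
#define TH_EXIT    0.30f  /* lower exit threshold prevents chatter */
#define N_CONFIRM  8      /* detection-window hits needed to confirm */
#define N_COOLDOWN 4096   /* samples before re-arming */

/* Feed one short-window detection metric (e.g., short-window RMS) per call. */
trig_state_t trigger_step(trigger_t *t, float metric) {
    switch (t->state) {
    case TRIG_IDLE:
        if (metric > TH_ENTER) { t->state = TRIG_CANDIDATE; t->count = 1; }
        break;
    case TRIG_CANDIDATE:
        if (metric > TH_ENTER) {
            if (++t->count >= N_CONFIRM) { t->state = TRIG_CONFIRMED; t->count = 0; }
        } else if (metric < TH_EXIT) {
            t->state = TRIG_IDLE;  /* noise spike rejected by debounce */
        }                           /* between thresholds: hold and wait */
        break;
    case TRIG_CONFIRMED:
        /* Caller freezes the pre/post snippet around t0 here, then cool down. */
        t->state = TRIG_COOLDOWN; t->count = 0;
        break;
    case TRIG_COOLDOWN:
        if (++t->count >= N_COOLDOWN) t->state = TRIG_IDLE;
        break;
    }
    return t->state;
}
```

The separate, lower exit threshold is the debounce: a metric hovering near a single threshold cannot chatter the state machine.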
Output recommendation: feature + confidence + minimal raw evidence
- Features: fixed fields (time + frequency) to keep results comparable across firmware versions.
- Confidence: derived from quality-gate status + repeatability checks (not an opaque probability).
- Evidence snippet: small raw segment (pre/post) for forensic review and model improvement later.
- Context: timestamp, config version, gain/range, temperature/supply summary, counter snapshot.
Goal: the same parameter set remains stable across temperature, supply variation, and installation changes (tracked as metadata).
H2-8|Local Data & Logs: Every Alert Must Answer “Why Trust This?”
Alerts become actionable only when the edge node can present evidence, context, and health. The core design goal is auditability under real field constraints (offline periods, reboots, transient faults).
Two-layer local storage: short raw snippets + long feature trends
- Ring buffer (snippets): store short raw waveform segments around events (pre/post) for forensic review.
- Trend store: store low-rate feature trends for long-term baselining and drift detection.
- Link key: join by event_id, timestamp, and a sequence index so nothing is ambiguous.
Benefit: an alert can be re-checked even when backhaul is unreliable or the device reboots.
Minimum required event record fields (standardized, compact, reproducible)
- Time & ordering: monotonic timestamp + continuous sequence number (gap is direct evidence of capture/logging faults).
- Config identity: config version or hash, plus sampling/feature parameter set identifier.
- Measurement state: gain/range state, full-scale mapping, temperature and supply summary.
- Integrity snapshot: clip count, drop/overflow counters, timestamp gap count, sync/skew status (if available).
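One way to freeze these fields is a fixed-layout record; the struct below is an illustrative layout, not a defined format:

```c
/* Minimum event-record sketch; field names and widths are assumptions. */
#include <stdint.h>

typedef struct {
    /* time & ordering */
    uint64_t timestamp_us;  /* monotonic timestamp */
    uint32_t seq;           /* continuous sequence number */
    /* config identity */
    uint32_t config_hash;   /* firmware/parameter-set hash */
    uint16_t param_set_id;
    /* measurement state */
    uint8_t  gain_state;
    uint8_t  range_state;
    int16_t  temp_c_x10;    /* temperature, 0.1 degC steps */
    uint16_t supply_mv;
    /* integrity snapshot */
    uint16_t clip_count;
    uint16_t drop_count;
    uint16_t ts_gap_count;
    uint8_t  quality_flag;  /* OK / Degraded / Bad */
    uint8_t  sync_status;   /* skew/sync status, if available */
} event_record_t;
```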
The three most useful log families in the field
- Sensor-chain health: bias/compliance indicators, saturation counters, sensor disconnect hints.
- Sampling-chain health: drop/overflow counters, clock anomaly hints, alignment/skew stability flags.
- Reporting-chain health: retry/latency summaries, sequence breaks, payload integrity checks (device-side).
Evidence Pack: make every alert “question-proof”
- Conclusion: severity + short feature-change summary (what changed, how much).
- Evidence: raw snippet reference + feature vector + confidence derived from quality gate.
- Context: config version, gain/range, temperature/supply summary, install metadata tag if available.
- Integrity: timestamp/sequence continuity + counter snapshot (clip/drop/overflow/gaps).
Non-negotiable: without evidence + integrity, an alert is not trustworthy, regardless of back-end analytics.
H2-9|LoRa / Ethernet Uplink: Bandwidth + Reliability + Diagnosability (Node-Side)
Node-side uplink is not “send more data.” It is a disciplined contract: keep bandwidth predictable, make failures observable, and preserve evidence for later verification.
Two uplink modes: Trend vs Event (with built-in rate limits)
- Periodic trend uplink: low bandwidth and highly robust. Send compact feature summaries and deltas.
- Event uplink: alert + evidence. Send short pre/post snippets only under a strict quota.
- Degrade ladder: event+snippet → event-only → trend-only when quota/queue pressure rises.
Principle: when the field gets noisy (bursts of events), the node must stay alive and keep auditable records locally.
Payload schema: optimize for audit and reconciliation
Minimum fields: seq, timestamp, config_hash, quality_flag, counter_snapshot.
- seq (continuity): exposes loss, duplication, or reordering; makes “what was missed” measurable.
- timestamp: aligns trend and event timelines; enables time-window audits.
- config_hash: binds the measurement to firmware and parameter set (repeatability).
- quality_flag + reason codes: indicates whether features should be trusted.
- counter snapshot: clip/drop/overflow/gap/retry counts to explain anomalies.
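A header sketch in C covering these fields; the layout and widths are assumptions, not a wire protocol:

```c
/* Uplink payload header sketch for trend and event modes. */
#include <stdint.h>

typedef enum { PAYLOAD_TREND = 0, PAYLOAD_EVENT = 1 } payload_kind_t;

typedef struct {
    uint8_t  kind;           /* PAYLOAD_TREND or PAYLOAD_EVENT */
    uint32_t seq;            /* exposes loss, duplication, reordering */
    uint64_t timestamp_us;   /* aligns trend and event timelines */
    uint32_t config_hash;    /* binds data to firmware + parameter set */
    uint8_t  quality_flag;   /* OK / Degraded / Bad */
    uint8_t  reason_code;    /* why quality is not OK */
    /* compact counter snapshot */
    uint16_t clip_count;
    uint16_t drop_count;
    uint16_t gap_count;
    uint16_t retry_count;
} uplink_header_t;
```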
Reliability as diagnosable behavior (not a vague promise)
- Queue observability: queue depth, overflow count, last-success timestamp.
- Retry observability: retry count, backoff state, consecutive-fail counter.
- State code: emit a compact uplink state (Up / Degraded / Down) with reason codes.
Outcome: a failed uplink is still informative if it produces consistent state + counters.
Ethernet-side evidence (strictly node-side)
- Link evidence: link up/down, link-flap count, PHY error counters (when available).
- Power coupling evidence: reboots/resets correlated with high-current uplink activity (e.g., PoE rail droop).
- Local timing: last-success timestamp + sequence continuity (no TSN/PTP algorithm discussion).
H2-10|Power & Robustness: When “Waveforms Look Fine” but Features Jitter
Feature jitter often comes from power and reference instability. The waveform may look acceptable, yet the feature pipeline is sensitive to noise-floor shifts, reference drift, and ground/common-mode injection.
Three coupling paths that create “pseudo vibration”
- Current-source ripple: modulates IEPE bias/output, raising low-frequency floor and triggering false events.
- AFE reference / common-mode drift: appears as gain/bias drift, making day-to-day data incomparable.
- ADC Vref jitter / ground bounce: broadens peaks and lifts high-frequency noise floor, destabilizing band energy.
Key idea: “normal looking” time traces can still produce unstable crest/kurtosis/band energy because those metrics amplify subtle shifts.
Why features are more fragile than visual waveform checks
- Crest / kurtosis: highly sensitive to clipping, tiny spikes, and window inconsistency.
- Band energy / peak tracking: sensitive to noise-floor shifts and reference jitter (peaks widen or drift).
- Event state machine: sensitive to low-frequency drift; debounce can fail if the baseline moves.
Ground & shielding in long IEPE cabling: node-side symptoms only
- Ground loop / common-mode swing: injects mains components and slow baseline movement.
- Observable symptoms: low-frequency lift, 50/60 Hz + harmonics, unstable triggers, phase inconsistency across channels.
- Node-side evidence: rising quality flags, clip counts, baseline drift indicators, and repeatability failure across sessions.
Boundary: focus on symptoms and evidence at the node; avoid turning this into a general EMC guide.
Three must-measure points in the field (and what instability looks like)
The three probes: current-source ripple, AFE reference/common-mode, ADC Vref/GND.
- Probe 1 — current-source ripple: ripple correlates with low-frequency lift and false triggers.
- Probe 2 — AFE ref/common-mode: drift correlates with gain-equivalent drift and day-to-day mismatch.
- Probe 3 — ADC Vref/ground bounce: activity-dependent noise correlates with HF band jitter and peak broadening.
Minimal mitigations (node essentials only)
- Partition & timing: reduce coupling by separating analog/reference rails and avoiding heavy uplink bursts during capture windows.
- Reference stability: prioritize low-impedance reference return and stable common-mode biasing for the ADC driver.
- Verify by evidence: expect counter/quality improvements and reduced feature jitter under the same parameter set.
H2-11|Validation & Debug: A Bench-to-Field Closed Loop
This chapter turns IEPE/AFE/ADC/feature/logging knowledge into a repeatable engineering loop. Every test produces a consistent evidence package (bias/compliance, clip/recovery, sampling health, feature stability, seq continuity), so field failures can be reproduced on the bench and fixed with confidence.
1) What “done” looks like (node-side definition)
- Inputs are controlled: sine/sweep/impact excitation can be repeated with the same mounting and the same limits.
- Evidence is explicit: bias/compliance margin, clip counters, anti-alias indicators, drop/overflow flags, and timebase/sequence continuity are always logged.
- Features are repeatable: the same stimulus produces tightly bounded RMS/Peak/Band-energy trends across temperature and supply variation.
- Field triage is deterministic: sensor-chain health → sampling health → uplink continuity, in that order.
2) The closed-loop recipe (copyable method)
- Step A — Bench input Inject repeatable stimuli (electrical sine/sweep, mechanical shaker, impact hammer) and keep mounting consistent.
- Step B — Node evidence Record bias/compliance margin, saturation/clip & recovery, sampling drop/overflow, quality flags, and config hash.
- Step C — Feature repeatability Repeat N times and compare spread (mean + variability) for RMS/Peak/Crest/Kurtosis and band energies.
- Step D — Field triage + regression Use the evidence priority to localize root cause, then convert the field scenario into a bench regression case.
Figure H2-11 Closed-loop: Bench Inputs → Node Evidence → Repeatability → Field Triage → Regression Case
3) Bench setup (three injection modes + a golden reference)
The bench should support three repeatable stimulus types and one comparison reference. The goal is not “pretty waveforms”; the goal is stable evidence counters and repeatable features under controlled inputs.
- Electrical sine/sweep injection (front-end verification): validates gain/phase, filter corners, anti-alias behavior, and clipping margin without mechanical uncertainty.
- Shaker / handheld calibrator (steady mechanical excitation): validates band energy tracking and amplitude stability with controlled frequency and acceleration.
- Impact hammer (impulse events): validates trigger, pre/post capture, and saturation recovery behavior.
- Golden reference channel: compare against a known-good IEPE accelerometer path to detect mounting or sensor-chain anomalies early.
4) Test checklist with pass/fail evidence (make each item measurable)
(a) Compliance margin sweep — vary cable length, temperature, and supply voltage. The evidence target is not only “no clipping”, but also stable bias/compliance margin and consistent event capture.
- Record: bias voltage, compliance headroom, clip counters, quality flags, and a short pre/post snippet reference.
- Fail pattern: waveform flattening, spectrum anomalies, missing impacts, and unstable trigger behavior.
(b) Saturation & recovery time — inject an over-range impulse and measure how fast the chain returns to valid measurements.
- Record: clip onset timestamp, recovery time, post-event baseline settling, and any feature “aftershocks”.
- Fail pattern: slow recovery makes the next real event look “smaller” or “noisier”, creating false negatives/positives.
(c) Anti-alias foldback check — sweep beyond Nyquist and verify that out-of-band energy does not reappear as in-band peaks.
- Record: in-band/out-of-band energy ratio, peak location drift, and band-energy stability at the stopband edge.
- Fail pattern: “ghost peaks” or band-energy inflation that looks like a bearing defect but is pure foldback.
(d) Feature repeatability — repeat the same input N times. Pass criteria must be numerical (spread bound), not visual.
- Record: mean + variability for RMS/Peak/Crest/Kurtosis and band energies; keep the config hash constant.
- Fail pattern: waveforms look similar but features drift → often caused by supply/ground/reference instability.
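A minimal pass/fail sketch in C; the relative-spread bound is an illustrative acceptance value, not a standard:

```c
/* Repeatability check: relative standard deviation of one feature across N runs. */
#include <math.h>
#include <stdio.h>
#include <stddef.h>

int spread_pass(const double *runs, size_t n, double rel_bound) {
    double sum = 0.0, sum2 = 0.0;
    for (size_t i = 0; i < n; i++) { sum += runs[i]; sum2 += runs[i] * runs[i]; }
    double mean = sum / (double)n;
    double var  = sum2 / (double)n - mean * mean;  /* population variance */
    double rel  = (mean != 0.0) ? sqrt(var) / fabs(mean) : INFINITY;

    printf("mean=%.4g rel_std=%.2f%% bound=%.2f%% -> %s\n",
           mean, 100.0 * rel, 100.0 * rel_bound, rel <= rel_bound ? "PASS" : "FAIL");
    return rel <= rel_bound;
}
```

Run it once per feature (RMS, peak, crest, kurtosis, each band energy) with the config hash held constant, e.g. `spread_pass(rms_runs, 10, 0.05)` for a 5% bound.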
5) The “minimum record schema” (so every result is regression-ready)
Every run should emit a consistent, parseable record. This is the difference between “a lab demo” and “an engineering product”.
- Identity: test_id, timestamp, node_id, firmware/config hash, parameter set ID.
- Input conditions: stimulus type, frequency/amplitude (or impact count), mounting method, cable length, temperature, supply voltage.
- Acquisition conditions: Fs, window length, filter profile ID, trigger thresholds/debounce, pre/post buffer lengths.
- Evidence: bias/compliance headroom, clip counters, drop/overflow flags, timebase status, uplink seq continuity, snippet references.
6) Field triage priority (deterministic, node-side)
Field debugging should follow evidence priority. This avoids “feature chasing” when the sensor chain is already compromised.
- Sensor-chain first: bias/compliance margin + clip/saturation counters. If these are unhealthy, features are not trustworthy.
- Sampling next: drop/overflow/timebase anomalies. If the capture is unhealthy, repeatability will collapse.
- Uplink last: seq/timestamp gaps. If continuity breaks, the field history becomes incomplete and alarms lose context.
7) Regression rule (turn every field failure into a bench replay)
- Capture the trigger envelope: supply/temperature/cable/mounting conditions + the exact config hash that produced the failure.
- Define expected evidence: which counters rise, which quality flags trip, what the snippet should look like.
- Fix target: after the fix, the same bench replay must keep evidence healthy and features within the repeatability bound.
8) Example reference BOM (specific part numbers for a practical validation setup)
The following part numbers are commonly used to build a practical PdM node validation bench and a “golden reference” chain. They are examples to make the checklist actionable; final selection should match the target frequency band and mounting constraints.
| Category | Example P/N / Model | Why it helps in H2-11 |
|---|---|---|
| Golden reference sensor | PCB Piezotronics 352C33 | IEPE accelerometer reference channel for bias/compliance sanity checks and repeatability comparisons. |
| Handheld calibration shaker | PCB Piezotronics 394C06 | Repeatable steady excitation to validate amplitude/band-energy tracking and feature spread across repeated runs. |
| Impact / impulse source | PCB Piezotronics 086C03 | Controlled impulse events to validate trigger, pre/post capture, and saturation recovery behavior. |
| Portable ICP/IEPE conditioner | PCB Piezotronics 480C02 | Battery-powered signal conditioning for quick field/bench checks of IEPE turn-on, bias, and basic signal integrity. |
| IEPE data acquisition | NI 9234 | Simultaneous-sampling dynamic acquisition module widely used with IEPE sensors—useful as a bench reference capture path. |
| Sync ADC options | ADI AD7768-1, TI ADS131M04 | Representative sigma-delta ADCs often chosen for simultaneous sampling / precision capture; useful as “known-good” comparator designs. |
| LoRa radio (node-side) | Semtech SX1262 | Representative LoRa transceiver for building a deterministic seq/timestamp continuity test (periodic vs event payload). |
| Ethernet (node-side) | WIZnet W5500, Microchip LAN8720A | Representative embedded Ethernet controller + PHY choices to implement link-up evidence, drop counters, and payload continuity tests. |
| IEPE constant-current building block | LT3092 | Programmable current-source IC option used as a building block for IEPE excitation experiments (compliance margin sweeps). |
| Edge DSP MCU example | ST STM32H743 | Representative high-performance MCU family for feature extraction + logging + evidence counters under repeated test workloads. |
H2-12|FAQs (Node-side, evidence-first)
Each answer follows an evidence-first flow: symptoms → two measurement points → quick branching → node-side action → deep-dive chapter. No gateway/cloud/network-planning content.
1) IEPE compliance voltage is insufficient: what are the most common waveform/spectrum symptoms, and which two points should be measured first?
Insufficient compliance most often shows as flat-topped peaks, asymmetric clipping, and “clean” tones turning into odd harmonics or sudden broadband rise near impacts. Measure (1) IEPE bias at the node input and (2) headroom to the clipping limit at the largest expected swing. If headroom collapses under load or long cable, treat it as a compliance budget issue first.
- Branching: bias stable but peaks clip → compliance headroom too small; bias moves with supply/radio activity → ripple/ground coupling dominates.
- Node-side action: lower gain before clipping, reduce series drop, confirm protection-device drop under peak conditions.
2) After extending the cable, alarms increase: is it the sensor/cable or constant-current ripple, and how to tell?
Longer cables increase both DC drop (reducing compliance headroom) and susceptibility to common-mode pickup. First check whether bias/headroom degrades monotonically with cable length—that points to compliance loss. Then correlate alarms with current-source ripple or radio/PoE switching events; if alarms cluster with ripple bursts, ripple/ground coupling is the driver.
- Branching: headroom shrinks with length → cable drop/compliance; headroom OK but features jump with supply noise → ripple/EMI coupling.
- Node-side action: add ripple measurement in logs, enforce quality flags when bias/headroom unstable.
3) Low-frequency vibration exists, but RMS/trend barely changes: is the high-pass corner mis-set?
Yes—an overly high AC-coupling corner is a classic reason low-frequency energy disappears from RMS. Confirm by comparing band energy below the corner versus above the corner (or run a slow sweep). If RMS is insensitive while peaks still appear, the chain is filtering out the low-frequency content before feature extraction. The fix is a corner frequency aligned to the true mechanical band, not a generic “noise cleanup” value.
- Branching: low-band energy collapses → high-pass too high; low-band energy exists but RMS unchanged → windowing/feature definition mismatch.
- Node-side action: log the configured corner ID/hash; gate trend validity if corner changes.
4) Impacts are often “missed”: change sampling rate first, or change trigger + buffer first?
Start with trigger and buffer. Many missed impacts are not bandwidth-limited; they are logic-limited: threshold too high, debounce too aggressive, or pre/post windows too short. Validate by logging trigger decisions (armed/triggered/rejected reason) and keeping a short rolling snippet buffer. Only raise sampling rate if the captured waveform lacks the expected rise time or high-frequency content after trigger settings are correct.
- Branching: no trigger events logged → trigger tuning; trigger logged but waveform looks smeared → sampling/filter bandwidth.
- Node-side action: add pre/post capture IDs; include a “trigger reject reason” counter.
5) Steady-state is small, but rare peaks saturate the ADC: how to choose range/gain more robustly?
Treat peaks as a separate design constraint: use p99/p999 peak statistics (not average) and enforce a minimum headroom target. If the ADC clips even rarely, early bearing features become untrustworthy because crest/kurtosis inflate and recovery distorts the next window. Stabilize by lowering analog gain, adding a “high-peak mode” profile, and logging clip counters + recovery time. Example ADC classes: AD7768-1 / ADS131M04 (simultaneous sampling) as typical references.
- Branching: clip counters spike during impacts → gain/range mismatch; clip without impacts → compliance/ripple coupling.
- Node-side action: choose deterministic gain profiles; include gain/range state in every event record.
6) Multi-channel phase relation drifts: check sampling sync first or mounting coupling first, and how to order evidence?
Check sampling synchronization first using a bench same-source test (electrical injection or controlled shaker) to eliminate mounting uncertainty. If phase is stable on the bench but drifts in the field, the likely driver is mounting stiffness/placement or cable-induced coupling differences. Log channel-to-channel time skew, timebase status, and temperature so drift can be separated into timing versus mechanical coupling evidence.
- Branching: drift on bench → sync/timebase; drift only in field → mounting/cable coupling.
- Node-side action: keep a phase sanity metric (cross-correlation peak shift) per capture.
7) Crest factor vs Kurtosis for early bearing faults: which is better, and why do false positives happen?
Kurtosis is highly sensitive to rare impulsive spikes, making it useful for early defect impacts—but also vulnerable to false positives from clipping, ESD bursts, or switching noise. Crest factor depends on the peak-to-RMS ratio, so it can also inflate when RMS is suppressed by filtering or window choices. Reduce false alarms by validating clip counters, enforcing quality flags, and keeping a short snippet for post-hoc confirmation.
- Branching: kurtosis rises with clip counters → saturation artifact; crest rises with corner/filter changes → configuration artifact.
- Node-side action: pair “feature value” with “confidence/quality” computed from evidence counters.
8) Envelope/band features are unstable on-node: is it window/bandwidth or the noise floor?
Separate algorithm settings from analog noise by a two-step check: keep window/band parameters fixed (log the config hash), then observe whether spectral floor and peak width change with supply or radio activity. If the floor rises and peaks broaden, the cause is often analog/reference noise. If stability changes mostly with window length or band edges, the issue is parameterization (windowing/filters), not physics.
- Branching: floor correlates with supply/ripple → noise floor; feature changes only when window changes → parameterization.
- Node-side action: freeze parameters per deployment; log both “band definition” and “effective noise floor” metric.
9) LoRa bandwidth is small: how to layer trend vs event reporting without losing critical evidence?
Use a two-layer payload: trend packets at low duty cycle (band energies, RMS, temperature, health counters) and event packets with rate limiting (alarm type, top features, evidence snapshot, and a short snippet reference ID). Keep raw snippets in a local ring buffer and report only indexes/hashes over LoRa. Example node radios often use SX1262-class transceivers; the layering logic remains node-side.
- Branching: trend stable but events missing → event throttling/buffer sizing; both unstable → sensor/power evidence first.
- Node-side action: enforce event budget per hour/day; include a “dropped-event due to budget” counter.
10) Uplink occasionally drops packets: how to design sequence/retry/reconciliation fields so data remains traceable?
Make loss visible and reconstructable using a minimal field set: seq (monotonic), timestamp, config_hash, last_success_seq, and a compact health counter snapshot. With these, a receiver can detect gaps, confirm device configuration at the time of the event, and correlate anomalies to evidence counters. Ethernet nodes can apply the same idea (e.g., W5500/LAN8720A-class links) without discussing network planning.
- Branching: seq gaps with healthy sampling counters → link issue; seq continuous but features wrong → node signal chain issue.
- Node-side action: store-and-forward critical events locally; retry with bounded policy and log retry counts.
11) After temperature changes, all features drift: sensor sensitivity drift or AFE/reference drift, and how to verify?
Use a controlled comparison to isolate drift: first, run a bench repeatability check with a known excitation and a golden reference channel; if drift persists under controlled input, it is likely AFE/reference/timebase. Then check whether drift correlates with ADC Vref/common-mode/bias stability and supply noise. If drift appears only with different mounting or cable routing, treat it as mechanical coupling or installation variability.
- Branching: drift on bench → analog/reference/timebase; drift only in field → mounting/cable/ground coupling.
- Node-side action: log temperature + evidence counters; flag “trend invalid” when reference stability degrades.
12) Measurement “looks normal” but alarms bounce: which three interference classes should be checked first?
Prioritize node-side interference that creates “fake vibration” signatures: (1) constant-current ripple or supply switching noise coupling into the IEPE chain, (2) AFE reference/common-mode movement that warps gain/baseline, and (3) ADC Vref/GND bounce that lifts the noise floor and destabilizes band features. Confirm by correlating alarms with ripple metrics, bias/common-mode drift, and clip/overflow counters before chasing feature tuning.
- Branching: alarms align with supply/radio/PoE events → power coupling; alarms align with clip/recovery → range/compliance; alarms align with floor rise → reference/GND noise.
- Node-side action: add three “must-log” points: current-source ripple, AFE ref/common-mode, ADC Vref/GND noise proxy.
Figure H2-12 Evidence-first triage flow (symptom → measure → branch → action → deep dive)