Process Data Logger / Gateway (Integrity-Grade Logging)
A process data logger is not just a recorder—it is an evidence system that keeps records in order, keeps time consistent, survives power loss without silent corruption, and makes tampering detectable with verifiable signatures.
A “trustworthy” logger is judged by its proof: monotonic timelines, safe commit markers, recoverable storage, and audit-ready integrity fields—not by how many protocols it can read.
H2-1. Core Idea — What Makes a Logger “Trustworthy”
This chapter does not describe features. It sets the acceptance bar for what an industrial logger must prove—under audit, incident review, and worst-case power events.
A process data logger is not a recorder. It is an evidence system: it must preserve integrity, survive power loss without structural damage, keep time consistent, and make tampering detectable. Multi-protocol support is engineering workload; the real challenge is whether the recorded data remains defensible when something goes wrong.
- No-loss (Completeness): data gaps must be prevented where possible, and when unavoidable they must be measured and reported (drop counters, gap markers).
- In-order (Causality): the system must prove “what happened first” using sequence numbers and commit boundaries, not just arrival timing.
- Monotonic time (No backward jumps): wall-clock can step during sync, but logs must remain monotonic using a stable counter and recorded clock-step events.
- Power-fail safe (Structural self-consistency): brownouts must not leave half-written metadata or corrupted indices; recovery must be deterministic.
- Tamper-evident (Verifiable integrity): edits, deletions, or rollback must be detectable via hash chaining/signatures and key epoch rules.
These criteria define “trust” as a testable contract. The rest of the page maps each contract item to architecture choices and evidence fields that can be validated with power-yank tests, time-step tests, and integrity checks.
H2-2. System Architecture Overview
This architecture is best understood as four interlocking paths—data, power, time, and trust. Field failures become debuggable when each symptom is mapped to one (or a coupling) of these paths.
- Data path: ingress → aggregation → buffer → storage commit. Goal: preserve completeness and ordering under rate mismatch and bursts.
- Power path: brownout detect → freeze ingress → safe commit window → mark clean. Goal: prevent half-written structures and enable deterministic recovery.
- Time path: RTC/time sync → timestamp discipline → monotonic counter + clock-step events. Goal: logs remain monotonic even if wall-clock steps.
- Trust path: hash/sign → key epoch rules → verification status. Goal: tampering and rollback become detectable, not debatable.
A “module list” only becomes an evidence system when each module produces a small set of audit fields that can be checked after the fact: source identity, ingress time, sequence counters, commit markers, clean-shutdown flags, clock-step events, and signature verification state. Those fields are the bridge between engineering design and defensible records.
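The audit fields listed above can be gathered into one record shape. A minimal sketch in Python — the field names other than those the page defines (clean_flag, mono_ctr, sig_verify) are illustrative assumptions, not a standard layout:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditFields:
    # Hypothetical container; real systems would pack these into each
    # committed record or segment header.
    source_id: str    # source identity: which ingress produced the record
    ingress_ts: int   # evidence entry time, in monotonic ticks
    seq: int          # per-source sequence counter
    commit_id: int    # commit boundary this record belongs to
    clean_flag: bool  # true only after a verified clean shutdown/commit
    sig_verify: str   # "PASS" / "FAIL" / "UNCHECKED" verification state
```

Keeping these fields together makes post-incident queries mechanical: filter by source_id, sort by ingress_ts, and check seq continuity.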
H2-3. Multi-Protocol Aggregation Engine
This chapter avoids protocol details and focuses on aggregation principles: converting heterogeneous inputs into a single evidence event stream with identity, time meaning, order proof, and observable loss.
A logger becomes defensible only when the aggregation layer emits records that can answer audit questions: what arrived, from where, in what order, with what time meaning, and what was lost or degraded. The core deliverable is not “more protocols”—it is a normalized event model.
Collection mode sets the meaning of time
- Polling collection provides a scheduled sampling instant; jitter is dominated by task timing. The event time represents “observed at poll.”
- Interrupt/event collection captures edges and bursts; latency is dominated by ISR/queueing. The event time represents “entered evidence chain after interrupt.”
Because the physical meaning differs, the system must treat ingress timestamp as the canonical “evidence entry time” and keep any additional “source time” strictly optional and explicitly labeled in later time-integrity chapters.
Normalization: the minimum audit record
- Timestamp at ingress: the earliest consistent point where all sources can be measured on one clock reference.
- Per-source sequence: enables gap/duplicate/reorder detection without guessing based on arrival timing.
- Quality & loss observability: drops and degradations must be measurable (drop counters, reason codes, overrun flags).
Rate mismatch handling: choose a controlled degradation
- Backpressure: slow sources or pause ingestion when queue depth exceeds thresholds, with explicit bp_state.
- Decimation: down-sample with declared factors and windows, never silently.
- Drop policy: if dropping is required, record drop_cnt and reason_code so gaps are explainable.
Any degradation that does not leave an evidence trace becomes indistinguishable from tampering or malfunction.
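The per-source sequence rule above can be sketched as a small classifier that detects gaps, duplicates, and reordering without guessing from arrival timing. The function name and return strings are assumptions for this sketch:

```python
def classify(prev_seq, seq):
    """Classify an incoming per-source sequence number against the
    last one seen from the same source."""
    if prev_seq is None:
        return "first"                      # first record from this source
    if seq == prev_seq + 1:
        return "in_order"                   # contiguous: no loss
    if seq <= prev_seq:
        return "duplicate_or_reorder"       # arrived late or delivered twice
    # seq jumped forward: the missing count becomes an explicit gap marker
    return f"gap:{seq - prev_seq - 1}"
```

Any non-"in_order" result should be written into the evidence stream (gap marker, drop counter, reason code) rather than silently discarded.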
H2-4. Buffering & Ordering Strategy
This chapter is the engineering pivot: buffering is not a throughput trick. It defines commit semantics—what is considered “persisted,” what may be lost, and how order remains provable.
The primary KPI is ordering integrity, not peak throughput. Throughput can be degraded with explicit traces (drops/decimation/backpressure), but if ordering becomes unprovable, records lose evidentiary value.
Ring buffer vs linear write: choose by recoverability
- Ring buffer isolates burst intake and makes “last consistent point” discoverable using pointers and wrap counts.
- Linear write can be simpler but is more vulnerable to partial metadata damage unless commit boundaries are explicit.
Double buffering enables atomic commit boundaries
- Buffer A receives ingress events while Buffer B is committed to storage.
- Switching buffers defines a commit boundary: a unit that must be locatable, verifiable, and recoverable.
Commit boundaries: locatable, verifiable, recoverable
- Locatable: commit_ptr and commit_id identify the last completed boundary.
- Verifiable: each boundary carries CRC/hash so partial writes are detectable.
- Recoverable: on reboot, replay rules rebuild state up to the last verified boundary.
Atomic records: detect “half records” without guessing
- Records must be self-delimiting (length + end marker) so the system can distinguish complete vs partial entries.
- Partial records are quarantined to the uncommitted region and never contaminate the committed history.
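A minimal sketch of a self-delimiting record layout (length + CRC + end marker) and a scan that distinguishes complete records from a partial tail. The magic/end byte values and the exact layout are illustrative assumptions, not a defined format:

```python
import struct
import zlib

MAGIC = 0xA5  # assumed start-of-record byte
END = 0x5A    # assumed end-of-record marker

def encode_record(payload: bytes) -> bytes:
    # header: magic + little-endian length; trailer: CRC32 + end marker
    head = struct.pack("<BI", MAGIC, len(payload))
    tail = struct.pack("<IB", zlib.crc32(payload), END)
    return head + payload + tail

def scan(buf: bytes):
    """Yield ('complete', payload) per intact record, then at most one
    ('partial', offset) for a truncated/damaged tail to quarantine."""
    off = 0
    while off < len(buf):
        if buf[off] != MAGIC or off + 5 > len(buf):
            yield ("partial", off)
            return
        (length,) = struct.unpack_from("<I", buf, off + 1)
        end = off + 5 + length + 5
        if end > len(buf):
            yield ("partial", off)          # record runs past the data: half-written
            return
        payload = buf[off + 5 : off + 5 + length]
        crc, marker = struct.unpack_from("<IB", buf, off + 5 + length)
        if marker != END or crc != zlib.crc32(payload):
            yield ("partial", off)          # trailer damaged: do not trust
            return
        yield ("complete", payload)
        off = end
```

The key property: a power cut mid-write produces a detectable partial tail, never a record that parses as complete with wrong contents.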
H2-5. NAND / SD Storage Management
Storage is the most common root cause of “mysterious corruption.” The goal is not just to write data, but to keep a recoverable and verifiable history across aging, noise, and unexpected power loss.
Flash reality: “written” is not the same as “recoverable”
NAND and SD-backed media rely on ECC, mapping, and retries. As devices age or operate at high temperature, raw bit errors increase and recovery work grows. When power fails during internal updates, metadata can become inconsistent even when some files still appear readable. Integrity-grade logging therefore requires health counters and deterministic recovery rules, not just a mountable filesystem.
Wear leveling: physical location is not evidence
- Wear leveling relocates data across physical blocks to extend life, so physical addresses cannot be treated as stable proof.
- The evidence boundary must be the commit structure: segment headers, commit markers, and verifiable chain pointers.
- Rising ecc_err_cnt is an early signal of approaching end-of-life or thermal stress.
Bad block management: detect and contain degradation
- Factory bad blocks exist from day one; grown bad blocks appear over time.
- The logger should maintain a bad_block_tbl with growth trend and timestamps to make failures explainable.
- Weak blocks often show up first as increasing ecc_err_cnt and retry activity before hard failure.
Journaling filesystem vs raw partition: deterministic recovery wins
Non-transactional metadata updates can leave directory structures and allocation maps half-updated after power loss.
The result may be “mountable but wrong,” which is fatal for evidence systems. Journaling introduces a replay rule:
after restart, the system replays (or rolls back) to the last consistent metadata state, and records the outcome in journal_replay.
Raw partitions can also be safe, but only when they implement an explicit journal-like commit protocol at the application layer.
Metadata redundancy: protect the recovery entry point
- Redundant metadata copies (A/B or N-of-M) protect pointers, indices, and segment headers from single-point corruption.
- Updates must be write-new-then-switch: write a new metadata copy, verify it, then atomically advance the active pointer.
- Verification status should be summarized in verify_stat so post-mortem analysis does not require guesswork.
CRC vs cryptographic hash: different threats, different guarantees
- CRC is effective against random corruption (noise, bit flips, media errors) and is computationally cheap.
- Cryptographic hash detects structured changes and supports tamper evidence when combined into a chain.
- A hash chain makes deletions/insertions detectable by linking each segment to the previous one; hash_head identifies the current chain head.
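The hash-chain idea can be shown in a few lines of stdlib Python. The zeroed genesis value and function names are assumptions for this sketch:

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for an empty chain

def chain_hash(prev_hash: bytes, segment: bytes) -> bytes:
    # Each link commits to the previous hash, so removing or inserting
    # a segment changes every subsequent link.
    return hashlib.sha256(prev_hash + segment).digest()

def verify_chain(segments, head: bytes) -> bool:
    """Recompute the chain and compare against the stored head
    (hash_head in the page's terminology)."""
    h = GENESIS
    for seg in segments:
        h = chain_hash(h, seg)
    return h == head
```

A CRC per segment would still pass after a whole segment is deleted; the chain head comparison is what makes the *history* verifiable, not just each block.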
H2-6. Power-Loss Hold-Up & Safe Commit
Hold-up does not prevent a power loss. It guarantees a deterministic safe-commit window: enough time to freeze ingestion, flush critical state, advance commit metadata, and mark a clean boundary.
Hold-up sizing is a timing budget problem
The sizing question is not “how many farads,” but “how many milliseconds of guaranteed work.” The budget must cover detection latency, ingress freeze, storage flush behavior, metadata commit, and verification margin. If the window is not guaranteed at end-of-life temperature and aging, the design is not deterministic.
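The timing-budget framing can be made concrete with simple arithmetic. Phase names and the millisecond values below are illustrative assumptions, not measured figures:

```python
def holdup_margin_ms(budget: dict, guaranteed_window_ms: float) -> float:
    """Return the margin left after summing worst-case phases of the
    safe-commit window. Negative margin means the design is not
    deterministic at that operating point."""
    return guaranteed_window_ms - sum(budget.values())

# Example worst-case budget (hypothetical numbers, to be measured
# at end-of-life temperature and capacitor aging):
budget = {
    "brownout_detect": 2.0,   # detection + debounce latency
    "ingress_freeze":  1.0,   # stop accepting events at a boundary
    "storage_flush":  25.0,   # worst-case device-internal flush
    "metadata_commit": 5.0,   # journal + commit pointer + hash head
    "verify_margin":   5.0,   # read-back / safety margin
}
```

With a 50 ms guaranteed hold-up window this example leaves 12 ms of margin; the same budget at end-of-life capacitance may not.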
Brownout detection threshold: early enough, not noisy
- Too late: voltage collapses before metadata commit finishes, creating partial states.
- Too sensitive: line ripple triggers frequent commit cycles, reducing performance and increasing wear.
- Count and timestamp events in bo_event, and track unexp_rst to prove stability over time.
Pre-commit window: freeze → commit → mark clean
- Freeze ingress: stop accepting new events at a defined boundary and preserve ordering.
- Commit metadata: advance journal entries, commit pointers, and hash head as the durable boundary.
- Mark clean: set clean_flag only after the commit is verifiably complete.
If power collapses mid-window, the restart must replay to the last verified boundary and record an incomplete_mk so the loss is explainable rather than ambiguous.
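The freeze → commit → mark-clean ordering can be sketched as a sequence over plain dictionaries; the point is only the ordering (clean_flag is written last), and the read-back check is a stand-in for real verification:

```python
def safe_commit(state: dict, storage: dict) -> dict:
    """Illustrative ordering only, using the page's field names.
    A power collapse at any earlier step leaves clean_flag unset,
    so recovery is deterministic."""
    state["ingress_frozen"] = True               # 1) freeze ingress at a boundary
    storage["commit_ptr"] = state["write_ptr"]   # 2) advance commit metadata
    storage["hash_head"] = state["pending_hash"]
    # stand-in for a real read-back verification of the commit
    if storage["commit_ptr"] == state["write_ptr"]:
        storage["clean_flag"] = True             # 3) mark clean only after verifying
    return storage
```

On restart, clean_flag == False means "replay to the last verified boundary", never "assume the tail is good".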
Supercap vs bulk capacitor: choose predictability under aging
- Supercap: longer windows, but requires charging control and health monitoring (leakage/ESR aging).
- Bulk capacitor: simpler, but shorter and less predictable under temperature and lifetime degradation.
- The best choice is the one that preserves the timing budget at end-of-life, not the one with the largest nominal energy.
Flush timing: a request is not proof
SD/eMMC devices may absorb writes internally and complete them later. A flush call is therefore a request, not a guarantee. The only safe proof is a committed boundary marker that can be verified after reboot: commit pointer advanced, journal entry consistent, and hash head updated.
H2-7. Time Integrity & Synchronization
Time is the core of evidence: it defines order, causality, and replay. The system must treat wall-clock time as a convenience and monotonic time as the ordering authority.
RTC drift: bounded, observable, and never assumed
RTC drift is inevitable under temperature variation and aging. Evidence systems therefore treat RTC as a local reference with
bounded error that must be measured and recorded. Storing rtc_offset makes time interpretation auditable:
it explains why timestamps diverge after long offline periods and how much correction was applied.
Discipline strategy: slew when possible, step only with a trace
- Slew gradually adjusts the wall clock to avoid discontinuities in human-readable time.
- Step may be required for large errors or leap-second style events, but must generate a clock_step_log entry.
- Quality gating prevents bad time sources from poisoning evidence; expose the result via sync_state.
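The slew-vs-step decision reduces to a small policy function. The 500 ms threshold below is an assumed example, not a recommendation; real policy depends on the deployment:

```python
def discipline(offset_ms: float, slew_limit_ms: float = 500.0):
    """Decide how to apply a measured wall-clock error.
    Returns the action plus the signed delta, which for a 'step'
    must also be written to clock_step_log with direction and source."""
    if abs(offset_ms) <= slew_limit_ms:
        return ("slew", offset_ms)   # gradual correction, no discontinuity
    return ("step", offset_ms)       # discontinuity allowed, but traced
```

Either way, mono_ctr ordering is untouched; only the wall-clock interpretation changes.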
Monotonic counter: the ordering key that never goes backward
Every record should carry a strictly increasing mono_ctr used for sorting, windowing, and latency metrics.
Wall clock time remains a secondary field for cross-system alignment and human reading. If wall time repeats or moves backward,
monotonic ordering still guarantees a consistent event sequence.
Wall clock vs monotonic: dual-time model per record
- monotonic_time (via mono_ctr): ordering and causality.
- wall_time: alignment and reporting; may step.
- sync markers: last_sync_ts and sync_state explain validity at capture time.
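The dual-time model is easiest to see with a backward wall-clock step in the data. A minimal sketch (field names follow the page's text):

```python
records = [
    {"mono_ctr": 1, "wall_time": 1000.0},
    {"mono_ctr": 2, "wall_time": 1001.0},
    {"mono_ctr": 3, "wall_time":  950.0},  # wall clock stepped backward here
]

# mono_ctr is the ordering authority; sorting by wall_time would
# silently reorder the third event before the first two.
ordered = sorted(records, key=lambda r: r["mono_ctr"])
```

Downstream analysis re-maps wall_time using clock_step_log entries, but never re-sorts by it.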
Leap seconds and clock steps: allow wall time to bend, never the log
Leap seconds, operator time changes, or upstream corrections can force wall time discontinuities. The logger must accept that wall time
can repeat or jump, but it must never break monotonic ordering. Each step event must be recorded with direction and magnitude in
clock_step_log so any downstream analysis can re-map wall time with full context.
H2-8. Signatures & Tamper Resistance
Evidence requires more than “checksums.” The system must detect modifications, prevent silent deletion/insertion, and resist rollback to old but valid histories.
Hash chain per block: detect deletions and insertions
A single hash can prove that one block was not altered, but it cannot prove that the history is complete.
A hash chain links each committed segment to the previous one (via prev_hash), making missing or inserted segments detectable.
The current chain head (hash_head) is part of the evidence state and must be advanced only at verified commit boundaries.
Segment-level signing: bind evidence to a device identity
- Each commit segment produces a segment hash; the logger signs it to create a tamper-evident record.
- Verification must produce an explicit sig_verify result (pass/fail + reason), not a silent best-effort.
- Signing granularity should align with commit boundaries to keep recovery and verification consistent.
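A sketch of segment-level verification producing an explicit sig_verify result. An HMAC stands in for the asymmetric signature a secure element would produce, and the in-RAM key is exactly what the next subsection argues against — both are assumptions for illustration only:

```python
import hashlib
import hmac

KEY = b"device-root-key"  # illustrative; real keys live in a secure element

def sign_segment(seg_hash: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in for on-chip signing of the segment hash
    return hmac.new(KEY, seg_hash, hashlib.sha256).digest()

def sig_verify(seg_hash: bytes, sig: bytes) -> dict:
    """Explicit pass/fail with a reason, never a silent best-effort."""
    ok = hmac.compare_digest(sign_segment(seg_hash), sig)
    return {"result": "PASS" if ok else "FAIL",
            "reason": None if ok else "signature_mismatch"}
```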
Root key storage: signatures only matter if keys are non-extractable
- Software-stored keys are copyable and undermine evidentiary value.
- MCU secure storage raises the bar but still depends on platform hardening.
- Secure elements keep root keys non-extractable and perform signing internally; record the choice as key_src.
Anti-rollback counter: prevent “valid but old” histories
Without anti-rollback, an attacker can replay an older signed log that still verifies. A monotonic rollback_ctr,
stored in a domain that cannot be decremented, prevents reverting to prior histories or firmware states.
Binding evidence to the running software environment via fw_ver_hash closes the loop: the log can prove not only that it was unmodified,
but also that it was produced under the expected firmware lineage.
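The anti-rollback gate is a comparison, not cryptography: a replayed old history still verifies, so the counter and firmware-lineage checks must run after signature verification. A sketch using the page's field names (the helper name and reason strings are assumptions):

```python
def accept_segment(seg: dict, stored_rollback_ctr: int, expected_fw_hash: str):
    """Gate applied to a segment that already passed sig_verify.
    Rejects valid-but-old histories and wrong firmware lineage."""
    if seg["rollback_ctr"] < stored_rollback_ctr:
        return (False, "rollback")          # older signed history replayed
    if seg["fw_ver_hash"] != expected_fw_hash:
        return (False, "firmware_lineage")  # produced by unexpected firmware
    return (True, None)
```

The stored counter must live in a domain that cannot be decremented (e.g. a secure element's monotonic counter), or the gate is decorative.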
H2-9. Gateway Uplink Strategy (Optional)
Uplink is a copy path, not the source of truth. The local evidence store remains authoritative: only verifiable, committed segments should be transported and accepted.
Store-and-forward: upload only committed segments
- Transport unit is a committed segment (bounded by commit pointers and journal rules), not raw streaming bytes.
- Each uploaded segment should carry segment identity: segment ID, monotonic range, hash head, and signature.
- Uploading “in-progress” data breaks recovery semantics and makes gaps impossible to explain.
Offline mode: treat disconnection as normal
- Logging must continue locally even when uplink is down; backlog growth must not corrupt commit boundaries.
- Backpressure should apply to uplink throughput, not to evidence preservation.
- Queue depth and last-ack markers (if implemented) should be logged as operational evidence.
Retry backoff: stability and device health
Aggressive retries amplify congestion and increase energy draw. Use exponential backoff with jitter, and record failure classes (timeout, handshake, unreachable) so behavior remains auditable over time.
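Exponential backoff with full jitter fits in a few lines; the base and cap constants below are illustrative, and real systems should also log the failure class per attempt:

```python
import random

def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 300.0) -> float:
    """Delay before retry `attempt` (0-based): exponential growth capped
    at cap_s, with full jitter to de-synchronize fleets of devices."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

Full jitter (uniform over the whole window) avoids the synchronized retry waves that fixed delays produce after a shared outage.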
Duplicate suppression: dedup by identity, never by wall time
- Duplicate delivery is expected under retries; reception must be idempotent.
- Dedup keys should be derived from segment/record identity (e.g., segment ID + hash), not wall-clock timestamps that may step.
- Monotonic counters are the ordering authority; wall time is not safe for identity decisions.
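The dedup rule above can be sketched as an idempotent acceptor whose key deliberately excludes wall time. The function names and dict fields are assumptions for this sketch:

```python
def dedup_key(segment: dict) -> tuple:
    # Identity from stable fields only; wall_time may step or repeat
    # and must never participate in identity decisions.
    return (segment["segment_id"], segment["hash_head"])

seen = set()

def accept(segment: dict) -> bool:
    """Idempotent reception: the same segment delivered twice under
    retry is silently dropped the second time."""
    key = dedup_key(segment)
    if key in seen:
        return False
    seen.add(key)
    return True
```

A persistent receiver would back `seen` with storage bounded by the sender's retransmission window.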
TLS vs local storage trust: transport security is not evidence integrity
TLS protects data in transit, but it does not prove that records were not modified at rest or replayed from an older valid history. Evidence integrity still requires local commit boundaries, hash chains, signatures, and anti-rollback policies. A receiver should verify chain and signatures before accepting a segment as evidence.
Upload boundary
Only committed segments; never partial windows. Align with commit pointers and journal rules.
Acceptance rule
Verify signature + chain continuity before acknowledging. Reject unverifiable segments.
Backoff rule
Exponential backoff + jitter; classify failures to keep behavior explainable.
Dedup rule
Dedup by stable ID/hash; never by wall time. Reception should be idempotent.
H2-10. Failure Modes & Forensics
Forensics converts symptoms into evidence-driven conclusions. Each failure mode below maps to specific evidence fields and the chapters that define recovery, ordering, and integrity rules.
Corrupted SD card (→ H2-5 / H2-6)
- Looks like: mount failures, unreadable segments, “random” missing data.
- Check: rising ecc_err_cnt, growth in bad_block_tbl, frequent journal_replay, presence of incomplete_mk.
- Likely cause: media degradation plus incomplete safe-commit windows.
- First fix: strengthen journaling/metadata redundancy and re-budget hold-up for verified commit.
Time reset to epoch (→ H2-7 / H2-6)
- Looks like: timestamps jump to 1970/2000; ordering becomes confusing across reboots.
- Check: empty or stale last_sync_ts, abnormal rtc_offset, large deltas in clock_step_log, correlated brownout/reset events.
- Likely cause: RTC power domain loss or discipline policy accepting poor time sources.
- First fix: mark wall time invalid while preserving mono_ctr ordering; tighten sync quality gating.
Record gap (→ H2-4 / H2-5 / H2-6)
- Looks like: missing span in monotonic ranges; segment continuity breaks.
- Check: commit pointer jumps, incomplete_mk markers, chain discontinuity (hash_head / prev_hash), journal replay rollbacks.
- Likely cause: interrupted commit boundary during power loss, or buffer wrap without explicit boundary markers.
- First fix: enforce atomic commit boundaries and write explicit gap markers to keep evidence explainable.
Duplicate entries (→ H2-3 / H2-7 / H2-9)
- Looks like: identical records repeated, often around reconnect or retry events.
- Check: duplicate source+sequence, repeated mono_ctr ranges, uplink retry bursts; confirm dedup keys are not wall-time based.
- Likely cause: retries without idempotent acceptance; dedup driven by a wall clock that stepped.
- First fix: dedup by stable record/segment ID and require idempotent receiver behavior.
Brownout loop (→ H2-6 / H2-5)
- Looks like: repeated resets; system never reaches a clean commit state.
- Check: rapidly increasing unexp_rst, clean_flag rarely true, frequent bo_event, repeated journal_replay.
- Likely cause: brownout threshold too aggressive, or hold-up budget insufficient under load/inrush.
- First fix: re-budget the safe-commit window, adjust thresholds/debounce, and minimize work required inside the window.
H2-11. Validation & Test Strategy
Validation is only meaningful when each test produces a consistent evidence story: power events, time events, storage recovery, and tamper checks must be traceable in fields and logs.
Power yank test (→ H2-6 / H2-5 / H2-4)
Test actions
1) Run steady ingress + commit workload.
2) Yank power at three phases: (A) pre-commit, (B) inside commit window, (C) metadata flush boundary.
3) Reboot and execute recovery scan / journal replay.
4) Verify last committed segments offline (chain + signature).
Expected evidence fields
last_clean_shutdown_flag=false (for yank cases)
unexpected_reset_counter++
incomplete_record_marker present (B/C)
journal_replay_log indicates replay/repair (if enabled)
commit_pointer / hash_head consistent after recovery
sig_verify=PASS for last committed segment
MPN examples (validation fixtures / power path): TPS2663 (eFuse / hot-swap), TPS25982 (eFuse), TPS3808 (supervisor), LTC4365 (surge stopper), LTC3350 (supercap backup controller), LTC4040 (backup power manager).
Time rollback test (→ H2-7 / H2-6)
Test actions
1) Capture stable logs under a disciplined time source.
2) Force wall-clock steps (backward and forward).
3) Continue logging across multiple commit segments.
4) Compare ordering by mono_ctr vs wall time; validate step traceability.
Expected evidence fields
mono_ctr strictly increasing (no repeats/backward)
clock_step_log contains direction + delta + reason/source
last_sync_ts updates across discipline events
rtc_offset changes remain explainable / bounded
MPN examples (RTC / time base): DS3231M (temperature-compensated RTC), RV-3032-C7 (ultra-low-power RTC), Abracon ABS07 / Epson FC-135 (typical 32.768 kHz crystal families for RTC domains).
Corrupted block injection (→ H2-5 / H2-8)
Test actions
1) Select one committed segment and its metadata region.
2) Inject corruption (bit flip, unreadable block, index damage).
3) Reboot and force recovery path (scan + journal replay).
4) Run offline verification (hash chain + signature).
Expected evidence fields
ecc_err_cnt increases (or read-fail counters trigger)
bad_block_tbl updated (if remap occurs)
journal_replay_log records repair/rollback steps
sig_verify=FAIL (tamper/corrupt) for impacted segment, with reason
chain discontinuity detectable via hash_head/prev_hash
MPN examples (flash targets for injection): W25N01GW (SPI NAND family), MT29F / MT29F1G (raw NAND families), industrial microSD examples often used in logging validation: Swissbit S-45u / S-55u series (family naming), Kingston Industrial microSD families.
Wear endurance test (→ H2-5 / H2-6)
Test actions
1) Run two write profiles: (A) small-record high-frequency, (B) large-segment low-frequency.
2) Execute accelerated write cycles to a defined total written budget.
3) Periodically sample error counters and remap tables.
4) Randomly verify segments (signature + chain) throughout the run.
Expected evidence fields
ecc_err_cnt trend observable but bounded
bad_block_tbl growth rate explainable
journal_replay_log appears only on defined fault triggers
sig_verify remains PASS for all committed segments (no silent degradation)
MPN examples (endurance-oriented storage options): eMMC families such as Micron eMMC (industrial grades) or Kioxia eMMC (industrial grades), SPI-NAND options like W25N series; endurance testing should use the exact SD/eMMC/NAND candidates planned for deployment.
Tamper verification test (→ H2-8 / H2-7 / H2-5)
Test actions
1) Modify payload inside a committed segment (content tamper).
2) Delete one middle segment (history gap).
3) Insert a fabricated segment (history splice).
4) Replay an older valid log set (rollback / replay).
5) Run verifier: chain check + signature check + anti-rollback gate.
Expected evidence fields
sig_verify=FAIL for modified segments (reason recorded)
hash_head/prev_hash mismatch pinpoints deletion/insertion
rollback_ctr rejects older histories (monotonic requirement)
fw_ver_hash mismatch rejects wrong firmware lineage
MPN examples (root-of-trust / key protection): ATECC608B (secure element), NXP SE050 (secure element family), Infineon OPTIGA™ Trust M (secure element family). These enable non-extractable keys and on-chip signing for verifiable evidence.
H2-12. FAQs (Evidence-Driven)
Each answer follows a strict evidence pattern: 1 conclusion, 2 evidence checks, and 1 first fix—then maps back to the chapters that define the rules.
- Records out of order — timestamp or buffer commit issue? → H2-4, H2-7
- SD card corrupt after outage — missing hold-up or no journaling? → H2-5, H2-6
- Logs look intact but fail audit — missing signatures? → H2-8
- Time jumped backward — NTP step or RTC reset? → H2-7
- After firmware update logs unverifiable — key rotation issue? → H2-8
- Random gaps under high load — ingress overrun? → H2-3
- Duplicate entries after reconnect — retry logic missing dedupe? → H2-9
- Frequent unexpected-reset counter increments — brownout threshold wrong? → H2-6
- Audit asks for “proof of non-tampering” — what fields matter? → H2-8
- Storage wears out early — write amplification? → H2-5
- RTC drifts too much — no discipline strategy? → H2-7
- CRC passes but hash fails — partial rewrite? → H2-5, H2-8