
Calibration & NVM for Machine Vision Cameras


This page defines what calibration “owns” and how per-unit calibration sets (including temperature-drift tables and provenance) should be stored, validated, and traced in non-volatile memory—without crossing into security, buffering, timing, or ISP deep dives.

H2-1. Definition & scope boundaries: what “Calibration & NVM” owns

Intent: lock the page boundary so every discussion can be routed to the correct owner (calibration vs configuration vs trace), preventing accidental drift into ISP/security/storage topics.

Operational definitions (mechanically checkable)

  • Calibration data = per-unit values derived from measurement (trims/LUTs/maps/tables). If two devices differ, it is expected here.
  • Configuration = user or system settings (modes, ROI, output format). If a factory reset changes it, it belongs here—not in calibration.
  • Traceability = provenance fields bound to a calibration set (who/when/where/which reference). It explains origin, not image tuning strategy.
  • NVM in this page = persistent storage for small/medium calibration sets + metadata (KB to low-MB range), not video buffering or PLP design.

Routing rules (to prevent cross-writing)

Evidence chain (what indicates calibration/NVM corruption vs “normal tuning”):
  • Sudden color cast / shading change after reboot or update, while the ISP binary is unchanged → suspect wrong/invalid calibration set selection.
  • Wrong scale / measurement mismatch (metrology drift) with otherwise “nice looking” images → suspect geometric calibration (intrinsics/extrinsics/distortion) mismatch.
  • Unit-to-unit inconsistency within the same BOM/build → suspect per-unit trims or traceability binding errors (wrong module’s set applied).
  • Temperature-dependent drift that is repeatable across cycles → suspect temperature-drift tables, sensor temperature validity, or interpolation bounds.
  • Owns: per-unit trims; temperature-drift tables; trace metadata.
  • Not here: security keys; PLP; codecs; PTP/timing.
Figure F1. Ownership map: this page covers calibration sets (per-unit trims, temperature tables, trace metadata) and their NVM persistence. Security keys/TRNG and PLP/video buffering are intentionally out of scope.

H2-2. Calibration dataset taxonomy: what must be stored, and at what granularity

Intent: define a complete but structured taxonomy so calibration coverage is not “forgotten” during design, while keeping each parameter group tied to footprint, update frequency, and field symptoms.

How to plan calibration data (three questions that prevent wasted NVM)

  • What failure does this parameter prevent? (color cast, vignetting, scale error, unit mismatch, thermal drift)
  • What is the data shape? scalar, vector/matrix, 1D table, 2D grid/LUT, or sparse map
  • How often can it change? factory once, after rework (lens/sensor swap), periodic service, or never
Domain | Typical contents (examples) | Data shape / footprint hint | Primary symptom if wrong | Update trigger
Geometric | Lens distortion map; intrinsics/extrinsics (module); pixel pitch / scale; alignment offsets | Scalars/matrices (10s–100s B) + grids/LUTs (KB–100KB+) | Wrong scale / metrology error while “image looks fine” | Lens/module replacement; mechanical rework; factory alignment
Radiometric | Black level; gain/linearity; PRNU/DSNU stats; shading (LSC); color matrix (CCM); LUT hooks | Scalars/tables (100s B–KB) + 2D shading grids (KB–100KB+) | Color cast / vignetting / noise shift or inconsistent brightness | Factory calibration; sensor replacement; sometimes optics change
Temporal | Rolling-shutter timing offsets; exposure timing trims; capture-to-apply alignment micro-params | Small offsets (10s–100s B) | Motion artifacts / misalignment that appear “timing-related” | Module variant change; factory characterization; specific rework events
Thermal | Temperature-drift coefficients/tables for black level, gain, focus position, or scale; bounds & interpolation rules | Piecewise tables (KB range), fixed-point encoding recommended | Repeatable drift vs temperature (cold OK, hot fails) | Factory thermal sweep; service recalibration in harsh deployments
Manufacturing / Trace | Device/module serial; lot; station ID; reference standard ID; timestamps; producer software version | Structured header + fields (100s B–few KB) | Unexplainable unit variance and weak RMA root-cause | Always updated when a new calibration set is committed

Minimum Viable Set (MVS): the smallest set that prevents “silent wrongness”

  • Radiometric base: black level + gain/linearity trims
  • Shading: LSC/shading grid if the optics stack produces visible vignetting
  • Geometric baseline: intrinsics + distortion map whenever measurement/positioning matters
  • Thermal minimum: at least black-level vs temperature correction (with bounds)
  • Trace header: device/module serial + station ID + set version + timestamp

Engineering goal: a device should never “look OK but be wrong” in metrology or thermal stability due to missing calibration assets.

Extended Set: when deeper calibration is justified

  • Metrology / 3D: full geometric chain (intrinsics/extrinsics, scale, alignment per module)
  • Low-light / HDR: richer radiometric characterization and temperature-aware LUT selection hooks
  • Harsh thermal: multi-point drift tables per key block (sensor + optics + mechanics)
  • High serviceability: append-only trace events (factory, rework, service recal)

Planning rule: expand only when there is a clear symptom to prevent and a measurable acceptance test to validate it.

Footprint & update frequency planning (keep it practical, not a storage deep dive):
  • Separate “rare writes” from “frequent counters”: calibration sets should be committed only on factory/service events, not on every boot.
  • Bind by IDs: the calibration set header should carry both device serial and module serial; applying a set with mismatched IDs must be rejected.
  • Plan for growth: reserve schema space for new fields; avoid formats that require rewriting the entire set for one small addition.
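The bind-by-IDs rule above can be sketched as a small check. A minimal sketch, assuming a hypothetical header layout; the struct fields and sizes here are illustrative, not a mandated format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical set-header fields used for binding (names are illustrative). */
typedef struct {
    char device_serial[16];
    char module_serial[16];
} cal_set_header_t;

/* Reject a set unless both serials match the live hardware IDs.
 * Returns 1 if the set may be applied, 0 if it must be rejected. */
int cal_binding_ok(const cal_set_header_t *hdr,
                   const char *live_device, const char *live_module)
{
    if (strncmp(hdr->device_serial, live_device, sizeof hdr->device_serial) != 0)
        return 0;
    if (strncmp(hdr->module_serial, live_module, sizeof hdr->module_serial) != 0)
        return 0;
    return 1;
}
```

A production check would also handle the controlled factory modes where a module serial is still "unknown"; this sketch shows only the default reject-on-mismatch path.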
Figure F2. Calibration set composition: store parameters in domains (geometric, radiometric, thermal, trace) with clear footprint hints. This structure supports “minimum viable set” vs “extended set” planning without scope creep.

H2-3. Data model & schema: TLV/CBOR/flat structs, alignment, and forward compatibility

Intent: prevent format debt. Calibration sets must evolve over years (new parameters, new modules, new factory metadata) without breaking old devices or causing silent misinterpretation.

Schema selection rules (choose based on evolution risk)

  • TLV (Type–Length–Value) — preferred when fields grow or differ by module. Unknown types can be skipped safely.
  • CBOR / protobuf-like — useful for structured tooling, but bound the parsing cost and define strict limits (depth, keys, sizes).
  • Flat structs — only for truly stable mini-headers. Risk: offset brittleness and backward breaks.
Planning rule: if a parameter is likely to be added later, it should be a new TLV, not a change to an existing field’s meaning.

Compatibility contract (must/should rules)

  • MUST: unknown TLV types are skipped by length (no hard failure).
  • MUST: bounds-check every length and record_size; reject overflow immediately.
  • MUST: schema_version gates interpretation; unsupported versions are rejected and trigger rollback.
  • SHOULD: reserve a type range for future expansion; keep type meanings immutable.
  • MUST: define a deterministic CRC region (which bytes are covered and which are excluded).
Header field | Why it exists (failure it prevents) | Typical rule
magic | Prevents mis-parsing random bytes as a calibration record. | Reject if mismatch.
schema_version | Prevents old firmware from interpreting new layouts incorrectly. | Reject if unsupported; do not “guess”.
record_size | Prevents out-of-bounds reads and partial-page confusion. | Reject if it exceeds the maximum or is inconsistent with storage page rules.
set_id | Enables traceability, A/B selection, and deterministic rollback decisions. | Must be unique per commit; logged in trace events.
created_time | Helps correlate issues to stations, lots, and reference-standard drift. | Store as UTC epoch or ISO-like fixed format.
producer_fw_version | Explains why two sets differ; supports compatibility gates and audits. | Must be captured at commit time.
device_serial | Prevents applying the wrong unit’s calibration set (silent wrongness). | Reject if mismatch or missing when required.
module_serial (recommended) | Prevents lens/sensor-module swap errors; binds calibration to the physical module. | Reject if mismatch; allow “unknown” only in controlled factory modes.
Deterministic CRC region & alignment notes:
  • Define CRC coverage as header (excluding mutable pointers) + payload, excluding the CRC field itself.
  • Use a single endianness (commonly little-endian) and document it as a rule, not an assumption.
  • Allow TLV values to be padded for 4-byte alignment while keeping len authoritative.
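The skip-by-length and bounds-check rules can be sketched as a minimal TLV walker. The 2-byte little-endian type/length layout and the reserved-type threshold are assumptions for illustration, not a mandated wire format:

```c
#include <stdint.h>
#include <stddef.h>

/* Walk a TLV payload: 2-byte type, 2-byte length (little-endian), value.
 * Unknown types are skipped by length; any length that runs past the
 * payload end rejects the whole record. */
typedef void (*tlv_cb)(uint16_t type, const uint8_t *val, uint16_t len);

int tlv_walk(const uint8_t *buf, size_t size, tlv_cb on_known)
{
    size_t off = 0;
    while (off + 4 <= size) {
        uint16_t type = (uint16_t)(buf[off] | (buf[off + 1] << 8));
        uint16_t len  = (uint16_t)(buf[off + 2] | (buf[off + 3] << 8));
        off += 4;
        if (len > size - off)           /* bounds check before any use */
            return -1;                  /* reject: overflow/truncation */
        if (type < 0x8000 && on_known)  /* 0x8000+ reserved: skip silently */
            on_known(type, buf + off, len);
        off += len;                     /* skip by length, known or not */
    }
    return (off == size) ? 0 : -1;      /* trailing garbage also rejects */
}
```

This is the mechanism behind the backward/forward tests below the table: old firmware walks records containing newer TLVs without hard failure, and a corrupted length field rejects the record instead of reading out of bounds.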

Compatibility tests (evidence chain)

  • Backward: new firmware reads old record missing new TLVs → uses defaults and marks “degraded” if needed.
  • Forward: old firmware reads new record with extra TLVs → skips unknown types, still applies known ones.
  • Negative: CRC mismatch, len overflow, record_size mismatch → must reject and fall back to last-known-good.
Core rules: TLV + versioning; unknown-field skip; deterministic CRC region. Goal: no silent misread.
Figure F3. Versioned record layout: a strict header + extensible TLV payload + CRC footer. Unknown TLVs are safely skipped (after bounds checks), enabling forward compatibility without silent misreads.

H2-4. Integrity & robustness: CRC, ECC, redundancy, and power-fail safe updates (A/B)

Intent: make calibration updates transactional. A device must never end up applying a half-written set or losing the last-known-good set due to brownouts or partial writes.

Integrity layers (why CRC and ECC both matter)

  • Device ECC can correct small bit flips but may not detect truncation or wrong-length records.
  • Per-record CRC detects structural corruption (partial pages, wrong length, torn writes).
  • Rule: CRC failure always rejects a candidate set, even if ECC reports “corrected”.
Recommended: CRC32C for speed or CRC64 for stronger detection when records are large.

Redundancy patterns (robust selection)

  • A/B slots: Active + Candidate, selected by a monotonic generation counter.
  • Atomic commit pointer: switch active only after verify; pointer update is the only “commit”.
  • Majority vote (N copies): only for tiny critical scalars (optional), not for large maps.
Power-fail safe update (transaction rules):
  • Write Candidate to the inactive slot (never overwrite Active in place).
  • Verify: schema_version supported + serial binding match + CRC pass + required TLVs present.
  • Commit: flip the active pointer atomically (or write a small “commit record” with a higher generation).
  • Rollback: on any failure, keep Active unchanged; mark Candidate invalid and log an error counter.
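A minimal sketch of boot-time slot selection under these rules, assuming verification (CRC + IDs + schema) has already produced a per-slot flag; the structure names are illustrative:

```c
#include <stdint.h>

/* Minimal A/B selection: each slot carries a monotonic generation counter
 * and a verified flag set only after CRC + IDs + schema all pass. */
typedef struct {
    uint32_t generation;
    int      verified;
} cal_slot_t;

/* Returns the slot index (0 or 1) to use as Active, or -1 if neither is
 * valid (fault: apply nothing rather than a half-written set). */
int select_active(const cal_slot_t *a, const cal_slot_t *b)
{
    if (a->verified && b->verified)
        return (a->generation >= b->generation) ? 0 : 1;
    if (a->verified) return 0;
    if (b->verified) return 1;
    return -1;
}
```

Because a torn Candidate simply fails verification, a brownout during write or commit degrades to "use the other slot": the last-known-good set is never lost.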

Fault injection tests (evidence chain)

  • Brownout during write: power loss mid-payload → Candidate must be rejected; Active remains valid.
  • Brownout during commit: power loss while flipping pointer → system must select the latest fully verified Active on boot.
  • Bit flips: corrupt header/len/CRC → must fail CRC or bounds checks and trigger rollback.
  • Partial page: half-programmed page → record_size/CRC mismatch must be detected.
Core rules: A/B slots + generation counter; atomic commit; CRC + ECC. Goal: never lose the last-known-good set.
Figure F4. Transactional update flow: write Candidate to the inactive slot, verify (CRC + IDs + schema), then atomically commit the pointer. Any failure or brownout triggers rollback to last-known-good Active.

H2-5. NVM technology selection: EEPROM vs SPI NOR vs FRAM/MRAM vs OTP/eFuse

Intent: practical NVM selection for calibration sets (size, update frequency, transaction complexity, and industrial reliability), without drifting into generic storage theory.

Calibration-focused selection criteria

  • Endurance: write/erase limits vs your commit rate (budget it).
  • Retention: hot environments reduce data retention—treat temperature as a first-class input.
  • Write granularity: page/sector erase complexity directly impacts power-fail safety design.
  • Boot-time read reliability: must always locate last-known-good quickly.
  • Industrial susceptibility: EMI/ESD and temperature cycling show up as read/write errors—design for reject + rollback.

Endurance budgeting (evidence chain)

  • Budget writes as: writes/day × years × margin ≤ endurance limit.
  • Only write on calibration commit events; never on every boot.
  • Separate fast-changing counters from calibration sets (avoid silent aging failures).
Planning tip: if commit frequency is uncertain, assume worst-case service behavior and add margin.
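The budget arithmetic is simple enough to encode directly. The endurance rating and commit rates used in the test are illustrative examples, not vendor figures:

```c
/* Endurance budgeting sketch: writes/day x years x margin must stay
 * within the part's rated write/erase endurance. */
int endurance_ok(double writes_per_day, double years,
                 double margin, double rated_endurance)
{
    double lifetime_writes = writes_per_day * 365.0 * years * margin;
    return lifetime_writes <= rated_endurance;  /* 1 = within budget */
}
```

Running the numbers makes the "never write on boot" rule concrete: a weekly service recalibration fits comfortably inside a typical 100k-cycle EEPROM budget even with 10x margin, while a few writes per boot cycle does not.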
NVM type | Best fit (camera calibration) | Main risk if misused | Recommended write model
EEPROM | Tiny sets and infrequent commits (scalars, small tables, metadata). | Wear-out if used for counters/logs. | A/B records + CRC + generation counter
SPI NOR Flash | Larger LUTs/maps (shading grids, distortion maps) where capacity matters. | Erase-page complexity → torn writes. | Append-only journal + verify + atomic pointer commit + GC policy
FRAM / MRAM | Frequent updates or higher safety margin for field recalibration and trace events. | Higher BOM / capacity constraints. | Still use CRC + generation; random-write friendly journaling
OTP / eFuse | Immutable IDs or one-time trims (binding only). | Irreversible errors if written wrong. | Use only for identity/one-time constants; keep calibration sets in rewritable NVM
Interface considerations (only as they affect reliability):
  • I²C/SPI bus sharing: ensure calibration reads are not blocked at boot; timeouts must trigger rollback.
  • Pull-ups and noise: treat communication faults like data faults—reject and fall back to last-known-good.
Figure F5. Decision tree: start from calibration set size and commit frequency, then choose the lowest-risk NVM and write model. OTP/eFuse is reserved for immutable identity or one-time trims only.

H2-6. Endurance & wear management: write minimization, wear leveling, and journaling

Intent: prevent silent aging failures that appear months later. Calibration storage should degrade gracefully: detect, reject, roll back, and provide measurable counters for predictive maintenance.

Write minimization (do not write unless it is a commit)

  • Store calibration sets only on calibration commit events (factory, rework, service).
  • Move fast-changing counters/statistics out of the calibration region.
  • Optional: “no-change commit” suppression (if the new set matches the old set).

Wear leveling strategy (choose the simplest that works)

  • EEPROM: rotate across N slots (A/B or small ring) with generation counters.
  • SPI NOR: append-only journal + block erase GC; avoid in-place overwrite.
  • FRAM/MRAM: still use journal/generation for recoverability and auditability.
SPI NOR GC policy (minimum, calibration-friendly):
  • Write records sequentially until a block is full; then switch to the next block (append-only).
  • Erase only blocks that contain no last-known-good record (never erase the only safe fallback).
  • Keep a small pointer/metadata area that can always select the latest valid record after reboot.
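Boot-time selection over the journal can be sketched as a scan for the highest CRC-valid generation; the record structure here is illustrative, standing in for whatever header the on-flash format defines:

```c
#include <stdint.h>
#include <stddef.h>

/* Per-record summary produced while scanning the journal area. */
typedef struct {
    uint32_t generation;
    int      crc_ok;      /* 0 for torn, half-programmed, or corrupt records */
} journal_rec_t;

/* Pick the highest generation whose CRC passes; earlier valid records
 * remain in place as fallback. Returns the record index, or -1 if the
 * journal holds no valid record at all. */
int latest_valid(const journal_rec_t *recs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!recs[i].crc_ok)
            continue;
        if (best < 0 || recs[i].generation > recs[(size_t)best].generation)
            best = (int)i;
    }
    return best;
}
```

Note how a corrupt record in the middle of the log (like gen=103 in Figure F6) is simply skipped; the GC policy then guarantees the block holding the selected record is never erased.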

Monitoring & predictive thresholds (evidence chain)

  • Track write_count (EEPROM/FRAM/MRAM) and erase_count per NOR block.
  • Track verify_fail / CRC_fail / fallback_events and expose them to field logs.
  • Set alarm thresholds (example): warn at ~80% of endurance budget; investigate rising fallback_events.
Figure F6. Append-only journal: each commit appends a new record (generation + CRC). Boot selects the highest valid generation; NOR GC erases only blocks that do not contain last-known-good data.

H2-7. Temperature-drift tables: representation, interpolation, and validation strategy

Intent: make thermal stability measurable and repeatable. Define how drift tables are represented, how runtime interpolation behaves (including clamping and optional hysteresis), and how to validate them in a chamber with acceptance bands.

Representations (pick stability over cleverness)

  • Piecewise-linear table: temperature points → coefficient vectors (preferred for deterministic behavior).
  • Polynomial coefficients: compact but watch numeric stability; limit degree and normalize temperature range.
  • LUT per parameter group: separate tables for groups (e.g., black level vs temp) with independent headers/CRCs.
Definition rule: always document the temperature source (die/board sensor) and units for every table.

Interpolation rules (runtime contract)

  • MUST: clamp outside the calibrated range (no extrapolation).
  • MUST: temperature points are strictly monotonic; reject broken tables.
  • MUST: deterministic rounding for fixed-point interpolation.
  • SHOULD: update gating to avoid coefficient chatter when temperature noise is small.
  • Optional: warm-up hysteresis if behavior differs on heating vs cooling (state-based, not guesswork).
Storage topic | Recommended rule | Why it matters
Fixed-point encoding | Store Q-format + scale factors in the table header. | Prevents overflow/units confusion across firmware versions.
Compression | Prefer predictable compression (scaling + fixed-point) over opaque codecs. | Deterministic decode reduces field variance and test ambiguity.
Per-table CRC | Each table carries its own CRC (in addition to record CRC). | Localizes corruption and avoids “one bit breaks all”.
Clamping behavior | Define a single clamp rule for below-min and above-max. | Avoids diverging behavior across SKUs and firmware branches.

Thermal sweep validation SOP (evidence chain)

  • Soak points: choose low/mid/high plus edge points; hold until stable.
  • Stabilization criteria: temperature rate and key metric rate must be under thresholds for a fixed time window.
  • Golden reference: keep illumination/targets fixed; do not confuse optical drift with thermal drift.
  • Acceptance bands: define per-metric limits (e.g., black-level error, gain error, scale error) for pass/fail.
  • Random-temp verification: validate at non-table temperatures to confirm interpolation behavior.
Figure F7. Thermal calibration workflow: chamber stabilization, measurement, table fitting, fixed-point encoding with per-table CRC, NVM commit, and verification at random temperatures using acceptance bands.

H2-8. Traceability & provenance: what metadata makes field failures diagnosable

Intent: make traceability concrete. Define the minimum metadata and event model that turns “mystery field failures” into diagnosable correlations by lot, station, reference artifact, or service actions.

Minimum trace set (join keys + provenance)

  • Join keys: device serial, module serial, lens ID (if present), calibration set_id.
  • Station context: station ID + station software version + timestamp (UTC).
  • Reference artifact: chart/standard ID + last certification date.
  • Identity field: operator/system identity (not crypto; just provenance).

Calibration event log (append-only)

  • Events: factory_cal, service_recal, lens_replace, module_swap, board_swap.
  • Each event stores: type, time, actor (station/operator), related serials, optional notes, linked set_id (if produced).
  • Storage policy: immutable birth record + append-only event records.
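The event model can be sketched as an append-only log; field names, the fixed capacity, and the enum values are illustrative assumptions, not a fixed schema:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Append-only calibration event log (illustrative layout). */
typedef enum { EV_FACTORY_CAL, EV_SERVICE_RECAL, EV_LENS_REPLACE,
               EV_MODULE_SWAP, EV_BOARD_SWAP } cal_event_type_t;

typedef struct {
    cal_event_type_t type;
    uint32_t time_utc;       /* UTC epoch seconds */
    char     actor[16];      /* station or operator ID */
    uint32_t linked_set_id;  /* 0 if the event produced no new set */
} cal_event_t;

typedef struct {
    cal_event_t ev[32];
    size_t      count;       /* records are never modified or removed */
} cal_event_log_t;

/* Append only; existing records are immutable. Returns -1 when the
 * region is full and the caller must rotate to a new area. */
int cal_log_append(cal_event_log_t *log, const cal_event_t *e)
{
    if (log->count >= sizeof log->ev / sizeof log->ev[0])
        return -1;
    log->ev[log->count++] = *e;
    return 0;
}
```

The immutability rule is what makes field triage trustworthy: a step change in behavior can always be lined up against the exact event (lens replace, recal, board swap) that preceded it.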

Field triage using metadata (evidence chain)

  • Lot/station correlation: cluster failures by station ID and station software version.
  • Reference drift: correlate shifts to a reference artifact ID or certification window.
  • Wrong binding: detect lens/module mismatch by comparing live IDs with stored join keys.
  • Service actions: explain step changes via event history (lens replace, board swap, recal).
Figure F8. Trace chain with explicit join keys: device identity links to calibration sets, stations, reference artifacts, and lots. Append-only events explain step changes and make field triage correlation-driven instead of guess-driven.

H2-9. Update, migration & rollback: treating calibration as a managed artifact

Intent: version without bricking image quality. Calibration updates must be gated by compatibility and integrity checks, migration must be idempotent, and rollback must always preserve the last-known-good set.

Calibration as an artifact (contract)

  • Each set carries: set_id, schema_version, producer_fw_version, generation, CRC status.
  • Activation rule: only sets that pass integrity + compatibility gates may become active.
  • Rollback rule: never delete the last-known-good (LKG) set.

Migration patterns (idempotent by design)

  • Boot-time schema migration: read old set → produce new candidate record → verify → commit pointer.
  • Idempotency: repeated boots must not re-transform the same data (use migration markers/version fields).
  • Raw + derived (optional): store raw measurements and derived tables separately so new firmware can re-derive safely.
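The compatibility gate and the idempotency marker can be sketched as follows; the range-based version check is from the gate table, while the in-place marker is an illustrative simplification of the candidate-write/verify/commit flow:

```c
#include <stdint.h>

/* Range-based compatibility gate: firmware declares the schema versions
 * it can interpret; anything outside the range is rejected, never guessed. */
typedef struct {
    uint16_t min_schema;
    uint16_t max_schema;
} fw_compat_t;

int schema_compatible(const fw_compat_t *fw, uint16_t candidate_schema)
{
    return candidate_schema >= fw->min_schema &&
           candidate_schema <= fw->max_schema;
}

/* Idempotent boot-time migration: transform only when the stored version
 * is older, and stamp the new version so repeated boots are no-ops.
 * Returns 1 if a migration ran, 0 if nothing needed doing. */
int migrate_if_needed(uint16_t *stored_schema, uint16_t target_schema)
{
    if (*stored_schema >= target_schema)
        return 0;
    /* ... produce new candidate record, verify, commit pointer ... */
    *stored_schema = target_schema;   /* migration marker */
    return 1;
}
```

In the real flow the marker would be part of the committed candidate record, so a power loss mid-migration leaves the old (still-valid) set as LKG and the next boot simply retries.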
Gate / rule | Required behavior | Failure handling
Integrity gate | CRC OK (record + per-table CRC where applicable). | Reject candidate; keep LKG; log fault
Compatibility gate | Firmware supports candidate schema_version (range-based, deterministic). | Do not activate; keep LKG; raise incompatible_schema
Upgrade / downgrade loops | New FW reads old sets; old FW ignores unknown fields or safely rejects. | Must land on LKG after any loop
Rollback invariant | LKG remains discoverable and activatable at every boot. | Always recover
Deployment modes (workflow only):
  • Factory provisioning: station produces sets; device enforces gates and activation.
  • Field service tool: updates allowed subsets (service mode) and appends an audit event.
  • Remote calibration package: package staged as candidate → verified → gated → committed (full firmware-update flows are out of scope here).

Evidence chain: test matrix that must pass

  • Upgrade/downgrade loops (A↔B firmware) with repeated boots.
  • Power-loss during candidate write (partial record) → must keep LKG.
  • Corrupted candidate (bit flips / CRC fail) → reject + fault + LKG remains active.
  • Incompatible schema version → gate blocks activation deterministically.
Figure F9. Compatibility gates: candidate sets are activated only when CRC and schema compatibility checks pass. Any failure keeps the last-known-good set active and logs a fault.

H2-10. Manufacturing & field workflows: factory calibration, rework, and service recalibration

Intent: connect storage design to real operations. Define factory provisioning steps, rework invalidation rules, and a safe field service recalibration subset with auditability and quick checks.

Factory steps (provision → calibrate → verify → commit)

  • Blank check (NVM health / reserved areas).
  • Write identity (birth record: device/module/lens IDs).
  • Run calibration routines (raw + derived where applicable).
  • Verify (lightweight capture + thresholds).
  • Commit (candidate write → verify → pointer commit).
  • Lock baseline record (logical baseline; future changes are appended events).

Rework scenarios (what gets invalidated)

  • Lens change: geometric sets must be regenerated; radiometric shading should be re-verified.
  • Sensor module swap: per-unit radiometric sets must be invalidated and recalibrated.
  • Board swap: temperature sensing path may change → verify drift tables against thresholds.

Field recalibration (service mode subset + audit)

  • Allow only a subset of parameters for service recalibration (e.g., drift tables and limited trims).
  • Every service action appends an audit event (type + time + actor + linked set_id).
  • Service tool must run a lightweight validation capture before commit.
Two quick checks for service triage:
  1. Read active set generation and CRC status (detect fallbacks and candidate rejects).
  2. Run a quick validation capture and compare against thresholds (black level / scale error / basic radiometric limits).
Figure F10. Swimlane workflow: factory station provisions identities and runs calibration; firmware enforces gates; NVM stores candidate sets and appends audit events; service tool performs subset recalibration with quick verification.

H2-11. Validation & field debug playbook: symptoms → evidence → isolate → fix (Calibration/NVM)

Intent: a repeatable, field-friendly SOP. Each symptom bucket lists the first two checks, what evidence to collect, how to isolate quickly, the first fix to stop the bleed, and concrete BOM/MPN examples when a design change is required.

Bucket A — “Image suddenly off after update” (often gating / migration)

  • First 2 checks: (1) read active set schema_version + producer_fw_version; (2) check candidate/active CRC status + gate-fail flags.
  • Evidence to collect: A/B headers (magic, schema, gen, timestamp), CRC fail counter, active slot pointer, last gate decision (compatible/incompatible).
  • Isolate: compare A vs B: if candidate newer but rejected → gate is working; if candidate activated but incompatible → gate bug or matrix mismatch.
  • First fix: force activate last-known-good (LKG) and block candidate activation until compatibility is proven by a minimal validation capture.
  • Design change (MPN examples): if you need larger, safer “calibration package” storage, use SPI NOR with strict A/B records:
    • SPI NOR examples: Winbond W25Q64JV, Winbond W25Q128JV, Macronix MX25L12835F, Micron MT25QL128 (use as calibration-set container, not video buffers).
    • Small identity/config EEPROM examples (if sets are tiny): Microchip 24AA256, ST M24C64, ROHM BR24G256.
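The Bucket A gate logic can be sketched as a single decision function. The field names (`schema_version`, `generation`, `crc_ok`) and the "supported schema range" shape are assumptions for illustration; the ordering (integrity first, then compatibility, then generation) is the point.

```python
def gate_candidate(active: dict, candidate: dict, supported_schemas: range) -> dict:
    """Activation-gate sketch: decide which slot becomes active.
    Every rejection path keeps the current (LKG) set active."""
    # Gate 1: integrity. Never reason about the meaning of broken bytes.
    if not candidate.get("crc_ok", False):
        return {"activate": "active", "reason": "candidate CRC fail"}
    # Gate 2: schema must fall in the range this firmware advertises.
    if candidate["schema_version"] not in supported_schemas:
        return {"activate": "active", "reason": "schema incompatible, keeping LKG"}
    # Gate 3: only move forward in generation; never activate a stale set.
    if candidate["generation"] <= active["generation"]:
        return {"activate": "active", "reason": "candidate not newer"}
    return {"activate": "candidate", "reason": "gates passed"}
```

During isolation, the `reason` string is exactly the "last gate decision" evidence listed above: a newer-but-rejected candidate with a compatibility reason means the gate is working.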

Bucket B — “Thermal drift worse than expected” (often table bounds / temp chain)

  • First 2 checks: (1) verify drift table temperature range + clamp-hit counters; (2) cross-check temperature sensor reading vs external reference.
  • Evidence to collect: drift table header (units, Q-format, CRC), clamp-hit count, table version vs build, temperature sensor raw + filtered value.
  • Isolate: if clamp hits occur near normal operating temps → table range or sensor bias; if clamp hits only at extremes → add soak points / revise range.
  • First fix: clamp deterministically and revert to a prior drift table revision known to pass thermal sweep thresholds.
  • Design change (MPN examples):
    • High-accuracy digital temperature sensors (for stable drift compensation): TI TMP117, ADI ADT7420, Microchip MCP9808.
    • If drift tables are frequently updated in the field, consider non-volatile memory with safer writes: SPI FRAM Infineon/Cypress FM25V02A, Fujitsu MB85RS256B.
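The deterministic-clamp-plus-counter behavior for Bucket B can be sketched as follows; the table layout and class shape are illustrative assumptions, not a fixed on-device format.

```python
from bisect import bisect_right

class DriftTable:
    """Piecewise-linear drift correction sketch with deterministic clamping
    and a clamp-hit counter for field logs."""

    def __init__(self, temps_c: list, corrections: list):
        # Soak points must be sorted; one correction value per temperature.
        assert len(temps_c) == len(corrections) and temps_c == sorted(temps_c)
        self.temps, self.corr = temps_c, corrections
        self.clamp_hits = 0  # evidence for "range or sensor bias" isolation

    def lookup(self, t_c: float) -> float:
        # Clamp deterministically outside the calibrated range and count it.
        if t_c <= self.temps[0]:
            self.clamp_hits += t_c < self.temps[0]
            return self.corr[0]
        if t_c >= self.temps[-1]:
            self.clamp_hits += t_c > self.temps[-1]
            return self.corr[-1]
        # In-range: linear interpolation between the bracketing soak points.
        i = bisect_right(self.temps, t_c) - 1
        frac = (t_c - self.temps[i]) / (self.temps[i + 1] - self.temps[i])
        return self.corr[i] + frac * (self.corr[i + 1] - self.corr[i])
```

Clamp hits accumulating near normal operating temperatures point at table range or sensor bias; hits only at extremes argue for adding soak points.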

Bucket C — “Unit-to-unit mismatch in metrology” (often wrong binding / provenance)

  • First 2 checks: (1) compare live module/lens IDs vs IDs stored in the active calibration set; (2) inspect trace metadata: station_id, ref_artifact_id, timestamp.
  • Evidence to collect: device_serial, module_serial, lens_id, set_id, station_id, station SW version, reference artifact ID/cert date, last service event type.
  • Isolate: if IDs mismatch → wrong set bound to hardware; if IDs match but error persists → calibration routine or reference artifact drift.
  • First fix: enforce binding rules: require station/service tool to read IDs before commit; reject any set whose join keys do not match live hardware.
  • Design change (MPN examples): if you need immutable/always-readable identity storage (non-crypto), use dedicated ID EEPROMs:
    • I²C EUI identity EEPROM examples: Microchip 24AA02E64 (EUI-64), Microchip 24AA02E48 (EUI-48), Microchip AT24MAC402 (MAC/EUI family).
    • Regular EEPROM for “birth record + provenance”: ST M24C64, Microchip 24AA256.
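The Bucket C binding rule reduces to a join-key comparison at commit time. The key names below match the traceability fields used on this page but are still illustrative assumptions about the header layout.

```python
def binding_ok(live_ids: dict, cal_header: dict) -> tuple:
    """Binding-check sketch: reject any calibration set whose join keys
    do not match live hardware. Returns (ok, list_of_mismatched_keys)."""
    mismatches = []
    for key in ("device_serial", "module_serial", "lens_id"):
        # A key stored as None means "not bound" (e.g. fixed-lens builds).
        if cal_header.get(key) is not None and cal_header[key] != live_ids.get(key):
            mismatches.append(key)
    return (not mismatches, mismatches)
```

The station or service tool runs this before any commit; a non-empty mismatch list is exactly the "wrong module's set applied" evidence and must block activation.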

Bucket D — “Intermittent corruption after months” (often endurance / bus / temperature)

  • First 2 checks: (1) read write/erase counters vs budget; (2) check CRC error rate trend over time (an increase suggests aging or corruption).
  • Evidence to collect: wear counters, bad-block flags (if any), bus error counters (I²C/SPI retries), temperature extremes history, CRC fail distribution by address/page.
  • Isolate: if wear counters near limit → endurance; if CRC spikes align with temperature/EMI events → bus integrity or extreme temps; if localized pages fail → bad sector behavior.
  • First fix: reduce write frequency immediately (commit-once, no periodic writes) and enable append-only records with majority/LKG fallback.
  • Design change (MPN examples): for frequent updates, migrate from EEPROM/NOR to FRAM/MRAM:
    • SPI FRAM examples: Infineon/Cypress FM25V02A, Fujitsu MB85RS256B, Fujitsu MB85RS1MT (larger density option).
    • SPI MRAM examples: Everspin MR25H256, Everspin MR25H10 (higher density option).
    • SPI EEPROM (higher endurance than NOR for medium sets): ST M95M02 (capacity class example).
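The two Bucket D checks can be sketched as simple alarms. The 80% wear threshold and the "last 4 weeks vs prior 4 weeks" window are assumed policies for illustration, not datasheet values.

```python
def endurance_alarm(erase_count: int, rated_cycles: int, warn_frac: float = 0.8) -> bool:
    """Wear alarm sketch: flag aging before failure, once the erase count
    crosses an assumed fraction of the part's rated program/erase cycles."""
    return erase_count >= rated_cycles * warn_frac

def crc_trend_rising(weekly_crc_fails: list) -> bool:
    """Crude aging signal: mean CRC-fail rate of the last 4 weeks vs the
    4 weeks before; a sustained rise suggests wear or bus-integrity issues."""
    if len(weekly_crc_fails) < 8:
        return False  # not enough history to call a trend
    recent = sum(weekly_crc_fails[-4:]) / 4
    earlier = sum(weekly_crc_fails[-8:-4]) / 4
    return recent > 2 * earlier and recent > 1
```

Either alarm firing is the trigger for the "reduce writes now" first fix while a memory-technology change is evaluated.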
F11 — Field decision tree for Calibration/NVM failures: integrity (CRC fail?) → pointer/generation validity → schema compatibility → binding (IDs match live hardware?) → thermal clamp hits → endurance limits. Each leaf ends with a single first fix: reject candidate and keep LKG active (block activation until re-verified); revert to a prior drift table revision (keep deterministic clamping); or reduce writes now (commit-only, no periodic writes). Operational rule: every failure path must end with LKG activation or a deterministic clamp/commit behavior.
Figure F11. Decision tree for field triage. Always start with integrity (CRC), then pointer/generation validity, then schema compatibility and binding, then thermal clamp behavior and wear limits. Each leaf has a single first fix.
| Use case | Recommended memory type | Concrete MPN examples (shortlist) | Notes (calibration/NVM context) |
| --- | --- | --- | --- |
| Very small, rarely updated “birth record” | I²C EEPROM | Microchip 24AA256, ST M24C64, ROHM BR24G256 | Good for IDs + provenance; still apply record CRC and LKG rules. |
| Identity / join keys (preprogrammed EUI) | EUI EEPROM | Microchip 24AA02E64, Microchip 24AA02E48, Microchip AT24MAC402 | Non-crypto identity storage for binding and traceability. |
| Medium/large calibration packages (maps/LUTs) | SPI NOR | Winbond W25Q64JV, W25Q128JV, Macronix MX25L12835F, Micron MT25QL128 | Use append-only records + A/B + erase policy; not for video buffering. |
| Frequent updates (service recal, counters) | SPI FRAM | Infineon/Cypress FM25V02A, Fujitsu MB85RS256B, Fujitsu MB85RS1MT | Safer writes; still keep per-table CRC and generation counters. |
| Frequent updates + higher density need | SPI MRAM | Everspin MR25H256, Everspin MR25H10 | Good for journaling; pair with deterministic commit rules. |
| Thermal drift accuracy bottleneck | Digital temp sensor | TI TMP117, ADI ADT7420, Microchip MCP9808 | Improves drift table correctness; validate vs external reference. |
MPN note: The part numbers above are practical, widely used examples for calibration/NVM design patterns. Final selection still depends on capacity, temperature grade, interface voltage, and your write-cycle budget (writes/day × years × margin).


H2-12. FAQs (Calibration & NVM)

Each answer stays within this page scope and points back to the chapter where the full evidence chain lives.

Q1. Calibration data vs configuration—what should never be overwritten?
Calibration data is per-unit measured truth (offset/gain, shading, distortion, drift tables) and should never be overwritten by UI/config changes. Configuration is user or mode settings and may change freely. Keep calibration as append-only versioned records with a generation counter, and update only via candidate→verify→commit. Separate storage regions so config writes cannot touch calibration records.
Maps to: H2-2 (dataset taxonomy) + H2-3 (schema/forward-compat rules)
Q2. Why does image quality change after a firmware update if the ISP didn’t change?
Even with the same ISP block, firmware can change how calibration is interpreted: schema fields, scaling factors, fixed-point Q formats, default fallbacks, or migration logic may differ. If a candidate set is activated without strict compatibility gates, “valid” numbers can be read with the wrong meaning. The safe approach is compatibility gating plus LKG rollback, then a minimal validation capture before committing any migrated set.
Maps to: H2-4 (transaction-safe A/B) + H2-9 (migration/rollback & compatibility gates)
Q3. What’s the safest way to update calibration in the field?
Treat field calibration as a transaction: write a candidate record to a new location, verify CRC (and per-table CRC if used), check schema compatibility, then run a lightweight validation capture against thresholds. Only then flip the active pointer atomically and append a service audit event. Limit field updates to an approved subset (for example drift tables), and never delete the last-known-good set.
Maps to: H2-4 (CRC/A-B commit flow) + H2-10 (service workflows & audit)
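The transaction in this answer can be sketched end to end. The `nvm` dictionary stands in for the A/B slot layout and the `validate` hook for the lightweight validation capture; both are assumptions for illustration.

```python
import zlib

def commit_candidate(nvm: dict, candidate: bytes, validate) -> bool:
    """Transactional field-update sketch: write the candidate to the inactive
    slot, verify, validate, then flip the pointer. The LKG slot is never
    deleted or written in place."""
    inactive = "B" if nvm["active_slot"] == "A" else "A"
    # Step 1: candidate bytes + CRC go only into the inactive slot.
    nvm[inactive] = {"data": candidate, "crc": zlib.crc32(candidate) & 0xFFFFFFFF}
    # Step 2: read back and verify CRC before anything trusts the record.
    if zlib.crc32(nvm[inactive]["data"]) & 0xFFFFFFFF != nvm[inactive]["crc"]:
        return False
    # Step 3: lightweight validation capture against thresholds.
    if not validate(candidate):
        return False
    # Step 4: the atomic pointer flip is the only mutation of active state,
    # followed by the appended audit event.
    nvm["active_slot"] = inactive
    nvm["audit_log"].append(("service_recal_commit", inactive))
    return True
```

Power loss before step 4 leaves the old set active and intact; power loss after step 4 leaves the verified new set active.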
Q4. CRC passes but the picture is still wrong—what’s next?
CRC only proves the bytes are intact, not that the set matches the hardware or conditions. Next, check binding join keys (device/module/lens IDs) and provenance metadata, then inspect drift-table clamp-hit counters and temperature sensor bias versus a reference. Compare the active set generation to the LKG set and run a minimal validation capture. Use the decision tree to isolate whether the issue is binding, drift modeling, or compatibility.
Maps to: H2-11 (symptom→evidence→isolate→fix SOP)
Q5. How big should temperature-drift tables be, and how many points?
Start with a piecewise-linear table covering the full operating range using a small number of soak points (often 6–10) and fixed-point encoding with explicit scale factors. Validate by thermal sweep: enforce stabilization criteria, measure residual error, and add points only where nonlinearity shows up. Clamp deterministically outside the calibrated range and track clamp-hit counters so field logs can prove range or sensor issues.
Maps to: H2-7 (representation, interpolation, validation strategy)
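The "add points only where nonlinearity shows up" step can be sketched as a residual check: interpolate the sparse table at each dense thermal-sweep point and report the worst error. Function and parameter names are illustrative assumptions.

```python
def max_residual(sparse_temps: list, sparse_vals: list,
                 dense_temps: list, dense_vals: list) -> float:
    """Thermal-sweep validation sketch: worst-case error of a sparse
    piecewise-linear table against a dense reference sweep."""
    def interp(t):
        # Same deterministic clamping the device applies out of range.
        if t <= sparse_temps[0]:
            return sparse_vals[0]
        if t >= sparse_temps[-1]:
            return sparse_vals[-1]
        for i in range(len(sparse_temps) - 1):
            if sparse_temps[i] <= t <= sparse_temps[i + 1]:
                f = (t - sparse_temps[i]) / (sparse_temps[i + 1] - sparse_temps[i])
                return sparse_vals[i] + f * (sparse_vals[i + 1] - sparse_vals[i])
    return max(abs(interp(t) - v) for t, v in zip(dense_temps, dense_vals))
```

If the residual exceeds the error budget, add a soak point at the worst temperature and re-sweep rather than densifying the whole table.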
Q6. EEPROM vs SPI NOR for calibration maps—what’s the tipping point?
The tipping point is set size and update mechanics. If calibration fits in small records and updates are rare, EEPROM is simple. Once you store larger maps/LUTs (hundreds of KB or more) and need structured journaling with erase-block handling, SPI NOR is usually the practical choice. If field recalibration causes frequent writes, consider FRAM/MRAM. Examples: EEPROM (Microchip 24AA256), SPI NOR (Winbond W25Q128JV), SPI FRAM (FM25V02A).
Maps to: H2-5 (NVM selection criteria)
Q7. How to prevent “wrong module calibration” during rework?
Make the calibration set self-identifying and enforce binding at commit time. Store module serial and lens ID (if present) inside the calibration header, and require the station/service tool to read live IDs before writing a candidate. Reject any candidate whose join keys do not match the device. Append rework events (lens_replace, module_swap) so field triage can correlate mismatches immediately and prevent “silent” misbinding.
Maps to: H2-8 (traceability keys) + H2-10 (rework/service workflow gates)
Q8. What’s an A/B calibration slot, and why not just one copy?
A/B means you always keep an active set (A) while writing a candidate set (B). You never update the active bytes in place. After writing B, you verify CRC and compatibility, then atomically switch the pointer. This survives power loss during writes and prevents partial records from becoming active. A single copy risks bricking image quality if an update is interrupted or a page is corrupted.
Maps to: H2-4 (integrity, redundancy, power-fail safe updates)
Q9. How to budget flash endurance for yearly service recalibration?
Budget endurance from the storage mechanics, not just “updates per year.” Compute: recal events/year × bytes written per event × erase-block amplification × years × margin. Then verify against the part’s guaranteed program/erase cycles at temperature. Reduce writes by committing only on recal events and using append-only journaling to distribute wear. Track write/erase counters and define alarm thresholds so aging can be detected before failures.
Maps to: H2-6 (endurance, write minimization, wear management)
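The budget formula in this answer can be made concrete. The worst-case assumption here is that every commit touches at least one full erase block; the 3× margin is an assumed policy value, not a datasheet number.

```python
import math

def write_budget_cycles(recal_per_year: int, bytes_per_event: int,
                        erase_block_bytes: int, years: int,
                        margin: float = 3.0) -> float:
    """Endurance-budget sketch: erase cycles consumed over product life.
    Erase-block amplification = whole blocks touched per commit (worst case)."""
    blocks_per_event = math.ceil(bytes_per_event / erase_block_bytes)
    return recal_per_year * blocks_per_event * years * margin
```

For example, 12 recal events/year writing 8 KB each on a part with 4 KB erase blocks over 10 years budgets 12 × 2 × 10 × 3 = 720 cycles, comfortably below typical NOR ratings; the same arithmetic with hourly counter writes is what pushes designs toward FRAM/MRAM.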
Q10. Can calibration be compressed safely without losing accuracy?
Yes, if compression respects an explicit error budget and keeps the schema self-describing. Prefer fixed-point quantization with declared scale/Q-format, then apply simple lossless packing (delta coding, run-length) where appropriate. Never compress the mandatory header fields; keep per-table CRC over the decompressed data region. Validate by comparing residual error across temperature points and ensuring the decompressor is deterministic across firmware versions.
Maps to: H2-3 (schema + scaling) + H2-7 (validation for drift/accuracy)
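The fixed-point quantization with a declared Q format described in this answer can be sketched as follows; the helper names and the convention "scale = 2^(fractional bits)" are assumptions for illustration.

```python
def quantize_q(values: list, q_frac_bits: int) -> tuple:
    """Fixed-point packing sketch: the scale (2**q_frac_bits) travels with
    the data so every firmware version decodes identically."""
    scale = 1 << q_frac_bits
    return [round(v * scale) for v in values], scale

def dequantize_q(ints: list, scale: int) -> list:
    # Deterministic inverse: no firmware-dependent rounding on decode.
    return [i / scale for i in ints]

def max_quant_error(values: list, q_frac_bits: int) -> float:
    """Check the quantization step against an explicit error budget;
    round-to-nearest bounds the error by half an LSB (1 / (2 * scale))."""
    ints, scale = quantize_q(values, q_frac_bits)
    return max(abs(a - b) for a, b in zip(values, dequantize_q(ints, scale)))
```

Per-table CRC is then computed over the dequantized (decompressed) region, so an encoding change cannot silently pass integrity checks.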
Q11. What metadata is most valuable when debugging returns (RMA)?
The most valuable metadata enables correlation and replay: device/module/lens IDs, set_id, schema_version, generation, station_id, station software version, reference artifact ID and certification date, timestamps, and the last calibration event type (factory_cal vs service_recal). With these, you can cluster failures by lot, station, reference artifact, or rework history. Pair metadata with quick checks (generation/CRC + validation capture) to triage fast.
Maps to: H2-8 (traceability/provenance) + H2-11 (field debug playbook)
Q12. How to design schema versioning so old firmware doesn’t brick?
Use an extensible schema (TLV/CBOR-like) with a mandatory header (magic, schema_version, set_id, generation, CRC region) and strict unknown-field skipping rules. Firmware should advertise a supported schema range and gate activation deterministically. Old firmware must either safely ignore new optional fields or reject the candidate while keeping the LKG set active. Never delete LKG, and make migrations idempotent so repeated boots do not double-transform data.
Maps to: H2-3 (data model/forward compat) + H2-9 (compatibility gates, migration & rollback)
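The strict unknown-field skipping rule from this answer can be sketched with a minimal TLV codec. The record layout (2-byte tag, 2-byte length, little-endian) is an assumption for illustration, not a prescribed wire format.

```python
import struct

def build_tlv(records: list) -> bytes:
    """Encode (tag, value) pairs as tag(u16) + length(u16) + value, LE."""
    return b"".join(struct.pack("<HH", t, len(v)) + v for t, v in records)

def parse_tlv(blob: bytes, known_tags: set) -> dict:
    """TLV parsing sketch with strict unknown-field skipping: unknown tags
    are skipped by their declared length, so old firmware survives new
    optional fields; truncated records are rejected (keep LKG active)."""
    fields, off = {}, 0
    while off + 4 <= len(blob):
        tag, length = struct.unpack_from("<HH", blob, off)
        off += 4
        if off + length > len(blob):
            raise ValueError("truncated TLV record")
        if tag in known_tags:
            fields[tag] = blob[off:off + length]
        off += length  # skip-by-length: unknown tags never break parsing
    return fields
```

Because every value carries its own length, a newer producer can append optional fields and an older consumer simply steps over them, which is exactly the "safely ignore new optional fields" behavior the gate requires.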