Video Pipeline Security for IP Cameras & Edge Devices
Video pipeline security ensures that only trusted firmware can boot, keys remain non-exportable, and every encrypted video stream is verifiable against tampering, replay, and rollback. In practice, it’s a closed evidence loop: boot/update gates + key vault + nonce/epoch discipline + integrity checks + auditable telemetry.
H2-1. Featured Answer + Boundary
Featured Answer:
Video pipeline security ensures that an IP camera or edge encoder runs only authenticated firmware and protects streams from interception, tampering, replay, and downgrade. It combines verified boot with signed images, TRNG-backed key generation and secure key storage, plus stream encryption and integrity tags with replay defense. This page focuses on device-side boot, keys, stream crypto, and integrity checks—excluding VMS/NVR architectures and network-protocol deep dives.
Boundary: device-side secure boot chain, key lifecycle (TRNG→KDF→vault), stream protection (AEAD/integrity tags), replay/timestamp checks, and security event evidence.
What “secure” means on a real video device
- Authenticity: only signed boot stages and signed firmware components are allowed to execute (chain-of-trust gates).
- Anti-rollback: the device rejects older images even if they are validly signed, using a version gate and a monotonic counter or equivalent.
- Confidentiality: streams are encrypted with per-session keys; nonce/IV uniqueness is enforced across reboots and reconnections.
- Integrity: each frame/segment carries an authentication tag; failures are counted and logged as actionable evidence.
- Replay defense: sequence numbers and a replay window drop duplicates/out-of-order beyond policy, with explicit counters.
H2-2. Threat Model for Video Devices (Practical, Not Academic)
Goal: translate real attack paths into control points and evidence that later chapters can verify with event codes and counters.
Five practical threat groups (each maps to controls + evidence)
| Threat group | Typical attack surface | Control points (device-side) | Evidence (event codes) |
|---|---|---|---|
| Unauthorized firmware / malicious update | Update channel, removable storage, recovery mode, component swapping | Verified boot gate, signed manifest covering all components, update verify-before-switch | BOOT_FAIL_SIG, MANIFEST_HASH_MISMATCH, UPDATE_VERIFY_FAIL |
| Key extraction / cloning | Debug ports, memory scraping, offline NVM reads, fault injection (concept) | Key vault (non-exportable), usage policy, lifecycle states + secure erase/tamper flags | KEY_ACCESS_DENY, KEY_EXPORT_BLOCK, TAMPER_ERASE_TRIGGER |
| Stream eavesdropping / tampering / replay | Packet capture, MITM, frame injection/drop, replay of prior segments | AEAD encryption, integrity tags per frame/segment, nonce/epoch management, replay window | STREAM_DECRYPT_FAIL, INTEG_TAG_FAIL, REPLAY_DROP |
| Downgrade / rollback | Forcing older signed images, abusing recovery paths, counter reset attempts | Min-version policy, monotonic counter, A/B commit marker with last-known-good rollback | ROLLBACK_BLOCK, COUNTER_INVALID, SLOT_COMMIT_FAIL |
| Debug port abuse | JTAG/SWD, UART shell, hidden boot commands, test fixtures in the field | Debug lifecycle (factory vs field), lock controls, audited unlock attempts | DBG_UNLOCK_ATTEMPT, DBG_UNLOCK_DENY, DBG_STATE_CHANGED |
Evidence-first event code discipline (so field debug is deterministic)
- Event naming is functional, not narrative: codes identify a gate outcome or policy violation (e.g., signature fail, rollback blocked, replay dropped).
- Every control point emits two signals: a one-shot event plus a monotonic counter (e.g., tag-fail count, replay-drop count).
- Events must pin the “first failing gate”: boot verify → manifest → key vault → stream decrypt → tag verify → replay window.
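The event discipline above can be sketched in a few lines. This is a minimal illustration, not a device implementation: the class name, method names, and record layout are hypothetical, and only the gate order and event codes come from the text.

```python
from collections import Counter

# Gate order taken from the text; the class and method names are hypothetical.
GATE_ORDER = ["boot_verify", "manifest", "key_vault",
              "stream_decrypt", "tag_verify", "replay_window"]


class SecurityTelemetry:
    """Records one-shot events, monotonic counters, and the first failing gate."""

    def __init__(self):
        self.events = []            # one-shot event records, in arrival order
        self.counters = Counter()   # per-code failure counters that only increase
        self.first_fail_gate = None

    def emit(self, code, gate, ok, **context):
        if gate not in GATE_ORDER:
            raise ValueError(f"unknown gate: {gate}")
        self.events.append({"code": code, "gate": gate, "ok": ok, **context})
        if ok:
            return
        self.counters[code] += 1
        # Pin the earliest gate in the chain, not the first one that happened to log.
        if (self.first_fail_gate is None
                or GATE_ORDER.index(gate) < GATE_ORDER.index(self.first_fail_gate)):
            self.first_fail_gate = gate


telemetry = SecurityTelemetry()
telemetry.emit("REPLAY_DROP", "replay_window", ok=False, seq=41)
telemetry.emit("INTEG_TAG_FAIL", "tag_verify", ok=False, seq=42)
telemetry.emit("REPLAY_DROP", "replay_window", ok=False, seq=43)
```

Here the replay drops arrive first, but the pinned gate ends up as tag_verify because it sits earlier in the chain. On a real device the counters would need to live in storage that survives reboot; the in-memory Counter is only a stand-in.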
H2-3. Root of Trust & Boot Chain (ROM → BL → OS/App)
Purpose: make the chain-of-trust auditable. Each boot stage must pass a verify gate before the next stage runs, and every failure must emit stage ID + result code + policy context.
Chain-of-trust in practice (what “must be true” at each hop)
- Stage0 (Boot ROM, immutable anchor): holds the root verification material (public key hash / key index / fused digest) and validates the next-stage loader before execution.
- Stage1 (Bootloader, policy enforcer): parses the manifest, verifies signatures and component hashes, checks target HW ID and min-version / rollback policy, selects the active slot (A/B), then hands off only verified components.
- Stage2 (OS/App, trusted runtime): runs only after Stage1 gates pass; runtime should continue to emit security telemetry (verify status, key vault readiness, and stream protection counters).
Verified boot vs measured boot (scope-safe)
- Verified boot: a hard gate—if verification fails, execution is blocked or falls back to last-known-good.
- Measured boot: records measurements for later decisions—useful for evidence even when a recovery path is allowed.
This page focuses on verified boot gates plus measured-style evidence (event codes/counters) without expanding into platform attestation systems.
Evidence contract (minimum fields that make field-debug deterministic)
| Field / counter | Meaning | Typical event codes |
|---|---|---|
| boot_stage_id | Which stage produced the decision (ROM / BL / OS). | BOOT_STAGE_ENTER |
| verify_result_code | Gate outcome (OK / SIG_FAIL / HASH_MISMATCH / HWID_FAIL / POLICY_FAIL). | BOOT_FAIL_SIG, BOOT_FAIL_HASH |
| rollback_counter_value + min_allowed_version | Anti-rollback policy context that explains “why a valid signature is still rejected.” | ROLLBACK_BLOCK, COUNTER_INVALID |
| active_slot / pending_slot | A/B decision state (what is running vs what is staged). | SLOT_SWITCH, SLOT_COMMIT_FAIL |
| first_fail_gate | First failing gate index (prevents noisy logs by pinning the root cause). | BOOT_FAIL_FIRST_GATE |
Gate checklist (what is validated, and what must be logged)
- Gate0 (ROM→BL): validate BL signature or digest; log boot_stage_id and verify_result_code.
- Gate1 (BL→OS/App): validate manifest signature, component hash list, HW ID match, and anti-rollback; log counter + min version + slot selection.
- Failure behavior: prefer “block + fallback to LKG slot” over “continue with partial trust”; log the first failing gate and chosen fallback path.
H2-4. Firmware Signing & Manifest Design (What Must Be Signed)
Purpose: prevent “signed but still compromised.” Signing must cover what runs, how it is assembled, and the policy gates that block downgrade and mismatched hardware.
What must be signed (minimum coverage)
- Manifest / descriptor: schema version, device family, target HW ID, image version, min allowed version, and the full component list.
- Component hash list: hashes for every executable or security-relevant blob (boot stages, kernel/rootfs/app, security policy blob, critical config).
- Policy binding: a policy hash (or policy component hash) so “rules” cannot be swapped while the main firmware stays signed.
- Key identity metadata: signing key ID / algorithm ID so verification knows which trust anchor to apply.
Partial signing pitfalls (why systems still get “changed”)
- Bootloader-only signing: app/config/policy can be swapped to alter security behavior while boot still “verifies.”
- Image-only signing without min-version gate: an older but validly signed image reintroduces known vulnerabilities (downgrade).
- Component hashes without HW binding: a signed image intended for another hardware revision is accepted (misconfiguration or unsafe peripheral mapping).
Manifest field checklist (keep it self-describing & auditable)
| Field | Why it matters | Evidence |
|---|---|---|
| manifest_version (schema) | Compatibility and parsing safety; supports forward/backward handling without ambiguity. | MANIFEST_VER |
| image_version + min_allowed_version | Anti-rollback gate: block valid signatures that violate minimum version policy. | ROLLBACK_BLOCK |
| target_hw_id (+ optional board rev) | Prevents cross-flashing to incompatible hardware families/revisions. | BOOT_FAIL_HWID |
| component_hash_list (name, hash, length) | Stops “splicing”: a mix of signed components from different releases. | MANIFEST_HASH_MISMATCH |
| policy_hash (security policy / critical config) | Ensures the security rules and key-usage restrictions are bound to the signed release. | POLICY_HASH_MISMATCH |
| signature + signing_key_id | Authenticity: verification anchor selection and signature validation. | BOOT_FAIL_SIG |
Evidence contract (fields to log when verification fails)
- manifest_version, image_version, target_hw_id
- failed_component_name + expected_hash vs measured_hash (or a hash ID reference)
- policy_hash (expected vs measured) and min_allowed_version
- verify_result_code (SIG_FAIL / HASH_MISMATCH / HWID_FAIL / POLICY_FAIL)
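As an illustration of the manifest checks and the failure evidence listed above, here is a minimal sketch. It makes several assumptions: the manifest is a plain dictionary, the function names are hypothetical, HMAC-SHA256 stands in for a real asymmetric signature (a production boot chain would verify against a public key anchored in ROM), and mapping a min-version violation to POLICY_FAIL is this sketch's choice.

```python
import hashlib
import hmac
import json


def sign_manifest(body: dict, key: bytes) -> str:
    # Stand-in for a real asymmetric signature (assumption for this sketch only).
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, blobs: dict, key: bytes,
                    device_hw_id: str, min_allowed_version: int) -> dict:
    """Return an evidence record; verify_result_code is OK or the first failure."""
    body = manifest["body"]
    evidence = {
        "manifest_version": body["manifest_version"],
        "image_version": body["image_version"],
        "target_hw_id": body["target_hw_id"],
        "verify_result_code": "OK",
    }
    # Authenticate the manifest before trusting any field inside it.
    if not hmac.compare_digest(sign_manifest(body, key), manifest["signature"]):
        evidence["verify_result_code"] = "SIG_FAIL"
        return evidence
    for comp in body["component_hash_list"]:
        measured = hashlib.sha256(blobs.get(comp["name"], b"")).hexdigest()
        if measured != comp["hash"]:
            evidence.update(verify_result_code="HASH_MISMATCH",
                            failed_component_name=comp["name"],
                            expected_hash=comp["hash"], measured_hash=measured)
            return evidence
    if body["target_hw_id"] != device_hw_id:
        evidence["verify_result_code"] = "HWID_FAIL"
        return evidence
    if body["image_version"] < min_allowed_version:
        evidence.update(verify_result_code="POLICY_FAIL",
                        min_allowed_version=min_allowed_version)
    return evidence


KEY = b"demo-key-not-for-production"
blobs = {"kernel": b"kernel-image", "policy": b"policy-blob"}
body = {
    "manifest_version": 1, "image_version": 7, "target_hw_id": "cam-rev-b",
    "component_hash_list": [
        {"name": name, "hash": hashlib.sha256(blob).hexdigest(), "length": len(blob)}
        for name, blob in blobs.items()
    ],
}
manifest = {"body": body, "signature": sign_manifest(body, KEY)}
good = verify_manifest(manifest, blobs, KEY, "cam-rev-b", 5)
tampered = verify_manifest(manifest, {**blobs, "policy": b"swapped"},
                           KEY, "cam-rev-b", 5)
```

Listing the policy blob as an ordinary component is one way to get the policy binding described earlier: swapping it produces a HASH_MISMATCH that names the failing component even though the manifest signature still verifies.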
H2-5. Anti-Rollback & Secure Update (A/B, Counters, Commit Policy)
Goal: updates must behave like a transaction—either commit successfully or return to last-known-good—while blocking downgrade even when signatures are valid.
Two non-negotiable guarantees
- Fail-safe upgrade: an interrupted or unstable update cannot brick the device; it must fall back to a known-good slot.
- Anti-rollback: older images are rejected by policy (min-version + monotonic counter), even if correctly signed.
Design points (practical)
- Monotonic counter: a non-decreasing value stored in protected storage (secure vault/SE concept) that enforces a minimum acceptable version.
- A/B slots: keep a last-known-good slot (A) while staging a candidate slot (B) for trial boot.
- Commit marker: write commit only after a health window passes (e.g., stable boots + stream protection counters behaving).
- Update states: download → verify → switch → trial → commit; every state logs the first failing step.
Update transaction (download → verify → switch → trial → commit)
| State | What must be true | If it fails | Evidence fields |
|---|---|---|---|
| DOWNLOADING | Payload fully received and stored in the inactive slot. | UPDATE_DL_FAIL | update_state_code, last_failure_step |
| VERIFYING | Manifest signature OK; component hashes match; target HW ID matches; min-version gate passes. | UPDATE_VERIFY_FAIL, ROLLBACK_BLOCK | verify_result_code, monotonic_counter, min_allowed_version |
| SWITCH_PENDING | Pending slot selected; boot parameters updated for trial boot. | UPDATE_SWITCH_FAIL | active_slot, pending_slot |
| TRIAL_BOOT | Candidate slot boots and stays healthy through the trial window (no repeated verify/tag failures). | SLOT_REVERT_LKG | trial_boot_count, integ_tag_fail_count (concept) |
| COMMITTED | Commit marker written; monotonic counter updated to the new minimum accepted version. | UPDATE_COMMIT_FAIL | commit_marker, monotonic_counter |
Evidence contract (minimum fields)
- active_slot, pending_slot, update_state_code, last_failure_step
- monotonic_counter_value + min_allowed_version + image_version
- verify_result_code (SIG_FAIL / HASH_MISMATCH / HWID_FAIL / POLICY_FAIL)
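The update transaction described in this chapter can be modeled as a small state machine. The sketch below is a simplified illustration under stated assumptions: the class and parameter names are hypothetical, the download, verify, and trial outcomes are passed in as booleans instead of being computed, and the monotonic counter is an ordinary integer where a real device would use protected storage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UpdateTransaction:
    active_slot: str = "A"              # last-known-good slot
    pending_slot: Optional[str] = None  # candidate slot under trial
    monotonic_counter: int = 5          # minimum accepted image version
    update_state_code: str = "IDLE"
    last_failure_step: Optional[str] = None
    events: List[str] = field(default_factory=list)

    def _abort(self, step: str, event: str) -> bool:
        # Any failure keeps the known-good slot and records the failing step.
        self.last_failure_step = step
        self.pending_slot = None
        self.update_state_code = "IDLE"
        self.events.append(event)
        return False

    def run(self, image_version: int, download_ok: bool,
            verify_ok: bool, trial_healthy: bool) -> bool:
        self.last_failure_step = None
        candidate = "B" if self.active_slot == "A" else "A"

        self.update_state_code = "DOWNLOADING"
        if not download_ok:
            return self._abort("DOWNLOADING", "UPDATE_DL_FAIL")

        self.update_state_code = "VERIFYING"
        if not verify_ok:
            return self._abort("VERIFYING", "UPDATE_VERIFY_FAIL")
        if image_version < self.monotonic_counter:
            return self._abort("VERIFYING", "ROLLBACK_BLOCK")

        self.update_state_code = "SWITCH_PENDING"
        self.pending_slot = candidate

        self.update_state_code = "TRIAL_BOOT"
        if not trial_healthy:
            return self._abort("TRIAL_BOOT", "SLOT_REVERT_LKG")

        # Commit last: only now does the candidate become active and the floor rise.
        self.active_slot, self.pending_slot = candidate, None
        self.monotonic_counter = max(self.monotonic_counter, image_version)
        self.update_state_code = "COMMITTED"
        return True


tx = UpdateTransaction()
downgrade = tx.run(image_version=4, download_ok=True, verify_ok=True, trial_healthy=True)
bad_trial = tx.run(image_version=7, download_ok=True, verify_ok=True, trial_healthy=False)
committed = tx.run(image_version=7, download_ok=True, verify_ok=True, trial_healthy=True)
```

The point of the ordering is that the monotonic counter moves only at commit, so a failed trial boot never raises the version floor and the last-known-good slot stays bootable.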
H2-6. TRNG → DRBG → Key Derivation (Where Keys Come From)
Goal: treat randomness and key generation as an auditable supply chain. The device must prove entropy health, maintain DRBG reseed discipline, and derive per-device keys with explicit contexts.
TRNG vs PRNG vs DRBG (engineering meaning)
- TRNG: a physical entropy source. It is where “unpredictability” enters the system.
- PRNG: an algorithmic generator. Useful for simulation or non-security uses, but not a trust anchor.
- DRBG: a deterministic generator seeded from entropy with defined reseed rules and observable counters—what production systems typically use for consistent security-grade output.
Entropy health test & degraded mode (concept, evidence-first)
- Health test: continuous checks detect stuck/biased entropy output; failures must be surfaced as events, not ignored.
- Degraded mode: if entropy is not healthy, the device should restrict key operations (e.g., block new long-term key creation, limit sessions) and emit an explicit mode flag.
- Why it matters: silent RNG failure produces keys that look “valid” but become predictable and clonable.
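A stuck-output check and the resulting degraded mode can be sketched as follows. The check is modeled loosely on the repetition count idea from NIST SP 800-90B; the cutoff of 8, the class name, and the method names are illustrative assumptions, and a real design would derive the cutoff from the source's assessed entropy.

```python
def repetition_count_ok(samples, cutoff):
    """Return False if any value repeats `cutoff` or more times in a row."""
    run_value, run_length = None, 0
    for sample in samples:
        if sample == run_value:
            run_length += 1
        else:
            run_value, run_length = sample, 1
        if run_length >= cutoff:
            return False
    return True


class KeyGenPolicy:
    """Maps entropy health to an explicit key-generation mode (hypothetical API)."""

    def __init__(self):
        self.entropy_status = "OK"
        self.key_generation_mode = "NORMAL"
        self.events = []

    def update_health(self, samples, cutoff=8):
        if repetition_count_ok(samples, cutoff):
            self.entropy_status, self.key_generation_mode = "OK", "NORMAL"
        else:
            self.entropy_status, self.key_generation_mode = "FAIL", "DEGRADED"
            # Surface the failure as events instead of continuing silently.
            self.events += ["RNG_HEALTH_FAIL", "RNG_DEGRADED_MODE"]

    def may_create_long_term_key(self):
        return self.key_generation_mode == "NORMAL"


policy = KeyGenPolicy()
policy.update_health([0x3A, 0x91, 0x07, 0x07, 0xC4])   # varied output
healthy_allows = policy.may_create_long_term_key()
policy.update_health([0xFF] * 16)                      # stuck output
degraded_allows = policy.may_create_long_term_key()
```

A single stuck-value check is not a complete health test; it only shows how a failed check should change observable state and block long-term key creation.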
Per-device key derivation (KDF with device binding)
- Device binding: derive keys using a stable device UID (concept) so two devices cannot generate the same keys from identical firmware.
- Context separation: each purpose uses a distinct kdf_context_id (boot verify, stream session, storage sealing) to prevent cross-use of keys.
- Epoch/rotation: session keys should be derived with an epoch (or session ID) so rotation is explicit and auditable.
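Device binding, context separation, and epoch-based rotation can all be expressed through the inputs to a standard KDF. The sketch below uses HKDF (RFC 5869) with HMAC-SHA256 built from the Python standard library. The derive_key helper, the choice of the device UID as salt, and the encoding of context and epoch into the info field are assumptions made for illustration.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand using HMAC-SHA256."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: an empty salt defaults to HASH_LEN zero bytes.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(root_secret: bytes, device_uid: bytes,
               kdf_context_id: str, kdf_epoch: int) -> bytes:
    # Hypothetical binding: UID as salt, purpose and epoch in the info field.
    info = kdf_context_id.encode() + b"\x00" + kdf_epoch.to_bytes(4, "big")
    return hkdf_sha256(root_secret, salt=device_uid, info=info)


root = b"\x11" * 32  # placeholder; on a device this never leaves the vault
stream_key_e0 = derive_key(root, b"device-0001", "stream-session", 0)
stream_key_e1 = derive_key(root, b"device-0001", "stream-session", 1)
sealing_key = derive_key(root, b"device-0001", "storage-sealing", 0)
other_device = derive_key(root, b"device-0002", "stream-session", 0)
```

Changing any one of the UID, the context, or the epoch yields an unrelated key, which is what makes cross-use and cloning detectable instead of silent. In a real design the derivation would run inside the vault so that only a handle is returned.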
Evidence contract (minimum fields)
| Field / counter | What it proves | Typical event codes |
|---|---|---|
| entropy_status (OK/WARN/FAIL) | Whether the entropy source is considered healthy for cryptographic use. | RNG_HEALTH_FAIL |
| drbg_reseed_counter | Reseed discipline is active; helps detect “never reseeded” situations. | DRBG_RESEED |
| kdf_context_id | Keys are separated by purpose; prevents accidental or malicious cross-use. | KDF_CTX_INVALID |
| kdf_epoch / session_id | Key rotation is explicit and traceable across sessions and reboots. | KEY_EPOCH_ROLL |
| key_generation_mode (NORMAL/DEGRADED) | The device behavior is safe when entropy health fails (no silent operation). | RNG_DEGRADED_MODE |
H2-7. Key Storage & Access Control (Vault Abstraction)
Goal: make key storage auditable and enforceable. Treat SE/TPM/TEE as a key vault that exposes operations (sign/decrypt/derive) via handles—never exporting private keys.
Key vault rules (minimum, enforceable)
- Never-export private keys: apps and firmware receive handles, not raw key material.
- Policy-gated usage: each key has explicit usage flags (encrypt-only / sign-only / attest-only).
- Lifecycle states: factory → provisioned → field → decommission; secure erase must produce auditable events.
Design points (practical)
- Key slots: each slot has key_slot_id, key_version, usage_flags, and key_state.
- Usage flags: deny any operation outside policy and log KEY_USE_DENY_POLICY.
- Rotation & retirement: new version becomes ACTIVE; old versions become RETIRED/REVOKED and cannot sign new content.
- Secure erase: erase + verify (concept) + event emission; do not silently “forget.”
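The vault rules above amount to an interface that hands out handles and enforces usage flags on every call. The following is a minimal in-memory sketch of that interface. The class and method names are hypothetical, HMAC-SHA256 stands in for a real signing operation, and a Python object offers none of the hardware isolation an SE/TPM/TEE provides.

```python
import hashlib
import hmac
from enum import Flag, auto


class Usage(Flag):
    ENCRYPT = auto()
    SIGN = auto()
    ATTEST = auto()


class KeyVault:
    """Exposes operations by handle; raw key bytes are never returned."""

    def __init__(self):
        self._slots = {}   # key_slot_id -> slot record
        self.events = []   # (event_code, key_slot_id) audit trail

    def provision(self, key_slot_id, key_bytes, usage_flags, key_version=1):
        self._slots[key_slot_id] = {
            "key": key_bytes, "usage_flags": usage_flags,
            "key_version": key_version, "key_state": "ACTIVE",
        }
        return key_slot_id  # the caller only ever holds this handle

    def _authorize(self, key_slot_id, needed):
        slot = self._slots.get(key_slot_id)
        if slot is None or slot["key_state"] != "ACTIVE":
            self.events.append(("KEY_INVALID", key_slot_id))
            return None
        if not (slot["usage_flags"] & needed):
            self.events.append(("KEY_USE_DENY_POLICY", key_slot_id))
            return None
        return slot

    def sign(self, key_slot_id, message: bytes):
        slot = self._authorize(key_slot_id, Usage.SIGN)
        if slot is None:
            return None
        # HMAC is a stand-in for a real signature primitive (assumption).
        return hmac.new(slot["key"], message, hashlib.sha256).digest()

    def erase(self, key_slot_id):
        slot = self._slots[key_slot_id]
        slot["key"] = None
        slot["key_state"] = "ERASED"
        self.events.append(("KEY_ERASE", key_slot_id))


vault = KeyVault()
signer = vault.provision("slot-1", b"\x22" * 32, Usage.SIGN)
encryptor = vault.provision("slot-2", b"\x33" * 32, Usage.ENCRYPT)
tag = vault.sign(signer, b"frame-0001")
denied = vault.sign(encryptor, b"frame-0001")   # outside policy
vault.erase(signer)
after_erase = vault.sign(signer, b"frame-0002")
```

Every refusal leaves an event behind, so a denied operation is distinguishable from a missing key when reading field logs.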
Evidence contract (minimum fields)
| Field | What it proves | Typical event codes |
|---|---|---|
| key_slot_id | Which vault slot/handle performed the operation. | KEY_IMPORT_OK, KEY_IMPORT_FAIL |
| key_version | Rotation is traceable; old versions can be retired/revoked. | KEY_ROTATE, KEY_REVOKE |
| usage_flags | Operations are restricted by policy (encrypt-only / sign-only / attest-only). | KEY_USE_DENY_POLICY |
| key_state (ACTIVE/RETIRED/REVOKED/ERASED) | Lifecycle is enforced; decommission results in a permanent ERASED state. | KEY_ERASE |
| tamper_event_count / erase_event | Tamper/erase actions are not deniable; audit trails survive reboots. | TAMPER_TRIP, KEY_ERASE |
Access control model (who can request what)
- Boot chain: verification-only operations; no general-purpose decrypt/sign outside boot policy.
- Crypto engine: AEAD encrypt/decrypt and signing strictly by usage flags and context.
- Video pipeline app: receives only key handles/session handles; cannot export long-term keys.
H2-8. Stream Confidentiality (Session Keys, Nonce/IV Discipline)
Goal: protect the video stream’s confidentiality without a protocol deep dive. The real risk is not “choice of cipher,” but session key rotation and nonce/IV uniqueness across frames, reconnects, and reboots.
Encryption approach (concept-first)
- Prefer AEAD: a single pass provides confidentiality and integrity tags (conceptually reducing mismatch between encrypt and authenticate paths).
- Session-based keys: derive a session key for a session ID, then rotate by key epoch.
- Rotation triggers: by time, by frame count, or on reconnect; rotation must be explicit and logged.
Nonce/IV rules (the non-negotiables)
- Uniqueness per key: never reuse a nonce/IV under the same key epoch.
- Reboot recovery: after reboot, either restore nonce counters safely or force a new key epoch (do not restart from zero silently).
- Near-wrap handling: when nonce counter approaches its limit, rotate epoch before wraparound.
Evidence contract (minimum fields)
| Field / counter | What it proves | Typical event codes |
|---|---|---|
| session_id | Frames are tied to a specific session; helps prevent cross-session mixups. | SESSION_START |
| key_epoch | Key rotation is explicit; replay and decrypt mismatches become diagnosable. | EPOCH_ROTATE |
| nonce_counter | Nonce/IV uniqueness is maintained across frames; detects counter resets. | NONCE_COUNTER_RESET |
| rotation_count | Rotation policy is functioning and trackable in the field. | EPOCH_ROTATE |
| replay_drop_count (optional) | Replay resistance signal aligns with threat model and integrity checks. | REPLAY_DROP |
Session behavior checklist (field-auditable)
- On session start: log session_id and initialize key_epoch=0 with a fresh base nonce counter.
- Per frame: increment nonce_counter monotonically; include it in the protected metadata (concept).
- On rotation: increment key_epoch, reset nonce counter safely for the new epoch, and log EPOCH_ROTATE.
- On reboot: restore nonce counter or force a new epoch; never resume with reused nonces under the same epoch.
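The session checklist above can be sketched as a small nonce manager. This is an illustration under assumptions: the 96-bit nonce layout (32-bit epoch followed by a 64-bit counter), the rotate margin, and all names are hypothetical, and the sketch assumes each epoch is bound to its own derived key.

```python
class NonceManager:
    """Builds 96-bit nonces as epoch (4 bytes) || counter (8 bytes)."""

    COUNTER_BITS = 64

    def __init__(self, key_epoch=0, rotate_margin=1024):
        self.key_epoch = key_epoch
        self.nonce_counter = 0
        self.rotate_margin = rotate_margin
        self.events = []

    def rotate(self, reason):
        # A new epoch implies a new key, so restarting the counter is safe.
        self.key_epoch += 1
        self.nonce_counter = 0
        self.events.append(("EPOCH_ROTATE", self.key_epoch, reason))

    def next_nonce(self) -> bytes:
        limit = (1 << self.COUNTER_BITS) - self.rotate_margin
        if self.nonce_counter >= limit:
            self.rotate("near_wrap")   # rotate before wraparound, never after
        nonce = (self.key_epoch.to_bytes(4, "big")
                 + self.nonce_counter.to_bytes(8, "big"))
        self.nonce_counter += 1
        return nonce

    @classmethod
    def after_reboot(cls, persisted_epoch, persisted_counter=None):
        mgr = cls(key_epoch=persisted_epoch)
        if persisted_counter is None:
            # Counter state was lost: force a new epoch instead of restarting at zero.
            mgr.rotate("reboot_counter_lost")
        else:
            # Assumes the persisted value is at or ahead of the last nonce used.
            mgr.nonce_counter = persisted_counter
        return mgr


session = NonceManager()
first, second = session.next_nonce(), session.next_nonce()
rebooted = NonceManager.after_reboot(persisted_epoch=3)
```

Restoring a counter is only safe if the stored value can never lag behind what was actually used, for example by persisting a reserved block ahead of time. When that cannot be guaranteed, forcing a new epoch is the simpler path.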
H2-9. Stream Integrity (Auth Tags, Watermark, Replay Defense)
Goal: detect and stop insertion, deletion, tampering, and replay so modified content cannot “still play” without leaving evidence. Integrity must be measurable by counters and event codes, not assumptions.
Integrity scope (what must be caught)
- Tamper: frame payload or protected metadata changed.
- Splice: frames from another stream/session inserted.
- Drop/Insert: missing frames or injected frames (sequence gaps/duplicates).
- Replay: valid old frames resent to appear current.
Integrity granularity (engineering trade)
- Per-frame tag: best localization (“which frame failed”), higher overhead.
- Per-segment tag: lower overhead (e.g., per GOP/time slice), coarser localization (“which segment failed”).
- Rule of thumb: choose the smallest granularity that still supports field triage without ambiguity.
Replay defense (sequence number + window)
- Sequence number: every authenticated unit (frame or segment) carries a strictly increasing seq_number.
- Sliding accept window: allow limited reordering while rejecting duplicates and overly old seq values.
- On reject: drop content and log REPLAY_DROP with the offending seq.
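A sliding accept window of this kind is commonly implemented with a highest-seen sequence number plus a bitmap of recently accepted values. The sketch below shows one such arrangement. The class name and window size are assumptions, and it presumes the unit's authentication tag has already been verified before the sequence number is trusted.

```python
class ReplayWindow:
    """Sliding window replay check over authenticated sequence numbers."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1          # highest seq accepted so far
        self.bitmap = 0            # bit i set => (highest - i) already accepted
        self.replay_drop_count = 0
        self.last_bad_seq = None

    def accept(self, seq: int) -> bool:
        if seq > self.highest:
            # Window slides forward; older bits shift out of range.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size or (self.bitmap >> offset) & 1:
            # Too old for the window, or a duplicate of an accepted unit.
            self.replay_drop_count += 1
            self.last_bad_seq = seq
            return False
        self.bitmap |= 1 << offset   # late but new: accept and remember it
        return True


window = ReplayWindow(size=4)
results = [window.accept(seq) for seq in [0, 1, 1, 5, 3, 3, 1]]
```

With a window of 4, the late arrival of seq 3 is accepted once, its duplicate is dropped, and seq 1 is dropped as too old once seq 5 has been seen. The window size is the policy knob that trades tolerance for reordering against replay exposure.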
Optional watermark / signing (forensics-lite)
- Watermark: a trace marker (ON/OFF/FAIL) to support later attribution; it does not replace auth tags.
- Forensics signal: record watermark status changes and failures as events, not UI states.
Evidence contract (minimum fields)
| Field / counter | What it proves | Typical event codes |
|---|---|---|
| seq_number | Ordering and gap detection for insert/drop/splice symptoms. | SEQ_GAP_DETECTED |
| replay_drop_count (+ last_bad_seq) | Replay/duplicate rejection is active and measurable. | REPLAY_DROP |
| tag_verify_fail_count | Authenticated content modification is detected (payload/metadata tamper). | TAG_VERIFY_FAIL |
| integrity_granularity_mode (FRAME/SEGMENT) | Interpretation of failures is consistent (frame-level vs segment-level). | INTEG_MODE_SET |
| watermark_status (OFF/ON/FAIL) | Forensics marker is present and stable; failures are explicit. | WATERMARK_FAIL |
H2-10. Secure Time, Timestamps & Ordering Proof (Non-Repudiation Lite)
Goal: integrity needs more than tags—it needs ordering proof. Within page scope, “secure-ish time” means a trustworthy ordering signal built from time source state plus monotonic counters, not a full compliance/WORM system.
Secure-ish time sources (within scope)
- RTC with hold-up: useful when valid; must expose rtc_valid_flag.
- Monotonic counter fallback: preserves ordering even when RTC is invalid or stepped.
- Time state matters: every timestamp must be interpreted through time_source_state.
Timestamp strategy (capture vs encode vs transmit)
- capture_ts: when pixels are captured / delivered to pipeline.
- encode_ts: when the encoder outputs the authenticated unit.
- transmit_ts: when the protected unit is handed to transport.
- Why three: separates “sensor timing,” “pipeline buffering,” and “network scheduling” without a protocol deep dive.
Ordering proof (what to log)
- Sequence + monotonic time: seq_number proves frame order; monotonic time proves time order.
- Clock step detection: record any backward/large jump events as clock_step_events.
- Source switching: if RTC becomes invalid, log a source switch and continue with monotonic fallback.
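One way to detect clock steps is to compare how far the wall clock moved against how far a monotonic clock moved over the same interval. The sketch below takes both readings as arguments so the logic stays deterministic; in practice they might come from an RTC read and a monotonic tick source. The class name and the 2-second threshold are assumptions for illustration.

```python
class OrderingClock:
    """Tracks time-source state and counts wall-clock steps against monotonic time."""

    def __init__(self, step_threshold_s=2.0):
        self.step_threshold_s = step_threshold_s
        self.time_source_state = "RTC_OK"
        self.clock_step_events = 0
        self.events = []
        self._last = None  # (wall_s, mono_s) from the previous valid RTC sample

    def _set_state(self, state):
        if state != self.time_source_state:
            self.time_source_state = state
            self.events.append("TS_SOURCE_SWITCH")

    def observe(self, wall_s, mono_s, rtc_valid_flag=True):
        if not rtc_valid_flag:
            # RTC cannot be trusted: fall back to monotonic ordering only.
            self._set_state("MONO_ONLY")
            self._last = None
            return
        self._set_state("RTC_OK")
        if self._last is not None:
            wall_delta = wall_s - self._last[0]
            mono_delta = mono_s - self._last[1]
            # A step shows up as wall time moving differently from monotonic time.
            if abs(wall_delta - mono_delta) > self.step_threshold_s:
                self.clock_step_events += 1
                self.events.append("CLOCK_STEP_DETECTED")
        self._last = (wall_s, mono_s)


clock = OrderingClock()
clock.observe(wall_s=1000.0, mono_s=10.0)
clock.observe(wall_s=1001.0, mono_s=11.0)   # consistent: no step
clock.observe(wall_s=900.0, mono_s=12.0)    # wall clock jumped backward
clock.observe(wall_s=901.0, mono_s=13.0, rtc_valid_flag=False)
```

Because the comparison uses deltas, it flags both backward jumps and large forward jumps without needing an external time reference.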
Evidence contract (minimum fields)
| Field / flag | What it proves | Typical event codes |
|---|---|---|
| time_source_state (RTC_OK/RTC_INVALID/MONO_ONLY) | How timestamps should be interpreted and audited. | TS_SOURCE_SWITCH |
| rtc_valid_flag | RTC trust status is explicit; prevents silent misuse of invalid RTC time. | RTC_INVALID |
| clock_step_events | Detects time discontinuity that breaks naive “timestamp-only” proofs. | CLOCK_STEP_DETECTED |
| capture_ts / encode_ts / transmit_ts | Pinpoints where delay/reordering occurs (capture vs pipeline vs transmit). | TS_RECORD |
| seq_number + monotonic_counter_value | Ordering proof remains valid even if RTC is invalid or adjusted. | ORDER_PROOF |
H2-11. Security Telemetry & Field Debug Playbook (Symptom → Evidence → Fix)
This chapter turns security failures into a field-runnable SOP. Every path starts from two objective signals (counters / event codes), uses a discriminator to separate root causes (key vs time vs nonce vs signature/policy), then applies a minimal first fix (LKG rollback, session rebuild, epoch rotate, reprovision).
Evidence dictionary (minimum) + MPN toolbox (examples)
| Domain | Must-have fields / event codes | MPN examples (hardware anchors) |
|---|---|---|
| Boot / Update | BOOT_STAGE_ID, BOOT_FAIL_SIG, BOOT_FAIL_ROLLBACK, UPDATE_STATE, ACTIVE_SLOT, PENDING_SLOT, MONO_COUNTER_VALUE, LKG_BOOT_OK | SPI NOR flash: Winbond W25Q128JV, Macronix MX25L128; secure counter (example): Maxim/ADI DS28C36 |
| Key vault | KEY_SLOT_ID, KEY_VERSION, KEY_STATE, KEY_INVALID, KEY_USE_DENY_POLICY, PROVISION_STATE, TAMPER_TRIP | Secure element: Microchip ATECC608B, NXP SE050, Infineon OPTIGA™ Trust M (SLS32AIA), ST STSAFE-A110; TPM 2.0 (if used): Infineon SLB9670 |
| Session / Nonce | SESSION_ID, KEY_EPOCH, NONCE_COUNTER, NONCE_COUNTER_RESET, NONCE_REUSE_DETECTED, DECRYPT_FAIL_COUNT | Secure MCU (vault+TRNG in-chip, example): ST STM32H563, NXP i.MX RT685, Microchip SAM E70 |
| Integrity / Replay | SEQ_NUMBER, REPLAY_DROP_COUNT, TAG_VERIFY_FAIL_COUNT, SEQ_GAP_DETECTED, WATERMARK_STATUS | External FRAM for metadata (example): Fujitsu MB85RS64V, Infineon FM25V02A |
| Time / Ordering | TIME_SOURCE_STATE, RTC_VALID_FLAG, CLOCK_STEP_EVENTS, CAPTURE_TS, ENCODE_TS, TRANSMIT_TS | RTC with backup/trickle (example): Abracon AB1815, Microchip MCP7940N |
Notes: MPNs are examples for BOM concreteness; equivalent parts are acceptable. The SOP below relies on fields/event codes, not vendor-specific APIs.
Symptom A — Post-update no boot / reboot loop
Typical signature: after upgrade, the device fails early (no video) or reboots repeatedly before services start.
First 2 checks
| Check | Read | Interpretation |
|---|---|---|
| Boot fail class | BOOT_STAGE_ID + BOOT_FAIL_SIG/BOOT_FAIL_ROLLBACK | Locates the failing hop (ROM/BL/OS/App) and the reason (signature vs rollback). |
| Update state | UPDATE_STATE + ACTIVE_SLOT/PENDING_SLOT | Separates “package verify failed” vs “slot switch/commit failed”. |
Discriminator (hard evidence)
- Signature chain issue: BOOT_FAIL_SIG=1 and fails consistently at the same early stage → manifest/signing/descriptor mismatch.
- Rollback policy issue: BOOT_FAIL_ROLLBACK=1 with increased MONO_COUNTER_VALUE → anti-rollback gate blocks the image.
- A/B commit issue: UPDATE_STATE stuck at switch/commit → slot state machine not finalized (not a key problem).
- Policy/keys issue: boot passes signature but services fail later with KEY_USE_DENY_POLICY/KEY_INVALID → vault policy/provisioning mismatch.
First fix (minimal actions)
- Rollback to LKG: force boot from last-known-good slot (ACTIVE_SLOT=LKG), clear PENDING_SLOT, log LKG_BOOT_OK.
- Repair update state: reset UPDATE_STATE to a safe terminal state; do not keep retrying an unverified image.
- If rollback gate blocks: verify monotonic counter source (SE/TPM/counter IC). If counter is inconsistent, reprovision counter state (factory-safe process only).
Relevant MPN examples: secure counter/vault (DS28C36, ATECC608B, SE050), boot storage (W25Q128JV, MX25L128).
Symptom B — Boots but stream cannot decrypt / black screen
Typical signature: UI and services are up, but video shows black/garbled output or the receiver reports decrypt/auth failures.
First 2 checks
| Check | Read | Interpretation |
|---|---|---|
| Session health | SESSION_ID + DECRYPT_FAIL_COUNT | Confirms session-level failure vs sporadic transport loss. |
| Nonce/epoch continuity | KEY_EPOCH + NONCE_COUNTER (and reset/reuse flags) | Detects epoch mismatch, counter reset, or nonce reuse under same epoch. |
Discriminator
- Epoch mismatch: decrypt failures spike after reconnect/rotation while nonce counters look monotonic → check KEY_EPOCH alignment and session rekey logic.
- Nonce discipline broken: NONCE_COUNTER_RESET or NONCE_REUSE_DETECTED → treat as critical; keys must rotate.
- Vault policy denial: decrypt fails with KEY_USE_DENY_POLICY → key usage flags / caller identity mismatch (vault ACL).
- Time discontinuity side-effect: concurrent RTC_VALID_FLAG=false or increased CLOCK_STEP_EVENTS → ordering/window logic may reject valid units.
First fix
- Rebuild session: issue a new SESSION_ID, reset session state, and log SESSION_START.
- Force epoch rotate: increment KEY_EPOCH and reinitialize nonce counter for the new epoch.
- If nonce cannot be restored safely: rotate epoch immediately; never reuse nonce under the same epoch.
- If policy deny: correct vault usage policy (sign-only vs decrypt-only) before retrying.
Relevant MPN examples: key vault (NXP SE050, OPTIGA Trust M, STSAFE-A110), secure MCU crypto (STM32H563, i.MX RT685).
Symptom C — Sporadic integrity fail / replay drops spike
Typical signature: video may still appear, but counters show sudden growth in drops or verification failures, often after network changes or time adjustments.
First 2 checks
| Check | Read | Interpretation |
|---|---|---|
| Replay behavior | REPLAY_DROP_COUNT + last_bad_seq | Confirms window-based rejection (old/duplicate seq) and pinpoints the seq value. |
| Integrity behavior | TAG_VERIFY_FAIL_COUNT + SEQ_GAP_DETECTED | Separates “content/auth failure” from “ordering/gap anomalies”. |
Discriminator
- Replay-only spike: REPLAY_DROP_COUNT rises while TAG_VERIFY_FAIL_COUNT stays flat → likely reordering/window/time discontinuity.
- Auth failure spike: TAG_VERIFY_FAIL_COUNT rises → tamper/splice/mismatched key epoch (check vault + session sync).
- Gap dominant: frequent SEQ_GAP_DETECTED without tag failures → drop/insert symptoms or pipeline loss; tune evidence capture window to isolate when gaps start.
- Clock step correlation: increased CLOCK_STEP_EVENTS alongside drops → time base instability; do not rotate keys blindly.
First fix
- Stabilize ordering proof: correct time source state reporting; ensure monotonic fallback is active when RTC is invalid.
- Session rekey boundary: if drop spike follows reconnect, rebuild session and rotate KEY_EPOCH to prevent cross-epoch mixing.
- Increase localization if needed: switch integrity granularity to per-frame temporarily to locate the failure point (then revert if overhead is high).
Relevant MPN examples: metadata persistence (MB85RS64V, FM25V02A), RTC/backup (AB1815, MCP7940N).
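The Symptom C discriminator can be expressed as a comparison of two counter snapshots. The sketch below is one possible encoding: the function name and the finding labels are hypothetical, and SEQ_GAP_DETECTED is treated here as an occurrence count.

```python
WATCHED = ("REPLAY_DROP_COUNT", "TAG_VERIFY_FAIL_COUNT",
           "SEQ_GAP_DETECTED", "CLOCK_STEP_EVENTS")


def classify_symptom_c(before: dict, after: dict) -> list:
    """Compare two counter snapshots and name which discriminators apply."""
    delta = {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}
    findings = []
    if delta["TAG_VERIFY_FAIL_COUNT"] > 0:
        findings.append("AUTH_FAILURE_SPIKE")
    elif delta["REPLAY_DROP_COUNT"] > 0:
        # Replay drops with flat tag failures point at ordering, not content.
        findings.append("REPLAY_ONLY_SPIKE")
    if delta["SEQ_GAP_DETECTED"] > 0 and delta["TAG_VERIFY_FAIL_COUNT"] == 0:
        findings.append("GAP_DOMINANT")
    if delta["CLOCK_STEP_EVENTS"] > 0 and delta["REPLAY_DROP_COUNT"] > 0:
        findings.append("CLOCK_STEP_CORRELATION")
    return findings


baseline = {name: 0 for name in WATCHED}
after_time_adjust = {"REPLAY_DROP_COUNT": 12, "CLOCK_STEP_EVENTS": 2}
after_tamper = {"TAG_VERIFY_FAIL_COUNT": 3, "REPLAY_DROP_COUNT": 1}
time_case = classify_symptom_c(baseline, after_time_adjust)
tamper_case = classify_symptom_c(baseline, after_tamper)
```

Working from deltas between snapshots, not absolute values, keeps the classification meaningful on devices whose counters have been accumulating for months.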
Symptom D — After key rotation, all streams drop
Typical signature: rotation event is issued, then every receiver loses the stream until manual intervention.
First 2 checks
| Check | Read | Interpretation |
|---|---|---|
| Vault version state | KEY_VERSION + KEY_STATE (ACTIVE/RETIRED/REVOKED) | Confirms whether the new key is truly ACTIVE or stuck in a pending state. |
| Session alignment | KEY_EPOCH + SESSION_ID (post-rotation) | Detects “key rotated but session not rebuilt” or epoch not advanced. |
Discriminator
- Rotation not committed: rotation event exists but KEY_VERSION does not change / KEY_STATE not ACTIVE → vault lifecycle commit failed.
- Epoch not advanced: KEY_VERSION updated but KEY_EPOCH unchanged → session state still bound to old epoch.
- Policy mismatch: new key is ACTIVE but KEY_USE_DENY_POLICY rises → wrong usage flags for encrypt/sign operations.
- Provisioning regression: PROVISION_STATE unexpectedly changes (e.g., “factory-like”) → provisioning state machine error.
First fix
- Activate the correct key version: enforce ACTIVE state for the intended KEY_VERSION; retire prior version safely.
- Rebuild session + rotate epoch: new SESSION_ID, KEY_EPOCH++, nonce reinit for new epoch.
- Correct policy flags: fix usage flags before restarting streams (avoid repeated deny loops).
- Fallback safety: if the vault cannot confirm ACTIVE key, rollback to LKG behavior until reprovision completes.
Relevant MPN examples: vault/TPM (ATECC608B, SE050, SLB9670, STSAFE-A110).
H2-12. FAQs ×12 (Evidence-Driven Accordion)
Each answer stays within on-device video pipeline security and maps back to the page’s evidence chain (event codes/counters). Use the “Evidence” line to request exactly the right logs from the field.
Verified boot vs measured boot—which one prevents malware from persisting?
Answer: Verified boot blocks persistence by refusing to execute the next stage unless signature/manifest gates pass. Measured boot can still boot but records what was loaded for later attestation or audit. If the goal is “malware must not survive reboot,” the enforceable gate is verified boot at each hop in the chain.
- What to check: BOOT_STAGE_ID and BOOT_FAIL_SIG (which hop failed, and why).
- First fix: convert “measure-only” stages into “verify-and-block” gates, with an LKG fallback.
Firmware is signed, but device still runs altered components—what did we forget to cover in the manifest?
Answer: Signing “one image” is not enough if the manifest does not bind every security-relevant component to a hash and policy. Common misses include configuration/policy blobs, calibration/security settings, ML models, plugin modules, or per-board feature flags. The manifest must explicitly list each component hash plus version, HW ID, and minimum allowed version.
- What to check: manifest_version and component_hash_list completeness (do all critical blobs appear?).
- First fix: extend the manifest to cover config/policy/model blobs, then enforce verification before load.
Update bricked devices only sometimes—brownout during commit or rollback counter mismatch?
Answer: Intermittent bricks usually separate into (1) commit/switch interrupted (power dip mid-write) or (2) anti-rollback gate rejecting after a counter step. If devices die at a consistent update phase, it is a state machine/atomicity problem; if boot rejects with rollback evidence, it is counter/policy mismatch. Start with update state and slot markers.
- What to check: UPDATE_STATE plus ACTIVE_SLOT/PENDING_SLOT, then BOOT_FAIL_ROLLBACK/MONO_COUNTER_VALUE.
- First fix: enforce “either commit or revert to LKG,” and log the last failed step.
Stream decrypt fails after reboot—nonce/IV reuse or lost key epoch?
Answer: After reboot, decrypt failures commonly come from nonce/IV reuse under the same epoch (critical) or epoch/session desynchronization (recoverable). If the nonce counter reset or reuse is detected, treat it as unsafe to continue and rotate epoch immediately. If epoch is simply mismatched, rebuild the session and align epoch before resuming.
- What to check:
NONCE_COUNTER_RESET/NONCE_REUSE_DETECTED and KEY_EPOCH (plus DECRYPT_FAIL_COUNT trend).
- First fix: new SESSION_ID and KEY_EPOCH++; never reuse a nonce under the same epoch.
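One way to keep nonces unique across reboots is to fold the epoch into the nonce and advance the epoch on every boot. The sketch below assumes a 96-bit nonce laid out as a 32-bit epoch followed by a 64-bit counter, and assumes the caller persists the new epoch to non-volatile storage before any frame is sent; the NonceSource class is hypothetical.

```python
import struct

class NonceSource:
    """Builds 96-bit nonces as epoch (32 bits) || counter (64 bits)."""

    def __init__(self, persisted_epoch: int):
        # Advance the epoch on every boot so a counter reset cannot repeat a nonce
        self.epoch = persisted_epoch + 1
        self.counter = 0

    def next_nonce(self) -> bytes:
        if self.counter >= 2**64:
            raise OverflowError("counter exhausted; rotate epoch before continuing")
        nonce = struct.pack(">IQ", self.epoch, self.counter)
        self.counter += 1
        return nonce

# Simulated reboot: the counter restarts at 0, but the epoch has moved on
before = NonceSource(persisted_epoch=4)
first_boot = [before.next_nonce() for _ in range(3)]
after = NonceSource(persisted_epoch=before.epoch)
second_boot = [after.next_nonce() for _ in range(3)]
print(set(first_boot).isdisjoint(second_boot))
```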
Integrity checks fail only on some clients—tag mismatch or replay window tuning?
Answer: If only some clients fail, separate “authentication mismatch” from “window/order rejection.” A rising tag-verify failure count indicates a key/epoch/tag computation mismatch. A rising replay-drop count with stable tag failures points to sequence-window tuning or time discontinuity. Do not rotate keys blindly; first confirm which counter is moving and whether time steps correlate.
- What to check:
TAG_VERIFY_FAIL_COUNT vs REPLAY_DROP_COUNT and the last_bad_seq value.
- First fix: if replay-only, stabilize time/order proof then adjust window; if tag-fail, rebuild session/epoch.
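The distinction between the two counters can be illustrated with a receiver that authenticates first and applies the replay window second. This is a sketch under assumptions: HMAC-SHA256 stands in for the AEAD tag, the window is a simple set of accepted sequence numbers, and the StreamVerifier class is hypothetical.

```python
import hashlib
import hmac

class StreamVerifier:
    def __init__(self, key: bytes, window: int = 64):
        self.key = key
        self.window = window
        self.highest_seq = -1
        self.seen = set()                 # accepted sequence numbers still inside the window
        self.tag_verify_fail_count = 0
        self.replay_drop_count = 0
        self.last_bad_seq = None

    def tag(self, seq: int, payload: bytes) -> bytes:
        # Stand-in for an AEAD tag: the sequence number is bound into the MAC
        return hmac.new(self.key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()

    def accept(self, seq: int, payload: bytes, tag: bytes) -> bool:
        # Authenticate first so forged packets cannot move the window
        if not hmac.compare_digest(self.tag(seq, payload), tag):
            self.tag_verify_fail_count += 1
            self.last_bad_seq = seq
            return False
        # Duplicate, or older than the window allows
        if seq in self.seen or seq <= self.highest_seq - self.window:
            self.replay_drop_count += 1
            self.last_bad_seq = seq
            return False
        self.seen.add(seq)
        if seq > self.highest_seq:
            self.highest_seq = seq
            self.seen = {s for s in self.seen if s > self.highest_seq - self.window}
        return True
```

A client whose replay_drop_count climbs while tag_verify_fail_count stays flat is seeing ordering or window problems, not a key mismatch.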
How often should session keys rotate for long-running CCTV streams?
Answer: Rotate session keys often enough to limit exposure, but not so often that reconnection/rotation becomes the dominant failure source. Practical triggers are time-based, frame-count-based, or reconnect-based rotation. The critical requirement is that every rotation advances the key epoch, resets nonce under the new epoch, and is logged so field teams can correlate outages with rotation events.
- What to check:
KEY_EPOCH progression and rotation_count (plus SESSION_ID changes).
- First fix: define a rotation policy with explicit logging and a safe reconnection path (rebuild session on rotate).
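A rotation policy with the three triggers named above might look like the following sketch. The thresholds (one hour, one million frames) are illustrative placeholders rather than recommendations, and the Session and RotationPolicy types are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RotationPolicy:
    max_age_s: float = 3600.0     # placeholder value
    max_frames: int = 1_000_000   # placeholder value

@dataclass
class Session:
    session_id: int = 1
    key_epoch: int = 0
    nonce_counter: int = 0
    frames: int = 0
    started_at: float = 0.0
    rotation_count: int = 0

def maybe_rotate(s: Session, policy: RotationPolicy, now: float,
                 reconnected: bool = False, log: list = None) -> bool:
    """Rotate on reconnect, age, or frame count; return True if a rotation happened."""
    if reconnected:
        reason = "reconnect"
    elif now - s.started_at >= policy.max_age_s:
        reason = "time"
    elif s.frames >= policy.max_frames:
        reason = "frames"
    else:
        return False
    s.key_epoch += 1        # every rotation advances the epoch
    s.session_id += 1       # and rebuilds the session
    s.nonce_counter = 0     # nonce restarts only under the new epoch
    s.frames = 0
    s.started_at = now
    s.rotation_count += 1
    if log is not None:
        log.append({"event": "KEY_ROTATE", "reason": reason,
                    "KEY_EPOCH": s.key_epoch, "SESSION_ID": s.session_id, "ts": now})
    return True
```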
How do you prove “this video is from this device” without deep compliance scope?
Answer: Use a device-unique identity key that never leaves the vault and sign lightweight evidence for provenance: device ID, key version, sequence range, and monotonic time state. This is “non-repudiation lite” without full compliance/WORM scope. Store the signature and time-source state alongside the stream metadata so audits can link content to a specific device key.
- What to check:
KEY_SLOT_ID/KEY_VERSION and TIME_SOURCE_STATE (plus SEQ_NUMBER range).
- First fix: sign and log provenance metadata at session start and at periodic checkpoints; keep keys non-exportable.
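The provenance record can be sketched as a canonical payload handed to a signer that lives inside the vault. In this sketch the signer is passed in as sign_fn, and demo_sign is only an HMAC stand-in so the example runs with the standard library; a real device would call an asymmetric signing operation inside the secure element, which is what allows third-party verification. All names are hypothetical.

```python
import hashlib
import hmac
import json

def build_provenance(device_id, key_slot_id, key_version,
                     seq_first, seq_last, time_source_state, sign_fn):
    record = {
        "device_id": device_id,
        "KEY_SLOT_ID": key_slot_id,
        "KEY_VERSION": key_version,
        "seq_range": [seq_first, seq_last],
        "TIME_SOURCE_STATE": time_source_state,
    }
    # Canonical encoding so signer and verifier process identical bytes
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return {"record": record, "signature": sign_fn(payload).hex()}

def demo_sign(payload: bytes) -> bytes:
    # Stand-in only: HMAC is not a signature and gives no third-party proof
    return hmac.new(b"demo-only-key", payload, hashlib.sha256).digest()

checkpoint = build_provenance("cam-0001", 2, 5, 1000, 1999, "SYNCED", demo_sign)
```

Storing the record next to the stream metadata lets an auditor tie a sequence range to one key version.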
Where should anti-rollback counters live to survive storage replacement?
Answer: Anti-rollback only works if the monotonic counter cannot be reset by swapping flash/eMMC. The counter should live in a tamper-resistant store (SE/TPM/secure counter) or be cryptographically bound to device identity so replacement does not reset state. When the counter gate triggers, logs must show the counter value and the minimum allowed version.
- What to check:
MONO_COUNTER_VALUE and BOOT_FAIL_ROLLBACK (plus policy “min allowed version” field).
- First fix: migrate counters into a secure element/counter IC (e.g., ATECC608B, SE050, DS28C36) and enforce it in boot.
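The gate logic and the evidence it should log can be sketched as follows. This assumes the monotonic counter stores the highest committed firmware version, which is one common convention but not the only one; reading the counter from a secure element is outside the sketch, and both function names are hypothetical.

```python
def rollback_gate(image_version: int, min_allowed_version: int, mono_counter_value: int):
    """Return (allowed, event). The event carries the evidence the answer asks for."""
    floor = max(min_allowed_version, mono_counter_value)
    if image_version < floor:
        return False, {
            "event": "BOOT_FAIL_ROLLBACK",
            "MONO_COUNTER_VALUE": mono_counter_value,
            "min_allowed_version": min_allowed_version,
            "image_version": image_version,
        }
    return True, None

def commit_version(image_version: int, mono_counter_value: int) -> int:
    """New counter value after a successful boot; the counter never moves backward."""
    return max(mono_counter_value, image_version)
```

Because the floor comes from the counter rather than from flash contents, swapping the storage device does not lower it.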
TRNG health fails—what’s a safe degrade mode?
Answer: If TRNG health fails, the safe response is to enter a constrained mode: stop generating new long-term identity keys, shorten session lifetimes, increase reseed frequency, and log every entropy failure with a monotonic timestamp. The goal is to prevent silent weak-key generation. Resume full mode only after health recovers or after a controlled maintenance action.
- What to check:
entropy_status (health flag) and DRBG_reseed_counter trend.
- First fix: switch to degrade policy: disable new root keys, force rapid session rekey, and raise an alarm event.
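The degrade mode can be expressed as a policy switch driven by the health flag. The sketch below uses placeholder lifetimes and intervals (not recommendations) and a hypothetical ENTROPY_FAIL event name; time.monotonic() stands in for the device's monotonic timestamp source.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class CryptoPolicy:
    allow_new_identity_keys: bool
    session_lifetime_s: int
    reseed_interval_s: int
    alarm: bool

# Placeholder numbers for illustration only
FULL = CryptoPolicy(allow_new_identity_keys=True, session_lifetime_s=3600,
                    reseed_interval_s=600, alarm=False)
DEGRADED = CryptoPolicy(allow_new_identity_keys=False, session_lifetime_s=300,
                        reseed_interval_s=60, alarm=True)

def on_health_check(entropy_ok: bool, maintenance_cleared: bool, events: list) -> CryptoPolicy:
    """Pick the active policy; every entropy failure is logged with a monotonic timestamp."""
    if not entropy_ok:
        events.append({"event": "ENTROPY_FAIL", "mono_ts": time.monotonic()})
    # Full mode resumes only on recovered health or a controlled maintenance action
    if entropy_ok or maintenance_cleared:
        return FULL
    return DEGRADED
```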
How to lock debug ports without breaking manufacturing test?
Answer: Use lifecycle gating: allow controlled debug only in factory states, then permanently restrict or require signed authorization after provisioning. The field state should treat debug unlock as a security event with audit logging. This preserves manufacturing test while preventing post-deploy extraction of keys or firmware modification. Make the state transition explicit and verifiable in logs.
- What to check:
PROVISION_STATE and a debug control event (e.g., DEBUG_UNLOCK_EVENT), plus TAMPER_TRIP if present.
- First fix: enforce “factory-only unlock,” then lock/fuse after provisioning; log every unlock attempt and outcome.
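Lifecycle gating reduces to a small decision function plus an audit entry for every attempt. The sketch assumes two lifecycle states and a signed_auth_valid flag whose signature check happens elsewhere; the enum values and function name are hypothetical.

```python
from enum import Enum

class ProvisionState(Enum):
    FACTORY = "factory"
    FIELD = "field"

def request_debug_unlock(state: ProvisionState, signed_auth_valid: bool, audit_log: list) -> bool:
    """Allow debug freely in factory state; in the field only with signed authorization."""
    allowed = state is ProvisionState.FACTORY or signed_auth_valid
    # Every attempt is logged, including denied ones
    audit_log.append({
        "event": "DEBUG_UNLOCK_EVENT",
        "PROVISION_STATE": state.value,
        "signed_auth": signed_auth_valid,
        "outcome": "granted" if allowed else "denied",
    })
    return allowed
```

A device that fuses debug off permanently would simply never pass signed_auth_valid=True in the field state.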
Key rotation caused fleet outage—what are the top two logs to request first?
Answer: Request one vault log and one session log. From the vault: confirm which key version is ACTIVE and whether any policy denials occurred. From the session: confirm epoch advancement and the session restart point. These two logs quickly separate “rotation not committed” from “session not rebuilt” and avoid days of guessing across devices and clients.
- What to check:
KEY_VERSION/KEY_STATE/KEY_USE_DENY_POLICY and KEY_EPOCH/SESSION_ID at rotation time.
- First fix: activate correct key version, rebuild session, and enforce KEY_EPOCH++ on rotation boundaries.
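The two-log triage can be sketched as a function that reads the vault fields first and the session fields second. The dictionary layout, including the before/after pairs for KEY_EPOCH and SESSION_ID, is a hypothetical shape for illustration.

```python
def triage_rotation(vault_log: dict, session_log: dict, expected_version: int) -> str:
    """Separate 'rotation not committed' from 'session not rebuilt'."""
    vault_ok = (
        vault_log["KEY_STATE"] == "ACTIVE"
        and vault_log["KEY_VERSION"] == expected_version
        and not vault_log["KEY_USE_DENY_POLICY"]
    )
    if not vault_ok:
        return "rotation not committed"
    epoch_advanced = session_log["KEY_EPOCH_after"] > session_log["KEY_EPOCH_before"]
    session_rebuilt = session_log["SESSION_ID_after"] != session_log["SESSION_ID_before"]
    if not (epoch_advanced and session_rebuilt):
        return "session not rebuilt"
    return "consistent"
```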
Encryption enabled but bandwidth/latency worsened—what to measure first?
Answer: Measure where time grows in the pipeline before blaming “encryption overhead.” Compare capture→encode, encode→protect, and protect→send timestamps. Then check whether frequent rotation/reconnect is forcing repeated session setup. If protect→send expands only during rotation windows, the bottleneck is session churn; if it is constant, the crypto engine path may be the limiter.
- What to check:
ENCODE_TS vs TRANSMIT_TS deltas and rotation_count/KEY_EPOCH correlation.
- First fix: reduce unnecessary rekeys, make rotation deterministic, and prefer hardware crypto paths when available.
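The per-stage comparison can be sketched with integer millisecond timestamps per frame. CAPTURE_TS, PROTECT_TS, and the in_rotation_window flag are hypothetical fields added for illustration, and the 2x ratio is an arbitrary threshold.

```python
from statistics import mean

def stage_deltas(frame: dict) -> dict:
    """Time spent in each pipeline hop for one frame."""
    return {
        "capture_to_encode": frame["ENCODE_TS"] - frame["CAPTURE_TS"],
        "encode_to_protect": frame["PROTECT_TS"] - frame["ENCODE_TS"],
        "protect_to_send": frame["TRANSMIT_TS"] - frame["PROTECT_TS"],
    }

def diagnose(frames: list, ratio: float = 2.0) -> str:
    """Compare protect-to-send time inside rotation windows against steady state."""
    rot = [stage_deltas(f)["protect_to_send"] for f in frames if f["in_rotation_window"]]
    steady = [stage_deltas(f)["protect_to_send"] for f in frames if not f["in_rotation_window"]]
    if rot and steady and mean(rot) > ratio * mean(steady):
        return "session churn"
    return "constant cost; inspect crypto engine path"
```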