
Lighting Protection & Metering: eFuse, Event Logs, Telemetry


Lighting protection & metering is about making surges/ESD/faults survivable while turning every protection action into auditable evidence (waveforms + event logs + counters). With a coordinated clamp→limit→disconnect chain and crash-safe logging, field failures become measurable, explainable, and serviceable instead of guesswork.

H2-1. Page Mission & System Boundary (What We Protect, What We Prove)

This page treats protection as a controlled action and metering/logging as the proof layer. The goal is not only to survive transients, but to explain what happened (event), what the hardware did (action), and what the product experienced (outcome) using a consistent evidence schema.

Protection = action · Metering/Logs = evidence · Outcome = symptom + drift

1) Protection Domains (what must be protected)

  • Input domain (AC/DC/PoE feed): highest energy exposure; dominated by surge/lightning coupling and line disturbances.
  • Bus domain (DC link & distribution): vulnerable to UVLO/brownout and repetitive burst disturbances; often where resets begin.
  • Load domain (LED strings / wiring): open/short events, hot-plug transients, and reverse energy injection from long cable runs.
  • Logic & I/O domain (MCU, aux rails, PHY-side rails): low-energy but fragile; ESD/EFT can create silent leakage or repeated resets.

Domain separation exists to decide where energy is allowed to go and which parts may sacrificially fail (clamps) versus which parts must survive (switching silicon, metering core, and control rails).

2) Evidence Chain (what must be proven)

  • Event: surge / ESD / EFT (burst) / lightning-like common-mode coupling.
  • Action: clamp, current-limit, disconnect, auto-retry, latch-off (and any thermal foldback behavior).
  • Outcome: brownout/reset, hard failure, degraded brightness, abnormal heating, communication errors, parameter drift.

A useful system makes the action observable and the outcome attributable. That requires both a fast protection path (trip) and a persistent proof path (counters + logs + timestamps).

3) Deliverables from This Page (what the reader can reuse)

  • Reference architecture: clamp → limit/disconnect → sense → event log → telemetry export.
  • Protection action taxonomy: how “limit vs latch vs retry” decisions map to survivability and serviceability.
  • Minimum evidence schema: which fields must exist so a transient can be reconstructed after the fact.
  • Verification evidence points: the two most valuable waveforms and the log alignment rules.

[Figure F1 diagram: Threats (surge/ESD/EFT) → Clamp (TVS/MOV/GDT) → Limit (eFuse/HSS) → Sense (fast protect + slow metering) → Event log (counters + timestamp + reason) → Telemetry (I²C/PMBus/UART) → Outcome (reset/brownout, degrade, comms errors, parameter drift)]
Figure F1. A luminaire becomes diagnosable when protection actions (clamp/limit/disconnect) are measured and persisted as evidence (counters + timestamps + reasons) that can be exported for service.

H2-2. Threat Model for Luminaires (Surge/ESD/EFT/Lightning vs Real Symptoms)

Standards terminology is only useful when it predicts field behavior. The practical model is symptom → likely threat class → first evidence points. The same symptom can be produced by different threats, and the same threat can produce both catastrophic failures and “still works but degraded” outcomes—so evidence must be designed in.

1) Symptom groups (start from what is observable)

  • Reset / reboot: repeated restarts, flickering on/off, intermittent recovery after power cycle.
  • Hard fail: no light output, no response, or protective shutdown that never recovers.
  • Degrade: still turns on, but runs hotter, dimmer, or derates earlier than before.
  • Comms errors: sporadic CRC/errors, missing acknowledgements, unstable auxiliary interfaces.
  • Drift: thresholds shift, current accuracy changes, nuisance trips increase over time.

“Degrade” and “drift” are the costly failures: the product appears functional but silently loses margin. Without metering + event logs, these cases are easily misattributed to general design quality rather than transient exposure.

2) Threat classes (translate into damage mechanism + signature)

  • Surge: high-energy line transient → clamp stress, silicon overstress, latch-up, and parameter shift.
    Typical signature: clamp activation + bus voltage dip + current-limit/disconnect event.
  • ESD: localized discharge at I/O/connector → latent leakage or intermittent interface weakness.
    Typical signature: I/O fault flags + rising comms error count + unexplained standby current.
  • EFT/Burst: repetitive fast transients → repeated UVLO/brownout, reset storms, nuisance trips.
    Typical signature: reset_reason + brownout counter + short-duration undervoltage events.
  • Lightning-like common-mode coupling: large CM energy → ground reference shift and multi-rail anomalies.
    Typical signature: simultaneous anomalies across domains + elevated event density around the incident time window.

3) Evidence-first triage (the “two waveforms” rule)

  • Waveform #1: input/bus voltage after the clamp stage (captures residual stress and brownout depth).
  • Waveform #2: eFuse/high-side switch current limit or disconnect timing (captures the protective action).

If only one log can be kept, preserve: event_type, peak (V/I), duration, trip_action, rail_id, temp_bin, and a monotonic time base. These fields allow reconstruction even without absolute wall-clock time.
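
The field list above can be sketched as a single record type. A minimal Python sketch, assuming illustrative names and units (`EventRecord` is not a vendor schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventRecord:
    # Minimal evidence schema: enough to reconstruct a transient
    # even without absolute wall-clock time.
    event_type: str      # "surge" / "esd" / "eft" / "uv" / "ov" / "oc" / "ot"
    peak: float          # peak stress, V or I depending on event_type
    duration_us: float   # how long the event exceeded threshold
    trip_action: str     # "limit" / "latch" / "retry" / "none"
    rail_id: int         # which domain/rail observed the event
    temp_bin: int        # coarse temperature bucket at event time
    t_mono_ms: int       # monotonic time base (ordering, not wall clock)

def ordered(events):
    """Reconstruct the incident timeline from the monotonic time base."""
    return sorted(events, key=lambda e: e.t_mono_ms)
```

Because `t_mono_ms` is monotonic rather than absolute, the timeline survives RTC loss: ordering and spacing remain provable even when wall-clock time does not.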

[Figure F2 diagram: symptom-by-threat matrix (Reset/brownout, Hard fail, Degrade, Comms errors × Surge, ESD, EFT/Burst, Lightning-CM), each cell listing the first evidence fields or captures to prioritize]
Figure F2. A field-first threat model: start from the symptom, then prioritize the smallest set of waveforms and log fields that can prove the likely transient class and the protection action taken.

H2-3. Protection Chain Topology (Clamp → Limit → Disconnect → Recover)

A reliable luminaire protection design is not “adding more parts.” It is a repeatable 4-stage action chain: Clamp controls peak stress, Limit controls energy, Disconnect defines the survival boundary, and Recover defines serviceability. Each stage must leave a traceable log signature.

Clamp = peak control · Limit = energy control · Disconnect = boundary · Recover = maintainability

1) The 4 stages and what they guarantee

  • Clamp (TVS/MOV/GDT): reduces voltage spikes to a defined ceiling; may be sacrificial under extreme energy.
  • Limit (eFuse/HSS): enforces a programmable current/energy envelope; prevents “long-duration stress.”
  • Disconnect (latch-off / ORing / reverse block): isolates the downstream rail when the envelope is exceeded.
  • Recover (auto-retry vs latched + service): defines whether the product self-heals or requires intervention.

Practical rule: Clamp decides peak, Limit decides energy, Disconnect decides survival, and Recover decides user experience. If any stage is missing, the “weakest stage” becomes unpredictable silicon damage.

2) Sacrificial vs recoverable: when each is required

  • Sacrificial-first (clamps accept damage): when external energy is unbounded or line coupling is harsh (outdoor, long cable runs).
  • Recoverable-first (eFuse/HSS self-manages): when downtime/false trips are costly (commercial, networked nodes, frequent switching).
  • Hybrid is common: clamp absorbs peak; eFuse limits energy and sets the action policy.

The design objective is not “never fail.” It is controlled failure or controlled recovery, with evidence to prove which occurred.

3) Coupling to metering/logging: actions must leave evidence

  • Counters (count): how often protection activates (event density).
  • Peak & duration (peak, duration): how severe each event was (stress intensity).
  • Reason & action (trip_reason, trip_action): why the stage engaged and what it did (limit vs disconnect vs retry).

Minimum log fields that make the chain diagnosable: event_type, trip_reason, trip_action, peak_v/peak_i, duration, count, rail_id, temp_bin, plus a monotonic time base to correlate repeated bursts.
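
One way to enforce this minimum is a stage-to-proof-fields map plus a completeness check. A hedged sketch, where the `CHAIN` mapping and `diagnosable` helper are illustrative:

```python
# Hypothetical mapping of chain stage -> (what it decides, proof fields it emits).
CHAIN = {
    "clamp":      ("peak",     ["event_type", "peak_v"]),
    "limit":      ("energy",   ["peak_i", "duration"]),
    "disconnect": ("survival", ["trip_action", "trip_reason"]),
    "recover":    ("service",  ["count", "temp_bin"]),
}

# Union of all stage proof fields, plus localization and timeline fields.
REQUIRED_FIELDS = {f for _, fields in CHAIN.values() for f in fields} \
    | {"rail_id", "time_base"}

def diagnosable(record: dict) -> bool:
    """A log record is diagnosable only if every stage's proof fields exist."""
    return REQUIRED_FIELDS.issubset(record)
```

A check like this can run at build time against the log schema, so a missing field is caught before field units ship.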


[Figure F3 diagram: Clamp (TVS/MOV/GDT) → Limit (eFuse/HSS) → Disconnect (latch/ORing) → Recover (retry/service), each stage emitting proof fields (event_type, trip_reason, trip_action, peak_v/peak_i, duration, count, rail_id, time_base) toward the outcome bus (reset/brownout, degrade, hard fail, drift)]
Figure F3. A generic protection chain template: each stage must have a defined action and emit minimal proof fields (reason, peak, duration, count) so field failures can be attributed rather than guessed.

H2-4. eFuse / High-Side Switch Deep Dive (SOA, Inrush, Fault Modes)

An eFuse/high-side switch is a programmable power-path protector: it shapes inrush, enforces a current/energy envelope, and exposes state and reason codes so that protection becomes measurable. In lighting, it effectively upgrades the fuse from a one-time disconnect into a state machine with evidence.

1) Capabilities that matter in luminaires

  • Inrush shaping: soft-start, dv/dt control, hot-plug handling to avoid nuisance trips.
  • Fault handling: programmable current limit, short-circuit response, thermal protection, OV/UV gating.
  • Energy direction: reverse current block / ideal-diode ORing behavior to prevent back-feed damage.
  • Observability: fault pins + telemetry registers (reason, peak, counters) to build an evidence trail.

2) SOA + repetition: why “survived once” can still fail later

  • Peak vs energy: a clamp handles peak; the eFuse limits time-integrated stress on the pass FET.
  • Thermal accumulation: dense auto-retry cycles can create heat oscillation (cool → retry → re-trip).
  • Field diagnosis requirement: log retry_count_window, duration, and temp_bin to correlate failures with event density and recovery time.

Mechanism-first takeaway: repeated disturbances turn into failure when the protection policy allows too much “on-time under stress” before a full disconnect, or retries are too aggressive for the thermal time constant.
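
The retry-vs-latch escalation can be sketched as a sliding-window rule. The thresholds below (1 s window, 3 retries) are illustrative placeholders, not real device thermal constants:

```python
from collections import deque

class RetryPolicy:
    """Escalate auto-retry to latch-off when retries cluster faster than
    the pass FET can cool. Window and cap are illustrative values."""
    def __init__(self, window_ms=1000, max_retries_in_window=3):
        self.window_ms = window_ms
        self.max_retries = max_retries_in_window
        self._retries = deque()  # monotonic timestamps of recent retries

    def on_trip(self, t_mono_ms: int) -> str:
        self._retries.append(t_mono_ms)
        # Drop retries that have fallen out of the thermal window.
        while self._retries and t_mono_ms - self._retries[0] > self.window_ms:
            self._retries.popleft()
        # Too many retries inside the window: stop self-healing, latch off.
        return "latch" if len(self._retries) > self.max_retries else "retry"
```

Logging the same window count as `retry_count_window` makes the policy decision reproducible from the evidence trail.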

3) Failure modes (and what the evidence should look like)

  • Nuisance trips: thresholds too tight or sensing noise → elevated count with short duration, weak peaks.
  • Too-slow disconnect: limit response not fast enough → high peak_i and long duration before trip_action.
  • Clamp dies first: clamp absorbs energy because limit is ineffective → rising clamp_count and higher residual peak_v.
  • Reverse energy injection: missing reverse block/ORing → abnormal reverse events and downstream rail anomalies.

These signatures map directly to state transitions in the eFuse state machine below; logs should capture reason, action, peak, and duration per transition.

4) Interfaces to MCU / metering SoC (make trips measurable)

  • Hardware pins: FAULT / PG / EN / IMON (align waveforms with log commits).
  • Digital telemetry: I²C/PMBus registers for reason codes, counters, temperature, and configuration.
  • Sampling sync: trip edge triggers “pre/post” capture and a crash-safe log commit.
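
The transitions described in this section can be modeled in a few lines. A simplified sketch, where `EFuseModel` is illustrative and collapses real devices' timing and debounce details:

```python
class EFuseModel:
    """Illustrative eFuse state machine: IDLE -> INRUSH -> RUN, with each
    fault transition emitting the minimum proof fields."""
    def __init__(self):
        self.state = "IDLE"
        self.log = []  # one proof record per transition

    def enable(self):
        if self.state == "IDLE":
            self.state = "INRUSH"  # dv/dt control + soft-start phase

    def soft_start_done(self):
        if self.state == "INRUSH":
            self.state = "RUN"

    def fault(self, reason, peak_i, duration_us, persistent=False):
        """OCP/OTP/UV/REV detection while running; persistent faults latch."""
        if self.state != "RUN":
            return
        action = "latch" if persistent else "limit"
        self.state = "LATCHED" if persistent else "LIMIT"
        self.log.append({"trip_reason": reason, "trip_action": action,
                         "peak_i": peak_i, "duration_us": duration_us,
                         "count": len(self.log) + 1})
```
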

[Figure F4 diagram: eFuse/HSS state machine, Idle → Inrush (dv/dt + soft-start) → Run → Fault detect (OCP/OTP/UV/REV) → Limit → Latch-off/Auto-retry, with minimum proof fields (trip_reason, trip_action, peak_i/peak_v, duration, count, rail_id, temp_bin, time_base) emitted per transition]
Figure F4. eFuse/HSS as a state machine: design it so every fault transition records reason/action/peak/duration/counters, enabling correlation with repeated disturbances and SOA/thermal accumulation.

H2-5. Current/Voltage Sensing for Protection & Metering (Shunt/Hall + Sampling Strategy)

Protection needs peak + duration to justify a trip action; metering needs energy + trend to explain degradation over time. A robust luminaire therefore uses a dual-path sensing strategy: a fast path for threshold decisions and a slow path for RMS/average/integration—synchronized with a pre/post event window.

Fast path: peak/duration · Slow path: Wh/runtime/drift · Sync: pre/post window

1) Current sensing: shunt vs Hall (what changes the evidence quality)

  • Shunt (resistor + amplifier): high bandwidth and linearity, ideal for capturing short over-current peaks. Key trade-offs are loss/heat and common-mode robustness during surges.
  • Hall (magnetic sensing): galvanic isolation and low insertion loss, useful when the measurement domain must be isolated. Trade-offs include offset/temperature drift and limited peak fidelity for very fast events.
  • Evidence-first selection: protection evidence prioritizes no-saturation peak capture; metering evidence prioritizes stable, calibratable drift behavior.

If the sensor saturates during the disturbance, peak is understated and root-cause becomes guesswork. Sensor robustness during common-mode excursions is therefore part of the protection chain, not an afterthought.

2) Voltage sensing: divider vs isolated sampling (avoid injecting the transient)

  • Divider sampling: simplest and fast, but must be protected so surge energy is not injected into the logic domain.
  • Isolated sampling: separates hazardous domain from measurement domain; often safer for evidence retention but may trade bandwidth and cost.
  • Common-mode reality: surge/lightning-like events can shift reference potentials; the measurement chain must remain functional to preserve bus_min, V_valley, and event timing.

Metering is only valuable if it survives the incident. Measurement-domain protection is a prerequisite for a credible evidence chain.

3) Sampling strategy: one sensor, two paths, one timeline

  • Fast protect channel: comparator/window thresholding to capture peak and duration, generate the trip edge, and label trip_reason.
  • Slow metering channel: ADC sampling for RMS/average/integration to produce Wh, Ah, runtime_h, and drift metrics.
  • Synchronization: the trip edge freezes a pre/post buffer so causality can be proven (e.g., “bus dipped first” vs “limit engaged first”).

Minimal synchronization requirement: the trip edge triggers (1) a log commit and (2) a pre/post snapshot capture. Without both, the system can say “a trip happened” but cannot explain the timeline.
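
The freeze-on-trip behavior can be sketched with a small ring buffer. `PrePostCapture` and its window sizes are illustrative, not tied to any ADC driver:

```python
from collections import deque

class PrePostCapture:
    """Sample continuously; on the trip edge, freeze the last pre_n samples
    and capture post_n more, so the event timeline can be proven."""
    def __init__(self, pre_n=4, post_n=4):
        self._pre = deque(maxlen=pre_n)  # rolling pre-trigger history
        self._frozen_pre = []
        self._frozen_post = []
        self._post_left = 0
        self.post_n = post_n
        self.snapshot = None  # (pre, post) once the post-window completes

    def sample(self, v):
        if self.snapshot is not None:
            return                       # capture already complete
        if self._post_left > 0:
            self._frozen_post.append(v)
            self._post_left -= 1
            if self._post_left == 0:
                self.snapshot = (self._frozen_pre, self._frozen_post)
        else:
            self._pre.append(v)

    def trip(self):
        """Trip edge: freeze the pre-window and start the post-window."""
        self._frozen_pre = list(self._pre)
        self._frozen_post = []
        self._post_left = self.post_n
```

In firmware the same pattern would run inside the fast-path ISR, with the committed snapshot handed to the slow path for logging.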

[Figure F5 diagram: one sensor feed (shunt/Hall) split into a FAST protect path (comparator/window → trip, peak, duration, trip_reason) and a SLOW metering path (ADC → RMS, Wh, runtime, drift), with a pre/post buffer frozen on the trip edge and an event-log commit (event_type, trip_reason, trip_action, peak, duration, count, rail_id, temp_bin, time_base)]
Figure F5. Dual-path sensing: the same sensor feed is split into a fast protect path (peak/duration/trip_reason) and a slow metering path (Wh/runtime/drift). The trip edge freezes a pre/post buffer and triggers an evidence log commit.

H2-6. Metering SoC Architecture (Energy, Runtime, Health Indicators)

A metering SoC becomes valuable when it converts raw V/I/P samples into accounts the luminaire can defend: energy and runtime for usage, counters and valleys for disturbance exposure, and drift metrics for health. The output must be structured as a dashboard plus an event trail.

Energy: Wh/Ah · Runtime: hours/switches · Risk: trips/brownouts · Health: valleys/drift

1) Core accounts (the minimum set worth keeping)

  • Instant/average: V/I/P (instant + average) for trend baselining and anomaly detection.
  • Energy: Wh (and optionally Ah) to quantify usage and support lifecycle accounting.
  • Runtime: runtime_h, switch_count as exposure proxies for stress and service cycles.
  • Temperature linkage: temp_bins for lifetime inference and for explaining derating/degradation.

2) Health indicators (turn incidents into diagnosable signals)

  • Protection density: protect_trip_count_window, event_density.
  • Power integrity: brownout_count, bus_min_hist (valley distribution, not only min).
  • Performance drift: P_avg_drift / I_error_trend to detect aging, contamination, or thermal path degradation.
  • Policy quality: retry_count_window, latch_count to expose nuisance trips vs true hazards.

These indicators answer three field questions with evidence: “How often?”, “How severe?”, and “Is it getting worse?”

3) Data flow and retention: periodic snapshots vs event-driven commits

  • Periodic snapshot: low-rate dashboard updates for energy, runtime, temperature bins, and drift baselines.
  • Event-driven commit: on trip/reset, record compact incident facts (reason/action/peak/duration/counters).
  • Wear-aware storage: store summaries and rolling histograms; keep only the most recent N detailed incidents.

Evidence design goal: stable long-term trends plus a short incident trail that survives resets and power loss.
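
A compact sketch of this retention policy, using illustrative temperature bin edges and an incident cap of N = 8:

```python
from collections import deque

class Retention:
    """Wear-aware retention sketch: rolling summaries plus only the latest
    N detailed incidents. N and bin edges are illustrative choices."""
    def __init__(self, max_incidents=8):
        self.wh = 0.0
        self.runtime_h = 0.0
        self.temp_bins = [0, 0, 0, 0]     # <40, 40-59, 60-84, >=85 degC
        self.incidents = deque(maxlen=max_incidents)

    def periodic_snapshot(self, p_avg_w, dt_h, temp_c):
        """Low-rate dashboard update: energy, runtime, temperature bins."""
        self.wh += p_avg_w * dt_h
        self.runtime_h += dt_h
        edges = (40, 60, 85)
        self.temp_bins[sum(temp_c >= e for e in edges)] += 1

    def event_commit(self, incident: dict):
        """Event-driven commit: oldest detailed incident drops first."""
        self.incidents.append(incident)
```
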

4) Interfaces (export the dashboard, not raw noise)

  • I²C / PMBus / UART: read the dashboard blocks (energy/runtime/health/counters) and fetch the latest incidents.
  • Two-level API: summary view for service and monitoring; detail view for forensic reconstruction.

[Figure F6 diagram: metering SoC pipeline (sample → compute → summarize → export) feeding energy (Wh/Ah), runtime (hours/switches), counters (trip/brownout), min/max and valleys (Vmin_hist/Pmax), temp bins, and health (drift/density) blocks; log storage (FRAM/EEPROM, latest N events) and telemetry (I²C/PMBus/UART); retention split into periodic snapshots and event-driven commits]
Figure F6. A practical metering SoC dashboard: track energy/runtime, disturbance exposure, valleys, and drift, then export summaries plus a compact incident trail to nonvolatile storage and telemetry interfaces.

H2-7. Event Logging That Survives Reality (Timestamp, NVM, Wear, Integrity)

A luminaire log is only useful if it survives power loss, can be recovered after reboot, and remains verifiable after long-term wear. The recommended approach is an append-only ring log with a two-step commit marker and per-record integrity checks.

Crash-safe: commit marker · Wear-aware: ring buffer · Trust: CRC (optional signature)

1) Minimal event schema (treat these fields as a hard requirement)

  • Identity: event_id, event_type (surge/esd/oc/ot/uv/ov).
  • Severity: peak_value (V/I), duration.
  • Action: trip_action (limit/latch/retry), plus trip_reason if available.
  • Localization: rail_id (which domain), temperature_bin.
  • Density: counters (lifetime + recent window counts).
  • Timeline: time_base (RTC or monotonic counter for ordering).

The schema is designed to answer field questions with evidence: what happened, how severe, what the protection did, where it occurred, and whether events are becoming more frequent.

2) NVM choice: match the medium to the write pattern

  • FRAM: best for frequent small writes (counters, compact event records), strong power-loss tolerance.
  • EEPROM: suitable for moderate logging rates; control update frequency to avoid hot-spot wear.
  • Flash: large capacity but erase/write granularity demands append-only design and careful recovery rules.

Reliability comes from the log structure first; the storage medium determines endurance and recovery margins.

3) Crash-safe write rule: payload first, commit last

  • Stage A (payload): write the full record body and CRC, but do not “publish” it yet.
  • Stage B (commit): write a small commit marker that turns the record valid.
  • On reboot: scan backward for the last valid commit marker to rebuild head/tail.

Avoid “index-first” updates. If power fails mid-write, the commit marker cleanly separates valid vs incomplete records.
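
The payload-first/commit-last rule can be sketched against a simulated slot array. `write_record`/`recover` and the dict-based "NVM" are illustrative stand-ins for real flash/EEPROM drivers:

```python
import zlib

SLOTS = 8  # illustrative ring size

def write_record(nvm, head, payload: bytes, fail_before_commit=False):
    """Stage A: write payload + CRC (record not yet valid).
    Stage B: write the commit marker that publishes it."""
    slot = head % SLOTS
    nvm[slot] = {"payload": payload, "crc": zlib.crc32(payload), "commit": 0}
    if fail_before_commit:
        return head                # simulated power loss between stages
    nvm[slot]["commit"] = 1        # Stage B: record becomes valid
    return head + 1

def recover(nvm):
    """Reboot scan: trust only slots with a commit marker and a good CRC."""
    return [s["payload"] for s in nvm
            if s and s["commit"] == 1 and zlib.crc32(s["payload"]) == s["crc"]]
```

Because the head is not advanced when commit never lands, the next write simply reuses the torn slot; no index repair step is needed.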

4) Integrity: make logs verifiable, not just present

  • CRC per record: detects torn writes and bit flips; invalid CRC means “do not trust.”
  • Optional signature/MAC concept: if tamper resistance is required, authenticate the record payload. (Keep the implementation details outside this page’s scope.)

A verifiable log supports warranty disputes and forensic analysis without relying on assumptions.

[Figure F7 diagram: append-only ring buffer slots (empty / payload+CRC with commit=0 / committed with commit=1) holding event_type, peak, duration, rail_id, temp_bin, time_base, trip_action, counters; two-step write (Stage A payload+CRC, Stage B commit marker), reboot recovery scan of the last K slots to rebuild head/tail, and wear spreading via ring overwrite]
Figure F7. A crash-safe ring log: records are appended as “payload+CRC” and only become valid after a commit marker. After power loss, a short scan finds the last committed record and rebuilds indices.

H2-8. Surge/ESD/Lightning Evidence Playbook (What to Capture, Where to Probe)

Field debugging succeeds when evidence collection is disciplined: capture the smallest set of signals that can prove causality. The default triage is two waveforms (input voltage and protection current/limit), with an optional third (MCU rail/reset) when the failure looks like a system reboot rather than a pure power event.

#1: V_in / V_clamp · #2: I_limit / IMON · Optional #3: reset

1) Probe points that answer “what happened” vs “what reacted”

  • Input & clamp: before/after clamp to quantify residual stress (supports peak_v evidence).
  • eFuse/HSS VIN/VOUT: confirm limit/disconnect behavior (supports trip_action).
  • Current sense / IMON: observe current limiting dynamics and event duration (supports peak_i, duration).
  • MCU rail / reset: attribute resets to power integrity vs unrelated logic (supports brownout_count correlation).

2) Two-waveform triage (default SOP)

  • Waveform #1: V_in and/or V_after_clamp (residual voltage and valleys).
  • Waveform #2: I_limit or IMON at the protection stage (action timing and severity).
  • Optional #3: MCU_VDD or RESET (prove reboot causality).

The goal is not to “measure everything” but to prove the sequence: disturbance → protection action → system consequence.

3) Trigger alignment (pre/post window): prove causality, not correlation

  • Trigger source: use the protection trip edge (FAULT/PG change) as the primary alignment anchor.
  • Pre window: capture the disturbance arrival and clamp response.
  • Post window: capture limit/disconnect behavior and any brownout/reset aftermath.

Alignment matters more than raw sample rate. Without a shared time base, “before/after” cannot be proven.

4) Use the log to prove exposure (did a surge event actually happen?)

  • Type + severity: event_type + high peak_v/peak_i + meaningful duration supports real exposure.
  • Density: rising counters (especially in a time window) indicates a harsh electrical environment.
  • Power integrity trend: changes in bus_min_hist or increased brownout_count show weakening margins over time.
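
These exposure checks can be computed directly from the log fields. A hedged sketch; the thresholds in `real_exposure` are placeholders to be replaced with design-specific limits:

```python
def real_exposure(ev, peak_min_v=100.0, dur_min_us=5.0):
    """An event supports real surge exposure only when type, severity, and
    duration all agree (thresholds are placeholders, not standard levels)."""
    return (ev["event_type"] == "surge"
            and ev["peak_v"] >= peak_min_v
            and ev["duration_us"] >= dur_min_us)

def event_density(events, t_now_ms, window_ms=86_400_000):
    """Count events inside the recent window (default window: 24 h)."""
    return sum(1 for e in events if t_now_ms - e["t_mono_ms"] <= window_ms)
```
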

[Figure F8 diagram: simplified luminaire power chain with probe points P1 (V_in), P2 (V_after clamp), P3/P4 (eFuse VIN/VOUT, Vbus), P5 (current sense/IMON), P6 (MCU_VDD/RESET); two-waveform triage picks P1 or P2 plus P5, with P6 optional, aligned to the trip edge via the pre/post window]
Figure F8. Probe map on a simplified luminaire chain. Default evidence capture uses two aligned signals: voltage at input/clamp (P1/P2) and current/limit behavior (P5), with reset/MCU rail (P6) as the optional third.

H2-9. Coordination Rules (TVS/MOV/GDT vs eFuse vs Downstream Loads)

Protection parts do not add up automatically. Coordination is about energy going to the right path, actions happening in the right order, and sensitive ICs never becoming the transient “energy sink.” A coordinated chain uses a clamp stage to shave peaks, a limit stage to control energy injection, and a disconnect/retry policy to prevent thermal accumulation and nuisance trips.

Energy path · Time order · Relative thresholds

1) Coordination objectives (what “good” looks like)

  • Route energy correctly: large transient energy is handled by the clamp path, not by metering/MCU rails.
  • Protect expensive silicon: the most fragile domains (metering SoC, ADC front-end, MCU) must see reduced residual stress.
  • Maintain system continuity: short disturbances should not become repeated resets or “false faults.”

2) Common failure patterns (why “more parts” can die faster)

  • TVS placed too late: the transient hits switches or metering ICs first; the clamp acts after damage occurs.
  • eFuse limiting too slow: MOV absorbs repeated energy, overheats, and drifts or fails after multiple events.
  • Thresholds too tight: EFT-like bursts trigger nuisance trips, creating repeated retry/reset storms.

When coordination fails, the log typically shows either “no action during high peak” or “too many actions during low severity.”

3) Layered threshold strategy (relative order matters more than absolute numbers)

  • Clamp threshold: earliest action to reduce peak stress (V_after_clamp should be bounded).
  • Limit threshold: engages after clamp to control energy injection (I_limit timing defines severity control).
  • Disconnect threshold: triggers only when abnormal conditions persist (prevents thermal accumulation).
  • Retry policy: distinguishes “short disturbance” vs “persistent fault” using duration and count windows.

Coordination is proven by sequence: peak arrives → clamp reduces residual → limit shapes current → disconnect/retry only if needed.
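
The sequence proof can be automated from per-stage timestamps. A hedged sketch; `coordinated` and the `t_us` field are illustrative:

```python
# Expected relative order of protection actions (clamp first, limit next).
EXPECTED_ORDER = ["clamp", "limit", "disconnect"]

def coordinated(actions):
    """Check that observed stage actions occurred in the expected relative
    order; absent stages (e.g. no disconnect) are fine."""
    seen = [a["stage"] for a in sorted(actions, key=lambda a: a["t_us"])]
    ranks = [EXPECTED_ORDER.index(s) for s in seen if s in EXPECTED_ORDER]
    return ranks == sorted(ranks)
```

Run against the per-transition log, this flags a "limit before clamp" sequence, which usually means the clamp is placed too late or its threshold is set above the limit stage's.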

4) Evidence-based checks (use logs + two waveforms to validate coordination)

  • High peak without action: high peak_v / residual voltage but no trip_action indicates missing coupling between sensing and protection timing.
  • Action that causes collapse: trip_action=limit with worsening bus_min_hist indicates limit settings or downstream hold-up margin mismatch.
  • Nuisance storm signature: very short duration + high retry_count_window suggests EFT-like bursts being interpreted as real faults.

[Figure F9 diagram: coordination ladder on threshold-vs-time axes (CLAMP peak shave → LIMIT energy control → DISCONNECT latch-off/ORing → RETRY policy window), with pitfall tags: TVS too late (chips see the peak), eFuse too slow (MOV overheats), thresholds too tight (EFT nuisance trips, retry storms in the short repetitive EFT zone)]
Figure F9. Coordination ladder: clamp acts first to reduce peaks, limit shapes energy next, disconnect triggers only for persistence, and retry policy prevents short repetitive disturbances from becoming false fault storms.

H2-10. Telemetry & Maintenance (From Logs to Action)

Metering and logging are only valuable when they drive decisions. The recommended model is a closed loop: events + snapshots are reduced into KPI signals, then mapped to actions for field maintenance, warranty handling, and predictive service—without requiring any specific cloud platform assumptions.

Snapshot: periodic · Event: on-trigger · KPI → Action

1) Device-side telemetry outputs (what to export)

  • Periodic snapshot: Wh, runtime_h, temp bins, drift summaries, valley histogram summaries.
  • Event report: event_type, peak, duration, trip_action, rail_id, time_base, window counters.
  • Rate control: during event storms, report counts/summary first and limit detailed records to the latest N.
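
The rate-control rule can be sketched as a per-window cap on detailed reports. `StormLimiter` and its defaults are illustrative policy values:

```python
class StormLimiter:
    """During event storms, report counts first and forward at most
    detail_cap detailed records per window (illustrative policy)."""
    def __init__(self, detail_cap=3, window_ms=1000):
        self.detail_cap = detail_cap
        self.window_ms = window_ms
        self._win_start = None
        self._sent = 0
        self.suppressed = 0  # always counted, even when details drop

    def report(self, t_ms, detail):
        if self._win_start is None or t_ms - self._win_start >= self.window_ms:
            self._win_start, self._sent = t_ms, 0   # new reporting window
        if self._sent < self.detail_cap:
            self._sent += 1
            return detail                            # forward full record
        self.suppressed += 1
        return {"summary_only": True, "suppressed": self.suppressed}
```
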

2) Local maintenance workflow (field-ready sequence)

  • Read dashboard first: energy/runtime, trip density, brownout exposure, drift indicators.
  • Read latest N events: confirm type/severity/action and timeline ordering.
  • Capture 2-wave evidence: V_in (or V_after_clamp) + I_limit, aligned to the trip edge.

The fastest root-cause path is: dashboard anomaly → event pattern → two waveforms to prove sequence.

3) Reset/clear policy (protect evidence value)

  • Who can clear: restrict to authenticated service access or physical presence.
  • What to clear: clear window counters and health score; keep lifetime counters for history and warranty evidence.
  • When to clear: clear only after a maintenance action (replace clamp parts, fix grounding, clean thermal path) to create a new epoch.

Clearing is not housekeeping; it defines a before/after comparison window for maintenance effectiveness.
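
The clear policy can be sketched as an epoch-based counter split. `Counters.service_clear` is illustrative; real access control would sit behind authenticated service tooling:

```python
class Counters:
    """Service clear starts a new epoch: window counters reset,
    lifetime history is preserved (field names illustrative)."""
    def __init__(self):
        self.lifetime = {"trips": 0, "brownouts": 0}
        self.window = {"trips": 0, "brownouts": 0}
        self.epoch = 0

    def record(self, kind):
        self.lifetime[kind] += 1
        self.window[kind] += 1

    def service_clear(self, authenticated: bool) -> bool:
        if not authenticated:
            return False              # only service access may clear
        self.window = {k: 0 for k in self.window}
        self.epoch += 1               # before/after comparison boundary
        return True
```

Comparing window counters across epochs then answers "did the maintenance action actually reduce exposure?" with evidence.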

4) Decision mapping (KPI → recommended action)

  • High surge density: rising event density → inspect grounding/wiring, verify external surge protection condition.
  • Power drift + hotter bins: increasing drift with hotter bins → check thermal path, contamination, aging indicators.
  • Many brownouts/valleys: increased brownout exposure → point to input quality or hold-up margin issues (without expanding into PFC design).

[Figure F10 diagram: inputs (periodic snapshots: Wh/runtime; event reports: peak/duration) → KPI extraction (event density per window, valleys/brownouts from Vmin history, drift + temp bins) → health score with reason tags → actions (local service: inspect grounding, replace clamp parts; remote notice: alert + scheduled maintenance); after service, clear window counters, keep lifetime history, verify KPI improvement in the new epoch]
Figure F10. Log-to-action loop: device-side snapshots and events are reduced into KPI signals and a health score, mapped to local/remote actions. After service, clear window counters (keep lifetime history) and verify KPI improvement.

H2-11. Compliance-Ready Artifacts (What to Document for EMC/Surge Readiness)

This section is a pre-test evidence pack: it does not teach standards or lab procedures. It defines the minimal artifacts that make surge/ESD readiness auditable and test failures quickly reproducible: (1) protection-chain BOM evidence, (2) threshold & policy table, and (3) event-log field guide with sample exports, plus a fail-package for fast root-cause replay.

Design params · Test captures · Log exports

1) Protection-chain BOM evidence (category + placement + rating + MPN examples)

Provide a BOM-level list that allows a reviewer to validate energy path and exposure boundaries without guessing. Include: category, location, nominal rating fields, and example MPNs (verify ratings against the target design).

  • TVS diode (fast clamp). Sits: after connector, before sensitive rails (P2). Rating fields: VRWM / VCL / peak pulse power, package, polarity. Example MPNs: Vishay SMBJ58A, SMCJ58A; Littelfuse SMBJ33A.
  • MOV (energy absorber). Sits: across input (line-to-line / line-to-PE), on the energy path. Rating fields: VAC rating, surge current, energy rating, thermal class. Example MPNs: EPCOS/TDK S14K275; Bourns MOV-14D471K.
  • GDT (high-energy shunt). Sits: line-to-PE (or shield/earth) for large common-mode events. Rating fields: DC sparkover, impulse sparkover, surge rating, insulation. Example MPNs: Bourns 2038-09-SM; Littelfuse CG3-230L.
  • eFuse (programmable limit). Sits: after clamp, before downstream conversion/loads (P3/P4). Rating fields: ILIM range, t_blank/debounce, SOA, thermal shutdown behavior. Example MPNs: TI TPS25947, TPS25940A; ADI/LTC LTC4368 (surge stopper class).
  • High-side switch (protected load switch). Sits: domain isolation (LED load vs control rail), per-rail (rail_id). Rating fields: RDS(on), current limit, fault flag timing, reverse current handling. Example MPNs: Infineon PROFET™ BTS7002-1EPP, BTS50010-1TAD.
  • Current sense (shunt). Sits: near eFuse output or LED load return (P5/IMON). Rating fields: mΩ value, TCR, power rating, Kelvin routing note. Example MPNs: Vishay WSL2512 series; Isabellenhütte ISA-WELD (example family).
  • Isolated current sense (optional). Sits: where the sense domain must stay isolated from the surge reference. Rating fields: CM range, isolation rating, bandwidth/latency, output type. Example MPNs: TI AMC1301, AMC3301; ADI ADuM3190 (isolation/control class).
  • NVM for logs. Sits: local ring log storage (H2-7). Rating fields: endurance, write granularity, power-loss behavior. Example MPNs: Fujitsu MB85RS64V (FRAM); Microchip 24LC256 (EEPROM); Winbond W25Q64JV (SPI Flash).

The MPN list is provided as concrete examples for documentation completeness. Always confirm electrical ratings, creepage/clearance, and safety constraints against the specific luminaire architecture.

2) Threshold & policy table (limit / latch / retry)

Before any lab run, produce a single-page “behavior contract” that maps conditions to detection and action. Keep it device-side and design-focused (no platform assumptions).

Condition | Detection basis | Action | Timing / counters | Evidence fields
OC / short | IMON exceeds ILIM; duration integration | Limit → disconnect (if persistent) | t_blank, t_limit_max, retry backoff, retry cap | event_type, peak_i, duration, trip_action, counters
OV / surge-like | V_after_clamp peak + window count | Clamp first; limit injection; optional latch if repeated | count_window thresholds, escalation policy | peak_v, duration, trip_action, rail_id, temperature_bin
UV / brownout | Bus valley histogram; PG/reset correlation | Graceful retry; avoid reset storm | debounce, min-off time, window caps | bus_min_hist, brownout_count, time_base
EFT nuisance | Short repetitive bursts; low severity but high frequency | Reject/ignore via debounce + caps; avoid false faults | short duration gate, retry_count_window threshold | duration (short), retry_count_window (high)
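The single-page behavior contract can also be kept machine-readable so firmware, test, and review all read the same source. A minimal sketch follows; the key names and values are illustrative placeholders, not normative settings.

```python
# "Behavior contract" as data: condition -> detection, action, timing, evidence.
# All numeric values are illustrative placeholders; tune per design.
POLICY = {
    "oc_short": {
        "detect": "IMON > ILIM (duration-integrated)",
        "action": ["limit", "disconnect_if_persistent"],
        "t_blank_us": 100, "t_limit_max_ms": 5,
        "retry_backoff_ms": [10, 40, 160], "retry_cap": 3,
        "evidence": ["event_type", "peak_i", "duration", "trip_action", "counters"],
    },
    "ov_surge": {
        "detect": "V_after_clamp peak + window count",
        "action": ["clamp", "limit", "latch_if_repeated"],
        "count_window_threshold": 3,
        "evidence": ["peak_v", "duration", "trip_action", "rail_id", "temperature_bin"],
    },
    "uv_brownout": {
        "detect": "bus valley histogram + PG/reset correlation",
        "action": ["graceful_retry"],
        "debounce_ms": 2, "min_off_ms": 50,
        "evidence": ["bus_min_hist", "brownout_count", "time_base"],
    },
}

def evidence_fields(condition: str) -> list[str]:
    """Return the log fields a reviewer should expect for a condition."""
    return POLICY[condition]["evidence"]
```

A reviewer can then diff this table against the exported event-log field dictionary (section 3) instead of reverse-engineering firmware behavior.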

3) Event-log field guide + sample export

Provide a field dictionary and a small sample export that shows how events are interpreted. The goal is “explainable evidence” (what happened, how severe, what the system did, where it occurred, and when).

  • Field dictionary: field → meaning → unit → source (sensor/logical) → notes
  • Sample export: at least 3 entries (surge-like, OC, brownout) with valid CRC/commit markers
  • Time base: if RTC is unreliable, keep a monotonic counter for ordering

Sample export (illustrative)

CSV-like rows (device-side)

event_id,event_type,peak_v,peak_i,duration_ms,trip_action,rail_id,temp_bin,time_base,counters,crc_ok,commit_ok
10421,surge,860V,2.1A,0.12,limit,INPUT,25-50C,mono:8891231,win=3;life=57,1,1
10422,oc,52V,6.8A,4.5,disconnect,LED_BUS,50-75C,mono:8891450,win=2;life=12,1,1
10423,brownout,48V,1.0A,18.0,retry,CTRL_RAIL,25-50C,mono:8892102,win=5;life=90,1,1
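Rows in this shape can be parsed and integrity-screened with a short routine. The sketch below assumes exactly the header used in the sample export above; the screening rule (keep only rows with both crc_ok and commit_ok set) matches the crash-safe logging policy in H2-7.

```python
import csv, io

HEADER = ("event_id,event_type,peak_v,peak_i,duration_ms,trip_action,"
          "rail_id,temp_bin,time_base,counters,crc_ok,commit_ok")

def parse_export(text: str) -> list[dict]:
    """Parse the CSV-like export, keeping only CRC/commit-valid rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [r for r in rows if r["crc_ok"] == "1" and r["commit_ok"] == "1"]

sample = HEADER + """
10421,surge,860V,2.1A,0.12,limit,INPUT,25-50C,mono:8891231,win=3;life=57,1,1
10422,oc,52V,6.8A,4.5,disconnect,LED_BUS,50-75C,mono:8891450,win=2;life=12,1,0
"""
valid = parse_export(sample)
print([r["event_id"] for r in valid])   # the uncommitted second row is dropped
```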

4) Minimal fail-package (what to bring back when a test fails)

A failure is only actionable when it can be replayed as a short evidence bundle. Keep the package minimal and consistent:

  • Waveforms (2 required, 1 optional): V_in (or V_after_clamp) + I_limit/IMON; optional MCU_VDD/RESET.
  • Log export: last N events around the failure point (include time_base, trip_action, window counters, CRC/commit validity).
  • Conditions sheet: input setup, grounding mode, temperature range (temp_bin), load mode/power level.

This bundle is intentionally independent of any cloud platform. It is designed to minimize re-test cycles and guesswork.
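As a sketch, the three bundle items can be assembled into one archive for replay. The file names and layout below are hypothetical, not a mandated format; the point is one self-contained artifact per failure.

```python
import json, pathlib, zipfile

def build_fail_package(out_path, waveform_files, log_export_file, conditions: dict):
    """Bundle the minimal fail-package: waveforms (2 required, 1 optional),
    the event-log export, and the conditions sheet as JSON."""
    with zipfile.ZipFile(out_path, "w") as z:
        for wf in waveform_files:
            z.write(wf, arcname=f"waveforms/{pathlib.Path(wf).name}")
        z.write(log_export_file, arcname="logs/events.csv")
        z.writestr("conditions.json", json.dumps(conditions, indent=2))
```

Usage would be, for example, `build_fail_package("surge_fail_01.zip", ["vin.csv", "imon.csv"], "events.csv", {"grounding": "PE-bonded", "temp_bin": "25-50C"})`.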

(Figure F11 diagram: pre-test evidence checklist covering BOM evidence, threshold/coordination policy, the two default waveform captures plus optional MCU_VDD/RESET, log exports with CRC/commit validity, and the minimal fail-package.)
Figure F11. Compliance-ready evidence checklist: design parameters (including MPN evidence), minimal test captures, and log exports that make failures reproducible and reviews efficient.
Cite this figure: “Figure F11 — Compliance Evidence Checklist (Pre-Test Pack), ICNavigator.”


H2-12. FAQs ×12 (Evidence-Routed, No Scope Creep)

Each answer routes the symptom to the first evidence to capture (waveforms + log fields) and the chapter to use. The intent is long-tail coverage without introducing new design scope.

2 waveforms first · Log fields second · Chapter mapping
1. Adding a TVS makes reboots more frequent—clamp placement or eFuse thresholds?
If reboots increase after adding TVS, suspect coordination: the clamp may be placed “too late,” or the eFuse threshold/debounce converts short disturbances into repeated retries. Capture V_after_clamp and I_limit/IMON around the reboot edge, then check trip_action, duration, and retry_count_window to confirm whether the system is clamping late or tripping too aggressively.
Evidence: V_after_clamp + I_limit/IMON · Fields: trip_action, duration, retry_count_window · Map: H2-9, H2-4
2. After lightning it still lights but looks dimmer—what “degradation signal” should logs show?
A “still working but dimmer” case is often a trend problem, not a single peak. Look for drift indicators: rising average power for the same setpoint, temperature bins shifting hotter, or abnormal increases in protective-event density. Correlate those with lifetime counters and min/max history. If dimming follows repeated events, the log should show increasing event_density, hotter temperature_bin, and long-term P_avg_drift (or equivalent).
Evidence: snapshot trends + event density · Fields: temperature_bin, counters, min/max, drift summary · Map: H2-6, H2-7
3. EFT testing keeps triggering protection—threshold too tight or sensing-path noise?
EFT nuisance trips typically appear as very short disturbances with high repetition. Use a dual-path check: the fast protect path may see spikes while the slower metering path stays stable. Capture V_after_clamp and I_limit/IMON with a short pre/post window; then inspect duration and retry_count_window. If duration is tiny yet retries surge, thresholds/debounce are likely too tight; if IMON is noisy, the sensing path needs filtering or placement correction.
Evidence: short window waveforms · Fields: duration, retry_count_window, peak_value · Map: H2-5, H2-4
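The dual gate described here (reject very short events, but treat excessive repetition as its own signal) can be sketched as follows; the duration gate and window cap values are illustrative placeholders.

```python
# Nuisance-trip gate: ignore disturbances shorter than the duration gate,
# but escalate when short events repeat too often within one window.
# Threshold values are illustrative, not recommendations.

class EftGate:
    def __init__(self, min_duration_ms=0.05, window_cap=8):
        self.min_duration_ms = min_duration_ms
        self.window_cap = window_cap
        self.short_events_in_window = 0

    def classify(self, duration_ms: float) -> str:
        if duration_ms >= self.min_duration_ms:
            return "real_fault"            # long enough to act on
        self.short_events_in_window += 1
        if self.short_events_in_window > self.window_cap:
            return "escalate"              # burst density itself is the signal
        return "ignore"

gate = EftGate()
print([gate.classify(0.01) for _ in range(9)])
```

Note that even ignored events still increment a counter, so retry_count_window keeps its evidentiary value without triggering false faults.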
4. After a surge, occasional lockups occur with no hard damage—how to align waveforms and logs using an “event window”?
Use an event window with pre-trigger and post-trigger buffers so the waveform timeline and log timestamp share the same reference edge. Trigger on the first protection indication (IMON threshold crossing or clamp residual threshold), record V_in/V_after_clamp and I_limit/IMON, then match the captured edge to time_base and the nearest event_id. A consistent alignment shows whether the lockup follows a brownout valley, a retry storm, or a single long-duration limit.
Evidence: pre/post buffers + shared trigger edge · Fields: time_base, event_id, duration, trip_action · Map: H2-8, H2-7
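Matching a captured trigger edge to the nearest log entry can be done on the monotonic time base alone. A minimal sketch, assuming the `mono:<count>` time_base format used in the sample export:

```python
def nearest_event(trigger_mono: int, events: list[dict]) -> dict:
    """Return the log entry whose monotonic time_base is closest to the
    waveform trigger edge (time_base format assumed to be 'mono:<count>')."""
    def mono(e):
        return int(e["time_base"].split(":")[1])
    return min(events, key=lambda e: abs(mono(e) - trigger_mono))

events = [
    {"event_id": 10421, "time_base": "mono:8891231"},
    {"event_id": 10422, "time_base": "mono:8891450"},
]
print(nearest_event(8891440, events)["event_id"])   # closest to 8891450
```

Once the edge and the event share a reference, the pre/post buffers show whether the lockup follows a brownout valley, a retry storm, or a single long-duration limit.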
5. eFuse auto-retry keeps heating the system—should it switch to latch-off, and what evidence proves it?
Latch-off is justified when retries repeatedly inject energy into a persistent fault or thermal-limited condition. Confirm with evidence: high retry_count_window, repeated trip_action=retry without recovery, and temperature bins trending upward during retries. Capture I_limit waveform to see whether each retry hits the same limit plateau and duration. If the fault persists across retries and thermal stress accumulates, a latch-off escalation policy is safer than endless retry.
Evidence: retry plateau + thermal trend · Fields: retry_count_window, temperature_bin, trip_action, duration · Map: H2-4, H2-10
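The escalation decision can be expressed as a small policy function: latch when retries accumulate without recovery or when thermal stress trends upward during retries. The latch threshold below is an illustrative placeholder.

```python
# Retry -> latch-off escalation sketch. A retry "recovers" if the rail
# stays up; unrecovered retries count toward the latch threshold.
# latch_after is an illustrative placeholder, not a recommendation.

def next_action(retry_count_window: int, recovered: bool,
                temp_rising: bool, latch_after: int = 3) -> str:
    if recovered:
        return "resume"                    # fault cleared
    if retry_count_window >= latch_after or temp_rising:
        return "latch_off"                 # persistent fault or thermal stress
    return "retry"

print(next_action(1, recovered=False, temp_rising=False))   # retry
print(next_action(3, recovered=False, temp_rising=False))   # latch_off
```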
6. Same batch, some sites show extreme surge counts—check grounding first or input wiring first?
Start with classification evidence: surge-like events show high residual peaks and low frequency; wiring/ground issues often raise common-mode coupling and increase event density during storms or switching. Compare event_density and peak_v distributions across sites, then validate with two probes: V_in/V_after_clamp and the clamp path current (or IMON proxy). If one site has consistently higher residual voltage after clamp, input wiring/SPD placement is suspect; if counts spike with ground reference shifts, grounding is suspect.
Evidence: cross-site distributions + two probes · Fields: peak_v, counters, event_density · Map: H2-10, H2-2
7. High-side switch thermal trips frequently—real short-circuit or excessive inrush?
Differentiate by time profile. Inrush problems show a repeatable current spike at turn-on with limited duration; short-circuit tends to sustain high current or quickly re-trigger after retry. Capture I_limit/IMON during power-up and during the thermal trip. Check duration, trip_action, and whether events cluster at enable edges. If trips correlate with turn-on edges and short windows, tune soft-start/inrush control; if trips occur during steady run, suspect load fault or wiring.
Evidence: turn-on waveform vs steady-state · Fields: duration, trip_action, counters (cluster), rail_id · Map: H2-4, H2-5
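The time-profile distinction can be reduced to a two-input check: how long after the enable edge the trip occurred, and how long it lasted. Both thresholds below are illustrative placeholders.

```python
# Time-profile classifier for high-side switch trips.
# Inrush clusters at enable edges with short duration; trips during
# steady run point at the load or wiring. Thresholds are illustrative.

def classify_trip(duration_ms: float, ms_since_enable: float) -> str:
    at_enable_edge = ms_since_enable < 10
    if at_enable_edge and duration_ms < 2:
        return "inrush"        # tune soft-start / inrush control
    return "load_fault"        # steady-run trip: suspect load or wiring

print(classify_trip(0.5, 1))     # trip right at turn-on, brief
print(classify_trip(5.0, 500))   # trip during steady run
```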
8. Energy (Wh) metering is inaccurate—sampling bandwidth limits or calibration/temperature drift?
First separate dynamic error from drift. If error grows with dimming changes or PWM depth, suspect bandwidth/aliasing in the sampling chain; verify by comparing fast-path transitions to slow-path RMS/energy accumulation. If error correlates with temperature bins or long runtime, suspect calibration coefficients or shunt/ADC drift. Review snapshot stability and confirm with controlled load steps: the metering should converge consistently. Use fields for temperature_bin, calibration version, and min/max to detect drift signatures.
Evidence: step tests + temp correlation · Fields: temperature_bin, min/max, calibration_id/version · Map: H2-6, H2-5
9. Logs sometimes disappear—how to survive power-loss writes without corrupting the index?
Use crash-safe logging: ring buffer + commit marker + integrity check. A record is valid only when a commit flag and CRC are present; otherwise it is ignored and the index is rebuilt by scanning for the last committed entry. Evidence is in the metadata: crc_ok, commit_ok, and a monotonic time_base. Validate by forced power interruption tests: the system should recover to a consistent pointer and preserve earlier committed events.
Evidence: commit+CRC validity · Fields: crc_ok, commit_ok, time_base · Map: H2-7
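The ring-buffer-plus-commit-marker scheme can be sketched in a few lines. This is an in-memory simulation of the write ordering only; real NVM (FRAM/EEPROM/flash) adds write-granularity and wear constraints not modeled here.

```python
import json, zlib

COMMIT = 0xA5

def write_record(storage: list, payload: dict):
    """Two-phase append: body + CRC land first, commit marker is set last,
    so a power loss between the phases leaves an ignorable record."""
    body = json.dumps(payload).encode()
    rec = {"body": body, "crc": zlib.crc32(body), "commit": None}
    storage.append(rec)          # phase 1: data + CRC on media
    rec["commit"] = COMMIT       # phase 2: tiny final write

def recover(storage: list) -> list:
    """Rebuild the index by keeping only committed, CRC-valid records."""
    return [r for r in storage
            if r["commit"] == COMMIT and zlib.crc32(r["body"]) == r["crc"]]

log = []
write_record(log, {"event_type": "surge", "peak_v": 860})
log.append({"body": b"partial", "crc": 0, "commit": None})  # simulated power loss
print(len(recover(log)))   # only the committed record survives recovery
```

Forced power-interruption tests then only need to confirm that `recover()` always lands on the last committed entry.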
10. Protection triggers but MCU records nothing—timing issue or reset interrupts logging?
If a protection action occurs without a corresponding log entry, the write may be interrupted by a reset or the log is downstream of a collapsing rail. Capture optional MCU_VDD/RESET along with V_after_clamp and I_limit. Then check whether commit_ok fails around that time and whether a brownout counter increments. If reset precedes commit, move logging to an always-powered domain or use a two-phase commit that finalizes quickly before rails collapse.
Evidence: RESET alignment + commit validity · Fields: commit_ok, brownout_count, time_base · Map: H2-7, H2-8
11. What is the minimal “warranty-grade” event log field set?
A warranty-grade record must support severity, action, location, and ordering. Minimum recommended set: event_id, event_type, peak_value (V or I), duration, trip_action, rail_id, temperature_bin, counters (window + lifetime), and time_base (RTC or monotonic). Add integrity flags (crc_ok, commit_ok) so exported logs remain trustworthy after power-loss recovery and field service handling.
Evidence: field dictionary + sample export · Fields: event_id, event_type, peak_value, duration, trip_action, rail_id, temperature_bin, counters, time_base, crc_ok/commit_ok · Map: H2-7, H2-11
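The minimum field set above maps directly onto a record type. A sketch (type choices such as float vs fixed-point are illustrative; an embedded implementation would use a packed struct):

```python
from dataclasses import dataclass

@dataclass
class WarrantyEvent:
    """Minimal warranty-grade record: severity, action, location, ordering."""
    event_id: int
    event_type: str          # surge / oc / brownout / ...
    peak_value: float        # V or I depending on event_type
    duration_ms: float
    trip_action: str         # clamp / limit / disconnect / retry / latch
    rail_id: str             # INPUT / LED_BUS / CTRL_RAIL / ...
    temperature_bin: str     # e.g. "25-50C"
    counters_window: int     # events in the current window
    counters_lifetime: int   # events over device lifetime
    time_base: str           # RTC timestamp or "mono:<count>"
    crc_ok: bool = False     # integrity flags set by the logger
    commit_ok: bool = False

e = WarrantyEvent(10421, "surge", 860.0, 0.12, "limit", "INPUT",
                  "25-50C", 3, 57, "mono:8891231", True, True)
print(e.event_type, e.trip_action)
```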
12. On-site triage: lightning strike or grid disturbance—how to tell quickly?
Use pattern recognition with minimal evidence. Lightning/surge-like exposure tends to show high peak_v and clamp activity with relatively sparse counts, sometimes followed by degradation trends. Grid disturbance shows frequent bus valleys, rising brownout_count, and resets that align to input dips rather than clamp residual peaks. Capture two waveforms: V_in/V_after_clamp and I_limit, then compare the event density and valley histogram over time. The correct classification drives the next inspection step.
Evidence: peak vs valley signature · Fields: peak_v, bus_min_hist, brownout_count, counters · Map: H2-2, H2-8
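The peak-versus-valley signature can be turned into a first-pass triage rule; the clamp-rating reference and count thresholds below are illustrative placeholders.

```python
# First-pass triage: sparse high-peak clamp activity -> surge/lightning;
# dense bus valleys and brownouts -> grid disturbance.
# Thresholds and the clamp-rating reference are illustrative.

def triage(max_peak_v: float, surge_count: int, brownout_count: int,
           v_clamp_ref: float = 600.0) -> str:
    if max_peak_v > v_clamp_ref and surge_count < 5:
        return "lightning/surge exposure"
    if brownout_count > surge_count:
        return "grid disturbance"
    return "inconclusive: capture V_in/V_after_clamp + I_limit"

print(triage(860.0, 3, 0))
print(triage(300.0, 2, 40))
```

The inconclusive branch deliberately routes back to the two-waveform capture, since classification without evidence just restarts the guesswork.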
(Figure F12 diagram: symptom → first evidence (2 waveforms + key fields) → chapter routing table for the twelve FAQs above.)
Figure F12. FAQ routing mini-map: each symptom points to the first evidence (2 waveforms + key fields) and the chapter that resolves it.
Cite this figure: “Figure F12 — FAQ Routing Mini-Map (Lighting Protection & Metering), ICNavigator.”