
Programmable Digital LED Driver (I2C/PMBus, Telemetry & Logs)


A programmable digital LED driver is “worth it” when lighting behavior must be defined by data (register maps, curves, and policies) and proven by evidence (telemetry, counters, CRC, and event logs), not by fixed analog defaults. This page shows how to configure safely (shadow → verify → apply → commit), deliver smooth dimming, and make faults diagnosable using measurable fields and a repeatable validation flow.

What it is, and when a “programmable” driver is worth it

A programmable digital LED driver is defined by a register image (what the product “is”), a dimming engine (how brightness changes over time), and read-back evidence (telemetry + fault logs) that makes field issues diagnosable.

Decision rule: “Programmable” is worth it when the product must be configured (SKU flexibility / calibration), controlled (curves & fades without artifacts), and observed (telemetry + logs for traceability) with measurable pass/fail evidence.

Three capability layers that must be delivered (not just claimed)

  • Config (write): the device becomes a specific product by writing a stable image (channel mapping, rated current limits, default behavior). Evidence must include revision/id and config check (CRC/PEC/readback).
  • Control (curves & fades): brightness is produced by a dimming engine (LUT/segments + fade timing). Evidence must include the effective target (what the IC is actually enforcing) to avoid “write succeeded but nothing changed”.
  • Observability (telemetry + fault logs): health and failures are exportable as structured data (status bytes/words, fault flags, event logs with snapshots). Evidence must include “stale/valid” and a way to correlate events (timestamp or monotonic counter).

Typical system roles (who owns what)

  • Host MCU / controller: discovers devices, writes the register image safely, issues runtime dimming commands, and records evidence (readback + error counters).
  • Driver IC: stores the image (shadow/active and optional NVM), executes curves/fades, and exports telemetry/status/log data.
  • Sensors (temperature / current sense): provide inputs for derating and diagnostics (used as evidence fields, not as a topology discussion).
  • Factory calibration tool: programs immutable defaults (factory image), writes calibration coefficients, and records version/CRC for traceability.

Evidence fields to lock at project start (minimum set)

  • Address plan: fixed strap vs soft address; collision avoidance on multi-drop buses.
  • Bus speed & margin: target bitrate + pull-up/capacitance budget (rise-time criterion).
  • Integrity: PEC/CRC enabled? mandatory read-after-write verification? retry policy for NACK/bus error.
  • NVM commit policy: when commits happen, how atomicity is ensured, and how brownout is detected/recovered.
[Figure: Programmable Driver = Image + Dimming Engine + Evidence]
Cite this figure: Capability layers and role/data-flow view for programmable digital LED drivers (conceptual). See References.

System architecture: separating the power stage from the digital plane

This page focuses on the digital plane (bus + registers + telemetry/logs). The power stage can be treated as a separate layer, as long as the digital plane remains measurable, recoverable, and noise-resilient.

Digital-plane signals and what they mean (practical semantics)

  • SCL / SDA: configuration writes, runtime control commands, and read-back evidence. Failure signatures include NACK bursts, glitches, and SDA stuck-low.
  • ALERT / INT (if present): event-driven indication that new status/log data is available, reducing blind polling and improving diagnosability.
  • EN / RESET: controlled bring-up and last-resort recovery when the bus hangs or the device enters an unknown state (useful for field robustness).
  • FAULT (if present): hardware-level fault indicator; it must be consistent with status flags/logs to support root-cause analysis.

Key architecture rule: every “digital plane” failure must be observable at a small set of measurement points (SCL/SDA + one event/reset line) and recoverable without power-cycling the entire luminaire.

Isolation and reference domains (interface-only impact)

  • Propagation delay and edge shaping can reduce timing margin at higher bus speeds; “looks fine on paper” can still fail under EMI.
  • Pull-up placement becomes non-trivial across an isolator; poor placement often shows up as slow rise time, NACK bursts, or unstable logic thresholds.
  • Bidirectional behavior can interact with clock stretching and bus recovery, making “stuck bus” incidents harder to clear unless reset/recovery is designed in.

Minimum measurement set (2 + 1) for fast diagnosis

  • TP_SCL: verify rise/fall time, glitches, and continuous clocking during transactions.
  • TP_SDA: verify ACK/NACK behavior and ensure SDA is not held low after an error.
  • TP_INT (or TP_RESET/TP_EN): confirm the device provides an observable event path and a deterministic recovery path.
[Figure: Layering — Power Plane vs Digital Plane (measure & recover)]
Cite this figure: Digital-plane layering and test-point plan (conceptual). See References.

I²C / SMBus / PMBus essentials that actually break products

This chapter is written as a deliverable: it defines address policy, timing margin, and integrity/recovery rules that keep bus transactions stable under real wiring, capacitance, and noise.

Failure signatures to design against: NACK bursts → retry storms, “write OK but behavior unchanged”, bus stuck-low (hung bus), device not-ready windows, and long-cable variance across batches.

Address planning: strap vs soft address (and multi-device collision rules)

  • Strap address: predictable and production-safe; preferred when multiple identical devices share one bus and collisions must be structurally impossible.
  • Soft address: flexible but must include a deterministic source of truth (when it is written, who is allowed to write it, and how it is restored after reset).
  • Collision rule: if two devices respond to one address, the system must enter a no-write safety mode (stop configuration writes) until the conflict is resolved.
  • Scan policy: scanning is for discovery and inventory (read-only); configuration writes should require an explicit match on revision ID (and optional variant ID) to prevent writing the wrong target.
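The scan-versus-write boundary can be sketched as host-side logic. This is a minimal sketch with a hypothetical bus API (`try_read_revision` returns the revision ID byte, or `None` on NACK); the expected-revision value and the repeated-read count are assumptions, not a real driver's interface.

```python
# Scanning is read-only inventory; configuration writes are gated by an
# explicit, repeated revision match on the intended address.

def scan(bus):
    """Read-only discovery over the valid 7-bit address range."""
    return {a: r for a in range(0x08, 0x78)
            if (r := bus.try_read_revision(a)) is not None}

def may_configure(bus, addr, expected_rev, reads=3):
    """Allow writes only if every repeated ID read matches the expected
    revision; any mismatch or inconsistency -> no-write safety mode."""
    return all(bus.try_read_revision(addr) == expected_rev
               for _ in range(reads))
```

Repeated ID reads will not catch every address collision, but inconsistent responses across reads are a cheap symptom worth gating on before any write.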

Timing margin: pull-up + bus capacitance + rise time + clock stretching

  • Rise-time budget is the real limiter in field wiring. Treat the bus as Rpullup + Cb: as Cb grows (cables, connectors, ESD parts), the edge slows and margin collapses.
  • Deliverable requirement: specify a target bus speed together with a maximum allowed Cb (or verified tr) and a recommended Rpullup range.
  • Waveform pass/fail: verify tr at TP_SCL/TP_SDA and ensure the HIGH level stays above the input threshold with margin (glitches near VIH/VIL are the common “works on bench, fails in product” cause).
  • Clock stretching: treat long SCL-low periods as a first-class spec item. Define host tolerance (timeout), record the worst-case stretch, and design retry/backoff so stretching cannot trigger retry storms.
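The rise-time budget above can be estimated with a simple lumped RC model: I²C defines tr between 30% and 70% of VDD, which for an RC charge gives tr = R·C·ln(0.7/0.3) ≈ 0.847·R·C. A minimal sketch, assuming a single lumped Cb:

```python
import math

def rise_time(r_pullup_ohm, c_bus_farad):
    """I2C rise time (30%-70% of VDD) for a lumped RC bus model:
    tr = R*C*ln(0.7/0.3), about 0.847*R*C."""
    return r_pullup_ohm * c_bus_farad * math.log(0.7 / 0.3)

def max_pullup(c_bus_farad, tr_limit_s):
    """Largest pull-up that still meets a given rise-time budget."""
    return tr_limit_s / (c_bus_farad * math.log(0.7 / 0.3))

# Fast-mode (400 kHz) limit is tr <= 300 ns:
tr = rise_time(4700, 100e-12)   # 4.7 kOhm with 100 pF of bus capacitance
```

With 4.7 kΩ and 100 pF the model predicts roughly 398 ns, which already violates the 300 ns Fast-mode limit; the measured tr at TP_SCL/TP_SDA remains the authoritative number.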

Integrity and recovery: PEC/CRC, repeated start, ACK/NACK semantics, retries

  • PEC/CRC: enable when wiring/noise is not tightly controlled. Integrity must be measurable via counters (PEC failures, retry counters) rather than “seems stable”.
  • Repeated start: define whether the target supports it and how the host behaves if a repeated-start sequence fails mid-transaction (abort + recovery path).
  • ACK/NACK semantics: distinguish “not-present/not-ready” from “data rejected/busy” from “integrity failure”; each category must map to a different recovery action.
  • Retry policy: specify max retries, backoff timing, and a forced escape hatch (device reset or bus recovery) to prevent infinite retry loops.
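For reference, SMBus PEC is CRC-8 with polynomial x⁸ + x² + x + 1 (0x07) and zero initial value, computed over every byte of the transaction including the address/R-W byte. A minimal sketch of the CRC itself (transaction framing belongs to the bus driver):

```python
def smbus_pec(data: bytes) -> int:
    """SMBus PEC: CRC-8, polynomial 0x07, init 0x00, MSB-first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc
```

A useful self-check: appending the PEC byte to the message and recomputing must yield 0, which is exactly what a receiving device verifies.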

Evidence checklist (loggable and testable)

  • Electrical: measured tr on SCL/SDA, estimated Cb, observed glitch count (if any), and worst-case clock-stretch duration.
  • Statistics: NACK rate, retry counter, PEC/CRC failure counter, and “stuck-bus incidents”.
  • Policy: address table, scan mode vs write mode boundary, and a defined recovery path for each failure class.
[Figure: I²C Physical Model + Waveform Pass/Fail]
Cite this figure: I²C physical-layer model and waveform pass/fail criteria (conceptual). See References.

Register map strategy: pages, atomicity, and “safe writes”

Treat configuration as a transaction. A write is not complete until the intended image is verified and the active behavior is proven to match (via effective targets and status).

Pages / banks: scalability with explicit targeting

  • PAGE/BANK is an implicit state. If host and device disagree on the active page, writes silently land in the wrong place.
  • Rule: every configuration transaction must begin with an explicit target page selection and must record that page in the host log.
  • Group writes: when multiple channels must be consistent, use a group mechanism (page-wide apply or group commit) rather than sequential live edits.

Atomic update: Shadow → Verify → Apply → Active

  • Shadow: a staging area to build a coherent image (multiple fields, multiple registers) without exposing intermediate states.
  • Verify: read-back and/or checksum/PEC validation catches “half writes”, bus noise, and addressing mistakes before any change becomes live.
  • Apply: a single action that transfers shadow to active, enforcing “all-or-nothing” behavior. A busy/lock flag must gate apply to prevent partial transitions.
  • Active: the only truth for runtime behavior; reading effective targets and status must confirm that active equals the intended image.

Safe write transaction template (repeatable steps)

  1. Select target: set PAGE/BANK and confirm device identity (revision ID).
  2. Write payload: write fields to shadow (prefer block writes where supported) and record a transaction ID on the host.
  3. Read-back verify: read critical fields (or config CRC if available); count errors and enforce a retry/backoff policy.
  4. Apply: assert apply/group-commit; confirm completion (busy cleared) before proceeding.
  5. Prove active: read effective targets + status; if mismatch, record last-error code and stop further writes (prevent cascading failure).
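The five steps above can be condensed into host-side logic against a mock device. Everything here (register names, return codes, the mock's shadow/apply behavior) is illustrative, not a real register map:

```python
class MockDriver:
    """Toy device: a shadow staging area and an atomic apply to active."""
    def __init__(self, rev=0x21):
        self.rev, self.page = rev, 0
        self.shadow, self.active = {}, {}
    def write_shadow(self, fields): self.shadow.update(fields)
    def read_shadow(self): return dict(self.shadow)
    def apply(self): self.active = dict(self.shadow)   # all-or-nothing

def safe_write(dev, page, fields, expected_rev):
    """Returns 'ok' or the step that failed; stops on first mismatch."""
    if dev.rev != expected_rev:
        return "id-mismatch"                  # 1. select + confirm target
    dev.page = page
    dev.write_shadow(fields)                  # 2. write payload (staged)
    shadow = dev.read_shadow()                # 3. read-back verify
    if any(shadow.get(k) != v for k, v in fields.items()):
        return "verify-fail"
    dev.apply()                               # 4. apply (atomic transfer)
    if any(dev.active.get(k) != v for k, v in fields.items()):
        return "apply-fail"                   # 5. prove active
    return "ok"
```

The point of the shape is that every exit path names the failed step, which maps directly onto the host-side transaction log and last-error evidence fields.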

Evidence fields (minimum)

  • Transaction fields: target page, payload length, checksum/PEC status, apply/commit flag, and host-side retry count.
  • Read-back fields: revision ID, config CRC (or equivalent), effective targets, and last-error code / busy flag.
[Figure: Safe Write = Target + Shadow + Verify + Apply + Prove Active]
Cite this figure: Shadow→Verify→Apply→Active transaction model for safe register programming (conceptual). See References.

NVM/OTP: committing defaults without bricking units

Non-volatile storage turns “configuration” into a product promise. The goal is consistent defaults across production lots, safe field updates, and a provable rollback path when writes fail.

Core rule: a commit is never “done” until the target image is validated (CRC/version/valid bit) and the boot selection logic is proven to choose a safe image after resets and brownouts.

Image model: STORE / RESTORE / DEFAULT (without vendor-specific commands)

  • DEFAULT image (Factory baseline): the known-good configuration that must remain recoverable under all failure modes. Treat as read-mostly and immutable in the field.
  • USER image (Field configuration): optional customer/runtime preferences that may be updated, but must never override the ability to boot safely.
  • RUN image (Active/shadow): the working set used during normal operation. RUN may change frequently, but must not trigger frequent NVM writes.
  • STORE persists a selected image; RESTORE loads an image into RUN; DEFAULT is the emergency fallback when validation fails.

Brownout risk: why partial writes brick units (or create “silent corruption”)

  • Failure mode: power loss during commit can leave metadata updated but payload incomplete (or vice versa). The result is an image that “exists” but fails validation.
  • Silent corruption is worse than a hard fail: behavior changes after reboot with no obvious error unless CRC/version/valid bits are checked.
  • Minimum safeguards: commit must be gated by a “safe-to-write” condition (no brownout, stable reset cause), and must end with mandatory validation before switching the active image pointer.
  • After reset: boot selection must prefer the newest valid image; if CRC fails, it must roll back deterministically to a valid prior image (or DEFAULT).
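Boot selection under these rules reduces to "newest valid image wins, DEFAULT is the guaranteed fallback." A sketch with assumed metadata fields; CRC-32 is chosen only for illustration, real parts use whatever integrity check the NVM controller provides:

```python
import zlib

def image_ok(img):
    """An image counts only if its valid bit is set and its CRC checks out."""
    return img["valid"] and zlib.crc32(img["payload"]) == img["crc"]

def select_boot_image(default, candidates):
    """Newest valid candidate wins; DEFAULT is the deterministic fallback."""
    good = [img for img in candidates if image_ok(img)]
    if good:
        return max(good, key=lambda img: img["version"])
    return default
```

Note the asymmetry: a brownout that corrupts the newest image silently demotes it, and the selection falls back one version (or to DEFAULT) without any special-case code.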

Endurance and write-rate limits: factory vs field partitions

  • NVM is not a cache: frequent commits for dimming behavior, telemetry, or runtime tweaks will consume endurance and raise failure probability.
  • Partition strategy: separate “factory baseline” from “field preferences”. Factory partition should be write-protected outside production; field partition must be rate-limited and validated.
  • Commit policy: define allowed commit triggers (e.g., commissioning only), and enforce minimum intervals and maximum lifetime commit counts per partition.
  • Rollback discipline: never overwrite the only known-good image. Always keep at least one prior valid image in reserve.

Evidence checklist (must be readable via host or diagnostics)

  • Commit trace: commit counter, last commit status (OK/FAIL/INCOMPLETE), and a timestamp or sequence number if available.
  • Validation: image CRC per image (DEFAULT/USER A/USER B) plus version and valid bit.
  • Power event proof: brownout flag and reset cause around commits.
[Figure: NVM Image Layout + Validation + Rollback]
Cite this figure: NVM image layout with DEFAULT + USER A/B, CRC/version/valid metadata, and deterministic rollback (conceptual). See References.

Dimming engine: curves, fades, and what “linear” really means

Curves and fades are a pipeline problem: commands are interpreted by an engine, converted into a current target, then constrained by clamps and derating. “Linear” must be defined in the domain that matters.

Curve representations: LUT vs segmented linear vs polynomial (concept-level tradeoffs)

  • LUT: predictable and calibratable; low-level control can be dense where the eye is most sensitive. Cost is points and storage.
  • Segmented linear: compact and stable; good when register space is limited while still allowing “denser low end”.
  • Polynomial: few parameters but sensitive to edge behavior and numeric stability. Requires careful bounding to avoid overshoot near endpoints.
  • Key engineering point: the curve defines low-light resolution and step visibility, not just a mapping from “percent” to “current”.

Fades: timebase, step strategy, and avoiding visible stair-steps

  • Timebase: an internal engine tick is deterministic; host-timed updates can jitter with scheduling and bus latency.
  • Step strategy: a constant step size often looks “steppy” at low light. Better strategies densify steps near low levels or use time-normalized interpolation.
  • Interpolation: define how intermediate points are computed (nearest / linear between points). Even a LUT needs an interpolation policy for smooth fades.
  • Deep dimming stability: use a minimum current clamp to avoid dropouts and a controlled transition through the lowest region where quantization dominates.
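Time-normalized interpolation from the step-strategy bullet can be sketched as follows. The tick length and the inclusive endpoint are assumptions, and a real engine would add the low-end densification and clamping discussed above:

```python
def fade_steps(start, end, fade_time_s, tick_s):
    """One linearly interpolated level per engine tick; total fade time is
    deterministic regardless of the distance travelled."""
    n = max(1, round(fade_time_s / tick_s))
    return [start + (end - start) * i / n for i in range(1, n + 1)]
```

Because the step count comes from the timebase rather than the level delta, a short fade over a large range and a long fade over a small range both land exactly on the target at the final tick.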

Perceptual consistency: current-linear is not visually-linear

  • Gamma/log curves are practical tools for “equal perceived steps”. The point is not the math, but the measurable outcome: fewer visible jumps at low levels.
  • Define “linear” explicitly: linear in current, linear in perceived brightness, or linear in command scale. The chosen definition must match product expectations.
  • Stability knobs: clamp, optional dither, and derating must be placed in the pipeline to avoid unexpected jumps when constraints engage.
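As a concrete example of "linear in perceived brightness," equal command steps can be pushed through a gamma curve before the minimum-current clamp engages. Gamma 2.2 and the clamp value below are illustrative, not recommendations:

```python
def command_to_current(cmd, full_scale=1.0, gamma=2.2, min_clamp=0.002):
    """cmd in [0, 1] -> current fraction. Zero stays fully off; any nonzero
    command is floored at the clamp so deep dimming cannot drop out."""
    if cmd <= 0.0:
        return 0.0
    return max(min_clamp, full_scale * cmd ** gamma)
```

The clamp sits after the curve on purpose: placing it before the curve would distort the low-end shape, which is exactly the region the curve exists to protect.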

Evidence checklist (configuration + verification)

  • Config: curve ID, LUT points (or segment params), fade time, step rate, min current clamp, and dither enable (if supported).
  • Output: ripple vs dim level (trend), plus a low-level stability metric such as jitter/jump count per time window.
[Figure: Curve Pipeline — Command → Engine → Current Target → Output]
Cite this figure: Dimming curve pipeline and stability insertion points (clamp/derate/dither) with verification hooks (conceptual). See References.

Runtime control vs protection overrides: who wins

A dimming command is not the output. The output is the result of a priority resolver that combines runtime intent with protection and derating constraints. When writes appear to “do nothing” in the field, the missing piece is usually visibility into which layer is winning and what its recovery rules are.

Practical model: the resolver produces an effective current target. If any hard shutdown condition is active (UVLO/OTP/critical latch), the effective target becomes zero regardless of the runtime command.

Priority ladder (from highest to lowest)

  • Hard shutdown: UVLO / OTP / critical fault → effective target forced to 0 (safe state).
  • Fault latch: latched short/open/overcurrent → blocks output until clear rules are satisfied.
  • Thermal derating: scales down the target (derating factor) to avoid reaching a shutdown threshold.
  • Soft constraints: min clamp, slew limit, fade step-rate caps → reshape the target without declaring a fault.
  • Runtime intent: manual dim target / fade engine output (the “requested” target).
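The ladder can be written directly as a resolver function. The check order mirrors the list above; the signature and names are assumptions for illustration:

```python
def resolve(intent, derate=1.0, min_clamp=0.0,
            fault_latched=False, hard_shutdown=False):
    """Effective current target after all overrides (highest priority first)."""
    if hard_shutdown:              # UVLO / OTP / critical -> safe state
        return 0.0
    if fault_latched:              # blocked until clear rules are satisfied
        return 0.0
    target = intent * derate       # thermal derating scales, never faults
    if target > 0.0:
        target = max(target, min_clamp)   # soft constraint reshapes target
    return target
```

The return value is the single "effective current target" the four-field debug method relies on: exporting it next to the derating factor and latch bit makes every "write did nothing" report decodable.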

Soft derate vs hard off: recovery rules decide field behavior

  • Soft derate keeps output on but reduces brightness. It must include hysteresis to prevent oscillation near thresholds.
  • Hard off forces output to zero. It must define clear conditions (cool-down timer, retry budget) to avoid “never recovers” failures.
  • Recovery triggers typically combine: temperature below a release threshold, a minimum time window, and a stable input condition (no UVLO/reset churn).
  • Command consistency: after recovery, the resolver should return to the latest valid runtime intent, not an undefined intermediate value.

Debug method: prove the winning layer using four fields

  • effective current target: what is actually applied after all overrides.
  • derating factor: the scaling applied by thermal or other derate logic (explains “why dimmer”).
  • fault latch bit: indicates “blocked until cleared” conditions (explains “why stuck off”).
  • retry timer: remaining cool-down / retry delay (explains “when it may recover”).

A runtime write that “does not work” is diagnosable when the effective target is visible alongside derating and latch state.

[Figure: Priority Resolver — Runtime Intent vs Protection Overrides]
Cite this figure: Priority resolver that merges runtime intent, derating, and protection layers into an effective current target (conceptual). See References.

Telemetry: what to measure, how to trust it, how to use it

Telemetry is a signal chain, not a list of numbers. Trust depends on sampling path, calibration, filtering, update timing, and explicit freshness/range flags. Use depends on thresholds and trends that can be executed locally by the host.

What to measure: categories that explain behavior

  • Supply health: VIN/VOUT indicates margin to UVLO and identifies sag events that change effective output.
  • Output proof: ILED (and duty/target indicators if available) proves whether the effective target is being met or constrained.
  • Thermal context: temperature explains derating engagement and proximity to shutdown thresholds.
  • Energy estimate: power estimation supports trend-based warnings and detects abnormal load or thermal drift patterns.

How to trust it: raw vs scaled, calibration, filtering, and freshness

  • Raw vs scaled: raw codes (ADC counts) expose clipping and offset; scaled values apply units and calibration coefficients.
  • Calibration coefficients: define absolute accuracy; coefficients should have an identifier or revision for traceability.
  • Filter window: reduces noise but adds latency; the host must interpret values in the context of the filter and update period.
  • Update period and stale flag: a fresh-but-late value is different from a stale value; both must be detectable.
  • Range/clip flags: if a channel is saturated or out-of-range, scaled values may be misleading even if they look stable.
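These trust rules translate into two small host-side gates: a scaling step driven by calibration coefficients, and a freshness/range check that must pass before a sample counts as evidence. The field names are assumptions:

```python
def scale(raw, gain, offset):
    """ADC code -> engineering units via calibration coefficients."""
    return raw * gain + offset

def usable(sample, now_s, max_age_s):
    """A sample is evidence only if it is not stale, not clipped, and
    recent enough for the configured update period."""
    if sample["stale"] or sample["clipped"]:
        return False
    return (now_s - sample["t"]) <= max_age_s
```

Keeping the gate separate from the scaling step matters: a clipped or stale sample can still produce a plausible-looking scaled value, which is exactly the failure mode the flags exist to block.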

How to use it locally: threshold alerts and trend warnings (no cloud required)

  • Threshold alerts: VIN near UVLO, temperature near derate/shutdown, ILED deviation from effective target beyond tolerance.
  • Trend warnings: rising temperature slope, repeated VIN dips, or increasing power estimate over time (indicates cooling degradation or load change).
  • Reliability checks: ignore updates marked stale; down-rank channels with clip flags; require persistence across multiple fresh samples.

Evidence checklist (minimum telemetry packet for field reproducibility)

  • telemetry raw and telemetry scaled captured together (same sample time).
  • cal coefficients (or coefficient set ID / revision).
  • update period and latency context (filter window implied by configuration).
  • stale flag to prevent using old values as proof.
  • range/clip flags to identify saturation and out-of-range behavior.
[Figure: Telemetry Chain — ADC → Filter → Scale/Cal → Registers → Host]
Cite this figure: Telemetry signal chain showing raw/scaled paths, calibration, filtering, register flags, and local host usage (conceptual). See References.

Fault flags & event logs: making failures diagnosable

A fault that cannot be explained becomes a product failure. The goal is traceability: a compact status view for “what is wrong now,” plus a durable history of “what happened first” and “what the system looked like at that moment.”

Traceability chain: Flags (state) + Time (counter) + Snapshot (context) + Log (history) → host decode.

Flags: transient vs latched (and why both matter)

  • Transient (“seen”) indicates an event occurred at least once. It is essential for intermittent issues that self-recover before anyone reads status.
  • Latched (“blocked”) indicates recovery is intentionally prevented until clear conditions are met. It protects safety and prevents rapid re-trigger cycles.
  • Interpretation rule: a clean “current state” without history cannot explain field reports. A durable “seen” bit without a current latch cannot explain whether the device is still in a faulted condition.

Multi-fault concurrency: avoid losing the root cause

  • Status word/byte as a full bitfield: preserves the “many things can be true” reality.
  • Fault code as a quick classifier: points to the dominant category for fast triage.
  • First-fault pointer: captures the earliest trigger so later secondary effects do not overwrite the root cause.
  • Event log (ring): records fault ordering so concurrency can be replayed rather than guessed.

Event logs: ring buffer + snapshot fields that actually diagnose

An event log should not only store “what fault happened,” but also “what the system looked like when it happened.” A minimal snapshot typically includes input margin (VIN/VOUT), thermal context (temperature), and control context (requested target vs effective target). These three are usually enough to distinguish override-driven behavior from genuine electrical faults.
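A minimal sketch of such a log, with a write-once first-fault pointer and the snapshot fields named above; the whole structure is assumed for illustration:

```python
class FaultLog:
    """Fixed-depth ring of fault entries plus a first-fault anchor that is
    written once and preserved until an explicit clear."""
    def __init__(self, depth=8):
        self.depth, self.entries, self.seq = depth, [], 0
        self.first_fault = None
    def record(self, code, vin, temp, requested, effective):
        entry = {"seq": self.seq, "code": code, "vin": vin, "temp": temp,
                 "requested": requested, "effective": effective}
        self.seq += 1
        self.entries = (self.entries + [entry])[-self.depth:]  # ring behavior
        if self.first_fault is None:
            self.first_fault = entry          # root-cause anchor, write-once
    def explicit_clear(self):
        self.entries, self.first_fault = [], None
```

The monotonic `seq` field stands in for a timestamp: it orders events across the ring even after old entries are overwritten, which is what the "timestamp/counter" evidence field is for.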

Clear strategy: preserve evidence without blocking recovery

  • Clear-on-read: simple, but risky for diagnostics because evidence disappears when polled. Best limited to counters or non-critical transient summaries.
  • Explicit clear: preserves evidence until a deliberate action clears it. Best for latched faults and first-fault capture.
  • Power-cycle clear: can mask repeating issues if evidence vanishes on every restart. Use carefully, and prefer retaining first-fault/log history across resets when possible.

Evidence fields (minimum set for reproducible fault analysis)

  • status word/byte (full bitfield)
  • fault code (primary classifier)
  • first-fault pointer (root-cause anchor)
  • log index (ring buffer position)
  • timestamp/counter (ordering without requiring real time)
[Figure: Fault Traceability — Flags → Snapshot → Ring Log → Host Decode]
Cite this figure: Fault traceability pipeline showing status flags, first-fault capture, snapshot fields, ring-buffer logging, and host decoding (conceptual). See References.

Robustness: bus integrity under EMI, isolation, and hot-plug

A robust control bus must remain recoverable in noisy environments. The engineering goal is not “never errors,” but “errors are detectable, counted, and recoverable without human intervention,” including isolation boundaries and hot-plug disturbances.

EMI failure signatures that break products

  • SCL glitches: narrow pulses or spikes that look like extra clocks to state machines.
  • SDA bit flips: unintended data transitions causing corrupted bytes or false start/stop interpretation.
  • Hung bus: SCL or SDA held low (often SDA) so no new transaction can begin.

Recovery ladder: detect → free the bus → re-sync → escalate if needed

A practical recovery sequence starts with stuck-low detection (line low beyond a safe time budget), then applies clock-pulse recovery to release a device stuck mid-bit, and finally issues a STOP to force the bus back to idle. If the bus remains hung, escalation uses a device reset line or watchdog strategy.

  • Step 1 — detect: SDA (or SCL) low longer than a defined threshold → declare “hung.”
  • Step 2 — recover: drive 9 clock pulses to advance a stuck receiver through remaining bits.
  • Step 3 — re-sync: generate a STOP condition (SCL high while SDA rises) to return to idle.
  • Step 4 — escalate: if still hung, apply device reset or rely on watchdog to prevent permanent deadlock.
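The four steps map onto a small host routine over a hypothetical pin-level API (`read_sda`, `pulse_scl`, `send_stop`, and `hard_reset` are assumed helpers, not a real HAL):

```python
def recover_bus(pins, stuck_threshold_s, low_for_s):
    """Returns the escalation level that freed the bus."""
    if low_for_s < stuck_threshold_s:
        return "not-hung"                 # step 1: detect before acting
    for _ in range(9):                    # step 2: clock out a stuck receiver
        pins.pulse_scl()
        if pins.read_sda():               # SDA released mid-sequence
            break
    pins.send_stop()                      # step 3: re-sync the bus to idle
    if pins.read_sda():
        return "recovered"
    pins.hard_reset()                     # step 4: escalate deterministically
    return "reset"
```

Counting which exit was taken (recovered vs reset) feeds directly into the stuck-low and reset-cause evidence fields below.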

Isolation boundary effects (interface-level only)

  • Propagation delay reduces timing margin and can reshape edges seen by the bus.
  • Pull-up placement matters across the boundary; the “bus” can behave like two segments with different rise behavior.
  • Bidirectional limits can affect edge cases (including recovery pulses and any stretching-like behavior), so recovery must be validated across the isolation path.

Evidence fields (make robustness measurable)

  • stuck-low detection (count and/or duration)
  • bus error counters (NACK/timeouts/corruption proxy counters)
  • reset cause (bus-driven vs other sources)
  • watchdog reset count (deadlock prevention indicator)
[Figure: Hung Bus — Signature and Recovery (9 clocks + STOP)]
Cite this figure: Hung-bus signature and bus recovery sequence using clock pulses and STOP, with escalation and interface-level isolation notes (conceptual). See References.

Validation plan: bring-up → programming → dimming quality → telemetry/log verification

This gate-based plan validates a programmable digital LED driver from first contact to diagnosable failures. Each gate defines what must be proven, the two waveform groups to capture, and pass/fail criteria that can be reused in R&D bring-up and production.

Waveform rule (per gate): always capture Bus (SCL/SDA) + One proof signal (ILED or INT/ALERT). If channels are limited, prefer SCL + ILED for dimming gates and SCL + INT for fault/log gates.

Gate 0 — Bench sanity (avoid false failures)

Prove the test setup is not generating bus faults: stable idle levels, no stuck-low, and predictable reset/INT behavior before any configuration writes.

Waveforms (2 groups)
  • Bus: SCL/SDA idle level + first transaction edge quality
  • Proof: INT/ALERT at power-up (or ILED if INT not available)
Pass / Fail
  • PASS: no sustained stuck-low; clean edges without repeated unintended pulses
  • FAIL: SDA (or SCL) held low beyond a defined window; recurring glitches at idle
Evidence fields to record
reset cause, timestamp/counter baseline, bus error counters (host), stuck-low count/duration (host)
MPN examples (bench / interface)
Logic analyzer: Saleae Logic Pro 8 (SAL-00113) · I²C isolator (optional boundary test): ADuM1250 or ISO1540-Q1

Gate 1 — Interface connectivity (scan + read ID)

Confirm the device is discoverable at the intended address plan and returns stable identity/capability fields across repeated reads.

Stimulus
Address scan strategy (no uncontrolled “storm”) → read revision/ID/capability → repeat N times.
Waveforms (2 groups)
  • Bus: one full scan burst + ID read transaction
  • Proof: INT/ALERT (if present) for comm-error signaling
Pass / Fail
  • PASS: address is stable; ID/revision reads match every time; NACK/retry remains near-zero in the test window
  • FAIL: address intermittently disappears; ID varies; repeated NACK bursts or bus lockups
Evidence fields to record
revision ID, status word/byte (comm-related), NACK rate / retry counter (host), timestamp/counter
MPN examples (device under test)
I²C LED driver examples for connectivity exercises: PCA9955B (16-ch) / PCA9956B (24-ch)
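The Gate 1 repeat-read criterion can be scripted directly. A minimal sketch, assuming a hypothetical `read_id(addr)` callable that returns the ID byte or raises `IOError` on NACK/bus error (real transports surface errors differently):

```python
def verify_id_stable(read_id, addr, repeats=10):
    """Gate 1 check: read identity `repeats` times; PASS only if every read
    succeeds and all returned IDs match. Returns (passed, nack_count, ids)."""
    ids, nacks = [], 0
    for _ in range(repeats):
        try:
            ids.append(read_id(addr))
        except IOError:              # NACK / bus error modeled as IOError
            nacks += 1
    passed = nacks == 0 and len(ids) == repeats and len(set(ids)) == 1
    return passed, nacks, ids
```

The NACK count and the ID list go straight into the Gate 1 evidence fields (revision ID, NACK rate/retry counter).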

Gate 2 — Programming safety (shadow → apply → readback)

Prove that configuration updates are atomic and auditable: write staging registers, apply in a single step, then read back to verify.

Stimulus
Write shadow (not active) → set apply/commit bit → read-after-write verification (including PEC/CRC if enabled).
Waveforms (2 groups)
  • Bus: full write sequence (page/select + payload + apply)
  • Proof: INT/ALERT pulse timing around apply or error
Pass / Fail
  • PASS: readback matches written values; config CRC/PEC passes; last-error remains clear
  • FAIL: partial writes, mismatched readback, repeated retries, or apply produces inconsistent active behavior
Evidence fields to record
target page/bank, payload, checksum/PEC enable, apply flag, revision ID, config CRC, last-error code
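The shadow→apply→readback sequence can be exercised against a toy register model before touching hardware. `SimDevice` and its method names are illustrative, not any real part's register map:

```python
class SimDevice:
    """Toy register model: a shadow (staging) bank, an active bank, and a
    single apply step that activates the whole staged image at once."""
    def __init__(self):
        self.shadow, self.active = {}, {}
    def write_shadow(self, reg, val):
        self.shadow[reg] = val & 0xFF       # 8-bit registers assumed
    def apply(self):
        self.active.update(self.shadow)     # atomic activation step
    def read_active(self, reg):
        return self.active.get(reg, 0)

def program_and_verify(dev, image):
    """Gate 2 sequence: write shadow -> apply -> read-after-write.
    Returns the list of registers whose active value mismatches the intent."""
    for reg, val in image.items():
        dev.write_shadow(reg, val)
    dev.apply()
    return [r for r, v in image.items() if dev.read_active(r) != v]
```

Note how readback catches an out-of-range payload that the write path silently truncated, which is exactly the "write succeeded but nothing changed" class of bug this gate exists to expose.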

Gate 3 — NVM commit robustness (including brownout drill)

Prove non-volatile defaults can be stored without bricking units. Validate both “clean commit” and “power-loss during commit” behaviors.

Stimulus
  • Normal commit → power-cycle → re-scan + readback
  • Brownout drill: interrupt power within the commit window → power-cycle → verify recovery path (safe image / invalid flag)
Waveforms (2 groups)
  • Bus: commit transaction + post-reset recovery reads
  • Proof: INT/ALERT (commit status / failure indication) or ILED (if commit affects output mode)
Pass / Fail
  • PASS: after any drill, the device is discoverable and identity is readable; image CRC indicates valid/invalid deterministically; rollback behavior is predictable
  • FAIL: address disappears permanently; persistent stuck bus; CRC/state becomes non-deterministic across retries
Evidence fields to record
commit counter, last commit status, image CRC, brownout flag, reset cause, timestamp/counter
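The brownout drill maps naturally onto a two-phase commit model: the "valid" mark is written last, so an interrupted commit is always detectable at boot. A sketch with a dict standing in for NVM and CRC-32 standing in for whatever checksum the device actually uses:

```python
import zlib

def commit(nvm, payload, brownout_at=None):
    """Two-phase commit sketch: invalidate, write payload + CRC, then mark
    valid last. `brownout_at` (0..2) simulates power loss before that step."""
    steps = [
        ("valid", False),
        ("image", (payload, zlib.crc32(payload))),
        ("valid", True),
    ]
    for i, (key, val) in enumerate(steps):
        if brownout_at is not None and i >= brownout_at:
            return                      # power lost: remaining steps never run
        nvm[key] = val

def boot_image(nvm, factory):
    """Deterministic boot: load the user image only if it is marked valid
    and its CRC passes; otherwise fall back to the factory image."""
    if nvm.get("valid") and "image" in nvm:
        payload, crc = nvm["image"]
        if zlib.crc32(payload) == crc:
            return payload
    return factory
```

Whatever the interruption point, the boot path is deterministic, which is the Gate 3 PASS condition ("image CRC indicates valid/invalid deterministically; rollback behavior is predictable").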

Gate 4 — Dimming quality (curve consistency + fade smoothness + deep-dim stability)

Validate the dimming engine output as a measurable signal: consistent mapping from dim command to ILED, smooth fades without visible steps, and stable behavior at very low targets.

Stimulus
  • Curve: select curve ID → sweep a defined set of dim levels (low/mid/high)
  • Fade: execute up/down fades with fixed fade time and step policy
  • Deep dim: hold at minimum clamp for a dwell window and observe stability
Waveforms (2 groups)
  • Bus: dim command + fade programming sequence
  • Proof: ILED waveform (ripple, steps, dropouts, monotonicity)
Pass / Fail (example measurable criteria)
  • PASS: ILED is monotonic with dim code; fade has bounded step amplitude; deep-dim dwell shows no periodic drop-to-zero or uncontrolled jumps
  • FAIL: non-monotonic points, repeated step discontinuities, or deep-dim instability events above an allowed count
Evidence fields to record
curve ID, LUT/segment descriptor ID, fade time, step rate, min current clamp, dither enable (if any), effective current target, deep-dim instability counter
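The Gate 4 pass/fail criteria (monotonicity plus bounded step amplitude) reduce to a simple check over measured ILED samples ordered by dim code. A sketch; units and the step threshold are up to the test plan, not the device:

```python
def check_dimming(iled, max_step):
    """Gate 4 check on ILED measurements ordered by ascending dim code.
    PASS if the sequence is monotonic non-decreasing and no step between
    adjacent codes exceeds max_step. Returns (passed, worst_step)."""
    steps = [b - a for a, b in zip(iled, iled[1:])]
    monotonic = all(s >= 0 for s in steps)
    worst = max((abs(s) for s in steps), default=0.0)
    return monotonic and worst <= max_step, worst
```

The same function works on a fade capture: feed it the sampled ILED trajectory and `max_step` becomes the bounded fade-step-amplitude criterion.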

Gate 5 — Telemetry consistency (accuracy + latency + filter behavior)

Prove telemetry is trustworthy: raw-to-scaled mapping is consistent, update period behaves as specified, and stale/clip flags correctly describe data validity.

Stimulus
  • Read raw + scaled pairs repeatedly at a fixed rate
  • Apply a controlled change (e.g., dim step) and measure telemetry latency and settling
  • Validate stale/clip behavior by pausing reads or pushing ranges intentionally
Waveforms (2 groups)
  • Bus: telemetry polling burst (to correlate with update period)
  • Proof: INT/ALERT (threshold/abnormal indication) or ILED (to correlate telemetry vs output)
Pass / Fail (example measurable criteria)
  • PASS: update period stays within a bounded tolerance; scaled values track reference trends; stale flag asserts only when appropriate; clip flags align with forced range conditions
  • FAIL: update period jitter beyond tolerance, inconsistent scaling vs calibration, stale/clip flags unreliable
Evidence fields to record
telemetry raw vs scaled, calibration coefficients (or coeff set ID), update period, latency estimate, filter window ID, stale flag, range/clip flags, timestamp/counter
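The update-period tolerance check is a short computation over logged sample timestamps. A sketch; the nominal period and tolerance come from the device datasheet, not from this code:

```python
def check_update_period(timestamps, nominal, tol):
    """Gate 5 check: timestamps (seconds) of successive telemetry updates.
    PASS if every inter-sample gap stays within nominal +/- tol.
    Returns (passed, worst_deviation_from_nominal)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    worst = max((abs(g - nominal) for g in gaps), default=0.0)
    return worst <= tol, worst
```

A gap far beyond tolerance is the cue to check the stale flag and bus health before trusting any scaled value in that window.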

Gate 6 — Fault injection (flags → snapshot → log → controlled clear)

Make failures diagnosable on purpose. Inject a controlled fault, verify a log entry is created with the right context, then confirm clearing behavior is deliberate and does not erase evidence unintentionally.

Stimulus (controlled, interface-level)
  • Bus disturbance: create a short hung-bus condition (stuck-low) and verify recovery ladder
  • Threshold fault: trigger a defined alarm/limit crossing (without redesigning the power stage)
  • Then: read status → read snapshot fields → read log index/entry → apply explicit clear and confirm post-clear state
Waveforms (2 groups)
  • Bus: injection moment + recovery pulses + STOP re-sync
  • Proof: INT/ALERT timing (fault set/clear), plus ILED if output behavior is relevant
Pass / Fail
  • PASS: flags set with correct transient/latched semantics; log index advances; snapshot contains VIN/temp/targets; explicit clear is controllable and does not erase first-fault/history unexpectedly
  • FAIL: no log entry, missing snapshot context, uncontrolled auto-clear, or recovery enters a reset storm
Evidence fields to record
status word/byte, fault code, first-fault pointer, log index, timestamp/counter, clear method used, bus recovery count, reset cause, watchdog reset count
MPN examples (isolation / hot-plug robustness testing)
I²C isolators to validate boundary behavior: ADuM1250 / ADuM1251 · ISO1540-Q1
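The Gate 6 recovery ladder (9 clock pulses → STOP → re-check → escalate to reset) can be expressed as a small host routine. The bus primitives here (`sda_is_low`, `pulse_scl`, `send_stop`, `device_reset`) are hypothetical placeholders for whatever the host HAL provides; `FakeBus` exists only to demonstrate the ladder:

```python
def recover_bus(bus, max_rounds=3):
    """Escalating I2C recovery ladder: clock out a stuck slave bit-by-bit
    (up to 9 SCL pulses), issue a STOP to re-sync, re-check the line, and
    only escalate to a device reset after max_rounds failed attempts."""
    for _ in range(max_rounds):
        if not bus.sda_is_low():
            return "idle"               # bus released: safe to re-scan
        for _ in range(9):
            bus.pulse_scl()             # one pulse per possibly-held bit
        bus.send_stop()                 # re-sync all devices to idle
    bus.device_reset()                  # last resort in the ladder
    return "reset"

class FakeBus:
    """Demo stand-in: SDA releases after `release_after` clock pulses."""
    def __init__(self, release_after):
        self.pulses, self.release_after = 0, release_after
        self.stops = self.resets = 0
    def sda_is_low(self):
        return self.pulses < self.release_after
    def pulse_scl(self):
        self.pulses += 1
    def send_stop(self):
        self.stops += 1
    def device_reset(self):
        self.resets += 1
```

Logging which rung of the ladder succeeded (pulses only, pulses + STOP, or full reset) fills the "bus recovery count" and "reset cause" evidence fields.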
[Figure] Gate-based validation flow (bring-up → program → NVM → dimming → telemetry → faults): ordered gates G0 bench sanity (idle + reset proof), G1 scan + read ID (address + identity), G2 program safely (shadow→apply→readback), G3 NVM commit (brownout drill), G4 dimming quality (curve + fade + deep dim), G5 telemetry (raw/scaled + stale), G6 fault injection → log verify → controlled clear (flags + snapshot + log index + clear method). Each gate captures Bus (SCL/SDA) plus a proof signal (ILED or INT) and logs an evidence pack: pass/fail, status word/byte, log index, counter/time, reset cause, bus counters, commit + CRC, clear method.
Cite this figure: Gate-based validation flow for programmable digital LED drivers, showing the ordered test gates, required waveform checkpoints (SCL/SDA + ILED/INT), and the evidence-pack fields to log. See References.


FAQs (Programmable Digital LED Driver)

Each answer stays inside this page boundary and points back to measurable evidence fields (readback, counters, flags, CRC, effective targets, and log indices).

Write succeeds but behavior doesn’t change — shadow/apply issue or priority override?
If readback matches but output behavior doesn't change, it's usually an apply/atomicity gap or a priority override. Confirm the config actually moved from shadow to active, then check whether the resolver clamps the final output due to derating or a latched fault. First fix: enforce read-after-write + apply sequencing, then clear the latch or relax the override conditions.
Evidence fields: apply/commit flag, config CRC, effective current target, derating factor, fault latch bit. (→H2-4/H2-7)
Units drift apart after production — did NVM commit differ or calibration scaling differ?
Separate “stored defaults differ” from “same raw, different scaling.” First compare NVM image validity and versioning across units; then compare raw telemetry versus scaled values under the same stimulus. If raw matches but scaled diverges, calibration coefficients or coefficient set IDs are inconsistent. First fix: version coefficient sets, log the active set, and verify commit + CRC at end-of-line.
Evidence fields: commit counter, image CRC/version, telemetry raw vs scaled, cal coefficients/set ID, last commit status. (→H2-5/H2-8)
Dimming looks stepped at low levels — LUT resolution, step rate, or minimum clamp?
Low-level stepping usually comes from a coarse LUT (too few points near zero), a large engine step size, or a minimum-current clamp that collapses multiple dim codes to one target. Verify how many distinct targets exist below the clamp and whether dithering is enabled to smooth quantization. First fix: densify the low-end LUT, reduce step size/increase tick rate, and tune the clamp.
Evidence fields: curve ID, LUT points/segment ID, step rate, fade tick/step, min current clamp, dither enable, effective current target. (→H2-6)
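The "how many distinct targets exist below the clamp" question can be answered directly from the LUT before any scope time is spent. A sketch, assuming the clamp simply raises every non-zero target to the minimum current (real clamp semantics vary by part):

```python
def distinct_targets_below(lut, clamp, threshold):
    """Count distinct effective current targets in the deep-dim region.
    Assumed clamp behavior: any non-zero LUT target is raised to at least
    `clamp`; zero stays off. Fewer distinct targets below `threshold`
    means more visible stepping at low dim levels."""
    effective = {max(v, clamp) if v > 0 else 0.0 for v in lut}
    return len({v for v in effective if 0 < v < threshold})
```

If densifying the low-end LUT doesn't raise this count, the clamp (not LUT resolution) is what is collapsing the dim codes.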
Fade sometimes stutters — bus retries or engine tick starvation?
A stuttering fade is either command delivery jitter (retries/NACK storms) or engine timebase issues (tick/step not serviced). Correlate stutter moments with bus error counters; then compare the programmed fade time/step rate versus actual output steps. First fix: reduce competing traffic during fades, cap retries, and validate tick/step scheduling under worst-case polling.
Evidence fields: NACK rate, retry counter, bus error counters, fade time, step rate, effective current target trajectory. (→H2-3/H2-6/H2-10)
Telemetry numbers “freeze” — stale flag, update period, or bus hung?
Treat a “freeze” as either valid-but-slow data, invalid data (stale), or a transport failure. Check the stale flag and update period first; then check stuck-low detection and bus error counters. If stale asserts while the bus is healthy, it’s a sampling/refresh issue. If bus health collapses, run recovery and confirm reads resume before trusting any values.
Evidence fields: stale flag, update period, telemetry timestamp/counter, stuck-low detect, bus error counters, reset cause. (→H2-8/H2-10)
Fault clears but returns instantly — latch vs retry policy vs unresolved root cause?
If a fault returns immediately, either it’s latched (clear does nothing), the retry policy re-triggers too fast, or the root condition is still present. Check the latch bit and retry timer behavior; then confirm whether the first-fault pointer keeps pointing to the same code. First fix: gate clear operations on “condition removed,” add hysteresis/timers, and preserve first-fault context for diagnosis.
Evidence fields: status word/byte, fault latch bit, fault code, first-fault pointer, retry timer, derating factor/effective target. (→H2-9/H2-7)
Random NACK bursts in field — pull-up sizing, capacitance growth, or EMI spikes?
Classify NACK bursts by waveform evidence: slow edges and low high-level margin point to pull-up/capacitance issues, while sharp edges plus isolated glitches point to EMI spikes. Calculate rise-time margin from Rpullup and Cb, then correlate NACK clusters with bus error counters and any stuck-low events. First fix: tighten rise-time (stronger pull-up or lower speed), reduce traffic bursts, and add recovery thresholds.
Evidence fields: Rpullup, Cb, tr, high-level margin, NACK rate, bus error counters, stuck-low event count. (→H2-3/H2-10)
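The rise-time arithmetic is standard: the I²C-bus specification defines t_r from 30% to 70% of VDD, which for an RC-limited line gives t_r = R·C·ln(7/3) ≈ 0.847·R·C, to be compared against the spec limits of 300 ns (Fast-mode, 400 kHz) or 1000 ns (Standard-mode):

```python
import math

def i2c_rise_time(r_pullup, c_bus):
    """Estimated 30%->70% rise time of an RC-limited I2C line:
    t_r = R * C * ln(7/3)  (~0.847 * R * C)."""
    return r_pullup * c_bus * math.log(7.0 / 3.0)

def rise_margin(r_pullup, c_bus, t_r_max=300e-9):
    """Margin against the Fast-mode 300 ns limit by default; pass
    t_r_max=1000e-9 for Standard-mode. Negative margin = spec violation."""
    return t_r_max - i2c_rise_time(r_pullup, c_bus)
```

For example, 4.7 kΩ pull-ups into 100 pF of bus capacitance land near 400 ns, which already violates Fast-mode, whereas 2.2 kΩ into the same capacitance leaves positive margin; negative margin is the tell for the "slow edges" NACK-burst class.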
Bus stuck low after hot-plug — recovery sequence or reset cause chain?
After hot-plug, a stuck-low bus is often a reset/brownout chain that leaves a device holding SDA, or a missed recovery ladder. Confirm reset cause and stuck-low detection first. Then run a standard recovery: 9 SCL pulses, STOP, re-scan, and only then issue a device reset if needed. If isolation is used, verify bidirectional behavior at the boundary (e.g., ISO1540-Q1).
Evidence fields: reset cause, brownout flag, stuck-low detect, bus recovery count, scan result, last-error code. (→H2-10)
Two drivers respond to same address — strap conflict or soft-address mis-write?
Decide whether the address is hardware-derived or software-assigned. If the collision persists after power-cycle, suspect strap conflict; if it changes across commits, suspect a soft-address write to the wrong page/bank or missing readback verification. First fix: enforce an address ownership rule (one source of truth), log the address source, and require read-after-write + CRC/commit verification for any soft-address update.
Evidence fields: scan collision signature, target page/bank, readback address, config CRC, last-error code, commit counter/status. (→H2-3/H2-4)
Config corruption after brownout — commit atomicity or missing image CRC?
If corruption follows brownout, verify whether commit is truly atomic and whether image CRC is enforced on boot. A correct design must detect invalid images and fall back deterministically (factory image or last-known-good). First fix: use a two-phase commit (write + validate + mark valid), store version + CRC, and block loading user image when CRC fails. Always log reset cause and brownout flags for correlation.
Evidence fields: brownout flag, reset cause, last commit status, image CRC, commit counter, rollback reason/state. (→H2-5)
Logs exist but are not useful — missing snapshot fields or wrong trigger design?
A useful log needs three things: a trigger that captures “first cause,” a snapshot that explains context, and indices/timestamps that preserve ordering. If logs only contain a code, add snapshot fields like VIN, temperature, effective target, and derating factor. If ordering is unclear, fix ring-buffer semantics and expose log index plus first-fault pointer. First fix: define a minimum snapshot contract and validate it in fault-injection tests.
Evidence fields: fault code, log index, timestamp/counter, first-fault pointer, snapshot VIN/temp/effective target/derating, trigger ID. (→H2-9)
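The "minimum snapshot contract" can be made executable, so fault-injection tests fail loudly when a log entry is missing context. The field set below is an assumption drawn from this page's evidence lists, not a device-defined format:

```python
from dataclasses import dataclass, fields

@dataclass
class FaultSnapshot:
    """Minimum snapshot contract (assumed fields; adapt to the device)."""
    fault_code: int
    log_index: int
    timestamp: int          # monotonic counter or wall time
    vin_mv: int
    temp_c: int
    effective_target: int
    derating_factor: float

def snapshot_complete(entry: dict) -> bool:
    """A log entry is useful only if every contract field is present."""
    return all(f.name in entry for f in fields(FaultSnapshot))
```

Running this check inside Gate 6 turns "logs exist but are not useful" from a field complaint into a bench-time test failure.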
How to structure factory vs field updates safely — what must be immutable?
Split immutable factory essentials from field-tunable settings. Keep safety limits, address source rules, CRC/rollback logic, and calibration baselines immutable or tightly guarded. Allow field updates only in a user partition with write-rate limits and mandatory readback verification. First fix: implement factory+user images (or A/B), record image version/CRC/commit status, and require validation gates for any field update (program, commit, fault/log verification).
Evidence fields: image version, image CRC, commit counter/status, write-frequency policy flag, rollback reason/state, validation gate result. (→H2-5/H2-11)
MPN examples used for reproducible bench & boundary tests
I²C LED driver (test DUT class): PCA9955B, PCA9956B
I²C isolation boundary: ADuM1250, ADuM1251, ISO1540-Q1
Logic analyzer (transaction evidence): Saleae Logic Pro 8 (SAL-00113)