
Programmable Digital LED Driver (I2C/PMBus, Telemetry & Logs)


A programmable digital LED driver is “worth it” when lighting behavior must be defined by data (register maps, curves, and policies) and proven by evidence (telemetry, counters, CRC, and event logs), not by fixed analog defaults. This page shows how to configure safely (shadow → verify → apply → commit), deliver smooth dimming, and make faults diagnosable using measurable fields and a repeatable validation flow.

What it is, and when a “programmable” driver is worth it

A programmable digital LED driver is defined by a register image (what the product “is”), a dimming engine (how brightness changes over time), and read-back evidence (telemetry + fault logs) that makes field issues diagnosable.

Decision rule: “Programmable” is worth it when the product must be configured (SKU flexibility / calibration), controlled (curves & fades without artifacts), and observed (telemetry + logs for traceability) with measurable pass/fail evidence.

Three capability layers that must be delivered (not just claimed)

  • Config (write): the device becomes a specific product by writing a stable image (channel mapping, rated current limits, default behavior). Evidence must include revision/id and config check (CRC/PEC/readback).
  • Control (curves & fades): brightness is produced by a dimming engine (LUT/segments + fade timing). Evidence must include the effective target (what the IC is actually enforcing) to avoid “write succeeded but nothing changed”.
  • Observability (telemetry + fault logs): health and failures are exportable as structured data (status bytes/words, fault flags, event logs with snapshots). Evidence must include “stale/valid” and a way to correlate events (timestamp or monotonic counter).

Typical system roles (who owns what)

  • Host MCU / controller: discovers devices, writes the register image safely, issues runtime dimming commands, and records evidence (readback + error counters).
  • Driver IC: stores the image (shadow/active and optional NVM), executes curves/fades, and exports telemetry/status/log data.
  • Sensors (temperature / current sense): provide inputs for derating and diagnostics (used as evidence fields, not as a topology discussion).
  • Factory calibration tool: programs immutable defaults (factory image), writes calibration coefficients, and records version/CRC for traceability.

Evidence fields to lock at project start (minimum set)

  • Address plan: fixed strap vs soft address; collision avoidance on multi-drop buses.
  • Bus speed & margin: target bitrate + pull-up/capacitance budget (rise-time criterion).
  • Integrity: PEC/CRC enabled? mandatory read-after-write verification? retry policy for NACK/bus error.
  • NVM commit policy: when commits happen, how atomicity is ensured, and how brownout is detected/recovered.
[Figure: Programmable Driver = Image + Dimming Engine + Evidence]
Cite this figure: Capability layers and role/data-flow view for programmable digital LED drivers (conceptual). See References.

System architecture: separating the power stage from the digital plane

This page focuses on the digital plane (bus + registers + telemetry/logs). The power stage can be treated as a separate layer, as long as the digital plane remains measurable, recoverable, and noise-resilient.

Digital-plane signals and what they mean (practical semantics)

  • SCL / SDA: configuration writes, runtime control commands, and read-back evidence. Failure signatures include NACK bursts, glitches, and SDA stuck-low.
  • ALERT / INT (if present): event-driven indication that new status/log data is available, reducing blind polling and improving diagnosability.
  • EN / RESET: controlled bring-up and last-resort recovery when the bus hangs or the device enters an unknown state (useful for field robustness).
  • FAULT (if present): hardware-level fault indicator; it must be consistent with status flags/logs to support root-cause analysis.

Key architecture rule: every “digital plane” failure must be observable at a small set of measurement points (SCL/SDA + one event/reset line) and recoverable without power-cycling the entire luminaire.

Isolation and reference domains (interface-only impact)

  • Propagation delay and edge shaping can reduce timing margin at higher bus speeds; “looks fine on paper” can still fail under EMI.
  • Pull-up placement becomes non-trivial across an isolator; poor placement often shows up as slow rise time, NACK bursts, or unstable logic thresholds.
  • Bidirectional behavior can interact with clock stretching and bus recovery, making “stuck bus” incidents harder to clear unless reset/recovery is designed in.

Minimum measurement set (2 + 1) for fast diagnosis

  • TP_SCL: verify rise/fall time, glitches, and continuous clocking during transactions.
  • TP_SDA: verify ACK/NACK behavior and ensure SDA is not held low after an error.
  • TP_INT (or TP_RESET/TP_EN): confirm the device provides an observable event path and a deterministic recovery path.
[Figure: Layering — Power Plane vs Digital Plane (measure & recover)]
Cite this figure: Digital-plane layering and test-point plan (conceptual). See References.

I²C / SMBus / PMBus essentials that actually break products

This chapter is written as a deliverable: it defines address policy, timing margin, and integrity/recovery rules that keep bus transactions stable under real wiring, capacitance, and noise.

Failure signatures to design against: NACK bursts → retry storms, “write OK but behavior unchanged”, bus stuck-low (hung bus), device not-ready windows, and long-cable variance across batches.

Address planning: strap vs soft address (and multi-device collision rules)

  • Strap address: predictable and production-safe; preferred when multiple identical devices share one bus and collisions must be structurally impossible.
  • Soft address: flexible but must include a deterministic source of truth (when it is written, who is allowed to write it, and how it is restored after reset).
  • Collision rule: if two devices respond to one address, the system must enter a no-write safety mode (stop configuration writes) until the conflict is resolved.
  • Scan policy: scanning is for discovery and inventory (read-only); configuration writes should require an explicit match on revision ID (and optional variant ID) to prevent writing the wrong target.
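The scan-versus-write boundary can be sketched as host-side logic. This is a minimal sketch with a hypothetical bus API (`try_read_revision` returns the revision ID byte, or `None` on NACK); the expected-revision value and the repeated-read count are assumptions, not a real driver's interface.

```python
# Scanning is read-only inventory; configuration writes are gated by an
# explicit, repeated revision match on the intended address.

def scan(bus):
    """Read-only discovery over the valid 7-bit address range."""
    return {a: r for a in range(0x08, 0x78)
            if (r := bus.try_read_revision(a)) is not None}

def may_configure(bus, addr, expected_rev, reads=3):
    """Allow writes only if every repeated ID read matches the expected
    revision; any mismatch or inconsistency -> no-write safety mode."""
    return all(bus.try_read_revision(addr) == expected_rev
               for _ in range(reads))
```

Repeated ID reads will not catch every address collision, but inconsistent responses across reads are a cheap symptom worth gating on before any write.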

Timing margin: pull-up + bus capacitance + rise time + clock stretching

  • Rise-time budget is the real limiter in field wiring. Treat the bus as Rpullup + Cb: as Cb grows (cables, connectors, ESD parts), the edge slows and margin collapses.
  • Deliverable requirement: specify a target bus speed together with a maximum allowed Cb (or verified tr) and a recommended Rpullup range.
  • Waveform pass/fail: verify tr at TP_SCL/TP_SDA and ensure the HIGH level stays above the input threshold with margin (glitches near VIH/VIL are the common “works on bench, fails in product” cause).
  • Clock stretching: treat long SCL-low periods as a first-class spec item. Define host tolerance (timeout), record the worst-case stretch, and design retry/backoff so stretching cannot trigger retry storms.
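The rise-time budget above can be estimated with a simple lumped RC model: I²C defines tr between 30% and 70% of VDD, which for an RC charge gives tr = R·C·ln(0.7/0.3) ≈ 0.847·R·C. A minimal sketch, assuming a single lumped Cb:

```python
import math

def rise_time(r_pullup_ohm, c_bus_farad):
    """I2C rise time (30%-70% of VDD) for a lumped RC bus model:
    tr = R*C*ln(0.7/0.3), about 0.847*R*C."""
    return r_pullup_ohm * c_bus_farad * math.log(0.7 / 0.3)

def max_pullup(c_bus_farad, tr_limit_s):
    """Largest pull-up that still meets a given rise-time budget."""
    return tr_limit_s / (c_bus_farad * math.log(0.7 / 0.3))

# Fast-mode (400 kHz) limit is tr <= 300 ns:
tr = rise_time(4700, 100e-12)   # 4.7 kOhm with 100 pF of bus capacitance
```

With 4.7 kΩ and 100 pF the model predicts roughly 398 ns, which already violates the 300 ns Fast-mode limit; the measured tr at TP_SCL/TP_SDA remains the authoritative number.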

Integrity and recovery: PEC/CRC, repeated start, ACK/NACK semantics, retries

  • PEC/CRC: enable when wiring/noise is not tightly controlled. Integrity must be measurable via counters (PEC failures, retry counters) rather than “seems stable”.
  • Repeated start: define whether the target supports it and how the host behaves if a repeated-start sequence fails mid-transaction (abort + recovery path).
  • ACK/NACK semantics: distinguish “not-present/not-ready” from “data rejected/busy” from “integrity failure”; each category must map to a different recovery action.
  • Retry policy: specify max retries, backoff timing, and a forced escape hatch (device reset or bus recovery) to prevent infinite retry loops.
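For reference, SMBus PEC is CRC-8 with polynomial x⁸ + x² + x + 1 (0x07) and zero initial value, computed over every byte of the transaction including the address/R-W byte. A minimal sketch of the CRC itself (transaction framing belongs to the bus driver):

```python
def smbus_pec(data: bytes) -> int:
    """SMBus PEC: CRC-8, polynomial 0x07, init 0x00, MSB-first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc
```

A useful self-check: appending the PEC byte to the message and recomputing must yield 0, which is exactly what a receiving device verifies.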

Evidence checklist (loggable and testable)

  • Electrical: measured tr on SCL/SDA, estimated Cb, observed glitch count (if any), and worst-case clock-stretch duration.
  • Statistics: NACK rate, retry counter, PEC/CRC failure counter, and “stuck-bus incidents”.
  • Policy: address table, scan mode vs write mode boundary, and a defined recovery path for each failure class.
[Figure: I²C Physical Model + Waveform Pass/Fail]
Cite this figure: I²C physical-layer model and waveform pass/fail criteria (conceptual). See References.

Register map strategy: pages, atomicity, and “safe writes”

Treat configuration as a transaction. A write is not complete until the intended image is verified and the active behavior is proven to match (via effective targets and status).

Pages / banks: scalability with explicit targeting

  • PAGE/BANK is an implicit state. If host and device disagree on the active page, writes silently land in the wrong place.
  • Rule: every configuration transaction must begin with an explicit target page selection and must record that page in the host log.
  • Group writes: when multiple channels must be consistent, use a group mechanism (page-wide apply or group commit) rather than sequential live edits.

Atomic update: Shadow → Verify → Apply → Active

  • Shadow: a staging area to build a coherent image (multiple fields, multiple registers) without exposing intermediate states.
  • Verify: read-back and/or checksum/PEC validation catches “half writes”, bus noise, and addressing mistakes before any change becomes live.
  • Apply: a single action that transfers shadow to active, enforcing “all-or-nothing” behavior. A busy/lock flag must gate apply to prevent partial transitions.
  • Active: the only truth for runtime behavior; reading effective targets and status must confirm that active equals the intended image.

Safe write transaction template (repeatable steps)

  1. Select target: set PAGE/BANK and confirm device identity (revision ID).
  2. Write payload: write fields to shadow (prefer block writes where supported) and record a transaction ID on the host.
  3. Read-back verify: read critical fields (or config CRC if available); count errors and enforce a retry/backoff policy.
  4. Apply: assert apply/group-commit; confirm completion (busy cleared) before proceeding.
  5. Prove active: read effective targets + status; if mismatch, record last-error code and stop further writes (prevent cascading failure).
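The five steps above can be condensed into host-side logic against a mock device. Everything here (register names, return codes, the mock's shadow/apply behavior) is illustrative, not a real register map:

```python
class MockDriver:
    """Toy device: a shadow staging area and an atomic apply to active."""
    def __init__(self, rev=0x21):
        self.rev, self.page = rev, 0
        self.shadow, self.active = {}, {}
    def write_shadow(self, fields): self.shadow.update(fields)
    def read_shadow(self): return dict(self.shadow)
    def apply(self): self.active = dict(self.shadow)   # all-or-nothing

def safe_write(dev, page, fields, expected_rev):
    """Returns 'ok' or the step that failed; stops on first mismatch."""
    if dev.rev != expected_rev:
        return "id-mismatch"                  # 1. select + confirm target
    dev.page = page
    dev.write_shadow(fields)                  # 2. write payload (staged)
    shadow = dev.read_shadow()                # 3. read-back verify
    if any(shadow.get(k) != v for k, v in fields.items()):
        return "verify-fail"
    dev.apply()                               # 4. apply (atomic transfer)
    if any(dev.active.get(k) != v for k, v in fields.items()):
        return "apply-fail"                   # 5. prove active
    return "ok"
```

The point of the shape is that every exit path names the failed step, which maps directly onto the host-side transaction log and last-error evidence fields.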

Evidence fields (minimum)

  • Transaction fields: target page, payload length, checksum/PEC status, apply/commit flag, and host-side retry count.
  • Read-back fields: revision ID, config CRC (or equivalent), effective targets, and last-error code / busy flag.
[Figure: Safe Write = Target + Shadow + Verify + Apply + Prove Active]
Cite this figure: Shadow→Verify→Apply→Active transaction model for safe register programming (conceptual). See References.

NVM/OTP: committing defaults without bricking units

Non-volatile storage turns “configuration” into a product promise. The goal is consistent defaults across production lots, safe field updates, and a provable rollback path when writes fail.

Core rule: a commit is never “done” until the target image is validated (CRC/version/valid bit) and the boot selection logic is proven to choose a safe image after resets and brownouts.

Image model: STORE / RESTORE / DEFAULT (without vendor-specific commands)

  • DEFAULT image (Factory baseline): the known-good configuration that must remain recoverable under all failure modes. Treat as read-mostly and immutable in the field.
  • USER image (Field configuration): optional customer/runtime preferences that may be updated, but must never override the ability to boot safely.
  • RUN image (Active/shadow): the working set used during normal operation. RUN may change frequently, but must not trigger frequent NVM writes.
  • STORE persists a selected image; RESTORE loads an image into RUN; DEFAULT is the emergency fallback when validation fails.

Brownout risk: why partial writes brick units (or create “silent corruption”)

  • Failure mode: power loss during commit can leave metadata updated but payload incomplete (or vice versa). The result is an image that “exists” but fails validation.
  • Silent corruption is worse than a hard fail: behavior changes after reboot with no obvious error unless CRC/version/valid bits are checked.
  • Minimum safeguards: commit must be gated by a “safe-to-write” condition (no brownout, stable reset cause), and must end with mandatory validation before switching the active image pointer.
  • After reset: boot selection must prefer the newest valid image; if CRC fails, it must roll back deterministically to a valid prior image (or DEFAULT).
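Boot selection under these rules reduces to "newest valid image wins, DEFAULT is the guaranteed fallback." A sketch with assumed metadata fields; CRC-32 is chosen only for illustration, real parts use whatever integrity check the NVM controller provides:

```python
import zlib

def image_ok(img):
    """An image counts only if its valid bit is set and its CRC checks out."""
    return img["valid"] and zlib.crc32(img["payload"]) == img["crc"]

def select_boot_image(default, candidates):
    """Newest valid candidate wins; DEFAULT is the deterministic fallback."""
    good = [img for img in candidates if image_ok(img)]
    if good:
        return max(good, key=lambda img: img["version"])
    return default
```

Note the asymmetry: a brownout that corrupts the newest image silently demotes it, and the selection falls back one version (or to DEFAULT) without any special-case code.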

Endurance and write-rate limits: factory vs field partitions

  • NVM is not a cache: frequent commits for dimming behavior, telemetry, or runtime tweaks will consume endurance and raise failure probability.
  • Partition strategy: separate “factory baseline” from “field preferences”. Factory partition should be write-protected outside production; field partition must be rate-limited and validated.
  • Commit policy: define allowed commit triggers (e.g., commissioning only), and enforce minimum intervals and maximum lifetime commit counts per partition.
  • Rollback discipline: never overwrite the only known-good image. Always keep at least one prior valid image in reserve.

Evidence checklist (must be readable via host or diagnostics)

  • Commit trace: commit counter, last commit status (OK/FAIL/INCOMPLETE), and a timestamp or sequence number if available.
  • Validation: image CRC per image (DEFAULT/USER A/USER B) plus version and valid bit.
  • Power event proof: brownout flag and reset cause around commits.
[Figure: NVM Image Layout + Validation + Rollback]
Cite this figure: NVM image layout with DEFAULT + USER A/B, CRC/version/valid metadata, and deterministic rollback (conceptual). See References.

Dimming engine: curves, fades, and what “linear” really means

Curves and fades are a pipeline problem: commands are interpreted by an engine, converted into a current target, then constrained by clamps and derating. “Linear” must be defined in the domain that matters.

Curve representations: LUT vs segmented linear vs polynomial (concept-level tradeoffs)

  • LUT: predictable and calibratable; low-level control can be dense where the eye is most sensitive. Cost is points and storage.
  • Segmented linear: compact and stable; good when register space is limited while still allowing “denser low end”.
  • Polynomial: few parameters but sensitive to edge behavior and numeric stability. Requires careful bounding to avoid overshoot near endpoints.
  • Key engineering point: the curve defines low-light resolution and step visibility, not just a mapping from “percent” to “current”.

Fades: timebase, step strategy, and avoiding visible stair-steps

  • Timebase: an internal engine tick is deterministic; host-timed updates can jitter with scheduling and bus latency.
  • Step strategy: a constant step size often looks “steppy” at low light. Better strategies densify steps near low levels or use time-normalized interpolation.
  • Interpolation: define how intermediate points are computed (nearest / linear between points). Even a LUT needs an interpolation policy for smooth fades.
  • Deep dimming stability: use a minimum current clamp to avoid dropouts and a controlled transition through the lowest region where quantization dominates.
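Time-normalized interpolation from the step-strategy bullet can be sketched as follows. The tick length and the inclusive endpoint are assumptions, and a real engine would add the low-end densification and clamping discussed above:

```python
def fade_steps(start, end, fade_time_s, tick_s):
    """One linearly interpolated level per engine tick; total fade time is
    deterministic regardless of the distance travelled."""
    n = max(1, round(fade_time_s / tick_s))
    return [start + (end - start) * i / n for i in range(1, n + 1)]
```

Because the step count comes from the timebase rather than the level delta, a short fade over a large range and a long fade over a small range both land exactly on the target at the final tick.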

Perceptual consistency: current-linear is not visually-linear

  • Gamma/log curves are practical tools for “equal perceived steps”. The point is not the math, but the measurable outcome: fewer visible jumps at low levels.
  • Define “linear” explicitly: linear in current, linear in perceived brightness, or linear in command scale. The chosen definition must match product expectations.
  • Stability knobs: clamp, optional dither, and derating must be placed in the pipeline to avoid unexpected jumps when constraints engage.
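As a concrete example of "linear in perceived brightness," equal command steps can be pushed through a gamma curve before the minimum-current clamp engages. Gamma 2.2 and the clamp value below are illustrative, not recommendations:

```python
def command_to_current(cmd, full_scale=1.0, gamma=2.2, min_clamp=0.002):
    """cmd in [0, 1] -> current fraction. Zero stays fully off; any nonzero
    command is floored at the clamp so deep dimming cannot drop out."""
    if cmd <= 0.0:
        return 0.0
    return max(min_clamp, full_scale * cmd ** gamma)
```

The clamp sits after the curve on purpose: placing it before the curve would distort the low-end shape, which is exactly the region the curve exists to protect.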

Evidence checklist (configuration + verification)

  • Config: curve ID, LUT points (or segment params), fade time, step rate, min current clamp, and dither enable (if supported).
  • Output: ripple vs dim level (trend), plus a low-level stability metric such as jitter/jump count per time window.
[Figure: Curve Pipeline — Command → Engine → Current Target → Output]
Cite this figure: Dimming curve pipeline and stability insertion points (clamp/derate/dither) with verification hooks (conceptual). See References.

Runtime control vs protection overrides: who wins

A dimming command is not the output. The output is the result of a priority resolver that combines runtime intent with protection and derating constraints. When writes appear to “do nothing” in the field, the missing piece is usually visibility into which layer is winning and what its recovery rules are.

Practical model: the resolver produces an effective current target. If any hard shutdown condition is active (UVLO/OTP/critical latch), the effective target becomes zero regardless of the runtime command.

Priority ladder (from highest to lowest)

  • Hard shutdown: UVLO / OTP / critical fault → effective target forced to 0 (safe state).
  • Fault latch: latched short/open/overcurrent → blocks output until clear rules are satisfied.
  • Thermal derating: scales down the target (derating factor) to avoid reaching a shutdown threshold.
  • Soft constraints: min clamp, slew limit, fade step-rate caps → reshape the target without declaring a fault.
  • Runtime intent: manual dim target / fade engine output (the “requested” target).
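The ladder can be written directly as a resolver function. The check order mirrors the list above; the signature and names are assumptions for illustration:

```python
def resolve(intent, derate=1.0, min_clamp=0.0,
            fault_latched=False, hard_shutdown=False):
    """Effective current target after all overrides (highest priority first)."""
    if hard_shutdown:              # UVLO / OTP / critical -> safe state
        return 0.0
    if fault_latched:              # blocked until clear rules are satisfied
        return 0.0
    target = intent * derate       # thermal derating scales, never faults
    if target > 0.0:
        target = max(target, min_clamp)   # soft constraint reshapes target
    return target
```

The return value is the single "effective current target" the four-field debug method relies on: exporting it next to the derating factor and latch bit makes every "write did nothing" report decodable.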

Soft derate vs hard off: recovery rules decide field behavior

  • Soft derate keeps output on but reduces brightness. It must include hysteresis to prevent oscillation near thresholds.
  • Hard off forces output to zero. It must define clear conditions (cool-down timer, retry budget) to avoid “never recovers” failures.
  • Recovery triggers typically combine: temperature below a release threshold, a minimum time window, and a stable input condition (no UVLO/reset churn).
  • Command consistency: after recovery, the resolver should return to the latest valid runtime intent, not an undefined intermediate value.

Debug method: prove the winning layer using four fields

  • effective current target: what is actually applied after all overrides.
  • derating factor: the scaling applied by thermal or other derate logic (explains “why dimmer”).
  • fault latch bit: indicates “blocked until cleared” conditions (explains “why stuck off”).
  • retry timer: remaining cool-down / retry delay (explains “when it may recover”).

A runtime write that “does not work” is diagnosable when the effective target is visible alongside derating and latch state.

[Figure: Priority Resolver — Runtime Intent vs Protection Overrides]
Cite this figure: Priority resolver that merges runtime intent, derating, and protection layers into an effective current target (conceptual). See References.

Telemetry: what to measure, how to trust it, how to use it

Telemetry is a signal chain, not a list of numbers. Trust depends on sampling path, calibration, filtering, update timing, and explicit freshness/range flags. Use depends on thresholds and trends that can be executed locally by the host.

What to measure: categories that explain behavior

  • Supply health: VIN/VOUT indicates margin to UVLO and identifies sag events that change effective output.
  • Output proof: ILED (and duty/target indicators if available) proves whether the effective target is being met or constrained.
  • Thermal context: temperature explains derating engagement and proximity to shutdown thresholds.
  • Energy estimate: power estimation supports trend-based warnings and detects abnormal load or thermal drift patterns.

How to trust it: raw vs scaled, calibration, filtering, and freshness

  • Raw vs scaled: raw codes (ADC counts) expose clipping and offset; scaled values apply units and calibration coefficients.
  • Calibration coefficients: define absolute accuracy; coefficients should have an identifier or revision for traceability.
  • Filter window: reduces noise but adds latency; the host must interpret values in the context of the filter and update period.
  • Update period and stale flag: a fresh-but-late value is different from a stale value; both must be detectable.
  • Range/clip flags: if a channel is saturated or out-of-range, scaled values may be misleading even if they look stable.
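These trust rules translate into two small host-side gates: a scaling step driven by calibration coefficients, and a freshness/range check that must pass before a sample counts as evidence. The field names are assumptions:

```python
def scale(raw, gain, offset):
    """ADC code -> engineering units via calibration coefficients."""
    return raw * gain + offset

def usable(sample, now_s, max_age_s):
    """A sample is evidence only if it is not stale, not clipped, and
    recent enough for the configured update period."""
    if sample["stale"] or sample["clipped"]:
        return False
    return (now_s - sample["t"]) <= max_age_s
```

Keeping the gate separate from the scaling step matters: a clipped or stale sample can still produce a plausible-looking scaled value, which is exactly the failure mode the flags exist to block.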

How to use it locally: threshold alerts and trend warnings (no cloud required)

  • Threshold alerts: VIN near UVLO, temperature near derate/shutdown, ILED deviation from effective target beyond tolerance.
  • Trend warnings: rising temperature slope, repeated VIN dips, or increasing power estimate over time (indicates cooling degradation or load change).
  • Reliability checks: ignore updates marked stale; down-rank channels with clip flags; require persistence across multiple fresh samples.

Evidence checklist (minimum telemetry packet for field reproducibility)

  • telemetry raw and telemetry scaled captured together (same sample time).
  • cal coefficients (or coefficient set ID / revision).
  • update period and latency context (filter window implied by configuration).
  • stale flag to prevent using old values as proof.
  • range/clip flags to identify saturation and out-of-range behavior.
[Figure: Telemetry Chain — ADC → Filter → Scale/Cal → Registers → Host]
Cite this figure: Telemetry signal chain showing raw/scaled paths, calibration, filtering, register flags, and local host usage (conceptual). See References.

Fault flags & event logs: making failures diagnosable

A fault that cannot be explained becomes a product failure. The goal is traceability: a compact status view for “what is wrong now,” plus a durable history of “what happened first” and “what the system looked like at that moment.”

Traceability chain: Flags (state) + Time (counter) + Snapshot (context) + Log (history) → host decode.

Flags: transient vs latched (and why both matter)

  • Transient (“seen”) indicates an event occurred at least once. It is essential for intermittent issues that self-recover before anyone reads status.
  • Latched (“blocked”) indicates recovery is intentionally prevented until clear conditions are met. It protects safety and prevents rapid re-trigger cycles.
  • Interpretation rule: a clean “current state” without history cannot explain field reports. A durable “seen” bit without a current latch cannot explain whether the device is still in a faulted condition.

Multi-fault concurrency: avoid losing the root cause

  • Status word/byte as a full bitfield: preserves the “many things can be true” reality.
  • Fault code as a quick classifier: points to the dominant category for fast triage.
  • First-fault pointer: captures the earliest trigger so later secondary effects do not overwrite the root cause.
  • Event log (ring): records fault ordering so concurrency can be replayed rather than guessed.

Event logs: ring buffer + snapshot fields that actually diagnose

An event log should not only store “what fault happened,” but also “what the system looked like when it happened.” A minimal snapshot typically includes input margin (VIN/VOUT), thermal context (temperature), and control context (requested target vs effective target). These three are usually enough to distinguish override-driven behavior from genuine electrical faults.
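A minimal sketch of such a log, with a write-once first-fault pointer and the snapshot fields named above; the whole structure is assumed for illustration:

```python
class FaultLog:
    """Fixed-depth ring of fault entries plus a first-fault anchor that is
    written once and preserved until an explicit clear."""
    def __init__(self, depth=8):
        self.depth, self.entries, self.seq = depth, [], 0
        self.first_fault = None
    def record(self, code, vin, temp, requested, effective):
        entry = {"seq": self.seq, "code": code, "vin": vin, "temp": temp,
                 "requested": requested, "effective": effective}
        self.seq += 1
        self.entries = (self.entries + [entry])[-self.depth:]  # ring behavior
        if self.first_fault is None:
            self.first_fault = entry          # root-cause anchor, write-once
    def explicit_clear(self):
        self.entries, self.first_fault = [], None
```

The monotonic `seq` field stands in for a timestamp: it orders events across the ring even after old entries are overwritten, which is what the "timestamp/counter" evidence field is for.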

Clear strategy: preserve evidence without blocking recovery

  • Clear-on-read: simple, but risky for diagnostics because evidence disappears when polled. Best limited to counters or non-critical transient summaries.
  • Explicit clear: preserves evidence until a deliberate action clears it. Best for latched faults and first-fault capture.
  • Power-cycle clear: can mask repeating issues if evidence vanishes on every restart. Use carefully, and prefer retaining first-fault/log history across resets when possible.

Evidence fields (minimum set for reproducible fault analysis)

  • status word/byte (full bitfield)
  • fault code (primary classifier)
  • first-fault pointer (root-cause anchor)
  • log index (ring buffer position)
  • timestamp/counter (ordering without requiring real time)
[Figure: Fault Traceability — Flags → Snapshot → Ring Log → Host Decode]
Cite this figure: Fault traceability pipeline showing status flags, first-fault capture, snapshot fields, ring-buffer logging, and host decoding (conceptual). See References.

Robustness: bus integrity under EMI, isolation, and hot-plug

A robust control bus must remain recoverable in noisy environments. The engineering goal is not “never errors,” but “errors are detectable, counted, and recoverable without human intervention,” including isolation boundaries and hot-plug disturbances.

EMI failure signatures that break products

  • SCL glitches: narrow pulses or spikes that look like extra clocks to state machines.
  • SDA bit flips: unintended data transitions causing corrupted bytes or false start/stop interpretation.
  • Hung bus: SCL or SDA held low (often SDA) so no new transaction can begin.

Recovery ladder: detect → free the bus → re-sync → escalate if needed

A practical recovery sequence starts with stuck-low detection (line low beyond a safe time budget), then applies clock-pulse recovery to release a device stuck mid-bit, and finally issues a STOP to force the bus back to idle. If the bus remains hung, escalation uses a device reset line or watchdog strategy.

  • Step 1 — detect: SDA (or SCL) low longer than a defined threshold → declare “hung.”
  • Step 2 — recover: drive 9 clock pulses to advance a stuck receiver through remaining bits.
  • Step 3 — re-sync: generate a STOP condition (SCL high while SDA rises) to return to idle.
  • Step 4 — escalate: if still hung, apply device reset or rely on watchdog to prevent permanent deadlock.
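The four steps map onto a small host routine over a hypothetical pin-level API (`read_sda`, `pulse_scl`, `send_stop`, and `hard_reset` are assumed helpers, not a real HAL):

```python
def recover_bus(pins, stuck_threshold_s, low_for_s):
    """Returns the escalation level that freed the bus."""
    if low_for_s < stuck_threshold_s:
        return "not-hung"                 # step 1: detect before acting
    for _ in range(9):                    # step 2: clock out a stuck receiver
        pins.pulse_scl()
        if pins.read_sda():               # SDA released mid-sequence
            break
    pins.send_stop()                      # step 3: re-sync the bus to idle
    if pins.read_sda():
        return "recovered"
    pins.hard_reset()                     # step 4: escalate deterministically
    return "reset"
```

Counting which exit was taken (recovered vs reset) feeds directly into the stuck-low and reset-cause evidence fields below.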

Isolation boundary effects (interface-level only)

  • Propagation delay reduces timing margin and can reshape edges seen by the bus.
  • Pull-up placement matters across the boundary; the “bus” can behave like two segments with different rise behavior.
  • Bidirectional limits can affect edge cases (including recovery pulses and any stretching-like behavior), so recovery must be validated across the isolation path.

Evidence fields (make robustness measurable)

  • stuck-low detection (count and/or duration)
  • bus error counters (NACK/timeouts/corruption proxy counters)
  • reset cause (bus-driven vs other sources)
  • watchdog reset count (deadlock prevention indicator)
[Figure: Hung Bus — Signature and Recovery (9 clocks + STOP)]
Cite this figure: Hung-bus signature and bus recovery sequence using clock pulses and STOP, with escalation and interface-level isolation notes (conceptual). See References.

Validation plan: bring-up → programming → dimming quality → telemetry/log verification

This gate-based plan validates a programmable digital LED driver from first contact to diagnosable failures. Each gate defines what must be proven, the two waveform groups to capture, and pass/fail criteria that can be reused in R&D bring-up and production.

Waveform rule (per gate): always capture Bus (SCL/SDA) + One proof signal (ILED or INT/ALERT). If channels are limited, prefer SCL + ILED for dimming gates and SCL + INT for fault/log gates.

Gate 0 — Bench sanity (avoid false failures)

Prove the test setup is not generating bus faults: stable idle levels, no stuck-low, and predictable reset/INT behavior before any configuration writes.

Waveforms (2 groups)
  • Bus: SCL/SDA idle level + first transaction edge quality
  • Proof: INT/ALERT at power-up (or ILED if INT not available)
Pass / Fail
  • PASS: no sustained stuck-low; clean edges without repeated unintended pulses
  • FAIL: SDA (or SCL) held low beyond a defined window; recurring glitches at idle
Evidence fields to record
reset cause, timestamp/counter baseline, bus error counters (host), stuck-low count/duration (host)
MPN examples (bench / interface)
Logic analyzer: Saleae Logic Pro 8 (SAL-00113) · I²C isolator (optional boundary test): ADuM1250 or ISO1540-Q1

Gate 1 — Interface connectivity (scan + read ID)

Confirm the device is discoverable at the intended address plan and returns stable identity/capability fields across repeated reads.

Stimulus
Address scan strategy (no uncontrolled “storm”) → read revision/ID/capability → repeat N times.
Waveforms (2 groups)
  • Bus: one full scan burst + ID read transaction
  • Proof: INT/ALERT (if present) for comm-error signaling
Pass / Fail
  • PASS: address is stable; ID/revision reads match every time; NACK/retry remains near-zero in the test window
  • FAIL: address intermittently disappears; ID varies; repeated NACK bursts or bus lockups
Evidence fields to record
revision ID, status word/byte (comm-related), NACK rate / retry counter (host), timestamp/counter
MPN examples (device under test)
I²C LED driver examples for connectivity exercises: PCA9955B (16-ch) / PCA9956B (24-ch)
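The Gate 1 repeat-read criterion can be scripted directly. A minimal sketch, assuming a hypothetical `read_id(addr)` callable that returns the ID byte or raises `IOError` on NACK/bus error (real transports surface errors differently):

```python
def verify_id_stable(read_id, addr, repeats=10):
    """Gate 1 check: read identity `repeats` times; PASS only if every read
    succeeds and all returned IDs match. Returns (passed, nack_count, ids)."""
    ids, nacks = [], 0
    for _ in range(repeats):
        try:
            ids.append(read_id(addr))
        except IOError:              # NACK / bus error modeled as IOError
            nacks += 1
    passed = nacks == 0 and len(ids) == repeats and len(set(ids)) == 1
    return passed, nacks, ids
```

The NACK count and the ID list go straight into the Gate 1 evidence fields (revision ID, NACK rate/retry counter).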

Gate 2 — Programming safety (shadow → apply → readback)

Prove that configuration updates are atomic and auditable: write staging registers, apply in a single step, then read back to verify.

Stimulus
Write shadow (not active) → set apply/commit bit → read-after-write verification (including PEC/CRC if enabled).
Waveforms (2 groups)
  • Bus: full write sequence (page/select + payload + apply)
  • Proof: INT/ALERT pulse timing around apply or error
Pass / Fail
  • PASS: readback matches written values; config CRC/PEC passes; last-error remains clear
  • FAIL: partial writes, mismatched readback, repeated retries, or apply produces inconsistent active behavior
Evidence fields to record
target page/bank, payload, checksum/PEC enable, apply flag, revision ID, config CRC, last-error code
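The shadow→apply→readback sequence can be exercised against a toy register model before touching hardware. `SimDevice` and its method names are illustrative, not any real part's register map:

```python
class SimDevice:
    """Toy register model: a shadow (staging) bank, an active bank, and a
    single apply step that activates the whole staged image at once."""
    def __init__(self):
        self.shadow, self.active = {}, {}
    def write_shadow(self, reg, val):
        self.shadow[reg] = val & 0xFF       # 8-bit registers assumed
    def apply(self):
        self.active.update(self.shadow)     # atomic activation step
    def read_active(self, reg):
        return self.active.get(reg, 0)

def program_and_verify(dev, image):
    """Gate 2 sequence: write shadow -> apply -> read-after-write.
    Returns the list of registers whose active value mismatches the intent."""
    for reg, val in image.items():
        dev.write_shadow(reg, val)
    dev.apply()
    return [r for r, v in image.items() if dev.read_active(r) != v]
```

Note how readback catches an out-of-range payload that the write path silently truncated, which is exactly the "write succeeded but nothing changed" class of bug this gate exists to expose.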

Gate 3 — NVM commit robustness (including brownout drill)

Prove non-volatile defaults can be stored without bricking units. Validate both “clean commit” and “power-loss during commit” behaviors.

Stimulus
  • Normal commit → power-cycle → re-scan + readback
  • Brownout drill: interrupt power within the commit window → power-cycle → verify recovery path (safe image / invalid flag)
Waveforms (2 groups)
  • Bus: commit transaction + post-reset recovery reads
  • Proof: INT/ALERT (commit status / failure indication) or ILED (if commit affects output mode)
Pass / Fail
  • PASS: after any drill, the device is discoverable and identity is readable; image CRC indicates valid/invalid deterministically; rollback behavior is predictable
  • FAIL: address disappears permanently; persistent stuck bus; CRC/state becomes non-deterministic across retries
Evidence fields to record
commit counter, last commit status, image CRC, brownout flag, reset cause, timestamp/counter
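The brownout drill maps naturally onto a two-phase commit model: the "valid" mark is written last, so an interrupted commit is always detectable at boot. A sketch with a dict standing in for NVM and CRC-32 standing in for whatever checksum the device actually uses:

```python
import zlib

def commit(nvm, payload, brownout_at=None):
    """Two-phase commit sketch: invalidate, write payload + CRC, then mark
    valid last. `brownout_at` (0..2) simulates power loss before that step."""
    steps = [
        ("valid", False),
        ("image", (payload, zlib.crc32(payload))),
        ("valid", True),
    ]
    for i, (key, val) in enumerate(steps):
        if brownout_at is not None and i >= brownout_at:
            return                      # power lost: remaining steps never run
        nvm[key] = val

def boot_image(nvm, factory):
    """Deterministic boot: load the user image only if it is marked valid
    and its CRC passes; otherwise fall back to the factory image."""
    if nvm.get("valid") and "image" in nvm:
        payload, crc = nvm["image"]
        if zlib.crc32(payload) == crc:
            return payload
    return factory
```

Whatever the interruption point, the boot path is deterministic, which is the Gate 3 PASS condition ("image CRC indicates valid/invalid deterministically; rollback behavior is predictable").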

Gate 4 — Dimming quality (curve consistency + fade smoothness + deep-dim stability)

Validate the dimming engine output as a measurable signal: consistent mapping from dim command to ILED, smooth fades without visible steps, and stable behavior at very low targets.

Stimulus
  • Curve: select curve ID → sweep a defined set of dim levels (low/mid/high)
  • Fade: execute up/down fades with fixed fade time and step policy
  • Deep dim: hold at minimum clamp for a dwell window and observe stability
Waveforms (2 groups)
  • Bus: dim command + fade programming sequence
  • Proof: ILED waveform (ripple, steps, dropouts, monotonicity)
Pass / Fail (example measurable criteria)
  • PASS: ILED is monotonic with dim code; fade has bounded step amplitude; deep-dim dwell shows no periodic drop-to-zero or uncontrolled jumps
  • FAIL: non-monotonic points, repeated step discontinuities, or deep-dim instability events above an allowed count
Evidence fields to record
curve ID, LUT/segment descriptor ID, fade time, step rate, min current clamp, dither enable (if any), effective current target, deep-dim instability counter
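The Gate 4 pass/fail criteria (monotonicity plus bounded step amplitude) reduce to a simple check over measured ILED samples ordered by dim code. A sketch; units and the step threshold are up to the test plan, not the device:

```python
def check_dimming(iled, max_step):
    """Gate 4 check on ILED measurements ordered by ascending dim code.
    PASS if the sequence is monotonic non-decreasing and no step between
    adjacent codes exceeds max_step. Returns (passed, worst_step)."""
    steps = [b - a for a, b in zip(iled, iled[1:])]
    monotonic = all(s >= 0 for s in steps)
    worst = max((abs(s) for s in steps), default=0.0)
    return monotonic and worst <= max_step, worst
```

The same function works on a fade capture: feed it the sampled ILED trajectory and `max_step` becomes the bounded fade-step-amplitude criterion.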

Gate 5 — Telemetry consistency (accuracy + latency + filter behavior)

Prove telemetry is trustworthy: raw-to-scaled mapping is consistent, update period behaves as specified, and stale/clip flags correctly describe data validity.

Stimulus
  • Read raw + scaled pairs repeatedly at a fixed rate
  • Apply a controlled change (e.g., dim step) and measure telemetry latency and settling
  • Validate stale/clip behavior by pausing reads or pushing ranges intentionally
Waveforms (2 groups)
  • Bus: telemetry polling burst (to correlate with update period)
  • Proof: INT/ALERT (threshold/abnormal indication) or ILED (to correlate telemetry vs output)
Pass / Fail (example measurable criteria)
  • PASS: update period stays within a bounded tolerance; scaled values track reference trends; stale flag asserts only when appropriate; clip flags align with forced range conditions
  • FAIL: update period jitter beyond tolerance, inconsistent scaling vs calibration, stale/clip flags unreliable
Evidence fields to record
telemetry raw vs scaled, calibration coefficients (or coeff set ID), update period, latency estimate, filter window ID, stale flag, range/clip flags, timestamp/counter
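The update-period tolerance check is a short computation over logged sample timestamps. A sketch; the nominal period and tolerance come from the device datasheet, not from this code:

```python
def check_update_period(timestamps, nominal, tol):
    """Gate 5 check: timestamps (seconds) of successive telemetry updates.
    PASS if every inter-sample gap stays within nominal +/- tol.
    Returns (passed, worst_deviation_from_nominal)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    worst = max((abs(g - nominal) for g in gaps), default=0.0)
    return worst <= tol, worst
```

A gap far beyond tolerance is the cue to check the stale flag and bus health before trusting any scaled value in that window.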

Gate 6 — Fault injection (flags → snapshot → log → controlled clear)

Make failures diagnosable on purpose. Inject a controlled fault, verify a log entry is created with the right context, then confirm clearing behavior is deliberate and does not erase evidence unintentionally.

Stimulus (controlled, interface-level)
  • Bus disturbance: create a short hung-bus condition (stuck-low) and verify recovery ladder
  • Threshold fault: trigger a defined alarm/limit crossing (without redesigning the power stage)
  • Then: read status → read snapshot fields → read log index/entry → apply explicit clear and confirm post-clear state
Waveforms (2 groups)
  • Bus: injection moment + recovery pulses + STOP re-sync
  • Proof: INT/ALERT timing (fault set/clear), plus ILED if output behavior is relevant
Pass / Fail
  • PASS: flags set with correct transient/latched semantics; log index advances; snapshot contains VIN/temp/targets; explicit clear is controllable and does not erase first-fault/history unexpectedly
  • FAIL: no log entry, missing snapshot context, uncontrolled auto-clear, or recovery enters a reset storm
Evidence fields to record
status word/byte, fault code, first-fault pointer, log index, timestamp/counter, clear method used, bus recovery count, reset cause, watchdog reset count
MPN examples (isolation / hot-plug robustness testing)
I²C isolators to validate boundary behavior: ADuM1250 / ADuM1251 · ISO1540-Q1
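The Gate 6 recovery ladder (9 clock pulses → STOP → re-check → escalate to reset) can be expressed as a small host routine. The bus primitives here (`sda_is_low`, `pulse_scl`, `send_stop`, `device_reset`) are hypothetical placeholders for whatever the host HAL provides; `FakeBus` exists only to demonstrate the ladder:

```python
def recover_bus(bus, max_rounds=3):
    """Escalating I2C recovery ladder: clock out a stuck slave bit-by-bit
    (up to 9 SCL pulses), issue a STOP to re-sync, re-check the line, and
    only escalate to a device reset after max_rounds failed attempts."""
    for _ in range(max_rounds):
        if not bus.sda_is_low():
            return "idle"               # bus released: safe to re-scan
        for _ in range(9):
            bus.pulse_scl()             # one pulse per possibly-held bit
        bus.send_stop()                 # re-sync all devices to idle
    bus.device_reset()                  # last resort in the ladder
    return "reset"

class FakeBus:
    """Demo stand-in: SDA releases after `release_after` clock pulses."""
    def __init__(self, release_after):
        self.pulses, self.release_after = 0, release_after
        self.stops = self.resets = 0
    def sda_is_low(self):
        return self.pulses < self.release_after
    def pulse_scl(self):
        self.pulses += 1
    def send_stop(self):
        self.stops += 1
    def device_reset(self):
        self.resets += 1
```

Logging which rung of the ladder succeeded (pulses only, pulses + STOP, or full reset) fills the "bus recovery count" and "reset cause" evidence fields.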
[Figure] Gate-based validation flow (bring-up → program → NVM → dimming → telemetry → faults): ordered gates G0 bench sanity (idle + reset proof), G1 scan + read ID (address + identity), G2 program safely (shadow→apply→readback), G3 NVM commit (brownout drill), G4 dimming quality (curve + fade + deep dim), G5 telemetry (raw/scaled + stale), G6 fault injection → log verify → controlled clear (flags + snapshot + log index + clear method). Each gate captures Bus (SCL/SDA) plus a proof signal (ILED or INT) and logs an evidence pack: pass/fail, status word/byte, log index, counter/time, reset cause, bus counters, commit + CRC, clear method.
Cite this figure: Gate-based validation flow for programmable digital LED drivers, showing the ordered test gates, required waveform checkpoints (SCL/SDA + ILED/INT), and the evidence-pack fields to log. See References.


FAQs (Programmable Digital LED Driver)

Each answer stays inside this page boundary and points back to measurable evidence fields (readback, counters, flags, CRC, effective targets, and log indices).

Write succeeds but behavior doesn’t change — shadow/apply issue or priority override?
If readback matches but output behavior doesn't change, it's usually an apply/atomicity gap or a priority override. Confirm the config actually moved from shadow to active, then check whether the resolver clamps the final output due to derating or a latched fault. First fix: enforce read-after-write + apply sequencing, then clear the latch or relax the override conditions.
Evidence fields: apply/commit flag, config CRC, effective current target, derating factor, fault latch bit. (→H2-4/H2-7)
Units drift apart after production — did NVM commit differ or calibration scaling differ?
Separate “stored defaults differ” from “same raw, different scaling.” First compare NVM image validity and versioning across units; then compare raw telemetry versus scaled values under the same stimulus. If raw matches but scaled diverges, calibration coefficients or coefficient set IDs are inconsistent. First fix: version coefficient sets, log the active set, and verify commit + CRC at end-of-line.
Evidence fields: commit counter, image CRC/version, telemetry raw vs scaled, cal coefficients/set ID, last commit status. (→H2-5/H2-8)
Dimming looks stepped at low levels — LUT resolution, step rate, or minimum clamp?
Low-level stepping usually comes from a coarse LUT (too few points near zero), a large engine step size, or a minimum-current clamp that collapses multiple dim codes to one target. Verify how many distinct targets exist below the clamp and whether dithering is enabled to smooth quantization. First fix: densify the low-end LUT, reduce step size/increase tick rate, and tune the clamp.
Evidence fields: curve ID, LUT points/segment ID, step rate, fade tick/step, min current clamp, dither enable, effective current target. (→H2-6)
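The "how many distinct targets exist below the clamp" question can be answered directly from the LUT before any scope time is spent. A sketch, assuming the clamp simply raises every non-zero target to the minimum current (real clamp semantics vary by part):

```python
def distinct_targets_below(lut, clamp, threshold):
    """Count distinct effective current targets in the deep-dim region.
    Assumed clamp behavior: any non-zero LUT target is raised to at least
    `clamp`; zero stays off. Fewer distinct targets below `threshold`
    means more visible stepping at low dim levels."""
    effective = {max(v, clamp) if v > 0 else 0.0 for v in lut}
    return len({v for v in effective if 0 < v < threshold})
```

If densifying the low-end LUT doesn't raise this count, the clamp (not LUT resolution) is what is collapsing the dim codes.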
Fade sometimes stutters — bus retries or engine tick starvation?
A stuttering fade is either command delivery jitter (retries/NACK storms) or engine timebase issues (tick/step not serviced). Correlate stutter moments with bus error counters; then compare the programmed fade time/step rate versus actual output steps. First fix: reduce competing traffic during fades, cap retries, and validate tick/step scheduling under worst-case polling.
Evidence fields: NACK rate, retry counter, bus error counters, fade time, step rate, effective current target trajectory. (→H2-3/H2-6/H2-10)
Telemetry numbers “freeze” — stale flag, update period, or bus hung?
Treat a “freeze” as either valid-but-slow data, invalid data (stale), or a transport failure. Check the stale flag and update period first; then check stuck-low detection and bus error counters. If stale asserts while the bus is healthy, it’s a sampling/refresh issue. If bus health collapses, run recovery and confirm reads resume before trusting any values.
Evidence fields: stale flag, update period, telemetry timestamp/counter, stuck-low detect, bus error counters, reset cause. (→H2-8/H2-10)
Fault clears but returns instantly — latch vs retry policy vs unresolved root cause?
If a fault returns immediately, either it’s latched (clear does nothing), the retry policy re-triggers too fast, or the root condition is still present. Check the latch bit and retry timer behavior; then confirm whether the first-fault pointer keeps pointing to the same code. First fix: gate clear operations on “condition removed,” add hysteresis/timers, and preserve first-fault context for diagnosis.
Evidence fields: status word/byte, fault latch bit, fault code, first-fault pointer, retry timer, derating factor/effective target. (→H2-9/H2-7)
Random NACK bursts in field — pull-up sizing, capacitance growth, or EMI spikes?
Classify NACK bursts by waveform evidence: slow edges and low high-level margin point to pull-up/capacitance issues, while sharp edges plus isolated glitches point to EMI spikes. Calculate rise-time margin from Rpullup and Cb, then correlate NACK clusters with bus error counters and any stuck-low events. First fix: tighten rise-time (stronger pull-up or lower speed), reduce traffic bursts, and add recovery thresholds.
Evidence fields: Rpullup, Cb, tr, high-level margin, NACK rate, bus error counters, stuck-low event count. (→H2-3/H2-10)
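The rise-time arithmetic is standard: the I²C-bus specification defines t_r from 30% to 70% of VDD, which for an RC-limited line gives t_r = R·C·ln(7/3) ≈ 0.847·R·C, to be compared against the spec limits of 300 ns (Fast-mode, 400 kHz) or 1000 ns (Standard-mode):

```python
import math

def i2c_rise_time(r_pullup, c_bus):
    """Estimated 30%->70% rise time of an RC-limited I2C line:
    t_r = R * C * ln(7/3)  (~0.847 * R * C)."""
    return r_pullup * c_bus * math.log(7.0 / 3.0)

def rise_margin(r_pullup, c_bus, t_r_max=300e-9):
    """Margin against the Fast-mode 300 ns limit by default; pass
    t_r_max=1000e-9 for Standard-mode. Negative margin = spec violation."""
    return t_r_max - i2c_rise_time(r_pullup, c_bus)
```

For example, 4.7 kΩ pull-ups into 100 pF of bus capacitance land near 400 ns, which already violates Fast-mode, whereas 2.2 kΩ into the same capacitance leaves positive margin; negative margin is the tell for the "slow edges" NACK-burst class.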
Bus stuck low after hot-plug — recovery sequence or reset cause chain?
After hot-plug, a stuck-low bus is often a reset/brownout chain that leaves a device holding SDA, or a missed recovery ladder. Confirm reset cause and stuck-low detection first. Then run a standard recovery: 9 SCL pulses, STOP, re-scan, and only then issue a device reset if needed. If isolation is used, verify bidirectional behavior at the boundary (e.g., ISO1540-Q1).
Evidence fields: reset cause, brownout flag, stuck-low detect, bus recovery count, scan result, last-error code. (→H2-10)
Two drivers respond to same address — strap conflict or soft-address mis-write?
Decide whether the address is hardware-derived or software-assigned. If the collision persists after power-cycle, suspect strap conflict; if it changes across commits, suspect a soft-address write to the wrong page/bank or missing readback verification. First fix: enforce an address ownership rule (one source of truth), log the address source, and require read-after-write + CRC/commit verification for any soft-address update.
Evidence fields: scan collision signature, target page/bank, readback address, config CRC, last-error code, commit counter/status. (→H2-3/H2-4)
Config corruption after brownout — commit atomicity or missing image CRC?
If corruption follows brownout, verify whether commit is truly atomic and whether image CRC is enforced on boot. A correct design must detect invalid images and fall back deterministically (factory image or last-known-good). First fix: use a two-phase commit (write + validate + mark valid), store version + CRC, and block loading user image when CRC fails. Always log reset cause and brownout flags for correlation.
Evidence fields: brownout flag, reset cause, last commit status, image CRC, commit counter, rollback reason/state. (→H2-5)
Logs exist but are not useful — missing snapshot fields or wrong trigger design?
A useful log needs three things: a trigger that captures “first cause,” a snapshot that explains context, and indices/timestamps that preserve ordering. If logs only contain a code, add snapshot fields like VIN, temperature, effective target, and derating factor. If ordering is unclear, fix ring-buffer semantics and expose log index plus first-fault pointer. First fix: define a minimum snapshot contract and validate it in fault-injection tests.
Evidence fields: fault code, log index, timestamp/counter, first-fault pointer, snapshot VIN/temp/effective target/derating, trigger ID. (→H2-9)
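The "minimum snapshot contract" can be made executable, so fault-injection tests fail loudly when a log entry is missing context. The field set below is an assumption drawn from this page's evidence lists, not a device-defined format:

```python
from dataclasses import dataclass, fields

@dataclass
class FaultSnapshot:
    """Minimum snapshot contract (assumed fields; adapt to the device)."""
    fault_code: int
    log_index: int
    timestamp: int          # monotonic counter or wall time
    vin_mv: int
    temp_c: int
    effective_target: int
    derating_factor: float

def snapshot_complete(entry: dict) -> bool:
    """A log entry is useful only if every contract field is present."""
    return all(f.name in entry for f in fields(FaultSnapshot))
```

Running this check inside Gate 6 turns "logs exist but are not useful" from a field complaint into a bench-time test failure.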
How to structure factory vs field updates safely — what must be immutable?
Split immutable factory essentials from field-tunable settings. Keep safety limits, address source rules, CRC/rollback logic, and calibration baselines immutable or tightly guarded. Allow field updates only in a user partition with write-rate limits and mandatory readback verification. First fix: implement factory+user images (or A/B), record image version/CRC/commit status, and require validation gates for any field update (program, commit, fault/log verification).
Evidence fields: image version, image CRC, commit counter/status, write-frequency policy flag, rollback reason/state, validation gate result. (→H2-5/H2-11)
MPN examples used for reproducible bench & boundary tests
I²C LED driver (test DUT class): PCA9955B, PCA9956B
I²C isolation boundary: ADuM1250, ADuM1251, ISO1540-Q1
Logic analyzer (transaction evidence): Saleae Logic Pro 8 (SAL-00113)