DALI-2 / D4i Interface Design & Metering
DALI-2 / D4i is the “two-wire control + data contract” that makes luminaires interoperable and maintainable: get bus power and edge/timing margins right first, then commission reliably and expose standardized D4i diagnostics/energy/runtime data. In practice, stable field performance comes from an evidence-driven loop—measure (BUS_V, edges, frames, queues) → isolate the root cause → apply the smallest fix without breaking the bus.
DALI-2 vs D4i: Where the Interface Sits in a Luminaire System
This chapter locks the system boundary: who sends commands, who executes them, and what changes when a design targets DALI-2 interoperability versus D4i standardized luminaire data.
A) System roles (separate responsibilities, avoid scope mix)
- Application Controller: initiates commissioning, addressing, grouping, scenes, and reads back status/data from field devices.
- Control Gear: implements the bus interface, decodes commands, executes output behavior deterministically, and exposes standardized device data.
- This page focuses on the Control Gear interface layer: transceiver front-end + bus power hooks + firmware behaviors that third-party systems validate.
B) What DALI-2 adds (engineering meaning)
DALI-2 is primarily about interoperability: consistent behavior under common test methods. The technical risk is not “missing a feature” but non-deterministic behavior when multiple vendors interact.
- Deterministic state handling: explicit busy/timeout behavior, predictable retries, and stable state after brownouts.
- Commissioning survivability: discovery and address assignment should converge reliably, not “work only on a quiet bench.”
- Proof-first implementation: maintain counters (retries/timeouts/framing errors) and minimal event logs for commissioning and field debug.
C) What D4i adds (why the data matters)
D4i builds on DALI-2 and standardizes what luminaire data is available and where it is stored. The design goal is operational transparency: devices can be commissioned, audited, and maintained via standardized reads.
- Luminaire data: identity and capability metadata (useful for asset tracking and consistent commissioning).
- Energy & runtime: accumulated metrics that enable auditing, usage profiling, and maintenance planning.
- Diagnostics: standardized health indicators and fault/event records that reduce truck rolls and guesswork.
D) Minimum evidence chain (what to prove before calling it “ready”)
- Role correctness: the device behaves as Control Gear (reliable query responses + deterministic command execution).
- Link health: measurable response success rate and stable retry/timeout counts under representative bus loading.
- D4i robustness (if applicable): energy/runtime counters are monotonic; persistence across power cycles is consistent; reporting does not congest the bus.
Physical Layer Basics: 2-Wire Bus, Topology, and Wiring Constraints
Most “DALI instability” issues originate in the physical layer: bus power margin, distributed capacitance, edge quality, and noise coupling. This chapter converts wiring into a measurable evidence chain.
A) 2-wire, non-polarity bus: what it implies in hardware
- Non-polarity improves installation, but protection and coupling must be symmetric (both stress directions must be safe).
- Bus power is part of signaling: current limiting defines “idle high” under load; margin changes with node count and cable length.
- Decoder margin is edge-driven: Manchester-coded signaling relies on transitions; slow edges reduce the safety margin against noise.
B) Topology and wiring: why trunk + stubs change reliability
In real installations, the bus behaves like a distributed network. Cable capacitance and branch discontinuities distort edges, and the failure mode is typically intermittent (depends on load, environment, and noise).
- Long trunk → higher distributed capacitance → slower rise/fall edges → smaller decoding margin.
- Many stubs → local capacitance steps + reflections → ringing near decision level → occasional bit errors.
- Connectors/terminals → contact resistance drift → edge deformation and higher susceptibility to common-mode pickup.
C) “Cable capacitance load” as an evidence chain (make it testable)
Treat cable capacitance as a design parameter. The goal is to keep the receiver’s sampling away from threshold noise and ringing. A fast proof method uses a two-point capture plus a single-variable change:
- Measure TP_NEAR and TP_FAR: compare rise time, overshoot/ringing, and noise around the decision level during active traffic.
- Correlate with counters: retries, timeouts, framing errors (baseline vs after wiring modifications).
- One-variable change: reduce stub length or remove a branch; a capacitance-driven issue improves immediately when load is reduced.
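The edge-slowing mechanism described above can be put into rough numbers with a first-order RC estimate. A minimal sketch, assuming an effective source resistance set by the bus supply’s current limit and illustrative cable/node capacitances — none of these values come from the standard or a datasheet:

```python
# Illustrative estimate of DALI bus edge slowing from distributed cable
# capacitance. All component values below are assumptions for the sketch.

def rise_time_10_90(r_equiv_ohm: float, c_total_f: float) -> float:
    """10-90% rise time of a first-order RC edge: t_r ≈ 2.2 * R * C."""
    return 2.2 * r_equiv_ohm * c_total_f

def cable_capacitance(length_m: float, c_per_m_f: float, stub_caps_f=()) -> float:
    """Trunk capacitance plus lumped stub/node contributions."""
    return length_m * c_per_m_f + sum(stub_caps_f)

# Assumed numbers: 300 m trunk at ~100 pF/m, ten nodes at ~200 pF each,
# effective source resistance ~1 kΩ from the bus supply current limit.
c_bus = cable_capacitance(300.0, 100e-12, [200e-12] * 10)
t_r = rise_time_10_90(1_000.0, c_bus)
print(f"C_bus = {c_bus * 1e9:.1f} nF, t_r(10-90) = {t_r * 1e6:.1f} us")
```

Re-running the same estimate after a one-variable wiring change (shorter trunk, fewer stubs) shows whether the expected edge improvement matches what TP_FAR actually records.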
D) EMC filtering and isolation boundary (interface-level only)
- Filtering trade-off: EMI fixes that soften edges can increase retries/timeouts; validate every change with waveform + counters.
- Isolation impact: isolation elements can add delay and reduce edge steepness; place them so the transceiver still sees clean transitions.
- Protection behavior matters: surge/ESD clamping and current limiting must recover cleanly, without “sticking” the bus near threshold.
DALI Bus Power: Budgeting, Regulation Window, and Protection
Bus power is not a “background utility” in DALI: it directly determines signal margin, edge quality, and commissioning convergence. This chapter turns bus power into a calculable budget and verifiable acceptance criteria.
A) What bus power must guarantee (write it as acceptance)
- Regulation window under load: the bus must stay inside the communication voltage window at the far end, with the maximum node count and representative wiring.
- Stable signaling levels: idle and active levels must remain stable during traffic, without hovering near the receiver threshold.
- Dynamic event survivability: insertion, short events, and surge clamping must recover automatically, without requiring a power cycle.
B) Budget template: static + dynamic + margin (reusable)
A practical budget separates what is always present from what is event-driven. The goal is to prove margin under both steady-state and worst-case transient conditions.
- Static draw: sum of all bus-interface loads (each node), plus any bus-powered controller-side load if applicable.
- Dynamic peaks: cable charging, node insertion, and protection transitions (events that momentarily increase demand).
- Protection margin: current-limit threshold tolerance, thermal drift of limit components, and clamp behavior after surges.
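The static + dynamic + margin template above can be written as a small calculation. A minimal sketch with illustrative currents (the node counts, per-node draw, transient allowance, and margin percentage are all assumptions, not values from the standard):

```python
# Sketch of a DALI bus power budget: static draw plus dynamic peaks must fit
# under the supply's guaranteed current limit with explicit margin.
# All currents are illustrative assumptions.

def bus_power_budget(node_currents_ma, dynamic_peak_ma, limit_min_ma,
                     margin_pct=20.0):
    """Return (static_ma, worst_case_ma, passes) against a de-rated limit."""
    static_ma = sum(node_currents_ma)
    worst_case_ma = static_ma + dynamic_peak_ma
    derated_limit_ma = limit_min_ma * (1.0 - margin_pct / 100.0)
    return static_ma, worst_case_ma, worst_case_ma <= derated_limit_ma

# 40 nodes at ~2 mA each, +35 mA transient allowance (cable charging,
# node insertion), supply guarantees at least 250 mA before limiting.
static, worst, ok = bus_power_budget([2.0] * 40, 35.0, 250.0)
print(f"static={static} mA, worst={worst} mA, pass={ok}")
```

The useful output is not the pass/fail bit alone but the numeric gap between worst case and the de-rated limit: that gap is the headroom that survives node additions in the field.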
C) Protection that does not break communication (short, foldback, surge)
Protection should prevent damage and preserve recoverability. The most common failure mode is a “half-alive bus”: not fully shorted, but clamped or current-limited near the decoder’s decision band.
- Short-circuit limiting: choose a limiting strategy that avoids long dwell near threshold and provides clean recovery once the short is removed.
- Foldback behavior: verify foldback does not create oscillation (repeated collapse/restart) that looks like random protocol errors.
- Surge/ESD clamp: ensure the clamp path is fast and strong, and that it releases cleanly (no lingering leakage that drags VBUS down).
- Non-polarity tolerance: protection and coupling must be symmetric so wiring direction does not change stress handling.
D) Brownout / hold-up boundary (optional, interface-level)
- Quiet exit: on undervoltage, release the bus cleanly (high-impedance behavior) to avoid dragging the line in the threshold region.
- State consistency: commissioning should converge after a brownout; avoid partial writes that create address/data inconsistency.
- Persistence policy: if counters/logs exist, define what survives power loss and how monotonicity is preserved.
Transceiver Front-End: Coupling, Level Shifting, Isolation, and EMC Hooks
Treat the DALI front-end as a reusable interface cell: protection + receive shaping + transmit drive + optional isolation, with explicit test points. This keeps designs stable even when MCU or transceiver ICs change.
A) Define the reusable “interface cell” boundary
- Inputs: DALI 2-wire bus (non-polarity), including surge/ESD stress environment.
- Outputs: logic-level RX/TX (UART or equivalent) and optional FAULT/STATUS indicators.
- Mandatory test points: TP_BUS (line), TP_RX (post-shaping), TP_TX (drive node) to correlate waveforms with error counters.
B) Receive path: clamp/filter → comparator → clean logic
Receive robustness is determined by threshold margin and edge integrity, not by “stronger filtering.” Over-filtering often reduces transition steepness and shrinks decode margin.
- Clamp: protects against ESD/surge; it must not flatten normal signaling into the decision band.
- Filter: remove fast spikes and common-mode pickup; validate that rise/fall time remains adequate for decoding.
- Comparator / shaping: ensure noise near the decision level stays below the effective threshold margin during active traffic.
C) Transmit path: switch/driver → line shaping without breaking decode
- Drive level + edge control: edges should be steep enough for margin, but not so aggressive that ringing crosses the decision band.
- Return path awareness: transmit switching can inject ground bounce that contaminates RX; keep the cell layout and reference clean.
- Shaping as a verified step: any RC/series elements must be accepted only if they improve both waveforms and retries/timeouts.
D) Optional isolation + EMC hooks (interface-level)
- When isolation is needed: large ground potential differences or harsh interference environments; evaluate impact on edge timing and thresholds.
- Where to place isolation: typically on the logic side or as a modular block; avoid placing it where it degrades TP_BUS edge quality.
- EMC hooks: separate common-mode vs differential-mode paths; avoid fixes that improve emissions but push signaling into the decision band.
Protocol Essentials You Must Implement Correctly (Timing, Encoding, Collisions)
This section avoids spec restatement and focuses on the few protocol mechanics that directly decide interoperability: Manchester edge placement, effective sampling margin under real wiring, and how collisions manifest as field failures.
A) Manchester in engineering terms: edge placement + sampling margin
- Manchester is edge-driven: receivers infer symbols from transitions, so edge position matters more than absolute high/low level.
- Sampling margin shrinks in the field: long harness capacitance and “helpful” filtering can slow edges and push transitions toward window boundaries.
- Noise becomes deterministic: spikes near the decision band look like extra transitions and can be decoded as valid symbols.
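The edge-driven nature of Manchester coding can be made concrete with a toy encoder/decoder. A minimal sketch — the half-bit polarity convention here is an assumption for illustration, not a statement of the DALI wire format:

```python
# Minimal Manchester sketch: encode each bit as a half-bit pair with a
# guaranteed mid-bit transition, decode by checking that transition.
# Shows why the decoder depends on edge position, not absolute level.
# Polarity convention (1 -> low-high) is an assumption for this sketch.

def manchester_encode(bits):
    """1 -> (0, 1), 0 -> (1, 0): exactly one transition per bit cell."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out

def manchester_decode(halves):
    """Decode half-bit pairs; a missing transition is a coding error."""
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("no mid-bit transition: decode margin lost")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits

frame = [1, 0, 1, 1, 0]
assert manchester_decode(manchester_encode(frame)) == frame
```

Note what the error branch represents physically: a slow edge or a spike near threshold that merges or duplicates a transition does not produce a “slightly wrong” level, it produces a symbol error — which is exactly why field failures look like framing errors rather than amplitude problems.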
B) Frame tolerance: why “small jitter” becomes bit flips
Most “random” errors are actually repeatable: a transition drifts into the sampling boundary or a spike creates an extra crossing. The interoperability check is not “it works once,” but “it stays stable when the margin is reduced.”
- Over-filtering trap: fewer visible spikes, but slower rise/fall reduces decode margin and increases retries.
- Software timing trap: ISR latency or timer drift under CPU load shifts sampling relative to edges.
- Decision-band contamination: ringing or clamp leakage can hover near threshold and create false edges.
C) Collisions and arbitration: the engineering consequence
- Collision signature: the bus shows abnormal occupancy or mixed-level behavior, which triggers timeouts and “stuck commissioning.”
- Multi-master is not required: noise-induced “pseudo-transmit” can behave like a second master during discovery or assignment.
- Failure cascade: collisions increase retries → retries extend bus activity → margin worsens → commissioning stops converging.
D) Interoperability-ready checklist (practical)
- Capture frames at near and far points: verify edge placement remains inside the effective sampling window.
- Sweep wiring (length/stubs) and node count: verify retry/timeout counters do not show step changes.
- Stress noise conditions: confirm the bus recovers from collisions without entering “infinite retry” behavior.
Commissioning: Random Address Assignment, Discovery, and Persistence
Commissioning must work across bench debug, production lines, and field replacement. The goal is not “assign once,” but converge reliably, verify explicitly, and survive power events.
A) Pipeline: Discover → Assign → Verify → Commit
- Discover: build a reliable “seen list” and confirm bus health before attempting assignment (avoid writing into a marginal bus).
- Assign: allocate short addresses using a collision-aware strategy; avoid parallel writes that create inconsistent device state.
- Verify: always perform readback checks; treat “no readback” as not assigned.
- Commit: make persistence explicit; record success/fail reason codes and provide a deterministic recovery path.
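The four pipeline stages above can be sketched as one loop with explicit read-back. This is a structural sketch only: `FakeBus` and its method names are assumptions standing in for the real DALI-2 commissioning commands, and a real implementation must add the collision-aware assignment strategy:

```python
# Sketch of the Discover -> Assign -> Verify -> Commit pipeline.
# FakeBus is a hypothetical in-memory transport for illustration; a real
# implementation maps these calls onto DALI-2 commissioning commands.

class FakeBus:
    def __init__(self, device_ids):
        self.ids = list(device_ids)
        self.addrs, self.committed = {}, set()
    def discover(self):
        return self.ids
    def write_short_address(self, dev, addr):
        self.addrs[dev] = addr
    def read_short_address(self, dev):
        return self.addrs.get(dev)
    def commit(self, dev):
        self.committed.add(dev)

def commission(bus, max_nodes=64):
    assigned, failed = [], []
    seen = bus.discover()                       # Discover: build the "seen list"
    for short_addr, device_id in enumerate(seen[:max_nodes]):
        bus.write_short_address(device_id, short_addr)       # Assign
        if bus.read_short_address(device_id) != short_addr:  # Verify
            failed.append((device_id, "readback_mismatch"))  # no readback = not assigned
            continue
        bus.commit(device_id)                   # Commit: persistence is explicit
        assigned.append((device_id, short_addr))
    return assigned, failed
```

The key structural point is that `failed` carries a reason code per device rather than a boolean: that list is the deterministic recovery path the “Commit” bullet asks for.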
B) Persistence principles (interface-level, but testable)
- Atomicity: a configuration becomes valid only after a final “valid flag/version” step; partial writes must not appear as valid.
- Versioning: store a minimal config_version (or equivalent) to detect stale data and control migrations.
- Write discipline: avoid excessive NVM writes; rate-limit and coalesce updates to protect endurance.
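The atomicity and versioning principles can be sketched as a two-slot (A/B) store where a record only counts once its checksum validates on read, so a brownout mid-write leaves the older slot intact. NVM is modeled as a Python list here — an assumption for illustration; slot layout and CRC choice are design decisions, not mandated behavior:

```python
# Sketch of atomic config persistence: write the new record to the inactive
# slot; the record only becomes "valid" by passing its CRC on read, so a
# torn write can never shadow the last good configuration.

import zlib

class ConfigStore:
    def __init__(self):
        self.slots = [None, None]            # two NVM slots (A/B)

    def _pack(self, version, data):
        blob = (version, dict(data))
        return blob + (zlib.crc32(repr(blob).encode()),)

    def _inactive_slot(self):
        a, b = self.slots
        if a is None: return 0
        if b is None: return 1
        return 0 if a[0] < b[0] else 1       # overwrite the older version

    def commit(self, version, data):
        record = self._pack(version, data)
        self.slots[self._inactive_slot()] = record   # write may be torn...
        # ...but a torn record fails CRC and is ignored by load().

    def load(self):
        valid = [s for s in self.slots
                 if s and zlib.crc32(repr((s[0], s[1])).encode()) == s[2]]
        return max(valid, key=lambda s: s[0], default=None)
```

Note the write discipline falls out naturally: each `commit` is exactly one slot write plus no erase of the good copy, and rate-limiting/coalescing can wrap `commit` without changing the recovery guarantee.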
C) Why configs get lost after swap/brownout (root-cause list)
- Brownout during commit: valid flag never set, version mismatch, or inconsistent readback across power cycles.
- Identity changes on replacement: a swap looks like a new device; address conflict scans should run before assignment.
- Address conflict: duplicate short addresses cause “wrong luminaire reacts” and discovery instability.
- Silent NVM failure: writes fail under undervoltage/temperature without a failure marker; commissioning appears successful but does not persist.
D) Practical SOP: production + field replacement
- Production: reset/initialize → discover → assign → verify → commit → power-cycle → verify again.
- Field: after replacement, run conflict scan → discover → assign + verify → commit; avoid “blind retries” when convergence stalls.
- Escalation: if convergence time suddenly increases with node count, treat it as a bus-margin issue first (retry/timeout bursts + TP waveforms).
Grouping, Scenes, and Control Behavior That Impacts User Experience
Grouping and scenes are not “UI features” in the field. They are observable behaviors: command priority, fade consistency across luminaires, and predictable state after dropouts. This section defines behaviors you can measure and accept.
A) Priority and conflict handling (engineering rules)
- Broadcast / group / unicast: define a deterministic rule for simultaneous or back-to-back commands (e.g., last-wins with a minimum hold, or explicit override classes).
- Scene vs direct level: specify whether a direct level command interrupts a scene fade immediately, ramps to a new target, or waits until fade completes.
- Command storms: implement queue protection (merge/drop policy) to avoid output “hunting” and bus overload during commissioning or noisy links.
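The merge/drop queue protection described above can be sketched with a keyed, order-preserving queue: back-to-back level commands to the same target merge last-wins, so the output never “hunts” through stale intermediate levels. The command key shape is an assumption for the sketch:

```python
# Sketch of command-storm protection: merge repeated commands per target
# (last-wins) and drop with a counter when the queue is full, so control
# latency stays bounded. Key/payload shapes are illustrative assumptions.

from collections import OrderedDict

class CommandQueue:
    def __init__(self, max_depth=8):
        self.q = OrderedDict()          # (kind, target) -> latest payload
        self.max_depth = max_depth
        self.dropped = 0                # evidence counter for field debug

    def push(self, kind, target, payload):
        key = (kind, target)
        if key in self.q:
            self.q.move_to_end(key)     # merge: replace stale value
        elif len(self.q) >= self.max_depth:
            self.dropped += 1           # drop policy: count, never block
            return False
        self.q[key] = payload
        return True

    def pop(self):
        return self.q.popitem(last=False) if self.q else None

q = CommandQueue()
for level in (10, 40, 70, 100):         # a storm of direct-level commands
    q.push("level", ("group", 3), level)
assert q.pop() == (("level", ("group", 3)), 100)  # only the latest survives
```

The `dropped` counter matters as much as the merge: it is the observable that distinguishes “queue policy working as designed” from “commands silently lost” during commissioning debug.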
B) Scene and fade consistency (what must be identical)
- Start alignment: luminaires in the same group should begin transition within a bounded sync error, or the scene looks “broken.”
- Curve consistency: the fade curve shape must be stable across devices (and after reboot), not just “smooth” on one unit.
- Interrupt rules: define how fades are interrupted and resumed; inconsistent policies create visible steps and mismatched brightness.
C) Interface boundary to the driver execution layer (no scope creep)
- Control intent: the DALI layer should output a clean intent: target level + fade parameters + state intent (hold/restore/interrupt).
- Execution point: the driver layer maps intent to actual current/PWM. Implementation may differ, but observable behavior must match.
- Traceability: log or timestamp “setpoint applied” so field behavior can be correlated to bus events and arbitration decisions.
D) Dropout and recovery behavior (predictable state)
- Restore policy: define whether the luminaire restores last state, defaults, or a safe state after power/bus loss.
- No surprise replay: avoid uncontrolled “replay” that causes visible jumps when the bus returns.
- Group re-sync: after recovery, align group behavior so one unit does not fade late and “chase” the scene.
D4i Data Model: Luminaire Data, Diagnostics, and an Interoperability Mindset
D4i’s core value is operational data standardization: a third-party controller can discover a luminaire, read consistent data objects, and use them for maintenance and diagnostics without vendor-specific guesswork.
A) Data categories: what to provide and why it matters
- Static: identity, rated parameters, firmware/version — used for asset inventory and compatibility checks.
- Accumulated: runtime/energy/counters — used for maintenance planning and lifetime tracking.
- Event-driven: faults/warnings/maintenance events — used for fast diagnosis and closed-loop service actions.
B) Interoperability first: stable fields beat “private beauty”
- Presence matters: a field that reliably exists and is readable is more useful than a fancy private extension.
- Stable semantics: keep units, ranges, and meaning stable across firmware; changes must be versioned and backward-compatible.
- No ambiguity: avoid controller-specific interpretations; validate against at least one third-party controller.
C) Update strategy: static vs accumulated vs event-driven
- Static: factory-written; update only on firmware/product change, with explicit version markers.
- Accumulated: update periodically or on thresholds; rate-limit to protect NVM endurance.
- Event-driven: write on event with debounce/aggregation to avoid “event storms” and excessive writes.
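The event-driven debounce/aggregation policy can be sketched as a per-fault window: repeated reports of the same fault accumulate into one record, and only the aggregated record is flushed toward NVM. The window length and timing source are assumptions:

```python
# Sketch of event-storm debounce: repeated faults within a window collapse
# into one aggregated record (code, first timestamp, count) so an event
# storm costs one NVM write, not hundreds. Window length is an assumption.

class EventAggregator:
    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.pending = {}               # fault_code -> (first_ts, count)
        self.flushed = []               # records actually written to NVM

    def report(self, fault_code, now_s):
        first_ts, count = self.pending.get(fault_code, (now_s, 0))
        self.pending[fault_code] = (first_ts, count + 1)
        if now_s - first_ts >= self.window_s:
            self.flush(fault_code)

    def flush(self, fault_code):
        first_ts, count = self.pending.pop(fault_code)
        self.flushed.append((fault_code, first_ts, count))

agg = EventAggregator(window_s=60.0)
for t in range(0, 75, 5):               # one fault firing every 5 seconds
    agg.report("OVERTEMP", float(t))
# 13 raw events inside the 60 s window collapse into one NVM record
```

The same structure serves the accumulated category: replace the fault code with a metric name and the count with a delta, and the rate limit becomes the flush window.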
D) Validation plan: prove third-party readability
- Interop matrix: at least one third-party controller reads static/accumulated/event objects without ambiguity.
- Consistency: repeated reads match within expected tolerance; accumulated fields move only per policy.
- Compatibility: firmware updates preserve object meaning or provide versioned migration.
Energy & Runtime Metering: Measurement Chain, Accuracy, and Reporting
Metering must be a closed loop: what to sense → how to compute → how to accumulate → how to report → how to verify. This section defines measurement semantics and the evidence needed to prove accuracy, continuity, and “no impact” on control responsiveness.
A) Sense-point selection defines data semantics
- Input-side sensing (mains/DC input): reflects total luminaire energy including auxiliary rails and losses; best match for “billable” energy.
- Output-side sensing (LED current/voltage): reflects driver output energy closely tied to dimming behavior; requires careful handling of ripple/PWM.
- Driver-internal estimation (switch metrics): low BOM cost but model-dependent; interoperability is harder because semantics vary by implementation.
B) Accuracy drivers: build an error budget instead of “tuning a constant”
- Sensor errors: shunt tolerance/TC, amplifier offset/drift, hall bias, gain error.
- Sampling errors: ADC quantization, reference drift, aliasing under PWM/ripple, sampling phase jitter.
- System coupling: ground bounce and common-mode noise coupling into the measurement node; temperature gradients across sense parts.
- Compute errors: RMS vs average mismatch, windowing choices, insufficient filtering creating report “jitter.”
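Building the error budget rather than “tuning a constant” usually means combining independent contributions by root-sum-square and seeing which term dominates. A minimal sketch — every percentage below is an illustrative assumption for a shunt + ADC chain, not a characterized value:

```python
# Sketch of a metering error budget: combine independent 1-sigma error
# sources by root-sum-square (RSS). All percentages are illustrative
# assumptions for a hypothetical shunt + amplifier + ADC chain.

import math

def rss_error(terms_pct):
    """Root-sum-square of independent error contributions, in percent."""
    return math.sqrt(sum(t * t for t in terms_pct))

budget = {
    "shunt_tolerance":   0.5,   # %
    "shunt_tempco":      0.3,
    "amp_offset_gain":   0.4,
    "adc_ref_drift":     0.2,
    "quantization":      0.1,
    "rms_vs_avg_model":  0.5,
}
total = rss_error(budget.values())
for name, t in sorted(budget.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {t:4.1f} %  ({(t / total) ** 2:5.1%} of variance)")
print(f"combined ≈ {total:.2f} %")
```

Printing each term’s share of the variance is the point of the exercise: improving anything other than the top one or two contributors barely moves the combined number.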
C) Accumulated energy & runtime: continuity, overflow, and power-loss behavior
- Counter definition: choose units and width (Wh/mWh; seconds/minutes) and define overflow behavior (rollover vs saturate) explicitly.
- Continuity on power loss: use checkpoints so energy/runtime remains continuous across brownouts; define the allowed discontinuity bound.
- Calibration principle: prefer factory calibration; if multi-point is used, version the coefficients and keep backward compatibility.
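The continuity and rollover rules can be sketched as a checkpointed counter: accumulation wraps at a defined width, and on restart the device resumes from the last persisted checkpoint so the reported value never moves backwards. The counter width, units, and checkpoint interval are assumptions:

```python
# Sketch of a monotonic energy counter with defined rollover and brownout
# checkpointing. Width (32-bit, mWh) and checkpoint policy are assumptions;
# the invariant is: reported energy never regresses, and any discontinuity
# is bounded by energy accumulated since the last checkpoint.

class EnergyCounter:
    WIDTH = 2 ** 32                      # 32-bit mWh counter, rollover defined

    def __init__(self, checkpoint_mwh=0):
        self.total_mwh = checkpoint_mwh  # resume from last persisted value
        self.checkpoint_mwh = checkpoint_mwh

    def accumulate(self, delta_mwh):
        self.total_mwh = (self.total_mwh + delta_mwh) % self.WIDTH

    def checkpoint(self):
        """Persist point; in firmware this is a rate-limited NVM write."""
        self.checkpoint_mwh = self.total_mwh
        return self.checkpoint_mwh

c = EnergyCounter()
c.accumulate(1500)
c.checkpoint()
c.accumulate(700)                        # ...power lost before next checkpoint
restored = EnergyCounter(c.checkpoint_mwh)
assert restored.total_mwh == 1500        # bounded discontinuity, no regression
```

The “allowed discontinuity bound” from the bullet above falls directly out of the checkpoint interval: worst-case loss equals maximum energy accumulated between two checkpoints.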
D) Reporting strategy: avoid bus congestion and data “jitter”
- Refresh separation: static objects (rare), accumulated objects (low-rate), event objects (triggered + debounced).
- Bandwidth control: threshold-based updates, smoothing/windowing, and staggered schedules across many nodes.
- Control-first policy: when queues rise or retries increase, metering must degrade first (lower rate / defer / drop) before affecting dimming latency.
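Staggered scheduling across many nodes can be derived deterministically from the short address, so no two nodes refresh in the same bus window without any coordination traffic. A minimal sketch with an assumed period and slot count:

```python
# Sketch of staggered reporting: derive each node's report offset from its
# short address so up to n_slots nodes spread evenly across the period.
# Period and slot count are illustrative assumptions.

def report_offset_s(short_addr, period_s=300.0, n_slots=64):
    """Deterministic per-node offset inside the reporting period."""
    return (short_addr % n_slots) * (period_s / n_slots)

offsets = [report_offset_s(a) for a in range(4)]
assert offsets == [0.0, 4.6875, 9.375, 14.0625]   # ~4.7 s apart
```

Because the offset is a pure function of the address, the schedule survives reboots and device swaps without renegotiation; the control-first policy then only has to defer or skip a node’s slot when queues rise.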
Firmware Architecture: Stacks, Logs, and Fault-Handling Without Breaking the Bus
Many DALI failures come from firmware structure: blocking paths, retry storms, and uncontrolled logging. A robust architecture keeps control deterministic, makes data-plane tasks degradable, and exits faults quietly.
A) Layering: PHY → Frame → Command → Data Model
- PHY: timing-critical receive/transmit primitives; produce clean symbols and capture edge/timing statistics.
- Frame: decode/validate frames and classify errors; do not embed “business retries” here.
- Command: apply arbitration and behavior rules (group/scene/interrupt/restore); own the user-visible state machine.
- Data Model: implement D4i objects with caching, versioning, and update policies (static/accumulated/event-driven).
B) Task/queue design: non-blocking with backpressure
- Rx/Tx decouple: ISR performs minimal work and enqueues; parsing and actions occur in tasks.
- Control-first: dimming/scene commands outrank metering reports and log export.
- Backpressure: when queue depth grows, degrade metering/logging first (reduce rate, defer, or drop) before affecting control latency.
- Retry discipline: retries use backoff and maximum caps; avoid “retry storms” that lock the bus.
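The retry discipline above can be sketched as exponential backoff with full jitter and a hard ceiling: a persistent fault then produces a bounded, de-correlated probe pattern instead of a retry storm. Base delay, cap, and retry count are assumptions:

```python
# Sketch of retry discipline: exponential backoff with random ("full")
# jitter and a hard retry ceiling, so a persistent fault yields a bounded,
# spread-out probe pattern. Time units (ms) and limits are assumptions.

import random

def backoff_schedule(base_ms=10, factor=2, cap_ms=640, max_retries=6,
                     seed=None):
    rng = random.Random(seed)
    delays, delay = [], base_ms
    for _ in range(max_retries):
        # full jitter: uniform in [0, delay] de-correlates nodes that all
        # observed the same bus fault at the same instant
        delays.append(rng.uniform(0, delay))
        delay = min(delay * factor, cap_ms)
    return delays

sched = backoff_schedule(seed=42)
assert len(sched) == 6                  # ceiling: never retries forever
assert all(0 <= d <= 640 for d in sched)
```

The jitter is not cosmetic: without it, every node that saw the same collision retries at the same instant and recreates the collision, which is the “retry storm that locks the bus” failure mode.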
C) Logging that helps debugging without killing NVM
- RAM ring buffer: capture high-rate events with timestamps and reason codes; export on demand.
- Aggregation counters: compress repeated events into counters (burst counts) instead of writing each event.
- NVM summaries: persist only critical summaries and checkpoints with rate limits and explicit write budgets.
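The ring buffer plus aggregation-counter combination can be sketched in a few lines: fixed memory, newest events overwrite the oldest, and a burst of identical events compresses into one counted entry. Field names are assumptions for the sketch:

```python
# Sketch of a RAM ring buffer for debug events: bounded memory (old entries
# fall off), bursts of the same reason code compress into one counted entry,
# and export happens only on demand -- never as automatic NVM writes.

from collections import deque

class EventLog:
    def __init__(self, depth=64):
        self.ring = deque(maxlen=depth)  # oldest entries drop automatically

    def log(self, ts_ms, reason):
        if self.ring and self.ring[-1][1] == reason:
            ts0, _, count = self.ring[-1]
            self.ring[-1] = (ts0, reason, count + 1)  # aggregate repeats
        else:
            self.ring.append((ts_ms, reason, 1))

    def export(self):
        return list(self.ring)           # dumped on demand for field debug

log = EventLog(depth=4)
for i in range(10):
    log.log(i, "RX_TIMEOUT")             # a burst becomes one counted entry
log.log(100, "BROWNOUT")
assert log.export() == [(0, "RX_TIMEOUT", 10), (100, "BROWNOUT", 1)]
```

Only summaries of this buffer (e.g., the counts at checkpoint time) should cross into NVM, which keeps the write budget explicit as the section requires.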
D) Fault strategy: “quiet exit” on short/brownout/bus abnormal
- Bus abnormal / collision: enter a bounded silence window and probe at a controlled interval; avoid continuous Tx attempts.
- Short / voltage window violation: stop high-rate activity; keep minimal health markers and wait for recovery.
- Brownout: preserve counter continuity via checkpointing rules, then restart without replay storms.
Validation & Field Debug Playbook: Symptom → Evidence → Isolate → Fix
This chapter is a field-ready diagnostic workflow for DALI-2 / D4i interfaces. The emphasis is on minimal tools, the first two measurements to take, and evidence-based branching (not a restatement of the standard).
A) Minimal field toolkit & standardized test points (TP)
- DMM: bus DC level, short check, voltage drop along wiring.
- Scope (small is OK): bus edges, droop/overshoot, short-circuit recovery.
- Frame capture: DALI sniffer or logic analyzer at the interface front-end (Tx/Rx).
- Device counters/logs (optional but powerful): frame error counters, retry rate, queue depth, brownout flags.
- Evidence snapshot template: `bus_v_idle=____V, bus_v_min(load)=____V, droop_event=____mV/____ms, rise_time=____µs, noise_pkpk=____mV, frame_err={manchester:__, stopbit:__, checksum:__}, retry_rate=__%, txq_peak=__, brownout_flag=__`
B) One-screen debug matrix (symptom → 2 measurements → discriminator → first fix)
| Symptom | First 2 measurements | Discriminator (what proves root cause) | First fix action (lowest risk) | Example MPNs to inspect / swap |
|---|---|---|---|---|
| No response / not discovered | (1) TP1: bus_v_idle & bus_v_min under load<br>(2) TP2: any Tx/Rx edge activity at front-end | Low/unstable BUS_V → current-limit foldback / short / wiring drop<br>BUS_V OK but no edges → front-end path broken (Rx clamp / opto / comparator)<br>Edges exist but frames invalid → timing/filters damaging edges | Segment the line (remove branches) → isolate short/load<br>Reduce bus load and re-test; verify current-limit behavior<br>Temporarily bypass “heavy filtering” and re-check frame integrity | Bus PSU modules: RELV4-16, DLP-04R<br>Isolated front-end: TCLT1000 optocoupler (Tx/Rx), MMBT2222A-TP NPN<br>Zener clamp example: MM5Z5V1 |
| Intermittent dropouts | (1) TP1: droop/overshoot during dropout<br>(2) TP3: capture retry bursts / collisions | Dropout aligns with BUS_V droop → power budget / surge / foldback<br>Retry storm with stable BUS_V → firmware backoff/queue overload<br>Error spikes on long lines → capacitive-load edge deformation | Throttle non-critical traffic (meter/log refresh), then re-test stability<br>Add retry ceiling + randomized backoff; ensure “quiet exit” on errors<br>Rework topology: trunk + short spurs; reduce capacitive loading | Reference MCU stacks: PIC16F1779, PIC16F1947, MSPM0G3507<br>Bus PSU modules: RELV4-16, DLP-04R |
| Some commands ignored (groups/scenes inconsistent) | (1) TP3: compare same command across devices<br>(2) TP4: cmd drop/merge counters; Tx queue peak | Only fails under high traffic → queue/backpressure policy wrong<br>Group members respond inconsistently → commissioning data mismatch<br>Fade/scene inconsistent → behavior mapping not uniform | Re-run write → read-back verification for group/scene records<br>Prioritize control commands over metering/log exports<br>Unify fade timing rules; avoid blocking delays in command handler | NVM endurance risks: if logs are stored, check NVM wear & commit policy<br>MCU example families: PIC16F18326 / PIC16F1779 / MSPM0G3507 |
| Multiple controllers conflict (multi-master pain) | (1) TP3: collision frequency; overlapping frames<br>(2) TP2: line level during arbitration | Frames overlap at start bit → insufficient listen-before-talk / backoff<br>Collisions increase with noise → false edge detection / thresholds | Enforce idle-detect before transmit; implement randomized backoff<br>Tighten Rx deglitching without smearing valid edges | Isolation/repeaters for segmentation: Lunatone 86458401 (DALI repeater / galvanic isolation) |
| Metering drift / discontinuity (D4i energy/runtime) | (1) Compare to external power meter (spot check)<br>(2) TP4: counter rollover / brownout flags | Step jumps after power events → missing atomic commit / brownout handling<br>Slow drift only → sense placement / scaling / temperature coefficient | Add atomic commit (A/B or journal) for counters; store brownout reason<br>Validate scaling with a known load; lock update rate to avoid bus congestion | If bus-powered logic is used, validate the hold-up path + brownout reset supervisor (system-level choice).<br>Use certified ecosystem references via the DALI Product Database when selecting D4i gear. |
C) Debug decision tree (F11)
The tree below starts at a user-visible symptom and forces a quick separation into: bus power, waveform integrity, commissioning, collisions, and firmware queue/log storms.
D) Parts-oriented checklist (quick swaps that resolve 80% of field failures)
These are common “swap points” in real fixtures. The goal is fast isolation, not vendor lock-in.
1) Bus power & current limit (if TP1 is unstable)
- Swap-in known-good DALI bus PSU module to prove the problem is upstream: RECOM RELV4-16 or MEAN WELL DLP-04R.
- Verify current limit behavior: if foldback is too aggressive, devices “blink” in/out of discovery (TP3 shows retry bursts).
2) Isolated transceiver front-end (if TP1 is OK but TP2/TP3 fail)
- Optocouplers on Tx/Rx paths (example from reference circuits): TCLT1000 (x2 for Tx + Rx).
- Discrete driver/receiver transistor often used around optos: MMBT2222A-TP (NPN).
- Input clamp / threshold shaping example: MM5Z5V1 (5.1 V zener) in the logic-side protection network.
3) Firmware stack reference MCUs (if TP3 is valid but behavior collapses under traffic)
- Microchip examples: PIC16F1779 (DALI-2 transceiver implementation), PIC16F1947 (DALI interface app note), PIC16F18326 (common in lighting reference designs).
- TI example platform: MSPM0G3507 used in DALI reference implementations for controller/DUT roles.
4) Multi-master segmentation / isolation (if collision rate is high)
- Segment/extend with galvanic isolation where needed: Lunatone DALI Repeater (Art. Nr. 86458401) as a field-proven option.
E) “First 60 seconds” workflow (what to do on-site)
- Measure TP1 (BUS_V): record idle and worst-case under load.
- Capture TP3 (frames): confirm start bit + Manchester integrity and error counters.
- If TP1 is bad → isolate wiring/short/load; prove with known-good PSU module (RELV4-16 / DLP-04R).
- If TP1 is good but TP3 is bad → focus front-end (opto/transistor/clamp) and edge integrity.
- If TP1 and TP3 are good but behavior is inconsistent → commissioning read-back, then firmware queues/retry limits.
FAQs
These FAQs capture long-tail debugging intent without scope creep. Each answer anchors to a measurable evidence chain (TP1 BUS_V, TP2 edge integrity, TP3 frame/retry, TP4 queue/log/brownout).