Bypass & Redundant Channel for LED Driver Systems

Bypass and redundant channel design in LED drivers is not about adding extra hardware—it is about making every transfer evidence-driven, state-controlled, and auditable. A reliable system proves why it switched, how it switched, and that the mechanism still works, through measurable signals, structured voting logic, and verifiable event logs.

What “Bypass / Redundant Channel” Means in LED Driver Systems

A bypass design removes a failed element from the series energy path so the luminaire can keep operating in a controlled, traceable way. A redundant channel design switches the load to a separate healthy path (or ORs two paths) when the primary path becomes untrustworthy.

Key terms: series-path bypass, parallel redundancy, transfer / switchover, fail-safe state, evidence-driven decision.

This topic stays focused on controlled power-path continuity under faults: (1) what exactly is bypassed or replaced, (2) what evidence proves a fault is real, and (3) what the system must do when evidence is incomplete. Interface protocols and converter topology details are treated only as boundary conditions, not as the subject of this page.

Two patterns that must not be mixed:

  • Auto-bypass (series-path): a bypass element bridges a failed point (e.g., an open segment), keeping current flowing through a defined safe path. The bypass element becomes part of the safety case and must be diagnosable.
  • Redundant channel (parallel): the load is transferred to a separate channel (Path-A → Path-B) using an ORing or transfer element. The system must prove the inactive path is not silently failed (latent fault).

Where it appears in lighting (as design triggers, not product categories):

  • Availability-critical installations (e.g., roadway/tunnel): “no full blackout” is the top requirement; degradation is allowed if auditable.
  • High maintenance-cost sites (industrial/high-bay): automatic transfer reduces downtime while logs guide service action.
  • Multi-segment / multi-string luminaires: isolating or bypassing a failing segment avoids cascading failure across the entire lamp.
  • Safety-dominant emitters (concept-level only): when hazard is high, the default state is conservative even if availability suffers.

[Figure F1 diagram: Bypass vs Redundant Channel (Definition Map). Same goal (controlled continuity), different energy-path strategy. Series-path auto-bypass: Source → LED load, with a bypass element bridging the failed point. Parallel redundant channel: Path A and Path B → transfer / ORing element → LED. Operating states (concept): Normal (primary path active), Bypass / Transfer (controlled switchover), Fail-safe OFF (when evidence is insufficient).]
Figure F1 — Concept boundary: series-path auto-bypass versus parallel redundant channel, plus the three-state safety model.

Scope boundary: this page focuses on controlled power-path continuity, health evidence, fail-safe policy, and auditable behavior. Interface protocols, converter derivations, and full EMC filter design are intentionally out of scope.

System Requirements & Failure Philosophy

“Reliability” becomes actionable only when it is expressed as testable requirement fields. A bypass/redundant design must be specified in terms of what is allowed to happen during faults, what must never happen, and what evidence is required before switching states.

Requirement fields to define before circuit choices:

  • Availability target: uptime objective and what “degraded operation” means (e.g., reduced current, reduced segments, limited runtime).
  • Max blackout time: the maximum allowable interruption during bypass/transfer (ms-level target for critical installations).
  • Switchover stress limit: maximum inrush/overshoot during transfer and the acceptable settling window.
  • Output accuracy bound: maximum allowed current error after transfer (steady-state and transient).
  • Recovery policy: automatic recovery vs service-only recovery; retry rate limit and latch conditions.
  • Auditability: what events must be recorded, how long logs must persist, and what minimum fields are required per event.
  • Proof-test interval: how often the bypass/transfer mechanism must be verified to prevent latent faults.

Fail-safe default must be explicitly chosen for the “uncertain evidence” condition. In other words, when health signals disagree, or when a suspected fault cannot be confirmed, the design must decide whether the safe action is OFF or BYPASS / TRANSFER. This decision depends on the hazard class: availability-critical installations may prioritize continuity, while safety-dominant emitters require conservative shutdown unless sufficient evidence supports continued operation.

Latent-fault tolerance is mandatory: the bypass mechanism, the transfer element, and the health monitor can fail silently. Requirements must specify (1) what constitutes a latent fault, (2) how it will be detected (diagnostic evidence), and (3) how the detection will be proven later (audit records). Without this, redundancy can create a false sense of safety.

Copyable requirement checklist (fill-in):

  • MTBF goal: ____
  • Max blackout time: ____ ms
  • Max inrush during switchover: ____ A (____ µs window)
  • Allowed current error after transfer: ____ % (transient) / ____ % (steady)
  • Max transfers per hour: ____
  • Default safe state when evidence is insufficient: OFF / BYPASS (choose one + rationale)
  • Log retention: ____ events or ____ days
  • Proof-test interval: ____ months

[Figure F2 diagram: Requirements & Failure Philosophy (Field Map). Four requirement groups feed the acceptance criteria (test plan + audit proof). A) Availability: uptime target, allowed degradation, max blackout time. B) Switchover stress: max inrush / overshoot, settling window, transfer rate limit. C) Fail-safe policy: when evidence is insufficient, OFF or BYPASS / TRANSFER. D) Auditability: event taxonomy, minimum evidence fields, retention + proof-test interval.]
Figure F2 — Requirements field map: availability, switchover stress, fail-safe policy, and auditability converge into acceptance criteria.
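
The fill-in checklist above can also be held as a machine-checkable record, so that an unfilled field blocks a design review instead of being discovered during test. The following is a minimal sketch in Python; the class name, field names, and the example values are assumptions made for illustration and do not come from any standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RedundancyRequirements:
    """Hypothetical container for the checklist; None means 'not yet decided'."""
    mtbf_hours: Optional[float] = None
    max_blackout_ms: Optional[float] = None
    max_inrush_a: Optional[float] = None
    inrush_window_us: Optional[float] = None
    max_error_transient_pct: Optional[float] = None
    max_error_steady_pct: Optional[float] = None
    max_transfers_per_hour: Optional[int] = None
    default_safe_state: Optional[str] = None      # "OFF" or "BYPASS"
    safe_state_rationale: Optional[str] = None
    log_retention_events: Optional[int] = None
    proof_test_interval_months: Optional[float] = None

    def problems(self) -> list:
        """Return human-readable reasons why the checklist is not review-ready."""
        issues = [f"{f.name} is not filled in"
                  for f in fields(self) if getattr(self, f.name) is None]
        if self.default_safe_state not in (None, "OFF", "BYPASS"):
            issues.append("default_safe_state must be OFF or BYPASS")
        return issues


# Example values are placeholders, not recommendations.
req = RedundancyRequirements(max_blackout_ms=20.0, default_safe_state="OFF")
for issue in req.problems():
    print(issue)
```

Keeping the rationale for the default safe state as its own field means the OFF / BYPASS choice cannot be recorded without a stated reason.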

Reference Architecture: Dual-Path Power + Dual-Sense + Control Plane

A redundant lighting channel design stays robust only when the architecture is expressed as three layers: power paths (what carries energy), evidence inputs (what proves health), and a control plane (what decides, supervises, and records). This reference architecture is reused across later chapters (voting, state machine, diagnostics coverage, audit logs, and verification).

Power paths: Path-A / Path-B, ORing / transfer element, bypass element. Evidence inputs: I / V / T, contact feedback, supply-good. Control: CTRL + watchdog + event log.

Canonical blocks (kept intentionally protocol- and topology-agnostic)

  • Power paths: Path-A and Path-B carry energy to the same LED load. A transfer element enforces mutual exclusivity (one path at a time), while an ORing element supports parallel tolerance (both may conduct depending on conditions).
  • Bypass element: a dedicated actuator that bridges or isolates a failed point/segment under controlled policy. It is treated as a safety-relevant component that requires diagnosability and proof testing.
  • Sensing inputs (evidence only): current (I), voltage (V), temperature (T), contact feedback (CF), supply-good (PG), and isolation/ground-leak status (ISO) are modeled as inputs to the decision logic. Subsystem implementation stays out of scope here.
  • Control plane: a controller (concept-level) runs health evaluation and state transitions; a watchdog enforces a known-safe behavior during control failure; and a nonvolatile event log preserves evidence for later audit and service diagnostics.

Design rule: switching or bypass actions require at least two independent evidence signals (e.g., I + V, or Vdrop + CF) to avoid single-sensor false positives and common-mode failures.

[Figure F3 diagram: Reference Architecture (Reusable Canon). Power plane: Path A and Path B → ORing / transfer element (SW-OR / SW-XFER) → LED load, with bypass actuator (SW-BYP). Control plane: CTRL, watchdog, event log (NVM / ring), health monitor. Evidence inputs: I, V, T, CF (contact), PG (supply), ISO (leak).]
Figure F3 — Reusable reference architecture: Path-A/Path-B, ORing/transfer, bypass actuator, evidence inputs, and a control plane with watchdog and event log.

Bypass Element Choices: Relay, MOSFET, SSR, eFuse — Tradeoffs That Matter

Bypass and transfer actuators must be selected under lighting-specific constraints such as surge exposure, long cable transients, thermal headroom, and creepage/clearance boundaries. The correct choice is not determined by a single “rated current” number; it is determined by whether the actuator remains controllable, diagnosable, and stable across fault and switchover conditions.

Mechanical relay

Strengths: very low conduction loss and strong surge tolerance. Risks: contact wear/weld, bounce, and coil-driven noise. Diagnostics must consider “commanded state” vs “actual contact state.”

Back-to-back MOSFET

Strengths: fast switching, no mechanical wear, and precise control. Risks: SOA and surge stress, plus gate-drive failure modes. Diagnostics rely on Vds + current + temperature evidence.

Solid-state relay (SSR)

Strengths: simple control and mechanical robustness. Risks: leakage in OFF state and limited surge capability. Leakage must be accounted for to avoid unintended residual current.

eFuse / hot-swap switch

Strengths: protection + telemetry in one device, often with fast fault response. Risks: conduction loss and transient behavior that can look like faults if thresholds and timing are not aligned with the system policy.

Decision fields (keep as bullet checks, not a giant table)

  • Peak surge current / energy: whether the actuator survives lightning/surge events without drifting into latent damage.
  • Steady loss (thermal): heat dissipation margin under sealed housings and high ambient temperature.
  • Isolation requirement: whether a physical open is required, and how creepage/clearance constraints shape component choice.
  • Diagnostic observability: whether welded-on, stuck-off, or half-on states can be proven by evidence (Vdrop/Vds, current, temperature, feedback).
  • Lifetime model: contact cycles, thermal cycling, and surge count endurance—not just steady current rating.
  • Response time: ability to meet max blackout time and settling windows defined in requirements.
  • Leakage: OFF-state leakage impact on residual current and unintended illumination behavior.

[Figure F4 diagram: Actuator Comparison (Bypass / Transfer), showing current paths and diagnostic sensing points. Relay bypass: Source → LED load with relay; sensing points: series current (I) and Vdrop across contacts; failure modes to prove: welded-on, bounce / chatter. Back-to-back MOSFET bypass: Source → LED load with MOSFET pair; sensing points: Vds (health), current (I), temperature (T); stresses to validate: SOA / surge, gate-drive faults.]
Figure F4 — Relay vs back-to-back MOSFET: compare current paths and the minimum sensing points needed to diagnose welded/stuck/half-on behaviors.

Selection principle: choose the actuator that can meet surge and thermal constraints and provide auditable evidence of its true state (not just its command state), aligned with the max blackout time and fail-safe policy.

Channel-Health Monitoring: What to Measure and What It Proves

A channel-health monitor must be evidence-driven: actions (bypass/transfer/fail-safe) should follow only from signal changes that prove a fault class and exclude common false positives such as brownout and transient coupling. Evidence is grouped into four domains: LED path, power path context, thermal, and actuator truth.

LED path: ILED / Vstring. Events: ripple / dropout. Context: PG / BusSag / brownout. Thermal: ΔT / dT/dt. Actuator: CF / Vdrop / gate.

Core measurements (and the fault classes they support)

  • LED path evidence: ILED waveform and dropout events indicate whether energy is truly delivered. Vstring/headroom helps separate “open-like” behavior from “supply margin” issues. Ripple and dropout bursts are strong indicators for intermittent connectors and segment discontinuities.
  • Power-path context (flag level): input brownout margin and intermediate bus sag explain whether LED-path anomalies may be caused by upstream instability. Switch-node abnormality flags (not detailed waveforms) can annotate abnormal converter states without drifting into topology discussions.
  • Thermal evidence: hotspot-to-ambient ΔT and runaway signatures (dT/dt) distinguish overload and thermal coupling failures from benign ambient changes. Rate-of-rise is typically a stronger indicator than absolute temperature alone.
  • Bypass/transfer actuator truth: relay coil drive evidence, contact feedback (CF), MOSFET gate status, and Vdrop/Vds across the element validate “commanded state vs actual conduction,” enabling detection of welded-on, stuck-off, and half-on states.

Evidence rule: do not trigger bypass/transfer from a single sensor. Require a minimal evidence set (typically 2–3 signals) per fault type, and treat context-only signals (PG/BusSag) as gating evidence to avoid brownout-driven false actions.

Fault types (defined by minimal evidence sets)

  • Open LED string: ILED drops or becomes discontinuous and Vstring/headroom rises toward limit; dropout events increase.
  • Short / bypassed segment: Vstring collapses and ILED deviates (over/limited); context flags may indicate stress.
  • Current drift: ILED offset persists with supportive thermal/voltage evidence; distinguish aging vs transient conditions.
  • Intermittent connector: bursty dropout events with correlated Vstring jumps; avoid confusing with intended PWM dimming modes.
  • Thermal overload: ΔT or dT/dt indicates runaway; current limiting may appear as a secondary signature.
  • Actuator stuck/welded: commanded OFF yet Vdrop/current indicates conduction; CF mismatches command.

[Figure F5 diagram: Fault Signature Matrix (Evidence Map). Rows: open string, short, current drift, intermittent, thermal overload, stuck / welded. Columns: I, Vstring, Dropout, PG, BusSag, T, CF, Vdrop. Each cell is marked as required change, possible change, or not relied upon.]
Figure F5 — Compact fault signature matrix: map fault classes to minimal evidence signals to reduce false trips and improve diagnosability.
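
The minimal evidence sets listed above translate naturally into a lookup from fault class to required signals, with the context flags acting as a gate. The sketch below covers only four of the fault classes; the flag names are hypothetical booleans assumed to come from upstream threshold checks, and the gating policy (classify nothing while PG is lost or the bus is sagging) is one possible reading of the evidence rule, not the only one.

```python
# Minimal evidence sets per fault class (subset of the list in the text).
# Flag names are hypothetical outputs of upstream threshold checks.
REQUIRED_EVIDENCE = {
    "OPEN_STRING": {"i_led_drop", "v_string_rise"},
    "SHORT_SEGMENT": {"v_string_collapse", "i_led_deviation"},
    "INTERMITTENT": {"dropout_burst", "v_string_jump"},
    "ACTUATOR_STUCK": {"cmd_off_but_conducting", "cf_mismatch"},
}


def classify(flags: set, pg_ok: bool, bus_sag: bool) -> list:
    """Return the fault classes whose complete minimal evidence set is present.

    Context gating: while supply-good is lost or the bus is sagging,
    no fault class is reported, so brownout cannot drive an action.
    """
    if not pg_ok or bus_sag:
        return []
    return sorted(name for name, need in REQUIRED_EVIDENCE.items()
                  if need <= flags)


# A single signal never classifies a fault on its own.
print(classify({"i_led_drop"}, pg_ok=True, bus_sag=False))
print(classify({"i_led_drop", "v_string_rise"}, pg_ok=True, bus_sag=False))
```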

Voting Logic & Redundancy Patterns: 1oo2, 2oo2, 2oo3 (Practical View)

Voting logic is valuable only when it is framed in engineering terms: what it saves (missed faults, unsafe continuation) versus what it costs (false trips, hardware complexity, validation burden). Voting should operate on independent evidence channels rather than on duplicated signals that share the same failure causes.

1oo2 (one-out-of-two)

High availability: a single channel can trigger bypass/transfer. Cost: higher false-trip risk if one evidence chain is noisy or drifting. Requires strict retry limits and strong audit logging.

2oo2 (two-out-of-two)

High safety: action occurs only when both channels agree. Cost: reduced availability and slower response when one chain is degraded. Suits cases where false actions are more harmful than brief loss of availability.

2oo3 (two-out-of-three)

Balanced behavior: tolerates one bad chain while avoiding single-chain false trips. Cost: more sensors, logic, and validation effort. Best when both availability and safety are important and the system can afford complexity.

Handling disagreement (engineer-friendly policy)

  • Enter degraded mode: limit actions (e.g., reduce current, lock out repeated transfers) while collecting more evidence.
  • Increase sampling confidence: extend observation window, raise sampling rate, or require repeated consistent signatures before action.
  • Request service / proof-test: when evidence remains inconsistent, record a service-needed event and schedule a proof test of the actuator.
  • Escalate to fail-safe OFF: when hazard is high or evidence cannot be trusted, prioritize safe state over availability.

Independence rule: voting inputs should avoid common-mode failures (shared ADC reference, shared ground path, shared firmware bug). Prefer mixed-domain evidence (e.g., I + Vdrop + CF) over duplicated measurements with shared error sources.

[Figure F6 diagram: Voting With Independent Evidence Channels. Three channels, each with its own reference, ground, and firmware path: Ch1 (I, V, PG; Ref A, GND A, FW A), Ch2 (Vdrop, CF, T; Ref B, GND B, FW B), Ch3 (BusSag, Dropout; Ref C, GND C, FW C). They feed a voting block (1oo2 / 2oo2 / 2oo3) with disagreement handling (degraded → increased sampling → service); the decision output drives the state machine (bypass / transfer). Common-mode pitfalls to avoid: shared ADC reference, shared ground, shared firmware path.]
Figure F6 — Practical voting block: independent evidence channels feed a voting decision, with common-mode pitfalls explicitly called out.
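
The three voting patterns and the disagreement condition can be expressed in a few lines. This is a minimal sketch; the function names and the `MODES` table are illustrative, and each entry of `trips` is assumed to be the already-debounced verdict of one independent evidence channel.

```python
# (votes needed, number of channels) for each pattern named in the text.
MODES = {"1oo2": (1, 2), "2oo2": (2, 2), "2oo3": (2, 3)}


def vote(trips, mode):
    """Return True when the voting rule calls for action.

    trips: one boolean per independent evidence channel
           (True means that channel reports a fault).
    """
    need, total = MODES[mode]
    if len(trips) != total:
        raise ValueError(f"{mode} expects {total} channels, got {len(trips)}")
    return sum(1 for t in trips if t) >= need


def disagree(trips):
    """True when channels are split, i.e. neither all tripped nor all clear."""
    count = sum(1 for t in trips if t)
    return 0 < count < len(trips)


print(vote([True, False], "1oo2"))         # one channel is enough
print(vote([True, False], "2oo2"))         # both must agree
print(vote([True, False, True], "2oo3"))   # tolerates one dissenting chain
```

A caller would typically log VOTE_DISAGREE whenever `disagree()` is true, regardless of whether the vote itself results in an action.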

Switchover State Machine: Debounce, Transfer, Re-try, Latch Policies

Redundancy becomes stable only when it is governed by an explicit state machine. Without timers, rate limits, and verification, “smart switchover” can degrade into repeated transfers that look like random flicker. The state machine below separates transient filtering from confirmed faults, enforces cooldown to prevent oscillation, and records audit logs at each decision point.

Key parameters: Tconfirm (debounce), Tretry (backoff), Tholdoff (cooldown), rate limit (N/hour), verify window, latch vs auto-recover.

State intent (why each state exists)

  • Normal: establish baseline statistics (dropout counts, temperature trend, bus margin) and prevent unnecessary actions.
  • Suspect: apply debounce to exclude brief transients; require minimal multi-signal evidence before escalation.
  • Confirmed fault: commit a fault class decision using evidence sets; decide whether transfer/bypass is permitted for the hazard class.
  • Transfer / bypass: execute the action under rate limits and safe timing rules; immediately transition into verification.
  • Verify: confirm that the new path carries the load and the old path is truly isolated; detect actuator failure modes early.
  • Degraded run: stabilize operation when evidence is inconsistent or capacity is reduced; restrict further transfers and collect more data.
  • Service required: request proof-test/maintenance when repeated attempts, rate limits, or unsafe ambiguity is reached.

Key mechanisms (rules that prevent flicker-like oscillation)

  • Debounce windows: Suspect must persist for Tconfirm with consistent evidence (e.g., I+V or dropout counts) before confirming.
  • Rate limits: enforce max transfers per hour and a minimum dwell time on each path to protect actuators and reduce visible disturbance.
  • Cooldown / thermal settle: after transfer, hold actions for Tholdoff to allow electrical and thermal stabilization before any re-try.
  • Latch vs auto-recover: lock out automatic recovery when hazard class requires deterministic behavior; allow auto-recover only with bounded retries and logging.

Audit requirement: log on entry to Suspect, on fault confirmation, on every transfer attempt, and on verify outcome. Include evidence snapshots (I/V/T/CF/Vdrop + context flags) so that false positives and actuator faults can be proven after the fact.

[Figure F7 diagram: Switchover State Machine (Stability First). States Normal, Suspect, Confirmed fault, Transfer, and Verify (each with a LOG point), plus Degraded run and Service required. Annotations: Tconfirm, Tholdoff, Tretry, rate limit N/hour, latch / auto-recover.]
Figure F7 — State diagram with Tconfirm (debounce), Tretry (backoff), Tholdoff (cooldown), rate limits, and explicit logging points to keep redundancy stable and auditable.
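
The interaction of debounce, rate limit, and holdoff is easier to review as code. The sketch below is a deliberately reduced model: time is passed in by the caller, evidence is a single boolean assumed to come from the voting stage, recovery and Tretry are omitted, and the default timer values are placeholders rather than recommendations.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    SUSPECT = auto()
    CONFIRMED = auto()
    TRANSFER = auto()
    VERIFY = auto()
    DEGRADED = auto()
    SERVICE = auto()


class Switchover:
    def __init__(self, t_confirm=0.05, t_holdoff=2.0, max_per_hour=3):
        self.t_confirm = t_confirm        # debounce window, seconds
        self.t_holdoff = t_holdoff        # settle time before judging Verify
        self.max_per_hour = max_per_hour  # transfer rate limit
        self.state = State.NORMAL
        self.since = 0.0                  # time the current state was entered
        self.transfers = []               # timestamps of executed transfers
        self.log = []                     # (time, from_state, to_state)

    def _go(self, now, new):
        self.log.append((now, self.state.name, new.name))
        self.state, self.since = new, now

    def step(self, now, fault_evidence, verify_ok=False):
        s = self.state
        if s is State.NORMAL and fault_evidence:
            self._go(now, State.SUSPECT)
        elif s is State.SUSPECT:
            if not fault_evidence:
                self._go(now, State.NORMAL)        # transient filtered out
            elif now - self.since >= self.t_confirm:
                self._go(now, State.CONFIRMED)
        elif s is State.CONFIRMED:
            recent = [t for t in self.transfers if now - t < 3600.0]
            if len(recent) >= self.max_per_hour:
                self._go(now, State.SERVICE)       # rate limit reached
            else:
                self.transfers = recent + [now]
                self._go(now, State.TRANSFER)
        elif s is State.TRANSFER:
            self._go(now, State.VERIFY)            # command issued, now verify
        elif s is State.VERIFY and now - self.since >= self.t_holdoff:
            self._go(now, State.DEGRADED if verify_ok else State.SERVICE)
        return self.state
```

Every transition passes through `_go`, so the log captures each decision point in order; a fuller implementation would attach the evidence snapshot and threshold set to each entry.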

Diagnostics Coverage: Detecting Welded Contacts, Stuck MOSFETs, and False Positives

Redundancy is auditable only when diagnostic coverage is explicit: welded contacts, stuck switches, and false positives must be detectable using repeatable tests and multi-signal correlation. Coverage improves when tests are executed inside a defined safe window (stable supply, not during transfer, energy-limited) and when measurements compare commanded states against measured conduction evidence.

Relay weld detection

When commanded OPEN, verify isolation by measuring Vdrop across contacts (and/or an energy-limited test stimulus where safe). A mismatch between command and conduction evidence indicates welded or stuck behavior.

MOSFET stuck-on detection

When commanded OFF, compare gate status against Vds and path current evidence. OFF command with low Vds or sustained current indicates stuck-on or shorted conduction.

Stuck-off detection

When commanded ON, verify that current rises and Vds/Vdrop falls. If no current flows, check the context signals (PG/BusSag) and cross-check the alternate path before blaming the actuator, so that a supply collapse is not mistaken for a stuck-off failure.

False-positive suppression

Correlate evidence across domains (I + V + T, or I + CF + Vdrop) instead of single thresholds. Use context flags as gating signals to prevent brownout-driven false trips.

Built-in self-test (BIST) principles (safety-first)

  • Safe window: run BIST only when supply-good is stable, not during transfer, and with an energy-limited stimulus.
  • Expected reading: define what must change (and what must not) for each test so that coverage is verifiable.
  • Independence: do not rely on a single measurement chain; compare command vs independent conduction evidence.

Coverage statement: welded-on / stuck-on / stuck-off can be proven when BIST stimuli are available and command states are compared against Vdrop/Vds and path-current evidence under stable context (PG/BusSag gated).

[Figure F8 diagram: BIST Injection Points (Diagnostics Coverage). Safe window: PG stable, not during transfer, energy limited. Path A and Path B actuators (relay / MOSFET) under test feed the ORing / transfer element and the LED load. The CTRL + BIST engine injects Itest, gate toggle, and Vprobe. Expected (OPEN): Vdrop high, I ≈ 0. Expected (ON): Vds low, I rises. Weld/stuck proof: command ≠ evidence.]
Figure F8 — BIST injection points and expected readings: prove welded/stuck conditions by comparing commanded states against conduction evidence (Vdrop/Vds and path current) under a safe window.
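
The command-versus-evidence comparison described in this section reduces to a small decision table. The following sketch assumes two scalar measurements (voltage across the actuator and path current) plus a supply-good flag; the function name, result labels, and threshold defaults are illustrative placeholders only.

```python
def actuator_diagnosis(cmd_on, vdrop_v, i_path_a, pg_ok,
                       v_on_max=0.2, i_min=0.05):
    """Compare the commanded state with measured conduction evidence.

    v_on_max and i_min are placeholder thresholds, not recommended values.
    """
    if not pg_ok:
        return "INCONCLUSIVE"            # outside the safe window
    conducting = i_path_a >= i_min
    low_drop = vdrop_v <= v_on_max
    if cmd_on:
        if conducting and low_drop:
            return "OK_ON"
        if not conducting and not low_drop:
            return "STUCK_OFF"
        if conducting and not low_drop:
            return "HALF_ON"             # carries current with excess drop
        return "AMBIGUOUS"               # low drop but no current (no load?)
    # Commanded OFF: any sign of conduction contradicts the command.
    if conducting or low_drop:
        return "STUCK_ON_OR_WELDED"
    return "OK_OFF"


print(actuator_diagnosis(cmd_on=False, vdrop_v=0.05, i_path_a=0.4, pg_ok=True))
```

Returning an explicit INCONCLUSIVE result when the context gate fails keeps a brownout from being recorded as an actuator fault.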

Audit Logs & Evidence: What to Record So Failures Can Be Proven

Bypass and redundancy become “auditable” only when event logs preserve a complete evidence chain: what was measured, how the decision was made, and what action was executed. An evidentiary log is not a narrative; it is a set of records that can survive field disputes by showing the exact evidence snapshot, thresholds, firmware identity, and sequence ordering.

Event types: FAULT_DETECTED, VOTE_DISAGREE, BYPASS_COMMAND, BYPASS_CONFIRMED, RECOVERY_ATTEMPT, LATCHED_OFF, PROOF_TEST_PASS, PROOF_TEST_FAIL.

Fields per event (what makes the record evidentiary)

  • Identity & ordering: event_type, channel_id, record_id, monotonic_ctr (no rollback), and record_crc/commit marker.
  • Time provenance: timestamp plus timestamp source (RTC / network / relative). If time is uncertain, the source must say so.
  • Decision reproducibility: reason_code, vote_mode, state_from→state_to, and the exact threshold set used.
  • Evidence snapshot: raw or reduced measurements (I/V/T/CF/Vdrop + context flags such as PG/BusSag) captured at the decision point.
  • Software traceability: firmware version plus config hash (or equivalent) so the active policy cannot be disputed later.

Retention concept: use a ring buffer with wear leveling. Preserve critical events (LATCHED_OFF, PROOF_TEST_FAIL, repeated BYPASS_CONFIRMED) longer than routine health summaries. Include a commit marker so partial writes after brownout are detectable.
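
One way to read the retention concept is "when the store is full, evict the oldest routine record before touching a critical one." The sketch below models only that eviction policy in memory; wear leveling and commit markers are properties of the storage layer and are not represented, the class name is hypothetical, and the "repeated BYPASS_CONFIRMED" case is left out for brevity.

```python
# Event types that should outlive routine records (subset of the text's list).
CRITICAL = {"LATCHED_OFF", "PROOF_TEST_FAIL"}


class EventStore:
    """In-memory model of a bounded log with priority retention."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []                 # oldest first

    def append(self, rec):
        if len(self.records) >= self.capacity:
            # Evict the oldest routine record; if every stored record is
            # critical, fall back to evicting the oldest record overall.
            for i, old in enumerate(self.records):
                if old["event_type"] not in CRITICAL:
                    del self.records[i]
                    break
            else:
                self.records.pop(0)
        self.records.append(rec)


store = EventStore(capacity=2)
store.append({"event_type": "LATCHED_OFF"})
store.append({"event_type": "FAULT_DETECTED"})
store.append({"event_type": "BYPASS_COMMAND"})
print([r["event_type"] for r in store.records])
```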

Copy-ready mini-template (compact event record schema)

event_type: BYPASS_CONFIRMED
ts: 2026-02-12T00:00:00Z
ts_src: RTC
monotonic_ctr: 0001234567
channel_id: PATH_A
state: TRANSFER->VERIFY
reason_code: OPEN_STRING
vote_mode: 2oo3
snapshot: {I_led_avg, V_string, dropout_cnt, PG, BusSag, T_hot, CF, V_drop}
thr_set: {Tconfirm, Tretry, Tholdoff, I_min, V_max, T_max, N_drop}
fw_version: vX.Y.Z
config_hash: 0x________
record_crc: 0x________

Minimum evidence set (by event type)

  • FAULT_DETECTED: I/V/dropout + PG/BusSag + thr_set + state
  • VOTE_DISAGREE: per-channel inputs summary + vote_mode + reason_code
  • BYPASS_COMMAND: command parameters + rate-limit counters + state
  • BYPASS_CONFIRMED: CF + Vdrop/Vds + path-current evidence + verify window result
  • LATCHED_OFF: hazard class + retry counters + final evidence snapshot
  • PROOF_TEST_PASS/FAIL: stimulus_id + expected reading + measured summary

[Figure F9 diagram: Evidentiary Event Log Pipeline (Snapshot → Decision → NVM record → Service export). Sensor snapshot (I / V / dropout, T / dT/dt, CF / Vdrop, PG / BusSag) → evidence reducer (averages, flags, counters, snapshot_id) → vote (1oo2 / 2oo2 / 2oo3) → state machine (transfer / verify / latch) → event builder (type / reason, thr_set / vote, fw / hash / ctr) → NVM ring buffer (wear leveling, commit marker, record_crc, retention) → service export package (field report, maintenance tool).]
Figure F9 — Event log pipeline: sensor snapshot and context gating feed vote/state decisions, which build evidentiary records into an NVM ring buffer for service export.
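
The role of `record_crc` and the commit marker is to make a partially written record distinguishable from a valid one. The sketch below shows one possible framing (length, payload, CRC-32, marker byte written last). The layout, the marker value, and the use of JSON for the payload are assumptions chosen for readability; firmware would more likely use a packed binary format.

```python
import json
import zlib

COMMIT = b"\xA5"   # hypothetical one-byte commit marker, written last


def encode_record(fields: dict) -> bytes:
    """Frame a record as: 2-byte length | payload | 4-byte CRC-32 | marker."""
    payload = json.dumps(fields, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    crc = zlib.crc32(payload)
    return (len(payload).to_bytes(2, "big") + payload
            + crc.to_bytes(4, "big") + COMMIT)


def decode_record(blob: bytes):
    """Return the field dict, or None if the record is partial or corrupt."""
    if len(blob) < 7 or blob[-1:] != COMMIT:
        return None                       # marker missing: write not finished
    n = int.from_bytes(blob[:2], "big")
    if len(blob) != 2 + n + 4 + 1:
        return None                       # length does not match the frame
    payload = blob[2:2 + n]
    crc = int.from_bytes(blob[2 + n:6 + n], "big")
    if zlib.crc32(payload) != crc:
        return None                       # payload damaged
    return json.loads(payload)


rec = {"event_type": "BYPASS_CONFIRMED", "monotonic_ctr": 1234567,
       "channel_id": "PATH_A", "vote_mode": "2oo3"}
blob = encode_record(rec)
print(decode_record(blob) == rec)        # intact record round-trips
print(decode_record(blob[:-1]))          # power lost before the marker
```

Writing the marker as the final byte means a brownout at any earlier point leaves a record that fails validation instead of one that looks plausible.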

Hardware Implementation Notes: Relay Drive, Isolation Boundaries, and Noise Immunity

The implementation details that matter most for bypass and redundancy are the ones that directly change switchover stability, verification reliability, and diagnostic evidence quality. This section focuses on coil-drive behavior, brownout chatter risks, isolation-aware feedback design, and measurement integrity around high di/dt bypass paths.

Relay coil drive: clamp choice changes release time and noise

  • Diode clamp: lower EMI but slower release; increases transfer overlap risk and extends verification windows.
  • Zener/TVS clamp: faster release but higher dv/dt; can inject noise into feedback and measurement paths.
  • RC clamp: balanced behavior but parameter-sensitive; requires validation across tolerance and temperature.

Coil brownout behavior: prevent chatter

  • Chatter risk: brownout can place coil current near the pickup/hold boundary, causing repeated contact toggling.
  • Mitigation concept: apply UVLO-like gating for coil drive, minimum on/off dwell time, and confirm state using CF/Vdrop evidence.
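
The UVLO-like gating and minimum dwell time mentioned above can be combined in one small gate in front of the coil driver. This is a sketch under stated assumptions: the class name is hypothetical, the voltage thresholds and dwell time are placeholders, and the separate confirmation of contact state through CF/Vdrop is not shown.

```python
class CoilGate:
    """Hysteresis on coil supply voltage plus a minimum dwell between changes."""

    def __init__(self, v_on=10.0, v_off=8.5, min_dwell_s=0.5):
        self.v_on = v_on                  # enable at or above this voltage
        self.v_off = v_off                # disable below this voltage
        self.min_dwell_s = min_dwell_s    # minimum time between state changes
        self.enabled = False
        self.last_change = None

    def update(self, now, v_coil_supply):
        want = self.enabled
        if self.enabled and v_coil_supply < self.v_off:
            want = False
        elif not self.enabled and v_coil_supply >= self.v_on:
            want = True
        dwell_ok = (self.last_change is None
                    or now - self.last_change >= self.min_dwell_s)
        if want != self.enabled and dwell_ok:
            self.enabled = want
            self.last_change = now
        return self.enabled


gate = CoilGate()
for t, v in [(0.0, 9.0), (0.1, 10.2), (0.2, 8.0), (0.7, 8.0), (0.8, 9.5)]:
    print(t, v, gate.update(t, v))
```

The gap between `v_on` and `v_off` keeps a supply hovering near one threshold from toggling the drive, and the dwell time bounds how often the contacts can be asked to move.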

Isolation boundaries: keep cross-domain feedback simple and robust

  • Crossing signals: CF/state feedback crossing isolation should be low-complexity and noise-robust.
  • Evidence integrity: loss/stuck conditions on feedback signals should map to explicit reason codes and log events.

High di/dt bypass paths: measure for evidence, not just precision

  • Kelvin sense: Vdrop/Vds evidence should be sensed with Kelvin routing to avoid inductive and ground-bounce corruption.
  • Sampling windows: avoid measuring during transfer edges; align measurement windows with Verify/Tholdoff policy.

[Figure F10 diagram: Relay Coil Drive Options (Release vs Noise). Three coil + switch circuits. A) Diode clamp: release time slow, noise injection low. B) Zener/TVS clamp: release time fast, noise injection higher. C) RC clamp: release time balanced, noise injection medium. Release time feeds the Verify and Tholdoff timer settings; noise injection can corrupt CF/Vdrop evidence if not bounded.]
Figure F10 — Coil clamp options (diode vs Zener/TVS vs RC): release time and noise tradeoffs directly affect switchover timing, verification reliability, and diagnostic evidence quality.
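
Why a higher clamp voltage shortens release time can be seen from a first-order coil model. Treating the coil as a fixed inductance L in series with resistance R, and the clamp as a constant voltage drop Vc, the current after turn-off follows L·di/dt = −(R·i + Vc), which gives the time to fall from I0 to the release current as t = (L/R)·ln((R·I0 + Vc)/(R·Irel + Vc)). The sketch below evaluates that expression; it is an idealized estimate that ignores armature motion, inductance change, and mechanical release time, does not model the RC clamp, and uses made-up coil values.

```python
import math


def coil_current_decay_s(l_h, r_ohm, v_supply, i_release_a, v_clamp):
    """Time for coil current to decay from V/R to i_release_a.

    Ideal series R-L coil with a constant-voltage clamp; electrical
    decay only, mechanical release time is not included.
    """
    i0 = v_supply / r_ohm
    if i_release_a >= i0:
        return 0.0
    tau = l_h / r_ohm
    return tau * math.log((r_ohm * i0 + v_clamp)
                          / (r_ohm * i_release_a + v_clamp))


# Illustrative coil values (assumptions, not taken from any datasheet).
coil = dict(l_h=0.5, r_ohm=500.0, v_supply=12.0, i_release_a=0.005)
t_diode = coil_current_decay_s(v_clamp=0.7, **coil)    # plain diode clamp
t_zener = coil_current_decay_s(v_clamp=24.7, **coil)   # diode + 24 V Zener
print(f"diode clamp: {t_diode * 1e3:.2f} ms")
print(f"zener clamp: {t_zener * 1e3:.2f} ms")
```

The same expression shows the cost: the faster clamp forces a larger voltage across the switch at turn-off, which is the source of the higher dv/dt noted above.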

Verification & Proof Testing: How You Validate It Won’t Fail Silently

Redundancy that “works in the lab” can still fail silently in the field when actuators weld, MOSFETs stick, logs corrupt during brownout, or surge/ESD causes false bypass/latch events. A verification plan must therefore test behavior (state transitions), evidence (I/V/T/CF/Vdrop consistency), and auditability (event sequence + monotonic counters) as a single system.

Test areas: functional faults, transfer endurance, false-bypass / false-latch, log integrity, service proof-test.

Test matrix (what to test, what evidence must appear, what to log)

1) Functional — forced open/short

  • Stimulus: emulate open LED string / short / drift.
  • Expected behavior: Normal → Suspect → Confirmed → Transfer/Bypass → Verify → Degraded or Service-required (policy-dependent).
  • Expected evidence: ILED dropouts + Vstring/headroom change; actuator evidence (CF + Vdrop/Vds) confirms the action.
  • Required logs: FAULT_DETECTED → (optional VOTE_DISAGREE) → BYPASS_COMMAND → BYPASS_CONFIRMED → (RECOVERY_ATTEMPT or LATCHED_OFF).

2) Functional — thermal ramp

  • Stimulus: controlled temperature rise to trip thermal policy.
  • Expected behavior: enter Degraded run or latch-off only with sufficient thermal evidence and context gating.
  • Expected evidence: T_hot / ΔT / dT/dt trends cross configured thresholds without contradictory PG/BusSag context.
  • Required logs: FAULT_DETECTED (THERMAL_*) with thr_set + snapshot; state transition fields must show policy execution.

3) Functional — intermittent connector

  • Stimulus: intermittent open events (bursty dropouts) to stress debounce and rate limit.
  • Expected behavior: Suspect filtering blocks oscillation; transfers are bounded by Tconfirm + max transfers/hour.
  • Expected evidence: dropout_cnt spikes, but transfers occur only after confirmation; Verify windows are respected.
  • Required logs: repeated FAULT_DETECTED is acceptable; repeated BYPASS_COMMAND without confirmation is a failure.

4) Transfer robustness — endurance + policy stability

  • Stimulus: repeated switchover cycles and max transfers/hour saturation.
  • Expected behavior: commands execute; Verify confirms; rate limit triggers Service-required or latch policy when exceeded.
  • Expected evidence: CF and Vdrop/Vds remain consistent after N cycles; no increasing mismatch rate over time.
  • Required logs: BYPASS_COMMAND / BYPASS_CONFIRMED sequences with retry counters and rate-limit counters captured.

5) Surge/ESD (mis-trigger focus only)

  • Stimulus: surge/ESD exposures relevant to false events.
  • Expected behavior: no “action without evidence.” If bypass/latch occurs, it must be preceded by valid evidence snapshots.
  • Expected evidence: BYPASS_COMMAND and LATCHED_OFF are allowed only when FAULT_DETECTED meets the minimum evidence set.
  • Required logs: preserve the evidence snapshot immediately preceding any action-class event.

6) Log integrity — power loss and monotonic continuity

  • Stimulus: remove power during NVM write and during export packaging.
  • Expected behavior: records are either fully committed (CRC/commit marker) or explicitly marked invalid; never ambiguous “half records.”
  • Expected evidence: monotonic_ctr never rolls back across reboot; discontinuities are detectable and reportable.
  • Required logs: record_crc + commit_marker fields; boot-time continuity check event (concept-level) if implemented.

Acceptance rule: any bypass/latch action must be provably linked to a prior evidentiary FAULT_DETECTED record (snapshot + thr_set + policy ID). “Action without evidence” is classified as false-bypass/false-latch.
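
The acceptance rule and the monotonic-counter check lend themselves to an automated pass over an exported log. The sketch below is a simplified checker: it assumes each record is a dict carrying `monotonic_ctr`, `event_type`, and `channel_id`, treats BYPASS_COMMAND and LATCHED_OFF as the action-class events, and accepts any earlier FAULT_DETECTED on the same channel as prior evidence without checking snapshot contents or recency.

```python
ACTION_EVENTS = {"BYPASS_COMMAND", "LATCHED_OFF"}


def audit(records):
    """Return a list of findings; an empty list means the log passes."""
    findings = []
    last_ctr = None
    evidenced = set()    # channels that have a prior FAULT_DETECTED record
    for r in records:
        ctr = r["monotonic_ctr"]
        if last_ctr is not None:
            if ctr <= last_ctr:
                findings.append(f"counter rollback at {ctr}")
            elif ctr != last_ctr + 1:
                findings.append(f"counter gap before {ctr}")
        last_ctr = ctr
        if r["event_type"] == "FAULT_DETECTED":
            evidenced.add(r["channel_id"])
        elif (r["event_type"] in ACTION_EVENTS
              and r["channel_id"] not in evidenced):
            findings.append(
                f"action without evidence: {r['event_type']} at {ctr}")
    return findings


good = [
    {"monotonic_ctr": 1, "event_type": "FAULT_DETECTED", "channel_id": "PATH_A"},
    {"monotonic_ctr": 2, "event_type": "BYPASS_COMMAND", "channel_id": "PATH_A"},
]
bad = [
    {"monotonic_ctr": 5, "event_type": "BYPASS_COMMAND", "channel_id": "PATH_B"},
    {"monotonic_ctr": 4, "event_type": "LATCHED_OFF", "channel_id": "PATH_B"},
]
print(audit(good))
print(audit(bad))
```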

Proof-test procedure (service can validate actuator + sensors with bounded impact)

  • Choose a safe window: PG stable, transfer disabled, energy-limited stimulus enabled.
  • Freeze policies: lock rate-limit counters for the test window; prevent state machine from reacting to test stimuli as real faults.
  • Inject stimulus: small Itest/Vprobe or controlled gate toggle (implementation-dependent) to verify commanded vs measured conduction.
  • Observe evidence: CF/Vdrop/Vds and path-current summary must match the expected result for OPEN/ON.
  • Log result: PROOF_TEST_PASS/FAIL with stimulus_id, expected reading, measured summary, and monotonic_ctr.
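
The observe, decide, and log steps above can be sketched as a single comparison of the commanded state against the measured conduction evidence. The function name, the limit names, and all numeric thresholds below are hypothetical; real limits depend on the actuator and the stimulus level.

```python
def proof_test_record(stimulus_id, commanded, vdrop_v, i_path_a, ctr, lim):
    """Compare commanded state with measured evidence; return a log record."""
    if commanded == "ON":
        # A conducting element shows low drop and carries the test current.
        ok = vdrop_v <= lim["vdrop_on_max_v"] and i_path_a >= lim["i_on_min_a"]
    elif commanded == "OPEN":
        # An open element blocks the test current and supports a voltage.
        ok = vdrop_v >= lim["vdrop_open_min_v"] and i_path_a <= lim["i_open_max_a"]
    else:
        raise ValueError("commanded must be 'ON' or 'OPEN'")
    return {
        "type": "PROOF_TEST_PASS" if ok else "PROOF_TEST_FAIL",
        "stimulus_id": stimulus_id,
        "expected": commanded,
        "measured": {"vdrop_v": vdrop_v, "i_path_a": i_path_a},
        "monotonic_ctr": ctr,
    }


# Assumed limits for a small test current; not taken from any datasheet.
limits = {"vdrop_on_max_v": 0.15, "i_on_min_a": 0.005,
          "vdrop_open_min_v": 1.0, "i_open_max_a": 0.001}

healthy_on = proof_test_record("ITEST_1", "ON", 0.05, 0.010, 101, limits)
welded_open = proof_test_record("ITEST_1", "OPEN", 0.03, 0.010, 102, limits)
print(healthy_on["type"], welded_open["type"])
```

The second case shows the value of testing the OPEN command: a welded contact still conducts the test current with a low drop, so the record is logged as PROOF_TEST_FAIL even though normal operation would look healthy.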

Example part numbers (MPN) commonly used to build and validate this plan

The verification strategy above maps to concrete hardware building blocks for actuation, protection, logging, time, and supervision. The list below provides example MPNs engineers often use as reference points in prototypes and verification fixtures.

Relay (signal/power examples): Omron G5Q series; Panasonic TQ2 series
Hot-swap / eFuse (telemetry-capable examples): TI TPS25940; TI TPS25982
Surge / TVS (rail clamp examples): Littelfuse SMBJ series; Vishay SMBJ series
Precision supervisor / reset (monotonic/log integrity helper): TI TPS3839; Analog Devices ADM809/ADM810 family
FRAM for robust event storage (example): Fujitsu MB85RS64V (SPI FRAM)
Real-time clock (timestamp source example): Maxim/ADI DS3231 (TCXO RTC)
Digital isolator for feedback crossing (example): TI ISO7721; Analog Devices ADuM1250 (I²C isolator class)

[Figure F11 diagram: Proof-Test Flow (Service-Executable). Steps: 1) Safe window (PG stable); 2) Freeze policy (no transfer); 3) Stimulus (Itest / Vprobe); 4) Observe (CF / Vdrop); 5) Decide (PASS / FAIL); 6) Log result (PROOF_TEST_*). Expected evidence outputs: snapshot (I/V/T + PG/BusSag); evidence (CF + Vdrop/Vds); policy (thr_set + monotonic_ctr); log (PROOF_TEST_PASS/FAIL + stimulus_id); export (optional service package for audit).]
Figure F11 — Proof-test flow: a safe window and frozen policy allow bounded stimuli; expected evidence outputs (snapshot + actuator conduction proof + monotonic continuity) are logged as PROOF_TEST_PASS/FAIL.

FAQs (Bypass / Redundant Channel)

Each answer is structured as a one-sentence conclusion, two evidence checks, and one first fix. Every question maps back to H2-2…H2-11 to avoid scope creep.

1. Auto-bypass triggers during surge but the channel is actually fine: thresholds or common-mode pickup?

Conclusion: If bypass actions occur without a valid pre-event evidence snapshot, it is almost always a mis-trigger (threshold/context gating or common-mode pickup), not a real channel fault.

  • Evidence check A: In the log sequence, verify FAULT_DETECTED exists immediately before BYPASS_COMMAND, and that its snapshot includes I/V/dropout changes plus context (PG/BusSag) consistent with a real fault. (H2-5/H2-11)
  • Evidence check B: Compare surge timestamps with feedback integrity: do CF/Vdrop readings spike or glitch during the surge window, indicating common-mode injection into sensing/feedback routes? (H2-10)
  • First fix: Tighten “action requires evidence” gating: require 2-signal correlation (I + V) and a minimum debounce window before allowing bypass; add a rail clamp (e.g., a TVS in the SMBJ family), validate mis-trigger immunity, and ensure supervisor reset timing is stable (e.g., TPS3839). (H2-5/H2-10/H2-11)
Maps to: H2-5 / H2-10 / H2-11
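
The 2-signal correlation and debounce gate from the first fix can be sketched as follows. The fault signature used here (string current collapses while string voltage rises, with PG valid) is an assumed open-string signature for a constant-current driver, and the function name and thresholds are hypothetical.

```python
def bypass_allowed(samples, i_min_a, v_max_v, debounce_n):
    """samples: (i_led_a, v_string_v, pg_ok) tuples in time order.

    Bypass is allowed only after debounce_n consecutive samples in which
    both signals agree on a fault while the power-good context is valid.
    """
    run = 0
    for i_led, v_str, pg_ok in samples:
        fault_like = pg_ok and i_led < i_min_a and v_str > v_max_v
        run = run + 1 if fault_like else 0
        if run >= debounce_n:
            return True
    return False


normal = (1.0, 40.0, True)
faulted = (0.0, 55.0, True)

surge_glitch = [normal, faulted, normal]       # single-sample disturbance
real_open = [normal, faulted, faulted, faulted]
bad_context = [(0.0, 55.0, False)] * 5         # PG invalid, evidence ignored

print(bypass_allowed(surge_glitch, 0.1, 50.0, 3),
      bypass_allowed(real_open, 0.1, 50.0, 3),
      bypass_allowed(bad_context, 0.1, 50.0, 3))
```

A one-sample glitch and any reading taken without valid PG context never accumulate enough agreement, so only the persistent, correlated case passes the gate.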

2. Relay chatters at low line: coil brownout or debounce window too short?

Conclusion: Relay chatter at low line is usually coil brownout behavior amplified by insufficient minimum dwell/holdoff timing, not “random” firmware.

  • Evidence check A: Correlate low-line events with coil drive supply and chatter frequency: do repeated state transitions occur without stable CF confirmation? (H2-7)
  • Evidence check B: Inspect coil suppression and release time: slow release can stretch transfer windows and cause repeated re-entries into Suspect/Transfer if verification is time-misaligned. (H2-10)
  • First fix: Add UVLO-like gating for the coil drive and enforce minimum on/off dwell; choose a clamp strategy consistent with the state machine timing and validate with a supervisor (e.g., TPS3839) plus a relay family suited for switching duty (e.g., Omron G5Q series). (H2-7/H2-10)
Maps to: H2-7 / H2-10
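
UVLO-like gating with a minimum dwell can be sketched as a small state holder. The class name `CoilGate` and the voltage and timing values are assumptions for illustration. Note that in this sketch the dwell also delays de-energizing; whether that is acceptable is a design decision.

```python
class CoilGate:
    """Hysteretic supply gate plus minimum on/off dwell (hypothetical)."""

    def __init__(self, v_on, v_off, min_dwell_s):
        if v_on <= v_off:
            raise ValueError("v_on must exceed v_off to give hysteresis")
        self.v_on, self.v_off, self.min_dwell_s = v_on, v_off, min_dwell_s
        self.energized = False
        self._supply_ok = False
        self._last_change_s = None

    def update(self, now_s, v_coil, want_on):
        # Hysteresis: between v_off and v_on the previous decision is kept.
        if v_coil >= self.v_on:
            self._supply_ok = True
        elif v_coil <= self.v_off:
            self._supply_ok = False
        target = want_on and self._supply_ok
        dwell_done = (self._last_change_s is None
                      or now_s - self._last_change_s >= self.min_dwell_s)
        if target != self.energized and dwell_done:
            self.energized = target
            self._last_change_s = now_s
        return self.energized


gate = CoilGate(v_on=10.5, v_off=9.0, min_dwell_s=0.5)  # assumed values
trace = [(0.0, 11.0), (0.1, 9.5), (0.2, 8.5), (0.3, 11.0),
         (0.8, 8.5), (0.9, 11.0), (1.4, 11.0)]
states = [gate.update(t, v, want_on=True) for t, v in trace]
print(states)
```

The brief sag at 0.2 s does not release the relay because the dwell has not elapsed, and the recovery at 0.9 s does not re-energize it until the off dwell has passed. Both effects remove the rapid toggling that appears as chatter.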

3. We bypass correctly, but brightness steps visibly: transfer timing or current-loop settle?

Conclusion: Visible brightness steps after bypass are usually a transfer/verify timing issue that reinitializes or perturbs the current loop, not a bypass “failure.”

  • Evidence check A: Compare ILED ripple/settling before and after BYPASS_CONFIRMED; a transient ILED drop or overshoot aligned with Transfer→Verify indicates loop settle or headroom disturbance. (H2-5)
  • Evidence check B: Validate state-machine windows: if Verify begins too early (before coil release or MOSFET stabilization), it can force retries that look like flicker/steps. (H2-7)
  • First fix: Add a holdoff and a “soft-verify” stage (delay + filtered evidence) so current loop settles before declaring stable; enforce a maximum transfers/hour to prevent repeated perceptible steps. (H2-7/H2-5)
Maps to: H2-7 / H2-5

4. Voting disagrees intermittently: sensor drift or shared reference causing common-mode?

Conclusion: Intermittent vote disagreement is often caused by hidden common-mode coupling (shared reference/ground/ADC) rather than true sensor drift.

  • Evidence check A: Look at VOTE_DISAGREE records: do disagreements cluster during surge/transfer edges (common-mode), or do they evolve slowly with temperature/time (drift)? (H2-6)
  • Evidence check B: Check independence assumptions: are both channels using the same ADC reference or the same return path such that ground bounce can move both “independent” readings together? (H2-5)
  • First fix: Increase independence at the evidence layer: separate references/filters or staggered sampling windows; add a proof-test that injects a small stimulus and verifies each channel’s response distinctly. For cross-domain signals, use a robust isolator (e.g., ISO7721). (H2-6/H2-5/H2-11)
Maps to: H2-6 / H2-5

5. Contact is welded but system doesn’t detect it: missing Vdrop test or wrong injection point?

Conclusion: Welded contacts go undetected when the design lacks a credible open-state evidence method (Vdrop measurement or safe test-current injection at the correct point).

  • Evidence check A: When “open” is commanded, is Vdrop across the contact measured (Kelvin) and logged as part of a proof-test or BIST event? (H2-8)
  • Evidence check B: If a test current is injected, confirm it traverses the suspect element and produces an unambiguous signature; wrong injection points can bypass the evidence path. (H2-11)
  • First fix: Add a bounded proof-test step that measures Vdrop/Vds with a known small stimulus and logs PROOF_TEST_PASS/FAIL; if using a telemetry-capable hot-swap/eFuse for controlled stimulus, a common reference part is TPS25982 or TPS25940. (H2-8/H2-11)
Maps to: H2-8 / H2-11

6. MOSFET bypass runs hot in normal mode: Rds(on) margin or SOA underestimated?

Conclusion: A hot bypass MOSFET in normal operation is typically a margin issue (Rds(on) at temperature, gate drive, or current distribution), and verification should confirm SOA and thermal evidence under worst case.

  • Evidence check A: Measure and log Vds (or Vdrop) and current simultaneously to compute real conduction loss versus expectations, including at elevated temperature. (H2-4)
  • Evidence check B: Review proof-test/verification results for worst-case current and thermal ramp: if temperature rise is faster than predicted, SOA or cooling assumptions are wrong. (H2-11)
  • First fix: Increase conduction margin (lower Rds(on) device or parallel strategy) and add thermal gating that prevents repeated transfers under high junction temperature; validate with thermal ramp tests and record evidence snapshots for audit. (H2-4/H2-11)
Maps to: H2-4 / H2-11
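
Evidence check A reduces to simple arithmetic on simultaneous readings: conduction loss is P = Vds × I, the effective on-resistance is Vds / I, and a first-order junction estimate is Tj ≈ Ta + P × RθJA. The sketch below applies those relations; the function name and every numeric input are assumed values, not measurements or datasheet figures.

```python
def conduction_check(vds_v, i_a, rds_on_expected_ohm,
                     r_theta_ja_c_per_w, t_ambient_c):
    """Derive loss, effective Rds(on), and a steady-state Tj estimate."""
    p_loss_w = vds_v * i_a
    rds_eff_ohm = vds_v / i_a
    return {
        "p_loss_w": p_loss_w,
        "rds_eff_ohm": rds_eff_ohm,
        # Ratio above 1 means the device conducts worse than budgeted.
        "rds_ratio": rds_eff_ohm / rds_on_expected_ohm,
        "tj_est_c": t_ambient_c + p_loss_w * r_theta_ja_c_per_w,
    }


# Assumed readings: 0.12 V at 1.5 A, against a 50 mOhm budget.
result = conduction_check(vds_v=0.12, i_a=1.5, rds_on_expected_ohm=0.05,
                          r_theta_ja_c_per_w=60.0, t_ambient_c=50.0)
print({k: round(v, 3) for k, v in result.items()})
```

With these inputs the effective resistance is about 1.6 times the budgeted value, which points at Rds(on) at temperature or insufficient gate drive rather than at the current level itself. The thermal estimate is steady-state only and says nothing about SOA during transfer transients.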

7. False ‘open-string’ faults after maintenance: connector intermittency or sense wiring routing?

Conclusion: Post-maintenance “open-string” faults are most often intermittent connectors or disturbed sense wiring that turns benign noise into dropout signatures.

  • Evidence check A: Confirm dropout_cnt patterns: intermittent connectors create bursty dropouts with recoveries; a real open string produces persistent I=0 with consistent Vstring changes. (H2-5)
  • Evidence check B: Validate routing/sense integrity around high di/dt paths; poor reference routing can create false Vstring/headroom readings during switching edges. (H2-10)
  • First fix: Increase confirmation robustness (Tconfirm + 2-signal correlation) and rework sense routing for Kelvin reference; if log retention is fragile during repeated maintenance cycles, store key events in FRAM (e.g., MB85RS64V) for robust history. (H2-5/H2-10/H2-9)
Maps to: H2-5 / H2-10

8. Event logs show ‘bypass confirmed’ but field tech sees no bypass: feedback signal integrity or definition mismatch?

Conclusion: “Confirmed” in the log without real bypass almost always means the confirmation criteria are too weak or the feedback definition/integrity is wrong across the isolation boundary.

  • Evidence check A: Inspect what “confirmed” means: does BYPASS_CONFIRMED require both CF and a credible conduction metric (Vdrop/Vds + current recovery), or just one noisy signal? (H2-9)
  • Evidence check B: Validate feedback integrity: check for stuck-at, polarity inversion, or cross-domain corruption; verify that CF transitions correlate with actual Vdrop change. (H2-10)
  • First fix: Strengthen confirmation to require two independent proofs (CF + Vdrop/Vds evidence) and add a periodic proof-test; if feedback crosses isolation, use a proven digital isolator class (e.g., ISO7721) and log a “feedback health” reason code on anomalies. (H2-9/H2-10/H2-11)
Maps to: H2-9 / H2-10

9. Power-loss during fault causes corrupted history: log atomicity or monotonic counter handling?

Conclusion: Corrupted history under power loss is an atomicity/commit problem first, and a monotonic continuity problem second; both must be verifiable after reboot.

  • Evidence check A: Verify records have commit markers and CRC; partial writes must be detectable and never interpreted as valid events. (H2-9)
  • Evidence check B: Confirm monotonic_ctr continuity across resets; rollbacks or unexplained gaps must be flagged or explainable. (H2-11)
  • First fix: Implement two-phase commit (write → CRC → commit marker) and store critical counters in robust NVM; FRAM is a common choice for write endurance (e.g., MB85RS64V), and a stable timestamp source for audit is a TCXO RTC (e.g., DS3231). (H2-9/H2-11)
Maps to: H2-9 / H2-11
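
Evidence check B can be run at boot or on an exported log as a simple scan of the committed counter values. The sketch below assumes the counter is expected to advance by exactly one per committed record; the function name and the report layout are hypothetical.

```python
def continuity_report(ctrs):
    """Classify breaks in a sequence of committed monotonic_ctr values."""
    rollbacks, gaps = [], []
    for prev, cur in zip(ctrs, ctrs[1:]):
        if cur <= prev:
            rollbacks.append((prev, cur))   # counter went backwards or repeated
        elif cur > prev + 1:
            gaps.append((prev, cur))        # records missing in between
    return {"rollbacks": rollbacks, "gaps": gaps}


report = continuity_report([1, 2, 3, 5, 6, 4, 5])
print(report)
```

A gap is compatible with a record that was torn by power loss and discarded, so it may be explainable. A rollback is not, and would normally be flagged as a log-integrity event in its own right.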

10. System recovers too aggressively and oscillates: retry policy or thermal cooldown missing?

Conclusion: Oscillation is usually a retry/backoff policy failure (no cooldown, no transfer rate limit, or too-quick verify), not a fundamental redundancy concept issue.

  • Evidence check A: Look for repeated Transfer→Verify→Retry sequences within a short interval; this indicates missing max transfers/hour and insufficient holdoff. (H2-7)
  • Evidence check B: Check requirements and hazard policy: if the system allows recovery without thermal settle, the evidence (T_hot/ΔT) will show rising stress while retries continue. (H2-2)
  • First fix: Add exponential backoff and thermal cooldown gates before retry; enforce a hard rate limit with a Service-required latch after repeated failures to prevent visible flicker and stress accumulation. (H2-7/H2-2)
Maps to: H2-7 / H2-2
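
The first fix combines three gates: a hard attempt limit, a thermal cooldown condition, and an exponentially growing delay. A minimal sketch, assuming a doubling backoff with a cap and a single hot-spot temperature threshold; the function names, the decision strings, and all numbers are illustrative assumptions.

```python
def next_retry_delay_s(attempt, base_s=1.0, cap_s=300.0):
    """Exponential backoff: base * 2^attempt, limited to cap_s."""
    return min(cap_s, base_s * 2 ** attempt)


def retry_decision(attempt, max_attempts, t_hot_c, t_cool_c):
    """Return (decision, delay_s); delay_s is None unless a retry is allowed."""
    if attempt >= max_attempts:
        return ("LATCH_SERVICE_REQUIRED", None)   # hard limit reached
    if t_hot_c > t_cool_c:
        return ("WAIT_COOLDOWN", None)            # thermal gate not satisfied
    return ("RETRY", next_retry_delay_s(attempt))


decisions = [
    retry_decision(0, 4, t_hot_c=60.0, t_cool_c=70.0),
    retry_decision(3, 4, t_hot_c=60.0, t_cool_c=70.0),
    retry_decision(1, 4, t_hot_c=85.0, t_cool_c=70.0),
    retry_decision(4, 4, t_hot_c=60.0, t_cool_c=70.0),
]
print(decisions)
```

The attempt limit is checked first so that a device which never cools cannot wait indefinitely without eventually latching to Service-required.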

11. Redundant channel works in lab, fails in cold start: health monitor gating or timing assumptions?

Conclusion: Cold-start failures are commonly caused by incorrect gating assumptions (PG/bus readiness) and timer windows that do not account for cold behavior of actuators and sensing.

  • Evidence check A: During cold start, verify that health evaluation is gated by valid PG/BusSag conditions; if evidence is sampled before the system is settled, false faults and failed verify are expected. (H2-5)
  • Evidence check B: Compare transfer/verify timers at cold and warm: relay release time, MOSFET behavior, and sensor offsets can shift windows enough to cause systematic Verify failures. (H2-7/H2-11)
  • First fix: Add explicit “boot-safe window” gating and cold-calibrated timing margins; validate with a proof-test at cold start and log the exact thr_set and timer values used. If timestamps are needed for audit correlation, a stable RTC like DS3231 is a common reference. (H2-5/H2-7/H2-11)
Maps to: H2-5 / H2-7 / H2-11

12. How do we prove to an auditor the bypass is ‘controlled’ not ‘random’?

Conclusion: Bypass is provably controlled when every action is linked to a minimum evidence set and a reproducible policy record, and when periodic proof-tests prove the actuator and sensors still behave as assumed.

  • Evidence check A: For each bypass action, show the event chain: FAULT_DETECTED (snapshot + thr_set + fw/config) → BYPASS_COMMAND → BYPASS_CONFIRMED (CF + Vdrop/Vds + current recovery). (H2-9)
  • Evidence check B: Demonstrate proof-test logs: PROOF_TEST_PASS/FAIL records with monotonic_ctr continuity and commit integrity prove the mechanism hasn’t silently degraded. (H2-11)
  • First fix: Adopt a fixed event schema (type/reason/snapshot/thr_set/fw/config_hash/ctr/CRC) and enforce “action requires evidence”; store critical records in robust NVM (e.g., MB85RS64V) and keep time provenance via RTC (e.g., DS3231). (H2-9/H2-11)
Maps to: H2-9 / H2-11
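
A fixed event schema is easier to audit when the configuration is reduced to a reproducible hash and each record carries its own integrity check. The sketch below uses JSON with sorted keys as a canonical form, SHA-256 (truncated) for `config_hash`, and CRC-32 for the record check. The serialization, the truncation length, and the helper names are assumptions for illustration, not the schema used by this design.

```python
import hashlib
import json
import zlib


def _canonical(obj):
    """Deterministic byte form: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def config_hash(config):
    """Short reproducible fingerprint of the active configuration."""
    return hashlib.sha256(_canonical(config)).hexdigest()[:16]


def make_event(ev_type, reason, snapshot, thr_set, fw, config, ctr):
    event = {"type": ev_type, "reason": reason, "snapshot": snapshot,
             "thr_set": thr_set, "fw": fw,
             "config_hash": config_hash(config), "ctr": ctr}
    event["crc"] = zlib.crc32(_canonical(event))  # computed before adding crc
    return event


def event_is_intact(event):
    body = {k: v for k, v in event.items() if k != "crc"}
    return event.get("crc") == zlib.crc32(_canonical(body))


cfg = {"debounce_n": 3, "max_transfers_per_hour": 3}   # hypothetical config
ev = make_event("FAULT_DETECTED", "OPEN_STRING", {"i": 0.0, "v": 55.0},
                "A", "1.4.2", cfg, ctr=17)
tampered = dict(ev, thr_set="B")
print(event_is_intact(ev), event_is_intact(tampered))
```

Because the hash is computed over a canonical form, two devices with the same settings produce the same `config_hash` regardless of key order, which lets an auditor tie each action to the exact policy that was active. CRC-32 detects accidental corruption only; it is not a defense against deliberate modification.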