PoE Switch for CCTV: PSE Power Budgeting, Switching & Protection
Core idea: A CCTV PoE switch is not “just a switch”—it must deliver predictable per-port power and stable video switching under burst loads, heat, and outdoor surges, with logs that explain every dropout and recovery.
If power budgeting, protection/thermal derating, and telemetry are engineered as one closed-loop system, cameras stay online and video stays smooth even at high port density.
H2-1. Definition & Boundary (What a CCTV PoE Switch Must Guarantee)
Definition (engineering-grade): A CCTV PoE switch is Ethernet switching plus controlled PoE sourcing (PSE) plus industrial survivability (protection, thermal policy, and logs) to keep cameras powered and streaming without unpredictable port drops.
This page focuses on the switch-side guarantees: power continuity, deterministic recovery behavior, video-friendly L2 functions, and measurable evidence (telemetry + event logs).
3 acceptance KPIs (written as verifiable outcomes):
Power uptime (no “reboot loops” under normal peaks)
Evidence to expose: port_power_on_count, OCP_trip_count, UVLO_event_count, MPS_drop_count, and minimum bus voltage bus_48V_min.
Port stability (no frequent flap or renegotiation)
Evidence to expose: link_flap_count, poe_detect_fail, classify_fail, power_denied_budget, and per-port last fault reason code.
Thermal headroom (sustained load with deterministic derating)
Evidence to expose: hotspot_temp, derating_state, fan_rpm, and “why” codes (OTP, power-budget, airflow).
Boundary (what this page will and will not cover):
- In scope: PoE PSE behavior (budgeting, metering, recovery), switching features that matter for video (VLAN/QoS/IGMP), thermal/derating policy, surge/ESD survivability, and event logging.
- Out of scope: camera ISP/codec tuning, NVR/VMS ingest architecture, cloud management systems, and deep protocol/security derivations.
H2-2. CCTV Workload Model (Traffic + Power Profile You Design For)
Design input must be explicit: CCTV loads are not “steady resistors”. They create power events (step changes and short bursts) and traffic events (bitrate spikes and re-join storms). A robust PoE switch is designed to survive these events without turning them into port resets or packet loss.
- Power axis: steady draw + IR/heater step + PTZ/motor burst → impacts PSE current limiting, bus droop, and derating policy.
- Traffic axis: multi-stream VBR + I-frame spikes + multicast previews → impacts queues, uplink congestion, and IGMP behavior.
- Coupling risk: a power event can trigger link renegotiation; a link event can trigger traffic bursts; both can cascade if policies are not deterministic.
Constraint checklist (use as engineering acceptance inputs):
- Total power budget: PSU rating minus conversion losses, temperature derating, and reserve margin.
- Per-port transient margin: step and burst tolerance without false OCP or UVLO.
- Backplane and uplink headroom: worst-case aggregation + burst spikes.
- Queue/QoS + IGMP stability: prevent video starvation and multicast flooding.
Table A — Port class → power profile → PoE class target (typical engineering ranges)
| Port load class | Steady power (W) | Peak delta (W) | Peak duration | PoE target | Switch-side policy to define |
|---|---|---|---|---|---|
| Fixed camera (basic) | 6–12 | +2–6 (IR on) | seconds–minutes | 802.3af / 802.3at | Per-port limit + “no reboot” threshold; staggered start optional |
| IR-heavy / dual-illuminator | 8–15 | +6–15 (IR step) | minutes | 802.3at | Step handling: avoid false OCP; verify bus droop margin |
| Heated outdoor camera | 10–18 | +8–20 (heater) | minutes–hours | 802.3at / 802.3bt | Derating: preserve critical ports; define “budget exceeded” action |
| PTZ / motorized | 12–22 | +15–40 (motor burst) | 100 ms–2 s | 802.3bt (Type 3) | Burst tolerance: foldback vs trip; recovery behavior (no oscillation) |
| High-power edge device | 20–45 | +10–30 | seconds | 802.3bt (Type 3/4) | Strict priority + budget reservation; detailed metering and alarms |
Numbers are typical planning ranges to define margins and policies; validation must confirm actual peak delta and duration for the target camera set.
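To make the margin question concrete, the sketch below sums the upper end of the steady and peak-delta ranges from Table A for a hypothetical port mix and compares the result with a budget. The port mix, the 240 W budget, and the `simultaneity` factor (the share of peak deltas assumed to overlap in time) are illustrative assumptions, not values from a specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortLoad:
    name: str
    steady_w: float      # upper end of the steady range in Table A
    peak_delta_w: float  # upper end of the peak delta range in Table A


def demand_summary(ports, budget_w, simultaneity=1.0):
    """Compare steady and peak demand with a budget.

    simultaneity is an assumed fraction (0..1) of peak deltas that overlap.
    """
    steady = sum(p.steady_w for p in ports)
    peak = steady + simultaneity * sum(p.peak_delta_w for p in ports)
    return {
        "steady_w": steady,
        "peak_w": peak,
        "steady_fits": steady <= budget_w,
        "peak_fits": peak <= budget_w,
    }


# Hypothetical 8-port mix: 4 fixed, 2 heated, 2 PTZ cameras.
ports = (
    [PortLoad("fixed", 12, 6)] * 4
    + [PortLoad("heated", 18, 20)] * 2
    + [PortLoad("ptz", 22, 40)] * 2
)
half_overlap = demand_summary(ports, budget_w=240.0, simultaneity=0.5)
full_overlap = demand_summary(ports, budget_w=240.0, simultaneity=1.0)
```

With these assumed numbers the mix fits the budget when half of the peaks overlap but not when all of them do, which is the case that staggered start and burst policies are meant to handle.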
Table B — Traffic patterns → switching functions → what to measure
| Traffic pattern | Common symptom if mis-handled | Switch feature to apply | What to measure (evidence) |
|---|---|---|---|
| Multi-stream VBR (normal recording) | Random stutter / mosaic | QoS queues for video, avoid oversubscription | Per-queue drops, uplink utilization, per-port RX/TX drops |
| I-frame / event spikes | Short freezes at the same time across cameras | Queue headroom + sensible buffer thresholds | Microburst drop counters, queue depth peaks |
| Multicast preview / wall | Flooding, uplink congestion | IGMP snooping (and querier where needed) | IGMP tables, multicast replication counters, CPU load |
| Re-join storm (after power event) | Uplink saturation, long recovery | Deterministic port recovery + rate limiting where appropriate | Port flap count, reauth/rejoin count, uplink peak utilization |
| Segmentation requirement | Cross-talk between camera and uplink domains | VLAN isolation (camera / uplink / maintenance) | VLAN membership audits, unexpected broadcast counts |
H2-3. PoE Standards & Power Negotiation (af/at/bt, 2-pair/4-pair, LLDP)
Goal: Define what the switch PSE must do (and prove with logs) during PoE negotiation—without turning this chapter into a standards textbook. The engineering objective is predictable allocation, bounded power, and deterministic recovery for CCTV loads.
Key idea: every failure branch in negotiation should map to a reason code and a counter, so field issues never become “random”.
Detection confirms a valid PD signature before power is applied.
- Primary risk: false detect / detect fail under long cable, moisture, or leakage paths.
- What to log: attempt count, fail count, and a stable reason code (e.g., DETECT_FAIL).
Evidence fields: detect_attempt_count, detect_fail_count, port_fault_reason
Classification selects the safe power envelope for the port.
- Primary risk: mis-classification → “power-on loops” (too low) or budget collapse (too high).
- Engineering requirement: classification results must be repeatable for the same port + cable.
Evidence fields: classify_result, classify_fail_count, allocated_power_W, power_denied_budget
Power on must tame inrush; maintain must avoid false drop due to MPS behavior.
- Primary risk: inrush triggers OCP/UVLO → repeated resets; MPS drop triggers false off.
- Engineering requirement: bounded retries + backoff, never endless oscillation.
Evidence fields: power_on_fail, power_on_retry_count, OCP_trip_count, UVLO_event_count, MPS_drop_count
If supported, LLDP power enables dynamic requests, but must remain bounded and auditable.
- Benefit: tighter budgeting and metered allocation.
- Risk: unexpected power changes if policies are inconsistent.
- CCTV preference: “predictable, capped power” with priority for critical ports.
Evidence fields: lldp_power_request_W, lldp_power_grant_W, lldp_change_count
Planning table — PoE level → usable power → CCTV load class → margin & policy
| PoE level | Pairs | Usable power (planning) | Typical CCTV load class (switch view) | Margin rule | Policy note (CCTV) |
|---|---|---|---|---|---|
| 802.3af | 2-pair | ~10–13 W | steady + small step (IR) | keep 20–30% headroom | fixed cap, avoid frequent renegotiation |
| 802.3at | 2-pair | ~20–25 W | steady + step (IR/heater) | reserve for cold-start and cable loss | staggered start under high density |
| 802.3bt Type 3 | 4-pair | ~45–60 W | steady + bursts (PTZ motor) | budget for short bursts | foldback preferred over hard trip |
| 802.3bt Type 4 | 4-pair | ~70–90 W | high-power edge device | strict per-port cap + priority | LLDP can help, but cap must be enforced |
Planning values are for setting policies and headroom. Verification should confirm real peak delta and duration for the target CCTV device set.
H2-4. PSE Power Path Architecture (Per-Port vs Multiport, Hot-Swap & Inrush)
Goal: Explain the hardware skeleton of the PSE power path and the behaviors that determine real-world uptime: fault isolation, inrush control, MPS robustness, and deterministic dropback policies. This chapter describes port-side PSE architecture (not upstream PSU topologies).
Per-port vs multiport architectures decide whether faults are isolated or contagious.
- Per-port isolation: one port trips without dragging neighbors.
- Multiport sharing: bus sag or protection actions can cascade across a group.
Evidence fields: port_fault_count, group_fault_count, bus_48V_min
Hot-plug events are where CCTV ports most often “loop” (power on → trip → retry).
- Failure mechanism: inrush → OCP/UVLO → retry oscillation.
- Requirement: foldback / controlled ramp, plus bounded retries and backoff.
Evidence fields: power_on_retry_count, OCP_trip_count, UVLO_event_count
MPS robustness prevents false power-off when the load behavior is intermittent.
- Failure symptom: “works for minutes, then drops” with no clear trigger.
- Requirement: log MPS drops and correlate with port state transitions.
Evidence fields: MPS_drop_count, port_state, port_fault_reason
Deterministic dropback defines what happens after OCP/OTP/budget events.
- Must define: retry cap, backoff time, lockout conditions, restore conditions.
- CCTV expectation: priority keeps critical ports powered ahead of non-critical ports.
Evidence fields: retry_backoff_ms, lockout_state, derating_state
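The dropback rules above (retry cap, backoff, lockout, restore) can be sketched as a small per-port policy object. This is a minimal illustration, not a PSE controller API: the class name, the exponential backoff shape, and the default values are assumptions chosen for the example.

```python
class PortRetryPolicy:
    """Bounded retries with backoff, then lockout (illustrative defaults)."""

    def __init__(self, retry_cap=3, base_backoff_ms=1000, factor=2,
                 max_backoff_ms=30000):
        self.retry_cap = retry_cap
        self.base_backoff_ms = base_backoff_ms
        self.factor = factor
        self.max_backoff_ms = max_backoff_ms
        self.retries = 0
        self.lockout_state = False

    def on_fault(self, reason):
        """Return the next action after an OCP/UVLO/OTP style fault."""
        if self.lockout_state or self.retries >= self.retry_cap:
            self.lockout_state = True
            return {"action": "LOCKOUT", "reason": reason,
                    "retry_backoff_ms": None}
        backoff = min(self.base_backoff_ms * self.factor ** self.retries,
                      self.max_backoff_ms)
        self.retries += 1
        return {"action": "RETRY", "reason": reason,
                "retry_backoff_ms": backoff}

    def on_stable(self):
        """Restore condition: the port held power for the required window."""
        if not self.lockout_state:
            self.retries = 0

    def clear_lockout(self):
        """Explicit release, for example by an operator or a restore policy."""
        self.lockout_state = False
        self.retries = 0


policy = PortRetryPolicy()
actions = [policy.on_fault("OCP") for _ in range(4)]
```

The point of the sketch is that the retry sequence is finite and every step carries a reason and a backoff value that can be logged as retry_backoff_ms and lockout_state.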
Design checklist — mandatory building blocks (port power path)
- Protection coordination: input surge/ESD path that does not force a whole-switch brownout.
- Power switch + thermal: MOSFET sized for sustained dissipation; predictable derating when hot.
- Sensing and metering: current sense placement that avoids noise-triggered false trips.
- Control behavior: inrush limiting + foldback strategy + bounded retries/backoff + lockout states.
- Observability: reason codes and counters tied to every state transition (on/off/retry/deny).
H2-5. Power Budgeting & Allocation (Guaranteed Power vs Oversubscription)
Goal: Turn “limited total power” into a deterministic service policy: critical ports stay powered, non-critical ports degrade in a controlled way, and every action is explainable via reason-coded logs.
This chapter defines budgeting rules and shed/restore behavior. It does not deep-dive upstream PSU topologies.
Budget is sustained power, not a nameplate number.
- Accounting: conversion efficiency loss + thermal derating + reserve.
- Engineering outcome: an explicit available_budget_W under worst-case temperature.
Evidence fields: available_budget_W, reserve_W, derating_state
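One way to turn this accounting into a number is sketched below. It assumes a multiplicative model (nameplate rating times conversion efficiency times a thermal derating factor, minus a fixed reserve); the model and the example figures are assumptions for illustration, and a real design should use the PSU vendor's derating curve.

```python
def available_budget_w(psu_rating_w, efficiency, thermal_derate, reserve_w):
    """Sustained PoE budget under worst-case temperature (assumed model).

    efficiency and thermal_derate are fractions in (0, 1].
    """
    if not (0 < efficiency <= 1 and 0 < thermal_derate <= 1):
        raise ValueError("efficiency and thermal_derate must be in (0, 1]")
    return max(0.0, psu_rating_w * efficiency * thermal_derate - reserve_w)


# Hypothetical 400 W supply, 90% conversion efficiency,
# derated to 80% at maximum ambient, with 28 W held in reserve.
budget = available_budget_w(400.0, 0.90, 0.80, 28.0)
```

With these example inputs the sustained budget comes out at roughly 260 W, well below the 400 W nameplate figure.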
Guaranteed power protects critical CCTV ports from brownout and oscillation.
- Applies to: entrance/exit cameras, critical PTZ zones, safety-related coverage.
- Rule: guaranteed allocations must not be reclaimed during budget stress.
Evidence fields: port_priority, guaranteed_W, guaranteed_total_W
Oversubscription (sum of caps > budget) is allowed only with a documented shed policy.
- Trigger: budget waterline crosses Warning/Critical.
- Requirement: cap-down first, selective off second, staged restore last.
Evidence fields: oversubscription_enabled, budget_waterline, shedding_active
Priority policy example (CCTV)
| Port class | Priority | Guaranteed power | Per-port cap | Shed order (when Critical) | Restore order |
|---|---|---|---|---|---|
| Entrance / Critical coverage | P0 | Yes (protect) | cap-high (bounded) | Last to shed (avoid off) | First restore |
| PTZ / burst load | P1 | Partial (if needed) | cap with burst allowance | Cap-down before off | Second restore |
| Normal cameras | P2 | No | cap-medium | First to shed (selective off) | Staged restore |
The table is a template. The key requirement is that the switch can expose and audit the policy (priority, caps, shed/restore sequence).
Decision tree — what happens when total power is not enough
- Trigger: budget crosses Warning waterline.
- Action: reduce caps on P2, then P1 if needed.
- Log: event_type=CAP_APPLIED, cap_target_W, port_id, reason=BUDGET
- Trigger: cap-down applied; wait for bus and thermals to recover.
- Action: hold state for a bounded window.
- Log: grace_timer_ms, bus_48V_min, hotspot_temp
- Trigger: budget remains in Critical after grace window.
- Action: turn off lowest priority ports (P2 first), using a fixed ordering rule.
- Log: event_type=PORT_OFF, reason=BUDGET_SHED, duration_ms
- Trigger: budget returns to Safe waterline for a stable window.
- Action: restore ports in priority order, with backoff to avoid synchronized inrush.
- Log: event_type=RESTORE, restore_backoff_ms, restore_sequence_id
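The decision tree can be sketched as two small planning functions: one that caps down and then sheds, and one that orders the restore. This is a simplified illustration under stated assumptions: the `Port` fields, the "highest port_id sheds first" ordering rule, and the fixed restore spacing are hypothetical, and the grace window between cap-down and shed is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Port:
    port_id: int
    priority: int      # 0 = P0 (critical) ... 2 = P2 (normal)
    draw_w: float      # present allocation in watts
    min_cap_w: float   # assumed lowest cap the camera tolerates
    powered: bool = True


def shed_plan(ports, budget_w):
    """Cap-down P2 then P1, then switch ports off (P2 first) until within budget."""
    events = []
    total = sum(p.draw_w for p in ports if p.powered)

    def group(prio, reverse=False):
        members = [p for p in ports if p.priority == prio and p.powered]
        return sorted(members, key=lambda p: p.port_id, reverse=reverse)

    # Stage 1: cap-down. P0 guaranteed allocations are never reclaimed.
    for prio in (2, 1):
        for p in group(prio):
            saving = min(p.draw_w - p.min_cap_w, total - budget_w)
            if saving <= 0:
                continue
            p.draw_w -= saving
            total -= saving
            events.append({"event_type": "CAP_APPLIED", "port_id": p.port_id,
                           "cap_target_W": p.draw_w, "reason": "BUDGET"})
    # Stage 2: selective off with a fixed ordering rule.
    for prio in (2, 1):
        for p in group(prio, reverse=True):
            if total <= budget_w:
                return events
            p.powered = False
            total -= p.draw_w
            events.append({"event_type": "PORT_OFF", "port_id": p.port_id,
                           "reason": "BUDGET_SHED"})
    return events


def restore_plan(ports, backoff_ms=2000):
    """Restore in priority order, spaced to avoid synchronized inrush."""
    off = sorted((p for p in ports if not p.powered),
                 key=lambda p: (p.priority, p.port_id))
    return [{"event_type": "RESTORE", "port_id": p.port_id,
             "restore_sequence_id": i, "restore_backoff_ms": i * backoff_ms}
            for i, p in enumerate(off)]


ports = [Port(1, 0, 25.0, 25.0), Port(2, 1, 40.0, 30.0),
         Port(3, 2, 12.0, 8.0), Port(4, 2, 12.0, 8.0)]
events = shed_plan(ports, budget_w=70.0)
restores = restore_plan(ports)
```

In this example the three cap-downs are not enough, so one P2 port is switched off while the P0 port keeps its full allocation; each step produces a log record that matches the fields listed in the decision tree.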
H2-6. Power Metering & Telemetry (What to Measure, What to Log)
Goal: Make observability executable: define what to measure, what to count, and what to log so field faults map to evidence (not guesses). Telemetry should explain OCP/UVLO/OTP/MPS/budget actions per port.
- Metering: V_port, I_port, P_port (avg/peak suggested).
- Counters: power_on_count, OCP_trip_count, MPS_drop_count.
- Reason codes: port_fault_reason = OCP/UVLO/OTP/MPS/BUDGET/DETECT/CLASS.
Use: distinguish “inrush loop” vs “thermal drop” vs “budget shed”.
- Bus health: bus_48V_min, bus_48V_avg.
- Thermals: psu_temp, hotspot_temp, derating_state.
- Cooling: fan_rpm + fan_fault.
Use: prove whether a port fault is isolated or system-wide.
A minimal event record should be consistent across all actions: timestamp, event_type, port_id, reason_code, policy_action, a V/I/P snapshot (optional but recommended), and duration_ms (when applicable).
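As a sketch of what such a record could look like in management firmware or a log collector, the dataclass below mirrors the listed fields and serializes to JSON. The class name, the field types, the `v_port`/`i_port`/`p_port` naming of the snapshot, and the sample values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PortEvent:
    timestamp: str                      # ISO 8601 text is an assumed format
    event_type: str
    port_id: int
    reason_code: str
    policy_action: str
    v_port: Optional[float] = None      # optional V/I/P snapshot
    i_port: Optional[float] = None
    p_port: Optional[float] = None
    duration_ms: Optional[int] = None   # only when applicable

    def to_json(self):
        """Serialize, omitting optional fields that were not captured."""
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(fields, sort_keys=True)


# Illustrative record with made-up values.
event = PortEvent("2024-01-01T00:00:00Z", "PORT_OFF", 7, "BUDGET", "SHED",
                  duration_ms=1500)
line = event.to_json()
```

Keeping one record shape for every action (cap, off, retry, restore) is what lets the CLI, SNMP, and web views show the same evidence.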
- CLI: live port state + last-N events + counters.
- SNMP: counters + alarms + waterline state.
- Web UI: readable dashboards with exportable events.
Only “what to expose” is defined here (no cloud/remote platform design).
Metrics → threshold style → fault discriminator → next action
| Metric / counter | Threshold style (how to use) | Fault discriminator (what it proves) | Next action (first move) |
|---|---|---|---|
| OCP_trip_count (per port) | rate spike (per minute/hour) | inrush/overload loop vs stable load | check power-on events and cap policy; apply foldback or raise grace/backoff |
| MPS_drop_count (per port) | correlate with idle/low-load intervals | false maintain failure vs real disconnect | confirm MPS policy and port state transitions; avoid oscillation by bounded retry |
| bus_48V_min (system) | min over window (e.g., 1–10 s) | system brownout vs isolated port trip | if system-wide sag, prioritize cap-down/shed and stagger restore |
| hotspot_temp + derating_state | temperature threshold with hysteresis | thermal-driven drops vs electrical faults | activate derate before off; verify fan state and airflow |
| event_type=PORT_OFF with reason=BUDGET_SHED | count + duration | policy shed (expected) vs protection trip (unexpected) | review oversubscription settings; validate restore order/backoff |
Threshold values should be site-tuned. The deliverable here is the mapping: which metric proves which failure class, and which action is safe to take first.
H2-7. Switching Fabric for Video (L2/L3 Features That Actually Matter)
Goal: Keep CCTV video stable under real load: low loss, bounded jitter, and predictable behavior during bursts (I-frames / event spikes). The focus is on configuration targets and how to verify them, not generic networking theory.
- Target: separate Camera VLAN, Uplink/NVR VLAN, and O&M VLAN to reduce broadcast noise and limit lateral exposure.
- Failure it prevents: unknown-unicast/broadcast flooding that steals buffers and creates microburst drops.
- Verification: check broadcast/unknown-unicast counters and ensure cameras cannot talk across VLANs without explicit policy.
- Target: ensure video traffic stays in the higher-priority queue during congestion.
- Failure it prevents: video drops/jitter when management traffic or uplink contention occurs.
- Verification: under a controlled congestion test, video queue drops remain near zero while lower queues absorb loss.
Evidence fields: per_queue_drop, queue_occupancy, egress_rate
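The congestion check can be expressed as a comparison of per-queue drop counters sampled before and after the test. The sketch below is a hypothetical helper: the queue names, the `max_video_drops` tolerance, and the rule that an absence of any drops means the test did not actually create congestion are assumptions.

```python
def qos_congestion_verdict(drops_before, drops_after, video_queue,
                           max_video_drops=0):
    """Judge a congestion test from per_queue_drop samples (dict: queue -> count)."""
    delta = {q: drops_after[q] - drops_before.get(q, 0) for q in drops_after}
    video = delta.get(video_queue, 0)
    others = sum(v for q, v in delta.items() if q != video_queue)
    if video > max_video_drops:
        verdict = "FAIL"            # video queue lost packets
    elif others == 0:
        verdict = "NO_CONGESTION"   # nothing dropped: the stress was too weak
    else:
        verdict = "PASS"            # lower queues absorbed the loss
    return {"video_drops": video, "other_drops": others, "verdict": verdict}


before = {"q_video": 0, "q_best_effort": 10}
good = qos_congestion_verdict(before, {"q_video": 0, "q_best_effort": 510},
                              "q_video")
bad = qos_congestion_verdict(before, {"q_video": 12, "q_best_effort": 510},
                             "q_video")
idle = qos_congestion_verdict(before, {"q_video": 0, "q_best_effort": 10},
                              "q_video")
```

Treating "no drops anywhere" as inconclusive rather than as a pass avoids signing off a QoS configuration that was never actually stressed.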
- Target: multicast streams only reach ports that joined the group; avoid “multicast = flood”.
- Failure it prevents: multicast flooding that burns bandwidth, heats magnetics, and causes loss spikes.
- Verification: non-subscribed ports should receive no multicast stream; group table should be stable.
Evidence fields: igmp_group_table, querier_state, mcast_flood_count
- Use only if device is placed in industrial ring topologies.
- Target: bounded recovery time after link break without persistent storms.
- Verification: run a link-pull test and measure video recovery time and topology-change event logs.
Evidence fields: topology_change_count, convergence_event_ts
- Target: minimal segmentation and least-exposure between VLANs (only required flows allowed).
- Keep it bounded: no deep routing features; use only what supports CCTV segmentation and auditability.
- Verification: ACL hit counters show allowed flows; denies are counted and time-stamped.
Evidence fields: acl_hit_count, acl_deny_count
- Loss: per-port drops and per-queue drops during normal + stress tests.
- Jitter: bounded variation under uplink contention (watch buffers/queues).
- Microbursts: spikes from I-frames/events should not collapse queues or trigger multicast floods.
The deliverable is a verification plan: inject congestion, observe queue drops, and confirm IGMP membership behavior.
Feature → problem → verification → evidence (CCTV-focused)
| Feature | Solves (what breaks without it) | Verification (how to test) | Evidence (counters/logs) |
|---|---|---|---|
| VLAN isolation (Camera/Uplink/O&M) | broadcast/unknown flooding, lateral exposure, buffer instability | send broadcast/unknown-unicast stimulus; confirm containment per VLAN | bcast_count, unknown_ucast_count, MAC learning table stability |
| QoS (video-priority queue) | video drops/jitter under uplink contention | create congestion on uplink; check video queue drops and latency stability | per_queue_drop, queue_occupancy, egress_rate |
| IGMP Snooping (+ querier presence) | multicast turning into flood, bandwidth/thermal stress, microburst loss | subscribe one port, keep one non-subscriber; confirm only member receives stream | igmp_group_table, querier_state, mcast_flood_count |
| RSTP/ERPS (optional) | link-break outage and storms in ring deployments | pull one ring link; measure recovery time and event count | topology_change_count, convergence_event_ts |
| Static route / ACL (minimal) | unbounded cross-segment reachability and un-audited exposure | attempt disallowed traffic; confirm deny with counted hits | acl_hit_count, acl_deny_count, login/audit logs |
H2-8. Thermal Design & Derating (Port Density Is a Heat Problem)
Goal: Treat heat as an operational control problem: identify hotspots, define sensor placement, and implement a derating state machine that prevents port oscillation (drop/reconnect loops) while preserving critical coverage.
Heat-source map (where PoE switches fail first)
- per-port FETs and current sense regions
- high density: many watts in a small footprint
- field symptom: repeated port resets under high power
- rectification/protection parts can heat during stress
- transient events can create thermal spikes
- field symptom: protection events that correlate with temperature
- throughput-driven thermal rise
- high junction temp may degrade buffering behavior
- field symptom: throughput drop and loss spikes at elevated temps
- continuous PoE current + ambient heat
- hot magnetics can indicate airflow problems
- field symptom: link instability and thermal stress near ports
Thermal design checklist (SOP-style)
- hotspot copper spreading and thermal vias in PSE zones
- separate heat islands: avoid stacking PSE + DC/DC + ASIC too tightly
- ensure consistent contact pressure for heatsinks where used
- airflow should sweep the highest density heat sources first
- fan tach monitoring + fault handling (blocked/stalled)
- dust aging: assume airflow drops over time; derate strategy must cover it
- T1: PSE hotspot region (highest risk)
- T2: switch ASIC region
- T3: outlet/near magnetics (airflow indicator)
Sensors must map to actions and logs (not just “for display”).
Derating state machine (temperature → action → log)
| State | Trigger style | Actions (ordered) | Evidence (events/counters) |
|---|---|---|---|
| SAFE | below threshold with margin | normal caps; allow high power per policy | derating_state=SAFE |
| WARN | T1/T2 rising; hold time window | cap-down P2 → cap-down P1; stagger new port power-ups | event_type=DERATE_ENTER, policy_action=CAP, cap_target_W |
| CRITICAL | T1/T2 exceeds critical; persists | disable highest power modes; selective shed P2 ports; enforce backoff | event_type=PORT_OFF, reason=OTP or reason=THERMAL_DERATE, duration_ms |
| RESTORE | temperature back to safe for stable window | staged restore by priority; release caps gradually | event_type=RESTORE, restore_sequence_id, restore_backoff_ms |
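A minimal sketch of this state machine is shown below, driven by a single hotspot temperature. The thresholds, the hysteresis value, the sample-count "stable window", and the choice to stay in CRITICAL until the hotspot has fully cooled are all assumptions for illustration; the WARN entry hold time from the table is not modelled.

```python
SAFE, WARN, CRITICAL, RESTORE = "SAFE", "WARN", "CRITICAL", "RESTORE"


class DeratingFSM:
    """Temperature-driven derating with hysteresis (illustrative thresholds)."""

    def __init__(self, warn_c=85.0, critical_c=100.0, hysteresis_c=5.0,
                 stable_samples=3):
        self.warn_c = warn_c
        self.critical_c = critical_c
        self.hysteresis_c = hysteresis_c
        self.stable_samples = stable_samples
        self.state = SAFE
        self._cool = 0  # consecutive samples below the restore threshold

    def step(self, hotspot_c):
        """Advance one sample and return the resulting state."""
        cool = hotspot_c < self.warn_c - self.hysteresis_c
        if hotspot_c >= self.critical_c:
            self._cool = 0
            nxt = CRITICAL
        elif self.state == SAFE:
            nxt = WARN if hotspot_c >= self.warn_c else SAFE
        elif self.state == RESTORE:
            if cool:
                nxt = SAFE
            else:
                nxt = WARN if hotspot_c >= self.warn_c else RESTORE
        else:
            # WARN or CRITICAL: leave only after a stable cool window.
            self._cool = self._cool + 1 if cool else 0
            nxt = RESTORE if self._cool >= self.stable_samples else self.state
        if nxt != self.state:
            self._cool = 0
        self.state = nxt
        return nxt


fsm = DeratingFSM()
trace = [fsm.step(t) for t in (70, 86, 101, 90, 79, 79, 79, 79)]
```

Because leaving WARN or CRITICAL requires several consecutive samples below the warn threshold minus hysteresis, a temperature hovering near a threshold cannot toggle ports on and off.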
H2-9. Protection & Survivability (OC/SC/Surge/ESD/Lightning, Grounding)
Goal: Outdoor long-cable CCTV requires protection that is layered, placed correctly, and recoverable. The deliverable is not “a TVS list” but a clear mapping of threat → component → placement → expected failure mode → verification → evidence logs.
- ESD: very fast transients; must be handled at the connector edge and return path.
- Surge: higher energy; needs clamp + discharge + impedance control to keep energy out of PHY/PSE.
- Lightning / induced events: common-mode energy and ground potential differences; grounding/isolation boundary matters.
- OC/SC: misuse, water ingress, damaged cables; must trip per-port first without collapsing the whole switch.
- Layer 0 — RJ45 edge: divert common-mode energy to chassis/ground early; keep return loop short.
- Layer 1 — Isolation boundary: magnetics + isolation strategy defines what can cross into PHY domain.
- Layer 2 — Chip guard: final clamp / filtering before PHY/PSE sense nodes.
The same component can be “right” or “wrong” depending on placement and return path.
- Bus OVP/UVLO: keep 48–57V domain stable; protect the PSU and prevent undefined operation.
- Bus short-circuit protection: isolate the fault; avoid a full-system brownout if the fault is per-port.
- Input surge handling: AC/DC front-end must reject surges without dragging the PoE bus into reset cycles.
- Port-first principle: per-port trip/cut should handle most cable faults.
- Bus protection as last resort: bus trips only when the bus is actually threatened (not a single port).
- Recovery policy: backoff + staged restore prevents oscillation after surge events.
- Hard failure: power-on fail / link never up → distinguish PSE fault vs PHY fault by timestamps.
- Intermittent: link flaps / CRC errors / drops → correlate with thermal and protection events.
- Minimum log: timestamp, port, reason code, action taken, duration, recovery outcome.
Example evidence fields: port_off_reason, ilim_trip_count, link_flap_count, crc_error_count, bus_uvlo_count
Threat → protection → placement → failure mode → verification → evidence
| Threat | Protection parts (examples) | Placement (must be explicit) | Failure mode if missing | Verification method | Evidence (logs/counters) |
|---|---|---|---|---|---|
| ESD | clamp at interface, controlled return path | RJ45 edge (Layer 0), shortest route to chassis/ground | PHY upset, link flap, latent damage | spot ESD injection; check link stability and error counters | link_flap_count, crc_error_count, event_type=ESD |
| Surge (common/diff-mode) | TVS / discharge element / impedance control | Layer 0 + Layer 1 boundary; keep energy out of PHY/PSE | port dead, repeated trips, bus droop | surge test (available fixture); validate trip locality + recovery | port_off_reason, bus_V_min, recovery_time_ms |
| Lightning / induced | grounding strategy + isolation boundary discipline | chassis/ground entry control; do not inject into logic ground | multi-port instability, unpredictable resets | long-cable stress; monitor common-mode related symptoms | multi_port_fault_count, event timestamps correlation |
| OC/SC (water ingress / damaged cable) | per-port current limit, fast cut, backoff | PSE per-port power path | bus collapse, all ports drop, PSU resets | short each port; confirm only the target port trips | ilim_trip_count, port_off_reason=OCP/SC, bus_uvlo_count |
| Bus abnormal | OVP/UVLO, bus SC protection | 48–57V bus & PSU input domains | undefined brownout, repeated reboot cycles | bus droop injection / PSU fault simulation | bus_ovp_count, bus_uvlo_count, reboot logs |
H2-10. Validation & Production Test Plan (Pass/Fail Criteria You Can Ship With)
Goal: Turn Power/Network/Thermal/Protection into a repeatable shipping checklist with clear fixtures, steps, pass/fail criteria, and a consistent log field template for traceability.
- Power: power-up success rate, current limit response, MPS stability, bus droop margins.
- Network: throughput, microburst loss, IGMP stability, VLAN isolation behavior.
- Thermal: full-load soak, fan policy, derating entry/exit, staged restore without oscillation.
- Protection: ESD/surge robustness with controllable recovery and explainable logs.
- Required: timestamp, device_id, fw_version, test_id, domain, port_id, thresholds, result
- Recommended: ambient, load_profile, traffic_profile, duration_ms
- Events: reason_code, action, recovery_time_ms
A test without evidence fields cannot be debugged or audited after shipment.
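A production test harness can enforce this by rejecting records that lack the required fields. The helper below is a small sketch; the function name, the dict-based record format, and the sample values are assumptions.

```python
REQUIRED_FIELDS = ("timestamp", "device_id", "fw_version", "test_id",
                   "domain", "port_id", "thresholds", "result")


def missing_required(record):
    """Return the required fields that are absent or empty in a test record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# Illustrative record with made-up values; fw_version is deliberately missing.
record = {
    "timestamp": "2024-01-01T00:00:00Z",
    "device_id": "SW-0001",
    "test_id": "PWR-01",
    "domain": "Power",
    "port_id": 0,
    "thresholds": {"bus_V_min": 48.0},
    "result": "PASS",
}
gaps = missing_required(record)
```

Note that a port_id of 0 counts as present; only missing or empty values are reported.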
- Inject stress (congestion / full-power / long soak / protection stimulus).
- Observe counters (drops, faults, state transitions).
- Require controlled recovery (no endless reboot loops, no port oscillation).
Master test table (fixture → steps → criteria → log fields)
| Test ID | Domain | Fixture / Setup | Steps (short) | Pass / Fail criteria | Log fields (minimum) |
|---|---|---|---|---|---|
| PWR-01 | Power | programmable PoE loads / per-port load profiles | power-up all ports by priority; repeat N cycles | power-up success rate meets target; no unexpected bus UVLO | port_id, class, requested_W, actual_W, bus_V_min, result |
| PWR-02 | Power | short/OC stimulus per port | apply OC/SC; verify localized trip + backoff | only target port trips; system remains stable; recovery controlled | ilim_trip_count, port_off_reason, backoff_ms, bus_uvlo_count |
| NET-01 | Network | traffic generator; uplink congestion injection | run video-like streams + add congestion | video-priority queue drops remain bounded; no collapse under microbursts | per_queue_drop, queue_occupancy, port_drop, result |
| NET-02 | Network | IGMP join/leave cycling | subscribe one port; keep one non-member; stream multicast | non-member receives no multicast; group table stable | igmp_group_table, querier_state, mcast_flood_count |
| THM-01 | Thermal | full-power + full-traffic soak; ambient control if available | run for soak time; log T1/T2/T3 and fan tach | temps within limits or derating enters as designed; no oscillation | T1, T2, T3, fan_rpm, derating_state |
| THM-02 | Thermal | derating trigger & restore | force WARN/CRITICAL; verify staged restore | restore requires cool-down window; staged power-up; stable end state | event_type, action, duration_ms, restore_sequence |
| PRO-01 | Protection | ESD spot injection (available fixture) | apply ESD to defined points; monitor link + counters | link recovers; no persistent flaps; errors within target | crc_error_count, link_flap_count, recovery_time_ms |
| PRO-02 | Protection | surge stimulus (available fixture) | apply surge; verify localized trip + controlled recovery | no whole-box brownout; recovery policy executed and logged | port_off_reason, bus_V_min, bus_uvlo_count, action |
H2-11. Field Debug Playbook (Symptom → Evidence → Isolate → Fix)
Purpose: Give installers a shortest path. For each symptom, run two checks, use a discriminator to choose the branch, then apply the first fix that is measurable and reversible.
Port keeps powering up / powering down (power cycling).
First 2 checks
- port_off_reason + ilim_trip_count (OCP/SC trips)
- bus_uvlo_count + bus_V_min (48–57V bus droop)
Discriminator
- High OCP trips + stable bus → localized port / cable / PD fault
- Low OCP trips + frequent bus UVLO → total budget / PSU droop / simultaneous inrush
- Trips correlate with temperature (T_pse / T_hotspot) → thermal derating oscillation
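The three branches above can be written as a simple classifier over the counters collected during the observation window. This is a sketch only: the function name, the numeric thresholds for "high" and "frequent", and the boolean temperature-correlation input are assumptions that would need site tuning.

```python
def classify_power_cycling(ocp_trips, bus_uvlo_events, trips_follow_temperature,
                           ocp_high=5, uvlo_high=3):
    """Map counter evidence to one branch of the discriminator.

    ocp_high and uvlo_high are assumed thresholds per observation window.
    """
    if ocp_trips >= ocp_high and bus_uvlo_events < uvlo_high:
        return "LOCAL_PORT_CABLE_PD_FAULT"
    if ocp_trips < ocp_high and bus_uvlo_events >= uvlo_high:
        return "BUDGET_PSU_DROOP_OR_INRUSH"
    if trips_follow_temperature:
        return "THERMAL_DERATING_OSCILLATION"
    return "INCONCLUSIVE"


local = classify_power_cycling(12, 0, False)
system = classify_power_cycling(1, 6, False)
thermal = classify_power_cycling(1, 0, True)
unknown = classify_power_cycling(1, 0, False)
```

An explicit INCONCLUSIVE result is useful in the field: it tells the installer to capture a longer timeline rather than guess at a branch.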
Isolate
- Force the port to a lower power limit and retest with a short known-good cable.
- Disable other high-power ports, then re-run power-up to see whether bus UVLO disappears.
- Capture a 5–10 minute timeline: reason code → action → recovery result.
First fix (with MPN examples)
- Per-port OCP/SC robustness: verify fast current limit + controlled backoff in the PSE. MPN examples: TPS23881 (TI 8-port PSE), LTC4291-1/LTC4292 (Analog Devices 4-port 802.3bt PSE chipset), PD69200 (Microchip PoE PSE).
- Bus droop / inrush: add/verify hot-swap / inrush limiting and staged port power-up policy. MPN examples: LTC4222 (ADI hot-swap), LTC4366 (ADI surge stopper/OVP), TPS25982 (TI eFuse with surge protection).
- Thermal oscillation: smooth derating thresholds + minimum off-time before re-powering. MPN examples (temperature sensing): TMP102 (TI), EMC2101 (Microchip fan controller).
Dropouts only at night / when IR illuminator turns on.
First 2 checks
- actual_W trend (or port_power_step if available) at IR-on timestamps
- ilim_trip_count or derating_state changes during IR-on
Discriminator
- Power step → OCP/limit → power-off → port limit too tight or cable loss too high
- Power step without power-off, but link errors rise (crc_error_count) → noise/return-path coupling (still switch-side)
Isolate
- Temporarily raise port priority or limit; compare failure rate before/after.
- Swap to a shorter/known-good cable; if issue disappears, cable loss is dominant.
- Correlate IR-on times with bus droop (bus_V_min) and port events.
First fix (with MPN examples)
- Enable predictable power negotiation / dynamic allocation so IR steps do not hit hard limits. MPN examples (PSE): TPS23881, LTC4291-1/LTC4292, PD69200.
- Improve per-port power path noise immunity (layout/return path + appropriate magnetics/CMC selection). MPN examples (Ethernet magnetics family): Pulse H5007NL (example family), Würth 7490xxxx (CMC families).
- Surge/ESD hardening at the RJ45 edge to reduce night-humidity related transients. MPN examples: SMBJ58A (TVS class, choose proper rating), SM712 (ESD array class for data lines).
Note: magnetics/CMC part numbers are platform-dependent (port count, PoE class, isolation spec). Use these as “example families” and finalize by your PoE class + EMI test results.
Multiple streams stutter, but links stay up (no link-down).
First 2 checks
- per_queue_drop / queue_occupancy (microburst congestion evidence)
- mcast_flood_count + igmp_table_state (multicast control plane)
Discriminator
- Queue drops rise while link stays up → buffering/QoS mismatch vs workload bursts
- Multicast flood rises → IGMP snooping/querier misbehavior
- Errors remain low but stutter persists → verify video queue mapping and uplink bottleneck
Isolate
- Halve camera count for 10 minutes: if drops scale down, you have a congestion-driven root cause.
- Force video VLAN separation from management VLAN; verify drops shift to expected queues only.
- Lock one multicast group; verify non-member ports see no multicast traffic.
First fix (with MPN examples)
- Switch SoC with adequate buffers for bursty video + multicast. MPN examples (switch SoC families): BCM53xx (Broadcom family), Marvell Link Street (88E6xxx) and Prestera (98DX) families.
- Better timestamping/telemetry for queue drops so stutter correlates with counters. MPN examples (MCU for telemetry/logging): STM32F4 (ST), ESP32 (Espressif, if low-cost mgmt is needed).
These are platform SoC choices; the “field fix” is usually QoS/IGMP configuration + uplink planning, but listing SoC families helps readers understand what hardware capability is required.
After thunderstorm, a few ports are completely dead (no power).
First 2 checks
- detection_fail_count / classify_fail_count (PoE detect/classify failures)
- port_short_state + port_off_reason (hard short vs protection latch-off)
Discriminator
- Detect/classify fails + link also dead → front-end boundary damage (RJ45/magnetics/PHY-side)
- Hard short detected → external cable/PD fault; validate with known-good short cable first
- Link works but power fails → PSE power path (FET/sense) likely impacted
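As a rough sketch, the three discriminator rules can be encoded as a triage helper for a port that delivers no power. The function name, the boolean inputs, and the rule ordering (hard short checked first, because it points outside the switch) are assumptions made for illustration.

```python
def triage_dead_port(detect_or_classify_fail: bool,
                     hard_short: bool,
                     link_up: bool) -> str:
    """Localize the damage boundary for a port that delivers no power."""
    if hard_short:
        # Validate with a known-good short cable before blaming the port.
        return "external cable/PD fault"
    if detect_or_classify_fail and not link_up:
        return "front-end boundary damage (RJ45/magnetics/PHY-side)"
    if link_up:
        return "PSE power path (FET/sense) likely impacted"
    return "inconclusive: retest with known-good cable and load"
```

The final branch covers combinations the discriminator does not name, such as link down with no detect/classify failures logged.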
Isolate
- Test with a known-good short cable + known-good PoE load (avoid risking another port).
- Move the same cable to a different port: cable vs port separation.
- Check whether surge/ESD event logs exist around storm time.
First fix (with MPN examples)
- Layer-0 surge diversion at RJ45 edge and correct return-to-chassis strategy. MPN examples (TVS class): SMBJ58A, SMCJ58A (choose rating to match PoE rail). For data-line ESD arrays: TPD4E05U06 (TI class), SM712 (industry class).
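- Bus input hardening so one surge does not brown out the entire box. MPN examples: LTC4366 (ADI surge stopper), TPS25982 (TI eFuse), LTC4222 (ADI hot-swap). Confirm each part's voltage rating against the PoE bus, since eFuse and hot-swap parts are often specified for lower-voltage rails.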
- PSE replacement path for damaged ports (board-level service). MPN examples: TPS23881, LTC4296-1, PD69200.
After ~1 hour at full load, ports drop one by one.
First 2 checks
- T_hotspot_max / T_pse / T_psu (temperature peak and slope)
- derating_state + fan_rpm (derating entry/exit + fan stall evidence)
Discriminator
- Temp rises → derating toggles → ports drop → derating policy oscillation or insufficient airflow
- High temp + abnormal fan RPM → airflow blockage / fan fault dominates
- Normal temp but ports drop → re-check power budget and OCP/UVLO evidence
Isolate
- Reduce total PoE power by 10–20% for 10 minutes: if stable, thermal headroom is the constraint.
- Increase fan duty temporarily; see if dropouts delay.
- Stage port power-up and restore sequence; verify state machine is monotonic (no thrash).
First fix (with MPN examples)
- Fan control + tach monitoring to prevent silent airflow collapse. MPN examples: EMC2101 (Microchip fan controller), MAX31760 (Analog Devices fan controller class).
- Distributed temperature sensing near PSE FETs, magnetics, and PSU hot spots. MPN examples: TMP102 (TI), LM75-class temperature sensors.
- Derating state machine that first limits per-port power then disables low-priority ports only if needed. Implementation uses MCU/SoC firmware; typical MCU families: STM32 (ST), NXP LPC (NXP).
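A minimal sketch of such a derating state machine follows, assuming three states and hysteresis between entry and exit thresholds. The state names, the temperature thresholds, and the one-level-per-call rule are illustrative assumptions, not values from any specific product.

```python
from dataclasses import dataclass
from enum import Enum


class Derate(Enum):
    NORMAL = 0   # full per-port power
    LIMIT = 1    # per-port power limited
    SHED = 2     # low-priority ports disabled


@dataclass(frozen=True)
class Thresholds:
    """Assumed hotspot thresholds in degrees C."""
    limit_enter: float = 85.0
    shed_enter: float = 95.0
    hysteresis: float = 5.0


def next_state(state: Derate, temp_c: float,
               th: Thresholds = Thresholds()) -> Derate:
    """Move at most one level per call; exit points sit below entry points."""
    if state is Derate.NORMAL:
        return Derate.LIMIT if temp_c >= th.limit_enter else Derate.NORMAL
    if state is Derate.LIMIT:
        if temp_c >= th.shed_enter:
            return Derate.SHED
        if temp_c <= th.limit_enter - th.hysteresis:
            return Derate.NORMAL
        return Derate.LIMIT
    # SHED: step back to LIMIT only after cooling past the hysteresis band.
    if temp_c <= th.shed_enter - th.hysteresis:
        return Derate.LIMIT
    return Derate.SHED
```

Because the machine moves one level at a time and exits below its entry points, a temperature hovering near a threshold does not toggle ports on and off. A minimum dwell time per state would be a natural addition.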
MPN quick index (by function block)
| Function block | MPN examples | Why it appears in field debug |
|---|---|---|
| PSE controller (multi-port) | TPS23881 (TI), LTC4296-1 (ADI), PD69200 (Microchip) | Explains OCP trips, detect/classify failures, per-port behavior and logging needs. |
| Hot-swap / inrush / bus protection | LTC4222 (ADI), LTC4366 (ADI), TPS25982 (TI) | Addresses bus UVLO, brownout loops, and surge-induced bus collapse. |
| ESD / TVS (class examples) | TPD4E05U06 (TI), SM712 (ESD array class), SMBJ58A/SMCJ58A (TVS class) | Correlates post-storm failures and intermittent link errors with protection layering. |
| Thermal monitoring / fan control | TMP102 (TI), EMC2101 (Microchip), MAX31760 (ADI) | Explains thermal soak dropouts, derating oscillation, fan stall evidence. |
| Switch SoC (family examples) | Broadcom BCM53xx family, Marvell Prestera families | Frames stutter-without-linkdown issues as buffer/QoS/IGMP capability limits. |
H2-12. FAQs
Format: Each answer follows the same evidence chain: Short answer → What to measure (2 items) → First fix (1 action). This keeps troubleshooting inside the switch boundary (power, protection, thermal, switching telemetry, validation logs).
Why does a camera reboot when IR LEDs turn on? Maps to: H2-2 / H2-5 / H2-11
Short answer: IR LEDs create a step load that trips port current-limit or pulls the 54V bus into UVLO. What to measure: (1) actual_W / ilim_trip_count at IR-on timestamps, (2) bus_V_min / bus_uvlo_count. First fix: raise that port’s limit/priority and enable staged power-up or longer backoff.
Port shows “PoE on” but the camera never boots: what to check first? Maps to: H2-4 / H2-6 / H2-11
Short answer: “PoE on” can still be repeated retries, MPS loss, or a latched protection state. What to measure: (1) power_on_attempt_count + port_off_reason, (2) mps_lost_count or classify_fail_count. First fix: test with a short known-good cable + known-good PoE load, then tune inrush/backoff.
How to set power priority when total budget is insufficient? Maps to: H2-5
Short answer: define a deterministic shedding policy so critical ports are the last to lose power. What to measure: (1) per-port allocated_W vs actual_W, (2) system budget_margin_W with event timestamps. First fix: enforce “limit before disconnect,” then shed lowest-priority ports first, and restore in priority order with staged re-power.
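The “limit before disconnect” policy can be sketched as a two-pass allocator. This is an illustration under stated assumptions: the `Port` fields, the convention that priority 0 is most critical, and the idea of a per-port `floor_w` (a reduced allocation the camera can still run on) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    port_id: int
    priority: int     # 0 = most critical (assumed convention)
    demand_w: float   # power the PD currently draws
    floor_w: float    # reduced allocation when limited (assumed)


def allocate(ports: list[Port], budget_w: float) -> dict[int, float]:
    """Limit least-critical ports first; disconnect only if limits are not enough."""
    alloc = {p.port_id: p.demand_w for p in ports}
    least_critical_first = sorted(ports, key=lambda p: (-p.priority, -p.port_id))
    # Pass 1: apply per-port limits, least critical first.
    for p in least_critical_first:
        if sum(alloc.values()) <= budget_w:
            return alloc
        alloc[p.port_id] = min(p.demand_w, p.floor_w)
    # Pass 2: disconnect, least critical first, until the budget holds.
    for p in least_critical_first:
        if sum(alloc.values()) <= budget_w:
            break
        alloc[p.port_id] = 0.0
    return alloc
```

The fixed sort key makes the result deterministic for a given port table and budget, which is what makes the resulting logs explainable.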
Why do only long cable runs fail detection/classification? Maps to: H2-3 / H2-9
Short answer: long cables amplify resistance/leakage effects and reduce detect/classify margin, especially with moisture and surge-protection parasitics. What to measure: (1) detection_fail_count/classify_fail_count by cable length, (2) port current/voltage during detect/classify if available. First fix: validate with a short cable; if stable, treat field cabling/terminations as the first remediation.
OCP trips randomly: load issue or inrush/hot-plug behavior? Maps to: H2-4 / H2-11
Short answer: “random” OCP is usually timing-correlated to hot-plug, IR steps, or simultaneous port restarts. What to measure: (1) ilim_trip_count with timestamps, (2) power_on_attempt_count and bus_V_min around each trip. First fix: increase minimum off-time/backoff and stage port re-power to avoid synchronized inrush.
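Staged re-power and backoff can be sketched with two small helpers. The stagger interval, the exponential backoff shape, and the cap are assumed values for illustration; real PSE controllers expose their own timing parameters.

```python
def repower_schedule(port_ids: list[int],
                     stagger_s: float = 2.0,
                     start_s: float = 0.0) -> dict[int, float]:
    """Map each port to a power-on time so inrush events do not overlap.

    The caller passes ports in restore order (for example, critical first).
    """
    return {pid: start_s + i * stagger_s for i, pid in enumerate(port_ids)}


def backoff_s(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Minimum off-time before retry number `attempt` (0-based), capped."""
    return min(cap_s, base_s * 2 ** attempt)
```

Logging the scheduled time next to each ilim_trip_count increment makes it easier to see whether trips still cluster around synchronized restarts.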
All ports drop together: PSU UVLO or protection coordination problem? Maps to: H2-5 / H2-9 / H2-11
Short answer: a whole-box drop almost always starts at the 54V bus (UVLO/OVP) or an upstream protection event. What to measure: (1) bus_uvlo_count + bus_V_min, (2) PSU temperature/fault flags and event logs. First fix: prioritize port-level limiting over bus collapse, and implement staged power-up to prevent bus sag cascades.
Video stutters but link stays up: QoS/IGMP or buffer contention? Maps to: H2-7 / H2-10
Short answer: stutter-without-linkdown is typically queue drops, multicast flooding, or uplink contention during bursty I-frames. What to measure: (1) per_queue_drop/queue_occupancy, (2) mcast_flood_count and IGMP table state. First fix: correct QoS queue mapping and IGMP snooping/querier behavior, then re-validate burst loss.
Thermal derating triggers too early: what are the typical hotspots? Maps to: H2-8
Short answer: early derating is usually sensor placement or local hotspots near PSE FETs, rectifiers, magnetics, PSU, or the switch SoC. What to measure: (1) T_pse/T_hotspot slope at load, (2) derating_state vs fan_rpm. First fix: improve airflow path and add hysteresis/minimum-on-time to avoid derating oscillation.
After a lightning storm, some ports are dead: what evidence distinguishes PHY vs PSE damage? Maps to: H2-9 / H2-11
Short answer: use “power vs link” separation plus detect/classify evidence to localize the damage boundary. What to measure: (1) detection_fail_count/classify_fail_count and port_short_state, (2) link status + error counters (CRC) on that port. First fix: retest with short known-good cable/load first; then service the affected port domain only.
Can I oversubscribe PoE budget safely for CCTV? Maps to: H2-5
Short answer: yes, but only with explicit priorities, per-port limits, and a predictable shedding/recovery policy. What to measure: (1) peak actual_W distribution (IR/heater/PTZ bursts), (2) frequency of limit/deny events per port. First fix: oversubscribe only after defining “limit-first, disconnect-last” and restoring critical ports first with staged re-power.
How should I log PoE events for maintenance and SLA? Maps to: H2-6
Short answer: log PoE as time-correlated events with reason codes so outages are explainable and repeatable. What to measure: (1) minimum fields: timestamp, port_id, class/requested_W, actual_W, port_off_reason, trip counters, duration, (2) system context: bus_V_min, temperatures, fan RPM. First fix: standardize a single event schema across CLI/SNMP/Web exports.
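One way to picture a single event schema is a record type serialized to one JSON line per event. The field names below follow the minimum fields listed in the answer, but the exact names, types, and units are assumptions for illustration rather than an established format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PoeEvent:
    """One PoE event record (illustrative schema)."""
    timestamp: str          # ISO 8601, UTC
    port_id: int
    poe_class: int
    requested_w: float
    actual_w: float
    port_off_reason: str    # reason code, e.g. "OCP", "UVLO", "MPS_LOST"
    trip_count: int
    duration_s: float
    bus_v_min: float        # system context
    hotspot_temp_c: float   # system context
    fan_rpm: int            # system context


def to_json_line(ev: PoeEvent) -> str:
    """Serialize with sorted keys so CLI, SNMP, and web exports diff cleanly."""
    return json.dumps(asdict(ev), sort_keys=True)
```

Keeping per-port fields and system context in the same record lets a single log line show whether a port trip coincided with a bus sag or a thermal event.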
What’s the minimum production test to avoid field PoE issues? Maps to: H2-10
Short answer: a minimal gate must cover power bring-up, burst traffic loss, thermal soak, and recoverability after protection events. What to measure: (1) per-port power-on success + OCP response + MPS stability, (2) burst loss/IGMP stability and full-load temperature/derating behavior. First fix: ship only if logs prove deterministic behavior and controlled recovery under worst-case load.