
PoE Switch for CCTV: PSE Power Budgeting, Switching & Protection


Core idea: A CCTV PoE switch is not “just a switch”—it must deliver predictable per-port power and stable video switching under burst loads, heat, and outdoor surges, with logs that explain every dropout and recovery.

If power budgeting, protection/thermal derating, and telemetry are engineered as one closed-loop system, cameras stay online and video stays smooth even at high port density.

H2-1. Definition & Boundary (What a CCTV PoE Switch Must Guarantee)

Definition (engineering-grade): A CCTV PoE switch is Ethernet switching plus controlled PoE sourcing (PSE) plus industrial survivability (protection, thermal policy, and logs) to keep cameras powered and streaming without unpredictable port drops.

This page focuses on the switch-side guarantees: power continuity, deterministic recovery behavior, video-friendly L2 functions, and measurable evidence (telemetry + event logs).

3 acceptance KPIs (written as verifiable outcomes):

KPI 1

Power uptime (no “reboot loops” under normal peaks)

Evidence to expose: port_power_on_count, OCP_trip_count, UVLO_event_count, MPS_drop_count, and minimum bus voltage bus_48V_min.

KPI 2

Port stability (no frequent flap or renegotiation)

Evidence to expose: link_flap_count, poe_detect_fail, classify_fail, power_denied_budget, and per-port last fault reason code.

KPI 3

Thermal headroom (sustained load with deterministic derating)

Evidence to expose: hotspot_temp, derating_state, fan_rpm, and “why” codes (OTP, power-budget, airflow).

Why these KPIs work: they map directly to field symptoms (random reboots, link drops, heat-related failures) and can be validated by counters and timestamps—no reliance on subjective “seems stable”.

Boundary (what this page will and will not cover):

  • In scope: PoE PSE behavior (budgeting, metering, recovery), switching features that matter for video (VLAN/QoS/IGMP), thermal/derating policy, surge/ESD survivability, and event logging.
  • Out of scope: camera ISP/codec tuning, NVR/VMS ingest architecture, cloud management systems, and deep protocol/security derivations.

[Figure: system-plane block diagram. Power plane: AC/DC or -48V input → 48–57V PSU (budget source) → PSE array (limit + meter) → RJ45 PoE ports. Data plane: switch ASIC (VLAN / QoS / IGMP) → PHY + magnetics (link integrity) → RJ45 camera ports + uplink. Control & telemetry: MCU/BMC with sensors, counters, fan and protection state, exposed via logs/CLI/SNMP as evidence for the SLA.]
Caption: Power, data, and telemetry planes are coupled; PoE actions (inrush, derating, protection) must be observable and must not destabilize switching.
Figure F1. A CCTV PoE switch is best treated as three coupled planes: (1) controlled power sourcing (PSE), (2) switching fabric for video traffic, and (3) telemetry/logs that explain every fault and recovery action.

H2-2. CCTV Workload Model (Traffic + Power Profile You Design For)

Design input must be explicit: CCTV loads are not “steady resistors”. They create power events (step changes and short bursts) and traffic events (bitrate spikes and re-join storms). A robust PoE switch is designed to survive these events without turning them into port resets or packet loss.

  • Power axis: steady draw + IR/heater step + PTZ/motor burst → impacts PSE current limiting, bus droop, and derating policy.
  • Traffic axis: multi-stream VBR + I-frame spikes + multicast previews → impacts queues, uplink congestion, and IGMP behavior.
  • Coupling risk: a power event can trigger link renegotiation; a link event can trigger traffic bursts; both can cascade if policies are not deterministic.

Constraint checklist (use as engineering acceptance inputs):

  • Total power budget: PSU rating minus efficiency, temperature derating, and reserve margin.
  • Per-port transient margin: step and burst tolerance without false OCP or UVLO.
  • Backplane and uplink headroom: worst-case aggregation + burst spikes.
  • Queue/QoS + IGMP stability: prevent video starvation and multicast flooding.
A “good-looking” spec is insufficient unless it states peak delta, peak duration, and the switch’s deterministic action when the budget is exceeded.
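The first checklist item reduces to a first-pass arithmetic check. A minimal sketch in Python, where the efficiency, derating, and reserve figures are illustrative planning assumptions rather than vendor data:

```python
def available_budget_w(psu_rated_w: float, efficiency: float = 0.92,
                       thermal_derate: float = 0.90,
                       reserve_w: float = 30.0) -> float:
    """Sustained PoE budget after efficiency loss, worst-case thermal
    derating, and a fixed reserve. All factors are planning assumptions."""
    usable = psu_rated_w * efficiency * thermal_derate
    return max(0.0, usable - reserve_w)
```

With these assumed factors, a 600 W PSU yields roughly 467 W of sustained PoE budget, well below the nameplate number, which is exactly why budgeting against the nameplate fails in the field.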

Table A — Port class → power profile → PoE class target (typical engineering ranges)

Port load class | Steady power (W) | Peak delta (W) | Peak duration | PoE target | Switch-side policy to define
Fixed camera (basic) | 6–12 | +2–6 (IR on) | seconds–minutes | 802.3af / 802.3at | Per-port limit + “no reboot” threshold; staggered start optional
IR-heavy / dual-illuminator | 8–15 | +6–15 (IR step) | minutes | 802.3at | Step handling: avoid false OCP; verify bus droop margin
Heated outdoor camera | 10–18 | +8–20 (heater) | minutes–hours | 802.3at / 802.3bt | Derating: preserve critical ports; define “budget exceeded” action
PTZ / motorized | 12–22 | +15–40 (motor burst) | 100 ms–2 s | 802.3bt (Type 3) | Burst tolerance: foldback vs trip; recovery behavior (no oscillation)
High-power edge device | 20–45 | +10–30 | seconds | 802.3bt (Type 3/4) | Strict priority + budget reservation; detailed metering and alarms

Numbers are typical planning ranges to define margins and policies; validation must confirm actual peak delta and duration for the target camera set.

Table B — Traffic patterns → switching functions → what to measure

Traffic pattern | Common symptom if mis-handled | Switch feature to apply | What to measure (evidence)
Multi-stream VBR (normal recording) | Random stutter / mosaic | QoS queues for video, avoid oversubscription | Per-queue drops, uplink utilization, per-port RX/TX drops
I-frame / event spikes | Short freezes at the same time across cameras | Queue headroom + sensible buffer thresholds | Microburst drop counters, queue depth peaks
Multicast preview / wall | Flooding, uplink congestion | IGMP snooping (and querier where needed) | IGMP tables, multicast replication counters, CPU load
Re-join storm (after power event) | Uplink saturation, long recovery | Deterministic port recovery + rate limiting where appropriate | Port flap count, reauth/rejoin count, uplink peak utilization
Segmentation requirement | Cross-talk between camera and uplink domains | VLAN isolation (camera / uplink / maintenance) | VLAN membership audits, unexpected broadcast counts

[Figure: power-event load profile (power vs time). A single port shows steady draw, a step (IR/heater), and a short burst (PTZ motor); the system total (steady sum + steps + aligned bursts) approaches the budget line when events align in time, creating a risk zone.]
Caption: CCTV loads create steps (IR/heater) and short bursts (PTZ). The switch must enforce a deterministic policy when the system approaches the budget line.
Figure F2. A CCTV PoE switch is engineered against event-driven power profiles. The objective is to tolerate step and burst events without turning them into OCP/UVLO oscillations or link flaps, especially when many ports align in time.

H2-3. PoE Standards & Power Negotiation (af/at/bt, 2-pair/4-pair, LLDP)

Goal: Define what the switch PSE must do (and prove with logs) during PoE negotiation—without turning this chapter into a standards textbook. The engineering objective is predictable allocation, bounded power, and deterministic recovery for CCTV loads.

Key idea: every failure branch in negotiation should map to a reason code and a counter, so field issues never become “random”.

Stage 1 — Detect

Detection confirms a valid PD signature before power is applied.

  • Primary risk: false detect / detect fail under long cable, moisture, or leakage paths.
  • What to log: attempt count, fail count, and a stable reason code (e.g., DETECT_FAIL).

Evidence fields: detect_attempt_count, detect_fail_count, port_fault_reason

Stage 2 — Classify

Classification selects the safe power envelope for the port.

  • Primary risk: mis-classification → “power-on loops” (too low) or budget collapse (too high).
  • Engineering requirement: classification results must be repeatable for the same port + cable.

Evidence fields: classify_result, classify_fail_count, allocated_power_W, power_denied_budget

Stage 3 — Power On / Maintain

Power on must tame inrush; maintain must avoid false drop due to MPS behavior.

  • Primary risk: inrush triggers OCP / UVLO → repeated resets; MPS drop triggers false off.
  • Engineering requirement: bounded retries + backoff, never endless oscillation.

Evidence fields: power_on_fail, power_on_retry_count, OCP_trip_count, UVLO_event_count, MPS_drop_count

Stage 4 — LLDP Power (optional)

If supported, LLDP power enables dynamic requests, but must remain bounded and auditable.

  • Benefit: tighter budgeting and metered allocation.
  • Risk: unexpected power changes if policies are inconsistent.
  • CCTV preference: “predictable, capped power” with priority for critical ports.

Evidence fields: lldp_power_request_W, lldp_power_grant_W, lldp_change_count

Design rule (CCTV): prefer a policy that is predictable (stable class), limit-able (per-port cap), and explainable (reason-coded events), because CCTV failures are judged by uptime and service calls.
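The staged negotiation above can be condensed into a reason-coded decision walk. A minimal sketch, assuming simplified boolean stage results as inputs (real PSE controllers report these through hardware status registers); the reason codes match the evidence fields listed in this chapter:

```python
def poe_port_fsm(detect_ok: bool, classify_ok: bool, requested_w: float,
                 remaining_budget_w: float, inrush_ok: bool):
    """Walk Detect -> Classify -> Budget check -> Power On for one attempt.
    Returns (port_state, reason_code); inputs stand in for PSE hardware."""
    if not detect_ok:
        return ("OFF", "DETECT_FAIL")       # detect_fail_count += 1
    if not classify_ok:
        return ("OFF", "CLASS_FAIL")        # classify_fail_count += 1
    if requested_w > remaining_budget_w:
        return ("OFF", "BUDGET")            # power_denied_budget += 1
    if not inrush_ok:
        return ("RETRY_BACKOFF", "OCP")     # bounded retries, never endless
    return ("ON", "OK")
```

The value of the shape is that every exit path names a reason code, so a field dropout always maps to a counter rather than to a guess.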

Planning table — PoE level → usable power → CCTV load class → margin & policy

PoE level | Pairs | Usable power (planning) | Typical CCTV load class (switch view) | Margin rule | Policy note (CCTV)
802.3af | 2-pair | ~10–13 W | steady + small step (IR) | keep 20–30% headroom | fixed cap, avoid frequent renegotiation
802.3at | 2-pair | ~20–25 W | steady + step (IR/heater) | reserve for cold-start and cable loss | staggered start under high density
802.3bt Type 3 | 4-pair | ~45–60 W | steady + bursts (PTZ motor) | budget for short bursts | foldback preferred over hard trip
802.3bt Type 4 | 4-pair | ~70–90 W | high-power edge device | strict per-port cap + priority | LLDP can help, but cap must be enforced

Planning values are for setting policies and headroom. Verification should confirm real peak delta and duration for the target CCTV device set.

[Figure: PSE-side negotiation sequence. Detect (PD present?) → Classify (power class) → Power On (tame inrush) → Maintain (MPS valid), with failure branches to Retry/Lockout (DETECT_FAIL), Default/Deny (CLASS_FAIL / BUDGET), Backoff & Retry (OCP / UVLO), and Controlled Off (MPS / OTP). Optional LLDP power path: request → capped grant, with every change logged.]
Caption: PoE negotiation must be deterministic. Each failure branch (DETECT/CLASS/BUDGET/OCP/UVLO/MPS/OTP) should map to a reason code and counter.
Figure F3. Treat PoE as a staged state machine. CCTV-grade behavior requires bounded retries, stable classification, and reason-coded fault paths that can be audited in the field.

H2-4. PSE Power Path Architecture (Per-Port vs Multiport, Hot-Swap & Inrush)

Goal: Explain the hardware skeleton of the PSE power path and the behaviors that determine real-world uptime: fault isolation, inrush control, MPS robustness, and deterministic dropback policies. This chapter describes port-side PSE architecture (not upstream PSU topologies).

Architecture choice

Per-port vs multiport architectures decide whether faults are isolated or contagious.

  • Per-port isolation: one port trips without dragging neighbors.
  • Multiport sharing: bus sag or protection actions can cascade across a group.

Evidence fields: port_fault_count, group_fault_count, bus_48V_min

Hot-plug / Inrush

Hot-plug events are where CCTV ports most often “loop” (power on → trip → retry).

  • Failure mechanism: inrush → OCP/UVLO → retry oscillation.
  • Requirement: foldback / controlled ramp, plus bounded retries and backoff.

Evidence fields: power_on_retry_count, OCP_trip_count, UVLO_event_count

Maintain (MPS)

MPS robustness prevents false power-off when the load behavior is intermittent.

  • Failure symptom: “works for minutes, then drops” with no clear trigger.
  • Requirement: log MPS drops and correlate with port state transitions.

Evidence fields: MPS_drop_count, port_state, port_fault_reason

Dropback policy

Deterministic dropback defines what happens after OCP/OTP/budget events.

  • Must define: retry cap, backoff time, lockout conditions, restore conditions.
  • CCTV expectation: critical ports should survive via priority before non-critical ports.

Evidence fields: retry_backoff_ms, lockout_state, derating_state

Checklist mindset: a port power path is correct only if it can answer three questions with data: (1) what limited or tripped, (2) why it happened, and (3) why the recovery action chosen was safe and repeatable.
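The dropback rules above (retry cap, backoff time, lockout) can be sketched as one policy function. The cap and backoff constants below are illustrative assumptions to be tuned per platform:

```python
def next_retry_action(retry_count: int, retry_cap: int = 3,
                      base_backoff_ms: int = 500,
                      max_backoff_ms: int = 8000):
    """Bounded retries with exponential backoff, then lockout.
    Returns (action, backoff_ms); constants are illustrative."""
    if retry_count >= retry_cap:
        return ("LOCKOUT", 0)   # exit only via an explicit restore condition
    backoff_ms = min(base_backoff_ms * (2 ** retry_count), max_backoff_ms)
    return ("RETRY", backoff_ms)
```

The key property is that the retry sequence terminates: either the port comes up, or it lands in a lockout state that requires a defined restore condition, never an endless oscillation.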

Design checklist — mandatory building blocks (port power path)

  • Protection coordination: input surge/ESD path that does not force a whole-switch brownout.
  • Power switch + thermal: MOSFET sized for sustained dissipation; predictable derating when hot.
  • Sensing and metering: current sense placement that avoids noise-triggered false trips.
  • Control behavior: inrush limiting + foldback strategy + bounded retries/backoff + lockout states.
  • Observability: reason codes and counters tied to every state transition (on/off/retry/deny).
Common pitfalls: noisy sense routing (false OCP), under-designed FET thermal (heat-only drops), and protection mismatch (surge causes cascading resets).

[Figure: single-port PSE power path skeleton. 48–57V bus rail → surge/ESD protection → PSE controller (detect/classify) → power switch (MOSFET + ramp, heat) → current sense (meter + limit, noise) → RJ45 to cable. Observability (must-have): per-port reason codes + counters for OCP, UVLO, OTP, MPS, BUDGET, all logged.]
Caption: A single PoE port power path must isolate faults, tame inrush, measure current reliably, and expose reason-coded events (OCP/UVLO/OTP/MPS/BUDGET).
Figure F4. The port power path is a repeatable skeleton: protection → PSE control → power switch → sense → RJ45. Stability comes from controlled inrush and deterministic policy, while serviceability depends on reason-coded logs.

H2-5. Power Budgeting & Allocation (Guaranteed Power vs Oversubscription)

Goal: Turn “limited total power” into a deterministic service policy: critical ports stay powered, non-critical ports degrade in a controlled way, and every action is explainable via reason-coded logs.

This chapter defines budgeting rules and shed/restore behavior. It does not deep-dive upstream PSU topologies.

Layer A — Available Power

Budget is sustained power, not a nameplate number.

  • Accounting: conversion efficiency loss + thermal derating + reserve.
  • Engineering outcome: an explicit available_budget_W under worst-case temperature.

Evidence fields: available_budget_W, reserve_W, derating_state

Layer B — Guaranteed Power

Guaranteed power protects critical CCTV ports from brownout and oscillation.

  • Applies to: entrance/exit cameras, critical PTZ zones, safety-related coverage.
  • Rule: guaranteed allocations must not be reclaimed during budget stress.

Evidence fields: port_priority, guaranteed_W, guaranteed_total_W

Layer C — Oversubscription Zone

Oversubscription (sum of caps > budget) is allowed only with a documented shed policy.

  • Trigger: budget waterline crosses Warning/Critical.
  • Requirement: cap-down first, selective off second, staged restore last.

Evidence fields: oversubscription_enabled, budget_waterline, shedding_active

Policy invariant: the same inputs (budget state + port priority + measured power) should produce the same actions every time. “Random port drops” usually trace back to a missing policy invariant or missing reason codes.

Priority policy example (CCTV)

Port class | Priority | Guaranteed power | Per-port cap | Shed order (when Critical) | Restore order
Entrance / Critical coverage | P0 | Yes (protect) | cap-high (bounded) | Last to shed (avoid off) | First restore
PTZ / burst load | P1 | Partial (if needed) | cap with burst allowance | Cap-down before off | Second restore
Normal cameras | P2 | No | cap-medium | First to shed (selective off) | Staged restore

The table is a template. The key requirement is that the switch can expose and audit the policy (priority, caps, shed/restore sequence).

Decision tree — what happens when total power is not enough

1) Soft action: cap-down
  • Trigger: budget crosses Warning waterline.
  • Action: reduce caps on P2, then P1 if needed.
  • Log: event_type=CAP_APPLIED, cap_target_W, port_id, reason=BUDGET
2) Grace window: stabilize
  • Trigger: cap-down applied; wait for bus and thermals to recover.
  • Action: hold state for a bounded window.
  • Log: grace_timer_ms, bus_48V_min, hotspot_temp
3) Hard action: selective off
  • Trigger: budget remains in Critical after grace window.
  • Action: turn off lowest priority ports (P2 first), using a fixed ordering rule.
  • Log: event_type=PORT_OFF, reason=BUDGET_SHED, duration_ms
4) Restore: staged power-up
  • Trigger: budget returns to Safe waterline for a stable window.
  • Action: restore ports in priority order, with backoff to avoid synchronized inrush.
  • Log: event_type=RESTORE, restore_backoff_ms, restore_sequence_id
Behavior definition (must be explicit): when power is insufficient, decide whether the first response is cap-down or immediate off, which port selection rule is used, and what the restore conditions are. Ambiguity here creates field instability.
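The decision tree above can be condensed into a deterministic policy function plus a fixed shed-ordering rule. A sketch under assumptions: waterline names follow the tree, priorities are encoded numerically (P0 = 0 is highest), and the port record shape is hypothetical:

```python
def budget_action(waterline: str, in_grace: bool, shed_active: bool) -> str:
    """First move for a given budget waterline (names follow the tree above)."""
    if waterline == "WARNING":
        return "CAP_DOWN"                                       # 1) soft action
    if waterline == "CRITICAL":
        return "HOLD_GRACE" if in_grace else "SELECTIVE_OFF"    # 2) / 3)
    if waterline == "SAFE" and shed_active:
        return "STAGED_RESTORE"                                 # 4) staged power-up
    return "NO_ACTION"

def shed_order(ports: list) -> list:
    """Fixed selection rule: lowest priority class first (P2 before P1
    before P0), highest measured draw first within a class."""
    return sorted(ports, key=lambda p: (-p["priority"], -p["watts"]))
```

Because both functions are pure mappings from measured state to action, the same inputs always produce the same shed/restore sequence, which is exactly the policy invariant this chapter demands.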

[Figure: budget waterline model. Available sustained budget with SAFE / WARNING / CRITICAL waterlines feeding valves per port group: P0 critical ports (guaranteed power, protected), P1 managed ports (cap-down first, off second), P2 best-effort ports (first to shed, selective off). Actions: CAP (soft) → SHED (hard) → RESTORE (staged); every action must emit reason code + counter + duration.]
Caption: Budget waterlines drive deterministic actions—cap-down first, selective shed next, staged restore last—prioritized to protect critical CCTV ports.
Figure F5. Model budgeting as a waterline system. Oversubscription is acceptable only if the shed/restore sequence is deterministic and auditable with reason-coded events.

H2-6. Power Metering & Telemetry (What to Measure, What to Log)

Goal: Make observability executable: define what to measure, what to count, and what to log so field faults map to evidence (not guesses). Telemetry should explain OCP/UVLO/OTP/MPS/budget actions per port.

Port-level metrics
  • Metering: V_port, I_port, P_port (avg/peak suggested).
  • Counters: power_on_count, OCP_trip_count, MPS_drop_count.
  • Reason codes: port_fault_reason = OCP/UVLO/OTP/MPS/BUDGET/DETECT/CLASS.

Use: distinguish “inrush loop” vs “thermal drop” vs “budget shed”.

System-level metrics
  • Bus health: bus_48V_min, bus_48V_avg.
  • Thermals: psu_temp, hotspot_temp, derating_state.
  • Cooling: fan_rpm + fan_fault.

Use: prove whether a port fault is isolated or system-wide.

Event log schema (must-have)

A minimal event record should be consistent across all actions:

  • timestamp, event_type, port_id
  • reason_code, policy_action
  • V/I/P snapshot (optional but recommended)
  • duration_ms (when applicable)
Exposure (local only)
  • CLI: live port state + last-N events + counters.
  • SNMP: counters + alarms + waterline state.
  • Web UI: readable dashboards with exportable events.

Only “what to expose” is defined here (no cloud/remote platform design).

Audit rule: any automatic action (CAP/SHED/RESTORE/LOCKOUT/DERATE) must be visible as (1) an event record, (2) an incremented counter, and (3) a state transition. If one of these is missing, root-cause becomes guesswork.
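The minimal schema above maps naturally onto a small record type plus the "one event, one counter" audit rule. A sketch with field names taken from the list above (the record and counter shapes themselves are assumptions):

```python
import time
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class PoeEvent:
    """Minimal event record per the schema above (shape is an assumption)."""
    event_type: str              # e.g. PORT_OFF, CAP_APPLIED, RESTORE
    port_id: int
    reason_code: str             # OCP / UVLO / OTP / MPS / BUDGET / DETECT / CLASS
    policy_action: str
    duration_ms: Optional[int] = None
    timestamp: float = field(default_factory=time.time)

counters: dict = {}

def emit(event: PoeEvent) -> dict:
    """Audit rule: every action yields an event record AND a counter bump."""
    key = f"{event.event_type}:{event.reason_code}"
    counters[key] = counters.get(key, 0) + 1
    return asdict(event)
```

Keeping the record shape identical across CAP, SHED, RESTORE, LOCKOUT, and DERATE events is what makes the exported log joinable with the counters during root-cause analysis.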

Metrics → threshold style → fault discriminator → next action

Metric / counter | Threshold style (how to use) | Fault discriminator (what it proves) | Next action (first move)
OCP_trip_count (per port) | rate spike (per minute/hour) | inrush/overload loop vs stable load | check power-on events and cap policy; apply foldback or raise grace/backoff
MPS_drop_count (per port) | correlate with idle/low-load intervals | false maintain failure vs real disconnect | confirm MPS policy and port state transitions; avoid oscillation by bounded retry
bus_48V_min (system) | min over window (e.g., 1–10 s) | system brownout vs isolated port trip | if system-wide sag, prioritize cap-down/shed and stagger restore
hotspot_temp + derating_state | temperature threshold with hysteresis | thermal-driven drops vs electrical faults | activate derate before off; verify fan state and airflow
event_type=PORT_OFF with reason=BUDGET_SHED | count + duration | policy shed (expected) vs protection trip (unexpected) | review oversubscription settings; validate restore order/backoff

Threshold values should be site-tuned. The deliverable here is the mapping: which metric proves which failure class, and which action is safe to take first.
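The "rate spike" threshold style in the first table row can be sketched as a sliding-window event counter. Window length and trip limit below are site-tuned assumptions, per the note above:

```python
from collections import deque

class RateSpikeDetector:
    """Sliding-window counter: 'spiking' when more than `limit` events
    fall inside `window_s` seconds. Both values are site-tuned."""

    def __init__(self, window_s: float = 60.0, limit: int = 3):
        self.window_s = window_s
        self.limit = limit
        self.events = deque()

    def record(self, t: float) -> bool:
        """Record one trip at time t (monotonic seconds); True if spiking."""
        self.events.append(t)
        while self.events and t - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) > self.limit
```

A spiking OCP_trip_count then triggers the table's first move (inspect power-on events and cap policy) instead of being dismissed as a one-off trip.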

[Figure: telemetry dataflow (local observability). Sensors (per-port V/I/P, 48V bus, temperatures, fan tach) → MCU/control plane (aggregate, timestamp, reason-code; event builder with type + port + reason, counters for OCP/MPS/shed, state machine for cap/shed/restore) → outputs (local log with exportable events, CLI state + last-N, SNMP counters + alarms, Web UI dashboards). Reason codes (OCP / UVLO / OTP / MPS / BUDGET) should be consistent across all outputs.]
Caption: Telemetry should flow from sensors to a reason-coded event log and counters, then be exposed consistently via CLI/SNMP/Web for field diagnosis.
Figure F6. A minimal, high-value telemetry pipeline: measure, timestamp, reason-code, log, and expose. Consistent reason codes are the bridge from symptoms to root cause.

H2-7. Switching Fabric for Video (L2/L3 Features That Actually Matter)

Goal: Keep CCTV video stable under real load: low loss, bounded jitter, and predictable behavior during bursts (I-frames / event spikes). The focus is on configuration targets and how to verify them, not generic networking theory.

L2 — VLAN isolation
  • Target: separate Camera VLAN, Uplink/NVR VLAN, and O&M VLAN to reduce broadcast noise and limit lateral exposure.
  • Failure it prevents: unknown-unicast/broadcast flooding that steals buffers and creates microburst drops.
  • Verification: check broadcast/unknown-unicast counters and ensure cameras cannot talk across VLANs without explicit policy.
L2 — QoS for video
  • Target: ensure video traffic stays in the higher-priority queue during congestion.
  • Failure it prevents: video drops/jitter when management traffic or uplink contention occurs.
  • Verification: under a controlled congestion test, video queue drops remain near zero while lower queues absorb loss.

Evidence fields: per_queue_drop, queue_occupancy, egress_rate

L2 — IGMP Snooping (multicast)
  • Target: multicast streams only reach ports that joined the group; avoid “multicast = flood”.
  • Failure it prevents: multicast flooding that burns bandwidth, heats magnetics, and causes loss spikes.
  • Verification: non-subscribed ports should receive no multicast stream; group table should be stable.

Evidence fields: igmp_group_table, querier_state, mcast_flood_count

Ring (optional) — RSTP / ERPS
  • Use only if device is placed in industrial ring topologies.
  • Target: bounded recovery time after link break without persistent storms.
  • Verification: link pull test: measure video recovery time and topology-change event logs.

Evidence fields: topology_change_count, convergence_event_ts

L3 (minimal) — static route / ACL
  • Target: minimal segmentation and least-exposure between VLANs (only required flows allowed).
  • Keep it bounded: no deep routing features; use only what supports CCTV segmentation and auditability.
  • Verification: ACL hit counters show allowed flows; denies are counted and time-stamped.

Evidence fields: acl_hit_count, acl_deny_count

Acceptance metrics (must be measurable)
  • Loss: per-port drops and per-queue drops during normal + stress tests.
  • Jitter: bounded variation under uplink contention (watch buffers/queues).
  • Microbursts: spikes from I-frames/events should not collapse queues or trigger multicast floods.

The deliverable is a verification plan: inject congestion, observe queue drops, and confirm IGMP membership behavior.

Verification principle: every “feature” must map to (1) a specific failure mode, (2) a test method, and (3) evidence counters/log fields. Otherwise the feature is not operationally useful.

Feature → problem → verification → evidence (CCTV-focused)

Feature | Solves (what breaks without it) | Verification (how to test) | Evidence (counters/logs)
VLAN isolation (Camera/Uplink/O&M) | broadcast/unknown flooding, lateral exposure, buffer instability | send broadcast/unknown-unicast stimulus; confirm containment per VLAN | bcast_count, unknown_ucast_count, MAC learning table stability
QoS (video-priority queue) | video drops/jitter under uplink contention | create congestion on uplink; check video queue drops and latency stability | per_queue_drop, queue_occupancy, egress_rate
IGMP Snooping (+ querier presence) | multicast turning into flood, bandwidth/thermal stress, microburst loss | subscribe one port, keep one non-subscriber; confirm only member receives stream | igmp_group_table, querier_state, mcast_flood_count
RSTP/ERPS (optional) | link-break outage and storms in ring deployments | pull one ring link; measure recovery time and event count | topology_change_count, convergence_event_ts
Static route / ACL (minimal) | unbounded cross-segment reachability and un-audited exposure | attempt disallowed traffic; confirm deny with counted hits | acl_hit_count, acl_deny_count, login/audit logs

[Figure: VLAN + multicast (IGMP) topology. Camera VLAN (PoE camera ports A–D), O&M VLAN (management station), and Uplink/NVR VLAN (recorder and viewers) meet in the CCTV PoE switch, which runs IGMP snooping (group table), QoS queues (video priority), and an IGMP querier in the control plane. IGMP snooping forwards multicast only to group members; without a querier, membership may expire and revert to flooding. Acceptance: non-member ports must not receive multicast streams; verify via group table and per-port counters.]
Caption: VLAN isolation keeps camera traffic contained; IGMP snooping + a querier prevents multicast flooding by forwarding streams only to subscribed ports.
Figure F7. CCTV-focused switching: VLAN separation, QoS for video, and IGMP snooping to avoid multicast flood. Verification is evidence-based (group tables + per-port counters).

H2-8. Thermal Design & Derating (Port Density Is a Heat Problem)

Goal: Treat heat as an operational control problem: identify hotspots, define sensor placement, and implement a derating state machine that prevents port oscillation (drop/reconnect loops) while preserving critical coverage.

Heat-source map (where PoE switches fail first)

PSE power stages
  • per-port FETs and current sense regions
  • high density: many watts in a small footprint
  • field symptom: repeated port resets under high power
48V bus + protection
  • rectification/protection parts can heat during stress
  • transient events can create thermal spikes
  • field symptom: protection events that correlate with temperature
Switch ASIC / PHY
  • throughput-driven thermal rise
  • high junction temp may degrade buffering behavior
  • field symptom: throughput drop and loss spikes at elevated temps
Magnetics (RJ45)
  • continuous PoE current + ambient heat
  • hot magnetics can indicate airflow problems
  • field symptom: link instability and thermal stress near ports
Operational rule: derating decisions must be explainable with sensor evidence (which sensor tripped, what state was entered, which actions were applied, and for how long).

Thermal design checklist (SOP-style)

Heat spreading
  • hotspot copper spreading and thermal vias in PSE zones
  • separate heat islands: avoid stacking PSE + DC/DC + ASIC too tightly
  • ensure consistent contact pressure for heatsinks where used
Airflow & cooling
  • airflow should sweep the highest density heat sources first
  • fan tach monitoring + fault handling (blocked/stalled)
  • dust aging: assume airflow drops over time; derate strategy must cover it
Sensor placement
  • T1: PSE hotspot region (highest risk)
  • T2: switch ASIC region
  • T3: outlet/near magnetics (airflow indicator)

Sensors must map to actions and logs (not just “for display”).

Derating state machine (temperature → action → log)

State | Trigger style | Actions (ordered) | Evidence (events/counters)
SAFE | below threshold with margin | normal caps; allow high power per policy | derating_state=SAFE
WARN | T1/T2 rising; hold time window | cap-down P2 → cap-down P1; stagger new port power-ups | event_type=DERATE_ENTER, policy_action=CAP, cap_target_W
CRITICAL | T1/T2 exceeds critical; persists | disable highest power modes; selective shed P2 ports; enforce backoff | event_type=PORT_OFF, reason=OTP or reason=THERMAL_DERATE, duration_ms
RESTORE | temperature back to safe for stable window | staged restore by priority; release caps gradually | event_type=RESTORE, restore_sequence_id, restore_backoff_ms
Avoid oscillation: restore should require a stable “cool-down window”, and power-up should be staged to prevent synchronized inrush + thermal rebound.
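The state table above, including the anti-oscillation rule, can be sketched as a single transition function with hysteresis. All thresholds below are illustrative assumptions; `cool_s` stands for time already spent below the restore threshold:

```python
def derating_step(state: str, temp_c: float, cool_s: float,
                  warn_c: float = 75.0, crit_c: float = 90.0,
                  restore_c: float = 70.0, cool_window_s: float = 120.0) -> str:
    """One tick of the derating state machine. restore_c < warn_c gives
    hysteresis; restore also requires a sustained cool-down window."""
    if state in ("SAFE", "WARN") and temp_c >= crit_c:
        return "CRITICAL"
    if state == "SAFE" and temp_c >= warn_c:
        return "WARN"
    if state in ("WARN", "CRITICAL") and temp_c <= restore_c:
        return "RESTORE" if cool_s >= cool_window_s else state
    if state == "RESTORE":
        return "SAFE"   # staged re-power handled by the restore policy
    return state
```

The gap between `warn_c` and `restore_c`, combined with the cool-down window, is what prevents the drop/reconnect oscillation described above: a port that cools only briefly stays derated.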

[Figure: thermal path and derating control loop. Board heat sources (PSE, DC-DC, switch ASIC, magnetics) → heat spreader/heatsink and airflow (inlet → hotspots → outlet) → sensors T1/T2/T3 → thermal control (derating state machine: CAP P2 → P1, SHED selective off, RESTORE staged). Thermal failures often appear as "random drops"; require sensor evidence → state transition → logged actions with duration, with a stable cool-down window, staged restore, and backoff on re-power to prevent oscillation.]
Caption: Thermal reliability is a control loop—hotspots and airflow determine sensor readings, which drive derating actions (cap, shed, restore) that must be logged and auditable.
Figure F8. Treat heat as a closed-loop policy. Sensors are meaningful only when they deterministically drive actions and produce explainable logs.

H2-9. Protection & Survivability (OC/SC/Surge/ESD/Lightning, Grounding)

Goal: Outdoor long-cable CCTV requires protection that is layered, placed correctly, and recoverable. The deliverable is not “a TVS list” but a clear mapping of threat → component → placement → expected failure mode → verification → evidence logs.

Threat model (what hits the port)
  • ESD: very fast transients; must be handled at the connector edge and return path.
  • Surge: higher energy; needs clamp + discharge + impedance control to keep energy out of PHY/PSE.
  • Lightning / induced events: common-mode energy and ground potential differences; grounding/isolation boundary matters.
  • OC/SC: misuse, water ingress, damaged cables; must trip per-port first without collapsing the whole switch.
Layered port protection (placement-first)
  • Layer 0 — RJ45 edge: divert common-mode energy to chassis/ground early; keep return loop short.
  • Layer 1 — Isolation boundary: magnetics + isolation strategy defines what can cross into PHY domain.
  • Layer 2 — Chip guard: final clamp / filtering before PHY/PSE sense nodes.

The same component can be “right” or “wrong” depending on placement and return path.

48V bus / input survivability
  • Bus OVP/UVLO: keep 48–57V domain stable; protect the PSU and prevent undefined operation.
  • Bus short-circuit protection: isolate the fault; avoid a full-system brownout if the fault is per-port.
  • Input surge handling: AC/DC front-end must reject surges without dragging the PoE bus into reset cycles.
Protection coordination (avoid “whole box down”)
  • Port-first principle: per-port trip/cut should handle most cable faults.
  • Bus protection as last resort: bus trips only when the bus is actually threatened (not a single port).
  • Recovery policy: backoff + staged restore prevents oscillation after surge events.
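The coordination rules above can be sketched as a small recovery policy: per-port exponential backoff after a trip, and staged (one-at-a-time) restore so re-powered ports do not overlap their inrush. This is a minimal illustration; the class and method names, timing defaults, and the priority-ordered schedule are assumptions, not a vendor API.

```python
class RecoveryPolicy:
    """Per-port exponential backoff after trips, then staged restore."""

    def __init__(self, base_ms=500, factor=2, max_ms=30_000, stage_gap_ms=250):
        self.base_ms = base_ms
        self.factor = factor
        self.max_ms = max_ms
        self.stage_gap_ms = stage_gap_ms   # spacing between port re-powers
        self.trips = {}                    # port_id -> consecutive trip count

    def on_trip(self, port_id):
        # Record the trip and return how long to wait before re-powering.
        self.trips[port_id] = self.trips.get(port_id, 0) + 1
        return self.next_restore_delay_ms(port_id)

    def next_restore_delay_ms(self, port_id):
        n = self.trips.get(port_id, 0)
        return min(self.base_ms * (self.factor ** max(n - 1, 0)), self.max_ms)

    def on_stable(self, port_id):
        # A sustained good interval clears the backoff counter.
        self.trips.pop(port_id, None)

    def staged_restore_schedule(self, ports_by_priority):
        # Restore highest-priority ports first, spaced to avoid inrush overlap.
        return [(p, i * self.stage_gap_ms) for i, p in enumerate(ports_by_priority)]
```

Backoff prevents oscillation after a surge; the staged schedule is what keeps a multi-port restore from re-triggering bus UVLO.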
Evidence logging (surge vs unstable link)
  • Hard failure: power-on fail / link never up → distinguish PSE fault vs PHY fault by timestamps.
  • Intermittent: link flaps / CRC errors / drops → correlate with thermal and protection events.
  • Minimum log: timestamp, port, reason code, action taken, duration, recovery outcome.

Example evidence fields: port_off_reason, ilim_trip_count, link_flap_count, crc_error_count, bus_uvlo_count
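A minimal sketch of such an event record, assuming the field set listed above. The PortEvent name and the key=value line format are illustrative choices, not a standard schema; the point is that every dropout produces one parseable line.

```python
from dataclasses import dataclass, asdict

@dataclass
class PortEvent:
    timestamp_ms: int
    port: int
    reason_code: str       # e.g. "OCP", "UVLO", "ESD", "OTP"
    action: str            # e.g. "port_off", "limit", "restore"
    duration_ms: int
    recovery_outcome: str  # e.g. "recovered", "latched_off"

    def to_line(self) -> str:
        # One key=value line per event keeps surge-vs-flap triage scriptable.
        return " ".join(f"{k}={v}" for k, v in asdict(self).items())
```

With records like this, "hard failure vs intermittent" becomes a timestamp query rather than a guess.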

Field reality: “random disconnects” often come from protection oscillation. Survivability means: trip is localized, recovery is controlled, and logs explain what happened.

Threat → protection → placement → failure mode → verification → evidence

ESD
  • Protection parts (examples): clamp at the interface; controlled return path
  • Placement (must be explicit): RJ45 edge (Layer 0), shortest route to chassis/ground
  • Failure mode if missing: PHY upset, link flap, latent damage
  • Verification method: spot ESD injection; check link stability and error counters
  • Evidence (logs/counters): link_flap_count, crc_error_count, event_type=ESD

Surge (common/diff-mode)
  • Protection parts (examples): TVS / discharge element / impedance control
  • Placement (must be explicit): Layer 0 + Layer 1 boundary; keep energy out of PHY/PSE
  • Failure mode if missing: port dead, repeated trips, bus droop
  • Verification method: surge test (available fixture); validate trip locality + recovery
  • Evidence (logs/counters): port_off_reason, bus_V_min, recovery_time_ms

Lightning / induced
  • Protection parts (examples): grounding strategy + isolation boundary discipline
  • Placement (must be explicit): chassis/ground entry control; do not inject into logic ground
  • Failure mode if missing: multi-port instability, unpredictable resets
  • Verification method: long-cable stress; monitor common-mode related symptoms
  • Evidence (logs/counters): multi_port_fault_count, event timestamp correlation

OC/SC (water ingress / damaged cable)
  • Protection parts (examples): per-port current limit, fast cut, backoff
  • Placement (must be explicit): PSE per-port power path
  • Failure mode if missing: bus collapse, all ports drop, PSU resets
  • Verification method: short each port; confirm only the target port trips
  • Evidence (logs/counters): ilim_trip_count, port_off_reason=OCP/SC, bus_uvlo_count

Bus abnormal
  • Protection parts (examples): OVP/UVLO, bus SC protection
  • Placement (must be explicit): 48–57V bus & PSU input domains
  • Failure mode if missing: undefined brownout, repeated reboot cycles
  • Verification method: bus droop injection / PSU fault simulation
  • Evidence (logs/counters): bus_ovp_count, bus_uvlo_count, reboot logs
[Figure: Layered Port Protection, cutaway view. Layer 0 (RJ45 edge: TVS, GDT, CMC/choke), Layer 1 (magnetics / isolation boundary), Layer 2 (chip guard: clamp, RC, PSE sense) in front of the 48–57V domain. A short return path at Layer 0 reduces the energy reaching the PHY/PSE domains; per-port trips localize faults so bus protection does not brownout the whole switch.]
Caption: Layered port protection is defined by placement and return path—RJ45 edge diversion (Layer 0), isolation boundary discipline (Layer 1), and chip-guard clamps (Layer 2) keep surge energy out of PHY/PSE.
Figure F9. A cutaway view of a PoE port showing where protection lives and where surge energy should be steered. The key is coordination: localize faults per port and avoid full-bus brownouts.

H2-10. Validation & Production Test Plan (Pass/Fail Criteria You Can Ship With)

Goal: Turn Power/Network/Thermal/Protection into a repeatable shipping checklist with clear fixtures, steps, pass/fail criteria, and a consistent log field template for traceability.

Structure (four domains)
  • Power: power-up success rate, current limit response, MPS stability, bus droop margins.
  • Network: throughput, microburst loss, IGMP stability, VLAN isolation behavior.
  • Thermal: full-load soak, fan policy, derating entry/exit, staged restore without oscillation.
  • Protection: ESD/surge robustness with controllable recovery and explainable logs.
Minimum log template (traceability)
  • Required: timestamp, device_id, fw_version, test_id, domain, port_id, thresholds, result
  • Recommended: ambient, load_profile, traffic_profile, duration_ms
  • Events: reason_code, action, recovery_time_ms

A test without evidence fields cannot be debugged or audited after shipment.

Pass/fail philosophy
  • Inject stress (congestion / full-power / long soak / protection stimulus).
  • Observe counters (drops, faults, state transitions).
  • Require controlled recovery (no endless reboot loops, no port oscillation).
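The inject/observe/recover philosophy reduces to a counter-based verdict. A minimal sketch, assuming the test harness hands over a flat counters dict and per-counter limits; the recovery_controlled flag and all names here are hypothetical, not part of any fixture API.

```python
def verdict(counters: dict, limits: dict):
    """Counter-based pass/fail: each observed counter must stay within its
    limit, and recovery must be flagged as controlled (no loops/oscillation)."""
    failures = []
    for name, limit in limits.items():
        value = counters.get(name, 0)
        if value > limit:
            failures.append(f"{name}={value} exceeds limit {limit}")
    if not counters.get("recovery_controlled", False):
        failures.append("recovery was not controlled (loop/oscillation)")
    return (len(failures) == 0, failures)
```

Returning the reasons, not just a boolean, is what makes a failed unit debuggable after shipment.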

Master test table (fixture → steps → criteria → log fields)

PWR-01 (Power)
  • Fixture / setup: programmable PoE loads / per-port load profiles
  • Steps: power-up all ports by priority; repeat N cycles
  • Pass / fail criteria: power-up success rate meets target; no unexpected bus UVLO
  • Log fields (minimum): port_id, class, requested_W, actual_W, bus_V_min, result

PWR-02 (Power)
  • Fixture / setup: short/OC stimulus per port
  • Steps: apply OC/SC; verify localized trip + backoff
  • Pass / fail criteria: only target port trips; system remains stable; recovery controlled
  • Log fields (minimum): ilim_trip_count, port_off_reason, backoff_ms, bus_uvlo_count

NET-01 (Network)
  • Fixture / setup: traffic generator; uplink congestion injection
  • Steps: run video-like streams + add congestion
  • Pass / fail criteria: video-priority queue drops remain bounded; no collapse under microbursts
  • Log fields (minimum): per_queue_drop, queue_occupancy, port_drop, result

NET-02 (Network)
  • Fixture / setup: IGMP join/leave cycling
  • Steps: subscribe one port; keep one non-member; stream multicast
  • Pass / fail criteria: non-member receives no multicast; group table stable
  • Log fields (minimum): igmp_group_table, querier_state, mcast_flood_count

THM-01 (Thermal)
  • Fixture / setup: full-power + full-traffic soak; ambient control if available
  • Steps: run for soak time; log T1/T2/T3 and fan tach
  • Pass / fail criteria: temps within limits, or derating enters as designed; no oscillation
  • Log fields (minimum): T1, T2, T3, fan_rpm, derating_state

THM-02 (Thermal)
  • Fixture / setup: derating trigger & restore
  • Steps: force WARN/CRITICAL; verify staged restore
  • Pass / fail criteria: restore requires cool-down window; staged power-up; stable end state
  • Log fields (minimum): event_type, action, duration_ms, restore_sequence

PRO-01 (Protection)
  • Fixture / setup: ESD spot injection (available fixture)
  • Steps: apply ESD to defined points; monitor link + counters
  • Pass / fail criteria: link recovers; no persistent flaps; errors within target
  • Log fields (minimum): crc_error_count, link_flap_count, recovery_time_ms

PRO-02 (Protection)
  • Fixture / setup: surge stimulus (available fixture)
  • Steps: apply surge; verify localized trip + controlled recovery
  • Pass / fail criteria: no whole-box brownout; recovery policy executed and logged
  • Log fields (minimum): port_off_reason, bus_V_min, bus_uvlo_count, action
Ship-ready criterion: Each domain must have (1) a stress injection, (2) a measurable counter-based verdict, and (3) a recovery story that is logged and explainable.
[Figure: Validation matrix. Four quadrants, each producing pass/fail plus evidence logs: Power (port power-up, ILIM response, MPS stability), Network (throughput, microburst loss, IGMP stability), Thermal (full-load soak, derate enter/exit), and Protection (ESD robustness, surge recovery). All quadrants feed the ship criteria (pass/fail + recovery, counter evidence, traceable logs) through the common log template (timestamp, device_id/fw, test_id/domain, port_id/thresholds, result/reason/action).]
Caption: A ship-ready validation plan closes four loops—Power, Network, Thermal, Protection—each producing pass/fail decisions backed by counters and a traceable log template.
Figure F10. The validation matrix ensures coverage across power delivery, switching behavior, heat/derating, and survivability. Each quadrant must produce evidence logs and a controlled recovery story.

H2-11. Field Debug Playbook (Symptom → Evidence → Isolate → Fix)

Purpose: Give installers the shortest path. For each symptom, run two checks, use a discriminator to choose the branch, then apply the first fix that is measurable and reversible.

Playbook structure: First 2 checks → Discriminator → Isolate → First fix. No guesswork.
Symptom A

Port keeps powering up / powering down (power cycling).

First 2 checks

  • port_off_reason + ilim_trip_count (OCP/SC trips)
  • bus_uvlo_count + bus_V_min (48–57V bus droop)

Discriminator

  • High OCP trips + stable bus → localized port / cable / PD fault
  • Low OCP trips + frequent bus UVLO → total budget / PSU droop / simultaneous inrush
  • Trips correlate with temperature (T_pse/T_hotspot) → thermal derating oscillation
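This discriminator can be written as a small classifier over the two checks plus the thermal correlation. A sketch only: the thresholds (ocp_high, uvlo_high) are placeholders to be tuned per platform, and the function name is illustrative.

```python
def classify_power_cycling(ocp_trips, bus_uvlo, temp_correlated,
                           ocp_high=5, uvlo_high=3):
    """Map Symptom A evidence to a debug branch."""
    if temp_correlated:
        # Trips track T_pse/T_hotspot: derating policy is oscillating.
        return "thermal_derating_oscillation"
    if ocp_trips >= ocp_high and bus_uvlo < uvlo_high:
        # Localized fault: cable, PD, or a too-tight port limit.
        return "port_cable_or_pd_fault"
    if ocp_trips < ocp_high and bus_uvlo >= uvlo_high:
        # System-level: total budget, PSU droop, or synchronized inrush.
        return "budget_psu_droop_or_inrush"
    return "inconclusive_capture_timeline"
```

The "inconclusive" branch matters: it sends the installer to the 5–10 minute timeline capture instead of guessing.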

Isolate

  • Force the port to a lower power limit and retest with a short known-good cable.
  • Disable other high-power ports, then re-run power-up to see whether bus UVLO disappears.
  • Capture a 5–10 minute timeline: reason code → action → recovery result.

First fix (with MPN examples)

  • Per-port OCP/SC robustness: verify fast current limit + controlled backoff in the PSE. MPN examples: TPS23881 (TI 8-port PSE), LTC4296-1 (Analog Devices 4-port PSE), PD69200 (Microchip PoE PSE).
  • Bus droop / inrush: add/verify hot-swap / inrush limiting and staged port power-up policy. MPN examples: LTC4222 (ADI hot-swap), LTC4366 (ADI surge stopper/OVP), TPS25982 (TI eFuse with surge protection).
  • Thermal oscillation: smooth derating thresholds + minimum off-time before re-powering. MPN examples (temperature sensing): TMP102 (TI), EMC2101 (Microchip fan controller).
Do not do: repeated manual unplug/replug loops without logs—this hides whether the root cause is OCP, UVLO, or thermal.
Symptom B

Dropouts only at night / when IR illuminator turns on.

First 2 checks

  • actual_W trend (or port_power_step if available) at IR-on timestamps
  • ilim_trip_count or derating_state changes during IR-on

Discriminator

  • Power step → OCP/limit → power-off → port limit too tight or cable loss too high
  • Power step without power-off, but link errors rise (crc_error_count) → noise/return-path coupling (still switch-side)

Isolate

  • Temporarily raise port priority or limit; compare failure rate before/after.
  • Swap to a shorter/known-good cable; if issue disappears, cable loss is dominant.
  • Correlate IR-on times with bus droop (bus_V_min) and port events.
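The third isolate step, correlating IR-on times with bus droop, might look like the sketch below. The 2-second window and 44 V droop threshold are assumptions for illustration, not specified values.

```python
def correlate_ir_droop(ir_on_ts, bus_samples, window_ms=2000, droop_v=44.0):
    """ir_on_ts: list of IR-on timestamps (ms).
    bus_samples: list of (timestamp_ms, bus_voltage) telemetry points.
    Returns the fraction of IR-on events followed by a droop below droop_v."""
    hits = 0
    for t in ir_on_ts:
        # A droop sample inside the window after IR-on counts as correlated.
        if any(t <= ts <= t + window_ms and v < droop_v
               for ts, v in bus_samples):
            hits += 1
    return hits / len(ir_on_ts) if ir_on_ts else 0.0
```

A correlation near 1.0 points at the power step (limit/budget); near 0.0 with rising crc_error_count points at noise coupling instead.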

First fix (with MPN examples)

  • Enable predictable power negotiation / dynamic allocation so IR steps do not hit hard limits. MPN examples (PSE): TPS23881, LTC4296-1, PD69200.
  • Improve per-port power path noise immunity (layout/return path + appropriate magnetics/CMC selection). MPN examples (Ethernet magnetics family): Pulse H5007NL (example family), Würth 7490xxxx (CMC families).
  • Surge/ESD hardening at the RJ45 edge to reduce night-humidity related transients. MPN examples: SMBJ58A (TVS class, choose proper rating), SM712 (ESD array class for data lines).

Note: magnetics/CMC part numbers are platform-dependent (port count, PoE class, isolation spec). Use these as “example families” and finalize by your PoE class + EMI test results.

Symptom C

Multiple streams stutter, but links stay up (no link-down).

First 2 checks

  • per_queue_drop / queue_occupancy (microburst congestion evidence)
  • mcast_flood_count + igmp_table_state (multicast control plane)

Discriminator

  • Queue drops rise while link stays up → buffering/QoS mismatch vs workload bursts
  • Multicast flood rises → IGMP snooping/querier misbehavior
  • Errors remain low but stutter persists → verify video queue mapping and uplink bottleneck

Isolate

  • Halve camera count for 10 minutes: if drops scale down, you have a congestion-driven root cause.
  • Force video VLAN separation from management VLAN; verify drops shift to expected queues only.
  • Lock one multicast group; verify non-member ports see no multicast traffic.

First fix (with MPN examples)

  • Switch SoC with adequate buffers for bursty video + multicast. MPN examples (switch SoC families): BCM53xx (Broadcom family), Marvell Prestera (88E/98DX families).
  • Better timestamping/telemetry for queue drops so stutter correlates with counters. MPN examples (MCU for telemetry/logging): STM32F4 (ST), ESP32 (Espressif, if low-cost mgmt is needed).

These are platform SoC choices; the “field fix” is usually QoS/IGMP configuration + uplink planning, but listing SoC families helps readers understand what hardware capability is required.

Symptom D

After thunderstorm, a few ports are completely dead (no power).

First 2 checks

  • detection_fail_count / classify_fail_count (PoE detect/classify failures)
  • port_short_state + port_off_reason (hard short vs protection latch-off)

Discriminator

  • Detect/classify fails + link also dead → front-end boundary damage (RJ45/magnetics/PHY-side)
  • Hard short detected → external cable/PD fault; validate with known-good short cable first
  • Link works but power fails → PSE power path (FET/sense) likely impacted

Isolate

  • Test with a known-good short cable + known-good PoE load (avoid risking another port).
  • Move the same cable to a different port: cable vs port separation.
  • Check whether surge/ESD event logs exist around storm time.

First fix (with MPN examples)

  • Layer-0 surge diversion at RJ45 edge and correct return-to-chassis strategy. MPN examples (TVS class): SMBJ58A, SMCJ58A (choose rating to match PoE rail). For data-line ESD arrays: TPD4E05U06 (TI class), SM712 (industry class).
  • Bus input hardening so one surge does not brownout the entire box. MPN examples: LTC4366 (ADI surge stopper), TPS25982 (TI eFuse), LTC4222 (ADI hot-swap).
  • PSE replacement path for damaged ports (board-level service). MPN examples: TPS23881, LTC4296-1, PD69200.
Symptom E

After ~1 hour at full load, ports drop one by one.

First 2 checks

  • T_hotspot_max / T_pse / T_psu (temperature peak and slope)
  • derating_state + fan_rpm (derating entry/exit + fan stall evidence)

Discriminator

  • Temp rises → derating toggles → ports drop → derating policy oscillation or insufficient airflow
  • High temp + abnormal fan RPM → airflow blockage / fan fault dominates
  • Normal temp but ports drop → re-check power budget and OCP/UVLO evidence

Isolate

  • Reduce total PoE power by 10–20% for 10 minutes: if stable, thermal headroom is the constraint.
  • Increase fan duty temporarily; see if dropouts delay.
  • Stage port power-up and restore sequence; verify state machine is monotonic (no thrash).

First fix (with MPN examples)

  • Fan control + tach monitoring to prevent silent airflow collapse. MPN examples: EMC2101 (Microchip fan controller), MAX31760 (Analog Devices fan controller class).
  • Distributed temperature sensing near PSE FETs, magnetics, and PSU hot spots. MPN examples: TMP102 (TI), LM75-class temperature sensors.
  • Derating state machine that first limits per-port power then disables low-priority ports only if needed. Implementation uses MCU/SoC firmware; typical MCU families: STM32 (ST), NXP LPC (NXP).
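The limit-first, shed-last policy in that bullet can be sketched as a single derating step. The thresholds (75 °C warn, 90 °C critical, 15 W cap) and the port dict layout are illustrative assumptions; a real implementation would add hysteresis and minimum off-time as described above.

```python
def derate_step(temp_c, ports, warn_c=75, crit_c=90, cap_w=15.0):
    """ports: list of dicts with 'priority' (lower = more critical),
    'limit_w', and 'on'. Mutates ports; returns the action taken."""
    if temp_c >= crit_c:
        # Shed only the lowest-priority port that is still on.
        victims = [p for p in ports if p["on"]]
        if victims:
            max(victims, key=lambda p: p["priority"])["on"] = False
            return "shed_low_priority"
        return "all_off"
    if temp_c >= warn_c:
        # Cap per-port power before disconnecting anyone.
        for p in ports:
            p["limit_w"] = min(p["limit_w"], cap_w)
        return "cap_per_port"
    return "normal"
```

Calling this once per control tick keeps the action order deterministic: cap first, shed lowest priority only under critical heat.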

MPN quick index (by function block)

PSE controller (multi-port)
  • MPN examples: TPS23881 (TI), LTC4296-1 (ADI), PD69200 (Microchip)
  • Why it appears in field debug: explains OCP trips, detect/classify failures, per-port behavior and logging needs.

Hot-swap / inrush / bus protection
  • MPN examples: LTC4222 (ADI), LTC4366 (ADI), TPS25982 (TI)
  • Why it appears in field debug: addresses bus UVLO, brownout loops, and surge-induced bus collapse.

ESD / TVS (class examples)
  • MPN examples: TPD4E05U06 (TI), SM712 (ESD array class), SMBJ58A/SMCJ58A (TVS class)
  • Why it appears in field debug: correlates post-storm failures and intermittent link errors with protection layering.

Thermal monitoring / fan control
  • MPN examples: TMP102 (TI), EMC2101 (Microchip), MAX31760 (ADI)
  • Why it appears in field debug: explains thermal soak dropouts, derating oscillation, and fan stall evidence.

Switch SoC (family examples)
  • MPN examples: Broadcom BCM53xx family, Marvell Prestera families
  • Why it appears in field debug: frames stutter-without-linkdown issues as buffer/QoS/IGMP capability limits.
Note: MPNs above are representative examples. Final selection depends on port count, PoE class (af/at/bt), thermal budget, surge level, and required management interface.
[Figure: Port Dropout Decision Tree (10-minute field path): two checks, then a discriminator, then the first fix. Symptom: port drops/cycles. Check #1: OCP trips (ilim_trip_count, reason). Check #2: bus UVLO (bus_uvlo_count, Vmin). If OCP is high and the bus is stable: cable/PD fault or port limit too tight; first fix is a short cable plus a lower limit and backoff. If bus UVLO is frequent: total budget, PSU droop, or inrush; first fix is staged power-up and reduced total W. Also check temperature/fan (T_hotspot, derating_state, fan_rpm) for thermal-driven drops, queue drops/IGMP flood for stutter without link-down, and detect/classify failures plus shorts for post-storm dead ports.]
Caption: A field-first decision tree: check per-port OCP evidence and bus UVLO evidence first, then branch to cable/PD faults, total budget/inrush, thermal derating, or network/multicast causes.
Figure F11. A compact decision tree that maps the most common CCTV PoE switch field symptoms to the first two measurements, the discriminator, and the first fix.


H2-12. FAQs

Format: Each answer follows the same evidence chain: Short answer → What to measure (2 items) → First fix (1 action). This keeps troubleshooting inside the switch boundary (power, protection, thermal, switching telemetry, validation logs).

Why does a camera reboot when IR LEDs turn on? (Maps to: H2-2 / H2-5 / H2-11)

Short answer: IR LEDs create a step load that trips port current-limit or pulls the 54V bus into UVLO. What to measure: (1) actual_W / ilim_trip_count at IR-on timestamps, (2) bus_V_min / bus_uvlo_count. First fix: raise that port’s limit/priority and enable staged power-up or longer backoff.

Port shows “PoE on” but the camera never boots—what to check first? (Maps to: H2-4 / H2-6 / H2-11)

Short answer: “PoE on” can still be repeated retries, MPS loss, or a latched protection state. What to measure: (1) power_on_attempt_count + port_off_reason, (2) mps_lost_count or classify_fail_count. First fix: test with a short known-good cable + known-good PoE load, then tune inrush/backoff.

How do I set power priority when the total budget is insufficient? (Maps to: H2-5)

Short answer: define a deterministic shedding policy so critical ports never drop. What to measure: (1) per-port allocated_W vs actual_W, (2) system budget_margin_W with event timestamps. First fix: enforce “limit before disconnect,” then shed lowest-priority ports first, and restore in priority order with staged re-power.
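That shedding order can be sketched as a deterministic allocator: critical ports are served first, and only the lowest-priority overflow is denied. Per-port limiting before outright denial would be the refinement described above; the field names and function here are illustrative assumptions.

```python
def enforce_budget(ports, budget_w):
    """ports: list of dicts with 'port', 'priority' (lower = more critical),
    and 'requested_w'. Returns (granted, shed), critical ports first."""
    granted, shed = [], []
    remaining = budget_w
    for p in sorted(ports, key=lambda p: p["priority"]):
        if p["requested_w"] <= remaining:
            remaining -= p["requested_w"]
            granted.append(p["port"])
        else:
            # Deterministic: only the lowest-priority overflow is denied.
            shed.append(p["port"])
    return granted, shed
```

Because the grant order is fixed by priority, the same budget shortfall always sheds the same ports, which is what makes SLA logs explainable.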

Why do only long cable runs fail detection/classification? (Maps to: H2-3 / H2-9)

Short answer: long cables amplify resistance/leakage effects and reduce detect/classify margin, especially with moisture and surge-protection parasitics. What to measure: (1) detection_fail_count/classify_fail_count by cable length, (2) port current/voltage during detect/classify if available. First fix: validate with a short cable; if stable, treat field cabling/terminations as the first remediation.

OCP trips randomly—load issue or inrush/hot-plug behavior? (Maps to: H2-4 / H2-11)

Short answer: “random” OCP is usually timing-correlated to hot-plug, IR steps, or simultaneous port restarts. What to measure: (1) ilim_trip_count with timestamps, (2) power_on_attempt_count and bus_V_min around each trip. First fix: increase minimum off-time/backoff and stage port re-power to avoid synchronized inrush.

All ports drop together—PSU UVLO or protection coordination problem? (Maps to: H2-5 / H2-9 / H2-11)

Short answer: a whole-box drop almost always starts at the 54V bus (UVLO/OVP) or an upstream protection event. What to measure: (1) bus_uvlo_count + bus_V_min, (2) PSU temperature/fault flags and event logs. First fix: prioritize port-level limiting over bus collapse, and implement staged power-up to prevent bus sag cascades.

Video stutters but link stays up—QoS/IGMP or buffer contention? (Maps to: H2-7 / H2-10)

Short answer: stutter-without-linkdown is typically queue drops, multicast flooding, or uplink contention during bursty I-frames. What to measure: (1) per_queue_drop/queue_occupancy, (2) mcast_flood_count and IGMP table state. First fix: correct QoS queue mapping and IGMP snooping/querier behavior, then re-validate burst loss.

Thermal derating triggers too early—what are the typical hotspots? (Maps to: H2-8)

Short answer: early derating is usually sensor placement or local hotspots near PSE FETs, rectifiers, magnetics, PSU, or the switch SoC. What to measure: (1) T_pse/T_hotspot slope at load, (2) derating_state vs fan_rpm. First fix: improve airflow path and add hysteresis/minimum-on-time to avoid derating oscillation.

After a lightning storm, some ports are dead—what evidence distinguishes PHY vs PSE damage? (Maps to: H2-9 / H2-11)

Short answer: use “power vs link” separation plus detect/classify evidence to localize the damage boundary. What to measure: (1) detection_fail_count/classify_fail_count and port_short_state, (2) link status + error counters (CRC) on that port. First fix: retest with short known-good cable/load first; then service the affected port domain only.

Can I oversubscribe PoE budget safely for CCTV? (Maps to: H2-5)

Short answer: yes, but only with explicit priorities, per-port limits, and a predictable shedding/recovery policy. What to measure: (1) peak actual_W distribution (IR/heater/PTZ bursts), (2) frequency of limit/deny events per port. First fix: oversubscribe only after defining “limit-first, disconnect-last” and restoring critical ports first with staged re-power.

How should I log PoE events for maintenance and SLA? (Maps to: H2-6)

Short answer: log PoE as time-correlated events with reason codes so outages are explainable and repeatable. What to measure: (1) minimum fields: timestamp, port_id, class/requested_W, actual_W, port_off_reason, trip counters, duration, (2) system context: bus_V_min, temperatures, fan RPM. First fix: standardize a single event schema across CLI/SNMP/Web exports.

What’s the minimum production test to avoid field PoE issues? (Maps to: H2-10)

Short answer: a minimal gate must cover power bring-up, burst traffic loss, thermal soak, and recoverability after protection events. What to measure: (1) per-port power-on success + OCP response + MPS stability, (2) burst loss/IGMP stability and full-load temperature/derating behavior. First fix: ship only if logs prove deterministic behavior and controlled recovery under worst-case load.