Micro-DC Rack: DC Power Distribution, Protection & Telemetry
A Micro-DC Rack is an in-rack DC branch power platform that combines eFuses/high-side switches, telemetry + event logs, and OOB control to make power distribution controllable, protective, and diagnosable.
It keeps critical domains (OOB/alarms) alive while enforcing staged policies for faults and environmental/security events. PSU/UPS internals and facility power are out of scope.
Scope & Boundary
This page focuses on rack-internal DC distribution and branch-level control: how a Micro-DC Rack uses eFuses/high-side switches, per-branch sensing, event logs, and OOB hooks to make power delivery controllable, protectable, and observable at the branch level.
In scope (what this page goes deep on)
- DC bus → branch channels: how loads are segmented, switched, and isolated inside a rack.
- Per-branch protection: OCP/short/OV/UV/OT, SOA limits, reverse-current blocking, inrush control.
- Per-branch observability: voltage/current/temperature telemetry, fault reason codes, timestamps.
- Remote operations: power-cycle, staged turn-on, load shedding, “retry vs latch-off” policy.
- OOB hooks: minimal interfaces for reading telemetry/logs and applying safe, auditable control.
Out of scope (explicitly not covered here)
- PSU internal topologies (e.g., PFC/LLC), UPS/ATS behavior, or facility wiring codes/standards.
- AC mains PDU design details beyond a boundary comparison.
- PCIe/CXL fabrics, ToR interposers, or accelerator interconnects (separate pages).
- Full BMC SoC architecture and TPM/HSM internals (only referenced as integration dependencies).
Sibling-page link strategy (listed, not expanded)
- Need PSU details → CRPS / Server PSU
- Need bus hot-swap deep dive → 48 V / 12 V Bus & Hot-Swap
- Need broader rack sensors/access → Rack Environment & Access Control
The boundary is intentional: Micro-DC Rack treats upstream power as a DC source black box and concentrates on branch-level safety, control, and diagnostics.
1-minute Definition: What a Micro-DC Rack Solves
A Micro-DC Rack is a rack-internal DC distribution system that splits a DC bus into multiple protected branches, using eFuses/high-side switches to switch and isolate loads and using per-branch sensing to report telemetry and fault events. It enables safe remote operations—power-cycle, staged turn-on, and load shedding—through OOB management hooks with auditable logs and timestamps.
The focus is not “how power is generated,” but how rack-internal DC is distributed, protected, observed, and controlled at branch granularity.
Where it appears (typical deployments)
- Edge micro-sites / remote racks: limited on-site access, strong need for remote recovery and audit trails.
- Small private clusters: mixed loads (compute/network/storage) with different criticality levels.
- Unattended enclosures: environmental or access events must trigger safe, staged power actions.
What it enables (three keywords)
- Switching: per-branch on/off, power-cycle, staged ramp, group sequencing.
- Protection: fast short response, controlled current limiting, thermal/SOA enforcement, reverse blocking.
- Observability: branch V/I/T telemetry + reason codes + timestamps + “before/after” snapshots.
Practical boundary vs “traditional PDU” (conceptual, not AC deep-dive)
- Granularity: Micro-DC is built for per-branch actions and diagnostics, not only aggregate metering.
- Diagnostics: “why did it shut off?” is answered by reason code + timestamp, not guesswork.
- Automation: control is designed to be called by OOB workflows with permissions and audit logs.
System Architecture: Power Path + Management Path
A Micro-DC Rack is defined by two parallel paths: the power path (how DC is distributed and protected) and the management path (how measurements, events, and control decisions flow to OOB networks and platforms). The upstream supply is treated as a DC source black box; the emphasis is rack-internal segmentation, safety, and observability.
Power path (rack-internal)
- DC Source (black box) → Rack bus / busbar → Branch channels (×N) → Loads.
- Branch channels provide switching (on/off, power-cycle, staged turn-on) and protection (OCP/short/OT/SOA/reverse block).
- Loads are organized by criticality into domains (IT / Network / Aux), enabling predictable policies such as load shedding.
Management path (measure → log → act)
- Sensors (V/I/T + door/tamper) feed aggregation (ADC/MUX and low-speed buses).
- A controller enforces safe actions and creates event logs (reason codes + timestamps + snapshots).
- An OOB uplink exposes branch/domain objects to a platform for alerting, tickets, and audits.
Multi-domain power: a practical organizing model
- IT domain: compute loads that may tolerate staged power but require stable recovery procedures.
- Network domain: connectivity-critical loads often kept at higher priority during shedding.
- Aux domain: sensing, access, indicators, and OOB control power kept alive for diagnosis and recovery.
Power Distribution Strategy (Rack-internal, not PSU topology)
Distribution strategy is expressed as bus choices, two-level protection boundaries, and remote operation policies. The goal is predictable rack behavior: one faulty branch should be isolated quickly, while critical domains and OOB diagnostics remain available.
Bus voltage trade-offs (effects inside the rack)
- Higher bus voltage reduces current for the same power, which lowers conductor (I²R) loss and allows smaller busbars.
- Device stress shifts: branch switches must tolerate higher voltage transients while keeping adequate SOA margin.
- Measurement implications: lower current reduces shunt heating; higher voltage increases the need for robust input protection and divider accuracy.
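The current-versus-voltage trade-off can be checked with a quick calculation. This is a minimal sketch that assumes an illustrative 2 kW rack load and 2 mΩ of distribution resistance; both figures and the function names are hypothetical, not taken from a specific design.

```python
def bus_current_a(power_w: float, bus_v: float) -> float:
    """Current drawn from the bus for a given load power (I = P / V)."""
    return power_w / bus_v


def conductor_loss_w(power_w: float, bus_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path (P_loss = I^2 * R)."""
    current = bus_current_a(power_w, bus_v)
    return current ** 2 * resistance_ohm


if __name__ == "__main__":
    LOAD_W = 2000.0    # assumed rack load, for illustration only
    PATH_OHM = 0.002   # assumed busbar + connector resistance
    for bus_v in (12.0, 48.0):
        amps = bus_current_a(LOAD_W, bus_v)
        loss = conductor_loss_w(LOAD_W, bus_v, PATH_OHM)
        print(f"{bus_v:.0f} V bus: {amps:.1f} A, {loss:.2f} W lost in conductors")
```

Raising the bus voltage by 4× cuts the current by 4× and the resistive loss by 16× for the same path resistance.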
Two-level protection (avoid “one fault takes the rack”)
- Level 1 — Bus-level protection: a rack-wide guardrail and last-resort cutoff (referenced only).
- Level 2 — Branch-level electronic protection: the primary fault isolator (OCP/short/OT/SOA/reverse).
- Rule of thumb: branch channels should trip first and log the reason; bus-level protection is a rare fallback.
Remote operation policies (turn actions into predictable behavior)
- Power-cycle: define minimum off-time, retry budget, and “latch-off” conditions for repeated faults.
- Load shedding: prioritize domains so Aux/OOB stays alive for diagnosis and recovery.
- Staged turn-on: group branches and apply stagger intervals to reduce bus droop and false trips.
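Staged turn-on can be sketched as a schedule generator. The example below is one possible approach, not a prescribed implementation: the domain labels, the `Branch` type, and the default intervals are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    branch_id: str
    domain: str  # "aux", "network", or "it" (hypothetical labels)


# Lower number = enabled earlier. Aux/OOB comes first so diagnostics are alive.
DOMAIN_ORDER = {"aux": 0, "network": 1, "it": 2}


def staged_turn_on(branches, stagger_ms=200, group_gap_ms=1000):
    """Return (enable_time_ms, branch_id) pairs ordered by domain priority.

    Branches are spaced by stagger_ms; an extra group_gap_ms is added
    whenever the domain changes so the bus can settle between groups.
    """
    ordered = sorted(branches, key=lambda b: (DOMAIN_ORDER[b.domain], b.branch_id))
    schedule = []
    t_ms = 0
    prev_domain = None
    for branch in ordered:
        if prev_domain is not None:
            t_ms += stagger_ms
            if branch.domain != prev_domain:
                t_ms += group_gap_ms
        schedule.append((t_ms, branch.branch_id))
        prev_domain = branch.domain
    return schedule
```

In practice the intervals would be tuned against measured bus droop rather than fixed defaults.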
Branch Protection Core: eFuse / High-Side Switch (Deep Dive)
A branch channel is the unit that turns rack-internal DC into a controllable and diagnosable service. It must (1) deliver current safely under normal load, (2) isolate faults fast without collapsing the bus, and (3) explain every action using reason codes, timestamps, and snapshots. Topics are limited to DC-bus internal behavior (no facility/AC events).
What a branch channel must guarantee
- Controlled power: on/off, power-cycle, staged turn-on, and group policies.
- Electronic protection: OCP/short/OT/OV/UV with SOA-aware behavior.
- Isolation by design: one branch fault should not drag the whole rack bus down.
- Forensics: every trip produces a reason code + timestamp + “before/after” telemetry snapshots.
Mechanism: protection loops inside the branch
- Current path: switch FET(s) + sense element (shunt or RDS(on)) feed fast detection and measurement.
- Control path: gate control enforces soft-start and a programmable current-limit profile.
- Thermal path: temperature sensing triggers derating, cooldown, or latch-off depending on policy.
- Stateful recovery: retry budget, backoff, and latch-off conditions prevent endless oscillation.
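The stateful-recovery idea (retry budget, backoff, latch-off) can be illustrated with a small state holder. This is a sketch under assumed defaults; `RetryPolicy`, `BranchRecovery`, and the doubling backoff are hypothetical choices, and real eFuse devices may implement retry in hardware with different semantics.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_retries: int = 3          # retry budget before latch-off
    base_backoff_s: float = 1.0   # first wait; doubles on each retry
    max_backoff_s: float = 30.0   # upper bound on the wait


class BranchRecovery:
    """Tracks retry state for one branch."""

    def __init__(self, policy: RetryPolicy):
        self.policy = policy
        self.retry_count = 0
        self.latched_off = False

    def on_trip(self):
        """Return seconds to wait before re-enabling, or None if latched off."""
        if self.latched_off:
            return None
        if self.retry_count >= self.policy.max_retries:
            self.latched_off = True  # budget exhausted: needs a manual clear
            return None
        delay = min(self.policy.base_backoff_s * 2 ** self.retry_count,
                    self.policy.max_backoff_s)
        self.retry_count += 1
        return delay

    def on_stable_run(self):
        """Call after a sufficiently long fault-free run to restore the budget."""
        if not self.latched_off:
            self.retry_count = 0

    def manual_clear(self):
        """Operator action: clear latch-off and restore the retry budget."""
        self.latched_off = False
        self.retry_count = 0
```

Because the budget is finite, a persistent fault ends in latch-off instead of an endless trip and recover loop.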
Field pitfalls (why it “trips wrong” or “fails to trip”)
- dI/dt + parasitic L: fast current edges across parasitic inductance cause bus or switch-node spikes that look like OV/UV or false short events.
- Sense placement & drift: a hot shunt or poor Kelvin routing biases the current measurement and shifts the effective thresholds.
- Capacitive loads + sequencing: inrush holds the switch in current limit long enough to cause thermal stress and late OT trips.
- Concurrent turn-on: multiple branches start together → bus droop → UV cascades; mitigate with stagger and priority domains.
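The dI/dt pitfall follows directly from the inductor relation. As a rough worked example with assumed values (100 nH of loop inductance and a 10 A current change in 1 µs, both hypothetical):

```latex
V_{\text{spike}} = L_{\text{par}} \,\frac{dI}{dt}
\approx 100\,\text{nH} \times \frac{10\,\text{A}}{1\,\mu\text{s}} = 1\,\text{V}
```

A faster edge scales the spike proportionally: the same 10 A change in 100 ns would give about 10 V, enough to cross an OV or UV threshold if the deglitch window is too short.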
Key specification checklist (spec → impact → knob → field symptom)
| Spec item | What it controls | Engineering knob | Field symptom if wrong |
|---|---|---|---|
| VIN / VBUS range | Operating margin vs DC-bus transients | OV/UV thresholds + deglitch + safe derating | Random resets during load steps; unexplained UV/OV events |
| ICONT / IPEAK | Thermal stability and startup envelope | Stagger, soft-start, inrush shaping | Stable idle but trips during boot or simultaneous start |
| SOA / short energy | Whether the switch survives worst-case faults | Fast short detect + limit profile + timeout | FET damage despite “not that high” steady current |
| Limit mode | Fault containment vs heating | Constant-current vs foldback vs fast-trip | Either nuisance trips (too aggressive) or overheating (too soft) |
| Short response time | Bus stability and device survival | Blanking/deglitch tuned to real parasitics | Does not trip on hard short, or false short on fast load steps |
| Reverse blocking | Domain isolation and backfeed prevention | Back-to-back FET behavior + reverse thresholds | Unexpected cross-domain coupling; “ghost power” paths |
| OV/UV + debounce | Cascading trips during bus droop/spike | Debounce windows + staged retry policies | Rack-wide oscillation: trip → recover → trip loops |
| Thermal sensing | Preventing long-tail failures | Cooldown vs latch-off; derating thresholds | Late OT trips after minutes; performance derates are invisible |
| Current accuracy + drift | Threshold truthfulness over temperature | Kelvin sense, calibration, conservative margin | “Doesn’t look high” but trips, or never trips until damage |
| Telemetry bandwidth | Debug resolution vs noise + storage | Low-rate trends + event snapshots | Either too noisy to trust, or too sparse to explain a trip |
| Reason code model | Forensic explainability | Primary + flags, consistent versioning | “Trip happened” with no actionable why/how/when |
Telemetry & Event Log: Make “Observability” Debuggable
Observability is useful only when it answers: which branch, why, when, and what changed before/after. This section defines the minimal telemetry objects and event-log design needed for reliable troubleshooting in a Micro-DC Rack.
Telemetry objects (by layer)
- Bus: Vbus (and optionally Ibus) to detect droop and operating margin.
- Branch: Ibranch, channel temperature, switch state, and fault flags.
- Env / Access: temperature/humidity plus door/tamper events for correlation.
Status & configuration provenance (avoid “unknown changes”)
- State: ON/OFF/LIMIT/TRIP/COOL with last transition cause.
- Reason code: primary cause + flags (e.g., OCP with UV flag).
- Retry: retry counter, last retry time, and latch-off conditions.
- Config version: threshold/policy version ID used at the time of an event.
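A "primary cause + flags" reason code can be represented as a small packed integer. The layout below (low byte for the primary cause, high byte for flags) and all names are assumptions for illustration, not a defined standard.

```python
from enum import IntEnum, IntFlag


class Primary(IntEnum):
    NONE = 0
    OCP = 1
    SC = 2
    OT = 3
    UV = 4
    OV = 5
    REMOTE_OFF = 6


class Flag(IntFlag):
    UV_SEEN = 0x01       # bus undervoltage observed around the event
    OV_SEEN = 0x02       # bus overvoltage observed around the event
    RETRY_ACTIVE = 0x04  # trip happened during a retry attempt
    LATCHED = 0x08       # branch is latched off after this event


def pack_reason(primary: Primary, flags: Flag = Flag(0)) -> int:
    """Pack into 16 bits: high byte = flags, low byte = primary cause."""
    return (int(flags) << 8) | int(primary)


def unpack_reason(code: int):
    """Split a packed code back into (Primary, Flag)."""
    return Primary(code & 0xFF), Flag((code >> 8) & 0xFF)
```

For example, `pack_reason(Primary.OCP, Flag.UV_SEEN)` encodes "OCP with UV flag" as a single value that can be stored in an event record and decoded later.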
Event-log “three-piece set” (dictionary · timestamps · persistence)
- Event dictionary: a versioned list of event types (e.g., OCP/SC/OT/UV/OV/REMOTE_OFF/DOOR_OPEN).
- Timestamp strategy: local monotonic time + platform-aligned approximate time.
- Buffer & power-loss behavior: ring buffer + critical event snapshots (minimum viable).
Minimal event record fields (field → purpose → pitfall)
| Field | Purpose | Common pitfall if missing |
|---|---|---|
| event_type | Normalize causes (e.g., OCP/SC/OT/UV/OV/REMOTE_OFF/DOOR_OPEN) | Different faults look identical; automation can’t route tickets |
| source_id | Bind to branch/domain/bus object | Cannot answer “which branch” quickly |
| t_mono | Reliable ordering on-device (monotonic) | Events reorder after reboots or clock changes |
| t_approx | Approximate absolute time aligned to platform | Hard to correlate with door/env alerts and platform logs |
| reason_code | Primary cause + flags (e.g., OCP with UV flag) | Only a “trip happened” statement, not actionable |
| pre_snapshot | State + V/I/T immediately before the event | No evidence for whether it was droop, inrush, or drift |
| post_snapshot | State + V/I/T immediately after the event | Cannot confirm recovery or thermal consequences |
| retry_count | Expose oscillation and policy behavior | Hidden “retry storms” waste time and mask real faults |
| config_version | Record threshold/policy revision in effect | “It used to work” becomes un-debuggable after config changes |
| actor (optional) | Mark remote/manual actions for audit trails | Remote-off events look like faults; accountability is lost |
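The record fields in the table map naturally onto a small data structure with a ring buffer behind it. This is a minimal sketch: the class names, field types, and the in-memory `deque` are assumptions, and a real controller would also persist critical events to non-volatile storage.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    state: str        # e.g. "ON", "LIMIT", "TRIP"
    v_bus: float      # volts
    i_branch: float   # amps
    temp_c: float     # degrees Celsius


@dataclass(frozen=True)
class EventRecord:
    event_type: str
    source_id: str
    reason_code: int
    pre_snapshot: Snapshot
    post_snapshot: Snapshot
    retry_count: int
    config_version: str
    actor: Optional[str] = None  # set only for remote/manual actions
    t_mono: float = field(default_factory=time.monotonic)  # on-device ordering
    t_approx: float = field(default_factory=time.time)     # approximate wall time


class EventLog:
    """Fixed-size ring buffer: the oldest records are overwritten first."""

    def __init__(self, capacity: int = 256):
        self._records = deque(maxlen=capacity)

    def append(self, record: EventRecord) -> None:
        self._records.append(record)

    def for_source(self, source_id: str):
        """Return all retained records for one branch/domain/bus object."""
        return [r for r in self._records if r.source_id == source_id]

    def __len__(self) -> int:
        return len(self._records)
```

Keeping `t_mono` and `t_approx` as separate fields lets the device order events reliably even when the wall clock is adjusted.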
Env/Security Sensors: Power-Aware Policies (Logic + Interfaces)
Environmental and security signals are not “extra dashboards” in a Micro-DC Rack—they become policy inputs that shape branch actions (derate, shed, lockout) and create auditable evidence. This section focuses on linkage logic, qualification (debounce/hysteresis), and event-log binding.
Sensor scope (kept intentionally narrow)
- Thermal / humidity: temp, humidity (trend + threshold events).
- Door / tamper: door-open, chassis tamper, service-panel open.
- Smoke / leak: presence events only (severity mapping is policy-driven).
- Interface: sensors feed the same policy engine that can actuate branch/group power states.
Power-side action primitives (what “linkage” can actually do)
- Branch actions: remote-off, lockout (manual clear), derate, delayed retry/cooldown.
- Group actions: load shedding by priority (non-critical first), staged restore.
- Preserve domains: keep OOB + alarm domain alive even during emergency shedding.
- Evidence capture: trigger pre/post snapshots and annotate the event log.
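Priority-based shedding with preserved domains can be sketched as a planning function. The power budget, the tuple layout, and the convention that a higher priority number sheds earlier are all assumptions made for this example.

```python
def plan_shedding(groups, budget_w):
    """Pick groups to shed until the remaining load fits within budget_w.

    groups: list of (name, priority, load_w, preserve) tuples. A higher
    priority number means "shed earlier"; preserve=True marks domains
    (OOB, alarm) that are never shed.
    Returns (names_to_shed, remaining_load_w).
    """
    total_w = sum(load_w for _, _, load_w, _ in groups)
    candidates = sorted((g for g in groups if not g[3]),
                        key=lambda g: g[1], reverse=True)
    shed = []
    for name, _, load_w, _ in candidates:
        if total_w <= budget_w:
            break
        shed.append(name)
        total_w -= load_w
    return shed, total_w
```

The function only plans; applying the plan, capturing snapshots, and writing the event log would be separate steps in the policy engine.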
Policy examples (trigger → qualify → act → record → clear)
- Door open → debounce → write audit event + restrict high-risk actions (e.g., threshold changes) → record actor/request_id → clear when door closed + cool-down window.
- Smoke / leak → qualify + severity map → shed non-critical group first; preserve OOB/alarm → record “policy_shed” reason code + snapshots → clear requires manual confirmation.
- Temp high → hysteresis → derate group; if sustained, shed by priority → record “thermal_policy” + config_version → clear after sustained safe temp (with hysteresis).
Data trust (do not treat noise as emergencies)
- Debounce: door/tamper chatter and intermittent contacts must not cause oscillation.
- Hysteresis: avoid repeated shed/restore around a single threshold.
- Sensor fault detect: open/short/drift should raise sensor_fault and switch to a safe degraded policy.
- Event binding: every policy action must reference the sensor event_id and snapshots for postmortems.
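Debounce and hysteresis are simple to express in code. The sketch below assumes the caller supplies timestamps in seconds; the class names and the example thresholds are hypothetical.

```python
class Debouncer:
    """Accept a new digital state only after it has been stable for hold_s."""

    def __init__(self, hold_s: float, initial: bool = False):
        self.hold_s = hold_s
        self.state = initial
        self._since = None  # time when the pending change was first seen

    def update(self, raw: bool, now_s: float) -> bool:
        if raw == self.state:
            self._since = None        # input agrees with state: nothing pending
        elif self._since is None:
            self._since = now_s       # change first seen: start timing
        elif now_s - self._since >= self.hold_s:
            self.state = raw          # stable long enough: accept
            self._since = None
        return self.state


class HysteresisThreshold:
    """Assert at or above `high`; release only at or below `low`."""

    def __init__(self, low: float, high: float):
        if low >= high:
            raise ValueError("low must be below high")
        self.low, self.high = low, high
        self.active = False

    def update(self, value: float) -> bool:
        if not self.active and value >= self.high:
            self.active = True
        elif self.active and value <= self.low:
            self.active = False
        return self.active
```

A door contact would typically pass through `Debouncer`, while a temperature reading would pass through `HysteresisThreshold` before either reaches the policy engine.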
Policy template table (copyable operating model)
| Sensor event | Severity | Power action | Preserve domain | Log requirement | Clear condition |
|---|---|---|---|---|---|
| Door open | L1 | Audit + restrict risky controls (threshold changes / bulk cycles) | OOB + alarm | event_type=DOOR_OPEN, actor, request_id, t_mono | Door closed + debounce + short cool-down |
| Temp high | L2 | Derate → if sustained, shed by priority group | OOB + alarm | event_type=TEMP_HIGH, config_version, snapshots | Safe temp sustained (with hysteresis) |
| Smoke / leak | L3 | Shed non-critical first; lockout restore until manual confirm | OOB + alarm | event_type=SMOKE/LEAK, reason_code=POLICY_SHED, snapshots | Manual clear + follow-up safe-check |
| Sensor fault | L2 | Degraded policy (limit risky automation), raise alarm | OOB + alarm | event_type=SENSOR_FAULT, sensor_id, fault_mode | Sensor recovers + validation window |
OOB Management Hooks: Where the BMC Connects—and Where It Stops
This section defines the minimum OOB closed loop for a Micro-DC Rack: read telemetry, apply safe branch/group controls, pull evidence logs, and trigger updates—without diving into BMC SoC internals. The goal is a clean boundary: object model + privilege gates + audit fields.
Minimum OOB loop (four capabilities)
- Read: bus/branch/env telemetry with clear object identifiers.
- Control: branch on/off, load-shed by group, policy/threshold application with versioning.
- Evidence: pull event logs + snapshots for postmortems and ticket attachments.
- Update hook: trigger firmware update workflows and report status (no deep implementation).
Platform responsibilities (beyond the rack)
- Alert routing: map events to severities and destinations (NOC, paging, ticketing).
- Ticket workflows: attach log packets and snapshots; enforce runbooks.
- Audit: immutable trails for who changed what, when, and why.
- Fleet policy: batch rollouts of thresholds and policies with staged validation.
Bus/protocol boundary (engineering allocation)
- I²C / I³C: local management bus for configuration + moderate-rate telemetry.
- SMBus / PMBus: power-oriented objects (telemetry + configuration) with consistent semantics.
- RS-485 (optional): longer reach / stronger noise immunity for slower control/monitor paths.
- Boundary rule: protocols are tools—object ownership and privilege gating define the system boundary.
Security hooks (hooks only; implementation lives elsewhere)
- Authentication: every write/control request is attributable to an actor/session.
- RBAC: read-only vs operator vs security admin; risky actions require elevated rights.
- Update safety: anti-rollback hook and rollback-safe state reporting.
- Non-repudiation: audit fields in logs (actor, request_id, approval_state, config_version).
Privilege matrix (action → role → audit requirement)
| Action | Role | Scope | Audit requirement | Safety gate |
|---|---|---|---|---|
| Read telemetry | Read-only | Rack / group / branch | request_id, t_mono (optional) | None |
| Pull logs & snapshots | Operator | Rack / branch | actor, request_id, time range | Rate limit |
| Remote off (single branch) | Operator | Branch | actor, reason, t_mono, branch_id | Optional confirm |
| Power-cycle (non-critical group) | Operator | Group | actor, request_id, group_id, snapshots | Cooldown window |
| Change thresholds / policies | Security admin | Group / rack | actor, config_version, approval_state | Two-step apply |
| Firmware update trigger | Security admin | Controller / rack | actor, package_id, anti-rollback status | Staged rollout |
| Clear latch-off / unlock | Two-person approval | Branch / group | two actors, reason, t_mono, snapshots | Dual confirmation |
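A privilege matrix like the one above can be enforced with a table-driven check. This is an illustrative sketch: the action keys, the role ordering, and the minimum role assumed for the two approvers are assumptions, not part of a defined API.

```python
from enum import IntEnum


class Role(IntEnum):
    READ_ONLY = 0
    OPERATOR = 1
    SECURITY_ADMIN = 2


# action -> (minimum role, required audit fields, approvers needed)
# The OPERATOR minimum for clear_latch_off is an assumption.
PRIVILEGES = {
    "read_telemetry":    (Role.READ_ONLY, {"request_id"}, 1),
    "pull_logs":         (Role.OPERATOR, {"actor", "request_id"}, 1),
    "remote_off":        (Role.OPERATOR, {"actor", "reason", "branch_id"}, 1),
    "change_thresholds": (Role.SECURITY_ADMIN,
                          {"actor", "config_version", "approval_state"}, 1),
    "clear_latch_off":   (Role.OPERATOR, {"actor", "reason"}, 2),
}


def authorize(action, role, audit, approvers=1):
    """Return (allowed, why). Unknown actions are denied by default."""
    if action not in PRIVILEGES:
        return False, "unknown action"
    min_role, required, min_approvers = PRIVILEGES[action]
    if role < min_role:
        return False, "insufficient role"
    missing = required - set(audit)
    if missing:
        return False, "missing audit fields: " + ", ".join(sorted(missing))
    if approvers < min_approvers:
        return False, "needs additional approver"
    return True, "ok"
```

Rejecting a request that lacks its audit fields keeps the log complete by construction, since no control action can run without them.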
Hardware Implementation Checklist: From “Works” to “Production-Stable”
Production stability in a Micro-DC Rack is usually lost in predictable places: current sensing realism, switching thermal paths, rack-internal power integrity, and management-bus robustness. This checklist focuses on the highest-yield details that prevent false trips, missing trips, and telemetry dropouts.
1) Current sensing (shunt + Kelvin + input protection)
- Shunt placement: keep the power loop compact; avoid placing the shunt where return current can bypass it.
- True Kelvin routing: sense traces must be a dedicated pair, symmetric, and kept away from high di/dt nodes.
- Thermal coupling: expect temperature gradients; reduce drift by keeping the shunt environment predictable.
- Front-end protection: protect amplifier/ADC inputs from inductive spikes without distorting normal sensing.
- Bandwidth intent: decide whether the control reacts to peaks or averages; filter accordingly to avoid “spike = trip”.
2) Switch & thermal path (heat flow is part of protection accuracy)
- Heat path: device → copper → via array → backside spreader; validate that heat actually leaves the hot spot.
- OT thresholds: ensure thermal shutdown and recovery behavior does not oscillate (add time qualification where needed).
- Copper sizing: sized for both current and heat spreading; narrow necks create local thermal cliffs.
- Parallel / redundancy (light touch): if used, keep symmetry; treat redundancy as a policy-managed grouping problem.
3) Bus and branch PI (rack-internal only)
- Return path discipline: high di/dt current must loop locally; avoid “return wandering” through sensitive ground.
- Decoupling distribution: bulk near the bus entry + high-frequency close to branch switching elements.
- Inrush concurrency: stagger/group enables so “many branches at once” does not create a bus droop trip cascade.
- Spike containment: inductive spikes usually originate in the layout, so address loop area and component placement first.
4) Communications robustness (I³C/I²C/PMBus + isolation trigger)
- Line length & pull-ups: plan for total bus capacitance; pull-ups must meet rise-time without creating ringing.
- Topology clarity: segment or buffer when needed; prevent a single long spur from dominating bus timing.
- Common-mode reality: if crossing grounds/domains or long cable runs, consider isolation at the boundary.
- Hang recovery: include a bus reset/recovery path (watchdog + controlled re-init) to avoid “silent telemetry loss”.
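The pull-up guidance for open-drain buses can be turned into a quick bounds check. The sketch uses the common 30%-to-70% RC rise-time approximation and a 0.4 V / 3 mA low-level sink condition as defaults; the limits that apply to a given bus mode and device should be taken from the relevant specification and datasheets.

```python
def pullup_range_ohm(vdd_v, bus_cap_f, t_rise_max_s, v_ol_max_v=0.4, i_ol_a=0.003):
    """Return (r_min, r_max) for an open-drain bus pull-up resistor.

    r_min keeps the low level within the driver's sink capability.
    r_max keeps the 30%-70% RC rise time within t_rise_max_s.
    """
    r_min = (vdd_v - v_ol_max_v) / i_ol_a
    r_max = t_rise_max_s / (0.8473 * bus_cap_f)  # ln(0.7 / 0.3) = 0.8473
    return r_min, r_max


if __name__ == "__main__":
    # Assumed example: 3.3 V bus, 200 pF total capacitance, 300 ns rise limit.
    r_min, r_max = pullup_range_ohm(3.3, 200e-12, 300e-9)
    print(f"pull-up between {r_min:.0f} and {r_max:.0f} ohm")
```

If `r_max` comes out below `r_min`, no single resistor satisfies both limits, which is the signal to segment or buffer the bus.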
5) Fail-safe defaults (what happens when control is lost)
- Default states: define which branches default OFF and which must stay ON for OOB/alarm continuity.
- Priority domains: OOB + alarm domain power must be preserved during shedding/lockout policies.
- Config versioning: threshold/policy versions should be logged so “changed policy” is visible in postmortems.
Validation & Production Test: Calibration, Fault Injection, and Field Self-Checks
Testing for a Micro-DC Rack is not just “power on and read telemetry.” It must prove that protection actions are correct, telemetry is trustworthy across conditions, and evidence logs are sufficient for fast troubleshooting. The emphasis here is test hooks specific to rack DC distribution: stimulus → expected action → expected log fields.
R&D validation (stimulus → measurement → pass criteria)
- Overcurrent / short: step loads and short fixtures; measure response time and state transitions.
- SOA / thermal: steady + pulsed load; confirm derating behavior and thermal shutdown stability.
- False-trip resilience: noise injection, bus droop, concurrent enables; track trip probability and recovery behavior.
- Concurrency scenarios: staged enable vs all-at-once; confirm no cascading bus droop trips.
Production test + maintenance (minimum set with high coverage)
- Channel self-test: on/off response and readback of state bits and fault flags.
- Sensor open/short detection: door/leak/temp fault checks with explicit reason codes.
- Current calibration: one-point or two-point strategy, with a plan for temperature-drift compensation.
- Log pull + integrity check: verify log packet completeness and the presence of required audit fields.
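Two-point current calibration reduces to solving for a gain and an offset. A minimal sketch, assuming a linear sense chain and two reference currents applied by the test fixture (the function names are hypothetical):

```python
def two_point_cal(raw_lo, ref_lo, raw_hi, ref_hi):
    """Derive (gain, offset) so that gain * raw + offset matches the reference."""
    if raw_hi == raw_lo:
        raise ValueError("calibration points must differ")
    gain = (ref_hi - ref_lo) / (raw_hi - raw_lo)
    offset = ref_lo - gain * raw_lo
    return gain, offset


def apply_cal(raw, gain, offset):
    """Convert a raw reading into a calibrated value."""
    return gain * raw + offset
```

A one-point strategy corrects only one of the two terms (typically offset at zero current), which is why it cannot remove gain error.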
Fault injection mindset (matrix-driven, not ad-hoc)
- Fault types: OCP, short, OT, UV/bus droop, remote-off, sensor-fault.
- Expected actions: limit/derate, shed, lockout/manual clear, preserve OOB + alarm domain.
- Expected logs: reason_code, t_mono, snapshots, config_version; plus actor/request_id for writes.
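A matrix-driven check can be sketched as a lookup table plus one comparison function. The action assigned to each fault below is illustrative only, and the field names follow the event-record fields used earlier on this page.

```python
# Fields every fault record is expected to carry.
BASE_FIELDS = {"reason_code", "t_mono", "pre_snapshot", "post_snapshot", "config_version"}
WRITE_FIELDS = {"actor", "request_id"}  # additionally required for remote writes

# fault type -> (expected action, is it a remote write?)
# The expected actions are placeholders to be set per design.
FAULT_MATRIX = {
    "OCP":          ("limit", False),
    "SHORT":        ("trip", False),
    "OT":           ("derate", False),
    "UV":           ("shed", False),
    "REMOTE_OFF":   ("off", True),
    "SENSOR_FAULT": ("degraded", False),
}


def check_case(fault, observed_action, log_record):
    """Compare one injected fault against the matrix; return a list of failures."""
    expected_action, is_write = FAULT_MATRIX[fault]
    failures = []
    if observed_action != expected_action:
        failures.append(
            f"{fault}: expected action {expected_action!r}, got {observed_action!r}")
    required = BASE_FIELDS | (WRITE_FIELDS if is_write else set())
    missing = sorted(required - set(log_record))
    if missing:
        failures.append(f"{fault}: log missing {missing}")
    return failures
```

An empty list means the case passed; running every row of the matrix after each firmware or policy change gives a repeatable regression check.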
Field self-checks (fast, safe, and actionable)
- Read-only baseline: state bits, thresholds, policy/config version, and recent events.
- Low-risk actuation: validate a non-critical branch/group control under controlled windows.
- Degraded operation: if sensors are faulty or telemetry is partial, restrict risky automation and keep OOB/alarm online.
Parts / IC Selection Pointers: Vendor Questions + Example MPNs
This section is intentionally not a buying guide. It is a practical checklist of “must-ask questions” per function block, plus illustrative MPN examples to anchor the discussion. Always validate derating, SOA, thermal layout, diagnostics, and production test hooks with the latest datasheets and board-level measurements.
A) eFuse / High-Side Switch (branch protection core)
Two common approaches exist: integrated eFuse (fast integration) or controller + external MOSFET (SOA/thermal flexibility).
- Selection dimensions: VIN/VBUS range, continuous current, surge/peak + SOA, limit mode (constant/foldback/fast-trip), short-circuit response, reverse blocking, OV/UV/OT, dv/dt ramp, telemetry & diagnostics, retry/lockout policy.
- Must-ask questions:
  - Is SOA specified with pulse conditions representative of branch inrush and short fixtures?
  - What is the limit behavior (constant vs foldback) and how does it impact dissipation during stalls/overloads?
  - How are faults differentiated (OCP vs SC vs OT vs UV/OV vs reverse)? Is there a latched last-fault register?
  - Is retry policy configurable (attempt count, backoff, lockout, manual clear)? What is the default after a watchdog reset?
  - What is the measurable response time from fault onset to limit/trip under worst-case wiring inductance?
- Example MPNs (illustrative):
  - Texas Instruments TPS2662 / TPS2663 (integrated eFuse class, high-voltage range family)
  - Texas Instruments LM5069 (hot-swap / inrush controller class, external MOSFET)
  - Texas Instruments LM25066A (hot-swap + current monitor + PMBus/SMBus telemetry class)
  - Infineon PROFET™ high-side switch families (12V/24V domain branch switching class; family selection depends on load)
B) Current / Voltage Monitoring (trustworthy observability)
- Selection dimensions: accuracy + drift, common-mode range, input transient behavior (saturation + recovery), sample rate/bandwidth, alert comparators (threshold granularity, hysteresis, debounce), interface (I²C/SMBus/PMBus), multi-channel needs.
- Must-ask questions:
  - Under bus spikes/droops, does the monitor saturate? If yes, what is the recovery time?
  - Can alerts be qualified (time debounce) and do they support hysteresis to prevent alarm storms?
  - Is peak vs average reporting supported (or can firmware derive it without aliasing)?
  - What calibration hooks exist (offset/gain trim, temperature compensation strategy)?
- Example MPNs (illustrative):
  - Texas Instruments INA238 (precision current/power monitor class)
  - Texas Instruments INA233 (current/power monitor class, SMBus compatible)
  - Texas Instruments INA3221 (3-channel shunt/bus monitor class)
  - Analog Devices LTC2946 (coulomb/energy monitor class)
C) Sensors & Aggregation (temp/humidity/door/leak + mux/ADC)
- Selection dimensions: sensor fault detectability (open/short/stuck), debounce/hysteresis strategy, multiplexing scale (channels, address conflicts), bus recovery (hung bus isolation/reset), ADC channels/resolution, input protection for long sensor leads.
- Must-ask questions:
  - How are sensor wiring faults represented (explicit fault flags vs “invalid readings”)?
  - Does the mux/buffer allow isolating a misbehaving branch device without taking the whole bus down?
  - What is the recommended bus pull-up / capacitance limit for the intended cable lengths inside the rack?
- Example MPNs (illustrative):
  - Texas Instruments TCA9548A (8-channel I²C switch / mux class)
  - NXP PCA9548A (8-channel I²C switch / mux class)
  - Texas Instruments ADS1115 (I²C ADC class for slow-to-moderate telemetry)
  - Sensirion SHT31 (temp/humidity sensor class)
D) Controller (low-power MCU vs small management SoC)
- Selection dimensions: watchdog + reset domains, brownout behavior, non-volatile event storage hooks, bus peripherals (I³C/I²C/SMBus/PMBus), secure update hooks (versioning + rollback path), and deterministic recovery after faults.
- Must-ask questions:
  - During brownouts, can the design guarantee last-fault capture (reason_code + timestamp + snapshot)?
  - Is a robust update/rollback mechanism feasible (dual image / safe fallback), and can updates be audited (actor/request_id)?
  - After a watchdog reset, what is the default policy for branch enables and the OOB/alarm domain?
- Example MPNs (illustrative):
  - STMicroelectronics STM32H7 / STM32G4 (MCU class)
  - NXP LPC55S69 (MCU class)
  - Microchip SAME54 (MCU class)
  - ASPEED AST2600 (management SoC / BMC-class device; use only for “hooks” here, not a deep dive)
E) Uplink communications (Ethernet PHY / isolation + optional RS-485)
- Selection dimensions: PHY robustness, ESD/EMI tolerance strategy (including magnetics/isolation boundary), link-down behavior, offline logging/buffering strategy, and optional long-line fallback (RS-485).
- Must-ask questions:
  - If uplink drops, can the system continue logging locally and backfill events after recovery?
  - Where is the isolation boundary (cross-domain/long cable/common-mode noise), and does isolation preserve management continuity?
  - For RS-485 fallback, what is the minimum control+alarm set that must remain available during degraded operation?
- Example MPNs (illustrative):
  - Texas Instruments DP83867 (Gigabit Ethernet PHY class)
  - Microchip KSZ9131 (Gigabit Ethernet PHY class)
  - Texas Instruments SN65HVD1781 (RS-485 transceiver class)
  - Analog Devices ADM2587E (isolated RS-485 transceiver class)
| Checklist item | Option A (fill in) | Option B (fill in) | Option C (fill in) |
|---|---|---|---|
| SOA / surge margin | TBD | TBD | TBD |
| Limit behavior | CONST / FOLD / TRIP | CONST / FOLD / TRIP | CONST / FOLD / TRIP |
| Reverse blocking | OK / N/A | OK / N/A | OK / N/A |
| Diagnostics richness | reason_code granularity | reason_code granularity | reason_code granularity |
| Retry / lockout policy | attempts + backoff | attempts + backoff | attempts + backoff |
| Telemetry interface | I²C / SMBus / PMBus | I²C / SMBus / PMBus | I²C / SMBus / PMBus |
| Production test hooks | self-test + logs | self-test + logs | self-test + logs |
FAQs: Micro-DC Rack Branch Protection, Telemetry, Env Linkage, and OOB Hooks
These FAQs stay strictly inside the Micro-DC Rack boundary: branch eFuse/high-side switching, telemetry + event logs, environment/door/tamper linkage, and OOB management hooks.
1. What is the practical engineering boundary between a Micro-DC Rack and a traditional rack PDU?
2. Why can an eFuse trip frequently even when “average current looks small”?
3. Constant-current vs foldback current limiting: how to choose, and what are the common pitfalls?
4. What happens if short-circuit response is too fast or too slow?
5. After a remote power-cycle, the load still won’t come up. What codes and timestamps should be checked first?
6. How can “critical-domain priority power” be implemented without introducing a new single point of failure?
7. How should door/tamper events link to power actions without causing accidental shutdowns?
8. What is the most reasonable division of labor for I³C, I²C, and PMBus inside a Micro-DC Rack?
9. If current measurement drifts over time, how to tell shunt heating from layout issues or the sampling chain?
10. How should an event log be designed so “who triggered the power action” remains traceable after power loss?
11. If simultaneous power-up causes bus droop, how should branch staggering be implemented for stability?
12. When smoke/leak/high-temperature alarms occur, what is a recommended staged load-shedding policy?