OCP OpenRack Baseboard for I3C/I2C Bus Management
An OpenRack baseboard turns scattered I³C/I²C devices into an operable, segmentable, and recoverable management fabric by enforcing discovery/addressing, multi-domain telemetry context (domain + timestamp + identity), and tiered bus recovery—so observability survives hot-plug and power-domain transitions.
H2-1 — What “OpenRack Baseboard” Covers
An OpenRack baseboard is the physical layer for sideband connectivity and sensor/FRU visibility. Its engineering core is I3C/I2C bus governance (topology, segmentation, isolation), device discovery & addressing (DAA/address plan), and multi-domain telemetry aggregation (power/thermal + domain state) into the management plane.
- A bus architecture that stays stable at scale: segmented backbone + isolated islands so one fault does not take down the entire rack-side visibility.
- A repeatable discovery/addressing scheme: deterministic inventory mapping across hot-join events and multi-domain power states.
- A telemetry model that is actionable: raw readings → thresholds/alerts → event evidence (timestamps + affected segment/domain).
| Topic | In scope (this page) | Out of scope (linked elsewhere) |
|---|---|---|
| Management | Bus master placement, sideband signal paths, alert wiring, inventory evidence fields. | Redfish/IPMI software stack architecture, firmware workflows, UI/telemetry dashboards. |
| Power | Telemetry ingestion (voltage/current/power), domain states (AON/MAIN/HOTPLUG), back-power prevention at the bus layer. | PFC/LLC/CRPS conversion topology, VRM control-loop stability/compensation details. |
| Compute/IO | Sideband presence/FRU, sensor islands near connectors, segment isolation strategy. | PCIe/NVMe/IB/Ethernet protocol stacks, retimer equalization, dataplane acceleration. |
Practical rule: if a paragraph starts describing how power is converted or how a management protocol is implemented, it belongs to a sibling page. This page only defines the bus/device layer that makes those systems observable and recoverable.
H2-2 — System Context & Interfaces
A baseboard must be describable as a bus graph plus power/reset domains. The objective is to make every sideband endpoint (FRU, temperature, power monitors, presence) discoverable, addressable, and recoverable under real rack conditions: standby states, partial power loss, and hot-join events.
| Device class | Bus segment | Addressing | Power domain | Reset / Enable | Alert path | Failure impact |
|---|---|---|---|---|---|---|
| FRU / EEPROM | I2C Island-A | Static / behind mux | AON | Always enabled | Polling | Low (segmentable) |
| Temp array | I2C Island-A | Static | AON or MAIN | GPIO enable | INT (optional) | Medium (can flood bus) |
| Power monitor | I2C Island-B | Static | MAIN | Domain PGOOD gated | ALERT#/INT | High (stuck-low risk) |
| Presence / GPIO exp | I2C Island-A/C | Static | HOTPLUG | Hot-swap domain | INT | High (hot-join noise) |
| I3C hub/bridge | I3C Backbone | DAA / dynamic | AON | Always enabled | In-band | Critical (backbone) |
- Define an AON minimum observable chain: a small set of endpoints that must remain readable in standby or partial failure (e.g., FRU + environment temp + backbone health).
- Assign pull-ups to the correct domain: pull-ups that remain powered while a segment endpoint is off must not create back-power paths through I/O structures.
- Segment by domain boundaries: any domain that can be powered off or hot-plugged should sit behind a switch/mux/buffer so a fault cannot clamp the entire bus.
- Reset/enable must be coherent with discovery: if an endpoint can reboot independently, discovery should be re-entrant (re-scan or re-DAA) without destabilizing the backbone.
Minimum-observable does not mean “monitor everything”. It means “keep enough visibility to diagnose and recover”: domain presence, backbone health, and the most critical thermal/power indicators—without relying on full system power.
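The minimum-observable idea can be expressed as a small check that runs against domain state and reachability data. A minimal Python sketch, assuming an illustrative chain of endpoint names (`fru.baseboard`, `temp.ambient`, `bridge.backbone.01`) that are not defined by any spec:

```python
# Sketch: verify the AON "minimum observable chain" under a given power state.
# Endpoint names and chain contents are illustrative assumptions, not a spec.

AON_MIN_CHAIN = ["fru.baseboard", "temp.ambient", "bridge.backbone.01"]

def check_min_observable(domain_state: dict, reach: dict) -> list:
    """Return the endpoints of the minimum chain that are NOT readable.

    The chain must hold whenever the AON domain is up, regardless of
    MAIN/HOTPLUG state (standby, partial failure, service windows).
    """
    if not domain_state.get("AON", False):
        # AON itself is down: nothing in the chain is explainable.
        return list(AON_MIN_CHAIN)
    return [ep for ep in AON_MIN_CHAIN if not reach.get(ep, False)]
```

The useful property is that the check ignores MAIN/HOTPLUG entirely: a powered-off island must never make the minimum chain report missing endpoints.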
H2-3 — Bus Topology Patterns
The baseboard should be treated as a bus graph with explicit segments. The objective is not simply connectivity, but fault containment: long traces, high capacitive loads, and hot-join domains must not clamp the backbone or destabilize discovery.
Single-master (default)
One backbone master simplifies arbitration and makes recovery deterministic. Segments isolate failures so a single stuck line becomes a local event rather than a rack-wide visibility loss.
Multi-master (only with strong justification)
Adds arbitration, timing edges, and new failure modes. If introduced, segment boundaries and isolation must be stricter so a misbehaving master cannot flood retries or hold the bus in a degraded state.
- Physical distance: long runs amplify edge distortion and noise pickup; buffers/switches help restore electrical margin and limit the affected length.
- Capacitive loading: many endpoints slow edges and increase glitch sensitivity; islands cap per-segment load so timing stays stable under worst-case conditions.
- Power domains: any domain that can be off or partially powered must be behind a gate to prevent back-power and “stuck-low” propagation.
- Noise sources: hot-plug connector zones and high-current regions deserve separate islands to keep backbone discovery stable.
| Component | Best for | Not a fix for | Typical placement |
|---|---|---|---|
| MUX | Address conflicts, branch selection, reducing visible endpoints per scan. | Domain back-power, hot-join disturbance, hard fault isolation. | Between backbone and legacy branches; upstream of small islands. |
| SWITCH | Segment-level isolation, fault containment, controlled attach/detach of a domain. | Edge restoration on long lines if used alone; protocol-level inventory logic. | At domain boundaries (MAIN/HOTPLUG), at connector entry points. |
| BUFFER | Electrical margin: rise-time help, fanout, restoring edges after long runs. | Back-power prevention, isolating a stuck-low endpoint across domains. | On long trunks, before high-fanout branches, near noisy zones. |
- Hot-plug domains must be gated: attach/detach should affect only the island, not the backbone.
- Pull-up ownership must be explicit: avoid powering an unpowered island through bus pull-ups or I/O structures.
- Discovery must be scoped: hot-join triggers a limited rescan/attach sequence for that island, not a disruptive global churn.
Rule of thumb: if a failure can pull SDA/SCL low, it must be possible to isolate that failure within one segment using a gate (switch/isolator), regardless of whether addressing conflicts also exist.
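The containment rule can be sketched as a toy bus graph in which a stuck-low fault is visible to the backbone only through an open gate. Segment and gate names are illustrative:

```python
# Sketch of the containment rule: a stuck-low fault must be confinable to one
# segment by closing its gate. Names are illustrative, not a board netlist.

class Segment:
    def __init__(self, name: str):
        self.name = name
        self.stuck_low = False   # a device on this segment clamps SDA/SCL

class Gate:
    def __init__(self, segment: Segment):
        self.segment = segment
        self.open = True

def backbone_healthy(gates) -> bool:
    """The backbone sees a stuck line only through an OPEN gate."""
    return not any(g.open and g.segment.stuck_low for g in gates)

hp, main = Segment("I2C-HP"), Segment("I2C-1")
gates = [Gate(hp), Gate(main)]
hp.stuck_low = True                  # fault on the hot-plug island
assert not backbone_healthy(gates)   # visible while its gate is open
gates[0].open = False                # isolate the island
assert backbone_healthy(gates)       # backbone recovers; I2C-1 unaffected
```

If the real topology cannot reproduce this behavior (some fault path bypasses every gate), the segmentation plan has a hole regardless of how clean the address map is.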
H2-4 — I2C vs I3C in Baseboards
The practical question is not “new vs old”, but what becomes easier to operate. I3C strengthens discovery semantics and hot-join handling, while I2C islands remain valuable for legacy endpoints and simple devices that do not require dynamic identity or in-band alert behavior.
- Dynamic Address Assignment (DAA): reduces static address collisions and supports deterministic inventory mapping when endpoints change.
- Hot-join semantics: enables controlled attach workflows so insertion events can be scoped to an island rather than destabilizing the backbone.
- In-band interrupt / alerting: improves time-to-detect for threshold events without relying on heavy polling.
- Stronger discovery vocabulary: supports a cleaner “device identity → inventory record” pipeline at the bus/device layer.
- Legacy endpoints: FRU/EEPROM and many simple sensor classes remain cost-effective and widely available on I2C.
- Operational simplicity in islands: stable static addressing is acceptable for small, tightly bounded segments.
- Isolation-first design: keeping legacy on islands can reduce the blast radius of electrical faults or partial-power behavior.
Step 1 — I3C backbone (AON)
Upgrade the backbone that must remain visible in standby. Make discovery and minimum telemetry deterministic and recoverable.
Step 2 — Bridge + segment gates
Bring legacy islands under the same inventory model via bridges and gates, so hot-plug or power-off behavior stays local.
Step 3 — Replace only where it pays
Move endpoints that benefit from dynamic identity or in-band alerts, while leaving stable low-complexity devices on I2C islands.
| Dimension | I2C (islands) | I3C (backbone) |
|---|---|---|
| Addressing | Static planning; collisions handled via straps or mux segmentation. | DAA supports dynamic identity management and reduces static collision pressure. |
| Discovery | Scan-based visibility; works best for small bounded segments. | Stronger discovery semantics; better for inventory mapping across churn. |
| Alerts | Often sideband INT/ALERT or polling-driven detection. | In-band alerting reduces dependency on heavy polling and speeds reaction. |
| Scale behavior | Large fanout stresses edges and timing; segmentation becomes mandatory. | Backbone-first helps centralize discovery while islands limit capacitive blast radius. |
| Recovery hooks | Segment isolation + rescan; deterministic if islands are small. | Scoped attach/detach + re-DAA patterns support controlled churn handling. |
| Migration pitfalls | Address maps fragment easily if mux topology is undocumented. | Bridges must preserve isolation and identity mapping under power-domain changes. |
A mixed design is often the most robust: I3C for backbone-level discovery and alert semantics, I2C islands for bounded legacy endpoints. The design objective is stable observability, not maximal protocol uniformity.
H2-5 — Device Discovery & Addressing
A baseboard onboarding flow must be repeatable: plug in → detect → assign an address → verify reachability → register into inventory. Discovery becomes “operational” only when it records segment, power-domain state, alert path, and a verification result alongside the observed address.
Collision avoidance toolkit
Prefer segmentation (islands) to keep conflicting endpoints from sharing the same visible bus. Use straps/config pins when available. Use mux selection only when segmentation boundaries are documented and enforced.
Rule
The same numeric address can repeat across different islands, but must not appear in the same visible segment at the same time. Document every segment boundary and its gating element.
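The rule can be enforced mechanically at review time. A sketch, assuming plan entries of the form (inventory_key, segment_id, address); the field names are illustrative, not a defined schema:

```python
# Sketch: validate an address plan against the rule above. Duplicate numeric
# addresses are allowed across islands, but never within one visible segment.
from collections import defaultdict

def address_conflicts(plan):
    """Return {(segment, address): [keys]} for addresses duplicated
    within a single visible segment."""
    seen = defaultdict(list)
    for key, segment, addr in plan:
        seen[(segment, addr)].append(key)
    return {sa: keys for sa, keys in seen.items() if len(keys) > 1}

plan = [
    ("fru.drawer.slot3", "I2C-HP", 0x50),
    ("fru.riser.slot1",  "I2C-1",  0x50),   # same 0x50, different island: OK
    ("temp.array.zoneA", "I2C-1",  0x4A),
]
```

Running this over the address plan table before deployment catches the classic failure where a mux rework silently merges two islands that both carry 0x50.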
- When to run DAA: after segment attach, after power-domain transitions that change device visibility, and on hot-join events.
- Scope it: run onboarding on the affected segment/island, not the entire backbone, to avoid unnecessary address churn.
- Map to inventory: the inventory key must be device identity + segment context; the dynamic address is the current reachability handle.
- Record changes: log address transitions (before → after) and the reason (join, rescan, recovery).
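The four practices above can be combined into one scoped onboarding routine. A sketch, with detection, assignment, and verification stubbed out as callables (all names are illustrative):

```python
# Sketch of scoped onboarding: detect -> assign -> verify -> register, with a
# before/after address log. Board-specific actions are passed in as stubs.
import time

def onboard_segment(segment_id, detected, assign, verify, inventory, log, reason):
    """Run discovery on ONE segment only; never touch the rest of the backbone."""
    for identity in detected(segment_id):
        before = inventory.get(identity, {}).get("addr")
        addr = assign(segment_id, identity)      # static lookup or DAA result
        ok = verify(segment_id, addr)            # basic read/health check
        log.append({"ts": time.time(), "segment": segment_id, "key": identity,
                    "before": before, "after": addr, "verify": ok,
                    "reason": reason})
        if ok:                                   # unverified devices never
            inventory[identity] = {"addr": addr, # enter inventory
                                   "segment": segment_id}
    return inventory
```

Note the inventory key is the device identity, never the address: the address is only the current reachability handle, exactly as the bullet list requires.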
| Evidence item | What to record | Why it matters operationally |
|---|---|---|
| Segment / island | Segment ID, upstream gate/switch, bridge path | Limits blast radius; enables scoped rescan and targeted isolation. |
| Power-domain | AON / MAIN / HOTPLUG state | Prevents false alarms when a domain is intentionally off; explains “missing” devices. |
| Alert path | In-band alert / INT line / none | Separates “device unreachable” from “event signal broken”; improves MTTR. |
| Address state | Static expected, or dynamic assigned; before → after on changes | Supports audit trails, avoids identity drift, and enables stable inventory linking. |
| Verification | Basic read/health check: pass/fail + failure category | Discovery without verification causes inventory pollution and noisy support tickets. |
This plan ties addressing to topology and domains. Use it to review changes before deployment.
| Domain | Segment / Island | Device Class | Inventory Key | Addr Type | Addr (Expected/Current) | Alert Path | Isolation Element | Notes |
|---|---|---|---|---|---|---|---|---|
| AON | B0 (Backbone) | Bridge | bridge.backbone.01 | DAA | dyn / dyn | in-band | switch gate | Scoped onboarding after attach |
| MAIN | I2C-1 | TEMP | temp.array.zoneA | Static | 0x4A / 0x4A | INT | mux branch | Address shared only within island |
| HOTPLUG | I2C-HP | FRU | fru.drawer.slot3 | Static | 0x50 / 0x50 | none | switch gate | Gate open only during service window |
This log turns “it disappeared” into a timed, explainable event with scope and recovery actions.
| Timestamp | Event | Domain / Segment State | Inventory Key | Address Before → After | Verify | Failure Category | Action Taken |
|---|---|---|---|---|---|---|---|
| YYYY-MM-DD hh:mm:ss | HOT-JOIN | HOTPLUG / gate=open | fru.drawer.slot3 | — → 0x50 | PASS | — | Inventory update |
| YYYY-MM-DD hh:mm:ss | RESCAN | MAIN / stable | temp.array.zoneA | 0x4A → 0x4A | FAIL | NACK storm | Isolate island, retry later |
| YYYY-MM-DD hh:mm:ss | RECOVERY | AON / backbone | bridge.backbone.01 | dyn → dyn | PASS | — | Scoped DAA on segment |
Boundary: this section defines bus/device-layer onboarding evidence (scope, address state, verification, logs). It does not define management protocol stacks or backend database implementations.
H2-6 — Bus Electrical Integrity
Most field failures begin as marginal edges: slow rise-time, excessive segment capacitance, glitches, or insufficient low-level margin. A bus that “works once” can still be unstable under temperature, hot-plug, or domain transitions. Evidence should be captured as waveforms and retry behavior, not assumptions.
- Estimate per-segment load: include trace length, connector parasitics, and endpoint input capacitance. Treat each island as a separate RC problem.
- Choose pull-up by segment: keep pull-up ownership aligned with the segment’s power domain to prevent back-power paths.
- Validate at the far end: measure rise-time and noise margin at the worst-case point (end of the longest branch), not only near the master.
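The far-end check can be pre-screened on paper with the standard single-RC estimate: I²C rise time is measured between 0.3·Vdd and 0.7·Vdd, so t_r = R·C·ln(7/3) ≈ 0.8473·R·C. A sketch using the Fast-mode (400 kHz) 300 ns limit; the load values in the comments are illustrative, not from any datasheet:

```python
import math

# I2C rise time is specified between 0.3*Vdd and 0.7*Vdd, so a single-RC
# estimate gives t_r = R * C * ln(7/3) ~= 0.8473 * R * C.
def rise_time_s(r_pullup_ohm: float, c_bus_f: float) -> float:
    return r_pullup_ohm * c_bus_f * math.log(7 / 3)

# Fast-mode (400 kHz) allows at most 300 ns rise time.
def segment_ok(r_pullup_ohm: float, c_bus_pf: float,
               t_max_ns: float = 300.0) -> bool:
    return rise_time_s(r_pullup_ohm, c_bus_pf * 1e-12) * 1e9 <= t_max_ns

# Example: 2.2 kOhm into 150 pF (long branch + connector + endpoints)
# estimates to roughly 280 ns: passing, but with little margin for corners.
```

The same 150 pF island on a 4.7 kΩ pull-up fails the estimate outright, which is exactly the case where a buffer or tighter segmentation pays off. The estimate never replaces the far-end measurement; it tells you which segment to probe first.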
What typically goes wrong
Short spikes can be interpreted as edges when rise-times are slow or thresholds are marginal. This can produce NACK storms or address-mapping drift during hot-join windows.
What to do (within scope)
Use segmentation and electrical buffering where long runs and noisy connector zones exist. Gate hot-plug islands so disturbances do not propagate into the backbone.
- Clock stretching amplifies marginality: one slow or partially-powered endpoint can elongate cycles and trigger timeouts.
- Marginal edges increase retries: a stable system shows low retry frequency; rising retries are an early warning long before “bus hang”.
- Contain the failure domain: a segment gate allows recovery actions (retry/rescan) without collapsing overall observability.
| Checkpoint | What to look at | Interpretation / next action |
|---|---|---|
| Rise-time | SCL/SDA edge speed at segment end; compare near-master vs far-end. | Slow edges suggest high C or weak pull-up; segment, buffer, or adjust pull-up ownership per domain. |
| Low-level margin | Low level “floor” stability under traffic; look for lifted lows. | Insufficient margin can cause false reads; isolate partial-power endpoints and review segment gating. |
| Glitches | Short spikes on SCL/SDA; correlate with missing devices or NACK storms. | Glitches + slow edges are a common pair; shorten/noise-isolate the segment and add electrical buffering where needed. |
| Retry / repeat-start | Protocol analyzer or firmware counters: retry rate over time. | Rising retries indicate shrinking margin; treat as pre-failure signal and scope the worst segment first. |
| Hot-plug window | Waveforms before/during/after attach; check if backbone edges degrade. | If backbone degrades, hot-plug domain is not contained; strengthen gating/isolation and rescan only that segment. |
Boundary: this section focuses on bus-edge integrity and measurement evidence. It does not prescribe full EMC design or chassis grounding rules.
H2-7 — Multi-Domain Power & Isolation
In baseboards, the most damaging failures come from cross-domain coupling: a powered-off island can back-power, drag the bus, or spread hot-plug disturbances into always-on visibility. The goal is to keep each domain electrically and operationally scoped with clear pull-up ownership and segment gating.
Typical paths
Cross-domain pull-ups, IO protection/clamp structures, and “half-powered” endpoints can create unintended current paths. The symptom is often unstable bus levels, stuck lows, or devices that never fully reset.
Design intent (within bus/domain scope)
Enforce segment visibility with gates/switches, keep pull-ups owned by the segment’s intended domain, and ensure powered-off islands become electrically quiet and logically invisible.
- Segment gates: isolate islands so a fault or off-domain endpoint cannot pull SCL/SDA low globally.
- Pull-up ownership: assign pull-ups to the domain that remains valid for that segment (avoid cross-domain pull-ups by default).
- Scoped recovery: isolate → log → rescan only the affected island (avoid global churn).
- Hot-plug = separate segment: attach islands behind a gate so plug-in transients do not degrade the backbone.
- Order of operations: power stable (PG) → open gate → discovery/addressing → verification → inventory update.
- Evidence: capture “attach/open/close” events with timestamps and segment IDs to explain reachability changes.
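The order of operations can be made explicit so a review can prove that discovery never runs before power is stable. A sketch with the board-specific actions stubbed as callables (names are illustrative):

```python
# Sketch: the attach order of operations as one routine.
# power stable (PG) -> open gate -> discovery -> verification -> inventory.

def attach_island(segment_id, pgood, gate_open, discover, verify, events):
    """Attach one island; emit evidence for every step."""
    if not pgood(segment_id):
        events.append((segment_id, "attach_refused", "PG not stable"))
        return False
    gate_open(segment_id)
    events.append((segment_id, "gate", "open"))
    found = discover(segment_id)                 # scoped to this island only
    ok = all(verify(segment_id, dev) for dev in found)
    events.append((segment_id, "verify", "pass" if ok else "fail"))
    return ok
```

Because the PG check gates everything else, a half-powered island can never reach the discovery step, and the refusal itself lands in the evidence log with a segment ID.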
Use this matrix in design reviews to prove that off-domains cannot back-power or block the always-on view.
| Domain | Bus visibility | Pull-up owner | Gate element | When domain OFF | Recovery action |
|---|---|---|---|---|---|
| AON | Backbone B0 | AON only | Core gate | Must remain reachable; backbone stays stable | Scoped rescan of attached islands only |
| MAIN | I2C-1 / I2C-2 islands | MAIN (per island) | Island switch | Island should be invisible (gate closed) | Close gate → log → reopen after PG stable |
| HOTPLUG | HP island | HOTPLUG (local) | HP gate | Invisible during service / unpowered periods | Attach → verify → inventory; detach → mark absent |
Treat cross-domain connectivity as “allowed by design,” not accidental. If a signal is not listed here, it should not cross domains.
| Signal class | Allowed direction | Required properties | Notes (scope control) |
|---|---|---|---|
| I3C / I2C | Backbone → island (through gate) | Segment gate; pull-up ownership defined; OFF-domain becomes invisible | Prefer “attach/detach” semantics over always-connected wiring |
| ALERT / INT | Island → AON (minimal dependency) | Defined pull-up; known OFF-state behavior; debounced semantics | Use for “wake/attention” when in-band is unavailable |
| PRESENT / PGOOD | Island → AON | Stable level when OFF; no back-power path | Explains reachability changes without scanning |
| RESET (domain) | AON → island | Only valid when island domain is powered; no reverse feeding | Scope reset to the island to avoid global churn |
Boundary: this section defines domain visibility, segment gating, and pull-up ownership for bus stability. It does not define BMC protocol stacks, PSU/VRM power conversion, or full EMC grounding rules.
H2-8 — Telemetry Aggregation Model
A baseboard telemetry system is not a pile of sensors. It is a structured pipeline that binds every measurement to domain, segment, and an inventory key, with a clear path to alerts and event logs. The same measurement should be explainable over time (trend) and under transitions (attach/off/reset).
Layer 0 → 1
Raw reads become physical units (V/A/W/°C) with a domain + segment context.
Layer 2 → 3
Thresholds and alert policy produce events with timestamps and traceability back to inventory.
- Electrical: voltage/current/power per domain and segment (pair with domain state to avoid false alarms).
- Thermal arrays: hotspot/max/min and gradient cues (useful for localization, not only average temperature).
- Domain state: present/pgood/reset/attach (explains why data is missing or why a device is unreachable).
- Reachability evidence: segment gate state and retry level (links electrical issues to bus integrity symptoms).
| Alert path | Best fit | Design note (scope control) |
|---|---|---|
| In-band (I3C) | Structured events tied to discovery/inventory; when the bus view is stable and the segment is attached. | Bind alerts to domain + segment + key fields. |
| Out-of-band (ALERT/INT) | Minimal dependency signaling; “attention” when in-band is unavailable (OFF-domain transitions, early fault flags). | Define OFF-state behavior and debounce semantics; treat it as a trigger to fetch structured data later. |
Standardize field names so logs and alerts remain comparable across platforms and generations.
| Field name | Unit | Source class | Sampling suggestion | Threshold type | Domain | Segment | Inventory key link | Alert path | Log policy |
|---|---|---|---|---|---|---|---|---|---|
| dom.aon.vbus_v | V | power | normal | absolute + persistence | AON | B0 | power.aon.entry | in-band | periodic + on-alert |
| dom.main.pwr_w | W | power | normal | absolute + delta | MAIN | I2C-1 | power.main.zoneA | in-band | periodic |
| dom.hp.present | bool | domain_state | fast | edge-trigger | HOTPLUG | HP | fru.drawer.slotN | INT | on-change + on-alert |
| seg.hp.gate_state | bool | domain_state | fast | edge-trigger | HOTPLUG | HP | gate.hp.01 | in-band | on-change |
| tmp.zoneA.hotspot_c | °C | temp | normal | absolute + rate | MAIN | I2C-1 | temp.array.zoneA | in-band | periodic + on-alert |
| bus.i2c1.retry_level | count | reachability | normal | delta + persistence | MAIN | I2C-1 | bridge.i2c1 | in-band | periodic |
| evt.alert.ts | ms | event | on-event | — | any | any | inventory.key | in-band / INT | on-event |
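Binding a raw reading to its domain/segment/inventory context before any threshold logic can be sketched as follows. The threshold policy shown is a simplified absolute check, not a full policy engine; field names follow the table above where possible:

```python
# Sketch: a telemetry record carries context, so every alert carries evidence.
from dataclasses import dataclass

@dataclass
class Reading:
    field: str          # e.g. "tmp.zoneA.hotspot_c"
    value: float
    unit: str
    domain: str         # AON / MAIN / HOTPLUG
    segment: str
    inventory_key: str
    ts_ms: int

def evaluate(reading: Reading, limit: float, domain_up: bool) -> dict:
    """Absolute-threshold check that suppresses alerts for off domains."""
    if not domain_up:
        # Intentionally-off domain: missing/odd data is expected, not a fault.
        return {"state": "domain_off", "alert": False}
    over = reading.value > limit
    return {"state": "over" if over else "ok",
            "alert": over,
            "evidence": (reading.ts_ms, reading.domain, reading.segment,
                         reading.inventory_key)}
```

The `domain_up` guard is the point of the pairing rule above: the same hotspot reading is an alert when MAIN is powered and a non-event when MAIN is intentionally off.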
Boundary: this section defines telemetry organization (fields, units, context binding, alert paths). It does not define storage backends, APIs, or BMC service implementations.
H2-9 — Fault Modes & Bus Recovery
The recovery objective is restoring observability without a full system reboot. Treat the baseboard bus as segmented infrastructure: recover the AON backbone first, then isolate and reattach affected islands. Every recovery step should emit evidence (timestamp, segment, action, result).
- SDA stuck-low: the bus cannot return high; scans hang or collapse into repeated timeouts.
- Timeout bursts: intermittent read/write failures that correlate with attach/off transitions or noisy edges.
- Address conflict: two devices respond; identity becomes ambiguous; inventory mismatches grow.
- Segment short / global drag: a single island pulls the whole network down.
- Hot-plug half-attach: the segment becomes unstable during service events, causing sporadic losses.
| Level | Trigger | Action | Exit criteria |
|---|---|---|---|
| L1 Retry + timeout | Intermittent failures; single-device errors | Scoped retry for the current segment/device | Error rate drops below threshold; no segment-wide impact |
| L2 Clock pulses | SDA stuck-low signature on a segment | Issue unstick pulses on the affected segment | SDA returns high; segment becomes reachable again |
| L3 Isolate segment | Global drag or suspected short/half-attach | Close gate for the suspected island; protect backbone | Backbone stable; other segments recover and scan reliably |
| L4 Rediscover / readdress | Inventory mismatch after attach/detach | Reattach → rediscover → (I3C) DAA reassign → verify | Inventory key ↔ address mapping converges; loss count returns to baseline |
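The ladder can be sketched as one escalation routine, including the classic unstick sequence (up to nine SCL pulses until SDA releases). Hardware access is stubbed behind a bus object; every step emits an evidence record. All names are illustrative:

```python
# Sketch of the L1 -> L4 escalation ladder with evidence emission.

def clock_unstick(read_sda, pulse_scl, max_pulses=9) -> bool:
    """Classic SDA-stuck-low recovery: clock SCL until the device releases."""
    for _ in range(max_pulses):
        if read_sda():          # SDA high again -> recovered
            return True
        pulse_scl()
    return read_sda()

def recover(segment, symptom, bus, evidence) -> bool:
    def log(level, action, result):
        evidence.append({"segment": segment, "level": level,
                         "action": action, "result": result})
    if symptom == "intermittent":
        ok = bus.retry(segment)                     # L1: scoped retry
        log("L1", "scoped retry", ok)
        return ok
    if symptom == "sda_stuck_low":
        ok = clock_unstick(bus.read_sda, bus.pulse_scl)
        log("L2", "clock pulses", ok)               # L2: unstick pulses
        if ok:
            return True
    bus.close_gate(segment)                         # L3: protect the backbone
    log("L3", "isolate segment", True)
    ok = bus.rediscover(segment)                    # L4: rediscover/readdress
    log("L4", "rediscover/readdress", ok)
    return ok
```

Because each level logs before escalating, a post-mortem can replay exactly how far the ladder climbed and on which segment, which is the evidence model the table above requires.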
Minimum evidence fields
ts, domain, segment_id, symptom, action, result, impact, retry_level, inventory_delta
| Observed symptom | Evidence check | Primary action | If not recovered |
|---|---|---|---|
| SDA stuck-low | Which segment_id? Gate state? Recent attach/off events? | L2 clock pulses on that segment | L3 isolate segment; keep backbone stable |
| Timeout bursts | Retry level trend; temperature/power transitions; gate flaps | L1 scoped retry with bounded timeout | L3 isolate if bursts expand to other segments |
| Missing devices | present/pg status; inventory_delta; segment attach state | Rescan the segment (scoped) | L4 reattach → rediscover/DAA → verify |
| Address conflict | Multiple responders; inventory key ambiguity | Scope to the conflicting island; isolate if needed | L4 rediscover + readdress; record mapping changes |
| Global drag | Backbone health vs islands; which gate change preceded collapse | L3 isolate the most recently attached / suspected island | Iterate isolation by segment until backbone recovers |
Boundary: recovery actions here are bus/segment level (retry, unstick pulses, isolate, rediscover/readdress). This section does not define OS drivers, BMC service logic, or database persistence.
H2-10 — Validation & Bring-up Checklist
Validation should eliminate “lab passes, rack fails” by proving stability at the worst corners: maximum nodes and length, extreme temperature, and domain/power disturbances. The bring-up sequence must establish an AON baseline first, then expand segments, then qualify hot-plug behavior.
- Stage 1 — AON baseline: backbone stable; minimum telemetry chain readable; evidence logs working.
- Stage 2 — Island expansion: add one segment at a time; prove faults remain scoped to that segment.
- Stage 3 — Hot-plug readiness: attach/detach cycles do not perturb the backbone; recovery is bounded in time and loss rate.
| Dimension | Progression | What to record (evidence) |
|---|---|---|
| Node count | Typical → upper bound | scan success rate, inventory_delta, retry_level trend |
| Length / load | Typical → worst cable/trace length and capacitance | timeout bursts, stuck-low incidence, segment isolation events |
| Temperature | Ambient → hot / cold corners | error rate vs temperature, hotspot indicators, recovery time |
| Domain / power disturb | Stable → off/on transitions + hot-plug events | time-to-visibility, loss rate, mapping drift count |
| Test ID | Setup | Steps | Pass criteria | Required logs |
|---|---|---|---|---|
| T01 | AON only, backbone | Boot → scan backbone → read minimum telemetry chain | No timeouts; stable scan rate | ts, segment_id=B0, retry_level, result |
| T02 | Max nodes, typical length | Scan loops for N cycles; record errors per cycle | Success rate ≥ target; bounded retries | ts, segment_id, retry_level, inventory_delta |
| T03 | Worst length/load | Repeat scans + induced attach events | No global drag; isolation works | gate_state, impact, action, result |
| T04 | Hot corner | Hold at high temp; scan; watch drift and timeouts | Error rate remains below threshold | temp hotspot, retry_level, timeout count |
| T05 | Cold corner | Repeat T04 at low temp | Recovery times remain bounded | time-to-visibility, loss rate |
| T06 | Domain off/on | Power off island → verify invisible → restore → rescan | Backbone stable; island recovery bounded | present/pg, gate_state, inventory_delta |
| T07 | Hot-plug cycles | Attach/detach for M cycles; verify mapping stability | Low loss; minimal mapping drift | ts, segment_id=HP, action, result, mapping drift |
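A T07-style run can be sketched as a cycle harness with bounded loss and mapping-drift counters; `attach`, `detach`, and `lookup` stand in for the real service actions and discovery result:

```python
# Sketch: M attach/detach cycles, counting visibility loss and identity drift.

def hotplug_cycles(m, attach, detach, lookup, expected_key):
    """Run m service cycles; return the evidence counters for the report."""
    loss, drift = 0, 0
    for _ in range(m):
        attach()
        key = lookup()                 # discovery result for the slot
        if key is None:
            loss += 1                  # device never became visible
        elif key != expected_key:
            drift += 1                 # identity mapped to the wrong record
        detach()
    return {"cycles": m, "loss": loss, "mapping_drift": drift}
```

The pass criterion in the table ("low loss; minimal mapping drift") then becomes a threshold on the returned counters rather than a judgment call.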
Boundary: this checklist specifies bring-up order, stress corners, and evidence fields. It does not specify automation frameworks, OS tooling, or rack deployment procedures.
H2-11 — Parts / IC Selection Pointers (Bus & Telemetry Only)
This section lists common IC categories used on an OCP/OpenRack baseboard to make the sideband (I³C/I²C) fabric scalable, segmentable, and observable. Focus stays on the bus layer and telemetry plumbing—no BMC firmware stack, no power-conversion deep dive.
11.1 What to place where (one-glance placement rules)
Note: MPNs below are examples to speed up RFQs and param checks. Always validate I/O behavior during power-off, hot-join/hot-plug expectations, and bus-capacitance budgets against the current datasheet.
11.2 Example IC categories and MPNs (copy/paste shortlist)
The goal is not “one perfect part,” but the right function blocks: hub/bridge → segmentation → hot-swap friendliness → domain translation/isolation → telemetry & inventory.
| Category | Example MPNs | Typical use on baseboard | Selection pointers (what to ask) |
|---|---|---|---|
| I³C hub / fan-out | NXP P3H2840HN, Renesas RG3M88B12 | Scale one I³C backbone into multiple downstream segments (AON plane), while keeping segments individually controllable for recovery and maintenance. | Downstream port count & control model; I³C vs mixed I²C support; hot-join behavior; reset semantics; fail-isolation (one bad segment impact); max bus speed and bus-cap budget. |
| I³C/I²C translators (for I³C rates) | TI TCA39416, NXP P3A9606 | Cross-voltage translation for sideband where I³C speed/edges matter (e.g., 1.2V↔1.8V/3.3V), keeping bidirectional open-drain semantics. | Confirm I³C compatibility; directionless/bidirectional behavior; rise-time impact; power-off leakage; EN/disable behavior (does it isolate or “half-connect”); ESD robustness for field swaps. |
| I²C mux / switch | TI TCA9548A, NXP PCA9548A | Segment legacy I²C islands to solve address conflicts, reduce effective bus capacitance, and localize stuck-low faults to one branch. | Channel count; reset/power-up default; leakage in off channels; level-translation needs; switch Ron vs edge integrity; software model (single vs multiple channels enabled). |
| Hot-swap I²C buffers | TI TCA4311A, TI TCA4307 | Protect the backbone from “half-inserted” cards/segments, precharge lines, and support stuck-bus recovery without rebooting the whole management plane. | Live-insertion behavior; precharge voltage; automatic stuck-bus recovery; how it handles clock stretching & arbitration; connection criteria (STOP/idle detection); capacitance isolation strength. |
| I²C bus repeater / segment buffer | NXP PCA9515A | Split a heavy I²C bus into two segments with buffered SDA/SCL to extend practical load/cap limits and isolate noisy branches. | Multi-master friendliness; contention behavior; level translation needs; propagation delay; power-off behavior; recommended pull-up placement per segment. |
| Long/noisy run extender (dI²C) | NXP PCA9615, NXP P82B96 | When baseboard-to-remote panel runs are noisy/long: convert to differential (PCA9615) or use buffer extension concepts (P82B96) to improve robustness. | Distance/noise target; required cabling; speed limits; common-mode tolerance; EMC implications; how failures localize (does one short kill both ends?). |
| I²C isolation (ground/domain isolation) | ADI ADuM1250, TI ISO1540 | Isolate sideband across different ground references or sensitive domains while retaining bidirectional I²C signaling semantics. | Isolation rating; bidirectional support; speed ceiling; power-off behavior; fail-safe IO; CMTI/noise immunity; whether it is truly “non-latching” during hot events. |
| Digital power monitors | TI INA229, TI INA238, ADI LTC2947 | Per-domain V/I/P telemetry (and sometimes energy) tied to the same inventory context: domain ID, segment, and timestamp. | Common-mode range; shunt vs integrated-sense options; conversion time/averaging; alert pins and thresholds; logging needs (min/max, energy); calibration strategy and drift expectations. |
| Temperature sensors (I³C/I²C) | NXP P3T1085UK | Clean temperature telemetry with I³C features (e.g., in-band interrupt capability), suitable for dense sensor deployments with discovery semantics. | Accuracy & response; I³C features used (IBI vs polling); alert mechanism; placement strategy (hotspots vs gradients); sampling cadence vs noise. |
| GPIO expanders (presence/LED/sideband pins) | NXP PCA9555, TI TCA9535, Microchip MCP23017 | Add low-speed IO for presence, latch signals, LEDs, and simple discrete telemetry where routing dedicated SoC pins is costly. | Interrupt output type; power-up default states; I/O drive & pull features; input glitch sensitivity; addressing options (how many can coexist); power-off leakage/back-power risk. |
| FRU / identity EEPROM | Microchip 24AA02E64 | Store a globally unique ID (and optional inventory fields) used by discovery pipelines to bind “dynamic bus address” to a stable asset identity. | Pre-programmed ID needs; write endurance; power-loss behavior during writes; address conflicts planning; data model (what fields are mandatory for operations). |
Practical rule: pick parts that make segmentation and recovery cheap. If a segment can be isolated, re-scanned, and re-inventoried in seconds, field uptime and debug time improve dramatically.
11.3 RFQ-ready checklist — 10 questions that prevent wrong parts
These questions align with real baseboard failure modes: stuck-low, address conflict, hot-plug glitches, cross-domain back-power, and “discovery without operability.”
- Bus role: hub, mux, buffer, translator, isolator, or sensor—what exact layer is being solved?
- Speed target: I²C (100 kHz / 400 kHz / 1 MHz) vs I³C (up to 12.5 MHz). Is the part truly compatible at the required mode?
- Power-off behavior: will any pin back-power another domain through protection structures or pull-ups?
- Isolation/segmentation: when disabled/reset, does it become a clean high-Z barrier or a partial path?
- Capacitance budget: what bus cap is assumed per segment, and how does the part help enforce it?
- Hot-plug events: precharge, connect criteria (idle/STOP), and behavior during “half insertion.”
- Fault containment: can one short/stuck device be localized to a single branch with minimal blast radius?
- Recovery hooks: reset pin semantics, stuck-bus recovery, and whether software can force re-discovery/re-addressing.
- Alert strategy: in-band (I³C/IBI) vs out-of-band (ALERT/INT). How are alerts latched and cleared?
- Ops mapping: how does the design bind dynamic addresses to stable identity (FRU/ID EEPROM) with timestamps?
Figure F10 — Reference placement example (hub → segments → isolation → telemetry)
A single-board view showing where hubs, switches/buffers, translators/isolators, and telemetry ICs are typically placed to keep the management plane resilient.
Reading the figure: each segment has a “control chokepoint” (switch/buffer/isolator) so recovery can be performed per-branch without taking down the entire management plane.
H2-12 — FAQs (I3C/I2C Governance, Telemetry, Recovery)
These FAQs target field issues on an OCP/OpenRack baseboard: bus stability, discovery/addressing, multi-domain behavior, telemetry organization, recovery without full reboot, and validation. Content stays at the bus/telemetry layer.
Figure F11 — Evidence-first troubleshooting ladder (FAQ map)
A compact visual for how to move from symptoms to evidence, then to a scoped action: electrical integrity → segmentation/isolation → re-discovery/inventory consistency → validation coverage.
The FAQ answers below follow this ladder so each issue produces actionable evidence and a scoped recovery plan.
FAQs (12)
Each answer stays within: I3C/I2C governance, discovery/addressing, multi-domain telemetry, bus recovery, and validation/bring-up.