Conference Speakerphone: Beamforming, AEC & PoE Design

Core thesis: A conference speakerphone is a local audio endpoint that must keep speech intelligible at the far end while maintaining stable full-duplex operation. Beamforming and AEC therefore have to survive real rooms, loud playback, and noisy power and connectivity domains.

H2-1 · System definition & boundary

System Definition & Boundary (What this page covers)

Definition: A conference speakerphone combines a mic array, playback speakers, and a local DSP/SoC to deliver full-duplex voice over USB/BT/Ethernet. The critical engineering task is controlling the echo path and preserving speech clarity under realistic acoustic and electrical interference.

In-scope product modes (local device focus)

USB speakerphone: UAC endpoint + local DSP chain; predictable framing but clock/SRC decisions matter.
Bluetooth speakerphone: HFP voice path (and optional A2DP music); codec latency impacts AEC reference alignment.
Ethernet/PoE endpoint/bridge: PoE-powered local audio endpoint; audio quality depends on rail partition + PHY/DC-DC coexistence.

Out-of-scope (to prevent scope creep)

Meeting-app setup (Teams/Zoom), OS/driver walkthroughs, user software tutorials.
Cloud/backend architecture, enterprise VoIP server deployment, IT network tuning.
Generic “DSP theory class” without measurements and design decisions.

What “good” looks like (metrics you can verify)

Far-end intelligibility (primary): stable clarity at distance and off-axis talker angles; avoid “thin/robotic” artifacts from overly aggressive NR/AGC.
Echo suppression stability: high ERLE (echo return loss enhancement) across the voice band without collapsing during loud playback, EQ changes, or double-talk (barge-in).
Latency budget discipline: keep the delay between the playback reference and its echo in the mic signal stable and within the span the AEC filter can model; inventory every buffer and sample-rate conversion instead of leaving them hidden.
Double-talk resilience: near-end speech remains present when far-end speaks; no “near-end gets canceled as echo.”

How this page is structured (evidence chain, not opinions)

Signal flow: mic capture → beamforming → AEC/NR/AGC → transmit; and receive → playback → speaker.
Reference integrity: where playback reference is tapped (pre/post EQ/limiter) and how delay is tracked.
Power & coexistence: PoE/DC-DC/class-D switching noise vs the quiet mic analog front-end (AFE) domain.
Validation mindset: measurable pass/fail checks (ERLE under stress, latency inventory, noise coupling correlation).

Cite this figure: Conference Speakerphone — Fig F1 (Top-Level Blocks)

Reading tip: if AEC “mysteriously” collapses, verify reference tap and end-to-end delay before tuning algorithm parameters.

H2-2 · Acoustic mechanics first

Acoustic Mechanics First (Enclosure, mic placement, speaker coupling)

Why this comes before algorithms: In real conference rooms, AEC and beamforming are limited by the echo path created by enclosure geometry, speaker-to-mic leakage, and tabletop reflections. When leakage dominates, “DSP tuning” often produces unstable, room-dependent behavior instead of repeatable gains.

Three coupling paths that set the AEC difficulty

Airborne direct leakage: speaker energy escapes through ports/grilles and reaches the mic array with short delay—this raises residual echo and forces heavier suppression.
Structure-borne vibration: enclosure or PCB vibration modulates mic signals (especially with tight mounts), producing echo-like components AEC cannot perfectly model.
Tabletop early reflections: near-field reflections add strong early peaks in impulse response; they can confuse beamforming steering and reduce ERLE consistency across angles.

Mic array geometry: decisions with measurable consequences

2 mics: simpler, lower cost; limited spatial selectivity; more sensitive to reflections and talker movement.
4 mics (common sweet spot): stronger directivity and better noise rejection with moderate calibration needs.
6 mics: best spatial selectivity potential, but demands tighter channel matching (gain/phase), better clocking, and more robust mechanical symmetry.
Spacing rule of thumb: larger spacing improves low-frequency directivity but increases spatial aliasing risk at higher frequencies; smaller spacing reduces aliasing but limits directivity.
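
The spacing trade-off above can be put into numbers with the usual half-wavelength criterion, f = c / (2d). The sketch below assumes a uniform array and a speed of sound of 343 m/s; the function name and the example spacings are illustrative, not taken from a specific product.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C

def spatial_alias_freq_hz(spacing_m: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Highest frequency free of spatial aliasing when spacing equals half a wavelength."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)

# Illustrative spacings only.
for spacing_mm in (10, 20, 40):
    limit = spatial_alias_freq_hz(spacing_mm / 1000.0)
    print(f"{spacing_mm} mm spacing: alias-free up to about {limit:.0f} Hz")
```

Doubling the spacing halves the alias-free bandwidth, which is why wider arrays usually need band-limited beams or nested sub-arrays at high frequencies.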

Minimal evidence chain (fast, repeatable, hardware-first)

Evidence A: impulse / sweep to expose resonances

Capture impulse response (or chirp) using the mic array.
Look for strong early peaks (typical of tabletop reflection) and narrowband bumps (cavity/port resonances).
If early peaks dominate, beamforming and AEC will be angle- and room-sensitive.
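
One way to make "strong early peaks" measurable is to list local maxima that follow the direct-path peak within a few milliseconds. This is a minimal sketch: the 5 ms window, the -12 dB floor, and the function name are assumptions to adapt to the device and room.

```python
import math

def early_peaks(ir, fs_hz, window_ms=5.0, floor_db=-12.0):
    """Return (delay_ms, level_db) for local maxima that follow the direct
    peak within window_ms and sit above floor_db relative to it."""
    mags = [abs(v) for v in ir]
    i0 = max(range(len(mags)), key=mags.__getitem__)  # direct-path peak
    direct = mags[i0]
    if direct == 0.0:
        return []
    last = min(len(mags) - 1, i0 + int(window_ms * 1e-3 * fs_hz))
    found = []
    for i in range(i0 + 2, last):
        if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]:
            level_db = 20.0 * math.log10(mags[i] / direct)
            if level_db >= floor_db:
                found.append(((i - i0) * 1000.0 / fs_hz, level_db))
    return found

# Synthetic impulse response: direct path plus one reflection 1 ms later.
ir = [0.0] * 200
ir[10], ir[58] = 1.0, 0.5
print(early_peaks(ir, fs_hz=48000))
```

A reflection only a few dB below the direct path and within the first milliseconds is the kind of result that points back at tabletop coupling.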

Evidence B: near-field leakage mapping

Play a controlled test signal (pink noise / sweep) from the speaker.
Record each mic channel level: identify hot spots (asymmetry, vent leakage, mechanical coupling).
Large channel-to-channel leakage differences often predict unstable AEC convergence.
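
The leakage map reduces to a per-channel RMS level and the spread between the loudest and quietest mic. The sketch below assumes samples normalized to ±1.0; the helper names and the synthetic test tones are illustrative.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (samples assumed in -1.0..1.0)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-20))

def leakage_map(channels):
    """Return per-mic levels in dB and the max-min spread across mics."""
    levels = [rms_dbfs(ch) for ch in channels]
    return levels, max(levels) - min(levels)

def tone(amplitude, n=480, cycles=10):
    """Synthetic stand-in for a recorded mic channel."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

channels = [tone(0.10), tone(0.10), tone(0.20), tone(0.10)]  # third mic is a hot spot
levels, spread = leakage_map(channels)
print([round(v, 1) for v in levels], f"spread = {spread:.1f} dB")
```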

Discriminator: mechanics vs reference/latency

Mechanics-dominant signature: AEC works in one room/position but fails dramatically after small placement changes (device rotation, table material, distance).
Reference/latency signature: Echo behavior is consistently “wrong” across rooms, often tied to mode changes (USB vs BT) or audio pipeline changes (EQ/limiter order).

Cite this figure: Conference Speakerphone — Fig F2 (Acoustic Coupling)

Design implication: reduce leakage asymmetry and strong early reflections before expecting stable ERLE and consistent off-axis intelligibility.

H2-3 · Mic front-end

Mic Front-End (PDM/Analog mics, AFE noise, dynamic range, biasing)

Design goal: the mic front-end must preserve far-field speech while surviving loudspeaker leakage without saturating. Beamforming and AEC can only perform as well as the multi-channel input SNR and channel consistency allow.

PDM vs analog: choose based on the failure mode you can tolerate

PDM mics (digital output)

Strength: reduces analog routing sensitivity; multi-mic builds benefit from consistent digital interfaces.
Watch-outs: shared clock quality and routing matter; poor clock/grounding can imprint tones and correlated noise across channels.
Best fit: 4–6 mic arrays where board-level analog matching is costly and channel-to-channel stability is critical.

Analog mics (AFE/ADC dependent)

Strength: flexible analog conditioning and anti-alias filtering; easier to manage certain EMC corner cases with careful layout.
Watch-outs: analog pickup and reference contamination are common; channel matching depends heavily on layout symmetry and component tolerances.
Best fit: smaller mic counts or designs with strong analog layout discipline and well-partitioned supplies.

What actually breaks far-field beamforming

Self-noise + AFE noise floor: sets the intelligibility limit at distance. A clean DSP cannot recover information below the sensor/AFE noise floor.
Sensitivity tolerance & phase mismatch: small gain/phase errors inflate sidelobes and blur the main lobe, reducing directional SNR gains.
Leakage headroom: speakerphone mics must tolerate strong playback leakage; repeated clipping produces echo-like artifacts that confuse AEC and post-filters.
PDM clocking artifacts: correlated clock/ground noise can appear in all channels at once, making “noise reduction” less effective because the noise is coherent.

AFE/ADC (or PDM receiver): three front-end requirements that matter in practice

Low input-referred noise: keep equivalent acoustic noise low enough to resolve far-field speech in quiet and moderate-noise rooms.
Headroom under leakage: avoid saturating on loud playback, coughs, table taps, or sudden near-field talkers; recovery behavior matters as much as peak level.
Anti-alias & phase consistency: consistent filtering and group delay across channels protect beamforming performance; mismatched phase response reduces directional gain.

Matching & calibration hooks (designing for repeatability)

Per-channel trim capability: reserve gain/phase trim or calibration tables in DSP for channel alignment (within reasonable correction range).
Temperature drift awareness: bias networks and references drift; multi-channel mismatch can grow with temperature and aging, especially if layout is asymmetric.
Symmetry by construction: identical trace lengths (where meaningful), consistent bias/RC networks, and consistent decoupling keep calibration stable over time.

Minimal measurement set (fast, repeatable, diagnostic)

1) Noise floor (equivalent input noise, dBA SPL)

Record in silence (speaker off) and check spectrum for switching/clock fingerprints.
If noise rises with USB/PoE/Ethernet activity, suspect supply/ground/clock coupling before “DSP tuning”.

2) Channel mismatch (gain/phase)

Use a single source at fixed distance and sweep angle; compare channel amplitude/phase across frequency bands.
Growing high-frequency phase spread usually predicts weak directional gain and unstable steering.
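
Channel mismatch at one test frequency can be read from a single-bin DFT of each channel. This sketch assumes a steady tone, a capture holding a whole number of cycles, and a source equidistant from both mics (otherwise the phase result also contains the acoustic path difference). Function names are illustrative.

```python
import cmath
import math

def tone_phasor(samples, freq_hz, fs_hz):
    """Single-bin DFT: complex amplitude of the component at freq_hz."""
    w = -2j * math.pi * freq_hz / fs_hz
    acc = sum(s * cmath.exp(w * n) for n, s in enumerate(samples))
    return 2.0 * acc / len(samples)

def mismatch(ref, other, freq_hz, fs_hz):
    """Gain (dB) and phase (degrees) of `other` relative to `ref` at one frequency."""
    ratio = tone_phasor(other, freq_hz, fs_hz) / tone_phasor(ref, freq_hz, fs_hz)
    return 20.0 * math.log10(abs(ratio)), math.degrees(cmath.phase(ratio))

# Synthetic pair: second channel is 10 % quieter and lags by 10 degrees.
fs, f, count = 48000, 1000.0, 480
ref = [math.sin(2 * math.pi * f * i / fs) for i in range(count)]
other = [0.9 * math.sin(2 * math.pi * f * i / fs - math.radians(10.0)) for i in range(count)]
gain_db, phase_deg = mismatch(ref, other, f, fs)
print(f"gain {gain_db:.2f} dB, phase {phase_deg:.1f} deg")
```

Repeating the measurement across several frequencies shows whether the phase spread grows toward high frequencies.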

3) Clipping behavior under leakage

Play loud test audio while recording mic channels; watch for repeated saturation and slow recovery.
Frequent clipping forces more aggressive suppression later, often trading intelligibility for “stability”.
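
Clipping under leakage can be logged with a simple counter over each recorded channel. A minimal sketch, assuming samples normalized to ±1.0 and treating anything at or beyond 99.9 % of full scale as clipped; the threshold and the function name are assumptions.

```python
def clip_stats(samples, full_scale=1.0, margin=0.999):
    """Count clipped samples and the longest consecutive clipped run."""
    limit = margin * full_scale
    clipped = longest = run = 0
    for s in samples:
        if abs(s) >= limit:
            clipped += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return {"clipped": clipped, "ratio": clipped / len(samples), "longest_run": longest}

print(clip_stats([0.0, 1.0, 1.0, 0.5, -1.0]))
```

Long runs matter more than isolated hits, since they indicate sustained saturation rather than a single transient.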

Cite this figure: Conference Speakerphone — Fig F3 (Mic Front-End & Sync)

Debug hint: correlated tones/noise appearing in all mic channels often implicate shared clock/ground coupling rather than a single “bad mic”.

H2-4 · Beamforming pipeline

Beamforming Pipeline (from raw channels to a clean talker)

Pipeline view: beamforming is a controlled sequence—align channels, apply delays/weights, sum, then suppress residual noise. Each stage has measurable outputs, and each stage fails in recognizable ways under mismatch, reflections, or near-field interference.
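
The "apply delays" stage can be illustrated for the simplest case, a delay-and-sum beam on a uniform linear array with a far-field talker. The convention is an assumption: mic 0 sits at the origin, higher-index mics lie toward positive angles, and 0 degrees is broadside. Names and dimensions are illustrative.

```python
import math

def steering_delays_s(num_mics, spacing_m, angle_deg, c=343.0):
    """Per-mic compensation delays (seconds) that time-align a plane wave
    arriving from angle_deg; the smallest delay is shifted to zero."""
    step = spacing_m * math.sin(math.radians(angle_deg)) / c
    raw = [m * step for m in range(num_mics)]
    base = min(raw)
    return [d - base for d in raw]

fs_hz = 48000
delays = steering_delays_s(num_mics=4, spacing_m=0.0343, angle_deg=30.0)
print([round(d * fs_hz, 2) for d in delays], "samples at 48 kHz")
```

The fractional sample values show why practical beamformers need fractional-delay filters or frequency-domain phase weights rather than integer sample shifts.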

Fixed vs adaptive beamforming (select by constraints, not hype)

Fixed beamforming: predictable latency and stable behavior; best when compute budget is tight and room conditions are moderately controlled.
Adaptive beamforming: can track changing noise sources, but requires careful update control; mis-tuned updates can “chase reflections” and degrade speech naturalness.
Speakerphone constraints: limited mic count, strict latency budget (AEC compatibility), and frequent early reflections from tables and walls.

Why VAD gating can destabilize beamforming

Over-sensitive gating: speech tails get chopped; output sounds thin and intermittent, especially with soft talkers or off-axis speech.
Under-sensitive gating: noise and leakage influence weight updates, creating steering drift or widened sidelobes.
Practical guardrail: ensure gating decisions are consistent with AEC/double-talk logic; otherwise the system oscillates between “suppress” and “recover”.

Steering errors: three root causes that look similar but test differently

1) Channel mismatch (gain/phase)

Symptom: weak directional gain; noisy far-end even in quiet rooms.
Tell: angle sweep shows a blurred main lobe and elevated sidelobes across frequency.

2) Early reflections (table/walls)

Symptom: works in one placement, fails after small rotation or different table material.
Tell: impulse response shows strong early peaks; polar response changes significantly by room/position.

3) Near-field talkers or interference

Symptom: beam “locks” on the wrong source; far-end hears the side talker or keyboard over the main talker.
Tell: distance sweep shows erratic SNR improvement; close sources dominate even off-axis.

Evidence chain: 3 quick tests that isolate the limiting factor

Angle sweep (polar sanity): fixed distance, rotate talker angle in steps; confirm a stable main lobe and acceptable sidelobes.
Distance sweep (SNR vs distance): 0.5 m → 1 m → 2 m; if improvements collapse at distance, noise floor dominates (return to H2-3).
Room/table A/B test: same setup with soft pad or different table; large improvement indicates reflection coupling dominates (return to H2-2).

Cite this figure: Conference Speakerphone — Fig F4 (Beamforming Flow)

Validation tip: if polar response changes drastically between rooms or table materials, the limiting factor is often early reflections rather than beamformer math.

H2-5 · AEC done right

AEC (Acoustic Echo Cancellation) Done Right

Core rule: AEC performance is limited less by “filter math” and more by whether the playback reference matches the echo component inside the mic signal in both content and timing. When reference is tapped at the wrong point or delay drifts, ERLE becomes unstable and barge-in fails.

Reference must be clean: where to tap (pre/post EQ, pre/post limiter)

Tap around EQ

Pre-EQ: clean reference, but less similar to actual acoustic output if EQ (or speaker response shaping) is significant.
Post-EQ: closer spectral match to the real echo; more stable ERLE when EQ is static.
Risk: dynamic EQ changes behave like a time-varying system; delay/shape tracking must be robust.

Tap around limiter

Pre-limiter: avoids injecting strong nonlinearity into the reference; best if limiter rarely engages.
Post-limiter: reference includes level shaping; helpful when limiting is frequent.
Risk: clipping/limiting creates nonlinear echo that linear AEC cannot fully model; residual suppression becomes mandatory.

Latency budget: hidden buffers and SRC are common ERLE killers

Buffering: FIFO depth changes, mode switches (USB ↔ BT ↔ Ethernet), and safety buffers silently add delay.
SRC (sample-rate conversion): introduces additional group delay and may drift under clock mismatch if not locked properly.
Transport delays: USB frame scheduling, Bluetooth codec packetization, and network buffering can shift reference timing.
Practical implication: AEC requires a stable and trackable delay between the reference and the mic echo; “mostly stable” often fails under stress.
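
To check whether the reference-to-echo delay is stable, it helps to estimate it directly from captures. The sketch below uses a brute-force time-domain cross-correlation; it assumes the reference and mic captures share one sample clock and that the true delay is within max_lag. The function name is illustrative, and a production system would more likely use an FFT-based or adaptive estimator.

```python
import random

def estimate_delay_samples(reference, mic, max_lag):
    """Lag (in samples) at which the mic signal best matches the reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(mic) - lag)
        score = sum(reference[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic check: echo is the reference delayed by 37 samples and attenuated.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] * 37 + [0.3 * s for s in reference]
print(estimate_delay_samples(reference, mic, max_lag=100))
```

Running the estimate before and after a mode switch or rate change exposes silent buffer changes as a jump in the reported lag.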

Double-talk and residual echo: protect barge-in without destabilizing adaptation

Double-talk detection: prevents the adaptive filter from updating aggressively while near-end and far-end speak simultaneously.
Failure mode A (too strict): near-end speech is mistaken for echo, so it gets suppressed and barge-in sounds weak.
Failure mode B (too loose): the filter keeps updating during double-talk and diverges, so ERLE stays collapsed even after double-talk ends.
Residual echo suppressor: cleans leftover echo after the main AEC, but excessive suppression trades intelligibility for “quiet”.
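
As a concrete illustration of double-talk gating, here is a sketch of the classic Geigel detector. It is a textbook technique swapped in for illustration, not necessarily what any given speakerphone uses. The window, threshold, and hangover values are assumptions; a threshold of 0.5 presumes at least 6 dB of echo path loss, which a tightly coupled speakerphone may not have.

```python
from collections import deque

class GeigelDetector:
    """Flags double-talk when the mic sample exceeds a fraction of the
    recent far-end peak; a hangover keeps the flag up briefly afterwards."""

    def __init__(self, window=256, threshold=0.5, hangover=240):
        self.far = deque(maxlen=window)  # recent far-end magnitudes
        self.threshold = threshold
        self.hangover = hangover
        self.hold = 0

    def update(self, far_sample, mic_sample):
        self.far.append(abs(far_sample))
        if abs(mic_sample) > self.threshold * max(self.far):
            self.hold = self.hangover
        elif self.hold > 0:
            self.hold -= 1
        return self.hold > 0  # True means: freeze filter adaptation

det = GeigelDetector(window=4, threshold=0.5, hangover=2)
flags = [det.update(1.0, m) for m in (0.2, 0.8, 0.1, 0.1)]
print(flags)
```

Raising the threshold makes the detector miss double-talk more often, while a long hangover freezes adaptation longer than needed; both show up in the barge-in stress test.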

Nonlinear echo is real (speaker/amp/clipping): design for it

Common signature

ERLE drops sharply at high playback levels, even when reference delay is correct.
Residual echo becomes harsh/grainy, especially after limiter engagement or near-clipping.
Mic waveforms show repeated saturation or slow recovery under loud playback leakage.

When nonlinear echo dominates, stable full-duplex requires both reducing nonlinearity (avoid clipping, manage limiter behavior) and post-AEC cleanup (residual suppression tuned to minimize speech damage).

Evidence: what to measure and which stress cases expose root causes

Primary measurements

ERLE vs frequency: frequency bands that collapse often indicate spectral mismatch or strong resonances.
Convergence time: slow convergence suggests delay misalignment, excessive noise, or insufficient modeling length.
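
ERLE itself is a power ratio between the mic signal (echo only) and the AEC output. A minimal broadband sketch, assuming both captures cover the same far-end single-talk interval with no near-end speech; per-band ERLE would apply the same ratio after band-pass filtering. The function name is illustrative.

```python
import math

def erle_db(mic, residual):
    """ERLE = 10*log10(mic power / residual power) over the same time span."""
    p_mic = sum(s * s for s in mic) / len(mic)
    p_res = sum(s * s for s in residual) / len(residual)
    return 10.0 * math.log10(max(p_mic, 1e-20) / max(p_res, 1e-20))

mic = [1.0, -1.0] * 100        # synthetic echo at the mic
residual = [0.1, -0.1] * 100   # synthetic AEC output, 10x smaller in amplitude
print(f"{erle_db(mic, residual):.1f} dB")
```

Computing this over short sliding windows gives the convergence curve: the time taken to reach a stable ERLE plateau after playback starts.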

Stress cases (must-pass)

Loud music / EQ change: stresses tap-point choice and time-varying effects.
Clipping proximity: reveals nonlinear echo and recovery behavior.
Barge-in (double-talk): verifies stability and near-end protection.

Cite this figure: Conference Speakerphone — Fig F5 (AEC Reference & Echo Path)

Debug shortcut: if ERLE collapses mainly when limiter engages, suspect nonlinear echo and re-evaluate tap-point and headroom before algorithm tuning.

H2-6 · Noise control stack

Noise Control Stack (NR/ANC/AGC, wind/fan, keyboard, room)

Goal: reduce background noise while preserving speech naturalness. Noise suppression is always a trade: higher SNR often increases artifacts unless processing order and dynamics are controlled.

Clarify terms: NR vs “ANC” in speakerphone context

NR/NS (noise reduction/suppression): time-frequency suppression and post-filters applied to the speech signal.
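ANC (active noise cancellation): strictly, this means generating anti-noise through a loudspeaker, which requires a modeled secondary path and an error microphone. In many speakerphone discussions the term is used loosely for noise suppression, and this chapter covers only that suppression behavior.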

Noise types drive different failure modes

Steady noise (fan/AC)

Strong suppression is possible, but can create musical noise if overly aggressive.
Look for tonal residue and spectral “holes” that shimmer over time.

Impulsive noise (keyboard/table taps)

Often triggers pumping through AGC/limiter interaction.
Short-term level variance spikes and audible “breathing” are common symptoms.

AGC placement: mis-ordering can break AEC and destabilize the stack

Dynamic blocks change the signal statistics: AGC and limiter reshape levels; if placed before AEC or in the wrong loop, they can degrade reference consistency.
Common pitfall: AGC behavior that tracks far-end playback can interfere with AEC’s adaptation and double-talk logic.
Practical rule: keep AEC’s view of reference and mic echo as stable as possible; apply aggressive dynamics after echo stability is secured.

Common artifacts and how to tie them to measurable indicators

Pumping / breathing

Symptom: noise floor rises and falls between words.
Indicator: elevated short-term level variance and gain fluctuations.

Musical noise

Symptom: metallic “chirps” in quiet gaps.
Indicator: time-varying spectral holes and narrowband tones.

Speech distortion

Symptom: speech becomes thin or smeared; consonants lose clarity.
Indicator: excessive suppression during voiced segments and unnatural spectral shaping across formant regions.

Evidence: report SNR gains together with artifact cost

SNR improvement vs distance: validate the gain at 0.5 m, 1 m, and 2 m; if improvements vanish at distance, noise floor dominates upstream (return to mic front-end).
Artifact-cost metrics: track level variance, tonal residue, and time-varying spectral holes alongside subjective listening.
Decision framing: prefer configurations that maintain intelligibility and stable full-duplex rather than maximizing “quietness”.
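
The "level variance" indicator for pumping can be computed as the spread of short-term frame levels in a noise-only segment. This is a sketch under assumptions: the frame length, the use of standard deviation in dB, and the function names are choices made here, not a standardized metric.

```python
import math
import statistics

def frame_levels_db(samples, frame_len):
    """RMS level in dB for each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(s * s for s in frame) / frame_len
        levels.append(10.0 * math.log10(max(mean_sq, 1e-12)))
    return levels

def level_variation_db(samples, frame_len):
    """Standard deviation of frame levels; larger values suggest pumping."""
    return statistics.pstdev(frame_levels_db(samples, frame_len))

steady = [0.1] * 400
pumping = ([0.1] * 100 + [0.01] * 100) * 2  # noise floor swinging by 20 dB
print(level_variation_db(steady, 100), level_variation_db(pumping, 100))
```

Tracking this number alongside the SNR gain makes it visible when a deeper suppression setting buys quietness at the cost of breathing.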

Cite this figure: Conference Speakerphone — Fig F6 (Processing Order)

Tuning rule: if pumping appears after improving noise suppression, re-check AGC/limiter behavior and placement before increasing suppression depth.

H2-7 · Playback chain

Playback Chain (DAC / Class-D amps / speakers / protection)

Design goal: deliver stable loudness and clean audio while minimizing leakage-driven artifacts that raise the workload for echo control. The playback chain must be treated as a controlled actuator: output power, distortion, noise, EMI, and thermal behavior all feed back into conferencing quality.

Class-D selection: choose by “bad-day behavior”, not by headline specs

Output power vs supply: verify achievable power at the worst-case supply and speaker impedance; headroom matters more than peak numbers.
THD+N vs power curve: near-max power distortion growth often dominates perceived harshness and echo residue.
Idle noise (hiss): low-level noise becomes obvious in quiet rooms; check noise with playback muted and with system activity (USB/Ethernet/LED PWM).
Spread-spectrum options: help reduce EMI peaks; effectiveness depends on layout, filter components, and switching strategy.
EMI reality: class-D edges can couple into mic/AFE grounds and RF paths; selection must consider switching frequency and control mode stability.

Speaker protection: the protection strategy is part of sound quality

Protective functions

Limiter: prevents overload but can sound compressed if too aggressive.
Thermal foldback: avoids overheating; must degrade gracefully to prevent “sudden quiet” complaints.
DC offset / short protection: safety and speaker survival; validate recovery behavior after a fault.
Pop/click suppression: power-up, mute, and mode-switch transitions need timed ramps and stable references.

What goes wrong

Harsh residual echo: clipping or limiter engagement creates nonlinear echo components.
Pumping: limiter and AGC interaction causes audible loudness breathing.
Silent gaps: overly aggressive protection or unstable recovery can introduce dropouts perceived as “audio cutting out”.

Feedback and monitoring: make playback a closed-loop system

Clip detect: triggers smooth gain reduction or EQ adjustment before hard clipping creates harsh artifacts.
Temp sense: enables predictive thermal control (slow ramp-down) instead of sudden foldback.
Current sense: detects abnormal loads and protects against overcurrent while providing a diagnostic signal for “bad cables/speaker faults”.
DSP control: the cleanest solutions shape behavior upstream (avoid clipping) rather than relying on heavy post-processing.

Evidence: four minimal tests that explain most field complaints

Audio integrity

THD+N vs power: measure at low/medium/high levels and near the top operating point.
Idle noise: evaluate hiss with mute engaged and during system activity to reveal coupling sources.

Events and stability

Pop/click capture: record and scope power/mute transitions; confirm ramp and reference stability.
Thermal rise: steady SPL run; log temperature curve and verify foldback point and recovery smoothness.

Cite this figure: Conference Speakerphone — Fig F7 (Playback + Protection Loop)

Field heuristic: if harshness appears only at high volume and ERLE drops at the same time, clipping or limiter-induced nonlinearity is often the shared root cause.

H2-8 · USB + Bluetooth

Connectivity: USB Audio + Bluetooth Audio (and control plane)

System view: USB and Bluetooth are two I/O modes feeding the same DSP chain. The hardest problems are not “can it connect”, but clock-domain alignment, predictable latency, and clean mode switching without glitches, dropouts, or echo-control collapse.

USB Audio (UAC): hardware-focused failure modes

Sample-rate handling: switching rates often triggers SRC and buffer resets; this can change end-to-end latency and break reference alignment.
Clock domain crossing: USB-side timing and audio-processing timing must be bridged with stable buffering; under/overrun causes crackles or periodic dropouts.
Frame/buffer effects: USB scheduling and internal FIFOs add fixed and sometimes variable delays; measure rather than assume.

Bluetooth: HFP vs A2DP and how codec latency impacts echo alignment

HFP (call mode)

Voice-oriented path; stability matters more than bandwidth.
Latency and buffering still exist; mode changes can shift delay and require re-alignment.

A2DP (media mode)

Higher fidelity is possible, but codec and jitter buffers can raise latency.
Delay changes complicate AEC reference alignment and transition behavior.

Control plane (mute buttons, LEDs, touch): common noise injection paths

LED PWM: can modulate ground and supply rails, producing audible tones or raising the noise floor.
Touch scanning: periodic excitation and long traces can couple into audio references if return paths are not controlled.
Mute and mode keys: switching events must be sequenced to prevent pops/clicks and avoid transient reference disruption.

Evidence: measure mode switching and latency like a validation engineer

Latency & timing

End-to-end latency: compare USB vs BT (HFP/A2DP) using the same stimulus (click/impulse).
Mode switch time: log mute window length and any glitch bursts during transition.
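
For the click/impulse comparison, latency can be read as the distance between onsets in two captures. The sketch assumes both captures were recorded on one time-aligned, two-channel recorder at the same sample rate; the fixed threshold and the function names are illustrative.

```python
def onset_index(samples, threshold):
    """Index of the first sample whose magnitude reaches threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None

def latency_ms(stimulus, response, fs_hz, threshold=0.1):
    """Delay between click onsets in two time-aligned captures."""
    a = onset_index(stimulus, threshold)
    b = onset_index(response, threshold)
    if a is None or b is None:
        raise ValueError("click not found in one of the captures")
    return (b - a) * 1000.0 / fs_hz

# Synthetic captures: the response click arrives 240 samples after the stimulus.
stimulus = [0.0] * 10 + [1.0] + [0.0] * 500
response = [0.0] * 250 + [0.5] + [0.0] * 260
print(latency_ms(stimulus, response, fs_hz=48000), "ms")
```

Running the same measurement in each mode (USB, HFP, A2DP) and subtracting the results gives the delay step the AEC has to absorb on a mode switch.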

Glitch/dropout root causes

Clock-domain crossing (CDC): buffer under/overrun, periodic crackle, drift-driven resets.
Buffer strategy: unstable FIFO behavior during rate changes and reconnection.
Power transient: interface resets and audio chain re-initialization under load.

Cite this figure: Conference Speakerphone — Fig F8 (Dual-Path I/O Modes)

Troubleshooting pattern: if glitches appear mainly during mode switches, measure delay change and buffer reset timing before chasing “random RF issues”.

H2-9 · Ethernet + PoE

Ethernet + PoE: Power + Data Without Instability

Design goal: deliver reliable link-up and stable audio while Ethernet activity and PoE power conversion coexist in the same enclosure. Most “random reboot” and “Ethernet makes audio noisy” complaints reduce to inrush + PD/MPS behavior, start-up sequencing, and noise coupling paths between hot power zones and the mic/AFE domain.

PoE PD controller: four behaviors that decide field success

Inrush & start-up

Inrush limiting: large bulk capacitance without controlled ramp can cause repeated start attempts.
DC/DC sequencing: rails must rise in a predictable order; slow or unstable ramps can trigger brownout loops.
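
A first-order feel for why an uncontrolled ramp fails comes from I = C · dV/dt. The sketch below assumes a linear voltage ramp and ignores load current during start-up; the capacitance and ramp times are illustrative, and the result should be compared against whatever inrush limit the chosen PD controller and PSE actually enforce.

```python
def inrush_current_a(capacitance_f, delta_v, ramp_time_s):
    """Average charging current I = C * dV/dt for a linear voltage ramp."""
    if ramp_time_s <= 0:
        raise ValueError("ramp time must be positive")
    return capacitance_f * delta_v / ramp_time_s

# Illustrative: 100 uF of bulk capacitance charged to 48 V.
for ramp_ms in (1.0, 20.0):
    amps = inrush_current_a(100e-6, 48.0, ramp_ms / 1000.0)
    print(f"ramp {ramp_ms:>4} ms -> about {amps:.2f} A")
```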

Classification & MPS

Classification margin: insufficient budget shows up as resets during loud playback or heavy compute.
MPS stability: if the PD does not sustain its Maintain Power Signature (MPS), the PSE (the PoE switch or injector) may remove power, which appears as a “random reboot”.
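
Classification margin can be checked with a simple budget. The table below uses the commonly cited PD input power limits for IEEE 802.3af/at classes 0 to 4; verify them against the standard revision you design to. The load figures, the efficiency value, and the function name are hypothetical.

```python
# Commonly cited maximum power at the PD input, in watts (802.3af/at classes).
PD_CLASS_LIMIT_W = {0: 12.95, 1: 3.84, 2: 6.49, 3: 12.95, 4: 25.5}

def pd_margin_w(pd_class, load_w, converter_efficiency):
    """PD input power left after dividing the total load by converter efficiency."""
    if not 0.0 < converter_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return PD_CLASS_LIMIT_W[pd_class] - sum(load_w.values()) / converter_efficiency

# Hypothetical worst-case loads in watts.
loads = {"soc": 2.5, "class_d_peak": 8.0, "phy": 0.6, "leds": 0.3}
for pd_class in (3, 4):
    print(f"class {pd_class}: margin {pd_margin_w(pd_class, loads, 0.85):+.2f} W")
```

A negative margin at peak playback is the budget-level explanation for resets that appear only at loud volume.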

Isolation boundaries: keep “hot entry” and “audio quiet” physically and electrically separated

Magnetics boundary: treat RJ45 + magnetics as a high-energy entry region with its own return paths.
PoE front-end boundary: PD + primary conversion are a switching-noise source; keep them out of the mic/AFE reference area.
Optional digital isolation (concept): used when the system must break ground loops or keep noisy domains from polluting sensitive references.

Noise coupling: the three most common paths into the mic domain

PHY activity → ground return

Traffic bursts modulate digital return currents.
Audio symptom: noise correlated with Ethernet activity level.

PoE DC/DC ripple → rails

Switching ripple or burst-mode noise reaches sensitive rails.
Audio symptom: tones/harmonics near converter frequency.

Class-D edges + shared return paths

Loud playback increases di/dt and raises coupling risk.
Audio symptom: noise and echo stability degrade at higher volume, sometimes only when Ethernet is active.

Evidence: three minimal tests that separate “power” from “coupling”

Cold-start success rate

Repeat link-and-power cycles; log pass/fail modes.
Look for patterns: specific switches, cable lengths, or temperature.

Link-up transient response

Scope PoE input/main rail + 5V during link-up and mode changes.
Failing cases often show sag, repeated ramps, or oscillatory recovery.

Audio noise vs Ethernet activity

Compare idle vs heavy traffic; capture noise spectrum and perceived artifacts.
Correlation to activity strongly indicates return-path or rail coupling rather than “random firmware”.
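
The activity correlation can be quantified with a Pearson coefficient between a traffic measure and the measured noise level. The data points below are made-up placeholders that only show the call pattern; the function name is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(vx * vy)

# Placeholder values, not measurements.
traffic_mbps = [0, 10, 50, 100, 200]
noise_dbfs = [-92.0, -91.5, -89.0, -86.0, -82.5]
print(f"r = {pearson(traffic_mbps, noise_dbfs):.2f}")
```

A coefficient near 1 across repeated runs supports the coupling hypothesis; a value near 0 suggests looking elsewhere before reworking the layout.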

Cite this figure: Conference Speakerphone — Fig F9 (PoE Front-End + Rails)

Debug shortcut: if cold-start failures cluster by switch model, prioritize inrush and MPS stability before touching audio DSP settings.

H2-10 · Power tree & grounding

Power Tree & Grounding for Mixed-Signal Audio (the silent killer)

Core rule: most hiss, touch-induced noise, and “reboot at loud volume” issues are power/return-path problems. A mixed-signal speakerphone must treat the mic/AFE as a protected island and control where high di/dt currents return, especially from PoE conversion and class-D switching.

Rail partitioning: build islands and keep noisy domains out

AFE rails: highest sensitivity; minimize ripple and prevent shared return paths with switching domains.
Digital core: bursty current; isolate via local regulation and controlled return routes.
RF domain: can both suffer from and inject noise; keep supply impedance stable.
Class-D power: high di/dt; keep loops tight and returns away from AFE references.
PoE/DC-DC: switching hot zone; prevent ripple and burst-mode noise from reaching analog rails.

Ground strategy: return-path control beats “pretty ground splits”

What matters

High di/dt returns: class-D and DC/DC returns must not traverse the AFE reference region.
Single-point tie: connect islands at a controlled point that prevents noisy domains from sharing analog returns.

Shields (concept)

Shield/metalwork can protect or pollute depending on where it ties.
Wrong tie points can import Ethernet/USB noise into audio references.

Transients: link-up, hot-plug, and bursts can trigger brownout loops

Inrush & hot-plug: sudden load changes can sag the main rail and force repeated resets if UVLO thresholds are too tight.
PoE burst behavior: light-load burst modes can create audible tones if ripple reaches sensitive rails.
Peak load events: loud playback and compute spikes can expose insufficient margin and poor sequencing.

First 2 measurements: the fastest way to classify most failures

CH1: mic AFE rail ripple (probe near the AFE load).
CH2: system 5V or PoE main rail sag (trigger on mode switch, touch scan, Ethernet traffic burst, or volume step).

Discriminator patterns: correlate noise with the real aggressor

Switching-frequency correlation

Fixed tones / harmonics often map to DC/DC switching or PWM activity.
Prioritize rail filtering and return-path containment.

Activity correlation

Noise that rises with traffic indicates PHY/return-current coupling.
Noise that rises with volume indicates class-D edge/return coupling.

Fix priority: isolate first, then stabilize events, then polish features

Isolation: protect the AFE island and keep noisy returns out.
Event stability: tame inrush/hot-plug/link-up and choose UVLO/brownout behavior with margin.
Polish: minimize LED/touch injection and interface coexistence issues once fundamentals are clean.

Cite this figure: Conference Speakerphone — Fig F10 (Power/Ground Partition Map)

Fast diagnosis: if CH1 (AFE rail ripple) spikes during Ethernet bursts or volume steps, prioritize partitioning and return-path containment before tuning DSP noise suppression.

H2-11 · EMC / ESD coexistence

EMC/ESD/Audio Coexistence (pass tests and sound good)

Goal: survive ESD/EFT and radiated/conducted stress without muting, rebooting, or degrading near-end voice. Practical success comes from mapping aggressors → coupling → victims, then proving fixes with correlation (near-field scans, hit maps, and audio artifact logs tied to test points).

Typical aggressors (where the energy starts)

Class-D outputs: high dv/dt and speaker cable radiation; edges can capacitively inject into mic inputs.
PoE DC/DC edges: switching ripple and burst-mode behavior can become audible if rails leak into AFE references.
Ethernet PHY activity: traffic bursts modulate digital return currents; noise can track activity level.
USB cable entry + ESD clamps: the clamp current return path can create ground bounce that upsets refs, clocks, or reset lines.

Typical victims (where small signals break first)

Mic inputs: high impedance, low signal level; susceptible to E-field coupling and common-impedance return noise.
AFE references / bias nodes: ripple or bounce here turns into broadband hiss or tonal artifacts.
Clocks: disturbance can cause SRC/DSP timing artifacts, dropouts, or mode-switch instability.
Touch / LED lines: scanning/PWM can inject noise, and these lines can latch up or trigger resets under ESD if entry protection is weak.

Layout tactics that preserve audio quality (hardware-first)

Loop + return control

Minimize loop area for Class-D power and PoE switching loops.
Keep high di/dt returns out of the AFE reference region.
Single controlled tie between quiet analog and noisy digital/power returns.

Cable-entry protection

Place TVS/ESD parts at the connector (short path to the intended return).
Guard/shield strategy near mic inputs (do not route aggressors near high-Z nodes).
Use common-mode control where cables behave like antennas (speaker leads, USB, Ethernet).

Evidence: make interference visible and repeatable

Near-field correlation

Scan around Class-D, PoE/DC-DC, PHY, and connector clamps.
Match hot spots to audible artifacts (tone, hiss, clicks) and to rail ripple signatures.

ESD hit map

Hit points: USB shell, RJ45 shield, buttons/touch area, speaker opening.
Log symptoms: mute, reboot, dropouts, pops; map to likely victims and returns.

Audible artifact log (tie sound to test points)

During ESD/EFT or traffic bursts, capture: AFE rail ripple, main rail sag, reset/PG, and audio output (short recording).
Correlation beats guesswork: if the noise rises with PHY activity, return-path coupling is often the dominant mechanism.

Concrete example MPNs (typical “first picks”)

These are commonly used reference parts to speed selection. Final choice depends on voltage, IEC level targets, capacitance limits, and layout constraints.

Function	Example MPNs	When to use / notes
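USB ESD low-C protect	`ST USBLC6-2SC6` `TI TPD4E02B04` `Nexperia PESD5V0S1UL`	Connector-adjacent clamps; watch capacitance on high-speed lines and return-path routing to avoid ground bounce. (USBLC6-2SC6 and TPD4E02B04 are multi-line arrays; PESD5V0S1UL is a single-line diode, so check its capacitance before placing it on data lines.)
Ethernet entry ESD / surge helper	`Semtech RClamp0524P` `Littelfuse SP3012-04UTG` `Nexperia PESD1CAN`	Place at the jack region; coordinate with magnetics and chassis/shield strategy. (Exact choice depends on interface variant and topology; PESD1CAN is specified for CAN bus lines, so confirm working voltage and capacitance before using it on Ethernet pairs.)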
PoE PD controller (stability anchor)	`TI TPS2372` `TI TPS2373` `ADI LTC4269`	PD classification + inrush + MPS behavior; select for power class needs and preferred DC/DC architecture.
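Hot-plug eFuse / inrush control	`TI TPS25940` `TI TPS25942` `ADI LTC4365`	Mitigate link-up/hot-plug rail sag, limit fault current, and reduce brownout loops that present as “random reboot”. (LTC4365 is a UV/OV and reverse-supply protection controller that drives external MOSFETs, not an integrated eFuse.)
Supervisor reset / PG robustness	`TI TPS3839` `Maxim MAX16054` `Microchip MCP1316`	Use when short dips or noisy rails trigger unstable resets; pick thresholds and hysteresis to match real sag events. (MAX16054 is a debounced pushbutton on/off controller, so it fits the button/enable input rather than rail monitoring.)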
Low-noise LDO for AFE rails	`TI TPS7A20` `ADI LT3042` `Microchip MIC5504`	Protect mic/AFE rails from switching noise; place close to AFE load and avoid shared high-current returns.
Class-D amp (speakerphone-class)	`TI TAS5825M` `TI TAS5805M` `Maxim MAX98390`	Choose by THD+N vs power, idle noise, EMI behavior, and protection telemetry (clip/temp/current) options.
Digital isolator (if needed)	`TI ISO7741` `ADI ADuM1250` `Silicon Labs Si86xx`	Use when a hard boundary is required to break noisy return coupling; verify timing/jitter compatibility with the control interface.

Cite this figure: Conference Speakerphone — Fig F11 (Interference Coupling Paths)

If an ESD strike causes mute/reboot only at specific hit points, the return path is usually the story—optimize clamp placement and where that current is allowed to flow.

H2-12 · Validation & field debug SOP

Validation & Field Debug Playbook (symptom → evidence → isolate → fix)

Format: each symptom is handled with the same repeatable sequence: First 2 measurements (minimum tools) → Discriminator (what proves the root-cause category) → First fix (highest leverage change). Component-level examples are included to accelerate troubleshooting and redesign loops.

1) Far-end hears echo / ERLE low

First 2 measurements: (A) capture the playback reference at the chosen tap point (pre/post EQ/limiter), (B) measure end-to-end delay change between modes (USB vs BT) using an impulse.
Discriminator: ERLE collapses only at high volume → nonlinear playback (clipping/limiter) adds distortion that a linear echo canceller cannot model; ERLE shifts after mode switching → a delay/buffer reset leaves the reference misaligned.
First fix: move reference tap to a stable point; align delay after SRC/buffer; avoid hard limiting in the reference path.
Example MPNs (playback/telemetry): TI TAS5825M, TI TAS5805M, Maxim MAX98390 (choose based on EMI/noise/protection telemetry needs).
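
Both measurements reduce to short calculations once the captures exist. The sketch below assumes a far-end-only (single-talk) capture for ERLE and an impulse test for delay; the signals are synthetic, and `erle_db`, `estimate_delay`, and the 37/212-sample delays are hypothetical values chosen for illustration.

```python
import math

def erle_db(mic, residual):
    """ERLE in dB: echo power at the mic over residual power after AEC."""
    p_mic = sum(s * s for s in mic) / len(mic)
    p_res = sum(s * s for s in residual) / len(residual)
    if p_res == 0:
        return float("inf")
    return 10.0 * math.log10(p_mic / p_res)

def estimate_delay(reference, mic, max_lag):
    """Return the lag in samples that maximizes reference/mic cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * m for r, m in zip(reference, mic[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def impulse(n, at, gain=1.0):
    """Synthetic impulse capture: zeros with a single sample of height gain."""
    x = [0.0] * n
    x[at] = gain
    return x

# Impulse played at sample 10; the mic sees it later, attenuated.
ref = impulse(512, 10)
delay_usb = estimate_delay(ref, impulse(512, 10 + 37, 0.5), max_lag=300)
delay_bt = estimate_delay(ref, impulse(512, 10 + 212, 0.5), max_lag=300)
print("delay shift between modes (samples):", delay_bt - delay_usb)

# Far-end single talk: echo at the mic versus AEC output.
echo = [0.5 * math.sin(0.1 * n) for n in range(1000)]
residual = [0.05 * math.sin(0.1 * n) for n in range(1000)]
erle = erle_db(echo, residual)
print(f"ERLE = {erle:.1f} dB")
```

Logging ERLE per volume step and the delay per mode gives the two curves the discriminator needs: ERLE versus volume, and delay versus mode.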

2) Near-end voice thin/robotic (NR artifacts)

First 2 measurements: (A) compare spectra pre/post NR+AGC, (B) log output level variance vs time (pumping) during steady speech/noise.
Discriminator: artifacts increase with fan/keyboard noise → aggressive suppression; artifacts appear after CPU load or streaming changes → timing/buffer stress or processing-order issues.
First fix: enforce processing order stability (keep AGC from changing gain ahead of the AEC, which alters the echo path it has to track); relax thresholds/time constants; cap suppression aggressiveness before chasing “more DSP”.
Hardware helpers (noise immunity): low-noise AFE rail regulation often reduces “NR overreaction” by lowering the raw noise floor: TI TPS7A20, ADI LT3042.
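
Measurement (B) can be reduced to a single "pumping" figure: the spread of short-term level in dB while the input is steady. The sketch below is one possible metric, not a standard one; the frame length, the synthetic signals, and the `pumping_db` name are assumptions.

```python
import math
from statistics import pstdev

def frame_levels_db(samples, frame_len):
    """Short-term RMS level in dB for consecutive non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-9)))  # floor avoids log(0)
    return levels

def pumping_db(samples, frame_len):
    """Standard deviation of frame levels; near zero for a steady output."""
    return pstdev(frame_levels_db(samples, frame_len))

def tone(n):
    """Steady test tone with a 20-sample period."""
    return 0.5 * math.sin(2 * math.pi * 0.05 * n)

# Steady output versus an output whose gain toggles by 6 dB every 400 samples.
steady = [tone(n) for n in range(4000)]
pumped = [tone(n) * (1.0 if (n // 400) % 2 == 0 else 0.5) for n in range(4000)]

steady_spread = pumping_db(steady, frame_len=200)
pumped_spread = pumping_db(pumped, frame_len=200)
print(f"steady: {steady_spread:.2f} dB, pumped: {pumped_spread:.2f} dB")
```

Comparing the figure before and after the NR+AGC block, on the same steady input, shows whether the processing chain itself introduces the level variation.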

3) Howling at high volume (acoustic coupling + limiter + AEC instability)

First 2 measurements: (A) record mic input and speaker output simultaneously to locate feedback frequency, (B) monitor clip/temp/current flags at the amp stage.
Discriminator: a single strong, fixed frequency → acoustic/mechanical resonance; howling that starts only when the limiter engages → nonlinearity plus AEC instability.
First fix: reduce leakage (mechanical/placement) and stabilize limiter behavior (smooth control, avoid hard clipping); validate at worst-case volume.
Example MPNs: amps with robust protection hooks: TI TAS5825M, Maxim MAX98390.
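
Locating the feedback frequency only needs the strongest spectral line of a short mic capture. The sketch below uses a plain DFT to stay dependency-free; the 16 kHz capture rate, the synthetic `capture` generator, and the 2.5 kHz howl are assumptions standing in for real recordings.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, DC excluded."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

FS = 16000   # assumed capture rate, Hz
N = 512      # short capture keeps the plain DFT fast

def capture(howl_gain):
    """Synthetic stand-in for a mic recording: low tone plus a howl."""
    return [0.2 * math.sin(2 * math.pi * 437.5 * i / FS)
            + howl_gain * math.sin(2 * math.pi * 2500.0 * i / FS)
            for i in range(N)]

f_mid = dominant_frequency(capture(0.5), FS)
f_max = dominant_frequency(capture(0.9), FS)
fixed = abs(f_mid - f_max) <= FS / N   # same peak within one bin
print(f_mid, f_max, "fixed frequency" if fixed else "frequency moves")
```

If the peak stays on the same bin across captures, the fixed-frequency branch of the discriminator applies; a peak that appears only when limiter flags are active points to the second branch.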

4) Touch/LED causes hiss or clicks

First 2 measurements: (A) AFE rail ripple near the AFE load, (B) correlate ripple/noise with LED PWM or touch scan timing.
Discriminator: noise matches PWM frequency/harmonics → conducted/return coupling; click at touch events → transient injection via return path or clamp current path.
First fix: isolate returns; slow edges or move PWM frequency; add rail filtering close to AFE; ensure clamp currents return outside the quiet reference region.
Example MPNs (rail isolation): load switch TI TPS22919 (domain gating), low-noise LDO TI TPS7A20.
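
The PWM discriminator amounts to checking whether the ripple peaks land on integer multiples of the PWM frequency. A minimal sketch, assuming the peak list has already been read off a spectrum; the 1 kHz PWM frequency, the peak values, and the 5 Hz tolerance are hypothetical.

```python
def harmonic_matches(peaks_hz, pwm_hz, tol_hz=5.0):
    """Return (peak, harmonic order) for peaks that sit on a PWM harmonic."""
    matches = []
    for f in peaks_hz:
        order = round(f / pwm_hz)
        if order >= 1 and abs(f - order * pwm_hz) <= tol_hz:
            matches.append((f, order))
    return matches

PWM_HZ = 1000.0                                  # assumed LED PWM frequency
noise_peaks = [1000.4, 1999.1, 3001.8, 4410.0]   # hypothetical ripple peaks, Hz

matches = harmonic_matches(noise_peaks, PWM_HZ)
fraction = len(matches) / len(noise_peaks)
print(f"{len(matches)} of {len(noise_peaks)} peaks sit on PWM harmonics")
```

Moving the PWM frequency and rerunning the check is a quick confirmation: peaks that follow the new frequency come from the PWM, peaks that stay put come from somewhere else.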

5) PoE negotiation resets / reboots on link-up

First 2 measurements: (A) PoE main rail / 5V sag during link-up and start-up, (B) reset/PG/UVLO behavior (is it a brownout loop?).
Discriminator: repeated ramp-up then drop → inrush/MPS/power budget; only fails on certain switches → boundary/tolerance sensitivity (often inrush behavior).
First fix: tune inrush and start sequencing; add hot-swap/eFuse if needed; set supervisor thresholds/hysteresis for real sag events.
Example MPNs: PD controller TI TPS2372/TPS2373, hot-swap/eFuse TI TPS25942, supervisor TI TPS3839.
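
A brownout loop shows up in a rail log as repeated "reaches power-good, then falls through UVLO" cycles. The sketch below counts them with a two-threshold state machine; the 4.75 V and 4.1 V thresholds and the sample trace are illustrative assumptions, not values from any datasheet.

```python
def count_brownout_cycles(rail_v, good_v, uvlo_v):
    """Count transitions from 'rail good' back down through the UVLO level."""
    cycles, powered = 0, False
    for v in rail_v:
        if not powered and v >= good_v:
            powered = True            # rail reached the power-good level
        elif powered and v <= uvlo_v:
            powered = False           # rail collapsed after being good
            cycles += 1
    return cycles

# Hypothetical 5 V rail samples during link-up (volts).
trace = [0.0, 2.5, 4.9, 5.0, 3.9, 1.0, 0.2, 2.6, 4.8, 5.0,
         4.0, 0.9, 0.1, 2.4, 4.9, 5.0, 5.0]
healthy = [0.0, 2.5, 4.9, 5.0, 5.0, 4.95, 5.0]

loops = count_brownout_cycles(trace, good_v=4.75, uvlo_v=4.1)
print("brownout cycles:", loops)
```

Separate good and UVLO thresholds give the counter hysteresis, so ripple around a single level is not counted as a reboot.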

6) BT drops when speaker plays loud (ground bounce / RF detune)

First 2 measurements: (A) main rail sag vs class-D current events (volume steps), (B) RF/BT supply noise or ground reference disturbance under loud playback.
Discriminator: dropouts scale with volume more than distance → power/return bounce; dropouts change with hand position/enclosure → antenna detune sensitivity.
First fix: keep class-D return away from RF domain; add transient headroom; stabilize RF rail; reduce edge coupling.
Example MPNs (power stability): eFuse TI TPS25940, low-noise LDO TI TPS7A20 for sensitive rails (as applicable).
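
Measurement (A) can be summarized as sag per volume step: the drop from the pre-event baseline to the minimum in a short window after each step. This is a sketch with assumed data; the trace, the event indices, and the window sizes are hypothetical.

```python
def sag_per_event(rail_v, event_idx, window, baseline_len=5):
    """Sag (V) after each event: pre-event mean minus post-event minimum."""
    sags = []
    for idx in event_idx:
        pre = rail_v[max(0, idx - baseline_len):idx]
        baseline = sum(pre) / len(pre)
        sags.append(baseline - min(rail_v[idx:idx + window]))
    return sags

# Hypothetical main-rail samples with three increasing volume steps.
rail = [5.0] * 40
rail[11], rail[21], rail[31] = 4.9, 4.7, 4.4
volume_steps = [10, 20, 30]   # sample index of each volume step

sags = sag_per_event(rail, volume_steps, window=5)
print([round(s, 2) for s in sags])
```

Sag that grows with each volume step, alongside dropouts that follow the same trend, supports the power/return-bounce branch rather than antenna detune.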

Cite this figure: Conference Speakerphone — Fig F12 (Decision Tree)

This playbook is designed for fast field isolation: two channels on a scope plus simple event correlation can eliminate most “mystery” audio issues.
