SATCOM Terminal: LNB/LNA & BUC Control, Modem + Backhaul
A SATCOM terminal is best understood as a closed-loop system. LNB/LNA and BUC control, frequency/lock sequencing, modem/FEC diagnostics, crypto insertion, and Ethernet backhaul must each be verifiable through clear counters and logs. When link quality drops or reacquisition slows, a layered checklist (RF lock → modem errors → network continuity → power/thermal evidence) pinpoints the fault domain quickly and prevents “wrong-layer” fixes.
H2-1 · What a SATCOM terminal contains (system boundary & interfaces)
This chapter defines the SATCOM terminal boundary, the minimum interface set, and the state flow that decides when traffic is allowed. The goal is practical: where to probe, what to read, and what the terminal is responsible for.
Key takeaway: A SATCOM terminal is the control + conversion + modem + security + backhaul hub between the antenna-side hardware (LNB/LNA/BUC) and the IP network. It must expose lock status, RF health, modem health, crypto state, and backhaul health as measurable signals.
1) Boundary: terminal vs antenna-side hardware vs network
A SATCOM terminal (IDU-class) owns the closed-loop controls and the “traffic release gate”. Antenna-side hardware provides RF amplification/conversion (LNB/LNA/BUC), while the network side provides IP transport. The terminal is responsible for coordinating acquisition/lock, maintaining stable operation across temperature and supply variation, and exporting enough telemetry to isolate faults quickly.
2) Internal blocks: name the block AND the observable outputs
For engineering use, each internal block should be described with the specific signals or counters it must expose. Without these “observables”, troubleshooting degenerates into guessing.
- RF front-end control (LNB/LNA/BUC): bias voltage/current, AGC/RSSI, LO/PLL lock flags, output power detector reading, VSWR/over-temp alarms.
- Frequency & reference: reference-stable indicator, synthesizer lock, lock-time, unlock reason codes, re-lock attempt counters.
- Modem ASIC: acquisition/sync state, BER/FER counters, FEC decoder stress indicators (e.g., iteration/erasure trends), buffer under/overrun counters.
- Encryption (terminal-side): enable/disable state, throughput utilization, error counters, audit-friendly event IDs (without exposing sensitive internals).
- Ethernet backhaul & management: link up/down, DHCP/DNS failures, queue/QoS drops, management-plane isolation status, remote health heartbeat.
3) External interfaces: a checklist that supports field isolation
External interfaces should be listed by interface class so they remain applicable across vendors and form factors. The checklist below is designed to enable isolation between RF issues, modem issues, and network issues.
| Interface class | What it carries | Must-have signals to read/log |
|---|---|---|
| RF/IF path | RX IF into modem chain; TX IF out to BUC chain; optional coupled samples | RSSI/AGC, IF clip/overload flags, TX drive level, coupled power detector reading |
| Control & health | Bias/enable commands; status/alarms from LNB/LNA/BUC | Bias V/I, LO lock, VSWR/over-temp, “power limit active” flags |
| Ethernet (data plane) | User traffic IP frames | Link status, throughput, queue drops, congestion indicators |
| Ethernet (management) | Remote configuration/telemetry access | Isolation enable, auth state (high-level), heartbeat, telemetry stream health |
| Power & reset | Terminal internal rails and restart behavior | PG/RESET tree, brownout events, reset cause codes, “fast reacquire” state |
| Service/diagnostics | Log extraction, factory test hooks | BIT results, event log bundle, timestamped “lock transitions” history |
4) Operational workflow: the “traffic release gate” must be explicit
A robust terminal behaves like a state machine. The critical engineering concept is the traffic release gate: user traffic is only enabled when RF lock, modem sync, crypto readiness, and backhaul readiness are all satisfied.
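The gate can be sketched as a pure function over four readiness flags. This is a minimal illustration, not a vendor implementation; the field names are hypothetical and simply mirror the four conditions named above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GateInputs:
    """Readiness flags (hypothetical names) for the four gate conditions."""
    rf_lock: bool
    modem_sync: bool
    crypto_ready: bool
    backhaul_ready: bool


def traffic_gate(inputs: GateInputs) -> tuple[bool, list[str]]:
    """Return (traffic_allowed, blockers); blockers names every unmet condition."""
    blockers = [f.name for f in fields(inputs) if not getattr(inputs, f.name)]
    return (not blockers, blockers)


if __name__ == "__main__":
    # Modem sync and backhaul are not ready, so traffic stays disabled.
    print(traffic_gate(GateInputs(True, False, True, False)))
```

Returning the list of blockers, rather than a bare boolean, gives the log a reason for every refusal to release traffic.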
H2-2 · Link budget knobs you can actually control (G/T, EIRP, margin)
This chapter turns “link quality” into a small set of controllable knobs and observable metrics. It separates what the terminal can tune from what it can only detect and record.
Key takeaway: Most “unstable link” complaints are a mix of three levers: G/T (receive cleanliness), EIRP (transmit strength/linearity), and Margin (how close operation is to the FEC threshold). A good terminal exposes RSSI/AGC, lock stability, power detector, and BER/FER history to pinpoint which lever moved.
1) Practical interpretation of G/T, EIRP, and margin
G/T summarizes how clean the received signal appears at the demodulator input (front-end noise and losses dominate). EIRP summarizes how effectively transmit power is delivered on-air (including back-off and protection limits). Margin captures how far the link is from the FEC threshold once modulation and coding are applied. The terminal should convert these into actionable observables rather than leaving them as abstract budget terms.
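The three terms combine through the standard link-budget relation C/N0 = EIRP − path loss + G/T − k (all in dB, with k being Boltzmann's constant, about −228.6 dBW/K/Hz), and Eb/N0 = C/N0 − 10·log10(bit rate). The sketch below computes margin against a required Eb/N0. It treats EIRP and G/T as the two ends of a single hop, and the function and parameter names are illustrative.

```python
import math

# Boltzmann's constant expressed in dBW/K/Hz (about -228.6).
BOLTZMANN_DBW = 10 * math.log10(1.380649e-23)


def link_margin_db(eirp_dbw, path_loss_db, g_over_t_dbk, bit_rate_bps, required_ebn0_db):
    """Margin in dB between the achieved Eb/N0 and the required Eb/N0."""
    cn0_dbhz = eirp_dbw - path_loss_db + g_over_t_dbk - BOLTZMANN_DBW
    ebn0_db = cn0_dbhz - 10 * math.log10(bit_rate_bps)
    return ebn0_db - required_ebn0_db


if __name__ == "__main__":
    # Illustrative numbers only: 50 dBW EIRP, 206 dB path loss, 10 dB/K G/T,
    # 1 Mbit/s, and a 5 dB required Eb/N0.
    print(round(link_margin_db(50.0, 206.0, 10.0, 1e6, 5.0), 1))
```

Because every term is additive in dB, a 1 dB loss in G/T or EIRP costs exactly 1 dB of margin, which is why the terminal's observables can be mapped back to a specific lever.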
2) G/T knobs (receive path): it is more than noise figure
- Front-end noise & loss: any added loss before the first low-noise gain stage directly reduces recoverable SNR.
- Gain distribution: too little early gain raises the relative impact of downstream noise/quantization; too much early gain increases overload risk and AGC hunting.
- Temperature & cabling effects: temperature drift and loss variations often present as “AGC climbing over time” or “more frequent re-lock”.
- What to watch: RSSI trend, AGC step history, LNB bias current trend, LO lock dropouts, and lock acquisition time.
3) EIRP knobs (transmit path): usable linear power is the target
- PA bias + power loop: setpoint should be reached through a stable detector loop (avoid oscillation/hunting).
- OBO (output back-off): back-off trades raw power for better linearity (often improving EVM/ACPR and decoding stability).
- Protection limits: VSWR/over-temp/under-voltage events can force power rollback; logging “limit active” flags is essential.
- What to watch: power detector reading, “power limit active” flag, temperature, VSWR alarm count, and re-try counters.
4) Margin knobs: Eb/N0 behavior, FEC threshold, and stability over time
Margin is where RF, modem, and control policies meet. A terminal should record time series rather than single snapshots: BER/FER over windows, sync state transitions, and any rate/adaptation changes. A common failure pattern is “near-threshold operation” where small environmental changes cause repeated state churn.
- If RSSI drops with AGC pegged: receive path impairment is likely (loss/pointing/weather). The terminal can detect and timestamp the onset.
- If RSSI is stable but BER/FER rises: suspect non-linear transmit behavior, lock instability, or interference-like conditions; the terminal must correlate with power limit flags and lock history.
- If frequent re-lock happens: focus on reference/synth stability and the lock sequence; record lock times and failure codes.
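The three patterns above can be written as a small triage function. This is a sketch under assumptions: the boolean inputs are assumed to be derived elsewhere from windowed telemetry, the returned labels are hypothetical, and checking re-lock first is a design choice rather than something the patterns dictate.

```python
def triage(rssi_dropped: bool, agc_pegged: bool, ber_rising: bool, relock_frequent: bool) -> str:
    """Map observed symptoms to the most likely fault domain (hypothetical labels)."""
    if relock_frequent:
        # Lock churn makes the other readings unreliable, so it is checked first.
        return "REFERENCE_OR_LOCK_SEQUENCE"
    if rssi_dropped and agc_pegged:
        return "RECEIVE_PATH"
    if not rssi_dropped and ber_rising:
        return "TX_LINEARITY_LOCK_OR_INTERFERENCE"
    return "UNDETERMINED"


if __name__ == "__main__":
    print(triage(rssi_dropped=True, agc_pegged=True, ber_rising=True, relock_frequent=False))
```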
5) “Controllable vs not controllable”: draw the responsibility line
| Category | Terminal can control (examples) | Terminal cannot control (but must detect/record) |
|---|---|---|
| Receive (G/T) | Gain/AGC policy, lock sequencing, alarm thresholds, telemetry cadence | Severe rain fade, physical blockage, external pointing errors (record onset + severity) |
| Transmit (EIRP) | Power loop setpoint, OBO policy, rollback behavior, “safe enable” gating | Mechanical antenna issues, external constraints that cap allowed EIRP (record limit triggers) |
| Modem/Margin | Sync gating, windowing for BER/FER decisions, rate adaptation stability rules | External network congestion beyond the terminal’s interface (record backhaul health + drops) |
6) Field quick checks (fast isolation without special gear)
- Check lock timeline: Reference stable → synthesizer lock → RF lock → modem sync. Any missing step points to the responsible block.
- Check “limit active” flags: power limit, VSWR, over-temp. These convert mystery throughput drops into explainable events.
- Check RSSI/AGC trend: slow climbs often indicate loss/temperature drift; sudden steps often indicate environment or configuration changes.
- Check BER/FER window history: spikes aligned with lock transitions suggest acquisition instability; spikes with stable lock suggest linearity/interference-like conditions.
- Check backhaul queue drops: link can be RF-healthy while traffic stalls due to queuing/QoS mis-prioritization.
H2-3 · LNB/LNA control: biasing, gain steps, LO lock and health reads
Practical terminal-side control means more than “power on”. A robust design defines bias ramp rules, AGC stability rules, and lock/telemetry that translate raw readings into actionable alarm codes.
Key takeaway: LNB/LNA control is a closed-loop system: Bias (V/I) → Sense → Controller, plus an AGC gain-step policy that prevents hunting, and a lock/health model that stamps time, reason codes, and severity.
1) Biasing: ramp, monitor, and protect (terminal internal)
Bias rails should be treated as controlled resources because they can destabilize the entire terminal if they surge or collapse. Use a soft-start ramp to avoid inrush dips, and continuously monitor both voltage and current to detect wiring faults, shorts, and thermal drift that degrades RF performance over time.
- Soft-start ramp: raise bias in steps or with a fixed slope; freeze other “release gates” until the rail settles.
- Dual monitoring: voltage confirms “rail present”, current confirms “load is sane”. Logging current trend is often the earliest fault indicator.
- Protection sequence: limit → foldback/disable → cooldown delay → retry with counter; expose a clear alarm code and retry count.
Recommended engineering rules (simple and robust)
(a) Allow a short inrush window, then require current to fall below a steady-state threshold. (b) Require bias voltage to remain within tolerance for a hold time before declaring “BiasOK”. (c) After N retries, latch an alarm and require manual intervention or a maintenance command.
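Rules (a) to (c) can be sketched as a check over a sampled bias trace plus a bounded retry loop. All names and limit values below are hypothetical placeholders; real thresholds depend on the LNB/LNA and the bias stage.

```python
from dataclasses import dataclass


@dataclass
class BiasLimits:
    """Hypothetical limits; tune per hardware."""
    v_nom: float = 13.0     # nominal bias voltage (V)
    v_tol: float = 0.5      # allowed deviation (V)
    i_max_a: float = 0.3    # steady-state current ceiling (A)
    inrush_s: float = 0.01  # window during which current is ignored
    hold_s: float = 0.05    # time voltage must stay in tolerance


def bias_ok(samples, limits: BiasLimits) -> bool:
    """samples: iterable of (t_s, volts, amps), with t = 0 at rail enable."""
    in_tol_since = None
    for t, v, i in samples:
        if t < limits.inrush_s:
            continue  # rule (a): tolerate inrush current
        if i > limits.i_max_a:
            return False  # rule (a): current failed to settle
        if abs(v - limits.v_nom) <= limits.v_tol:
            if in_tol_since is None:
                in_tol_since = t
            if t - in_tol_since >= limits.hold_s:
                return True  # rule (b): voltage held long enough
        else:
            in_tol_since = None  # restart the hold timer
    return False


def bring_up(attempt_traces, limits: BiasLimits, max_retries: int = 3):
    """Rule (c): one initial attempt plus up to max_retries retries, then latch."""
    attempts = 0
    for trace in attempt_traces[: max_retries + 1]:
        attempts += 1
        if bias_ok(trace, limits):
            return "BIAS_OK", attempts
    return "BIAS_FAULT_LATCHED", attempts
```

Returning the attempt count alongside the state keeps the retry history available for the alarm record.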
2) Gain steps & AGC: avoid hunting and false alarms
Gain-step AGC can create “instantaneous” level changes that look like faults to downstream detectors. A stable policy uses deadband and rate limiting, and applies a settling window after each step so protection logic does not react to transient artifacts.
- Deadband / hysteresis: require a meaningful delta before changing gain; prevents oscillation between adjacent steps.
- Step rate limit: cap how frequently steps can occur; limits noise-induced chatter and lock disturbances.
- Settling window: after a step, suppress overload/lock-fail decisions for a short window; only then evaluate health metrics.
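The three rules combine naturally in one small controller. The following is a sketch with hypothetical names and units (dBm levels, dB gain steps, seconds); it assumes the caller samples the level periodically and consults `in_settling` before acting on overload or lock-fail indications.

```python
class GainStepper:
    """Stepped AGC with deadband, step rate limit, and a post-step settling window."""

    def __init__(self, target_dbm, deadband_db, step_db, min_interval_s, settle_s,
                 gain_min_db=0.0, gain_max_db=30.0):
        self.target_dbm = target_dbm
        self.deadband_db = deadband_db
        self.step_db = step_db
        self.min_interval_s = min_interval_s
        self.settle_s = settle_s
        self.gain_min_db = gain_min_db
        self.gain_max_db = gain_max_db
        self.gain_db = gain_min_db
        self.last_step_t = None

    def update(self, t, level_dbm):
        """Return the gain to apply after seeing level_dbm at time t."""
        error = self.target_dbm - level_dbm
        if abs(error) <= self.deadband_db:
            return self.gain_db  # deadband: ignore small deviations
        if self.last_step_t is not None and t - self.last_step_t < self.min_interval_s:
            return self.gain_db  # rate limit: too soon after the last step
        step = self.step_db if error > 0 else -self.step_db
        new_gain = min(self.gain_max_db, max(self.gain_min_db, self.gain_db + step))
        if new_gain != self.gain_db:
            self.gain_db = new_gain
            self.last_step_t = t
        return self.gain_db

    def in_settling(self, t):
        """True while health decisions should be suppressed after a step."""
        return self.last_step_t is not None and t - self.last_step_t < self.settle_s
```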
3) LO/PLL lock: make lock status debuggable
A single lock flag is not enough. Lock behavior should be represented as a state machine with time-to-lock and reason codes. When a system shows “powered but not locked”, the fastest isolation is achieved by correlating RefStable, SynthLock, LO_Lock, and any overload indicators around the acquisition period.
- Time-to-lock: record how long each lock step took; spikes often point to reference instability or marginal loop settings.
- Unlock reason codes: ref lost, overload, temperature limit, bias dip, external alarm — keep codes short and consistent.
- Lock stability counters: “unlock events per hour” is more useful than a single snapshot.
4) Health reads → alarm codes: convert signals into actions
Telemetry should produce alarm codes that indicate the most likely subsystem, not just “fail”. Use a small set of categories and always attach a snapshot of key readings (RSSI, AGC step, bias current, temperature, lock flags) to each event record.
| Primary symptom (observables) | Likely cause class | Recommended alarm code |
|---|---|---|
| Bias V out of range OR I high after settle | Short, miswire, failing bias stage, abnormal load | BIAS_FAULT (include V/I + retry count) |
| AGC steps oscillate, IF overload flags | Gain policy unstable, overload, incorrect step thresholds | AGC_HUNT or OVERLOAD (include step history) |
| LO lock fails with RefStable true | Synth/LO loop marginal, temperature drift, overload coupling | LOCK_FAIL (include time-to-lock + reason code) |
| RSSI drops while bias current rises with temperature | Thermal drift, device degradation, cabling loss trend | THERMAL_DRIFT (include temp + slope) |
| Sensor readback invalid or missing | Telemetry path failure, ADC fault, link fault | SENSOR_INVALID (include channel ID) |
5) Field quick checks (fast isolation)
- Bias first: verify BiasOK (V in range + I settled) before interpreting RSSI or lock failures.
- Lock timeline: RefStable → SynthLock → LO_Lock. Missing the first step points to reference/power; missing later steps points to RF/loop margin.
- AGC history: rapid step toggling suggests hunting; step changes followed by lock drops suggest transient sensitivity.
- Correlate with temperature: lock instability that tracks temperature often indicates drift or derating triggers.
- Use event snapshots: any alarm should include RSSI, AGC step, V/I, Temp, and lock flags to avoid “no data” support loops.
H2-4 · BUC control: PA bias, power detector loop, OBO and protection
A stable uplink is achieved by controlling usable linear power, not chasing the maximum output number. The terminal should close a slow, well-behaved power loop, apply a practical back-off policy, and enforce protection sequencing.
Key takeaway: BUC control is a layered system: power detector → averaging → controller → bias/attenuation, guided by an OBO linearity policy, and guarded by a protection state machine (VSWR/thermal/under-voltage) that limits first and trips last.
1) PA bias & temperature compensation: predict drift and derate smoothly
Temperature changes shift PA gain and compression behavior, and they also shift detector sensitivity. A robust terminal uses temperature as a first-class input to maintain stability: apply a derate curve and compensate bias/gain so the system stays in a predictable linear region.
- Derate curve: reduce target power gradually above a thermal knee; avoid abrupt throughput collapse.
- Bias/comp tables: apply piecewise compensation that keeps the PA in a stable region across temperature.
- Telemetry correlation: always log temperature, target power, measured power, and “limit active” flags together.
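A derate curve of the kind described can be as simple as a clamped linear slope above the knee. The knee temperature, slope, and maximum derate below are placeholder assumptions, not values for any particular BUC.

```python
def derated_target_dbm(temp_c, nominal_dbm, knee_c=60.0, slope_db_per_c=0.1, max_derate_db=3.0):
    """Reduce the power target linearly above knee_c, capped at max_derate_db."""
    if temp_c <= knee_c:
        return nominal_dbm
    derate_db = min(max_derate_db, (temp_c - knee_c) * slope_db_per_c)
    return nominal_dbm - derate_db


if __name__ == "__main__":
    for temp in (50.0, 70.0, 120.0):
        print(temp, derated_target_dbm(temp, 33.0))
```

Capping the derate keeps the curve from silently driving the target to zero; a hard over-temperature condition is better handled by the protection state machine than by the curve.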
2) Output power closed-loop control: detector → ADC window → controller → bias/atten
Most uplink instabilities come from a poorly tuned loop: detector noise, insufficient averaging, or a controller that reacts too fast. Power control should be slower than the thermal time constants and should use step ramps to avoid overshoot that triggers protection.
- Detector source: a coupled power detector is the primary measurement; treat it as a noisy sensor that needs windowed averaging.
- Controller behavior: prefer slow, monotonic convergence; avoid hunting around the setpoint.
- Actuators: attenuation is fast (good for quick limiting), bias is slower (good for steady operating point).
Power-loop stability checklist
(a) Use averaging windows long enough to smooth detector noise. (b) Apply target ramps for large setpoint changes. (c) Rate-limit the controller output. (d) Freeze “fault decisions” during planned ramps to avoid false trips.
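Items (a) to (c) of the checklist can be sketched as a windowed average feeding a slow, rate-limited integrating controller with a ramped internal target. The gains, window size, and the ideal noiseless plant used in `simulate` are assumptions chosen for illustration; item (d) would live in the protection logic and is not shown.

```python
from collections import deque


class PowerLoop:
    """Slow power loop: averaged detector, ramped target, rate-limited output."""

    def __init__(self, window=4, ki=0.1, max_step_db=0.5, ramp_db_per_tick=1.0):
        self.samples = deque(maxlen=window)
        self.ki = ki
        self.max_step_db = max_step_db
        self.ramp_db_per_tick = ramp_db_per_tick
        self.target_dbm = None  # internal target, ramped toward the setpoint
        self.drive_db = 0.0

    def tick(self, setpoint_dbm, detector_dbm):
        self.samples.append(detector_dbm)
        avg = sum(self.samples) / len(self.samples)  # (a) windowed averaging
        if self.target_dbm is None:
            self.target_dbm = avg
        delta = setpoint_dbm - self.target_dbm  # (b) ramp large setpoint changes
        self.target_dbm += max(-self.ramp_db_per_tick, min(self.ramp_db_per_tick, delta))
        step = self.ki * (self.target_dbm - avg)
        step = max(-self.max_step_db, min(self.max_step_db, step))  # (c) rate limit
        self.drive_db += step
        return self.drive_db


def simulate(setpoint_dbm, plant_gain_db, ticks):
    """Run the loop against an ideal plant where output = drive + fixed gain."""
    loop = PowerLoop()
    measured = []
    drive = 0.0
    for _ in range(ticks):
        detector = drive + plant_gain_db
        measured.append(detector)
        drive = loop.tick(setpoint_dbm, detector)
    return measured
```

With these conservative gains the simulated output approaches the setpoint from below; a real detector adds noise and temperature dependence that the averaging window has to be sized for.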
3) OBO and linearity: choose usable power, not headline power
Output back-off (OBO) is a practical lever that trades raw power for linearity. Too little back-off pushes the PA into compression, causing distortion that can degrade demodulation stability. Too much back-off reduces EIRP and erodes link margin. The terminal should treat OBO as a policy tied to measured quality metrics and protection headroom.
- When BER/FER worsens with stable lock: prioritize checking linearity policy and “limit active” events before increasing power.
- When thermal rollback is frequent: increase OBO or lower the target setpoint to avoid repeated protection churn.
- Record OBO state: always log the current OBO mode/level alongside BER/FER windows and throughput.
4) Protection sequencing: limit first, trip last (VSWR, thermal, UV)
Protection should be a state machine that enforces priority and debounce windows. The most common failure mode in the field is “chattering”: brief events cause repeated limit/trip actions that destabilize the link. Debounce and hold times convert noisy events into stable decisions.
- VSWR: apply immediate limiting → derate if persistent → trip only if the condition remains beyond a verified window.
- Thermal: derate smoothly → trip on hard over-temp → require cooldown before retry.
- Under-voltage: freeze power increases → reduce setpoint → protect against brownout-induced oscillation.
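The "limit first, trip last" idea for VSWR can be sketched as a debounced state machine. The state names and timing values are hypothetical. One deliberate choice is marked in the comments: a brief clearing of the alarm does not restart the persistence timer until the clear has lasted the full hold time.

```python
class VswrProtection:
    """NORMAL -> LIMIT -> DERATE -> TRIP, with a hold time before clearing."""

    def __init__(self, derate_after_s=0.5, trip_after_s=3.0, clear_hold_s=1.0):
        self.derate_after_s = derate_after_s
        self.trip_after_s = trip_after_s
        self.clear_hold_s = clear_hold_s
        self.state = "NORMAL"
        self.alarm_since = None
        self.clear_since = None

    def update(self, t, alarm: bool) -> str:
        if self.state == "TRIP":
            return self.state  # latched; a maintenance reset is assumed elsewhere
        if alarm:
            self.clear_since = None
            if self.alarm_since is None:
                self.alarm_since = t
            duration = t - self.alarm_since
            if duration >= self.trip_after_s:
                self.state = "TRIP"
            elif duration >= self.derate_after_s:
                self.state = "DERATE"
            else:
                self.state = "LIMIT"  # immediate limiting on first detection
        elif self.state != "NORMAL":
            if self.clear_since is None:
                self.clear_since = t
            # Only a clear that lasts clear_hold_s resets the persistence timer.
            if t - self.clear_since >= self.clear_hold_s:
                self.state = "NORMAL"
                self.alarm_since = None
                self.clear_since = None
        return self.state
```

The hold time is what prevents chattering: a noisy alarm line moves the state between LIMIT and DERATE at most, and never bounces the output between full power and a trip.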
5) Field quick checks (fast isolation)
- Compare target vs measured power: large persistent error suggests loop or actuator limitation; log “controller saturated” if available.
- Check “limit active” flags: VSWR/thermal/UV flags convert unexplained throughput drops into timestamped causes.
- Check temperature vs power: power collapse that tracks temperature indicates derate/rollback rather than random fading.
- Check retry counters: repeated trip-retry cycles indicate poor debounce/hold rules or intermittent sensors.
- Correlate with BER/FER windows: quality drops aligned with limit events usually point to EIRP/linearity, not receive sensitivity.
H2-5 · Frequency plan & reference: synthesizers, phase noise, and lock sequencing
A SATCOM terminal can “look locked” and still behave poorly if the frequency plan, reference quality, or lock sequencing allows images, spurs, or unstable intermediate states to leak into the receive or transmit chain.
Key takeaway: Treat frequency planning and reference/lock as one system: choose an IF that avoids known pitfalls, enumerate images/spurs early, and enforce a traffic release gate only after RefStable → SynthLock → RF_Lock → CarrierLock.
1) Frequency planning: IF choice and “do-not-step-on” checks
Frequency planning is a pre-emptive debug tool. A clean plan reduces surprises such as unexpected self-interference, images folding into the band, or reference-related spurs landing inside the effective receive bandwidth. The simplest practical approach is to define an IF, then enumerate predictable mixing products and confirm none land in-band.
Frequency plan review checklist (practical)
(a) Verify LO±IF does not place images inside the intended passband. (b) Enumerate 2·LO±IF and LO±2·IF products and check proximity to the passband. (c) Identify strong internal spur candidates (reference-related and divider spurs) and check if any land in-band. (d) Ensure filter transition bands have margin—avoid “barely rejected” images. (e) Keep a short “pitfall list” for the chosen IF so future revisions do not silently break it.
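Items (a) and (b) of the checklist amount to enumerating m·LO ± n·IF products and flagging those that fall inside or near the band of interest. The sketch below does this for an upconversion example; the LO, IF, band edges, and guard values are illustrative, and `max_order` bounds each multiplier separately rather than the total product order.

```python
def mixing_products(lo_mhz, if_mhz, max_order=2):
    """Return {(m, signed n): |m*LO + n*IF|} for multipliers up to max_order."""
    products = {}
    for m in range(max_order + 1):
        for n in range(max_order + 1):
            if m == 0 and n == 0:
                continue
            # With one multiplier at zero, both signs give the same frequency.
            signs = (1,) if (m == 0 or n == 0) else (1, -1)
            for sign in signs:
                products[(m, sign * n)] = abs(m * lo_mhz + sign * n * if_mhz)
    return products


def near_band(products, band_mhz, guard_mhz, wanted):
    """List unwanted products within the band widened by guard_mhz on each side."""
    low, high = band_mhz
    return sorted(
        key for key, freq in products.items()
        if key != wanted and low - guard_mhz <= freq <= high + guard_mhz
    )


if __name__ == "__main__":
    # Illustrative plan: LO 13050 MHz, IF 1200 MHz, wanted product LO + IF.
    plan = mixing_products(13050, 1200)
    print(near_band(plan, (14000, 14500), guard_mhz=1000, wanted=(1, 1)))
```

Running the check with a generous guard band is a cheap way to find "barely rejected" products such as LO leakage before they show up as field problems.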
2) Synthesizers and phase noise: why “locked” is not always “clean”
Phase noise and spurs can degrade demodulation margin without triggering a hard unlock. The terminal should connect synthesizer quality to observable effects: increased decoder workload, higher frame error windows, or stability issues during acquisition. These symptoms become actionable when correlated with lock timestamps and setpoint changes.
3) Reference inside the terminal: TCXO vs OCXO boundary (lock & drift focus)
The reference clock impacts time-to-lock, lock stability, and effective phase-noise floor. A stronger reference option is justified when the system repeatedly re-acquires under temperature swing or when quality margins are tight and decoder workload rises even with stable RF lock. The selection boundary should be expressed in operational terms: lock stability and drift sensitivity.
- Lock stability: unstable reference behavior tends to lengthen lock time and increase unlock/re-lock events.
- Quality under margin: higher phase-noise floor can manifest as higher decoder iterations or worse error windows.
- Temperature drift: large drift during warm-up or environmental changes can shrink acquisition tolerance and trigger re-lock cycles.
4) Lock sequencing: enforce a traffic release gate
Lock sequencing should be treated as a state machine with explicit settle windows. The goal is to prevent intermediate, partially-locked states from being mistaken as “ready”, which often produces unstable behavior after traffic begins. A clean sequence uses a release gate: traffic is enabled only after the full chain is stable.
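A sketch of such a sequencer follows. It assumes four boolean flags sampled with an integer millisecond timestamp; the stage names, the single shared settle window, and the event strings are hypothetical. Each stage must hold for the settle window before the next is considered, and losing an earlier stage drops the sequence back to it.

```python
STAGES = ("ref_stable", "synth_lock", "rf_lock", "carrier_lock")


class LockSequencer:
    """Advance one stage at a time; release traffic only when all stages hold."""

    def __init__(self, settle_ms=100):
        self.settle_ms = settle_ms
        self.confirmed = 0   # number of stages confirmed so far
        self.since = None    # when the current candidate stage first read true
        self.history = []    # (t_ms, event) records for time-to-lock analysis

    def update(self, t_ms, flags) -> bool:
        # Fall back if any already-confirmed stage has been lost.
        for idx in range(self.confirmed):
            if not flags[STAGES[idx]]:
                self.history.append((t_ms, "LOST_" + STAGES[idx]))
                self.confirmed = idx
                self.since = None
                break
        if self.confirmed < len(STAGES):
            name = STAGES[self.confirmed]
            if flags[name]:
                if self.since is None:
                    self.since = t_ms
                if t_ms - self.since >= self.settle_ms:
                    self.history.append((t_ms, "OK_" + name))
                    self.confirmed += 1
                    self.since = None
            else:
                self.since = None
        return self.confirmed == len(STAGES)  # True means the gate may open
```

The history list doubles as the lock timeline: the gaps between consecutive `OK_` events are the per-stage lock times.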
H2-6 · Modem ASIC chain: framing, FEC, rate adaptation and diagnostics
For the modem ASIC, its FEC (for example LDPC), and ACM, the useful anchors are: where performance is determined, which knobs exist in silicon, and which diagnostics quickly prove whether instability comes from RF margin, decoding pressure, or sync.
Key takeaway: A modem ASIC is a measurable pipeline: Ethernet ingress → framing/encap → FEC → modulation → sync/decoder. Stability comes from windowed quality metrics (BER/FER, iterations, sync state) and from ACM/VCM rules (hysteresis + minimum dwell time + hold-off after reacquire).
1) Data path: from Ethernet to IF (and back) with measurable checkpoints
The most useful mental model is a pipeline with checkpoints. Each stage should expose a short set of counters that survive real-world noise: windowed error rates, loss counters, and buffer levels. These are the signals that convert “it’s slow today” into a reproducible diagnosis.
2) FEC in practice: decoder workload is the early warning
FEC is best treated as a workload engine rather than an algorithm lesson. Near the decoding threshold, the system often shows a characteristic signature: iteration counts rise, latency can increase, and frame error windows widen. Those are measurable outputs that remain meaningful even when a hard unlock never occurs.
- IterationAvg / IterationMax: rising iterations indicate shrinking margin or increased impairment, even with stable lock.
- FER window: windowed frame errors are more actionable than single-point estimates.
- DecoderFail count: repeated decoder failures usually precede rate fallback or reacquire events.
3) ACM/VCM: switching rules that prevent oscillation and “rate thrash”
Rate adaptation is a control system. If it reacts too quickly, it can thrash between modes and degrade user experience with repeated bursts of loss or re-sync. A stable policy uses hysteresis, minimum dwell time, and a hold-off window after any reacquire.
ACM/VCM stability rules (recommended)
(a) Use separate thresholds for step-up vs step-down (hysteresis). (b) Enforce minimum dwell time per mode. (c) After reacquire, freeze adaptation for a short hold-off window. (d) Log every mode change as an event with a metric snapshot.
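Rules (a) to (d) can be sketched as a small controller. The mode names and SNR thresholds below are illustrative and not taken from any standard, and applying the dwell time to step-downs as well as step-ups is a literal reading of rule (b) that a real design might relax for fast fades.

```python
class AcmController:
    """Rate adaptation with hysteresis, minimum dwell, and post-reacquire hold-off."""

    def __init__(self, modes, up_margin_db=1.5, down_margin_db=0.3,
                 min_dwell_s=5.0, holdoff_s=10.0):
        self.modes = modes  # list of (name, required_snr_db), ascending
        self.up_margin_db = up_margin_db
        self.down_margin_db = down_margin_db
        self.min_dwell_s = min_dwell_s
        self.holdoff_s = holdoff_s
        self.idx = 0
        self.last_change_t = float("-inf")
        self.hold_until = float("-inf")
        self.events = []  # (t, old_mode, new_mode, snr_db), rule (d)

    def on_reacquire(self, t):
        self.hold_until = t + self.holdoff_s  # rule (c)

    def update(self, t, snr_db):
        frozen = t < self.hold_until or t - self.last_change_t < self.min_dwell_s
        if not frozen:
            new = self.idx
            can_go_up = self.idx + 1 < len(self.modes)
            # Rule (a): step-up needs more headroom than step-down tolerates.
            if can_go_up and snr_db >= self.modes[self.idx + 1][1] + self.up_margin_db:
                new = self.idx + 1
            elif self.idx > 0 and snr_db < self.modes[self.idx][1] + self.down_margin_db:
                new = self.idx - 1
            if new != self.idx:
                self.events.append((t, self.modes[self.idx][0], self.modes[new][0], snr_db))
                self.idx = new
                self.last_change_t = t  # rule (b): restart the dwell timer
        return self.modes[self.idx][0]
```

The controller moves at most one mode per decision, so a sudden SNR jump still produces a sequence of logged, dwell-separated steps instead of a single large leap.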
4) Diagnostics: build a readable “modem health panel”
A high-value terminal exposes a compact diagnostic panel: sync state, BER/FER windows, decoder iteration statistics, frame loss counters, buffer levels, and current ACM mode with dwell time. When these metrics are time-aligned with lock events, support and field troubleshooting become deterministic instead of speculative.
| Observed pattern | Most likely interpretation | What to log / alarm |
|---|---|---|
| RF lock stable, IterMax rises, FER window widens | Quality impairment without hard unlock; margin shrinking | FEC_PRESSURE (IterAvg/Max + FERwin snapshot) |
| ACM mode changes frequently within short time | Rate thrash due to missing hysteresis/dwell rules | ACM_THRASH (mode history + dwell timers) |
| Sync state flips, frame loss spikes | Sync instability or transitions without hold-off | SYNC_UNSTABLE (state timeline + hold-off flags) |
| Ingress buffer overflow/underflow | Pipeline backpressure, mismatch of rate control vs traffic | BUFFER_EVENT (levels + drops + timestamps) |
| CRC errors increase without RF unlock | Framing integrity issue or intermittent impairment | FRAME_INTEGRITY (CRC counters + mode + sync) |
H2-7 · Encryption in a terminal: where it sits, what must be protected, what to log
Terminal encryption is an engineering placement decision. The best design makes the insertion point explicit, preserves throughput, avoids accidental MTU breakage, and produces logs that can prove “why traffic failed” in the field.
Key takeaway: Choose a crypto insertion point (L2/L3-style vs link-layer style) by its impact on throughput, latency, MTU/fragmentation risk, and debuggability. Then enforce a minimal key lifecycle (load → active → rotate → retire) and log reason-coded failures.
1) Crypto insertion points: terminal-level consequences
Encryption can be inserted near the Ethernet stack (L2/L3-style) or inside a terminal-specific link layer. The practical differences are not academic: header overhead changes effective MTU, policy boundaries change what flows are protected, and observability determines whether troubleshooting is deterministic or guesswork.
| Placement | What it changes in a terminal | What to instrument |
|---|---|---|
| L2/L3-style insertion (MAC/IP-like) | Effective MTU can shrink; fragmentation/reassembly risk increases; policy boundaries become “network-shaped”; management and traffic flows need clear separation to avoid unintentional coupling. | MTU_exceed, fragment_count, policy_id, cipher_bps, auth_fail, replay_drop |
| Terminal link-layer insertion (proprietary) | Easier to align crypto metrics with modem metrics; fewer external dependencies; interoperability/debug depends on terminal-owned counters and logs. | session_id, mode/state, cipher_bps, drop_reason, key_state, latency_p95 |
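The MTU effect is simple arithmetic: whatever the crypto layer adds per packet is subtracted from what the user payload may occupy. The sketch below tracks a counter in the spirit of the MTU_exceed entry in the table; the class name and the 58-byte overhead are placeholders, since the actual overhead depends on the cipher and encapsulation chosen.

```python
def effective_mtu(link_mtu: int, crypto_overhead: int) -> int:
    """Largest inner packet that still fits after crypto headers/trailers are added."""
    if crypto_overhead >= link_mtu:
        raise ValueError("crypto overhead leaves no room for payload")
    return link_mtu - crypto_overhead


class MtuCounters:
    """Counts packets that would need fragmentation after encryption."""

    def __init__(self, link_mtu: int, crypto_overhead: int):
        self.inner_mtu = effective_mtu(link_mtu, crypto_overhead)
        self.mtu_exceed = 0

    def check(self, packet_len: int) -> bool:
        if packet_len > self.inner_mtu:
            self.mtu_exceed += 1
            return False
        return True


if __name__ == "__main__":
    counters = MtuCounters(link_mtu=1500, crypto_overhead=58)  # overhead is a placeholder
    print(counters.inner_mtu, counters.check(1500), counters.mtu_exceed)
```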
2) Throughput and latency: keep the data path predictable
Crypto performance issues are usually pipeline issues: packet-per-second limits, queue backpressure, extra copies, or small ring buffers. A robust terminal exposes counters that differentiate “crypto engine saturation” from “backhaul queue drops” and from “policy/auth failures”.
3) Key interface: load, rotation, and invalid handling (principles only)
A terminal needs a clear operational key lifecycle that does not destabilize traffic. Rotation should be visible, failure handling should be reason-coded, and transitions should be staged to avoid sudden “all traffic stops” events.
Key lifecycle (terminal view)
Load → Validate → Active → Rotate → Retire. If a key is invalid or missing, the terminal should expose a reason code such as MISSING_KEY, EXPIRED, POLICY_MISMATCH, or AUTH_FAIL, and clearly indicate whether new sessions are blocked or existing sessions continue.
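The lifecycle and its reason codes can be sketched as a transition table plus a validation function. The reason codes come from the text above; the state names, the dictionary-based key record, and the rule that Active must pass through Rotate before Retire are assumptions made for this illustration. No key material is handled here, only metadata.

```python
from enum import Enum


class KeyState(Enum):
    LOADED = "LOADED"
    VALIDATED = "VALIDATED"
    ACTIVE = "ACTIVE"
    ROTATING = "ROTATING"
    RETIRED = "RETIRED"


# Allowed forward transitions (assumed strictly linear for this sketch).
ALLOWED = {
    KeyState.LOADED: {KeyState.VALIDATED},
    KeyState.VALIDATED: {KeyState.ACTIVE},
    KeyState.ACTIVE: {KeyState.ROTATING},
    KeyState.ROTATING: {KeyState.RETIRED},
    KeyState.RETIRED: set(),
}


def validate(key, now, active_policy):
    """Return a reason code string, or None if the key metadata is acceptable."""
    if key is None:
        return "MISSING_KEY"
    if key["expires_at"] <= now:
        return "EXPIRED"
    if key["policy_id"] != active_policy:
        return "POLICY_MISMATCH"
    return None


def advance(state: KeyState, new_state: KeyState) -> KeyState:
    """Apply a lifecycle transition, rejecting anything outside the table."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal key transition {state.name} -> {new_state.name}")
    return new_state
```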
4) What to log: audit-friendly, reason-coded, timestamped
A crypto failure without a reason code is not diagnosable. Logs should be compact but complete enough to prove: whether crypto was enabled, which policy was active, what key state existed, and why packets or sessions were rejected.
| Field | Meaning | Examples |
|---|---|---|
| timestamp | Monotonic + wall-clock correlation for audits | ms resolution |
| direction | Traffic direction for correlation | Tx / Rx |
| crypto_enabled + policy_id | Proves which policy was active | POLICY_A / POLICY_B |
| key_state + key_id (index/hash) | Key availability and lifecycle state | ACTIVE / ROTATING |
| reason_code | Deterministic failure classification | AUTH_FAIL, REPLAY_DROP, EXPIRED, MISSING_KEY |
| bytes + drops + latency bucket | Performance snapshot for support | cipher_bps_tx, latency_p95 |
H2-8 · Ethernet backhaul: QoS, VLAN, jitter buffers, and “don’t drop the link”
Backhaul is where user experience is won or lost. A terminal should provide a minimal, reliable L2/L3 loop, stable queueing/shaping to absorb bursts, and layered drop-out diagnostics that clearly separate network-side failures from modem-side events.
Key takeaway: Treat backhaul as three flows—Mgmt, Control, Traffic—with explicit queues, rate caps, and drop policies. Correlate link/DHCP/DNS failures with modem timing so “dropouts” can be attributed to the correct layer.
1) Minimal L2/L3 loop: VLAN/QoS/DSCP mapping that survives the field
A practical configuration starts with classification and mapping: identify management, control, and traffic flows, then map VLAN and DSCP into internal queues. The terminal should make these mappings visible and verifiable through counters, rather than relying on assumptions about upstream equipment.
2) Buffers & congestion: queues, shaping, and jitter buffers
SATCOM throughput is variable due to rate adaptation and reacquire events. Backhaul must damp bursts and protect the modem interface from sudden overrun/underrun. This is achieved with per-class queueing, shaping (rate caps), and small jitter buffers where appropriate.
Backhaul stability rules (recommended)
(a) Apply rate caps for management and background tasks. (b) Keep control traffic above management during congestion. (c) Prevent bursts from directly hitting the modem interface by using shaping + queue depth limits. (d) Export underrun/overrun counters with timestamps.
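A rate cap of the kind in rule (a) is commonly built as a token bucket. The sketch below is a generic token bucket with a drop counter, not a description of any particular terminal's shaper; the class name and parameters are illustrative.

```python
class TokenBucket:
    """Rate cap with a bounded burst; counts packets refused by the cap."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes
        self.last_t = 0.0
        self.drops = 0

    def allow(self, t: float, nbytes: int) -> bool:
        # Refill according to elapsed time, never beyond the burst size.
        elapsed = t - self.last_t
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_t = t
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        self.drops += 1
        return False


if __name__ == "__main__":
    mgmt = TokenBucket(rate_bps=8000, burst_bytes=1500)  # 1000 bytes/s management cap
    print(mgmt.allow(0.0, 1500), mgmt.allow(0.5, 1000), mgmt.allow(1.0, 1000), mgmt.drops)
```

The burst size bounds how much the management class can inject at once, which is what keeps it from hitting the modem interface directly during congestion.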
3) “Don’t drop the link”: layered diagnostics (PHY → IP → DNS)
Dropouts must be attributed by layer. The terminal should expose link status and renegotiation events, IP-level failures such as DHCP issues, and name-resolution failures such as DNS timeouts. Each category should have a reason code and a retry counter, allowing support to separate “network-side trouble” from modem-side reacquire or quality events.
| Layer | What to detect | Counters / reason codes |
|---|---|---|
| PHY / Link | Link up/down, renegotiation, error frames | LINK_FLAP, SPEED_CHANGE, ERR_FRAME_CNT |
| IP / Routing | DHCP failures, lease renew issues, gateway unreachable | DHCP_FAIL, LEASE_RENEW_FAIL, GW_UNREACH |
| DNS / Services | DNS timeout, resolution failures, service reconnect storms | DNS_FAIL, RESOLVE_TIMEOUT, RECONNECT_STORM |
4) Separate management plane from data plane
Remote management is essential, but it must never starve user traffic. The clean solution is explicit separation: a management queue with a strict rate cap, independent counters, and “business protection mode” behaviors that prioritize traffic and control over management uploads or remote sessions during congestion.
H2-9 · Power & sequencing inside the terminal: rails, brownout behavior, and safe restart
Reliable reacquire starts with deterministic power behavior. A robust terminal treats power as multiple domains (RF, synthesizers, modem, crypto, PHY/management), gates reset with PG and time windows, and turns brownouts into reason-coded events that trigger a safe restart path instead of “mystery lock failures”.
Key takeaway: Define power domains and dependencies, then implement PG/RESET gating + brownout grading + minimal state save + staged relock. Export rail telemetry and reset cause codes so field dropouts become diagnosable.
1) Multi-rail domains: what must be isolated and why
Domain separation prevents one unstable rail from corrupting unrelated state. It also makes recovery faster: only the affected domain needs reinitialization, while the terminal preserves a consistent “known-good” baseline for the relock sequence.
| Domain | Dependency / “Ready” gate | Common brownout symptom (terminal view) |
|---|---|---|
| Synth / Ref | Ref stable + Synth PLL lock + settle time | LO appears “locked” but reacquire loops; carrier lock instability; relock time spikes |
| RF Front-End | Synth ready + RF bias OK + gain state initialized | AGC thrash, RSSI baseline shifts, intermittent lock detect events |
| Modem | RF lock gate + internal self-test complete | Frame sync never stabilizes; decoder iterations surge; traffic enable is unsafe |
| Crypto | Key state valid + policy loaded + counters reset | Traffic blocked by policy/key errors; opaque failures without reason codes |
| Ethernet PHY | PHY power-good + link training/negotiation done | Link flap storms; renegotiation loops; DHCP/DNS failures unrelated to RF quality |
2) Sequencing & reset tree: gate releases, do not “hope it’s stable”
A safe sequence is a set of gates, not a list of rails. The terminal should export PG states per domain, then use a reset controller to enforce: rail stable → PG asserted → time window → reset release. Planned transitions (startup, mode change, restart) should use a short “mask window” to prevent false alarms while rails and locks settle.
Recommended release gates (examples)
Ref gate: ref_ok & settle_timer_done → allow Synth reset release.
Synth gate: pll_lock & lock_stable_timer → allow RF enable.
RF gate: bias_ok & gain_state_init → allow Modem acquisition.
Traffic gate: sync_ok & health_ok → allow traffic enable.
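The gate list above can be read as an ordered chain in which a stage is released only when every earlier gate is satisfied. The sketch below is one minimal way to express that, assuming the signal names from the list are available as booleans. The stage labels, the `GATES` layout, and `evaluate_gates` are hypothetical names for illustration, not an existing terminal API.

```python
# Ordered release gates: (stage released, signals that must all be true).
# Signal names follow the gate list above; stage labels are hypothetical.
GATES = [
    ("SYNTH_RESET_RELEASE", ("ref_ok", "settle_timer_done")),
    ("RF_ENABLE", ("pll_lock", "lock_stable_timer")),
    ("MODEM_ACQUISITION", ("bias_ok", "gain_state_init")),
    ("TRAFFIC_ENABLE", ("sync_ok", "health_ok")),
]


def evaluate_gates(signals):
    """Return (released_stages, blocking).

    blocking is None when every gate is open, otherwise a tuple
    (stage, missing_signals) for the first gate that is not satisfied.
    A signal that is absent from the dict is treated as not ready.
    """
    released = []
    for stage, required in GATES:
        missing = [name for name in required if not signals.get(name, False)]
        if missing:
            return released, (stage, missing)
        released.append(stage)
    return released, None


if __name__ == "__main__":
    snapshot = {"ref_ok": True, "settle_timer_done": True, "pll_lock": True}
    print(evaluate_gates(snapshot))
```

Returning the first blocking gate together with its missing signals gives the log a concrete reason for every stalled release.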
3) Brownout behavior: grade the event, preserve the minimum, restart safely
Not all dips are equal. A practical design grades brownouts and responds with deterministic actions: freeze unsafe writes, snapshot a minimal recovery context, and enter a safe restart flow that relocks in stages (ref → synth → RF → modem → traffic). The restart flow must record which stage fails, so “won’t lock back” becomes actionable.
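As a rough illustration of grading plus staged relock, the sketch below grades a dip by depth and duration and then walks the relock stages in order, reporting the first stage that fails. The three grade names, the numeric thresholds, and the function names are assumptions made for this example; real thresholds depend on the rails and supervisors used.

```python
from enum import Enum


class BrownoutGrade(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


# Stage order from the text: ref -> synth -> RF -> modem -> traffic.
RELOCK_STAGES = ("REF", "SYNTH", "RF", "MODEM", "TRAFFIC")


def grade_brownout(dip_fraction, duration_ms):
    """Grade a dip by depth (fraction of nominal) and duration.

    The thresholds are placeholders, not values from any datasheet.
    """
    if dip_fraction >= 0.20 or duration_ms >= 10.0:
        return BrownoutGrade.CRITICAL
    if dip_fraction >= 0.10 or duration_ms >= 1.0:
        return BrownoutGrade.MAJOR
    return BrownoutGrade.MINOR


def staged_relock(stage_checks):
    """Run the stage checks in order and stop at the first failure.

    stage_checks maps each stage name to a zero-argument callable that
    returns True when that stage is ready. Returns (ok, failed_stage).
    """
    for stage in RELOCK_STAGES:
        if not stage_checks[stage]():
            return False, stage
    return True, None
```

Logging the returned `failed_stage` is what turns a failed restart into a stage-specific finding.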
4) Critical telemetry: rails, PG, and reset cause codes
Power problems are unfixable without power evidence. Export rail voltage/current per domain, PG transitions with timestamps, and reset cause codes. The core requirement is to separate BOR resets (supply integrity) from WDT resets (software control path), and to log the recovery stage that failed.
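A minimal sketch of a reason-coded reset record follows, assuming reset causes are logged as short strings such as `"BOR"` and `"WDT"`. The `ResetEvent` fields and the `domain_counts` helper are hypothetical; the BOR and WDT interpretation is the one stated above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Interpretation from the text: BOR -> supply integrity,
# WDT -> software control path. Other causes stay unclassified.
CAUSE_DOMAIN = {"BOR": "supply_integrity", "WDT": "software_control_path"}


@dataclass(frozen=True)
class ResetEvent:
    timestamp_s: float                  # common time base, in seconds
    reset_cause: str                    # e.g. "BOR" or "WDT"
    failed_stage: Optional[str] = None  # recovery stage that failed, if any


def domain_counts(events):
    """Count reset events per fault domain."""
    return Counter(CAUSE_DOMAIN.get(e.reset_cause, "unclassified") for e in events)
```

Keeping the failed recovery stage in the same record as the reset cause means one log line answers both "why did it reset" and "where did recovery stop".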
H2-10 · Thermal & mechanical: keeping RF linearity and lock across temperature & vibration
Temperature and vibration can look like “RF quality issues” even when the link budget is unchanged. A robust terminal maps thermal nodes to measurable performance outcomes (gain, NF, PLL stability, PA linearity), applies deterministic derating, and monitors mechanical intermittency (connector micro-motion, cable strain) with time-aligned counters.
Key takeaway: Build a “thermal influence map” and a minimal sensor layout, then use policy actions (derate, OBO increase, freeze switching, extend settle) that are visible in logs. For vibration, treat intermittent errors as a measurable class with event bursts, not as random noise.
1) Thermal → RF performance: map cause to measurable effects
Thermal drift matters because it changes operating points. The terminal should translate “temperature change” into measurable symptoms and counters, so stability can be proven instead of guessed.
| Thermal effect | What changes | What to observe |
|---|---|---|
| Gain drift | AGC working point shifts; RSSI baseline moves | RSSI_baseline, AGC_state, gain_step_rate |
| NF variation | Same RSSI produces worse decoding margin | FER/BER window, decoder_iter, relock_count |
| LO drift / lock margin | Synth stability reduces; reacquire increases | pll_lock_events, lock_loss, relock_time |
| PA linearity shift | Compression point moves; EVM/ACPR degrades | EVM_flag, OBO_level, power_limit_active |
2) Thermal control: partition, sense, and apply visible policy actions
Thermal control inside a terminal is primarily about placement and policy. Place sensors where they predict performance drift (PLL/ref, PA, LNA/front-end, board hotspot, air-in or case). Then apply derating actions that are deterministic and observable.
Recommended thermal triggers (terminal view)
Warm drift: raise settle time, log PLL margin warnings.
Hotspot: increase OBO, cap power, freeze frequent mode switching, prioritize link continuity.
Critical: enter protective mode with explicit reason code, preserve minimum state, and avoid risky writes.
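The trigger list above maps naturally onto a small policy table. The sketch below assumes a single temperature reading per sensor and uses placeholder thresholds in degrees Celsius; the level names, action labels, and function names are illustrative, and the action labels condense the triggers into short identifiers.

```python
# Policy actions per thermal level, condensed from the trigger list above.
THERMAL_POLICY = {
    "NORMAL": (),
    "WARM_DRIFT": ("raise_settle_time", "log_pll_margin_warning"),
    "HOTSPOT": ("increase_obo", "cap_power", "freeze_mode_switching"),
    "CRITICAL": ("enter_protective_mode", "preserve_min_state", "avoid_risky_writes"),
}


def thermal_level(temp_c, warm_c=70.0, hot_c=85.0, crit_c=100.0):
    """Classify a reading; the default thresholds are placeholders."""
    if temp_c >= crit_c:
        return "CRITICAL"
    if temp_c >= hot_c:
        return "HOTSPOT"
    if temp_c >= warm_c:
        return "WARM_DRIFT"
    return "NORMAL"


def apply_thermal_policy(sensor_id, temp_c, log):
    """Select the actions for a reading and record them in the log list."""
    level = thermal_level(temp_c)
    actions = THERMAL_POLICY[level]
    if actions:
        log.append({"sensor_id": sensor_id, "temp_c": temp_c,
                    "level": level, "actions": actions})
    return actions
```

Because every non-empty action set is appended to the log with its sensor ID, the derating decision stays visible to whoever reads the field record later.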
3) Mechanical/vibration: intermittent faults that mimic link degradation
Vibration can create short error bursts through connector micro-motion, cable strain, or intermittent contact. These events often appear as sudden spikes in errors or brief link flaps. The terminal should treat this as a distinct class: detect bursts, record timing, and correlate with temperature and lock state.
- Connector micro-motion: short FER/BER bursts, occasional link flap counters, repeatable under vibration.
- Cable strain / coax micro-movement: RSSI jitter, AGC over-activity, transient lock margin loss.
- Vibration-sensitive phase/jitter: decoder iterations surge without a clear RSSI drop, reacquire frequency increases.
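One way to treat bursts as a measurable class is to group over-threshold error samples that occur close together in time. The sketch below assumes timestamped FER samples in time order; the threshold, the `max_gap_s` grouping gap, and the function name are assumptions for illustration.

```python
def find_bursts(samples, threshold, max_gap_s=1.0):
    """Group over-threshold samples into bursts.

    samples: iterable of (timestamp_s, fer) in time order.
    Returns a list of (start_s, end_s, peak_fer) tuples. Two
    over-threshold samples belong to the same burst when they are
    at most max_gap_s apart.
    """
    bursts = []
    current = None  # (start_s, end_s, peak_fer) of the open burst
    for t, fer in samples:
        if fer < threshold:
            continue
        if current is not None and t - current[1] <= max_gap_s:
            current = (current[0], t, max(current[2], fer))
        else:
            if current is not None:
                bursts.append(current)
            current = (t, t, fer)
    if current is not None:
        bursts.append(current)
    return bursts
```

The start and end times of each burst are the values to correlate against temperature and lock-state records.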
4) Observability: time-align thermal, lock state, and error bursts
Stability requires aligned evidence. Export sensor readings with sensor IDs, derate states, lock state, and error windows, so support can attribute symptoms to thermal drift, mechanical intermittency, or true channel impairment.
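Time alignment can be as simple as merging per-source event streams onto one timeline and then querying a window around a symptom. The sketch below assumes each stream is already sorted and shares one time base in seconds; the tuple layout and function names are hypothetical.

```python
import heapq


def merge_timeline(*streams):
    """Merge time-sorted event streams into one timeline.

    Each event is a (timestamp_s, source, detail) tuple.
    """
    return list(heapq.merge(*streams, key=lambda event: event[0]))


def events_near(timeline, t_s, window_s):
    """Return the events within +/- window_s of time t_s."""
    return [event for event in timeline if abs(event[0] - t_s) <= window_s]


if __name__ == "__main__":
    thermal = [(10.0, "thermal", "derate_level=1")]
    lock = [(12.0, "lock", "lock_loss"), (40.0, "lock", "relock_ok")]
    errors = [(12.5, "modem", "FER burst")]
    timeline = merge_timeline(thermal, lock, errors)
    print(events_near(timeline, 12.0, 1.0))
```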
H2-11 · BIT/BIST & field troubleshooting: prove where the fault is (RF vs modem vs network)
A SATCOM terminal becomes supportable when built-in tests (BIT/BIST) produce layered evidence. This chapter defines a practical fault-isolation method that separates RF lock and power health, modem synchronization and error behavior, and Ethernet/network continuity using a small set of counters, reason codes, and a deterministic step-by-step flow.
Field rule: Do not guess. Prove the fault domain with layer state + counters + timestamps. Always check Lock first, then Errors, then Network, then Power/Thermal evidence.
1) BIT types that actually help in the field (PBIT / CBIT / IBIT)
Use three complementary BIT modes. Each mode should output a concise verdict (PASS / DEGRADED / FAIL) plus a reason code.
- PBIT: verifies “can the terminal enter acquire safely?” (ref OK → synth lock → RF bias OK → modem basic sync → PHY link).
- CBIT: watches drift and intermittency (lock-loss bursts, thermal derate events, link flaps, error-rate windows).
- IBIT: forces structured checks during a fault (read lock states, run loopback/self-checks, snapshot counters with timestamps).
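The verdict-plus-reason output can be modeled as a small record. The sketch below shows a hypothetical CBIT check on lock losses within a window and a helper that picks the worst result across modes; the thresholds and names are assumptions, and `"NONE"` is used here as the reason code for a passing check.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    PASS = 0
    DEGRADED = 1
    FAIL = 2


@dataclass(frozen=True)
class BitResult:
    mode: str         # "PBIT", "CBIT" or "IBIT"
    verdict: Verdict
    reason: str       # reason code; "NONE" when passing


def cbit_lock_check(lock_loss_in_window, degraded_at=1, fail_at=5):
    """Grade lock losses seen in one window (placeholder thresholds)."""
    if lock_loss_in_window >= fail_at:
        return BitResult("CBIT", Verdict.FAIL, "LOCK_LOSS")
    if lock_loss_in_window >= degraded_at:
        return BitResult("CBIT", Verdict.DEGRADED, "LOCK_LOSS")
    return BitResult("CBIT", Verdict.PASS, "NONE")


def worst_result(results):
    """Return the most severe result from a non-empty list (first wins on ties)."""
    return max(results, key=lambda result: result.verdict)
```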
2) Layered self-test: RF → Modem → Network (separate domains with minimal checkpoints)
A useful troubleshooting system avoids mixing symptoms. Each layer must have its own checkpoints and “ready gates” so faults are isolated quickly rather than blamed on the wrong subsystem.
| Layer | Minimum checkpoints (readable states) | BIT output (examples) |
|---|---|---|
| RF | ref_ok, pll_lock, lo_lock, rf_lock; tx_power_det, vswr_trip, pa_temp, rssi_baseline | RF_PASS / RF_DEGRADED / RF_FAIL; reason: LOCK_LOSS, VSWR_TRIP, POWER_LOOP_UNSTABLE |
| Modem | frame_sync, acquisition_state, ACM_state; BER/FER_window, decoder_iterations, drop_reason | MODEM_PASS / MODEM_DEGRADED / MODEM_FAIL; reason: SYNC_LOST, FEC_STRESS, ACM_TOGGLE_STORM |
| Network | phy_link_up, negotiation_state, error_frame_cnt; dhcp_ok, dns_ok, throughput, q_drop_by_class | NET_PASS / NET_DEGRADED / NET_FAIL; reason: LINK_FLAP, DHCP_FAIL, QUEUE_DROPS |
3) Must-have counters & reason codes (small set, high diagnostic value)
Counters should be stable, time-windowed, and aligned to a single time base so correlation is possible (e.g., “lock lost at 12:03:21, FER spike at 12:03:23, link flap at 12:03:24”). The list below is intentionally short: it is designed to point to a fault domain quickly.
| Counter / field | Layer | How it isolates the fault (quick interpretation) |
|---|---|---|
| lock_loss_count | RF | A count that rises during a fault strongly suggests an RF/Synth stability issue; rule this out before investigating Ethernet. |
| relock_time_p95 | RF/Modem | Long tail relock time indicates marginal lock, thermal drift, or sequencing instability; correlate with temperature nodes. |
| lock_fail_stage | RF | Pinpoints where reacquire stops (REF / SYNTH / RF / SYNC). “SYNTH” often points to ref/synth gating stability. |
| FER_window | Modem | High FER with stable lock suggests channel impairment or modem stress; compare with decoder_iterations to separate cases. |
| decoder_iterations_max | Modem | Iterations spike without RSSI change often indicates phase/jitter margin loss or intermittent contact rather than pure fade. |
| ACM_toggle_rate | Modem | Excessive switching can cause throughput jitter; if toggling storms occur, prefer stabilizing policy over chasing Ethernet. |
| link_flap_cnt | Network | Separates physical link drops from RF issues; a rising flap count indicates PHY/negotiation/cabling class problems. |
| dhcp_fail_cnt / dns_fail_cnt | Network | Distinguishes IP service failures from PHY drops; useful when RF is healthy but “service is down”. |
| q_drop_by_class | Network | Proves whether management/control traffic is starving user traffic (or vice versa); supports QoS root-cause attribution. |
| reset_cause (WDT/BOR) | Power evidence | BOR points to supply integrity; WDT points to control-path failures. Always record with timestamp and stage snapshot. |
| brownout_crit_cnt / pg_glitch_cnt | Power evidence | Explains “random” reacquire failures: power gating instability creates state inconsistencies that mimic RF faults. |
| thermal_event_cnt / derate_level | Thermal evidence | Connects stability loss to derating actions (OBO increase, power caps, freeze switching). Must be visible in logs. |
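As an example of a time-windowed counter, the sketch below computes a relock-time tail value over a recent window using a nearest-rank percentile. The sample layout and function names are assumptions; how `relock_time_p95` is actually computed on a given terminal is not specified here.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) in integer math
    return ordered[rank - 1]


def relock_time_p95(samples, now_s, window_s):
    """95th-percentile relock time over the last window_s seconds.

    samples: iterable of (timestamp_s, relock_time_s).
    Returns None when no sample falls inside the window.
    """
    recent = [d for t, d in samples if now_s - window_s <= t <= now_s]
    return percentile_nearest_rank(recent, 95) if recent else None
```

Nearest-rank is chosen here because it always returns an observed value and needs no interpolation, which keeps the counter easy to reproduce from raw logs.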
4) Field troubleshooting steps (deterministic workflow)
Use the same order every time to avoid false conclusions. Each step produces a binary decision and a short action set.
Step 1. Check Lock: verify ref_ok / pll_lock / lo_lock / rf_lock and confirm lock_loss_count is not increasing.
Action: if lock is unstable, do not blame Ethernet yet; capture lock_fail_stage and proceed to RF/Synth checks.
Step 2. Check Errors: read FER_window and decoder_iterations_max with timestamps.
Action: stable lock + high iterations suggests margin issues (thermal/mechanical/jitter) before chasing DHCP.
Step 3. Check Network Continuity: differentiate link_flap_cnt vs dhcp_fail_cnt vs q_drop_by_class.
Action: link flaps indicate PHY/negotiation/cabling class issues; DHCP/DNS failures indicate IP service path issues.
Step 4. Check Power/Thermal Evidence: read reset_cause, brownout_crit_cnt, derate_level, and temperature nodes.
Action: BOR/PG glitches explain intermittent reacquire problems; thermal derating explains throughput drops and EVM-related alarms.
Step 5. Output a Proven Conclusion: FaultDomain + Evidence + Next Action.
Example: “Network / Evidence: link_flap_cnt rising, lock stable / Action: inspect PHY negotiation and physical link path.”
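The five steps can be sketched as a single function over one snapshot of states and counters. This is a simplified illustration, not a complete diagnostic: keys ending in `_delta` are hypothetical fields holding a counter's increase over the observation window, the two limits are placeholders, and power/thermal evidence is attached to the result even when an earlier step has already named the domain.

```python
LOCK_FLAGS = ("ref_ok", "pll_lock", "lo_lock", "rf_lock")


def isolate_fault(snap, fer_limit=1e-3, iter_limit=20):
    """Walk Steps 1-5 over a snapshot dict and return the conclusion."""
    domain, action, evidence = None, "none", []

    lock_ok = all(snap.get(flag, False) for flag in LOCK_FLAGS)
    if not lock_ok or snap.get("lock_loss_delta", 0) > 0:
        # Step 1: unstable lock is settled before any other layer.
        domain, action = "RF", "capture lock_fail_stage, run RF/synth checks"
        evidence.append("lock unstable, lock_fail_stage=%s" % snap.get("lock_fail_stage"))
    elif (snap.get("FER_window", 0.0) > fer_limit
          or snap.get("decoder_iterations_max", 0) > iter_limit):
        # Step 2: errors with stable lock point at margin.
        domain, action = "Modem", "check thermal/mechanical/jitter margin"
        evidence.append("lock stable, FER/iterations high")
    elif snap.get("link_flap_delta", 0) > 0:
        # Step 3: physical link drops.
        domain, action = "Network", "inspect PHY negotiation and physical link path"
        evidence.append("link_flap_cnt rising, lock stable")
    elif snap.get("dhcp_fail_delta", 0) > 0 or snap.get("dns_fail_delta", 0) > 0:
        # Step 3: IP service failures with the PHY up.
        domain, action = "Network", "inspect IP service path"
        evidence.append("DHCP/DNS failures rising, PHY link stable")
    elif snap.get("q_drop_delta", 0) > 0:
        # Step 3: queue drops.
        domain, action = "Network", "review QoS mapping and shaping"
        evidence.append("q_drop_by_class rising")

    # Step 4: power/thermal evidence is always attached to the result.
    if snap.get("reset_cause") == "BOR" or snap.get("brownout_crit_delta", 0) > 0:
        evidence.append("BOR/brownout evidence present")
        if domain is None:
            domain, action = "Power", "inspect supply integrity and sequencing"
    if snap.get("derate_level", 0) > 0:
        evidence.append("derate_level=%d" % snap["derate_level"])
        if domain is None:
            domain, action = "Thermal", "review derating triggers"

    # Step 5: FaultDomain + Evidence + Next Action.
    return {"fault_domain": domain or "None", "evidence": evidence, "next_action": action}
```

Fixing the order in code is the point of the design: the same snapshot always yields the same conclusion, regardless of who runs the check.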
5) Common root-cause cheat sheet (symptom → evidence → most likely domain)
| Observed symptom | Evidence pattern (counters) | Most likely domain |
|---|---|---|
| Frequent reacquire / won’t relock | lock_loss_count↑ + lock_fail_stage=SYNTH/RF + relock_time_p95↑ | RF / Synth stability |
| Throughput collapses but lock is stable | ACM_toggle_rate↑ or q_drop_by_class↑ with link_flap_cnt stable | Modem policy or Network queueing |
| Short error bursts (intermittent) | FER_window spikes + decoder_iterations_max spikes + RSSI baseline ~stable | Margin / thermal / mechanical intermittency |
| “Service down” while PHY is up | link_flap_cnt stable + dhcp_fail_cnt↑ / dns_fail_cnt↑ | Network service path |
| Random resets & state corruption | reset_cause=BOR + brownout_crit_cnt↑ + pg_glitch_cnt↑ | Power integrity / sequencing |
| Performance degrades with temperature | thermal_event_cnt↑ + derate_level↑ + OBO/power_limit active | Thermal drift / derating behavior |
6) Example part numbers (reference only; illustrates how BIT points are implemented)
The items below are examples to show how common BIT/BIST checkpoints are realized in hardware. Final selection must follow frequency plan, bandwidth, temperature grade, reliability constraints, and interface requirements.
Reference parts (examples)
- RF power / log detectors (TX/RX health, loop stability): ADI ADL5519, AD8318 (log detectors); ADI AD8361, AD8362 (RMS-responding power detectors)
- Current / power monitors (rail telemetry for proof): TI INA226, INA238 (I²C power monitors)
- Voltage supervisors / reset controllers (reason-coded resets): TI TPS386000 (supervisor/reset class)
- Temperature sensors (thermal evidence & derating triggers): TI TMP117 (precision temperature sensor class)
- Ethernet PHY diagnostics (link flap evidence): TI DP83867; Microchip KSZ9031 (Gigabit PHY examples)
- Event log storage (fault evidence retention): Winbond W25Q series (SPI-NOR class) for compact event records and snapshots
H2-12 · FAQs (SATCOM Terminal)
These FAQs target field-debug questions within a SATCOM terminal boundary: RF lock and health, modem error behavior, encryption placement, Ethernet backhaul continuity, internal power events, and the minimal evidence bundle needed to prove the fault domain.
1 LNB is powered but no lock—what are the first three checks?
Start with three proof checks before changing settings: (1) confirm the terminal reference and synthesizer are stable (ref_ok, pll_lock), (2) confirm the LNB LO lock flag is asserted and not flapping (lo_lock, lock_loss_count), and (3) verify bias current and RSSI baseline are plausible (bias_I, rssi_baseline).
2 Why does increasing gain improve RSSI but worsen BER?
RSSI can rise while decode quality falls when the receive chain moves into compression, distortion, or phase-noise-limited operation. Typical signatures are high decoder_iterations_max and rising FER_window while lock remains stable. Check AGC/gain-step rate and whether the chain is saturating (power detector/overload flag) or amplifying spurs that hurt demodulation.
3 How to pick OBO without killing throughput?
Choose OBO by targeting a stable “decode margin” rather than maximum power. Increase OBO until errors are no longer dominated by PA nonlinearity: FER_window falls and decoder_iterations_max stays bounded during bursts. If thermal derate is active (derate_level), use a more conservative OBO to avoid repeated policy oscillations (ACM toggling) that crush throughput.
4 VSWR alarm keeps tripping—detector fault or real mismatch?
Separate “real mismatch” from “measurement/threshold noise” using correlation. A real event usually tracks transmit power and persists for a minimum duration; a detector/threshold issue often appears as short spikes that do not match tx_power_det changes. Compare vswr_trip_cnt and trip duration against temperature/vibration time stamps, and confirm whether reducing power reduces trip rate.
5 PLL lock is stable at room temp but fails hot/cold—what’s drifting?
Temperature failures usually come from reference frequency drift, VCO tuning headroom, or loop stability margin changing with temperature. Look for lock instability that correlates with the PLL temperature node (temp_PLL) and long-tail reacquire (relock_time_p95). If lock_fail_stage indicates SYNTH/REF, extend “ref-stable” gating and avoid enabling downstream LO before the reference is settled.
6 Which counters best separate RF impairment vs modem impairment?
Use a minimal “split set.” RF-domain evidence: lock_loss_count, lock_fail_stage, relock_time_p95, and stable RSSI baseline. Modem-domain evidence: frame_sync_state, FER_window, decoder_iterations_max, and ACM_toggle_rate. If lock is stable but iterations/FER rise, investigate margin/thermal/mechanical before blaming RF lock.
7 Does encryption reduce throughput, and where is the bottleneck?
Encryption can reduce throughput through processing limits, buffering/MTU overhead, or contention with queues and management traffic. Prove the bottleneck by checking crypto engine utilization and drops (crypto_engine_util, crypto_drop_reason) alongside network queue drops (q_drop_by_class). A flat crypto utilization with rising queue drops points to backhaul scheduling, not encryption capacity.
8 Ethernet link is up but traffic stalls—what queue/QoS settings matter?
A “link up, traffic stalled” symptom is often queueing and shaping, not PHY health. Check q_depth_by_class and q_drop_by_class to see whether management/control traffic is starving user traffic (or user bursts are drowning control). Ensure DSCP/VLAN mapping is consistent, apply shaping to smooth bursts, and keep a protected control queue so the modem never underruns/overruns.
9 Why do brief brownouts cause long reacquisition time?
Short brownouts can create cross-domain state inconsistency: some rails glitch while others continue, leaving synthesizers/modem state machines in mismatched phases. That often causes repeated partial relock attempts, stretching reacquisition tail (relock_time_p95). Confirm with reset_cause=BOR, rising brownout_crit_cnt, and pg_glitch_cnt. A clean staged relock sequence with “ref-stable gating” usually fixes the long tail.
10 What telemetry should be streamed for proactive maintenance?
Stream a small, high-value set: lock state and relock counters (lock_state, relock_count), power/VSWR health (tx_power_det, vswr_trip_cnt), modem quality windows (FER_window, decoder_iterations_max, ACM_state), network continuity (link_flap_cnt, dhcp_fail_cnt), and thermal/power evidence (derate_level, reset_cause). Use slow sampling plus event-triggered snapshots.
11 How to validate a terminal on the bench without a live satellite?
Bench validation can prove each domain independently: run modem-chain self-tests/loopbacks to verify sync and FEC counters (frame_sync_state, FER_window, iterations), apply controlled IF stimulus to confirm lock sequencing gates, and use a repeatable Ethernet traffic profile to stress queues and measure drops (q_drop_by_class, throughput). Capture a baseline “healthy snapshot” to compare against field logs later.
12 What’s the minimal log bundle to ask from the field?
Request a compact evidence bundle that proves the fault domain: (1) a timestamped snapshot of layer states (lock/sync/link), (2) counters over a short window (e.g., last 30–60 s) for lock loss, FER/iterations, and link flaps, (3) the last N alarms and reason codes, (4) last reset cause with brownout/PG evidence plus temperature/derate level, and (5) a configuration summary (freq plan, OBO, ACM policy, QoS mapping).
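A bundle like this is easier to request and check when its sections are named. The sketch below assumes the bundle is exchanged as JSON with one section per item in the list above; the section names and helper functions are hypothetical.

```python
import json

# One section per item in the list above; the names are illustrative.
REQUIRED_SECTIONS = (
    "layer_states",     # (1) timestamped lock/sync/link snapshot
    "window_counters",  # (2) counters over a short window
    "recent_alarms",    # (3) last N alarms and reason codes
    "power_thermal",    # (4) reset cause, brownout/PG, temperature, derate
    "config_summary",   # (5) freq plan, OBO, ACM policy, QoS mapping
)


def missing_sections(bundle):
    """List the required sections that are absent from the bundle."""
    return [name for name in REQUIRED_SECTIONS if name not in bundle]


def serialize_bundle(bundle):
    """Serialize a complete bundle to JSON; reject incomplete ones."""
    missing = missing_sections(bundle)
    if missing:
        raise ValueError("incomplete bundle, missing: " + ", ".join(missing))
    return json.dumps(bundle, sort_keys=True)
```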