Flight Control Computer (FCC): Safety Computing for Flight Control
A Flight Control Computer (FCC) is a safety-critical, deterministic control node that must detect faults, contain them, and keep outputs bounded under real-world disturbances. Lockstep compute, ECC memory, supervised power/reset, watchdog/FDIR, redundancy voting, and evidence-grade logging together turn “it runs” into “it can be proven safe and diagnosable in the field.”
H2-1 · What an FCC is (and what it is not)
A Flight Control Computer (FCC) is the safety-critical compute node inside the flight-control closed loop. It must turn time-bounded inputs into actuator commands with deterministic timing, and it must handle faults in a controlled, traceable way (detect → contain → command-safe → log evidence).
In-scope (FCC owns)
- Closed-loop compute: sample → compute → command update, with a fixed control period.
- Determinism: bounded latency, jitter control, timeouts, and safe degradation rules.
- Safety mechanisms: lockstep compare, ECC (detect/correct), watchdog supervision.
- Power/reset control: sequencing, PG/RESET tree, brownout behavior, safe bring-up.
- Fault handling: detection → isolation/containment → output inhibit/degrade → recovery.
- Evidence chain: event logging (reset cause, ECC counters, watchdog trips, mismatch flags).
Out-of-scope (handled by other pages)
- Sensor front ends: IMU/AHRS AFE/ADC choices, sensor fusion algorithms, calibration math.
- Network infrastructure: AFDX/ARINC 664 switch architecture, bus protocol deep dives.
- Display pipelines: HUD/cockpit video interfaces, rendering/graphics processing.
- RF payload chains: radar/EW Tx/Rx, channelizers, anti-jam RF front-end details.
- Timing distribution: PTP/SyncE system-wide clock network design.
Practical rule: if a topic does not change the FCC’s determinism, fault reaction, or evidence logging, it does not belong on this page.
At runtime, every control cycle reduces to three obligations:
1) Sample the control-relevant inputs at a defined moment and validate freshness/consistency.
2) Compute commands within a bounded time budget; handle overruns as a safety event, not a “software glitch”.
3) Drive actuator command outputs, including a safe inhibit/degrade path that is independent and predictable.
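The three obligations above can be sketched as one gate per cycle. This is a minimal concept sketch in C, not a real FCC implementation; the names (`run_cycle`, `cycle_result_t`) and the freshness/budget thresholds are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cycle outcome codes. */
typedef enum { CYCLE_OK, CYCLE_STALE_INPUT, CYCLE_OVERRUN } cycle_result_t;

#define MAX_INPUT_AGE_US  2000u  /* freshness window (assumed value) */
#define COMPUTE_BUDGET_US 1500u  /* compute time budget (assumed value) */

/* One control cycle: validate input freshness, check the compute budget,
 * and decide whether the command may be published.  Stale input or an
 * overrun is reported as a safety event, never silently tolerated. */
cycle_result_t run_cycle(uint32_t input_age_us, uint32_t compute_time_us,
                         bool *publish_allowed)
{
    *publish_allowed = false;
    if (input_age_us > MAX_INPUT_AGE_US)
        return CYCLE_STALE_INPUT;      /* late data is unsafe data */
    if (compute_time_us > COMPUTE_BUDGET_US)
        return CYCLE_OVERRUN;          /* overrun = safety event */
    *publish_allowed = true;           /* fresh and on time: publish */
    return CYCLE_OK;
}
```

Note that the safe default is "do not publish": the output is enabled only when both checks pass, which mirrors the inhibit-by-default posture described throughout this page.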
H2-2 · Safety goals drive the architecture (DAL mindset → design constraints)
Safety requirements should not remain abstract labels. They translate into non-negotiable architectural constraints: independence, diagnostic coverage, bounded reaction time, and a verifiable evidence chain. In an FCC, those constraints directly justify lockstep, ECC, independent watchdog/power supervision, and structured event logging.
| Safety constraint (generic) | FCC mechanism (what must exist) | Verifiable evidence (how to prove) |
|---|---|---|
| No single fault may produce a hazardous command. Faults must be detected and contained before unsafe output. | Lockstep compare + controlled output inhibit/degrade path; independent fault handler entry. | Inject mismatch / illegal state → output inhibit occurs within bounded time; event log records cause + action. |
| Latent faults must become observable. Silent corruption must turn into counters/events. | ECC on critical memories; scrubbing strategy; health counters exposed to logging. | ECC corrected/uncorrected counters change under injection; thresholds trigger an event with context snapshot. |
| Reaction time must be bounded. Overruns and stalls are safety events. | Windowed watchdog; time budget monitors; overrun detection hooks tied to safe handling. | Force compute overrun → watchdog/monitor triggers within defined window; log includes loop ID and timing. |
| Power anomalies must not create ambiguous states. Brownout is riskier than a clean reset. | Safety PMIC/supervisor with sequencing, PG/RESET tree, brownout detection, deterministic reset path. | Brownout profile test → consistent reset behavior; logs show reset cause + rails that dropped (PG history). |
| Redundancy must avoid hidden coupling. Common-cause paths must be minimized. | Independent supervision domains (power/reset/clock monitoring concept), cross-monitor checks, voting-ready outputs. | Remove one channel / fault one supervisor → other channel maintains bounded behavior; mismatch is recorded. |
| Certification-grade traceability. Decisions must be explainable after the fact. | Event IDs, counters, state snapshots, and a consistent log readout interface (concept). | Any safety action produces an auditable record: what happened, when, what was commanded, and why. |
Writing rule for the rest of the page: every “important” block must answer three questions — what constraint it serves, how it works, and how to verify it.
H2-3 · Compute core: lockstep MCU/SoC and fault containment
Lockstep is not a “performance feature.” It is a measurable safety mechanism that turns silent compute faults into an observable mismatch, forcing the FCC into a controlled reaction path: detect → enter safety handler → inhibit/degrade outputs → log evidence. The value is only real when the reaction time is bounded and the output path is contained.
Dual-core lockstep
Two compute cores run the same flow and their results are compared. A mismatch is treated as a safety event and triggers a fault reaction path. The critical design question is not “how fast,” but how quickly and reliably the mismatch forces a safe output state.
Delayed lockstep
The second core runs with an intentional delay. This can change coverage for certain transient behaviors, but it requires clean synchronization points and disciplined buffering so that the comparison remains meaningful and does not introduce ambiguity in the control loop timeline.
This page stays at the safety path level (compare → exception → safe output). Micro-architecture internals are intentionally out-of-scope.
1) Time containment
- Overrun is a safety event: if control computation exceeds its time budget, the system must transition to a defined safe action, not “keep trying.”
- Reaction time is bounded: mismatch detection must lead to a safety handler within a predictable upper bound.
2) State containment
- Normal → Fault-handling is a deliberate state transition. Partial / ambiguous states must be avoided.
- Fault handling should rely on minimal, deterministic code paths and avoid complex dependencies.
3) Output containment
- The output chain needs a safe output mode (inhibit or degraded commands) that does not depend on “application tasks being healthy.”
- When mismatch is detected, unsafe actuator commands must stop before any recovery attempts are made.
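The three containment layers can be condensed into one reaction path whose ordering is the whole point: outputs are inhibited before anything else happens. This is a concept sketch under assumed names (`fcc_state_t`, `on_lockstep_mismatch`), not any vendor's lockstep API.

```c
#include <stdbool.h>

/* Hypothetical FCC operating modes. */
typedef enum { MODE_NORMAL, MODE_FAULT_HANDLING, MODE_SAFE_OUTPUT } fcc_mode_t;

typedef struct {
    fcc_mode_t mode;
    bool outputs_inhibited;
    bool mismatch_logged;
} fcc_state_t;

/* Minimal mismatch reaction path.  The ordering encodes the rule:
 * 1) output containment first, so no unsafe command outlives detection;
 * 2) a deliberate state transition, no partial/ambiguous state;
 * 3) evidence capture on a minimal, deterministic code path;
 * 4) settle in a defined safe output state before any recovery. */
void on_lockstep_mismatch(fcc_state_t *s)
{
    s->outputs_inhibited = true;
    s->mode = MODE_FAULT_HANDLING;
    s->mismatch_logged = true;
    s->mode = MODE_SAFE_OUTPUT;
}
```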
| Metric | What it controls in an FCC | What to check / prove |
|---|---|---|
| Compare latency | How late a compute mismatch can be detected after it occurs; defines the “latest possible” detection time inside the control period. | Mismatch becomes observable quickly enough to preserve safe output behavior within the loop budget. |
| Error reaction time | Time from mismatch detection to the FCC reaching a defined safe output state (inhibit/degrade). | Worst-case mismatch triggers a bounded reaction that is independent of normal application scheduling. |
| Exception routing (IRQ/NMI) | Whether a mismatch can bypass “normal software” and reliably enter a safety handler even under overload. | Fault path remains effective even when the main control task is stalled, overloaded, or in a bad state. |
| Fault capture fidelity | Ability to log actionable evidence (mismatch type, timing, state snapshot) without adding unstable jitter. | Event log records cause + action with consistent ordering; logging overhead does not break determinism. |
Practical rule: a safety mechanism is only as strong as its fault reaction path and output containment.
What it catches
- Random compute faults that lead to different results on Core A vs Core B.
- Transient upsets that flip internal state and produce a mismatch during the compare window.
What it misses
- Common-mode failures (shared power/reset/clock disturbance) that can make both cores fail similarly.
- Same-software defects where both cores produce the same wrong result.
- Faults in non-lockstep peripherals unless those are independently monitored.
What to verify
- Injected mismatch triggers IRQ/NMI and enters the safety handler without relying on normal tasks.
- Outputs reach a defined safe state within a bounded time.
- Event logging captures cause + action + timing without violating the loop budget.
H2-4 · Memory system: ECC, buffers, and deterministic data handling
ECC is valuable in an FCC because it changes memory faults from “silent corruption” into an observable and auditable health signal. A practical FCC memory design treats ECC as a detect → correct → record pipeline and then enforces determinism: the worst-case correction/refresh/bus contention behavior must still stay inside the loop’s latency and jitter budgets.
SRAM (on-chip state)
- Holds control states and critical tables that influence the loop immediately.
- ECC/parity prevents one-bit upsets from becoming a silent wrong command; counters provide early warning.
External memory (bandwidth + refresh + contention)
- Large buffers may experience contention and refresh-related jitter; ECC is necessary but not sufficient.
- Determinism requires access discipline: priority, bandwidth budgeting, and predictable refresh impact.
Nonvolatile storage (configuration integrity)
- Corruption can turn into a wrong parameter set rather than a crash; verification must focus on detectability and traceability.
- This page stays on the FCC evidence chain and determinism impact; storage security details are out-of-scope.
ECC covers what
- Control-relevant states: mode/state machine variables and safety-reaction flags.
- Command buffers: the data structure that feeds outputs must not silently corrupt.
- Health counters: corrected/uncorrected counts must remain consistent and readable.
Verification: inject bit errors and confirm detection/correction plus consistent counter reporting.
Scrub strategy
- When: scrub in a dedicated maintenance time slice or controlled slack windows, not inside the tightest compute section.
- What first: prioritize memory regions holding long-lived safety states and lookup tables.
- Observable output: corrected/uncorrected counters must feed thresholds and generate events with context.
Verification: scrubbing produces measurable counter movement under injection; threshold crossing generates a structured log event.
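The detect → correct → record pipeline can be sketched as a counter-and-threshold fold. This is an illustrative model, assuming a hypothetical `ecc_report` hook and an arbitrary threshold value; a real design would tie the event to the structured log described in H2-9.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECC_CORRECTED_THRESHOLD 8u  /* event threshold (assumed value) */

typedef struct {
    uint32_t corrected;
    uint32_t uncorrected;
    bool event_raised;
} ecc_health_t;

/* Fold one ECC report into the health counters.  A corrected error is
 * tolerated but counted; crossing the threshold makes a latent trend
 * visible as an event.  Any uncorrected error is always an event.
 * Returns true when the fault is fatal (uncorrected) and the consumer
 * must not publish the affected frame. */
bool ecc_report(ecc_health_t *h, bool corrected_ok)
{
    if (corrected_ok) {
        h->corrected++;
        if (h->corrected >= ECC_CORRECTED_THRESHOLD)
            h->event_raised = true;   /* trend crossed: log with context */
        return false;
    }
    h->uncorrected++;
    h->event_raised = true;           /* uncorrected is always explicit */
    return true;
}
```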
Determinism guardrails (budget/latency)
- Worst-case correction latency must be accounted for: ECC correction/retry cannot silently push the loop beyond deadline.
- Bus contention must be bounded: background tasks (logging, scrubbing) must not starve control data access.
- Refresh impact must be predictable: refresh-related jitter must not create timeout storms or unsafe mode thrashing.
Verification: stress the memory subsystem while running the control loop; demonstrate bounded jitter and no unsafe timeouts.
Real-time safety depends on consistent snapshots. Double-buffering prevents the controller from reading “half-updated” inputs or states. Each control cycle should consume a single coherent frame (snapshot) and produce a command frame that can be validated for freshness and integrity.
Practical patterns (concept-level)
- Input snapshot: validate age/freshness, then lock a frame for the cycle.
- Compute on stable data: no mid-cycle partial updates from background traffic.
- Command frame: publish atomically; if faults occur, the safe output mode must override the frame.
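The snapshot pattern above can be sketched with a two-slot buffer: the writer fills the spare slot and then flips a single index, so the reader's frame is always coherent. The names and frame fields here are hypothetical, and "atomic" is meant conceptually (a single-word index flip); a real design must verify the memory-ordering guarantees of its target.

```c
#include <stdint.h>

/* Hypothetical input frame consumed once per control cycle. */
typedef struct { uint32_t seq; float pitch_rate; } input_frame_t;

/* Two buffers: background traffic writes one while the control loop
 * reads the other. */
typedef struct {
    input_frame_t buf[2];
    volatile int  ready_idx;  /* index of the last complete frame */
} double_buffer_t;

/* Writer: fill the spare slot completely, then flip the index in one
 * step, so the reader never observes a half-updated frame. */
void db_publish(double_buffer_t *db, const input_frame_t *f)
{
    int spare = 1 - db->ready_idx;
    db->buf[spare] = *f;
    db->ready_idx = spare;    /* single-word flip = conceptual atomic publish */
}

/* Reader: lock one coherent snapshot for the whole control cycle. */
input_frame_t db_snapshot(const double_buffer_t *db)
{
    return db->buf[db->ready_idx];
}
```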
H2-5 · Power, sequencing, reset: safety PMIC + supervised startup/shutdown
The most dangerous FCC failures often originate from power, reset, and brownout behavior, not from a lack of compute. A safety PMIC/supervisor makes power-up and power-down predictable and traceable: each transition follows a defined sequence, outputs remain inhibited until health is confirmed, and every anomaly becomes an auditable event.
Core domain
- Must reach a stable operating region before releasing control computation.
- Reset release must be consistent to avoid early lockstep mismatch behavior.
IO / interface domain
- Should remain inhibited until the core is stable and health checks are complete.
- Prevents “ghost outputs” during partial startup or brownout recovery.
Aux / supervision domain
- Supports supervision and safe-state controls that must remain meaningful during faults.
- Conceptually enables independence for monitoring and evidence capture.
This section stays inside the FCC module: it does not expand into aircraft-wide distribution or surge front-end design.
Brownout “half-alive” state
- Some logic may keep partial state while other blocks collapse, creating undefined state machines.
- A clean reset is safer than allowing ambiguous mid-voltage operation.
Inconsistent reset release
- If reset is released unevenly across compute and memory, lockstep can diverge before software has control.
- The reset tree must be designed and verified as a system, not as “one pin.”
Ghost outputs during startup
- IO rails can become active while the core is not stable, producing unintended actuator signaling.
- Use output gating and safe-state control to prevent unsafe commands until health is confirmed.
Phase 1: Pre-check
- Monitor: brownout status, previous reset cause, rail readiness flags (as available), supervisor status.
- Action: hold reset, keep outputs gated, block command publishing.
- Evidence: log reset cause and last known power anomaly markers.
Phase 2: Ramp / sequence
- Monitor: rail order, PG validity, timeouts, and sequencing state.
- Action: if PG fails or times out, force safe-state and re-assert reset rather than “continuing partially.”
- Evidence: log which rail did not reach PG and whether the transition was aborted or retried.
Phase 3: Health confirm
- Monitor: stable rail flags, supervision domain alive, basic self-check readiness signals (concept-level).
- Action: release reset in a controlled order; enable outputs only after explicit “health confirmed.”
- Evidence: log a “startup complete” marker with key health counters snapshot (minimal set).
Shutdown should be symmetrical: remove output authority first (safe-state), then collapse rails in a defined order, and record the reason.
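The three startup phases can be sketched as a gated state machine in which any failed gate falls back to a safe hold instead of continuing partially, and outputs are enabled only after health is confirmed. Phase names and the `startup_step` interface are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical supervised-startup phases. */
typedef enum { PH_PRECHECK, PH_RAMP, PH_HEALTH, PH_RUN, PH_SAFE_HOLD } phase_t;

/* One supervised startup step: advance only when the current phase's
 * gate is satisfied (reset cause read, rails reached PG in order,
 * health confirmed).  Any gate failure re-asserts the safe hold
 * rather than "continuing partially". */
phase_t startup_step(phase_t ph, bool gate_ok)
{
    if (!gate_ok)
        return PH_SAFE_HOLD;
    switch (ph) {
    case PH_PRECHECK: return PH_RAMP;   /* outputs gated, evidence logged */
    case PH_RAMP:     return PH_HEALTH; /* all rails reached PG in order  */
    case PH_HEALTH:   return PH_RUN;    /* explicit "health confirmed"    */
    default:          return ph;        /* RUN and SAFE_HOLD are stable   */
    }
}

/* Outputs are enabled only in the fully confirmed running state. */
bool outputs_enabled(phase_t ph) { return ph == PH_RUN; }
```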
H2-6 · Watchdogs, monitors, and fault handling (FDIR flow)
A watchdog is not “one pin that resets the board.” In an FCC it is part of a monitoring and fault-handling system that detects timing failures, isolates unsafe behavior, triggers controlled recovery or degradation, and records evidence for certification and field diagnosis. The key is to turn wrong-time failures (too slow, too fast, or stalled) into bounded actions with clear logs.
Software heartbeat
- Shows a task is progressing, but may not prove overall timing health.
- Best used as a layer, not as the last line of defense.
Windowed watchdog
- Must be serviced within a defined window, catching both “too slow” and “too fast” behavior.
- Turns timing drift into a bounded fault path rather than a silent performance issue.
External independent watchdog
- Conceptually independent in clocking/supply, so it can act when internal timing is compromised.
- Provides a predictable response when software and internal supervision are no longer trustworthy.
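The windowed-watchdog idea reduces to one check: the service kick must land inside a defined window, so both runaway loops (too fast) and stalls (too slow) become bounded fault paths. The window bounds below are arbitrary illustrative values, not a recommendation.

```c
#include <stdint.h>

#define WDG_WINDOW_OPEN_US   800u  /* earliest legal service (assumed) */
#define WDG_WINDOW_CLOSE_US 1200u  /* latest legal service (assumed)  */

typedef enum { WDG_OK, WDG_TOO_FAST, WDG_TOO_SLOW } wdg_verdict_t;

/* Windowed watchdog check: a kick before the window opens indicates a
 * runaway or corrupted scheduler; a kick after it closes indicates a
 * stall or overrun.  Either way, timing drift becomes an explicit
 * fault verdict instead of a silent performance issue. */
wdg_verdict_t wdg_service(uint32_t since_last_kick_us)
{
    if (since_last_kick_us < WDG_WINDOW_OPEN_US)
        return WDG_TOO_FAST;
    if (since_last_kick_us > WDG_WINDOW_CLOSE_US)
        return WDG_TOO_SLOW;
    return WDG_OK;
}
```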
| Fault source | Detector | Action | Evidence |
|---|---|---|---|
| Control loop overrun (deadline missed) | Time monitor / windowed watchdog | Enter degraded mode or inhibit output authority; prevent repeated unsafe scheduling. | Log overrun marker + loop ID + timing context. |
| Lockstep mismatch (compute divergence) | Comparator → IRQ/NMI | Immediate safety handler; inhibit outputs; optional controlled reset depending on policy. | Log mismatch cause + action + ordering relative to output gating. |
| Watchdog timeout (stalled execution) | External / independent watchdog | Force deterministic reset or safe-state transition; avoid “half-alive” continuation. | Log reset cause and last valid heartbeat timestamp (if available). |
| ECC uncorrected (data integrity failure) | ECC logic + thresholds | Isolate affected region or transition to safe mode; block publishing corrupted frames. | Log uncorrected count snapshot + threshold crossing event. |
| Brownout detected (voltage unsafe) | Supervisor BOD / PG | Hold reset and gate outputs; restart with supervised sequence, not partial recovery. | Log PG history markers and brownout cause flag. |
| Repeated transient faults (worsening trend) | Counters + rate thresholds | Escalate from retry to degrade; avoid endless reboot loops. | Log fault rate, escalation decision, and final operating mode. |
FDIR should treat “reset” as one tool among several. The primary goal is to keep outputs safe and make every transition explainable.
Latched vs retryable
- Latched faults: remain active until explicit conditions are met; protects against repeated unsafe oscillations.
- Retryable faults: allow controlled restart attempts with a bounded retry budget and escalation on repetition.
Degraded mode vs safe output
- Degraded mode: reduces authority and limits outputs while continuing operation with stricter supervision.
- Safe output: inhibits or forces a safe command state when correct behavior cannot be guaranteed.
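The retryable-versus-latched distinction implies an escalation policy: a bounded retry budget, then degradation, then a latched safe state that requires explicit clearing. The sketch below uses assumed budget values and names; real policies would key on fault class and rate, not a single count.

```c
#include <stdint.h>

/* Hypothetical FDIR escalation actions. */
typedef enum { ACT_RETRY, ACT_DEGRADE, ACT_SAFE_LATCH } fdir_action_t;

#define RETRY_BUDGET  3u  /* bounded restart attempts (assumed) */
#define DEGRADE_LIMIT 5u  /* beyond this, latch safe state (assumed) */

/* Escalation for a retryable fault: a bounded number of controlled
 * retries, then a degraded mode with stricter supervision, then a
 * latched safe state.  This is what prevents endless reboot loops:
 * repetition always moves toward less authority, never back to
 * unsupervised retries. */
fdir_action_t fdir_escalate(uint32_t fault_count)
{
    if (fault_count <= RETRY_BUDGET)
        return ACT_RETRY;
    if (fault_count <= DEGRADE_LIMIT)
        return ACT_DEGRADE;
    return ACT_SAFE_LATCH;  /* remains latched until explicit conditions met */
}
```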
H2-7 · Redundancy & voting: dual/triple FCC channels without hidden coupling
Redundancy is only effective when failures remain independent. The fastest way to defeat a dual/triple FCC architecture is hidden coupling: shared power, clock, reset, or software commonality that turns independent channels into a single common-cause failure. This section focuses on decoupling, cross-monitoring, and voting/selection behavior that can be verified.
2oo2 (two-out-of-two)
- Both channels must agree before authority is granted.
- Strong at preventing a single wrong channel from driving outputs.
- Highly sensitive to hidden coupling (both can fail together).
2oo3 (two-out-of-three)
- Majority vote tolerates one disagreeing channel.
- Can maintain operation while isolating a suspected channel.
- Still requires independence; coupled failures can defeat majority logic.
1oo2 (one-out-of-two)
- One channel can provide outputs while the other monitors.
- Useful for controlled degradation strategies with tight supervision.
- Requires clear rules for authority transfer and fault evidence.
The purpose here is to explain behavior and verification targets, not to define a specific aircraft architecture.
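The 2oo3 behavior can be sketched as majority selection over already-quantized channel commands: any two agreeing channels grant authority, the third is flagged for isolation and evidence logging, and a three-way disagreement grants no authority at all. This illustrates the voting behavior only; it assumes exact-match comparison, whereas real channels typically vote within tolerance bands.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t voted;        /* majority output command */
    int     odd_channel;  /* index of disagreeing channel, -1 if none */
    bool    valid;        /* false when no majority exists */
} vote_result_t;

/* 2oo3 majority vote: two agreeing channels win; the odd channel is
 * identified so it can be isolated and logged.  No majority means no
 * authority is granted (valid stays false). */
vote_result_t vote_2oo3(int32_t a, int32_t b, int32_t c)
{
    vote_result_t r = { 0, -1, false };
    if (a == b) {
        r.voted = a; r.valid = true;
        r.odd_channel = (c == a) ? -1 : 2;
    } else if (a == c) {
        r.voted = a; r.valid = true; r.odd_channel = 1;
    } else if (b == c) {
        r.voted = b; r.valid = true; r.odd_channel = 0;
    }
    return r;  /* three-way disagreement: r.valid remains false */
}
```

Note the caveat from the table above: if a common-cause fault drives two channels to the same wrong value, the vote still "succeeds", which is exactly why decoupling and cross-monitoring matter more than the voting logic itself.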
| Coupling type | Why it breaks redundancy | Decouple methods (concept) | Evidence to prove |
|---|---|---|---|
| Power | A rail disturbance can push all channels into the same fault state at the same time. | Independent monitoring domains; separate supervision paths; avoid single shared “health truth.” | Single-channel power anomaly does not synchronize failure across channels; logs remain distinguishable. |
| Clock | A shared clock failure can create simultaneous timing collapse and watchdog storms. | Independent clock sources or independent clock validation per channel. | One channel’s clock anomaly is detected and isolated without pulling other channels into the same timing fault. |
| Reset | A shared reset tree can cause simultaneous reboot/undefined state across all channels. | Reset domain separation; controlled authority gating; channel-local reset policy. | Channel reset does not force peer resets; reset-cause logs remain channel-specific. |
| Software commonality | The same defect can produce the same wrong result on every channel, defeating agreement checks. | Design for detectability: cross-checking + independent monitors + bounded outputs on anomalies. | A wrong-time or inconsistent-state condition triggers isolation/degradation rather than silent agreement. |
The goal is not “perfect independence,” but eliminating hidden shared dependencies that can collapse all channels together.
Liveness cross-check
- Peer heartbeat freshness (time-since-last-update) to detect stalled peers.
- Detects “not running” and “not updating on time” before voting grants authority.
Consistency cross-check
- Compare a minimal state summary: operating mode, health flags, and authority readiness.
- Detects divergence early and routes disagreements into isolation/degradation logic.
Cross-monitoring is used to identify which channel is no longer trustworthy, not to “prove correctness” of complex application logic.
H2-8 · I/O integrity and timing budget (determinism end-to-end)
End-to-end determinism is not created by a bus name. It comes from a timing budget, monitor points, and a timeout policy that turns “late or inconsistent data” into bounded actions. This section uses a generic chain: sample → compute → publish → confirm, and focuses on budgeting methods rather than protocol details.
CRC / integrity check
- Turns silent bit corruption into a detectable event.
- Failed integrity should block command publishing or force degradation.
Sequence counter
- Detects drops, repeats, and out-of-order updates without protocol detail.
- Enables “freshness” decisions in the control loop.
Timeout / freshness window
- Late data is treated as unsafe data.
- Timeout transitions should be logged and tied to a defined output policy.
Examples like ARINC/AFDX can carry these signals, but the determinism method is independent of any specific bus stack.
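The three monitor points combine into one acceptance gate per received frame, where the first failing check names the bounded action to take. This is a bus-agnostic sketch with assumed field names and an arbitrary freshness window; the CRC result is taken as an input because the integrity algorithm itself is out of scope here.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRESHNESS_WINDOW_US 4000u  /* freshness budget (assumed value) */

typedef enum { FRM_OK, FRM_BAD_CRC, FRM_SEQ_ANOMALY, FRM_STALE } frame_verdict_t;

typedef struct {
    uint16_t seq;     /* sender's sequence counter */
    uint32_t age_us;  /* time since the frame was produced */
    bool     crc_ok;  /* integrity check result (computed upstream) */
} rx_frame_t;

/* Accept a frame only if integrity, sequence progress, and freshness
 * all pass.  `last_seq` is the previously accepted counter; the
 * uint16_t cast handles counter wrap-around.  Each verdict maps to a
 * bounded action: block publish, mark suspect/stale, or degrade. */
frame_verdict_t frame_check(const rx_frame_t *f, uint16_t last_seq)
{
    if (!f->crc_ok)
        return FRM_BAD_CRC;                 /* block command publishing */
    if (f->seq != (uint16_t)(last_seq + 1u))
        return FRM_SEQ_ANOMALY;             /* drop, repeat, or reorder */
    if (f->age_us > FRESHNESS_WINDOW_US)
        return FRM_STALE;                   /* late data is unsafe data */
    return FRM_OK;
}
```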
1) Sample window
- Cap: sampling must complete within a bounded acquisition window.
- Monitor: timestamp + sequence freshness.
- Action: mark stale, enter suspect or degrade; record reason.
2) Compute window
- Cap: control compute must finish before the output publish deadline.
- Monitor: loop overrun monitor; watchdog window compliance.
- Action: degrade authority or inhibit publishing on repeated overruns.
3) Output publish window
- Cap: outputs must be published atomically and at a bounded time.
- Monitor: integrity check on command frame + publish timestamp.
- Action: block publish on integrity failure; keep safe-state output.
4) Confirm / feedback window
- Cap: confirmation must arrive before a timeout threshold.
- Monitor: confirm freshness + sequence progress.
- Action: escalate to degrade or safe mode on missing/late confirmation.
The budget is complete only when each segment has a monitor point, a bounded action, and evidence logged for traceability.
H2-9 · Event logging & evidence: what to record to prove safety and catch latent faults
An FCC log is not a “debug trace.” It is an evidence chain that explains what happened, why it triggered, and what action was taken. The most valuable field outcome is the ability to reconstruct an incident from a small set of consistent records and prove the fix with repeatable evidence.
Power / Reset evidence
- Reset cause (e.g., supervised restart, brownout marker, watchdog)
- Rail PG drops / sequencing abort markers
- Thermal warnings / overtemp trips
Timing / determinism evidence
- Task overrun / deadline misses
- Loop timing anomalies (jitter out-of-window, concept)
- Timeout escalations (from suspect to isolate/degrade)
Integrity / redundancy evidence
- ECC corrected counter increases and threshold crossings
- ECC uncorrected events (must be explicit)
- Voting mismatch and channel isolation/authority transfer
- Watchdog trip and recovery path taken
These events directly support proof of mechanisms described earlier: power supervision, watchdog policy, ECC observability, timing budgets, and redundant channel voting.
| Field | What it means | Why it is required (evidence value) |
|---|---|---|
| EventID | Event type enum (reset, watchdog, ECC, PG drop, overrun, vote mismatch). | Makes incidents searchable and classifiable; enables consistent reporting across builds. |
| Severity | Info / Warn / Fault classification. | Separates noise from safety-relevant signals and supports trend thresholds. |
| Timestamp | Monotonic time or synchronized time tag (concept-level). | Enables causal ordering and timing budget proof (late vs on-time). |
| ChannelID | Channel A/B/C (when redundant). | Prevents “merged truth”; proves independence and helps isolate common-cause patterns. |
| Mode / FDIR state | Normal / Suspect / Isolate / Degrade / Safe. | Links detection to controlled action; demonstrates the fault-handling state machine. |
| Context snapshot | Small snapshot: counters + state flags relevant to the EventID. | Transforms “an event happened” into “why it happened” without requiring full debug traces. |
| ActionTaken | Output inhibit, supervised restart, channel isolation, degrade, safe mode entry. | Proves the system reacted in a bounded and defined way, not an uncontrolled behavior. |
| CorrelationID | Incident chain identifier linking related events. | Reconstructs multi-step chains (e.g., PG drop → brownout marker → reset → startup checks). |
| SeqNo | Append-only record sequence number. | Detects missing records, wrap-around issues, and preserves ordering under stress. |
A practical rule: each safety-relevant event must be explainable using EventID + Timestamp + Mode + ActionTaken, with Context snapshot for fast root cause.
Append-only
- Records are added, not rewritten.
- Preserves incident truth and avoids silent “history edits.”
Sequence number
- Each record increments SeqNo.
- Makes drops and ordering faults detectable.
Power-fail safe write
- Critical events maximize retention under power loss.
- Prevents the worst case: no evidence after a power incident.
These are design principles that can be validated without tying the discussion to any specific storage device or file system.
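The record schema and the append-only discipline can be sketched together: a fixed-layout record carrying the fields from the table above, written through an interface that only appends and always increments SeqNo. The enums and capacity are illustrative assumptions; a real design would also define wrap and power-fail-safe commit policies explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event taxonomy matching the field table. */
typedef enum { EV_RESET, EV_WATCHDOG, EV_ECC, EV_PG_DROP,
               EV_OVERRUN, EV_VOTE_MISMATCH } event_id_t;
typedef enum { SEV_INFO, SEV_WARN, SEV_FAULT } severity_t;

typedef struct {
    uint32_t   seq_no;          /* append-only ordering (SeqNo) */
    event_id_t id;              /* EventID */
    severity_t severity;        /* Severity */
    uint64_t   timestamp_us;    /* monotonic Timestamp */
    uint8_t    channel_id;      /* ChannelID */
    uint32_t   correlation_id;  /* CorrelationID: incident chain */
    uint32_t   action_taken;    /* ActionTaken (enum in a real design) */
} log_record_t;

#define LOG_CAPACITY 64u

typedef struct {
    log_record_t rec[LOG_CAPACITY];
    uint32_t     next_seq;
    uint32_t     count;
} event_log_t;

/* Append-only write: SeqNo always increments and records are never
 * rewritten, so drops and ordering faults stay detectable.  A full
 * log refuses the record rather than editing history (a real design
 * would escalate or wrap with an explicit marker). */
bool log_append(event_log_t *log, log_record_t r)
{
    if (log->count >= LOG_CAPACITY)
        return false;
    r.seq_no = log->next_seq++;
    log->rec[log->count++] = r;
    return true;
}
```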
Field diagnosis then follows a three-step workflow: grab, attribute, fix and prove.
Step 1 — Grab
- Input: maintenance readout + last incident window by Timestamp / CorrelationID.
- Output: a compact chain of related records (not a raw dump).
Step 2 — Attribute
- Input: chain pattern (e.g., PG drop → reset cause → startup incomplete).
- Output: root category (power/reset vs timing vs integrity vs voting) with a trigger hypothesis.
Step 3 — Fix & prove
- Input: corrective change (policy threshold, supervision rule, bounded action).
- Output: repeated scenario produces a safer outcome (degrade/safe) and a cleaner evidence chain.
The goal is reproducible evidence: the system should show a different, controlled timeline after the fix.
H2-10 · Verification checklist: what proves an FCC design is done
“Done” means evidence closure: under faults and boundary conditions, the FCC detects issues, enters a controlled state, and produces logs that prove both the trigger and the action. The checklist below is layered to match real workflows: development validation, production test, and field self-check.
Each line follows the same structure: condition → expected action → evidence output. This makes verification results easy to trace and compare.
Field checks focus on catching latent faults early and proving behavior over time, not only at the moment of a lab test.
H2-11 · FAQs (Flight Control Computer · FCC)
These FAQs clarify what the FCC safety mechanisms really cover, what can still escape, and how to verify behavior using bounded actions and evidence (counters, state transitions, and event logs).